Skip to content

Commit 3169ccd

Browse files
aksOpsclaude
andauthored
block 4: observability & ops (version, health, metrics, access-log, log-format) (#74)
* feat(api): GET /api/version with ldflags + BuildInfo fallback Moves the shared version-resolution logic out of cmd into a new internal/buildinfo package so internal/api can serve the same data without a cmd import cycle. Adds a public GET /api/version endpoint that returns {version, commit, build_date, go_version, dirty, deps} as JSON. Makefile LDFLAGS retargeted at the new package path. bearerAuthMiddleware public bypass list extended with /healthz, /readyz, and /api/version so upcoming probes and version endpoint remain public. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): /healthz + /readyz probes with 10s cache /healthz is dependency-free liveness. /readyz checks SQLite PingContext and an LLM provider Complete(maxTokens=1) reach, caching the verdict for 10 seconds to absorb Prometheus + Kubernetes probe loops. Nil provider (config provider: none) reports llm.status=skipped and keeps readiness green. Legacy /health route remains as a 200-returning alias for older clients; /healthz is the canonical probe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(obs): Prometheus metrics + workq stats Replaces the ad-hoc text-format collector in internal/api/metrics.go with the official prometheus/client_golang v1.20.5. Adds a new internal/obs package hosting the Default registry and per-subject collectors (HTTP, pipeline stages, embed latency, LLM tokens, workq depth + rejections, build info). Workq gains a Pool.Stats() snapshot accessor with rejectedTotal counter; cmd/serve.go initialises obs and binds the live pool stats provider. Pipeline IndexPath/IndexURL/Finalize are wrapped with TimeStage for granular stage timings via a nil-safe helper. Embedder observes per-batch provider latency; LLM Complete records an approximate token count (bytes/4) until langchaingo usage data is threaded through the Provider interface (tracked as follow-up). HTTP recording moved out of loggingMiddleware's bespoke collector into obs.HTTP.Observe; uses the Go 1.22 r.Pattern route to bound label cardinality. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): structured access log with bytes_out + panic resilience loggingMiddleware now emits one JSON/text slog line per request with {req_id, method, path, route, status, duration_ms, bytes_out, auth, project, panic}. The emission is deferred so a panic escaping recoveryMiddleware still produces an access-log entry. auth is a coarse label (bearer|cookie|anon) because docsiq uses a single shared API key; there is no real user identity. responseWriter now tracks bytes via an overridden Write and also proxies Flush for SSE/ streaming handlers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(log): log.format=json|text with emoji-strip production handler New log.format config key (default "text"; DOCSIQ_LOG_FORMAT env). Precedence is --log-format > env > config > default. The json handler is wrapped in obs.NewProductionHandler, which strips a leading emoji from slog Record.Message so log aggregators do not have to special- case multi-byte sequences. The text handler keeps emoji for human readers. Adds config-level defaults + env binding and three load- level tests covering default, YAML, and env-var precedence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mock): add BatchCeiling() to satisfy post-rebase llm.Provider interface Block 3 (merged to main) added BatchCeiling() int to llm.Provider. After rebasing feat/block4-observability onto main, the mock provider needed the new method to compile. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(api): bypass TimeoutHandler for SSE / streaming routes http.TimeoutHandler buffers the response body and does not implement http.Flusher. Block 4's responseWriter wrapper grew a Flush() method, so SSE handlers now pass the Flusher type-assertion and enter their streaming loop — but every Flush is a no-op because the underlying timeoutWriter absorbs writes until the request completes. The client times out reading the body at 30s (matches cfg.Server.RequestTimeout). Carve GET /api/upload/progress and GET /mcp out of the timeout wrapper so Flush propagates to the real net/http writer. SSE teardown still runs via r.Context() cancellation on client disconnect or shutdown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(health,test): real SSE stream consumption + decoupled probe ctx + lazy sq pinger Three fixes bundled, all flagged by post-rebase CI or Codex review: 1. waitUploadDone now consumes the /api/upload/progress SSE stream incrementally via bufio.Scanner instead of ReadAll. Prior code worked on main only because Flush() was missing on responseWriter, which made the handler return 500 immediately; with Block 4's Flusher now wired through, ReadAll blocks until the handler closes the stream, tripping the client's 30s timeout before "done" arrives. 2. /readyz SQLite probe no longer installs a no-op success fallback when stores.Get fails at router build time. A lazy pinger resolves the default store at probe time, so a genuine open failure (permissions, corruption, disk) surfaces as 503 — and a store that becomes available later flips readiness green without a restart. 3. readyzCache.check decouples probe context from the incoming request context via context.WithoutCancel. Previously, a probing client (Kubernetes, Prometheus, curl) disconnecting mid-probe would return context.Canceled and pollute the 10-second cache for every subsequent caller. Coverage: new TestReadyz_ProbeCtxDecoupledFromRequestCtx exercises (3). TestIsStreamingRoute_Classification and TestRequestTimeoutMiddleware_StreamingRouteBypassesTimeout already cover the Block-4-aware SSE bypass added in 8f9595b. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 3a61148 commit 3169ccd

37 files changed

Lines changed: 2327 additions & 747 deletions

Makefile

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,9 +3,9 @@
33
VERSION ?= $(shell git describe --tags --always --dirty 2>/dev/null || echo dev)
44
COMMIT ?= $(shell git rev-parse --short HEAD 2>/dev/null || echo unknown)
55
DATE ?= $(shell date -u +%Y-%m-%dT%H:%M:%SZ)
6-
LDFLAGS := -X github.com/RandomCodeSpace/docsiq/cmd.Version=$(VERSION) \
7-
-X github.com/RandomCodeSpace/docsiq/cmd.Commit=$(COMMIT) \
8-
-X github.com/RandomCodeSpace/docsiq/cmd.Date=$(DATE)
6+
LDFLAGS := -X github.com/RandomCodeSpace/docsiq/internal/buildinfo.Version=$(VERSION) \
7+
-X github.com/RandomCodeSpace/docsiq/internal/buildinfo.Commit=$(COMMIT) \
8+
-X github.com/RandomCodeSpace/docsiq/internal/buildinfo.Date=$(DATE)
99

1010
ui-install:
1111
cd ui && npm install

cmd/logformat_test.go

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,10 @@ package cmd
33
import (
44
"bytes"
55
"encoding/json"
6+
"io"
67
"log/slog"
8+
"os"
9+
"path/filepath"
710
"testing"
811
)
912

@@ -28,3 +31,73 @@ func TestLogFormatJSON(t *testing.T) {
2831
t.Errorf("k = %v, want v", decoded["k"])
2932
}
3033
}
34+
35+
// TestBuildLogHandler_JSONStripsEmoji confirms buildLogHandler returns
36+
// a handler chain that strips emoji from the message when format=json.
37+
// buildLogHandler writes to os.Stderr so we redirect it through a pipe
38+
// for the test (NOT parallel — shares global os.Stderr).
39+
func TestBuildLogHandler_JSONStripsEmoji(t *testing.T) {
40+
origStderr := os.Stderr
41+
r, w, err := os.Pipe()
42+
if err != nil {
43+
t.Fatalf("pipe: %v", err)
44+
}
45+
os.Stderr = w
46+
t.Cleanup(func() { os.Stderr = origStderr })
47+
48+
h := buildLogHandler(slog.LevelInfo, "json")
49+
slog.New(h).Info("✅ ready", "k", "v")
50+
_ = w.Close()
51+
52+
out, err := io.ReadAll(r)
53+
if err != nil {
54+
t.Fatalf("read: %v", err)
55+
}
56+
57+
var rec map[string]any
58+
if err := json.Unmarshal(bytes.TrimSpace(out), &rec); err != nil {
59+
t.Fatalf("not JSON: %v — raw=%q", err, out)
60+
}
61+
msg, _ := rec["msg"].(string)
62+
if msg != "ready" {
63+
t.Errorf("msg=%q want 'ready' (emoji stripped)", msg)
64+
}
65+
}
66+
67+
// TestInitConfig_LogFormatFromConfigFile asserts the config file is
68+
// consulted when neither --log-format nor DOCSIQ_LOG_FORMAT is set.
69+
// NOT parallel — mutates env + HOME + package-level flags.
70+
func TestInitConfig_LogFormatFromConfigFile(t *testing.T) {
71+
origHome := os.Getenv("HOME")
72+
origLogFormat := os.Getenv("DOCSIQ_LOG_FORMAT")
73+
t.Cleanup(func() {
74+
logLevel = "info"
75+
logFormat = ""
76+
cfgFile = ""
77+
cfg = nil
78+
os.Setenv("HOME", origHome)
79+
if origLogFormat != "" {
80+
os.Setenv("DOCSIQ_LOG_FORMAT", origLogFormat)
81+
}
82+
})
83+
84+
dir := t.TempDir()
85+
os.Setenv("HOME", dir)
86+
os.Unsetenv("DOCSIQ_LOG_FORMAT")
87+
88+
yaml := filepath.Join(dir, "config.yaml")
89+
if err := os.WriteFile(yaml, []byte("log:\n format: json\n"), 0o600); err != nil {
90+
t.Fatal(err)
91+
}
92+
cfgFile = yaml
93+
logFormat = ""
94+
95+
initConfig()
96+
97+
if cfg == nil {
98+
t.Fatal("initConfig produced nil cfg")
99+
}
100+
if cfg.Log.Format != "json" {
101+
t.Errorf("cfg.Log.Format=%q want json", cfg.Log.Format)
102+
}
103+
}

cmd/root.go

Lines changed: 34 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ import (
77
"strings"
88

99
"github.com/RandomCodeSpace/docsiq/internal/config"
10+
"github.com/RandomCodeSpace/docsiq/internal/obs"
1011
"github.com/spf13/cobra"
1112
)
1213

@@ -41,7 +42,7 @@ func init() {
4142
}
4243

4344
func initConfig() {
44-
// Set up structured logger
45+
// Set up structured logger. Level comes from --log-level only.
4546
var level slog.Level
4647
switch logLevel {
4748
case "debug":
@@ -53,32 +54,51 @@ func initConfig() {
5354
default:
5455
level = slog.LevelInfo
5556
}
56-
// Log format: --log-format wins, else DOCSIQ_LOG_FORMAT, else "text".
57+
58+
// Format resolution order (highest wins):
59+
// 1. --log-format flag
60+
// 2. DOCSIQ_LOG_FORMAT env var
61+
// 3. config file log.format
62+
// 4. default "text"
63+
// (3) requires config.Load() to have run; install a temporary
64+
// handler first so config-load errors land somewhere, then
65+
// upgrade once the final format is known.
5766
format := strings.ToLower(strings.TrimSpace(logFormat))
5867
if format == "" {
5968
format = strings.ToLower(strings.TrimSpace(os.Getenv("DOCSIQ_LOG_FORMAT")))
6069
}
61-
handlerOpts := &slog.HandlerOptions{Level: level}
62-
var handler slog.Handler
63-
switch format {
64-
case "json":
65-
handler = slog.NewJSONHandler(os.Stderr, handlerOpts)
66-
default:
67-
handler = slog.NewTextHandler(os.Stderr, handlerOpts)
68-
}
69-
slog.SetDefault(slog.New(handler))
70+
71+
slog.SetDefault(slog.New(buildLogHandler(level, format)))
7072

7173
var err error
7274
cfg, err = config.Load(cfgFile)
7375
if err != nil {
7476
slog.Error("❌ config error", "err", err)
7577
os.Exit(1)
7678
}
77-
if err := os.MkdirAll(cfg.DataDir, 0755); err != nil {
79+
if err := os.MkdirAll(cfg.DataDir, 0o755); err != nil {
7880
slog.Error("❌ failed to create data directory", "path", cfg.DataDir, "err", err)
7981
os.Exit(1)
8082
}
81-
}
82-
8383

84+
// If neither flag nor env specified format, use the value from
85+
// the loaded config (falls back to default "text").
86+
if format == "" && cfg.Log.Format != "" {
87+
format = strings.ToLower(strings.TrimSpace(cfg.Log.Format))
88+
slog.SetDefault(slog.New(buildLogHandler(level, format)))
89+
}
90+
}
8491

92+
// buildLogHandler assembles the slog handler chain. For "json" format
93+
// we wrap the JSON handler in obs.NewProductionHandler to strip emoji
94+
// prefixes from the message field (keeping them is harmless but noisy
95+
// for log aggregators). "text" keeps emoji for human readability.
96+
func buildLogHandler(level slog.Level, format string) slog.Handler {
97+
opts := &slog.HandlerOptions{Level: level}
98+
switch format {
99+
case "json":
100+
return obs.NewProductionHandler(slog.NewJSONHandler(os.Stderr, opts))
101+
default:
102+
return slog.NewTextHandler(os.Stderr, opts)
103+
}
104+
}

cmd/serve.go

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,11 @@ import (
1515
"time"
1616

1717
"github.com/RandomCodeSpace/docsiq/internal/api"
18+
"github.com/RandomCodeSpace/docsiq/internal/buildinfo"
1819
"github.com/RandomCodeSpace/docsiq/internal/config"
1920
"github.com/RandomCodeSpace/docsiq/internal/embedder"
2021
"github.com/RandomCodeSpace/docsiq/internal/llm"
22+
"github.com/RandomCodeSpace/docsiq/internal/obs"
2123
"github.com/RandomCodeSpace/docsiq/internal/project"
2224
"github.com/RandomCodeSpace/docsiq/internal/sqlitevec"
2325
"github.com/RandomCodeSpace/docsiq/internal/vectorindex"
@@ -154,6 +156,19 @@ var serveCmd = &cobra.Command{
154156
}
155157
pool := workq.New(workq.Config{Workers: workers, QueueDepth: depth})
156158

159+
// Observability — initialise the Prometheus registry once per
160+
// process and bind the workq stats provider so the /metrics
161+
// scrape can read live queue depth + rejection count.
162+
obs.Init()
163+
obs.Workq.BindStatsProvider(func() obs.WorkqStats {
164+
s := pool.Stats()
165+
return obs.WorkqStats{Depth: s.Depth, Rejected: s.Rejected}
166+
})
167+
{
168+
info := buildinfo.Resolve(false)
169+
api.SetBuildInfo(info.Version, info.Commit)
170+
}
171+
157172
router := api.NewRouter(prov, emb, cfg, registry,
158173
api.WithProjectStores(stores),
159174
api.WithVectorIndexes(vecIndexes),

cmd/version.go

Lines changed: 4 additions & 106 deletions
Original file line numberDiff line numberDiff line change
@@ -2,124 +2,22 @@ package cmd
22

33
import (
44
"fmt"
5-
"runtime/debug"
65

6+
"github.com/RandomCodeSpace/docsiq/internal/buildinfo"
77
"github.com/spf13/cobra"
88
)
99

10-
// Set via -ldflags at build time (see Makefile). These act as overrides.
11-
// When the binary is installed via `go install <module>@<version>`, go install
12-
// cannot pass -ldflags, so these remain at their sentinel defaults and we fall
13-
// back to runtime/debug.ReadBuildInfo() to populate version info from the VCS
14-
// data that `go install` embeds automatically.
15-
var (
16-
Version = "dev"
17-
Commit = "unknown"
18-
Date = "unknown"
19-
)
20-
21-
// VersionInfo holds resolved version metadata for the running binary.
22-
type VersionInfo struct {
23-
Version string
24-
Commit string
25-
Date string
26-
Dirty string // "true", "false", or "unknown"
27-
}
28-
29-
// isSentinel reports whether an ldflags variable is empty or equal to a
30-
// known default placeholder, meaning we should defer to ReadBuildInfo.
31-
func isSentinel(v string) bool {
32-
switch v {
33-
case "", "dev", "unknown":
34-
return true
35-
}
36-
return false
37-
}
38-
39-
// readBuildInfo is a package-level indirection so tests can substitute a
40-
// stub when needed. It mirrors the signature of debug.ReadBuildInfo.
41-
var readBuildInfo = debug.ReadBuildInfo
42-
43-
// versionInfo resolves the current version metadata using the following order:
44-
// 1. -ldflags overrides (if non-sentinel)
45-
// 2. runtime/debug.ReadBuildInfo() (module version + VCS settings)
46-
// 3. "unknown" for any remaining field
47-
func versionInfo() VersionInfo {
48-
vi := VersionInfo{
49-
Version: Version,
50-
Commit: Commit,
51-
Date: Date,
52-
Dirty: "unknown",
53-
}
54-
55-
info, ok := readBuildInfo()
56-
if !ok {
57-
if isSentinel(vi.Version) {
58-
vi.Version = "unknown"
59-
}
60-
if isSentinel(vi.Commit) {
61-
vi.Commit = "unknown"
62-
}
63-
if isSentinel(vi.Date) {
64-
vi.Date = "unknown"
65-
}
66-
return vi
67-
}
68-
69-
// Version: fall back to module version (e.g. "v0.5.0" or "(devel)").
70-
if isSentinel(vi.Version) {
71-
if info.Main.Version != "" {
72-
vi.Version = info.Main.Version
73-
} else {
74-
vi.Version = "unknown"
75-
}
76-
}
77-
78-
// Walk VCS settings for commit/time/modified.
79-
var vcsRev, vcsTime, vcsMod string
80-
for _, s := range info.Settings {
81-
switch s.Key {
82-
case "vcs.revision":
83-
vcsRev = s.Value
84-
case "vcs.time":
85-
vcsTime = s.Value
86-
case "vcs.modified":
87-
vcsMod = s.Value
88-
}
89-
}
90-
91-
if isSentinel(vi.Commit) {
92-
if vcsRev != "" {
93-
vi.Commit = vcsRev
94-
} else {
95-
vi.Commit = "unknown"
96-
}
97-
}
98-
if isSentinel(vi.Date) {
99-
if vcsTime != "" {
100-
vi.Date = vcsTime
101-
} else {
102-
vi.Date = "unknown"
103-
}
104-
}
105-
if vcsMod != "" {
106-
vi.Dirty = vcsMod
107-
}
108-
109-
return vi
110-
}
111-
11210
var versionCmd = &cobra.Command{
11311
Use: "version",
11412
Short: "Print the version of docsiq",
11513
Run: func(cmd *cobra.Command, args []string) {
116-
vi := versionInfo()
14+
info := buildinfo.Resolve(false)
11715
dirtySuffix := ""
118-
if vi.Dirty == "true" {
16+
if info.Dirty == "true" {
11917
dirtySuffix = " (dirty)"
12018
}
12119
fmt.Printf("docsiq %s (commit: %s, built: %s)%s\n",
122-
vi.Version, vi.Commit, vi.Date, dirtySuffix)
20+
info.Version, info.Commit, info.BuildDate, dirtySuffix)
12321
},
12422
}
12523

0 commit comments

Comments
 (0)