
# Performance and Tuning

This document collects resource sizing, tuning knobs, and benchmark workflows, keeping them out of the root README.

## Resource Expectations

The numbers below are local measurements from this repository as of 2026-03-29, unless labeled as estimates.

### Disk

Measured local asset sizes:

- daemon binary: 7.7M
- bundled Nomic model directory: 523M
- bundled MiniLM fallback model directory: 87M
- optional T5 summarizer directory: 371M
- unpacked ONNX Runtime directory on macOS arm64: 44M
- ONNX Runtime archive download on macOS arm64: 9.5M

Vector payload lower bounds:

- MiniLM 384d: 384 * 4 = 1536 bytes per vector
- Nomic 768d: 768 * 4 = 3072 bytes per vector

Estimated lower-bound vector payload for 10,000 stored turns:

- MiniLM: about 15.4 MB
- Nomic: about 30.7 MB

Actual on-disk LibraVDB usage is higher because text, metadata, collection structure, and index state are stored as well.
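The lower-bound arithmetic above can be reproduced with a short script (float32 vectors, so 4 bytes per dimension; MB here means 1,000,000 bytes):

```python
def payload_bytes(dimensions: int, vectors: int) -> int:
    """Lower-bound vector payload: one 4-byte float32 per dimension."""
    return dimensions * 4 * vectors

# Per-vector lower bounds.
print(payload_bytes(384, 1))  # MiniLM: 1536 bytes
print(payload_bytes(768, 1))  # Nomic: 3072 bytes

# 10,000 stored turns, reported in MB.
print(payload_bytes(384, 10_000) / 1e6)  # MiniLM: 15.36 MB
print(payload_bytes(768, 10_000) / 1e6)  # Nomic: 30.72 MB
```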

### Memory

Measured on Apple M2 by starting the daemon and reading RSS after startup:

- Nomic embedding path loaded without optional T5 summarizer: about 266 MB
- Nomic plus local ONNX T5 summarizer loaded: about 503 MB

Not yet bench-measured in this repo:

- RSS during active inference
- peak RSS during compaction of large clusters

### CPU

Measured from the current Go benchmark harness on Apple M2:

- MiniLM bundled query embedding: about 22.6 ms/op
- MiniLM onnx-local query embedding: about 16.3 ms/op
- Nomic onnx-local query embedding: about 43.7 ms/op

Measured from a one-off 40-query timing sample on Apple M2:

- Nomic query embedding p50: about 18.61 ms
- Nomic query embedding p95: about 24.19 ms
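For reference, p50/p95 figures like those above can be derived from raw timing samples with a nearest-rank percentile. The latencies below are synthetic placeholders, not the actual 40-query measurements:

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile over a list of timing samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Synthetic latencies in ms (NOT the measurements quoted above).
latencies = [float(ms) for ms in range(1, 41)]  # 1.0 .. 40.0

print(percentile(latencies, 50))  # 20.0
print(percentile(latencies, 95))  # 38.0
```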

Measured from a one-off synthetic 50-turn compaction run with the current extractive summarizer and Nomic embeddings:

- 50-turn extractive compaction wall time: about 3175 ms

Not yet bench-measured:

- equivalent Linux x64 embedding latency on a reference machine
- 50-turn compaction wall time through the optional ONNX T5 abstractive path

## Runtime Tuning Fields

Prefer the defaults unless you are measuring a specific problem. These fields are advanced controls, not required install settings.

| Field | Effect |
| --- | --- |
| `topK` | Search result budget before prompt fitting. |
| `alpha`, `beta`, `gamma` | Hybrid scoring weights for similarity, scope, and recency-style signals. |
| `ingestionGateThreshold` | Durable-memory promotion threshold. Default: 0.35. |
| `gatingWeights` | Domain-adaptive admission weights for conversational and technical memory. |
| `gatingTechNorm` | Normalization control for the technical-content gate. |
| `gatingCentroidK` | Number of centroid candidates used by the gate. |
| `compactionQualityWeight` | How much summary confidence affects retrieval score. Default: 0.5. |
| `recencyLambdaSession` | Session-memory recency decay. |
| `recencyLambdaUser` | Durable user-memory recency decay. |
| `recencyLambdaGlobal` | Global-memory recency decay. |
| `tokenBudgetFraction` | Fraction of the host context budget available to memory assembly. |
| `compactThreshold` | Explicit compaction trigger threshold. |
| `compactionThresholdFraction` | Dynamic trigger ratio used when `compactThreshold` is unset. Default: 0.8. |
| `compactSessionTokenBudget` | Auto-compaction budget since the last compaction. Default: 2000; set 0 to disable. |
| `rpcTimeoutMs` | Sidecar RPC timeout. Default: 30000. |
| `maxRetries` | Retry budget for sidecar RPC calls. |
| `logLevel` | Plugin log level. |
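The interplay between `compactThreshold` and `compactionThresholdFraction` can be sketched as follows. The function name is illustrative, and the assumption that the fraction applies to the session token budget is ours, not taken from the implementation:

```python
from typing import Optional

def should_compact(session_tokens: int,
                   token_budget: int,
                   compact_threshold: Optional[int] = None,
                   threshold_fraction: float = 0.8) -> bool:
    """Decide whether to trigger compaction.

    Assumption: when no explicit compactThreshold is configured, the
    trigger point is threshold_fraction (default 0.8) of token_budget.
    """
    if compact_threshold is not None:
        # An explicit threshold takes precedence over the dynamic ratio.
        return session_tokens >= compact_threshold
    return session_tokens >= threshold_fraction * token_budget

# With a 2000-token budget and no explicit threshold, compaction
# triggers at 1600 tokens (0.8 * 2000).
```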

Model-related fields live in Embedding profiles and Models.

## LongMemEval Harness

The repository includes a local LongMemEval harness that runs the dataset through the plugin layer and checks whether the assembled prompt still contains the evidence turns.

The benchmark runner is committed, but the dataset and generated reports are not. Keep downloaded data and local outputs under benchmarks/longmemeval/, which is ignored by default.

Run it with:

```shell
LONGMEMEVAL_DATA_FILE=/path/to/longmemeval_oracle.json pnpm run benchmark:longmemeval
```

If you already have a daemon running and do not want the benchmark to spawn another one, set:

```shell
LONGMEMEVAL_USE_EXISTING_DAEMON=1 \
LONGMEMEVAL_SIDECAR_PATH=unix:/path/to/libravdb.sock \
pnpm run benchmark:longmemeval
```

Optional controls:

- `LONGMEMEVAL_LIMIT` caps the number of questions
- `LONGMEMEVAL_TOPK` changes the search budget
- `LONGMEMEVAL_OUT_FILE` writes JSONL records for analysis

The harness writes JSONL incrementally, so partial results survive if a transient daemon failure interrupts a long run. If the local test daemon drops mid-run, the benchmark restarts it and retries the current instance once before recording an error result.
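Because the output is appended one JSON object per line, a partial run can be summarized directly. A minimal sketch, where the `question_id` and `status` field names are assumptions about the record shape rather than the harness's actual schema:

```python
import json

# Hypothetical partial output from an interrupted run; field names
# are illustrative, not the harness's actual schema.
partial_jsonl = """\
{"question_id": "q1", "status": "ok"}
{"question_id": "q2", "status": "ok"}
{"question_id": "q3", "status": "error"}
"""

# Each non-empty line is an independent JSON record.
records = [json.loads(line) for line in partial_jsonl.splitlines() if line]
errors = sum(1 for r in records if r["status"] == "error")
print(len(records), errors)  # 3 records, 1 error
```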

To score a hypothesis JSONL file with the official LongMemEval evaluator:

```shell
LONGMEMEVAL_EVAL_REPO=/path/to/LongMemEval \
LONGMEMEVAL_HYPOTHESIS_FILE=/path/to/hypotheses.jsonl \
LONGMEMEVAL_DATA_FILE=/path/to/longmemeval_oracle.json \
OPENAI_API_KEY=... \
pnpm run benchmark:longmemeval:score
```

The scorer wrapper shells out to the official Python evaluation script and then prints aggregate metrics from the generated log when available.