Problem
OnDiskGraphIndex is immutable. Today, to delete nodes or perform compaction, clients must:
- Load each on-disk graph into an in-memory
OnHeapGraphIndex.
- Apply deletions.
- Rewrite a new on-disk graph from the fully materialized heap index.
For large graphs (millions of nodes), this approach requires loading the full graph structure, which frequently leads to OOM and does not scale.
Proposal
Introduce an OnDiskGraphIndexCompactor that performs streaming N:1 compaction over multiple on-disk indexes without loading them into memory.
The compactor provides:
-
Delete Tracking
Track live/dead nodes using an in-memory FixedBitSet, allowing deletion without rewriting the index or constructing an OnHeapGraphIndex.
-
Streaming N:1 Compaction that performs streaming compaction
Write a new OnDiskGraphIndex via a streaming writer (no full materialization of the graph). Because the existing writer does not support streaming individual nodes, this proposal includes implementing a new CompactorWriter.
-
Custom Ordinal Remapping
Allow clients to pass a custom OridnalMapper that remaps ordinals from each source index to ordinals in the compacted output index.
Compactor Internal State
List<ImmutableGraphIndex> sources; // Source graph indexes to compact
Map<ImmutableGraphIndex, FixedBitSet> livenodes; // Tracks which ordinals are live
Map<ImmutableGraphIndex, OrdinalMapper> remappers; // Maps old ordinals -> new ordinals
API Usage
// Create a compactor from multiple immutable graph indexes
OnDiskGraphIndexCompactor compactor =
new OnDiskGraphIndexCompactor(List.of(source1, source2, source3));
// Mark which nodes are still live in each source
compactor.setLiveNodes(source1, liveNodes1); // FixedBitSet
compactor.setLiveNodes(source2, liveNodes2);
// Supply remapping functions for each source
compactor.setRemapper(source1, remapper1);
compactor.setRemapper(source2, remapper2);
// Execute streaming compaction
OnDiskGraphIndex result = compactor.compact(outputPath, numNeighbors);
Implementation of Compaction
-
Traverse each node in each index
Iterate through all ordinals across all sources and skip deleted ones.
-
Perform search across all sources
For each node, run a multi-source search to retrieve top-K candidates from all indexes.
-
Apply pruning
Apply standard HNSW pruning rules to maintain neighbor limits and graph quality.
-
Apply ordinal remapping
Convert both node ordinal and neighbor ordinals using the per-source remappers.
-
Stream the node to I/O
Use the new CompactorWriter to write each node sequentially to disk.
Problem
OnDiskGraphIndexis immutable. Today, to delete nodes or perform compaction, clients must:OnHeapGraphIndex.For large graphs (millions of nodes), this approach requires loading the full graph structure, which frequently leads to OOM and does not scale.
Proposal
Introduce an
OnDiskGraphIndexCompactorthat performs streaming N:1 compaction over multiple on-disk indexes without loading them into memory.The compactor provides:
Delete Tracking
Track live/dead nodes using an in-memory
FixedBitSet, allowing deletion without rewriting the index or constructing anOnHeapGraphIndex.Streaming N:1 Compaction that performs streaming compaction
Write a new
OnDiskGraphIndexvia a streaming writer (no full materialization of the graph). Because the existing writer does not support streaming individual nodes, this proposal includes implementing a newCompactorWriter.Custom Ordinal Remapping
Allow clients to pass a custom
OridnalMapperthat remaps ordinals from each source index to ordinals in the compacted output index.Compactor Internal State
API Usage
Implementation of Compaction
Traverse each node in each index
Iterate through all ordinals across all sources and skip deleted ones.
Perform search across all sources
For each node, run a multi-source search to retrieve top-K candidates from all indexes.
Apply pruning
Apply standard HNSW pruning rules to maintain neighbor limits and graph quality.
Apply ordinal remapping
Convert both node ordinal and neighbor ordinals using the per-source remappers.
Stream the node to I/O
Use the new
CompactorWriterto write each node sequentially to disk.