
Commit 71187a2

aksOps and claude committed
Fix non-determinism bug: ensure 100% consistent results across runs
Root cause: as_completed() returns futures in non-deterministic order, so files were processed in a varying order across runs. This affected ModuleContainmentLinker, which depends on node insertion order.

Fixes:
- analyzer.py: use indexed result slots to preserve file ordering regardless of thread completion order
- builder.py: sort set iteration in TopicLinker for deterministic edges
- store.py: return sorted neighbors instead of a non-deterministic set

Verified: terraform-provider-azurerm (42,850 files) now produces 312,232 nodes and 448,540 edges consistently across 3 runs (previously varying by 3-4 nodes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 08c6a17 commit 71187a2

3 files changed

Lines changed: 12 additions & 9 deletions


src/code_intelligence/analyzer.py

Lines changed: 9 additions & 6 deletions
@@ -327,25 +327,28 @@ def _report(msg: str) -> None:
                 for f in files_to_analyze
             ]
         else:
-            results = []
             max_workers = min(parallelism, len(files_to_analyze))
+            # Use a list aligned with files_to_analyze to preserve
+            # deterministic ordering regardless of thread completion order.
+            result_slots: list[tuple[DiscoveredFile, DetectorResult] | None] = [None] * len(files_to_analyze)
             with ThreadPoolExecutor(max_workers=max_workers) as executor:
                 futures = {
                     executor.submit(
                         _analyze_file, f, repo_path, self._registry, pm
-                    ): f
-                    for f in files_to_analyze
+                    ): idx
+                    for idx, f in enumerate(files_to_analyze)
                 }
                 for future in as_completed(futures):
+                    idx = futures[future]
                     try:
-                        results.append(future.result())
+                        result_slots[idx] = future.result()
                     except Exception:
-                        src_file = futures[future]
                         logger.warning(
                             "Analysis failed for %s",
-                            src_file.path,
+                            files_to_analyze[idx].path,
                             exc_info=True,
                         )
+            results = [r for r in result_slots if r is not None]

         # ----------------------------------------------------------
         # 5. Aggregate results into graph builder

src/code_intelligence/graph/builder.py

Lines changed: 2 additions & 2 deletions
@@ -59,8 +59,8 @@ def link(self, store: GraphStore) -> list[GraphEdge]:
             producers.update(producers_by_topic.get(tid, []))
             consumers.update(consumers_by_topic.get(tid, []))

-        for prod in producers:
-            for cons in consumers:
+        for prod in sorted(producers):
+            for cons in sorted(consumers):
                 if prod != cons:
                     edges.append(
                         GraphEdge(
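The TopicLinker change works because iterating a Python set directly yields hash order, which for strings varies between interpreter runs (hash randomization, PYTHONHASHSEED), so the edge list came out in shifting order; `sorted()` makes edge generation canonical. A minimal sketch of the loop with hypothetical service names (not from the repo):

```python
# Hypothetical producer/consumer sets for one topic; the real names
# come from the repo's producers_by_topic / consumers_by_topic maps.
producers = {"orders-svc", "billing-svc"}
consumers = {"audit-svc", "orders-svc"}

edges = []
# sorted() pins the iteration order; plain set iteration would
# depend on string hash randomization and vary across runs.
for prod in sorted(producers):
    for cons in sorted(consumers):
        if prod != cons:  # skip self-edges, as in the original loop
            edges.append((prod, cons))

print(edges)
# [('billing-svc', 'audit-svc'), ('billing-svc', 'orders-svc'),
#  ('orders-svc', 'audit-svc')]
```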

src/code_intelligence/graph/store.py

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ def neighbors(
         for source, _, data in self._g.in_edges(node_id, data=True):
             if edge_kinds is None or EdgeKind(data.get("kind", "")) in edge_kinds:
                 result.add(source)
-        return list(result)
+        return sorted(result)

     def subgraph(self, node_ids: set[str]) -> GraphStore:
         """Create a new GraphStore containing only the specified nodes and edges between them."""
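The store.py change is the same principle at the API boundary: `list(result)` exposes set hash order to callers, `sorted(result)` returns a canonical list. A tiny illustration with hypothetical node IDs (independent of the repo's GraphStore):

```python
# A neighbor set as GraphStore.neighbors might accumulate it.
result = {"node-42", "node-7", "node-13"}

# list(result) reflects hash order, which can differ between Python
# processes; sorted(result) is stable everywhere.
print(list(result))    # order may vary across interpreter runs
print(sorted(result))  # always ['node-13', 'node-42', 'node-7']
```

Note the sort is lexicographic, so `'node-13'` precedes `'node-7'`; that is fine here because the goal is any fixed, reproducible order, not a numeric one.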
