Commit 6c4fd92

aksOps and claude committed
Add project CLAUDE.md with architecture, conventions, and rules
Covers: determinism rules, cross-backend parity, how to add detectors and backends, code conventions, key files, known tech debt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent a59a45f commit 6c4fd92

1 file changed

Lines changed: 96 additions & 0 deletions

CLAUDE.md

@@ -0,0 +1,96 @@
# Code Intelligence: Project Instructions

## What This Project Is

A CLI tool that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs: pure pattern matching. 72 detectors, 35 languages, 3 storage backends (NetworkX, SQLite, KuzuDB).

## Architecture

```
FileDiscovery → Parsers → Detectors → GraphBuilder (buffered) → Linkers → LayerClassifier → GraphStore (backend)
```

- **Detectors** follow the `Detector` Protocol in `detectors/base.py`: implement `name`, `supported_languages`, `detect(ctx) -> DetectorResult`
- **Backends** follow the `GraphBackend` Protocol in `graph/backend.py`: implement 16 methods. `CypherBackend` is optional for Cypher-capable backends.
- **GraphStore** is a facade delegating to a backend; never access backends directly
- **GraphBuilder** buffers all nodes and edges, then flushes nodes first and edges second (ensures cross-backend parity)
- **Linkers** run after all detectors and produce cross-file relationship edges
- **LayerClassifier** runs after linkers and sets the `layer` property on every node
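
As a rough illustration of the two contracts above, the sketch below shows a `Detector` Protocol with the three named members and a `GraphStore` facade that forwards to a backend. Everything beyond those names is an assumption: `DetectorContext`, the fields of `DetectorResult`, `InMemoryBackend`, and the two backend methods are hypothetical stand-ins, not the contents of `detectors/base.py` or `graph/backend.py`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class DetectorContext:
    """Hypothetical detector input: one file's path, language, and text."""
    file_path: str
    language: str
    content: str


@dataclass
class DetectorResult:
    """Hypothetical detector output; plain dicts stand in for the real models."""
    nodes: list[dict[str, Any]] = field(default_factory=list)
    edges: list[dict[str, Any]] = field(default_factory=list)


@runtime_checkable
class Detector(Protocol):
    """The three members the Architecture notes name for a detector."""
    name: str
    supported_languages: tuple[str, ...]

    def detect(self, ctx: DetectorContext) -> DetectorResult: ...


class InMemoryBackend:
    """Stand-in backend with two methods instead of the real 16."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, Any]] = {}

    def add_node(self, node_id: str, props: dict[str, Any]) -> None:
        self._nodes[node_id] = props

    def node_count(self) -> int:
        return len(self._nodes)


class GraphStore:
    """Facade: callers talk to the store, which forwards to the backend."""

    def __init__(self, backend: InMemoryBackend) -> None:
        self._backend = backend

    def add_node(self, node_id: str, props: dict[str, Any]) -> None:
        self._backend.add_node(node_id, props)

    def node_count(self) -> int:
        return self._backend.node_count()
```

Because the Protocol is structural, a detector class does not need to inherit from `Detector`; it only needs the three members.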

## Critical Rules

### Determinism is Non-Negotiable
- Same input MUST produce same output, every time, on every backend
- No set iteration without `sorted()` first
- No dependency on thread completion order (builder uses indexed result slots)
- All detectors must be stateless and behave as pure functions: no class-level mutable state
- Benchmark after every change: run 2+ times, assert identical node/edge counts
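
Two of these rules can be sketched in a few lines: sorting a set before iterating, and writing thread results into slots indexed by submission order. This is a minimal sketch under assumptions; `stable_items` and `run_in_slots` are hypothetical helper names, and the real builder's slot mechanism may differ.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any


def stable_items(values: set[str]) -> list[str]:
    """Return set members in sorted order so iteration never depends on hashing."""
    return sorted(values)


def run_in_slots(tasks: Sequence[Callable[[], Any]], max_workers: int = 4) -> list[Any]:
    """Run tasks concurrently, storing each result at its submission index.

    Futures may finish in any order; the output order is fixed by the
    position of the task in `tasks`, not by completion time.
    """
    results: list[Any] = [None] * len(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_index = {pool.submit(task): index for index, task in enumerate(tasks)}
        for future in as_completed(future_to_index):
            results[future_to_index[future]] = future.result()
    return results
```

Running `run_in_slots` twice on the same task list yields the same list both times, which is the property the benchmark rule asks you to assert.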

### Cross-Backend Data Parity
- All 3 backends (NetworkX, SQLite, KuzuDB) must produce identical node and edge counts
- Edges are only added if both source and target nodes exist
- Test parity after any change to builder, store, or backends
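
The interaction between buffering and the edge rule can be sketched as follows. This is an illustrative sketch, not the code in `graph/builder.py`: `BufferedBuilder`, `RecordingBackend`, and their method names are assumptions. Nodes are flushed first in sorted order, then edges, and an edge is dropped when either endpoint was never added.

```python
from __future__ import annotations

from typing import Any


class RecordingBackend:
    """Hypothetical backend that just records what it receives."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}
        self.edges: list[tuple[str, str, str]] = []

    def add_node(self, node_id: str, props: dict[str, Any]) -> None:
        self.nodes[node_id] = props

    def add_edge(self, source: str, target: str, kind: str) -> None:
        self.edges.append((source, target, kind))


class BufferedBuilder:
    """Buffers everything, then flushes nodes before edges."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, Any]] = {}
        self._edges: list[tuple[str, str, str]] = []

    def add_node(self, node_id: str, **props: Any) -> None:
        self._nodes[node_id] = props

    def add_edge(self, source: str, target: str, kind: str) -> None:
        self._edges.append((source, target, kind))

    def flush(self, backend: RecordingBackend) -> None:
        # Nodes first, in sorted order, so every backend sees the same sequence.
        for node_id in sorted(self._nodes):
            backend.add_node(node_id, self._nodes[node_id])
        # Edges second; skip any edge with a missing endpoint.
        for source, target, kind in sorted(self._edges):
            if source in self._nodes and target in self._nodes:
                backend.add_edge(source, target, kind)
```

Because the dangling-edge filter runs in the builder rather than in each backend, every backend receives the same node and edge sequence.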

### Adding a New Detector
1. Create file in `detectors/<category>/my_detector.py`
2. Implement `Detector` protocol (name, supported_languages, detect method)
3. Add to the hardcoded list in `detectors/registry.py` (will be auto-discovered after tech debt cleanup)
4. Create test in `tests/detectors/<category>/test_my_detector.py`
5. Include a determinism test (run twice, assert identical output)
6. Run `pytest tests/ -x -q`; all tests must pass
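
Steps 2, 4, and 5 might look roughly like the sketch below: a regex-based detector plus positive, negative, and determinism tests. The detector, its regex, the `Ctx` and `Result` types, the `flask` prefix, and the `endpoint` kind are all hypothetical; the real models live in `detectors/base.py` and `models/graph.py`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ctx:
    """Hypothetical stand-in for the detector context."""
    file_path: str
    language: str
    content: str


@dataclass
class Result:
    """Hypothetical stand-in for DetectorResult."""
    nodes: list[dict[str, str]] = field(default_factory=list)


class FlaskRouteDetector:
    """Illustrative detector: finds `@<name>.route("<path>")` decorators."""

    name = "flask_routes"
    supported_languages = ("python",)
    # A compiled pattern is immutable, so it is not class-level mutable state.
    _ROUTE = re.compile(r"""@\w+\.route\(\s*["']([^"']+)["']""")

    def detect(self, ctx: Ctx) -> Result:
        # Collect into a set to dedupe, then sort before emitting.
        paths = sorted({m.group(1) for m in self._ROUTE.finditer(ctx.content)})
        nodes = [
            {"id": f"flask:{ctx.file_path}:endpoint:{path}", "kind": "endpoint"}
            for path in paths
        ]
        return Result(nodes=nodes)


SAMPLE = '@app.route("/users")\ndef users(): ...\n\n@app.route("/health")\ndef health(): ...\n'


def test_positive_match() -> None:
    result = FlaskRouteDetector().detect(Ctx("app.py", "python", SAMPLE))
    assert [n["id"] for n in result.nodes] == [
        "flask:app.py:endpoint:/health",
        "flask:app.py:endpoint:/users",
    ]


def test_negative_match() -> None:
    result = FlaskRouteDetector().detect(Ctx("app.py", "python", "x = 1\n"))
    assert result.nodes == []


def test_determinism() -> None:
    ctx = Ctx("app.py", "python", SAMPLE)
    assert FlaskRouteDetector().detect(ctx) == FlaskRouteDetector().detect(ctx)
```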

### Adding a New Backend
1. Create file in `graph/backends/my_backend.py`
2. Implement `GraphBackend` protocol (16 methods)
3. Optionally implement `CypherBackend` for Cypher support
4. Add to factory in `graph/backends/__init__.py`
5. Add to `GraphConfig` backend choices in `config.py`
6. Test parity: same nodes/edges as NetworkX on the same input
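
The parity check in step 6 could be expressed as a small helper that compares each backend's counts against the NetworkX reference. This is a sketch under assumptions: `node_count()` and `edge_count()` are hypothetical method names (the 16 `GraphBackend` methods are not listed here), and `FakeBackend` only stands in for backends already loaded from the same input.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeBackend:
    """Hypothetical backend exposing only the counts needed for parity."""
    nodes: int
    edges: int

    def node_count(self) -> int:
        return self.nodes

    def edge_count(self) -> int:
        return self.edges


def graph_counts(backend: FakeBackend) -> tuple[int, int]:
    """Return (node count, edge count) for one backend."""
    return backend.node_count(), backend.edge_count()


def find_parity_mismatches(
    backends: dict[str, FakeBackend], reference: str = "networkx"
) -> dict[str, tuple[int, int]]:
    """Return the backends whose counts differ from the reference backend."""
    expected = graph_counts(backends[reference])
    return {
        name: graph_counts(backend)
        for name, backend in sorted(backends.items())
        if graph_counts(backend) != expected
    }
```

An empty result means the new backend matches NetworkX on that input; a non-empty one names the offending backend and its counts.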

## Code Conventions

- Python 3.11+, `from __future__ import annotations`
- Pydantic for data models, typer for CLI, rich for output
- Regex-based detection (no tree-sitter dependency for new detectors unless needed)
- `NodeKind` and `EdgeKind` enums live in `models/graph.py`; add new values there
- ID format: `"{prefix}:{filepath}:{type}:{identifier}"` for cross-file uniqueness
- Properties dict for detector-specific metadata (`auth_type`, `framework`, `roles`, etc.)
- `layer` property on every node: `frontend | backend | infra | shared | unknown`
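
The ID format and the closed set of `layer` values can be sketched as two small helpers. `make_node_id` and `normalize_layer` are hypothetical names, and falling back to `unknown` for unrecognized values is an assumption rather than documented behavior.

```python
from __future__ import annotations

from typing import Literal, Optional, get_args

Layer = Literal["frontend", "backend", "infra", "shared", "unknown"]
LAYERS = frozenset(get_args(Layer))


def make_node_id(prefix: str, filepath: str, node_type: str, identifier: str) -> str:
    """Build an ID in the "{prefix}:{filepath}:{type}:{identifier}" format."""
    return f"{prefix}:{filepath}:{node_type}:{identifier}"


def normalize_layer(value: Optional[str]) -> str:
    """Map anything outside the five allowed layers to "unknown" (assumed fallback)."""
    return value if value in LAYERS else "unknown"
```

Including the file path in the ID is what keeps two identically named symbols in different files from colliding.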

## Testing

- `pytest tests/ -x -q` must always pass (currently 361 tests)
- Every detector needs: positive match test, negative match test, determinism test
- Benchmark on spring-boot (10K files) for performance regression checks
- Cross-backend parity test on contoso-real-estate for data quality

## Key Files

| File | Purpose |
|------|---------|
| `detectors/base.py` | Detector protocol (42 lines) |
| `graph/backend.py` | GraphBackend + CypherBackend protocols |
| `graph/store.py` | GraphStore facade |
| `graph/builder.py` | GraphBuilder with buffered flush + linkers |
| `graph/backends/networkx.py` | Default in-memory backend |
| `graph/backends/kuzu.py` | KuzuDB embedded graph DB with Cypher |
| `graph/backends/sqlite_backend.py` | SQLite file-based backend |
| `classifiers/layer_classifier.py` | Deterministic layer classification |
| `models/graph.py` | NodeKind, EdgeKind, GraphNode, GraphEdge |
| `config.py` | Config with GraphConfig for backend selection |
| `analyzer.py` | Pipeline orchestrator |
| `cli.py` | CLI commands (analyze, graph, query, find, cypher, bundle, cache, plugins) |

## Known Tech Debt (Phase 2)

- Registry has a 75-entry hardcoded detector list; needs auto-discovery
- `imports_detector.py` is 723 lines; needs splitting per language
- 60+ detectors have no tests; need coverage
- `_parse_structured()` has an 11-branch elif chain; needs a dispatch table
- Linker protocol uses the `_new_module_nodes` private attribute hack; needs `LinkResult`
- Missing extensions: `.html`, `.css`, `.mjs`, `.cjs`, `.jsonc`, `.groovy`, `.pyi`
- No extensionless file support (Dockerfile, Makefile, go.mod)
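
Two of these items share a fix: a table lookup in place of branching. The sketch below shows one possible shape, with a suffix-to-parser dispatch table and a filename map consulted before the extension map. It is not the current `_parse_structured()`; the parsers, the `.properties` format, and the language labels are hypothetical.

```python
from __future__ import annotations

import json
from collections.abc import Callable
from pathlib import PurePosixPath
from typing import Any, Optional


def _parse_properties(text: str) -> dict[str, str]:
    """Tiny key=value parser, included only to give the table a second entry."""
    pairs = (
        line.split("=", 1)
        for line in text.splitlines()
        if "=" in line and not line.lstrip().startswith("#")
    )
    return {key.strip(): value.strip() for key, value in pairs}


# Dispatch table: one entry per format instead of one elif branch per format.
_PARSERS: dict[str, Callable[[str], Any]] = {
    ".json": json.loads,
    ".properties": _parse_properties,
}

# Whole-filename matches cover files that have no usable extension.
_FILENAME_LANGUAGES = {"Dockerfile": "dockerfile", "Makefile": "makefile", "go.mod": "gomod"}
_EXTENSION_LANGUAGES = {".py": "python", ".pyi": "python", ".mjs": "javascript", ".cjs": "javascript"}


def parse_structured(path: str, text: str) -> Any:
    """Look up a parser by suffix; return None when no parser is registered."""
    parser = _PARSERS.get(PurePosixPath(path).suffix.lower())
    return parser(text) if parser is not None else None


def language_for(path: str) -> Optional[str]:
    """Resolve a language by exact filename first, then by extension."""
    p = PurePosixPath(path)
    return _FILENAME_LANGUAGES.get(p.name) or _EXTENSION_LANGUAGES.get(p.suffix.lower())
```

Checking the filename before the suffix matters for `go.mod`, whose suffix `.mod` would otherwise be treated as an ordinary extension.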

## Updating This File

After significant changes (new detectors, new backends, architectural decisions, conventions learned), update this CLAUDE.md to reflect the current state. Keep it concise and actionable.
