Commit a0d58a8

aksOps and claude committed
Rewrite CLAUDE.md for OSSCodeIQ — comprehensive project reference
Complete rewrite reflecting current state: renamed to osscodeiq, 97 detectors, 2074 tests, server architecture, MCP tools, CI/CD workflows, versioning strategy, gotchas, and all conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent e3efc6c commit a0d58a8

1 file changed

Lines changed: 127 additions & 63 deletions

CLAUDE.md

@@ -1,13 +1,23 @@
1-
# Code Intelligence — Project Instructions
1+
# OSSCodeIQ — Project Instructions
22

33
## What This Project Is
44

5-
A CLI tool that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs — pure pattern matching. 72 detectors, 35 languages, 3 storage backends (NetworkX, SQLite, KuzuDB).
5+
**OSSCodeIQ** (`osscodeiq` on PyPI) — a CLI tool + server that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs — pure pattern matching. 97 detectors, 35 languages, 3 storage backends (NetworkX, SQLite, KuzuDB), REST API + MCP server, interactive flow diagrams.
6+
7+
- **PyPI package:** `osscodeiq`
8+
- **CLI command:** `osscodeiq`
9+
- **Python package:** `osscodeiq` (under `src/osscodeiq/`)
10+
- **GitHub repo:** `RandomCodeSpace/code-iq` (repo name differs from package name)
11+
- **Cache directory on disk:** `.code-intelligence` (legacy name, kept for backward compatibility)
612

713
## Architecture
814

915
```
1016
FileDiscovery → Parsers → Detectors → GraphBuilder (buffered) → Linkers → LayerClassifier → GraphStore (backend)
17+
18+
              CodeIQService (shared facade)
19+
                 ↙              ↘
20+
    FastAPI REST (/api)    FastMCP MCP (/mcp)
1121
```
1222

1323
- **Detectors** follow the `Detector` Protocol in `detectors/base.py` — implement `name`, `supported_languages`, `detect(ctx) -> DetectorResult`
@@ -16,6 +26,8 @@ FileDiscovery → Parsers → Detectors → GraphBuilder (buffered) → Linkers
1626
- **GraphBuilder** buffers all nodes and edges, flushes nodes first then edges (ensures cross-backend parity)
1727
- **Linkers** run after all detectors, produce cross-file relationship edges
1828
- **LayerClassifier** runs after linkers, sets `layer` property on every node
29+
- **CodeIQService** wraps GraphStore + FlowEngine + GraphQuery + Analyzer — shared by REST and MCP
30+
- **Server** is a single FastAPI app: `/api` (REST), `/mcp` (MCP via fastmcp streamable HTTP), `/` (welcome UI), `/docs` (OpenAPI)
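
A minimal sketch of what a detector following this protocol could look like. `DetectorContext`, the fields of `DetectorResult`, and `TodoDetector` are hypothetical stand-ins for illustration; the real definitions live in `detectors/base.py` and may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class DetectorContext:
    """Hypothetical context: one file's path, language, and decoded text."""
    file_path: str
    language: str
    text: str


@dataclass
class DetectorResult:
    """Hypothetical result: node and edge dicts found in one file."""
    nodes: list[dict] = field(default_factory=list)
    edges: list[dict] = field(default_factory=list)


class Detector(Protocol):
    """Shape described in the text: name, supported_languages, detect(ctx)."""
    name: str
    supported_languages: tuple[str, ...]

    def detect(self, ctx: DetectorContext) -> DetectorResult: ...


class TodoDetector:
    """Illustrative regex detector; it keeps no mutable state."""
    name = "todo_comment"
    supported_languages = ("python",)
    _PATTERN = re.compile(r"#\s*TODO[:\s]+(?P<text>.+)")

    def detect(self, ctx: DetectorContext) -> DetectorResult:
        result = DetectorResult()
        for line_no, line in enumerate(ctx.text.splitlines(), start=1):
            match = self._PATTERN.search(line)
            if match:
                result.nodes.append({
                    "id": f"todo:{ctx.file_path}:comment:{line_no}",
                    "line": line_no,
                    "text": match.group("text").strip(),
                })
        return result


ctx = DetectorContext("app.py", "python", "x = 1\n# TODO: remove x\n")
first = TodoDetector().detect(ctx)
second = TodoDetector().detect(ctx)
assert first == second  # determinism: same input, same output
```

Because the detector only reads `ctx` and returns a fresh result, two runs over the same input compare equal, which is the property the determinism tests check.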
1931

2032
## Critical Rules
2133

@@ -24,111 +36,163 @@ FileDiscovery → Parsers → Detectors → GraphBuilder (buffered) → Linkers
2436
- No set iteration without `sorted()` first
2537
- No dependency on thread completion order (builder uses indexed result slots)
2638
- All detectors must be stateless pure functions — no class-level mutable state
27-
- Benchmark after every change: run 2+ times, assert identical node/edge counts
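
The determinism rules above can be sketched as follows. This is an illustration only: `scan_file`, `scan_all`, and the file names are hypothetical and not taken from the project's builder.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_file(path: str) -> list[str]:
    """Hypothetical per-file work: returns node IDs found in the file."""
    tags = {"route", "entity", "guard"}  # a set has no guaranteed order
    # Rule: never iterate a set directly; sort it first.
    return [f"{path}:{tag}" for tag in sorted(tags)]


def scan_all(paths: list[str], workers: int = 4) -> list[str]:
    """Collect results in input order regardless of thread completion order."""
    slots: list[list[str] | None] = [None] * len(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(scan_file, p): i for i, p in enumerate(paths)}
        for future in as_completed(futures):  # completion order is arbitrary
            slots[futures[future]] = future.result()  # the slot index is fixed
    return [node_id for slot in slots if slot is not None for node_id in slot]


paths = ["b.py", "a.py", "c.py"]
run_1 = scan_all(paths)
run_2 = scan_all(paths)
assert run_1 == run_2
```

Each worker writes into the slot that matches its input position, so the merged output depends only on the order of `paths`, never on which thread finishes first.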
2839

2940
### Cross-Backend Data Parity
3041
- All 3 backends (NetworkX, SQLite, KuzuDB) must produce identical node and edge counts
3142
- Edges are only added if both source and target nodes exist
3243
- Test parity after any change to builder, store, or backends
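
One way to picture why buffering helps parity: if every backend receives the same node list followed by the same filtered edge list, the counts cannot diverge. The sketch below is a simplified stand-in, not the project's `GraphBuilder`; the class and method names are assumptions.

```python
from __future__ import annotations


class BufferedBuilder:
    """Illustrative buffer: collect everything, then flush nodes before edges."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict] = {}
        self._edges: list[tuple[str, str, str]] = []

    def add_node(self, node_id: str, **props: object) -> None:
        self._nodes[node_id] = dict(props)

    def add_edge(self, source: str, target: str, kind: str) -> None:
        self._edges.append((source, target, kind))

    def flush(self) -> tuple[list[str], list[tuple[str, str, str]]]:
        """Return (nodes, edges) in a stable order, dropping dangling edges."""
        nodes = sorted(self._nodes)
        kept = [
            edge for edge in sorted(self._edges)
            if edge[0] in self._nodes and edge[1] in self._nodes
        ]
        return nodes, kept


builder = BufferedBuilder()
builder.add_edge("svc:a", "svc:b", "calls")        # buffered before its nodes exist
builder.add_edge("svc:a", "svc:missing", "calls")  # target is never added
builder.add_node("svc:a")
builder.add_node("svc:b")
nodes, edges = builder.flush()
```

Here the edge to `svc:missing` is dropped at flush time, while the edge added before its endpoints existed is kept, because the check runs only after all nodes are known.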
3344

34-
### Adding a New Detector
35-
1. Create file in `detectors/<category>/my_detector.py`
36-
2. Implement `Detector` protocol (name, supported_languages, detect method)
37-
3. Add to the hardcoded list in `detectors/registry.py` (will be auto-discovered after tech debt cleanup)
38-
4. Create test in `tests/detectors/<category>/test_my_detector.py`
39-
5. Include a determinism test (run twice, assert identical output)
40-
6. Run `pytest tests/ -x -q` — all tests must pass
45+
### Windows Compatibility
46+
- Always use `encoding="utf-8"` when reading/writing files (Windows defaults to cp1252)
47+
- This applies to templates, vendor JS, HTML output, and any file I/O in the server
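
A short sketch of the rule, using only the standard library. The helper names and the file name are hypothetical; the point is that both the write and the read name the encoding explicitly instead of relying on the platform default.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_html(path: Path, html: str) -> None:
    """Write with an explicit encoding instead of the platform default."""
    path.write_text(html, encoding="utf-8")


def read_html(path: Path) -> str:
    """Read back with the same explicit encoding."""
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "flow.html"
    content = "<p>nodes → edges</p>"  # the arrow has no cp1252 encoding
    write_html(target, content)
    round_trip = read_html(target)
```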
48+
49+
### pyproject.toml is the Single Source of Truth
50+
- All dependencies, scripts, metadata, and package config live in `pyproject.toml`
51+
- After ANY change to pyproject.toml, run `uv lock` and commit both files together
52+
- Version in pyproject.toml is `0.0.0` (placeholder) — publish/beta workflows patch it at build time
53+
- Server deps (fastapi, uvicorn, fastmcp) are core dependencies, not optional
54+
- Only `dev` (pytest) and `kuzu` remain as optional deps
4155

42-
### Adding a New Backend
43-
1. Create file in `graph/backends/my_backend.py`
44-
2. Implement `GraphBackend` protocol (16 methods)
45-
3. Optionally implement `CypherBackend` for Cypher support
46-
4. Add to factory in `graph/backends/__init__.py`
47-
5. Add to `GraphConfig` backend choices in `config.py`
48-
6. Test parity: same nodes/edges as NetworkX on the same input
56+
### GitHub References
57+
- Repo URL is `RandomCodeSpace/code-iq` — do NOT change this even though package is `osscodeiq`
58+
- SonarCloud project key: `RandomCodeSpace_code-iq`
59+
- Badge URLs, workflow URLs, and clone URLs all use `code-iq`
4960

5061
## Code Conventions
5162

5263
- Python 3.11+, `from __future__ import annotations`
5364
- Pydantic for data models, typer for CLI, rich for output
65+
- FastAPI for REST API, fastmcp for MCP server (streamable HTTP, NOT SSE)
5466
- Regex-based detection (no tree-sitter dependency for new detectors unless needed)
5567
- `NodeKind` and `EdgeKind` enums in `models/graph.py` — add new values there
5668
- ID format: `"{prefix}:{filepath}:{type}:{identifier}"` for cross-file uniqueness
5769
- Properties dict for detector-specific metadata (`auth_type`, `framework`, `roles`, etc.)
5870
- `layer` property on every node: `frontend | backend | infra | shared | unknown`
71+
- Suppress websockets deprecation warnings in serve command (upstream uvicorn issue)
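
The ID format listed above can be illustrated with a small helper. `make_node_id` and the sample paths are hypothetical; the project may build IDs differently inside each detector.

```python
from __future__ import annotations


def make_node_id(prefix: str, filepath: str, node_type: str, identifier: str) -> str:
    """Build an ID in the "{prefix}:{filepath}:{type}:{identifier}" layout."""
    return f"{prefix}:{filepath}:{node_type}:{identifier}"


# Two classes with the same name in different files get distinct IDs.
a = make_node_id("py", "src/app/users.py", "class", "User")
b = make_node_id("py", "src/app/admin.py", "class", "User")
```

Including the file path is what keeps identically named symbols from colliding across files.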
72+
73+
## CLI Commands
74+
75+
| Command | Purpose |
76+
|---------|---------|
77+
| `osscodeiq analyze [path]` | Scan codebase, build graph |
78+
| `osscodeiq graph [path]` | Export graph (json, yaml, mermaid, dot) |
79+
| `osscodeiq query [path]` | Semantic graph queries |
80+
| `osscodeiq find [what] [path]` | Preset queries (endpoints, guards, entities, etc.) |
81+
| `osscodeiq cypher [query]` | Raw Cypher (KuzuDB only) |
82+
| `osscodeiq flow [path]` | Architecture flow diagrams (mermaid, json, html) |
83+
| `osscodeiq serve [path]` | Start unified server (API + MCP) |
84+
| `osscodeiq bundle [path]` | Create distributable package |
85+
| `osscodeiq cache [action]` | Manage analysis cache |
86+
| `osscodeiq plugins [action]` | List/inspect detectors |
87+
| `osscodeiq version` | Show version info |
88+
89+
## Server Architecture
90+
91+
### Endpoints
92+
- `GET /` — Welcome page (self-contained HTML, fetches `/api/stats`)
93+
- `GET /api/stats` — Graph statistics
94+
- `GET /api/nodes`, `GET /api/edges` — Paginated queries with `?kind=&limit=&offset=`
95+
- `GET /api/nodes/{id}/neighbors` — Neighbor traversal
96+
- `GET /api/ego/{id}` — Ego subgraph
97+
- `GET /api/query/cycles`, `/shortest-path`, `/consumers/{id}`, `/producers/{id}`, `/callers/{id}`, `/dependencies/{id}`, `/dependents/{id}`
98+
- `GET /api/flow/{view}` — Flow diagrams (overview, ci, deploy, runtime, auth)
99+
- `POST /api/analyze` — Trigger analysis
100+
- `POST /api/cypher` — Raw Cypher (400 if not KuzuDB)
101+
- `GET /api/triage/component`, `/impact/{id}`, `/endpoints` — Agentic triage tools
102+
- `GET /api/search?q=` — Free-text graph search
103+
- `GET /api/file?path=` — Serve source files (path traversal protected)
104+
- `POST /mcp` — MCP endpoint (20 tools via streamable HTTP)
105+
106+
### MCP Tools (20)
107+
15 core tools (get_stats, query_nodes, query_edges, get_node_neighbors, get_ego_graph, find_cycles, find_shortest_path, find_consumers, find_producers, find_callers, find_dependencies, find_dependents, generate_flow, analyze_codebase, run_cypher) + 5 agentic triage tools (find_component_by_file, trace_impact, find_related_endpoints, search_graph, read_file).
108+
109+
### Key Server Files
110+
| File | Purpose |
111+
|------|---------|
112+
| `server/app.py` | FastAPI app assembly, mounts /api, /mcp, / |
113+
| `server/service.py` | CodeIQService — shared facade over GraphStore + FlowEngine + GraphQuery |
114+
| `server/routes.py` | REST API endpoints (uses Annotated type hints) |
115+
| `server/mcp_server.py` | FastMCP tool definitions |
116+
| `server/middleware.py` | Auth middleware stub (no-op, ready for future auth) |
117+
| `server/templates/welcome.html` | Self-contained welcome page |
59118

60119
## Testing
61120

62-
- `pytest tests/ -x -q` — must always pass (currently 565 tests)
121+
- `pytest tests/ -x -q` — must always pass (currently 2,074 tests, 86% coverage)
63122
- Every detector needs: positive match test, negative match test, determinism test
123+
- Server tests use FastAPI TestClient
124+
- MCP tools tested by calling functions directly after `set_service()`
64125
- All detectors use shared `detectors/utils.py` — decode_text, find_line_number, etc.
65-
66-
## Benchmark Requirements
67-
68-
**After every change**, run a clean benchmark on a small project to verify:
69-
1. No performance regression (time should not increase significantly)
70-
2. 100% determinism (2 runs produce identical node/edge counts)
71-
3. Coverage doesn't decrease (file/node/edge counts should not drop)
72-
73-
**Benchmark procedure:**
74-
```bash
75-
rm -rf ~/projects/testDir/contoso-real-estate/.code-intelligence/
76-
find ~/projects/testDir/contoso-real-estate -name ".code_intelligence_cache*" -delete
77-
# Run twice
78-
time code-intelligence analyze ~/projects/testDir/contoso-real-estate --full -j 8
79-
time code-intelligence analyze ~/projects/testDir/contoso-real-estate --full -j 8
80-
```
81-
82-
If `testDir/contoso-real-estate` is not available, clone an official secure project:
83-
```bash
84-
git clone --depth 1 https://github.com/Azure-Samples/contoso-real-estate.git ~/projects/testDir/contoso-real-estate
85-
```
86-
87-
**Baseline (contoso-real-estate, 488 files):** 2,313 nodes, 2,905 edges, ~3.7s
88-
- Cross-backend parity test on contoso-real-estate for data quality
126+
- KuzuDB tests require `kuzu` package (installed in CI via `pip install -e ".[dev,kuzu]"`)
89127

90128
## Key Files
91129

92130
| File | Purpose |
93131
|------|---------|
94-
| `detectors/base.py` | Detector protocol (42 lines) |
132+
| `detectors/base.py` | Detector protocol |
95133
| `graph/backend.py` | GraphBackend + CypherBackend protocols |
96134
| `graph/store.py` | GraphStore facade |
97135
| `graph/builder.py` | GraphBuilder with buffered flush + linkers |
98136
| `graph/backends/networkx.py` | Default in-memory backend |
99137
| `graph/backends/kuzu.py` | KuzuDB embedded graph DB with Cypher |
100138
| `graph/backends/sqlite_backend.py` | SQLite file-based backend |
101139
| `classifiers/layer_classifier.py` | Deterministic layer classification |
102-
| `models/graph.py` | NodeKind, EdgeKind, GraphNode, GraphEdge |
140+
| `models/graph.py` | NodeKind (31 types), EdgeKind (26 types), GraphNode, GraphEdge |
103141
| `config.py` | Config with GraphConfig for backend selection |
104142
| `analyzer.py` | Pipeline orchestrator |
105-
| `cli.py` | CLI commands (analyze, graph, query, find, cypher, bundle, cache, plugins) |
106-
107-
## Tech Debt Resolved (Phase 2 — Complete)
143+
| `cli.py` | CLI commands — constants `_GRAPH_DIR_NAME`, `_KUZU_DB_NAME`, `_SQLITE_DB_NAME` |
144+
| `flow/engine.py` | FlowEngine — generate/render flow diagrams |
145+
| `flow/renderer.py` | Mermaid, JSON, HTML renderers (vendor JS inlined for offline use) |
146+
| `flow/views.py` | 5 view builders (overview, ci, deploy, runtime, auth) |
147+
| `flow/vendor/` | Bundled Cytoscape.js + Dagre.js (no CDN — works behind firewalls) |
108148

109-
- Registry auto-discovers detectors via `pkgutil.walk_packages()` — new detector = create file, done
110-
- `imports_detector.py` split into `kotlin_structures.py`, `rust_structures.py`, `scala_structures.py` with fixed regexes
111-
- 54 new tests added for 10 previously untested detectors (415 total tests)
112-
- `_parse_structured()` uses `_STRUCTURED_PARSERS` dispatch dict
113-
- Linker protocol uses `LinkResult(nodes, edges)` dataclass — no more private attribute hack
114-
- 16 new extensions added (.html, .css, .mjs, .cjs, .jsonc, .groovy, .pyi, .razor, .cshtml, .adoc, etc.)
115-
- Extensionless files supported via `_FILENAME_MAP` (Dockerfile, Makefile, go.mod, Jenkinsfile)
116-
- Shared `detectors/utils.py` with `decode_text`, `iter_lines`, `find_line_number`, `filename`, `matches_filename`
117-
118-
## Adding a New Detector (Updated)
149+
## Adding a New Detector
119150

120151
1. Create file in `detectors/<category>/my_detector.py`
121152
2. Implement `Detector` protocol (name, supported_languages, detect method)
122-
3. **No registry changes needed** — auto-discovered by package scanning
153+
3. **No registry changes needed** — auto-discovered by `pkgutil.walk_packages()`
123154
4. Create test in `tests/detectors/<category>/test_my_detector.py`
124155
5. Include a determinism test (run twice, assert identical output)
125156
6. Run `pytest tests/ -x -q` — all tests must pass
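
For readers unfamiliar with `pkgutil.walk_packages()`, here is a minimal sketch of package scanning. It is not the project's registry code: `discover_module_names` is a hypothetical helper, and the standard-library `json` package stands in for the detectors package.

```python
from __future__ import annotations

import json  # used only as a stand-in package for the demo
import pkgutil
from types import ModuleType


def discover_module_names(package: ModuleType) -> list[str]:
    """List every module under a package in sorted (deterministic) order."""
    prefix = f"{package.__name__}."
    return sorted(
        info.name for info in pkgutil.walk_packages(package.__path__, prefix=prefix)
    )


names = discover_module_names(json)
```

A registry built this way would presumably import each name with `importlib.import_module` and collect the objects that satisfy the `Detector` protocol, which is why dropping a new file into the package is enough.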
126157

127-
## Remaining Work
128-
129-
- Phase 3: Flow generator (GitLab CI, Helm, enhanced Dockerfile, Mermaid flow command)
130-
- Phase 4: 30+ new framework detectors (Go web, EF Core, Prisma, Pydantic, etc.)
131-
- KuzuDB bulk import optimization for edge insertion
158+
## CI/CD Workflows
159+
160+
| Workflow | File | Purpose |
161+
|----------|------|---------|
162+
| CI | `ci.yml` | Run tests on Python 3.11-3.12, installs `[dev,kuzu]` |
163+
| Beta | `beta.yml` | Auto-publish beta on push to src/tests. Version: latest stable tag + incremental counter (PEP 440: `v0.1.0b0`) |
164+
| Publish | `publish.yml` | Manual trigger. Patches version from input, builds, tests on 11 OS combos + 9 containers, publishes to PyPI, creates GitHub release |
165+
| SonarCloud | `sonarcloud.yml` | Code quality + coverage analysis |
166+
| SBOM | `sbom.yml` | Dependency audit |
167+
168+
### Beta Versioning
169+
- Derives base version from latest stable git tag (e.g. `v0.1.0` → `0.1.0`)
170+
- Increments beta number from existing beta tags (not commit count)
171+
- Tags: PEP 440 format (`v0.1.0b0`, `v0.1.0b1`, ...)
172+
- Falls back to pyproject.toml version if no stable tags exist
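
The rules above can be sketched in Python to make the tag arithmetic concrete. This is an assumption-laden illustration: the actual logic lives in `beta.yml` and is not shown in this file, and `next_beta_tag` is a hypothetical name.

```python
from __future__ import annotations

import re

_STABLE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")
_BETA = re.compile(r"^v(\d+\.\d+\.\d+)b(\d+)$")


def next_beta_tag(tags: list[str], fallback_version: str) -> str:
    """Return the next PEP 440 beta tag, e.g. v0.1.0b2."""
    stable = [tuple(int(part) for part in m.groups())
              for tag in tags if (m := _STABLE.match(tag))]
    # Base: latest stable tag, else the fallback (pyproject.toml) version.
    base = ".".join(str(n) for n in max(stable)) if stable else fallback_version
    # Counter: one past the highest existing beta for that base, starting at 0.
    betas = [int(m.group(2)) for tag in tags
             if (m := _BETA.match(tag)) and m.group(1) == base]
    return f"v{base}b{max(betas) + 1 if betas else 0}"


print(next_beta_tag(["v0.1.0", "v0.1.0b0", "v0.1.0b1"], "0.0.0"))  # v0.1.0b2
print(next_beta_tag([], "0.0.0"))                                  # v0.0.0b0
```

Counting existing beta tags rather than commits means a re-run without a new tag yields the same number, and gaps in commit history do not inflate it.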
173+
174+
### PyPI Publishing
175+
- Trusted publisher configured (environment: `pypi`)
176+
- Version patched from workflow_dispatch input (pyproject.toml stays at `0.0.0`)
177+
- Creates GitHub release with auto-generated changelog after successful publish
178+
179+
## SonarCloud
180+
181+
- Project key: `RandomCodeSpace_code-iq`
182+
- Config: `sonar-project.properties` — sources at `src/osscodeiq`
183+
- Coverage report: `coverage.xml` generated by pytest-cov
184+
- Keep 0 bugs, 0 vulnerabilities. Cognitive complexity issues are tracked but not blocking.
185+
186+
## Gotchas & Lessons Learned
187+
188+
- **Package name ≠ repo name**: Package is `osscodeiq`, repo is `code-iq`. Never change GitHub URLs.
189+
- **pyproject.toml section ordering matters**: `[project.urls]` must come AFTER `dependencies = [...]`, not before. TOML will silently parse dependencies as a URL key otherwise.
190+
- **Windows encoding**: All file reads/writes must specify `encoding="utf-8"`. Minified JS vendor files contain bytes invalid in cp1252.
191+
- **FastAPI path params with colons**: Node IDs contain colons (e.g. `gha:workflow:build`). Use `{node_id:path}` in route definitions. Route ordering matters — `/nodes/{id}/neighbors` must be registered BEFORE `/nodes/{id}`.
192+
- **MCP transport**: Use streamable HTTP (`mcp.http_app(transport="streamable-http")`), NOT SSE.
193+
- **Loop bounds from user input**: Cap `radius` and `depth` params (max 10) to prevent DoS. SonarCloud flags this as a vulnerability.
194+
- **Vendor JS for offline use**: Cytoscape.js and Dagre.js are bundled in `flow/vendor/` and inlined into HTML at render time. No CDN dependencies.
195+
- **uv.lock**: Always regenerate with `uv lock` after pyproject.toml changes.
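
As an illustration of the loop-bound gotcha, the sketch below clamps a caller-supplied `radius` before traversing. The function, the constant name, and the adjacency-dict graph are hypothetical; only the cap of 10 comes from the text.

```python
from __future__ import annotations

from collections import deque

MAX_RADIUS = 10  # cap named in the text; the constant name is hypothetical


def ego_nodes(graph: dict[str, list[str]], start: str, radius: int) -> list[str]:
    """Breadth-first ego subgraph with the radius clamped to [0, MAX_RADIUS]."""
    radius = max(0, min(radius, MAX_RADIUS))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == radius:
            continue  # do not expand beyond the clamped radius
        for neighbor in sorted(graph.get(node, [])):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return sorted(seen)


# A 50-node chain: even an absurd radius only reaches 10 hops from the start.
chain = {f"n{i}": [f"n{i + 1}"] for i in range(50)}
reached = ego_nodes(chain, "n0", radius=1_000_000)
```

Clamping at the entry point keeps the bound in one place, so REST handlers and MCP tools that share the same service would inherit it.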
132196

133197
## Updating This File
134198
