- # Code Intelligence — Project Instructions
+ # OSSCodeIQ — Project Instructions

 ## What This Project Is

- A CLI tool that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs — pure pattern matching. 72 detectors, 35 languages, 3 storage backends (NetworkX, SQLite, KuzuDB).
+ **OSSCodeIQ** (`osscodeiq` on PyPI) — a CLI tool + server that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs — pure pattern matching. 97 detectors, 35 languages, 3 storage backends (NetworkX, SQLite, KuzuDB), REST API + MCP server, interactive flow diagrams.
+
+ - **PyPI package:** `osscodeiq`
+ - **CLI command:** `osscodeiq`
+ - **Python package:** `osscodeiq` (under `src/osscodeiq/`)
+ - **GitHub repo:** `RandomCodeSpace/code-iq` (repo name differs from package name)
+ - **Cache directory on disk:** `.code-intelligence` (legacy name, kept for backward compatibility)

 ## Architecture

 ```
 FileDiscovery → Parsers → Detectors → GraphBuilder (buffered) → Linkers → LayerClassifier → GraphStore (backend)
+                                                                                                     ↓
+                                                                                       CodeIQService (shared facade)
+                                                                                               ↙           ↘
+                                                                               FastAPI REST (/api)       FastMCP MCP (/mcp)
 ```

 - **Detectors** follow the `Detector` Protocol in `detectors/base.py` — implement `name`, `supported_languages`, `detect(ctx) -> DetectorResult`
@@ -16,6 +26,8 @@ FileDiscovery → Parsers → Detectors → GraphBuilder (buffered) → Linkers
 - **GraphBuilder** buffers all nodes and edges, flushes nodes first then edges (ensures cross-backend parity)
 - **Linkers** run after all detectors, produce cross-file relationship edges
 - **LayerClassifier** runs after linkers, sets `layer` property on every node
+ - **CodeIQService** wraps GraphStore + FlowEngine + GraphQuery + Analyzer — shared by REST and MCP
+ - **Server** is a single FastAPI app: `/api` (REST), `/mcp` (MCP via fastmcp streamable HTTP), `/` (welcome UI), `/docs` (OpenAPI)
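
The `Detector` protocol described above could look roughly like the following. This is a minimal sketch, not the project's code: `DetectorContext`, the fields of `DetectorResult`, and `TodoCommentDetector` are hypothetical stand-ins. Only `name`, `supported_languages`, and `detect(ctx) -> DetectorResult` come from the text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class DetectorContext:
    """Hypothetical stand-in for `ctx`: one file's path and decoded text."""
    file_path: str
    text: str


@dataclass
class DetectorResult:
    """Hypothetical result container; the real fields may differ."""
    nodes: list[dict] = field(default_factory=list)
    edges: list[dict] = field(default_factory=list)


class Detector(Protocol):
    """The three members named in the project instructions."""
    name: str
    supported_languages: tuple[str, ...]

    def detect(self, ctx: DetectorContext) -> DetectorResult: ...


class TodoCommentDetector:
    """Toy regex detector: emits one node per `# TODO` comment."""
    name = "todo_comment"
    supported_languages = ("python",)
    _PATTERN = re.compile(r"#\s*TODO:?\s*(.*)")  # immutable, so not mutable state

    def detect(self, ctx: DetectorContext) -> DetectorResult:
        result = DetectorResult()  # fresh result per call, nothing shared
        for line_no, line in enumerate(ctx.text.splitlines(), start=1):
            match = self._PATTERN.search(line)
            if match:
                result.nodes.append({
                    "id": f"todo:{ctx.file_path}:comment:{line_no}",
                    "text": match.group(1).strip(),
                })
        return result


ctx = DetectorContext("app.py", "x = 1  # TODO: remove\ny = 2\n")
first = TodoCommentDetector().detect(ctx)
second = TodoCommentDetector().detect(ctx)
```

Because the detector keeps no state between calls, `first` and `second` compare equal, which is the property the determinism tests rely on.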

 ## Critical Rules

@@ -24,111 +36,163 @@ FileDiscovery → Parsers → Detectors → GraphBuilder (buffered) → Linkers
 - No set iteration without `sorted()` first
 - No dependency on thread completion order (builder uses indexed result slots)
 - All detectors must be stateless pure functions — no class-level mutable state
- - Benchmark after every change: run 2+ times, assert identical node/edge counts
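
The two ordering rules above (sort sets before iterating, write thread results into indexed slots) can be sketched as follows. `analyze_file` and `analyze_all` are hypothetical names used only for illustration; the real builder may be organised differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze_file(path: str) -> str:
    """Hypothetical per-file work; stands in for running detectors on one file."""
    return f"nodes-for:{path}"


def analyze_all(paths: set[str], workers: int = 4) -> list[str]:
    ordered = sorted(paths)  # never iterate a set directly
    results: list[str | None] = [None] * len(ordered)  # one slot per input file
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the index of the file it was submitted for.
        futures = {pool.submit(analyze_file, p): i for i, p in enumerate(ordered)}
        for future in as_completed(futures):  # completion order is arbitrary
            results[futures[future]] = future.result()  # the slot index is not
    return [r for r in results if r is not None]


run_one = analyze_all({"b.py", "a.py", "c.py"})
run_two = analyze_all({"c.py", "b.py", "a.py"})
```

Whichever thread finishes first, the output list follows the sorted input order, so two runs over the same files agree.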

 ### Cross-Backend Data Parity
 - All 3 backends (NetworkX, SQLite, KuzuDB) must produce identical node and edge counts
 - Edges are only added if both source and target nodes exist
 - Test parity after any change to builder, store, or backends
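
A simplified illustration of why buffering helps parity: nodes are flushed first, then edges, and an edge is dropped when either endpoint is missing. `BufferedBuilder` and `ListBackend` are hypothetical classes written for this sketch, not the project's `GraphBuilder` or backends.

```python
from __future__ import annotations


class ListBackend:
    """Hypothetical backend that just records what it is given."""

    def __init__(self) -> None:
        self.nodes: list[str] = []
        self.edges: list[tuple[str, str, str]] = []

    def add_node(self, node_id: str, props: dict) -> None:
        self.nodes.append(node_id)

    def add_edge(self, source: str, target: str, kind: str) -> None:
        self.edges.append((source, target, kind))


class BufferedBuilder:
    """Collects everything in memory, then writes nodes before edges."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict] = {}
        self._edges: list[tuple[str, str, str]] = []

    def add_node(self, node_id: str, **props: object) -> None:
        self._nodes[node_id] = props

    def add_edge(self, source: str, target: str, kind: str) -> None:
        self._edges.append((source, target, kind))

    def flush(self, backend: ListBackend) -> None:
        for node_id in sorted(self._nodes):  # deterministic node order
            backend.add_node(node_id, self._nodes[node_id])
        for source, target, kind in sorted(self._edges):
            # Skip dangling edges so every backend sees the same edge set.
            if source in self._nodes and target in self._nodes:
                backend.add_edge(source, target, kind)


builder = BufferedBuilder()
builder.add_edge("a", "b", "calls")        # buffered before its nodes exist
builder.add_edge("a", "missing", "calls")  # target is never added
builder.add_node("b")
builder.add_node("a")

backend = ListBackend()
builder.flush(backend)
```

After the flush, `backend` holds both nodes and only the one edge whose endpoints exist, regardless of the order in which detectors reported them.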

- ### Adding a New Detector
- 1. Create file in `detectors/<category>/my_detector.py`
- 2. Implement `Detector` protocol (name, supported_languages, detect method)
- 3. Add to the hardcoded list in `detectors/registry.py` (will be auto-discovered after tech debt cleanup)
- 4. Create test in `tests/detectors/<category>/test_my_detector.py`
- 5. Include a determinism test (run twice, assert identical output)
- 6. Run `pytest tests/ -x -q` — all tests must pass
+ ### Windows Compatibility
+ - Always use `encoding="utf-8"` when reading/writing files (Windows defaults to cp1252)
+ - This applies to templates, vendor JS, HTML output, and any file I/O in the server
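
As a small illustration of the encoding rule, the helpers below always pass `encoding="utf-8"` explicitly. The function names and the `flow.html` file are made up for this sketch.

```python
import tempfile
from pathlib import Path


def write_html(path: Path, html: str) -> None:
    # Explicit encoding: without it, Windows would fall back to cp1252.
    path.write_text(html, encoding="utf-8")


def read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


content = "<p>Parsers \u2192 Detectors</p>"  # U+2192 has no cp1252 encoding

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "flow.html"
    write_html(target, content)
    round_tripped = read_text(target)
```

The arrow character survives the round trip on every platform because neither call relies on the locale default.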
+
+ ### pyproject.toml is the Single Source of Truth
+ - All dependencies, scripts, metadata, and package config live in `pyproject.toml`
+ - After ANY change to pyproject.toml, run `uv lock` and commit both files together
+ - Version in pyproject.toml is `0.0.0` (placeholder) — publish/beta workflows patch it at build time
+ - Server deps (fastapi, uvicorn, fastmcp) are core dependencies, not optional
+ - Only `dev` (pytest) and `kuzu` remain as optional deps

- ### Adding a New Backend
- 1. Create file in `graph/backends/my_backend.py`
- 2. Implement `GraphBackend` protocol (16 methods)
- 3. Optionally implement `CypherBackend` for Cypher support
- 4. Add to factory in `graph/backends/__init__.py`
- 5. Add to `GraphConfig` backend choices in `config.py`
- 6. Test parity: same nodes/edges as NetworkX on the same input
+ ### GitHub References
+ - Repo URL is `RandomCodeSpace/code-iq` — do NOT change this even though package is `osscodeiq`
+ - SonarCloud project key: `RandomCodeSpace_code-iq`
+ - Badge URLs, workflow URLs, and clone URLs all use `code-iq`

 ## Code Conventions

 - Python 3.11+, `from __future__ import annotations`
 - Pydantic for data models, typer for CLI, rich for output
+ - FastAPI for REST API, fastmcp for MCP server (streamable HTTP, NOT SSE)
 - Regex-based detection (no tree-sitter dependency for new detectors unless needed)
 - `NodeKind` and `EdgeKind` enums in `models/graph.py` — add new values there
 - ID format: `"{prefix}:{filepath}:{type}:{identifier}"` for cross-file uniqueness
 - Properties dict for detector-specific metadata (`auth_type`, `framework`, `roles`, etc.)
 - `layer` property on every node: `frontend | backend | infra | shared | unknown`
+ - Suppress websockets deprecation warnings in serve command (upstream uvicorn issue)
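
To make the ID and `layer` conventions concrete, here is a hedged sketch. `make_node_id`, the `py` prefix, and the example property values are assumptions for illustration. The format string and the five layer names are the ones listed above.

```python
VALID_LAYERS = ("frontend", "backend", "infra", "shared", "unknown")


def make_node_id(prefix: str, filepath: str, node_type: str, identifier: str) -> str:
    """Build an ID in the documented "{prefix}:{filepath}:{type}:{identifier}" shape."""
    return f"{prefix}:{filepath}:{node_type}:{identifier}"


# Hypothetical node as a plain dict; the project uses Pydantic models instead.
node = {
    "id": make_node_id("py", "src/app/routes.py", "endpoint", "GET /users"),
    "properties": {"framework": "fastapi", "auth_type": "bearer"},
    "layer": "backend",
}
```

Including the file path in the ID is what keeps two identically named symbols in different files from colliding.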
+
+ ## CLI Commands
+
+ | Command | Purpose |
+ |---------|---------|
+ | `osscodeiq analyze [path]` | Scan codebase, build graph |
+ | `osscodeiq graph [path]` | Export graph (json, yaml, mermaid, dot) |
+ | `osscodeiq query [path]` | Semantic graph queries |
+ | `osscodeiq find [what] [path]` | Preset queries (endpoints, guards, entities, etc.) |
+ | `osscodeiq cypher [query]` | Raw Cypher (KuzuDB only) |
+ | `osscodeiq flow [path]` | Architecture flow diagrams (mermaid, json, html) |
+ | `osscodeiq serve [path]` | Start unified server (API + MCP) |
+ | `osscodeiq bundle [path]` | Create distributable package |
+ | `osscodeiq cache [action]` | Manage analysis cache |
+ | `osscodeiq plugins [action]` | List/inspect detectors |
+ | `osscodeiq version` | Show version info |
+
+ ## Server Architecture
+
+ ### Endpoints
+ - `GET /` — Welcome page (self-contained HTML, fetches `/api/stats`)
+ - `GET /api/stats` — Graph statistics
+ - `GET /api/nodes`, `GET /api/edges` — Paginated queries with `?kind=&limit=&offset=`
+ - `GET /api/nodes/{id}/neighbors` — Neighbor traversal
+ - `GET /api/ego/{id}` — Ego subgraph
+ - `GET /api/query/cycles`, `/shortest-path`, `/consumers/{id}`, `/producers/{id}`, `/callers/{id}`, `/dependencies/{id}`, `/dependents/{id}`
+ - `GET /api/flow/{view}` — Flow diagrams (overview, ci, deploy, runtime, auth)
+ - `POST /api/analyze` — Trigger analysis
+ - `POST /api/cypher` — Raw Cypher (400 if not KuzuDB)
+ - `GET /api/triage/component`, `/impact/{id}`, `/endpoints` — Agentic triage tools
+ - `GET /api/search?q=` — Free-text graph search
+ - `GET /api/file?path=` — Serve source files (path traversal protected)
+ - `POST /mcp` — MCP endpoint (20 tools via streamable HTTP)
+
+ ### MCP Tools (20)
+ 15 core tools (get_stats, query_nodes, query_edges, get_node_neighbors, get_ego_graph, find_cycles, find_shortest_path, find_consumers, find_producers, find_callers, find_dependencies, find_dependents, generate_flow, analyze_codebase, run_cypher) + 5 agentic triage tools (find_component_by_file, trace_impact, find_related_endpoints, search_graph, read_file).
+
+ ### Key Server Files
+ | File | Purpose |
+ |------|---------|
+ | `server/app.py` | FastAPI app assembly, mounts /api, /mcp, / |
+ | `server/service.py` | CodeIQService — shared facade over GraphStore + FlowEngine + GraphQuery |
+ | `server/routes.py` | REST API endpoints (uses Annotated type hints) |
+ | `server/mcp_server.py` | FastMCP tool definitions |
+ | `server/middleware.py` | Auth middleware stub (no-op, ready for future auth) |
+ | `server/templates/welcome.html` | Self-contained welcome page |

 ## Testing

- - `pytest tests/ -x -q` — must always pass (currently 565 tests)
+ - `pytest tests/ -x -q` — must always pass (currently 2,074 tests, 86% coverage)
 - Every detector needs: positive match test, negative match test, determinism test
+ - Server tests use FastAPI TestClient
+ - MCP tools tested by calling functions directly after `set_service()`
 - All detectors use shared `detectors/utils.py` — decode_text, find_line_number, etc.
-
- ## Benchmark Requirements
-
- **After every change**, run a clean benchmark on a small project to verify:
- 1. No performance regression (time should not increase significantly)
- 2. 100% determinism (2 runs produce identical node/edge counts)
- 3. Coverage doesn't decrease (file/node/edge counts should not drop)
-
- **Benchmark procedure:**
- ```bash
- rm -rf ~/projects/testDir/contoso-real-estate/.code-intelligence/
- find ~/projects/testDir/contoso-real-estate -name ".code_intelligence_cache*" -delete
- # Run twice
- time code-intelligence analyze ~/projects/testDir/contoso-real-estate --full -j 8
- time code-intelligence analyze ~/projects/testDir/contoso-real-estate --full -j 8
- ```
-
- If `testDir/contoso-real-estate` is not available, clone an official secure project:
- ```bash
- git clone --depth 1 https://github.com/Azure-Samples/contoso-real-estate.git ~/projects/testDir/contoso-real-estate
- ```
-
- **Baseline (contoso-real-estate, 488 files):** 2,313 nodes, 2,905 edges, ~3.7s
- - Cross-backend parity test on contoso-real-estate for data quality
+ - KuzuDB tests require `kuzu` package (installed in CI via `pip install -e ".[dev,kuzu]"`)

 ## Key Files

 | File | Purpose |
 |------|---------|
- | `detectors/base.py` | Detector protocol (42 lines) |
+ | `detectors/base.py` | Detector protocol |
 | `graph/backend.py` | GraphBackend + CypherBackend protocols |
 | `graph/store.py` | GraphStore facade |
 | `graph/builder.py` | GraphBuilder with buffered flush + linkers |
 | `graph/backends/networkx.py` | Default in-memory backend |
 | `graph/backends/kuzu.py` | KuzuDB embedded graph DB with Cypher |
 | `graph/backends/sqlite_backend.py` | SQLite file-based backend |
 | `classifiers/layer_classifier.py` | Deterministic layer classification |
- | `models/graph.py` | NodeKind, EdgeKind, GraphNode, GraphEdge |
+ | `models/graph.py` | NodeKind (31 types), EdgeKind (26 types), GraphNode, GraphEdge |
 | `config.py` | Config with GraphConfig for backend selection |
 | `analyzer.py` | Pipeline orchestrator |
- | `cli.py` | CLI commands (analyze, graph, query, find, cypher, bundle, cache, plugins) |
-
- ## Tech Debt Resolved (Phase 2 — Complete)
+ | `cli.py` | CLI commands — constants `_GRAPH_DIR_NAME`, `_KUZU_DB_NAME`, `_SQLITE_DB_NAME` |
+ | `flow/engine.py` | FlowEngine — generate/render flow diagrams |
+ | `flow/renderer.py` | Mermaid, JSON, HTML renderers (vendor JS inlined for offline use) |
+ | `flow/views.py` | 5 view builders (overview, ci, deploy, runtime, auth) |
+ | `flow/vendor/` | Bundled Cytoscape.js + Dagre.js (no CDN — works behind firewalls) |

- - Registry auto-discovers detectors via `pkgutil.walk_packages()` — new detector = create file, done
- - `imports_detector.py` split into `kotlin_structures.py`, `rust_structures.py`, `scala_structures.py` with fixed regexes
- - 54 new tests added for 10 previously untested detectors (415 total tests)
- - `_parse_structured()` uses `_STRUCTURED_PARSERS` dispatch dict
- - Linker protocol uses `LinkResult(nodes, edges)` dataclass — no more private attribute hack
- - 16 new extensions added (.html, .css, .mjs, .cjs, .jsonc, .groovy, .pyi, .razor, .cshtml, .adoc, etc.)
- - Extensionless files supported via `_FILENAME_MAP` (Dockerfile, Makefile, go.mod, Jenkinsfile)
- - Shared `detectors/utils.py` with `decode_text`, `iter_lines`, `find_line_number`, `filename`, `matches_filename`
-
- ## Adding a New Detector (Updated)
+ ## Adding a New Detector

 1. Create file in `detectors/<category>/my_detector.py`
 2. Implement `Detector` protocol (name, supported_languages, detect method)
- 3. **No registry changes needed** — auto-discovered by package scanning
+ 3. **No registry changes needed** — auto-discovered by `pkgutil.walk_packages()`
 4. Create test in `tests/detectors/<category>/test_my_detector.py`
 5. Include a determinism test (run twice, assert identical output)
 6. Run `pytest tests/ -x -q` — all tests must pass
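
Step 3 relies on package scanning. The sketch below shows one way `pkgutil.walk_packages()` can drive such discovery. It is an assumption about the general approach, not the project's registry: `looks_like_detector` and `discover_detectors` are hypothetical names, and the duck-typed check is only one possible way to recognise a detector class.

```python
import importlib
import inspect
import pkgutil
from types import ModuleType


def looks_like_detector(obj: object) -> bool:
    """Duck-type check for the three protocol members named in step 2."""
    return inspect.isclass(obj) and all(
        hasattr(obj, attr) for attr in ("name", "supported_languages", "detect")
    )


def discover_detectors(package: ModuleType) -> list[type]:
    """Import every module under `package` and collect detector-like classes."""
    found: list[type] = []
    prefix = package.__name__ + "."
    for info in pkgutil.walk_packages(package.__path__, prefix=prefix):
        module = importlib.import_module(info.name)
        for _, obj in inspect.getmembers(module, looks_like_detector):
            if obj.__module__ == module.__name__:  # ignore re-exported classes
                found.append(obj)
    # Sort so the registry order never depends on filesystem listing order.
    return sorted(found, key=lambda cls: (cls.__module__, cls.__qualname__))
```

With a scheme like this, dropping `my_detector.py` into any category sub-package is enough for it to be picked up on the next run.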

- ## Remaining Work
-
- - Phase 3: Flow generator (GitLab CI, Helm, enhanced Dockerfile, Mermaid flow command)
- - Phase 4: 30+ new framework detectors (Go web, EF Core, Prisma, Pydantic, etc.)
- - KuzuDB bulk import optimization for edge insertion
+ ## CI/CD Workflows
+
+ | Workflow | File | Purpose |
+ |----------|------|---------|
+ | CI | `ci.yml` | Run tests on Python 3.11-3.12, installs `[dev,kuzu]` |
+ | Beta | `beta.yml` | Auto-publish beta on push to src/tests. Version: latest stable tag + incremental counter (PEP 440: `v0.1.0b0`) |
+ | Publish | `publish.yml` | Manual trigger. Patches version from input, builds, tests on 11 OS combos + 9 containers, publishes to PyPI, creates GitHub release |
+ | SonarCloud | `sonarcloud.yml` | Code quality + coverage analysis |
+ | SBOM | `sbom.yml` | Dependency audit |
+
+ ### Beta Versioning
+ - Derives base version from latest stable git tag (e.g. `v0.1.0` → `0.1.0`)
+ - Increments beta number from existing beta tags (not commit count)
+ - Tags: PEP 440 format (`v0.1.0b0`, `v0.1.0b1`, ...)
+ - Falls back to pyproject.toml version if no stable tags exist
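
The beta numbering rules above can be expressed as a small function. This is a sketch under stated assumptions: `next_beta_tag` is a hypothetical helper, stable tags are assumed to look exactly like `vX.Y.Z`, and the actual workflow presumably does this in shell or YAML.

```python
import re

STABLE_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_beta_tag(tags: list[str], fallback_version: str) -> str:
    """Return the next PEP 440 beta tag for the latest stable version."""
    stable = []
    for tag in tags:
        match = STABLE_TAG.match(tag)
        if match:
            stable.append(tuple(int(part) for part in match.groups()))

    if stable:
        # Compare numerically so 0.10.0 sorts after 0.9.0.
        base = ".".join(str(part) for part in max(stable))
    else:
        base = fallback_version  # e.g. the version found in pyproject.toml

    # Count from existing beta tags for this base, not from commits.
    beta_tag = re.compile(rf"^v{re.escape(base)}b(\d+)$")
    numbers = [int(m.group(1)) for tag in tags if (m := beta_tag.match(tag))]
    next_number = max(numbers) + 1 if numbers else 0
    return f"v{base}b{next_number}"


example = next_beta_tag(["v0.0.9", "v0.1.0", "v0.1.0b0", "v0.1.0b1"], "0.0.0")
```

With the tags shown, the latest stable version is `0.1.0` and two betas already exist, so the next tag is `v0.1.0b2`.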
+
+ ### PyPI Publishing
+ - Trusted publisher configured (environment: `pypi`)
+ - Version patched from workflow_dispatch input (pyproject.toml stays at `0.0.0`)
+ - Creates GitHub release with auto-generated changelog after successful publish
+
+ ## SonarCloud
+
+ - Project key: `RandomCodeSpace_code-iq`
+ - Config: `sonar-project.properties` — sources at `src/osscodeiq`
+ - Coverage report: `coverage.xml` generated by pytest-cov
+ - Keep 0 bugs, 0 vulnerabilities. Cognitive complexity issues are tracked but not blocking.
+
+ ## Gotchas & Lessons Learned
+
+ - **Package name ≠ repo name**: Package is `osscodeiq`, repo is `code-iq`. Never change GitHub URLs.
+ - **pyproject.toml section ordering matters**: `[project.urls]` must come AFTER `dependencies = [...]`, not before. TOML will silently parse dependencies as a URL key otherwise.
+ - **Windows encoding**: All file reads/writes must specify `encoding="utf-8"`. Minified JS vendor files contain bytes invalid in cp1252.
+ - **FastAPI path params with colons**: Node IDs contain colons (e.g. `gha:workflow:build`). Use `{node_id:path}` in route definitions. Route ordering matters — `/nodes/{id}/neighbors` must be registered BEFORE `/nodes/{id}`.
+ - **MCP transport**: Use streamable HTTP (`mcp.http_app(transport="streamable-http")`), NOT SSE.
+ - **Loop bounds from user input**: Cap `radius` and `depth` params (max 10) to prevent DoS. SonarCloud flags this as a vulnerability.
+ - **Vendor JS for offline use**: Cytoscape.js and Dagre.js are bundled in `flow/vendor/` and inlined into HTML at render time. No CDN dependencies.
+ - **uv.lock**: Always regenerate with `uv lock` after pyproject.toml changes.

 ## Updating This File
