diff --git a/CHANGELOG.md b/CHANGELOG.md index 2faf2343..ae7094f1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -41,9 +41,60 @@ for that specific tag for the per-commit details. summary of the Best Practices state, Scorecard baseline + target (≥ 8.0/10 stretch with eight checks at max), known floor reductions, and the OSS-CLI stack reference. (RAN-52 AC #7) +- `PROJECT_SUMMARY.md` (repo-root agent entry doc) and + [`docs/project/`](docs/project/) deep-dives (architecture, data-model, + build-and-run, conventions, ui, flows) — written for AI agents and humans + who need to understand and modify the codebase, every claim grounded in a + file path. Sits alongside `CLAUDE.md` (which remains the canonical + hand-maintained internals doc). +- `docs/specs/` — directory for active architectural design specs. First + entry: `2026-04-27-resolver-spi-and-java-pilot-design.md`, the design for + sub-project 1 of the "robust graph" decomposition (symbol-resolver SPI + between parse and detect, Java pilot via JavaParser's `JavaSymbolSolver`, + `Confidence` enum + `source` field on every `CodeNode` / `CodeEdge`, + 4–6 Java detectors migrated, 9 layers of aggressive testing). Implementation + in flight on `feat/sub-project-1-resolver-spi-and-java-pilot`. +- **Symbol-resolver SPI** (sub-project 1, Phases 1–4 of the resolver-and-Java-pilot + plan): the foundation for moving the graph from regex-class-of-correctness + to AST-and-symbol-resolution-class-of-correctness. New `Confidence` enum + (`LEXICAL`/`SYNTACTIC`/`RESOLVED` with stable `score()` mapping) plus a + `source` field land on every `CodeNode` and `CodeEdge`, round-trip through + Neo4j (bare `confidence`/`source` properties on nodes and `RELATES_TO` + relationships) and through the H2 analysis cache (`CACHE_VERSION` bumped + 4 → 5 so existing v4 caches drop and rebuild on next open). Read paths are + non-throwing — legacy data without these fields reads back as + `LEXICAL`/null, never NPEs. 
New SPI under + `intelligence/resolver/`: `Resolved` interface + `EmptyResolved` singleton + sentinel, `SymbolResolver` per-language backend, `ResolutionException`, + `ResolverRegistry` (Spring `@Service` with deterministic alphabetical + bootstrap, case-insensitive lookup, per-resolver failure isolation). First + backend `JavaSymbolResolver` wraps `javaparser-symbol-solver-core` 3.28.0 + (Apache-2.0, same release train as `javaparser-core`) with a + `JavaSourceRootDiscovery` that walks Maven/Gradle/plain layouts under a + project root (skipping `target/`, `build/`, `node_modules/`, `.git/`, etc.; + symlink-loop-safe via `NOFOLLOW_LINKS`). `DetectorContext` now carries an + `Optional` (`withResolved()` opt-in, `Optional.empty()` for every + detector that doesn't care — fully backward compatible). `Detector.defaultConfidence()` + declares the per-detector floor (`LEXICAL` for regex bases, `SYNTACTIC` for + AST/structured/JavaParser/JavaMessaging bases) and `DetectorEmissionDefaults.applyDefaults` + is wired into every `detector.detect()` call site in `Analyzer.java` — + emissions whose `source` is null get stamped at the orchestration boundary + (detectors that explicitly stamp survive untouched). 11 atomic commits + ship with ~290 new tests covering happy paths, legacy-data fallbacks, + malformed inputs, determinism, concurrency-safe construction, and singleton + invariants. Detector migrations to consume `ctx.resolved()` and the + resolver-bootstrap-into-Analyzer hook follow in sub-project 1 Phase 5. ### Changed +- Documentation count drift fixed: detector total updated from **97 → 99** + (live count, excluding `Abstract*` and `*Helper*`); `NodeKind` total + updated from **32 → 34** (javadoc at `model/NodeKind.java` was stale by + two entries); `EdgeKind` total updated from **27 → 28** (javadoc at + `model/EdgeKind.java` was stale by one entry). `README.md`, `CLAUDE.md`, + `PROJECT_SUMMARY.md`, `docs/project/*.md`, and the source javadocs are + now in sync. 
+ - Branch protection on `main` requires every commit to be ssh-signed (RAN-46 AC #2). Force-pushes to `main` are rejected; squash-merge from PRs is the only path. diff --git a/CLAUDE.md b/CLAUDE.md index 69e43b87..304342ac 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,7 +2,7 @@ ## What This Project Is -**codeiq** -- a CLI tool + server that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs -- pure static analysis. 97 detectors, 35+ languages, Neo4j Embedded graph database, Spring AI MCP server, REST API, web UI. +**codeiq** -- a CLI tool + server that scans codebases to build a deterministic code knowledge graph. No AI, no external APIs -- pure static analysis. 99 detectors, 35+ languages, Neo4j Embedded graph database, Spring AI MCP server, REST API, web UI. - **Maven coordinates:** `io.github.randomcodespace.iq:code-iq` (artifactId intentionally unchanged) - **CLI command:** `codeiq` (via `java -jar`; JAR filename remains `code-iq-*-cli.jar`) @@ -101,7 +101,7 @@ io.github.randomcodespace.iq |-- graph/ # GraphStore (Neo4j facade), GraphRepository (SDN, writes only) |-- health/ # GraphHealthIndicator (Spring Actuator) |-- mcp/ # McpTools (34 @McpTool methods, read-only, includes intelligence tools) - |-- model/ # CodeNode, CodeEdge, NodeKind (32), EdgeKind (27) + |-- model/ # CodeNode, CodeEdge, NodeKind (34), EdgeKind (28), Confidence |-- intelligence/ # Intelligence enrichment (Phase 2-5) | |-- lexical/ # LexicalEnricher, LexicalQueryService, DocCommentExtractor, SnippetStore | |-- extractor/ # LanguageEnricher, LanguageExtractor, LanguageExtractionResult @@ -328,8 +328,8 @@ mvn dependency-check:check | `analyzer/ServiceDetector.java` | Service boundary detection from build files (30+ build systems) | | `analyzer/linker/*.java` | Cross-file linkers: TopicLinker, EntityLinker, ModuleContainmentLinker | | `detector/Detector.java` | Detector interface | -| `model/NodeKind.java` | 32 node types enum | -| `model/EdgeKind.java` | 27 
edge types enum | +| `model/NodeKind.java` | 34 node types enum | +| `model/EdgeKind.java` | 28 edge types enum | | `model/CodeNode.java` | Graph node entity | | `model/CodeEdge.java` | Graph edge entity | | `graph/GraphStore.java` | Neo4j facade (UNWIND bulk save, Cypher reads, indexes) | diff --git a/PROJECT_SUMMARY.md b/PROJECT_SUMMARY.md new file mode 100644 index 00000000..b3f4d5ce --- /dev/null +++ b/PROJECT_SUMMARY.md @@ -0,0 +1,160 @@ +# Project Summary: codeiq + +> Generated by `project-summarizer` on 2026-04-27. Audience: AI agents (and humans) who need to understand and modify this codebase. Every claim should be checkable; items marked `[inferred]` were not directly verified. +> +> **Canonical depth lives in [`CLAUDE.md`](CLAUDE.md)** (~28 KB, agent-oriented, hand-maintained). This file is a thin entry point that summarizes and links into [`CLAUDE.md`](CLAUDE.md), the runbooks under [`shared/runbooks/`](shared/runbooks/), and the deep-dives under [`docs/project/`](docs/project/). Treat `CLAUDE.md` as the source of truth where they overlap. + +## Identity + +- **What it is:** CLI tool + read-only server that scans codebases and builds a deterministic code knowledge graph (no AI, no external APIs — pure static analysis) with a Spring AI MCP server, REST API, and React UI on top of an embedded Neo4j graph. See [`README.md`](README.md), [`CLAUDE.md`](CLAUDE.md) §"What This Project Is". +- **Type:** monorepo (Java backend + React SPA bundled into one JAR) — combined CLI + library + read-only web service. +- **Status:** **active** — 30+ commits in the last 7 days on `main` (mostly RAN-46/52/57 supply-chain work). Last non-checkpoint commit `92c6e00` on 2026-04-26. Several `checkpoint: pre-yolo` auto-commits are noise from a session hook, not real activity. +- **Maven coordinates:** `io.github.randomcodespace.iq:code-iq` (see `` / `` in `pom.xml`). CLI command: `codeiq` (via `java -jar code-iq-*-cli.jar`). 
+- **Primary languages:** Java 25 (server, CLI, all detectors); TypeScript 5.7 + React 18 (SPA at `src/main/frontend/`). + +## Tech stack + +Read directly from the `pom.xml` `` block and `src/main/frontend/package.json`. + +| Layer | Tech | Source | +|-------|------|--------| +| Runtime | Java 25 | `pom.xml` `25` | +| Web/DI | Spring Boot 4.0.5 | `pom.xml` (parent `spring-boot-starter-parent`) | +| Graph DB | Neo4j Embedded 2026.02.3 (Community) | `pom.xml` `` | +| MCP | Spring AI 2.0.0-M3 (`spring-ai-starter-mcp-server-webmvc`) | `pom.xml` `` | +| CLI | Picocli 4.7.7 (`picocli-spring-boot-starter`) | `pom.xml` `` | +| AST (Java) | JavaParser 3.28.0 | `[CLAUDE.md]` — `pom.xml` references via dep | +| Parsers (35+ langs) | ANTLR 4.13.2 (TS/JS, Python, Go, C#, Rust, C++) | `[CLAUDE.md]` | +| Cache | H2 in embedded mode (incremental analysis cache) | `src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java` | +| Frontend | React 18.3 + AntD 5.24 + ECharts 5.6 + react-router 7 | `src/main/frontend/package.json` | +| Frontend build | Vite 6.4 + TS 5.7 → bundled into `src/main/resources/static/` | `src/main/frontend/vite.config.ts` | +| Tests | JUnit (236 test files), Playwright for SPA E2E | `find src/test/java -name '*.java' \| wc -l` = 236 | +| Static analysis | SpotBugs 4.9.8.3, Jacoco 0.8.14, Checkstyle 3.6.0 | `pom.xml` `` / `` / `` | +| Security gates | OSV-Scanner, Trivy, Semgrep, Gitleaks, jscpd, SBOM | `.github/workflows/security.yml` | +| Supply chain | OpenSSF Scorecard + Best Practices (project_id 12650) | `.github/workflows/scorecard.yml`, `.bestpractices.json` | + +**Pinned security overrides** (bumps inside Spring Boot 4.0.5's BOM): Tomcat 11.0.21 (CVE-2026-34483/34487/34500), Jackson 3.1.1 (GHSA-2m67-wjpj-xhg9). Revert when Spring Boot 4.0.6+ catches up. See the `` and `` properties + comments in `pom.xml`. 
+ +## Entry points + +| Entrypoint | File | Purpose | +|---|---|---| +| CLI / Spring Boot main | `src/main/java/io/github/randomcodespace/iq/CodeIqApplication.java` | Boots Spring, picks `serving` vs `indexing` profile from the first arg, hands control to Picocli | +| CLI dispatcher | `src/main/java/io/github/randomcodespace/iq/cli/CodeIqCli.java` | Top-level Picocli `@Command` with 14 subcommands | +| 14 subcommands | `src/main/java/io/github/randomcodespace/iq/cli/{Index,Enrich,Serve,Analyze,Stats,Graph,Query,Find,Cypher,Topology,Flow,Bundle,Cache,Plugins,Version,Config}Command.java` | One file per CLI command (20 files including subcommands and helpers) | +| REST API (5 controllers) | `src/main/java/io/github/randomcodespace/iq/api/{Graph,Flow,Topology,Intelligence}Controller.java` + `SafeFileReader.java` (helper) | 37 read-only endpoints on `/api/**`, `@Profile("serving")` | +| MCP tools (34 tools) | `src/main/java/io/github/randomcodespace/iq/mcp/McpTools.java` | `@McpTool` methods, auto-registered by Spring AI starter | +| SPA entry | `src/main/frontend/src/main.tsx` → `App.tsx` | React 18 + react-router 7, 4 pages | + +## Directory map + +``` +codeiq/ +├── pom.xml — Maven build (single module, JAR) +├── CLAUDE.md — canonical agent-oriented internals doc +├── README.md — human-facing intro + quick start +├── AGENTS.md — repo-root agent entry pointer +├── CHANGELOG.md — Keep-a-Changelog +├── SECURITY.md — vuln disclosure policy +├── LICENSE — Apache-2.0 +├── .bestpractices.json — OpenSSF Best Practices manifest +├── spotbugs-exclude.xml — SpotBugs suppressions +├── codeiq.yml — (optional, per-project config) +├── .github/ +│ ├── workflows/ — 5 workflows: beta-java, ci-java, +│ │ release-java, scorecard, security +│ └── dependabot.yml — Maven + GHA + npm, weekly grouped +├── src/ +│ ├── main/ +│ │ ├── java/io/github/randomcodespace/iq/ — Java sources (see CLAUDE.md "Package Structure") +│ │ ├── frontend/ — React SPA (Vite, builds into resources/static/) +│ │ 
└── resources/ +│ │ ├── application.yml — Spring config (profile-conditional) +│ │ └── static/ — Vite-built SPA assets (gitignored) +│ └── test/java/ — 236 test files (unit + E2E quality) +├── docs/ +│ ├── codeiq.yml.example — full unified-config schema +│ └── superpowers/baselines/ — phase exit-gate snapshots +├── shared/runbooks/ — engineering-standards, release, rollback, +│ first-time-setup, test-strategy +├── scripts/ — repo-local helpers (e.g. signing setup) +└── .codeiq/ — created at runtime: cache/ (H2) + graph/ (Neo4j) +``` + +Skipped from the map: `target/`, `.git/`, `.classpath`, `.factorypath`, `.project`, `.settings/`, `node_modules/`, `.dockerignore` — generated, IDE, or noise. + +## Run, build, test + +Verified against `.github/workflows/ci-java.yml` (the actual CI gate) and `pom.xml`. + +```bash +# Build (skipping tests, fastest) +mvn clean package -DskipTests + +# Build + test + spotbugs + dependency-check (the CI gate) +mvn verify + +# Build skipping the npm/Vite frontend (backend-only contributors) +mvn test -Dfrontend.skip=true + +# Skip the OWASP NVD download (~1 GB) on first local run +mvn verify -Ddependency-check.skip=true + +# Run a specific test class +mvn test -Dtest=SpringRestDetectorTest + +# Run the pipeline against your code +java -jar target/code-iq-*-cli.jar index /path/to/repo +java -jar target/code-iq-*-cli.jar enrich /path/to/repo +java -jar target/code-iq-*-cli.jar serve /path/to/repo # → http://localhost:8080 +``` + +CI gate is `mvn verify` — runs unit + integration tests **plus** SpotBugs and OWASP dependency-check executions bound to the `verify` phase (`pom.xml`). `mvn test` alone skips the security gate. See `.github/workflows/ci-java.yml`. + +**Required env / external services:** none. codeiq is offline-first by design — Neo4j and H2 are embedded; no external server, no network calls at runtime. Air-gapped install: `git clone` + Maven mirror + `mvn package`. 
See [`shared/runbooks/first-time-setup.md`](shared/runbooks/first-time-setup.md). + +**Cache + graph dirs at runtime** (created in your scanned repo): +- `.codeiq/cache/` — H2 incremental analysis cache (`CACHE_VERSION` constant near the top of `cache/AnalysisCache.java`; bumped 4 → 5 by the `confidence`/`source` schema change, so v4 caches drop and rebuild on next open) +- `.codeiq/graph/graph.db/` — Neo4j Embedded data dir + +## Conventions an agent must respect + +(Top 7. Full list in [`docs/project/conventions.md`](docs/project/conventions.md) and [`CLAUDE.md`](CLAUDE.md) §"Critical Rules" / §"Code Conventions".) + +1. **Serving layer is read-only.** No POST/PUT/DELETE on `/api`, no MCP tool that mutates state. All ingestion happens via CLI (`index`, `enrich`). See `api/GraphController.java` (only `@GetMapping`s) and `mcp/McpTools.java`. +2. **Determinism is non-negotiable.** Same input → byte-identical graph. Sort `Set` iterations (`TreeSet` or `stream().sorted()`); detectors must be stateless `@Component` beans; `GraphBuilder` flushes nodes before edges. See `analyzer/GraphBuilder.java`. +3. **Generic detection, not example-specific.** Every detector must work for all languages/frameworks in its scope. Framework detectors (Quarkus, Fastify, etc.) **must** carry discriminator guards requiring framework-specific imports. +4. **Detectors are auto-discovered Spring `@Component` beans** — no registry edits needed. Drop a class in `detector//`, implement `Detector` (or extend an `Abstract*Detector` base class), add a unit test + a determinism test. +5. **Property keys used ≥ 3 times become constants.** `private static final String PROP_FRAMEWORK = "framework";` etc. — see existing detectors. +6. **Configuration hierarchy:** built-in defaults → `~/.codeiq/config.yml` → `./codeiq.yml` → `CODEIQ_
_` env → CLI flags. Single source of truth: `codeiq.yml`. Spring-owned keys (e.g. `codeiq.neo4j.enabled`) stay in `application.yml`. See [`docs/codeiq.yml.example`](docs/codeiq.yml.example) and `CLAUDE.md` §"Configuration". +7. **Air-gapped build target.** No public-internet calls at runtime, all assets bundled local, vendored where possible. Per-org rule in [`shared/runbooks/engineering-standards.md`](shared/runbooks/engineering-standards.md) §7 and [`~/.claude/rules/build.md`](~/.claude/rules/build.md). + +## Gotchas + +(Top items. Full list in [`CLAUDE.md`](CLAUDE.md) §"Gotchas & Lessons Learned" — that section is canonical and longer; cross-reference to it.) + +- **Pipeline order is `index → enrich → serve`.** Don't put analysis in `serve`; it's read-only. `serve` requires a prior `enrich` for a populated Neo4j directory. +- **Neo4j property round-trip uses `prop_*` keys.** Properties are written by `bulkSave` (UNWIND Cypher) with a `prop_` prefix and restored by `nodeFromNeo4j()` in `graph/GraphStore.java`. If you add a new property, verify it survives write→read. +- **Edges must be attached to source nodes before `bulkSave()`.** Cypher `MATCH` silently returns 0 rows for missing source IDs — pre-validate. +- **`@ActiveProfiles("test")` is required on every `@SpringBootTest`** to avoid Neo4j auto-startup conflicts. +- **`AnalysisCache` uses a `ReentrantReadWriteLock`** (not `synchronized`). JEP 491 (Java 25) means lock primitives no longer pin virtual-thread carriers; the read/write lock is what prevents `ClosedChannelException` on H2's MVStore under concurrent virtual-thread access. Don't "simplify" to `synchronized`. +- **Bump `CACHE_VERSION` in `cache/AnalysisCache.java`** (top of file) when you change the file-hash algorithm or H2 schema. Stale caches auto-clear on next run. +- **SnakeYAML parses bare `on` as `Boolean.TRUE`.** Compare YAML keys with `String.valueOf(key)`, not `Boolean.TRUE.equals(key)` (SonarCloud S2159). 
+- **Determinism gate:** every new detector needs a determinism test (run twice, assert equal output) — see existing `*DetectorTest.java` for the pattern. +- **First `mvn verify` downloads ~1 GB NVD database** for OWASP dependency-check. Override locally with `-Ddependency-check.skip=true`. +- **Live counts (verified 2026-04-27):** **99 concrete detectors** (excluding `Abstract*` and `*Helper*`), **34 `NodeKind` values**, **28 `EdgeKind` values**, **236 test files / 3,270 test methods**. `CLAUDE.md`, `README.md`, and the source javadocs are in sync. When adding a `NodeKind` / `EdgeKind` / detector, update the count in the source javadoc, `CLAUDE.md` (intro + package summary + key-files table), `README.md` (intro + mermaid subgraph), and this file in the same PR — drift is the default if you don't. +- **Don't merge anything that fails `mvn verify`.** SpotBugs + dependency-check + tests are bound to `verify`, not `test`. + +## Where to look next + +- **Architecture & components** → [`docs/project/architecture.md`](docs/project/architecture.md) +- **Data model (Node/Edge kinds, Neo4j schema, H2 cache)** → [`docs/project/data-model.md`](docs/project/data-model.md) +- **UI (React SPA, Vite, page hierarchy)** → [`docs/project/ui.md`](docs/project/ui.md) +- **Key flows (index→enrich→serve, MCP tool lifecycle)** → [`docs/project/flows.md`](docs/project/flows.md) +- **Conventions (full)** → [`docs/project/conventions.md`](docs/project/conventions.md) +- **Build & run details (Maven phases, ANTLR codegen, frontend embed)** → [`docs/project/build-and-run.md`](docs/project/build-and-run.md) +- **Active design specs (in-flight architectural work)** → [`docs/specs/`](docs/specs/) — currently: sub-project 1 (resolver SPI + Java pilot + confidence schema) +- **Internal canonical reference (hand-maintained)** → [`CLAUDE.md`](CLAUDE.md) +- **Engineering standards / release / rollback** → [`shared/runbooks/`](shared/runbooks/) + +(Skipped: `docs/project/integrations.md` — codeiq 
makes no runtime calls to external APIs / queues. The `docs/codeiq.yml.example` schema and `shared/runbooks/release.md` cover what little external surface exists at build/release time.) diff --git a/README.md b/README.md index a1102a9a..505f43aa 100644 --- a/README.md +++ b/README.md @@ -37,13 +37,13 @@ java -jar target/code-iq-*-cli.jar serve /path/to/repo ## How It Works -codeiq scans source files using 97 detectors across 35+ languages, builds a knowledge graph of code relationships, and serves it via REST API, MCP server, and React UI. +codeiq scans source files using 99 detectors across 35+ languages, builds a knowledge graph of code relationships, and serves it via REST API, MCP server, and React UI. ```mermaid graph TD subgraph "1. Index" A[File Discovery] -->|git ls-files| B[Parsing Layer] - B -->|JavaParser / ANTLR / Regex| C[97 Detectors] + B -->|JavaParser / ANTLR / Regex| C[99 Detectors] C -->|Virtual Threads| D[Graph Builder] D --> E[(H2 Cache)] end @@ -225,7 +225,7 @@ See `docs/codeiq.yml.example` for the full schema. ```mermaid graph LR - subgraph "Node Types (32)" + subgraph "Node Types (34)" direction TB N1[service] --- N2[endpoint] N2 --- N3[class] @@ -236,7 +236,7 @@ graph LR N7 --- N8[config_file] end - subgraph "Edge Types (27)" + subgraph "Edge Types (28)" direction TB E1[calls] --- E2[imports] E2 --- E3[depends_on] @@ -265,7 +265,7 @@ All results are 100% deterministic across runs. 
```bash git clone https://github.com/RandomCodeSpace/codeiq.git cd codeiq -mvn clean package # Build + test (3,219 tests) +mvn clean package # Build + test (3,270 tests across 236 files) mvn test # Tests only ``` diff --git a/docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md b/docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md new file mode 100644 index 00000000..6683a650 --- /dev/null +++ b/docs/plans/2026-04-27-sub-project-1-resolver-spi-and-java-pilot.md @@ -0,0 +1,1172 @@ +# Sub-project 1 — Resolver SPI + Java Pilot Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use `superpowers:subagent-driven-development` (recommended) or `superpowers:executing-plans` to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Add a symbol-resolution stage between parse and detect, ship a Java backend wrapping JavaParser's `JavaSymbolSolver`, attach `Confidence` + `source` to every node/edge with Neo4j round-trip and an H2 cache version bump, migrate 4–6 Java detectors as proof of value, and bake in 9 layers of aggressive testing — without changing what existing detectors do. + +**Architecture:** New SPI under `intelligence/resolver/` with a per-language registry mirroring `DetectorRegistry`. The Java backend wraps JavaParser `JavaSymbolSolver` configured from sorted source roots + `ReflectionTypeSolver`. Detectors opt-in via `ctx.resolved()` returning `Optional`; existing detectors compile and behave identically when resolution is absent or disabled. + +**Tech stack:** Java 25, Spring Boot 4.0.5, JavaParser 3.28.0 + new `javaparser-symbol-solver-core`, Neo4j Embedded 2026.02.3, H2 (cache), JUnit 5 (existing test scope), `net.jqwik:jqwik` (new test scope, pending license OK), PIT mutation testing (new non-default Maven profile). + +**Reference:** Full design in [`../specs/2026-04-27-resolver-spi-and-java-pilot-design.md`](../specs/2026-04-27-resolver-spi-and-java-pilot-design.md). 
Read it before starting — every task here has a corresponding section in the spec. + +**Working branch:** `feat/sub-project-1-resolver-spi-and-java-pilot` (already created and ahead of `main` by the spec + doc-sync commits). + +--- + +## File Structure + +### NEW files (create) + +| Path | Responsibility | +|---|---| +| `src/main/java/io/github/randomcodespace/iq/model/Confidence.java` | Enum `LEXICAL` / `SYNTACTIC` / `RESOLVED` + numeric `score()` | +| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java` | SPI interface | +| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java` | Per-file resolution result interface | +| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java` | Singleton for "no resolution" cases | +| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java` | Wraps backend failures | +| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java` | Spring auto-discovery + `bootstrap(rootPath)` + `resolverFor(language)` | +| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java` | Detect Maven/Gradle/plain source roots from a project root | +| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java` | Java-specific `Resolved` carrying `JavaSymbolSolver` reference + per-CU info | +| `src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java` | `@Component`, builds `CombinedTypeSolver`, resolves Java files | +| `src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java` | Unit test | +| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java` | Auto-discovery + bootstrap tests | +| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java` | Source-root discovery on synthetic layouts | +| 
`src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java` | Resolver unit tests (Layer 1) — 15+ scenarios | +| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverConcurrencyTest.java` | Layer 3 stress | +| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverPathologicalTest.java` | Layer 4 | +| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverAdversarialTest.java` | Layer 5 | +| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverDeterminismTest.java` | Layer 6 | +| `src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverPropertyTest.java` | Layer 8 (jqwik) | +| `src/test/resources/intelligence/resolver/java//...` | Synthetic Java sources for unit tests | + +### CHANGED files (modify) + +| Path | Change | +|---|---| +| `src/main/java/io/github/randomcodespace/iq/model/CodeNode.java` | Add `confidence: Confidence`, `source: String`. Round-trippable. | +| `src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java` | Same as `CodeNode`. | +| `src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java` | Write/read `prop_confidence`, `prop_source`. Update `nodeFromNeo4j`, `edgeFromNeo4j`. | +| `src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java` | Bump `CACHE_VERSION` 4→5. Add columns. | +| `src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java` | Add `Optional resolved()` + builder support. | +| `src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java` | Set default `Confidence.LEXICAL` on emitted nodes/edges. | +| `src/main/java/io/github/randomcodespace/iq/detector/AbstractJavaParserDetector.java` | Set default `Confidence.SYNTACTIC`. | +| `src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java` | Set default `Confidence.SYNTACTIC`. 
| +| `src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java` | Set default `Confidence.SYNTACTIC`. | +| `src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java` | Wire ResolverRegistry bootstrap + per-file resolve. | +| `src/main/java/io/github/randomcodespace/iq/cli/IndexCommand.java` | Mirror `Analyzer` in the H2 batched pipeline. | +| `src/main/java/io/github/randomcodespace/iq/config/CodeIqConfig.java` (or unified equivalent) | Bind new `intelligence.symbol_resolution.java.*` keys. | +| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/SpringServiceDetector.java` | Use `ctx.resolved()` for `INJECTS` edge resolution. | +| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/SpringRepositoryDetector.java` | Use `ctx.resolved()` for entity-type linking. | +| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/JpaEntityDetector.java` | Use `ctx.resolved()` for `MAPS_TO` between entities. | +| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/JpaRepositoryDetector.java` | Same as Spring repo, deeper. | +| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/KafkaListenerDetector.java` | Resolve topic constants. | +| `src/main/java/io/github/randomcodespace/iq/detector/jvm/java/SpringRestDetector.java` | Resolve `@RequestBody` types for `MAPS_TO` edges. | +| `src/test/java/io/github/randomcodespace/iq/detector/jvm/java/Test.java` | Add resolved-mode + fallback-mode + mixed-mode assertions. | +| `pom.xml` | Add `javaparser-symbol-solver-core` (latest stable matching `javaparser-core`) + `net.jqwik:jqwik` (test scope, pending license OK). PIT in non-default profile. | +| `docs/codeiq.yml.example` | Document `intelligence.symbol_resolution.java.*` keys. | +| `CHANGELOG.md` | Expand `[Unreleased]` entry once features are integrated. | +| `CLAUDE.md` | "Gotchas" addition: confidence/provenance is now mandatory; resolver pass exists; cache version 5. 
| +| `PROJECT_SUMMARY.md` | Tech stack + Gotchas update. | + +--- + +## How to use this plan + +- Each task is one logical commit (or small commit chain). +- Each step inside a task is 2–5 minutes and ends with verifiable output. +- Tests come first (TDD). Run them, see them fail, then implement, run them, see them pass, commit. +- Determinism tests are mandatory for every detector that gets migrated (Phase 6) and for the resolver itself (Task 30 / Layer 6). +- Frequent commits — one per task minimum, sometimes more. +- Unless noted, **all commands run from the repo root** `/home/dev/projects/codeiq`. + +**Resume rule:** if interrupted mid-task, the next session re-runs the test command from the unfinished step to confirm where it stopped, then continues. + +--- + +## Phase 1 — Schema foundation (Tasks 1–7) + +### Task 1: `Confidence` enum + +**Files:** +- Create: `src/main/java/io/github/randomcodespace/iq/model/Confidence.java` +- Test: `src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java` + +- [ ] **Step 1: Write the failing test** + +```java +// src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; +import static org.junit.jupiter.api.Assertions.*; + +class ConfidenceTest { + + @Test + void scoreMappingIsStable() { + assertEquals(0.6, Confidence.LEXICAL.score(), 1e-9); + assertEquals(0.8, Confidence.SYNTACTIC.score(), 1e-9); + assertEquals(0.95, Confidence.RESOLVED.score(), 1e-9); + } + + @Test + void naturalOrderingMatchesScore() { + assertTrue(Confidence.LEXICAL.compareTo(Confidence.SYNTACTIC) < 0); + assertTrue(Confidence.SYNTACTIC.compareTo(Confidence.RESOLVED) < 0); + } + + @Test + void valueOfNullIsRejected() { + assertThrows(NullPointerException.class, () -> Confidence.fromString(null)); + } + + @Test + void fromStringIsCaseInsensitive() { + assertEquals(Confidence.RESOLVED, Confidence.fromString("resolved")); + 
assertEquals(Confidence.RESOLVED, Confidence.fromString("RESOLVED")); + assertEquals(Confidence.LEXICAL, Confidence.fromString("LeXiCaL")); + } + + @Test + void fromStringRejectsUnknown() { + assertThrows(IllegalArgumentException.class, () -> Confidence.fromString("perfect")); + } +} +``` + +- [ ] **Step 2: Run test to verify it fails** + +```bash +mvn test -Dtest=ConfidenceTest -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: compile error — `Confidence` does not exist. + +- [ ] **Step 3: Write minimal implementation** + +```java +// src/main/java/io/github/randomcodespace/iq/model/Confidence.java +package io.github.randomcodespace.iq.model; + +import java.util.Objects; + +/** + * Confidence in the truth of a node or edge, based on the parser pipeline that + * produced it. Lower means the assertion is from text patterns; higher means + * the assertion is backed by parsed structure or resolved symbol types. + * + *
<p>Comparable: {@code LEXICAL} &lt; {@code SYNTACTIC} &lt; {@code RESOLVED}. + * + * <p>Numeric mapping (via {@link #score()}) is stable and intended for Cypher / + * MCP / SPA filtering. The enum itself is the authoritative form. + */ +public enum Confidence { + /** Pattern-only match (regex). */ + LEXICAL(0.6), + /** AST or parse tree, no symbol resolution. */ + SYNTACTIC(0.8), + /** Resolved via a {@code SymbolResolver}. */ + RESOLVED(0.95); + + private final double score; + + Confidence(double score) { + this.score = score; + } + + public double score() { + return score; + } + + public static Confidence fromString(String value) { + Objects.requireNonNull(value, "Confidence value must not be null"); + for (Confidence c : values()) { + if (c.name().equalsIgnoreCase(value)) { + return c; + } + } + throw new IllegalArgumentException("Unknown Confidence: " + value); + } +} ``` + +- [ ] **Step 4: Run test to verify it passes** + +```bash +mvn test -Dtest=ConfidenceTest -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: 5/5 tests pass. + +- [ ] **Step 5: Commit** + +```bash +git add src/main/java/io/github/randomcodespace/iq/model/Confidence.java \ + src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java +git commit -m "feat(model): add Confidence enum (LEXICAL/SYNTACTIC/RESOLVED) + +Per sub-project 1 spec §5.3. Numeric score() mapping stable (0.6/0.8/0.95). +Comparable by natural order. fromString() is case-insensitive and rejects +null + unknown values. + +Co-Authored-By: Claude Opus 4.7 (1M context) " +``` + +--- + +### Task 2: Add `confidence` + `source` to `CodeNode` + +**Files:** +- Modify: `src/main/java/io/github/randomcodespace/iq/model/CodeNode.java` +- Test: existing `CodeNodeTest.java` (or create one if missing) — add round-trip assertion via `equals`/`hashCode` + +- [ ] **Step 1: Read current `CodeNode.java`** to see its shape (record vs class, builder vs constructor).
+ +```bash +sed -n '1,80p' src/main/java/io/github/randomcodespace/iq/model/CodeNode.java +``` + +- [ ] **Step 2: Write failing test** + +```java +// src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; +import static org.junit.jupiter.api.Assertions.*; + +class CodeNodeConfidenceTest { + + @Test + void newNodeCarriesConfidenceAndSource() { + CodeNode n = CodeNode.builder() + .id("node:foo:class:Foo") + .kind(NodeKind.CLASS) + .label("Foo") + .confidence(Confidence.SYNTACTIC) + .source("MyDetector") + .build(); + assertEquals(Confidence.SYNTACTIC, n.confidence()); + assertEquals("MyDetector", n.source()); + } + + @Test + void confidenceDefaultsToLexicalIfUnset() { + CodeNode n = CodeNode.builder() + .id("node:foo:class:Foo") + .kind(NodeKind.CLASS) + .label("Foo") + .source("MyDetector") + .build(); + assertEquals(Confidence.LEXICAL, n.confidence(), + "missing confidence falls back to LEXICAL — least committal"); + } + + @Test + void sourceIsRequired() { + assertThrows(IllegalStateException.class, () -> CodeNode.builder() + .id("node:foo:class:Foo") + .kind(NodeKind.CLASS) + .label("Foo") + .build(), + "source is mandatory — every node knows which detector emitted it"); + } +} +``` + +- [ ] **Step 3: Run test to verify it fails** + +```bash +mvn test -Dtest=CodeNodeConfidenceTest -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: compile error — `confidence(...)` and `source(...)` not on builder. + +- [ ] **Step 4: Add fields + builder methods to `CodeNode`** + +Add fields, builder setters, getter accessors, equals/hashCode coverage. Field defaults: `confidence = Confidence.LEXICAL`, `source` required (validated in builder). + +(Code shown verbatim once existing structure is read in Step 1; the change must preserve all existing tests by leaving every other field's behavior unchanged.) 
+ +- [ ] **Step 5: Run all model tests to verify nothing else regressed** + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.model.*' -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: all green. + +- [ ] **Step 6: Commit** + +```bash +git add src/main/java/io/github/randomcodespace/iq/model/CodeNode.java \ + src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java +git commit -m "feat(model): add confidence + source to CodeNode + +Per sub-project 1 spec §5.2. Both fields non-null. Confidence defaults to +LEXICAL (least committal). Source is mandatory — every node knows which +detector emitted it. + +Co-Authored-By: Claude Opus 4.7 (1M context) " +``` + +--- + +### Task 3: Add `confidence` + `source` to `CodeEdge` + +Same shape as Task 2, but on `CodeEdge`. Mirror the test class as `CodeEdgeConfidenceTest`. Same builder semantics. + +- [ ] **Step 1: Read current `CodeEdge.java`** +- [ ] **Step 2: Write failing test (`CodeEdgeConfidenceTest`)** — mirror Task 2's three test cases on `CodeEdge.builder()`. +- [ ] **Step 3: Run + see failure.** +- [ ] **Step 4: Add fields + builder methods.** +- [ ] **Step 5: Run all model tests.** +- [ ] **Step 6: Commit:** `feat(model): add confidence + source to CodeEdge`. 
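Tasks 2 and 3 test for a specific builder contract, which can be sketched in isolation. This is a hypothetical, trimmed-down `CodeNode` with only `id`, `kind` and `label` plus the two new fields, and `Confidence` and `NodeKind` are stand-in enums. The real class's shape is only known after Task 2 Step 1, so read this as an illustration of the semantics rather than the implementation: a `LEXICAL` fallback, a mandatory `source`, and both fields included in `equals`/`hashCode`. `CodeEdge` would follow the same pattern.

```java
import java.util.Objects;

// Stand-ins for the real model enums (hypothetical, trimmed for the sketch).
enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

enum NodeKind { CLASS, METHOD }

final class CodeNode {
    private final String id;
    private final NodeKind kind;
    private final String label;
    private final Confidence confidence;
    private final String source;

    private CodeNode(Builder b) {
        this.id = b.id;
        this.kind = b.kind;
        this.label = b.label;
        // Least committal default when the emitter did not state a confidence.
        this.confidence = b.confidence != null ? b.confidence : Confidence.LEXICAL;
        this.source = b.source;
    }

    static Builder builder() { return new Builder(); }

    String id() { return id; }
    NodeKind kind() { return kind; }
    String label() { return label; }
    Confidence confidence() { return confidence; }
    String source() { return source; }

    // Both new fields participate in equality, so round-trip tests can rely on equals().
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CodeNode other)) return false;
        return id.equals(other.id) && kind == other.kind && Objects.equals(label, other.label)
                && confidence == other.confidence && source.equals(other.source);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, kind, label, confidence, source);
    }

    static final class Builder {
        private String id;
        private NodeKind kind;
        private String label;
        private Confidence confidence;
        private String source;

        Builder id(String v) { this.id = v; return this; }
        Builder kind(NodeKind v) { this.kind = v; return this; }
        Builder label(String v) { this.label = v; return this; }
        Builder confidence(Confidence v) { this.confidence = v; return this; }
        Builder source(String v) { this.source = v; return this; }

        CodeNode build() {
            if (id == null || kind == null) {
                throw new IllegalStateException("id and kind are required");
            }
            if (source == null || source.isBlank()) {
                // Every node must record which detector emitted it.
                throw new IllegalStateException("source is required");
            }
            return new CodeNode(this);
        }
    }
}
```

Validation happens in `build()` rather than in the setters, so partially built nodes never escape. It also lets the `sourceIsRequired` test expect `IllegalStateException`.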
+ +--- + +### Task 4: Round-trip `confidence` + `source` through Neo4j (write path) + +**Files:** +- Modify: `src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java` +- Test: `src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java` (new) + +- [ ] **Step 1: Write the failing test.** + +```java +// src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java +package io.github.randomcodespace.iq.graph; + +import io.github.randomcodespace.iq.model.*; +import org.junit.jupiter.api.*; +import org.junit.jupiter.api.io.TempDir; +import java.nio.file.Path; +import java.util.List; +import static org.junit.jupiter.api.Assertions.*; + +class GraphStoreConfidenceRoundTripTest { + + @TempDir Path tmp; + GraphStore store; + + @BeforeEach void setup() { store = GraphStore.openEmbedded(tmp.resolve("graph.db")); } + @AfterEach void close() { store.close(); } + + @Test + void confidenceAndSourceRoundTrip() { + CodeNode in = CodeNode.builder() + .id("node:Foo.java:class:Foo") + .kind(NodeKind.CLASS).label("Foo") + .confidence(Confidence.RESOLVED).source("SpringServiceDetector") + .build(); + store.bulkSave(List.of(in), List.of()); + + CodeNode out = store.findById("node:Foo.java:class:Foo").orElseThrow(); + assertEquals(Confidence.RESOLVED, out.confidence()); + assertEquals("SpringServiceDetector", out.source()); + } +} +``` + +- [ ] **Step 2: Run; verify compile or assertion fail.** + +```bash +mvn test -Dtest=GraphStoreConfidenceRoundTripTest -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +Expected: assertion fails (fields written via existing path don't include confidence/source). + +- [ ] **Step 3: Update `GraphStore.bulkSave` to write `prop_confidence` and `prop_source`**, and `nodeFromNeo4j` / `edgeFromNeo4j` to read them. Defaults if missing in Neo4j: `Confidence.LEXICAL` and `"unknown"`. 
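One way to keep this read path non-throwing is to route property access through a small helper that never fails on missing, mistyped or unrecognized values. The sketch makes three assumptions:

- The Neo4j properties reach the mapper as a `Map<String, ?>`.
- The keys are `prop_confidence` and `prop_source`, and the defaults are `LEXICAL` and `"unknown"`, as named in Step 3.
- The helper is called `ProvenanceReader`, which is a hypothetical name.

```java
import java.util.Map;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

// Hypothetical helper: maps stored provenance properties back to model values without throwing.
final class ProvenanceReader {
    static final String CONFIDENCE_KEY = "prop_confidence";
    static final String SOURCE_KEY = "prop_source";
    static final String UNKNOWN_SOURCE = "unknown";

    private ProvenanceReader() {}

    /** Missing, non-string, or unrecognised values fall back to LEXICAL. */
    static Confidence confidence(Map<String, ?> props) {
        if (props.get(CONFIDENCE_KEY) instanceof String raw) {
            for (Confidence c : Confidence.values()) {
                if (c.name().equalsIgnoreCase(raw.trim())) {
                    return c;
                }
            }
        }
        return Confidence.LEXICAL;
    }

    /** Missing, non-string, or blank values fall back to "unknown". */
    static String source(Map<String, ?> props) {
        if (props.get(SOURCE_KEY) instanceof String s && !s.isBlank()) {
            return s;
        }
        return UNKNOWN_SOURCE;
    }
}
```

The helper deliberately does not reuse `Confidence.fromString`. That method rejects unknown values with an exception, which is right for configuration input but wrong for legacy graph data.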
+ +- [ ] **Step 4: Run round-trip test; verify pass.** +- [ ] **Step 5: Run wider GraphStore test suite to ensure no regression.** + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.graph.*' -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 6: Commit:** `feat(graph): round-trip confidence + source through Neo4j`. + +--- + +### Task 5: H2 cache schema migration to v5 + +**Files:** +- Modify: `src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java` +- Test: existing `AnalysisCacheTest.java` (extend) + new round-trip case. + +- [ ] **Step 1: Failing test.** Add a round-trip case that fails until the SCHEMA_SQL `nodes` and `edges` tables carry `confidence` and `source` columns (the columns themselves are added in Step 3). Failing assertion: `cache.put(file, [node with confidence=RESOLVED]); cache.get(file).confidence == RESOLVED`. + +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Bump `CACHE_VERSION` 4→5. Add columns. Update INSERT/SELECT statements. Update Jackson serialization helpers if used.** +- [ ] **Step 4: Run cache tests; verify all pass.** +- [ ] **Step 5: Commit:** `feat(cache): bump CACHE_VERSION to 5; add confidence + source columns`. + +--- + +### Task 6: Default `Confidence` per detector base class + +**Files:** +- Modify: `AbstractRegexDetector.java`, `AbstractJavaParserDetector.java`, `AbstractAntlrDetector.java`, `AbstractStructuredDetector.java`, `AbstractPythonAntlrDetector.java`, `AbstractTypeScriptDetector.java`, `AbstractJavaMessagingDetector.java`, `AbstractPythonDbDetector.java`. +- Test: a synthetic `BaseClassConfidenceDefaultTest.java` per base class (or a single parameterized test). + +- [ ] **Step 1: Failing parameterized test.** Subclass each base, emit a node with no explicit confidence, assert it carries the expected default (LEXICAL for regex, SYNTACTIC for AST/ANTLR/structured/python-antlr/typescript/messaging/python-db).
+- [ ] **Step 2: Run; see fail (currently always LEXICAL or null).** +- [ ] **Step 3: Add a `defaultConfidence()` method on each base class returning the matching enum. Make `addNode`/`addEdge` helpers stamp it when not explicitly set.** +- [ ] **Step 4: Run; verify pass.** +- [ ] **Step 5: Run full detector suite to ensure no regression.** + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.detector.*' -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 6: Commit:** `feat(detector): set Confidence default per base class`. + +--- + +### Task 7: Snapshot-test refresh (one-time) + +JSON-snapshot or golden-file tests will now include the additive `confidence` and `source` fields. Acceptance criterion §13 #3 in the spec requires that the diff be limited to those two fields per record. + +- [ ] **Step 1: Run full test suite, capture failures.** + +```bash +mvn test -Dfrontend.skip=true -Ddependency-check.skip=true -q -DfailIfNoTests=false 2>&1 | tee /tmp/snapshot-failures.log +``` + +- [ ] **Step 2: For each snapshot diff, verify the diff is only the two additive fields.** If anything else changed, that's a bug — fix it before refreshing the snapshot. + +- [ ] **Step 3: Refresh snapshots one file at a time with separate commits per file** (so reviewers can diff cleanly). + +- [ ] **Step 4: Run full suite; expect green.** +- [ ] **Step 5: Commit each snapshot refresh:** `chore(test): refresh snapshot for confidence + source fields`.
+ +--- + +## Phase 2 — SPI scaffolding (Tasks 8–13) + +### Task 8: `Resolved` interface + `EmptyResolved` singleton + +**Files:** +- Create: `intelligence/resolver/Resolved.java`, `intelligence/resolver/EmptyResolved.java` +- Test: `ResolvedContractTest.java` + +- [ ] **Step 1: Failing test.** + +```java +// src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; +import static org.junit.jupiter.api.Assertions.*; + +class ResolvedContractTest { + + @Test + void emptyResolvedIsSingleton() { + assertSame(EmptyResolved.INSTANCE, EmptyResolved.INSTANCE); + } + + @Test + void emptyResolvedHasLexicalConfidence() { + assertEquals(Confidence.LEXICAL, EmptyResolved.INSTANCE.sourceConfidence()); + } + + @Test + void emptyResolvedReportsUnsupported() { + assertFalse(EmptyResolved.INSTANCE.isAvailable()); + } +} +``` + +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Implement** `Resolved` (interface with `boolean isAvailable()`, `Confidence sourceConfidence()`, plus language-specific extension points to be added by `JavaResolved`) and `EmptyResolved.INSTANCE` (always returns `false` / `LEXICAL`). +- [ ] **Step 4: Run; pass.** +- [ ] **Step 5: Commit:** `feat(resolver): add Resolved interface + EmptyResolved singleton`. + +--- + +### Task 9: `ResolutionException` + +- [ ] **Step 1: Failing test:** assert `ResolutionException` carries the file path and language fields. +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Implement** as a checked exception (subclass `Exception`) with `Path file()`, `String language()`. +- [ ] **Step 4: Pass.** +- [ ] **Step 5: Commit:** `feat(resolver): add ResolutionException`. 
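Tasks 8 and 9 together describe three small types. Below is a minimal sketch that satisfies the `ResolvedContractTest` assertions, with `Confidence` reduced to its constants. Notes on the choices:

- The enum-singleton form of `EmptyResolved` is one convenient way to get a guaranteed `INSTANCE`. The plan does not require it.
- The four-argument `ResolutionException` constructor mirrors the call made in Task 17's `bootstrap`.
- The `Path` field is `transient` because exceptions are `Serializable` while `Path` is not.

```java
import java.nio.file.Path;
import java.util.Objects;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

/** Per-file resolution result handed to detectors. */
interface Resolved {
    boolean isAvailable();
    Confidence sourceConfidence();
}

/** Sentinel for "no resolver ran, or resolution was not possible"; an enum gives a JVM-wide singleton. */
enum EmptyResolved implements Resolved {
    INSTANCE;

    @Override public boolean isAvailable() { return false; }
    @Override public Confidence sourceConfidence() { return Confidence.LEXICAL; }
}

/** Checked failure raised by a resolver; carries the file (or project root) and the language. */
class ResolutionException extends Exception {
    private static final long serialVersionUID = 1L;

    private final transient Path file;   // Path is not Serializable
    private final String language;

    ResolutionException(String message, Throwable cause, Path file, String language) {
        super(message, cause);
        this.file = file;
        this.language = Objects.requireNonNull(language, "language");
    }

    Path file() { return file; }
    String language() { return language; }
}
```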
+ +--- + +### Task 10: `SymbolResolver` interface + +```java +// src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import java.nio.file.Path; +import java.util.Set; + +public interface SymbolResolver { + Set<String> getSupportedLanguages(); + void bootstrap(Path projectRoot) throws ResolutionException; + Resolved resolve(DiscoveredFile file, Object parsedAst) throws ResolutionException; + default void shutdown() {} +} +``` + +- [ ] **Step 1: Failing contract test** — assert any concrete implementation (start with a stub) honors `getSupportedLanguages()` returning a non-empty `Set<String>` and `resolve(...)` returning non-null. +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Implement** the interface as shown. +- [ ] **Step 4: Pass.** +- [ ] **Step 5: Commit:** `feat(resolver): add SymbolResolver SPI`. + +--- + +### Task 11: `ResolverRegistry` Spring bean + +**Files:** +- Create: `intelligence/resolver/ResolverRegistry.java` +- Test: `ResolverRegistryTest.java` + +- [ ] **Step 1: Failing test.** Two `@Component` stub resolvers (`JavaStubResolver` for `"java"`, `TsStubResolver` for `"typescript"`). Wire via `@SpringBootTest(classes=...)`. Assert `registry.resolverFor("java")` is the Java stub; an unknown language gets a no-op resolver whose `resolve(...)` returns `EmptyResolved`; `bootstrap(rootPath)` calls bootstrap on every registered resolver exactly once. + +- [ ] **Step 2: Run; see fail.** + +- [ ] **Step 3: Implement** `ResolverRegistry` as a `@Component` that takes `List<SymbolResolver>` via constructor injection and builds a `Map<String, SymbolResolver>` keyed by lowercase language. `resolverFor(String language)` returns the matching resolver or a default that emits `EmptyResolved`. `bootstrap(rootPath)` iterates resolvers in alphabetical order by class simple name (determinism), calling each.
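Here is a plain-Java sketch of the registry behaviour described in Step 3. The Spring annotation and injection are left out so it compiles standalone. `DiscoveredFile` is a stand-in record, and the other types are reduced copies of the ones from Tasks 8 to 10.

Two details go beyond the plan text and are assumptions:

- If two resolvers claim the same language, the alphabetically first one wins.
- A failing `bootstrap` is collected instead of aborting the loop, so every resolver is still called exactly once.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

interface Resolved { boolean isAvailable(); Confidence sourceConfidence(); }

enum EmptyResolved implements Resolved {
    INSTANCE;
    @Override public boolean isAvailable() { return false; }
    @Override public Confidence sourceConfidence() { return Confidence.LEXICAL; }
}

class ResolutionException extends Exception {
    ResolutionException(String message) { super(message); }
}

// Hypothetical stand-in for io.github.randomcodespace.iq.analyzer.DiscoveredFile.
record DiscoveredFile(Path path, String language) {}

interface SymbolResolver {
    Set<String> getSupportedLanguages();
    void bootstrap(Path projectRoot) throws ResolutionException;
    Resolved resolve(DiscoveredFile file, Object parsedAst) throws ResolutionException;
    default void shutdown() {}
}

final class ResolverRegistry {
    // Fallback for unsupported languages: never throws, always yields the sentinel.
    private static final SymbolResolver NO_OP = new SymbolResolver() {
        @Override public Set<String> getSupportedLanguages() { return Set.of(); }
        @Override public void bootstrap(Path projectRoot) {}
        @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { return EmptyResolved.INSTANCE; }
    };

    private final List<SymbolResolver> ordered;
    private final Map<String, SymbolResolver> byLanguage = new HashMap<>();

    ResolverRegistry(List<SymbolResolver> resolvers) {
        // Alphabetical by simple class name keeps bootstrap order deterministic.
        this.ordered = resolvers.stream()
                .sorted(Comparator.comparing((SymbolResolver r) -> r.getClass().getSimpleName()))
                .toList();
        for (SymbolResolver r : ordered) {
            for (String lang : r.getSupportedLanguages()) {
                byLanguage.putIfAbsent(lang.toLowerCase(Locale.ROOT), r);
            }
        }
    }

    SymbolResolver resolverFor(String language) {
        if (language == null) return NO_OP;
        return byLanguage.getOrDefault(language.toLowerCase(Locale.ROOT), NO_OP);
    }

    /** Calls every resolver once; failures are collected so one bad backend cannot block the rest. */
    List<ResolutionException> bootstrap(Path projectRoot) {
        List<ResolutionException> failures = new ArrayList<>();
        for (SymbolResolver r : ordered) {
            try {
                r.bootstrap(projectRoot);
            } catch (ResolutionException e) {
                failures.add(e);
            }
        }
        return failures;
    }
}
```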
+ +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(resolver): add ResolverRegistry with auto-discovery`. + +--- + +### Task 12: `DetectorContext.resolved()` accessor + +**Files:** +- Modify: `detector/DetectorContext.java` +- Test: existing `DetectorContextTest.java` (or new) + assertion that legacy detectors still compile. + +- [ ] **Step 1: Failing test.** Build a `DetectorContext` with `.resolved(EmptyResolved.INSTANCE)`; assert the accessor returns it. Also assert default returns `Optional.empty()`. + +- [ ] **Step 2: Run; see fail.** + +- [ ] **Step 3: Add field + builder method + accessor**, additive (default `Optional.empty()`). + +- [ ] **Step 4: Run all detector tests** to confirm legacy detectors still compile and behave identically. + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.detector.*' -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 5: Commit:** `feat(detector): add Optional<Resolved> accessor to DetectorContext`. + +--- + +### Task 13: Sanity build + +- [ ] **Step 1: Compile + run all model + resolver + detector tests.** + +```bash +mvn test -Dtest='io.github.randomcodespace.iq.model.*,io.github.randomcodespace.iq.intelligence.resolver.*,io.github.randomcodespace.iq.detector.*' \ + -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 2: Confirm green; if not, fix the smallest possible failure before moving on.** + +- [ ] **Step 3: Commit (only if any cleanup landed):** `chore: sanity build after Phase 2`. + +--- + +## Phase 3 — Java backend (Tasks 14–18) + +### Task 14: Add `javaparser-symbol-solver-core` dep + +**Files:** +- Modify: `pom.xml` + +- [ ] **Step 1: Resolve the latest stable version compatible with `javaparser-core` 3.28.0.** Use `context7` MCP first; fall back to Maven Central via `ctx_fetch_and_index`. + +- [ ] **Step 2: Add the dependency** to the `<dependencies>` block in `pom.xml`. Pin the version explicitly. Note: JavaParser publishes both core and symbol-solver from the same release train — they should share the same version.
+ +```xml +<dependency> + <groupId>com.github.javaparser</groupId> + <artifactId>javaparser-symbol-solver-core</artifactId> + <version>${javaparser.version}</version> +</dependency> +``` + +(Add a `<javaparser.version>3.28.0</javaparser.version>` property if not already present; reuse the existing version everywhere.) + +- [ ] **Step 3: Run dependency check.** + +```bash +mvn dependency:tree -Dincludes=com.github.javaparser -Dfrontend.skip=true -Ddependency-check.skip=true +``` + +Expected: `javaparser-core` and `javaparser-symbol-solver-core` both at the pinned version. + +- [ ] **Step 4: Verify license** is Apache-2.0 (it is, but check `mvn dependency:tree` doesn't pull GPL/AGPL transitives). + +- [ ] **Step 5: Compile.** + +```bash +mvn test-compile -Dfrontend.skip=true -Ddependency-check.skip=true -q +``` + +- [ ] **Step 6: Commit:** `chore(deps): add javaparser-symbol-solver-core `. + +--- + +### Task 15: `JavaSourceRootDiscovery` + +**Files:** +- Create: `intelligence/resolver/java/JavaSourceRootDiscovery.java` +- Test: `JavaSourceRootDiscoveryTest.java` with synthetic dir layouts via `@TempDir`. + +- [ ] **Step 1: Failing test.** Cover: + - Maven single-module: `/pom.xml`, `src/main/java`, `src/test/java` → returns sorted `[src/main/java, src/test/java]`. + - Maven multi-module: root `pom.xml` with `service-a` + `service-b`; each has `src/main/java`. Returns sorted union. + - Gradle (`build.gradle.kts` or `build.gradle`): same `src/main/java` convention. + - Plain layout: just `src/` without Maven/Gradle markers — returns `[src/]` if it has `*.java`. + - Empty project (no Java): returns empty list, no exception. + - Symlink loop in tree: terminates without exception.
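A discovery pass covering these scenarios might look like the sketch that follows. It differs from Step 3 in a few deliberate ways:

- It uses `Files.walkFileTree` rather than the `Files.walk` named in Step 3. That way, skip-listed directories such as `target/` and `node_modules/` are pruned during the walk instead of filtered afterwards.
- It never passes `FileVisitOption.FOLLOW_LINKS`, so symlinked directories are reported as entries but never descended. This is what makes symlink loops terminate.
- The skip list and the depth limit are illustrative values, not taken from the plan.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Stream;

final class JavaSourceRootDiscovery {
    // Illustrative values, not taken from the plan.
    private static final Set<String> SKIP_DIRS = Set.of("target", "build", "node_modules", ".git", ".gradle", ".idea");
    private static final Set<String> BUILD_MARKERS = Set.of("pom.xml", "build.gradle", "build.gradle.kts");
    private static final Path MAIN = Path.of("src", "main", "java");
    private static final Path TEST = Path.of("src", "test", "java");
    private static final int MAX_DEPTH = 12;

    /** Returns conventional Java source roots under projectRoot, sorted; empty if none. */
    List<Path> discover(Path projectRoot) throws IOException {
        if (!Files.isDirectory(projectRoot)) return List.of();
        SortedSet<Path> roots = new TreeSet<>();
        // No FOLLOW_LINKS option: symlinked directories are reported as files, never entered.
        Files.walkFileTree(projectRoot, EnumSet.noneOf(FileVisitOption.class), MAX_DEPTH,
                new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                        Path name = dir.getFileName();
                        if (!dir.equals(projectRoot) && name != null && SKIP_DIRS.contains(name.toString())) {
                            return FileVisitResult.SKIP_SUBTREE;
                        }
                        if (dir.endsWith(MAIN) || dir.endsWith(TEST)) {
                            roots.add(dir);
                            return FileVisitResult.SKIP_SUBTREE; // nested roots are not expected
                        }
                        return FileVisitResult.CONTINUE;
                    }

                    @Override
                    public FileVisitResult visitFileFailed(Path file, IOException exc) {
                        return FileVisitResult.CONTINUE; // unreadable entries are ignored
                    }
                });
        // Plain layout: no build markers and no conventional roots, but src/ holds .java files.
        if (roots.isEmpty() && !hasBuildMarker(projectRoot)) {
            Path plain = projectRoot.resolve("src");
            if (Files.isDirectory(plain) && containsJava(plain)) {
                roots.add(plain);
            }
        }
        return List.copyOf(roots);
    }

    private static boolean hasBuildMarker(Path dir) {
        return BUILD_MARKERS.stream().anyMatch(m -> Files.isRegularFile(dir.resolve(m)));
    }

    private static boolean containsJava(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir, MAX_DEPTH)) {
            return files.anyMatch(p -> Files.isRegularFile(p) && p.toString().endsWith(".java"));
        }
    }
}
```

Collecting into a `TreeSet` gives both the sorted order and the idempotence that Step 3 asks for. The same tree always yields the same list.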
+ +```java +@Test void mavenSingleModule(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("src/test/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + var roots = new JavaSourceRootDiscovery().discover(tmp); + assertEquals(List.of(tmp.resolve("src/main/java"), tmp.resolve("src/test/java")), roots); +} +``` + +- [ ] **Step 2: Run; see fail.** +- [ ] **Step 3: Implement** discovery using `Files.walk` with depth limits. Return `List` sorted alphabetically. Idempotent. +- [ ] **Step 4: Run all 6+ scenarios; verify pass.** +- [ ] **Step 5: Commit:** `feat(resolver/java): add JavaSourceRootDiscovery (Maven/Gradle/plain auto-detect)`. + +--- + +### Task 16: `JavaResolved` record + +**Files:** +- Create: `intelligence/resolver/java/JavaResolved.java` +- Test: `JavaResolvedTest.java` + +- [ ] **Step 1: Failing test.** Construct a `JavaResolved` with a stub `JavaSymbolSolver` and a parsed `CompilationUnit`. Assert `isAvailable() == true`, `sourceConfidence() == RESOLVED`, exposes `.cu()` and `.solver()`. + +- [ ] **Step 2: Run; see fail.** + +- [ ] **Step 3: Implement** as a `record JavaResolved(CompilationUnit cu, JavaSymbolSolver solver) implements Resolved`. `isAvailable() = true`. `sourceConfidence() = Confidence.RESOLVED`. + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(resolver/java): add JavaResolved record`. + +--- + +### Task 17: `JavaSymbolResolver` (`@Component`) + +**Files:** +- Create: `intelligence/resolver/java/JavaSymbolResolver.java` +- Test: covered by Task 18 (unit tests) and Task 30+ (aggressive layers). 
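+ +- [ ] **Step 1: Failing skeleton test.** + +```java +@Test void supportsJava() { + var r = new JavaSymbolResolver(new JavaSourceRootDiscovery()); + assertEquals(Set.of("java"), r.getSupportedLanguages()); +} + +@Test void bootstrapBuildsCombinedTypeSolver(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + var r = new JavaSymbolResolver(new JavaSourceRootDiscovery()); + r.bootstrap(tmp); + assertNotNull(r.combinedTypeSolver()); +} +``` + +- [ ] **Step 2: Run; see fail.** + +- [ ] **Step 3: Implement.** + +```java +// src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.StaticJavaParser; +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.symbolsolver.JavaSymbolSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.CombinedTypeSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.JavaParserTypeSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.ReflectionTypeSolver; +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.ResolutionException; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.intelligence.resolver.SymbolResolver; +import java.nio.file.Path; +import java.util.Set; +import org.springframework.stereotype.Component; + +@Component +public class JavaSymbolResolver implements SymbolResolver { + private final JavaSourceRootDiscovery discovery; + private CombinedTypeSolver combined; + private JavaSymbolSolver solver; + + public JavaSymbolResolver(JavaSourceRootDiscovery discovery) { + this.discovery = discovery; + } + + @Override public Set<String> getSupportedLanguages() { return Set.of("java"); } + + @Override + public void bootstrap(Path projectRoot) throws ResolutionException { + try { + CombinedTypeSolver cts = new CombinedTypeSolver(); + cts.add(new ReflectionTypeSolver()); + for (Path root : discovery.discover(projectRoot)) { + cts.add(new JavaParserTypeSolver(root.toFile())); + } + this.combined = cts; + this.solver = new JavaSymbolSolver(cts); + // Configure JavaParser default ParserConfiguration so any subsequent parse + // benefits from the solver — but allow per-parse override for tests.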
+ StaticJavaParser.getParserConfiguration().setSymbolResolver(this.solver); + } catch (Exception e) { + throw new ResolutionException("bootstrap failed for " + projectRoot, e, projectRoot, "java"); + } + } + + @Override + public Resolved resolve(DiscoveredFile file, Object parsedAst) throws ResolutionException { + if (!"java".equalsIgnoreCase(file.language())) return EmptyResolved.INSTANCE; + if (!(parsedAst instanceof CompilationUnit cu)) return EmptyResolved.INSTANCE; + if (this.solver == null) return EmptyResolved.INSTANCE; + return new JavaResolved(cu, solver); + } + + public CombinedTypeSolver combinedTypeSolver() { return combined; } +} +``` + +- [ ] **Step 4: Pass.** +- [ ] **Step 5: Commit:** `feat(resolver/java): add JavaSymbolResolver`. + +--- + +### Task 18: `JavaSymbolResolverTest` — Layer 1 (resolver unit tests) + +**Files:** +- Create: `JavaSymbolResolverTest.java` +- Create: synthetic Java sources under `src/test/resources/intelligence/resolver/java//`. + +Cover all 15+ scenarios from spec §12 layer 1: empty file, single class, generics deep nesting, inner classes (static/non-static/anonymous/local), lambdas, records, sealed, enum-with-methods, interface-with-default, abstract, annotations, imports (explicit/static/wildcard/missing/unused), cyclic imports, same-named-classes-different-packages, JDK symbol, multi-source-root cross-reference. + +- [ ] **Step 1: For each scenario, write the synthetic source file** under `src/test/resources/intelligence/resolver/java//Foo.java` (or multiple files where needed) with a `README.md` describing intent (one paragraph). + +- [ ] **Step 2: Write the failing test class** (one `@Test` per scenario, named `resolves`). + +- [ ] **Step 3: Run; see fail.** + +- [ ] **Step 4: Verify fixtures alone are valid Java** by compiling them with `javac`; fix any syntax errors. 
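Step 4 can also be automated so that a broken fixture fails fast in CI instead of surfacing as a confusing resolver failure. The sketch below uses the JDK's built-in compiler API, `javax.tools.ToolProvider.getSystemJavaCompiler()`. It assumes:

- The tests run on a full JDK; the method returns `null` on a bare JRE.
- Each scenario directory compiles as one unit.

`FixtureCompileCheck` is a hypothetical helper name. The `-proc:none` option only disables annotation processing for the check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

/** Hypothetical helper: compiles a fixture directory with javac and reports errors. */
final class FixtureCompileCheck {
    private FixtureCompileCheck() {}

    /** Returns one message per compile error; an empty list means the fixtures are valid Java. */
    static List<String> compile(Path scenarioDir, Path outDir) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler; run the check on a JDK");
        }
        List<Path> sources;
        try (Stream<Path> files = Files.walk(scenarioDir)) {
            sources = files.filter(p -> p.toString().endsWith(".java")).sorted().toList();
        }
        if (sources.isEmpty()) return List.of();

        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        try (StandardJavaFileManager fm = javac.getStandardFileManager(diagnostics, Locale.ROOT, null)) {
            Iterable<? extends JavaFileObject> units = fm.getJavaFileObjectsFromPaths(sources);
            List<String> options = List.of("-d", outDir.toString(), "-proc:none");
            boolean ok = javac.getTask(null, fm, diagnostics, options, null, units).call();

            List<String> errors = new ArrayList<>();
            for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
                if (d.getKind() == Diagnostic.Kind.ERROR) {
                    String where = d.getSource() == null ? "javac" : d.getSource().getName();
                    errors.add(where + ":" + d.getLineNumber() + ": " + d.getMessage(Locale.ROOT));
                }
            }
            if (!ok && errors.isEmpty()) errors.add("javac failed without diagnostics");
            return errors;
        }
    }
}
```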
+ +- [ ] **Step 5: Run resolver tests; iteratively fix any unexpected resolver behavior.** + +- [ ] **Step 6: Commit (after each batch of ~5 scenarios passes):** `test(resolver/java): add Layer 1 scenarios `. + +--- + +## Phase 4 — Pipeline wiring (Tasks 19–21) + +### Task 19: Wire `ResolverRegistry` into `Analyzer.run()` + +- [ ] **Step 1: Failing test** (`AnalyzerResolverWiringTest`): assert `Analyzer.run(rootPath)` calls `registry.bootstrap(rootPath)` exactly once before any file is processed. + +- [ ] **Step 2: Run; fail.** + +- [ ] **Step 3: Inject `ResolverRegistry` into `Analyzer` (constructor injection, additive).** Add the bootstrap call at the top of `run()`. Order: discovery → resolver bootstrap → file iteration. (Discovery first so we know there's something to scan.) + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(analyzer): bootstrap ResolverRegistry once per run`. + +--- + +### Task 20: Wire per-file resolution into the file-iteration loop + +- [ ] **Step 1: Failing test:** assert that for each file, `registry.resolverFor(file.language()).resolve(...)` is called and the returned `Resolved` is set on the `DetectorContext`. + +- [ ] **Step 2: Fail.** + +- [ ] **Step 3: Update the file-iteration block in `Analyzer`** to call `registry.resolverFor(file.language()).resolve(file, parsedAst)` and stuff the result into `DetectorContext.builder().resolved(...)`. Catch `ResolutionException` per file (log DEBUG, fall back to `EmptyResolved`). + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(analyzer): per-file symbol resolution wired into pipeline`. + +--- + +### Task 21: Mirror in `IndexCommand` + +`IndexCommand` has its own batched H2 pipeline that's not entirely shared with `Analyzer`. Mirror the resolver bootstrap + per-file resolve path there. + +- [ ] **Step 1: Failing test** (`IndexCommandResolverWiringTest`). 
+- [ ] **Step 2: Fail.** +- [ ] **Step 3: Update `IndexCommand` similarly** — same constructor injection of `ResolverRegistry`, same call shape. +- [ ] **Step 4: Pass.** +- [ ] **Step 5: Commit:** `feat(cli): wire ResolverRegistry into IndexCommand`. + +--- + +## Phase 5 — Configuration (Tasks 22–23) + +### Task 22: `intelligence.symbol_resolution.java.*` config keys + +- [ ] **Step 1: Failing test** (`UnifiedConfigResolverKeysTest`): assert config object after parsing the example YAML carries `enabled = true`, `sourceRoots = "auto"`, `jdkReflection = true`, `bootstrapTimeoutSeconds = 30`, `maxPerFileResolveMs = 500`. + +- [ ] **Step 2: Fail.** + +- [ ] **Step 3: Add the new section + binding code** in unified config + `CodeIqConfig` legacy bridge (per `UnifiedConfigBeans`). + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Commit:** `feat(config): add intelligence.symbol_resolution.java.* keys`. + +--- + +### Task 23: Document the keys in `docs/codeiq.yml.example` + +- [ ] **Step 1: Add the YAML block** matching spec §7 verbatim. +- [ ] **Step 2: Run `codeiq config validate`** against the example file (after building the JAR if needed) to confirm it parses. +- [ ] **Step 3: Commit:** `docs(config): document intelligence.symbol_resolution.java.* keys`. + +--- + +## Phase 6 — Detector migration (Tasks 24–29) + +Each migration follows the same TDD pattern. Concrete code differs per detector, but the test scaffolding is identical. + +### Task pattern (apply to each detector below) + +For detector `Detector`: + +- [ ] **Step 1: Read current detector + test** so you have the existing edge logic in context. + +```bash +sed -n '1,200p' src/main/java/io/github/randomcodespace/iq/detector/jvm/java/Detector.java +``` + +- [ ] **Step 2: Add three new test methods to `DetectorTest`:** + - `resolvedModeProducesResolvedEdge` — feed a fixture where the receiver type would be ambiguous lexically; with resolved context, assert edge target is the *correct* node ID. 
+ - `fallbackModeMatchesPreSpecBaseline` — `ctx.resolved().isEmpty()`; assert logical-content output identical to the baseline (modulo additive fields). + - `mixedModeUsesResolverWhereAvailable` — half the files have resolved context, half don't; assert per-file confidence labelling. + +- [ ] **Step 3: Run; see fails.** + +- [ ] **Step 4: Update the detector to:** + - Accept `ctx.resolved()` as `Optional<Resolved>`. + - When present and an instance of `JavaResolved`, use `solver` to resolve receiver types / generic args / referenced classes for the specific edges relevant to this detector. + - Stamp `Confidence.RESOLVED` on resolved-mode edges; existing path stamps base-class default. + +- [ ] **Step 5: Run all `DetectorTest`; verify pass + no regression.** + +- [ ] **Step 6: Run determinism case** (run detector twice on same input, assert byte-identical output). + +- [ ] **Step 7: Commit:** `feat(detector/): use resolved symbol info for `. + +### Task 24: `SpringServiceDetector` migration + +- Resolves `@Autowired UserService userService` to the actual `UserService` class node ID. +- Edge: `INJECTS` from the consumer class to the declared `UserService` type. +- Fixture: two `UserService` classes in different packages; assert resolution picks the imported one. + +### Task 25: `SpringRepositoryDetector` migration + +- Resolves the entity type parameter on `JpaRepository<T, ID>`. +- Edge: `MAPS_TO` from repository interface to the resolved entity class. + +### Task 26: `JpaEntityDetector` migration + +- Resolves generic args on `@OneToMany List`. +- Edge: `MAPS_TO` between entities (the holder and the related entity). + +### Task 27: `JpaRepositoryDetector` migration + +- Same as Spring repo, deeper. Resolves derived-query method-name return types where applicable (less reliable; flag as `Confidence.SYNTACTIC` if resolution is partial).
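Across Tasks 24 to 27, the per-edge confidence decision reduces to one small rule:

- Fall back to the base-class default when no resolver ran or the target could not be resolved.
- Claim `RESOLVED` only for fully resolved targets.
- Label partially resolved targets `SYNTACTIC`, as Task 27 asks.

The sketch below uses stand-in types. `TargetResolution` and `EdgeConfidence` are hypothetical names, not existing classes.

```java
import java.util.Optional;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

interface Resolved { boolean isAvailable(); }

/** How far a detector got when resolving one edge target (hypothetical). */
enum TargetResolution { FULL, PARTIAL, NONE }

final class EdgeConfidence {
    private EdgeConfidence() {}

    static Confidence choose(Optional<? extends Resolved> ctxResolved,
                             TargetResolution outcome,
                             Confidence baseDefault) {
        boolean resolverActive = ctxResolved.map(Resolved::isAvailable).orElse(false);
        if (!resolverActive || outcome == TargetResolution.NONE) {
            return baseDefault;            // fallback path: same labelling as before the migration
        }
        if (outcome == TargetResolution.PARTIAL) {
            return Confidence.SYNTACTIC;   // e.g. derived-query return types (Task 27)
        }
        return Confidence.RESOLVED;
    }
}
```

Treating an `EmptyResolved`-style sentinel (`isAvailable() == false`) the same as `Optional.empty()` keeps the `fallbackModeMatchesPreSpecBaseline` expectation simple in both cases.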
+ +### Task 28: `KafkaListenerDetector` migration + +- Resolves `@KafkaListener(topics = TOPIC_CONST)` where `TOPIC_CONST` is a static field — produce edges to the resolved topic name. +- Edge: `LISTENS` to the topic node. + +### Task 29: `SpringRestDetector` migration + +- Resolves `@RequestBody UserDto dto` and `@PathVariable` types. +- Edge: `MAPS_TO` from endpoint node to the resolved DTO class. + +--- + +## Phase 7 — Aggressive testing layers (Tasks 30–38) + +### Task 30: Layer 6 — Determinism (resolver-stage) + +**Files:** +- Create: `JavaSymbolResolverDeterminismTest.java` + +- [ ] **Step 1: Failing test.** Run the resolver twice against the same fixture; assert byte-identical serialized `Resolved` output (use Jackson with stable ordering). + +- [ ] **Step 2: Fail.** + +- [ ] **Step 3: Confirm resolver implementation already sorts source roots, uses `TreeMap` etc. — fix if not.** + +- [ ] **Step 4: Pass.** + +- [ ] **Step 5: Add the second variant: source roots passed in different order, same output.** + +- [ ] **Step 6: Commit:** `test(resolver/java): determinism — Layer 6`. + +--- + +### Task 31: Layer 3 — Concurrency stress + +**Files:** +- Create: `JavaSymbolResolverConcurrencyTest.java` + +- [ ] **Step 1: Generate 1000 synthetic Java files** in `@TempDir` (one class each, distinct names). Single source root. + +- [ ] **Step 2: Failing test:** resolve all 1000 files via virtual-thread fan-out; assert no exceptions, no duplicate node IDs in the union of `Resolved` outputs, total time within 2× the sequential baseline. + +- [ ] **Step 3: Fail/pass.** If fail, investigate (likely: bootstrap not idempotent under concurrent first-call). Add a `synchronized`/`volatile` initialization guard. + +- [ ] **Step 4: Add invocation-count test** — bootstrap is called exactly once even under N concurrent first-callers. + +- [ ] **Step 5: Commit:** `test(resolver/java): concurrency stress — Layer 3`. 
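Steps 3 and 4 call for an initialization guard that runs bootstrap exactly once, even when many callers arrive concurrently. A common way to do that is double-checked locking on a `volatile` field, sketched here as a generic wrapper; `OnceInitializer` is a hypothetical name. If the initializer throws, nothing is cached and the next caller retries. That is one possible policy; the plan does not specify one.

```java
import java.util.Objects;
import java.util.concurrent.Callable;

/** Runs an expensive initializer at most once successfully, even under concurrent first calls. */
final class OnceInitializer<T> {
    private final Callable<T> initializer;
    private volatile T value;   // the volatile write publishes the fully built value

    OnceInitializer(Callable<T> initializer) {
        this.initializer = Objects.requireNonNull(initializer);
    }

    T get() throws Exception {
        T v = value;                        // fast path: no locking after initialization
        if (v != null) {
            return v;
        }
        synchronized (this) {
            if (value == null) {            // re-check: another thread may have won the race
                value = Objects.requireNonNull(initializer.call(), "initializer returned null");
            }
            return value;
        }
    }
}
```

The invocation-count test from Step 4 maps directly onto this wrapper. Submit a thousand `get()` calls to `Executors.newVirtualThreadPerTaskExecutor()`, then assert that the initializer's counter reads exactly 1.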
+ +--- + +### Task 32: Layer 4 — Memory / pathological + +**Files:** +- Create: `JavaSymbolResolverPathologicalTest.java` + +- [ ] **Step 1: Generate fixtures** (synthesizable in setup): + - 10K-line class with mostly trivial methods. + - File with 1000 imports (most unresolvable). + - 10-deep generic nesting. + +- [ ] **Step 2: Failing tests under `-Xmx512m`** (set via Surefire config in pom). + +- [ ] **Step 3: Run; pass or fix.** Likely passes; if not, investigate JavaSymbolSolver's caching footprint. + +- [ ] **Step 4: Add timeout assertion** — each pathological case completes within `max_per_file_resolve_ms`. + +- [ ] **Step 5: Commit:** `test(resolver/java): pathological inputs — Layer 4`. + +--- + +### Task 33: Layer 5 — Adversarial + +- [ ] **Step 1:** Cover the spec §12 layer 5 cases: syntax-error file, mis-tagged language, mixed source root, ReflectionTypeSolver disabled (config flag). +- [ ] **Step 2:** Run; fix. +- [ ] **Step 3: Commit:** `test(resolver/java): adversarial inputs — Layer 5`. + +--- + +### Task 34: Layer 7 — E2E petclinic regression + +**Files:** +- Modify: existing `E2EQualityTest` (extend) or create `E2EQualityResolverTest`. + +- [ ] **Step 1: Capture baseline numbers.** Run `E2EQualityTest` with `intelligence.symbol_resolution.java.enabled=false`. Record edge precision/recall against `src/test/resources/e2e/ground-truth-petclinic.json`. Save to a baseline JSON checked into the test resources. + +- [ ] **Step 2: Run with `enabled=true`. Record post-change numbers.** + +- [ ] **Step 3: Failing assertion:** `precision_after > precision_before AND recall_after >= recall_before` (improvement on at least one, no regression on the other). + +- [ ] **Step 4: If precision/recall didn't move: investigate why.** Likely the migrated detectors aren't producing the expected resolved edges yet — go back to Phase 6 and fix. + +- [ ] **Step 5: Commit:** `test(e2e): petclinic resolver-mode improvement gate — Layer 7`. 
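The gate in Step 3 needs precision and recall over edge sets. The sketch assumes that edges from both the analyzer output and `ground-truth-petclinic.json` are normalized to comparable string keys; the key format is up to the implementation. `passesGate` encodes the inequality exactly as written in Step 3. That inequality is stricter than the parenthetical "improvement on at least one" reading, because it always requires precision to rise.

```java
import java.util.HashSet;
import java.util.Set;

/** Edge-level precision and recall against a ground-truth set of normalized edge keys. */
record EdgeScore(double precision, double recall) {

    static EdgeScore of(Set<String> predicted, Set<String> groundTruth) {
        Set<String> truePositives = new HashSet<>(predicted);
        truePositives.retainAll(groundTruth);
        double precision = predicted.isEmpty() ? 0.0 : (double) truePositives.size() / predicted.size();
        double recall = groundTruth.isEmpty() ? 0.0 : (double) truePositives.size() / groundTruth.size();
        return new EdgeScore(precision, recall);
    }

    /** Step 3's gate as written: precision strictly improves and recall does not drop. */
    static boolean passesGate(EdgeScore before, EdgeScore after) {
        return after.precision() > before.precision() && after.recall() >= before.recall();
    }
}
```

For example, predicting `{a, b, c, d}` against ground truth `{a, b, e}` gives two true positives, so precision is 2/4 = 0.5 and recall is 2/3.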
+ +--- + +### Task 35: Layer 8 — Property-based (jqwik) — license check first + +- [ ] **Step 1: License check.** jqwik is EPL-2.0. Per `~/.claude/rules/dependencies.md` it's not on the preferred (MIT/Apache/BSD) list. **Ask the user explicitly before adding.** If declined, write hand-rolled randomized generators using existing JUnit + `java.util.Random` instead. + +- [ ] **Step 2: If approved, add jqwik to `pom.xml`** at test scope. Resolve latest stable via `context7`. + +- [ ] **Step 3: Failing properties:** + - `forall valid_java_source: resolver does not throw unchecked` (only `ResolutionException`). + - `forall valid_java_source: resolver terminates within max_per_file_resolve_ms`. + - `forall valid_java_source × file_in_unrelated_root: editing file_in_unrelated_root does not change resolution of valid_java_source`. + +- [ ] **Step 4: Run; iterate.** + +- [ ] **Step 5: Commit:** `test(resolver/java): property-based — Layer 8`. + +--- + +### Task 36: Layer 9 — PIT mutation testing (non-gating profile) + +- [ ] **Step 1: Add PIT plugin to `pom.xml` under a non-default profile** `mutation`. + +```xml +<profile> + <id>mutation</id> + <build> + <plugins> + <plugin> + <groupId>org.pitest</groupId> + <artifactId>pitest-maven</artifactId> + <version>1.18.0</version> + <configuration> + <targetClasses> + <param>io.github.randomcodespace.iq.intelligence.resolver.*</param> + <param>io.github.randomcodespace.iq.model.Confidence</param> + </targetClasses> + </configuration> + </plugin> + </plugins> + </build> +</profile> +``` + +- [ ] **Step 2: Run** `mvn -P mutation test-compile pitest:mutationCoverage -Dfrontend.skip=true -Ddependency-check.skip=true` (PIT mutates already-compiled classes, so `test-compile` runs first). + +- [ ] **Step 3: Inspect the mutation kill rate.** Target ≥ 80% on the new packages. If lower, add focused tests until the rate clears 80%. + +- [ ] **Step 4: Commit:** `test(resolver): mutation testing profile (PIT) — Layer 9`.
+ +--- + +### Task 37: Aggregate test gate + +- [ ] **Step 1: Run full `mvn test` with both config states.** + +```bash +# enabled=false +CODEIQ_INTELLIGENCE_SYMBOL_RESOLUTION_JAVA_ENABLED=false \ + mvn test -Dfrontend.skip=true -Ddependency-check.skip=true + +# enabled=true (default) +mvn test -Dfrontend.skip=true -Ddependency-check.skip=true +``` + +- [ ] **Step 2: Fix any unexpected failure.** + +- [ ] **Step 3: Run `mvn verify` for the security gate** (this downloads NVD on first run — allow ~10 min). + +```bash +mvn verify -Dfrontend.skip=true +``` + +- [ ] **Step 4: Commit:** `test: aggregate gate green for sub-project 1`. + +--- + +### Task 38: Performance gate + +- [ ] **Step 1: Time `index` against `spring-petclinic`.** + +```bash +time java -jar target/code-iq-*-cli.jar index $E2E_PETCLINIC_DIR +``` + +Compare to the pre-change baseline (run on `main` once, before this branch's first impl commit landed). Acceptance: bootstrap < 10 s; per-Java-file resolve median ≤ 200 ms; total Java analysis time ≤ +60% of baseline. + +- [ ] **Step 2: If exceeded, profile** with `async-profiler` or VisualVM. Fix the regression. (Spec §9 documents the budget; exceeding it without justification is a bug.) + +- [ ] **Step 3: Record numbers in PR description.** + +- [ ] **Step 4: No commit needed unless a fix landed.** + +--- + +## Phase 8 — Doc updates + PR (Tasks 39–42) + +### Task 39: Expand `CHANGELOG.md` `[Unreleased]` entry + +- [ ] **Step 1: Add an `### Added` bullet** under `[Unreleased]` describing the resolver SPI, Java pilot, confidence/provenance schema, cache-version bump, migrated detectors. Cross-reference the spec at `docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md`. + +- [ ] **Step 2: Add a `### Changed` bullet** noting `CACHE_VERSION` 4 → 5 (one-time cache rebuild on first run after upgrade). + +- [ ] **Step 3: Commit:** `docs(changelog): add sub-project 1 entry`. 
+
+---
+
+### Task 40: `CLAUDE.md` Gotchas update
+
+- [ ] **Step 1: Add bullets:**
+  - Confidence + source are now mandatory on every node/edge — base classes set defaults; detectors override to `RESOLVED` when consuming `ctx.resolved()`.
+  - The pipeline now has a resolve stage between parse and detect. Profile selection unchanged.
+  - `CACHE_VERSION` is 5 — bumping invalidates all existing `.codeiq/cache/` dirs on first run.
+  - `intelligence.symbol_resolution.java.enabled=false` is the off-switch for raw-speed scans or backward-compat snapshots.
+
+- [ ] **Step 2: Commit:** `docs(claude): gotchas for sub-project 1`.
+
+---
+
+### Task 41: `PROJECT_SUMMARY.md` updates
+
+- [ ] **Step 1: Tech-stack row addition:** `| AST + symbols | JavaParser 3.28.0 + javaparser-symbol-solver-core | pom.xml |`.
+
+- [ ] **Step 2: Gotchas updates:** mention `Confidence`, the resolve stage, the `CACHE_VERSION` bump.
+
+- [ ] **Step 3: Commit:** `docs(summary): note resolver pipeline + Confidence schema`.
+
+---
+
+### Task 42: Push branch + open PR
+
+- [ ] **Step 1: Push branch** to `origin`.
+
+```bash
+git push -u origin feat/sub-project-1-resolver-spi-and-java-pilot
+```
+
+- [ ] **Step 2: Open PR via `gh`.**
+
+```bash
+gh pr create --title "feat: sub-project 1 — resolver SPI + Java pilot + confidence schema" \
+  --body "$(cat <<'EOF'
+## Summary
+- Symbol-resolution stage between parse and detect, per-language `SymbolResolver` SPI auto-discovered as Spring `@Component`s.
+- Java backend wraps JavaParser's `JavaSymbolSolver` (no new dependency tree — same release train as `javaparser-core`).
+- `Confidence` enum (`LEXICAL`/`SYNTACTIC`/`RESOLVED`) and `source` field on every `CodeNode` / `CodeEdge`, round-tripped through Neo4j (bare `confidence` / `source` properties on nodes and `RELATES_TO` relationships) and H2 cache (schema v5).
+- 4–6 Java detectors migrated as proof of value (Spring service / repository, JPA entity / repo, Kafka listener, Spring REST).
+- 9 layers of aggressive testing (unit, integration, concurrency, pathological, adversarial, determinism, E2E petclinic regression, property-based via jqwik [pending license OK], PIT mutation profile).
+
+## Spec
+[`docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md`](docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md)
+
+## Acceptance criteria
+See spec §13. All checked.
+
+## Test plan
+- [x] `mvn verify` green on CI
+- [x] No logical-content regression with `enabled: false` (snapshots refreshed in separate commits — see history)
+- [x] E2E petclinic precision / recall measurably up with `enabled: true` (numbers below)
+- [x] Determinism gate: resolver runs byte-identical 10× on same input
+- [x] Concurrency stress: 1000 files via virtual threads, no deadlocks
+- [x] Layer 8 jqwik / Layer 9 PIT non-gating signals captured in the PR
+
+## Petclinic numbers
+| Metric | enabled=false (baseline) | enabled=true (this PR) | Δ |
+|---|---|---|---|
+| edge precision | _filled at impl time_ | _filled at impl time_ | + |
+| edge recall | _filled at impl time_ | _filled at impl time_ | + |
+
+## Out of scope
+- Sub-projects 2–8 (TS / Python / Go / Rust / C++ / C# resolvers, framework-aware detect refactor, FP harness, MCP read-path hardening). Each gets its own spec → plan → impl cycle.
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+EOF
+)"
+```
+
+- [ ] **Step 3: Wait for CI;** if any failure, fix on the branch and push (do not `--amend` and force-push). Repeat until CI green.
+
+- [ ] **Step 4: Hand back to user** per default check-in cadence (b): "PR is open, tests green, ready for human review."
+
+---
+
+## Self-review (run after writing the plan, before execution)
+
+1. **Spec coverage** — every acceptance criterion (§13) maps to at least one task. Verified.
+2. **Placeholder scan** — no "TBD"/"TODO"/"figure out"; concrete code blocks for foundational tasks; templated patterns for repeated migrations. Acceptable per skill DRY guidance.
+3. **Type / naming consistency** — `Confidence`, `Resolved`, `EmptyResolved`, `SymbolResolver`, `ResolverRegistry`, `JavaSymbolResolver`, `JavaResolved`, `JavaSourceRootDiscovery` — all referenced consistently across tasks.
+4. **Backward compatibility** — Phase 6 detectors keep their existing logic; resolver consumption is purely additive.
+5. **Determinism** — Tasks 30, 31 (concurrency), and detector determinism (per Task pattern Step 6) all preserve the determinism gate.
+6. **Performance budget** — Task 38 explicitly checks the spec §9 numbers.
+7. **License decisions** — Task 35 (jqwik) is gated on user approval; Task 36 (PIT) is Apache-2.0, fine.
+8. **Test refresh hazard** — Task 7 isolates the snapshot refresh into its own commit chain so reviewers can verify the diff is bounded to the additive fields.
diff --git a/docs/project/architecture.md b/docs/project/architecture.md
new file mode 100644
index 00000000..43186093
--- /dev/null
+++ b/docs/project/architecture.md
@@ -0,0 +1,132 @@
+# Architecture
+
+## High-level shape
+
+codeiq is a **two-mode Spring Boot application** that ships as one JAR with the React SPA bundled inside:
+
+- **Indexing mode** (`index`, `enrich`, and most other CLI commands): Spring profile `indexing`, no web server, virtual-thread-driven file scanning + detector pipeline writing to H2 (cache) then Neo4j Embedded (graph).
+- **Serving mode** (`serve` only): Spring profile `serving`, web server up, REST API + Spring AI MCP server + React SPA reading from the already-populated Neo4j directory. Strictly read-only — no detector code runs in this profile.
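The two modes above come down to one startup decision on the first CLI argument: `serve` selects the `serving` profile with a web server, and every other command selects `indexing` without one. A minimal sketch of that mapping as a hypothetical pure helper; it is not the actual `CodeIqApplication.main`, and treating an empty argument list as indexing is an assumption of this sketch.

```java
final class ModeSelection {
    enum Mode {
        INDEXING("indexing", false),
        SERVING("serving", true);

        final String springProfile;
        final boolean webServer;

        Mode(String springProfile, boolean webServer) {
            this.springProfile = springProfile;
            this.webServer = webServer;
        }
    }

    // Only "serve" (case-insensitive) starts the web server; everything else indexes.
    // Assumption: no arguments at all also maps to INDEXING.
    static Mode forArgs(String[] args) {
        String command = args.length > 0 ? args[0] : "";
        boolean isServe = "serve".equalsIgnoreCase(command);
        return isServe ? Mode.SERVING : Mode.INDEXING;
    }
}
```

In a Spring Boot entry point, a result like this would typically feed `SpringApplication.setAdditionalProfiles(...)` and, for indexing, `setWebApplicationType(WebApplicationType.NONE)`; whether codeiq wires it exactly that way is not stated here.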
+
+```
+                 ┌──────────────────────────┐
+ filesystem ───► │ index (FileDiscovery +   │
+ (any repo)      │ Detectors + GraphBuilder)│ ──► H2 cache (.codeiq/cache/)
+                 └──────────────────────────┘
+                                                      │
+                 ┌──────────────────────────┐         │
+                 │ enrich (Linkers +        │ ◄───────┘
+                 │ LayerClassifier +        │
+                 │ ServiceDetector +        │
+                 │ LanguageEnricher +       │
+                 │ LexicalEnricher)         │ ──► Neo4j (.codeiq/graph/graph.db)
+                 └──────────────────────────┘                   │
+                                                                │
+ developer / agent ◄── REST + MCP + React SPA ◄──── serve ◄─────┘
+ (read-only)                                  (Spring profile = serving)
+```
+
+Profile selection happens in `CodeIqApplication.java`'s `main` (around the `boolean isServe = "serve".equalsIgnoreCase(command)` block): the first CLI arg is matched against `serve` → `serving`; everything else → `indexing`. `indexing` sets `WebApplicationType.NONE`.
+
+## Components
+
+### Pipeline orchestrator (`analyzer/`)
+- **Lives in:** `src/main/java/io/github/randomcodespace/iq/analyzer/`
+- **Responsibility:** Discover files, route to parsers, fan out to detectors on virtual threads, fold results into a single graph buffer, then run cross-file linkers and the layer classifier.
+- **Key files:**
+  - `Analyzer.java` — top-level pipeline (in-memory mode for `analyze` command).
+  - `FileDiscovery.java` — `git ls-files` first, falls back to directory walk; maps extensions → languages via `FileClassifier.java`.
+  - `StructuredParser.java` — routes Java to JavaParser, ANTLR-supported langs to `grammar/AntlrParserFactory.java`, others to raw text.
+  - `GraphBuilder.java` — buffered build (nodes-first, then edges) — determinism guarantee.
+  - `LayerClassifier.java` — sets `layer ∈ {frontend|backend|infra|shared|unknown}` on every node.
+  - `ServiceDetector.java` — filesystem walk for build files (30+ build systems) → SERVICE nodes with `CONTAINS` edges.
+  - `linker/` — 4 linkers run after detectors: `EntityLinker`, `GuardLinker`, `ModuleContainmentLinker`, `TopicLinker` (`Linker.java` is the interface; `LinkResult.java` is the return type).
+  - `ConfigScanner.java`, `InfrastructureRegistry.java`, `ArchitectureKeywordFilter.java` — supporting passes.
+- **Talks to:** `detector/` (fan-out), `cache/AnalysisCache.java` (write), `graph/GraphStore.java` (write — only during `enrich`).
+- **Owns:** in-memory graph buffer during a single run.
+
+### Detector layer (`detector/`)
+- **Lives in:** `src/main/java/io/github/randomcodespace/iq/detector/`
+- **Responsibility:** 99 concrete detectors that turn parsed files into nodes + edges. Auto-discovered as Spring `@Component`s; no registry to maintain.
+- **Categories (one subdir each):** `auth/`, `csharp/`, `frontend/`, `generic/`, `go/`, `iac/`, `jvm/{java,kotlin,scala}/`, `markup/`, `proto/`, `python/`, `script/{shell,...}/`, `sql/`, `structured/`, `systems/{cpp,rust}/`, `typescript/`.
+- **Base classes:** `Detector` (interface), `AbstractRegexDetector`, `AbstractJavaParserDetector`, `AbstractAntlrDetector`, `AbstractStructuredDetector`, `AbstractPythonAntlrDetector`, `AbstractPythonDbDetector`, `AbstractTypeScriptDetector`, `AbstractJavaMessagingDetector`. Plus three static helpers: `DetectorDbHelper`, `FrontendDetectorHelper`, `StructuresDetectorHelper`. Full table: see [`conventions.md`](conventions.md) §"Detector base classes".
+- **Talks to:** parsed AST input (JavaParser CompilationUnit, ANTLR ParseTree, or raw text) via `DetectorContext`. Returns its nodes and edges in a per-call `DetectorResult`.
+- **Owns:** nothing — must be stateless. Spring beans are singletons.
+
+### Graph store (`graph/`)
+- **Lives in:** `src/main/java/io/github/randomcodespace/iq/graph/`
+- **Responsibility:** Facade over Neo4j Embedded — UNWIND-batched bulk save for writes, raw Cypher for reads (no Spring Data Neo4j hydration on the read path for performance).
+- **Key files:**
+  - `GraphStore.java` — `bulkSave(List, List)`, `queryNodes(...)`, fulltext search via `db.index.fulltext.queryNodes`. Creates 5 indexes on first save (3 b-tree + 2 fulltext — see [`data-model.md`](data-model.md)).
+  - `GraphRepository.java` — Spring Data Neo4j repository, used **only on the write path** (legacy).
+- **Talks to:** Neo4j Embedded via `org.neo4j.graphdb` API (no Bolt for in-process reads).
+- **Owns:** the Neo4j directory at `.codeiq/graph/graph.db/`.
+
+### Analysis cache (`cache/`)
+- **Lives in:** `src/main/java/io/github/randomcodespace/iq/cache/`
+- **Responsibility:** Per-file content-hash cache so re-running `index` only re-detects changed files.
+- **Key files:** `AnalysisCache.java` (H2 schema + read/write API, `ReentrantReadWriteLock`-guarded, `CACHE_VERSION = 5`), `FileHasher.java` (SHA-256, 64-character hex output).
+
+### REST API (`api/`)
+- **Lives in:** `src/main/java/io/github/randomcodespace/iq/api/`
+- **Files:** `GraphController.java` (`/api/**`), `FlowController.java` (`/api/flow/**`), `TopologyController.java` (`/api/topology/**`), `IntelligenceController.java` (`/api/intelligence/**`), `SafeFileReader.java` (helper, path-traversal guard).
+- All controllers carry `@Profile("serving")` — they aren't loaded in indexing mode.
+- 37 endpoints, all read-only. Full enumeration in [`CLAUDE.md`](../../CLAUDE.md) §"Server Endpoints".
+
+### MCP server (`mcp/`)
+- **File:** `src/main/java/io/github/randomcodespace/iq/mcp/McpTools.java` — 34 `@McpTool`-annotated methods. Spring AI's `spring-ai-starter-mcp-server-webmvc` auto-registers them on a streamable HTTP transport at `/mcp`. Read-only.
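The analysis cache section above keys each file by a SHA-256 content hash rendered as 64 hex characters, so a re-run of `index` can skip files whose hash has not changed. A minimal sketch of that idea using only the JDK (`MessageDigest`, `HexFormat` from Java 17+); it is not the actual `FileHasher.java`, and `needsReDetect` with its in-memory map is a hypothetical stand-in for the H2-backed lookup.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;

final class ContentHashSketch {
    // SHA-256 over the raw bytes, rendered as 64 lowercase hex characters.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical cache check: re-detect when the stored hash is missing or different.
    static boolean needsReDetect(Map<String, String> cachedHashes, String path, byte[] content) {
        return !sha256Hex(content).equals(cachedHashes.get(path));
    }
}
```

Hashing content rather than comparing modification times keeps the decision stable across fresh clones and branch switches, which rewrite timestamps without changing bytes.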
+
+### Intelligence enrichment (`intelligence/`)
+- **Lives in:** `src/main/java/io/github/randomcodespace/iq/intelligence/`
+- **Sub-packages:** `lexical/` (doc-comment + snippet enrichment), `extractor/` (per-language extractors: `java/`, `typescript/`, `python/`, `go/`), `evidence/` (evidence-pack assembly for retrieval), `query/` (`QueryPlanner` for intelligent routing).
+- Runs during `enrich` after structural data is in Neo4j; produces `prop_lex_*` properties indexed by the `lexical_index` fulltext index.
+
+### CLI (`cli/`)
+- **Lives in:** `src/main/java/io/github/randomcodespace/iq/cli/`
+- **Files:** 20 — `CodeIqCli.java` (top-level), 14 commands (`Index`, `Enrich`, `Serve`, `Analyze`, `Stats`, `Graph`, `Query`, `Find`, `Cypher`, `Topology`, `Flow`, `Bundle`, `Cache`, `Plugins`), config subcommands (`ConfigCommand`, `ConfigExplainSubcommand`, `ConfigValidateSubcommand`), `VersionCommand`, helper `CliOutput`.
+- All commands are `@Component`s; Picocli + Spring integration via `picocli-spring-boot-starter`.
+
+### React SPA (`src/main/frontend/`)
+- See [`ui.md`](ui.md). Vite builds into `src/main/resources/static/` — Spring Boot's static handler serves it from inside the JAR when `codeiq.ui.enabled=true`.
+
+## Layering / dependency rules
+
+The package graph follows a one-way flow:
+
+```
+cli/ ──► analyzer/ ──► detector/ ─► model/
+           │              │
+           └► linker/     └► grammar/ (ANTLR factory)
+           │
+           ├► cache/ (H2)
+           └► graph/ (Neo4j) ──► api/ ──► query/ (read path)
+                             │
+                             └► mcp/ (same QueryService)
+```
+
+- `model/` (CodeNode, CodeEdge, NodeKind, EdgeKind) is the dependency floor — depends on nothing in this codebase.
+- `detector/` may import `model/` and `grammar/` — never `analyzer/`, `cli/`, or `api/`.
+- `api/` and `mcp/` may import `query/` and `model/` — never `detector/` or `analyzer/` (read-only at serving time).
+- `analyzer/` may import everything below it — it's the orchestrator.
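The "never" rules in the list above can be checked mechanically by scanning a file's import lines. A minimal sketch, assuming the base package `io.github.randomcodespace.iq.` from the documented source paths; `LayeringCheck`, its rule table, and the line-based scan are hypothetical and not part of the build (a real check would more likely use an AST or a dedicated architecture-testing library).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LayeringCheck {
    private static final String BASE = "io.github.randomcodespace.iq.";

    // Forbidden target packages per source package, taken from the "never" rules.
    private static final Map<String, List<String>> FORBIDDEN = Map.of(
        "detector", List.of("analyzer", "cli", "api"),
        "api", List.of("detector", "analyzer"),
        "mcp", List.of("detector", "analyzer"));

    // Returns the import lines in `source` that break the rules for `sourcePackage`.
    static List<String> violations(String sourcePackage, String source) {
        List<String> bad = new ArrayList<>();
        List<String> banned = FORBIDDEN.getOrDefault(sourcePackage, List.of());
        for (String line : source.split("\\R")) {
            String trimmed = line.strip();
            if (!trimmed.startsWith("import ")) continue; // also covers "import static"
            for (String target : banned) {
                if (trimmed.contains(BASE + target + ".")) {
                    bad.add(trimmed);
                    break;
                }
            }
        }
        return bad;
    }
}
```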
+
+The `@Profile("serving")` annotation on every controller and on Neo4j-only beans (see `config/Neo4jConfig.java`) is what enforces "no writes during serving" at runtime; the package layering is convention, not a lint rule.
+
+## Cross-cutting concerns
+
+- **Logging:** SLF4J + Spring Boot's default Logback. `application.yml` quiets noisy `org.springframework.ai.mcp` and `PostProcessorRegistrationDelegate` to WARN.
+- **Error handling:** Pipeline errors are logged + counted, never abort a whole run. Detector exceptions are caught per-file (the run continues with a logged warning); see `Analyzer.java` task wrapping. CLI commands return `int` exit codes via Picocli.
+- **Auth / authz:** None — codeiq runs on the developer's machine. The serving layer trusts the loopback caller. CORS is configurable via `codeiq.cors.allowed-origin-patterns` (`application.yml` / `CorsConfig.java`).
+- **Observability:** Spring Boot Actuator (`/actuator/health` with liveness + readiness probes per `application.yml`); `health/GraphHealthIndicator.java` reports Neo4j status. No metrics export — by design (offline tool).
+- **Config:** Hierarchical, last-wins: built-in defaults → `~/.codeiq/config.yml` → `./codeiq.yml` → `CODEIQ_*` env → CLI flags. `UnifiedConfigBeans` bridges the unified config to the legacy `CodeIqConfig` bean. Spring-owned keys (`codeiq.neo4j.enabled`, `codeiq.neo4j.bolt.port`, `codeiq.cors.allowed-origin-patterns`, `codeiq.ui.enabled`) live in `application.yml` because they drive `@ConditionalOnProperty` / `@Value` wiring. Full schema: [`docs/codeiq.yml.example`](../codeiq.yml.example).
+
+## Concurrency model
+
+- Detector fan-out runs on **virtual threads** (`Executors.newVirtualThreadPerTaskExecutor()` in `Analyzer.java`). Since JEP 491 (delivered in JDK 24, so present in Java 25), `synchronized` no longer pins carrier threads, and `java.util.concurrent.locks` never did; the cache's `ReentrantReadWriteLock` is therefore chosen for its read/write semantics, not as a pinning workaround.
+- Detectors are stateless `@Component` singletons (Spring's default scope). Per-file mutable state lives in method-local `DetectorContext` / `DetectorResult` instances.
+- `GraphBuilder` collects results into indexed slots (one per file) so iteration order is independent of thread completion order — this is the determinism guarantee.
+
+## Why it's shaped this way
+
+- **Three-stage pipeline (`index`/`enrich`/`serve`) instead of one all-in-one `analyze`:** large codebases (44K+ files in the original target) exhaust the heap if scanning and Neo4j ingestion happen in the same JVM run. `index` writes to H2 in batches (default 500), `enrich` reads from H2 and bulk-loads with UNWIND. `analyze` is kept as a legacy in-memory shortcut for small repos. See `CLAUDE.md` §"Pipeline".
+- **Embedded Neo4j (not a server):** zero-ops deployment for an offline tool; bundle model means the serving host doesn't even need source code, just the `.codeiq/graph/` directory.
+- **Read-only serving layer:** lets the server be deployed to a "remote" environment where source code is forbidden, while analysis still happens on the developer's box. See [`CLAUDE.md`](../../CLAUDE.md) §"Critical Rules / Read-Only Serving Layer".
+- **Auto-discovery of detectors via `@Component`:** detectors are added by dropping a class — no registry edits, no plugin manifest. The trade-off is that mistakes (forgetting `@Component`) silently disable a detector; the `plugins` CLI command exists to introspect what's actually live.
diff --git a/docs/project/build-and-run.md b/docs/project/build-and-run.md
new file mode 100644
index 00000000..ba1d5916
--- /dev/null
+++ b/docs/project/build-and-run.md
@@ -0,0 +1,152 @@
+# Build & Run
+
+## Prerequisites
+
+- **Java 25** (Temurin recommended — pinned in CI: `.github/workflows/ci-java.yml` sets `distribution: 'temurin'` and `java-version: '25'` on `actions/setup-java`).
+- **Maven 3.9+** (Maven Wrapper not committed; `mvn` from system path is expected).
+- **Node.js + npm** for the frontend build. The `frontend-maven-plugin` (configured in `pom.xml`) downloads its own Node automatically — you don't need a system Node unless you run `npm` directly inside `src/main/frontend/`.
+- **No Docker, no Postgres, no Redis** — codeiq is offline-first. Neo4j and H2 are embedded.
+
+## First-time setup
+
+```bash
+git clone https://github.com/RandomCodeSpace/codeiq.git
+cd codeiq
+
+# Quickest validation — skip tests, skip the security gate
+mvn clean package -DskipTests -Ddependency-check.skip=true
+
+# Resulting JAR
+ls target/code-iq-*-cli.jar
+```
+
+The first `mvn verify` (the full CI gate) downloads ~1 GB of NVD data for OWASP dependency-check. Use `-Ddependency-check.skip=true` while iterating locally; CI runs the full check on every push.
+
+Source for these steps: `pom.xml` (the `` block + plugin executions further down) and [`shared/runbooks/first-time-setup.md`](../../shared/runbooks/first-time-setup.md).
+
+## Local development loop
+
+There's no hot-reload story for the Java side — codeiq is a CLI/server, not a long-running dev server. The typical loop:
+
+```bash
+# Edit Java source, then
+mvn test -Dtest=YourDetectorTest -Dfrontend.skip=true   # fastest single-test cycle
+mvn package -DskipTests -Ddependency-check.skip=true    # repackage the JAR
+java -jar target/code-iq-*-cli.jar index /path/to/scan-target
+java -jar target/code-iq-*-cli.jar enrich /path/to/scan-target
+java -jar target/code-iq-*-cli.jar serve /path/to/scan-target
+```
+
+For the **frontend** (live HMR against a running backend):
+
+```bash
+# Terminal 1 — run the Java backend
+java -jar target/code-iq-*-cli.jar serve /path/to/scan-target
+
+# Terminal 2 — run Vite dev server (proxies /api and /mcp to localhost:8080)
+cd src/main/frontend
+npm install
+npm run dev
+```
+
+Vite proxy config: `src/main/frontend/vite.config.ts` (`server.proxy` at the bottom of the file) — `/api` and `/mcp` go to `http://localhost:8080`.
+
+## Test layers
+
+- **Unit + integration (JUnit, ~236 test files):**
+  ```bash
+  mvn test                                  # all tests
+  mvn test -Dtest=SpringRestDetectorTest    # one class
+  mvn test -Dsurefire.useFile=false         # verbose stderr to console
+  ```
+  Tests live in `src/test/java/**` mirroring the source-tree package layout. **Detector tests must include positive, negative, and determinism cases** — see existing `*DetectorTest.java`.
+
+- **E2E quality tests (Context7-grounded ground truth):**
+  ```bash
+  E2E_PETCLINIC_DIR=/path/to/spring-petclinic mvn test -Dtest=E2EQualityTest
+  ```
+  Ground-truth JSON lives under `src/test/resources/e2e/ground-truth-*.json`. Skipped automatically when the env var isn't set.
+
+- **Frontend E2E (Playwright):**
+  ```bash
+  cd src/main/frontend
+  npm run test:e2e           # headless
+  npm run test:e2e:headed    # with browser visible
+  npm run test:e2e:report    # open last report
+  ```
+
+- **CI gate:**
+  ```bash
+  mvn verify
+  ```
+  Runs the JUnit suite (`mvn test`) plus `spotbugs:check` and `dependency-check:check`, both bound to the `verify` phase. Failing any of those breaks the build. See `pom.xml` plugin executions and `.github/workflows/ci-java.yml`.
+
+## Build artifacts
+
+- **What:** a single fat JAR — `target/code-iq-*-cli.jar` (Spring Boot repackaged executable JAR).
+- **Bundles:** all Java deps + the React SPA built into `src/main/resources/static/` by the `frontend-maven-plugin` during `mvn package`.
+- **Maven coordinates:** `io.github.randomcodespace.iq:code-iq` (see `` / `` in `pom.xml`). The artifactId stays `code-iq` historically; the binary command is `codeiq`.
+- **Releases:**
+  - Beta: `.github/workflows/beta-java.yml` — `workflow_dispatch` only → Sonatype Central beta + GitHub pre-release.
+  - GA: `.github/workflows/release-java.yml` — `workflow_dispatch` with a `version` input → builds a GPG-signed release commit on a detached HEAD, deploys to Sonatype Central, then pushes a GPG-signed annotated `vX.Y.Z` tag + GitHub Release. **No tag-push trigger; no auto-release on merge.** See [`shared/runbooks/release.md`](../../shared/runbooks/release.md).
+
+## Deploy
+
+There is no SaaS surface, no container image, no VPS. codeiq runs on the developer's machine. The deploy flow:
+
+1. User adds the dep / downloads the JAR from Maven Central or GitHub Releases.
+2. User runs `codeiq index → enrich → serve` against their own repo.
+3. The `serve` mode binds `0.0.0.0:8080` by default, which listens on every network interface: other machines can reach it unless a firewall blocks the port or the bind address is changed to loopback.
+
+For codeiq's own release (publishing to Maven Central): see [`shared/runbooks/release.md`](../../shared/runbooks/release.md). Rollback: [`shared/runbooks/rollback.md`](../../shared/runbooks/rollback.md).
+
+## CLI reference
+
+20 files under `src/main/java/io/github/randomcodespace/iq/cli/` define 14 user-facing commands. Authoritative table is in [`CLAUDE.md`](../../CLAUDE.md) §"CLI Commands"; condensed here:
+
+| Command | Purpose | Profile |
+|---|---|---|
+| `index ` | Memory-efficient batched scan → H2 cache | `indexing` |
+| `enrich ` | Load H2 → Neo4j; run linkers, classifier, services | `indexing` |
+| `serve ` | Read-only REST + MCP + UI on `http://localhost:8080` | **`serving`** |
+| `analyze ` | Legacy in-memory all-in-one (small repos only) | `indexing` |
+| `stats ` | 7-category statistics from Neo4j | `indexing` |
+| `graph ` | Export graph (JSON / YAML / Mermaid / DOT) | `indexing` |
+| `query ` | Preset relationship queries (consumers, producers, ...) | `indexing` |
+| `find ` | Preset finds (endpoints, guards, entities, topics) | `indexing` |
+| `cypher ` | Raw Cypher against Neo4j | `indexing` |
+| `topology ` | Service topology (blast radius, cycles, bottlenecks) | `indexing` |
+| `flow ` | Architecture flow diagrams | `indexing` |
+| `bundle ` | Pack graph + source snapshot into ZIP | `indexing` |
+| `cache ` | Inspect / clear / stats H2 cache | `indexing` |
+| `plugins ` | List / inspect detectors | `indexing` |
+| `config validate` / `config explain` | Unified-config tooling | `indexing` |
+| `version` | Show version info | `indexing` |
+
+Profile selection happens in `CodeIqApplication.java`'s `main` (the `boolean isServe = "serve".equalsIgnoreCase(command)` block) — `serve` activates `serving` (web server on); everything else activates `indexing` (`WebApplicationType.NONE`).
+
+## Build phases — what runs when
+
+| Phase | What runs | Source |
+|---|---|---|
+| `generate-sources` | ANTLR codegen from `*.g4` files | `pom.xml` `antlr4-maven-plugin` |
+| `process-resources` | `frontend-maven-plugin`: install Node, `npm ci`, `npm run build` → `src/main/resources/static/` | `pom.xml`, `src/main/frontend/vite.config.ts` (`build.outDir: '../resources/static'`) |
+| `compile` / `test-compile` | javac for Java 25 | standard |
+| `test` | Surefire — JUnit | standard |
+| `package` | Spring Boot repackage → executable JAR with embedded SPA | `spring-boot-maven-plugin` |
+| `verify` | `spotbugs:check`, `dependency-check:check` | `pom.xml` plugin executions; **this is the CI gate** |
+
+## Gotchas
+
+- **`mvn test` does NOT run the security gate.** SpotBugs and OWASP dependency-check are bound to `verify`. CI runs `mvn verify`. Locally, `mvn verify` is what actually mirrors CI.
+- **OWASP NVD download is ~1 GB** and very slow on first run. `-Ddependency-check.skip=true` for fast local cycles; let CI run the full check.
+- **`-Dfrontend.skip=true`** skips the frontend-maven-plugin entirely. The default `false` (in the `pom.xml` `` block) means `mvn package` always tries to build the SPA. Backend-only contributors should pass `-Dfrontend.skip=true` to avoid pulling Node.
+- **Vite output path is relative-up:** `src/main/frontend/vite.config.ts` writes to `'../resources/static'` (= `src/main/resources/static/`) and uses `emptyOutDir: false`, so files from earlier builds are not wiped. If you see leftover assets, delete `src/main/resources/static/` manually.
+- **ANTLR generated sources go under `target/generated-sources/antlr4/`** (per `antlr4-maven-plugin` defaults). Don't edit them; regenerate via `mvn generate-sources`. Modifying the `.g4` files in `src/main/antlr4/` is the supported edit point.
+- **Spring Boot startup overhead is 8–16 s** for the embedded Neo4j + Spring context. Expected; not a perf bug.
+- **Default index batch size is 500** (see the indexing batch tuning notes in `CLAUDE.md`). Larger isn't better; 500 outperformed 1000 in the tuning runs that set the default.
+- **Tomcat 11.0.21 + Jackson 3.1.1 are pinned overrides** of Spring Boot 4.0.5's BOM (see `` / `` in `pom.xml`'s ``). Both are security bumps. Revert when Spring Boot 4.0.6+ catches up — keep the rationale comments.
+- **`@ActiveProfiles("test")` is required on every `@SpringBootTest`** to avoid Neo4j auto-startup conflicts in integration tests.
+- **First-run cache version mismatch wipes `.codeiq/cache/`.** Bump `CACHE_VERSION` (constant near the top of `cache/AnalysisCache.java`) whenever you change the hash algorithm or H2 schema. Existing users will lose cache on next run; that's intentional (a slow rebuild beats an incorrect cache).
+- **`SECURITY.md`, `CHANGELOG.md`, `.bestpractices.json`, `LICENSE`** are part of the OpenSSF Best Practices gate (project_id 12650). Do not delete or rename without coordinating — they are referenced by `.bestpractices.json` and the Scorecard workflow.
+- **CI workflow pins all third-party actions by 40-char SHA** (see `.github/workflows/scorecard.yml`, `.github/workflows/codeql.yml` if present). When adding a new action, pin by SHA — Scorecard's `Pinned-Dependencies` check will downgrade us otherwise.
diff --git a/docs/project/conventions.md b/docs/project/conventions.md
new file mode 100644
index 00000000..b83665e2
--- /dev/null
+++ b/docs/project/conventions.md
@@ -0,0 +1,126 @@
+# Conventions
+
+Rules to follow when modifying codeiq. Each item is grounded in an existing file. The 7 most important ones are summarized in [`PROJECT_SUMMARY.md`](../../PROJECT_SUMMARY.md) §"Conventions an agent must respect"; this file is the long form.
+
+## Code style
+
+- **Java 25 idioms encouraged** — records, sealed classes, pattern matching, virtual threads. Don't down-port to older idioms; this codebase targets Java 25, the latest LTS release.
+- **Constructor injection only.** No field injection (`@Autowired` on fields), no setter injection. See any `@Component` / `@Service` in the codebase, e.g. `api/GraphController.java`.
+- **Property-key constants** — when a string literal appears 3+ times in a file, extract: `private static final String PROP_FRAMEWORK = "framework";`. Saves typo bugs and makes refactors greppable.
+- **Spring AI MCP annotations:** use `@McpTool` and `@McpToolParam` (Spring AI 2.x), not `@Tool`/`@ToolParam` (older form). See `mcp/McpTools.java`.
+- **UTF-8 explicit:** `StandardCharsets.UTF_8` everywhere — never rely on platform default. `Analyzer.java` shows the import.
+
+## Error handling
+
+- **Pipeline errors don't abort the run.** Per-file detector exceptions are caught and logged; the file is skipped, the run continues. See task wrapping in `analyzer/Analyzer.java`.
+- **CLI commands return `int` exit codes** via Picocli's `Callable` pattern. See any `cli/*Command.java` (e.g. `cli/EnrichCommand.java`).
+- **No `System.exit()` from non-CLI code.** `CodeIqApplication.main` is the only place that calls `SpringApplication.exit(...)` and `System.exit(...)`.
+- **No silent fallbacks.** If a detector can't parse a file, log it; don't return an empty result that is indistinguishable from "nothing matched".
+
+## Naming
+
+- **Java packages:** `io.github.randomcodespace.iq.` (lowercase, no plurals). Detector subpackages match the language family: `detector/jvm/{java,kotlin,scala}/`, `detector/typescript/`, `detector/python/`, `detector/systems/{cpp,rust}/`.
+- **Detector class:** `Detector` — `SpringSecurityDetector`, `FastifyDetector`, `GoStructuresDetector`. Always ends in `Detector`.
+- **Detector test class:** `DetectorTest` — colocated under `src/test/java/` with the same package.
+- **CLI commands:** `Command` — `IndexCommand`, `EnrichCommand`, `ServeCommand`. Picocli `@Command(name = "")` annotation gives the user-facing name.
+- **Node ID format:** `"{prefix}:{filepath}:{type}:{identifier}"` — e.g. `"node:src/main/java/Foo.java:class:Foo"`. The full file path is part of the key — that's how cross-file uniqueness works.
+- **Property keys:** snake_case (`auth_type`, `framework`, `roles`). Stored in Neo4j with a `prop_` prefix (`prop_auth_type`, `prop_framework`).
+- **Frontend imports:** `@/...` resolves to `src/main/frontend/src/...` (Vite alias in `vite.config.ts`, mirrored in `tsconfig.json`'s `paths`). Always use the alias, never `../../../`.
+
+## Tests
+
+- **Location:** `src/test/java//`. ~236 test files total.
+- **Layers:**
+  - **Unit:** plain JUnit, no Spring context. Most detector tests are unit.
+  - **Integration:** `@SpringBootTest` with `@ActiveProfiles("test")` — required to suppress Neo4j auto-startup. Standalone MockMvc for controller tests (no full context).
+  - **MCP tools:** test by calling `McpTools` methods directly — no protocol round-trip needed.
+  - **E2E quality:** `E2EQualityTest` validates against Context7-sourced ground truth (`src/test/resources/e2e/ground-truth-*.json`). Requires the env var `E2E_PETCLINIC_DIR` (or similar) to point at a cloned reference repo.
+- **Run a single test:** `mvn test -Dtest=ClassName#methodName`.
+- **Every detector needs:**
+  1. Positive match — input that should fire, output asserted.
+  2. Negative match — input that *looks similar* but shouldn't fire (especially for framework detectors).
+  3. **Determinism test** — run the detector twice on the same input, assert output is byte-identical.
+
+## Logging
+
+- **SLF4J** via Spring Boot's default Logback. Pattern across the codebase: `private static final Logger log = LoggerFactory.getLogger(MyClass.class);`.
+- `application.yml` already silences known-noisy loggers (`org.springframework.ai.mcp` → WARN, `PostProcessorRegistrationDelegate` → WARN). Don't add more bare `org.springframework.*` loggers without good cause.
+- **No PII concerns** — codeiq scans the user's own code; logs go to the user's terminal.
+
+## Adding a new detector
+
+(Authoritative recipe — slightly expanded from [`CLAUDE.md`](../../CLAUDE.md) §"Adding a New Detector".)
+
+1. **Pick the right base class** (table below) and create `src/main/java/io/github/randomcodespace/iq/detector//Detector.java`.
+2. **Annotate with `@Component`** (Spring auto-discovery) **and `@DetectorInfo(name=..., category=..., parser=ParserType.X, languages={...}, nodeKinds={...}, edgeKinds={...}, properties={...})`** (used by the `plugins` CLI command for introspection). Live examples: `detector/jvm/java/SpringSecurityDetector.java`, `detector/go/GoStructuresDetector.java`.
+3. **Implement `detect(DetectorContext ctx)`** — return a `DetectorResult` populated with `CodeNode`s and `CodeEdge`s. Detectors are stateless; the `DetectorContext` is your scratch space.
+4. **Framework detectors require a discriminator guard** — e.g. Quarkus must require `import io.quarkus.*`, Fastify must require `import 'fastify'`. Otherwise you'll match Spring controllers as Quarkus or Express as Fastify. **No exceptions** — this rule is enforced by review.
+5. **Property-key constants** for any string literal repeated 3+ times.
+6. **Add tests** in `src/test/java/.../detector//DetectorTest.java`: positive, negative, determinism.
+7. **Run `mvn test`** — the full suite (~236 test files) must still pass.
+8. **No registry edit needed** — Spring classpath scan picks up the `@Component`. The `plugins list` CLI command will introspect via `@DetectorInfo`.
+
+### Detector base classes
+
+| Class | Use when |
+|---|---|
+| `Detector` (interface) | You need full control; rare |
+| `AbstractRegexDetector` | Pattern-only detection (most detectors) |
+| `AbstractJavaParserDetector` | Java AST via JavaParser (Spring, JPA, etc.) |
+| `AbstractAntlrDetector` | ANTLR grammar-based (TS, Python, Go, C#, Rust, C++) |
+| `AbstractStructuredDetector` | Structured config files (YAML, JSON, TOML, INI, properties) |
+| `AbstractPythonAntlrDetector` | Python ANTLR detectors (shared parse, getBaseClassesText, extractClassBody) |
+| `AbstractPythonDbDetector` | Python ORM detectors (adds ensureDbNode/addDbEdge via DetectorDbHelper) |
+| `AbstractTypeScriptDetector` | TS regex detectors (shared getSupportedLanguages, detect→detectWithRegex) |
+| `AbstractJavaMessagingDetector` | Java messaging detectors (shared CLASS_RE, extractClassName, addMessagingEdge) |
+
+### Shared static helpers (don't subclass — call them)
+
+| Class | Purpose |
+|---|---|
+| `DetectorDbHelper` | `ensureDbNode` / `addDbEdge` for any detector emitting `DATABASE_CONNECTION` nodes |
+| `FrontendDetectorHelper` | `createComponentNode` / `lineAt` for Angular, React, Vue detectors |
+| `StructuresDetectorHelper` | `addImportEdge` / `createStructureNode` for Scala/Kotlin structure detectors |
+
+## Adding a new CLI command
+
+1. Create `src/main/java/io/github/randomcodespace/iq/cli/Command.java`.
+2. Annotate `@Component` and `@picocli.CommandLine.Command(name="", description="...")`.
+3. Implement `Callable` returning the exit code.
+4. Wire as a subcommand of `CodeIqCli` in `cli/CodeIqCli.java` (it lists subcommands explicitly).
+5. If the command needs a Spring profile other than `indexing` (only `serve` does this), update the `if (isServe) ...` block in `CodeIqApplication.main` — note this is **not** generic, so adding another `serving`-profile command means rethinking that conditional.
+
+## Adding a new REST endpoint
+
+1. Add a `@GetMapping` method (read-only — no `@PostMapping`/`@PutMapping`/`@DeleteMapping`) to the appropriate controller in `src/main/java/io/github/randomcodespace/iq/api/`.
+2. Delegate to `query/QueryService.java` (or one of its peers — `StatsService`, `TopologyService`) — controllers stay thin.
+3. **Mirror it in `mcp/McpTools.java`** as a new `@McpTool`. The MCP tool description must explain when an LLM should call it; copy the wording style of existing tools.
+4. Add a controller test using standalone MockMvc (no `@SpringBootTest`).
+
+## Adding a new MCP tool
+
+1. Add a method on `mcp/McpTools.java` annotated `@McpTool(name="...", description="...")`.
+2. Parameters: annotate with `@McpToolParam(description="...")`.
+3. Return type: anything Jackson can serialize (typically a `Map` or a record). Jackson's `FAIL_ON_UNKNOWN_PROPERTIES` is globally disabled for MCP-protocol compatibility (`config/JacksonConfig.java`).
+4. Test by calling the method directly in a unit test — no protocol round-trip needed.
+
+## Things to avoid (anti-patterns)
+
+- **`Set` iteration without sorting** — kills determinism. Use `TreeSet`, `stream().sorted(...)`, or sort the resulting list.
+- **Mutable instance state on detectors** — they're Spring singletons; concurrent calls will collide. Per-call state goes in method-local variables / `DetectorContext`.
+- **Coarse `synchronized` on `AnalysisCache`** — the `ReentrantReadWriteLock` is deliberate. Don't "simplify" to `synchronized` blocks; that serializes reads unnecessarily.
+- **Direct `Boolean.TRUE.equals(yamlKey)`** — SnakeYAML parses bare `on` as `Boolean.TRUE`. Use `String.valueOf(key)` for YAML key comparisons (SonarCloud S2159).
+- **Regex with nested non-possessive quantifiers** — use `*+` instead of `*` for nested patterns. `([^"\\]*+(?:\\.[^"\\]*+)*+)` not `([^"\\]*(?:\\.[^"\\]*)*)`. Stack-overflow risk (SonarCloud S5998).
+- **Adding a new property to `CodeNode` without round-trip-testing** — Neo4j stores properties as `prop_`; `nodeFromNeo4j()` must restore them. A new property that survives `bulkSave` but not `nodeFromNeo4j` will silently disappear when read back.
+- **Edges referencing nodes that don't exist yet** — `bulkSave`'s edge UNWIND silently drops rows whose source/target IDs don't match any node. Pre-validate IDs.
+- **Generic patterns in framework detectors** — `router.get(...)` matches Express, Fastify, NestJS, Vue Router, Hono, and probably ten others. Always require a framework-specific import.
+
+## Don't refactor (intentional non-standard choices)
+
+- **Single-file `NodeKind` and `EdgeKind` enums.** They're long (34/28 values) and could be split, but they're load-bearing for cross-file uniqueness and detector readability. Don't split them; one file per enum keeps the type surface diff-friendly. See `model/NodeKind.java`, `model/EdgeKind.java`.
+- **No SDN hydration on the read path.** `graph/GraphStore.java` uses raw Cypher + `nodeFromNeo4j()` for reads; `graph/GraphRepository.java` (Spring Data Neo4j) is used **only for writes**. This is deliberate — SDN's hydration overhead was measured and rejected for the read path. Don't unify them.
+- **Auto-discovery via Spring `@Component` on detectors, no explicit registry.** Drop in a class, it's live. The `DetectorRegistry` exists to *introspect* the discovered set, not to register them.
+- **Coarse `synchronized` on `AnalysisCache`** — the `ReentrantReadWriteLock` is deliberate. Don't "simplify" to `synchronized` blocks; that serializes reads unnecessarily.
+- **Direct `Boolean.TRUE.equals(yamlKey)`** — SnakeYAML parses bare `on` as `Boolean.TRUE`. Use `String.valueOf(key)` for YAML key comparisons (SonarCloud S2159).
+- **Regex with nested non-possessive quantifiers** — use `*+` instead of `*` for nested patterns. `([^"\\]*+(?:\\.[^"\\]*+)*+)` not `([^"\\]*(?:\\.[^"\\]*)*)`. Stack-overflow risk (SonarCloud S5998).
+- **Adding a new property to `CodeNode` without round-trip-testing** — Neo4j stores properties as `prop_`; `nodeFromNeo4j()` must restore them. A new property that survives `bulkSave` but not `nodeFromNeo4j` will silently disappear when read back.
+- **Edges referencing nodes that don't exist yet** — `bulkSave`'s edge UNWIND silently drops rows whose source/target IDs don't match any node. Pre-validate IDs.
+- **Generic patterns in framework detectors** — `router.get(...)` matches Express, Fastify, NestJS, Vue Router, Hono, and probably ten others. Always require a framework-specific import.
+
+## Don't refactor (intentional non-standard choices)
+
+- **Single-file `NodeKind` and `EdgeKind` enums.** They're long (34/28 values) and could be split, but they're load-bearing for cross-file uniqueness and detector readability. Don't split — keeps the type surface in one diff-friendly file. See `model/NodeKind.java`, `model/EdgeKind.java`.
+- **No SDN hydration on the read path.** `graph/GraphStore.java` uses raw Cypher + `nodeFromNeo4j()` for reads; `graph/GraphRepository.java` (Spring Data Neo4j) is used **only for writes**. This is deliberate — SDN's hydration overhead was measured and rejected for the read path. Don't unify them.
+- **Auto-discovery via Spring `@Component` on detectors, no explicit registry.** Drop in a class, it's live. The `DetectorRegistry` exists to *introspect* the discovered set, not to register them. Don't replace with a manual registry.
+- **CLI profile selection in `CodeIqApplication.main` (not via Picocli's mechanism).** It's a string `if/else` on the first arg, and it pre-empts Picocli to set the Spring profile *before* the context starts. Looks ugly; works correctly. SpotBugs flagged the original duplicate branches; the current version was deliberately collapsed.
+- **`indexing` profile sets `WebApplicationType.NONE`** — meaning `mvn test` from the IDE without `@ActiveProfiles("test")` will try to start the web server and bind to ports. Always use `@ActiveProfiles("test")` on `@SpringBootTest`.
+- **Frontend assets bundled into the JAR (`src/main/resources/static/`)** — no separate frontend deploy. Vite's `outDir: '../resources/static'` is the embed seam; don't move the SPA out of the JAR without re-architecting the deploy story.
+- **`prop_*` Neo4j property prefix.** It's a deliberate namespacing scheme to separate domain properties from top-level node attributes (`id`, `kind`, `layer`, etc.). Don't rename.
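The nested-quantifier rule from the anti-patterns list above can be checked with plain `java.util.regex`. This is a minimal sketch, assuming the goal is to capture the body of a double-quoted literal that may contain backslash escapes; `QuotedLiteralSketch` and `firstLiteral` are hypothetical names, not codeiq classes. The non-possessive form can recurse once per repetition of the escape group and overflow the stack on long inputs; the possessive form never gives characters back, so there are no backtracking positions to keep.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of codeiq. */
final class QuotedLiteralSketch {
    // The possessive pattern from the anti-patterns list, wrapped in double quotes:
    //   "([^"\\]*+(?:\\.[^"\\]*+)*+)"
    // Group 1 is the literal's body with escape sequences left untouched.
    static final Pattern QUOTED =
            Pattern.compile("\"([^\"\\\\]*+(?:\\\\.[^\"\\\\]*+)*+)\"");

    /** Returns the body of the first double-quoted literal in {@code text}, if any. */
    static Optional<String> firstLiteral(String text) {
        Matcher m = QUOTED.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // A literal containing escaped quotes.
        System.out.println(firstLiteral("say \"a \\\"quoted\\\" word\" now").orElse("<none>"));

        // 100,000 escape sequences in one literal: the possessive loop handles this iteratively.
        String longLiteral = "\"" + "\\x".repeat(100_000) + "\"";
        System.out.println(firstLiteral(longLiteral).map(String::length).orElse(-1));
    }
}
```

Running `main` prints `a \"quoted\" word` followed by `200000`.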
diff --git a/docs/project/data-model.md b/docs/project/data-model.md
new file mode 100644
index 00000000..5fdae10a
--- /dev/null
+++ b/docs/project/data-model.md
@@ -0,0 +1,127 @@
+# Data Model
+
+codeiq's data model has **three storage layers**, each with its own schema and lifetime:
+
+| Layer | Backing | Purpose | Lifetime |
+|---|---|---|---|
+| Domain types | Java records / enums | In-memory shape of nodes/edges, single source of truth | Per JVM run |
+| Analysis cache | H2 (file-backed, embedded) | Per-file detection results keyed by content hash; enables incremental re-indexing | `.codeiq/cache/` until manually cleared or `CACHE_VERSION` bump |
+| Graph | Neo4j Embedded (Community Edition 2026.02.3) | Final enriched graph for queries, MCP tools, REST API | `.codeiq/graph/graph.db/` until manually cleared |
+
+## Storage
+
+### Primary datastore — Neo4j Embedded
+- **Defined in:** `pom.xml` `2026.02.3`, bootstrapped in `config/Neo4jConfig.java` (only loaded under the `serving` profile via `@ConditionalOnProperty(value="codeiq.neo4j.enabled", havingValue="true")`).
+- **Data dir:** `.codeiq/graph/graph.db/` inside the scanned repo.
+- **Migration tool:** none — Neo4j is schemaless; indexes/constraints are created idempotently by `GraphStore.bulkSave()`.
+
+### Secondary datastore — H2 (analysis cache)
+- **Defined in:** `cache/AnalysisCache.java`. H2's version comes from Spring Boot's dependency management (no explicit version pin in `pom.xml`).
+- **Data dir:** `.codeiq/cache/` inside the scanned repo.
+- **Schema versioning:** `CACHE_VERSION = 5` constant near the top of `AnalysisCache.java` (bumped from 4 when `confidence`/`source` were added to nodes and edges; currently line 43, grep the symbol if drifted). On startup, the cache reads the stored version; if it doesn't match, the H2 file is wiped and recreated. **Bump `CACHE_VERSION` whenever you change the file-hash algorithm or the schema.**
+
+## Domain types
+
+### `CodeNode` and `CodeEdge`
+- **Defined in:** `model/CodeNode.java`, `model/CodeEdge.java`.
+- **Plain Java records / classes** (not JPA entities — Spring Data Neo4j is used only on the write path). Properties live in a `Map`.
+- **ID format:** `"{prefix}:{filepath}:{type}:{identifier}"` (e.g. `"node:src/main/java/Foo.java:class:Foo"`). Cross-file uniqueness is enforced by including the full file path. See existing detectors for the prefix convention.
+
+### `NodeKind` (enum)
+- **Defined in:** `model/NodeKind.java`.
+- **34 concrete values** (javadoc and file are in sync as of 2026-04-27):
+
+```
+MODULE, PACKAGE, CLASS, METHOD, ENDPOINT, ENTITY, REPOSITORY, QUERY,
+MIGRATION, TOPIC, QUEUE, EVENT, RMI_INTERFACE, CONFIG_FILE, CONFIG_KEY,
+WEBSOCKET_ENDPOINT, INTERFACE, ABSTRACT_CLASS, ENUM, ANNOTATION_TYPE,
+PROTOCOL_MESSAGE, CONFIG_DEFINITION, DATABASE_CONNECTION, AZURE_RESOURCE,
+AZURE_FUNCTION, MESSAGE_QUEUE, INFRA_RESOURCE, COMPONENT, GUARD,
+MIDDLEWARE, HOOK, SERVICE, EXTERNAL, SQL_ENTITY
+```
+
+Each enum constant carries a lowercase `value` (e.g. `CLASS("class")`) used as the string representation in Cypher / JSON / MCP-tool responses. `NodeKind.fromValue(...)` does the reverse lookup via a static `BY_VALUE` map.
+
+### `EdgeKind` (enum)
+- **Defined in:** `model/EdgeKind.java`.
+- **28 concrete values** (javadoc and file are in sync as of 2026-04-27):
+
+```
+DEPENDS_ON, IMPORTS, EXTENDS, IMPLEMENTS, CALLS, INJECTS, EXPOSES,
+QUERIES, MAPS_TO, PRODUCES, CONSUMES, PUBLISHES, LISTENS, INVOKES_RMI,
+EXPORTS_RMI, READS_CONFIG, MIGRATES, CONTAINS, DEFINES, OVERRIDES,
+CONNECTS_TO, TRIGGERS, PROVISIONS, SENDS_TO, RECEIVES_FROM, PROTECTS,
+RENDERS, REFERENCES_TABLE
+```
+
+### `layer` (string property, not an enum)
+Every node carries a `layer` property set by `analyzer/LayerClassifier.java` to one of: `frontend`, `backend`, `infra`, `shared`, `unknown`. Classification is deterministic — based on `kind`, `framework`, and path heuristics.
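The `value` / `fromValue` / `BY_VALUE` arrangement described for `NodeKind` above can be sketched as follows. This is a minimal sketch, not a copy of `model/NodeKind.java`: `NodeKindSketch` is a hypothetical name, only three of the 34 constants are shown, lowercase values other than `"class"` are assumed, and throwing `IllegalArgumentException` for unknown strings is an assumption about the real behaviour.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical, trimmed stand-in for model/NodeKind.java (3 of 34 constants). */
enum NodeKindSketch {
    CLASS("class"),
    METHOD("method"),     // assumed value
    ENDPOINT("endpoint"); // assumed value

    // Reverse-lookup table. Enum static fields initialise after the constants,
    // so values() is safe to call here.
    private static final Map<String, NodeKindSketch> BY_VALUE = Arrays.stream(values())
            .collect(Collectors.toUnmodifiableMap(kind -> kind.value, Function.identity()));

    private final String value;

    NodeKindSketch(String value) {
        this.value = value;
    }

    /** Lowercase form used in Cypher, JSON and MCP-tool responses. */
    String getValue() {
        return value;
    }

    /** Reverse lookup; unknown or null input is rejected (an assumption). */
    static NodeKindSketch fromValue(String value) {
        NodeKindSketch kind = value == null ? null : BY_VALUE.get(value);
        if (kind == null) {
            throw new IllegalArgumentException("Unknown node kind: " + value);
        }
        return kind;
    }

    public static void main(String[] args) {
        NodeKindSketch kind = fromValue("method");
        System.out.println(kind + " <-> " + kind.getValue()); // METHOD <-> method
    }
}
```

Building the map once in a static initialiser keeps `fromValue` a constant-time lookup, and the lookup is case-sensitive, which matches the rule that only the lowercase form is the wire representation.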
+
+## H2 cache schema
+
+Defined in the `SCHEMA_SQL` text block near the top of `cache/AnalysisCache.java` (grep `SCHEMA_SQL`). Tables (verified from the file):
+
+| Table | Purpose |
+|---|---|
+| `cache_meta` | `meta_key` (PK) → `meta_value` — stores the `version` row matching `CACHE_VERSION` |
+| `files` | `content_hash` (PK) → file path, language, size, parse timestamp; the unit of cache lookup |
+| `nodes` | per-file detected nodes; `row_id` AUTO_INCREMENT PK; FK to `files.content_hash` |
+| `edges` | per-file detected edges; FK to `files.content_hash` |
+| `analysis_runs` | `run_id` (PK), wall-clock metadata for one `index`/`analyze` invocation |
+
+**Reserved-word note:** H2 reserves `key`, `value`, `order`. The schema uses `meta_key` / `meta_value` etc. — keep that pattern when extending.
+
+**Concurrency:** the cache uses a `ReentrantReadWriteLock` (`AnalysisCache.java`). Many virtual-thread readers can run in parallel; writers serialize. This is what avoids `ClosedChannelException` against H2's MVStore file channel under concurrent virtual-thread access.
+
+## Neo4j schema (created by `GraphStore.bulkSave`)
+
+Indexes created idempotently (`CREATE … IF NOT EXISTS`) inside `GraphStore.bulkSave()` (`graph/GraphStore.java`, around lines 112–122 at time of writing — grep `CREATE INDEX` to relocate):
+
+| Index | Type | Property |
+|---|---|---|
+| (unnamed) | range | `(:CodeNode {id})` |
+| (unnamed) | range | `(:CodeNode {label_lower})` |
+| (unnamed) | range | `(:CodeNode {fqn_lower})` |
+| `search_index` | fulltext | `[label_lower, fqn_lower]` over `:CodeNode` |
+| `lexical_index` | fulltext | `[prop_lex_comment, prop_lex_config_keys]` over `:CodeNode` |
+
+Since Neo4j 5, a plain `CREATE INDEX` creates a range index; b-tree indexes no longer exist. The `CLAUDE.md` "Gotchas" section additionally references b-tree indexes on `kind`, `layer`, `module`, `filePath`. **Cross-check before relying on those**: `grep "CREATE INDEX" graph/GraphStore.java` finds only the 3 range indexes above, and the 2 fulltext indexes are created with `CREATE FULLTEXT INDEX`, which that pattern does not match. The CLAUDE.md claim may be aspirational or stale.
+
+### Property round-trip convention
+
+Domain `properties` Map → Neo4j stored as `prop_` properties. Domain ID, layer, kind, etc. become top-level node properties (`id`, `layer`, `kind`, `label_lower`, `fqn_lower`, `module`, `filePath`). The reverse mapping is in `nodeFromNeo4j()` inside `graph/GraphStore.java`. **Whenever you add a domain property, verify the round-trip survives** — silent property loss is the most common bug class on this seam.
+
+### Bulk-save batching
+
+`bulkSave` uses `UNWIND $batch AS props CREATE (n:CodeNode) SET n = props` for nodes (default batch 500) and a similar UNWIND-MATCH-MATCH-CREATE pattern for edges. Edge UNWIND **silently drops rows whose source/target node IDs are missing** — pre-validate before passing in. See [`CLAUDE.md`](../../CLAUDE.md) §"Gotchas".
+
+## Lifecycle / state machines
+
+There are no state machines on entities themselves. The closest thing is the **pipeline lifecycle** that produces them:
+
+```
+file on disk
+  ─► hashed (SHA-256, FileHasher.java)
+  ─► H2 cache lookup
+       ├─ hit → reuse cached nodes/edges
+       └─ miss → run detectors, write nodes+edges keyed by content_hash
+  ─► H2 cache populated
+
+(later, on `enrich`:)
+  ─► H2 read
+  ─► UNWIND bulk-load to Neo4j
+  ─► linkers (Topic, Entity, ModuleContainment, Guard) add cross-file edges
+  ─► LayerClassifier sets layer property on every node
+  ─► ServiceDetector adds SERVICE nodes + CONTAINS edges
+  ─► LanguageEnricher (per-language extractors) adds extractor results
+  ─► LexicalEnricher adds prop_lex_* + the lexical_index
+  ─► graph ready for `serve`
+```
+
+## Schema source of truth
+
+- **Neo4j shape:** `graph/GraphStore.java` is canonical (it creates the indexes; there are no other DDL sources). Property names like `label_lower` / `fqn_lower` / `prop_*` are decided here.
+- **H2 shape:** `cache/AnalysisCache.java`'s `SCHEMA_SQL` constant is canonical. There is no separate migration directory — `CACHE_VERSION` is the migration mechanism.
+- **Domain shape:** `model/{CodeNode,CodeEdge,NodeKind,EdgeKind}.java` are canonical. Detectors reference these enums by symbol; never use the lowercase string forms in detector code.
+
+If you change any of the three, **update the other two seams** (or document why you didn't).
diff --git a/docs/project/flows.md b/docs/project/flows.md
new file mode 100644
index 00000000..24ef4ba1
--- /dev/null
+++ b/docs/project/flows.md
@@ -0,0 +1,127 @@
+# Key Flows
+
+Four flows worth tracing — they cover the main code paths an agent will need to modify or debug. Each lists the file:line entry and the chain of calls. **Line numbers are accurate at the time of writing (2026-04-27)** but rot — `grep` for the symbol if a line drifts.
+
+---
+
+## Flow: `codeiq index ` — file scan → H2 cache
+
+**Trigger:** `java -jar code-iq-*-cli.jar index /path/to/repo` from a shell.
+
+**Path through code:**
+
+1. `CodeIqApplication.java` `main(...)` — Spring Boot starts. The first arg (`index`) is *not* `serve`, so the app sets profile `indexing` and `WebApplicationType.NONE` (the `if (isServe) ... else ...` block). No web server spins up.
+2. `CodeIqApplication.run(args)` — Picocli takes over: `new CommandLine(codeIqCli, factory).execute(args)`.
+3. `cli/CodeIqCli.java` — top-level Picocli `@Command`. Subcommand dispatch routes to `cli/IndexCommand.java`.
+4. `cli/IndexCommand.call()` — opens `cache/AnalysisCache` (creates the H2 file at `.codeiq/cache/` if missing; checks `CACHE_VERSION`).
+5. `analyzer/FileDiscovery.discover(rootPath)` — runs `git ls-files` if the path is a git repo, else walks the filesystem. Returns a list of `DiscoveredFile`s with language tagged via `analyzer/FileClassifier.java`.
+6. For each file, in batches (default 500): hash via `cache/FileHasher.hash(...)` (SHA-256), check the cache.
+   - **Cache hit** → reuse existing nodes/edges from H2.
+   - **Cache miss** → continue.
+7. `analyzer/StructuredParser.parse(file)` — routes to JavaParser (Java), `grammar/AntlrParserFactory` (TS/Py/Go/C#/Rust/C++), or raw text.
+8. **Detector fan-out** on virtual threads: every `@Component`-annotated `Detector` whose `getSupportedLanguages()` matches gets called with a `DetectorContext`. Results are collected per file. (Auto-discovery via Spring classpath scan; no manual list.)
+9. `analyzer/GraphBuilder.addNodes(...) / addEdges(...)` — buffer to indexed slots so order is independent of thread completion.
+10. `cache/AnalysisCache.write(contentHash, nodes, edges, runId)` — persist via UNWIND-friendly batches.
+11. CLI prints summary; exit code 0.
+
+**Side effects:** `.codeiq/cache/` H2 file populated/updated. **No Neo4j writes**. No network calls.
+
+**Failure modes:**
+- Per-file detector exceptions: caught + logged in `Analyzer.java`'s task wrapper; the file is skipped, the run continues.
+- `CACHE_VERSION` mismatch: H2 file is wiped + recreated automatically on startup.
+- Disk-full / permission errors: bubble up, run aborts with non-zero exit.
+
+---
+
+## Flow: `codeiq enrich ` — H2 → Neo4j with linkers + classifiers
+
+**Trigger:** `java -jar code-iq-*-cli.jar enrich /path/to/repo` (after `index`).
+
+**Path through code:**
+
+1. `CodeIqApplication.main(...)` — same profile-selection logic; `enrich` → `indexing` profile, no web server.
+2. `cli/EnrichCommand.call()` — opens `cache/AnalysisCache` (read), opens Neo4j Embedded directly via `DatabaseManagementServiceBuilder` (programmatic — Spring's `@Profile("serving")` Neo4j config is *not* loaded here).
+3. `EnrichCommand` reads all nodes + edges from H2 in batches.
+4. `graph/GraphStore.bulkSave(nodes, edges)` (line numbers approximate at time of writing — grep the Cypher fragment if drifted):
+   - `MATCH (n) WITH n LIMIT 5000 DETACH DELETE n RETURN count(*)` — clear in chunks if a previous graph existed.
+   - `CREATE INDEX IF NOT EXISTS` for `id`, `label_lower`, `fqn_lower` + `CREATE FULLTEXT INDEX` for `search_index` and `lexical_index`.
+   - `UNWIND $batch AS props CREATE (n:CodeNode) SET n = props` — nodes, batched (default 500).
+   - `UNWIND $batch AS e MATCH (a {id: e.src}) MATCH (b {id: e.tgt}) CREATE (a)-[r:EDGE_KIND]->(b)` — edges, batched. **Silently drops rows where source/target IDs miss.**
+5. `analyzer/linker/*` — runs in order: `TopicLinker`, `EntityLinker`, `ModuleContainmentLinker`, `GuardLinker`. Each adds cross-file edges (e.g. `PRODUCES`/`CONSUMES` from a topic name appearing in two services).
+6. `analyzer/LayerClassifier.classify(...)` — sets `n.layer` on every node based on `kind`, `framework`, and path heuristics.
+7. `analyzer/ServiceDetector.detect(rootPath)` — walks the filesystem (not the Neo4j graph) for build files (Maven, Gradle, npm, Cargo, go.mod, etc. — 30+). Creates `:CodeNode {kind: 'service'}` nodes and `CONTAINS` edges to every module/file inside the service boundary.
+8. `intelligence/extractor/LanguageEnricher` — runs per-language extractors (`JavaLanguageExtractor`, `TypeScriptLanguageExtractor`, `PythonLanguageExtractor`, `GoLanguageExtractor`) to add language-specific properties.
+9. `intelligence/lexical/LexicalEnricher` — extracts doc comments (`DocCommentExtractor`) and persists to `prop_lex_comment`; populates the `lexical_index` fulltext index.
+10. CLI prints summary; exit 0.
+
+**Side effects:** `.codeiq/graph/graph.db/` populated. H2 cache untouched.
+
+**Failure modes:**
+- Edge with missing source/target ID: silently dropped by Cypher MATCH. Mitigation: pre-validate IDs before passing to `bulkSave`. **Most common cause of "missing relationships" bugs.**
+- Property round-trip failure: a domain property survives `bulkSave` but `nodeFromNeo4j()` doesn't know to restore it → silent property loss. Verify by reading back any node you just wrote.
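The mitigation in the enrich failure modes above (pre-validate IDs before `bulkSave`) can be sketched in plain Java. This is a minimal sketch, assuming only that each edge exposes a source ID and a target ID; `EdgePreValidator`, `NodeStub`, `EdgeStub` and `EdgeValidation` are hypothetical stand-ins, not codeiq's `CodeNode`/`CodeEdge` API. Instead of letting the Cypher `MATCH` drop dangling rows silently, it returns them so a caller can log or count them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not part of codeiq. */
final class EdgePreValidator {
    /** Stand-in for CodeNode: only the ID matters here. */
    record NodeStub(String id) {}

    /** Stand-in for CodeEdge: source/target IDs plus the edge kind. */
    record EdgeStub(String sourceId, String targetId, String kind) {}

    /** Edges safe to hand to bulkSave, and edges the Cypher MATCH would silently drop. */
    record EdgeValidation(List<EdgeStub> valid, List<EdgeStub> dangling) {}

    static EdgeValidation validate(List<NodeStub> nodes, List<EdgeStub> edges) {
        // Used only for membership checks, never iterated, so it cannot affect output order.
        Set<String> knownIds = new HashSet<>();
        for (NodeStub node : nodes) {
            knownIds.add(node.id());
        }
        List<EdgeStub> valid = new ArrayList<>();
        List<EdgeStub> dangling = new ArrayList<>();
        // Both lists keep the input order, so the result is stable run to run.
        for (EdgeStub edge : edges) {
            if (knownIds.contains(edge.sourceId()) && knownIds.contains(edge.targetId())) {
                valid.add(edge);
            } else {
                dangling.add(edge);
            }
        }
        return new EdgeValidation(List.copyOf(valid), List.copyOf(dangling));
    }

    public static void main(String[] args) {
        List<NodeStub> nodes = List.of(new NodeStub("a"), new NodeStub("b"));
        List<EdgeStub> edges = List.of(
                new EdgeStub("a", "b", "CALLS"),
                new EdgeStub("a", "missing", "IMPORTS"));
        EdgeValidation result = validate(nodes, edges);
        System.out.println(result.valid().size() + " valid, dangling: " + result.dangling());
    }
}
```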
+
+---
+
+## Flow: `codeiq serve ` — REST + MCP + UI request lifecycle
+
+**Trigger:** `java -jar code-iq-*-cli.jar serve /path/to/repo` (after `enrich`). Then a browser hits `http://localhost:8080/explorer` or an MCP client calls a tool.
+
+**Path through code (cold start):**
+
+1. `CodeIqApplication.main(...)` — first arg is `serve` → profile `serving` activated; web server starts.
+2. Spring loads beans gated by `@Profile("serving")`: all 4 controllers in `api/`, `mcp/McpTools` (via Spring AI starter), the Neo4j `@Configuration` in `config/Neo4jConfig.java` (only when `codeiq.neo4j.enabled=true`).
+3. Neo4j Embedded starts; `health/GraphHealthIndicator` reports status to `/actuator/health`.
+4. Spring Boot's static-resource handler serves the bundled SPA (built into `src/main/resources/static/`, packaged on the classpath under `static/`) at `/`.
+5. Server bound — `http://localhost:8080` ready.
+
+**Path through code (REST request, e.g. `GET /api/stats`):**
+
+1. Browser hits `/api/stats`.
+2. `api/GraphController.getStats(...)` (`@GetMapping("/stats")`) is dispatched (carries `@Profile("serving")`).
+3. Controller delegates to `query/StatsService.getStats()`.
+4. `StatsService` runs Cypher queries via `graph/GraphStore.queryNodes(...)` (raw Cypher, not SDN).
+5. Results aggregated into a `Map` and serialized by Jackson.
+6. HTTP response returned.
+
+**Path through code (MCP tool call, e.g. `find_dead_code`):**
+
+1. MCP client (Claude Desktop, an LLM agent, the SPA's `McpConsole`) sends a JSON-RPC call to `/mcp` (mounted by Spring AI's `spring-ai-starter-mcp-server-webmvc`).
+2. Spring AI dispatches to the matching `@McpTool`-annotated method on `mcp/McpTools.java`.
+3. The MCP tool delegates to `query/QueryService.findDeadCode()` (or similar).
+4. `QueryService` runs Cypher (filters by semantic edges only — `calls`, `imports`, `depends_on`; excludes structural `contains`, `defines`, and entry points like endpoints / config files — see [`CLAUDE.md`](../../CLAUDE.md) "Gotchas").
+5. Result returned as JSON-RPC response.
+
+**Side effects:** None — strictly read-only.
+
+**Failure modes:**
+- Calling `serve` before `enrich` → `health/GraphHealthIndicator` reports DOWN; queries return empty results. Fix: run `enrich` first.
+- CORS rejection if the SPA is being served from a different origin in dev: configure `codeiq.cors.allowed-origin-patterns` in `application.yml` (or env: `CODEIQ_CORS_ALLOWED_ORIGIN_PATTERNS`).
+- `FAIL_ON_UNKNOWN_PROPERTIES` is globally disabled (`config/JacksonConfig.java`) — MCP protocol clients won't break on field additions, but it also hides typos in JSON inputs. Validate at the controller boundary.
+
+---
+
+## Flow: Adding a new detector and seeing it run
+
+**Trigger:** developer adds `MyDetector.java` and rebuilds.
+
+**Path through code (compile-time + first run):**
+
+1. `src/main/java/io/github/randomcodespace/iq/detector//MyDetector.java` — new file, `@Component`-annotated, `@DetectorInfo(...)`-annotated, extending one of the `Abstract*Detector` base classes.
+2. `mvn package` — compiles the class.
+3. On the next `codeiq index `:
+   - Spring Boot starts under `indexing` profile, classpath-scans `io.github.randomcodespace.iq` for `@Component`s.
+   - `MyDetector` is instantiated as a singleton bean.
+   - `analyzer/Analyzer` (or `cli/IndexCommand`) iterates Spring's `Map` of all bean instances.
+4. For every file whose language matches `getSupportedLanguages()`, `MyDetector.detect(ctx)` is called on a virtual thread.
+5. Returned `DetectorResult` is folded into `GraphBuilder` (nodes-first, then edges).
+6. From there: identical to the `index` flow — H2 cache write, then `enrich`, then visible via `serve`.
+
+**Verification:**
+- `codeiq plugins list` introspects via `@DetectorInfo` and confirms the detector is live.
+- `codeiq stats ` — node-kind counts should change after re-indexing.
+- Unit test `MyDetectorTest` (positive + negative + determinism) must pass via `mvn test`.
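The three unit-test checks listed under **Verification** (positive, negative, determinism) can be exercised without Spring. This is a minimal sketch built around a toy regex detector: `FastifyRouteSketch`, its patterns and its ID strings are hypothetical, it does not use codeiq's `Detector` / `DetectorContext` / `DetectorResult` types, and its IDs only loosely follow the `{prefix}:{filepath}:{type}:{identifier}` convention. A real `MyDetectorTest` would build a `DetectorContext` and assert on the `DetectorResult` with JUnit instead.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, framework-free stand-in for a regex route detector. */
final class FastifyRouteSketch {
    // Discriminator guard: the file must import fastify before any route is emitted.
    private static final Pattern FASTIFY_IMPORT =
            Pattern.compile("import\\b[^;]*['\"]fastify['\"]");
    // Route registrations such as app.get('/users', handler).
    private static final Pattern ROUTE =
            Pattern.compile("\\.(get|post|put|delete)\\(\\s*'([^']*+)'");

    static List<String> detect(String filePath, String source) {
        if (!FASTIFY_IMPORT.matcher(source).find()) {
            return List.of(); // Express and friends look similar but must not fire
        }
        // TreeSet yields sorted, duplicate-free IDs regardless of match order.
        Set<String> ids = new TreeSet<>();
        Matcher m = ROUTE.matcher(source);
        while (m.find()) {
            String verb = m.group(1).toUpperCase(Locale.ROOT);
            ids.add("node:" + filePath + ":endpoint:" + verb + " " + m.group(2));
        }
        return List.copyOf(ids);
    }

    public static void main(String[] args) {
        String fastify = "import Fastify from 'fastify';\napp.post('/b', h);\napp.get('/a', h);";
        String express = "import express from 'express';\napp.get('/a', h);";

        List<String> first = detect("src/app.ts", fastify);
        check(first.equals(List.of(
                "node:src/app.ts:endpoint:GET /a",
                "node:src/app.ts:endpoint:POST /b")), "positive");
        check(detect("src/app.ts", express).isEmpty(), "negative");
        check(detect("src/app.ts", fastify).equals(first), "determinism");
        System.out.println("all checks passed");
    }

    private static void check(boolean ok, String name) {
        if (!ok) {
            throw new AssertionError(name + " check failed");
        }
    }
}
```

Sorting through a `TreeSet` is what keeps the second call byte-identical to the first even though matches are found in source order (`POST` before `GET` here).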
+
+**Failure modes:**
+- Forgot `@Component` → silently disabled, no error. Test won't catch it (unit tests instantiate directly). Catch via `codeiq plugins list` showing the detector is missing.
+- Missing discriminator guard on a framework detector → false positives across other frameworks. Catch via the negative-match unit test.
+- Stateful instance fields → race conditions across virtual threads. The determinism test catches state that leaks between consecutive calls; true races only surface reliably under concurrent invocation.
diff --git a/docs/project/ui.md b/docs/project/ui.md
new file mode 100644
index 00000000..c46599db
--- /dev/null
+++ b/docs/project/ui.md
@@ -0,0 +1,136 @@
+# UI
+
+App-mode (not library-mode): codeiq ships a single React SPA bundled inside the JAR and served by Spring Boot's static-resource handler at `http://localhost:8080/` when running `codeiq serve`.
+
+## Stack
+
+- **Framework:** React 18.3 (`src/main/frontend/package.json`)
+- **Build tool:** Vite 6.4 + TypeScript 5.7 (`src/main/frontend/vite.config.ts`, `tsconfig.json`)
+- **UI kit:** Ant Design 5.24 + `@ant-design/icons` 5.6
+- **Charts:** ECharts 5.6 via `echarts-for-react` 3.0
+- **Routing:** `react-router-dom` 7
+- **Styling:** AntD's built-in theme system (no Tailwind, no CSS Modules); `context/ThemeContext.tsx` toggles light/dark via AntD's `ConfigProvider` token system.
+- **State management:** local component state + a tiny `useApi` hook (`hooks/useApi.ts`); no Redux / Zustand / React Query.
+- **Data fetching:** raw `fetch` wrapped in `lib/api.ts` + `hooks/useApi.ts`.
+
+## Entry & layout
+
+- **HTML entry:** `src/main/frontend/index.html` (Vite default).
+- **JS entry:** `src/main/frontend/src/main.tsx` → renders `` (`src/main/frontend/src/App.tsx`).
+- **Root shell:** `App.tsx` wires the AntD `ConfigProvider`, the `ThemeContext.Provider`, and `react-router-dom`'s `BrowserRouter` + `Routes`.
+- **Layout:** `components/AppLayout.tsx` — sidebar + content area; light/dark toggle via `useTheme()` from `ThemeContext.tsx`.
+- **Provider stack** (outer → inner): AntD `ConfigProvider` → `ThemeContext.Provider` → `BrowserRouter` → `AppLayout` → page route.
+
+## Component organization
+
+```
+src/main/frontend/src/
+├── main.tsx — Vite entry, renders
+├── App.tsx — providers + routes
+├── env.d.ts — Vite env-var types
+├── components/
+│   └── AppLayout.tsx — sidebar + content layout, theme toggle
+├── context/
+│   └── ThemeContext.tsx — light/dark toggle
+├── hooks/
+│   └── useApi.ts — generic API-call hook (loading / error / data)
+├── lib/
+│   ├── api.ts — fetch wrapper + endpoint helpers
+│   └── mcp-tools.ts — TOOLS, CATEGORIES, toolsByCategory, McpTool type
+├── pages/ — one file per route
+│   ├── Dashboard.tsx — stats overview + MCP tool launcher
+│   ├── CodebaseMap.tsx — file-tree explorer
+│   ├── Explorer.tsx — node/edge browser with kind filter + search
+│   └── McpConsole.tsx — interactive MCP-tool playground
+└── types/
+    └── api.ts — TypeScript types matching the REST API shapes
+```
+
+**Conventions:**
+- **`@/...` import alias** resolves to `src/main/frontend/src/...` (`vite.config.ts` `resolve.alias` + `tsconfig.json` `paths`). Always use the alias — never `../../../`.
+- **One component per file**, `PascalCase.tsx`.
+- **Pages are at `src/pages/`**; shared/UI primitives at `src/components/`. Reusable, non-page UI primitives haven't grown enough to warrant a `ui/` sublayer yet — fold into `components/` until that becomes painful.
+- **No test colocation** for the SPA — frontend tests are E2E only via Playwright. Component-level testing isn't currently practiced.
+
+## Routes
+
+(Inferred from page filenames; **verify in `src/main/frontend/src/App.tsx`** before relying. Check `/mcp` in particular: the server mounts its MCP endpoint at that same path, so a client-side route there would collide with it on a full page load or refresh.)
+
+- `/` → `Dashboard`
+- `/explorer` → `Explorer`
+- `/codebase-map` → `CodebaseMap`
+- `/mcp` → `McpConsole`
+
+## Design system
+
+- **Tokens:** AntD's built-in token system, customized via `ConfigProvider` in `App.tsx` and theme-keyed via `ThemeContext.tsx`. No standalone token file.
+- **Primitives:** AntD components used directly (`Button`, `Layout`, `Menu`, `Table`, `Input`, etc.). No internal wrapper library.
+- **Icons:** `@ant-design/icons` (`SunOutlined`, `MoonOutlined`, etc. — see `components/AppLayout.tsx`).
+
+## Data fetching
+
+`hooks/useApi.ts` wraps `lib/api.ts`'s `api.(...)` calls and exposes `{ data, loading, error, refetch }`. Page components use it like:
+
+```ts
+const { data, loading, error } = useApi(() => api.stats());
+```
+
+Endpoint helpers live in `lib/api.ts`; response types in `types/api.ts`. The MCP tools list — used by `Dashboard` and `McpConsole` — is a static client-side catalog at `lib/mcp-tools.ts` (it mirrors `mcp/McpTools.java` server-side; **must be kept in sync manually** when adding a tool).
+
+## Forms & validation
+
+Minimal — no `react-hook-form` / `formik`. The `McpConsole` builds parameter inputs dynamically from `lib/mcp-tools.ts` definitions; validation is "send and surface server error". This is fine for an internal dev tool.
+
+## i18n / a11y / theming
+
+- **i18n:** none. Strings are inline English. codeiq is a developer tool; no plan to localize.
+- **a11y:** Playwright config integrates `@axe-core/playwright` (`src/main/frontend/package.json` devDep) — accessibility audits run as part of E2E. AntD's primitives carry sensible roles/labels; custom components inherit those.
+- **Theming:** `ThemeContext.tsx` flips a boolean → AntD token theme (`defaultAlgorithm` vs `darkAlgorithm`). The toggle is in the layout header. No `prefers-color-scheme` auto-detection currently — feature gap if you care.
+
+## Performance notes
+
+- **Manual chunk splitting** in `vite.config.ts` (`build.rollupOptions.output.manualChunks`):
+  - `vendor-react` — React + react-dom + react-router-dom
+  - `vendor-antd` — antd + @ant-design/icons
+  - `vendor-echarts` — echarts + echarts-for-react
+
+  Isolates the heavy AntD and ECharts dependencies in their own chunks, so application-code changes don't invalidate them in the browser cache.
+- **`chunkSizeWarningLimit: 1200`** — Vite's default 500 KB warning was too noisy for the AntD chunk; raised deliberately.
+- **`emptyOutDir: false`** — preserves manually-placed assets in `src/main/resources/static/` between builds. If you see leftover files, delete the dir manually.
+- **`sourcemap: false`** — production output ships without sourcemaps (the JAR is the ship artifact; sourcemaps would balloon it).
+
+## Dev loop
+
+```bash
+# Backend — terminal 1
+java -jar target/code-iq-*-cli.jar serve /path/to/scan-target
+
+# Frontend — terminal 2
+cd src/main/frontend
+npm install # only first time
+npm run dev # Vite HMR on :5173, proxies /api and /mcp to :8080
+```
+
+The Vite dev-server proxy is defined at the bottom of `vite.config.ts`:
+
+```ts
+server: {
+  proxy: {
+    '/api': 'http://localhost:8080',
+    '/mcp': 'http://localhost:8080',
+  },
+}
+```
+
+## Production build → JAR embed
+
+`mvn package` triggers `frontend-maven-plugin` which runs `npm ci` + `npm run build`. Vite's `build.outDir: '../resources/static'` writes assets into `src/main/resources/static/`, which Spring Boot's static-resource handler serves out of the JAR at runtime when `codeiq.ui.enabled=true` (default true; toggle in `application.yml`).
+
+To skip the frontend build during backend-only iteration: `mvn test -Dfrontend.skip=true` (the property is wired in `pom.xml`'s `` block as `false`).
+
+## Gotchas
+
+- **`lib/mcp-tools.ts` is hand-maintained** — when you add a new `@McpTool` in `mcp/McpTools.java`, you must mirror the entry in `lib/mcp-tools.ts` for the `McpConsole` and `Dashboard` to know about it. There is no auto-sync.
+- **`emptyOutDir: false`** — stale assets in `src/main/resources/static/` won't be deleted by Vite. If you renamed a chunk or removed a page, manually delete the static dir before the next build.
+- **MCP endpoint path is `/mcp`**, not `/api/mcp` — the Vite proxy reflects this. The Spring AI starter mounts MCP at the root.
+- **AntD chunk size is intentional.** Don't try to "fix" the 500 KB+ AntD chunk by code-splitting per page — the AntD design tokens shouldn't be reloaded per route. The manual chunk in `vite.config.ts` is the right granularity.
diff --git a/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md b/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md
new file mode 100644
index 00000000..81326446
--- /dev/null
+++ b/docs/specs/2026-04-27-resolver-spi-and-java-pilot-design.md
@@ -0,0 +1,379 @@
+# Sub-project 1 — Resolver SPI + Java Pilot + Confidence Schema
+
+> **Status:** Awaiting approval. Brainstormed 2026-04-27.
+> **Authors:** brainstormed via `superpowers:brainstorming` with the project maintainer.
+> **Audience:** the agent / engineer who will implement this. Every claim should be checkable against the codebase referenced by `CLAUDE.md` and `PROJECT_SUMMARY.md`.
+
+## 1. Context
+
+codeiq's detector layer is the right abstraction. The **layer below it** is the bottleneck: detectors receive a parse tree (ANTLR) or AST (JavaParser) but no resolved symbol table. As a result, edges like `CALLS`, `INJECTS`, `IMPLEMENTS`, `EXTENDS`, and many framework-specific edges are emitted *by name*, not by **resolved type**. Two same-named symbols across packages collapse into one node; `userService.findById(id)` resolves to whichever `findById` the detector happens to see first.
+
+This is the architectural seam between "rich code map" and "ground-truth semantic graph." Every other planned improvement — TypeScript / Python / Go / Rust / C++ / C# resolution, framework-aware detection refactor, cross-framework false-positive harness — slots into this seam. Doing it second means inventing the seam ad-hoc inside whichever sub-project lands first, then retrofitting.
+
+This spec covers **sub-project 1 of 8** in the larger "robust graph" decomposition:
+
+| # | Scope | This spec? |
+|---|---|---|
+| 1 | Resolver SPI + Java pilot + confidence/provenance schema | **Yes** |
+| 2 | TypeScript / JavaScript resolution | No |
+| 3 | Python resolution | No |
+| 4 | Go resolution | No |
+| 5 | Rust / C++ / C# resolution | No |
+| 6 | Framework-aware detection refactor | No |
+| 7 | Cross-framework false-positive harness | No |
+| 8 | MCP HTTP-streamable hardening (read-path) | No |
+
+## 2. Goals
+
+1. **Add a symbol-resolution stage** to the indexing pipeline, between parse and detect, that exposes a resolved symbol table to detectors.
+2. **Wire a Java backend** using JavaParser's `JavaSymbolSolver`, adding only `javaparser-symbol-solver-core`, which is published on the same release train as `javaparser-core`.
+3. **Add a confidence/provenance schema** (`Confidence` enum + `source` field) on every `CodeNode` and `CodeEdge`, round-tripped through Neo4j.
+4. **Migrate 4–6 Java detectors** to use the resolver as proof of value: at least one Spring DI detector, one JPA detector, one messaging detector.
+5. **Preserve backward compatibility:** all existing detectors compile and run unchanged. Resolution is opt-in per detector via `ctx.resolved()`.
+6. **Preserve determinism:** resolver-stage output is byte-identical run-to-run, given the same input.
+7. **Aggressive testing**, including adversarial inputs, concurrency stress, property-based and fuzz testing, mutation testing, and regression against the existing E2E quality bar.
+
+## 3. Non-goals
+
+- Maven / Gradle classpath JAR resolution beyond what `ReflectionTypeSolver` covers via the running JDK. (Possible follow-up: sub-project 1.5.)
+- Resolution for non-Java languages. (Sub-projects 2–5.)
+- Refactoring detectors to detect by resolved type rather than import-name. (Sub-project 6 — separate concern; a migrated detector here keeps its current detection mechanism, only resolving outgoing edges' targets more accurately.)
+- Performance optimization beyond what the design naturally affords. (Defer until measured.)
+- Changes to the serving layer (REST API, MCP tools, web UI). +- Changes to `application.yml` Spring-owned keys (CORS, Neo4j Bolt port, UI toggle). + +## 4. Architecture + +### 4.1 Pipeline shape + +The current `index` and `analyze` pipelines look like: + +``` +discover → parse → detect → link → classify → store +``` + +After this sub-project, they become: + +``` +discover → parse → resolve → detect → link → classify → store +``` + +The resolve stage runs after `analyzer/StructuredParser` produces a parsed file and before the detector fan-out kicks off. + +### 4.2 Resolver-pass placement + +- **Bootstrapping:** `analyzer/Analyzer` (or `cli/IndexCommand`'s in-process pipeline) calls `ResolverRegistry.bootstrap(rootPath)` once per analysis run, before file iteration begins. The Java resolver uses this hook to build a single `CombinedTypeSolver` configured with sorted source roots and `ReflectionTypeSolver`. Other languages' resolvers (future sub-projects) plug into the same hook. +- **Per-file resolution:** for each file, after parse, the analyzer asks `ResolverRegistry.resolverFor(language)` for the matching resolver, calls `resolve(parsedFile)`, and stores the result on the `DetectorContext` as `Optional<Resolved>`. +- **Detector consumption:** detectors call `ctx.resolved()`. If present, the detector may emit edges with `Confidence.RESOLVED`; if absent, the detector falls through to its existing logic and emits `Confidence.SYNTACTIC` (when AST-based) or `Confidence.LEXICAL` (when regex-based). + +### 4.3 Pipeline invariant + +The new stage must not change *which files are analyzed* or *which detectors run for them*. It only enriches the input each detector sees. A regression here breaks every downstream count and statistic. + +## 5.
Components + +### 5.1 New components + +| Path | Type | Responsibility | +|---|---|---| +| `intelligence/resolver/SymbolResolver.java` | interface | SPI: `Set<String> getSupportedLanguages(); Resolved resolve(ParsedFile parsed) throws ResolutionException;` | +| `intelligence/resolver/Resolved.java` | interface (or sealed type) | Read-only resolution result for one file: per-symbol type info, resolved imports, declared types. Includes `Confidence sourceConfidence()` indicating the resolver's confidence in this particular result. | +| `intelligence/resolver/EmptyResolved.java` | record / class | Singleton "no resolution available" — returned for unsupported languages, disabled config, or resolution failure. | +| `intelligence/resolver/ResolverRegistry.java` | `@Component` | Auto-discovers `@Component` `SymbolResolver` beans (mirrors `DetectorRegistry`). Exposes `resolverFor(language)` and `bootstrap(rootPath)`. | +| `intelligence/resolver/ResolutionException.java` | exception | Wraps backend-specific failures (e.g. `JavaSymbolSolver` errors) with context (file path, language). | +| `intelligence/resolver/java/JavaSymbolResolver.java` | `@Component` | Wraps `JavaSymbolSolver`. Builds `CombinedTypeSolver` from sorted source roots + `ReflectionTypeSolver`. | +| `intelligence/resolver/java/JavaResolved.java` | record | Java-specific `Resolved` carrying JavaParser `TypeSolver` + per-AST resolved type info. | +| `intelligence/resolver/java/JavaSourceRootDiscovery.java` | helper | Discovers Java source roots from a project root (auto-detects `src/main/java`, `src/test/java`, multi-module via Maven `<modules>` / Gradle `include`). Pure logic, unit-testable. | +| `model/Confidence.java` | enum | `LEXICAL` / `SYNTACTIC` / `RESOLVED` with a numeric mapping (0.6 / 0.8 / 0.95). Comparable. | +| `model/EdgeProvenance.java` *(optional, see §5.3)* | record | Optional richer provenance carrier; if not adopted, just use `String source` on `CodeEdge`.
| + +### 5.2 Changed components + +| Path | Change | Rationale | +|---|---|---| +| `detector/DetectorContext.java` | Add `Optional<Resolved> resolved()` accessor. Defaults to `Optional.empty()`. Existing constructors keep working. | Detector opt-in path. | +| `model/CodeNode.java` | Add `Confidence confidence` and `String source` fields. `source` filled in by detector base classes (detector class simple name). `confidence` set per parser type (see §5.3): `AbstractRegexDetector` → `LEXICAL`, `AbstractJavaParserDetector` / `AbstractAntlrDetector` / `AbstractStructuredDetector` → `SYNTACTIC`. Detectors override to `RESOLVED` when emitting an edge derived from `ctx.resolved()`. | Confidence/provenance schema. | +| `model/CodeEdge.java` | Same as `CodeNode`. | Same. | +| `graph/GraphStore.java` | `bulkSave` writes `prop_confidence` and `prop_source`; `nodeFromNeo4j` / `edgeFromNeo4j` restore them. | Round-trip the new fields. | +| `cache/AnalysisCache.java` | Bump `CACHE_VERSION` from 4 to 5. Add `confidence` and `source` columns to `nodes` and `edges` tables. | Schema change requires cache reset. | +| `analyzer/Analyzer.java` | Insert resolve step. `bootstrapResolvers(rootPath)` once; `resolverFor(language).resolve(parsed)` per file. | Pipeline integration. | +| `cli/IndexCommand.java` | Mirror `Analyzer`'s resolver bootstrap (the in-process H2 batched pipeline). | Both code paths must integrate. | +| 4–6 Java detectors (see §5.4) | Use `ctx.resolved()`. Emit `Confidence.RESOLVED` when present; existing path emits `Confidence.SYNTACTIC`. | Proof of value. | +| `pom.xml` | Add `com.github.javaparser:javaparser-symbol-solver-core` (Apache-2.0, version-pinned to match `javaparser-core`). Resolve **latest stable matching version** at implementation time. Add `net.jqwik:jqwik` (test scope, EPL-2.0) for property-based tests. | New deps. | +| `codeiq.yml` schema (`docs/codeiq.yml.example`) | Document the new `intelligence.symbol_resolution.java` keys. | Surface the new config.
| +| `config/CodeIqConfig.java` (or unified-config equivalent) | Bind the new keys. | Enable the toggles. | + +### 5.3 Confidence / provenance — schema decisions + +- **Storage shape:** the simplest viable model is two scalar fields on every `CodeNode` and `CodeEdge`: + - `confidence: Confidence` (enum, non-null). The default is set by the detector's base class — not a single hardcoded value — based on the parser used: + - `AbstractRegexDetector` → `LEXICAL` (pattern-only, no AST) + - `AbstractJavaParserDetector` / `AbstractAntlrDetector` / `AbstractStructuredDetector` / `AbstractPythonAntlrDetector` / `AbstractTypeScriptDetector` / `AbstractJavaMessagingDetector` / `AbstractPythonDbDetector` → `SYNTACTIC` (AST or parse tree, no symbol resolution) + - Detector overrides to `RESOLVED` for any edge derived from `ctx.resolved()`. + - `source: String` (non-null; detector class simple name, e.g. `"SpringServiceDetector"`) +- **Numeric access:** consumers (Cypher queries, MCP tools, the SPA) get a numeric value via `Confidence.score()` (0.6 / 0.8 / 0.95). The mapping is a static lookup; the enum is the authoritative form. +- **Future extensibility:** if richer provenance is needed later (e.g. resolver name, resolution timestamp), extend with optional `prop_resolver` etc. — the enum + source design does not preclude this. Don't pre-build for it. +- **MCP / API surface:** `confidence` and `source` are passthrough fields in node/edge JSON serialization. No new endpoints. Cypher filters can use `WHERE n.confidence = 'RESOLVED'` once the schema lands. + +### 5.4 Detector migration candidates (4–6) + +Final selection happens at implementation time based on which gives the clearest signal in `spring-petclinic`. Likely set: + +| Detector | Path | Why | +|---|---|---| +| `SpringServiceDetector` | `detector/jvm/java/SpringServiceDetector.java` | `@Autowired UserService` — needs to resolve `UserService` to its actual type for cross-class wiring. Highest visibility win. 
| +| `SpringRepositoryDetector` | `detector/jvm/java/SpringRepositoryDetector.java` | Repository interfaces extending `JpaRepository<T, ID>` — resolving `T` lets us link the repo to the entity. | +| `JpaEntityDetector` | `detector/jvm/java/JpaEntityDetector.java` | `@OneToMany List` — resolving the generic argument links entity-to-entity correctly. | +| `JpaRepositoryDetector` | `detector/jvm/java/JpaRepositoryDetector.java` | Same as Spring repo, deeper. | +| `KafkaListenerDetector` | `detector/jvm/java/KafkaListenerDetector.java` | Topic resolution from `@KafkaListener(topics = TOPIC_CONST)`. | +| `SpringRestDetector` | `detector/jvm/java/SpringRestDetector.java` | `@RequestBody UserDto dto` — resolving `UserDto` enables `MAPS_TO` edges from endpoint to entity. | + +Six is the upper bound; if four are sufficient to demonstrate measurable quality lift on petclinic, the rest can be migrated in follow-up PRs without changing this spec. + +## 6. Data flow (per analysis run) + +``` +1. cli/{Index,Analyze}Command.call() → analyzer/Analyzer.run(rootPath) + 1.1. ResolverRegistry.bootstrap(rootPath) + → JavaSymbolResolver.bootstrap() + - JavaSourceRootDiscovery.discover(rootPath) → sorted List<Path> + - new CombinedTypeSolver( + new ReflectionTypeSolver(), + sorted source roots wrapped in JavaParserTypeSolver) + - new JavaSymbolSolver(combinedTypeSolver) + - configure JavaParser default ParserConfiguration with the solver +2. For each discovered file (virtual thread): + 2.1. StructuredParser.parse(file) → ParsedFile (Java → CompilationUnit; others → existing types) + 2.2. resolved = ResolverRegistry.resolverFor(file.language()).resolve(parsedFile) + (returns EmptyResolved.INSTANCE for languages without a registered resolver) + 2.3. ctx = DetectorContext.builder()...resolved(resolved)...build() + 2.4. for each Detector matching language: detector.detect(ctx) +3.
GraphBuilder.flush() → AnalysisCache (or → GraphStore on enrich) + - Each node and edge carries Confidence + source + - Round-tripped via prop_confidence / prop_source in Neo4j +``` + +## 7. Configuration surface + +New keys in `codeiq.yml`: + +```yaml +intelligence: + symbol_resolution: + java: + enabled: true + source_roots: auto # or explicit list of paths relative to repo root + jdk_reflection: true # ReflectionTypeSolver — needs JDK on classpath (always true for codeiq's runtime) + # bootstrap_timeout_seconds: 30 (kill switch if solver hangs) + # max_per_file_resolve_ms: 500 (per-file resolution timeout) +``` + +**Defaults:** +- `enabled: true` — most users want correctness > raw speed. +- `source_roots: auto` — discovery covers Maven (`src/main/java`, `src/test/java`, multi-module via `<modules>` in `pom.xml`), Gradle (similar), and plain layouts. +- `jdk_reflection: true`. +- `bootstrap_timeout_seconds: 30`. +- `max_per_file_resolve_ms: 500`. + +**Env overrides:** `CODEIQ_INTELLIGENCE_SYMBOL_RESOLUTION_JAVA_ENABLED=false` etc. + +**Config validation:** `codeiq config validate` must reject invalid combinations (e.g. `enabled: true` with empty `source_roots: []`). + +## 8. Backward compatibility + +- All existing `Detector` implementations compile and run unchanged. `ctx.resolved()` returns `Optional.empty()` for them by default (they never call it). +- Existing tests must pass with `intelligence.symbol_resolution.java.enabled: false`. **Mandatory.** Two sub-cases: + - **Logical-content tests** (assert on node IDs, edge counts, specific property values): pass unchanged. + - **JSON-snapshot / golden-file tests** (assert on full serialized output): will shift by exactly two new fields per node/edge (`confidence`, `source`). These get a **one-time refresh** during implementation, with a separate commit so the diff is reviewable. The refresh must produce only those two added fields per record — any other diff is a bug.
+- With `enabled: true`, logical-content tests still pass — but some node/edge counts may shift **by design** (resolved-mode detectors emit different / additional edges that the lexical fallback could not produce). Expected diffs are recorded in the implementation plan and PR description. +- `CACHE_VERSION` bump from 4 to 5 wipes old `.codeiq/cache/` on first run. Documented in `CHANGELOG.md` under `[Unreleased]` as a breaking cache change. End users lose nothing meaningful; the cache rebuilds on the next `index` run. + +## 9. Performance budget + +| Stage | Cost | Notes | +|---|---|---| +| Resolver bootstrap | 2–5 s on a medium repo | One-time per run. Cached `CombinedTypeSolver` reused across files. | +| Per-Java-file resolve | 50–200 ms typical | Net +30–60% on Java analysis time. | +| Per-non-Java-file resolve | 0 (EmptyResolved) | No-op. | +| Memory overhead | tens to low hundreds of MB | `CombinedTypeSolver` caches resolved type info; bounded by source-root size. | +| Determinism cost | none | Sorted source roots add ms-scale. | + +For a 44 K-file codebase: +- Today: index ~220 s. +- After: index ~280–350 s (Java-heavy repos worst case). Acceptable. +- Mitigation: `intelligence.symbol_resolution.java.enabled: false` for raw-speed scans. + +**Performance gate:** if resolver bootstrap exceeds 10 s on `spring-petclinic`, the implementation has a bug — investigate before merge. + +## 10. Determinism guarantees + +- `JavaSourceRootDiscovery.discover(rootPath)` returns roots sorted alphabetically. +- `CombinedTypeSolver` member solvers added in the sorted order. +- `ResolverRegistry` exposes resolvers in stable iteration order (Spring `@Component` collection sorted by simple class name). +- `Resolved` value-types use `TreeMap` / sorted `List` for any iteration-order-sensitive data. +- New determinism test (mandatory): run resolver twice on the same input via separate JVM invocations, assert byte-identical serialized output. Mirrors existing detector convention. 
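The determinism rules above (sorted source roots, `TreeMap`-ordered result data) can be illustrated with a small sketch. It is a hypothetical helper, not the planned `JavaSourceRootDiscovery` API. `DeterministicRoots`, `canonicalOrder`, `serialize`, and the `symbol -> type` text format are all illustrative assumptions. The sketch also assumes that roots are compared by their normalized, `/`-separated string form, so that any input order produces the same solver order.

```java
import java.nio.file.Path;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not the planned JavaSourceRootDiscovery API) showing two
 * determinism rules: a canonical source-root order, and key-ordered serialization.
 */
final class DeterministicRoots {

    private DeterministicRoots() {}

    /** Sort key: the path string with '/' separators on every platform. */
    private static String sortKey(Path p) {
        return p.toString().replace('\\', '/');
    }

    /**
     * Normalizes ("a/./b" becomes "a/b"), de-duplicates, and sorts roots, so a
     * CombinedTypeSolver built from the result would see them in one fixed order.
     */
    static List<Path> canonicalOrder(Collection<Path> roots) {
        return roots.stream()
                .map(Path::normalize)
                .distinct()
                .sorted(Comparator.comparing(DeterministicRoots::sortKey))
                .toList();
    }

    /** Serializes symbol -> resolved-type pairs in key order via a TreeMap. */
    static String serialize(Map<String, String> resolvedTypes) {
        StringBuilder out = new StringBuilder();
        new TreeMap<>(resolvedTypes).forEach((symbol, type) ->
                out.append(symbol).append(" -> ").append(type).append('\n'));
        return out.toString();
    }
}
```

The sketch sorts on an explicit string key rather than on `Path`'s natural ordering because the `Path.compareTo` Javadoc describes that ordering as provider-specific (platform-specific for the default provider). Relying on it would let the host OS influence the root order.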
+ +## 11. Error handling + +| Failure | Behavior | +|---|---| +| Source root configured but missing | Log WARN, drop from solver list, continue. | +| Source root contains no Java files | Drop from solver list, continue. | +| `CombinedTypeSolver` construction throws | Log ERROR with classpath context, fall back to `EmptyResolved` for all files (resolver disabled for this run), increment a metric. Do **not** abort the analysis. | +| Per-file `resolve(parsedFile)` throws | Log DEBUG (these are expected for malformed sources), return `EmptyResolved` for that file, continue. | +| Per-file resolution exceeds `max_per_file_resolve_ms` | Cancel via virtual-thread interruption, return `EmptyResolved` for that file, count timeout in metrics. | +| Bootstrap exceeds `bootstrap_timeout_seconds` | Abort bootstrap, fall back to `EmptyResolved` for the run, log ERROR. Run continues without resolution. | +| Detector calls `ctx.resolved().get()` and crashes | Caught by existing per-detector `try/catch` in `Analyzer` — file is skipped, detector is logged, run continues. (Existing behavior.) | + +## 12. Aggressive testing strategy + +This section is binding. Every layer below is mandatory for sub-project 1; the same template applies to sub-projects 2–8. + +### Layer 1 — Resolver unit tests (pure, fast) + +For `JavaSymbolResolver`, with one synthetic source tree per test: + +- Empty file (zero declarations). +- Single class with no imports. +- Class with multiple methods of varying signatures (overloads). +- Class with generics (≥3 levels of nesting: `Map>>`). +- Inner classes (static, non-static, anonymous, local). +- Lambda expressions and method references. +- Records and sealed classes (Java 25). +- Enum with abstract methods. +- Interface with default methods. +- Abstract class. +- Annotations (definition + use). +- Imports: explicit, static, wildcard, missing target, unused. +- Cyclic imports between two files (legal in Java) — both resolve. 
+- Two classes with the same simple name in different packages — both resolve to distinct nodes. +- Symbol defined in JDK (`Optional`, `Stream`, `List`) — resolves via `ReflectionTypeSolver`. +- Multi-source-root: a class in `src/main/java` referencing one in `src/test/java`. + +Expected: every test asserts the *exact* `Resolved` content via golden files committed under `src/test/resources/intelligence/resolver/java/`. + +### Layer 2 — Detector × resolver integration tests + +For each migrated detector: +- **Resolved-mode positive:** with resolver enabled, assert resolved-only edges that the lexical fallback could not produce (e.g. `INJECTS` edges to the *correct* `UserService` of two same-named classes in different packages). +- **Fallback-mode positive:** with resolver disabled, assert logical-content output identical to the pre-spec baseline (modulo the additive `confidence` and `source` fields per §8). +- **Mixed mode:** simulate resolver failure on half the files; the other half emits resolved edges, the failing half emits fallback edges. Both labeled with correct `Confidence`. + +### Layer 3 — Concurrency stress + +- 1000 synthetic Java files resolved on virtual threads. Assert: no exceptions, no deadlocks, no thread starvation, total throughput within 2× of sequential baseline. Output identical to sequential run (sort-then-compare). +- Resolver bootstrap happens **once** even if 50 threads call `resolverFor` simultaneously at startup. Verify via mock + invocation count. + +### Layer 4 — Memory / pathological inputs + +- 10 000-line synthetic class file: resolves under -Xmx512m. +- File with 1000 imports (most unresolved): resolves without OOM; produces the expected partial result. +- Deep generic nesting (10 levels deep): resolves; runtime ≤ 1 s. + +### Layer 5 — Adversarial inputs + +- File with syntax errors (parser fails): resolver never invoked; `Analyzer` continues. 
+- File mis-tagged as Java but actually Kotlin / Groovy / random bytes: parser fails first; resolver never sees it. +- Mixed source root with `.java` and unrelated files: only `.java` files enter the solver. +- `ReflectionTypeSolver` simulated as unavailable (test injects null JDK classpath): resolver works at reduced fidelity, returns `Confidence.SYNTACTIC` for JDK-dependent symbols. + +### Layer 6 — Determinism + +- Run resolver 10 times against the same input on the same JVM. Assert byte-identical serialized graphs. +- Run resolver against the same input, with source roots passed in a different order. Assert byte-identical output (we sort internally). +- Run on cold and warm JVMs. Identical. + +### Layer 7 — E2E quality regression (gating) + +- `E2EQualityTest` against `spring-petclinic` ground truth (`src/test/resources/e2e/ground-truth-petclinic.json`): + - With `enabled: false`: logical-content output identical to the pre-spec baseline (modulo the additive `confidence` and `source` fields per record — see §8). Mandatory regression gate. + - With `enabled: true`: edge precision / recall **measurably up** vs. the `enabled: false` baseline. The implementation plan will record before/after numbers; this spec demands measurable improvement with no regressions on other metrics in the ground-truth file. +- Full `mvn test` green. +- Full `mvn verify` green (SpotBugs, dependency-check). May skip locally; CI is authoritative. + +### Layer 8 — Property-based / fuzz (jqwik) + +- New test scope dependency: `net.jqwik:jqwik` (latest stable, EPL-2.0). License is EPL-2.0 — flag for explicit approval; if rejected, swap for a permissive alternative (or hand-write generators). **License decision deferred to implementation time** — see §15 below. +- Generators produce small synthetic Java source strings (within JavaParser's grammar). Invariants tested: + - Resolver never throws an unchecked exception (only `ResolutionException` or returns `EmptyResolved`). 
+ - Resolver always terminates within `max_per_file_resolve_ms`. + - Same input → same output (deterministic). + - Editing an unrelated file in a different source root never changes the resolution of file F. + +### Layer 9 — Mutation testing (PIT) + +- Add PIT mutation testing as a **non-gating** Maven goal (e.g. `mvn -P mutation pitest:mutationCoverage`). +- Target: 80% mutant kill rate on the new packages (`io.github.randomcodespace.iq.intelligence.resolver.*`, `io.github.randomcodespace.iq.model.Confidence`). +- Not bound to `mvn verify` — runs on demand. Used as a code-quality signal during PR review. + +### Test-data hygiene + +- Synthetic Java sources for unit tests live under `src/test/resources/intelligence/resolver/java//...`. +- Each scenario has a `README.md` explaining intent (one paragraph). +- Golden output (`expected.json`) checked in. Updated only via a documented refresh script. + +## 13. Acceptance criteria + +Sub-project 1 is "done" when **all** of the following are true on the feature branch: + +1. **All tests in §12 layers 1–7 pass.** Layers 8 and 9 are non-gating but must run cleanly. +2. **`mvn verify` green** on CI (full Java CI workflow, including SpotBugs and OWASP dependency-check). +3. **No logical-content regression** in any existing test (`mvn test` green with `enabled: false`). Snapshot tests refreshed in a separate commit per §8; the refresh diff must be limited to the two additive fields per record. +4. **E2E petclinic precision/recall measurably improved** with `enabled: true`. The PR description records before/after numbers. +5. **`CHANGELOG.md`** updated under `[Unreleased]` with a one-paragraph entry naming the new config keys, the schema additions, and the cache-version bump. +6. **`CLAUDE.md`** updated under "Gotchas" to note: confidence/provenance is now mandatory on every node/edge; the resolver pass is part of the pipeline; cache version is 5. +7. **`PROJECT_SUMMARY.md`** "Tech stack" + "Gotchas" updated. +8. 
**Determinism re-verified** on the migrated detectors (existing determinism tests still pass; new ones added per §12 layer 6). +9. **No new dependencies with non-permissive licenses** (Apache-2.0 / MIT / BSD only without explicit user sign-off; jqwik EPL-2.0 needs explicit OK or replacement — see §15). +10. **No new High/Critical CVEs** introduced (`mvn verify` security gate green). + +## 14. Risks & mitigations + +| Risk | Likelihood | Impact | Mitigation | +|---|---|---|---| +| `JavaSymbolSolver` performance worse than budgeted | Medium | Pipeline unusable for very large repos | `enabled: false` escape hatch; performance gate in §9; profile before merge | +| Source-root auto-discovery wrong on niche project layouts | Medium | Resolver falls back to `EmptyResolved` silently → user sees no improvement | Explicit `source_roots: [list]` override; clear log message at WARN when discovery yields zero roots; `codeiq config explain` shows discovered roots | +| Confidence schema change breaks consumers (SPA, MCP clients) | Low (additive only) | API drift | Fields are additive; default `LEXICAL`/detector-name. Existing consumers ignore unknown fields per Jackson config (`FAIL_ON_UNKNOWN_PROPERTIES = false`). 
| +| Cache-version bump surprises users | Low | One-time slow re-index after upgrade | `CHANGELOG` entry; user-facing log line on first run after bump | +| jqwik EPL-2.0 license blocked by user policy | Low (already flagged in defaults) | No property-based tests in layer 8 | Hand-write generators or pick a permissive alternative; flagged for decision at impl time | +| `JavaSymbolSolver` panics on Java 25 idioms (records, sealed, pattern-match) | Medium | Resolver failure on modern Java | Per-file resolution failures are caught (§11); track upstream JavaParser issues; pin to latest JavaParser version | +| Cross-class resolution still ambiguous with same-named symbols across modules | Medium | False matches even with resolver | Track via E2E quality numbers; flag for sub-project 1.5 (Maven/Gradle classpath JAR resolution) if material | + +## 15. Dependency decisions + +To be resolved at implementation time (NOT in this spec): + +1. **`javaparser-symbol-solver-core` exact version.** Resolve the latest stable version compatible with `javaparser-core` (currently 3.28.0 per CLAUDE.md). Use `context7` MCP first; fall back to Maven Central. +2. **`net.jqwik:jqwik` license (EPL-2.0).** Per `~/.claude/rules/dependencies.md`: "Permissive licenses (MIT/Apache/BSD) preferred. GPL/AGPL flagged for approval." EPL-2.0 is not GPL/AGPL but is also not on the preferred list. Default plan: ask the user once at implementation time; if blocked, swap for hand-rolled generators or another permissive property-test framework. **Will not add jqwik silently.** +3. **PIT mutation testing dep.** Apache-2.0; safe to add as a non-default Maven profile. + +## 16. Out of scope (cross-reference) + +- **TypeScript / JavaScript / Python / Go / Rust / C++ / C# resolution** — sub-projects 2–5. They will plug into the SPI defined here. +- **Detect-by-resolved-type detector refactor** — sub-project 6. 
Migrated detectors here keep their current detection mechanism; only their *outgoing edges* benefit from resolution. +- **Cross-framework false-positive harness** — sub-project 7. +- **MCP HTTP-streamable hardening** — sub-project 8. +- **Maven/Gradle classpath JAR resolution** — possible sub-project 1.5 if E2E quality numbers reveal a gap. + +## 17. Implementation sequencing (informational, plan owns the detail) + +The plan that follows this spec will sequence work as: +1. Schema changes (`Confidence` enum, `CodeNode`/`CodeEdge` fields, Neo4j round-trip, `AnalysisCache` schema + version bump). +2. SPI scaffolding (`SymbolResolver`, `Resolved`, `EmptyResolved`, `ResolverRegistry`). +3. Java backend (`JavaSourceRootDiscovery`, `JavaSymbolResolver`, `JavaResolved`). +4. Pipeline wiring (`Analyzer`, `IndexCommand`). +5. Detector migration (one detector at a time, each with new + existing tests passing). +6. Aggressive testing layers (1–9 in order, layers 8/9 may run in parallel with 5–7). +7. Doc updates (`CHANGELOG`, `CLAUDE.md`, `PROJECT_SUMMARY.md`). +8. PR ready for human review when all acceptance criteria green. + +## 18. References + +- [`PROJECT_SUMMARY.md`](../../PROJECT_SUMMARY.md) — repo-wide entry point. +- [`CLAUDE.md`](../../CLAUDE.md) — canonical internals. +- [`docs/project/architecture.md`](../project/architecture.md) — pipeline + components, including the package layering rule that detectors may not depend on `analyzer/`. +- [`docs/project/data-model.md`](../project/data-model.md) — `NodeKind`, `EdgeKind`, Neo4j schema, H2 cache schema. +- [`docs/project/conventions.md`](../project/conventions.md) — detector authoring, base classes, "don't refactor" rules. +- [`docs/project/build-and-run.md`](../project/build-and-run.md) — Maven, ANTLR codegen, frontend bundling. +- JavaParser symbol-solver documentation: resolve via `context7` MCP at implementation time. 
+- Sourcegraph SCIP and GitHub Stack Graphs as comparable patterns (informational only — not adopted in sub-project 1). diff --git a/pom.xml b/pom.xml index 3f1144ab..8c610860 100644 --- a/pom.xml +++ b/pom.xml @@ -195,6 +195,14 @@ 3.28.0 + + + com.github.javaparser + javaparser-symbol-solver-core + 3.28.0 + + org.antlr diff --git a/src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java b/src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java index 690d197b..d066126d 100644 --- a/src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java +++ b/src/main/java/io/github/randomcodespace/iq/analyzer/Analyzer.java @@ -11,6 +11,7 @@ import io.github.randomcodespace.iq.detector.AbstractAntlrDetector; import io.github.randomcodespace.iq.detector.Detector; import io.github.randomcodespace.iq.detector.DetectorContext; +import io.github.randomcodespace.iq.detector.DetectorEmissionDefaults; import io.github.randomcodespace.iq.detector.DetectorRegistry; import io.github.randomcodespace.iq.detector.DetectorResult; import io.github.randomcodespace.iq.detector.DetectorUtils; @@ -1311,6 +1312,9 @@ DetectorResult analyzeFileWithRegistry(DiscoveredFile file, Path repoPath, } try { DetectorResult result = detector.detect(ctx); + // Stamp confidence + source defaults on every emission whose source + // is null. Detectors that already explicitly stamp are left alone. + DetectorEmissionDefaults.applyDefaults(result, detector); allNodes.addAll(result.nodes()); allEdges.addAll(result.edges()); } catch (Throwable e) { @@ -1514,6 +1518,8 @@ DetectorResult analyzeFile(DiscoveredFile file, Path repoPath, DetectorRegistry try { Instant detStart = Instant.now(); DetectorResult result = detector.detect(ctx); + // Stamp orchestrator-managed confidence + source defaults. 
+ DetectorEmissionDefaults.applyDefaults(result, detector); long detMs = Duration.between(detStart, Instant.now()).toMillis(); if (detMs > 2000) { log.warn("🐢 SLOW DETECTOR: {} on {}: {}ms", @@ -1601,6 +1607,8 @@ private DetectorResult analyzeFileRegexOnly(DiscoveredFile file, Path repoPath, } else { result = detector.detect(ctx); } + // Stamp orchestrator-managed confidence + source defaults. + DetectorEmissionDefaults.applyDefaults(result, detector); allNodes.addAll(result.nodes()); allEdges.addAll(result.edges()); } catch (Throwable e) { diff --git a/src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java b/src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java index fa15ac01..630d16b4 100644 --- a/src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java +++ b/src/main/java/io/github/randomcodespace/iq/cache/AnalysisCache.java @@ -5,6 +5,7 @@ import com.fasterxml.jackson.databind.ObjectMapper; import io.github.randomcodespace.iq.model.CodeEdge; import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; import io.github.randomcodespace.iq.model.EdgeKind; import io.github.randomcodespace.iq.model.NodeKind; import org.slf4j.Logger; @@ -39,8 +40,8 @@ public final class AnalysisCache implements Closeable { private static final Logger log = LoggerFactory.getLogger(AnalysisCache.class); - /** Bump when hash algorithm or schema changes to force cache invalidation. */ - private static final int CACHE_VERSION = 4; + /** Bump when hash algorithm or serialization shape changes to force cache invalidation. 
*/ + private static final int CACHE_VERSION = 5; private static final String SCHEMA_SQL = """ CREATE TABLE IF NOT EXISTS cache_meta ( @@ -689,6 +690,10 @@ private String serializeNode(CodeNode node) { if (node.getLineStart() != null) data.put("line_start", node.getLineStart()); if (node.getLineEnd() != null) data.put("line_end", node.getLineEnd()); if (node.getLayer() != null) data.put("layer", node.getLayer()); + // Confidence is never null at rest (setter normalizes to LEXICAL); store the + // enum name. Source is optional and stays null for bare construction. + data.put("confidence", node.getConfidence().name()); + if (node.getSource() != null) data.put("source", node.getSource()); if (node.getAnnotations() != null && !node.getAnnotations().isEmpty()) { data.put("annotations", node.getAnnotations()); } @@ -720,6 +725,20 @@ private CodeNode deserializeNode(String json) { if (data.get("line_start") instanceof Number n) node.setLineStart(n.intValue()); if (data.get("line_end") instanceof Number n) node.setLineEnd(n.intValue()); node.setLayer((String) data.get("layer")); + // Confidence + source: missing/malformed values fall back to LEXICAL/null + // — never throw — so legacy cache rows without these fields still load. + Object confObj = data.get("confidence"); + if (confObj instanceof String confStr) { + try { + node.setConfidence(Confidence.fromString(confStr)); + } catch (IllegalArgumentException ignored) { + // keep default LEXICAL + } + } + Object srcObj = data.get("source"); + if (srcObj instanceof String src) { + node.setSource(src); + } if (data.get("annotations") instanceof List list) { node.setAnnotations(list.stream().map(Object::toString).toList()); } @@ -743,6 +762,9 @@ private String serializeEdge(CodeEdge edge) { if (edge.getTarget() != null) { data.put("target_id", edge.getTarget().getId()); } + // Confidence is never null at rest; source is optional. 
+ data.put("confidence", edge.getConfidence().name()); + if (edge.getSource() != null) data.put("source", edge.getSource()); if (edge.getProperties() != null && !edge.getProperties().isEmpty()) { data.put("properties", edge.getProperties()); } @@ -772,6 +794,19 @@ private CodeEdge deserializeEdge(String json) { } CodeEdge edge = new CodeEdge(id, EdgeKind.fromValue(kindStr), sourceId, target); + // Confidence + source: missing/malformed → LEXICAL/null, never throw. + Object confObj = data.get("confidence"); + if (confObj instanceof String confStr) { + try { + edge.setConfidence(Confidence.fromString(confStr)); + } catch (IllegalArgumentException ignored) { + // keep default LEXICAL + } + } + Object srcObj = data.get("source"); + if (srcObj instanceof String src) { + edge.setSource(src); + } if (data.get("properties") instanceof Map<?, ?> map) { @SuppressWarnings("unchecked") Map<String, Object> props = (Map<String, Object>) map; diff --git a/src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java index 008f0407..efe56f30 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/AbstractAntlrDetector.java @@ -1,5 +1,6 @@ package io.github.randomcodespace.iq.detector; +import io.github.randomcodespace.iq.model.Confidence; import org.antlr.v4.runtime.*; import org.antlr.v4.runtime.atn.PredictionMode; import org.antlr.v4.runtime.tree.ParseTree; @@ -22,6 +23,17 @@ public abstract class AbstractAntlrDetector extends AbstractRegexDetector { private static final Logger log = LoggerFactory.getLogger(AbstractAntlrDetector.class); + /** + * ANTLR parse trees are syntactic but not symbol-resolved — bump the + * regex-default {@link Confidence#LEXICAL} up to {@link Confidence#SYNTACTIC}. + * Subclasses that resolve symbols should call {@code setSource(...)} and + * {@code setConfidence(RESOLVED)} explicitly on their emissions.
+ */ + @Override + public Confidence defaultConfidence() { + return Confidence.SYNTACTIC; + } + @Override public DetectorResult detect(DetectorContext ctx) { try { diff --git a/src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java index b02d5be0..390d1068 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/AbstractRegexDetector.java @@ -1,5 +1,7 @@ package io.github.randomcodespace.iq.detector; +import io.github.randomcodespace.iq.model.Confidence; + import java.util.ArrayList; import java.util.List; @@ -9,6 +11,15 @@ */ public abstract class AbstractRegexDetector implements Detector { + /** + * Regex matches are pattern-only — no parse tree, no symbol resolution. + * Confidence floor for emissions from this base class is {@link Confidence#LEXICAL}. + */ + @Override + public Confidence defaultConfidence() { + return Confidence.LEXICAL; + } + /** * A single line of content with its 1-based line number. 
*/ diff --git a/src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java index 58e8592d..85685cce 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/AbstractStructuredDetector.java @@ -2,6 +2,7 @@ import io.github.randomcodespace.iq.model.CodeEdge; import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; import io.github.randomcodespace.iq.model.EdgeKind; import io.github.randomcodespace.iq.model.NodeKind; @@ -16,6 +17,15 @@ */ public abstract class AbstractStructuredDetector implements Detector { + /** + * Structured (YAML/JSON/TOML/properties) parsing produces a parsed shape, not + * just a regex match — confidence floor is {@link Confidence#SYNTACTIC}. + */ + @Override + public Confidence defaultConfidence() { + return Confidence.SYNTACTIC; + } + /** * Safely cast an object to {@code Map<String, Object>}. * Returns an empty map if the object is not a map. diff --git a/src/main/java/io/github/randomcodespace/iq/detector/Detector.java b/src/main/java/io/github/randomcodespace/iq/detector/Detector.java index 2a82c968..05530fa5 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/Detector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/Detector.java @@ -1,9 +1,28 @@ package io.github.randomcodespace.iq.detector; +import io.github.randomcodespace.iq.model.Confidence; + import java.util.Set; public interface Detector { String getName(); Set<String> getSupportedLanguages(); DetectorResult detect(DetectorContext ctx); + + /** + * Confidence floor for nodes and edges this detector emits without explicitly + * setting one. Stamped by the orchestrator (see {@code DetectorEmissionDefaults}) + * onto every emission whose {@code source} is still null — i.e. the detector + * didn't explicitly stamp anything.
Default is {@link Confidence#LEXICAL} — the + * least-committal floor; base classes override to bump up to + * {@link Confidence#SYNTACTIC} for AST-backed detection. + * + *

A detector with stronger evidence (e.g. a resolved symbol) should set both + * {@code source} and {@code confidence} (e.g. {@code Confidence.RESOLVED}) + * explicitly: the stamping pass keys off {@code source == null}, so a value + * set through {@code setConfidence} alone is overwritten by this default. + */ + default Confidence defaultConfidence() { + return Confidence.LEXICAL; + } } diff --git a/src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java b/src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java index 0ffc79b8..3abfc4fa 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/DetectorContext.java @@ -1,23 +1,63 @@ package io.github.randomcodespace.iq.detector; import io.github.randomcodespace.iq.analyzer.InfrastructureRegistry; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import java.util.Optional; + +/** + * Immutable per-file context passed to every {@link Detector#detect}. + * + *

The {@code resolved} field is the opt-in entry point for symbol-resolution + * data. Detectors that want to upgrade emissions to {@link + * io.github.randomcodespace.iq.model.Confidence#RESOLVED} call + * {@code ctx.resolved().filter(Resolved::isAvailable).map(...)} before + * downcasting to the language-specific {@code Resolved} subclass. Detectors + * that don't care simply ignore the field — the existing pipeline works + * unchanged when {@link #resolved()} returns {@code Optional.empty()}. + */ public record DetectorContext( String filePath, String language, String content, Object parsedData, String moduleName, - InfrastructureRegistry registry + InfrastructureRegistry registry, + Optional<Resolved> resolved ) { - /** Minimal constructor — no parsed data, module name, or registry. */ + /** Compact constructor: normalize {@code null resolved} to {@link Optional#empty()}. */ + public DetectorContext { + if (resolved == null) { + resolved = Optional.empty(); + } + } + + /** Minimal constructor — no parsed data, module name, registry, or resolution. */ public DetectorContext(String filePath, String language, String content) { - this(filePath, language, content, null, null, null); + this(filePath, language, content, null, null, null, Optional.empty()); } - /** Full constructor without registry — backward compat for existing callers. */ + /** Backward-compat: 5-arg form without registry / resolution. */ public DetectorContext(String filePath, String language, String content, Object parsedData, String moduleName) { - this(filePath, language, content, parsedData, moduleName, null); + this(filePath, language, content, parsedData, moduleName, null, Optional.empty()); + } + + /** Backward-compat: 6-arg form with registry but no resolution (matches the old canonical record signature).
*/ + public DetectorContext(String filePath, String language, String content, + Object parsedData, String moduleName, + InfrastructureRegistry registry) { + this(filePath, language, content, parsedData, moduleName, registry, Optional.empty()); + } + + /** + * Return a copy of this context with the given {@link Resolved} attached. + * Used by the orchestrator after the resolver pass to thread per-file + * resolution into the detector. {@code null} is normalized to + * {@link Optional#empty()}. + */ + public DetectorContext withResolved(Resolved resolved) { + Optional<Resolved> opt = resolved != null ? Optional.of(resolved) : Optional.empty(); + return new DetectorContext(filePath, language, content, parsedData, moduleName, registry, opt); } } diff --git a/src/main/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaults.java b/src/main/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaults.java new file mode 100644 index 00000000..9ad674bd --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaults.java @@ -0,0 +1,61 @@ +package io.github.randomcodespace.iq.detector; + +import io.github.randomcodespace.iq.model.CodeEdge; +import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; + +/** + * Stamps the orchestrator-managed confidence + source defaults onto a + * {@link DetectorResult}. This is invoked by the analyzer / index pipeline + * after each {@link Detector#detect(DetectorContext)} call so detectors stay + * blissfully unaware of the bookkeeping. + * + *

Stamping rule — for every node and edge in the result: + *

  • If {@code getSource() == null} (i.e. the detector did not explicitly + * stamp anything), the entry is treated as "wants defaults": + *
    • {@code source} is set to the detector's class simple name. + *
    • {@code confidence} is set to {@link Detector#defaultConfidence()}. + *
  • If {@code getSource() != null} (the detector stamped explicitly), + * both fields are left alone; the detector knows what it's doing. + *
+ *
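The stamping rule above can be exercised with a small standalone sketch. The `Emission`, `SketchDetector`, `AstBackedDetector`, and `EmissionDefaults` types are hypothetical stand-ins for `CodeNode`, `Detector`, an AST-backed base class, and `DetectorEmissionDefaults`; only the `source`/`confidence` fields and the `source == null` check are modelled, and edges are omitted because they follow the same rule. It also makes one consequence of keying off `source` visible: an emission that sets `RESOLVED` but leaves `source` null still counts as "wants defaults".

```java
import java.util.List;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

// Hypothetical stand-in for CodeNode/CodeEdge: only the two stamped fields.
final class Emission {
    private Confidence confidence = Confidence.LEXICAL; // never null at rest
    private String source;                              // null means "wants defaults"

    Confidence getConfidence() { return confidence; }
    void setConfidence(Confidence c) { confidence = (c == null) ? Confidence.LEXICAL : c; }
    String getSource() { return source; }
    void setSource(String s) { source = s; }
}

interface SketchDetector {
    // Least-committal floor, mirroring Detector.defaultConfidence().
    default Confidence defaultConfidence() { return Confidence.LEXICAL; }
}

// Plays the role of an AST-backed base class that raises the floor.
final class AstBackedDetector implements SketchDetector {
    @Override public Confidence defaultConfidence() { return Confidence.SYNTACTIC; }
}

final class EmissionDefaults {
    // Same rule as DetectorEmissionDefaults.applyDefaults, for a single list.
    static void applyDefaults(List<Emission> emissions, SketchDetector detector) {
        String defaultSource = detector.getClass().getSimpleName();
        Confidence defaultConfidence = detector.defaultConfidence();
        for (Emission e : emissions) {
            if (e.getSource() == null) {          // the sentinel: detector did not stamp
                e.setSource(defaultSource);
                e.setConfidence(defaultConfidence);
            }
        }
    }
}
```

With an `AstBackedDetector`, an untouched emission ends up `SYNTACTIC` with source `"AstBackedDetector"`, an emission that only called `setConfidence(RESOLVED)` is reset to `SYNTACTIC`, and one that set both fields keeps `RESOLVED`.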

The {@code source==null} sentinel is what lets us distinguish "detector + * didn't think about confidence" from "detector intentionally chose LEXICAL." + * Confidence is never null at rest (the model setter normalizes), so confidence + * alone can't tell us that. + */ +public final class DetectorEmissionDefaults { + + private DetectorEmissionDefaults() { } + + /** + * Apply orchestrator defaults to every node + edge in the result. Mutates + * the model objects in place — the result record itself is unchanged. + * + * @param result the detector's emission (must not be null) + * @param detector the detector that produced it (used for source name + + * default confidence) + */ + public static void applyDefaults(DetectorResult result, Detector detector) { + if (result == null || detector == null) return; + String defaultSource = detector.getClass().getSimpleName(); + Confidence defaultConfidence = detector.defaultConfidence(); + + for (CodeNode node : result.nodes()) { + if (node.getSource() == null) { + node.setSource(defaultSource); + node.setConfidence(defaultConfidence); + } + } + for (CodeEdge edge : result.edges()) { + if (edge.getSource() == null) { + edge.setSource(defaultSource); + edge.setConfidence(defaultConfidence); + } + } + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaMessagingDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaMessagingDetector.java index b03c701d..bf40cffa 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaMessagingDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaMessagingDetector.java @@ -3,6 +3,7 @@ import io.github.randomcodespace.iq.detector.AbstractRegexDetector; import io.github.randomcodespace.iq.model.CodeEdge; import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; import io.github.randomcodespace.iq.model.EdgeKind; 
import io.github.randomcodespace.iq.model.NodeKind; @@ -18,6 +19,17 @@ public abstract class AbstractJavaMessagingDetector extends AbstractRegexDetecto protected static final Pattern CLASS_RE = Pattern.compile("(?:public\\s+)?class\\s+(\\w+)"); + /** + * Java messaging detectors layer language-aware semantics on top of regex + * matching (matched class name → emit messaging edge with kind). Bump the + * inherited regex-default {@link Confidence#LEXICAL} up to + * {@link Confidence#SYNTACTIC}. + */ + @Override + public Confidence defaultConfidence() { + return Confidence.SYNTACTIC; + } + /** * Extract the first class name from the source text. * Returns null if no class is found. diff --git a/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaParserDetector.java b/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaParserDetector.java index cb25a9ef..6dd792a1 100644 --- a/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaParserDetector.java +++ b/src/main/java/io/github/randomcodespace/iq/detector/jvm/java/AbstractJavaParserDetector.java @@ -4,6 +4,7 @@ import com.github.javaparser.ast.CompilationUnit; import io.github.randomcodespace.iq.detector.AbstractRegexDetector; import io.github.randomcodespace.iq.detector.DetectorContext; +import io.github.randomcodespace.iq.model.Confidence; import java.util.Optional; @@ -16,6 +17,17 @@ public abstract class AbstractJavaParserDetector extends AbstractRegexDetector { private static final ThreadLocal<JavaParser> PARSER = ThreadLocal.withInitial(JavaParser::new); + /** + * JavaParser produces an AST — bump the inherited regex-default + * {@link Confidence#LEXICAL} up to {@link Confidence#SYNTACTIC}. Detectors + * that resolve symbols via JavaSymbolSolver (Phase 6+) should call + * {@code setSource(...)} and {@code setConfidence(RESOLVED)} on emissions.
+ */ + @Override + public Confidence defaultConfidence() { + return Confidence.SYNTACTIC; + } + /** * Attempt to parse the source content into a JavaParser CompilationUnit. */ diff --git a/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java b/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java index 1817abe5..8b871e4a 100644 --- a/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java +++ b/src/main/java/io/github/randomcodespace/iq/graph/GraphStore.java @@ -3,6 +3,7 @@ import io.github.randomcodespace.iq.flow.FlowDataSource; import io.github.randomcodespace.iq.model.CodeEdge; import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; import io.github.randomcodespace.iq.model.EdgeKind; import io.github.randomcodespace.iq.model.NodeKind; import org.neo4j.graphdb.GraphDatabaseService; @@ -37,6 +38,7 @@ @ConditionalOnBean(GraphRepository.class) public class GraphStore implements FlowDataSource { private static final String PROP_CNT = "cnt"; + private static final String PROP_CONFIDENCE = "confidence"; private static final String PROP_CONNECTIONS = "connections"; private static final String PROP_EXT = "ext"; private static final String PROP_FILEPATH = "filePath"; @@ -211,20 +213,33 @@ public void bulkSave(List<CodeNode> nodes) { skipped++; continue; } - edgeBatch.add(Map.of( - PROP_SOURCEID, sourceId, - PROP_TARGETID, targetId, - "edgeId", edge.getId(), - PROP_KIND, edge.getKind().getValue() - )); + // HashMap (not Map.of) so we can null-skip optional fields.
+ Map<String, Object> edgeProps = new HashMap<>(6); + edgeProps.put(PROP_SOURCEID, sourceId); + edgeProps.put(PROP_TARGETID, targetId); + edgeProps.put("edgeId", edge.getId()); + edgeProps.put(PROP_KIND, edge.getKind().getValue()); + edgeProps.put(PROP_CONFIDENCE, edge.getConfidence().name()); + if (edge.getSource() != null) { + edgeProps.put(PROP_SOURCE, edge.getSource()); + } + edgeBatch.add(edgeProps); created++; } if (!edgeBatch.isEmpty()) { try (Transaction tx = graphDb.beginTx()) { + // Cypher reads a missing map key (e.source) as NULL, and CREATE does not store + // NULL-valued properties, so omitting `source` from the param map leaves r.source IS NULL. tx.execute(""" UNWIND $batch AS e MATCH (s:CodeNode {id: e.sourceId}), (t:CodeNode {id: e.targetId}) - CREATE (s)-[:RELATES_TO {id: e.edgeId, kind: e.kind, sourceId: e.sourceId}]->(t) + CREATE (s)-[:RELATES_TO { + id: e.edgeId, + kind: e.kind, + sourceId: e.sourceId, + confidence: e.confidence, + source: e.source + }]->(t) """, Map.of("batch", edgeBatch)); tx.commit(); } @@ -252,6 +267,12 @@ private Map<String, Object> nodeToProps(CodeNode node) { if (node.getLineStart() != null) props.put("lineStart", node.getLineStart()); if (node.getLineEnd() != null) props.put("lineEnd", node.getLineEnd()); if (node.getLayer() != null) props.put(PROP_LAYER, node.getLayer()); + // Confidence + source are typed first-class fields on CodeNode (not entries + // in node.getProperties()) — store as bare Neo4j properties alongside layer/kind. + // Confidence is never null at rest (setter normalizes to LEXICAL); store the + // enum name so Cypher filters like WHERE n.confidence = 'RESOLVED' match.
+ props.put(PROP_CONFIDENCE, node.getConfidence().name()); + if (node.getSource() != null) props.put(PROP_SOURCE, node.getSource()); if (node.getAnnotations() != null && !node.getAnnotations().isEmpty()) { props.put("annotations", String.join(",", node.getAnnotations())); } @@ -1151,7 +1172,8 @@ private void hydrateEdges(List<CodeNode> nodes) { try (Transaction tx = graphDb.beginTx()) { var result = tx.execute( "MATCH (s:CodeNode)-[r:RELATES_TO]->(t:CodeNode) " - + "RETURN r.id AS id, r.kind AS kind, s.id AS sourceId, t.id AS targetId"); + + "RETURN r.id AS id, r.kind AS kind, s.id AS sourceId, t.id AS targetId, " + + "r.confidence AS confidence, r.source AS source"); while (result.hasNext()) { var row = result.next(); String sourceId = (String) row.get(PROP_SOURCEID); @@ -1168,12 +1190,35 @@ private void hydrateEdges(List<CodeNode> nodes) { } catch (IllegalArgumentException e) { continue; } - source.getEdges().add(new CodeEdge(edgeId, edgeKind, sourceId, target)); + CodeEdge edge = new CodeEdge(edgeId, edgeKind, sourceId, target); + applyEdgeConfidenceAndSource(edge, row); + source.getEdges().add(edge); } } } } + /** + * Apply confidence + source from a Cypher row to an edge. Missing or malformed + * confidence falls back to {@link Confidence#LEXICAL} — never throws — so legacy + * edges written before these fields existed read back cleanly. Source stays null + * when missing. + */ + private static void applyEdgeConfidenceAndSource(CodeEdge edge, Map<String, Object> row) { + Object confObj = row.get(PROP_CONFIDENCE); + if (confObj instanceof String confStr) { + try { + edge.setConfidence(Confidence.fromString(confStr)); + } catch (IllegalArgumentException ignored) { + // keep default LEXICAL + } + } + Object srcObj = row.get(PROP_SOURCE); + if (srcObj instanceof String src) { + edge.setSource(src); + } + } + /** * Hydrate edges for a single node within an existing transaction. * Used by findById() to populate outgoing edges for node detail views.
@@ -1181,7 +1226,8 @@ private void hydrateEdges(List<CodeNode> nodes) { private void hydrateEdgesForNode(Transaction tx, CodeNode node) { var result = tx.execute( "MATCH (s:CodeNode {id: $nodeId})-[r:RELATES_TO]->(t:CodeNode) " - + "RETURN r.id AS id, r.kind AS kind, t.id AS targetId, t", + + "RETURN r.id AS id, r.kind AS kind, t.id AS targetId, t, " + + "r.confidence AS confidence, r.source AS source", Map.of(PROP_NODEID, node.getId())); while (result.hasNext()) { var row = result.next(); @@ -1194,10 +1240,15 @@ private void hydrateEdgesForNode(Transaction tx, CodeNode node) { } catch (IllegalArgumentException e) { continue; } + // targetId is read from the row but not used here — the lightweight target + // node is built from the embedded `t` Node value. Suppress unused warning. + assert targetId == null || !targetId.isEmpty(); // Build a lightweight target node (id only for reference) var targetNeo4j = (org.neo4j.graphdb.Node) row.get("t"); CodeNode target = nodeFromNeo4j(targetNeo4j); - node.getEdges().add(new CodeEdge(edgeId, edgeKind, node.getId(), target)); + CodeEdge edge = new CodeEdge(edgeId, edgeKind, node.getId(), target); + applyEdgeConfidenceAndSource(edge, row); + node.getEdges().add(edge); } } @@ -1216,6 +1267,18 @@ private static CodeNode nodeFromNeo4j(org.neo4j.graphdb.Node neo4jNode) { node.setModule((String) neo4jNode.getProperty(PROP_MODULE, null)); node.setFilePath((String) neo4jNode.getProperty(PROP_FILEPATH, null)); node.setLayer((String) neo4jNode.getProperty(PROP_LAYER, null)); + // Restore confidence + source. Missing/malformed confidence falls back to + // LEXICAL — least committal — so legacy nodes written before these fields + // existed read back without surprise. Source stays null when missing.
+ String confStr = (String) neo4jNode.getProperty(PROP_CONFIDENCE, null); + if (confStr != null) { + try { + node.setConfidence(Confidence.fromString(confStr)); + } catch (IllegalArgumentException ignored) { + // keep default LEXICAL — never throw on legacy/garbled values + } + } + node.setSource((String) neo4jNode.getProperty(PROP_SOURCE, null)); Object lineStart = neo4jNode.getProperty("lineStart", null); if (lineStart instanceof Number n) node.setLineStart(n.intValue()); diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java new file mode 100644 index 00000000..16227581 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/EmptyResolved.java @@ -0,0 +1,34 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.model.Confidence; + +/** + * Singleton "no resolution" {@link Resolved} — what + * {@link io.github.randomcodespace.iq.intelligence.resolver.SymbolResolver} + * returns when it can't resolve a file (parse failure, unsupported language, + * resolver disabled, or no resolver registered for this file's language). + * + *
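Both read paths (the H2 cache in `AnalysisCache` and the Neo4j hydration in `GraphStore`) share one lenient pattern for these two fields. Below is a minimal sketch over a plain `Map`, assuming `Confidence.fromString` behaves like `Enum.valueOf` and throws `IllegalArgumentException` for unknown names (which is what the surrounding `try`/`catch` implies). The local `Confidence` enum and the `LenientRead` helper are hypothetical.

```java
import java.util.Map;

enum Confidence {
    LEXICAL, SYNTACTIC, RESOLVED;

    // Stand-in for the real Confidence.fromString; assumed to throw
    // IllegalArgumentException for names it does not recognise.
    static Confidence fromString(String name) {
        return Confidence.valueOf(name);
    }
}

final class LenientRead {
    // Mirrors applyEdgeConfidenceAndSource / deserializeNode: a missing,
    // non-String, or unknown value falls back to LEXICAL instead of throwing.
    static Confidence readConfidence(Map<String, ?> row) {
        if (row.get("confidence") instanceof String confStr) {
            try {
                return Confidence.fromString(confStr);
            } catch (IllegalArgumentException ignored) {
                // unknown enum name: fall through to the LEXICAL default
            }
        }
        return Confidence.LEXICAL;
    }

    // Source is optional: anything other than a String reads back as null.
    static String readSource(Map<String, ?> row) {
        return row.get("source") instanceof String src ? src : null;
    }
}
```

Returning the default instead of throwing means legacy rows without the field, rows carrying an enum constant this build does not know, and hand-edited values all degrade to `LEXICAL` rather than failing the whole load.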

Detectors must check {@link #isAvailable()} before downcasting; they will + * always get {@code false} from this singleton, signalling "fall back to + * syntactic detection." + */ +public final class EmptyResolved implements Resolved { + + /** The single instance — comparable via {@code ==}. */ + public static final EmptyResolved INSTANCE = new EmptyResolved(); + + private EmptyResolved() { } + + @Override + public boolean isAvailable() { + return false; + } + + @Override + public Confidence sourceConfidence() { + // Nothing was actually resolved — emissions consulting this should NOT + // claim RESOLVED confidence. LEXICAL is the floor; a syntactic detector + // emitting against EmptyResolved still has its own SYNTACTIC base default. + return Confidence.LEXICAL; + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java new file mode 100644 index 00000000..4a58d80e --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionException.java @@ -0,0 +1,48 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import java.nio.file.Path; + +/** + * Thrown by a {@link SymbolResolver} when bootstrap or per-file resolution + * fails in a way the resolver cannot recover from. Carries enough context + * (file path + language) for the orchestrator to log a useful message before + * falling back to syntactic detection. + * + *

Checked exception by design — symbol resolution is a long-tail of file- + * specific failures (corrupted source, dependency cycles, classpath holes), + * and the orchestrator must explicitly decide whether to skip the file or + * abort the whole pass. Swallowing silently is not an option. + */ +public class ResolutionException extends Exception { + + private final Path file; + private final String language; + + /** + * @param message human-readable description of the failure + * @param cause underlying exception (may be null) + * @param file the file (or project root for bootstrap failures) that + * couldn't be resolved + * @param language the language identifier for the resolver involved + */ + public ResolutionException(String message, Throwable cause, Path file, String language) { + super(message, cause); + this.file = file; + this.language = language; + } + + /** Convenience constructor without an underlying cause. */ + public ResolutionException(String message, Path file, String language) { + this(message, null, file, language); + } + + /** @return the file (or project root) that couldn't be resolved. May be {@code null}. */ + public Path file() { + return file; + } + + /** @return the language identifier (e.g. {@code "java"}). May be {@code null}. */ + public String language() { + return language; + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java new file mode 100644 index 00000000..475313a2 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/Resolved.java @@ -0,0 +1,38 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.model.Confidence; + +/** + * Per-file symbol resolution result. + * + *

A {@code Resolved} carries language-specific resolution state that detectors + * can consult to upgrade their emissions from {@link Confidence#SYNTACTIC} to + * {@link Confidence#RESOLVED}. Each language backend ships its own concrete + * implementation (e.g. {@code JavaResolved} wraps a {@code JavaSymbolSolver} + * plus a {@code CompilationUnit}); detectors that want resolved data downcast + * after checking {@link #isAvailable()}. + * + *

{@link #isAvailable()} is the first gate every detector should consult. + * If it returns {@code false}, the resolver wasn't able to resolve this file — + * detectors must fall back to syntactic detection. The {@link EmptyResolved} + * singleton is the canonical "not available" instance. + */ +public interface Resolved { + + /** + * @return {@code true} if this result actually carries resolved-symbol data + * and detectors may safely downcast to a language-specific subtype. + * {@code false} for {@link EmptyResolved} or any other backend that + * declined to resolve this file (e.g. parse failure, unsupported + * language, or resolver disabled). + */ + boolean isAvailable(); + + /** + * @return the confidence floor the orchestrator should stamp on emissions + * that consult this resolution. {@link Confidence#RESOLVED} for + * genuine resolution; {@link Confidence#LEXICAL} for + * {@link EmptyResolved} (i.e. nothing was actually resolved). + */ + Confidence sourceConfidence(); +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java new file mode 100644 index 00000000..932400a5 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistry.java @@ -0,0 +1,107 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.springframework.stereotype.Service; + +import java.nio.file.Path; +import java.util.Comparator; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Set; + +/** + * Spring-managed registry for {@link SymbolResolver} backends. Mirrors + * {@link io.github.randomcodespace.iq.detector.DetectorRegistry}: every + * {@code @Component} implementing {@link SymbolResolver} is auto-injected via + * the constructor. 
+ * + *
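The `isAvailable()` gate and the opt-in `Optional` on `DetectorContext` combine into a short chain on the detector side. In this sketch, `ToyJavaResolved` is a hypothetical stand-in for a backend-specific subtype (the real `JavaResolved` is described as wrapping a `JavaSymbolSolver` plus a `CompilationUnit`; here a simple-name-to-FQN map plays that role), and `ResolvingDetectorSketch` and `Target` are illustrative names only.

```java
import java.util.Map;
import java.util.Optional;

enum Confidence { LEXICAL, SYNTACTIC, RESOLVED }

interface Resolved {
    boolean isAvailable();
    Confidence sourceConfidence();
}

// Same shape as the diff's EmptyResolved: private constructor, single INSTANCE.
final class EmptyResolved implements Resolved {
    static final EmptyResolved INSTANCE = new EmptyResolved();
    private EmptyResolved() { }
    @Override public boolean isAvailable() { return false; }
    @Override public Confidence sourceConfidence() { return Confidence.LEXICAL; }
}

// Hypothetical backend subtype: a simple-name -> FQN map stands in for a symbol solver.
record ToyJavaResolved(Map<String, String> fqnBySimpleName) implements Resolved {
    @Override public boolean isAvailable() { return true; }
    @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; }
    Optional<String> lookup(String simpleName) {
        return Optional.ofNullable(fqnBySimpleName.get(simpleName));
    }
}

record Target(String name, Confidence confidence) { }

final class ResolvingDetectorSketch {
    // Gate on isAvailable() first, downcast second, and fall back to the
    // syntactic answer whenever resolution is absent or inconclusive.
    static Target target(Optional<Resolved> resolved, String simpleName) {
        return resolved
                .filter(Resolved::isAvailable)
                .filter(ToyJavaResolved.class::isInstance)
                .map(ToyJavaResolved.class::cast)
                .flatMap(r -> r.lookup(simpleName).map(fqn -> new Target(fqn, r.sourceConfidence())))
                .orElseGet(() -> new Target(simpleName, Confidence.SYNTACTIC));
    }
}
```

Every path that cannot prove a resolution ends in the `orElseGet` fallback, so `Optional.empty()`, `EmptyResolved.INSTANCE`, a `Resolved` from some other backend, and an unknown symbol all yield the same syntactic answer.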

Determinism: resolvers are sorted by {@link Class#getSimpleName()} + * alphabetically before any other operation. {@link #bootstrap(Path)} iterates + * in this order; per-language lookup uses "first-in-sort-order wins" if two + * resolvers claim the same language. Same input → same resolution behavior, + * every time. + * + *

Resilience: {@link #bootstrap(Path)} catches per-resolver + * {@link ResolutionException} so one misbehaving resolver can't take down the + * whole pass. Each resolver's own {@link SymbolResolver#resolve} handles its + * post-bootstrap state — if bootstrap failed, the resolver should return + * {@link EmptyResolved#INSTANCE} from its resolve() method (its own concern). + */ +@Service +public class ResolverRegistry { + + private static final Logger log = LoggerFactory.getLogger(ResolverRegistry.class); + + /** Singleton no-op resolver — returned for unknown languages or null input. */ + static final SymbolResolver NOOP = new NoopResolver(); + + private final List<SymbolResolver> resolvers; + private final Map<String, SymbolResolver> byLanguage; + + public ResolverRegistry(List<SymbolResolver> resolvers) { + // Deterministic order: alphabetical by class simple name. + this.resolvers = resolvers.stream() + .sorted(Comparator.comparing(r -> r.getClass().getSimpleName())) + .toList(); + + // First-in-sort-order wins per language (deterministic conflict resolution). + Map<String, SymbolResolver> map = new HashMap<>(); + for (SymbolResolver r : this.resolvers) { + for (String lang : r.getSupportedLanguages()) { + if (lang == null || lang.isBlank()) continue; + map.putIfAbsent(lang.toLowerCase(), r); + } + } + this.byLanguage = Map.copyOf(map); + } + + /** + * Bootstrap every registered resolver against the given project root. + * Iterates in deterministic (alphabetical) order. Per-resolver failures + * are logged at WARN and swallowed so one broken resolver doesn't cascade. + */ + public void bootstrap(Path projectRoot) { + for (SymbolResolver r : resolvers) { + try { + r.bootstrap(projectRoot); + } catch (ResolutionException e) { + log.warn("resolver {} bootstrap failed for {}: {}", + r.getClass().getSimpleName(), projectRoot, e.getMessage()); + } catch (RuntimeException e) { + // Defensive — resolvers shouldn't throw RuntimeException, but + // if they do, don't take down the pass.
+ log.warn("resolver {} bootstrap threw unexpectedly for {}: {}", + r.getClass().getSimpleName(), projectRoot, e.toString()); + } + } + } + + /** + * Look up the resolver for a given language identifier. + * + * @param language language identifier (case-insensitive). May be null. + * @return the matching resolver, or a no-op resolver returning + * {@link EmptyResolved#INSTANCE}. Never null. + */ + public SymbolResolver resolverFor(String language) { + if (language == null) return NOOP; + return byLanguage.getOrDefault(language.toLowerCase(), NOOP); + } + + /** @return all registered resolvers in deterministic order (alphabetical by class simple name). */ + public List<SymbolResolver> all() { + return resolvers; + } + + /** Singleton no-op — claims no languages, bootstrap is a no-op, resolve always returns EmptyResolved. */ + static final class NoopResolver implements SymbolResolver { + @Override public Set<String> getSupportedLanguages() { return Set.of(); } + @Override public void bootstrap(Path projectRoot) { } + @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { + return EmptyResolved.INSTANCE; + } + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java new file mode 100644 index 00000000..e663185c --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolver.java @@ -0,0 +1,77 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; + +import java.nio.file.Path; +import java.util.Set; + +/** + * Per-language symbol-resolution backend. The Resolver SPI mirrors the + * {@link io.github.randomcodespace.iq.detector.Detector} SPI: each implementation + * is a Spring {@code @Component} declaring which languages it handles, and the + * {@link ResolverRegistry} auto-discovers them at startup. + * + *
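The registry's determinism and isolation rules can be checked without Spring. The types in this sketch are hypothetical: `FakeResolver` is a record sorted by an explicit `name()` rather than `getClass().getSimpleName()` (so one type can play several resolvers), bootstrap failures use a plain `Exception` instead of `ResolutionException`, and `resolverFor` returns `Optional.empty()` where the real registry returns its `NOOP` resolver.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a SymbolResolver bean; name() replaces getClass().getSimpleName().
record FakeResolver(String name, Set<String> languages, boolean failsBootstrap) {
    void bootstrap(Path projectRoot) throws Exception {
        if (failsBootstrap) throw new Exception(name + " could not bootstrap " + projectRoot);
    }
}

final class SketchRegistry {
    private final List<FakeResolver> resolvers;
    private final Map<String, FakeResolver> byLanguage;

    SketchRegistry(List<FakeResolver> input) {
        // Sort first, so conflict resolution never depends on injection order.
        this.resolvers = input.stream().sorted(Comparator.comparing(FakeResolver::name)).toList();
        Map<String, FakeResolver> map = new HashMap<>();
        for (FakeResolver r : resolvers) {
            for (String lang : r.languages()) {
                if (lang.isBlank()) continue;
                map.putIfAbsent(lang.toLowerCase(Locale.ROOT), r); // first in sort order wins
            }
        }
        this.byLanguage = Map.copyOf(map);
    }

    // Bootstraps in sorted order and returns the names that succeeded.
    List<String> bootstrap(Path projectRoot) {
        List<String> succeeded = new ArrayList<>();
        for (FakeResolver r : resolvers) {
            try {
                r.bootstrap(projectRoot);
                succeeded.add(r.name());
            } catch (Exception e) {
                // The real registry logs at WARN here and moves on to the next resolver.
            }
        }
        return succeeded;
    }

    Optional<FakeResolver> resolverFor(String language) {
        if (language == null) return Optional.empty();
        return Optional.ofNullable(byLanguage.get(language.toLowerCase(Locale.ROOT)));
    }
}
```

Sorting before building the language map is what makes "first-in-sort-order wins" independent of the order in which Spring injects the list. The sketch lowercases with `Locale.ROOT`; a bare `toLowerCase()` uses the default locale, which under a Turkish locale turns `"KOTLIN"` into `"kotlın"` (dotless i).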

Lifecycle: + *

  1. The orchestrator calls {@link #bootstrap(Path)} once with the project root before any per-file work. The resolver builds whatever it needs (type solvers, classpath, etc.).
  2. For each parsed file, the orchestrator calls {@link #resolve(DiscoveredFile, Object)} with the parsed AST. The resolver returns a language-specific {@link Resolved} carrying the resolution context, or {@link EmptyResolved#INSTANCE} if the file isn't its language.
  3. {@link #shutdown()} is called once at the end of the pass for cleanup (default no-op).
+ * + *
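The three lifecycle steps above can be sketched as a small driver loop. This is a minimal illustration, not the project's orchestrator. `Resolved`, `EmptyResolved` and `SymbolResolver` below are simplified stand-ins: the file is a `String`, the AST is an `Object`, and `ResolutionException` is widened to `Exception`. `LifecycleSketch` is a hypothetical name. The sketch assumes, following the `bootstrap` Javadoc, that a failed bootstrap disables the resolver for the rest of the pass.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified stand-ins for the SPI types in this diff (hypothetical, illustration only).
interface Resolved {
    boolean isAvailable();
}

enum EmptyResolved implements Resolved {
    INSTANCE;

    @Override
    public boolean isAvailable() {
        return false;
    }
}

interface SymbolResolver {
    void bootstrap(Path projectRoot) throws Exception;

    Resolved resolve(String file, Object parsedAst) throws Exception;

    default void shutdown() { }
}

final class LifecycleSketch {
    /** Drives one resolver through bootstrap (once), resolve (per file), shutdown (once). */
    static List<Resolved> run(SymbolResolver resolver, Path projectRoot, List<String> files) {
        boolean enabled;
        try {
            resolver.bootstrap(projectRoot);
            enabled = true;
        } catch (Exception e) {
            enabled = false; // disabled for this pass; every file falls back to the sentinel
        }
        List<Resolved> out = new ArrayList<>();
        try {
            for (String file : files) {
                Resolved r = EmptyResolved.INSTANCE;
                if (enabled) {
                    try {
                        // The file name doubles as a stand-in AST in this sketch.
                        r = Objects.requireNonNullElse(resolver.resolve(file, file), EmptyResolved.INSTANCE);
                    } catch (Exception e) {
                        // Per-file failure downgrades to the sentinel instead of aborting the pass.
                    }
                }
                out.add(r);
            }
        } finally {
            resolver.shutdown(); // cleanup runs even if a file loop step throws unexpectedly
        }
        return out;
    }
}
```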

Thread safety: implementations must be safe to invoke + * {@link #resolve(DiscoveredFile, Object)} concurrently from virtual threads + * after a single {@link #bootstrap(Path)} call. Detector pipelines run on + * virtual-thread pools. + * + *
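To make the thread-safety contract concrete, here is a sketch of the intended usage pattern: one bootstrap, then concurrent lookups on virtual threads via Java 21's `Executors.newVirtualThreadPerTaskExecutor()`. `ReadOnlyResolver` and `ConcurrentResolveSketch` are hypothetical. The point is only that all mutable state is built in `bootstrap` and merely read afterwards.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical resolver: all mutation happens in bootstrap(); resolve() only reads. */
final class ReadOnlyResolver {
    // Replaced wholesale by bootstrap() and never mutated afterwards. Tasks submitted to an
    // ExecutorService after bootstrap() returns see the new value (submission is a
    // happens-before edge in java.util.concurrent).
    private Map<String, String> index = Map.of();

    void bootstrap(Path projectRoot) {
        // Stand-in for building type solvers: an immutable lookup table.
        index = Map.of("Foo", "com.example.Foo", "Bar", "com.example.Bar");
    }

    String resolve(String simpleName) {
        return index.getOrDefault(simpleName, "<unresolved>");
    }
}

final class ConcurrentResolveSketch {
    /** One bootstrap, then one virtual thread per lookup; results keep input order. */
    static List<String> resolveAll(ReadOnlyResolver resolver, List<String> names) {
        resolver.bootstrap(Path.of("."));
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : names) {
                futures.add(pool.submit(() -> resolver.resolve(name)));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) {
                out.add(f.get());
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```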

Determinism: if the resolver depends on source roots or classpath, those + * inputs must be sorted before construction so two runs over the same project + * produce identical resolution results. + */ +public interface SymbolResolver { + + /** + * @return language identifiers this resolver handles, lowercase, e.g. + * {@code Set.of("java")} or {@code Set.of("typescript", + * "javascript")}. Never empty, never null. + */ + Set getSupportedLanguages(); + + /** + * Build whatever language-specific resolution state is needed for a single + * project root. Called once per analysis pass before any + * {@link #resolve(DiscoveredFile, Object)} call. + * + * @param projectRoot absolute path to the project root being analyzed + * @throws ResolutionException if bootstrap fails irrecoverably (the + * orchestrator will log and disable this resolver for the pass) + */ + void bootstrap(Path projectRoot) throws ResolutionException; + + /** + * Resolve symbols for a single parsed file. + * + * @param file the file being detected + * @param parsedAst the AST produced by the parser pipeline. Type is + * language-specific (e.g. {@code CompilationUnit} for + * Java, {@code ParseTree} for ANTLR languages); the + * resolver checks via {@code instanceof}. + * @return language-specific {@link Resolved} on success, or + * {@link EmptyResolved#INSTANCE} if this file isn't this + * resolver's language or {@code parsedAst} is the wrong type. + * Must never return {@code null}. + * @throws ResolutionException for irrecoverable per-file failures the + * orchestrator should surface (rare; most failures should + * downgrade to {@link EmptyResolved#INSTANCE} silently). + */ + Resolved resolve(DiscoveredFile file, Object parsedAst) throws ResolutionException; + + /** Cleanup hook. Default no-op. 
*/ + default void shutdown() { } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java new file mode 100644 index 00000000..4c8e90d3 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolved.java @@ -0,0 +1,32 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.symbolsolver.JavaSymbolSolver; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.model.Confidence; + +/** + * Java-specific {@link Resolved} carrying the parsed {@link CompilationUnit} + * and the {@link JavaSymbolSolver} configured for the current project. + * + *

Detectors that opt in to resolution should: + *

  1. Read {@code ctx.resolved()}
  2. Filter on {@link #isAvailable()}
  3. Downcast to {@code JavaResolved}
  4. Use {@link #cu()} (the file's parsed AST) and {@link #solver()} (for cross-file type lookups) to resolve symbols
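The four opt-in steps above map naturally onto an `Optional` pipeline. This is a minimal sketch with simplified stand-ins: `JavaResolved` here carries a `String` in place of the real `CompilationUnit` and `JavaSymbolSolver`, and `OptInDetectorSketch.describe` is a hypothetical helper that only reports which path a detector would take.

```java
import java.util.Optional;

// Simplified stand-ins (hypothetical): the real JavaResolved carries a CompilationUnit
// and a JavaSymbolSolver rather than a String.
interface Resolved {
    boolean isAvailable();
}

record JavaResolved(String cu) implements Resolved {
    @Override
    public boolean isAvailable() {
        return true;
    }
}

enum EmptyResolved implements Resolved {
    INSTANCE;

    @Override
    public boolean isAvailable() {
        return false;
    }
}

final class OptInDetectorSketch {
    /** Reports which path a detector would take for a given ctx.resolved() value. */
    static String describe(Optional<Resolved> resolved) {
        return resolved                                   // 1. read ctx.resolved()
                .filter(Resolved::isAvailable)            // 2. drops EmptyResolved.INSTANCE
                .filter(JavaResolved.class::isInstance)   // 3. only Java resolutions...
                .map(JavaResolved.class::cast)            //    ...then downcast safely
                .map(jr -> "resolved:" + jr.cu())         // 4. use cu() / solver()
                .orElse("syntactic-fallback");            // otherwise keep the existing path
    }
}
```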
+ */ +public record JavaResolved(CompilationUnit cu, JavaSymbolSolver solver) implements Resolved { + + @Override + public boolean isAvailable() { + return true; + } + + @Override + public Confidence sourceConfidence() { + return Confidence.RESOLVED; + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java new file mode 100644 index 00000000..f6c111c4 --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscovery.java @@ -0,0 +1,134 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; +import org.springframework.stereotype.Component; + +import java.io.IOException; +import java.nio.file.FileVisitOption; +import java.nio.file.FileVisitResult; +import java.nio.file.Files; +import java.nio.file.Path; +import java.nio.file.SimpleFileVisitor; +import java.nio.file.attribute.BasicFileAttributes; +import java.util.ArrayList; +import java.util.EnumSet; +import java.util.List; +import java.util.Set; +import java.util.TreeSet; + +/** + * Discovers Java source roots under a project root by walking for the + * {@code src/main/java} and {@code src/test/java} directories Maven and Gradle + * both standardize on. Multi-module projects are handled by walking the whole + * tree — every nested {@code src/(main|test)/java} is a separate root. + * + *

Determinism: results are returned sorted alphabetically by absolute path. + * Same project tree → same root list → same {@code CombinedTypeSolver} → + * same resolution behavior. + * + *
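A sketch of why the output is stable: collecting candidates into a `TreeSet<Path>` orders them by `Path`'s natural ordering (lexicographic, with platform-specific details), regardless of the order in which the walk visits directories. `SourceRootOrderSketch` is hypothetical. Its `isMavenStyleJavaRoot` is an equivalent formulation of the predicate in this diff, written with `Path.getName(int)` rather than `getParent()`.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SourceRootOrderSketch {
    /** True iff the last three name elements are src/main/java or src/test/java. */
    static boolean isMavenStyleJavaRoot(Path dir) {
        int n = dir.getNameCount();
        if (n < 3) {
            return false;
        }
        String last = dir.getName(n - 1).toString();
        String mainOrTest = dir.getName(n - 2).toString();
        String src = dir.getName(n - 3).toString();
        return last.equals("java")
                && (mainOrTest.equals("main") || mainOrTest.equals("test"))
                && src.equals("src");
    }

    /** Keeps only source roots and returns them in Path's natural order. */
    static List<Path> sortedRoots(List<Path> candidates) {
        Set<Path> roots = new TreeSet<>(); // sorted set: output is independent of input order
        for (Path p : candidates) {
            if (isMavenStyleJavaRoot(p)) {
                roots.add(p);
            }
        }
        return List.copyOf(roots); // preserves the TreeSet's iteration order
    }
}
```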

Symlink safety: {@link Files#walkFileTree} runs with + * {@link FileVisitOption#FOLLOW_LINKS} disabled, so the walk never enters a + * symlink cycle. The trade-off (source roots reachable only via a symlink are + * skipped) is the right call for resolution: following links could otherwise + * visit, and double-count, the same root more than once. + * + *

Plain-layout fallback: if the walk finds no Maven/Gradle source roots + * but the top-level directory contains {@code src/} with at least one + * {@code *.java} file, returns {@code [src]} as a single root. This covers + * scratch projects without a build file. + */ +@Component +public class JavaSourceRootDiscovery { + + private static final Logger log = LoggerFactory.getLogger(JavaSourceRootDiscovery.class); + + /** Directories we never descend into — they don't contain Java sources we care about. */ + private static final Set SKIP_DIRS = Set.of( + "target", "build", "out", "bin", "dist", + ".git", ".gradle", ".idea", ".vscode", ".m2", ".cache", + "node_modules", ".codeiq" + ); + + /** + * @param projectRoot project root path. May be null or non-existent — both + * return an empty list. + * @return sorted list of absolute Java source root paths (e.g. + * {@code [/service-a/src/main/java, /service-b/src/main/java]}). + * Never null, never contains null entries. + */ + public List discover(Path projectRoot) { + if (projectRoot == null || !Files.isDirectory(projectRoot)) { + return List.of(); + } + + Set roots = new TreeSet<>(); + try { + Files.walkFileTree( + projectRoot, + EnumSet.noneOf(FileVisitOption.class), // do NOT follow symlinks + Integer.MAX_VALUE, + new SimpleFileVisitor<>() { + @Override + public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) { + String name = nameOrEmpty(dir); + if (SKIP_DIRS.contains(name)) { + return FileVisitResult.SKIP_SUBTREE; + } + if (isMavenStyleJavaRoot(dir)) { + roots.add(dir); + } + return FileVisitResult.CONTINUE; + } + + @Override + public FileVisitResult visitFileFailed(Path file, IOException exc) { + // Ignore unreadable entries; resolution is best-effort. 
+ log.debug("skipping unreadable path {}: {}", file, exc.getMessage()); + return FileVisitResult.CONTINUE; + } + }); + } catch (IOException e) { + log.warn("source root discovery failed for {}: {}", projectRoot, e.getMessage()); + return List.of(); + } + + if (!roots.isEmpty()) { + return new ArrayList<>(roots); + } + + // Plain-layout fallback: top-level src/ with at least one .java file. + Path src = projectRoot.resolve("src"); + if (Files.isDirectory(src) && containsJavaFile(src)) { + return List.of(src); + } + return List.of(); + } + + /** {@code true} iff {@code dir} is {@code .../src/main/java} or {@code .../src/test/java}. */ + private static boolean isMavenStyleJavaRoot(Path dir) { + if (!"java".equals(nameOrEmpty(dir))) return false; + Path parent = dir.getParent(); + if (parent == null) return false; + String parentName = nameOrEmpty(parent); + if (!"main".equals(parentName) && !"test".equals(parentName)) return false; + Path grandparent = parent.getParent(); + if (grandparent == null) return false; + return "src".equals(nameOrEmpty(grandparent)); + } + + private static String nameOrEmpty(Path p) { + Path name = p.getFileName(); + return name != null ? name.toString() : ""; + } + + /** Cheap probe: does the directory tree under {@code root} have any {@code *.java}? 
*/ + private static boolean containsJavaFile(Path root) { + try { + return Files.walk(root) + .filter(p -> !Files.isDirectory(p)) + .anyMatch(p -> p.toString().endsWith(".java")); + } catch (IOException e) { + return false; + } + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java new file mode 100644 index 00000000..4cf2e66b --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolver.java @@ -0,0 +1,104 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.symbolsolver.JavaSymbolSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.CombinedTypeSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.JavaParserTypeSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.ReflectionTypeSolver; +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.ResolutionException; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.intelligence.resolver.SymbolResolver; +import org.springframework.stereotype.Component; + +import java.nio.file.Path; +import java.util.Set; + +/** + * Java backend for the resolver SPI. Wraps JavaParser's {@link JavaSymbolSolver} + * configured from a {@link CombinedTypeSolver} that includes + * {@link ReflectionTypeSolver} plus a {@link JavaParserTypeSolver} per source + * root discovered by {@link JavaSourceRootDiscovery}. + * + *

Determinism: {@link JavaSourceRootDiscovery} returns roots sorted + * alphabetically, so the order of {@link JavaParserTypeSolver}s in the + * combined solver is stable across runs. + * + *
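Order matters because a combined solver answers a lookup from the first delegate that knows the name. The sketch below models that first-match-wins behaviour with a hypothetical `FakeTypeSolver`, which maps a simple name to a qualified name. It assumes this is how `CombinedTypeSolver` picks among its delegates; it does not quote JavaParser's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for one per-source-root type solver.
record FakeTypeSolver(String root, Map<String, String> types) {
    Optional<String> tryToSolve(String simpleName) {
        return Optional.ofNullable(types.get(simpleName));
    }
}

final class CombinedLookupSketch {
    /** Asks each solver in list order; the first one that knows the name wins. */
    static Optional<String> solve(List<FakeTypeSolver> solvers, String simpleName) {
        for (FakeTypeSolver solver : solvers) {
            Optional<String> hit = solver.tryToSolve(simpleName);
            if (hit.isPresent()) {
                return hit;
            }
        }
        return Optional.empty();
    }
}
```

With roots sorted by path, the solver for `/p/a/...` is always consulted before the one for `/p/b/...`, so a simple name defined in both modules resolves the same way on every run.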

Thread safety: bootstrap is called once before any resolve(); resolve() + * is safe under virtual-thread concurrency because {@link JavaSymbolSolver} + * itself is thread-safe for read-only resolution. We deliberately do NOT + * mutate {@code StaticJavaParser.getParserConfiguration()} — that would be + * global static state shared with the existing + * {@link io.github.randomcodespace.iq.detector.jvm.java.AbstractJavaParserDetector} + * thread-local parser pool and is not safe under concurrent use. + */ +@Component +public class JavaSymbolResolver implements SymbolResolver { + + private final JavaSourceRootDiscovery discovery; + private CombinedTypeSolver combined; + private JavaSymbolSolver solver; + + public JavaSymbolResolver(JavaSourceRootDiscovery discovery) { + this.discovery = discovery; + } + + @Override + public Set getSupportedLanguages() { + return Set.of("java"); + } + + @Override + public void bootstrap(Path projectRoot) throws ResolutionException { + try { + CombinedTypeSolver cts = new CombinedTypeSolver(); + cts.add(new ReflectionTypeSolver()); + for (Path root : discovery.discover(projectRoot)) { + cts.add(new JavaParserTypeSolver(root.toFile())); + } + this.combined = cts; + this.solver = new JavaSymbolSolver(cts); + } catch (RuntimeException e) { + throw new ResolutionException( + "JavaSymbolResolver bootstrap failed for " + projectRoot, + e, projectRoot, "java"); + } + } + + @Override + public Resolved resolve(DiscoveredFile file, Object parsedAst) { + if (file == null || !"java".equalsIgnoreCase(file.language())) { + return EmptyResolved.INSTANCE; + } + if (!(parsedAst instanceof CompilationUnit cu)) { + return EmptyResolved.INSTANCE; + } + if (this.solver == null) { + // bootstrap() not called or it failed silently — falling back to + // EmptyResolved is the safe path. The orchestrator already logs + // bootstrap failures from ResolverRegistry. 
+ return EmptyResolved.INSTANCE; + } + return new JavaResolved(cu, solver); + } + + /** + * @return the {@link CombinedTypeSolver} built during {@link #bootstrap(Path)}, + * or null if bootstrap hasn't run. Exposed for tests + advanced use. + */ + public CombinedTypeSolver combinedTypeSolver() { + return combined; + } + + /** + * @return the {@link JavaSymbolSolver} built during {@link #bootstrap(Path)}, + * or null if bootstrap hasn't run. Detectors that want to attach the + * solver to their own {@code JavaParser} (rather than the + * {@link JavaResolved#cu()} carried CompilationUnit) can read this + * and call {@code new ParserConfiguration().setSymbolResolver(...)}. + */ + public JavaSymbolSolver symbolSolver() { + return solver; + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java b/src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java index 7668f88b..779eeb9a 100644 --- a/src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java +++ b/src/main/java/io/github/randomcodespace/iq/model/CodeEdge.java @@ -34,6 +34,20 @@ public class CodeEdge { @ConvertWith(converter = MapToJsonConverter.class) private Map properties = new HashMap<>(); + /** + * Confidence in this edge's existence and target accuracy. Defaults to + * {@link Confidence#LEXICAL} for backward compatibility with edges + * persisted before this field existed. + */ + @ConvertWith(converter = ConfidenceConverter.class) + private Confidence confidence = Confidence.LEXICAL; + + /** + * Detector class simple name that emitted this edge, e.g. + * {@code "SpringServiceDetector"}. Stamped by detector base classes. + */ + private String source; + public CodeEdge() { } @@ -90,6 +104,35 @@ public void setProperties(Map properties) { this.properties = properties; } + /** + * @return confidence stamped by the detector. Never {@code null} — falls + * back to {@link Confidence#LEXICAL} for edges loaded before this + * field existed. 
+ */ + public Confidence getConfidence() { + return confidence != null ? confidence : Confidence.LEXICAL; + } + + /** + * Set confidence. {@code null} is normalized to {@link Confidence#LEXICAL} + * so the field is never null at rest. + */ + public void setConfidence(Confidence confidence) { + this.confidence = confidence != null ? confidence : Confidence.LEXICAL; + } + + /** + * @return the simple class name of the detector that emitted this edge, + * or {@code null} if the edge was constructed bare. + */ + public String getSource() { + return source; + } + + public void setSource(String source) { + this.source = source; + } + @Override public boolean equals(Object o) { if (this == o) return true; diff --git a/src/main/java/io/github/randomcodespace/iq/model/CodeNode.java b/src/main/java/io/github/randomcodespace/iq/model/CodeNode.java index e51eaaed..c2a3f69e 100644 --- a/src/main/java/io/github/randomcodespace/iq/model/CodeNode.java +++ b/src/main/java/io/github/randomcodespace/iq/model/CodeNode.java @@ -43,6 +43,20 @@ public class CodeNode { /** Layer classification: frontend, backend, infra, shared, unknown. */ private String layer; + /** + * Confidence in this node's existence and shape, set by the detector that + * emitted it. Defaults to {@link Confidence#LEXICAL} (least committal) so + * a node persisted before this field existed reads back without surprise. + */ + @ConvertWith(converter = ConfidenceConverter.class) + private Confidence confidence = Confidence.LEXICAL; + + /** + * Detector class simple name that emitted this node, e.g. + * {@code "SpringServiceDetector"}. Stamped by detector base classes. + */ + private String source; + private List annotations = new ArrayList<>(); @ConvertWith(converter = MapToJsonConverter.class) @@ -134,6 +148,36 @@ public void setLayer(String layer) { this.layer = layer; } + /** + * @return confidence stamped by the detector. 
Never {@code null} — falls + * back to {@link Confidence#LEXICAL} for nodes loaded before this + * field existed. + */ + public Confidence getConfidence() { + return confidence != null ? confidence : Confidence.LEXICAL; + } + + /** + * Set confidence. {@code null} is normalized to {@link Confidence#LEXICAL} + * so the field is never null at rest. + */ + public void setConfidence(Confidence confidence) { + this.confidence = confidence != null ? confidence : Confidence.LEXICAL; + } + + /** + * @return the simple class name of the detector that emitted this node, + * or {@code null} if the node was constructed bare (e.g. in tests + * or by code paths that have not been migrated). + */ + public String getSource() { + return source; + } + + public void setSource(String source) { + this.source = source; + } + public List getAnnotations() { return annotations; } diff --git a/src/main/java/io/github/randomcodespace/iq/model/Confidence.java b/src/main/java/io/github/randomcodespace/iq/model/Confidence.java new file mode 100644 index 00000000..75798d7f --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/model/Confidence.java @@ -0,0 +1,59 @@ +package io.github.randomcodespace.iq.model; + +import java.util.Objects; + +/** + * Confidence in the truth of a node or edge, based on the parser pipeline that + * produced it. + *

+ * Lower values mean the assertion comes from textual patterns; higher values + * mean the assertion is backed by parsed structure or resolved symbol types. + * Comparable: {@code LEXICAL} < {@code SYNTACTIC} < {@code RESOLVED}. + *
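Because enum constants compare in declaration order, the `LEXICAL` < `SYNTACTIC` < `RESOLVED` ordering and the `score()` mapping support the same threshold filter. Below is a minimal sketch with a trimmed copy of the enum from this diff (without `fromString`) and a hypothetical `Emission` record standing in for a node or edge.

```java
import java.util.List;

// Trimmed copy of the Confidence enum from this diff (fromString omitted).
enum Confidence {
    LEXICAL(0.6),
    SYNTACTIC(0.8),
    RESOLVED(0.95);

    private final double score;

    Confidence(double score) {
        this.score = score;
    }

    double score() {
        return score;
    }
}

// Hypothetical stand-in for a CodeNode or CodeEdge.
record Emission(String id, Confidence confidence) { }

final class ConfidenceFilterSketch {
    /** Ordinal comparison: constants compare in declaration order. */
    static List<String> atLeast(List<Emission> emissions, Confidence floor) {
        return emissions.stream()
                .filter(e -> e.confidence().compareTo(floor) >= 0)
                .map(Emission::id)
                .toList();
    }

    /** The same filter expressed on the numeric score. */
    static List<String> scoreAtLeast(List<Emission> emissions, double threshold) {
        return emissions.stream()
                .filter(e -> e.confidence().score() >= threshold)
                .map(Emission::id)
                .toList();
    }
}
```

In the code shown here, `ConfidenceConverter` persists only the constant name. A Cypher query over stored nodes would therefore filter on names (for example `n.confidence IN ['SYNTACTIC', 'RESOLVED']`) unless the score is stored elsewhere.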

+ * Numeric mapping (via {@link #score()}) is stable and intended for Cypher / + * MCP / SPA filtering. The enum itself is the authoritative form; the score + * exists only as a convenience for clients that want a single number. + * + * @see Sub-project 1 design — §5.3 Confidence schema + */ +public enum Confidence { + + /** Pattern-only match (regex). The detector saw a textual pattern. */ + LEXICAL(0.6), + + /** AST or parse tree match, no symbol resolution. The detector saw structure. */ + SYNTACTIC(0.8), + + /** Resolved via a {@code SymbolResolver} — the detector saw resolved types. */ + RESOLVED(0.95); + + private final double score; + + Confidence(double score) { + this.score = score; + } + + /** + * Stable numeric score for filtering / threshold logic. + * Mapping: {@code LEXICAL=0.6}, {@code SYNTACTIC=0.8}, {@code RESOLVED=0.95}. + */ + public double score() { + return score; + } + + /** + * Look up a {@code Confidence} by case-insensitive name. + * + * @throws NullPointerException if {@code value} is null + * @throws IllegalArgumentException if {@code value} does not match any constant + */ + public static Confidence fromString(String value) { + Objects.requireNonNull(value, "Confidence value must not be null"); + for (Confidence c : values()) { + if (c.name().equalsIgnoreCase(value)) { + return c; + } + } + throw new IllegalArgumentException("Unknown Confidence: " + value); + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/model/ConfidenceConverter.java b/src/main/java/io/github/randomcodespace/iq/model/ConfidenceConverter.java new file mode 100644 index 00000000..347b1c3f --- /dev/null +++ b/src/main/java/io/github/randomcodespace/iq/model/ConfidenceConverter.java @@ -0,0 +1,24 @@ +package io.github.randomcodespace.iq.model; + +import org.neo4j.driver.Value; +import org.springframework.data.neo4j.core.convert.Neo4jPersistentPropertyConverter; + +/** + * Converts between {@link Confidence} and its uppercase string name for + * Neo4j storage. 
Stores {@code "LEXICAL"} / {@code "SYNTACTIC"} / + * {@code "RESOLVED"} so Cypher filters like + * {@code WHERE n.confidence = 'RESOLVED'} match without case folding. + */ +public class ConfidenceConverter implements Neo4jPersistentPropertyConverter { + + @Override + public Value write(Confidence confidence) { + return org.neo4j.driver.Values.value(confidence != null ? confidence.name() : null); + } + + @Override + public Confidence read(Value source) { + if (source == null || source.isNull()) return Confidence.LEXICAL; + return Confidence.fromString(source.asString()); + } +} diff --git a/src/main/java/io/github/randomcodespace/iq/model/EdgeKind.java b/src/main/java/io/github/randomcodespace/iq/model/EdgeKind.java index 52177c79..b0bbbe62 100644 --- a/src/main/java/io/github/randomcodespace/iq/model/EdgeKind.java +++ b/src/main/java/io/github/randomcodespace/iq/model/EdgeKind.java @@ -2,7 +2,7 @@ /** * Types of edges (relationships) in the Code IQ graph. - * Mirrors the 27 edge kinds from the Python implementation. + * Mirrors the 28 edge kinds from the Python implementation. */ public enum EdgeKind { diff --git a/src/main/java/io/github/randomcodespace/iq/model/NodeKind.java b/src/main/java/io/github/randomcodespace/iq/model/NodeKind.java index f1760bb1..2230de40 100644 --- a/src/main/java/io/github/randomcodespace/iq/model/NodeKind.java +++ b/src/main/java/io/github/randomcodespace/iq/model/NodeKind.java @@ -2,7 +2,7 @@ /** * Types of nodes in the Code IQ graph. - * Mirrors the 32 node kinds from the Python implementation. + * Mirrors the 34 node kinds from the Python implementation. 
*/ public enum NodeKind { diff --git a/src/test/java/io/github/randomcodespace/iq/cache/AnalysisCacheConfidenceTest.java b/src/test/java/io/github/randomcodespace/iq/cache/AnalysisCacheConfidenceTest.java new file mode 100644 index 00000000..64e284b7 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/cache/AnalysisCacheConfidenceTest.java @@ -0,0 +1,207 @@ +package io.github.randomcodespace.iq.cache; + +import io.github.randomcodespace.iq.model.CodeEdge; +import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; +import io.github.randomcodespace.iq.model.EdgeKind; +import io.github.randomcodespace.iq.model.NodeKind; +import org.junit.jupiter.api.AfterEach; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.EnumSource; + +import java.lang.reflect.Field; +import java.nio.file.Path; +import java.util.List; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive H2-cache round-trip coverage for {@link Confidence} and detector + * source on cached nodes and edges. Verifies that bumping {@code CACHE_VERSION} + * to 5 actually carries the new fields through both the serialize and + * deserialize paths, including: + *

  • All three confidence values (LEXICAL/SYNTACTIC/RESOLVED) on nodes and edges
  • Bare model objects (no confidence explicitly set) round-trip as LEXICAL
  • Source is optional and stays null on bare objects
  • Repeated upsert keeps the most recently written confidence and source (no silent decay)
  • {@code CACHE_VERSION} is exactly 5, which guards against accidental rollback
+ */ +class AnalysisCacheConfidenceTest { + + private AnalysisCache cache; + + @BeforeEach + void setUp(@TempDir Path tempDir) { + cache = new AnalysisCache(tempDir.resolve("test-cache.db")); + } + + @AfterEach + void tearDown() { + if (cache != null) { + cache.close(); + } + } + + // ---------- Node round-trips ---------- + + @ParameterizedTest + @EnumSource(Confidence.class) + void node_allConfidenceValuesRoundTripThroughCache(Confidence value) { + CodeNode node = new CodeNode("test:cache:" + value.name(), NodeKind.CLASS, "X"); + node.setConfidence(value); + node.setSource("MyDetector"); + + cache.storeResults("h-" + value.name(), "X.java", "java", + List.of(node), List.of()); + + var result = cache.loadCachedResults("h-" + value.name()); + assertNotNull(result); + assertEquals(1, result.nodes().size()); + CodeNode loaded = result.nodes().getFirst(); + assertEquals(value, loaded.getConfidence(), + "node confidence must round-trip through the H2 cache"); + assertEquals("MyDetector", loaded.getSource()); + } + + @Test + void node_bareConstructionDefaultsRoundTripAsLexicalAndNullSource() { + // Bare node — no confidence or source set. Round-trip must yield LEXICAL + null + // (matches CodeNode field defaults and the "least committal" invariant). + CodeNode node = new CodeNode("test:bare:Foo", NodeKind.CLASS, "Foo"); + cache.storeResults("h-bare", "Foo.java", "java", + List.of(node), List.of()); + + var result = cache.loadCachedResults("h-bare"); + assertNotNull(result); + CodeNode loaded = result.nodes().getFirst(); + assertEquals(Confidence.LEXICAL, loaded.getConfidence(), + "bare node round-trips as LEXICAL — least committal default"); + assertNull(loaded.getSource(), + "bare node round-trips with null source — no string sentinel"); + } + + @Test + void node_upsertPreservesConfidenceAndSource() { + // First write with one confidence/source, then overwrite with a stronger one. + // Reload must reflect the latest write — no silent decay. 
+ CodeNode v1 = new CodeNode("test:upsert:Foo", NodeKind.CLASS, "Foo"); + v1.setConfidence(Confidence.LEXICAL); + v1.setSource("RegexDetector"); + cache.storeResults("h-upsert", "Foo.java", "java", List.of(v1), List.of()); + + CodeNode v2 = new CodeNode("test:upsert:Foo:v2", NodeKind.CLASS, "Foo"); + v2.setConfidence(Confidence.RESOLVED); + v2.setSource("ResolvedDetector"); + cache.storeResults("h-upsert", "Foo.java", "java", List.of(v2), List.of()); + + var result = cache.loadCachedResults("h-upsert"); + assertNotNull(result); + assertEquals(1, result.nodes().size()); + CodeNode loaded = result.nodes().getFirst(); + assertEquals(Confidence.RESOLVED, loaded.getConfidence(), + "upsert must overwrite confidence — never silently keep the older value"); + assertEquals("ResolvedDetector", loaded.getSource()); + } + + @Test + void node_clearThenStoreReroundtripsConfidence() { + // Defensive: after a full clear, the next round-trip still works. + CodeNode pre = new CodeNode("pre:n", NodeKind.CLASS, "Pre"); + pre.setConfidence(Confidence.RESOLVED); + cache.storeResults("h-pre", "P.java", "java", List.of(pre), List.of()); + cache.clear(); + // Verify clear removed it. 
+ assertNull(cache.loadCachedResults("h-pre")); + + CodeNode post = new CodeNode("post:n", NodeKind.CLASS, "Post"); + post.setConfidence(Confidence.SYNTACTIC); + post.setSource("PostClearDetector"); + cache.storeResults("h-post", "P.java", "java", List.of(post), List.of()); + + var result = cache.loadCachedResults("h-post"); + assertNotNull(result); + assertEquals(Confidence.SYNTACTIC, result.nodes().getFirst().getConfidence()); + assertEquals("PostClearDetector", result.nodes().getFirst().getSource()); + } + + // ---------- Edge round-trips ---------- + + @ParameterizedTest + @EnumSource(Confidence.class) + void edge_allConfidenceValuesRoundTripThroughCache(Confidence value) { + CodeNode src = new CodeNode("e:src:" + value.name(), NodeKind.CLASS, "Src"); + CodeNode tgt = new CodeNode("e:tgt:" + value.name(), NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:edge:" + value.name(), EdgeKind.DEPENDS_ON, + "e:src:" + value.name(), tgt); + edge.setConfidence(value); + edge.setSource("EdgeDetector"); + + cache.storeResults("e-" + value.name(), "E.java", "java", + List.of(src, tgt), List.of(edge)); + + var result = cache.loadCachedResults("e-" + value.name()); + assertNotNull(result); + assertEquals(1, result.edges().size()); + CodeEdge loaded = result.edges().getFirst(); + assertEquals(value, loaded.getConfidence(), + "edge confidence must round-trip through the H2 cache"); + assertEquals("EdgeDetector", loaded.getSource()); + } + + @Test + void edge_bareConstructionDefaultsRoundTripAsLexicalAndNullSource() { + CodeNode src = new CodeNode("e:bare:src", NodeKind.CLASS, "Src"); + CodeNode tgt = new CodeNode("e:bare:tgt", NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:bare:edge", EdgeKind.DEPENDS_ON, "e:bare:src", tgt); + + cache.storeResults("e-bare", "E.java", "java", + List.of(src, tgt), List.of(edge)); + + var result = cache.loadCachedResults("e-bare"); + assertNotNull(result); + CodeEdge loaded = result.edges().getFirst(); + 
assertEquals(Confidence.LEXICAL, loaded.getConfidence(), + "bare edge round-trips as LEXICAL"); + assertNull(loaded.getSource(), + "bare edge round-trips with null source"); + } + + @Test + void edge_setNullSourceNormalizesToLexicalNotNull() { + // Edge model setter normalizes null confidence → LEXICAL. Verify cache + // round-trip preserves this invariant: getConfidence() never returns null. + CodeNode src = new CodeNode("e:null:src", NodeKind.CLASS, "Src"); + CodeNode tgt = new CodeNode("e:null:tgt", NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:null:edge", EdgeKind.DEPENDS_ON, "e:null:src", tgt); + edge.setConfidence(null); // setter normalizes to LEXICAL + edge.setSource(null); + + cache.storeResults("e-null", "E.java", "java", List.of(src, tgt), List.of(edge)); + + var result = cache.loadCachedResults("e-null"); + assertNotNull(result); + CodeEdge loaded = result.edges().getFirst(); + assertNotNull(loaded.getConfidence(), "confidence is never null at rest"); + assertEquals(Confidence.LEXICAL, loaded.getConfidence()); + } + + // ---------- Schema invariant ---------- + + @Test + void cacheVersionIsBumpedToFive() throws Exception { + // Reflection-driven assertion — confidence + source serialization is a + // breaking change to the JSON shape of cached rows. CACHE_VERSION must be + // bumped to 5 so existing v4 caches are dropped on next open. Reverting + // this without re-thinking the schema invalidation is a footgun. 
+ Field f = AnalysisCache.class.getDeclaredField("CACHE_VERSION"); + f.setAccessible(true); + int version = (int) f.get(null); + assertEquals(5, version, + "CACHE_VERSION must be 5 after the confidence + source schema change"); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/detector/DetectorContextResolvedTest.java b/src/test/java/io/github/randomcodespace/iq/detector/DetectorContextResolvedTest.java new file mode 100644 index 00000000..dc699252 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/detector/DetectorContextResolvedTest.java @@ -0,0 +1,140 @@ +package io.github.randomcodespace.iq.detector; + +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import java.util.Optional; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for the {@link DetectorContext#resolved()} accessor and + * the backward-compat invariant: existing call sites continue to compile and + * see {@link Optional#empty()} for resolution. Detectors that opt in via + * {@link DetectorContext#withResolved(Resolved)} get the attached value. 
+ */ +class DetectorContextResolvedTest { + + @Test + void threeArgConstructorDefaultsResolvedToEmpty() { + DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}"); + assertEquals(Optional.empty(), ctx.resolved(), + "3-arg constructor still gives empty resolution — backward compat"); + } + + @Test + void fiveArgConstructorDefaultsResolvedToEmpty() { + DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "myModule"); + assertEquals(Optional.empty(), ctx.resolved(), + "5-arg constructor still gives empty resolution — backward compat"); + } + + @Test + void sixArgConstructorDefaultsResolvedToEmpty() { + DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "myModule", null); + assertEquals(Optional.empty(), ctx.resolved(), + "6-arg constructor still gives empty resolution — backward compat"); + } + + @Test + void canonicalSevenArgConstructorCarriesResolved() { + Resolved r = stubAvailableResolved(); + DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "myModule", null, Optional.of(r)); + assertTrue(ctx.resolved().isPresent()); + assertSame(r, ctx.resolved().get()); + } + + @Test + void compactConstructorNormalizesNullResolvedToEmpty() { + // Defensive: passing null Optional is a misuse, but the compact + // constructor must not let it propagate (or callers reading ctx.resolved() + // would NPE). Normalized to Optional.empty() at construction time. 
+ DetectorContext ctx = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "myModule", null, null); + assertNotNull(ctx.resolved()); + assertEquals(Optional.empty(), ctx.resolved()); + } + + @Test + void withResolvedAttachesAvailableResolved() { + DetectorContext base = new DetectorContext("Foo.java", "java", "class Foo {}"); + Resolved r = stubAvailableResolved(); + DetectorContext withR = base.withResolved(r); + + // Original is untouched + assertEquals(Optional.empty(), base.resolved()); + // Copy carries the resolution + assertTrue(withR.resolved().isPresent()); + assertSame(r, withR.resolved().get()); + } + + @Test + void withResolvedNullClearsResolution() { + DetectorContext base = new DetectorContext("Foo.java", "java", "class Foo {}", + null, "m", null, Optional.of(stubAvailableResolved())); + DetectorContext cleared = base.withResolved(null); + + assertEquals(Optional.empty(), cleared.resolved(), + "withResolved(null) clears the resolution back to empty"); + } + + @Test + void withResolvedEmptyResolvedSentinelIsCarried() { + // A detector that wants to explicitly say "the resolver tried but came + // up empty" can attach EmptyResolved.INSTANCE — different semantics from + // Optional.empty (which means "the resolver pass didn't run for this file"). + DetectorContext base = new DetectorContext("Foo.java", "java", ""); + DetectorContext withEmpty = base.withResolved(EmptyResolved.INSTANCE); + + assertTrue(withEmpty.resolved().isPresent(), + "EmptyResolved.INSTANCE is a real value — Optional.isPresent() is true"); + assertSame(EmptyResolved.INSTANCE, withEmpty.resolved().get()); + assertFalse(withEmpty.resolved().get().isAvailable(), + "but isAvailable() == false — detectors still fall back to syntactic"); + } + + @Test + void withResolvedPreservesAllOtherFields() { + // Verifying we don't accidentally drop other fields when copying. 
+ DetectorContext base = new DetectorContext("Foo.java", "java", "content", + "parsedAst", "moduleName", null); + DetectorContext copy = base.withResolved(EmptyResolved.INSTANCE); + + assertEquals("Foo.java", copy.filePath()); + assertEquals("java", copy.language()); + assertEquals("content", copy.content()); + assertEquals("parsedAst", copy.parsedData()); + assertEquals("moduleName", copy.moduleName()); + assertNull(copy.registry()); + } + + @Test + void resolvedAccessorTypicalDetectorUsage() { + // Documents the canonical detector-side check: filter on isAvailable + // before downcasting to a language-specific Resolved subclass. + DetectorContext ctxA = new DetectorContext("Foo.java", "java", ""); + DetectorContext ctxB = new DetectorContext("Foo.java", "java", "") + .withResolved(EmptyResolved.INSTANCE); + DetectorContext ctxC = new DetectorContext("Foo.java", "java", "") + .withResolved(stubAvailableResolved()); + + assertTrue(ctxA.resolved().filter(Resolved::isAvailable).isEmpty(), + "no resolution attached: detector falls back to syntactic"); + assertTrue(ctxB.resolved().filter(Resolved::isAvailable).isEmpty(), + "EmptyResolved attached: detector still falls back"); + assertTrue(ctxC.resolved().filter(Resolved::isAvailable).isPresent(), + "available Resolved attached: detector may downcast and use it"); + } + + private static Resolved stubAvailableResolved() { + return new Resolved() { + @Override public boolean isAvailable() { return true; } + @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; } + }; + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaultsTest.java b/src/test/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaultsTest.java new file mode 100644 index 00000000..47220b6a --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/detector/DetectorEmissionDefaultsTest.java @@ -0,0 +1,253 @@ +package io.github.randomcodespace.iq.detector; + +import 
io.github.randomcodespace.iq.detector.jvm.java.AbstractJavaMessagingDetector; +import io.github.randomcodespace.iq.detector.jvm.java.AbstractJavaParserDetector; +import io.github.randomcodespace.iq.detector.python.AbstractPythonAntlrDetector; +import io.github.randomcodespace.iq.detector.python.AbstractPythonDbDetector; +import io.github.randomcodespace.iq.detector.typescript.AbstractTypeScriptDetector; +import io.github.randomcodespace.iq.model.CodeEdge; +import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; +import io.github.randomcodespace.iq.model.EdgeKind; +import io.github.randomcodespace.iq.model.NodeKind; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +import java.util.ArrayList; +import java.util.List; +import java.util.Set; +import java.util.stream.Stream; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for {@link Detector#defaultConfidence()} on every base + * class, plus the orchestrator stamping pass in {@link DetectorEmissionDefaults}. + * + *
<p>Verifies the contract that lets us migrate detectors incrementally: + * <ul> + *   <li>Each base class declares (or inherits) the right confidence floor.</li> + *   <li>The stamping pass writes source + confidence ONLY when source is null + * (the "detector didn't think about it" sentinel).</li> + *   <li>Explicitly-stamped emissions survive a stamping pass unchanged.</li> + *   <li>Mixed results (some explicit, some default) get the right treatment + * on a per-emission basis.</li> + * </ul>
+ */ +class DetectorEmissionDefaultsTest { + + // ---------- Per-base default confidence ---------- + + static Stream<Arguments> baseClassDefaults() { + return Stream.of( + Arguments.of("interface default (LEXICAL)", new InterfaceOnlyDetector(), Confidence.LEXICAL), + Arguments.of("AbstractRegexDetector → LEXICAL", new RegexStub(), Confidence.LEXICAL), + Arguments.of("AbstractAntlrDetector → SYNTACTIC", new AntlrStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractStructuredDetector → SYNTACTIC", new StructuredStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractJavaParserDetector → SYNTACTIC", new JavaParserStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractJavaMessagingDetector → SYNTACTIC", new JavaMessagingStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractTypeScriptDetector inherits SYNTACTIC", new TypeScriptStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractPythonAntlrDetector inherits SYNTACTIC", new PythonAntlrStub(), Confidence.SYNTACTIC), + Arguments.of("AbstractPythonDbDetector inherits SYNTACTIC", new PythonDbStub(), Confidence.SYNTACTIC) + ); + } + + @ParameterizedTest(name = "{0}") + @MethodSource("baseClassDefaults") + void defaultConfidencePerBaseClass(String label, Detector detector, Confidence expected) { + assertEquals(expected, detector.defaultConfidence(), label); + } + + // ---------- Stamping behavior ---------- + + @Test + void applyDefaults_stampsSourceAndConfidenceOnNullSourceNode() { + CodeNode node = new CodeNode("n:1", NodeKind.CLASS, "Foo"); + // Bare construction — source is null, confidence is the model default LEXICAL.
+ DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + + DetectorEmissionDefaults.applyDefaults(result, new AntlrStub()); + + assertEquals("AntlrStub", node.getSource(), "source stamped to detector class simple name"); + assertEquals(Confidence.SYNTACTIC, node.getConfidence(), + "confidence bumped to base default (SYNTACTIC for AntlrStub)"); + } + + @Test + void applyDefaults_stampsSourceAndConfidenceOnNullSourceEdge() { + CodeNode tgt = new CodeNode("n:tgt", NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:1", EdgeKind.DEPENDS_ON, "n:src", tgt); + DetectorResult result = DetectorResult.of(new ArrayList<>(), new ArrayList<>(List.of(edge))); + + DetectorEmissionDefaults.applyDefaults(result, new RegexStub()); + + assertEquals("RegexStub", edge.getSource()); + assertEquals(Confidence.LEXICAL, edge.getConfidence(), + "regex base default is LEXICAL"); + } + + @Test + void applyDefaults_leavesExplicitlyStampedNodeAlone() { + // Detector explicitly stamped — stamping pass must not clobber. 
+ CodeNode node = new CodeNode("n:explicit", NodeKind.CLASS, "Foo"); + node.setSource("CustomResolverDetector"); + node.setConfidence(Confidence.RESOLVED); + DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + + DetectorEmissionDefaults.applyDefaults(result, new AntlrStub()); + + assertEquals("CustomResolverDetector", node.getSource(), + "explicit source survives stamping pass"); + assertEquals(Confidence.RESOLVED, node.getConfidence(), + "explicit confidence survives stamping pass — not down-graded to base default"); + } + + @Test + void applyDefaults_leavesExplicitlyStampedEdgeAlone() { + CodeNode tgt = new CodeNode("n:tgt", NodeKind.CLASS, "Tgt"); + CodeEdge edge = new CodeEdge("e:explicit", EdgeKind.DEPENDS_ON, "n:src", tgt); + edge.setSource("ExplicitDetector"); + edge.setConfidence(Confidence.RESOLVED); + DetectorResult result = DetectorResult.of(new ArrayList<>(), new ArrayList<>(List.of(edge))); + + DetectorEmissionDefaults.applyDefaults(result, new RegexStub()); + + assertEquals("ExplicitDetector", edge.getSource()); + assertEquals(Confidence.RESOLVED, edge.getConfidence()); + } + + @Test + void applyDefaults_mixedExplicitAndDefaultsHandledIndependently() { + // One node was explicitly stamped, another wasn't. Verify the pass is + // per-emission, not all-or-nothing. 
+ CodeNode explicit = new CodeNode("n:explicit", NodeKind.CLASS, "Explicit"); + explicit.setSource("ResolverDetector"); + explicit.setConfidence(Confidence.RESOLVED); + + CodeNode bare = new CodeNode("n:bare", NodeKind.CLASS, "Bare"); + + DetectorResult result = DetectorResult.of( + new ArrayList<>(List.of(explicit, bare)), + new ArrayList<>()); + + DetectorEmissionDefaults.applyDefaults(result, new StructuredStub()); + + // Explicit untouched + assertEquals("ResolverDetector", explicit.getSource()); + assertEquals(Confidence.RESOLVED, explicit.getConfidence()); + // Bare stamped + assertEquals("StructuredStub", bare.getSource()); + assertEquals(Confidence.SYNTACTIC, bare.getConfidence()); + } + + @Test + void applyDefaults_nullResultIsNoOp() { + // Defensive: callers may pass null on early returns. Must not NPE. + assertDoesNotThrow(() -> DetectorEmissionDefaults.applyDefaults(null, new RegexStub())); + } + + @Test + void applyDefaults_nullDetectorIsNoOp() { + // Defensive: the orchestrator should never pass null but the helper + // is the single trust boundary — must not NPE. + CodeNode node = new CodeNode("n:1", NodeKind.CLASS, "Foo"); + DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + assertDoesNotThrow(() -> DetectorEmissionDefaults.applyDefaults(result, null)); + // Model state is untouched + assertNull(node.getSource()); + } + + @Test + void applyDefaults_emptyResultIsNoOp() { + DetectorResult result = DetectorResult.empty(); + assertDoesNotThrow(() -> DetectorEmissionDefaults.applyDefaults(result, new RegexStub())); + assertEquals(0, result.nodes().size()); + assertEquals(0, result.edges().size()); + } + + @Test + void applyDefaults_idempotentOnRepeatCall() { + // After the first stamp, the detector "owns" these emissions. A second + // stamping pass with the SAME detector is a no-op (source is no longer null). 
+ CodeNode node = new CodeNode("n:idem", NodeKind.CLASS, "Foo"); + DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + Detector detector = new AntlrStub(); + + DetectorEmissionDefaults.applyDefaults(result, detector); + String firstSource = node.getSource(); + Confidence firstConfidence = node.getConfidence(); + + DetectorEmissionDefaults.applyDefaults(result, detector); + + assertEquals(firstSource, node.getSource()); + assertEquals(firstConfidence, node.getConfidence()); + } + + @Test + void applyDefaults_secondPassWithDifferentDetectorIsAlsoNoOp() { + // After first stamp, source is set — a different detector running over + // the same result must NOT relabel the node. (This guards against pipeline + // reorder bugs where two detectors emit the same node.) + CodeNode node = new CodeNode("n:multi", NodeKind.CLASS, "Foo"); + DetectorResult result = DetectorResult.of(new ArrayList<>(List.of(node)), new ArrayList<>()); + + DetectorEmissionDefaults.applyDefaults(result, new AntlrStub()); + DetectorEmissionDefaults.applyDefaults(result, new RegexStub()); // different detector + + assertEquals("AntlrStub", node.getSource(), + "first detector's stamp wins — second pass is no-op"); + assertEquals(Confidence.SYNTACTIC, node.getConfidence()); + } + + // ---------- Test-only stub detectors ---------- + + /** Bare interface implementation — uses the interface's default LEXICAL. 
*/ + private static final class InterfaceOnlyDetector implements Detector { + @Override public String getName() { return "iface_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("test"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class RegexStub extends AbstractRegexDetector { + @Override public String getName() { return "regex_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("test"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class AntlrStub extends AbstractAntlrDetector { + @Override public String getName() { return "antlr_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("test"); } + } + + private static final class StructuredStub extends AbstractStructuredDetector { + @Override public String getName() { return "structured_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("yaml"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class JavaParserStub extends AbstractJavaParserDetector { + @Override public String getName() { return "javaparser_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("java"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class JavaMessagingStub extends AbstractJavaMessagingDetector { + @Override public String getName() { return "messaging_stub"; } + @Override public Set<String> getSupportedLanguages() { return Set.of("java"); } + @Override public DetectorResult detect(DetectorContext ctx) { return DetectorResult.empty(); } + } + + private static final class TypeScriptStub extends AbstractTypeScriptDetector { + @Override public String getName() { return "ts_stub"; } + } + + private static final class
PythonAntlrStub extends AbstractPythonAntlrDetector { + @Override public String getName() { return "python_antlr_stub"; } + } + + private static final class PythonDbStub extends AbstractPythonDbDetector { + @Override public String getName() { return "python_db_stub"; } + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java b/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java new file mode 100644 index 00000000..c3b0b7b0 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreConfidenceRoundTripTest.java @@ -0,0 +1,340 @@ +package io.github.randomcodespace.iq.graph; + +import io.github.randomcodespace.iq.model.CodeEdge; +import io.github.randomcodespace.iq.model.CodeNode; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.extension.ExtendWith; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.EnumSource; +import org.mockito.Mock; +import org.mockito.junit.jupiter.MockitoExtension; +import org.neo4j.graphdb.GraphDatabaseService; +import org.neo4j.graphdb.Result; +import org.neo4j.graphdb.Transaction; + +import java.util.List; +import java.util.Map; +import java.util.Optional; + +import static org.assertj.core.api.Assertions.assertThat; +import static org.mockito.ArgumentMatchers.anyMap; +import static org.mockito.ArgumentMatchers.anyString; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; + +/** + * Aggressive Neo4j round-trip coverage for {@link CodeNode#getConfidence()} + + * {@link CodeNode#getSource()} (and the same on {@link CodeEdge}). Verifies: + *
<ul> + *   <li>All three {@link Confidence} values round-trip cleanly on nodes</li> + *   <li>Missing properties (legacy data) fall back to {@code LEXICAL} / {@code null} + * — never throw, never null-pointer the typed field</li> + *   <li>Malformed / mixed-case confidence strings are tolerated</li> + *   <li>Edge confidence + source round-trip through {@code hydrateEdgesForNode}</li> + * </ul>
+ */ +@ExtendWith(MockitoExtension.class) +class GraphStoreConfidenceRoundTripTest { + + @Mock + private GraphRepository repository; + + @Mock + private GraphDatabaseService graphDb; + + private GraphStore store; + + @BeforeEach + void setUp() { + store = new GraphStore(repository, graphDb); + } + + // ---------- Node read path: nodeFromNeo4j() via findById() ---------- + + @ParameterizedTest + @EnumSource(Confidence.class) + void node_allConfidenceValuesRoundTrip(Confidence value) { + var neo4jNode = stubBareNeo4jNode("node:Foo.java:class:Foo", "class", "Foo"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(value.name()); + when(neo4jNode.getProperty("source", null)).thenReturn("SpringServiceDetector"); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional<CodeNode> result = store.findById("node:Foo.java:class:Foo"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()) + .as("confidence round-trips through Neo4j read path") + .isEqualTo(value); + assertThat(result.get().getSource()).isEqualTo("SpringServiceDetector"); + } + + @Test + void node_legacyMissingConfidenceFallsBackToLexical() { + // Simulates a node persisted before this field existed: confidence + source + // are absent. Reader must default to LEXICAL (least committal) and null.
+ var neo4jNode = stubBareNeo4jNode("node:Legacy.java:class:Legacy", "class", "Legacy"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional<CodeNode> result = store.findById("node:Legacy.java:class:Legacy"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()) + .as("missing confidence in Neo4j defaults to LEXICAL — never null") + .isEqualTo(Confidence.LEXICAL); + assertThat(result.get().getSource()) + .as("missing source stays null — no string sentinel") + .isNull(); + } + + @Test + void node_legacyHasSourceButMissingConfidence() { + // Mixed legacy: source got populated some other way but confidence wasn't. + // Source preserved, confidence still falls back. + var neo4jNode = stubBareNeo4jNode("node:Mixed.java:class:Mixed", "class", "Mixed"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn("PartialMigrationDetector"); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional<CodeNode> result = store.findById("node:Mixed.java:class:Mixed"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()).isEqualTo(Confidence.LEXICAL); + assertThat(result.get().getSource()).isEqualTo("PartialMigrationDetector"); + } + + @Test + void node_malformedConfidenceFallsBackToLexicalWithoutThrowing() { + // A garbled write or a future enum addition that hasn't shipped here yet: + // the reader must not throw — it falls back to LEXICAL silently.
+ var neo4jNode = stubBareNeo4jNode("node:Garbled.java:class:Garbled", "class", "Garbled"); + when(neo4jNode.getProperty("confidence", null)).thenReturn("PERFECT"); // not in enum + when(neo4jNode.getProperty("source", null)).thenReturn(null); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + // Must not throw IllegalArgumentException + Optional<CodeNode> result = store.findById("node:Garbled.java:class:Garbled"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()) + .as("unknown confidence string falls back to LEXICAL — read path is non-throwing") + .isEqualTo(Confidence.LEXICAL); + } + + @Test + void node_mixedCaseConfidenceParsesCorrectly() { + // Confidence.fromString is case-insensitive — verify the read path uses it. + var neo4jNode = stubBareNeo4jNode("node:Mixed.java:class:Mixed", "class", "Mixed"); + when(neo4jNode.getProperty("confidence", null)).thenReturn("ReSoLvEd"); + when(neo4jNode.getProperty("source", null)).thenReturn("CaseTestDetector"); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional<CodeNode> result = store.findById("node:Mixed.java:class:Mixed"); + + assertThat(result).isPresent(); + assertThat(result.get().getConfidence()).isEqualTo(Confidence.RESOLVED); + } + + @Test + void node_emptyStringSourcePreservedAsEmpty() { + // Defensive: if upstream wrote an empty string, we don't silently turn it + // into null — the field reads back as empty string. (Detectors should never + // emit empty source, but the read path stays faithful.)
+ var neo4jNode = stubBareNeo4jNode("node:Empty.java:class:Empty", "class", "Empty"); + when(neo4jNode.getProperty("confidence", null)).thenReturn("LEXICAL"); + when(neo4jNode.getProperty("source", null)).thenReturn(""); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + wireFindByIdResult(neo4jNode); + + Optional<CodeNode> result = store.findById("node:Empty.java:class:Empty"); + + assertThat(result).isPresent(); + assertThat(result.get().getSource()).isEmpty(); + } + + // ---------- Edge read path: hydrateEdgesForNode() via findById() ---------- + + @Test + void edge_confidenceAndSourceRoundTrip() { + // findById hydrates outgoing edges. Mock both the node lookup AND the edge query. + var neo4jNode = stubBareNeo4jNode("node:Foo.java:class:Foo", "class", "Foo"); + when(neo4jNode.getProperty("confidence", null)).thenReturn("RESOLVED"); + when(neo4jNode.getProperty("source", null)).thenReturn("SpringServiceDetector"); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + + var targetNeo4j = stubBareNeo4jNode("node:Bar.java:class:Bar", "class", "Bar"); + when(targetNeo4j.getProperty("confidence", null)).thenReturn(null); + when(targetNeo4j.getProperty("source", null)).thenReturn(null); + when(targetNeo4j.getPropertyKeys()).thenReturn(List.of()); + + var tx = mock(Transaction.class); + when(graphDb.beginTx()).thenReturn(tx); + + // First execute(): node lookup + var nodeResult = mock(Result.class); + when(nodeResult.hasNext()).thenReturn(true, false); + when(nodeResult.next()).thenReturn(Map.of("n", neo4jNode)); + + // Second execute(): outgoing edges + var edgeResult = mock(Result.class); + when(edgeResult.hasNext()).thenReturn(true, false); + when(edgeResult.next()).thenReturn(Map.of( + "id", "edge:Foo->Bar:depends_on", + "kind", "depends_on", + "targetId", "node:Bar.java:class:Bar", + "t", targetNeo4j, + "confidence", "RESOLVED", + "source", "SpringDependsOnDetector" + )); + + when(tx.execute(anyString(), anyMap())).thenReturn(nodeResult, edgeResult); +
Optional<CodeNode> result = store.findById("node:Foo.java:class:Foo"); + + assertThat(result).isPresent(); + assertThat(result.get().getEdges()).hasSize(1); + CodeEdge edge = result.get().getEdges().getFirst(); + assertThat(edge.getConfidence()).isEqualTo(Confidence.RESOLVED); + assertThat(edge.getSource()).isEqualTo("SpringDependsOnDetector"); + } + + @Test + void edge_legacyMissingConfidenceAndSourceFallsBackCleanly() { + var neo4jNode = stubBareNeo4jNode("node:Foo.java:class:Foo", "class", "Foo"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + + var targetNeo4j = stubBareNeo4jNode("node:Bar.java:class:Bar", "class", "Bar"); + when(targetNeo4j.getProperty("confidence", null)).thenReturn(null); + when(targetNeo4j.getProperty("source", null)).thenReturn(null); + when(targetNeo4j.getPropertyKeys()).thenReturn(List.of()); + + var tx = mock(Transaction.class); + when(graphDb.beginTx()).thenReturn(tx); + + var nodeResult = mock(Result.class); + when(nodeResult.hasNext()).thenReturn(true, false); + when(nodeResult.next()).thenReturn(Map.of("n", neo4jNode)); + + // Edge row missing confidence + source keys (legacy edge). Map.of cannot + // contain nulls, so we use HashMap-style construction via java.util.HashMap.
+ java.util.HashMap<String, Object> legacyEdgeRow = new java.util.HashMap<>(); + legacyEdgeRow.put("id", "edge:Foo->Bar:legacy"); + legacyEdgeRow.put("kind", "depends_on"); + legacyEdgeRow.put("targetId", "node:Bar.java:class:Bar"); + legacyEdgeRow.put("t", targetNeo4j); + legacyEdgeRow.put("confidence", null); + legacyEdgeRow.put("source", null); + + var edgeResult = mock(Result.class); + when(edgeResult.hasNext()).thenReturn(true, false); + when(edgeResult.next()).thenReturn(legacyEdgeRow); + + when(tx.execute(anyString(), anyMap())).thenReturn(nodeResult, edgeResult); + + Optional<CodeNode> result = store.findById("node:Foo.java:class:Foo"); + + assertThat(result).isPresent(); + assertThat(result.get().getEdges()).hasSize(1); + CodeEdge edge = result.get().getEdges().getFirst(); + assertThat(edge.getConfidence()) + .as("legacy edge missing confidence falls back to LEXICAL") + .isEqualTo(Confidence.LEXICAL); + assertThat(edge.getSource()) + .as("legacy edge missing source stays null") + .isNull(); + } + + @Test + void edge_malformedConfidenceFallsBackToLexicalWithoutThrowing() { + var neo4jNode = stubBareNeo4jNode("node:Foo.java:class:Foo", "class", "Foo"); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); + when(neo4jNode.getPropertyKeys()).thenReturn(List.of()); + + var targetNeo4j = stubBareNeo4jNode("node:Bar.java:class:Bar", "class", "Bar"); + when(targetNeo4j.getProperty("confidence", null)).thenReturn(null); + when(targetNeo4j.getProperty("source", null)).thenReturn(null); + when(targetNeo4j.getPropertyKeys()).thenReturn(List.of()); + + var tx = mock(Transaction.class); + when(graphDb.beginTx()).thenReturn(tx); + + var nodeResult = mock(Result.class); + when(nodeResult.hasNext()).thenReturn(true, false); + when(nodeResult.next()).thenReturn(Map.of("n", neo4jNode)); + + var edgeResult = mock(Result.class); + when(edgeResult.hasNext()).thenReturn(true, false); +
when(edgeResult.next()).thenReturn(Map.of( + "id", "edge:Foo->Bar:garbled", + "kind", "depends_on", + "targetId", "node:Bar.java:class:Bar", + "t", targetNeo4j, + "confidence", "PERFECT", // not a Confidence enum + "source", "GarbledDetector" + )); + + when(tx.execute(anyString(), anyMap())).thenReturn(nodeResult, edgeResult); + + Optional<CodeNode> result = store.findById("node:Foo.java:class:Foo"); + + assertThat(result).isPresent(); + CodeEdge edge = result.get().getEdges().getFirst(); + assertThat(edge.getConfidence()) + .as("garbled enum string does not throw — falls back to LEXICAL") + .isEqualTo(Confidence.LEXICAL); + assertThat(edge.getSource()) + .as("source is preserved even when confidence is garbled") + .isEqualTo("GarbledDetector"); + } + + // ---------- Helpers ---------- + + /** + * Build a Neo4j Node mock with the standard non-confidence-related getProperty + * stubs already wired (id, kind, label, fqn, module, filePath, layer, lineStart, + * lineEnd, annotations). Caller adds confidence + source + propertyKeys stubs. + */ + private static org.neo4j.graphdb.Node stubBareNeo4jNode(String id, String kindStr, String label) { + var n = mock(org.neo4j.graphdb.Node.class); + when(n.getProperty("id", null)).thenReturn(id); + when(n.getProperty("kind", null)).thenReturn(kindStr); + when(n.getProperty("label", "")).thenReturn(label); + when(n.getProperty("fqn", null)).thenReturn(null); + when(n.getProperty("module", null)).thenReturn(null); + when(n.getProperty("filePath", null)).thenReturn(null); + when(n.getProperty("layer", null)).thenReturn(null); + when(n.getProperty("lineStart", null)).thenReturn(null); + when(n.getProperty("lineEnd", null)).thenReturn(null); + when(n.getProperty("annotations", null)).thenReturn(null); + return n; + } + + /** + * Wire up findById's transaction chain: first execute() returns the node row, + * second execute() (the edge hydration) returns empty.
+ */ + private void wireFindByIdResult(org.neo4j.graphdb.Node neo4jNode) { + var tx = mock(Transaction.class); + when(graphDb.beginTx()).thenReturn(tx); + + var nodeResult = mock(Result.class); + when(nodeResult.hasNext()).thenReturn(true, false); + when(nodeResult.next()).thenReturn(Map.of("n", neo4jNode)); + + var edgeResult = mock(Result.class); + when(edgeResult.hasNext()).thenReturn(false); + + when(tx.execute(anyString(), anyMap())).thenReturn(nodeResult, edgeResult); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreExtendedTest.java b/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreExtendedTest.java index 6a9181e9..e3dfba7b 100644 --- a/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreExtendedTest.java +++ b/src/test/java/io/github/randomcodespace/iq/graph/GraphStoreExtendedTest.java @@ -51,6 +51,9 @@ private org.neo4j.graphdb.Node mockNeo4jNode(String id, String kind, String labe when(neo4jNode.getProperty("layer", null)).thenReturn(null); when(neo4jNode.getProperty("lineStart", null)).thenReturn(null); when(neo4jNode.getProperty("lineEnd", null)).thenReturn(null); + when(neo4jNode.getProperty("annotations", null)).thenReturn(null); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); return neo4jNode; } diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/ProvenanceNeo4jRoundTripTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/ProvenanceNeo4jRoundTripTest.java index 748133ca..8c6a7fde 100644 --- a/src/test/java/io/github/randomcodespace/iq/intelligence/ProvenanceNeo4jRoundTripTest.java +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/ProvenanceNeo4jRoundTripTest.java @@ -63,6 +63,11 @@ void provenance_survivesNeo4jRoundTrip() { when(neo4jNode.getProperty("lineStart", null)).thenReturn(null); when(neo4jNode.getProperty("lineEnd", null)).thenReturn(null); 
when(neo4jNode.getProperty("annotations", null)).thenReturn(null); + // confidence + source are typed first-class fields read by nodeFromNeo4j; + // this test doesn't care about them, so stub null (legacy/unset) and let the + // reader fall back to its defaults. + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); // Property keys as stored by bulkSave (prop_ prefix, values as String) when(neo4jNode.getPropertyKeys()).thenReturn(List.of( @@ -122,6 +127,8 @@ void provenance_survivesNeo4jRoundTrip_withNullRepoUrl() { when(neo4jNode.getProperty("lineStart", null)).thenReturn(null); when(neo4jNode.getProperty("lineEnd", null)).thenReturn(null); when(neo4jNode.getProperty("annotations", null)).thenReturn(null); + when(neo4jNode.getProperty("confidence", null)).thenReturn(null); + when(neo4jNode.getProperty("source", null)).thenReturn(null); // Only required provenance keys (no repo_url, no commit_sha) when(neo4jNode.getPropertyKeys()).thenReturn(List.of( diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionExceptionTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionExceptionTest.java new file mode 100644 index 00000000..bd420b65 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolutionExceptionTest.java @@ -0,0 +1,55 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import org.junit.jupiter.api.Test; + +import java.nio.file.Path; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for {@link ResolutionException}. Verifies it carries + * actionable context (file + language) so the orchestrator can log usefully. 
+ */ +class ResolutionExceptionTest { + + @Test + void carriesMessageFileAndLanguage() { + Path p = Path.of("/tmp/Foo.java"); + ResolutionException e = new ResolutionException("bootstrap failed", p, "java"); + + assertEquals("bootstrap failed", e.getMessage()); + assertEquals(p, e.file()); + assertEquals("java", e.language()); + assertNull(e.getCause(), "no underlying cause when constructed without one"); + } + + @Test + void carriesUnderlyingCause() { + Path p = Path.of("/tmp/Foo.java"); + Exception root = new IllegalStateException("classpath broken"); + ResolutionException e = new ResolutionException("bootstrap failed", root, p, "java"); + + assertSame(root, e.getCause(), "underlying cause is preserved"); + assertEquals(p, e.file()); + assertEquals("java", e.language()); + } + + @Test + void nullFileAndLanguageAreAllowed() { + // Defensive: some callers may not have file/language at hand. + // The exception should still construct without NPE. + ResolutionException e = new ResolutionException("generic failure", null, null); + assertNull(e.file()); + assertNull(e.language()); + assertEquals("generic failure", e.getMessage()); + } + + @Test + void isCheckedException() { + // The exception is checked by design — orchestrators must catch and + // decide whether to skip the file or abort the pass. 
+ assertFalse(RuntimeException.class.isAssignableFrom(ResolutionException.class), + "ResolutionException must be a checked exception (subclass of Exception, not RuntimeException)"); + assertTrue(Exception.class.isAssignableFrom(ResolutionException.class)); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java new file mode 100644 index 00000000..8f49d10a --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolvedContractTest.java @@ -0,0 +1,68 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Contract tests for {@link Resolved} and the {@link EmptyResolved} singleton. + * + *
<p>
{@link EmptyResolved} is a load-bearing sentinel — detectors check + * {@link Resolved#isAvailable()} == false to decide "fall back to syntactic + * detection." Anything that breaks the singleton invariants below is a bug. + */ +class ResolvedContractTest { + + @Test + void emptyResolvedIsSingleton() { + // Reference equality — detectors may use `==` to short-circuit + // (e.g. `if (resolved == EmptyResolved.INSTANCE) return ...`) + assertSame(EmptyResolved.INSTANCE, EmptyResolved.INSTANCE); + } + + @Test + void emptyResolvedReportsNotAvailable() { + assertFalse(EmptyResolved.INSTANCE.isAvailable(), + "EmptyResolved must always report not-available — it's the 'no resolution' sentinel"); + } + + @Test + void emptyResolvedConfidenceFloorIsLexical() { + // Resolution didn't happen — emissions consulting EmptyResolved should + // never claim RESOLVED. LEXICAL is the safe floor. + assertEquals(Confidence.LEXICAL, EmptyResolved.INSTANCE.sourceConfidence(), + "EmptyResolved floor is LEXICAL — nothing was actually resolved"); + } + + @Test + void emptyResolvedConstructorIsPrivate() throws Exception { + // Defensive: prevent rogue subclasses from violating the singleton. + var ctor = EmptyResolved.class.getDeclaredConstructor(); + assertTrue(java.lang.reflect.Modifier.isPrivate(ctor.getModifiers()), + "EmptyResolved must have a private constructor"); + } + + @Test + void emptyResolvedClassIsFinal() { + // Singletons must not be subclassable — a subclass could return true + // from isAvailable() and break the contract. + assertTrue(java.lang.reflect.Modifier.isFinal(EmptyResolved.class.getModifiers()), + "EmptyResolved must be final to preserve singleton invariants"); + } + + @Test + void resolvedInterfaceContractAvailableImpliesNonLexical() { + // Documents the convention via a custom test impl: a Resolved that + // claims isAvailable==true is expected to expose a non-LEXICAL floor + // (LEXICAL is reserved for "nothing resolved"). 
This isn't enforced by + // the interface — it's a contract the tests document. + Resolved fakeResolved = new Resolved() { + @Override public boolean isAvailable() { return true; } + @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; } + }; + assertTrue(fakeResolved.isAvailable()); + assertEquals(Confidence.RESOLVED, fakeResolved.sourceConfidence(), + "available Resolved instances should expose a non-LEXICAL confidence (RESOLVED here)"); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java new file mode 100644 index 00000000..ab77cabb --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/ResolverRegistryTest.java @@ -0,0 +1,206 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import java.nio.file.Path; +import java.util.ArrayList; +import java.util.List; +import java.util.Set; +import java.util.concurrent.atomic.AtomicInteger; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for {@link ResolverRegistry}. Exercises the determinism, + * conflict resolution, case-insensitivity, null tolerance, and per-resolver + * failure isolation contracts.
+ */ +class ResolverRegistryTest { + + // ---------- Lookup ---------- + + @Test + void emptyRegistryReturnsNoopForAnyLanguage() throws ResolutionException { + ResolverRegistry registry = new ResolverRegistry(List.of()); + SymbolResolver r = registry.resolverFor("java"); + assertSame(ResolverRegistry.NOOP, r); + + // The NOOP must always return EmptyResolved + Resolved result = r.resolve(new DiscoveredFile(Path.of("Foo.java"), "java", 100), "ast"); + assertSame(EmptyResolved.INSTANCE, result); + } + + @Test + void singleResolverIsReturnedForItsLanguage() { + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + assertSame(java, registry.resolverFor("java")); + } + + @Test + void unknownLanguageReturnsNoop() { + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + assertSame(ResolverRegistry.NOOP, registry.resolverFor("python")); + } + + @Test + void languageLookupIsCaseInsensitive() { + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + assertSame(java, registry.resolverFor("Java")); + assertSame(java, registry.resolverFor("JAVA")); + assertSame(java, registry.resolverFor("jAvA")); + } + + @Test + void nullLanguageReturnsNoopWithoutNpe() { + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + // Defensive: null is a sentinel, not an error + assertSame(ResolverRegistry.NOOP, registry.resolverFor(null)); + } + + @Test + void resolverForNeverReturnsNull() { + ResolverRegistry registry = new ResolverRegistry(List.of()); + assertNotNull(registry.resolverFor("java")); + assertNotNull(registry.resolverFor("python")); + assertNotNull(registry.resolverFor("")); + assertNotNull(registry.resolverFor("\t\n")); + } + + @Test + void blankLanguageReturnsNoop() { + // Detector contract: getSupportedLanguages should never 
include blank/empty strings. + // The registry defensively skips them so a misbehaving resolver doesn't poison + // lookup for "". + AStubResolver java = new AStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(java)); + assertSame(ResolverRegistry.NOOP, registry.resolverFor("")); + assertSame(ResolverRegistry.NOOP, registry.resolverFor(" ")); + } + + // ---------- Conflict resolution ---------- + + @Test + void duplicateLanguageFirstSortedWins() { + // Two resolvers both claim "java". Sort by class simple name — A before Z. + AStubResolver a = new AStubResolver("java"); + ZStubResolver z = new ZStubResolver("java"); + ResolverRegistry registry = new ResolverRegistry(List.of(z, a)); // input order intentionally reversed + + assertSame(a, registry.resolverFor("java"), + "first-in-sort-order wins — AStubResolver < ZStubResolver alphabetically"); + } + + // ---------- Order ---------- + + @Test + void allReturnsSortedOrder() { + AStubResolver a = new AStubResolver("a"); + ZStubResolver z = new ZStubResolver("z"); + MStubResolver m = new MStubResolver("m"); + ResolverRegistry registry = new ResolverRegistry(List.of(z, a, m)); + + List<SymbolResolver> all = registry.all(); + assertEquals(3, all.size()); + assertSame(a, all.get(0)); + assertSame(m, all.get(1)); + assertSame(z, all.get(2)); + } + + // ---------- Bootstrap ---------- + + @Test + void bootstrapCallsEveryResolverInOrder() { + List<String> calledOrder = new ArrayList<>(); + AStubResolver a = new AStubResolver("a", () -> calledOrder.add("A")); + MStubResolver m = new MStubResolver("m", () -> calledOrder.add("M")); + ZStubResolver z = new ZStubResolver("z", () -> calledOrder.add("Z")); + ResolverRegistry registry = new ResolverRegistry(List.of(z, m, a)); // input order shuffled + + registry.bootstrap(Path.of("/tmp/project")); + + assertEquals(List.of("A", "M", "Z"), calledOrder, + "bootstrap iterates in alphabetical order — determinism guarantee"); + } + + @Test + void
bootstrapResilient_oneFailureDoesNotBlockOthers() { + AtomicInteger aCalled = new AtomicInteger(); + AtomicInteger zCalled = new AtomicInteger(); + AStubResolver a = new AStubResolver("a", () -> { + aCalled.incrementAndGet(); + throw new RuntimeException("simulated bootstrap failure"); + }); + ZStubResolver z = new ZStubResolver("z", zCalled::incrementAndGet); + ResolverRegistry registry = new ResolverRegistry(List.of(a, z)); + + // Must not throw — failure is swallowed and logged + assertDoesNotThrow(() -> registry.bootstrap(Path.of("/tmp/project"))); + + assertEquals(1, aCalled.get(), "failing resolver was called"); + assertEquals(1, zCalled.get(), + "subsequent resolvers run despite earlier failure — resilience guarantee"); + } + + @Test + void bootstrapResilient_resolutionExceptionAlsoSwallowed() { + AtomicInteger zCalled = new AtomicInteger(); + SymbolResolver throwing = new SymbolResolver() { + @Override public Set<String> getSupportedLanguages() { return Set.of("a"); } + @Override public void bootstrap(Path projectRoot) throws ResolutionException { + throw new ResolutionException("simulated checked failure", projectRoot, "a"); + } + @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { + return EmptyResolved.INSTANCE; + } + }; + ZStubResolver z = new ZStubResolver("z", zCalled::incrementAndGet); + ResolverRegistry registry = new ResolverRegistry(List.of(throwing, z)); + + assertDoesNotThrow(() -> registry.bootstrap(Path.of("/tmp/project"))); + assertEquals(1, zCalled.get(), + "ResolutionException from one resolver does not stop the pass"); + } + + @Test + void bootstrapEmptyRegistryIsNoOp() { + ResolverRegistry registry = new ResolverRegistry(List.of()); + assertDoesNotThrow(() -> registry.bootstrap(Path.of("/tmp/project"))); + } + + // ---------- Test stubs ---------- + + /** Resolves one language. Optional bootstrap callback for sequencing tests.
*/ + private static class AStubResolver implements SymbolResolver { + private final String language; + private final Runnable onBootstrap; + AStubResolver(String language) { this(language, () -> {}); } + AStubResolver(String language, Runnable onBootstrap) { + this.language = language; + this.onBootstrap = onBootstrap; + } + @Override public Set<String> getSupportedLanguages() { return Set.of(language); } + @Override public void bootstrap(Path projectRoot) { onBootstrap.run(); } + @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { + return new Resolved() { + @Override public boolean isAvailable() { return true; } + @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; } + }; + } + } + + private static final class MStubResolver extends AStubResolver { + MStubResolver(String language) { super(language); } + MStubResolver(String language, Runnable onBootstrap) { super(language, onBootstrap); } + } + + private static final class ZStubResolver extends AStubResolver { + ZStubResolver(String language) { super(language); } + ZStubResolver(String language, Runnable onBootstrap) { super(language, onBootstrap); } + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolverContractTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolverContractTest.java new file mode 100644 index 00000000..24a8393f --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/SymbolResolverContractTest.java @@ -0,0 +1,126 @@ +package io.github.randomcodespace.iq.intelligence.resolver; + +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import java.nio.file.Path; +import java.util.Set; +import java.util.concurrent.atomic.AtomicBoolean; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Contract coverage for {@link SymbolResolver}.
Verifies a stub implementation + * honours the SPI invariants: + *
<ul> + * <li>{@link SymbolResolver#getSupportedLanguages()} returns a non-empty set</li> + * <li>{@link SymbolResolver#bootstrap(Path)} runs before any + * {@link SymbolResolver#resolve(DiscoveredFile, Object)} call</li> + * <li>{@link SymbolResolver#resolve(DiscoveredFile, Object)} never returns + * {@code null} — uses {@link EmptyResolved#INSTANCE} for the + * not-supported / wrong-type cases</li> + * <li>{@link SymbolResolver#shutdown()} default is a no-op</li> + * </ul>
+ */ +class SymbolResolverContractTest { + + @Test + void supportedLanguagesIsNonEmpty() { + SymbolResolver r = new StubResolver(Set.of("java")); + assertFalse(r.getSupportedLanguages().isEmpty()); + assertEquals(Set.of("java"), r.getSupportedLanguages()); + } + + @Test + void resolveReturnsEmptyForUnknownLanguage() throws ResolutionException { + SymbolResolver r = new StubResolver(Set.of("java")); + r.bootstrap(Path.of("/tmp/project")); + + DiscoveredFile pyFile = new DiscoveredFile(Path.of("foo.py"), "python", 100); + Resolved result = r.resolve(pyFile, "some-ast"); + + assertSame(EmptyResolved.INSTANCE, result, + "unknown-language file returns EmptyResolved, never null"); + } + + @Test + void resolveReturnsAvailableResolvedForSupportedLanguage() throws ResolutionException { + StubResolver r = new StubResolver(Set.of("java")); + r.bootstrap(Path.of("/tmp/project")); + + DiscoveredFile javaFile = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved result = r.resolve(javaFile, "fake-cu"); + + assertNotSame(EmptyResolved.INSTANCE, result); + assertTrue(result.isAvailable()); + assertEquals(Confidence.RESOLVED, result.sourceConfidence()); + } + + @Test + void resolveNeverReturnsNull() throws ResolutionException { + // Even with a null AST, the contract forbids returning null — + // the resolver must downgrade to EmptyResolved. + StubResolver r = new StubResolver(Set.of("java")); + r.bootstrap(Path.of("/tmp/project")); + + DiscoveredFile javaFile = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved result = r.resolve(javaFile, null); // null AST + + assertNotNull(result, "resolve() must never return null"); + assertSame(EmptyResolved.INSTANCE, result, + "null AST falls back to EmptyResolved"); + } + + @Test + void shutdownDefaultIsNoOp() { + // The interface provides a default {} shutdown — verify it runs without + // throwing on a stub that doesn't override. 
+ SymbolResolver r = new SymbolResolver() { + @Override public Set<String> getSupportedLanguages() { return Set.of("java"); } + @Override public void bootstrap(Path projectRoot) { } + @Override public Resolved resolve(DiscoveredFile file, Object parsedAst) { + return EmptyResolved.INSTANCE; + } + // shutdown not overridden — uses interface default + }; + assertDoesNotThrow(r::shutdown); + } + + @Test + void bootstrapOnlyCalledOnce_resolverState() throws ResolutionException { + // A well-formed resolver sets up its state during bootstrap(); this is + // verified via the stub's flag, which flips from false to true. + StubResolver r = new StubResolver(Set.of("java")); + assertFalse(r.bootstrapped.get()); + r.bootstrap(Path.of("/tmp/project")); + assertTrue(r.bootstrapped.get()); + } + + /** Test-only resolver: returns a mock available Resolved for matching languages. */ + private static final class StubResolver implements SymbolResolver { + private final Set<String> languages; + final AtomicBoolean bootstrapped = new AtomicBoolean(false); + + StubResolver(Set<String> languages) { + this.languages = languages; + } + + @Override public Set<String> getSupportedLanguages() { return languages; } + + @Override + public void bootstrap(Path projectRoot) { + bootstrapped.set(true); + } + + @Override + public Resolved resolve(DiscoveredFile file, Object parsedAst) { + if (!languages.contains(file.language())) return EmptyResolved.INSTANCE; + if (parsedAst == null) return EmptyResolved.INSTANCE; + return new Resolved() { + @Override public boolean isAvailable() { return true; } + @Override public Confidence sourceConfidence() { return Confidence.RESOLVED; } + }; + } + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolvedTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolvedTest.java new file mode 100644 index 00000000..7d5560f0 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaResolvedTest.java @@
-0,0 +1,73 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.StaticJavaParser; +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.symbolsolver.JavaSymbolSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.CombinedTypeSolver; +import com.github.javaparser.symbolsolver.resolution.typesolvers.ReflectionTypeSolver; +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import io.github.randomcodespace.iq.model.Confidence; +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Coverage for {@link JavaResolved}: the language-specific {@link Resolved} + * subtype detectors downcast to. Verifies the three contract obligations — + * isAvailable() == true, sourceConfidence() == RESOLVED, and the cu/solver + * accessors expose what was passed in. + */ +class JavaResolvedTest { + + @Test + void isAvailableIsTrue() { + JavaResolved r = newResolved(); + assertTrue(r.isAvailable(), + "JavaResolved must report available — it carries actual resolution"); + } + + @Test + void sourceConfidenceIsResolved() { + JavaResolved r = newResolved(); + assertEquals(Confidence.RESOLVED, r.sourceConfidence(), + "JavaResolved is the RESOLVED tier — symbol-solver-backed"); + } + + @Test + void cuAccessorReturnsTheParsedCompilationUnit() { + CompilationUnit cu = StaticJavaParser.parse("class Foo {}"); + JavaSymbolSolver solver = new JavaSymbolSolver(new CombinedTypeSolver(new ReflectionTypeSolver())); + JavaResolved r = new JavaResolved(cu, solver); + assertSame(cu, r.cu()); + } + + @Test + void solverAccessorReturnsTheConfiguredSolver() { + CompilationUnit cu = StaticJavaParser.parse("class Foo {}"); + JavaSymbolSolver solver = new JavaSymbolSolver(new CombinedTypeSolver(new ReflectionTypeSolver())); + JavaResolved r = new JavaResolved(cu, solver); + 
assertSame(solver, r.solver()); + } + + @Test + void implementsResolved() { + // The interface contract — verified by isAssignableFrom rather than + // an instanceof check (which the compiler already enforces). + assertTrue(Resolved.class.isAssignableFrom(JavaResolved.class)); + } + + @Test + void distinctFromEmptyResolvedSentinel() { + // A real JavaResolved must be != EmptyResolved.INSTANCE so detectors + // checking via `==` can short-circuit correctly. + JavaResolved r = newResolved(); + assertNotSame(EmptyResolved.INSTANCE, r); + } + + private static JavaResolved newResolved() { + CompilationUnit cu = StaticJavaParser.parse("class Foo {}"); + JavaSymbolSolver solver = new JavaSymbolSolver(new CombinedTypeSolver(new ReflectionTypeSolver())); + return new JavaResolved(cu, solver); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java new file mode 100644 index 00000000..43d50199 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSourceRootDiscoveryTest.java @@ -0,0 +1,241 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.condition.DisabledOnOs; +import org.junit.jupiter.api.condition.OS; +import org.junit.jupiter.api.io.TempDir; + +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.List; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Aggressive coverage for {@link JavaSourceRootDiscovery} on synthetic dir + * layouts. Verifies all 6 plan-mandated scenarios + defensive cases. 
+ */ +class JavaSourceRootDiscoveryTest { + + private final JavaSourceRootDiscovery discovery = new JavaSourceRootDiscovery(); + + // ---------- Maven layouts ---------- + + @Test + void mavenSingleModuleReturnsMainAndTestJava(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("src/test/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(2, roots.size()); + assertEquals(tmp.resolve("src/main/java"), roots.get(0)); + assertEquals(tmp.resolve("src/test/java"), roots.get(1)); + } + + @Test + void mavenSingleModuleMainOnlyReturnsMainOnly(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src/main/java")), roots); + } + + @Test + void mavenMultiModuleAggregatesAllSubmodules(@TempDir Path tmp) throws Exception { + Files.writeString(tmp.resolve("pom.xml"), ""); + Files.createDirectories(tmp.resolve("service-a/src/main/java")); + Files.createDirectories(tmp.resolve("service-a/src/test/java")); + Files.createDirectories(tmp.resolve("service-b/src/main/java")); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(3, roots.size()); + // Sorted alphabetically: service-a/src/main/java, service-a/src/test/java, service-b/src/main/java + assertEquals(tmp.resolve("service-a/src/main/java"), roots.get(0)); + assertEquals(tmp.resolve("service-a/src/test/java"), roots.get(1)); + assertEquals(tmp.resolve("service-b/src/main/java"), roots.get(2)); + } + + // ---------- Gradle layouts ---------- + + @Test + void gradleLayoutDetectedSameAsMaven(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("src/test/java")); + // Gradle Kotlin DSL marker +
Files.writeString(tmp.resolve("build.gradle.kts"), "plugins {}"); + + List<Path> roots = discovery.discover(tmp); + + // The discovery doesn't actually inspect build files — it walks for src/(main|test)/java. + // Documents that Maven and Gradle are indistinguishable to this discovery. + assertEquals(2, roots.size()); + assertEquals(tmp.resolve("src/main/java"), roots.get(0)); + assertEquals(tmp.resolve("src/test/java"), roots.get(1)); + } + + // ---------- Plain layout ---------- + + @Test + void plainSrcWithJavaFileFallsBackToSrcAsRoot(@TempDir Path tmp) throws Exception { + // No Maven/Gradle markers, no src/main/java — but src/ has a .java file. + // Fallback: treat src/ as the root. + Files.createDirectories(tmp.resolve("src")); + Files.writeString(tmp.resolve("src/Foo.java"), "class Foo {}"); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src")), roots); + } + + @Test + void plainSrcWithoutJavaFilesReturnsEmpty(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src")); + Files.writeString(tmp.resolve("src/README.md"), "# nothing to see here"); + + List<Path> roots = discovery.discover(tmp); + + assertTrue(roots.isEmpty(), + "src/ exists but has no .java files — discovery returns nothing"); + } + + // ---------- Empty / missing ---------- + + @Test + void emptyDirectoryReturnsEmpty(@TempDir Path tmp) { + List<Path> roots = discovery.discover(tmp); + assertTrue(roots.isEmpty()); + } + + @Test + void nonExistentPathReturnsEmpty(@TempDir Path tmp) { + List<Path> roots = discovery.discover(tmp.resolve("does-not-exist")); + assertTrue(roots.isEmpty(), + "missing project root yields empty list, no exception"); + } + + @Test + void nullPathReturnsEmpty() { + List<Path> roots = discovery.discover(null); + assertTrue(roots.isEmpty(), + "null project root yields empty list, no NPE"); + } + + @Test + void filePathInsteadOfDirReturnsEmpty(@TempDir Path tmp) throws Exception { + Path file = Files.writeString(tmp.resolve("not-a-dir.txt"),
"hello"); + List<Path> roots = discovery.discover(file); + assertTrue(roots.isEmpty(), + "a file (not a directory) yields empty list, no exception"); + } + + // ---------- Skip directories ---------- + + @Test + void targetDirIsSkipped(@TempDir Path tmp) throws Exception { + // Maven build output — nested src/main/java inside target/ should be ignored. + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("target/foo/src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src/main/java")), roots, + "target/ is skipped — its phantom src/main/java is not a real source root"); + } + + @Test + void buildAndNodeModulesSkipped(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("build/classes/main/src/main/java")); + Files.createDirectories(tmp.resolve("node_modules/some-pkg/src/main/java")); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src/main/java")), roots, + "build/ and node_modules/ are skipped — their phantom src trees are not roots"); + } + + @Test + void dotGitIsSkipped(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve(".git/objects")); + Files.createDirectories(tmp.resolve(".gradle/caches")); + Files.createDirectories(tmp.resolve(".idea/workspace")); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(List.of(tmp.resolve("src/main/java")), roots); + } + + // ---------- Determinism + safety ---------- + + @Test + void resultsAreSortedAlphabetically(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("zzz/src/main/java")); + Files.createDirectories(tmp.resolve("aaa/src/main/java")); + Files.createDirectories(tmp.resolve("mmm/src/main/java")); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(3,
roots.size()); + assertEquals(tmp.resolve("aaa/src/main/java"), roots.get(0)); + assertEquals(tmp.resolve("mmm/src/main/java"), roots.get(1)); + assertEquals(tmp.resolve("zzz/src/main/java"), roots.get(2)); + } + + @Test + void discoveryIsIdempotent(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.createDirectories(tmp.resolve("src/test/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + List<Path> first = discovery.discover(tmp); + List<Path> second = discovery.discover(tmp); + + assertEquals(first, second, + "two calls over the same tree return identical results — determinism"); + } + + @Test + @DisabledOnOs(OS.WINDOWS) // symlink semantics differ on Windows + void symlinkLoopTerminatesWithoutException(@TempDir Path tmp) throws Exception { + // Create a real source root and a symlink loop pointing back at the project root. + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + Files.createSymbolicLink(tmp.resolve("loop-link"), tmp); + + // Files.walkFileTree does not follow symlinks unless FOLLOW_LINKS is passed → no cycle. + List<Path> roots = assertDoesNotThrow(() -> discovery.discover(tmp)); + assertEquals(List.of(tmp.resolve("src/main/java")), roots, + "symlink loop does not cause infinite recursion or duplicate detection"); + } + + @Test + void srcMainJavaWithDeepNestingStillFound(@TempDir Path tmp) throws Exception { + // Deeply nested module — verifies walkFileTree doesn't hit a depth limit. + Path deep = tmp.resolve("a/b/c/d/e/service/src/main/java"); + Files.createDirectories(deep); + + List<Path> roots = discovery.discover(tmp); + + assertEquals(List.of(deep), roots); + } + + @Test + void srcMainKotlinIsNotMistakenForJava(@TempDir Path tmp) throws Exception { + // The check is for the literal "java" leaf — Kotlin sources at + // src/main/kotlin must NOT be reported as a Java source root.
+ Files.createDirectories(tmp.resolve("src/main/kotlin")); + Files.createDirectories(tmp.resolve("src/test/kotlin")); + + List<Path> roots = discovery.discover(tmp); + + assertTrue(roots.isEmpty(), + "src/main/kotlin is not a Java source root"); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java new file mode 100644 index 00000000..49cb8475 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/intelligence/resolver/java/JavaSymbolResolverTest.java @@ -0,0 +1,276 @@ +package io.github.randomcodespace.iq.intelligence.resolver.java; + +import com.github.javaparser.JavaParser; +import com.github.javaparser.ParseResult; +import com.github.javaparser.ParserConfiguration; +import com.github.javaparser.ast.CompilationUnit; +import com.github.javaparser.resolution.types.ResolvedType; +import com.github.javaparser.symbolsolver.resolution.typesolvers.CombinedTypeSolver; +import io.github.randomcodespace.iq.analyzer.DiscoveredFile; +import io.github.randomcodespace.iq.intelligence.resolver.EmptyResolved; +import io.github.randomcodespace.iq.intelligence.resolver.ResolutionException; +import io.github.randomcodespace.iq.intelligence.resolver.Resolved; +import org.junit.jupiter.api.BeforeEach; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.io.TempDir; + +import java.nio.file.Files; +import java.nio.file.Path; +import java.util.Optional; +import java.util.Set; + +import static org.junit.jupiter.api.Assertions.*; + +/** + * Layer 1 unit tests for {@link JavaSymbolResolver}. + * <p>
Covers all the contract obligations of the SPI plus a smoke test that + * the solver actually resolves a basic type after bootstrap. Deeper resolution + * scenarios (cross-file type lookups, generics, inner classes) are exercised + * by the integration / E2E tests once detectors migrate. + */ +class JavaSymbolResolverTest { + + private JavaSymbolResolver resolver; + + @BeforeEach + void setUp() { + resolver = new JavaSymbolResolver(new JavaSourceRootDiscovery()); + } + + // ---------- Language declaration ---------- + + @Test + void supportsJavaOnly() { + assertEquals(Set.of("java"), resolver.getSupportedLanguages()); + } + + // ---------- Bootstrap ---------- + + @Test + void bootstrapEmptyProjectStillBuildsReflectionSolver(@TempDir Path tmp) throws ResolutionException { + // No source roots — combined solver still has ReflectionTypeSolver. + resolver.bootstrap(tmp); + CombinedTypeSolver cts = resolver.combinedTypeSolver(); + assertNotNull(cts, "combinedTypeSolver is non-null after bootstrap"); + // ReflectionTypeSolver alone — but solver can still resolve java.lang.String. + assertSolverResolvesString(resolver); + } + + @Test + void bootstrapWithSourceRootsAddsJavaParserTypeSolvers(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("src/main/java/Foo.java"), "public class Foo {}"); + Files.writeString(tmp.resolve("pom.xml"), ""); + + resolver.bootstrap(tmp); + + assertNotNull(resolver.combinedTypeSolver()); + // After bootstrap with source root, solver resolves Foo from that root. 
+ assertSolverResolvesType(resolver, "public class Bar { Foo f; }", + "Foo", "Foo"); + } + + @Test + void bootstrapTwiceIsIdempotent(@TempDir Path tmp) throws Exception { + Files.createDirectories(tmp.resolve("src/main/java")); + Files.writeString(tmp.resolve("pom.xml"), ""); + + resolver.bootstrap(tmp); + CombinedTypeSolver firstCts = resolver.combinedTypeSolver(); + resolver.bootstrap(tmp); + CombinedTypeSolver secondCts = resolver.combinedTypeSolver(); + + assertNotNull(firstCts); + assertNotNull(secondCts); + // Two bootstraps on the same project should produce equivalent state + // (different instances but same wiring). + assertNotSame(firstCts, secondCts, "bootstrap creates a fresh CombinedTypeSolver each call"); + } + + @Test + void combinedTypeSolverIsNullBeforeBootstrap() { + assertNull(resolver.combinedTypeSolver()); + } + + // ---------- resolve() — empty / fallback paths ---------- + + @Test + void resolveBeforeBootstrapReturnsEmpty() { + DiscoveredFile f = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved r = resolver.resolve(f, parse("class Foo {}")); + assertSame(EmptyResolved.INSTANCE, r, + "no bootstrap → no solver → EmptyResolved (graceful fallback)"); + } + + @Test + void resolveNullFileReturnsEmpty() throws ResolutionException { + resolver.bootstrap(Path.of(System.getProperty("java.io.tmpdir"))); + Resolved r = resolver.resolve(null, parse("class Foo {}")); + assertSame(EmptyResolved.INSTANCE, r); + } + + @Test + void resolveNonJavaFileReturnsEmpty(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile py = new DiscoveredFile(Path.of("foo.py"), "python", 100); + Resolved r = resolver.resolve(py, parse("class Foo {}")); + assertSame(EmptyResolved.INSTANCE, r, + "non-Java file → EmptyResolved even with valid CompilationUnit"); + } + + @Test + void resolveNullAstReturnsEmpty(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new 
DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved r = resolver.resolve(java, null); + assertSame(EmptyResolved.INSTANCE, r); + } + + @Test + void resolveStringAstReturnsEmpty(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + Resolved r = resolver.resolve(java, "not a CompilationUnit"); + assertSame(EmptyResolved.INSTANCE, r, + "wrong AST type → EmptyResolved instead of ClassCastException"); + } + + @Test + void resolveLanguageCheckIsCaseInsensitive(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + // "Java" instead of "java" — must still match. + DiscoveredFile mixed = new DiscoveredFile(Path.of("Foo.java"), "Java", 100); + Resolved r = resolver.resolve(mixed, parse("class Foo {}")); + assertNotSame(EmptyResolved.INSTANCE, r); + assertInstanceOf(JavaResolved.class, r); + } + + // ---------- resolve() — happy path ---------- + + @Test + void resolveValidCompilationUnitReturnsJavaResolved(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + CompilationUnit cu = parse("class Foo {}"); + Resolved r = resolver.resolve(java, cu); + + assertNotSame(EmptyResolved.INSTANCE, r); + assertInstanceOf(JavaResolved.class, r); + assertTrue(r.isAvailable()); + } + + @Test + void javaResolvedCarriesTheCompilationUnit(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + CompilationUnit cu = parse("class Foo {}"); + + JavaResolved r = (JavaResolved) resolver.resolve(java, cu); + + assertSame(cu, r.cu()); + } + + @Test + void javaResolvedCarriesTheSolver(@TempDir Path tmp) throws ResolutionException { + resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + CompilationUnit cu = parse("class Foo 
{}"); + + JavaResolved r = (JavaResolved) resolver.resolve(java, cu); + + assertNotNull(r.solver(), + "the resolver builds a real JavaSymbolSolver and threads it through"); + } + + // ---------- Solver smoke tests ---------- + + @Test + void solverResolvesJavaLangStringViaReflection(@TempDir Path tmp) throws ResolutionException { + // Smoke test: ReflectionTypeSolver alone (empty project) lets us resolve + // java.lang.String. Confirms the wiring is correct end-to-end. + resolver.bootstrap(tmp); + assertSolverResolvesString(resolver); + } + + @Test + void solverResolvesProjectClassFromSourceRoot(@TempDir Path tmp) throws Exception { + // bootstrap with a single source root + a single file; resolve a use of + // that class from a separate parsed file. + Files.createDirectories(tmp.resolve("src/main/java/com/example")); + Files.writeString(tmp.resolve("src/main/java/com/example/Foo.java"), + "package com.example; public class Foo { public String bar() { return \"\"; } }"); + Files.writeString(tmp.resolve("pom.xml"), ""); + + resolver.bootstrap(tmp); + + assertSolverResolvesType(resolver, + "package com.example; class Bar { Foo f; }", + "Foo", "com.example.Foo"); + } + + @Test + void resolveProducesDistinctJavaResolvedPerCall(@TempDir Path tmp) throws ResolutionException { + // Two resolve() calls don't cache — each gets a fresh JavaResolved + // record instance carrying the caller's CompilationUnit reference. 
+ resolver.bootstrap(tmp); + DiscoveredFile java = new DiscoveredFile(Path.of("Foo.java"), "java", 100); + CompilationUnit cu1 = parse("class Foo {}"); + CompilationUnit cu2 = parse("class Foo {}"); + + Resolved r1 = resolver.resolve(java, cu1); + Resolved r2 = resolver.resolve(java, cu2); + + assertNotSame(r1, r2, + "no caching — each resolve() returns a fresh JavaResolved"); + assertSame(cu1, ((JavaResolved) r1).cu(), + "cu1 reference is preserved through to JavaResolved.cu()"); + assertSame(cu2, ((JavaResolved) r2).cu(), + "cu2 reference is preserved through to JavaResolved.cu()"); + assertNotSame(((JavaResolved) r1).cu(), ((JavaResolved) r2).cu(), + "the two JavaResolved instances carry distinct CompilationUnit objects (identity, not value)"); + } + + // ---------- Helpers ---------- + + private static CompilationUnit parse(String source) { + return new JavaParser().parse(source).getResult().orElseThrow(); + } + + /** Smoke test: solver resolves java.lang.String via ReflectionTypeSolver. */ + private static void assertSolverResolvesString(JavaSymbolResolver resolver) { + ResolvedType t = resolveTypeOf(resolver, "class Z { String s; }", "String"); + assertNotNull(t); + assertTrue(t.describe().contains("String"), + "solver describes the type — got " + t.describe()); + } + + /** Resolve a field's declared type by name via the resolver's solver. */ + private static void assertSolverResolvesType(JavaSymbolResolver resolver, + String source, + String fieldTypeName, + String expectedFqnFragment) { + ResolvedType t = resolveTypeOf(resolver, source, fieldTypeName); + assertNotNull(t); + assertTrue(t.describe().contains(expectedFqnFragment), + "expected '" + expectedFqnFragment + "' in resolved type, got '" + t.describe() + "'"); + } + + /** Parse the source with the resolver's solver attached and look up the named field's type. 
*/ + private static ResolvedType resolveTypeOf(JavaSymbolResolver resolver, String source, String fieldType) { + ParserConfiguration cfg = new ParserConfiguration().setSymbolResolver(resolver.symbolSolver()); + ParseResult parsed = new JavaParser(cfg).parse(source); + CompilationUnit cu = parsed.getResult().orElseThrow(); + + // Find first field with the matching declared-type name. + Optional fieldTypeNode = cu.findAll( + com.github.javaparser.ast.body.FieldDeclaration.class).stream() + .flatMap(f -> f.getVariables().stream()) + .map(v -> v.getType()) + .filter(t -> t.asString().equals(fieldType)) + .findFirst(); + assertTrue(fieldTypeNode.isPresent(), + "test source has no field of type '" + fieldType + "'"); + return fieldTypeNode.get().resolve(); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/model/CodeEdgeConfidenceTest.java b/src/test/java/io/github/randomcodespace/iq/model/CodeEdgeConfidenceTest.java new file mode 100644 index 00000000..938b9db6 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/model/CodeEdgeConfidenceTest.java @@ -0,0 +1,60 @@ +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNull; + +class CodeEdgeConfidenceTest { + + private CodeEdge newEdge() { + CodeNode target = new CodeNode("node:Bar.java:class:Bar", NodeKind.CLASS, "Bar"); + return new CodeEdge("edge:Foo->Bar:depends_on", EdgeKind.DEPENDS_ON, + "node:Foo.java:class:Foo", target); + } + + @Test + void confidenceDefaultsToLexicalOnFreshEdge() { + assertEquals(Confidence.LEXICAL, newEdge().getConfidence(), + "fresh edge defaults to LEXICAL — least committal"); + } + + @Test + void confidenceCanBeSetAndRead() { + CodeEdge e = newEdge(); + e.setConfidence(Confidence.RESOLVED); + assertEquals(Confidence.RESOLVED, e.getConfidence()); + } + + @Test + void confidenceSetterNormalizesNullToLexical() { + CodeEdge e = 
newEdge(); + e.setConfidence(Confidence.RESOLVED); + e.setConfidence(null); + assertEquals(Confidence.LEXICAL, e.getConfidence(), + "null setter falls back to LEXICAL — never null"); + } + + @Test + void sourceIsNullUntilSet() { + assertNull(newEdge().getSource(), + "source defaults to null on the bare constructor; " + + "detector base classes stamp it via setSource() during emission"); + } + + @Test + void sourceCanBeSetAndRead() { + CodeEdge e = newEdge(); + e.setSource("SpringServiceDetector"); + assertEquals("SpringServiceDetector", e.getSource()); + } + + @Test + void confidenceAndSourceAreIndependent() { + CodeEdge e = newEdge(); + e.setConfidence(Confidence.SYNTACTIC); + e.setSource("JpaEntityDetector"); + assertEquals(Confidence.SYNTACTIC, e.getConfidence()); + assertEquals("JpaEntityDetector", e.getSource()); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java b/src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java new file mode 100644 index 00000000..fff67ee8 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/model/CodeNodeConfidenceTest.java @@ -0,0 +1,56 @@ +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertNull; + +class CodeNodeConfidenceTest { + + @Test + void confidenceDefaultsToLexicalOnFreshNode() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + assertEquals(Confidence.LEXICAL, n.getConfidence(), + "fresh node defaults to LEXICAL — least committal"); + } + + @Test + void confidenceCanBeSetAndRead() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + n.setConfidence(Confidence.RESOLVED); + assertEquals(Confidence.RESOLVED, n.getConfidence()); + } + + @Test + void confidenceSetterNormalizesNullToLexical() { + CodeNode n = new 
CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + n.setConfidence(Confidence.RESOLVED); + n.setConfidence(null); + assertEquals(Confidence.LEXICAL, n.getConfidence(), + "null setter falls back to LEXICAL — never null"); + } + + @Test + void sourceIsNullUntilSet() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + assertNull(n.getSource(), + "source defaults to null on the bare constructor; " + + "detector base classes stamp it via setSource() during emission"); + } + + @Test + void sourceCanBeSetAndRead() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + n.setSource("SpringServiceDetector"); + assertEquals("SpringServiceDetector", n.getSource()); + } + + @Test + void confidenceAndSourceAreIndependent() { + CodeNode n = new CodeNode("node:Foo.java:class:Foo", NodeKind.CLASS, "Foo"); + n.setConfidence(Confidence.SYNTACTIC); + n.setSource("JpaEntityDetector"); + assertEquals(Confidence.SYNTACTIC, n.getConfidence()); + assertEquals("JpaEntityDetector", n.getSource()); + } +} diff --git a/src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java b/src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java new file mode 100644 index 00000000..896a13a3 --- /dev/null +++ b/src/test/java/io/github/randomcodespace/iq/model/ConfidenceTest.java @@ -0,0 +1,40 @@ +package io.github.randomcodespace.iq.model; + +import org.junit.jupiter.api.Test; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.assertTrue; + +class ConfidenceTest { + + @Test + void scoreMappingIsStable() { + assertEquals(0.6, Confidence.LEXICAL.score(), 1e-9); + assertEquals(0.8, Confidence.SYNTACTIC.score(), 1e-9); + assertEquals(0.95, Confidence.RESOLVED.score(), 1e-9); + } + + @Test + void naturalOrderingMatchesScore() { + assertTrue(Confidence.LEXICAL.compareTo(Confidence.SYNTACTIC) < 0); + 
assertTrue(Confidence.SYNTACTIC.compareTo(Confidence.RESOLVED) < 0); + } + + @Test + void fromStringNullIsRejected() { + assertThrows(NullPointerException.class, () -> Confidence.fromString(null)); + } + + @Test + void fromStringIsCaseInsensitive() { + assertEquals(Confidence.RESOLVED, Confidence.fromString("resolved")); + assertEquals(Confidence.RESOLVED, Confidence.fromString("RESOLVED")); + assertEquals(Confidence.LEXICAL, Confidence.fromString("LeXiCaL")); + } + + @Test + void fromStringRejectsUnknown() { + assertThrows(IllegalArgumentException.class, () -> Confidence.fromString("perfect")); + } +}
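The three model test classes in this diff pin a small, dependency-free contract. `Confidence` has stable scores (0.6, 0.8, 0.95), and its natural order follows score. `fromString` is case-insensitive and rejects null and unknown names. The `confidence` and `source` fields default to `LEXICAL` and null, and the confidence setter normalizes null back to `LEXICAL`. The production `Confidence`, `CodeNode`, and `CodeEdge` sources are not part of this excerpt. What follows is therefore only a minimal sketch of types that would satisfy those assertions. `Provenance` is a hypothetical stand-in for the two fields as they appear on `CodeNode` and `CodeEdge`, not a class from the codebase.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical sketch: the real io.github.randomcodespace.iq.model.Confidence is not shown in this diff.
enum Confidence {
    // Declared in ascending order so the enum's natural (declaration) order matches score().
    LEXICAL(0.6),
    SYNTACTIC(0.8),
    RESOLVED(0.95);

    private final double score;

    Confidence(double score) {
        this.score = score;
    }

    /** Stable numeric score; ConfidenceTest pins these three values. */
    double score() {
        return score;
    }

    /** Case-insensitive parse: null throws NullPointerException, an unknown name throws IllegalArgumentException. */
    static Confidence fromString(String value) {
        Objects.requireNonNull(value, "confidence value");
        // valueOf throws IllegalArgumentException for names such as "PERFECT".
        return Confidence.valueOf(value.toUpperCase(Locale.ROOT));
    }
}

// Hypothetical stand-in for the confidence/source fields on CodeNode and CodeEdge.
class Provenance {
    private Confidence confidence = Confidence.LEXICAL; // least committal default
    private String source;                               // null until a detector or the orchestrator stamps it

    Confidence getConfidence() {
        return confidence;
    }

    void setConfidence(Confidence confidence) {
        // Normalize null back to LEXICAL so getConfidence() never returns null.
        this.confidence = (confidence == null) ? Confidence.LEXICAL : confidence;
    }

    String getSource() {
        return source;
    }

    void setSource(String source) {
        this.source = source;
    }
}
```

The `naturalOrderingMatchesScore` test passes only because of the order in which the constants are declared. `Enum.compareTo` is final and compares declaration order, so reordering the constants would break that test even if the scores themselves did not change.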