# Flow Generator — Design Spec

**Date:** 2026-03-28
**Status:** Approved

## Problem

The code-intelligence graph has 2M+ nodes and 3M+ edges, but there's no way to see the "big picture": how code gets built, deployed, and run. A developer joining a project needs a single command that shows the CI pipeline, the deployment topology, how services talk to each other, and the auth layer.

## Solution

A `FlowEngine` core library that collapses the full graph into clean, small diagrams (5-30 nodes per view) with drill-down support. Output formats: Mermaid (text), JSON (API-ready), and a self-contained interactive HTML file with click-to-drill navigation.

## Non-Goals

- No running server required for the UI (static HTML only)
- No single comprehensive diagram (too complex, nobody benefits)
- No real-time dynamic filtering in the UI (pre-computed views only)
- No new NodeKind or EdgeKind values (use existing types with properties)

## Output Consistency Requirement

**All consumers (CLI, HTTP API, MCP tool, HTML UI) MUST receive identical data from the same FlowEngine methods.** The rendering format changes; the data never does.

- `FlowEngine.generate(view)` returns a `FlowDiagram`, the single source of truth
- `FlowDiagram` is a plain dataclass, serializable to any format
- Renderers are pure functions: Mermaid and JSON map one `FlowDiagram → str`; the HTML renderer maps all views (`dict[str, FlowDiagram] → str`)
- The JSON renderer output IS the API response schema: same fields, same structure, same values
- The MCP tool returns the same JSON, the CLI prints the same Mermaid, and the HTML embeds the same data
- No consumer-specific logic in the engine or views; all differentiation happens at the render layer only

```
FlowEngine.generate("ci")  →  FlowDiagram (identical object)
        │
        ├─ render(diagram, "mermaid")  →  str  (CLI prints this)
        ├─ render(diagram, "json")     →  str  (API returns this, MCP returns this)
        └─ render_interactive()        →  str  (HTML embeds all views as JSON, renders as Mermaid client-side)
```

If the runtime view's Mermaid diagram shows 12 endpoints, the JSON, the API response, and the MCP tool result must all show 12. Zero divergence.
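
To make the zero-divergence rule concrete, here is a minimal sketch with trimmed-down stand-ins for `FlowNode` and `FlowDiagram` and two hypothetical renderer functions, `render_mermaid` and `render_json` (names assumed for illustration, not the final API). Both read the same object and neither filters, so the endpoint count comes out identical from either format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FlowNode:
    id: str
    label: str


@dataclass
class FlowDiagram:
    title: str
    view: str
    loose_nodes: list[FlowNode] = field(default_factory=list)


def render_mermaid(diagram: FlowDiagram) -> str:
    # One Mermaid node declaration per FlowNode; the renderer never drops nodes.
    lines = ["graph LR"]
    lines += [f'    {n.id}["{n.label}"]' for n in diagram.loose_nodes]
    return "\n".join(lines)


def render_json(diagram: FlowDiagram) -> str:
    # Plain dataclass serialization: the JSON mirrors the object field for field.
    return json.dumps(asdict(diagram), sort_keys=True)


diagram = FlowDiagram(
    title="Runtime",
    view="runtime",
    loose_nodes=[FlowNode(f"ep{i}", f"Endpoint {i}") for i in range(12)],
)

mermaid_count = sum(1 for line in render_mermaid(diagram).splitlines() if '["' in line)
json_count = len(json.loads(render_json(diagram))["loose_nodes"])
assert mermaid_count == json_count == 12  # same object in, same count out
```

Because the renderers only format, any divergence between consumers could only originate in the engine, which is the one place this spec allows data decisions.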

## Architecture

```
┌─────────┐  ┌──────────┐  ┌──────────┐  ┌─────────────────┐
│   CLI   │  │ HTTP API │  │ MCP Tool │  │  Web UI (HTML)  │
│         │  │ (future) │  │ (future) │  │   Static file   │
└────┬────┘  └────┬─────┘  └────┬─────┘  └────────┬────────┘
     │            │             │                 │
     │    generate(view) +      │      render_interactive():
     │    render(diagram, fmt)  │      all views embedded as JSON,
     │            │             │      rendered via Mermaid.js
     ▼            ▼             ▼                 ▲
┌──────────────────────────────────────────────────────┐
│  FlowEngine (core library)                           │
│                                                      │
│  generate(view) → FlowDiagram                        │
│  generate_all() → dict[str, FlowDiagram]             │
│  render(diagram, format) → str                       │
│  render_interactive() → str (self-contained HTML)    │
│                                                      │
└───────┬──────────────────────┬───────────────────────┘
        │                      │
   ┌────▼────────┐       ┌─────▼──────┐
   │   Views     │       │ Renderers  │
   │ (5 views,   │       │ (Mermaid,  │
   │  filter +   │       │  JSON,     │
   │  rollup)    │       │  HTML)     │
   └─────────────┘       └────────────┘
```

**All 4 consumers call the same FlowEngine methods:**
- **CLI**: `engine.generate(view)` → `engine.render(diagram, "mermaid")` → prints to stdout
- **HTTP API** (future): `engine.generate(view)` → `engine.render(diagram, "json")` → returns JSON response
- **MCP Tool** (future): `engine.generate(view)` → `engine.render(diagram, "json")` → returns to agent
- **Web UI**: `engine.render_interactive()` → generates all views and bakes them into static HTML with client-side Mermaid.js rendering. The HTML file is a build artifact: no server needed, open it directly in a browser.

FlowEngine is a standalone class with no CLI, HTTP, or other transport dependency.

## File Organization

```
src/code_intelligence/flow/
  __init__.py
  engine.py           # FlowEngine class
  views.py            # 5 view implementations
  models.py           # FlowDiagram, FlowNode, FlowEdge, FlowSubgraph
  renderer.py         # Mermaid + JSON + HTML rendering
  templates/
    interactive.html  # Self-contained drill-down UI template (~300 lines)
```

---

## 1. FlowDiagram Model (`flow/models.py`)

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FlowNode:
    id: str
    label: str
    kind: str  # display category: "trigger", "job", "service", "endpoint", "database", "guard", etc.
    properties: dict[str, Any]  # count, stage, auth_type, image, etc.
    style: str = "default"  # "default", "success", "warning", "danger" (visual emphasis)


@dataclass
class FlowSubgraph:
    id: str
    label: str
    nodes: list[FlowNode]
    drill_down_view: str | None  # "ci", "deploy", "runtime", "auth"; clickable in HTML


@dataclass
class FlowEdge:
    source: str  # FlowNode.id
    target: str  # FlowNode.id
    label: str | None = None
    style: str = "solid"  # "solid", "dotted", "thick"


@dataclass
class FlowDiagram:
    title: str
    view: str  # "overview", "ci", "deploy", "runtime", "auth"
    direction: str = "LR"  # Mermaid direction: LR, TD, etc.
    subgraphs: list[FlowSubgraph] = field(default_factory=list)
    loose_nodes: list[FlowNode] = field(default_factory=list)  # nodes not in any subgraph
    edges: list[FlowEdge] = field(default_factory=list)
    stats: dict[str, Any] = field(default_factory=dict)  # total_nodes, total_edges, etc.
```

---

## 2. FlowEngine (`flow/engine.py`)

```python
from __future__ import annotations

from .models import FlowDiagram

# GraphStore is the project's existing graph store type (its import is not shown in this spec).


class FlowEngine:
    def __init__(self, store: GraphStore) -> None:
        self._store = store

    def generate(self, view: str = "overview") -> FlowDiagram:
        """Generate a single flow view diagram."""

    def generate_all(self) -> dict[str, FlowDiagram]:
        """Generate all 5 views. Used for HTML interactive output."""

    def render(self, diagram: FlowDiagram, format: str = "mermaid") -> str:
        """Render a diagram to a string: "mermaid" or "json" (HTML goes through render_interactive)."""

    def render_interactive(self) -> str:
        """Generate all views and bake them into self-contained HTML."""
```
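
One way to wire `generate` and `generate_all` is a registry mapping view names to view functions. In the sketch below, `VIEW_ORDER`, `VIEWS`, the `_stub` view functions, and the use of `object` in place of `GraphStore` are all hypothetical. Iterating a fixed tuple rather than a set keeps `generate_all` deterministic, and unknown view names fail loudly instead of silently falling back to the overview.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FlowDiagram:  # trimmed stand-in for flow/models.py
    title: str
    view: str


def _stub(view: str) -> Callable[[object], FlowDiagram]:
    # Placeholder for the real view functions in flow/views.py.
    return lambda store: FlowDiagram(title=view.title(), view=view)


# Fixed order: overview first, then the drill-down views.
VIEW_ORDER = ("overview", "ci", "deploy", "runtime", "auth")
VIEWS: dict[str, Callable[[object], FlowDiagram]] = {v: _stub(v) for v in VIEW_ORDER}


class FlowEngine:
    def __init__(self, store: object) -> None:
        self._store = store

    def generate(self, view: str = "overview") -> FlowDiagram:
        try:
            view_fn = VIEWS[view]
        except KeyError:
            raise ValueError(
                f"unknown view {view!r}; expected one of {', '.join(VIEW_ORDER)}"
            ) from None
        return view_fn(self._store)

    def generate_all(self) -> dict[str, FlowDiagram]:
        # dicts preserve insertion order, so callers see views in VIEW_ORDER.
        return {v: self.generate(v) for v in VIEW_ORDER}


engine = FlowEngine(store=None)
print(engine.generate("ci"))
```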

---

## 3. Views (`flow/views.py`)

Each view function takes a `GraphStore` and returns a `FlowDiagram`. All views:
- Filter the graph to the relevant node/edge kinds
- Collapse/group nodes into a small number of display nodes
- Count collapsed nodes (e.g., "API Endpoints x42")
- Produce at most ~30 FlowNodes
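
The shared filter → collapse → count steps can be sketched as a group-by over node kinds. This assumes graph nodes arrive as `(id, kind)` pairs and that a collapsed node is labelled `"<Label> x<count>"`; `DISPLAY_LABELS` and the choice to treat the ~30 cap as a hard error (rather than truncating) are assumptions for illustration.

```python
from collections import Counter

# Hypothetical mapping from graph node kinds to display labels for one view.
DISPLAY_LABELS = {
    "ENDPOINT": "API Endpoints",
    "ENTITY": "Database Entities",
    "GUARD": "Guards",
}
MAX_NODES = 30


def collapse(nodes: list[tuple[str, str]]) -> list[dict]:
    """Filter to relevant kinds, then collapse each kind into one counted display node."""
    counts = Counter(kind for _, kind in nodes if kind in DISPLAY_LABELS)
    display = [
        {
            "id": kind.lower(),
            "label": f"{DISPLAY_LABELS[kind]} x{count}",
            "properties": {"count": count},
        }
        for kind, count in sorted(counts.items())  # sorted for determinism
    ]
    # Assumption: exceeding the cap is a bug in the view, so fail instead of truncating.
    if len(display) > MAX_NODES:
        raise ValueError(f"view produced {len(display)} nodes; cap is {MAX_NODES}")
    return display


graph_nodes = [(f"ep{i}", "ENDPOINT") for i in range(42)] + [
    ("e1", "ENTITY"),
    ("m1", "METHOD"),  # not a display kind for this view: filtered out
]
print([n["label"] for n in collapse(graph_nodes)])
# ['API Endpoints x42', 'Database Entities x1']
```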

### 3a. Overview View (default)

Produces up to 4 subgraphs with 5-15 nodes in total:

**CI/CD subgraph:**
- Scan for nodes from the GitHub Actions / GitLab CI detectors (MODULE nodes with workflow/pipeline properties, METHOD nodes that are CI jobs)
- Collapse into Trigger → Build → Test → Deploy (or whichever stages exist)
- If no CI is detected, omit the subgraph

**Infrastructure subgraph:**
- Scan for INFRA_RESOURCE nodes (K8s, Docker Compose, Terraform, Bicep)
- Group by type: "K8s Deployments x3", "Services x5", "ConfigMaps x2"
- Show CONNECTS_TO and DEPENDS_ON edges between groups

**Application subgraph:**
- Count endpoints, entities, and services (classes with METHOD children)
- Collapse into "API Endpoints x42" → "Services x15" → "Database Entities x8"
- Show CALLS/QUERIES edges between groups

**Security subgraph:**
- Count GUARD and MIDDLEWARE nodes
- Show which endpoint groups they protect
- If there are no guards, show a "No auth detected" warning node
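
A sketch of how the overview could assemble its subgraphs under these rules, assuming the per-category counts were already computed (the `counts` keys, the `Subgraph` stand-in, and the single "CI Jobs" node in place of the stage collapse are simplifications for illustration). Order is fixed, CI/CD is dropped when absent, and Security falls back to a `"warning"` node. Always emitting Infrastructure and Application, even when empty, is an assumption the spec leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class Subgraph:
    id: str
    label: str
    nodes: list[tuple[str, str]] = field(default_factory=list)  # (label, style)


APP_GROUPS = [("API Endpoints", "endpoints"), ("Services", "services"), ("Database Entities", "entities")]


def overview_subgraphs(counts: dict[str, int]) -> list[Subgraph]:
    """Overview subgraphs in the fixed order CI/CD, Infrastructure, Application, Security."""
    result: list[Subgraph] = []

    # CI/CD is omitted entirely when no CI is detected.
    if counts.get("ci_jobs", 0) > 0:
        result.append(Subgraph("ci", "CI/CD", [(f"CI Jobs x{counts['ci_jobs']}", "default")]))

    # Assumption: Infrastructure and Application always appear but list only non-empty groups.
    infra = counts.get("infra_resources", 0)
    result.append(Subgraph("infra", "Infrastructure",
                           [(f"Infra Resources x{infra}", "default")] if infra else []))
    result.append(Subgraph("app", "Application", [
        (f"{label} x{counts[key]}", "default") for label, key in APP_GROUPS if counts.get(key, 0)
    ]))

    # Security always appears; with no guards it carries a warning node instead.
    guards = counts.get("guards", 0) + counts.get("middleware", 0)
    result.append(Subgraph("security", "Security",
                           [(f"Guards x{guards}", "default")] if guards else [("No auth detected", "warning")]))
    return result


for sg in overview_subgraphs({"ci_jobs": 4, "endpoints": 42, "services": 15, "entities": 8}):
    print(sg.label, [label for label, _ in sg.nodes])
```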

### 3b. CI View (`--view ci`)

- Find all GitHub Actions workflows and GitLab CI pipelines
- Show every job as a node with stage/runner properties
- Show DEPENDS_ON edges (job dependencies, `needs:`)
- Show CONTAINS edges (workflow → jobs)
- Group by stage if available
- Show trigger events as entry nodes
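
Grouping by stage and deriving `needs:` edges could look like the sketch below. It assumes jobs are already available as a name → properties dict (the `ci_layout` function and its return shape are hypothetical). Two GitLab behaviours are modelled: a job without `stage:` runs in `test`, and a `needs:` entry may be either a job name or a mapping with a `job` key. `graphlib` (Python 3.9+) rejects circular `needs:` before anything is rendered.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def _need_name(need: str | dict) -> str:
    # `needs:` entries are either job names or mappings like {"job": "build", "artifacts": True}.
    return need if isinstance(need, str) else need["job"]


def ci_layout(stages: list[str], jobs: dict[str, dict]) -> tuple[dict[str, list[str]], list[tuple[str, str]]]:
    """Group jobs by declared stage order and derive (job, needed_job) DEPENDS_ON pairs."""
    by_stage: dict[str, list[str]] = {stage: [] for stage in stages}
    for name in sorted(jobs):  # sorted for deterministic output
        # GitLab assigns jobs without an explicit stage to "test".
        by_stage.setdefault(jobs[name].get("stage", "test"), []).append(name)

    deps = {name: sorted(_need_name(n) for n in jobs[name].get("needs", [])) for name in sorted(jobs)}
    try:
        TopologicalSorter(deps).prepare()  # raises CycleError on circular needs
    except CycleError as exc:
        raise ValueError(f"circular needs: {exc.args[1]}") from exc

    edges = [(job, need) for job, needs in deps.items() for need in needs]
    return by_stage, edges


stages = ["build", "test", "deploy"]
jobs = {
    "compile": {"stage": "build"},
    "unit": {"stage": "test", "needs": ["compile"]},
    "lint": {},  # no stage: lands in "test"
    "ship": {"stage": "deploy", "needs": [{"job": "unit", "artifacts": True}]},
}
groups, edges = ci_layout(stages, jobs)
print(groups)  # {'build': ['compile'], 'test': ['lint', 'unit'], 'deploy': ['ship']}
print(edges)   # [('ship', 'unit'), ('unit', 'compile')]
```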

### 3c. Deploy View (`--view deploy`)

- Find all INFRA_RESOURCE nodes (K8s, Docker, Terraform, Bicep)
- Show Dockerfile build stages (FROM → build → runtime)
- Show K8s topology (Ingress → Service → Deployment → ConfigMap/Secret)
- Show Docker Compose services with depends_on/links
- Show Helm chart structure if detected
- Group by namespace or compose project
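
A sketch of the K8s part of this view: group resources by namespace and order each group along Ingress → Service → Deployment → ConfigMap/Secret. The `K8S_RANK` table and the resource dict shape are assumptions, and only the namespace case is shown (compose projects would group the same way on a different key). A resource with no namespace is placed in `default`, matching Kubernetes' own behaviour.

```python
from collections import defaultdict

# Assumed left-to-right rank along the request path; unknown kinds sort last.
K8S_RANK = {"Ingress": 0, "Service": 1, "Deployment": 2, "ConfigMap": 3, "Secret": 3}


def group_by_namespace(resources: list[dict]) -> dict[str, list[str]]:
    """Group INFRA_RESOURCE-like dicts by namespace, ordered along the K8s topology."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for res in resources:
        groups[res.get("namespace", "default")].append(res)
    return {
        ns: [
            f"{r['kind']}/{r['name']}"
            for r in sorted(items, key=lambda r: (K8S_RANK.get(r["kind"], 99), r["name"]))
        ]
        for ns, items in sorted(groups.items())  # namespaces in a stable order
    }


resources = [
    {"kind": "Secret", "name": "db-creds", "namespace": "shop"},
    {"kind": "Deployment", "name": "api", "namespace": "shop"},
    {"kind": "Ingress", "name": "web", "namespace": "shop"},
    {"kind": "Service", "name": "api", "namespace": "shop"},
    {"kind": "ConfigMap", "name": "api-config", "namespace": "shop"},
    {"kind": "Deployment", "name": "worker"},  # no namespace: "default"
]
print(group_by_namespace(resources))
```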

### 3d. Runtime View (`--view runtime`)

- Use ArchitectView rollup to get module-level graph
- Show modules as nodes with endpoint/entity/method counts
- Show DEPENDS_ON, CALLS, PRODUCES/CONSUMES edges between modules
- Highlight database connections and messaging
- Group by layer (frontend, backend, infra) using the layer classifier property
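
ArchitectView's rollup interface isn't specified here, so the following is a stand-alone sketch of the idea only: given a hypothetical node-id → module mapping, node-level `(source, target, kind)` edges collapse into counted module-level edges. Intra-module edges and edges touching unmapped nodes are dropped, which is an assumption about what belongs in a module graph.

```python
from collections import Counter


def rollup_edges(
    edges: list[tuple[str, str, str]], module_of: dict[str, str]
) -> list[tuple[str, str, str, int]]:
    """Collapse node-level (source, target, kind) edges into (module, module, kind, count)."""
    counts: Counter = Counter()
    for src, dst, kind in edges:
        m_src, m_dst = module_of.get(src), module_of.get(dst)
        if m_src is None or m_dst is None or m_src == m_dst:
            continue  # skip unmapped nodes and intra-module edges
        counts[(m_src, m_dst, kind)] += 1
    return [(s, t, k, n) for (s, t, k), n in sorted(counts.items())]


module_of = {"a1": "orders", "a2": "orders", "b1": "payments", "c1": "inventory"}
edges = [
    ("a1", "a2", "CALLS"),     # inside "orders": dropped
    ("a1", "b1", "CALLS"),
    ("a2", "b1", "CALLS"),
    ("a2", "c1", "PRODUCES"),
]
print(rollup_edges(edges, module_of))
# [('orders', 'inventory', 'PRODUCES', 1), ('orders', 'payments', 'CALLS', 2)]
```

The count can then become the edge label (for example "CALLS x2"), which keeps the runtime view small even when thousands of method-level calls cross the same module boundary.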

### 3e. Auth View (`--view auth`)

- Find all GUARD and MIDDLEWARE nodes
- Find all ENDPOINT nodes
- Show PROTECTS edges from guards to endpoints
- Highlight unprotected endpoints (no incoming PROTECTS edge) with "danger" style
- Group guards by auth_type (spring_security, django, jwt, etc.)
- Show auth coverage stats: "42 of 50 endpoints protected"
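
The coverage stat and the "danger" highlighting both fall out of one set lookup. This sketch assumes endpoint IDs are unique and that PROTECTS edges are available as `(guard_id, endpoint_id)` pairs; the `auth_coverage` helper is hypothetical. PROTECTS targets that aren't endpoints are ignored because only the endpoint list is iterated.

```python
def auth_coverage(endpoints: list[str], protects: list[tuple[str, str]]) -> tuple[list[str], str]:
    """Return the unprotected endpoints (to style as "danger") and a coverage summary."""
    protected = {target for _, target in protects}  # endpoints with an incoming PROTECTS edge
    unprotected = sorted(ep for ep in endpoints if ep not in protected)
    covered = len(endpoints) - len(unprotected)
    return unprotected, f"{covered} of {len(endpoints)} endpoints protected"


endpoints = ["GET /orders", "POST /orders", "GET /health"]
protects = [("jwt_filter", "GET /orders"), ("jwt_filter", "POST /orders")]
print(auth_coverage(endpoints, protects))
# (['GET /health'], '2 of 3 endpoints protected')
```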

---

## 4. Renderers (`flow/renderer.py`)

### Mermaid Renderer

Converts `FlowDiagram` → Mermaid flowchart syntax:
- Subgraphs → `subgraph` blocks
- Node styles based on `kind` and `style`
- Edge styles: solid (`-->`), dotted (`-.->`), thick (`==>`)
- Click handlers for HTML mode: `click nodeId callback`
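
A minimal sketch of the Mermaid renderer using small stand-ins for the model classes. It emits `graph <direction>`, one `subgraph id [title]` block per subgroup, edges with the three arrow styles and optional `|label|` text, and `click nodeId callback` lines. Two choices are assumptions: the callback name `drillDown`, and attaching click handlers to every node inside a drill-down subgraph (the page script would map the clicked node back to its view). Node `kind`/`style` class styling is omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Mermaid arrow for each FlowEdge.style value.
ARROWS = {"solid": "-->", "dotted": "-.->", "thick": "==>"}


@dataclass
class Node:
    id: str
    label: str


@dataclass
class Subgraph:
    id: str
    label: str
    nodes: list[Node] = field(default_factory=list)
    drill_down_view: str | None = None


@dataclass
class Edge:
    source: str
    target: str
    label: str | None = None
    style: str = "solid"


def render_mermaid(direction: str, subgraphs: list[Subgraph], edges: list[Edge],
                   callback: str = "drillDown") -> str:
    lines = [f"graph {direction}"]
    for sg in subgraphs:
        lines.append(f"    subgraph {sg.id} [{sg.label}]")
        lines += [f'        {n.id}["{n.label}"]' for n in sg.nodes]
        lines.append("    end")
    for e in edges:
        label = f"|{e.label}|" if e.label else ""
        lines.append(f"    {e.source} {ARROWS[e.style]}{label} {e.target}")
    # Assumption: every node inside a drill-down subgraph gets a click handler.
    for sg in subgraphs:
        if sg.drill_down_view:
            lines += [f"    click {n.id} {callback}" for n in sg.nodes]
    return "\n".join(lines)


print(render_mermaid(
    "LR",
    [Subgraph("ci", "CI/CD", [Node("build", "Build"), Node("test", "Test")], drill_down_view="ci")],
    [Edge("build", "test"), Edge("test", "deploy", label="main only", style="thick")],
))
```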

### JSON Renderer

Converts `FlowDiagram` → JSON via dataclass serialization. Used by future HTTP API and MCP tools.
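
Dataclass serialization maps directly onto the standard library: `dataclasses.asdict` recurses into nested dataclasses and lists, and `json.dumps(..., sort_keys=True)` makes the byte output stable, which the determinism rules need. The sketch uses trimmed stand-ins for the model classes; the exact `indent` and key ordering of the real API schema are assumptions.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class FlowEdge:
    source: str
    target: str
    label: str | None = None
    style: str = "solid"


@dataclass
class FlowDiagram:
    title: str
    view: str
    direction: str = "LR"
    edges: list[FlowEdge] = field(default_factory=list)
    stats: dict = field(default_factory=dict)


def render_json(diagram: FlowDiagram) -> str:
    # asdict() recurses into nested dataclasses and lists, so the JSON mirrors
    # the FlowDiagram field for field; sort_keys keeps the byte output stable.
    return json.dumps(asdict(diagram), indent=2, sort_keys=True)


diagram = FlowDiagram("CI Pipeline", "ci", edges=[FlowEdge("build", "test")], stats={"total_nodes": 2})
payload = json.loads(render_json(diagram))
print(payload["edges"])  # [{'label': None, 'source': 'build', 'style': 'solid', 'target': 'test'}]
```

Sorting keys departs from the dataclass field order, but it means two runs over the same graph produce byte-identical JSON, so API responses can be diffed or cached by content.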

### HTML Renderer

Reads `templates/interactive.html`, injects the Mermaid strings for all 5 views as a JSON object keyed by view name, and outputs a single self-contained HTML file.

The template:
- Loads Mermaid.js from a CDN (with a fallback comment for offline use), so the file is self-contained apart from that one script
- Renders the overview on page load
- Clicking a subgraph swaps in that view's Mermaid diagram
- Breadcrumb nav: `Overview > CI Pipeline`
- Stats bar showing total nodes, edges, languages, detectors
- Dark/light theme toggle
- ~300 lines total, no build step, no npm
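
The injection step can be a single placeholder substitution. In this sketch the `/*__FLOW_VIEWS__*/` token, the toy template, and the `render_interactive_html` helper are hypothetical. The one subtle point: a view label containing `</script>` would terminate the inline script early, so every `</` in the JSON payload is rewritten to `<\/`, which still decodes to `</` in both JSON and JavaScript strings.

```python
import json

# Hypothetical placeholder token inside templates/interactive.html.
PLACEHOLDER = "/*__FLOW_VIEWS__*/"

# Toy template; the real one also loads Mermaid.js and implements navigation.
TEMPLATE = """<!DOCTYPE html>
<html>
<body>
<div id="diagram"></div>
<script>
const VIEWS = /*__FLOW_VIEWS__*/;
</script>
</body>
</html>
"""


def render_interactive_html(template: str, views: dict[str, str]) -> str:
    """Embed each view's Mermaid source, keyed by view name, as a JS object literal."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no views placeholder")
    payload = json.dumps(views, sort_keys=True)
    # A literal "</" in the payload could close the <script> element early;
    # "<\/" means the same thing inside a JSON/JS string.
    payload = payload.replace("</", "<\\/")
    return template.replace(PLACEHOLDER, payload, 1)


html = render_interactive_html(TEMPLATE, {
    "overview": "graph LR\n    ci --> app",
    "ci": 'graph LR\n    x["</script> in a label"]',
})
print(html)
```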

---

## 5. New Detector: GitLab CI (`detectors/config/gitlab_ci.py`)

- Language: `yaml`
- Trigger: file path ends with `.gitlab-ci.yml`, or is a `.yml` file whose path contains `/ci/`
- Detects:
  - `stages:` list → ordered pipeline stages as CONFIG_KEY nodes
  - Each job (top-level key that is not a reserved keyword) → METHOD node with properties: stage, image, script summary, environment, artifacts
  - `needs:` → DEPENDS_ON edges between jobs
  - `extends:` → EXTENDS edges to template jobs
  - `include:` → IMPORTS edges to included CI files
  - `rules:`/`only:`/`except:` → trigger conditions in properties
  - `image:` → Docker image as property
  - Tool usage in `script:` → extract docker/helm/kubectl/terraform/maven/npm calls as properties
- Produces: MODULE for the pipeline, METHOD for jobs, CONFIG_KEY for stages/triggers
- ID format: `gitlab:{filepath}:pipeline`, `gitlab:{filepath}:job:{name}`, `gitlab:{filepath}:stage:{name}`
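
The job-extraction core might look like this, assuming the YAML has already been parsed into a dict by whatever loader the detector uses; `GLOBAL_KEYWORDS`, `is_gitlab_ci_file`, and `extract_jobs` are illustrative names, not existing code. The keyword set covers GitLab's global keywords plus the older globally-defined defaults. Keys starting with `.` are hidden jobs (GitLab never runs them), so they are kept as `extends:` targets but flagged as templates.

```python
from __future__ import annotations

# GitLab top-level keys that are not jobs (global keywords and legacy global defaults).
GLOBAL_KEYWORDS = {
    "default", "include", "stages", "variables", "workflow",
    "image", "services", "cache", "before_script", "after_script",
}


def is_gitlab_ci_file(path: str) -> bool:
    # Trigger rule from this spec: *.gitlab-ci.yml, or any .yml file under a /ci/ directory.
    return path.endswith(".gitlab-ci.yml") or ("/ci/" in path and path.endswith(".yml"))


def extract_jobs(filepath: str, doc: dict) -> tuple[list[dict], list[tuple[str, str, str]]]:
    """Turn a parsed CI mapping into METHOD job nodes and DEPENDS_ON/EXTENDS edges."""
    def job_id(name: str) -> str:
        return f"gitlab:{filepath}:job:{name}"

    nodes: list[dict] = []
    edges: list[tuple[str, str, str]] = []
    for name in sorted(doc):
        body = doc[name]
        if name in GLOBAL_KEYWORDS or not isinstance(body, dict):
            continue
        nodes.append({
            "id": job_id(name),
            "kind": "METHOD",
            "properties": {
                "stage": body.get("stage", "test"),  # GitLab's default stage
                "image": body.get("image"),
                "template": name.startswith("."),  # hidden job, only usable via extends
            },
        })
        for need in body.get("needs", []):
            target = need if isinstance(need, str) else need["job"]
            edges.append((job_id(name), job_id(target), "DEPENDS_ON"))
        extends = body.get("extends", [])
        for parent in ([extends] if isinstance(extends, str) else extends):
            edges.append((job_id(name), job_id(parent), "EXTENDS"))
    return nodes, edges


doc = {
    "stages": ["build", "test"],
    "variables": {"GO_VERSION": "1.22"},
    ".base": {"image": "golang:1.22"},
    "build": {"stage": "build", "extends": ".base"},
    "unit": {"stage": "test", "needs": ["build"]},
}
nodes, edges = extract_jobs(".gitlab-ci.yml", doc)
```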

---

## 6. Enhanced Dockerfile Detector (modify existing)

Add to `detectors/iac/dockerfile.py`:
- Detect multi-stage builds: `FROM image AS stagename`
- Create INFRA_RESOURCE node per stage with `build_stage` property
- `COPY --from=stagename` → DEPENDS_ON edge between stages
- `ARG` → CONFIG_DEFINITION node
- Track stage order for flow visualization
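
A line-based sketch of the stage detection, with hypothetical names (`FROM_RE`, `COPY_FROM_RE`, `parse_stages`). It records stages in file order, names unnamed stages by index (Docker accepts `--from=0`), and emits a dependency both for `COPY --from=<stage>` and for `FROM <earlier stage> AS <name>`. It does not handle backslash line continuations, `ARG` nodes, or telling an external image in `--from=` apart from a stage name.

```python
import re

FROM_RE = re.compile(r"^FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"^COPY\s+.*--from=(\S+)", re.IGNORECASE)


def parse_stages(dockerfile: str) -> tuple[list[dict], list[tuple[str, str]]]:
    """Return build stages in file order and (stage, depends_on_stage) pairs."""
    stages: list[dict] = []
    edges: list[tuple[str, str]] = []
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if m := FROM_RE.match(line):
            base, name = m.group(1), m.group(2) or str(len(stages))  # unnamed stage: its index
            if any(s["build_stage"] == base for s in stages):
                edges.append((name, base))  # FROM <earlier stage> AS <name>
            stages.append({"build_stage": name, "image": base, "order": len(stages)})
        elif (m := COPY_FROM_RE.match(line)) and stages:
            edges.append((stages[-1]["build_stage"], m.group(1)))  # COPY --from=<stage>
    return stages, edges


DOCKERFILE = """\
FROM golang:1.22 AS build
RUN go build -o /app ./cmd/server
FROM build AS test
RUN go test ./...
FROM gcr.io/distroless/base AS runtime
COPY --from=build /app /app
"""
stages, edges = parse_stages(DOCKERFILE)
print([s["build_stage"] for s in stages])  # ['build', 'test', 'runtime']
print(edges)  # [('test', 'build'), ('runtime', 'build')]
```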

---

## 7. CLI Integration

Add to `cli.py`:

```python
from pathlib import Path

# `app` is the Typer application already defined in cli.py.


@app.command()
def flow(
    path: Path = Path("."),
    view: str = "overview",  # overview, ci, deploy, runtime, auth
    format: str = "mermaid",  # mermaid, json, html
    backend: str = "networkx",
    output: Path | None = None,
    config: Path | None = None,
) -> None:
    """Generate architecture flow diagrams."""
```

Logic:
1. Load the graph from the backend (or run analysis if no graph exists)
2. Create `FlowEngine(store)`
3. If format == "html": `engine.render_interactive()` → write to the output file (`--view` has no effect, since every view is embedded)
4. Else: `engine.generate(view)` → `engine.render(diagram, format)` → print to stdout, or write to `output` if given
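
Steps 3 and 4 reduce to a small dispatch once the store is loaded. The sketch below uses a hypothetical `StubEngine` exposing the three `FlowEngine` methods the CLI calls, and a hypothetical `run_flow` helper; rejecting unknown formats with `ValueError` is an assumption about error handling.

```python
from __future__ import annotations

from pathlib import Path


class StubEngine:
    """Hypothetical stand-in for FlowEngine, returning tagged strings."""

    def generate(self, view: str) -> str:
        return f"diagram:{view}"

    def render(self, diagram: str, format: str) -> str:
        return f"{format}:{diagram}"

    def render_interactive(self) -> str:
        return "<html>all views</html>"


def run_flow(engine, view: str, fmt: str, output: Path | None) -> str:
    """HTML bundles every view; other formats render the single requested view."""
    if fmt == "html":
        text = engine.render_interactive()
    elif fmt in ("mermaid", "json"):
        text = engine.render(engine.generate(view), fmt)
    else:
        raise ValueError(f"unsupported format {fmt!r}")
    if output is not None:
        output.write_text(text, encoding="utf-8")
    else:
        print(text)
    return text


run_flow(StubEngine(), "ci", "mermaid", None)  # prints "mermaid:diagram:ci"
```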

---

## 8. Bundle Integration

Update `bundle` command to include `flow.html` in the zip:
```python
# In bundle command, after graph files:
flow_html = FlowEngine(result.graph).render_interactive()
zf.writestr("flow.html", flow_html)
```

---

## 9. Testing

- Unit test each view with fixture graphs containing CI, K8s, endpoint, and guard nodes
- Test that the overview produces 4 subgraphs with <= 15 nodes when the fixture covers all four categories
- Test that Mermaid output is well-formed (at minimum, contains `graph`, `subgraph`, `-->`)
- Test that JSON output is valid JSON with the expected keys
- Test that HTML output contains all 5 view data blocks and the Mermaid script tag
- Test the GitLab CI detector with a realistic `.gitlab-ci.yml` fixture
- Determinism: generate twice, assert identical output
- Benchmark: flow generation on contoso-real-estate < 100ms

---

## 10. Determinism

- All views sort nodes/edges by ID before rendering
- Subgraph ordering is fixed (CI, Infrastructure, Application, Security for overview)
- No set iteration without sorting
- Same graph → same FlowDiagram → same Mermaid/HTML output, always
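
The "no set iteration without sorting" rule matters because Python randomizes `str` hashing per process (see `PYTHONHASHSEED`), so iterating a set of string IDs can yield a different order on every run. A minimal sketch of the canonicalization step, with hypothetical `canonical` and `render` helpers: sets are allowed for intermediate results, but they are turned into sorted lists before anything is rendered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes edges hashable, so they can live in sets
class Edge:
    source: str
    target: str


def canonical(node_ids: set[str], edges: set[Edge]) -> tuple[list[str], list[Edge]]:
    """Turn set-based intermediate results into sorted lists before rendering."""
    return sorted(node_ids), sorted(edges, key=lambda e: (e.source, e.target))


def render(node_ids: set[str], edges: set[Edge]) -> str:
    nodes, ordered_edges = canonical(node_ids, edges)
    lines = ["graph LR", *(f"    {n}" for n in nodes)]
    lines += [f"    {e.source} --> {e.target}" for e in ordered_edges]
    return "\n".join(lines)


# Same content, different insertion order: the rendered text must be byte-identical.
a = render({"svc", "api", "db"}, {Edge("api", "svc"), Edge("svc", "db")})
b = render({"db", "svc", "api"}, {Edge("svc", "db"), Edge("api", "svc")})
assert a == b
```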