
Commit e7ebc49

aksOps and claude committed
Add flow generator design spec
5 views (overview, ci, deploy, runtime, auth), FlowEngine core library, Mermaid/JSON/HTML renderers, self-contained interactive HTML UI, GitLab CI detector, enhanced Dockerfile detector. Output consistency: all consumers (CLI, API, MCP, UI) get identical data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 15bd098 commit e7ebc49

1 file changed

Lines changed: 336 additions & 0 deletions

@@ -0,0 +1,336 @@
# Flow Generator — Design Spec

**Date:** 2026-03-28
**Status:** Approved

## Problem

The code-intelligence graph has 2M+ nodes and 3M+ edges, but there's no way to see the "big picture": how code gets built, deployed, and run. A developer joining a project needs a single command that shows: here's the CI pipeline, here's the deployment topology, here's how services talk to each other, here's the auth layer.

## Solution

A `FlowEngine` core library that collapses the full graph into clean, small diagrams (5-30 nodes per view) with drill-down support. Output formats: Mermaid (text), JSON (API-ready), and a self-contained interactive HTML file with click-to-drill navigation.

## Non-Goals

- No running server required for the UI (static HTML only)
- No single comprehensive diagram (too complex, nobody benefits)
- No real-time dynamic filtering in the UI (pre-computed views only)
- No new NodeKind or EdgeKind values (use existing types with properties)

## Output Consistency Requirement

**All consumers (CLI, HTTP API, MCP tool, HTML UI) MUST receive identical data from the same FlowEngine methods.** The rendering format changes; the data never does.

- `FlowEngine.generate(view)` returns a `FlowDiagram`, which is the single source of truth
- `FlowDiagram` is a plain dataclass, serializable to any format
- Renderers are pure functions: `FlowDiagram → str` (Mermaid, JSON, HTML)
- The JSON renderer output IS the API response schema: same fields, same structure, same values
- MCP tool returns the same JSON, CLI prints the same Mermaid, HTML embeds the same data
- No consumer-specific logic in the engine or views; all differentiation happens at the render layer only

```
FlowEngine.generate("ci") → FlowDiagram (identical object)
    │
    ├─ render(diagram, "mermaid") → str (CLI prints this)
    ├─ render(diagram, "json")    → str (API returns this, MCP returns this)
    └─ render_interactive()       → str (HTML embeds all views as JSON, renders as Mermaid client-side)
```

If a Mermaid diagram shows 12 endpoints in the runtime view, the JSON must show 12, the API must return 12, the MCP tool must return 12. Zero divergence.
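
The "zero divergence" rule can be checked mechanically. The following is a minimal sketch, assuming simplified stand-ins (`Node`, `Diagram`) for the real models and two toy renderers; none of these names come from the spec. It renders one diagram object two ways and confirms both outputs contain the same 12 nodes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:  # simplified stand-in for FlowNode (assumption)
    id: str
    label: str


@dataclass
class Diagram:  # simplified stand-in for FlowDiagram (assumption)
    view: str
    nodes: list[Node] = field(default_factory=list)


def render_json(diagram: Diagram) -> str:
    """Pure function: serialize the diagram fields unchanged."""
    return json.dumps(asdict(diagram), sort_keys=True)


def render_mermaid(diagram: Diagram) -> str:
    """Pure function: emit one Mermaid node line per diagram node."""
    lines = ["graph LR"]
    lines += [f'    {n.id}["{n.label}"]' for n in diagram.nodes]
    return "\n".join(lines)


# One diagram object, two renderings, same node count.
diagram = Diagram("runtime", [Node(f"ep{i}", f"Endpoint {i}") for i in range(12)])
json_count = len(json.loads(render_json(diagram))["nodes"])
mermaid_count = sum(1 for line in render_mermaid(diagram).splitlines() if "[" in line)
assert json_count == mermaid_count == 12
```

Because both renderers read the same object and hold no state of their own, a count mismatch could only come from a renderer bug, which is the property the requirement is after.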

## Architecture

```
┌─────────┐  ┌──────────┐  ┌──────────┐  ┌─────────────────┐
│   CLI   │  │ HTTP API │  │ MCP Tool │  │  Web UI (HTML)  │
│         │  │ (future) │  │ (future) │  │   Static file   │
└────┬────┘  └────┬─────┘  └────┬─────┘  └────────┬────────┘
     │            │             │                 │
     │  generate(view):         │     render_interactive():
     │  render(diagram, fmt)    │     embeds all views as JSON
     │            │             │     renders via Mermaid.js
     ▼            ▼             ▼                 ▲
┌─────────────────────────────────────────────────┤
│  FlowEngine (core library)                      │
│                                                 │
│  generate(view) → FlowDiagram                   │
│  generate_all() → dict[str, FlowDiagram]        │
│  render(diagram, format) → str                  │
│  render_interactive() → str (self-contained) ───┘
│                                                 │
└───────┬──────────────────────┬──────────────────┘
        │                      │
   ┌────▼────────┐       ┌─────▼──────┐
   │ Views       │       │ Renderers  │
   │ (5 views,   │       │ (Mermaid,  │
   │ filter +    │       │ JSON,      │
   │ rollup)     │       │ HTML)      │
   └─────────────┘       └────────────┘
```

**All 4 consumers call the same FlowEngine methods:**
- **CLI**: `engine.generate(view)` → `engine.render(diagram, "mermaid")` → prints to stdout
- **HTTP API** (future): `engine.generate(view)` → `engine.render(diagram, "json")` → returns JSON response
- **MCP Tool** (future): `engine.generate(view)` → `engine.render(diagram, "json")` → returns to agent
- **Web UI**: `engine.render_interactive()` → generates all views, bakes into static HTML with Mermaid.js client-side rendering. The HTML file is a build artifact: no server needed, open it directly in a browser.

FlowEngine is a standalone class: no CLI, no HTTP, no transport dependency.

## File Organization

```
src/code_intelligence/flow/
    __init__.py
    engine.py             # FlowEngine class
    views.py              # 5 view implementations
    models.py             # FlowDiagram, FlowNode, FlowEdge, FlowSubgraph
    renderer.py           # Mermaid + JSON + HTML rendering
    templates/
        interactive.html  # Self-contained drill-down UI template (~300 lines)
```

---

## 1. FlowDiagram Model (`flow/models.py`)

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlowNode:
    id: str
    label: str
    kind: str  # display category: "trigger", "job", "service", "endpoint", "database", "guard", etc.
    properties: dict  # count, stage, auth_type, image, etc.
    style: str = "default"  # "default", "success", "warning", "danger" (for visual emphasis)


@dataclass
class FlowSubgraph:
    id: str
    label: str
    nodes: list[FlowNode]
    drill_down_view: str | None  # "ci", "deploy", "runtime", "auth" (clickable in HTML)


@dataclass
class FlowEdge:
    source: str
    target: str
    label: str | None = None
    style: str = "solid"  # "solid", "dotted", "thick"


@dataclass
class FlowDiagram:
    title: str
    view: str  # "overview", "ci", "deploy", "runtime", "auth"
    direction: str = "LR"  # Mermaid direction: LR, TD, etc.
    subgraphs: list[FlowSubgraph] = field(default_factory=list)
    loose_nodes: list[FlowNode] = field(default_factory=list)  # nodes not in subgraphs
    edges: list[FlowEdge] = field(default_factory=list)
    stats: dict = field(default_factory=dict)  # total_nodes, total_edges, etc.
```
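
As a usage illustration, the sketch below builds a small diagram and round-trips it through JSON with `dataclasses.asdict`. It uses a trimmed copy of the models so the block runs on its own; giving `properties` a default and dropping `subgraphs` and `stats` are simplifications made here, not part of the spec.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class FlowNode:  # trimmed copy; `properties` default is a simplification
    id: str
    label: str
    kind: str
    properties: dict = field(default_factory=dict)
    style: str = "default"


@dataclass
class FlowEdge:
    source: str
    target: str
    label: str | None = None
    style: str = "solid"


@dataclass
class FlowDiagram:  # trimmed copy without subgraphs and stats
    title: str
    view: str
    direction: str = "LR"
    loose_nodes: list[FlowNode] = field(default_factory=list)
    edges: list[FlowEdge] = field(default_factory=list)


diagram = FlowDiagram(
    title="CI Pipeline",
    view="ci",
    loose_nodes=[FlowNode("build", "Build", "job"), FlowNode("test", "Test", "job")],
    edges=[FlowEdge("build", "test", label="needs")],
)

# asdict() recurses into nested dataclasses, so the result is plain dicts and lists.
payload = json.loads(json.dumps(asdict(diagram)))
```

Since `asdict` walks nested dataclasses and lists, no per-class serializer is needed as long as `properties` and `stats` only hold JSON-compatible values.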

---

## 2. FlowEngine (`flow/engine.py`)

```python
from __future__ import annotations

from .models import FlowDiagram

# GraphStore is the project's existing graph store type; its import path
# is not specified in this spec.


class FlowEngine:
    def __init__(self, store: GraphStore) -> None:
        self._store = store

    def generate(self, view: str = "overview") -> FlowDiagram:
        """Generate a single flow view diagram."""

    def generate_all(self) -> dict[str, FlowDiagram]:
        """Generate all 5 views. Used for HTML interactive output."""

    def render(self, diagram: FlowDiagram, format: str = "mermaid") -> str:
        """Render a diagram to string: mermaid or json (HTML goes through render_interactive)."""

    def render_interactive(self) -> str:
        """Generate all views and bake into self-contained HTML."""
```
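
One possible way to wire `generate` and `generate_all` is a name-to-builder registry. This is only a sketch: `VIEW_BUILDERS`, the placeholder lambdas, and the trimmed `FlowDiagram` are hypothetical, and the real view functions would inspect the store instead of ignoring it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FlowDiagram:  # trimmed stand-in for the real model
    title: str
    view: str


# Hypothetical registry: view name -> builder taking the graph store.
VIEW_BUILDERS: dict[str, Callable[[object], FlowDiagram]] = {
    "overview": lambda store: FlowDiagram("Overview", "overview"),
    "ci": lambda store: FlowDiagram("CI Pipeline", "ci"),
    "deploy": lambda store: FlowDiagram("Deployment", "deploy"),
    "runtime": lambda store: FlowDiagram("Runtime", "runtime"),
    "auth": lambda store: FlowDiagram("Auth", "auth"),
}


class FlowEngine:
    def __init__(self, store: object) -> None:
        self._store = store

    def generate(self, view: str = "overview") -> FlowDiagram:
        """Look up the builder for `view` and run it against the store."""
        try:
            builder = VIEW_BUILDERS[view]
        except KeyError:
            raise ValueError(
                f"unknown view {view!r}; expected one of {sorted(VIEW_BUILDERS)}"
            ) from None
        return builder(self._store)

    def generate_all(self) -> dict[str, FlowDiagram]:
        """Reuse generate() so single-view and all-view output cannot drift apart."""
        return {name: self.generate(name) for name in VIEW_BUILDERS}


engine = FlowEngine(store=None)  # a real GraphStore would go here
all_views = engine.generate_all()
```

Routing `generate_all` through `generate` keeps a single code path per view, which fits the output consistency requirement.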

---

## 3. Views (`flow/views.py`)

Each view function takes a `GraphStore` and returns a `FlowDiagram`. All views:
- Filter the graph to relevant node/edge kinds
- Collapse/group nodes into a small number of display nodes
- Count collapsed nodes (e.g., "API Endpoints x42")
- Produce max ~30 FlowNodes
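
The filter, collapse, and count steps above can be sketched with a `Counter`. The raw node shape (dicts with `id` and `kind`), the `ENTITY` kind name, and the `collapse_by_kind` helper are assumptions for illustration; the real views would read from the `GraphStore`.

```python
from collections import Counter


def collapse_by_kind(raw_nodes: list[dict], labels: dict[str, str]) -> list[dict]:
    """Filter to the kinds in `labels`, then emit one counted display node per kind."""
    counts = Counter(n["kind"] for n in raw_nodes if n["kind"] in labels)
    return [
        {
            "id": f"group_{kind.lower()}",
            "label": f"{labels[kind]} x{counts[kind]}",
            "kind": kind,
            "properties": {"count": counts[kind]},
        }
        for kind in sorted(counts)  # sorted for deterministic output
    ]


raw = (
    [{"id": f"ep{i}", "kind": "ENDPOINT"} for i in range(42)]
    + [{"id": f"ent{i}", "kind": "ENTITY"} for i in range(8)]
    + [{"id": "cfg", "kind": "CONFIG_KEY"}]  # filtered out: not in labels
)
groups = collapse_by_kind(raw, {"ENDPOINT": "API Endpoints", "ENTITY": "Database Entities"})
```

Here 51 raw nodes collapse into two display nodes, which is how a view can stay under the ~30 node budget regardless of graph size.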

### 3a. Overview View (default)

Produces up to 4 subgraphs with 5-15 total nodes (the CI/CD subgraph is omitted when no CI is detected):

**CI/CD subgraph:**
- Scan for nodes from GitHub Actions / GitLab CI detectors (MODULE nodes with workflow/pipeline properties, METHOD nodes that are CI jobs)
- Collapse into: Trigger → Build → Test → Deploy (or whatever stages exist)
- If no CI detected: omit subgraph

**Infrastructure subgraph:**
- Scan for INFRA_RESOURCE nodes (K8s, Docker Compose, Terraform, Bicep)
- Group by type: "K8s Deployments x3", "Services x5", "ConfigMaps x2"
- Show CONNECTS_TO and DEPENDS_ON edges between groups

**Application subgraph:**
- Count endpoints, entities, services (classes with METHOD children)
- Collapse into: "API Endpoints x42" → "Services x15" → "Database Entities x8"
- Show CALLS/QUERIES edges between groups

**Security subgraph:**
- Count GUARD and MIDDLEWARE nodes
- Show which endpoint groups they protect
- If no guards: show "No auth detected" warning node

### 3b. CI View (`--view ci`)

- Find all GitHub Actions workflows and GitLab CI pipelines
- Show every job as a node with stage/runner properties
- Show DEPENDS_ON edges (job dependencies, `needs:`)
- Show CONTAINS edges (workflow → jobs)
- Group by stage if available
- Show trigger events as entry nodes

### 3c. Deploy View (`--view deploy`)

- Find all INFRA_RESOURCE nodes (K8s, Docker, Terraform, Bicep)
- Show Dockerfile build stages (FROM → build → runtime)
- Show K8s topology (Ingress → Service → Deployment → ConfigMap/Secret)
- Show Docker Compose services with depends_on/links
- Show Helm chart structure if detected
- Group by namespace or compose project

### 3d. Runtime View (`--view runtime`)

- Use ArchitectView rollup to get module-level graph
- Show modules as nodes with endpoint/entity/method counts
- Show DEPENDS_ON, CALLS, PRODUCES/CONSUMES edges between modules
- Highlight database connections and messaging
- Group by layer (frontend, backend, infra) using the layer classifier property

### 3e. Auth View (`--view auth`)

- Find all GUARD and MIDDLEWARE nodes
- Find all ENDPOINT nodes
- Show PROTECTS edges from guards to endpoints
- Highlight unprotected endpoints (no incoming PROTECTS edge) with "danger" style
- Group guards by auth_type (spring_security, django, jwt, etc.)
- Show auth coverage stats: "42 of 50 endpoints protected"
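
The unprotected-endpoint check and the coverage string reduce to a set difference. A minimal sketch, assuming endpoints arrive as unique ID strings and PROTECTS edges as `(guard_id, endpoint_id)` tuples; `auth_coverage` is a hypothetical helper name.

```python
def auth_coverage(
    endpoint_ids: list[str], protects_edges: list[tuple[str, str]]
) -> tuple[list[str], str]:
    """Return the unprotected endpoint IDs (sorted) and a coverage summary."""
    # An endpoint is protected if at least one PROTECTS edge targets it.
    protected = {target for _guard, target in protects_edges}
    unprotected = sorted(ep for ep in endpoint_ids if ep not in protected)
    covered = len(endpoint_ids) - len(unprotected)
    return unprotected, f"{covered} of {len(endpoint_ids)} endpoints protected"


endpoints = [f"ep{i:02d}" for i in range(50)]
edges = [("jwt_guard", ep) for ep in endpoints[:42]]  # first 42 are guarded
unprotected, summary = auth_coverage(endpoints, edges)
```

The returned `unprotected` list is what the view would mark with the "danger" style, and `summary` matches the stats format shown above.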

---

## 4. Renderers (`flow/renderer.py`)

### Mermaid Renderer

Converts `FlowDiagram` → Mermaid flowchart syntax:
- Subgraphs → `subgraph` blocks
- Node styles based on `kind` and `style`
- Edge styles: solid (`-->`), dotted (`-.->`), thick (`==>`)
- Click handlers for HTML mode: `click nodeId callback`

### JSON Renderer

Converts `FlowDiagram` → JSON via dataclass serialization. Used by future HTTP API and MCP tools.

### HTML Renderer

Reads `templates/interactive.html`, injects all 5 view Mermaid strings as a JSON object, outputs a single self-contained HTML file.

The template:
- Loads Mermaid.js from CDN (with fallback comment for offline)
- Renders overview on page load
- Click on a subgraph → swaps to that view's Mermaid diagram
- Breadcrumb nav: `Overview > CI Pipeline`
- Stats bar showing total nodes, edges, languages, detectors
- Dark/light theme toggle
- ~300 lines total, no build step, no npm
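
The injection step could look like the sketch below. The inline `TEMPLATE`, the `__FLOW_DATA__` placeholder, and the `flow-data` element ID are all hypothetical; the real template lives in `templates/interactive.html`. The one detail worth copying is escaping `</` so that diagram text cannot close the script tag early.

```python
import json

# Hypothetical, heavily shortened template; the real one is ~300 lines.
TEMPLATE = """<!doctype html>
<html>
<body>
<div id="diagram"></div>
<script id="flow-data" type="application/json">__FLOW_DATA__</script>
</body>
</html>
"""


def render_html(views: dict[str, str]) -> str:
    """Embed the per-view Mermaid strings as one JSON object in the page."""
    payload = json.dumps(views, sort_keys=True)
    # "\/" is a valid JSON escape for "/", so the data still parses,
    # but a literal "</script>" inside a label can no longer end the tag.
    payload = payload.replace("</", "<\\/")
    return TEMPLATE.replace("__FLOW_DATA__", payload)


views = {
    "overview": "graph LR\n    a --> b",
    "ci": 'graph LR\n    x["</script>"]',
}
html = render_html(views)
```

Client-side code would then read the element's text content, parse it as JSON, and hand the selected view's string to Mermaid.js.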
250+
251+
---
252+
253+
## 5. New Detector: GitLab CI (`detectors/config/gitlab_ci.py`)
254+
255+
- Language: `yaml`
256+
- Trigger: file path ends with `.gitlab-ci.yml` or contains `/ci/` with `.yml`
257+
- Detects:
258+
- `stages:` list → ordered pipeline stages as CONFIG_KEY nodes
259+
- Each job (top-level key that's not a keyword) → METHOD node with properties: stage, image, script summary, environment, artifacts
260+
- `needs:` → DEPENDS_ON edges between jobs
261+
- `extends:` → EXTENDS edges to template jobs
262+
- `include:` → IMPORTS edges to included CI files
263+
- `rules:`/`only:`/`except:` → trigger conditions in properties
264+
- `image:` → Docker image as property
265+
- Tool usage in `script:` → extract docker/helm/kubectl/terraform/maven/npm calls as properties
266+
- Produces: MODULE for the pipeline, METHOD for jobs, CONFIG_KEY for stages/triggers
267+
- ID format: `gitlab:{filepath}:pipeline`, `gitlab:{filepath}:job:{name}`, `gitlab:{filepath}:stage:{name}`
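
To illustrate the job and `needs:` rules, here is a sketch that works on an already-parsed pipeline (a plain dict, as a YAML loader would produce). The `extract_jobs` helper, the output dict shape, and the exact contents of `RESERVED_KEYS` are assumptions; the real detector would use the project's own node and edge types.

```python
# Top-level keys treated as keywords rather than jobs (assumed list).
RESERVED_KEYS = {
    "stages", "include", "variables", "default", "workflow",
    "image", "services", "cache", "before_script", "after_script",
}


def extract_jobs(filepath: str, pipeline: dict) -> tuple[list[dict], list[tuple[str, str, str]]]:
    """Return (job nodes, DEPENDS_ON edges) using the ID format from the spec."""
    jobs: list[dict] = []
    edges: list[tuple[str, str, str]] = []
    for name in sorted(pipeline):  # sorted for deterministic output
        body = pipeline[name]
        if name in RESERVED_KEYS or not isinstance(body, dict):
            continue
        job_id = f"gitlab:{filepath}:job:{name}"
        jobs.append({
            "id": job_id,
            "kind": "METHOD",
            "properties": {"stage": body.get("stage"), "image": body.get("image")},
        })
        for need in body.get("needs", []):
            # needs entries may be plain job names or mappings with a "job" key
            need_name = need.get("job") if isinstance(need, dict) else need
            if need_name:
                edges.append((job_id, f"gitlab:{filepath}:job:{need_name}", "DEPENDS_ON"))
    return jobs, edges


pipeline = {
    "stages": ["build", "test"],
    "build": {"stage": "build", "image": "python:3.12", "script": ["make"]},
    "test": {"stage": "test", "needs": ["build"], "script": ["pytest"]},
}
jobs, edges = extract_jobs(".gitlab-ci.yml", pipeline)
```

Stages, `extends:`, `include:`, and script tool extraction would follow the same pattern with their own node and edge kinds.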

---

## 6. Enhanced Dockerfile Detector (modify existing)

Add to `detectors/iac/dockerfile.py`:
- Detect multi-stage builds: `FROM image AS stagename`
- Create INFRA_RESOURCE node per stage with `build_stage` property
- `COPY --from=stagename` → DEPENDS_ON edge between stages
- `ARG` → CONFIG_DEFINITION node
- Track stage order for flow visualization
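
The stage and `COPY --from` detection can be approximated with two regular expressions. This is a sketch under simplifying assumptions: `parse_stages` and the output shapes are hypothetical, line continuations are not handled, and `--from` values that name an external image rather than a stage are not distinguished.

```python
import re

FROM_RE = re.compile(r"^FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"^COPY\s+.*--from=(\S+)", re.IGNORECASE)


def parse_stages(dockerfile: str) -> tuple[list[dict], list[tuple[str, str, str]]]:
    """Return (stages in file order, DEPENDS_ON edges between stages)."""
    stages: list[dict] = []
    edges: list[tuple[str, str, str]] = []
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if m := FROM_RE.match(line):
            # Unnamed stages are addressable by index, e.g. COPY --from=0
            name = m.group(2) or str(len(stages))
            stages.append({"build_stage": name, "image": m.group(1), "order": len(stages)})
        elif (m := COPY_FROM_RE.match(line)) and stages:
            # The current (latest) stage depends on the stage it copies from.
            edges.append((stages[-1]["build_stage"], m.group(1), "DEPENDS_ON"))
    return stages, edges


dockerfile = """\
FROM node:20 AS build
RUN npm ci && npm run build
FROM nginx:alpine AS runtime
COPY --from=build /app/dist /usr/share/nginx/html
"""
stages, edges = parse_stages(dockerfile)
```

The `order` field records stage position, which is what the deploy view needs to lay stages out left to right.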

---

## 7. CLI Integration

Add to `cli.py`:

```python
from pathlib import Path

# `app` is the CLI application object already defined in cli.py.


@app.command()
def flow(
    path: Path = Path("."),
    view: str = "overview",  # overview, ci, deploy, runtime, auth
    format: str = "mermaid",  # mermaid, json, html
    backend: str = "networkx",
    output: Path | None = None,
    config: Path | None = None,
) -> None:
    """Generate architecture flow diagrams."""
```

Logic:
1. Load graph from backend (or analyze if no graph exists)
2. Create `FlowEngine(store)`
3. If format == "html": `engine.render_interactive()` → write to output file
4. Else: `engine.generate(view)` → `engine.render(diagram, format)` → print or write

---

## 8. Bundle Integration

Update `bundle` command to include `flow.html` in the zip:
```python
# In bundle command, after graph files:
flow_html = FlowEngine(result.graph).render_interactive()
zf.writestr("flow.html", flow_html)
```

---

## 9. Testing

- Unit test each view with fixture graphs containing CI, K8s, endpoint, guard nodes
- Test that overview produces 4 subgraphs with <= 15 nodes
- Test Mermaid output is valid syntax (contains `graph`, `subgraph`, `-->`)
- Test JSON output is valid JSON with expected keys
- Test HTML output contains all 5 view data blocks and Mermaid script tag
- Test GitLab CI detector with a realistic `.gitlab-ci.yml` fixture
- Determinism: generate twice, assert identical output
- Benchmark: flow generation on contoso-real-estate < 100ms

---

## 10. Determinism

- All views sort nodes/edges by ID before rendering
- Subgraph ordering is fixed (CI, Infrastructure, Application, Security for overview)
- No set iteration without sorting
- Same graph → same FlowDiagram → same Mermaid/HTML output, always
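
A determinism check along these lines might look like the sketch below. `canonical_json` and the dict-based node and edge shapes are assumptions; since `FlowEdge` has no ID field, edges are sorted here by `(source, target, label)` as a stand-in for "by ID".

```python
import json
import random


def canonical_json(nodes: list[dict], edges: list[dict]) -> str:
    """Sort nodes and edges before serializing so input order cannot leak out."""
    ordered = {
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        "edges": sorted(
            edges, key=lambda e: (e["source"], e["target"], e.get("label") or "")
        ),
    }
    return json.dumps(ordered, sort_keys=True)


nodes = [{"id": f"n{i:02d}", "label": f"Node {i}"} for i in range(20)]
edges = [{"source": f"n{i:02d}", "target": f"n{i + 1:02d}"} for i in range(19)]
baseline = canonical_json(nodes, edges)

rng = random.Random(0)  # seeded so the check itself is repeatable
for _ in range(5):
    shuffled_nodes, shuffled_edges = nodes[:], edges[:]
    rng.shuffle(shuffled_nodes)
    rng.shuffle(shuffled_edges)
    # Same content in a different order must render identically.
    assert canonical_json(shuffled_nodes, shuffled_edges) == baseline
```

Shuffling the input is a stricter check than generating twice from the same store, because it also catches output that merely inherits a stable but arbitrary input order.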
