Problem
The kbc llm export command generates a data lineage graph (indices/graph.jsonl) that only contains table: and transform: node types. Non-transformation components — extractors, writers, and applications — are completely absent from the graph, even though they are the actual data sources and sinks of the pipeline.
Current behavior
- All nodes in the graph are either table:* or transform:*
- Missing from graph: extractors, writers, and applications have zero representation
- Edge types: only consumed_by and produces; no edges connect components to the tables they extract into or write from
- components/index.json correctly catalogs all components, but the lineage graph ignores everything except transformations
Impact on AI agents
The primary consumer of kbc llm export output is AI agents/LLMs. Without extractor/writer edges, an agent cannot:
- Trace data origin: "Where does this table come from?" → No answer from lineage (the table may be extracted from an external DB, but the graph doesn't show this)
- Trace data destination: "Where does this reporting table go?" → No answer (the table may feed a writer or a PowerBI refresh app, but the graph doesn't show this)
- Assess blast radius: "If I change this extractor config, what transformations are affected?" → Requires manually cross-referencing bucket names with component configs
Expected behavior
The lineage graph should include edges for all component types:
{"source":"extractor:component-id:config-id","target":"table:bucket/table-name","type":"produces"}
{"source":"table:bucket/table-name","target":"writer:component-id:config-id","type":"consumed_by"}
{"source":"table:bucket/table-name","target":"application:component-id:config-id","type":"consumed_by"}
This would make the lineage graph a true end-to-end representation of the data pipeline.
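To make the requested output concrete, here is a minimal sketch in Python of how such edges could be emitted. It is not the exporter's actual code: the function names (component_edges, to_jsonl) and the assumption that each config's input and output tables are already available as "bucket/table-name" strings are hypothetical and only illustrate the edge shapes listed above.

```python
import json

# Node prefixes requested in this issue. How the exporter would classify a
# component as extractor/writer/application is an assumption left out here.
NODE_PREFIXES = {"extractor", "writer", "application"}


def component_edges(component_type, component_id, config_id, inputs=(), outputs=()):
    """Build lineage edges for one component configuration.

    inputs/outputs are table names already normalized to "bucket/table-name"
    (hypothetical: the real exporter may normalize IDs differently).
    """
    if component_type not in NODE_PREFIXES:
        raise ValueError(f"unsupported component type: {component_type}")
    node = f"{component_type}:{component_id}:{config_id}"
    edges = []
    # Tables the component writes into Storage (e.g. an extractor's output).
    for table in outputs:
        edges.append({"source": node, "target": f"table:{table}", "type": "produces"})
    # Tables the component reads from Storage (e.g. a writer's input).
    for table in inputs:
        edges.append({"source": f"table:{table}", "target": node, "type": "consumed_by"})
    return edges


def to_jsonl(edges):
    """Serialize edges one JSON object per line, matching graph.jsonl style."""
    return "\n".join(json.dumps(e, separators=(",", ":")) for e in edges)


if __name__ == "__main__":
    print(to_jsonl(component_edges(
        "extractor", "component-id", "config-id", outputs=["bucket/table-name"])))
    print(to_jsonl(component_edges(
        "writer", "component-id", "config-id", inputs=["bucket/table-name"])))
```

Reusing the existing produces and consumed_by edge types, as the examples above do, would let an agent traverse extractors, transformations, and writers with the same logic it already uses for transform: nodes.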
Additional request: Implicit SQL table references
A secondary (but related) gap: the lineage graph is built solely from explicit input/output mappings declared in transformation configs. However, Snowflake transformations can reference tables by fully-qualified name directly in SQL code (e.g., SELECT * FROM "bucket"."table") without declaring them in the input mapping. These implicit dependencies are invisible to the current lineage graph.
Ideally, the export could optionally perform a lightweight static analysis of SQL code blocks to detect FROM/JOIN clauses referencing fully-qualified table names that are not present in the declared input mapping, and add these as a separate edge type (e.g., type: "implicit_ref").
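A rough sketch of what that lightweight analysis could look like, in Python. Everything here is an assumption for illustration: the function name implicit_refs, the "bucket/table" naming of declared inputs, and the regex itself, which only handles double-quoted two-part names directly after FROM or JOIN. It does not parse SQL, so references inside comments or string literals would be false positives, and three-part names would be matched on their first two parts only.

```python
import re

# Matches FROM/JOIN followed by a double-quoted two-part name: "bucket"."table".
# Deliberately simple; a real implementation might use a proper SQL parser.
FQ_TABLE = re.compile(r'\b(?:FROM|JOIN)\s+"([^"]+)"\."([^"]+)"', re.IGNORECASE)


def implicit_refs(sql, transform_node, declared_inputs):
    """Return implicit_ref edges for tables referenced in sql but not declared.

    transform_node is the existing transform: node ID; declared_inputs holds
    table names from the input mapping as "bucket/table" (hypothetical format).
    """
    declared = set(declared_inputs)
    seen = set()
    edges = []
    for bucket, table in FQ_TABLE.findall(sql):
        name = f"{bucket}/{table}"
        # Skip tables already covered by the input mapping, and duplicates.
        if name in declared or name in seen:
            continue
        seen.add(name)
        edges.append({
            "source": f"table:{name}",
            "target": transform_node,
            "type": "implicit_ref",
        })
    return edges


if __name__ == "__main__":
    sql = ('SELECT * FROM "in.c-sales"."orders" o '
           'JOIN "in.c-crm"."customers" c ON o.customer_id = c.id')
    for edge in implicit_refs(sql, "transform:example", ["in.c-sales/orders"]):
        print(edge)
```

Keeping these under a separate edge type, as suggested above, lets consumers decide whether to trust heuristic matches as much as declared mappings.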
Environment