Audience: Developers new to Osiris who need to understand the foundational architecture before building components.
Osiris is an LLM-first conversational ETL pipeline system built around self-describing components. This document explains the core concepts and how they fit together.
What: A component is a self-describing building block for data pipelines.
Purpose: Declares what operations are possible, what configuration is needed, and what capabilities are supported.
File: components/<name>/spec.yaml (declarative YAML)
Example: mysql.extractor component declares:
- "I can extract data from MySQL databases"
- "I need: host, database, user, password, query"
- "I support: discovery mode, batch operations"
Key Properties:
- Modes: What operations? (extract, write, transform, discover)
- Capabilities: What features? (discover: true, streaming: false)
- Config Schema: JSON Schema defining required/optional fields
- Secrets: Which fields contain sensitive data (passwords, keys)
Lifecycle: Loaded by Registry → Validated → Used by Compiler → Referenced at Runtime
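To make the key properties concrete, the following is a minimal sketch of how a parsed spec might look as a Python dict. The key names (`modes`, `capabilities`, `secrets`) and both helper functions are assumptions for illustration; only `configSchema` and the listed fields come from this document, and the real spec format may differ.

```python
# Hypothetical in-memory form of components/mysql.extractor/spec.yaml.
# Key names are illustrative assumptions, not the actual Osiris spec format.
SPEC = {
    "name": "mysql.extractor",
    "modes": ["extract", "discover"],
    "capabilities": {"discover": True, "streaming": False},
    "configSchema": {
        "type": "object",
        "required": ["host", "database", "user", "password", "query"],
        "properties": {
            "host": {"type": "string"},
            "database": {"type": "string"},
            "user": {"type": "string"},
            "password": {"type": "string"},
            "query": {"type": "string"},
        },
    },
    "secrets": ["/password"],
}


def missing_required(spec: dict, config: dict) -> list:
    """Return the required config fields that are absent from `config`."""
    required = spec["configSchema"].get("required", [])
    return [field for field in required if field not in config]


def secret_fields(spec: dict) -> set:
    """Return the field names marked as secrets (leading '/' stripped)."""
    return {pointer.lstrip("/") for pointer in spec["secrets"]}
```

A check like `missing_required` only covers presence of fields; full JSON Schema validation (types, formats, nested objects) would need a dedicated validator.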
What: A connector is a database/API client that manages connections and low-level protocol details.
Purpose: Handles where to connect and how to authenticate.
Location: osiris/connectors/<family>/connection.py (Python module)
Example: MySQLConnector provides:
- Connection pooling
- Query execution
- Transaction management
- Error handling
Key Responsibilities:
- Establish connections using resolved credentials
- Provide query/command execution methods
- Health checking (doctor() method)
- Connection lifecycle management
Relationship: A connector is used by one or more drivers.
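The responsibilities above can be sketched with a toy connector. This example uses SQLite from the standard library as a stand-in for MySQL so that it runs anywhere; the class name `SQLiteConnector` and the exact method set are assumptions, not the real MySQLConnector interface.

```python
import sqlite3


class SQLiteConnector:
    """Toy connector; SQLite stands in for a real database such as MySQL."""

    def __init__(self, resolved_connection: dict):
        # Only the database path is needed for SQLite; a real connector
        # would also read host, user, and password from the resolved config.
        self._database = resolved_connection["database"]
        self._conn = None

    def connect(self) -> None:
        """Open the connection lazily and reuse it on later calls."""
        if self._conn is None:
            self._conn = sqlite3.connect(self._database)

    def execute_query(self, sql: str, params: tuple = ()) -> list:
        """Run a statement and return all result rows as tuples."""
        self.connect()
        cursor = self._conn.execute(sql, params)
        return cursor.fetchall()

    def doctor(self) -> bool:
        """Health check: True if a trivial query succeeds."""
        try:
            return self.execute_query("SELECT 1") == [(1,)]
        except sqlite3.Error:
            return False

    def close(self) -> None:
        """Release the connection if one is open."""
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Because the connector owns the connection state, any number of drivers can share the same class without duplicating connection handling.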
What: A driver is executable logic that performs a specific data operation.
Purpose: Implements how to execute a pipeline step (extract, write, transform).
Location: osiris/drivers/<name>_driver.py (Python class)
Example: MySQLExtractorDriver implements:
def run(self, *, step_id, config, inputs, ctx) -> dict:
    # 1. Get resolved connection from config
    # 2. Build SQL query
    # 3. Execute query via connector
    # 4. Emit metrics (rows_read)
    # 5. Return {"df": DataFrame}
    ...
Protocol: Must implement the Driver protocol from osiris/core/driver.py
Key Responsibilities:
- Validate configuration
- Execute data operation (extract/write/transform)
- Emit metrics and events via context
- Handle errors gracefully
- Return structured output ({"df": DataFrame} or {})
Relationship: A driver uses a connector and implements a component's runtime behavior.
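As an illustration of these responsibilities, here is a minimal toy driver. `RecordingContext` and `StaticExtractorDriver` are hypothetical names, and a plain list of dicts stands in for the DataFrame so the sketch has no third-party dependencies.

```python
from typing import Optional


class RecordingContext:
    """Hypothetical stand-in for the execution context passed as `ctx`."""

    def __init__(self) -> None:
        self.metrics = {}

    def log_metric(self, name: str, value) -> None:
        self.metrics[name] = value


class StaticExtractorDriver:
    """Toy driver: 'extracts' rows given in config instead of querying a DB."""

    def run(self, *, step_id: str, config: dict, inputs: Optional[dict], ctx) -> dict:
        # Validate configuration before doing any work.
        if "rows" not in config:
            raise ValueError(f"step {step_id}: config is missing 'rows'")
        rows = list(config["rows"])
        # Emit a metric through the context, then return structured output.
        ctx.log_metric("rows_read", len(rows))
        return {"df": rows}  # a real driver would return a DataFrame here
```

Note that `run` takes keyword-only arguments, matching the `*` in the signature shown above, so callers must pass `step_id=`, `config=`, `inputs=`, and `ctx=` by name.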
What: The Registry is a centralized catalog of all component specifications.
Purpose: Single source of truth for component metadata, schemas, and capabilities.
Location: osiris/components/registry.py
Key Functions:
- Load specs: Scan components/ directory for spec.yaml files
- Validate specs: Check against JSON Schema and semantic rules
- Query components: List available components, filter by mode/capability
- Provide metadata: Supply config schemas, secrets, examples to compiler
CLI Interface:
osiris components list                    # All components
osiris components list --runnable         # Only components with drivers
osiris components show mysql.extractor    # Detailed spec
osiris components validate mysql.extractor --level strict
Relationship: Registry loads components and validates specs. Used by Compiler and CLI.
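The catalog and query functions can be sketched with a small in-memory registry. This is an illustration under assumptions: the class below is not the code in osiris/components/registry.py, the required keys are guesses, and specs are registered as dicts rather than scanned from disk.

```python
from typing import Optional


class ComponentRegistry:
    """Toy in-memory catalog; the real Registry loads spec.yaml files."""

    # Assumed minimal set of keys a spec must carry to be accepted.
    REQUIRED_KEYS = ("name", "modes", "configSchema")

    def __init__(self) -> None:
        self._specs = {}

    def register(self, spec: dict) -> None:
        """Validate a spec structurally, then store it by name."""
        missing = [key for key in self.REQUIRED_KEYS if key not in spec]
        if missing:
            raise ValueError(f"invalid spec, missing keys: {missing}")
        self._specs[spec["name"]] = spec

    def get(self, name: str) -> dict:
        """Return the stored spec, raising KeyError if it is unknown."""
        return self._specs[name]

    def list_components(self, mode: Optional[str] = None) -> list:
        """List component names, optionally filtered by supported mode."""
        names = sorted(self._specs)
        if mode is None:
            return names
        return [name for name in names if mode in self._specs[name]["modes"]]
```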
What: The Runner is an orchestrator that executes compiled pipelines step-by-step.
Purpose: Coordinates driver execution, manages state, collects artifacts.
Location: osiris/core/runner_v0.py
Execution Flow:
- Load manifest: Read compiled pipeline (manifest.yaml)
- Resolve connections: Convert @mysql.default → actual credentials
- Create session: Initialize logging, artifacts directory
- Execute steps: For each step:
  - Resolve inputs from upstream outputs
  - Instantiate driver
  - Call driver.run(step_id=step_id, config=config, inputs=inputs, ctx=ctx)
  - Collect outputs, emit events/metrics
- Cleanup: Finalize artifacts, write status.json
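The step-execution part of this flow can be sketched as a short loop. Everything here is a simplified assumption rather than the code in osiris/core/runner_v0.py: the step dict layout, the `(step, key)` input tuples, the plain-dict `ctx`, and the two toy drivers are all hypothetical.

```python
class ListExtractor:
    """Toy driver that returns the rows given in its config."""

    def run(self, *, step_id, config, inputs, ctx):
        return {"df": list(config["rows"])}


class CountingWriter:
    """Toy driver that 'writes' by counting the rows it receives."""

    def run(self, *, step_id, config, inputs, ctx):
        ctx["rows_written"] = len(inputs["df"])
        return {}


def run_pipeline(steps: list, drivers: dict, ctx: dict) -> dict:
    """Execute steps in order, wiring upstream outputs into later inputs."""
    outputs = {}
    for step in steps:
        # Each declared input maps a name to (upstream_step_id, output_key).
        inputs = {
            name: outputs[src_step][src_key]
            for name, (src_step, src_key) in step.get("inputs", {}).items()
        }
        driver = drivers[step["component"]]()  # instantiate the driver class
        outputs[step["id"]] = driver.run(
            step_id=step["id"],
            config=step.get("config", {}),
            inputs=inputs,
            ctx=ctx,
        )
    return outputs
```

The sketch omits session setup, event emission, and cleanup; it only shows how outputs of one step become inputs of the next.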
Adapters: Runner uses execution adapters for different environments:
- LocalAdapter: Execute on host machine
- E2BTransparentProxy: Execute in cloud sandbox (E2B)
Relationship: Runner uses Registry to discover drivers, calls driver.run(), and manages execution lifecycle.
┌──────────────────────────────────────────────────────────────────────────┐
│                           USER / LLM COMPILER                            │
└────────────────────────────────────┬─────────────────────────────────────┘
                                     │ creates
                                     ▼
                          ┌──────────────────────┐
                          │     OML Pipeline     │  (YAML file)
                          │    pipeline.yaml     │
                          └──────────┬───────────┘
                                     │ compiles via
                                     ▼
                          ┌──────────────────────┐
                          │      Component       │◄────────────┐
                          │       Registry       │             │ loads specs
                          └──────────┬───────────┘             │
                                     │ validates               │
                                     │                         │
           ┌─────────────────────────┼─────────────────────────┤
           │                         │                         │
           ▼                         ▼                         ▼
┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│    Component Spec    │  │      Component       │  │      Component       │
│   mysql.extractor    │  │   supabase.writer    │  │   duckdb.processor   │
│      spec.yaml       │  │      spec.yaml       │  │      spec.yaml       │
└──────────┬───────────┘  └──────────┬───────────┘  └──────────┬───────────┘
           │ x-runtime.driver        │                         │
           ▼                         ▼                         ▼
┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│     Driver Impl      │  │        Driver        │  │        Driver        │
│ MySQLExtractorDriver │  │ SupabaseWriterDriver │  │   DuckDBProcessor    │
│    (Python class)    │  │    (Python class)    │  │    (Python class)    │
└──────────┬───────────┘  └──────────┬───────────┘  └──────────┬───────────┘
           │ uses                    │ uses                    │
           ▼                         ▼                         ▼
┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│      Connector       │  │      Connector       │  │        DuckDB        │
│    MySQLConnector    │  │    SupabaseClient    │  │        Engine        │
│  (connection mgmt)   │  │    (REST client)     │  │     (SQL engine)     │
└──────────┬───────────┘  └──────────┬───────────┘  └──────────┬───────────┘
           │ connects to             │ calls                   │
           ▼                         ▼                         ▼
┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│        MySQL         │  │       Supabase       │  │      In-Memory       │
│       Database       │  │         API          │  │         Data         │
└──────────────────────┘  └──────────────────────┘  └──────────────────────┘
                                     ▲
                                     │ orchestrates
                          ┌──────────┴───────────┐
                          │        Runner        │
                          │      (Executor)      │
                          └──────────┬───────────┘
                                     │
           ┌─────────────────────────┼─────────────────────────┐
           ▼                         ▼                         ▼
┌──────────────────────┐  ┌──────────────────────┐  ┌──────────────────────┐
│     LocalAdapter     │  │ E2BTransparentProxy  │  │       Future:        │
│     (host exec)      │  │       (cloud)        │  │      K8sAdapter      │
└──────────────────────┘  └──────────────────────┘  └──────────────────────┘
| Aspect | Component (spec.yaml) | Driver (Python class) |
|---|---|---|
| Nature | Declarative metadata | Imperative code |
| Purpose | Describe capabilities | Implement behavior |
| Location | components/<name>/spec.yaml | osiris/drivers/<name>_driver.py |
| Consumed By | Compiler, Registry, CLI | Runner |
| Example | "I support discovery mode" | def discover(config): ... |
| Validation | JSON Schema validation | Runtime protocol check |
Relationship: Component spec declares what the driver implements.
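The "runtime protocol check" in the table can be illustrated with `typing.Protocol`. This is a sketch under assumptions: the `Driver` protocol below is a guess at the shape of the one in osiris/core/driver.py, and `GoodDriver`, `NotADriver`, and `is_driver` are hypothetical names.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Driver(Protocol):
    """Assumed shape of the driver protocol: a single keyword-only run()."""

    def run(self, *, step_id: str, config: dict, inputs: Any, ctx: Any) -> dict:
        ...


class GoodDriver:
    def run(self, *, step_id, config, inputs, ctx):
        return {}


class NotADriver:
    def execute(self):
        return {}


def is_driver(obj: object) -> bool:
    """Structural check: does `obj` expose the protocol's methods?

    isinstance() against a runtime_checkable Protocol only verifies that
    the method names exist; it does not verify their signatures.
    """
    return isinstance(obj, Driver)
```

Because the check is structural, a driver class does not need to inherit from anything; it only needs a `run` method. Signature mismatches would still surface only when the method is called.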
| Aspect | Component | Connector |
|---|---|---|
| Scope | Single data operation | Multiple operations |
| Reusability | One component = one operation | One connector = many components |
| Configuration | Operation-specific (query, table) | Connection-specific (host, credentials) |
| Example | mysql.extractor (reads data) | MySQLConnector (manages connections) |
| Files | spec.yaml + driver | connection.py + utils |
Relationship: Multiple components (extractor, writer) can share the same connector (MySQLConnector).
Example:
- mysql.extractor → uses MySQLConnector
- mysql.writer → uses the same MySQLConnector
| Aspect | Driver | Connector |
|---|---|---|
| Layer | Business logic | Infrastructure |
| Knows About | Pipeline steps, DataFrames, metrics | Connections, protocols, retries |
| Entry Point | run(*, step_id, config, inputs, ctx) | connect(), execute(), close() |
| Error Handling | Step-level errors, retry logic | Connection errors, timeouts |
| State | Stateless (per-step) | Stateful (connection pooling) |
Relationship: Driver calls connector methods to perform I/O.
Example Flow:
# In MySQLExtractorDriver.run()
connector = MySQLConnector(config["resolved_connection"])
df = connector.execute_query(config["query"])
ctx.log_metric("rows_read", len(df))
return {"df": df}
| Aspect | Registry | Runner |
|---|---|---|
| Phase | Compile-time | Runtime |
| Purpose | Validate and catalog components | Execute pipeline steps |
| Input | Component specs (YAML) | Compiled manifest (YAML) |
| Output | Validated specs, metadata | Executed steps, artifacts |
| Mutability | Read-only | Writes artifacts, logs |
Relationship: Runner queries Registry to discover available drivers, then executes them.
Developer writes:
components/mycomp/spec.yaml (what it does)
osiris/drivers/mycomp_driver.py (how it works)
osiris/connectors/mydb/connection.py (where it connects)
Registry.load_specs() →
Validate spec.yaml against JSON Schema →
Check semantic rules (secrets, aliases) →
Store in memory cache →
✓ Component available
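One plausible semantic rule for the "secrets" check is that every secret must point at a field the config schema actually declares. The sketch below assumes secrets are written as `/field` pointers and that fields live under `configSchema.properties`; both are assumptions, and `check_secrets` is a hypothetical helper rather than a Registry method.

```python
def check_secrets(spec: dict) -> list:
    """Return one error message per secret that names an undeclared field."""
    declared = set(spec.get("configSchema", {}).get("properties", {}))
    errors = []
    for pointer in spec.get("secrets", []):
        field = pointer.lstrip("/")
        if field not in declared:
            errors.append(f"secret '{pointer}' is not a declared config field")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a validator report every issue in a spec in one pass.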
User writes pipeline.yaml (OML) →
Compiler queries Registry for component specs →
Validates config against component's configSchema →
Resolves connections (@mysql.default) →
Generates manifest.yaml →
✓ Pipeline ready to run
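Connection resolution can be sketched as a lookup plus environment substitution. The nested `connections` layout, the `$VAR` convention for secrets, and the function name `resolve_connection` are assumptions based on the `@mysql.default` example in this document, not the actual resolver.

```python
def resolve_connection(ref: str, connections: dict, env: dict) -> dict:
    """Resolve '@family.alias' to a dict, substituting '$VAR' values from env."""
    if not ref.startswith("@") or "." not in ref:
        raise ValueError(f"not a connection reference: {ref!r}")
    family, alias = ref[1:].split(".", 1)
    try:
        template = connections[family][alias]
    except KeyError:
        raise KeyError(f"unknown connection: {ref}") from None
    resolved = {}
    for key, value in template.items():
        if isinstance(value, str) and value.startswith("$"):
            # Secrets are read from the environment mapping, never stored inline.
            resolved[key] = env[value[1:]]
        else:
            resolved[key] = value
    return resolved
```

Passing `env` explicitly (for example `os.environ`) keeps the function easy to test and makes it obvious where secrets come from.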
Runner loads manifest.yaml →
For each step:
  1. DriverRegistry.get(component_name) → Driver instance
  2. Driver.run(step_id=step_id, config=config, inputs=inputs, ctx=ctx)
     - Driver calls Connector methods
     - Connector executes database operations
     - Driver emits metrics (rows_read, duration_ms)
  3. Runner collects outputs, logs events
✓ Pipeline complete
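A metric such as duration_ms can be captured by timing the call to `run`. The helper below is a hypothetical sketch: the document does not say whether the driver or the Runner records the duration, and `timed_run` is not an Osiris function. It assumes only that `ctx` exposes `log_metric(name, value)`, as in the examples in this document.

```python
import time


def timed_run(driver, ctx, **kwargs) -> dict:
    """Call driver.run(...) and record its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    try:
        return driver.run(ctx=ctx, **kwargs)
    finally:
        # Recorded even if run() raises, so failed steps still report a duration.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        ctx.log_metric("duration_ms", elapsed_ms)
```

`time.perf_counter` is used instead of `time.time` because it is monotonic and intended for measuring short intervals.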
Scenario: Extract from MySQL, write to Supabase
oml_version: "0.1.0"
name: "mysql_to_supabase"
steps:
  - id: extract_users
    component: mysql.extractor
    mode: extract
    config:
      connection: "@mysql.default"
      query: "SELECT * FROM users"
  - id: write_users
    component: supabase.writer
    mode: write
    inputs:
      df: "${extract_users.df}"
    config:
      connection: "@supabase.main"
      table: "users"

- Registry lookup:
  - Load components/mysql.extractor/spec.yaml
  - Validate config.query against configSchema
  - Load components/supabase.writer/spec.yaml
  - Validate config.table against configSchema
- Connection resolution:
  - @mysql.default → {host: localhost, database: mydb, user: admin, password: $MYSQL_PASSWORD}
  - @supabase.main → {url: https://..., key: $SUPABASE_KEY}
- Manifest generation:
  - Create manifest.yaml with resolved connections
  - Store in logs/compile_<timestamp>/
- Step 1: extract_users

  Runner
    → DriverRegistry.get("mysql.extractor")
    → MySQLExtractorDriver instance
    → driver.run(
        step_id="extract_users",
        config={
          "query": "SELECT * FROM users",
          "resolved_connection": {host, database, user, password}
        },
        inputs=None,
        ctx=ExecutionContext
      )

  Inside driver.run():
    connector = MySQLConnector(resolved_connection)
    df = connector.execute_query(query)
    ctx.log_metric("rows_read", len(df))
    return {"df": df}

  Runner stores:
    outputs["extract_users"] = {"df": DataFrame(100 rows)}
- Step 2: write_users

  Runner
    → Resolve inputs: df = outputs["extract_users"]["df"]
    → DriverRegistry.get("supabase.writer")
    → SupabaseWriterDriver instance
    → driver.run(
        step_id="write_users",
        config={
          "table": "users",
          "resolved_connection": {url, key}
        },
        inputs={"df": DataFrame(100 rows)},
        ctx=ExecutionContext
      )

  Inside driver.run():
    client = SupabaseClient(resolved_connection)
    records = inputs["df"].to_dict("records")
    client.insert(table, records)
    ctx.log_metric("rows_written", len(records))
    return {}

  Runner emits events: step_complete, rows_written=100
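The input-resolution step in this walkthrough, where `${extract_users.df}` becomes `outputs["extract_users"]["df"]`, can be sketched with a small parser. The reference grammar (one step id, one dot, one output key) is inferred from the single example in this document, and `resolve_inputs` is a hypothetical helper.

```python
import re

# Matches references of the form ${step_id.output_key}, e.g. ${extract_users.df}.
_REF = re.compile(r"^\$\{(?P<step>[\w-]+)\.(?P<key>\w+)\}$")


def resolve_inputs(declared: dict, outputs: dict) -> dict:
    """Replace each '${step.key}' reference with the stored upstream output."""
    resolved = {}
    for name, ref in declared.items():
        match = _REF.match(ref)
        if match is None:
            raise ValueError(f"malformed input reference: {ref!r}")
        resolved[name] = outputs[match["step"]][match["key"]]
    return resolved
```

A reference to a step that has not run yet raises `KeyError`, which is one simple way to surface ordering mistakes in a pipeline.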
Write a component spec when:
- Adding a new data source or destination
- Defining a new transformation type
- Exposing new capabilities to the LLM compiler
Files to create:
components/<name>/spec.yaml
Write a driver when:
- Implementing the runtime behavior for a component
- Executing extract/write/transform logic
Files to create:
osiris/drivers/<name>_driver.py
Write a connector when:
- Supporting a new database/API type
- Reusing connection logic across multiple drivers
Files to create:
osiris/connectors/<family>/connection.py
osiris/connectors/<family>/utils.py
Modify the Registry when:
- Adding new validation rules
- Implementing new discovery patterns
- Supporting new component metadata
Files to modify:
osiris/components/registry.py
Modify the Runner when:
- Adding new execution phases
- Implementing new orchestration patterns
- Supporting new artifact types
Files to modify:
osiris/core/runner_v0.py
Component Spec (mysql.extractor/spec.yaml)
↓ declares
Driver (mysql_extractor_driver.py)
↓ uses
Connector (connectors/mysql/connection.py)
↓ connects to
MySQL Database
Component Spec (supabase.writer/spec.yaml)
↓ declares
Driver (supabase_writer_driver.py)
↓ uses
Connector (connectors/supabase/connection.py)
↓ calls
Supabase REST API
Component Spec (duckdb.processor/spec.yaml)
↓ declares
Driver (duckdb_processor_driver.py)
↓ uses
DuckDB Engine (in-memory)
↓ transforms
DataFrame
- Components are declarative, Drivers are imperative
- Connectors are reusable across multiple drivers
- Registry validates at compile-time, Runner executes at runtime
- Specs describe capabilities, Drivers implement behavior
- One connector can serve multiple components (extractor + writer)
- Build a Component: See BUILD_A_COMPONENT.md
- Understand Drivers: See modules/drivers.md
- Understand Registry: See modules/components.md
- Understand Connectors: See modules/connectors.md
- Understand Runner: See modules/runtime.md
Remember: Component = What, Driver = How, Connector = Where, Registry = Catalog, Runner = Executor.