The Component Specification Schema enables Osiris components to be self-describing, providing configuration requirements, capabilities, and security metadata. This allows the LLM to generate valid pipeline configurations and enables automatic secrets masking in logs and artifacts.
- Schema Draft: JSON Schema Draft 2020-12
- Schema ID: https://osiris.ai/schemas/component-spec/v1.0.0
- Location: `components/spec.schema.json`
| Field | Type | Description | Example |
|---|---|---|---|
| `name` | string | Component identifier (pattern: `^[a-z0-9_.-]+$`) | `mysql.table` |
| `version` | string | Semantic version (semver) | `1.0.0` |
| `modes` | array | Supported operational modes | `["extract", "load"]` |
| `capabilities` | object | Component capability flags | See Capabilities section |
| `configSchema` | object | JSON Schema for component configuration | See ConfigSchema section |
| Field | Type | Description |
|---|---|---|
| `title` | string | Human-readable component title |
| `description` | string | Detailed component description |
| `secrets` | array | JSON Pointer paths to secret fields |
| `x-connection-fields` | array | Fields provided by connection references, with override policies |
| `redaction` | object | Policy for redacting sensitive data |
| `constraints` | object | Cross-field validation rules |
| `examples` | array | Usage examples with config and OML |
| `compatibility` | object | Requirements and conflicts |
| `llmHints` | object | Hints for LLM-driven generation |
| `loggingPolicy` | object | Logging configuration |
| `limits` | object | Resource and operational limits |
Available operational modes:
- `extract` - Read data from source
- `write` - Write data to destination
- `load` - Write data to destination (deprecated, use `write`)
- `transform` - Transform data in-place
- `discover` - Discover schema/metadata
- `analyze` - Perform analytical queries
- `stream` - Stream processing
Boolean flags indicating component capabilities:
```json
{
  "discover": true,          // Supports schema discovery
  "adHocAnalytics": false,   // Supports ad-hoc queries
  "inMemoryMove": true,      // Supports in-memory transfers
  "streaming": false,        // Supports streaming
  "bulkOperations": true,    // Supports bulk operations
  "transactions": true,      // Supports transactions
  "partitioning": false,     // Supports partitioned processing
  "customTransforms": false  // Supports custom transforms
}
```

The `configSchema` field contains a nested JSON Schema defining the component's configuration:
```json
{
  "configSchema": {
    "type": "object",
    "properties": {
      "connection": {
        "type": "object",
        "properties": {
          "host": {"type": "string"},
          "password": {"type": "string"}
        }
      },
      "table": {"type": "string"}
    },
    "required": ["connection", "table"]
  }
}
```

The `secrets` field uses JSON Pointer notation to identify sensitive fields:
```json
{
  "secrets": [
    "/connection/password",    // Points to connection.password
    "/auth/apiKey",            // Points to auth.apiKey
    "/credentials/privateKey"  // Points to credentials.privateKey
  ]
}
```

JSON Pointer Syntax:
- Always starts with `/`
- Path segments separated by `/`
- Array indices as numbers: `/items/0/secret`
- Special characters escaped: `~0` for `~`, `~1` for `/`
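For illustration, here is a minimal Python sketch of resolving these pointers against a config dict; the helper name is ours and not part of the spec:

```python
def resolve_pointer(doc, pointer: str):
    """Walk a JSON Pointer (RFC 6901) like '/connection/password' through nested dicts/lists."""
    if not pointer.startswith("/"):
        raise ValueError(f"JSON Pointer must start with '/': {pointer}")
    node = doc
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: ~1 -> '/', then ~0 -> '~' (order matters)
        key = raw.replace("~1", "/").replace("~0", "~")
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


config = {"connection": {"host": "localhost", "password": "secret"}}
assert resolve_pointer(config, "/connection/password") == "secret"
```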
The `x-connection-fields` field declares which configuration fields can be provided by a connection reference and controls whether they can be overridden in pipeline step configs. This enables secure credential management and fine-grained control over field overrides.

Purpose: When a pipeline step uses a connection reference (e.g., `connection: "@mysql.prod"`), certain fields are resolved from the connection definition. The `x-connection-fields` specification tells the validator which fields are expected from the connection and their override policies.
Simple Format (all fields overridable by default):
```yaml
x-connection-fields:
  - endpoint
  - auth_token
  - auth_username
```

Advanced Format (with override control):
```yaml
x-connection-fields:
  - name: host
    override: allowed    # Can be overridden in step config
  - name: password
    override: forbidden  # Cannot be overridden (security)
  - name: headers
    override: warning    # Can override but emits warning
```

Override Policies:
- `allowed`: Step config can override the connection value (for testing/flexibility)
- `forbidden`: Step config cannot override (security-sensitive fields)
- `warning`: Step config can override, but a warning is emitted (ambiguous fields)
Example with MySQL:
```yaml
x-connection-fields:
  - name: host
    override: allowed    # Infrastructure field - safe to override
  - name: port
    override: allowed    # Infrastructure field - safe to override
  - name: database
    override: forbidden  # Security: prevent database switching
  - name: user
    override: forbidden  # Security: prevent user switching
  - name: password
    override: forbidden  # Security: prevent credential override
```

Validation Behavior:
- When a connection reference is used, the validator skips validation of connection-provided required fields
- Forbidden override: validation error if the step config attempts an override (see the sketch below)
- Warning override: validation warning if the step config attempts an override
- Allowed override: no error or warning
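A sketch of this override check, assuming entries have already been parsed from the spec; the function name, simple-format normalization, and message strings are illustrative:

```python
def check_overrides(x_connection_fields: list, step_config: dict) -> list[str]:
    """Return validation messages for connection-provided fields that a step config overrides."""
    issues = []
    for entry in x_connection_fields:
        # Simple format: a bare string means override: allowed (the default)
        if isinstance(entry, str):
            entry = {"name": entry, "override": "allowed"}
        name, policy = entry["name"], entry.get("override", "allowed")
        if name in step_config:
            if policy == "forbidden":
                issues.append(f"error: '{name}' cannot be overridden in step config")
            elif policy == "warning":
                issues.append(f"warning: overriding connection-provided field '{name}'")
    return issues


fields = [{"name": "host", "override": "allowed"},
          {"name": "password", "override": "forbidden"}]
print(check_overrides(fields, {"host": "db.local", "password": "oops"}))
# ["error: 'password' cannot be overridden in step config"]
```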
See Also: x-connection-fields Specification for comprehensive documentation.
Controls how sensitive data is handled in logs:
```json
{
  "redaction": {
    "strategy": "mask",  // mask, drop, or hash
    "mask": "***",       // Mask string (if strategy=mask)
    "extras": [          // Additional paths to redact
      "/connection/host"
    ]
  }
}
```
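A minimal sketch of the three strategies; the function name and digest format are assumptions, not part of the spec:

```python
import hashlib


def redact(value: str, strategy: str = "mask", mask: str = "***"):
    if strategy == "mask":
        return mask  # replace the value with the mask string
    if strategy == "drop":
        return None  # omit the field entirely
    if strategy == "hash":
        # A stable digest lets log entries be correlated without exposing the value
        return "sha256:" + hashlib.sha256(value.encode()).hexdigest()[:12]
    raise ValueError(f"unknown redaction strategy: {strategy}")
```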
Component usage examples with configuration and OML snippets:

```json
{
  "examples": [
    {
      "title": "Basic MySQL extraction",
      "config": {
        "connection": {
          "host": "localhost",
          "database": "mydb",
          "username": "user",
          "password": "secret"
        },
        "table": "customers"
      },
      "omlSnippet": "type: mysql.table\nconnection: @mysql\ntable: customers",
      "notes": "Requires read permissions"
    }
  ]
}
```
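Per the design principle noted later (all examples are validated against their `configSchema`), a sketch of that check using the `jsonschema` package:

```python
from jsonschema import Draft202012Validator


def validate_examples(spec: dict) -> list[str]:
    """Check every example config against the component's own configSchema."""
    validator = Draft202012Validator(spec["configSchema"])
    failures = []
    for example in spec.get("examples", []):
        for error in validator.iter_errors(example["config"]):
            failures.append(f"{example['title']}: {error.message}")
    return failures
```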
Guidance for LLM-driven pipeline generation:

```json
{
  "llmHints": {
    "inputAliases": {
      "table": ["table_name", "source_table"],
      "schema": ["database", "namespace"]
    },
    "promptGuidance": "Use for MySQL operations. Always specify connection and table.",
    "yamlSnippets": [
      "type: mysql.table\nconnection: @mysql"
    ],
    "commonPatterns": [
      {
        "pattern": "bulk_load",
        "description": "Use batchSize for efficiency"
      }
    ]
  }
}
```
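The registry is expected to generate LLM context from these hints and the examples (see the registry notes below); one hypothetical rendering:

```python
def llm_context(spec: dict) -> str:
    """Render llmHints into a prompt fragment; the exact format is an assumption."""
    hints = spec.get("llmHints", {})
    parts = [f"Component: {spec['name']} (modes: {', '.join(spec['modes'])})"]
    if guidance := hints.get("promptGuidance"):
        parts.append(guidance.strip())
    for target, aliases in hints.get("inputAliases", {}).items():
        parts.append(f"Treat {', '.join(aliases)} as aliases for '{target}'")
    parts.extend(hints.get("yamlSnippets", []))
    return "\n".join(parts)
```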
Defines logging behavior and sensitive data handling:

```json
{
  "loggingPolicy": {
    "sensitivePaths": ["/connection/host"],
    "eventDefaults": ["discovery.start", "transfer.progress"],
    "metricsToCapture": ["rows_read", "rows_written", "duration_ms"]
  }
}
```

A minimal component specification:

```yaml
# components/minimal.example/spec.yaml
name: minimal.example
version: 1.0.0
modes:
  - extract
capabilities:
  discover: true
  streaming: false
configSchema:
  type: object
  properties:
    connection:
      type: string
    source:
      type: string
  required:
    - connection
    - source
```

A complete specification:

```yaml
# components/mysql.table/spec.yaml
name: mysql.table
version: 2.1.0
title: MySQL Table Connector
description: Connect to MySQL tables for ETL operations
modes:
  - extract
  - load
  - discover
  - analyze
capabilities:
  discover: true
  adHocAnalytics: true
  inMemoryMove: false
  streaming: true
  bulkOperations: true
  transactions: true
configSchema:
  type: object
  properties:
    connection:
      type: object
      properties:
        host:
          type: string
        port:
          type: integer
          default: 3306
        database:
          type: string
        username:
          type: string
        password:
          type: string
      required:
        - host
        - database
        - username
        - password
    table:
      type: string
    schema:
      type: string
      default: public
    options:
      type: object
      properties:
        batchSize:
          type: integer
          default: 1000
        timeout:
          type: integer
          default: 30
  required:
    - connection
    - table
secrets:
  - /connection/password
  - /connection/username
redaction:
  strategy: mask
  mask: "****"
  extras:
    - /connection/host
constraints:
  required:
    - when:
        mode: load
      must:
        options:
          batchSize:
            minimum: 1
      error: batchSize must be at least 1 for load mode
environment:
  python: ">=3.10"
  memory: 512MB
examples:
  - title: Extract from MySQL
    config:
      connection:
        host: localhost
        port: 3306
        database: mydb
        username: reader
        password: secret123
      table: customers
      schema: public
    omlSnippet: |
      type: mysql.table
      connection: @mysql
      table: customers
      schema: public
    notes: Requires SELECT permissions
  - title: Load to MySQL with batching
    config:
      connection:
        host: db.example.com
        database: warehouse
        username: writer
        password: secret456
      table: orders
      options:
        batchSize: 5000
        timeout: 60
    omlSnippet: |
      type: mysql.table
      connection: @mysql_prod
      table: orders
      options:
        batchSize: 5000
compatibility:
  requires:
    - python>=3.10
    - mysql>=8.0
  conflicts:
    - postgres
  platforms:
    - linux
    - darwin
    - docker
llmHints:
  inputAliases:
    table:
      - table_name
      - source_table
      - target_table
    schema:
      - database
      - namespace
      - db
  promptGuidance: |
    Use mysql.table for MySQL database operations.
    Always specify both connection and table.
    For bulk operations, set appropriate batchSize in options.
  yamlSnippets:
    - "type: mysql.table"
    - "connection: @mysql"
    - "table: {{ table_name }}"
    - "schema: {{ schema_name }}"
  commonPatterns:
    - pattern: bulk_extract
      description: Use batchSize for efficient extraction
    - pattern: upsert
      description: Use merge mode with appropriate keys
loggingPolicy:
  sensitivePaths:
    - /connection/host
    - /connection/port
  eventDefaults:
    - discovery.start
    - discovery.complete
    - transfer.start
    - transfer.progress
    - transfer.complete
  metricsToCapture:
    - rows_read
    - rows_written
    - bytes_processed
    - duration_ms
limits:
  maxRows: 1000000
  maxSizeMB: 1024
  maxDurationSeconds: 3600
  maxConcurrency: 10
  rateLimit:
    requests: 100
    period: minute
```
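The `limits` block above is meant to be enforced at execution time; a sketch of a runtime check, where the hook and exception type are assumptions:

```python
class LimitExceeded(RuntimeError):
    """Raised when a component exceeds a declared operational limit."""


def check_limits(spec: dict, rows: int, size_mb: float, elapsed_s: float) -> None:
    limits = spec.get("limits", {})
    if rows > limits.get("maxRows", float("inf")):
        raise LimitExceeded(f"row limit exceeded: {rows} > {limits['maxRows']}")
    if size_mb > limits.get("maxSizeMB", float("inf")):
        raise LimitExceeded(f"size limit exceeded: {size_mb} MB > {limits['maxSizeMB']} MB")
    if elapsed_s > limits.get("maxDurationSeconds", float("inf")):
        raise LimitExceeded(f"duration limit exceeded: {elapsed_s}s")
```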
The Component Registry will use these specifications to:

- Validate configurations against `configSchema`
- Apply secrets masking using the `secrets` and `redaction` fields
- Generate LLM context from `llmHints` and examples
- Enforce limits during execution
- Configure logging based on `loggingPolicy`
Integration helpers to be implemented in `osiris/components/utils.py`:

```python
from dataclasses import dataclass, field


# Assumed shape; RedactionPolicy is not defined elsewhere in this document.
@dataclass
class RedactionPolicy:
    strategy: str = "mask"
    mask: str = "***"
    paths: set[str] = field(default_factory=set)


def collect_secret_paths(spec: dict) -> set[str]:
    """Collect all secret paths from a spec: declared secrets, redaction
    extras, and logging-sensitive paths."""
    paths = set(spec.get("secrets", []))
    if "redaction" in spec:
        paths.update(spec["redaction"].get("extras", []))
    if "loggingPolicy" in spec:
        paths.update(spec["loggingPolicy"].get("sensitivePaths", []))
    return paths


def redaction_policy(spec: dict) -> RedactionPolicy:
    """Extract the redaction policy from a spec."""
    policy = spec.get("redaction", {})
    return RedactionPolicy(
        strategy=policy.get("strategy", "mask"),
        mask=policy.get("mask", "***"),
        paths=collect_secret_paths(spec),
    )
```
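Example use against the `mysql.table` spec above (assumes PyYAML and a repo-root working directory):

```python
import yaml

with open("components/mysql.table/spec.yaml") as f:
    spec = yaml.safe_load(f)

# Union of secrets, redaction extras, and loggingPolicy sensitive paths:
# {'/connection/password', '/connection/username', '/connection/host', '/connection/port'}
print(collect_secret_paths(spec))
print(redaction_policy(spec))  # RedactionPolicy(strategy='mask', mask='****', paths=...)
```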
All component specifications must:

- Pass JSON Schema validation against `spec.schema.json`
- Have unique component names
- Use valid semantic versioning
- Provide valid JSON Pointers for secrets
- Include at least one operational mode
- Define required capabilities
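A sketch of the structural checks listed above that plain schema validation does not cover; the regexes are simplified (real semver also allows pre-release and build suffixes):

```python
import re

NAME_RE = re.compile(r"^[a-z0-9_.-]+$")
SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")  # simplified; no pre-release/build parts


def basic_checks(spec: dict) -> list[str]:
    errors = []
    if not NAME_RE.match(spec.get("name", "")):
        errors.append("name must match ^[a-z0-9_.-]+$")
    if not SEMVER_RE.match(str(spec.get("version", ""))):
        errors.append("version must be semver (e.g. 1.0.0)")
    if not spec.get("modes"):
        errors.append("at least one operational mode is required")
    for pointer in spec.get("secrets", []):
        if not pointer.startswith("/"):
            errors.append(f"secret path must be a JSON Pointer starting with '/': {pointer}")
    return errors
```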
The Component Registry provides session-aware validation with structured logging:
```bash
# Validate a component spec with session logging
$ python osiris.py components validate mysql.writer --level enhanced
✓ Component 'mysql.writer' is valid (level: enhanced)
  Version: 1.0.0
  Modes: write, discover
  Session: components_validate_1735900000000

# View validation session logs
$ python osiris.py logs show --session components_validate_1735900000000
Session: components_validate_1735900000000
Created: 2025-01-03 12:00:00
Events:
  12:00:00.123 component_validation_start {component: mysql.writer, level: enhanced}
  12:00:00.456 component_validation_complete {status: ok, errors: 0, duration_ms: 333}

# Custom session with filtered events
$ python osiris.py components validate supabase.extractor \
    --session-id my_validation \
    --level strict \
    --events "component_validation_*" \
    --json
{
  "component": "supabase.extractor",
  "level": "strict",
  "is_valid": true,
  "errors": [],
  "session_id": "my_validation",
  "duration_ms": 250
}
```

Validation creates structured logs in `logs/<session_id>/` including:
- `events.jsonl`: Structured validation events
- `metrics.jsonl`: Performance metrics
- `osiris.log`: Standard application logs
- `debug.log`: Debug-level logs (if enabled)
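A minimal way to consume these files programmatically, following the `logs/<session_id>/` layout above:

```python
import json
from pathlib import Path


def read_events(session_id: str, log_root: str = "logs") -> list[dict]:
    """Load structured validation events for one session."""
    events_file = Path(log_root) / session_id / "events.jsonl"
    with events_file.open() as f:
        return [json.loads(line) for line in f if line.strip()]
```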
- Secrets: Always use JSON Pointers for secret fields
- Examples: Provide 1-2 clear, working examples
- LLM Hints: Keep `promptGuidance` concise (≤500 chars)
- Constraints: Document cross-field dependencies clearly
- Versioning: Follow semantic versioning strictly
- Documentation: Use `title` and `description` for clarity
The following component specifications have been implemented as part of M1a.2:
- `mysql.extractor` (`components/mysql.extractor/spec.yaml`)
  - Extracts data from MySQL databases
  - Supports discovery, custom SQL queries, and bulk extraction
  - Configurable connection pooling and batch sizes
- `mysql.writer` (`components/mysql.writer/spec.yaml`)
  - Writes data to MySQL databases
  - Supports append, replace, and upsert modes in write mode
  - Supports discovery of target schemas
  - Transaction support with configurable batch sizes
- `supabase.extractor` (`components/supabase.extractor/spec.yaml`)
  - Extracts data from Supabase via REST API
  - Supports PostgREST filters and joins
  - Rate-limited API calls with retry logic
- `supabase.writer` (`components/supabase.writer/spec.yaml`)
  - Writes data to Supabase via REST API
  - Supports insert, upsert, and update modes in write mode
  - Supports discovery of target schemas
  - Conflict resolution via `on_conflict` specification
- Separation of Concerns: Extractors and writers are separate components
- Mode-Specific: Extractors support `extract` and `discover`; writers support `write` and `discover`
- Secrets Management: All components declare password/key fields as secrets
- LLM Optimization: Each component includes `promptGuidance` and `yamlSnippets`
- Validation: All examples are validated against their `configSchema`
| Component | Required Fields | Notes |
|---|---|---|
| mysql.extractor | host, database, user, password, table | Standard MySQL connection parameters |
| mysql.writer | host, database, user, password, table | Same as extractor |
| supabase.extractor | key, table | Also needs url OR project_id (constraint) |
| supabase.writer | key, table | Also needs url OR project_id (constraint) |
| Capability | Meaning | Impact on Agent |
|---|---|---|
| discover | Can list tables/schemas | Agent can explore database structure |
| adHocAnalytics | Can execute arbitrary queries | Agent can run custom SQL for analysis |
| inMemoryMove | Supports DataFrame transfers | Can pass data directly between components |
| streaming | Supports stream processing | Can handle real-time data flows |
| bulkOperations | Supports batch operations | Efficient for large datasets |
| transactions | Supports ACID transactions | Ensures data consistency |
| partitioning | Supports partitioned processing | Can parallelize operations |
| customTransforms | Supports custom transformations | Agent can apply user-defined logic |
- MySQL Extractor: discover ✓, adHocAnalytics ✓ (execute_query), bulkOperations ✓
- MySQL Writer: discover ✓, bulkOperations ✓, transactions ✓ (conn.commit)
- Supabase Extractor: discover ✓, bulkOperations ✓ (REST API limits apply)
- Supabase Writer: discover ✓, bulkOperations ✓ (batch_size supported)
Note: REST-based components (Supabase) don't support transactions due to the stateless nature of HTTP.
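A sketch of how an agent might branch on these flags; the planner functions are hypothetical:

```python
def can_run_adhoc_sql(spec: dict) -> bool:
    return spec.get("capabilities", {}).get("adHocAnalytics", False)


def choose_transfer(source_spec: dict, target_spec: dict) -> str:
    """Pick in-memory transfer only when both sides support it."""
    def caps(s):
        return s.get("capabilities", {})
    if caps(source_spec).get("inMemoryMove") and caps(target_spec).get("inMemoryMove"):
        return "in_memory"  # pass DataFrames directly between components
    return "staged"  # otherwise stage data between steps
```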
When updating component specifications:
- Increment version following semver
- Maintain backward compatibility when possible
- Document breaking changes in constraints
- Update examples to reflect changes
- Test with existing pipelines before deployment