Component Specification Reference

Overview

The Component Specification Schema enables Osiris components to be self-describing, providing configuration requirements, capabilities, and security metadata. This allows the LLM to generate valid pipeline configurations and enables automatic secrets masking in logs and artifacts.

Schema Version

Schema Draft: JSON Schema Draft 2020-12
Schema ID: https://osiris.ai/schemas/component-spec/v1.0.0
Location: components/spec.schema.json

Core Fields

Required Fields

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `name` | string | Component identifier (pattern: `^[a-z0-9_.-]+$`) | `mysql.table` |
| `version` | string | Semantic version (semver) | `1.0.0` |
| `modes` | array | Supported operational modes | `["extract", "load"]` |
| `capabilities` | object | Component capability flags | See Capabilities section |
| `configSchema` | object | JSON Schema for component configuration | See ConfigSchema section |
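
As a rough illustration of the required fields, the following sketch checks that they are present and that `name` matches the documented pattern. The function name `check_required_fields` is hypothetical and not part of Osiris; real validation is done against `spec.schema.json`.

```python
import re

# Pattern and field names are taken from the Required Fields table.
NAME_PATTERN = re.compile(r"^[a-z0-9_.-]+$")
REQUIRED_FIELDS = ("name", "version", "modes", "capabilities", "configSchema")


def check_required_fields(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass.

    Hypothetical helper for illustration only.
    """
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in spec
    ]
    name = spec.get("name")
    if isinstance(name, str) and not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid name: {name!r}")
    return problems


# Uppercase letters and spaces are outside the allowed character set.
spec = {"name": "MySQL Table", "version": "1.0.0", "modes": ["extract"]}
for problem in check_required_fields(spec):
    print(problem)
```

This only covers presence and the name pattern; types and nested structure would still need full JSON Schema validation.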

Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| `title` | string | Human-readable component title |
| `description` | string | Detailed component description |
| `secrets` | array | JSON Pointer paths to secret fields |
| `x-connection-fields` | array | Fields provided by connection references with override policies |
| `redaction` | object | Policy for redacting sensitive data |
| `constraints` | object | Cross-field validation rules |
| `examples` | array | Usage examples with config and OML |
| `compatibility` | object | Requirements and conflicts |
| `llmHints` | object | Hints for LLM-driven generation |
| `loggingPolicy` | object | Logging configuration |
| `limits` | object | Resource and operational limits |

Field Details

Modes

Available operational modes:

extract - Read data from source
write - Write data to destination
load - Write data to destination (deprecated, use write)
transform - Transform data in place
discover - Discover schema/metadata
analyze - Perform analytical queries
stream - Stream processing
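
Because `load` is deprecated in favor of `write`, tooling that reads older specs may want to normalize the mode list. The sketch below shows one way to do that; `normalize_modes` is a hypothetical helper, and whether Osiris itself rewrites `load` this way is an assumption.

```python
# Mode names come from the list of operational modes; the helper is hypothetical.
KNOWN_MODES = {"extract", "write", "load", "transform", "discover", "analyze", "stream"}
DEPRECATED_MODES = {"load": "write"}


def normalize_modes(modes: list[str]) -> list[str]:
    """Replace deprecated modes, drop duplicates, and preserve the original order."""
    normalized: list[str] = []
    for mode in modes:
        if mode not in KNOWN_MODES:
            raise ValueError(f"unknown mode: {mode}")
        mode = DEPRECATED_MODES.get(mode, mode)
        if mode not in normalized:
            normalized.append(mode)
    return normalized


print(normalize_modes(["extract", "load", "write"]))
```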

Capabilities

Boolean flags indicating component capabilities:

{
  "discover": true,          // Supports schema discovery
  "adHocAnalytics": false,   // Supports ad-hoc queries
  "inMemoryMove": true,      // Supports in-memory transfers
  "streaming": false,        // Supports streaming
  "bulkOperations": true,    // Supports bulk operations
  "transactions": true,      // Supports transactions
  "partitioning": false,     // Supports partitioned processing
  "customTransforms": false  // Supports custom transforms
}

ConfigSchema

The configSchema field contains a nested JSON Schema defining the component's configuration:

{
  "configSchema": {
    "type": "object",
    "properties": {
      "connection": {
        "type": "object",
        "properties": {
          "host": {"type": "string"},
          "password": {"type": "string"}
        }
      },
      "table": {"type": "string"}
    },
    "required": ["connection", "table"]
  }
}

Secrets (JSON Pointers)

The secrets field uses JSON Pointer notation to identify sensitive fields:

{
  "secrets": [
    "/connection/password",     // Points to connection.password
    "/auth/apiKey",            // Points to auth.apiKey
    "/credentials/privateKey"  // Points to credentials.privateKey
  ]
}

JSON Pointer Syntax:

Always starts with /
Path segments separated by /
Array indices as numbers: /items/0/secret
Special characters escaped: ~0 for ~, ~1 for /
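
To make the pointer syntax concrete, here is a minimal resolver written against the rules listed above (RFC 6901 escaping, numeric array indices). `resolve_pointer` is an illustrative name, not an Osiris function.

```python
def resolve_pointer(document, pointer: str):
    """Return the value that a JSON Pointer refers to inside `document`.

    Illustrative sketch: raises KeyError/IndexError if the path does not exist.
    """
    if not pointer.startswith("/"):
        raise ValueError(f"JSON Pointer must start with '/': {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape ~1 before ~0 so that "~01" decodes to "~1", not "/".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array index as a number
        else:
            current = current[token]
    return current


config = {
    "connection": {"password": "s3cret"},
    "items": [{"secret": "first"}],
    "a/b": {"~key": 1},
}
print(resolve_pointer(config, "/connection/password"))
print(resolve_pointer(config, "/items/0/secret"))
print(resolve_pointer(config, "/a~1b/~0key"))
```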

Connection Fields (x-connection-fields)

The x-connection-fields field declares which configuration fields can be provided by a connection reference and controls whether they can be overridden in pipeline step configs. This enables secure credential management and fine-grained control over field overrides.

Purpose: When a pipeline step uses a connection reference (e.g., connection: "@mysql.prod"), certain fields are resolved from the connection definition. The x-connection-fields specification tells the validator which fields are expected from the connection and their override policies.

Simple Format (all fields overridable by default):

x-connection-fields:
  - endpoint
  - auth_token
  - auth_username

Advanced Format (with override control):

x-connection-fields:
  - name: host
    override: allowed      # Can be overridden in step config
  - name: password
    override: forbidden    # Cannot be overridden (security)
  - name: headers
    override: warning      # Can override but emits warning

Override Policies:

allowed: Step config can override connection value (for testing/flexibility)
forbidden: Step config cannot override (security-sensitive fields)
warning: Step config can override but emits warning (ambiguous fields)

Example with MySQL:

x-connection-fields:
  - name: host
    override: allowed      # Infrastructure field - safe to override
  - name: port
    override: allowed      # Infrastructure field - safe to override
  - name: database
    override: forbidden    # Security: prevent database switching
  - name: user
    override: forbidden    # Security: prevent user switching
  - name: password
    override: forbidden    # Security: prevent credential override

Validation Behavior:

Connection reference used: The validator skips required-field validation for fields that the connection provides
Forbidden override: Validation error if the step config attempts to override the field
Warning override: Validation warning if the step config attempts to override the field
Allowed override: No error or warning
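
The override rules can be sketched as follows. This is an illustration of the documented behavior, not the validator's actual code: the function names are hypothetical, and treating an advanced-format entry without an `override` key as `allowed` is an assumption.

```python
def normalize_connection_fields(entries: list) -> dict[str, str]:
    """Map field name -> override policy for both the simple and advanced formats."""
    policies: dict[str, str] = {}
    for entry in entries:
        if isinstance(entry, str):
            # Simple format: fields are overridable by default.
            policies[entry] = "allowed"
        else:
            # Advanced format; a missing "override" defaults to "allowed" (assumption).
            policies[entry["name"]] = entry.get("override", "allowed")
    return policies


def check_overrides(entries: list, step_config: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for connection fields that a step config overrides."""
    errors: list[str] = []
    warnings: list[str] = []
    for field, policy in normalize_connection_fields(entries).items():
        if field not in step_config:
            continue  # value comes from the connection; nothing to check
        if policy == "forbidden":
            errors.append(f"'{field}' cannot be overridden in step config")
        elif policy == "warning":
            warnings.append(f"'{field}' overrides the connection value")
    return errors, warnings


fields = [
    "endpoint",
    {"name": "password", "override": "forbidden"},
    {"name": "headers", "override": "warning"},
]
step = {"connection": "@mysql.prod", "endpoint": "http://localhost", "password": "x", "headers": {}}
errors, warnings = check_overrides(fields, step)
print(errors)
print(warnings)
```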

See Also: x-connection-fields Specification for comprehensive documentation.

Redaction Policy

Controls how sensitive data is handled in logs:

{
  "redaction": {
    "strategy": "mask",        // mask, drop, or hash
    "mask": "***",             // Mask string (if strategy=mask)
    "extras": [                // Additional paths to redact
      "/connection/host"
    ]
  }
}
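
One possible reading of the three strategies is sketched below. The `redact` function is hypothetical; in particular, using SHA-256 for `hash` is an assumption, since the hashing algorithm is not stated here. For brevity it handles only object keys (no array indices or `~` escapes).

```python
import copy
import hashlib


def redact(config: dict, paths: list[str], strategy: str = "mask", mask: str = "***") -> dict:
    """Return a copy of `config` with the values at the given paths redacted."""
    result = copy.deepcopy(config)  # never mutate the caller's config
    for path in paths:
        *parents, leaf = path.split("/")[1:]
        node = result
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
        if not isinstance(node, dict) or leaf not in node:
            continue  # path is not present in this config
        if strategy == "mask":
            node[leaf] = mask
        elif strategy == "drop":
            del node[leaf]
        elif strategy == "hash":
            # SHA-256 is an assumption made for this sketch.
            node[leaf] = hashlib.sha256(str(node[leaf]).encode()).hexdigest()
        else:
            raise ValueError(f"unknown redaction strategy: {strategy}")
    return result


config = {"connection": {"host": "db.internal", "password": "secret"}, "table": "customers"}
print(redact(config, ["/connection/password", "/connection/host"]))
print(redact(config, ["/connection/password"], strategy="drop"))
```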

Examples

Component usage examples with configuration and OML snippets:

{
  "examples": [
    {
      "title": "Basic MySQL extraction",
      "config": {
        "connection": {
          "host": "localhost",
          "database": "mydb",
          "username": "user",
          "password": "secret"
        },
        "table": "customers"
      },
      "omlSnippet": "type: mysql.table\nconnection: \"@mysql\"\ntable: customers",
      "notes": "Requires read permissions"
    }
  ]
}

LLM Hints

Guidance for LLM-driven pipeline generation:

{
  "llmHints": {
    "inputAliases": {
      "table": ["table_name", "source_table"],
      "schema": ["database", "namespace"]
    },
    "promptGuidance": "Use for MySQL operations. Always specify connection and table.",
    "yamlSnippets": [
      "type: mysql.table\nconnection: \"@mysql\""
    ],
    "commonPatterns": [
      {
        "pattern": "bulk_load",
        "description": "Use batchSize for efficiency"
      }
    ]
  }
}
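
How `inputAliases` is consumed is not spelled out here. One plausible use, sketched under that assumption, is mapping field names that a user or LLM supplies onto the canonical config keys. `apply_input_aliases` is a hypothetical helper.

```python
def apply_input_aliases(user_config: dict, input_aliases: dict[str, list[str]]) -> dict:
    """Rename aliased keys to their canonical names.

    If both a canonical key and one of its aliases are present, the canonical
    key wins (a design choice made for this sketch).
    """
    alias_to_canonical = {
        alias: canonical
        for canonical, aliases in input_aliases.items()
        for alias in aliases
    }
    resolved = {}
    for key, value in user_config.items():
        canonical = alias_to_canonical.get(key, key)
        if key == canonical or canonical not in user_config:
            resolved[canonical] = value
    return resolved


aliases = {
    "table": ["table_name", "source_table"],
    "schema": ["database", "namespace"],
}
print(apply_input_aliases({"table_name": "customers", "namespace": "sales"}, aliases))
```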

Logging Policy

Defines logging behavior and sensitive data handling:

{
  "loggingPolicy": {
    "sensitivePaths": ["/connection/host"],
    "eventDefaults": ["discovery.start", "transfer.progress"],
    "metricsToCapture": ["rows_read", "rows_written", "duration_ms"]
  }
}

Complete Example

Minimal Component Spec

# components/minimal.example/spec.yaml
name: minimal.example
version: 1.0.0
modes:
  - extract
capabilities:
  discover: true
  streaming: false
configSchema:
  type: object
  properties:
    connection:
      type: string
    source:
      type: string
  required:
    - connection
    - source

Full-Featured Component Spec

# components/mysql.table/spec.yaml
name: mysql.table
version: 2.1.0
title: MySQL Table Connector
description: Connect to MySQL tables for ETL operations
modes:
  - extract
  - load
  - discover
  - analyze
capabilities:
  discover: true
  adHocAnalytics: true
  inMemoryMove: false
  streaming: true
  bulkOperations: true
  transactions: true
configSchema:
  type: object
  properties:
    connection:
      type: object
      properties:
        host:
          type: string
        port:
          type: integer
          default: 3306
        database:
          type: string
        username:
          type: string
        password:
          type: string
      required:
        - host
        - database
        - username
        - password
    table:
      type: string
    schema:
      type: string
      default: public
    options:
      type: object
      properties:
        batchSize:
          type: integer
          default: 1000
        timeout:
          type: integer
          default: 30
  required:
    - connection
    - table
secrets:
  - /connection/password
  - /connection/username
redaction:
  strategy: mask
  mask: "****"
  extras:
    - /connection/host
constraints:
  required:
    - when:
        mode: load
      must:
        options:
          batchSize:
            minimum: 1
      error: batchSize must be at least 1 for load mode
  environment:
    python: ">=3.10"
    memory: 512MB
examples:
  - title: Extract from MySQL
    config:
      connection:
        host: localhost
        port: 3306
        database: mydb
        username: reader
        password: secret123
      table: customers
      schema: public
    omlSnippet: |
      type: mysql.table
      connection: "@mysql"
      table: customers
      schema: public
    notes: Requires SELECT permissions
  - title: Load to MySQL with batching
    config:
      connection:
        host: db.example.com
        database: warehouse
        username: writer
        password: secret456
      table: orders
      options:
        batchSize: 5000
        timeout: 60
    omlSnippet: |
      type: mysql.table
      connection: "@mysql_prod"
      table: orders
      options:
        batchSize: 5000
compatibility:
  requires:
    - python>=3.10
    - mysql>=8.0
  conflicts:
    - postgres
  platforms:
    - linux
    - darwin
    - docker
llmHints:
  inputAliases:
    table:
      - table_name
      - source_table
      - target_table
    schema:
      - database
      - namespace
      - db
  promptGuidance: |
    Use mysql.table for MySQL database operations.
    Always specify both connection and table.
    For bulk operations, set appropriate batchSize in options.
  yamlSnippets:
    - "type: mysql.table"
    - 'connection: "@mysql"'
    - "table: {{ table_name }}"
    - "schema: {{ schema_name }}"
  commonPatterns:
    - pattern: bulk_extract
      description: Use batchSize for efficient extraction
    - pattern: upsert
      description: Use merge mode with appropriate keys
loggingPolicy:
  sensitivePaths:
    - /connection/host
    - /connection/port
  eventDefaults:
    - discovery.start
    - discovery.complete
    - transfer.start
    - transfer.progress
    - transfer.complete
  metricsToCapture:
    - rows_read
    - rows_written
    - bytes_processed
    - duration_ms
limits:
  maxRows: 1000000
  maxSizeMB: 1024
  maxDurationSeconds: 3600
  maxConcurrency: 10
  rateLimit:
    requests: 100
    period: minute

Integration with Osiris

Registry Usage (M1a.3)

The Component Registry will use these specifications to:

Validate configurations against configSchema
Apply secrets masking using secrets and redaction fields
Generate LLM context from llmHints and examples
Enforce limits during execution
Configure logging based on loggingPolicy

Helper Functions (TODO)

Integration helpers to be implemented in osiris/components/utils.py:

from dataclasses import dataclass


@dataclass
class RedactionPolicy:
    """Placeholder type (assumed shape; not defined in this document)."""
    strategy: str
    mask: str
    paths: set[str]


def collect_secret_paths(spec: dict) -> set[str]:
    """Collect paths from secrets, redaction.extras, and loggingPolicy.sensitivePaths."""
    paths = set(spec.get("secrets", []))
    paths.update(spec.get("redaction", {}).get("extras", []))
    paths.update(spec.get("loggingPolicy", {}).get("sensitivePaths", []))
    return paths


def redaction_policy(spec: dict) -> RedactionPolicy:
    """Build the redaction policy for a spec, defaulting to masking with "***"."""
    policy = spec.get("redaction", {})
    return RedactionPolicy(
        strategy=policy.get("strategy", "mask"),
        mask=policy.get("mask", "***"),
        paths=collect_secret_paths(spec),
    )

Validation

All component specifications must:

Pass JSON Schema validation against spec.schema.json
Have unique component names
Use valid semantic versioning
Provide valid JSON Pointers for secrets
Include at least one operational mode
Define required capabilities
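
Several of these rules can be checked without the full JSON Schema. The sketch below covers semantic versioning, JSON Pointer syntax, the non-empty mode list, and name uniqueness. Function names are hypothetical, and the semver pattern is a simplified form rather than the complete grammar from the semver specification.

```python
import re
from collections import Counter

# Simplified semver: MAJOR.MINOR.PATCH with optional pre-release and build parts.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?"
)


def validate_spec_basics(spec: dict) -> list[str]:
    """Check version, modes, and secret pointers; return a list of error messages."""
    errors = []
    if not SEMVER.fullmatch(str(spec.get("version", ""))):
        errors.append("version is not valid semver")
    if not spec.get("modes"):
        errors.append("at least one mode is required")
    for pointer in spec.get("secrets", []):
        # A valid pointer starts with "/" and uses "~" only in "~0" or "~1".
        if not pointer.startswith("/") or re.search(r"~(?![01])", pointer):
            errors.append(f"invalid JSON Pointer: {pointer}")
    return errors


def duplicate_names(specs: list[dict]) -> list[str]:
    """Return component names that appear in more than one spec."""
    counts = Counter(spec["name"] for spec in specs if "name" in spec)
    return sorted(name for name, count in counts.items() if count > 1)


bad = {"version": "1.0", "modes": [], "secrets": ["connection/password", "/a~2b"]}
for error in validate_spec_basics(bad):
    print(error)
```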

Develop & Validate

The Component Registry provides session-aware validation with structured logging:

# Validate a component spec with session logging
$ python osiris.py components validate mysql.writer --level enhanced
✓ Component 'mysql.writer' is valid (level: enhanced)
  Version: 1.0.0
  Modes: write, discover
  Session: components_validate_1735900000000

# View validation session logs
$ python osiris.py logs show --session components_validate_1735900000000
Session: components_validate_1735900000000
Created: 2025-01-03 12:00:00
Events:
  12:00:00.123 component_validation_start    {component: mysql.writer, level: enhanced}
  12:00:00.456 component_validation_complete  {status: ok, errors: 0, duration_ms: 333}

# Custom session with filtered events
$ python osiris.py components validate supabase.extractor \
    --session-id my_validation \
    --level strict \
    --events "component_validation_*" \
    --json
{
  "component": "supabase.extractor",
  "level": "strict",
  "is_valid": true,
  "errors": [],
  "session_id": "my_validation",
  "duration_ms": 250
}

Validation creates structured logs in logs/<session_id>/ including:

events.jsonl: Structured validation events
metrics.jsonl: Performance metrics
osiris.log: Standard application logs
debug.log: Debug-level logs (if enabled)

Best Practices

Secrets: Always use JSON Pointers for secret fields
Examples: Provide 1-2 clear, working examples
LLM Hints: Keep promptGuidance concise (≤500 chars)
Constraints: Document cross-field dependencies clearly
Versioning: Follow semantic versioning strictly
Documentation: Use title and description for clarity

Bootstrap Components (M1a.2)

The following component specifications have been implemented as part of M1a.2:

MySQL Components

mysql.extractor (components/mysql.extractor/spec.yaml)
- Extracts data from MySQL databases
- Supports discovery, custom SQL queries, and bulk extraction
- Configurable connection pooling and batch sizes
mysql.writer (components/mysql.writer/spec.yaml)
- Writes data to MySQL databases
- Supports append, replace, and upsert modes in write mode
- Supports discovery of target schemas
- Transaction support with configurable batch sizes

Supabase Components

supabase.extractor (components/supabase.extractor/spec.yaml)
- Extracts data from Supabase via REST API
- Supports PostgREST filters and joins
- Rate-limited API calls with retry logic
supabase.writer (components/supabase.writer/spec.yaml)
- Writes data to Supabase via REST API
- Supports insert, upsert, and update modes in write mode
- Supports discovery of target schemas
- Conflict resolution via on_conflict specification

Key Design Decisions

Separation of Concerns: Extractors and writers are separate components
Mode-Specific: Extractors support extract and discover; writers support write and discover
Secrets Management: All components declare password/key fields as secrets
LLM Optimization: Each component includes promptGuidance and yamlSnippets
Validation: All examples are validated against their configSchema

Required Configuration by Component

| Component | Required Fields | Notes |
|-----------|-----------------|-------|
| mysql.extractor | host, database, user, password, table | Standard MySQL connection parameters |
| mysql.writer | host, database, user, password, table | Same as extractor |
| supabase.extractor | key, table | Also needs url OR project_id (constraint) |
| supabase.writer | key, table | Also needs url OR project_id (constraint) |
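
The "url OR project_id" rule for the Supabase components is a cross-field constraint that a plain list of required fields cannot express. A minimal sketch of the check, using a hypothetical helper name:

```python
def missing_supabase_fields(config: dict) -> list[str]:
    """Return the required Supabase settings that are absent or empty."""
    missing = [field for field in ("key", "table") if not config.get(field)]
    # At least one of url / project_id must be provided.
    if not (config.get("url") or config.get("project_id")):
        missing.append("url or project_id")
    return missing


print(missing_supabase_fields({"key": "k", "table": "orders"}))
print(missing_supabase_fields({"key": "k", "table": "orders", "project_id": "abc"}))
```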

Capability Flags Explained

| Capability | Meaning | Impact on Agent |
|------------|---------|-----------------|
| discover | Can list tables/schemas | Agent can explore database structure |
| adHocAnalytics | Can execute arbitrary queries | Agent can run custom SQL for analysis |
| inMemoryMove | Supports DataFrame transfers | Can pass data directly between components |
| streaming | Supports stream processing | Can handle real-time data flows |
| bulkOperations | Supports batch operations | Efficient for large datasets |
| transactions | Supports ACID transactions | Ensures data consistency |
| partitioning | Supports partitioned processing | Can parallelize operations |
| customTransforms | Supports custom transformations | Agent can apply user-defined logic |

Current Implementation Status

MySQL Extractor: discover ✓, adHocAnalytics ✓ (execute_query), bulkOperations ✓
MySQL Writer: discover ✓, bulkOperations ✓, transactions ✓ (conn.commit)
Supabase Extractor: discover ✓, bulkOperations ✓ (REST API limits apply)
Supabase Writer: discover ✓, bulkOperations ✓ (batch_size supported)

Note: REST-based components (Supabase) do not support transactions because each HTTP request is stateless.

Migration Notes

When updating component specifications:

Increment version following semver
Maintain backward compatibility when possible
Document breaking changes in constraints
Update examples to reflect changes
Test with existing pipelines before deployment
