Current Implementation
SQLite (Local/CLI Usage)
The project uses SQLite for local operations through two main classes:
- SQLiteDatabaseEngine: https://github.com/iterative/datachain/blob/ed973c82f4ab4674a5a24816eaf689498117aef6/src/datachain/data_storage/sqlite.py#L98
- SQLiteWarehouse: https://github.com/iterative/datachain/blob/ed973c82f4ab4674a5a24816eaf689498117aef6/src/datachain/data_storage/sqlite.py#L406C7-L406C22
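To make the split between the two classes concrete, here is a minimal sketch of how an engine/warehouse pair could be layered on top of Python's standard `sqlite3` module. It is an illustration only: `SketchEngine`, `SketchWarehouse`, the `dataset_rows` table, and every method name are hypothetical and do not mirror the actual signatures in `sqlite.py`. The assumption is that the engine owns the connection and executes SQL, while the warehouse stores rows through it.

```python
import sqlite3


class SketchEngine:
    """Hypothetical stand-in for a database engine: owns the connection, runs SQL."""

    def __init__(self, db_file: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_file)

    def execute(self, sql: str, params: tuple = ()) -> list:
        # Using the connection as a context manager commits on success
        # and rolls back if the statement raises.
        with self.conn:
            return self.conn.execute(sql, params).fetchall()

    def close(self) -> None:
        self.conn.close()


class SketchWarehouse:
    """Hypothetical stand-in for a warehouse: stores dataset rows via an engine."""

    def __init__(self, engine: SketchEngine) -> None:
        self.engine = engine
        # Assumed table layout, chosen only for this example.
        self.engine.execute(
            "CREATE TABLE IF NOT EXISTS dataset_rows "
            "(id INTEGER PRIMARY KEY, path TEXT)"
        )

    def insert_row(self, path: str) -> None:
        self.engine.execute("INSERT INTO dataset_rows (path) VALUES (?)", (path,))

    def count_rows(self) -> int:
        return self.engine.execute("SELECT COUNT(*) FROM dataset_rows")[0][0]


if __name__ == "__main__":
    warehouse = SketchWarehouse(SketchEngine())
    warehouse.insert_row("s3://bucket/a.jpg")
    warehouse.insert_row("s3://bucket/b.jpg")
    print(warehouse.count_rows())
```

Under this layering, a second backend would need its own engine and its own warehouse, which is where the duplication across backends comes from.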
ClickHouse Implementation
ClickHouse is currently used in the SaaS version.
Why Consider Iceberg?
Current Pain Points
- Dual Implementation Overhead:
  - Maintaining separate SQLite and ClickHouse implementations
  - Different transaction and concurrency models
  - Separate optimization strategies
- Performance