Add InfluxDB 3 Core entry #864

Open
alexey-milovidov wants to merge 1 commit into main from add-influxdb-entry
@alexey-milovidov
Member

Summary

  • Adds an influxdb/ benchmark entry targeting InfluxDB 3 Core — the open-source, SQL-capable (DataFusion) build of InfluxDB.
  • load.py streams hits.tsv to /api/v3/write_lp. All 105 columns are stored as fields (no tags), with a unique row-index nanosecond timestamp so points don't merge. Field names are lowercased so standard CamelCase ClickBench queries resolve under DataFusion's identifier-case folding.
  • queries.sql is the standard ClickBench set; only Q19 and Q43 are adapted (CAST(EventTime AS TIMESTAMP)) since EventTime is stored as a string field.
  • Removes InfluxDB from the README TODO list.
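The encoding described above can be sketched roughly as follows. This is a minimal illustration, not the actual load.py: the function names, the `(name, kind)` column list, and the two-way int/string type split are assumptions made for the example. It shows the three points from the summary: every column becomes a field (no tag set), field names are lowercased, and the row index is used as the timestamp so that no two points share a series and timestamp.

```python
def escape_string_field(value: str) -> str:
    # Line-protocol string field values are double-quoted; backslashes and
    # double quotes inside the value are backslash-escaped.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def encode_point(measurement, columns, values, row_index):
    """Encode one TSV row as a line-protocol point.

    `columns` is a hypothetical list of (name, kind) pairs where kind is
    "int" or "str"; the real loader may derive types differently.
    """
    fields = []
    for (name, kind), raw in zip(columns, values):
        key = name.lower()  # lowercased so unquoted CamelCase SQL identifiers resolve
        if kind == "int":
            fields.append(f"{key}={int(raw)}i")  # trailing "i" marks an integer field
        else:
            fields.append(f"{key}={escape_string_field(raw)}")
    # No tag set: the measurement name is followed directly by the fields,
    # then the row index as a unique timestamp.
    return f"{measurement} {','.join(fields)} {row_index}"


if __name__ == "__main__":
    cols = [("WatchID", "int"), ("Title", "str"), ("EventTime", "str")]
    print(encode_point("hits", cols, ["123", "Example", "2013-07-14 20:38:47"], 7))
```

Because EventTime is written as a quoted string in this scheme, queries that need time functions have to cast it, which is why Q19 and Q43 are adapted.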

Validation

  • Verified the install + start + create-db + load + query flow on a 1000-row sample of hits.tsv. All 43 queries returned three timings each — no nulls, no errors. Spot-checked Q19 (extract(minute ...)), Q29 (REGEXP_REPLACE), and Q43 (DATE_TRUNC('minute', ...)) — all returned sensible rows.
  • Full 100M-row load not run on this branch — that is best done on a benchmark VM. At ~1 KB per line-protocol point this will take some hours; results to be added after a real run.
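As a back-of-the-envelope check on the "some hours" estimate, the arithmetic can be sketched as below. The row count and the ~1 KB per point come from the description, and the 1000-row batch size from the commit message; the sustained ingest throughput is purely an assumed placeholder, since no real load has been measured yet.

```python
def estimate_load(rows, bytes_per_point, batch_size, bytes_per_second):
    """Return (total_bytes, batches, seconds) for a line-protocol load.

    `bytes_per_second` is an assumed sustained ingest rate, not a measured one.
    """
    total_bytes = rows * bytes_per_point
    batches = -(-rows // batch_size)  # ceiling division
    seconds = total_bytes / bytes_per_second
    return total_bytes, batches, seconds


if __name__ == "__main__":
    # 100M rows at ~1 KB each, 1000-row batches, assumed 10 MB/s ingest.
    total, batches, seconds = estimate_load(100_000_000, 1000, 1000, 10_000_000)
    print(f"{total / 1e9:.0f} GB of line protocol, {batches} POSTs, "
          f"{seconds / 3600:.1f} h at the assumed rate")
```

Under these assumptions the load is about 100 GB of line protocol sent in 100,000 requests; the wall-clock time scales inversely with whatever throughput the server actually sustains.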

Test plan

  • Run benchmark.sh on a c6a.4xlarge VM and capture Load time / Data size / 43 query timings.
  • Add influxdb/results/c6a.4xlarge.json.

🤖 Generated with Claude Code

Adds an entry for the open-source SQL build of InfluxDB. The query engine is
Apache DataFusion; ingestion is line protocol over /api/v3/write_lp because
there is no native CSV/Parquet bulk loader. load.py streams hits.tsv, encodes
each row as a line-protocol point with a unique row-index timestamp, and POSTs
in 1000-row batches. Field names are lowercased so the standard CamelCase
ClickBench queries resolve under DataFusion's identifier folding. Q19 and Q43
cast EventTime (stored as a string field) to TIMESTAMP for extract(minute) and
date_trunc('minute', ...). Removes InfluxDB from the README TODO list.
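The batching and POST loop might look roughly like the sketch below. It is illustrative only: the helper names, the base URL, and the `db` and `precision=nanosecond` query parameters are assumptions about how the endpoint is called and should be checked against the InfluxDB 3 Core documentation; only the `/api/v3/write_lp` path and the 1000-row batch size come from the commit message. An auth token may also be required depending on how the server is started.

```python
import itertools
import urllib.parse
import urllib.request


def batched(lines, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def build_write_request(base_url, database, batch, token=None):
    """Build a POST request carrying one batch of line-protocol points."""
    # Query parameter names and values are assumptions; verify against the docs.
    query = urllib.parse.urlencode({"db": database, "precision": "nanosecond"})
    headers = {"Content-Type": "text/plain; charset=utf-8"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/api/v3/write_lp?{query}",
        data="\n".join(batch).encode("utf-8"),  # one point per line
        headers=headers,
        method="POST",
    )


def post_all(base_url, database, lines, batch_size=1000, token=None):
    """Stream points to the server in fixed-size batches."""
    for batch in batched(lines, batch_size):
        request = build_write_request(base_url, database, batch, token)
        # urlopen raises urllib.error.HTTPError on a non-2xx response.
        with urllib.request.urlopen(request) as response:
            response.read()
```

Since `batched` consumes its input lazily, `post_all` can be fed a generator over hits.tsv without holding the whole file in memory.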

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov
Member Author

Data loading is painfully slow.
