Add InfluxDB 3 Core entry #864
Open
alexey-milovidov wants to merge 1 commit into main from
Adds an entry for the open-source SQL build of InfluxDB. The query engine is
Apache DataFusion; ingestion is line protocol over /api/v3/write_lp because
there is no native CSV/Parquet bulk loader. load.py streams hits.tsv, encodes
each row as a line-protocol point with a unique row-index timestamp, and POSTs
in 1000-row batches. Field names are lowercased so the standard CamelCase
ClickBench queries resolve under DataFusion's identifier folding. Q19 and Q43
cast EventTime (stored as a string field) to TIMESTAMP for extract(minute) and
date_trunc('minute', ...). Removes InfluxDB from the README TODO list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
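The load path described above could be sketched roughly as follows. This is a minimal illustration, not the actual load.py: the helper names (`escape_string_field`, `encode_point`, `batches`), the `hits` measurement name, and the two-type column spec (`"int"` or `"str"`) are assumptions made for the example. It covers only the encoding and batching steps; the HTTP POST to /api/v3/write_lp is left out.

```python
from typing import Iterable, Iterator, Sequence


def escape_string_field(value: str) -> str:
    """Escape a line-protocol string field value (backslash and double quote)."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def encode_point(
    measurement: str,
    columns: Sequence[tuple[str, str]],  # (column name, "int" or "str"): hypothetical spec
    row: Sequence[str],
    row_index: int,
) -> str:
    """Encode one TSV row as a line-protocol point with no tags.

    Field names are lowercased, and the row index is used as the timestamp
    so that every point is unique and points do not merge.
    """
    fields = []
    for (name, kind), raw in zip(columns, row):
        key = name.lower()
        if kind == "int":
            fields.append(f"{key}={int(raw)}i")  # integer fields carry an "i" suffix
        else:
            fields.append(f'{key}="{escape_string_field(raw)}"')
    return f"{measurement} {','.join(fields)} {row_index}"


def batches(lines: Iterable[str], size: int = 1000) -> Iterator[str]:
    """Group encoded points into newline-joined request bodies of at most `size` lines."""
    batch: list[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield "\n".join(batch)
            batch = []
    if batch:
        yield "\n".join(batch)
```

Each string yielded by `batches` would be the body of one write request. Keeping the encoder separate from the batching makes it easy to try other batch sizes if load time is a concern.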
Data loading is painfully slow.
Summary
- influxdb/ benchmark entry targeting InfluxDB 3 Core, the open-source, SQL-capable (DataFusion) build of InfluxDB.
- load.py streams hits.tsv to /api/v3/write_lp. All 105 columns are stored as fields (no tags), with a unique row-index nanosecond timestamp so points don't merge. Field names are lowercased so standard CamelCase ClickBench queries resolve under DataFusion's identifier-case folding.
- queries.sql is the standard ClickBench set; only Q19 and Q43 are adapted (CAST(EventTime AS TIMESTAMP)) since EventTime is stored as a string field.
- Removes InfluxDB from the README TODO list.

Validation
- hits.tsv. All 43 queries returned three timings each: no nulls, no errors. Spot-checked Q19 (extract(minute ...)), Q29 (REGEXP_REPLACE), and Q43 (DATE_TRUNC('minute', ...)); all returned sensible rows.

Test plan
- benchmark.sh on a c6a.4xlarge VM and capture Load time / Data size / 43 query timings.
- influxdb/results/c6a.4xlarge.json.

🤖 Generated with Claude Code
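The validation criterion above (43 queries, three timings each, no nulls) can be checked mechanically. The sketch below assumes the timings are available as a list of per-query lists, with `None` standing in for a null. The function name `find_bad_timings` and that layout are assumptions for illustration, not part of this PR.

```python
from typing import Optional, Sequence


def find_bad_timings(
    result: Sequence[Sequence[Optional[float]]],
    expected_queries: int = 43,
    runs: int = 3,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means the run looks complete."""
    problems = []
    if len(result) != expected_queries:
        problems.append(f"expected {expected_queries} queries, got {len(result)}")
    # Queries are numbered from 1 here, matching the Q19 / Q43 naming used above.
    for number, timings in enumerate(result, start=1):
        if len(timings) != runs:
            problems.append(f"Q{number}: expected {runs} timings, got {len(timings)}")
        elif any(t is None for t in timings):
            problems.append(f"Q{number}: null timing")
    return problems
```

An empty return value corresponds to the "no nulls, no errors" outcome reported in the validation notes.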