Intern queued query text in shared HTAB to bound DSA usage #92
Open
iskakaushik wants to merge 1 commit into main
Without interning, every queued event owned a private DSA copy of the normalized query text: live DSA usage grew as `queued_events * query_len` and exhausted the bounded DSA pool well before the queue reached capacity. Repeated long normalized queries were the worst case.

Add a shared, partition-locked HTAB whose entries point at refcount-managed DSA bodies, and route TryEnqueueLocked / PschDequeueEvent through it for query text. Live DSA usage drops to `distinct_live_query_texts * query_len`. Error messages stay per-event for now (separate optimization).

The pattern mirrors pg_stat_statements (shared HTAB sized via hash_estimate_size + ShmemInitHash) and pgstat_shmem (refcounted DSA bodies freed only after the HTAB entry is removed).

Adds t/032_query_intern.pl: 6000 EXECUTEs of a long normalized query through an 8MB DSA pool finish with dsa_oom_count == 0; the same workload without interning would push ~12MB through an 8MB pool and OOM.
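The sizing claim above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the 2047-byte length comes from the PR summary, and per-object header overhead in the DSA pool is ignored.

```c
#include <stdint.h>

/* Hypothetical helpers; they only restate the two formulas from the PR text. */

/* Live query-text bytes when every queued event owns a private copy. */
static uint64_t dsa_bytes_without_interning(uint64_t queued_events,
                                            uint64_t query_len)
{
    return queued_events * query_len;
}

/* Live query-text bytes when identical texts share one refcounted body. */
static uint64_t dsa_bytes_with_interning(uint64_t distinct_live_texts,
                                         uint64_t query_len)
{
    return distinct_live_texts * query_len;
}
```

With 6000 queued events of 2047 bytes each, the first formula gives 12,282,000 bytes, which is more than the 8,388,608 bytes of an 8MB pool; with a single distinct text, the second gives 2047 bytes.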
Pull request overview
This PR introduces a shared-memory query-text interner for the pg_stat_ch queue so repeated identical normalized queries share a single DSA-backed copy, bounding DSA usage to the number of distinct live query texts instead of the number of queued events.
Changes:
- Add a partition-locked shared HTAB + refcounted DSA objects for interned query text (`query_intern.{h,c}`).
- Route queue slot query text through the interner (acquire on enqueue, resolve+release on dequeue) and adjust shmem sizing/lock tranche usage (`shmem.cc`).
- Add a TAP test that stresses tight DSA settings with many repeated long EXECUTEs (`t/032_query_intern.pl`).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| t/032_query_intern.pl | New TAP test to validate that repeated long normalized query text does not exhaust a tight DSA pool. |
| src/queue/shmem.cc | Integrates query interning into enqueue/dequeue and adjusts shared memory sizing + LWLock tranche allocation. |
| src/queue/query_intern.h | Declares the shared query-text interner API and documents its design. |
| src/queue/query_intern.c | Implements partition-locked HTAB interning with refcounted DSA-backed query bodies. |
| src/queue/psch_dsa.h | Adds a forward typedef so PschSharedState can be referenced cleanly from C translation units. |
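To make the acquire / resolve-and-release lifecycle concrete, here is a minimal single-process sketch. It is not the code in `query_intern.c`: a fixed array stands in for the shared HTAB, `malloc`/`free` stand in for DSA allocation, a single `uint64_t` stands in for the composite key, there is no partition locking, and every `toy_*` / `Toy*` name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SLOTS 64            /* stand-in for the fixed-size shared HTAB */

typedef struct ToyEntry {
    uint64_t key;               /* stand-in for the interner's composite key */
    char    *body;              /* malloc'd stand-in for a DSA body; NULL = free slot */
    size_t   len;
    uint32_t refcount;
} ToyEntry;

static ToyEntry toy_table[TOY_SLOTS];

/* Return the shared entry for (key, text), creating it on first use.
 * Returns NULL if the table is full, allocation fails, or the key is
 * already bound to different bytes. */
static ToyEntry *toy_acquire(uint64_t key, const char *text, size_t len)
{
    ToyEntry *free_slot = NULL;

    for (int i = 0; i < TOY_SLOTS; i++) {
        ToyEntry *e = &toy_table[i];
        if (e->body == NULL) {
            if (free_slot == NULL)
                free_slot = e;  /* remember the first free slot */
            continue;
        }
        if (e->key != key)
            continue;
        if (e->len != len || memcmp(e->body, text, len) != 0)
            return NULL;        /* collision: prefer no text over wrong text */
        e->refcount++;          /* hit: share the existing body */
        return e;
    }
    if (free_slot == NULL)
        return NULL;            /* table full */

    char *copy = malloc(len > 0 ? len : 1);
    if (copy == NULL)
        return NULL;            /* allocation failure */
    memcpy(copy, text, len);
    free_slot->key = key;
    free_slot->body = copy;
    free_slot->len = len;
    free_slot->refcount = 1;
    return free_slot;
}

/* Copy the text out for the caller, then drop one reference.
 * The body is freed when the last reference goes away. */
static size_t toy_resolve_and_release(ToyEntry *e, char *out, size_t out_cap)
{
    assert(e != NULL && e->body != NULL && e->refcount > 0);
    size_t n = e->len < out_cap ? e->len : out_cap;
    memcpy(out, e->body, n);
    if (--e->refcount == 0) {
        free(e->body);
        e->body = NULL;
        e->len = 0;
    }
    return n;
}

/* Total bytes currently held by live bodies. */
static size_t toy_live_bytes(void)
{
    size_t total = 0;
    for (int i = 0; i < TOY_SLOTS; i++)
        if (toy_table[i].body != NULL)
            total += toy_table[i].len;
    return total;
}
```

Acquiring the same text twice yields one body with a refcount of 2, so `toy_live_bytes()` stays at one copy's length until the second release. The real interner additionally has to handle concurrent backends, which is what the partition locks and the allocate-then-recheck step described in the PR are for.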
Comment on lines +37 to +39
    // Add the HTAB size requirement to the running total of extension shmem.
    // Caller passes the shared memory size accumulator; this function increases
    // it to include hash_estimate_size for the interner.
Comment on lines +61 to +66
    // the DSA handle is unavailable, DSA allocation fails, the shared HTAB is
    // full, or a hash collision is detected against a different query text
    // (collisions are treated as a miss with no insert - exporting empty query
    // text is preferable to exporting the wrong SQL).
    //
    // Must be called by a backend that has already attached to the DSA area.
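The best-effort contract described in this comment, where a failed acquire degrades to empty query text while the rest of the event is kept, could look roughly like the sketch below on the caller side. This is an assumption-laden illustration, not the code in `shmem.cc`: `ToyEvent`, `toy_attach_query_text`, and `toy_oom_count` are hypothetical names, and a NULL pointer plays the role of `InvalidDsaPointer`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyEvent {
    uint64_t    queryid;        /* numeric telemetry: always preserved */
    uint64_t    duration_us;    /* numeric telemetry: always preserved */
    const char *query_ref;      /* NULL plays the role of InvalidDsaPointer */
    uint32_t    query_len;
} ToyEvent;

static uint64_t toy_oom_count = 0;  /* stand-in for the dsa_oom_count counter */

/* Attach interned text to an event. On failure the event keeps its numeric
 * fields and is still usable, but carries empty query text.
 * Returns true if text was attached. */
static bool toy_attach_query_text(ToyEvent *ev, const char *interned,
                                  uint32_t len, bool alloc_failed)
{
    if (interned == NULL) {
        if (alloc_failed)
            toy_oom_count++;    /* only allocation failures count as OOM */
        ev->query_ref = NULL;
        ev->query_len = 0;      /* export empty text rather than wrong text */
        return false;
    }
    ev->query_ref = interned;
    ev->query_len = len;
    return true;
}
```

The design point is that a miss never blocks or drops the event: the queue entry is still written, just without SQL text.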
Comment on lines +11 to +15
    # Strategy:
    # 1. Configure an unreachable ClickHouse so the bgworker cannot drain the
    # queue. Events accumulate.
    # 2. Set `string_area_size = 8MB` (the minimum allowed) so the DSA pool is
    # tight. 6000 × ~2KB unique copies would be ~12MB.
Summary
- Live DSA usage for query text drops from `queued_events * query_len` to `distinct_live_query_texts * query_len`.
- The pattern mirrors `pg_stat_statements` (shared HTAB sized via `hash_estimate_size` + `ShmemInitHash`) and `pgstat_shmem` (refcounted DSA bodies freed only after the HTAB entry is removed).

Why
Without interning, every queued event owned a private DSA copy of the normalized query text. With `query_len ~= 2047` bytes, the DSA string pool was exhausted well before the queue itself reached capacity, especially under workloads that repeatedly executed the same long normalized query.

What changed
- `src/queue/query_intern.{h,c}`: pure-C interner: 32-partition LWLock HTAB → DSA-allocated `PschQueryInternObject` (key + magic + bytes), refcount on entry. `Acquire` allocates outside the partition lock and re-checks under it; `ResolveAndRelease` copies bytes (caller's slot is the live reference) then drops the refcount; on last release the entry is removed and the DSA body freed outside the partition lock.
- `src/queue/shmem.cc`: split shmem sizing into `[state + ring + DSA]` (passed to `ShmemInitStruct`) and the HTAB pool (allocated by `ShmemInitHash` from the same `RequestAddinShmemSpace` reservation). Request `1 + 32` LWLocks in the existing `pg_stat_ch` named tranche. Init the interner under `AddinShmemInitLock`. Replace the per-event `PschDsaAllocString` / `PschDsaResolveString` calls for query text with the new `Acquire` / `ResolveAndRelease`.
- `src/queue/psch_dsa.h`: add `typedef struct PschSharedState PschSharedState;` so the bare type is usable from pure C.

Failure modes (best-effort telemetry preserved)
- […] `InvalidDsaPointer`, caller sets `query_len = 0` (numeric data preserved). `dsa_oom_count` still bumped.
- HTAB full (`HASH_ENTER_NULL`) → free loser allocation, `InvalidDsaPointer`, `query_len = 0`.
- […] `refcount++`.
- […] `(dbid, queryid, query_hash, query_len)`) → treat as miss, return `InvalidDsaPointer` so we export empty rather than wrong SQL.

Test plan
- `t/032_query_intern.pl`: drives 6000 EXECUTEs of a long normalized query through an 8MB DSA pool. Asserts `enqueued >= 5000` and `dsa_oom_count == 0`. The same workload without interning would push ~12MB through an 8MB pool and OOM.
- […] `001-009`, `015`, `017`, `020`, `022`.
- […] `010`, `011`, `021`, `023-025`, `027`, `031`, `016` single-cycle) verified failing locally on a clean `main` worktree with the same deterministic checksum error: pre-existing local container/version issue, not introduced here. CI should be authoritative.
- `028`, `029` reference a `pg_stat_ch.debug_throw_in_export` GUC that doesn't exist in the tree: pre-existing, unrelated.

🤖 Generated with Claude Code