Skip to content

fix: account for initial hash table allocation in ArrowBytesMap/ArrowBytesViewMap#21249

Open
yashrb24 wants to merge 3 commits intoapache:mainfrom
yashrb24:fix/binary-map-initial-map-size
Open

fix: account for initial hash table allocation in ArrowBytesMap/ArrowBytesViewMap#21249
yashrb24 wants to merge 3 commits intoapache:mainfrom
yashrb24:fix/binary-map-initial-map-size

Conversation

@yashrb24
Copy link
Copy Markdown

@yashrb24 yashrb24 commented Mar 30, 2026

Which issue does this PR close?

Rationale for this change

ArrowBytesMap and ArrowBytesViewMap allocate their hash tables with HashTable::with_capacity(INITIAL_MAP_CAPACITY) but initialize map_size to 0. The insert_accounted method only tracks incremental growth beyond the current capacity, so the initial allocation from with_capacity is never counted. size() understates memory usage until the first resize.

What changes are included in this PR?

Initialize map_size with map.allocation_size() in both ArrowBytesMap::new and ArrowBytesViewMap::new to capture the pre-allocated memory.

Are these changes tested?

The change is a one-line init fix. Existing tests cover the size() method behavior.

Are there any user-facing changes?

No API changes. size() now returns the memory estimate that includes the initial hash table allocation.

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Mar 30, 2026
…BytesViewMap

ArrowBytesMap and ArrowBytesViewMap initialize their hash table with
with_capacity(INITIAL_MAP_CAPACITY) but set map_size to 0. The
insert_accounted method only tracks incremental growth beyond the
current capacity, so the initial allocation is never counted in
size() — understating memory usage.

Initialize map_size with map.allocation_size() to capture the
pre-allocated memory.
@yashrb24 yashrb24 force-pushed the fix/binary-map-initial-map-size branch from 03429d9 to 0673088 Compare March 30, 2026 12:08
yashrb24 and others added 2 commits March 30, 2026 18:16
Verify that ArrowBytesMap::size() and ArrowBytesViewMap::size() account
for the hash table memory pre-allocated by HashTable::with_capacity().
Both tests fail without the fix (map_size: 0) and pass with it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ArrowBytesMap and ArrowBytesViewMap undercount memory by not accounting for initial hash table allocation

1 participant