Describe the bug
`ArrowBytesMap` and `ArrowBytesViewMap` allocate their hash tables with `HashTable::with_capacity(INITIAL_MAP_CAPACITY)` (128 and 512 entries, respectively) but initialize `map_size` to `0`. The `size()` method returns `map_size` as the hash table's contribution to total memory.
The `insert_accounted` method (from `HashTableAllocExt`) only adds to `map_size` when the table must grow beyond its current capacity. Because the initial allocation happens in `with_capacity` rather than through `insert_accounted`, those bytes are never counted. Before the first resize, `size()` reports nothing for the table. After a resize, it still understates memory usage by the size of that initial allocation.
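The effect can be reproduced with a deliberately simplified model. This is a hypothetical sketch: `ToyTable` and `accounted_vs_allocated` are illustrative names, not DataFusion or hashbrown APIs. The model ignores hashbrown's bucket rounding, load factor, and control bytes. It treats capacity as exactly the number requested and grows by exactly `bump_elements` slots, following the growth-only rule of `insert_accounted`.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical, simplified model of a pre-allocated hash table: it tracks only
/// length and capacity, with `size_of::<T>()` bytes per unit of capacity.
pub struct ToyTable<T> {
    len: usize,
    capacity: usize,
    _elem: PhantomData<T>,
}

impl<T> ToyTable<T> {
    /// Models `HashTable::with_capacity`: memory is reserved up front,
    /// outside of any accounting.
    pub fn with_capacity(capacity: usize) -> Self {
        Self { len: 0, capacity, _elem: PhantomData }
    }

    /// Bytes this model actually holds for elements.
    pub fn allocated_bytes(&self) -> usize {
        self.capacity * size_of::<T>()
    }

    /// Models the growth-only rule: `accounting` is bumped only when the
    /// table is full and must grow.
    pub fn insert_accounted(&mut self, accounting: &mut usize) {
        if self.len == self.capacity {
            let bump_elements = self.capacity.max(16);
            let bump_size = bump_elements * size_of::<T>();
            *accounting = (*accounting).checked_add(bump_size).expect("overflow");
            self.capacity += bump_elements;
        }
        self.len += 1;
    }
}

/// Inserts `n` elements into a fresh table of `initial` capacity, with
/// `map_size` starting at 0 as in the current constructors, and returns
/// `(map_size, allocated_bytes)`.
pub fn accounted_vs_allocated<T>(initial: usize, n: usize) -> (usize, usize) {
    let mut table = ToyTable::<T>::with_capacity(initial);
    let mut map_size = 0;
    for _ in 0..n {
        table.insert_accounted(&mut map_size);
    }
    (map_size, table.allocated_bytes())
}
```

With `T = u64` and an initial capacity of 128, 100 inserts leave `map_size` at 0 while the model holds 1024 bytes. After the first resize (129 inserts), `map_size` is 1024 against 2048 allocated, so the gap equals the initial allocation instead of closing.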
```rust
// binary_map.rs: INITIAL_MAP_CAPACITY = 128
// binary_view_map.rs: INITIAL_MAP_CAPACITY = 512
pub fn new(output_type: OutputType) -> Self {
    Self {
        map: hashbrown::hash_table::HashTable::with_capacity(INITIAL_MAP_CAPACITY),
        map_size: 0, // ← should be map.allocation_size()
        // ...
    }
}
```
`insert_accounted` only tracks growth:
```rust
fn insert_accounted(&mut self, x: Self::T, hasher: impl Fn(&Self::T) -> u64, accounting: &mut usize) {
    let hash = hasher(&x);
    if self.len() == self.capacity() {
        // Growth is the only point where the table's memory is charged to `accounting`.
        let bump_elements = self.capacity().max(16);
        let bump_size = bump_elements * size_of::<T>();
        *accounting = (*accounting).checked_add(bump_size).expect("overflow");
        self.reserve(bump_elements, &hasher);
    }
    self.insert_unique(hash, x, hasher);
}
```
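A back-of-the-envelope check of why the gap never closes, assuming an idealized table that gains exactly `bump_elements` slots per resize and an initial capacity $C_0 \ge 16$ (true for 128 and 512). Here $s$ stands for `size_of::<T>()`, $C_k$ is the capacity after $k$ resizes, and $A_k$ is the accounted `map_size` starting from 0. Hashbrown's bucket rounding and control bytes are ignored. Each resize charges the current capacity and then doubles it:

$$
C_k = 2^k C_0, \qquad
A_k = \sum_{j=0}^{k-1} C_j\, s = (2^k - 1)\, C_0\, s, \qquad
C_k\, s - A_k = C_0\, s
$$

Under these assumptions the unaccounted memory stays equal to the initial allocation $C_0\, s$ however many times the table grows.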
To Reproduce
N/A
Expected behavior
`map_size` should be initialized with `map.allocation_size()` to account for the pre-allocated memory from `with_capacity`.
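A minimal sketch of the expected fix on a hypothetical simplified model. `ToyTable`, its `allocation_size`, and `ToyBytesMap` are illustrative stand-ins, not the real hashbrown or DataFusion types. The model has no buckets, load factor, or control bytes, and stores `u64` elements. Seeding `map_size` from the table's allocation at construction keeps `size()` equal to the allocated bytes across resizes, and a zero initial capacity seeds it with 0.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical simplified table: capacity is exactly what was requested,
/// with `size_of::<T>()` bytes per slot.
pub struct ToyTable<T> {
    len: usize,
    capacity: usize,
    _elem: PhantomData<T>,
}

impl<T> ToyTable<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        Self { len: 0, capacity, _elem: PhantomData }
    }

    /// Stand-in for `HashTable::allocation_size()` in this model.
    pub fn allocation_size(&self) -> usize {
        self.capacity * size_of::<T>()
    }

    /// Growth-only accounting: charge `accounting` only when the table grows.
    pub fn insert_accounted(&mut self, accounting: &mut usize) {
        if self.len == self.capacity {
            let bump_elements = self.capacity.max(16);
            let bump_size = bump_elements * size_of::<T>();
            *accounting = (*accounting).checked_add(bump_size).expect("overflow");
            self.capacity += bump_elements;
        }
        self.len += 1;
    }
}

/// Hypothetical owner mirroring the `map` / `map_size` fields.
pub struct ToyBytesMap {
    map: ToyTable<u64>,
    map_size: usize,
}

impl ToyBytesMap {
    pub fn new(initial_capacity: usize) -> Self {
        let map: ToyTable<u64> = ToyTable::with_capacity(initial_capacity);
        // The fix: charge the up-front allocation at construction time.
        let map_size = map.allocation_size();
        Self { map, map_size }
    }

    pub fn insert(&mut self) {
        self.map.insert_accounted(&mut self.map_size);
    }

    /// Hash-table contribution to memory, as `size()` reports it.
    pub fn size(&self) -> usize {
        self.map_size
    }
}
```

For example, `ToyBytesMap::new(128)` reports 1024 bytes immediately, and after 300 inserts (two resizes) it reports 4096, matching the 512 slots the model then holds.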
Additional context
Two other uses of `map_size: 0` in the codebase (`row.rs` and `multi_group_by/mod.rs`) use `HashTable::with_capacity(0)`, which allocates nothing, so they are already correct.