
ArrowBytesMap and ArrowBytesViewMap undercount memory by not accounting for initial hash table allocation #21248

@yashrb24

Describe the bug

ArrowBytesMap and ArrowBytesViewMap allocate their hash tables with HashTable::with_capacity(INITIAL_MAP_CAPACITY) (128 and 512 entries respectively) but initialize map_size to 0. The size() method returns map_size as the hash table's contribution to total memory.

The insert_accounted method (from HashTableAllocExt) only adds to map_size when the table needs to grow beyond its current capacity. Since the initial allocation happens in with_capacity, not through insert_accounted, those bytes are never counted — size() understates memory usage until the first resize.

// binary_map.rs — INITIAL_MAP_CAPACITY = 128
// binary_view_map.rs — INITIAL_MAP_CAPACITY = 512
pub fn new(output_type: OutputType) -> Self {
    Self {
        map: hashbrown::hash_table::HashTable::with_capacity(INITIAL_MAP_CAPACITY),
        map_size: 0,  // ← should be map.allocation_size()
        // ...
    }
}

insert_accounted only tracks growth:

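fn insert_accounted(&mut self, x: Self::T, hasher: impl Fn(&Self::T) -> u64, accounting: &mut usize) {
    let hash = hasher(&x);
    if self.len() == self.capacity() {
        // Table is full: charge the upcoming growth to `accounting`, then reserve.
        let bump_elements = self.capacity().max(16);
        let bump_size = bump_elements * size_of::<T>();
        *accounting = (*accounting).checked_add(bump_size).expect("overflow");
        self.reserve(bump_elements, &hasher);
    }
    // Nothing is added to `accounting` while there is still spare capacity.
    self.insert_unique(hash, x, hasher);
}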
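
To make the gap concrete without pulling in hashbrown, here is a minimal std-only sketch of the same accounting pattern. `ModelMap`, `reported_size`, `allocated_bytes`, and `undercount_demo` are hypothetical names for illustration, and a `Vec<u64>` stands in for the hash table, so the byte counts are not what hashbrown would actually allocate.

```rust
use std::mem::size_of;

const INITIAL_MAP_CAPACITY: usize = 128;

/// Hypothetical stand-in for the map: the Vec's capacity plays the role
/// of the hash table's pre-allocated slots.
struct ModelMap {
    slots: Vec<u64>,
    map_size: usize,
}

impl ModelMap {
    fn new() -> Self {
        Self {
            slots: Vec::with_capacity(INITIAL_MAP_CAPACITY),
            map_size: 0, // mirrors the reported bug: pre-allocation is not counted
        }
    }

    /// Same shape as the accounting described above: only growth is charged.
    fn insert_accounted(&mut self, x: u64) {
        if self.slots.len() == self.slots.capacity() {
            let bump_elements = self.slots.capacity().max(16);
            let bump_size = bump_elements * size_of::<u64>();
            self.map_size = self.map_size.checked_add(bump_size).expect("overflow");
            self.slots.reserve(bump_elements);
        }
        self.slots.push(x);
    }

    /// What the accounting reports.
    fn reported_size(&self) -> usize {
        self.map_size
    }

    /// What the stand-in buffer actually holds in bytes.
    fn allocated_bytes(&self) -> usize {
        self.slots.capacity() * size_of::<u64>()
    }
}

/// Fill the pre-allocated capacity and return (reported, allocated).
fn undercount_demo() -> (usize, usize) {
    let mut m = ModelMap::new();
    for i in 0..INITIAL_MAP_CAPACITY as u64 {
        m.insert_accounted(i);
    }
    (m.reported_size(), m.allocated_bytes())
}
```

In this model, filling all 128 pre-allocated slots never triggers the growth branch, so the reported size stays at 0 while the buffer holds at least 128 * 8 bytes.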

To Reproduce

N/A

Expected behavior

map_size should be initialized with map.allocation_size() to account for the pre-allocated memory from with_capacity.
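
The proposed fix amounts to seeding the counter from the allocation made in the constructor. The sketch below shows that idea with std only; `SeededMap` and `allocated_bytes` are hypothetical names, and a `Vec<T>` again stands in for the hash table. In the real code the seed would presumably be `map.allocation_size()`, as suggested above, rather than `capacity * size_of::<T>()`.

```rust
use std::mem::size_of;

const INITIAL_MAP_CAPACITY: usize = 512;

/// Hypothetical stand-in for a map whose size accounting starts from
/// the bytes it pre-allocates.
struct SeededMap<T> {
    slots: Vec<T>,
    map_size: usize,
}

impl<T> SeededMap<T> {
    fn new() -> Self {
        let slots: Vec<T> = Vec::with_capacity(INITIAL_MAP_CAPACITY);
        // Seed the counter with what was allocated up front instead of 0.
        let map_size = slots.capacity() * size_of::<T>();
        Self { slots, map_size }
    }

    /// Reported contribution of the table to total memory.
    fn size(&self) -> usize {
        self.map_size
    }

    /// Bytes held by the stand-in buffer, for comparison.
    fn allocated_bytes(&self) -> usize {
        self.slots.capacity() * size_of::<T>()
    }
}
```

A regression test along these lines could assert that a freshly constructed map reports a non-zero size that matches its initial allocation, for example `SeededMap::<u64>::new().size() >= 512 * 8` in this model.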

Additional context

Two other uses of map_size: 0 in the codebase (row.rs and multi_group_by/mod.rs) use HashTable::with_capacity(0) which allocates nothing, so they are already correct.
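
As a rough analogy using only the standard library, `std::collections::HashMap` shows the same difference between a zero and a non-zero initial capacity. This is an illustration with std's map, not hashbrown's `HashTable` itself, and `initial_capacities` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Returns the capacities of a map created with capacity 0 and one
/// created with capacity 128.
fn initial_capacities() -> (usize, usize) {
    // A requested capacity of 0 does not allocate, so there is nothing to count.
    let empty: HashMap<u64, u64> = HashMap::with_capacity(0);
    // A non-zero request reserves room for at least that many entries up front.
    let preallocated: HashMap<u64, u64> = HashMap::with_capacity(128);
    (empty.capacity(), preallocated.capacity())
}
```

Starting `map_size` at 0 is therefore only accurate when the table starts out with no allocation.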
