Summary
The input_format_parquet_metadata_cache_max_size server setting is documented and intended as a byte limit for the Parquet file metadata cache, but the implementation uses the default per-entry weight of 1 (entry count), not bytes. As a result, the cache can grow far beyond the configured "maximum size," leading to unbounded memory use, OOM risk, and availability loss when querying many unique Parquet objects.
Source: Audit of PR #1385 – Antalya 26.1 – Forward port of parquet metadata caching.
Impact
- Correctness/availability: High — resource exhaustion and potential process crash.
- Likelihood: Realistic for object-storage workloads with high file cardinality.
- Blast radius: Process-wide (global singleton cache).
- Exploitability: Operationally easy (querying many unique Parquet objects is normal usage).
Root cause
- Product contract: the setting is described as "Maximum size of parquet file metadata cache" and defaults to 500000000 (bytes).
- Implementation: ParquetFileMetaDataCache extends CacheBase with the default policy, which uses EqualWeightFunction: every cache entry has weight 1, so the limit is enforced as a number of entries, not bytes.
- Result: with many unique path:etag keys and non-trivial metadata per file, the cache can retain far more than the intended byte budget, because eviction is driven by entry count rather than memory size.
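The mismatch can be illustrated with a minimal weighted-LRU sketch (the names `WeightedLruCache`, `equal_weight`, and `byte_weight` below are illustrative, not the actual CacheBase API): with a per-entry weight of 1, a "max size" configured in bytes degenerates into an entry-count cap, while a byte-weight function enforces the intended budget.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical sketch, not the ClickHouse CacheBase implementation.
struct Entry { std::string key; size_t bytes; };

// Weight = 1 per entry: "max_weight" becomes an entry-count cap (the bug).
size_t equal_weight(const Entry &) { return 1; }

// Weight = payload bytes: "max_weight" is an actual byte budget (the intent).
size_t byte_weight(const Entry & e) { return e.bytes; }

class WeightedLruCache
{
public:
    using WeightFn = size_t (*)(const Entry &);

    WeightedLruCache(size_t max_weight_, WeightFn weight_fn_)
        : max_weight(max_weight_), weight_fn(weight_fn_) {}

    void put(Entry e)
    {
        current_weight += weight_fn(e);
        entries.push_back(std::move(e));
        // Evict oldest entries until total weight is back under budget.
        while (current_weight > max_weight && !entries.empty())
        {
            current_weight -= weight_fn(entries.front());
            entries.pop_front();
        }
    }

    size_t size() const { return entries.size(); }

    size_t total_bytes() const
    {
        size_t sum = 0;
        for (const auto & e : entries)
            sum += e.bytes;
        return sum;
    }

private:
    size_t max_weight;
    WeightFn weight_fn;
    size_t current_weight = 0;
    std::list<Entry> entries;
};
```

With a budget of 1000 (intended as bytes) and 100-byte entries, the entry-count variant retains 1000 entries (100000 bytes, 100x the budget), while the byte-weight variant stays within 1000 bytes.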
Affected code / anchors
| Location | Relevance |
| --- | --- |
| src/Processors/Formats/Impl/ParquetFileMetaDataCache.h | Cache class inherits from CacheBase with the default (entry-count) weight |
| src/Common/ICachePolicy.h | EqualWeightFunction returns 1 for any entry |
| src/Core/ServerSettings.cpp | input_format_parquet_metadata_cache_max_size declared as bytes (e.g. 500000000) |
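A plausible direction for a fix is to supply a weight function that charges each entry its approximate metadata footprint, so eviction tracks bytes. The sketch below uses illustrative names (`FileMetaDataStub`, `ParquetMetaDataWeight`); the real code would weigh the actual cached metadata object rather than this stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the cached Parquet metadata object; in the real code this
// would be the parquet file metadata type (or a wrapper around it).
struct FileMetaDataStub
{
    std::string serialized_thrift;    // raw metadata payload
    size_t row_group_index_bytes = 0; // any auxiliary structures
};

// Hypothetical weight functor: charge each cache entry its approximate
// memory footprint instead of the default weight of 1, so the configured
// byte limit is actually enforced in bytes.
struct ParquetMetaDataWeight
{
    size_t operator()(const FileMetaDataStub & m) const
    {
        return sizeof(FileMetaDataStub)
            + m.serialized_thrift.size()
            + m.row_group_index_bytes;
    }
};
```

Plugging such a functor into the cache policy in place of EqualWeightFunction would make the configured "max size" behave as the documented byte budget.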
Minimal reproduction
- Set input_format_parquet_use_metadata_cache=1 and leave the server default for input_format_parquet_metadata_cache_max_size (500000000).
- Query Parquet files in object storage with many unique keys (distinct path/etag).
- Observe: Cache admission continues well past the intended byte budget because each entry contributes weight 1, so the "max size" is effectively a cap on entry count, not memory.
Affected transitions / subsystems
- Transitions: T2 (cache insert), T5 (configured-size enforcement at server start).
- Subsystem: Parquet object-storage read path; both Parquet and ParquetMetadata input formats; all server threads using the global cache.
Labels / metadata suggestions
- Severity: High
- Component: Parquet / Formats / Caching
- Type: Bug (correctness / resource contract)