fix: build_fallback_field_id_map produces incorrect column indices for schemas with nested types by mbutrovich · Pull Request #2307 · apache/iceberg-rust

mbutrovich · 2026-03-31T22:53:26Z

Which issue does this PR close?

Closes bug: build_fallback_field_id_map produces incorrect column indices for schemas with nested types #2306.
Downstream issue: bug: native Iceberg reader errors on residual filter on column after nested type for migrated Parquet files datafusion-comet#3860

What changes are included in this PR?

build_fallback_field_id_map iterated over Parquet leaf columns instead of top-level fields when building the field-ID-to-column-index mapping for migrated files (files without embedded field IDs). Nested types (struct, list, map) expand into multiple leaf columns. Whenever one precedes a primitive column, this mapping diverges from add_fallback_field_ids_to_arrow_schema, which correctly assigns ordinal IDs to top-level Arrow fields. As a result, a predicate on a column that follows a nested type resolved to a leaf inside the nested group and crashed with "Leaf column id in predicates isn't a root column in Parquet schema".
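To make the divergence concrete, here is a simplified, std-only model of the bug. The Node type, the flatten and leaf_based_map helpers, and the column names s, l, and m are hypothetical stand-ins, not the parquet crate's schema types or iceberg-rust's actual code. Real Parquet lists and maps also carry intermediate groups that this model omits, and the sketch assumes 1-based ordinal IDs on the Arrow side.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a Parquet schema node (hypothetical, not the
/// parquet crate's schema type). Real lists and maps have extra
/// intermediate groups that this model leaves out.
enum Node {
    Primitive(&'static str),
    Group(&'static str, Vec<Node>),
}

impl Node {
    /// Appends the dotted path of every leaf column under this node.
    fn leaves(&self, prefix: &str, out: &mut Vec<String>) {
        match self {
            Node::Primitive(name) => out.push(format!("{prefix}{name}")),
            Node::Group(name, children) => {
                let child_prefix = format!("{prefix}{name}.");
                for child in children {
                    child.leaves(&child_prefix, out);
                }
            }
        }
    }
}

/// Flattens top-level fields into (leaf path, root field index) pairs,
/// in leaf-column order.
fn flatten(fields: &[Node]) -> Vec<(String, usize)> {
    let mut out = Vec::new();
    for (root_idx, field) in fields.iter().enumerate() {
        let mut paths = Vec::new();
        field.leaves("", &mut paths);
        out.extend(paths.into_iter().map(|path| (path, root_idx)));
    }
    out
}

/// Models the buggy pattern: one ordinal field ID per *leaf* column
/// (assumed 1-based), so nested types shift every later ID.
fn leaf_based_map(leaves: &[(String, usize)]) -> HashMap<i32, usize> {
    (0..leaves.len()).map(|i| (i as i32 + 1, i)).collect()
}

/// Hypothetical schema: struct, list, and map columns before `id`.
fn test_schema() -> Vec<Node> {
    vec![
        Node::Group("s", vec![Node::Primitive("a"), Node::Primitive("b")]),
        Node::Group("l", vec![Node::Primitive("element")]),
        Node::Group("m", vec![Node::Primitive("key"), Node::Primitive("value")]),
        Node::Primitive("id"),
    ]
}

fn main() {
    let leaves = flatten(&test_schema());
    // The Arrow side numbers top-level fields, so `id` (4th field) gets ID 4.
    let id_field_id: i32 = 4;
    let leaf_idx = leaf_based_map(&leaves)[&id_field_id];
    let (path, root) = &leaves[leaf_idx];
    // Prints: field ID 4 -> leaf 3 (m.key), root 2
    println!("field ID {id_field_id} -> leaf {leaf_idx} ({path}), root {root}");
}
```

In this model, field ID 4 lands on the map's key leaf, whose root is the map column (index 2) rather than `id`. That matches the shape of the "isn't a root column" failure.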

The fix iterates root_schema().get_fields() directly, assigning ordinal IDs only to top-level fields. For non-primitive fields (struct/list/map), it uses get_column_root_idx to advance past their leaf columns. This mirrors iceberg-java's ParquetSchemaUtil.addFallbackIds(), which iterates fileSchema.getFields() assigning ordinal IDs to top-level fields.
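The corrected approach can be sketched in the same simplified, std-only model. Node, leaf_count, and fallback_field_id_map are hypothetical names, not the crate's API. The actual fix uses get_column_root_idx to skip a group's leaves. This sketch instead counts leaves recursively, records only primitive top-level fields, and assumes 1-based ordinals.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a Parquet schema node (hypothetical, not the
/// parquet crate's schema type). Names are for readability only.
#[allow(dead_code)]
enum Node {
    Primitive(&'static str),
    Group(&'static str, Vec<Node>),
}

impl Node {
    /// Number of leaf columns this node expands into.
    fn leaf_count(&self) -> usize {
        match self {
            Node::Primitive(_) => 1,
            Node::Group(_, children) => children.iter().map(Node::leaf_count).sum(),
        }
    }
}

/// Corrected pattern: ordinal field IDs (assumed 1-based) go to top-level
/// fields only. A primitive field maps to its leaf column index. A group
/// gets no entry here, and the cursor skips all of its leaves.
fn fallback_field_id_map(fields: &[Node]) -> HashMap<i32, usize> {
    let mut map = HashMap::new();
    let mut leaf_idx = 0;
    for (ordinal, field) in fields.iter().enumerate() {
        let field_id = ordinal as i32 + 1;
        match field {
            Node::Primitive(_) => {
                map.insert(field_id, leaf_idx);
                leaf_idx += 1;
            }
            Node::Group(..) => leaf_idx += field.leaf_count(),
        }
    }
    map
}

/// Hypothetical schema: struct, list, and map columns before `id`.
fn test_schema() -> Vec<Node> {
    vec![
        Node::Group("s", vec![Node::Primitive("a"), Node::Primitive("b")]),
        Node::Group("l", vec![Node::Primitive("element")]),
        Node::Group("m", vec![Node::Primitive("key"), Node::Primitive("value")]),
        Node::Primitive("id"),
    ]
}

fn main() {
    let map = fallback_field_id_map(&test_schema());
    // Prints: field ID 4 (id) -> leaf column 5
    println!("field ID 4 (id) -> leaf column {}", map[&4]);
}
```

Because IDs are assigned per top-level field, `id` keeps ordinal 4 no matter how many leaves the struct, list, and map expand into. It resolves to its own leaf column (5), which is a root column.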

Also corrects "Leave column" to "Leaf column" in error messages.

Are these changes tested?

An integration test (test_predicate_on_migrated_file_with_nested_types) writes a Parquet file without field IDs that contains struct, list, and map columns before an id column, then reads it back with a predicate on id. Before the fix, this reproduces the exact crash. Test data is constructed with serde_arrow for readability.
Apache DataFusion Comet ran the repro test from apache/datafusion-comet#3860 against this change, and it passes:
test: [DO NOT MERGE] test upstream iceberg-rust fix for #3860 datafusion-comet#3872

crates/iceberg/src/arrow/reader.rs

blackmwk

Thanks @mbutrovich for this fix! Generally LGTM!

crates/iceberg/src/arrow/reader.rs

mbutrovich · 2026-04-02T17:07:24Z

Most recent changes:

crates/iceberg/Cargo.toml — added serde_arrow = { version = "0.14", features = ["arrow-58"] } dev-dependency
crates/iceberg/src/arrow/reader.rs — replaced three separate tests (test_predicate_on_migrated_file_with_{struct,list,map}) and the shared helper with one test_predicate_on_migrated_file_with_nested_types that:
- Uses serde_arrow with Serialize/Deserialize structs for readable data construction
- Tests a schema with all three nested types (struct, list, map) before the id column

mbutrovich added 3 commits March 31, 2026 18:47

fix apache#2306 and add tests · 9a476fa

simplify · 2424d00

update tests to reference issue · 80a539e

mbutrovich mentioned this pull request Mar 31, 2026

bug: native Iceberg reader errors on residual filter on column after nested type for migrated Parquet files apache/datafusion-comet#3860

Open

blackmwk reviewed Apr 1, 2026

crates/iceberg/src/arrow/reader.rs

address PR feedback, add comments. · 869d28b

mbutrovich requested a review from blackmwk April 1, 2026 15:12

Merge branch 'main' into field_id_nested_types · 8232740

This was referenced Apr 1, 2026

test: [DO NOT MERGE] test upstream iceberg-rust fix for #3860 apache/datafusion-comet#3872

Closed

Tracking Issue of Iceberg Rust 0.9.1 Release #2303

Open

mbutrovich self-assigned this Apr 1, 2026

mbutrovich added the bug Something isn't working label Apr 1, 2026

blackmwk reviewed Apr 2, 2026

crates/iceberg/src/arrow/reader.rs (outdated)

address PR feedback. · 2561149

mbutrovich requested a review from blackmwk April 2, 2026 17:07

mbutrovich wants to merge 6 commits into apache:main from mbutrovich:field_id_nested_types

2 participants
