Skip to content

Decompose non-constant-offset byte_extract from union in field sensitivity#8958

Open
tautschnig wants to merge 2 commits intodiffblue:developfrom
tautschnig:fix-8813-follow-up
Open

Decompose non-constant-offset byte_extract from union in field sensitivity#8958
tautschnig wants to merge 2 commits intodiffblue:developfrom
tautschnig:fix-8813-follow-up

Conversation

@tautschnig
Copy link
Copy Markdown
Collaborator

@tautschnig tautschnig commented Apr 13, 2026

When a byte_extract with a non-constant offset is applied to a union-typed SSA expression (e.g., from a pointer dereference inside a quantified contract postcondition), field sensitivity previously returned the expression unchanged. This left the full union symbol referenced in the SSA equation, forcing the SMT2 converter to materialise the entire union as a flat bitvector concatenation (e.g., 65536 bits for polyveck with 8x256 int32 elements).

Extend apply_byte_extract to handle non-constant offsets by decomposing through the widest union member, analogous to the get_subexpression_at_offset fix in pointer_offset_size.cpp. This creates a field-sensitive SSA symbol for the member and wraps it in a new byte_extract, which apply then processes recursively through the struct/array decomposition.

On the reproducer from
github.com//issues/8813#issuecomment-4234622724:

Without --slice-formula:
struct: 1.8s union before: 44s union after: 50s
With --slice-formula:
struct: 1.1s union before: 21s union after: 1.1s

The remaining gap without --slice-formula is due to the union definition equation still being emitted (it is no longer referenced by the forall expressions, but the slicer is needed to remove it).

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Apr 13, 2026
Copilot AI review requested due to automatic review settings April 13, 2026 12:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves field-sensitive SSA handling for byte_extract operations on union-typed SSA expressions when the extract offset is non-constant, preventing large unions from being unnecessarily materialised as flat bitvectors during SMT2 conversion (notably in quantified contract postconditions). It also adds a regression test derived from the performance reproducer in issue #8813.

Changes:

  • Extend field_sensitivityt::apply_byte_extract to rewrite non-constant-offset byte_extract from a union SSA expression via the widest union member, enabling subsequent struct/array field sensitivity to kick in.
  • Add a DFCC regression test (union_quantifier_performance) covering the quantified-contract + union + pointer-indirection scenario.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/goto-symex/field_sensitivity.cpp Adds non-constant-offset union byte_extract decomposition via widest member to reduce SMT materialisation blowups.
regression/contracts-dfcc/union_quantifier_performance/test.desc New regression driver for the union/quantifier performance case (expects success under DFCC+Z3+slice).
regression/contracts-dfcc/union_quantifier_performance/main.c Minimal C reproducer using a union and quantified postconditions to exercise the new decomposition path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 80.49%. Comparing base (cfd7295) to head (d61a206).

Files with missing lines Patch % Lines
src/goto-symex/field_sensitivity.cpp 94.73% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8958   +/-   ##
========================================
  Coverage    80.49%   80.49%           
========================================
  Files         1704     1704           
  Lines       188789   188805   +16     
  Branches        73       73           
========================================
+ Hits        151967   151988   +21     
+ Misses       36822    36817    -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…ivity

When a byte_extract with a non-constant offset is applied to a
union-typed SSA expression (e.g., from a pointer dereference inside a
quantified contract postcondition), field sensitivity previously
returned the expression unchanged. This left the full union symbol
referenced in the SSA equation, forcing the SMT2 converter to
materialise the entire union as a flat bitvector concatenation
(e.g., 65536 bits for polyveck with 8x256 int32 elements).

Extend apply_byte_extract to handle non-constant offsets by
decomposing through the widest union member, analogous to the
get_subexpression_at_offset fix in pointer_offset_size.cpp. This
creates a field-sensitive SSA symbol for the member and wraps it in
a new byte_extract, which apply then processes recursively through
the struct/array decomposition.

On the reproducer from
github.com/diffblue/issues/8813#issuecomment-4234622724:

  Without --slice-formula:
    struct: 1.8s    union before: 44s    union after: 50s
  With --slice-formula:
    struct: 1.1s    union before: 21s    union after: 1.1s

The remaining gap without --slice-formula is due to the union
definition equation still being emitted (it is no longer referenced
by the forall expressions, but the slicer is needed to remove it).

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
@hanno-becker
Copy link
Copy Markdown

I am happy to report that with this patch atop the others recently merged, the mldsa-native proofs for the union-based RAM optimizations go through without any code change! 🎉 pq-code-package/mldsa-native#1016

Thank you very much, @tautschnig!

tautschnig added a commit to tautschnig/cbmc that referenced this pull request Apr 13, 2026
Drop the is_ssa_expr requirement from the non-constant-offset path so
that it also handles operands like index(ssa, 0) that arise when
array field sensitivity is disabled. Instead of constructing an SSA
member expression and renaming it, simply wrap the operand in a
member_exprt for the widest union component and let the recursive
apply handle the rest.

This also addresses the Copilot review feedback on PR diffblue#8958:
the redundant tmp.type() assignment is gone (set_expression already
sets the type), and the double apply on the member operand is
eliminated.

With --no-array-field-sensitivity the forall body is now properly
decomposed through the widest member, but the array WITH-update
equation still materialises the full union (45s vs 1.2s for struct).
That remaining gap is a separate issue in how symex generates
WITH-updates for arrays of unions.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Drop the is_ssa_expr requirement from the non-constant-offset path so
that it also handles operands like index(ssa, 0) that arise when
array field sensitivity is disabled. Instead of constructing an SSA
member expression and renaming it, simply wrap the operand in a
member_exprt for the widest union component and let the recursive
apply handle the rest.

This also addresses the Copilot review feedback on PR diffblue#8958:
the redundant tmp.type() assignment is gone (set_expression already
sets the type), and the double apply on the member operand is
eliminated.

With --no-array-field-sensitivity the forall body is now properly
decomposed through the widest member, but the array WITH-update
equation still materialises the full union (45s vs 1.2s for struct).
That remaining gap is a separate issue in how symex generates
WITH-updates for arrays of unions.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants