Decompose non-constant-offset byte_extract from union in field sensitivity #8958
tautschnig wants to merge 2 commits into diffblue:develop
Pull request overview
This PR improves field-sensitive SSA handling for byte_extract operations on union-typed SSA expressions when the extract offset is non-constant, preventing large unions from being unnecessarily materialised as flat bitvectors during SMT2 conversion (notably in quantified contract postconditions). It also adds a regression test derived from the performance reproducer in issue #8813.
Changes:
- Extend field_sensitivityt::apply_byte_extract to rewrite non-constant-offset byte_extract from a union SSA expression via the widest union member, enabling subsequent struct/array field sensitivity to kick in.
- Add a DFCC regression test (union_quantifier_performance) covering the quantified-contract + union + pointer-indirection scenario.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/goto-symex/field_sensitivity.cpp | Adds non-constant-offset union byte_extract decomposition via widest member to reduce SMT materialisation blowups. |
| regression/contracts-dfcc/union_quantifier_performance/test.desc | New regression driver for the union/quantifier performance case (expects success under DFCC+Z3+slice). |
| regression/contracts-dfcc/union_quantifier_performance/main.c | Minimal C reproducer using a union and quantified postconditions to exercise the new decomposition path. |
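The scenario named in the overview (a union, a quantified postcondition, and a pointer indirection) could look roughly like the sketch below. This is not the main.c added by the PR: the names cell, handle and clear_cell are hypothetical, and the contract clauses are written from memory of the CBMC contract syntax, so treat them as an assumption. The fallback macros only exist so that the file also compiles with an ordinary C compiler.

```c
#include <stdint.h>

#define N 4

/* Outside CBMC the contract clauses expand to nothing (assumption: CBMC
 * defines __CPROVER__ and provides the clauses itself). */
#ifndef __CPROVER__
#define __CPROVER_requires(...)
#define __CPROVER_assigns(...)
#define __CPROVER_ensures(...)
#endif

/* Hypothetical union: coeffs is the widest member. */
typedef union {
  int32_t coeffs[N];
  uint8_t first_byte;
} cell;

/* Hypothetical wrapper that adds one level of pointer indirection. */
typedef struct {
  cell *data;
} handle;

void clear_cell(handle *h)
__CPROVER_requires(__CPROVER_is_fresh(h, sizeof(*h)))
__CPROVER_requires(__CPROVER_is_fresh(h->data, sizeof(*h->data)))
__CPROVER_assigns(__CPROVER_object_whole(h->data))
__CPROVER_ensures(__CPROVER_forall {
  unsigned i; (i < N) ==> h->data->coeffs[i] == 0
})
{
  /* Zero every coefficient through the pointer. */
  for(unsigned i = 0; i < N; i++)
    h->data->coeffs[i] = 0;
}
```

In the quantified postcondition the index i is symbolic, so the access h->data->coeffs[i] reads the union at a non-constant offset, which is the case this PR targets.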
Codecov Report
❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | develop | #8958 | +/- |
|---|---|---|---|
| Coverage | 80.49% | 80.49% | |
| Files | 1704 | 1704 | |
| Lines | 188789 | 188805 | +16 |
| Branches | 73 | 73 | |
| Hits | 151967 | 151988 | +21 |
| Misses | 36822 | 36817 | -5 |
…ivity

When a byte_extract with a non-constant offset is applied to a union-typed SSA expression (e.g., from a pointer dereference inside a quantified contract postcondition), field sensitivity previously returned the expression unchanged. This left the full union symbol referenced in the SSA equation, forcing the SMT2 converter to materialise the entire union as a flat bitvector concatenation (e.g., 65536 bits for polyveck with 8x256 int32 elements).

Extend apply_byte_extract to handle non-constant offsets by decomposing through the widest union member, analogous to the get_subexpression_at_offset fix in pointer_offset_size.cpp. This creates a field-sensitive SSA symbol for the member and wraps it in a new byte_extract, which apply then processes recursively through the struct/array decomposition.

On the reproducer from github.com/diffblue/issues/8813#issuecomment-4234622724:

Without --slice-formula:
struct: 1.8s, union before: 44s, union after: 50s
With --slice-formula:
struct: 1.1s, union before: 21s, union after: 1.1s

The remaining gap without --slice-formula is due to the union definition equation still being emitted (it is no longer referenced by the forall expressions, but the slicer is needed to remove it).

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Force-pushed from cb4c3be to 5470156.
I am happy to report that with this patch atop the others recently merged, the mldsa-native proofs for the Thank you very much, @tautschnig!
Drop the is_ssa_expr requirement from the non-constant-offset path so that it also handles operands like index(ssa, 0) that arise when array field sensitivity is disabled. Instead of constructing an SSA member expression and renaming it, simply wrap the operand in a member_exprt for the widest union component and let the recursive apply handle the rest.

This also addresses the Copilot review feedback on PR diffblue#8958: the redundant tmp.type() assignment is gone (set_expression already sets the type), and the double apply on the member operand is eliminated.

With --no-array-field-sensitivity the forall body is now properly decomposed through the widest member, but the array WITH-update equation still materialises the full union (45s vs 1.2s for struct). That remaining gap is a separate issue in how symex generates WITH-updates for arrays of unions.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Force-pushed from 7ec9647 to d61a206.
When a byte_extract with a non-constant offset is applied to a union-typed SSA expression (e.g., from a pointer dereference inside a quantified contract postcondition), field sensitivity previously returned the expression unchanged. This left the full union symbol referenced in the SSA equation, forcing the SMT2 converter to materialise the entire union as a flat bitvector concatenation (e.g., 65536 bits for polyveck with 8x256 int32 elements).
Extend apply_byte_extract to handle non-constant offsets by decomposing through the widest union member, analogous to the get_subexpression_at_offset fix in pointer_offset_size.cpp. This creates a field-sensitive SSA symbol for the member and wraps it in a new byte_extract, which apply then processes recursively through the struct/array decomposition.
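The idea of going through the widest member can be illustrated at the C level, independently of CBMC's internals. The sketch below is only an analogy: poly_union and the helper names are hypothetical, and the polyveck layout (8 polynomials of 256 int32 coefficients) is assumed from the numbers quoted above. It shows that a byte read at a runtime offset from the whole union gives the same result as the same read from the widest member, because every union member starts at offset 0 and the widest one covers all bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define K 8   /* polynomials per vector (assumed from "8x256") */
#define N 256 /* int32 coefficients per polynomial */

typedef struct { int32_t coeffs[N]; } poly;
typedef struct { poly vec[K]; } polyveck;

/* Hypothetical union: as_vec is the widest member. */
typedef union {
  polyveck as_vec;
  uint8_t tag;
} poly_union;

/* Width of the flat bitvector needed to represent the whole union. */
static size_t union_bits(void)
{
  return sizeof(poly_union) * CHAR_BIT;
}

/* Read 4 bytes at a runtime offset from the union object as a whole. */
static int32_t extract_from_union(const poly_union *u, size_t offset)
{
  int32_t out;
  assert(offset + sizeof out <= sizeof *u);
  memcpy(&out, (const unsigned char *)u + offset, sizeof out);
  return out;
}

/* The same read, routed through the widest member instead. */
static int32_t extract_from_widest_member(const poly_union *u, size_t offset)
{
  int32_t out;
  assert(offset + sizeof out <= sizeof u->as_vec);
  memcpy(&out, (const unsigned char *)&u->as_vec + offset, sizeof out);
  return out;
}
```

On a platform with 8-bit bytes, union_bits() evaluates to 8 x 256 x 32 = 65536, matching the bitvector width quoted above. Once the read is expressed in terms of as_vec, it can be narrowed further to one poly and one coefficient, which is the kind of struct/array decomposition that a flat view of the union does not allow.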
On the reproducer from github.com//issues/8813#issuecomment-4234622724:

| Configuration | struct | union before | union after |
|---|---|---|---|
| Without --slice-formula | 1.8s | 44s | 50s |
| With --slice-formula | 1.1s | 21s | 1.1s |
The remaining gap without --slice-formula is due to the union definition equation still being emitted (it is no longer referenced by the forall expressions, but the slicer is needed to remove it).