
JIT: Extend cast removal pattern for bitwise ops: OP(CAST(x), CAST(y)) → CAST(OP(x, y))#126671

Open
Copilot wants to merge 11 commits into main from copilot/extend-jit-cast-removal-pattern


Conversation

Contributor

Copilot AI commented Apr 8, 2026

Description

PR #120980 introduced range assertions that can eliminate outer casts on bitwise operations when the result is provably in range. This defeats the existing fgSimpleLowerCastOfSmpOp pattern, which rewrites CAST(OP(CAST(x), CAST(y))) → CAST(OP(x, y)). When the outer cast is removed first, we're left with OP(CAST(x), CAST(y)) — two casts where at most one is needed.

New function fgSimpleLowerSmpOpCasts handles the complementary case for AND, OR, XOR:

AND(CAST_T(x), CAST_T(y))  →  CAST_T(AND(x, y))

This is valid because bitwise ops are bit-independent: ext_n(x) OP ext_n(y) = ext_n(x OP y) for sign/zero extension with AND, OR, XOR.

Diffs.

When range assertions remove an outer cast from CAST(OP(CAST(x), CAST(y))),
we're left with OP(CAST(x), CAST(y)) with two redundant casts. For bitwise
ops (AND, OR, XOR), we can transform this to CAST(OP(x, y)) since these
operations are bit-independent: sign/zero extension commutes with bitwise ops.

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/7a203210-db47-47e7-9074-92b298f13949

Co-authored-by: adamperlin <10533886+adamperlin@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot April 8, 2026 23:21
Copilot AI requested review from Copilot and removed request for Copilot April 8, 2026 23:37
Copilot AI changed the title from "[WIP] Extend JIT cast removal pattern for simple ops" to "Extend JIT cast removal pattern for bitwise ops: OP(CAST(x), CAST(y)) → CAST(OP(x, y))" Apr 8, 2026
Copilot AI requested a review from adamperlin April 8, 2026 23:38
@jkoritzinsky jkoritzinsky added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-Infrastructure labels Apr 9, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI review requested due to automatic review settings May 8, 2026 18:21
@github-actions
Contributor

github-actions Bot commented May 8, 2026

🤖 Copilot Code Review — PR #126671

Note

This review was generated by GitHub Copilot using multi-model analysis (Claude Opus 4.6 primary, with GPT-5.3-Codex and Claude Haiku 4.5 sub-agents).

Holistic Assessment

Motivation: The optimization is well-motivated. When range assertions remove the outer cast from CAST(OP(CAST(x), CAST(y))), the remaining OP(CAST(x), CAST(y)) has two redundant casts that can be folded into one. This is a natural complement to the existing fgSimpleLowerCastOfSmpOp which handles the inverse pattern (removing inner casts when the outer cast is present).

Approach: The approach is sound. For bitwise operations (AND, OR, XOR), truncation/extension commutes with the operation — the lower bits of the result depend only on the lower bits of the inputs. The transformation OP(CAST_small(x), CAST_small(y)) → CAST_small(OP(x, y)) reduces two CAST nodes to one. Performing this in rationalization (during HIR→LIR transition) correctly avoids the NOL (normalized-on-load) issues that would make this unsafe in morph, consistent with the existing similar optimizations.

Summary: ⚠️ Needs Human Review. The transformation logic is mathematically correct and the implementation follows established patterns in the codebase. However, there are a few points that warrant human maintainer attention: missing benchmark evidence for an optimization PR, no regression tests, and a minor robustness concern around flag handling. None are likely to cause miscompilation, but a JIT maintainer should confirm.


Detailed Findings

✅ Correctness — Bitwise commutativity with truncation is valid

The core claim that OP(CAST_small(x), CAST_small(y)) → CAST_small(OP(x, y)) is valid for bitwise AND/OR/XOR is correct. For zero-extension: (x & mask) OP (y & mask) = (x OP y) & mask. For sign-extension, the lower bits are identical, and the CAST at the end re-extends correctly. The guards are appropriate:

  • Both casts must target the same small integer type (castToType match)
  • No overflow-checked casts
  • Cast sources must have the same actual type as the op

✅ Type safety — No type mismatch for consumers

One sub-agent flagged a type mismatch concern, but this is a false positive. GenTreeCast for small type targets has node type TYP_INT (not the small type), as confirmed by how casts are constructed throughout the JIT (e.g., gtNewCastNode(TYP_INT, op1, false, castToType) in gentree.cpp:15834). Consumers of the replaced node will continue to see TYP_INT, matching the original op's type. The CastToType() carries the narrowing semantics separately.

✅ GTF_UNSIGNED flag — Not meaningful for small type casts

The GTF_UNSIGNED flag on the repurposed cast1 is technically stale (it reflects the original cast source, not the new OP(x,y) source). However, per the GenTreeCast documentation (gentree.h:4128-4131), for unchecked casts to small types, IsUnsigned is meaningless — the extension behavior is determined entirely by varTypeIsUnsigned(CastToType()) (see IsZeroExtending() at gentree.h:4166-4168). This is not a correctness issue.

✅ LIR ordering — Correct linear sequence maintained

After transformation, the LIR order is [..., x, y, OP(x,y), CAST(OP(x,y)), ...] which is a valid def-use chain. The range.Remove and range.InsertAfter calls correctly maintain this.

⚠️ Missing benchmark evidence

Per dotnet/runtime conventions, optimization PRs should include benchmark evidence (BenchmarkDotNet or equivalent) showing the improvement is real and measurable. The PR description doesn't reference any benchmarks or codegen diffs. While removing a CAST node is a straightforward win, it would be helpful to show:

  1. An example of before/after codegen for a real-world pattern
  2. Evidence that this pattern actually occurs in practice (e.g., from SPMI/JITDUMP analysis)

This is non-blocking but a JIT maintainer should evaluate whether evidence is needed.

⚠️ No regression tests added

The PR adds no tests. While this is an optimization (not a bug fix), and existing tests should cover correctness of AND/OR/XOR on small types, it would be valuable to add at least one test method that exercises the specific pattern OP(CAST_small(x), CAST_small(y)) for the three bitwise ops, to ensure the optimization fires and produces correct results. This could be a simple [MethodImpl(MethodImplOptions.NoInlining)] method with small-type parameters.

💡 Side-effect flag propagation — Consistent with existing patterns

After repurposing cast1 (changing CastOp() from x to OP(x,y)), the side-effect flags (GTF_ALL_EFFECT) on cast1 are not recomputed to reflect the new subtree. However, the existing fgSimpleLowerCastOfSmpOp and fgSimpleLowerBswap16 follow the same pattern — they don't update flags either. In LIR, execution order is governed by linear ordering rather than tree structure, so this is unlikely to cause issues. Noting for completeness.

💡 Naming consistency — Minor observation

The JITDUMP message says "Lower - Simple Op Casts" which follows the fgSimpleLower* naming convention. This is fine and consistent, though it could briefly confuse someone since this runs during rationalization, not the lowering phase proper. The existing functions have the same naming pattern, so this is established convention.


Models contributing to this review: Claude Opus 4.6 (primary analysis), GPT-5.3-Codex (correctness verification), Claude Haiku 4.5 (flag/type analysis — some findings were false positives, noted above).

Generated by Code Review for issue #126671 ·

Contributor

Copilot AI left a comment


Pull request overview

This PR adds a new rationalization-time (LIR) cast-reduction transform for bitwise ops so that AND/OR/XOR(CAST_T(x), CAST_T(y)) can be rewritten as CAST_T(AND/OR/XOR(x, y)), reducing two inner casts down to one outer cast after certain earlier optimizations remove an outer cast.

Changes:

  • Add Compiler::fgSimpleLowerSmpOpCasts in flowgraph.cpp to fold two operand casts into a single outer cast for GT_AND/GT_OR/GT_XOR.
  • Invoke this new helper during rationalization for GT_AND/GT_OR/GT_XOR, replacing the parent use when a rewrite occurs.
  • Declare the new helper in compiler.h.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File: Description
src/coreclr/jit/rationalize.cpp: Calls the new cast-folding helper for GT_AND/GT_OR/GT_XOR and replaces the use edge when it rewrites.
src/coreclr/jit/flowgraph.cpp: Implements fgSimpleLowerSmpOpCasts to transform OP(CAST, CAST) into CAST(OP) in LIR.
src/coreclr/jit/compiler.h: Adds the declaration for fgSimpleLowerSmpOpCasts.

Comment thread src/coreclr/jit/rationalize.cpp Outdated
Comment thread src/coreclr/jit/flowgraph.cpp
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 8, 2026 20:27
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/jit/rationalize.cpp Outdated
Comment thread src/coreclr/jit/flowgraph.cpp
Copilot AI review requested due to automatic review settings May 11, 2026 21:41
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/coreclr/jit/rationalize.cpp Outdated
Comment thread src/coreclr/jit/flowgraph.cpp
Copilot AI review requested due to automatic review settings May 12, 2026 01:25
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread src/tests/JIT/opt/And/IntAnd.cs Outdated
Comment thread src/coreclr/jit/flowgraph.cpp
Comment on lines +1847 to +1854
case GT_AND:
case GT_OR:
case GT_XOR:
{
    GenTree* parent = parentStack.Height() >= 2 ? parentStack.Top(1) : nullptr;
    if ((parent == nullptr) || !parent->OperIs(GT_CAST))
    {
        GenTree* replacement = m_compiler->fgSimpleLowerSmpOpCasts(BlockRange(), node->AsOp());
@adamperlin adamperlin requested a review from EgorBo May 12, 2026 16:53
@adamperlin
Contributor

adamperlin commented May 12, 2026

I'm not totally sure if this opt is worth it. Scanning for this specific pattern while processing any AND/OR/XOR operation during rationalization seems a bit overkill for small-ish diffs. I'm curious if you have any thoughts @EgorBo!

@EgorBo
Member

EgorBo commented May 12, 2026

I'm not totally sure if this opt is worth it.

Not sure either, seems like it mostly only improves win-x64.

Any particular reason you've done it in Rationalizer? Normally, we either do such transforms in gtFoldExpr, morph or Lower

@adamperlin
Contributor

adamperlin commented May 12, 2026

Any particular reason you've done it in Rationalizer? Normally, we either do such transforms in gtFoldExpr, morph or Lower

There is a comment on fgSimpleLowerCastOfSmpOp, a similar transform, which says that doing the transformation in morph causes problems with normalized-on-load locals in value numbering, so I figured we might run into a similar issue here when removing inner casts. Do you have any idea whether that might be a problem here?


Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Extend JIT Cast Removal Pattern for Simple Ops

5 participants