
arm64: Refactor mov/movprfx for embedded masked operations#126398

Open
ylpoonlg wants to merge 4 commits into dotnet:main from ylpoonlg:github-movprfx_refactor_2

Conversation

@ylpoonlg
Contributor

@ylpoonlg ylpoonlg commented Apr 1, 2026

This PR is the second part of #115508, following #123717.

The mov/movprfx logic for embedded masked operations is moved from codegen into the emit functions, in a similar way to #123717. The main difference is that embedded masked operations can use a predicated movprfx (zeroing or merging), depending on the false argument of the wrapped conditional select. This information is passed into the emitInsSve_Mov helper via a new option, mopt, which defaults to INS_SVE_MOV_OPTS_UNPRED.

@dotnet/arm64-contrib @a74nh @dhartglassMSFT

* Add an option for SVE mov/movprfx to differentiate between unpredicated,
  zeroing, and merging operations in the emitInsSve_Mov helper function.

* Clean up codegen for embedded masked operations.

* Fix SIMD&FP scalar register name in SVE emit display.
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 1, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 1, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@dhartglassMSFT
Contributor

Hi @a74nh are you physically able to kick off a "jitstressregs" pipeline? Or, run locally on your side?

(Specifically referring to the test scenario that discovered this bug #126379 )

It should at least be possible to run the tests locally; it's controlled by DOTNET* env variables.

We've had at least a couple of jitstress-discovered issues in arm64 codegen over the last few weeks; I'm hoping that if that leg is run proactively we can cut down on those.

(That said, it shouldn't be run until #126434 goes in.)

@a74nh
Contributor

a74nh commented Apr 7, 2026

> Hi @a74nh are you physically able to kick off a "jitstressregs" pipeline? Or, run locally on your side?
>
> (Specifically referring to the test scenario that discovered this bug #126379 )
>
> It should at least be possible to run the tests locally; it's controlled by DOTNET* env variables.
>
> We've had at least a couple of jitstress-discovered issues in arm64 codegen over the last few weeks; I'm hoping that if that leg is run proactively we can cut down on those.
>
> (That said, it shouldn't be run until #126434 goes in.)

We can't run the pipeline on GitHub as we don't have the permissions.

We do have a script that Kunal wrote, which just runs the command line you give it using many different stress scenarios.

Running the hwintrinsic tests using it should be good enough.

@ylpoonlg - could you rebase and test all the hwintrinsics using that script, please?

@ylpoonlg
Contributor Author

ylpoonlg commented Apr 7, 2026

Fixed a similar issue to #126434. The jitstress tests now pass when running the hwintrinsic tests with the script.

@dhartglassMSFT
Contributor

Do you have an SPMI asmdiffs result like for the first PR?

Also, I had to use your jitstress changes to fix a higher pri issue last week, so you'll probably get merge conflicts : /

@ylpoonlg
Contributor Author

> Do you have an SPMI asmdiffs result like for the first PR?

Some examples from the SPMI asmdiffs summary:

```diff
@@ -20,16 +20,15 @@ G_M37464_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M37464_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
             ptrue   p0.s
             movi    v16.4s, #0
-            movprfx z0, z0
             addp    z0.s, p0/m, z0.s, z1.s
             sel     z0.s, p0, z0.s, z16.s
-                                               ;; size=20 bbWeight=1 PerfScore 7.50
+                                               ;; size=16 bbWeight=1 PerfScore 5.50
```
Removes unnecessary movprfx where destination and source registers are the same.

```diff
@@ -49,7 +49,7 @@ G_M63337_IG02:        ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
             movk    x4, #0xD1FFAB1E LSL #16
             movk    x4, #0xD1FFAB1E LSL #32
             ldr     w1, [x4]
-            mov     x4, x1
+            mov     w4, w1
             sqincb  x4, w4, vl8, mul #2
             mov     x0, x19
             ; gcrRegs +[x0]
```

A small change to the scalar *qinc/*qdec instructions: they read the 32-bit source register and write a 64-bit result to the same register, so semantically only the 32-bit input needs to be moved. This doesn't affect the behavior or results, and is done to generalize the logic for the emit size of mov instructions.

@ylpoonlg
Contributor Author

ylpoonlg commented Apr 15, 2026

> Also, I had to use your jitstress changes to fix a higher pri issue last week, so you'll probably get merge conflicts : /

Seems to be clean with main so far, but I can do a rebase anyway. Sorry for breaking the jitstress tests and thanks for fixing them.
