arm64: Refactor mov/movprfx for embedded masked operations #126398
ylpoonlg wants to merge 4 commits into dotnet:main
Conversation
* Add an option for SVE mov/movprfx to differentiate between unpredicated, zeroing and merging operations for the emitInsSve_Mov helper function.
* Clean up codegen for embedded masked operations.
* Fix SIMD&FP scalar register name in SVE emit display.
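As a rough illustration of the first bullet, the new option could be modeled as below. This is a hedged sketch only: `emitInsSve_Mov` and `INS_SVE_MOV_OPTS_UNPRED` are named in the PR, but the other enum values, the helper shown, and the register/predicate model are invented for illustration and are not the actual JIT code.

```cpp
#include <string>

// Hypothetical mirror of the emitter option described in the PR:
// unpredicated, zeroing-predicated, or merging-predicated mov/movprfx.
enum SveMovOpts
{
    INS_SVE_MOV_OPTS_UNPRED, // movprfx zD, zS          (named in the PR)
    INS_SVE_MOV_OPTS_ZERO,   // movprfx zD.s, pG/z, zS.s (hypothetical name)
    INS_SVE_MOV_OPTS_MERGE,  // movprfx zD.s, pG/m, zS.s (hypothetical name)
};

// Render the prefix instruction a helper like emitInsSve_Mov might emit
// for a given option (illustrative, fixed to the .s element size).
std::string RenderMovPrfx(SveMovOpts opt, const std::string& dst,
                          const std::string& src, const std::string& pred)
{
    switch (opt)
    {
        case INS_SVE_MOV_OPTS_ZERO:
            return "movprfx " + dst + ".s, " + pred + "/z, " + src + ".s";
        case INS_SVE_MOV_OPTS_MERGE:
            return "movprfx " + dst + ".s, " + pred + "/m, " + src + ".s";
        default: // INS_SVE_MOV_OPTS_UNPRED: no governing predicate
            return "movprfx " + dst + ", " + src;
    }
}
```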
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Hi @a74nh, are you able to kick off a "jitstressregs" pipeline? Or run it locally on your side? (Specifically referring to the test scenario that discovered this bug: #126379.) It should at least be possible to run the tests locally; it's controlled by DOTNET* env variables. We've had at least a couple of jitstress-discovered issues in arm64 codegen over the last few weeks, and I'm hoping that if that leg is run proactively we can cut down on those. (That said, it shouldn't be run until this PR goes in: #126434.)
We can't run the pipeline in github as we don't have the permissions. We do have a script that Kunal wrote, which just runs the command line you give it using many different stress scenarios. Running the hwintrinsic tests using it should be good enough. @ylpoonlg - could you rebase and test all the hwintrinsics using that script please. |
Fixed a similar issue to #126434. The jitstress tests now pass running the hwintrinsic tests with the script.
Do you have an SPMI asmdiffs result like for the first PR? Also, I had to use your jitstress changes to fix a higher-pri issue last week, so you'll probably get merge conflicts : /
Some examples from the SPMI asmdiffs summary:

```diff
@@ -20,16 +20,15 @@ G_M37464_IG01: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
 G_M37464_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
         ptrue   p0.s
         movi    v16.4s, #0
-        movprfx z0, z0
         addp    z0.s, p0/m, z0.s, z1.s
         sel     z0.s, p0, z0.s, z16.s
-        ;; size=20 bbWeight=1 PerfScore 7.50
+        ;; size=16 bbWeight=1 PerfScore 5.50
```

Removes unnecessary movprfx where destination and source registers are the same.
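That first diff drops a `movprfx z0, z0`, which moves a register onto itself and so does nothing. A minimal sketch of such a check follows; the register model and function are purely illustrative, not the actual emitter code.

```cpp
#include <string>
#include <vector>

// Illustrative only: emit a movprfx before a destructive SVE instruction,
// unless the destination already holds the intended source operand, in
// which case the prefix would be a no-op and is skipped.
std::vector<std::string> EmitWithPrefix(const std::string& dst,
                                        const std::string& src,
                                        const std::string& destructiveIns)
{
    std::vector<std::string> out;
    if (dst != src) // "movprfx zD, zD" is redundant: drop it
    {
        out.push_back("movprfx " + dst + ", " + src);
    }
    out.push_back(destructiveIns);
    return out;
}
```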
```diff
@@ -49,7 +49,7 @@ G_M63337_IG02: ; bbWeight=1, gcrefRegs=80000 {x19}, byrefRegs=0000 {}, by
         movk    x4, #0xD1FFAB1E LSL #16
         movk    x4, #0xD1FFAB1E LSL #32
         ldr     w1, [x4]
-        mov     x4, x1
+        mov     w4, w1
         sqincb  x4, w4, vl8, mul #2
         mov     x0, x19
         ; gcrRegs +[x0]
```

Small change to the scalar
Seems to be clean with main so far, but I can do a rebase anyway. Sorry for breaking the jitstress tests, and thanks for fixing them.
This PR is the second part of #115508, following #123717.
The mov/movprfx logic for embedded masked operations is moved from codegen into the emit functions, in a similar way to #123717. The main difference is that we can use a predicated movprfx (zeroing or merging) for embedded masked operations, depending on the false argument of the wrapped conditional select. This information is passed into the `emitInsSve_Mov` helper using a new option `mopt`, which defaults to `INS_SVE_MOV_OPTS_UNPRED`.

@dotnet/arm64-contrib @a74nh @dhartglassMSFT
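One plausible shape for the decision described above is sketched below, keyed off the false operand of the wrapping conditional select. This is an assumption-laden illustration: the real emitter works on JIT-internal trees and registers, the enum values other than the PR's `INS_SVE_MOV_OPTS_UNPRED` analogue are invented, and the exact mapping rules here are a guess at the idea, not the PR's code.

```cpp
// Hypothetical decision: pick the mov/movprfx flavor for an embedded
// masked operation from the false operand of the conditional select
// that wraps it.
enum SveMovOpt
{
    MOV_UNPRED, // no special case: unpredicated movprfx (PR's default)
    MOV_ZERO,   // false operand is a zero vector -> zeroing movprfx
    MOV_MERGE,  // false operand is the destination -> merging movprfx
};

// Both flags describe the conditional select's false argument.
SveMovOpt ChooseMovOpt(bool falseOpIsZero, bool falseOpIsDst)
{
    if (falseOpIsZero)
        return MOV_ZERO;  // inactive lanes cleared, like pG/z
    if (falseOpIsDst)
        return MOV_MERGE; // inactive lanes keep the old value, like pG/m
    return MOV_UNPRED;    // fall back to the unpredicated form
}
```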