Skip to content

gh-146640: Optimize int operations by mutating uniquely-referenced operands in place (JIT only)#146641

Merged
Fidget-Spinner merged 15 commits intopython:mainfrom
eendebakpt:int_int_inplace
Apr 3, 2026
Merged

gh-146640: Optimize int operations by mutating uniquely-referenced operands in place (JIT only)#146641
Fidget-Spinner merged 15 commits intopython:mainfrom
eendebakpt:int_int_inplace

Conversation

@eendebakpt
Copy link
Copy Markdown
Contributor

@eendebakpt eendebakpt commented Mar 30, 2026

In this PR we add inplace binary operations for int type, similar to #146397. A complication compared to the float case is that the int operations can return small ints which should not be modified in place. In this PR we:

  • Use PyJitRef_MakeUnique(sym_new_compact_int(ctx)); to mark that a symbol is an int that is either unique or one of the small ints.
  • Add _BINARY_OP_ADD_INT_INPLACE (with variations for the other operations) that handles both small ints as input and overflows. An alternative to handling these cases in the opcode would be do deopt. Here we decided not to deopts because this could lead to many deopts.
  • There is a PyStackRef_DUP(TARGET); inside the macro to get the stack effects right. I tried optimizing that, but it requires changing the structure of the opcode following _BINARY_OP_ADD_INT (the POP_TOP ones). I estimate that it is not worth the complexity to try to eliminate that.

Micro benchmark results:

Pattern main (ns/iter) branch (ns/iter) Speedup Notes
total += a*b + c (non-small) 34.4 27.8 1.24x inplace mutation on unique a*b result
total += a + b (non-small) 19.8 13.0 1.52x a+b result is unique, inplace on +=
t = a + b (plain assign) 13.4 14.1 -- no +=, not affected
total += a*b + c*d (chain) 40.4 34.6 1.17x two unique products, inplace on +
total += a*b + 1 (small int) 23.2 24.1 -- TARGET is small int, fallback to standard

All values are non-small integers (> 1024) unless noted.

Selected pyperformance benchmarks:

### deltablue ###
Mean +- std dev: 1.95 ms +- 0.03 ms -> 1.97 ms +- 0.03 ms: 1.01x slower
Not significant

### scimark_fft ###
Mean +- std dev: 208 ms +- 5 ms -> 206 ms +- 1 ms: 1.01x faster
Not significant

### scimark_lu ###
Mean +- std dev: 58.1 ms +- 0.2 ms -> 58.7 ms +- 0.8 ms: 1.01x slower
Not significant

### scimark_monte_carlo ###
Mean +- std dev: 41.7 ms +- 1.7 ms -> 41.6 ms +- 0.8 ms: 1.00x faster
Not significant

### scimark_sor ###
Mean +- std dev: 66.8 ms +- 0.6 ms -> 64.3 ms +- 0.5 ms: 1.04x faster
Significant (t=26.88)

### scimark_sparse_mat_mult ###
Mean +- std dev: 3.64 ms +- 0.03 ms -> 3.64 ms +- 0.04 ms: 1.00x slower
Not significant

### spectral_norm ###
Mean +- std dev: 56.4 ms +- 0.2 ms -> 51.5 ms +- 0.4 ms: 1.10x faster
Significant (t=79.61)

// On mutation success, DUP the target so POP_TOP_INT can safely
// decref the original. Tier 2 only.
tier2 op(_BINARY_OP_ADD_INT_INPLACE, (left, right -- res, l, r)) {
INT_INPLACE_OP(left, right, left, +);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can factor out all the common code here into a function that takes a function pointer (the operation to do).

For each op, pass the function pointer e.g. _PyCompactLong_Add or whatever, and then the CPU/inliner can fix it up/constant-fold it away in the compiler.

Copy link
Copy Markdown
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor coding style comments.

@eendebakpt eendebakpt marked this pull request as ready for review April 1, 2026 21:27
// the following _POP_TOP_INT becomes _POP_TOP_NOP. Tier 2 only.
tier2 op(_BINARY_OP_ADD_INT_INPLACE, (left, right -- res, l, r)) {
INT_INPLACE_OP(left, right, left, +, _PyCompactLong_Add);
EXIT_IF(PyStackRef_IsNull(_int_inplace_res));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be ERROR_IF instead? The only way this can be null after the compactlong_add operation is that it fails?

Same for below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-inplace _BINARY_OP_ADD_INT uses EXIT_IF as well. The _PyCompactLong_Add can error for two reasons: OOM and the result of the add being non-compact (e.g. requiring more than one digit). I think for the latter we want the EXIT_IF?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah that's fine then ok. thanks!

@Fidget-Spinner Fidget-Spinner merged commit 48317fe into python:main Apr 3, 2026
81 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants