gh-146640: Optimize int operations by mutating uniquely-referenced operands in place (JIT only) #146641
Python/bytecodes.c (outdated)
```c
// On mutation success, DUP the target so POP_TOP_INT can safely
// decref the original. Tier 2 only.
tier2 op(_BINARY_OP_ADD_INT_INPLACE, (left, right -- res, l, r)) {
    INT_INPLACE_OP(left, right, left, +);
```
I think you can factor out all the common code here into a function that takes a function pointer (the operation to perform).
For each op, pass the function pointer (e.g. `_PyCompactLong_Add` or whatever), and the compiler's inliner can then resolve the call and constant-fold the indirection away.
vstinner left a comment:
Some minor coding style comments.
```c
// the following _POP_TOP_INT becomes _POP_TOP_NOP. Tier 2 only.
tier2 op(_BINARY_OP_ADD_INT_INPLACE, (left, right -- res, l, r)) {
    INT_INPLACE_OP(left, right, left, +, _PyCompactLong_Add);
    EXIT_IF(PyStackRef_IsNull(_int_inplace_res));
```
Shouldn't this be `ERROR_IF` instead? Isn't the only way this can be NULL after the `_PyCompactLong_Add` call that the call failed?
Same for the ops below.
The non-inplace `_BINARY_OP_ADD_INT` uses `EXIT_IF` as well. `_PyCompactLong_Add` can fail for two reasons: OOM, or the result of the addition being non-compact (e.g. requiring more than one digit). I think for the latter we want `EXIT_IF`?
Oh yeah, that's fine then. Thanks!
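Per the reply above, a NULL result from `_PyCompactLong_Add` can mean either an allocation failure or a sum that no longer fits in a compact int, and the two call for different handling. The toy below makes that split explicit with a status code; `toy_add_status`, the `toy_status` and `toy_action` enums, and the 30-bit limit are hypothetical and only illustrate the reasoning. A non-compact result is an expected, recoverable case, so leaving the trace (the `EXIT_IF` analogue) lets the generic path produce the large int; only a genuine out-of-memory condition maps to raising (the `ERROR_IF` analogue).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical compact limit: one 30-bit digit. */
#define TOY_DIGIT_LIMIT ((int64_t)1 << 30)

typedef enum { TOY_OK, TOY_NOT_COMPACT, TOY_NO_MEMORY } toy_status;

/* What the trace should do next (hypothetical names). */
typedef enum { TOY_CONTINUE, TOY_SIDE_EXIT, TOY_RAISE } toy_action;

/* Adds two compact values into a freshly allocated cell. The allocator is a
 * parameter so a caller can simulate out-of-memory. *out is NULL on failure. */
static toy_status
toy_add_status(int64_t a, int64_t b, void *(*alloc)(size_t), int64_t **out)
{
    int64_t sum = a + b;
    *out = NULL;
    if (sum <= -TOY_DIGIT_LIMIT || sum >= TOY_DIGIT_LIMIT) {
        return TOY_NOT_COMPACT; /* needs more than one digit */
    }
    int64_t *cell = alloc(sizeof *cell);
    if (cell == NULL) {
        return TOY_NO_MEMORY;
    }
    *cell = sum;
    *out = cell;
    return TOY_OK;
}

/* Maps each failure cause to the appropriate reaction. */
static toy_action
toy_action_for(toy_status status)
{
    switch (status) {
    case TOY_OK:          return TOY_CONTINUE;
    case TOY_NOT_COMPACT: return TOY_SIDE_EXIT; /* like EXIT_IF */
    case TOY_NO_MEMORY:   return TOY_RAISE;     /* like ERROR_IF */
    }
    return TOY_RAISE;
}

/* Simulated allocator that always fails. */
static void *
toy_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```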
In this PR we add in-place binary operations for the int type, similar to #146397. A complication compared to the float case is that int operations can return small ints, which are shared cached objects and must not be modified in place. In this PR we:
- Use `PyJitRef_MakeUnique(sym_new_compact_int(ctx));` to mark that a symbol is an int that is either unique or one of the small ints.
- Add `_BINARY_OP_ADD_INT_INPLACE` (with variations for the other operations), which handles both small ints as input and overflows. An alternative to handling these cases in the opcode would be to deopt; we decided against it because this could lead to many deopts.
- Use `PyStackRef_DUP(TARGET);` inside the macro to get the stack effects right. I tried optimizing that away, but it requires changing the structure of the opcode following `_BINARY_OP_ADD_INT` (the POP_TOP ones). I estimate that eliminating it is not worth the complexity.

Micro benchmark results:
| Pattern | Notes |
|---|---|
| `total += a*b + c` (non-small) | `a*b` result |
| `total += a + b` (non-small) | `a+b` result is unique, in-place on `+=` |
| `t = a + b` (plain assign) | `+=`, not affected |
| `total += a*b + c*d` (chain) | `+` |
| `total += a*b + 1` (small int) | |

All values are non-small integers (> 1024) unless noted.
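The in-place rules from the change list above can be illustrated with a small refcounting toy. Everything in it is hypothetical (`toy_obj`, `toy_dup`, `toy_decref`, `toy_add_inplace`, and the `is_small` flag standing in for a cached small int); it is not CPython's `PyLongObject` or stackref API. The left operand is reused only when it holds the only reference and is not a shared small int; for a small-int input or an overflowing result, a fresh object is allocated rather than deoptimizing. On the in-place path the result is the operand itself, so it gets one extra reference (the role `PyStackRef_DUP(TARGET)` plays in the PR), which lets the following pop release the operand without freeing the result:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical compact limit: one 30-bit digit. */
#define TOY_DIGIT_LIMIT ((int64_t)1 << 30)

/* Hypothetical refcounted int; is_small marks a shared, cached small int. */
typedef struct {
    int64_t refcnt;
    bool is_small;
    int64_t value;
} toy_obj;

/* Returns a new object with one reference, or NULL on allocation failure. */
static toy_obj *
toy_new(int64_t value)
{
    toy_obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcnt = 1;
        o->is_small = false;
        o->value = value;
    }
    return o;
}

/* Takes an extra reference (the analogue of a stackref DUP). */
static toy_obj *
toy_dup(toy_obj *o)
{
    if (!o->is_small) { /* cached small ints are never counted or freed */
        o->refcnt++;
    }
    return o;
}

static void
toy_decref(toy_obj *o)
{
    if (!o->is_small && --o->refcnt == 0) {
        free(o);
    }
}

/* Overwriting is safe only if nobody else can observe the object. */
static bool
toy_can_mutate(const toy_obj *o)
{
    return !o->is_small && o->refcnt == 1;
}

/* Computes left + right and returns it as a new reference (NULL on OOM).
 * The caller still owns left and right and pops them afterwards. */
static toy_obj *
toy_add_inplace(toy_obj *left, toy_obj *right)
{
    int64_t sum = left->value + right->value;
    bool compact = sum > -TOY_DIGIT_LIMIT && sum < TOY_DIGIT_LIMIT;
    if (compact && toy_can_mutate(left)) {
        left->value = sum;    /* reuse the operand's storage */
        return toy_dup(left); /* result and operand now share the object */
    }
    /* Small-int input or overflow: allocate instead of deoptimizing. */
    return toy_new(sum);
}
```

Dropping the extra reference would leave the result with a count of 1 that the subsequent pop of the operand brings to zero, freeing an object still in use; the extra reference is the price of keeping the existing pop opcodes unchanged.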
Selected pyperformance benchmarks: