Feature or enhancement
The new PyBytesWriter() API is fast and easy to use. I expect it to improve both the maintainability and the speed of output buffer management in the compression modules.
I have some perf recordings showing that a large portion (>50%!) of decompression time, across a mix of data sizes (1K, 1M, 1G), is spent in _BlocksOutputBuffer_Finish, which re-assembles the output buffer.
I also made a very hacky modification to pycore_blocks_output_buffer.h to use PyBytesWriter() and found that it greatly reduced decompression time.
The two tests below decompress zstd-compressed enwiki content:
| test | main | PyBytesWriter() |
|---|---|---|
| decompress 1M | 2.15 ms | 1.65 ms |
| decompress 1G | 2.2 s | 1.73 s |
Those are 27-30% speedups (21-23% less wall-clock time)!
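For anyone who wants to double-check the percentages, here is a small sketch that recomputes them from the table. The exact figure depends on whether you measure time saved or throughput gained; the helper name `compare` is just for illustration.

```python
# Timings copied from the table above, converted to seconds.
timings = {
    "decompress 1M": (2.15e-3, 1.65e-3),
    "decompress 1G": (2.2, 1.73),
}


def compare(before, after):
    """Return (fraction of time saved, throughput ratio) for two timings."""
    return (before - after) / before, before / after


for name, (main, writer) in timings.items():
    saved, ratio = compare(main, writer)
    print(f"{name}: {saved:.1%} less time, {ratio:.2f}x faster")

# decompress 1M: 23.3% less time, 1.30x faster
# decompress 1G: 21.4% less time, 1.27x faster
```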
I think this is enough to motivate a refactor of this code to use PyBytesWriter() and benchmark against the current implementation across compression modules and data sizes.
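As a starting point for that comparison, here is a rough sketch of a benchmark script that could be run once under a `main` build and once under a patched build. It is not the harness used for the numbers above: the names `MODULES`, `make_payload`, `bench` and `run` are made up for illustration, the payload is synthetic text standing in for enwiki, and the zstd entry assumes the `compression.zstd` module is available (Python 3.14+).

```python
import bz2
import lzma
import time
import zlib

# Modules to compare; each exposes one-shot compress() and decompress().
MODULES = {"zlib": zlib, "bz2": bz2, "lzma": lzma}
try:
    # Assumption: compression.zstd only exists on Python 3.14 and later.
    from compression import zstd
    MODULES["zstd"] = zstd
except ImportError:
    pass


def make_payload(size):
    """Build `size` bytes of compressible synthetic text (not real enwiki)."""
    lines = []
    total = 0
    i = 0
    while total < size:
        line = b"line %d: the quick brown fox jumps over the lazy dog\n" % i
        lines.append(line)
        total += len(line)
        i += 1
    return b"".join(lines)[:size]


def bench(module, payload, repeat=5):
    """Return the best wall-clock time in seconds for one-shot decompression."""
    blob = module.compress(payload)
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        out = module.decompress(blob)
        best = min(best, time.perf_counter() - start)
    # Sanity check: the round trip must reproduce the input.
    assert out == payload
    return best


def run(sizes=(1 << 10, 1 << 20), repeat=5):
    """Time every module at every size; keys are (module name, size)."""
    results = {}
    for size in sizes:
        payload = make_payload(size)
        for name, module in MODULES.items():
            results[(name, size)] = bench(module, payload, repeat)
    return results


if __name__ == "__main__":
    for (name, size), seconds in run().items():
        print(f"{name:5s} {size:>10d} bytes: {seconds * 1e3:.3f} ms")
```

Taking the best of several runs reduces noise from the allocator and the OS. A 1G case is left out of the defaults because generating and compressing it takes a while, but it can be added by passing `sizes=(1 << 30,)`.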
cc @vstinner for viz
Linked PRs