https://github.com/requests/requests/blob/883caaf145fbe93bd0d208a6b864de9146087312/requests/models.py#L827
iter_content's chunk size there is 10K. Say we have binary content of about 10MB: the .join first drains iter_content into a list of ~1000 items of 10K each (if the response isn't chunked), then allocates a new 10MB bytes object, copies all of the chunks into it, and returns it.
This both puts pressure on the allocator and, at the peak, takes roughly 2x the memory of the input.
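For reference, the linked line boils down to roughly this (a simplified sketch, not the verbatim implementation):

# join() first materializes the iterator into a sequence of chunks,
# then allocates one final bytes object and copies every chunk into it,
# so all the chunks and the result are alive at the same time.
content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE))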
Without resorting to C, and provided we know the input size up front (i.e. the response isn't chunked, so Content-Length is available), we can preallocate a bytearray and fill it one chunk at a time. On my machine this is about 2-3x faster, takes half the memory, and causes half as many page reclaims.
The problem, of course, is that this returns a bytearray rather than bytes. Maybe it should be a helper method for large responses, if people care about this sort of thing?
# totalsize comes from the Content-Length header, known up front.
arr = bytearray(totalsize)  # preallocate the full buffer once
pos = 0
for chunk in self.iter_content(CONTENT_CHUNK_SIZE):
    n = len(chunk)
    arr[pos:pos + n] = chunk  # copy the chunk into place, no reallocation
    pos += n
return arr
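As a standalone helper this could look something like the sketch below. read_content_preallocated is a hypothetical name, not part of requests; it assumes Content-Length is present and accurate and that the body isn't compressed (iter_content yields decoded bytes, whose total size can differ from Content-Length), and converting back to bytes at the end costs one extra copy:

import requests

CONTENT_CHUNK_SIZE = 10 * 1024  # same chunk size models.py uses

def read_content_preallocated(response):
    # Hypothetical helper. Falls back to the normal join when the
    # size isn't known up front (e.g. chunked transfer encoding).
    total = response.headers.get('Content-Length')
    if total is None:
        return b''.join(response.iter_content(CONTENT_CHUNK_SIZE))
    arr = bytearray(int(total))  # preallocate the full buffer once
    pos = 0
    for chunk in response.iter_content(CONTENT_CHUNK_SIZE):
        arr[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    del arr[pos:]  # trim in case Content-Length over-reported
    return arr     # bytearray; use bytes(arr) if bytes is required (one copy)

# usage (URL is a placeholder):
r = requests.get('https://example.com/large.bin', stream=True)
data = read_content_preallocated(r)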