https://github.com/requests/requests/blob/883caaf145fbe93bd0d208a6b864de9146087312/requests/models.py#L827
iter_content's chunk size there is 10K. Say we have binary content of about 10MB: the .join first drains iter_content into a list of ~1000 items of 10K each (if the response isn't chunked), then allocates a new 10MB bytes object, copies all of the chunks into it, and returns it.
This both puts pressure on the allocator and, at the peak, takes roughly 2x the memory of the input.
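For reference, the linked line boils down to roughly this (a simplified sketch, not the verbatim implementation):

# join() first materializes the iterator into a sequence of chunks,
# then allocates one final bytes object and copies every chunk into it,
# so all the chunks and the result are alive at the same time.
content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE))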
Without resorting to C, and provided we know the input size up front (i.e. the response isn't chunked, so Content-Length is available), we can preallocate a bytearray and fill it one chunk at a time. On my machine this is about 2-3x faster, takes half the memory, and causes half as many page reclaims.
The problem, of course, is that this returns a bytearray rather than bytes. Maybe it should be a helper method for large responses, if people care about this sort of thing?
# totalsize comes from the Content-Length header, known up front.
arr = bytearray(totalsize)  # preallocate the full buffer once
pos = 0
for chunk in self.iter_content(CONTENT_CHUNK_SIZE):
    n = len(chunk)
    arr[pos:pos + n] = chunk  # copy the chunk into place, no reallocation
    pos += n
return arr
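As a standalone helper this could look something like the sketch below. read_content_preallocated is a hypothetical name, not part of requests; it assumes Content-Length is present and accurate and that the body isn't compressed (iter_content yields decoded bytes, whose total size can differ from Content-Length), and converting back to bytes at the end costs one extra copy:

import requests

CONTENT_CHUNK_SIZE = 10 * 1024  # same chunk size models.py uses

def read_content_preallocated(response):
    # Hypothetical helper. Falls back to the normal join when the
    # size isn't known up front (e.g. chunked transfer encoding).
    total = response.headers.get('Content-Length')
    if total is None:
        return b''.join(response.iter_content(CONTENT_CHUNK_SIZE))
    arr = bytearray(int(total))  # preallocate the full buffer once
    pos = 0
    for chunk in response.iter_content(CONTENT_CHUNK_SIZE):
        arr[pos:pos + len(chunk)] = chunk
        pos += len(chunk)
    del arr[pos:]  # trim in case Content-Length over-reported
    return arr     # bytearray; use bytes(arr) if bytes is required (one copy)

# usage (URL is a placeholder):
r = requests.get('https://example.com/large.bin', stream=True)
data = read_content_preallocated(r)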