Response.content is wasteful in time and memory for large inputs #4687

@tzickel

Description

https://github.com/requests/requests/blob/883caaf145fbe93bd0d208a6b864de9146087312/requests/models.py#L827

iter_content's chunk size here is 10 KiB. Say we have binary content of about 10 MB: the `.join` first consumes iter_content into a list of roughly 1,000 items of 10 KiB each (if the response is not chunked), then allocates a new 10 MB bytes object, copies all of the chunks into it, and returns it.

This both puts pressure on the allocator and, at peak, uses twice the memory of the input.

Without resorting to C, if we know the input size up front (i.e. the response is not chunked), we can preallocate a bytearray and fill it one chunk at a time. On my machine this is about 2-3x faster, uses half the memory, and causes half as many page reclaims.

The problem, of course, is that this returns a bytearray rather than bytes. Maybe it should be a helper method for large inputs, for people who care about this?

# totalsize is the known body size (e.g. from the Content-Length header)
arr = bytearray(totalsize)  # preallocate the full buffer once
i = 0
for item in self.iter_content(CONTENT_CHUNK_SIZE):
    n = len(item)
    arr[i:i + n] = item  # same-length slice assignment: no reallocation
    i += n
return arr
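The peak-memory claim can be checked with a standalone sketch (hypothetical names, simulating iter_content with a generator rather than a real response), using tracemalloc to record the peak of each strategy:

```python
# Hypothetical demonstration, not requests itself: compare the peak memory of
# b"".join over all chunks vs. filling a preallocated bytearray.
import tracemalloc

CHUNK = 10 * 1024          # mirrors CONTENT_CHUNK_SIZE (10 KiB)
TOTAL = 10 * 1024 * 1024   # 10 MiB of fake body data

def chunks():
    # Simulate iter_content: yield the body 10 KiB at a time.
    for _ in range(TOTAL // CHUNK):
        yield b"x" * CHUNK

def peak_of(fn):
    # Run fn and return the peak traced allocation size in bytes.
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def via_join():
    # join materializes the whole chunk list, then allocates the result,
    # so every chunk plus the final bytes object are alive at once.
    return b"".join(chunks())

def via_bytearray():
    # Preallocate once and copy each chunk into place; only one chunk
    # is alive at a time alongside the buffer.
    arr = bytearray(TOTAL)
    i = 0
    for item in chunks():
        arr[i:i + len(item)] = item
        i += len(item)
    return arr

peak_join = peak_of(via_join)
peak_fill = peak_of(via_bytearray)
```

On the bytes-vs-bytearray question: converting back with bytes(arr) reintroduces a full copy, but consumers that accept the buffer protocol (file writes, memoryview slicing, hashing) can use the bytearray directly with no copy at all.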
