Skip to content

gh-145615: Fix mimalloc page leak in the free-threaded build#145626

Merged
colesbury merged 4 commits intopython:mainfrom
colesbury:gh-145615-fix-mimalloc-page-leak
Mar 9, 2026
Merged

gh-145615: Fix mimalloc page leak in the free-threaded build#145626
colesbury merged 4 commits intopython:mainfrom
colesbury:gh-145615-fix-mimalloc-page-leak

Conversation

@colesbury
Copy link
Contributor

@colesbury colesbury commented Mar 7, 2026

Fix three issues that caused mimalloc pages to be leaked until the owning thread exited:

  1. In _PyMem_mi_page_maybe_free(), move pages out of the full queue when relying on QSBR to defer freeing the page. Pages in the full queue are never searched by mi_page_queue_find_free_ex(), so a page left there is unusable for allocations.

  2. Move _PyMem_mi_page_clear_qsbr() from _mi_page_free_collect() to _mi_page_thread_free_collect() where it only fires when all blocks on the page are free (used == 0). The previous placement was too broad: it cleared QSBR state whenever local_free was non-NULL, but _mi_page_free_collect() is called from non-allocation paths (e.g., page visiting in mi_heap_visit_blocks) where the page is not being reused.

  3. In _PyMem_mi_page_maybe_free(), use the page's heap tld to find the correct thread state for QSBR list insertion instead of PyThreadState_GET(). During stop-the-world pauses, the function may process pages belonging to other threads, so the current thread state is not necessarily the owner of the page.

Fix three issues that caused mimalloc pages to be leaked until the
owning thread exited:

1. In _PyMem_mi_page_maybe_free(), move pages out of the full queue
   when relying on QSBR to defer freeing the page. Pages in the full
   queue are never searched by mi_page_queue_find_free_ex(), so a page
   left there is unusable for allocations.

2. Move _PyMem_mi_page_clear_qsbr() from _mi_page_free_collect() to
   _mi_page_thread_free_collect() where it only fires when all blocks
   on the page are free (used == 0). The previous placement was too
   broad: it cleared QSBR state whenever local_free was non-NULL, but
   _mi_page_free_collect() is called from non-allocation paths (e.g.,
   page visiting in mi_heap_visit_blocks) where the page is not being
   reused.

3. In _PyMem_mi_page_maybe_free(), use the page's heap tld to find the
   correct thread state for QSBR list insertion instead of
   PyThreadState_GET(). During stop-the-world pauses, the function may
   process pages belonging to other threads, so the current thread
   state is not necessarily the owner of the page.
@colesbury colesbury added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 7, 2026
@bedevere-bot

This comment was marked as outdated.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 7, 2026
@colesbury colesbury added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 7, 2026
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @colesbury for commit 2516a1a 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F145626%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 7, 2026
@colesbury colesbury marked this pull request as ready for review March 9, 2026 14:06
@colesbury colesbury requested a review from Yhg1s March 9, 2026 14:06
@colesbury colesbury assigned mpage and unassigned mpage Mar 9, 2026
@colesbury colesbury requested a review from mpage March 9, 2026 14:06
@mpage
Copy link
Contributor

mpage commented Mar 9, 2026

The changes look good to me, though I am not intimately familiar with mimalloc or our modifications to it. The buildbot failure seems like it might be real though. It looks like test.test_tools.test_freeze.TestFreeze.test_freeze_simple_script is consistently timing out on this PR. None of the recent runs on the buildbot failed in the same way.

@colesbury
Copy link
Contributor Author

colesbury commented Mar 9, 2026

I don't think the buildbot failure is related. The PR only touches mimalloc and Py_GIL_DISABLED code, which isn't used by that buildbot configuration. There are fairly frequent time outs at random places with that buildbot: https://buildbot.python.org/#/workers/4. The test_freeze_simple_script test is pretty slow and isn't run in GitHub CI.

@colesbury colesbury merged commit d76df75 into python:main Mar 9, 2026
130 of 131 checks passed
@miss-islington-app
Copy link

Thanks @colesbury for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@colesbury colesbury deleted the gh-145615-fix-mimalloc-page-leak branch March 9, 2026 17:24
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 9, 2026
…ythongh-145626)

Fix three issues that caused mimalloc pages to be leaked until the
owning thread exited:

1. In _PyMem_mi_page_maybe_free(), move pages out of the full queue
   when relying on QSBR to defer freeing the page. Pages in the full
   queue are never searched by mi_page_queue_find_free_ex(), so a page
   left there is unusable for allocations.

2. Move _PyMem_mi_page_clear_qsbr() from _mi_page_free_collect() to
   _mi_page_thread_free_collect() where it only fires when all blocks
   on the page are free (used == 0). The previous placement was too
   broad: it cleared QSBR state whenever local_free was non-NULL, but
   _mi_page_free_collect() is called from non-allocation paths (e.g.,
   page visiting in mi_heap_visit_blocks) where the page is not being
   reused.

3. In _PyMem_mi_page_maybe_free(), use the page's heap tld to find the
   correct thread state for QSBR list insertion instead of
   PyThreadState_GET(). During stop-the-world pauses, the function may
   process pages belonging to other threads, so the current thread
   state is not necessarily the owner of the page.
(cherry picked from commit d76df75f51e662fd15ebe00e107058841de94860)

Co-authored-by: Sam Gross <colesbury@gmail.com>
@bedevere-app
Copy link

bedevere-app bot commented Mar 9, 2026

GH-145691 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Mar 9, 2026
colesbury added a commit that referenced this pull request Mar 9, 2026
…h-145626) (#145691)

Fix three issues that caused mimalloc pages to be leaked until the
owning thread exited:

1. In _PyMem_mi_page_maybe_free(), move pages out of the full queue
   when relying on QSBR to defer freeing the page. Pages in the full
   queue are never searched by mi_page_queue_find_free_ex(), so a page
   left there is unusable for allocations.

2. Move _PyMem_mi_page_clear_qsbr() from _mi_page_free_collect() to
   _mi_page_thread_free_collect() where it only fires when all blocks
   on the page are free (used == 0). The previous placement was too
   broad: it cleared QSBR state whenever local_free was non-NULL, but
   _mi_page_free_collect() is called from non-allocation paths (e.g.,
   page visiting in mi_heap_visit_blocks) where the page is not being
   reused.

3. In _PyMem_mi_page_maybe_free(), use the page's heap tld to find the
   correct thread state for QSBR list insertion instead of
   PyThreadState_GET(). During stop-the-world pauses, the function may
   process pages belonging to other threads, so the current thread
   state is not necessarily the owner of the page.

(cherry picked from commit d76df75)

Co-authored-by: Sam Gross <colesbury@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants