This repository was archived by the owner on May 20, 2025. It is now read-only.

Segmentation fault with distributed package #66

@lesteve

Description


Here is a script that reproduces the segmentation fault. The crash is not fully deterministic: the script sometimes runs fine, so you may need to run it a few times:

```python
# test-distributed.py
import time

from distributed import Client

def func(arg):
    time.sleep(1)
    return arg + 1

if __name__ == '__main__':
    client = Client()
    futures = client.map(func, range(10))
    results = client.gather(futures)
    print(results)
```

To reproduce:

```bash
pip install distributed
python test-distributed.py
```
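Because the crash only shows up on some runs, a small helper that reruns the reproducer until it segfaults can save some manual retrying. This is a minimal sketch that assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code; `run_until_segfault` is a hypothetical helper, not part of distributed:

```python
import signal
import subprocess
import sys


def run_until_segfault(cmd, max_runs=20):
    """Run `cmd` repeatedly until one run is killed by SIGSEGV.

    Hypothetical helper. Returns the 1-based attempt number of the first
    crashing run, or None if none of the `max_runs` attempts segfaulted.
    """
    for attempt in range(1, max_runs + 1):
        # Capture output so the Dask logs of each run do not flood the terminal.
        proc = subprocess.run(cmd, capture_output=True)
        # On POSIX, a child terminated by a signal has returncode == -signum.
        if proc.returncode == -signal.SIGSEGV:
            return attempt
    return None
```

For example, `run_until_segfault([sys.executable, "test-distributed.py"])` returns the attempt on which the script crashed, or `None` after 20 clean runs.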

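Debugging with gdb points towards the profiling code in distributed: the segfault happens in a background profiler thread (`_watch` in `distributed/profile.py`) while it recursively walks a chain of frames and reads `frame.f_lineno`. Maybe this code relies on something specific to stock CPython?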
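To check whether the problem lies in the interpreter rather than in distributed itself, one could try a reproducer without Dask that follows the same access pattern: one thread repeatedly reading `f_lineno` on the frames of another thread that is busy running Python code. The sketch below is hypothetical: `sample_stack` and `stress` are made-up names, it relies on `sys._current_frames()` (a CPython implementation detail), and whether it triggers the same crash on the nogil build is an untested assumption.

```python
import sys
import threading


def sample_stack(thread_id, max_depth=50):
    """Return (filename, lineno, funcname) for each frame of another thread.

    Hypothetical, simplified stand-in for a sampling profiler: fetch the
    target thread's current frame and walk outwards via ``f_back``.
    """
    frame = sys._current_frames().get(thread_id)
    stack = []
    while frame is not None and len(stack) < max_depth:
        code = frame.f_code
        # Reading f_lineno on a frame that another thread is executing is
        # the same kind of access that segfaults in the backtrace of this issue.
        stack.append((code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back
    return stack


def stress(n_samples=1000):
    """Sample a busy worker thread's stack `n_samples` times."""
    stop = threading.Event()

    def busy():
        # Keep the worker executing Python code (including short-lived
        # generator frames) so its stack keeps changing while it is sampled.
        while not stop.is_set():
            sum(i * i for i in range(100))

    worker = threading.Thread(target=busy)
    worker.start()
    try:
        return [sample_stack(worker.ident) for _ in range(n_samples)]
    finally:
        stop.set()
        worker.join()
```

On a GIL build this should simply return the samples; running `stress()` in a loop on the nogil build would show whether plain frame access is enough to crash.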

Here is the gdb `bt` output:

(gdb) bt
#0  vm_stack_walk (w=<optimized out>)
    at ./Include/internal/pycore_stackwalk.h:80
#1  vm_frame_at_offset (ts=<optimized out>, offset=<optimized out>)
    at Python/ceval_meta.c:3272
#2  0x000055555587e334 in PyFrame_GetLineNumber (f=0x4c06a8c0250)
    at Objects/frameobject.c:48
#3  frame_getlineno (f=0x4c06a8c0250, closure=<optimized out>)
    at Objects/frameobject.c:56
#4  0x000055555562d7dc in _PyObject_GenericGetAttrWithDict (
    obj=0x4c06a8c0250, name=0x4c066823840, dict=0x0, 
    suppress=<optimized out>) at Objects/object.c:1251
#5  0x00005555558db738 in _PyEval_Fast (ts=0x7fffe4006230, initial_acc=..., 
    initial_pc=0x4c068ec9302 "5") at Python/ceval.c:1271
#6  0x00005555557189f2 in _PyEval_Eval (pc=<optimized out>, acc=..., 
    tstate=0x7fffe4006230) at Python/ceval_meta.c:2830
#7  _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, 
    nargsf=<optimized out>, kwnames=<optimized out>)
    at Python/ceval_meta.c:3227
#8  0x000055555586491d in _PyObject_VectorcallTstate (
    kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, 
    callable=<optimized out>, tstate=<optimized out>)
    at ./Include/cpython/abstract.h:114
#9  method_vectorcall (method=<optimized out>, 
    args=0x555555a302f0 <_Py_EmptyTupleStruct+48>, nargsf=<optimized out>, 
    kwnames=<optimized out>) at Objects/classobject.c:84
#10 0x00005555555cca1d in PyVectorcall_Call (callable=0x4c06a81d760, 
    tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
#11 0x00005555558114a1 in t_bootstrap (boot_raw=0x4c06a8070a0)
    at ./Modules/_threadmodule.c:1289
#12 0x000055555578950b in pythread_wrapper (arg=<optimized out>)
    at Python/thread_pthread.h:245
#13 0x00007ffff7d6e609 in start_thread (arg=<optimized out>)
    at pthread_create.c:477
#14 0x00007ffff7b39163 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

and the `py-bt` output:

(gdb) py-bt
Traceback (most recent call first):
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 72, in _f_lineno
    f_lineno = frame.f_lineno
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 99, in info_frame
    f_lineno = _f_lineno(frame)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 178, in process
    d = {
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 343, in _watch
    process(frame, None, recent, omit=omit)
  File "/home/lesteve/dev/nogil/Lib/threading.py", line 886, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lesteve/dev/nogil/Lib/threading.py", line 935, in _bootstrap_inner
    self.run()
  File "/home/lesteve/dev/nogil/Lib/threading.py", line 906, in _bootstrap
    self._bootstrap_inner()
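The repeated `process` frames above, each passing `prev` and `depth - 1`, suggest that the profiler recurses up the stack and reads the line number of every frame it visits (through `info_frame` and `_f_lineno`). Below is a simplified sketch of that kind of call-tree builder, assuming nested dicts as the tree representation. `frame_key` and `add_stack` are hypothetical names, and this is not distributed's actual implementation:

```python
import sys


def frame_key(frame):
    """Hypothetical identifier for a frame: (function name, filename, line)."""
    code = frame.f_code
    return (code.co_name, code.co_filename, frame.f_lineno)


def add_stack(frame, tree, depth=50):
    """Insert the call stack ending at `frame` into a nested-dict call tree.

    Recurses towards the outermost caller first (via ``f_back``), then adds
    one node per frame on the way back, incrementing each node's count.
    Returns the node for `frame`, or `tree` itself once the walk stops.
    """
    if frame is None or depth <= 0:
        return tree
    parent = add_stack(frame.f_back, tree, depth - 1)
    node = parent["children"].setdefault(
        frame_key(frame), {"count": 0, "children": {}}
    )
    node["count"] += 1
    return node


# Example: record the current thread's stack once.
root = {"count": 0, "children": {}}
add_stack(sys._getframe(), root)
```

Here the frames belong to the calling thread. In the crashing case the profiler thread is presumably walking the frames of a different, running thread, so every `f_lineno` read along the way races with that thread's execution.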

In case it contains more useful information, here is the full gdb session:

❯ gdb run --args python /tmp/test.py
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
warning: File "/home/lesteve/dev/nogil/python-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
	add-auto-load-safe-path /home/lesteve/dev/nogil/python-gdb.py
line to your configuration file "/home/lesteve/.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/home/lesteve/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) r
Starting program: /home/lesteve/venvs/nogil/bin/python /tmp/test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3c94700 (LWP 447850)]
[New Thread 0x7ffff3493700 (LWP 447851)]
[New Thread 0x7fffeec92700 (LWP 447852)]
[New Thread 0x7fffec04e700 (LWP 447853)]
[New Thread 0x7fffeb84d700 (LWP 447854)]
[New Thread 0x7fffeb04c700 (LWP 447855)]
[New Thread 0x7fffea847700 (LWP 447856)]
[Thread 0x7fffeec92700 (LWP 447852) exited]
[Thread 0x7ffff3493700 (LWP 447851) exited]
[Thread 0x7ffff3c94700 (LWP 447850) exited]
[Detaching after fork from child process 447857]
[New Thread 0x7fffeec92700 (LWP 447858)]
[Detaching after fork from child process 447859]
[New Thread 0x7ffff3493700 (LWP 447860)]
[New Thread 0x7ffff3c94700 (LWP 447861)]
[Detaching after fork from child process 447862]
[New Thread 0x7fffe9ffa700 (LWP 447863)]
[New Thread 0x7fffe97f9700 (LWP 447864)]
[Detaching after fork from child process 447865]
[New Thread 0x7fffe8ff8700 (LWP 447866)]
[New Thread 0x7fffc3fff700 (LWP 447867)]
[Detaching after fork from child process 447868]
[New Thread 0x7fffc37fe700 (LWP 447869)]
2022-06-13 11:56:51,444 - distributed.diskutils - INFO - Found stale lock file and directory '/home/lesteve/dask-worker-space/worker-ijsuqx4o', purging
2022-06-13 11:56:51,445 - distributed.diskutils - INFO - Found stale lock file and directory '/home/lesteve/dask-worker-space/worker-37teagdg', purging
2022-06-13 11:56:51,445 - distributed.diskutils - INFO - Found stale lock file and directory '/home/lesteve/dask-worker-space/worker-ehw7h68k', purging
[Detaching after fork from child process 447920]

Thread 6 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeb84d700 (LWP 447854)]
vm_stack_walk (w=<optimized out>)
    at ./Include/internal/pycore_stackwalk.h:80
80	        if (func != NULL && PyFunction_Check(func) && w->pc != NULL) {
(gdb) bt
#0  vm_stack_walk (w=<optimized out>)
    at ./Include/internal/pycore_stackwalk.h:80
#1  vm_frame_at_offset (ts=<optimized out>, offset=<optimized out>)
    at Python/ceval_meta.c:3272
#2  0x000055555587e334 in PyFrame_GetLineNumber (f=0x4c06a8c0250)
    at Objects/frameobject.c:48
#3  frame_getlineno (f=0x4c06a8c0250, closure=<optimized out>)
    at Objects/frameobject.c:56
#4  0x000055555562d7dc in _PyObject_GenericGetAttrWithDict (
    obj=0x4c06a8c0250, name=0x4c066823840, dict=0x0, 
    suppress=<optimized out>) at Objects/object.c:1251
#5  0x00005555558db738 in _PyEval_Fast (ts=0x7fffe4006230, initial_acc=..., 
    initial_pc=0x4c068ec9302 "5") at Python/ceval.c:1271
#6  0x00005555557189f2 in _PyEval_Eval (pc=<optimized out>, acc=..., 
    tstate=0x7fffe4006230) at Python/ceval_meta.c:2830
#7  _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, 
    nargsf=<optimized out>, kwnames=<optimized out>)
    at Python/ceval_meta.c:3227
#8  0x000055555586491d in _PyObject_VectorcallTstate (
    kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, 
    callable=<optimized out>, tstate=<optimized out>)
    at ./Include/cpython/abstract.h:114
#9  method_vectorcall (method=<optimized out>, 
    args=0x555555a302f0 <_Py_EmptyTupleStruct+48>, nargsf=<optimized out>, 
    kwnames=<optimized out>) at Objects/classobject.c:84
#10 0x00005555555cca1d in PyVectorcall_Call (callable=0x4c06a81d760, 
    tuple=<optimized out>, kwargs=<optimized out>) at Objects/call.c:231
#11 0x00005555558114a1 in t_bootstrap (boot_raw=0x4c06a8070a0)
    at ./Modules/_threadmodule.c:1289
#12 0x000055555578950b in pythread_wrapper (arg=<optimized out>)
    at Python/thread_pthread.h:245
#13 0x00007ffff7d6e609 in start_thread (arg=<optimized out>)
    at pthread_create.c:477
#14 0x00007ffff7b39163 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) source /home/lesteve/dev/nogil/python-gdb.py
(gdb) py-bt
Traceback (most recent call first):
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 72, in _f_lineno
    f_lineno = frame.f_lineno
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 99, in info_frame
    f_lineno = _f_lineno(frame)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 178, in process
    d = {
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 168, in process
    new_state = process(prev, frame, state, stop=stop, depth=depth - 1)
  File "/home/lesteve/venvs/nogil/lib/python3.9/site-packages/distributed/profile.py", line 343, in _watch
    process(frame, None, recent, omit=omit)
  File "/home/lesteve/dev/nogil/Lib/threading.py", line 886, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lesteve/dev/nogil/Lib/threading.py", line 935, in _bootstrap_inner
    self.run()
  File "/home/lesteve/dev/nogil/Lib/threading.py", line 906, in _bootstrap
    self._bootstrap_inner()
