Context
The Dask threaded scheduler (e.g. dask.array or dask.dataframe without dask.distributed) blocks the main thread when the user calls compute() and internally spawns a concurrent.futures.ThreadPoolExecutor to schedule tasks. If a user hits Ctrl+C, they expect the program to terminate soon (at most waiting for the tasks that are halfway through execution to terminate).
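The pattern described above can be sketched roughly as follows. This is only an illustration, not Dask's actual code: the function name `compute`, its parameters, and the cleanup strategy are assumptions made for the example. It shows the calling thread blocking on a `ThreadPoolExecutor` and, on Ctrl+C, dropping tasks that have not started while waiting for the ones already running.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(tasks, max_workers=4):
    """Run zero-argument callables on a thread pool, blocking the caller.

    Hypothetical stand-in for a threaded scheduler's compute(); not Dask code.
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(task) for task in tasks]
    try:
        # The calling (main) thread blocks here until every task has finished.
        # A KeyboardInterrupt raised while waiting propagates to the caller.
        return [future.result() for future in futures]
    finally:
        # Cancel tasks that have not started yet, then wait for the tasks
        # that are halfway through execution. This wait joins the worker
        # threads, which is where a hang like the one reported would show up.
        pool.shutdown(wait=True, cancel_futures=True)
```

For example, `compute([lambda: 1 + 1, lambda: 2 * 3])` returns `[2, 6]`.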
Bug report
This sequence frequently deadlocks on free-threaded CPython on Linux x64:
- main thread calls threading.Thread.start
- another thread calls signal.pthread_kill(main_thread, signal.SIGINT), which lands after start() has returned but before the thread's target function has started executing
- main thread handles the resulting KeyboardInterrupt
- main thread calls .join() on the child thread started in the first step
- frequently, .join() deadlocks. A second SIGINT (e.g. from Ctrl+C) unblocks it.
Reproducer
import signal
import threading

USE_BARRIER = False
NTHREADS = 20


def demo():
    main_thread = threading.get_ident()
    barrier = threading.Barrier(NTHREADS) if USE_BARRIER else None
    thread_can_die = threading.Event()

    def f() -> None:
        if barrier:
            barrier.wait()
        thread_can_die.wait()

    def interrupt() -> None:
        if barrier:
            barrier.wait()
        signal.pthread_kill(main_thread, signal.SIGINT)

    threads = []
    for _ in range(NTHREADS - 1):
        t = threading.Thread(target=f)
        t.start()
        threads.append(t)

    try:
        t = threading.Thread(target=interrupt)
        t.start()
        threads.append(t)
        for t in threads:
            t.join()
    except KeyboardInterrupt:
        pass
    else:
        assert False, "Expected KeyboardInterrupt"

    thread_can_die.set()
    for t in threads:
        t.join()


if __name__ == "__main__":
    i = 0
    while True:
        print(i, flush=True)
        demo()
        i += 1
On these setups, the above program hangs within a handful of cycles:
- Linux x64 3.14.5t
- Linux x64 git tip #7a468a101268d2b13105f94ae027df8b502d0c87 (built with Tools/pixi-packages/freethreading)
Cannot reproduce on
Setting USE_BARRIER=True, which ensures that all child threads are already fully started when the SIGINT lands on the main thread, makes the issue disappear.
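The condition that the barrier enforces (the child's target is already executing before the parent moves on) can also be expressed per thread with a `threading.Event`. The helper below is a hypothetical sketch, with `start_and_wait_until_running` being a name invented for this example. Whether it avoids the hang has not been verified, and it cannot help if a signal arrives between `start()` and the wait, as a real Ctrl+C may.

```python
import threading


def start_and_wait_until_running(target, *args, timeout=None):
    """Start a thread and block until its target has begun executing.

    Hypothetical helper for illustration only; returns the started Thread.
    """
    running = threading.Event()

    def wrapper():
        # First statement executed inside the new thread.
        running.set()
        target(*args)

    thread = threading.Thread(target=wrapper)
    thread.start()
    # Unlike start() alone, this returns only once the wrapper is running.
    if not running.wait(timeout):
        raise TimeoutError("thread did not start running in time")
    return thread
```

In the reproducer, this would replace the `threading.Thread(target=f)` / `t.start()` pair inside the loop, with the barrier left disabled.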
CPython versions tested on:
CPython main branch, 3.14
Operating systems tested on:
Linux