
Investigation of Subprocess Pipe Blocking in a Python Agent

The article analyzes a rare blocking issue in a Python Agent, where a call to subprocess.Popen hangs because the write end of its internal pipe was inherited by an unrelated forked child process and never closed. It walks through the debugging steps using thread frames and lsof, then gives a reproducible example and mitigation recommendations.

NetEase Game Operations Platform

Background: During a recent update, a new subprocess was added to a Python Agent. After deployment, the monitoring thread that watches this subprocess occasionally became blocked. The blocking call was subprocess.Popen, which executed an ss command to collect local connection information.

Investigation: Using Pylane Shell, the author inspected the blocked thread's stack trace, which showed the thread stuck in os.read inside the subprocess32 library. The relevant code creates a pipe and then reads from it in a loop until it gets an empty string or reaches a size limit.

In [25]: tools.print_threads(["pusher_keep_alive"])
Out[25]:

Thread: pusher_keep_alive
Stack:
...
  File "/usr/local/gpostd_virtualenv/local/lib/python2.7/site-packages/libase/multirun/common.py", line 237, in run_cmdline_with_file
  ...
  File "/usr/local/gpostd_virtualenv/local/lib/python2.7/site-packages/subprocess32.py", line 825, in __init__
  ...
  File "/usr/local/gpostd_virtualenv/local/lib/python2.7/site-packages/subprocess32.py", line 1537, in _execute_child
    part = _eintr_retry_call(os.read, errpipe_read, 50000)
  ...
  Locals:
  {'args': (18, 50000), 'func':
}
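The per-thread stack dump above comes from Pylane's tools.print_threads. As a rough illustration of how such a dump can be produced with the standard library alone, the sketch below combines sys._current_frames() with threading.enumerate(). The helper name format_thread_stacks is hypothetical and is not Pylane's implementation; it also omits the frame locals that the real output shows.

```python
import sys
import threading
import traceback


def format_thread_stacks(names=None):
    """Return a text report of the current stack of each live thread.

    `names` is an optional collection of thread names to include;
    when it is None, every live thread is reported.
    """
    frames = sys._current_frames()  # maps thread ident -> topmost frame
    sections = []
    for thread in threading.enumerate():
        if names is not None and thread.name not in names:
            continue
        frame = frames.get(thread.ident)
        if frame is None:
            continue  # thread exited between the two snapshots
        stack = "".join(traceback.format_stack(frame))
        sections.append("Thread: %s\nStack:\n%s" % (thread.name, stack))
    return "\n".join(sections)
```

Calling format_thread_stacks(["pusher_keep_alive"]) inside the running process would return the stack of that thread only, which is enough to see which call it is blocked in.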

The trace shows the thread blocked at os.read(errpipe_read, 50000), where errpipe_read is file descriptor 18. The author then used lsof and found that both the main process (PID 912) and a forked child process (PID 25189) held the read end of the pipe (fd 18), while only the child still held the write end (fd 19). Because one copy of the write end remained open, the reader never saw end-of-file.

lsof -d 18
python2.7   912   root   18r   FIFO   0,10   0t0 20491606 pipe
python2.7 25189   root   18r   FIFO   0,10   0t0 20491606 pipe

lsof -d 19
python2.7 25189   root   19w   FIFO   0,10   0t0 20491606 pipe
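The same check can be scripted when lsof is not available. The sketch below is Linux-only and assumes a mounted /proc filesystem; it lists the descriptors in a given process that refer to a given pipe inode (the NODE column in the lsof output above). The helper name fds_for_pipe is hypothetical.

```python
import os


def fds_for_pipe(pid, inode):
    """Return the sorted fd numbers in process `pid` that point at pipe `inode`.

    Relies on Linux exposing each descriptor as a symlink under
    /proc/<pid>/fd whose target looks like "pipe:[<inode>]".
    """
    target = "pipe:[%d]" % inode
    fd_dir = "/proc/%d/fd" % pid
    found = []
    for name in os.listdir(fd_dir):
        try:
            if os.readlink(os.path.join(fd_dir, name)) == target:
                found.append(int(name))
        except OSError:
            continue  # descriptor was closed while we were scanning
    return sorted(found)


if __name__ == "__main__":
    fdr, fdw = os.pipe()
    inode = os.fstat(fdr).st_ino  # both ends of a pipe share one inode
    print(fds_for_pipe(os.getpid(), inode))
```

Running it against every PID that holds the inode shows which process still owns a write end. Inspecting another user's process usually needs root, as with lsof.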

Analysis: The issue stems from a race condition. The main process creates the pipe inside subprocess.Popen, and before it closes its own write end, a separate thread forks a child process. That child inherits a copy of the write end. When the main process later closes its write end, the child's copy is still open, so the read call never receives EOF and blocks indefinitely.
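The rule behind this is that a pipe reader sees EOF only once every copy of the write end is closed. The following single-process sketch illustrates that rule without forking: os.dup stands in for the copy a forked child would inherit, and the read end is made non-blocking so the demo reports the missing EOF instead of hanging. The function name is hypothetical, and os.set_blocking requires Python 3.5 or later.

```python
import os


def eof_requires_all_writers_closed():
    """Show that EOF arrives only after the last write-end copy is closed."""
    fdr, fdw = os.pipe()
    inherited_w = os.dup(fdw)    # stands in for the child's inherited copy
    os.set_blocking(fdr, False)  # report "no data yet" instead of hanging

    os.close(fdw)                # the "parent" closes its own write end
    try:
        os.read(fdr, 10)
        eof_before = True        # would mean EOF was delivered too early
    except BlockingIOError:
        eof_before = False       # a writer still exists, so no EOF yet

    os.close(inherited_w)        # the last writer goes away
    data = os.read(fdr, 10)      # now the read returns the EOF marker
    os.close(fdr)
    return eof_before, data


if __name__ == "__main__":
    print(eof_requires_all_writers_closed())
```

With a blocking descriptor, as in subprocess32, the first read would simply wait, which is the hang seen in the agent.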

Proof of Concept: A minimal reproducible example was provided. The case_good function demonstrates normal pipe behavior (close the write end, and the read returns an empty string). The case_bad function forks a child while the write end is still open, reproducing the blocking behavior.

# -*- coding: utf-8 -*-
# Assumes multiprocessing starts children with fork (as on Linux with
# Python 2.7), so the child inherits the pipe fds and the module globals.
import os
import time
from multiprocessing import Process

def case_good():
    """Normal case: close write end, read returns an empty string"""
    fdr, fdw = os.pipe()
    os.close(fdw)
    r = os.read(fdr, 10)
    if not r:  # '' on Python 2, b'' on Python 3
        print("fdr closed")

def sub_process():
    """Child process sleeps, then closes its inherited copy of the write end"""
    time.sleep(10)
    os.close(fdw)

def case_bad():
    """Fork while holding fd, reproduces the blocking issue"""
    global fdw, fdr
    fdr, fdw = os.pipe()
    p = Process(target=sub_process)
    p.start()
    time.sleep(2)
    os.close(fdw)  # write end still held by child
    r = os.read(fdr, 10)  # blocks until child closes fdw
    if not r:
        print("fdr closed")

if __name__ == "__main__":
    case_good()
    case_bad()

Running the script shows case_bad hanging for about 10 seconds in total: the parent closes its write end after 2 seconds, and the read then blocks for roughly 8 more seconds until the child process closes its copy. Only then does the read return and the program finish.

Conclusion: Forking in an unexpected context can cause child processes to inherit file descriptors they do not need, leading to subtle bugs such as pipe blocking. The recommended mitigation is to manage resource inheritance carefully: close unnecessary descriptors before forking, or use close_fds=True where appropriate, and make sure thread and process lifecycles are well-ordered.
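As one illustration of managing inheritance, the sketch below has the forked child close the pipe descriptors it inherited as its first action, so the parent's read sees EOF as soon as the parent closes its own write end. It uses os.fork directly and a hypothetical function name. It assumes the forking code knows which descriptors to drop, which was not the case in the agent, where the pipe was created inside library code.

```python
import os
import signal
import time


def read_with_clean_child():
    """Fork a long-lived child that drops inherited pipe fds right away."""
    fdr, fdw = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: release the inherited descriptors before doing anything else.
        os.close(fdr)
        os.close(fdw)
        time.sleep(5)  # simulate a child that outlives the parent's read
        os._exit(0)

    # Parent: once our write end is closed, no writer is left anywhere.
    os.close(fdw)
    start = time.time()
    data = os.read(fdr, 10)  # returns the EOF marker without waiting
    waited = time.time() - start
    os.close(fdr)

    os.kill(pid, signal.SIGTERM)  # demo cleanup: stop the sleeping child
    os.waitpid(pid, 0)
    return data, waited


if __name__ == "__main__":
    print(read_with_clean_child())
```

Compared with case_bad in the proof of concept, the only change is that the child closes its copies immediately instead of after its sleep, so the read no longer depends on the child's lifetime.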

Further Reading:

Pipe fundamentals: IBM IPC guide, Python os.pipe documentation

Linux fork: fork(2) manual

Python stack frames: Python data model
