
Why Do Readiness Probe Failures Show “OCI runtime exec failed: EOF” in Kubernetes?

A Kubernetes pod repeatedly reported readiness probe warnings with an OCI runtime exec failure. Tracing the error through the kubelet, dockershim, Docker, containerd, and runc revealed a race condition: cpu-manager updates rewrote the container's state file while runc exec was reading it. Disabling cpu-manager or upgrading runc resolves the issue.

Ops Development Stories

Introduction

This article records the problem investigation. The source-code analysis was written by a developer colleague and is published with their consent.

Problem

The customer reported many warning events of the form

Readiness probe failed: OCI runtime exec failed: exec failed: EOF: unknown

but the service remained accessible.

Environment

Note: the customer enabled cpu-manager on the k8s node running the workload.

Component | Version
k8s | 1.14.x

Investigation

1. After receiving the feedback, check the kubelet logs on the node where the pod runs:

<code>I0507 03:43:28.310630 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(d1aab5f0-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: EOF: unknown
I0507 07:08:49.834093 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(a89a158e-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: unexpected EOF: unknown
I0507 10:06:58.307881 57003 prober.go:112] Readiness probe for "adsfadofadfabdfhaodsfa(d1aab5f0-ae8f-11eb-a151-080027049c65):c0" failed (failure): OCI runtime exec failed: exec failed: EOF: unknown</code>

The probe result is failure, which corresponds to the following code:

[Figure: probe error code]

2. Check Docker logs:

<code>time="2021-05-06T16:51:40.009989451+08:00" level=error msg="stream copy error: reading from a closed fifo"
time="2021-05-06T16:51:40.010054596+08:00" level=error msg="stream copy error: reading from a closed fifo"
time="2021-05-06T16:51:40.170676532+08:00" level=error msg="Error running exec 8e34e8b910694abe95a467b2936b37635fdabd2f7b7c464dfef952fa5732aa4e in container: OCI runtime exec failed: exec failed: EOF: unknown"</code>

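Although the Docker log also shows a stream copy error, the actual error comes from the underlying runc, which returned EOF. The probe result is failure because e.CombinedOutput() returned a non-nil error carrying a non-zero exit status. For an exec probe, CombinedOutput() runs the command in the container through the CRI ExecSync call, which dockershim implements with ExecInContainer, so that is where the investigation continues.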
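To make the classification concrete, here is a simplified, self-contained model of how an exec probe result is derived from the command's output and error. It is a sketch, not the kubelet source: the Result values, the exitError type, and classify are hypothetical stand-ins. It assumes, as the log lines above suggest, that the runtime reported a non-zero exit code and that the runtime's error text ended up in the command output.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical stand-in for the kubelet's probe result values.
type Result string

const (
	Success Result = "success"
	Failure Result = "failure"
	Unknown Result = "unknown"
)

// exitError is a hypothetical error type that carries the exit status of the
// command that ran inside the container.
type exitError struct{ code int }

func (e *exitError) Error() string   { return fmt.Sprintf("command terminated with exit code %d", e.code) }
func (e *exitError) ExitStatus() int { return e.code }

// classify maps (output, err) to a probe result: no error is success, an
// exit-status error is success or failure depending on the status, and any
// other error means the command could not be run, so the result is unknown.
func classify(output []byte, err error) (Result, string) {
	if err == nil {
		return Success, string(output)
	}
	var ee *exitError
	if errors.As(err, &ee) {
		if ee.ExitStatus() == 0 {
			return Success, string(output)
		}
		return Failure, string(output)
	}
	return Unknown, ""
}

func main() {
	// Illustrative values: the runtime's error text became the command output
	// and the exec was reported with a non-zero exit code (1 is arbitrary).
	out := []byte("OCI runtime exec failed: exec failed: EOF: unknown")
	result, msg := classify(out, &exitError{code: 1})
	fmt.Printf("Readiness probe failed (%s): %s\n", result, msg)
}
```

Running it prints a line in the same shape as the kubelet log entries. Only an error that carries an exit status can yield failure; an error that means the command never ran yields unknown with empty output.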

[Figure: ExecInContainer flow]
[Figure: ExecSync via dockershim]
[Figure: dockershim ExecInContainer]
ExecInContainer implementation (excerpt):

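<code>func (*NativeExecHandler) ExecInContainer(client libdocker.Interface, container *dockertypes.ContainerJSON, cmd []string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize, timeout time.Duration) error {
    // 1. Create the exec instance in dockerd and obtain its ExecID.
    createOpts := dockertypes.ExecConfig{Cmd: cmd, AttachStdin: stdin != nil, AttachStdout: stdout != nil, AttachStderr: stderr != nil, Tty: tty}
    execObj, err := client.CreateExec(container.ID, createOpts)
    if err != nil {
        return err
    }
    // StartExec signals on execStarted once the exec is running; the goroutine
    // that waits on it to handle terminal resizes is elided from this excerpt.
    execStarted := make(chan struct{})

    // 2. Start the exec and stream stdin/stdout/stderr; this blocks until the streams close.
    startOpts := dockertypes.ExecStartCheck{Detach: false, Tty: tty}
    streamOpts := libdocker.StreamOptions{InputStream: stdin, OutputStream: stdout, ErrorStream: stderr, RawTerminal: tty, ExecStarted: execStarted}
    err = client.StartExec(execObj.ID, startOpts, streamOpts)
    if err != nil {
        return err
    }

    // 3. Poll InspectExec every 2 s until the process exits; a non-zero exit
    // code is returned as a dockerExitError.
    ticker := time.NewTicker(2 * time.Second)
    defer ticker.Stop()
    for {
        inspect, err2 := client.InspectExec(execObj.ID)
        if err2 != nil {
            return err2
        }
        if !inspect.Running {
            if inspect.ExitCode != 0 {
                err = &dockerExitError{inspect}
            }
            break
        }
        <-ticker.C
    }
    return err
}
</code>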

ExecInContainer performs three main steps:

1. Call CreateExec to create the exec instance and obtain an ExecID.
2. Call StartExec to run the exec and redirect its I/O.
3. Call InspectExec to obtain the running status and exit code.

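The error message printed in the kubelet logs is actually the content of dockerd's response stream: when the exec fails, dockerd writes the error text into the exec's output stream, and the kubelet then reports that text as the probe output.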
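The sketch below models the three-step exec flow against a fake daemon that behaves as described: StartExec itself succeeds, the error text is written into the output stream, and InspectExec reports a non-zero exit code. All types and functions here (execClient, execInspect, failingDaemon, runAgainstFailingDaemon) are hypothetical and much smaller than the real libdocker.Interface; the exit code 1 is purely illustrative.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"time"
)

// execInspect is a hypothetical, trimmed-down view of an InspectExec result.
type execInspect struct {
	Running  bool
	ExitCode int
}

// execClient is a hypothetical interface with the three calls the exec flow
// needs; it is not the real libdocker.Interface.
type execClient interface {
	CreateExec(containerID string, cmd []string) (string, error)
	StartExec(execID string, stdout io.Writer) error
	InspectExec(execID string) (execInspect, error)
}

// exitError reports a non-zero exit code observed through InspectExec.
type exitError struct{ code int }

func (e *exitError) Error() string { return fmt.Sprintf("exit code %d", e.code) }

// execInContainer creates the exec, starts it with output redirected to
// stdout, then polls InspectExec until the process is no longer running.
func execInContainer(c execClient, containerID string, cmd []string, stdout io.Writer, poll time.Duration) error {
	id, err := c.CreateExec(containerID, cmd)
	if err != nil {
		return err
	}
	if err := c.StartExec(id, stdout); err != nil {
		return err
	}
	ticker := time.NewTicker(poll)
	defer ticker.Stop()
	for {
		st, err := c.InspectExec(id)
		if err != nil {
			return err
		}
		if !st.Running {
			if st.ExitCode != 0 {
				return &exitError{code: st.ExitCode}
			}
			return nil
		}
		<-ticker.C
	}
}

// failingDaemon imitates a daemon whose runtime exec failed: the error text is
// written to the output stream and the exec is recorded with a non-zero code.
type failingDaemon struct{}

func (failingDaemon) CreateExec(string, []string) (string, error) { return "exec-1", nil }

func (failingDaemon) StartExec(_ string, stdout io.Writer) error {
	io.WriteString(stdout, "OCI runtime exec failed: exec failed: EOF: unknown\r\n")
	return nil // the exec failure is not reported as a StartExec error
}

func (failingDaemon) InspectExec(string) (execInspect, error) {
	return execInspect{Running: false, ExitCode: 1}, nil // 1 is illustrative
}

// runAgainstFailingDaemon runs one exec against failingDaemon and returns the
// captured output together with the resulting error.
func runAgainstFailingDaemon() (string, error) {
	var out bytes.Buffer
	err := execInContainer(failingDaemon{}, "c0", []string{"cat", "/tmp/ready"}, &out, 10*time.Millisecond)
	return out.String(), err
}

func main() {
	out, err := runAgainstFailingDaemon()
	fmt.Printf("err=%v output=%q\n", err, out)
}
```

The result pairs a non-zero-exit error with output that contains the runtime's error text, which is the combination that makes the kubelet log "failed (failure)" followed by the OCI runtime message.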

[Figure: dockerd error handling]

Further tracing shows that ExecStart eventually calls into containerd, which invokes runc, and it is the runc exec that fails with exec failed: EOF: unknown.

[Figure: runc execution path]

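Running runc exec repeatedly reproduces the issue sporadically. Investigation revealed that runc exec reads the container's state.json. Meanwhile, the kubelet cpu-manager periodically reconciles the container's CPU assignment (every 10 s by default) by updating the container, and that update is applied through runc, which rewrites state.json. The original saveState rewrites the file in place: os.Create truncates it and the new JSON is then written. A concurrent reader can therefore see an empty file, on which the JSON decoder returns EOF, or a half-written file, on which it returns unexpected EOF. These are exactly the two variants that appear in the kubelet logs.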
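The two messages can be reproduced without runc. This sketch decodes three snapshots of a state file (complete, just truncated, and half written) with encoding/json's Decoder, the reader style assumed here for runc's state loading. The state struct and the helpers decodeSnapshot and describe are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// state is a hypothetical, trimmed stand-in for a container's state.json.
type state struct {
	ID             string `json:"id"`
	InitProcessPid int    `json:"init_process_pid"`
}

// decodeSnapshot decodes one snapshot of the file's contents with a
// json.Decoder, as a reader that opened the file mid-write would.
func decodeSnapshot(content string) error {
	var s state
	return json.NewDecoder(strings.NewReader(content)).Decode(&s)
}

// describe labels a decode error by the sentinel it matches.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, io.EOF):
		return "EOF"
	case errors.Is(err, io.ErrUnexpectedEOF):
		return "unexpected EOF"
	default:
		return "other: " + err.Error()
	}
}

func main() {
	full := `{"id":"c0","init_process_pid":4242}`
	snapshots := []struct{ name, content string }{
		{"complete file", full},
		{"just truncated (empty)", ""},
		{"half written", full[:len(full)/2]},
	}
	for _, snap := range snapshots {
		err := decodeSnapshot(snap.content)
		fmt.Printf("%-24s -> %s (%v)\n", snap.name, describe(err), err)
	}
}
```

The messages of io.EOF and io.ErrUnexpectedEOF are "EOF" and "unexpected EOF", matching the "exec failed: EOF" and "exec failed: unexpected EOF" variants in the kubelet logs.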

[Figure: state.json race condition]

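A related runc PR fixes the problem by making saveState atomic: the new state is written to a temporary file in the same directory and then renamed over state.json, so readers never observe a partially written file.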

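<code>// original saveState: os.Create truncates state.json in place and then writes
// the new JSON, so a concurrent reader (e.g. runc exec) can observe an empty or
// half-written file.
func (c *linuxContainer) saveState(s *State) error {
    f, err := os.Create(filepath.Join(c.root, stateFilename))
    if err != nil {
        return err
    }
    defer f.Close()
    return utils.WriteJSON(f, s)
}

// fixed saveState: write the state to a temporary file in the same directory,
// then rename it over state.json. rename(2) replaces the target atomically, so
// readers see either the complete old state or the complete new state.
func (c *linuxContainer) saveState(s *State) (retErr error) {
    tmpFile, err := ioutil.TempFile(c.root, "state-")
    if err != nil {
        return err
    }
    // On any failure, close and remove the temporary file so it does not linger.
    defer func() {
        if retErr != nil {
            tmpFile.Close()
            os.Remove(tmpFile.Name())
        }
    }()
    err = utils.WriteJSON(tmpFile, s)
    if err != nil {
        return err
    }
    err = tmpFile.Close()
    if err != nil {
        return err
    }
    stateFilePath := filepath.Join(c.root, stateFilename)
    return os.Rename(tmpFile.Name(), stateFilePath)
}
</code>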
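The fixed saveState relies on the write-to-temp-then-rename pattern. The sketch below applies the same pattern to a hypothetical state struct and races one writer against one reader; on Linux, where rename(2) atomically replaces the target, the reader should never see a missing or partial file. The names writeStateAtomic, readState, and raceAtomic are illustrative, and os.CreateTemp (Go 1.16+) stands in for the older ioutil.TempFile used in runc.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// state is a hypothetical, trimmed stand-in for a container's state.json.
type state struct {
	ID      string `json:"id"`
	Version int    `json:"version"`
}

// writeStateAtomic mirrors the fixed saveState: write to a temp file in the
// same directory, close it, then rename it over state.json.
func writeStateAtomic(dir string, s state) (retErr error) {
	tmp, err := os.CreateTemp(dir, "state-")
	if err != nil {
		return err
	}
	defer func() {
		if retErr != nil { // clean up the temp file on any failure
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()
	if err := json.NewEncoder(tmp).Encode(s); err != nil {
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, "state.json"))
}

// readState opens state.json and decodes it, as a concurrent reader would.
func readState(dir string) (state, error) {
	var s state
	f, err := os.Open(filepath.Join(dir, "state.json"))
	if err != nil {
		return s, err
	}
	defer f.Close()
	err = json.NewDecoder(f).Decode(&s)
	return s, err
}

// raceAtomic runs one writer goroutine against a reader loop and returns the
// number of reads that failed, plus any writer error.
func raceAtomic(iterations int) (int, error) {
	dir, err := os.MkdirTemp("", "atomic-state-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	if err := writeStateAtomic(dir, state{ID: "c0"}); err != nil {
		return 0, err
	}

	var wg sync.WaitGroup
	var writeErr error
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 1; i <= iterations; i++ {
			if err := writeStateAtomic(dir, state{ID: "c0", Version: i}); err != nil {
				writeErr = err
				return
			}
		}
	}()

	failures := 0
	for i := 0; i < iterations; i++ {
		if _, err := readState(dir); err != nil {
			failures++
		}
	}
	wg.Wait() // writeErr is safe to read after Wait returns
	return failures, writeErr
}

func main() {
	failures, err := raceAtomic(2000)
	if err != nil {
		fmt.Println("writer error:", err)
		return
	}
	fmt.Println("failed reads:", failures)
}
```

The temporary file must live in the same directory (and therefore the same filesystem) as state.json, because rename is only atomic within one filesystem. Replacing writeStateAtomic with a plain os.Create on state.json can make failed reads appear, although how many depends on timing.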

Solution

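Either of the following resolves the issue:

1. Disable cpu-manager (for example, by setting the kubelet's --cpu-manager-policy back to none).
2. Upgrade runc to a version that contains the fix above.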

Written by Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
