Why Did kube-apiserver OOM? A Deep Dive into Kubernetes Control-Plane Failures
This article analyzes a September 2021 incident in which a Kubernetes cluster's kube-apiserver was repeatedly OOM-killed, causing kubectl to hang. It walks through the cluster specs, monitoring data, logs, heap and goroutine profiles, and the DeleteCollection implementation, then closes with troubleshooting steps and preventive measures for control-plane stability.
Cluster and Environment Information
Kubernetes v1.18.4
3 master nodes, each 8 CPU / 16 GiB RAM, 50 GiB SSD
19 minion nodes with heterogeneous configurations
Control‑plane components (kube-apiserver, etcd, kube-controller-manager, kube-scheduler) deployed as static pods
A VIP load‑balances traffic to the three kube-apiserver instances
SSD performance on Tencent Cloud ≈130 MiB/s
Fault Description
On 2021‑09‑10 the kubectl command intermittently hung, indicating that some kube‑apiserver instances were not responding.
Observed Symptoms
kube-apiserver pod description (partial YAML) shows one instance terminated with OOMKilled:
<code>$ kubectl get pods -n kube-system kube-apiserver-x.x.x.x -o yaml
...
containerStatuses:
- containerID: docker://xxxxx
lastState:
terminated:
exitCode: 137
reason: OOMKilled
finishedAt: "2021-09-10T09:29:02Z"
name: kube-apiserver
ready: true
restartCount: 1
started: true
state:
running:
startedAt: "2021-09-10T09:29:08Z"
...</code>
Surrounding Monitoring
IaaS-level black-box metrics show that memory, CPU, and disk-read usage were positively correlated, all dropping sharply on 2021-09-10 before returning to normal.
kube-apiserver Prometheus metrics show a scrape gap during the incident; leading up to it, the API server's memory usage grew monotonically while its workqueue add rate spiked.
Real‑time Debug Screenshots
Two master nodes consumed ~80‑90% of memory.
kube-apiserver process ate most of the memory.
One master's CPU was saturated, with a high I/O wait percentage (wa).
Almost every process performed massive disk reads, making the shell unusable.
The only node with relatively low memory usage (≈8 GiB) had previously been OOM‑killed.
Hypotheses and Guesses
Why does kube‑apiserver consume so much memory?
Clients performing full‑list operations on core resources.
etcd failing to serve, causing the API server and other control‑plane components to repeatedly ListAndWatch and exhaust resources.
Potential memory leak in kube‑apiserver code.
Why is the etcd cluster malfunctioning?
Network jitter within the etcd cluster.
Degraded disk performance preventing normal etcd operation.
Insufficient CPU/RAM on etcd hosts, leading to time‑slice starvation and deadline expirations.
Why do kube‑controller‑manager and kube‑scheduler generate heavy disk reads?
They read local configuration files.
Under extreme memory pressure the OS evicts the file-backed pages (including executable text) of large processes; when those processes run again the pages must be faulted back in from disk, inflating read I/O.
Relevant Logs
kube‑apiserver logs (excerpt):
<code>I0907 07:04:17.611412 1 trace.go:116] Trace[1140445702]: "Get" url:/apis/storage.k8s.io/v1/volumeattachments/... (total time: 976.1773ms)</code>
<code>E0907 07:04:37.327057 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, context canceled]</code>
<code>W0907 07:10:39.496915 1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://etcd0:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing context deadline exceeded". Reconnecting...</code>
etcd warning logs show repeated connection rejections and timeouts:
<code>{"level":"warn","ts":"2021-09-10T17:14:50.559+0800","msg":"rejected connection","error":"read tcp 10.0.0.8:2380->10.0.0.42:49824: i/o timeout"}</code>
<code>{"level":"warn","ts":"2021-09-10T17:15:03.961+0800","msg":"rejected connection","error":"EOF"}</code>
Deep Investigation
A heap profile of kube-apiserver shows massive memory consumption in registry.(*Store).DeleteCollection, which first lists all matching items and then deletes them concurrently.
A goroutine profile reveals tens of thousands of goroutines blocked on channel sends, all originating from DeleteCollection workers.
<code>goroutine 18970952966 [chan send, 429 minutes]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*Store).DeleteCollection.func1(...)
--
... (many similar entries omitted)</code>
The DeleteCollection implementation starts a dispatcher goroutine that feeds work items into a channel, plus a pool of worker goroutines that perform the deletions. If an etcd error occurs during e.Delete, the workers exit, but the dispatcher stays blocked on its next channel send; the items retrieved by the initial List can then never be garbage-collected, so memory balloons.
<code>func (e *Store) DeleteCollection(ctx context.Context, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions, listOptions *metainternalversion.ListOptions) (runtime.Object, error) {
listObj, err := e.List(ctx, listOptions)
if err != nil { return nil, err }
items, err := meta.ExtractList(listObj)
// ... create channel, workers, dispatcher ...
wg.Wait()
select {
case err := <-errs:
return nil, err
default:
return listObj, nil
}
}
</code>
Summary of Troubleshooting Steps
Identify abnormal behavior (kubectl hangs, API server unresponsive).
Gather component information via management tools.
Correlate timestamps in monitoring systems to pinpoint CPU, RAM, and disk spikes.
Form hypotheses about root causes (client load, etcd failure, code bug).
Validate hypotheses by inspecting logs, heap profiles, and goroutine dumps.
Preventing Control‑Plane Cascading Failures
Explicitly set resource limits for kube‑apiserver to avoid excessive memory consumption in mixed‑workload clusters.
Deploy the etcd cluster separately from other control‑plane components.
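As a concrete sketch of the first measure, resource requests and limits can be set directly in the static-pod manifest; the path and the values below are illustrative assumptions, not recommendations from the incident:

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt; values illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        memory: 8Gi   # cap well below node RAM so an OOM kills only this pod
```

With a memory limit in place, a runaway kube-apiserver is OOM-killed and restarted by the kubelet instead of starving etcd and the other control-plane components on the same host.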
Reference: Original article on GitHub
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing original technical articles with a focus on operations transformation.