Why Did kube-apiserver OOM? A Deep Dive into Kubernetes Control-Plane Failures
This article analyzes a September 2021 incident in which a Kubernetes cluster's kube-apiserver was repeatedly OOM-killed, causing kubectl to hang. It walks through the cluster specs, monitoring data, logs, heap and goroutine profiles, and the DeleteCollection implementation, then closes with troubleshooting steps and preventive measures for control-plane stability.
Cluster and Environment Information
Kubernetes v1.18.4
3 master nodes, each 8 CPU / 16 GiB RAM, 50 GiB SSD
19 minion nodes with heterogeneous configurations
Control‑plane components (kube-apiserver, etcd, kube-controller-manager, kube-scheduler) deployed as static pods
A VIP load‑balances traffic to the three kube-apiserver instances
SSD performance on Tencent Cloud ≈130 MiB/s
Fault Description
On 2021‑09‑10 the kubectl command intermittently hung, indicating that some kube‑apiserver instances were not responding.
Observed Symptoms
kube-apiserver pod description (partial YAML) shows one instance terminated with OOMKilled:
<code>$ kubectl get pods -n kube-system kube-apiserver-x.x.x.x -o yaml
...
containerStatuses:
- containerID: docker://xxxxx
lastState:
terminated:
exitCode: 137
reason: OOMKilled
finishedAt: "2021-09-10T09:29:02Z"
name: kube-apiserver
ready: true
restartCount: 1
started: true
state:
running:
startedAt: "2021-09-10T09:29:08Z"
...</code>
Surrounding Monitoring
IaaS-level black-box metrics show that memory, CPU, and disk-read usage were positively correlated, all dropping sharply on 2021-09-10 before returning to normal.
kube-apiserver Prometheus metrics show a scrape gap during the incident; leading up to it, the API server's memory usage grew monotonically while its workqueue add rate spiked.
Real‑time Debug Screenshots
Two master nodes consumed ~80‑90% of memory.
kube-apiserver process ate most of the memory.
One master's CPU was saturated, with a high I/O wait percentage (wa).
Almost every process performed massive disk reads, making the shell unusable.
The only node with relatively low memory usage (≈8 GiB) had previously been OOM‑killed.
Hypotheses and Guesses
Why does kube‑apiserver consume so much memory?
Clients performing full‑list operations on core resources.
etcd failing to serve, causing the API server and other control‑plane components to repeatedly ListAndWatch and exhaust resources.
Potential memory leak in kube‑apiserver code.
Why is the etcd cluster malfunctioning?
Network jitter within the etcd cluster.
Degraded disk performance preventing normal etcd operation.
Insufficient CPU/RAM on etcd hosts, leading to time‑slice starvation and deadline expirations.
Why do kube‑controller‑manager and kube‑scheduler generate heavy disk reads?
They read local configuration files.
Under extreme memory pressure the OS evicts the file-backed pages (including executable text) of large processes; when those processes run again the pages must be faulted back in from disk, inflating read I/O.
Relevant Logs
kube‑apiserver logs (excerpt):
<code>I0907 07:04:17.611412 1 trace.go:116] Trace[1140445702]: "Get" url:/apis/storage.k8s.io/v1/volumeattachments/... (total time: 976.1773ms)</code>
<code>E0907 07:04:37.327057 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, context canceled]</code>
<code>W0907 07:10:39.496915 1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://etcd0:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing context deadline exceeded". Reconnecting...</code>
etcd warning logs show repeated connection rejections and timeouts:
<code>{"level":"warn","ts":"2021-09-10T17:14:50.559+0800","msg":"rejected connection","error":"read tcp 10.0.0.8:2380->10.0.0.42:49824: i/o timeout"}</code>
<code>{"level":"warn","ts":"2021-09-10T17:15:03.961+0800","msg":"rejected connection","error":"EOF"}</code>
Deep Investigation
A heap profile of kube-apiserver shows massive memory consumption in registry.(*Store).DeleteCollection, which first lists all matching items and then deletes them concurrently.
A goroutine profile reveals tens of thousands of goroutines blocked on channel sends, all originating from DeleteCollection workers.
<code>goroutine 18970952966 [chan send, 429 minutes]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*Store).DeleteCollection.func1(...)
--
... (many similar entries omitted)</code>
The DeleteCollection implementation starts a dispatcher goroutine that feeds work items into a channel, plus a pool of worker goroutines that perform the deletions. If an etcd error occurs during e.Delete, the workers exit, but the dispatcher stays blocked on its next channel send; the items retrieved by the initial List can then never be garbage-collected, so memory balloons.
<code>func (e *Store) DeleteCollection(ctx context.Context, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions, listOptions *metainternalversion.ListOptions) (runtime.Object, error) {
listObj, err := e.List(ctx, listOptions)
if err != nil { return nil, err }
items, err := meta.ExtractList(listObj)
// ... create channel, workers, dispatcher ...
wg.Wait()
select {
case err := <-errs:
return nil, err
default:
return listObj, nil
}
}
</code>
Summary of Troubleshooting Steps
Identify abnormal behavior (kubectl hangs, API server unresponsive).
Gather component information via management tools.
Correlate timestamps in monitoring systems to pinpoint CPU, RAM, and disk spikes.
Form hypotheses about root causes (client load, etcd failure, code bug).
Validate hypotheses by inspecting logs, heap profiles, and goroutine dumps.
Preventing Control‑Plane Cascading Failures
Explicitly set resource limits for kube‑apiserver to avoid excessive memory consumption in mixed‑workload clusters.
Deploy the etcd cluster separately from other control‑plane components.
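As a concrete sketch of the first measure, resource requests and limits can be set directly in the static-pod manifest; the path and the values below are illustrative assumptions, not recommendations from the incident:

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt; values illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        memory: 8Gi   # cap well below node RAM so an OOM kills only this pod
```

With a memory limit in place, a runaway kube-apiserver is OOM-killed and restarted by the kubelet instead of starving etcd and the other control-plane components on the same host.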
Reference: Original article on GitHub
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing original technical articles with a focus on operations transformation.