Why Did kube-apiserver OOM? A Deep Dive into Kubernetes Control‑Plane Failures
This article details a real‑world Kubernetes control‑plane outage in which kube‑apiserver was repeatedly OOM‑killed. It examines cluster metrics, logs, and heap and goroutine profiles; explores root‑cause hypotheses such as etcd latency and a DeleteCollection memory leak; and offers practical prevention steps.
Cluster and Environment Information
k8s v1.18.4
3 master nodes, each 8 CPU / 16 GB RAM, 50 GiB SSD
19 minion nodes with varied configurations
Control‑plane components (kube‑apiserver, etcd, kube‑controller‑manager, kube‑scheduler) deployed as static pods
VIP load‑balances traffic to the three kube‑apiserver front‑ends
Tencent Cloud SSD performance ~130 MB/s
Fault Description
On 2021‑09‑10 afternoon, kubectl occasionally hung and could not CRUD standard resources (Pod, Node, etc.). Investigation revealed that some kube‑apiserver instances were not functioning properly.
Observed Information
k8s control‑plane kube‑apiserver pod details:
<code>$ kubectl get pods -n kube-system kube-apiserver-x.x.x.x -o yaml
...
  containerStatuses:
  - containerID: docker://xxxxx
    lastState:
      terminated:
        containerID: docker://yyyy
        exitCode: 137
        finishedAt: "2021-09-10T09:29:02Z"
        reason: OOMKilled
        startedAt: "2020-12-09T07:02:23Z"
    name: kube-apiserver
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2021-09-10T09:29:08Z"
...</code>
On 10 Sep, kube‑apiserver was OOM‑killed (exit code 137).
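Exit code 137 together with reason OOMKilled in lastState is the telltale signal. As a quick sweep across the control plane, the status JSON can be scanned programmatically; a minimal Go sketch (types trimmed to only the fields used, with a hypothetical inlined status in place of real `kubectl get ... -o json` output) might look like:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// terminatedState is the slice of the pod-status schema we care about.
type terminatedState struct {
	ExitCode int    `json:"exitCode"`
	Reason   string `json:"reason"`
}

type containerStatus struct {
	Name      string `json:"name"`
	LastState struct {
		Terminated *terminatedState `json:"terminated"`
	} `json:"lastState"`
	RestartCount int `json:"restartCount"`
}

// oomKilled reports container names whose previous termination was an OOM kill.
func oomKilled(statuses []containerStatus) []string {
	var hits []string
	for _, cs := range statuses {
		if t := cs.LastState.Terminated; t != nil && t.ExitCode == 137 && t.Reason == "OOMKilled" {
			hits = append(hits, cs.Name)
		}
	}
	return hits
}

func main() {
	// Hypothetical status blob, shaped like `kubectl get pod ... -o json`.
	raw := `[{"name":"kube-apiserver","restartCount":1,"lastState":{"terminated":{"exitCode":137,"reason":"OOMKilled"}}}]`
	var statuses []containerStatus
	if err := json.Unmarshal([]byte(raw), &statuses); err != nil {
		panic(err)
	}
	fmt.Println(oomKilled(statuses)) // [kube-apiserver]
}
```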
Surrounding Monitoring
IaaS layer (host black‑box) monitoring snapshots:
Effective information:
Memory, CPU, and disk reads were positively correlated and dropped sharply on 10 Sep, then returned to normal.
Kube‑apiserver Prometheus metrics:
Effective information:
kube‑apiserver I/O problems – Prometheus failed to scrape metrics for a period.
kube‑apiserver memory grew monotonically; the workqueue ADD rate (QPS) was high.
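The same signals can be pulled directly from Prometheus. The queries below are illustrative, using standard kube‑apiserver metric names; the `job="apiserver"` matcher is an assumption that depends on your scrape configuration:

```promql
# kube-apiserver resident memory (monotonic growth is the red flag here)
process_resident_memory_bytes{job="apiserver"}

# workqueue add rate, the "ADD QPS" panel above
sum(rate(workqueue_adds_total{job="apiserver"}[5m])) by (name)

# etcd request latency as observed from kube-apiserver
histogram_quantile(0.99,
  sum(rate(etcd_request_duration_seconds_bucket{job="apiserver"}[5m])) by (le, operation))
```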
Real‑time Debug Information
Effective information:
Two of the three master nodes used ~80‑90 % of memory.
Most memory was consumed by the kube‑apiserver process.
One master’s CPU was saturated, dominated by I/O‑wait (wa) time.
Nearly every process was performing massive disk reads; shells became almost unusable.
The only master with relatively low memory usage (8 Gi) had previously experienced an OOM‑Kill of kube‑apiserver.
Questions and Hypotheses
Why does kube‑apiserver consume large memory?
Clients performing full List of core resources.
etcd unable to serve, causing kube‑apiserver and other control‑plane components to fail leader election, leading to continuous ListAndWatch loops that overwhelm the machine.
Potential memory leak in kube‑apiserver code.
Why does the etcd cluster malfunction?
Network jitter within the etcd cluster.
Disk performance degradation preventing normal etcd operation.
Insufficient compute resources (CPU, RAM) on etcd hosts, causing time‑slice starvation and deadline expirations.
Why do kube‑controller‑manager and kube‑scheduler generate heavy disk reads?
They read local configuration files.
When OS memory is extremely tight, large processes have pages evicted; when rescheduled they are re‑loaded, increasing I/O.
Logs
kube‑apiserver related logs:
<code>I0907 07:04:17.611412 1 trace.go:116] Trace[1140445702]: "Get" url:/apis/storage.k8s.io/v1/volumeattachments/... (total time: 976.1773ms)
...
E0907 07:04:37.327057 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, context canceled]
...
W0907 07:10:39.496915 1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://etcd0:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing context deadline exceeded". Reconnecting...</code>
etcd operations became progressively slower, and eventually kube‑apiserver lost its connection to etcd.
etcd logs (partial):
<code>{"level":"warn","ts":"2021-09-10T17:14:50.559+0800","caller":"embed/config_logging.go:279","msg":"rejected connection","remote-addr":"10.0.0.42:49824","error":"read tcp 10.0.0.8:2380->10.0.0.42:49824: i/o timeout"}
{... "error":"EOF"}
</code>
etcd’s communication with other cluster members was abnormal, so it could not serve requests.
Deep Investigation
Heap profile of kube‑apiserver shows massive memory consumption in registry.(*Store).DeleteCollection, which first lists all matching items and then deletes them concurrently.
Goroutine profile reveals thousands of goroutines blocked on a channel send, all originating from DeleteCollection workers:
<code>goroutine 18970952966 [chan send, 429 minutes]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*Store).DeleteCollection.func1(...)
--
... (many similar entries) ...
</code>
Explanation: when etcd errors occur, the worker goroutines exit, but the distributor goroutine remains blocked sending to the toProcess channel. The items retrieved from etcd therefore can never be garbage‑collected, causing heap growth and eventual OOM.
<code>func (e *Store) DeleteCollection(ctx context.Context, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions, listOptions *metainternalversion.ListOptions) (runtime.Object, error) {
	listObj, err := e.List(ctx, listOptions)
	if err != nil {
		return nil, err
	}
	items, err := meta.ExtractList(listObj)
	// ... spawn workers that call e.Delete for each item ...
	wg.Wait()
	select {
	case err := <-errs:
		return nil, err
	default:
		return listObj, nil
	}
}
</code>
Summary
Define a clear baseline for a healthy cluster (e.g., 100 nodes, 1400 pods, 2 Gi memory for kube‑apiserver).
During troubleshooting: detect abnormal behavior, pinpoint the failing component, locate the time window via monitoring, hypothesize causes, and validate with logs and profiles.
Prevent control‑plane cascade failures by setting explicit CPU/memory limits for kube‑apiserver and isolating the etcd cluster from other control‑plane workloads.
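For the static-pod deployment described above, those limits go directly into the kube‑apiserver manifest. A fragment along these lines (the path is the conventional kubeadm location, and the values are placeholders to be sized against your own baseline, not recommendations):

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (fragment)
spec:
  containers:
  - name: kube-apiserver
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        memory: 8Gi   # caps the blast radius: the apiserver is OOM-killed alone
                      # instead of starving etcd and the rest of the node
```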
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on the operations field and hope to accompany you throughout your operations career, growing together.