Why Did kube-apiserver OOM? A Deep Dive into Kubernetes Control‑Plane Failures
This article details a real‑world Kubernetes control‑plane outage in which kube‑apiserver was repeatedly OOM‑killed. It examines cluster metrics, logs, and heap and goroutine profiles; explores root‑cause hypotheses such as etcd latency and a DeleteCollection memory leak; and offers practical prevention steps.
Cluster and Environment Information
k8s v1.18.4
3 master nodes, each 8 CPU / 16 GB RAM, 50 GiB SSD
19 minion nodes with varied configurations
Control‑plane components (kube‑apiserver, etcd, kube‑controller‑manager, kube‑scheduler) deployed as static pods
VIP load‑balances traffic to the three kube‑apiserver front‑ends
Tencent Cloud SSD performance ~130 MB/s
Fault Description
On 2021‑09‑10 afternoon, kubectl occasionally hung and could not CRUD standard resources (Pod, Node, etc.). Investigation revealed that some kube‑apiserver instances were not functioning properly.
Observed Information
k8s control‑plane kube‑apiserver pod details:
<code>$ kubectl get pods -n kube-system kube-apiserver-x.x.x.x -o yaml
...
  containerStatuses:
  - containerID: docker://xxxxx
    lastState:
      terminated:
        containerID: docker://yyyy
        exitCode: 137
        finishedAt: "2021-09-10T09:29:02Z"
        reason: OOMKilled
        startedAt: "2020-12-09T07:02:23Z"
    name: kube-apiserver
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2021-09-10T09:29:08Z"
...</code>
On 10 Sep, kube‑apiserver was OOM‑killed (exit code 137).
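Exit code 137 together with reason OOMKilled in lastState is the telltale signal. As a quick sweep across the control plane, the status JSON can be scanned programmatically; a minimal Go sketch (types trimmed to only the fields used, with a hypothetical inlined status in place of real `kubectl get ... -o json` output) might look like:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// terminatedState is the slice of the pod-status schema we care about.
type terminatedState struct {
	ExitCode int    `json:"exitCode"`
	Reason   string `json:"reason"`
}

type containerStatus struct {
	Name      string `json:"name"`
	LastState struct {
		Terminated *terminatedState `json:"terminated"`
	} `json:"lastState"`
	RestartCount int `json:"restartCount"`
}

// oomKilled reports container names whose previous termination was an OOM kill.
func oomKilled(statuses []containerStatus) []string {
	var hits []string
	for _, cs := range statuses {
		if t := cs.LastState.Terminated; t != nil && t.ExitCode == 137 && t.Reason == "OOMKilled" {
			hits = append(hits, cs.Name)
		}
	}
	return hits
}

func main() {
	// Hypothetical status blob, shaped like `kubectl get pod ... -o json`.
	raw := `[{"name":"kube-apiserver","restartCount":1,"lastState":{"terminated":{"exitCode":137,"reason":"OOMKilled"}}}]`
	var statuses []containerStatus
	if err := json.Unmarshal([]byte(raw), &statuses); err != nil {
		panic(err)
	}
	fmt.Println(oomKilled(statuses)) // [kube-apiserver]
}
```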
Surrounding Monitoring
IaaS layer (host black‑box) monitoring snapshots:
Effective information:
Memory, CPU, and disk reads were positively correlated and dropped sharply on 10 Sep, then returned to normal.
Kube‑apiserver Prometheus metrics:
Effective information:
kube‑apiserver I/O problems – Prometheus failed to scrape metrics for a period.
kube‑apiserver memory grew monotonically; the workqueue ADD rate (QPS) was high.
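The same signals can be pulled directly from Prometheus. The queries below are illustrative, using standard kube‑apiserver metric names; the `job="apiserver"` matcher is an assumption that depends on your scrape configuration:

```promql
# kube-apiserver resident memory (monotonic growth is the red flag here)
process_resident_memory_bytes{job="apiserver"}

# workqueue add rate, the "ADD QPS" panel above
sum(rate(workqueue_adds_total{job="apiserver"}[5m])) by (name)

# etcd request latency as observed from kube-apiserver
histogram_quantile(0.99,
  sum(rate(etcd_request_duration_seconds_bucket{job="apiserver"}[5m])) by (le, operation))
```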
Real‑time Debug Information
Effective information:
Two of the three master nodes used ~80‑90 % of memory.
Most memory was consumed by the kube‑apiserver process.
One master’s CPU was saturated, dominated by I/O‑wait (wa) time.
Nearly every process was performing massive disk reads; shells became almost unusable.
The only master with relatively low memory usage (8 Gi) had previously experienced an OOM‑Kill of kube‑apiserver.
Questions and Hypotheses
Why does kube‑apiserver consume large memory?
Clients performing full List of core resources.
etcd unable to serve, causing kube‑apiserver and other control‑plane components to fail leader election, leading to continuous ListAndWatch loops that overwhelm the machine.
Potential memory leak in kube‑apiserver code.
Why does the etcd cluster malfunction?
Network jitter within the etcd cluster.
Disk performance degradation preventing normal etcd operation.
Insufficient compute resources (CPU, RAM) on etcd hosts, causing time‑slice starvation and deadline expirations.
Why do kube‑controller‑manager and kube‑scheduler generate heavy disk reads?
They read local configuration files.
When OS memory is extremely tight, large processes have pages evicted; when rescheduled they are re‑loaded, increasing I/O.
Logs
kube‑apiserver related logs:
<code>I0907 07:04:17.611412 1 trace.go:116] Trace[1140445702]: "Get" url:/apis/storage.k8s.io/v1/volumeattachments/... (total time: 976.1773ms)
...
E0907 07:04:37.327057 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, context canceled]
...
W0907 07:10:39.496915 1 clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://etcd0:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing context deadline exceeded". Reconnecting...</code>
etcd operations became progressively slower, and eventually kube‑apiserver lost its connection to etcd.
etcd logs (partial):
<code>{"level":"warn","ts":"2021-09-10T17:14:50.559+0800","caller":"embed/config_logging.go:279","msg":"rejected connection","remote-addr":"10.0.0.42:49824","error":"read tcp 10.0.0.8:2380->10.0.0.42:49824: i/o timeout"}
{... "error":"EOF"}
</code>
etcd’s communication with other cluster members was abnormal, so it could not serve requests.
Deep Investigation
Heap profile of kube‑apiserver shows massive memory consumption in registry.(*Store).DeleteCollection, which first lists all matching items and then deletes them concurrently.
Goroutine profile reveals thousands of goroutines blocked on a channel send, all originating from DeleteCollection workers:
<code>goroutine 18970952966 [chan send, 429 minutes]:
k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/registry/generic/registry.(*Store).DeleteCollection.func1(...)
--
... (many similar entries) ...
</code>
Explanation: when etcd errors occur, the worker goroutines exit, but the distributor goroutine remains blocked sending to the toProcess channel. The items retrieved from etcd therefore can never be garbage‑collected, causing heap growth and eventual OOM.
<code>func (e *Store) DeleteCollection(ctx context.Context, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions, listOptions *metainternalversion.ListOptions) (runtime.Object, error) {
	listObj, err := e.List(ctx, listOptions)
	if err != nil {
		return nil, err
	}
	items, err := meta.ExtractList(listObj)
	// ... spawn workers that call e.Delete for each item ...
	wg.Wait()
	select {
	case err := <-errs:
		return nil, err
	default:
		return listObj, nil
	}
}
</code>
Summary
Define a clear baseline for a healthy cluster (e.g., 100 nodes, 1400 pods, 2 Gi memory for kube‑apiserver).
During troubleshooting: detect abnormal behavior, pinpoint the failing component, locate the time window via monitoring, hypothesize causes, and validate with logs and profiles.
Prevent control‑plane cascade failures by setting explicit CPU/memory limits for kube‑apiserver and isolating the etcd cluster from other control‑plane workloads.
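For the static-pod deployment described above, those limits go directly into the kube‑apiserver manifest. A fragment along these lines (the path is the conventional kubeadm location, and the values are placeholders to be sized against your own baseline, not recommendations):

```yaml
# /etc/kubernetes/manifests/kube-apiserver.yaml (fragment)
spec:
  containers:
  - name: kube-apiserver
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        memory: 8Gi   # caps the blast radius: the apiserver is OOM-killed alone
                      # instead of starving etcd and the rest of the node
```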
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on the operations field and hope to accompany you throughout your operations career, growing together.