How to Fix Common Kubernetes Memory Leaks and Certificate Expiration Issues
This article walks through diagnosing and resolving two frequent Kubernetes problems. The first is a kernel memory (kmem) accounting leak that surfaces as "cannot allocate memory" or "no space left on device" errors; it is fixed by checking cgroup stats and recompiling runc and kubelet without kmem support. The second is expired cluster certificates, which are renewed with kubeadm and then configured for long‑term validity.
As microservice adoption grows, Kubernetes clusters are used more extensively, bringing a series of operational problems. This article introduces two common issues and provides step‑by‑step solutions.
Problem 1: "cannot allocate memory" or "no space left on device" – Kubernetes memory leak
Problem description
After a Kubernetes cluster runs for a long time, some nodes fail to create new Pods and report errors such as:
<code>applying cgroup … caused: mkdir …no space left on device</code>
<code>cannot allocate memory</code>
The cause is often a memory leak in the kmem accounting subsystem.
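To gauge how often a node hits these two errors, one option is to filter the kubelet log for them. The following is a minimal sketch; the function name `count_kmem_errors` is hypothetical, and it only matches the two message fragments quoted above.

```shell
# Count log lines that contain either error signature from this article.
# Reads log text on stdin and prints the number of matching lines.
count_kmem_errors() {
    # grep -c prints the count; it exits 1 when the count is 0, so mask that.
    grep -c -E 'no space left on device|cannot allocate memory' || true
}
```

On a systemd host, assuming kubelet logs to the journal, it could be fed with `journalctl -u kubelet --no-pager | count_kmem_errors`.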
Detecting the leak
<code>cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo</code>
If the command returns "Input/output error", kmem accounting is disabled and the leak is absent. If it prints slabinfo entries instead, kmem accounting is active, which indicates the leak.
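The same check can be wrapped in a small function for use across many nodes. This is a sketch under stated assumptions: the name `kmem_accounting_state` and its two output strings are hypothetical, and any read failure (including a missing file or insufficient permissions) is reported the same way as the "Input/output error" case, so run it as root and treat "not-readable" as a hint rather than proof.

```shell
# Report whether the kmem slabinfo file for a cgroup can be read.
# $1: path to a memory.kmem.slabinfo file
kmem_accounting_state() {
    local slabinfo="$1"
    if cat "$slabinfo" >/dev/null 2>&1; then
        # Readable slabinfo: kmem accounting is active for this cgroup.
        echo "kmem-active"
    else
        # Read failed: I/O error, missing file, or permission problem.
        echo "not-readable"
    fi
}
```

For example: `kmem_accounting_state /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo`.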
Solution overview
Disable kmem accounting by recompiling runc and kubelet without kmem support, then replace the binaries.
Recompile runc
<code>wget https://dl.google.com/go/go1.12.9.linux-amd64.tar.gz</code>
<code>tar xf go1.12.9.linux-amd64.tar.gz -C /usr/local/</code>
<code>export GOPATH="/data/Documents"</code>
<code>export GOROOT="/usr/local/go"</code>
<code>export PATH="$GOROOT/bin:$GOPATH/bin:$PATH"</code>
<code>go env</code>
<code>mkdir -p /data/Documents/src/github.com/opencontainers/</code>
<code>cd /data/Documents/src/github.com/opencontainers/</code>
<code>git clone https://github.com/opencontainers/runc</code>
<code>cd runc/</code>
<code>git checkout v1.0.0-rc9</code>
<code>sudo yum install libseccomp-devel</code>
<code>make BUILDTAGS='seccomp nokmem'</code>
Recompile kubelet
<code>mkdir -p /root/k8s/</code>
<code>git clone https://github.com/kubernetes/kubernetes/ /root/k8s/kubernetes</code>
<code>cd /root/k8s/kubernetes</code>
<code>git checkout v1.15.3</code>
<code># Build a Docker image with Go environment</code>
<code>FROM centos:centos7.3.1611</code>
<code>ENV GOROOT /usr/local/go</code>
<code>ENV GOPATH /usr/local/gopath</code>
<code>ENV PATH /usr/local/go/bin:$PATH</code>
<code># install build tools, then compile</code>
<code>docker run -it --rm -v /root/k8s/kubernetes:/usr/local/gopath/src/k8s.io/kubernetes build-k8s:centos-7.3-go-1.12.9-k8s-1.15.3 bash</code>
<code>cd /usr/local/gopath/src/k8s.io/kubernetes</code>
<code>GO111MODULE=off KUBE_GIT_TREE_STATE=clean KUBE_GIT_VERSION=v1.15.3 make kubelet GOFLAGS="-tags=nokmem"</code>
Replace binaries
<code>mv /usr/bin/kubelet /home/kubelet</code>
<code>mv /usr/bin/docker-runc /home/docker-runc</code>
<code>systemctl stop docker</code>
<code>systemctl stop kubelet</code>
<code>cp kubelet /usr/bin/kubelet</code>
<code>cp kubelet /usr/local/bin/kubelet</code>
<code>cp runc /usr/bin/docker-runc</code>
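The backup and copy steps above could be folded into one helper so that every replaced binary gets a timestamped backup. This is a sketch, not the article's procedure: `replace_binary` and the backup directory are hypothetical, and docker and kubelet are assumed to be stopped first, as in the commands above.

```shell
# Back up an existing binary, then install the rebuilt one in its place.
# $1: newly built binary   $2: install target   $3: backup directory
replace_binary() {
    local new_bin="$1" target="$2" backup_dir="$3"
    if [ ! -f "$new_bin" ]; then
        echo "missing new binary: $new_bin" >&2
        return 1
    fi
    mkdir -p "$backup_dir"
    if [ -f "$target" ]; then
        # Keep the old binary under a timestamped name for rollback.
        cp -p "$target" "$backup_dir/$(basename "$target").$(date +%Y%m%d%H%M%S)"
    fi
    # Copy the new binary into place and mark it executable.
    install -m 0755 "$new_bin" "$target"
}
```

For example, `replace_binary kubelet /usr/bin/kubelet /home/bin-backup` and `replace_binary runc /usr/bin/docker-runc /home/bin-backup`, where `/home/bin-backup` is an illustrative location.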
Once docker and kubelet are running again, check whether kmem accounting is still active:
<code>cat /sys/fs/cgroup/memory/kubepods/burstable/memory.kmem.usage_in_bytes</code>
<code>cat /sys/fs/cgroup/memory/kubepods/memory.kmem.slabinfo</code>
Problem 2: Kubernetes certificate expiration
Background
The API becomes inaccessible with the error:
<code>Unable to connect to the server: x509: certificate has expired or is not yet valid</code>
Check expiration using:
<code>kubeadm alpha certs check-expiration</code>
Solution
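Renew all certificates, restart kubelet, and regenerate the kubeconfig files (in kubeadm v1.20 and later, the <code>kubeadm alpha certs</code> subcommands are available as <code>kubeadm certs</code>):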
<code>kubeadm alpha certs renew all --config=kubeadm.yaml</code>
<code>systemctl restart kubelet</code>
<code>kubeadm init phase kubeconfig all --config kubeadm.yaml</code>
For a long‑lived (10‑year) certificate, edit the kube-controller-manager manifest to add:
<code>spec:</code>
<code>  containers:</code>
<code>  - command:</code>
<code>    - kube-controller-manager</code>
<code>    - --experimental-cluster-signing-duration=87600h</code>
<code>    - --client-ca-file=/etc/kubernetes/pki/ca.crt</code>
Approve pending CSRs, then overwrite the etcd and front-proxy CA files with the cluster CA:
<code>cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/etcd/ca.crt</code>
<code>cp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/front-proxy-ca.crt</code>
<code>cp /etc/kubernetes/pki/ca.key /etc/kubernetes/pki/front-proxy-ca.key</code>
After these changes, certificates signed through the cluster's CSR API are issued with a ten‑year validity, so the cluster no longer needs frequent renewals.
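To confirm that a certificate really carries the longer validity, its expiry can be checked directly with `openssl`. The sketch below is illustrative: `hours_to_days` and `cert_valid_for_days` are hypothetical helper names, and the check relies on `openssl x509 -checkend`, which takes a number of seconds.

```shell
# Convert a Go-style hour count (e.g. the 87600 in "87600h") to whole days.
hours_to_days() {
    echo $(( $1 / 24 ))
}

# Succeed (exit 0) if the certificate in $1 is still valid $2 days from now.
cert_valid_for_days() {
    local cert="$1" days="$2"
    openssl x509 -noout -checkend "$(( days * 86400 ))" -in "$cert" >/dev/null
}

hours_to_days 87600   # 87600 / 24 = 3650 days, i.e. ten 365-day years
```

A possible use is `cert_valid_for_days /var/lib/kubelet/pki/kubelet-client-current.pem 3000 && echo "long-lived"`; that path is an assumption based on the usual kubeadm layout and may differ on a given node.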
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.