Scaling Kubernetes Clusters: Node Quotas, Kernel Tweaks & Etcd Tips
This guide outlines how to prepare large‑scale Kubernetes clusters on public clouds by increasing node quotas, adjusting kernel parameters, configuring high‑availability etcd with the etcd‑operator, tuning kube‑apiserver settings, and applying pod‑level best practices for resource limits and affinity.
1. Node Quotas and Kernel Parameter Adjustments
When a Kubernetes cluster on a public cloud grows, you may encounter quota limits that need to be increased in the cloud console. Common quotas to enlarge include:
Number of virtual machines
Number of vCPUs
Number of internal IP addresses
Number of external IP addresses
Number of security groups
Number of route tables
Persistent storage size
Reference for GCE master node sizing as node count increases:
1‑5 nodes: n1-standard-1
6‑10 nodes: n1-standard-2
11‑100 nodes: n1-standard-4
101‑250 nodes: n1-standard-8
251‑500 nodes: n1-standard-16
More than 500 nodes: n1-standard-32
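The sizing tiers above can be expressed as a small lookup, which is handy when a provisioning script has to pick a master size from a planned node count. This is only a sketch: the function name `master_machine_type` and the constants are hypothetical, and the tiers are copied from the list above rather than from any GCE API.

```python
import bisect

# Upper node-count bound of each tier, copied from the list above.
_TIER_UPPER_BOUNDS = [5, 10, 100, 250, 500]
_MACHINE_TYPES = [
    "n1-standard-1",   # 1-5 nodes
    "n1-standard-2",   # 6-10 nodes
    "n1-standard-4",   # 11-100 nodes
    "n1-standard-8",   # 101-250 nodes
    "n1-standard-16",  # 251-500 nodes
    "n1-standard-32",  # more than 500 nodes
]


def master_machine_type(node_count: int) -> str:
    """Return the suggested GCE master machine type for a node count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # bisect_left finds the first tier whose upper bound is >= node_count;
    # counts above the last bound fall through to the largest machine type.
    tier = bisect.bisect_left(_TIER_UPPER_BOUNDS, node_count)
    return _MACHINE_TYPES[tier]
```

For example, `master_machine_type(120)` falls into the 101‑250 tier.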
Reference kernel parameter (sysctl) configuration from Alibaba Cloud:
<code># Maximum number of file handles the system can open
fs.file-max=1000000
# ARP cache size parameters. Comments are kept on their own lines because
# sysctl.conf only treats lines starting with # or ; as comments.
# gc_thresh1: minimum number of entries before GC runs (default 128)
net.ipv4.neigh.default.gc_thresh1=1024
# gc_thresh2: soft limit for ARP entries
net.ipv4.neigh.default.gc_thresh2=4096
# gc_thresh3: hard limit for ARP entries
net.ipv4.neigh.default.gc_thresh3=8192
# Netfilter connection tracking limits
net.netfilter.nf_conntrack_max=10485760
net.netfilter.nf_conntrack_tcp_timeout_established=300
net.netfilter.nf_conntrack_buckets=655360
# Maximum number of packets queued on the input side of a network device
net.core.netdev_max_backlog=10000
# Inotify limits
fs.inotify.max_user_instances=524288
fs.inotify.max_user_watches=524288</code>
2. Etcd Database
Deploy a highly available etcd cluster that can add members automatically as the Kubernetes cluster grows.
The current solution uses etcd‑operator from CoreOS. An Operator is a framework that simplifies management of stateful applications by extending the Kubernetes API.
Key features of the etcd‑operator:
create/destroy: automatically provision and delete etcd clusters without manual intervention.
resize: dynamically scale the etcd cluster up or down.
backup: support data backup and cluster restoration.
upgrade: upgrade the etcd cluster without service interruption.
Additional recommendations:
Use SSDs for etcd storage.
Increase --quota-backend-bytes (default 2 GB) to enlarge storage limits.
Run a dedicated etcd cluster for kube‑apiserver event data.
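The last two recommendations come down to two command-line flags, and a short sketch shows how their values are formed. It assumes the quota is specified in GiB and that events are redirected with kube‑apiserver's `--etcd-servers-overrides` flag (format `group/resource#servers`, with servers separated by semicolons). The helper names and the endpoint URLs are hypothetical.

```python
from typing import Sequence

GIB = 1024 ** 3  # bytes in one GiB


def quota_backend_bytes_flag(size_gib: int) -> str:
    """Render etcd's --quota-backend-bytes flag for a quota given in GiB."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return f"--quota-backend-bytes={size_gib * GIB}"


def events_etcd_override_flag(endpoints: Sequence[str]) -> str:
    """Render a kube-apiserver flag that stores events in a separate etcd.

    Events belong to the core API group, so the group part before the
    slash is empty and the override key is "/events".
    """
    if not endpoints:
        raise ValueError("at least one etcd endpoint is required")
    return "--etcd-servers-overrides=/events#" + ";".join(endpoints)


if __name__ == "__main__":
    print(quota_backend_bytes_flag(8))
    # Hypothetical endpoints for a dedicated events etcd cluster.
    print(events_etcd_override_flag([
        "https://etcd-events-0.example.internal:2379",
        "https://etcd-events-1.example.internal:2379",
    ]))
```

Keeping events in their own etcd cluster means the high write volume of event objects does not compete with the main cluster state for disk I/O.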
3. Kube APIServer Configuration
For clusters with ≥ 3000 nodes, set:
<code>--max-requests-inflight=3000
--max-mutating-requests-inflight=1000</code>
For clusters with 1000‑3000 nodes, set:
<code>--max-requests-inflight=1500
--max-mutating-requests-inflight=500</code>
Calculate the memory target (in MB) as 60 MB per node, where node_nums is the number of nodes in the cluster:
<code>--target-ram-mb=node_nums * 60</code>
4. Pod Configuration
Best practices for running Pods include:
Define resource requests and limits for containers, especially for core add‑on services:
<code>spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.limits.ephemeral-storage
spec.containers[].resources.requests.ephemeral-storage</code>
Kubernetes classifies Pods into QoS classes based on these settings:
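Guaranteed: every container sets CPU and memory limits, and its requests equal those limits.
Burstable: at least one container sets a CPU or memory request or limit, but the Pod does not meet the Guaranteed criteria.
BestEffort: no container sets any CPU or memory request or limit.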
When node resources are scarce, the kubelet evicts BestEffort Pods first, then Burstable Pods, and Guaranteed Pods last.
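To make the classification concrete, the sketch below derives a QoS class from container resource settings and sorts Pods into the eviction order described above. It is a simplified model, not the kubelet's code: the function names and the dictionary layout are hypothetical, quantities are compared as literal strings (so "1" and "1000m" would not be treated as equal), and only CPU and memory are considered.

```python
EVICTION_ORDER = ["BestEffort", "Burstable", "Guaranteed"]


def qos_class(containers):
    """Classify a Pod from a list of container resource dicts.

    Each container is a dict with optional "requests" and "limits" dicts,
    e.g. {"requests": {"cpu": "100m"}, "limits": {"cpu": "200m"}}.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            if resource in requests or resource in limits:
                any_set = True
            limit = limits.get(resource)
            # An omitted request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


def eviction_order(pods):
    """Sort Pods so that those evicted first come first.

    Each Pod is a dict with a "name" and a "containers" list.
    """
    return sorted(
        pods, key=lambda pod: EVICTION_ORDER.index(qos_class(pod["containers"]))
    )
```

The real kubelet also takes Pod priority and actual usage relative to requests into account, so treat the ordering here as a first approximation.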
Use nodeAffinity, podAffinity, and podAntiAffinity to spread critical workloads across nodes. Example for kube‑dns:
<code>affinity:
  podAntiAffinity:
    # "required" terms take no weight; weight applies only to
    # preferredDuringSchedulingIgnoredDuringExecution entries.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      topologyKey: kubernetes.io/hostname</code>
Prefer managing containers with higher‑level controllers such as Deployment, StatefulSet, DaemonSet, or Job.
For the scheduler and controller‑manager, set --kube-api-qps=100 and --kube-api-burst=100 (the defaults are lower, for example 50 QPS for the scheduler and a burst of 30 for the controller‑manager).
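How the QPS and burst settings interact is easier to see with a small model. The sketch below assumes the client-side limiter behaves like a token bucket, where the burst value is the bucket capacity and the QPS value is the refill rate. The `TokenBucket` class is hypothetical and takes explicit timestamps so the behavior is deterministic; it is not the Kubernetes client implementation.

```python
class TokenBucket:
    """Minimal token bucket: `burst` is capacity, `qps` is refill per second."""

    def __init__(self, qps: float, burst: int, now: float = 0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Return True if a request may be sent at time `now` (in seconds)."""
        elapsed = now - self.last
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(qps=100, burst=100)
    sent = sum(bucket.try_acquire(now=0.0) for _ in range(150))
    print(f"requests allowed immediately: {sent}")
```

With both values at 100, a component can send up to 100 requests at once after an idle period and then sustain about 100 requests per second.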