
Scaling Kubernetes Clusters: Node Quotas, Kernel Tweaks & Etcd Tips

This guide outlines how to prepare large‑scale Kubernetes clusters on public clouds by increasing node quotas, adjusting kernel parameters, configuring high‑availability etcd with the etcd‑operator, tuning kube‑apiserver settings, and applying pod‑level best practices for resource limits and affinity.

Efficient Ops

Aug 11, 2021

1. Node Quotas and Kernel Parameter Adjustments

When a Kubernetes cluster on a public cloud grows, you may encounter quota limits that need to be increased in the cloud console. Common quotas to enlarge include:

Number of virtual machines

Number of vCPUs

Number of internal IP addresses

Number of external IP addresses

Number of security groups

Number of route tables

Persistent storage size

Reference for GCE master node sizing as node count increases:

1‑5 nodes: n1-standard-1

6‑10 nodes: n1-standard-2

11‑100 nodes: n1-standard-4

101‑250 nodes: n1-standard-8

251‑500 nodes: n1-standard-16

More than 500 nodes: n1-standard-32
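The sizing list above is effectively a lookup table. A minimal Python sketch of that lookup follows; the function and constant names are illustrative and not part of any GCE or Kubernetes tooling.

```python
# Inclusive upper node-count bound -> GCE machine type, copied from the list above.
MASTER_SIZES = [
    (5, "n1-standard-1"),
    (10, "n1-standard-2"),
    (100, "n1-standard-4"),
    (250, "n1-standard-8"),
    (500, "n1-standard-16"),
]
LARGEST_MASTER = "n1-standard-32"  # used for more than 500 nodes


def master_machine_type(node_count: int) -> str:
    """Return the reference GCE master machine type for a given node count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    for upper_bound, machine_type in MASTER_SIZES:
        if node_count <= upper_bound:
            return machine_type
    return LARGEST_MASTER


if __name__ == "__main__":
    for nodes in (3, 80, 300, 2000):
        print(nodes, master_machine_type(nodes))
```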

Reference kernel parameter (sysctl) settings from Alibaba Cloud:

# Maximum number of file handles the system can open
fs.file-max=1000000

# ARP (neighbor) cache thresholds
net.ipv4.neigh.default.gc_thresh1=1024   # Below this many entries the garbage collector does not run (default 128)
net.ipv4.neigh.default.gc_thresh2=4096   # Soft limit for ARP entries
net.ipv4.neigh.default.gc_thresh3=8192   # Hard limit for ARP entries

# Netfilter connection tracking limits
net.netfilter.nf_conntrack_max=10485760
net.netfilter.nf_conntrack_tcp_timeout_established=300
net.netfilter.nf_conntrack_buckets=655360

# Maximum number of received packets queued per CPU before the kernel processes them
net.core.netdev_max_backlog=10000

# Inotify limits
fs.inotify.max_user_instances=524288
fs.inotify.max_user_watches=524288
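After applying settings like these, it helps to confirm that each node actually runs with them. The following Python sketch parses sysctl-style text and compares it with the live values under /proc/sys. The helper names are hypothetical, and the key-to-path mapping assumes keys without dots inside a component (true for every key listed above).

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse sysctl.conf-style text into {key: value}, ignoring comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop full-line and trailing comments
        if not line or line.startswith(";"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key: str) -> Path:
    """Map a sysctl key such as fs.file-max to its path under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")


def find_mismatches(desired: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (desired, current)} for keys whose live value differs.

    current is None when the key cannot be read on this host, for example
    nf_conntrack keys before the conntrack module is loaded.
    """
    mismatches = {}
    for key, want in desired.items():
        try:
            current = proc_path(key).read_text().strip()
        except OSError:
            current = None
        if current != want:
            mismatches[key] = (want, current)
    return mismatches


if __name__ == "__main__":
    sample = "fs.file-max=1000000\nfs.inotify.max_user_watches=524288\n"
    for key, (want, have) in find_mismatches(parse_sysctl_conf(sample)).items():
        print(f"{key}: want {want}, have {have}")
```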

2. Etcd Database

Deploy a highly available etcd cluster that can scale automatically as the Kubernetes cluster grows.

The current approach uses the etcd‑operator from CoreOS. An operator extends the Kubernetes API to simplify the management of stateful applications.

Key features of the etcd‑operator:

create/destroy: automatically provision and delete etcd clusters without manual intervention.

resize: dynamically scale the etcd cluster up or down.

backup: support data backup and cluster restoration.

upgrade: upgrade the etcd cluster without service interruption.

Additional recommendations:

Use SSDs for etcd storage.

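Raise --quota-backend-bytes (default 2 GB) to enlarge etcd's storage quota.

Store kube‑apiserver event data in a dedicated etcd cluster, for example through the kube‑apiserver flag --etcd-servers-overrides, so that heavy event traffic does not compete with the main cluster state.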
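etcd takes its quota in bytes, so a size in GiB has to be converted before it goes on the command line. The Python sketch below does that conversion and also builds a kube-apiserver --etcd-servers-overrides value that sends events to a separate etcd cluster. The helper names and the endpoint URL are placeholders chosen for illustration.

```python
from typing import List

GIB = 1024 ** 3  # etcd expects the backend quota in bytes


def quota_backend_bytes_flag(size_gib: int) -> str:
    """Build the etcd --quota-backend-bytes flag for a quota given in GiB."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return f"--quota-backend-bytes={size_gib * GIB}"


def events_override_flag(endpoints: List[str]) -> str:
    """Build the kube-apiserver flag that stores events in a dedicated etcd.

    The override format is group/resource#servers, with the server URLs
    separated by semicolons; events live in the core group, hence "/events".
    """
    if not endpoints:
        raise ValueError("at least one etcd endpoint is required")
    return "--etcd-servers-overrides=/events#" + ";".join(endpoints)


if __name__ == "__main__":
    print(quota_backend_bytes_flag(8))
    # Placeholder endpoint; substitute the address of the events etcd cluster.
    print(events_override_flag(["https://etcd-events.example.internal:2379"]))
```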

3. Kube APIServer Configuration

For clusters with ≥ 3000 nodes, set:

--max-requests-inflight=3000
--max-mutating-requests-inflight=1000

For clusters with 1000‑3000 nodes, set:

--max-requests-inflight=1500
--max-mutating-requests-inflight=500

Memory target (in MB) can be calculated as:

--target-ram-mb=node_nums * 60
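The thresholds and the memory formula above can be folded into one small helper. This is a sketch only: the function name is made up, a cluster of exactly 3000 nodes is treated as belonging to the larger tier, and clusters below 1000 nodes get no in-flight flags because the text gives no values for them.

```python
from typing import List


def apiserver_flags(node_count: int) -> List[str]:
    """Return the kube-apiserver flags suggested above for a given node count."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    flags = []
    if node_count >= 3000:
        flags += [
            "--max-requests-inflight=3000",
            "--max-mutating-requests-inflight=1000",
        ]
    elif node_count >= 1000:
        flags += [
            "--max-requests-inflight=1500",
            "--max-mutating-requests-inflight=500",
        ]
    # Memory target in MB: 60 MB per node.
    flags.append(f"--target-ram-mb={node_count * 60}")
    return flags


if __name__ == "__main__":
    print(" ".join(apiserver_flags(2000)))
```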

4. Pod Configuration

Best practices for running Pods include:

Define resource requests and limits for containers, especially for core add‑on services:

spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.limits.ephemeral-storage
spec.containers[].resources.requests.ephemeral-storage

Kubernetes classifies Pods into QoS classes based on these settings:

Guaranteed

Burstable

BestEffort

When node resources run short, the kubelet evicts BestEffort Pods first, then Burstable Pods, and Guaranteed Pods last.
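How the requests and limits map to a QoS class can be sketched in a few lines of Python. This is a simplified model, not the kubelet's code: each container is a plain dict with optional "requests" and "limits" maps, and quantities are compared as exact strings, so "1" and "1000m" would count as different. Only cpu and memory influence the class.

```python
from typing import Dict, List

QOS_RESOURCES = ("cpu", "memory")  # ephemeral-storage does not affect the QoS class


def qos_class(containers: List[Dict[str, Dict[str, str]]]) -> str:
    """Classify a Pod as Guaranteed, Burstable, or BestEffort."""
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # A request that is not set defaults to the limit for that resource.
        requests = {**limits, **container.get("requests", {})}
        for resource in QOS_RESOURCES:
            limit = limits.get(resource)
            request = requests.get(resource)
            if limit is not None or request is not None:
                any_set = True
            # Guaranteed needs a limit on every resource, equal to its request.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"cpu": "250m"}, "limits": {"cpu": "500m", "memory": "256Mi"}}]
    print(qos_class(pod))
```

In this model, a Pod becomes Guaranteed only when every container sets cpu and memory limits and its requests, where given, match them.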

Use nodeAffinity, podAffinity and podAntiAffinity to spread critical workloads across nodes. Example for kube‑dns:

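affinity:
  podAntiAffinity:
    # Hard rule: never place two kube-dns Pods on the same node.
    # Required terms take no weight; weight is valid only on preferred terms.
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values:
          - kube-dns
      topologyKey: kubernetes.io/hostname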
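Pod anti-affinity comes in a hard form (required) and a soft form (preferred), and the two have different shapes: a weight belongs only to a preferred term, which wraps the selector in podAffinityTerm. The Python sketch below builds either form as a plain dict that can be serialized into a manifest; the function name and its defaults are illustrative.

```python
import json
from typing import Any, Dict


def pod_anti_affinity(label_key: str, label_value: str,
                      required: bool = True, weight: int = 100) -> Dict[str, Any]:
    """Build an affinity stanza that keeps matching Pods on different nodes."""
    term = {
        "labelSelector": {
            "matchExpressions": [
                {"key": label_key, "operator": "In", "values": [label_value]}
            ]
        },
        "topologyKey": "kubernetes.io/hostname",
    }
    if required:
        # Hard rule: the scheduler refuses to co-locate matching Pods.
        rule = {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    else:
        # Soft rule: the scheduler tries to spread, weighted from 1 to 100.
        rule = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": weight, "podAffinityTerm": term}
            ]
        }
    return {"affinity": {"podAntiAffinity": rule}}


if __name__ == "__main__":
    print(json.dumps(pod_anti_affinity("k8s-app", "kube-dns", required=False), indent=2))
```

The soft form is the safer choice when the cluster may have fewer nodes than replicas, since a hard rule would leave the extra replicas unschedulable.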

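Prefer managing containers with higher‑level controllers such as Deployment, StatefulSet, DaemonSet, or Job.

Raise the API client rate limits of the control‑plane components: set --kube-api-qps=100 on the scheduler (default 50) and on the controller‑manager (default 20), and set --kube-api-burst=100 on the controller‑manager (default 30).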
