Containerizing Elasticsearch: Architecture Upgrade, API Gateway Integration, and Cloud‑Native Migration
This article details how a large-scale Elasticsearch deployment was moved from physical servers to a containerized, Kubernetes-based architecture. It covers cost, scalability, API compatibility, security, observability, and multi-cloud migration, with 极限网关 (INFINI Gateway) serving as the API gateway.
Background – In 2022 the logging Elasticsearch service ran on dozens of physical clusters with hundreds of machines, leading to high hardware and operational costs and limited resource elasticity.
Traffic Characteristics – Real-world traffic shows strong tidal patterns: CPU and memory usage is high during peaks and very low during troughs. In a physical setup, the capacity left idle during troughs cannot be shared across business lines.
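The cost of unshared capacity can be made concrete with a small calculation. The sketch below compares dedicated sizing (each business line provisioned for its own peak) with pooled sizing (one shared pool provisioned for the peak of the combined load). The business-line names and load figures are hypothetical and are not taken from the article.

```python
def dedicated_capacity(loads):
    """Capacity needed when every business line is sized for its own peak."""
    return sum(max(series) for series in loads.values())


def pooled_capacity(loads):
    """Capacity needed when all lines share one pool sized for the combined peak."""
    return max(sum(slot) for slot in zip(*loads.values()))


# Hypothetical CPU-core demand per time slot; the two peaks do not coincide.
loads = {
    "line_a": [10, 80, 20],
    "line_b": [70, 15, 10],
}

saved = dedicated_capacity(loads) - pooled_capacity(loads)
print(f"dedicated={dedicated_capacity(loads)} pooled={pooled_capacity(loads)} saved={saved}")
```

When peaks of different lines fall in different time slots, the pooled figure is lower than the dedicated one, which is the saving a shared container platform can capture.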
Motivation for Containerization – To reduce cost and operational complexity, the team explored running Elasticsearch on Kubernetes (k8s), which offers both horizontal and vertical scaling.
Physical-Era API Architecture – The legacy API layer relied on tribe-node, which introduced tight coupling and version incompatibility and provided no traffic control or authentication.
Hybrid-Era API Architecture – After evaluating the official proxy mode and third-party solutions, the team selected 极限网关 (INFINI Gateway) to replace tribe-node and provide a highly available, version-compatible API layer.
Why 极限网关? – It has a low learning cost, strong performance (it is optimized for Elasticsearch), built-in security (basic-auth, LDAP), cross-version support, and flexible extensibility through plugins and filters.
Security Policies – Basic‑auth and LDAP plugins enforce authentication; request_user_filter enables user‑based request filtering, preventing unauthorized access.
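The general idea behind combining authentication with user-based filtering can be sketched in a few lines. This is not the gateway's implementation or configuration. It is a minimal illustration in Python, and the credential store, the denied-user set, and the function names are all hypothetical.

```python
import base64
import hmac

# Hypothetical credential store and deny list, for illustration only.
USERS = {"logs_reader": "s3cret", "legacy_batch": "pw"}
DENIED_USERS = {"legacy_batch"}


def parse_basic_auth(header):
    """Return (user, password) from an Authorization header, or None if malformed."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers invalid base64 and invalid UTF-8
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None


def authorize(header):
    """Return the HTTP status a proxy layer could answer with for this header."""
    creds = parse_basic_auth(header)
    if creds is None:
        return 401
    user, password = creds
    expected = USERS.get(user)
    # Constant-time comparison avoids leaking password prefixes via timing.
    if expected is None or not hmac.compare_digest(expected.encode(), password.encode()):
        return 401
    if user in DENIED_USERS:
        return 403  # authenticated, but filtered out by user
    return 200
```

Authentication (401) and user filtering (403) are kept as separate steps so that a known but blocked user is distinguishable from an unknown one.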
Filtering Capabilities – The gateway supports filtering and rate limiting by client IP, hostname, and request header, as well as user-based blocking, allowing fine-grained traffic control.
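Per-client rate limiting of this kind is commonly built on a token bucket. The sketch below shows that general technique keyed by client IP. The article does not say which algorithm the gateway uses, so the class names and parameters here are assumptions chosen for illustration.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now):
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class IpRateLimiter:
    """Keeps one independent bucket per client IP."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_ip, now=None):
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = self.buckets[client_ip] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

Passing `now` explicitly makes the limiter deterministic in tests, while production callers can omit it and rely on the monotonic clock. The same structure works for hostnames or header values by changing the key.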
Observability – Detailed request logging provides traffic curves, status‑code distribution, cache statistics, per‑gateway request volume, and request‑to‑node mapping, simplifying monitoring and troubleshooting.
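To show how such views can be derived from request logs, the sketch below aggregates a list of log records into a status-code distribution, per-gateway volume, request-to-node counts, and a cache hit ratio. The record field names (`gateway`, `status`, `upstream_node`, `cache_hit`) are hypothetical and do not reflect the gateway's actual log schema.

```python
from collections import Counter


def summarize(records):
    """Aggregate request-log records into the overview metrics described above."""
    status = Counter()
    per_gateway = Counter()
    per_node = Counter()
    cache_hits = 0
    for record in records:
        status[record["status"]] += 1
        per_gateway[record["gateway"]] += 1
        per_node[record["upstream_node"]] += 1
        if record.get("cache_hit"):
            cache_hits += 1
    total = sum(status.values())
    return {
        "total": total,
        "status": dict(status),
        "per_gateway": dict(per_gateway),
        "per_node": dict(per_node),
        "cache_hit_ratio": cache_hits / total if total else 0.0,
    }
```

In practice the gateway's logs would more likely be indexed and aggregated in Elasticsearch itself; the function only illustrates which dimensions the dashboards group by.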
Milestones – By March 2023, all Elasticsearch clusters were containerized using the Elastic Cloud on Kubernetes (ECK) operator, scaling to hundreds of nodes with >1 PB of data and ~100 TB daily ingest, supporting multiple business lines.
Unexpected Benefits – The new architecture also brought seamless dual-write to two clusters, a smooth cloud migration, and reduced downtime during version upgrades.
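The article does not describe how dual-write is implemented, so the following is only a sketch of the general pattern: write to the primary cluster, mirror the write to the secondary, and queue any failed mirror writes for later replay. The `primary` and `secondary` callables stand in for real cluster clients and are hypothetical.

```python
def _try_send(send, doc):
    """Call a sender and treat any exception as a failed write."""
    try:
        return bool(send(doc))
    except Exception:
        return False


def dual_write(doc, primary, secondary, retry_queue):
    """Write to the primary; mirror to the secondary; queue failed mirrors.

    Returns True if the primary accepted the document. A secondary failure
    does not fail the request, it only defers the mirror write.
    """
    if not _try_send(primary, doc):
        return False
    if not _try_send(secondary, doc):
        retry_queue.append(doc)
    return True


def replay(retry_queue, secondary):
    """Resend queued documents to the secondary; return how many still fail."""
    remaining = [doc for doc in retry_queue if not _try_send(secondary, doc)]
    retry_queue[:] = remaining
    return len(remaining)
```

Treating the primary as the source of truth keeps client latency and error handling unchanged during a migration, while the retry queue lets the second cluster catch up after an outage.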
Gateway Traffic Overview – Aggregated metrics give a clear view of request volume, status codes, and cache usage, enabling proactive management.
Future Plans – Continue containerizing remaining clusters, adopt hybrid multi‑cloud deployments for elastic scaling, implement disaster‑recovery with rapid cloud‑based ES provisioning, and explore new features such as Easysearch for performance gains.
TAL Education Technology
TAL Education is a technology-driven education company committed to the mission of 'making education better through love and technology'. The TAL technology team has always been dedicated to educational technology research and innovation. This is the external platform of the TAL technology team, sharing weekly curated technical articles and recruitment information.