How We Cut Test Environment Build Time to Minutes with Kubernetes
The article details how Xueersi's 1‑to‑1 quality‑efficiency team transformed their cumbersome manual test‑environment setup into a fast, containerised, Kubernetes‑driven workflow, introducing swim‑lane environments, trace‑ID coloring, and a continuous‑delivery platform to dramatically improve resource utilization and deployment speed.
Preface
The 1‑to‑1 quality‑efficiency team at Xueersi is dedicated to shortening the time it takes to get high‑quality code into production. In 2018, many test environments were still maintained by hand. This caused resource contention, conflicts between environments, and long setup times, and it prompted a focused test‑environment governance effort.
Timeline of Improvements
Oct 2018 – Started containerising test environments with Kubernetes, achieving minute‑level build times and launching the internal continuous‑delivery platform “Nautilus”.
Jul 2019 – Introduced “swim‑lane” environment v1.0, reducing server usage.
Oct 2021 – Launched swim‑lane v2.0 beta, further lowering maintenance complexity and resource consumption.
Minute‑Level Test Environment Construction
Background
Initially, more than ten test environments shared a single ECS server and a common set of base services (MySQL, RabbitMQ, Elasticsearch, Redis, etc.). Developers kept running into questions such as “Is this environment usable?”, “Are the services up to date?”, and “Who changed the libraries?”. Adding a new environment required a lengthy approval and deployment cycle.
Technical Solution
We chose Kubernetes as the foundation because its robust configuration management, rapid rollout, and rollback capabilities meet both test‑environment and production requirements.
System Architecture
Service types are divided into base services (MySQL, RabbitMQ, ES, Redis, etc.), front‑end services, and back‑end services.
Challenges
Key difficulties include quickly provisioning a MySQL instance with data, ensuring the data schema is up‑to‑date, and sharing PHP code across services without disrupting development.
Data‑Storage Base Service Solution
We keep cloud‑disk snapshots of data disks that hold masked (desensitized) production data. We create a new disk from such a snapshot and mount it into a MySQL pod. The result is a MySQL instance that already contains the required data, created almost instantly and without an import step.
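The article does not say which API drives the snapshot‑to‑disk step. One way to get the same effect is a PersistentVolumeClaim whose `dataSource` is a `VolumeSnapshot`. This requires a cluster whose CSI driver supports volume snapshots. The sketch below only renders that manifest. The claim name `mysql-data`, the storage class `cloud-disk`, the 100Gi size and the snapshot name are hypothetical.

```python
import json


def mysql_pvc_from_snapshot(namespace: str, snapshot: str,
                            size: str = "100Gi",
                            storage_class: str = "cloud-disk") -> dict:
    """Render a PersistentVolumeClaim whose contents are restored from a
    VolumeSnapshot of a masked-data disk.

    `size` and `storage_class` are hypothetical defaults; substitute the
    cluster's snapshot-capable CSI storage class.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "mysql-data", "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            # The new disk is provisioned from the snapshot, so MySQL
            # starts with the data already in place.
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot,
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON, e.g.: python pvc.py | kubectl apply -f -
    print(json.dumps(mysql_pvc_from_snapshot("env1", "mysql-masked-snap"), indent=2))
```

With a plain `dataSource`, the VolumeSnapshot must live in the same namespace as the claim. The requested size must also be at least the snapshot's restore size. The MySQL pod then references `mysql-data` through `persistentVolumeClaim.claimName` and mounts it at `/var/lib/mysql`.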
PHP Code‑Sharing Solution
To avoid copying code, we mount a NAS‑backed shared directory into containers, allowing services to reference the same code base without modifying the build process.
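NAS shares are commonly exported over NFS, which Kubernetes can mount directly with an `nfs` volume. The helper below is a minimal sketch assuming an NFS‑compatible NAS. It adds such a volume to a pod spec and mounts it into every container. The server name, export path and mount path are hypothetical. Mounting the share read‑only is our own choice, not a detail from the article.

```python
import copy


def add_shared_code_mount(pod_spec: dict, nas_server: str, nas_path: str,
                          mount_path: str, volume_name: str = "shared-php") -> dict:
    """Return a copy of `pod_spec` with an NFS share mounted into every container.

    The input spec is left untouched. Mounting read-only is an assumption:
    it stops one service from modifying code that other services share.
    """
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("volumes", []).append({
        "name": volume_name,
        "nfs": {"server": nas_server, "path": nas_path, "readOnly": True},
    })
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append({
            "name": volume_name,
            "mountPath": mount_path,
            "readOnly": True,
        })
    return spec
```

Applied to the `spec.template.spec` of a Deployment, every replica sees the same PHP code at the mount path. A change on the share is then visible to every service that mounts it, without rebuilding images.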
Kubernetes Resource Examples
Namespace
Namespaces isolate resources for multi‑tenant environments; each test environment receives its own namespace.
<code>apiVersion: v1
kind: Namespace
metadata:
  name: env1
...</code>
Ingress
Ingress maps env1-web.xxx.com to the web service in namespace env1.
<code>apiVersion: networking.k8s.io/v1  # extensions/v1beta1 Ingress was removed in Kubernetes 1.22
kind: Ingress
metadata:
  name: web
  namespace: env1
spec:
  rules:
  - host: env1-web.xxx.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80</code>
Service
Exposes port 80 in namespace env1 and forwards it to port 80 of the pods labelled app: web.
<code>apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: env1
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP</code>
Deployment
Deploys the web:master-20210913153259 image in namespace env1.
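<code>apiVersion: apps/v1  # extensions/v1beta1 Deployments were removed in Kubernetes 1.16
kind: Deployment
metadata:
  name: web
  namespace: env1
spec:
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod during an update
      maxUnavailable: 0  # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: web:master-20210913153259
        ports:
        - containerPort: 80  # matches the Service targetPort</code>
Swim‑Lane Environments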
Concept
A swim‑lane environment isolates each test suite like lanes in a pool; lanes do not interfere with each other.
There are two kinds of lanes. Backbone lanes run a complete set of services independently. Branch lanes deploy only the services under test and depend on the backbone lane for everything else.
v1.0 Technical Approach
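v1.0 routes traffic with Kubernetes ExternalName Services. In a branch lane, each service that is not deployed there is declared as an ExternalName Service. Its DNS name resolves, as a CNAME, to the same service in the backbone lane. Callers keep using the same service names, so existing services need no changes and business impact is zero.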
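For illustration, the aliases a branch lane needs could be generated from two inputs. The first is the services (and their ports) that run in the backbone. The second is the services the branch deploys itself. The sketch below is hypothetical and is not Nautilus code. It emits one ExternalName Service, with the same fields as the YAML example, for each backbone service the branch does not run. The lane names env1 and env2 are taken from the examples.

```python
import json


def external_name_aliases(branch: str, backbone: str,
                          backbone_services: dict[str, list[int]],
                          branch_services: set[str]) -> list[dict]:
    """Return ExternalName Services that alias, inside `branch`, every
    backbone service the branch lane does not deploy itself."""
    aliases = []
    for name, ports in sorted(backbone_services.items()):
        if name in branch_services:
            continue  # the branch runs its own copy; no alias needed
        aliases.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": name, "namespace": branch},
            "spec": {
                "type": "ExternalName",
                # Cluster DNS answers lookups of <name>.<branch> with a
                # CNAME to the backbone copy of the service.
                "externalName": f"{name}.{backbone}.svc.cluster.local",
                "ports": [{"name": f"port-{p}", "port": p, "protocol": "TCP"}
                          for p in ports],
            },
        })
    return aliases


if __name__ == "__main__":
    # Hypothetical lanes: env1 is the backbone, env2 deploys only app1.
    manifests = external_name_aliases("env2", "env1",
                                      {"app1": [80], "app2": [80]}, {"app1"})
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Because each alias keeps the service's name, a caller in env2 that looks up `app2` is sent on to `app2.env1.svc.cluster.local`. This is why no existing service had to change.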
Service YAML example:
<code>apiVersion: v1
kind: Service
metadata:
  name: app2
  namespace: env2
spec:
  type: ExternalName
  externalName: app2.env1.svc.cluster.local  # CNAME to app2 in the backbone namespace env1
  ports:
  - name: port-80
    port: 80
    protocol: TCP</code>
Limitations of v1.0
Callback requests could not be handled.
All upstream services of the tested service had to be deployed in the same environment.
v2.0 Enhancements
v2.0 introduces trace‑ID coloring to propagate environment information, eliminating the need to deploy upstream services.
Technical Stack
Traffic entry coloring via Nginx.
Sidecar http‑proxy (SOCKS5) for internal service calls.
service‑proxy for final routing based on colored trace‑ID.
Component Details
Nginx extracts the environment name from the host, appends it to the trace‑ID, and forwards the request.
Example curl:
<code>curl -X GET http://env2-web.xxx.com/version/info -H 'traceId: cb20db7d-af38-4684-b950-1c0e4febca3a'</code>
http‑proxy is a Go‑based SOCKS5 proxy that runs as a sidecar next to each service. It intercepts the service's outgoing HTTP calls, adds the coloring to the trace‑ID if it is missing, and forwards the request to service‑proxy.
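The two coloring steps, at the entry Nginx and in the sidecar, can be sketched in Python. This models only the header logic, not the SOCKS5 transport that http‑proxy actually uses. The sketch makes four assumptions:

- Entry hosts follow `<lane>-<service>.<domain>`, with no `-` inside lane names.
- The colored form is `<trace-id>.<lane>`. The separator is hypothetical; a dot is convenient because Kubernetes namespace names cannot contain one.
- The trace‑ID travels in a `traceId` header.
- An uncolored trace‑ID seen by the sidecar is colored with the pod's own lane.

```python
from typing import Optional

TRACE_HEADER = "traceId"  # header name from the curl example above
SEP = "."  # hypothetical separator; namespace names never contain "."


def lane_from_host(host: str) -> str:
    """Entry step: derive the lane from a host such as env2-web.xxx.com."""
    label = host.split(".", 1)[0]  # e.g. "env2-web"
    lane, dash, service = label.partition("-")
    if not (dash and lane and service):
        raise ValueError(f"host {host!r} is not of the form <lane>-<service>.<domain>")
    return lane


def lane_of(trace_id: str) -> Optional[str]:
    """Return the lane carried by a colored trace-ID, or None if uncolored."""
    head, sep, tail = trace_id.rpartition(SEP)
    return tail if sep and head and tail else None


def color(trace_id: str, lane: str) -> str:
    """Append the lane unless the trace-ID is already colored."""
    return trace_id if lane_of(trace_id) else f"{trace_id}{SEP}{lane}"


def ensure_colored(headers: dict[str, str], own_lane: str) -> dict[str, str]:
    """Sidecar step: return a copy of `headers` whose trace-ID carries a lane.

    HTTP header names are case-insensitive, so the trace header is matched
    without regard to case. An existing color is never overwritten.
    """
    out = dict(headers)
    for name, value in headers.items():
        if name.lower() == TRACE_HEADER.lower():
            out[name] = color(value, own_lane)
            break
    return out
```

For the curl request, `lane_from_host("env2-web.xxx.com")` yields `"env2"`. The trace‑ID therefore becomes `cb20db7d-af38-4684-b950-1c0e4febca3a.env2`. Every later hop keeps that color, because `color` never overwrites an existing one.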
service‑proxy is an Nginx instance that reads the $namespace, $service, and $o_port variables from the request and proxies to the appropriate service.
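The routing decision behind service‑proxy's `proxy_pass` can be modeled as a pure function. This is a hypothetical sketch, not the team's configuration. It assumes the colored trace‑ID has the form `<trace-id>.<lane>` and that a registry of which services each lane runs is available. Here that registry is a plain dict; a real system might build it by listing Services per namespace. It also assumes calls to a service the lane does not run fall back to the backbone lane. That is one reading of how v2.0 avoids deploying upstream services.

```python
def upstream_url(service: str, port: int, trace_id: str,
                 deployed: dict[str, set[str]], backbone: str) -> str:
    """Choose the upstream for one call, mirroring
    proxy_pass http://$service.$namespace.svc.cluster.local:$o_port.

    The lane is the suffix after the last "." of the colored trace-ID
    (assumed format). A lane that does not run `service` falls back to
    the backbone lane.
    """
    _, sep, lane = trace_id.rpartition(".")
    if sep and service in deployed.get(lane, set()):
        namespace = lane
    else:
        namespace = backbone
    if service not in deployed.get(namespace, set()):
        raise LookupError(f"{service} is not deployed in lane {namespace}")
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"
```

Take env1 as the backbone and env2 running only app1. A call to app2 whose trace‑ID ends in `.env2` resolves to `http://app2.env1.svc.cluster.local:80`. A call to app1 with the same trace‑ID stays in env2.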
Example proxy_pass directive:
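<code>resolver kube-dns.kube-system.svc.cluster.local valid=10s;  # assumed cluster DNS; required because the upstream name is built from variables
proxy_pass http://$service.$namespace.svc.cluster.local:$o_port;</code>
Continuous Delivery Platform “Nautilus”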
The platform provides multi‑functional environment lanes, automatic resource reclamation, service‑status alerts, configuration‑diff checks, version alignment, database schema synchronization, and traffic‑based log retrieval.
Future Outlook
Swim‑lane v1.0 has run stably for more than two years. v2.0 is now in gray‑scale (canary) testing and is expected to greatly reduce the complexity of managing test environments.
Xueersi 1-on-1 Technology Team
Official account of the Xueersi 1-on-1 Technology Team