
Implementing Multi-Active Deployment for the Beetle Platform: Session Sharing, Distributed Locks, and Service Recovery

This article describes how the Beetle efficiency platform addresses multi‑active deployment challenges by solving session‑sharing, distributed locking, and task‑recovery problems to enable seamless service upgrades and bug fixes without user disruption.

转转QA

Introduction

Beetle is a lightweight efficiency platform developed by the testing department at Zhuanzhuan (转转). It is built on GitLab, Jenkins, and Maven, and aims to speed up compilation, deployment, testing, and release.

Background

As the platform is adopted across the company's clusters, its single-instance deployment means every iteration or bug fix requires service downtime, which is unacceptable for users. Multi-active deployment is proposed to address this.

Goal

Restart the service for upgrades or bug fixes without users noticing. This requires solutions for session sharing, distributed locking, and task recovery.

Overview

Beetle interacts with the compilation, environment deployment, and release services by sending commands and polling for results.
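The send-then-poll interaction can be sketched as follows. This is only an illustration: `send_command`, `fetch_status`, and the status strings are hypothetical names, not Beetle's actual interfaces.

```python
import time


def run_and_poll(send_command, fetch_status, interval=0.0, max_polls=10):
    """Send one command, then poll its status until it finishes or we give up."""
    job_id = send_command()
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status in ("SUCCESS", "FAILED"):
            return status
        time.sleep(interval)  # wait before asking again
    return "TIMEOUT"


# Hypothetical downstream service that finishes on the third poll.
statuses = iter(["RUNNING", "RUNNING", "SUCCESS"])
result = run_and_poll(lambda: "job-1", lambda job_id: next(statuses))
```

Because the polling loop lives in the memory of one Beetle process, stopping that process loses track of the job, which is why task recovery matters later on.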

Problems

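Session sharing: consecutive requests from the same client may be routed to different servers, so a session held by one server is not visible to the others.

Synchronized lock: Java's synchronized keyword only provides mutual exclusion inside a single JVM, so it cannot guarantee atomicity across multiple nodes.

Task recovery: compilation and deployment tasks that are in progress when a service stops must be recovered in a multi-node environment.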

Solutions

1. Session sharing

Options include session synchronization between services, memcached-session-manager (MSM), filter-based interception, and persisting sessions to a database. The team adopted the filter approach: a login interceptor saves the session, and when a request reaches a server that does not have the session, the interceptor adds it automatically.
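A minimal sketch of the interceptor idea, assuming a store that every node can reach. The `SHARED_SESSIONS` dictionary, the `Server` class, and its method names are hypothetical and only model the behavior in memory; the source does not say where Beetle actually keeps the saved session.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical shared store; in practice this would be a database or cache
# reachable from every node.
SHARED_SESSIONS: Dict[str, dict] = {}


@dataclass
class Server:
    """One node with its own in-memory session table."""
    name: str
    local_sessions: Dict[str, dict] = field(default_factory=dict)

    def login(self, session_id: str, user: str) -> None:
        # Save the session locally and in the shared store at login time.
        session = {"user": user}
        self.local_sessions[session_id] = session
        SHARED_SESSIONS[session_id] = dict(session)

    def intercept(self, session_id: str) -> Optional[dict]:
        """Run before each request: restore the session if this node lacks it."""
        session = self.local_sessions.get(session_id)
        if session is None:
            shared = SHARED_SESSIONS.get(session_id)
            if shared is None:
                return None  # unknown everywhere: the caller should redirect to login
            session = dict(shared)
            self.local_sessions[session_id] = session
        return session


node_a, node_b = Server("node-a"), Server("node-b")
node_a.login("sid-1", "alice")            # user logs in on node-a
restored = node_b.intercept("sid-1")      # next request lands on node-b
unknown = node_b.intercept("sid-404")     # never logged in
```

The user logs in once on node-a, and node-b transparently picks up the session on the next request instead of forcing a new login.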

2. Distributed lock

Three typical implementations use a database, a cache (Redis), or Zookeeper; Zhuanzhuan's in-house ZZLock is a fourth option.

Database lock: Insert a row with a unique key to take the lock and delete it after use. Problems include a crashed process never deleting its row, the lack of a built-in timeout, and insert failures that the caller has to handle. These are mitigated by a scheduled task that cleans up expired locks.
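The database lock and its cleanup task can be sketched with SQLite from the standard library. The table layout, column names, and the explicit `now` parameter (used to keep the example deterministic) are assumptions for illustration, not Beetle's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locks ("
    " name TEXT PRIMARY KEY,"       # the unique key is what makes this a lock
    " owner TEXT NOT NULL,"
    " expires_at REAL NOT NULL)"
)


def acquire(name, owner, now, ttl=30.0):
    """Try to take the lock by inserting its row; False if someone holds it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO locks (name, owner, expires_at) VALUES (?, ?, ?)",
                (name, owner, now + ttl),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def release(name, owner):
    """Delete the row, but only if this owner actually holds the lock."""
    with conn:
        cur = conn.execute(
            "DELETE FROM locks WHERE name = ? AND owner = ?", (name, owner)
        )
    return cur.rowcount == 1


def clean_expired(now):
    """Body of the scheduled cleanup task: drop locks whose holder went silent."""
    with conn:
        cur = conn.execute("DELETE FROM locks WHERE expires_at <= ?", (now,))
    return cur.rowcount


got_a = acquire("deploy:demo", "node-a", now=0.0)
got_b = acquire("deploy:demo", "node-b", now=1.0)         # blocked by node-a
removed = clean_expired(now=31.0)                         # node-a crashed, never released
got_b_retry = acquire("deploy:demo", "node-b", now=31.0)
```

Note that a failed insert returns immediately; a caller that wants to wait for the lock has to retry on its own schedule.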

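Cache lock: Use atomic Redis commands (SETNX to acquire, DEL to release) together with a key expiration. The difficulty is choosing the expiration: if it is too short, the lock is released before the task finishes; if it is too long, a restarted service leaves the lock held until it expires. The team concluded that relying on Redis key expiration is unreliable for this use case.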
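The expiration problem can be reproduced with a small in-memory model. `FakeRedis` below is a stand-in written for this sketch, not a real client API; its `set_if_absent` mimics the semantics of Redis `SET key value NX EX ttl`, and `delete_if_equals` mimics a compare-and-delete that on a real server would have to run atomically (for example as a Lua script).

```python
class FakeRedis:
    """In-memory stand-in for a Redis server; NOT a real client library."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key, now):
        """Return the entry for key, dropping it first if it has expired."""
        entry = self._data.get(key)
        if entry is not None and entry[1] <= now:
            del self._data[key]
            return None
        return entry

    def set_if_absent(self, key, value, ttl, now):
        """Set the key with an expiration only if it is currently absent."""
        if self._live(key, now) is not None:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def delete_if_equals(self, key, value, now):
        """Release the lock only if the caller's token still owns it."""
        entry = self._live(key, now)
        if entry is None or entry[0] != value:
            return False
        del self._data[key]
        return True


store = FakeRedis()
a_locked = store.set_if_absent("lock:deploy", "node-a", ttl=10, now=0)
b_blocked = store.set_if_absent("lock:deploy", "node-b", ttl=10, now=5)
# node-a's task runs longer than the 10 s expiration, so the key vanishes
# and node-b gets the lock while node-a is still working.
b_locked = store.set_if_absent("lock:deploy", "node-b", ttl=10, now=12)
a_release = store.delete_if_equals("lock:deploy", "node-a", now=15)
b_release = store.delete_if_equals("lock:deploy", "node-b", now=16)
```

Between t=12 and t=15 both nodes believe they hold the lock, which is the premature-release case described above. The token check at least prevents node-a from deleting node-b's lock with a plain DEL.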

Zookeeper lock: Each client creates an ephemeral sequential node; the client with the smallest sequence number holds the lock, and the others watch their predecessor node. When the task completes, the node is deleted, and if the client dies the node expires automatically. This solves the deadlock and lock-release problems but is complex to implement by hand; the third-party client Curator simplifies it.
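The ordering rule behind the Zookeeper lock can be modeled in a few lines. `LockQueue` is a hypothetical in-memory model of the children of one lock node; it does not talk to Zookeeper and is not Curator's API, it only shows who holds the lock and which node each waiter would watch.

```python
import itertools


class LockQueue:
    """In-memory model of the sequential children under one lock node."""

    def __init__(self):
        self._seq = itertools.count()
        self.nodes = {}  # sequence number -> client name

    def join(self, client):
        """Model creating an ephemeral sequential node; returns its sequence."""
        seq = next(self._seq)
        self.nodes[seq] = client
        return seq

    def holder(self):
        """The client with the smallest sequence number owns the lock."""
        return self.nodes[min(self.nodes)] if self.nodes else None

    def predecessor(self, seq):
        """The node a waiter watches: the largest sequence below its own."""
        lower = [s for s in self.nodes if s < seq]
        return max(lower) if lower else None

    def leave(self, seq):
        """Explicit delete or session expiry; either way the node disappears."""
        self.nodes.pop(seq, None)


queue = LockQueue()
a = queue.join("node-a")
b = queue.join("node-b")
c = queue.join("node-c")
first_holder = queue.holder()
c_watches = queue.predecessor(c)
queue.leave(b)                          # node-b crashes while waiting
c_watches_after = queue.predecessor(c)  # node-c now watches node-a's node
queue.leave(a)                          # node-a finishes its task
second_holder = queue.holder()
```

Watching only the predecessor means a release wakes one waiter rather than all of them, and a crashed waiter simply drops out of the queue.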

ZZLock: Zhuanzhuan's in-house zzlock, built on etcd, renews the lock automatically while the holder is running and recovers it when the process crashes.

Of the four options, the Zookeeper lock and ZZLock (options 3 and 4) are relatively simple to use and effectively address the problems.
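zzlock's API is not shown in the source, so the following is only a generic sketch of the lease-with-renewal idea that etcd-style locks rely on. `LeaseLock`, its methods, and the explicit `now` parameter are hypothetical.

```python
class LeaseLock:
    """A lock that stays held only while its owner keeps renewing the lease."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0.0

    def _expire(self, now):
        # A lease that was not renewed in time is treated as released.
        if self.owner is not None and now >= self.expires_at:
            self.owner = None

    def acquire(self, owner, now):
        self._expire(now)
        if self.owner is None:
            self.owner, self.expires_at = owner, now + self.ttl
            return True
        return False

    def renew(self, owner, now):
        """Keep-alive: only the current owner can push the expiry forward."""
        self._expire(now)
        if self.owner != owner:
            return False
        self.expires_at = now + self.ttl
        return True


lock = LeaseLock(ttl=10)
acquired = lock.acquire("node-a", now=0)
renewals = [lock.renew("node-a", now=t) for t in (5, 10, 15)]  # background keep-alive
blocked = lock.acquire("node-b", now=20)     # lease is valid until t=25
# node-a crashes after t=15, so no further renewals arrive.
taken_over = lock.acquire("node-b", now=25)
stale_renew = lock.renew("node-a", now=26)
```

Because the lease is short and renewed continuously, a long task never loses the lock early, and a crashed holder blocks others for at most one lease period.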

3. Task recovery after service termination

A scheduled task polls for interrupted tasks and resumes them, using the distributed lock mechanisms described above so that only one node picks up each task.
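One run of such a recovery job might look like the sketch below. The `Task` fields, the `live_nodes` set, and the in-memory `LOCKS` dictionary standing in for the distributed lock are all assumptions; the source does not describe how Beetle records task ownership or detects stopped nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    task_id: str
    status: str            # "RUNNING" or "DONE"
    owner: Optional[str]   # node currently polling this task's result


LOCKS: Dict[str, str] = {}  # stand-in for a real distributed lock service


def try_lock(key: str, node: str) -> bool:
    # Adequate for a single-process sketch only.
    return LOCKS.setdefault(key, node) == node


def unlock(key: str, node: str) -> None:
    if LOCKS.get(key) == node:
        del LOCKS[key]


def recover(tasks: List[Task], node: str, live_nodes: Set[str]) -> List[str]:
    """One run of the scheduled job on `node`: adopt tasks whose owner is gone."""
    adopted = []
    for task in tasks:
        if task.status != "RUNNING" or task.owner in live_nodes:
            continue
        key = f"task:{task.task_id}"
        if not try_lock(key, node):
            continue  # another node is already recovering this task
        try:
            if task.owner not in live_nodes:  # re-check while holding the lock
                task.owner = node             # this node resumes polling
                adopted.append(task.task_id)
        finally:
            unlock(key, node)
    return adopted


tasks = [
    Task("build-1", "RUNNING", "node-a"),
    Task("deploy-2", "RUNNING", "node-b"),
    Task("build-3", "DONE", "node-a"),
]
live = {"node-b", "node-c"}  # node-a was stopped for an upgrade
adopted_by_b = recover(tasks, "node-b", live)
adopted_by_c = recover(tasks, "node-c", live)
```

The re-check under the lock matters: without it, two nodes that both saw the task as orphaned could each adopt it in turn.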

After adding nodes, the Beetle architecture is updated accordingly.

[Figure: updated Beetle architecture with multiple nodes]

Extension

With the above issues solved, the platform achieves true dual-active operation. The team also explored deploying Beetle itself through its own release pipeline: after an upgrade task is sent, one server is removed from Nginx while the remaining servers handle requests, and once the upgrade finishes the server is added back to Nginx.
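The remove, upgrade, re-add cycle can be sketched as a loop over the servers. The `pool` set models the Nginx upstream list, and the `upgrade` and `healthy` callbacks are hypothetical; the health check before re-adding a server is an assumption of this sketch, not something the source describes.

```python
def rolling_upgrade(pool, servers, upgrade, healthy):
    """Upgrade servers one at a time; `pool` models the Nginx upstream list."""
    log = []
    for server in servers:
        if len(pool) <= 1:
            raise RuntimeError("refusing to drain the last serving node")
        pool.remove(server)                 # take the node out of rotation
        log.append((server, sorted(pool)))  # who serves traffic meanwhile
        upgrade(server)
        if not healthy(server):
            raise RuntimeError(f"{server} failed its health check; left out of rotation")
        pool.add(server)                    # put the node back
    return log


versions = {"node-a": "1.0", "node-b": "1.0"}
pool = {"node-a", "node-b"}
log = rolling_upgrade(
    pool,
    ["node-a", "node-b"],
    upgrade=lambda s: versions.__setitem__(s, "1.1"),
    healthy=lambda s: versions[s] == "1.1",
)
```

At every step at least one node stays in the pool, so requests keep being served while the other node restarts.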
