Gray Release Strategy and Architecture for Online Customer Service System
The article describes the “小得物” gray‑release environment—a single‑region, isolated production setup that adds dedicated SLBs and DLB routing, uses UID‑based user and staff segmentation via HTTP headers and message metadata, and enables daytime partial‑traffic releases for the online customer‑service system, dramatically improving stability and reducing incident rates.
We present the gray‑release environment “小得物” – an isolated, single‑region production setup that covers network, access layer, middleware and core applications, providing tools for stable development and business growth.
Current online customer‑service system faces stability challenges: lack of gray environment forces midnight full‑traffic releases, leading to prolonged bug exposure; delayed fault detection; limited gray verification for staff and users.
Our gray‑release strategy aims to enable daytime releases, support partial user and staff traffic for verification, and reduce P‑level incidents.
Challenges
Whether long‑link services can be gray‑released.
How to gray‑release C‑end (user) and B‑end (staff) simultaneously.
Choosing UID‑based gray strategies for C‑end and B‑end.
Gray Scenarios
We identified combinations of user, staff, App version and feature verification, as shown in the table (user‑gray vs non‑gray, staff‑gray vs non‑gray, new/old version, etc.). The most common scenario is old‑App + gray user + gray staff.
Gray Strategies
(1) UID range gray – selects users whose 7‑day average inbound volume ≤3% of total.
(2) Inbound‑flow split gray – assigns gray users to gray staff based on entry‑point groups. Example headers:
xdw-kefu-agent-gid: 14133836521**** // gray staff group xdw-kefu-entryid: 12133836112**** // gray entry point(3) Staff UID gray – selects staff from various workplaces to broaden coverage.
System Refactoring
We added gray markers to HTTP headers, cookies and message heads.
HTTP request example:
Accept: application/json, text/plain, */* Accept-Encoding: gzip, deflate, br Accept-Language: zh-CN,zh;q=0.9 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 accessToken: D9ECDE2F26C1CBA44734EA80EB704A4702BA9C81A5E48DC328832DBFB1947D1A350F52350****** xdw-kefu-agent-gid: 141338365****,141338365****Static resource cookie example:
Cookie: xdw-kefu-agent-gid=141338365***** ,141338365*****; accessToken=D9ECDE2F26C1CBA44734EA80EB704A4702BA9C81A5E48DA61EDA9071F47E80C4E7E9A21C0B2A12A591C46411*****Long‑link message head example:
{ "xdw-kefu-entryid": "141338365*****", // entry "xdw-kefu-uid": "112233", // user id "xdw-app-version": "1.0.0", // app version "xdw-device-id": "iOS/Android" // device }IM gateway messages now include the gray headers and forward them to the DLB.
Architecture
The upgraded architecture adds two SLBs to handle traffic from the messaging service and staff console, using a shared DLB cluster for routing. It supports staff UID + group gray, user UID + range + entry gray, and backend & static resource gray.
Benefits
During version X release, the gray environment was used in daytime with production traffic verification, resulting in smooth deployment and reduced incidents.
Conclusion
The upgrade enables comprehensive gray release for inbound, robot QA, recommendation, staff assignment, chat, rating, user info, tracking, consignment query, order query, and front‑end features, meeting the original objectives.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.