ByteDance’s Best Practices for Billion‑DAU Mobile Client Release Engineering
This talk presents ByteDance’s mobile release engineering practices for billion‑DAU apps, detailing challenges, the evolution of their release platform from Jenkins‑based pipelines to a custom distributed system, and solutions for pipeline efficiency, safety, release velocity, and data reliability via artifact libraries.
The speaker, Gao Lei, is a release engineer at ByteDance with over ten years of software development experience. His talk covers three areas: the challenges of releasing mobile clients at billion-DAU scale, the evolution of the release platform, and the solutions the team adopted.
He contrasts mobile and server releases on four points. Mobile upgrades depend on end-user devices and app stores, so rollback is costly and version numbers are critical. Mobile release cycles are longer (1-2 weeks), whereas server updates ship frequently. More roles are involved in a mobile release. Historical mobile versions remain in use, whereas the server side typically keeps a single live version.
Typical production incidents, such as shipping a package built for the wrong architecture, broken download links, or inappropriate assets, show how risky mobile releases are. These incidents stem from gaps in security, data, and testing, and can cause financial loss or user churn.
From these risks arise four core questions: how to build an efficient release pipeline, ensure process safety, achieve better and faster rollout, and guarantee release data reliability.
ByteDance’s mobile release platform evolved in three stages: pre‑2017 ad‑hoc Jenkins clusters; 2017‑2019 “Rocket 1.0” introducing Jenkins‑based pipelines and a dedicated build team; post‑2019 “Rocket 2.0” replacing Jenkins with a custom distributed scheduler, defining reusable atomic capabilities, integrating security throughout CI/CD, adopting an artifact library as data foundation, and exploring varied gray‑release strategies.
The current platform supports flagship products such as Toutiao, Douyin, Xigua, novel-reading apps, and Feishu across Android, iOS, macOS, and Windows, providing a one-stop mobile R&D platform that spans requirement specification, build, test, release, and post-release monitoring.
To achieve pipeline efficiency, the platform enables multi‑scene custom task orchestration and reduces complexity by decomposing the platform into measurable atomic capabilities, guided by principles of independent functionality and independent measurability.
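The idea of composing pipelines from independently measurable atomic capabilities can be sketched as follows. This is a minimal illustration, not ByteDance's implementation: the names `AtomicCapability`, `Pipeline`, and the `checkout`/`build`/`unit_test`/`sign` steps are all hypothetical, and the steps only manipulate a dictionary instead of doing real work.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AtomicCapability:
    """Hypothetical unit of work: does one thing and is measured on its own."""
    name: str
    run: Callable[[Dict], Dict]


@dataclass
class Pipeline:
    """Hypothetical pipeline: an ordered list of atomic capabilities."""
    name: str
    steps: List[AtomicCapability]
    metrics: List[Dict] = field(default_factory=list)

    def execute(self, context: Dict) -> Dict:
        for step in self.steps:
            started = time.perf_counter()
            ok = True
            try:
                context = step.run(context)
            except Exception:
                ok = False
                raise
            finally:
                # Each step records its own outcome and duration.
                self.metrics.append({
                    "step": step.name,
                    "ok": ok,
                    "seconds": time.perf_counter() - started,
                })
        return context


# Placeholder steps; real ones would call build tools, test runners, and so on.
def checkout(ctx: Dict) -> Dict:
    return {**ctx, "source": f"{ctx['app']}@{ctx['branch']}"}


def build(ctx: Dict) -> Dict:
    return {**ctx, "artifact": f"{ctx['app']}-{ctx['platform']}.pkg"}


def unit_test(ctx: Dict) -> Dict:
    return {**ctx, "tests_passed": True}


def sign(ctx: Dict) -> Dict:
    return {**ctx, "signed": True}


CHECKOUT = AtomicCapability("checkout", checkout)
BUILD = AtomicCapability("build", build)
UNIT_TEST = AtomicCapability("unit_test", unit_test)
SIGN = AtomicCapability("sign", sign)

# The same capabilities are reused across scenes with different orchestration.
merge_request_pipeline = Pipeline("merge-request", [CHECKOUT, BUILD, UNIT_TEST])
release_pipeline = Pipeline("release", [CHECKOUT, BUILD, UNIT_TEST, SIGN])

result = release_pipeline.execute(
    {"app": "demo", "branch": "release/1.0", "platform": "android"}
)
```

Because every step appends its own entry to `metrics`, a slow or flaky capability can be identified without instrumenting each pipeline separately.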
Safety is shifted left: security, platform, and business teams share responsibility, with security providing scanning and remediation guidance, the platform offering flexible checkpoints, and businesses prioritizing and implementing fixes; high‑risk vulnerabilities trigger a block on releases until remediation.
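A release checkpoint of this kind might look like the sketch below. It is an assumption-laden illustration: the `Severity` levels, the `Finding` record, and the `security_gate` function are hypothetical names, and the blocking threshold is a parameter to reflect the idea of flexible checkpoints.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Finding:
    """Hypothetical scan result reported by the security team's scanner."""
    rule_id: str
    severity: Severity
    remediated: bool = False


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    blocking: List[Finding]


def security_gate(findings: List[Finding],
                  block_at: Severity = Severity.HIGH) -> GateDecision:
    """Block the release while any unremediated finding is at or above block_at."""
    blocking = [
        f for f in findings
        if f.severity >= block_at and not f.remediated
    ]
    return GateDecision(allowed=not blocking, blocking=blocking)


findings = [
    Finding("hardcoded-key", Severity.HIGH),
    Finding("verbose-logging", Severity.LOW),
]
before_fix = security_gate(findings)

# After the business team fixes the high-risk issue, the gate opens.
after_fix = security_gate([
    Finding("hardcoded-key", Severity.HIGH, remediated=True),
    Finding("verbose-logging", Severity.LOW),
])
```

Keeping the threshold configurable lets a stricter pipeline block on medium-severity findings without changing the gate itself.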
Release velocity is improved in two ways. Internally, dogfooding through the 'ByteDance Internal Test' mini-program engages tens of thousands of employees. Externally, precision targeting combines user, app, and version data with machine-learning models, raising click-through and conversion rates (CTR/CVR) by about ten points over manual selection.
Data reliability is ensured by an immutable artifact library that stores verified build outputs, preventing tampered links and enabling reliable safety checks, functional tests, and user‑story validation; problematic artifacts can be blacklisted to protect rollout.
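One common way to get these properties is content addressing, sketched below. The talk does not describe the storage design, so this is an assumption: the `ArtifactLibrary` class and its methods are hypothetical, and identifying each artifact by its SHA-256 digest is a swapped-in technique chosen to show immutability, tamper detection, and blacklisting in a few lines.

```python
import hashlib
from typing import Dict, Set


class ArtifactLibrary:
    """Hypothetical in-memory artifact store keyed by content digest."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._blacklist: Set[str] = set()

    def publish(self, data: bytes) -> str:
        """Store a verified build output; the digest is its permanent ID."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault never overwrites, so a published artifact cannot change.
        self._blobs.setdefault(digest, data)
        return digest

    def blacklist(self, digest: str) -> None:
        """Mark a problematic artifact so it can no longer be rolled out."""
        if digest not in self._blobs:
            raise KeyError(f"unknown artifact {digest}")
        self._blacklist.add(digest)

    def fetch_for_release(self, digest: str) -> bytes:
        """Return the artifact only if it is allowed and its content is intact."""
        if digest in self._blacklist:
            raise PermissionError(f"artifact {digest} is blacklisted")
        data = self._blobs[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"artifact {digest} failed its integrity check")
        return data


library = ArtifactLibrary()
good_id = library.publish(b"app-build-1")
bad_id = library.publish(b"app-build-2")
library.blacklist(bad_id)

package = library.fetch_for_release(good_id)
```

Because downstream checks and rollouts refer to the digest instead of a mutable download link, they all operate on exactly the bytes that were verified.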
Practices emphasize iterative improvement, learning from accidents via thorough case studies, balancing standardized and individualized needs, and measuring platform value to drive continuous improvement.
Observed trends include increasing release frequency (moving toward weekly or even daily cycles), heightened safety focus, AI‑driven precise test generation reducing maintenance overhead, and the adoption of continuous gray‑release where release boundaries blur.
The platform’s value is judged by speed, efficiency, safety, and cost, requiring balance among these dimensions; optimizations must be measurable to be improvable.
Future work expands the release concept to a large‑scale system encompassing configuration and static resources, refines release velocity with richer data dimensions, establishes a measurement‑consumption loop, and opens the Mars SDK for external use.
ByteDance Terminal Technology
Official account of ByteDance Terminal Technology, sharing technical insights and team updates.