Large‑Scale Code Deletion at Google (Sensenmann Project)
The article examines Google’s massive monorepo and the challenges of dead code, describes the Sensenmann project’s automated deletions built on build‑graph analysis and Tarjan’s algorithm, covers whitelist handling and communication strategies, and concludes with a brief promotion for a Python continuous‑deployment course.
Google’s single monolithic repository, stored in the Piper system, contains billions of lines of code shared across libraries, services, experiments, and tools, allowing any engineer to view almost all code. While this openness enables rapid reuse and updates, maintaining such a massive codebase incurs high production and engineering costs.
The Sensenmann project was created to address the accumulation of dead code in this environment. By automatically identifying unused code and generating changelists for code review, Sensenmann has removed nearly 5% of Google’s C++ code, submitting over 1,000 deletion changelists each week.
Dead code detection relies on Google’s Blaze build system (the internal version of Bazel) to construct a complete dependency graph, revealing libraries not linked into any binary. However, binaries themselves, along with one‑off migration scripts and diagnostic tools, need separate consideration: an unused binary generates no logs, so proving it is truly dead is difficult.
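The reachability idea above can be sketched as a graph traversal: starting from every binary, walk the dependency graph and flag whatever is never reached. This is a minimal illustration, not Blaze’s actual implementation; the target names are made up.

```python
# Hypothetical build graph: target -> direct dependencies.
# Target names are illustrative, not real Blaze targets.
build_graph = {
    "//server:main_bin": ["//lib:net", "//lib:auth"],
    "//lib:net": ["//lib:base"],
    "//lib:auth": ["//lib:base"],
    "//lib:old_experiment": ["//lib:base"],  # nothing depends on this
    "//lib:base": [],
}
binaries = {"//server:main_bin"}

def reachable_from(roots, graph):
    """Collect every target transitively reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen

live = reachable_from(binaries, build_graph)
dead = set(build_graph) - live
print(sorted(dead))  # ['//lib:old_experiment']
```

Note that `//lib:base` survives even though the dead experiment depends on it, because a live path from a binary also reaches it; only targets with no live path at all become candidates.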
To avoid deleting essential code, a whitelist system marks exceptions such as example APIs or non‑executable test utilities. The project also tracks runtime logs of internal binaries; if a binary has not run for a long period, it becomes a candidate for removal.
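The whitelist-plus-runtime-logs policy described above can be sketched as a simple filter: a binary is a deletion candidate only if it is not whitelisted and has not run within some threshold window. The record names and the one-year threshold are assumptions for illustration, not values from the article.

```python
from datetime import datetime, timedelta

# Hypothetical records: binary name -> last observed run (from execution logs).
last_run = {
    "//tools:daily_report": datetime(2024, 5, 1),
    "//tools:old_migration": datetime(2022, 1, 15),
}
# Whitelisted exceptions (e.g. example code) are never proposed for deletion.
whitelist = {"//examples:api_demo"}

def deletion_candidates(last_run, whitelist, now, threshold_days=365):
    """Binaries that are not whitelisted and have not run within the window."""
    cutoff = now - timedelta(days=threshold_days)
    return sorted(
        name for name, ts in last_run.items()
        if name not in whitelist and ts < cutoff
    )

print(deletion_candidates(last_run, whitelist, datetime(2024, 6, 1)))
# → ['//tools:old_migration']
```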
Complexities arise when tests depend on libraries that are otherwise dead. Adding an artificial edge from each library back to its tests turns the pair into a cycle, so the graph treats them as a single strongly connected component. Tarjan’s algorithm then identifies these components, allowing a dead library and its tests to be deleted together while live libraries keep their test coverage.
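A compact sketch of this step, assuming the artificial library-to-test edge described above: once the edge exists, a library and its test collapse into one strongly connected component under Tarjan’s algorithm and can be kept or deleted as a unit. The node names are illustrative.

```python
def tarjan_scc(graph):
    """Tarjan's algorithm: return the strongly connected components of a digraph."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = [0]

    def strongconnect(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

# The test depends on the library; the artificial reverse edge closes the cycle.
graph = {
    "lib": ["lib_test"],   # artificial edge: library -> its test
    "lib_test": ["lib"],   # real edge: test depends on library
    "helper": [],
}
comps = tarjan_scc(graph)
print(comps)  # {'lib', 'lib_test'} forms one component; 'helper' stands alone
```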
Matching tests to the libraries they validate can be non‑trivial. Simple naming conventions cover many cases, but more sophisticated approaches, such as edit‑distance heuristics or coverage analysis, are needed when names diverge, for example telling an LZW‑compression library’s tests apart from a URL‑encoding utility’s.
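One way the edit‑distance heuristic could work, as a rough sketch: strip the test suffix and pair the test with the library whose name is closest by Levenshtein distance. The `match_test` helper and the naming scheme are assumptions for illustration, not the article’s actual mechanism.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def match_test(test_name, libraries):
    """Pair a test with the library whose name is closest by edit distance."""
    base = test_name.replace("_test", "")
    return min(libraries, key=lambda lib: levenshtein(base, lib))

libs = ["lzw_compress", "url_encoder"]
print(match_test("lzw_compress_test", libs))  # → lzw_compress
```

Coverage analysis would be the stronger signal when names carry no hint at all; the distance heuristic only helps where conventions are loosely followed.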
Beyond technical challenges, the project emphasizes user‑centric communication: concise change descriptions, well‑structured support documents, and thoughtful handling of feedback are essential to gain engineer acceptance of automated deletions.
Finally, the article includes a promotional notice for a Python continuous‑deployment training course, highlighting its benefits for improving software delivery speed and reliability.
Continuous Delivery 2.0
Tech and case studies on organizational management, team management, and engineering efficiency