
Google and Microsoft Automated Testing Practices: Unit Test Levels and DevOps Evolution

This article examines how Google and Microsoft have shaped automated testing across the DevOps era: how each company defines test sizes and responsibilities, and how each classifies tests in tiers from small unit tests up to large‑scale integration tests, in service of productivity, reliability, and release speed.


As the term “DevOps” gains popularity in the IT industry, two recurring questions emerge: what qualifies as a unit test and who should be responsible for writing them.

These questions have already been explored by trillion‑dollar companies like Google and Microsoft.

1. Google: Automated Test Cases S/M/L

Content summarized from Chapter 11, "Testing Overview," of the 2020 book "Software Engineering at Google".

For developers coming from organizations without a strong testing culture, writing tests can seem counter‑productive: it appears to take as much time as, or even longer than, implementing the feature itself. Google, however, has found that investing in automated testing yields several key benefits for developer productivity:

Less debugging

Increased confidence in changes

Improved documentation

Simpler code reviews

Thoughtful design

Fast, high‑quality releases

1. Dark Ages (pre‑2005)

In Google’s early days, engineer‑driven testing was considered non‑essential; the belief was that smart engineers would simply write good software.

Several large systems had extensive integration tests, but most products were developed at speed with little or no automated test coverage.

By 2005, the Google Web Server (GWS) suffered severe service‑quality issues, leading to exhausted engineers, declining productivity, and a surge of releases containing user‑impacting defects—up to 80% of releases required rollbacks.

The GWS tech lead mandated automated testing and continuous integration, even assigning a “Build Cop” to ensure every failing build was promptly fixed.

One year later, emergency fixes were cut by half, and GWS now runs tens of thousands of automated test cases, enabling daily releases with minimal user‑visible defects.

This marked a turning point in Google’s software‑engineering mindset: it became clear that relying solely on individual engineers could not prevent product defects, especially as team size grew.

2. Post‑2008

Google learned early that while engineers prefer large, system‑level automated tests, such tests are slower, less reliable, and harder to debug than smaller tests.

When the pain of debugging system‑level tests became intolerable, engineers asked why they couldn’t test a single server at a time, leading to the creation of smaller test cases that are faster, more stable, and less painful.

Google therefore defined two dimensions for every automated test case: resource consumption and verification scope.

Resource size: the memory, processes, and time required to run the test.

Verification scope: the size of the specific code path being validated.

Although size and scope are related, they are distinct concepts. Small, medium, and large tests are defined by the constraints of the testing infrastructure: small tests run in a single process, medium tests on a single machine, and large tests anywhere needed.
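As a minimal sketch of how such a size constraint might be enforced (this is illustrative tooling invented for the example, not Google's internal infrastructure), tests can be registered under a size label so that each environment runs only the sizes whose constraints it can honor:

```python
# Hypothetical test registry mirroring the S/M/L classification:
# small = single process, medium = single machine, large = anywhere.
TESTS_BY_SIZE = {"small": [], "medium": [], "large": []}

def size(label):
    """Decorator that registers a test function under a size category."""
    def register(fn):
        TESTS_BY_SIZE[label].append(fn)
        return fn
    return register

def parse_query(raw):
    """Toy code under test: parse a single key=value query fragment."""
    key, _, value = raw.partition("=")
    return {key: value}

@size("small")
def test_parse_query():
    # Pure in-memory logic: no network, disk, or subprocesses, so this
    # qualifies as a small (single-process) test.
    assert parse_query("q=hello") == {"q": "hello"}

def run(selected_sizes):
    """Run only the size categories allowed in this environment and
    return how many tests executed."""
    for label in selected_sizes:
        for test in TESTS_BY_SIZE[label]:
            test()
    return sum(len(TESTS_BY_SIZE[s]) for s in selected_sizes)

# A presubmit check would run only the fast, in-process tests:
run(["small"])
```

The point of the split is operational: a presubmit environment can run every "small" test on each change, while "medium" and "large" suites run on dedicated machines less frequently.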

Google does not adhere strictly to traditional “unit test” or “integration test” labels; instead, it prioritizes speed and determinism regardless of test scope.
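To make the speed-and-determinism point concrete, here is a small illustrative sketch (the function and names are invented for the example): a time-based check takes an injected clock value instead of reading real time, so the test never sleeps and always produces the same result.

```python
import time

def is_expired(created_at, ttl_seconds, now=None):
    """Return True when something older than ttl_seconds has expired.
    `now` is injectable so tests need not depend on the wall clock."""
    current = now if now is not None else time.time()
    return current - created_at >= ttl_seconds

def test_expires_after_ttl():
    # Deterministic: no sleep(), no flaky race against real time,
    # sub-millisecond runtime.
    assert is_expired(created_at=1000.0, ttl_seconds=60, now=1061.0)
    assert not is_expired(created_at=1000.0, ttl_seconds=60, now=1059.0)

test_expires_after_ttl()
```

A test written this way stays fast and deterministic whether one labels it a "unit" or an "integration" test, which is precisely the property Google optimizes for.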

Large end‑to‑end tests are retained mainly for verifying system configuration; they are isolated and run only on release branches, so that their slowness and flakiness do not disrupt developers’ day‑to‑day workflows.

2. Microsoft: Automated Test Case Levels – From L0 to L3

1. Dark Ages (pre‑2010)

Microsoft has long emphasized automated testing. Since the 1990s, its testing organization featured two dedicated roles:

(1) Software Design Engineer in Test (SDET): responsible for developing automation and testing infrastructure.

(2) Software Test Engineer (STE): runs automated tests and performs manual testing.

At that time, the developer‑to‑tester ratio was roughly 1:1, a model that is no longer viable today.

—Excerpt from “Evolving Test Practices at Microsoft – Azure DevOps”

The approach proved ineffective; developers would hand code to SDETs, who in turn passed automation to STEs, leading to costly bottlenecks and delayed product releases.

2. Opening the DevOps Era

After Microsoft embarked on its DevOps journey, a 2015 quality vision shifted testing upstream, redefining test categories based on external dependencies rather than execution timing.

L0/L1 – Unit Tests

L0 tests are the most numerous: fast, in‑memory unit tests that depend only on the code under test, with no external dependencies.

L1 tests may require binary integration packages along with additional dependencies such as a database or file system.

L2/L3 – Functional Tests

L2 functional tests target “testable” service deployments; they require service deployment but may isolate key dependencies.

L3 tests are limited integration tests executed in production environments, requiring a full product deployment.
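As a rough illustration of the boundary between the lower tiers (the function names and schema here are invented for the example, not Microsoft's code), an L0 test exercises pure logic entirely in memory, while an L1 test is allowed to touch a real dependency such as a SQLite database:

```python
import sqlite3

def normalize_email(raw):
    """Business logic under test: trim and lowercase an address."""
    return raw.strip().lower()

def test_l0_normalize_email():
    # L0: in-memory only, no external dependencies; cheap enough to run
    # on every build.
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"

def test_l1_store_and_fetch_email():
    # L1: still unit-scoped, but permitted to use a real dependency such
    # as a database (an in-memory SQLite instance here for brevity).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT)")
    conn.execute("INSERT INTO users VALUES (?)",
                 (normalize_email(" Bob@X.io "),))
    row = conn.execute("SELECT email FROM users").fetchone()
    assert row == ("bob@x.io",)

test_l0_normalize_email()
test_l1_store_and_fetch_email()
```

L2 and L3 tests would go further still, running against a deployed service and a production deployment respectively, which is why they are kept far fewer in number.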

Microsoft’s VSTS team (Visual Studio Team Services, now Azure DevOps) spent two and a half years transitioning from a large‑system‑test‑centric approach to a strategy focused primarily on L0 test cases.

Consequently, VSTS no longer employs dedicated SDETs or STEs; the responsibility for writing and maintaining automated tests now rests with the development engineers themselves.

Written by Continuous Delivery 2.0: tech and case studies on organizational management, team management, and engineering efficiency.