Meta’s TestGen‑LLM: AI‑Driven Automatic Unit Test Generation for Kotlin Code

In 2024 Meta introduced TestGen‑LLM, an AI‑powered tool that uses large language models to automatically generate Kotlin unit tests. It improves test coverage through a multi‑stage pipeline of candidate generation, compilation filtering, execution filtering, coverage validation, refactoring, and engineer review, with reported coverage gains across the Facebook and Instagram codebases.

Continuous Delivery 2.0

Meta introduced TestGen‑LLM in 2024, applying large language model (LLM) tooling to automatically supplement unit tests for Kotlin code in Facebook and Instagram, with the goal of increasing unit test coverage.

The approach, called Assured LLM‑based Software Engineering (Assured LLMSE), rests on four key principles: (1) target regression test cases that can run automatically and pass, (2) require measurable improvement in the form of increased line coverage, (3) integrate multiple LLMs to generate composable code components, and (4) keep a final review by human engineers, so the tool assists rather than replaces developers.
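Principle (1) constrains what counts as a usable candidate: a test that runs automatically, is deterministic, and passes against current behavior. The following is a hypothetical sketch of such a regression test, not an example from Meta's paper; the class and test names are invented, and a plain `check()` is used instead of a JUnit framework to keep the snippet self-contained.

```kotlin
// Hypothetical production class (invented for illustration).
class Greeter {
    fun greet(name: String): String = "Hello, $name!"
}

// A qualifying regression test: deterministic, runs without human input,
// and pins down current observable behavior with an explicit assertion.
fun testGreetFormatsName() {
    check(Greeter().greet("Ada") == "Hello, Ada!") { "greeting format changed" }
}

fun main() {
    testGreetFormatsName()
    println("regression test passed")
}
```

In a real codebase a test like this would live in the existing JUnit-style suite for its build target, which is what lets the later pipeline stages compile, execute, and coverage-check it automatically.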

The generation pipeline consists of the following steps:

1. The LLM generates a list of candidate unit test cases.

2. The test case code is extracted from the LLM output.

3. First filter: compile the generated tests and discard those that fail to compile.

4. Second filter: run the tests and discard those that fail at execution.

5. Third filter: discard tests that do not improve coverage.

6. Refactor the test class.

7. Submit a diff for engineer review; if approved, the diff is merged into the codebase.
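The three filtering stages above can be sketched as a simple chain that discards candidates at each step. This is an illustrative model only; the type and function names are invented, and the real system invokes the Kotlin compiler, test runner, and coverage tooling rather than reading precomputed booleans.

```kotlin
// Illustrative model of the filtering pipeline (not Meta's actual API).
// In practice each flag would be determined by compiling the test,
// executing it, and measuring line coverage.
data class CandidateTest(
    val name: String,
    val compiles: Boolean,
    val passes: Boolean,
    val addsCoverage: Boolean
)

fun filterCandidates(candidates: List<CandidateTest>): List<CandidateTest> =
    candidates
        .filter { it.compiles }      // first filter: must compile
        .filter { it.passes }        // second filter: must pass when run
        .filter { it.addsCoverage }  // third filter: must raise line coverage

fun main() {
    val candidates = listOf(
        CandidateTest("testA", compiles = true, passes = true, addsCoverage = true),
        CandidateTest("testB", compiles = false, passes = false, addsCoverage = false),
        CandidateTest("testC", compiles = true, passes = false, addsCoverage = false),
        CandidateTest("testD", compiles = true, passes = true, addsCoverage = false)
    )
    val survivors = filterCandidates(candidates)
    println("survivors: ${survivors.map { it.name }}")
}
```

Only candidates that clear all three filters reach the refactoring and review steps, which is why each stage can be strict: discarding a candidate costs nothing, while letting a broken test through would waste engineer review time.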

First engineering validation: during an initial test‑writing competition, 36 engineers produced 105 unit‑test diffs, 16 of which were generated with TestGen‑LLM. In total, TestGen‑LLM contributed 17 diffs, covering 28 new files, improving coverage in 13 partially covered files, and adding three A/B test guards. Each test was submitted as an individual diff rather than as a whole test class.

Second engineering validation: in a later automated run on the same directories, TestGen‑LLM generated 42 diffs; engineers accepted 36, rejected 4, and 2 were withdrawn. Rejections were due to tests covering trivial getters, violating the single‑responsibility principle, or lacking assertions.
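The "lack of assertions" rejection reason is worth illustrating, because such tests pass all three automated filters yet verify nothing. The example below is hypothetical (the `Account` class and test names are invented, not drawn from Meta's codebase) and contrasts an assertion‑free test with one that checks an observable effect.

```kotlin
// Invented class for illustration.
class Account(private var balance: Int = 0) {
    fun deposit(amount: Int) {
        require(amount > 0) { "deposit must be positive" }
        balance += amount
    }
    fun balance(): Int = balance
}

// Rejected style: exercises the code but asserts nothing, so it passes
// trivially and would keep passing even if deposit() were broken.
fun testDepositNoAssertion() {
    Account().deposit(50)
}

// Accepted style: asserts the observable effect of the call.
fun testDepositUpdatesBalance() {
    val account = Account()
    account.deposit(50)
    check(account.balance() == 50) { "balance should reflect the deposit" }
}

fun main() {
    testDepositNoAssertion()
    testDepositUpdatesBalance()
    println("both tests ran; only the second checks behavior")
}
```

Catching this class of test requires human review: an assertion‑free test compiles, passes, and can even add line coverage, so only an engineer can judge that it verifies nothing.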

Overall, across 86 existing Kotlin components, 75% of test classes gained at least one test that built correctly, 57% gained at least one test that built and passed reliably, and 25% achieved increased line coverage relative to all other tests sharing the same build target. Coverage improvements were higher on Facebook than on Instagram, reflecting the larger existing test suite on Facebook.

Key takeaways include the effectiveness of LLM‑generated tests in expanding coverage, the importance of automated filtering stages, and the necessity of human review to ensure test quality and relevance.

AI, LLM, Software Engineering, Kotlin, Unit Testing, Test Generation
Written by Continuous Delivery 2.0

Tech and case studies on organizational management, team management, and engineering efficiency
