Meta’s TestGen‑LLM: AI‑Driven Automatic Unit Test Generation for Kotlin Code

In 2024 Meta introduced TestGen‑LLM, an AI‑powered tool that uses large language models to automatically generate Kotlin unit tests. It improves test coverage through a multi‑stage pipeline of candidate generation, compilation filtering, execution filtering, coverage validation, refactoring, and engineer review, with reported coverage gains across the Facebook and Instagram codebases.

Continuous Delivery 2.0

Meta introduced TestGen‑LLM in 2024, applying large language model (LLM) tooling to automatically supplement unit tests for Kotlin code in Facebook and Instagram, with the goal of increasing unit test coverage.

The approach, called Assured LLM‑based Software Engineering (Assured LLMSE), rests on four key principles: (1) target regression test cases that can run automatically and pass, (2) require measurable improvement in the form of increased line coverage, (3) integrate multiple LLMs to generate composable code components, and (4) keep a final review by human engineers, so the tool assists rather than replaces developers.
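Principle (1) constrains what counts as a usable candidate: a test that runs automatically, is deterministic, and passes against current behavior. The following is a hypothetical sketch of such a regression test, not an example from Meta's paper; the class and test names are invented, and a plain `check()` is used instead of a JUnit framework to keep the snippet self-contained.

```kotlin
// Hypothetical production class (invented for illustration).
class Greeter {
    fun greet(name: String): String = "Hello, $name!"
}

// A qualifying regression test: deterministic, runs without human input,
// and pins down current observable behavior with an explicit assertion.
fun testGreetFormatsName() {
    check(Greeter().greet("Ada") == "Hello, Ada!") { "greeting format changed" }
}

fun main() {
    testGreetFormatsName()
    println("regression test passed")
}
```

In a real codebase a test like this would live in the existing JUnit-style suite for its build target, which is what lets the later pipeline stages compile, execute, and coverage-check it automatically.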

The generation pipeline consists of the following steps:

1. The LLM generates a list of candidate unit test cases.

2. The test case code is extracted from the LLM output.

3. First filter: compile the generated tests and discard those that fail to compile.

4. Second filter: run the tests and discard those that fail at execution.

5. Third filter: discard tests that do not improve coverage.

6. Refactor the test class.

7. Submit a diff for engineer review; if approved, the diff is merged into the codebase.
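The three filtering stages above can be sketched as a simple chain that discards candidates at each step. This is an illustrative model only; the type and function names are invented, and the real system invokes the Kotlin compiler, test runner, and coverage tooling rather than reading precomputed booleans.

```kotlin
// Illustrative model of the filtering pipeline (not Meta's actual API).
// In practice each flag would be determined by compiling the test,
// executing it, and measuring line coverage.
data class CandidateTest(
    val name: String,
    val compiles: Boolean,
    val passes: Boolean,
    val addsCoverage: Boolean
)

fun filterCandidates(candidates: List<CandidateTest>): List<CandidateTest> =
    candidates
        .filter { it.compiles }      // first filter: must compile
        .filter { it.passes }        // second filter: must pass when run
        .filter { it.addsCoverage }  // third filter: must raise line coverage

fun main() {
    val candidates = listOf(
        CandidateTest("testA", compiles = true, passes = true, addsCoverage = true),
        CandidateTest("testB", compiles = false, passes = false, addsCoverage = false),
        CandidateTest("testC", compiles = true, passes = false, addsCoverage = false),
        CandidateTest("testD", compiles = true, passes = true, addsCoverage = false)
    )
    val survivors = filterCandidates(candidates)
    println("survivors: ${survivors.map { it.name }}")
}
```

Only candidates that clear all three filters reach the refactoring and review steps, which is why each stage can be strict: discarding a candidate costs nothing, while letting a broken test through would waste engineer review time.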

First engineering validation: during an initial test‑writing competition, 36 engineers produced 105 unit‑test diffs, 16 of which were generated with TestGen‑LLM. In total, TestGen‑LLM contributed 17 diffs, covering 28 new files, improving coverage in 13 partially covered files, and adding three A/B test guards. Each test was submitted as an individual diff rather than as a whole test class.

Second engineering validation: in a later automated run on the same directories, TestGen‑LLM generated 42 diffs; engineers accepted 36, rejected 4, and 2 were withdrawn. Rejections were due to tests covering trivial getters, violating the single‑responsibility principle, or lacking assertions.
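The "lack of assertions" rejection reason is worth illustrating, because such tests pass all three automated filters yet verify nothing. The example below is hypothetical (the `Account` class and test names are invented, not drawn from Meta's codebase) and contrasts an assertion‑free test with one that checks an observable effect.

```kotlin
// Invented class for illustration.
class Account(private var balance: Int = 0) {
    fun deposit(amount: Int) {
        require(amount > 0) { "deposit must be positive" }
        balance += amount
    }
    fun balance(): Int = balance
}

// Rejected style: exercises the code but asserts nothing, so it passes
// trivially and would keep passing even if deposit() were broken.
fun testDepositNoAssertion() {
    Account().deposit(50)
}

// Accepted style: asserts the observable effect of the call.
fun testDepositUpdatesBalance() {
    val account = Account()
    account.deposit(50)
    check(account.balance() == 50) { "balance should reflect the deposit" }
}

fun main() {
    testDepositNoAssertion()
    testDepositUpdatesBalance()
    println("both tests ran; only the second checks behavior")
}
```

Catching this class of test requires human review: an assertion‑free test compiles, passes, and can even add line coverage, so only an engineer can judge that it verifies nothing.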

Overall, across 86 existing Kotlin components, 75% of test classes gained at least one test that built correctly, 57% gained at least one test that built and passed reliably, and 25% achieved increased line coverage relative to all other tests sharing the same build target. Coverage improvements were higher on Facebook than on Instagram, reflecting the larger existing test suite on Facebook.

Key takeaways include the effectiveness of LLM‑generated tests in expanding coverage, the importance of automated filtering stages, and the necessity of human review to ensure test quality and relevance.

AI, LLM, Software Engineering, Kotlin, Unit Testing, Test Generation
Written by Continuous Delivery 2.0

Tech and case studies on organizational management, team management, and engineering efficiency
