
Mastering the 1 Billion Row Java Challenge: Tips, Rules, and Evaluation

This article explains the 1 Billion Row Challenge (1BRC) for Java 21, detailing the data format, required output, how to build and run the benchmark, optimization options, submission rules, and the evaluation environment used to rank participants.

Java Architecture Diary

1. Introduction

Starting from New Year's Day 2024, the 1 Billion Row Challenge (1BRC) is open for submissions until 31 January 2024 23:59 UTC; any pull request created after that will not be considered.

The challenge aims to test Java's ability to aggregate one billion lines of temperature data from a text file, encouraging the use of all (virtual) threads, SIMD, garbage‑collector tuning, or any other technique to produce the fastest solution.

2. Challenge Details

The input file contains temperature measurements from weather stations, one per line, formatted as

<string: station name>;<double: measurement>

with exactly one decimal place. Example (10 lines):

Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
St. John's;15.2
Cracow;12.6
Bridgetown;26.9
Istanbul;6.2
Roseau;34.4
Conakry;31.2
Istanbul;23.0

The task is to write a Java program that reads the file, computes the minimum, average, and maximum temperature for each station, sorts the stations alphabetically, and prints results in the form

<min>/<avg>/<max>

with one decimal place, e.g.

{Abha=-23.0/18.0/59.2, Abidjan=-16.2/26.0/67.3, …}
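The aggregation and output format described above can be sketched as follows. This is a minimal illustration, not the repository's baseline implementation: the class name OutputFormatSketch and the Stats helper are hypothetical, and rounding here simply follows String.format, which may not match the challenge's exact rounding rule in every edge case.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class name; a sketch of the aggregation and output format only.
class OutputFormatSketch {

    /** Running min, max, sum, and count for one station (illustrative helper). */
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        @Override
        public String toString() {
            // Locale.ROOT keeps '.' as the decimal separator on every machine.
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, sum / count, max);
        }
    }

    /** Aggregates "station;value" lines and renders {name=min/avg/max, ...}. */
    static String summarize(List<String> lines) {
        // TreeMap keeps station names in sorted order for the final output.
        Map<String, Stats> byStation = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            byStation.computeIfAbsent(station, k -> new Stats()).add(value);
        }
        // AbstractMap.toString() already produces the {k=v, k=v} layout.
        return byStation.toString();
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Hamburg;12.0", "Istanbul;6.2", "Istanbul;23.0");
        System.out.println(summarize(sample));
        // prints {Hamburg=12.0/12.0/12.0, Istanbul=6.2/14.6/23.0}
    }
}
```

A sorted map is used so that no separate sorting pass is needed before printing; a faster solution would more likely aggregate into an unsorted structure and sort once at the end.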

Java 21 must be used.

3. Running the Challenge

The repository (named 1brc) contains two programs:

dev.morling.onebrc.CreateMeasurements (invoked via create_measurements.sh) creates the file measurements.txt with a configurable number of random measurements.

dev.morling.onebrc.CalculateAverage (invoked via calculate_average.sh) computes the averages from measurements.txt.

Steps to run:

Build the project with Apache Maven:

./mvnw clean verify

Create a 1‑billion‑row measurement file (run once):

./create_measurements.sh 1000000000

(produces a ~12 GB file; ensure sufficient disk space).

Calculate the averages:

./calculate_average.sh

Optimize the CalculateAverage program using any technique you deem appropriate: parallelism, the incubating Vector API, memory‑mapped file sections, AppCDS, GraalVM, CRaC, GC tuning, etc.
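As one illustration of the parallelism and memory-mapping ideas listed above, the input can be cut into chunks that end on line boundaries, so that each thread processes whole lines only. The sketch below is an assumption about how that might look, not code from the repository; ChunkSplitSketch and split are hypothetical names. It works on any ByteBuffer, including a MappedByteBuffer obtained from FileChannel.map. Since a single mapping is limited to Integer.MAX_VALUE bytes, a file of about 12 GB would need several mappings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; shows newline-aligned chunking only.
class ChunkSplitSketch {

    /**
     * Splits the buffer into at most `parts` chunks, returned as {start, end}
     * pairs (end exclusive). Every chunk ends just after a '\n' or at the end
     * of the buffer, so no line is shared between two chunks.
     */
    static List<int[]> split(ByteBuffer data, int parts) {
        List<int[]> chunks = new ArrayList<>();
        int size = data.limit();
        int start = 0;
        for (int i = 1; i <= parts && start < size; i++) {
            // Tentative end: an even share of the buffer, or the very end for the last part.
            int end = (i == parts) ? size : (int) ((long) size * i / parts);
            end = Math.max(end, start);
            // Advance until the byte before `end` is a newline (absolute get, position untouched).
            while (end < size && (end == 0 || data.get(end - 1) != '\n')) {
                end++;
            }
            if (end > start) {
                chunks.add(new int[] { start, end });
                start = end;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] bytes = "a;1.0\nbb;2.0\nccc;3.0\n".getBytes(StandardCharsets.UTF_8);
        for (int[] chunk : split(ByteBuffer.wrap(bytes), 2)) {
            System.out.println(chunk[0] + ".." + chunk[1]);
        }
    }
}
```

Each {start, end} pair could then be handed to its own worker thread, with the per-thread results merged at the end.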

The provided simple implementation uses the Java Streams API and finishes in about two minutes on the reference hardware, serving as a baseline.

4. Rules and Limitations

Any Java distribution may be used (SDKMan builds, early‑access builds from openjdk.net, builds from builds.shipilev.net, etc.).

No external dependencies are allowed.

The implementation must consist of a single Java source file.

All computation must happen at runtime; pre‑computing results during build time (e.g., embedding them in a native image) is prohibited.

Input constraints:

Station name: non‑empty UTF‑8 string, 1–100 characters.

Temperature: double between -99.9 and 99.9 inclusive, always with one decimal place.

The solution must work for any valid station name and any data distribution; it cannot rely on special properties of the provided dataset.
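Because every temperature has exactly one decimal place and lies between -99.9 and 99.9, each value can be held as an integer number of tenths (from -999 to 999), which avoids Double.parseDouble on the hot path. The following is a minimal sketch of that idea under the stated constraints; FixedPointParseSketch and parseTenths are hypothetical names, and the method does no validation of malformed input.

```java
// Hypothetical class name; relies on the input constraints stated above.
class FixedPointParseSketch {

    /**
     * Parses text such as "-12.3" or "7.0" into tenths of a degree (-123, 70).
     * Assumes an optional '-', one or two integer digits, a '.', and exactly
     * one fractional digit; anything else gives an unspecified result.
     */
    static int parseTenths(String text) {
        int pos = 0;
        boolean negative = text.charAt(0) == '-';
        if (negative) {
            pos++;
        }
        int value = 0;
        for (; pos < text.length(); pos++) {
            char c = text.charAt(pos);
            if (c != '.') {
                // Skipping the '.' turns "12.3" into the integer 123.
                value = value * 10 + (c - '0');
            }
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        System.out.println(parseTenths("38.8"));  // prints 388
        System.out.println(parseTenths("-99.9")); // prints -999
    }
}
```

Sums of such integers are exact, so the only rounding happens once, when the average is converted back to one decimal place for output.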

5. Participating in the Challenge

To submit your implementation:

Fork the 1brc GitHub repository.

Copy CalculateAverage.java to a new file named CalculateAverage_<your_GH_user>.java (e.g., CalculateAverage_doloreswilson.java).

Make your implementation as fast as possible.

Copy calculate_average.sh to calculate_average_<your_GH_user>.sh and adjust it to invoke your class, adding any JVM options via JAVA_OPTS if needed.

OpenJDK 21 is the default; if you use a custom JDK, include the appropriate sdk use java [version] command in the startup script.

(Optional) To build a native binary with GraalVM, modify pom.xml accordingly.

Create a pull request against the upstream repository, clearly stating the name of your implementation class, the execution time you measured on your own system, and that system's specs (CPU, number of cores, RAM).

Community discussion is encouraged via the repository’s GitHub Discussions.

6. Evaluation

Results are measured on a Hetzner Cloud CCX33 instance (8 CPU, 32 GB RAM). Execution time is recorded for five consecutive runs; the fastest and slowest runs are discarded, and the average of the remaining three determines the competitor’s score. All competitors use the identical measurements.txt file. Scripts based on Terraform and Ansible are provided for anyone who wishes to reproduce the environment (note that running them incurs cloud costs).
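The scoring rule above (five runs, drop the fastest and the slowest, average the rest) amounts to a trimmed mean. A small sketch for applying it to your own timings follows; ScoreSketch and score are hypothetical names, and the timings in main are made-up illustration values, not measured results.

```java
import java.util.Arrays;

// Hypothetical class name; mirrors the scoring rule described in the text.
class ScoreSketch {

    /** Returns the mean of the middle three of exactly five run times. */
    static double score(double[] runSeconds) {
        if (runSeconds.length != 5) {
            throw new IllegalArgumentException("expected exactly five runs");
        }
        double[] sorted = runSeconds.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // sorted[0] is the fastest run, sorted[4] the slowest; both are discarded.
        return (sorted[1] + sorted[2] + sorted[3]) / 3.0;
    }

    public static void main(String[] args) {
        double[] runs = { 10.0, 12.0, 11.0, 30.0, 9.0 }; // illustrative values only
        System.out.println(score(runs)); // prints 11.0
    }
}
```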

Project repository: https://github.com/gunnarmorling/1brc. Join the challenge!
