
Inside Twitter’s Open‑Source Recommendation Engine: Architecture & Key Components

This article examines Twitter's open‑source recommendation algorithm, released under Elon Musk, covering its main services, machine‑learning models, candidate sources, programming languages, and the GitHub repositories that host core components such as SimClusters, TwHIN, the rankers, and the Rust‑based navi model server.

Java Architecture Diary

1. Introduction

Elon Musk promised on Twitter to open‑source the core recommendation algorithm and fulfilled this promise on March 31, 2023. The release covers the code that recommends tweets in users' home timelines and is published in two new repositories: the-algorithm and the-algorithm-ml.

2. Algorithm Architecture

Twitter's recommendation algorithm is a collection of services and jobs that build and serve the home timeline. The diagram below shows the main connections between these services and jobs.
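
To make the flow between these services more concrete, here is a minimal Python sketch of one plausible request path: gather candidates, rank them, filter them, and return a timeline. It is an illustration under assumptions, not Twitter's implementation. The stage order, the `Candidate` fields, the sample data, and every function name are hypothetical. Only the roles of the components (search-index, cr-mixer, heavy-ranker, visibility-filters, home-mixer) come from the component descriptions in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    tweet_id: int
    source: str            # "in_network" or "out_of_network"
    engagement: float      # hypothetical stand-in for a model score
    flagged: bool = False  # hypothetical stand-in for a safety verdict


def fetch_in_network(user_id):
    """Stand-in for the search-index candidate source (made-up data)."""
    return [
        Candidate(1, "in_network", 0.40),
        Candidate(2, "in_network", 0.90),
    ]


def fetch_out_of_network(user_id):
    """Stand-in for cr-mixer (made-up data)."""
    return [
        Candidate(3, "out_of_network", 0.75),
        Candidate(4, "out_of_network", 0.95, flagged=True),
    ]


def score(candidate):
    """Stand-in for heavy-ranker; the real ranker is a neural network."""
    return candidate.engagement


def is_visible(candidate):
    """Stand-in for visibility-filters: a single hard filter."""
    return not candidate.flagged


def build_home_timeline(user_id, limit=10):
    """Stand-in for home-mixer: gather, rank, filter, truncate."""
    candidates = fetch_in_network(user_id) + fetch_out_of_network(user_id)
    ranked = sorted(candidates, key=score, reverse=True)
    visible = [c for c in ranked if is_visible(c)]
    return [c.tweet_id for c in visible[:limit]]


timeline = build_home_timeline(user_id=42)
print(timeline)  # [2, 3, 1]
```

Tweet 4 has the highest score but is dropped by the filter stage, which shows why ranking and visibility are kept as separate steps in this sketch.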

The main components included in the main repository (the-algorithm) are:

SimClusters : community detection and sparse embeddings into these communities.

TwHIN : dense knowledge‑graph embeddings for users and tweets.

trust-and-safety-models : models for detecting NSFW or abusive content.

real-graph : model predicting the likelihood of interaction between Twitter users.

tweepcred : PageRank‑like algorithm for calculating user reputation.

recos-injector : stream event processor that builds input streams for GraphJet‑based services.

graph-feature-service : provides graph features for a directed pair of users (e.g., how many of the accounts user A follows liked tweets from user B).

search-index : finds and ranks in‑network tweets; about 50% of tweets come from this candidate source.

cr-mixer : coordination layer that fetches out‑of‑network tweet candidates from the underlying compute services.

user-tweet-entity-graph : maintains an in‑memory user‑to‑tweet interaction graph and finds candidates by traversing it, built on the GraphJet framework.

follow-recommendation-service : suggests accounts for users to follow and tweets from those accounts.

light-ranker : lightweight ranking model used by the search index (Earlybird) to rank tweets.

heavy-ranker : neural network that ranks candidate tweets, one of the main signals for timeline selection.

home-mixer : primary service that builds and serves the home timeline.

visibility-filters : filters Twitter content to support legal compliance, improve product quality, increase user trust, and protect revenue, using hard filtering, visible product treatments, and coarse‑grained downranking.

timelineranker : legacy service that provides relevance‑scored tweets from the Earlybird search index and the UTEG (user-tweet-entity-graph) service.

navi : high‑performance machine‑learning model server written in Rust.

product-mixer : software framework for building feeds of content.

twml : legacy machine‑learning framework built on TensorFlow v1.
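
The tweepcred entry above describes a PageRank‑like reputation score. As a rough illustration of the general idea, the following is a textbook PageRank computed by power iteration over a tiny follow graph. It is a sketch, not tweepcred's actual code: the function name, the damping factor of 0.85, the iteration count, and the sample accounts are all assumptions made for this example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    edges: list of (follower, followee) pairs; a follow passes
    reputation from the follower to the followee.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all accounts.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every account receives the same baseline "teleport" share.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # Accounts that follow nobody spread their rank evenly.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical follow graph: alice and bob follow carol, carol follows alice.
follows = [("alice", "carol"), ("bob", "carol"), ("carol", "alice")]
scores = pagerank(follows)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this toy graph carol ranks highest because two accounts follow her, alice comes second because she is followed by the highly ranked carol, and bob, whom nobody follows, keeps only the baseline share.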

3. Programming Languages

The codebase includes the following programming languages:

🥇 Scala – a JVM language

🥈 Java – also runs on the JVM

🥉 Starlark – a Python dialect used as a build‑configuration language (originally designed for Bazel)

Python, C++, and Rust are also used.

4. GitHub Repositories

Algorithm main repository: https://github.com/twitter/the-algorithm/

ML model repository: https://github.com/twitter/the-algorithm-ml
