
Uncovering the Mystery Behind App ANR Caused by Unicode Bidi Strings

This article continues the investigation of a mysterious string that triggers Android app freezes, explaining Unicode line‑breaking and bidirectional algorithms, how runs end with double zeros, how to craft a reproducing string, and practical ways to avoid the resulting ANR.

Kuaishou Frontend Engineering

Background

The previous article located the infinite loop triggered by a mysterious string. This continuation explains why the string behaves that way and how the loop ends in an ANR.

Unicode Line‑Breaking Algorithm

The line‑breaking algorithm decides where a long line may be split when it contains no explicit newline. It classifies each character into a category and applies rules to adjacent categories; for example, a break is allowed after a space that follows English letters.
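As a rough illustration of break opportunities, the sketch below lists the offsets at which a line may wrap. It uses the JDK's java.text.BreakIterator, which is an assumption made for the sake of a runnable example: it is not the implementation Android's text layout uses, and its rules only approximate UAX #14. The class and method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineBreakDemo {
    /** Returns every boundary reported by the JDK line BreakIterator, including 0 and text.length(). */
    static List<Integer> breakOpportunities(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        // first() is the start of the text; next() walks forward until DONE.
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Each reported offset sits just after a space, i.e. after "hello " and "world ".
        System.out.println(breakOpportunities("hello world foo"));
    }
}
```

The point to notice is that the break falls after the space rather than before it, so a line produced by wrapping can end with a space character. That is why a space placed directly after LRI can become the last character of a line.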

Unicode Bidirectional (Bidi) Algorithm

Most scripts are left‑to‑right (LTR), but languages like Arabic and Hebrew are right‑to‑left (RTL). When LTR and RTL characters appear together, the Bidi algorithm assigns each character a type (strong, weak, neutral) and an embedding level, then determines visual order.
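To make the idea of embedding levels concrete, here is a minimal sketch that prints the directional runs of a mixed string. It relies on the JDK's java.text.Bidi class as a stand-in; Android's layout code uses its own Bidi implementation, so treat this only as an approximation of the same algorithm. The class and method names are hypothetical.

```java
import java.text.Bidi;

public class BidiRunsDemo {
    /** Describes each logical run as "[start,limit) level=N"; an odd level means RTL. */
    static String describeRuns(String text) {
        // The base direction is taken from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            sb.append('[').append(bidi.getRunStart(i)).append(',')
              .append(bidi.getRunLimit(i)).append(") level=")
              .append(bidi.getRunLevel(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin letters, three Hebrew letters (U+05D0..U+05D2), then Latin letters again.
        System.out.print(describeRuns("abc \u05D0\u05D1\u05D2 def"));
    }
}
```

In this input the Latin text and the neutral spaces around it resolve to level 0, while the Hebrew letters form a single level 1 run in the middle. A layout engine walks such runs when mapping between logical offsets and visual positions.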

Why Runs End with Double Zero

During layout, the method TextLine.getOffsetToLeftRightOf uses a runs array that stores visual direction information. When the cursor is at the line end, runIndex points past the last run, and a subsequent branch expects a non‑zero run value. Because the last run remains zero, the algorithm falls into an infinite loop, causing the ANR.

Constructing a String That Triggers ANR

The string must contain:

Arabic or Hebrew characters (RTL strong type)

The sequence LRI (U+2066, decimal 8294) followed by a space (U+0020)

Any number of LTR characters between these sequences

Repeating the LRI+space combination two or three times increases the chance of the line break occurring at the problematic position.

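// Two Arabic code points appear in the original snippet: 1766 (U+06E6) and 1727 (U+06BF).
// The second assignment overwrites the first, so only 1727 is actually used.
char arabicChar = 1766;
arabicChar = 1727;
// 8294 is LRI (U+2066) and 32 is the space character (U+0020).
char[] chars = new char[]{
    arabicChar, 'A','A','A','A','A','A','A','A','A',
    arabicChar, 'A','A','A','A','A','A','A','A','A','A',
    8294, 32, 'A','A','A',
    arabicChar, 'A','A','A','A','A','A','A','A','A','A',
    arabicChar, 'A','A','A','A','A','A','A','A','A','A',
    8294, 32, 'A','A','A',
    arabicChar, 'A','A','A','A','A','A','A','A','A','A',
    arabicChar, 'A','A','A','A','A','A','A','A','A','A',
    8294, 32, 'A','A','A',
};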

How to Avoid or Handle This ANR Scenario

Three practical approaches:

When internationalization is not required, disable RTL handling and strip directional formatting characters.

Detect a line ending with the LRI+space pattern (or a run array ending with double zero) and skip the getOffsetForHorizontal call for that line.

Use Paint.measureText to compute character widths and compare with the horizontal offset before invoking the problematic method.
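The first two approaches can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class BidiAnrGuards and its methods are hypothetical helpers, and the caller is assumed to pass line offsets obtained from Layout.getLineStart and Layout.getLineEnd. The set of stripped characters is the standard list of Unicode bidi control characters.

```java
public class BidiAnrGuards {
    static final char LRI = '\u2066';

    /** True for the Unicode bidi control (directional formatting) characters. */
    static boolean isBidiControl(char c) {
        return c == '\u061C'                      // ALM
            || c == '\u200E' || c == '\u200F'     // LRM, RLM
            || (c >= '\u202A' && c <= '\u202E')   // LRE, RLE, PDF, LRO, RLO
            || (c >= '\u2066' && c <= '\u2069');  // LRI, RLI, FSI, PDI
    }

    /** Approach 1: return a copy of the text with every bidi control character removed. */
    static String stripBidiControls(CharSequence text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isBidiControl(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Approach 2: report whether the line [lineStart, lineEnd) ends with LRI followed by a space.
     * A caller could skip the getOffsetForHorizontal call for any line where this returns true.
     */
    static boolean lineEndsWithLriSpace(CharSequence text, int lineStart, int lineEnd) {
        return lineEnd - lineStart >= 2
            && text.charAt(lineEnd - 2) == LRI
            && text.charAt(lineEnd - 1) == ' ';
    }
}
```

Stripping is the blunter option because it changes the displayed text, so it fits only when RTL content is genuinely out of scope. The line check leaves the text untouched but must run for each line before the offset lookup.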

Related Links

UAX #14: Unicode Line Breaking Algorithm

UAX #9: Unicode Bidirectional Algorithm

Android source code and test cases referenced in the analysis
