Analyzing the Mystery String Causing Android ANR: Unicode Line‑Breaking and Bidi Algorithm
This article explains why a specially crafted Unicode string triggers an Android ANR. It dissects the Unicode line‑breaking (UAX #14) and bidirectional (UAX #9) algorithms and shows how a runs array ending with two zeros leads to an infinite loop in TextLine.getOffsetToLeftRightOf.
In the previous article we identified a mysterious string that caused an Android app to freeze; this continuation explains why the string triggers an ANR by examining Unicode line‑breaking (UAX #14) and bidirectional (UAX #9) algorithms.
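It describes how the line‑breaking algorithm determines permissible break points. Each character is assigned a line‑break class (for example AL for letters, SP for spaces, HY for hyphens, and OP and CP for opening and closing parentheses), and rules over those classes decide where a break may occur. The key cases covered include English words separated by spaces, hyphens, and parentheses.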
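To see break opportunities in practice, the JDK's `java.text.BreakIterator.getLineInstance()` can list them for a sample string. This is only an illustration: the JDK's rules are close to UAX #14 but are not guaranteed to match the spec or Android's line breaker exactly, and the class name `LineBreaks` is a hypothetical helper.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineBreaks {
    // Returns every boundary the JDK line BreakIterator reports, including
    // 0 and text.length(). Interior offsets are places where a line may wrap.
    static List<Integer> breakOpportunities(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ENGLISH);
        it.setText(text);
        List<Integer> offsets = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sample = "well-known (example) text";
        // Mark each permitted break position with '|'.
        for (int pos : breakOpportunities(sample)) {
            System.out.println(pos + ": \"" + sample.substring(0, pos) + "|\"");
        }
    }
}
```

Running it on strings with spaces, hyphens, and parentheses shows which positions the iterator treats as legal wrap points.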
The Bidi algorithm section explains LTR and RTL directionality, how characters are classified into strong, weak, and neutral types, embedding levels, and runs. It then shows how Android’s AndroidBidi and StaticLayout compute runs and levels, which can produce a runs array that ends with two zeros.
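The classification and the notion of level runs can be explored on the JVM with `Character.getDirectionality` and `java.text.Bidi`. This is a sketch using the JDK's implementation, not Android's AndroidBidi, so line-level runs on a device may differ. It assumes Java 9 or later (for the isolate characters such as LRI), and the class name `BidiDemo` is hypothetical.

```java
import java.text.Bidi;

public class BidiDemo {
    // Groups the bidi class of a character into the UAX #9 categories.
    static String category(char c) {
        switch (Character.getDirectionality(c)) {
            case Character.DIRECTIONALITY_LEFT_TO_RIGHT:          // L
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT:          // R
            case Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC:   // AL
                return "strong";
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER:
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR:
            case Character.DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR:
            case Character.DIRECTIONALITY_ARABIC_NUMBER:
            case Character.DIRECTIONALITY_COMMON_NUMBER_SEPARATOR:
            case Character.DIRECTIONALITY_NONSPACING_MARK:
            case Character.DIRECTIONALITY_BOUNDARY_NEUTRAL:
                return "weak";
            case Character.DIRECTIONALITY_PARAGRAPH_SEPARATOR:
            case Character.DIRECTIONALITY_SEGMENT_SEPARATOR:
            case Character.DIRECTIONALITY_WHITESPACE:
            case Character.DIRECTIONALITY_OTHER_NEUTRALS:
                return "neutral";
            case Character.DIRECTIONALITY_UNDEFINED:
                return "undefined";
            default:
                // LRE, RLE, LRO, RLO, PDF and the isolates LRI, RLI, FSI, PDI.
                return "explicit formatting";
        }
    }

    // Returns one {start, limit, level} triple per level run, using the JDK
    // Bidi implementation with the paragraph direction forced to LTR.
    static int[][] runs(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        int[][] result = new int[bidi.getRunCount()][];
        for (int i = 0; i < result.length; i++) {
            result[i] = new int[]{bidi.getRunStart(i), bidi.getRunLimit(i), bidi.getRunLevel(i)};
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "\u06BFAAA \u2066 AAA";
        for (char c : sample.toCharArray()) {
            System.out.printf("U+%04X %s%n", (int) c, category(c));
        }
        for (int[] r : runs(sample)) {
            System.out.printf("[%d, %d) level %d%n", r[0], r[1], r[2]);
        }
    }
}
```

An Arabic letter in an LTR paragraph ends up at odd level 1 while the Latin letters stay at level 0, which is why mixing the two splits a line into several runs.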
Analysis of the Android source shows that when TextLine.getOffsetToLeftRightOf is called with the cursor at the end of the line, toLeft set to false, and a runs array in this degenerate state, the method enters a loop that never exits. The main thread is blocked, which causes the ANR.
Constructing a string that mixes Arabic and Latin letters and places LRI (U+2066) immediately before a space, with the pattern repeated three times, makes the runs array end with two zeros and reproduces the bug. The article provides the exact character array used for testing:
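```java
// Arabic (RTL) letter used in the test string. The snippet first assigned
// 1766 (U+06E6) and then overwrote it with 1727 (U+06BF); 1727 is the value in effect.
final char arabicChar = '\u06BF'; // decimal 1727
final char lri = '\u2066';        // LEFT-TO-RIGHT ISOLATE, decimal 8294

// The same block is repeated three times:
// Arabic + 10 x 'A', Arabic + 10 x 'A' + LRI, then a space (decimal 32) + 3 x 'A'.
char[] chars = new char[]{
        arabicChar, 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
        arabicChar, 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', lri,
        ' ', 'A', 'A', 'A',
        arabicChar, 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
        arabicChar, 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', lri,
        ' ', 'A', 'A', 'A',
        arabicChar, 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
        arabicChar, 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', lri,
        ' ', 'A', 'A', 'A',
};
```

Finally, three mitigation strategies are offered: remove RTL formatting characters when internationalization is unnecessary; detect the problematic LRI‑followed‑by‑space pattern before calling the offset methods; or use Paint.measureText to verify break positions.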
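The first two mitigations can be sketched as plain string checks. This is an illustrative helper, not the article's code: the class `BidiSanitizer` and its methods are hypothetical, and the sketch assumes that stripping all UAX #9 bidi formatting characters (not only RTL ones) is acceptable, since the trigger here is LRI. The third strategy relies on Android's Paint.measureText and is not sketched.

```java
public class BidiSanitizer {
    // Bidi formatting characters listed in UAX #9:
    // marks LRM (U+200E), RLM (U+200F), ALM (U+061C);
    // embeddings/overrides LRE..RLO (U+202A..U+202E);
    // isolates LRI, RLI, FSI, PDI (U+2066..U+2069).
    static boolean isBidiControl(char c) {
        return c == '\u200E' || c == '\u200F' || c == '\u061C'
                || (c >= '\u202A' && c <= '\u202E')
                || (c >= '\u2066' && c <= '\u2069');
    }

    // Strategy 1: drop bidi formatting characters when RTL support is not needed.
    static String stripBidiControls(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isBidiControl(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Strategy 2: detect LRI (U+2066) immediately followed by a space, so the
    // caller can avoid the offset methods for such text.
    static boolean hasLriSpace(String s) {
        return s.indexOf("\u2066 ") >= 0;
    }

    public static void main(String[] args) {
        String input = "\u06BFAAA\u2066 AAA";
        System.out.println(hasLriSpace(input));
        System.out.println(hasLriSpace(stripBidiControls(input)));
    }
}
```

Stripping is the blunter option because it changes how legitimately mixed-direction text renders; detection leaves the text intact and lets the caller choose a safe code path.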
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.