Backend Development 13 min read

Understanding Catastrophic Backtracking in Java Regular Expressions and How to Fix It

During a production incident a Java service’s URL‑validation regex caused near‑100 % CPU due to catastrophic backtracking from a greedy domain pattern, which was fixed by adding missing characters to the final class and converting the domain part to a possessive or atomic quantifier, preventing exponential matching.

Tencent Cloud Developer
Tencent Cloud Developer
Tencent Cloud Developer
Understanding Catastrophic Backtracking in Java Regular Expressions and How to Fix It

During a production incident a Java service showed near‑100% CPU usage. A thread dump revealed that thousands of stack frames were repeatedly calling a method named validateUrl , which validates URLs using a regular expression.

A minimal reproducer was created to demonstrate the problem:

public static void main(String[] args) {
    String badRegex = "^([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]://)(([A-Za-z0-9-~]+).)+([A-Za-z0-9-~\\/])+$";
    String bugUrl = "http://www.fapiao.com/dddp-web/pdf/download?request=6e7JGxxxxx4ILd-kExxxxxxxqJ4-CHLmqVnenXC692m74H38sdfdsazxcUmfcOH2fAfY1Vw__%5EDadIfJgiEf";
    if (bugUrl.matches(badRegex)) {
        System.out.println("match!!");
    } else {
        System.out.println("no match!!");
    }
}

Running this program caused the Java process to consume about 91 % CPU, confirming that the regular expression is the culprit.

The regex in question is:

^([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]://)(([A-Za-z0-9-~]+).)+([A-Za-z0-9-~\/])+$

It consists of three parts: protocol validation, domain validation, and parameter validation. The domain part uses a greedy pattern (([A-Za-z0-9-~]+).)+ , which, together with the final part, leads to catastrophic backtracking when the input URL contains characters that are not matched by the last character class (e.g., underscores or percent signs).

Java’s regex engine is based on an NFA (Non‑deterministic Finite Automaton). When a pattern contains ambiguous or overly greedy constructs, the engine may backtrack extensively, causing exponential time consumption.

To illustrate NFA matching, simple examples are shown:

text="Today is a nice day."\nregex="day"

and

text="abbc"\nregex="ab{1,3}c"

These examples demonstrate how greedy quantifiers can lead to backtracking. A lazy version ( ab{1,3}?c ) reduces the amount of backtracking but does not eliminate it. A possessive quantifier ( ab{1,3}+ ) would prevent backtracking entirely.

Fixing the original URL validator involves two steps:

Allowing the characters that appear in real URLs (e.g., _ and % ) in the final character class.

Changing the greedy domain part to a possessive or atomic pattern to stop unnecessary backtracking.

One solution is to modify the regex as follows:

public static void main(String[] args) {
    String badRegex = "^([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]://)(([A-Za-z0-9-~]+).)+([A-Za-z0-9-~_%\\/])+$";
    String bugUrl = "http://www.fapiao.com/dddp-web/pdf/download?request=6e7JGxxxxx4ILd-kExxxxxxxqJ4-CHLmqVnenXC692m74H38sdfdsazxcUmfcOH2fAfY1Vw__%5EDadIfJgiEf";
    if (bugUrl.matches(badRegex)) {
        System.out.println("match!!");
    } else {
        System.out.println("no match!!");
    }
}

After adding _ and % to the last character class, the test prints match!! . However, to avoid future backtracking issues, a possessive quantifier can be applied to the domain part:

^([hH][tT]{2}[pP]://|[hH][tT]{2}[pP][sS]://)(([A-Za-z0-9-~]+).)++([A-Za-z0-9-~\/])+$

Using this pattern, the same test runs instantly with no CPU spike. An online regex debugger shows that the original pattern required over 110 000 steps before aborting, while the improved pattern completes in under 60 steps.

The article concludes with a recommendation to use tools such as “Online regex tester and debugger” to detect catastrophic backtracking early, and suggests further study of NFA, lazy, and possessive quantifiers for robust regex design.

JavaperformanceNFAbacktrackingRegular ExpressionsregexPossessive Quantifier
Tencent Cloud Developer
Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.