Fundamentals 12 min read

String Matching Algorithms: Brute‑Force, KMP, Boyer‑Moore, BMH, Sunday, Shift‑And/Or, and Bitmap

This article introduces several classic string‑matching algorithms—including Brute‑Force, KMP, Boyer‑Moore, BMH, Sunday, Shift‑And/Or, and bitmap techniques—explaining their principles, preprocessing steps, and practical considerations for improving text search performance.

Qunar Tech Salon

Apr 16, 2017

String Matching Algorithms: Brute‑Force, KMP, Boyer‑Moore, BMH, Sunday, Shift‑And/Or, and Bitmap

When validating forms we often need to check whether a string contains a certain substring, such as an email containing "indexOf". The indexOf method was introduced in ES3.1 and works for both strings and arrays. Interviewers frequently ask how the array version is implemented, but rarely the string version because it is considerably more complex.

The difficulty of implementing indexOf for strings stems from prefix‑suffix relationships, which can be modeled with prefix trees (tries) or suffix trees. Understanding these concepts is essential for writing an efficient implementation.

The simplest approach is the Brute‑Force algorithm (sometimes humorously called the "boyfriend" algorithm). It treats the longer string as the target and the shorter one as the pattern.

The idea is to compare the first character of the target with the first character of the pattern; if they match, continue character‑by‑character comparison, resetting the pattern index if a mismatch occurs.

Brute‑Force Algorithm

KMP Algorithm

The famous Knuth‑Morris‑Pratt (KMP) algorithm can match any pattern against a target in linear time by preprocessing the pattern to build a "next" (or "nextval") table.

For a detailed explanation of KMP’s inner workings, see the linked article.

http://blog.csdn.net/qq_29501587/article/details/52075200

Constructing the next and nextval tables is explained in the summary below.

KMP can be viewed as a degenerated deterministic finite automaton (DFA); it was the first algorithm to break the barrier of slow string matching.

Boyer‑Moore Algorithm

Boyer‑Moore is widely used in text editors for diff operations and is about three to four times faster than KMP. It also preprocesses the pattern to build a bad‑character table and a good‑suffix table.

Ruanyifeng’s article explains the bad‑character rule and the good‑suffix rule, though it does not provide full implementation details.

http://www.ruanyifeng.com/blog/2013/05/boyer-moore_string_search_algorithm.html

The bad‑character rule is relatively easy to implement; the good‑suffix rule can be O(n²) or O(n³) and is harder to understand.

For advanced single‑pattern matching algorithms, the self‑containment of substrings (prefixes/suffixes) is a crucial concept that accelerates matching; KMP popularized this idea, while BM leverages suffix self‑containment. The bad‑character table can speed up any single‑pattern matcher, not only BM; for large character sets, it is stored in a dictionary that maps characters to shift distances.

BMH Algorithm

BMH is a simplification of Boyer‑Moore that discards the complex good‑suffix rule and relies solely on the bad‑character heuristic, achieving better average performance.

Algorithm idea:

Search the text from right to left.

When a mismatch occurs, shift the pattern according to the bad‑character table and continue.

The shift amount is computed by the shift_table shown in the figure below.

Here k is the largest index such that Pattern[k] == Text[i+m-1].

If no character matches, shift the pattern by its full length m.

If the pattern matches completely, return the position in the text.

If the end of the text is reached without a full match, return -1 .

Sunday Algorithm

Sunday’s method is similar to Boyer‑Moore but, on a mismatch, looks at the character immediately following the current window in the text; if that character does not appear in the pattern, the pattern is shifted by its length plus one.

Shift‑And and Shift‑Or Algorithms

These algorithms are beyond the author’s current expertise; only reference links are provided.

http://www.iteye.com/topic/1130001

Bitmap Algorithm Idea

This technique appeared in a Tencent interview; its advantages and disadvantages are discussed in the linked article.

http://www.tuicool.com/articles/aYfEvy

Additional resources with implementations and complexity analysis are listed below.

http://blog.csdn.net/airfer/article/details/8951802/

For ordinary developers, these algorithms are rarely used on the front end, but they can significantly boost performance in rich‑text editors, log processing, and similar scenarios.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Algorithms fundamentals KMP string-matching Boyer-Moore

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.