
Dynamic Hyperlink Insertion in Automotive Articles Using HanLP Aho‑Corasick Double‑Array Trie

This article describes a Java‑based solution that dynamically adds brand and model hyperlinks to automotive articles by building multilingual keyword dictionaries with HanLP and employing an Aho‑Corasick Double‑Array Trie for efficient, context‑aware matching without altering the original content.

HomeTech

The article explains the business need to automatically link car brand and model mentions in overseas automotive articles to dealer inventory pages, avoiding manual hyperlink insertion and ensuring links reflect real‑time stock availability.

To meet this requirement, a keyword dictionary is built from dealer inventory data (English and German) and loaded into the AhoCorasickDoubleArrayTrie class of the open‑source HanLP library. That class combines an Aho‑Corasick automaton, which provides fast multi‑pattern matching, with Double‑Array Trie (DAT) storage, which keeps the automaton compact.

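The Aho‑Corasick algorithm constructs a finite‑state automaton (a trie extended with failure links) that finds all dictionary entries in a single pass over the text. The running time is linear in the article length plus the number of matches reported, and overlapping patterns such as a brand name inside a "brand model" phrase are all found without rescanning.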
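To make the automaton concrete, here is a minimal map‑based sketch of Aho‑Corasick in plain Java. It is not HanLP's implementation (which stores the states in a double array); the class name AcSketch and the "begin-end:keyword" output format are assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Minimal map-based Aho-Corasick automaton (illustrative; not HanLP's code). */
class AcSketch {
    private final List<Map<Character, Integer>> next = new ArrayList<>(); // trie edges
    private final List<Integer> fail = new ArrayList<>();                 // failure links
    private final List<List<String>> out = new ArrayList<>();             // keywords ending at a state

    AcSketch(List<String> keywords) {
        newState(); // state 0 is the root
        for (String k : keywords) {
            int s = 0;
            for (char c : k.toCharArray()) {
                Integer t = next.get(s).get(c);
                if (t == null) {
                    t = newState();
                    next.get(s).put(c, t);
                }
                s = t;
            }
            out.get(s).add(k);
        }
        // Breadth-first pass: the failure link of a state points to the longest
        // proper suffix of its path that is also a path in the trie.
        Queue<Integer> queue = new ArrayDeque<>(next.get(0).values());
        while (!queue.isEmpty()) {
            int r = queue.remove();
            for (Map.Entry<Character, Integer> e : next.get(r).entrySet()) {
                int s = e.getValue();
                int f = fail.get(r);
                while (f != 0 && !next.get(f).containsKey(e.getKey())) {
                    f = fail.get(f);
                }
                Integer t = next.get(f).get(e.getKey());
                fail.set(s, t == null ? 0 : t);
                // Inherit keywords that end at the suffix state.
                out.get(s).addAll(out.get(fail.get(s)));
                queue.add(s);
            }
        }
    }

    private int newState() {
        next.add(new HashMap<>());
        fail.add(0);
        out.add(new ArrayList<>());
        return next.size() - 1;
    }

    /** Returns every match as "begin-end:keyword" (end exclusive) in one pass. */
    List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        int s = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (s != 0 && !next.get(s).containsKey(c)) {
                s = fail.get(s); // fall back instead of rescanning the text
            }
            s = next.get(s).getOrDefault(c, 0);
            for (String k : out.get(s)) {
                hits.add((i + 1 - k.length()) + "-" + (i + 1) + ":" + k);
            }
        }
        return hits;
    }
}
```

With the keywords "BMW", "BMW X5" and "X5", scanning "NEW BMW X5 IN STOCK" reports all three entries, including the two that overlap, which is why a longest‑match rule is needed later in the pipeline.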

Implementation details include:

Creating the DAT structure (base and check arrays), in which a transition from state r on character code c to state s satisfies base[r] + c = s and check[s] = r.

Storing the tail array for suffixes without common prefixes.
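The two transition equations can be tried out with a toy double‑array trie. The sketch below is an assumption‑laden simplification: it builds the arrays from a temporary node trie with a naive first‑fit search for each base value, uses raw char codes as c, and omits the tail array entirely. DatSketch and its method names are hypothetical and do not correspond to HanLP classes.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Toy double-array trie: base[r] + c = s and check[s] = r (no tail array). */
class DatSketch {
    private static final class Node {
        final Map<Character, Node> kids = new TreeMap<>();
        boolean terminal;
    }

    private int[] base = new int[64];
    private int[] check = new int[64];
    private boolean[] terminal = new boolean[64];

    DatSketch(String... keys) {
        Node root = new Node();
        for (String k : keys) {
            Node n = root;
            for (char c : k.toCharArray()) {
                n = n.kids.computeIfAbsent(c, x -> new Node());
            }
            n.terminal = true;
        }
        Arrays.fill(check, -1); // -1 marks a free slot
        check[0] = -2;          // slot 0 is the root: taken, but it has no parent
        place(root, 0);
    }

    /** Picks the smallest base for state r whose child slots are all free. */
    private void place(Node node, int r) {
        if (node.kids.isEmpty()) {
            return;
        }
        int b = 1;
        while (!fits(node, b)) {
            b++;
        }
        base[r] = b;
        for (char c : node.kids.keySet()) {
            check[b + c] = r; // reserve every child slot before recursing
        }
        for (Map.Entry<Character, Node> e : node.kids.entrySet()) {
            int s = b + e.getKey();
            terminal[s] = e.getValue().terminal;
            place(e.getValue(), s);
        }
    }

    private boolean fits(Node node, int b) {
        for (char c : node.kids.keySet()) {
            grow(b + c);
            if (check[b + c] != -1) {
                return false;
            }
        }
        return true;
    }

    private void grow(int index) {
        if (index < check.length) {
            return;
        }
        int old = check.length;
        int size = Math.max(index + 1, old * 2);
        base = Arrays.copyOf(base, size);
        check = Arrays.copyOf(check, size);
        terminal = Arrays.copyOf(terminal, size);
        Arrays.fill(check, old, size, -1);
    }

    /** Follows one edge; returns -1 when state r has no transition on c. */
    int transition(int r, char c) {
        int s = base[r] + c;
        return (s < check.length && check[s] == r) ? s : -1;
    }

    boolean contains(String key) {
        int r = 0;
        for (char c : key.toCharArray()) {
            r = transition(r, c);
            if (r < 0) {
                return false;
            }
        }
        return terminal[r];
    }
}
```

Each lookup step is one addition and one array comparison, which is where the structure gets its speed; a production DAT would additionally compress single‑branch suffixes into the tail array mentioned above.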

Sample code for building the dictionary:

// Keyword (upper-cased) -> URL path fragment; the TreeMap keeps the keys sorted.
TreeMap<String, String> treeThesaurus = new TreeMap<>();
resultMap.getResult().get("params").forEach(o -> {
    // Only the "makeList" parameter carries the brand and model options.
    if ("makeList".equals(o.get("paramname"))) {
        o.get("options").forEach(d -> {
            String sourceBrandValue = d.get("value").toString().trim();
            String brandValue = replaceStockWord(sourceBrandValue.replace(" ", "-"));
            // Brand entry: "BRAND" -> brand path.
            treeThesaurus.put(sourceBrandValue.toUpperCase(), brandValue);
            List<Map> options = (List<Map>) d.get("options");
            if (!CollectionUtils.isEmpty(options)) {
                options.forEach(v -> {
                    String svalue = v.get("value").toString().trim();
                    String value = brandValue + "/" + replaceStockWord(svalue.replace(" ", "-"));
                    // Model entry: "BRAND MODEL" -> "brand/model" path.
                    treeThesaurus.put((sourceBrandValue + " " + svalue).toUpperCase(), value.trim());
                });
            }
        });
    }
});
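The snippet above depends on the inventory API response (resultMap) and on replaceStockWord, neither of which is shown. As a self‑contained illustration of the resulting key and value shapes, the following sketch assumes a simple brand‑to‑models map as input and uses a hypothetical slug helper that only replaces spaces with hyphens; the real replaceStockWord may do more.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ThesaurusSketch {
    /** Hypothetical stand-in for replaceStockWord: only swaps spaces for hyphens. */
    static String slug(String s) {
        return s.trim().replace(" ", "-");
    }

    /** Maps upper-cased "BRAND" and "BRAND MODEL" keywords to URL path fragments. */
    static TreeMap<String, String> build(Map<String, List<String>> modelsByBrand) {
        TreeMap<String, String> thesaurus = new TreeMap<>();
        modelsByBrand.forEach((brand, models) -> {
            String brandPath = slug(brand);
            // Locale.ROOT keeps upper-casing independent of the server's default locale.
            thesaurus.put(brand.trim().toUpperCase(Locale.ROOT), brandPath);
            for (String model : models) {
                String key = (brand.trim() + " " + model.trim()).toUpperCase(Locale.ROOT);
                thesaurus.put(key, brandPath + "/" + slug(model));
            }
        });
        return thesaurus;
    }
}
```

For a brand "Land Rover" with a model "Range Rover", this produces the keys "LAND ROVER" and "LAND ROVER RANGE ROVER", mapped to "Land-Rover" and "Land-Rover/Range-Rover". One point to watch with German data: upper‑casing "ß" yields "SS", so a keyword can become longer than the text it was derived from.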

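During article processing, HTML tags are stripped using Jsoup, and the cleaned text is then scanned with the parseText method of HanLP's AhoCorasickDoubleArrayTrie. A custom IHit implementation is passed to parseText as a callback; it records the position of each match and keeps only the longest match when several entries overlap.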

public interface IHit<V>
{
    // Called once per match: [begin, end) in the scanned text, plus the dictionary value.
    void hit(int begin, int end, V value);
}
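One way to keep only the longest match is to collect every hit first and resolve overlaps afterwards. The sketch below is an assumption about how that could be done, not the article's actual code: it declares a local Hit interface with the same method shape so that it compiles without HanLP, and the leftmost‑longest rule is one reasonable policy among several.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LongestHitSketch {
    /** Local mirror of the callback shape, so the sketch compiles without HanLP. */
    interface Hit<V> {
        void hit(int begin, int end, V value);
    }

    /** One recorded match: [begin, end) in the scanned text plus the dictionary value. */
    static final class Span {
        final int begin;
        final int end;
        final String value;

        Span(int begin, int end, String value) {
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    /** Collects every hit, then resolves overlaps in favour of the leftmost, longest span. */
    static final class Collector implements Hit<String> {
        private final List<Span> spans = new ArrayList<>();

        @Override
        public void hit(int begin, int end, String value) {
            spans.add(new Span(begin, end, value));
        }

        List<Span> longestNonOverlapping() {
            List<Span> sorted = new ArrayList<>(spans);
            // Earlier start first; for equal starts, the longer span first.
            sorted.sort(Comparator.comparingInt((Span s) -> s.begin)
                    .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed()));
            List<Span> kept = new ArrayList<>();
            int lastEnd = 0;
            for (Span s : sorted) {
                if (s.begin >= lastEnd) { // skip anything overlapping a kept span
                    kept.add(s);
                    lastEnd = s.end;
                }
            }
            return kept;
        }
    }
}
```

If "BMW", "BMW X5" and "X5" all match the same stretch of text, only the "BMW X5" span survives, so the reader gets one link to the model page rather than nested or competing links.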

Matches are stored in a BuildThesaurus object (begin, end, sourceStr) and later used to replace the original article text with appropriate hyperlinks, taking care to adjust indices after each replacement.

public class BuildThesaurus {
    private Integer begin;
    private Integer end;
    private String sourceStr;
    // getters and setters
}
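The index adjustment mentioned above can be handled with a running offset. The following sketch works on plain text only and rests on several assumptions: Match is a simplified stand‑in for BuildThesaurus that carries the link path, matches arrive sorted by begin and do not overlap, and baseUrl is a placeholder rather than a real dealer URL. HTML escaping and mapping positions back into the original markup are left out.

```java
import java.util.List;

class LinkInsertSketch {
    /** Simplified stand-in for BuildThesaurus: a span [begin, end) and its link path. */
    static final class Match {
        final int begin;
        final int end;
        final String path;

        Match(int begin, int end, String path) {
            this.begin = begin;
            this.end = end;
            this.path = path;
        }
    }

    /**
     * Wraps each matched span in an anchor tag. Matches must be sorted by begin
     * and must not overlap; offset tracks how far earlier insertions have
     * shifted the remaining text.
     */
    static String insertLinks(String text, List<Match> matches, String baseUrl) {
        StringBuilder sb = new StringBuilder(text);
        int offset = 0;
        for (Match m : matches) {
            // Reuse the article's own wording so the visible text is unchanged.
            String original = text.substring(m.begin, m.end);
            String link = "<a href=\"" + baseUrl + m.path + "\">" + original + "</a>";
            sb.replace(m.begin + offset, m.end + offset, link);
            offset += link.length() - original.length();
        }
        return sb.toString();
    }
}
```

Because the anchor text is taken from the article rather than from the upper‑cased dictionary key, the author's original casing and spelling are preserved. Processing the matches from last to first would avoid the offset bookkeeping altogether, at the cost of building the result back to front.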

The final step assembles the hyperlink HTML and inserts it back into the article, ensuring that the original content remains unchanged for the author while providing dynamic, inventory‑aware links for readers.

In conclusion, the solution demonstrates a practical application of Aho‑Corasick Double‑Array Trie within HanLP to meet real‑world backend requirements for dynamic content linking in multilingual automotive portals.

Tags: Java, Dynamic Linking, Aho‑Corasick, Double‑Array Trie, HanLP
Written by HomeTech
