Dynamic Hyperlink Insertion in Automotive Articles Using HanLP Aho‑Corasick Double‑Array Trie
This article describes a Java‑based solution that dynamically adds brand and model hyperlinks to automotive articles by building multilingual keyword dictionaries with HanLP and employing an Aho‑Corasick Double‑Array Trie for efficient, context‑aware matching without altering the original content.
The business need is to link car brand and model mentions in overseas automotive articles to dealer inventory pages automatically, so that editors do not have to insert hyperlinks by hand and the links reflect real‑time stock availability.
To meet this requirement, a keyword dictionary is built from dealer inventory data (English and German) and stored in a Double‑Array Trie (DAT) using the open‑source HanLP library, which implements the Aho‑Corasick algorithm for fast multi‑pattern matching.
The Aho‑Corasick algorithm constructs a finite‑state automaton (a trie augmented with failure links) that finds every dictionary entry in a single pass over the article. Its running time is linear in the article length plus the number of matches reported, and it handles overlapping patterns without rescanning the text.
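The automaton described above can be sketched with plain maps. This is a minimal illustration under stated assumptions, not HanLP's implementation: the class name SimpleAhoCorasick and the "begin:end:keyword" result format are made up for this example, and HanLP stores the same automaton in double arrays rather than in hash maps.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Map-based Aho-Corasick automaton (illustrative; hypothetical class, not part of HanLP).
class SimpleAhoCorasick {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                       // longest proper suffix that is also a trie prefix
        final List<String> outputs = new ArrayList<>();  // keywords ending at this state
    }

    private final Node root = new Node();

    SimpleAhoCorasick(List<String> keywords) {
        // Phase 1: insert every keyword into the trie.
        for (String word : keywords) {
            Node node = root;
            for (char ch : word.toCharArray()) {
                node = node.next.computeIfAbsent(ch, k -> new Node());
            }
            node.outputs.add(word);
        }
        // Phase 2: breadth-first pass that sets failure links level by level.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Map.Entry<Character, Node> e : current.next.entrySet()) {
                Node child = e.getValue();
                Node f = current.fail;
                while (f != null && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                child.fail = (f == null) ? root : f.next.get(e.getKey());
                // Inherit keywords ending at the failure target (this is what reports overlaps).
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    /** Returns every match as "begin:end:keyword", with end exclusive. */
    List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            // Follow failure links until some state accepts the character.
            while (node != root && !node.next.containsKey(ch)) {
                node = node.fail;
            }
            Node nextNode = node.next.get(ch);
            if (nextNode != null) {
                node = nextNode;
            }
            for (String word : node.outputs) {
                hits.add((i + 1 - word.length()) + ":" + (i + 1) + ":" + word);
            }
        }
        return hits;
    }
}
```

With the keywords "BMW", "BMW X5" and "X5", scanning "NEW BMW X5 TEST" reports all three entries in one pass, including the two that overlap the longer model name.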
Implementation details include:
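Creating the DAT structure (base and check arrays) and the transition equations base[r] + c = s and check[s] = r, where r is the current state, c is the code of the input character, and s is the next state.
Storing in the tail array the key suffixes that share no prefix with other keys.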
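To make the two transition equations concrete, here is a small sketch with hand‑built arrays for the keys "AB" and "AC". Everything in it is an assumption made for illustration: the class name TinyDoubleArrayTrie, the character codes (A=1, B=2, C=3), the use of state 0 as the root, and the TERMINAL flag array. It omits the tail array, and a real builder such as HanLP's computes the arrays automatically.

```java
// Hand-built double-array trie for the keys "AB" and "AC" (illustrative only).
class TinyDoubleArrayTrie {
    // Assumed character codes: A=1, B=2, C=3; anything else maps to 0 (no transition).
    private static int code(char ch) {
        return (ch >= 'A' && ch <= 'C') ? ch - 'A' + 1 : 0;
    }

    // State 0 is the root. CHECK[s] holds the parent state of s; -1 marks an unused slot.
    private static final int[] BASE  = { 1,  0, 1, 0, 0};
    private static final int[] CHECK = {-1, -1, 0, 2, 2};
    // States 3 ("AB") and 4 ("AC") end a key.
    private static final boolean[] TERMINAL = {false, false, false, true, true};

    /** One step: s = BASE[r] + c is a valid move only if CHECK[s] == r. Returns -1 otherwise. */
    static int transition(int r, int c) {
        if (c == 0) {
            return -1;
        }
        int s = BASE[r] + c;
        return (s < CHECK.length && CHECK[s] == r) ? s : -1;
    }

    /** Walks the key character by character and checks that it ends on a terminal state. */
    static boolean contains(String key) {
        int state = 0;
        for (int i = 0; i < key.length(); i++) {
            state = transition(state, code(key.charAt(i)));
            if (state < 0) {
                return false;
            }
        }
        return TERMINAL[state];
    }
}
```

The check array is what lets several states share one flat array: from the root, 'B' computes slot 3, but CHECK[3] is 2 rather than 0, so the move is rejected even though the slot is occupied.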
Sample code for building the dictionary:
// Keys are upper-cased brand / model names; values are the link path fragments.
// resultMap and replaceStockWord come from the surrounding project (not shown here).
TreeMap<String, String> treeThesaurus = new TreeMap<String, String>();
resultMap.getResult().get("params").forEach(o -> {
    if ("makeList".equals(o.get("paramname"))) {
        o.get("options").forEach(d -> {
            // Brand entry, e.g. key "<BRAND>" mapped to the brand path.
            String sourceBrandValue = d.get("value").toString().trim();
            String brandValue = replaceStockWord(sourceBrandValue.replace(" ", "-"));
            treeThesaurus.put(sourceBrandValue.toUpperCase(), brandValue);
            List<Map> options = (List<Map>) d.get("options");
            if (!CollectionUtils.isEmpty(options)) {
                options.forEach(v -> {
                    // Model entry: key "<BRAND> <MODEL>" mapped to "brand/model".
                    String svalue = v.get("value").toString().trim();
                    String value = brandValue + "/" + replaceStockWord(svalue.replace(" ", "-"));
                    treeThesaurus.put((sourceBrandValue + " " + svalue).toUpperCase(), value.trim());
                });
            }
        });
    }
});
During article processing, HTML tags are stripped with Jsoup, and the cleaned text is then passed to HanLP's parseText method together with a custom IHit implementation that records match positions and keeps the longest matches.
public interface IHit<V>
{
    void hit(int begin, int end, V value);
}
Matches are stored in BuildThesaurus objects (begin, end, sourceStr) and later used to replace the matched spans of the article text with the appropriate hyperlinks. Because every inserted link changes the length of the text, the indices of the remaining matches must be adjusted after each replacement.
public class BuildThesaurus {
    private Integer begin;
    private Integer end;
    private String sourceStr;
    // getters and setters
}
The final step assembles the hyperlink HTML and inserts it back into the article. The author's original content remains unchanged, while readers see dynamic, inventory‑aware links.
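The longest‑match filtering and the index adjustment can be sketched as follows. This is a hedged illustration, not the author's code: LinkInserter, Hit, keepLongest, insertLinks, and the "/stock/" base URL are hypothetical names, hits are assumed to use an exclusive end index, and the link label is not HTML‑escaped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing overlap resolution and offset tracking.
class LinkInserter {
    /** One dictionary hit: the span [begin, end) in the plain text plus the link path. */
    static final class Hit {
        final int begin;
        final int end;
        final String path;

        Hit(int begin, int end, String path) {
            this.begin = begin;
            this.end = end;
            this.path = path;
        }
    }

    /** Where hits overlap, keeps the leftmost one and, among equal starts, the longest. */
    static List<Hit> keepLongest(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Sort by begin ascending, then by end descending so the longest hit comes first.
        sorted.sort((a, b) -> a.begin != b.begin
                ? Integer.compare(a.begin, b.begin)
                : Integer.compare(b.end, a.end));
        List<Hit> kept = new ArrayList<>();
        int lastEnd = 0;
        for (Hit h : sorted) {
            if (h.begin >= lastEnd) {
                kept.add(h);
                lastEnd = h.end;
            }
        }
        return kept;
    }

    /** Replaces each kept span with an anchor tag, shifting later indices as the text grows. */
    static String insertLinks(String text, List<Hit> hits, String baseUrl) {
        StringBuilder out = new StringBuilder(text);
        int shift = 0; // total growth caused by the replacements made so far
        for (Hit h : keepLongest(hits)) {
            String label = text.substring(h.begin, h.end); // keeps the article's own casing
            String anchor = "<a href=\"" + baseUrl + h.path + "\">" + label + "</a>";
            out.replace(h.begin + shift, h.end + shift, anchor);
            shift += anchor.length() - (h.end - h.begin);
        }
        return out.toString();
    }
}
```

For "The BMW X5 and Audi A4." with hits for "BMW", "BMW X5" and "Audi", only the model hit and the "Audi" hit survive filtering, and the second replacement lands correctly because its indices are shifted by the growth of the first. One caveat if the text is upper‑cased before matching, as the dictionary keys suggest: Java's toUpperCase can change the string length (German "ß" becomes "SS"), which would misalign the reported indices with the original text.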
In conclusion, the solution demonstrates a practical application of Aho‑Corasick Double‑Array Trie within HanLP to meet real‑world backend requirements for dynamic content linking in multilingual automotive portals.