
Implementing Fuzzy Company Name Matching with MySQL RegExp in a Business Approval Workflow

This article describes a business approval scenario where a company name entered by a business user must be checked for duplicates, and explains how to implement fuzzy matching using MySQL RegExp, tokenization with IKAnalyzer, and Java service code to extract, preprocess, match, and rank results by relevance.

Java Architect Essentials

The goal is to build an approval process for company applications where a business user adds a company and an administrator reviews it, requiring a check for duplicate entries.

The core steps are extracting key information from the company name, tokenizing it, and performing fuzzy matching against existing records.

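Three MySQL fuzzy search options are considered: LIKE (matches one contiguous pattern with % and _ wildcards, so a single predicate cannot match any one of several keywords), full-text index (limited customizability), and REGEXP (supports arbitrary patterns, including alternation such as a|b). A REGEXP predicate generally cannot use an index, so every candidate row is scanned; because the dataset is small, REGEXP is chosen despite the lower performance.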
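To make the difference concrete, the sketch below uses java.util.regex as a stand-in for MySQL's REGEXP operator. The two engines differ in details, so this only illustrates the alternation idea; the class name, method names, and sample company names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration: java.util.regex stands in for MySQL REGEXP.
final class AlternationDemo {

    // Like SQL "name LIKE '%keyword%'": one contiguous substring must appear.
    static List<String> likeContains(List<String> names, String keyword) {
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (name.contains(keyword)) {
                hits.add(name);
            }
        }
        return hits;
    }

    // Like SQL "name REGEXP 'a|b'": any one of the alternatives may appear.
    static List<String> regexpAny(List<String> names, String alternation) {
        Pattern pattern = Pattern.compile(alternation);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).find()) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("腾讯科技", "腾讯音乐", "百度网络");
        System.out.println(likeContains(names, "腾讯科技"));
        System.out.println(regexpAny(names, "腾讯|科技"));
    }
}
```

With these sample names, the substring search finds only the name that contains the whole keyword, while the alternation also finds the name that shares just one token.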

Key code snippets:

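    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Removes uninformative parts of the company name before matching.
     *
     * @param targetCompanyName the company name to clean
     * @return the cleaned company name
     */
    private String formatCompanyName(String targetCompanyName) {
        // The group names were lost in the source ("(? "); the names
        // province, city, county and village used here are assumed.
        String regex = "(?<province>[^省]+自治区|.*?省|.*?行政区|.*?市)"
                + "?(?<city>[^市]+自治州|.*?地区|.*?行政单位|.+盟|市辖区|.*?市|.*?县)"
                + "?(?<county>[^(区|市|县|旗|岛)]+区|.*?市|.*?县|.*?旗|.*?岛)"
                + "?(?<village>.*)";
        Matcher matcher = Pattern.compile(regex).matcher(targetCompanyName);
        while (matcher.find()) {
            // remove province, city, county etc.
        }
        // additional address removal using AddressUtil.ADDRESS
        return targetCompanyName;
    }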
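The loop body is elided in the snippet above, so the following is only a minimal sketch of one way the region prefix could be stripped with the same pattern. It assumes named groups (the group names, the class name, and the method name are hypothetical) and simply keeps whatever follows the leading province, city, and county.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; group names are assumed, not taken from the source.
final class RegionPrefixStripper {

    private static final Pattern REGION = Pattern.compile(
            "(?<province>[^省]+自治区|.*?省|.*?行政区|.*?市)?"
            + "(?<city>[^市]+自治州|.*?地区|.*?行政单位|.+盟|市辖区|.*?市|.*?县)?"
            + "(?<county>[^(区|市|县|旗|岛)]+区|.*?市|.*?县|.*?旗|.*?岛)?"
            + "(?<rest>.*)");

    // Returns the part of the name after any leading province/city/county.
    static String stripRegionPrefix(String companyName) {
        Matcher matcher = REGION.matcher(companyName);
        if (matcher.matches()) {
            String rest = matcher.group("rest");
            // Keep the original name if stripping would leave nothing.
            return rest.isEmpty() ? companyName : rest;
        }
        return companyName;
    }

    public static void main(String[] args) {
        System.out.println(stripRegionPrefix("广东省深圳市南山区腾讯科技"));
        System.out.println(stripRegionPrefix("腾讯科技"));
    }
}
```

A name with no region prefix passes through unchanged, because every region group is optional and the final group captures the whole string.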

    public class AddressUtil {
        // Each row: a province followed by its cities.
        public static final String[][] ADDRESS = {
            {"北京"},
            {"天津"},
            {"安徽", "安庆", "蚌埠", ...},
            /* many provinces and cities */
        };
    }
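How the table is applied is not shown in the source. A minimal sketch, assuming plain substring removal and using a tiny stand-in table (the class name, method name, and the PLACES constant are hypothetical), might look like this:

```java
// Hypothetical sketch: PLACES is a tiny stand-in for AddressUtil.ADDRESS.
final class PlaceNameRemover {

    // Each row: a province followed by some of its cities (sample rows only).
    static final String[][] PLACES = {
            {"北京"},
            {"天津"},
            {"安徽", "安庆", "蚌埠"},
    };

    // Removes every listed place name that occurs in the company name.
    static String removePlaceNames(String companyName) {
        String result = companyName;
        for (String[] row : PLACES) {
            for (String place : row) {
                result = result.replace(place, "");
            }
        }
        // Keep the original name if removal would leave nothing.
        return result.isEmpty() ? companyName : result;
    }

    public static void main(String[] args) {
        System.out.println(removePlaceNames("安徽蚌埠玻璃制造"));
    }
}
```

This catches place names written without an administrative suffix such as 省 or 市, which the regular expression alone would miss.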

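    <dependency>
        <groupId>com.janeluo</groupId>
        <artifactId>ikanalyzer</artifactId>
        <version>2012_u6</version>
    </dependency>
    <!-- ... -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>7.3.0</version>
    </dependency>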

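    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;

    import lombok.extern.slf4j.Slf4j;
    import org.wltea.analyzer.core.IKSegmenter;
    import org.wltea.analyzer.core.Lexeme;
    // StringUtils: the source does not show which library it comes from.

    @Slf4j
    public class IKAnalyzerSupport {

        /** Splits the target string into tokens with IK Analyzer. */
        public static List<String> iKSegmenterToList(String target) throws Exception {
            if (StringUtils.isEmpty(target)) {
                return new ArrayList<>();
            }
            List<String> result = new ArrayList<>();
            StringReader sr = new StringReader(target);
            // true enables smart mode (coarser-grained segmentation).
            IKSegmenter ik = new IKSegmenter(sr, true);
            Lexeme lex;
            while ((lex = ik.next()) != null) {
                result.add(lex.getLexemeText());
            }
            return result;
        }
    }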

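    private String splitWord(String targetCompanyName) {
        log.info("Tokenizing the preprocessed company name");
        List<String> splitWord = new ArrayList<>();
        String result = targetCompanyName;
        try {
            splitWord = IKAnalyzerSupport.iKSegmenterToList(targetCompanyName);
            // De-duplicate the tokens and join them into a REGEXP alternation.
            result = splitWord.stream().distinct().collect(Collectors.joining("|"));
            log.info("Tokenization result: {}", result);
        } catch (Exception e) {
            // Fall back to the unsplit name if tokenization fails.
            log.error("Tokenization error: {}", e.getMessage());
        }
        return result;
    }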
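Joining tokens with "|" yields a valid pattern only if no token is blank and none contains a regex metacharacter; an empty pattern or empty alternative is at best meaningless and may be rejected by the server, depending on the MySQL version. The sketch below is one hedged way to harden that step. The class and method names are hypothetical, and the escaping rule is a generic one rather than something taken from the source.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for building the REGEXP alternation from tokens.
final class RegexpPatternBuilder {

    // Prefixes common regex metacharacters with a backslash.
    static String escape(String token) {
        return token.replaceAll("([\\\\.\\[\\]{}()*+?^$|])", "\\\\$1");
    }

    // Drops blank tokens, de-duplicates in order, escapes, joins with "|".
    // Returns "" when nothing usable is left; callers should skip the query.
    static String toAlternation(List<String> tokens) {
        Set<String> unique = new LinkedHashSet<>();
        for (String token : tokens) {
            if (token != null && !token.trim().isEmpty()) {
                unique.add(escape(token.trim()));
            }
        }
        return String.join("|", unique);
    }

    public static void main(String[] args) {
        System.out.println(toAlternation(Arrays.asList("腾讯", "科技", "腾讯", " ")));
    }
}
```

An empty return value signals that there is nothing to search for, so the caller can return an empty result instead of sending an empty pattern to the database.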

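    public JsonResult matchCompanyName(CompanyDTO companyDTO, String accessToken, String localIp) {
        String sourceCompanyName = companyDTO.getCompanyName();
        String targetCompanyName = sourceCompanyName;
        log.info("Company name before processing: {}", targetCompanyName);
        // Remove half-width and full-width parentheses.
        targetCompanyName = targetCompanyName.replaceAll("[()（）]", "");
        // Remove generic corporate words as whole units. The source used a
        // character class here, which deletes each listed character on its
        // own; "公司" is added so that suffix is still removed.
        targetCompanyName = targetCompanyName.replaceAll("集团|股份|有限|责任|分公司|公司", "");
        // Bank names keep their place names, so skip address stripping.
        if (!targetCompanyName.contains("银行")) {
            targetCompanyName = formatCompanyName(targetCompanyName);
        }
        String splitCompanyName = splitWord(targetCompanyName);
        List<Company> matchedCompany =
                companyRepository.queryMatchCompanyName(splitCompanyName, targetCompanyName);
        List<String> result = new ArrayList<>();
        for (Company c : matchedCompany) {
            // Skip the record under review so it does not match itself;
            // Objects.equals also tolerates a null id on a new company.
            if (java.util.Objects.equals(companyDTO.getCompanyId(), c.getCompanyId())) {
                continue;
            }
            result.add(c.getCompanyName());
        }
        return JsonResult.successResult(result);
    }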
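The word-removal step is sensitive to the difference between a character class and an alternation. The sketch below contrasts the two on a made-up company name; the class and method names are hypothetical, and adding "公司" to the alternation is an assumption about the intended behavior.

```java
// Hypothetical demo of character class versus alternation in replaceAll.
final class SuffixStripDemo {

    // Character class: deletes every listed character wherever it occurs,
    // including "(", ")" and "|" themselves.
    static String stripWithCharClass(String name) {
        return name.replaceAll("[(集团|股份|有限|责任|分公司)]", "");
    }

    // Alternation: deletes only the listed words as whole units.
    static String stripWithAlternation(String name) {
        return name.replaceAll("集团|股份|有限|责任|分公司|公司", "");
    }

    public static void main(String[] args) {
        String name = "分众传媒股份有限公司";
        System.out.println(stripWithCharClass(name));
        System.out.println(stripWithAlternation(name));
    }
}
```

For this sample name the character class also deletes the leading 分 from the brand itself, whereas the alternation leaves the brand intact.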

    @Query(value = "SELECT * FROM company WHERE isDeleted = '0' AND companyName REGEXP ?1 "
            + "ORDER BY LENGTH(REPLACE(companyName, ?2, '')) / LENGTH(companyName)",
            nativeQuery = true)
    List<Company> queryMatchCompanyName(String companyNameRegex, String companyName);

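The ORDER BY expression LENGTH(REPLACE(companyName, ?2, '')) / LENGTH(companyName) is the fraction of each name that remains after every occurrence of the preprocessed keyword (?2) is deleted. It does not count occurrences directly: a smaller ratio means the keyword accounts for more of the name, and because the sort is ascending, those companies rank first. Rows that match only through individual tokens keep a ratio of 1 and sort last. Note that LENGTH counts bytes, while CHAR_LENGTH counts characters.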
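The same ranking can be reproduced in memory, which makes the ratio easier to inspect. This is a minimal sketch with hypothetical class and method names and made-up sample names; it counts characters rather than bytes, so the values can differ slightly from MySQL's LENGTH for mixed-width text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory version of the ORDER BY expression.
final class RelevanceRanker {

    // Fraction of the name left after deleting every keyword occurrence.
    static double remainderRatio(String name, String keyword) {
        if (name.isEmpty() || keyword.isEmpty()) {
            return 1.0;
        }
        return (double) name.replace(keyword, "").length() / name.length();
    }

    // Sorts ascending by ratio, so names dominated by the keyword come first.
    static List<String> rank(List<String> names, String keyword) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingDouble((String n) -> remainderRatio(n, keyword)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("阿里巴巴", "深圳腾讯科技发展", "腾讯科技");
        System.out.println(rank(names, "腾讯科技"));
    }
}
```

With the keyword 腾讯科技, the exact name scores 0, the longer name containing it scores 0.5, and the unrelated name scores 1, so they sort in that order.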
