
Yiche OCR System: Architecture, Data Expansion, Multi‑Branch Optimization, and Server Migration

The Yiche OCR system combines a DBNet‑based text detector with a CRNN recognizer, improves performance on natural‑scene text through data expansion, multi‑branch dictionaries, and distribution‑aware loss weighting, and accelerates training on CPU servers via IPEX and parallel processing.

Yiche Technology

The Yiche OCR system addresses the difficulty of recognizing natural‑scene texts, especially artistic and handwritten characters, by designing a fast, accurate, and easily deployable pipeline that integrates text detection, orientation classification, and text recognition.

Overall Architecture: The pipeline consists of three stages—text detection (DBNet), direction classification (open‑source model), and text recognition (CRNN). Training and inference are separated; detection and recognition models are trained on a combined dataset of Yiche data and ICDAR2019 LSVT, with a 9:1 split for training and validation.
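The three-stage flow above can be sketched as a minimal Python pipeline. The function names (`detect_text`, `classify_direction`, `recognize_text`) are hypothetical stand-ins for the DBNet, direction-classifier, and CRNN models; only the stage wiring reflects the article.

```python
# Minimal sketch of the three-stage inference pipeline. The model wrappers
# below are hypothetical stubs, not the actual Yiche implementation.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextLine:
    box: Tuple[int, int, int, int]  # detected region (x, y, w, h)
    text: str = ""


def detect_text(image) -> List[TextLine]:
    # DBNet would return text-region boxes here.
    return [TextLine(box=(0, 0, 100, 32))]


def classify_direction(image, line: TextLine) -> int:
    # Direction classifier: 0 = upright, 180 = rotated.
    return 0


def recognize_text(image, line: TextLine) -> str:
    # CRNN would decode the cropped region into a string.
    return "placeholder"


def ocr_pipeline(image) -> List[TextLine]:
    """Detection -> direction classification -> recognition."""
    lines = detect_text(image)
    for line in lines:
        if classify_direction(image, line) == 180:
            pass  # a real system would rotate the crop before recognition
        line.text = recognize_text(image, line)
    return lines
```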

Project Highlights:

Data Expansion: Added 30,000 ICDAR2019 LSVT images to improve performance on posters; accuracy rose from 0.7033 to 0.7231 and normalized edit distance (NED) from 0.8600 to 0.8715.

Multi‑Branch Structure: Introduced separate dictionary branches, each with a single fully‑connected layer, concatenating the final probability vectors to cover all characters and raise recall.
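The multi-branch head can be sketched in NumPy: each dictionary branch is one fully-connected layer, each branch's output is softmaxed separately, and the per-branch probability vectors are concatenated. The class name `MultiBranchHead` and the random-weight initialization are illustrative, not the article's implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


class MultiBranchHead:
    """One fully-connected layer per dictionary branch; the per-branch
    probability vectors are concatenated so the combined output covers
    the full character set (hypothetical sketch of the described design)."""

    def __init__(self, feat_dim, branch_vocab_sizes, seed=0):
        rng = np.random.default_rng(seed)
        self.branches = [rng.standard_normal((feat_dim, v)) * 0.01
                         for v in branch_vocab_sizes]

    def __call__(self, features):
        # features: (time_steps, feat_dim) from the CRNN backbone.
        probs = [softmax(features @ w) for w in self.branches]
        # Each branch sums to 1 over its own sub-dictionary; concatenation
        # yields one vector spanning all branches' characters.
        return np.concatenate(probs, axis=-1)
```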

Distribution‑Weighted Loss: Computed per‑character frequency weights (scaled into the 0–1 range) and multiplied them element‑wise with the output probability matrix to counteract the imbalanced training distribution, improving precision on low‑frequency characters.
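A sketch of the frequency-weight computation: count characters over the training labels and scale into (0, 1] so rare characters get the largest weight. The article does not publish the exact formula; the inverse-frequency scheme below is one plausible choice.

```python
from collections import Counter


def char_frequency_weights(labels, charset):
    """Assign each character a weight in (0, 1]: the rarest characters get
    weight 1.0, frequent ones proportionally less (inverse frequency).
    The exact formula used in the original system is not published;
    this is an illustrative assumption."""
    counts = Counter(ch for label in labels for ch in label)
    inv = [1.0 / max(counts[c], 1) for c in charset]  # unseen chars -> 1.0
    peak = max(inv)
    return [w / peak for w in inv]  # normalize so the maximum weight is 1.0

# During training, this per-character weight vector would be broadcast-
# multiplied with the recognizer's output matrix (time steps x vocabulary)
# so low-frequency characters contribute relatively more to the loss.
```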

Server Migration and Acceleration:

Detection Migration: Compiled PyTorch and Intel IPEX from source with AMP (bfloat16) enabled, running on 96 logical CPU cores with 24 OMP/MKL threads. Training time dropped from ~10 days (official binaries) to ~4.3 days (compiled build).
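The article does not publish the launch script, so the following is an illustrative environment setup under the reported thread counts; the training-script name and flags are assumptions. In the Python code itself, IPEX's bfloat16 path is typically enabled via `ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)` inside a CPU autocast context.

```shell
# Illustrative CPU-training launch matching the reported settings
# (96 logical cores, 24 OMP/MKL threads); script name and flags are
# hypothetical, not from the article.
export OMP_NUM_THREADS=24
export MKL_NUM_THREADS=24
export KMP_AFFINITY=granularity=fine,compact,1,0
python train_det.py --amp --dtype bfloat16
```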

Recognition Migration: Patched PaddlePaddle for multi‑process data parallelism over CPU cores (gloo backend) and used an asynchronous DataLoader with MKL, cutting training time from ~1 month (official binaries) to ~8 days (compiled build).
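A multi-process CPU launch along these lines might use PaddlePaddle's distributed launcher with the gloo backend; the process count, config path, and MKL-DNN flag below are illustrative assumptions, not the article's actual command. The asynchronous loading corresponds to `paddle.io.DataLoader` with a nonzero `num_workers`.

```shell
# Hypothetical multi-process CPU launch for the CRNN recognizer using
# the gloo backend; all paths and counts are illustrative.
export FLAGS_use_mkldnn=1
export OMP_NUM_THREADS=24
python -m paddle.distributed.launch --nproc_per_node=4 --backend=gloo \
    tools/train.py -c configs/rec/rec_crnn.yml
```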

Parallel Training Configuration: Explored various process counts, batch sizes, and worker numbers. Selected configurations that maximized batch size while minimizing iteration count, achieving stable and faster training for both detection and recognition.
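The trade-off driving that selection is simple arithmetic: the global batch is the per-process batch times the process count, and a larger global batch means fewer iterations per epoch (less per-step overhead). The dataset size and batch values below are illustrative, not the article's actual configuration.

```python
import math


def iterations_per_epoch(dataset_size, per_process_batch, num_processes):
    """Global batch = per-process batch x process count; fewer iterations
    per epoch generally means less per-step dispatch overhead."""
    global_batch = per_process_batch * num_processes
    return math.ceil(dataset_size / global_batch)


# Comparing two candidate configurations on the same (hypothetical) dataset:
small_batch = iterations_per_epoch(300_000, per_process_batch=16, num_processes=4)
large_batch = iterations_per_epoch(300_000, per_process_batch=64, num_processes=4)
assert large_batch < small_batch  # larger global batch -> fewer iterations
```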

Experimental Results:

| Data | Method | Poster acc / NED | Bill acc / NED |
| --- | --- | --- | --- |
| Yiche | DB+CRNN | 0.7033 / 0.8600 | 0.8501 / 0.9600 |
| Yiche+ICDAR | DB+CRNN+MD* | 0.7103 / 0.8605 | 0.8530 / 0.9692 |

*MD: multi‑branch dictionary.

The multi‑branch and distribution‑weighted optimizations further improved both accuracy and normalized edit distance for posters and bills alike, as shown in Table 3.

Conclusion:

Data analysis revealed imbalanced character distributions, prompting the use of multi‑branch dictionaries and frequency‑based weighting.

Problem‑specific solutions (multi‑branch networks, weighted loss) effectively addressed long‑tail and dictionary mismatch issues.

Engineering optimizations (IPEX, parallel training, appropriate thread settings) significantly accelerated CPU‑based training while maintaining model performance.

Tags: model optimization, deep learning, OCR, text detection, DBNet, CRNN, parallel training
Written by Yiche Technology

Official account of Yiche Technology, regularly sharing the team's technical practices and insights.