OCR Techniques and Solutions for Ctrip Business: Deep Learning Based Text Detection and Recognition
This article presents an overview of computer‑vision based OCR in Ctrip's operations, detailing deep‑learning text detection methods for controlled and uncontrolled scenarios, sequence‑based recognition models, training strategies with synthetic data, and performance results, while discussing current challenges and future improvements.
Author Introduction
Yuan Qiulong is an intern with the Ctrip Big Data AI R&D team, focusing on computer‑vision research and applications; his work during the internship has centered on OCR.
Overview
Computer vision aims to enable machines to "see" by using cameras and algorithms to recognize, track, and analyze objects. At Ctrip, computer‑vision techniques support supplier qualification, product upload, and product display, covering OCR/scene‑text recognition, image quality assessment, intelligent cropping, and object detection.
OCR serves two main purposes in Ctrip: (1) verification, such as checking business licenses and filtering products with sensitive words; and (2) data entry assistance, like automatically extracting license information.
OCR Fundamentals
OCR consists of two stages: text detection and text recognition. Classical detection methods include the Stroke Width Transform (SWT) and Maximally Stable Extremal Regions (MSER); deep‑learning detectors typically combine fully convolutional networks with recurrent networks (FCN+RNN). Recognition approaches fall into character‑based methods, which localize and classify individual characters using hand‑crafted features (e.g., DPM) or CNN‑extracted features, and sequence‑based methods, which transcribe whole text lines via CTC or Seq2Seq decoding.
Technical Solution for Ctrip
The solution follows a two‑stage pipeline: first detect text regions in images, then recognize the detected text.
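The two-stage flow can be sketched as follows. This is a minimal illustration, not Ctrip's code: `detect_text_regions`, `crop`, and `recognize_text` are hypothetical stand-ins for the actual detectors (CTPN/TextSnake) and the CRNN-style recognizer, stubbed with fixed values so the control flow is clear.

```python
# Sketch of the two-stage OCR pipeline: detect text boxes, then
# recognize each cropped region. All three workers are stubs.

from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)

def detect_text_regions(image) -> List[Box]:
    """Stage 1: locate text regions; stubbed with two fixed boxes."""
    return [(10, 20, 200, 32), (10, 60, 180, 30)]

def crop(image, box: Box):
    """Cut one text region out of the full image (stubbed)."""
    return (image, box)

def recognize_text(region) -> str:
    """Stage 2: transcribe one cropped region (stubbed)."""
    _, box = region
    return f"<line at y={box[1]}>"

def ocr(image) -> List[str]:
    """Detect every text box, then recognize each crop in turn."""
    return [recognize_text(crop(image, box)) for box in detect_text_regions(image)]

print(ocr(None))  # two recognized lines, one per detected box
```

In the real system each stub would wrap a trained model; keeping the stages behind simple function boundaries is what lets detection and recognition be trained and swapped independently.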
3.1 Deep‑Learning Based Text Detection
Scenarios are split into controlled (e.g., business licenses) and uncontrolled (e.g., product posters). For controlled scenes, the CTPN model is used; for uncontrolled scenes, TextSnake is adopted. Training follows a coarse‑to‑fine strategy: pre‑training on synthetic data followed by fine‑tuning on a small set of real samples. The CTPN model achieves an F1 score of 89% on license detection, while TextSnake reaches an F1 of 81% on poster detection.
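The coarse-to-fine idea (pre-train on plentiful synthetic data, then fine-tune on few real samples with a smaller learning rate) can be shown on a deliberately tiny stand-in problem. The one-parameter model and the data below are illustrative assumptions, not the detection networks themselves; the point is only the two-phase training schedule.

```python
# Coarse-to-fine training sketch: a one-parameter model y ≈ w * x is
# pre-trained on abundant synthetic data, then fine-tuned on a handful
# of "real" samples whose distribution differs slightly.

import random

def sgd_fit(w: float, data, lr: float, epochs: int) -> float:
    """Minimize squared error of y ≈ w * x by plain SGD."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

random.seed(0)
# Coarse phase data: large synthetic set drawn near the target (true w ≈ 2.0).
synthetic = [(x, 2.0 * x + random.gauss(0, 0.1))
             for x in (random.uniform(-1, 1) for _ in range(500))]
# Fine phase data: a few noiseless "real" samples with true w = 2.3.
real = [(x, 2.3 * x) for x in (-0.5, 0.2, 0.8)]

w = sgd_fit(0.0, synthetic, lr=0.1, epochs=3)   # coarse: pre-train on synthetic
w = sgd_fit(w, real, lr=0.02, epochs=50)        # fine: adapt at a lower learning rate
print(round(w, 2))                              # close to the real-data optimum
```

The fine-tuning phase inherits a good initialization from the synthetic phase, so a small real set is enough to shift the model toward the real distribution without starting from scratch.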
3.2 Sequence‑Based Text Recognition
Two architectures are used: CNN+LSTM+CTC and CNN+LSTM+Seq2Seq (with attention). Both employ a CNN for visual feature extraction and a bidirectional LSTM for contextual modeling; they differ in the transcription layer, which aligns features to characters either via CTC or via an attention‑based Seq2Seq decoder. A combined CTC‑attention model improves convergence speed while maintaining high accuracy. Training again follows the synthetic‑then‑real fine‑tuning approach.
The integrated OCR system (CTPN + recognition model) achieves up to 85% accuracy on full‑field extraction of critical information such as the Unified Social Credit Code, even under challenging conditions like stamps and reflections.
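A cheap way to harden extraction of a field like the Unified Social Credit Code is a format check on the recognizer's output. The sketch below assumes the standard 18-character format (digits plus uppercase letters excluding I, O, S, V, Z, per GB 32100-2015) and validates format only; the standard's mod-31 check digit is omitted for brevity.

```python
# Post-processing sanity filter for a recognized Unified Social Credit
# Code: 18 characters from digits and uppercase letters minus I,O,S,V,Z.

import re

USCC_PATTERN = re.compile(r"^[0-9A-HJ-NP-RTUWXY]{18}$")

def looks_like_uscc(text: str) -> bool:
    """True if the OCR output matches the code's basic format."""
    return bool(USCC_PATTERN.fullmatch(text))

print(looks_like_uscc("91310000MA1FL0000X"))  # a format-valid example string
```

Rejections from a filter like this can be routed back for re-recognition or manual review, which is one practical way to keep full-field accuracy high under stamps and reflections.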
Conclusion
Deep‑learning OCR models rely heavily on large, diverse datasets; synthetic data is crucial for both detection and recognition stages. Ongoing work focuses on generating more realistic synthetic samples and addressing remaining shortcomings in natural‑scene OCR services.
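On the label side, synthetic training data for the recognizer can start from nothing more than randomly generated code strings, which a renderer (drawing text onto backgrounds with varied fonts, blur, and stamp overlays) would turn into image/label pairs. The snippet below is a toy illustration of that first step only; the character set mirrors the credit-code alphabet and the renderer is out of scope.

```python
# Generate random license-style code strings as labels for synthetic
# recognition data; rendering them into images is a separate step.

import random

CHARSET = "0123456789ABCDEFGHJKLMNPQRTUWXY"  # digits + letters minus I,O,S,V,Z

def synth_code(rng: random.Random, length: int = 18) -> str:
    """One random label string of the given length."""
    return "".join(rng.choice(CHARSET) for _ in range(length))

rng = random.Random(42)           # fixed seed for a reproducible corpus
corpus = [synth_code(rng) for _ in range(1000)]
print(corpus[0])
```

Because labels are generated rather than annotated, the corpus size is limited only by rendering time, which is what makes the synthetic-then-real strategy viable for both stages.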
References
[1] Epshtein et al., "Detecting text in natural scenes with stroke width transform," CVPR 2010.
[2] Neumann & Matas, "Real‑time scene text localization and recognition," CVPR 2012.
[3] Tian et al., 2016.
[4] Shi et al., 2013.
[5] Jaderberg et al., 2016.
[6] He et al., 2016.
[7] Shi, Bai, & Yao, 2016.
[8] Lee & Osindero, 2016.
[9] Long et al., 2018.
[10] Kim et al., 2016.
Ctrip Technology
The official Ctrip Technology account: sharing, exchange, and growth.