Poetry Generation from Images: Design, Implementation, and Evaluation of Ctrip’s “Xiao Shi Ji” System
This article presents Ctrip's "Xiao Shi Ji" system, which combines a large-scale tourism knowledge graph, image recognition, and deep-learning-based poetry generation to compose classical Chinese poems automatically from photos. It reports a blind evaluation against human poets and outlines the underlying AI techniques.
In early 2017, Ctrip launched "Xiao Shi Ji" (Little Poetry Machine), a system that interprets user-uploaded photos and, drawing on a large knowledge base, generates classical Chinese poems that match each image's scenery and mood.
In blind tests against human poets in Shanghai, the system reached human-level quality: neither professional nor public judges could reliably tell machine-generated poems from human ones, and the system's poems often ranked among the top entries.
The system also supports functions such as image‑based poem retrieval, acrostic poems, and tower poems, showcasing AI’s challenge to human creativity in tourism contexts.
1. Overall Process
The pipeline consists of three core modules: a tourism knowledge graph, image recognition, and a poetry‑generation engine (see Figure 4).
2. Knowledge Graph Construction
Data sources include Ctrip’s proprietary tourism data, user‑generated content (reviews, travel notes), and public resources such as Wikipedia and Baidu Baike. The data are categorized as unstructured (text), semi‑structured (large encyclopedic entries), and structured (tourism entities, hotel, itinerary, user intent).
Knowledge extraction relies on standard NLP techniques: word segmentation, POS tagging, dependency parsing, semantic role labeling, and NER (CRF++ combined with dictionaries). These are used to extract entities and relations, while topics are extracted with tf-idf, chi-square, TextRank, and LDA. Knowledge fusion then merges entities from multiple sources using semantic similarity, lexical similarity, and custom weighting. Finally, symbolic logical reasoning infers new relationships from the fused graph.
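The fusion step described above can be illustrated with a minimal sketch. The article does not publish its similarity functions, weights, or threshold, so everything here is an assumption made for illustration: the record layout, the `W_LEXICAL`, `W_ATTR`, and `THRESHOLD` values, and the use of attribute overlap as a crude stand-in for semantic similarity.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold; the article does not give its values.
W_LEXICAL, W_ATTR, THRESHOLD = 0.6, 0.4, 0.5

def lexical_sim(a: str, b: str) -> float:
    """Character-level similarity of two entity names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def attr_sim(a: set, b: set) -> float:
    """Jaccard overlap of attribute sets (assumed proxy for semantic similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_score(e1: dict, e2: dict) -> float:
    """Weighted combination of the two similarity signals."""
    return (W_LEXICAL * lexical_sim(e1["name"], e2["name"])
            + W_ATTR * attr_sim(e1["attrs"], e2["attrs"]))

def fuse(entities: list) -> list:
    """Single-pass fusion: merge each record into the first fused entity it matches."""
    merged = []
    for ent in entities:
        for target in merged:
            if match_score(ent, target) >= THRESHOLD:
                target["attrs"] |= ent["attrs"]
                target["sources"] |= ent["sources"]
                break
        else:
            # No match found: start a new fused entity (copy the sets).
            merged.append({"name": ent["name"],
                           "attrs": set(ent["attrs"]),
                           "sources": set(ent["sources"])})
    return merged

# Toy records invented for this example.
records = [
    {"name": "West Lake", "attrs": {"lake", "hangzhou", "scenic"}, "sources": {"ctrip"}},
    {"name": "West Lake (Hangzhou)", "attrs": {"lake", "hangzhou", "unesco"}, "sources": {"wiki"}},
    {"name": "The Bund", "attrs": {"shanghai", "waterfront"}, "sources": {"ctrip"}},
]
fused = fuse(records)
for entity in fused:
    print(entity["name"], sorted(entity["sources"]))
```

In this toy data the two West Lake records clear the threshold and merge, while The Bund stays separate. A production system would more likely replace the attribute overlap with embedding-based similarity and tune the weights on labeled pairs.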
3. Image Recognition
The image module uses state-of-the-art convolutional neural network (CNN) models. The article traces their evolution from LeNet-5 through AlexNet, VGGNet, GoogLeNet, and ResNet. Ctrip adopts an Inception-v3 model with transfer learning, fine-tuning both the high-level and low-level layers to cope with a relatively small, domain-specific dataset, and reports 92.5% mAP on its internal tourism image set.
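For readers unfamiliar with the mAP figure quoted above, the following sketch shows one common way to compute mean average precision for a multi-class image tagger. The article does not say which variant of the metric Ctrip used, so treat this definition as an assumption. The class names and scores are toy values invented for illustration.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k over the ranks k that hold a positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_class):
    """mAP: unweighted mean of the per-class AP values."""
    return sum(average_precision(s, l) for s, l in per_class.values()) / len(per_class)

# Toy predictions: class -> (model scores per image, ground-truth labels per image).
toy = {
    "lake": ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    "mountain": ([0.7, 0.6, 0.2, 0.1], [1, 1, 0, 0]),
}
map_score = mean_average_precision(toy)
print(f"mAP = {map_score:.4f}")
```

Here the "lake" class has positives at ranks 1 and 3, giving an AP of (1/1 + 2/3) / 2, and the "mountain" class ranks both positives first, giving an AP of 1. The mAP is the average of the two.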
4. Poetry Generation Engine
The engine combines traditional statistical and rule-based methods with deep learning. RNN language models alleviate data sparsity, encoder-decoder frameworks with attention capture theme and context, and hierarchical RNNs keep the poem globally coherent. The system scores image-theme relevance, plans topics, and then generates verses with a two-pass algorithm (a greedy pass followed by local optimization) or with genetic algorithms, so that the verses satisfy rhyme, fluency, and relevance constraints.
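The two-pass idea mentioned above (a greedy pass, then local optimization) can be sketched on a toy problem. This is not Ctrip's implementation. The candidate words, the relevance and fluency tables, the rhyme set, and the scoring formula are all invented for illustration, and English words stand in for Chinese characters. The genetic-algorithm alternative is not shown.

```python
# Toy inputs, all hypothetical: candidate words per slot and hand-made scores.
CANDIDATES = [["spring", "autumn"], ["river", "hill"], ["flows", "sleeps"], ["green", "cold"]]
RELEVANCE = {"spring": 0.9, "autumn": 0.4, "river": 0.8, "hill": 0.7,
             "flows": 0.6, "sleeps": 0.5, "green": 0.6, "cold": 0.7}
FLUENCY = {("spring", "river"): 0.9, ("river", "flows"): 0.9, ("hill", "sleeps"): 0.8,
           ("flows", "cold"): 0.9, ("flows", "green"): 0.6, ("sleeps", "cold"): 0.3}
RHYME_WORDS = {"green"}   # words that satisfy the required rhyme
DEFAULT_FLUENCY = 0.1     # score for word pairs not listed in FLUENCY

def line_score(words):
    """Whole-line score: relevance + bigram fluency + a bonus if the last word rhymes."""
    relevance = sum(RELEVANCE[w] for w in words)
    fluency = sum(FLUENCY.get(pair, DEFAULT_FLUENCY) for pair in zip(words, words[1:]))
    rhyme = 1.0 if words[-1] in RHYME_WORDS else 0.0
    return relevance + fluency + rhyme

def greedy_pass(candidates):
    """Pass 1: pick each word left to right using only relevance and the previous word."""
    line = []
    for options in candidates:
        prev = line[-1] if line else None
        def local(w):
            bigram = FLUENCY.get((prev, w), DEFAULT_FLUENCY) if prev else 0.0
            return RELEVANCE[w] + bigram
        line.append(max(options, key=local))
    return line

def local_pass(line, candidates):
    """Pass 2: accept single-word substitutions that raise the whole-line score."""
    line = list(line)
    improved = True
    while improved:
        improved = False
        for i, options in enumerate(candidates):
            for w in options:
                trial = line[:i] + [w] + line[i + 1:]
                if line_score(trial) > line_score(line) + 1e-9:
                    line, improved = trial, True
    return line

draft = greedy_pass(CANDIDATES)
final = local_pass(draft, CANDIDATES)
print("greedy:", " ".join(draft), round(line_score(draft), 2))
print("refined:", " ".join(final), round(line_score(final), 2))
```

The greedy pass cannot see the rhyme constraint, because that constraint depends on the finished line, so it settles for a locally fluent ending. The second pass evaluates whole-line scores and can swap in the rhyming word even at a small cost in fluency.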
5. Summary
"Xiao Shi Ji" demonstrates a successful integration of a large-scale tourism knowledge graph, computer-vision image understanding, and AI-driven poetry generation, achieving poetic quality comparable to that of human poets. Future work will refine entity tagging, expand visual and knowledge coverage, and further optimize the generation engine for richer, more diverse poetic expression.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.