
Intelligent Image Aesthetic Scoring for OTA Ticket Listings Using Deep Learning

This article describes how deep learning techniques, including convolutional neural networks and CAM visualizations, are applied to automatically evaluate and select high‑quality images for online travel agency ticket listings, improving user experience while reducing manual effort.

Ctrip Technology

Author Biography

Lu Chan, an algorithm engineer on a vacation-AI R&D team, focuses on computer vision and machine learning and currently works on intelligent image processing for travel applications.

1. Overview

As a leading OTA, Ctrip serves millions of customers daily, and scenic-spot images are crucial for helping users understand attractions. Traditionally, staff manually select representative images for each product, a process that is subjective, labor-intensive, and hard to keep up to date. This article proposes an automated deep-learning-based solution to evaluate image aesthetics and select optimal cover images, thereby enhancing user satisfaction and reducing manual workload.

Recent breakthroughs in deep learning for image tasks enable neural networks to capture both semantic and aesthetic cues, making them suitable for automatic image quality assessment.

2. Technical Introduction

The goal is to build an intelligent aesthetic scoring system that automatically filters high-quality images for display. The system also employs Class Activation Mapping (CAM) to visualize network decisions, providing interpretability.

2.1 Image Aesthetic Scoring

Aesthetic assessment aims to predict quality scores aligned with human perception, differing from traditional Image Quality Assessment, which focuses on pixel-level degradation. Early methods relied on handcrafted low-level (color, texture) and high-level (depth, contrast) features followed by regression or classification models. Deep convolutional neural networks (CNNs) now dominate due to their automatic feature learning capability.

2.1.1 Research Status

Key works include RAPID, an AlexNet-based model with multi-patch inputs that extracts global and local aesthetic features; A-Lamp, which uses an adaptive multi-patch strategy; NIMA, which predicts a rating-distribution histogram; and ranking-based networks that learn from pairwise image comparisons. These approaches demonstrate the effectiveness of deep models for aesthetic prediction.
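To make the NIMA idea concrete, the sketch below (a minimal NumPy illustration, not the paper's implementation) shows how a predicted 1-10 rating histogram collapses into a scalar score, and how the Earth Mover's Distance loss used by NIMA compares two rating distributions:

```python
import numpy as np

def mean_score(dist):
    """Collapse a predicted 1-10 rating distribution into a scalar score."""
    dist = np.asarray(dist, dtype=float)
    dist = dist / dist.sum()                  # normalize to a probability distribution
    ratings = np.arange(1, len(dist) + 1)     # rating bins 1..10
    return float((ratings * dist).sum())

def emd_loss(p, q):
    """Earth Mover's Distance (r=2 variant) between two rating distributions,
    computed as the RMS difference of their cumulative distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    return float(np.sqrt(np.mean(cdf_diff ** 2)))
```

A uniform 10-bin histogram, for instance, yields a mean score of 5.5, and the EMD loss of a distribution against itself is zero.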

2.1.2 Application

In practice, a ResNet pretrained on ImageNet serves as the backbone. Inputs combine the whole image (padded and resized) with multiple cropped patches to capture both global and local information. Data augmentation is limited to horizontal flips to avoid distorting composition. Mid-range scores are ambiguous: a binary threshold can mislabel such samples, so softmax classification and histogram learning over 1-10 ratings are explored instead. Semi-supervised learning and hard-example mining are employed to mitigate the scarcity of labeled aesthetic data.
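The input preparation described above can be sketched as follows. This is an illustrative NumPy version (function names are hypothetical): the whole image is zero-padded to a square before resizing so composition is preserved, and random patches are augmented only with horizontal flips.

```python
import numpy as np

def pad_to_square(img):
    """Zero-pad an HxWxC image to a square so a later resize keeps the aspect ratio."""
    h, w, c = img.shape
    side = max(h, w)
    out = np.zeros((side, side, c), dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    out[top:top + h, left:left + w] = img
    return out

def random_patches(img, size, n, rng=None):
    """Sample n random size x size crops. Augmentation is a horizontal flip
    only, since vertical flips or color jitter would distort composition."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, _ = img.shape
    patches = []
    for _ in range(n):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        patch = img[y:y + size, x:x + size]
        if rng.random() < 0.5:
            patch = patch[:, ::-1]            # horizontal flip
        patches.append(patch)
    return patches
```

The padded global view and the local patches would then be fed to the shared ResNet backbone as parallel inputs.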

2.2 Visualization

To explain why the network judges an image as attractive or not, CAM and Grad-CAM are applied. These methods compute weighted sums of the final convolutional feature maps using class-specific weights or gradients, producing heatmaps that highlight the regions influencing the decision.

2.3 Image Content Matching

Beyond aesthetic quality, business rules require that selected images match the scenic theme (e.g., excluding food photos or schedule screenshots). A multi-label classification model tags each image with content categories; only images whose tags are all permissible are considered, and the highest-scoring aesthetic image among them becomes the final cover.
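The final selection step combines the two models. A minimal sketch of that logic, assuming a hypothetical record schema of `{"id", "tags", "score"}` where `tags` comes from the multi-label content model and `score` from the aesthetic model:

```python
def pick_cover(images, allowed_tags):
    """Select the cover image: keep only images whose content tags are all
    permissible, then take the highest aesthetic score among them.

    images       : list of dicts like {"id": ..., "tags": [...], "score": float}
    allowed_tags : iterable of permissible content categories
    """
    allowed = set(allowed_tags)
    candidates = [im for im in images if set(im["tags"]) <= allowed]
    if not candidates:
        return None                    # no valid image; fall back to manual selection
    return max(candidates, key=lambda im: im["score"])
```

An image tagged "food" is filtered out even if it has the highest aesthetic score, matching the business rule above.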

3. Summary

Deep learning, particularly DCNNs, effectively extracts both the global and local features needed for aesthetic assessment. Combining these cues yields a comprehensive representation that can evaluate image quality even with limited labeled data, enabling automated, high-quality image selection for OTA platforms.

References

[1] Lu, Xin, et al. "RAPID: Rating pictorial aesthetics using deep learning." Proceedings of the 22nd ACM International Conference on Multimedia, 2014.
[2] Ma, Shuang, Jing Liu, and Chang Wen Chen. "A-Lamp: Adaptive layout-aware multi-patch deep convolutional neural network for photo aesthetic assessment." IEEE CVPR, 2017.
[3] Talebi, Hossein, and Peyman Milanfar. "NIMA: Neural image assessment." IEEE Transactions on Image Processing, 2018.
[4] Kong, Shu, et al. "Photo aesthetics ranking network with attributes and content adaptation." European Conference on Computer Vision, 2016.
[5] Zhou, Bolei, et al. "Learning deep features for discriminative localization." CVPR, 2016.
[6] Selvaraju, Ramprasaath R., et al. "Grad-CAM: Visual explanations from deep networks via gradient-based localization." ICCV, 2017.
[7] Malu, Gautam, et al. "Learning photography aesthetics with deep CNNs." arXiv preprint arXiv:1707.03981, 2017.
