Practice of Image Feature Extraction and Its Applications in Retrieval and Quality Assessment
This article summarizes a team's practical experience with various image feature extraction methods—including global, local, and CNN features—and demonstrates their use in image retrieval and no‑reference quality assessment through extensive experiments and analysis.
Background: Visual perception is fundamental to humans, making image data crucial for many tasks such as image moderation, classification, face recognition, OCR, object detection, retrieval, and quality assessment. The authors' platform stores massive image collections, prompting a systematic study of feature extraction techniques.
Feature Extraction:
Global features: Simple descriptors such as average hue, saturation, brightness, sharpness, and contrast that capture overall image statistics but are sensitive to illumination and rotation.
Local features: Keypoint-based descriptors (e.g., SIFT, SURF, ORB) that are robust to scale, rotation, and illumination changes. The pipeline consists of keypoint detection (DoG, FAST, etc.) and descriptor computation, followed by encoding methods such as Bag-of-Visual-Words (BoVW), Fisher Vector (FV), and VLAD.
CNN features: Deep representations extracted from pretrained or fine-tuned networks (e.g., VGG16 fc2, Xception avg_pool). Deeper layers provide high-level semantic information, while earlier layers capture low-level patterns.
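To make the encoding step concrete, here is a minimal numpy sketch of VLAD encoding as described above: each local descriptor is assigned to its nearest codeword, residuals are accumulated per codeword, and the concatenated vector is power- and L2-normalized. The function name and shapes are illustrative, not taken from the authors' code.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """VLAD: accumulate residuals of descriptors around their nearest
    codeword, then power- and L2-normalize the flattened vector.

    descriptors: (N, D) local descriptors (e.g., SIFT, D=128)
    codebook:    (K, D) cluster centers learned offline (e.g., k-means)
    returns:     (K * D,) VLAD vector
    """
    K, D = codebook.shape
    # Hard-assign each descriptor to its nearest codeword.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)
    vlad = np.zeros((K, D))
    for k in range(K):
        mask = assign == k
        if np.any(mask):
            vlad[k] = (descriptors[mask] - codebook[k]).sum(axis=0)
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))  # power normalization
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

In practice the codebook would be trained with k-means on a large sample of descriptors; BoVW differs only in that it counts assignments per codeword instead of accumulating residuals.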
Applications:
Image Retrieval: Built instance- and category-based retrieval datasets (2,000 images each) and evaluated feature combinations. The retrieval pipeline includes feature extraction, codebook training (BoVW, VLAD), inverted-index construction (no product quantization), and L2 similarity scoring.
Image Quality Assessment: Conducted no-reference quality evaluation on public datasets (TID2008, CSIQ, LIVE) using Xception avg_pool features and a binary SVM classifier, reporting accuracy, recall, and AUC.
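The final scoring step of the retrieval pipeline can be sketched in a few lines. This is a brute-force L2 ranking over precomputed feature vectors (the inverted index mentioned above, which narrows the candidate set before scoring, is omitted for brevity); the function name and shapes are assumptions for illustration.

```python
import numpy as np

def retrieve(query_vec, db_vecs, top_k=5):
    """Rank database images by L2 distance to the query feature.

    query_vec: (D,) feature of the query image (e.g., a VLAD or CNN vector)
    db_vecs:   (N, D) features of the indexed images
    returns:   (indices, distances) of the top_k closest images
    """
    dists = np.linalg.norm(db_vecs - query_vec[None, :], axis=1)
    order = np.argsort(dists)[:top_k]
    return order, dists[order]
```

With L2-normalized features, ranking by L2 distance is equivalent to ranking by cosine similarity, which is why the normalization step in the encoding stage matters for retrieval quality.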
Experimental Results: For category retrieval, Xception avg_pool features outperformed SIFT. For instance retrieval, SIFT combined with VLAD encoding yielded competitive results. Quality assessment achieved acceptable accuracy on LIVE but lower recall for certain distortion types, likely due to limited training data.
Future Directions: Compare different pretrained CNN models and layers. Explore feature fusion (CNN shallow vs. deep, handcrafted vs. deep, local vs. global). Scale up proprietary image datasets and fine-tune models. Extend quality assessment with larger datasets and additional degradation types. Investigate the use of image features for CTR prediction, deduplication, and other downstream tasks.
Conclusion: The team's practical study shows that a variety of image features—global statistics, local descriptors, and deep CNN embeddings—have distinct strengths for different tasks. Selecting appropriate features based on task requirements leads to effective and efficient solutions.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.