
Python Script for Scraping Zhihu “Beauty” Topic Images with Baidu AI Face Detection

This tutorial explains how to use Python 3 with Requests, lxml, and Baidu's AipFace SDK to crawl images from Zhihu's "美女" topic, filter them by face detection, gender, authenticity, and beauty score, and store the qualified pictures locally.

Python Programming Learning Circle

The data source consists of all images appearing in answers to questions under Zhihu's "美女" (beauty) topic.

Tools used include Python 3 and the third‑party libraries Requests, lxml, and Baidu's AipFace SDK, with the script comprising roughly 100 lines of code.

Required environment: a Mac, Linux, or Windows machine. Linux is untested, and on Windows you may need to strip characters that are not allowed in filenames. No Zhihu login is needed, but a Baidu Cloud account is required to access the face‑detection service.
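The filename filtering mentioned for Windows is not shown in the source, so the following is only a minimal sketch of one way to do it. The function name sanitize_filename and the choice of "_" as the replacement character are assumptions made for illustration; the character set is the one Windows rejects in file names.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name, replacement="_"):
    """Return `name` with characters that are illegal on Windows replaced.

    Hypothetical helper, not taken from the original script.
    """
    cleaned = _ILLEGAL_CHARS.sub(replacement, name)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.rstrip(" .")
    # Never return an empty name.
    return cleaned or replacement
```

Question titles often contain "?" or ":", so a helper like this would be applied to the title and author parts before the file is written. Non-ASCII text such as Chinese characters is left untouched.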

The face‑detection library is Baidu's AipFace, a free Python SDK that provides HTTP‑based facial analysis.

Filtering criteria applied after detection: discard images in which no face is found, images whose face is not female, images that do not show a real person (for example anime characters, identified by a real‑person confidence below 0.6), and images with a beauty score below 45, which saves storage.
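The filtering criteria above can be sketched as a single predicate. This is an illustration only: the dictionary keys gender, human_confidence, and beauty are hypothetical names for an already-normalized result, not the field names of the AipFace response, and checking only the first detected face is an assumption. The thresholds 0.6 and 45 come from the text.

```python
MIN_HUMAN_CONFIDENCE = 0.6  # below this the face is treated as not a real person
MIN_BEAUTY = 45             # below this the image is not worth storing


def keep_image(faces):
    """Decide whether an image passes the filters.

    `faces` is a list of dicts with the hypothetical keys 'gender',
    'human_confidence' and 'beauty'. Only the first face is checked,
    which is an assumption of this sketch.
    """
    if not faces:
        return False  # no face detected
    face = faces[0]
    if face["gender"] != "female":
        return False  # not a female face
    if face["human_confidence"] < MIN_HUMAN_CONFIDENCE:
        return False  # probably a drawing or anime character
    if face["beauty"] < MIN_BEAUTY:
        return False  # beauty score too low
    return True
```

Keeping the thresholds as module-level constants makes it easy to tighten or loosen the filter without touching the logic.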

Implementation logic:

1. Send HTTP requests via Requests to retrieve the list of discussions under the "美女" topic.
2. Parse each discussion's HTML with lxml to extract all img tags and their src attributes.
3. Download each image with Requests, skipping animated GIFs.
4. Submit the image to AipFace for facial analysis.
5. Apply the filtering criteria listed above.
6. Save the remaining images locally, with filenames composed of the beauty score, author, question title, and an index.
7. Repeat the process for the next discussion.
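The parsing step, extracting every img src with lxml and dropping animated GIFs, might look roughly like the sketch below. The function name extract_image_urls is hypothetical, and detecting GIFs by file extension is an assumption; the original script may identify them differently.

```python
from lxml import html


def extract_image_urls(page_html):
    """Return the src of every <img> tag in `page_html`, skipping GIFs.

    GIFs are recognized by file extension only, ignoring any query string
    (an assumption of this sketch).
    """
    tree = html.fromstring(page_html)
    urls = tree.xpath("//img/@src")
    return [
        str(url)
        for url in urls
        if not url.split("?")[0].lower().endswith(".gif")
    ]
```

Each returned URL would then be downloaded with Requests and its bytes passed on to the face-detection step.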

The images that pass the filters are saved to a local folder. The examples shown in the original article are those with the highest beauty scores (88, for instance), although any such ranking is subjective.

Code snippets are provided as images in the original article.

Preparation steps before running the script:

1. Install Python 3.
2. Install the required libraries: pip install requests lxml baidu-aip
3. Apply for the free Baidu Cloud face‑detection service.
