
Analyzing and Predicting the Box Office of "The Battle at Lake Changjin" Using Python Data Scraping and Visualization

This tutorial demonstrates how to scrape Maoyan movie comments for "The Battle at Lake Changjin", clean and store the data, visualize the comments by likes, city, gender, watch status, rating, user level, and creator mentions, and finally predict the next day's box office using linear regression with sklearn.

Python Programming Learning Circle

The article begins by describing the popularity of the movie "The Battle at Lake Changjin" and introduces a data‑driven analysis using Python to scrape comments from the Maoyan platform.

Data acquisition: Comments are fetched by constructing a request to the Maoyan API, and the JSON response is saved locally.
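The acquisition step can be sketched roughly as follows. This is a minimal outline under stated assumptions, not the article's actual code: the endpoint in BASE_URL is a placeholder rather than Maoyan's real address, the offset query parameter and the top-level "cmts" key are assumed names, and the helper functions are hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint (assumption): NOT the real Maoyan API address.
BASE_URL = "https://example.com/comments/movie/{movie_id}.json"


def build_comment_url(movie_id, offset):
    """Build the request URL for one page of comments."""
    return BASE_URL.format(movie_id=movie_id) + "?" + urlencode({"offset": offset})


def fetch_page(url, timeout=10.0):
    """Download one page and return the response body as text."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_comments(raw_json):
    """Pull the list of comment records out of one JSON response body.

    The key "cmts" is an assumed field name.
    """
    return json.loads(raw_json).get("cmts", [])


def append_comments(comments, path):
    """Append comments to a local file, one JSON object per line."""
    with Path(path).open("a", encoding="utf-8") as fh:
        for comment in comments:
            # ensure_ascii=False keeps Chinese text readable in the file.
            fh.write(json.dumps(comment, ensure_ascii=False) + "\n")
```

A scraping loop would call build_comment_url with increasing offsets, pass each URL to fetch_page, and hand the parsed records to append_comments. Writing one JSON object per line makes it easy to resume an interrupted run by appending.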

Data storage: A function writes the retrieved JSON data to a local file.

Data cleaning: Duplicate entries are removed based on comment_id, and non‑Chinese characters are filtered out of the comment text. Deduplication is a single pandas call:

df_new = df.drop_duplicates(['comment_id'])
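As an illustration of the two cleaning operations, here is a plain-Python sketch that needs no pandas. It keeps the first record per comment_id, which matches the default behaviour of drop_duplicates, and strips everything outside the basic CJK ideograph range. The "content" field name and both helper names are assumptions made for the example.

```python
import re

# Matches any run of characters outside the CJK Unified Ideographs
# block (U+4E00 to U+9FFF), i.e. letters, digits, punctuation, emoji.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def dedupe_by_id(comments, key="comment_id"):
    """Keep the first comment seen for each id, preserving order."""
    seen = set()
    unique = []
    for comment in comments:
        if comment[key] not in seen:
            seen.add(comment[key])
            unique.append(comment)
    return unique


def keep_chinese(text):
    """Remove every non-Chinese character from a comment string."""
    return NON_CHINESE.sub("", text)


# Made-up sample rows for illustration only.
raw = [
    {"comment_id": 1, "content": "很好看! 666"},
    {"comment_id": 1, "content": "很好看! 666"},
    {"comment_id": 2, "content": "Great 电影~"},
]
cleaned = [
    {**c, "content": keep_chinese(c["content"])} for c in dedupe_by_id(raw)
]
```

Note that this character range also drops Chinese punctuation, which is usually what a word-frequency analysis wants but may not suit every use.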

Visualization analysis includes:

Removing duplicate comments.

Ranking comments by number of likes.

City‑wise comment distribution, showing a correlation with regional economic development.

Gender distribution, revealing a slight female majority.

Watch status, indicating most users comment after watching.

Rating distribution, converting the 10‑point UI scale to a 5‑point internal scale.

User level distribution, where most users are level 2.

Frequency of mentions of the film’s creators, with 易烊千玺 (Jackson Yee) ranking highest.
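Most of the analyses listed above reduce to sorting or counting over the cleaned comments. The sketch below shows that counting step in plain Python. The sample rows are invented, every field name except comment_id is an assumption, the gender coding (0 = other, 1 = male, 2 = female) is a guess, and the helper names are hypothetical.

```python
from collections import Counter

# Made-up sample rows; they do not reproduce the article's data.
comments = [
    {"comment_id": 1, "city": "北京", "gender": 2, "score": 10, "likes": 120,
     "content": "易烊千玺演得真好"},
    {"comment_id": 2, "city": "上海", "gender": 1, "score": 9, "likes": 45,
     "content": "吴京和易烊千玺都很棒"},
    {"comment_id": 3, "city": "北京", "gender": 2, "score": 10, "likes": 300,
     "content": "致敬英雄"},
    {"comment_id": 4, "city": "广州", "gender": 0, "score": 8, "likes": 5,
     "content": "吴京一如既往"},
    {"comment_id": 5, "city": "北京", "gender": 1, "score": 10, "likes": 12,
     "content": "易烊千玺进步很大"},
]


def top_liked(rows, n=3):
    """Return the n comments with the most likes, highest first."""
    return sorted(rows, key=lambda r: r["likes"], reverse=True)[:n]


def count_field(rows, field):
    """Count rows per distinct value of a field, most common first."""
    return Counter(r[field] for r in rows).most_common()


def to_five_point(ui_score):
    """Convert a 10-point score to the 5-point scale."""
    return ui_score / 2


def mention_counts(rows, names):
    """Count how many comments mention each name at least once."""
    counts = Counter()
    for row in rows:
        for name in names:
            if name in row["content"]:
                counts[name] += 1
    return counts.most_common()


city_counts = count_field(comments, "city")
gender_counts = count_field(comments, "gender")
creator_counts = mention_counts(comments, ["易烊千玺", "吴京"])
```

Each result is a list of (label, count) pairs, which can be split into the label and value sequences a bar or pie chart needs.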

All visualizations are generated with pyecharts and displayed as bar, pie, and line charts. The gender pie chart, for instance, is built from a category list (other, male, female) and a Pie object; the arguments to add are elided in the source:

attr = ['其他','男','女']
b = (Pie().add(...))

Box‑office prediction: The daily box‑office data is plotted, showing a rise until day 7 and a decline thereafter. A linear regression model from sklearn.linear_model is then fitted to the data.

The fitted model is used to forecast the next day's box office, and the prediction result is presented as an image.
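The forecasting idea can be illustrated with a small sketch. The article uses sklearn's LinearRegression; the version below instead fits the same single-feature least-squares line by hand so that it runs with the standard library alone. The function names are hypothetical and the numbers are placeholders, not real box-office figures.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_next_day(days, values):
    """Fit a line to (day, value) pairs and extrapolate one day ahead."""
    slope, intercept = fit_line(days, values)
    return slope * (days[-1] + 1) + intercept


# Placeholder data (assumption): day index and daily takings in
# arbitrary units, chosen only to show a downward trend.
days = [8, 9, 10, 11, 12]
daily_takings = [4.0, 3.6, 3.1, 2.9, 2.4]

prediction = forecast_next_day(days, daily_takings)

# With scikit-learn, the equivalent fit would be
# LinearRegression().fit(X, y) with X shaped as one column of days.
```

Which days to fit on is a modelling choice the summary does not spell out. Because the series rises until day 7 and then falls, a single straight line fitted across the whole run would describe neither phase well, so the placeholder data here covers only the declining days.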

Overall, the article provides a step‑by‑step guide for data scraping, cleaning, visualization, and simple predictive modeling applied to movie comment data.
