Predicting Movie Box Office with Playwright Data Scraping and DeepSeek AI
This article demonstrates how to combine Playwright scraping of several Chinese movie and social‑media platforms with the DeepSeek AI model to collect data automatically and generate a data‑driven estimate of the box‑office revenue for the film "Ne Zha 2".
Ever wondered how to use AI to predict a movie's box office? This tutorial walks through the scraping code, the parallel collection step, and the DeepSeek call used to forecast the final revenue of the recent hit "Ne Zha 2".
What does the approach involve?
We use Playwright, a browser automation tool, to scrape real‑time data from platforms such as Douban, Taopiaopiao, Maoyan, Weibo, and Douyin. These platforms provide ratings, review counts, wish‑to‑see numbers, and social‑media heat indices, all of which are useful indicators of box‑office performance.
How do we scrape the data?
Below is the code for extracting Douban data:
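// Playwright's Chromium driver (npm install playwright).
import { chromium } from 'playwright';

// Scrape the rating and vote count from the film's Douban subject page.
async function scrapeDouban() {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  try {
    await page.goto('https://movie.douban.com/subject/34780991/');
    const ratingLocator = page.locator('.rating_num');
    const votesLocator = page.locator('span[property="v:votes"]');
    // Both values are returned as strings, exactly as displayed on the page.
    const rating = await ratingLocator.innerText();
    const votes = await votesLocator.innerText();
    console.log(`Douban data - rating: ${rating}, votes: ${votes}`);
    return { rating, votes };
  } catch (error) {
    // Return null on failure so the caller can detect a failed scrape.
    console.error('Error scraping Douban:', error);
    return null;
  } finally {
    // Always release the browser, even when scraping fails.
    await browser.close();
  }
}
Similar functions (scrapeTaopiaopiao, scrapeMaoyan, scrapeWeibo, and scrapeDouyin) handle the other platforms, each extracting the relevant metrics (rating, wish‑to‑see count, view count, heat index, etc.).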
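The scrapers return every metric as display text, and Chinese platforms often abbreviate large counts with 万 (10,000) or 亿 (100,000,000). If numeric values are wanted before the data is passed on, a small normalizer along these lines could help. This is only a sketch: parseCount is a hypothetical helper that does not appear in the original code, and the input formats it handles are assumptions about what the pages display.

```javascript
// Hypothetical helper (not part of the original article): converts display
// strings such as "8.5", "1,234,567" or "12.3万" into plain numbers.
const UNIT_MULTIPLIERS = { '万': 1e4, '亿': 1e8 };

function parseCount(text) {
  if (typeof text !== 'string') return null;
  // Drop whitespace and thousands separators (ASCII and full-width commas).
  const cleaned = text.replace(/[,，\s]/g, '');
  // A leading number, optionally followed by a Chinese magnitude unit.
  const match = cleaned.match(/^(\d+(?:\.\d+)?)([万亿])?/);
  if (!match) return null;
  const multiplier = match[2] ? UNIT_MULTIPLIERS[match[2]] : 1;
  // Round to two decimals to hide floating-point noise from the multiplication.
  return Math.round(Number(match[1]) * multiplier * 100) / 100;
}

console.log(parseCount('12.3万'));    // 123000
console.log(parseCount('1,234,567')); // 1234567
console.log(parseCount('n/a'));       // null
```

Strings that do not start with a digit yield null, so the caller can decide whether to skip the metric or pass the raw text through unchanged.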
Collecting the data is easy; analyzing it is the hard part
The main function runs all five scrapers in parallel with Promise.all. Once every platform has returned data, it passes the combined result to a predictBoxOffice function, which constructs a detailed prompt for the DeepSeek model.
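The main function in this section stops as soon as any scraper returns null. If a prediction from partial data is acceptable, one alternative is Promise.allSettled, which waits for every scraper and reports each outcome separately. The sketch below is an assumption-laden variant rather than the article's method: collectAll is a hypothetical helper, and the stub scrapers with made-up values stand in for scrapeDouban and the others.

```javascript
// Hypothetical helper: runs every scraper in parallel and keeps what succeeded.
// `scrapers` maps a platform name to an async function returning data or null.
async function collectAll(scrapers) {
  const names = Object.keys(scrapers);
  // The async wrapper turns synchronous throws into rejections as well.
  const results = await Promise.allSettled(
    names.map(async (name) => scrapers[name]())
  );
  const data = {};
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled' && result.value != null) {
      data[names[i]] = result.value;
    } else {
      failed.push(names[i]);
    }
  });
  return { data, failed };
}

// Stub scrapers with made-up values, for illustration only.
const stubScrapers = {
  douban: async () => ({ rating: '8.5', votes: '1000' }),
  maoyan: async () => null, // a scraper that caught its own error
  weibo: async () => { throw new Error('timeout'); },
};

collectAll(stubScrapers).then(({ data, failed }) => {
  console.log('collected:', Object.keys(data), 'failed:', failed);
});
```

The caller can then decide how many platforms are enough to proceed, instead of discarding four good results because a fifth page failed to load.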
async function main() {
  // Run all five scrapers concurrently; each resolves to its data or to null on failure.
  const [doubanData, taopiaopiaoData, maoyanData, weiboData, douyinData] = await Promise.all([
    scrapeDouban(),
    scrapeTaopiaopiao(),
    scrapeMaoyan(),
    scrapeWeibo(),
    scrapeDouyin()
  ]);
  // Abort if any platform failed, since the prompt expects every metric.
  if (!doubanData || !taopiaopiaoData || !maoyanData || !weiboData || !douyinData) {
    console.error('Error: some data failed to scrape');
    return;
  }
  const combinedData = { douban: doubanData, taopiaopiao: taopiaopiaoData, maoyan: maoyanData, weibo: weiboData, douyin: douyinData };
  const predictedBoxOffice = await predictBoxOffice(combinedData);
  console.log(`Predicted box office for Ne Zha 2: ${predictedBoxOffice}`);
}
// Surface errors from the API call instead of leaving an unhandled rejection.
main().catch((error) => console.error('Prediction failed:', error));
The predictBoxOffice function calls DeepSeek through its OpenAI‑compatible API. It builds a prompt that lists all collected metrics, asks the model to consider holiday effects, cultural impact, social‑media trends, historical box‑office ceilings, and competing releases, and returns the model's predicted revenue range.
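The prompt in the article's listing is shortened, so its exact wording is not reproduced here. As a rough sketch of how a prompt covering the metrics and factors described above might be assembled, consider the following. buildPrompt is a hypothetical function, and the wording and sample values are illustrative assumptions rather than the original prompt.

```javascript
// Hypothetical prompt builder; the wording is illustrative, not the article's prompt.
function buildPrompt(movieTitle, data) {
  // One line per metric, e.g. "- douban rating: 8.5".
  const metricLines = Object.entries(data).flatMap(([platform, metrics]) =>
    Object.entries(metrics).map(([name, value]) => `- ${platform} ${name}: ${value}`)
  );
  // Factors the article says the model is asked to consider.
  const factors = [
    'holiday effects',
    'cultural impact',
    'social-media trends',
    'historical box-office ceilings',
    'competing releases',
  ];
  return [
    `You are forecasting the final box office of "${movieTitle}".`,
    'Collected metrics:',
    ...metricLines,
    `Consider: ${factors.join(', ')}.`,
    'Answer with a predicted revenue range.',
  ].join('\n');
}

// Made-up sample values, for illustration only.
console.log(buildPrompt('Ne Zha 2', { douban: { rating: '8.5', votes: '1000' } }));
```

Generating the metric lines from the data object keeps the prompt in step with whatever the scrapers return, so adding a platform does not require editing the template.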
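import OpenAI from "openai";

// DeepSeek exposes an OpenAI-compatible endpoint, so the official openai client can be reused.
// The key is read from an environment variable (the name DEEPSEEK_API_KEY is our choice) rather than hard-coded.
const openai = new OpenAI({ baseURL: 'https://api.deepseek.com', apiKey: process.env.DEEPSEEK_API_KEY });

async function predictBoxOffice(data) {
  // The per-platform metrics are interpolated into the full prompt, which is elided here.
  const { douban, taopiaopiao, maoyan, weibo, douyin } = data;
  const prompt = `You are a senior movie‑box‑office forecasting expert...`; // shortened for brevity
  const completion = await openai.chat.completions.create({
    model: "deepseek-chat",
    messages: [{ role: 'system', content: prompt }, { role: 'user', content: 'Give the final box‑office prediction for Ne Zha 2 in billions of yuan.' }],
    // A low temperature keeps the answer more consistent between runs.
    temperature: 0.3,
  });
  // The reply is free text, not a parsed number.
  return completion.choices[0].message.content;
}
Result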
The AI model returned a prediction of 16.5 billion yuan for "Ne Zha 2", illustrating how automated data collection and large‑language‑model analysis can produce a data‑informed box‑office estimate rather than a simple guess.
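predictBoxOffice returns the model's reply as free text, so comparing runs or storing results requires pulling the figures out of that text. A minimal sketch, assuming the reply states its estimate in English as a number or a range followed by the word "billion": extractRange is a hypothetical helper, and the sample replies are invented for the demonstration, not real model output.

```javascript
// Hypothetical helper: extracts a revenue range (in billions) from a free-text reply.
// Only matches numbers followed by the word "billion"; anything else yields null.
function extractRange(reply) {
  const range = /(\d+(?:\.\d+)?)\s*(?:-|~|to)\s*(\d+(?:\.\d+)?)\s*billion/i;
  const single = /(\d+(?:\.\d+)?)\s*billion/i;

  const rangeMatch = reply.match(range);
  if (rangeMatch) {
    const a = Number(rangeMatch[1]);
    const b = Number(rangeMatch[2]);
    return { low: Math.min(a, b), high: Math.max(a, b) };
  }
  const singleMatch = reply.match(single);
  if (singleMatch) {
    const value = Number(singleMatch[1]);
    return { low: value, high: value };
  }
  return null;
}

// Invented sample replies, for illustration only.
console.log(extractRange('Estimated final gross: 10-12 billion yuan.')); // { low: 10, high: 12 }
console.log(extractRange('No numeric estimate available.'));             // null
```

Requiring the unit word keeps unrelated digits, such as the "2" in the film's title, from being mistaken for the prediction. A stricter option is to ask the model for a fixed output format in the prompt and reject replies that do not match it.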
Feel free to copy the code, run it locally, and experiment with your own movie predictions.
Rare Earth Juejin Tech Community