Big Data · 10 min read

Weibo Hot Search Data Crawling, Analysis, and Visualization Project

This article presents a Python‑based project that continuously crawls Weibo hot‑search data, stores it with timestamps, and visualizes trends through dynamic bar, line, and word‑cloud charts using libraries such as BeautifulSoup, pandas, schedule, pyecharts, and jieba.

Python Programming Learning Circle

The article introduces a project that collects real‑time Weibo hot‑search data, processes it, and creates eye‑catching visualizations, providing a practical example of web scraping, data analysis, and dynamic chart generation with Python.

Project Overview: The data source is Weibo's hot-search summary page. Each entry's ranking, title, and heat value are captured every minute, timestamped, and saved to CSV with pandas. To cope with anti-scraping measures, the crawler rotates random user agents via fake_useragent and spaces out its requests with schedule and time.
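The storage step described above, stamping each crawl with the current time and appending it to a CSV file, can be sketched as follows. The column names, file path, and utf-8-sig encoding here are illustrative choices, not taken from the original script:

```python
import os
from datetime import datetime

import pandas as pd

def save_snapshot(rows, path="weibo_hot.csv"):
    """Append one crawl's hot-search rows to a CSV, timestamping each row.

    `rows` is a list of dicts like {"关键词": ..., "热度": ...}.
    """
    df = pd.DataFrame(rows)
    df["时间"] = datetime.now().strftime("%Y-%m-%d %H:%M")
    # Write the header only when the file is first created,
    # so repeated crawls keep appending clean rows.
    df.to_csv(path, mode="a", index=False, encoding="utf-8-sig",
              header=not os.path.exists(path))
```

Calling `save_snapshot` once per crawl yields a single growing CSV in which every block of rows shares one timestamp, which is exactly the shape the visualization code later slices into 20-row frames.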

Core Crawling Code:

<code>from fake_useragent import UserAgent
import schedule
import pandas as pd
from datetime import datetime
import requests
from bs4 import BeautifulSoup
import time

ua = UserAgent()
url = "https://s.weibo.com/top/summary?cate=realtimehot&sudaref=s.weibo.com&display=0&retcode=6102"
get_info_dict = {}
count = 0
a = 1

def main():
    global url, get_info_dict, count, a
    get_info_list = []
    headers = {"User-Agent": ua.random}  # fresh random user agent on every request
    html = requests.get(url, headers=headers, timeout=10).text  # fetch page source
    # parsing logic omitted for brevity
</code>
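The parsing logic the snippet omits can be filled in with BeautifulSoup. The sketch below is hedged: the `td-01`/`td-02` class names reflect the hot-search table's structure at the time of writing and may change, so treat the selectors as assumptions to verify against the live page:

```python
from bs4 import BeautifulSoup

def parse_hot_search(html):
    """Extract (rank, title, heat) rows from the hot-search summary table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tbody tr"):
        rank_td = tr.select_one("td.td-01")   # ranking column
        link = tr.select_one("td.td-02 a")    # topic title link
        heat = tr.select_one("td.td-02 span") # heat value, absent on pinned rows
        if not (rank_td and link):
            continue
        rows.append({
            "rank": rank_td.get_text(strip=True),
            "title": link.get_text(strip=True),
            "heat": heat.get_text(strip=True) if heat else "",
        })
    return rows

# A minimal fragment mimicking the table layout, for demonstration:
sample = """
<table><tbody>
<tr><td class="td-01">1</td><td class="td-02"><a>Topic A</a><span>1234567</span></td></tr>
<tr><td class="td-01">2</td><td class="td-02"><a>Topic B</a><span>765432</span></td></tr>
</tbody></table>
"""
print(parse_hot_search(sample))
```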

Scheduling ensures the main function runs every minute:

<code># run the crawler once a minute
schedule.every(1).minutes.do(main)
while True:
    schedule.run_pending()  # execute any job that is due
    time.sleep(2)
</code>

Visualization: Using pyecharts, the project creates a dynamic bar chart (a Timeline that advances one frame per crawl), a line chart for tracking specific topics over time, and a word cloud generated from comment data via jieba. Example code for the bar chart:

<code>from pyecharts.charts import Bar, Timeline
from pyecharts import options as opts
from pyecharts.globals import ThemeType
import pandas as pd

df = pd.read_csv('夜间微博.csv', encoding='gbk')

t = Timeline({"theme": ThemeType.MACARONS})
n_frames = len(df) // 20  # each crawl contributes the top 20 entries
for i in range(n_frames):
    bar = (
        Bar()
        .add_xaxis(list(df['关键词'][i*20:i*20+20][::-1]))
        .add_yaxis('热度', list(df['热度'][i*20:i*20+20][::-1]))
        .reversal_axis()  # horizontal bars, hottest topic on top
        .set_global_opts(
            title_opts=opts.TitleOpts(title=f"{list(df['时间'])[i*20]}", pos_right="5%", pos_bottom="15%"),
            xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
            yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True), axislabel_opts=opts.LabelOpts(color='#149bff'))
        )
        .set_series_opts(label_opts=opts.LabelOpts(position="right", color='#ff1435'))
    )
    t.add(bar, f"frame {i}")
t.render('hot_search_timeline.html')  # write the animated chart to an HTML file
</code>
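The line chart mentioned above needs one topic's heat values over time. A minimal sketch of pulling that series out of the crawled DataFrame (column names as in the article's CSV) before handing it to a pyecharts `Line` chart:

```python
import pandas as pd

def topic_series(df, keyword):
    """Return the (time, heat) series for a single hot-search topic.

    Column names 关键词/热度/时间 follow the CSV used in the article;
    the function itself is an illustrative helper, not the original code.
    """
    hits = df[df["关键词"] == keyword].sort_values("时间")
    return list(hits["时间"]), list(hits["热度"])

# Toy data in the same shape as the crawled CSV:
df = pd.DataFrame({
    "时间":   ["21:00", "21:00", "21:01", "21:01"],
    "关键词": ["话题A", "话题B", "话题A", "话题B"],
    "热度":   [100, 90, 120, 95],
})
times, heats = topic_series(df, "话题A")
print(times, heats)  # → ['21:00', '21:01'] [100, 120]
```

The two lists map directly onto `Line().add_xaxis(times).add_yaxis(keyword, heats)`.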

For comment analysis, the script fetches hot comments via Weibo's mobile API, extracts the text, strips punctuation, segments the words with jieba, and builds a word-cloud image.

<code>import requests, json, re

def get_comments(url):
    headers = {
        "cookie": "...",  # a valid logged-in cookie must be supplied here
        "Accept": "application/json, text/plain, */*",
        "User-Agent": "Mozilla/5.0 ...",
        "X-Requested-With": "XMLHttpRequest",
        "X-XSRF-TOKEN": "50171a"
    }
    res = requests.get(url, headers=headers, timeout=10)
    # the response is escaped JSON; this regex pulls out each comment id
    ids = re.findall('u524d","id":"(.*?)",', res.text)
    # further processing omitted
</code>

Overall, the project demonstrates a complete pipeline—from data acquisition and storage to real‑time visual analytics—illustrating how Python can be leveraged for big‑data projects involving social‑media trend monitoring.

Tags: Big Data, Python, Data Visualization, Web Scraping, pyecharts, Weibo, word cloud
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
