Analyzing Jianshu Platform Data with Python Crawling and FineBI Visualization
This article shows how to crawl user and article data from the Jianshu platform with Python, then analyze and visualize it in the FineBI business intelligence tool. It covers contracted authors, follower distribution, popular articles, and engagement metrics.
While Zhihu dominates the Chinese UGC market, other platforms such as Jianshu are often overlooked. This report looks at the quality of Jianshu's user base, its accounts with the largest followings, and its most-read articles.
1. Data Acquisition – The data was collected with a Python web scraper that retrieved up to 900 followers per user and about 1,900 articles. The crawl produced 261,277 user records, each containing the username, profile URL, contract status, follower and following counts, article count, total word count, and similar fields.
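The article does not include its crawler, so the following is only a minimal sketch of the one detail it does state: follower collection stops at 900 records per user. The `fetch_page` callable and the record layout are hypothetical; in a real crawler `fetch_page` would wrap an HTTP request and HTML parsing.

```python
from typing import Callable, Dict, List

MAX_FOLLOWERS_PER_USER = 900  # cap stated in the article


def collect_followers(
    fetch_page: Callable[[int], List[Dict]],
    max_followers: int = MAX_FOLLOWERS_PER_USER,
) -> List[Dict]:
    """Walk follower pages until the cap is hit or a page comes back empty.

    fetch_page is a hypothetical callable: given a 1-based page number,
    it returns a list of follower records (dicts).
    """
    followers: List[Dict] = []
    page = 1
    while len(followers) < max_followers:
        batch = fetch_page(page)
        if not batch:  # no more pages for this user
            break
        followers.extend(batch)
        page += 1
    return followers[:max_followers]  # trim any overshoot from the last page


# Stand-in fetcher that simulates 110 pages of 9 followers each (no network).
def fake_fetch_page(page: int) -> List[Dict]:
    if page > 110:
        return []
    start = (page - 1) * 9
    return [{"username": f"user{start + i}"} for i in range(9)]


sample = collect_followers(fake_fetch_page)
print(len(sample))
```

Passing the fetcher in as a parameter keeps the paging logic testable without touching the network, and makes it easy to add delays or retries inside the fetcher later.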
Among the 1,916 scraped articles, the most-liked article received 17,076 likes and the least-liked received 488, which suggests that the crawl captured most of the platform's popular content.
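A range check like this one takes only a few lines. The sketch below assumes each scraped article is a dict with a `likes` field (a hypothetical layout). The two extreme values come from the article; the middle record is a placeholder, not real data.

```python
from typing import Dict, List, Tuple


def like_range(articles: List[Dict]) -> Tuple[int, int]:
    """Return (lowest, highest) like count across the scraped articles."""
    likes = [article["likes"] for article in articles]
    if not likes:
        raise ValueError("no articles to summarize")
    return min(likes), max(likes)


articles = [
    {"title": "most liked", "likes": 17076},  # figure quoted in the article
    {"title": "placeholder", "likes": 2500},  # made-up filler record
    {"title": "least liked", "likes": 488},   # figure quoted in the article
]

lowest, highest = like_range(articles)
print(lowest, highest)
```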
2. BI Analysis – After obtaining the data, the next step is visualization. Many front-end chart libraries exist (Highcharts, ECharts, Chart.js, D3.js), but all of them require coding skills, and not all are free to use: Highcharts requires a paid license for commercial use, whereas ECharts, Chart.js, and D3.js are open source. For non-technical users, a Business Intelligence (BI) tool is preferable.
Internationally, Tableau is a leading BI product, but the author considers it a poor fit for the Chinese market, citing its handling of real-time queries, its high cost, the lack of a built-in data warehouse, and weak support for complex Chinese-style report tables.
Instead, the author chose FineBI, a free‑for‑personal‑use, enterprise‑grade BI platform.
Advantages of FineBI
Automatic and flexible data modeling.
Rich visualizations and front-end analysis features, supporting drill-down, slicing, and pivoting (rotation) of multidimensional data.
Built‑in ETL for real‑time analysis and fast processing of large datasets.
3. Data Visualization with FineBI – After installing and activating FineBI, the Python‑scraped dataset was imported for analysis.
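The article does not say how the dataset was handed to FineBI. One plausible route, assumed here, is a flat CSV file. The sketch below writes user records with the fields listed in the data acquisition step; the column names themselves are illustrative, not the author's.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Illustrative column names based on the fields the article lists.
FIELDS = [
    "username", "profile_url", "is_contracted",
    "followers", "following", "articles", "total_words",
]


def write_users_csv(users: Iterable[Dict], fileobj: TextIO) -> None:
    """Write user records as CSV, ignoring any keys not in FIELDS."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for user in users:
        writer.writerow(user)  # missing keys are written as empty cells


# Demo against an in-memory buffer; for a real file, open it with
# newline="" as the csv module documentation recommends.
buffer = io.StringIO()
write_users_csv(
    [{"username": "demo_user", "followers": 12, "scrape_note": "dropped"}],
    buffer,
)
print(buffer.getvalue())
```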
3.1 Contracted Author Analysis – Of the more than 260,000 users, only 126 were explicitly marked as “contracted authors,” which suggests a strict selection process.
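To put that number in proportion, the share can be computed directly from the user records. This is a sketch that assumes a boolean `is_contracted` field (a hypothetical name); the final line reuses the two totals reported in the article.

```python
from typing import Dict, List, Tuple


def contracted_share(users: List[Dict]) -> Tuple[int, float]:
    """Return (number of contracted authors, their fraction of all users)."""
    total = len(users)
    contracted = sum(1 for user in users if user.get("is_contracted"))
    return contracted, (contracted / total if total else 0.0)


# Tiny illustrative sample, not real data.
count, share = contracted_share(
    [{"is_contracted": True}, {"is_contracted": False}, {}, {}]
)
print(count, share)

# The article's own totals: 126 contracted authors among 261,277 users.
print(f"{126 / 261_277:.3%}")
```

With the article's figures, contracted authors come to roughly 0.05% of the crawled users.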
3.2 Follower Distribution – A pyramid-style chart shows that only five users have more than 100,000 followers, while the 10‑100 k follower range accounts for the largest share of users (40.38%).
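The percentages behind a chart like this come from bucketing follower counts. Below is a minimal sketch of that step; the bucket edges and labels are assumptions chosen for illustration and may not match the bins used in the FineBI chart.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable

# Assumed bucket edges; each bucket includes its lower edge.
EDGES = [10, 100, 1_000, 10_000, 100_000]
LABELS = ["<10", "10-99", "100-999", "1k-9.9k", "10k-99.9k", ">=100k"]


def follower_distribution(follower_counts: Iterable[int]) -> Dict[str, float]:
    """Return the percentage of users falling into each follower bucket."""
    tally = Counter(LABELS[bisect_right(EDGES, n)] for n in follower_counts)
    total = sum(tally.values())
    if total == 0:
        return {}
    return {label: 100 * tally[label] / total for label in LABELS}


# Illustrative follower counts, not real data.
distribution = follower_distribution([5, 50, 50, 200_000])
for label, pct in distribution.items():
    print(f"{label:>10}: {pct:.2f}%")
```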
3.3 24‑Hour Hot Article Timing – The largest number of hot articles was posted around 11 a.m., suggesting that many creators publish in the morning rather than late at night.
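An hourly breakdown like this can be derived from publish timestamps. The sketch below assumes each article carries an ISO-style timestamp string such as `2018-05-01 11:05:00`; the format and the sample values are illustrative.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def posts_per_hour(timestamps: Iterable[str]) -> List[int]:
    """Count articles per hour of day; index 0 is midnight, 23 is 11 p.m."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return [hours.get(hour, 0) for hour in range(24)]


# Illustrative timestamps, not real data.
counts = posts_per_hour([
    "2018-05-01 11:05:00",
    "2018-05-01 11:40:00",
    "2018-05-02 23:15:00",
])
peak_hour = max(range(24), key=counts.__getitem__)
print(peak_hour, counts[peak_hour])
```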
3.4 Engagement Metrics – Like and comment counts both rise with article popularity, as the accompanying charts illustrate.
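The article shows this relationship only through charts. One way to quantify it, offered here as an assumption rather than the author's method, is the Pearson correlation coefficient between two per-article metrics such as likes and comments. The sample values are made up.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Illustrative per-article figures, not real data.
likes = [500, 900, 1500, 4000]
comments = [20, 35, 70, 160]
print(round(pearson(likes, comments), 3))
```

A value near 1 would support the "rise together" reading; a value near 0 would mean the charts show no linear relationship.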
Overall, the combination of Python crawling and FineBI visualization provides a practical workflow for extracting and analyzing UGC platform data without deep programming expertise.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.