Web Scraping and Data Analysis of Pet Cat Breeds Using Python

This article demonstrates how to scrape cat breed information from a dedicated website, store the data in Excel, and analyze and visualize it with Python libraries such as requests, lxml, pandas, pyecharts, and stylecloud. The analysis covers relationship graphs, geographic distribution, size ratios, price extremes, and word clouds.

Rare Earth Juejin Tech Community

Dec 2, 2021

The article begins with a brief introduction to the Juejin "Use Code to Attract Cats" activity, posing two questions about cat ownership and curiosity, and explains the author's motivation to learn about various pet cat breeds through coding.

Data is collected by crawling the cat breed website www.maomijiaoyi.com. The following Python code fetches the breed index page, extracts each breed's name, price, and detail-page URL, and prints the results:

from lxml import etree
import requests

# Send a browser-like User-Agent with every request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"}
url_base = "http://www.maomijiaoyi.com"
session = requests.Session()

# Access the breed index page and collect detail links
index_url = url_base + "/index.php?/pinzhongdaquan_5.html"
res = session.get(index_url, headers=headers)
html = etree.HTML(res.text)
main_data = []
for a_tag in html.xpath("//div[@class='pinzhong_left']/a"):
    # Use a separate name so the index URL above is not overwritten
    detail_url = url_base + a_tag.xpath("./@href")[0]
    # Name and price may be absent from a card, so default both to None
    pet_name, pet_price = None, None
    pet_name_tag = a_tag.xpath("./div[@class='pet_name']/text()")
    if pet_name_tag:
        pet_name = pet_name_tag[0].strip()
    pet_price_tag = a_tag.xpath("./div[@class='pet_price']/span/text()")
    if pet_price_tag:
        pet_price = pet_price_tag[0].strip()
    print(pet_name, pet_price, detail_url)
    main_data.append((pet_name, pet_price, detail_url))

After obtaining the links, the script visits each detail page, parses the basic attributes, appearance attributes, detailed descriptions, and image URLs, then downloads the images. The extracted data is saved to an Excel file named 猫咪.xlsx. The original post includes sample screenshots of the scraped data and the downloaded images.
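The detail-page code is not reproduced in this summary, so the following is only a minimal sketch of how one page might be turned into a flat record. The markup in `SAMPLE_DETAIL_HTML`, the class names, the XPath expressions, and the function name `parse_detail` are all assumptions made for illustration; the real site's structure may differ.

```python
from lxml import etree

# Hypothetical detail-page markup. The real page structure is not shown in the
# article, so every class name and XPath below is an assumption.
SAMPLE_DETAIL_HTML = """
<html><body>
  <div class="details">
    <div class="attr"><span class="key">中文学名</span><span class="value">布偶猫</span></div>
    <div class="attr"><span class="key">参考价格</span><span class="value">3000-20000</span></div>
  </div>
  <div class="content"><p>Gentle and quiet.</p><p>Likes company.</p></div>
  <div class="photos"><img src="/uploads/a.jpg"/><img src="/uploads/b.jpg"/></div>
</body></html>
"""


def parse_detail(html_text, url_base="http://www.maomijiaoyi.com"):
    """Return one flat record (dict) for a single breed detail page."""
    html = etree.HTML(html_text)
    record = {}
    # Each attribute row holds a key/value pair of spans (assumed layout)
    for row in html.xpath("//div[@class='attr']"):
        key = row.xpath("string(./span[@class='key'])").strip()
        value = row.xpath("string(./span[@class='value'])").strip()
        if key:
            record[key] = value
    # Join the description paragraphs into one text field
    paragraphs = html.xpath("//div[@class='content']/p/text()")
    record["description"] = "\n".join(p.strip() for p in paragraphs if p.strip())
    # Turn relative image paths into absolute URLs for later download
    record["image_urls"] = [
        url_base + src for src in html.xpath("//div[@class='photos']/img/@src")
    ]
    return record


record = parse_detail(SAMPLE_DETAIL_HTML)
print(record)
```

Records built this way could be collected in a list and written out with `pandas.DataFrame(records).to_excel("猫咪.xlsx", index=False)`, after joining list-valued fields such as `image_urls` into strings. Writing .xlsx files requires an Excel engine such as openpyxl to be installed.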

Data analysis starts by loading the Excel file with pandas:

import pandas as pd

df = pd.read_excel("猫咪.xlsx")

Various visualizations are created using the pyecharts library:

A relationship graph shows each breed and its aliases.

A bar chart displays the geographic distribution of breeds.

A treemap visualizes the distribution of breeds across countries.

A pie chart illustrates the proportion of different body sizes.

from pyecharts import options as opts
from pyecharts.charts import Graph, Bar, TreeMap, Pie
# (code omitted for brevity – the full snippets are present in the source)
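Since the chart code is omitted here, the sketch below covers only the data preparation that such charts typically need, using pandas alone. The column names "别名" (aliases) and "体型" (body size), the comma separator, the sample rows, and the helper names `alias_graph` and `size_pairs` are assumptions; only "中文学名" appears elsewhere in the article.

```python
import pandas as pd

# Made-up rows for illustration; "别名" and "体型" are assumed column names.
df = pd.DataFrame({
    "中文学名": ["布偶猫", "暹罗猫", "缅因猫"],
    "别名": ["布拉多尔猫,仙女猫", "泰国猫", None],
    "体型": ["大型", "小型", "大型"],
})


def alias_graph(frame, sep=","):
    """Build node and link lists: one node per breed or alias, one link per alias."""
    nodes, links, seen = [], [], set()
    for name, aliases in zip(frame["中文学名"], frame["别名"]):
        names = [name]
        if isinstance(aliases, str):  # skip missing alias cells
            names += [a.strip() for a in aliases.split(sep) if a.strip()]
        for n in names:
            if n not in seen:
                seen.add(n)
                # Draw breeds larger than their aliases
                nodes.append({"name": n, "symbolSize": 20 if n == name else 10})
        links += [{"source": name, "target": alias} for alias in names[1:]]
    return nodes, links


def size_pairs(frame):
    """Count breeds per body size as (label, count) pairs."""
    counts = frame["体型"].value_counts()
    return [(size, int(n)) for size, n in counts.items()]


nodes, links = alias_graph(df)
pairs = size_pairs(df)
print(nodes)
print(links)
print(pairs)
```

With pyecharts, lists in this shape can be passed to `Graph().add(series_name, nodes, links)` for the relationship graph and to `Pie().add(series_name, data_pair)` for the size proportions.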

Price analysis splits the "参考价格" column, identifies the cheapest and most expensive breeds, and prints the results:

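# Split "low-high" price strings into two columns
tmp = df.参考价格.str.split("-", expand=True)
tmp.columns = ["最低价格", "最高价格"]
# Drop rows without a complete price range, then convert to integers
tmp.dropna(inplace=True)
tmp = tmp.astype("int")
# Look up breed names by index; ties return every matching breed
cheap_cat = df.loc[tmp.index[tmp.最低价格 == tmp.最低价格.min()], "中文学名"].to_list()
costly_cat = df.loc[tmp.index[tmp.最高价格 == tmp.最高价格.max()], "中文学名"].to_list()
print("Cheapest breeds:", cheap_cat)
print("Most expensive breeds:", costly_cat)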
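To make the split-and-compare step concrete, here is a small worked example on made-up prices. It is a variant rather than the article's code: it uses `str.extract` with a regular expression so that entries with stray spaces or no numeric range are handled without errors. The function name `price_extremes` and all sample values are assumptions.

```python
import pandas as pd


def price_extremes(frame):
    """Return (cheapest, priciest) breed-name lists from a 'low-high' price column."""
    # Capture the two numbers; rows that do not match become NaN
    prices = frame["参考价格"].str.extract(r"(\d+)\s*-\s*(\d+)")
    prices.columns = ["low", "high"]
    prices = prices.dropna().astype(int)
    # Boolean masks keep every breed that ties for the extreme value
    low_idx = prices.index[prices["low"] == prices["low"].min()]
    high_idx = prices.index[prices["high"] == prices["high"].max()]
    cheapest = frame.loc[low_idx, "中文学名"].tolist()
    priciest = frame.loc[high_idx, "中文学名"].tolist()
    return cheapest, priciest


# Made-up prices for illustration only
df = pd.DataFrame({
    "中文学名": ["布偶猫", "暹罗猫", "缅因猫", "孟加拉猫"],
    "参考价格": ["1000-3000", "500 - 8000", "面议", "2000-20000"],
})
cheapest, priciest = price_extremes(df)
print(cheapest, priciest)
```

The row whose price is not a numeric range is dropped before the comparison, so it can never be reported as an extreme.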

Word clouds are generated for descriptive columns using the stylecloud library. Example code for creating a general word cloud and separate clouds for personality traits and living habits is provided:

import jieba
import stylecloud
from IPython.display import Image
# (code omitted for brevity – the full snippets are present in the source)
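The word cloud code is omitted above, so the following is a rough sketch of the text preparation step only: tokenize each description, drop stopwords and very short tokens, and join the rest into one string. The helper name `cloud_text`, its parameters, and the sample sentences are assumptions; the tokenizer is passed in as an argument so the example runs without jieba installed.

```python
from collections import Counter


def cloud_text(texts, tokenize, stopwords=frozenset(), min_len=2):
    """Tokenize texts and return (space-joined tokens, token frequencies)."""
    tokens = []
    for text in texts:
        if not isinstance(text, str):
            continue  # skip missing cells such as None or NaN
        for tok in tokenize(text):
            tok = tok.strip()
            if len(tok) >= min_len and tok not in stopwords:
                tokens.append(tok)
    return " ".join(tokens), Counter(tokens)


# English stand-in data; str.split plays the role of the tokenizer here
texts = ["gentle quiet and friendly", None, "friendly and playful"]
text, freq = cloud_text(texts, tokenize=str.split, stopwords={"and"})
print(text)
print(freq.most_common(3))
```

For the Chinese descriptions in the spreadsheet, `jieba.lcut` could be passed as `tokenize`, and the resulting string handed to `stylecloud.gen_stylecloud(text=text, output_name=...)`. A font that covers Chinese characters usually has to be supplied through `font_path`.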

Finally, a mind-map-style diagram groups the breeds by body size, giving a hierarchical view of all breeds in the data set.
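The grouping behind such a diagram can be sketched as a nested name/children structure. This summary does not show the diagram's code, so the column name "体型" (body size), the root label, the sample rows, and the helper name `size_tree` are assumptions.

```python
import pandas as pd


def size_tree(frame, root="cats"):
    """Group breed names by body size into a nested name/children structure."""
    children = []
    for size, group in frame.groupby("体型"):
        leaves = [{"name": name} for name in group["中文学名"]]
        children.append({"name": size, "children": leaves})
    return [{"name": root, "children": children}]


# Made-up rows for illustration; "体型" is an assumed column name
df = pd.DataFrame({
    "中文学名": ["布偶猫", "暹罗猫", "缅因猫"],
    "体型": ["大型", "小型", "大型"],
})
tree = size_tree(df)
print(tree)
```

A list in this shape is what tree-style charts such as pyecharts' `Tree().add(series_name, data)` accept, with each body size becoming a branch and each breed a leaf.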

References:

https://juejin.cn/post/7024369534119182367

http://www.maomijiaoyi.com/
