Python Web Scraping Tutorial: Extract Quotes, Authors, and Tags from quotes.toscrape.com and Save to CSV
This tutorial demonstrates how to use Python's requests and lxml libraries to scrape quotes, authors, and tags from quotes.toscrape.com, parse the HTML with XPath, and save the extracted data into a CSV file.
Step 1: Open the target website https://quotes.toscrape.com/ and observe that each quote entry contains the quote text, author, and tags.
Step 2: Use the browser developer tools to inspect the network requests; the page is fetched with a simple GET request, so the Python requests.get() method with appropriate headers can be used to mimic a browser.
Step 3: Parse the returned HTML with lxml.etree and XPath expressions: each quote lives in its own div with class "quote" inside the main column ( //div[@class="col-md-8"] ), and the text, author, and tag elements can be extracted from each quote div.
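The XPath expressions can be tried offline first on a minimal snippet that mirrors the page's structure (the markup below is a hand-written approximation of quotes.toscrape.com, not the live page):

```python
from lxml import etree

# Hand-written approximation of one quote entry on the page
sample = """
<div class="col-md-8">
  <div class="quote">
    <span class="text">Be yourself.</span>
    <span>by <small class="author">Oscar Wilde</small></span>
    <div class="tags">
      <a class="tag">be-yourself</a>
      <a class="tag">inspirational</a>
    </div>
  </div>
</div>
"""
html = etree.HTML(sample)
# One element per quote; text(), small, and a.tag are read relative to it
for quote in html.xpath('//div[@class="col-md-8"]/div[@class="quote"]'):
    text = quote.xpath('./span[1]/text()')[0]
    author = quote.xpath('./span[2]/small/text()')[0]
    tags = quote.xpath('./div[@class="tags"]/a[@class="tag"]/text()')
    print(text, author, tags)
```

Iterating over the quote divs (rather than the whole column) keeps each quote's text, author, and tags together.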
Step 4: Store the extracted fields in a list and write them to a CSV file using the csv module.
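The CSV-writing step can be sketched on its own with the csv module (the file name, header row, and sample data below are illustrative):

```python
import csv

# One list per quote: [text, author, comma-joined tags]
rows = [
    ["Be yourself.", "Oscar Wilde", "be-yourself,inspirational"],
]

# newline='' is the documented setting for csv files on all platforms
with open("quotes_demo.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author", "tags"])  # header row
    writer.writerows(rows)
```

Passing `writer.writerows(rows)` writes one CSV line per inner list.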
The complete Python script below performs these steps.
<code>import requests
from lxml import etree
import csv

url = "https://quotes.toscrape.com/"
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
}
res = requests.get(url, headers=headers).text
html = etree.HTML(res)
# Each quote sits in its own div.quote inside the main column
quote_list = html.xpath('//div[@class="col-md-8"]/div[@class="quote"]')
rows = []
for quote in quote_list:
    # Quote text
    title = quote.xpath('./span[1]/text()')[0]
    # Author
    author = quote.xpath('./span[2]/small/text()')[0]
    # Quote tags
    tags = quote.xpath('./div[@class="tags"]/a[@class="tag"]/text()')
    # Collect one row per quote, with tags joined into a single field
    rows.append([title, author, ','.join(tags)])

with open("./quotes.csv", 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author", "tags"])
    writer.writerows(rows)</code>