Python Web Scraping Tutorial: Extract Quotes, Authors, and Tags from quotes.toscrape.com and Save to CSV
This tutorial demonstrates how to use Python's requests and lxml libraries to scrape quotes, authors, and tags from quotes.toscrape.com, parse the HTML with XPath, and save the extracted data into a CSV file.
Step 1: Open the target website https://quotes.toscrape.com/ and observe that each quote entry contains the quote text, author, and tags.
Step 2: Use the browser developer tools to inspect the network requests; the page is fetched with a simple GET request, so the Python requests.get() method with appropriate headers can be used to mimic a browser.
Step 3: Parse the returned HTML with lxml.etree and XPath expressions: each quote lives in its own div with class "quote" inside the main column ( //div[@class="col-md-8"] ), and the text, author, and tag elements can be extracted from each quote div.
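The XPath expressions can be tried offline first on a minimal snippet that mirrors the page's structure (the markup below is a hand-written approximation of quotes.toscrape.com, not the live page):

```python
from lxml import etree

# Hand-written approximation of one quote entry on the page
sample = """
<div class="col-md-8">
  <div class="quote">
    <span class="text">Be yourself.</span>
    <span>by <small class="author">Oscar Wilde</small></span>
    <div class="tags">
      <a class="tag">be-yourself</a>
      <a class="tag">inspirational</a>
    </div>
  </div>
</div>
"""
html = etree.HTML(sample)
# One element per quote; text(), small, and a.tag are read relative to it
for quote in html.xpath('//div[@class="col-md-8"]/div[@class="quote"]'):
    text = quote.xpath('./span[1]/text()')[0]
    author = quote.xpath('./span[2]/small/text()')[0]
    tags = quote.xpath('./div[@class="tags"]/a[@class="tag"]/text()')
    print(text, author, tags)
```

Iterating over the quote divs (rather than the whole column) keeps each quote's text, author, and tags together.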
Step 4: Store the extracted fields in a list and write them to a CSV file using the csv module.
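The CSV-writing step can be sketched on its own with the csv module (the file name, header row, and sample data below are illustrative):

```python
import csv

# One list per quote: [text, author, comma-joined tags]
rows = [
    ["Be yourself.", "Oscar Wilde", "be-yourself,inspirational"],
]

# newline='' is the documented setting for csv files on all platforms
with open("quotes_demo.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author", "tags"])  # header row
    writer.writerows(rows)
```

Passing `writer.writerows(rows)` writes one CSV line per inner list.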
The complete Python script below performs these steps.
<code>import requests
from lxml import etree
import csv

url = "https://quotes.toscrape.com/"
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'
}
res = requests.get(url, headers=headers).text
html = etree.HTML(res)
# Each quote sits in its own div.quote inside the main column
quote_list = html.xpath('//div[@class="col-md-8"]/div[@class="quote"]')
rows = []
for quote in quote_list:
    # Quote text
    title = quote.xpath('./span[1]/text()')[0]
    # Author
    author = quote.xpath('./span[2]/small/text()')[0]
    # Quote tags
    tags = quote.xpath('./div[@class="tags"]/a[@class="tag"]/text()')
    # Collect one row per quote, with tags joined into a single field
    rows.append([title, author, ','.join(tags)])

with open("./quotes.csv", 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author", "tags"])
    writer.writerows(rows)</code>