Python Web Scraping Tutorial: Using requests and BeautifulSoup to Extract Weather Data
This article shows how to use Python's requests library and BeautifulSoup to scrape weather data: inspecting a page's source, setting request headers, fetching the weather page HTML, parsing it with CSS selectors, extracting daytime and nighttime temperatures, and extending the script to handle multiple cities. Complete code examples are included.
This guide introduces three essential web-scraping techniques: inspecting page source and elements, fetching pages with the requests library, and parsing HTML with BeautifulSoup. It walks through retrieving the weather page for Beijing, extracting the daytime and nighttime temperatures, and then generalizing the script to several Chinese cities.
First, the script sets a custom User-Agent header, sends a GET request to the weather URL, forces UTF-8 encoding, and obtains the raw HTML. The HTML is parsed with the lxml parser into a BeautifulSoup object whose tree mirrors what the browser's "Inspect Element" view shows.
Using the CSS selector p.tem span, the script selects the temperature elements, extracts their text, and prints the results.
Code example for a single city (Beijing):
# -*- coding: utf-8 -*-
__author__ = 'duohappy'

import requests                  # HTTP requests
from bs4 import BeautifulSoup    # HTML parsing

# Set request headers with a common User-Agent
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36"}

url = "http://www.weather.com.cn/weather1d/101010100.shtml"  # Beijing's city code
web_data = requests.get(url, headers=headers)
web_data.encoding = 'utf-8'      # force UTF-8 to avoid garbled Chinese text
content = web_data.text

soup = BeautifulSoup(content, 'lxml')
tag_list = soup.select('p.tem span')   # the temperature <span> elements

day_temp = tag_list[0].text
night_temp = tag_list[1].text
# "Daytime temperature is {0}°C / Nighttime temperature is {1}°C"
print('白天温度为{0}℃\n晚上温度为{1}℃'.format(day_temp, night_temp))
To scrape multiple cities, a dictionary maps city names to their weather codes. The user inputs a city name, the URL is formatted accordingly, and the same extraction logic is applied.
Code example for multiple cities:
# -*- coding: utf-8 -*-
__author__ = 'duohappy'

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36"}

# Map city names (Beijing, Shanghai, Shenzhen, Guangzhou, Hangzhou) to their weather codes
weather_code = {'北京': '101010100', '上海': '101020100', '深圳': '101280601',
                '广州': '101280101', '杭州': '101210101'}

city = input('请输入城市名:')  # "Enter a city name" — only accepts the listed cities
url = "http://www.weather.com.cn/weather1d/{}.shtml".format(weather_code[city])
web_data = requests.get(url, headers=headers)
web_data.encoding = 'utf-8'
content = web_data.text

soup = BeautifulSoup(content, 'lxml')
tag_list = soup.select('p.tem span')

day_temp = tag_list[0].text
night_temp = tag_list[1].text
print('白天温度为{0}℃\n晚上温度为{1}℃'.format(day_temp, night_temp))
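As written, the lookup weather_code[city] raises a KeyError for any city not in the dictionary. The sketch below is not part of the original script; it shows one way to validate the input first, using a hypothetical build_url helper:

```python
# -*- coding: utf-8 -*-
# Hypothetical helper: validate a city name against the code table
# before building the URL, instead of letting weather_code[city] raise KeyError.

weather_code = {'北京': '101010100', '上海': '101020100', '深圳': '101280601',
                '广州': '101280101', '杭州': '101210101'}

def build_url(city):
    """Return the weather URL for a known city, or None if unlisted."""
    code = weather_code.get(city)   # .get() returns None instead of raising
    if code is None:
        return None
    return "http://www.weather.com.cn/weather1d/{}.shtml".format(code)

print(build_url('北京'))  # the Beijing URL
print(build_url('成都'))  # Chengdu is not in the table → None
```

The caller can then re-prompt the user when build_url returns None rather than crashing mid-run.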
The article also briefly covers using BeautifulSoup methods like find and find_all with regular expressions to extract text from specific tags, emphasizing that many web‑page contents are directly embedded in the HTML source.
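The original find/find_all snippet is not reproduced in this summary; the sketch below uses a made-up HTML fragment (and the stdlib-backed 'html.parser' instead of lxml, to stay self-contained) to illustrate the pattern described:

```python
import re
from bs4 import BeautifulSoup

# A made-up HTML fragment standing in for a fetched page source
html = '''
<div>
  <p class="tem"><span>25</span><i>℃</i></p>
  <p class="tem"><span>18</span><i>℃</i></p>
  <a href="/weather1d/101010100.shtml">北京</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# find() returns only the first matching tag
first = soup.find('p', class_='tem')
print(first.span.text)   # 25

# find_all() accepts a compiled regex to match attribute values by pattern
links = soup.find_all('a', href=re.compile(r'weather1d/\d+\.shtml'))
print([a.text for a in links])   # ['北京']
```

Because the temperatures and links sit directly in the served HTML, no JavaScript rendering is needed; plain requests plus BeautifulSoup is enough.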