Comprehensive Guide to Python urllib Library: Modules, Functions, and Usage Examples
This article provides a detailed tutorial on Python's urllib library, covering its main modules (request, error, parse, robotparser), key functions and classes, code examples for URL fetching, parsing, encoding, and handling robots.txt, making it a practical resource for backend developers and web scrapers.
Python's urllib library provides tools for handling URLs and fetching web content.
The library consists of several modules: urllib.request for opening and reading URLs, urllib.error for handling exceptions, urllib.parse for parsing and constructing URLs, and urllib.robotparser for interpreting robots.txt files.
urllib.request offers functions such as urlopen and the Request class, allowing custom headers, authentication, and timeout settings. Example:
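<code>import urllib.request

# Open the URL; the with-block closes the connection when done
with urllib.request.urlopen("https://www.baidu.com") as response:
    print(response.read().decode('utf-8'))</code>
Common methods of the response object include read(), readline(), info(), getcode(), and geturl(). Since Python 3.9 the last three are deprecated in favor of the headers, status, and url attributes.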
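The Request class mentioned above can be sketched as follows. This is a minimal illustration, not a definitive pattern: the helper names build_request and fetch_text, the example.com URL, and the "demo-client/0.1" User-Agent string are all assumptions made for the example. Only the request construction runs here; the network call is wrapped in a function and left uncalled.

```python
import urllib.request


def build_request(url, user_agent="demo-client/0.1"):
    """Return a GET Request carrying a custom User-Agent header.

    The helper name and default user agent are hypothetical.
    """
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent},
        method="GET",
    )


def fetch_text(req, timeout=10):
    """Send the request and decode the body as UTF-8 (needs network access)."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read().decode("utf-8")


req = build_request("https://www.example.com/")
print(req.get_method())
# Request stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))
# fetch_text(req) would perform the actual network call.
```

Passing timeout to urlopen keeps a stalled server from blocking the caller indefinitely; the value of 10 seconds is arbitrary.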
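urllib.error defines the URLError and HTTPError exceptions. URLError is raised when the server cannot be reached (for example, a DNS failure or a refused connection), while HTTPError represents an HTTP error status such as 404 or 500. HTTPError is a subclass of URLError, so an except clause for HTTPError must come before one for URLError.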
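The subclass relationship can be checked without any network access. The sketch below builds both exception types by hand and classifies them; the describe_error helper, the example.com URL, and the error messages are hypothetical and exist only for this illustration.

```python
import io
from email.message import Message
from urllib import error


def describe_error(exc):
    """Classify an exception, testing the more specific HTTPError first."""
    if isinstance(exc, error.HTTPError):
        return f"HTTP {exc.code}: {exc.reason}"
    if isinstance(exc, error.URLError):
        return f"network failure: {exc.reason}"
    return f"unexpected: {exc!r}"


# Hand-built exceptions standing in for what urlopen would raise
http_exc = error.HTTPError(
    "https://www.example.com/missing", 404, "Not Found", Message(), io.BytesIO()
)
url_exc = error.URLError("connection refused")

print(issubclass(error.HTTPError, error.URLError))  # True
print(describe_error(http_exc))  # HTTP 404: Not Found
print(describe_error(url_exc))   # network failure: connection refused
```

If the two isinstance checks were swapped, the 404 would be reported as a generic network failure, which is the same problem as listing except URLError before except HTTPError.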
Example handling:
<code>from urllib import request, error

try:
    response = request.urlopen("http://invalid.url")
except error.HTTPError as e:
    # HTTPError subclasses URLError, so it must be caught first
    print(e.code)
except error.URLError as e:
    # Raised when the server cannot be reached at all
    print(e.reason)</code>
urllib.parse provides functions for URL parsing (urlparse, urlsplit) and construction (urlunparse, urlunsplit), as well as encoding utilities (quote, urlencode, unquote). Example parsing:
<code>from urllib.parse import urlparse
o = urlparse("https://docs.python.org/3/library/urllib.parse.html")
print('scheme:', o.scheme)
print('netloc:', o.netloc)</code>
Encoding a query string:
<code>from urllib import parse
query = parse.urlencode({'wd':'爬虫'})
url = f"http://www.baidu.com/s?{query}"
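print(url)</code>
urllib.robotparser parses robots.txt files to determine crawling permissions. Its RobotFileParser class offers methods such as set_url, which points the parser at a robots.txt URL, read, which fetches and parses that file, and can_fetch, which reports whether a given user agent may fetch a given URL.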
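A minimal sketch of can_fetch follows. To stay runnable offline it feeds rules to the parser with parse() instead of calling set_url and read; the robots.txt content, the "demo-bot" user agent, and the example.com URLs are assumptions made for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a live crawler would instead call
# parser.set_url("https://www.example.com/robots.txt") and parser.read()
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Return a RobotFileParser loaded from in-memory robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# Allowed: no rule matches this path
print(parser.can_fetch("demo-bot", "https://www.example.com/index.html"))
# Disallowed: the path falls under /private/
print(parser.can_fetch("demo-bot", "https://www.example.com/private/data.html"))
```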
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.