Comprehensive List of Python Libraries for Web Crawling, Web Development, and Related Technologies
This article surveys Python libraries and frameworks for web crawling and related tasks: HTTP handling, HTML parsing, text processing, asynchronous programming, queue management, cloud execution, WebSocket communication, DNS resolution, computer vision, and proxy servers. It also covers popular web frameworks such as Django, Flask, Web2py, Tornado, and CherryPy, to help developers choose appropriate tools for backend development.
Python learners often start with web crawling because abundant resources and open‑source projects exist.
Web crawling can be divided into three major stages: fetching, parsing, and storing.
When a URL is entered in a browser, four steps occur: domain name resolution, sending a request to the server, receiving the response, and browser rendering.
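The three crawling stages (fetching, parsing, storing) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not a production crawler: the names LinkParser, fetch, crawl_page, and fake_fetcher, the links table, and the example URL are all hypothetical, and the fetch step is passed in as a function so the sketch runs without network access.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Parsing stage: collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Fetching stage: download a page and return its text (needs network)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl_page(url, conn, fetcher=fetch):
    """Run fetch -> parse -> store for one page and return the links found."""
    html = fetcher(url)
    parser = LinkParser()
    parser.feed(html)
    # Storing stage: one row per (page, link) pair.
    conn.execute("CREATE TABLE IF NOT EXISTS links (page TEXT, href TEXT)")
    conn.executemany(
        "INSERT INTO links (page, href) VALUES (?, ?)",
        [(url, href) for href in parser.links],
    )
    conn.commit()
    return parser.links


def fake_fetcher(url):
    """Hypothetical stand-in for fetch() that returns a fixed page."""
    return '<html><body><a href="/a">A</a> <a href="/b">B</a></body></html>'


conn = sqlite3.connect(":memory:")
found = crawl_page("http://example.com/", conn, fetcher=fake_fetcher)
stored = conn.execute("SELECT href FROM links ORDER BY href").fetchall()
```

Omitting the fetcher argument makes crawl_page use the real fetch function, which downloads the page over the network.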
Common networking libraries:
urllib (stdlib)
requests
grab (based on pycurl)
pycurl
urllib3
httplib2
RoboBrowser
MechanicalSoup
mechanize
socket (stdlib)
Unirest for Python
hyper (HTTP/2 client)
PySocks
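As one illustration of what these libraries handle, the standard-library urllib can assemble a request with query parameters and custom headers before anything is sent. The sketch below only builds the request object. The build_request helper, the URL, the parameter names, and the User-Agent string are placeholder assumptions.

```python
import urllib.parse
import urllib.request


def build_request(base_url, params, user_agent):
    """Return a urllib Request for base_url with an encoded query string."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url}?{query}" if query else base_url
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request(
    "http://example.com/search",         # placeholder URL
    {"q": "python crawler", "page": 2},  # hypothetical query parameters
    "example-crawler/0.1",               # hypothetical User-Agent
)
full_url = request.full_url
agent = request.get_header("User-agent")

# Sending the request would require network access:
# with urllib.request.urlopen(request, timeout=10) as response:
#     body = response.read()
```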
Web crawling frameworks:
grab
scrapy (Twisted-based; supports Python 3 since version 1.1)
pyspider
cola (distributed)
portia (visual, based on Scrapy)
restkit
demiurge
HTML/XML parsers:
lxml
cssselect
pyquery
BeautifulSoup
html5lib
feedparser
MarkupSafe
xmltodict
xhtml2pdf
untangle
Text processing libraries:
difflib (stdlib)
Levenshtein
fuzzywuzzy
esmre
ftfy
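For a sense of what fuzzy string matching looks like in practice, here is a small sketch using the standard-library difflib. The similarity helper and the sample titles are made up for illustration. Libraries such as Levenshtein or fuzzywuzzy provide their own scoring functions, which may rank strings differently.

```python
import difflib


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


titles = ["Python Web Crawling", "Python Web Scraping", "Django Tutorial"]

# Closest candidates to a misspelled query, best match first.
matches = difflib.get_close_matches("Pyhton Web Crawling", titles, n=2, cutoff=0.6)
same = similarity("crawler", "crawler")
```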
Natural language processing:
NLTK
Pattern
TextBlob
jieba
SnowNLP
loso
Browser automation:
selenium
Ghost.py
Spynner
Splinter
Multiprocessing and concurrency:
threading (stdlib)
multiprocessing (stdlib)
celery
concurrent.futures (stdlib since Python 3.2; available for Python 2 as the futures backport)
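A common reason to reach for these tools in a crawler is to process many URLs in parallel. The sketch below uses a thread pool from concurrent.futures. The process_url function is a hypothetical stand-in that performs no real network I/O, so the example stays runnable offline.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def process_url(url):
    """Stand-in for a download: return the host name of the URL."""
    return urlparse(url).netloc


urls = [
    "http://example.com/a",
    "http://example.org/b",
    "http://example.net/c",
]

# executor.map preserves input order even when tasks finish out of order.
with ThreadPoolExecutor(max_workers=3) as executor:
    hosts = list(executor.map(process_url, urls))
```

Threads suit I/O-bound work such as downloads. For CPU-bound parsing, ProcessPoolExecutor from the same module offers a similar interface.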
Asynchronous networking libraries:
asyncio (stdlib)
Twisted
Tornado
pulsar
diesel
gevent
eventlet
Tomorrow
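Broadly, these libraries let many I/O-bound tasks wait concurrently instead of blocking one another. A minimal sketch with the standard-library asyncio follows. The fake_fetch coroutine and its delays are invented placeholders that simulate network latency with asyncio.sleep.

```python
import asyncio


async def fake_fetch(name, delay):
    """Simulate a network call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather runs the coroutines concurrently and returns results in call order.
    return await asyncio.gather(
        fake_fetch("page-1", 0.02),
        fake_fetch("page-2", 0.01),
    )


results = asyncio.run(main())
```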
Queue systems:
celery
huey
mrq
RQ
simpleq
python‑gearman
Cloud execution services:
picloud
dominoup.com
Web content extraction:
newspaper
html2text
python‑goose
lassie
WebSocket libraries:
Crossbar
AutobahnPython
WebSocket‑for‑Python
DNS utilities:
dnsyo
pycares
Computer vision:
OpenCV
SimpleCV
mahotas
Proxy tools:
shadowsocks
tproxy
Popular Python web frameworks:
Django – full‑featured, database‑agnostic framework
Flask – lightweight microframework based on Werkzeug and Jinja2
Web2py – rapid‑development framework with built‑in admin
Tornado – asynchronous web server and microframework
CherryPy – minimalistic framework with plugin system
When choosing a framework, avoid the trap of seeking the “best” one; select the one that fits your team’s expertise and project requirements, and don’t over‑focus on performance for low‑traffic sites.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.