Running Scrapy Crawlers: Command‑Line, CrawlerProcess, and CrawlerRunner Approaches
This tutorial demonstrates how to execute Scrapy spiders from the command line, run them within Python files using cmdline, and manage single or multiple spiders with CrawlerProcess and CrawlerRunner, highlighting configuration steps, limitations, and best‑practice recommendations.
This guide explains several ways to run Scrapy crawlers, starting with a simple command‑line execution of a spider file (e.g., baidu.py) and showing two possible command‑line methods.
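The two command‑line options look like this; the spider name and file `baidu` come from the article's example, and the second form assumes a standalone spider file rather than a full project:

```shell
# Inside a Scrapy project: run the spider registered under the name "baidu"
scrapy crawl baidu

# Without a project: run a standalone spider file directly
scrapy runspider baidu.py
```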
Next, the article covers running a spider from within a Python file using the cmdline.execute approach, illustrated with screenshots.
It then introduces the CrawlerProcess method for running a spider programmatically, followed by the CrawlerRunner technique, each accompanied by visual examples.
The guide proceeds to running multiple spiders in a single project. It shows that the cmdline method cannot execute multiple spiders sequentially, because cmdline.execute terminates the process as soon as the first spider finishes.
Two more elegant solutions are presented: using CrawlerProcess to run multiple spiders concurrently and using CrawlerRunner to run them one after another, which reduces interference and is recommended by the official documentation.
Finally, the summary notes that cmdline.execute offers the simplest configuration for single‑file spiders, allowing one-time setup with repeated runs, while CrawlerRunner provides a safer way to run multiple spiders sequentially.
Python Programming Learning Circle