Running Scrapy Spiders via Command Line, CrawlerProcess, and CrawlerRunner
This guide explains how to execute Scrapy spiders from the command line, within Python scripts using CrawlerProcess or CrawlerRunner, and how to manage multiple spiders efficiently, highlighting configuration steps, execution methods, and practical observations about middleware behavior.
1. Running a Spider from the Command Line
Create a spider file (e.g., baidu.py) and run it from the terminal; there are two possible approaches.
2. Running a Spider Inside a Python File
Three methods are demonstrated:
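• cmdline.execute – the simplest way to launch a single spider; it takes the same arguments as the scrapy command and exits the interpreter when the crawl finishes.
• CrawlerProcess – runs a spider programmatically and starts and stops the Twisted reactor for you.
• CrawlerRunner – provides more control over the crawling process, but leaves starting and stopping the Twisted reactor to your script.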
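The three methods above can be sketched roughly as follows. This is a minimal sketch, not the original article's code: it assumes the script sits in the Scrapy project root (next to scrapy.cfg) and that the project contains a spider named baidu. The helper names (crawl_argv, run_with_cmdline, and so on) are made up for illustration. Scrapy is imported inside each function so that only the chosen method is loaded. The reactor handling in the CrawlerRunner variant assumes a Scrapy version that provides scrapy.utils.reactor.install_reactor and a project that may set TWISTED_REACTOR.

```python
SPIDER_NAME = "baidu"  # assumed spider name, matching the baidu.py example


def crawl_argv(spider_name):
    """Build the argument list that `scrapy crawl <name>` would receive."""
    return ["scrapy", "crawl", spider_name]


def run_with_cmdline(spider_name=SPIDER_NAME):
    """Method 1: same effect as typing `scrapy crawl baidu` in a terminal."""
    from scrapy import cmdline

    # execute() ends with sys.exit(), so nothing after this call runs.
    cmdline.execute(crawl_argv(spider_name))


def run_with_crawler_process(spider_name=SPIDER_NAME):
    """Method 2: CrawlerProcess manages the Twisted reactor itself."""
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl(spider_name)  # schedule the spider by its project name
    process.start()  # start the reactor; blocks until the crawl finishes


def run_with_crawler_runner(spider_name=SPIDER_NAME):
    """Method 3: CrawlerRunner leaves the reactor to the calling script."""
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings
    from scrapy.utils.reactor import install_reactor

    settings = get_project_settings()
    # Assumption: if the project requests a specific reactor, install it
    # before twisted.internet.reactor is imported for the first time.
    reactor_path = settings.get("TWISTED_REACTOR")
    if reactor_path:
        install_reactor(reactor_path)
    from twisted.internet import reactor

    configure_logging()  # CrawlerRunner does not set up logging on its own
    runner = CrawlerRunner(settings)
    finished = runner.crawl(spider_name)  # returns a Deferred
    finished.addBoth(lambda _: reactor.stop())  # stop on success or failure
    reactor.run()  # blocks until reactor.stop() is called
```

Call exactly one of the run_with_* functions per process: the Twisted reactor cannot be restarted once it has stopped, and cmdline.execute terminates the interpreter.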
3. Running Multiple Spiders in One Project
Calling cmdline.execute once per spider does not work: it ends with sys.exit(), so the script terminates when the first spider finishes and the remaining calls never run.
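The effect can be reproduced without Scrapy. In the sketch below, fake_execute is a hypothetical stand-in that mimics only the one relevant behavior of cmdline.execute (calling sys.exit() when it is done), and the second spider name, sina, is invented for illustration. SystemExit is caught here purely so the demonstration can report what happened; in a real script the interpreter would simply exit.

```python
import sys


def fake_execute(argv):
    """Hypothetical stand-in for scrapy.cmdline.execute: work, then exit."""
    print("crawling", argv[-1])
    sys.exit(0)  # the real function also ends by calling sys.exit()


def run_all(spider_names):
    """Try to launch every spider in turn; return the ones actually started."""
    started = []
    try:
        for name in spider_names:
            started.append(name)
            fake_execute(["scrapy", "crawl", name])
    except SystemExit:
        # Caught only for illustration: the loop is abandoned at this point.
        pass
    return started


print(run_all(["baidu", "sina"]))  # prints ['baidu']; "sina" never starts
```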
Two better alternatives are presented:
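• Using CrawlerProcess to start several spiders concurrently in one process. In the author's tests the middleware appeared to be initialized only once, and because the spiders' requests are sent almost simultaneously, the spiders may interfere with one another.
• Using CrawlerRunner to run spiders sequentially, starting each one only after the previous one has finished. The author again observed the middleware being loaded once, but sequential execution reduces interference; this pattern of chaining the crawls is described in the official Scrapy documentation.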
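Both alternatives can be sketched as follows, under the same assumptions as before: the script runs from the project root, the spiders are addressed by their project names, and the name sina and the function names are hypothetical. The optional process parameter exists only so the scheduling logic can be exercised with a stand-in object. The reactor handling assumes a Scrapy version that provides scrapy.utils.reactor.install_reactor.

```python
SPIDER_NAMES = ["baidu", "sina"]  # "sina" is an invented second spider


def run_concurrently(spider_names, process=None):
    """Schedule every spider on one CrawlerProcess, then start them together."""
    if process is None:
        from scrapy.crawler import CrawlerProcess
        from scrapy.utils.project import get_project_settings

        process = CrawlerProcess(get_project_settings())
    for name in spider_names:
        process.crawl(name)  # only schedules; nothing runs yet
    process.start()  # all scheduled spiders now run in the same reactor


def run_sequentially(spider_names):
    """Run spiders one after another with CrawlerRunner."""
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings
    from scrapy.utils.reactor import install_reactor

    settings = get_project_settings()
    # Assumption: install the reactor the project asks for, if any, before
    # twisted.internet.reactor is imported for the first time.
    reactor_path = settings.get("TWISTED_REACTOR")
    if reactor_path:
        install_reactor(reactor_path)
    from twisted.internet import defer, reactor

    configure_logging()
    runner = CrawlerRunner(settings)

    @defer.inlineCallbacks
    def crawl_all():
        try:
            for name in spider_names:
                # Waiting on the Deferred keeps the next spider from
                # starting until this one has finished.
                yield runner.crawl(name)
        finally:
            reactor.stop()  # stop even if a crawl fails

    crawl_all()
    reactor.run()  # blocks until crawl_all() stops the reactor
```

As with the single-spider case, only one of these functions should be called per process, since the reactor cannot be restarted.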
Conclusion
The cmdline.execute method offers the simplest configuration for running a single spider, while CrawlerProcess and CrawlerRunner are the more flexible options for running multiple spiders, provided the middleware behavior described above is taken into account.
Python Programming Learning Circle