Running Scrapy Crawlers: Command‑Line, CrawlerProcess, and CrawlerRunner Approaches
This tutorial demonstrates how to execute Scrapy spiders from the command line, run them within Python files using cmdline, and manage single or multiple spiders with CrawlerProcess and CrawlerRunner, highlighting configuration steps, limitations, and best‑practice recommendations.
This guide explains several ways to run Scrapy crawlers, starting with a simple command‑line execution of a spider file (e.g., baidu.py) and showing two possible command‑line methods.
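The two command‑line options look like this; the spider name and file `baidu` come from the article's example, and the second form assumes a standalone spider file rather than a full project:

```shell
# Inside a Scrapy project: run the spider registered under the name "baidu"
scrapy crawl baidu

# Without a project: run a standalone spider file directly
scrapy runspider baidu.py
```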
Next, the article covers running a spider from within a Python file using the cmdline.execute approach, illustrated with screenshots.
It then introduces the CrawlerProcess method for running a spider programmatically, followed by the CrawlerRunner technique, each accompanied by visual examples.
The guide proceeds to running multiple spiders in a single project. It shows that the cmdline method cannot execute multiple spiders sequentially, because cmdline.execute terminates the process as soon as the first spider finishes.
Two more elegant solutions are presented: using CrawlerProcess to run multiple spiders concurrently and using CrawlerRunner to run them one after another, which reduces interference and is recommended by the official documentation.
Finally, the summary notes that cmdline.execute offers the simplest configuration for single‑file spiders, allowing one-time setup with repeated runs, while CrawlerRunner provides a safer way to run multiple spiders sequentially.
Python Programming Learning Circle