Tagged articles
MaGe Linux Operations
Oct 27, 2020 · Backend Development

Build a Distributed Scrapy Crawler in Minutes with RabbitMQ and RedisBloom

This guide walks you through installing Scrapy-Distributed, running RabbitMQ and RedisBloom in containers, creating a sitemap spider, configuring the distributed scheduler and dupefilter, and running the spider. It also explains why this non-intrusive solution improves on the existing Scrapy-Redis and scrapy-rabbitmq approaches.

Python · RabbitMQ · RedisBloom
7 min read
Efficient Ops
Mar 30, 2017 · Backend Development

Designing a Scalable, Configurable Distributed Web Crawler

This article outlines the motivation, requirements, modular decomposition, and architecture of a distributed web crawling platform that emphasizes reusability, lightweight modules, real-time monitoring, and easy configuration for diverse data-collection tasks.

Backend Architecture · Monitoring · Configuration
10 min read