Tag

web crawler

1 view collected around this technical thread.

Rare Earth Juejin Tech Community
Sep 26, 2024 · Backend Development

Step-by-Step Guide to Building a Spring Boot Backend and Douyin Hot Search Crawler

This tutorial walks through creating a Maven‑based Spring Boot backend project with multiple modules, configuring pom.xml files, application properties, and logging, then adds a scheduled Douyin hot‑search crawler using OkHttp, demonstrating full end‑to‑end setup for a web service.

Douyin · Java · Maven
0 likes · 31 min read
php中文网 Courses
Jan 18, 2024 · Backend Development

Building an Efficient Web Crawler with PHP and Selenium

This article explains how to set up a web crawler using PHP and Selenium, covering installation of Selenium and its PHP bindings via Composer, configuring a Chrome WebDriver, simulating user actions to fetch news links, extracting titles and content, and storing results, with tips for further optimization.

Automation · Selenium · php
0 likes · 4 min read
php中文网 Courses
Dec 14, 2023 · Backend Development

Building a Simple Web Crawler with PHP on Linux

This article explains how to create a basic web crawler in a Linux environment using PHP, covering prerequisite installations, script development with cURL and DOMDocument, execution steps, and sample output while emphasizing legal and ethical considerations for web scraping.

Curl · DOMDocument · Linux
0 likes · 4 min read
php中文网 Courses
May 4, 2023 · Backend Development

How to Write a Simple PHP Web Crawler

This guide explains how to create a basic PHP web crawler by using cURL to fetch pages, DOMDocument and XPath to parse HTML, and then storing the extracted data, while also providing a complete example script and reminders about legal and ethical considerations.

Curl · DOMDocument · backend development
0 likes · 3 min read
php中文网 Courses
Apr 10, 2023 · Backend Development

A PHP Web Crawler: Design, Implementation, and Challenges

This article describes a PHP‑based web crawler that extracts links and images using regular expressions, stores URLs in MySQL, handles duplicate detection via MD5, discusses performance limitations, and provides the full source code and usage instructions.

MySQL · URL processing · backend development
0 likes · 8 min read
Python Programming Learning Circle
Dec 31, 2021 · Information Security

Photon: High‑Efficiency Multithreaded Web Crawler – Features, Compatibility, and Usage Guide

Photon is a fast, multithreaded Python web crawler that extracts URLs, files, and various intelligence from targets, offering flexible options, Ninja mode, and extensive command‑line parameters while supporting Linux, Windows, macOS, and Termux environments.

Information Security · Python · command-line
0 likes · 10 min read
Python Programming Learning Circle
Apr 25, 2020 · Backend Development

Building a Node.js Web Crawler for Indeed Job Listings with MongoDB

This article details how to build a Node.js web crawler for Indeed job listings, covering entry page selection, HTML parsing with Cheerio, request handling, MongoDB task storage, and a modular architecture that extracts city, category, search, brief, and detail data for a searchable job engine.

MongoDB · backend · indeed
0 likes · 15 min read
Java Architecture Diary
Aug 2, 2019 · Backend Development

Mastering Mica-HTTP v1.1.7: A Lightweight Web Crawler Guide

This tutorial continues the mica-http complete guide and covers the v1.1.7 release: proxy and retry support, page crawling, model visualization, and sample results, along with documentation links and open‑source tool recommendations for building backend crawlers.

HTTP · Java · backend
0 likes · 3 min read
Sohu Tech Products
Dec 5, 2018 · Backend Development

Overview of Web Crawler Types and the Architecture of the Mole Crawler System

This article explains the evolution and classification of web crawlers, describes the design and components of the Mole distributed crawler—including scheduler, fetcher, processor, rate‑limiting, URL deduplication, and Elasticsearch storage optimization—and outlines common anti‑anti‑crawling strategies.

Elasticsearch · URL deduplication · anti-crawling
0 likes · 12 min read
Tencent IMWeb Frontend Team
Jan 18, 2018 · Backend Development

Build a Simple Node.js Web Crawler in 16 Lines with Request & Cheerio

This guide walks you through creating a lightweight Node.js web crawler using the request and cheerio modules, covering preparation, installation, core code, and testing steps, so you can fetch page HTML, parse data, and store results with very little code.

JavaScript · Node.js · cheerio
0 likes · 5 min read
Architecture Digest
Jan 17, 2018 · Backend Development

Design and Implementation of a Java Web Crawler Framework Inspired by Scrapy

This article explains how to design and build a lightweight Java web crawler framework, covering crawler fundamentals, anti‑scraping challenges, core components such as URL manager, scheduler, downloader, parser and pipeline, and provides concrete code examples and architectural diagrams.

Java · Scrapy · architecture
0 likes · 14 min read