Build a Simple Node.js Web Crawler in 16 Lines with Request & Cheerio
This guide walks you through creating a lightweight Node.js web crawler with the request and cheerio modules. It covers preparation, installation, the core code, and testing, so you can fetch page HTML, parse the data, and store the results in about 16 lines of code.
Since Node.js appeared, developers have used it for tasks traditionally handled by backend languages such as PHP or Python, including writing web crawlers. This tutorial shows how to build a simple crawler in about 16 lines of code.
Crawler Overview
A crawler works in three steps:
1. Send HTTP requests to obtain the page HTML (optionally adding headers such as Cookie or Referer).
2. Parse the HTML, using regular expressions or a third-party module, to extract the useful data.
3. Persist the extracted data to a database or file.
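Step 2 mentions regular expressions as one way to parse the HTML. The following is a minimal sketch of that approach in plain JavaScript with no extra modules. extractLinks is a hypothetical helper, and the pattern assumes links are written as <a ... href="..."> with double-quoted attributes.

```javascript
// Hypothetical helper (not part of the original tutorial): pull
// <a href="...">text</a> pairs out of raw HTML with a regular expression.
function extractLinks(html) {
  // Matches <a ... href="URL" ...>TEXT</a>; [\s\S]*? lets TEXT span lines.
  const linkRe = /<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  const results = [];
  for (const match of html.matchAll(linkRe)) {
    results.push({
      href: match[1],
      // Strip any tags nested inside the link text and collapse whitespace.
      title: match[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim(),
    });
  }
  return results;
}
```

Patterns like this break easily on real-world markup (single-quoted attributes, comments, inline scripts), which is one reason the rest of this guide uses cheerio for parsing.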
Preparation Stage
NPM
Install the two required modules, request and cheerio, by running this command in your project directory:
npm install request cheerio
After installation, your package.json will list both as dependencies.
crawler.js
Create a file named crawler.js and require the installed modules:
<code>const request = require('request');
const cheerio = require('cheerio');
</code>
Learning Stage
REQUEST
The request module is a simplified HTTP client that wraps Node's built-in http.request, making it easy to download resources.
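Because request is a wrapper around http.request, it can help to see roughly what it saves you. Below is a sketch of the same kind of GET written against Node's built-in http and https modules, including the optional Cookie and Referer headers from step 1 of the overview. fetchHtml is a hypothetical name, and the sketch deliberately skips redirects and compressed responses. Note also that the request package has been deprecated since 2020; on Node.js 18 and later, the global fetch() is another option.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: GET a page with the built-in modules, buffer the body,
// and optionally send Cookie and Referer headers. Redirects and gzip are not
// handled here.
function fetchHtml(url, { cookie, referer } = {}) {
  const headers = {};
  if (cookie) headers.Cookie = cookie;
  if (referer) headers.Referer = referer;
  // Pick the client module that matches the URL scheme.
  const client = new URL(url).protocol === 'https:' ? https : http;

  return new Promise((resolve, reject) => {
    const req = client.get(url, { headers }, (res) => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`GET ${url} failed with status ${res.statusCode}`));
        return;
      }
      res.setEncoding('utf8');
      let body = '';
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    });
    req.on('error', reject); // DNS failures, refused connections, etc.
  });
}
```

Usage: fetchHtml('https://example.com/', { referer: 'https://example.com/' }).then((html) => console.log(html.length)).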
CHEERIO
cheerio provides a server-side implementation of jQuery's core API, letting you query and manipulate the DOM of fetched HTML quickly and flexibly.
Construction Stage
Use request to fetch the target page (for example, an article list on site A), then parse the response with cheerio to extract the desired information.
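The construction stage chains fetching and parsing into one flow. A hedged sketch of that flow follows. scrapeList is a hypothetical function, and the fetch and parse steps are passed in as parameters so the same skeleton works with request and cheerio or with built-in alternatives. It also assumes each parsed item has an href, and it resolves relative links against the page URL with the standard URL class.

```javascript
// Hypothetical skeleton of the construction stage: download one list page,
// hand its HTML to a parse function, and normalize the extracted links.
// fetchHtml and parse are injected, so request/cheerio can be plugged in.
async function scrapeList(pageUrl, { fetchHtml, parse }) {
  const html = await fetchHtml(pageUrl); // step 1: get the page HTML
  const items = parse(html);             // step 2: e.g. [{ title, href }]
  return items.map((item) => ({
    ...item,
    // Article lists often use relative links such as "/post/1";
    // resolve them against the page URL so the output holds full URLs.
    href: new URL(item.href, pageUrl).href,
  }));
}
```

With the tutorial's modules, fetchHtml could wrap request(pageUrl, (error, response, body) => ...) in a Promise, and parse could call cheerio.load(html) and gather fields inside $(selector).each(...); the actual selectors depend on site A's markup.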
Finally, write the extracted results to result.json:
<code>const fs = require('fs');
// `data` holds the results extracted with cheerio in the previous step
fs.writeFileSync('result.json', JSON.stringify(data, null, 2));
</code>
Experiment Stage
Run the crawler with node crawler.js. When it finishes, a result.json file containing the scraped data should appear in your project directory.
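If you would rather have the experiment stage fail loudly than rely on a visual check, a small sanity check like the one below can run after node crawler.js. It assumes the crawler writes a JSON array of items; checkResult is a hypothetical helper.

```javascript
const fs = require('fs');

// Hypothetical sanity check: confirm that the output file exists, contains
// valid JSON, and holds a non-empty array (assumed output shape).
function checkResult(file = 'result.json') {
  if (!fs.existsSync(file)) {
    throw new Error(`${file} was not created; check the crawler's output`);
  }
  const data = JSON.parse(fs.readFileSync(file, 'utf8')); // throws on malformed JSON
  if (!Array.isArray(data) || data.length === 0) {
    throw new Error(`${file} does not contain any scraped items`);
  }
  return data.length; // number of scraped items
}
```

For example, a separate script calling console.log(checkResult()) prints the number of scraped items, or throws if result.json is missing, malformed, or empty.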
Congratulations, you have built a functional web crawler with only about 16 lines of code.
Tencent IMWeb Frontend Team