QueryList: A Modern PHP Content Scraping Library – Features, Installation, and Usage Guide

This article introduces QueryList, a PHP content-scraping library that extracts data with CSS selectors rather than regular expressions. It covers the two available versions (V3 and V4), installation via Composer, a basic crawling example, collection methods such as flatten, take, reverse, filter, and map, and concurrent multi-request scraping.

Laravel Tech Community

QueryList is a PHP content-scraping library built around modern development practices. It offers concise syntax, an extensible design, and CSS-selector-based extraction, which makes scrapers simpler and easier to maintain than traditional regex-based crawlers.

It provides a complete scraping toolkit: DOM selection via CSS selectors, an HTTP client based on Guzzle, content filtering, built-in charset handling, and a plugin system for extensions.

Two versions are available: V3 (requires PHP 5.3+, ships as a single file, and needs no Composer) and V4 (requires PHP 7.1+, is installed through Composer, and has a modular design with a richer API). To install with Composer:

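# Install the latest release
composer require jaeger/querylist
# Or request the V4 series explicitly
composer require jaeger/querylist:~V4
# Optional: point Composer at the Aliyun mirror (run this before installing if downloads are slow)
composer config -g repo.packagist composer https://mirrors.aliyun.com/composer/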

The basic example below loads the Composer autoloader, imports QL\QueryList, fetches a page, defines extraction rules for each result's title and link, restricts matching to the .result blocks with range(), and prints the extracted data:

require_once './vendor/autoload.php';

use QL\QueryList;

// Fetch the search results page; the third argument carries request options such as headers.
$data = QueryList::get('https://www.baidu.com/s?wd=QueryList', null, [
    'headers' => [
        'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
        'Accept-Encoding' => 'gzip, deflate, br',
    ]
])->rules([
    // field name => [CSS selector, what to extract from the matched element]
    'title' => ['h3', 'text'],
    'link' => ['h3>a', 'href']
])->range('.result') // apply the rules inside each .result block
    ->queryData();
print_r($data);
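
To make the roles of range() and rules() more concrete, here is a minimal stand-alone sketch of the same idea using PHP's built-in DOM extension. It uses XPath expressions in place of CSS selectors and is not how QueryList is implemented; the helper name extractItems and the sample HTML are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper that mimics the idea behind range() + rules() with XPath.
// $range selects each result block; each rule is [relative XPath, 'text' or an attribute name].
function extractItems(string $html, string $range, array $rules): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = [];
    foreach ($xpath->query($range) as $block) {
        $item = [];
        foreach ($rules as $field => [$selector, $attr]) {
            // Evaluate the rule relative to the current block only.
            $node = $xpath->query($selector, $block)->item(0);
            if ($node === null) {
                $item[$field] = null;
            } elseif ($attr === 'text') {
                $item[$field] = trim($node->textContent);
            } else {
                $item[$field] = $node->getAttribute($attr);
            }
        }
        $items[] = $item;
    }
    return $items;
}

// Sample markup (an assumption, not real search output).
$html = '<div class="result"><h3><a href="/a">First</a></h3></div>'
      . '<div class="result"><h3><a href="/b">Second</a></h3></div>';

$items = extractItems($html, '//div[@class="result"]', [
    'title' => ['.//h3', 'text'],
    'link'  => ['.//h3/a', 'href'],
]);
print_r($items);
```

Each rule is evaluated relative to one block matched by the range, which is why every result block yields exactly one row containing a title and a link.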

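The collection methods operate on the Collection object returned by query()->getData(). Note that queryData(), used above, is shorthand that returns a plain PHP array in V4, so obtain $data through query()->getData() before calling methods such as flatten(), take(), reverse(), filter(), and map(). These methods can be chained for more complex transformations. The examples below assume each item has an image field.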

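// Flatten the nested result into a single-level array
$rt = $data->flatten()->all();
print_r($rt);

// Keep only the first two items
$rt = $data->take(2)->all();
print_r($rt);

// Reverse the order of the items
$rt = $data->reverse()->all();
print_r($rt);

// Drop the item whose image is /path/to/2.jpg
$rt = $data->filter(function($item){
    return $item['image'] !== '/path/to/2.jpg';
})->all();
print_r($rt);

// Prefix every image path with a host name
$rt = $data->map(function($item){
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
})->all();
print_r($rt);

// Chain several methods: reverse, rewrite the image paths, then keep the first two
$rt = $data->reverse()->map(function($item){
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
})->take(2)->all();
print_r($rt);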
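
If you are working with a plain PHP array rather than a collection, roughly the same transformations can be sketched with built-in array functions. The sample rows below are hypothetical, and array keys may differ from what the collection methods return, since the collection versions can preserve the original keys.

```php
<?php
// Hypothetical sample rows shaped like the items in the examples above.
$rows = [
    ['image' => '/path/to/1.jpg'],
    ['image' => '/path/to/2.jpg'],
    ['image' => '/path/to/3.jpg'],
];

// reverse() -> map() -> take(2), written with plain array functions.
$reversed = array_reverse($rows);
$mapped = array_map(function (array $item) {
    $item['image'] = 'http://xxx.com' . $item['image'];
    return $item;
}, $reversed);
$firstTwo = array_slice($mapped, 0, 2);

// filter(): array_filter keeps the original keys, so reindex with array_values.
$filtered = array_values(array_filter($rows, function (array $item) {
    return $item['image'] !== '/path/to/2.jpg';
}));

print_r($firstTwo);
print_r($filtered);
```

The chained collection form is usually easier to read; the plain-array form is mainly useful when the data has already been converted with all() or came from queryData().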

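Concurrent requests are supported through multiGet(). Define the URLs, rules, and range, then chain concurrency() to set the number of parallel requests, withOptions() and withHeaders() to configure them, success() and error() callbacks to handle each response or failure, and send() to start the batch.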

use GuzzleHttp\Psr7\Response;
use QL\QueryList;

$urls = [
    'https://github.com/trending/go?since=daily',
    'https://github.com/trending/html?since=daily',
    'https://github.com/trending/java?since=daily'
];

$rules = [
    'name' => ['h3>a', 'text'],
    'desc' => ['.py-1', 'text']
];
$range = '.repo-list>li';

QueryList::rules($rules)
    ->range($range)
    ->multiGet($urls)
    ->concurrency(2)
    ->withOptions(['timeout' => 60])
    ->withHeaders(['User-Agent' => 'QueryList'])
    ->success(function (QueryList $ql, Response $response, $index) {
        $data = $ql->queryData();
        print_r($data);
    })
    ->error(function (QueryList $ql, $reason, $index) {
        // handle error
    })
    ->send();

Official website: http://www.querylist.cc
