Photon: High‑Efficiency Multithreaded Web Crawler – Features, Compatibility, and Usage Guide
Photon is a fast, multithreaded Python web crawler that extracts URLs, files, and various intelligence from targets, offering flexible options, Ninja mode, and extensive command‑line parameters while supporting Linux, Windows, macOS, and Termux environments.
Project URL
https://github.com/s0md3v/Photon
Main Features
Photon provides many options for customized crawling, but its standout capability is high‑speed data extraction through intelligent multithreading.
Data Extraction
By default, Photon extracts the following data:
- URLs (both in-scope and out-of-scope)
- Parameterized URLs (e.g., example.com/gallery.php?id=2)
- Intelligence such as emails, social media accounts, Amazon buckets, etc.
- Files (pdf, png, xml, …)
- JavaScript and other files
- Strings matching custom regular-expression patterns
Extracted information is saved as shown in the diagram below.
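To illustrate the kind of pattern matching this extraction involves, here is a minimal sketch in Python (the helper name and regexes are illustrative assumptions, not Photon's actual internals) that pulls emails and parameterized URLs out of page text:

```python
import re

# Illustrative patterns; Photon's real patterns are more elaborate.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PARAM_URL_RE = re.compile(r"https?://\S+?\?\S+")

def extract_intel(text):
    """Return (emails, parameterized_urls) found in a page's text."""
    emails = EMAIL_RE.findall(text)
    param_urls = PARAM_URL_RE.findall(text)
    return emails, param_urls

page = 'Contact admin@example.com or see http://example.com/gallery.php?id=2 now'
emails, urls = extract_intel(page)
```

Running a set of such patterns over every fetched page is what yields the intelligence listed above.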
Intelligent Multithreading
Unlike many tools that misuse threads, Photon assigns distinct work lists to each thread, avoiding contention and maximizing throughput.
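The idea can be sketched as follows (assumed names, not Photon's actual implementation): rather than having all threads compete over one shared queue, the pending URLs are split into disjoint chunks, one per thread:

```python
from concurrent.futures import ThreadPoolExecutor

def split_work(items, num_threads):
    """Split items into num_threads disjoint work lists (round-robin)."""
    return [items[i::num_threads] for i in range(num_threads)]

def process(chunk):
    # Stand-in for a thread crawling its own private list of URLs.
    return [url.lower() for url in chunk]

urls = ['http://A', 'http://B', 'http://C', 'http://D', 'http://E']
chunks = split_work(urls, 2)
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process, chunks))
```

Because each thread owns its list outright, no locking is needed around the work items themselves.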
Ninja Mode
In Ninja mode, three online services act as proxies alongside your own machine, so requests to the target are distributed across four clients; this spreads the load and reduces the risk of your IP being blocked or connections being reset.
Compatibility & Dependencies
Compatibility
Photon works on Python 2.x and 3.x, though future development may drop Python 2 support.
Operating Systems
Tested on Linux (Arch, Debian, Ubuntu), Termux, Windows 7/10, and macOS. Bugs can be reported on GitHub.
Color Output
ANSI colour codes are not supported on macOS and Windows terminals.
Dependencies
<code>requests
urllib3
argparse</code>
All other required libraries are part of the Python standard library.
How to Use Photon
<code>syntax: photon.py [options]
-u --url target URL
-l --level crawl depth (default 2)
-t --threads number of threads (default 2)
-d --delay delay between requests (seconds)
-c --cookie cookie header
-r --regex custom regex pattern
-s --seeds additional sub‑URLs (comma‑separated)
-e --export export format (e.g., json)
-o --output output directory (default target domain)
--exclude exclude URLs matching regex
--timeout request timeout (seconds, default 5)
--ninja enable Ninja mode
--update check for updates
--dns dump DNS data
--only-urls extract URLs only
--user-agent custom User‑Agent(s) (comma‑separated)
</code>
Single-Site Crawl
<code>python photon.py -u "http://example.com"</code>
Depth Control
<code>python photon.py -u "http://example.com" -l 3</code>
Depth defines how many link levels are followed; depth 2 crawls the homepage and its immediate links.
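The depth semantics can be illustrated with a toy breadth-first walk over an in-memory link graph (hypothetical data and function, not Photon's code):

```python
from collections import deque

def crawl(graph, start, max_depth):
    """Breadth-first walk: depth 1 = start page only, depth 2 = start + its links."""
    seen = {start}
    frontier = deque([(start, 1)])
    order = []
    while frontier:
        url, depth = frontier.popleft()
        order.append(url)
        if depth < max_depth:
            for link in graph.get(url, []):
                if link not in seen:
                    seen.add(link)
                    frontier.append((link, depth + 1))
    return order

links = {'home': ['a', 'b'], 'a': ['c'], 'b': []}
```

With this graph, depth 2 visits 'home', 'a', and 'b'; raising the depth to 3 also reaches 'c'.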
Thread Count
<code>python photon.py -u "http://example.com" -t 10</code>
Increasing threads speeds up crawling but may trigger security mechanisms or overload small sites.
Request Delay
<code>python photon.py -u "http://example.com" -d 2</code>
Specifies a pause (in seconds) between each HTTP(S) request.
Timeout
<code>python photon.py -u "http://example.com" --timeout=4</code>
Sets the maximum wait time for a response before timing out.
Cookies
<code>python photon.py -u "http://example.com" -c "PHPSESSID=u5423d78fqbaju9a0qke25ca87"</code>
Allows sending a Cookie header for sites that require session validation.
Output Directory
<code>python photon.py -u "http://example.com" -o "my_directory"</code>
Results are saved in a folder named after the target domain by default; this option overrides the folder name.
Exclude Specific URLs
<code>python photon.py -u "http://example.com" --exclude="/blog/20(17|18)"</code>
URLs matching the provided regex are omitted from crawling and results.
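The filtering itself amounts to a regex search against each discovered URL; a sketch with an assumed exclude pattern:

```python
import re

def filter_urls(urls, exclude_pattern):
    """Drop URLs matching the exclude regex; keep the rest."""
    exclude = re.compile(exclude_pattern)
    return [u for u in urls if not exclude.search(u)]

found = ['http://example.com/blog/2017/post', 'http://example.com/about']
kept = filter_urls(found, r'/blog/20(17|18)')
```

Here only the /about page survives the filter; both the crawl frontier and the saved results honor the exclusion.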
Specify Sub‑URLs
<code>python photon.py -u "http://example.com" --seeds "http://example.com/blog/2018,http://example.com/portals.html"</code>
Custom seed URLs can be added, separated by commas.
Custom User‑Agents
<code>python photon.py -u "http://example.com" --user-agent "curl/7.35.0,Wget/1.15 (linux-gnu)"</code>
Overrides the default user-agent list without editing the user-agents.txt file.
Custom Regex Pattern
<code>python photon.py -u "http://example.com" --regex "\d{10}"</code>
Extracts strings that match the supplied regular expression during crawling.
Export Results
<code>python photon.py -u "http://example.com" --export=json</code>
Supported export format: json.
Skip Data Extraction
<code>python photon.py -u "http://example.com" --only-urls</code>
Only URLs are collected; files such as JavaScript are ignored.
Update
<code>python photon.py --update</code>
Checks for a newer version, downloads it, and merges updates without overwriting existing files.
Ninja Mode
<code>python photon.py -u "http://example.com" --ninja</code>
Enables the use of three proxy sites to issue requests on your behalf:
codebeautify.org, photopea.com, and pixlr.com
DNS Dump
<code>python photon.py -u http://example.com --dns</code>
Generates an image displaying DNS data for the target domain (sub-domains are not supported).
Source references: kitploit, Covfefe compilation; please credit FreeBuf.COM when republishing.