
Photon: High‑Efficiency Multithreaded Web Crawler – Features, Compatibility, and Usage Guide

Photon is a fast, multithreaded Python web crawler that extracts URLs, files, and various intelligence from targets, offering flexible options, Ninja mode, and extensive command‑line parameters while supporting Linux, Windows, macOS, and Termux environments.


Project URL

https://github.com/s0md3v/Photon

Main Features

Photon provides many options for customized crawling, but its standout capability is high‑speed data extraction through intelligent multithreading.

Data Extraction

By default, Photon extracts the following data:

- URLs (both in-scope and out-of-scope)
- Parameterized URLs (e.g., example.com/gallery.php?id=2)
- Intelligence such as emails, social media accounts, Amazon buckets, etc.
- Files (pdf, png, xml, …)
- JavaScript and other files
- Strings matching custom regular-expression patterns
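To make the "intelligence" category concrete, here is a minimal sketch of pulling emails and parameterized URLs out of raw HTML with regular expressions. The patterns and the `extract_intel` helper are illustrative assumptions, not Photon's actual internals.

```python
import re

# Illustrative patterns (not Photon's actual ones): emails and URLs
# that carry a query string, extracted from a raw HTML string.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PARAM_URL_RE = re.compile(r"https?://[^\s\"'<>]+\?[^\s\"'<>]+")

def extract_intel(html: str) -> dict:
    """Return the emails and parameterized URLs found in an HTML string."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "param_urls": sorted(set(PARAM_URL_RE.findall(html))),
    }

html = '<a href="http://example.com/gallery.php?id=2">x</a> contact: admin@example.com'
intel = extract_intel(html)
```

Here `intel["emails"]` contains `admin@example.com` and `intel["param_urls"]` contains the parameterized gallery URL.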

Extracted information is saved to separate files in the output directory, organized by data type.

Intelligent Multithreading

Unlike many tools that misuse threads, Photon assigns distinct work lists to each thread, avoiding contention and maximizing throughput.
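The idea of "distinct work lists per thread" can be sketched as follows: instead of all threads pulling from one shared queue, the pending URLs are sliced up front and each worker gets its own slice. The `crawl` function here is a hypothetical placeholder for the real fetch loop, not Photon's code.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(work, n):
    """Split a work list into n non-overlapping slices, one per thread."""
    return [work[i::n] for i in range(n)]

def crawl(urls):
    # Placeholder for the real per-thread fetch-and-parse loop.
    return [f"fetched {u}" for u in urls]

urls = [f"http://example.com/page{i}" for i in range(10)]
threads = 3
with ThreadPoolExecutor(max_workers=threads) as pool:
    # Each worker receives its own slice, so there is no contention
    # over a shared queue while the level is being crawled.
    results = list(pool.map(crawl, chunk(urls, threads)))
```

Because the slices are disjoint, no locking is needed while a level is in flight; results are merged only after all workers finish.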

Ninja Mode

In Ninja mode, three online services act as proxies; together with your own machine, up to four clients can request the target simultaneously, which improves speed and reduces the risk of connection resets.

Compatibility & Dependencies

Compatibility

Photon works on Python 2.x and 3.x, though future development may drop Python 2 support.

Operating Systems

Tested on Linux (Arch, Debian, Ubuntu), Termux, Windows 7/10, and macOS. Bugs can be reported on GitHub.

Color Output

Coloured (ANSI) output is disabled on macOS and Windows terminals.

Dependencies

<code>requests
urllib3
argparse</code>

All other required libraries are part of the Python standard library.

How to Use Photon

<code>syntax: photon.py [options]
  -u --url          target URL
  -l --level        crawl depth (default 2)
  -t --threads      number of threads (default 2)
  -d --delay        delay between requests (seconds)
  -c --cookie       cookie header
  -r --regex        custom regex pattern
  -s --seeds        additional sub‑URLs (comma‑separated)
  -e --export       export format (e.g., json)
  -o --output       output directory (default target domain)
  --exclude         exclude URLs matching regex
  --timeout         request timeout (seconds, default 5)
  --ninja           enable Ninja mode
  --update          check for updates
  --dns             dump DNS data
  --only-urls       extract URLs only
  --user-agent      custom User‑Agent(s) (comma‑separated)
</code>

Single‑Site Crawl

<code>python photon.py -u "http://example.com"</code>

Depth Control

<code>python photon.py -u "http://example.com" -l 3</code>

Depth defines how many link levels are followed; depth 2 crawls the homepage and its immediate links.
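A breadth-first sketch makes the depth semantics concrete: depth 1 is just the start page, depth 2 adds its immediate links, and so on. The tiny in-memory `site` dictionary stands in for real HTTP fetches; this is an illustration of what the `-l` flag means, not Photon's implementation.

```python
def crawl_to_depth(start, get_links, depth=2):
    """Breadth-first crawl: depth 1 is the start page alone,
    depth 2 adds its immediate links, and so on."""
    seen = {start}
    frontier = [start]
    for _ in range(depth - 1):
        next_frontier = []
        for url in frontier:
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen

# Tiny in-memory "site" standing in for real HTTP requests.
site = {
    "/": ["/a", "/b"],
    "/a": ["/a1"],
    "/b": [],
    "/a1": ["/deep"],
}
found = crawl_to_depth("/", lambda u: site.get(u, []), depth=2)
```

With `depth=2`, only the homepage and its direct links are visited; `/a1` and `/deep` would require depth 3 and 4 respectively.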

Thread Count

<code>python photon.py -u "http://example.com" -t 10</code>

Increasing threads speeds up crawling but may trigger security mechanisms or overload small sites.

Request Delay

<code>python photon.py -u "http://example.com" -d 2</code>

Specifies a pause (in seconds) between each HTTP(S) request.

Timeout

<code>python photon.py -u "http://example.com" --timeout=4</code>

Sets the maximum wait time for a response before timing out.

Cookies

<code>python photon.py -u "http://example.com" -c "PHPSESSID=u5423d78fqbaju9a0qke25ca87"</code>

Allows sending a Cookie header for sites that require session validation.
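What the `-c` flag amounts to is attaching a `Cookie` header to every outgoing request. A minimal stdlib sketch (Photon itself builds its requests differently):

```python
from urllib.request import Request

# The session cookie passed via -c ends up as a Cookie header on each
# request, so the target treats the crawler as a logged-in client.
cookie = "PHPSESSID=u5423d78fqbaju9a0qke25ca87"
req = Request("http://example.com", headers={"Cookie": cookie})
```

Sending the request (e.g. via `urllib.request.urlopen(req)`) would then carry the session with it.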

Output Directory

<code>python photon.py -u "http://example.com" -o "my_directory"</code>

Results are saved in a folder named after the target domain by default; this option overrides the folder name.

Exclude Specific URLs

<code>python photon.py -u "http://example.com" --exclude="/blog/20[17|18]"</code>

URLs matching the provided regex are omitted from crawling and results.
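Note one regex subtlety in the example above: `[17|18]` is a character class matching a single character from `1`, `7`, `|`, `8`, so the pattern effectively excludes anything under `/blog/201`, `/blog/207`, and so on; a strict year match would be `/blog/20(17|18)`. A small filtering sketch shows the effect:

```python
import re

# The pattern from the example above. [17|18] is a character class,
# not an alternation, so it matches any one of the characters 1 7 | 8.
exclude = re.compile(r"/blog/20[17|18]")

urls = [
    "http://example.com/blog/2017/post",
    "http://example.com/blog/2018/post",
    "http://example.com/about",
]
kept = [u for u in urls if not exclude.search(u)]
```

Both blog years are dropped and only `/about` survives.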

Specify Sub‑URLs

<code>python photon.py -u "http://example.com" --seeds "http://example.com/blog/2018,http://example.com/portals.html"</code>

Custom seed URLs can be added, separated by commas.

Custom User‑Agents

<code>python photon.py -u "http://example.com" --user-agent "curl/7.35.0,Wget/1.15 (linux-gnu)"</code>

Overrides the default user‑agent list without editing the user‑agents.txt file.
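The comma-separated value can be pictured as a pool the crawler draws from. Whether Photon rotates agents randomly or round-robin is an implementation detail; this sketch assumes a random pick per request.

```python
import random

# The comma-separated argument from the example above, split into a pool.
ua_arg = "curl/7.35.0,Wget/1.15 (linux-gnu)"
user_agents = [ua.strip() for ua in ua_arg.split(",")]

def pick_user_agent():
    # Assumed strategy: choose a random agent for each request.
    return random.choice(user_agents)
```

Each outgoing request would then set its `User-Agent` header to `pick_user_agent()`.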

Custom Regex Pattern

<code>python photon.py -u "http://example.com" --regex "\d{10}"</code>

Extracts strings that match the supplied regular expression during crawling.
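For instance, `\d{10}` matches any run of exactly ten digits, a common way to catch phone numbers. The page text below is a made-up illustration of what such a pattern would pull out:

```python
import re

pattern = re.compile(r"\d{10}")  # any 10-digit run, e.g. phone numbers
page_text = "Call 0123456789 or 9876543210; order #42 is unrelated."
matches = pattern.findall(page_text)
```

Here `matches` contains the two 10-digit numbers and ignores the short `#42`.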

Export Results

<code>python photon.py -u "http://example.com" --export=json</code>

Supported export format: json.

Skip Data Extraction

<code>python photon.py -u "http://example.com" --only-urls</code>

Only URLs are collected; files such as JavaScript are ignored.

Update

<code>python photon.py --update</code>

Checks for a newer version, downloads it, and merges updates without overwriting existing files.

Ninja Mode

Enables the use of three proxy sites to issue requests on your behalf:

- codebeautify.org
- photopea.com
- pixlr.com
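Conceptually, the client plus the three services give four request sources. How Photon actually distributes requests among them is not documented here; a simple round-robin rotation is one plausible scheme, sketched below purely as an assumption.

```python
from itertools import cycle

# The client itself plus the three proxy services: four request sources.
sources = ["direct", "codebeautify.org", "photopea.com", "pixlr.com"]
rotation = cycle(sources)

# Assign each pending URL to the next source in turn (assumed round-robin).
assignments = [(url, next(rotation)) for url in (f"/page{i}" for i in range(8))]
```

Spreading requests across sources this way means any one origin sends only a quarter of the traffic, which is what makes resets less likely.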

DNS Dump

<code>python photon.py -u http://example.com --dns</code>

Generates an image displaying DNS data for the target domain (sub‑domains are not supported).

Source references: kitploit, Covfefe compilation; please credit FreeBuf.COM when republishing.

Tags: python, multithreading, command line, information security, web crawler, reconnaissance
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
