How to Bypass Anti‑Scraping Measures: User‑Agent, Cookies & Proxies

This guide explains practical techniques such as faking User‑Agent headers, rotating cookies, adding random delays, and using proxy pools to prevent IP bans while crawling large amounts of data from websites with anti‑scraping defenses.

Python Programming Learning Circle

Oct 19, 2019

Many sites deploy anti‑scraping measures that can quickly block a crawler's IP address, especially when it requests large volumes of data.

This article summarizes several countermeasures that can be applied individually or combined for better results.

Fake User‑Agent

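Set the User-Agent header so that requests look like they come from a real browser. By default, the requests library identifies itself as python-requests, which is easy for a site to filter.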

Example:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
resp = requests.get(url, headers=headers)  # url is the page to fetch

You can also collect several browser User‑Agent strings and pick one at random for each request, so the traffic is harder to attribute to a single client.

Random User-Agent selection code example
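A minimal sketch of the random selection described above. The pool contents and the names USER_AGENTS and random_headers are illustrative assumptions, not part of the original article; in practice you would fill the pool with strings collected from real browsers.

```python
import random

# Illustrative pool (assumption): example strings only, not a vetted or current list.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/52.0.2743.116 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/52.0.2743.116 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/52.0.2743.116 Safari/537.36",
]


def random_headers():
    """Return request headers with a User-Agent picked at random from the pool."""
    return {"User-Agent": random.choice(USER_AGENTS)}


# Usage with requests, calling random_headers() once per request:
# resp = requests.get(url, headers=random_headers())
```

Calling random_headers() for every request, rather than once at startup, is what makes the User‑Agent vary across requests.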

Random Delays

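Introduce a random pause between requests so that they do not arrive at a fixed, machine‑like rate.

import random
import time

time.sleep(random.randint(0, 3))  # pause 0, 1, 2, or 3 seconds (whole seconds only)
# or
time.sleep(random.random())       # pause a fractional time in [0, 1) seconds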

Fake Cookies

If a page can be accessed in a browser, copy the cookies from that browser session and send them with your requests.

Note: Even with cookies, excessive request frequency may still trigger IP bans; manual verification (e.g., captcha) may be required.
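One way to reuse copied cookies, sketched under the assumption that you copied the raw Cookie request header from the browser's developer tools. The helper name parse_cookie_header and the sample cookie names and values are hypothetical.

```python
def parse_cookie_header(raw):
    """Convert a raw Cookie header such as 'a=1; b=2' into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # Split on the first '=' only, since cookie values may contain '='.
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder value (assumption): paste the string copied from your own browser.
raw_cookie = "sessionid=abc123; csrftoken=xyz789"
cookies = parse_cookie_header(raw_cookie)

# Usage with requests, which accepts a dict through the cookies parameter:
# resp = requests.get(url, headers=headers, cookies=cookies)
```

Copied cookies usually expire, so expect to refresh them from the browser from time to time.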

Use Proxies

Rotate multiple proxy IPs to distribute requests and prevent a single IP from being blocked.
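A minimal sketch of round‑robin proxy rotation. The proxy addresses are placeholders from a documentation‑only IP range, and PROXY_POOL and next_proxies are hypothetical names; substitute proxies you actually control or rent.

```python
import itertools

# Placeholder addresses (assumption): replace with working proxy endpoints.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

# cycle() yields the pool entries in order and starts over at the end.
_rotation = itertools.cycle(PROXY_POOL)


def next_proxies():
    """Return a proxies mapping for requests, using the next proxy in the pool."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}


# Usage with requests, which accepts this mapping through the proxies parameter:
# resp = requests.get(url, headers=headers, proxies=next_proxies(), timeout=10)
```

Round‑robin spreads requests evenly across the pool; random.choice(PROXY_POOL) would also work if an even spread does not matter. Either way, a real pool needs some handling for proxies that stop responding.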

Anti‑Anti‑Spider Project

For advanced counter‑measures, refer to the "Anti‑Anti‑Spider" project on GitHub, which collects various techniques to evade anti‑scraping defenses.

Link: github.com/luyishisi/Anti-Anti-Spider
