
Introducing DrissionPage: A Python Web Automation Tool Combining Browser Control and Requests

DrissionPage is a Python web-automation library that merges browser-driven interaction with high-efficiency request handling. It offers a concise syntax, fast execution, and features such as iframe handling, shadow-root access, and full-page screenshots, making web scraping and testing more approachable.


DrissionPage is a Python web automation tool that can control browsers and send/receive network packets, combining the convenience of browser automation with the efficiency of the requests library.

Background

Collecting data with requests from sites that require login often means analyzing network packets and dealing with JavaScript rendering, captchas, obfuscation, and signed request parameters, all of which slow development. Browser automation bypasses many of these obstacles but runs more slowly.

The library was designed to merge these approaches, enabling fast development and execution, allowing mode switching, and providing a user‑friendly API that abstracts details so developers can focus on functionality.

Earlier versions wrapped Selenium; since version 3.0 the core has been rewritten from scratch, removing the Selenium dependency and improving both features and performance.
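The mode switching described above can be sketched as follows. This is a minimal illustration based on the `WebPage` class and its `change_mode()` method from the library's documented API; the import is guarded only so the sketch degrades gracefully where DrissionPage is not installed, and the login step is left as a placeholder comment.

```python
# Minimal sketch of DrissionPage's d/s mode switching (assumes the
# documented WebPage API; guarded import so the sketch runs without it).
try:
    from DrissionPage import WebPage
except ImportError:
    WebPage = None

def fetch_after_login(url):
    """Use browser ('d') mode to get past login/JS, then switch to the
    faster requests-backed ('s') mode; cookies carry over on switch."""
    if WebPage is None:
        return None          # library not installed: nothing to demonstrate
    page = WebPage()         # starts in 'd' (driver/browser) mode
    page.get(url)
    # ... click through the login flow here, in browser mode ...
    page.change_mode()       # switch to 's' (session/requests) mode
    page.get(url)            # same URL, now fetched without the browser
    return page.html

result = fetch_after_login('https://example.com')
```

The point of the pattern is that slow, browser-backed steps (login, JavaScript challenges) and fast, requests-backed bulk fetching share one object and one cookie state, instead of being two separate tools glued together by hand.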

Core Capabilities

Leaves no webdriver fingerprints, making the automation harder for sites to detect.

No need to download drivers for different browser versions.

Faster runtime speed.

Can locate elements across iframes without switching contexts.

Treats iframes as normal elements, simplifying logic.

Allows simultaneous operation on multiple browser tabs, even when inactive.

Can directly read browser cache to save images without GUI interaction.

Supports full‑page screenshots, including areas outside the viewport (supported in browsers version 90+).

Handles non‑open shadow‑root elements.

Getting Started Demo

The SessionPage object operates in "s" (session) mode: it fetches pages with the requests library while exposing the same page-and-element interface as the browser-driven objects.

Example code demonstrates creating a SessionPage, navigating to a URL, locating h3 elements, extracting links, and printing their text and href attributes.

<code># Import the page class
from DrissionPage import SessionPage
# Create the page object
page = SessionPage()
# Visit the web page
page.get('https://gitee.com/explore/all')
# Find elements on the page
items = page.eles('t:h3')
# Iterate over the elements
for item in items[:-1]:
    # Get the <a> element under the current <h3>
    lnk = item('tag:a')
    # Print the <a> element's text and href attribute
    print(lnk.text, lnk.link)
</code>

The script prints the title and link of each explored repository, showing how much less code DrissionPage needs than a typical requests + BeautifulSoup pipeline.
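To make that comparison concrete, here is the same h3 > a extraction written by hand against the standard library's html.parser; the HTML string is a made-up stand-in for a fetched page (actually fetching it would add further requests/urllib code on top).

```python
# The same h3 > a extraction done manually with the stdlib html.parser,
# to show the boilerplate that DrissionPage's element API replaces.
from html.parser import HTMLParser

class H3LinkParser(HTMLParser):
    """Collects (text, href) pairs for <a> tags nested inside <h3> tags."""
    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.href = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3:
            self.href = dict(attrs).get("href")

    def handle_data(self, data):
        if self.href is not None and data.strip():
            self.links.append((data.strip(), self.href))
            self.href = None

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

html = '<h3><a href="/p/1">Project One</a></h3><h3><a href="/p/2">Project Two</a></h3>'
parser = H3LinkParser()
parser.feed(html)
print(parser.links)  # [('Project One', '/p/1'), ('Project Two', '/p/2')]
```

Tracking the open-tag state and attribute handoff by hand is exactly the kind of plumbing that `page.eles('t:h3')` followed by `item('tag:a')` collapses into two calls.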

For more details, visit the author's documentation site: https://g1879.gitee.io/drissionpagedocs/.


Tags: python, requests, web automation, selenium-alternative, drissionpage, scraping
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
