
Python Web Scraping Tutorial with Selenium and BeautifulSoup

This tutorial demonstrates how to create a Python web scraper using Selenium and BeautifulSoup, covering login automation, HTML retrieval, parsing with html5lib, data extraction from tables, and strategies for handling anti‑scraping measures such as headless browsing and proxy usage.


Web scraping is increasingly used by e‑commerce companies to collect competitor data and research new products.

This tutorial shows how to build a Python scraper with Selenium and BeautifulSoup, covering the basics of HTML tree structure and the steps needed to fetch a page, log in, and extract data.

First, identify the target URL and download its HTML. Then use Selenium to launch an (optionally headless) Chrome browser, navigate to the login page, fill in the credentials, and retrieve the page source after authentication.

<code># Import libraries
from selenium import webdriver
from bs4 import BeautifulSoup
</code>

Configure ChromeDriver and headless options:

<code># Path to the ChromeDriver executable
from selenium.webdriver.chrome.service import Service

chromedriver = '/usr/local/bin/chromedriver'
options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Selenium 4 style: the driver path goes in a Service object
# (executable_path and chrome_options were removed in Selenium 4)
browser = webdriver.Chrome(service=Service(chromedriver), options=options)
</code>

Navigate to the login page, locate the email, password, and submit elements by their name attributes, send the credentials, and click the login button.

<code>from selenium.webdriver.common.by import By

# Open the login page
browser.get('http://playsports365.com/default.aspx')
# Locate the form fields by their name attribute
# (find_element_by_name was removed in Selenium 4)
email = browser.find_element(By.NAME, 'ctl00$MainContent$ctlLogin$_UserName')
password = browser.find_element(By.NAME, 'ctl00$MainContent$ctlLogin$_Password')
login = browser.find_element(By.NAME, 'ctl00$MainContent$ctlLogin$BtnSubmit')
email.send_keys('********')
password.send_keys('*******')
login.click()
</code>

After successful login, go to the target page and capture the HTML content.

<code># After login, go to "OpenBets" page
browser.get('http://playsports365.com/wager/OpenBets.aspx')
requiredHtml = browser.page_source
</code>

Parse the HTML with BeautifulSoup using the html5lib parser, locate the desired table, and iterate over rows and cells to print extracted values.

<code># Parse the page with the html5lib parser
soup = BeautifulSoup(requiredHtml, 'html5lib')
# Take the first table on the page
tables = soup.find_all('table')
my_table = tables[0]
# Iterate over the rows, then over each cell in the row
rows = my_table.find_all('tr')
for row in rows:
    cells = row.find_all(['th', 'td'])
    for cell in cells:
        print(cell.text)
</code>

Install the required packages via pip and run the script with python <program_name>. For sites that block frequent requests, use rotating user agents, delays between requests, or proxy services such as Tor or commercial proxy providers.
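The anti-blocking measures above can be sketched as follows; the user-agent strings and delay range here are illustrative assumptions, not values from the tutorial:

```python
import random
import time

# Illustrative pool of user-agent strings to rotate through
# (placeholders; substitute real, up-to-date strings in practice)
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0',
]

def random_user_agent():
    """Pick a user agent at random for the next browser session."""
    return random.choice(USER_AGENTS)

def polite_delay(min_seconds=2.0, max_seconds=5.0):
    """Sleep a random interval so requests do not arrive at a fixed rate."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

The chosen user agent can then be passed to Chrome before each session, for example with `options.add_argument(f'user-agent={random_user_agent()}')`, and `polite_delay()` called between page loads.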

Tags: HTML parsing, Data Extraction, web scraping, Selenium, BeautifulSoup
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
