
Python Web Scraping Tutorial with Selenium and BeautifulSoup

This tutorial demonstrates how to create a Python web scraper using Selenium and BeautifulSoup, covering login automation, HTML retrieval, parsing with html5lib, data extraction from tables, and strategies for handling anti‑scraping measures such as headless browsing and proxy usage.


Web scraping is increasingly used by e‑commerce companies to collect competitor data and research new products.

This tutorial shows how to build a Python scraper with Selenium and BeautifulSoup, covering the basics of HTML tree structure and the steps needed to fetch a page, log in, and extract data.

First, identify the target URL and download its HTML. Then use Selenium to launch an (optionally headless) Chrome browser, navigate to the login page, fill in the credentials, and retrieve the page source after authentication.

<code># Import libraries
from selenium import webdriver
from bs4 import BeautifulSoup
</code>

Configure ChromeDriver and headless options:

<code># Path to the ChromeDriver executable
from selenium.webdriver.chrome.service import Service

chromedriver = '/usr/local/bin/chromedriver'
options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Selenium 4 style: the driver path goes in a Service object
# (executable_path and chrome_options were removed in Selenium 4)
browser = webdriver.Chrome(service=Service(chromedriver), options=options)
</code>

Navigate to the login page, locate the email, password, and submit elements by their name attributes, send the credentials, and click the login button.

<code>from selenium.webdriver.common.by import By

# Open the login page
browser.get('http://playsports365.com/default.aspx')
# Locate the form fields by their name attribute
# (find_element_by_name was removed in Selenium 4)
email = browser.find_element(By.NAME, 'ctl00$MainContent$ctlLogin$_UserName')
password = browser.find_element(By.NAME, 'ctl00$MainContent$ctlLogin$_Password')
login = browser.find_element(By.NAME, 'ctl00$MainContent$ctlLogin$BtnSubmit')
email.send_keys('********')
password.send_keys('*******')
login.click()
</code>

After successful login, go to the target page and capture the HTML content.

<code># After login, go to "OpenBets" page
browser.get('http://playsports365.com/wager/OpenBets.aspx')
requiredHtml = browser.page_source
</code>

Parse the HTML with BeautifulSoup using the html5lib parser, locate the desired table, and iterate over rows and cells to print extracted values.

<code># Parse the page with the html5lib parser
soup = BeautifulSoup(requiredHtml, 'html5lib')
# Take the first table on the page
tables = soup.find_all('table')
my_table = tables[0]
# Iterate over the rows, then over each cell in the row
rows = my_table.find_all('tr')
for row in rows:
    cells = row.find_all(['th', 'td'])
    for cell in cells:
        print(cell.text)
</code>

Install the required packages via pip and run the script with python <program_name>. For sites that block frequent requests, use rotating user agents, delays between requests, or proxy services such as Tor or commercial proxy providers.
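The anti-blocking measures above can be sketched as follows; the user-agent strings and delay range here are illustrative assumptions, not values from the tutorial:

```python
import random
import time

# Illustrative pool of user-agent strings to rotate through
# (placeholders; substitute real, up-to-date strings in practice)
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0',
]

def random_user_agent():
    """Pick a user agent at random for the next browser session."""
    return random.choice(USER_AGENTS)

def polite_delay(min_seconds=2.0, max_seconds=5.0):
    """Sleep a random interval so requests do not arrive at a fixed rate."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

The chosen user agent can then be passed to Chrome before each session, for example with `options.add_argument(f'user-agent={random_user_agent()}')`, and `polite_delay()` called between page loads.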

Tags: HTML parsing, Data Extraction, web scraping, Selenium, BeautifulSoup
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
