
Python Web Scraping Tutorial: Extracting Fast Track 100 Companies with BeautifulSoup

This tutorial walks through using Python's urllib and BeautifulSoup libraries to fetch, parse, clean, and export the Fast Track 100 company table into a CSV file, covering installation, page inspection, element extraction, data cleaning, link handling, and file writing.


For a data scientist, the first task on a project is often web scraping; this article demonstrates how to scrape the Fast Track 100 company list using Python.

First, install the BeautifulSoup library with pip; it will be used to parse the HTML.
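The install step looks like this (note that the PyPI package is named beautifulsoup4, while the import name is bs4):

```shell
# The package on PyPI is beautifulsoup4; in code it is imported as bs4.
pip install beautifulsoup4
```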

Inspect the target page (https://www.fasttrack.co.uk/league-tables/tech-track-100/league-table/) with the browser's developer tools to understand the table structure, noting that each row is enclosed in a <tr> tag.

In Python, import the required modules: BeautifulSoup for HTML parsing, urllib for fetching the page, and csv (or json) for saving the data.
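The imports the tutorial relies on would look like this:

```python
# urllib.request fetches pages, BeautifulSoup parses the HTML,
# and csv writes the cleaned rows out at the end.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import csv
```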

Fetch the page content using urllib, store it in a variable (e.g., page), and create a BeautifulSoup object (soup) to process the HTML.
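One way to sketch this step (the helper name fetch_soup is ours; the URL is the one given in the article):

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

def fetch_soup(url):
    """Download a page and return it as a parsed BeautifulSoup tree."""
    page = urlopen(url)                        # an http.client.HTTPResponse
    return BeautifulSoup(page, 'html.parser')  # parse the response into a tree

# soup = fetch_soup('https://www.fasttrack.co.uk/league-tables/tech-track-100/league-table/')
```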

Locate all table rows with soup.find_all('tr') , skip the header row, and iterate over the remaining rows.

For each row, extract the eight columns using find_all('td') and assign them to variables such as rank, company, location, year_end, sales_rise, latest_sales, staff, and comments. Some columns contain extra information like a company description and a link.
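The column extraction can be sketched on a stand-in row (the cell values below are invented; on the real page each row comes from iterating over soup.find_all('tr')[1:]):

```python
from bs4 import BeautifulSoup

# A stand-in row with the eight columns the article lists; values are invented.
row = BeautifulSoup(
    '<tr><td>1</td>'
    '<td><a href="/acme">Acme Ltd</a><span>Widget maker</span></td>'
    '<td>London</td><td>Apr-18</td><td>90.00%</td>'
    '<td>*£25,000</td><td>120</td><td>Makes widgets</td></tr>',
    'html.parser')

data = row.find_all('td')
rank = data[0].get_text()
company = data[1]            # kept as a tag: it holds the name, a <span> description, and a link
location = data[2].get_text()
year_end = data[3].get_text()
sales_rise = data[4].get_text()
latest_sales = data[5].get_text()
staff = data[6].get_text()
comments = data[7].get_text()
```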

Clean the extracted data: split the company cell into name and description using find('span'), strip unwanted characters from the sales field with strip(), and handle missing values.
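A sketch of the cleaning on an invented cell (exactly which characters need stripping depends on the live page):

```python
from bs4 import BeautifulSoup

cell = BeautifulSoup('<td>Acme Ltd<span>Widget maker</span></td>',
                     'html.parser').td

description = cell.find('span').get_text()               # the <span> holds the blurb
name = cell.get_text().replace(description, '').strip()  # cell text minus the blurb

sales = '*£25,000'.strip('*')   # drop a stray footnote marker
sales = sales or None           # treat an empty cell as a missing value
```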

To capture the company website, follow the link in the second column to the detail page, fetch it, parse it with BeautifulSoup, and locate the <a> element in the last table row; use a try/except block to set the URL to None if it is not found.
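A minimal sketch of that lookup (the helper name is ours; it assumes, per the article, that the link sits in the last table row of the detail page):

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

def website_from_detail_page(url):
    """Return the website linked from the last table row, or None."""
    detail = BeautifulSoup(urlopen(url), 'html.parser')
    try:
        return detail.find_all('tr')[-1].find('a')['href']
    except (IndexError, TypeError):  # no rows, or no <a> in the last row
        return None
```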

Append each cleaned row to a list (rows.append(...)) and, after processing all rows, optionally print the list for verification.

Finally, write the rows list to an external CSV file using csv.writer , resulting in a file with 100 rows of structured company data ready for further analysis.
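Writing the accumulated rows out might look like this (the filename and header labels are our choices; two invented rows stand in for the 100 built in the loop):

```python
import csv

# rows would normally hold the 100 cleaned records built in the loop above.
rows = [['1', 'Acme Ltd', 'London'],
        ['2', 'Widget Co', 'Leeds']]

# newline='' is the csv-module convention for avoiding blank lines on Windows.
with open('techtrack100.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Rank', 'Company', 'Location'])  # header row
    writer.writerows(rows)                            # one line per company
```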

Python · CSV · urllib · BeautifulSoup
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
