
Batch Download Images from a Webpage Using Python

This tutorial explains how to use Python with requests and BeautifulSoup to locate image URLs in a webpage's HTML, extract them via regular expressions, and programmatically download all images to a local folder in an automated batch process.


The article demonstrates how to efficiently download all images from a webpage that presents its content as PPT slides, avoiding the slow manual click‑by‑click method by using a web‑scraping script.

Two main steps are covered: (1) obtaining the URLs of all images on the page by inspecting the HTML (press F12 to open the developer tools) and locating img tags or attributes such as data-src and data-type; (2) iterating over those URLs to download each image and save it locally.

After explaining how to find the image links in the browser's developer tools, the article shows that the links can be extracted with a regular expression such as data-src="(.*?)" data-type. The same pattern applies to the other images on the page.
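As a quick illustration, the pattern can be tested against a toy HTML fragment (the URLs below are invented placeholders, not taken from the actual page):

```python
import re

# A toy fragment in the style the article describes: lazy-loaded images
# carry the real URL in data-src, followed by a data-type attribute.
html = ('<img data-src="https://example.com/pic1.jpg" data-type="jpeg">'
        '<img data-src="https://example.com/pic2.png" data-type="png">')

# The non-greedy group captures everything between data-src=" and " data-type.
pic_urls = re.findall('data-src="(.*?)" data-type', html)
print(pic_urls)  # ['https://example.com/pic1.jpg', 'https://example.com/pic2.png']
```

The non-greedy `(.*?)` is important here: a greedy `(.*)` would swallow everything up to the last `" data-type` occurrence in the document instead of stopping at the nearest one.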

The complete Python program is provided below. It uses requests for the HTTP requests, re for the regex matching, and os for creating the output folder. The script counts the images on the page, prompts the user for a folder name, creates the folder, and then downloads each image, skipping responses smaller than 200 bytes, which are likely placeholders or error pages.

<code># -*- coding: utf-8 -*-
import os
import re
import requests

file = ''
List = []

# Crawl image links

def Find(url, A):
    global List
    print('Counting images on the page, please wait...')
    s = 0
    try:
        Result = A.get(url, timeout=7, allow_redirects=False)
    except requests.RequestException:
        print('Error: the page could not be fetched')
    else:
        result = Result.text
        pic_url = re.findall('data-src="(.*?)" data-type', result)  # extract URLs
        s += len(pic_url)
        if len(pic_url) == 0:
            print('No images found')
        else:
            List.append(pic_url)
    return s

# Download pictures

def downloadPicture():
    if not List:  # nothing was found, so there is nothing to download
        return
    num = 1
    for each in List[0]:
        print('Downloading image %d, URL: %s' % (num, each))
        try:
            pic = requests.get(each, timeout=7)
        except requests.RequestException:
            print('Error: this image could not be downloaded')
            continue
        if len(pic.content) < 200:  # skip tiny placeholder or error responses
            continue
        path = os.path.join(file, '%d.jpg' % num)
        with open(path, 'wb') as fp:
            fp.write(pic.content)
        num += 1

if __name__ == '__main__':
    headers = {
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
        'Upgrade-Insecure-Requests': '1'
    }
    A = requests.Session()
    A.headers = headers
    url = 'https://mp.weixin.qq.com/s/An0nKnwlml9gvyUDyT65zQ'
    total = Find(url, A)
    print('Detected %d images in total' % total)
    file = input('Enter a folder name for storing the images: ')
    while os.path.exists(file):
        print('That folder already exists, please enter another name')
        file = input('Enter a folder name for storing the images: ')
    os.mkdir(file)
    downloadPicture()
    print('Crawl finished, thanks for using this script')
</code>

To use the script, simply replace the url variable with the target page URL and adjust the regular expression if the image attributes differ.
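If the attributes vary too much for a single regex, parsing the HTML with BeautifulSoup is a more robust alternative; a minimal sketch, using an invented HTML fragment with placeholder URLs:

```python
from bs4 import BeautifulSoup

# Stand-in for response.text; one lazy-loaded image and one plain one.
html = ('<img data-src="https://example.com/pic1.jpg" data-type="jpeg">'
        '<img src="https://example.com/logo.gif">')

soup = BeautifulSoup(html, "html.parser")

# Prefer data-src where present, falling back to the plain src attribute.
urls = [img.get("data-src") or img.get("src") for img in soup.find_all("img")]
print(urls)  # ['https://example.com/pic1.jpg', 'https://example.com/logo.gif']
```

Unlike the regex, this approach does not depend on the exact attribute order in the markup, at the cost of a small parsing overhead.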

automation · web scraping · Requests · BeautifulSoup · Image Download
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
