Batch Download Images from a Webpage Using Python
This tutorial explains how to use Python to automatically extract image URLs from a webpage via developer tools and regular expressions, then download all images in bulk with a simple script that handles sessions, headers, and file saving.
The article demonstrates a two‑step process for bulk downloading images displayed on a webpage: first, locate the image URLs using the browser's developer tools (F12) and extract them with a regular expression that matches the data-src attribute; second, iterate over the collected URLs and download each image to a local folder using the requests library.
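The extraction step can be sketched in isolation. This is a minimal example of the regex approach the article describes; the HTML fragment and URLs here are made up for illustration, but the pattern matches the one used in the full script below:

```python
import re

# Hypothetical HTML fragment mimicking the structure described above:
# each image URL lives in a data-src attribute followed by data-type.
html = (
    '<img data-src="https://example.com/pic1.jpg" data-type="jpeg">'
    '<img data-src="https://example.com/pic2.png" data-type="png">'
)

# Non-greedy capture of everything between data-src=" and " data-type
pic_urls = re.findall('data-src="(.*?)" data-type', html)
print(pic_urls)  # ['https://example.com/pic1.jpg', 'https://example.com/pic2.png']
```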
It provides detailed instructions on how to open the HTML source, identify the <img> tags, and verify that the correct part of the HTML is highlighted. The extracted URLs are stored in a list for later processing.
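For pages whose markup varies, BeautifulSoup offers a more robust alternative to a hand-written regex for collecting these attributes. A minimal sketch, assuming the same hypothetical `data-src` structure as above:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment for illustration
html = (
    '<img data-src="https://example.com/pic1.jpg" data-type="jpeg">'
    '<img data-src="https://example.com/pic2.png" data-type="png">'
)

soup = BeautifulSoup(html, 'html.parser')
# Collect the data-src attribute from every <img> tag that has one
urls = [img['data-src'] for img in soup.find_all('img', attrs={'data-src': True})]
print(urls)  # ['https://example.com/pic1.jpg', 'https://example.com/pic2.png']
```

Unlike the regex, this does not break if the attribute order or surrounding whitespace changes.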
Finally, a complete Python script is presented. The script creates a session with custom headers, fetches the target page, extracts all image URLs with the regular expression, and downloads each image, handling request errors and skipping suspiciously small responses:
    # -*- coding: utf-8 -*-
    import os
    import re

    import requests

    file = ''
    List = []


    def Find(url, A):
        """Fetch the page and collect all image URLs into the global List."""
        global List
        print('Counting images, please wait...')
        s = 0
        try:
            Result = A.get(url, timeout=7, allow_redirects=False)
        except BaseException:
            print('Request failed')
        else:
            result = Result.text
            # Use a regular expression to pull the image URLs
            # out of the data-src attributes
            pic_url = re.findall('data-src="(.*?)" data-type', result)
            s += len(pic_url)
            if len(pic_url) == 0:
                print('No image URLs found')
            else:
                List.extend(pic_url)
        return s


    def downloadPicture():
        """Download every collected URL into the user-chosen folder."""
        num = 1
        for each in List:
            print('Downloading image %d, URL: %s' % (num, each))
            try:
                pic = requests.get(each, timeout=7)
            except BaseException:
                print('Error: this image could not be downloaded')
                continue
            else:
                # Skip obviously broken responses (placeholders, error payloads)
                if len(pic.content) < 200:
                    continue
                path = os.path.join(file, str(num) + '.jpg')
                with open(path, 'wb') as fp:
                    fp.write(pic.content)
                num += 1


    if __name__ == '__main__':
        headers = {
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Connection': 'keep-alive',
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
            'Upgrade-Insecure-Requests': '1'
        }
        A = requests.Session()
        A.headers = headers
        url = 'https://mp.weixin.qq.com/s/An0nKnwlml9gvyUDyT65zQ'
        total = Find(url, A)
        print('Found %d images in total' % total)
        file = input('Enter the name of a folder to store the images: ')
        if os.path.exists(file):
            print('That folder already exists, please enter another name')
            file = input('Enter the name of a folder to store the images: ')
            os.mkdir(file)
        else:
            os.mkdir(file)
        downloadPicture()
        print('Crawl finished, thanks for using this script')

The guide also notes that the script can be adapted to other URLs by changing the target link and, if necessary, adjusting the regular expression to match different HTML structures.
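As an example of such an adaptation, a page that stores image URLs in ordinary `src` attributes only needs a different pattern. The HTML fragment here is a made-up illustration:

```python
import re

# Hypothetical page that uses plain src attributes instead of data-src
html = (
    '<img src="https://example.com/a.jpg">'
    '<img src="https://example.com/b.jpg">'
)

# Adjusted pattern: capture everything between <img src=" and the closing quote
pic_urls = re.findall(r'<img src="(.*?)"', html)
print(pic_urls)  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```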
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.