Batch Download Images from a Webpage Using Python
This tutorial explains how to use Python to automatically extract image URLs from a webpage via developer tools and regular expressions, then download all images in bulk with a simple script that handles sessions, headers, and file saving.
The article demonstrates a two‑step process for bulk downloading images displayed on a webpage: first, locate the image URLs using the browser's developer tools (F12) and extract them with a regular expression that matches the data-src attribute; second, iterate over the collected URLs and download each image to a local folder using the requests library.
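The extraction step can be sketched in isolation. This is a minimal example of the regex approach the article describes; the HTML fragment and URLs here are made up for illustration, but the pattern matches the one used in the full script below:

```python
import re

# Hypothetical HTML fragment mimicking the structure described above:
# each image URL lives in a data-src attribute followed by data-type.
html = (
    '<img data-src="https://example.com/pic1.jpg" data-type="jpeg">'
    '<img data-src="https://example.com/pic2.png" data-type="png">'
)

# Non-greedy capture of everything between data-src=" and " data-type
pic_urls = re.findall('data-src="(.*?)" data-type', html)
print(pic_urls)  # ['https://example.com/pic1.jpg', 'https://example.com/pic2.png']
```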
It provides detailed instructions on how to open the HTML source, identify the <img> tags, and verify that the correct part of the HTML is highlighted. The extracted URLs are stored in a list for later processing.
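For pages whose markup varies, BeautifulSoup offers a more robust alternative to a hand-written regex for collecting these attributes. A minimal sketch, assuming the same hypothetical `data-src` structure as above:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment for illustration
html = (
    '<img data-src="https://example.com/pic1.jpg" data-type="jpeg">'
    '<img data-src="https://example.com/pic2.png" data-type="png">'
)

soup = BeautifulSoup(html, 'html.parser')
# Collect the data-src attribute from every <img> tag that has one
urls = [img['data-src'] for img in soup.find_all('img', attrs={'data-src': True})]
print(urls)  # ['https://example.com/pic1.jpg', 'https://example.com/pic2.png']
```

Unlike the regex, this does not break if the attribute order or surrounding whitespace changes.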
Finally, a complete Python script is presented. The script creates a session with custom headers, fetches the target page, extracts all image URLs with the regular expression, and downloads each image, handling request errors and skipping suspiciously small responses:
    # -*- coding: utf-8 -*-
    import os
    import re

    import requests

    file = ''
    List = []


    def Find(url, A):
        """Fetch the page and collect all image URLs into the global List."""
        global List
        print('Counting images, please wait...')
        s = 0
        try:
            Result = A.get(url, timeout=7, allow_redirects=False)
        except BaseException:
            print('Request failed')
        else:
            result = Result.text
            # Use a regular expression to pull the image URLs
            # out of the data-src attributes
            pic_url = re.findall('data-src="(.*?)" data-type', result)
            s += len(pic_url)
            if len(pic_url) == 0:
                print('No image URLs found')
            else:
                List.extend(pic_url)
        return s


    def downloadPicture():
        """Download every collected URL into the user-chosen folder."""
        num = 1
        for each in List:
            print('Downloading image %d, URL: %s' % (num, each))
            try:
                pic = requests.get(each, timeout=7)
            except BaseException:
                print('Error: this image could not be downloaded')
                continue
            else:
                # Skip obviously broken responses (placeholders, error payloads)
                if len(pic.content) < 200:
                    continue
                path = os.path.join(file, str(num) + '.jpg')
                with open(path, 'wb') as fp:
                    fp.write(pic.content)
                num += 1


    if __name__ == '__main__':
        headers = {
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Connection': 'keep-alive',
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0',
            'Upgrade-Insecure-Requests': '1'
        }
        A = requests.Session()
        A.headers = headers
        url = 'https://mp.weixin.qq.com/s/An0nKnwlml9gvyUDyT65zQ'
        total = Find(url, A)
        print('Found %d images in total' % total)
        file = input('Enter the name of a folder to store the images: ')
        if os.path.exists(file):
            print('That folder already exists, please enter another name')
            file = input('Enter the name of a folder to store the images: ')
            os.mkdir(file)
        else:
            os.mkdir(file)
        downloadPicture()
        print('Crawl finished, thanks for using this script')

The guide also notes that the script can be adapted to other URLs by changing the target link and, if necessary, adjusting the regular expression to match different HTML structures.
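As an example of such an adaptation, a page that stores image URLs in ordinary `src` attributes only needs a different pattern. The HTML fragment here is a made-up illustration:

```python
import re

# Hypothetical page that uses plain src attributes instead of data-src
html = (
    '<img src="https://example.com/a.jpg">'
    '<img src="https://example.com/b.jpg">'
)

# Adjusted pattern: capture everything between <img src=" and the closing quote
pic_urls = re.findall(r'<img src="(.*?)"', html)
print(pic_urls)  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```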
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.