
How to Download All Images from a Web Page in Python

Imagine effortlessly capturing every image from a website that catches your eye, without the tedious task of clicking and saving each one individually. That’s not just wishful thinking; it’s entirely possible with a bit of Python and a user-friendly graphical interface.

In today’s tutorial, you’ll learn how to download all images from a web page in Python and incorporate a graphical interface, which is perfect for users who want a simple and effective way to save their favorite images.

Let’s get started!

Necessary Libraries

The tkinter library ships with most Python installations, so you usually don’t need to install it separately (on some Linux distributions it lives in its own system package, e.g. python3-tk on Debian/Ubuntu). Make sure to install the requests and beautifulsoup4 libraries via the terminal or your command prompt for the code to function properly:

$ pip install requests
$ pip install beautifulsoup4

Imports

We start by importing several libraries and modules to build our program:

First, we import os to interact with the operating system. This will allow our program to perform tasks like navigating directories and handling files. Next, we import the requests library, which will act as a browser for our program, enabling us to fetch web pages and images.

import os
import requests

To keep track of what our program is doing, we use the logging module. This will act as a diary, recording events and errors, which is invaluable for debugging and monitoring the program’s behavior.

import logging

For the graphical user interface (GUI), we use the tkinter library. From it, we specifically import ttk, filedialog, and messagebox. These components provide themed widgets such as buttons and entry fields, a dialog for choosing the folder where downloaded images will be stored, and pop-up message boxes, respectively.

import tkinter as tk
from tkinter import ttk, filedialog, messagebox

For parsing HTML, we import BeautifulSoup from the bs4 library. This tool is essential for navigating and manipulating the structure of web pages we fetch.

from bs4 import BeautifulSoup

We’ll also need to handle URLs effectively. For this purpose, we import urljoin and urlparse from the urllib.parse module. These functions will help us resolve relative URLs and parse URL components.

from urllib.parse import urljoin, urlparse
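
To make their behavior concrete, here is a quick illustration with made-up URLs (not part of the program itself):

# urljoin resolves a relative image path against the page's URL
print(urljoin("https://example.com/gallery/page.html", "/img/photo.png"))
# -> https://example.com/img/photo.png

# urlparse splits a URL into named components
parts = urlparse("https://example.com/img/photo.png")
print(parts.scheme, parts.netloc, parts.path)
# -> https example.com /img/photo.png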

Finally, to run tasks in the background without blocking the main window of our GUI, we use the threading module. This allows our program to perform long-running operations, such as downloading files, without freezing the user interface.

import threading

Suppressing Logging

In this part of the code, we configure our program to suppress “info” and “debug” log messages, which are less critical and mostly clutter the console output, while retaining logs at the “warning” level and above. In other words, we instruct the program not to report every action it performs, but to alert us only when something goes wrong.

# Setup logging for monitoring operations
logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger('image_downloader')

# Suppress INFO logs from the image_downloader logger
logging.getLogger("image_downloader").setLevel(logging.WARNING)

# Suppress debug logs from urllib3 and chardet
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("chardet").setLevel(logging.WARNING)
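
As a quick illustration of the effect, and not part of the program itself, here is how the two levels behave under this configuration:

# INFO is below this logger's WARNING threshold, so it is filtered out
log.info("Image saved")
# WARNING and above still reach the console
log.warning("Something looks off")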

Download Image from a Web Page Functions

Now, let’s define the heart of our code:

check_url_validity Function

As the name suggests, this function verifies the validity of a URL entered by the user. It does so by parsing the input URL into its components, such as scheme, netloc (network location), path, etc.

The function then checks if the URL contains both a netloc and a scheme (e.g., http or https). If both components are present, the function considers the URL valid and returns True. However, if either component is missing, the URL is deemed invalid, and the function returns False.

def check_url_validity(web_url):
    """
    Verifies if the provided URL is well-formed and valid.
    """
    parsed_url = urlparse(web_url)
    return bool(parsed_url.netloc and parsed_url.scheme)
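
To see it in action, here are a couple of hypothetical inputs:

print(check_url_validity("https://example.com/page"))  # True: has scheme and netloc
print(check_url_validity("example.com/page"))          # False: missing the scheme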

extract_image_urls Function

This function begins by accessing the URL inputted by the user, already validated by the previous function. It uses requests.get() to fetch the webpage’s content. Following this, BeautifulSoup is employed to parse the HTML content of the webpage, setting the stage for the next steps.

The function then proceeds to search for images by using find_all('img') to locate <img> tags. For each image found, it extracts the src attribute, which holds the URL of the image. To form the complete URL of each image, the function utilizes urljoin() to combine the base URL with the src attribute.

After creating the URLs, we validate them. Valid URLs are added to a download list. Any errors are logged to alert the user.

def extract_image_urls(web_page_url):
    """
    Collects and returns all the image URLs found on the provided web page URL.
    """
    try:
        page_response = requests.get(web_page_url, headers={'User-Agent': 'Custom User Agent'})
        page_response.raise_for_status()
        html_parser = BeautifulSoup(page_response.content, "html.parser")
        image_urls = []
        for image_tag in html_parser.find_all("img"):
            image_source = image_tag.get("src")
            if image_source:
                full_image_url = urljoin(web_page_url, image_source)
                full_image_url = full_image_url.split("?")[0]  # Strip URL query parameters
                if check_url_validity(full_image_url):
                    image_urls.append(full_image_url)
        return image_urls
    except Exception as error:
        log.error(f"Failed to fetch images from {web_page_url}: {error}")
        return []
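
If you want to test this function on its own before wiring up the GUI, a minimal, hypothetical usage looks like this (any reachable page URL would do):

urls = extract_image_urls("https://example.com")
print(f"Found {len(urls)} image URLs")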

save_image Function

First off, this function checks if the folder where you want to save your images actually exists. If it doesn’t, no sweat; the function will just go ahead and create it. This step makes sure there’s a spot ready for all those images you’re about to download, keeping everything neat and organized.

Next up, it’s time to get those images. The function uses requests.get() with stream=True to grab each one from the list we prepared earlier. It pulls the images down in their raw, binary form and tucks them into the newly confirmed folder, writing the data in small chunks (1024 bytes at a time) to stay memory-efficient. Once all is said and done, it logs how everything went, recording which images made it into the folder smoothly and whether any issues popped up along the way.

def save_image(image_url, target_folder):
    """
    Downloads and saves an image from its URL to the specified directory.
    """
    if not os.path.exists(target_folder):
        os.makedirs(target_folder)

    try:
        with requests.get(image_url, stream=True) as response:
            response.raise_for_status()
            file_path = os.path.join(target_folder, os.path.basename(image_url))

            with open(file_path, "wb") as file:
                for chunk in response.iter_content(chunk_size=1024):
                    file.write(chunk)

            log.info(f"Image saved: {file_path}")
    except Exception as error:
        log.error(f"Failed to download {image_url}: {error}")

fetch_and_save_images Function

This one serves as a coordinator for the two preceding functions. Initially, it invokes the extract_image_urls() function to compile a list of image URLs. Following that, it loops through each URL in this list, calling the save_image() function for every single one to download the image and store it in the specified save_path directory. After completing these operations, it displays a message box to signal the successful completion of the process.

def fetch_and_save_images(main_url, save_path):
    """
    Main function to retrieve and store images from a given URL.
    """
    images = extract_image_urls(main_url)
    for image in images:
        save_image(image, save_path)
    messagebox.showinfo("Download Complete", "All images have been downloaded.")

start_download_process Function

It begins by collecting the user’s inputs: the web page URL and the directory path. It then submits the URL to the check_url_validity() function to verify its validity, and also makes sure a save folder has actually been chosen. If either check fails, an error message is displayed to the user through a pop-up.

Conversely, if both checks pass, the function initiates a thread to run fetch_and_save_images() in the background, thereby starting the image download process. Simultaneously, it informs the user that the download has commenced by displaying a message box.

def start_download_process():
    entered_url = url_input.get()
    save_directory = directory_input.get()
    if not check_url_validity(entered_url):
        messagebox.showerror("Error", "The URL entered is not valid.")
        return
    if not save_directory:
        messagebox.showerror("Error", "Please choose a folder to save the images.")
        return

    # Start the download process in a separate thread
    download_thread = threading.Thread(target=fetch_and_save_images, args=(entered_url, save_directory))
    download_thread.start()

    messagebox.showinfo("Download Started", "Image download process has started.")
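
One optional variation, not used in the code above: passing daemon=True when creating the thread means a still-running download won’t keep the program alive after the window is closed.

# Variation: a daemon thread exits together with the main program
download_thread = threading.Thread(
    target=fetch_and_save_images,
    args=(entered_url, save_directory),
    daemon=True,
)
download_thread.start()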

choose_directory Function

The last one opens a file dialog, enabling the user to select a destination folder for saving images. Once a selection is made, the function captures the chosen directory and updates the input field in the main window with this information. This allows the user to visibly confirm the directory they have selected for image storage.

def choose_directory():
    folder_selected = filedialog.askdirectory()
    if folder_selected:
        directory_input.delete(0, tk.END)
        directory_input.insert(0, folder_selected)

GUI Setup

Here, we start by constructing the main window, which will host our interface elements including buttons, entry fields, and labels. First, we assign a title to the window. Following that, we create a label prompting the user to enter a URL, and directly below it, we place an entry field for the URL input.

Next, we introduce another entry field designated for the directory path, accompanied by a “Browse” button. This button is linked to the choose_directory() function, allowing users to select a save directory through a file dialog. Subsequently, we add a “Start Download” button, which is connected to the start_download_process() function, initiating the download process upon click.

All these elements are organized within the main window using a grid layout, ensuring a structured and user-friendly interface.

# GUI Setup
root = tk.Tk()
root.title("Image Downloader - The Pycodes")

url_label = ttk.Label(root, text="Enter URL:")
url_label.grid(column=0, row=0, padx=5, pady=5, sticky=tk.W)
url_input = ttk.Entry(root, width=50)
url_input.grid(column=1, row=0, padx=5, pady=5, sticky=tk.EW)

directory_label = ttk.Label(root, text="Save Images To:")
directory_label.grid(column=0, row=1, padx=5, pady=5, sticky=tk.W)
directory_input = ttk.Entry(root, width=50)
directory_input.grid(column=1, row=1, padx=5, pady=5, sticky=tk.EW)
browse_button = ttk.Button(root, text="Browse", command=choose_directory)
browse_button.grid(column=2, row=1, padx=5, pady=5)

download_btn = ttk.Button(root, text="Start Download", command=start_download_process)
download_btn.grid(column=0, row=2, columnspan=3, padx=5, pady=5, sticky=tk.EW)

Main Loop

Finally, this part ensures that the main window remains open and responsive to user interactions until it is explicitly closed by the user.

root.mainloop()

Full Code

import os
import requests
import logging
import tkinter as tk
from tkinter import ttk, filedialog, messagebox
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import threading


# Setup logging for monitoring operations
logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger('image_downloader')

# Suppress INFO logs from the image_downloader logger
logging.getLogger("image_downloader").setLevel(logging.WARNING)

# Suppress debug logs from urllib3 and chardet
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("chardet").setLevel(logging.WARNING)


def check_url_validity(web_url):
    """
    Verifies if the provided URL is well-formed and valid.
    """
    parsed_url = urlparse(web_url)
    return bool(parsed_url.netloc and parsed_url.scheme)


def extract_image_urls(web_page_url):
    """
    Collects and returns all the image URLs found on the provided web page URL.
    """
    try:
        page_response = requests.get(web_page_url, headers={'User-Agent': 'Custom User Agent'})
        page_response.raise_for_status()
        html_parser = BeautifulSoup(page_response.content, "html.parser")
        image_urls = []
        for image_tag in html_parser.find_all("img"):
            image_source = image_tag.get("src")
            if image_source:
                full_image_url = urljoin(web_page_url, image_source)
                full_image_url = full_image_url.split("?")[0]  # Strip URL query parameters
                if check_url_validity(full_image_url):
                    image_urls.append(full_image_url)
        return image_urls
    except Exception as error:
        log.error(f"Failed to fetch images from {web_page_url}: {error}")
        return []


def save_image(image_url, target_folder):
    """
    Downloads and saves an image from its URL to the specified directory.
    """
    if not os.path.exists(target_folder):
        os.makedirs(target_folder)

    try:
        with requests.get(image_url, stream=True) as response:
            response.raise_for_status()
            file_path = os.path.join(target_folder, os.path.basename(image_url))

            with open(file_path, "wb") as file:
                for chunk in response.iter_content(chunk_size=1024):
                    file.write(chunk)

            log.info(f"Image saved: {file_path}")
    except Exception as error:
        log.error(f"Failed to download {image_url}: {error}")


def fetch_and_save_images(main_url, save_path):
    """
    Main function to retrieve and store images from a given URL.
    """
    images = extract_image_urls(main_url)
    for image in images:
        save_image(image, save_path)
    messagebox.showinfo("Download Complete", "All images have been downloaded.")


def start_download_process():
    entered_url = url_input.get()
    save_directory = directory_input.get()
    if not check_url_validity(entered_url):
        messagebox.showerror("Error", "The URL entered is not valid.")
        return
    if not save_directory:
        messagebox.showerror("Error", "Please choose a folder to save the images.")
        return

    # Start the download process in a separate thread
    download_thread = threading.Thread(target=fetch_and_save_images, args=(entered_url, save_directory))
    download_thread.start()

    messagebox.showinfo("Download Started", "Image download process has started.")


def choose_directory():
    folder_selected = filedialog.askdirectory()
    if folder_selected:
        directory_input.delete(0, tk.END)
        directory_input.insert(0, folder_selected)


# GUI Setup
root = tk.Tk()
root.title("Image Downloader - The Pycodes")

url_label = ttk.Label(root, text="Enter URL:")
url_label.grid(column=0, row=0, padx=5, pady=5, sticky=tk.W)
url_input = ttk.Entry(root, width=50)
url_input.grid(column=1, row=0, padx=5, pady=5, sticky=tk.EW)

directory_label = ttk.Label(root, text="Save Images To:")
directory_label.grid(column=0, row=1, padx=5, pady=5, sticky=tk.W)
directory_input = ttk.Entry(root, width=50)
directory_input.grid(column=1, row=1, padx=5, pady=5, sticky=tk.EW)
browse_button = ttk.Button(root, text="Browse", command=choose_directory)
browse_button.grid(column=2, row=1, padx=5, pady=5)

download_btn = ttk.Button(root, text="Start Download", command=start_download_process)
download_btn.grid(column=0, row=2, columnspan=3, padx=5, pady=5, sticky=tk.EW)

root.mainloop()

Happy Coding!
