How to Extract and Submit Web Forms Using Python

Web forms come up constantly in web scraping and automation projects. Whether you need to gather information, send data, or automate repetitive tasks, knowing how to read a form's structure and submit it programmatically is a useful skill, and Python's libraries make both steps straightforward.

In this tutorial, you'll learn how to extract and submit web forms using Python. We'll build a graphical user interface (GUI) that lets you enter a URL, load the forms found on that page, select one of them, edit its fields, and submit it. Behind the scenes, requests fetches the pages and BeautifulSoup parses them. So, let's get started!

Necessary Libraries

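Make sure the requests and beautifulsoup4 libraries are installed by running these commands in your terminal or command prompt:

$ pip install requests
$ pip install beautifulsoup4

tkinter is part of Python's standard library, so it does not need to be installed with pip (the tk package on PyPI is not tkinter). You can check that it is available by running python -m tkinter, which should open a small test window. On some Linux distributions it is packaged separately, for example as python3-tk on Debian and Ubuntu.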

Imports

Our program aims to be user-friendly, so we create a graphical user interface. First, we import the tkinter library as tk, and from this library, we specifically import messagebox to use message boxes. Next, we import the urljoin function from the urllib.parse module, which allows us to form complete URLs by combining the base URL with a relative URL. We also import requests to make HTTP requests and retrieve data from URLs.

Additionally, we import BeautifulSoup from the bs4 module to parse HTML and XML documents. Lastly, we include the webbrowser module to open URLs in the default browser.

import tkinter as tk
from tkinter import messagebox
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
import webbrowser
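
Since urljoin decides where a form is actually submitted, it helps to see how it resolves different kinds of action values. The following is a small sketch; the page URL and action values are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only for illustration
page_url = "https://example.com/account/login"

# A root-relative action replaces the whole path
print(urljoin(page_url, "/submit"))   # https://example.com/submit

# A plain relative action replaces only the last path segment
print(urljoin(page_url, "submit"))    # https://example.com/account/submit

# An absolute action is returned unchanged
print(urljoin(page_url, "https://other.example/post"))  # https://other.example/post

# An empty action resolves to the page itself
print(urljoin(page_url, ""))          # https://example.com/account/login
```

The last case matters for forms that have no action attribute: they submit back to the page they were loaded from.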

Global Variables and Session Setup

We start by initializing an empty list, form_list, to store the forms found on a web page. Then, we initialize an empty string, current_url, to hold the URL currently being explored in the application. Lastly, we create web_session with requests.Session(), which keeps session data such as cookies between requests, so we can stay logged in to a website across multiple pages.

# Global variables to store form details and session
form_list = []
current_url = ""
web_session = requests.Session()
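
To make the role of the session more concrete, here is a minimal sketch of how state stored on a requests.Session is attached to later requests. The header value, cookie, and URL are placeholders, and the request is only prepared, never sent, so no network access is needed.

```python
import requests

session = requests.Session()

# Settings placed on the session apply to every request made through it
session.headers.update({"User-Agent": "form-explorer-demo"})      # hypothetical value
session.cookies.set("sessionid", "abc123", domain="example.com")  # hypothetical cookie

# Prepare (but do not send) a request to inspect what would go out
request = requests.Request("GET", "https://example.com/profile")
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])  # the header set on the session
print(prepared.headers.get("Cookie"))  # the cookie header built from the session's cookie jar
```

In the application the cookies are not set by hand; they arrive in server responses and the session stores them automatically.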

Functions for Extracting and Submitting Web Forms

Now, let’s define our functions:

setup_interface Function

The objective of this function is to create the GUI elements. First, it creates label_url, a label that prompts the user to enter a URL. Next, it adds an entry widget for URL input. It then creates a “Load Forms” button, which triggers the load_forms() function. This function also adds a listbox that displays the forms loaded by load_forms(), allowing the user to select one.

Following this, the function adds the “Edit & Submit Selected Form” button, which calls the edit_form() function when clicked. Finally, the function creates frame_form_fields, a frame that holds the fields of the selected form and is rebuilt each time the user picks a different form.

def setup_interface(root):
   global entry_url, listbox_forms, frame_form_fields


   # Interface for URL input
   label_url = tk.Label(root, text="Enter URL:")
   label_url.pack(pady=(10, 0))
   entry_url = tk.Entry(root, width=60)
   entry_url.pack(pady=5)


   # Button to retrieve forms
   button_load_forms = tk.Button(root, text="Load Forms", command=load_forms)
   button_load_forms.pack(pady=10)


   # Listbox for showing forms
   listbox_forms = tk.Listbox(root, height=6)
   listbox_forms.pack(pady=15, fill=tk.X)


   # Button to choose a form
   button_select_form = tk.Button(root, text="Edit & Submit Selected Form", command=edit_form)
   button_select_form.pack(pady=5)


   # Frame for displaying form fields
   frame_form_fields = tk.Frame(root)
   frame_form_fields.pack(fill=tk.BOTH, expand=True)

load_forms Function

The first thing this function does is declare form_list, current_url, entry_url, listbox_forms, and web_session as global variables. This allows the function to access and modify these variables even though they are defined outside this function. Initially, the function retrieves the URL entered by the user, stripping it of any leading or trailing whitespace using entry_url.get().strip(). If no URL is entered, an error message prompts the user to provide one.

Once the URL is handled, the function begins a try-except block. The try block starts by sending an HTTP GET request to the URL provided by the user via web_session.get(current_url), storing the response in the response variable. Next, it retrieves the HTML content of the response, parses it with BeautifulSoup, and stores it in the parsed_html variable. This parsed HTML is then used to find all ‘form’ elements with parsed_html.find_all("form"), which are stored in form_list.

After extracting the form elements, the function prepares to display them. It first clears any existing content in listbox_forms using listbox_forms.delete(0, tk.END). It then iterates through all the forms in form_list using the enumerate() function. For each form, it retrieves the value of the ‘action’ attribute using form.attrs.get(), defaulting to ‘No specified action’ if the attribute is missing. Each form is then added to the listbox_forms widget using listbox_forms.insert(), labeled with its position and action.

If any errors occur during the process within the try block, an error message will be displayed to the user.

def load_forms():
   global form_list, current_url, entry_url, listbox_forms, web_session


   current_url = entry_url.get().strip()
   if not current_url:
       messagebox.showerror("Error", "Please provide a URL.")
       return


   try:
       response = web_session.get(current_url)
       parsed_html = BeautifulSoup(response.text, "html.parser")
       form_list = parsed_html.find_all("form")
       listbox_forms.delete(0, tk.END)
       for idx, form in enumerate(form_list):
           form_action = form.attrs.get('action', 'No specified action')
           listbox_forms.insert(tk.END, f"Form {idx + 1}: {form_action}")
   except Exception as e:
       messagebox.showerror("Error", f"Could not load forms: {str(e)}")
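
The parsing step can be tried without the GUI or a network connection. The sketch below runs the same find_all("form") logic on a hand-written HTML string (a stand-in for response.text) and collects the labels that would be shown in the listbox.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, used in place of a live response
sample_html = """
<html><body>
  <form action="/search" method="get"><input name="q"></form>
  <form method="post"><input name="email"></form>
</body></html>
"""

parsed_html = BeautifulSoup(sample_html, "html.parser")
forms = parsed_html.find_all("form")

labels = []
for idx, form in enumerate(forms):
    # Fall back to a placeholder when the form has no action attribute
    action = form.attrs.get("action", "No specified action")
    labels.append(f"Form {idx + 1}: {action}")

print(labels)  # ['Form 1: /search', 'Form 2: No specified action']
```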

edit_form Function

This function starts by declaring listbox_forms and frame_form_fields as global variables. It then retrieves the index of the form selected by the user from listbox_forms using listbox_forms.curselection(). If no form is selected, an error message is displayed, prompting the user to select a form first, and the function returns early.

After obtaining the selected index, the function accesses the corresponding form from form_list using this index and retrieves its details with the extract_form_details() function, storing them in form_details.

Next, the function clears any old form fields by iterating over all child widgets in frame_form_fields using winfo_children() and destroying each widget.

It then creates new GUI elements based on the form_details. For each field in form_details['fields'], it generates a label and an entry widget. The label displays the field name, and the entry widget is pre-filled with default values if available. Each label and entry widget is packed into the frame_form_fields frame.

Finally, a “Submit Form” button is created and packed at the bottom of the frame. This button, when clicked, calls the submit_selected_form() function with the form_details as its argument.

def edit_form():
   global listbox_forms, frame_form_fields


   selected_index = listbox_forms.curselection()
   if not selected_index:
       messagebox.showerror("Error", "Select a form first.")
       return
   selected_form = form_list[selected_index[0]]
   form_details = extract_form_details(selected_form)


   # Clear old form fields
   for widget in frame_form_fields.winfo_children():
       widget.destroy()


   for field in form_details['fields']:
       label = tk.Label(frame_form_fields, text=f"{field['label']}:")
       label.pack()
       entry = tk.Entry(frame_form_fields, width=50)
       entry.insert(0, field['default'])
       entry.pack()


   submit_button = tk.Button(frame_form_fields, text="Submit Form", command=lambda: submit_selected_form(form_details))
   submit_button.pack(pady=10)

submit_selected_form Function

This function takes form_details as a parameter and declares frame_form_fields, current_url, and web_session as global variables. It first retrieves all child widgets of frame_form_fields using winfo_children(). These are the label and entry pairs created by edit_form(), in packing order, followed by the “Submit Form” button. The slice [1::2] then picks every second widget starting from index 1, which selects only the entry widgets and skips the labels and the button. Next, it creates a dictionary called form_data, where the keys are the form field names and the values are the user inputs (widget.get()). The zip() function pairs each form field with its entry widget.
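
The slicing and pairing can be illustrated with plain lists. In this sketch the strings are stand-ins for the real tkinter widgets, and the field names and values are invented.

```python
# Stand-ins for the widgets packed into the frame, in packing order
children = [
    "label:username", "entry:alice",
    "label:password", "entry:s3cret",
    "button:submit",
]

# Odd positions hold the entries; labels and the final button sit at even positions
entries = children[1::2]

fields = [{"name": "username"}, {"name": "password"}]

# Pair each field with its entry and keep the text after the "entry:" prefix
form_data = {
    field["name"]: entry.split(":", 1)[1]
    for field, entry in zip(fields, entries)
}

print(entries)    # ['entry:alice', 'entry:s3cret']
print(form_data)  # {'username': 'alice', 'password': 's3cret'}
```

This approach relies on the widgets being packed in a strict label, entry order; adding any other widget to the frame would shift the positions.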

The function then constructs the URL for submitting the data by combining the page URL (provided by the user) with the action URL (from form_details) using urljoin(). Depending on whether a GET or POST request is needed, it processes the submission differently. For a GET request, it turns form_data into a query string, appends it to the action URL after a ?, and opens the full URL in the default browser using webbrowser.open(). For a POST request, it sends the form data in the HTTP request body to the action URL using web_session.post(). The HTML of the server's response is written to a temporary file (temp_html), which is then opened in the default browser with webbrowser.open().

def submit_selected_form(form_details):
   global frame_form_fields, current_url, web_session


   input_widgets = frame_form_fields.winfo_children()[1::2]
   form_data = {field['name']: widget.get() for field, widget in zip(form_details['fields'], input_widgets)}


   form_action_url = urljoin(current_url, form_details['action'])


   if form_details['method'] == 'get':
       # Let requests build the query string so that values are percent-encoded
       full_url = requests.Request('GET', form_action_url, params=form_data).prepare().url
       webbrowser.open(full_url)  # Open the URL in the default web browser
   else:
       from pathlib import Path  # Local import, used only to build a file:// URL

       response = web_session.post(form_action_url, data=form_data)
       temp_html = Path('temp_result.html').resolve()  # Absolute path to the result file
       with open(temp_html, 'w', encoding='utf-8') as file:
           file.write(response.text)
       webbrowser.open(temp_html.as_uri())  # Open the saved response in the default web browser
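
Query strings need percent-encoding whenever a value contains spaces or characters such as & and =. The sketch below compares a hand-joined query string with one built by the standard library's urlencode; the field names and values are made up.

```python
from urllib.parse import urlencode

form_data = {"q": "python forms", "tag": "a&b"}

# Joining by hand leaves the space and the "&" inside the value untouched
naive = "&".join(f"{key}={value}" for key, value in form_data.items())

# urlencode escapes each key and value before joining them
encoded = urlencode(form_data)

print(naive)    # q=python forms&tag=a&b
print(encoded)  # q=python+forms&tag=a%26b
```

In the hand-joined version a server would read tag as "a" plus an extra empty parameter named "b", which is why encoding is the safer choice.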

extract_form_details Function

The last function, extract_form_details, accepts a single parameter, form, which is the parsed form element. It begins by initializing a dictionary named details with three keys: action, method, and fields. The action key holds the form’s action URL, read with form.attrs.get('action', '').strip(), which also removes any leading or trailing whitespace. The method key holds the form’s HTTP method, read with form.attrs.get('method', 'get').strip().lower(), which defaults to ‘get’ if the attribute is missing and converts the value to lowercase.

An empty list is assigned to the fields key to store details about each form field. The function then iterates over all input elements found within the form using form.find_all('input'). For each input element, it checks if a name attribute exists. If it does, the function appends a dictionary to details['fields'] containing the input’s name, type, default value, and a label which is a capitalized version of the name.

Finally, the function returns the details dictionary, which now includes comprehensive information about the form’s action URL, method, and fields.

def extract_form_details(form):
   details = {
       'action': form.attrs.get('action', '').strip(),
       'method': form.attrs.get('method', 'get').strip().lower(),
       'fields': []
   }
   for input_element in form.find_all('input'):
       input_name = input_element.attrs.get('name', None)
       if input_name:  # Only add inputs that have a name attribute
           details['fields'].append({
               'name': input_name,
               'type': input_element.attrs.get('type', 'text'),
               'default': input_element.attrs.get('value', ''),
               'label': input_name.capitalize()  # Use name for label, capitalized
           })
   return details
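
To see what these lookups produce, the same attribute logic can be run on a small hand-written form. This is only a sketch: the HTML is invented, and the values are collected into plain variables rather than the details dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical form with padded action, upper-case method, and an unnamed submit input
sample_html = """
<form action=" /login " method="POST">
  <input type="text" name="username" value="guest">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""
form = BeautifulSoup(sample_html, "html.parser").find("form")

action = form.attrs.get("action", "").strip()
method = form.attrs.get("method", "get").strip().lower()

# Keep only inputs that carry a name attribute
named_inputs = [el for el in form.find_all("input") if el.attrs.get("name")]
names = [el.attrs["name"] for el in named_inputs]
defaults = [el.attrs.get("value", "") for el in named_inputs]

print(action, method)  # /login post
print(names)           # ['username', 'password']
print(defaults)        # ['guest', '']
```

The submit input is dropped because it has no name, so it never appears as an editable field in the GUI.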

Main Execution Block

This part ensures that the GUI starts only when the script is run directly, not when it is imported as a module. It creates the main window, sets its title, and defines its geometry.

Additionally, it triggers the setup_interface() function to set up the GUI elements within the main window. Finally, it ensures that the main event loop starts, keeping the main window running and responsive to user interactions until the user decides to exit.

if __name__ == "__main__":
   root = tk.Tk()
   root.title("Web Form Explorer - The Pycodes")
   root.geometry("650x500")


   setup_interface(root)
   root.mainloop()

Full Code

import tkinter as tk
from tkinter import messagebox
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
import webbrowser


# Global variables to store form details and session
form_list = []
current_url = ""
web_session = requests.Session()


def setup_interface(root):
   global entry_url, listbox_forms, frame_form_fields


   # Interface for URL input
   label_url = tk.Label(root, text="Enter URL:")
   label_url.pack(pady=(10, 0))
   entry_url = tk.Entry(root, width=60)
   entry_url.pack(pady=5)


   # Button to retrieve forms
   button_load_forms = tk.Button(root, text="Load Forms", command=load_forms)
   button_load_forms.pack(pady=10)


   # Listbox for showing forms
   listbox_forms = tk.Listbox(root, height=6)
   listbox_forms.pack(pady=15, fill=tk.X)


   # Button to choose a form
   button_select_form = tk.Button(root, text="Edit & Submit Selected Form", command=edit_form)
   button_select_form.pack(pady=5)


   # Frame for displaying form fields
   frame_form_fields = tk.Frame(root)
   frame_form_fields.pack(fill=tk.BOTH, expand=True)


def load_forms():
   global form_list, current_url, entry_url, listbox_forms, web_session


   current_url = entry_url.get().strip()
   if not current_url:
       messagebox.showerror("Error", "Please provide a URL.")
       return


   try:
       response = web_session.get(current_url)
       parsed_html = BeautifulSoup(response.text, "html.parser")
       form_list = parsed_html.find_all("form")
       listbox_forms.delete(0, tk.END)
       for idx, form in enumerate(form_list):
           form_action = form.attrs.get('action', 'No specified action')
           listbox_forms.insert(tk.END, f"Form {idx + 1}: {form_action}")
   except Exception as e:
       messagebox.showerror("Error", f"Could not load forms: {str(e)}")


def edit_form():
   global listbox_forms, frame_form_fields


   selected_index = listbox_forms.curselection()
   if not selected_index:
       messagebox.showerror("Error", "Select a form first.")
       return
   selected_form = form_list[selected_index[0]]
   form_details = extract_form_details(selected_form)


   # Clear old form fields
   for widget in frame_form_fields.winfo_children():
       widget.destroy()


   for field in form_details['fields']:
       label = tk.Label(frame_form_fields, text=f"{field['label']}:")
       label.pack()
       entry = tk.Entry(frame_form_fields, width=50)
       entry.insert(0, field['default'])
       entry.pack()


   submit_button = tk.Button(frame_form_fields, text="Submit Form", command=lambda: submit_selected_form(form_details))
   submit_button.pack(pady=10)


def submit_selected_form(form_details):
   global frame_form_fields, current_url, web_session


   input_widgets = frame_form_fields.winfo_children()[1::2]
   form_data = {field['name']: widget.get() for field, widget in zip(form_details['fields'], input_widgets)}


   form_action_url = urljoin(current_url, form_details['action'])


   if form_details['method'] == 'get':
       # Let requests build the query string so that values are percent-encoded
       full_url = requests.Request('GET', form_action_url, params=form_data).prepare().url
       webbrowser.open(full_url)  # Open the URL in the default web browser
   else:
       from pathlib import Path  # Local import, used only to build a file:// URL

       response = web_session.post(form_action_url, data=form_data)
       temp_html = Path('temp_result.html').resolve()  # Absolute path to the result file
       with open(temp_html, 'w', encoding='utf-8') as file:
           file.write(response.text)
       webbrowser.open(temp_html.as_uri())  # Open the saved response in the default web browser


def extract_form_details(form):
   details = {
       'action': form.attrs.get('action', '').strip(),
       'method': form.attrs.get('method', 'get').strip().lower(),
       'fields': []
   }
   for input_element in form.find_all('input'):
       input_name = input_element.attrs.get('name', None)
       if input_name:  # Only add inputs that have a name attribute
           details['fields'].append({
               'name': input_name,
               'type': input_element.attrs.get('type', 'text'),
               'default': input_element.attrs.get('value', ''),
               'label': input_name.capitalize()  # Use name for label, capitalized
           })
   return details


if __name__ == "__main__":
   root = tk.Tk()
   root.title("Web Form Explorer - The Pycodes")
   root.geometry("650x500")


   setup_interface(root)
   root.mainloop()

Happy Coding!
