How to Develop a Question Answering System with T5 in Python

In today’s data-driven world, the ability to quickly extract relevant information from documents is invaluable. Imagine having a system where you can upload a document, ask a question, and get an accurate answer within seconds. That’s exactly what we are going to build in this tutorial.

We’ll walk you through creating a question answering system using the T5 transformer model in Python, wrapped in a simple tkinter GUI. The system uses natural language processing (NLP) to answer questions based on the content of an uploaded plain-text (.txt) document. Note that the script passes at most 512 tokens of the question and document to the model, so it works best on short documents. Whether you’re a seasoned developer or just getting started with machine learning, this article takes you through the build step by step.

Let’s get started!

Necessary Libraries

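tkinter ships with most Python installations, so it does not need to be installed with pip (on some Linux distributions it comes as a separate system package, for example python3-tk on Debian and Ubuntu). Install the remaining libraries from the terminal or command prompt by running the following commands. The sentencepiece package is required by T5Tokenizer:

$ pip install transformers
$ pip install torch
$ pip install sentencepiece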
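
Before going further, it can help to confirm that the interpreter can actually locate each dependency. The snippet below is a minimal sketch and not part of the original script: check_dependencies is a hypothetical helper, and it only checks that each package can be found, not that it works. Listing sentencepiece is an assumption based on T5Tokenizer being a SentencePiece-based tokenizer.

```python
import importlib.util

# Packages the tutorial relies on; sentencepiece is needed by T5Tokenizer.
REQUIRED = ["tkinter", "transformers", "torch", "sentencepiece"]


def check_dependencies(names=REQUIRED):
    """Return a dict mapping each package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING can then be installed before running the main script.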

Imports

import tkinter as tk
from tkinter import filedialog, scrolledtext, messagebox
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
import threading

Now that the imports are in place, here is what each one does:

Tkinter: This is our go-to library for creating a graphical user interface (GUI).

  • From tkinter, we import filedialog to open and save files.
  • Also, scrolledtext and messagebox to handle large text documents and display messages.

Transformers: This powerful library provides pre-trained models and tools for Natural Language Processing (NLP).

  • From Transformers, we import T5Tokenizer to convert text to tokens and vice versa.
  • We will also import T5ForConditionalGeneration, which is the T5 model that generates text, such as answering questions.

Torch: PyTorch is the framework that runs the model. We also use its torch.no_grad() context manager to skip gradient tracking during inference.

Threading: Keeps our program running smoothly by allowing it to perform multiple tasks without freezing.

Initializing the Main Window

Next, we create the main window, set its title, and define its size.

# Initialize main window
root = tk.Tk()
root.title("Document-Based Question Answering System - The Pycodes")
root.geometry("800x600")

Initializing the T5 Model

Let’s move on to the brain of our operation: the pre-trained t5-base checkpoint. From it we load both the tokenizer, which encodes and decodes text, and the T5 model for conditional generation, which generates the answers. The first run downloads the model weights, so it may take a while.

# Initialize T5 tokenizer and model
model_name = 't5-base'
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
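
The checkpoint name is the only thing that needs to change to trade answer quality for speed and memory. As a rough sketch, the hypothetical helper below picks the largest of three original T5 checkpoints that fits a parameter budget. The parameter counts are the approximate figures published for those checkpoints; the helper name and the idea of a budget are illustrative assumptions, not part of the script.

```python
# Approximate parameter counts (in millions) of the original T5 checkpoints.
T5_CHECKPOINTS = {
    "t5-small": 60,
    "t5-base": 220,
    "t5-large": 770,
}


def pick_checkpoint(max_params_millions):
    """Return the largest checkpoint within the budget, or None if none fits."""
    fitting = [(size, name) for name, size in T5_CHECKPOINTS.items()
               if size <= max_params_millions]
    return max(fitting)[1] if fitting else None


model_name = pick_checkpoint(300)
print(model_name)  # t5-base
```

The returned name can be passed to from_pretrained() in place of the hard-coded 't5-base'.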

Setting Up the Document Content Variable

# Setting up the document content variable
document_content = ""

The document_content variable acts as a vault that stores the content of the uploaded document. This vault can be accessed and updated by various functions in the script, which we will explore later.

The Upload Document Function

We spoke above about storing the uploaded document, but before storing it, we need to upload it. The upload_document() function is the gateway to that. It uses filedialog with a filter to select only text documents by specifying the “.txt” extension. Once the desired file is selected, it will be read and stored in the document_content variable we created earlier.

Now that we have our text document read and stored, it’s time to display it for the user in the document_text widget. Before doing this, we need to delete any existing text in the widget using document_text.delete(), and then insert our stored document using document_text.insert().

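def upload_document():
   global document_content
   # Open a file dialog to select a text file
   file_path = filedialog.askopenfilename(filetypes=[("Text Files", "*.txt")])
   if file_path:
       # Read as UTF-8 instead of the platform default; undecodable bytes are replaced
       with open(file_path, 'r', encoding='utf-8', errors='replace') as file:
           document_content = file.read()
       # Replace whatever is currently shown with the new document
       document_text.delete("1.0", tk.END)
       document_text.insert(tk.END, document_content)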
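
Text files are not always saved as UTF-8, and the right encoding cannot be known in advance. One possible way to make the upload step more forgiving is sketched below. read_text_file is a hypothetical helper, and the list of encodings to try is an assumption you may want to adjust for your own files.

```python
def read_text_file(path, encodings=("utf-8", "cp1252")):
    """Try each encoding in order; fall back to latin-1, which never fails."""
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as file:
                return file.read()
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this read cannot raise
    # a decode error (the text may still look wrong for other encodings).
    with open(path, "r", encoding="latin-1") as file:
        return file.read()
```

Inside upload_document(), the with open(...) block could be swapped for a call such as document_content = read_text_file(file_path).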

Handling Answering in a Separate Thread

Before we dive into the get_answer() function, we need a way to keep the main window from freezing, because generating an answer can take several seconds. That’s why we have the get_answer_thread() function, which starts a new thread that runs get_answer().

def get_answer_thread():
   # Start a new thread to keep the GUI responsive
   threading.Thread(target=get_answer, daemon=True).start()
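
Tkinter widgets are generally meant to be touched only from the thread that runs mainloop(), so a common pattern is to let the worker thread put its result on a queue.Queue and have the GUI poll that queue. The sketch below shows only the thread-and-queue half, with a stand-in task so that it runs without a window. run_in_background and slow_task are hypothetical names, not part of the script.

```python
import queue
import threading
import time


def run_in_background(task, results, *args):
    """Run task(*args) in a daemon thread and put (status, value) on results."""
    def worker():
        try:
            results.put(("ok", task(*args)))
        except Exception as exc:  # report failures instead of losing them
            results.put(("error", str(exc)))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def slow_task(text):
    # Stand-in for model.generate(); sleeps briefly to simulate work
    time.sleep(0.1)
    return text.upper()


results = queue.Queue()
run_in_background(slow_task, results, "hello").join()
print(results.get_nowait())  # ('ok', 'HELLO')
```

In the GUI, instead of calling join(), one would check the queue from a function that reschedules itself with root.after(), and update the widgets there.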

get_answer Function

Finally, it is time to get to the core of our program, the get_answer() function, which runs in the background thread we just set up so that it doesn’t freeze the main window. Let’s see how this function works:

  • The function retrieves the question entered by the user from the question_entry widget.
  • It then takes the document text from the document_content variable or, if that is empty, from the document_text widget, and strips leading and trailing whitespace.
  • Next, it prints the question along with the first 200 characters of the context for debugging purposes. If the question or the context is missing, it shows an error message and returns so the user can provide them.
  • With both the question and context available, it prepares the input text for the T5 model in the form “question: … context: …”.
  • It uses the tokenizer to encode the text into tokens for the T5 model, truncating the input to 512 tokens. Inside torch.no_grad(), which skips gradient tracking to save time and memory, the model generates the answer tokens using beam search with four beams.
  • Afterwards, it decodes those generated tokens into a readable answer using the tokenizer.
  • It prints the answer for debugging purposes and then updates the answer_label widget to display the answer.

Naturally, if any exception occurs during this process, an error message will be printed for debugging purposes and displayed to the user.
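
The validation and prompt-building steps described above can be separated from the GUI and the model so that they are easy to test on their own. The following is a minimal sketch: build_t5_input is a hypothetical helper, and collapsing runs of whitespace is an added assumption, not something the original script does. The “question: … context: …” layout is the one the script uses.

```python
def build_t5_input(question, context):
    """Return the T5 prompt, or raise ValueError if either part is empty."""
    # Collapse newlines and repeated spaces so they don't waste tokens
    question = " ".join(question.split())
    context = " ".join(context.split())
    if not question or not context:
        raise ValueError("Both a question and document content are required.")
    return f"question: {question} context: {context}"


print(build_t5_input("Who wrote it?", "The report\nwas written by  Ada."))
# question: Who wrote it? context: The report was written by Ada.
```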

def get_answer():
   try:
       # Get the question and the document text from the GUI
       question = question_entry.get().strip()
       context = (document_content or document_text.get("1.0", tk.END)).strip()

       print(f"Question: {question}")
       print(f"Context: {context[:200]}...")  # Display only the first 200 characters for debugging

       if not question or not context:
           # Show the dialog from the main loop, not from this worker thread
           root.after(0, messagebox.showerror, "Error", "Please enter a question and provide the document content.")
           return

       # Prepare the input text for T5; anything beyond 512 tokens is truncated
       input_text = f"question: {question} context: {context}"
       inputs = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)

       # Generate the answer without tracking gradients
       with torch.no_grad():
           outputs = model.generate(inputs, max_length=150, num_beams=4, early_stopping=True)

       answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
       print(f"Answer: {answer}")

       # Update the answer label from the main loop
       root.after(0, lambda: answer_label.config(text=f"Answer: {answer}"))
   except Exception as e:
       print(f"Error: {e}")
       root.after(0, messagebox.showerror, "Error", f"An error occurred: {e}")
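
Because the tokenizer call truncates the input at 512 tokens, anything past that point in a long document is never seen by the model. One possible workaround, sketched here under stated assumptions, is to split the document into overlapping windows and ask the question against each window. chunk_words is a hypothetical helper, and it counts whitespace-separated words as a rough stand-in for tokens; the real token count depends on the tokenizer.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(str(n) for n in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Each chunk could then be used as the context for a separate call to the model. How to choose among the per-chunk answers is left open here.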

Creating GUI Elements

We have already created the main window, but it is still empty. This part of the code adds the widgets the user interacts with:

  • Title Label: A bold label showing the title of the program.
  • Upload Button: The “Upload Document” button calls the upload_document() function.
  • Document Display: A label followed by the scrollable document_text widget, which shows the uploaded document and also lets the user type or paste text directly.
  • Question Input: A label asking the user to enter their question, followed by the question_entry field.
  • Answer Display: The answer_label, where the answer is displayed.
  • Get Answer Button: The “Get Answer” button calls the get_answer_thread() function.
  • Main Loop: The mainloop() command starts the main event loop and ensures the main window keeps running and is responsive to the user.

# Create GUI elements
# Title
tk.Label(root, text="The Pycodes: Document-Based Question Answering System", font=("Helvetica", 16, "bold")).pack(pady=10)


# Upload button
upload_button = tk.Button(root, text="Upload Document", command=upload_document, bg="lightblue", fg="black")
upload_button.pack(pady=10)


# Document display
tk.Label(root, text="Document Content:").pack()
document_text = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=80, height=10)
document_text.pack(pady=10)


# Question input
tk.Label(root, text="Enter your question:").pack()
question_entry = tk.Entry(root, width=80)
question_entry.pack(pady=10)


# Answer display
answer_label = tk.Label(root, text="Answer will be displayed here.", wraplength=700, justify=tk.LEFT, bg="lightgrey", anchor="w")
answer_label.pack(pady=10, fill=tk.BOTH, padx=10)


# Answer button
answer_button = tk.Button(root, text="Get Answer", command=get_answer_thread, bg="lightblue", fg="black")
answer_button.pack(pady=10)


# Run the GUI application
root.mainloop()

Example

I ran this script on both a Windows system and a Linux system.

Full Code

import tkinter as tk
from tkinter import filedialog, scrolledtext, messagebox
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
import threading


# Initialize main window
root = tk.Tk()
root.title("Document-Based Question Answering System - The Pycodes")
root.geometry("800x600")


# Initialize T5 tokenizer and model
model_name = 't5-base'
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


# Setting up the document content variable
document_content = ""


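def upload_document():
   global document_content
   # Open a file dialog to select a text file
   file_path = filedialog.askopenfilename(filetypes=[("Text Files", "*.txt")])
   if file_path:
       # Read as UTF-8 instead of the platform default; undecodable bytes are replaced
       with open(file_path, 'r', encoding='utf-8', errors='replace') as file:
           document_content = file.read()
       # Replace whatever is currently shown with the new document
       document_text.delete("1.0", tk.END)
       document_text.insert(tk.END, document_content)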


def get_answer_thread():
   # Start a new thread to keep the GUI responsive
   threading.Thread(target=get_answer, daemon=True).start()


def get_answer():
   try:
       # Get the question and the document text from the GUI
       question = question_entry.get().strip()
       context = (document_content or document_text.get("1.0", tk.END)).strip()

       print(f"Question: {question}")
       print(f"Context: {context[:200]}...")  # Display only the first 200 characters for debugging

       if not question or not context:
           # Show the dialog from the main loop, not from this worker thread
           root.after(0, messagebox.showerror, "Error", "Please enter a question and provide the document content.")
           return

       # Prepare the input text for T5; anything beyond 512 tokens is truncated
       input_text = f"question: {question} context: {context}"
       inputs = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)

       # Generate the answer without tracking gradients
       with torch.no_grad():
           outputs = model.generate(inputs, max_length=150, num_beams=4, early_stopping=True)

       answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
       print(f"Answer: {answer}")

       # Update the answer label from the main loop
       root.after(0, lambda: answer_label.config(text=f"Answer: {answer}"))
   except Exception as e:
       print(f"Error: {e}")
       root.after(0, messagebox.showerror, "Error", f"An error occurred: {e}")


# Create GUI elements
# Title
tk.Label(root, text="The Pycodes: Document-Based Question Answering System", font=("Helvetica", 16, "bold")).pack(pady=10)


# Upload button
upload_button = tk.Button(root, text="Upload Document", command=upload_document, bg="lightblue", fg="black")
upload_button.pack(pady=10)


# Document display
tk.Label(root, text="Document Content:").pack()
document_text = scrolledtext.ScrolledText(root, wrap=tk.WORD, width=80, height=10)
document_text.pack(pady=10)


# Question input
tk.Label(root, text="Enter your question:").pack()
question_entry = tk.Entry(root, width=80)
question_entry.pack(pady=10)


# Answer display
answer_label = tk.Label(root, text="Answer will be displayed here.", wraplength=700, justify=tk.LEFT, bg="lightgrey", anchor="w")
answer_label.pack(pady=10, fill=tk.BOTH, padx=10)


# Answer button
answer_button = tk.Button(root, text="Get Answer", command=get_answer_thread, bg="lightblue", fg="black")
answer_button.pack(pady=10)


# Run the GUI application
root.mainloop()

Happy Coding!
