Paraphrasing has become an essential tool for content creators, students, and professionals alike. Whether you want to rephrase sentences for clarity or generate new variations, having an automated tool to do so saves a ton of time and effort. But how do you build one yourself?
Today, you’ll learn how to create a powerful paraphrasing tool in Python using Transformers. We’ll guide you through using two popular models, PEGASUS and FLAN-T5, and show you how to integrate everything into a user-friendly interface with Tkinter. Let’s dive in and start building!
Table of Contents
- Setting Up the Environment
- Loading the Selected Model
- Generating Paraphrased Sentences
- Running Paraphrasing in the Background
- Running the Paraphrasing in a New Thread
- Setting Up the Main Window
- Example
- Full Code
Setting Up the Environment
Before running the code, you need to install a few libraries. Here’s a step-by-step guide to set up everything:
- Install the transformers library.
The transformers library is used to load the PEGASUS and FLAN-T5 models for paraphrasing. Install it using pip:
$ pip install transformers
- Install torch.
torch is the PyTorch library, which is required for running the models loaded by transformers. Install it using:
$ pip install torch
- Ensure tkinter is available.
tkinter is the library that lets us create the graphical interface for the tool. It usually comes pre-installed with Python, but depending on your system, you might need to install it manually:
On Debian/Ubuntu-based Linux, run:
$ sudo apt-get install python3-tk
On macOS, if tkinter is not already available, you can install the underlying Tcl/Tk libraries using Homebrew (your Python installation also needs to be built against them):
$ brew install tcl-tk
On Windows, tkinter is usually bundled with Python, so you shouldn't need to install it separately.
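Before moving on, you can quickly confirm that all three dependencies import cleanly. This one-liner is just a sanity check and isn't required for the tool itself:
$ python -c "import transformers, torch, tkinter; print(transformers.__version__, torch.__version__, tkinter.TkVersion)"
If it prints three version numbers without errors, you're ready to go.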
Imports
import threading
import tkinter as tk
from tkinter import ttk, scrolledtext
from transformers import PegasusForConditionalGeneration, PegasusTokenizerFast, AutoModelForSeq2SeqLM, AutoTokenizer
Let’s kick things off by setting the stage for what this code is all about, starting with the imports.
- First up, we bring in threading. This lets us run tasks in the background, so our main window doesn’t freeze while everything’s processing.
- Next, we’ve got tkinter, which is our go-to for creating a graphical user interface. Within that, we use ttk to add some themed widgets and scrolledtext to create a nice text box with a scrollbar.
- And last but definitely not least, we import the transformers library. This is the star of the show since it gives us access to models designed for natural language processing, which will do the heavy lifting when it comes to paraphrasing.
Initializing Model Variables
# Load models (deferred to when selected)
pegasus_model = None
pegasus_tokenizer = None
flan_t5_model = None
flan_t5_tokenizer = None
This is where we set up our model variables, which are basically just empty slots (set to None) waiting to hold the models and tokenizers we'll load later. So, how does this all work?
Well, once a user selects a model and kicks off the paraphrasing, we load that specific model into its variable. This setup does a couple of great things: first, it boosts performance because the model stays in memory, ready to go without any delays. Plus, it saves on memory and makes the app start up faster since we only load the model that’s needed, rather than loading everything all at once.
Loading the Selected Model
Now that we have the slots ready to store the models, all that's left is to fetch them using the load_model() function. This function declares the model variables as global, which means we can access and modify them throughout the code.
# Function to load the selected model dynamically
def load_model(selected_model):
global pegasus_model, pegasus_tokenizer, flan_t5_model, flan_t5_tokenizer
Next, it checks which model the user selected. Once it confirms the selection, it looks to see if the model has already been loaded by checking whether the variable is still set to None.
status_label.config(text="Loading model, please wait...")
try:
if selected_model == "PEGASUS" and pegasus_model is None:
If the model hasn't been loaded yet (the variable is still None), the function uses from_pretrained() to download the selected model, either PEGASUS or FLAN-T5. The PEGASUS checkpoint (tuner007/pegasus_paraphrase) is fine-tuned specifically for paraphrasing, while google/flan-t5-base is a general instruction-tuned model that can handle paraphrasing among many other tasks.
pegasus_model = PegasusForConditionalGeneration.from_pretrained("tuner007/pegasus_paraphrase")
pegasus_tokenizer = PegasusTokenizerFast.from_pretrained("tuner007/pegasus_paraphrase")
elif selected_model == "FLAN-T5" and flan_t5_model is None:
flan_t5_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
flan_t5_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
You may have noticed the very first line inside load_model(): before any loading starts, the function updates the status label in the GUI to let the user know the process is underway.
status_label.config(text="Loading model, please wait...")
If anything goes wrong during this process, it catches the error and updates the output text box to display what went wrong, so the user knows what to look for.
except Exception as e:
output_text.delete("1.0", tk.END)
output_text.insert(tk.END, f"Error loading model: {str(e)}")
return False
Once the model is successfully loaded, the status label is updated to inform the user, and the function returns True.
status_label.config(text="Model loaded.")
return True
Generating Paraphrased Sentences
This is where the magic happens! With the model loaded and ready, the get_paraphrased_sentences() function is responsible for generating paraphrased versions of a given sentence. It takes in the model, the tokenizer, and the sentence we want to paraphrase:
- First, the function uses the tokenizer to convert the sentence into a format the model can process. Then, using num_beams and num_return_sequences, the model generates multiple paraphrased variations of the input sentence: num_beams controls how many candidate sequences beam search explores, and num_return_sequences sets how many of them are returned. Finally, the function returns these variations, making them available for other parts of the code to use.
# Function to get paraphrased sentences
def get_paraphrased_sentences(model, tokenizer, sentence, num_return_sequences=5, num_beams=5):
inputs = tokenizer([sentence], truncation=True, padding="longest", return_tensors="pt")
outputs = model.generate(
**inputs,
num_beams=num_beams,
num_return_sequences=num_return_sequences,
)
return tokenizer.batch_decode(outputs, skip_special_tokens=True)
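If you want to get a feel for the output before wiring up the GUI, here's a minimal standalone sketch. It assumes pegasus_model and pegasus_tokenizer have already been loaded as shown earlier, and the input sentence is just a placeholder:
# Standalone usage sketch (assumes the PEGASUS model and tokenizer are already loaded)
paraphrases = get_paraphrased_sentences(
    pegasus_model,
    pegasus_tokenizer,
    "Learning to build tools in Python is a rewarding experience.",
    num_return_sequences=3,
    num_beams=5,
)
for i, p in enumerate(paraphrases, start=1):
    print(f"{i}. {p}")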
Running Paraphrasing in the Background
We've reached the core of the program: the paraphrase_text() function, which handles the entire paraphrasing process:
- First, we get the selected model and input sentence.
The paraphrase_text() function starts by grabbing the model the user chose and the text they entered. It uses input_text.get() to fetch the sentence and strip() to remove any extra spaces.
selected_model = model_choice.get()
sentence = input_text.get("1.0", tk.END).strip()
- If no sentence is provided, we let the user know.
If the input box is empty, the function clears any previous results from the output box and prompts the user to enter something.
if not sentence:
output_text.delete("1.0", tk.END)
output_text.insert(tk.END, "Please enter a sentence.")
return
- Now, it’s time to load the model.
Next, we attempt to load the selected model by calling load_model(). If the model can't be loaded, the function stops further execution.
if not load_model(selected_model):
return
- Let’s keep the user informed and disable the paraphrase button.
To prevent confusion, we disable the “Paraphrase” button during processing and update the status label to indicate the model is working on paraphrasing.
status_label.config(text="Paraphrasing, please wait...")
paraphrase_button.config(state=tk.DISABLED)
- We then generate the paraphrased sentences.
Here’s where the paraphrasing magic happens. The function checks which model is selected (PEGASUS or FLAN-T5), uses the corresponding tokenizer and model to get paraphrased sentences, and catches any errors that might occur during the process.
try:
if selected_model == "PEGASUS":
paraphrased_sentences = get_paraphrased_sentences(pegasus_model, pegasus_tokenizer, sentence)
elif selected_model == "FLAN-T5":
paraphrased_sentences = get_paraphrased_sentences(flan_t5_model, flan_t5_tokenizer, sentence)
except Exception as e:
paraphrased_sentences = [f"Error during paraphrasing: {str(e)}"]
- Finally, we display the results and reset the UI.
Once the paraphrasing is complete, the function clears the output text box and inserts the newly generated sentences. It then updates the status label and re-enables the “Paraphrase” button, ready for the next input.
output_text.delete("1.0", tk.END)
for idx, paraphrase in enumerate(paraphrased_sentences):
output_text.insert(tk.END, f"{idx + 1}. {paraphrase}\n\n")
status_label.config(text="Paraphrasing complete.")
paraphrase_button.config(state=tk.NORMAL)
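A quick side note before we move on: FLAN-T5 is a general instruction-tuned model, so it often produces better rewrites when the input is phrased as an instruction. One possible variation (not part of the original code) is to prepend a short prompt in the FLAN-T5 branch:
elif selected_model == "FLAN-T5":
    # Hypothetical tweak: frame the input as an instruction for FLAN-T5
    paraphrased_sentences = get_paraphrased_sentences(
        flan_t5_model, flan_t5_tokenizer, f"Paraphrase the following sentence: {sentence}"
    )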
Running the Paraphrasing in a New Thread
Now it's time for the finishing touches. We create a new thread to run the paraphrase_text() function using run_paraphrasing(). This allows the paraphrasing to happen in the background, ensuring the main window doesn't freeze while the task is being processed.
# Function to run paraphrasing in a new thread to avoid freezing the UI
def run_paraphrasing():
threading.Thread(target=paraphrase_text).start()
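One caveat worth mentioning: Tkinter widgets aren't guaranteed to be thread-safe, even though updating them from the worker thread works fine in many setups. If you run into odd behavior, a possible variation (again, just a sketch, not part of the original code) is to mark the thread as a daemon and push widget updates back onto the main loop with window.after():
# Variation: a daemon thread won't keep the process alive after the window closes
def run_paraphrasing():
    threading.Thread(target=paraphrase_text, daemon=True).start()

# Inside the worker, schedule GUI updates on the main thread, for example:
# window.after(0, lambda: status_label.config(text="Paraphrasing complete."))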
Setting Up the Main Window
We've arrived at the exciting conclusion, the visual part where everything comes together! Here, we use tkinter to create the main window, set its title, and define its size, while also making it resizable. Next, we create the input text box, adding a scrollbar and labeling it for clarity. Then, we set up a drop-down menu using a Combobox, labeling it and setting PEGASUS as the default model with model_choice.current(0).
After that, we create the “Paraphrase” button that triggers the run_paraphrasing() function. Moving on, we add an output text box with a label and a scrollbar for displaying the results. We also create a status label to keep the user informed.
# Tkinter window setup
window = tk.Tk()
window.title("Paraphrasing Tool with Transformers - The Pycodes")
window.geometry("700x550")
window.resizable(True, True)
# Label for input
input_label = tk.Label(window, text="Enter text to paraphrase:")
input_label.pack(pady=10)
# Input text box
input_text = scrolledtext.ScrolledText(window, wrap=tk.WORD, width=60, height=5)
input_text.pack(pady=10)
# Dropdown menu to select model
model_label = tk.Label(window, text="Select model:")
model_label.pack(pady=10)
# Default model is PEGASUS
model_choice = ttk.Combobox(window, values=["PEGASUS", "FLAN-T5"])
model_choice.current(0) # Set PEGASUS as default
model_choice.pack()
# Button to trigger paraphrasing
paraphrase_button = tk.Button(window, text="Paraphrase", command=run_paraphrasing)
paraphrase_button.pack(pady=10)
# Output text box
output_label = tk.Label(window, text="Paraphrased Text:")
output_label.pack(pady=10)
output_text = scrolledtext.ScrolledText(window, wrap=tk.WORD, width=60, height=10)
output_text.pack(pady=10)
# Status label
status_label = tk.Label(window, text="Waiting for input...", fg="blue")
status_label.pack(pady=10)
Lastly, we call mainloop() to start the event loop, ensuring the main window stays open and responsive to user input.
# Start the Tkinter loop
window.mainloop()
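One small optional tweak, not in the original code: making the Combobox read-only prevents users from typing a model name that paraphrase_text() doesn't know how to handle.
# Optional: restrict the dropdown to the two supported models
model_choice = ttk.Combobox(window, values=["PEGASUS", "FLAN-T5"], state="readonly")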
Example
Full Code
import threading
import tkinter as tk
from tkinter import ttk, scrolledtext
from transformers import PegasusForConditionalGeneration, PegasusTokenizerFast, AutoModelForSeq2SeqLM, AutoTokenizer
# Load models (deferred to when selected)
pegasus_model = None
pegasus_tokenizer = None
flan_t5_model = None
flan_t5_tokenizer = None
# Function to load the selected model dynamically
def load_model(selected_model):
global pegasus_model, pegasus_tokenizer, flan_t5_model, flan_t5_tokenizer
status_label.config(text="Loading model, please wait...")
try:
if selected_model == "PEGASUS" and pegasus_model is None:
pegasus_model = PegasusForConditionalGeneration.from_pretrained("tuner007/pegasus_paraphrase")
pegasus_tokenizer = PegasusTokenizerFast.from_pretrained("tuner007/pegasus_paraphrase")
elif selected_model == "FLAN-T5" and flan_t5_model is None:
flan_t5_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
flan_t5_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
except Exception as e:
output_text.delete("1.0", tk.END)
output_text.insert(tk.END, f"Error loading model: {str(e)}")
return False
status_label.config(text="Model loaded.")
return True
# Function to get paraphrased sentences
def get_paraphrased_sentences(model, tokenizer, sentence, num_return_sequences=5, num_beams=5):
inputs = tokenizer([sentence], truncation=True, padding="longest", return_tensors="pt")
outputs = model.generate(
**inputs,
num_beams=num_beams,
num_return_sequences=num_return_sequences,
)
return tokenizer.batch_decode(outputs, skip_special_tokens=True)
# Function to handle the paraphrasing in a separate thread
def paraphrase_text():
selected_model = model_choice.get()
sentence = input_text.get("1.0", tk.END).strip()
if not sentence:
output_text.delete("1.0", tk.END)
output_text.insert(tk.END, "Please enter a sentence.")
return
paraphrased_sentences = []
if not load_model(selected_model):
return
status_label.config(text="Paraphrasing, please wait...")
paraphrase_button.config(state=tk.DISABLED)
try:
if selected_model == "PEGASUS":
paraphrased_sentences = get_paraphrased_sentences(pegasus_model, pegasus_tokenizer, sentence)
elif selected_model == "FLAN-T5":
paraphrased_sentences = get_paraphrased_sentences(flan_t5_model, flan_t5_tokenizer, sentence)
except Exception as e:
paraphrased_sentences = [f"Error during paraphrasing: {str(e)}"]
# Display paraphrased sentences
output_text.delete("1.0", tk.END)
for idx, paraphrase in enumerate(paraphrased_sentences):
output_text.insert(tk.END, f"{idx + 1}. {paraphrase}\n\n")
status_label.config(text="Paraphrasing complete.")
paraphrase_button.config(state=tk.NORMAL)
# Function to run paraphrasing in a new thread to avoid freezing the UI
def run_paraphrasing():
threading.Thread(target=paraphrase_text).start()
# Tkinter window setup
window = tk.Tk()
window.title("Paraphrasing Tool with Transformers - The Pycodes")
window.geometry("700x550")
window.resizable(True, True)
# Label for input
input_label = tk.Label(window, text="Enter text to paraphrase:")
input_label.pack(pady=10)
# Input text box
input_text = scrolledtext.ScrolledText(window, wrap=tk.WORD, width=60, height=5)
input_text.pack(pady=10)
# Dropdown menu to select model
model_label = tk.Label(window, text="Select model:")
model_label.pack(pady=10)
# Default model is PEGASUS
model_choice = ttk.Combobox(window, values=["PEGASUS", "FLAN-T5"])
model_choice.current(0) # Set PEGASUS as default
model_choice.pack()
# Button to trigger paraphrasing
paraphrase_button = tk.Button(window, text="Paraphrase", command=run_paraphrasing)
paraphrase_button.pack(pady=10)
# Output text box
output_label = tk.Label(window, text="Paraphrased Text:")
output_label.pack(pady=10)
output_text = scrolledtext.ScrolledText(window, wrap=tk.WORD, width=60, height=10)
output_text.pack(pady=10)
# Status label
status_label = tk.Label(window, text="Waiting for input...", fg="blue")
status_label.pack(pady=10)
# Start the Tkinter loop
window.mainloop()
Happy Coding!