How to Build a Language Translator with Transformers in Python

In today’s interconnected world, language translation plays a crucial role in bridging communication gaps across cultures and regions. With advances in natural language processing (NLP) and machine learning, building a capable language translator has become more accessible than ever. In this tutorial, we will guide you through creating a language translator using Transformers in Python.

We’ll be using the MarianMTModel class from Hugging Face’s Transformers library, which loads the pretrained Helsinki-NLP OPUS-MT translation models. To make things even better, we’ll create a user-friendly graphical user interface (GUI) with tkinter, so our translator will be easy for anyone to use. By the time we’re done, you’ll have a working language translator that can handle multiple languages.

Let’s get started!

Necessary Libraries

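Install the required packages from the terminal or command prompt before running the code. Note that tkinter ships with most Python installations and is not installed with pip (running pip install tk does not provide it); on some Linux distributions it comes from the system package manager instead, for example the python3-tk package on Debian or Ubuntu. The Marian tokenizer also needs sentencepiece, and because the code requests PyTorch tensors, torch must be installed as well:

$ pip install transformers sentencepiece torch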
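
Before moving on, it can help to confirm that everything is importable. The sketch below is not part of the original script: missing_modules() is a hypothetical helper name, and sentencepiece and torch are listed on the assumption that the Marian tokenizer and the PyTorch tensors used later need them. It only checks that each module can be located, not that it works.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules the translator relies on (tkinter normally comes with Python itself)
    required = ["tkinter", "transformers", "sentencepiece", "torch"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If tkinter shows up as missing, it usually has to be installed through the operating system rather than pip.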

Imports

Just like any great adventure, we begin by gathering our essential tools. Here’s what we’ll need:

  • MarianMTModel and MarianTokenizer: These are our heroes for machine translation, handling the heavy lifting of language processing.
  • Tkinter: To create a graphical user interface (GUI).
  • Messagebox: Think of this as our way to pop up helpful messages and alerts.
  • OptionMenu: To create a drop-down menu for selecting an option.
  • StringVar: This magical tool dynamically links variables with tkinter widgets, making sure everything stays in sync.
  • ttk: Providing us with stylish, themed widgets to make our GUI look polished.
  • Threading: Allowing our program to multitask like a pro, keeping the main window smooth and responsive.
  • Platform: Helping us identify the operating system so we can set the cursor just right.
from transformers import MarianMTModel, MarianTokenizer
import tkinter as tk
from tkinter import messagebox, OptionMenu, StringVar, ttk
import threading
import platform

With these tools in hand, we’re ready to dive into our journey to build an awesome language translator!

Translation Function

Now that we have collected our tools, let’s move on to the engine that drives our program: the translate() function, which transforms words from one language to another. How, you might wonder? Don’t worry, we’ll go through it step by step:

  • First, the function constructs the model name and loads it along with the tokenizer based on the source and target languages. Then, it tokenizes the input text and prepares it for the model.
  • Next, it uses beam search, which considers multiple possible translations to generate a translation with improved quality.
  • Finally, once we get the generated tokens of the translation, they are decoded into a readable format.
def translate(texts, src_lang="en", tgt_lang="fr", num_beams=5, early_stopping=True):
    """Translate texts from src_lang to tgt_lang using MarianMTModel with beam search."""
    # Build the model name for this language pair, then load the tokenizer and model
    model_name = f'Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}'
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the text
    inputs = tokenizer(texts, return_tensors="pt", padding=True)

    # Generate translation using the model with beam search
    translated = model.generate(**inputs, num_beams=num_beams, early_stopping=early_stopping)

    # Decode the translated text
    translated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
    return translated_texts
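
Note that translate() reloads the tokenizer and model on every call, which makes repeated translations slow. One possible refinement, sketched here under the assumption that keeping a few models in memory is acceptable, is to cache them per language pair. The helper names build_model_name() and load_model() are hypothetical and not part of the original code.

```python
from functools import lru_cache


def build_model_name(src_lang, tgt_lang):
    """Build the Helsinki-NLP model name for a language pair, as translate() does."""
    return f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"


@lru_cache(maxsize=4)  # keep at most four language pairs in memory (arbitrary choice)
def load_model(src_lang, tgt_lang):
    """Load the tokenizer and model once per language pair and reuse them afterwards."""
    # Imported here so this sketch can be imported without transformers installed
    from transformers import MarianMTModel, MarianTokenizer

    model_name = build_model_name(src_lang, tgt_lang)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return tokenizer, model
```

Inside translate(), the two from_pretrained() calls could then be replaced with tokenizer, model = load_model(src_lang, tgt_lang), so only the first translation for a language pair pays the loading cost.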

Translate Button Click Handler

With the engine of our program complete, it is time to build the function that steers it: on_translate(). When triggered, this function retrieves the input text from the text widget and strips surrounding whitespace using text_entry.get("1.0", tk.END).strip(). It then checks that the text is not empty; if it is, a message box prompts the user to enter some text. Next, it checks that the user has selected both the source and target languages. If not, a message box prompts the user to select them from the drop-down menus.

After validating the inputs, it converts the language names to their corresponding codes using language_dict (e.g., English is “en” and French is “fr”). With all inputs ready, the function proceeds as follows:

  • It starts a new thread with the sub-function translate_thread().
  • It sets the cursor to a watch icon and starts the progress bar to indicate the process has begun.
  • It calls the translate() function and updates the result text widget with the translation.
  • Any error that occurs during this process is indicated with a message box.
  • Once the operation is over, the progress bar stops, showing that the translation is complete.
def on_translate():
    """Handle the translate button click."""
    source_text = text_entry.get("1.0", tk.END).strip()
    if not source_text:
        messagebox.showwarning("Input Error", "Please enter text to translate.")
        return

    src_lang_name = src_lang_var.get().strip()
    tgt_lang_name = tgt_lang_var.get().strip()
    if not src_lang_name or not tgt_lang_name:
        messagebox.showwarning("Input Error", "Please specify both source and target languages.")
        return

    src_lang = language_dict[src_lang_name]
    tgt_lang = language_dict[tgt_lang_name]

    # Keep widget updates on the main thread: the worker thread only stores
    # its outcome here, and the main thread displays it
    outcome = {}

    def translate_thread():
        try:
            outcome["text"] = translate([source_text], src_lang, tgt_lang)[0]
        except Exception as e:
            outcome["error"] = str(e)

    def show_outcome():
        # Poll from the main thread until the worker has finished
        if worker.is_alive():
            root.after(100, show_outcome)
            return
        progress_bar.stop()
        set_cursor("")
        if "error" in outcome:
            messagebox.showerror("Translation Error", outcome["error"])
        else:
            result_text.set(outcome["text"])

    set_cursor("watch")
    progress_bar.start()

    # Run the translation in a separate thread
    worker = threading.Thread(target=translate_thread, daemon=True)
    worker.start()
    root.after(100, show_outcome)
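
The idea of handing slow work to a background thread while the main thread stays free can be shown without any GUI. The following sketch is an illustration rather than part of the translator: run_in_background() is a hypothetical helper, and slow_upper() merely stands in for the translate() call.

```python
import queue
import threading
import time


def run_in_background(func, *args):
    """Run func(*args) in a worker thread; return a queue that receives the outcome."""
    results = queue.Queue()

    def worker():
        try:
            results.put(("ok", func(*args)))
        except Exception as exc:  # report the failure instead of losing it in the thread
            results.put(("error", str(exc)))

    threading.Thread(target=worker, daemon=True).start()
    return results


def slow_upper(text):
    """Stand-in for a slow call such as translate()."""
    time.sleep(0.1)
    return text.upper()


if __name__ == "__main__":
    results = run_in_background(slow_upper, "bonjour")
    # The main thread is free here; a GUI would keep processing events meanwhile
    status, value = results.get(timeout=5)
    print(status, value)
```

In a tkinter program, the main thread could poll such a queue with root.after() and update the widgets once a result arrives, which keeps all widget access in one thread.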

Set Cursor Type

Next, to keep things clear for our users, we need a way to show that something’s happening behind the scenes. That’s where our set_cursor() function comes in. It checks the operating system using the platform module and sets the cursor based on the cursor_type parameter, giving a visual cue that work is in progress. Passing an empty string restores the default cursor.

def set_cursor(cursor_type):
    """Set the cursor type depending on the platform."""
    # An empty string restores the default cursor on every platform;
    # on Linux, any busy cursor is shown as 'watch'
    if cursor_type and platform.system() == 'Linux':
        cursor_type = 'watch'
    root.config(cursor=cursor_type)

GUI Setup

For this step, let’s bring all our components together and build the graphical interface with the setup_gui() function. We’ll start by creating the main window and setting its title. First, we add a label with the name of our blog. Next, we set up a multi-line text widget where you can type in the text you want to translate.

Below the text entry, we place a label that says “Source Language” and a drop-down menu to select the source language. We use a StringVar to hold the selected language, with English set as the default. We’ll do the same for the target language, adding a label, a drop-down menu, and another StringVar, with French as the default.

After that, we add a “Translate” button that will trigger the on_translate() function when clicked. To show that the translation process is ongoing, we’ll include a progress bar. Finally, we create a StringVar to hold the results, which will be displayed on the result label.

Feel free to customize this function however you like. For example, you could replace the result label with a widget that lets you select and copy the results, or even add a copy button.

def setup_gui():
    """Set up the GUI components."""
    global root, text_entry, src_lang_var, tgt_lang_var, progress_bar, result_text
    root = tk.Tk()
    root.title("Translator - The Pycodes")

    # Blog name label
    blog_name_label = tk.Label(root, text="The Pycodes", font=("Helvetica", 16, "bold"))
    blog_name_label.pack(pady=10)

    # Source text entry
    text_entry = tk.Text(root, height=10, width=50)
    text_entry.pack(pady=10)

    # Source language selection
    src_lang_label = tk.Label(root, text="Source Language:")
    src_lang_label.pack()
    src_lang_var = StringVar(root)
    src_lang_var.set("English")
    src_lang_menu = OptionMenu(root, src_lang_var, *language_dict.keys())
    src_lang_menu.pack()

    # Target language selection
    tgt_lang_label = tk.Label(root, text="Target Language:")
    tgt_lang_label.pack()
    tgt_lang_var = StringVar(root)
    tgt_lang_var.set("French")
    tgt_lang_menu = OptionMenu(root, tgt_lang_var, *language_dict.keys())
    tgt_lang_menu.pack()

    # Translate button
    translate_button = tk.Button(root, text="Translate", command=on_translate)
    translate_button.pack(pady=10)

    # Progress bar
    progress_bar = ttk.Progressbar(root, mode='indeterminate')
    progress_bar.pack(pady=10)

    # Result display
    result_text = tk.StringVar()
    result_label = tk.Label(root, textvariable=result_text, wraplength=400, justify="left", bg="lightgray", height=10, width=50)
    result_label.pack(pady=10)

    return root
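
As one example of the customization mentioned above, a copy button could place the translation on the clipboard. This is a minimal sketch, not part of the original code: copy_result() and add_copy_button() are hypothetical helpers, and it assumes the root window and the result_text variable created in setup_gui().

```python
def copy_result(widget, text):
    """Replace the clipboard contents with text, using any tkinter widget."""
    widget.clipboard_clear()
    widget.clipboard_append(text)


def add_copy_button(root, result_text):
    """Add a button that copies the current translation to the clipboard."""
    import tkinter as tk  # imported here so copy_result() can be used on its own

    button = tk.Button(
        root,
        text="Copy",
        command=lambda: copy_result(root, result_text.get()),
    )
    button.pack(pady=5)
    return button
```

Calling add_copy_button(root, result_text) just before return root in setup_gui() would place the button under the result label.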

Language Dictionary

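Because the model names are built from language codes, our program needs a dictionary that maps each language name shown in the drop-down menus to its code. That is the job of language_dict. Keep in mind that a translation only works when a Helsinki-NLP/opus-mt model exists for the chosen source and target pair, and not every combination is guaranteed to have one; when the model cannot be loaded, the error message box appears.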

# Available languages
language_dict = {
  "English": "en",
  "French": "fr",
  "German": "de",
  "Spanish": "es",
  "Italian": "it",
  "Dutch": "nl",
  "Portuguese": "pt",
  "Russian": "ru",
  "Chinese": "zh",
  "Japanese": "ja",
  "Korean": "ko"
}
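
A small guard around this lookup can catch obvious mistakes before a model download is attempted. The sketch below is an illustration under stated assumptions: resolve_pair() is a hypothetical helper, it uses a shortened copy of the dictionary, and it only rejects unknown names and identical languages. It cannot tell whether a model actually exists for a given pair.

```python
# Shortened copy of the tutorial's dictionary, for illustration only
language_dict = {"English": "en", "French": "fr", "German": "de"}


def resolve_pair(src_name, tgt_name, languages=language_dict):
    """Return (src_code, tgt_code) or raise ValueError for an unusable selection."""
    for name in (src_name, tgt_name):
        if name not in languages:
            raise ValueError(f"Unknown language: {name}")
    if src_name == tgt_name:
        raise ValueError("Source and target languages must differ.")
    return languages[src_name], languages[tgt_name]


if __name__ == "__main__":
    print(resolve_pair("English", "French"))
```

In on_translate(), such a check could run before the thread is started, with the ValueError message shown in a warning box.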

Main Entry Point

We have finally reached the grand finale. The if __name__ == "__main__" check ensures that the GUI starts only when the script is run directly, not when it is imported as a module. Inside it, we call the setup_gui() function to build the graphical user interface and then start the main event loop, which keeps the main window running and responsive to the user.

if __name__ == "__main__":
  # Set up and run the GUI
  root = setup_gui()
  root.mainloop()

Example

I ran this code on a Windows system, as shown in the image below:

And on a Linux system:

Full Code

from transformers import MarianMTModel, MarianTokenizer
import tkinter as tk
from tkinter import messagebox, OptionMenu, StringVar, ttk
import threading
import platform


def translate(texts, src_lang="en", tgt_lang="fr", num_beams=5, early_stopping=True):
    """Translate texts from src_lang to tgt_lang using MarianMTModel with beam search."""
    # Build the model name for this language pair, then load the tokenizer and model
    model_name = f'Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}'
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the text
    inputs = tokenizer(texts, return_tensors="pt", padding=True)

    # Generate translation using the model with beam search
    translated = model.generate(**inputs, num_beams=num_beams, early_stopping=early_stopping)

    # Decode the translated text
    translated_texts = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
    return translated_texts


def on_translate():
    """Handle the translate button click."""
    source_text = text_entry.get("1.0", tk.END).strip()
    if not source_text:
        messagebox.showwarning("Input Error", "Please enter text to translate.")
        return

    src_lang_name = src_lang_var.get().strip()
    tgt_lang_name = tgt_lang_var.get().strip()
    if not src_lang_name or not tgt_lang_name:
        messagebox.showwarning("Input Error", "Please specify both source and target languages.")
        return

    src_lang = language_dict[src_lang_name]
    tgt_lang = language_dict[tgt_lang_name]

    # Keep widget updates on the main thread: the worker thread only stores
    # its outcome here, and the main thread displays it
    outcome = {}

    def translate_thread():
        try:
            outcome["text"] = translate([source_text], src_lang, tgt_lang)[0]
        except Exception as e:
            outcome["error"] = str(e)

    def show_outcome():
        # Poll from the main thread until the worker has finished
        if worker.is_alive():
            root.after(100, show_outcome)
            return
        progress_bar.stop()
        set_cursor("")
        if "error" in outcome:
            messagebox.showerror("Translation Error", outcome["error"])
        else:
            result_text.set(outcome["text"])

    set_cursor("watch")
    progress_bar.start()

    # Run the translation in a separate thread
    worker = threading.Thread(target=translate_thread, daemon=True)
    worker.start()
    root.after(100, show_outcome)


def set_cursor(cursor_type):
    """Set the cursor type depending on the platform."""
    # An empty string restores the default cursor on every platform;
    # on Linux, any busy cursor is shown as 'watch'
    if cursor_type and platform.system() == 'Linux':
        cursor_type = 'watch'
    root.config(cursor=cursor_type)


def setup_gui():
    """Set up the GUI components."""
    global root, text_entry, src_lang_var, tgt_lang_var, progress_bar, result_text
    root = tk.Tk()
    root.title("Translator - The Pycodes")

    # Blog name label
    blog_name_label = tk.Label(root, text="The Pycodes", font=("Helvetica", 16, "bold"))
    blog_name_label.pack(pady=10)

    # Source text entry
    text_entry = tk.Text(root, height=10, width=50)
    text_entry.pack(pady=10)

    # Source language selection
    src_lang_label = tk.Label(root, text="Source Language:")
    src_lang_label.pack()
    src_lang_var = StringVar(root)
    src_lang_var.set("English")
    src_lang_menu = OptionMenu(root, src_lang_var, *language_dict.keys())
    src_lang_menu.pack()

    # Target language selection
    tgt_lang_label = tk.Label(root, text="Target Language:")
    tgt_lang_label.pack()
    tgt_lang_var = StringVar(root)
    tgt_lang_var.set("French")
    tgt_lang_menu = OptionMenu(root, tgt_lang_var, *language_dict.keys())
    tgt_lang_menu.pack()

    # Translate button
    translate_button = tk.Button(root, text="Translate", command=on_translate)
    translate_button.pack(pady=10)

    # Progress bar
    progress_bar = ttk.Progressbar(root, mode='indeterminate')
    progress_bar.pack(pady=10)

    # Result display
    result_text = tk.StringVar()
    result_label = tk.Label(root, textvariable=result_text, wraplength=400, justify="left", bg="lightgray", height=10, width=50)
    result_label.pack(pady=10)

    return root


# Available languages
language_dict = {
    "English": "en",
    "French": "fr",
    "German": "de",
    "Spanish": "es",
    "Italian": "it",
    "Dutch": "nl",
    "Portuguese": "pt",
    "Russian": "ru",
    "Chinese": "zh",
    "Japanese": "ja",
    "Korean": "ko"
}


if __name__ == "__main__":
    # Set up and run the GUI
    root = setup_gui()
    root.mainloop()

Happy Coding!
