
How to Automate Machine Learning Model Optimization with TPOT in Python

In the fast-paced world of machine learning, efficiency and optimization are key. As data scientists and enthusiasts, we constantly seek ways to streamline our workflows, automate repetitive tasks, and achieve the best possible results with minimal effort. This is where TPOT comes in.

TPOT is an open-source AutoML tool for Python designed to simplify machine learning model optimization. Instead of hand-testing combinations of preprocessing steps, models, and hyperparameters, you let TPOT search for a good pipeline for your data, without writing extensive code. It does this with genetic programming: it evolves a population of candidate scikit-learn pipelines over several generations, keeping and recombining the ones that score best. This saves you time and lets you focus on the rest of your project.

Today, you’ll learn how to automate machine learning model optimization with TPOT in Python, wrapped in a small desktop app built with tkinter. Using pandas, scikit-learn, and TPOTClassifier, you’ll load a dataset, let TPOT find and train a model automatically, evaluate its accuracy on held-out test data, and export the predictions to a CSV file. So, let’s get started!


Necessary Libraries

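To ensure this code functions properly, make sure to install these libraries via the terminal or command prompt by running these commands:

$ pip install pandas
$ pip install tpot
$ pip install scikit-learn

tkinter is part of Python’s standard library rather than a pip package, and the python.org installers for Windows and macOS already include it. On some Linux distributions it is packaged separately; on Debian or Ubuntu, for example, run:

$ sudo apt install python3-tk

Also note that this tutorial uses TPOT’s classic API: TPOTClassifier with the generations, population_size, and verbosity arguments, plus the fitted_pipeline_ attribute. If your installed TPOT version rejects these arguments, check its documentation or install an earlier release, for example with pip install "tpot<1.0".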
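Keep in mind that pip package names do not always match import names: scikit-learn, for instance, is imported as sklearn. If you want to confirm that everything the script imports can be found before launching the GUI, a small standard-library sketch like this one can help. The helper name missing_modules() is just an illustration, not part of any library.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    # find_spec() looks a module up without importing it; None means "not found"
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by the tutorial script (note: sklearn, not scikit-learn)
    required = ["tkinter", "pandas", "tpot", "sklearn"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```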

Imports

import tkinter as tk
from tkinter import filedialog, messagebox
import pandas as pd
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
import threading

Well then, if we want to command the power of machine learning, we will need the assistance of our trusty tools. This is why we import:

  • tkinter: This will help us create a user-friendly graphical interface. We’ll also use filedialog to open file dialogs and messagebox to display messages.
  • pandas: Our go-to library for handling and analyzing data with ease.
  • TPOTClassifier: The star of our show, this tool automatically searches for, optimizes, and trains a machine learning pipeline for classification.
  • train_test_split: To divide our dataset into training and testing parts, so the model is evaluated on data it did not see during training.
  • accuracy_score: To measure how well our model’s predictions match up with the actual outcomes.
  • LabelEncoder: To convert text labels into numerical values, so TPOT receives numeric class labels.
  • threading: To run the long TPOT search in a background thread, so the main window stays responsive instead of freezing.
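To make the LabelEncoder step concrete, here is a tiny example with made-up labels. LabelEncoder sorts the distinct classes and numbers them from 0, and inverse_transform() maps numbers back to the original text.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up text labels for illustration
labels = ["spam", "ham", "spam", "eggs"]

le = LabelEncoder()
encoded = le.fit_transform(labels)

print(le.classes_.tolist())                   # classes in sorted order: ['eggs', 'ham', 'spam']
print(encoded.tolist())                       # [2, 1, 2, 0]
print(le.inverse_transform([0, 2]).tolist())  # ['eggs', 'spam']
```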

Load Dataset Function

Now it’s time to load the core of our quest, our treasure chest, you might say: the dataset. That is the job of the load_dataset() function. It opens a file dialog with filedialog.askopenfilename() so the user can pick a CSV file, reads that file with pd.read_csv(), and stores the result in the global dataset variable. Once this is done, the “Run TPOT Optimization” button is enabled.

Finally, a message pops up to confirm that the load process was successful. If something goes wrong, it shows an error message.

def load_dataset():
    global dataset
    # Let the user pick a CSV file ("All files" is kept as a fallback)
    file_path = filedialog.askopenfilename(
        filetypes=[("CSV files", "*.csv"), ("All files", "*.*")])
    if file_path:
        try:
            dataset = pd.read_csv(file_path)
            run_button.config(state=tk.NORMAL)
            messagebox.showinfo("Dataset Loaded", "Dataset loaded successfully!")
        except Exception as e:
            messagebox.showerror("Error", f"Failed to load dataset: {e}")
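Because run_tpot() treats the last column as the target and passes all the others to TPOT as features, it can help to inspect a freshly loaded DataFrame first. The sketch below reads a small made-up CSV from memory with pd.read_csv() and reports the target column and any non-numeric feature columns, which scikit-learn-based models such as TPOT’s generally expect to be encoded first. The summarize_dataset() helper is hypothetical and not part of the tutorial’s script.

```python
import io

import pandas as pd


def summarize_dataset(df):
    """Describe how a last-column-is-target split would treat this DataFrame."""
    if df.shape[1] < 2:
        raise ValueError("Need at least one feature column and one target column")
    features = df.iloc[:, :-1]
    # Columns that are not numeric would need encoding before model training
    non_numeric = [col for col in features.columns
                   if not pd.api.types.is_numeric_dtype(features[col])]
    return {
        "rows": len(df),
        "target": df.columns[-1],
        "n_features": features.shape[1],
        "non_numeric_features": non_numeric,
    }


if __name__ == "__main__":
    # A tiny made-up CSV; in the app the text comes from the chosen file
    csv_text = """age,city,income,label
34,Paris,52000,yes
29,Lyon,48000,no
41,Paris,61000,yes
"""
    df = pd.read_csv(io.StringIO(csv_text))
    print(summarize_dataset(df))
```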

Run TPOT Optimization Function

With our dataset loaded, let’s dive into the heart of the operation: TPOT optimization. This is handled by the run_tpot() function. It uses every column except the last one as a feature and treats the last column as the target we want to predict. If the target holds text labels, it first converts them into numbers with LabelEncoder.

Let’s dig into where the magic actually happens. The function uses train_test_split to divide the data into a training set (80% of the rows), used to build the model, and a testing set (20%), used to check its accuracy. Then:

  • TPOTClassifier searches for the best pipeline. Calling tpot.fit runs the genetic search on the training data (here 5 generations with a population of 20 pipelines, each candidate scored with cross-validation) and then trains the winning pipeline.
  • Next, tpot.predict makes predictions on the test data, accuracy_score compares them with the true labels, and result_label.config displays the result in the GUI. But it doesn’t end here.
  • The function stores the best pipeline found by TPOT (tpot.fitted_pipeline_) in the global pipeline variable, enables the “Export Predictions” button, and shows a success message. If anything fails, it shows an error message instead.
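def run_tpot():
    global pipeline, X_test_split, y_test_split, y_pred
    if dataset is None:
        return

    # This function runs on a worker thread. Tkinter widgets are not thread-safe,
    # so every GUI update is handed to the main loop with root.after().
    root.after(0, lambda: run_button.config(state=tk.DISABLED))
    try:
        # Assume the last column is the target variable
        X = dataset.iloc[:, :-1]
        y = dataset.iloc[:, -1]

        # Encode target labels if they are not numeric (e.g. text class names)
        le = None
        if not pd.api.types.is_numeric_dtype(y):
            le = LabelEncoder()
            y = le.fit_transform(y)

        # Hold out 20% of the rows as a test set
        X_train_split, X_test_split, y_train_split, y_test_split = train_test_split(
            X, y, test_size=0.2, random_state=42)

        # Small search budget: 5 generations with 20 pipelines per generation
        tpot = TPOTClassifier(verbosity=2, generations=5, population_size=20, random_state=42)
        tpot.fit(X_train_split, y_train_split)

        y_pred = tpot.predict(X_test_split)
        accuracy = accuracy_score(y_test_split, y_pred)

        # Decode labels back to the original class names for the CSV export
        if le is not None:
            y_test_split = le.inverse_transform(y_test_split)
            y_pred = le.inverse_transform(y_pred)

        # Keep the best pipeline found by TPOT
        pipeline = tpot.fitted_pipeline_

        def show_results():
            result_label.config(text=f"Test Accuracy: {accuracy:.4f}")
            export_button.config(state=tk.NORMAL)
            messagebox.showinfo("Optimization Complete", "TPOT optimization complete!")

        root.after(0, show_results)
    except Exception as e:
        error_message = f"Failed to run TPOT optimization: {e}"
        root.after(0, lambda: messagebox.showerror("Error", error_message))
    finally:
        # Allow another run once this one has finished
        root.after(0, lambda: run_button.config(state=tk.NORMAL))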
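If you want to check the split, encode, and score logic without the GUI, or without waiting for a full TPOT search, the same steps can be wrapped in a plain function. In this sketch scikit-learn’s LogisticRegression stands in for TPOTClassifier so it runs in seconds; the function names and the use of the built-in iris dataset (with its numeric target replaced by text class names) are assumptions for illustration only.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def evaluate_last_column_target(df, model, test_size=0.2, random_state=42):
    """Treat the last column as the target, fit `model`, and score it on a held-out split.

    Returns (accuracy, results), where results holds the test features plus
    True_Label and Predicted_Label columns in the original label format.
    """
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]

    # Encode non-numeric targets (e.g. text class names) as integers
    le = None
    if not pd.api.types.is_numeric_dtype(y):
        le = LabelEncoder()
        y = le.fit_transform(y)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    # Map encoded labels back to the original text for reporting
    if le is not None:
        y_test = le.inverse_transform(y_test)
        y_pred = le.inverse_transform(y_pred)

    results = X_test.copy()
    results["True_Label"] = y_test
    results["Predicted_Label"] = y_pred
    return accuracy, results


def iris_with_text_labels():
    """Iris features with a text target in the last column (illustrative data)."""
    iris = load_iris(as_frame=True)
    df = iris.frame.copy()
    df["target"] = iris.target_names[iris.target]
    return df


if __name__ == "__main__":
    df = iris_with_text_labels()
    accuracy, results = evaluate_last_column_target(df, LogisticRegression(max_iter=1000))
    print(f"Test Accuracy: {accuracy:.4f}")
    print(results.head())
```

Since TPOTClassifier follows scikit-learn’s fit/predict interface, passing TPOTClassifier(generations=5, population_size=20, random_state=42) as the model argument should run the tutorial’s search through the same function, just much more slowly.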

Run TPOT in a Separate Thread

Having nailed down the core function of our program, we now want to make sure it doesn’t freeze the main window of our application. A TPOT search can take a long time, and if it ran on the same thread as Tkinter’s event loop, the window would stop responding until it finished. To avoid this, we defined the run_tpot_thread() function, which starts a new thread that executes run_tpot() in the background while the main window stays responsive.

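def run_tpot_thread():
    # Run the TPOT search in the background so the window stays responsive.
    # daemon=True lets the program exit even if a search is still running.
    threading.Thread(target=run_tpot, daemon=True).start()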
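One caveat with background threads: Tkinter widgets are not thread-safe, so it is generally safest if the worker thread never touches them directly. A common pattern is to have the worker put its result (or exception) on a queue.Queue and let the main thread check that queue from a callback scheduled with root.after(). The sketch below shows the queue side using only the standard library; slow_job() is a hypothetical stand-in for tpot.fit(), and start_in_background() and poll() are illustrative names, not part of the tutorial’s script.

```python
import queue
import threading
import time


def slow_job(n):
    """Hypothetical stand-in for tpot.fit(): blocks for a moment, then returns."""
    time.sleep(0.1)
    return sum(range(n))


def start_in_background(func, *args):
    """Run func(*args) on a daemon thread and return a queue that will hold the outcome."""
    outcomes = queue.Queue()

    def worker():
        try:
            outcomes.put(("ok", func(*args)))
        except Exception as exc:
            # Hand the error back to the main thread instead of losing it
            outcomes.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    return outcomes


def poll(outcomes):
    """Non-blocking check: the outcome tuple if the job finished, otherwise None."""
    try:
        return outcomes.get_nowait()
    except queue.Empty:
        return None


if __name__ == "__main__":
    outcomes = start_in_background(slow_job, 1_000_000)
    # A Tkinter app would not loop like this; it would re-check from a
    # root.after() callback and keep running mainloop() in between.
    while (outcome := poll(outcomes)) is None:
        time.sleep(0.05)
    print(outcome)
```

In the GUI, a function scheduled with root.after(100, ...) would call poll(), reschedule itself while it returns None, and update the widgets only once an outcome arrives.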

Export Predictions Function

With our best model ready, it’s time to save its predictions into a CSV file. How do we do that?

We use the export_predictions() function. This handy function opens a file dialog so users can choose where to save the CSV file. It then builds a pandas DataFrame from the test features (X_test_split), adds the actual and predicted labels as True_Label and Predicted_Label columns, and saves the DataFrame with to_csv(). To wrap things up, a message box pops up to let us know whether the export succeeded or something went wrong.

def export_predictions():
    if y_pred is not None:
        file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")])
        if file_path:
            try:
                # Combine the test features with the true and predicted labels
                results_df = pd.DataFrame(X_test_split)
                results_df['True_Label'] = y_test_split
                results_df['Predicted_Label'] = y_pred

                # Save the DataFrame to a CSV file
                results_df.to_csv(file_path, index=False)
                messagebox.showinfo("Predictions Exported", "Predictions exported successfully!")
            except Exception as e:
                messagebox.showerror("Error", f"Failed to export predictions: {e}")
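The export boils down to a pandas DataFrame plus to_csv(). The sketch below rebuilds that step outside the GUI and writes to an in-memory buffer instead of a file, so the result can be read back and checked. As a design choice it resets the index and converts the labels with np.asarray(), so the label columns are attached by position rather than by index alignment; the build_results_frame() helper and the sample rows are hypothetical.

```python
import io

import numpy as np
import pandas as pd


def build_results_frame(X_test, y_true, y_pred):
    """Test features plus True_Label and Predicted_Label columns, matched by position."""
    results = pd.DataFrame(X_test).reset_index(drop=True)
    results["True_Label"] = np.asarray(y_true)
    results["Predicted_Label"] = np.asarray(y_pred)
    return results


if __name__ == "__main__":
    # Made-up test rows; the index (10, 20, 30) mimics rows picked by a shuffled split
    X_test = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]}, index=[10, 20, 30])
    y_true = pd.Series(["x", "y", "x"], index=[10, 20, 30])
    y_pred = np.array(["x", "x", "x"])

    buffer = io.StringIO()
    build_results_frame(X_test, y_true, y_pred).to_csv(buffer, index=False)
    print(buffer.getvalue())
```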

Main Block

This is the grand finale, where we set up our program. First, the if __name__ == "__main__": check ensures the setup code runs only when the script is executed directly, not when it is imported as a module. Then, we initialize the global variables that store the dataset, the best pipeline, and the predictions. Next, we create the main window, set its title, and define its size. Then we add a title label, a button for each function, and a label for the results:

  • The “Load Dataset” button calls the load_dataset() function.
  • The “Run TPOT Optimization” button calls the run_tpot_thread() function.
  • The result_label displays the model’s test accuracy.
  • The “Export Predictions” button calls the export_predictions() function.

Finally, we call root.mainloop() to start Tkinter’s event loop, which keeps the main window running and responsive to the user.

if __name__ == "__main__":
    # Initialize global variables
    dataset = None
    pipeline = None
    X_test_split, y_test_split, y_pred = None, None, None

    # Create the main window
    root = tk.Tk()
    root.title("AutoML with TPOT - The Pycodes")
    root.geometry("400x250")

    # Create and place widgets
    label = tk.Label(root, text="AutoML with TPOT", font=("Helvetica", 16))
    label.pack(pady=10)

    load_button = tk.Button(root, text="Load Dataset", command=load_dataset)
    load_button.pack(pady=10)

    run_button = tk.Button(root, text="Run TPOT Optimization", command=run_tpot_thread, state=tk.DISABLED)
    run_button.pack(pady=10)

    result_label = tk.Label(root, text="", font=("Helvetica", 12))
    result_label.pack(pady=10)

    export_button = tk.Button(root, text="Export Predictions", command=export_predictions, state=tk.DISABLED)
    export_button.pack(pady=10)

    # Start the Tkinter event loop
    root.mainloop()

Example

This code works on Windows, Linux, and macOS.

As you can see in the images below, I ran this script on Windows:

And on Linux, as shown in the images below:

Full Code

import tkinter as tk
from tkinter import filedialog, messagebox
import pandas as pd
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
import threading




def load_dataset():
    global dataset
    # Let the user pick a CSV file ("All files" is kept as a fallback)
    file_path = filedialog.askopenfilename(
        filetypes=[("CSV files", "*.csv"), ("All files", "*.*")])
    if file_path:
        try:
            dataset = pd.read_csv(file_path)
            run_button.config(state=tk.NORMAL)
            messagebox.showinfo("Dataset Loaded", "Dataset loaded successfully!")
        except Exception as e:
            messagebox.showerror("Error", f"Failed to load dataset: {e}")




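def run_tpot():
    global pipeline, X_test_split, y_test_split, y_pred
    if dataset is None:
        return

    # This function runs on a worker thread. Tkinter widgets are not thread-safe,
    # so every GUI update is handed to the main loop with root.after().
    root.after(0, lambda: run_button.config(state=tk.DISABLED))
    try:
        # Assume the last column is the target variable
        X = dataset.iloc[:, :-1]
        y = dataset.iloc[:, -1]

        # Encode target labels if they are not numeric (e.g. text class names)
        le = None
        if not pd.api.types.is_numeric_dtype(y):
            le = LabelEncoder()
            y = le.fit_transform(y)

        # Hold out 20% of the rows as a test set
        X_train_split, X_test_split, y_train_split, y_test_split = train_test_split(
            X, y, test_size=0.2, random_state=42)

        # Small search budget: 5 generations with 20 pipelines per generation
        tpot = TPOTClassifier(verbosity=2, generations=5, population_size=20, random_state=42)
        tpot.fit(X_train_split, y_train_split)

        y_pred = tpot.predict(X_test_split)
        accuracy = accuracy_score(y_test_split, y_pred)

        # Decode labels back to the original class names for the CSV export
        if le is not None:
            y_test_split = le.inverse_transform(y_test_split)
            y_pred = le.inverse_transform(y_pred)

        # Keep the best pipeline found by TPOT
        pipeline = tpot.fitted_pipeline_

        def show_results():
            result_label.config(text=f"Test Accuracy: {accuracy:.4f}")
            export_button.config(state=tk.NORMAL)
            messagebox.showinfo("Optimization Complete", "TPOT optimization complete!")

        root.after(0, show_results)
    except Exception as e:
        error_message = f"Failed to run TPOT optimization: {e}"
        root.after(0, lambda: messagebox.showerror("Error", error_message))
    finally:
        # Allow another run once this one has finished
        root.after(0, lambda: run_button.config(state=tk.NORMAL))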




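def run_tpot_thread():
    # Run the TPOT search in the background so the window stays responsive.
    # daemon=True lets the program exit even if a search is still running.
    threading.Thread(target=run_tpot, daemon=True).start()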




def export_predictions():
    if y_pred is not None:
        file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")])
        if file_path:
            try:
                # Combine the test features with the true and predicted labels
                results_df = pd.DataFrame(X_test_split)
                results_df['True_Label'] = y_test_split
                results_df['Predicted_Label'] = y_pred

                # Save the DataFrame to a CSV file
                results_df.to_csv(file_path, index=False)
                messagebox.showinfo("Predictions Exported", "Predictions exported successfully!")
            except Exception as e:
                messagebox.showerror("Error", f"Failed to export predictions: {e}")




if __name__ == "__main__":
    # Initialize global variables
    dataset = None
    pipeline = None
    X_test_split, y_test_split, y_pred = None, None, None

    # Create the main window
    root = tk.Tk()
    root.title("AutoML with TPOT - The Pycodes")
    root.geometry("400x250")

    # Create and place widgets
    label = tk.Label(root, text="AutoML with TPOT", font=("Helvetica", 16))
    label.pack(pady=10)

    load_button = tk.Button(root, text="Load Dataset", command=load_dataset)
    load_button.pack(pady=10)

    run_button = tk.Button(root, text="Run TPOT Optimization", command=run_tpot_thread, state=tk.DISABLED)
    run_button.pack(pady=10)

    result_label = tk.Label(root, text="", font=("Helvetica", 12))
    result_label.pack(pady=10)

    export_button = tk.Button(root, text="Export Predictions", command=export_predictions, state=tk.DISABLED)
    export_button.pack(pady=10)

    # Start the Tkinter event loop
    root.mainloop()

Happy Coding!
