In the fast-paced world of machine learning, efficiency and optimization are key. As data scientists and enthusiasts, we constantly seek ways to streamline our workflows, automate repetitive tasks, and achieve the best possible results with minimal effort. This is where TPOT comes in.
TPOT is an open-source AutoML tool designed to simplify the process of machine learning model optimization. Imagine exploring thousands of possible pipelines, fine-tuning hyperparameters, and selecting the best model for your data—all without writing extensive code. TPOT uses the power of genetic programming to automate these tasks, saving you time and enhancing productivity.
Today, you’ll learn how to automate machine learning model optimization with TPOT in Python using tkinter, pandas, scikit-learn, and TPOTClassifier. You’ll cover loading a dataset, training a model automatically, evaluating its accuracy, and exporting predictions to a CSV file. Along the way, you’ll see how TPOT’s automated optimization can streamline your machine learning workflow. So, let’s get started!
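Before we wire everything into a GUI, here’s the core idea in isolation. The snippet below is a minimal sketch of classic TPOT usage, assuming the classic TPOTClassifier API and scikit-learn’s built-in Iris dataset; it isn’t part of the app we’ll build, just a taste of what TPOT does:
from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a small example dataset and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Let TPOT search for a good pipeline; a small budget so it finishes quickly
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print("Test accuracy:", tpot.score(X_test, y_test))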
Table of Contents
- Necessary Libraries
- Imports
- Load Dataset Function
- Run TPOT Optimization Function
- Run TPOT in a Separate Thread
- Export Predictions Function
- Main Block
- Example
- Full Code
Necessary Libraries
To ensure this code functions properly, make sure to install these libraries via the terminal or command prompt by running these commands:
$ pip install pandas
$ pip install tpot
$ pip install scikit-learn
Note that tkinter ships with most Python installations, so there is nothing to pip-install for it; on some Linux distributions you may need the python3-tk system package.
Imports
import tkinter as tk
from tkinter import filedialog, messagebox
import pandas as pd
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
import threading
Well then, if we want to command the power of machine learning, we will need the assistance of our trusty tools. This is why we import:
- tkinter: This will help us create a user-friendly graphical interface. We’ll also use filedialog to open file dialogs and messagebox to display messages.
- pandas: Our go-to library for handling and analyzing data with ease.
- TPOTClassifier: The star of our show, this tool will help us create and optimize our machine learning models automatically.
- train_test_split: To divide our dataset into training and testing parts, ensuring our model learns and is evaluated correctly.
- accuracy_score: To measure how well our model’s predictions match up with the actual outcomes.
- LabelEncoder: To convert text labels into numerical values, making it easier for our model to process them (see the short sketch after this list).
- threading: To keep our application responsive by allowing multitasking without freezing the main window.
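As a quick, hypothetical illustration of the LabelEncoder step (the labels below are made up), here is what the encoding looks like:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
labels = ["cat", "dog", "cat", "bird"]   # hypothetical text labels
encoded = le.fit_transform(labels)
print(encoded)        # [1 2 1 0] -- each class mapped to an integer
print(le.classes_)    # ['bird' 'cat' 'dog']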
Load Dataset Function
Now it’s time to load the core of our quest, our treasure chest, you might say: the dataset. This is the job of the load_dataset() function. It lets the user pick a CSV file through filedialog, reads the selected file with pd.read_csv() into the dataset variable, and then enables the “Run TPOT Optimization” button.
Finally, a message pops up to confirm that the load was successful. If something goes wrong, it shows an error message instead.
def load_dataset():
    global dataset
    file_path = filedialog.askopenfilename()
    if file_path:
        try:
            dataset = pd.read_csv(file_path)
            run_button.config(state=tk.NORMAL)
            messagebox.showinfo("Dataset Loaded", "Dataset loaded successfully!")
        except Exception as e:
            messagebox.showerror("Error", f"Failed to load dataset: {e}")
Run TPOT Optimization Function
With our dataset loaded, let’s dive into the heart of the operation: TPOT optimization. To perform it, we created a function called run_tpot(). This function uses all columns as features except the last one, which is the target we want to predict. If the target contains text labels, it converts them into numbers using LabelEncoder.
Let’s dig into where the magic actually happens. The function uses train_test_split to divide the data into a training set and a testing set:
- The training set is used to build the model and the testing set to check its accuracy. With the data split, we create a TPOTClassifier to search for the best machine learning pipeline and train it with tpot.fit.
- Next, we generate predictions on the test data with tpot.predict, calculate their accuracy with accuracy_score, and display the result on the GUI with result_label.config. But it doesn’t end here.
- The function stores the best pipeline found by TPOT (tpot.fitted_pipeline_) in the pipeline variable, then enables the “Export Predictions” button and shows a success message. In case of failure, it shows an error message instead.
def run_tpot():
    global pipeline, X_test_split, y_test_split, y_pred
    if dataset is not None:
        try:
            # Assume the last column is the target variable
            X = dataset.iloc[:, :-1]
            y = dataset.iloc[:, -1]
            # Encode target labels if they are categorical
            if y.dtype == 'object':
                le = LabelEncoder()
                y = le.fit_transform(y)
            X_train_split, X_test_split, y_train_split, y_test_split = train_test_split(X, y, test_size=0.2,
                                                                                         random_state=42)
            tpot = TPOTClassifier(verbosity=2, generations=5, population_size=20, random_state=42)
            tpot.fit(X_train_split, y_train_split)
            y_pred = tpot.predict(X_test_split)
            accuracy = accuracy_score(y_test_split, y_pred)
            result_label.config(text=f"Test Accuracy: {accuracy:.4f}")
            pipeline = tpot.fitted_pipeline_
            export_button.config(state=tk.NORMAL)
            messagebox.showinfo("Optimization Complete", "TPOT optimization complete!")
        except Exception as e:
            messagebox.showerror("Error", f"Failed to run TPOT optimization: {e}")
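The search budget here is deliberately small (generations=5, population_size=20) so the example finishes quickly; larger values explore more pipelines but take longer. If you are on classic TPOT (the 0.x releases), you can also cap the runtime and export the winning pipeline as a standalone script. The sketch below shows those options with arbitrary example values; newer TPOT releases changed the API, so check the docs for the version you have installed:
# Sketch: tuning the search budget and exporting the best pipeline (classic TPOT).
# The values below are arbitrary examples, not recommendations.
tpot = TPOTClassifier(
    generations=10,       # number of evolutionary rounds
    population_size=50,   # pipelines evaluated per generation
    max_time_mins=15,     # hard cap on total search time
    cv=5,                 # cross-validation folds used during the search
    n_jobs=-1,            # use all CPU cores
    verbosity=2,
    random_state=42,
)
tpot.fit(X_train_split, y_train_split)

# Write the best pipeline out as a runnable Python script
tpot.export("best_pipeline.py")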
Run TPOT in a Separate Thread
Having nailed down the core function of our program, we now want to ensure it runs smoothly without freezing the main window of our application. To achieve this, we defined the run_tpot_thread() function, which starts a new thread that runs run_tpot() in the background.
def run_tpot_thread():
    threading.Thread(target=run_tpot).start()
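One caveat: nothing stops the user from clicking “Run TPOT Optimization” twice and launching two searches at once, and a non-daemon thread can keep the process alive after the window closes. The variation below is an optional suggestion of mine, not part of the original script; it disables the button for the duration of the search and marks the thread as a daemon:
def run_tpot_thread():
    # Disable the button so only one optimization runs at a time
    run_button.config(state=tk.DISABLED)

    def worker():
        try:
            run_tpot()
        finally:
            # Re-enable the button once the search finishes or fails
            run_button.config(state=tk.NORMAL)

    # daemon=True lets the process exit even if a search is still running
    threading.Thread(target=worker, daemon=True).start()
Strictly speaking, Tkinter is not guaranteed to be thread-safe; the script already updates widgets from the worker thread, which works in practice on most platforms, but a production app would marshal GUI updates back to the main thread (for example with root.after).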
Export Predictions Function
With our best model ready, it’s time to save its predictions to a CSV file. How do we do that?
We use the export_predictions() function. This handy function opens a save dialog so the user can choose where to store the CSV file, then uses pandas to build a DataFrame from the test data, adding both the actual and predicted labels. Finally, it saves the DataFrame as a CSV file, and a message box pops up to let us know whether everything was successful or something went wrong.
def export_predictions():
    if y_pred is not None:
        file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")])
        if file_path:
            try:
                results_df = pd.DataFrame(X_test_split)
                results_df['True_Label'] = y_test_split
                results_df['Predicted_Label'] = y_pred
                # Save the DataFrame to a CSV file
                results_df.to_csv(file_path, index=False)
                messagebox.showinfo("Predictions Exported", "Predictions exported successfully!")
            except Exception as e:
                messagebox.showerror("Error", f"Failed to export predictions: {e}")
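Exporting predictions is handy, but you may also want to keep the fitted pipeline itself so you can reuse it later without re-running the search. Here is an optional sketch (not part of the original script) using joblib, which is already installed as a scikit-learn dependency:
import joblib

# Persist the best pipeline found by TPOT (the global `pipeline` variable)
joblib.dump(pipeline, "tpot_best_pipeline.joblib")

# Later, in another script or session:
loaded = joblib.load("tpot_best_pipeline.joblib")
new_predictions = loaded.predict(X_test_split)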
Main Block
This is the grand finale, where we set up our program. First, the if __name__ == "__main__": guard ensures the setup code runs only when the script is executed directly, not when it is imported as a module. Then, we set up global variables to store the dataset, model, and predictions. Next, we create the main window, set its title, and define its size. We add a label as the program’s title and widgets for the different functions:
- The “Load Dataset” button calls the load_dataset() function.
- The “Run TPOT Optimization” button calls the run_tpot_thread() function.
- The result_label displays the model’s accuracy.
- The “Export Predictions” button calls the export_predictions() function.
Finally, we call mainloop() to keep the main window running and responsive to the user.
if __name__ == "__main__":
    # Initialize global variables
    dataset = None
    pipeline = None
    X_test_split, y_test_split, y_pred = None, None, None

    # Create the main window
    root = tk.Tk()
    root.title("AutoML with TPOT - The Pycodes")
    root.geometry("400x250")

    # Create and place widgets
    label = tk.Label(root, text="AutoML with TPOT", font=("Helvetica", 16))
    label.pack(pady=10)
    load_button = tk.Button(root, text="Load Dataset", command=load_dataset)
    load_button.pack(pady=10)
    run_button = tk.Button(root, text="Run TPOT Optimization", command=run_tpot_thread, state=tk.DISABLED)
    run_button.pack(pady=10)
    result_label = tk.Label(root, text="", font=("Helvetica", 12))
    result_label.pack(pady=10)
    export_button = tk.Button(root, text="Export Predictions", command=export_predictions, state=tk.DISABLED)
    export_button.pack(pady=10)

    # Start the Tkinter event loop
    root.mainloop()
Example
This code works on all systems (Windows, Linux, and macOS).
As you can see in the images below, I executed this script on Windows:
And here it is running on a Linux system, as shown in the images below:
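If you don’t have a CSV on hand to try the app with, here is an optional sketch (my own addition, assuming a reasonably recent scikit-learn) that writes the Iris dataset to a CSV file with the target as the last column, which is exactly the layout run_tpot() expects:
import pandas as pd
from sklearn.datasets import load_iris

# Build a DataFrame with the features first and the target as the last column
iris = load_iris(as_frame=True)
df = iris.frame  # feature columns plus a trailing 'target' column
df.to_csv("iris_sample.csv", index=False)
print(df.head())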
Full Code
import tkinter as tk
from tkinter import filedialog, messagebox
import pandas as pd
from tpot import TPOTClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
import threading


def load_dataset():
    global dataset
    file_path = filedialog.askopenfilename()
    if file_path:
        try:
            dataset = pd.read_csv(file_path)
            run_button.config(state=tk.NORMAL)
            messagebox.showinfo("Dataset Loaded", "Dataset loaded successfully!")
        except Exception as e:
            messagebox.showerror("Error", f"Failed to load dataset: {e}")


def run_tpot():
    global pipeline, X_test_split, y_test_split, y_pred
    if dataset is not None:
        try:
            # Assume the last column is the target variable
            X = dataset.iloc[:, :-1]
            y = dataset.iloc[:, -1]
            # Encode target labels if they are categorical
            if y.dtype == 'object':
                le = LabelEncoder()
                y = le.fit_transform(y)
            X_train_split, X_test_split, y_train_split, y_test_split = train_test_split(X, y, test_size=0.2,
                                                                                         random_state=42)
            tpot = TPOTClassifier(verbosity=2, generations=5, population_size=20, random_state=42)
            tpot.fit(X_train_split, y_train_split)
            y_pred = tpot.predict(X_test_split)
            accuracy = accuracy_score(y_test_split, y_pred)
            result_label.config(text=f"Test Accuracy: {accuracy:.4f}")
            pipeline = tpot.fitted_pipeline_
            export_button.config(state=tk.NORMAL)
            messagebox.showinfo("Optimization Complete", "TPOT optimization complete!")
        except Exception as e:
            messagebox.showerror("Error", f"Failed to run TPOT optimization: {e}")


def run_tpot_thread():
    threading.Thread(target=run_tpot).start()


def export_predictions():
    if y_pred is not None:
        file_path = filedialog.asksaveasfilename(defaultextension=".csv", filetypes=[("CSV files", "*.csv")])
        if file_path:
            try:
                results_df = pd.DataFrame(X_test_split)
                results_df['True_Label'] = y_test_split
                results_df['Predicted_Label'] = y_pred
                # Save the DataFrame to a CSV file
                results_df.to_csv(file_path, index=False)
                messagebox.showinfo("Predictions Exported", "Predictions exported successfully!")
            except Exception as e:
                messagebox.showerror("Error", f"Failed to export predictions: {e}")


if __name__ == "__main__":
    # Initialize global variables
    dataset = None
    pipeline = None
    X_test_split, y_test_split, y_pred = None, None, None

    # Create the main window
    root = tk.Tk()
    root.title("AutoML with TPOT - The Pycodes")
    root.geometry("400x250")

    # Create and place widgets
    label = tk.Label(root, text="AutoML with TPOT", font=("Helvetica", 16))
    label.pack(pady=10)
    load_button = tk.Button(root, text="Load Dataset", command=load_dataset)
    load_button.pack(pady=10)
    run_button = tk.Button(root, text="Run TPOT Optimization", command=run_tpot_thread, state=tk.DISABLED)
    run_button.pack(pady=10)
    result_label = tk.Label(root, text="", font=("Helvetica", 12))
    result_label.pack(pady=10)
    export_button = tk.Button(root, text="Export Predictions", command=export_predictions, state=tk.DISABLED)
    export_button.pack(pady=10)

    # Start the Tkinter event loop
    root.mainloop()
Happy Coding!