Artificial intelligence is continuously expanding the horizons of what’s possible, and one of the most fascinating advancements is emotion detection. Imagine a computer that can understand your feelings through the words you speak or type. By recognizing human emotions from text and speech, AI systems can interact with us in more personalized and empathetic ways. Whether it’s improving customer service by understanding how you feel or providing better mental health support through sentiment analysis, the potential applications are both vast and transformative.
As AI technology continues to advance, incorporating emotion detection into various applications is becoming not only possible but also increasingly beneficial.
In today’s article, we’ll explore how to detect emotions from text and speech in Python. We’ll leverage powerful tools such as transformers, text2emotion, and speech recognition to demonstrate how to build a robust emotion detection system. This tutorial is perfect for AI enthusiasts and Python developers looking to enhance their projects with emotion analysis capabilities. Whether you’re interested in text emotion analysis, speech emotion analysis, or integrating machine learning into your applications, this guide has you covered.
Let’s get started!
Table of Contents
- Getting Started
- Imports
- Load the emotion classifier
- Functions for Detecting Emotions from Text
- Speech-to-text and Emotion Detection
- Display the Results in the GUI
- Process Text and Speech Inputs
- Browse for a Speech File
- Creating the GUI
- Example
- Full Code
Getting Started
First, let’s get everything set up. Ensure you install these libraries via the terminal or command prompt by running these commands:
$ pip install tk
$ pip install transformers torch
$ pip install text2emotion
$ pip install SpeechRecognition
$ pip install pydub
$ pip install tensorflow
Requirements for Speech Processing
Before diving into speech emotion detection, ensure you have FFmpeg installed on your system. Here’s how to do it for different operating systems:
For Linux:
Open your terminal and run the following command:
sudo apt-get install ffmpeg
For macOS:
Use Homebrew to install FFmpeg by running:
brew install ffmpeg
For Windows:
- Download the FFmpeg build from the FFmpeg website and extract ffmpeg-release-essentials.zip.
- Add the FFmpeg bin directory (e.g., C:\ffmpeg\bin) to your system’s PATH environment variable.
Note: pydub relies on FFmpeg to decode non-WAV formats, so this step is vital for smooth speech processing no matter which audio format or operating system you use. Adding FFmpeg to your PATH ensures that your system can find and run it from any command prompt.
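To confirm the installation, you can print FFmpeg’s version from any terminal. If this shows version information, pydub will be able to find it:
$ ffmpeg -version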
Imports
As with any journey, we first need to gear up with the right tools, so let’s see which libraries and modules we will import:
- text2emotion: This library helps us detect emotions from text and gain insights into the sentiment behind words.
- transformers: Uses a pre-trained model for advanced emotion detection, making sentiment analysis smarter and more accurate.
- tkinter: Helps create a graphical user interface filled with GUI elements to make our program user-friendly.
- pydub: Simplifies working with audio files, enabling easy management of audio formats and conversions.
- os: Interacts with the operating system.
- speech_recognition: A library that converts audio files to text with ease, unlocking new ways to process and analyze spoken content.
- threading: Runs tasks in the background to keep the main window responsive.
import text2emotion as te
from transformers import pipeline
from tkinter import *
from tkinter import scrolledtext, filedialog, ttk
from pydub import AudioSegment
import os
import speech_recognition as sr
import threading
We also check if torch is installed, as it is required by transformers. If it is not installed, we install it and then import it:
# Ensure PyTorch is installed
try:
    import torch
except ImportError:
    print("Installing PyTorch...")
    os.system('pip install torch')
    import torch  # Import it now that the install has finished
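As a side note, os.system('pip install torch') may target a different Python environment than the one running your script. A sturdier variant, shown here as an optional sketch, invokes pip through the current interpreter with subprocess:
# Optional, sturdier variant: install torch with the same interpreter
# that is running this script, then import it
import subprocess
import sys

try:
    import torch
except ImportError:
    print("Installing PyTorch...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "torch"])
    import torch  # Available once the install finishes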
Load the emotion classifier
Next, let’s bring our advanced emotion-detection robot to life! How do we do that? It’s simple. We’ll load a pre-trained model from the transformers library that’s fine-tuned specifically for emotion detection. In this code, we’re using the j-hartmann/emotion-english-distilroberta-base model.
# Load the emotion classifier
emotion_classifier = pipeline('sentiment-analysis', model='j-hartmann/emotion-english-distilroberta-base')
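As a quick sanity check, you can call the classifier directly. This model predicts one of seven labels (anger, disgust, fear, joy, neutral, sadness, surprise); the output below is only illustrative, and the exact scores will vary:
# Quick sanity check on a sample sentence
sample = emotion_classifier("I can't believe I finally won the lottery!")
print(sample)
# Expected shape of the result (scores will vary), e.g.:
# [{'label': 'surprise', 'score': 0.97}]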
Functions for Detecting Emotions from Text
Now that we have all the tools necessary to build our emotion-detecting machines, let’s dive in:
detect_emotion_text_t2e Function
This function uses the text2emotion library to detect emotions in the text it processes and returns them as a dictionary. If it fails, it returns an empty dictionary and prints an error message.
# Function for text-based emotion detection using text2emotion
def detect_emotion_text_t2e(text):
    try:
        emotions = te.get_emotion(text)
        return emotions
    except Exception as e:
        print(f"Error in text2emotion text emotion detection: {e}")
        return {}
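For reference, text2emotion always returns the same five keys (Happy, Angry, Surprise, Sad, Fear), each with a score. A quick usage sketch, with illustrative output:
# Example call; the returned dictionary always carries these five keys
print(detect_emotion_text_t2e("I miss my old friends so much."))
# e.g. {'Happy': 0.0, 'Angry': 0.0, 'Surprise': 0.0, 'Sad': 1.0, 'Fear': 0.0}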
detect_emotion_text_transformers Function
This function is similar to the previous one, but with two exciting differences:
- First, instead of using text2emotion, it leverages a pre-trained transformers model to detect emotions.
- Second, not only does it return the detected emotion, but it also provides a confidence score for each one. This way, you get a more detailed understanding of the emotional analysis.
# Function for text-based emotion detection using transformers
def detect_emotion_text_transformers(text):
    try:
        result = emotion_classifier(text)
        return result
    except Exception as e:
        print(f"Error in transformers text emotion detection: {e}")
        return []
Speech-to-text and Emotion Detection
Having covered emotion detection from text, let’s move on to speech. Detecting emotions from speech is a bit more complex and involves several steps:
- The detect_emotion_speech() function starts by converting the audio file to WAV format using AudioSegment. Then, it transcribes the speech to text with speech_recognition and prints the text for verification.
- After that, it uses a transformers model to detect emotions from the transcribed text.
- Finally, it cleans up the temporary WAV file. This function is also robust: it handles errors and prints any issues it encounters immediately.
# Function for speech-to-text and emotion detection
def detect_emotion_speech(audio_path):
    wav_path = "temp_audio.wav"
    try:
        # Convert audio to WAV in case it's not in WAV format
        sound = AudioSegment.from_file(audio_path)
        sound.export(wav_path, format="wav")
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
        # Perform speech recognition
        text = recognizer.recognize_google(audio)
        print(f"Transcribed Text: {text}")
        # Perform emotion detection on transcribed text
        result = emotion_classifier(text)
        return text, result
    except sr.UnknownValueError:
        return "Speech Recognition could not understand audio", None
    except sr.RequestError as e:
        return f"Could not obtain results from Google Speech Recognition service due to: {e}", None
    except Exception as e:
        print(f"Error in speech-to-text conversion: {e}")
        return None, None
    finally:
        # Clean up the temporary file even if an error occurred
        if os.path.exists(wav_path):
            os.remove(wav_path)
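You can already test this function on its own before wiring up the GUI. Here’s a minimal sketch, assuming an audio file named sample.mp3 sits in the working directory (the name is just a placeholder):
# Standalone test of the speech pipeline ("sample.mp3" is a placeholder)
text, emotions = detect_emotion_speech("sample.mp3")
if text is not None:
    print("Transcript:", text)
    print("Emotions:", emotions)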
Display the Results in the GUI
For this step, having made all the necessary arrangements to detect emotions from both text and audio, it’s time to orchestrate how the results will be displayed. This is the purpose of the display_results() function. Let’s see how this function works:
When triggered, the function initiates the progress bar, indicating that emotion detection has started. It then retrieves the user input, whether it is text or audio.
- For text input: The function retrieves the text from the entry box. If the box is empty, it prompts the user to input text and stops the progress bar. Otherwise, it starts the analysis in a new thread using the process_text_input function.
- For audio input: The function verifies the audio file path using the os module. If the path is valid, it starts a new thread using the process_speech_input function. If the path is invalid, it stops the progress bar and displays an error message.

Finally, the results are inserted into the output text widget once processing completes.
# Function to display results in the GUI
def display_results():
    progress_bar.start()
    input_type = var.get()
    if input_type == "text":
        text = input_text.get("1.0", END).strip()
        if not text:
            result_text = "Please enter some text."
            output_text.delete("1.0", END)
            output_text.insert(INSERT, result_text)
            progress_bar.stop()
        else:
            threading.Thread(target=process_text_input, args=(text,)).start()
    elif input_type == "speech":
        audio_path = input_text.get("1.0", END).strip()
        if not os.path.isfile(audio_path):
            result_text = "Invalid file path. Please select a valid audio file."
            output_text.delete("1.0", END)
            output_text.insert(INSERT, result_text)
            progress_bar.stop()
        else:
            threading.Thread(target=process_speech_input, args=(audio_path,)).start()
Process Text and Speech Inputs
If you’ve read this far, congratulations! You’ve reached the core of our program: the process_text_input() and process_speech_input() functions. This is where the magic happens:
process_text_input
The first function uses both the text2emotion library and the pre-trained transformers model to analyze the provided text for emotions. Once the analysis is complete, it formats the results into a readable string and inserts it into the output_text widget. The function also stops the progress bar to indicate that the process is complete.
def process_text_input(text):
    emotions_t2e = detect_emotion_text_t2e(text)
    emotions_transformers = detect_emotion_text_transformers(text)
    result_text = f"Emotions (Text - text2emotion):\n{emotions_t2e}\n\n"
    result_text += f"Emotions (Text - transformers):\n{emotions_transformers}\n"
    output_text.delete("1.0", END)
    output_text.insert(INSERT, result_text)
    progress_bar.stop()
process_speech_input
While the previous function is exclusive to text input, this one is dedicated to audio input. It calls the detect_emotion_speech() function to detect emotions from the audio after converting it to text. If both the text and emotions are None, an error occurred, so an error message is displayed. If successful, the transcribed text and detected emotions are formatted into a readable string and inserted into the output_text widget to display the results. The progress bar is also stopped, signifying the end of the process.
def process_speech_input(audio_path):
    text, emotions = detect_emotion_speech(audio_path)
    if text is None and emotions is None:
        result_text = "Error in processing the audio file."
    else:
        result_text = f"Transcribed Text:\n{text}\n\nEmotions (Speech):\n{emotions}\n"
    output_text.delete("1.0", END)
    output_text.insert(INSERT, result_text)
    progress_bar.stop()
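One caveat worth knowing: Tkinter widgets are not guaranteed to be thread-safe, and here the worker threads touch output_text and progress_bar directly. This usually works in practice, but a safer pattern, sketched below as an optional variant rather than part of the original code, is to do the heavy work in the thread and hand the widget updates back to the main loop with root.after():
# Optional thread-safe variant: compute in the worker thread,
# but schedule all widget updates on Tk's main loop
def process_text_input_safe(text):
    emotions_t2e = detect_emotion_text_t2e(text)
    emotions_transformers = detect_emotion_text_transformers(text)
    result_text = f"Emotions (Text - text2emotion):\n{emotions_t2e}\n\n"
    result_text += f"Emotions (Text - transformers):\n{emotions_transformers}\n"

    def update_gui():
        output_text.delete("1.0", END)
        output_text.insert(INSERT, result_text)
        progress_bar.stop()

    root.after(0, update_gui)  # update_gui runs on the main thread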
Browse for a Speech File
As we have emphasized, our goal is to make our programs user-friendly. That’s why we created the browse_file() function to act as a pathfinder for the audio file. It uses filedialog to let the user select the audio file they want.
# Function to browse for a speech file
def browse_file():
    file_path = filedialog.askopenfilename(filetypes=[("Audio Files", "*.mp3 *.wav *.ogg *.flac *.aac")])
    if file_path:
        input_text.delete("1.0", END)
        input_text.insert(INSERT, file_path)
Creating the GUI
Well, guys, we’ve reached the final part of our code. This is where we bring together all the elements we’ve created to form an engaging and functional user interface. First, we create the main window and set its title.
Next, we add a label prompting the user to enter the text or audio path, and a text widget for the input, complete with a scrollbar. We create a StringVar to store the selected input type, defaulting to text, along with radio buttons for input type selection.
Then, we add a “Browse” button that calls the browse_file() function when clicked, and a “Detect Emotion” button that calls the display_results() function. We also create an output label and an output_text widget to display results, again with a scrollbar.
Finally, we include a horizontal progress bar to indicate that the process is ongoing. We close our code with mainloop(), which starts the main event loop and keeps the main window responsive to the user.
# Create the main window
root = Tk()
root.title("Emotion Detection from Text and Speech - The Pycodes")
# Create and place widgets
input_label = Label(root, text="Please Enter a Text or an Audio File Path:")
input_label.pack()
input_text = scrolledtext.ScrolledText(root, wrap=WORD, width=100, height=10)
input_text.pack(padx=10, pady=10)
var = StringVar(value="text")
text_radio = Radiobutton(root, text="Text", variable=var, value="text")
text_radio.pack()
speech_radio = Radiobutton(root, text="Speech", variable=var, value="speech")
speech_radio.pack()
browse_button = Button(root, text="Browse", command=browse_file)
browse_button.pack(pady=10)
process_button = Button(root, text="Detect Emotion", command=display_results)
process_button.pack(pady=10)
output_label = Label(root, text="Output:")
output_label.pack()
output_text = scrolledtext.ScrolledText(root, wrap=WORD, width=100, height=20)
output_text.pack(padx=10, pady=10)
progress_bar = ttk.Progressbar(root, orient=HORIZONTAL, length=380, mode='indeterminate')
progress_bar.place(x=3, y=280)
# Run the GUI event loop
root.mainloop()
Example
I ran this script on a Windows system and typed a sad phrase, as shown in the image below:
Next, I detected emotions from a speech by browsing for an MP3 file of Rocky Balboa’s motivational speech, which is full of intense emotions, as shown in the image below:
Also, I ran this code on a Linux system. This time, I tried a happy speech:
Full Code
import text2emotion as te
from transformers import pipeline
from tkinter import *
from tkinter import scrolledtext, filedialog, ttk
from pydub import AudioSegment
import os
import speech_recognition as sr
import threading
# Ensure PyTorch is installed
try:
    import torch
except ImportError:
    print("Installing PyTorch...")
    os.system('pip install torch')
    import torch  # Import it now that the install has finished
# Load the emotion classifier
emotion_classifier = pipeline('sentiment-analysis', model='j-hartmann/emotion-english-distilroberta-base')
# Function for text-based emotion detection using text2emotion
def detect_emotion_text_t2e(text):
    try:
        emotions = te.get_emotion(text)
        return emotions
    except Exception as e:
        print(f"Error in text2emotion text emotion detection: {e}")
        return {}

# Function for text-based emotion detection using transformers
def detect_emotion_text_transformers(text):
    try:
        result = emotion_classifier(text)
        return result
    except Exception as e:
        print(f"Error in transformers text emotion detection: {e}")
        return []

# Function for speech-to-text and emotion detection
def detect_emotion_speech(audio_path):
    wav_path = "temp_audio.wav"
    try:
        # Convert audio to WAV in case it's not in WAV format
        sound = AudioSegment.from_file(audio_path)
        sound.export(wav_path, format="wav")
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
        # Perform speech recognition
        text = recognizer.recognize_google(audio)
        print(f"Transcribed Text: {text}")
        # Perform emotion detection on transcribed text
        result = emotion_classifier(text)
        return text, result
    except sr.UnknownValueError:
        return "Speech Recognition could not understand audio", None
    except sr.RequestError as e:
        return f"Could not obtain results from Google Speech Recognition service due to: {e}", None
    except Exception as e:
        print(f"Error in speech-to-text conversion: {e}")
        return None, None
    finally:
        # Clean up the temporary file even if an error occurred
        if os.path.exists(wav_path):
            os.remove(wav_path)
# Function to display results in the GUI
def display_results():
    progress_bar.start()
    input_type = var.get()
    if input_type == "text":
        text = input_text.get("1.0", END).strip()
        if not text:
            result_text = "Please enter some text."
            output_text.delete("1.0", END)
            output_text.insert(INSERT, result_text)
            progress_bar.stop()
        else:
            threading.Thread(target=process_text_input, args=(text,)).start()
    elif input_type == "speech":
        audio_path = input_text.get("1.0", END).strip()
        if not os.path.isfile(audio_path):
            result_text = "Invalid file path. Please select a valid audio file."
            output_text.delete("1.0", END)
            output_text.insert(INSERT, result_text)
            progress_bar.stop()
        else:
            threading.Thread(target=process_speech_input, args=(audio_path,)).start()

def process_text_input(text):
    emotions_t2e = detect_emotion_text_t2e(text)
    emotions_transformers = detect_emotion_text_transformers(text)
    result_text = f"Emotions (Text - text2emotion):\n{emotions_t2e}\n\n"
    result_text += f"Emotions (Text - transformers):\n{emotions_transformers}\n"
    output_text.delete("1.0", END)
    output_text.insert(INSERT, result_text)
    progress_bar.stop()

def process_speech_input(audio_path):
    text, emotions = detect_emotion_speech(audio_path)
    if text is None and emotions is None:
        result_text = "Error in processing the audio file."
    else:
        result_text = f"Transcribed Text:\n{text}\n\nEmotions (Speech):\n{emotions}\n"
    output_text.delete("1.0", END)
    output_text.insert(INSERT, result_text)
    progress_bar.stop()

# Function to browse for a speech file
def browse_file():
    file_path = filedialog.askopenfilename(filetypes=[("Audio Files", "*.mp3 *.wav *.ogg *.flac *.aac")])
    if file_path:
        input_text.delete("1.0", END)
        input_text.insert(INSERT, file_path)
# Create the main window
root = Tk()
root.title("Emotion Detection from Text and Speech - The Pycodes")
# Create and place widgets
input_label = Label(root, text="Please Enter a Text or an Audio File Path:")
input_label.pack()
input_text = scrolledtext.ScrolledText(root, wrap=WORD, width=100, height=10)
input_text.pack(padx=10, pady=10)
var = StringVar(value="text")
text_radio = Radiobutton(root, text="Text", variable=var, value="text")
text_radio.pack()
speech_radio = Radiobutton(root, text="Speech", variable=var, value="speech")
speech_radio.pack()
browse_button = Button(root, text="Browse", command=browse_file)
browse_button.pack(pady=10)
process_button = Button(root, text="Detect Emotion", command=display_results)
process_button.pack(pady=10)
output_label = Label(root, text="Output:")
output_label.pack()
output_text = scrolledtext.ScrolledText(root, wrap=WORD, width=100, height=20)
output_text.pack(padx=10, pady=10)
progress_bar = ttk.Progressbar(root, orient=HORIZONTAL, length=380, mode='indeterminate')
progress_bar.place(x=3, y=280)
# Run the GUI event loop
root.mainloop()
Happy Coding!