
How to Detect Emotions from Text and Speech in Python

Artificial intelligence is continuously expanding the horizons of what’s possible, and one of the most fascinating advancements is emotion detection. Imagine a computer that can understand your feelings through the words you speak or type. By recognizing human emotions from text and speech, AI systems can interact with us in more personalized and empathetic ways. Whether it’s improving customer service by understanding how you feel or providing better mental health support through sentiment analysis, the potential applications are both vast and transformative.

As AI technology continues to advance, incorporating emotion detection into various applications is becoming not only possible but also increasingly beneficial.

In today’s article, we’ll explore how to detect emotions from text and speech in Python. We’ll leverage powerful tools such as transformers, text2emotion, and speech recognition to demonstrate how to build a robust emotion detection system. This tutorial is perfect for AI enthusiasts and Python developers looking to enhance their projects with emotion analysis capabilities. Whether you’re interested in text emotion analysis, speech emotion analysis, or integrating machine learning into your applications, this guide has you covered.

Let’s get started!


Getting Started

First, let’s get everything set up. Install the required libraries by running these commands in your terminal or command prompt:

$ pip install transformers torch
$ pip install text2emotion
$ pip install SpeechRecognition
$ pip install pydub
$ pip install tensorflow

Note that tkinter ships with most Python installations, so there’s nothing to pip-install for it; if it’s missing on Linux, install it through your system package manager (for example, sudo apt-get install python3-tk).
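Heads-up: at the time of writing, text2emotion depends on an older emoji API. If importing it raises an AttributeError mentioning emoji.UNICODE_EMOJI, pinning an earlier emoji release is a common workaround:

$ pip install "emoji<1.0"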

Requirements for Speech Processing

Before diving into speech emotion detection, ensure you have FFmpeg installed on your system. Here’s how to do it for different operating systems:

For Linux:

Open your terminal and run the following command:

sudo apt-get install ffmpeg

For macOS:

Use Homebrew to install FFmpeg by running:

brew install ffmpeg

For Windows:

  • Download a release build of FFmpeg from the FFmpeg website (e.g., ffmpeg-release-essentials.zip) and extract it.
  • Add the FFmpeg bin directory (e.g., C:\ffmpeg\bin) to your system’s PATH environment variable.

Note: pydub relies on FFmpeg under the hood to decode formats like MP3, so installing it ensures that speech processing and emotion detection work smoothly no matter the audio file format or operating system you are using. Adding the bin directory to your PATH ensures that your system can find and run FFmpeg from any command prompt.
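To confirm everything is wired up, open a new terminal and run:

ffmpeg -version

If the command prints version information, pydub will be able to find and use FFmpeg.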

Imports

Now, as with any journey, we need to gear up with the appropriate tools before setting off. Let’s see which libraries and modules we will import:

  • text2emotion: This library helps us detect emotions from text and gain insights into the sentiment behind words.
  • transformers: Uses a pre-trained model for advanced emotion detection, making sentiment analysis smarter and more accurate.
  • tkinter: Helps create a graphical user interface filled with GUI elements to make our program user-friendly.
  • pydub: Simplifies working with audio files, enabling easy management of audio formats and conversions.
  • os: Interacts with the operating system.
  • speech_recognition: A library that converts audio files to text with ease, unlocking new ways to process and analyze spoken content.
  • threading: Runs tasks in the background to keep the main window responsive.
import text2emotion as te
from transformers import pipeline
from tkinter import *
from tkinter import scrolledtext, filedialog, ttk
from pydub import AudioSegment
import os
import speech_recognition as sr
import threading

We also check that torch is installed, since transformers requires it as a backend. If it is missing, we install it with pip and then import it.

# Ensure PyTorch is installed (transformers needs it as a backend)
try:
   import torch
except ImportError:
   import subprocess
   import sys
   print("Installing PyTorch...")
   subprocess.check_call([sys.executable, "-m", "pip", "install", "torch"])
   import torch

Load the emotion classifier

Next, let’s bring our advanced emotion-detection robot to life! How do we do that? It’s simple. We’ll load a pre-trained model from the transformers library that’s fine-tuned specifically for emotion detection. In this code, we’re using the j-hartmann/emotion-english-distilroberta-base model, which classifies English text into seven emotions: anger, disgust, fear, joy, neutral, sadness, and surprise.

# Load the emotion classifier
emotion_classifier = pipeline('sentiment-analysis', model='j-hartmann/emotion-english-distilroberta-base')
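Before wiring the classifier into a GUI, it’s worth a quick sanity check on its own. A minimal sketch (the sentence and score below are illustrative, and the first run will download the model):

# Quick sanity check — exact scores will vary, but the output shape won't
print(emotion_classifier("I can't believe we finally won the match!"))
# e.g. [{'label': 'joy', 'score': 0.97}]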

Functions for Detecting Emotions from Text

Now that we have all the tools necessary to build our emotion-detecting machines, let’s dive in:

detect_emotion_text_t2e Function

With this function, we will use the text2emotion library to detect emotions from the text it processes. The function returns the detected emotions as a dictionary. If it fails, it returns an empty dictionary and prints an error message.

# Function for text-based emotion detection using text2emotion
def detect_emotion_text_t2e(text):
   try:
       emotions = te.get_emotion(text)
       return emotions
   except Exception as e:
       print(f"Error in text2emotion text emotion detection: {e}")
       return {}
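As a quick illustration of what this returns, text2emotion scores the text across its five emotion categories (Happy, Angry, Surprise, Sad, Fear). The scores below are illustrative, not exact:

# Illustrative run — a score for each of text2emotion's five categories
print(detect_emotion_text_t2e("I miss my old friends so much."))
# e.g. {'Happy': 0.0, 'Angry': 0.0, 'Surprise': 0.0, 'Sad': 1.0, 'Fear': 0.0}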

detect_emotion_text_transformers Function

This function is similar to the previous one, but with two exciting differences:

  • First, instead of using text2emotion, it leverages a pre-trained transformers model to detect emotions.
  • Second, not only does it return the detected emotion, but it also provides a confidence score for each one. This way, you get a more detailed understanding of the emotional analysis.
# Function for text-based emotion detection using transformers
def detect_emotion_text_transformers(text):
   try:
       result = emotion_classifier(text)
       return result
   except Exception as e:
       print(f"Error in transformers text emotion detection: {e}")
       return []
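By default, the pipeline returns only the single highest-scoring label. If you’d rather see the full distribution over all seven emotions, recent versions of transformers accept a top_k parameter at call time (older versions used return_all_scores=True instead). Here is a sketch of a hypothetical variant:

# Hypothetical variant that returns a score for every emotion label
def detect_all_emotions_transformers(text):
   try:
       # top_k=None asks the pipeline to return all labels, not just the top one
       return emotion_classifier(text, top_k=None)
   except Exception as e:
       print(f"Error in transformers text emotion detection: {e}")
       return []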

Speech-to-text and Emotion Detection

Having covered emotion detection from text, let’s move on to speech. Detecting emotions from speech is a bit more complex and involves several steps:

  • The detect_emotion_speech() function starts by converting the audio file to WAV format using AudioSegment. Then, it transcribes the speech to text with speech_recognition and prints the text for verification.
  • After that, it uses a transformers model to detect emotions from the transcribed text.
  • Finally, it cleans up the temporary WAV file. The function also handles failures gracefully: if the speech can’t be understood or the recognition service can’t be reached, it returns an explanatory message instead of crashing.
# Function for speech-to-text and emotion detection
def detect_emotion_speech(audio_path):
   try:
       # Convert audio to WAV in case it's not in WAV format
       sound = AudioSegment.from_file(audio_path)
       wav_path = "temp_audio.wav"
       sound.export(wav_path, format="wav")


       recognizer = sr.Recognizer()
       with sr.AudioFile(wav_path) as source:
           audio = recognizer.record(source)


       # Perform speech recognition
       text = recognizer.recognize_google(audio)
       print(f"Transcribed Text: {text}")


       # Perform emotion detection on transcribed text
       result = emotion_classifier(text)
       os.remove(wav_path)  # Clean up temporary file
       return text, result
   except sr.UnknownValueError:
       return "Speech Recognition could not understand audio", None
   except sr.RequestError as e:
       return f"Could not obtain results from Google Speech Recognition service due to: {e}", None
   except Exception as e:
       print(f"Error in speech-to-text conversion: {e}")
       return None, None
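Keep in mind that recognize_google() sends the audio to Google’s free Web Speech API, so this step requires an internet connection. Here’s a minimal headless test of the function, assuming an audio file named sample.mp3 sits next to the script (the filename is purely illustrative):

# Hypothetical quick test — point it at any real audio file on your machine
text, emotions = detect_emotion_speech("sample.mp3")
print("Transcript:", text)
print("Emotions:", emotions)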

Display the Results in the GUI

For this step, having made all the necessary arrangements to detect emotions from both text and audio, it’s time to orchestrate how the results will be displayed. This is the purpose of the display_results() function. Let’s see how this function works:

When triggered, the function initiates the progress bar, indicating that emotion detection has started. It then retrieves the user input, whether it is text or audio.

  • For text input: The function retrieves the text from the entry box. If the box is empty, it asks the user to enter some text and stops the progress bar. Otherwise, it hands the text to the process_text_input function in a new thread so the window stays responsive.
  • For audio input: The function verifies the audio file path using the os module. If the path is valid, it starts a new thread using the process_speech_input function. If the path is invalid, it stops the progress bar and displays an error message.

In either case, error messages go straight into the output text widget, while the actual results are inserted by the worker functions once they finish.

# Function to display results in the GUI
def display_results():
   progress_bar.start()
   input_type = var.get()
   if input_type == "text":
       text = input_text.get("1.0", END).strip()
       if not text:
           result_text = "Please enter some text."
           output_text.delete("1.0", END)
           output_text.insert(INSERT, result_text)
           progress_bar.stop()
       else:
           threading.Thread(target=process_text_input, args=(text,)).start()
   elif input_type == "speech":
       audio_path = input_text.get("1.0", END).strip()
       if not os.path.isfile(audio_path):
           result_text = "Invalid file path. Please select a valid audio file."
           output_text.delete("1.0", END)
           output_text.insert(INSERT, result_text)
           progress_bar.stop()
       else:
           threading.Thread(target=process_speech_input, args=(audio_path,)).start()
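One small, optional hardening: marking the worker threads as daemon threads means they won’t keep the process alive if the user closes the window mid-analysis. The change is a single keyword argument, shown here for the text branch:

# Optional tweak: daemon=True lets the program exit even if a worker is still running
threading.Thread(target=process_text_input, args=(text,), daemon=True).start()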

Process Text and Speech Inputs

If you’ve read this far, congratulations! You’ve reached the core of our program: the process_text_input() and process_speech_input() functions. This is where the magic happens:

process_text_input

The first function uses both the text2emotion library and the pre-trained transformers model to analyze the provided text for emotions. Once the analysis is complete, it formats the results into a readable string and inserts it into the output_text widget. The function also stops the progress bar to display the result and indicate that the process is complete.

def process_text_input(text):
   emotions_t2e = detect_emotion_text_t2e(text)
   emotions_transformers = detect_emotion_text_transformers(text)
   result_text = f"Emotions (Text - text2emotion):\n{emotions_t2e}\n\n"
   result_text += f"Emotions (Text - transformers):\n{emotions_transformers}\n"
   output_text.delete("1.0", END)
   output_text.insert(INSERT, result_text)
   progress_bar.stop()

process_speech_input

While the previous function is exclusive to text input, this one is dedicated to audio input. It calls the detect_emotion_speech() function to detect emotions from the audio after converting it to text. If both the text and emotion are None, it means an error occurred, so an error message is displayed. If successful, the transcribed text and detected emotion are formatted into a readable string and inserted into the output_text widget to display the results. The progress bar is also stopped, signifying the end of the process.

def process_speech_input(audio_path):
   text, emotions = detect_emotion_speech(audio_path)
   if text is None and emotions is None:
       result_text = "Error in processing the audio file."
   else:
       result_text = f"Transcribed Text:\n{text}\n\nEmotions (Speech):\n{emotions}\n"
   output_text.delete("1.0", END)
   output_text.insert(INSERT, result_text)
   progress_bar.stop()
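A caveat worth knowing before we move on: Tkinter widgets aren’t guaranteed to be thread-safe, so touching output_text and progress_bar directly from a worker thread can misbehave on some platforms. It works well enough for a simple tool like this, but a more defensive pattern is to do the heavy lifting in the thread and hand the widget updates back to the main loop with root.after(). Here is a sketch of process_text_input rewritten that way, under the same assumptions as the original:

def process_text_input(text):
   emotions_t2e = detect_emotion_text_t2e(text)
   emotions_transformers = detect_emotion_text_transformers(text)
   result_text = f"Emotions (Text - text2emotion):\n{emotions_t2e}\n\n"
   result_text += f"Emotions (Text - transformers):\n{emotions_transformers}\n"

   def update_ui():
       output_text.delete("1.0", END)
       output_text.insert(INSERT, result_text)
       progress_bar.stop()

   root.after(0, update_ui)  # schedule the widget updates on the main thread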

Browse for a Speech File

As we have emphasized, our goal is to make our programs user-friendly. That’s why we created the browse_file() function to act as a pathfinder for the audio file. It uses filedialog to allow the user to select the audio file they want.

# Function to browse for a speech file
def browse_file():
   file_path = filedialog.askopenfilename(filetypes=[("Audio Files", "*.mp3 *.wav *.ogg *.flac *.aac")])
   if file_path:
       input_text.delete("1.0", END)
       input_text.insert(INSERT, file_path)

Creating the GUI

Well, guys, we’ve reached the final part of our code. This is where we bring together all the elements we’ve created to form an engaging and functional user interface. First, we create the main window and set its title.

Next, we add a label prompting the user to enter the text or audio path and a text widget for the input, complete with a scrollbar. We create a StringVar to store the selected input type, defaulting to text, along with radio buttons for input type selection.

Then, we add a “Browse” button that calls the browse_file() function when clicked, and a “Detect Emotion” button that calls the display_results() function. We also create an output label and an output_text widget to display results, ensuring it has a scrollbar.

Finally, we include a horizontal progress bar to indicate that the process is ongoing. We close our code with mainloop(), which starts the main event loop and ensures the main window remains responsive to the user.

# Create the main window
root = Tk()
root.title("Emotion Detection from Text and Speech - The Pycodes")


# Create and place widgets
input_label = Label(root, text="Please Enter a Text or an Audio File Path:")
input_label.pack()


input_text = scrolledtext.ScrolledText(root, wrap=WORD, width=100, height=10)
input_text.pack(padx=10, pady=10)


var = StringVar(value="text")
text_radio = Radiobutton(root, text="Text", variable=var, value="text")
text_radio.pack()
speech_radio = Radiobutton(root, text="Speech", variable=var, value="speech")
speech_radio.pack()


browse_button = Button(root, text="Browse", command=browse_file)
browse_button.pack(pady=10)


process_button = Button(root, text="Detect Emotion", command=display_results)
process_button.pack(pady=10)


output_label = Label(root, text="Output:")
output_label.pack()


output_text = scrolledtext.ScrolledText(root, wrap=WORD, width=100, height=20)
output_text.pack(padx=10, pady=10)


progress_bar = ttk.Progressbar(root, orient=HORIZONTAL, length=380, mode='indeterminate')
progress_bar.pack(pady=10)


# Run the GUI event loop
root.mainloop()

Example

I first ran this script on a Windows system and typed a sad phrase into the text box to test text input.

Next, I detected emotions from speech by browsing for an MP3 file of Rocky Balboa’s motivational speech, which is full of intense emotions.

I also ran the code on a Linux system, this time trying a happy speech sample.

Full Code

import text2emotion as te
from transformers import pipeline
from tkinter import *
from tkinter import scrolledtext, filedialog, ttk
from pydub import AudioSegment
import os
import speech_recognition as sr
import threading


# Ensure PyTorch is installed (transformers needs it as a backend)
try:
   import torch
except ImportError:
   import subprocess
   import sys
   print("Installing PyTorch...")
   subprocess.check_call([sys.executable, "-m", "pip", "install", "torch"])
   import torch


# Load the emotion classifier
emotion_classifier = pipeline('sentiment-analysis', model='j-hartmann/emotion-english-distilroberta-base')


# Function for text-based emotion detection using text2emotion
def detect_emotion_text_t2e(text):
   try:
       emotions = te.get_emotion(text)
       return emotions
   except Exception as e:
       print(f"Error in text2emotion text emotion detection: {e}")
       return {}


# Function for text-based emotion detection using transformers
def detect_emotion_text_transformers(text):
   try:
       result = emotion_classifier(text)
       return result
   except Exception as e:
       print(f"Error in transformers text emotion detection: {e}")
       return []


# Function for speech-to-text and emotion detection
def detect_emotion_speech(audio_path):
   try:
       # Convert audio to WAV in case it's not in WAV format
       sound = AudioSegment.from_file(audio_path)
       wav_path = "temp_audio.wav"
       sound.export(wav_path, format="wav")


       recognizer = sr.Recognizer()
       with sr.AudioFile(wav_path) as source:
           audio = recognizer.record(source)


       # Perform speech recognition
       text = recognizer.recognize_google(audio)
       print(f"Transcribed Text: {text}")


       # Perform emotion detection on transcribed text
       result = emotion_classifier(text)
       os.remove(wav_path)  # Clean up temporary file
       return text, result
   except sr.UnknownValueError:
       return "Speech Recognition could not understand audio", None
   except sr.RequestError as e:
       return f"Could not obtain results from Google Speech Recognition service due to: {e}", None
   except Exception as e:
       print(f"Error in speech-to-text conversion: {e}")
       return None, None


# Function to display results in the GUI
def display_results():
   progress_bar.start()
   input_type = var.get()
   if input_type == "text":
       text = input_text.get("1.0", END).strip()
       if not text:
           result_text = "Please enter some text."
           output_text.delete("1.0", END)
           output_text.insert(INSERT, result_text)
           progress_bar.stop()
       else:
           threading.Thread(target=process_text_input, args=(text,)).start()
   elif input_type == "speech":
       audio_path = input_text.get("1.0", END).strip()
       if not os.path.isfile(audio_path):
           result_text = "Invalid file path. Please select a valid audio file."
           output_text.delete("1.0", END)
           output_text.insert(INSERT, result_text)
           progress_bar.stop()
       else:
           threading.Thread(target=process_speech_input, args=(audio_path,)).start()


def process_text_input(text):
   emotions_t2e = detect_emotion_text_t2e(text)
   emotions_transformers = detect_emotion_text_transformers(text)
   result_text = f"Emotions (Text - text2emotion):\n{emotions_t2e}\n\n"
   result_text += f"Emotions (Text - transformers):\n{emotions_transformers}\n"
   output_text.delete("1.0", END)
   output_text.insert(INSERT, result_text)
   progress_bar.stop()


def process_speech_input(audio_path):
   text, emotions = detect_emotion_speech(audio_path)
   if text is None and emotions is None:
       result_text = "Error in processing the audio file."
   else:
       result_text = f"Transcribed Text:\n{text}\n\nEmotions (Speech):\n{emotions}\n"
   output_text.delete("1.0", END)
   output_text.insert(INSERT, result_text)
   progress_bar.stop()


# Function to browse for a speech file
def browse_file():
   file_path = filedialog.askopenfilename(filetypes=[("Audio Files", "*.mp3 *.wav *.ogg *.flac *.aac")])
   if file_path:
       input_text.delete("1.0", END)
       input_text.insert(INSERT, file_path)


# Create the main window
root = Tk()
root.title("Emotion Detection from Text and Speech - The Pycodes")


# Create and place widgets
input_label = Label(root, text="Please Enter a Text or an Audio File Path:")
input_label.pack()


input_text = scrolledtext.ScrolledText(root, wrap=WORD, width=100, height=10)
input_text.pack(padx=10, pady=10)


var = StringVar(value="text")
text_radio = Radiobutton(root, text="Text", variable=var, value="text")
text_radio.pack()
speech_radio = Radiobutton(root, text="Speech", variable=var, value="speech")
speech_radio.pack()


browse_button = Button(root, text="Browse", command=browse_file)
browse_button.pack(pady=10)


process_button = Button(root, text="Detect Emotion", command=display_results)
process_button.pack(pady=10)


output_label = Label(root, text="Output:")
output_label.pack()


output_text = scrolledtext.ScrolledText(root, wrap=WORD, width=100, height=20)
output_text.pack(padx=10, pady=10)


progress_bar = ttk.Progressbar(root, orient=HORIZONTAL, length=380, mode='indeterminate')
progress_bar.pack(pady=10)


# Run the GUI event loop
root.mainloop()

Happy Coding!
