How to Build a Speech Recognition System in Python

Hi there! Last time, we had fun turning text into speech with Python, making our computers talk. Today, we’re going to create a Speech Recognition system using Python to make our conversation with computers even better. We’ll teach them to listen to us and then talk back, just like a friend. No need for typing; just speak, and the computer will write it down and read it back to you.

Let’s get started!

Necessary Libraries

Let’s get everything set up before we jump into the coding part. Install the SpeechRecognition and pyttsx3 libraries from the terminal or your command prompt so the code can run properly. Capturing audio from the microphone with SpeechRecognition also requires PyAudio:

$ pip install SpeechRecognition
$ pip install pyttsx3
$ pip install PyAudio
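
If you want to confirm the installation before running anything, a small check like the one below can help. It is only a sketch: missing_packages is a hypothetical helper name, and it assumes the import names speech_recognition, pyttsx3, and pyaudio (PyAudio is the package SpeechRecognition uses for microphone access).

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Import names (not pip names) of the packages this tutorial relies on.
required = ["speech_recognition", "pyttsx3", "pyaudio"]
missing = missing_packages(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

If anything is listed as missing, rerun the matching pip command before continuing.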

Imports

First, we import the libraries we need. The code has two parts:

  • The first part recognizes speech from the audio input. For this, we import the speech_recognition library.
  • The second part converts text into speech. For this, we import the pyttsx3 library.

import speech_recognition as sr
import pyttsx3

Initializing the Recognizer and Text-to-Speech Engine

# Initialize the recognizer and text-to-speech engine
recognizer = sr.Recognizer()
engine = pyttsx3.init()

After importing our libraries, we need to initialize two objects:

  • The first one is recognizer, from the speech_recognition library, which we will use to recognize the user’s speech.
  • The second one is engine, from the pyttsx3 library, which we will use to convert text to speech.
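
The engine can also be tuned before it speaks. The sketch below assumes the pyttsx3 property names rate (words per minute) and volume (0.0 to 1.0). The helper configure_engine and its default values are illustrative and not part of the original code.

```python
def configure_engine(engine, rate=150, volume=0.9):
    """Set the speaking rate and volume on a pyttsx3-style engine."""
    # Clamp volume into the 0.0-1.0 range that pyttsx3 expects.
    volume = max(0.0, min(1.0, volume))
    engine.setProperty("rate", rate)      # speed in words per minute
    engine.setProperty("volume", volume)  # loudness from 0.0 to 1.0
    return engine
```

With the engine created above, you would call configure_engine(engine) once, right after pyttsx3.init().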

Defining the Speak Function

Next, we define a function that speaks a given piece of text aloud. We will use it to read the recognized text back to the user.

def speak(text):
   """Converts text to speech."""
   engine.say(text)
   engine.runAndWait()

recognize_speech_from_mic Function

def recognize_speech_from_mic(recognizer):
   """Captures and recognizes speech."""
   mic = sr.Microphone()
   with mic as source:
       recognizer.adjust_for_ambient_noise(source, duration=0.2)
       audio = recognizer.listen(source)
       try:
           return {"success": True, "error": None,
                   "transcription": recognizer.recognize_google(audio, language='en').lower()}  # change 'en' to the language code you want
       except sr.UnknownValueError:
           return {"success": False, "error": "Could not understand audio", "transcription": None}
       except sr.RequestError as e:
           return {"success": False, "error": f"Could not request results; {e}", "transcription": None}

After that, we create a function that does the following:

  • First, it sets up the microphone so it is ready to listen to our voice.
  • Second, it samples the background noise for 0.2 seconds so it can hear us better.
  • Third, it listens until we finish speaking a phrase.
  • Fourth, it sends the audio to Google’s speech recognition service, which attempts to understand what was said.
  • Last but not least, it returns a dictionary that tells us whether our speech was understood or not.
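
By default, listen() waits until it hears a phrase, which can feel like the program has frozen in a quiet room. As a hedged variation, the sketch below passes the timeout and phrase_time_limit arguments of listen(). The helper name capture_audio and the chosen values are assumptions for illustration. When no speech starts within timeout seconds, SpeechRecognition raises sr.WaitTimeoutError, which the caller would need to handle.

```python
def capture_audio(recognizer, source, timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then record a single phrase.

    `recognizer` is expected to be an sr.Recognizer and `source` an open
    sr.Microphone (that is, one used inside a `with` block).
    """
    # Sample the room briefly so background noise is not mistaken for speech.
    recognizer.adjust_for_ambient_noise(source, duration=0.2)
    # Wait at most `timeout` seconds for speech to start, and stop recording
    # after `phrase_time_limit` seconds even if the speaker keeps talking.
    return recognizer.listen(
        source, timeout=timeout, phrase_time_limit=phrase_time_limit
    )
```

Inside recognize_speech_from_mic(), this would stand in for the two calls at the top of the with block, wrapped in a try/except for sr.WaitTimeoutError.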

Defining the Main Function

Once the previous function recognizes our speech, the main function prints the transcription and reads it back with the speak() function. It repeats this in a loop until the user exits with Ctrl+C.

However, if recognize_speech_from_mic() reports an error, the main function prints the error message. When the speech could not be understood, it prints “Could not understand audio” and says “Sorry, I didn’t catch that. Can you please repeat?”. For any other error, it speaks the error message itself.

def main():
   try:
       while True:
           print("Listening...")
           result = recognize_speech_from_mic(recognizer)


           if result["success"]:
               text = result["transcription"]
               print(f"Recognized: {text}")
               speak(f"You said: {text}")
           else:
               error = result["error"]
               print(error)
               speak(
                   error if "Could not understand audio" not in error else "Sorry, I didn't catch that. Can you please repeat?")
   except KeyboardInterrupt:
       print("Exiting...")
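
The conditional expression passed to speak() packs the choice of reply into a single line. One way to make it easier to read, shown here only as a sketch, is to move that choice into a small helper. The name reply_for is hypothetical; the dictionary keys and messages are the ones used above.

```python
def reply_for(result):
    """Pick the sentence to speak for a recognition result dictionary."""
    if result["success"]:
        return f"You said: {result['transcription']}"
    if "Could not understand audio" in result["error"]:
        return "Sorry, I didn't catch that. Can you please repeat?"
    # Any other failure (for example a network problem) is spoken as-is.
    return result["error"]
```

With a helper like this, main() could call speak(reply_for(result)) in both branches and keep the print() calls as they are.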

Main Block

Finally, this part makes sure that main() runs only when the script is executed directly. If the file is imported as a module, main() is not called automatically.

if __name__ == "__main__":
   main()

Full Code

import speech_recognition as sr
import pyttsx3


# Initialize the recognizer and text-to-speech engine
recognizer = sr.Recognizer()
engine = pyttsx3.init()


def speak(text):
   """Converts text to speech."""
   engine.say(text)
   engine.runAndWait()


def recognize_speech_from_mic(recognizer):
   """Captures and recognizes speech."""
   mic = sr.Microphone()
   with mic as source:
       recognizer.adjust_for_ambient_noise(source, duration=0.2)
       audio = recognizer.listen(source)
       try:
           return {"success": True, "error": None,
                   "transcription": recognizer.recognize_google(audio, language='en').lower()}  # change 'en' to the language code you want
       except sr.UnknownValueError:
           return {"success": False, "error": "Could not understand audio", "transcription": None}
       except sr.RequestError as e:
           return {"success": False, "error": f"Could not request results; {e}", "transcription": None}



def main():
   try:
       while True:
           print("Listening...")
           result = recognize_speech_from_mic(recognizer)


           if result["success"]:
               text = result["transcription"]
               print(f"Recognized: {text}")
               speak(f"You said: {text}")
           else:
               error = result["error"]
               print(error)
               speak(
                   error if "Could not understand audio" not in error else "Sorry, I didn't catch that. Can you please repeat?")
   except KeyboardInterrupt:
       print("Exiting...")


if __name__ == "__main__":
   main()

Happy Coding!
