Table of Contents
- Necessary Libraries
- Imports
- Initializing recognizer and text-to-speech engine
- Defining the Speak Function
- recognize_speech_from_mic Function
- Defining the Main Function
- Main Block
- Example
- Full Code
Hi there! Last time, we had fun turning text into speech with Python, making our computers talk. Today, we’re going to create a Speech Recognition system using Python to make our conversation with computers even better. We’ll teach them to listen to us and then talk back, just like a friend. No need for typing; just speak, and the computer will write it down and read it back to you.
Let’s get Started!
Necessary Libraries
Before we jump into the coding part, let’s get everything set up. Install the SpeechRecognition and pyttsx3 libraries from your terminal or command prompt so the code can run. Microphone input through SpeechRecognition also requires PyAudio, so install that as well:
$ pip install SpeechRecognition
$ pip install pyttsx3
$ pip install PyAudio
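If you are not sure whether the installation worked, a quick check like the one below can help. This is only a sketch: the helper name missing_packages is made up for this example, and PyAudio is included in the list on the assumption that you will use microphone input, which SpeechRecognition handles through PyAudio.

```python
import importlib.util

# Import names differ from the pip package names, so map one to the other.
# PyAudio is listed as an assumption: it is needed for microphone input.
REQUIRED = {
    "speech_recognition": "SpeechRecognition",
    "pyttsx3": "pyttsx3",
    "pyaudio": "PyAudio",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of the packages that cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

The check only looks the modules up; it does not import them, so it will not open the microphone or the speech engine.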
Imports
First, we start by importing the necessary libraries. The code has two parts:
- The first part recognizes speech from the audio input. For this we import the speech_recognition library.
- The second part converts text into speech. For this we import the pyttsx3 library.
import speech_recognition as sr
import pyttsx3
Initializing recognizer and text-to-speech engine
# Initialize the recognizer and text-to-speech engine
recognizer = sr.Recognizer()
engine = pyttsx3.init()
After importing our libraries, we need to initialize two objects:
- The first one is recognizer, from the speech_recognition library, which we will use to recognize the user's speech.
- The second one is engine, from the pyttsx3 library, which we will use to convert text to speech.
Defining the Speak Function
Next, we define a function that speaks a given piece of text aloud. We will use it to read the recognized text back to the user.
def speak(text):
    """Converts text to speech."""
    engine.say(text)
    engine.runAndWait()
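If the default voice talks too fast or too slow for you, the speaking rate can be changed before speaking. The variant below is a sketch, not part of the tutorial's code: the name speak_with_rate and the rate parameter are made up for this example, and the engine is passed in as an argument so the function can be tried with any object that offers setProperty, say and runAndWait, as a pyttsx3 engine does.

```python
def speak_with_rate(text, engine, rate=None):
    """Speak text with a pyttsx3-style engine, optionally changing the rate.

    rate is the speed in words per minute; None keeps the engine's current rate.
    """
    if rate is not None:
        engine.setProperty("rate", rate)
    engine.say(text)
    engine.runAndWait()


# Usage with the real engine (requires pyttsx3 to be installed):
#   import pyttsx3
#   speak_with_rate("Hello there", pyttsx3.init(), rate=150)
```

Passing the engine in, instead of relying on the global one, is just a design choice for this sketch. It keeps the function easy to test without a sound card.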
recognize_speech_from_mic Function
def recognize_speech_from_mic(recognizer):
    """Captures and recognizes speech."""
    mic = sr.Microphone()
    with mic as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.2)
        audio = recognizer.listen(source)
    try:
        return {"success": True, "error": None,
                "transcription": recognizer.recognize_google(audio, language='en').lower()}  # change 'en' to the language code you want
    except sr.UnknownValueError:
        return {"success": False, "error": "Could not understand audio", "transcription": None}
    except sr.RequestError as e:
        return {"success": False, "error": f"Could not request results; {e}", "transcription": None}
This function does the following:
- First, it sets up the microphone so it is ready to listen to our voice.
- Second, it adapts to the background noise (sampling it for 0.2 seconds) to hear us better.
- Third, it records our voice until we stop speaking.
- Fourth, it sends the recording to Google’s speech recognition service, which attempts to understand what was said.
- Last but not least, it returns a dictionary that tells us whether it understood our speech, along with the transcription or an error message.
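Because the function reports failure through its result dictionary instead of raising an error, the caller can simply try again. Here is a minimal sketch of that idea. The helper listen_with_retries and its attempts parameter are made up for this example; it only assumes the "success", "error" and "transcription" keys used in this tutorial.

```python
def listen_with_retries(recognize, attempts=3):
    """Call recognize() until it reports success or the attempts run out.

    recognize is any zero-argument function that returns a dictionary with
    the "success", "error" and "transcription" keys.
    """
    result = {"success": False, "error": "No attempts made", "transcription": None}
    for _ in range(attempts):
        result = recognize()
        if result["success"]:
            break
    return result


# Usage with the tutorial's function (requires a microphone):
#   result = listen_with_retries(lambda: recognize_speech_from_mic(recognizer))
```

The last result is returned either way, so the caller can still print or speak the final error message.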
Defining the Main Function
The main function calls recognize_speech_from_mic() in a loop. When our speech is recognized, it prints the transcription and reads it back using the speak() function, and it keeps going until the user exits by pressing Ctrl+C.
However, if an error occurs and our speech is not recognized by the recognize_speech_from_mic() function, this function prints the error message. If the audio could not be understood, it then says “Sorry, I didn’t catch that. Can you please repeat?”; for any other error, it speaks the error message itself.
def main():
    try:
        while True:
            print("Listening...")
            result = recognize_speech_from_mic(recognizer)
            if result["success"]:
                text = result["transcription"]
                print(f"Recognized: {text}")
                speak(f"You said: {text}")
            else:
                error = result["error"]
                print(error)
                speak(
                    error if "Could not understand audio" not in error else "Sorry, I didn't catch that. Can you please repeat?")
    except KeyboardInterrupt:
        print("Exiting...")
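To see the decision inside the loop more clearly, the choice of what to say can be pulled out into a small function of its own. This is only an illustrative sketch: response_for is a made-up name, and the result dictionaries at the bottom are hand-written samples rather than real microphone output.

```python
def response_for(result):
    """Return the sentence that should be spoken for a recognition result."""
    if result["success"]:
        return f"You said: {result['transcription']}"
    error = result["error"]
    if "Could not understand audio" in error:
        # Friendlier wording for the most common failure.
        return "Sorry, I didn't catch that. Can you please repeat?"
    # Any other error (for example a network problem) is spoken as-is.
    return error


# Hand-written sample results, shaped like the ones the tutorial returns.
print(response_for({"success": True, "error": None, "transcription": "hello"}))
print(response_for({"success": False, "error": "Could not understand audio",
                    "transcription": None}))
```

With a helper like this, the loop body could call speak(response_for(result)) for both the success and the error case.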
Main Block
Finally, this part ensures that main() runs only when the file is executed directly as a script. If the file is imported as a module, the functions are defined but the listening loop does not start.
if __name__ == "__main__":
    main()
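If you are curious how this check behaves, the small experiment below runs the same tiny script twice, once under the name "__main__" and once under another name, the way an import would. It is a sketch for illustration only: the script text, the run_script helper and the name "demo" are all made up for this example, and it uses the standard runpy module to set the script's name.

```python
import os
import runpy
import tempfile

# A tiny stand-in script that uses the same guard as the tutorial.
SCRIPT = '''
def main():
    return "main() ran"

result = None
if __name__ == "__main__":
    result = main()
'''


def run_script(run_name):
    """Run SCRIPT with __name__ set to run_name and return its result variable."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        namespace = runpy.run_path(path, run_name=run_name)
    return namespace["result"]


print(run_script("__main__"))  # like running the file directly: main() ran
print(run_script("demo"))      # like importing it as "demo": None
```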
Example
Full Code
import speech_recognition as sr
import pyttsx3

# Initialize the recognizer and text-to-speech engine
recognizer = sr.Recognizer()
engine = pyttsx3.init()


def speak(text):
    """Converts text to speech."""
    engine.say(text)
    engine.runAndWait()


def recognize_speech_from_mic(recognizer):
    """Captures and recognizes speech."""
    mic = sr.Microphone()
    with mic as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.2)
        audio = recognizer.listen(source)
    try:
        return {"success": True, "error": None,
                "transcription": recognizer.recognize_google(audio, language='en').lower()}  # change 'en' to the language code you want
    except sr.UnknownValueError:
        return {"success": False, "error": "Could not understand audio", "transcription": None}
    except sr.RequestError as e:
        return {"success": False, "error": f"Could not request results; {e}", "transcription": None}


def main():
    try:
        while True:
            print("Listening...")
            result = recognize_speech_from_mic(recognizer)
            if result["success"]:
                text = result["transcription"]
                print(f"Recognized: {text}")
                speak(f"You said: {text}")
            else:
                error = result["error"]
                print(error)
                speak(
                    error if "Could not understand audio" not in error else "Sorry, I didn't catch that. Can you please repeat?")
    except KeyboardInterrupt:
        print("Exiting...")


if __name__ == "__main__":
    main()
Happy Coding!