Object detection is a critical capability in the realm of computer vision, enabling machines to identify and locate objects within images or video streams. In this tutorial, we will delve into the powerful combination of YOLO (You Only Look Once) and OpenCV to implement efficient and accurate object detection in Python. YOLO’s speed and precision, coupled with OpenCV’s comprehensive image processing library, make them an ideal pair for a variety of applications.
We’ll cover everything from setting up the necessary libraries to processing real-time video, individual images, and video files for object detection. Whether you’re looking to enhance your security systems, develop autonomous vehicles, or simply explore the fascinating world of computer vision, this article will provide you with the tools and knowledge to get started.
By the end of this tutorial, you’ll have seen how real-time object detection with YOLOv5 can be implemented effectively. Let’s dive in and see how you can harness the power of YOLO and OpenCV to build robust object detection systems in Python.
Table of Contents
- Setup and Installation
- Imports
- Initializing Variables and Signal Handling
- Frame Processing Function
- Video Processing Function
- Photo Processing Function
- Live Video Detection Function
- Video Processing Initiation
- Main Window Setup
- Example
- Full Code
Setup and Installation
For the code to function properly, make sure to install these libraries using the terminal or command prompt by running the following commands:
$ pip install torch
$ pip install torchvision
$ pip install pandas
$ pip install opencv-python
$ pip install tk
Note that tkinter itself ships with most Python installers, so the last package may not be strictly necessary; if the import fails on Linux, install it through your system package manager (for example, sudo apt install python3-tk) rather than pip.
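To confirm everything is in place, here’s a quick sanity check you can run (a minimal sketch; the version numbers will depend on what pip resolved):
# Quick sanity check: import every library this tutorial relies on.
import torch
import cv2
import pandas
import tkinter

print("torch:", torch.__version__)
print("opencv:", cv2.__version__)
print("pandas:", pandas.__version__)
print("tkinter OK")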
We’ll also need to clone the YOLOv5 repository to access the pre-trained model and the necessary scripts:
For Linux and macOS:
- Open Terminal and run the following command:
$ git clone https://github.com/ultralytics/yolov5
- Navigate to the YOLOv5 directory and install the required dependencies:
$ cd yolov5
$ pip install -r requirements.txt
For Windows:
- Install Microsoft Visual C++ Build Tools: Download and install from here.
- Then, download the requirements.txt file from here.
- Place this file in the project directory.
- Open the Terminal and execute this command to install the necessary dependencies:
$ pip install -r requirements.txt
Imports
Well then, we are about to unlock the power of AI-driven object detection. We’d better gather all the necessary tools first, so let’s start with the imports:
- torch: for loading and running the YOLOv5 models.
- cv2: for handling images and videos.
- pandas: for data handling.
- datetime: for managing date and time, as well as timestamps.
- os: to interact with the operating system.
- tkinter: to create a graphical user interface, access directories, use message boxes, and themed widgets.
- threading: to multi-task without freezing the main window.
- signal: to catch system interruptions and ensure a smooth exit.
- sys: to ensure the program shuts down cleanly and efficiently.
We’ll also be loading two YOLOv5 models later on:
- yolov5s: perfect for real-time detection because it’s fast and lightweight.
- yolov5x: a larger, more accurate model ideal for video and photo processing.
With that roadmap in mind, here is the complete import block:
import torch
import cv2
import pandas as pd
from datetime import datetime
import os
from tkinter import *
from tkinter import filedialog
from tkinter import messagebox
from tkinter import ttk
import threading
import signal
import sys
Initializing Variables and Signal Handling
With everything set up, it’s time to dive into the exciting part: setting up our variables and functions for seamless video detection!
Get Ready to Detect with YOLOv5
First, we’re going to load our models. For real-time video detection, we’ll use the speedy YOLOv5s model:
# Load YOLOv5s model for real-time video detection
model_realtime = torch.hub.load('ultralytics/yolov5', 'yolov5s') # YOLOv5s for real-time detection
For processing video and photo files with higher accuracy, we’ll employ the powerful YOLOv5x model:
# Load YOLOv5x model for better accuracy in file processing
model_file = torch.hub.load('ultralytics/yolov5', 'yolov5x') # YOLOv5x for video and photo files
Setting the Stage with Variables
Now, let’s establish our key variables:
# Initialize variables
confidence_threshold = 0.3 # Confidence threshold for detections
frames_dir = 'detected_frames'
os.makedirs(frames_dir, exist_ok=True)
Here, confidence_threshold sets the bar for what counts as a valid detection, and frames_dir is where we’ll save the detected frames. The os.makedirs() call with exist_ok=True creates this directory if it doesn’t already exist.
Logging Our Adventures
We also need to log our detection journey. We’ll set up a CSV file to record all the exciting moments:
# Log file setup
log_file = 'detection_log.csv'
log_columns = ['Timestamp', 'Object', 'Confidence', 'Frame']
log_data = []
Our log file will capture the timestamp, detected object, confidence level, and the corresponding frame.
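Once a few detections have been logged, the saved CSV will look something like this (illustrative values only, not real output):
Timestamp,Object,Confidence,Frame
2024-05-01 12:00:00,person,0.87,42
2024-05-01 12:00:01,car,0.55,43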
Managing the Camera
To manage the video capture, we’ll use the cap variable:
cap = None
We need to ensure our camera resources are properly released when we’re done. Here’s how we do it:
def release_camera():
"""Release the camera resource."""
global cap
if cap is not None:
cap.release()
cv2.destroyAllWindows()
cap = None
Handling Termination Gracefully
Finally, we must ensure our program stops smoothly when we receive termination signals. This is where our signal_handler function comes into play:
def signal_handler(sig, frame):
"""Handle termination signals to ensure resources are released."""
release_camera()
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
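The code above registers the handler for SIGINT (Ctrl+C). If you also want to cover explicit kill requests, you can optionally register the same handler for SIGTERM; treat this as a Unix-oriented tweak, since signal delivery behaves differently on Windows:
# Optional: reuse the same handler for SIGTERM (e.g. `kill <pid>` on Unix).
signal.signal(signal.SIGTERM, signal_handler)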
Frame Processing Function
Now, this part is extremely important, so pay close attention. Why, you ask? Because here we will analyze each frame carefully to detect objects and see if they meet the threshold to be recorded in the log.
The Magic of Frame Processing
Imagine examining each frame with surgical precision to identify every object within it. How is this done? Let’s dive into the process_frame() function:
def process_frame(frame, frame_width, frame_height, frame_count, model):
"""Process a single frame for object detection."""
results = model(frame)
labels, cords = results.xyxyn[0][:, -1].numpy(), results.xyxyn[0][:, :-1].numpy()
n = len(labels)
detected = False
for i in range(n):
row = cords[i]
if row[4] >= confidence_threshold:
x1, y1, x2, y2 = int(row[0] * frame_width), int(row[1] * frame_height), int(row[2] * frame_width), int(row[3] * frame_height)
bgr = (0, 255, 0)
cv2.rectangle(frame, (x1, y1), (x2, y2), bgr, 2)
text = f"{model.names[int(labels[i])]} {row[4]:.2f}"
cv2.putText(frame, text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.0, bgr, 2)
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
log_data.append([timestamp, model.names[int(labels[i])], row[4], frame_count])
detected = True
return frame, detected
This function works like a detective. It passes each frame through the YOLO model, which uses its neural network to identify objects. Once the results are out, numpy is used to split them into labels (identifying the objects) and coordinates (for drawing boxes around them).
Breaking Down the Process
- Frame Analysis: The function processes each frame to detect objects using the YOLO model.
- Threshold Check: It checks if the detection confidence meets the threshold set by confidence_threshold.
- Bounding Boxes: For each valid detection, it draws a bounding box around the object.
- Labeling: It labels the object with its name and confidence score.
- Logging: The detection details, including the timestamp, are logged for further analysis.
This detailed process ensures that only objects with high confidence scores are considered. If the confidence is too low, the object is ignored. However, if it meets or exceeds the threshold, a bounding box is drawn, and a label with the object’s name and confidence score is added using cv2.putText(). The detection details are logged with the timestamp using log_data.append().
Finally, all these results are ready to be used by other parts of the code, ensuring a seamless and efficient object detection system.
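As an aside, if you ever want to inspect raw detections without the manual tensor slicing, YOLOv5 hub models also expose a pandas helper. Here’s a quick optional way to sanity-check what the model sees on a test image ('test.jpg' is a placeholder path, and the exact columns may vary between YOLOv5 releases):
# Inspect raw detections for a single image using YOLOv5's pandas helper.
results = model_file('test.jpg')  # placeholder image path
df = results.pandas().xyxy[0]     # columns: xmin, ymin, xmax, ymax, confidence, class, name
print(df[df['confidence'] >= confidence_threshold])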
Video Processing Function
def process_video(video_path, progress_bar):
"""Process a video file for object detection."""
cap = cv2.VideoCapture(video_path)
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
output_video_path = 'output.avi'
out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), 20.0, (frame_width, frame_height))
frame_count = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
frame, _ = process_frame(frame, frame_width, frame_height, frame_count, model_file)
out.write(frame)
frame_count += 1
progress = (frame_count / total_frames) * 100
progress_bar['value'] = progress
progress_bar.update_idletasks()
cap.release()
out.release()
messagebox.showinfo("Info", f"Processed video saved to {output_video_path}")
progress_bar['value'] = 0
This time, we’re stepping up our game from processing a single frame to handling an entire video file with our process_video() function. Curious about how it works? Let’s break it down.
First, we use cv2.VideoCapture() to open the specified video file from its path. This handy tool lets us capture the video frame by frame. Next, we use cap to gather three crucial pieces of information: the total number of frames, and the width and height of each frame.
Once we’ve got these details, we prepare to save the processed video by creating an output video writer with cv2.VideoWriter(). We also set up a counter, frame_count, to keep track of how many frames we’ve processed.
Now, with everything in place, we enter a loop where cap.read() retrieves each frame of the video file. Each frame is then sent to the process_frame() function for detection and annotation. Throughout this process, the progress bar is updated with the incremented frame_count, giving us a visual indication of how many frames have been processed.
When all the frames are processed, we release the video capture and video writer, and display a message box to let us know the processed video has been successfully saved. The progress bar is then reset to 0, ready for the next video to be processed.
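One detail worth knowing: the writer above hardcodes 20.0 FPS, so the output’s playback speed can drift from the source video’s. A small optional tweak (my suggestion, not part of the original) is to read the FPS from the capture, with a fallback when the container doesn’t report it:
# Match the output FPS to the source video instead of a fixed 20.0.
fps = cap.get(cv2.CAP_PROP_FPS) or 20.0  # cap.get returns 0.0 when FPS is unknown
out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), fps, (frame_width, frame_height))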
Photo Processing Function
def process_photo(photo_path):
"""Process a photo file for object detection."""
frame = cv2.imread(photo_path)
if frame is None:
messagebox.showerror("Error", "Could not open or find the image.")
return
frame_height, frame_width, _ = frame.shape
frame, detected = process_frame(frame, frame_width, frame_height, 0, model_file)
max_display_size = 800
scale = min(max_display_size / frame_width, max_display_size / frame_height)
display_frame = cv2.resize(frame, (int(frame_width * scale), int(frame_height * scale)))
cv2.imshow('Processed Photo - The Pycodes', display_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
if detected:
frame_path = os.path.join(frames_dir, os.path.basename(photo_path))
cv2.imwrite(frame_path, frame)
messagebox.showinfo("Info", f"Processed photo saved at {frame_path}")
else:
messagebox.showinfo("Info", "No objects detected in the photo.")
Now that we have seen how videos are processed, let’s explore how photos are handled through the process_photo() function. This function starts by loading the image from its path using cv2.imread(). If an error occurs and the image cannot be opened, a message box is displayed to alert the user.
Once the image is successfully loaded, the function extracts the height and width of the frame. These dimensions, along with the frame itself, are then passed to the process_frame() function for object detection and annotation.
After the detection and annotation are complete, the function prepares the processed photo for display by resizing it with cv2.resize(). This ensures that the image fits within a display window. The resized photo is then displayed for inspection using cv2.imshow().
If an object is detected and the user closes the display window, the processed photo is saved in the frames_dir directory, and a message box indicates the location of the saved photo. If no objects are detected, a message box informs the user that no objects were found.
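As an alternative to the manual scaling above, OpenCV can hand resizing over to the window manager. If you prefer that route, a resizable window is one extra line (an optional variation, not what the tutorial’s code does):
# Alternative: create a resizable window and show the full-resolution frame.
cv2.namedWindow('Processed Photo - The Pycodes', cv2.WINDOW_NORMAL)
cv2.imshow('Processed Photo - The Pycodes', frame)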
Live Video Detection Function
def start_realtime_detection():
"""Start real-time video detection."""
def run():
global cap
cap = cv2.VideoCapture(0)
if not cap.isOpened():
messagebox.showerror("Error", "Could not open video.")
return
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
frame_width = int(cap.get(3))
frame_height = int(cap.get(4))
output_video_path = 'output.avi'
out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), 20.0, (frame_width, frame_height))
frame_count = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
frame, detected = process_frame(frame, frame_width, frame_height, frame_count, model_realtime)
out.write(frame)
cv2.imshow('YOLOv5 Object Detection - The Pycodes', frame)
key = cv2.waitKey(1) & 0xFF
if key == ord('q'):
break
elif key == ord('s'):
frame_path = os.path.join(frames_dir, f"frame_{frame_count}.jpg")
cv2.imwrite(frame_path, frame)
print(f"Frame {frame_count} saved at {frame_path}")
frame_count += 1
release_camera()
out.release()
messagebox.showinfo("Info", f"Annotated video saved to {output_video_path}")
threading.Thread(target=run).start()
It’s time for us to dive into the world of real-time video detection, so let’s see how the start_realtime_detection() function works. This function starts a thread that calls the run() function.
How does it work? First, it opens the camera with cv2.VideoCapture(0), and if that fails, it displays an error message. Then, it requests a higher resolution by setting the frame width and height. Next, it creates an output video writer to save the processed video.
The function initializes frame_count to track how many frames have been captured. With everything set, a loop begins, processing each frame with the process_frame() function for as long as the camera remains open, thanks to cap.isOpened().
Each frame is detected and annotated using the YOLOv5s model and then displayed with cv2.imshow(). The function includes two handy commands: pressing “s” saves the current frame, and pressing “q” exits real-time detection, ending the loop and releasing the camera. The annotated video is then saved and a message box confirms completion. Note that this function and process_video() both write to output.avi, so a later run will overwrite an earlier one.
Video Processing Initiation
def start_video_processing():
"""Start video file processing."""
video_path = filedialog.askopenfilename(filetypes=[("Video Files", "*.mp4;*.avi")])
if video_path:
progress_bar = ttk.Progressbar(root, orient="horizontal", length=400, mode="determinate")
progress_bar.pack(pady=10)
threading.Thread(target=process_video, args=(video_path, progress_bar)).start()
We have reached the command center: the start_video_processing() function. Once triggered, it opens a file dialog for the user to select a video file, and checks that a path was actually returned before doing anything else. A progress bar is then created to show how far along the processing is. Finally, to prevent the main window from freezing, it starts a new thread that calls the process_video() function, passing the video path and the progress bar as arguments.
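A small portability note on the file dialog: on Linux and macOS, Tk treats the semicolon-separated pattern as a single literal glob, so space-separated patterns are the safer cross-platform choice. While you’re at it, you could widen the filter to more containers (a hedged variation on the original call):
# Space-separated patterns work across platforms; semicolons only work on Windows.
video_path = filedialog.askopenfilename(
    filetypes=[("Video Files", "*.mp4 *.avi *.mov *.mkv")]
)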
Main Window Setup
Welcome to the grand finale, where we bring everything together into a graphical user interface (GUI).
First, we create the main window, set its title, and define its size. Then, we add the following buttons:
- Real-time Video Detection: This button calls the start_realtime_detection() function.
- Process Video File: This button calls the start_video_processing() function.
- Process Photo File: This button lets the user select a photo using filedialog, then starts a new thread that calls the process_photo() function on it.
- Exit: This button uses the root.quit command to close the main window.
Finally, we start the main event loop with root.mainloop(). This keeps the window running and responsive until the user exits. When the user exits, the detection log is saved to a CSV file containing records of all detected objects.
# Tkinter GUI setup
root = Tk()
root.title("YOLOv5 Object Detection - The Pycodes")
root.geometry("400x300")
btn_realtime = Button(root, text="Real-time Video Detection", command=start_realtime_detection)
btn_realtime.pack(pady=10)
btn_video = Button(root, text="Process Video File", command=start_video_processing)
btn_video.pack(pady=10)
btn_photo = Button(root, text="Process Photo File", command=lambda: threading.Thread(target=process_photo, args=(filedialog.askopenfilename(filetypes=[("Image Files", "*.jpg;*.jpeg;*.png")]),)).start())
btn_photo.pack(pady=10)
btn_exit = Button(root, text="Exit", command=root.quit)
btn_exit.pack(pady=10)
root.mainloop()
log_df = pd.DataFrame(log_data, columns=log_columns)
log_df.to_csv(log_file, index=False)
print(f"Detection log saved to {log_file}")
print(f"Detected frames saved to {frames_dir}")
Example
First, I executed this code on a video as shown in the images below:
Then, I detected objects in this image of a car and a person:
Finally, I ran this script for real-time object detection as shown in the video below:
Full Code
import torch
import cv2
import pandas as pd
from datetime import datetime
import os
from tkinter import *
from tkinter import filedialog
from tkinter import messagebox
from tkinter import ttk
import threading
import signal
import sys
# Load YOLOv5s model for real-time video detection
model_realtime = torch.hub.load('ultralytics/yolov5', 'yolov5s') # YOLOv5s for real-time detection
# Load YOLOv5x model for better accuracy in file processing
model_file = torch.hub.load('ultralytics/yolov5', 'yolov5x') # YOLOv5x for video and photo files
# Initialize variables
confidence_threshold = 0.3 # Confidence threshold for detections
frames_dir = 'detected_frames'
os.makedirs(frames_dir, exist_ok=True)
# Log file setup
log_file = 'detection_log.csv'
log_columns = ['Timestamp', 'Object', 'Confidence', 'Frame']
log_data = []
cap = None
def release_camera():
"""Release the camera resource."""
global cap
if cap is not None:
cap.release()
cv2.destroyAllWindows()
cap = None
def signal_handler(sig, frame):
"""Handle termination signals to ensure resources are released."""
release_camera()
sys.exit(0)
signal.signal(signal.SIGINT, signal_handler)
def process_frame(frame, frame_width, frame_height, frame_count, model):
"""Process a single frame for object detection."""
results = model(frame)
labels, cords = results.xyxyn[0][:, -1].numpy(), results.xyxyn[0][:, :-1].numpy()
n = len(labels)
detected = False
for i in range(n):
row = cords[i]
if row[4] >= confidence_threshold:
x1, y1, x2, y2 = int(row[0] * frame_width), int(row[1] * frame_height), int(row[2] * frame_width), int(row[3] * frame_height)
bgr = (0, 255, 0)
cv2.rectangle(frame, (x1, y1), (x2, y2), bgr, 2)
text = f"{model.names[int(labels[i])]} {row[4]:.2f}"
cv2.putText(frame, text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.0, bgr, 2)
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
log_data.append([timestamp, model.names[int(labels[i])], row[4], frame_count])
detected = True
return frame, detected
def process_video(video_path, progress_bar):
"""Process a video file for object detection."""
cap = cv2.VideoCapture(video_path)
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
output_video_path = 'output.avi'
out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), 20.0, (frame_width, frame_height))
frame_count = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
frame, _ = process_frame(frame, frame_width, frame_height, frame_count, model_file)
out.write(frame)
frame_count += 1
progress = (frame_count / total_frames) * 100
progress_bar['value'] = progress
progress_bar.update_idletasks()
cap.release()
out.release()
messagebox.showinfo("Info", f"Processed video saved to {output_video_path}")
progress_bar['value'] = 0
def process_photo(photo_path):
"""Process a photo file for object detection."""
frame = cv2.imread(photo_path)
if frame is None:
messagebox.showerror("Error", "Could not open or find the image.")
return
frame_height, frame_width, _ = frame.shape
frame, detected = process_frame(frame, frame_width, frame_height, 0, model_file)
max_display_size = 800
scale = min(max_display_size / frame_width, max_display_size / frame_height)
display_frame = cv2.resize(frame, (int(frame_width * scale), int(frame_height * scale)))
cv2.imshow('Processed Photo - The Pycodes', display_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
if detected:
frame_path = os.path.join(frames_dir, os.path.basename(photo_path))
cv2.imwrite(frame_path, frame)
messagebox.showinfo("Info", f"Processed photo saved at {frame_path}")
else:
messagebox.showinfo("Info", "No objects detected in the photo.")
def start_realtime_detection():
"""Start real-time video detection."""
def run():
global cap
cap = cv2.VideoCapture(0)
if not cap.isOpened():
messagebox.showerror("Error", "Could not open video.")
return
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
frame_width = int(cap.get(3))
frame_height = int(cap.get(4))
output_video_path = 'output.avi'
out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), 20.0, (frame_width, frame_height))
frame_count = 0
while cap.isOpened():
ret, frame = cap.read()
if not ret:
break
frame, detected = process_frame(frame, frame_width, frame_height, frame_count, model_realtime)
out.write(frame)
cv2.imshow('YOLOv5 Object Detection - The Pycodes', frame)
key = cv2.waitKey(1) & 0xFF
if key == ord('q'):
break
elif key == ord('s'):
frame_path = os.path.join(frames_dir, f"frame_{frame_count}.jpg")
cv2.imwrite(frame_path, frame)
print(f"Frame {frame_count} saved at {frame_path}")
frame_count += 1
release_camera()
out.release()
messagebox.showinfo("Info", f"Annotated video saved to {output_video_path}")
threading.Thread(target=run).start()
def start_video_processing():
"""Start video file processing."""
video_path = filedialog.askopenfilename(filetypes=[("Video Files", "*.mp4;*.avi")])
if video_path:
progress_bar = ttk.Progressbar(root, orient="horizontal", length=400, mode="determinate")
progress_bar.pack(pady=10)
threading.Thread(target=process_video, args=(video_path, progress_bar)).start()
# Tkinter GUI setup
root = Tk()
root.title("YOLOv5 Object Detection - The Pycodes")
root.geometry("400x300")
btn_realtime = Button(root, text="Real-time Video Detection", command=start_realtime_detection)
btn_realtime.pack(pady=10)
btn_video = Button(root, text="Process Video File", command=start_video_processing)
btn_video.pack(pady=10)
btn_photo = Button(root, text="Process Photo File", command=lambda: threading.Thread(target=process_photo, args=(filedialog.askopenfilename(filetypes=[("Image Files", "*.jpg;*.jpeg;*.png")]),)).start())
btn_photo.pack(pady=10)
btn_exit = Button(root, text="Exit", command=root.quit)
btn_exit.pack(pady=10)
root.mainloop()
log_df = pd.DataFrame(log_data, columns=log_columns)
log_df.to_csv(log_file, index=False)
print(f"Detection log saved to {log_file}")
print(f"Detected frames saved to {frames_dir}")
Happy Coding!