Object Detection with YOLO and OpenCV in Python

Object detection is a critical capability in the realm of computer vision, enabling machines to identify and locate objects within images or video streams. In this tutorial, we will delve into the powerful combination of YOLO (You Only Look Once) and OpenCV to implement efficient and accurate object detection in Python. YOLO’s speed and precision, coupled with OpenCV’s comprehensive image processing library, make them an ideal pair for a variety of applications.

We’ll cover everything from setting up the necessary libraries to processing real-time video, individual images, and video files for object detection. Whether you’re looking to enhance your security systems, develop autonomous vehicles, or simply explore the fascinating world of computer vision, this article will provide you with the tools and knowledge to get started.

We’ll use YOLOv5, loaded through PyTorch Hub from the ultralytics/yolov5 repository. Let’s dive in and see how you can harness the power of YOLO and OpenCV to build robust object detection systems in Python.

Setup and Installation

For the code to function properly, make sure to install these libraries using the terminal or command prompt by running the following commands:

$ pip install torch
$ pip install torchvision
$ pip install pandas
$ pip install opencv-python
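
tkinter is part of Python’s standard library and ships with the official python.org installers, so it isn’t installed with pip. On some Linux distributions you may need to add it through the system package manager instead, for example on Debian or Ubuntu:

$ sudo apt install python3-tk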

We’ll also clone the YOLOv5 repository so we can install its dependencies. (torch.hub.load() fetches the model code and downloads the pre-trained weights automatically the first time it runs.)

For Linux and macOS:

  • Open Terminal and run the following command:
$ git clone https://github.com/ultralytics/yolov5
  • Navigate to the YOLOv5 directory and install the required dependencies:
$ cd yolov5
$ pip install -r requirements.txt

For Windows:

  1. Install Microsoft Visual C++ Build Tools: download and install them from Microsoft’s website.
  2. Then, download the requirements.txt file from the YOLOv5 repository.
  3. Place this file in the project directory.
  4. Open Command Prompt (or PowerShell) in that directory and run this command to install the necessary dependencies:
$ pip install -r requirements.txt

Imports

Well then, we’re about to unlock the power of AI-driven object detection. We’d better get ready by gathering all the necessary tools, so let’s start with the imports:

  • torch: to load and run the YOLOv5 models.
  • cv2: to read, annotate, display, and write images and videos.
  • pandas: to save the detection log as a CSV file.
  • datetime: to timestamp each detection.
  • os: to create directories and build file paths.
  • tkinter: to build the graphical user interface, with file dialogs, message boxes, and themed widgets such as the progress bar.
  • threading: to run long tasks in the background without freezing the main window.
  • signal: to catch interruptions such as Ctrl+C and exit smoothly.
  • sys: to exit the program cleanly.

import torch
import cv2
import pandas as pd
from datetime import datetime
import os
from tkinter import *
from tkinter import filedialog
from tkinter import messagebox
from tkinter import ttk
import threading
import signal
import sys

Now that we have finished our imports, let’s move on to loading our models:

  • yolov5s: perfect for real-time detection because it’s fast and lightweight.
  • yolov5x: a larger, more accurate model ideal for video and photo processing.

Initializing Variables and Signal Handling

With the imports in place, it’s time to dive into the exciting part: defining the variables and helper functions our detection code relies on!

Get Ready to Detect with YOLOv5

First, we’re going to load our models. For real-time video detection, we’ll use the speedy YOLOv5s model:

# Load YOLOv5s model for real-time video detection
model_realtime = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # YOLOv5s for real-time detection

For processing video and photo files with higher accuracy, we’ll employ the powerful YOLOv5x model:

# Load YOLOv5x model for better accuracy in file processing
model_file = torch.hub.load('ultralytics/yolov5', 'yolov5x')  # YOLOv5x for video and photo files

Setting the Stage with Variables

Now, let’s establish our key variables:

# Initialize variables
confidence_threshold = 0.3  # Confidence threshold for detections
frames_dir = 'detected_frames'
os.makedirs(frames_dir, exist_ok=True)

Here, confidence_threshold sets the minimum confidence score a detection needs to count as valid, and frames_dir is where we’ll save the detected frames. os.makedirs() creates this directory, and exist_ok=True keeps it from raising an error if the directory already exists.

Logging Our Adventures

We also need to log our detections. Each one is collected in the log_data list while the program runs, and the list is written to a CSV file when the program exits:

# Log file setup
log_file = 'detection_log.csv'
log_columns = ['Timestamp', 'Object', 'Confidence', 'Frame']
log_data = []

Our log file will capture the timestamp, detected object, confidence level, and the corresponding frame.
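To picture what ends up in detection_log.csv, here is a minimal sketch that writes two made-up detections with the standard library’s csv module. It stands in for the tutorial’s pandas step (a DataFrame built from log_data and saved with to_csv(index=False)), which produces a comparable header-plus-rows layout; the write_log helper and the sample rows are hypothetical.

```python
import csv
import io

# Column names match the tutorial's log_columns
LOG_COLUMNS = ['Timestamp', 'Object', 'Confidence', 'Frame']

def write_log(rows, stream):
    """Hypothetical helper: write a header row, then one row per detection."""
    writer = csv.writer(stream)
    writer.writerow(LOG_COLUMNS)
    writer.writerows(rows)

# Made-up detections: [timestamp, class name, confidence, frame index]
sample_rows = [
    ['2024-01-01 12:00:00', 'person', 0.87, 0],
    ['2024-01-01 12:00:00', 'car', 0.55, 0],
]

buffer = io.StringIO()  # an in-memory file instead of detection_log.csv
write_log(sample_rows, buffer)
print(buffer.getvalue())
```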

Managing the Camera

To manage the webcam, we’ll keep its capture object in a global cap variable so that the cleanup code can reach it:

cap = None

We need to ensure our camera resources are properly released when we’re done. Here’s how we do it:

def release_camera():
    """Release the camera resource."""
    global cap
    if cap is not None:
        cap.release()
        cv2.destroyAllWindows()
        cap = None

Handling Termination Gracefully

Finally, we must ensure our program stops smoothly when it is interrupted, for example when the user presses Ctrl+C in the terminal (SIGINT). This is where our signal_handler function comes into play:

def signal_handler(sig, frame):
    """Handle termination signals to ensure resources are released."""
    release_camera()
    sys.exit(0)

signal.signal(signal.SIGINT, signal_handler)
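signal.signal() registers a handler that Python calls with the signal number and the current stack frame. The sketch below exercises that flow without a camera: a hypothetical fake_release_camera() stands in for release_camera(), the handler records the signal instead of calling sys.exit(), and signal.raise_signal() (Python 3.8+) simulates pressing Ctrl+C. Note that Python only runs signal handlers in the main thread, and the sketch restores the previous handler at the end.

```python
import signal

cleanup_calls = []  # one entry per call to the fake cleanup function
received = []       # signal numbers seen by the handler

def fake_release_camera():
    """Hypothetical stand-in for release_camera()."""
    cleanup_calls.append('released')

def demo_handler(sig, frame):
    """Same signature as signal_handler, but records the signal instead of exiting."""
    fake_release_camera()
    received.append(sig)

# Register the handler and keep the previous one so it can be restored
previous = signal.signal(signal.SIGINT, demo_handler)
try:
    signal.raise_signal(signal.SIGINT)  # simulate Ctrl+C
finally:
    signal.signal(signal.SIGINT, previous)

print(received, cleanup_calls)
```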

Frame Processing Function

Now, this part is extremely important, so pay close attention. Why, you ask? Because here we will analyze each frame carefully to detect objects and see if they meet the threshold to be recorded in the log.

The Magic of Frame Processing

Imagine examining each frame with surgical precision to identify every object within it. How is this done? Let’s dive into the process_frame() function:

def process_frame(frame, frame_width, frame_height, frame_count, model):
    """Process a single frame for object detection."""
    # OpenCV delivers BGR images, while YOLOv5 expects RGB input
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    # Each row is [x1, y1, x2, y2, confidence, class] with normalized coordinates;
    # .cpu() moves the tensor off the GPU (if one is used) before converting to numpy
    detections = results.xyxyn[0].cpu().numpy()
    labels, cords = detections[:, -1], detections[:, :-1]
    detected = False
    for label, row in zip(labels, cords):
        confidence = float(row[4])
        if confidence >= confidence_threshold:
            # Scale the normalized corners back to pixel coordinates
            x1, y1 = int(row[0] * frame_width), int(row[1] * frame_height)
            x2, y2 = int(row[2] * frame_width), int(row[3] * frame_height)
            name = model.names[int(label)]
            bgr = (0, 255, 0)
            cv2.rectangle(frame, (x1, y1), (x2, y2), bgr, 2)
            text = f"{name} {confidence:.2f}"
            cv2.putText(frame, text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.0, bgr, 2)
            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            log_data.append([timestamp, name, confidence, frame_count])
            detected = True
    return frame, detected

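This function works like a detective. It passes each frame through the YOLO model, which uses its neural network to identify objects. Each row of results.xyxyn[0] describes one detection as [x1, y1, x2, y2, confidence, class], with the box corners normalized to the 0 to 1 range. The function slices off the last column as labels (the class indices that identify the objects) and keeps the remaining columns as cords (the box corners plus the confidence score). Because the corners are normalized, they are multiplied by the frame width and height to get pixel coordinates for drawing the boxes.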
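To make the slicing and scaling concrete, here is a small pure-Python sketch that mimics one row of results.xyxyn[0] (columns [x1, y1, x2, y2, confidence, class], coordinates normalized). The sample row and the to_pixel_box helper are made up for illustration; in the tutorial the rows come from the YOLOv5 model.

```python
def to_pixel_box(row, frame_width, frame_height):
    """Convert the normalized corners at the start of a row to pixel coordinates."""
    x1, y1, x2, y2 = row[:4]
    return (int(x1 * frame_width), int(y1 * frame_height),
            int(x2 * frame_width), int(y2 * frame_height))

# Hypothetical detection: class 0, confidence 0.91, box in normalized units
row = [0.25, 0.10, 0.50, 0.90, 0.91, 0.0]
label, cords = row[-1], row[:-1]  # same split as labels / cords in process_frame

box = to_pixel_box(cords, 1280, 720)  # a 1280x720 frame
print(int(label), cords[4], box)
```

Truncating with int() mirrors what process_frame does; rounding instead would shift boxes by at most one pixel.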

Breaking Down the Process

  • Frame Analysis: The function processes each frame to detect objects using the YOLO model.
  • Threshold Check: It checks if the detection confidence meets the threshold set by confidence_threshold.
  • Bounding Boxes: For each valid detection, it draws a bounding box around the object.
  • Labeling: It labels the object with its name and confidence score.
  • Logging: The detection details, including the timestamp, are logged for further analysis.

This process ensures that only objects whose confidence score is at least confidence_threshold (0.3 here) are kept. If the confidence is lower, the object is ignored. Otherwise, a bounding box is drawn, a label with the object’s name and confidence score is added using cv2.putText(), and the detection details are logged with a timestamp using log_data.append().

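Finally, the function returns the annotated frame together with a detected flag indicating whether anything passed the threshold. Every caller uses the annotated frame, and process_photo() also uses the flag to decide whether to save its result.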
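The threshold check and logging step can also be sketched on their own, without a model. Here filter_and_log is a hypothetical helper, the class names and detections are invented, and a fixed timestamp replaces datetime.now() so the result is predictable.

```python
from datetime import datetime

confidence_threshold = 0.3       # same value as the tutorial
names = {0: 'person', 2: 'car'}  # hypothetical subset of class names

def filter_and_log(detections, frame_count, log, now):
    """Keep detections at or above the threshold and append one log row each.

    Each detection is [x1, y1, x2, y2, confidence, class] (normalized).
    Returns True if at least one detection passed the threshold.
    """
    detected = False
    timestamp = now.strftime('%Y-%m-%d %H:%M:%S')
    for *_box, confidence, cls in detections:
        if confidence >= confidence_threshold:
            log.append([timestamp, names[int(cls)], confidence, frame_count])
            detected = True
    return detected

log_data = []
detections = [
    [0.1, 0.1, 0.4, 0.9, 0.82, 0],  # above the threshold: logged
    [0.5, 0.5, 0.7, 0.8, 0.12, 2],  # below the threshold: ignored
]
found = filter_and_log(detections, 7, log_data, datetime(2024, 1, 1, 12, 0, 0))
print(found, log_data)
```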

Video Processing Function

def process_video(video_path, progress_bar):
    """Process a video file for object detection."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        messagebox.showerror("Error", "Could not open the video file.")
        return
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Reuse the source frame rate so the output plays at the original speed;
    # fall back to 20 FPS if the file does not report one
    fps = cap.get(cv2.CAP_PROP_FPS) or 20.0
    output_video_path = 'output.avi'
    out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), fps, (frame_width, frame_height))

    frame_count = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame, _ = process_frame(frame, frame_width, frame_height, frame_count, model_file)
        out.write(frame)
        frame_count += 1
        # Some containers report a frame count of 0; avoid dividing by zero
        if total_frames > 0:
            progress_bar['value'] = min(frame_count / total_frames * 100, 100)
            progress_bar.update_idletasks()

    cap.release()
    out.release()
    messagebox.showinfo("Info", f"Processed video saved to {output_video_path}")
    progress_bar['value'] = 0

This time, we’re stepping up our game from processing a single frame to handling an entire video file with our process_video() function. Curious about how it works? Let’s break it down.

First, we use cv2.VideoCapture() to open the selected video file from its path. This handy tool lets us read the video frame by frame. Next, we query cap for three crucial pieces of information: the total number of frames, and the width and height of each frame.

Once we’ve got these details, we prepare to save the processed video by creating an output video writer with cv2.VideoWriter(). We also set up a counter, frame_count, to keep track of how many frames we’ve processed.
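One detail worth knowing when creating the writer: the frame rate passed to cv2.VideoWriter() controls playback speed, and cv2.VideoCapture may report 0 for CAP_PROP_FPS when a file or camera doesn’t expose it. Below is a minimal sketch of a fallback, assuming 20 FPS (the tutorial’s value) as the default; choose_fps is a hypothetical helper, and the reported value would come from cap.get(cv2.CAP_PROP_FPS).

```python
import math

def choose_fps(reported_fps, default=20.0):
    """Return the source frame rate, or a default if it is missing or invalid."""
    if reported_fps is None or math.isnan(reported_fps) or reported_fps <= 0:
        return default
    return float(reported_fps)

print(choose_fps(29.97))  # a typical NTSC rate is kept as-is
print(choose_fps(0.0))    # an unknown rate falls back to the default
```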

Now, with everything in place, we enter a loop where cap.read() retrieves each frame of the video file. Each frame is then sent to the process_frame() function for detection and annotation. Throughout this process, the progress bar is updated with the incremented frame_count, giving us a visual indication of how many frames have been processed.
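The progress value is plain arithmetic, frame_count / total_frames * 100, but CAP_PROP_FRAME_COUNT is an estimate that can be 0 or slightly off for some containers. Here is a hedged sketch of a safer percentage helper; progress_percent is hypothetical and not part of the tutorial’s code.

```python
def progress_percent(frame_count, total_frames):
    """Percentage of frames processed, clamped to the 0..100 range.

    Returns 0.0 when the total is unknown (reported as 0 or negative).
    """
    if total_frames <= 0:
        return 0.0
    return max(0.0, min(100.0, frame_count / total_frames * 100))

print(progress_percent(50, 200))   # 25.0
print(progress_percent(210, 200))  # an underestimated total is capped at 100.0
```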

When all the frames are processed, we release the video capture and video writer, and display a message box to let us know the processed video has been successfully saved. The progress bar is then reset to 0, ready for the next video to be processed.

Photo Processing Function

def process_photo(photo_path):
    """Process a photo file for object detection."""
    if not photo_path:  # the file dialog was cancelled
        return
    frame = cv2.imread(photo_path)
    if frame is None:
        messagebox.showerror("Error", "Could not open or find the image.")
        return
    frame_height, frame_width, _ = frame.shape
    frame, detected = process_frame(frame, frame_width, frame_height, 0, model_file)
    # Resize the image so it fits in an 800x800 display window
    max_display_size = 800
    scale = min(max_display_size / frame_width, max_display_size / frame_height)
    display_frame = cv2.resize(frame, (int(frame_width * scale), int(frame_height * scale)))
    cv2.imshow('Processed Photo - The Pycodes', display_frame)
    cv2.waitKey(0)  # wait for any key press in the image window
    cv2.destroyAllWindows()

    # Save the full-resolution annotated image only if something was detected
    if detected:
        frame_path = os.path.join(frames_dir, os.path.basename(photo_path))
        cv2.imwrite(frame_path, frame)
        messagebox.showinfo("Info", f"Processed photo saved at {frame_path}")
    else:
        messagebox.showinfo("Info", "No objects detected in the photo.")

Now that we have seen how videos are processed, let’s explore how photos are handled through the process_photo() function. This function starts by loading the image from its path using cv2.imread(). If an error occurs and the image cannot be opened, a message box is displayed to alert the user.

Once the image is successfully loaded, the function extracts the height and width of the frame. These dimensions, along with the frame itself, are then passed to the process_frame() function for object detection and annotation.

After detection and annotation are complete, the function prepares the processed photo for display by resizing it with cv2.resize() so that it fits within an 800 by 800 pixel window. The resized photo is then shown for inspection using cv2.imshow().
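The scale factor min(800 / width, 800 / height) makes the longer side exactly 800 pixels while preserving the aspect ratio. Photos smaller than that are enlarged too; the sketch below adds an optional cap at 1.0 to show how you might keep small photos at their original size. display_size and its allow_upscale flag are illustrative assumptions, not part of the tutorial’s code.

```python
def display_size(width, height, max_display_size=800, allow_upscale=True):
    """Return (new_width, new_height) that fits inside a square display area."""
    scale = min(max_display_size / width, max_display_size / height)
    if not allow_upscale:
        scale = min(scale, 1.0)  # never enlarge images smaller than the limit
    return int(width * scale), int(height * scale)

print(display_size(1920, 1080))                     # a landscape photo shrinks
print(display_size(400, 300))                       # a small photo is enlarged
print(display_size(400, 300, allow_upscale=False))  # a small photo is kept as-is
```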

Once the user presses any key to dismiss the window, the processed photo is saved in the frames_dir directory if at least one object was detected, and a message box shows where it was saved. If no objects were detected, a message box informs the user instead.

Live Video Detection Function

def start_realtime_detection():
  """Start real-time video detection."""
  def run():
      global cap
      cap = cv2.VideoCapture(0)
      if not cap.isOpened():
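          messagebox.showerror("Error", "Could not open the camera.")
          return
      cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
      cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
      # Read back the resolution the camera actually accepted
      frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
      frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))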
      output_video_path = 'output.avi'
      out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), 20.0, (frame_width, frame_height))
      frame_count = 0



      while cap.isOpened():
          ret, frame = cap.read()
          if not ret:
              break
          frame, detected = process_frame(frame, frame_width, frame_height, frame_count, model_realtime)
          out.write(frame)
          cv2.imshow('YOLOv5 Object Detection - The Pycodes', frame)
          key = cv2.waitKey(1) & 0xFF
          if key == ord('q'):
              break
          elif key == ord('s'):
              frame_path = os.path.join(frames_dir, f"frame_{frame_count}.jpg")
              cv2.imwrite(frame_path, frame)
              print(f"Frame {frame_count} saved at {frame_path}")
          frame_count += 1


      release_camera()
      out.release()
      messagebox.showinfo("Info", f"Annotated video saved to {output_video_path}")



  threading.Thread(target=run).start()

It’s time for us to dive into the world of real-time video detection, so let’s see how the start_realtime_detection() function works. This function initiates a thread that calls the run() function.

How does it work? First, it opens the default camera with cv2.VideoCapture(0) and displays an error message if that fails. Then it requests a 1280×720 resolution; since a camera may fall back to a mode it supports, the actual width and height are read back with cap.get(). Next, it creates an output video writer to save the annotated video.

The function initializes frame_count to keep track of processed frames. With everything set, a loop reads each frame and passes it to process_frame() for as long as cap.isOpened() reports that the camera is open.

Each frame is run through the YOLOv5s model, annotated, written to the output video, and displayed with cv2.imshow(). The function includes two handy commands, typed while the detection window has focus: pressing “s” saves the current frame to frames_dir, and pressing “q” exits real-time detection. Quitting ends the loop, releases the camera, finalizes the output video, and displays a message confirming where it was saved.
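cv2.waitKey(1) waits about a millisecond for a key press in the OpenCV window and returns -1 if none arrives. Masking with & 0xFF keeps only the low byte, so the result can be compared with ord('q') or ord('s') even on platforms where the raw code carries extra high bits. The dispatch logic can be sketched without a camera; key_action is a hypothetical helper, and the integer codes below stand in for what waitKey would return.

```python
def key_action(raw_key):
    """Map a cv2.waitKey() return value to the action the detection loop takes."""
    key = raw_key & 0xFF  # keep only the low byte, as in the tutorial
    if key == ord('q'):
        return 'quit'
    if key == ord('s'):
        return 'save'
    return 'continue'     # no key (-1 becomes 255) or any other key

print(key_action(ord('q')))             # quit
print(key_action(0x100000 | ord('s')))  # save: the high bits are masked away
print(key_action(-1))                   # continue: no key was pressed
```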

Video Processing Initiation

def start_video_processing():
   """Start video file processing."""
   # Tk expects multiple file patterns to be separated by spaces
   video_path = filedialog.askopenfilename(filetypes=[("Video Files", "*.mp4 *.avi")])
   if video_path:
       progress_bar = ttk.Progressbar(root, orient="horizontal", length=400, mode="determinate")
       progress_bar.pack(pady=10)
       threading.Thread(target=process_video, args=(video_path, progress_bar)).start()

We have reached the command center: the start_video_processing() function. Once triggered, it opens a file dialog for the user to select a video file. If the user picks a file (the dialog returns an empty string on cancel), it creates a progress bar to show how far the processing has come. Finally, to prevent the main window from freezing, it starts a new thread that runs process_video(), passing the video path and the progress bar as arguments.
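Running the work in a thread is what keeps the window responsive. Tkinter widgets are generally meant to be updated from the thread running mainloop(), so a common alternative to touching the progress bar from the worker is to have the worker push progress values onto a queue.Queue that the main thread drains, for example from a callback scheduled with root.after(). The sketch below shows only that hand-off with the standard library, so it runs without a GUI; fake_worker and drain are hypothetical names, and in a real app drain would set progress_bar['value'] instead of collecting numbers.

```python
import queue
import threading

def fake_worker(total_frames, updates):
    """Stand-in for process_video: report a percentage after each 'frame'."""
    for frame_count in range(1, total_frames + 1):
        updates.put(frame_count / total_frames * 100)
    updates.put(None)  # sentinel: processing finished

def drain(updates):
    """What a main-thread callback could do: read every progress update."""
    values = []
    while True:
        item = updates.get()  # blocks here; a GUI callback would use get_nowait()
        if item is None:
            break
        values.append(item)
    return values

updates = queue.Queue()
worker = threading.Thread(target=fake_worker, args=(4, updates))
worker.start()
progress_values = drain(updates)
worker.join()
print(progress_values)  # [25.0, 50.0, 75.0, 100.0]
```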

Main Window Setup

Welcome to the grand finale, where we bring everything together into a graphical user interface (GUI).

First, we create the main window, set its title, and define its size. Then, we add the following buttons:

  • Real-time Video Detection: This button calls the start_realtime_detection() function.
  • Process Video File: This button calls the start_video_processing() function.
  • Process Photo File: This button opens a file dialog so the user can select a photo, then starts a new thread that passes the chosen path to the process_photo() function.
  • Exit: This button uses the root.quit command to close the main window.

Finally, we start the main event loop with root.mainloop(). This keeps the window running and responsive until the user exits. When the user exits, the detection log is saved to a CSV file containing records of all detected objects.

# Tkinter GUI setup
root = Tk()
root.title("YOLOv5 Object Detection - The Pycodes")
root.geometry("400x300")


btn_realtime = Button(root, text="Real-time Video Detection", command=start_realtime_detection)
btn_realtime.pack(pady=10)


btn_video = Button(root, text="Process Video File", command=start_video_processing)
btn_video.pack(pady=10)


btn_photo = Button(root, text="Process Photo File", command=lambda: threading.Thread(target=process_photo, args=(filedialog.askopenfilename(filetypes=[("Image Files", "*.jpg *.jpeg *.png")]),)).start())
btn_photo.pack(pady=10)


btn_exit = Button(root, text="Exit", command=root.quit)
btn_exit.pack(pady=10)


root.mainloop()


log_df = pd.DataFrame(log_data, columns=log_columns)
log_df.to_csv(log_file, index=False)
print(f"Detection log saved to {log_file}")
print(f"Detected frames saved to {frames_dir}")

Example

First, I executed this code on a video as shown in the images below:

Then I detected objects from this image: “car + person”:

Finally, I ran this script for real-time object detection as shown in the video below:

Full Code

import torch
import cv2
import pandas as pd
from datetime import datetime
import os
from tkinter import *
from tkinter import filedialog
from tkinter import messagebox
from tkinter import ttk
import threading
import signal
import sys


# Load YOLOv5s model for real-time video detection
model_realtime = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # YOLOv5s for real-time detection
# Load YOLOv5x model for better accuracy in file processing
model_file = torch.hub.load('ultralytics/yolov5', 'yolov5x')  # YOLOv5x for video and photo files


# Initialize variables
confidence_threshold = 0.3  # Confidence threshold for detections
frames_dir = 'detected_frames'
os.makedirs(frames_dir, exist_ok=True)


# Log file setup
log_file = 'detection_log.csv'
log_columns = ['Timestamp', 'Object', 'Confidence', 'Frame']
log_data = []


cap = None


def release_camera():
  """Release the camera resource."""
  global cap
  if cap is not None:
      cap.release()
      cv2.destroyAllWindows()
      cap = None


def signal_handler(sig, frame):
  """Handle termination signals to ensure resources are released."""
  release_camera()
  sys.exit(0)


signal.signal(signal.SIGINT, signal_handler)


def process_frame(frame, frame_width, frame_height, frame_count, model):
    """Process a single frame for object detection."""
    # OpenCV delivers BGR images, while YOLOv5 expects RGB input
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    # Each row is [x1, y1, x2, y2, confidence, class] with normalized coordinates;
    # .cpu() moves the tensor off the GPU (if one is used) before converting to numpy
    detections = results.xyxyn[0].cpu().numpy()
    labels, cords = detections[:, -1], detections[:, :-1]
    detected = False
    for label, row in zip(labels, cords):
        confidence = float(row[4])
        if confidence >= confidence_threshold:
            # Scale the normalized corners back to pixel coordinates
            x1, y1 = int(row[0] * frame_width), int(row[1] * frame_height)
            x2, y2 = int(row[2] * frame_width), int(row[3] * frame_height)
            name = model.names[int(label)]
            bgr = (0, 255, 0)
            cv2.rectangle(frame, (x1, y1), (x2, y2), bgr, 2)
            text = f"{name} {confidence:.2f}"
            cv2.putText(frame, text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.0, bgr, 2)
            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            log_data.append([timestamp, name, confidence, frame_count])
            detected = True
    return frame, detected



def process_video(video_path, progress_bar):
    """Process a video file for object detection."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        messagebox.showerror("Error", "Could not open the video file.")
        return
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Reuse the source frame rate so the output plays at the original speed;
    # fall back to 20 FPS if the file does not report one
    fps = cap.get(cv2.CAP_PROP_FPS) or 20.0
    output_video_path = 'output.avi'
    out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), fps, (frame_width, frame_height))

    frame_count = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frame, _ = process_frame(frame, frame_width, frame_height, frame_count, model_file)
        out.write(frame)
        frame_count += 1
        # Some containers report a frame count of 0; avoid dividing by zero
        if total_frames > 0:
            progress_bar['value'] = min(frame_count / total_frames * 100, 100)
            progress_bar.update_idletasks()

    cap.release()
    out.release()
    messagebox.showinfo("Info", f"Processed video saved to {output_video_path}")
    progress_bar['value'] = 0


def process_photo(photo_path):
    """Process a photo file for object detection."""
    if not photo_path:  # the file dialog was cancelled
        return
    frame = cv2.imread(photo_path)
    if frame is None:
        messagebox.showerror("Error", "Could not open or find the image.")
        return
    frame_height, frame_width, _ = frame.shape
    frame, detected = process_frame(frame, frame_width, frame_height, 0, model_file)
    # Resize the image so it fits in an 800x800 display window
    max_display_size = 800
    scale = min(max_display_size / frame_width, max_display_size / frame_height)
    display_frame = cv2.resize(frame, (int(frame_width * scale), int(frame_height * scale)))
    cv2.imshow('Processed Photo - The Pycodes', display_frame)
    cv2.waitKey(0)  # wait for any key press in the image window
    cv2.destroyAllWindows()

    # Save the full-resolution annotated image only if something was detected
    if detected:
        frame_path = os.path.join(frames_dir, os.path.basename(photo_path))
        cv2.imwrite(frame_path, frame)
        messagebox.showinfo("Info", f"Processed photo saved at {frame_path}")
    else:
        messagebox.showinfo("Info", "No objects detected in the photo.")


def start_realtime_detection():
  """Start real-time video detection."""
  def run():
      global cap
      cap = cv2.VideoCapture(0)
      if not cap.isOpened():
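          messagebox.showerror("Error", "Could not open the camera.")
          return
      cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
      cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
      # Read back the resolution the camera actually accepted
      frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
      frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))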
      output_video_path = 'output.avi'
      out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), 20.0, (frame_width, frame_height))
      frame_count = 0


      while cap.isOpened():
          ret, frame = cap.read()
          if not ret:
              break
          frame, detected = process_frame(frame, frame_width, frame_height, frame_count, model_realtime)
          out.write(frame)
          cv2.imshow('YOLOv5 Object Detection - The Pycodes', frame)
          key = cv2.waitKey(1) & 0xFF
          if key == ord('q'):
              break
          elif key == ord('s'):
              frame_path = os.path.join(frames_dir, f"frame_{frame_count}.jpg")
              cv2.imwrite(frame_path, frame)
              print(f"Frame {frame_count} saved at {frame_path}")
          frame_count += 1


      release_camera()
      out.release()
      messagebox.showinfo("Info", f"Annotated video saved to {output_video_path}")


  threading.Thread(target=run).start()


def start_video_processing():
  """Start video file processing."""
  # Tk expects multiple file patterns to be separated by spaces
  video_path = filedialog.askopenfilename(filetypes=[("Video Files", "*.mp4 *.avi")])
  if video_path:
      progress_bar = ttk.Progressbar(root, orient="horizontal", length=400, mode="determinate")
      progress_bar.pack(pady=10)
      threading.Thread(target=process_video, args=(video_path, progress_bar)).start()


# Tkinter GUI setup
root = Tk()
root.title("YOLOv5 Object Detection - The Pycodes")
root.geometry("400x300")


btn_realtime = Button(root, text="Real-time Video Detection", command=start_realtime_detection)
btn_realtime.pack(pady=10)


btn_video = Button(root, text="Process Video File", command=start_video_processing)
btn_video.pack(pady=10)


btn_photo = Button(root, text="Process Photo File", command=lambda: threading.Thread(target=process_photo, args=(filedialog.askopenfilename(filetypes=[("Image Files", "*.jpg *.jpeg *.png")]),)).start())
btn_photo.pack(pady=10)


btn_exit = Button(root, text="Exit", command=root.quit)
btn_exit.pack(pady=10)


root.mainloop()


log_df = pd.DataFrame(log_data, columns=log_columns)
log_df.to_csv(log_file, index=False)
print(f"Detection log saved to {log_file}")
print(f"Detected frames saved to {frames_dir}")

Happy Coding!
