How to Build a Real-Time Object Detection System Using YOLOv5 and OpenCV

Real-time object detection is used in many areas, from security cameras to the perception systems of autonomous vehicles. It’s a valuable skill for any tech enthusiast today. Thanks to Python and pre-trained models, building your own real-time object detection system is easier than ever: with a short script, you can locate and label objects in a live video stream.

In today’s tutorial, you will learn how to build a real-time object detection system using YOLOv5 and OpenCV in Python. I’ll guide you through setting up the YOLOv5 model, integrating it with a webcam, and utilizing OpenCV for video processing. You’ll also learn how to save detection logs and annotated frames, making this project perfect for enhancing your computer vision capabilities.

By leveraging YOLOv5’s advanced detection features and OpenCV’s robust computer vision functions, you’ll be able to create a system that’s not only powerful but also incredibly efficient. So, let’s get started!

Setup and Installation

Make sure to install the torch, torchvision, pandas, and opencv-python libraries via the terminal or command prompt for the code to function properly:

$ pip install torch
$ pip install torchvision
$ pip install pandas
$ pip install opencv-python

We also need to clone the YOLOv5 repository to get the pre-trained model and the necessary scripts:

For Linux and macOS:

  • Open Terminal and run the following command:
$ git clone https://github.com/ultralytics/yolov5
  • Navigate to the YOLOv5 directory and install the required dependencies:
$ cd yolov5
$ pip install -r requirements.txt

For Windows:

  1. Install Microsoft Visual C++ Build Tools: Download and install from here.
  2. Then, download the requirements.txt file from here.
  3. Put this file in the project directory.
  4. Go to the Terminal and install the required dependencies by running this command:
$ pip install -r requirements.txt
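
Before moving on, it can help to confirm that the libraries installed above are visible to the Python interpreter you plan to use. The snippet below is a minimal sketch that relies only on the standard library; the helper name missing_modules is made up for this example. Note that the opencv-python package is imported under the name cv2.

```python
from importlib.util import find_spec

# Import names (not pip names): opencv-python is imported as cv2.
REQUIRED_MODULES = ["torch", "torchvision", "pandas", "cv2"]


def missing_modules(names):
    """Return the modules from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If the script reports a missing module, rerun the matching pip install command in the same environment.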

Imports

Alright, let’s get ready to turn our vision into reality! First things first, we need to gather our tools. Here’s what we’ll be using:

  • torch: This is our powerhouse for loading the pre-trained YOLOv5 model.
  • cv2: We’ll use this to capture videos and process images. It’s like our camera’s best friend!
  • pandas: This handy library helps us handle and manipulate data, plus it saves our logs in neat CSV files.
  • datetime: Perfect for handling dates and times, and for timestamping our detection logs.
  • os: This lets us interact with the operating system to manage files and directories.

import torch
import cv2
import pandas as pd
from datetime import datetime
import os

Load YOLOv5 Model

With all our tools ready, the next step is to load the model that will detect objects. We’re going to use the medium version of the YOLOv5 model, which offers a good balance of speed and accuracy, from the PyTorch Hub. Specifically, we’ll pull it from “ultralytics/yolov5” where the YOLOv5 model is located.

# Load YOLOv5 model (using a larger model for better accuracy)
model = torch.hub.load('ultralytics/yolov5', 'yolov5m')  # Use yolov5m instead of yolov5s for better accuracy
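
As far as I know, the model returned by PyTorch Hub also exposes a few inference settings as plain attributes, such as conf (minimum confidence), iou (the overlap threshold used by non-maximum suppression), and classes (an optional list of class indices to keep). The sketch below wraps them in a hypothetical helper called configure_detector, and uses a stand-in object so it runs without downloading any weights. Treat the attribute names as an assumption to check against the YOLOv5 version you have installed.

```python
from types import SimpleNamespace


def configure_detector(model, conf=0.5, iou=0.45, classes=None):
    """Set inference-time filters on a YOLOv5 hub model and return it.

    conf    -- minimum confidence a detection needs to be returned
    iou     -- IoU threshold used by non-maximum suppression
    classes -- optional list of class indices to keep (None keeps all)
    """
    model.conf = conf
    model.iou = iou
    model.classes = classes
    return model


# Stand-in object so the sketch runs without downloading weights;
# in the tutorial you would pass the model returned by torch.hub.load().
# In the COCO label set used by the pre-trained weights, index 0 is "person".
demo_model = configure_detector(SimpleNamespace(), conf=0.5, classes=[0])
print(demo_model.conf, demo_model.iou, demo_model.classes)
```

Filtering inside the model this way would be an alternative to checking the confidence of each detection by hand later in the loop.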

Initialize Webcam and Set Resolution

# Initialize webcam
cap = cv2.VideoCapture(0)

Next, to recognize objects, we first need to see them clearly. We’ll use cv2.VideoCapture() to initialize our default webcam, then request frames that are 1280 pixels wide and 720 pixels high (the cap.set() calls appear in the next code block). A higher resolution gives the model more detail to work with, which can help with small or distant objects, provided your webcam supports it.
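
Not every webcam honors a resolution request, so it is worth reading the values back after setting them. Here is a small sketch of that check. The helper request_resolution and the FakeCapture class are hypothetical names used only so the example runs without a camera; with a real device you would pass the cv2.VideoCapture object instead.

```python
# Property ids as used by OpenCV; they match cv2.CAP_PROP_FRAME_WIDTH (3)
# and cv2.CAP_PROP_FRAME_HEIGHT (4).
FRAME_WIDTH_PROP = 3
FRAME_HEIGHT_PROP = 4


def request_resolution(cap, width, height):
    """Ask the capture device for a resolution and return what it actually uses."""
    cap.set(FRAME_WIDTH_PROP, width)
    cap.set(FRAME_HEIGHT_PROP, height)
    actual = (int(cap.get(FRAME_WIDTH_PROP)), int(cap.get(FRAME_HEIGHT_PROP)))
    if actual != (width, height):
        print(f"Requested {width}x{height}, camera is delivering {actual[0]}x{actual[1]}")
    return actual


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture that only supports 640x480."""

    def set(self, prop_id, value):
        return False  # pretend the driver rejected the request

    def get(self, prop_id):
        return {FRAME_WIDTH_PROP: 640.0, FRAME_HEIGHT_PROP: 480.0}[prop_id]


print(request_resolution(FakeCapture(), 1280, 720))
```

Whatever size comes back is the one to use for the video writer and for scaling bounding boxes.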

Initializing Video Writer and Setting Up Logging

Now, just like any adventure, you need a scribe (video writer) and a chronicler (logging). Let’s start with our scribe, the video writer. To record our journey, it first retrieves the frame width and height from the video capture object using cap.get(). Then, cv2.VideoWriter() kicks in to save the video.

With the scribe ready, let’s move on to our chronicler, who will document every discovery. We’ll do this by setting up a CSV file to log detections. First, we define the columns for our logs:

  • Timestamp: When the object was detected.
  • Object: Whatever is detected.
  • Confidence: How sure the model is about the detection.
  • Frame: The frame number where the object was detected.

We also create an empty list, log_data, to store these log entries as we go along.

# Set higher resolution for better accuracy (if your webcam supports it)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)


# Initialize the video writer that saves the annotated output
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
output_video_path = 'output.avi'
out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), 20.0, (frame_width, frame_height))


# Log file setup
log_file = 'detection_log.csv'
log_columns = ['Timestamp', 'Object', 'Confidence', 'Frame']
log_data = []
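
The writer above is created with a fixed rate of 20.0 frames per second. If detection runs slower than that on your machine, the saved video will play back faster than real life. One way to find out is to measure how many frames the loop actually processes per second. The FpsMeter class below is a hypothetical helper, not part of OpenCV, sketched with the standard library only.

```python
import time


class FpsMeter:
    """Running average of frames per second over the whole session."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = None
        self._frames = 0

    def tick(self):
        """Call once per processed frame; returns the average FPS so far."""
        now = self._clock()
        if self._start is None:
            self._start = now
            return 0.0
        self._frames += 1
        elapsed = now - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0


# Demo with a scripted clock so the result does not depend on machine speed
ticks = iter([0.0, 0.5, 1.0])
meter = FpsMeter(clock=lambda: next(ticks))
readings = [meter.tick() for _ in range(3)]
print(readings)
```

In the real loop you would create the meter with no arguments and call tick() once per frame; a measured rate from a short warm-up run could then replace the hard-coded 20.0.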

Setup Directories and Check Webcam

Let’s make sure everything is set up correctly so we can smoothly save frames and verify our webcam is working. First, we’ll specify frames_dir as the directory where we want to save the frames. If the directory doesn’t exist, we’ll create it using the os module:

frames_dir = 'detected_frames'
os.makedirs(frames_dir, exist_ok=True)

Next, we need to check if our webcam is ready for action with cap.isOpened(). If it isn’t, we’ll get an error message, and the program will stop running:

if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

We’ll also set some initial parameters: frame_count to keep track of frames, and confidence_threshold to discard detections the model is not sure about:

frame_count = 0
confidence_threshold = 0.5  # Ignore detections with a confidence below 50%

Finally, we’ll start reading frames from the webcam. If a frame can’t be read, the loop will break:

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

This ensures we’re all set to capture and save our object detections without any issues.
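
The same read-until-failure pattern can also be wrapped in a generator, which keeps the main loop short and makes sure the camera is released when reading ends. This is only a sketch of an alternative structure: read_frames and FakeCapture are made-up names, and the fake capture object exists so the example runs without a webcam.

```python
def read_frames(cap):
    """Yield frames from a capture object until it closes or a read fails."""
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()  # runs when the generator finishes or is closed


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture that serves a fixed list of frames."""

    def __init__(self, frames):
        self._frames = list(frames)
        self.released = False

    def isOpened(self):
        return not self.released

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None

    def release(self):
        self.released = True


cap_demo = FakeCapture(["frame0", "frame1"])
frames_seen = list(read_frames(cap_demo))
print(frames_seen, cap_demo.released)
```

With a real camera, the loop would become `for frame in read_frames(cap):` followed by the detection code.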

Object Detection Loop

    # Perform object detection (YOLOv5 expects RGB input; OpenCV frames are BGR)
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    # Get detection results as NumPy arrays (.cpu() is needed when the model runs on a GPU)
    detections = results.xyxyn[0].cpu().numpy()
    labels, cords = detections[:, -1], detections[:, :-1]

    # Annotate frame
    n = len(labels)
    for i in range(n):
        row = cords[i]
        if row[4] >= confidence_threshold:  # Apply the confidence threshold
            # Scale normalized coordinates to pixel coordinates
            x1, y1, x2, y2 = int(row[0] * frame_width), int(row[1] * frame_height), int(row[2] * frame_width), int(row[3] * frame_height)
            bgr = (0, 255, 0)
            cv2.rectangle(frame, (x1, y1), (x2, y2), bgr, 2)
            text = f"{model.names[int(labels[i])]} {row[4]:.2f}"
            cv2.putText(frame, text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, bgr, 2)

            # Log detected objects
            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            log_data.append([timestamp, model.names[int(labels[i])], float(row[4]), frame_count])

    # Save frame to video file
    out.write(frame)

    # Display the frame
    cv2.imshow('YOLOv5 Object Detection - The Pycodes', frame)

    # Press 'q' to quit or 's' to save the current frame
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break
    elif key == ord('s'):
        frame_path = os.path.join(frames_dir, f"frame_{frame_count}.jpg")
        cv2.imwrite(frame_path, frame)
        print(f"Frame {frame_count} saved at {frame_path}")

    frame_count += 1

Now it’s time for us to dive into the heart of our program. On every pass through the loop, we read a frame from the webcam and hand it to the YOLOv5 model, which returns the detected objects along with their coordinates and confidence scores. Any detection that scores below confidence_threshold is ignored, which cuts down on false positives, and frame_count keeps track of how many frames we have processed.

We extract the results and convert them to numpy arrays for easier manipulation. Next, we get the number of detected objects and iterate through them to get their coordinates and scores. If the confidence score meets the threshold, we scale the coordinates to the actual frame dimensions. We then draw a green box around the detected object using cv2.rectangle() and annotate the frame with the object label and confidence score using cv2.putText().
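
To make the filtering and scaling step concrete, here is a small standalone sketch that works on plain Python lists instead of model output. The function names and the sample rows are invented for illustration; each row follows the layout used in the loop: four normalized corner coordinates followed by a confidence score.

```python
def to_pixel_box(row, frame_width, frame_height):
    """Convert one normalized [x1, y1, x2, y2, conf] row to pixel corners."""
    x1, y1, x2, y2 = row[:4]
    return (
        int(x1 * frame_width),
        int(y1 * frame_height),
        int(x2 * frame_width),
        int(y2 * frame_height),
    )


def confident_boxes(rows, frame_width, frame_height, threshold=0.5):
    """Keep rows whose confidence meets the threshold and scale them to pixels."""
    return [
        (to_pixel_box(row, frame_width, frame_height), float(row[4]))
        for row in rows
        if row[4] >= threshold
    ]


# Made-up detections for illustration: [x1, y1, x2, y2, confidence]
rows = [
    [0.25, 0.50, 0.75, 1.00, 0.90],
    [0.10, 0.10, 0.20, 0.20, 0.30],  # below the threshold, dropped
]
print(confident_boxes(rows, 1280, 720))
# → [((320, 360, 960, 720), 0.9)]
```

Because the coordinates are normalized to the range 0 to 1, multiplying by the frame width and height is all it takes to get pixel positions for cv2.rectangle().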

Additionally, we record this information in our log_data list, timestamping each entry. We write the annotated frame to our video file and display it live using cv2.imshow(). Finally, we provide two choices to the user: press “q” to exit or press “s” to save a frame from the video.
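
Keeping every detection in a list means the log only reaches disk when the program exits cleanly. If you would rather not lose entries after a crash, one alternative is to append each row to the CSV file as it happens. The sketch below swaps pandas for the standard-library csv module; append_detection is a hypothetical helper, and the demo writes to a temporary directory so nothing is left behind.

```python
import csv
import os
import tempfile
from datetime import datetime

LOG_COLUMNS = ["Timestamp", "Object", "Confidence", "Frame"]


def append_detection(path, label, confidence, frame_number, now=None):
    """Append one detection to the CSV at `path`, writing the header if the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(LOG_COLUMNS)
        writer.writerow([timestamp, label, f"{confidence:.2f}", frame_number])


# Demo: write two made-up detections, then read the file back
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "detection_log.csv")
    append_detection(demo_path, "tv", 0.87, 12)
    append_detection(demo_path, "remote", 0.654, 12)
    with open(demo_path, newline="") as handle:
        demo_rows = list(csv.reader(handle))
print(demo_rows[0])
```

Reopening the file for every detection is simple but not the fastest option; for long sessions you might keep the file open for the duration of the loop instead.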

Releasing Resources and Saving Logs

# Release everything if the job is finished
cap.release()
out.release()
cv2.destroyAllWindows()


# Save log data to CSV
log_df = pd.DataFrame(log_data, columns=log_columns)
log_df.to_csv(log_file, index=False)


print(f"Detection log saved to {log_file}")
print(f"Annotated video saved to {output_video_path}")
print(f"Detected frames saved to {frames_dir}")

As we conclude our journey, we close all windows and release the webcam and video writer. We also finalize recording our adventure by saving the log data into a CSV file.
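
Once the session is over, a quick summary of what was detected can be handy. The following sketch counts how often each label appears in the log rows using collections.Counter. The helper summarize_log and the sample rows are invented for illustration and follow the tutorial’s column order.

```python
from collections import Counter


def summarize_log(log_data):
    """Count how many times each object label appears in the log rows."""
    return Counter(row[1] for row in log_data)


# Made-up rows in the tutorial's column order: Timestamp, Object, Confidence, Frame
sample_log = [
    ["2024-01-01 10:00:00", "tv", 0.91, 0],
    ["2024-01-01 10:00:00", "remote", 0.62, 0],
    ["2024-01-01 10:00:01", "tv", 0.93, 1],
]
for label, count in summarize_log(sample_log).most_common():
    print(f"{label}: {count}")
```

In the real script, you would pass log_data to the helper right before or after writing the CSV file.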

Example

In our test run, the code detected a TV, a cell phone, and a remote.

Full Code

import torch
import cv2
import pandas as pd
from datetime import datetime
import os


# Load YOLOv5 model (using a larger model for better accuracy)
model = torch.hub.load('ultralytics/yolov5', 'yolov5m')  # Use yolov5m instead of yolov5s for better accuracy


# Initialize webcam
cap = cv2.VideoCapture(0)


# Set higher resolution for better accuracy (if your webcam supports it)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)


# Initialize the video writer that saves the annotated output
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
output_video_path = 'output.avi'
out = cv2.VideoWriter(output_video_path, cv2.VideoWriter_fourcc(*'XVID'), 20.0, (frame_width, frame_height))


# Log file setup
log_file = 'detection_log.csv'
log_columns = ['Timestamp', 'Object', 'Confidence', 'Frame']
log_data = []


# Directory to save frames
frames_dir = 'detected_frames'
os.makedirs(frames_dir, exist_ok=True)


if not cap.isOpened():
    print("Error: Could not open video.")
    exit()


frame_count = 0
confidence_threshold = 0.5  # Ignore detections with a confidence below 50%


while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection (YOLOv5 expects RGB input; OpenCV frames are BGR)
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    # Get detection results as NumPy arrays (.cpu() is needed when the model runs on a GPU)
    detections = results.xyxyn[0].cpu().numpy()
    labels, cords = detections[:, -1], detections[:, :-1]

    # Annotate frame
    n = len(labels)
    for i in range(n):
        row = cords[i]
        if row[4] >= confidence_threshold:  # Apply the confidence threshold
            # Scale normalized coordinates to pixel coordinates
            x1, y1, x2, y2 = int(row[0] * frame_width), int(row[1] * frame_height), int(row[2] * frame_width), int(row[3] * frame_height)
            bgr = (0, 255, 0)
            cv2.rectangle(frame, (x1, y1), (x2, y2), bgr, 2)
            text = f"{model.names[int(labels[i])]} {row[4]:.2f}"
            cv2.putText(frame, text, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, bgr, 2)

            # Log detected objects
            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            log_data.append([timestamp, model.names[int(labels[i])], float(row[4]), frame_count])

    # Save frame to video file
    out.write(frame)

    # Display the frame
    cv2.imshow('YOLOv5 Object Detection - The Pycodes', frame)

    # Press 'q' to quit or 's' to save the current frame
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break
    elif key == ord('s'):
        frame_path = os.path.join(frames_dir, f"frame_{frame_count}.jpg")
        cv2.imwrite(frame_path, frame)
        print(f"Frame {frame_count} saved at {frame_path}")

    frame_count += 1


# Release everything if the job is finished
cap.release()
out.release()
cv2.destroyAllWindows()


# Save log data to CSV
log_df = pd.DataFrame(log_data, columns=log_columns)
log_df.to_csv(log_file, index=False)

print(f"Detection log saved to {log_file}")
print(f"Annotated video saved to {output_video_path}")
print(f"Detected frames saved to {frames_dir}")

Happy Coding!
