
Computer Vision

Step-by-Step Guide to Implementing YOLO Object Counting | With Code Sample

In this post, we'll guide you through implementing YOLO object detection and counting, using road vehicle tracking as a practical example

6 min read


Jagger W.

In our previous posts, we introduced you to the YOLO object detection algorithm and walked you through preparing annotated data for training your YOLO model.

Now, it's time to put that knowledge into practice.

Among the countless applications of YOLO algorithms, we want to focus on a real-world scenario: road vehicle counting. This use case holds significant importance for traffic planning and decision-making in smart cities.

Step-by-Step Guide to Implementing YOLO Object Counting

In this post, we'll take you on a step-by-step journey to implement YOLO object detection and counting, using vehicle tracking as our practical example.

Get ready to see your YOLO model come to life!

What is Object Counting?

Object counting is a crucial application in computer vision that focuses on identifying and counting specific objects, such as people, animals, or vehicles, within images or videos. It has a wide range of applications, including:

  1. Traffic monitoring: Object counting can be used to measure vehicle flow, enabling better traffic management decisions.

  2. Retail industry: By counting customer flow within a store, businesses can gain insights into customer shopping behavior and improve sales efficiency.

  3. Security monitoring: Object counting can be employed to monitor the flow of people in public places, enhancing safety and security.


Four Steps to Achieve Object Counting

The object-counting process typically involves four main steps:

  1. Image preprocessing: This step aims to improve counting accuracy by applying techniques such as denoising, filtering, and enhancement to the input images or video frames.

  2. Object detection: The goal of this step is to identify specific objects within the preprocessed images or videos. Popular object detection algorithms include Faster R-CNN, YOLO (You Only Look Once), and SSD (Single Shot MultiBox Detector).

  3. Object tracking: In this step, the same object is tracked across a sequence of consecutive image frames. Common object-tracking algorithms include KCF (Kernelized Correlation Filter), TLD (Tracking, Learning, and Detection), and MOSSE (Minimum Output Sum of Squared Error). Object tracking ensures that the same object is not counted multiple times in consecutive frames of a dynamic video.

  4. Object counting: The final step involves counting the number of detected and tracked objects.
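As a minimal sketch, the four steps above can be wired together like this. The `detect()` stub and the box-matching logic are hypothetical stand-ins for a real detector and tracker, used only to make the flow of the pipeline concrete:

```python
# A minimal sketch of the four-step pipeline; detect() and the matching
# logic are hypothetical stubs, not a real detector or tracker.

def preprocess(frame):
    # Step 1: denoising/enhancement would go here; pass-through in this sketch.
    return frame

def detect(frame):
    # Step 2: return (label, box) pairs; stubbed with two fixed cars.
    return [("car", (10, 10, 50, 40)), ("car", (60, 12, 100, 44))]

def track(detections, state):
    # Step 3: assign a stable ID per object so it is not recounted.
    # A real tracker would match boxes across frames by IoU or appearance.
    ids = []
    for _, box in detections:
        if box not in state:
            state[box] = len(state)
        ids.append(state[box])
    return ids

def count(all_ids):
    # Step 4: count unique tracked IDs seen over the whole video.
    return len(set(all_ids))

state, seen = {}, []
for frame in range(3):           # three identical dummy "frames"
    seen.extend(track(detect(preprocess(frame)), state))

print(count(seen))               # each car is counted once, not three times
```

Without step 3, the same two cars would be counted in every frame, which is exactly the double-counting problem tracking exists to solve.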

Why Is YOLOv8 Suitable for Object Counting?

YOLO is a powerful tool for object detection tasks in computer vision.

Although YOLO is primarily used for object detection, it can also be used for object counting, such as counting the number of people and cars appearing in a real-time surveillance video stream. Compared to traditional object detection algorithms like the R-CNN series, YOLOv8 offers several advantages that make it well-suited for object counting applications.

YOLO v8 Architecture
YOLO v8 Architecture (Source: https://medium.com/@juanpedro.bc22)

Real-time Performance

One of the main strengths of YOLOv8 is its ability to perform real-time object detection.

Thanks to its "look once" design, the YOLO model does not scan the image multiple times; instead, it divides the image into a grid in a single pass and performs classification and bounding-box regression for each grid cell. This real-time performance makes YOLOv8 ideal for applications that require immediate feedback, such as autonomous driving, video surveillance, and robot vision.
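To make the "look once" idea concrete, here is a toy calculation showing which grid cell is responsible for an object whose center falls inside it. The image size, grid size, and object center below are made up for illustration:

```python
# Illustrative only: mapping an object's center to one grid cell in a
# single pass, the idea behind YOLO's "look once" design.
S = 7                       # grid is S x S cells
img_w, img_h = 640, 480     # hypothetical frame size
cx, cy = 320, 120           # hypothetical object center in pixels

cell_x = int(cx / img_w * S)
cell_y = int(cy / img_h * S)
print(cell_x, cell_y)       # the cell responsible for predicting this object
```

Each cell predicts boxes and class scores for objects centered in it, so one forward pass covers the whole image.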

High Accuracy

Another advantage of YOLOv8 is its high accuracy. Traditional object detection algorithms often rely on multi-stage processing, which can lead to cumulative errors at each stage. YOLOv8 avoids this issue by completing all tasks in a single step. Additionally, YOLOv8 considers the context of the objects, allowing it to maintain high accuracy even in complex scenarios involving occlusion or overlap.


Read: Annotate in YOLO Format

Implementing Object Counting with YOLOv8

To use YOLOv8 for object counting, we first need to detect objects in each frame of the video and obtain their categories and locations. However, to avoid counting the same object multiple times across consecutive frames, we also need to employ object-tracking algorithms.

💡 Object tracking algorithms aim to track one or more objects across a sequence of video frames. There are various types of object-tracking algorithms, including optical flow-based methods, feature-based methods, and model-based methods. Optical flow-based methods are particularly common, as they estimate the direction and distance of motion for each pixel by comparing two consecutive frames, enabling effective object tracking.

However, this method has a challenge: how to handle objects' entry and exit. For example, a person may enter from the left side of the screen and then exit from the right side. In this case, we should count this person as 1, not 2. To address this issue, we can establish rules such as only starting to track and count an object when it fully enters the screen and stopping the tracking and counting process when the object completely leaves the screen.
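The entry/exit rule can be reduced to a small sketch: count a tracked ID only the first time its center crosses the counting line. The track IDs and center positions below are made up for illustration:

```python
LINE_Y = 540          # vertical position of the counting line

def crossed_line(prev_y, curr_y, line_y=LINE_Y):
    # True if the center moved from one side of the line to the other.
    return (prev_y - line_y) * (curr_y - line_y) < 0

counted = set()
# (track_id, previous center y, current center y) over a few frames
observations = [(1, 500, 530), (1, 530, 560), (1, 560, 590), (2, 600, 585)]
for tid, prev_y, curr_y in observations:
    if crossed_line(prev_y, curr_y) and tid not in counted:
        counted.add(tid)     # each tracked ID is counted at most once

print(len(counted))          # only track 1 crossed the line
```

Because counting is keyed on the track ID, an object that lingers near the line or re-crosses it is still counted once, which is the behavior the rules above are after.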

Hands-on Practice: YOLO Object Counting

In the following section, we'll demonstrate how to count the number of people and vehicles appearing in a one-minute road surveillance video (street.mp4) using the YOLOv8 model provided by Ultralytics.

The output will be a video (street_object_counting.mp4) with annotations indicating the appearance and disappearance of people and vehicles in each frame. Ultralytics not only provides YOLO models but also offers advanced features like Object Counting in their PIP package, making it easier for developers to utilize YOLO models.

YOLO Object Counting: Input Video and Output Video

You can download the complete Google Colab Notebook code from here: yolov8_object_counting.ipynb.

Google Colab Notebook

First, set up the runtime environment by downloading the test video and installing the required PIP dependencies.

$ curl -o /content/street.mp4 https://basicai-asset.s3.amazonaws.com/www/blogs/yolov8-object-counting/street.mp4
$ pip install ultralytics

Then, input the following code snippets.

import cv2
from ultralytics import YOLO
from ultralytics.solutions import object_counter as oc

Import the necessary Python modules: cv2 for reading and writing video files, YOLO for detecting and tracking people and vehicles in each video frame, and object_counter for annotating the appearing people and vehicles in each frame to generate the video result.

input_video_path = "street.mp4"
output_video_path = "street_object_counting.mp4"
video_capture = cv2.VideoCapture(input_video_path)
assert video_capture.isOpened(), "Illegal or non-existing video file"
video_width, video_height, video_fps = (
    int(video_capture.get(p))
    for p in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS)
)
video_writer = cv2.VideoWriter(
    output_video_path, cv2.VideoWriter_fourcc(*"mp4v"), video_fps, (video_width, video_height)
)

Use VideoCapture to read the video frames one by one and VideoWriter to generate the resulting video with the same resolution and frame rate as the source video.

yolo = YOLO("yolov8n.pt")
object_counter = oc.ObjectCounter()
object_counter.set_args(
    view_img=True,
    reg_pts=[(0, 540), (1280, 540)],
    classes_names=yolo.names,
    draw_tracks=True
)
while video_capture.isOpened():
    success, frame = video_capture.read()
    if not success:
        break
    tracks = yolo.track(frame, persist=True, show=False, classes=[0, 2])
    frame = object_counter.start_counting(frame, tracks)
    video_writer.write(frame)

Create a YOLO model object and an ObjectCounter object. Then, use VideoCapture to read all the frames from the source video frame by frame. Utilize YOLO to detect and track the people and vehicles appearing in each frame, and employ ObjectCounter to annotate the appearing people and vehicles in each frame. Finally, write the annotated frames to the output video stream using VideoWriter.

Important parameters:

  • set_args.reg_pts: The counting region or line. Only objects appearing within the specified region or crossing the specified line are counted. Here, we specified a horizontal line across the lower part of the frame (y = 540).

  • track.persist: Setting persist=True keeps the tracker's state across calls, so the same person or vehicle retains its track ID between frames and is not counted multiple times.

  • track.classes: Specifies the types of objects to detect and track. In the COCO class ordering used by the pretrained model, 0 represents people and 2 represents cars.
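The IDs passed to classes=[0, 2] follow the 80-class COCO ordering that the pretrained yolov8n.pt weights were trained on. A small lookup makes the choice explicit; the dictionary below lists only a few of the 80 classes:

```python
# A few entries from the 80-class COCO ordering used by pretrained YOLO weights.
COCO_CLASSES = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", 5: "bus", 7: "truck"}

wanted = {"person", "car"}
classes = sorted(i for i, name in COCO_CLASSES.items() if name in wanted)
print(classes)   # the list to pass as yolo.track(..., classes=...)
```

At runtime, the full mapping is also available as `yolo.names` on the loaded model, which is what the `classes_names` argument above is fed.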

video_capture.release()
video_writer.release()
cv2.destroyAllWindows()

Finally, release the relevant resources.



Q&A

Q1: Hey, I'm curious, what are some of the biggest hurdles you've faced when implementing this in real-world scenarios?

A1: One of the most common issues is dealing with occlusion (objects being partially or fully covered by other objects), which can make it tricky for the model to detect and count them accurately. Another challenge is making sure the model doesn't count the same object multiple times across different frames. It takes some clever preprocessing and fine-tuning of the YOLO model to tackle these problems head-on. Having a solid object-tracking algorithm in place is crucial, too!

Q2: I'm working on a project that involves counting wildlife in natural habitats. Do you think this YOLO-based approach could work for that?

A2: Absolutely! The cool thing about YOLOv8 is that it's super versatile. With the right training data, you can teach it to detect and count pretty much any type of object you need. Whether it's animals in the wild, products on an assembly line, or even tiny cells under a microscope, YOLO's got you covered. The key is to have a well-labeled dataset that's specific to your use case.

Q3: I've heard a lot about Google Colab recently. Can you explain what it is and why it's so popular?

A3: Google Colab is a game-changing cloud-based Jupyter notebook environment that allows you to run Python code and train deep learning models using Google's powerful computing resources, including GPUs and TPUs, all for free! It's incredibly user-friendly, with pre-installed Python libraries, seamless integration with Google Drive, and support for Markdown and LaTeX. It's no wonder why Colab has become such a popular tool among machine learning enthusiasts and data scientists alike.

Q4: I've been experimenting with object counting in low-light conditions, but the accuracy just isn't where I need it to be. Any tips on how to improve it?

A4: Low-light scenarios can be a real headache for object detection and counting. There are a few tricks you can try. Consider applying some image enhancement techniques as a preprocessing step. Histogram equalization or gamma correction can work wonders in bringing out the details in those dark images. Another thing that can help is using a YOLO model that's been specifically trained on a dataset that includes low-light images. That way, it'll be better equipped to handle those challenging conditions.

Read Next

YOLO Object Detection Algorithms 101: Part 1

Preparing Data for YOLO Training: Data Annotation Techniques and Best Practices
