YOLOv9 Model Guide

Getting Started with Modelbit

Modelbit is an MLOps platform that lets you train and deploy any ML model, from any Python environment, with a few lines of code.



Model Documentation

Deploying the YOLOv9 Model to a REST API Endpoint for Object Detection

Installation and Setup

To begin, we will clone an existing implementation of the YOLOv9 research paper and install the required dependencies.


!git clone https://github.com/SkalskiP/yolov9.git
%cd yolov9
!pip install -r requirements.txt -q

!pip install modelbit

!pip install supervision

# Get the current working directory where the notebook is running and assign it to the HOME variable

import os
HOME = os.getcwd()
print(HOME)

Downloading YOLOv9 Weights

The repository contains pre-trained weight files for different configurations (yolov9-c.pt, yolov9-e.pt, gelan-c.pt, gelan-e.pt) of the YOLOv9 object detection model and its components. These weights are crucial for the model's ability to detect objects in images or videos.

Let's break down what each of these weights represents:

  • yolov9-c.pt: This weight file is designed to balance speed and accuracy. The "c" in the filename denotes a variant within the YOLOv9 model family that focuses on computational efficiency and a compact model size.
  • yolov9-e.pt: The "e" in the filename denotes an "enhanced" or "extended" version, offering higher accuracy. It is larger and requires more computational resources than variants like yolov9-c.pt.
  • gelan-c.pt and gelan-e.pt: These weights are associated with the GELAN (Generalized Efficient Layer Aggregation Network) architecture, one of the innovations introduced in YOLOv9. GELAN is designed to optimize parameter efficiency and improve the model's performance. The "c" and "e" in the filenames indicate different versions of the GELAN architecture, each tailored to specific use cases or performance criteria: "c" is aimed at computational efficiency, while "e" is aimed at enhanced performance.

!wget -P {HOME}/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-c.pt
!wget -P {HOME}/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-e.pt
!wget -P {HOME}/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-c.pt
!wget -P {HOME}/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-e.pt
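
If you are working in a plain Python script rather than a notebook (where the !wget shell magics above are unavailable), the same release files can be fetched with the standard library alone. A minimal sketch using the same release URLs and the HOME directory defined earlier:

import os
import urllib.request

RELEASE_URL = "https://github.com/WongKinYiu/yolov9/releases/download/v0.1"
weights_dir = os.path.join(HOME, "weights")
os.makedirs(weights_dir, exist_ok=True)

for name in ["yolov9-c.pt", "yolov9-e.pt", "gelan-c.pt", "gelan-e.pt"]:
    destination = os.path.join(weights_dir, name)
    if not os.path.exists(destination):  # skip weights that were already downloaded
        urllib.request.urlretrieve(f"{RELEASE_URL}/{name}", destination)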

!ls -la {HOME}/weights

!pwd

Downloading Sample Data


!wget -P {HOME}/data -q https://doc.modelbit.com/img/cat.jpg
!wget -P {HOME}/data -q http://doc.modelbit.com/img/dog-in-snow.jpg

Detection with a pre-trained COCO model

Detection with the computationally efficient yolov9-c weight


!python detect.py --weights {HOME}/weights/yolov9-c.pt --conf 0.1 --source {HOME}/data/cat.jpg --device cpu

from IPython.display import Image

Image(filename=f"{HOME}/runs/detect/exp/cat.jpg", width=600)

!python detect.py --weights {HOME}/weights/yolov9-c.pt --conf 0.1 --source {HOME}/data/dog-in-snow.jpg --device cpu

from IPython.display import Image

Image(filename=f"{HOME}/runs/detect/exp2/dog-in-snow.jpg", width=600)

Detection with the computationally efficient gelan-c weight


!python detect.py --weights {HOME}/weights/gelan-c.pt --conf 0.1 --source {HOME}/data/cat.jpg --device cpu

from IPython.display import Image

Image(filename=f"{HOME}/runs/detect/exp3/cat.jpg", width=600)

!python detect.py --weights {HOME}/weights/gelan-c.pt --conf 0.1 --source {HOME}/data/dog-in-snow.jpg --device cpu

from IPython.display import Image

Image(filename=f"{HOME}/runs/detect/exp4/dog-in-snow.jpg", width=600)
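
Note that detect.py writes each run's annotated images to a new auto-incremented folder under runs/detect (exp, exp2, exp3, ...), which is why the display paths above change with every invocation. If you would rather not hard-code those folder names, here is a small sketch that displays an image from the most recent run:

from pathlib import Path
from IPython.display import Image

# Pick the most recently modified run directory under runs/detect
latest_run = max(Path(HOME, "runs/detect").glob("exp*"), key=lambda p: p.stat().st_mtime)

# Display the first annotated image found in that run
Image(filename=str(next(latest_run.glob("*.jpg"))), width=600)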

Model Weight of Choice

For our deployment, we will choose the weight file that is more computationally efficient. From the code cells above, you can observe that gelan-c.pt has a noticeably faster inference time than yolov9-c.pt, so in this tutorial we will deploy gelan-c.pt.
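
detect.py already prints per-image inference times to the console, so you can compare the two weights directly from the output above. If you want to measure the difference yourself, a rough sketch follows; note that it times each script end to end, including model loading, so treat the numbers as a relative comparison only:

import subprocess
import time

for weight in ["yolov9-c.pt", "gelan-c.pt"]:
    start = time.perf_counter()
    # Run one CPU inference on the sample image with each weight file
    subprocess.run(
        ["python", "detect.py", "--weights", f"{HOME}/weights/{weight}",
         "--conf", "0.1", "--source", f"{HOME}/data/cat.jpg", "--device", "cpu"],
        check=True, capture_output=True)
    print(f"{weight}: {time.perf_counter() - start:.1f}s end to end")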

Inference Function for Object Detection

The function begins by initializing the device and loading the pre-trained object detection model. It then downloads the image from the provided URL, preprocesses it by resizing and normalizing the pixel values, and converts it to a PyTorch tensor. This tensor is passed through the loaded model, and the resulting predictions are filtered using non-maximum suppression. For each remaining detection, the function scales the bounding box coordinates back to the original image dimensions, creates a Detections object from the Supervision library, and generates labels that combine the class IDs and confidence scores. Finally, it prints the generated labels to the console and returns them as a list.


import torch
import cv2
import numpy as np
from models.common import DetectMultiBackend
from utils.general import non_max_suppression, scale_boxes
from utils.torch_utils import select_device, smart_inference_mode
from utils.augmentations import letterbox
from PIL import Image
import supervision as sv
import requests

@smart_inference_mode()
def generate_labels(image_url, weights='weights/gelan-c.pt', imgsz=640, conf_thres=0.1, iou_thres=0.45, device='cpu', data='data/coco.yaml'):
    # Initialize
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, fp16=False, data=data)
    stride, names, pt = model.stride, model.names, model.pt

    # Load image from the URL and force 3-channel RGB
    image = Image.open(requests.get(image_url, stream=True).raw).convert('RGB')
    img0 = np.array(image)
    assert img0 is not None, f'Image Not Found {image_url}'
    img = letterbox(img0, imgsz, stride=stride, auto=True)[0]
    img = img.transpose(2, 0, 1)  # HWC to CHW; PIL already decodes to RGB, so no channel flip is needed
    img = np.ascontiguousarray(img)
    img = torch.from_numpy(img).to(device).float()
    img /= 255.0  # scale pixel values to 0.0 - 1.0
    if img.ndimension() == 3:
        img = img.unsqueeze(0)  # add a batch dimension

    # Inference
    pred = model(img, augment=False, visualize=False)

    # Apply NMS
    pred = non_max_suppression(pred[0][0], conf_thres, iou_thres, classes=None, max_det=1000)

    # Process detections
    labels = []
    for det in pred:
        if len(det):
            # Rescale boxes from the letterboxed size back to the original image size
            det[:, :4] = scale_boxes(img.shape[2:], det[:, :4], img0.shape).round()
            for *xyxy, conf, cls in reversed(det):
                # Transform each detection into a Supervision Detections object
                detections = sv.Detections(
                    xyxy=torch.stack(xyxy).cpu().numpy().reshape(1, -1),
                    class_id=np.array([int(cls)]),
                    confidence=np.array([float(conf)])
                )
                # Build a "class_id confidence" label for each detection
                labels += [f"{class_id} {confidence:0.2f}"
                           for class_id, confidence
                           in zip(detections.class_id, detections.confidence)]
    print(labels)
    return labels

labels = generate_labels(image_url='https://doc.modelbit.com/img/cat.jpg')
labels
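
Each returned label pairs a numeric COCO class id with a confidence score. For example, "cat" is index 15 in the COCO class list, so a cat detection appears as something like "15 0.92" (the exact score will vary). If you prefer human-readable class names, one option is to map the ids back through the same coco.yaml the model uses; a small post-processing sketch:

import yaml

# Load the class id -> name mapping from the model's dataset config
with open("data/coco.yaml") as f:
    names = yaml.safe_load(f)["names"]

# Convert "15 0.92"-style labels into "cat 0.92"
readable = [f"{names[int(class_id)]} {score}"
            for class_id, score in (label.split() for label in labels)]
print(readable)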

Deploy YOLOv9 with the GELAN compact pre-trained weight gelan-c.pt to a REST API Endpoint

Log in to Modelbit


import modelbit as mb

mb.login()

Now, in this deployment, we include the necessary files from the cloned repository along with our pre-trained weight of choice:


mb.deploy(generate_labels,
          python_packages=[
              "matplotlib==3.7.1",
              "numpy==1.25.2",
              "torch==2.0.1+cpu",
              "torchvision==0.15.2+cpu",
              "opencv-python==4.8.0.76",
              "supervision==0.19.0",
              "Pillow==9.4.0"],
          system_packages=["python3-opencv"],
          extra_files=["weights/gelan-c.pt", "data/coco.yaml",
                       "models", "utils", "export.py"])

Test the REST Endpoint with a Single Image

You can test your REST endpoint by sending single images or batches of production images to it for scoring.

Use the requests package to POST a request to the API, and use json to format the response so it prints nicely:

⚠️ Replace the ENTER_WORKSPACE_NAME placeholder with your workspace name.


import json
import requests

requests.post("https://ENTER_WORKSPACE_NAME.us-east-1.modelbit.com/v1/generate_labels/latest",
              headers={"Content-Type":"application/json"},
              data=json.dumps({"data": ["https://doc.modelbit.com/img/cat.jpg"]})).json()
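
For batch scoring, Modelbit's batch convention sends each input as an [id, input] row, and the ids are echoed back alongside the results. A sketch of a two-image batch against the same endpoint (again, substitute your workspace name):

import json
import requests

batch = {"data": [[1, "https://doc.modelbit.com/img/cat.jpg"],
                  [2, "https://doc.modelbit.com/img/dog-in-snow.jpg"]]}

requests.post("https://ENTER_WORKSPACE_NAME.us-east-1.modelbit.com/v1/generate_labels/latest",
              headers={"Content-Type":"application/json"},
              data=json.dumps(batch)).json()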

You can also test your endpoint from the command line using:


curl -s -XPOST "https://ENTER_WORKSPACE_NAME.us-east-1.modelbit.com/v1/generate_labels/latest" \
  -d '{"data": ["https://doc.modelbit.com/img/cat.jpg"]}' | json_pp


Model Overview

YOLOv9, developed by Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao, represents a considerable leap forward in the YOLO series. It introduces groundbreaking techniques such as Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). These innovations are aimed at overcoming information-loss challenges in deep neural networks, ensuring exceptional accuracy and performance.

YOLOv9 is built upon the Information Bottleneck Principle, focusing on minimizing information loss through the network. This is achieved by incorporating PGI and reversible functions, enabling the model to maintain a complete information flow. These architectural advancements allow YOLOv9 to achieve remarkable efficiency and accuracy, setting new standards on the MS COCO dataset.
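
The information bottleneck idea can be stated compactly. Writing I(·, ·) for mutual information and f, g for consecutive network transformations with parameters θ and φ, the paper's inequality is:

I(X, X) ≥ I(X, f_θ(X)) ≥ I(X, g_φ(f_θ(X)))

Each transformation can only lose information about the original input X. PGI's auxiliary reversible branch is designed to counteract that loss, so reliable gradient information still reaches every layer during training.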

Use Cases

Autonomous Vehicles: YOLOv9's precision in object detection helps vehicles navigate safely.

Retail: It can detect customer movements and queue lengths, optimizing the shopping experience.

Logistics: Enhances inventory management through accurate object detection.

Sports Analytics: Provides insights by tracking player movements.

Strengths

YOLOv9's primary strength lies in its architectural innovations, such as PGI and GELAN, which allow it to achieve high accuracy while remaining efficient in terms of computational resources. It successfully balances model complexity with performance, making it suitable for applications ranging from lightweight devices to performance-intensive tasks.

Limitations

As a cutting-edge model, YOLOv9 may require substantial resources for training and fine-tuning on specific tasks, posing a challenge for those with limited computational power. Additionally, the complexity of its architecture can present a steep learning curve for newcomers to the field.

Learning Type & Algorithmic Approach

YOLOv9 operates on a supervised learning paradigm, specifically tailored for real-time object detection tasks. Its architecture, featuring PGI and GELAN, focuses on minimizing information loss and improving gradient flow across the network, making it highly effective and efficient.

This model has proven to be a significant advancement in the field of object detection, offering a blend of efficiency, accuracy, and versatility. Its development not only highlights the ongoing evolution of the YOLO series but also underscores the potential for innovative solutions to longstanding challenges in computer vision.
