Deploying DINOv2 for Image Classification with Modelbit

By Michael Butler, ML Community Lead

Introduction

Meta AI's DINOv2 is a cutting-edge self-supervised learning model at the forefront of advances in computer vision. Because it trains without labels or annotations, DINOv2 learns general-purpose visual representations that transfer to a wide range of downstream tasks. In this article, we will walk through the steps to deploy DINOv2 as a REST API endpoint with Modelbit. First, a quick look at what makes DINOv2 so impressive.

DINOv2 can accomplish many tasks, such as:

  • Depth estimation that generalizes well across images.
  • Semantic segmentation without any fine-tuning, clustering image regions into object classes.
  • Instance retrieval to find specific objects or images across an entire data pool.
  • Dense matching to consistently map similar pixels from one image to another without supervision.
  • Sparse matching to compare features across images and match their most similar parts.

Here’s Meta's page on DINOv2 where you can view examples and demos of the capabilities listed above.

What Makes DINOv2 Useful?

One of DINOv2's most significant advantages is its versatility across computer vision tasks: unlike models built for a single purpose, one DINOv2 backbone supports every capability listed above, from depth estimation to sparse matching.


Overview of What You’ll Accomplish in This Tutorial

Before we get into the code, let's take a quick look at the solution you will build. The API you deploy will accept an image URL and return the model's classification results.

Once the API receives a URL to an image file, it will pass the image through your DINOv2 model deployed on Modelbit and return the class ID and label in JSON format.

This demo will utilize the pre-trained ImageNet weights from DINOv2's official GitHub. As we've mentioned, DINOv2 is exceptionally versatile: you can fine-tune it on your own dataset for a few epochs to adapt this deployment to your domain, as sketched below.
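
One lightweight way to do that is a linear probe: freeze the DINOv2 backbone and train only a small classification head on your labels. The sketch below assumes the environment set up later in this tutorial and uses random placeholder data; the "dinov2_vitb14_reg" name is one of the repo's torch.hub entrypoints.


import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Frozen DINOv2 backbone (features only, no classification head)
backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14_reg').eval()
for p in backbone.parameters():
    p.requires_grad = False

num_classes = 10  # placeholder: your domain's class count
head = nn.Linear(backbone.embed_dim, num_classes)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Placeholder data: swap in a DataLoader over your own labeled images,
# preprocessed with the ImageNet transforms shown later in this tutorial
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=4)

for epoch in range(5):  # a few epochs is often enough for a linear probe
    for batch_images, batch_labels in train_loader:
        with torch.no_grad():
            feats = backbone(batch_images)  # CLS-token features, shape (B, embed_dim)
        loss = loss_fn(head(feats), batch_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()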

Here's the game plan:

  1. Set up your environment in Colab.
  2. Load the DINOv2 PyTorch weights.
  3. Wrap a sample DINOv2 query in a function, ready to be deployed with Modelbit.
  4. Test your REST API endpoint.

💻 If you'd like to code along, open this Colab Notebook

Ready to dive in? Let's go! 🏊

Modelbit Overview

Modelbit is a lightweight platform that deploys any ML model to a production endpoint from any Python environment. Deploying small or large models is as simple as passing your inference function to “modelbit.deploy()”.
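
As a taste of that workflow, here is a toy sketch (the real DINOv2 deployment comes later in this tutorial):


import modelbit

mb = modelbit.login()  # authenticate from your notebook

# Any Python function can become an inference endpoint
def double(x: float) -> float:
    return x * 2

mb.deploy(double)  # 'double' becomes a versioned REST endpoint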

Here are the basics you need to know about Modelbit:

  • Deploy from any Python environment: Models can be deployed directly from Google Colab (or local Jupyter Notebooks), Hex, Deepnote, VS Code, or any Python environment.
  • Detect dependencies: Automatically detects which dependencies, libraries, and data your model needs and includes them in your model’s production Docker container.
  • Launch a REST API Endpoint: Your model will be callable as a RESTful API endpoint.
  • Git-based version control: Track and manage model iterations with Git repositories.
  • CI/CD integration: Integrate model updates and deployment into continuous integration and continuous delivery (CI/CD) pipelines like GitHub Actions and GitLab CI/CD.

Get started by installing the prerequisite libraries and setting up your environment!

🧑‍💻 Installation and Setup

As a prerequisite, you will need to install the proper packages for DINOv2. Run "pip install" with the "requirements.txt" file directly from the official GitHub repository.

Note: Run "apt-get update" and upgrade "pip" before installing your packages to ensure you pull the latest versions from the repositories. Upgrading "pip" is good practice because older versions can raise errors when resolving your packages' dependencies.


!apt-get update && apt-get upgrade -y
!pip3 install --upgrade pip

!pip3 install -r https://raw.githubusercontent.com/facebookresearch/dinov2/main/requirements.txt

!pip3 install modelbit dinov2

Next, it's time to choose which DINOv2 weights to load into the model. Weight size is a significant decision that depends on how much VRAM you have available.

For instance, if you're operating with free Colab instances, you can access NVIDIA T4 GPUs. These come with 16GB of VRAM, enough capacity to load DINOv2's larger weight classes for this project. You can switch your runtime to this GPU from the menu at the upper right of your Colab notebook.

How do you decide on the appropriate weight size for other GPUs? See this table, which provides the key considerations before loading your DINOv2 weights.

Also, for a comprehensive list of all the current weight class names, check Meta’s DINOv2 repository to see a detailed listing of each class size and select the most suitable weights for your requirements. DINOv2’s weights will automatically download if this is your first time using them.
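
If you'd like to enumerate the available weight classes programmatically before picking one, you can list the repository's torch.hub entrypoints directly:


import torch

# Print every model entrypoint exposed by the DINOv2 hub repo,
# e.g. dinov2_vits14, dinov2_vitb14_reg_lc, dinov2_vitl14, ...
print(torch.hub.list('facebookresearch/dinov2'))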

🔃 Download and Load DINOv2 Weights into Memory

Next, import the necessary dependencies for the walkthrough. This step lays the groundwork by ensuring access to all the required libraries, functions, and modules. We also grab the ImageNet class labels DINOv2 will use when returning its results. Then we check whether a GPU is available in our environment before finally loading the pre-trained weights:


import torch
import torchvision.transforms as T
import json
import urllib.request
import requests
from PIL import Image
from io import BytesIO

# Get ImageNet labels
imagenet_class_url = 'https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json'
imagenet_classes = json.loads(urllib.request.urlopen(imagenet_class_url).read())

# Set a device (GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the DINOv2 model: ViT-B/14 with registers and an ImageNet linear head.
# With more VRAM, you can swap in the giant variant instead:
# dinov2_model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg_lc').eval().to(device)
dinov2_model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14_reg_lc').eval().to(device)

🧪 Test DINOv2 Locally with a Sample Image

Now that you have loaded the DINOv2 weights, you can pass a preprocessed image to the model. To do this, simply download an image with "wget" or upload an image from your machine to your Colab directory.


!wget -O golden_retriever.jpg https://www.princeton.edu/sites/default/files/styles/crop_2048_ipad/public/images/2022/02/KOA_Nassau_2697x1517.jpg?itok=AuZckGYV

Next, you'll want to preprocess the image for DINOv2. Since the classification head was trained on ImageNet, we apply the standard ImageNet preprocessing:


# Open the image and ensure it has three RGB channels
image = Image.open('golden_retriever.jpg').convert('RGB')

transform = T.Compose([
    T.Resize(256, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

image = transform(image).to(device)

Now, we can pass the image through our model and get a class ID and label.


# Run inference without tracking gradients
with torch.no_grad():
    logits = dinov2_model(image.unsqueeze(0))

print(imagenet_classes[logits.argmax(-1).item()])

✅ Prepare a DINOv2 Image Classification Function

Before we upload our working code to Modelbit, we need to wrap all inference-related lines of code into a function. Then, we'll add code to accept an image via a URL and return a response from our API.

Below, we define a function aptly named "dinov2_classifier()". We have designed this function to accept a URL, "img_url", as a string. This URL points to the image we want to classify. Once classification finishes, the function returns a JSON-serializable dictionary with the results.


def dinov2_classifier(img_url):
    # Fetch the image from the URL
    response = requests.get(img_url)
    image = Image.open(BytesIO(response.content)).convert('RGB')

    # Preprocess the image with the standard ImageNet transforms
    transform = T.Compose([
        T.Resize(256, interpolation=T.InterpolationMode.BICUBIC),
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ])
    image = transform(image)

    # Move the image to the GPU if available
    image = image.to(device)

    # Run inference to get the class logits
    with torch.no_grad():
        logits = dinov2_model(image.unsqueeze(0))

    # Return the top class index and its ImageNet label
    class_id = logits.argmax(-1).item()
    return {'index': class_id, 'label': imagenet_classes[class_id]}

We can now run a local test since we've contained our inference code in a function. Simply pass a URL to "dinov2_classifier()".

Feel free to choose any online image that depicts one of the ImageNet classes.


dinov2_classifier('https://www.akc.org/wp-content/uploads/2020/07/Golden-Retriever-puppy-standing-outdoors.jpg')

Deploying DINOv2 to a REST API Endpoint

Now that we've verified it works locally, it's time to see how easy it is to deploy our code directly to Modelbit with minimal lines of code.

🔒 Authenticate Modelbit to Register Your Notebook Kernel

With the environment set up, you next need to authenticate Modelbit so it can securely connect to your kernel, ensuring only you and authorized users can access your files and metadata.

👉 NOTE: If you don’t have a Modelbit account, sign up here—we offer a free plan you can use to run this demo. 

Log into the "modelbit" service and create a development ("dev") or staging ("stage") branch to stage your deployment. Learn how to work with branches in the documentation.

If you cannot create a “dev” branch, you can use the default "main" branch for your deployment:


import modelbit

# Log into the 'modelbit' service using the development ("dev") branch
# Ensure you create a "dev" branch in Modelbit or use the "main" branch for your deployment
mb = modelbit.login(branch="dev")

The command should return a link to authenticate your kernel. Click the authentication link and follow the prompts.

Once authentication succeeds, you should see a confirmation screen, and your notebook kernel is registered with Modelbit.

📦 Deploy Your Inference Function with "modelbit.deploy()"

Finally, you are ready to deploy to Modelbit. When you call "mb.deploy()", a series of sophisticated operations execute behind the scenes, designed to streamline the deployment process:

  • Pushes the source code to your Modelbit workspace, marking the initiation of the deployment process.
  • Pickles the project variables, serializing them into a format suited to storage and easy reconstruction.
  • Automatically detects dependencies required by your application.
  • Containerizes model weights and helper files to cut down on the possibility of errors, saving valuable deployment time.
  • Spins up a REST API endpoint that replicates the development environment in production.

If you require any additional packages, there are other flags you can add to customize your runtime environment. For a deeper understanding of environment customization, explore the documentation here.
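
For instance, "mb.deploy()" accepts optional arguments such as "python_packages" and "system_packages" for pinning versions or adding OS-level libraries. The pins below are purely illustrative, and this demo does not need them:


mb.deploy(
    dinov2_classifier,
    python_packages=["torchvision==0.15.2"],  # illustrative version pin
    system_packages=["libgl1"],               # illustrative OS-level library
    require_gpu=True,
)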

To use DINOv2 in production as you did locally, explicitly require Modelbit to enable GPUs for the inference service. After deploying, you can turn GPUs on or off through the Modelbit dashboard.


mb.deploy(dinov2_classifier, require_gpu=True)

Running that snippet may take several minutes. If the deployment is successful, you should see a confirmation output in your notebook.

On the Modelbit dashboard, you should then see that the deployment has started the container build process.

Next, you might notice an issue building the environment on Modelbit. This happens because the following lines in "requirements.txt" cannot be installed during the initial deployment:


--extra-index-url https://pypi.nvidia.com
cuml-cu11

To add these requirements back, mirror the Modelbit deployment to a GitHub repo. Click the gear cog at the top right, go to "Git Settings," and in the "Connect to your git repo" section, click GitHub and follow the instructions there.

Once you complete the instructions, your GitHub repo will stay in sync with Modelbit: any change pushed to the repo is reflected in Modelbit. Add the two missing requirements to the repo's "requirements.txt" and wait for Modelbit to redeploy automatically. You can make this edit directly from the GitHub website on the deployment branch (in this case, "dev").

⚠️ Note: If you want to switch your Modelbit-linked repo to another GitHub repo, make sure to delete the deploy keys before deleting the repo.

Once the build is complete, you should see the API endpoint where you can access your DINOv2 deployment and the source files that "mb.deploy()" detected from your notebook environment. Ensure you copy your deployment endpoint from the Modelbit dashboard under “⚡API Endpoints”.

📩 Test the REST Endpoint with an Image URL

Once your deployment is ready, you can start using your API endpoint!

Test your endpoint from the command line using:


curl -X POST "https://ENTER_WORKSPACE_NAME.app.modelbit.com/v1/dinov2_classifier/dev/latest" \
  -d '{"data": "https://www.akc.org/wp-content/uploads/2020/07/Golden-Retriever-puppy-standing-outdoors.jpg"}'

Replace the "ENTER_WORKSPACE_NAME" placeholder with your workspace name.

You can also test your REST endpoint with Python by sending single or batch requests for classification. Use the "requests" package you imported earlier to POST a request to the API and use JSON to format the response to print nicely:


import json

requests.post(
    "https://ENTER_WORKSPACE_NAME.app.modelbit.com/v1/dinov2_classifier/dev/latest",
    headers={"Content-Type": "application/json"},
    data=json.dumps(
        {
            "data": "https://www.akc.org/wp-content/uploads/2020/07/Golden-Retriever-puppy-standing-outdoors.jpg"
        }
    ),
).json()

You should receive output similar to the following response:


# OUTPUT - DO NOT COPY

{'index': 207, 'label': 'Golden Retriever'}
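
The endpoint can also serve batch requests. Following Modelbit's batch convention, each input is sent as an "[id, input]" pair, and results come back keyed by the same IDs. Here's a two-image example (the cat image URL is just an illustration):


requests.post(
    "https://ENTER_WORKSPACE_NAME.app.modelbit.com/v1/dinov2_classifier/dev/latest",
    headers={"Content-Type": "application/json"},
    data=json.dumps({
        "data": [
            [1, "https://www.akc.org/wp-content/uploads/2020/07/Golden-Retriever-puppy-standing-outdoors.jpg"],
            [2, "https://upload.wikimedia.org/wikipedia/commons/3/3a/Cat03.jpg"],
        ]
    }),
).json()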

Nice! Next, go back to your Modelbit dashboard and inspect the API logs to monitor usage and track the endpoint outputs.

Perfect! You have now deployed a working classification service powered by Meta’s DINOv2 model. Run a few more tests with different images to understand the latency of your endpoint and whether it matches your production requirements.
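
A quick way to gauge that latency is to time a handful of requests from Python (a simple sketch; the URL list and sample size are placeholders):


import time
import requests

endpoint = "https://ENTER_WORKSPACE_NAME.app.modelbit.com/v1/dinov2_classifier/dev/latest"
test_urls = [
    "https://www.akc.org/wp-content/uploads/2020/07/Golden-Retriever-puppy-standing-outdoors.jpg",
    # ...add more image URLs here
]

# Time each round trip, including network overhead and model inference
for url in test_urls:
    start = time.time()
    result = requests.post(endpoint, json={"data": url}).json()
    print(f"{time.time() - start:.2f}s -> {result}")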

If it does, you are likely ready to merge the deployment into the main branch or integrate it with your CI/CD workflow!

Conclusion

DINOv2 is a self-supervised learning model that has introduced a new approach to computer vision. Unlike traditional models that depend heavily on labels or annotations, DINOv2 can learn directly from diverse inputs.

More vision-based applications will likely drive demand for general-purpose vision models like DINOv2. We see this demand with today's language models serving multiple uses, from coding to general knowledge retrieval. The next frontier is vision! By integrating DINOv2 with efficient deployment platforms like Modelbit, we are stepping towards a future where AI can play a more significant role with minimal setup time.

Are you curious about how DINOv2 works at a technical level? Feel free to check out these papers:

Want more tutorials for deploying ML models to production?

Deploy Custom ML Models to Production with Modelbit

Join other world-class machine learning teams deploying customized machine learning models to REST endpoints.
Get Started for Free