Deploying the Depth Anything Model to a REST API Endpoint for Depth Detection

Michael Butler, ML Community Lead

Introduction to Depth Anything

Unlike traditional models that interpret images in two dimensions, Depth Anything adds a crucial third dimension: depth perception. This capability allows machines to understand not only what is in an image but also how far away each element is.

Depth perception is essential for a variety of applications, from autonomous driving to advanced robotics. For example, in autonomous driving, the ability to accurately gauge distances can mean the difference between safely navigating the road and an accident. In robotics, understanding depth enables more precise interactions with the environment, such as picking up objects or avoiding obstacles.

The Depth Anything model uses a DINOv2 vision transformer as its backbone (DINOv2-small in the checkpoint loaded later in this tutorial) to estimate depth from a single image. For each pixel, the model predicts a depth value, turning a flat 2D image into a depth map that describes the spatial layout of the scene.

Deploying the Depth Anything model involves using pre-trained models and an image processor from Hugging Face Transformers. This setup simplifies the process of loading models, preprocessing inputs, and postprocessing outputs, making it accessible even to those who are not experts in machine learning.

In essence, the Depth Anything model transforms simple images into rich, three-dimensional data, opening up new possibilities for machine interaction with the physical world.

In this tutorial, we’ll walk through the necessary steps to build a model with Depth Anything in a notebook and deploy it to a REST API endpoint using Modelbit.

🧑‍💻 Installations and Set Up for Model Deployment

We recommend creating your own notebook and following along step by step. A pre-built Colab notebook is also available if you prefer to start from that.

Let's start by installing 🤗 Transformers and Modelbit.

!pip install --upgrade transformers modelbit

Load and Process Image for Depth Detection

We'll perform inference on the familiar cat and dog image.

from PIL import Image
import requests

# Set this to the URL of the image you want to run inference on
url = ''
image = Image.open(requests.get(url, stream=True).raw)

Using the Pipeline API for Model Deployment

The Pipeline API in Hugging Face Transformers simplifies the process of performing inference with pre-trained models. It handles model loading, input preprocessing, and output postprocessing, allowing users to focus on their specific task.

Note: The pipeline API doesn't leverage a GPU by default; you need to pass the device argument for that. See the collection of Hugging Face-compatible checkpoints.

from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf")
result = pipe(image)
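The note above about the device argument can be sketched as follows. This is a minimal example, assuming PyTorch is the backend; "pick_device" and "build_depth_pipeline" are hypothetical helper names, not part of Transformers. The "device" parameter of "pipeline" accepts a device string such as "cuda:0" or "cpu".

```python
def pick_device() -> str:
    """Return "cuda:0" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so fall back to CPU
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def build_depth_pipeline():
    """Create the depth-estimation pipeline on the selected device."""
    # Imported lazily: building the pipeline downloads the checkpoint.
    from transformers import pipeline

    return pipeline(
        task="depth-estimation",
        model="LiheYoung/depth-anything-base-hf",
        device=pick_device(),
    )
```

Calling "build_depth_pipeline()(image)" returns a dictionary. In current Transformers releases it should contain a "predicted_depth" tensor and a "depth" PIL image that is ready to display.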

Here we load the Depth Anything model directly, without the pipeline. This checkpoint uses a DINOv2-small backbone; checkpoints with base and large backbones are also available for better accuracy. We also load the corresponding image processor.

from transformers import AutoImageProcessor, AutoModelForDepthEstimation

processor = AutoImageProcessor.from_pretrained("nielsr/depth-anything-small")
model = AutoModelForDepthEstimation.from_pretrained("nielsr/depth-anything-small")

Let's prepare the image for the model using the image processor.

pixel_values = processor(images=image, return_tensors="pt").pixel_values

Forward Pass for Depth Estimation

Next, we perform a forward pass. Since this is inference, we wrap the call in the "torch.no_grad()" context manager, which disables gradient tracking and saves memory (we don't need to compute any gradients).

import torch

with torch.no_grad():
  outputs = model(pixel_values)
  predicted_depth = outputs.predicted_depth

Visualize Depth Detection Results

Finally, let's visualize the results! The opencv-python package has a handy "applyColorMap()" function which we can leverage.

import cv2
import numpy as np

h, w = image.size[::-1]

depth = torch.nn.functional.interpolate(predicted_depth[None], (h, w), mode='bilinear', align_corners=False)[0, 0]

# Save the raw (unnormalized) depth map as a 16-bit image
raw_depth = Image.fromarray(depth.cpu().numpy().astype('uint16'))
raw_depth.save("predicted_depth.png")

depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
depth = depth.cpu().numpy().astype(np.uint8)
colored_depth = cv2.applyColorMap(depth, cv2.COLORMAP_INFERNO)[:, :, ::-1]
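The min-max scaling used above divides by (max - min), which is zero when every pixel has the same depth. The sketch below shows the same normalization on a plain NumPy array with a guard for that case; "to_uint8_depth" is a hypothetical helper name introduced for illustration.

```python
import numpy as np


def to_uint8_depth(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a depth map to the 0-255 range and convert it to uint8."""
    depth = depth.astype(np.float32)
    span = float(depth.max() - depth.min())
    if span == 0.0:
        # A constant depth map has no contrast; return zeros instead of dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - depth.min()) / span * 255.0
    return scaled.astype(np.uint8)  # truncates the fractional part


sample = np.array([[0.0, 5.0], [10.0, 2.5]])
normalized = to_uint8_depth(sample)
flat = to_uint8_depth(np.full((2, 2), 3.0))
```

The nearest and farthest pixels map to the two ends of the 0-255 range, which is the input range "applyColorMap()" expects for an 8-bit image.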


Inference Function for Generating Predicted Depth

The "get_depth_any_dino_v2_backbone" function, decorated with @cache, is our key player. This function uses "snapshot_download" to fetch the specific backbone.

The use of "@cache" is a clever optimization; it ensures that once the model and processor are loaded, they are stored in memory. This significantly speeds up future calls to this function, as it avoids reloading the model and processor from scratch each time, making it ideal for deployments.

from functools import cache
from huggingface_hub import snapshot_download

@cache
def get_depth_any_dino_v2_backbone():
    model_path = snapshot_download(repo_id="nielsr/depth-anything-small")
    processor = AutoImageProcessor.from_pretrained(model_path)
    model = AutoModelForDepthEstimation.from_pretrained(model_path)
    return model, processor
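To illustrate the effect of "@cache" in isolation, here is a minimal sketch that uses a stand-in for the expensive load. "load_resources" and "load_count" are hypothetical names, and no real model is involved.

```python
from functools import cache

load_count = 0  # counts how many times the function body actually runs


@cache
def load_resources():
    """Stand-in for an expensive model and processor load."""
    global load_count
    load_count += 1
    return {"model": object(), "processor": object()}


first = load_resources()   # runs the body and stores the result
second = load_resources()  # served from the cache; the body is skipped
```

After both calls, "load_count" is 1 and "first is second" is True: the second call returns the very same objects without reloading anything.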

import io
import base64

def depth_any_inference(image_url):
  model, processor = get_depth_any_dino_v2_backbone()
  print("Model Backbone loaded")
  image = Image.open(requests.get(image_url, stream=True).raw)
  print("Image url loaded")
  pixel_values = processor(images=image, return_tensors="pt").pixel_values
  with torch.no_grad():
    outputs = model(pixel_values)
    predicted_depth = outputs.predicted_depth

  depth = torch.nn.functional.interpolate(predicted_depth.unsqueeze(1), size=image.size[::-1],
                                          mode="bicubic", align_corners=False,)
  output = depth.squeeze().cpu().numpy()
  formatted = (output * 255 / np.max(output)).astype("uint8")
  formatted_depth = Image.fromarray(formatted)
  # Save the image as a PNG into an in-memory BytesIO buffer
  image_bytes = io.BytesIO()
  formatted_depth.save(image_bytes, format='PNG')
  image_bytes = image_bytes.getvalue()

  # Encode the image bytes using base64
  base64_encoded_image = base64.b64encode(image_bytes).decode('utf-8')

  # Create a dictionary with the base64 encoded image
  response = {
      'image': base64_encoded_image
  }

  return response


🚢 Deploy Depth Anything to a REST API Endpoint

Deploying your Depth Anything model to a REST API endpoint makes it accessible for real-time applications.

🔐 Log into Modelbit
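A minimal sketch of the login and deploy step, assuming the standard Modelbit notebook flow of "modelbit.login()" followed by "mb.deploy(...)". The wrapper name "deploy_depth_endpoint" is hypothetical, and your workspace may need extra arguments (such as pinned package versions) that are not shown here.

```python
def deploy_depth_endpoint(inference_fn):
    """Log in to Modelbit and deploy inference_fn as a REST endpoint."""
    # Imported lazily so this helper can be defined without the package installed.
    import modelbit

    mb = modelbit.login()    # prompts for authentication from the notebook
    mb.deploy(inference_fn)  # packages the function and its dependencies
    return mb
```

In the notebook, this would be called as "deploy_depth_endpoint(depth_any_inference)".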


📩 Test the REST Endpoint with a Single Image

You can test your REST Endpoint by sending single or batch production images to it for inference.

Use the requests package to POST a request to the API and use json to format the response to print nicely:

⚠️ Replace the "ENTER_WORKSPACE_NAME" placeholder with your workspace name.

import json
import requests

# POST the image URL to your endpoint URL and parse the JSON response
requests.post("",
              data=json.dumps({"data": [""]})).json()
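The inference function returns the depth map as a base64-encoded PNG, so the client has to decode it before use. The sketch below does that with the standard library only; "decode_depth_png" is a hypothetical helper name. It expects the dictionary with the "image" key that "depth_any_inference" returns. If your endpoint wraps the return value (for example under a "data" key, which is an assumption about the response shape), pass the inner dictionary.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_depth_png(result: dict) -> bytes:
    """Decode the base64 'image' field into raw PNG bytes."""
    png_bytes = base64.b64decode(result["image"])
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload is not a PNG image")
    return png_bytes
```

The returned bytes can be written to a ".png" file or opened with Pillow via "Image.open(io.BytesIO(png_bytes))".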

You can also test your endpoint from the command line using:

curl -s -XPOST "" -d '{"data": [""]}' | json_pp

By following these steps, you'll be able to deploy the Depth Anything model for depth detection via a REST API endpoint, making it accessible for various applications that require real-time depth estimation.
