Depth Anything Model Guide

Getting Started with Modelbit

Modelbit is an MLOps platform that lets you train and deploy any ML model, from any Python environment, with a few lines of code.

Table of Contents

Getting Started
Overview
Use Cases
Strengths
Limitations
Learning Type


Getting Started

Model Documentation

Deploying the Depth Anything Model to a REST API Endpoint for Depth Estimation

Installation and Setup

!pip install --upgrade git+ modelbit

Load image

from PIL import Image
import requests

url = ''
image = Image.open(requests.get(url, stream=True).raw)

Pipeline API

The Pipeline API in Hugging Face Transformers is a convenient tool that simplifies the process of performing inference with pre-trained models. It abstracts away the underlying complexities of model loading, input preprocessing, and output postprocessing, allowing users to focus on their specific task.

Note: The pipeline API does not use a GPU by default; pass the device argument to run on one. See the collection of Hugging Face-compatible checkpoints.

from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-base-hf")
result = pipe(image)
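The note about the device argument can be sketched as follows. This is a minimal sketch, not the guide's own code: the helper names `choose_device` and `build_depth_pipeline` are hypothetical, and the sketch assumes the integer convention for `device` (0 for the first GPU, -1 for CPU).

```python
def choose_device(cuda_available: bool) -> int:
    """Map GPU availability to a pipeline device index (0 = first GPU, -1 = CPU)."""
    return 0 if cuda_available else -1


def build_depth_pipeline(model_name: str = "LiheYoung/depth-anything-base-hf"):
    """Create the depth-estimation pipeline on a GPU when one is visible.

    Heavy imports are kept inside the function so that defining it is cheap.
    """
    import torch
    from transformers import pipeline

    device = choose_device(torch.cuda.is_available())
    return pipeline(task="depth-estimation", model=model_name, device=device)
```

Calling `build_depth_pipeline()` and then `pipe(image)` should behave like the snippet above; the returned dictionary is expected to hold a `predicted_depth` tensor and a `depth` PIL image.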

Alternatively, the model can be used without the pipeline. Here we load the Depth Anything model, which uses a DINOv2-small backbone. Checkpoints with base and large backbones were also released; they perform better. We also load the corresponding image processor.

from transformers import AutoImageProcessor, AutoModelForDepthEstimation

processor = AutoImageProcessor.from_pretrained("nielsr/depth-anything-small")
model = AutoModelForDepthEstimation.from_pretrained("nielsr/depth-anything-small")

Let's prepare the image for the model using the image processor.

pixel_values = processor(images=image, return_tensors="pt").pixel_values

Forward pass

Next we perform a forward pass. Since this is inference, we wrap the call in the "torch.no_grad()" context manager to save memory (no gradients need to be computed).

import torch

with torch.no_grad():
  outputs = model(pixel_values)
  predicted_depth = outputs.predicted_depth


Finally, let's visualize the results! The opencv-python package has a handy applyColorMap() function which we can leverage.

import cv2
import numpy as np

h, w = image.size[::-1]

depth = torch.nn.functional.interpolate(predicted_depth[None], (h, w), mode='bilinear', align_corners=False)[0, 0]

raw_depth = Image.fromarray(depth.cpu().numpy().astype('uint16'))
raw_depth.save("predicted_depth.png")

depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0
depth = depth.cpu().numpy().astype(np.uint8)
colored_depth = cv2.applyColorMap(depth, cv2.COLORMAP_INFERNO)[:, :, ::-1]
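The min-max scaling step above divides by zero when every pixel has the same depth. A minimal sketch of the same normalization with that case guarded is shown below; the helper name `normalize_depth_to_uint8` and the choice to return an all-zero image for a flat map are assumptions, not part of the original code.

```python
import numpy as np


def normalize_depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Scale a depth map to the 0-255 range expected by cv2.applyColorMap."""
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Flat depth map: nothing to stretch, so return a black image.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min) * 255.0
    return scaled.astype(np.uint8)


toy_depth = np.array([[0.0, 5.0], [10.0, 2.5]])
toy_gray = normalize_depth_to_uint8(toy_depth)
flat_gray = normalize_depth_to_uint8(np.full((2, 2), 3.0))
```

With a tensor from the model, the input would be something like `depth.cpu().numpy()` before the color map is applied.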


Inference Function for Generating Predicted Depth

The "get_depth_any_dino_v2_backbone" function, decorated with @cache, is our key player. This function uses "snapshot_download" to fetch the specific backbone.

The use of @cache is a clever optimization; it ensures that once the model and processor are loaded, they are stored in memory. This significantly speeds up future calls to this function, as it avoids reloading the model and processor from scratch each time, making it ideal for deployments.
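To see the effect of @cache in isolation, here is a minimal sketch that uses a stand-in loader instead of the real model. The name `load_expensive_resource` and the call counter are hypothetical and exist only for illustration.

```python
from functools import cache

load_calls = 0  # counts how often the function body actually runs


@cache
def load_expensive_resource():
    """Stand-in for a slow model download and load."""
    global load_calls
    load_calls += 1
    return {"model": "stand-in model", "processor": "stand-in processor"}


first = load_expensive_resource()
second = load_expensive_resource()  # served from the cache, body not re-run
```

Both calls return the same object, and the body runs only once per process. In a deployment this means the first request pays the loading cost and later requests reuse the loaded model.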

from functools import cache
from huggingface_hub import snapshot_download

@cache
def get_depth_any_dino_v2_backbone():
    model_path = snapshot_download(repo_id="nielsr/depth-anything-small")
    processor = AutoImageProcessor.from_pretrained(model_path)
    model = AutoModelForDepthEstimation.from_pretrained(model_path)
    return model, processor

def depth_any_inference(image_url):
    model, processor = get_depth_any_dino_v2_backbone()
    print("Model Backbone loaded")
    image = Image.open(requests.get(image_url, stream=True).raw)
    print("Image url loaded")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    with torch.no_grad():
        outputs = model(pixel_values)
        predicted_depth = outputs.predicted_depth
    print(f"Predicted Depth {predicted_depth}")
    return predicted_depth
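The function above returns a raw tensor. REST responses are JSON, so it may be necessary to convert the result to plain Python types before returning it. The sketch below shows one way to do that, assuming the endpoint should return JSON-serializable values; the helper name `depth_to_payload` and the payload fields are hypothetical.

```python
import json

import numpy as np


def depth_to_payload(depth: np.ndarray, decimals: int = 3) -> dict:
    """Convert a depth array into a dictionary of plain Python types."""
    depth = np.asarray(depth, dtype=np.float32)
    return {
        "shape": list(depth.shape),
        "min": round(float(depth.min()), decimals),
        "max": round(float(depth.max()), decimals),
        "depth": np.round(depth, decimals).tolist(),  # nested lists of floats
    }


toy = np.array([[1.0, 2.0], [3.0, 4.5]], dtype=np.float32)
payload = depth_to_payload(toy)
encoded = json.dumps(payload)  # succeeds because only plain types remain
```

Inside the inference function, the input could be obtained with something like `predicted_depth.squeeze(0).cpu().numpy()`. Full-resolution depth maps produce large responses, so rounding or downsampling may be worth considering.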


Deploy Depth Anything to a REST API Endpoint

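Log into Modelbit

import modelbit as mb

# Authenticate this session with your Modelbit workspace
mb.login()

# Deploy the Depth Anything inference function to Modelbit
mb.deploy(depth_any_inference)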

Test the REST Endpoint with a Single Image

You can test your REST Endpoint by sending single or batch production images to it for inference.

Use the requests package to POST a request to the API and use json to format the response to print nicely:

⚠️ Replace the "ENTER_WORKSPACE_NAME" placeholder with your Modelbit workspace name.

import json
import requests

requests.post("https://",
              data=json.dumps({"data": [""]})).json()
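Since the endpoint accepts single or batch requests, the two request bodies can be sketched as below. This assumes the single-request shape shown above and assumes that batch requests take a list of `[id, argument]` rows; check the batch shape against the Modelbit documentation before relying on it. The helper names and the example.com URLs are hypothetical.

```python
import json


def build_single_payload(image_url: str) -> str:
    """Request body for one image, matching the shape used above."""
    return json.dumps({"data": [image_url]})


def build_batch_payload(image_urls: list[str]) -> str:
    """Request body for several images (assumed shape: one [id, url] row each)."""
    rows = [[i, url] for i, url in enumerate(image_urls, start=1)]
    return json.dumps({"data": rows})


single_body = build_single_payload("https://example.com/a.jpg")
batch_body = build_batch_payload(
    ["https://example.com/a.jpg", "https://example.com/b.jpg"]
)
```

Either string can be passed as the `data` argument of the POST request.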

You can also test your endpoint from the command line using:

curl -s -XPOST "" -d '{"data": [""]}' | json_pp

⚠️ Replace the "ENTER_WORKSPACE_NAME" placeholder with your Modelbit workspace name.

Model Overview

Depth Anything is a state-of-the-art model in the field of monocular depth estimation, developed to address the challenges associated with understanding 3D structures from single 2D images. This model stands out due to its unique approach to utilizing unlabeled data, significantly enhancing its depth perception capabilities. Unlike traditional models, Depth Anything does not rely on complex new technical modules; instead, it focuses on scaling up datasets and improving data coverage, which in turn reduces generalization errors and enhances model robustness.

Use Cases

Depth Anything has significant applications in fields like autonomous driving, 3D modeling, and augmented reality. Its superior depth estimation capabilities make it particularly useful in scenarios where understanding the spatial layout from a single viewpoint is crucial. Moreover, the model's versatility is highlighted through its improved depth-conditioned ControlNet, making it beneficial for dynamic scene understanding and video editing.


Strengths

The primary strength of Depth Anything lies in its exceptional ability to perform monocular depth estimation by leveraging large-scale unlabeled datasets. This enables the model to achieve state-of-the-art performance in both relative and absolute depth estimation. The model’s training approach and architecture allow it to outperform predecessors significantly in zero-shot evaluations and establish new benchmarks when fine-tuned on specific datasets like NYUv2 and KITTI.


Limitations

While Depth Anything marks a significant improvement in depth estimation, its reliance on large-scale data might pose challenges in scenarios with limited computational resources or specific privacy constraints. Additionally, while it advances monocular depth estimation, there might be limitations in extremely diverse or novel environments not represented in the training data.

Learning Type & Algorithmic Approach

Depth Anything utilizes a semi-supervised learning approach, capitalizing on both labeled and unlabeled data. The model employs a novel training strategy that includes pseudo-labeling of unlabeled images and strong data augmentation techniques. This approach helps in overcoming the limitations of traditional supervised learning methods and enables the model to adapt to a wide variety of visual domains.
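The pseudo-labeling idea described above can be illustrated with a toy sketch. It is not the model's actual training code or loss. It assumes a teacher that labels clean unlabeled images and a student that must match those labels on strongly perturbed copies. All function names, the brightness-based "depth", and the L1 loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def teacher_predict(images: np.ndarray) -> np.ndarray:
    """Stand-in teacher: one 'depth' value per image (its mean brightness)."""
    return images.mean(axis=(1, 2))


def strong_augment(images: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Toy strong perturbation: random brightness scaling plus pixel noise."""
    scale = rng.uniform(0.5, 1.5, size=(images.shape[0], 1, 1))
    noise = rng.normal(0.0, 0.1, size=images.shape)
    return np.clip(images * scale + noise, 0.0, 1.0)


def student_predict(images: np.ndarray, weight: float) -> np.ndarray:
    """Stand-in student with a single learnable weight."""
    return weight * images.mean(axis=(1, 2))


def semi_supervised_loss(labeled, labels, unlabeled, weight, rng) -> float:
    """Supervised L1 loss plus L1 loss against teacher pseudo-labels."""
    supervised = np.abs(student_predict(labeled, weight) - labels).mean()
    pseudo_labels = teacher_predict(unlabeled)   # teacher sees clean images
    perturbed = strong_augment(unlabeled, rng)   # student sees perturbed ones
    unsupervised = np.abs(student_predict(perturbed, weight) - pseudo_labels).mean()
    return float(supervised + unsupervised)


labeled = rng.uniform(0.0, 1.0, size=(4, 8, 8))
labels = teacher_predict(labeled)  # pretend ground truth for the toy example
unlabeled = rng.uniform(0.0, 1.0, size=(16, 8, 8))
loss = semi_supervised_loss(labeled, labels, unlabeled, weight=1.0, rng=rng)
```

The key design point is the asymmetry: pseudo-labels come from clean inputs, while the student is trained on perturbed inputs, which pushes it to learn features that survive the perturbation.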
