Deploying a ResNet-50 Image Classification Model to a REST API Endpoint with Modelbit

Michael Butler

You have selected a ResNet flavor (maybe ResNet-50? 😉) for your image classification use case, fine-tuned a few layers, optimized the hyperparameters, and rigorously evaluated model performance. 

You now have a high-performance ML model and are prepared to deploy it to a production endpoint. Typically, the tasks associated with deploying a computer vision model range from setting up and managing the infrastructure to converting the ResNet model into a scalable service, ensuring security, and handling the production load. Deploying customized vision ML can seem like a task requiring multiple engineers.

Luckily, it does not have to be. In this article, you will learn how to quickly deploy a ResNet-50 model directly from your notebook environment as a REST endpoint available in real-time and for batch requests with Modelbit.

By the end of the article, you should have deployed a ResNet-50 model as a REST endpoint with a few lines of code. Infrastructure, containerization, security, and scalability—all managed for you to serve an image classification model in real-time.

Here’s an overview of the solution you will build in this article:

Let’s delve right in! 🏊

What Modelbit is: Fast ML Model Deployment

We’ll be using Modelbit to deploy this model, so it’s worth quickly mentioning what Modelbit is!

Modelbit is a lightweight platform designed to make deploying any ML model to a production endpoint fast and simple. With the ability to deploy models from anywhere, it makes deploying your custom ML model as simple as passing an inference function to “modelbit.deploy()”.

Modelbit will deploy your model to an isolated container behind a REST API hosted on serverless infrastructure that scales up and down automatically.

Here are the basics you need to know about Modelbit:

  • Deployment from any Python environment: Models can be deployed directly from Jupyter notebooks, Hex, Deepnote, and VS Code.
  • Dependency detection: Unlike traditional deployment strategies that require manual tracking of dependencies, Modelbit intelligently detects which dependencies, libraries, and data your model needs and includes them in your model’s production Docker container.
  • REST API Endpoints: Your models will be callable as RESTful API endpoints.
  • Git-based version control: Track and manage model iterations leveraging Git repositories.
  • CI/CD integration: Seamlessly integrate model updates and deployment into continuous integration and continuous delivery (CI/CD) pipelines like GitHub Actions and GitLab CI/CD.

By significantly reducing the complexities of deploying ML models to production, Modelbit allows teams to focus more on building and refining high-performance models.

For detailed information on the "mb.deploy()" command, refer to Modelbit's deployment documentation here.

Build the ResNet-50 Model Inference Workflow

Before diving into Modelbit’s features, let’s start with understanding a simple inference code snippet of a pre-trained ResNet-50 model from PyTorch’s model zoo.

Want to follow along? You can find the complete code, including all the imports you need to set up your environment, in this Colab notebook.

💻 Select your device

First, ensure you select the correct device context for the model and runtime. In this article, you will use the CPU for your development and inference runtimes.

import torch
device = torch.device("cpu")

🔃 Load the pre-trained model from PyTorch to memory

Next, load the pre-trained ResNet-50 model from the PyTorch model zoo to memory:

from torchvision import models, transforms
resnet50 = models.resnet50(pretrained=True)

Once the model is loaded in, ensure you’re using the CPU by using ".to(device)". 

Also, set the model to evaluation mode.

resnet50 = resnet50.to(device)  # Move the model to the selected device
resnet50.eval() # Set the model to evaluation mode

The ".eval()" method switches dropout and batch normalization layers to their inference behavior, so the same input always produces the same output. That consistency is what you want from a reliable production service with reproducible results. Note that ".eval()" does not turn off gradient tracking; that is the job of "torch.no_grad()", which you will use at inference time to save memory and compute.

Deterministic behavior also makes the model easier to reason about in production, which helps when you debug issues or monitor performance.
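To make the effect of evaluation mode concrete, here is a minimal sketch using a standalone dropout layer. The layer and tensor names are illustrative and not part of the tutorial's code; torchvision's ResNet-50 relies on batch normalization rather than dropout, but the mode switch works the same way.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the training-mode randomness repeatable

# A tiny stand-in layer used only for illustration
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()
train_out = dropout(x)  # some values zeroed, survivors scaled by 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)   # dropout is a no-op in evaluation mode

# eval() alone does not disable autograd; torch.no_grad() does that separately
w = torch.ones(8, requires_grad=True)
tracked = (eval_out * w).sum()
with torch.no_grad():
    untracked = (eval_out * w).sum()

print(train_out)
print(eval_out)
print(tracked.requires_grad, untracked.requires_grad)
```

In evaluation mode the output equals the input on every call, while gradient tracking is only switched off inside the "torch.no_grad()" block.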

🏷️ Download ImageNet Labels

Pass a sample image for the model to classify and provide the necessary class labels for it to return a human-readable prediction.

Download and initialize our labels from an open-source JSON file:

import requests
labels = requests.get(LABELS_URL).json()  # LABELS_URL: URL of the ImageNet class labels JSON

🖼️ Ingest and preprocess the sample image

Next, define your preprocessing code:

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

The preprocessing pipeline resizes the image so its shorter side is 256 pixels, center crops it to 224x224 pixels, converts it to a tensor, and normalizes each channel. These steps are standard for models pre-trained on ImageNet, like this one.
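The normalization step is plain per-channel arithmetic. The sketch below reproduces it with NumPy on a random stand-in image, using the same mean and standard deviation values as the tutorial. The function names "normalize" and "denormalize" are hypothetical helpers for illustration, not torchvision APIs.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def normalize(image_hwc):
    """Scale uint8 HWC pixels to [0, 1], standardize per channel, return CHW."""
    scaled = image_hwc.astype(np.float32) / 255.0
    standardized = (scaled - MEAN) / STD  # broadcasts over the channel axis
    return standardized.transpose(2, 0, 1)


def denormalize(tensor_chw):
    """Invert normalize(): back to HWC with values clipped to [0, 1]."""
    hwc = tensor_chw.transpose(1, 2, 0)
    return np.clip(hwc * STD + MEAN, 0, 1)


# Random pixels standing in for a cropped 224x224 RGB image
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

chw = normalize(fake_image)
restored = denormalize(chw)
print(chw.shape)
```

The inverse operation is worth knowing because it is what you need whenever you want to display a normalized tensor as an image again.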

Pass the sample image through the preprocessor and add a batch dimension:

from PIL import Image

image_path = "samoyed.jpg"
img = Image.open(image_path).convert("RGB")

input_tensor = preprocess(img)
input_batch = input_tensor.unsqueeze(0)

Great! You have successfully parsed an image into the correct data format to be used by our model.

🔮 Calling inference and grabbing the results

Now that everything is set, send the image through ResNet-50 to see if it works. Simply pass the image through with the snippet below:

with torch.no_grad():
    output = resnet50(input_batch)

_, predicted_idx = torch.max(output, 1)
print(f"Predicted class index: {predicted_idx.item()}")
print(f"Predicted class label: {labels[predicted_idx.item()]}")

In the snippet above, "torch.no_grad()" disables gradient calculations during inference for better performance. "torch.max(output, 1)" returns the highest score and its index along the class dimension; that index is then used to look up and print the class label for the input image.
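The model's raw outputs are unnormalized scores (logits), so the top index alone tells you nothing about confidence. If you also want probabilities or the top few candidates, one option is to apply a softmax and sort. This is a NumPy sketch on toy values; the function "top_k" and the toy labels are assumptions for illustration and are not part of the tutorial's code.

```python
import numpy as np


def top_k(logits, labels, k=5):
    """Return the k most likely (label, probability) pairs for one logit vector."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    best = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(labels[i], float(probs[i])) for i in best]


# Toy values standing in for the 1,000 ImageNet logits and labels
toy_labels = ["cat", "dog", "fox", "owl"]
toy_logits = np.array([1.0, 3.0, 0.5, 2.0])

ranked = top_k(toy_logits, toy_labels, k=2)
print(ranked)
```

With a real model output, you would pass the logits of a single image (for example, the first row of the output batch converted to a NumPy array) together with the downloaded labels list.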

Wrapping the Inference Function

Now that you have the working code in PyTorch, it’s time to prepare the function for deployment in Modelbit. 

To prepare the model for upload to Modelbit, add a helper function for logging purposes. Every time a sample image is sent to the model you will deploy, Modelbit will log the image to your dashboard:

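import matplotlib.pyplot as plt
import numpy as np

def display_image(inp, input_batch, predicted_label):
    # De-normalize the image for display
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = input_batch[0].numpy().transpose((1, 2, 0))  # CHW tensor -> HWC array
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)

    fig = plt.figure(figsize=(16, 4))
    # Plot the image and title it with the prediction
    # (plotting calls reconstructed; see the notebook for the exact ones)
    plt.imshow(inp)
    plt.title(f"Predicted: {predicted_label}")
    plt.axis("off")

    return fig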

Now, wrap the code logic required for a single image inference into a function. This will involve passing in an image, preprocessing the image, calling inference, and returning the results. 

Here is what that might look like after making the necessary revisions:

from io import BytesIO
from PIL import Image

def resnet_inference(img_url):
    # Fetch the image from the URL and decode the raw bytes
    response = requests.get(img_url)
    img = Image.open(BytesIO(response.content)).convert("RGB")

    # Preprocessing code
    preprocess = transforms.Compose([
        transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Pass the image for preprocessing and reshape to add batch dimension
    input_tensor = preprocess(img)
    input_batch = input_tensor.unsqueeze(0)

    # Predict the class of the image
    with torch.no_grad():  # no_grad ensures the gradients are not calculated in prod
        output = resnet50(input_batch)

    _, predicted_idx = torch.max(output, 1)

    # Snippet to log image to Modelbit with the helper func
    pred_img = display_image(
        img, input_batch, predicted_label=labels[predicted_idx.item()]
    )
    mb.log_image(pred_img)  # show the image and its predicted label in Modelbit logs

    return {"index": predicted_idx.item(), "label": labels[predicted_idx.item()]}

In the revised script, you can see a new function, "resnet_inference()", where given an "img_url" as a parameter, the image is fetched, preprocessed, and passed to the pre-trained model to predict the class of the image.
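One detail in the return statement is easy to overlook: the call to ".item()" converts the prediction to a plain Python integer. That matters because a REST endpoint returns its result as JSON, and array or tensor scalar types are not JSON serializable. The sketch below shows the difference, using a NumPy integer as a stand-in for the tensor element; the index and label values are illustrative only.

```python
import json

import numpy as np

# Stand-in for the value held by a one-element prediction tensor
predicted = np.int64(258)

try:
    json.dumps({"index": predicted})
    raw_is_serializable = True
except TypeError:
    # The standard json module does not know how to encode NumPy integers
    raw_is_serializable = False

# .item() yields a built-in int, which serializes without trouble
payload = json.dumps({"index": predicted.item(), "label": "Samoyed"})

print(raw_is_serializable)
print(payload)
```

Returning only built-in types (int, float, str, list, dict) from your inference function keeps the response easy to serialize and easy to consume from any client.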

The only difference in code logic between the previous script and our revised script is how the image is obtained. In this version, instead of loading the image from disk in your notebook, it is loaded directly from a URL using the "requests" library. It also logs the image after inference to your Modelbit dashboard. 
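Loading from a URL means the image arrives as raw bytes rather than as a file on disk. A common way to handle that is to wrap the bytes in an in-memory buffer before handing them to Pillow. The sketch below demonstrates the pattern without a network call by generating the bytes locally; "decode_image" and "fake_response_content" are hypothetical names used only for illustration.

```python
from io import BytesIO

from PIL import Image


def decode_image(raw_bytes):
    """Decode raw image bytes (such as an HTTP response body) into an RGB image."""
    # convert("RGB") guarantees three channels, which the three-value
    # normalization mean and std expect, even for grayscale or RGBA inputs
    return Image.open(BytesIO(raw_bytes)).convert("RGB")


# Build PNG bytes in memory to stand in for the body of an HTTP response
buffer = BytesIO()
Image.new("L", (32, 16), color=128).save(buffer, format="PNG")
fake_response_content = buffer.getvalue()

img = decode_image(fake_response_content)
print(img.mode, img.size)
```

In a production function you may also want to check the HTTP status before decoding, for example with the "raise_for_status()" method of the "requests" response object.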

Once you define the function, you can move on to deployment in the next section! 🚀

🚀 Deploying the Model

Now that your sample code is ready, it’s time to set up Modelbit. As you progress through this tutorial, you’ll see that two steps, "modelbit.login()" and "mb.deploy()", are all it takes to deploy the ResNet-50 model to production.

Sign up for Modelbit here—we offer a free plan you can use to run this demo. 

Next, install the Modelbit Python package in your environment:

# Install the latest version of 'modelbit' for model deployment quietly
!pip install -q --upgrade modelbit

Open your development notebook and authenticate the kernel to Modelbit. You can use the “branch” parameter to indicate which branch you’d like to deploy your models to. In this case, deploy to the “dev” branch to test the endpoint; you can merge it into the “main” branch whenever you are ready.

import modelbit
mb = modelbit.login(branch="dev")

This should show you a sign-in interface. After running the cell, click the sign-in button to authenticate your notebook kernel:

Finally, run “mb.deploy(resnet_inference)”. This function will determine all requirements and pickle the necessary variables to deploy the model. Once the model is uploaded, you can see your deployment on the Modelbit dashboard.

Click on your deployment. You’ll notice the API endpoint and the build status. Wait until the deployment is finished building before using the API endpoint. If you are curious about what is happening behind the scenes, click the “🌳Environment” tab to view the provisioning progress.

Once your build is complete, you should see a similar dashboard with your API endpoint ready to receive requests:

Testing the Deployment

To test your deployment, send a POST request to the endpoint from a terminal.

For example, from the command line:

curl -s -X POST "" \
    -d '{"data": ""}' \
    | json_pp

For more complex deployments, you can use a tool such as Postman to test your REST API. Postman provides a user-friendly interface for sending requests to HTTP APIs, testing endpoints, and automating parts of API testing and development.
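If you would rather test from Python than from curl, the same request can be assembled with the standard library. This is a sketch under assumptions: both URLs below are hypothetical placeholders, so substitute the endpoint shown on your Modelbit dashboard and a real image URL. The body mirrors the "data" key used in the curl example.

```python
import json
import urllib.request


def build_inference_request(endpoint_url, image_url):
    """Assemble a POST request whose JSON body carries the image URL under 'data'."""
    body = json.dumps({"data": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical placeholder URLs; replace with your own values
request = build_inference_request(
    "https://example.com/your-endpoint",
    "https://example.com/samoyed.jpg",
)
print(request.get_method(), request.full_url)

# To actually send it once the endpoint is live:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes it easy to inspect the payload first, and to reuse the same helper in an automated test later.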

Check your Modelbit dashboard for the log of the sample request you sent:

Next Steps

Model deployment is a pivotal yet often overlooked component of the machine learning lifecycle. It takes a multi-skilled effort to transform a trained model into a functional service that makes real-world predictions. Modelbit provides an interface to simplify model deployment, offering support for numerous ML frameworks, including TensorFlow, PyTorch, and scikit-learn.

As you saw, Modelbit requires only minimal code changes to deploy a model from your project. It automates tasks like dependency detection and variable pickling, and it integrates with Git-based version control and CI/CD pipelines to make testing and iterating on models easier.

The next step would be to secure your endpoint! 🔐 Check out how in this documentation.

For additional information on Modelbit’s features to add to your project, take a look at:
