You have selected a ResNet flavor (maybe ResNet-50? 😉) for your image classification use case, fine-tuned a few layers, optimized the hyperparameters, and rigorously evaluated model performance.
You now have a high-performing ML model and are ready to deploy it to a production endpoint. Typically, the tasks associated with deploying a computer vision model range from setting up and managing the infrastructure to converting the ResNet model into a scalable service, ensuring security, and handling the production load. Deploying a custom vision model can seem like a task that requires multiple engineers.
Luckily, it does not have to be. In this article, you will learn how to use Modelbit to quickly deploy a ResNet-50 model directly from your notebook environment as a REST endpoint that serves both real-time and batch requests.
By the end of the article, you should have deployed a ResNet-50 model as a REST endpoint with a few lines of code. Infrastructure, containerization, security, and scalability—all managed for you to serve an image classification model in real-time.
Here’s an overview of the solution you will build in this article:
Let’s delve right in! 🏊
We’ll be using Modelbit to deploy this model, so it’s worth quickly mentioning what Modelbit is!
Modelbit is a lightweight platform designed to make deploying any ML model to a production endpoint fast and simple. With the ability to deploy models from anywhere, it makes deploying your custom ML model as simple as passing an inference function to “modelbit.deploy()”.
Modelbit will deploy your model to an isolated container behind a REST API hosted on serverless infrastructure that scales up and down automatically.
Here are the basics you need to know about Modelbit:
By significantly reducing the complexities of deploying ML models to production, Modelbit allows teams to focus more on building and refining high-performance models.
Before diving into Modelbit’s features, let’s start with understanding a simple inference code snippet of a pre-trained ResNet-50 model from PyTorch’s model zoo.
Want to follow along? You can find the complete code, including all the imports you need to set up your environment, in this Colab notebook.
First, ensure you select the correct device context for the model and runtime. In this article, you will use the CPU for your development and inference runtimes.
Next, load the pre-trained ResNet-50 model from the PyTorch model zoo to memory:
Once the model is loaded in, ensure you’re using the CPU by using ".to(device)".
Also, set the model to evaluation mode.
The ".eval()" method ensures consistent behavior of dropout and batch normalization layers. Consistent behavior is ideal if you want to build a reliable production service with reproducible results. The evaluation mode also skips training-specific operations like computing gradients and backpropagation to make the model more memory-and-compute-efficient.
This, in turn, leads to faster inference in production and more predictable model behavior, which makes issues easier to debug and performance easier to monitor.
Next, pass a sample image for the model to classify, and provide the class labels it needs to return a human-readable prediction.
Download and initialize our labels from an open-source JSON file:
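A sketch of that download step; the specific URL below, pointing at the widely used `imagenet-simple-labels` file, is an assumption (the notebook may use a different labels file):

```python
import json
import requests

# Open-source JSON file with the 1,000 ImageNet class names
# (this particular URL is an assumption for illustration)
LABELS_URL = (
    "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/"
    "master/imagenet-simple-labels.json"
)

labels = json.loads(requests.get(LABELS_URL, timeout=10).text)
```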
Next, define your preprocessing code:
The image is preprocessed in several steps: the shorter side is resized to 256 pixels, the image is center-cropped to 224x224 pixels, converted to a tensor, and normalized. These preprocessing steps are standard for models pre-trained on ImageNet like this one.
Pass the sample image through the preprocessor and include the batch size:
Great! You have successfully transformed an image into the correct format for your model.
Now that everything is set, send the image through ResNet-50 to see if it works. Simply pass the image through with the snippet below:
In the snippet above, "torch.no_grad()" disables gradient calculations during inference for better performance. The index of the highest-scoring class is extracted and used to print the class label and index the model assigned to the input image.
Now that you have the working code in PyTorch, it’s time to prepare the function for deployment in Modelbit.
To prepare the model for upload to Modelbit, add a helper function for logging purposes. Every time a sample image is sent to the model you will deploy, Modelbit will log the image to your dashboard:
Now, wrap the code logic required for a single image inference into a function. This will involve passing in an image, preprocessing the image, calling inference, and returning the results.
Here is what that might look like after making the necessary revisions:
In the revised script, you can see a new function, "resnet_inference()", where given an "img_url" as a parameter, the image is fetched, preprocessed, and passed to the pre-trained model to predict the class of the image.
The main difference in code logic between the previous script and the revised script is how the image is obtained: instead of loading the image from disk in your notebook, it is loaded directly from a URL using the "requests" library. The revised function also logs the image to your Modelbit dashboard after inference.
Once you define the function, you can move onto deployment in the next section! 🚀
Now that your sample code is ready, it’s time to set up Modelbit. As you progress through this tutorial, you’ll see that with just two steps, "modelbit.login()" and "mb.deploy()", you can deploy the ResNet-50 model to production.
Sign up for Modelbit here—we offer a free plan you can use to run this demo.
Next, install the Modelbit Python package in your environment:
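For example:

```shell
pip install --upgrade modelbit
```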
Open your development notebook and authenticate the kernel with Modelbit. You can use the “branch” parameter to indicate which branch you’d like to deploy your models to. In this case, deploy to the “dev” branch to test the endpoint; you can still merge to the “main” branch later.
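The authentication step might look like this (the “dev” branch name follows the text above):

```python
import modelbit

# Authenticate this notebook kernel and target the "dev" branch
mb = modelbit.login(branch="dev")
```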
This should show you a sign-in interface. After running the cell, click the sign-in button to authenticate your notebook kernel:
Finally, run “mb.deploy(resnet_inference)”. This function will determine all requirements and pickle the necessary variables to deploy the model. Once the model is uploaded, you can see your deployment on the Modelbit dashboard.
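The deployment call itself is a one-liner, assuming the `mb` client from the login step and the `resnet_inference` function defined earlier:

```python
# Detect dependencies, pickle variables, and ship the function to Modelbit
mb.deploy(resnet_inference)
```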
Click on your deployment. You’ll notice the API endpoint and the build status. Wait until the deployment is finished building before using the API endpoint. If you are curious about what is happening behind the scenes, click the “🌳Environment” tab to view the provisioning progress.
Once your build is complete, you should see a similar dashboard with your API endpoint ready to receive requests:
To test your deployment, send a POST request from a terminal.
For example, from the command line:
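A sketch of such a request; the workspace name and image URL are placeholders, and the URL shape follows Modelbit's usual `v1/<deployment>/latest` convention:

```shell
curl -s -XPOST "https://<your-workspace>.app.modelbit.com/v1/resnet_inference/latest" \
  -d '{"data": "https://github.com/pytorch/hub/raw/master/images/dog.jpg"}'
```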
For more complex deployments, you can use software such as Postman to test your REST API. Postman provides a user-friendly interface for sending requests to HTTP APIs, testing API endpoints, and automating various aspects of API testing and development.
Check your Modelbit dashboard for the log of the sample request you sent:
Model deployment is a pivotal yet often overlooked component of the machine learning lifecycle. It involves a multi-skilled effort to transform a trained model into a functional service for making real-world predictions. Modelbit provides an interface to simplify model deployment, offering support for numerous ML frameworks, including TensorFlow, PyTorch, and scikit-learn.
As you saw, Modelbit requires only minimal code changes to deploy an ML model from your project. It not only automates tasks like dependency detection and variable pickling, but also integrates with data warehouses and supports model versioning for easier integration and testing.
The next step would be to secure your endpoint! 🔐 Check out how in this documentation.
For additional information on Modelbit’s features to add to your project, take a look at: