An Alternative to SageMaker Endpoints

Michael Butler, ML Community Lead

Since its launch in 2017, Amazon SageMaker has been adopted by countless organizations looking to leverage machine learning for use cases ranging from improved business intelligence to state-of-the-art ML models powering new customer-facing products. 

Today, it’s likely stifling innovation at your company and making your product less competitive.

While SageMaker was once thought of as the default platform to develop and deploy ML models into production, it is increasingly becoming a burden on ML teams who are looking to iterate quickly in a world where the pace of ML model innovation is accelerating.

In short, ML teams using SageMaker cannot move as fast as they need to in order to stay competitive.

In this blog post we take a detailed look at:

  • Two disruptive trends happening in ML that are making SageMaker obsolete.
  • How ML teams using SageMaker Endpoints for deployment are at a disadvantage.
  • Why we built an alternative to SageMaker Endpoints.

Two Disruptive Trends Happening in ML

Before we look at the specific challenges ML teams face when using SageMaker, we first need to zoom out and understand two key trends that are impacting all machine learning practitioners.

Trend 1 - ML models are becoming obsolete faster than ever

A force that looms large over machine learning teams is the increasing pace at which groundbreaking new model technologies are being released. If you’ve worked on an ML team for at least a few months, then you likely already understand just how fast these new model technologies are dropping. Dedicated research teams at institutions like Meta Research and Google AI now have a multi-year track record of releasing new technology that can make the existing models running in production completely obsolete.

Many of these new model technologies are open-sourced, which allows hobbyists to iterate on them and make them even more performant for use-case-specific tasks. Even more profound has been the consolidation of these customized models into open-source model hubs: Hugging Face has become the central model hub for a large majority of the ML community.

Here is a pretty typical example of a new model technology’s lifecycle:

  1. Meta Research releases a groundbreaking new model technology.
  2. Hobbyists quickly customize the model and release specialized versions of it.
  3. Each new, specialized version of the model is released on Hugging Face.
  4. Agile ML teams are able to quickly take action to evaluate these new models in production and replace any existing models that they outperform.
  5. Google releases a new model technology that outperforms Meta’s latest model.
  6. The cycle repeats.

Step 4 of this lifecycle is where your team has a choice. Innovation is accelerating, and its pace isn't slowing any time soon. How prepared you are to evaluate and deploy new machine learning models comes down to key decisions you make about the tools you use.

Due to the rigid nature of the platform, many ML teams we’ve spoken to using Amazon SageMaker are not able to evaluate new model technologies as fast as they are being released. As a result, their product’s velocity suffers. In today’s environment, it’s critical that machine learning teams can evaluate and deploy new types of ML models faster than their competition.

At Modelbit, we often talk to teams who have done a great job building processes to deploy specific models into production and serve inferences with SageMaker. The challenge, however, is that building these model-specific processes is often a multi-month effort. Once built, the platforms are rigid and don’t allow for quick iteration when newer model technologies are released. 

Trend 2 - Being full stack in ML is no longer optional

By far, one of the biggest trends we have noticed over the last couple of years has been that an increasing number of companies now have an expectation that if you want to work on ML models that will be used in the product, you need to be full stack. That means being competent in not only ML model development, but additional skill sets like deployment and monitoring. 

Some practitioners have gotten used to a bifurcation of roles, where the Data Scientist builds the models and the Machine Learning Engineer deploys them. That division of labor has largely gone by the wayside with the explosion of model frameworks, growing model complexity, and the business's endless appetite to put more machine learning directly into products.

These days, companies that have pure “Data Scientists” largely have them building simple models for back-office use cases. We even see it often as a form of title inflation largely for BI analysts doing historical data analysis. 

Meanwhile, as of 2022 “Machine Learning Engineer” was the 4th fastest-growing job in the United States. As the number of ways ML can be used within customer-facing products continues to grow, we expect the trend of machine learning engineers owning the end-to-end ML process, often without support from dedicated MLOps teams, to continue.

To recap, there are two major trends happening in ML:

  1. ML Models are becoming obsolete faster than ever because of the accelerating pace of innovation in new model technologies.
  2. ML practitioners are expected to be able to handle the entire ML model lifecycle, from development to deployment.

If you’re on a team building ML models, then you need to be prepared to take quick action to evaluate and deploy new types of ML models if they can outperform what you’re already serving in production. On top of that, you’ll likely need to be able to accomplish this without support from your product engineering team.

With all of this in mind, let’s take a deeper look at how SageMaker is at odds with this new reality. 

SageMaker: What got us here won’t get us to where we need to go

From an evolutionary perspective, Amazon releasing a platform to manage the deployment of ML models made complete sense. If you were already building your product on top of AWS, it was seemingly convenient to add SageMaker to your AWS bill. SageMaker was also attractive because you could leverage the same adjacent AWS services that you were already using like S3 for model storage, VPCs for networking, and API Gateways for REST APIs. 

But over time, SageMaker layered in additional functionality and dependencies which increased the complexity of managing end-to-end ML projects. As the complexity of managing ML workflows in SageMaker increased, more and more organizations began to stand up specialized functional teams dedicated to supporting machine learning operations (MLOps). For some companies, this meant enlisting volunteers from the DevOps team to build infrastructure to support the growing number of models being served in production. But for smaller companies or for those with less robust engineering resources, the responsibility of managing ML workflows in SageMaker had to land somewhere else within the company.

At Modelbit, we’ve seen the responsibility of managing machine learning infrastructure rest on the shoulders of everyone from junior data scientists all the way to the CTO. The simple fact is: most companies that want to leverage machine learning cannot afford to hire entire machine learning operations teams.

Why you should consider an alternative to SageMaker Endpoints

If ML teams need to be able to rapidly evaluate and deploy new model technologies into production, then a prerequisite to that endeavor is a machine learning platform that enables them to iterate quickly. Based on what we’ve learned from hundreds of conversations with ML teams, we’ve found that Amazon SageMaker often prevents machine learning teams from being as nimble and effective as they need to be.

If you’ve already experienced the typical pains of using SageMaker, feel free to skip to the next section where we introduce an alternative. If, however, you’re picking an ML platform for the first time, we recommend going out and talking to others in the ML community who have used SageMaker. Ask them about their experience. In particular, we recommend asking them about:

The Total Cost of Deploying ML with SageMaker

If you search around for alternatives for SageMaker Endpoints, you’ll find several blog posts that talk about the hidden costs of running end-to-end ML on the platform. There is definitely some truth to these claims. On one hand, SageMaker provides access to compute resources managed by AWS. Their list prices are hard to beat and therefore difficult to ignore. On the other hand, building systems and processes to effectively manage these compute resources will at minimum require valuable engineering resources. We haven’t met a team using SageMaker that hasn’t ended up hiring full time employees to manage the platform.

In fact, we met one ML leader who told us, “We had to hire two full time front-end engineers to build a user interface to make SageMaker usable.”

That naturally leads to the next topic we encourage you to raise with SageMaker veterans:

The Complexity and Rigidity of SageMaker

One way to think about the complexity of deploying ML models with SageMaker Endpoints is to simply consider the different products you’ll need to be able to use just to deploy an ML model into production:

  1. Amazon S3 - to store features and connect them to SageMaker Feature Store.
  2. Amazon IAM - to set up the roles and permissions required to operate SageMaker.
  3. Amazon API Gateway - to expose your REST API.
  4. Apache Airflow - to run daily feature pipelines that land data in S3.
  5. SageMaker Training - to train models.
  6. SageMaker Pipelines - to connect to the Feature Store, create training data, and run SageMaker training jobs.
  7. Amazon VPC - for the (mandatory) networking between all of these components.
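To make that surface area concrete, here is a minimal sketch of just the deployment-time half of the story. All of the names below (model, image URI, bucket, role, instance type) are hypothetical placeholders; the dicts mirror the request shapes that would be passed to boto3's `create_model`, `create_endpoint_config`, and `create_endpoint` calls.

```python
# Sketch of the request payloads behind a single SageMaker endpoint.
# All names here are hypothetical placeholders. In real code, each dict is
# passed (as keyword arguments) to boto3.client("sagemaker").create_model(),
# .create_endpoint_config(), and .create_endpoint(), in that order.

model_req = {
    "ModelName": "churn-model",
    "PrimaryContainer": {
        # You build, push, and version this container image yourself.
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/churn:latest",
        # The serialized model artifact you uploaded to S3.
        "ModelDataUrl": "s3://my-bucket/models/churn/model.tar.gz",
    },
    # The IAM role you configured so SageMaker can pull the image and artifact.
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerExecRole",
}

endpoint_config_req = {
    "EndpointConfigName": "churn-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": model_req["ModelName"],
        # You pick, size, and pay for the instances behind the endpoint.
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
}

endpoint_req = {
    "EndpointName": "churn-endpoint",
    "EndpointConfigName": endpoint_config_req["EndpointConfigName"],
}
```

Note that nothing here trains the model, builds the container image, or sets up the IAM role, VPC, or API Gateway in front of the endpoint; those are all separate, prerequisite steps.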

In addition to all that complexity, remember: the containers you deploy your models into are yours to manage. SageMaker provides the on-demand compute at a reasonable cost, and lets you use the tools above to connect your containers to endpoints and other resources. But every model has its own container with its own fully independent Python stack, and keeping all of those containers organized, patched, and up to date is yet more work that falls on your team.

Those are just a few tools and steps you’ll need to take when using SageMaker. All of that leads up to actually deploying the ML model to SageMaker Endpoints:

Workflow diagram of the steps to deploy ML models to inference endpoints in SageMaker.

When you talk to fellow ML practitioners and leaders about their experience, it’s important to not only get an understanding of the amount of engineering resources SageMaker requires, but also which particular skill sets need to be employed to keep the system running.

All of this leads up to what we believe is the fundamental question:

How Fast Can They Deploy New ML Models with SageMaker?

This is the biggest factor that modern ML teams need to consider. If you’re considering SageMaker for the first time, you have the luxury of first talking to other teams using it before making such an important decision. If you’re using SageMaker today, you need to understand in which ways and to what extent it’s slowing your team down.

Speed is critical. Again, this comes back to the pace at which innovation in new ML model technology is accelerating. Gone are the days of assuming that a particular type of ML model will be relevant for years to come. If companies want their products leveraging ML to stay competitive in the market, they must be able to rapidly evaluate and deploy new types of ML models into production. 

A New Way to Serve ML Models into Production

In 2022, we launched what we believe is a necessary alternative to SageMaker Endpoints: Modelbit. Prior to doing so, we spent 10 years building a company that helped data teams visualize data with SQL & Python. Throughout that time, we consistently heard from teams working on ML that deploying ML models into production was a huge point of friction, and in no small part caused by legacy tools like SageMaker.

In fact, we heard about this problem so much that we couldn’t resist the urge to go out and build a better way for ML teams to deploy ML models into production.

When we designed Modelbit to make it easier to productionize ML models, we aligned the product around solving the biggest points of friction we heard about from machine learning teams. Some of the biggest areas to tackle were:

Automated ML Model Containerization

It’s a pretty popular meme in the ML community to joke about how much product engineering teams dislike data science notebooks. There’s a good reason for this. Historically, data scientists and ML researchers would prototype and develop their ML models in a Jupyter notebook. Once it worked well, the notebook was essentially thrown over the wall to engineering, which then spent valuable cycles doing one of two things: rebuilding the model from scratch in the language of the product code (e.g. Java), or fully isolating and containerizing the model and all of its dependencies, including finding a suitable place to host the container.

Modelbit automates the entire process of deploying ML models into containers. Whether you work in a Jupyter notebook or VS Code, all you have to do is pass your inference function to “modelbit.deploy()”. Modelbit automatically detects and carries your model, along with all of its dependencies, into a fully productionized container with access to high-speed networking and high-powered compute resources. Organization, maintenance, and management of those containers then becomes something Modelbit handles for you. Remember all the container management you had to do with SageMaker? It’s no longer necessary.
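As a sketch of what that looks like in practice, here is a hypothetical inference function. The “model” (a dict of invented weights), the function name, and the feature names are all made up for illustration; the deploy step at the end assumes an authenticated Modelbit session.

```python
import math

# Hypothetical "model": a dict of learned logistic-regression weights.
WEIGHTS = {"bias": -1.0, "logins_per_week": 0.4, "tickets_open": 0.9}

def predict_churn(logins_per_week: float, tickets_open: float) -> float:
    """Return a churn score between 0 and 1 for one customer."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["logins_per_week"] * logins_per_week
         + WEIGHTS["tickets_open"] * tickets_open)
    return 1.0 / (1.0 + math.exp(-z))

# In a notebook with the modelbit package installed, deployment is one call:
#   import modelbit
#   mb = modelbit.login()
#   mb.deploy(predict_churn)
# Modelbit detects WEIGHTS, the math import, and the Python version, and
# builds the production container from them automatically.
```

The key point is that `predict_churn` is just an ordinary Python function; there is no container spec, endpoint config, or networking setup to write alongside it.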

Screenshot of automatic ML model environment detection in Modelbit.
Modelbit automatically detects your ML model's environment.

Screen shot of an ML Model .pkl file in Modelbit
Modelbit deploys each ML model into its own container.

Heterogeneous Compute Environments

Another major constraint for ML teams looking to iterate rapidly is the need to provision and manage various types of cloud compute resources. Many ML teams have told us that they’ve been prevented from evaluating newer model technologies because their existing platform was optimized to only support specific compute environments.

At Modelbit, we believe that compute environments should not be a constraint for ML teams looking to move quickly. Small models may only need a small container on a small EC2 instance, while larger models need bigger machines with more memory and compute. Modelbit makes all of these available, automatically.

When you deploy an ML model with Modelbit, we instantly provision the right compute environment for your model. If you want to customize it you can, but the key point is that you will not be constrained by limited access to compute environments.

ML Models as REST Endpoints

Because every model deployed with Modelbit is placed into its own container, we made sure to generate a unique REST Endpoint for each model as well. The API between the product and the constellation of ML models is a key point of standardization. 

Having a standardized namespace of REST APIs reduces the friction between ML practitioners and platform engineering. Platform engineering simply integrates new REST APIs on occasion, rather than taking on the task of building environments for every single new model.
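As a hedged sketch of what a per-model endpoint namespace can look like: the workspace placeholder, the versioned URL shape, and the row-based request schema below are assumptions for illustration, not a documented API.

```python
import json

def build_request(deployment: str, features: list) -> tuple:
    """Build the URL and JSON body for one hypothetical model endpoint."""
    # Assumed URL shape: one versioned REST endpoint per deployed model.
    url = f"https://YOUR_WORKSPACE.app.modelbit.com/v1/{deployment}/latest"
    # Assumed request schema: each input row is [row_id, *features].
    body = json.dumps({"data": [[0] + features]})
    return url, body

url, body = build_request("predict_churn", [2.0, 1.0])
# An HTTP client (e.g. requests.post(url, data=body)) would then return
# the model's scores; product code never touches the model's environment.
```

Because every model follows the same URL and payload convention, swapping in a newly deployed model is a one-line change for the product team.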

Production ML, Backed by Your Git

When you deploy ML models to production with Modelbit, everything automatically syncs with your git repo. The model code itself, model artifacts, notebooks used to train models, and all other ML-related assets are version controlled in your team’s git repository. Pull requests in the git repository can kick off CI/CD pipelines that build the model’s container from the specified requirements. In this way, it becomes easy for ML teams to rapidly stand up new models from a simple git push.

Choosing the Right ML Platform

We acknowledge that we are building a competitive product to Amazon SageMaker Endpoints, and so inherently our point of view is going to be biased. The good news is that you don’t have to take our word for it.

If you’re already looking to migrate away from SageMaker, you can create a free trial of Modelbit today and deploy your first ML model to production in less than 10 minutes.

If you haven’t tried either Modelbit or SageMaker, we certainly encourage you to spend some time becoming familiar with both products. See what it takes to deploy a model to production in each tool. Talk to your fellow ML practitioners in the community. It’s an important decision.

We’d love to hear your thoughts on this blog post. Please feel free to schedule some time to chat with us, or shoot us a note.

Deploy Custom ML Models to Production with Modelbit

Join other world class machine learning teams deploying customized machine learning models to REST Endpoints.
Get Started for Free