ML Teams, Build an ML Platform Optimized for Iteration Speed

By Harry Glaser, Co-Founder & CEO

Don’t optimize for Llama 2 or Segment-Anything. Optimize for rapidly adopting what comes next.

ML From 2018 to 2023

Over the last few years, there has been an explosion in the model technologies in use across the business world. Companies that once relied mainly on regressions, classifiers and the like rapidly adopted a new universe of language models, image recognition models and more, often running on modern neural net frameworks.

Our team works with a large and diverse group of companies to help them deploy their ML models into their products. The transition has been stark. Just a couple of years ago, the set of technologies the average company worked with was relatively stable: XGBoost, scikit-learn, and related libraries ruled the roost, and R still made the occasional appearance.

The New New Thing in ML

Then the advent of GPT models from OpenAI and the Transformers library from Hugging Face hit the industry like a starting gun. Companies that had previously used ML only for back-office regressions and classifications started looking for any way to use LLMs. We began to see support bots booting up BERT transformers to suggest links to documentation pages. These were quickly replaced by GPT models that actually wrote the support replies. Then that entire system was rapidly scrapped and replaced by prompt engineering on top of OpenAI API calls to GPT-4.

This is a good thing: The organizations that rapidly made these changes succeeded and built competitive advantages. They are delivering better customer support, smarter risk analysis, more sophisticated customer segmentation, and more. 

But there’s a cost: Not only did these companies burn through costly compute hosting deep nets only to throw them away in favor of perhaps-more-costly OpenAI API credits, but they also thrashed their ML organizations by supporting, in short order, wildly different model types with wildly different compute, memory and GPU requirements.

Along the way, some of these organizations built durable competitive moats around machine learning. But the moat isn’t one particular technology or model type or application. It’s the ability to experiment with and adopt new model technologies more rapidly than your competition.

The View From Computer Vision

For an example of this phenomenon in practice, let’s take a look at the evolution of computer vision models over the last five years. Computer vision models are used by some of the most exciting technology companies around: security companies that identify threats in real time; drone companies that cheaply and scalably monitor large farms and construction sites; and defense tech companies keeping citizens safe from harm, to name a few that we work with.

If you’re leading or building the ML practice at one of these companies, consider the evolving state of the art you’ve had to contend with. You might have started with something stable like OpenCV. But as compute got cheaper, GPUs became readily available for ML, and deep neural nets were applied to a wide range of vision problems, an explosion of new model frameworks followed. As of this writing, Hugging Face hosts over 7,000 computer vision models, and that number is climbing daily.

Meta Didn’t Come To Play

As the ML practitioner or leader at one of these vision companies, Meta’s rapid pace of research in particular has been a huge opportunity but also a huge technological challenge. Meta Research has been driving the state of the art in computer vision for the last five years. 

It was the confluence of the factors above (cheaply available compute, GPUs, and neural net frameworks) that kicked off this run of research. On top of its then-state-of-the-art deep learning framework Caffe2 (itself a fork of UC Berkeley’s Caffe), Facebook launched Detectron, an object detection framework implementing Facebook’s own original Mask R-CNN algorithm.

But woe betide the team that rearchitected its computer vision systems along those lines. Soon, Facebook standardized on PyTorch, its premier deep learning framework, and reimplemented all of Detectron on top of it. The resulting framework, Detectron2, is one of the most common frameworks we see in production to this day. It seems many companies got this far but haven’t been agile enough to make use of the latest models.

To wit, Facebook soon launched Mask2Former, another PyTorch model, this one based on original research from Facebook’s research team. Mask2Former implements the Masked-attention Mask Transformer algorithm, which, Facebook’s research paper claims, “addresses any image segmentation task”. Mask2Former has also been a hit with the computer vision teams we work with.

[Image: Segment-Anything interactive demo. Source: segment-anything.com/demo]

But Facebook clearly isn’t done. Just a few months ago, it launched Segment-Anything, giving the lie to its own previous claim of “addressing any image segmentation task.” Segment-Anything brings two novel innovations. First, inspired by LLMs, it separates the generation of image embeddings from a lightweight prompting step, allowing very fast interactive segmentation once an image’s embeddings have been computed. Second, the very slick marketing of Segment-Anything shows off a newfound integration between Facebook’s research and application teams.
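To make that embedding/prompting split concrete, here is a minimal sketch using Meta’s open-source segment_anything package. The checkpoint path, image file, and click coordinates are illustrative placeholders.

```python
# Sketch of Segment-Anything's two-phase design. The checkpoint path,
# image file, and click coordinates are illustrative placeholders.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Phase 1 (expensive, once per image): run the heavy image encoder.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # computes and caches the image embeddings

# Phase 2 (cheap, many times per image): decode masks from prompts
# against the cached embeddings.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # e.g. a user's click
    point_labels=np.array([1]),           # 1 = foreground point
)
```

Because the second phase is nearly instant, interactive UIs like Meta’s demo can return masks about as fast as the user can click.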

What’s An ML Team To Do?

From our vantage point helping many teams deploy ML models, the problem is immediately apparent. Some teams come to us with prior generations of models, because that is where they got off the innovation train. Others come to us because they have Mask2Former and want to adopt the latest and greatest, Segment-Anything. These models and frameworks represent true, accelerating innovation, and the best teams are not getting left behind.

The lesson is not to rapidly rewrite your infrastructure to optimize the separation of embedding and prompting to serve Segment-Anything models at scale. (Though, yes, you need to do that.) The lesson is to build a platform and an organization that is able to rapidly experiment with new model types as they are developed, so that you can live on the cutting edge.

Four Key Characteristics of Winning Platforms

Below are four key characteristics we've observed from working with teams who are able to deploy the latest and greatest ML models rapidly.

1. One container per deployed model

It can be tempting to standardize on certain versions of certain ML frameworks, e.g. “Here at AcmeCo, we use TensorFlow 2.13.0.” Do not do this. Newer models will have new and unforeseen dependencies. You want the ability to rapidly build containers for new models without having to retrofit every single previous model.
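As a hedged sketch of what this can look like, imagine a repo where every model owns its Dockerfile and pinned requirements, plus a small build script. The directory layout and registry name here are hypothetical.

```python
# Hedged sketch: build one image per model, each from its own Dockerfile
# and its own pinned requirements. Layout and registry are hypothetical:
#
#   models/
#     churn_xgb/         Dockerfile  requirements.txt  model.py
#     segment_anything/  Dockerfile  requirements.txt  model.py
import subprocess
from pathlib import Path

REGISTRY = "registry.mycompany.com"  # hypothetical internal registry

def build_model_image(model_dir: Path) -> str:
    """Build and tag a container from one model's own Dockerfile."""
    tag = f"{REGISTRY}/{model_dir.name}:latest"
    subprocess.run(["docker", "build", "-t", tag, str(model_dir)], check=True)
    return tag

for model_dir in sorted(Path("models").iterdir()):
    if (model_dir / "Dockerfile").exists():
        print("built", build_model_image(model_dir))
```

A new model with exotic dependencies is just a new directory; nothing upstream has to be retrofitted.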

2. Heterogeneous compute options for deployed models

Small models called in large batches are natural fits for AWS Lambda. Larger models, or models with GPU requirements, may need to run on AWS Fargate or EC2. AWS ECS can be a nice fit for managing a fleet of heterogeneous Docker containers.
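A simplified sketch of that routing decision might look like the following; the thresholds are illustrative, not recommendations (Lambda currently caps memory at roughly 10 GB).

```python
# Hedged sketch: route each model container to a compute target based on
# its resource profile. Thresholds are illustrative, not recommendations.
from dataclasses import dataclass

@dataclass
class ModelProfile:
    name: str
    memory_gb: float
    needs_gpu: bool

def compute_target(profile: ModelProfile) -> str:
    if profile.needs_gpu:
        return "ec2"      # GPU instances, e.g. managed as an ECS service
    if profile.memory_gb <= 10:
        return "lambda"   # small models, large batches, scales to zero
    return "fargate"      # larger CPU-only models

print(compute_target(ModelProfile("churn_xgb", 0.5, False)))    # lambda
print(compute_target(ModelProfile("mask2former", 16.0, True)))  # ec2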

3. Every model gets a REST API

The API between the product and the constellation of ML models is a key point of standardization. Having a namespace of REST APIs, e.g. “model_name.ml.mycompany.com/version_number”, can reduce the friction between ML practitioners and platform engineering. Platform engineering simply needs to integrate a new REST API on occasion, rather than take on the task of building an environment for every single new model.
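A minimal sketch of that contract, using FastAPI; the payload schema and the stand-in model are hypothetical, and mapping “model_name.ml.mycompany.com” to this container happens at the DNS and load-balancer layer.

```python
# Hedged sketch of the per-model REST contract, served with FastAPI.
# The payload schema and the stand-in model below are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel

class PredictRequest(BaseModel):
    features: list[float]

class EchoModel:
    # Stand-in: in practice, deserialize this container's model artifact
    # (pickle, TorchScript, ONNX, ...) once at startup.
    def predict(self, rows):
        return [sum(row) for row in rows]

app = FastAPI()
model = EchoModel()

@app.post("/v1/predict")  # version in the path, one model per container
def predict(req: PredictRequest):
    return {"prediction": model.predict([req.features])[0]}
```

The product calls model_name.ml.mycompany.com/v1/predict and never needs to know what is inside the container.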

4. Everything In Git

Model code itself, model artifacts, notebooks used to train models, and all other ML-related assets should be version controlled in the team’s git repository. Commits to the git repository should kick off CI/CD pipelines that build the model’s container from the specified requirements. In this way, it becomes easy for ML teams to rapidly stand up new models from a simple git push. 
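As a hedged sketch, a CI step like the following (directory layout and registry are hypothetical) rebuilds and pushes only the model containers touched by the latest commit:

```python
# Hedged sketch of a CI step: rebuild and push only the model containers
# whose directories changed in the latest commit. Run from a CI runner
# after checkout; layout and registry names are hypothetical.
import subprocess
from pathlib import Path

REGISTRY = "registry.mycompany.com"

def changed_model_dirs(base: str = "HEAD~1") -> set[Path]:
    """Model directories (models/<name>) with changes since `base`."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", base, "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout.splitlines()
    dirs = set()
    for changed in diff:
        parts = Path(changed).parts
        if len(parts) >= 2 and parts[0] == "models":
            dirs.add(Path(parts[0], parts[1]))
    return dirs

for model_dir in changed_model_dirs():
    tag = f"{REGISTRY}/{model_dir.name}:latest"
    subprocess.run(["docker", "build", "-t", tag, str(model_dir)], check=True)
    subprocess.run(["docker", "push", tag], check=True)
```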

Conclusion

Innovation in ML frameworks and ML models is only accelerating. The best teams commit themselves to building ML platforms that allow them to rapidly experiment with new ML model types. We’ve seen the most nimble teams follow these best practices: containerize each model separately; keep compute options open for each container; standardize behind REST APIs; and kick off CI/CD from git commits to build all of the above. We hope these guidelines provide a roadmap for an ML platform that allows rapid experimentation and iteration, so you don’t get left behind on old ML architectures.

Just as important as a flexible ML platform is a flexible ML team organization and culture. Teams need to remain agile, mentally flexible, and willing and able to tackle the latest ML challenges. We hope to share our learnings on this front in a future post.

Deploy Custom ML Models to Production with Modelbit

Join other world-class machine learning teams deploying custom machine learning models to REST endpoints.
Get Started for Free