Deploying a BERT Model to a REST API Endpoint for Text Classification

Michael Butler, ML Community Lead

Bidirectional Encoder Representations from Transformers (BERT), like GPT-3 and GPT-2, is a large language model designed for natural language processing tasks. The Transformer architecture forms the backbone of BERT. 

The architecture comprises multiple layers of self-attention and feed-forward neural networks, which enable BERT to process sequences of data in parallel rather than sequentially.

Figure: The Transformer architecture behind BERT. Modified and adapted from: LLMs: BERT

The BERT-base model contains 110 million parameters, and the larger variant, BERT-large, contains 340 million parameters. Deploying such BERT variants to production can pose significant challenges, including:

  • The size of the model, especially when deploying to environments with storage constraints. 
  • Setting up and managing the infrastructure for BERT due to significant memory consumption and computational overhead.
  • Scaling the deployment to handle traffic surges.
  • Managing different versions of the model and monitoring their performance over time.
  • Keeping track of dependencies, ensuring compatibility, and managing library updates.

This tutorial guides you through deploying a pre-trained BERT model as a real-time REST API endpoint for efficient and scalable text classification in production using Modelbit.

By the end of this article, you will have learned:

  • How BERT powers text classification.
  • How to load a pre-trained BERT model for text classification.
  • How to build spam classification pipelines with BERT.
  • How to deploy the BERT model as a REST API Endpoint with Modelbit.

Here’s an overview of the solution you will build in this article:

Let’s dive right in! 🏊

An Overview of Text Classification with BERT

Text classification involves applying precise labels or categories to textual data: you train a model to sort text into defined categories. Many industries and large businesses use text classification models for real-world applications like document classification.

One everyday example of text classification silently operating in the background is filtering spam from your email inbox—separating legitimate mail from unsolicited spam. Text classification also plays a crucial role in sentiment analysis. Here, it's instrumental in identifying harmful content, such as hate speech or offensive language, by classifying the sentiment behind posts, a significant step towards creating a safer online environment.

You might wonder, "How does BERT grasp human language and execute text classification?" Essentially, there are two critical phases to harnessing BERT's capabilities:

1. The pre-training phase

2. The fine-tuning phase

Pre-training Phase

During the pre-training phase, the model undergoes training on vast amounts of textual data to grasp linguistic structures. This phase demands significant computational power. For instance, to pre-train BERT, Google used multiple TPUs, specialized processors designed for deep learning models.

The pre-training phase is structured around three pivotal steps:

  • Selection of text corpus: BERT trains on the English Wikipedia dump and BookCorpus, a compilation of freely accessible ebooks. Notably, these datasets are generic, offering a broad spectrum of information rather than delving into niche subjects.
  • Masked language modeling: BERT learns to interpret sentences holistically, considering both left-to-right and right-to-left contexts. Its primary challenge is to identify and predict words that have been intentionally masked, using context from both sides of the obscured word.
  • Next-sentence prediction: This self-supervised task involves presenting BERT with two sentences, A and B, and challenging it to determine whether B logically follows A or is merely a random excerpt from the dataset.
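To make masked language modeling concrete, here is a short sketch using the Hugging Face `fill-mask` pipeline. The `bert-base-uncased` checkpoint is chosen here purely for illustration (any MLM-capable checkpoint works); the article's spam model comes later:

```python
from transformers import pipeline

# Load a fill-mask pipeline backed by the pre-trained BERT base checkpoint
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the masked token using context from both sides of [MASK]
predictions = unmasker("The cat sat on the [MASK].")

for p in predictions[:3]:
    # Each candidate comes with the predicted token and a probability score
    print(p["token_str"], round(p["score"], 3))
```

The top candidates are plausible completions ("mat", "floor", and similar), which is exactly the bidirectional context modeling that fine-tuning later repurposes for classification.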

Fine-tuning Phase

Once the BERT model is pre-trained, it's primed for fine-tuning on any specific NLP task. At this stage, you can introduce a domain-specific dataset in the same language to leverage BERT's pre-trained weights.

The beauty of fine-tuning BERT is that it doesn't necessitate vast datasets, making the process more cost-effective. BERT's architecture makes it very good at understanding the subtleties and context of language, but the pre-trained model alone is not equipped to classify text out of the box.

You must fine-tune BERT on text classification tasks, adapting it to categorize text effectively. Alternatively, you can use a BERT model that has already been fine-tuned for text classification, available in open-source variants. This should provide immediate, nuanced textual analysis without the need for extensive computational resources.

Step 1: Load a Pre-Trained BERT Model for Text Classification

For an interactive experience, check out the Colab Notebook, which contains all the provided code and is ready to run!

Use the Hugging Face Hub to find an ideal pre-trained BERT model. The Hugging Face Hub hosts the most extensive registry of open-source machine learning models. As of this writing, the Hub boasts a repository of over 6,900 fine-tuned BERT models.

From the hub, you can find fine-tuned BERT models that may meet your needs and even contribute back by pushing your optimized model to the community. We will use BERT for text classification, specifically “spam detection,” to filter out and isolate spam content.

Install the "transformers" library from Hugging Face:

!pip install transformers

Step 2: Build a Spam Classification Pipeline with a Pre-Trained BERT Model

To build the spam classification pipeline, we will use the `pipeline()` class from the transformers library to run inference with the pre-trained BERT model. It acts as a high-level wrapper, requiring you to specify the task and model, along with other parameters, as detailed in its official documentation.

This setup makes it easier to use the BERT model for specific tasks, in this case, text classification. The `pipeline` abstracts the model’s complexity and gives you a simple way to interact with it.

Build the pre-trained BERT model pipeline:

from transformers import pipeline

spam_classifier_pipeline = pipeline(
    task="text-classification", model="wesleyacheng/sms-spam-classification-with-bert"
)

data = [
    "WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim",
    "I've been searching for the right words to thank you for this breather. I promise i wont take your help for granted",
]

Pass your data through the spam classifier pipeline to classify the text as “SPAM” or “HAM”:
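Filling in that step, the call below repeats the pipeline and sample data from above so the snippet runs on its own:

```python
from transformers import pipeline

spam_classifier_pipeline = pipeline(
    task="text-classification",
    model="wesleyacheng/sms-spam-classification-with-bert",
)

data = [
    "WINNER!! As a valued network customer you have been selected to receivea £900 prize reward! To claim",
    "I've been searching for the right words to thank you for this breather. I promise i wont take your help for granted",
]

# Each prediction carries a label ("SPAM" or "HAM") and a confidence score
for text, result in zip(data, spam_classifier_pipeline(data)):
    print(f"{result['label']} ({result['score']:.4f}): {text[:40]}...")
```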


You should see an output similar to the one below distinguishing between both classes with corresponding probability scores.

Step 3: Deploy the Spam Classification Pipeline as a REST API Endpoint

Modelbit deploys ML models from your Jupyter or Colab notebook, Python environment, or Git to a REST endpoint you can call from your data warehouse or client application. 

To get started with Modelbit, create an account. This demo can run on a free account.

Install the Modelbit Python package with pip:

# Using latest version of pip
!pip install -q --upgrade pip
# Install the latest version of 'modelbit' for model deployment quietly
!pip install -q --upgrade modelbit

🪄 Define the Python Function for Inference

Before deploying the pipeline, embed the classification pipeline within a function whose argument is the text data you want to classify. Here is how we set up the "classifier_text" function:

def classifier_text(data):
    classifier_pipeline = pipeline(
        task="text-classification",
        model="wesleyacheng/sms-spam-classification-with-bert",
    )
    outputs = classifier_pipeline(data)
    return outputs

Authenticate your notebook kernel and connect it to your workspace by calling the "modelbit.login()" API:

import modelbit
mb = modelbit.login()

In Modelbit, separate Git branches allow for independent work. They also streamline Git processes like code reviews and merges. To create a new branch, head to Modelbit's UI, click the branch dropdown in the top-right, and select "New Git Branch". Enter the name of your branch in the dialog pop-up.

Now, switch to the new branch you created (replace “your_branch” with the new branch name):


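In code, the Modelbit client returned by `modelbit.login()` exposes a branch switch; "your_branch" below is a placeholder, and the call assumes the authenticated `mb` session from the login step above:

```python
# Point the notebook session at the new Git branch
# (replace "your_branch" with the name of the branch you created in the UI)
mb.switch_branch("your_branch")
```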
⚡Deploy with Modelbit

Next, deploy the spam classifier pipeline containing the BERT Model as a REST API endpoint with Modelbit. 

Call "mb.deploy()" and pass the deployment function ("classifier_text"). Modelbit detects all notebook dependencies and system packages, pickles the Python functions and the variables integral to "classifier_text". 

Under the hood, the API call pushes your source code to Modelbit. Modelbit builds a container with the model weights, Python functions, and necessary dependencies to replicate the notebook environment in production:

mb.deploy(classifier_text, python_packages=['transformers==4.34.1'])

For reproducibility, the transformers library version used in this demo is "4.34.1".

You should see an output similar to the following:

Next, verify the deployment by clicking the "View in Modelbit" button. It should direct you to the endpoint you deployed within your Modelbit dashboard. There, select the "classifier_text" deployment to proceed. You might have to wait a few minutes for the container to build and for Modelbit to deploy the endpoint.

After your endpoint ships, you should see your new endpoint and instructions to send requests using cURL:

You can invoke the API programmatically using Python with the assistance of the "requests" package, or you can opt for "cURL" for the task. Check the Colab Notebook for how to use both request formats.
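For reference, a cURL request might look like the sketch below. The URL pattern (`https://<workspace>.app.modelbit.com/v1/<deployment>/latest`) follows Modelbit's endpoint convention; substitute your own workspace name:

```shell
curl -s -XPOST "https://ENTER_WORKSPACE_NAME.app.modelbit.com/v1/classifier_text/latest" \
  -d '{"data": ["URGENT! You have won a 1 week FREE membership in our £100,000 Prize Jackpot! Txt the word: CLAIM"]}'
```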

Use the “requests” library (change “ENTER_WORKSPACE_NAME” to your deployment workspace name):

import json
import requests

# Change `ENTER_WORKSPACE_NAME` to your workspace name
response = requests.post(
    "https://ENTER_WORKSPACE_NAME.app.modelbit.com/v1/classifier_text/latest",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"data": [
        "URGENT! You have won a 1 week FREE membership in our £100,000 Prize Jackpot! Txt the word: CLAIM",
    ]}),
)
print(response.json())

If everything works in your code, your output should look like this:

Looking at the “📚Log” panel, we can see that the API processed the request successfully:

That’s it! You have a real-time endpoint ready to classify messages as either “SPAM” or “HAM.” 

As you saw in the intro, deploying BERT-based models can be challenging, but Modelbit, under the hood, ran the following for you:

  • Pushed your notebook source code into a central workspace.
  • Serialized the notebook variables into a format that it stores and can redefine in production.
  • Detected which dependencies, libraries, and data your model needs from the notebook environment.
  • Containerized the model, dependencies, source code, and necessary helper files to reduce the possibility of errors and save deployment time.
  • Replicated the environment in production for consistency and spun up the REST endpoint.

All you had to do was call “mb.deploy()”, and everything required for production is provisioned and auto-scales to your requirements.

Next Steps

Before integrating your API into a product or client application, familiarize yourself with the security measures necessary to safeguard your API.

When you navigate to your Modelbit dashboard, you'll see a suite of features designed to operate your endpoint in a production environment. This includes functionality for logging prediction data, monitoring endpoint activity, and managing dependencies in your production environment.

Modelbit also offers integration with Arize to monitor data drift, detect issues with your models in Arize, diagnose and fix these issues, and then quickly deploy to production again.

To maximize these resources, don't hesitate to explore your dashboard further and check out more tutorials in the documentation for comprehensive guidance.

Want more tutorials for deploying ML models to production?

Deploy Custom ML Models to Production with Modelbit

Join other world-class machine learning teams deploying customized machine learning models to REST Endpoints.
Get Started for Free