Segment Anything Model (SAM) Guide

Getting Started with Modelbit

Modelbit is an MLOps platform that lets you train and deploy any ML model, from any Python environment, with a few lines of code.

Getting Started

Model Documentation

https://github.com/facebookresearch/segment-anything

Tutorial for Deploying Segment Anything to a REST API Endpoint

Setup & Installs


!pip install git+https://github.com/facebookresearch/segment-anything.git

!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

!pip install --upgrade modelbit

from segment_anything import sam_model_registry, SamPredictor
import cv2
import urllib.request
import numpy as np
import matplotlib.pyplot as plt

import modelbit
mb = modelbit.login()

Building Blocks For Working With Images


# Given the pixels of an image mask, return the mask's bounding box
def mask2boundingbox(mask):
    x_min = None
    x_max = None
    y_min = None
    y_max = None
    for y, row in enumerate(mask):
      for x, val in enumerate(row):
        if val:
          if x_min is None or x_min > x:
            x_min = x
          if y_min is None or y_min > y:
            y_min = y
          if x_max is None or x_max < x:
            x_max = x
          if y_max is None or y_max < y:
            y_max = y
    return x_min, y_min, x_max, y_max

# Render a mask in matplotlib
def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30/255, 144/255, 255/255, 0.6])
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Render a point as a star in matplotlib
def show_points(coords, labels, ax, marker_size=375):
    pos_points = coords[labels==1]
    neg_points = coords[labels==0]
    ax.scatter(pos_points[:, 0], pos_points[:, 1], color='green', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)
    ax.scatter(neg_points[:, 0], neg_points[:, 1], color='red', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)

# Render a box in matplotlib
def show_box(box, ax):
    x0, y0 = box[0], box[1]
    w, h = box[2] - box[0], box[3] - box[1]
    ax.add_patch(plt.Rectangle((x0, y0), w, h, edgecolor='red', facecolor=(0,0,0,0), lw=2))

# Render an image in matplotlib
def show_image(img, points=None, mask=None, box=()):
    im = plt.figure(figsize=(10,10))
    plt.imshow(img)
    if points:
        show_points(np.array([[points[0], points[1]]]), np.array([1]), plt.gca())
    # Only draw the mask when one was passed in and it has at least one set pixel
    if mask is not None and mask.any():
        show_mask(mask, plt.gca())
    if box:
        show_box(box, plt.gca())
    plt.axis('on')
    modelbit.log_image(im)
    plt.show()
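
The nested loop in mask2boundingbox visits every pixel in Python, which can be slow on large images. A vectorized alternative is sketched below, assuming the mask is a 2-D boolean NumPy array; the name `mask_to_bbox_fast` is hypothetical and not part of the original notebook.

```python
import numpy as np

def mask_to_bbox_fast(mask):
    """Return (x_min, y_min, x_max, y_max) for a 2-D mask, or four Nones if it is empty."""
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))  # indices of rows containing a set pixel
    cols = np.flatnonzero(mask.any(axis=0))  # indices of columns containing a set pixel
    if rows.size == 0:
        return None, None, None, None
    # Cast to plain ints so the result serializes cleanly to JSON
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])

# Small synthetic mask: a rectangle covering rows 2-4 and columns 3-6
demo = np.zeros((6, 8), dtype=bool)
demo[2:5, 3:7] = True
print(mask_to_bbox_fast(demo))  # (3, 2, 6, 4)
```

It returns the same tuple layout as mask2boundingbox, so it could be swapped in without changing the callers.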

Getting Segment Anything Running


sam = sam_model_registry["default"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device="cuda")
predictor = SamPredictor(sam)

def find_cat(url, x_coord, y_coord):
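    # Download the image, decode it as 3-channel BGR, then convert to the RGB order SAM expects
    url_response = urllib.request.urlopen(url)
    img = cv2.imdecode(np.array(bytearray(url_response.read()), dtype=np.uint8), cv2.IMREAD_COLOR)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)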
    predictor.set_image(img)

    masks, scores, logits = predictor.predict(
        point_coords=np.array([[x_coord, y_coord]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )

    # Keep the candidate mask that SAM scores highest
    best_mask = masks[int(np.argmax(scores))]

    bbox = mask2boundingbox(best_mask)
    show_image(img, (x_coord, y_coord), best_mask, bbox)

    return bbox

bounding_box = find_cat("https://montereyzoo.org/wp-content/uploads/2017/04/big-cats-6.jpg", 225, 150)
bounding_box

Deploying Segment Anything to a REST API


mb.deploy(find_cat,
          python_packages=[
            "git+https://github.com/facebookresearch/segment-anything.git",
            "opencv-python==4.8.0.76",
            "pycocotools==2.0.7",
            "matplotlib==3.7.1",
            "numpy==1.25.2",
            "torch==2.0.1+cu118",
            "torchvision==0.15.2+cu118",
          ],
          system_packages=["python3-opencv", "build-essential"],
          require_gpu=True
)

batch_data = [[i, "https://montereyzoo.org/wp-content/uploads/2017/04/big-cats-6.jpg", 225, 150] for i in range(1, 251)]
batch_data

mb.get_inference(workspace="harrys-house", deployment="find_cat", data=batch_data)
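
The deployment can also be called over plain HTTP. The sketch below assumes Modelbit's REST URL pattern `https://<workspace>.app.modelbit.com/v1/<deployment>/latest` and a JSON body of the form `{"data": ...}`; both are assumptions to confirm against the API endpoint shown for the deployment in Modelbit. The helper names `build_batch` and `call_find_cat` are hypothetical.

```python
import json
import urllib.request

# Assumed URL pattern; confirm it against the deployment's page in Modelbit.
ENDPOINT = "https://harrys-house.app.modelbit.com/v1/find_cat/latest"

def build_batch(image_url, x_coord, y_coord, n):
    """Build n batch rows shaped [id, url, x, y], mirroring batch_data above."""
    return [[i, image_url, x_coord, y_coord] for i in range(1, n + 1)]

def call_find_cat(rows, endpoint=ENDPOINT):
    """POST the rows as JSON and return the decoded response body."""
    body = json.dumps({"data": rows}).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

rows = build_batch(
    "https://montereyzoo.org/wp-content/uploads/2017/04/big-cats-6.jpg", 225, 150, 3
)
print(len(rows), rows[0][0], rows[-1][0])
# call_find_cat(rows)  # uncomment to send the request; needs network access and a live deployment
```

Keeping the payload builder separate from the network call makes it easy to inspect the batch before sending it.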

Model Overview

The Segment Anything Model (SAM) was released in April 2023 by Meta AI Research. It is a promptable image segmentation model, intended to serve as a general-purpose foundation model for segmentation rather than a model built for one task.

Model Overview: SAM is designed to simplify image segmentation and make it accessible for a wide range of applications. It was trained on the SA-1B dataset, which contains over 1 billion segmentation masks across 11 million images, and this breadth lets it handle many segmentation tasks without task-specific training.

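Architecture: The Segment Anything project has three parts: a task (promptable segmentation), a model (SAM), and a dataset (SA-1B). The model itself consists of a Vision Transformer image encoder, a prompt encoder for points, boxes, and masks, and a lightweight mask decoder. Because the expensive image embedding is computed once per image, each additional prompt can be answered quickly, which is what makes interactive segmentation practical.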

Use Cases: SAM finds applications in environmental monitoring, fashion retail, augmented reality, content creation, scientific research, and data annotation. Its ability to adapt to diverse scenarios and perform zero-shot transfer makes it valuable across various industries.

Libraries and Frameworks: SAM is implemented in Python on top of PyTorch. The tutorial above uses Meta's official segment_anything package (sam_model_registry and SamPredictor). SAM is also available through the Hugging Face Transformers library, where the model and its processor are loaded with the SamModel and SamProcessor classes; examples in that style typically use PIL (Python Imaging Library) for image handling and requests to fetch images over HTTP.
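
For the Hugging Face Transformers route mentioned above, a minimal sketch might look like the following. It assumes the `facebook/sam-vit-huge` checkpoint on the Hugging Face Hub and a PIL image as input. The function names `to_input_points` and `segment_point_hf` are hypothetical, and the heavy imports are kept inside the function so the block can be loaded without downloading anything.

```python
def to_input_points(x, y):
    """Wrap one (x, y) click in the nesting SamProcessor expects: one image, one point."""
    return [[[x, y]]]

def segment_point_hf(image, x, y, checkpoint="facebook/sam-vit-huge"):
    """Return the highest-scoring mask for a single click as a boolean array."""
    # Imported lazily: these are large dependencies and the checkpoint is several GB
    import torch
    from transformers import SamModel, SamProcessor

    model = SamModel.from_pretrained(checkpoint)
    processor = SamProcessor.from_pretrained(checkpoint)
    inputs = processor(image, input_points=to_input_points(x, y), return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Resize the low-resolution predictions back to the original image size
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )
    # Pick the candidate with the highest predicted quality score
    best = int(outputs.iou_scores[0, 0].argmax())
    return masks[0][0, best].numpy()

print(to_input_points(225, 150))
```

This mirrors what find_cat does with the segment_anything package: one positive click in, best-scoring mask out.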

Use Cases

Segment Anything Model (SAM) has a wide array of popular use cases across various domains:

Assisted Image Labeling: SAM can assist in creating polygon annotations for images without the need to click on individual points around a polygon. By clicking on an object of interest and refining annotations as necessary, SAM streamlines the annotation process, particularly useful in scenarios requiring detailed image labeling.

Zero-Shot Labeling: SAM is capable of annotating images from previously unseen categories. For instance, when provided with images of cars on a road, SAM can recommend segmentation masks for all cars in the image, along with other elements, though it requires another model, like Grounding DINO, for specific object identification and labeling.

Removing Backgrounds: This feature of SAM is particularly beneficial in photo editing. It can precisely identify and remove backgrounds from images, allowing for the replacement with custom backgrounds or transparent layers, enhancing the versatility in image manipulation.

Inpainting: SAM's accuracy in identifying object boundaries makes it ideal for inpainting in image generation. For example, changing the color of cars in an image can be achieved by using SAM to identify cars, select the masks of interest, and then process them through an inpainting model.

Synthetic Data Generation: SAM can be used in conjunction with zero-shot object detection models to paste segmented objects onto new backgrounds. This is particularly useful in creating diverse datasets for training models in specific environments, such as identifying defects on metal pipes and adding artificial defects for training purposes.

Versatile Segmentation: SAM is adaptable for various real-world scenarios like environmental monitoring, which includes ecosystem analysis, deforestation detection, wildlife tracking, and land use categorization. Its versatility allows for broad applications in conservation, urban planning, and environmental research.

Zero-Shot Transfer: This feature enables SAM to be used directly in new image domains without additional training, streamlining processes in fashion retail by enabling the quick introduction of new clothing lines without the need for specific model training.

Real-Time Interaction: SAM's architecture supports real-time interaction, crucial for augmented reality and rapid segmentation tasks in content creation.

Multimodal Understanding: SAM can be integrated into larger AI systems for a comprehensive understanding of both text and visual content, enhancing capabilities in areas like web content analysis.

Efficient Data Annotation and Equitable Data Collection: SAM aids in creating large-scale datasets efficiently and aims to better represent diverse geographic regions and demographic groups, making it suitable for varied applications involving diverse populations.

Content Creation and AR/VR: Its segmentation capabilities enhance content creation tools by automating object extraction for collages or video editing and enrich user experience in AR/VR through object selection and transformation.

Scientific Research: SAM finds applications in scientific research by locating and tracking objects in videos, offering insights and advancing various fields of study.
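
As an illustration of the background-removal use case above, here is a minimal sketch of turning a mask into a transparent cutout. It assumes an RGB image as a uint8 NumPy array and a boolean mask of the same height and width, such as best_mask from find_cat; the function name `remove_background` is hypothetical.

```python
import numpy as np

def remove_background(image_rgb, mask):
    """Return an RGBA copy of image_rgb that is opaque inside mask and transparent outside."""
    image_rgb = np.asarray(image_rgb, dtype=np.uint8)
    mask = np.asarray(mask, dtype=bool)
    # Alpha channel: 255 where the object is, 0 for the background
    alpha = np.where(mask, 255, 0).astype(np.uint8)
    return np.dstack([image_rgb, alpha])

# Tiny synthetic example: a 4x4 gray image with a 2x2 object in the middle
image = np.full((4, 4, 3), 128, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
cutout = remove_background(image, mask)
print(cutout.shape, int(cutout[..., 3].sum()))  # (4, 4, 4) 1020
```

The same alpha channel can be used to paste the object onto a new background, which is the basis of the synthetic data generation use case.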

Strengths

Segment Anything Model demonstrates several strengths that make it particularly effective for the use cases previously outlined:

Versatility and Adaptability: SAM's design as a foundation model for image segmentation allows it to be used across a wide range of applications without the need for task-specific modeling expertise. It segments objects and regions from prompts such as clicks, boxes, or rough masks; text prompts were explored in the SAM paper, but that capability is not part of the released code. This flexibility makes it accessible to a broad range of users and applications, from content creation to scientific research.

Generalization to New Tasks: One of SAM's key strengths is its ability to generalize to new tasks and image domains without the need for custom data annotation or extensive retraining. This is made possible by its training on a diverse dataset of over 1 billion segmentation masks. Such a capability is crucial in fields like environmental monitoring and fashion retail, where the ability to adapt quickly to new types of images is essential.

Real-Time Interaction Capabilities: SAM’s efficient architecture enables real-time interaction with the model. This is especially beneficial in augmented reality applications and content creation tasks that require rapid segmentation. The real-time interaction ensures immediate feedback, which is crucial for applications that depend on quick response times.

Zero-Shot Transfer: SAM's zero-shot transfer ability allows it to segment new objects and image domains straight out of the box, reducing the need for task-specific models. This feature is particularly useful in fashion retail, where e-commerce platforms can quickly adapt to new fashion trends by effortlessly introducing new clothing lines without specific model training.

Efficient Data Annotation: The data engine of SAM, responsible for creating and curating the SA-1B dataset, plays a pivotal role in its ability to generalize to new tasks. The data engine incorporates stages of interactive and automatic annotation, ensuring the dataset's high quality and variety. This efficient data annotation process is a significant strength, especially for researchers and developers working on their own segmentation tasks.

Equitable Data Collection: SAM aims for better representation across diverse geographic regions and demographic groups in its dataset creation process. This makes it more equitable and suitable for real-world applications that involve varied populations, addressing the need for inclusivity in data used for AI models.

Together, these strengths make SAM a flexible and efficient starting point for a wide range of real-world segmentation applications.

Limitations

The Segment Anything Model (SAM) exhibits certain limitations that affect its performance in specific scenarios:

Precision in Complex Structures: Despite being trained on 1.1 billion masks, SAM often struggles with precision, particularly when dealing with objects that have complex shapes and intricate structures.

Sensitivity to Input Placement: The model's performance is highly sensitive to the placement of input prompts, like points. Incorrect placement can lead to inaccurate segmentation results.

Requirement for Prior Knowledge: SAM requires more manual prompts with prior knowledge for complex scenes, such as crop segmentation and fundus image segmentation.

Challenges in Low-Contrast Scenarios: The model is less effective in segmenting objects in low-contrast scenarios, such as transparent or camouflaged objects.

Limited Understanding of Specialized Data: In real-world medical and industrial scenarios, which call for domain expertise, SAM has shown unsatisfactory results, especially when using certain modes of operation.

Difficulty with Small and Irregular Objects: SAM faces challenges in segmenting smaller and irregular objects, such as those encountered in remote sensing and agriculture.
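
One common way to reduce sensitivity to a single click is to pass several points, marking some as foreground and some as background. SamPredictor.predict takes point_coords as an (N, 2) array of (x, y) pixel positions and point_labels as an (N,) array in which 1 means foreground and 0 means background. The sketch below only builds those arrays; the helper name `build_point_prompt` and the example coordinates are hypothetical.

```python
import numpy as np

def build_point_prompt(foreground, background=()):
    """Stack (x, y) clicks into the point_coords / point_labels arrays SAM expects."""
    points = list(foreground) + list(background)
    coords = np.array(points, dtype=np.float32).reshape(-1, 2)
    # 1 marks a click on the object, 0 marks a click on something to exclude
    labels = np.array([1] * len(foreground) + [0] * len(background), dtype=np.int64)
    return coords, labels

coords, labels = build_point_prompt(
    foreground=[(225, 150), (240, 170)],
    background=[(50, 40)],
)
print(coords.shape, labels.tolist())  # (3, 2) [1, 1, 0]

# With a loaded predictor and an image already set, these would be passed as:
# masks, scores, logits = predictor.predict(
#     point_coords=coords, point_labels=labels, multimask_output=False
# )
```

Adding a background point is often enough to steer the model away from a neighboring object that a single click accidentally included.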

Learning Type & Algorithmic Approach

The Segment Anything Model (SAM) is trained with supervised learning on the masks of the SA-1B dataset. What sets it apart from traditional image segmentation models is its promptable design: given an image and a prompt such as a point or a box, it returns one or more candidate masks, which lets a single model cover a wide range of segmentation tasks.

Advanced Neural Networks: At its core, SAM is a deep neural network built around a Vision Transformer (ViT) image encoder rather than a purely convolutional backbone. The encoder turns the image into an embedding, a prompt encoder embeds points, boxes, or masks, and a lightweight transformer-based mask decoder combines the two to predict masks.

Category-Agnostic Segmentation: Unlike standard models that rely heavily on pre-defined object categories, SAM segments whatever the prompt indicates and does not assign category labels. When a prompt is ambiguous (a click on a shirt could mean the shirt or the person wearing it), it can return several candidate masks, each with a predicted quality score.

Innovative Segmentation Techniques: SAM does not build on Mask R-CNN or U-Net architectures. Its mask decoder is a small transformer that attends between the prompt tokens and the image embedding and then upsamples the result into masks, separating object foregrounds from backgrounds even in complex scenes.

Data Scale and Transfer Learning: The image encoder starts from a ViT pre-trained with masked autoencoding (MAE), so SAM benefits from transfer learning. Its adaptability comes mainly from the scale and diversity of SA-1B and from simulating different prompts for each mask during training, which helps it segment novel objects.

Model-in-the-Loop Data Engine: SAM is not trained with reinforcement learning. Its improvement came from an iterative data engine in which the model helped annotators label images, the new masks were used to retrain the model, and the cycle repeated, moving from assisted-manual to semi-automatic to fully automatic annotation.

Together, these choices (a transformer image encoder, a promptable design, and very large-scale supervised training) give SAM a degree of versatility that earlier task-specific segmentation models lacked. They are a key factor in its ability to segment a wide array of objects in varying contexts, making it a powerful tool in the field of computer vision.
