TensorFlow Serving on Amazon SageMaker: A Guide

As a data scientist or software engineer, you know that deploying machine learning models can be a challenging task. From selecting the right framework to choosing the best infrastructure, there are a lot of decisions to make. Fortunately, Amazon SageMaker makes this process easier with its managed machine learning service. And if you’re working with TensorFlow models, TensorFlow Serving can further streamline the deployment process. In this article, we’ll explore how you can use TensorFlow Serving on Amazon SageMaker.

Table of Contents

  1. What is TensorFlow Serving?
  2. Why use TensorFlow Serving on Amazon SageMaker?
  3. How to use TensorFlow Serving on Amazon SageMaker
  4. Conclusion

What is TensorFlow Serving?

TensorFlow Serving is an open-source serving system that allows you to deploy your TensorFlow models at scale. It provides a flexible architecture that can handle different types of models, from simple regressions to complex neural networks. TensorFlow Serving also ensures high availability and low latency, making it suitable for real-time applications.

One of the key features of TensorFlow Serving is its ability to handle versioning. When you update your model, TensorFlow Serving can seamlessly switch between versions without any downtime. This makes it easy to experiment with different models and roll out improvements to your production environment.
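
To make the versioning behavior concrete, here is a minimal sketch using SavedModels; the placeholder model and paths are illustrative only, not part of the example built later in this guide. Each numbered subdirectory under a base path is one model version, and TensorFlow Serving watches the base directory, loading new versions as they appear and retiring old ones without downtime.

import tensorflow as tf
from tensorflow import keras

# A tiny placeholder model, just to illustrate the versioned SavedModel layout
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])

# Version 1 of the model
model.save("export/Servo/1")

# ... later, after retraining or changing the architecture ...
# Version 2: TensorFlow Serving picks it up automatically and switches over
model.save("export/Servo/2")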

Why use TensorFlow Serving on Amazon SageMaker?

Amazon SageMaker is a fully managed machine learning service that provides everything you need to build, train, and deploy your models. It offers a variety of built-in algorithms and frameworks, including TensorFlow. By using TensorFlow Serving on Amazon SageMaker, you can take advantage of the benefits of both services.

Here are some reasons why you might want to use TensorFlow Serving on Amazon SageMaker:

  • Scalability: Amazon SageMaker provides a scalable infrastructure that can handle large datasets and high traffic. TensorFlow Serving can further improve performance by optimizing your models for efficient inference.
  • Flexibility: TensorFlow Serving supports multiple model formats, including SavedModels and TensorFlow Hub modules. This allows you to choose the best format for your use case and easily switch between models.
  • Ease of use: Amazon SageMaker provides a user-friendly interface for deploying and managing your models. You can use the SageMaker console, SDK, or CLI to create and configure your endpoints. TensorFlow Serving also exposes a REST API that you can use to interact with your models programmatically (see the sketch after this list).
  • Cost-effectiveness: Amazon SageMaker offers pay-as-you-go pricing, which means you only pay for what you use. This can help you save money compared to running your own infrastructure.
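
As a quick illustration of that REST API, the sketch below sends a prediction request to a TensorFlow Serving process; the host, port, and model name are assumptions for the example, not values used elsewhere in this guide. On SageMaker you would normally go through the endpoint instead (as shown in Step 5), but the request format is the same: a JSON body with an "instances" list.

import requests

# Assumed local TensorFlow Serving instance; host, port, and model name are illustrative
url = "http://localhost:8501/v1/models/iris:predict"

# TensorFlow Serving's REST API expects a JSON body with an "instances" list
payload = {"instances": [[6.4, 3.2, 4.5, 1.5]]}

response = requests.post(url, json=payload)
print(response.json())  # e.g. {"predictions": [[...]]}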

How to use TensorFlow Serving on Amazon SageMaker

Now that you understand the benefits of using TensorFlow Serving on Amazon SageMaker, let’s dive into the technical details. Here’s a step-by-step guide on how to deploy your TensorFlow models on Amazon SageMaker using TensorFlow Serving.

Step 1: Prepare your model for deployment

For this example, we’ll use a very simple network consisting of two densely connected layers and train it on the well-known Iris dataset, which has four input features and three classes.

# import libraries
import tensorflow as tf
from tensorflow import keras
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset (4 features, 3 classes) and split it into train and test sets
iris = load_iris()
train_np, test_np, train_labels, test_labels = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Create a simple model: 4 input features, 3 output classes
model = keras.Sequential([
    keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    keras.layers.Dense(3, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model

EPOCHS = 50
BATCH_SIZE = 32

EARLY_STOPPING = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", mode="auto", patience=5, restore_best_weights=True
)

history = model.fit(
    x=train_np,
    y=train_labels,
    validation_data=(test_np, test_labels),
    callbacks=[EARLY_STOPPING],
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
)

Step 2: Save the TensorFlow model

To set up hosting, the first step is to move the model from training into hosting. Export the model from TensorFlow and save it to the file system in the layout that sagemaker.tensorflow.model.TensorFlowModel expects, which differs slightly from a plain TensorFlow export. The conversion is straightforward: place the exported model in a numbered version directory under export/Servo/ (for example, export/Servo/1) and compress the entire export directory into a tar file. SageMaker will recognize this compressed file as a loadable TensorFlow model.

import tarfile

# Save the model in the SavedModel format under export/Servo/<version>
model.save("export/Servo/1")

# Package the export directory as model.tar.gz for SageMaker
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("export")

Step 3: Upload the model to Amazon S3

Next, start a new SageMaker session and upload the model to the session’s default S3 bucket. This is done with the sagemaker.Session.upload_data method, which takes the location of the exported TensorFlow model and the destination prefix within the bucket (e.g., “model”). The default S3 bucket is available through the sagemaker.Session.default_bucket method.

import sagemaker

sm_session = sagemaker.Session()            # new SageMaker session
bucket_name = sm_session.default_bucket()   # default S3 bucket for this session

# Upload model.tar.gz to s3://<bucket_name>/model/
s3_response = sm_session.upload_data("model.tar.gz", bucket=bucket_name, key_prefix="model")

Once the model has been uploaded to S3, it can be imported into SageMaker with sagemaker.tensorflow.model.TensorFlowModel for deployment. This requires the S3 location of the model archive and an IAM role for authentication.

from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlowModel

role = get_execution_role()  # IAM role used to access the model artifact and create the endpoint

sagemaker_model = TensorFlowModel(
    model_data=f"s3://{bucket_name}/model/model.tar.gz",
    role=role,
    framework_version="2.3",
)

Step 4: Deploy your model to an Amazon SageMaker endpoint

Now the model is ready to be deployed to a SageMaker endpoint. This is done with the sagemaker.tensorflow.model.TensorFlowModel.deploy method. For this example, a single 'ml.m5.2xlarge' instance is sufficient, unless your workload calls for a different instance type.

# Deploy the model to a real-time endpoint backed by a single ml.m5.2xlarge instance
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type="ml.m5.2xlarge")

Step 5: Test your endpoint

Now that your model is deployed, you can test it by sending requests to your endpoint. You can use any HTTP client, such as curl, requests, or boto3, or simply use the predictor object returned by deploy. Here’s an example using the predictor:

# One Iris sample: sepal length, sepal width, petal length, petal width
sample = [[6.4, 3.2, 4.5, 1.5]]
predictor.predict(sample)

This sends a request to your endpoint with a single input sample, [6.4, 3.2, 4.5, 1.5], wrapped in an outer list because the endpoint expects a batch of instances. The response contains the model’s prediction in JSON format.

Output:

# note: the exact values will differ between training runs
{'predictions': [[0.01628883, 0.716617942, 0.267093182]]}
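
The three values under predictions are class probabilities for the Iris species, so the predicted class is simply the index with the highest probability. Below is a minimal sketch that maps that index back to a species name (assuming the label order of the scikit-learn Iris dataset used above) and also shows how the same endpoint can be invoked directly with boto3 instead of the SageMaker SDK.

import json

import boto3
import numpy as np

# Map the probability vector back to a species name (label order assumed from load_iris)
class_names = ["setosa", "versicolor", "virginica"]
result = predictor.predict([[6.4, 3.2, 4.5, 1.5]])
probs = result["predictions"][0]
print(class_names[int(np.argmax(probs))])  # e.g. "versicolor"

# Alternative: call the endpoint directly with boto3 (no SageMaker SDK required)
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"instances": [[6.4, 3.2, 4.5, 1.5]]}),
)
print(json.loads(response["Body"].read()))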

Conclusion

In this article, we explored how you can use TensorFlow Serving on Amazon SageMaker to deploy your TensorFlow models at scale. We covered the benefits of using these services together, as well as the technical details of preparing your model for deployment, uploading it to Amazon S3, and deploying your model to an Amazon SageMaker endpoint. By following these steps, you can quickly and easily deploy your models and start making predictions on new data.


About Saturn Cloud

Saturn Cloud is your all-in-one solution for data science & ML development, deployment, and data pipelines in the cloud. Spin up a notebook with 4TB of RAM, add a GPU, connect to a distributed cluster of workers, and more. Join today and get 150 hours of free compute per month.