How to Make Predictions with SageMaker on a Pandas DataFrame

In the world of data science, making predictions on large datasets is a common task. Amazon SageMaker, a fully managed machine learning service, provides a powerful platform for this purpose. In this blog post, we’ll guide you through the process of making predictions with SageMaker on a Pandas DataFrame.

Table of Contents

  1. Prerequisites
  2. Step 1: Setting Up Your Environment
  3. Step 2: Importing Your Data
  4. Step 3: Preprocessing Your Data
  5. Step 4: Loading Your SageMaker Model
  6. Step 5: Making Predictions
  7. Step 6: Postprocessing Your Predictions
  8. Conclusion

Prerequisites

Before we start, ensure you have the following:

  • An AWS account
  • A trained ML model uploaded to S3 and deployed to a SageMaker endpoint (Step 4 invokes the endpoint by name). In this example, we already trained a RandomForest model and uploaded it to our S3 bucket.
  • Basic knowledge of Python and Pandas
  • Familiarity with Amazon SageMaker

Step 1: Setting Up Your Environment

First, install the necessary libraries. You’ll need boto3, sagemaker, and pandas, plus s3fs so that pandas can read s3:// paths directly. You can install them using pip:

pip install boto3 sagemaker pandas s3fs

Step 2: Importing Your Data

Next, import your data into a Pandas DataFrame. For this tutorial, we’ll use a CSV file stored in an S3 bucket.

import pandas as pd
# Load your test data
test_data = pd.read_csv('s3://your-bucket-name/your-data.csv')
test_data.head()

Step 3: Preprocessing Your Data

Before making predictions, preprocess your data to ensure it’s in the right format. This might involve cleaning, normalizing, or encoding your data.

# Preprocessing code here
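
As a rough illustration of what this step can look like, here is a minimal sketch. The function name preprocess, the sample columns, and the choice of median filling and one-hot encoding are all assumptions for the example, not part of the original pipeline; your model needs whatever transformations were applied when it was trained.

```python
import pandas as pd

def preprocess(df, feature_names):
    """Return a cleaned copy holding only the requested feature columns."""
    features = df[feature_names].copy()
    # Fill missing numeric values with each column's median
    numeric_cols = features.select_dtypes(include="number").columns
    features[numeric_cols] = features[numeric_cols].fillna(
        features[numeric_cols].median()
    )
    # One-hot encode the remaining text columns as 0/1 integers
    return pd.get_dummies(features, dtype=int)

# Hypothetical sample data, only to show the function in use
sample = pd.DataFrame(
    {"age": [30.0, None, 50.0], "color": ["red", "blue", "red"], "id": [1, 2, 3]}
)
cleaned = preprocess(sample, ["age", "color"])
print(cleaned)
```

Note that the columns sent to the endpoint must match the layout used at training time, in the same order. One-hot encoding the test data on its own can produce a different set of columns, so in practice you would reuse the encoder or column list saved during training.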

Step 4: Loading Your SageMaker Model

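Connect to the endpoint that hosts your pre-trained SageMaker model. The boto3 sagemaker-runtime client sends inference requests to a deployed endpoint; it does not load the model artifact from S3 itself.

import boto3

# Runtime client used to invoke a deployed endpoint
runtime = boto3.client('sagemaker-runtime')
endpoint_name = "your-endpoint-name"

# Feature columns the model expects, in training order.
# Assumption: every column in test_data is a feature; adjust as needed.
feature_names = list(test_data.columns)

# Serialize the features as CSV (no header, no index) and invoke the endpoint
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=test_data[feature_names].to_csv(header=False, index=False).encode("utf-8"),
    ContentType="text/csv",
)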
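
Real-time endpoints limit the size of a single request payload, so a large DataFrame may need to be sent in several pieces. The sketch below shows one way to do that. The helper names iter_csv_chunks and invoke_in_chunks and the default of 1000 rows per chunk are assumptions for illustration; pick a chunk size that keeps each payload under the limit that applies to your endpoint.

```python
import pandas as pd

def iter_csv_chunks(df, rows_per_chunk=1000):
    """Yield the DataFrame as UTF-8 CSV payloads of at most rows_per_chunk rows."""
    for start in range(0, len(df), rows_per_chunk):
        chunk = df.iloc[start:start + rows_per_chunk]
        yield chunk.to_csv(header=False, index=False).encode("utf-8")

def invoke_in_chunks(runtime, endpoint_name, df, rows_per_chunk=1000):
    """Call invoke_endpoint once per chunk and return the raw response bodies."""
    bodies = []
    for payload in iter_csv_chunks(df, rows_per_chunk):
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            Body=payload,
            ContentType="text/csv",
        )
        bodies.append(response["Body"].read())
    return bodies

# Local demonstration of the chunking only; no AWS call is made here
demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
chunks = list(iter_csv_chunks(demo, rows_per_chunk=2))
print(len(chunks))
```

With a real client you would call invoke_in_chunks(runtime, endpoint_name, test_data[feature_names]) and then parse and concatenate the returned bodies in order.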

Step 5: Making Predictions

The invoke_endpoint call in Step 4 sends your data in CSV format and returns a response whose Body is a byte stream. Read that stream to get the raw predictions:

predictions = response["Body"].read()

Step 6: Postprocessing Your Predictions

Finally, postprocess your predictions as needed. This might involve converting them back into a DataFrame, or further analysis.

import ast
# Decode the response bytes and parse them into a Python list.
# This assumes the endpoint returns a list literal such as "[0, 1, 1]".
data_str = predictions.decode('utf-8')
actual_list = ast.literal_eval(data_str)

# Add the predicted list to the original DataFrame
test_data['prediction'] = actual_list
test_data.head()
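
The response format depends on how the model container serializes its output, so the parsing step may need to differ from the one above. As a hedged sketch, the hypothetical helper parse_predictions below accepts either a JSON array or plain comma- or newline-separated numbers; both formats are assumptions about what an endpoint might return, so check the actual output of yours.

```python
import json

def parse_predictions(raw):
    """Parse endpoint output given as a JSON array or as delimited numeric text."""
    text = raw.decode("utf-8").strip()
    if text.startswith("["):
        # JSON array such as "[0, 1, 1]"
        return json.loads(text)
    # Delimited text such as "0,1,1" or one value per line
    tokens = text.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]

print(parse_predictions(b"[0, 1, 1]"))
print(parse_predictions(b"0.5\n1.0\n"))
```

Before assigning the result to a new column, it is worth checking that its length equals the number of rows in the DataFrame, since a mismatch makes the assignment fail.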

Conclusion

Amazon SageMaker provides a powerful, scalable solution for making predictions on large datasets. By integrating it with Pandas, you can leverage the power of these two tools to make your data science tasks easier and more efficient.

Remember, this is just a basic guide. Depending on your specific use case, you might need to adjust or add steps. For example, you might need to split your data into training and test sets, or tune your model’s hyperparameters.

We hope this guide has helped you understand how to make predictions with SageMaker on a Pandas DataFrame.