A Detailed Guide to Amazon SageMaker
In this blog post, we’re diving into the world of Amazon SageMaker, providing a detailed overview of all its components. Amazon SageMaker is a comprehensive machine learning service from Amazon Web Services (AWS), designed to cater to the needs of data scientists, developers, and businesses. Our goal is to clarify how each part of SageMaker functions and interacts, covering everything from data labeling with Ground Truth to the complexities of model training and deployment. This guide is aimed at anyone looking to understand and utilize Amazon SageMaker, whether you’re new to machine learning or an experienced practitioner. We’ll break down the platform’s features and services, giving you a practical understanding of leveraging SageMaker for your machine learning projects.
What is SageMaker, and how does it work?
SageMaker is a product suite that touches various parts of the ML life cycle. By my count, AWS SageMaker has 14 distinct products, making it pretty challenging to wrap your head around. Most ML teams only need 3 of the 14 products, and the rest can be ignored until you need them. We don’t have hard data on this, but from conversations with other data scientists, most people are leveraging Amazon SageMaker for a cloud data science development environment (Hosted Notebooks), model serving (SageMaker Inference), and model training (SageMaker Training).
Amazon actually has 3 SageMaker products for hosted notebooks: Amazon SageMaker Studio, Amazon SageMaker Studio Classic, and Amazon SageMaker Notebook Instances. Amazon SageMaker Studio, the most recent offering released at the end of 2023, unifies all the best parts of Studio Classic and Notebook Instances and is probably where you should start.
Components of AWS SageMaker
This section goes through a brief overview of all the components of AWS SageMaker. Feel free to skip this section or only go through the components that you are interested in.
Developing code in SageMaker Studio
There are currently 3 IDE products in SageMaker. Amazon SageMaker Studio is now the default environment, with Studio Classic and Notebook Instances as alternative legacy environments.
What’s the difference between SageMaker Studio, Studio Classic, and Notebook Instances?
As mentioned above, Amazon seems to have released a new offering, “SageMaker Studio,” at the end of 2023, which is probably the best path forward for new users to SageMaker. Here’s a breakdown of their differences:
Amazon SageMaker Studio: SageMaker Studio is an integrated development environment (IDE) for machine learning. It provides a single, web-based visual interface where you can perform all machine learning development steps. This includes writing code, tracking experiments, visualizing data, and deploying models. SageMaker Studio supports JupyterLab, RStudio Workbench (if you have a valid license), and Code-OSS interactive environments. File systems (aka Spaces) used in SageMaker Studio are hosted in EBS volumes and cannot be shared. There is also no way to bring your own Docker images into SageMaker Studio, but you can for Studio Classic.
Amazon SageMaker Studio Classic: SageMaker Studio Classic is a legacy product. It supports JupyterLab as the primary interface and uses EFS (managed NFS) as primary storage. SageMaker Studio Classic Spaces can be shared, unlike Spaces in SageMaker Studio. SageMaker Studio Classic can also be customized with your own Docker images, Git repositories, and lifecycle configurations. Learn how to build custom Docker images for SageMaker here.
Amazon SageMaker Notebook Instances: SageMaker Notebook Instances are also a legacy product. These are fully managed instances running Jupyter notebooks. Unlike Studio, these are more standalone and provide a basic Jupyter environment without the advanced integrations and tools of SageMaker Studio.
Amazon SageMaker Training
After the IDE features, many users also leverage SageMaker Training, a core component of the Amazon SageMaker service, designed to simplify the process of training machine learning models. In a nutshell, SageMaker Training allows you to run and manage machine learning model training jobs easily. This process is integral to developing models to make predictions or analyze trends based on input data.
The Training Process
The training process in Amazon SageMaker involves several key steps. First, you start a model training job, where SageMaker uses algorithms and data you provide to train a model. The service supports various built-in algorithms, or you can specify your own. You also need to provide the data for training, which SageMaker accesses from specified Amazon S3 locations.
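To make the steps above concrete, here is a minimal sketch of what starting a training job can look like through boto3's create_training_job call. The job name, image URI placeholder, role ARN, bucket paths, and instance settings are all made-up values for illustration, and the helper function names are our own, not part of any AWS library.

```python
from typing import Any, Dict


def build_training_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's SageMaker create_training_job."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            # Image of a built-in algorithm or of your own training container.
            "TrainingImage": image_uri,
            # "File" copies the S3 data onto the instance before training starts.
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Illustrative instance settings; pick what fits your workload.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def start_training_job(request: Dict[str, Any]) -> None:
    """Submit the job. Needs boto3, AWS credentials, and real resources."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("sagemaker").create_training_job(**request)


# Hypothetical values, for illustration only.
request = build_training_request(
    job_name="demo-training-job",
    image_uri="<training-image-uri>",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
```

Keeping the request builder separate from the submit call means you can inspect or log the request before anything is launched. Many teams use the higher-level SageMaker Python SDK instead, which wraps the same API.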
Key Features and Capabilities
One of the key features of SageMaker Training is its flexibility in resource management. You can specify the type and quantity of resources (like compute instances) needed for the training job. This ensures efficient use of resources and can help optimize costs, especially with options like Managed Spot Training.
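As a rough sketch of how Managed Spot Training is switched on, the snippet below builds the relevant create_training_job fields and estimates the savings from the TrainingTimeInSeconds and BillableTimeInSeconds values that a finished job reports. The function names, the checkpoint bucket, and the timing numbers are assumptions for illustration.

```python
from typing import Any, Dict


def spot_training_settings(
    max_runtime_s: int, max_wait_s: int, checkpoint_s3_uri: str
) -> Dict[str, Any]:
    """Return the create_training_job fields that enable Managed Spot Training."""
    if max_wait_s < max_runtime_s:
        # The wait time covers waiting for capacity plus the run itself,
        # so it has to be at least as long as the maximum runtime.
        raise ValueError("max_wait_s must be >= max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # Checkpoints let an interrupted job resume instead of starting over.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Share of training time you were not billed for, as a percentage."""
    return 100.0 * (training_seconds - billable_seconds) / training_seconds


# Hypothetical values, for illustration only.
settings = spot_training_settings(3600, 7200, "s3://example-bucket/checkpoints/")
savings = spot_savings_percent(training_seconds=4000, billable_seconds=1000)
```

These fields would be merged into a full training job request. With the made-up numbers above, the job trained for 4000 seconds but was billed for 1000, which works out to a 75% saving.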
Hyperparameters and Optimization
Hyperparameters are critical in machine learning, and SageMaker Training allows you to set these algorithm-specific parameters. Furthermore, the service offers automatic hyperparameter tuning to optimize these parameters for the best model performance.
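Here is a small sketch of both halves of that paragraph: fixed hyperparameters, which the training API expects as strings, and a tuning configuration in the shape that create_hyper_parameter_tuning_job takes. The parameter names (eta, max_depth, num_round) and the metric name are borrowed from XGBoost purely as an example, and the ranges are arbitrary assumptions.

```python
from typing import Any, Dict


def as_api_hyperparameters(params: Dict[str, Any]) -> Dict[str, str]:
    """The training API takes hyperparameters as a string-to-string map."""
    return {name: str(value) for name, value in params.items()}


def tuning_job_config(metric_name: str, max_jobs: int, max_parallel: int) -> Dict[str, Any]:
    """Sketch of the HyperParameterTuningJobConfig section of a tuning request."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": metric_name,
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # Range bounds are passed as strings, like the fixed values.
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3",
                 "ScalingType": "Logarithmic"}
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "10",
                 "ScalingType": "Linear"}
            ],
        },
    }


# Hypothetical values, for illustration only.
static_params = as_api_hyperparameters({"eta": 0.1, "max_depth": 6, "num_round": 100})
tuning = tuning_job_config("validation:auc", max_jobs=20, max_parallel=2)
```

The fixed values go into a single training job, while the tuning configuration tells SageMaker which ranges to search, which metric to optimize, and how many jobs it may launch.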
Security and Compliance
Security is a priority in SageMaker Training. It provides options like network isolation and encryption to protect your data and training jobs. Additionally, SageMaker Training complies with various certifications and regulations, ensuring that your training jobs adhere to industry standards.
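For a sense of what those options look like in practice, the sketch below layers network isolation, traffic encryption, a VPC, and KMS keys onto an existing create_training_job request. The helper name, subnet and security group IDs, and the key alias are placeholders we made up; which settings you actually need depends on your own compliance requirements.

```python
import copy
from typing import Any, Dict, List


def with_security(
    request: Dict[str, Any],
    subnets: List[str],
    security_groups: List[str],
    kms_key_id: str,
) -> Dict[str, Any]:
    """Return a copy of a training job request with security options applied."""
    secured = copy.deepcopy(request)  # leave the caller's request untouched
    # Block outbound network calls from the training container.
    secured["EnableNetworkIsolation"] = True
    # Encrypt traffic between instances in a distributed training job.
    secured["EnableInterContainerTrafficEncryption"] = True
    # Run the job inside your own VPC.
    secured["VpcConfig"] = {
        "Subnets": list(subnets),
        "SecurityGroupIds": list(security_groups),
    }
    # Encrypt the model artifacts in S3 and the attached storage volume.
    secured.setdefault("OutputDataConfig", {})["KmsKeyId"] = kms_key_id
    secured.setdefault("ResourceConfig", {})["VolumeKmsKeyId"] = kms_key_id
    return secured


# Hypothetical values, for illustration only.
base_request = {
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/output/"},
    "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
}
secured_request = with_security(
    base_request,
    subnets=["subnet-0123456789abcdef0"],
    security_groups=["sg-0123456789abcdef0"],
    kms_key_id="alias/example-key",
)
```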
Integration and Usage
After training, the resulting model artifacts are stored in Amazon S3, which you can use for inference or further analysis. SageMaker Training seamlessly integrates with other AWS services, enhancing its capabilities and making it a versatile tool for machine learning tasks.
Amazon SageMaker Inference
Hosted models on SageMaker are also very popular. Amazon SageMaker Inference is the part of SageMaker that deploys machine learning models so they can make predictions, also known as inference. It involves several steps and features:
Deployment of Models: Amazon SageMaker facilitates the deployment of trained models for inference. This can be done in various environments tailored to different requirements, such as real-time or batch processing.
Real-Time Inference: SageMaker supports real-time inference by creating endpoints. These endpoints are scalable and can handle varying volumes of inference requests.
Batch Transform: For scenarios where real-time inference is not required, SageMaker offers Batch Transform, a feature for processing data files in batches.
Model Optimization: Amazon SageMaker Neo is a tool for optimizing models to run efficiently on specific hardware.
Endpoint Management and Testing: SageMaker provides tools for managing and testing these endpoints, ensuring they perform as expected under different conditions.
Inference Recommender: To assist in choosing the best deployment option, SageMaker Inference Recommender evaluates the model and suggests the most suitable deployment configuration.
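To give a feel for real-time inference from the caller's side, here is a minimal sketch that sends rows to an already-deployed endpoint using boto3's sagemaker-runtime client. It assumes the model accepts CSV input; the endpoint name and feature values are made up, and the helper names are ours.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv(rows: Iterable[Sequence[float]]) -> str:
    """Serialize feature rows as CSV text, one record per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def predict(endpoint_name: str, rows: Iterable[Sequence[float]]) -> str:
    """Call a deployed endpoint. Needs boto3, credentials, and a live endpoint."""
    import boto3  # imported lazily so rows_to_csv works without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumes the model container accepts CSV
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8")


# Hypothetical feature rows, for illustration only.
payload = rows_to_csv([[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
```

A call such as predict("example-endpoint", rows) returns whatever the model container sends back, so the response format depends on your model. Batch Transform skips the endpoint entirely and reads its input files from S3 instead.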
What are the other components of SageMaker?
There are several other components to SageMaker. These are less popular, but if you’re using SageMaker, you should know they exist.
Amazon SageMaker JumpStart
Amazon SageMaker JumpStart is a machine learning hub offering foundation models, built-in algorithms, and pre-built ML solutions for quick deployment. It’s designed to accelerate the machine learning journey, enabling users to evaluate, compare, and select models based on pre-defined metrics for summarization and image generation tasks. These pre-trained models are fully customizable with user data and can be easily deployed into production. SageMaker JumpStart also ensures data privacy and security, as all data is encrypted and remains within the user’s virtual private cloud.
Amazon SageMaker Governance
Amazon SageMaker Governance is an integral part of Amazon SageMaker, focusing on responsible and efficient management of machine learning models. It includes features like SageMaker Model Cards and Model Dashboard. The Model Cards provide a standardized way to document important details about machine learning models, ensuring transparency and consistency. This documentation includes model performance, training data, and ethical considerations. The Model Dashboard, on the other hand, offers a comprehensive view for monitoring the performance and behavior of models, ensuring they function as intended and remain aligned with compliance standards. These governance tools in SageMaker facilitate effective model management and help maintain trust and reliability in machine-learning projects.
Amazon SageMaker HyperPod
Amazon SageMaker HyperPod is a feature within Amazon SageMaker that enhances the performance and scalability of machine learning training jobs. HyperPod allows users to run these jobs on clusters of instances with high-speed networking, significantly improving training efficiency for large, complex models. This feature is particularly useful for deep learning tasks that require processing vast amounts of data or complex computations. By using HyperPod, users can reduce the time it takes to train models, making the process more efficient and cost-effective.
Amazon SageMaker Ground Truth
Amazon SageMaker Ground Truth is a key component of Amazon SageMaker, designed to help build highly accurate training datasets for machine learning quickly and cost-effectively. It offers tools for labeling data, such as images, text, and videos, and it can handle large-scale labeling tasks. Ground Truth supports different workforce options, including private, third-party, or Amazon Mechanical Turk, giving flexibility in managing data labeling tasks. The service also integrates machine learning to offer pre-labeling, making the process more efficient. Ground Truth is crucial for preparing high-quality data, which is essential for the success of machine learning models.
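A labeling job needs to know which objects to label, and Ground Truth takes that list as an input manifest: a JSON Lines file where each line points at one object in S3 through a source-ref key. The sketch below builds such a manifest; the function name and the bucket and image paths are invented for illustration.

```python
import json
from typing import Iterable


def build_input_manifest(s3_uris: Iterable[str]) -> str:
    """Build a JSON Lines manifest with one S3 object reference per line."""
    return "".join(json.dumps({"source-ref": uri}) + "\n" for uri in s3_uris)


# Hypothetical image locations, for illustration only.
manifest = build_input_manifest(
    [
        "s3://example-bucket/images/cat-001.jpg",
        "s3://example-bucket/images/dog-002.jpg",
    ]
)
```

You would then upload the manifest to S3 and reference it when creating the labeling job.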
Amazon SageMaker Processing
Amazon SageMaker Processing is a feature within Amazon SageMaker that allows for the efficient processing of machine learning datasets. It simplifies tasks such as data pre-processing, post-processing, feature engineering, and model evaluation. SageMaker Processing provides a managed environment where users can run these tasks on scalable infrastructure, separate from model training and deployment. It supports various data formats and is compatible with machine learning frameworks like Scikit-Learn. This makes it a versatile tool for handling diverse data processing needs in machine learning workflows.
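As an illustration, a Processing job typically runs a script like the one sketched below inside a container, with the input data mounted from S3 and the output folder copied back to S3 when the script finishes. The /opt/ml/processing/... paths follow the common convention but are set by you when you configure the job, and the file names and the choice of min-max normalization are assumptions for this example.

```python
import csv
from pathlib import Path
from typing import List

# Assumed mount points; the real paths come from the job's input/output config.
INPUT_PATH = Path("/opt/ml/processing/input/data.csv")
OUTPUT_PATH = Path("/opt/ml/processing/output/normalized.csv")


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale each column to the [0, 1] range; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


def main() -> None:
    """Read the mounted CSV (no header assumed), normalize it, write the result."""
    with INPUT_PATH.open(newline="") as source:
        rows = [[float(value) for value in record] for record in csv.reader(source)]
    OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    with OUTPUT_PATH.open("w", newline="") as target:
        csv.writer(target).writerows(min_max_normalize(rows))


# Only run when the input is actually mounted, e.g. inside the job's container.
if __name__ == "__main__" and INPUT_PATH.exists():
    main()
```

Because the script only talks to local paths, you can test min_max_normalize on your laptop and let SageMaker handle moving the data to and from S3.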
Amazon SageMaker Edge Manager
Amazon SageMaker Edge Manager is a feature within Amazon SageMaker that enhances machine learning model performance at the edge. It is designed for devices that aren’t always connected to a network. Edge Manager optimizes, secures, monitors, and maintains ML models on edge devices. This tool streamlines the process of running and updating machine learning models on various types of hardware outside traditional data centers, enabling more efficient and effective ML use in different environments.
Amazon SageMaker Augmented AI
Amazon SageMaker Augmented AI is a feature within Amazon SageMaker that facilitates human review of machine learning predictions. It integrates human judgment into ML workflows, particularly in cases where ML models cannot confidently make decisions. This tool is especially useful in scenarios requiring nuanced understanding or subjective assessment.
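The core idea can be sketched in a few lines of plain Python: predictions the model is confident about pass straight through, and the rest are set aside for a person to review. This is only an illustration of the routing logic, not the Augmented AI API itself; the 0.8 threshold, the field names, and the sample predictions are all assumptions.

```python
from typing import Any, Dict, List, Tuple

Prediction = Dict[str, Any]


def split_by_confidence(
    predictions: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Separate predictions into auto-accepted and needing human review."""
    accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for prediction in predictions:
        if prediction["confidence"] >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return accepted, needs_review


# Hypothetical model outputs, for illustration only.
predictions = [
    {"id": "doc-1", "label": "invoice", "confidence": 0.97},
    {"id": "doc-2", "label": "receipt", "confidence": 0.55},
    {"id": "doc-3", "label": "invoice", "confidence": 0.80},
]
accepted, needs_review = split_by_confidence(predictions)
```

In a real setup, the items in needs_review would be sent to a human review workflow, and the reviewers' answers would replace the low-confidence predictions.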
Amazon SageMaker Canvas
Amazon SageMaker Canvas is a feature within Amazon SageMaker, designed to simplify creating machine learning models without coding. It offers a user-friendly interface that allows users to build, train, and deploy machine learning models using a visual point-and-click approach. This makes it particularly accessible to business analysts and others without deep machine learning or programming expertise.
Key features of SageMaker Canvas include:
Ease of Use: With its no-code interface, SageMaker Canvas enables users to build machine learning models without writing any code. This feature is particularly beneficial for users who are not proficient in programming but need to leverage machine learning for their business needs.
Data Integration: SageMaker Canvas allows importing data from various sources, making it easier for users to gather the necessary data for their machine learning models.
Model Building and Deployment: Users can build custom models and deploy them effectively, facilitating the application of machine learning insights in practical business scenarios.
Collaboration and Sharing: The platform encourages collaboration by allowing the sharing and reviewing of models between business analysts and data science teams. This feature promotes a more integrated approach to model development and deployment.
Amazon SageMaker Profiler
Amazon SageMaker Profiler is a feature within Amazon SageMaker, designed to optimize the performance of machine learning models during training. It provides detailed insights into how computing resources like CPU and GPU are utilized, helping identify and resolve bottlenecks and inefficiencies.
Key aspects of SageMaker Profiler include:
Resource Monitoring: It profiles CPU and GPU usage, memory operations, and data transfers, offering a comprehensive view of resource utilization during model training.
User Interface: SageMaker Profiler includes a user interface that visualizes the profiling data, making it easier to understand and analyze the performance of training jobs.
Framework Support: It supports popular machine learning frameworks like PyTorch and TensorFlow, facilitating its integration into various machine learning projects.
Custom Annotations: Users can add custom annotations in their training scripts to focus the profiling on specific areas of interest, enhancing the relevance of the profiling data.
Efficiency Optimization: By providing detailed profiling data, SageMaker Profiler aids in optimizing the efficiency of training jobs, reducing time and resource consumption.
What does the ML workflow look like with SageMaker?
1. Set Up
- Environment Setup: Begin by setting up your AWS account and configuring SageMaker. This involves creating an S3 bucket for data storage, setting up IAM roles for permissions, and creating a SageMaker Studio domain or notebook instance to work in.
2. Prepare Data
- Data Collection and Storage: Collect the data you need for training and store it in Amazon S3. SageMaker integrates with S3 seamlessly, allowing easy access to datasets during the training and model evaluation stages.
- Data Pre-processing: Use SageMaker Processing or other AWS services to clean, pre-process, and transform your data into a format suitable for training. This might include normalization, tokenization, feature extraction, etc.
You can (but don’t have to) use the IDE resources in SageMaker, such as SageMaker Studio, to write the data preparation code. In principle, you could store the data anywhere, but S3 is where SageMaker expects it if you use other parts of the SageMaker infrastructure, such as SageMaker training jobs.
3. Develop the Model
Again, most people will write notebooks inside SageMaker Studio to build and train the first iterations of the model. When this becomes more mature, you can leverage SageMaker training jobs to scale out your training or help automate the model re-training.
4. Deploy Model
Once you’re happy with the model, you can leverage SageMaker Inference to deploy it.
5. Maintenance
SageMaker has various model monitoring tools you can leverage to monitor the model going forward and retrain it when you would like.
How should teams leverage SageMaker?
Ownership of objects in SageMaker
The first thing to understand about SageMaker and AWS is that objects in AWS do not have owners. Random EC2 instances that are created, EBS volumes, S3 buckets, none of those are owned by any specific user in AWS. This means that if you create SageMaker training jobs and model endpoints, those will also have no owners. The implication is that anyone with access to AWS SageMaker on the AWS account will have access to all SageMaker objects in the account. If you are leveraging SageMaker for a company, each data science team should have its own AWS account so you have clear separation for access control.
SageMaker Studio is an exception to this because it nests users one level down. Within SageMaker Studio, you can have multiple users, each with separate spaces, so you do get user isolation. However, since that does not extend to other SageMaker objects (like training jobs), we still recommend having a separate AWS account for each data science team.
Multiple users in SageMaker Studio
You can use SageMaker Studio with or without AWS SSO (now called AWS IAM Identity Center). Just know that if you are not using AWS SSO, the concept of users is a bit artificial. Without SSO, there is a simple drop-down with all SageMaker Studio users, and anyone can pretend to be any other user within SageMaker Studio. If you need true isolation, you will need to set up and use AWS SSO before you set up SageMaker.
What are common issues with Amazon SageMaker?
- How to build docker images for SageMaker
- How to SSH into SageMaker Instances
- Understanding SageMaker input modes (s3/efs/fsx)
- Configuring SageMaker for multiple users
- Using custom S3 buckets with SageMaker
- Using PyCharm with SageMaker
Additionally, if you’re looking for a workspace that can fit your exact needs, explore Saturn Cloud for more customization, scalability, and ease of use. Work in the cloud with any IDE, including Jupyter and the R IDE. Run jobs and deployments, build dashboards, and more.