An Introduction to AWS SageMaker

Amazon SageMaker is a fully managed service from Amazon Web Services (AWS) that lets you create, train, and deploy machine learning (ML) models that address business needs, with managed infrastructure, tools, and workflows. It makes it fast and easy to build, train, and deploy ML models that solve business challenges, and it integrates with several other AWS services, including Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, Amazon Redshift, Amazon CloudWatch, Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), and Amazon EMR.

Tasks can be executed through the visual console, the SDKs, or the AWS CLI, and these interfaces can be used in parallel.
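For instance, the same operations the console exposes are also available programmatically. A minimal sketch using the boto3 SDK (illustrative only; the output depends on what exists in your account):

import boto3

# List recent training jobs and their statuses through the SageMaker API,
# mirroring what the "Training jobs" page shows in the console.
sm = boto3.client('sagemaker')
response = sm.list_training_jobs(MaxResults=10)
for job in response['TrainingJobs']:
    print(job['TrainingJobName'], job['TrainingJobStatus'])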

Here is an example workflow:

  1. Working with a dataset of JSON files, build, train, and deploy a classification model that sorts financial records into three categories: loans, deposits, or cash flow.
  2. Check that the algorithm correctly classifies financial records whose labels are already known.
  3. Iterate on the model as you learn from the results.

This process demonstrates training a classification model on a dataset of financial records and then streaming the results to Amazon Redshift. Once the code and the model are created, they can be exported to Amazon S3 for hosting and execution, scaled on a cloud cluster, and connected to a Kinesis stream for streaming data ingestion.

What is AWS?

Amazon Web Services (AWS) is an on-demand cloud platform offered by Amazon that provides services over the internet. AWS services can be used to build, monitor, and deploy any type of application in the cloud. This is where AWS SageMaker comes into play.


What is AWS SageMaker?

Amazon SageMaker is a cloud-based machine-learning platform that helps users create, design, train, tune, and deploy machine-learning models in a production-ready hosted environment. SageMaker brings a number of advantages, covered in the next section.

Advantages of AWS SageMaker

Some of the advantages of SageMaker are listed below:

  • It enhances the productivity of machine learning projects
  • It helps create and manage compute instances in minimal time
  • It inspects raw data and automatically creates, trains, and deploys models with complete visibility
  • It can reduce the cost of building machine learning models by up to 70%
  • It reduces the time required for data-labeling tasks
  • It stores all ML components in one place
  • It is highly scalable and trains models faster

Machine Learning With AWS SageMaker

Now, let’s have a look at the concept of Machine Learning With AWS SageMaker and understand how to build, test, tune, and deploy a model.

The following diagram shows how machine learning works with AWS SageMaker.

[Diagram: the machine learning workflow with AWS SageMaker - build, test and tune, deploy]

Build

  • It provides more than 15 widely used ML algorithms for training purposes
  • It lets you select the required server size for your notebook instance
  • A user can write code (for creating model training jobs) using a notebook instance
  • Choose and optimize the algorithm required for your use case
  • AWS SageMaker helps developers customize Machine Learning instances with the Jupyter notebook interface (a short sketch of looking up a built-in algorithm's container follows this list)
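For example, the container image for a built-in algorithm such as XGBoost can be looked up with the SageMaker Python SDK. A minimal sketch (the version string is an assumed example; check the documentation for the versions available in your region):

import boto3
import sagemaker

# Resolve the ECR image URI for the built-in XGBoost algorithm in this region.
region = boto3.session.Session().region_name
container = sagemaker.image_uris.retrieve('xgboost', region, version='1.5-1')
print(container)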

Test and Tune

  • Set up and import the required libraries
  • Define the environment variables needed for training the model
  • Train and tune the model with Amazon SageMaker
  • SageMaker implements hyperparameter tuning by searching for a suitable combination of algorithm parameters (see the sketch after this section)
  • SageMaker uses Amazon S3 to store data, as it's safe and secure

Note: S3 is used for storing and retrieving data over the internet.

  • SageMaker uses Amazon ECR to manage Docker containers, as it is highly scalable

Note: ECR helps a user store, monitor, and deploy Docker containers.

  • SageMaker stores the training data in Amazon S3, while the training algorithm code lives in ECR
  • SageMaker then sets up a cluster, trains the model on the input data, and stores the resulting model artifacts in Amazon S3

Note: If you need predictions for limited amounts of data at a time, use Amazon SageMaker hosting services; if you need predictions for an entire dataset, use Amazon SageMaker batch transform.
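As a rough illustration of the tuning step, the SageMaker Python SDK (v2-style names) provides a HyperparameterTuner that searches for a good combination of algorithm parameters. This is a minimal sketch, assuming container, role, bucket_name, s3_input_train, and s3_input_validation already exist as placeholders from earlier steps:

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# `container`, `role`, and `bucket_name` are placeholders from earlier steps.
xgb = Estimator(image_uri=container,
                role=role,
                instance_count=1,
                instance_type='ml.m5.xlarge',
                output_path='s3://{}/output'.format(bucket_name),
                sagemaker_session=sagemaker.Session())
# Fixed hyperparameters; eval_metric='auc' makes validation:auc available to the tuner.
xgb.set_hyperparameters(objective='binary:logistic', eval_metric='auc', num_round=100)

# Search ranges for two common XGBoost hyperparameters.
tuner = HyperparameterTuner(estimator=xgb,
                            objective_metric_name='validation:auc',
                            hyperparameter_ranges={'eta': ContinuousParameter(0.01, 0.3),
                                                   'max_depth': IntegerParameter(3, 10)},
                            max_jobs=9,           # total training jobs to run
                            max_parallel_jobs=3)  # jobs run concurrently

# `s3_input_train` / `s3_input_validation` are placeholders for S3 training inputs.
tuner.fit({'train': s3_input_train, 'validation': s3_input_validation})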

Deploy

  • Once tuning is done, models can be deployed to SageMaker endpoints
  • The endpoints serve real-time predictions
  • Finally, evaluate your model and determine whether you have achieved your business goals (a deployment sketch follows this list)
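A hedged sketch of the deploy step with the SageMaker Python SDK, assuming xgb is the trained estimator from the previous sketch:

from sagemaker.serializers import CSVSerializer

# Deploy the trained model to a real-time HTTPS endpoint.
predictor = xgb.deploy(initial_instance_count=1,
                       instance_type='ml.m5.large',
                       serializer=CSVSerializer())

# Get a real-time prediction (the CSV payload here is an assumed example).
result = predictor.predict('5.1,3.5,1.4,0.2')
print(result)

# Delete the endpoint when done to stop incurring charges.
predictor.delete_endpoint()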

How to Train a Model With AWS SageMaker?

Model training in SageMaker is done on machine learning compute instances.

  • When you train a model in Amazon SageMaker, you create a training job.
  • A training job comprises the following (these fields map almost one-to-one onto the low-level CreateTrainingJob API; a boto3 sketch follows this list):
    • Training-data S3 bucket: the URL of the Amazon S3 bucket where the training data is stored
    • ML compute instances: the compute resources on which SageMaker trains the model
    • Output S3 bucket: the URL of the Amazon S3 bucket where the output will be stored
    • Training code image: the Amazon Elastic Container Registry (ECR) path where the training code is stored
  • The input data is fetched from the specified Amazon S3 bucket
  • Once the training job is created, Amazon SageMaker launches the ML compute instances
  • It then trains the model with the training code and dataset
  • SageMaker stores the output and model artifacts in the specified Amazon S3 bucket
  • If the training code fails, the helper code performs the remaining tasks
  • The inference code consists of multiple linear-sequence containers that process requests for inferences on data
  • Amazon Elastic Container Registry (ECR) is a container registry that helps users store, monitor, and deploy container images

Note: Container images are ready-to-run packaged applications.

  • Once training completes, the output is stored in the specified Amazon S3 bucket
  • SageMaker reserves part of each ML compute instance for its critical system processes, so your training algorithm should not expect to use all of the instance's resources
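A minimal boto3 sketch of CreateTrainingJob; every name, ARN, and S3 URI below is a placeholder for illustration:

import boto3

sm = boto3.client('sagemaker')
sm.create_training_job(
    TrainingJobName='demo-xgboost-job',  # placeholder job name
    AlgorithmSpecification={
        # ECR path of the training code (a legacy public XGBoost image as an example)
        'TrainingImage': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
        'TrainingInputMode': 'File'},
    RoleArn='arn:aws:iam::123456789012:role/DemoSageMakerRole',  # placeholder IAM role
    InputDataConfig=[{  # the S3 bucket where the training data is stored
        'ChannelName': 'train',
        'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',
                                        'S3Uri': 's3://my-bucket/train/'}}}],
    OutputDataConfig={'S3OutputPath': 's3://my-bucket/output/'},  # output bucket
    ResourceConfig={'InstanceType': 'ml.m5.xlarge',  # ML compute instances
                    'InstanceCount': 1,
                    'VolumeSizeInGB': 10},
    StoppingCondition={'MaxRuntimeInSeconds': 3600})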

How to Validate a Model With SageMaker?

You can evaluate your model in several ways:

1. Offline Testing

Use historical data to send requests to the model through a Jupyter notebook in Amazon SageMaker for evaluation.

2. Online Testing with Live Data

Deploy multiple models to an Amazon SageMaker endpoint and direct a portion of live traffic to each model for validation.

3. Validating Using a "Holdout Set"

Here, a portion of the data is set aside as a "holdout set". The model is trained on the remaining input data and then tested against the holdout set to check how well it generalizes beyond what it saw during training.

4. K-fold Validation

Here, the input data is split into k equally sized subsets, or folds. In each of k rounds, one fold is held out as validation data for testing the model, while the remaining k − 1 folds are used as training data; the k validation scores are then averaged to evaluate the model (a small sketch follows).
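The k-fold idea is easiest to see with a small local example. A minimal scikit-learn sketch on toy data (illustrative only, not SageMaker-specific):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Toy dataset: 100 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])  # train on k-1 folds
    scores.append(accuracy_score(y[val_idx], model.predict(X[val_idx])))  # test on the held-out fold
print('mean accuracy across folds:', np.mean(scores))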


Companies Using the SageMaker Service

Let's consider the example of ProQuest.

ProQuest is a global information-content and technology company that provides valuable content, such as eBooks and newspapers, to its users.

ProQuest used AWS SageMaker to create a content recommendation system. With the help of SageMaker, ProQuest was able to deliver a better video user experience and provide more relevant search results.

Demo: Steps to Build and Train a Machine Learning Model Using AWS SageMaker

Let us create a SageMaker notebook instance:

  • To create a notebook instance, use either the SageMaker console or the CreateNotebookInstance API (a boto3 sketch follows this list)
  • First, open the SageMaker console at https://console.aws.amazon.com/sagemaker/.
  • In the console, choose Notebook instances -> Create notebook instance
  • On the Create notebook instance page, enter the following information:
  • In Notebook instance name, type a suitable name (and, optionally, tags) for your notebook instance
  • Next, in Notebook instance type, select an appropriate instance size for your project
  • For Elastic Inference, choose none to skip the option, or select an inference accelerator type if you plan to run inferences from this notebook
  • (Optional) In the additional configuration options, you can specify the size of the ML storage volume, in GB, for the notebook instance
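The same instance can be created programmatically with the CreateNotebookInstance API mentioned above. A minimal boto3 sketch (the name and role ARN are placeholders):

import boto3

sm = boto3.client('sagemaker')
sm.create_notebook_instance(
    NotebookInstanceName='MySageMakerInstance',                # placeholder name
    InstanceType='ml.t2.medium',                               # instance size for your project
    RoleArn='arn:aws:iam::123456789012:role/MySageMakerRole',  # placeholder IAM role ARN
    VolumeSizeInGB=5,                                          # optional ML storage volume
    RootAccess='Enabled')                                      # or 'Disabled'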

Create an IAM Role

  • Next, specify the IAM role for the SageMaker model. You can either select an existing IAM role in your account or create a new role.
  • Now, decide whether to allow root access for all notebook instance users. To allow it, select Enable; to disallow it, select Disable.
  • Finally, choose Create notebook instance.

Within a few minutes, SageMaker creates the notebook instance and attaches a storage volume.

Note: This notebook instance comes with a preconfigured Jupyter notebook server and predefined libraries.


Prepare Data Using AWS SageMaker

  • Now, use the Amazon SageMaker notebook to prepare the data you need to train your ML model. (Note: wait until the SageMaker instance status changes from Pending to InService.)
  • Once the Jupyter notebook opens, go to the Files tab -> New -> conda_python3
  • In this step, you train and deploy the ML model, starting by importing the necessary libraries into your Jupyter notebook environment
  • Copy the following code into the code cell in your instance and select Run

# Import libraries
import boto3, re, sys, math, json, os, sagemaker, urllib.request
from sagemaker import get_execution_role
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
from IPython.display import display
from time import gmtime, strftime
from sagemaker.predictor import csv_serializer

# Define IAM role
role = get_execution_role()
prefix = 'sagemaker/DEMO-xgboost-dm'
containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',
              'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
              'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest',
              'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:latest'} # each region has its own XGBoost container
my_region = boto3.session.Session().region_name # set the region of the instance
print("Success - the MySageMakerInstance is in the " + my_region + " region. You will use the " + containers[my_region] + " container for your SageMaker endpoint.")

  • Next, create an Amazon S3 bucket to store your data by copying the program below into the next code cell in your notebook

bucket_name = 'dummydemo' # <--- CHANGE THIS VARIABLE TO A UNIQUE NAME FOR YOUR BUCKET
s3 = boto3.resource('s3')
try:
    if my_region == 'us-east-1':
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error: ', e)

  • Select Run. If you get an error, the bucket name is probably already taken; rename the S3 bucket and run the cell again.
  • The following output appears after execution:


Output:

S3 bucket created successfully
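As a possible next step (not shown in the original demo), you would upload your training data into the new bucket, for example:

import boto3

# Upload a local CSV of training data into the bucket created above;
# the file name and key are placeholders for illustration.
s3 = boto3.resource('s3')
s3.Bucket(bucket_name).upload_file('train.csv', 'sagemaker/DEMO-xgboost-dm/train/train.csv')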

With this, we reach the end of the demo portion of this article.

What Type of Machine Learning Problems Can Amazon SageMaker Solve?

To answer this, we first need to know the problem at hand, how existing machine learning approaches handle it, and what kind of solutions Amazon SageMaker can offer.

Let's try to understand how machine learning works in a specific scenario. The back end of a store is filled with products that are classified as "new," "similar," "special," and "used." For each product there is a target attribute, "category use," and a set of elements, such as titles, descriptions, and price, from which the features are produced.

In an Amazon product search, you are looking for a large bucket of products with a high probability of being relevant to many customers. You can use this technique to populate the "used" element of the "category use" bucket. Without SageMaker, you would need to create the parameters manually; these could be the industry, the product, or the website where the category is used.

Amazon SageMaker is designed to solve problems like this. You specify some fields with a predefined probability of each element belonging to a category, and you define a target segment to examine. The list of options is long; in our example, Amazon uses the same bucket with the same list of categories for all vendors with the same numbers, so it remains an excellent source for selecting the right vendor. Amazon then automatically creates the appropriate machine learning classifier for each piece, and the search results show only vendors that meet your chosen parameters.

On the visual side, Amazon applies data science to surface the categories you specified in your SageMaker problem description; the ones it finds show up as a list of products. The process is entirely automated, and Amazon can look for categories whose size and probability distribution are interesting for your use case. Your solution will ultimately differ from the off-the-shelf machine learning solution you would usually buy.

So, how do we build this system? You could ask for examples of other solutions to show that it is possible.

We could use a list of all the vendors with the same number of products in the same buckets, each modeled with a classical machine learning algorithm such as Hidden Markov Models (HMMs), Support Vector Machines (SVMs), or Random Forests. But this approach has a disadvantage: the training set. There are not enough data points for all the vendors in the problem to develop a good solution, and we might also give up objectivity, as it is hard to see where the results of this step come from.

With Amazon SageMaker, however, you can input the same problems you usually would, with the same data, and build a solution, or at least the first step of one, in less than an hour.


Why Should I Try Amazon SageMaker?

Now that you understand what Amazon SageMaker can do for you, it's time to put the pieces together. Before we can use Amazon SageMaker, we need to train the machine learning classifiers, and as you can imagine, this is not a straightforward process.

To learn how to build this system, you need some data science and machine learning expertise. At the same time, you need a dataset with a large sample of products.

If you are familiar with R, the libraries for working with data in R are pretty good and available on GitHub. For Python users, the best-known machine learning package is scikit-learn; I will not be covering it here, but I recommend reading more about it.

If you do not use R, there are several Python packages, such as pandas, for building data pipelines; all of them can be installed with pip.

Once you have a dataset, go to Amazon SageMaker and use its machine learning APIs, enabling Amazon CloudWatch for monitoring. Then you can build the job with a job execution plan. Once you have your job and the CloudWatch parameters, you can run the job, or the first step of the job, and apply the result to the data stored in Amazon S3.

After you have created the job, you need to submit it and wait for a response. Then you need to test it on another dataset; if you apply the same scenario to your test dataset, you should see similar results. So you submit the job again and wait for the results.

That's all it takes to build a machine learning model, apply the model to your problem, and get an answer to your question.

One Last Thing: How Long Does It Take?

You can now click the button to ask Amazon SageMaker for a solution, and you are presented with a job and its CloudWatch parameters. There are three tasks to run and a few steps to complete before the results appear.

But you will probably be impatient, so we will try a simple case: a scenario with just one ingredient. If you are familiar with A/B testing, you can come up with a system like this: we want to compare two options (things that differ in only one ingredient), and we want the results to be precise, so that we can see our marketing strategy is working and that this is not just a random comparison.

Here is the example scenario: we want to show that cookies are just as appealing to the user as ice cream. We present two options and let users choose whether they want cookies or ice cream.

How many cookies are we offering? One cookie, two cookies, or three cookies? Say we offer two options, one cookie or two cookies; which would the user choose?

The results appear quickly and clearly: they tell us how many cookies were offered (in the cookie case) and whether the user opted for a cookie or ice cream (in the ice cream case).
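In SageMaker terms, an experiment like this maps onto deploying two model variants behind a single endpoint and splitting live traffic between them. A hedged boto3 sketch, assuming two SageMaker models with the placeholder names model-cookies and model-icecream have already been created:

import boto3

sm = boto3.client('sagemaker')

# Send 50% of live traffic to each variant for the A/B comparison.
sm.create_endpoint_config(
    EndpointConfigName='ab-test-config',
    ProductionVariants=[{'VariantName': 'Cookies',
                         'ModelName': 'model-cookies',   # placeholder model
                         'InstanceType': 'ml.m5.large',
                         'InitialInstanceCount': 1,
                         'InitialVariantWeight': 0.5},
                        {'VariantName': 'IceCream',
                         'ModelName': 'model-icecream',  # placeholder model
                         'InstanceType': 'ml.m5.large',
                         'InitialInstanceCount': 1,
                         'InitialVariantWeight': 0.5}])

sm.create_endpoint(EndpointName='ab-test-endpoint',
                   EndpointConfigName='ab-test-config')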


Conclusion

Is everything clear about AWS SageMaker and its benefits, how machine learning works with SageMaker, the different ways to train a model, how to validate a model with SageMaker, and the companies using SageMaker?

Whether you're an experienced AWS Architect or aspiring to break into this exciting industry, enrolling in our Cloud Architect Master's program will help learners at all levels of experience master AWS Cloud Architect techniques and strategies.

The AWS Solution Architect Certification encompasses a wide range of AWS services, including AWS SageMaker, a fully managed service that provides the ability to build, train, and deploy machine learning models quickly. For individuals pursuing this certification, a comprehensive understanding of SageMaker is crucial, as it represents a key component of AWS's machine learning offerings.

Do you have any questions? Please feel free to leave them in the comments section of this article; our experts will get back to you as soon as possible.

About the Author

Sana Afreen

Sana Afreen is a Senior Research Analyst at Simplilearn and works on several of the latest technologies. She holds a B.Tech degree in Computer Science and has also achieved certification in Advanced SEO. Sana likes to explore new places for their cultures, traditions, and cuisines.
