AWS for Machine Learning

Pratibha
Jan 20, 2021

Updated: Mar 26, 2021

What is AWS?

Amazon Web Services (AWS) is a platform that offers flexible, reliable, scalable, easy-to-use and cost-effective cloud computing solutions.

It is a comprehensive, easy-to-use computing platform offered by Amazon. The platform combines infrastructure as a service (IaaS), platform as a service (PaaS) and packaged software as a service (SaaS) offerings.

AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 175 fully featured services from data centres around the world. Millions of customers, including the fastest-growing startups, largest enterprises, and leading government agencies, use AWS to lower costs, become more agile, and innovate faster.

AWS was first established in 2002 because Amazon wanted to offer its unused infrastructure to customers as a service. In 2006, AWS was relaunched and began offering IT infrastructure services to businesses in the form of web services, now commonly known as cloud computing.

Below is a list of companies utilising AWS:

Instagram
Zoopla
SmugMug
Pinterest
Netflix
Dropbox
Etsy
Talkbox
Playfish
Ftopia

Advantages of AWS

The following are the pros of using AWS services:

AWS allows organizations to use the already familiar programming models, operating systems, databases, and architectures.
You only need to pay for the service you avail, without any up-front or long-term commitments.
You do not need to spend money on running and maintaining data centres.
Offers fast deployments
You can easily add or remove capacity.
You get quick access to cloud resources with virtually unlimited capacity.
Total cost of ownership is low compared with private or dedicated servers.
Offers centralized billing and management
Offers hybrid (on-premises plus cloud) capabilities
Allows you to deploy your application in multiple regions around the world with just a few clicks

Disadvantages of AWS

The following are the cons of using AWS services:

If you need more immediate or intensive assistance, you'll have to opt for paid support packages.
Moving to AWS brings the usual cloud computing concerns, such as downtime, limited control over the infrastructure, and backup protection.
AWS sets default limits on resources such as images, volumes and snapshots, and these limits differ from region to region.
AWS may make hardware-level changes underneath your application, which may not always give your applications the best performance or resource usage.

AWS for machine learning

Machine learning is a field of computational science that analyses patterns and structures in data to support learning, reasoning, and decision-making, all without human intervention. Data is the lifeblood of business, and machine learning helps identify signals among the noise in data.

AWS offers the broadest and deepest set of machine learning services and supporting cloud infrastructure, putting machine learning in the hands of every developer, data scientist and expert practitioner. Named a Leader in Gartner's Magic Quadrant for Cloud AI Developer Services, AWS is helping tens of thousands of customers accelerate their machine learning journey.

Now that we have a good overview of AWS, we will move one step forward and discuss its services for machine learning.

1. Amazon SageMaker

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models at scale. It removes the complexity from each step of the ML workflow so you can more easily deploy your ML use cases, anything from predictive maintenance to computer vision to predicting customer behaviors.

SageMaker includes the following 12 features:

Amazon SageMaker Studio: the first fully integrated development environment (IDE) designed specifically for ML, bringing everything you need for ML under one unified, visual user interface. Using SageMaker’s integrated ML development capabilities can save months of writing custom integration code and ultimately reduce cost.
Amazon SageMaker Autopilot: automatically builds, trains, and tunes the best machine learning models based on your data, while allowing you to maintain full control and visibility. You simply provide a tabular dataset and select the target column to predict, which can be a number (such as a house price; this is called regression) or a category (such as spam/not spam; this is called classification). SageMaker Autopilot then automatically explores different solutions to find the best model. You can deploy the model directly to production with one click, or iterate on the recommended solutions in Amazon SageMaker Studio to further improve model quality.
Amazon SageMaker Ground Truth: a fully managed data labelling service that makes it easy to build highly accurate training datasets for machine learning. You can start labelling data in minutes through the SageMaker Ground Truth console, using custom or built-in labelling workflows that support use cases including 3D point clouds, video, images, and text. Within these workflows, labellers have access to assistive features such as automatic 3D cuboid snapping, removal of distortion in 2D images, and auto-segment tools that reduce the time needed to label datasets. Ground Truth also offers automatic data labelling, which uses a machine learning model to label your data.
Amazon SageMaker JumpStart: helps you get started with machine learning quickly. It provides a set of solutions for the most common use cases that can be deployed with just a few clicks. The solutions are fully customizable and showcase the use of AWS CloudFormation templates and reference architectures, so you can accelerate your ML journey. JumpStart also supports one-click deployment and fine-tuning of more than 150 popular open-source models, including natural language processing, object detection, and image classification models.
Amazon SageMaker Data Wrangler: reduces the time it takes to aggregate and prepare data for machine learning from weeks to minutes. It simplifies data preparation and feature engineering, letting you complete each step of the data preparation workflow (data selection, cleansing, exploration, and visualization) from a single visual interface. It includes over 300 built-in data transformations, so you can quickly normalize, transform, and combine features without writing any code.
Amazon SageMaker Feature Store: a fully managed, purpose-built repository to store, update, retrieve, and share machine learning features, making it much easier to name, organize, and reuse them across teams. It provides a unified store for features during both training and real-time inference, without additional code or manual processes to keep features consistent. It tracks the metadata of stored features (e.g. feature name or version number), so you can query features for the right attributes, either in batches using Amazon Athena, an interactive query service, or in real time. It also keeps features up to date: as new data is generated during inference, the single repository is updated, so the latest features are always available for models to use during training and inference.
Amazon SageMaker Clarify: gives machine learning developers greater visibility into their training data and models so they can identify and limit bias and explain predictions. It detects potential bias during data preparation, after model training, and in your deployed model by examining attributes you specify. For instance, you can check for bias related to age in your initial dataset or in your trained model and receive a detailed report that quantifies different types of possible bias. It also includes feature importance graphs that help explain model predictions, and produces reports that can support internal presentations or help identify issues with your model that you can then correct.
Amazon SageMaker Debugger: makes it easier to optimize machine learning models by capturing training metrics, such as the loss during regression, in real time and sending alerts when anomalies are detected. This helps you quickly fix problems that lead to inaccurate predictions, such as an incorrect identification of an image. SageMaker Debugger can also stop the training process automatically when the desired accuracy is achieved, reducing the time and cost of training ML models.
Amazon SageMaker Model Monitor: helps you maintain high-quality machine learning models by automatically detecting inaccurate predictions from models deployed in production and alerting you to them. It detects model and concept drift in real time by monitoring model quality based on independent and dependent variables, and sends alerts so you can take immediate action. It also continuously monitors performance characteristics such as accuracy (the number of correct predictions divided by the total number of predictions), so you can act to address anomalies.
Amazon SageMaker distributed training: offers fast, easy methods for training large deep learning models on large datasets. Using partitioning algorithms, it automatically splits large deep learning models and training datasets across AWS GPU instances in a fraction of the time it takes to do so manually. SageMaker achieves this through two techniques: data parallelism and model parallelism. With only a few lines of additional code, you can add either technique to your PyTorch or TensorFlow training scripts, and SageMaker applies the selected method for you. To split your model, it uses graph partitioning algorithms that balance the computation on each GPU while minimizing communication between GPU instances. SageMaker also optimizes distributed training jobs with algorithms designed to fully utilize AWS compute and network infrastructure, achieving near-linear scaling efficiency so training completes faster than with manual implementations.
Amazon SageMaker Pipelines: the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning. With SageMaker Pipelines, you can create, automate, and manage end-to-end ML workflows at scale. Because it is purpose-built for machine learning, it helps you automate the different steps of the ML workflow, including data loading, data transformation, training and tuning, and deployment. With it, you can build dozens of ML models a week and manage massive volumes of data, thousands of training experiments, and hundreds of different model versions. Workflows can be shared and reused to recreate or optimize models, helping you scale ML throughout your organization.
Amazon SageMaker Edge Manager: allows you to optimize, secure, monitor, and maintain ML models on fleets of smart cameras, robots, personal computers, and mobile devices. It provides a software agent that runs on edge devices. The agent comes with an ML model automatically optimized with SageMaker Neo, so you don’t need the Neo runtime installed on your devices to take advantage of the model optimizations. The agent also collects prediction data and sends a sample of it to the cloud for monitoring, labelling, and retraining, so you can keep models accurate over time. All data can be viewed in the SageMaker Edge Manager dashboard, which reports on the operation of deployed models. Because Edge Manager lets you manage models separately from the rest of the application, you can update the model and the application independently, reducing costly downtime and service disruptions. It also cryptographically signs your models so you can verify that they were not tampered with as they move from the cloud to edge devices.
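To make the accuracy metric that Model Monitor tracks concrete (correct predictions divided by total predictions), here is a small, library-free Python sketch. It is not the Model Monitor API: the helper names, the idea of scoring a window of recent live predictions, and the 5-point tolerance below the baseline are all illustrative assumptions. It only shows the principle of comparing live accuracy with a baseline and raising an alert when it drops.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty window")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def should_alert(baseline_accuracy: float,
                 predictions: Sequence[int],
                 labels: Sequence[int],
                 tolerance: float = 0.05) -> bool:
    """Hypothetical alert rule (not part of any SageMaker SDK): alert when
    live accuracy falls more than `tolerance` below the baseline accuracy."""
    return accuracy(predictions, labels) < baseline_accuracy - tolerance


# Example: a model that scored 0.90 on its baseline now gets 7 of 10 right.
live_preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
live_labels = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
print(accuracy(live_preds, live_labels))             # 0.7
print(should_alert(0.90, live_preds, live_labels))   # True
```

In the managed service, the baseline, the schedule and the alerting are configured for you; the sketch just isolates the arithmetic behind the accuracy check.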

2. Amazon Polly

Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products.

It was launched in November 2016 and now includes 60 voices across 29 languages.
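As a rough illustration of how an application talks to Polly, the sketch below calls the `synthesize_speech` operation of the AWS SDK for Python (boto3). It assumes boto3 is installed and AWS credentials are configured; the client is passed in rather than created inside the function so the logic can be tried without AWS. The helper name `synthesize_to_file`, the voice `Joanna` and the output path are example choices.

```python
def synthesize_to_file(polly_client, text: str, path: str,
                       voice_id: str = "Joanna",
                       output_format: str = "mp3") -> int:
    """Hypothetical helper: convert `text` to speech with Amazon Polly and
    write the audio to `path`. Returns the number of bytes written.

    `polly_client` is expected to be a boto3 Polly client,
    e.g. boto3.client("polly").
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat=output_format,  # e.g. "mp3", "ogg_vorbis" or "pcm"
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read it fully, then save it.
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With credentials set up, usage would look like `synthesize_to_file(boto3.client("polly"), "Hello from Amazon Polly!", "hello.mp3")`.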

3. Amazon Lex

Amazon Lex is a service for building conversational interfaces into any application using voice and text. It is built on the same deep learning technology that powers the Amazon Alexa virtual assistant.

You can design, build, and deploy chatbots with it.
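Once a chatbot has been built and deployed, an application sends user text to it and reads back the bot's replies. The sketch below assumes a Lex V2 bot and uses boto3's `lexv2-runtime` client and its `recognize_text` operation; the helper name `ask_bot` is hypothetical, and the bot ID, alias ID and session ID are placeholders for your own values.

```python
def ask_bot(lex_client, bot_id: str, bot_alias_id: str, session_id: str,
            text: str, locale_id: str = "en_US") -> list:
    """Hypothetical helper: send one user utterance to an Amazon Lex V2 bot
    and return the text of its reply messages.

    `lex_client` is expected to be boto3.client("lexv2-runtime").
    """
    response = lex_client.recognize_text(
        botId=bot_id,
        botAliasId=bot_alias_id,
        localeId=locale_id,
        sessionId=session_id,  # reuse the same ID to keep conversation state
        text=text,
    )
    # The bot may return zero or more messages; collect their text content.
    return [m.get("content", "") for m in response.get("messages", [])]
```

Reusing one `session_id` per user lets Lex keep track of the conversation across turns, which is what makes multi-step chatbot dialogues possible.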

4. Amazon Rekognition

Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use. With Amazon Rekognition, you can identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content.

It was launched in 2016.
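A minimal sketch of object and scene detection with Rekognition, assuming boto3 and configured credentials: it sends the raw bytes of a local image to the `detect_labels` operation and returns the label names with their confidence scores. The helper name and the default thresholds (10 labels, 80% confidence) are illustrative assumptions.

```python
def detect_labels(rekognition_client, image_path: str,
                  max_labels: int = 10, min_confidence: float = 80.0) -> list:
    """Hypothetical helper: return (label name, confidence) pairs that
    Amazon Rekognition finds in a local image file.

    `rekognition_client` is expected to be boto3.client("rekognition").
    """
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    response = rekognition_client.detect_labels(
        Image={"Bytes": image_bytes},   # images can also be referenced in S3
        MaxLabels=max_labels,           # cap on the number of labels returned
        MinConfidence=min_confidence,   # drop labels below this confidence (%)
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]
```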

5. Amazon Comprehend

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. No machine learning experience is required.
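One common Comprehend use is sentiment analysis. The sketch below assumes boto3 and configured credentials and calls the `detect_sentiment` operation; the helper name `get_sentiment` is hypothetical. It returns the overall sentiment label together with the score Comprehend assigned to that label.

```python
def get_sentiment(comprehend_client, text: str, language_code: str = "en"):
    """Hypothetical helper: return Amazon Comprehend's overall sentiment
    label and its score for `text`.

    `comprehend_client` is expected to be boto3.client("comprehend").
    """
    response = comprehend_client.detect_sentiment(
        Text=text, LanguageCode=language_code)
    label = response["Sentiment"]  # POSITIVE, NEGATIVE, NEUTRAL or MIXED
    # SentimentScore keys are capitalised words: Positive, Negative, ...
    score = response["SentimentScore"][label.capitalize()]
    return label, score
```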

6. Amazon Transcribe

Amazon Transcribe uses a deep learning-based automatic speech recognition (ASR) process to convert speech to text quickly and accurately.
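Batch transcription is asynchronous: you start a job for an audio file stored in Amazon S3 and poll until it finishes. The sketch below assumes boto3 and configured credentials and uses the `start_transcription_job` and `get_transcription_job` operations; the helper name, job name, S3 URI and polling limits are illustrative assumptions.

```python
import time


def transcribe_s3_audio(transcribe_client, job_name: str, media_uri: str,
                        media_format: str = "mp3", language_code: str = "en-US",
                        poll_seconds: float = 5.0, max_polls: int = 120):
    """Hypothetical helper: start an Amazon Transcribe job for an audio file
    in S3 and wait for it to finish. Returns the URI of the JSON transcript,
    or None if the job failed.

    `transcribe_client` is expected to be boto3.client("transcribe").
    """
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,  # job names must be unique in the account
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode=language_code,
    )
    for _ in range(max_polls):
        job = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            return None
        time.sleep(poll_seconds)  # still QUEUED or IN_PROGRESS
    raise TimeoutError(f"transcription job {job_name!r} did not finish in time")
```

Polling keeps the example short; for long recordings an event-driven approach avoids holding a process open while the job runs.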

7. Amazon Fraud Detector

Amazon Fraud Detector is a fully managed service that uses machine learning (ML) and more than 20 years of fraud detection expertise from Amazon to identify potentially fraudulent activity, so customers can catch more online fraud faster. It automates the time-consuming and expensive steps of building, training, and deploying an ML model for fraud detection, making the technology easier for customers to use. Amazon Fraud Detector customizes each model it creates to the customer’s own dataset, which makes its models more accurate than one-size-fits-all ML solutions.
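After a detector has been created, trained and activated, each event (for example, an online purchase) can be scored in real time. The sketch below assumes boto3, configured credentials and such an existing detector, and calls the `get_event_prediction` operation. The detector ID, event type name, entity type `customer` and variable names are hypothetical and must match whatever you defined when setting up the detector.

```python
from datetime import datetime, timezone


def score_event(frauddetector_client, detector_id: str, event_type: str,
                event_id: str, entity_id: str, variables: dict,
                entity_type: str = "customer") -> list:
    """Hypothetical helper: ask an existing Amazon Fraud Detector detector to
    evaluate one event and return the outcomes triggered by its rules.

    `frauddetector_client` is expected to be boto3.client("frauddetector").
    """
    response = frauddetector_client.get_event_prediction(
        detectorId=detector_id,
        eventId=event_id,
        eventTypeName=event_type,
        # ISO 8601 timestamp in UTC, e.g. 2021-03-26T12:34:56Z
        eventTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        entities=[{"entityType": entity_type, "entityId": entity_id}],
        eventVariables={k: str(v) for k, v in variables.items()},  # values are strings
    )
    outcomes = []
    for rule in response.get("ruleResults", []):
        outcomes.extend(rule.get("outcomes", []))
    return outcomes
```

The returned outcomes (for example a hypothetical `review` or `approve`) are defined by your detector's rules, so the application decides what to do based on names you chose.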

8. Amazon Forecast

Amazon Forecast is a fully managed service that uses machine learning to deliver highly accurate forecasts.

Amazon Forecast uses machine learning to combine time series data with additional variables to build forecasts. Amazon Forecast requires no machine learning experience to get started. You only need to provide historical data, plus any additional data that you believe may impact your forecasts. Once you provide your data, Amazon Forecast will automatically examine it, identify what is meaningful, and produce a forecasting model capable of making predictions that are up to 50% more accurate than looking at time series data alone.
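Once Forecast has trained a predictor on your historical data and generated a forecast, you can query it per item. The sketch below assumes boto3, configured credentials and an existing forecast, and calls the `query_forecast` operation of the `forecastquery` client. The helper name is hypothetical, the `item_id` filter key assumes the common dataset schema, and `p50` is one of the default quantiles (p10, p50, p90).

```python
def get_forecast(forecastquery_client, forecast_arn: str, item_id: str,
                 quantile: str = "p50") -> list:
    """Hypothetical helper: fetch predicted values for one item from an
    existing Amazon Forecast forecast as (timestamp, value) pairs.

    `forecastquery_client` is expected to be boto3.client("forecastquery").
    """
    response = forecastquery_client.query_forecast(
        ForecastArn=forecast_arn,      # ARN of your generated forecast
        Filters={"item_id": item_id},  # assumes the item_id schema attribute
    )
    predictions = response["Forecast"]["Predictions"]  # keyed by quantile
    return [(p["Timestamp"], p["Value"]) for p in predictions[quantile]]
```

Requesting `p10` or `p90` instead of `p50` gives lower or upper estimates, which is useful when under- and over-forecasting have different costs.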

These are a few of the important machine learning services offered by Amazon, but AWS offers many more services across a variety of fields. Do check them out.

Related links: