Building Batch Data Analytics Solutions on AWS
Description
In this course, you will build batch data analytics solutions using Amazon EMR and AWS Glue. Amazon EMR provides a managed Apache Hadoop service to enable cost-effective processing of large amounts of data using distributed applications such as Apache Spark and Apache Hive. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. The course focuses on the data collection, ingestion, cataloging, storage, and processing components of the analytics pipeline. You will learn to integrate Amazon EMR and AWS Glue with a data lake to support both analytics and machine learning workloads. You will also learn to apply security, performance, and cost management best practices to the operation of Amazon EMR and AWS Glue.