AWS EMR Tutorial – Submit an Apache Spark Job on an EMR Cluster
This tutorial shows you how to run a Spark application on an Amazon EMR cluster. EMR stands for Elastic MapReduce. Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances.
We can also run other popular distributed frameworks such as Apache Spark and HBase on Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
* Prerequisites
* Quick Recap of Application Development
* Defining Problem Statement
* Set Up Project and Add Dependencies
* Develop Application
* Validating using IDE
* Build and Validate Jar
* Uploading Jars to S3
* Quick Recap of Setting Up EMR Cluster
* Deploy Jar – Step Execution
* Review Job Logs
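
As a preview of the "Deploy Jar – Step Execution" item in the outline above, here is a minimal sketch of how a Spark jar stored in S3 might be submitted as a step to an already running cluster. It assumes Python with the boto3 library and configured AWS credentials, which the tutorial itself does not prescribe. The bucket name, jar path, main class, cluster ID, and region are all hypothetical placeholders, and the helper names `build_spark_step` and `submit_step` are introduced here only for illustration.

```python
def build_spark_step(jar_s3_uri, main_class, app_args=(), name="Spark application"):
    """Build an EMR step definition that runs spark-submit through command-runner.jar."""
    return {
        "Name": name,
        # Keep the cluster running if the step fails, so the logs can be reviewed.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "--class", main_class,
                jar_s3_uri,
                *app_args,
            ],
        },
    }


def submit_step(cluster_id, step, region="us-east-1"):
    """Submit the step to a running cluster. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    emr = boto3.client("emr", region_name=region)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    step = build_spark_step(
        "s3://my-example-bucket/jars/spark-app.jar",
        "com.example.SparkApp",
        ["s3://my-example-bucket/input/", "s3://my-example-bucket/output/"],
    )
    print(step["HadoopJarStep"]["Args"])
    # To actually submit, pass the ID of your own running cluster:
    # submit_step("j-XXXXXXXXXXXXX", step)
```

Separating the step definition from the submission call keeps the definition easy to inspect before anything is sent to AWS. The same definition can also be entered through the EMR console when adding a step manually.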
#AWS #EMR #Spark
Written by admin