AWS EMR Tutorial – Submitting Apache Spark Jobs
This AWS EMR tutorial covers the end-to-end life cycle of developing Spark jobs and submitting them to an AWS EMR cluster.
The topics that we will cover in this session are as follows:
What is AWS EMR?
AWS EMR Benefits
AWS EMR Applications
AWS EMR Case Study
AWS EMR Demo
Code : https://github.com/isurunuwanthilaka/pyspark-product-count
Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances.
We can also run other popular distributed frameworks such as Apache Spark and HBase in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
Amazon EMR is a cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. It makes it easier to set up, operate, and scale big data environments by automating time-consuming tasks like provisioning capacity and tuning clusters. According to AWS, EMR can run petabyte-scale analysis at less than half the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark. You can run workloads on Amazon EC2 instances, on Amazon Elastic Kubernetes Service (EKS) clusters, or on-premises using EMR on AWS Outposts.
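To make the "submit a Spark job to an EMR cluster" idea concrete, here is a minimal Python sketch of one common way to do it: describing the job as an EMR step that runs spark-submit through command-runner.jar, then sending it with boto3's add_job_flow_steps. This is an illustration under stated assumptions, not necessarily how the linked repository does it. The bucket name, script path, step name, and cluster ID are hypothetical placeholders, and the helper functions build_spark_step and submit_step are names made up for this example.

```python
import json


def build_spark_step(script_s3_uri, script_args=None, name="product-count"):
    """Build the step definition EMR expects for a spark-submit job.

    The step name and the S3 URIs passed in are illustrative assumptions.
    """
    args = ["spark-submit", "--deploy-mode", "cluster", script_s3_uri]
    args.extend(script_args or [])
    return {
        "Name": name,
        # Keep the cluster running if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run commands such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": args,
        },
    }


def submit_step(cluster_id, step, region_name=None):
    """Send the step to a running cluster and return the new step ID.

    Requires boto3 and valid AWS credentials; imported lazily so the
    step-building code above works without them.
    """
    import boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


# Hypothetical S3 locations, used only to show the shape of the step.
example_step = build_spark_step(
    "s3://example-bucket/jobs/product_count.py",
    ["s3://example-bucket/input/", "s3://example-bucket/output/"],
)
print(json.dumps(example_step, indent=2))
```

With a real cluster, the step would be sent with something like submit_step("j-XXXXXXXXXXXXX", example_step), where the cluster ID is a placeholder for the ID shown in the EMR console.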
Written by admin