November 17, 2017
Director in Engineering, Computer Software Company, 10,001+ employees
Score 8 out of 10
Overall Satisfaction with Amazon Elastic MapReduce
EMR is being used by our department, not the whole organization. We use it as the infrastructure on which we run Spark jobs. Those jobs are mainly used for data I/O, data processing, and machine learning applications.
- Ease of use and ease of setup
- Autoscaling functionality
- Integrated into the AWS environment
- Cost overhead is a bit high
- Limited versions of frameworks that can be used
- It was easy to set up initial versions of Spark on this
- Still used as our compute platform, as it's easy to manage
- At times we forgot to shut down clusters and were charged for the idle time
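One way to avoid paying for forgotten clusters is to launch them as transient clusters that terminate once their steps finish. The following is a minimal sketch, not our actual setup: it only builds the request dictionary that boto3's EMR `run_job_flow` call accepts. The function name `build_transient_cluster_request`, the instance type, the release label, and the S3 path are hypothetical placeholders, and the two role names assume the default EMR roles exist in the account.

```python
def build_transient_cluster_request(name, release_label, spark_submit_args,
                                    instance_type="m5.xlarge", instance_count=3):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All argument values are illustrative placeholders.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # False makes the cluster shut itself down when no steps remain,
            # so nobody has to remember to terminate it.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [{
            "Name": "spark-job",
            # Also terminate if the step fails, rather than idling.
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", *spark_submit_args],
            },
        }],
        # Assumption: the default EMR roles have been created in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_transient_cluster_request(
    name="nightly-spark",                        # hypothetical cluster name
    release_label="emr-5.10.0",                  # placeholder release label
    spark_submit_args=["s3://example-bucket/jobs/etl.py"],  # hypothetical path
)
# To actually launch (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

The tradeoff is that a transient cluster pays its startup time on every run, so this fits batch jobs better than interactive work.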
The alternatives to EMR are mainly Hadoop distributions owned by the three companies above. I have not used the other distributions, so it is difficult to comment, but the general tradeoff is this: at the cost of a longer setup time and more infrastructure management, you get more flexible versioning and potentially faster access to newer versions of some frameworks, such as Spark.
Well suited if you want to quickly set up a distributed compute platform, such as Spark. But you have to be advanced enough that you really want to separate compute from data storage. For example, for certain applications, a packaged solution such as an MPP database (e.g. Redshift) is much easier to set up than Spark on EMR and S3 with the appropriate file formats.