Research Experience with Amazon EMR
June 22, 2016

Anonymous | TrustRadius Reviewer
Score 6 out of 10
Vetted Review
Verified User

Overall Satisfaction with Amazon Elastic MapReduce

As a PhD student, I used Amazon Elastic MapReduce to analyze data for my research. Firstly, it was very scalable, and performance held up well with large data sets. Secondly, the web console is intuitive and very easy to use. Plenty of resources were available whenever I ran into problems with EMR.
  • Cluster size can be adjusted dynamically, so EMR scales well.
  • It also works well with other Amazon Web Services like Amazon Simple Storage Service, which means that data can be taken from those services and written back to them.
  • I tried the in-house hosting at the university where I work, but it came with many complications that required technical support. With Amazon, the support and documentation were good enough to solve such problems faster.
  • It would have been better if packages like HBase and Flume were available with Amazon EMR. This would make the product even more helpful in some cases.
  • Products like Cloudera provide the option to move the whole deployment onto a dedicated server and use it at our discretion. This would have been a good option if available with EMR.
  • It would have helped if EMR could be used with any cloud provider, rather than requiring data to be moved from another cloud service to S3.
  • Positive: Helped process the jobs amazingly fast.
  • Positive: Did not have to spend much time learning the system, which saved valuable research time.
  • Negative: Not flexible for some scenarios, like when some plugins are required, or when the project has to be moved in-house.
  • Cloudera
EMR provides dynamic cluster sizing, extensive documentation, and integration with other Amazon Web Services, which are some of the things that the Cloudera distribution for Hadoop lacked. Some products are hard to learn, but EMR was much easier and saved time I would otherwise have spent figuring out how to deploy projects in MapReduce.
If the person using EMR does not need much customization, like debugging or other modifications, and the data is not entirely in another cloud, then Amazon Elastic MapReduce is the better option. Otherwise, other open source projects such as Cloudera are available. Products like Cloudera can also be deployed in any cloud, rather than having to stick with Amazon.