TrustRadius
Amazon Elastic MapReduce (EMR) is a web service for processing big data (hadoop).https://media.trustradius.com/product-logos/96/0p/49F4QV1KRH9P.pngAmazon Elastic MapReduce is Worth the Effort if You Really Need Big Data Management, Just Make Sure You Really Need ItAmazon Elastic MapReduce is used by my department to produce big data analytics for certain clients. The software address data mining and predictive analytics for data sets that take a long time to process. The software is not used for econometric or other analytical evaluation because the size of the data sets does not lend themselves to such analysis. The software is used almost exclusively for data mining and simple reporting for large data cases.,Amazon Elastic MapReduce works well for managing analyses that use multiple tools, such as Hadoop and Spark. If it were not for the fact that we use multiple tools, there would be less need for MapReduce. MapReduce is always on. I've never had a problem getting data analyses to run on the system. It's simple to set up data mining projects. Amazon Elastic MapReduce has no problems dealing with very large data sets. It processes them just fine. With that said, the outputs don't come instantaneously. It takes time.,The analytical processes generally run quicker with the standalone tools of Hadoop, Spark, and others. If you only use one big data tool and don't really need things simplified, then Elastic MapReduce is more of an overhead tool that doesn't add much value. The analytical capabilities of Elastic MapReduce are nowhere near as complex or broad as non-big data tools. I would suggest not using the tool unless your data really is big data. 
The machine learning capabilities of Elastic MapReduce (using the big data tools of Hadoop/Spark) are good but are not as easy to use as other machine learning tools.,8,Amazon Elastic MapReduce has had a positive ROI in the sense that it saved time managing big data projects where analysts were using different big data tools. Essentially, an increase in employee productivity. Elastic MapReduce is not worth it in cases where you're just trying things out. You'll likely lose money unless you're sure that using MapReduce is a good idea. Elastic MapReduce takes some time learning, although not too much. If the employee is less well-versed in big data analytics, the software is a high hill to climb that eats up employee time.,Hadoop, Apache Spark, Apache Spark MLib, Apache Spark Streaming, Apache Web Server, Google Cloud AI, Google Cloud CDN, Google Cloud Dataflow, Google Cloud Datastore, Google Cloud Pub/Sub, Google Cloud SQL, Google Cloud Spanner, Google Data Studio, Google Correlate, Google Marketing Platform (formerly DoubleClick), Google Pay (formerly Google Wallet), Hortonworks Data Platform, Cloudera Data Science Workbench, Cloudera Enterprise, Cloudera Manager, Datameer, Pi Datametrics, FICO Model Builder, IBM SPSS Modeler, Microsoft Dynamics SL, Microsoft Exchange, 6sense and Sybase Aleri Streaming Platform,Google Ad Manager, Google Analytics Premium, Google Cloud AI, Google Cloud Dataflow, Google Cloud Datastore, Google Cloud Storage, 6sense, MapR, IBM InfoSphere DataStage, IBM InfoSphere Data Replication, Domino, Datameer, Cloudera Data Science Workbench, Cloudera Manager, Google Cloud SQLAmazon EMR- Great cloud based Hadoop platformWe use Amazon EMR for big data storage and processing. It's cluster architecture with each department having different clusters. 
It's great for processing and storage of large volumes of data, specifically, the data which is unstructured and generates very rapidly, like network logs.,Distributed computing Fault tolerant Uptime,Providing user friendly tools for hdfs access More simpler apis for easy access and processsing Memory requirenent,9,Better accesss to business data Faster business decisions Better storage and processingEMR reviewEMR is being used by our department, not the whole organization. We use it as the infrastructure on which we run Spark jobs. Those jobs are mainly used for data I/O, data processing, and machine learning applications.,Ease of use and ease to setup Autoscaling functionality Integrated into the AWS environment,Cost overhead is a bit high Limited versions of frameworks that can be used,8,It was easy to set up initial versions of Spark on this Still used as our compute platform as its easy to manage Certain times we forgot to shut down clusters and were overcharged,Databricks, Cloudera Enterprise and Hortonworks Data Platform,Amazon S3 (Simple Storage Service), Amazon Relational Database Service, Apache Spark, Cassandra, Apache KafkaAWS EMR at a glance!!We have used AWS EMR before starting to use Databricks on EC2 instances. EMR was solving the problem but we needed a better solution (Enterprise edition) to manage our Workbooks and better scheduler for running or jobs. EMR was working fine but we did not find it user friendly to add the data nodes on demand. We used EMR primarily to process the data on AWS S3 using Hadoop and Spark frameworks. We have also used AWS SWF to orchestrate our job flow by adding steps. It was used widely by the data processing team and not by the entire organization as most of the data was on local servers. It addresses problems like processing data which might not need to be processed live as the cluster can be spun up and shut down once the job is completed. 
It is cost efficient (especially if you do not need data nodes and only task nodes), scalable and reliable.,EMR does well in managing the cost as it uses the task node cores to process the data and these instances are cheaper when the data is stored on s3. It is really cost efficient. No need to maintain any libraries to connect to AWS resources. EMR is highly available, secure and easy to launch. No much hassle in launching the cluster (Simple and easy). EMR manages the big data frameworks which the developer need not worry (no need to maintain the memory and framework settings) about the framework settings. It's all setup on launch time. The bootstrapping feature is great.,Sometimes bootstrapping certain tools comes with debugging costs. The tools provided by some of the enterprise editions are great compared to EMR. Like some of the enterprise editions EMR does not provide on premises options. No UI client for saving the workbooks or code snippets. Everything has to go through submitting process. Not really convenient for tracking the job as well.,7,It was obviously cheaper and convenient to use as most of our data processing and pipelines are on AWS. It was fast and readily available with a click and that saved a ton of time rather than having to figure out the down time of the cluster if its on premises. It saved time on processing chunks of big data which had to be processed in short period with minimal costs. EMR solved this as the cluster setup time and processing was simple, easy, cheap and fast. It had a negative impact as it was very difficult in submitting the test jobs as it lags a UI to submit spark code snippets.,Databricks and Hortonworks Data Platform,Databricks, Amazon Elastic Compute Cloud (EC2), Amazon DynamoDB, Amazon S3 (Simple Storage Service), Amazon Aurora, Amazon Redshift, Amazon CloudFront, Amazon CloudWatchResearch Experience with Amazon EMRAs a PhD student, I used Amazon Elastic MapReduce for my research for analyzing my data. 
Firstly, it was very scalable and did not cause much performance impact when using large data sets. Secondly, their web console is very easy to use and intuitive. There were many resources that could be used whenever I encountered any problems with EMR.,The cluster size of MapReduce is very dynamic and therefore scalability is good for EMR. It also works well with other Amazon Web Services like Amazon Simple Storage Service, which means that data can be taken from those services and written back to them. I tried using the in-house hosting at the university I work in, but there would be a lot of complications with technical support required. For Amazon, the support and documentation was good to solve these problems faster.,It would have been better if packages like HBase and Flume were available with Amazon EMR. This would make the product even more helpful in some cases. Products like Cloudera provide the options to move the whole deployment into a dedicated server and use it at our discretion. This would have been a good option if available with EMR. If EMR gave the option to be used with any choice of cloud provider, it would have helped instead of having to move the data from another cloud service to S3.,6,Positive: Helped process the jobs amazingly fast. Positive: Did not have to spend much time to learn the system, therefore, saving valuable research time. Negative: Not flexible for some scenarios, like when some plugins are required, or when the project has to be moved in-house.,Cloudera,Cloudera Enterprise
Amazon Elastic MapReduce
28 Ratings
Score 8.2 out of 10

Amazon EMR Reviews


Reviews (1-5 of 5)

Thomas Young
Score 8 out of 10
Amazon Elastic MapReduce is used by my department to produce big data analytics for certain clients. The software addresses data mining and predictive analytics for data sets that take a long time to process. The software is not used for econometric or other analytical evaluation because the size of the data sets does not lend itself to such analysis. The software is used almost exclusively for data mining and simple reporting for large data cases.
  • Amazon Elastic MapReduce works well for managing analyses that use multiple tools, such as Hadoop and Spark. If it were not for the fact that we use multiple tools, there would be less need for MapReduce.
  • MapReduce is always on. I've never had a problem getting data analyses to run on the system. It's simple to set up data mining projects.
  • Amazon Elastic MapReduce has no problems dealing with very large data sets. It processes them just fine. With that said, the outputs don't come instantaneously. It takes time.
  • The analytical processes generally run quicker with the standalone tools of Hadoop, Spark, and others. If you only use one big data tool and don't really need things simplified, then Elastic MapReduce is more of an overhead tool that doesn't add much value.
  • The analytical capabilities of Elastic MapReduce are nowhere near as complex or broad as non-big data tools. I would suggest not using the tool unless your data really is big data.
  • The machine learning capabilities of Elastic MapReduce (using the big data tools of Hadoop/Spark) are good but are not as easy to use as other machine learning tools.
Amazon Elastic MapReduce is useful in cases where two conditions are met. First, you are planning on using multiple big data tools simultaneously to analyze big data sets. Second, you need a tool that simplifies managing big data tools. If these two conditions are met, MapReduce does a great job. The user interface is simple. The program eliminates some programming requirements. The software also makes setting up big data analyses much easier. With these benefits acknowledged, MapReduce is not a good tool for "small" data analyses, given that there are other tools that do the job quicker and produce much more professional output. If you're on the fence, try out MapReduce alongside competing "small" data tools and see if you really need big data software.
Score 9 out of 10
We use Amazon EMR for big data storage and processing. It's a cluster architecture, with each department having its own cluster. It's great for processing and storing large volumes of data, specifically data that is unstructured and generated very rapidly, like network logs.
  • Distributed computing
  • Fault tolerant
  • Uptime
  • Providing user-friendly tools for HDFS access
  • Simpler APIs for easy access and processing
  • Memory requirement
If you don't have big data, i.e., petabytes of data with terabytes generated every day, then don't use Hadoop. Relational databases are enough for terabytes of data. Hadoop is not well suited for transactional systems or data.
November 17, 2017

EMR review

Score 8 out of 10
EMR is being used by our department, not the whole organization. We use it as the infrastructure on which we run Spark jobs. Those jobs are mainly used for data I/O, data processing, and machine learning applications.
  • Ease of use and ease of setup
  • Autoscaling functionality
  • Integrated into the AWS environment
  • Cost overhead is a bit high
  • Limited versions of frameworks that can be used
Well suited if you want to quickly set up a distributed compute platform, such as Spark. But you have to be advanced enough that you really want to separate compute from data storage. For example, for certain applications a packaged solution such as an MPP database (e.g., Redshift) is much easier to set up than Spark on EMR and S3 with the appropriate file formats.
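To make the "Spark on EMR and S3" setup described above a little more concrete, here is a minimal sketch of how a Spark job might be described as an EMR step. It only builds the step dictionary in the shape accepted by the `Steps` parameter of boto3's `run_job_flow` and `add_job_flow_steps` calls; it does not contact AWS. The function name `spark_step`, the bucket, the script path, and the idea that the script takes an input and an output location as positional arguments are all assumptions made for illustration, not details from the review.

```python
def spark_step(name, script_uri, input_uri, output_uri):
    """Describe one spark-submit invocation as an EMR step.

    The script and its two positional arguments (input and output
    locations on S3) are hypothetical; adapt them to your own job.
    """
    return {
        "Name": name,
        # Stop the cluster if the job fails, so it does not keep billing.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            # command-runner.jar is the EMR helper that runs a command
            # such as spark-submit on the cluster.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                input_uri,
                output_uri,
            ],
        },
    }


# Hypothetical locations: data stays on S3, compute comes from the cluster.
step = spark_step(
    "daily-aggregation",
    "s3://example-bucket/jobs/aggregate.py",
    "s3://example-bucket/raw/",
    "s3://example-bucket/curated/",
)
```

Because both the input and the output live on S3, the cluster itself holds no data that must survive after the job, which is the separation of compute from storage the reviewer refers to.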
October 25, 2017

AWS EMR at a glance!!

Score 7 out of 10
We used AWS EMR before starting to use Databricks on EC2 instances. EMR was solving the problem, but we needed a better solution (Enterprise edition) to manage our workbooks and a better scheduler for running our jobs. EMR was working fine, but we did not find it user friendly to add data nodes on demand. We used EMR primarily to process data on AWS S3 using the Hadoop and Spark frameworks. We also used AWS SWF to orchestrate our job flow by adding steps. It was used widely by the data processing team and not by the entire organization, as most of the data was on local servers. It addresses problems like processing data that might not need to be processed live, as the cluster can be spun up and shut down once the job is completed. It is cost efficient (especially if you do not need data nodes and only task nodes), scalable, and reliable.
  • EMR does well in managing cost, as it uses the task node cores to process the data, and these instances are cheaper when the data is stored on S3. It is really cost efficient. No need to maintain any libraries to connect to AWS resources.
  • EMR is highly available, secure, and easy to launch. Not much hassle in launching the cluster (simple and easy).
  • EMR manages the big data frameworks, so the developer need not worry about memory and framework settings. It's all set up at launch time. The bootstrapping feature is great.
  • Sometimes bootstrapping certain tools comes with debugging costs. The tools provided by some of the enterprise editions are great compared to EMR.
  • Unlike some of the enterprise editions, EMR does not provide on-premises options.
  • No UI client for saving workbooks or code snippets. Everything has to go through the submission process. Not really convenient for tracking jobs either.
EMR is suited if the jobs are long running and don't really need much monitoring. EMR is really flexible in processing data on S3, as a developer doesn't need to spend time debugging connections to S3 from a big data framework; most of the configuration is taken care of by Amazon. It is very cheap compared to most solutions on the market, and the ready-to-go configuration at launch time reduces the time required for admin tasks. So, considering the low cost, the processing options on S3, and scalability via adding task nodes, EMR serves startups well when they are considering open source and cost-efficient options.
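The spin-up, process, shut-down pattern this review describes can be sketched as a cluster request. The function below only assembles keyword arguments in the shape that boto3's EMR `run_job_flow` call accepts; it does not launch anything. The function name, the release label, the instance types, the one-core-node layout, and the choice of Spot pricing for task nodes are all assumptions for illustration and are not taken from the review.

```python
def transient_cluster_request(name, log_uri, steps, task_nodes=2):
    """Build run_job_flow keyword arguments for a cluster that
    terminates itself once its steps have finished."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed; use a release available to you
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1,
                 "Market": "ON_DEMAND"},
                # A single core node (assumed layout); the data stays on S3.
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1,
                 "Market": "ON_DEMAND"},
                # Task nodes add compute only; Spot pricing is an assumption.
                {"Name": "task", "InstanceRole": "TASK",
                 "InstanceType": "m5.xlarge", "InstanceCount": task_nodes,
                 "Market": "SPOT"},
            ],
            # False means the cluster shuts down when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": steps,
        # Default role names created by the AWS console or CLI.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_cluster_request(
    "nightly-batch", "s3://example-bucket/emr-logs/", steps=[], task_nodes=4
)
# With credentials configured, this could be passed on as:
#   boto3.client("emr").run_job_flow(**request)
```

Scaling the job up or down is then a matter of changing `task_nodes`, and nothing keeps running (or billing) after the last step completes.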

However, EMR comes with its own disadvantages. There is no proper UI to track real-time jobs, which is possible with enterprise editions like Cloudera, Hortonworks, etc. EMR could provide an interface to add workbooks and code snippets in the cluster, as it would reduce the time to submit tasks. EMR also lacks the ability to automatically replace unhealthy nodes.
Score 6 out of 10
As a PhD student, I used Amazon Elastic MapReduce for my research for analyzing my data. Firstly, it was very scalable and did not cause much performance impact when using large data sets. Secondly, their web console is very easy to use and intuitive. There were many resources that could be used whenever I encountered any problems with EMR.
  • The cluster size of MapReduce is very dynamic and therefore scalability is good for EMR.
  • It also works well with other Amazon Web Services like Amazon Simple Storage Service, which means that data can be taken from those services and written back to them.
  • I tried using the in-house hosting at the university I work in, but there would have been a lot of complications requiring technical support. With Amazon, the support and documentation were good enough to solve these problems faster.
  • It would have been better if packages like HBase and Flume were available with Amazon EMR. This would make the product even more helpful in some cases.
  • Products like Cloudera provide the options to move the whole deployment into a dedicated server and use it at our discretion. This would have been a good option if available with EMR.
  • If EMR gave the option to be used with any choice of cloud provider, it would have helped instead of having to move the data from another cloud service to S3.
If the person using EMR does not need much customization, like debugging or other modifications, and the data is not entirely in another cloud, then Amazon Elastic MapReduce is the better option. Otherwise, other open source projects like Cloudera are available. Products like Cloudera can also be deployed in any cloud, rather than having to stick with Amazon.

About Amazon EMR

Amazon Elastic MapReduce (EMR) is a web service for processing big data (Hadoop).
Categories:  Hadoop-Related

Amazon EMR Technical Details

Operating Systems: Unspecified
Mobile Application: No