What users are saying about

Amazon EMR

24 Ratings

Apache Spark

99 Ratings

Amazon EMR

24 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8.3 out of 101

Apache Spark

99 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8.6 out of 101

Add comparison

Likelihood to Recommend

Amazon EMR

Well suited if you quickly want to setup a distributed compute platform, such as Spark. But you have to be advanced enough that you really want to separate compute from data storage. For example, for certain applications packaged solution such as MPP databases (e.g. Redshift) is much easier to set up that Spark on EMR and S3 with the appropriate file formats.
No photo available

Apache Spark

Apache Spark is very well suited for big data analytics in conjunction with the hadoop file system and also does a good job of providing fast access to data in SQL workloads since it has an in memory data processing engine that can very quickly process data. In addition, it can also be used for streaming data processing.
Shiv Shivakumar profile photo

Pros

  • Distributed computing
  • Fault tolerant
  • Uptime
No photo available
  • in memory data engine and hence faster processing
  • does well to lay on top of hadoop file system for big data analytics
  • very good tool for streaming data
Shiv Shivakumar profile photo

Cons

  • Providing user friendly tools for hdfs access
  • More simpler apis for easy access and processsing
  • Memory requirenent
No photo available
  • could do a better job for analytics dashboards to provide insights on a data stream and hence not have to rely on data visualization tools along with spark
  • also there is room for improvement in the area of data discovery
Shiv Shivakumar profile photo

Alternatives Considered

The alternatives to EMR are mainly hadoop distributions owned by the 3 companies above. I have not used the other distributions so it is difficult to comment, but the general tradeoff is, at the cost of a longer setup time and more infra management, you get more flexible versioning and potentially faster access to newer versions of some frameworks such as Spark.
No photo available
We evaluated SAS alongside with Apache Spark but during the course of proof of concept found that Apache Spark was able to support the hadoop eco-system and hadoop file system much better. It was much faster at that time while having the ability to process data quickly for the business analytical needs and and also scaled up well.
Shiv Shivakumar profile photo

Return on Investment

  • It was easy to set up initial versions of Spark on this
  • Still used as our compute platform as its easy to manage
  • Certain times we forgot to shut down clusters and were overcharged
No photo available
  • overall positive impact to the business for analysis of big data using hadoop file system
  • very well received by data scientists in the business despite its shortcoming on analytical dashboarding
Shiv Shivakumar profile photo

Pricing Details

Amazon EMR

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details