What users are saying about
26 Ratings
6 Ratings
26 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8.3 out of 101
6 Ratings
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>
Score 8.1 out of 101

Add comparison

Likelihood to Recommend

Amazon EMR

Well suited if you quickly want to setup a distributed compute platform, such as Spark. But you have to be advanced enough that you really want to separate compute from data storage. For example, for certain applications packaged solution such as MPP databases (e.g. Redshift) is much easier to set up that Spark on EMR and S3 with the appropriate file formats.
No photo available

Apache Flume

Apache Flume is well suited in small batch and near real time processing projects, taking data from one point to another with local processing (I mean not external enrichment).
Filtering, transforming and multiple push destinations are common grounds for Flume.
It is not so nice to use if your data needs external enrichment (taking data from external databases or web services), as transactions and (micro)batches may lead to reprocessing and it relies upon the application to avoid duplicates.
Juan Francisco Tavira profile photo

Pros

  • Ease of use and ease to setup
  • Autoscaling functionality
  • Integrated into the AWS environment
No photo available
  • Multiple sources of data (sources) and destinations (sinks) that allows you to move data form and to any relevant data storage
  • It is very easy to setup and run
  • Very open to personalization, you can create filters, enrichment, new sources and destinations
Juan Francisco Tavira profile photo

Cons

  • Cost overhead is a bit high
  • Limited versions of frameworks that can be used
No photo available
  • Apache Flume develops new functionality at a slower pace than other OpenSource projects, it is well behing Kafka and has some compatibiliy issues with latest releases
  • It lack HA or FT, it relies on third party management software like Hortonworks or Cloudera
Juan Francisco Tavira profile photo

Alternatives Considered

The alternatives to EMR are mainly hadoop distributions owned by the 3 companies above. I have not used the other distributions so it is difficult to comment, but the general tradeoff is, at the cost of a longer setup time and more infra management, you get more flexible versioning and potentially faster access to newer versions of some frameworks such as Spark.
No photo available
Apache Flume is a very good solution when your project is not very complex at transformation and enrichment, and good if you have an external management suite like Cloudera, Hortonworks, etc. But it is not a real EAI or ETL like AB Initio or Attunity so
you need to know exactly what you want.On the other hand being an opensource project give Apache a lot of room to personalize thanks to its plug-able architecture and has a very nice performance having a very low CPU and Memory footprint, a single server can do the job on many occasions, as opposed to the multi-server architecture of paid products.
Juan Francisco Tavira profile photo

Return on Investment

  • It was easy to set up initial versions of Spark on this
  • Still used as our compute platform as its easy to manage
  • Certain times we forgot to shut down clusters and were overcharged
No photo available
  • Flume has simplified a lot many of our ingest procedures, easier to deploy and integrate than a classical EAI, reducing the time to market
  • But opposed to EAIs if the project starts to grow in complexity Apache Flume project may not be as suitable
Juan Francisco Tavira profile photo

Pricing Details

Amazon EMR

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details

Apache Flume

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee?
No
Additional Pricing Details