What users are saying about

Apache Flume

5 Ratings
Score 7.9 out of 10

Apache Pig

18 Ratings
Score 7.3 out of 10

Scores use the trScore algorithm: https://www.trustradius.com/static/about-trustradius-scoring

Likelihood to Recommend

Apache Flume

Apache Flume is well suited to small-batch and near-real-time processing projects that move data from one point to another with only local processing (that is, no external enrichment).
Filtering, transforming, and pushing to multiple destinations are common ground for Flume.
It is less pleasant to use if your data needs external enrichment (pulling data from external databases or web services), because transactions and (micro)batches may lead to reprocessing, and Flume relies on the application to avoid duplicates.
Juan Francisco Tavira
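
The point about reprocessing can be made concrete with a small sketch. It is not Flume code: `IdempotentSink` and the `id` field on each event are hypothetical names, and the example only assumes that every event carries a unique identifier the receiving application can check. Under that assumption, a redelivered batch adds nothing new:

```python
from typing import Dict, Iterable, List, Set


class IdempotentSink:
    """Hypothetical destination that ignores events it has already stored."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()
        self.stored: List[Dict[str, str]] = []

    def write_batch(self, batch: Iterable[Dict[str, str]]) -> int:
        """Store each event once, keyed by its 'id' field; return how many were new."""
        written = 0
        for event in batch:
            event_id = event["id"]  # assumed unique per event
            if event_id in self.seen_ids:
                continue  # replayed event: skip it instead of storing a duplicate
            self.seen_ids.add(event_id)
            self.stored.append(event)
            written += 1
        return written


sink = IdempotentSink()
batch = [{"id": "a1", "body": "login"}, {"id": "a2", "body": "logout"}]
first = sink.write_batch(batch)   # first delivery of the batch
second = sink.write_batch(batch)  # same batch redelivered after a rollback
```

In a real deployment the set of seen identifiers would have to live in durable storage shared by the consumers; the in-memory set here only keeps the sketch short.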

Apache Pig

  • Custom load, store, and filter functionality is needed, and writing Java MapReduce code is not an option because it is susceptible to bugs.
  • Multiple MR jobs need to be chained into one Pig job.
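
To illustrate what chaining MR jobs means, here is a minimal in-memory sketch in plain Python. It is not Pig Latin and does not use any Pig or Hadoop API; `map_reduce` and the mapper and reducer functions are hypothetical helpers written for this example. The output of the first pass (word counts) becomes the input of the second pass (words grouped by count), which is the kind of multi-stage pipeline a single Pig job can express:

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def map_reduce(
    records: Iterable[Any],
    mapper: Callable[[Any], List[Pair]],
    reducer: Callable[[Any, List[Any]], Any],
) -> List[Pair]:
    """Run one map, shuffle, and reduce pass in memory; return sorted (key, value) pairs."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return [(key, reducer(key, values)) for key, values in sorted(groups.items())]


# Stage 1: count how often each word occurs.
def split_words(line: str) -> List[Pair]:
    return [(word.lower(), 1) for word in line.split()]


def sum_counts(word: str, counts: List[int]) -> int:
    return sum(counts)


# Stage 2: group the words by their count.
def key_by_count(pair: Pair) -> List[Pair]:
    word, count = pair
    return [(count, word)]


def collect_words(count: int, words: List[str]) -> List[str]:
    return sorted(words)


lines = ["pig hive spark", "pig hive", "pig"]
word_counts = map_reduce(lines, split_words, sum_counts)
words_by_count = map_reduce(word_counts, key_by_count, collect_words)
```

Each stage here is a separate call wired together by hand; the reviewer's point is that Pig lets such stages be written as one script instead.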

Pros

  • Multiple sources of data (sources) and destinations (sinks) that allow you to move data from and to any relevant data storage
  • It is very easy to set up and run
  • Very open to customization: you can create filters, enrichment, new sources and destinations
Juan Francisco Tavira
  • The Apache Pig DSL provides a better alternative to Java MapReduce code, and the instruction set is very easy to learn and master.
  • It has many advanced features built in, such as joins, secondary sort, many optimizations, predicate push-down, etc.
  • A few years ago, when Hive was not very advanced (extremely slow), Pig was always the go-to solution. Now, with Spark and Hive (after significant updates), the need to learn Apache Pig may be questionable.

Cons

  • Apache Flume develops new functionality at a slower pace than other open-source projects; it is well behind Kafka and has some compatibility issues with the latest releases
  • It lacks high availability (HA) and fault tolerance (FT); it relies on third-party management software like Hortonworks or Cloudera
Juan Francisco Tavira
  • Improve Spark support and compatibility
  • Spark and Hive are already mainstream, and both have an instruction set that can be learned and mastered in a matter of days. While Apache Pig used to be a great alternative to writing Java MapReduce, Hive, after significant updates, is now equal to or better than Pig.

Usability

Apache Flume: No score
No answers on this topic
Apache Pig: 10.0
Based on 1 answer
Apache Pig is quick and easy to implement, which makes it quite popular.
Subhadipto Poddar

Alternatives Considered

Apache Flume is a very good solution when your project is not very complex in terms of transformation and enrichment, and it works well if you have an external management suite like Cloudera, Hortonworks, etc. But it is not a real EAI or ETL tool like Ab Initio or Attunity, so you need to know exactly what you want. On the other hand, being an open-source project gives Apache Flume a lot of room for customization thanks to its pluggable architecture, and it performs very well with a very low CPU and memory footprint: a single server can do the job on many occasions, as opposed to the multi-server architecture of paid products.
Juan Francisco Tavira
I use both Apache Pig and its alternatives, such as Apache Spark and Apache Hive. Apache Pig was one of the best options in Big Data's initial stages, but alternatives have since taken over the market and left Apache Pig behind. It is still a better alternative to MapReduce, and it is also a good option for working with unstructured datasets. Moreover, in certain cases, Apache Pig is much faster than Hive and Spark.
Kartik Chavan

Return on Investment

  • Flume has greatly simplified many of our ingest procedures; it is easier to deploy and integrate than a classical EAI, which reduces time to market
  • Unlike an EAI, however, Apache Flume may not be as suitable if the project starts to grow in complexity
Juan Francisco Tavira
  • The ROI was definitely positive in the beginning, but hard to say the same now due to advancements in Hive and Spark.

Pricing Details

Apache Flume

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details

Apache Pig

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details