9 Reviews and Ratings
22 Reviews and Ratings
Apache Flume is well suited when the use case is limited to log data ingestion and aggregation, for example for configuration-management compliance. It is not well suited where you need a general-purpose real-time data ingestion pipeline that can receive log data as well as other forms of data streams (e.g. IoT, messages).
Apache Pig is best suited for ETL-based data processes. It performs well when handling and analyzing large amounts of data, and it gives faster results than any other similar tool. It is easy to implement, and any user with some initial training or prior SQL knowledge can work with it. Apache Pig also has a large global community.
Multiple data sources (sources) and destinations (sinks) that allow you to move data from and to any relevant data storage.
It is very easy to set up and run.
Very open to personalization: you can create filters, enrichment, new sources and destinations.
Its performance, ease of use, and simplicity in learning and deployment.
Using this tool, we can quickly analyze large amounts of data.
It's adequate for map-reducing large datasets and fully abstracts MapReduce.
It is very specific to log data ingestion, so it is pretty hard to use for anything besides log data.
Data replication is not built in and needs to be added on top of Apache Flume (not a hard job to do, though).
Python UDF errors are not interpretable. A developer can struggle for a very long time when hitting these errors.
Being at an early stage, it still has a small community to turn to for help.
It still needs a lot of improvement. A datetime module for time series, which is a very basic requirement, was added only recently.
Apache Pig is quick and easy to implement, which makes it quite popular.
Apache Flume is open source, so support is limited. Nevertheless, it has great documentation and best-practice documents from its end users, so it is not hard to use, set up and configure.
The documentation is adequate. I'm not sure how large of an external community there is for support.
Apache Flume is a very good solution when your project is not very complex in terms of transformation and enrichment, and it is a good fit if you have an external management suite like Cloudera, Hortonworks, etc. But it is not a real EAI or ETL tool like Ab Initio or Attunity, so you need to know exactly what you want. On the other hand, being an open-source project gives Apache Flume a lot of room for personalization thanks to its pluggable architecture. It also performs very well, with a very low CPU and memory footprint: a single server can do the job on many occasions, as opposed to the multi-server architecture of paid products.
Apache Pig might help to get things started faster at first, and it was one of the best tools years ago, but it lacks important features that are needed in the data engineering world right now. Pig also has a steeper learning curve since it uses its own language, Pig Latin, compared to Spark, which can be coded in Python or Java.
Flume has greatly simplified many of our ingest procedures. It is easier to deploy and integrate than a classical EAI, reducing the time to market.
But unlike with EAIs, if the project starts to grow in complexity, Apache Flume may not be as suitable.
Higher learning curve than other similar technologies, so onboarding new engineers or changing ownership of Apache Pig code tends to be a bit of a headache.
Once the language is learned and understood, it can be relatively straightforward to write simple Pig scripts, so development can go relatively quickly with a skilled team.
As distributed technologies grow and improve, Apache Pig overall feels left in the dust and is more legacy code to support than something to actively develop with.