Item: Apache Flume
Rating: 7
Author: Juan Francisco Tavira

Overall Satisfaction with Apache Flume

Use Cases and Deployment Scope

Apache Flume is a key piece of software in Big Data environments. We have used it along with CDC (Change Data Capture) to ingest near-real-time database changes into Kafka, so the data is available for real-time analysis, machine learning, dynamic dashboards and so on.

We have also successfully integrated Apache Flume into log acquisition solutions (mainly PaaS and Docker) where application logs are difficult to access.

Pros and Cons

Pros

Multiple data origins (sources) and destinations (sinks) that allow you to move data from and to any relevant data storage
It is very easy to set up and run
Very open to customization: you can create filters, enrichment, new sources and destinations

Cons

Apache Flume develops new functionality at a slower pace than other open source projects. It is well behind Kafka and has some compatibility issues with the latest releases
It lacks HA (high availability) and FT (fault tolerance) of its own; it relies on third-party management software like Hortonworks or Cloudera

Return on Investment

Flume has greatly simplified many of our ingest procedures. It is easier to deploy and integrate than a classical EAI, which reduces the time to market
Unlike an EAI, however, Apache Flume may not be as suitable if the project starts to grow in complexity

Alternatives Considered

Logstash

Apache Flume is a very good solution when your project is not very complex in terms of transformation and enrichment, and good if you have an external management suite like Cloudera, Hortonworks, etc. But it is not a real EAI or ETL like Ab Initio or Attunity, so you need to know exactly what you want.
On the other hand, being an open source project gives Apache Flume a lot of room for customization thanks to its pluggable architecture. It also performs very well, with a very low CPU and memory footprint: a single server can do the job on many occasions, as opposed to the multi-server architecture of paid products.

Other Software Used

Apache Kafka, Logstash, TIBCO BusinessWorks, TIBCO Enterprise Message Service

Likelihood to Recommend

Apache Flume is well suited to small-batch and near-real-time processing projects that take data from one point to another with local processing (that is, no external enrichment).
Filtering, transforming and pushing to multiple destinations are common ground for Flume.

It is not so nice to use if your data needs external enrichment (taking data from external databases or web services), as transactions and (micro)batches may lead to reprocessing, and it is then up to the application to avoid duplicates.
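
To illustrate what avoiding duplicates on the application side can look like, here is a minimal sketch in Python. It is not Flume code and does not use any Flume API: the `event_id` field, the `deduplicate` helper and the in-memory set are assumptions made for the example. A real deployment would likely need a durable store for the seen IDs and a stable unique key on every event.

```python
from typing import Dict, Iterable, List, Set


def deduplicate(events: Iterable[Dict[str, str]], seen_ids: Set[str]) -> List[Dict[str, str]]:
    """Return only the events whose 'event_id' (hypothetical field) is new.

    seen_ids is updated in place, so it carries state across batches.
    """
    fresh = []
    for event in events:
        event_id = event["event_id"]
        if event_id in seen_ids:
            # Already handled in an earlier (possibly replayed) batch: skip it.
            continue
        seen_ids.add(event_id)
        fresh.append(event)
    return fresh


# In-memory state for the sketch only; an assumption, not a production choice.
seen: Set[str] = set()

first_batch = [
    {"event_id": "a1", "body": "insert row 1"},
    {"event_id": "a2", "body": "insert row 2"},
]
# Simulate a redelivery: the first batch is sent again with one new event.
replayed_batch = first_batch + [{"event_id": "a3", "body": "insert row 3"}]

processed = deduplicate(first_batch, seen) + deduplicate(replayed_batch, seen)
print([event["event_id"] for event in processed])  # ['a1', 'a2', 'a3']
```

The point is that the consumer, not the transport, decides whether an event has already been applied, so a replayed batch is harmless.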

Apache Flume, the way your information flows