Apache Flume is used across the organization to aggregate and analyze log data in near real time for compliance purposes, with the goal of generating monthly compliance reports from that log data.
- Because Apache Flume is a log-centric system, it parses and aggregates log data very well.
- It is easy to customize for different sources (producers) of log data as well as for different sinks (consumers).
- It is built specifically for log data ingestion, so it is hard to use for anything other than log data.
- Data replication is not built in and has to be added on top of Apache Flume (though this is not hard to do).
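To give a rough idea of how sources and sinks are customized through configuration, here is a minimal single-agent sketch modeled on the introductory example in the Flume user guide. The agent name `a1` and the component names `r1`, `c1`, and `k1` are arbitrary placeholders, and the netcat source and logger sink are only stand-ins for whatever producers and consumers a real deployment would use.

```properties
# Name the components of this agent (a1, r1, c1, k1 are placeholder names)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source (producer side): listens for lines of text on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers events in memory between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink (consumer side): writes events to the agent's log
a1.sinks.k1.type = logger

# Wire the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Assuming the file is saved as `example.conf` (a hypothetical name), an agent like this would typically be started with `flume-ng agent --conf conf --conf-file example.conf --name a1`. Swapping in a different producer or consumer is then mostly a matter of changing the `type` lines and their related properties.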
Apache Flume is well suited when the use case is log data ingestion and aggregation only, for example for configuration management compliance. It is not well suited when you need a general-purpose real-time data ingestion pipeline that can receive log data alongside other kinds of data streams (e.g. IoT, messages).
Apache Flume is open source, so support is limited. Nevertheless, it has great documentation and best-practice documents from its end users, so it is not hard to set up, configure, and use.