Apache Flume for log aggregation and compliance monitoring in real-time
Use Cases and Deployment Scope
Apache Flume is used to aggregate and analyze log data in near real time across the organization for compliance purposes. The goal is to generate monthly compliance reports from that log data.
Pros
- Apache Flume is a log-centric system, so it parses and aggregates log data very well.
- It is easy to customize for different sources (producers) of log data as well as for different sinks (consumers).
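As a rough illustration of the source and sink customization mentioned above, a minimal agent configuration might look like the following. This is only a sketch, not the configuration used in this deployment: the agent name a1, the component names r1, c1, and k1, the log file path, and the HDFS path are all hypothetical.

```properties
# Hypothetical agent "a1" with one source, one channel, and one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source (producer): tail an application log file (path is an assumption)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# Sink (consumer): write events to HDFS, bucketed by month (path is an assumption)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/compliance/%Y-%m
a1.sinks.k1.hdfs.fileType = DataStream
# Use the agent's local time for the %Y-%m escapes instead of a timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

Swapping in a different producer or consumer is mostly a matter of changing the type line and its related properties, which is what makes this kind of customization straightforward.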
Cons
- It is very specific to log data ingestion, so it is hard to use for anything other than log data.
- Data replication is not built in and needs to be added on top of Apache Flume (though this is not hard to do).
Likelihood to Recommend
Apache Flume is well suited when the use case is only log data ingestion and aggregation, for example for configuration management compliance. It is not well suited when you need a general-purpose real-time data ingestion pipeline that can receive log data as well as other kinds of data streams (e.g., IoT data, messages).
