Apache Flume vs. Apache Kafka

Overview
ProductRatingMost Used ByProduct SummaryStarting Price
Apache Flume
Score 7.1 out of 10
N/A
Apache Flume is a product enabling the flow of logs and other data into a Hadoop environment.N/A
Apache Kafka
Score 8.4 out of 10
N/A
Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The Kafka event streaming platform is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.N/A
Pricing
Apache FlumeApache Kafka
Editions & Modules
No answers on this topic
No answers on this topic
Offerings
Pricing Offerings
Apache FlumeApache Kafka
Free Trial
NoNo
Free/Freemium Version
NoNo
Premium Consulting/Integration Services
NoNo
Entry-level Setup FeeNo setup feeNo setup fee
Additional Details
More Pricing Information
Community Pulse
Apache FlumeApache Kafka
Considered Both Products
Apache Flume
Chose Apache Flume
Apache Flume is on par with Scribe with similar functions. Apache Kafka is a generation purpose while Apache Flume is specific to log aggregation. Google Pub/Sub and IBM MQ are costlier than Apache Flume ( open source ) and have a lot more cost associated with them. Apama …
Chose Apache Flume
Apache Flume is a very good solution when your project is not very complex at transformation and enrichment, and good if you have an external management suite like Cloudera, Hortonworks, etc. But it is not a real EAI or ETL like AB Initio or Attunity so you need to know exactly …
Apache Kafka
Chose Apache Kafka
Confluent Cloud is still based on Apache Kafka but it has a subscription fee so, from a long term perspective, it is wiser to deploy your own Kafka instance that spans public and private cloud. Amazon Kinesis, Google Cloud Pub/Sub do not do well for a very number of messages …
Chose Apache Kafka
Kafka is faster and more scalable, also "free" as opensource (albeit we deploy using a commercial distribution). Infrastructure tends to be cheaper. On the other hand, projects must adapt to Kafka APIs that sometimes change and BAU increases until a major 1.x version comes out …
Top Pros
Top Cons
Best Alternatives
Apache FlumeApache Kafka
Small Businesses

No answers on this topic

No answers on this topic

Medium-sized Companies
Cloudera Manager
Cloudera Manager
Score 9.7 out of 10
IBM MQ
IBM MQ
Score 9.0 out of 10
Enterprises
IBM Analytics Engine
IBM Analytics Engine
Score 8.8 out of 10
IBM MQ
IBM MQ
Score 9.0 out of 10
All AlternativesView all alternativesView all alternatives
User Ratings
Apache FlumeApache Kafka
Likelihood to Recommend
8.0
(2 ratings)
8.3
(18 ratings)
Likelihood to Renew
-
(0 ratings)
9.0
(2 ratings)
Usability
-
(0 ratings)
10.0
(1 ratings)
Support Rating
5.0
(1 ratings)
8.4
(4 ratings)
User Testimonials
Apache FlumeApache Kafka
Likelihood to Recommend
Apache
Apache Flume is well suited when the use case is log data ingestion and aggregate only, for example for compliance of configuration management. It is not well suited where you need a general-purpose real-time data ingestion pipeline that can receive log data and other forms of data streams (eg IoT, messages).
Read full review
Apache
Apache Kafka is well-suited for most data-streaming use cases. Amazon Kinesis and Azure EventHubs, unless you have a specific use case where using those cloud PaAS for your data lakes, once set up well, Apache Kafka will take care of everything else in the background. Azure EventHubs, is good for cross-cloud use cases, and Amazon Kinesis - I have no real-world experience. But I believe it is the same.
Read full review
Pros
Apache
  • Multiple sources of data (sources) and destinations (sinks) that allows you to move data form and to any relevant data storage
  • It is very easy to setup and run
  • Very open to personalization, you can create filters, enrichment, new sources and destinations
Read full review
Apache
  • Really easy to configure. I've used other message brokers such as RabbitMQ and compared to them, Kafka's configurations are very easy to understand and tweak.
  • Very scalable: easily configured to run on multiple nodes allowing for ease of parallelism (assuming your queues/topics don't have to be consumed in the exact same order the messages were delivered)
  • Not exactly a feature, but I trust Kafka will be around for at least another decade because active development has continued to be strong and there's a lot of financial backing from Confluent and LinkedIn, and probably many other companies who are using it (which, anecdotally, is many).
Read full review
Cons
Apache
  • It is very specific for log data ingestion so it is pretty hard to use for anything else besides log data
  • Data replication is not built in and needs to be added on top of Apache Flume (not a hard job to do though)
Read full review
Apache
  • Sometimes it becomes difficult to monitor our Kafka deployments. We've been able to overcome it largely using AWS MSK, a managed service for Apache Kafka, but a separate monitoring dashboard would have been great.
  • Simplify the process for local deployment of Kafka and provide a user interface to get visibility into the different topics and the messages being processed.
  • Learning curve around creation of broker and topics could be simplified
Read full review
Likelihood to Renew
Apache
No answers on this topic
Apache
Kafka is quickly becoming core product of the organization, indeed it is replacing older messaging systems. No better alternatives found yet
Read full review
Usability
Apache
No answers on this topic
Apache
Apache Kafka is highly recommended to develop loosely coupled, real-time processing applications. Also, Apache Kafka provides property based configuration. Producer, Consumer and broker contain their own separate property file
Read full review
Support Rating
Apache
Apache Flume is open-source so support is limited. Never the less, it has great documentation and best practices documents from their end-users so it is not hard to use, setup and configure.
Read full review
Apache
Support for Apache Kafka (if willing to pay) is available from Confluent that includes the same time that created Kafka at Linkedin so they know this software in and out. Moreover, Apache Kafka is well known and best practices documents and deployment scenarios are easily available for download. For example, from eBay, Linkedin, Uber, and NYTimes.
Read full review
Alternatives Considered
Apache
Apache Flume is a very good solution when your project is not very complex at transformation and enrichment, and good if you have an external management suite like Cloudera, Hortonworks, etc. But it is not a real EAI or ETL like AB Initio or Attunity so
you need to know exactly what you want. On the other hand being an opensource project give Apache a lot of room to personalize thanks to its plug-able architecture and has a very nice performance having a very low CPU and Memory footprint, a single server can do the job on many occasions, as opposed to the multi-server architecture of paid products.
Read full review
Apache
I used other messaging/queue solutions that are a lot more basic than Confluent Kafka, as well as another solution that is no longer in the market called Xively, which was bought and "buried" by Google. In comparison, these solutions offer way fewer functionalities and respond to other needs.
Read full review
Return on Investment
Apache
  • Flume has simplified a lot many of our ingest procedures, easier to deploy and integrate than a classical EAI, reducing the time to market
  • But opposed to EAIs if the project starts to grow in complexity Apache Flume project may not be as suitable
Read full review
Apache
  • Positive: Get a quick and reliable pub/sub model implemented - data across components flows easily.
  • Positive: it's scalable so we can develop small and scale for real-world scenarios
  • Negative: it's easy to get into a confusing situation if you are not experienced yet or something strange has happened (rare, but it does). Troubleshooting such situations can take time and effort.
Read full review
ScreenShots