TrustRadius
Apache Kafka

Product Details

What is Apache Kafka?

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The Kafka event streaming platform is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Apache Kafka Technical Details

Operating Systems: Unspecified
Mobile Application: No

Reviews From Top Reviewers

(1-5 of 5)

Apache Kafka: Where messaging meets storage

Rating: 7 out of 10
April 09, 2021
Vetted Review
Verified User
Apache Kafka
4 years of experience
Apache Kafka is used by our company as the "next generation" of messaging/data-streaming pipeline solutions, to replace our old legacy JMS-based messaging solution and enable the modern streaming API based applications. When it is used for messaging purposes, we shift the responsibility of data replay from the message source (publisher application) to the message destination (consumer application). This flexibility resolved the legacy issue of sources replaying the messages but impacting all subscribers to the same topic. When Kafka is used as the streaming pipeline, it is integrated seamlessly with the Spark/Spring Stream-based analytic solutions, as it is also a kind of distributed storage.
  • Undoubtedly, Kafka's high throughput and low latency are the highlights.
  • Kafka can scale horizontally very well.
Cons
  • The CLI and configuration details need to be worked out in more depth. The naming convention for configuration parameters is not very good and causes a lot of confusion. Sometimes there are too many configuration parameters to tune, which requires the adopter to understand a lot of tricks (NFS entrapment, for example).
  • Lack of a good monitoring solution so far
When used for messaging, Apache Kafka is mostly preferred for Pub/Sub use cases. It is not suitable for end-to-end queue use cases or the request/response paradigm. When used for streaming, Apache Kafka has no native implementation of a query language; it is just a pipeline. You still need to put a lot of programming effort into your streaming client to take care of those analytic requirements.
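As a rough illustration of the client-side programming effort mentioned above, the sketch below counts events per key in fixed one-minute windows, the kind of aggregation a query language would otherwise provide. It is a plain Python toy under stated assumptions: the event shape `(timestamp_seconds, key)` and the function name `tumbling_window_counts` are hypothetical, and no Kafka client is involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape for illustration: (timestamp in seconds, key).
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per key in fixed, non-overlapping time windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Align each event to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "login"), (12, "login"), (59, "error"), (61, "login"), (119, "error")]
counts = tumbling_window_counts(events, window_seconds=60)
```

A real streaming client would also have to deal with late or out-of-order events and with state that outlives a single process, which is where most of the effort tends to go.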
  • Kafka makes the messaging itself more reliable (as it has the distributed storage by itself and the message doesn't disappear even after it has been consumed).
  • Kafka can support much higher-volume use cases without putting too much extra pressure on the existing hardware.
Kafka is not a real messaging broker implementation the way RabbitMQ or TIBCO EMS/JMS are. It can be used for messaging, though, and we like the idea behind Kafka (data isn't "passing by"; instead it remains central, so the client can revisit the data if necessary). This also relieves the pressure of keeping old duplicated copies of the data on both the publisher and the consumer sides.
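The idea that data remains central and each client can revisit it can be sketched with a toy append-only log that tracks a read position per consumer. This is an in-memory model for illustration only, not the Kafka client API; the class `TopicLog` and its method names are hypothetical.

```python
from typing import Dict, List


class TopicLog:
    """Toy append-only log: records are kept after they have been read."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}  # consumer name -> next offset to read

    def append(self, record: str) -> int:
        """Store a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str, max_records: int = 10) -> List[str]:
        """Return the next batch for this consumer and advance only its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer: str, offset: int) -> None:
        """Rewind (or fast-forward) one consumer without touching the others."""
        if not 0 <= offset <= len(self._records):
            raise ValueError("offset out of range")
        self._offsets[consumer] = offset


log = TopicLog()
for record in ["order-1", "order-2", "order-3"]:
    log.append(record)

first_read = log.poll("billing")   # billing reads everything once
log.poll("audit")                  # audit reads everything once too
log.seek("billing", 1)             # only billing asks for a replay
replayed = log.poll("billing")
audit_after = log.poll("audit")    # audit is not affected by billing's replay
```

Because the replay is driven by the consumer's own offset, the publisher never resends anything and other subscribers to the same topic see no duplicates.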
We are using the Apache open source version of Kafka. The community is a good place to ask questions, and we can get most of our problems resolved there.

Apache Kafka, the F1 of messaging

Rating: 9 out of 10
March 01, 2018
JF
Vetted Review
Verified User
Apache Kafka
3 years of experience
Apache Kafka is becoming the new standard for messaging at our organization. Originally we limited the use to big data environments and projects but as the technology is becoming more mature we think it will eventually replace classical messaging software.
  • High volume/performance throughput environments
  • Low latency projects
  • Multiple consumers for the same data, reprocessing, long-lasting information
Cons
  • Still a bit immature; some clients have required recoding in the last few versions
  • New features are coming very fast; several upgrades a year may be required
  • Not many commercial companies provide support
Apache Kafka is extremely well suited to near real-time scenarios and to high-volume or multi-location projects. It can solve scalability problems for a fraction of the cost of other solutions, and it has the flexibility of open source.
  • Easier deployment and horizontal scalability
  • Messaging cost reduction
  • Developments require adaptation and some paradigm shift to interoperate with Kafka
Kafka is faster and more scalable, and also "free" as open source (although we deploy using a commercial distribution). Infrastructure tends to be cheaper. On the other hand, projects must adapt to Kafka APIs that sometimes change, and BAU increases until a major 1.x version comes out and adds stability to the product.
20
Kafka is core for several business/technical functions:
- Data streaming: ingest data into Datalake, process information near real-time
- Log processing: able to hold logs from all the company applications, both to process them and to transport them to a final storage (like a time-series DB, Elasticsearch and so on)
- Reliable messaging now that exactly-once delivery semantics have been implemented
and so on
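One ingredient of exactly-once delivery is making producer retries idempotent: the log remembers the last sequence number it accepted from each producer and ignores resends. The sketch below is a simplified in-memory model of that idea, assuming a hypothetical `DedupLog` class and `send_with_retry` helper; it covers only the producer-to-log side and is not how any particular Kafka client is invoked.

```python
from typing import Dict, List


class DedupLog:
    """Toy log that drops records a producer has already delivered."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self._last_seq: Dict[str, int] = {}  # producer id -> last accepted sequence

    def append(self, producer_id: str, sequence: int, record: str) -> bool:
        """Append the record unless this sequence number was already accepted."""
        last = self._last_seq.get(producer_id, -1)
        if sequence <= last:
            return False  # duplicate caused by a retry; ignore it
        self.records.append(record)
        self._last_seq[producer_id] = sequence
        return True


def send_with_retry(
    log: DedupLog, producer_id: str, sequence: int, record: str, attempts: int
) -> int:
    """Simulate a producer that resends because acknowledgements were lost."""
    accepted = 0
    for _ in range(attempts):
        if log.append(producer_id, sequence, record):
            accepted += 1
    return accepted


dedup_log = DedupLog()
accepted_first = send_with_retry(dedup_log, "producer-a", 0, "payment-1", attempts=3)
accepted_second = send_with_retry(dedup_log, "producer-a", 1, "payment-2", attempts=2)
```

Each record ends up in the log once no matter how many times it was sent, which is the property that makes retries safe.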
Developers with deep knowledge of stream processing, otherwise your organization will not use all the capabilities.
Operators with DevOps skills. Kafka, even in version 1.0, is still a bit immature and lacks proper administration tools (apart from those from third parties like Cloudera, Hortonworks, Confluent, Lenses and so on), so hands-on scripts and detailed monitoring of the platform are a must.
  • Application technical log processing
  • Realtime transaction analysis
  • Messaging as a Service for PaaS and CaaS applications
  • Several data hubs: technical, business, social...
Kafka is quickly becoming a core product of the organization; indeed, it is replacing older messaging systems. No better alternatives found yet.

Kafka for tracking changes

Rating: 8 out of 10
May 30, 2023
AK
Vetted Review
Verified User
Apache Kafka
3 years of experience
Verified on LinkedIn
We use Apache Kafka to stream order information across systems. An order may go through certain updates during its lifecycle. These updates need to be communicated to the systems in near real time, and we rely on Kafka for this. Our business use case is to take these orders up with the insurance companies for approval, and thus the order information needs to be up to date. Kafka has been excellent at doing this so far.
  • Receiving messages from publisher and sending to consumer in FIFO manner
  • Handling of errors using Dead Letter Queue when message could not be consumed on the consumer end
  • Fault tolerance
Cons
  • Sometimes it becomes difficult to monitor our Kafka deployments. We've been able to overcome it largely using AWS MSK, a managed service for Apache Kafka, but a separate monitoring dashboard would have been great.
  • Simplify the process for local deployment of Kafka and provide a user interface to get visibility into the different topics and the messages being processed.
  • Learning curve around creation of broker and topics could be simplified
Kafka is well suited to scenarios where a message needs to be sent to another system in a fault-tolerant manner. It is useful when the message size could be large and a large number of messages could be floating around.
It would be less appropriate, or rather overkill, to use Kafka in scenarios where we are sending short messages to offload certain tasks (like invoice generation and sending email) to a worker (like Celery). For such use cases, simple queueing solutions like Amazon SQS should suffice.
  • High throughput
  • Low latency
  • Fault tolerance
  • We are able to submit orders to the insurance companies with almost 100% accuracy because we receive Kafka updates in almost real time
  • We are getting notified of error scenarios separately because of our Dead Letter Queue implementation so that we can handle those cases
  • There is certain engineering effort being spent to maintain Kafka
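The Dead Letter Queue handling described in this review can be sketched roughly as follows: the consumer retries a message a few times and, if it still cannot be processed, sets it aside with the error instead of blocking the stream. This is a minimal sketch that uses plain Python lists in place of topics; `consume_with_dead_letter` and `handle_order` are hypothetical names, not part of the reviewer's system or of any Kafka client.

```python
from typing import Callable, List, Tuple


def consume_with_dead_letter(
    messages: List[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Process messages in order; park the ones that keep failing."""
    processed: List[str] = []
    dead_letters: List[Tuple[str, str]] = []  # (message, last error text)
    for message in messages:
        last_error = ""
        for _attempt in range(max_attempts):
            try:
                handler(message)
                processed.append(message)
                break
            except Exception as exc:  # broad on purpose: any failure counts
                last_error = str(exc)
        else:
            # Every attempt failed: route to the dead letter list and move on.
            dead_letters.append((message, last_error))
    return processed, dead_letters


def handle_order(message: str) -> None:
    """Hypothetical handler that rejects malformed messages."""
    if not message.startswith("order:"):
        raise ValueError(f"malformed message: {message}")


processed, dead_letters = consume_with_dead_letter(
    ["order:1", "garbage", "order:2"], handle_order
)
```

Keeping the failed message together with its error text is what makes it possible to be notified of error scenarios separately and handle them later.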
Apache Kafka can work at a higher scale as compared to SQS. It can work with higher size per message and millions of messages per second. Moreover it can be scaled horizontally by adding more brokers to the cluster. SQS is good enough for simple use cases like making a task async by passing it to the worker app or delaying a task execution by certain time duration but not advisable for heavy load systems.
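Horizontal scaling and FIFO delivery fit together through partitioning: messages with the same key go to the same partition, and ordering is kept within a partition rather than across the whole topic. The sketch below illustrates this under stated assumptions: `zlib.crc32` stands in for whatever hash a real client uses, and `partition_for` and `route` are hypothetical helper names.

```python
import zlib
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; crc32 is a stand-in hash for illustration."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(
    messages: List[Tuple[str, str]], num_partitions: int
) -> Dict[int, List[Tuple[str, str]]]:
    """Append each (key, value) message to the partition chosen by its key."""
    partitions: Dict[int, List[Tuple[str, str]]] = {
        p: [] for p in range(num_partitions)
    }
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


messages = [
    ("order-7", "created"),
    ("order-9", "created"),
    ("order-7", "approved"),
    ("order-7", "shipped"),
]
partitions = route(messages, num_partitions=3)

# All updates for one order sit in one partition, in the order they were sent.
order_7_updates = [
    value
    for key, value in partitions[partition_for("order-7", 3)]
    if key == "order-7"
]
```

Spreading partitions over more brokers adds capacity, while updates for a single order still arrive in sequence because they share a partition.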
1000
We have an engineering team of about 1500 engineers and almost 2/3 of them use Kafka for some or other use case.
50
Support for Kafka comes from within our developer community. So the people managing Kafka are engineers only.
  • Publishing messages to other services to inform them of modification in an entity
  • Publishing messages to Camunda to mark a user task as complete
Kafka has suited our use case very well so far. Going forward, we are planning to expand our platform manifold, so the load on Kafka and our reliance on it are only going to increase.

Apache Kafka for your Data solutions

Rating: 10 out of 10
August 07, 2021
Vetted Review
Verified User
Apache Kafka
6 years of experience
It is being used for the product mainly. We have huge data pipelines running which depend on Apache Kafka. It is being used for more than 5 years now and we are really happy with the performance and the reliability Apache Kafka has to offer. The experience has been excellent.
  • Data Pipeline
  • Asynchronous processing
  • Data retention for reprocessing
Cons
  • Dashboards to monitor the performance
  • ZooKeeper free
  • Connectors for more languages
  • It works overall really well for maintaining data and then processing whenever you want to as it has really good retention options. Multiple consumers can be run and systems can be scaled.
  • Works well when scale is needed
  • Can work well on low hardware requirements
  • Where it can be limiting is while implementing priority queues as it has to be done at the producer level.
  • Faster deployments
  • Scalable solution which improves up time of systems
  • Low Dev effort
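The point above about priority queues having to be implemented at the producer level can be illustrated with one common workaround: the producer picks a topic per priority, and the consumer drains the higher-priority topic first. This is a toy in-memory sketch, assuming hypothetical topic names `high` and `low` and a hypothetical `PriorityRouter` class; it is one possible approach, not the reviewer's implementation.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

# Hypothetical topic names, listed from highest to lowest priority.
PRIORITIES: List[str] = ["high", "low"]


class PriorityRouter:
    """Toy model: one FIFO queue per priority, chosen by the producer."""

    def __init__(self) -> None:
        self.topics: Dict[str, Deque[str]] = {name: deque() for name in PRIORITIES}

    def produce(self, message: str, urgent: bool) -> None:
        """The producer decides the priority by choosing the topic."""
        self.topics["high" if urgent else "low"].append(message)

    def consume_next(self) -> Optional[str]:
        """Return the oldest message from the highest non-empty priority."""
        for name in PRIORITIES:
            if self.topics[name]:
                return self.topics[name].popleft()
        return None


router = PriorityRouter()
router.produce("nightly-report", urgent=False)
router.produce("fraud-alert", urgent=True)
router.produce("weekly-digest", urgent=False)
drained = [router.consume_next() for _ in range(4)]
```

The log itself stays strictly ordered within each topic; the priority comes only from how the producer routes and how the consumer chooses which topic to read next.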
Apache Kafka is much more scalable and more reliable. Does not depend on memory, works well on rotational disks and that makes it a cheaper to use solution on low hardware requirements. Running multiple consumers on the same topic can also mean processing the same data again and again and this can be a big plus.

Apache Kafka - Default Choice For Large Scale Messaging

Rating: 8 out of 10
August 23, 2023
VT
Vetted Review
Verified User
Apache Kafka
5 years of experience
Apache Kafka is really the bedrock of all things streaming and data processing. I cannot imagine if there is any other product that does it better. My last 2 companies used it, and my current one does so as well. If you want your data stream to be organized and sent, Apache Kafka has become the tool of choice. I have dabbled in Azure EventHubs as well, if you are into opensource data streaming, Apache Kafka will take you where you need to be for data lakes and the amount of data that is streamed for the cybersecurity industry that my company is in. Without Apache Kafka, there is no way that my company products can handle the volume of data that we process for our customers.
  • Data streaming is really second to none.
  • Scaling, done right, Apache Kafka is a workhorse.
  • Ease of administration - although you cannot really compare it to Azure EventHubs; that would be comparing apples and oranges.
Cons
  • The web UI has not really changed in years. UX has been refreshed, but a more streamlined UX instead of many 3rd party webUX tools, will be most welcome.
  • Webhooks can still be tricky to troubleshoot at times.
  • CLI monitoring is a learning curve to get it right.
Apache Kafka is well-suited for most data-streaming use cases. Unless you have a specific use case for a cloud PaaS such as Amazon Kinesis or Azure EventHubs for your data lakes, Apache Kafka, once set up well, will take care of everything else in the background. Azure EventHubs is good for cross-cloud use cases. I have no real-world experience with Amazon Kinesis, but I believe it is the same.
  • Well-known set of tools from setup to admin.
  • Scalability.
  • Fit for use in both on-prem and cloud-based use cases.
  • Being an open-source tool, Apache Kafka is invaluable to my company's product. I cannot imagine how much it would cost if we were using Amazon Kinesis or Azure EventHubs.
  • The negative part will be in the event of Apache Kafka failures, the trouble-shooting can really be a pain and bane. But given enough exposure to its inner workings, Apache Kafka still comes out OK.
  • Having used Apache Kafka for years in this company, I can only say that without Apache Kafka, my company would not be cost-efficient, and our products would be much costlier to sell to customers if we were paying for Azure EventHubs or Amazon Kinesis on top.
Apache Kafka is built for scale. With its high throughput, real-time data streaming, and low latency, it has a strong advantage over RabbitMQ. This puts Apache Kafka at the forefront as the platform of choice for messaging with large datasets and for ensuring scalability when data scales up tremendously.

RabbitMQ, however, has its strengths in traditional messaging. Routing and message delivery reliability are the bedrock of RabbitMQ, and this is where RabbitMQ excels. In my previous workplace, RabbitMQ was the tool of choice because reliability mattered more than scale.

In short: Apache Kafka for scale, RabbitMQ for reliability. And for cloud deployment and large dataset messaging in what I am doing now, Apache Kafka is the default choice.
Amazon Elastic Kubernetes Service (EKS), Apache Spark, Amazon Elastic Compute Cloud (EC2)