Apache Kafka

Score 9.1 out of 100

Overview

Pricing

N/A
Unavailable

Entry-level set up fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting / Integration Services

Alternatives Pricing

What is Amazon SNS?

Amazon Web Services offers the Amazon Simple Notification Service (SNS), which provides pub/sub messaging and push notifications to iOS and Android devices. It is meant to operate in a microservices architecture and can support event-driven contingencies and the decoupling of…

What is Amazon SQS?

Amazon Web Services (AWS) provides the Amazon Simple Queue Service (SQS), a managed message queue service which supports the safe decoupling and distribution of different components in a cloud infrastructure and cloud applications.

Features Scorecard

No scorecards have been submitted for this product yet.

Product Details

What is Apache Kafka?

Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The Kafka event streaming platform is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Apache Kafka Technical Details

Operating Systems: Unspecified
Mobile Application: No

Reviews and Ratings

 (97)

Ratings

Reviews

(1-13 of 13)
Score 10 out of 10
Vetted Review
Verified User
Review Source
Apache Kafka is the most powerful and scalable streaming framework on the market. We have used Apache Kafka as a part of many real-time analytics solutions. It has great performance [and is] easy to integrate with big data technologies like Spark. Due to its distributed nature, Apache Kafka is capable of operating very quickly and can handle millions of messages every second.
  • Real time streaming
  • Performance
  • Scalability
  • Management tools
I have used Apache Kafka for real-time analytics and streaming. It’s highly scalable and integrates well with big data technologies like Spark. I believe Apache Kafka is the best in the market.
Borislav Traykov | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Review Source
Kafka is an event streaming platform and this is exactly the purpose we use it for in our company. Application data-in-transit goes into Kafka, which generates an event, and all relevant applications (consumers) get notified and then consume said messages. We are really happy with the volume of data we get through and the speed that we get from Kafka. It's used in multiple 1st and 3rd party components of the applications we develop in the entire company. It addresses data proliferation and notifications. If not for Kafka, we'd have to invent a pub/sub model (which multiple people have in the past in this company) - those are complex, hard to maintain, extend and customize. Kafka is fairly well documented and widely used, so there is a lot of info about multiple use cases online.
  • The pub/sub model
  • Quick data transfer - regardless of volume (if you have enough resources)
  • Ability to transfer large amounts of data consistently (non-binary)
  • The Kafka Tool is a community-made Java application that looks and feels from the past century.
  • Logging can be confusing. This certainly shows when we have to do troubleshooting.
  • Hybrid scenarios - pub/sub, but there are services in and outside a Kubernetes cluster. Then there are ~3 options, but only 2 (the harder ones) are production-safe.
  • Pub/sub model when more services are involved.
  • A lot of technologies know how to work with Kafka. There are Kafka libraries for all general-purpose languages.
  • Quick and reliable data transit and notifications.
  • Kafka can have a big memory and/or disk footprint depending on your scenario. Be prepared to allocate more resources as your data volume grows. Kafka is lean by default, but it does require memory (in-mem storage) and disk (offloading) to keep your data.
  • Kafka has a lot of configuration options - be sure to check them if you need to fit Kafka into a specific scenario.
  • The Kafka Tool looks ancient, but it does what it's supposed to.
  • If your developers are debugging, they may unintentionally "steal" events/data from a given queue as they would probably register as a consumer. This is very nasty, especially when dealing with a living system. There are ways to avoid this, but people need to be aware that it can happen.
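The event-"stealing" problem in the last point comes from how consumer groups work: Kafka tracks a read position per consumer group, and consumers that share a group split the messages between them. A debugging consumer that reuses the production group id therefore takes messages away from the real consumers, while one with its own group id reads independently. The sketch below is a minimal in-memory model of that behaviour, not Kafka client code; `TinyTopic`, `poll`, and the group names are hypothetical.

```python
from collections import defaultdict


class TinyTopic:
    """In-memory stand-in for a single-partition topic (hypothetical, not a Kafka API)."""

    def __init__(self):
        self.log = []                     # append-only list of messages
        self.offsets = defaultdict(int)   # next offset to read, tracked per group id

    def publish(self, message):
        self.log.append(message)

    def poll(self, group_id, max_messages=1):
        """Return up to max_messages for this group and advance the group's offset."""
        start = self.offsets[group_id]
        batch = self.log[start:start + max_messages]
        self.offsets[group_id] = start + len(batch)
        return batch


topic = TinyTopic()
for i in range(4):
    topic.publish(f"event-{i}")

# A debugger that reuses the production group id consumes events the service never sees.
stolen = topic.poll("billing-service", max_messages=2)
seen_by_service = topic.poll("billing-service", max_messages=10)

# A debugger with its own group id reads the full log without affecting anyone else.
debug_view = topic.poll("debug-alice", max_messages=10)

print(stolen)            # ['event-0', 'event-1']
print(seen_by_service)   # ['event-2', 'event-3']
print(debug_view)        # ['event-0', 'event-1', 'event-2', 'event-3']
```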
Tyler Twitchell | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
We use Kafka as the queuing mechanism for records in an indexing pipeline. Previous to using Kafka we were working with tables in SQL Server to handle a queue in a situation that SQL is not really designed for. Kafka provides a simple and efficient system that does the job it was intended for, queuing and maintaining records in a queue, and works very well. We use Kafka for several processes in our organization that require records to be stored and be processed by dedicated servers.
  • Queuing of records
  • Easy expansion of Topic partitions
  • An abundance of options for managing and maintaining queues
  • Easy expansion of cluster for growth
  • A management interface would be nice
  • Built in logging tools
Kafka is a queuing system, plain and simple, and it does its job efficiently and with little fuss. We utilize Splunk logging to keep track of records in queues and how items are being processed, and outside of that we generally do not have to mess with Kafka; it just does the job with little maintenance or problems. Kafka is the right solution for any situation where records or information need to be placed in a queue to be accessed and processed by other systems.
Score 10 out of 10
Vetted Review
Verified User
Review Source
My application depended on other applications to generate data, and that data needed to be processed immediately. The processed data was then published for other applications. Moreover, the data load was very high, nearly a hundred thousand a day, and consumed data might need to be replayed in the future. So, after carefully considering several messaging queues, we finally decided to continue with Apache Kafka.
  • Every setting is configurable.
  • Works seamlessly during high data load.
  • Partition mechanism.
  • Easily configurable.
  • Zookeeper configuration.
  • Front-end can be developed to configure properties.
  • UI for administrative configuration.
Kafka can be used as a database, but it is not recommended to store data in it for a long time. Also, Kafka is only worth using if your application has a high data load; otherwise any other messaging queue is recommended. In addition, Apache Kafka provides far more features than just a simple messaging queue. Using Apache Kafka we can develop loosely coupled, real-time, fault-tolerant architectures.
Score 10 out of 10
Vetted Review
Verified User
Review Source
Kafka is being used for our IoT data flows as the middle layer to transport data and make it available for consumption. We are implementing it slowly starting project by project and plan to use it globally.
  • Message queue
  • Capture data
  • Make data available
  • Integration between systems
  • More out of the box connectors for various other system integration
Kafka is great for moving data between systems! You can even store data for a while before purging it so you know you have consumed it!
Score 10 out of 10
Vetted Review
Verified User
Review Source
It is being used for the product mainly. We have huge data pipelines running which depend on Apache Kafka. It is being used for more than 5 years now and we are really happy with the performance and the reliability Apache Kafka has to offer. The experience has been excellent.
  • Data Pipeline
  • Asynchronous processing
  • Data retention for reprocessing
  • Dashboards to monitor the performance
  • ZooKeeper free
  • Connectors for more languages
  • It works overall really well for maintaining data and then processing whenever you want to as it has really good retention options. Multiple consumers can be run and systems can be scaled.
  • Works well when scale is needed
  • Can work well on low hardware requirements
  • Where it can be limiting is while implementing priority queues as it has to be done at the producer level.
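Kafka topics have no notion of message priority, which is why the last point says it has to be handled at the producer level. One common workaround (an assumption here, not something the reviewer describes) is for the producer to route messages to one topic per priority level and for the consumer to drain the higher-priority topic first. A minimal in-memory sketch, with plain deques standing in for topics and hypothetical names throughout:

```python
from collections import deque

# One queue per priority level stands in for one Kafka topic per level (hypothetical names).
topics = {"orders.high": deque(), "orders.low": deque()}


def produce(message, urgent=False):
    """Producer-side routing: the producer decides which topic a message goes to."""
    topics["orders.high" if urgent else "orders.low"].append(message)


def consume_next():
    """Consumer-side policy: always drain the high-priority topic before the low one."""
    for name in ("orders.high", "orders.low"):
        if topics[name]:
            return topics[name].popleft()
    return None  # nothing left to consume


produce("routine-1")
produce("urgent-1", urgent=True)
produce("routine-2")

order = [consume_next() for _ in range(3)]
print(order)  # ['urgent-1', 'routine-1', 'routine-2']
```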
Score 10 out of 10
Vetted Review
Verified User
Review Source
Kafka is being used for sending log information in real time, so we can monitor apps and send these events to feed other apps. It's the core for sending and receiving messages because of the quantity of messages per second. It helps us scale and manage the common errors in this type of problem.
  • Scalable
  • Fast
  • Performance
  • Open source
  • Performance security
  • Monitoring
  • Configuration
Sending a few events in a few time slots: Kafka is designed for high event volumes. If your application doesn't work with more [than] 25.000 messages, Kafka isn't the correct solution.

Sending large events: don't try working with events larger than 1 MB; the performance is very poor.

Sending events without compression: compressing messages helps both network traffic and pipeline speed.
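To illustrate the compression point: Kafka producers can compress batches of messages (the `compression.type` producer setting accepts values such as `gzip`, `snappy`, `lz4`, and `zstd`). The sketch below does not talk to Kafka; it only uses Python's standard `gzip` module to show how much a batch of similar log events can shrink. The event fields are made up for illustration.

```python
import gzip
import json

# A batch of similar log events, like those an application might send (made-up fields).
events = [
    {"app": "checkout", "level": "INFO", "msg": "request handled", "request_id": i}
    for i in range(1000)
]

# Serialize the whole batch, then compress it as a unit, as a batching producer would.
raw = "\n".join(json.dumps(e) for e in events).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
print(f"ratio:            {len(raw) / len(compressed):.1f}x")
```

Repetitive payloads such as logs tend to compress well; already-compressed or random payloads will gain much less.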
Score 7 out of 10
Vetted Review
Verified User
Review Source
Apache Kafka is used by our company as the "next generation" of messaging/data-streaming pipeline solutions, to replace our old legacy JMS-based messaging solution and enable the modern streaming API based applications. When it is used for messaging purposes, we shift the responsibility of data replay from the message source (publisher application) to the message destination (consumer application). This flexibility resolved the legacy issue of sources replaying the messages but impacting all subscribers to the same topic. When Kafka is used as the streaming pipeline, it is integrated seamlessly with the Spark/Spring Stream-based analytic solutions, as it is also a kind of distributed storage.
  • Undoubtedly, Kafka's high throughput and low latency feature are the highlights.
  • Kafka can scale horizontally very well.
  • The CLI and configuration details need to be worked out more in-depth. The naming convention of the configuration is not so good and causes a lot of confusion. Sometimes there are too many configuration parameters to tune, which requires the adopter to understand a lot of tricks like NFS entrapment, for example.
  • Lack of a good monitoring solution so far
When it is used for messaging, Apache Kafka is mostly preferred when the use case is of the Pub/Sub type. It is not suitable for the end-to-end queue use case or the request/response paradigm. When Apache Kafka is used for streaming purposes, it doesn't have a native implementation of a query language; it is just a pipeline. You still need to put a lot of programming effort into your streaming client side to take care of those analytic requirements.
We are using the Apache open source version of Kafka. The community is a good place to ask questions, and we can get most of our problems resolved there.
Viral Patel | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
We used it for event logging. It was used for application log collection. Was used with exception tracking and with core microservices of the web application. It helped us reduce cost and simplified operational monitoring.
  • It handles large amounts of data simultaneously. Makes application scalable.
  • It is able to handle real time data pipeline.
  • Resistant to node failure within the cluster.
  • Does not have complete set of monitoring tools.
  • It does not support wild card topic selection.
  • Brokers and consumer pattern reduces the performance.
It works well as a replacement for a traditional message broker. It is useful when you want to log and track multiple web activities simultaneously.
They provide very good response. Sometimes they get queued up.
Score 10 out of 10
Vetted Review
Verified User
Review Source
Apache Kafka is used as a stream/message ingestion engine for all the customer-facing apps including some internal streams company-wide. It is used to ingest close to 2-5 million small (few bytes) messages per second that are then used for internal analytics and decision making in realtime and feed analytics backend (Tibco Spotfire).
  • Apache Kafka is able to handle a large number of I/Os (writes) using 3-4 cheap servers.
  • It scales very well over large workloads and can handle extreme-scale deployments (e.g. LinkedIn with 300 billion user events each day).
  • The same Kafka setup can be used as a messaging bus, storage system or a log aggregator making it easy to maintain as one system feeding multiple applications.
  • Apache Kafka does take some initial setup and deployment time especially if you haven't bought support from Confluent.
  • It is not a full solution so for an analytics use case, you will still need something like Tibco.
  • It does not have a SQL based query engine out-of-the-box so building/using analytics on top can be a lot of work. It would be great to have something already baked into Kafka out-of-the-box.
Apache Kafka is very well suited where the deployment entails getting a very large number of small messages at extremely high rates (4 million-plus messages a second). It is also very well suited when you need stronger ordering guarantees than a traditional messaging system can provide. It is less suited when you don't need such high message ingestion rates and need to do everything in a public cloud. Apache Kafka will be overkill for such small/simple deployments.
Support for Apache Kafka (if willing to pay) is available from Confluent, which includes the same team that created Kafka at LinkedIn, so they know this software in and out. Moreover, Apache Kafka is well known, and best-practice documents and deployment scenarios are easily available for download, for example from eBay, LinkedIn, Uber, and NYTimes.
Score 9 out of 10
Vetted Review
Verified User
Review Source
We use Kafka for two key features: (1) keeping a buffer of all the incoming records that need to be stored in our data infrastructure, and (2) having a way to replay messages in case our data infrastructure loses some data.
The reason we need to buffer is that when our traffic spikes, we can have up to 1 million messages coming in that need to be processed in some form or fashion. To expect the back-end service to support that is crazy. Instead, we dump them into Kafka to give our data infrastructure time to ingest them. As for replaying events, sometimes the ingestion pipeline fails and drops some messages. I know - that's a huge mistake on our engineering team's part - but when it does happen Kafka has the ability to rewind and replay messages, resulting in delayed processing but no data loss.
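The rewind-and-replay behaviour described above works because Kafka keeps messages in the log after they are read and lets a consumer move its position back to an earlier offset. A minimal in-memory sketch of the idea follows; `ReplayableLog` and its methods are hypothetical and are not part of any Kafka client library.

```python
class ReplayableLog:
    """Append-only log with a movable read position (hypothetical, in-memory only)."""

    def __init__(self):
        self.records = []
        self.position = 0   # offset of the next record to read

    def append(self, record):
        self.records.append(record)

    def read(self):
        """Return the next record, or None when the consumer is caught up."""
        if self.position >= len(self.records):
            return None
        record = self.records[self.position]
        self.position += 1
        return record

    def seek(self, offset):
        """Move to an absolute offset; reading never deletes records."""
        if not 0 <= offset <= len(self.records):
            raise ValueError("offset out of range")
        self.position = offset


log = ReplayableLog()
for i in range(5):
    log.append(f"record-{i}")

first_pass = [log.read() for _ in range(5)]

# Suppose the downstream pipeline lost everything from offset 2 onward: rewind and re-read.
log.seek(2)
replayed = [log.read() for _ in range(3)]
print(replayed)  # ['record-2', 'record-3', 'record-4']
```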
  • Really easy to configure. I've used other message brokers such as RabbitMQ and compared to them, Kafka's configurations are very easy to understand and tweak.
  • Very scalable: easily configured to run on multiple nodes allowing for ease of parallelism (assuming your queues/topics don't have to be consumed in the exact same order the messages were delivered)
  • Not exactly a feature, but I trust Kafka will be around for at least another decade because active development has continued to be strong and there's a lot of financial backing from Confluent and LinkedIn, and probably many other companies who are using it (which, anecdotally, is many).
  • Doesn't work well with many small topics (on the order of thousands). There is a physical limit due to file handler usage on the number of topics Kafka can have before it grinds to a halt. This is not an issue for most people but it became an issue for us, as we need to have many, many topics and so we weren't able to fully migrate to Kafka except for a few of our big queues.
  • Lack of tenant isolation: if a partition on one node starts to lag on consume or publish, then all the partitions on that node will start to lag. That's what we've noticed and it's really frustrating to our customers that another customer's bad data affects them as well.
  • I don't have too much experience here, but I hear from other engineers on my team that the CLI admin tool is a real pain to use. For example, they say the arguments have no clear naming convention so they are hard to memorize and sometimes you have to pass in undocumented properties.
Despite the disadvantages I list, I really believe that Kafka is the right choice whenever you need a queueing or message broker system. Kafka is way too battle-tested and scales too well to ever not consider it. The only exception is if your use case requires many, many small topics. Also, Kafka doesn't support delay queues out of the box and so you will need to "hack" it through special code on the consumer side.
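One way to approximate the consumer-side delay-queue "hack" mentioned above is to attach a due time to each message and have the consumer hold back anything that is not yet due. The sketch below is an assumption about how such a hack could look, not the reviewer's implementation; it uses an in-memory list instead of a Kafka topic, and `not_before` is a hypothetical field name.

```python
def split_due(messages, now):
    """Separate messages whose delay has elapsed from those that must wait.

    Each message is a dict with a hypothetical 'not_before' timestamp (seconds).
    """
    due = [m for m in messages if m["not_before"] <= now]
    waiting = [m for m in messages if m["not_before"] > now]
    return due, waiting


# Messages as a consumer might receive them, each carrying its own due time.
inbox = [
    {"id": "a", "not_before": 100},
    {"id": "b", "not_before": 160},
    {"id": "c", "not_before": 105},
]

due, waiting = split_due(inbox, now=110)
print([m["id"] for m in due])      # ['a', 'c']
print([m["id"] for m in waiting])  # ['b']
```

In a real consumer the waiting messages would have to be re-checked later, for example by sleeping before processing them or by republishing them to a retry topic.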
We use Heroku to host Pulsar and they have tons of Kafka experts that have helped us tune every little setting and give us advice via email or live chat (if you pay for premium support).
January 30, 2019

Kafka quick queue

Score 8 out of 10
Vetted Review
Verified User
Review Source
We are using Kafka as an ingress and egress queue for data being saved into a big data system. Kafka is also being used as a queue for frontend applications to use in order to retrieve data and analytics from MapR and HortonWorks.
  • Fast queuing
  • Easy to set up and configure
  • Easy to add and remove queues
  • User interface for configuration could be a little better
  • Could be a little more defined when configuring files
  • Logging is a little hard to follow
If you need a queue for ingest or user interfaces Kafka is a great tool. Easy on the admins as well as the developers.
Juan Francisco Tavira | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
Apache Kafka is becoming the new standard for messaging at our organization. Originally we limited the use to big data environments and projects but as the technology is becoming more mature we think it will eventually replace classical messaging software.
  • High volume/performance throughput environments
  • Low latency projects
  • Multiple consumers for the same data, reprocessing, long-lasting information
  • Still a bit immature, some clients have required recoding in the last few versions
  • New features coming very fast, several upgrades a year may be required
  • Not many commercial companies provide support
Apache Kafka is extremely well suited in near real-time scenarios, high volume or multi-location projects. It can solve escalation problems for a fraction of the cost other solutions do and it has the flexibility of open source scenarios.