Apache Kafka vs. IBM StreamSets

Apache Kafka

152 Reviews and Ratings

IBM StreamSets

18 Reviews and Ratings

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Kafka	Score 8.8 out of 10	N/A	Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The Kafka event streaming platform is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.	N/A
IBM StreamSets	Score 8.0 out of 10	N/A	IBM® StreamSets enables users to create and manage smart streaming data pipelines through a graphical interface, facilitating data integration across hybrid and multicloud environments. IBM StreamSets can support millions of data pipelines for analytics, applications and hybrid integration.	N/A

Pricing

	Apache Kafka	IBM StreamSets
Editions & Modules	No answers on this topic	No answers on this topic
Free Trial	No	No
Free/Freemium Version	No	No
Premium Consulting/Integration Services	No	No
Entry-level Setup Fee	No setup fee	No setup fee
Additional Details	N/A	N/A

Community Pulse
Considered Both Products: Apache Kafka
Ankit Singh, Senior Engineering Manager (chose Apache Kafka; incentivized): It has very minimal overhead and doesn't have a steep learning curve.
Victor Tay, Engineer (chose Apache Kafka; incentivized): Apache Kafka is built for scale. From high throughput and real-time data streaming, it has a strong advantage over RabbitMQ with its low latency. This put Apache Kafka at the forefront as the platform of choice for large datasets messaging and ensuring scalability when data …
Verified User, Anonymous (chose Apache Kafka; incentivized): It had the clustering functionality and gave tolerance against machine failure.
Alok Pabalkar, Co-Founder & CTO (chose Apache Kafka; incentivized): The biggest advantage of using Apache Kafka is that it is cloud agnostic. It handles super high volume, is fault tolerant, and is high performance.
Animesh Kumar, Senior Member of Technical Staff (chose Apache Kafka; incentivized): Apache Kafka can work at a higher scale as compared to SQS. It can work with higher size per message and millions of messages per second. Moreover it can be scaled horizontally by adding more brokers to the cluster. SQS is good enough for simple use cases like making a task …
Verified User, Anonymous (chose Apache Kafka; incentivized): I used other messaging/queue solutions that are a lot more basic than Confluent Kafka, as well as another solution that is no longer in the market called Xively, which was bought and "buried" by Google. In comparison, these solutions offer way fewer functionalities and respond …
Verified User, Anonymous (chose Apache Kafka; incentivized): Apache Kafka is open-sourced, scales great, is cloud agnostic, and performs better than Amazon Kinesis [in my view]. Amazon Kinesis has some limitations and vendor lock-in is not something I [like]. With Confluent operators you can easily install it on a Kubernetes cluster.
Tyler Twitchell, Senior System Engineer (chose Apache Kafka; incentivized): We really needed to get away from using a SQL database to act as a queue for processing records, so a new solution was needed. Kafka is a leading software application initially designed for queuing messages which is essentially what we were looking for. It has a great user …
Verified User, Anonymous (chose Apache Kafka; incentivized): Kafka is simple and lower in price.
Borislav Traykov, DevOps Team Leader (chose Apache Kafka; incentivized): For us, Kafka really doesn't have a 1:1 alternative. We have used ActiveMQ extensively and we still use it as a lighter option for small messages. The situation is similar with Redis - although it could be used like a Kafka alternative, we do use it just as a per-component …
Verified User, Anonymous (chose Apache Kafka; incentivized): Apache Kafka is much more scalable and more reliable. It does not depend on memory and works well on rotational disks, which makes it a cheaper solution with low hardware requirements. Running multiple consumers on the same topic can also mean processing the same data again …
Verified User, Anonymous (chose Apache Kafka; incentivized): All stack tech helps our app and system. These technologies allow us to have the data available faster between different regions (due to our particular configuration) and thus the data and processing load of each system is lower. This allows the systems to be used more …
Viral Patel, Senior Software Engineer (chose Apache Kafka; incentivized): We had lots of problems with ActiveMQ. That is why we started using Apache Kafka.
Verified User, Anonymous (chose Apache Kafka; incentivized): Kafka is not a real messaging broker implementation as RabbitMQ or TIBCO EMS/JMS are. Although it can be used as messaging, we like the idea behind Kafka (data isn't "passing by," instead it remains central, so the client can revisit the data if necessary). This also …
Verified User, Anonymous (chose Apache Kafka; incentivized): Confluent Cloud is still based on Apache Kafka but it has a subscription fee so, from a long term perspective, it is wiser to deploy your own Kafka instance that spans public and private cloud. Amazon Kinesis, Google Cloud Pub/Sub do not do well for a very number of messages …
Verified User, Anonymous (chose Apache Kafka; incentivized): I would only use RabbitMQ over Kafka when you need to have delay queues or tons of small topics/queues around. I don't know too much about Pulsar - currently evaluating it - but it's supposed to have the same or better throughput while allowing for tons of queues. Stay tuned - I …
Juan Francisco Tavira, Global Technology Centre - Middleware (chose Apache Kafka; incentivized): Kafka is faster and more scalable, also "free" as open source (albeit we deploy using a commercial distribution). Infrastructure tends to be cheaper. On the other hand, projects must adapt to Kafka APIs that sometimes change and BAU increases until a major 1.x version comes out …

Considered Both Products: IBM StreamSets
Max Evans, Logistics Coordinator (chose IBM StreamSets; incentivized): At Unify Logistics, we chose IBM StreamSets over Fivetran for its flexibility in handling complex, real time data pipelines across hybrid environments. While Fivetran offers simplicity and fast setup, StreamSets provides deeper customization, better data drift handling, and …
Verified User, Anonymous (chose IBM StreamSets; incentivized): The first advantage is that this software is particularly new and it keeps updating according to the needs of the user. Another advantage is that it organises and produces conclusions on the basis of data without leaving out any relevant information. Other software lacks in data …
Verified User, Anonymous (chose IBM StreamSets; incentivized): Before, we were using Informatica since most of our applications were running on on-prem servers. Later, when we started moving to the cloud, we tried Informatica Cloud, but it's more useful for batch-oriented than streaming. That's why one of our tech architects suggested IBM …
Verified User, Anonymous (chose IBM StreamSets; incentivized): The IBM solution can be considered a good player in the specific perimeter of application because its main functionalities are working well, are easy to use, and complete. It also allows a good degree of freedom when it comes to personalization of pipelines and streams, and …
Verified User, Anonymous (chose IBM StreamSets; incentivized): We chose IBM StreamSets because we used to own the product before selling it to IBM, so we have a tremendous amount of folks who are familiar with the product.
Abhishek Katara, Assistant Consultant (chose IBM StreamSets; incentivized): StreamSets is a one-stop solution to design data engineering pipelines and doesn't require deep programming knowledge. It's so user-friendly that anyone in the team can contribute to the idea of pipeline design. In Hadoop one has to be programming proficient to use its various …
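
Several of the reviews above describe Kafka as a retained log rather than a pass-through queue: data stays available so a client can revisit it, and separate consumers on the same topic can each process the same records. The following is a minimal sketch of that idea in plain Python. It is a toy in-memory model, not the Kafka client API, and `TopicLog`, `poll`, `seek`, and the group names are hypothetical names chosen only for illustration.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for a single-partition topic (not the Kafka API)."""

    def __init__(self):
        self._records = []                # append-only list; reads never delete
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def append(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next unread records for `group` and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's read position so it can re-read retained records."""
        if not 0 <= offset <= len(self._records):
            raise ValueError("offset out of range")
        self._offsets[group] = offset


log = TopicLog()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

billing_first = log.poll("billing")      # the billing group reads all three records
analytics_first = log.poll("analytics")  # a separate group sees the same three records
billing_again = log.poll("billing")      # nothing new has arrived for billing
log.seek("billing", 1)                   # rewind billing to offset 1
billing_replay = log.poll("billing")     # re-reads the retained records from offset 1
```

Because reading only moves a per-group offset and never removes records, each group progresses independently and any group can replay older data. A traditional queue, by contrast, typically hands each message to one consumer and then discards it.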

Best Alternatives
	Apache Kafka	IBM StreamSets
Small Businesses	No answers on this topic	Skyvia Score 10.0 out of 10
Medium-sized Companies	IBM MQ Score 8.9 out of 10	Astera Data Pipeline Builder (Centerprise) Score 8.7 out of 10
Enterprises	IBM MQ Score 8.9 out of 10	Control-M Score 9.4 out of 10

User Ratings
	Apache Kafka	IBM StreamSets
Likelihood to Recommend	8.0 (0 ratings)	7.3 (0 ratings)
Likelihood to Renew	9.0 (0 ratings)	- (0 ratings)
Usability	8.0 (0 ratings)	7.7 (0 ratings)
Support Rating	8.4 (0 ratings)	- (0 ratings)

User Testimonials
	Apache Kafka	IBM StreamSets
Likelihood to Recommend	For brokering messages, Confluent Kafka is well suited since it offers a managed solution ready to use. It is less well suited to scenarios where pricing is an issue: the solution costs quite a lot for basic usage (for example, pricing for 3 clusters is above $100k a year). (Verified User, Anonymous; incentivized)	When you are dealing with a data warehouse and want to find an easy way to integrate applications and expose data in real time, then IBM StreamSets is the best tool to go for. I'm using it for the same purpose in my applications. This tool will be well suited for someone with a proper technical background. Though the IBM StreamSets UI is mostly drag and drop, advanced configurations require technical expertise or support to do the initial setup. (Verified User, Anonymous; incentivized)
Pros	Apache Kafka is able to handle a large number of I/Os (writes) using 3-4 cheap servers. It scales very well over large workloads and can handle extreme-scale deployments (e.g., LinkedIn with 300 billion user events each day). The same Kafka setup can be used as a messaging bus, storage system or a log aggregator, making it easy to maintain as one system feeding multiple applications. (Verified User, Anonymous; incentivized)	It makes building data pipelines intuitive even for non-coders. It also handles real-time data ingestion effortlessly, so I always have up-to-date information for my reports. It's great at monitoring data quality as well. (Sarthak Chopra, Specialist; incentivized)
Cons	The Kafka Tool is a community-made Java application that looks and feels like it is from the past century. Logging can be confusing, which certainly shows when we have to troubleshoot. Hybrid scenarios (pub/sub with services both inside and outside a Kubernetes cluster): there are ~3 options, but only 2 (the harder ones) are production-safe. (Borislav Traykov, DevOps Team Leader; incentivized)	Where the person's skill set in data analysis is not that of an expert. Data monitoring and analysis. Customer data for better customer acquisition. (Verified User, Anonymous; incentivized)
Likelihood to Renew	Kafka has suited our use case very well so far. Going forward we are planning to expand our platform manifold, so the load on Kafka and our reliance on Kafka are only going to increase. (Animesh Kumar, Senior Member of Technical Staff)	No answers on this topic
Usability	Apache Kafka is highly recommended for developing loosely coupled, real-time processing applications. Also, Apache Kafka provides property-based configuration: the producer, consumer, and broker each have their own separate properties file. (Jimesh V Shah, Senior Software Engineer; incentivized)	Because I think that overall the solution is having a positive impact on the business: it simplifies tasks and can carry out multiple processes that are usually done by a combination of people and systems, reducing the time and effort required to have the data. (Verified User, Anonymous; incentivized)
Support Rating	Support for Apache Kafka (if willing to pay) is available from Confluent, which includes the same team that created Kafka at LinkedIn, so they know this software inside and out. Moreover, Apache Kafka is well known, and best-practice documents and deployment scenarios are easily available for download, for example from eBay, LinkedIn, Uber, and NYTimes. (Verified User, Anonymous; incentivized)	No answers on this topic
Alternatives Considered	Apache Kafka is built for scale. From high throughput and real-time data streaming, it has a strong advantage over RabbitMQ with its low latency. This put Apache Kafka at the forefront as the platform of choice for large-dataset messaging and ensuring scalability when data scales up tremendously. RabbitMQ, however, has its strengths in traditional messaging. Routing and message delivery reliability are the bedrock of RabbitMQ and this is where RabbitMQ excels. In my previous workplace, RabbitMQ was the choice, as reliability mattered more than scale. In short: Apache Kafka for scale, RabbitMQ for reliability. And for cloud deployment and large-dataset messaging in what I am doing now, Apache Kafka is the default choice. (Victor Tay, Engineer)	At Unify Logistics, we chose IBM StreamSets over Fivetran for its flexibility in handling complex, real-time data pipelines across hybrid environments. While Fivetran offers simplicity and fast setup, StreamSets provides deeper customization, better data drift handling, and stronger support for dynamic logistics workflows. (Max Evans, Logistics Coordinator; incentivized)
Return on Investment	Positive: bursts of traffic on special holidays are easy to handle because Kafka can absorb and buffer all the messages we need to process long enough to let an understaffed set of back-end services catch up on processing. Hard to put a number to it but we probably save $5k a month having fewer machines running. Positive: makes decoupling the web and API services from the deeper back-end services easier by providing topics as an interface. This allowed us to split up our teams and have them develop independently of each other, speeding up software development. Negative: our engineers have made mistakes such as accidentally dropping a few thousand messages due to the CLI being confusing to use, and as a result a customer lost some of their precious data. I'd say that was more our fault than Kafka's though. (Verified User, Anonymous; incentivized)	Huge reduction in maintenance as the StreamSets pipelines are resilient. Eliminated the need for other data ingestion tools, saving us hundreds of thousands annually. (Verified User, Anonymous; incentivized)