Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. The Kafka event streaming platform is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
N/A
Apache Pulsar
Score 9.2 out of 10
N/A
Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo! and now an Apache Software Foundation project. It is free and open source, available under the Apache License, version 2.0.
N/A
SSIS
Score 7.6 out of 10
N/A
Microsoft's SQL Server Integration Services (SSIS) is a data integration solution.
N/A
Pricing
Apache Kafka
Apache Pulsar
SQL Server Integration Services (SSIS)
Editions & Modules
No answers on this topic
No answers on this topic
No answers on this topic
Offerings
Pricing Offerings (Apache Kafka / Apache Pulsar / SSIS)
Free Trial: No / No / No
Free/Freemium Version: No / Yes / No
Premium Consulting/Integration Services: No / No / No
Entry-level Setup Fee: No setup fee / No setup fee / No setup fee
Additional Details: - / - / -
Features
Apache Kafka
Apache Pulsar
SQL Server Integration Services (SSIS)
Data Source Connection
Apache Kafka: - (no ratings)
Apache Pulsar: - (no ratings)
SQL Server Integration Services (SSIS): 7.0 (56 ratings), 17% below category average
Connect to traditional data sources: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 9.0 (56 ratings)
Connect to Big Data and NoSQL: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 5.0 (43 ratings)
Data Transformations
Apache Kafka: - (no ratings)
Apache Pulsar: - (no ratings)
SQL Server Integration Services (SSIS): 6.8 (56 ratings), 17% below category average
Simple transformations: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 9.0 (56 ratings)
Complex transformations: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 4.7 (55 ratings)
Data Modeling
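Apache Kafka: - (no ratings)
Apache Pulsar: - (no ratings)
SQL Server Integration Services (SSIS): 7.5 (54 ratings), 4% below category average
Data model creation: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 9.0 (28 ratings)
Metadata management: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 6.0 (35 ratings)
Business rules and workflow: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 7.0 (45 ratings)
Collaboration: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 9.0 (40 ratings)
Testing and debugging: Apache Kafka 0 (0 ratings); Apache Pulsar 0 (0 ratings); SSIS 6.3 (51 ratings)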
Data Governance
Apache Kafka is well suited for most data-streaming use cases. Unless you have a specific use case that calls for a cloud PaaS such as Amazon Kinesis or Azure Event Hubs for your data lakes, Apache Kafka, once set up well, will take care of everything else in the background. Azure Event Hubs is good for cross-cloud use cases. I have no real-world experience with Amazon Kinesis, but I believe it is the same.
As I mentioned earlier, SQL Server Integration Services is suitable if you want to manage data from different applications. It really helps in fetching the data and generating reports. Its automation makes it very easy and time efficient. It works well with large databases as well. But it doesn't work well with real-time data; it takes some time to gather that data. I would not recommend using it in a real-time, fast-paced environment.
Really easy to configure. I've used other message brokers such as RabbitMQ and compared to them, Kafka's configurations are very easy to understand and tweak.
Very scalable: easily configured to run on multiple nodes allowing for ease of parallelism (assuming your queues/topics don't have to be consumed in the exact same order the messages were delivered)
Not exactly a feature, but I trust Kafka will be around for at least another decade because active development has continued to be strong and there's a lot of financial backing from Confluent and LinkedIn, and probably many other companies who are using it (which, anecdotally, is many).
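The ordering caveat in the scalability point above can be illustrated with a small sketch. It is not Kafka code: it assumes a simplified model in which each keyed message is routed to one of several partitions by a stable hash (`zlib.crc32` is only a stand-in, and the names `assign_partition` and `partition_messages` are hypothetical). Under that assumption, messages that share a key stay in order inside one partition, while nothing orders messages across partitions.

```python
import zlib
from collections import defaultdict


def assign_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (crc32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_messages(messages, num_partitions):
    """Group (key, value) pairs by partition, keeping arrival order in each."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[assign_partition(key, num_partitions)].append((key, value))
    return dict(partitions)


# Hypothetical stream: the value is a per-key sequence number.
messages = [
    ("user-a", 1), ("user-b", 1), ("user-a", 2),
    ("user-c", 1), ("user-b", 2), ("user-a", 3),
]

partitions = partition_messages(messages, num_partitions=3)

# Each partition could be read by a different consumer in parallel.
for partition_id in sorted(partitions):
    print(partition_id, partitions[partition_id])
```

If strict global ordering were required, the same model would need a single partition, which removes the parallelism the reviewer describes.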
Sometimes it becomes difficult to monitor our Kafka deployments. We've been able to overcome it largely using AWS MSK, a managed service for Apache Kafka, but a separate monitoring dashboard would have been great.
Simplify the process for local deployment of Kafka and provide a user interface to get visibility into the different topics and the messages being processed.
The learning curve around creating brokers and topics could be simplified.
Connection managers for online data sources can be tricky to configure.
Performance tuning is an art form, and trialing different data flow task options can be cumbersome. SSIS could do a better job of providing performance data, including historical data, for monitoring.
Mapping a destination using an OLE DB command is difficult because destination columns are unnamed.
Excel or flat file connections are limited by version and type.
Some features should be revised or improved, and some toolbox tools (when used with Visual Studio) should be less schematic and somewhat more flexible. For example, CSV data import is still very old-fashioned: if the data format changes, it takes a bit of manual labor to accept the new data structure.
Apache Kafka is highly recommended for developing loosely coupled, real-time processing applications. Apache Kafka also provides property-based configuration: the producer, consumer, and broker each have their own separate properties file.
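As a rough illustration of property-based configuration, the sketch below renders three separate `key=value` property sets, one each for a producer, a consumer, and a broker. The keys are standard Kafka configuration names, but every value (host, port, group name, log directory) is a placeholder assumption, and the `render_properties` helper and the file names are only for illustration.

```python
def render_properties(settings: dict) -> str:
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Placeholder values throughout; adjust for a real deployment.
producer = {
    "bootstrap.servers": "localhost:9092",
    "acks": "all",
    "key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
    "value.serializer": "org.apache.kafka.common.serialization.StringSerializer",
}

consumer = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    "auto.offset.reset": "earliest",
    "key.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
    "value.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
}

broker = {
    "broker.id": "0",
    "listeners": "PLAINTEXT://localhost:9092",
    "log.dirs": "/tmp/kafka-logs",
    "num.partitions": "3",
}

# Each role gets its own file contents; nothing is written to disk here.
files = {
    "producer.properties": render_properties(producer),
    "consumer.properties": render_properties(consumer),
    "server.properties": render_properties(broker),
}

for name, text in files.items():
    print(f"# {name}")
    print(text)
```

Keeping the three roles in separate property sets means a producer's delivery settings can be tuned without touching consumer or broker settings.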
SSIS is a great tool for most ETL needs. It covers 90% (or more) of use cases, and even in many of the cases where it is not ideal, SSIS can be extended via a .NET language to do the job well, in a supportable way, for almost any performance workload.
SQL Server Integration Services performance depends directly on the resources provided to the system. In our environment, we allocated 6 nodes with 4 CPUs and 64GB each, running in parallel. Unfortunately, we had to ramp up to such a robust environment to get the performance to where we needed it. Most of the reports are completed in a reasonable timeframe. However, when a report runs slowly, it is often difficult, if not impossible, to cancel it without killing the report instance or stopping the service.
Support for Apache Kafka (if you are willing to pay) is available from Confluent, which includes the same team that created Kafka at LinkedIn, so they know this software inside and out. Moreover, Apache Kafka is well known, and best-practice documents and deployment scenarios are easily available for download, for example from eBay, LinkedIn, Uber, and NYTimes.
The support, when necessary, is excellent. But beyond that, it is very rarely necessary because the user community is so large, vibrant, and knowledgeable that a simple Google query or forum question can answer almost everything you want to know. You can also get prewritten script tasks with a variety of functionality, which saves a lot of time.
The implementation may be different in each case. It is important to properly analyze all the existing infrastructure to understand the kind of work needed, the type of software used and the compatibility between these, and the features that you want to exploit, so that you know what is possible and which features require integration with third-party tools.
I used other messaging/queue solutions that are a lot more basic than Confluent Kafka, as well as another solution that is no longer in the market called Xively, which was bought and "buried" by Google. In comparison, these solutions offer way fewer functionalities and respond to other needs.
I think SQL Server Integration Services is better suited for on-premises data movement and ADF (Azure Data Factory) is more suited for the cloud. Though ADF has more connectors, SQL Server Integration Services is more robust and has better functionality simply because it has been around much longer.
Positive: Get a quick and reliable pub/sub model implemented - data across components flows easily.
Positive: it's scalable so we can develop small and scale for real-world scenarios
Negative: it's easy to get into a confusing situation if you are not experienced yet or if something strange has happened (rare, but it happens). Troubleshooting such situations can take time and effort.
Without this, we would have to manually update a spreadsheet of our SQL Server inventory
We would also have poor alerting; if an instance was down we wouldn't know until it was reported by a user
We only have one other person who uses SQL Server Integration Services, and he's the expert. Without him it would fall to me, and I would not enjoy being responsible for it.