Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The Kafka event streaming platform is used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
N/A
Fivetran
Score 8.3 out of 10
N/A
Fivetran replicates applications, databases, events and files into a high-performance data warehouse, after a five minute setup. The vendor says their standardized cloud pipelines are fully managed and zero-maintenance. The vendor says Fivetran began with a realization: For modern companies using cloud-based software and storage, traditional ETL tools badly underperformed, and the complicated configurations they required often led to project failures. To streamline and accelerate…
Apache Kafka is well-suited for most data-streaming use cases. Amazon Kinesis and Azure EventHubs, unless you have a specific use case where using those cloud PaAS for your data lakes, once set up well, Apache Kafka will take care of everything else in the background. Azure EventHubs, is good for cross-cloud use cases, and Amazon Kinesis - I have no real-world experience. But I believe it is the same.
Fivetran's business model justifies the use-case where we require data from a single source basically a lot of data but if the requirement is not on the heavier side, Fivetran comes to costly operation when compared to its peers. Otherwise, I'll recommend Fivetran for stability and update and seamless service provider.
Really easy to configure. I've used other message brokers such as RabbitMQ and compared to them, Kafka's configurations are very easy to understand and tweak.
Very scalable: easily configured to run on multiple nodes allowing for ease of parallelism (assuming your queues/topics don't have to be consumed in the exact same order the messages were delivered)
Not exactly a feature, but I trust Kafka will be around for at least another decade because active development has continued to be strong and there's a lot of financial backing from Confluent and LinkedIn, and probably many other companies who are using it (which, anecdotally, is many).
Sometimes it becomes difficult to monitor our Kafka deployments. We've been able to overcome it largely using AWS MSK, a managed service for Apache Kafka, but a separate monitoring dashboard would have been great.
Simplify the process for local deployment of Kafka and provide a user interface to get visibility into the different topics and the messages being processed.
Learning curve around creation of broker and topics could be simplified
Apache Kafka is highly recommended to develop loosely coupled, real-time processing applications. Also, Apache Kafka provides property based configuration. Producer, Consumer and broker contain their own separate property file
Very easy and intuitive to setup and maintain as there usually are not that many options. Very well documented (e.g. how to setup each connector, how the schema looks like, any specific features of this connector etc.). Also the operation is intuitive, e.g. you have status pages, log pages, configuration pages etc. for each connector.
It runs pretty well and gets our data from point A to point cluster quickly enough. Honestly, it's not something I think about unless it breaks and that's pretty rare.
Support for Apache Kafka (if willing to pay) is available from Confluent that includes the same time that created Kafka at Linkedin so they know this software in and out. Moreover, Apache Kafka is well known and best practices documents and deployment scenarios are easily available for download. For example, from eBay, Linkedin, Uber, and NYTimes.
I used other messaging/queue solutions that are a lot more basic than Confluent Kafka, as well as another solution that is no longer in the market called Xively, which was bought and "buried" by Google. In comparison, these solutions offer way fewer functionalities and respond to other needs.
We never seriously considered using anything else. Our data engineers had used Fivetran extensively in previous roles so when it came time to make a decision, there wasn't much of a process. They gladly signed the contract with Fivetran pretty quickly.
Positive: Get a quick and reliable pub/sub model implemented - data across components flows easily.
Positive: it's scalable so we can develop small and scale for real-world scenarios
Negative: it's easy to get into a confusing situation if you are not experienced yet or something strange has happened (rare, but it does). Troubleshooting such situations can take time and effort.