Item: Apache Kafka
Rating: 10
Author: Verified User

Use Cases and Deployment Scope

We use Apache Kafka as the stream/message ingestion engine for all of our customer-facing apps, as well as for some internal streams company-wide. It ingests roughly 2-5 million small (a few bytes each) messages per second. Those messages drive real-time internal analytics and decision making, and they also feed our analytics backend (Tibco Spotfire).
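To put these rates in perspective, here is a quick back-of-the-envelope conversion. It is only arithmetic on the figures quoted in this review. The 10-byte payload size is a hypothetical value chosen for illustration, and the bandwidth figure ignores Kafka's record/batch framing, keys, headers and replication.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def per_day(msgs_per_sec: float) -> float:
    """Convert a sustained per-second rate into a daily total."""
    return msgs_per_sec * SECONDS_PER_DAY


def per_second(events_per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


def payload_mb_per_sec(msgs_per_sec: float, bytes_per_msg: int) -> float:
    """Raw payload bandwidth in MB/s (1 MB = 10**6 bytes).

    Ignores Kafka's per-record and per-batch framing, keys, headers
    and replication, so real network/disk traffic will be higher.
    """
    return msgs_per_sec * bytes_per_msg / 1_000_000


if __name__ == "__main__":
    for rate in (2_000_000, 5_000_000):
        print(f"{rate:,} msg/s -> {per_day(rate) / 1e9:.1f} billion msg/day")
    print(f"300 billion events/day -> {per_second(300e9) / 1e6:.2f} million msg/s")
    # ASSUMPTION: 10-byte payloads, purely for illustration
    print(f"5M msg/s x 10 B -> {payload_mb_per_sec(5_000_000, 10):.0f} MB/s")
```

The upper end of our range works out to about 432 billion messages per day, and the LinkedIn figure mentioned below averages about 3.47 million messages per second. Our workload is therefore in the same order of magnitude as that deployment.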

Pros and Cons

Apache Kafka can handle a large volume of I/O (writes) using only 3-4 inexpensive servers.
It scales very well to large workloads and can handle extreme-scale deployments (e.g., LinkedIn, with 300 billion user events each day).
The same Kafka setup can be used as a messaging bus, a storage system, or a log aggregator, which makes it easy to maintain as a single system feeding multiple applications.
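One reason a single Kafka setup can feed several applications is that consumers do not remove records when they read them. Instead, each consumer group keeps its own offset into the same append-only log. The toy in-memory model below illustrates that idea. It is a conceptual sketch only and does not use the Kafka client API. The class and group names (`Log`, `"analytics"`, `"archive"`) are hypothetical.

```python
from collections import defaultdict


class Log:
    """Toy append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self.records = []
        self.offsets = defaultdict(int)  # group id -> next offset to read

    def append(self, record):
        """Append a record and return the offset it was written at."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, group, max_records=10):
        """Return up to max_records unread records for this group."""
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # advance ("commit") position
        return batch  # records stay in the log for other groups


if __name__ == "__main__":
    log = Log()
    for event in ["click", "view", "purchase"]:
        log.append(event)
    print(log.poll("analytics"))              # ['click', 'view', 'purchase']
    print(log.poll("archive", max_records=2))  # ['click', 'view']
    print(log.poll("archive"))                # ['purchase']
```

Because reading only moves a per-group pointer, a messaging consumer, a storage/archival job and a log-aggregation pipeline can all read the same data at their own pace. In real Kafka, records are removed according to a retention policy rather than when they are consumed.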

Apache Kafka does take some initial setup and deployment time, especially if you haven't bought support from Confluent.
It is not a complete solution, so for an analytics use case you will still need something like Tibco.
It does not have a SQL-based query engine out of the box, so building and using analytics on top of it can be a lot of work. It would be great to have something like that built into Kafka.

Return on Investment

Positive impact on ROI, since we can now use one large Apache Kafka deployment for multiple scenarios (storage system, log aggregation, messaging queue).
It is open source, so there are no license or subscription fees, which reduces the cost of deployment.
Data can now be ingested and analyzed in real time, making it easy to fine-tune the customer experience and internal IT decision making.

Alternatives Considered

Confluent Cloud, Amazon Kinesis, Google Cloud Pub/Sub, IBM MQ and RabbitMQ

Confluent Cloud is still based on Apache Kafka, but it has a subscription fee, so from a long-term perspective it is wiser to deploy your own Kafka instance that spans public and private cloud. Amazon Kinesis and Google Cloud Pub/Sub do not do well with a very large number of messages and do not provide the same ordering guarantees as Apache Kafka or Confluent. Apache Kafka does better in scaling and availability than IBM MQ and RabbitMQ.

Support Rating

Support for Apache Kafka (if you are willing to pay) is available from Confluent, which includes the same team that created Kafka at LinkedIn, so they know this software inside and out. Moreover, Apache Kafka is well known, and best-practice documents and deployment scenarios are easily available for download, for example from eBay, LinkedIn, Uber, and the NYTimes.

Key Insights

Do you think Apache Kafka delivers good value for the price?

Yes

Are you happy with Apache Kafka's feature set?

Yes

Did Apache Kafka live up to sales and marketing promises?

Yes

Did implementation of Apache Kafka go as expected?

Yes

Would you buy Apache Kafka again?

Yes

Other Software Used

Amazon Elastic Compute Cloud (EC2), Google Cloud Pub/Sub, Apache Flume, Vertica

Likelihood to Recommend

Apache Kafka is very well suited to deployments that receive a very large number of small messages at extremely high rates (4 million-plus messages a second). It is also very well suited when you need stronger ordering guarantees than a traditional messaging system can provide. It is less suited when you don't need such high message ingestion rates and need to do everything in a public cloud. Apache Kafka would be overkill for such small or simple deployments.
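Kafka's ordering guarantee applies within a single partition. With the default partitioner, records that share a key go to the same partition, as long as the partition count does not change. As a result, events for one key are read back in the order they were sent. The sketch below models only that idea. It assumes a hypothetical 4-partition topic and uses `zlib.crc32` as a stand-in hash. Kafka's Java client actually applies murmur2 to the serialized key, so the partition numbers here will not match a real cluster.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical topic with 4 partitions


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: Kafka's Java client uses murmur2 on the serialized key;
    # crc32 is used here only to keep the sketch dependency-free.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records):
    """Append (key, value) records to in-memory partition logs, in send order."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key)].append((key, value))
    return partitions


def per_key_order(partitions, key):
    """Values for one key, read back from the single partition it maps to."""
    log = partitions[choose_partition(key)]
    return [v for k, v in log if k == key]


if __name__ == "__main__":
    sent = [("user-1", "login"), ("user-2", "login"), ("user-1", "add_to_cart"),
            ("user-2", "logout"), ("user-1", "checkout")]
    logs = produce(sent)
    print(per_key_order(logs, "user-1"))  # ['login', 'add_to_cart', 'checkout']
```

This gives per-key ordering but not a total order across the whole topic. Records for different keys may land in different partitions, and consumers can read them interleaved in any order. Adding partitions also changes the key-to-partition mapping.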

Apache Kafka for large-scale message ingestion

Overall Satisfaction with Apache Kafka