Google Cloud Pub/Sub, the jewel of streaming data
April 04, 2023

Cézar Augusto Nascimento e Silva | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Overall Satisfaction with Google Cloud Pub/Sub

We used Google Cloud Pub/Sub to solve ETL/streaming and real-time processing problems for high volumes of data. We used it to fill data lakes, to process and store data in warehouses or data marts, and to process events encoded as either JSON or protobuf.

It was integrated with many languages, such as Python, Java, Go, and Kotlin. We configured a Kubernetes autoscaling system based on some Google Cloud Pub/Sub metrics, which worked very well. The main metrics we watched for alerts, and as overall health indicators of our systems, were the size of each queue and the age of the oldest message in each queue, indicating a high-volume jam or a specific error stuck on a single message, respectively.
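For illustration, here is a minimal sketch of how those two health metrics can be read from Cloud Monitoring with the Python client; "my-project" and "my-subscription" are placeholders, and the actual autoscaling was driven by Kubernetes rather than by a script like this.

```python
# Minimal sketch: reading the two Pub/Sub health metrics described above
# (queue size and oldest unacked message age) from Cloud Monitoring.
# "my-project" and "my-subscription" are placeholders.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    start_time={"seconds": now - 300},  # look back 5 minutes
    end_time={"seconds": now},
)

for metric in (
    "pubsub.googleapis.com/subscription/num_undelivered_messages",
    "pubsub.googleapis.com/subscription/oldest_unacked_message_age",
):
    results = client.list_time_series(
        request={
            "name": "projects/my-project",
            "filter": (
                f'metric.type = "{metric}" AND '
                'resource.labels.subscription_id = "my-subscription"'
            ),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for series in results:
        # Points come back newest first; both metrics are INT64 gauges.
        print(metric, series.points[0].value.int64_value)
```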

We had to handle idempotency, since duplicated message delivery is a possibility; this was usually paired with a Redis cache to guarantee idempotency within a reasonable time window.
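A minimal sketch of that dedup pattern, assuming a reachable Redis instance; the host, project, subscription, and 24-hour window are placeholders, and `process` stands in for the real business logic.

```python
# Minimal sketch of the Redis-backed idempotency check described above.
# Host, project, subscription, and the 24h window are placeholders.
import redis
from google.cloud import pubsub_v1

r = redis.Redis(host="localhost", port=6379)
DEDUP_TTL_SECONDS = 24 * 60 * 60  # the "reasonable time window"

def process(data: bytes) -> None:
    ...  # stand-in for the real business logic

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # SET with nx=True succeeds only the first time this message_id is
    # seen, so redeliveries within the TTL are acked without reprocessing.
    first_delivery = r.set(
        f"pubsub:seen:{message.message_id}", 1, nx=True, ex=DEDUP_TTL_SECONDS
    )
    if first_delivery:
        process(message.data)
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "my-subscription")
with subscriber:
    subscriber.subscribe(subscription, callback=callback).result()
```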
  • Data Streaming
  • Event Sourcing
  • Protobuf message format
  • Scalability
  • Easy to Use
  • Observability
  • Integrated Dead Letter Queue (DLQ) functionality
  • Deliver Once (idempotency) - currently in preview
  • Vendor locked to Google
  • DLQ (Dead Letter Queues)
  • Scalability
  • Delivery backoff for failed messages
  • Scalable System
  • Better Alerts (observability)
  • Auto Scaling
Kafka looks like an ordered queue: there is no delivery backoff, so if a message has a problem, the consumer doesn't advance to the next one. Google Cloud Pub/Sub looks more like a SET of messages, while Kafka looks like a LIST. In Kafka, the same message will repeat instantly while it is being NACKed; Google Cloud Pub/Sub, on the other hand, will simply deliver another message and apply a backoff to the previously NACKed one, never getting stuck on a single message forever. Dead Letter Queues are innate to Google Cloud Pub/Sub, while in Kafka they are not a feature at all. One can configure the maximum number of NACKs before a message is sent to the DLQ, which is very powerful when combined with exponential backoff, as in the sketch below.

Google Cloud Pub/Sub lets users scale consumers at will with no constraint. Kafka has a rather convoluted concept of partitions and consumer groups that requires one to plan ahead how many consumers to plug into the queue, because these constraints prevent free scaling of consumers. If you plan for N consumers, you probably have to stick to that N forever or recreate your Kafka topic. Confluent Cloud somewhat solves this, but it looks like a solution to a problem that shouldn't exist in the first place.
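As a concrete illustration, here is a minimal sketch of that DLQ-plus-backoff combination using the Python client; the project, topic, and subscription names are placeholders, and both topics are assumed to already exist.

```python
# Sketch of creating a subscription with a DLQ and exponential backoff.
# "my-project", "my-topic", and "my-topic-dlq" are placeholders; both
# topics are assumed to already exist.
from google.cloud import pubsub_v1
from google.protobuf import duration_pb2

subscriber = pubsub_v1.SubscriberClient()
project = "my-project"

subscriber.create_subscription(
    request={
        "name": f"projects/{project}/subscriptions/my-subscription",
        "topic": f"projects/{project}/topics/my-topic",
        # After 5 failed deliveries the message moves to the DLQ topic
        # instead of being retried forever.
        "dead_letter_policy": {
            "dead_letter_topic": f"projects/{project}/topics/my-topic-dlq",
            "max_delivery_attempts": 5,
        },
        # NACKed messages are redelivered with exponential backoff in this
        # range, so one bad message never blocks the rest of the queue.
        "retry_policy": {
            "minimum_backoff": duration_pb2.Duration(seconds=10),
            "maximum_backoff": duration_pb2.Duration(seconds=600),
        },
    }
)
```

One operational note: the project's Pub/Sub service account also needs permission to publish to the dead-letter topic, otherwise messages won't be forwarded to it.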

Do you think Google Cloud Pub/Sub delivers good value for the price?

Yes

Are you happy with Google Cloud Pub/Sub's feature set?

Yes

Did Google Cloud Pub/Sub live up to sales and marketing promises?

Yes

Did implementation of Google Cloud Pub/Sub go as expected?

Yes

Would you buy Google Cloud Pub/Sub again?

Yes

If you want to stream high volumes of data, be it for ETL streaming or event sourcing, Google Cloud Pub/Sub is your go-to tool. It's easy to learn, its metrics are easy to observe, and it scales with ease and without additional configuration: if you have more producers or consumers, all you need to do is deploy your solutions on k8s and autoscale your pods to match the data volume. The DLQ is also very transparent and easy to configure. Your code will have no logic whatsoever for orchestrating Pub/Sub; you just plug and play, as the sketch below shows.
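A minimal "plug and play" publisher sketch; "my-project" and "my-topic" are placeholders.

```python
# Minimal "plug and play" publisher sketch; "my-project" and "my-topic"
# are placeholders. No orchestration logic is needed around the client.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")

# publish() batches and retries behind the scenes; the returned future
# resolves to the server-assigned message ID.
future = publisher.publish(topic_path, b'{"event": "signup"}', source="web")
print(future.result())
```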

However, if you are not in the Google Cloud environment, you might have trouble using it, or most likely be unable to, since it is a Google Cloud product.

Using Google Cloud Pub/Sub

100 - Data and AI engineers
10 - SRE (Site Reliability Engineer)
  • ETL data streaming
  • Event Sourcing
It serves all of our purposes in the most transparent way I can imagine. After seeing other message queueing providers, I can only attest to its quality.

Configuring Google Cloud Pub/Sub

It's just right, everything is configurable and very easy to understand.
Always set a good exponential backoff to avoid processing the same failed message over and over. Consider using performant data formats like protobuf instead of plain JSON when dealing with predictable data structures.
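To make the protobuf suggestion concrete, here is a small sketch comparing payload sizes; `event_pb2` is a hypothetical module generated by `protoc` from your own schema.

```python
# Sketch comparing payload sizes for the protobuf-over-JSON advice.
# `event_pb2` is a hypothetical module generated by protoc from a schema
# like: message Event { string user_id = 1; string action = 2; }
import json

import event_pb2  # hypothetical generated code

event = event_pb2.Event(user_id="42", action="signup")
proto_bytes = event.SerializeToString()  # compact binary encoding
json_bytes = json.dumps({"user_id": "42", "action": "signup"}).encode()

print(len(proto_bytes), len(json_bytes))  # protobuf is typically much smaller
```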
No - we have not done any customization to the interface
No - the product does not support adding custom code

Using Google Cloud Pub/Sub

It has many libraries in many languages, and Google provides both good guides and auto-generated client libraries that are easy to understand. It has very good observability too.
Pros:
  • Like to use
  • Relatively simple
  • Easy to use
  • Technical support not required
  • Well integrated
  • Consistent
  • Quick to learn
  • Convenient
  • Feel confident using
  • Familiar
Cons:
  • None
  • Dead Letter Queue (DLQ)
  • Exponential Backoff
  • Priority Queueing
  • Deliver Once

Google Cloud Pub/Sub Reliability

You can just plug in consumers at will and it will respond; there's no need for further configuration or for introducing new concepts. You have a queue; if it's slow, you plug in more consumers to process more messages: simple as that.
I have never faced a single problem in 4 years.
It's very fast, and it can be even faster if you use protobuf.