Overall Satisfaction with Google BigQuery
We use BigQuery (BQ) as our data warehouse: we store our data in it, run our aggregations there, and build pipelines that push data into BQ and pull aggregates back out to load into ElasticSearch. We use it across our whole organization, and most of our data pipelines now use BQ natively. BQ helped us scale beyond Postgres for very large data sets in a convenient and, most importantly, inexpensive way.
- BigQuery integrates exceptionally well with Google Cloud Storage. All you have to do is push a CSV to Cloud Storage and add it to BQ. BQ will try to detect the schema and import the CSV as a table. The process is very quick.
- There are lots of ways to interact with BQ. Besides the web interface, there are SDKs you can use to work with BigQuery from your own tools, so your data isn't just stuck in the cloud.
- BigQuery lets you search extremely large datasets quickly. We have many 100m+ datasets loaded, and searching any number of fields through them is not only easy (SQL!) but fast as well (most queries finish in under 30 seconds). It's not a real-time system, but for OLAP, it's unbeatable.
- It would be awesome to have BQ be real-time. Right now it serves the OLAP use case very well, but interactive would be great too.
- The user interface is not the best we've used.
- We'd love to have the Standard SQL mode be on by default.
- We were able to reduce our investment in self-hosted Postgres and move our bigger data assets to BigQuery. Overall, we saved hundreds per month, thousands per year.
- We are able to search through very large datasets with BQ that were difficult to search with standard Postgres, even on very large servers. This gave us the ability to do ad-hoc data introspection easily.
- Because we already used Google Cloud Storage, it was easy to integrate BQ into our environment.
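For readers who want to see what the CSV import and SQL search mentioned above look like from one of the SDKs, here is a minimal sketch using the `google-cloud-bigquery` Python client. It is not our production pipeline: the bucket, table and column names are hypothetical, and it assumes the package is installed and credentials are already configured.

```python
def top_values_sql(table_id: str, column: str, limit: int = 10) -> str:
    """Build a Standard SQL query counting the most frequent values of a column.

    Identifiers cannot be passed as query parameters, so only call this
    with trusted table and column names.
    """
    return (
        f"SELECT {column}, COUNT(*) AS n "
        f"FROM `{table_id}` "
        f"GROUP BY {column} ORDER BY n DESC LIMIT {limit}"
    )


def load_csv_from_gcs(client, uri: str, table_id: str) -> int:
    """Load a CSV from Cloud Storage into a table, letting BQ detect the schema."""
    # Imported here so the helper above works without the SDK installed.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes the CSV has a header row
        autodetect=True,      # ask BQ to infer column names and types
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job finishes
    return client.get_table(table_id).num_rows


def run_query(client, sql: str) -> list:
    """Run a query and return its rows as plain tuples."""
    return [tuple(row.values()) for row in client.query(sql).result()]


# Example usage (hypothetical names; needs credentials and network access):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   load_csv_from_gcs(client, "gs://my-bucket/events.csv", "my_project.my_dataset.events")
#   print(run_query(client, top_values_sql("my_project.my_dataset.events", "country")))
```

The Python client runs queries as Standard SQL unless told otherwise, so no extra flag is needed here.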
We liked BQ because its cost depends only on how much data you store (and there are tiers of data access) and how much you search. For us, it is significantly less expensive to run BQ than an equivalent hosted RDBMS. Because most of our data pipelines are automated and we only need to run ad-hoc queries irregularly, BQ fit our criteria very well.
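To make the "pay for what you store and what you search" model concrete, here is a rough back-of-the-envelope sketch. The rates are illustrative placeholders, not quoted prices, and the model ignores the storage access tiers and any free allowances, so check the current pricing page before relying on the numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; replace with the figures that apply to you."""
    storage_per_gb_month: float
    query_per_tb_scanned: float


def monthly_cost(stored_gb: float, scanned_tb: float, rates: Rates) -> float:
    """Estimate a monthly bill as a storage charge plus a query (scan) charge."""
    storage = stored_gb * rates.storage_per_gb_month
    queries = scanned_tb * rates.query_per_tb_scanned
    return storage + queries


# Placeholder rates, for illustration only.
example_rates = Rates(storage_per_gb_month=0.02, query_per_tb_scanned=5.0)

# 500 GB stored, 2 TB scanned by ad-hoc queries in the month.
print(f"{monthly_cost(500, 2, example_rates):.2f}")
```

With a model like this, irregular ad-hoc querying keeps the second term small, which is why the pricing suited our usage.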