Overall Satisfaction with Google BigQuery
Google BigQuery has become the de facto analytics warehouse for our organization. It has allowed us to scale effectively to massive datasets once our internal, physical database could no longer handle these workloads. BigQuery is used by numerous areas within our organization, including my team (Solution Architecture), our internal ETL team, and our Advanced Analytics team. BigQuery truly democratizes data access and processing power for anyone who understands SQL, and it has let our internal teams run ad hoc analyses on very large datasets far more efficiently.
- BigQuery is a highly optimized, column-oriented database, and as such it excels at complex aggregations over massive datasets, e.g. computing n-tiles, statistics, sorting, etc.
- BigQuery is seamlessly integrated with the rest of the Google Cloud Platform stack, so it is extremely easy to move data in and out of BigQuery for analysis and storage. It also exposes well-defined APIs for loading and streaming data in, so it can easily be used with other on-premises or cloud solutions.
- Because BigQuery is fully managed, there is no need to think about provisioning machines, tuning memory/cores, 'vacuuming', etc. This amplifies BigQuery's 'democratization' effect, since a basic knowledge of SQL is all you need to get started.
- BigQuery does impose quite a few limits on the most demanding queries, although they are entirely understandable. For example, very large 'GROUP BY' clauses can sometimes fail with a "Resources Exceeded" error: BigQuery's distributed execution can end up funneling all of that data onto a single machine, and when that machine runs out of memory the query fails with that error. You can raise your Billing Tier to get these queries to complete, though.
- Getting data out of BigQuery is also subject to quite a few limits. For example, to retrieve a large result set you are essentially forced to write the results to a table, export that table to Google Cloud Storage, and download it from there. On top of that, if the table is large, the export splits it into many smaller files that you then have to reassemble.
- Google BigQuery has had an enormous ROI impact on our business, as it has reduced our dependence on the physical servers we pay for monthly from another hosting provider. We have been able to run multiple enterprise-scale data processing applications with almost no investment.
- Because our business is highly client-focused, it matters that Google Cloud Platform, and BigQuery specifically, lets us attribute our usage to individual projects, clients, and teams at a very granular level.
- Plain and simple, I believe the meager investments we have made in Google BigQuery have paid for themselves hundreds of times over.
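The reassembly step mentioned in the export drawback is easy to script. What follows is a minimal Python sketch under stated assumptions; it is not something BigQuery provides. It assumes the exported shards have already been downloaded from Cloud Storage into a local folder, that they were exported as CSV with the same header row at the top of each file, and that their names carry zero-padded numeric suffixes (as a wildcard destination such as `results-*.csv` would produce), so a plain string sort restores their order. The function name `merge_export_shards` and the file names are hypothetical.

```python
import csv
import glob


def merge_export_shards(pattern: str, output_path: str) -> int:
    """Concatenate downloaded CSV export shards into a single CSV file.

    Hypothetical helper. Assumes every shard starts with the same header
    row; the header is written once and dropped from later shards.
    Returns the number of data rows written (header excluded).
    """
    # Zero-padded numeric suffixes sort correctly as plain strings.
    shard_paths = sorted(glob.glob(pattern))
    if not shard_paths:
        raise FileNotFoundError(f"no files match {pattern!r}")

    header = None
    rows_written = 0
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in shard_paths:
            with open(path, newline="") as shard:
                reader = csv.reader(shard)
                shard_header = next(reader, None)
                if shard_header is None:
                    continue  # empty shard: nothing to copy
                if header is None:
                    # First non-empty shard: keep its header.
                    header = shard_header
                    writer.writerow(header)
                elif shard_header != header:
                    raise ValueError(f"header mismatch in {path}")
                # The csv module handles quoted commas and embedded newlines.
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `merge_export_shards("exports/results-*.csv", "merged.csv")` would stitch the shards back together. Write the merged file to a path that does not match the shard pattern; otherwise a rerun would pick it up as another shard.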
BigQuery is extremely well suited to serving as a general-purpose analytics data warehouse: if you have large datasets you want to extract insights from and are comfortable with SQL, BigQuery should be the only place that data lives. BigQuery is also extremely well suited to driving enterprise-level dashboards directly on your actual data, so the summarized figures deviate less from the raw data. BigQuery is less well suited to cases where you need to return very large result sets, since it is optimized for aggregations.