Evaluation of BigQuery from a Hadoop viewpoint
November 20, 2015

Csaba Toth | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Overall Satisfaction with Google BigQuery

I evaluated Google BigQuery and presented an introduction to it at the Fresno Google Developer Group technology meetup and at the Google DevFest West conference.
I tried several publicly available datasets, ran several sample queries, and studied BigQuery-specific instructions. I also took a look at Google Genomics and its public datasets.
  • The web console provides an extremely simple interface for testing and trying things out.
  • The REST API makes it possible to integrate with software solutions.
  • The web interface provides useful features like query history, named/saved queries, and exporting results.
  • If you accidentally request a humongous result set (for example, you forget a LIMIT), you cannot really stop the running query, and it will probably be billed.
  • If BigQuery fits your business needs (see best scenarios), it can yield great ROI.
  • You may be able to answer questions with ease that would take much more effort with other big data query technologies (Hive, Spark, ...). You may pay some costs for that.
  • Following some best practices (how to construct and limit your queries) can decrease your costs.
Spinning up, provisioning, maintaining, and debugging a Hadoop solution can be non-trivial and painful. That applies to both GCE-based and HDInsight clusters. It requires expertise (which means hiring employees and extra costs). With BigQuery, someone with good SQL knowledge (and maybe a little programming) can start testing and developing right away. All of the infrastructure and platform services are taken care of. Google BigQuery is orders of magnitude simpler to use than Hadoop, but you have to evaluate the costs. BigQuery billing depends on your data size and on how much data your queries touch.
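To make the billing point concrete, here is a minimal back-of-the-envelope sketch of how the cost of a query grows with the amount of data it touches. The rate `ASSUMED_USD_PER_TB` and the function name `estimate_query_cost` are assumptions made up for illustration, not part of any BigQuery API; check the current price list before relying on the numbers.

```python
# Assumed on-demand rate in USD per terabyte scanned (illustrative only;
# look up the actual price before using this for budgeting).
ASSUMED_USD_PER_TB = 5.0
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned, usd_per_tb=ASSUMED_USD_PER_TB):
    """Estimate the cost in USD of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


if __name__ == "__main__":
    # Compare a few query sizes, expressed in gigabytes scanned.
    for gigabytes in (1, 100, 10_000):
        cost = estimate_query_cost(gigabytes * 1024 ** 3)
        print(f"{gigabytes} GB scanned: about ${cost:.4f}")
```

The takeaway is only that cost scales linearly with bytes scanned under this simple model, so a query over the whole dataset costs proportionally more than one over a small slice.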
It can be an extremely good fit if:
1. You have data in Google Cloud Storage
2. You don't want to deal with the hassle of spinning up a Hadoop cluster, or you have an especially large dataset and you don't want to deal with scale-out logic.
Also, costs might be high.
It's not a good fit if you have a specific algorithm that cannot be expressed in the BigQuery SQL flavor.
It may be unnecessary if near-real-time results are not an important factor and it doesn't matter whether a query returns in 2-3 seconds or in 20-30. If you already have some Hadoop infrastructure with Hive or Spark, your existing solution might be cheaper.
There are best practices that can decrease your costs a lot (e.g., how many columns your query involves and how well you filter your data in the query).
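As a rough illustration of the column-count practice, the sketch below models a column-oriented table where a query only reads the columns it references. The table `EXAMPLE_TABLE`, its per-column sizes, and the helper `bytes_scanned` are all hypothetical, and the model deliberately ignores filtering and any other factor that affects real billing.

```python
def bytes_scanned(column_sizes, referenced_columns):
    """Sum the stored size of every distinct column a query references."""
    wanted = set(referenced_columns)
    missing = wanted - set(column_sizes)
    if missing:
        raise KeyError(f"unknown columns: {sorted(missing)}")
    return sum(column_sizes[name] for name in wanted)


# Hypothetical table: stored bytes per column.
EXAMPLE_TABLE = {
    "id": 8 * 10**9,
    "title": 40 * 10**9,
    "body": 900 * 10**9,
    "created": 8 * 10**9,
}

# A query that names two small columns versus one that selects everything.
narrow = bytes_scanned(EXAMPLE_TABLE, ["id", "created"])
wide = bytes_scanned(EXAMPLE_TABLE, EXAMPLE_TABLE.keys())

if __name__ == "__main__":
    print(f"two columns: {narrow / 10**9:.0f} GB")
    print(f"all columns: {wide / 10**9:.0f} GB")
```

In this toy model, naming only the columns you need reads a small fraction of what selecting every column would, which is the intuition behind avoiding broad selects on wide tables.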

Google BigQuery Feature Ratings

Database scalability
Database security provisions
Monitoring and metrics