Evaluation of BigQuery from a Hadoop viewpoint
Overall Satisfaction with Google BigQuery
I evaluated Google BigQuery and presented an introduction to it at the Fresno Google Developer Group technology meetup and at the Google DevFest West conference.
I tried several publicly available datasets, worked through several sample queries, and studied the BigQuery-specific instructions. I also took a look at Google Genomics and its public datasets.
Pros
- The web console provides an extremely simple interface for trying things out.
- The REST API makes it possible to integrate BigQuery with other software solutions.
- The web interface provides useful features such as query history, named/saved queries, and result export.
Cons
- If a query accidentally returns a humongous result set (for example, you forget to LIMIT), you cannot really stop it once it is running, and you will probably be billed for it.
- If BigQuery fits your business needs (see best scenarios), it can yield great ROI.
- You may be able to answer questions with ease that would take much more effort with other big data query technologies (Hive, Spark, ...). You may incur some costs, though.
- Following some best practices (how to construct and limit your queries) can decrease your costs.
Spinning up, provisioning, maintaining, and debugging a Hadoop solution can be non-trivial and painful. That applies to both GCE-based and HDInsight clusters. It requires expertise, which means hiring and additional cost. With BigQuery, someone with good SQL knowledge (and maybe a little programming) can start testing and developing right away, because all of the infrastructure and platform services are taken care of. Google BigQuery is orders of magnitude simpler to use than Hadoop, but you have to evaluate the costs. BigQuery billing depends on your data size and on how much data your queries touch.
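To make the "how much data your query touches" part concrete, here is a minimal sketch of a cost estimate. It is not an official pricing calculator: the function name `estimate_query_cost` is hypothetical, and the price per TiB is a placeholder you would replace with the rate from your own billing terms.

```python
from decimal import Decimal

BYTES_PER_TIB = 2 ** 40  # 1 TiB expressed in bytes


def estimate_query_cost(bytes_scanned: int, price_per_tib: Decimal) -> Decimal:
    """Estimate what a query costs from the number of bytes it scans.

    Both arguments are supplied by the caller; no real price is built in.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    cost = Decimal(bytes_scanned) / BYTES_PER_TIB * price_per_tib
    # Round to whole cents for display.
    return cost.quantize(Decimal("0.01"))


# Assumed placeholder rate, NOT an actual BigQuery price.
ASSUMED_PRICE_PER_TIB = Decimal("5.00")

# Scanning a whole 2 TiB table versus only the columns you need (0.5 TiB).
full_scan = estimate_query_cost(2 * BYTES_PER_TIB, ASSUMED_PRICE_PER_TIB)
narrow_scan = estimate_query_cost(BYTES_PER_TIB // 2, ASSUMED_PRICE_PER_TIB)
print(full_scan, narrow_scan)
```

Under these assumed numbers, the narrower query costs a quarter of the full scan, which is why it pays to check how much data a query will read before running it. BigQuery can report the bytes a query would process through a dry run, and that figure could be fed into a function like this one.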

