Likelihood to Recommend
Well suited: Most local runs on datasets and non-production systems, where scalability is not a problem at all. Including data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks.
Less appropriate: We had to work on a RecSys where the music dataset we used was over 300 GB in size, and we faced memory-based issues. A few times we also got memory errors. MLlib also lacks support for advanced analytics and deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners.
Google BigQuery really shines in scenarios requiring real-time analytics on large data streams and predictive analytics with its machine learning integration. Teams have been using it extensively. However, it may not be the best fit for organizations dealing with small datasets because of the higher costs. It also might not be the best fit for highly complex data transformations, where simpler or more specialized solutions could be more appropriate.
Pros
Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
Faster execution times compared to Hadoop and Pig Latin
Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
Interoperability between SQL and Scala/Python styles of munging data

Its serverless architecture and underlying Dremel technology are incredibly fast even on complex datasets. I can get answers to my questions almost instantly, without waiting hours for traditional data warehouses to churn through the data.
Previously, our data was scattered across various databases and spreadsheets, and getting a holistic view was pretty difficult. Google BigQuery acts as a central repository and consolidates everything in one place, so we can join data sets and find hidden patterns.
Running reports on our old systems used to take forever. Google BigQuery's very fast query speed lets us get insights from massive datasets in seconds.

Cons
Memory management is very weak.
PySpark is not as robust as Scala with Spark.
Spark master HA is needed; it is not as highly available as it should be.
Locality should not be a necessity, though it does help performance. I would prefer no locality requirement.

It is challenging to predict costs due to BigQuery's pay-per-query pricing model. User-friendly cost estimation tools, along with improved budget alerting features, could help users better manage and predict expenses.
The BigQuery interface is less intuitive. A more user-friendly interface, enhanced documentation, and built-in tutorial systems could make BigQuery more accessible to a broader audience.

Likelihood to Renew
Capacity for computing data in a cluster, and fast speed.
Steven Li Senior Software Developer (Consultant)
We have to use this product because it is our third-party supplier's choice for their data backend, so we are unlikely to move away from it in the future unless the supplier decides to change data vendors.
Usability
The only thing I dislike about Spark's usability is the learning curve, since there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
The web UI is easy and convenient. Many RDBMS clients, such as Aqua Data Studio, DBeaver data grid, and others, can connect. A range of well-documented APIs is available. The range of features keeps expanding, adding features similar to those of traditional RDBMSs such as Oracle and DB2.
Support Rating
1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is much faster than competing technologies.
4. The support from the Apache community for Spark is huge.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. Code for Apache Spark is simple and easy to gain access to.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
BigQuery can be difficult to support because it is so solid as a product. Many of the issues you will see are related to your own data sets; however, you may also see issues importing data and managing jobs. If this occurs, it can be a challenge to reach the correct person who can help you.
Alternatives Considered
Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
I have used Snowflake and DataGrip for data retrieval as well as Google BigQuery, and can say that all these tools compete head to head. It is very difficult to say which is better than the others, but some features provided by Google BigQuery give it an edge. For example, the reliability of Google is unmatched by the others. One thing that I really like is the ability to integrate Data Studio so easily with Google BigQuery.
Contract Terms and Pricing Model
None so far. Very satisfied with the transparency on contract terms and pricing model.
Professional Services
Google Support has kindly provided individual support and consultants to assist with the integration work. When the consultants are not available to support the work, the Google Support helpline is always available to answer queries, without our having to wait more than 3 days.
Return on Investment
Business leaders are able to make data-driven decisions
Business users are able to access data in near real time now. Before using Spark, they had to wait at least 24 hours for data to be available
The business is able to come up with new product ideas

Pricing has been very reasonable for us. The first 10 GB of storage is free each month, and costs start at 2 cents per GB per month after that. For example, if you store 1 terabyte (TB) for a month, then the cost would be $20. Streaming data inserts start at 1 cent per 200 megabytes (MB). The first 1 TB of queries is free, with additional analysis at $5 per TB thereafter. Metadata operations are free.
BigQuery helps lower the bar for data analytics, ML, and AI. BQ takes care of mundane tasks and streamlines data processing and consumption. The most impressive thing is the ML and AI integration as SQL functions, so the need for moving data around is minimized. The visuals of ML models are very helpful for fine-tuning training, model building, prediction, etc.
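
The storage, streaming, and query rates quoted in the pricing review above can be turned into a rough monthly estimate. The sketch below is illustrative only: the function names are hypothetical, the rates are the ones stated by the reviewer and may not match current BigQuery pricing, and 1 TB is assumed to equal 1,000 GB.

```python
from decimal import Decimal

# Rates as quoted by the reviewer (assumptions, not official pricing).
FREE_STORAGE_GB = Decimal("10")            # first 10 GB of storage free each month
STORAGE_USD_PER_GB = Decimal("0.02")       # 2 cents per GB per month after that
FREE_QUERY_TB = Decimal("1")               # first 1 TB of queries free
QUERY_USD_PER_TB = Decimal("5")            # $5 per TB thereafter
STREAMING_USD_PER_200_MB = Decimal("0.01") # 1 cent per 200 MB streamed


def storage_cost(gb):
    """Monthly storage cost in USD for `gb` gigabytes stored."""
    billable = max(Decimal(str(gb)) - FREE_STORAGE_GB, Decimal("0"))
    return billable * STORAGE_USD_PER_GB


def query_cost(tb):
    """Monthly query cost in USD for `tb` terabytes scanned."""
    billable = max(Decimal(str(tb)) - FREE_QUERY_TB, Decimal("0"))
    return billable * QUERY_USD_PER_TB


def streaming_cost(mb):
    """Cost in USD for `mb` megabytes of streaming inserts."""
    return Decimal(str(mb)) / Decimal("200") * STREAMING_USD_PER_200_MB


if __name__ == "__main__":
    # 1 TB stored (taken as 1,000 GB), 3 TB queried, 1,000 MB streamed.
    total = storage_cost(1000) + query_cost(3) + streaming_cost(1000)
    print(f"storage:   ${storage_cost(1000):.2f}")
    print(f"queries:   ${query_cost(3):.2f}")
    print(f"streaming: ${streaming_cost(1000):.2f}")
    print(f"total:     ${total:.2f}")
```

Under these assumptions, 1 TB of storage comes to $19.80 once the free 10 GB is subtracted, which is consistent with the reviewer's rounded figure of $20.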