It takes me less time to write a Pig script than to get a Spark program running for batch ETL workloads. Compared to Spark, however, Pig has a steeper learning curve because it uses a proprietary language. In a single script and a single file, it can handle both MapReduce and Hadoop. …
Apache Pig is best suited for ETL-based data processes. It performs well when handling and analyzing large amounts of data, and in my experience it returns results faster than similar tools. It is easy to implement, and any user with some initial training or prior SQL knowledge can work with it. Apache Pig also has a large global community.
One of the most important aspects of working with data warehousing and analytics solutions is the ability to handle large datasets. Google BigQuery is the best in the business in that respect; it is ridiculously fast when handling large datasets. It also integrates well with data visualization tools such as Data Studio. It is fast, easy to use, and very reliable. The only drawback I see is that you pay more for inefficient scripts, which can hamper a company's growth a bit.
One issue with Google Cloud Storage is its price. Using premium Google Cloud Storage for massive amounts of data requires a sizable budget. Otherwise, Google Cloud Storage is a safe and perfect online storage platform.
The only annoyance I can think of is that sometimes, when I tried to share files in the cloud with coworkers, they would not share at all, or there was a long delay between when I shared them and when my coworkers received them. Other than that, everything about it is perfect.
The web UI is easy and convenient to use. Many RDBMS clients, such as Aqua Data Studio, DBeaver, DataGrip, and others, can connect to it. A range of well-documented APIs is available. The feature set keeps expanding, adding features similar to those of traditional RDBMSs such as Oracle and DB2.
It's Google: they're big and well organized, the documentation is abundant, and the scalability is amazing. The UX is good too, considering it's a professional tool meant to be used by people with a specific technical background. Overall, it makes me feel good and secure that we know where to store the data, how to use that data, and that the data is handled with the utmost security and performance practices.
Apache Pig might help you get started faster, and it was one of the best tools a few years ago, but it lacks important features that the data engineering world needs today. Pig also has a steeper learning curve because it uses a proprietary language, unlike Spark, which can be programmed in Python or Java.
Spinning up, provisioning, maintaining, and debugging a Hadoop solution can be non-trivial and painful. This applies to both GCE-based and HDInsight clusters. It requires expertise, which means hiring staff and added costs. With BigQuery, anyone with good SQL knowledge (and perhaps a little programming) can start testing and developing right away. All of the infrastructure and platform services are taken care of. Google BigQuery is orders of magnitude simpler to use than Hadoop, but you have to evaluate the costs: BigQuery billing depends on how much data you store and how much data your queries touch.
Google Support has kindly provided individual support and consultants to assist with the integration work. When the consultants are not available to help, the Google Support helpline is always there to answer queries, with no wait of more than 3 days.
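To make the "how much data your query touches" point concrete, here is a minimal sketch of how on-demand query cost scales with bytes scanned. It assumes a flat per-TiB rate; `PRICE_PER_TIB_USD` and the table sizes are hypothetical placeholders, not official figures, and the sketch ignores any minimum-billing, rounding, or free-tier rules, so check the current BigQuery pricing page before relying on real numbers.

```python
# Minimal sketch: on-demand query cost as a linear function of bytes scanned.
# ASSUMPTION: a flat per-TiB rate; minimums, rounding, and free tiers are ignored.

TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """Return the estimated cost of a query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / TIB * price_per_tib


if __name__ == "__main__":
    PRICE_PER_TIB_USD = 5.0  # hypothetical placeholder rate, not an official price
    full_scan = 2 * TIB      # hypothetical: SELECT * over a 2 TiB table
    narrow_scan = TIB // 8   # hypothetical: reading only the columns actually needed
    print(estimate_query_cost(full_scan, PRICE_PER_TIB_USD))    # 10.0
    print(estimate_query_cost(narrow_scan, PRICE_PER_TIB_USD))  # 0.625
```

Because BigQuery stores data by column, selecting only the columns you need reduces the bytes scanned, which is one reason an inefficient script can cost noticeably more than a careful one.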
Steeper learning curve than similar technologies, so onboarding new engineers or changing ownership of Apache Pig code tends to be a bit of a headache.
Once the language is learned and understood, writing simple Pig scripts is relatively straightforward, so development can go quickly with a skilled team.
As distributed technologies grow and improve, Apache Pig overall feels left in the dust; it is more legacy code to support than something to actively develop with.
Google BigQuery has had an enormous impact on our business in terms of ROI, as it has allowed us to reduce our dependence on physical servers, which we pay for monthly through another hosting service. We have been able to run multiple enterprise-scale data processing applications with almost no investment.
Since our business is highly client-focused, Google Cloud Platform, and BigQuery specifically, has allowed us to be very granular in attributing our usage to different projects, clients, and teams.
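One common way to get this kind of granularity is to tag resources or jobs with key-value labels and then total costs per label value. The sketch below assumes usage records have already been exported with their labels attached; the row structure, the `client`/`team` label keys, and the cost values are hypothetical and do not reflect the actual billing-export schema.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def cost_by_label(rows: Iterable[Mapping], label_key: str) -> dict:
    """Sum the 'cost' of each row, grouped by the value of `label_key`.

    Rows without that label are grouped under "unlabeled".
    ASSUMPTION: each row is a dict with a 'labels' dict and a numeric 'cost'.
    """
    totals = defaultdict(float)
    for row in rows:
        owner = row["labels"].get(label_key, "unlabeled")
        totals[owner] += row["cost"]
    return dict(totals)


# Hypothetical usage records, for illustration only.
rows = [
    {"labels": {"client": "acme", "team": "analytics"}, "cost": 12.5},
    {"labels": {"client": "globex", "team": "analytics"}, "cost": 4.25},
    {"labels": {"client": "acme", "team": "ml"}, "cost": 3.0},
    {"labels": {}, "cost": 1.0},
]

print(cost_by_label(rows, "client"))  # {'acme': 15.5, 'globex': 4.25, 'unlabeled': 1.0}
print(cost_by_label(rows, "team"))    # {'analytics': 16.75, 'ml': 3.0, 'unlabeled': 1.0}
```

Grouping by a different label key gives a per-team or per-project view of the same records without any change to the code.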
Plain and simple, I believe the meager investments that we have made in Google BigQuery have paid themselves back hundreds of times over.