Big Data the SQL way
September 23, 2020
Score 8 out of 10
Vetted Review
Verified User
Overall Satisfaction with Apache Hive
I work as a Research Assistant and have to process large amounts of data to produce research findings. Our NLP lab used Hive for all of its big data processing, for example removing URLs and counting occurrences of specific words. Mainly, it handled the processing and cleaning of the large datasets we collected for our research.
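To give a concrete picture of this kind of cleaning step, here is a minimal sketch of URL removal plus a per-row count of specific words, written as a Python streaming script of the sort Hive's `TRANSFORM ... USING` clause can call. The URL regex, the `TARGET_WORDS` set, and the single text column are assumptions for illustration only, not the scripts our lab actually ran.

```python
import re
import sys

# Hypothetical URL pattern: anything starting with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

# Hypothetical words whose occurrences we want to count in each row.
TARGET_WORDS = {"hive", "spark"}


def clean_and_count(text):
    """Strip URLs from one text value and count target-word occurrences."""
    cleaned = URL_RE.sub("", text)
    # Collapse the extra whitespace left behind where URLs were removed.
    cleaned = " ".join(cleaned.split())
    # Lowercase alphabetic tokens; punctuation is dropped by the regex.
    tokens = re.findall(r"[a-z']+", cleaned.lower())
    count = sum(1 for tok in tokens if tok in TARGET_WORDS)
    return cleaned, count


def main():
    # Hive's TRANSFORM writes each input row to stdin as one tab-separated
    # line and parses tab-separated lines from stdout as output rows.
    for line in sys.stdin:
        text = line.rstrip("\n").split("\t")[0]
        cleaned, count = clean_and_count(text)
        print(f"{cleaned}\t{count}")


if __name__ == "__main__":
    main()
```

In Hive this would be registered with `ADD FILE clean.py;` and invoked as `SELECT TRANSFORM(text) USING 'python3 clean.py' AS (clean_text, hits) FROM tweets;`, where `clean.py`, `text`, and `tweets` are hypothetical names. The per-row `hits` column can then be summed with an ordinary `SUM` in HiveQL. A Java UDF is the other common route; a streaming script is shown here only because it is self-contained.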
- The SQL-like query language is familiar to CS students, so it is easy to pick up.
- I ran it on a server, and it proved very scalable: it handled both small and large datasets.
- I particularly liked the user-defined function (UDF) support, which lets users write their own functions for custom processing.
- Transaction support is limited
- Limited subquery support meant some tasks could only be done by running one query and then a follow-up query on its results
- It is not as fast as Spark.
- A good engine for data analysis
- The easy syntax meant our NLP team learned it quickly.
- We later shifted to Spark, which supports almost all Hive functions and was faster.
Hive and Spark are both Apache Software Foundation projects, and Spark SQL is largely compatible with Hive, so they share many features. Hive uses a SQL-like query language (HiveQL), while Spark offers the RDD and DataFrame APIs. The DataFrame API can be used either through SQL queries or through its own built-in functions that perform the same operations. Both run on distributed clusters, but in our experience Spark was faster than Hive.
Do you think Apache Hive delivers good value for the price?
Yes
Are you happy with Apache Hive's feature set?
Yes
Did Apache Hive live up to sales and marketing promises?
I wasn't involved with the selection/purchase process
Did implementation of Apache Hive go as expected?
Yes
Would you buy Apache Hive again?
Yes