Big Data the SQL way
September 23, 2020


Anonymous | TrustRadius Reviewer
Score 8 out of 10

Overall Satisfaction with Apache Hive

I work as a Research Assistant and have to process large volumes of data to produce findings. Our NLP lab used Hive for all its big data processing, for example removing URLs and counting occurrences of specific words. It mainly helped with processing and cleaning the big datasets we collected for our research.
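To give a rough idea of the cleaning steps mentioned above (removing URLs, counting specific words), here is a minimal Python sketch. It is not the lab's actual code: the URL pattern, the tokenizer, and the function names `remove_urls` and `count_target_words` are all assumptions made for illustration. Logic like this could be wrapped in a UDF or a streaming script, but the sketch itself runs on plain Python strings.

```python
import re
from collections import Counter

# Simplified URL pattern (an assumption; real-world URL matching is messier).
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def remove_urls(text):
    """Strip URLs and collapse the whitespace they leave behind."""
    without_urls = URL_PATTERN.sub(" ", text)
    return " ".join(without_urls.split())


def count_target_words(texts, targets):
    """Count case-insensitive occurrences of each target word across texts."""
    wanted = {word.lower() for word in targets}
    counts = Counter()
    for text in texts:
        # Clean first so words embedded in URLs are not counted.
        cleaned = remove_urls(text).lower()
        for token in re.findall(r"[a-z']+", cleaned):
            if token in wanted:
                counts[token] += 1
    return counts
```

For example, `count_target_words(["Hive is fast http://x.y/hive", "hive and Spark"], ["hive", "spark"])` counts "hive" twice and "spark" once, because the word inside the URL is dropped before counting.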
  • The SQL-like query language is familiar to most CS students, so it is easy to use.
  • I used it on a server and found it very scalable: it can process both small and big datasets.
  • I particularly liked the UDF functionality, which lets the user define functions to produce a particular output.
  • Transactions are not supported
  • Lack of subquery support meant some tasks could only be done by completing one query and then running the subsequent one on its result.
  • It is not as fast as Spark.
  • A good engine for data analysis
  • Easy syntax led to fast learning for the NLP team.
  • We later shifted to Spark, which supports almost all Hive functions and was faster.
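The two-step workaround mentioned in the list above (finish one query, then run the next one on its result) can be sketched as follows. This is only an illustration of the pattern: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the table names `docs` and `word_counts` and the function `frequent_words` are hypothetical.

```python
import sqlite3


def frequent_words(rows, min_count):
    """Return (word, count) pairs seen at least min_count times, in two steps."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE docs (doc_id INTEGER, word TEXT)")
        conn.executemany("INSERT INTO docs VALUES (?, ?)", rows)

        # Step 1: run the first query to completion and store its result.
        conn.execute(
            "CREATE TEMP TABLE word_counts AS "
            "SELECT word, COUNT(*) AS n FROM docs GROUP BY word"
        )

        # Step 2: run the follow-up query against the stored result
        # instead of nesting the first query as a subquery.
        cursor = conn.execute(
            "SELECT word, n FROM word_counts WHERE n >= ? ORDER BY word",
            (min_count,),
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

Materializing the intermediate result costs an extra write, but it keeps each query simple and lets the intermediate table be inspected on its own.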
Hive and Spark are both Apache Software Foundation projects, so they share a lot of common features. Hive follows SQL syntax, while Spark offers the RDD and DataFrame APIs. The DataFrame API supports SQL syntax and also has its own functions that perform the same operations. Both run on distributed systems, but in my experience Spark is faster and Hive is slower.
Although I have used Hive a lot recently, I never faced issues that led me to look for technical support. The developer reference documentation is good enough, although I like the Spark documentation much more. Since Hive follows SQL syntax, it's very easy to find references for queries online.
I rate it highly because it is easy to use and scalable. The SQL syntax lets the user jump in quickly. It also works really well for datasets that take up a lot of disk space. The documentation is also good enough for new users.

Did Apache Hive live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Apache Hive is very well suited for those who are familiar with SQL query syntax. Thanks to its easy-to-use syntax, it can really help in scenarios where a conventional database cannot be used to analyze big datasets.

On the other hand, it's definitely slower than some alternatives such as Spark. It's also not recommended for processing small datasets. Pandas and other ordinary data loading libraries are better suited to small datasets.