Item: Apache Hive
Rating: 8
Author: Verified User

Overall Satisfaction with Apache Hive

Use Cases and Deployment Scope

I work as a Research Assistant and have to process large amounts of data to produce findings. Our NLP lab used Hive for all its big data processing, for example removing URLs and counting occurrences of specific words. Mainly, it handled the processing and cleaning of the big datasets we collected for our research.
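To make the cleaning tasks above concrete, here is a minimal sketch in plain Python of URL removal and word counting. It is not the lab's actual code: the function names, the simplified URL pattern, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Simplified URL pattern (an assumption): anything that starts with
# http://, https:// or www. and runs up to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def remove_urls(text):
    """Drop URLs and collapse the whitespace they leave behind."""
    without_urls = URL_PATTERN.sub(" ", text)
    return " ".join(without_urls.split())


def count_words(texts, targets):
    """Count case-insensitive occurrences of each target word across all texts."""
    wanted = {word.lower() for word in targets}
    counts = Counter()
    for text in texts:
        # Tokenize on runs of letters, digits and apostrophes after URL removal.
        for token in re.findall(r"[a-z0-9']+", remove_urls(text).lower()):
            if token in wanted:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the review.
    sample = [
        "Hive docs: https://hive.apache.org are good",
        "I like Hive and Spark, see www.example.com",
    ]
    print(remove_urls(sample[0]))
    print(count_words(sample, ["hive", "spark"]))
```

On a real big dataset the same logic would run inside the engine (Hive or Spark) rather than in a single Python process.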

Pros and Cons

Pros

The SQL-like query language is very familiar to all the CS students. Hence, it's easy to use.
I ran it on a server and found it very scalable; it can be used to process both small and big datasets.
I particularly liked the UDF functionality where the user could define functions to produce particular output.
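As an illustration of plugging custom logic into Hive, here is a minimal sketch of a row-transforming script in Python. Native Hive UDFs are usually written as Java classes; this sketch instead assumes Hive's TRANSFORM clause, which streams rows to an external script as tab-separated lines. The column layout (`doc_id`, `text`) and the function name are hypothetical, and this is not necessarily the mechanism the reviewer used.

```python
def transform_rows(lines):
    """Turn tab-separated (doc_id, text) rows into (doc_id, token_count) rows."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty input lines
        # Split on the first tab only, so any later tabs stay inside the text.
        doc_id, _, text = line.partition("\t")
        yield f"{doc_id}\t{len(text.split())}"


if __name__ == "__main__":
    # In a real streaming script the input would be sys.stdin;
    # an in-memory demo (hypothetical rows) is used here instead.
    demo = [
        "d1\tHive makes SQL on big data easy",
        "d2\tSpark was faster for us",
    ]
    for row in transform_rows(demo):
        print(row)
```

Under these assumptions, a script built around this function would be registered with `ADD FILE` and called with something like `SELECT TRANSFORM(doc_id, text) USING 'python token_count.py' AS (doc_id, n_tokens) FROM docs;`, where the script and table names are placeholders.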

Cons

Transactions are not supported
Lack of subquery support meant some tasks could only be done by running one query to completion and then running a follow-up query on its result
It is not as fast as Spark.

Return on Investment

A good engine for data analysis
Easy syntax led to fast learning for the NLP team.
We later shifted to Spark, which supports almost all Hive functions and was faster

Alternatives Considered

Apache Spark

Hive and Spark are both Apache Software Foundation projects, and they share a lot of common features. Hive follows SQL syntax, while Spark supports the RDD and DataFrame APIs. The DataFrame API supports SQL syntax and also has its own functions that perform the same operations. Both run on distributed systems, but Spark is faster while Hive is slower.

Support Rating

While I used Hive a lot recently, I never faced issues that led me to look for technical support. The documentation for developer reference is good enough, although I like the documentation for Spark much more. Since Hive follows SQL syntax, it's very easy to find references for queries online.

Usability

I rate it highly because it is easy to use and scalable. The SQL syntax enables the user to quickly jump in. It also works really well for datasets that take up a lot of disk space. Also, the documentation is good enough for new users.

Key Insights

Do you think Apache Hive delivers good value for the price?

Yes

Are you happy with Apache Hive's feature set?

Yes

Did Apache Hive live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of Apache Hive go as expected?

Yes

Would you buy Apache Hive again?

Yes

Other Software Used

Apache Spark, Hadoop

Likelihood to Recommend

Apache Hive is very well suited for those who are very familiar with SQL query syntax. Thanks to its easy-to-use syntax, it can really help in scenarios where a conventional database cannot be used to analyze big datasets.

On the other hand, it's definitely slower than some alternatives such as Spark. Also, it's not recommended for processing small datasets. Pandas and other ordinary data-loading libraries are a better fit for small datasets.

Big Data the SQL way