Item: Apache Hive
Rating: 9
Author: Ananth Gouri

Overall Satisfaction with Apache Hive

Use Cases and Deployment Scope

Apache Hive sits on top of Apache Hadoop and is used for data-related tasks, mostly at a higher level of abstraction. I work as an Assistant Professor at NIE, Mysuru, and I have been using Apache Hive since I first taught Big Data Analytics as a PG course to my students.
In one of those technical sessions, I was supposed to demonstrate a word count program on a novel downloaded from Project Gutenberg. I was able to download the novel, load it into the Hadoop platform, and execute a HiveQL query (HiveQL is the SQL-like query language used by Apache Hive) to show a few unique words, their counts, and related examples.
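For readers who want to see the logic of such a word count outside Hive, here is a minimal Python sketch. It is not the query used in the session. It only mirrors the usual approach, which in HiveQL is typically written with the built-in split and explode functions followed by a GROUP BY. The function name word_count and the two sample lines are hypothetical stand-ins for the downloaded novel.

```python
import re
from collections import Counter


def word_count(lines):
    """Count word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Lowercase the line and treat runs of letters/apostrophes as words,
        # roughly what splitting each row into words and grouping would do.
        counts.update(re.findall(r"[a-z']+", line.lower()))
    return counts


# Hypothetical stand-in for lines read from the novel's text file.
sample_lines = [
    "It was the best of times,",
    "it was the worst of times.",
]

counts = word_count(sample_lines)

# Show a few words with their counts.
for word, n in counts.most_common(3):
    print(word, n)
```

On a real novel, the same function can be fed the lines of the text file, and the result can be compared against what the Hive query returns.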

Pros and Cons

Pros

The capability to handle and query large amounts of data.
A syntax similar to SQL is an added advantage.
An active developer community that is always ready to help.
Ease of use.

Cons

It can be resource-intensive at times, though that may be because I was using a large object file.
It needs update or modify functionality. This should be a minimal CRUD requirement.

Return on Investment

We did not face an ROI problem with Apache Hive, as it is open source.

Alternatives Considered

Presto

One of the major advantages of Presto, and the main reason people use Presto (Teradata), is that it can support multiple data sources, which Apache Hive lacks. Still, most people who come from a structured-data background, whether the old days of dBase or the later SQL databases like MS SQL, MySQL, and PostgreSQL, may opt for Apache Hive for the ease and functionality of HiveQL.

Support Rating

Apache Hive is a FOSS project. Little needs to be said about the support that open source and its developer community provide: it has tremendous developer support and awesome documentation. In my experience, a great deal of help can be gathered from the community.

Usability

Apache Hive makes use of HiveQL, as mentioned earlier. Most people working with data are comfortable with Structured Query Language syntax. Hence I would give Hive's usability a rating of 10 out of 10. Maybe a few features could be added, such as an update or a post-execute feature, to make it much more robust.

Key Insights

Do you think Apache Hive delivers good value for the price?

Yes

Are you happy with Apache Hive's feature set?

Yes

Did Apache Hive live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of Apache Hive go as expected?

Yes

Would you buy Apache Hive again?

Yes

Other Software Used

Apache Druid, Presto, Apache Spark

Likelihood to Recommend

I would definitely recommend Apache Hive if a colleague asked. People working at academic institutions, especially, can demonstrate programs like word count, tab count, space count, newline count, and other related programs with a basic HiveQL setup.
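As a rough illustration of what those classroom counting programs compute, here is a small Python sketch. It is not HiveQL and not taken from any course material. The function name char_counts and the sample string are hypothetical. It is meant only as a reference result that students could check a Hive query against.

```python
def char_counts(text):
    """Return the number of tabs, spaces, and newlines in a piece of text."""
    return {
        "tabs": text.count("\t"),
        "spaces": text.count(" "),
        "newlines": text.count("\n"),
    }


# Hypothetical sample: two lines containing two spaces and one tab.
sample = "one two\tthree\nfour five\n"

stats = char_counts(sample)
print(stats)
```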

The only underlying problem could be that Apache Hive is designed to run on the Apache Hadoop ecosystem. People who are not comfortable with a Linux tree-structured file system, or who are unlikely to use a Linux OS, might not like using Hive.

Manage data for your warehouse as strong as a beehive using Apache Hive!