Item: Apache Hive
Rating: 8
Author: Verified User

Overall Satisfaction with Apache Hive

Use Cases and Deployment Scope

Hive plays a vital role in our company, together with Hadoop storage. It makes querying and aggregation much easier for data analysts with a DBA background, while still benefiting from the performance boost brought by Hadoop. It makes big data analysis more feasible and closer to the daily business context.
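
To give a flavor of the kind of query we mean, here is a minimal sketch. It is not our production code: the table `page_views` and its columns are made up for illustration, and Python's built-in `sqlite3` is used only as a local stand-in so the example runs without a cluster. In Hive, a SELECT of the same shape would run over data stored in Hadoop.

```python
import sqlite3

# Stand-in only: sqlite3 lets the sketch run locally. The table and column
# names below are hypothetical, chosen purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (view_date TEXT, country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("2020-01-01", "US", 120),
        ("2020-01-01", "DE", 45),
        ("2020-01-02", "US", 80),
        ("2020-01-02", "DE", 55),
    ],
)

# A read-only aggregation of the kind an analyst with a DBA background
# would already know how to write.
query = """
    SELECT country, SUM(views) AS total_views
    FROM page_views
    GROUP BY country
    ORDER BY total_views DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for country, total_views in totals:
    print(country, total_views)
```

The point is that the analyst writes only the familiar SELECT / GROUP BY part and does not have to think about how the work is distributed.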

Pros and Cons

Pros

The SQL-like query interface is the core value of Hive.
It supports a variety of storage formats and also allows indexing.
It is fast.

Cons

No transaction support.
No sub-query support.
Can only deal with cold (non-real-time) data.

Return on Investment

It exposes the distributed computing world (Hadoop) to users without requiring an in-depth understanding of the boilerplate details. This reduces learning time and lets data analysts focus their efforts on the core business.

Alternatives Considered

HBase

Apache Hive decouples the query layer from the storage layer, so it is more flexible and extensible.

Support Rating

We take advantage of the Apache community, which provides a lot of valuable suggestions and support.

Usability

Hive is a very good platform for big data analysis and ad-hoc queries, and it also scales. BI processes can be easily integrated with Hadoop via Hive. It can deal with much larger data sets than a traditional RDBMS can. It is a "must-have" component of the big data domain.

Key Insights

Do you think Apache Hive delivers good value for the price?

Yes

Are you happy with Apache Hive's feature set?

Yes

Did Apache Hive live up to sales and marketing promises?

I wasn't involved with the selection/purchase process

Did implementation of Apache Hive go as expected?

Yes

Would you buy Apache Hive again?

Yes

Other Software Used

Docker, Kubernetes

Likelihood to Recommend

Hive is suitable for big data analysis tasks on top of historical data storage, but it is not well suited to real-time data (in that case, Cassandra should be considered). Because it is not real SQL, it is very good for read-only operations and on-the-fly aggregation; however, if data modification and transactions are needed, it is not suitable.

Hive: When SQL marries Hadoop