Apache Hive for ETL workloads
September 11, 2017


Anonymous | TrustRadius Reviewer
Score 5 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Hive

Apache Hive is used across our organisation for analytical workloads. We use Hive with the Hortonworks distribution, and it's a great SQL-on-Hadoop tool.
  • Hive is good for ETL workloads on Hadoop.
  • HiveQL translates SQL-like queries into MapReduce jobs. It also supports plugging in custom MapReduce scripts.
  • Hive has two kinds of tables: Hive-managed tables and external tables.
  • Use an external table when other applications like Pig, Sqoop, or MapReduce also use the file in HDFS. Dropping an external table from Hive only deletes the metadata; the original file in HDFS stays.
  • Use Hive for analytical workloads: write-once, read-many scenarios. Avoid updates and deletes.
  • Behind the scenes, Hive creates MapReduce jobs, so Hive's performance is slow compared to Apache Spark.
  • MapReduce writes its intermediate outputs to disk, whereas Spark operates in-memory and uses a DAG.
  • Helps us handle large data volumes
  • Improved client SLAs
  • Better decision making from processing more/better data
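The managed vs. external table distinction above can be sketched in HiveQL; the table names, columns, and HDFS path here are hypothetical:

```sql
-- Managed table: Hive owns the data, so DROP TABLE deletes both
-- the metadata and the underlying files in the warehouse directory.
CREATE TABLE sales_managed (
  order_id BIGINT,
  amount   DOUBLE
)
STORED AS ORC;

-- External table: Hive only tracks metadata, so DROP TABLE leaves
-- the HDFS files in place for Pig, Sqoop, or MapReduce jobs to keep using.
CREATE EXTERNAL TABLE sales_external (
  order_id BIGINT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/shared/sales';
```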
Hive is SQL-compliant, which makes it easier for data folks compared to Pig.
Use it for ETL workloads. I prefer to run the same workload with Spark as well and go with whichever performs better.
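To illustrate the SQL compliance mentioned above, a typical write-once/read-many analytical query in Hive reads like plain SQL (the table and column names here are hypothetical):

```sql
-- Aggregate sales per region; Hive compiles this into MapReduce jobs.
SELECT region,
       SUM(amount)              AS total_sales,
       COUNT(DISTINCT order_id) AS orders
FROM   sales
WHERE  order_date >= '2017-01-01'
GROUP  BY region
ORDER  BY total_sales DESC;
```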