Item: Apache Hive
Rating: 7
Author: Jordan Moore

Overall Satisfaction with Apache Hive

Use Cases and Deployment Scope

Hive allows us to run SQL queries against data sitting in Hadoop.

Pros and Cons

Pros

One of the standard SQL-on-Hadoop implementations. Comes installed with both the HDP and CDH Hadoop distributions.
Hive LLAP (Live Long and Process) has recently brought significant improvements to queries that used to be long-running.
Allows BI tools to run analysis over Hadoop data.
Supports several relational databases for its metastore, including MySQL, PostgreSQL, Derby, and Oracle.
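Switching metastore backends is mostly a configuration change. Hive reads the metastore connection from the `javax.jdo.option.*` properties in `hive-site.xml`. The minimal Python sketch below renders those properties for each backend named above. The helper name `metastore_site_xml`, the host, ports, database name, and credentials are hypothetical placeholders. The driver class names are the commonly documented ones and may differ depending on the JDBC driver version you install.

```python
import xml.etree.ElementTree as ET

# Commonly documented JDBC driver classes and example URL templates per backend.
# Hosts, ports, and database names are illustrative placeholders, not defaults
# you should rely on; the Oracle template treats {db} as the SID.
METASTORE_BACKENDS = {
    "mysql": ("com.mysql.jdbc.Driver", "jdbc:mysql://{host}:3306/{db}"),
    "postgres": ("org.postgresql.Driver", "jdbc:postgresql://{host}:5432/{db}"),
    "derby": ("org.apache.derby.jdbc.EmbeddedDriver",
              "jdbc:derby:;databaseName={db};create=true"),
    "oracle": ("oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@{host}:1521:{db}"),
}


def metastore_site_xml(backend: str, host: str = "localhost", db: str = "metastore",
                       user: str = "hive", password: str = "changeme") -> str:
    """Render a hive-site.xml <configuration> holding the metastore JDBC settings.

    Hypothetical helper: only the javax.jdo.option.* property names are Hive's own.
    """
    driver, url_template = METASTORE_BACKENDS[backend]
    # str.format ignores unused keyword arguments, so the embedded Derby URL
    # (which has no host) works with the same call.
    props = {
        "javax.jdo.option.ConnectionURL": url_template.format(host=host, db=db),
        "javax.jdo.option.ConnectionDriverName": driver,
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }
    config = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Example: point the metastore at a (hypothetical) PostgreSQL host.
    print(metastore_site_xml("postgres", host="db.example.internal"))
```

Only the connection properties change between backends; the metastore schema itself still has to be created in the target database, for example with Hive's `schematool`.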

Cons

Needs to keep up with improvements in execution engines. Hive on Tez or Hive on Spark, and then LLAP, are good starts.
Overall speed of ad-hoc querying could be improved.

Return on Investment

Allows analysts to use their SQL skills against large datasets.
Slow queries create opportunities to discover bottlenecks, parameters to tune, and alternative tools or ways to architect a system.

Alternatives Considered

Apache Impala, Apache Spark and PostgreSQL

Hive was one of the first SQL-on-Hadoop technologies, and it comes bundled with the two main Hadoop distributions, HDP and CDH. It has improved considerably since its release, but selecting the right SQL-on-Hadoop technology requires a good understanding of the strengths and weaknesses of the alternatives.

Other Software Used

Presto, Apache Spark, MySQL, PostgreSQL

Likelihood to Recommend

Hive is well suited to providing a SQL engine on Hadoop, but alternative SQL-on-Hadoop projects claim improvements over it.

Comments

One of the first SQL on Hadoop tools. Perhaps not the best.