Overall Satisfaction with Apache Hive
- One of the standard SQL-on-Hadoop implementations. Comes installed in both the HDP and CDH Hadoop distributions.
- Hive LLAP (Live Long and Process) has recently made significant improvements to long-running queries.
- Allows BI tools to run analysis over Hadoop data.
- Supports several relational databases for its metastore, including MySQL, Postgres, Derby, and Oracle.
- Needs to keep up with execution engine improvements. Hive on Spark or Tez, followed by LLAP, are good starts.
- Overall speed of ad-hoc querying could be improved.
- Allows analysts to use their SQL skills against large datasets.
- Slow queries create opportunities to discover bottlenecks, parameters to tune, and alternative tools or ways to architect a system.
Hive was one of the first SQL-on-Hadoop technologies, and it comes bundled with the main Hadoop distributions, HDP and CDH. It has improved considerably since its release, but selecting the right SQL-on-Hadoop technology requires a good understanding of the strengths and weaknesses of the alternatives.