One of the first SQL on Hadoop tools. Perhaps not the best.
Updated February 17, 2018

Jordan Moore | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Hive

Hive allows us to run SQL queries against data sitting in Hadoop.
  • One of the standard SQL on Hadoop implementations. Comes installed in both HDP and CDH Hadoop distributions.
  • Hive LLAP (Live Long and Process) has recently made significant improvements to long-running queries.
  • Allows BI tools to run analysis over Hadoop data.
  • Supports several relational databases for its metastore, including MySQL, PostgreSQL, Derby, and Oracle.
  • Needs to keep up with execution engine improvements. Hive on Spark or Tez, and then LLAP, are good starts.
  • Overall speed of ad-hoc querying could be improved.
  • Allows analysts to use their SQL skills against large datasets.
  • Slow queries create opportunities to discover bottlenecks, parameters to tune, and alternative tools or ways to architect a system.
Hive was one of the first SQL on Hadoop technologies, and it comes bundled with the two main Hadoop distributions, HDP and CDH. It has improved considerably since its release, but selecting the right SQL on Hadoop technology requires a good understanding of the strengths and weaknesses of the alternatives.
Hive is well-suited for providing an SQL engine on Hadoop, but there are alternative SQL on Hadoop projects that claim to improve on Hive.