Apache Hive for ETL workloads
September 11, 2017


Anonymous | TrustRadius Reviewer
Score 5 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Hive

Apache Hive is used across our organisation for analytical workloads. We use Hive with the Hortonworks distribution, and it's a great SQL-on-Hadoop tool.
  • Hive is good for ETL workloads on Hadoop.
  • HiveQL translates SQL-like queries into MapReduce jobs. It also supports plugging in custom MapReduce scripts.
  • Hive has two kinds of tables: Hive-managed tables and external tables.
  • Use an external table when other applications like Pig, Sqoop, or MapReduce also use the file in HDFS. Dropping an external table from Hive only deletes the metadata; the original file in HDFS stays.
  • Use Hive for analytical workloads: write-once, read-many scenarios. Avoid updates and deletes.
  • Behind the scenes, Hive creates MapReduce jobs, so Hive's performance is slow compared to Apache Spark.
  • MapReduce writes its intermediate outputs to disk, whereas Spark operates in-memory and uses a DAG.
  • Helps us handle large data volumes
  • Improved client SLAs
  • Better decision making from processing more/better data
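The managed vs. external table distinction above can be sketched in HiveQL; the table names, columns, and HDFS path here are hypothetical:

```sql
-- Managed table: Hive owns the data, so DROP TABLE deletes both
-- the metadata and the underlying files in the warehouse directory.
CREATE TABLE sales_managed (
  order_id BIGINT,
  amount   DOUBLE
)
STORED AS ORC;

-- External table: Hive only tracks metadata, so DROP TABLE leaves
-- the HDFS files in place for Pig, Sqoop, or MapReduce jobs to keep using.
CREATE EXTERNAL TABLE sales_external (
  order_id BIGINT,
  amount   DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/shared/sales';
```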
Hive is SQL-compliant, which makes it easier for data folks compared to Pig.
Use it for ETL workloads. I prefer to run the same workload with Spark as well and go with whichever performs better.
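To illustrate the SQL compliance mentioned above, a typical write-once/read-many analytical query in Hive reads like plain SQL (the table and column names here are hypothetical):

```sql
-- Aggregate sales per region; Hive compiles this into MapReduce jobs.
SELECT region,
       SUM(amount)              AS total_sales,
       COUNT(DISTINCT order_id) AS orders
FROM   sales
WHERE  order_date >= '2017-01-01'
GROUP  BY region
ORDER  BY total_sales DESC;
```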