Hive, a very powerful open-source data warehouse solution.
Overall Satisfaction with Hive
Our data team uses Hive to store the company's largest datasets. The data is partitioned in Hive and can be queried through Impala.
Pros
- Partitioning improves query efficiency.
- SerDes support different data storage formats.
- Integrates well with Impala, which can query the data stored in Hive.
- Supports the Parquet columnar storage format, which stores data compressed.
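
To make the partitioning and Parquet points concrete, here is a minimal sketch of what such a table could look like. It only builds the HiveQL text in Python; the table name `events`, its columns, and the `event_date` partition column are hypothetical and not taken from our actual setup.

```python
def partitioned_parquet_ddl(table, columns, partition_columns):
    """Build a HiveQL CREATE TABLE statement for a partitioned Parquet table.

    columns and partition_columns are lists of (name, hive_type) pairs.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        "STORED AS PARQUET"
    )


def partition_filter_query(table, partition_column, value):
    """Build a query that filters on the partition column, so only the
    matching partition needs to be read instead of the whole table."""
    return f"SELECT COUNT(*) FROM {table} WHERE {partition_column} = '{value}'"


# Hypothetical example table: names and types are illustrative only.
ddl = partitioned_parquet_ddl(
    "events",
    [("user_id", "BIGINT"), ("action", "STRING")],
    [("event_date", "STRING")],
)
query = partition_filter_query("events", "event_date", "2020-01-01")

print(ddl)
print(query)
```

The efficiency gain comes from the `WHERE` clause on the partition column: a query restricted to one `event_date` value does not have to scan the other partitions.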
Cons
- Slower than Impala, because it runs queries as MapReduce jobs.
- Hive combined with Impala increases the efficiency with which our analysts query the data.
Impala runs queries faster than Hive on the same data, but it depends heavily on Hive. Impala also does not support the SerDes that allow querying other data formats (JSON, XML), whereas Hive does.
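
As an illustration of the SerDe point, the sketch below builds the HiveQL for a table that reads JSON files through a SerDe. It assumes the `org.apache.hive.hcatalog.data.JsonSerDe` class that ships with Hive's HCatalog module is available on the classpath; the table name, columns, and HDFS path are hypothetical.

```python
# SerDe class from Hive's HCatalog module; assumed to be on the classpath.
JSON_SERDE_CLASS = "org.apache.hive.hcatalog.data.JsonSerDe"


def json_table_ddl(table, columns, location):
    """Build a HiveQL statement for an external table over JSON text files.

    columns is a list of (name, hive_type) pairs; location is the directory
    that holds the JSON files (one JSON object per line).
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"ROW FORMAT SERDE '{JSON_SERDE_CLASS}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table and path, for illustration only.
json_ddl = json_table_ddl(
    "raw_events_json",
    [("user_id", "BIGINT"), ("action", "STRING")],
    "/data/raw/events_json",
)
print(json_ddl)
```

A table defined this way is queried from Hive like any other table, which is the flexibility described above.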