TrustRadius
Apache Hive

Overview

What is Apache Hive?

Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.

Recent Reviews

TrustRadius Insights

Apache Hive is a versatile software that has been widely used across various departments and organizations for different use cases. It has …

Help your dev team!

8 out of 10
April 12, 2022
Incentivized
We build our data lake and perform queries on large amounts of data. We group data from multiple sources into a common structure, making …

very useful for OLTP

10 out of 10
April 06, 2022
Incentivized
We use Apache to process large data and get the output with less process time. The framework is very much useful for data processing and …

Big Data the SQL way

8 out of 10
September 23, 2020
Incentivized
I am working as a Research Assistant where I have to process tons of data to produce appropriate findings. Our NLP lab used it for all its …

Awards

Products that are considered exceptional by their customers based on a variety of criteria win TrustRadius awards.

Pricing

N/A
Unavailable

Entry-level set up fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting/Integration Services

Alternatives Pricing

What is ClicData?

ClicData is a 100% cloud-based business intelligence platform that allows users to connect, process, blend, visualize and share data from a single place. As an automated platform, users are able to rely on the latest version of company data, to ensure users make the right decisions. Hundreds of…

What is retailMetrix?

RetailMetrix is a data analytics platform for retailers with the mission of enabling retailers to get value from their data. RetailMetrix processes and stores sales, labor and customer data using data warehouse technologies. Its dashboards and reports allow teams to find the data that matters to…

Product Demos

Apache Hive Hadoop Ecosystem - Big Data Analytics Tutorial by Mahesh Huddar

YouTube

Connecting Microsoft Power BI to Apache Hive using Simba Hive ODBC driver

YouTube

Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive

YouTube

Product Details

Apache Hive Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Reviewers rate Usability highest, with a score of 8.5.

The most common users of Apache Hive are from Enterprises (1,001+ employees).

Comparisons


Reviews and Ratings

(98)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources.

Apache Hive is a versatile software that has been widely used across various departments and organizations for different use cases. It has proven to be particularly helpful in handling large datasets, migrating data between different operating systems, synchronizing programs, and fetching and generating product metrics. Users have found value in using Hive for data analytics, engineering, data science, product management, and IT-related tasks such as improving analysis of big datasets stored in Hadoop HDFS.

Furthermore, Apache Hive has simplified the process of filtering and cleaning data using SQL, reducing the learning curve for handling big data. It allows users to run SQL queries against data in Hadoop, enabling efficient analysis of large datasets without the need to learn a new language. Additionally, Hive has been used for building reports, analyzing data stored in the Hadoop file system, processing events gathered in HDFS, and converting them into Parquet files for fast querying.

Overall, users have praised Apache Hive for its scalability, accessibility, and cost-effectiveness in storing and retrieving analytics data. It has provided an intuitive solution for storing large datasets, querying big sets of data using SQL, aggregating massive datasets into distilled information for data-driven decision making, and creating external and internal tables in Hadoop/BigData projects. With its ability to process both unstructured and structured data efficiently, Hive has become an essential tool for data analysts, engineers, and business analysts across organizations.

Attribute Ratings

Reviews

(1-25 of 35)
Score 8 out of 10
Vetted Review
Verified User
  • Simple query language built on the MapReduce paradigm.
  • Parallelism across a distributed system is provided.
  • All cloud platforms have access to a tabular format and interfaces.
  • Due to the shuffled data, complex joins may take a long time to complete.
  • Execution is dependent on external storage and memory.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • Easy-to-use, interactive modern layout
  • Easy to organize data and view tables and views from across the organization
  • Fast speed for most queries
  • Some queries, particularly complex joins, are still quite slow and can take hours
  • Previous jobs and queries are not stored sometimes
  • Switching to Impala can sometimes be time-consuming (i.e. the system hangs, or is slow to respond).
  • Sometimes, directories and tables don't load properly which causes confusion
Camilo Palacios | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
  • Migration to the cloud is modern and very secure.
  • Extraction works best when scheduled at set times and in set quantities.
  • For normal daily use, the system needs ongoing maintenance to keep it working effectively.
Pablo Gonzalez | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
  • The unification of the data will help to establish the commercial criteria.
  • We are sure that the data is protected
  • If you try to extract an excessive amount of data, the system will become slow
  • You may have the danger that the system collapses due to the amount of data
Score 8 out of 10
Vetted Review
Verified User
Incentivized
  • It can retrieve data with SQL-like queries, as with a database.
  • We can partition the data and distribute amongst the clustered machines
  • Easily scalable, which gives capability of running analytics at a larger level
  • No support for working with Unstructured data.
  • ACID properties are not enforced as in a traditional database, which often causes confusion.
  • Supports OLAP workloads only; OLTP is not supported.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • Apache Hive supports external data tables.
  • Supports data partitioning to improve overall performance.
  • Apache Hive is a reliable and scalable solution.
  • Apache Hive supports writing ad-hoc queries as well.
  • Apache Hive is not well suited to OLTP-based jobs.
  • We sometimes observed high latency when querying data.
  • Row-level data updates are limited.
  • Training materials need improvement.
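
The partitioning point raised in the review above can be illustrated with a minimal Python sketch. This is not Hive code: it models a partitioned table as a dictionary keyed by partition value, and all names and data in it are hypothetical. It only shows why a filter on the partition column lets a query skip unrelated data.

```python
from collections import defaultdict

def build_partitions(rows, key):
    """Group rows by the partition column, much as a partitioned table
    keeps the data for each partition value in its own location."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

def query_partition(partitions, value):
    """Return matching rows and the number of rows scanned.
    Only the matching partition is read; the others are skipped."""
    rows = partitions.get(value, [])
    return rows, len(rows)

def query_full_scan(rows, key, value):
    """Same filter without partitioning: every row is inspected."""
    matched = [r for r in rows if r[key] == value]
    return matched, len(rows)

# Hypothetical event data, for illustration only.
events = [
    {"dt": "2022-04-06", "user": "a"},
    {"dt": "2022-04-06", "user": "b"},
    {"dt": "2022-04-12", "user": "c"},
    {"dt": "2021-11-24", "user": "d"},
]

partitions = build_partitions(events, "dt")
pruned_rows, pruned_scanned = query_partition(partitions, "2022-04-06")
full_rows, full_scanned = query_full_scan(events, "dt", "2022-04-06")
```

Both queries return the same rows, but the partitioned lookup touches only the rows stored under the requested date, while the full scan inspects every row.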
April 06, 2022

very useful for OLTP

Score 10 out of 10
Vetted Review
Verified User
Incentivized
  • Used in data warehousing, similar to ETL tools.
  • SQL-like interface gives access to data stored in various databases.
  • Enables analytics at massive scale.
  • The framework's development approach could be improved.
  • OLTP is not supported.
  • Does not offer real time queries.
November 24, 2021

Apache Hive

Surendranatha Reddy Chappidi | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • Apache Hive is fault-tolerant.
  • Apache Hive's latest version supports ACID transactions.
  • Apache Hive supports UPDATE, DELETE and MERGE.
  • Apache Hive should support ROLLBACK, COMMIT operations.
  • Apache Hive should support XML SerDe.
  • Apache Hive.
akshay kashyap | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • Simple query language built on top of the MapReduce paradigm.
  • Provides parallel execution over distributed system.
  • Tabular format and connectors available for all cloud platforms.
  • Complex joins may take time to execute due to shuffling of data.
  • Static queries mostly.
  • Slower than Apache Spark by almost 100 times.
  • Dependent on external memory and storage to execute.
Manjeet Singh | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • It is very easy to set up and start with
  • Apache Hive is a cheaper solution for data warehousing and aggregation compared to other products
  • One con is speed, which is slightly lower than that of other enterprise solutions like BigQuery
  • It also needs to be maintained by the company itself
September 23, 2020

Big Data the SQL way

Score 8 out of 10
Vetted Review
Verified User
Incentivized
  • The SQL-like query language is very familiar to all the CS students. Hence, it's easy to use.
  • I used it on a server so I realize it is very scalable and can be used to process small and big datasets.
  • I particularly liked the UDF functionality where the user could define functions to produce particular output.
  • Transactions are not supported
  • The lack of subqueries meant some tasks required running one query to completion and then a follow-up query
  • It is not as fast as spark.
Score 8 out of 10
Vetted Review
Verified User
Incentivized
  • The SQL-like query interface is Hive's core value.
  • It supports various storage formats and also allows indexing.
  • It is fast.
  • No transaction support.
  • No sub-query support.
  • Can only deal with the cold data (non-real time).
Ananth Gouri | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • The capability to handle large amounts of data and its querying process.
  • A syntax similar to SQL is an added advantage.
  • An active developer support and community always ready to help.
  • Ease of usage.
  • Sometimes resource-intensive, though that may be because I was using a large object file.
  • Needs update or modify functionality. This is a minimal CRUD requirement.
August 29, 2018

My Apache Hive Review

Kartik Chavan | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
  • Querying in Apache Hive is very simple because it is very similar to SQL.
  • Hive produces good ad hoc queries required for data analysis.
  • Another advantage of Hive is that it is scalable.
  • Apache Hive isn't designed for and doesn't support online processing of data.
  • Sub queries not supported.
  • Updating the data can be a problematic task.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • It's Fast!
  • You can store data structures beyond the standard ones
  • Good scalability
  • Good redundancy too
  • It's not as ACID compliant as an RDBMS. It's a recently added feature and still needs work.
  • This is not the tool to go for online data processing.
  • It does not support sub-queries.
  • It can't process data in real time.
Jordan Moore | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
  • One of the standard SQL on Hadoop implementations. Comes installed in both HDP and CDH Hadoop distributions.
  • Hive LLAP (Live Long and Process) has recently made significant improvements to long-running queries.
  • Allows BI tools to run analysis over Hadoop data.
  • Allows various relational databases for its metastore. These include MySQL, Postgres, Derby, or Oracle.
  • Needs to keep up with execution engine improvements. Spark or Tez on Hive, then LLAP are good starts.
  • Overall speed of ad-hoc querying could be improved.
Tejaswar Rao | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • Can query large datasets, and is fast compared to an RDBMS
  • Can use SQL for data access and no need to learn new language
  • Can write custom functions (UDF) with python and also Java
  • Security roles for different users should be implemented
  • All the functionalities of SQL should be available
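
One common way to use Python for custom logic in Hive is a streaming script invoked through the `TRANSFORM ... USING` clause, where rows reach the script on standard input as tab-separated text. The sketch below assumes that setup. The function names and the two-column (name, score) layout are hypothetical and chosen only for illustration.

```python
import sys

def transform_line(line):
    """Convert one tab-separated input row (name, score) into an output row.
    Upper-cases the name and labels the score; malformed rows yield None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        return None
    name, score = fields
    try:
        label = "high" if float(score) >= 50 else "low"
    except ValueError:
        # Non-numeric scores (including a NULL marker) are skipped.
        return None
    return f"{name.upper()}\t{label}"

def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read rows from stream_in and write transformed rows to stream_out."""
    for line in stream_in:
        out = transform_line(line)
        if out is not None:
            stream_out.write(out + "\n")

# When saved as a standalone streaming script, the file would end
# with a call to main().
```

In Hive, a script like this would typically be shipped with `ADD FILE` and called along the lines of `SELECT TRANSFORM(name, score) USING 'python label.py' AS (name, label) FROM some_table`, where the script and table names are placeholders.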
Bharadwaj (Brad) Chivukula | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • Hive syntax is almost like SQL, so for someone already familiar with SQL it takes almost no effort to pick up Hive.
  • Can run MapReduce jobs with JSON parsing and generate dynamic partitions in Parquet file format.
  • Simplifies your experience with Hadoop especially for non-technical/coding partners.
  • Hive doesn't support many features that traditional RDBMS SQL has; so it may not be an easier transformation as one would presume.
  • Being open source, it has its share of problems and a lack of support; you need to explore community groups for clarification if you are not using one of the big distribution providers like Cloudera or HW.
  • Hive is comparatively slower than its competitors. It's easy to use but that comes with the cost of processing. If you are using it just for batch processing then Hive is well and fine.