TrustRadius
Apache Hive

Overview

What is Apache Hive?

Apache Hive is database/data warehouse software that supports querying and analyzing large datasets stored in the Hadoop Distributed File System (HDFS) and other compatible storage systems. It is distributed under the open-source Apache License 2.0.

Recent Reviews

Help your dev team!

8 out of 10
April 12, 2022
Incentivized
We build our data lake and perform queries on large amounts of data. We group data from multiple sources into a common structure, making …

very useful for OLTP

10 out of 10
April 06, 2022
Incentivized
We use Apache to process large data and get the output with less process time. The framework is very much useful for data processing and …

Big Data the SQL way

8 out of 10
September 23, 2020
Incentivized
I am working as a Research Assistant where I have to process tons of data to produce appropriate findings. Our NLP lab used it for all its …

Pricing

N/A
Unavailable

Entry-level setup fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting/Integration Services

Alternatives Pricing

What is ClicData?

ClicData is a 100% cloud-based business intelligence platform that allows users to connect, process, blend, visualize, and share data from a single place. Because it is an automated platform, users can rely on the latest version of company data to help ensure they make the right decisions. Hundreds of…

What is retailMetrix?

RetailMetrix is a data analytics platform for retailers with the mission of enabling retailers to get value from their data. RetailMetrix processes and stores sales, labor, and customer data using data warehouse technologies. Its dashboards and reports allow teams to find the data that matters to…

Product Demos

Apache Hive Hadoop Ecosystem - Big Data Analytics Tutorial by Mahesh Huddar (YouTube)

Connecting Microsoft Power BI to Apache Hive using Simba Hive ODBC driver (YouTube)

Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive (YouTube)

Product Details

Apache Hive Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Reviewers rate Usability highest, with a score of 8.5.

The most common users of Apache Hive are from Enterprises (1,001+ employees).

Reviews and Ratings

(97)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, third-party data sources.

Apache Hive is a versatile tool that has been widely used across departments and organizations for many different use cases. It has proven particularly helpful for handling large datasets, migrating data between different operating systems, synchronizing programs, and fetching and generating product metrics. Users have found value in using Hive for data analytics, engineering, data science, product management, and IT-related tasks such as improving the analysis of big datasets stored in Hadoop HDFS.

Furthermore, Apache Hive has simplified filtering and cleaning data using SQL, reducing the learning curve for handling big data. It allows users to run SQL queries against data in Hadoop, enabling efficient analysis of large datasets without the need to learn a new language. Hive has also been used for building reports, analyzing data stored in the Hadoop file system, and processing events gathered in HDFS and converting them into Parquet files for fast querying.

Overall, users have praised Apache Hive for its scalability, accessibility, and cost-effectiveness in storing and retrieving analytics data. It has provided an intuitive solution for storing large datasets, querying big sets of data using SQL, aggregating massive datasets into distilled information for data-driven decision making, and creating external and internal tables in Hadoop/BigData projects. With its ability to process both unstructured and structured data efficiently, Hive has become an essential tool for data analysts, engineers, and business analysts across organizations.

Reviews

Score 8 out of 10
Vetted Review
Verified User
Data warehouses that update and append records in batches or in real time can be queried using Apache Hive. Query output from Hive data sets can be fed to Tableau and other reporting tools or consumed directly from Python. Structured data and tables can be accessed using SQL-like syntax. Using Hive, you can build tables at various levels of the data lake. It is not the best fit as a transactional database.
April 12, 2022

Help your dev team!

Score 8 out of 10
Vetted Review
Verified User
Incentivized
It is great for laboratory environments and for starting to work with unstructured data when we are not yet clear about how we want to treat it. It also lets queries be improved very quickly, because developers can work with SQL instead of MapReduce. As a point for improvement, troubleshooting in production environments is complicated and requires expert personnel.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Apache Hive is well suited for querying Hadoop. If you use Hadoop, you should consider Hive. It is well suited for large organizations with lots of data that needs to be queried. However, there is significant overhead in setting up and maintaining Hive (and Hadoop in general). Small companies and individuals should consider other ways of storing data, such as a conventional SQL database.
Camilo Palacios | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
The software executes work at a large scale, and it is a good choice for new projects or organizational changes. Data lineage mapping has always been questionable, but this tool has given good results. You can store and synchronize data from different departments; the storage process can be done manually, but it is best automated.
Omkar Marne | TrustRadius Reviewer
Score 6 out of 10
Vetted Review
Verified User
Incentivized
Apache Hive is best for ETL (Extract, Transform, Load) purposes. It performs best when integrated with the Hadoop Distributed File System (HDFS). It is also very good for performing mathematical operations when the data is organized and structured. It can handle large volumes of data (petabytes) but requires a lot of memory in the system. It supports both unstructured and structured data, but works best with structured data.
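The ETL use case above can be illustrated with a minimal sketch. Because a Hive cluster cannot be assumed here, it uses Python's built-in sqlite3 module as a stand-in SQL engine, and the table names, columns, and data are all hypothetical. The pattern (land raw rows in a staging table, then clean and load them with one INSERT ... SELECT) is similar in spirit to Hive's INSERT OVERWRITE TABLE ... SELECT, but this is not Hive code.

```python
import sqlite3

# Hypothetical raw records "extracted" from several source systems.
RAW_EVENTS = [
    (" Alice ", "click", "2022-04-12"),
    ("bob", "VIEW", "2022-04-12"),
    ("", "click", "2022-04-12"),  # missing user name: filtered out by the transform
    ("Bob", "click", "2022-04-13"),
]


def run_etl(rows):
    """Load raw rows into a staging table, then transform and load them with SQL."""
    con = sqlite3.connect(":memory:")  # sqlite3 stands in for the warehouse engine
    con.execute("CREATE TABLE staging_events (user_name TEXT, action TEXT, event_date TEXT)")
    con.executemany("INSERT INTO staging_events VALUES (?, ?, ?)", rows)

    # Transform + load in one declarative statement: normalize case and
    # whitespace, and drop rows without a user name.
    con.execute("CREATE TABLE clean_events (user_name TEXT, action TEXT, event_date TEXT)")
    con.execute(
        """
        INSERT INTO clean_events
        SELECT LOWER(TRIM(user_name)), LOWER(action), event_date
        FROM staging_events
        WHERE TRIM(user_name) <> ''
        """
    )
    result = con.execute(
        "SELECT user_name, action, event_date FROM clean_events "
        "ORDER BY event_date, user_name"
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in run_etl(RAW_EVENTS):
        print(row)
```

Keeping the cleaning logic in SQL rather than in application code is what lets analysts who already know SQL own the transform step, which is the appeal reviewers describe.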
Pablo Gonzalez | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
In addition to making information quickly accessible through the established security protocols, it has helped us as users maintain a fairly comfortable data processing flow. It is more cost-effective to process the data in batches, and we have been able to unify data from different sources.
Score 8 out of 10
Vetted Review
Verified User
Incentivized
If your workforce knows SQL and you need to explore large-scale data and get insights from it, then Apache Hive is perfect for you. If you have experienced people who have worked on big data before, then using Splunk is better. For starting the journey into data-driven decisions and data analytics, it is better to use Apache Hive first.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Apache Hive is a data warehouse/ETL solution used for processing big data for analytics and visualizations. Apache Hive has a great architecture that makes it very well suited for organizations.
The Metastore is used for storing the metadata for each table and its schema. The Driver acts as the controller for executing statements. Other components, such as the Optimizer, the CLI, and the Thrift Server, also enable the processing of big data transformations.
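To make the Metastore/Driver split described above more concrete, here is a toy model in plain Python. It illustrates schema-on-read: the table schema lives in metadata, the data files stay as raw delimited text, and column types are applied only when a query reads the data. `TableMeta`, `METASTORE`, `FILES`, and `driver_select` are hypothetical names for illustration only; this is not Hive's API, and a real Hive Metastore is a service backed by a relational database.

```python
from dataclasses import dataclass


@dataclass
class TableMeta:
    """Hypothetical, simplified stand-in for a Metastore entry."""
    columns: list      # list of (column_name, python_type) pairs
    location: str      # where the raw files live
    delimiter: str = ","


# Toy "Metastore": table name -> metadata (schema + location).
METASTORE = {
    "page_views": TableMeta(columns=[("user_id", int), ("url", str)],
                            location="/warehouse/page_views"),
}

# Toy "file system": location -> raw text lines, standing in for files on HDFS.
FILES = {
    "/warehouse/page_views": ["1,/home", "2,/docs", "1,/pricing"],
}


def driver_select(table, wanted):
    """Toy driver: check columns against the schema, then parse raw lines at read time."""
    meta = METASTORE[table]
    names = [name for name, _ in meta.columns]
    unknown = [c for c in wanted if c not in names]
    if unknown:
        # Rejected using metadata alone, before any data file is read.
        raise ValueError(f"unknown columns: {unknown}")

    rows = []
    for line in FILES[meta.location]:
        fields = line.split(meta.delimiter)
        # Apply the types recorded in the metastore to the raw string fields.
        record = {name: typ(value) for (name, typ), value in zip(meta.columns, fields)}
        rows.append(tuple(record[c] for c in wanted))
    return rows


if __name__ == "__main__":
    print(driver_select("page_views", ["user_id", "url"]))
```

Separating metadata from data in this way is what allows the same files to be queried through different table definitions without rewriting them.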

April 06, 2022

very useful for OLTP

Score 10 out of 10
Vetted Review
Verified User
Incentivized
Queries run very fast, and writing Hive queries takes very little time compared with writing MapReduce code. It is very easy to write queries, including joins, in Hive.
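The reviewer's point about joins can be seen by comparing a hand-written reduce-side join (the map, shuffle, and reduce phases one would otherwise code by hand) with the same join written declaratively. This is a minimal sketch under stated assumptions: plain Python simulates the MapReduce phases in memory, sqlite3 stands in for the SQL engine, and the tables and data are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical input tables.
USERS = [(1, "alice"), (2, "bob")]                    # (user_id, name)
ORDERS = [(101, 1, 30.0), (102, 2, 12.5), (103, 1, 7.5)]  # (order_id, user_id, amount)


def mapreduce_join(users, orders):
    """Reduce-side join written by hand: tag, shuffle by key, combine per key."""
    # Map phase: emit (join_key, (source_tag, payload)) for every record.
    mapped = [(uid, ("U", name)) for uid, name in users]
    mapped += [(uid, ("O", (oid, amount))) for oid, uid, amount in orders]

    # Shuffle phase: group all values that share a join key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: pair each user record with each order record for the key.
    out = []
    for values in groups.values():
        names = [p for tag, p in values if tag == "U"]
        order_rows = [p for tag, p in values if tag == "O"]
        for name in names:
            for oid, amount in order_rows:
                out.append((oid, name, amount))
    return sorted(out)


def sql_join(users, orders):
    """The same inner join as a single declarative query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
    con.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO users VALUES (?, ?)", users)
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = con.execute(
        "SELECT o.order_id, u.name, o.amount "
        "FROM orders o JOIN users u ON o.user_id = u.user_id "
        "ORDER BY o.order_id"
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(mapreduce_join(USERS, ORDERS) == sql_join(USERS, ORDERS))
```

Both functions produce the same rows; the difference is that the SQL version leaves the tagging, shuffling, and pairing to the engine.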
akshay kashyap | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
You can use Apache Hive to query a large data warehouse that updates and appends records either in batches or in real time. Hive queries can produce output in the desired format, which you can feed into a reporting tool such as Tableau or consume directly from Python.
September 23, 2020

Big Data the SQL way

Score 8 out of 10
Vetted Review
Verified User
Incentivized
Apache Hive is very well suited for those who are very familiar with SQL query syntax. Thanks to its easy-to-use syntax, it can really help in scenarios where a conventional database cannot be used to analyze big datasets.

On the other hand, it's definitely slower than some alternatives such as Spark. It's also not recommended for processing small datasets; pandas and other standard data-loading libraries are better suited to those.
Kristjan Gannon | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Apache Hive is useful for regular reporting and data analysis. For ad-hoc analysis and debugging, however, the cycles of querying, getting feedback, and debugging queries can be quite long.
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Hive is suitable for big data analysis tasks on top of historical data storage, but it is not well suited to real-time data (in that case, Cassandra should be considered). And since it is not full SQL, it is very good for read-only operations and on-the-fly aggregation; however, if data modification and transactions are needed, it is not suitable.
Ananth Gouri | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
I would definitely recommend Apache Hive if a colleague asked. Especially for people working at academic institutions, it can be used to demonstrate programs like word count, tab count, space count, newline count, and other related programs with a basic HiveQL setup.

The only underlying problem could be that Apache Hive is designed to run on the Apache Hadoop ecosystem. People who are not comfortable with a Linux tree-structured file system, or who are unlikely to use a Linux OS at all, might not like using Hive.
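As an illustration of the word-count exercise mentioned above, the sketch below spells out in plain Python what a HiveQL word count computes. The HiveQL in the comment is one common way to write it (using Hive's built-in `split` and `explode` functions); the table `docs` and column `line` are hypothetical, and the Python is an in-memory analogue, not something that runs against Hive.

```python
from collections import Counter

# A HiveQL word count is commonly written along these lines
# (table `docs` and column `line` are hypothetical):
#
#   SELECT word, COUNT(*) AS n
#   FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w
#   WHERE word <> ''
#   GROUP BY word;


def word_count(lines):
    """In-memory analogue of the query above."""
    # split(line, ' ') + explode(...): one row per token;
    # WHERE word <> '': drop empty tokens produced by repeated spaces.
    words = [word for line in lines for word in line.split(" ") if word]
    # GROUP BY word with COUNT(*).
    return dict(Counter(words))


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
```

The same shape (tokenize, explode into rows, group and count) adapts to the other counting exercises, such as counting tabs or newlines, by changing what is split on.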
Nicolas Hubert | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Apache Hive acts as a hub where information is stored and can then be smoothly read and analyzed by BI analysts to make wise, data-driven decisions. Users can read, write, and manage data, too. This only requires intermediate SQL knowledge, and SQL is quite easy to learn. I cannot think of any scenario where Apache Hive would not be appropriate.
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Apache Hive is well suited for organizations looking for an initial tool to begin their process of managing their data warehouse as it is open-source and relatively easy to set up. This works well with some legacy systems and many consoles support this. While Hive used to be quite revolutionary, it has fallen behind many other tools that are more performant or specialized for managing DBs, writing queries, and partitioning tables.
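Partitioning, mentioned above, is one of the main ways Hive limits how much data a query reads: each partition of a partitioned table is stored in its own `key=value` subdirectory (for example `dt=2022-04-12`) under the table's location, so a filter on the partition column lets the engine skip whole directories. The sketch below models that pruning in plain Python; the table, paths, and rows are hypothetical, and it is not how Hive is implemented internally.

```python
# Hypothetical table "events", partitioned by dt, in Hive-style key=value directories.
PARTITIONS = {
    "/warehouse/events/dt=2022-04-11": [("alice", "click")],
    "/warehouse/events/dt=2022-04-12": [("bob", "view"), ("alice", "view")],
    "/warehouse/events/dt=2022-04-13": [("carol", "click")],
}


def partition_value(path, key):
    """Extract the value of `key` from a path segment like 'dt=2022-04-12'."""
    for part in path.split("/"):
        if part.startswith(key + "="):
            return part[len(key) + 1:]
    return None


def scan(dt_from, dt_to):
    """Roughly what WHERE dt BETWEEN dt_from AND dt_to does: read only matching partitions."""
    read_dirs, rows = [], []
    for path in sorted(PARTITIONS):
        dt = partition_value(path, "dt")
        # ISO-formatted dates compare correctly as strings.
        if dt is None or not (dt_from <= dt <= dt_to):
            continue  # pruned: this directory is never opened
        read_dirs.append(path)
        rows.extend(PARTITIONS[path])
    return read_dirs, rows


if __name__ == "__main__":
    dirs, rows = scan("2022-04-12", "2022-04-13")
    print(dirs)
    print(rows)
```

Choosing a partition column that matches the most common filters (often a date) is what makes this pruning pay off.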
Score 9 out of 10
Vetted Review
Verified User
Incentivized
This is best suited for data analysts and scientists; it's not a programmer's tool. You may still need an RDBMS to read data from, as updates and deletes can get a bit complicated. You can run batch jobs, but these will have to be facilitated by additional tools.
It's good for fast query processing and for storing large amounts of data.
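One reason updates and deletes "get a bit complicated": on Hive tables without ACID (transactional) support, rows cannot be modified in place, and a common workaround is to rewrite the whole affected partition with INSERT OVERWRITE ... SELECT. The sketch below models that pattern in plain Python with hypothetical data and names; newer Hive versions also support UPDATE and DELETE on transactional tables, which this sketch does not cover.

```python
# Hypothetical partitioned table: partition directory -> list of (order_id, status) rows.
ORDER_PARTITIONS = {
    "/warehouse/orders/dt=2022-04-12": [(1, "open"), (2, "open")],
    "/warehouse/orders/dt=2022-04-13": [(3, "open")],
}


def overwrite_partition(table, partition, transform):
    """Rebuild one partition wholesale instead of editing rows in place.

    `transform` returns the new row, or None to drop (delete) the row.
    Other partitions are untouched, and the input dict is not mutated.
    """
    rewritten = []
    for row in table[partition]:
        new_row = transform(row)
        if new_row is not None:
            rewritten.append(new_row)
    updated = dict(table)
    updated[partition] = rewritten
    return updated


def close_1_drop_2(row):
    """Hypothetical change: 'UPDATE' order 1 to closed, 'DELETE' order 2."""
    order_id, status = row
    if order_id == 2:
        return None
    if order_id == 1:
        return (order_id, "closed")
    return row


if __name__ == "__main__":
    new_table = overwrite_partition(ORDER_PARTITIONS, "/warehouse/orders/dt=2022-04-12", close_1_drop_2)
    print(new_table)
```

The cost of a one-row change is rewriting the entire partition, which is why this style of table suits append-heavy analytics better than frequent row-level edits.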
Tejaswar Rao | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  1. To query large sets of data
  2. Faster access compared to traditional databases
  3. OLAP projects
  4. Data warehousing projects
  5. To get insights from gigabytes or terabytes of data
  6. Rule-based projects, and to identify patterns in data
  7. For applying transformations on large sets of data
  8. Faster response time than traditional databases
  9. Able to connect with other Hadoop components
  10. For complex analytics across different data formats
Bharadwaj (Brad) Chivukula | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized

We are trying to mine data from massive datasets for a wide variety of purposes (debugging production issues, creating business metrics, models, and forecasts, among other things). We have been able to do this very easily using our data warehouse and a combination of Hive and Pig. It makes things simpler for business analysts (BAs), as they are familiar with SQL and can adapt to Hive without much technical know-how.
