Apache Hive Reviews

83 Ratings
Score 8.1 out of 10

Reviews (1-25 of 27)

June 02, 2021
akshay kashyap | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
We use Apache Hive on an on-premises big data setup built on Cloudera servers. We chose it because it queries efficiently over a distributed system and speeds up queries through parallel execution. We store metrics such as user info, purchase history, transactions, and preferences in HDFS, then use Apache Hive to query that data and run analytics on it.
  • Simple query language built on top of the MapReduce paradigm.
  • Provides parallel execution over a distributed system.
  • Tabular format and connectors available for all cloud platforms.
  • Complex joins may take time to execute due to shuffling of data.
  • Static queries mostly.
  • Slower than Apache Spark by almost 100 times.
  • Dependent on external memory and storage to execute.
You can use Apache Hive to query a large data warehouse that is updated or appended to, either in batches or in real time. Hive queries can return output in the format you need, which you can feed to a reporting tool such as Tableau or consume directly from Python.
June 02, 2021
Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Hive is used as an ETL tool for Hadoop databases. HiveQL is used by all departments to query data from Hive databases and tables. It provides an easy interface to extract and load data from Hadoop data tables. Hive provides a data warehouse infrastructure built on top of Hadoop for querying and analyzing structured data residing in HDFS.
  • Hive queries are very efficient and fast, and produce results in seconds.
  • Provides features like ETL, reporting, and analytics on top of Hadoop file systems.
  • Supports SQL-like syntax to query data from Hive tables.
  • Supports multiple data formats.
  • No updates and support as it is open source.
  • Not suitable for online transaction processing (OLTP) systems.
  • No sub-query support.
Well suited to querying data from Hadoop systems, as it provides an easy SQL-like syntax called HiveQL for extracting data. Not suited for complex queries, as it doesn't support sub-query processing.
December 28, 2020
Manjeet Singh | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
I have used Apache Hive at my last three companies, where it is used by multiple departments spanning data analytics, engineering, data science, and product management.
It is used for fetching and generating all the product metrics and for fetching legal data whenever required. All the product history data is stored in it.
It's a one-stop, cheaper solution for storing and fetching all the analytics data.
  • It is very easy to set up and start with
  • Apache Hive is a cheaper solution for data warehousing and aggregation compared to other products
  • One of the cons is the speed, which is slightly lower compared to other enterprise solutions like BigQuery
  • It also needs to be maintained by the company itself
It's reasonably easy to set up, and the cost is well within budget.
If we need aggregations over terabytes of data to finish within seconds, then we may have to look for other solutions.
September 22, 2020
Partha Protim Pegu | TrustRadius Reviewer
Score 6 out of 10
Vetted Review
Verified User
We are currently using Apache Hive as a data warehousing tool to run SQL queries on multiple databases and file systems. Since we are also using Hadoop, we later integrate those queries with Hadoop for data processing. It is a great tool that can be used instead of MySQL, and we are happily using it on multiple projects.
  • Its ability to integrate with Hadoop
  • Multiple users can query data simultaneously
  • Conversion of varieties of data formats within Hive
  • ETL can be done easily
  • It is used only for OLAP and not used for OLTP
  • Sub queries not supported
If you have a lot of tabular data, then Apache Hive is a good option for storing it and running queries on it. The results can later be used for batch processing. Hive can also be used for archiving old data from RDBMS tables, which is much cheaper. It's also well suited when you have expensive queries like big join queries. I would not suggest Hive if you have a lot of sub-queries to run.
Hive also has a community platform of its own just like other Hadoop frameworks. Most of the queries/problems are resolved in the community itself. We can just post our problems or get in touch with a specific user and get the issue resolved. Otherwise there is always the product support team for any resolution.
September 21, 2020
Kristjan Gannon | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
We use Apache Hive to make data-driven decisions. It is used from finance to engineering to sales. It helps aggregate our massive data sets into distilled information.
  • Flexibility through schema on read
  • Familiar SQL-like query language
  • Functions for complex queries and analysis
  • Slower processing than other tools on the market
Apache Hive is useful for regularly reporting and analyzing data. In terms of ad-hoc analysis and debugging, the cycles can be quite long for querying, feedback, debugging queries, etc.
Apache Hive is open source, so there is a solid community and plenty of information around it for support.
September 19, 2020
Ananth Gouri | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
As we all know, Apache Hive sits on top of Apache Hadoop and is basically used for data-related tasks, mostly at a higher level of abstraction. I work as an Assistant Professor at NIE, Mysuru, and I have been a user of Apache Hive since I first taught Big Data Analytics as a PG course to my students.
In one of those technical sessions I was supposed to demonstrate a word count program on a novel downloaded from Project Gutenberg. I was able to download the novel, load it into the Hadoop platform, and execute a HiveQL query (HiveQL is the SQL-like syntax used by Apache Hive) to show a few unique words, their counts, and related examples.
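As a rough illustration of the word count demonstration described above, here is a minimal Python sketch of the same logic. The HiveQL in the comment is only an approximation of what such a query might look like; the table name `docs` and column name `line` are hypothetical, and the lowercasing step is an added normalization that the reviewer does not mention.

```python
import re
from collections import Counter

# Illustrative HiveQL for a word count (hypothetical table `docs`, column `line`):
#   SELECT word, COUNT(*) AS cnt
#   FROM docs LATERAL VIEW explode(split(line, '\\s+')) w AS word
#   GROUP BY word;


def word_count(lines):
    """Count words across lines, mirroring split -> explode -> GROUP BY."""
    counts = Counter()
    for line in lines:
        # split() then explode(): one "row" per whitespace-separated word
        for word in re.split(r"\s+", line.strip().lower()):
            if word:  # skip empty strings produced by blank lines
                counts[word] += 1
    return counts


# Small stand-in for the text of a downloaded novel
sample = ["It was the best of times", "it was the worst of times"]
for word, cnt in word_count(sample).most_common(3):
    print(word, cnt)
```

On a real cluster the grouping would be distributed across nodes; this single-process version only shows the shape of the computation.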
  • The capability to handle large amounts of data and its querying process.
  • A syntax similar to SQL is an added advantage.
  • Active developer support and a community always ready to help.
  • Ease of usage.
  • Resource consuming at times. That may be because I was using a larger object file.
  • Needs update or modify functionality. This is a minimal CRUD requirement.
I would definitely recommend Apache Hive if sought by a colleague. Especially for people who are working at academic institutions, they can demonstrate programs like word count, tab count, space count, new lines count, and other related programs - with a basic setup of a HiveQL.

The only underlying problem could be that the Apache Hive is designed to run on the Apache Hadoop ecosystem. People who are not comfortable using a Linux tree structure based File System or even people who are not likely to use a Linux OS might not like to use Hive.
Apache Hive is a FOSS project. The support offered by open source developer communities hardly needs comment in general, but Hive in particular has tremendous developer support and excellent documentation. Much of the support you need can be gathered from the community.
September 18, 2020
Nicolas Hubert | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
It is only used in the IT department, mainly by IT engineers, Data Scientists, and Business Analysts with a technical background. It requires some time to master this tool, so this is only for engineer-related positions.
  • Reading databases
  • Writing databases
  • Storing databases
  • Distributed databases
  • Improvement techniques for handling Relational Data
  • Advanced optimizations
  • Transactions memory
Apache Hive acts as a hub where information is stored so that BI analysts can read and analyze it smoothly and make wise, data-driven decisions. Users can read, write, and manage data, too. This only requires intermediate SQL knowledge, and we all know learning SQL is quite easy. I cannot think of any scenario where Apache Hive would not be appropriate.
Documentation is so thorough and the community so huge that I never actually had to contact support for Apache Hive. I always found an answer to my questions on some forums. So I put a mark of 9 out of 10.
October 07, 2020
Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
We are using Apache Hive in our whole company as the main data warehouse software solution covering all needed data warehousing tasks. It is being used to interact with huge datasets located in a distributed storage. Since we are using a variety of data formats Apache Hive enables us to query anything with unified SQL syntax.
  • Gives access to files stored in a variety of data storage systems
  • Facilitates ETL operations, reporting and data analysis
  • Supports queries expressed in a declarative language very similar to SQL
  • Not suitable for online transaction processing workloads
  • Much more complicated than any typical RDBMS
  • Licensing model based on Apache License 2.0
Apache Hive fits perfectly if scalability, performance and fault-tolerance are essential for your data warehousing needs. If you are required to process batch jobs Apache Hive will keep your customers happy. On the other hand, if you are working with web logs data and append-only flat-file type of data, then there are better solutions on the market.
Since Apache Hive is an open source, community-developed project, the easiest way to get help is to reach out through the issue tracker or mailing list. The responses we got there were always very quick, and the answers were usually relevant. Another great way to solve issues is the easily accessible version control system with well-documented source code.
September 20, 2020
Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Hive plays a vital role in our company, together with Hadoop storage. It makes querying and aggregation much easier for data analysts with a traditional DBA background, while still benefiting a lot from the performance boost brought by Hadoop. It makes big data analysis more feasible and closer to the daily business context.
  • The SQL-like query interface is the core value of Hive.
  • It supports various data formats stored and also allows indexing.
  • It is fast.
  • No transaction support.
  • No sub-query support.
  • Can only deal with the cold data (non-real time).
Hive is suitable for big data analysis tasks on top of historical data storage, but it is not quite suitable for real-time data (in that case, Cassandra should be considered). Because it is not real SQL, it is very good for read-only operations and on-the-fly aggregation; however, if data modification and transactions are needed, it is not suitable.
We take advantage of the Apache community, which provides a lot of valuable suggestions and support.
September 18, 2020
Anonymous | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Our company primarily uses Apache Hive to manage our data warehouse, since it lets us query multiple databases. We use Hive to partition our tables and to monitor query performance on very custom data queries. Hive is used only by our data analysts and an overseas data warehouse team, with only a few shared licenses on our virtual machines.
  • Monitor query performance
  • Manage tables in the data warehouse
  • Uses standard SQL
  • UI is quite dated and not intuitive
  • Open-source, so does not have consistent updates or support
  • Not the most optimal for ETL processes
Apache Hive is well suited for organizations looking for an initial tool to begin their process of managing their data warehouse as it is open-source and relatively easy to set up. This works well with some legacy systems and many consoles support this. While Hive used to be quite revolutionary, it has fallen behind many other tools that are more performant or specialized for managing DBs, writing queries, and partitioning tables.
It is open-source software with little incentive to innovate or to provide consistent support for pushing out updates and changes to the platform.
September 23, 2020
Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
I work as a Research Assistant, where I have to process tons of data to produce appropriate findings. Our NLP lab used it for all its big data processing, for example removing URLs and counting specific words. Mainly it assisted in all the processing and cleaning of the big datasets we collected for our research.
  • The SQL-like query language is very familiar to all the CS students. Hence, it's easy to use.
  • I used it on a server so I realize it is very scalable and can be used to process small and big datasets.
  • I particularly liked the UDF functionality where the user could define functions to produce particular output.
  • Transactions are not supported
  • Lack of subqueries made some tasks achievable only when completing one query and then the subsequent one
  • It is not as fast as Spark.
Apache Hive is very well suited for those who are familiar with SQL query syntax. Thanks to its easy-to-use syntax, it can really help in scenarios where a conventional database cannot be used for analysis of big datasets.

On the other hand, it's definitely slower than some alternatives such as Spark. Also, it's not recommended for processing small datasets. Pandas and other ordinary data loading libraries are better for dealing with small datasets.
While I used Hive a lot recently, I never faced issues that led me to look for technical support. The documentation for developer reference is good enough, although I like the documentation for Spark much more. Since Hive follows SQL syntax, it's very easy to find references for queries online.
August 29, 2018
Kartik Chavan | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Apache Hive is used in our company mainly for big data analysis. It has greatly helped us with data processing and analysis. It is used across the whole organization. The business problem it addresses is storing large data sets and accessing them easily.
  • Querying in Apache Hive is very simple because it is very similar to SQL.
  • Hive produces good ad hoc queries required for data analysis.
  • Another advantage of Hive is that it is scalable.
  • Apache Hive isn't designed for and doesn't support online processing of data.
  • Sub queries not supported.
  • Updating the data can be a problematic task.
It is perfectly suited for analytics.
February 17, 2018
Jordan Moore | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Hive allows us to run SQL queries against data sitting in Hadoop.
  • One of the standard SQL on Hadoop implementations. Comes installed in both HDP and CDH Hadoop distributions.
  • Hive LLAP (Live Long and Process) has recently made significant improvements on long-running queries.
  • Allows BI tools to run analysis over Hadoop data.
  • Allows various relational databases for its metastore. These include MySQL, Postgres, Derby, or Oracle.
  • Needs to keep up with execution engine improvements. Hive on Spark or Tez, and then LLAP, are good starts.
  • Overall speed of ad-hoc querying could be improved.
Hive is well-suited for providing an SQL engine on Hadoop, but there are alternative SQL on Hadoop projects that claim to have improvements over Hive.
December 05, 2017
Tejaswar Rao | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
We use Hive for analyzing big sets of data and for developing rule-based applications. We also use it behind visualization tools, where we query large sets of data with Hive to produce the desired visualization. Hive is fast, and data can be imported and exported using other Hadoop components. We can use SQL to access data in Hive, with no need to learn a new language.
  • Can query large sets of data, and is fast when compared to an RDBMS
  • Can use SQL for data access, with no need to learn a new language
  • Can write custom functions (UDFs) in Python and also Java
  • Security roles for different users should be implemented
  • All the functionalities of SQL should be available
  1. To query on large sets of data
  2. Faster access compared to traditional Databases
  3. OLAP projects
  4. Data Warehousing project
  5. To get insights from gigabytes or terabytes of data
  6. Rule based projects and also to identify the patterns in data
  7. For applying transformations on large sets of data
  8. Faster response time than traditional databases
  9. Also able to connect with Hadoop components
  10. For complex analytical and different types of data formats
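The custom Python functions mentioned above are commonly plugged into Hive as streaming scripts: Hive's `TRANSFORM ... USING` clause pipes rows to a script as tab-separated lines on standard input and reads transformed rows back from standard output. The following is a minimal sketch under that assumption; the script name `clean_email.py`, the table `users`, and its columns are hypothetical.

```python
import io
import sys

# Illustrative HiveQL that would call this script (hypothetical names):
#   ADD FILE clean_email.py;
#   SELECT TRANSFORM(user_id, email)
#          USING 'python clean_email.py' AS (user_id, email)
#   FROM users;


def normalize_row(line):
    """Take one tab-separated row (user_id, email) and return the cleaned row."""
    user_id, email = line.rstrip("\n").split("\t")
    return f"{user_id}\t{email.strip().lower()}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Stream rows from stdin to stdout, one transformed row per input row."""
    for line in stdin:
        if line.strip():  # ignore blank lines
            stdout.write(normalize_row(line) + "\n")


# Local demonstration with an in-memory stream instead of real Hive input.
# When run by Hive, the script would simply call main() with its defaults.
main(io.StringIO("1\t Alice@Example.COM \n2\tbob@example.com\n"), sys.stdout)
```

Java UDFs are registered differently (as compiled classes); this sketch only covers the streaming-script route.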
October 25, 2017
Bharadwaj (Brad) Chivukula | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
1. In retail, the business partners are more comfortable querying their own data instead of relying on engineers. Hive solves one of those problems. The main purpose of using Hive is to build reports and analyze data stored in the Hadoop file system.
2. Events are gathered in HDFS by Flume and need to be processed into Parquet files for fast querying. The input data contains variable attributes in the JSON payload, as each customer can define custom attributes.

  • Hive syntax is almost like SQL, so for someone already familiar with SQL it takes almost no effort to pick up Hive.
  • Lets us run MapReduce jobs that parse JSON and generate dynamic partitions in the Parquet file format.
  • Simplifies your experience with Hadoop especially for non-technical/coding partners.
  • Hive doesn't support many features that traditional RDBMS SQL has, so the transition may not be as easy as one would presume.
  • Being open source, it has its share of problems and lack of support; you need to explore community groups for clarification if you are not using one of the big distribution providers like Cloudera or HW.
  • Hive is comparatively slower than its competitors. It's easy to use but that comes with the cost of processing. If you are using it just for batch processing then Hive is well and fine.

We are trying to mine data from massive data sets for a wide variety of purposes (debugging production issues, creating business metrics, models, and forecasts, among other things). We have been able to do this very easily using our data warehouse and a combination of Hive and Pig. It makes things simpler for your BAs, as they are familiar with SQL and can adapt to Hive without much technical know-how.
September 13, 2017
Sameer Gupta | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Hive is currently used across the entire analytics organization at SurveyMonkey. The business problem we solve with it is accessing and storing large data sets (typically logs) in a scalable and accessible place.
  • SQL-like query engine allows easy ramp-up from a standard RDBMS
  • Scalability is great
  • If properly configured, the data retrieval is fantastic
  • The way we currently have it implemented is quite slow, but I believe that's more of our implementation
  • Joins tend to be slow
I think Apache Hive is great for a company just stepping into the big data realm. I think the fact that it's open source allows for a variety of tools to be integrated. The fact that it has HiveQL makes for a great transition from a standard RDBMS to a big data tool. This can be very nice in terms of cost savings, as the ramp-up time for an analyst will be quite low.
February 28, 2017
Praveen Murugesan | TrustRadius Reviewer
Score 6 out of 10
Vetted Review
Verified User
We use Apache Hive across the whole organization. We built our own in-house Hadoop cluster for data warehousing, complementary to the HP Vertica system we were already using. Vertica is limited in how far it scales, and to achieve true scalability and process trillions of records we had to invest in a new solution. Enter Apache Hive. We are very data driven as an organization, so to satisfy people's appetite for data while sticking to a familiar way of querying it (SQL), we decided to invest in Apache Hive as a starting point for our new data infrastructure.
  • Hive, which leverages traditional MapReduce at its core, can be used to process a large amount of data without a problem. Any problem that can be solved with MapReduce can now be simply expressed in SQL.
  • Hive leverages the disk in the case of processing large data and is not limited by physical memory of any one machine (which is a limitation for systems like Presto). Hence it even allows reasonable fact-fact cross joins.
  • Hive is extensible with UDFs. For any common patterns you can quickly write your own function set and it can be leveraged by everyone.
  • Compute Speed - Hive will be my last option to query vs. something like Presto, which has a much smarter query engine. Hive is slow, and I'd use it only if we cannot use something like Presto/Impala.
  • The SQL syntax of Hive is unique and does not conform to ANSI SQL. This is quite painful for beginners.
  • The ability to upsert records would be nice to have. Hive is cumbersome for mutable data where partitions require them to be rewritten. No one has solved this really well. If this is solved - it could be leveraged by many systems.
Process large datasets (especially joins of two large datasets, cross joins etc). Hive is not well suited for generic queries on one table and it can still be very slow. There are better solutions for that (Presto, Impala).
September 13, 2016
Venkata Mallepudi | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Apache Hive is used for data processing and analysis in the company that I am working for. Apache Hive is used by the IT department, and the results it produces are shared across the whole organization. Performing operations on terabytes of data has become easy without worrying much about the complexity involved. Its similarity to SQL-related tools has reduced the difficulty of finding employees with big data skills.
  • Apache Hive works extremely well with large data sets. Analysis over a large data set (for example, 1 PB of data) is made easy with Hive.
  • User-defined functions give users the flexibility to define frequently used operations as functions.
  • The string functions available in Hive have been used extensively for analysis.
  • Joins (especially left join and right join) are very complex, space consuming and time consuming. Improvement in this area would be of great help!
  • More descriptive errors would help in resolving issues that arise when configuring and running Apache Hive.
Apache Hive is well suited in situations where doing aggregations would be very time consuming. Apache Hive returns results faster than many other applications.

Latency that exists when working with small data sets is a situation that needs to be looked at. Apache Hive is less appropriate in that scenario.
May 25, 2016
Tom Thomas | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
I used Hive at an enterprise company where I interned. It was used by the IT department to improve analysis of large datasets stored in the company's Hadoop HDFS. It was also chosen for its support for HiveQL, a SQL-like language that enables queries on large datasets. HiveQL's similarity to SQL also reduced the learning curve for handling big data.
  • Supports SQL-like queries
  • Various storage types including RCFile, HBase, ORC, etc.
  • Supports indexing for acceleration
  • HiveQL does not have all the features of SQL
  • No support for transactions
Hive is very well suited for large enterprise businesses that rely on Hadoop for efficient processing of big data in a distributed cluster. HiveQL also brings familiarity of SQL which speeds up the learning process for new users. However, Hive is not an ideal option for a business where data is frequently changing and dynamic.
June 07, 2018
Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Hive is currently used in our company's data warehouse. It helps us give more structure to our data, and because Hive sits on top of Hadoop's MapReduce engine, it is a big plus when you want to run a complex query and get faster results. This lets our Business Intelligence team use Hive as a self-service querying tool.
  • It's fast!
  • You can store different kinds of data structures here, beyond the standard ones
  • Good scalability
  • Good redundancy too
  • It's not as ACID compliant as an RDBMS. It's a recently added feature and still needs work.
  • This is not the tool to go for online data processing.
  • It does not support sub-queries.
  • It can't process data in real time.
This is best suited for data analysts and scientists; it's not a programmer's tool. You may still need an RDBMS to read data from, as updates and deletes can get a bit more complicated. You can run batch jobs, but this will have to be facilitated by additional tools.
It's good for fast query processing and for storing large amounts of data.
December 22, 2014
Yinghua Hu | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Hive is used by the data team to store the largest datasets of the company. Data is partitioned in Hive and can be queried by Impala.
  • Partitioning to increase query efficiency.
  • SerDe support for different data storage formats.
  • Integrates well with Impala, and data can be queried by Impala.
  • Support for the Parquet format
  • Speed is slower compared to Impala since it uses MapReduce
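To make the partitioning point above concrete, here is a toy Python sketch of why a filter on the partition column can be cheap: rows are grouped by partition value (much like one directory per value), so a query for a single value only has to touch one group. The class, the partition column `dt`, and the sample rows are all hypothetical, and this is not how Hive itself is implemented.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a partitioned table: one bucket of rows per partition value."""

    def __init__(self, partition_key):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)

    def insert(self, row):
        # Route each row to the bucket for its partition value
        self.partitions[row[self.partition_key]].append(row)

    def select(self, partition_value=None):
        """Return (rows, partitions_scanned)."""
        if partition_value is None:
            # No filter on the partition column: every partition is read
            rows = [r for part in self.partitions.values() for r in part]
            return rows, len(self.partitions)
        if partition_value not in self.partitions:
            return [], 0
        # Partition pruning: only the matching partition is read
        return list(self.partitions[partition_value]), 1


events = PartitionedTable("dt")
for dt, user in [("2014-12-01", "a"), ("2014-12-01", "b"), ("2014-12-02", "c")]:
    events.insert({"dt": dt, "user": user})

rows, scanned = events.select("2014-12-02")
print(len(rows), "row(s) read from", scanned, "partition(s)")
```

The trade-off in a real deployment is choosing a partition column that queries actually filter on; too many tiny partitions can hurt as well.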
Hive is a data warehouse and it does not allow for updates and deletions. If data needs to be updated frequently, it might not be the best storage solution for that purpose.
March 01, 2018
Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Hive is not used across the whole organization, but by certain teams that need to query data from our big data store infrastructure, such as HDFS. It provides an interface to interact with and directly query HDFS, similar to the way we work with any relational database. It is a powerful tool for querying big data.
  • Querying, joining and aggregating data
  • Built-in and user-defined functions
  • Speed
  • Support for other big data frameworks like Spark
  • Need better user interfaces for browsing datastores and querying
[Well suited for] enterprises that want to create data warehouses on top of the Hadoop ecosystem for reporting purposes or to get summaries or aggregations from big data. In short, if you have implemented Hadoop, then you need Hive.
September 11, 2017
Anonymous | TrustRadius Reviewer
Score 5 out of 10
Vetted Review
Verified User
Apache Hive is used across our organisation for analytical workloads. We use Hive along with the Hortonworks distribution, and it's a great SQL-on-Hadoop tool.
  • Hive is good for ETL workloads on Hadoop.
  • HiveQL translates SQL-like queries into MapReduce jobs. It also allows custom MapReduce scripts to be plugged in.
  • Hive has two kinds of tables: Hive managed tables and external tables.
  • Use an external table when other applications such as Pig, Sqoop, or MapReduce are also using the file in HDFS. When we drop an external table from Hive, only the metadata is deleted and the original file in HDFS stays.
  • Use Hive for analytical workloads and write-once, read-many scenarios. Avoid updates and deletes.
  • Behind the scenes Hive creates MapReduce jobs. Hive performance is slow compared to Apache Spark.
  • MapReduce writes intermediate outputs to disk, whereas Spark operates in memory and uses a DAG.
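The managed versus external table behavior described in the list above can be modeled with a small Python sketch. This is a toy stand-in for a metastore, not Hive's actual implementation: the class `ToyMetastore`, the table names, and the file paths are all hypothetical, and local temp files play the role of files in HDFS.

```python
import os
import tempfile


class ToyMetastore:
    """Toy stand-in for a metastore: maps table names to (data path, external flag)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, path, external):
        self.tables[name] = (path, external)

    def drop_table(self, name):
        path, external = self.tables.pop(name)  # metadata is always removed
        if not external and os.path.exists(path):
            os.remove(path)  # only a managed table's data is deleted


# Two data files standing in for files in HDFS
workdir = tempfile.mkdtemp()
managed_path = os.path.join(workdir, "managed.csv")
external_path = os.path.join(workdir, "external.csv")
for path in (managed_path, external_path):
    with open(path, "w") as f:
        f.write("id,value\n1,42\n")

store = ToyMetastore()
store.create_table("sales_managed", managed_path, external=False)
store.create_table("sales_external", external_path, external=True)

store.drop_table("sales_managed")
store.drop_table("sales_external")

print("managed data still present:", os.path.exists(managed_path))
print("external data still present:", os.path.exists(external_path))
```

This is why an external table is the safer choice when other tools share the same files: dropping it in Hive cannot remove data those tools depend on.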
Use it for ETL workloads. I prefer to repeat the same workload with Spark and then pick whichever performs better.
April 26, 2017
Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
We use Apache Hive for two main use cases: analyzing our ever-growing data volume for insights and reports, and as part of our ETL pipeline, where we found that writing in SQL-like syntax allows more rapid development while adding little complexity to the overall system.

Apache Hive solves a few issues for us, the main one being the ability to analyze large volumes of data on S3 directly with strong overall performance. We have been able to analyze billions of records in a matter of minutes with a relatively small EC2 cluster using Apache Hive. It also allows our data analysts to simply write SQL and avoids the ramp-up needed for other tools such as Apache Pig.
  • Apache Hive allows us to write expressive solutions to complex problems thanks to its SQL-like syntax.
  • Relatively easy to set up and start using.
  • Very little ramp-up to start using the actual product, documentation is very thorough, there is an active community, and the code base is constantly being improved.
  • Debugging can be messy with ambiguous return codes and large jobs can fail without much explanation as to why.
  • Hive is only SQL-like; while more features are being added, we have found that some things do not translate over (for example, outer joins, inserts, and columns that can only be referenced once in a select).
  • For our ETL jobs it does not seem to be the optimal tool, because tuning and performance are difficult. Apache Pig may be better for heavy processing jobs.
Apache Hive shines for ad-hoc analysis and plugging into BI tools. Its SQL-like syntax allows for ease of use not only for engineers but also for data analysts. In our experience, there are probably more desirable tools to use if you are planning on integrating Hive into your processing pipeline.
February 14, 2017
Anonymous | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Apache Hive is primarily used by data analysts and data engineers at our company. We store most of our data in Hadoop and Apache Hive allows us to access the data faster than by writing MapReduce jobs.
  • Faster than writing MapReduce or scalding jobs to access data in Hadoop.
  • Syntax is essentially the same as that of SQL, making the barriers for entry to start using data low.
  • Apache Hive can be quite slow and is not suitable for interactive querying. Simple queries will take many minutes and more complex queries can take a very long time to finish running.
Apache Hive is suitable for allowing easy access to data stored in Hadoop via a familiar SQL syntax. It is more suitable for one-off data pulls and less suitable for interactive querying due to its speed. For a better interactive querying experience, a solution like Presto would be more suitable.

What is Apache Hive?

Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.

Apache Hive Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

What is Apache Hive's best feature?

Reviewers rate Usability highest, with a score of 8.5.

Who uses Apache Hive?

The most common users of Apache Hive are from Enterprises and the Computer Software industry.