Apache Hive Reviews & Insights

Score: 8 out of 10

95 Reviews and Ratings

Community insights

TrustRadius Insights for Apache Hive are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.

Business Problems Solved

Apache Hive is a versatile software that has been widely used across various departments and organizations for different use cases. It has proven to be particularly helpful in handling large datasets, migrating data between different operating systems, synchronizing programs, and fetching and generating product metrics. Users have found value in using Hive for data analytics, engineering, data science, product management, and IT-related tasks such as improving analysis of big datasets stored in Hadoop HDFS.

Furthermore, Apache Hive has simplified the process of filtering and cleaning data using SQL, reducing the learning curve for handling big data. It allows users to run SQL queries against data in Hadoop, enabling efficient analysis of large datasets without the need to learn a new language. Additionally, Hive has been utilized for building reports, analyzing data stored in the Hadoop file system, processing events gathered in HDFS, and converting them into parquet files for fast querying.

Overall, users have praised Apache Hive for its scalability, accessibility, and cost-effectiveness in storing and retrieving analytics data. It has provided an intuitive solution for storing large datasets, querying big sets of data using SQL, aggregating massive datasets into distilled information for data-driven decision making, and creating external and internal tables in Hadoop/BigData projects. With its ability to process both unstructured and structured data efficiently, Hive has become an essential tool for data analysts, engineers, and business analysts across organizations.

Apache Hive Reviews

35 Reviews

With Apache Hive, you can enter the world of Big Data

Rating: 8 out of 10

April 22, 2022

Use Cases and Deployment Scope

On-premises large-scale data processing is handled by Apache Hive, which runs on Cloudera servers. To use Apache Hive, you need a distributed system that can execute queries efficiently in parallel. Metrics such as user information and purchase history are stored in HDFS and then accessed through Hive queries.

Pros

Simple query language on top of MapReduce.
Parallelism across a distributed system is provided.
Tabular format and interfaces are available on all cloud platforms.

Cons

Due to the shuffled data, complex joins may take a long time to complete.
Execution is dependent on external storage and memory.

Likelihood to Recommend

Apache Hive can query data warehouses whose records are updated and appended in batches or in real time. Reporting tools such as Tableau, as well as Python queries, can be run directly against Hive data sets. Structured data and tables can be accessed using SQL-like syntax. With Hive, you can build tables at various levels of the data lake. It is not the best fit for transactional databases.

Verified User

Engineer in Engineering (201-500 employees)

Vetted Review

2 years of experience

Best Distributed Database in the market

Rating: 6 out of 10

Incentivized

April 19, 2022

Use Cases and Deployment Scope

We use Apache Hive to store a large set of data consisting of huge documents, such as problem statements and their answers, submitted not only by the site owners but also by the users of the site.

Pros

It is easy to store unstructured data
Easy to retrieve data using SQL queries instead of more complicated methods
Large data sets can be stored efficiently

Cons

Apache Hive could provide more flexibility for integration.

Likelihood to Recommend

Apache Hive isn't really useful when we store only small data sets, so sometimes our usage isn't suitable for Hive. We are planning to move to SQL databases if this continues.

Prasanna Kumar TR

Developer and Site Contributor in Research & Development at ForgetCode.com (1-10 employees)

Vetted Review

2 years of experience

Help your dev team!

Rating: 8 out of 10

Incentivized

April 12, 2022

Use Cases and Deployment Scope

We build our data lake and perform queries on large amounts of data. We group data from multiple sources into a common structure, making it easy for our developers to perform complex queries without leaving the simple framework provided by SQL. Although the deployment is not easy, once we have the infrastructure, the work is greatly simplified.

Pros

Simplifies queries for devs
Organize data
Batch process

Cons

Deploy
Maintenance
Support

Likelihood to Recommend

It is great for lab environments and for starting to work with unstructured data when it is not yet clear how we want to treat it. It also allows queries to be improved very quickly, since developers can work with SQL instead of MapReduce. As a point for improvement, troubleshooting in production environments is complicated and requires expert personnel.

Verified User

C-Level Executive in Information Technology (11-50 employees)

Vetted Review

1 year of experience

Spectacular SQL-like interface for accessing Hadoop

Rating: 9 out of 10

Incentivized

April 11, 2022

Use Cases and Deployment Scope

To manage and view Apache Hadoop data in a SQL-like format
To be able to query databases across the organization quickly
To query data for use in Spark projects
To save queries

Pros

Easy-to-use, interactive modern layout
Easy to organize data and view tables and views from across the organization
Fast speed for most queries

Cons

Some queries, particularly complex joins, are still quite slow and can take hours
Previous jobs and queries are not stored sometimes
Switching to Impala can sometimes be time-consuming (i.e. the system hangs, or is slow to respond).
Sometimes, directories and tables don't load properly which causes confusion

Likelihood to Recommend

Apache Hive is well-suited for querying Hadoop. If you use Hadoop, you should consider Hive. It is well-suited for large organizations where there is lots of data that needs to be queried. However, there is significant overhead in setting up and maintaining Hive (and Hadoop in general). Small companies and individuals should consider other means of storing data, such as a SQL database.

Verified User

Engineer in Engineering (10,001+ employees)

Vetted Review

1 year of experience

This system makes active data of value.

Rating: 8 out of 10

Incentivized

April 9, 2022

Use Cases and Deployment Scope

We have used the system to migrate data, either for new versions or because we are switching to another operating program. The software helps us synchronize programs between different operating systems, keeps a consistent history of information, and lets us send already-transformed information to third parties.

Pros

Migration to the cloud is modern and very secure.

Cons

Extraction works best when it is scheduled at set times and in set quantities.
For normal daily use, keep in mind that the system needs ongoing maintenance to work effectively.

Likelihood to Recommend

The software executes work at a large scale and is good for new projects or organizational changes. Data lineage mapping has always been dubious, but with this tool it has given good results. You can store and synchronize data from different departments. The storage process can be manual, but it is best automated.

Camilo Palacios

IT Administrator in Marketing at Logitech (51-200 employees)

Vetted Review

2 years of experience

Best query platform for ETL.

Rating: 6 out of 10

Incentivized

April 8, 2022

Use Cases and Deployment Scope

I used Apache Hive on top of Hadoop for filtering and cleaning data using SQL. It was part of the project I was working on. Apache Hive provides a SQL-like platform where we can run SQL queries. Apache Hive was a perfect choice for cleaning data, as we were using Apache Hadoop and both are Apache products.
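
As a rough illustration of the kind of SQL filtering and cleaning described above, here is a minimal sketch. It runs against an in-memory SQLite database as a local stand-in, not against Hive itself, and the table `raw_purchases`, its columns, and the sample rows are all hypothetical. The query uses only common SQL constructs, so something similar would likely work in Hive, but it has not been tested on a Hive cluster.

```python
import sqlite3

# Hypothetical raw rows: (user_id, email, amount). None marks a missing value.
RAW_ROWS = [
    (1, "  Alice@Example.com ", 10.0),
    (2, None, 5.0),
    (1, "alice@example.com", 7.5),
    (3, "bob@example.com", -1.0),
]

# Filter out rows with missing emails or negative amounts, normalize the
# email text, then aggregate per user.
CLEANING_QUERY = """
SELECT user_id,
       LOWER(TRIM(email)) AS email,
       SUM(amount) AS total_amount
FROM raw_purchases
WHERE email IS NOT NULL
  AND amount >= 0
GROUP BY user_id, LOWER(TRIM(email))
ORDER BY user_id
"""


def clean_purchases(rows):
    """Load rows into an in-memory SQLite table and run the cleaning query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE raw_purchases (user_id INTEGER, email TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO raw_purchases VALUES (?, ?, ?)", rows)
        return conn.execute(CLEANING_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in clean_purchases(RAW_ROWS):
        print(row)
```

With the sample rows, the two records for user 1 collapse into one cleaned row, and the rows with a missing email or a negative amount are dropped.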

Pros

Filtering data
Cleaning data
SQL-like interface
Integrates with Hadoop

Cons

Uses a lot of memory
Not compatible with other databases like Postgres, MySQL
Limited support
Slow compared to other interfaces

Likelihood to Recommend

Apache Hive is best for ETL (Extract, Transform, Load) purposes. It gives its best performance when integrated with the Hadoop Distributed File System. It's also very good for performing mathematical operations and when the data is organized and structured. It can handle large volumes of data (petabytes) but requires a lot of memory in the system. It supports both unstructured and structured data but works best with structured data.

Omkar Marne

Research Application Software Engineer in Information Technology at University of North Carolina at Charlotte (1001-5000 employees)

Vetted Review

1 year of experience

It is an advance to the ease of the processes

Rating: 8 out of 10

Incentivized

April 8, 2022

Use Cases and Deployment Scope

The software is intuitive from the first steps. One of the first features we noticed is that the software does not allow duplicate files to be stored. It is advanced software that constantly learns and develops from data. The first phase is very effective: the information is analyzed and checked in detail.

Pros

The unification of the data will help to establish the commercial criteria.
We are sure that the data is protected

Cons

If you try to extract an excessive amount of data, the system will become slow
There is a risk that the system collapses under the amount of data

Likelihood to Recommend

In addition to the information being quickly accessible through the established security protocols, it has helped us as users to maintain a fairly comfortable data processing flow. It is more profitable to process the data in batches, and we have been able to unify data from different sources.

Pablo Gonzalez

Internet Marketing Manager in Marketing at MKTi México (51-200 employees)

Vetted Review

2 years of experience

Capabilities of Apache Hive

Rating: 8 out of 10

Incentivized

April 7, 2022

Use Cases and Deployment Scope

The main purpose for using Apache Hive was to get insights from data: analyzing the data and using it to make informed business decisions. The interface also works similarly to SQL, so it is easy for a new person to understand.

Pros

It can be used to retrieve data from a database, as with SQL.
We can partition the data and distribute it among the clustered machines
Easily scalable, which gives capability of running analytics at a larger level

Cons

No support for working with Unstructured data.
ACID properties are not followed as in a conventional database, which often creates confusion
Supports OLAP environments only; OLTP is not supported

Likelihood to Recommend

If you have a workforce that knows SQL and you need to explore large-scale data and get insights from it, then Apache Hive is perfect for you. If you have experienced people who have worked on big data before, then using Splunk is better. For starting the journey into data-driven decisions and data analytics, it is better to use Apache Hive first.

Verified User

C-Level Executive in Product Management (51-200 employees)

Vetted Review

1 year of experience

Excellent big data warehouse solution

Rating: 9 out of 10

Incentivized

April 7, 2022

Use Cases and Deployment Scope

Apache Hive is an open-source data warehouse solution built on top of Hadoop that helps to analyze a very large amount of data.
Our use case/scope is a large data analytics project where the data frequency and velocity are very high. Apache Hive is very useful for processing both unstructured and structured data in a seamless way. It reduces our need to write complex queries because it is built around SQL, and we have an engineering team that is very proficient in writing SQL queries to process big data with Apache Hive.
We have identified no business issues using the solution.

Pros

Apache Hive supports external data tables.
Supports data partitioning to improve overall performance.
Apache Hive is a reliable and scalable solution.
Apache Hive supports writing ad-hoc queries as well.
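
To make the external table and partitioning points above more concrete, here is a minimal sketch of a helper that assembles a HiveQL `CREATE EXTERNAL TABLE` statement as a string. The function name `build_external_table_ddl`, the `events` table, its columns, and the `/data/events` location are hypothetical, chosen only for illustration. The helper does no validation or escaping of its inputs, and the generated statement has not been run against a Hive cluster here.

```python
def build_external_table_ddl(table, columns, partition_columns, location):
    """Return a HiveQL CREATE EXTERNAL TABLE statement as a string.

    columns and partition_columns are lists of (name, hive_type) pairs.
    Inputs are trusted as-is: this sketch does no validation or escaping.
    """
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(
        f"{name} {hive_type}" for name, hive_type in partition_columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        f")\n"
        f"PARTITIONED BY ({parts})\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table layout, used only as an example.
    print(
        build_external_table_ddl(
            table="events",
            columns=[("user_id", "BIGINT"), ("action", "STRING")],
            partition_columns=[("event_date", "STRING")],
            location="/data/events",
        )
    )
```

Because the table is external, dropping it in Hive removes only the metadata and leaves the files at the given location in place. Partitioning by a column such as a date lets queries that filter on that column skip unrelated directories.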

Cons

Apache Hive is not best suited for OLTP-based jobs.
Sometimes we observed high latency while querying data.
Limitations on row-level data updates.
Training materials need improvement.

Likelihood to Recommend

Apache Hive is a data warehouse/ETL solution that is used for processing big data for analytics and visualizations. Apache Hive has a great architecture that makes it very well suited for organizations.
The Metastore is used for storing metadata for each table and its schema. The Driver operates as a controller for the execution of statements. Other components, such as the Optimizer, CLI, and Thrift Server, enable big data processing and transformation.

Verified User

Program Manager in Information Technology (201-500 employees)

Vetted Review

2 years of experience

very useful for OLTP

Rating: 10 out of 10

Incentivized

April 6, 2022

Use Cases and Deployment Scope

We use Apache Hive to process large data sets and get the output with less processing time. The framework is very useful for data processing and analytics purposes.

Pros

Used in data warehouses, similar to ETL tools.
SQL-like interface gives access to data stored in various database groups.
Enables analytics at massive scale.

Cons

The way the framework is developed could be improved.
OLTP is not supported.
Does not offer real time queries.

Likelihood to Recommend

Keeps queries running very fast and takes very little time to write Hive queries in comparison to MapReduce code. Very easy to write queries including joins in Hive.

Verified User

Administrator in Information Technology (1001-5000 employees)

Vetted Review

2 years of experience