TrustRadius: an HG Insights company

Apache Hive

Score 8 out of 10

95 Reviews and Ratings

What is Apache Hive?

Apache Hive is database/data warehouse software that supports data querying and analysis of large datasets stored in the Hadoop distributed file system (HDFS) and other compatible systems, and is distributed under an open source license.

This system makes active data valuable.

Use Cases and Deployment Scope

We have used the system to migrate data, either to new versions or because we are moving to another operating program. The software helps us synchronize programs across different operating systems, keeps a consistent history of information, and lets us send already-transformed information to third parties.

Pros

  • Migration to the cloud is modern and very secure.

Cons

  • Extractions work best when scheduled at fixed times and in fixed quantities.
  • For normal daily use, the system needs ongoing maintenance management to keep it working effectively.

Most Important Features

  • You can use automation to schedule recurring tasks, such as separating information based on criteria you set.
  • The system is easy to use without needing to understand it programmatically.

Return on Investment

  • When developing projects, you obtain correct figures and accurate information about what you need or what you have to develop.
  • We are currently trying to integrate it with other tools, and we expect the software to help with that.

Alternatives Considered

ACES (Automated Compliance and Evaluation System) Web Audit Technology, APARAVI and data intelligence & automation platform

Other Software Used

Alight Global Payroll (formerly NGA HR), Agile Hive - SAFe in Jira, Amazon EC2 Auto Scaling

Best query platform for ETL.

Use Cases and Deployment Scope

I used Apache Hive on top of Hadoop to filter and clean data using SQL. It was part of the project I was working on. Apache Hive provides a SQL-like platform where we can run SQL queries. It was a perfect choice for cleaning data because we were already using Apache Hadoop, and both are Apache products.

Pros

  • Filtering data
  • Cleaning data
  • SQL-like interface
  • Integrates with Hadoop

Cons

  • Uses a lot of memory
  • Not compatible with other databases such as PostgreSQL and MySQL
  • Limited support
  • Slow compared to other interfaces

Most Important Features

  • Integrates with Hadoop
  • Large size data analysis
  • Query optimization

Return on Investment

  • Fast results
  • Reduced time complexity
  • Code debugging is easy

Alternatives Considered

Apache Hadoop

Other Software Used

Apache Hadoop, PostgreSQL, Microsoft SQL Server

It is a step forward in making processes easier

Use Cases and Deployment Scope

The software is intuitive from the first steps. One of the first features we noticed is that it does not allow duplicate files to be stored. It is advanced software that constantly learns and develops from the data. The first phase is very effective: the information is analyzed and checked in detail.

Pros

  • The unification of the data will help to establish the commercial criteria.
  • We are sure that the data is protected

Cons

  • If you try to extract an excessive amount of data, the system will become slow
  • There is a risk that the system collapses under the amount of data

Most Important Features

  • The information is transformed according to the storage requirements we set
  • We can track the information we upload constantly

Return on Investment

  • It used to be complicated because of the many lines of connector setup, but today you just need to know where to click
  • The software ensures the quality of the data, prepares and cleans it

Alternatives Considered

Dataddo

Other Software Used

Automation Anywhere IQ Bot, AutomationEdge, (EOL) Cisco CloudCenter

Best Distributed Database on the market

Use Cases and Deployment Scope

We use Apache Hive to store a large dataset of huge documents, such as problem statements and their answers, submitted not only by the site owners but also by the site's users.

Pros

  • It is easy to store unstructured data
  • Easy to retrieve using SQL queries instead of more complicated methods
  • Large datasets can be stored efficiently

Cons

  • Apache Hive could offer more flexibility in integration.

Most Important Features

  • MapReduce
  • Using SQL queries to operate on the data

Return on Investment

  • It made reading and inserting large datasets easy and performant.
  • It helped the site load faster

Alternatives Considered

Microsoft SQL Server, Elasticsearch and Google BigQuery

Other Software Used

Microsoft SQL Server, Elasticsearch, Google BigQuery

Apache Hive

Use Cases and Deployment Scope

1. Used Apache Hive to create external and internal tables in Hadoop/big data projects on the Cloudera and Azure platforms.
2. Apache Hive supports different file formats for creating tables, including CSV, Parquet, Avro, and JSON.
3. Apache Hive can store billions of records in distributed storage and retrieve them efficiently.
4. Apache Hive uses Spark, Tez, or MapReduce as the backend engine for computation.
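The difference between external and internal (managed) tables, and the per-format storage clause, can be shown with a minimal Python sketch that only builds HiveQL CREATE TABLE text. It does not connect to Hive. The function name create_table_ddl, the format mapping, and the example table and path are hypothetical. CSV is mapped to a comma-delimited TEXTFILE. JSON is left out because its SerDe setup depends on the Hive version.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical mapping from a short format name to a HiveQL storage clause.
STORAGE_CLAUSES = {
    "csv": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\nSTORED AS TEXTFILE",
    "parquet": "STORED AS PARQUET",
    "avro": "STORED AS AVRO",
    "orc": "STORED AS ORC",
}


def create_table_ddl(
    name: str,
    columns: Sequence[Tuple[str, str]],
    fmt: str,
    external: bool = False,
    location: Optional[str] = None,
) -> str:
    """Return CREATE TABLE text for a managed (internal) or external Hive table."""
    if fmt not in STORAGE_CLAUSES:
        raise ValueError(f"unsupported format: {fmt}")
    if external and location is None:
        # Simplification for this sketch: external tables must name their data location.
        raise ValueError("this sketch requires a LOCATION for external tables")
    cols = ",\n  ".join(f"{col} {typ}" for col, typ in columns)
    keyword = "CREATE EXTERNAL TABLE" if external else "CREATE TABLE"
    lines = [f"{keyword} IF NOT EXISTS {name} (\n  {cols}\n)", STORAGE_CLAUSES[fmt]]
    if location is not None:
        lines.append(f"LOCATION '{location}'")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    # Hypothetical table name and HDFS path.
    print(create_table_ddl(
        "sales_raw",
        [("id", "INT"), ("amount", "DOUBLE")],
        "parquet",
        external=True,
        location="/data/sales_raw",
    ))
```

The design choice reflects Hive's ownership model. By default, dropping a managed table deletes its data, while dropping an external table removes only the metadata. External tables therefore suit data that other tools also read.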

Pros

  • Apache Hive is fault-tolerant.
  • Apache Hive's latest version supports ACID transactions.
  • Apache Hive supports UPDATE, DELETE and MERGE.

Cons

  • Apache Hive should support ROLLBACK and COMMIT operations.
  • Apache Hive should support XML SerDe.
  • Apache Hive.

Most Important Features

  • Hive supports partitioning and bucketing for faster SQL query results.
  • Hive supports UPDATE and DELETE operations.
  • Data in Apache Hive external tables can be accessed by other applications.
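Partitioning and bucketing speed up queries, and a simplified Python model shows why. This is not Hive's implementation. In HiveQL these are declared with PARTITIONED BY and CLUSTERED BY (...) INTO n BUCKETS. In the sketch, each partition becomes a column=value directory, which matches the layout Hive uses on HDFS. Rows are spread across a fixed number of buckets by hashing one column. A lookup on both columns then reads one partition and one bucket instead of every file. CRC32 is an assumed stand-in for Hive's own bucketing hash, and all names and sample data are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]
FileKey = Tuple[str, int]  # (partition directory, bucket index)


def bucket_for(value: object, num_buckets: int) -> int:
    # Assumption: CRC32 stands in for Hive's own bucketing hash function.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def write_layout(rows: List[Row], partition_col: str, bucket_col: str,
                 num_buckets: int) -> Dict[FileKey, List[Row]]:
    """Place each row in a column=value partition directory and a hashed bucket."""
    files: Dict[FileKey, List[Row]] = defaultdict(list)
    for row in rows:
        part_dir = f"{partition_col}={row[partition_col]}"
        files[(part_dir, bucket_for(row[bucket_col], num_buckets))].append(row)
    return dict(files)


def lookup(files: Dict[FileKey, List[Row]], partition_col: str, part_value: object,
           bucket_col: str, key: object, num_buckets: int) -> List[Row]:
    """Answer WHERE partition_col = part_value AND bucket_col = key by reading
    one partition directory and one bucket instead of scanning every file."""
    target = (f"{partition_col}={part_value}", bucket_for(key, num_buckets))
    return [row for row in files.get(target, []) if row[bucket_col] == key]


if __name__ == "__main__":
    # Hypothetical sample data: two daily partitions, four buckets on user_id.
    rows = [{"dt": d, "user_id": u, "amount": u * 10}
            for d in ("2023-01-01", "2023-01-02") for u in range(10)]
    files = write_layout(rows, "dt", "user_id", num_buckets=4)
    print(lookup(files, "dt", "2023-01-01", "user_id", 7, num_buckets=4))
```

Because the same hash function is used on write and on lookup, the query can skip the other partition entirely. It also skips three of the four buckets in the matching partition.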

Return on Investment

  • Apache Hive helped us manage data on HDFS.
  • Apache Hive helped with data cleansing and data transformation.
  • Apache Hive queries were slow, so we had to use Impala (MPP) to expose the data to end users.

Alternatives Considered

Azure Synapse Analytics (Azure SQL Data Warehouse) and Databricks Lakehouse Platform (Unified Analytics Platform)

Other Software Used

Azure Synapse Analytics (Azure SQL Data Warehouse), Azure Data Factory