What users are saying about
102 Ratings
Score 8.5 out of 10
65 Ratings
Score 8.1 out of 10

Likelihood to Recommend

Apache Spark

Apache Spark has rich APIs for regular data transformations, ML workloads, and graph workloads, whereas other systems may not offer such a wide range of support. Choose it when you need to perform big data transformations as offline jobs; use MongoDB-like distributed database systems for more real-time queries.
Nitin Pasumarthy

Apache Hive

Hive is mostly useful in HDFS environments where legacy BI tools need to access the data. This is fine when user concurrency is low, but it will fall over in any significant multi-user environment.

Pros

  • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
  • Faster execution times compared to Hadoop and Pig Latin
  • Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
  • Interoperability between SQL and Scala / Python styles of munging data
Nitin Pasumarthy
  • Apache Hive works extremely well with large data sets. Analysis over a large data set (for example, 1 PB of data) is made easy with Hive.
  • User-defined functions give users the flexibility to define frequently used operations as functions.
  • The string functions available in Hive have been used extensively for analysis.
Venkata Mallepudi

Cons

  • Memory management is very weak.
  • PySpark is not as robust as Scala with Spark.
  • Spark master HA is needed. It is not as highly available as it should be.
  • Locality should not be a necessity, though it does help performance. I would prefer not to depend on locality.
Anson Abraham
  • Slower than Impala, since it uses MapReduce
Yinghua Hu

Likelihood to Renew

No score
No answers on this topic
Apache Hive 10.0
Based on 1 answer
I do not know of another data warehouse solution that integrates with HDFS as well as Hive does.
Yinghua Hu

Usability

No score
No answers on this topic
Apache Hive 9.0
Based on 1 answer
Hive's support for SQL-like queries improves its usability, since almost every potential Hive user will have had experience with SQL.
Tom Thomas

Alternatives Considered

Compared with MapReduce, Spark was faster and easier to manage, especially for machine learning, where MapReduce is lacking. Apache Storm was also slower and didn't scale as well as Spark does. Spark's elasticity was easier to apply than that of Storm or MapReduce, and managing resources for Spark was easier than for Storm as well.
Anson Abraham
I considered Hive because it is the best-suited option when it comes to accessing larger data sets. Besides, learning HiveQL is comparatively easy.
Kartik Chavan

Return on Investment

  • Overall positive impact on the business for analysis of big data using the Hadoop file system
  • Very well received by data scientists in the business despite its shortcomings in analytical dashboarding
Shiv Shivakumar
  • Installation and setup of the clusters are easy.
  • Effective handling of complex queries and large data sets.
Kartik Chavan

Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details

Apache Hive

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details