What users are saying about Apache Spark and Hortonworks Data Platform
Apache Spark: 103 ratings, trScore 8.5 out of 10
Hortonworks Data Platform: 36 ratings, trScore 8.6 out of 10


Likelihood to Recommend

Apache Spark

The software appears to run more efficiently than other big data tools, such as Hadoop. Given that, Apache Spark is well suited to querying and making sense of very large data sets. It offers many advanced machine learning and econometrics tools, although we use them only partially because they take too long to run once the data sets get very large. The software is not well suited to projects whose data is not big-data scale. Its graphics and analytical output are subpar compared to other tools.
Thomas Young

Hortonworks Data Platform

I find HDP easy to use, and it solves most of the problems people face when managing big data. Evaluating the Hortonworks Data Platform is easy because it is free to download and install on your cluster. The single-node cluster available as the Sandbox is also convenient for POCs.
Piyush Routray

Pros

  • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
  • Faster execution times compared to Hadoop and Pig Latin
  • Easy SQL interface to the same data set for people who are comfortable exploring data declaratively
  • Interoperability between SQL and Scala/Python-style data munging
Nitin Pasumarthy
  • It is a well-suited data platform for big data storage and analysis, offering computational efficiency, good performance, and stability.
  • It is free to use. The online development community is well supported, and Hortonworks engineers seem to have solid experience and skill sets.
  • It is easy and fast to integrate with other tools or components for big data handling and analysis.

Cons

  • Documentation could be better; I usually end up going to other sites and blogs to understand the concepts
  • More algorithms need to be ported to MLlib, as only a few are available, at least for clustering
Nitin Pasumarthy
  • As an open source project collection, it relies strongly on community activity. You still have the option to contract premium consulting or training services.
  • Although it is quickly evolving toward Data Science tooling (e.g. TensorFlow incorporated in HDP 3), it can be cumbersome for a developer transitioning from a traditional IDE to the notebook and data lake metaphor.
  • As expected for a big data infrastructure, the baseline resource requirements are rather high. If you deploy on premises, plan for about 10 machines for a minimal reasonable deployment.
Fernando López Bello

Implementation

Apache Spark: no score (no answers on this topic yet)
Hortonworks Data Platform: 9.0 (based on 1 answer)
Try not to change variable names.
Wonoh Kim

Alternatives Considered

All the above systems work quite well for big data transformations, but Spark really shines with its broader API support and its ability to read from and write to multiple data sources. With Spark, one can easily switch between declarative, imperative, and functional programming styles depending on the situation. It also doesn't need special data ingestion or indexing pre-processing like Presto. Combined with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), Spark code can be developed interactively in Scala or Python.
Nitin Pasumarthy
There are many alternatives; here is a short list:
- Cloudera CDP is the obvious contender or alternative, being a leader in big data platforms
- MapR
Cloud options:
- AWS
- Azure
- Google
It is also worth noting that IBM selected Hortonworks as the replacement for its former BigInsights big data infrastructure offering.
Fernando López Bello

Return on Investment

  • It has had a very positive impact, as it helps reduce the data processing time and thus helps us achieve our goals much faster.
  • Because it is easy to use, we adapted to the tool much faster than to others. It runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and can access a variety of data sources, which makes it very useful.
  • It was very easy for me to learn and use Apache Spark because I come from a Java and SQL background; it shares their basic principles and uses very similar logic.
Carla Borges
  • It is difficult to have a negative impact, because the required investment is not that high.
  • The big open community behind Hortonworks and the related Apache projects makes it easy to get from plan to working results quite quickly.
  • We have seen management meetings where the attendees were impressed by the results achieved with the data lake built on HDP.
Fernando López Bello

Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details

Hortonworks Data Platform

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details