Hortonworks: based in the open, close to success
October 08, 2018

Fernando López Bello | TrustRadius Reviewer
Score 9 out of 10
Vetted Review

Overall Satisfaction with Hortonworks Data Platform

We support organizations and existing customers evolving from traditional business analytics into big data and data science. Hortonworks is a big data stack distribution consisting of 100% open source projects. It mainly stands on Hadoop and Spark components, plus a number of improvements also developed in the open, like Druid and Superset.
  • Hortonworks' two main pillars are HDP (Hortonworks Data Platform) and HDF (Hortonworks Data Flow). The former covers the infrastructure required for building and deploying a data lake, and the latter is about ingestion, in batch or real time.
  • Both HDP and HDF rely entirely on open source projects, which is a distinctive point about Hortonworks.
  • In the last year, new additions like Data Plane and Streaming Analytics Manager (SAM) have taken HDP and HDF several steps further into management and governance.
  • As an open source project collection, it relies strongly on community activity. You still have the option to contract premium consulting or training services.
  • Although it is quickly evolving toward making data science tools available (e.g. TensorFlow is incorporated in HDP 3), it can be cumbersome for a developer transitioning from a traditional IDE to the notebook and data lake metaphor.
  • As expected for a big data infrastructure, the baseline resource requirements are rather high. This means that if used on premises, you need to plan for about 10 machines for a minimal reasonable deployment.
  • It is difficult to end up with a negative impact, because the required investment is not that high.
  • The big open community behind Hortonworks and the related Apache projects makes it easy to get to 'where the rubber meets the road' quite quickly.
  • We have seen management meetings where the attendees were impressed by the results achieved with the data lake built on HDP.
There are many alternatives, but in order to provide a short list:
- Cloudera CDP is the obvious contender or alternative, being a leader in big data platforms
- MapR

Cloud options:
- Azure
- Google

Also, it is worth noting that Hortonworks has been the replacement selected by IBM for its former BigInsights big data infrastructure offering.
It is best used where organizations need to build a data lake from scratch, leveraging its capabilities for ingesting huge volumes from a vast number of different sources, including sensors, logs, text, transactional systems and more.

However, if you just want to try a data science use case, consider whether your data volume really demands such a deployment.