Apache Hadoop Can Save on the Headaches
January 16, 2021

Joe Hughes | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User

Modules Used

  • Hadoop Common
  • Hadoop Distributed File System
  • Hive
  • Spark

Overall Satisfaction with Apache Hadoop

[Apache Hadoop] is being used (mostly) as intended: for managing large, unstructured data from our data flows, including logging and report extract, transform, and load (ETL). We run it at medium scale on on-prem servers, with Cloudera as the management platform. While I firmly believe Cloudera makes it a bit easier to manage, it obscures issues at times.
  • Handles large amounts of unstructured data well, for business-level purposes
  • Is a good catchall because of this design, i.e. what does not fit into our vertical tables fits here.
  • Also decent for large ETL pipelines and logging free-for-alls, for the same reason.
  • Many, many modules and, because it is Apache open source, it takes time to learn
  • Integration between the disparate pieces is not always seamless, nor are all the pieces required.
  • Optimization can be challenging (see PSTL design)
  • Positive, as we have saved money on hardware (and software) costs as our data has scaled up over the last several years.
  • Positive, as I said earlier: the design of Hadoop allows for a natural split of the dataflows, so less data is "shoved into" the vertical data stack. This saves money and is naturally more efficient.
  • Negative, in that we need expertise to manage the Hadoop data stacks because of the learning curve.
MariaDB - Better if you are already in the cloud you will use it in. Issues have improved as it has matured over the years.
CockroachDB - Not nearly as performant (even out of the box) as Apache Hadoop. More configuration is required just to make it work. In-memory caching is an issue.

Apache Hadoop (and its subsequent add-ons) is well-suited to larger, unstructured data flows, such as aggregation of web traffic or advertising. Geospatial algorithms and their outputs also suit this kind of aggregation: structuring that data is challenging, while leaving it unstructured and performing queries as needed is a better fit for most business models. With the advent of data science, I would expect Hadoop to fit a LOT of data scientists' initial outputs quite well.
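
To make the "leave it unstructured and query as needed" idea concrete, here is a minimal sketch in plain Python of a map/reduce-style aggregation over raw web traffic lines. It is not Hadoop code and not our actual pipeline: the log format, the sample lines, and the names `RAW_LINES`, `LINE_PATTERN`, `map_line`, `reduce_counts`, and `hits_by_section` are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical raw log lines; the format is an assumption for illustration.
RAW_LINES = [
    "203.0.113.5 GET /ads/banner?id=7 200",
    "198.51.100.2 GET /home 200",
    "malformed line with no structure",
    "203.0.113.5 GET /ads/banner?id=9 404",
    "198.51.100.7 POST /ads/click 200",
]

# The "schema" lives in the query, not in the stored data.
LINE_PATTERN = re.compile(
    r"^(?P<ip>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)


def map_line(line):
    """Map step: emit (site_section, 1) for lines that parse, nothing otherwise."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return []  # unparseable lines stay in storage and are simply skipped
    # Key on the first path segment, e.g. "/ads/banner?id=7" -> "ads".
    section = match.group("path").lstrip("/").split("/")[0].split("?")[0]
    return [(section, 1)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def hits_by_section(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(dict(hits_by_section(RAW_LINES)))
```

The point of the sketch is that nothing forces the malformed line into a table up front; a different question later (say, hits per status code) would only need a different map step over the same raw lines.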