Fault Tolerance and High Availability Made Easy with Hadoop
September 20, 2020


Gene Baker | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Modules Used

  • Hadoop Common
  • Hadoop Distributed File System
  • Hadoop MapReduce
  • Hadoop YARN
  • Spark

Overall Satisfaction with Hadoop

We are using it within my department to process large sets of data that can't be processed in a timely fashion on a single computer or node. The various modules provided with Hadoop make it easy for us to implement map-reduce and perform parallel processing on large sets of data. We have approximately 40TB of data that we run various algorithms against as we try to use the data to solve business problems and prevent fraudulent transactions.
Strengths

  • Map-reduce
  • Parallel processing
  • Handles node failures
  • HDFS: distributed file system

Areas for Improvement

  • More connectors
  • Query optimization
  • Job scheduling

Return on Investment

  • Positive: easy implementation
  • Positive: ease of scalability
  • Positive: ease of distributing data and workloads
  • Positive: low cost
  • Positive: low learning curve
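To illustrate the map-reduce pattern described above, here is a toy Python sketch of the three phases (map, shuffle, reduce) applied to a hypothetical fraud-flagging data set. This is not Hadoop API code; the account names and the flagged-transaction example are invented for illustration. On a real cluster, Hadoop shards the input across nodes, runs the map and reduce phases in parallel, and transparently reruns tasks from failed nodes.

```python
from collections import defaultdict

# Toy map-reduce sketch (hypothetical example, not the Hadoop API).
# Records are (account, amount, flagged_as_suspicious) tuples.

def map_phase(records):
    """Map: emit (account, 1) for every flagged transaction."""
    for account, amount, flagged in records:
        if flagged:
            yield account, 1

def shuffle(pairs):
    """Shuffle: group values by key, as Hadoop's shuffle/sort step does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the per-account counts."""
    return {key: sum(values) for key, values in groups.items()}

transactions = [
    ("acct-1", 120.0, True),
    ("acct-2", 35.5, False),
    ("acct-1", 980.0, True),
    ("acct-3", 15.0, True),
]

result = reduce_phase(shuffle(map_phase(transactions)))
# result == {"acct-1": 2, "acct-3": 1}
```

The same split into a stateless map step and a keyed reduce step is what lets Hadoop distribute a 40TB job: each phase can run independently on whichever nodes hold a given slice of the data.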
Hands down, Hadoop is less expensive than the other platforms we considered. Cloudera was easier to set up, but the expense ruled it out. MS-SQL didn't match the performance we saw with the Hadoop clusters and was more expensive; we considered it mainly for its ability to support SQL queries, in hopes of leveraging our existing codebase. Azure was likewise easier to set up but more expensive. In the end, cost won out: even though the competition was easier to set up, Hadoop wasn't that much harder.
We went with a third party (a consultant) for support. Had we gone with Azure or Cloudera, we would have obtained support directly from the vendor. My rating reflects the third party we selected rather than the overall support available for Hadoop. I think we could have done better in our selection process; however, we were trying to use an already-approved vendor within our organization. There is plenty of self-help available for Hadoop online.
It is easy to set up and use, with a low learning curve. Plenty of help is available online, and the modules included with Hadoop make it easy to set up map-reduce and parallel processing on the platform. It is also fairly easy to set up high availability and fault tolerance, which can be cumbersome on other platforms; Hadoop makes all of that easy.

Hadoop is easy to use. It is a scalable and cost-effective solution for working with large data sets. Hadoop accepts data from a variety of disparate sources, such as social media feeds, structured or unstructured data, XML, text files, and images. Hadoop is also highly available and fault-tolerant, supporting multiple standby NameNodes. Performance is good as well, because Hadoop stores data in a distributed fashion, allowing for distributed processing and lower run times. And Hadoop is open source, so the source code is available for modification if necessary. Hadoop also supports multiple languages, such as C/C++, Python, and Groovy.
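To give a sense of how little configuration the NameNode high availability mentioned above requires, here is a hypothetical hdfs-site.xml fragment for a cluster with one active and two standby NameNodes. The service name `mycluster`, the NameNode IDs `nn1`..`nn3`, and all hostnames are placeholders; a real deployment also needs a ZooKeeper quorum configured for automatic failover.

```xml
<!-- hdfs-site.xml fragment; hostnames and service name are placeholders -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>node1.example.com:8020</value>
</property>
<!-- rpc-address entries for nn2 and nn3 follow the same pattern -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node1.example.com:8485;node2.example.com:8485;node3.example.com:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```

With this in place, clients address the logical nameservice (`mycluster`) rather than a single host, so a NameNode failure triggers a failover instead of an outage.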