Fault Tolerance and High Availability Made Easy with Hadoop
September 20, 2020


Gene Baker | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Modules Used

  • Hadoop Common
  • Hadoop Distributed File System
  • Hadoop MapReduce
  • Hadoop YARN
  • Spark

Overall Satisfaction with Hadoop

We are using Hadoop within my department to process large data sets that can't be handled in a timely fashion on a single computer or node. The various modules provided with Hadoop make it easy for us to implement map-reduce and perform parallel processing on large sets of data. We have approximately 40TB of data that we run various algorithms against as we try to use the data to solve business problems and prevent fraudulent transactions.
Pros

  • Map-reduce
  • Parallel processing
  • Handles node failures
  • HDFS: distributed file system

Cons

  • More connectors
  • Query optimization
  • Job scheduling

Return on Investment

  • Positive: easy implementation
  • Positive: ease of scalability
  • Positive: ease of distributing data and workloads
  • Positive: low cost
  • Positive: low learning curve
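
As a rough illustration of the map-reduce work described above, below is a minimal Hadoop MapReduce sketch that counts records per account ID. This is not the reviewer's actual code; the class names, the CSV record layout, and the assumption that the first field is an account ID are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Counts how many transaction records exist per account ID.
public class TransactionCount {

  public static class AccountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text accountId = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Assumption: input lines are CSV and the first field is the account ID.
      String[] fields = value.toString().split(",");
      if (fields.length > 0 && !fields[0].isEmpty()) {
        accountId.set(fields[0]);
        context.write(accountId, ONE);
      }
    }
  }

  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "transaction count");
    job.setJarByClass(TransactionCount.class);
    job.setMapperClass(AccountMapper.class);
    job.setCombinerClass(SumReducer.class);   // per-key sum is associative
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job like this would be submitted with something along the lines of hadoop jar transaction-count.jar TransactionCount <input dir> <output dir>; the reducer doubles as the combiner because the per-key sum can be partially aggregated on each node.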
Hands down, Hadoop is less expensive than the other platforms we considered. Cloudera was easier to set up, but the expense ruled it out. MS-SQL didn't have the performance we saw with the Hadoop clusters and was more expensive; we considered it mainly for its ability to support SQL queries, in hopes we could leverage the existing codebase. Azure was also easier to set up but, again, more expensive. In the end, cost won out, because even though the competition was easier to set up, Hadoop wasn't that much harder.
We went with a third party (a consultant) for support. Had we gone with Azure or Cloudera, we would have obtained support directly from the vendor. My rating reflects the third party we selected rather than the overall support available for Hadoop. I think we could have done better in our selection process; however, we were trying to use an already-approved vendor within our organization. There is plenty of self-help available for Hadoop online.
It is easy to set up and use, and there is a low learning curve. Plenty of help is available online, and the modules included with Hadoop make it easy to set up map-reduce and parallel processing on the platform. It is also fairly easy to set up high availability and fault tolerance, which can be cumbersome on other platforms; Hadoop makes that all easy.

Do you think Apache Hadoop delivers good value for the price?

Yes

Are you happy with Apache Hadoop's feature set?

Yes

Did Apache Hadoop live up to sales and marketing promises?

Yes

Did implementation of Apache Hadoop go as expected?

Yes

Would you buy Apache Hadoop again?

Yes

Hadoop is easy to use. It is a scalable and cost-effective solution for working with large data sets. Hadoop accepts data from a variety of disparate sources, such as social media feeds, structured or unstructured data, XML, text files, and images. Hadoop is also highly available and fault-tolerant, supporting multiple standby NameNodes. Performance is good as well, because Hadoop stores data in a distributed fashion, allowing for distributed processing and lower run times. And Hadoop is open source, making the source code available for modification if necessary. Hadoop also supports multiple languages, like C/C++, Python, and Groovy.
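
To make the high-availability point above more concrete, here is a minimal client-side sketch for connecting to an HA-enabled HDFS nameservice with one active and two standby NameNodes. The nameservice name, hostnames, and ports are placeholders; in practice these properties live in hdfs-site.xml and core-site.xml rather than being set in code, and the cluster side also needs JournalNodes (plus ZooKeeper for automatic failover), which this sketch does not cover.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Talks to a logical HDFS nameservice so that NameNode failover is
// transparent to the application.
public class HaClientExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Logical nameservice "fraudcluster" backed by three NameNodes
    // (names, hostnames, and ports are placeholders).
    conf.set("fs.defaultFS", "hdfs://fraudcluster");
    conf.set("dfs.nameservices", "fraudcluster");
    conf.set("dfs.ha.namenodes.fraudcluster", "nn1,nn2,nn3");
    conf.set("dfs.namenode.rpc-address.fraudcluster.nn1", "namenode1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.fraudcluster.nn2", "namenode2.example.com:8020");
    conf.set("dfs.namenode.rpc-address.fraudcluster.nn3", "namenode3.example.com:8020");

    // The failover proxy provider tries the configured NameNodes until it
    // reaches the one that is currently active.
    conf.set("dfs.client.failover.proxy.provider.fraudcluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    // The client addresses the nameservice, not an individual NameNode host.
    FileSystem fs = FileSystem.get(conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
    fs.close();
  }
}
```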