Item: Apache Hadoop
Rating: 8
Author: Johanes Siregar

Overall Satisfaction with Hadoop

Use Cases and Deployment Scope

Currently, there are two directorates using Hadoop for processing a vast amount of data from various data sources in my organization. Hadoop helps us tackle our problem of maintaining and processing a huge amount of data efficiently. High availability, scalability and cost efficiency are the main considerations for implementing Hadoop as one of the core solutions in our big-data infrastructure.

Pros and Cons

Pros

Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes.
Replication on Hadoop's distributed file system (HDFS) ensures robustness of data being stored which ensures high-availability of data.
Using commodity hardware as a node in a Hadoop cluster can reduce cost and eliminates dependency on particular proprietary technology.

Cons

User and access management are still challenging to implement in Hadoop, deploying a kerberized secured cluster is quite a challenge itself.
Multiple application versioning on a single cluster would be a nice to have feature.
Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.

Return on Investment

Hadoop as a huge impact on reducing the cost of data storage in our organization.
Other then that it also serves as low-cost big data processing framework.
The use of commodity hardware for the physical layer greatly reduces technological dependency on proprietary products.

Alternatives Considered

Teradata Database, Amazon Elastic MapReduce and Elastic Grid

Hadoop offers a scalable, cost-effective and highly available solution for big data storage and processing. The use of a non-proprietary physical layer greatly reduces dependency on technology. It also offers elastic dimensioning capability when deployed on virtual machines or even on IAAS cloud. The main challenge, however, is to manage user access and to maintain security.

Other Software Used

Microsoft Azure, Microsoft Power BI, Teradata Aster Database

Likelihood to Recommend

Hadoop is well suited for internal projects in a secure environment without any external exposure. It also excels well in storing and processing large amounts of data. It is also suitable to be implemented as a data repository for data-intensive applications which require high data availability, a significant amount of memory and huge processing power. However, it is not appropriate to implement as a near real-time solution which needs a high response time with a high number of high transactions per seconds.

Comments

Please log in to join the conversation

Hadoop: Highly available, scalable and cost effective for big data storage and processing.

Modules Used