Hadoop: Highly available, scalable and cost effective for big data storage and processing.
December 13, 2017

Johanes Siregar | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Modules Used

  • Hadoop Common
  • Hadoop Distributed File System
  • Hadoop MapReduce

Overall Satisfaction with Hadoop

Currently, two directorates in my organization use Hadoop to process vast amounts of data from various data sources. Hadoop helps us solve the problem of storing and processing huge amounts of data efficiently. High availability, scalability and cost efficiency were the main considerations in adopting Hadoop as one of the core solutions in our big data infrastructure.
  • Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes.
  • Replication in the Hadoop Distributed File System (HDFS) keeps stored data robust, which in turn ensures high availability of that data.
  • Using commodity hardware for the nodes of a Hadoop cluster reduces cost and eliminates dependency on any particular proprietary technology.
  • User and access management are still challenging to implement in Hadoop, and deploying a Kerberized, secured cluster is quite a challenge in itself.
  • Support for running multiple application versions on a single cluster would be a nice-to-have feature.
  • Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.
  • Hadoop has had a huge impact on reducing the cost of data storage in our organization.
  • Beyond that, it also serves as a low-cost big data processing framework.
  • The use of commodity hardware for the physical layer greatly reduces technological dependency on proprietary products.
Hadoop offers a scalable, cost-effective and highly available solution for big data storage and processing. The use of a non-proprietary physical layer greatly reduces dependency on proprietary technology. It also allows the cluster to be sized elastically when deployed on virtual machines or even on an IaaS cloud. The main challenge, however, is managing user access and maintaining security.
Hadoop is well suited for internal projects in a secure environment without any external exposure. It also excels at storing and processing large amounts of data, and it is suitable as a data repository for data-intensive applications that require high data availability, a significant amount of memory and substantial processing power. However, it is not appropriate for near-real-time solutions that need fast response times under a high number of transactions per second.