TrustRadius: an HG Insights Company
Hadoop

Overview

What is Hadoop?

Hadoop is open source software from Apache that supports distributed processing and data storage. It is popular for its scalability, reliability, and ability to run on commodity hardware.


Product Demos

  • Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo (YouTube)
  • Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler (YouTube)
  • Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn (YouTube)

Product Details

Hadoop Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Reviewers rate Data Sources highest, with a score of 8.7.

The most common users of Hadoop are from Enterprises (1,001+ employees).


Reviews From Top Reviewers

Showing reviews 1–5 of 37

Hadoop - You Can Tame the Elephant

Rating: 10 out of 10
August 19, 2015
MR
Vetted Review
Verified User
Apache Hadoop
1.5 years of experience
Hadoop is slowly taking the place of the company-wide MySQL data warehouse. Sqoop is being used to import the data from MySQL, and Impala is gradually becoming the data source for all queries. Eventually, MySQL will be phased out, and all data will go directly into Hadoop. Tests have shown that queries run from Impala are much faster than those from MySQL.
  • The built-in data block redundancy helps ensure that the data is safe. Hadoop also distributes the storage, processing, and memory, to work with large amounts of data in a shorter period of time, compared to a typical database system.
  • There are numerous ways to get at the data. The basic way is the Java API, by submitting MapReduce jobs written in Java. Hive works well for quick SQL queries, which are automatically translated into MapReduce jobs.
  • The web-based interface is great for monitoring and administering the cluster, because it can potentially be done from anywhere.
  • Impala is a very fast alternative to Hive. Unlike Hive, which submits queries as MapReduce jobs, Impala provides immediate access to the data.
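The map-then-reduce flow the reviewer describes (Java API jobs, Hive queries compiled to MapReduce) can be illustrated with a toy, single-process sketch. This is only an illustration of the programming model — real Hadoop jobs implement Mapper/Reducer classes against the Java API, and the function names here are invented for the example:

```python
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) pairs -- the 'map' step of a word count."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group values by key -- done by the framework between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts for one key -- the 'reduce' step."""
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["the"])  # 3
```

On a real cluster, the map and reduce functions run in parallel across many nodes, and the shuffle moves intermediate pairs over the network; the logic per record is the same.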
Cons
  • If you are not familiar with Java or the operating system Hadoop runs on (typically Linux), the error messages from failed MapReduce jobs can seem cryptic, and tracking down the source of a problem can be challenging.
Hadoop is designed for huge data sets, which can save a lot of time reading and processing data. However, the NameNode, which allocates the data blocks, is a single point of failure. Without a proper backup, or another NameNode ready to step in, the file system can become instantly useless. There are typically two ways to ensure the integrity of the NameNode.

One way is to have a Secondary NameNode, which periodically creates a copy of the file system image file. The process is called a "checkpoint". In the event of a failure of the Primary NameNode, the Secondary NameNode can be manually configured as the Primary NameNode. The need for manual intervention can cause delays and potentially other problems.

The second method is with a Standby NameNode. In this scenario, the same checkpoints are performed, however, in the event of a Primary NameNode failure, the Standby NameNode will immediately take the place of the Primary, preventing a disruption in service. This method requires additional services to be installed for it to operate.
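The contrast between the two recovery strategies can be sketched in a few lines of pure Python. The class and attribute names are invented for illustration; a production Standby NameNode additionally relies on shared edit logs (JournalNodes) and automatic failover, which this toy model omits:

```python
class NameNode:
    """Toy stand-in for a NameNode holding the filesystem image."""
    def __init__(self, name):
        self.name = name
        self.fsimage = {}        # path -> list of block IDs
        self.active = False

    def checkpoint_from(self, other):
        """Copy the filesystem image -- the periodic 'checkpoint'."""
        self.fsimage = dict(other.fsimage)

primary = NameNode("primary")
primary.active = True
primary.fsimage["/data/part-0000"] = ["blk_1", "blk_2"]

standby = NameNode("standby")
standby.checkpoint_from(primary)   # periodic checkpoint keeps the image current

# Primary fails; the standby takes over immediately with the last checkpoint.
primary.active = False
standby.active = True
print(standby.fsimage["/data/part-0000"])  # ['blk_1', 'blk_2']
```

With only a Secondary NameNode, the last two lines would be a manual reconfiguration step performed by an administrator — the delay the reviewer warns about.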

Hadoop usage and relevance for large-scale data indexing and searching (getting trends from data)

Rating: 9 out of 10
May 03, 2014
CG
Vetted Review
Apache Hadoop
1 year of experience
We used the Hadoop stack for data indexing and search via the Pentaho plugin.
  • Set up a data cluster on commodity servers
  • Distributed the data indexing and search process
  • Faster access to distributed indexed data through the Pentaho plugin, without having to write the whole set of analytics scripts that Pentaho handles for us
Cons
  • Uniformity of installation
For data indexing (unstructured content) and search without any costly servers, there is no real alternative.

Advantage Hadoop

Rating: 10 out of 10
November 11, 2015
AJ
Vetted Review
Verified User
Apache Hadoop
5 years of experience
We are using it for retail data ETL processing, and it is going to be used across the whole organization. It allows terabytes of data to be processed faster, with scalability.
  • Processes big volumes of data quickly using parallelism.
  • No schema required. Hadoop can process any type of data.
  • Hadoop is horizontally scalable.
  • Hadoop is free.
Cons
  • Development tools are not that friendly.
  • Hard to find Hadoop resources.
Hadoop is not a replacement for a transactional system such as an RDBMS. It is suitable for batch processing.
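The "no schema required" point above refers to Hadoop's schema-on-read approach: raw records land in storage as-is, and structure is imposed only at query time (for example, by a Hive table definition). A minimal pure-Python sketch of the idea, with hypothetical field names:

```python
import json

# Raw, heterogeneous records stored as-is -- no schema enforced on write.
raw_lines = [
    '{"sku": "A1", "qty": 3, "price": 9.99}',
    '{"sku": "B2", "qty": 1}',                  # missing field is fine
    '{"sku": "C3", "qty": 2, "store": "NYC"}',  # extra field is fine
]

def read_with_schema(lines, schema):
    """Impose a schema at read time, filling gaps with defaults."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field, default)
               for field, default in schema.items()}

schema = {"sku": None, "qty": 0, "price": 0.0}
rows = list(read_with_schema(raw_lines, schema))
print(rows[1])  # {'sku': 'B2', 'qty': 1, 'price': 0.0}
```

A transactional RDBMS does the opposite — schema-on-write — which is part of why the reviewer positions Hadoop for batch ETL rather than as an RDBMS replacement.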

Hadoop: Highly available, scalable and cost effective for big data storage and processing.

Rating: 8 out of 10
December 13, 2017
JS
Vetted Review
Verified User
Apache Hadoop
1 year of experience
Currently, there are two directorates using Hadoop for processing a vast amount of data from various data sources in my organization. Hadoop helps us tackle our problem of maintaining and processing a huge amount of data efficiently. High availability, scalability and cost efficiency are the main considerations for implementing Hadoop as one of the core solutions in our big-data infrastructure.
  • Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes.
  • Replication on Hadoop's distributed file system (HDFS) ensures robustness of data being stored which ensures high-availability of data.
  • Using commodity hardware as a node in a Hadoop cluster can reduce cost and eliminates dependency on particular proprietary technology.
Cons
  • User and access management are still challenging to implement in Hadoop; deploying a Kerberized, secured cluster is quite a challenge in itself.
  • Multiple application versioning on a single cluster would be a nice-to-have feature.
  • Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.
Hadoop is well suited for internal projects in a secure environment without any external exposure. It also excels at storing and processing large amounts of data, and is suitable as a data repository for data-intensive applications that require high data availability, a significant amount of memory, and huge processing power. However, it is not appropriate as a near-real-time solution that needs low response times with a high number of transactions per second.
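The small-files problem the reviewer mentions stems from NameNode memory: every file and block is tracked as an in-heap object, commonly estimated at roughly 150 bytes apiece. A back-of-the-envelope comparison (the 150-byte figure is a rule of thumb, not an exact number):

```python
OBJ_BYTES = 150   # rough rule of thumb per file/block object in NameNode heap

def namenode_objects(num_files, blocks_per_file):
    # One metadata object per file plus one per block.
    return num_files * (1 + blocks_per_file)

# 10 TB stored as ten million 1 MB files: one block each.
small = namenode_objects(10_000_000, 1)

# The same 10 TB as eighty 128 GB files in 128 MB blocks: 1,024 blocks each.
large = namenode_objects(80, 1024)

print(small * OBJ_BYTES / 1e9)   # ~3.0 GB of NameNode heap
print(large * OBJ_BYTES / 1e6)   # ~12.3 MB of NameNode heap
```

Same data volume, roughly a 250x difference in metadata pressure — which is why hundreds of nodes full of small files strain the (single) NameNode long before the DataNodes run out of disk.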

Hadoop Review

Rating: 7 out of 10
May 16, 2018
KC
Vetted Review
Verified User
Apache Hadoop
2 years of experience
It is used massively in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of data has become quite easy since the arrival of the Hadoop environment. Our department is on the verge of moving towards Spark instead of MapReduce, but for now, Hadoop is being used extensively for MapReduce purposes.
  • The Hadoop Distributed File System (HDFS) is reliable.
  • High scalability
  • Open source, low cost, large community
Cons
  • Compatibility with Windows systems
  • Security needs more focus
  • Hadoop lacks real-time processing
Where relational databases fall short with regard to tuning and performance, Hadoop rises to the occasion and allows for massive customization, leveraging its different tools and modules. We use Hadoop to ingest raw data and add layers of consolidation or analysis to make business decisions about disparate data points.
