TrustRadius
Hadoop

Overview

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Product Demos

Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo

YouTube

Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler

YouTube

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

YouTube

Product Details

Hadoop Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Reviewers rate Data Sources highest, with a score of 8.7.

The most common users of Hadoop are from Enterprises (1,001+ employees).

Reviews From Top Reviewers

Advantage Hadoop

Rating: 10 out of 10
November 11, 2015
AJ
Vetted Review
Verified User
Apache Hadoop
5 years of experience
We are using it for ETL processing of retail data, and it is going to be used across the whole organization. It allows terabytes of data to be processed faster and scales as volumes grow.
Pros
  • Processes large volumes of data quickly by using parallelism.
  • No schema required. Hadoop can process any type of data.
  • Hadoop is horizontally scalable.
  • Hadoop is free.
Cons
  • Development tools are not that friendly.
  • Hard to find Hadoop resources.
Hadoop is not a replacement for a transactional system such as an RDBMS. It is suitable for batch processing.

Good tool for unstructured data

Rating: 9 out of 10
July 21, 2021
PS
Vetted Review
Verified User
Apache Hadoop
2 years of experience
Apache Hadoop is an open-source software library designed for the collection, storage, and analysis of large data sets. Its architecture comprises several components, including a distributed file system. We mostly use it for massive data collection, analytics, and storage. Also, consistent data can be integrated across other platforms to provide a single source of truth.
Pros
  • Apache Hadoop has made managing large amounts of data quite easy.
  • The system includes a file system known as HDFS (Hadoop Distributed File System), which stores the data that components and programs process.
  • Parallel processing is another good aspect of Apache Hadoop.
  • It offers interesting and reliable features and functions.
  • Apache Hadoop can also store very large data files across machines with high availability.
Cons
  • I personally feel that Apache Hadoop is slower than other interactive querying platforms. Queries can sometimes take hours, which is frustrating and discouraging.
  • Apache Hadoop has so many modules that it takes a long time to learn all of them. Beyond that, optimization is somewhat of a challenge.
Altogether, Apache Hadoop is well suited to large, unstructured data flows such as aggregated web traffic or advertising data. I think it is great when you literally have petabytes of data that need to be stored and processed on an ongoing basis. I would also recommend supplementing the software with a faster, interactive database for better querying. Lastly, it is very cost-effective, so it is worth giving it a shot before coming to any conclusion.

Great enterprise tool for handling large data

Rating: 9 out of 10
August 17, 2021
CM
Vetted Review
Verified User
Apache Hadoop
2 years of experience
Apache Hadoop is one of the most effective and efficient pieces of software we have, and it has been storing and processing an enormous amount of data in my company for a long time now. Hadoop is primarily used for large-scale data collection, storage, and analytics. From my experience, Hadoop is extremely useful and serves a reliable, valid purpose.
Pros
  • The various modules are sometimes pretty challenging to learn, but at the same time they make Hadoop easy to implement and run.
  • Hadoop includes a thoughtfully designed file system, the Hadoop Distributed File System, which stores the data that components and programs process.
  • Hadoop is also very easy to install, which is a great aspect, since a tricky installation process can make users lose interest.
  • Customer support is quick.
Cons
  • As much as I appreciate Hadoop, it has certain cons as well. I personally think Hadoop should improve its interactive querying, which in my opinion is quite slow compared to other players in the market.
  • Hadoop has many modules, so it is difficult and time-consuming to learn and master all of them.
Apache Hadoop is mainly suited to companies with large unstructured data flows, such as advertising data and web traffic, so I feel that Hadoop is a great option when you have a large bulk of data that must be stored and processed on a continuous basis. I do recommend Hadoop, but I would also suggest supplementing it with a faster, interactive database so that the overall querying service gets better.

Hadoop - Effective tool for large scale distributed processing.

Rating: 8 out of 10
December 01, 2015
MD
Vetted Review
Verified User
Apache Hadoop
3 years of experience
I have used Hadoop to build business feeds for a telecom client. The major purpose of using Hadoop was to gain insight into an ever-growing volume of business data. We leveraged the MapReduce programming model to turn more than 30 gigabytes of data per day into actionable, aggregated data, which campaign teams then used to design and shape marketing and product teams used to envision new customer experiences.
Pros
  • Hadoop is an excellent framework for building distributed, fault-tolerant data processing systems on top of HDFS, which is optimized for high-throughput access to large data sets rather than low-latency storage.
  • Hadoop MapReduce is a powerful programming model and can be used either directly from the Java programming language or through data flow languages like Apache Pig.
  • Hadoop has a rich ecosystem of companion tools that make it easy to ingest large amounts of data efficiently from various sources. For example, Apache Flume can act as a data bus that uses HDFS as a sink and integrates effectively with disparate data sources.
  • Hadoop can also be used to build complex data processing and machine learning workflows, thanks to Apache Mahout, which uses Hadoop's MapReduce model to run complex algorithms.
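For readers unfamiliar with the MapReduce model mentioned in this review, here is a minimal in-process sketch of its three stages (map, shuffle, reduce) in plain Python. It does not use the Hadoop API and does not run on a cluster. The record format (`customer_id,bytes_used`) and all function names are hypothetical, chosen only to echo the telecom aggregation use case described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one raw record into (key, value) pairs.

    The "customer_id,bytes_used" layout is an assumption for illustration.
    """
    customer_id, bytes_used = record.split(",")
    yield customer_id, int(bytes_used)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that share a key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single aggregate."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["alice,120", "bob,30", "alice,80"]))
```

In a real Hadoop job the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; only the shape of the computation is the same.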
Cons
  • Hadoop is a batch-oriented processing framework; it lacks real-time or stream processing.
  • HDFS is not a POSIX-compliant file system and does not work well with small files, especially files smaller than the default block size.
  • Hadoop cannot be used for running interactive jobs or analytics.
1. How large are your data sets? If your answer is a few gigabytes, Hadoop may be overkill for your needs.
2. Do you require real-time analytical processing? If yes, Hadoop's MapReduce may not be a great asset in that scenario.
3. Do you want to process data in batches and scale to clusters holding terabytes of data? Hadoop is definitely a great fit for your use case.
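The small-files limitation listed in the cons above can be illustrated with rough arithmetic. The sketch below counts the metadata objects (one per file plus one per block) that the NameNode would track for the same volume of data stored as small or large files. The 128 MiB block size and the figure of about 150 bytes of heap per object are assumptions: the first is configurable per cluster and the second is only a commonly quoted rule of thumb, so treat the results as orders of magnitude, not measurements.

```python
BLOCK_SIZE = 128 * 1024**2  # assumed HDFS block size (128 MiB); configurable per cluster
BYTES_PER_OBJECT = 150      # assumed rule-of-thumb NameNode heap cost per object


def namenode_objects(file_count: int, file_size: int) -> int:
    """Count metadata objects: one per file plus one per block it occupies."""
    blocks_per_file = -(-file_size // BLOCK_SIZE)  # ceiling division
    return file_count * (1 + blocks_per_file)


def estimated_heap_bytes(file_count: int, file_size: int) -> int:
    """Rough NameNode heap estimate under the assumptions above."""
    return namenode_objects(file_count, file_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    TIB = 1024**4
    # The same 1 TiB of data, stored as 1 MiB files versus 1 GiB files.
    small = namenode_objects(TIB // 1024**2, 1024**2)
    large = namenode_objects(TIB // 1024**3, 1024**3)
    print("objects with 1 MiB files:", small)
    print("objects with 1 GiB files:", large)
```

Under these assumptions, one terabyte stored as 1 MiB files needs roughly two million metadata objects, while the same data in 1 GiB files needs about nine thousand, which is why many small files strain the NameNode well before disk space runs out.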

Hadoop is the Perfect Enterprise tool for Big Data

Rating: 10 out of 10
May 25, 2016
TT
Vetted Review
Verified User
Apache Hadoop
1 year of experience
The company I worked at used Hadoop clusters for processing huge datasets. They had several nodes for both production and pre-production environments. Hadoop allowed distributed processing of data across several clusters with an easy-to-use software model. It is used by the Systems and IT department at my company.
Pros
  • HDFS provides a very robust and fast data storage system.
  • Hadoop works well with generic "commodity" hardware, negating the need for expensive enterprise-grade hardware.
  • It is mostly unaffected by system and hardware failures of individual nodes and is self-sustaining.
Cons
  • While its open source nature provides a lot of benefits, it also gives rise to multiple stability issues.
  • Limited support for interactive analytics.
Hadoop is a very powerful tool that can be used in almost any environment where huge scale processing of data across clusters is required. It provides multiple modules such as HDFS and MapReduce that will make managing and analyzing said data reliable and efficient. Hadoop is a new and constantly evolving tool, and hence it needs users to be on top of it all the time.