TrustRadius
Hadoop

Overview

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Recent Reviews

TrustRadius Insights

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, …

Hadoop vs. Alternatives

8 out of 10
June 05, 2019
Incentivized
It is being used at our Fortune 500 clients. It is great for storage, but it is not well understood by the business. The challenge is that …

Hadoop Review

7 out of 10
May 16, 2018
Incentivized
It is massively being used in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of …

Hadoop is pretty Badass

9 out of 10
January 04, 2018
Incentivized
Apache Hadoop is a cost effective solution for storing and managing vast amounts of data efficiently. It is dependable and works even when …

Product Demos

  • Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo (YouTube)
  • Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler (YouTube)
  • Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn (YouTube)

Product Details

Hadoop Video: What is Hadoop?

Hadoop Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Reviewers rate Data Sources highest, with a score of 8.7.

The most common users of Hadoop are from Enterprises (1,001+ employees).


Reviews and Ratings (270)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, third-party data sources.

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data. It has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets. Furthermore, Hadoop enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within organizations to collect, store, and analyze data.

Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies. Additionally, Hadoop is utilized for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large-volume ETL platform and crunching engine for analytical and statistical models has attracted users who were previously reliant on MySQL data warehouses; they have observed faster query performance with Hadoop compared to traditional solutions.

Another significant use case is secure storage at low cost: Hadoop stores and processes large amounts of data efficiently without breaking the bank. Moreover, Hadoop enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency.
Hadoop's versatility extends beyond commercial applications—it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, the Systems and IT department relies on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, the use cases of Hadoop span across industries and departments, providing valuable solutions for data collection, storage, and analysis.
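The MapReduce pattern these reviewers rely on can be illustrated with a small, self-contained sketch. This is a local, in-process simulation of the programming model (not the Hadoop Java API), and the log lines are invented for illustration: map emits key-value pairs, a shuffle groups them by key, and reduce aggregates each group.

```python
from collections import defaultdict

# Local simulation of the MapReduce model: count log lines per severity level.
# A real Hadoop job distributes these phases across nodes; here they run in-process.

def map_phase(line):
    # Emit (severity, 1) for each log line, e.g. "ERROR disk full" -> ("ERROR", 1).
    severity = line.split(None, 1)[0]
    yield (severity, 1)

def shuffle(pairs):
    # Group intermediate values by key, as Hadoop's shuffle/sort step does.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one key.
    return (key, sum(values))

logs = [
    "INFO job started",
    "ERROR disk full",
    "INFO block replicated",
    "ERROR network timeout",
    "INFO job finished",
]

intermediate = [pair for line in logs for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'INFO': 3, 'ERROR': 2}
```

Because each map call touches only one record and each reduce call only one key's group, the framework is free to run them on different machines, which is where the speed-ups reviewers describe come from.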

Attribute Ratings

Reviews

(26-36 of 36)
Companies can't remove reviews or game the system.
Gaurav Kasliwal | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
I have been using Hadoop for 2 years and find it very useful, especially when working with bigger datasets. I used Hadoop and Mahout in a project to analyze and learn different patterns from the Yelp dataset. It was easy and user-friendly.

  • Scalability. Hadoop is really useful when you are dealing with a bigger system and you want to make your system scalable.
  • Reliable. Very reliable.
  • Fast, fast, fast! Hadoop works very fast, even with bigger datasets.
  • Development tools are not that easy to use.
  • The learning curve could be reduced; as of now, some skill is a must to use Hadoop.
  • Security. In today's world, security is of prime importance. Hadoop could be made more secure to use.
Hadoop is really useful for larger datasets. It is not very useful when you are dealing with a smaller dataset.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
I used Hadoop for my academic projects for processing high volume data of my data mining project.
  • It was able to map our data with clear distinction based on the key.
  • We were able to write simple map reduce code which ran simultaneously on multiple nodes.
  • The auto heal system was really helpful in case of multiple failures.
  • I think Hadoop should not have a single point of failure in the NameNode.
  • It should have good public-facing APIs for easy integration.
  • Internals of Hadoop are very abstract.
  • Protocol Buffers is a really good concept, but I am not sure if we have checked other options as well.
I think Hadoop has multiple flavors which people can customize as per their requirements, but I would choose Hadoop based on the following factors:
1. The number of nodes, decided by the degree of parallelism we want.
2. Whether the module we want to run can run in parallel on all machines.
Sumant Murke | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
I have been using Hadoop for the last 12 months and find it effective for dealing with large amounts of data. I used Hadoop jointly with Apache Mahout to build a recommendation system and got amazing results. It was fast, reliable, and easy to manage.
  • Fast. Prior to working with Hadoop, we had many performance issues and our system was very slow; after adopting Hadoop, performance increased significantly.
  • Fault tolerant. HDFS (the Hadoop Distributed File System) is a good platform for working with large data sets and makes the system fault tolerant.
  • Scalable. As Hadoop can deal with structured and unstructured data it makes the system scalable.
  • Security. As it has to deal with a large data set it can be vulnerable to malicious data.
  • Less performance with smaller data. Doesn't provide effective results if the data is very small.
  • Requires a skilled person to handle the system.
I would recommend Hadoop when a system is dealing with huge amounts of data.
November 11, 2015

Advantage Hadoop

Ajay Jha | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
We are using it for retail data ETL processing. It is going to be used across the whole organization. It allows terabytes of data to be processed faster, with scalability.
  • Processes big volumes of data quickly using parallelism.
  • No schema required. Hadoop can process any type of data.
  • Hadoop is horizontally scalable.
  • Hadoop is free.
  • Development tools are not that friendly.
  • Hard to find Hadoop resources.
Hadoop is not a replacement for a transactional system such as an RDBMS. It is suitable for batch processing.
Michael Reynolds | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Hadoop is slowly taking the place of the company-wide MySQL data warehouse. Sqoop is being used to import the data from MySQL. Impala is gradually being used as the new data source for all queries. Eventually, MySQL will be phased out, and all data will go directly into Hadoop. Tests have shown that the queries run from Impala are much faster than those from MySQL.
  • The built-in data block redundancy helps ensure that the data is safe. Hadoop also distributes the storage, processing, and memory, to work with large amounts of data in a shorter period of time, compared to a typical database system.
  • There are numerous ways to get at the data. The basic way is via the Java-based API, by submitting MapReduce jobs in Java. Hive works well for quick queries, using SQL, which are automatically submitted as MapReduce Jobs.
  • The web-based interface is great for monitoring and administering the cluster, because it can potentially be done from anywhere.
  • Impala is a very fast alternative to Hive. Unlike Hive, which submits queries as MapReduce jobs, Impala provides immediate access to the data.
  • If you are not familiar with Java and the operating system Hadoop rides on, such as Linux, and have trouble with submitted MapReduce jobs, the error messages can seem cryptic, and it can be challenging to track down the source of the problem.
Hadoop is designed for huge data sets, which can save a lot of time with reading and processing data. However, the NameNode, which allocates the data blocks, is a single point of failure. Without a proper backup, or another NameNode ready to kick in, the file system can become instantly useless. There are typically two ways to ensure the integrity of the NameNode.

One way is to have a Secondary NameNode, which periodically creates a copy of the file system image file. The process is called a "checkpoint". In the event of a failure of the Primary NameNode, the Secondary NameNode can be manually configured as the Primary NameNode. The need for manual intervention can cause delays and potentially other problems.

The second method is with a Standby NameNode. In this scenario, the same checkpoints are performed, however, in the event of a Primary NameNode failure, the Standby NameNode will immediately take the place of the Primary, preventing a disruption in service. This method requires additional services to be installed for it to operate.
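As a concrete illustration of the Standby NameNode setup, HDFS high availability is configured in hdfs-site.xml along the lines below. The property names come from the standard HDFS HA configuration; the nameservice ID ("mycluster"), NameNode IDs, and host names are placeholders, and a real deployment also needs shared-edits storage (e.g. a JournalNode quorum) and fencing configured.

```xml
<!-- Sketch of an HDFS high-availability configuration (hdfs-site.xml). -->
<!-- "mycluster", "nn1", "nn2", and the host names are placeholder values. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <!-- Automatic failover additionally requires ZooKeeper and the ZKFC daemons. -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```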
Bhushan Lakhe | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Hadoop is used for storing and analyzing log data (logs from warehouse loads or other data processing) as well as storing and retrieving financial data from JD Edwards. It's also planned to be used for archival. Hadoop is used by several departments within our organization. Currently, we are paying a lot of money for hosting historical data and we plan to move that to Hadoop, reducing our storage costs. Also, we got much better performance out of our Hadoop cluster for processing a large amount of financial data. So, in that sense, Hadoop addressed multiple business problems for us.
  • Hadoop stores and processes unstructured data such as web access logs or logs of data processing very well
  • Hadoop can be effectively used for archiving; providing a very economic, fast, flexible, scalable and reliable way to store data
  • Hadoop can be used to store and process a very large amount of data very fast
  • Security is a piece that's missing from Hadoop - you have to supplement security using Kerberos etc.
  • Hadoop is not easy to learn - there are various modules with little or no documentation
  • Hadoop being open-source, testing, quality control and version control are very difficult
Hadoop is best suited for warehouse or OLAP processing. It's not suitable for OLTP or small transaction processing.
March 20, 2015

Hadoop review

Score 8 out of 10
Vetted Review
Verified User
We use Hadoop for our ETL and analytic functions. We stream data and land it on HDFS, then massage and transform the data. We then use the Hive interface to query this data. Using Sqoop, we export and import data into and out of the Hadoop ecosystem. We store the data on HDFS in Avro and Parquet file formats.
  • Streaming data and loading to HDFS
  • Load jobs using Oozie and Sqoop for exporting data.
  • Analytic queries using MapReduce, Spark and Hive
  • Speed is one of the improvements we are looking for. We see Spark as an option and we are excited.
OLTP is a scenario where I think it is less appropriate, but the future will certainly be different.
February 04, 2015

Benefits of using Hadoop

Score 9 out of 10
Vetted Review
Verified User
It was used by a department.
  • Definitely speeds up data processing efforts.
  • I think certain design patterns should be more recommended than others.
Any batch or bulk processing is certainly recommended. Single-transaction processing needed in real time: not sure; it might have issues with that.
Score 10 out of 10
Vetted Review
Verified User
My company's new cloud-based architecture is Hadoop based. It is being used across several organizations in our company. Using Hadoop, our company has been able to solve many big data problems faster with very high performance.
  • Cost Effective
  • Distributed and Fault Tolerant
  • Easily Scalable
  • Cluster management and debugging are not very user-friendly (there aren't many tools).
  • More focus should be given to Hadoop security.
  • Single master node.
  • More user adoption (even though it is increasing every day).
Hadoop is best suited for processing and analyzing unstructured and huge volumes of data, so ask yourself whether the problem you are trying to solve involves unstructured data, and consider the volume.
Score 10 out of 10
Vetted Review
Verified User
Hadoop is part of the overall Data Strategy and is mainly used as a large volume ETL platform and crunching engine for proprietary analytical and statistical models. The biggest challenge for developers/users is moving from an RDBMS query approach for accessing data to a schema on read and list processing framework. The learning curve is steep upfront, but Hive and end user tools like Datameer can help to bridge the gap. Data governance and stewardship are of key importance given the fluid nature of how data is stored and accessed.
  • Gives developers and data analysts flexibility for sourcing, storing and handling large volumes of data.
  • Data redundancy and tunable MapReduce parameters to ensure jobs complete in the event of hardware failure.
  • Adding capacity is seamless.
  • Logs that are easier to read.
Not an RDBMS - not well suited for traditional BI applications.
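The "schema on read" approach this reviewer contrasts with an RDBMS can be sketched in a few lines of Python: data lands in raw form, and column names and types are imposed only when a query reads it. The file contents, field names, and schema below are invented for illustration.

```python
import csv
import io

# Schema-on-read sketch: raw text is stored as-is; a schema is applied
# only at read time, so the same file can be parsed different ways later.
raw_file = "2024-01-05,store42,19.99\n2024-01-05,store07,5.00\n"

def read_with_schema(raw, schema):
    # Apply column names and type converters while reading, not while writing.
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, record)})
    return rows

# A different schema could be applied to the same raw bytes tomorrow.
sales_schema = [("date", str), ("store", str), ("amount", float)]
rows = read_with_schema(raw_file, sales_schema)
print(rows[0]["amount"])  # 19.99
```

This is the flexibility (and the governance burden) the reviewer describes: nothing validates records on the way in, so stewardship of what each file means falls to the readers.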
Chandra Gupta | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Reseller
We used it to understand the Hadoop stack for data indexing and search, using the Pentaho plugin.
  • Sets up a cluster of data on commodity servers.
  • Distributes the data indexing and search process.
  • Faster access to distributed indexed data using the Pentaho plugin, without needing to write the whole set of analytics scripts, which Pentaho does for us.
  • Uniformity of installation
For data indexing (unstructured content) and search without any costly server... there is no real alternative option.