Skip to main content
TrustRadius
Hadoop

Hadoop

Overview

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Read more
Recent Reviews

TrustRadius Insights

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, …
Continue reading

Hadoop Review

7 out of 10
May 16, 2018
Incentivized
It is massively being used in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of …
Continue reading
Read all reviews
Return to navigation

Product Demos

Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo

YouTube

Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler

YouTube

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

YouTube
Return to navigation

Product Details

What is Hadoop?

Hadoop Video

What is Hadoop?

Hadoop Technical Details

Operating SystemsUnspecified
Mobile ApplicationNo

Frequently Asked Questions

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Reviewers rate Data Sources highest, with a score of 8.7.

The most common users of Hadoop are from Enterprises (1,001+ employees).
Return to navigation

Comparisons

View all alternatives
Return to navigation

Reviews and Ratings

(270)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources. Have feedback on this content? Let us know!

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data. It has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets. Furthermore, Hadoop enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within organizations to collect, store, and analyze data. Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies. Additionally, Hadoop is utilized for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large volume ETL platform and crunching engine for analytical and statistical models has attracted users who were previously reliant on MySQL data warehouses. They have observed faster query performance with Hadoop compared to traditional solutions. Another significant use case for Hadoop is secure storage without high costs. Hadoop efficiently stores and processes large amounts of data, addressing the problem of secure storage without breaking the bank. Moreover, Hadoop enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency. Hadoop's versatility extends beyond commercial applications—it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, the Systems and IT department relies on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, the use cases of Hadoop span across industries and departments, providing valuable solutions for data collection, storage, and analysis.

Attribute Ratings

Reviews

(1-25 of 36)
Companies can't remove reviews or game the system. Here's why
Kunal Sonalkar | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is very well suited for big data modeling problems in various industries like finance, insurance, healthcare, automobiles, CRM, etc. In every industry where you need data analysis in real time, Hadoop is a perfect fit in terms of storage, analysis, retrieval, and processing. It won't be a very good tool to perform ETL (Extract Transform Load) techniques though.
Chantel Moreno | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Apache Hadoop is majorly suited for companies that have large amounts of unstructured data flow like advertising and even web traffic so I feel that Hadoop is a great option when you have the extra bulk of data that is required to be stored and processed on a continuous basis. Moreover, I do recommend Hadoop but at the same time, I would also hope and suggest that the software of Hadoop gets supplemented with a faster and interactive database so that the overall querying service gets better.
Peter Suter | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Altogether, I want to say that Apache Hadoop is well-suited to a larger and unstructured data flow like an aggregation of web traffic or even advertising. I think Apache Hadoop is great when you literally have petabytes of data that need to be stored and processed on an ongoing basis. Also, I would recommend that the software should be supplemented with a faster and interactive database for a better querying service. Lastly, it's very cost-effective so it is good to give it a shot before coming to any conclusion.
Score 7 out of 10
Vetted Review
Verified User
Incentivized
If you have petabytes of data that you need to store and process on a regular basis and don't mind having to wait minutes for your queries to run, Apache Hadoop is great for that use case. I would supplement it with another faster interactive database for interactive querying.
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Apache Hadoop (and its subsequent add-ons) are well-suited to larger, unstructured data flows, such as aggregation of web traffic or advertising. Geospatial algorithms and their outputs are well-suited for this kind of aggregation as structuring that data is challenging, but leaving it unstructured and performing queries as-needed is a better fit for most business models. With the advent of data science, I would expect Hadoop fits a LOT of their initial outputs quite well.
Blake Baron | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Need cheap enterprise-level storage for data that is necessary to keep but isn't regularly accessed? Hadoop is the option for you. If you regularly have analysts or apps accessing the data warehouse, look for something more premium such as Teradata. The good news is that general SQL knowledge transfers well to this warehouse.
Gene Baker | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is easy to use. It is a scalable and cost-effective solution for working with large data sets. Hadoop accepts data from a variety of disparate data sources, such as social media feeds, structured or unstructured data, XML, text files, images, etc. Hadoop is also highly available and fault-tolerant, supporting multiple standby NameNodes. The performance of Hadoop is also good because it stores data in a distributed fashion allowing for distributed processing and lower run times. And Hadoop is open-source, making the source code available for modification if necessary. Hadoop also supports multiple languages like C/C++, Python, and Groovy.
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Massive processing in a distributed environment with data that can be distributed. Research environments. Lab environments would also be a good use for Hadoop. Hadoop can also be used in support of Spark environments and used by Frameworks if deployed properly. The best scenario is with a Data Scientist that understands how to program appropriately.
May 16, 2018

Hadoop Review

Kartik Chavan | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Hadoop helps us tackle our problem of maintaining and processing a huge amount of data efficiently. High availability, scalability and cost efficiency are the main considerations for implementing Hadoop as one of the core solutions in our big-data infrastructure. Where relational databases fall short with regard to tuning and performance, Hadoop rises to the occasion and allows for massive customization leveraging the different tools and modules. We use Hadoop to input raw data and add layers of consolidation or analysis to make business decisions about disparate data points.

Bharadwaj (Brad) Chivukula | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
  • Less appropriate for small data sets
  • Works well for scenarios with bulk amount of data. They can surely go for Hadoop file system, having offline applications
  • It's not an instant querying software like SQL; so if your application can wait on the crunching of data, then use it
  • Not for real-time applications
January 04, 2018

Hadoop is pretty Badass

Score 9 out of 10
Vetted Review
Verified User
Incentivized
When we have data coming in from various sources, using hadoop is a good call. Its a good central station to take a good look at your data and see what needs to be done.
Hadoop should not be used directly for Real time Analytics. HDFS should be used to store data and we could use Hive to query the files.
Hadoop needs to be understood thoroughly even before attempting to use it for data warehousing needs. So you may need to take stock of what Hadoop provides, and read up on its accompanying tools to see what fits your needs.
Johanes Siregar | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is well suited for internal projects in a secure environment without any external exposure. It also excels well in storing and processing large amounts of data. It is also suitable to be implemented as a data repository for data-intensive applications which require high data availability, a significant amount of memory and huge processing power. However, it is not appropriate to implement as a near real-time solution which needs a high response time with a high number of high transactions per seconds.
Score 10 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is well suited for organizations with a lot of data, trying to justify business decisions with data-driven KPIs and milestones. This tool is best utilized by engineers with data modeling experience and a high-level understanding of how the different data points can be used and correlated. It will be challenging for people with limited knowledge of the business and how data points are created.
Mark Gargiulo | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is not for the faint of heart and is not a technology per se but an ecosystem of disparate technologies sitting on top of HDFS. It is certainly powerful but if, like me, you were handed this with no prior knowledge or experience using or administering this ecosystem the learning curve can be significant and ongoing having said that I don't think currently there are many other opensource technologies that can provide the flexibility in the "big data" arena especially for ETL or machine learning.
Muhammad Fazalul Rahman | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
If the user is trying to complete a task quickly and efficiently, then Hadoop is the best option for them. However, it may happen that the deadline for the submission is close and the user has little or no knowledge of Hadoop. In this case, it is easier not to use hadoop since it takes time to learn.
Tom Thomas | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is a very powerful tool that can be used in almost any environment where huge scale processing of data across clusters is required. It provides multiple modules such as HDFS and MapReduce that will make managing and analyzing said data reliable and efficient. Hadoop is a new and constantly evolving tool, and hence it needs users to be on top of it all the time.
February 23, 2016

Hadoop quick review

Score 9 out of 10
Vetted Review
Verified User
Incentivized
Data is growing and grows fast. A relationship database can't hold this requirement any more. Real-time applications and distributed design are required for highly scalability and fault tolerance.
Pierre LaFromboise | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
A big data problem doesn't always mean huge volumes of data. The other V's of big data (velocity and variety) are also important factors that may lead to selecting Hadoop as a platform.
Mrugen Deshmukh | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
1. How large are your data sets? If your answer is few gigabytes, Hadoop may be overkill for your needs.
2. Do you require real-time analytical processing? If yes, Hadoop's map reduce may not be a great asset in that scenario.
3. Do you want to want to process data in a batch processing fashion and scale for TeraBytes size clusters? Hadoop is definitely a great fit for your use case.
Return to navigation