Skip to main content
TrustRadius
Hadoop

Hadoop

Overview

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Read more
Recent Reviews

TrustRadius Insights

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, …
Continue reading

Hadoop Review

7 out of 10
May 16, 2018
Incentivized
It is massively being used in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of …
Continue reading
Read all reviews
Return to navigation

Product Demos

Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo

YouTube

Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler

YouTube

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

YouTube
Return to navigation

Product Details

What is Hadoop?

Hadoop Video

What is Hadoop?

Hadoop Technical Details

Operating SystemsUnspecified
Mobile ApplicationNo

Frequently Asked Questions

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Reviewers rate Data Sources highest, with a score of 8.7.

The most common users of Hadoop are from Enterprises (1,001+ employees).
Return to navigation

Comparisons

View all alternatives
Return to navigation

Reviews and Ratings

(270)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources. Have feedback on this content? Let us know!

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data. It has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets. Furthermore, Hadoop enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within organizations to collect, store, and analyze data. Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies. Additionally, Hadoop is utilized for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large volume ETL platform and crunching engine for analytical and statistical models has attracted users who were previously reliant on MySQL data warehouses. They have observed faster query performance with Hadoop compared to traditional solutions. Another significant use case for Hadoop is secure storage without high costs. Hadoop efficiently stores and processes large amounts of data, addressing the problem of secure storage without breaking the bank. Moreover, Hadoop enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency. Hadoop's versatility extends beyond commercial applications—it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, the Systems and IT department relies on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, the use cases of Hadoop span across industries and departments, providing valuable solutions for data collection, storage, and analysis.

Attribute Ratings

Reviews

(1-3 of 3)
Companies can't remove reviews or game the system. Here's why
Blake Baron | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
It's used organization-wide for older data that's not used as frequently. We use Teradata to warehouse our more recent data, but for data we don't access as often, it's migrated to Hadoop. It addresses the problem of securely storing data without paying the fortune that most warehouses charge for premium cloud storage.
  • Accessible
  • Inexpensive
  • User friendly
  • Much slower than more premium platforms
  • Doesn't connect with other data warehouses
  • Not mainstream -- somewhat more, "hacky" of a solution
Need cheap enterprise-level storage for data that is necessary to keep but isn't regularly accessed? Hadoop is the option for you. If you regularly have analysts or apps accessing the data warehouse, look for something more premium such as Teradata. The good news is that general SQL knowledge transfers well to this warehouse.
  • It's half the price of our more premium data storage, so we've saved 50% on costs there.
  • Figure it's about half as fast, so it takes 2x as long for queries to execute.
  • We utilize Hadoop cloud storage, so we've been able to reduce onsite maintenance costs.
Hadoop utilizes a SQL structure, which is great. You pay less for the services, but it's definitely less of an enterprise-level option and more just a good place to store your seldom-used data. Teradata and AWS are a lot faster in returning queries than Hadoop, but you pay more, of course.
It's a great value for what you pay, and most Data Base Administrators (DBAs) can walk in and use it without substantial training. I tend to dabble on the analyst side, so querying the data I need feels like it can take forever, especially on higher traffic days like Monday.
Great! Hadoop has an easy to use interface that mimics most other data warehouses. You can access your data via SQL and have it display in a terminal before exporting it to your business intelligence platform of choice. Of course, for smaller data sets, you can also export it to Microsoft Excel.
Atlassian JIRA Align (formerly AgileCraft), Jira Software, Teradata Data Warehouse Appliance, Teradata Enterprise Data Warehousing, Teradata Database
Gene Baker | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
We are using it within my department to process large sets of data that can't be processed in a timely fashion on a single computer or node. The various modules provided with Hadoop make it easy for us to implement map-reduce and perform parallel processing on large sets of data. We have approximately 40TB of data that we run various algorithms against as we try to use the data to solve business problems and prevent fraudulent transactions.
  • Map-reduce
  • Parallel processing
  • Handles node failures
  • HDFS: distributed file system
  • More connectors
  • Query optimization
  • Job scheduling
Hadoop is easy to use. It is a scalable and cost-effective solution for working with large data sets. Hadoop accepts data from a variety of disparate data sources, such as social media feeds, structured or unstructured data, XML, text files, images, etc. Hadoop is also highly available and fault-tolerant, supporting multiple standby NameNodes. The performance of Hadoop is also good because it stores data in a distributed fashion allowing for distributed processing and lower run times. And Hadoop is open-source, making the source code available for modification if necessary. Hadoop also supports multiple languages like C/C++, Python, and Groovy.
  • Positive: easy implementation
  • Positive: ease of scalability
  • Positive: ease of distributing data and workloads
  • Positive: low cost
  • Positive: low learning curve
Hands down, Hadoop is less expensive than the other platforms we considered. Cloudera was easier to set up but the expense ruled it out. MS-SQL didn't have the performance we saw with the Hadoop clusters and was more expensive. We considered MS-SQL mainly for its ability to support SQL queries in hopes we could leverage the existing codebase. Azure was just more expensive but again was easier to setup. In the end, cost won out because even though the competition was easier to set up, it's not like Hadoop was that much harder to setup.
We went with a third party for support, i.e., consultant. Had we gone with Azure or Cloudera, we would have obtained support directly from the vendor. my rating is more on the third party we selected and doesn't reflect the overall support available for Hadoop. I think we could have done better in our selection process, however, we were trying to use an already approved vendor within our organization. There is plenty of self-help available for Hadoop online.
It is easy to set up and use. There is a low learning curve. Plenty of help available online and the modules that are included with Hadoop make it easy to setup map-reduce and parallel processing using the platform. It is also fairly easy to set up high availability and fault tolerance which can be cumbersome on other platforms. Hadoop makes that all easy.
Bhushan Lakhe | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Hadoop is used for storing and analyzing log data (logs from warehouse loads or other data processing) as well as storing and retrieving financial data from JD Edwards. It's also planned to be used for archival. Hadoop is used by several departments within our organization. Currently, we are paying a lot of money for hosting historical data and we plan to move that to Hadoop; reducing our storage costs. Also, we got a much better performance out of our Hadoop cluster for processing a large amount of financial data. So, in that senese, Hadoop addressed multiple business problems for us.
  • Hadoop stores and processes unstructured data such as web access logs or logs of data processing very well
  • Hadoop can be effectively used for archiving; providing a very economic, fast, flexible, scalable and reliable way to store data
  • Hadoop can be used to store and process a very large amount of data very fast
  • Security is a piece that's missing from Hadoop - you have to supplement security using Kerberos etc.
  • Hadoop is not easy to learn - there are various modules with little or no documentation
  • Hadoop being open-source, testing, quality control and version control are very difficult
Hadoop is best suited for warehouse or OLAP processing. It's not suitable for OLTP or small transaction processing
  • We had a large ROI due to improved performance and expedited reporting - our clients were happier and business improved
  • Our storage costs reduced
  • Our infrastructure costs reduced - we used old hardware for our Hadoop cluster
not applicable - I have not evaluated any other products
50
Various - IT, business users, vendors
3
Hadoop Administrator, Java Developer, Hive deveoper
  • Use of HDFS / Hive for storage / analysis of data processing logs
  • Use of HDFS / Hive for storage / analysis of historical financial data
  • Use of HDFS for Archival
  • Archival
  • Reporting
  • ETL
  • Data transfer
  • Staging area
  • Historical reporting
Hadoop is organization-independent and can be used for various purposes ranging from archiving to reporting and can make use of economic, commodity hardware. There is also a lot of saving in terms of licensing costs - since most of the Hadoop ecosystem is available as open-source and is free
Yes
We replaced 5 Windows based servers by a 10 node CentOS based desktops. Saved a lot on hardware and Windows server licenses
  • Price
  • Product Features
  • Product Usability
Price. We saved a lot of money
I will evaluate the ROI more closely
Hadoop is a complex topic and best suited for classrom training. Online training are a waste of time and money.
Return to navigation