Skip to main content
TrustRadius
Hadoop

Hadoop

Overview

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Read more
Recent Reviews

TrustRadius Insights

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, …
Continue reading

Hadoop Review

7 out of 10
May 16, 2018
Incentivized
It is massively being used in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of …
Continue reading
Read all reviews
Return to navigation

Product Demos

Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo

YouTube

Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler

YouTube

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn

YouTube
Return to navigation

Product Details

What is Hadoop?

Hadoop Video

What is Hadoop?

Hadoop Technical Details

Operating SystemsUnspecified
Mobile ApplicationNo

Frequently Asked Questions

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Reviewers rate Data Sources highest, with a score of 8.7.

The most common users of Hadoop are from Enterprises (1,001+ employees).
Return to navigation

Comparisons

View all alternatives
Return to navigation

Reviews and Ratings

(270)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources. Have feedback on this content? Let us know!

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data. It has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets. Furthermore, Hadoop enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within organizations to collect, store, and analyze data. Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies. Additionally, Hadoop is utilized for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large volume ETL platform and crunching engine for analytical and statistical models has attracted users who were previously reliant on MySQL data warehouses. They have observed faster query performance with Hadoop compared to traditional solutions. Another significant use case for Hadoop is secure storage without high costs. Hadoop efficiently stores and processes large amounts of data, addressing the problem of secure storage without breaking the bank. Moreover, Hadoop enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency. Hadoop's versatility extends beyond commercial applications—it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, the Systems and IT department relies on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, the use cases of Hadoop span across industries and departments, providing valuable solutions for data collection, storage, and analysis.

Attribute Ratings

Reviews

(1-25 of 27)
Companies can't remove reviews or game the system. Here's why
Kunal Sonalkar | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Apache Spark can be considered as an alternative because of its similar capabilities around processing and storing big data. The reason we went with Hadoop was the literature available online and integration capability with platforms like R Studio. The popularity of Hadoop has helped us in debugging issues and solving problems at a faster rate.
Chantel Moreno | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Different departments of my organization have been getting the benefit from Apache Hadoop as it serves the purpose of saving lives when large amounts of data is unable to be converted and processed in a timely manner from a node or a simple computer. Hadoop also has an easier process of configuration in a clustered environment. Additionally, from my experience, I have noticed that Hadoop provides great scalability and redundancy. Also, it provides enterprise-level support from a variety of vendors. Lastly, I think that a great positive fact of Hadoop is its horizontal scaling.
Peter Suter | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
I feel that this is a highly reliable and scalable solution computing technology that is highly capable of processing large data sets across multiple servers and thousands of machines in a well-defined and distributed manner. Apache Hadoop can automatically scale up the number of servers and machines that are needed to process, store, and analyze data sets. It also handles explosions in data with big data technology. Apache Hadoop is good at handling all node failures as well.
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Spark is a good alternative to Hadoop that can have faster querying and processing performance and can offer more flexibility in terms of applications that it can support.

Google BigQuery has also been a great alternative and is especially great in terms of ease of use. The capacity to process data and the speed are great without having to do any settings tuning or optimization. It also doesn't require any on-site hosting, making it a great hands off solution.
Score 7 out of 10
Vetted Review
Verified User
Incentivized
MariaDB - Better to be already in the cloud you will use it for. Issues have improved as it has matured over the year.s
CockroachDB - Not nearly as performant (even out of the box) as Apache Hadoop. More configurations required just to make it work. In memory cacheing is an issue.
Blake Baron | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Hadoop utilizes a SQL structure, which is great. You pay less for the services, but it's definitely less of an enterprise-level option and more just a good place to store your seldom-used data. Teradata and AWS are a lot faster in returning queries than Hadoop, but you pay more, of course.
Gene Baker | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
Hands down, Hadoop is less expensive than the other platforms we considered. Cloudera was easier to set up but the expense ruled it out. MS-SQL didn't have the performance we saw with the Hadoop clusters and was more expensive. We considered MS-SQL mainly for its ability to support SQL queries in hopes we could leverage the existing codebase. Azure was just more expensive but again was easier to setup. In the end, cost won out because even though the competition was easier to set up, it's not like Hadoop was that much harder to setup.
Score 8 out of 10
Vetted Review
Verified User
Incentivized
When comparing to the sophistication of IBM GPFS (Spectrum Scale) to Hadoop, it is clear that Spectrum Scale is a much better choice. That is maybe something you don't want to hear, but in all of our research, this has been the final decision of the client.
May 16, 2018

Hadoop Review

Kartik Chavan | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
  • For real-time streaming, use Spark; can provide a stark contrast to the way MR works
  • Hadoop offers a scalable, cost-effective and highly available solution for big data storage and processing.
  • Amazon Redshift is somewhat closer to Hadoop. But to analyze Petabytes of data Hadoop as better performance.
  • Hadoop is being open source, is cheaper to use and do POCs for client

Johanes Siregar | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Hadoop offers a scalable, cost-effective and highly available solution for big data storage and processing. The use of a non-proprietary physical layer greatly reduces dependency on technology. It also offers elastic dimensioning capability when deployed on virtual machines or even on IAAS cloud. The main challenge, however, is to manage user access and to maintain security.
Score 10 out of 10
Vetted Review
Verified User
Incentivized
I haven't worked with other Big Data aggregation services like Hadoop. As far as I know, Hadoop is the leading choice in this field with good cause. There is a lot of community support, custom modules, paid consultants, free and paid training. All this makes it an ideal choice for facilitating Big Data aggregation.
September 22, 2017

Hadoop review 2346

Gyan Dwibedy | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
No SQL database were evaluated along with MPP platform. Hadoop performs very well compared to the other platforms. Also since lot of investment goes into Hadoop there is a good chance of getting what one needs from the developer community.
Mark Gargiulo | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
As I am new to the hadoop ecosystem I have not used or evaluated any other similar products at this time. This was handed to me from a previous much older installation that was very under utilized. Our new platform will be working the new cluster much harder with jobs that run indefinitely. I'm not sure that any of the other "big data" technologies out there have as many certified components or work with such a diverse collection but as I said I am pretty new to this and so have only tertiary knowledge of competing products.
Muhammad Fazalul Rahman | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Hadoop was a cheaper alternative to Amazon. Since I had to pay for every minute I use with Amazon, I had to make sure multiple times that the code was good enough before I purchased with Amazon. But since Hadoop was available on the cluster, I had the opportunity to code on the way.
Piyush Routray | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Hadoop being open source, is cheaper to use and do POCs for clients. Cloudera, Hortonworks and MapR also compete to contribute to open source Hadoop and keep their product conceptually similar to Hadoop.
Tushar Kulkarni | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
Apache Spark has an in memory processing model, making it powerful for lightning fast data processing. Apache Spark also exposes Scala and Python in APIs which is one of the most commonly used programming languages in data analytic and data processing domains.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
Not used any other product than Hadoop and I don't think our company will switch to any other product, as Hadoop is providing excellent results. Our company is growing rapidly, Hadoop helps to keep up our performance and meet customer expectations. We also use HDFS which provides very high bandwidth to support MapReduce workloads.
Mrugen Deshmukh | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
Hadoop provides storage for large data sets and a powerful processing model to crunch and transform huge amounts of data. It does not assume the underlying hardware or infrastructure and enables the users to build data processing infrastructure from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are commonplace and thus should be automatically handled in software by the framework, relieving the developers from handling every edge scenario that can occur in a large distributed system.
Hadoop can be deployed in a traditional onsite datacenter as well as in the cloud. The cloud allows organizations to deploy Hadoop without hardware to acquire or a specific setup expertise. Many vendors who currently have an offer for the cloud include Microsoft, Amazon and Google.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
  • Nope
Processing of big data has been the ultimate need for the me choosing Hadoop. Big data is massive and messy, and it’s coming at you uncontrolled. Data are gathered to be analyzed to discover patterns and correlations that could not be initially apparent, but might be useful in making business decisions in an organization. These data are often personal data, which are useful from a marketing viewpoint to understand the desires and demands of potential customers and in analyzing and predicting their buying tendencies.

I think Hadoop processes it very efficiently.
Return to navigation