Apache Hadoop Reviews and Ratings

Rating: 7.5 out of 10

Community insights

TrustRadius Insights for Apache Hadoop are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.

Business Problems Solved

Hadoop has been widely adopted by organizations for a range of use cases. A key one is storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow efficient and cost-effective storage and analysis of large amounts of data, and that it has been particularly helpful in reducing storage costs and improving performance on massive data sets. Hadoop also enables a consistent data store that can be integrated across platforms, making it easier for different departments within an organization to collect, store, and analyze data.

Users have also leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. Its user-friendly nature has made it accessible to users who are not necessarily experts in big data technologies.

Hadoop is also used for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large-volume ETL platform and crunching engine for analytical and statistical models has attracted users who previously relied on MySQL data warehouses; they report faster query performance with Hadoop than with those traditional solutions.

Another significant use case is secure storage without high costs: Hadoop stores and processes large amounts of data efficiently. It also enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency.

Hadoop's versatility extends beyond commercial applications: it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, Systems and IT departments rely on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, Hadoop's use cases span industries and departments, providing solutions for data collection, storage, and analysis.
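As a rough illustration of the MapReduce pattern mentioned above, here is a minimal sketch in plain Python. It does not use Hadoop or any Hadoop API, and it runs in a single process rather than across a cluster; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It shows the three conceptual steps: map each input record to key/value pairs, group the pairs by key, and reduce each group to a result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

In a real cluster the map and reduce steps would run in parallel on many nodes, which is where the speedups reviewers describe would come from; this sketch only shows the shape of the computation.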

Reviews

37 Reviews

Open source Hadoop: smart choice smart price

Rating: 8 out of 10

Incentivized

January 14, 2025

Use Cases and Deployment Scope

We use Apache Hadoop to handle data that arrives continuously, in real time, from devices in different geographical locations across the globe. We then run Spark jobs and notebooks to ingest and process the data and load it into other external systems for further processing.

Pros

Its ability to handle large magnitudes of data is what makes Hadoop a go-to open source product
Its open source nature makes it quite configurable
Its community support is superb.

Cons

Its setup is quite complex and requires good knowledge of the product
Fine-tuning its configuration requires in-depth knowledge of the product
Its logging can be improved

Likelihood to Recommend

When you have real-time data that amounts to massive volumes, close to terabytes daily, it becomes imperative to have a system that can handle and ingest it without losing any. Having Hadoop in place makes our product more robust, and its stability comes in handy.

The only challenge in running huge clusters is that they require a huge amount of space and memory to work efficiently.

Verified User

Engineer in Engineering (10,001+ employees)

Vetted Review

4 years of experience

Great enterprise tool for handling large data

Rating: 9 out of 10

Incentivized

August 17, 2021

Use Cases and Deployment Scope

Apache Hadoop is one of the most effective and efficient software tools, and it has been storing and processing an enormous amount of data in my company for a long time now. Hadoop is primarily used for collecting large amounts of data, for storage, and for analytics. From my experience, I have to say that Hadoop is extremely useful and serves a reliable and valid purpose.

Pros

The various modules can be challenging to learn, but at the same time they have made Hadoop easy to implement and use.
Hadoop includes a thoughtfully designed file system, the Hadoop Distributed File System, that beautifully processes all components and programs.
Hadoop is also very easy to install, which is a great aspect, as installation processes are sometimes so tricky that the user loses interest.
Customer support is quick.

Cons

As much as I appreciate Hadoop, there are certain cons attached to it as well. I personally think that Hadoop should work on its interactive querying platforms, which in my opinion are quite slow compared to other players in the market.
Apart from that, Hadoop has many modules, so it becomes difficult and time-consuming to learn and master all of them.

Likelihood to Recommend

Apache Hadoop is mainly suited for companies that have large unstructured data flows, such as advertising and web traffic, so I feel that Hadoop is a great option when you have a large bulk of data that needs to be stored and processed on a continuous basis. I do recommend Hadoop, but I would also suggest that it be supplemented with a faster, interactive database so that the overall querying service gets better.

Chantel Moreno

Finance & Accounting Professional in Finance and Accounting at Fagron (1001-5000 employees)

Vetted Review

2 years of experience

Good tool for unstructured data

Rating: 9 out of 10

Incentivized

July 21, 2021

Use Cases and Deployment Scope

Apache Hadoop is an open-source software library designed for the collection, storage, and analysis of large data sets. Apache Hadoop's architecture comprises components that include a distributed file system. It is mostly used for massive data collection, analytics, and storage. Also, consistent data can be integrated across other platforms to provide a single source of truth.

Pros

Apache Hadoop has made managing large amounts of data quite easy.
The system contains a file system known as HDFS (Hadoop Distributed File System) which processes components and programs.
The parallel processing tool of this software is also a good aspect of Apache Hadoop.
It offers interesting and reliable features and functions.
Apache Hadoop also stores very large data files across machines with high levels of availability.

Cons

I personally feel that Apache Hadoop is slower than other interactive querying platforms. Queries can sometimes take hours, which can be frustrating and discouraging.
Also, Apache Hadoop has so many modules that it takes much more time to learn all of them. Other than that, optimization is somewhat of a challenge in Apache Hadoop.

Likelihood to Recommend

Altogether, I want to say that Apache Hadoop is well suited to larger, unstructured data flows, like an aggregation of web traffic or even advertising. I think Apache Hadoop is great when you literally have petabytes of data that need to be stored and processed on an ongoing basis. Also, I would recommend that the software be supplemented with a faster, interactive database for a better querying service. Lastly, it's very cost-effective, so it is good to give it a shot before coming to any conclusion.

Peter Suter

Senior Software Engineer (GUI) in Engineering at SIX (1001-5000 employees)

Vetted Review

2 years of experience

Good solution for storing and processing large data

Rating: 7 out of 10

Incentivized

May 20, 2021

Use Cases and Deployment Scope

We use Apache Hadoop to store and process large amounts of data (petabytes per day) across thousands of data pipelines. Hadoop works reliably for this purpose. Data scientists at the company also use it for interactive querying for analytics and modeling purposes.

Pros

Storing large amounts of data
Processing large amounts of data via a familiar SQL interface

Cons

Slower than other interactive querying engines. Queries take minutes at least and up to hours sometimes
Tuning the settings to be able to run certain queries can require a lot of domain knowledge

Likelihood to Recommend

If you have petabytes of data that you need to store and process on a regular basis and don't mind having to wait minutes for your queries to run, Apache Hadoop is great for that use case. I would supplement it with another faster interactive database for interactive querying.

Verified User

Analyst in Marketing (501-1000 employees)

Vetted Review

5 years of experience

Apache Hadoop Can Save on the Headaches

Rating: 7 out of 10

Incentivized

January 16, 2021

Use Cases and Deployment Scope

[Apache Hadoop] is being used as it is (mostly) intended: for large, unstructured data management from our data flows, including logging and report extract, transform, and load. We are using it at a medium scale in an on-prem server delivery with Cloudera as the management platform. While I firmly believe Cloudera makes it a bit easier to manage, it obfuscates issues at times.

Pros

Handles large amounts of unstructured data well, for business level purposes
Is a good catchall because of this design, i.e. what does not fit into our vertical tables fits here.
Decent for large ETL pipelines and logging free-for-alls because of this, also.

Cons

Many, many modules and because of Apache open source, takes time to learn
Integration is not always seamless between the disparate pieces nor are all the pieces required.
Optimization can be challenging (see PSTL design)

Likelihood to Recommend

Apache Hadoop (and its subsequent add-ons) are well-suited to larger, unstructured data flows, such as aggregation of web traffic or advertising. Geospatial algorithms and their outputs are well-suited for this kind of aggregation as structuring that data is challenging, but leaving it unstructured and performing queries as-needed is a better fit for most business models. With the advent of data science, I would expect Hadoop fits a LOT of their initial outputs quite well.

Joe Hughes

Senior DevOps Engineer in Information Technology at Simpli.fi (201-500 employees)

Vetted Review

5 years of experience

Hadoop -- Great Value for What You Pay

Rating: 7 out of 10

Incentivized

September 21, 2020

Use Cases and Deployment Scope

It's used organization-wide for older data that's not used as frequently. We use Teradata to warehouse our more recent data, but for data we don't access as often, it's migrated to Hadoop. It addresses the problem of securely storing data without paying the fortune that most warehouses charge for premium cloud storage.

Pros

Accessible
Inexpensive
User friendly

Cons

Much slower than more premium platforms
Doesn't connect with other data warehouses
Not mainstream -- somewhat more of a "hacky" solution

Likelihood to Recommend

Need cheap enterprise-level storage for data that is necessary to keep but isn't regularly accessed? Hadoop is the option for you. If you regularly have analysts or apps accessing the data warehouse, look for something more premium such as Teradata. The good news is that general SQL knowledge transfers well to this warehouse.

Blake Baron

Senior Financial Analyst in Finance and Accounting at Lowe's Companies, Inc. (10,001+ employees)

Vetted Review

1 year of experience

Fault Tolerance and High Availability Made Easy with Hadoop

Rating: 10 out of 10

Incentivized

September 20, 2020

Use Cases and Deployment Scope

We are using it within my department to process large sets of data that can't be processed in a timely fashion on a single computer or node. The various modules provided with Hadoop make it easy for us to implement map-reduce and perform parallel processing on large sets of data. We have approximately 40TB of data that we run various algorithms against as we try to use the data to solve business problems and prevent fraudulent transactions.

Pros

Map-reduce
Parallel processing
Handles node failures
HDFS: distributed file system

Cons

More connectors
Query optimization
Job scheduling

Likelihood to Recommend

Hadoop is easy to use. It is a scalable and cost-effective solution for working with large data sets. Hadoop accepts data from a variety of disparate data sources, such as social media feeds, structured or unstructured data, XML, text files, images, etc. Hadoop is also highly available and fault-tolerant, supporting multiple standby NameNodes. The performance of Hadoop is also good because it stores data in a distributed fashion allowing for distributed processing and lower run times. And Hadoop is open-source, making the source code available for modification if necessary. Hadoop also supports multiple languages like C/C++, Python, and Groovy.

Gene Baker

Vice President, Chief Architect, Development Manager and Software Engineer in Information Technology at WySTAR Global Retirement Solutions, a Wells Fargo Company (10,001+ employees)

Vetted Review

2 years of experience

Hadoop vs. Alternatives

Rating: 8 out of 10

Incentivized

June 5, 2019

Use Cases and Deployment Scope

It is being used at our Fortune 500 clients. It is great for storage, but it is not well understood by the business. The challenge is that it requires very sophisticated data scientists to use properly and in parallel, but the data scientists turn the data on its head, causing IT execution issues. This has forced IT to restructure data in a denormalized form so the business users can actually be productive. This is a big trend in organizations.

Pros

Great for inexpensive storage, when originally introduced.
Distributed processing
Industry standard

Cons

Network fabric needs to be more sophisticated.
Need centralized storage.
The three copies of data should have been in the original design, not years later.
Consider deploying Spectrum Scale in these environments.

Likelihood to Recommend

Massive processing in a distributed environment with data that can be distributed. Research environments. Lab environments would also be a good use for Hadoop. Hadoop can also be used in support of Spark environments and used by Frameworks if deployed properly. The best scenario is with a Data Scientist that understands how to program appropriately.

Verified User

Executive in Professional Services (51-200 employees)

Vetted Review

5 years of experience

Hadoop: A Robust Big Data Platform

Rating: 9 out of 10

Incentivized

December 22, 2018

Use Cases and Deployment Scope

Hadoop is being used to solve big data modeling problems in our firm. The corporate analytics team uses Hadoop to perform functions like data manipulation, information retrieval, data mapping, and statistical modeling. The business problem which it solves is the limitation of CSV/Excel files to handle more than a million rows. Hadoop allows you to process big data and also has connectivity with platforms like R Studio where you can deploy mathematical models.

Pros

Capability to collaborate with R Studio. Most of the statistical algorithms can be deployed.
Handling Big Data issues like storage, information retrieval, data manipulation, etc.
Redundant tasks like data wrangling, data processing, and cleaning are more efficient in Hadoop as the processing times are faster.

Cons

Hadoop requires computationally intensive platforms, such as a minimum of 8GB of memory and an i5 processor. Sometimes the hardware does become a hindrance.
If we can connect Hadoop to Salesforce, it would be a tremendous functionality as most CRM data comes from that channel.
It will be good to have some Geo Coding features if someone wants to opt for spatial data analysis using latitudes and longitudes.

Likelihood to Recommend

Hadoop is very well suited for big data modeling problems in various industries like finance, insurance, healthcare, automobiles, CRM, etc. In every industry where you need data analysis in real time, Hadoop is a perfect fit in terms of storage, analysis, retrieval, and processing. It won't be a very good tool to perform ETL (Extract Transform Load) techniques though.

Kunal Sonalkar

Data Research Analyst in Information Technology at Southwest Florida Water Management District (5001-10,000 employees)

Vetted Review

2 years of experience

Hadoop Review

Rating: 7 out of 10

Incentivized

May 16, 2018

Use Cases and Deployment Scope

It is used extensively in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of data has become quite easy since the arrival of the Hadoop environment. Our department is on the verge of moving towards Spark instead of MapReduce, but for now, Hadoop is being used extensively for MapReduce purposes.

Pros

Hadoop Distributed Systems is reliable.
High scalability
Open source, low cost, large community

Cons

Compatibility with Windows Systems
Security needs more focus
Hadoop lacks real-time processing

Likelihood to Recommend

Hadoop helps us tackle our problem of maintaining and processing a huge amount of data efficiently. High availability, scalability and cost efficiency are the main considerations for implementing Hadoop as one of the core solutions in our big-data infrastructure. Where relational databases fall short with regard to tuning and performance, Hadoop rises to the occasion and allows for massive customization leveraging the different tools and modules. We use Hadoop to input raw data and add layers of consolidation or analysis to make business decisions about disparate data points.

Kartik Chavan

Peer Educator (Tutor) & Supplemental Instructions (SI) Leader in Information Technology at The University of Texas at Arlington (1001-5000 employees)

Vetted Review

2 years of experience
