Hadoop

Hadoop

About TrustRadius Scoring
Score 8.5 out of 100
Apache Hadoop

Overview

Recent Reviews

Hadoop vs. Alternatives

8 out of 10
June 05, 2019
It is being used at our Fortune 500 clients. It is great for storage, but it is not well understood by the business. The challenge is that …
Continue reading

Hadoop Review

7 out of 10
May 16, 2018
It is massively being used in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of …
Continue reading

Hadoop is pretty Badass

9 out of 10
January 04, 2018
Apache Hadoop is a cost effective solution for storing and managing vast amounts of data efficiently. It is dependable and works even when …
Continue reading

Hadoop review 2346

9 out of 10
September 22, 2017
Hadoop is used to build a data lake where all enterprise data for my entire company can be stored. With data centralization and …
Continue reading

Reviewer Pros & Cons

View all pros & cons

Video Reviews

Leaving a video review helps other professionals like you evaluate products. Be the first one in your network to record a review of Hadoop, and make your voice heard!

Pricing

View all pricing
N/A
Unavailable

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Entry-level set up fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting / Integration Services

Would you like us to let the vendor know that you want pricing?

12 people want pricing too

Features Scorecard

No scorecards have been submitted for this product yet..

Product Details

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

Hadoop Video

What is Hadoop?

Hadoop Integrations

  • Sematext Infrastructure Monitoring (formerly Sematext SPM)

Hadoop Technical Details

Operating SystemsUnspecified
Mobile ApplicationNo

Comparisons

View all alternatives

Frequently Asked Questions

What is Hadoop?

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.

What is Hadoop's best feature?

Reviewers rate Data Sources highest, with a score of 8.7.

Who uses Hadoop?

The most common users of Hadoop are from Enterprises (1,001+ employees) and the Information Technology & Services industry.

Reviews

(1-25 of 37)
Companies can't remove reviews or game the system. Here's why
Kunal Sonalkar | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • Capability to collaborate with R Studio. Most of the statistical algorithms can be deployed.
  • Handling Big Data issues like storage, information retrieval, data manipulation, etc.
  • Redundant tasks like data wrangling, data processing, and cleaning are more efficient in Hadoop as the processing times are faster.
  • Hadoop requires intensive computational platforms like a minimum of 8GB memory and i5 processor. Sometimes the hardware does become a hindrance.
  • If we can connect Hadoop to Salesforce, it would be a tremendous functionality as most CRM data comes from that channel.
  • It will be good to have some Geo Coding features if someone wants to opt for spatial data analysis using latitudes and longitudes.
Chantel Moreno | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • The various modules sometimes are pretty challenging to learn but at the same time, it has made Hadoop easy to implement and perform.
  • Hadoop comprises a thoughtful file system which is called as Hadoop Distributed File System that beautifully processes all components and programs.
  • Hadoop is also very easy to install so this is also a great aspect of Hadoop as sometimes the installation process is so tricky that the user loses interest.
  • Customer support is quick.
  • As much as I really appreciate Hadoop there are certain cons attached to it as well. I personally think that Hadoop should work attentively towards their interactive querying platforms which in my opinion is quite slow as compared to other players available in the market.
  • Apart from that, a con that I have noticed is that there are many modules that exist in Hadoop so due to the higher number of modules it becomes difficult and time-consuming to learn and ace all of them.
Peter Suter | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • Apache Hadoop has made managing large amounts of data quite easy.
  • The system contains a file system known as HDFS (Hadoop Distributed File System) which processes components and programs.
  • The parallel processing tool of this software is also a good aspect of Apache Hadoop.
  • It keeps interesting and reliable features and functions.
  • Apache Hadoop also has a store of very big data files in machines with high levels of availability.
  • I personally feel that Apache Hadoop is slower as compared to other interactive querying platforms. Queries can take up to hours sometimes which can be frustrating and discouraging sometimes.
  • Also, there are so many modules of Apache Hadoop so it takes so much more time to learn all of them. Other than that, optimization is somewhat a challenge in Apache Hadoop.
Score 7 out of 10
Vetted Review
Verified User
Review Source
  • Storing large amounts of data
  • Processing large amounts of data via a familiar SQL interface
  • Slower than other interactive querying engines. Queries take minutes at least and up to hours sometimes
  • Tuning the settings to be able to run certain queries can require a lot of domain knowledge
Score 7 out of 10
Vetted Review
Verified User
Review Source
  • Handles large amounts of unstructured data well, for business level purposes
  • Is a good catchall because of this design, i.e. what does not fit into our vertical tables fits here.
  • Decent for large ETL pipelines and logging free-for-alls because of this, also.
  • Many, many modules and because of Apache open source, takes time to learn
  • Integration is not always seamless between the disparate pieces nor are all the pieces required.
  • Optimization can be challenging (see PSTL design)
Mark McCully | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • Scale.
  • Stability.
  • Reliability.
  • There are a lot of Hadoop-specific services and applications under the hood that you have to learn how to administer across the Hadoop cluster.
  • Enterprise-class support doesn't live up to other third-party vendors.
Score 8 out of 10
Vetted Review
Verified User
Review Source
  • Great for inexpensive storage, when originally introduced.
  • Distributed processing
  • Industry standard
  • Network fabric needs to be more sophisticated.
  • Need centralized storage.
  • The three copy of data should have been in the original design, not years later.
  • Consider deploying Spectrum Scale in these environments.
May 16, 2018

Hadoop Review

Kartik Chavan | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Review Source
  • Hadoop Distributed Systems is reliable.
  • High scalability
  • Open Sources, Low Cost, Large Communities
  • Compatibility with Windows Systems
  • Security needs more focus
  • Hadoop lack in real time processing
Bharadwaj (Brad) Chivukula | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
  • HDFS is reliable and solid, and in my experience with it, there are very few problems using it
  • Enterprise support from different vendors makes it easier to 'sell' inside an enterprise
  • It provides High Scalability and Redundancy
  • Horizontal scaling and distributed architecture
  • Less organizational support system. Bugs need to be fixed and outside help take a long time to push updates
  • Not for small data sets
  • Data security needs to be ramped up
  • Failure in NameNode has no replication which takes a lot of time to recover
January 04, 2018

Hadoop is pretty Badass

Score 9 out of 10
Vetted Review
Verified User
Review Source
  • It is cost effective.
  • It is highly scalable.
  • Failure tolerant.
  • Hadoop does not fit all needs.
  • Converting data into a single format takes time.
  • Need to take additional security measures to secure data.
Johanes Siregar | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Review Source
  • Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes.
  • Replication on Hadoop's distributed file system (HDFS) ensures robustness of data being stored which ensures high-availability of data.
  • Using commodity hardware as a node in a Hadoop cluster can reduce cost and eliminates dependency on particular proprietary technology.
  • User and access management are still challenging to implement in Hadoop, deploying a kerberized secured cluster is quite a challenge itself.
  • Multiple application versioning on a single cluster would be a nice to have feature.
  • Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.
Score 10 out of 10
Vetted Review
Verified User
Review Source
  • Hadoop can take loads of data quickly and performs well under load.
  • Hadoop is customizable so that nearly any business objective can be justified with the right combination of data and reports.
  • Hadoop has a lot of great resources, both informal like the community and formal like the supported modules and training.
  • Hadoop is not a relational database, but it has the ability to add modules to run sql-like queries like Impala and Hive.
  • Hadoop is open source and has many modules. It can be difficult without context to know which modules to leverage.
August 24, 2017

Hadoop for Big Data

Vinay Suneja | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
  • Highly Scalable Architecture
  • Low cost
  • Can be used in a Cloud Environment
  • Can be run on commodity Hardware
  • Open Source
  • Its open source but there are companies like hortonworks, Cloudera etc., which give enterprise support
  • Lots of scripting still needed
  • Some tools in the hadoop eco system overlap
Mark Gargiulo | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Review Source
  • The distributed replicated HDFS filesystem allows for fault tolerance and the ability to use low cost JBOD arrays for data storage.
  • Yarn with MapReduce2 gives us a job slot scheduler to fully utilize available compute resources while providing HA and resource management.
  • The hadoop ecosystem allows for the use of many different technologies all using the same compute resources so that your spark, samza, camus, pig and oozie jobs can happily co-exist on the same infrastructure.
  • Without Cloudera as a management interface the hadoop components are much harder to manage to ensure consistency across a cluster.
  • The calculations of hardware resources to job slots/resource management can be quite an exercise in finding that "sweet spot" with your applications, a more transparent way of figuring this out would be welcome.
  • A lot of the roles and management pieces are written in java, which from an administration perspective can have there own issues with garbage collection and memory management.
Muhammad Fazalul Rahman | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Review Source
  • Provides a reliable distributed storage to store and retrieve data. I am able to store data without having to worry that a node failing might cause the loss of data.
  • Parallelizes the task with MapReduce and helps complete the task faster. The ease of use of MapReduce makes it possible to write code in a simple way to make it run on different slaves in the cluster.
  • With the massive user base, it is not hard to find documentation or help relating to any problem in the area. Therefore, I rarely had any instances where I had to look for a solution for a really long time.
  • I would have hoped for a simpler interface if possible, so that the initial effort that had to be spent would have been much less. I often see others who are starting to use hadoop are finding it hard to learn.
  • I'm not sure if it is a problem with the organization and the modules they provide, but sometimes I wish there were more modules available to be used.
Tom Thomas | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
  • HDFS provides a very robust and fast data storage system.
  • Hadoop works well with generic "commodity" hardware negating the need for expensive enterprise grade hardware.
  • It is mostly unaffected by system and hardware failures of nodes and is self-sustained.
  • While its open source nature provides a lot of benefits, there are multiple stability issues that arise due to it.
  • Limited support for interactive analytics.
February 23, 2016

Hadoop quick review

Score 9 out of 10
Vetted Review
Verified User
Review Source
  • Machine Learning Model, when SAS can not process 3 of years data. Hadoop is good tool to build the model.
  • Data warehousing is also another good use case. Using Teradata is expensive.
  • A lot of people are not from a programming background which makes Hue very important for end users when starting the Hadoop journey. Making Hue more user friendly and functional will be helpful for end users who don't much of a programming background.
Piyush Routray | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • Hadoop is open source and with a wide community already present, the usage is much easy for individuals, startups and MNCs alike.
  • Hadoop works well for commodity hardware and that makes it easier to avoid pricey clusters.
  • Hadoop takes parallel programming to next level and helps processing of multi terabytes (even petabytes) of data easier.
  • While Hadoop MR parallelizes jobs involving Big Data, it is slow for smaller data sets
  • OLAP (analytics)is easier, however, OLTP (transactions) is a problem in most cases.
  • People using Hadoop have to keep in mind that small proof of concepts may not scale as expected.
Tushar Kulkarni | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
  • It is robust in the sense that any big data applications will continue to run even when individual servers fail.
  • Enormous data can be easily sorted.
  • It can be improved in terms of security.
  • Since it is open source, stability issues must be improved.
Score 9 out of 10
Vetted Review
Verified User
Review Source
  • Hadoop is a very cost effective storage solution for businesses’ exploding data sets.
  • Hadoop can store and distribute very large data sets across hundreds of servers that operate, therefore it is a highly scalable storage platform.
  • Hadoop can process terabytes of data in minutes and faster as compared to other data processors.
  • Hadoop File System can store all types of data, structured and unstructured, in nodes across many servers
  • For now, Hadoop is doing great and is very productive.
Pierre LaFromboise | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Review Source
  • No requirement for schema on write.
  • Ability to scale to massive amounts of data.
  • Open platform provides multiple options and customizations to fit your exact needs.
  • The platform is still maturing and can be confusing to research and use. Basic tasks can still be manual and are not always user friendly.
Mrugen Deshmukh | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Review Source
  • Hadoop is an excellent framework for building distributed, fault tolerant data processing systems which leverage HDFS which is optimized for low latency storage and high throughput performance.
  • Hadoop Map reduce is a powerful programming model and can be leveraged directly either via use of Java programming language or by data flow languages like Apache Pig.
  • Hadoop has a reach eco system of companion tools which enable easy integration for ingesting large amounts of data efficiently from various sources. For example Apache Flume can act as data bus which can use HDFS as a sink and integrates effectively with disparate data sources.
  • Hadoop can also be leveraged to build complex data processing and machine learning workflows, due to availability of Apache Mahout, which uses the map reduce model of Hadoop to run complex algorithms.
  • Hadoop is a batch oriented processing framework, it lacks real time or stream processing.
  • Hadoop's HDFS file system is not a POSIX compliant file system and does not work well with small files, especially smaller than the default block size.
  • Hadoop cannot be used for running interactive jobs or analytics.