TrustRadius
Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.
Hadoop
204 Ratings
Score 8.0 out of 10

Hadoop Reviews

Reviews (1-25 of 28)
March 28, 2018

Hadoop Review: "Great Option for Unstructured Data"

Score 10 out of 10
Vetted Review
Verified User
Review Source
  • Used for Massive data collection, storage, and analytics
  • Used for MapReduce processes, Hive tables, Spark job input, and for backing up data
  • Storing Retail Catalog & Session data to enable omnichannel experience for customers, and a 360-degree customer insight
  • Having a consistent data store that can be integrated across other platforms, and have one single source of truth.
  • HDFS is reliable and solid, and in my experience with it, there are very few problems using it
  • Enterprise support from different vendors makes it easier to 'sell' inside an enterprise
  • It provides high scalability and redundancy
  • Horizontal scaling and distributed architecture
  • Less organizational support; bugs need to be fixed, and outside help takes a long time to push updates
  • Not for small data sets
  • Data security needs to be ramped up
  • The NameNode is not replicated, so recovering from a NameNode failure takes a lot of time
  • Less appropriate for small data sets
  • Works well for scenarios with bulk amounts of data; offline applications can comfortably use the Hadoop file system
  • It's not instant-query software like SQL; if your application can wait on the crunching of data, then use it
  • Not for real-time applications
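The MapReduce batch model these bullets refer to can be sketched in a few lines of plain Python. This is a toy single-process illustration of the map → shuffle → reduce flow, not Hadoop's Java API:

```python
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    """Toy single-process model of Hadoop's map -> shuffle -> reduce phases."""
    # Map phase: each record yields (key, value) pairs.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)  # shuffle: group values by key
    # Reduce phase: each key's values are folded into a final result.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}

# Classic word count, the canonical MapReduce example.
lines = ["hadoop stores data", "hadoop processes data"]
counts = mapreduce(
    lines,
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=lambda word, ones: sum(ones),
)
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

On a real cluster the map and reduce phases run on different nodes and the shuffle is spilled to disk between them; that disk-bound batch cycle is exactly why the reviewer steers real-time applications elsewhere.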
Read Bharadwaj (Brad) Chivukula's full review
May 16, 2018

"Hadoop Review"

Score 7 out of 10
It is being used massively in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of data has become quite easy since the arrival of the Hadoop environment. Our department is on the verge of moving towards Spark instead of MapReduce, but for now, Hadoop is being used extensively for MapReduce purposes.
  • Hadoop's distributed system is reliable.
  • High scalability
  • Open source, low cost, large community
  • Compatibility with Windows systems is limited
  • Security needs more focus
  • Hadoop lacks real-time processing

Read Kartik Chavan's full review
December 13, 2017

Review: "Hadoop: Highly available, scalable and cost effective for big data storage and processing."

Score 8 out of 10
Currently, there are two directorates using Hadoop for processing a vast amount of data from various data sources in my organization. Hadoop helps us tackle our problem of maintaining and processing a huge amount of data efficiently. High availability, scalability and cost efficiency are the main considerations for implementing Hadoop as one of the core solutions in our big-data infrastructure.
  • Scalability is one of the main reasons we decided to use Hadoop. Storage and processing power can be seamlessly increased by simply adding more nodes.
  • Replication on Hadoop's distributed file system (HDFS) ensures robustness of data being stored which ensures high-availability of data.
  • Using commodity hardware as a node in a Hadoop cluster can reduce cost and eliminate dependency on particular proprietary technology.
  • User and access management are still challenging to implement in Hadoop; deploying a Kerberized, secured cluster is quite a challenge in itself.
  • Multiple application versioning on a single cluster would be a nice-to-have feature.
  • Processing a large number of small files also becomes a problem on a very large cluster with hundreds of nodes.
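The small-files complaint has a concrete cause: the NameNode keeps the entire namespace in memory, with every file and block commonly estimated at roughly 150 bytes of heap each. A back-of-the-envelope sketch (the 150-byte figure is a rule of thumb, not an exact constant):

```python
def namenode_objects(num_files, file_size, block_size=128 * 1024 ** 2):
    """Rough NameNode heap object count: one object per file plus one per block."""
    blocks_per_file = max(1, -(-file_size // block_size))  # ceil division
    return num_files * (1 + blocks_per_file)

BYTES_PER_OBJECT = 150  # commonly cited rule of thumb for NameNode heap usage

# Ten million 1 MB files vs. the same ~10 TB held in 80 large 128 GB files:
small_files = namenode_objects(10_000_000, 1024 ** 2) * BYTES_PER_OBJECT
large_files = namenode_objects(80, 128 * 1024 ** 3) * BYTES_PER_OBJECT
print(small_files // 10 ** 6, "MB vs", large_files // 10 ** 6, "MB of heap")
# → 3000 MB vs 12 MB of heap
```

Same bytes, about 250× more master memory: that is why a very large cluster full of tiny files strains the NameNode well before the DataNodes run out of disk.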
Hadoop is well suited for internal projects in a secure environment without any external exposure. It also excels at storing and processing large amounts of data. It is also suitable as a data repository for data-intensive applications which require high data availability, a significant amount of memory and huge processing power. However, it is not appropriate as a near real-time solution that demands fast response times with a high number of transactions per second.
Read Johanes Siregar's full review
September 22, 2017

"Hadoop review 2346"

Score 9 out of 10
Hadoop is used to build a data lake where all enterprise data for my entire company can be stored. With data centralization and standardization we use it to build analytical solutions for our company. There are many other uses for the data - for example monitoring performance via KPIs, etc.
  • Massive data processing
  • Fault tolerance
  • Speed to market
  • Data visualization
  • Data history
  • Random access
Best: analytics

Worst: transaction processing
Read Gyan Dwibedy's full review
October 24, 2017

Review: "Hadoop for Justifying Business Decisions with Hard Data"

Score 10 out of 10
Hadoop has been an amazing development in the world of Big Data. Where relational databases fall short with regard to tuning and performance, Hadoop rises to the occasion and allows for massive customization leveraging the different tools and modules. We use Hadoop to input raw data and add layers of consolidation or analysis to make business decisions about disparate datapoints.
  • Hadoop can take loads of data quickly and performs well under load.
  • Hadoop is customizable so that nearly any business objective can be justified with the right combination of data and reports.
  • Hadoop has a lot of great resources, both informal like the community and formal like the supported modules and training.
  • Hadoop is not a relational database, but it has the ability to add modules, such as Impala and Hive, to run SQL-like queries.
  • Hadoop is open source and has many modules. It can be difficult without context to know which modules to leverage.
Hadoop is well suited for organizations with a lot of data, trying to justify business decisions with data-driven KPIs and milestones. This tool is best utilized by engineers with data modeling experience and a high-level understanding of how the different data points can be used and correlated. It will be challenging for people with limited knowledge of the business and how data points are created.
Read this authenticated review
January 03, 2018

User Review: "Hadoop is pretty Badass"

Score 9 out of 10
Apache Hadoop is a cost effective solution for storing and managing vast amounts of data efficiently. It is dependable and works even when various clusters fail. The Hadoop Distributed File System (HDFS) also goes a long way in helping to store data. MapReduce and Tez, with the help of Hive of course, process large amounts of data in a lesser time frame than expected. This helps our data warehouse to be updated with fewer resources than reading, processing and updating data in a relational database.
  • It is cost effective.
  • It is highly scalable.
  • Failure tolerant.
  • Hadoop does not fit all needs.
  • Converting data into a single format takes time.
  • Need to take additional security measures to secure data.
When we have data coming in from various sources, using Hadoop is a good call. It's a good central station to take a good look at your data and see what needs to be done.
Hadoop should not be used directly for real-time analytics. HDFS should be used to store data, and we can use Hive to query the files.
Hadoop needs to be understood thoroughly even before attempting to use it for data warehousing needs. So you may need to take stock of what Hadoop provides, and read up on its accompanying tools to see what fits your needs.
Read this authenticated review
August 24, 2017

User Review: "Hadoop for Big Data"

Score 10 out of 10
[It was used] As a proof of concept to analyze a huge amount of data. We were building a product to analyze huge data and eventually sell that product to a utility.
  • Highly Scalable Architecture
  • Low cost
  • Can be used in a Cloud Environment
  • Can be run on commodity Hardware
  • Open Source
  • It's open source, but there are companies like Hortonworks, Cloudera, etc., which give enterprise support
  • Lots of scripting still needed
  • Some tools in the Hadoop ecosystem overlap
  • To analyze a huge quantity of data at a low cost. It is definitely the future.
  • Machine learning with Spark is also a good use case.
  • You can also use AWS - EMR with S3 to store a lot of data with low cost.
Read Vinay Suneja's full review
June 03, 2016

User Review: "A newbie's look at Hadoop"

Score 8 out of 10
We needed a robust/redundant system to run multiple simultaneous jobs for our ETL pipeline; this needed distributed storage space, integration with Windows AD user accounts, and the ability to expand when needed with little to no downtime.
We are using Cloudera 5.6 to orchestrate the install (along with Puppet) and manage the Hadoop cluster.
  • The distributed replicated HDFS filesystem allows for fault tolerance and the ability to use low cost JBOD arrays for data storage.
  • Yarn with MapReduce2 gives us a job slot scheduler to fully utilize available compute resources while providing HA and resource management.
  • The Hadoop ecosystem allows for the use of many different technologies all using the same compute resources, so that your Spark, Samza, Camus, Pig and Oozie jobs can happily co-exist on the same infrastructure.
  • Without Cloudera as a management interface, the Hadoop components are much harder to manage and keep consistent across a cluster.
  • The calculation of hardware resources to job slots/resource management can be quite an exercise in finding that "sweet spot" with your applications; a more transparent way of figuring this out would be welcome.
  • A lot of the roles and management pieces are written in Java, which from an administration perspective can have its own issues with garbage collection and memory management.
Hadoop is not for the faint of heart, and it is not a single technology per se but an ecosystem of disparate technologies sitting on top of HDFS. It is certainly powerful, but if, like me, you were handed this with no prior knowledge or experience using or administering this ecosystem, the learning curve can be significant and ongoing. Having said that, I don't think there are currently many other open-source technologies that can provide this flexibility in the "big data" arena, especially for ETL or machine learning.
Read Mark Gargiulo's full review
May 26, 2016

Review: "Experience with Hadoop by a novice user."

Score 7 out of 10
Hadoop is not used as a norm in my organization. I just use it personally to complete my job faster. It is implemented in the research computing cluster to be used by faculty and students. It completes jobs faster by parallelizing the tasks using MapReduce framework. This gives me considerable speed in the tasks I perform.
  • Provides a reliable distributed storage to store and retrieve data. I am able to store data without having to worry that a node failing might cause the loss of data.
  • Parallelizes the task with MapReduce and helps complete the task faster. The ease of use of MapReduce makes it possible to write code in a simple way to make it run on different slaves in the cluster.
  • With the massive user base, it is not hard to find documentation or help relating to any problem in the area. Therefore, I rarely had any instances where I had to look for a solution for a really long time.
  • I would have hoped for a simpler interface if possible, so that the initial effort would have been much less. I often see that others who are starting to use Hadoop find it hard to learn.
  • I'm not sure if it is a problem with the organization and the modules they provide, but sometimes I wish there were more modules available to be used.
If the user is trying to complete a task quickly and efficiently, then Hadoop is the best option for them. However, it may happen that the deadline for submission is close and the user has little or no knowledge of Hadoop. In this case, it is easier not to use Hadoop, since it takes time to learn.
Read Muhammad Fazalul Rahman's full review
May 25, 2016

Review: "Hadoop is the Perfect Enterprise tool for Big Data"

Score 10 out of 10
The company I worked at used Hadoop clusters for processing huge datasets. They had several nodes for both production and pre-production. It allowed distributed processing of data across several clusters with an easy to use software model. It is used by the Systems and IT department at my company.
  • HDFS provides a very robust and fast data storage system.
  • Hadoop works well with generic "commodity" hardware negating the need for expensive enterprise grade hardware.
  • It is mostly unaffected by system and hardware failures of nodes and is self-sustained.
  • While its open source nature provides a lot of benefits, there are multiple stability issues that arise due to it.
  • Limited support for interactive analytics.
Hadoop is a very powerful tool that can be used in almost any environment where huge scale processing of data across clusters is required. It provides multiple modules such as HDFS and MapReduce that will make managing and analyzing said data reliable and efficient. Hadoop is a new and constantly evolving tool, and hence it needs users to be on top of it all the time.
Read Tom Thomas's full review
December 01, 2015

Review: "Hadoop - Effective tool for large scale distributed processing."

Score 8 out of 10
I have used Hadoop for building business feeds for a telecom client. The major purpose for using Hadoop was to tackle the problem of gaining insights into the ever growing number of business data. We leveraged the map reduce programming model to churn more than 30 gigabytes of data per day into actionable and aggregated data which was further leveraged by campaign teams to design and shape marketing and by product teams to envision new customer experiences.
  • Hadoop is an excellent framework for building distributed, fault-tolerant data processing systems which leverage HDFS, which is optimized for high-throughput storage performance.
  • Hadoop MapReduce is a powerful programming model and can be leveraged either directly via the Java programming language or by data flow languages like Apache Pig.
  • Hadoop has a rich ecosystem of companion tools which enable easy integration for ingesting large amounts of data efficiently from various sources. For example, Apache Flume can act as a data bus which can use HDFS as a sink and integrates effectively with disparate data sources.
  • Hadoop can also be leveraged to build complex data processing and machine learning workflows, due to availability of Apache Mahout, which uses the map reduce model of Hadoop to run complex algorithms.
  • Hadoop is a batch-oriented processing framework; it lacks real-time or stream processing.
  • Hadoop's HDFS file system is not a POSIX compliant file system and does not work well with small files, especially smaller than the default block size.
  • Hadoop cannot be used for running interactive jobs or analytics.
1. How large are your data sets? If your answer is few gigabytes, Hadoop may be overkill for your needs.
2. Do you require real-time analytical processing? If yes, Hadoop's map reduce may not be a great asset in that scenario.
3. Do you want to process data in a batch-processing fashion and scale to terabyte-sized clusters? Hadoop is definitely a great fit for your use case.
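The scale question in this checklist can be made concrete. A MapReduce job launches roughly one map task per input split, and the split size defaults to the HDFS block size, so data volume translates directly into parallelism. A minimal sketch, assuming the common 128 MB default block size:

```python
def map_tasks(input_bytes, split_bytes=128 * 1024 ** 2):
    """Roughly one map task per input split; splits default to the block size."""
    return max(1, -(-input_bytes // split_bytes))  # ceil division

# The ~30 GB daily telecom feed described above fans out to hundreds of maps:
print(map_tasks(30 * 1024 ** 3))  # 240
# A "few gigabytes" dataset barely keeps a modest cluster busy:
print(map_tasks(3 * 1024 ** 3))   # 24
```

When the split count is in the single digits, the framework's scheduling and startup overhead dominates the actual work, which is the sense in which Hadoop is overkill for small data.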
Read Mrugen Deshmukh's full review
February 16, 2016

Review: "Apache Hadoop is the best open source product I used."

Score 9 out of 10
My present company uses Hadoop and associated technology to create a data pipeline using open source tools. Apart from that we also consult for projects which could potentially use Hadoop. Apart from that, I also work as a consultant for HDP. We actively help in installation and setup of hadoop clusters.
  • Hadoop is open source, and with a wide community already present, usage is much easier for individuals, startups and MNCs alike.
  • Hadoop works well for commodity hardware and that makes it easier to avoid pricey clusters.
  • Hadoop takes parallel programming to the next level and makes processing multiple terabytes (even petabytes) of data easier.
  • While Hadoop MR parallelizes jobs involving Big Data, it is slow for smaller data sets
  • OLAP (analytics) is easier; however, OLTP (transactions) is a problem in most cases.
  • People using Hadoop have to keep in mind that small proof of concepts may not scale as expected.
Hadoop is well suited only if you have large datasets to work upon. Jumping to Hadoop with small data sets won't be as useful.
Read Piyush Routray's full review
February 13, 2016

Review: "Hadoop an awesome tool for large scale batch processing."

Score 10 out of 10
I have been working with Hadoop since last year. It is very user friendly. Hadoop was used by the data center management team. It allows distributed processing of huge data sets across clusters of computers using simple programming models.
  • It is robust in the sense that any big data applications will continue to run even when individual servers fail.
  • Enormous data can be easily sorted.
  • It can be improved in terms of security.
  • Since it is open source, stability issues must be improved.
Hadoop is really very useful when dealing with big data.
Read Tushar Kulkarni's full review
December 04, 2015

"A very generic Hadoop review"

Score 10 out of 10
We utilize Hadoop primarily as a large data staging area for disparate corporate data. Select data is aggregated and moved downstream to a more formal data warehouse. Some data analytics is also performed directly against the Hadoop stored data. The direct analytics is done primarily with Apache Spark utilizing Scala and Python.
  • No requirement for schema on write.
  • Ability to scale to massive amounts of data.
  • Open platform provides multiple options and customizations to fit your exact needs.
  • The platform is still maturing and can be confusing to research and use. Basic tasks can still be manual and are not always user friendly.
A big data problem doesn't always mean huge volumes of data. The other V's of big data (velocity and variety) are also important factors that may lead to selecting Hadoop as a platform.
Read Pierre LaFromboise's full review
December 01, 2015

Review: "Hadoop the solution to big data problems"

Score 10 out of 10
Hadoop is used by the data center management team. Hadoop processes the metric data pushed by virtual machines. Hadoop's output is served to the analytics engine, and respective actions are taken to maintain an even load on machines.
  • Processing huge data sets.
  • Concurrent processing.
  • Performance increases with distribution of data across multiple machines.
  • Better handling of unstructured data.
  • Data nodes and processing nodes
  • Make Hadoop lightweight.
  • Installation is very difficult; make it more user friendly.
  • Introduce a feature that works with continuous integration.
Ask about how Hadoop fits in your environment and how fast it processes streaming data.
Read Sudhakar Kamanboina's full review
December 01, 2015

User Review: "Fast and Reliable, Use Hadoop!"

Score 10 out of 10
I have been using Hadoop for 2 years and I really find it very useful, especially working with bigger datasets. I have used Hadoop and Mahout for my project to analyze and learn different patterns from Yelp Dataset. It was really very easy and user friendly to use.

  • Scalability. Hadoop is really useful when you are dealing with a bigger system and you want to make your system scalable.
  • Reliable. Very reliable.
  • Fast, Fast Fast!!! Hadoop really works very fast, even with bigger datasets.
  • Development tools are not that easy to use.
  • Learning curve can be reduced. As of now, some skill is a must to use Hadoop.
  • Security. In today's world, security is of prime importance. Hadoop could be made more secure to use.
Hadoop is really useful for larger datasets. It is not very useful when you are dealing with a smaller dataset.
Read Gaurav Kasliwal's full review
November 17, 2015

User Review: "Wanna gain insight? Use Hadoop!"

Score 9 out of 10
I have been using Hadoop for the last 12 months and really find it effective when dealing with large amounts of data. I have used Hadoop jointly with Apache Mahout for building a recommendation system and got amazing results. It was fast, reliable and easy to manage.
  • Fast. Prior to working with Hadoop I had many performance issues where our system was very slow. After using Hadoop the performance increased significantly.
  • Fault tolerant. The HDFS (Hadoop distributed file system) is good platform for working with large data sets and makes the system fault tolerant.
  • Scalable. As Hadoop can deal with structured and unstructured data it makes the system scalable.
  • Security. As it has to deal with a large data set it can be vulnerable to malicious data.
  • Less performance with smaller data. Doesn't provide effective results if the data is very small.
  • Requires a skilled person to handle the system.
I would recommend Hadoop when a system is dealing with huge amounts of data.
Read Sumant Murke's full review
November 11, 2015

User Review: "Advantage Hadoop"

Score 10 out of 10
We are using it for retail data ETL processing. This is going to be used in the whole organization. It allows terabytes of data to be processed in a faster, scalable manner.
  • Processes big volumes of data quickly using parallelism.
  • No schema required. Hadoop can process any type of data.
  • Hadoop is horizontally scalable.
  • Hadoop is free.
  • Development tools are not that friendly.
  • Hard to find Hadoop resources.
Hadoop is not a replacement of a transactional system such as RDBMS. It is suitable for batch processing.
Read Ajay Jha's full review
April 29, 2015

Review: "Hadoop for better economy and efficiency"

Score 7 out of 10
Hadoop is used for storing and analyzing log data (logs from warehouse loads or other data processing) as well as storing and retrieving financial data from JD Edwards. It's also planned to be used for archival. Hadoop is used by several departments within our organization. Currently, we are paying a lot of money for hosting historical data, and we plan to move that to Hadoop, reducing our storage costs. Also, we got much better performance out of our Hadoop cluster for processing a large amount of financial data. So, in that sense, Hadoop addressed multiple business problems for us.
  • Hadoop stores and processes unstructured data such as web access logs or logs of data processing very well
  • Hadoop can be effectively used for archiving; providing a very economic, fast, flexible, scalable and reliable way to store data
  • Hadoop can be used to store and process a very large amount of data very fast
  • Security is a piece that's missing from Hadoop - you have to supplement security using Kerberos etc.
  • Hadoop is not easy to learn - there are various modules with little or no documentation
  • Hadoop being open-source, testing, quality control and version control are very difficult
Hadoop is best suited for warehouse or OLAP processing. It's not suitable for OLTP or small transaction processing
Read Bhushan Lakhe's full review
August 19, 2015

User Review: "Hadoop - You Can Tame the Elephant"

Score 10 out of 10
Hadoop is slowly taking the place of the company-wide MySQL data warehouse. Sqoop is being used to import the data from MySQL. Impala is gradually being used as the new data source for all queries. Eventually, MySQL will be phased out, and all data will go directly into Hadoop. Tests have shown that the queries run from Impala are much faster than those from MySQL.
  • The built-in data block redundancy helps ensure that the data is safe. Hadoop also distributes the storage, processing, and memory, to work with large amounts of data in a shorter period of time, compared to a typical database system.
  • There are numerous ways to get at the data. The basic way is via the Java-based API, by submitting MapReduce jobs in Java. Hive works well for quick queries, using SQL, which are automatically submitted as MapReduce Jobs.
  • The web-based interface is great for monitoring and administering the cluster, because it can potentially be done from anywhere.
  • Impala is a very fast alternative to Hive. Unlike Hive, which submits queries as MapReduce jobs, Impala provides immediate access to the data.
  • If you are not familiar with Java, or with the operating system Hadoop runs on (typically Linux), the error messages from failed MapReduce jobs can seem cryptic, and it can be challenging to track down the source of a problem.
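The Java API and Hive paths described above both boil down to map and reduce phases. As a minimal local sketch (not the reviewer's actual jobs), here is the classic word count expressed as the two phases, in the style a Hadoop Streaming mapper and reducer would use; the input lines are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word, as a streaming
    mapper would write key-value lines to stdout."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reducer: sum counts per key. Hadoop's shuffle delivers pairs
    grouped by key between the phases; sorting simulates that here."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (key, sum(count for _, count in group))

# Hypothetical input in place of HDFS blocks:
counts = dict(reduce_phase(map_phase(["Hadoop stores data",
                                      "Hadoop processes data"])))
```

On a real cluster the framework runs many mapper and reducer instances in parallel across nodes; the per-record logic stays this simple.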
Hadoop is designed for huge data sets, which can save a lot of time when reading and processing data. However, the NameNode, which allocates the data blocks, is a single point of failure. Without a proper backup, or another NameNode ready to kick in, the file system can instantly become unusable. There are typically two ways to ensure the integrity of the NameNode.

One way is to have a Secondary NameNode, which periodically creates a copy of the file system image file. The process is called a "checkpoint". In the event of a failure of the Primary NameNode, the Secondary NameNode can be manually configured as the Primary NameNode. The need for manual intervention can cause delays and potentially other problems.

The second method is with a Standby NameNode. In this scenario, the same checkpoints are performed, however, in the event of a Primary NameNode failure, the Standby NameNode will immediately take the place of the Primary, preventing a disruption in service. This method requires additional services to be installed for it to operate.
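The Standby NameNode approach described above is enabled through HDFS high-availability settings. The fragment below is a sketch of the relevant `hdfs-site.xml` properties, with the nameservice id `mycluster`, the NameNode ids, and the ZooKeeper hosts all illustrative placeholders:

```xml
<!-- Sketch only: nameservice, NameNode ids, and hosts are hypothetical -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```

With automatic failover enabled, ZooKeeper-based failover controllers promote the standby without the manual intervention the Secondary NameNode approach requires.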
Read Michael Reynolds's full review
March 19, 2015

"Hadoop review"

Score 8 out of 10
Vetted Review
Verified User
Review Source
We use Hadoop for our ETL and analytic functions. We stream data and land it on HDFS, then massage and transform the data. We then use the Hive interface to query this data. Using Sqoop, we export and import data in and out of the Hadoop ecosystem. We store the data on HDFS in Avro and Parquet file formats.
  • Streaming data and loading to HDFS
  • Load jobs using Oozie and Sqoop for exporting data.
  • Analytic queries using MapReduce, Spark and Hive
  • Speed is one of the improvements we are looking for. We see Spark as an option and we are excited.
OLTP is a scenario where I think it is less appropriate. But the future will certainly be different.
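The Hive-over-Parquet workflow this review describes typically looks something like the following sketch, where the table name, columns, and HDFS path are all hypothetical:

```sql
-- Sketch: expose Parquet files landed on HDFS as a Hive table
CREATE EXTERNAL TABLE IF NOT EXISTS events (
  event_id STRING,
  event_ts TIMESTAMP,
  amount   DOUBLE
)
STORED AS PARQUET
LOCATION '/data/landing/events';

-- Analytic query; Hive compiles this into a distributed job
SELECT to_date(event_ts) AS event_day, SUM(amount) AS total
FROM events
GROUP BY to_date(event_ts);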
Read Pramod Deshmukh's full review
May 09, 2014

"User Review of Hadoop"

Score 10 out of 10
Vetted Review
Verified User
Review Source
Hadoop is part of the overall Data Strategy and is mainly used as a large volume ETL platform and crunching engine for proprietary analytical and statistical models. The biggest challenge for developers/users is moving from an RDBMS query approach for accessing data to a schema on read and list processing framework. The learning curve is steep upfront, but Hive and end user tools like Datameer can help to bridge the gap. Data governance and stewardship are of key importance given the fluid nature of how data is stored and accessed.
  • Gives developers and data analysts flexibility for sourcing, storing and handling large volumes of data.
  • Data redundancy and tunable MapReduce parameters to ensure jobs complete in the event of hardware failure.
  • Adding capacity is seamless.
  • Logs could be easier to read.
Not an RDBMS - not well suited for traditional BI applications.
Read Andrea Krause's full review
May 03, 2014

Review: "Hadoop usage and relevance for large data indexing and searching (getting Trends of Data)"

Score 9 out of 10
Vetted Review
Reseller
Review Source
We use the Hadoop stack for data indexing and search, via the Pentaho plugin.
  • Setup cluster of data on commodity server
  • Distribute data indexing and search process
  • Faster access to distributed indexed data using the Pentaho plugin, without needing to write the whole set of analytics scripts, since Pentaho handles that for us
  • Uniformity of installation
For data indexing (unstructured content) and search without any costly server, there is no real alternative option.
Read Chandra Gupta's full review
December 09, 2015

Review: "Hadoop - best data optimization for the Enterprise"

Score 9 out of 10
Vetted Review
Verified User
Review Source
My organization uses Apache Hadoop for log analysis/data mining of data fetched from different practices in the US, Canada and India. It uses this data for showing analytical graphs and the progress of our software in those regions. Data from the practices is optimized and consumed by the customer applications. It provides faster performance and ease for data usage.
  • Hadoop is a very cost-effective storage solution for businesses’ exploding data sets.
  • Hadoop can store and distribute very large data sets across hundreds of servers operating in parallel, making it a highly scalable storage platform.
  • Hadoop can process terabytes of data in minutes, faster than many other data processors.
  • The Hadoop file system can store all types of data, structured and unstructured, in nodes across many servers.
  • For now, Hadoop is doing great and is very productive.
Hadoop is well suited for healthcare organizations that deal with huge amounts of data and optimizing data.
Read this authenticated review
December 01, 2015

Hadoop Review: "From the experience of a naive developer!"

Score 9 out of 10
Vetted Review
Verified User
Review Source
I used Hadoop for my academic projects for processing high volume data of my data mining project.
  • It was able to map our data with a clear distinction based on the key.
  • We were able to write simple MapReduce code which ran simultaneously on multiple nodes.
  • The self-healing system was really helpful in the case of multiple failures.
  • I think Hadoop should not have a single point of failure in the NameNode.
  • It should have good public-facing APIs for easy integration.
  • The internals of Hadoop are very abstract.
  • Protocol Buffers is a really good concept, but I am not sure whether other options were evaluated as well.
I think Hadoop has multiple flavors which people can customize to use as per their requirements. But I would choose Hadoop based on the following factors:
1. The number of nodes, decided by the degree of parallelism we want.
2. Whether the module we want to run can run in parallel on all machines.
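Parallelism in Hadoop hinges on routing records with the same key to the same node, as the factors above suggest. Below is a minimal sketch (hypothetical keys and node count) of a stable hash partitioner, the same idea Hadoop uses to assign reduce keys to reducers:

```python
import hashlib

def partition(key, num_nodes):
    """Stable hash partitioner: the same key always routes to the same
    node, so all values for a key can be reduced together."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Hypothetical records routed across 4 simulated nodes
nodes = 4
records = [("user1", 10), ("user2", 5), ("user1", 7)]
placement = {key: partition(key, nodes) for key, _ in records}
```

Because the mapping is deterministic, both occurrences of `user1` land on the same node regardless of when they arrive.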
Read this authenticated review

About Hadoop

Hadoop is an open source software from Apache, supporting distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commoditized hardware.
Categories:  Hadoop-Related

Hadoop Integrations

Pricing

Free Trial Available? No
Free or Freemium Version Available? Yes
Premium Consulting/Integration Services Available? No
Entry-level set up fee? No

Hadoop Technical Details

Operating Systems: Unspecified
Mobile Application: No