Item: Apache Hadoop
Rating: 9
Author: Verified User

Overall Satisfaction with Hadoop

Use Cases and Deployment Scope

Apache Hadoop is a cost effective solution for storing and managing vast amounts of data efficiently. It is dependable and works even when various clusters fail. The Hadoop Distributed File System (HDFS) also goes a long way in helping in storing data. MapReduce and Tez, with the help of Hive of course, processes large amounts of data in a lesser time frame than expected. This helps our data warehouse to be updated with lesser resources rather than reading, processing and updating data in a relational data base.

Pros and Cons

Pros

It is cost effective.
It is highly scalable.
Failure tolerant.

Cons

Hadoop does not fit all needs.
Converting data into a single format takes time.
Need to take additional security measures to secure data.

Return on Investment

It has made us respect the sheer volume of Big data.
It helps us update our dashboards.
It's easier to think big with Hadoop.

Other Software Used

Slack, Enthought Canopy, IntelliJ IDEA, Google Drive, Eclipse, JIRA Software, Atlassian Confluence

Likelihood to Recommend

When we have data coming in from various sources, using hadoop is a good call. Its a good central station to take a good look at your data and see what needs to be done.
Hadoop should not be used directly for Real time Analytics. HDFS should be used to store data and we could use Hive to query the files.
Hadoop needs to be understood thoroughly even before attempting to use it for data warehousing needs. So you may need to take stock of what Hadoop provides, and read up on its accompanying tools to see what fits your needs.

Comments

Please log in to join the conversation

Hadoop is pretty Badass

Modules Used