Great Option for Unstructured Data | TrustRadius

Overall Satisfaction with Hadoop

Use Cases and Deployment Scope

Used for massive data collection, storage, and analytics
Used for MapReduce processes, Hive tables, Spark job input, and for backing up data
Storing retail catalog and session data to enable an omnichannel experience for customers and a 360-degree view of the customer
Providing a consistent data store that can be integrated with other platforms and serve as a single source of truth

Pros and Cons

Pros

HDFS is reliable and solid; in my experience, there are very few problems using it
Enterprise support from different vendors makes it easier to 'sell' inside an enterprise
It provides high scalability and redundancy
Horizontal scaling and distributed architecture
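
To put a rough number on the redundancy point, here is a small back-of-the-envelope sketch. It assumes a 128 MB block size and a replication factor of 3, which are common HDFS defaults but are not stated in this review; the function name `hdfs_footprint` is made up for illustration and is not part of any Hadoop API.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate (block_count, raw_storage_mb) for one file.

    block_size_mb and replication are assumed defaults, not values
    taken from the reviewer's cluster.
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the cluster.
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb

if __name__ == "__main__":
    blocks, raw_mb = hdfs_footprint(1024)
    print(f"1 GB file: {blocks} blocks, {raw_mb} MB of raw cluster storage")
```

Under these assumptions, redundancy roughly triples the raw storage a data set needs, which is worth factoring into capacity planning.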

Cons

Limited organizational support. Bugs need to be fixed, and outside help takes a long time to push updates
Not for small data sets
Data security needs to be ramped up
A NameNode failure takes a long time to recover from because the NameNode is not replicated

Return on Investment

Too many Hadoop projects divide the community's focus, which slows down some bug fixes
Mindset change among business partners
Adopting Hadoop/MapReduce has a learning curve
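
Much of that learning curve comes from recasting a problem as separate map and reduce steps. The sketch below imitates the three phases of a word count in plain Python. It is only an illustration of the programming model, not Hadoop's API, and all function names are hypothetical.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would per record."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Group values by key; a framework does this between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts collected for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))

if __name__ == "__main__":
    sample = ["hadoop stores data", "spark reads hadoop data"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps run on many machines and the shuffle moves data over the network, which is where much of the latency mentioned in this review comes from.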

Alternatives Considered

Apache Spark

For real-time streaming, use Spark; its processing model stands in stark contrast to the way MapReduce works
Use Hive for querying purposes

Other Software Used

Apache Solr, Apache Hive, Apache Spark

Likelihood to Recommend

Less appropriate for small data sets
Works well for scenarios with bulk amounts of data. Teams with offline (batch) applications can confidently choose the Hadoop file system
It is not an instant-query system like a SQL database, so use it only if your application can wait for the data to be crunched
Not for real-time applications
