User Review of Hadoop
May 09, 2014

Andrea Krause | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Modules Used

  • Hadoop Common
  • Hadoop Distributed File System
  • Hadoop MapReduce
  • Pig, Hive, Streaming

Overall Satisfaction with Hadoop

Hadoop is part of our overall data strategy and is used mainly as a large-volume ETL platform and as a processing engine for proprietary analytical and statistical models. The biggest challenge for developers and users is moving from an RDBMS query approach to data access to a schema-on-read, list-processing framework. The learning curve is steep at first, but Hive and end-user tools like Datameer can help bridge the gap. Data governance and stewardship are critically important, given how fluidly data is stored and accessed.
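
For readers coming from an RDBMS background, the shift to schema on read and list processing can be sketched in a few lines. This is only an illustration in plain Python, not Hadoop's actual API: the sample records, field names, and the functions parse, map_phase, and reduce_phase are all hypothetical. The raw lines are stored untouched, the schema is applied only when they are read, and the aggregation is expressed as a map step followed by a reduce step.

```python
from collections import defaultdict

# Hypothetical raw records, stored as-is. No schema is enforced at write time.
RAW_LINES = [
    "2014-05-01,store_a,19.99",
    "2014-05-01,store_b,5.00",
    "malformed line",
    "2014-05-02,store_a,7.50",
]

def parse(line):
    """Apply the schema at read time; return None for lines that do not fit."""
    parts = line.split(",")
    if len(parts) != 3:
        return None
    date, store, amount = parts
    try:
        return {"date": date, "store": store, "amount": float(amount)}
    except ValueError:
        return None

def map_phase(lines):
    """Emit (key, value) pairs, skipping records that fail the schema."""
    for line in lines:
        record = parse(line)
        if record is not None:
            yield record["store"], record["amount"]

def reduce_phase(pairs):
    """Group values by key and sum them, one total per key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Total sales amount per store, computed from the raw lines.
totals = reduce_phase(map_phase(RAW_LINES))
```

In SQL terms this is roughly a SUM of amount grouped by store, except that the malformed line is dropped when the data is read rather than rejected when it is loaded. That difference is part of why governance and stewardship matter so much on this kind of platform.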
  • Gives developers and data analysts flexibility for sourcing, storing and handling large volumes of data.
  • Data redundancy and tunable MapReduce parameters help ensure that jobs complete in the event of hardware failure.
  • Adding capacity is seamless.
  • Logs could be easier to read.
  • Hadoop has made it possible to implement projects that require large amounts of data from a diverse set of source systems.
  • Hadoop has also taken load off the Enterprise Data Warehouse space by absorbing some of the analytics and model building work.
Hadoop is not an RDBMS and is not well suited to traditional BI applications.

Using Hadoop

Only a small portion of Hadoop's capabilities has been explored within our organization. Scaling is not a labor- or cost-intensive exercise, and the new workload management features of YARN are very attractive.