Overall Satisfaction with Hadoop
Hadoop is part of our overall Data Strategy and is mainly used as a large-volume ETL platform and as a crunching engine for proprietary analytical and statistical models. The biggest challenge for developers and users is moving from an RDBMS query approach to accessing data to a schema-on-read, list-processing framework. The learning curve is steep up front, but Hive and end-user tools like Datameer can help bridge the gap. Data governance and stewardship are of key importance given the fluid nature of how data is stored and accessed.
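To make the shift from RDBMS queries to schema-on-read and list processing more concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API. The sample lines, the field names (`date`, `store`, `amount`), and the helper functions are all hypothetical and only illustrate the idea: raw text is stored as-is, a schema is applied when each line is read, and the result is aggregated in a map step followed by a reduce step. In an RDBMS or in Hive, the same question would roughly be a `SELECT store, SUM(amount) ... GROUP BY store` over a predefined table.

```python
from collections import defaultdict

# Hypothetical raw data, stored without any enforced schema.
RAW_LINES = [
    "2015-01-03,store_12,19.99",
    "2015-01-03,store_07,5.00",
    "malformed line",
    "2015-01-04,store_12,42.50",
]


def parse(line):
    """Apply the schema at read time; return None if the line does not fit."""
    parts = line.split(",")
    if len(parts) != 3:
        return None
    date, store, amount = parts
    try:
        return {"date": date, "store": store, "amount": float(amount)}
    except ValueError:
        return None


def map_phase(lines):
    """Emit (key, value) pairs for every line that matches the schema."""
    for line in lines:
        record = parse(line)
        if record is not None:
            yield record["store"], record["amount"]


def reduce_phase(pairs):
    """Sum the values for each key, like a GROUP BY with SUM."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


totals = reduce_phase(map_phase(RAW_LINES))
print(totals)
```

Note that the malformed line is only detected when it is read, not when it is stored. This is one reason data governance matters more in this model than in a schema-on-write database.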
- Hadoop has made it possible to implement projects that require large amounts of data from a diverse set of source systems.
- Hadoop has also taken load off the Enterprise Data Warehouse space by absorbing some of the analytics and model building work.
Hadoop is not an RDBMS and is not well suited for traditional BI applications.
Only a small portion of Hadoop's capabilities has been explored within our organization. Scaling is not a labor- or cost-intensive exercise, and the new workload management features of YARN are very attractive.