Overall Satisfaction with Hadoop
We utilize Hadoop primarily as a large data staging area for disparate corporate data. Select data is aggregated and moved downstream to a more formal data warehouse. Some data analytics is also performed directly against the Hadoop stored data. The direct analytics is done primarily with Apache Spark utilizing Scala and Python.
- No requirement for schema on write.
- Ability to scale to massive amounts of data.
- Open platform provides multiple options and customizations to fit your exact needs.
- The platform is still maturing and can be confusing to research and use. Basic tasks can still be manual and are not always user friendly.