Overall Satisfaction with Hadoop
We use Hadoop for our ETL and analytics workloads. We stream data, land it on HDFS, and then cleanse and transform it. We then query this data through the Hive interface. Using Sqoop, we export and import data into and out of the Hadoop ecosystem. We store the data on HDFS in the Avro and Parquet file formats.
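The cleanse-and-transform step described above runs in Hive or Spark on the cluster; as a rough local sketch (the file names and field layout here are hypothetical, not our actual schema), the same kind of cleanup can be expressed with standard shell tools:

```shell
# Hypothetical raw feed: id,region,amount with messy whitespace in the
# region field. On the cluster this cleanup happens in Hive/Spark before
# the data lands in Avro or Parquet tables.
printf 'id,region,amount\n1, us ,10\n2, eu ,20\n' > raw_events.csv

# Drop the header, trim whitespace around the region code, and normalize
# it to upper case.
tail -n +2 raw_events.csv \
  | awk -F',' '{ gsub(/^ +| +$/, "", $2); print $1 "," toupper($2) "," $3 }' \
  > clean_events.csv
```

The cleaned rows (`1,US,10` and so on) stand in for the records we would then expose through Hive.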
- Streaming data and loading it to HDFS
- Load jobs orchestrated with Oozie, and Sqoop for exporting data
- Analytic queries using MapReduce, Spark, and Hive
- Speed is one of the improvements we are looking for; we see Spark as an option and are excited about it.
- Fast ETL and real-time streaming data
- Transformation and loading jobs are orchestrated using Oozie
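The map-shuffle-reduce shape behind the MapReduce and Hive analytic queries above can be sketched locally with shell pipes; the sample data is made up, and on the cluster this same shape runs as a MapReduce (or Hadoop Streaming) job rather than a local pipeline:

```shell
# Classic word count, the "hello world" of MapReduce, simulated locally:
#   map:     tr splits each line into one word per line
#   shuffle: sort brings identical keys together
#   reduce:  uniq -c counts each group
printf 'spark hive\nhive sqoop hive\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{ print $2 "," $1 }' > counts.csv
```

The sort step plays the role of the shuffle phase: grouping identical keys is what lets each reducer see all values for a key at once.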