Hadoop review
Overall Satisfaction with Hadoop
We use Hadoop for our ETL and analytics workloads. We stream data, land it on HDFS, and then massage and transform it. We query the data through the Hive interface, and we use Sqoop to import and export data into and out of the Hadoop ecosystem. We store the data on HDFS in the Avro and Parquet file formats.
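As a rough illustration of the Sqoop step, here is a minimal sketch of how the import and export commands could be assembled. The helper names, JDBC URL, table names, and HDFS paths are hypothetical and not taken from our actual jobs; the flags are standard Sqoop 1 options.

```python
import shlex

# Sqoop flags that select the on-HDFS file format for an import.
FORMAT_FLAGS = {
    "avro": "--as-avrodatafile",
    "parquet": "--as-parquetfile",
}


def sqoop_import_args(jdbc_url, table, target_dir, file_format="parquet"):
    """Build the argument list for a `sqoop import` that lands a table on HDFS.

    All argument values are placeholders supplied by the caller.
    """
    if file_format not in FORMAT_FLAGS:
        raise ValueError(f"unsupported file format: {file_format}")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        FORMAT_FLAGS[file_format],
    ]


def sqoop_export_args(jdbc_url, table, export_dir):
    """Build the argument list for a `sqoop export` from an HDFS directory."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,
        "--export-dir", export_dir,
    ]


if __name__ == "__main__":
    # Hypothetical connection string and paths, for illustration only.
    url = "jdbc:mysql://db.example.com/sales"
    print(shlex.join(sqoop_import_args(url, "orders", "/data/raw/orders", "avro")))
    print(shlex.join(sqoop_export_args(url, "order_summary", "/data/out/order_summary")))
```

The argument lists can be passed to `subprocess.run` on a host where Sqoop is installed, or dropped into an Oozie Sqoop action after adapting them to the real connection details.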
Pros
- Streaming data and loading it into HDFS
- Load jobs using Oozie, and Sqoop for exporting data
- Analytic queries using MapReduce, Spark, and Hive
Cons
- Speed is one of the improvements we are looking for. We see Spark as an option, and we are excited about it.
- Faster ETL and real-time streaming of data
- Transformation and loading jobs are orchestrated using Oozie