AWS Data Pipeline - Data engineer's time saver
Use Cases and Deployment Scope
We use AWS Data Pipeline to build ETL job flows that extract, transform, and load data into Amazon Redshift. It helps our data engineers create and manage complex data processing flows quickly and effectively.
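To give a sense of what such a flow looks like, here is a minimal sketch of a pipeline definition that copies files from S3 into a Redshift table, built as a Python dict and serialized to JSON. It is not our production pipeline: the object ids, bucket path, cluster id, table name, and credentials are all placeholder assumptions, and the helper functions `build_definition` and `unresolved_refs` are hypothetical names introduced only for this illustration. The field names follow the Data Pipeline definition format as I understand it, so check them against the official object reference before relying on them.

```python
import json


def build_definition(s3_path, table_name, cluster_id, db_name, db_user, db_password):
    """Return a pipeline definition dict that copies S3 data into Redshift.

    All ids and values are illustrative placeholders, not a real deployment.
    """
    objects = [
        {   # Pipeline-wide defaults (run on demand, default IAM roles).
            "id": "Default",
            "name": "Default",
            "scheduleType": "ondemand",
            "failureAndRerunMode": "CASCADE",
            "role": "DataPipelineDefaultRole",
            "resourceRole": "DataPipelineDefaultResourceRole",
        },
        {   # EC2 instance that runs the copy, shut down after an hour.
            "id": "Ec2Runner",
            "name": "Ec2Runner",
            "type": "Ec2Resource",
            "terminateAfter": "1 Hour",
        },
        {   # Source: a folder of files in S3.
            "id": "S3Input",
            "name": "S3Input",
            "type": "S3DataNode",
            "directoryPath": s3_path,
        },
        {   # Connection details for the Redshift cluster.
            "id": "RedshiftDb",
            "name": "RedshiftDb",
            "type": "RedshiftDatabase",
            "clusterId": cluster_id,
            "databaseName": db_name,
            "username": db_user,
            "*password": db_password,
        },
        {   # Target table in Redshift.
            "id": "RedshiftTable",
            "name": "RedshiftTable",
            "type": "RedshiftDataNode",
            "tableName": table_name,
            "database": {"ref": "RedshiftDb"},
        },
        {   # The load step itself, retried a few times on failure.
            "id": "CopyToRedshift",
            "name": "CopyToRedshift",
            "type": "RedshiftCopyActivity",
            "input": {"ref": "S3Input"},
            "output": {"ref": "RedshiftTable"},
            "insertMode": "KEEP_EXISTING",
            "runsOn": {"ref": "Ec2Runner"},
            "maximumRetries": "3",
        },
    ]
    return {"objects": objects}


def unresolved_refs(definition):
    """Return a sorted list of referenced ids that no object defines."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = set()
    for obj in definition["objects"]:
        for value in obj.values():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.add(value["ref"])
    return sorted(missing)


definition = build_definition(
    "s3://example-bucket/input/", "events",
    "example-cluster", "analytics", "etl_user", "not-a-real-password",
)
print(json.dumps(definition, indent=2))
```

Running a quick check like `unresolved_refs(definition)` before uploading a definition is a cheap way to catch a mistyped object id locally rather than finding out after activation.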
Pros
- Helps you easily create complex data processing workloads
- Fault tolerant
- Highly available
Cons
- Pipelines can get stuck in Pending status
- Pipeline components can get stuck in Waiting for Runner status
- EMR clusters sometimes fail with errors
Most Important Features
- Easy way to create pipelines
- Scalable infrastructure to process large amounts of data
- Fault tolerant
Return on Investment
- Easy to use
- Data engineers can create data pipelines quickly and effectively
- Scalable and fault tolerant
Alternatives Considered
Azure Data Factory
Other Software Used
Azure Data Factory, AWS Glue, Google Cloud Dataflow
