TrustRadius: an HG Insights company

AWS Data Pipeline

Score 6.1 out of 10

14 Reviews and Ratings

What is AWS Data Pipeline?

AWS Data Pipeline is a web service used to process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, users can regularly access data where it’s stored, transform and process it at scale, and transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR. AWS Data Pipeline is designed to help create complex data processing workloads that are fault tolerant, repeatable, and highly available.

AWS Data Pipeline - Data engineer's time saver

Use Cases and Deployment Scope

We use AWS Data Pipeline to build data flows that extract, transform, and load data into Amazon Redshift; in other words, we use it to create ETL job flows. It helps our data engineers create and manage complex data processing flows quickly and effectively.
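
For readers unfamiliar with the service, a flow of this kind can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not our production pipeline: the S3 path, table name, cluster id, and object ids are hypothetical, and the field names follow the Data Pipeline definition format as I understand it, so check them against the AWS documentation before relying on them. The `unresolved_refs` helper is my own sanity check, not part of any AWS library.

```python
import json


def build_redshift_copy_pipeline(s3_path, table_name, cluster_id, database_name):
    """Return a pipeline definition (as a dict) that loads S3 data into Redshift daily.

    All ids below are hypothetical; the #{...} values are parameter
    placeholders that would be supplied when the pipeline is activated.
    """
    objects = [
        # Defaults inherited by every other object in the pipeline.
        {"id": "Default", "name": "Default", "scheduleType": "cron",
         "schedule": {"ref": "DailySchedule"},
         "role": "DataPipelineDefaultRole",
         "resourceRole": "DataPipelineDefaultResourceRole"},
        # Run once a day, starting when the pipeline is first activated.
        {"id": "DailySchedule", "name": "DailySchedule", "type": "Schedule",
         "period": "1 day", "startAt": "FIRST_ACTIVATION_DATE_TIME"},
        # Extract: the staged input files in S3.
        {"id": "S3Input", "name": "S3Input", "type": "S3DataNode",
         "directoryPath": s3_path},
        # Load target: the Redshift cluster and table.
        {"id": "RedshiftDb", "name": "RedshiftDb", "type": "RedshiftDatabase",
         "clusterId": cluster_id, "databaseName": database_name,
         "username": "#{myRedshiftUsername}",
         "*password": "#{*myRedshiftPassword}"},
        {"id": "RedshiftTable", "name": "RedshiftTable", "type": "RedshiftDataNode",
         "tableName": table_name, "database": {"ref": "RedshiftDb"}},
        # Compute resource that runs the copy.
        {"id": "Ec2Worker", "name": "Ec2Worker", "type": "Ec2Resource",
         "terminateAfter": "1 Hour"},
        # The activity that ties input, output, and compute together.
        {"id": "CopyToRedshift", "name": "CopyToRedshift",
         "type": "RedshiftCopyActivity",
         "input": {"ref": "S3Input"}, "output": {"ref": "RedshiftTable"},
         "insertMode": "TRUNCATE", "runsOn": {"ref": "Ec2Worker"}},
    ]
    return {"objects": objects}


def unresolved_refs(definition):
    """Return (object id, missing ref) pairs for refs that point at no object."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for value in obj.values():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.append((obj["id"], value["ref"]))
    return missing


definition = build_redshift_copy_pipeline(
    s3_path="s3://example-bucket/staging/orders/",  # hypothetical bucket
    table_name="orders",                            # hypothetical table
    cluster_id="example-cluster",                   # hypothetical cluster
    database_name="analytics",                      # hypothetical database
)
print(json.dumps(definition, indent=2))
```

Checking references locally before uploading a definition is a cheap way to catch typos in object ids before the pipeline is ever activated.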

Pros

  • Helps you easily create complex data processing workloads
  • Fault tolerant
  • Highly available

Cons

  • Pipelines can get stuck in Pending status
  • Pipeline components can get stuck in Waiting for Runner status
  • EMR clusters can fail with errors

Most Important Features

  • Easy way to create pipelines
  • Scalable infrastructure to process large amounts of data
  • Fault tolerant

Return on Investment

  • Easy to use
  • Data engineers are able to create the data pipelines quickly and effectively
  • Scalable and fault tolerant

Alternatives Considered

Azure Data Factory

Other Software Used

Azure Data Factory, AWS Glue, Google Cloud Dataflow