TrustRadius: an HG Insights company

AWS Glue

Score: 8.6 out of 10

42 Reviews and Ratings

What is AWS Glue?

AWS Glue is a managed extract, transform, and load (ETL) service designed to make it easy for customers to prepare and load data for analytics. With it, users can create and run an ETL job in the AWS Management Console. Users point AWS Glue to data stored on AWS, and AWS Glue discovers data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, data is immediately searchable, queryable, and available for ETL.


AWS Glue Partner Review

Use Cases and Deployment Scope

As an AWS Advanced Consulting Partner, we use AWS Glue in many of our data and analytics solutions. We've implemented it at major enterprises in the Philippines in the media, telecommunications, logistics, and fintech industries. In one engagement, the client aimed to centralize a data lake of raw operational data containing various shipping details on the AWS platform. The architecture must automate data extraction from an API. The data lake should also be visualized in QuickSight, with the generated dashboards embedded in the customer web portal. AWS services implemented: Lambda, S3, Glue, Athena, QuickSight, EventBridge.

Pros

  • After data cleansing, the team also implemented best practices for using AWS services as a data lake, such as job bookmarks for AWS Glue jobs, proper delimiters for the AWS Glue crawlers, partitioning in Amazon S3, and conversion to Parquet files for compression and faster query times in Amazon Athena.
  • Data modernization by combining data from multiple sources into functioning datasets, rebuilding the data warehouse (DW), and restructuring data sources.
  • Aims to reduce customer complaints, eliminate manual data extraction requests via SR from different data sources, increase accuracy and consistency, and speed up the reconciliation process.
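The partitioning and job-bookmark practices above can be illustrated with a small, stdlib-only Python sketch. It assumes a hypothetical `ShipmentRecord` shape and a hypothetical `shipments` table name. `partition_prefix` builds Hive-style `year=/month=/day=` S3 key prefixes, which crawlers and Athena can treat as partitions, and `incremental_batch` applies a simple timestamp watermark. The watermark only stands in for the idea behind Glue job bookmarks (process only new data on each run); Glue tracks bookmark state itself, and this is not its implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ShipmentRecord:
    # Hypothetical record shape for the raw shipping data
    shipment_id: str
    updated_at: datetime  # last modification time at the source (UTC)
    status: str


def partition_prefix(table: str, ts: datetime) -> str:
    """Hive-style key prefix (year=/month=/day=) usable as partition columns."""
    return f"{table}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"


def incremental_batch(records, watermark):
    """Keep only records newer than the last watermark and return the advanced watermark.

    A simplified stand-in for the idea behind Glue job bookmarks, not their implementation.
    """
    fresh = [r for r in records if r.updated_at > watermark]
    new_watermark = max((r.updated_at for r in fresh), default=watermark)
    return fresh, new_watermark


def group_by_partition(table, records):
    """Group record IDs under the partition prefix they would be written to."""
    groups = defaultdict(list)
    for r in records:
        groups[partition_prefix(table, r.updated_at)].append(r.shipment_id)
    return dict(groups)


if __name__ == "__main__":
    utc = timezone.utc
    records = [
        ShipmentRecord("S1", datetime(2024, 3, 1, 8, 0, tzinfo=utc), "DELIVERED"),
        ShipmentRecord("S2", datetime(2024, 3, 7, 9, 30, tzinfo=utc), "IN_TRANSIT"),
    ]
    fresh, wm = incremental_batch(records, datetime(2024, 3, 2, tzinfo=utc))
    print(group_by_partition("shipments", fresh), wm)
```

Because Athena can skip partitions that do not match a query's filter on the partition columns, a query restricted to one day only needs to read one `day=` prefix instead of the whole table.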

Cons

  • Processing could be faster in cases where data is not partitioned efficiently
  • Cost optimization and pricing
  • Developer experience for new users

Return on Investment

  • ROI
  • Faster processing of Data
  • Integration to Athena and other AWS Data Services


Alternatives Considered

Amazon Athena, AWS Lake Formation and Snowflake

Other Software Used

Snowflake, MongoDB, Confluent

AWS Glue - The ETL Friend

Use Cases and Deployment Scope

We used AWS Glue for ETL in the healthcare domain, where data from claims-related 837 and remittance-related 835 JSON files was ingested into Delta tables. We used it for data cleansing, validation, and processing according to business needs. During processing, AWS Glue uses Apache Spark environments to run transformation scripts written in Python.
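The cleansing and validation step described above might look something like the following plain-Python sketch. The field names (`claim_id`, `patient_id`, `total_charge`, `service_lines`, `charge`) and the rule that a claim's total must equal the sum of its service-line charges are hypothetical examples of business rules, not the reviewer's actual 837 mapping. In a Glue job, checks like these could run inside a Spark transformation; here they run over JSON Lines strings so the logic can be tested on its own, and `Decimal` is used to avoid floating-point rounding on money.

```python
import json
from decimal import Decimal

# Hypothetical required fields for a claim record
REQUIRED_FIELDS = ("claim_id", "patient_id", "total_charge", "service_lines")


def validate_claim(claim: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the claim passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in claim or claim[field] in (None, ""):
            errors.append(f"missing {field}")
    if errors:
        return errors  # skip arithmetic checks when required fields are absent

    # Hypothetical business rule: claim total equals the sum of its line charges
    total = Decimal(str(claim["total_charge"]))
    line_sum = sum(
        (Decimal(str(line.get("charge", 0))) for line in claim["service_lines"]),
        Decimal("0"),
    )
    if total != line_sum:
        errors.append(f"total_charge {total} != sum of service lines {line_sum}")
    return errors


def split_valid_invalid(raw_json_lines):
    """Parse JSON Lines and route each claim to the valid or rejected list."""
    valid, rejected = [], []
    for line in raw_json_lines:
        claim = json.loads(line)
        errs = validate_claim(claim)
        if errs:
            rejected.append({"claim": claim, "errors": errs})
        else:
            valid.append(claim)
    return valid, rejected
```

Keeping rejected records together with their error messages, rather than dropping them, makes it easier to report data-quality failures back to the source teams.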

Pros

  • ETL business logic.
  • Monitoring.
  • Data Lineage.
  • Data migration.

Cons

  • Would benefit from an AI-based agent for answering data-related questions.
  • Would benefit from a DQ tool for data quality checks against business rules.

Return on Investment

  • Reduces a lot of the manual burden on data engineers.
  • Helps in efficiently monitoring and managing data loads.


Alternatives Considered

Azure Databricks and Snowflake

Other Software Used

Amazon EMR (Elastic MapReduce)

Think of ETL: Think AWS Glue

Use Cases and Deployment Scope

We have certain transformation needs, and we use Glue to fulfill them. Because it's a native AWS service, connectivity to other AWS services is pretty seamless. Because it's fully serverless, we don't have to worry about infrastructure either. Its crawlers can detect changes in our sources for us, so we don't have to track those updates ourselves. So it's a powerful ETL tool for us.
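The crawler behavior this reviewer relies on (noticing when a source changes) boils down to inferring a schema and comparing it with the previous one. The sketch below is a hypothetical, stdlib-only illustration of that idea, not how Glue crawlers or their classifiers work internally; the Athena-style type names and the widening rule (bigint plus double becomes double, any other conflict becomes string) are assumptions chosen for the example.

```python
# Assumed mapping from Python types to Athena/Hive-style type names
TYPE_NAMES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_schema(records: list[dict]) -> dict[str, str]:
    """Infer a flat column -> type mapping from sample records."""
    schema: dict[str, str] = {}
    for rec in records:
        for key, value in rec.items():
            if value is None:
                continue  # nulls carry no type information
            t = TYPE_NAMES.get(type(value), "string")
            prev = schema.get(key)
            if prev is None:
                schema[key] = t
            elif prev != t:
                # Widen bigint/double to double; any other conflict falls back to string
                schema[key] = "double" if {prev, t} == {"bigint", "double"} else "string"
    return schema


def diff_schemas(old: dict[str, str], new: dict[str, str]) -> dict:
    """Report columns added, removed, or changed in type between two schemas."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {
            col: (old[col], new[col])
            for col in sorted(old.keys() & new.keys())
            if old[col] != new[col]
        },
    }
```

A non-empty diff is the signal that downstream tables or queries may need attention, which is the manual check the reviewer no longer has to do.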

Pros

  • Scale up and scale down easily
  • Seamless connectivity with other AWS services
  • Cost-effective, since you pay only for what you use.

Cons

  • Its integration with other cloud vendors is a bit difficult.
  • If it supported non-SQL-based databases as well, it would be more powerful.
  • Real-time data synchronization with the data source is missing.

Return on Investment

  • We use Glue for our ETL. Its ease of integration with our other AWS services gives us a 100% ROI.
  • One missing piece was compatibility with other data sources. We found a workaround by making S3 our only data source, so our dependency on other data sources is also decreasing.


Alternatives Considered

Informatica Intelligent Cloud Integration Services and Informatica PowerCenter

Other Software Used

Apache Hadoop, Amazon EMR (Elastic MapReduce), Amazon RDS on VMware

Software developer

Use Cases and Deployment Scope

My main concern with AWS Glue is the high cost of Glue jobs. When I worked with 5,000 datasets, I faced performance issues relative to the cost. AWS needs to work on performance and reduce job run times so that users actually save time with this approach. AWS Glue integrates with services like Amazon Redshift, Amazon Athena, and Amazon QuickSight, enabling organizations to analyze data with their preferred analytics tools.
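Since this review weighs performance against cost, a quick back-of-the-envelope estimate can help. Glue Spark jobs are billed by DPU-hour; the sketch below assumes a rate of $0.44 per DPU-hour, a 1-minute billing minimum, and 1 DPU per G.1X worker and 2 per G.2X worker. Treat all of these as assumptions and check the current AWS Glue pricing page for your region, Glue version, and job type.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed values; verify against current AWS Glue pricing for your region and job type
PRICE_PER_DPU_HOUR = Decimal("0.44")
MIN_BILLED_SECONDS = 60  # assumed 1-minute minimum per run
DPU_PER_WORKER = {"G.1X": 1, "G.2X": 2}


def estimate_job_cost(worker_type: str, num_workers: int, runtime_seconds: int,
                      price: Decimal = PRICE_PER_DPU_HOUR) -> Decimal:
    """Estimate the cost of one job run in dollars, rounded to cents."""
    dpus = DPU_PER_WORKER[worker_type] * num_workers
    billed_seconds = max(runtime_seconds, MIN_BILLED_SECONDS)
    dpu_hours = Decimal(dpus) * Decimal(billed_seconds) / Decimal(3600)
    return (dpu_hours * price).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(estimate_job_cost("G.1X", 10, 30 * 60))  # 10 workers for 30 minutes
    print(estimate_job_cost("G.1X", 20, 15 * 60))  # twice the workers, half the time
```

For example, 10 G.1X workers running for 30 minutes come to 5 DPU-hours, or $2.20 at the assumed rate. Doubling to 20 workers only pays off if it roughly halves the runtime: 20 workers for 15 minutes cost the same $2.20, so scaling out a job that does not parallelize well mainly raises the bill.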

Pros

  • Data integration
  • Data transformation
  • Job scheduling

Cons

  • Complex transformations
  • Debugging and monitoring
  • Custom connectors

Most Important Features

  • ETL automation
  • Data transformation
  • Job scheduling

Return on Investment

  • Reduced the time and effort
  • Improved data quality
  • Job scheduling

Alternatives Considered

Apache Spark

Other Software Used

AWS Cloud9, Aspire AWS Cloud Services and Solutions, Azure Analysis Services, Azure App Service

AWS Glue - The managed ETL service for your data

Use Cases and Deployment Scope

We use AWS Glue for ETL of healthcare data. The input data comes from different source systems, and therefore in different formats. Using AWS Glue jobs, we translate the data into a common format. With Python scripts and the scheduled-job feature, the data is fetched periodically, processed by the Python script, converted to Parquet, and stored in an S3 bucket. The Glue Data Catalog then holds the schema of the stored data, which allows Amazon Athena to query it for analytics.
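Translating inputs from different source systems into a common format, as described above, can be sketched in plain Python. The two source layouts here (a CSV export with `PatientID` and `VisitDate` in MM/DD/YYYY, and a JSON Lines feed with a nested `patient.id` and ISO 8601 timestamps), the source labels, and the common fields are all hypothetical. The Parquet conversion and S3 write are left out; in a Glue job those steps would typically be handled by Spark or a Glue writer rather than by this code.

```python
import csv
import io
import json
from datetime import date, datetime

# Hypothetical common schema shared by all sources
COMMON_FIELDS = ("patient_id", "visit_date", "source")


def from_csv(text: str, source: str) -> list[dict]:
    """Hypothetical source A: CSV with PatientID and VisitDate as MM/DD/YYYY."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "patient_id": row["PatientID"].strip(),
            "visit_date": datetime.strptime(row["VisitDate"], "%m/%d/%Y").date().isoformat(),
            "source": source,
        })
    return rows


def from_json_lines(text: str, source: str) -> list[dict]:
    """Hypothetical source B: JSON Lines with a nested patient object and ISO timestamps."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        rec = json.loads(line)
        rows.append({
            "patient_id": str(rec["patient"]["id"]),
            # Keep only the calendar date; fromisoformat also validates it
            "visit_date": date.fromisoformat(rec["visit_date"][:10]).isoformat(),
            "source": source,
        })
    return rows
```

Normalizing every source to the same field names, types, and date format before writing Parquet means the crawler sees one consistent schema, and Athena queries do not need per-source logic.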

Pros

  • It is extremely fast, easy, and intuitive. Although it is a suite of services, it takes relatively little time to master.
  • As it is a managed service, one need not take care of a lot of underlying details. Schema identification, code generation, customization, and orchestration of the different job components let developers focus on the core business problem without worrying about infrastructure.
  • It is a pay-as-you-go service, so there is no need to provision capacity in advance, which makes scheduling much easier.

Cons

  • The sample code should cover more scenarios; it is quite basic. However, you can find good pointers on the internet, in the AWS community, and in support tickets.
  • AWS Glue runs on Apache Spark, so to get the best out of AWS Glue, developers should have a good understanding of Apache Spark.

Most Important Features

  • AWS Glue Data Catalog for writing efficient queries.
  • AWS Glue Crawler for automatic schema recognition.
  • AWS Glue scheduled jobs to perform ETL tasks at defined intervals.

Return on Investment

  • We were transforming the data with a simple Python script and were facing a lot of orchestration issues. Script failures were frequent because the data was fairly dynamic. With AWS Glue, we fixed about 80% of the orchestration issues, and automatic schema generation handles the dynamic data very well. So we started realizing ROI from day one.

Alternatives Considered

AWS Data Pipeline

Other Software Used

Amazon SageMaker, Alexa, Amazon Lex