TrustRadius
AWS Glue

Overview

What is AWS Glue?

AWS Glue is a managed extract, transform, and load (ETL) service designed to make it easy for customers to prepare and load data for analytics. With it, users can create and run an ETL job in the AWS Management Console.…


Pricing

per DPU-Hour: $0.44 (Cloud; billed per second, 1 minute minimum)
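
To make the pricing line concrete, here is a rough sketch of the arithmetic for one job run. The $0.44 rate, per-second billing, and 1 minute minimum come from the listing above. The function name and the assumption that the DPU count stays constant for the whole run are illustrative only, and actual billing rules may differ by job type and Glue version.

```python
PRICE_PER_DPU_HOUR = 0.44      # USD, taken from the pricing listed above
MINIMUM_BILLED_SECONDS = 60    # "1 minute minimum" from the listing


def estimate_job_cost(dpus: float, runtime_seconds: float) -> float:
    """Estimate the cost in USD of one job run (hypothetical helper).

    Assumes a constant DPU count for the whole run and that the
    1 minute minimum applies to each run.
    """
    billed_seconds = max(runtime_seconds, MINIMUM_BILLED_SECONDS)
    dpu_hours = dpus * billed_seconds / 3600
    return round(dpu_hours * PRICE_PER_DPU_HOUR, 4)


if __name__ == "__main__":
    # A 15 minute run on 10 DPUs uses 2.5 DPU-hours.
    print(estimate_job_cost(dpus=10, runtime_seconds=900))
    # A 30 second run is billed as a full minute.
    print(estimate_job_cost(dpus=2, runtime_seconds=30))
```

Under these assumptions, a short job is billed for a full minute regardless of how quickly it finishes, so very frequent tiny jobs cost more than their raw runtime suggests.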

Entry-level set up fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting/Integration Services

Product Details

AWS Glue Technical Details

Deployment Types: Software as a Service (SaaS), Cloud, or Web-Based
Operating Systems: Unspecified
Mobile Application: No


Reviews and Ratings

(32)

Reviews

(1-7 of 7)
October 23, 2023

Software developer

Ashutosh Mishra | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
My main concern with AWS Glue is the high cost of Glue jobs. When I worked with 5000 datasets, I faced performance issues relative to the cost, so they need to work on performance. That said, using Glue did reduce the time we spent on this work. AWS Glue integrates with services like Amazon Redshift, Amazon Athena, and Amazon QuickSight, enabling organizations to analyze data with their preferred analytics tools.
  • Data integration
  • Data transformation
  • Job scheduling
  • Complexity transformation
  • Debugging and monitoring
  • Custom connectors
AWS Glue is well-suited for data warehousing scenarios where you need to extract, transform, and load data into a centralized repository like Amazon Redshift. It simplifies the ETL process. It's also a great choice for preparing data in data lakes, especially when dealing with diverse data sources and formats. Glue can help normalize and structure data for analytics.
July 25, 2023

AWS Glue ETL tool

Score 8 out of 10
Vetted Review
Verified User
Incentivized
We use AWS Glue to create ETL pipelines for transforming and moving data from different data sources like S3, Snowflake, and Postgres to Redshift and vice versa. Executing Spark jobs is really easy, as Glue auto-generates code that securely establishes connections with source and target databases and helps with data cleansing such as deduplication and validation. As it is serverless, it automatically scales the memory resources required to run the Spark Glue job up and down.
  • Execution of Spark jobs
  • Scaling of memory resources
  • Crawling the schemas
  • Incremental data sync
  • Real time data triggers
  • Grouping of small files
Glue is well suited to ETL operations and jobs, especially when we want to extract or transform data from sources stored in the AWS cloud. It is very well integrated with other AWS services, and it is easy to establish connections. We can schedule crawlers or run them on demand.
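
The deduplication and validation steps this review mentions can be pictured with a small sketch. This is plain Python and not the code Glue generates. The function names and record fields are hypothetical. In an actual Glue Spark job the same idea would more likely be expressed with Spark DataFrame operations such as `dropDuplicates`.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def deduplicate(records: List[Record], key_fields: Sequence[str]) -> List[Record]:
    """Keep the first record seen for each combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def validate(
    records: List[Record], required_fields: Sequence[str]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (valid, rejected) based on required fields being filled."""
    valid, rejected = [], []
    for record in records:
        if all(record.get(field) not in (None, "") for field in required_fields):
            valid.append(record)
        else:
            rejected.append(record)
    return valid, rejected


if __name__ == "__main__":
    rows = [
        {"id": 1, "email": "a@example.com"},
        {"id": 1, "email": "a@example.com"},  # duplicate of the first row
        {"id": 2, "email": ""},               # fails validation: empty email
    ]
    good, bad = validate(deduplicate(rows, ["id"]), ["id", "email"])
    print(len(good), len(bad))
```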
Score 8 out of 10
Vetted Review
Verified User
Incentivized
1) In my current use case, we mainly use AWS Glue for extract, transform, and load (ETL) to process batch data on a daily basis. 2) The main benefit Glue provides is that it integrates easily with other AWS services like S3, RDS, and Athena. 3) The pricing model is pay-as-you-go. 4) The main business problem Glue solves is that we can ingest data, perform ETL on top of it, and create Spark or Python shell jobs.
  • Extract, transform, load
  • AWS Data Catalog
  • Triggers
  • We can create workflows
  • The Stream schema registries feature is difficult to use efficiently
  • The Connections feature could offer more connectors
  • The crucial problem with AWS Glue is that it only works with AWS.
Well suited: when you want to transform your data, Glue provides its own built-in transformations, including an option for PII masking if you prefer a no-code approach. A second scenario is when you want to integrate Glue with other AWS services and run Spark on Glue for faster processing. Less appropriate: when you want to integrate with services outside of AWS. It also does not support Java as of now, so if you have Java resources, you cannot run them on Glue.
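
For readers unfamiliar with PII masking, the following is a minimal sketch of the idea in plain Python. It is not Glue's built-in transform, which the review describes as a no-code option. The two patterns (email addresses and US-style social security numbers) and the `mask_pii` helper are assumptions chosen for illustration, and real PII detection covers many more cases.

```python
import re

# Illustrative patterns only; real PII detection is much broader.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_pii(text: str, mask: str = "****") -> str:
    """Replace email addresses and SSN-like numbers with a fixed mask."""
    text = EMAIL_PATTERN.sub(mask, text)
    return SSN_PATTERN.sub(mask, text)


if __name__ == "__main__":
    print(mask_pii("Contact jane@example.com, SSN 123-45-6789"))
```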
Score 9 out of 10
Vetted Review
Verified User
Incentivized
We heavily rely on AWS Glue for cataloging our data objects (tables and views). We use AWS Glue as our Data Catalog and use it in our data pipelines to sync external and internal data sources. We also utilize AWS Glue to auto-generate SQL-based ETL based on AWS Glue catalog objects.
  • Create schemas, tables and views (data catalog).
  • Sync external and internal data sources.
  • Auto-generate SQL-based data pipelines, based on AWS Glue catalog objects.
  • It is very difficult (almost impossible) to scale
  • We sometimes get throttled by service limitations.
  • AWS Glue crawlers sometimes mismatch the data in the files
AWS Glue is a mature product, which helps organizations start their journey with data exploration and analysis. AWS Glue has many great features, like a data catalog, jobs, crawlers, helping non-engineers to handle data and build a data lake.
Score 9 out of 10
Vetted Review
Verified User
Incentivized
AWS Glue makes it possible to automate numerous tasks, including logging, alerting, and monitoring. It is also economical because you only pay for the resources you actually use. One of AWS Glue's most helpful features is its data catalog, which aids in the creation and transformation of data.
  • Helps in data creation and transformations.
  • Automation of the data schema recognition.
  • Support and scheduling of the data schema.
  • Integration with systems outside of the AWS environment.
  • Glue runs on Spark, so the engineer should be familiar with it.
One of AWS Glue's most notable features is its data catalog, which aids in the creation and transformation of data. Beyond that, support, scheduling, and the automation of data schema recognition make it superior to its competitors. It also integrates perfectly with other AWS tools. The main restriction may be integration with systems outside of the AWS environment. It works flawlessly with existing AWS services but not with other products. Another potential restriction is that Glue runs on Spark, which means the engineer needs to be familiar with it.
Apurv Doshi | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User
Incentivized
We use AWS Glue for ETL of healthcare data. The input data comes from different source systems and therefore arrives in different formats. With the help of AWS Glue jobs, we translate the data into a common format. Using Python scripts and the scheduled job feature, the data is fetched periodically, processed, converted to Parquet format, and stored in an S3 bucket. The Glue catalog generates the schema of the stored data and allows Amazon Athena to query it for analytics purposes.
  • It is extremely fast, easy, and intuitive. Though it is a suite of services, it takes very little time to get comfortable with it.
  • As it is a managed service, one need not take care of a lot of underlying details. The identification of data schema, code generation, customization, and orchestration of the different job components allow developers to focus on the core business problem without worrying about infrastructure issues.
  • It is a pay-as-you-go service, so there is no need to provision any capacity in advance. This makes scheduling much easier.
  • The sample code should cover more scenarios; the existing samples are quite basic. However, you can find good pointers from the internet, the AWS community, and tickets.
  • AWS Glue runs on Apache Spark. So, to get the most out of the AWS Glue service, the developer should have a good understanding of Apache Spark.
When the data that requires ETL varies in format, schema, and volume, this service is the best fit. So, when the volume is not consistent (a typical use case in healthcare and online shopping), AWS Glue can be the prime choice. When the data is available in both batch and streaming mode, the developer needs to maintain a separate codebase for each, which increases source code management effort. So, prefer Glue when the nature of the data is the same (either batched or streamed).
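
The "translate the data into a common format" step described in this review can be sketched as follows. This is a simplified illustration in plain Python, not the reviewer's actual job. The field names, the two source formats (CSV and JSON lines), and the helper functions are all hypothetical, and the Parquet conversion and S3 upload are left out.

```python
import csv
import io
import json
from typing import Dict, List


def from_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into a list of records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json_lines(text: str) -> List[dict]:
    """Parse newline-delimited JSON into a list of records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_common(record: dict, field_map: Dict[str, str]) -> dict:
    """Rename source fields to the shared schema (target name -> source name)."""
    return {target: record.get(source) for target, source in field_map.items()}


if __name__ == "__main__":
    # Two hypothetical source systems with different formats and field names.
    csv_text = "id,date\n1,2023-01-05\n"
    json_text = '{"patientId": "2", "visit": "2023-01-06"}\n'

    common = [
        to_common(r, {"patient_id": "id", "visit_date": "date"})
        for r in from_csv(csv_text)
    ] + [
        to_common(r, {"patient_id": "patientId", "visit_date": "visit"})
        for r in from_json_lines(json_text)
    ]
    print(common)
```

Keeping one field map per source system means a new source only needs a new parser and a new mapping, while everything downstream sees the same schema.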
Score 9 out of 10
Vetted Review
Verified User
Incentivized
AWS Glue is one of the more straightforward and quick cloud-based ETL tools. It comes under the umbrella of AWS services. We use AWS Glue to analyze an extensive data set of US-based clinics and hospitals. It is HIPAA compliant for sensitive data. It supports Python scripts, includes a scheduler, and works very well with other AWS services like S3 and RDS.
  • Very quick for ETL jobs.
  • UI as well as a command-line interface, with very few steps to create and schedule an ETL job.
  • Sample code is very basic and not available for most scenarios.
AWS Glue is best if your organization is dealing with large and sensitive data like medical records. It comes with a scheduler and easy deployment for AWS users. The data catalog keeps the reference of the data in a well-structured format. If you are already using AWS services, then AWS Glue is the best choice; otherwise, it's not a simple one to deploy.