AWS Glue is a managed extract, transform, and load (ETL) service designed to make it easy for customers to prepare and load data for analytics. With it, users can create and run ETL jobs in the AWS Management Console. Users point AWS Glue to data stored on AWS, and AWS Glue discovers the data and stores the associated metadata (e.g., table definitions and schemas) in the AWS Glue Data Catalog. Once cataloged, the data is immediately searchable, queryable, and available for ETL.
$0.44
billed per second, 1 minute minimum
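To make per-second billing with a one-minute minimum concrete, here is a small Python sketch that estimates the charge for a single job run. It assumes the listed $0.44 is charged per DPU-hour (data processing unit hour) and that partial seconds round up; both are assumptions, so check current AWS pricing for your region. The `glue_job_cost` function is hypothetical.

```python
import math

RATE_PER_DPU_HOUR = 0.44  # assumption: the listed $0.44 is per DPU-hour
MIN_BILLED_SECONDS = 60   # "1 minute minimum" from the listing


def glue_job_cost(dpus: float, runtime_seconds: float) -> float:
    """Estimate one job run's cost under per-second billing with a 1-minute minimum."""
    # Assumption: partial seconds are rounded up, then the 60-second minimum applies.
    billed_seconds = max(math.ceil(runtime_seconds), MIN_BILLED_SECONDS)
    dpu_hours = dpus * billed_seconds / 3600
    return dpu_hours * RATE_PER_DPU_HOUR


if __name__ == "__main__":
    # A 30-second run on 10 DPUs is billed as a full minute.
    print(f"${glue_job_cost(10, 30):.4f}")  # prints "$0.0733"
    # A 15-minute run on 10 DPUs is billed for exactly 2.5 DPU-hours.
    print(f"${glue_job_cost(10, 900):.2f}")  # prints "$1.10"
```

The minimum matters mostly for many short jobs: a 30-second run costs the same as a 60-second run.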
Azure Synapse Analytics
Score 7.7 out of 10
N/A
Azure Synapse Analytics is described as the evolution of the former Azure SQL Data Warehouse: a limitless analytics service that brings together enterprise data warehousing and big data analytics. It gives users the freedom to query data at scale using either serverless or provisioned resources. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.
One of AWS Glue's most notable features for creating and transforming data is its Data Catalog. Beyond that, its support, scheduling, and automated schema recognition set it apart from its competitors, and it integrates seamlessly with other AWS tools. The main limitation may be integration with systems outside the AWS environment: it works flawlessly with existing AWS services but not with other products. Another potential limitation is that Glue runs on Apache Spark, so the engineer needs to be fluent in Spark and in one of the languages Glue supports for it, such as Python or Scala.
It's well suited for large, fast-growing, and frequently changing data warehouses (e.g., in startups). It's also suited for companies that want a single, relatively easy-to-use, centralized cloud service for all their data needs. Larger, more structured organizations could still benefit from this service by using Synapse Dedicated SQL Pools, with the understanding that costs will be much higher than with other solutions. I think this product is not suited for smaller, simpler workloads (where an Azure SQL Database and a Data Factory could be enough) or for very large scenarios, where it may be better to build custom infrastructure.
It is extremely fast, easy, and intuitive. Although it is a suite of services, it takes relatively little time to master.
As it is a managed service, one does not need to manage many of the underlying details. Schema identification, code generation, customization, and orchestration of the different job components let developers focus on the core business problem without worrying about infrastructure issues.
It is a pay-as-you-go service, so there is no need to provision any capacity in advance, which makes scheduling much easier.
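Schema identification is what Glue's crawlers do when they populate the Data Catalog. The toy Python sketch below only illustrates the general idea of inferring a table schema from sample records and widening types when records disagree; it is not Glue's crawler algorithm, and the `infer_schema` helper and its catalog-style type names are hypothetical.

```python
from typing import Any


def _type_name(value: Any) -> str:
    # Map a Python value to a simple, catalog-style type name (hypothetical naming).
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def _merge(a: str, b: str) -> str:
    # Widen types when records disagree: null defers to the other type,
    # bigint + double -> double, any other conflict -> string.
    if a == b or b == "null":
        return a
    if a == "null":
        return b
    if {a, b} == {"bigint", "double"}:
        return "double"
    return "string"


def infer_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a column -> type mapping from sample records (toy illustration)."""
    schema: dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            observed = _type_name(value)
            schema[column] = _merge(schema[column], observed) if column in schema else observed
    return schema


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "amount": 9.5, "country": "DE"},
        {"order_id": 2, "amount": 12, "country": None, "gift": True},
    ]
    print(infer_schema(sample))
    # prints {'order_id': 'bigint', 'amount': 'double', 'country': 'string', 'gift': 'boolean'}
```

A column that appears in only some records (here `gift`) is still added to the schema, which mirrors how a catalog has to cope with semi-structured data.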
Quick to return data. Queries in a SQL data warehouse architecture tend to return data much more quickly than in an OLTP setup, especially with columnar indexes.
Ability to manage extremely large SQL tables. Our databases contain billions of records, which would be unwieldy without a proper SQL data warehouse.
Backup and replication. Because we're already using SQL, moving the data to a data warehouse makes it easier to manage, as our users are already familiar with SQL.
With Azure, it's always the same issue: too many moving parts doing similar things with no specialisation. ADF, Fabric Data Factory, and Synapse pipelines serve the same purpose. The same goes for Fabric Warehouse and Synapse SQL pools.
It could do better with serverless workloads, considering the competition from Databricks and Microsoft's own Fabric Warehouse.
Synapse pipelines are a replica of Azure Data Factory, without tight integration with Synapse and, surprisingly, with some ADF features missing. Integration between the warehouse and the in-environment ETL tools could be improved.
While monitoring for large datasets is easy to set up and manage, the service's complexity can be a barrier for new users. Its strengths are integration with the AWS ecosystem; managed monitoring, since dashboards and monitoring tools for AWS Glue are generally easy to set up and maintain; and automated data pipelines, since it automates pipeline creation and makes certain kinds of data integration efficient.
The data warehouse portion is very much like an old-style on-premises SQL Server, so most SQL skills one has mastered carry over easily. Azure Data Factory has an easy drag-and-drop system that allows quick building of pipelines with minimal coding. The Spark portion is the only really complex part, but if there's an in-house Python expert, the Spark portion is also quite usable.
Amazon responds promptly once a ticket has been raised, but we need to raise tickets frequently because very few code samples are available, and they do not cover all scenarios.
Microsoft does its best to support Synapse. More and more articles are being added to the documentation, providing more useful information on how best to use its features. The examples provided work well for building basic knowledge, but more complex examples should be added to help users discover the system's vast capabilities.
AWS Glue is a fully managed ETL service that automates many ETL tasks. It simplifies ETL through a visual interface and automated code generation.
In comparing Azure Synapse with Google BigQuery, the biggest highlight I'd like to bring forward is that Azure Synapse SQL leverages a scale-out architecture to distribute the computational processing of data across multiple nodes, whereas Google BigQuery only takes computation and storage into account.
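The scale-out idea can be sketched in a few lines of Python: rows are hash-distributed by a key across a fixed number of distributions, each distribution aggregates only the rows it owns, and a final step merges the partial results. Synapse dedicated SQL pools spread tables across 60 distributions, but the crc32 hash, the `NUM_DISTRIBUTIONS` constant, and the helper names below are illustrative assumptions, not Synapse internals.

```python
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # dedicated SQL pools use 60 distributions; reused here for illustration


def distribution_for(key: str) -> int:
    # Deterministic hash of the distribution key (crc32 is a stand-in, not Synapse's hash).
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def distribute(rows):
    # Assign each (customer_id, amount) row to a distribution by hashing its key.
    buckets = defaultdict(list)
    for customer_id, amount in rows:
        buckets[distribution_for(customer_id)].append((customer_id, amount))
    return buckets


def partial_sums(bucket):
    # The work each node would do locally: aggregate only the rows it owns.
    totals = defaultdict(float)
    for customer_id, amount in bucket:
        totals[customer_id] += amount
    return totals


def total_by_customer(rows):
    # Combine step: rows with the same key always land in the same distribution,
    # so the partial results never overlap and can simply be merged.
    result = {}
    for bucket in distribute(rows).values():
        result.update(partial_sums(bucket))
    return result


if __name__ == "__main__":
    rows = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0)]
    print(total_by_customer(rows))
```

Here the distributions are processed one after another; in a real scale-out engine each would run on its own compute node in parallel, which is where the speedup comes from.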
We use Glue for our ETL needs. Its ease of use with our other AWS services gives us a 100% ROI.
One missing piece was compatibility with other data sources. We found a workaround by making S3 our only data source, so our dependency on other data sources is also decreasing.
Licensing fees are replaced with an Azure subscription fee, so there are no big savings there.
More visibility into Azure usage and cost.
It can be used as hot storage, and old data can be archived to a data lake. Real-time data integration is possible via external tables and Microsoft Power BI.