AWS Glue is a managed extract, transform, and load (ETL) service designed to make it easy for customers to prepare and load data for analytics. With it, users can create and run an ETL job in the AWS Management Console. Users point AWS Glue to data stored on AWS, and AWS Glue discovers data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, data is immediately searchable, queryable, and available for ETL.
$0.44
billed per second, 1 minute minimum
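As a rough illustration of how per-second billing with a one-minute minimum plays out, here is a minimal Python sketch. It assumes the $0.44 rate is charged per DPU-hour (the listing above does not state the unit). The function name `glue_job_cost` and its parameters are hypothetical and are not part of any AWS API.

```python
def glue_job_cost(dpus, runtime_seconds, rate_per_dpu_hour=0.44, minimum_seconds=60):
    """Estimate the cost of one job run.

    Assumption: the listed $0.44 is a per-DPU-hour rate, billed per second
    with a 1-minute minimum. Check current AWS pricing before relying on it.
    """
    # Runs shorter than the minimum are billed as if they ran for the minimum.
    billed_seconds = max(runtime_seconds, minimum_seconds)
    # Convert billed seconds to hours, then scale by capacity and hourly rate.
    return dpus * (billed_seconds / 3600) * rate_per_dpu_hour


# 10 DPUs running for 15 minutes (900 seconds)
print(f"{glue_job_cost(10, 900):.2f}")  # 1.10

# A 30-second run is billed like a 60-second run
print(glue_job_cost(2, 30) == glue_job_cost(2, 60))  # True
```

Under these assumptions, short jobs are dominated by the one-minute minimum, while longer jobs scale linearly with runtime and capacity.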
Databricks Data Intelligence Platform
Score 8.8 out of 10
N/A
Databricks offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service provides a platform for data pipelines, data lakes, and data platforms.
$0.07
Per DBU
Tableau Prep
Score 7.1 out of 10
N/A
Tableau Prep enables users to get to the analysis phase faster by helping them quickly combine, shape, and clean their data. According to the vendor, a direct and visual experience helps provide users with a deeper understanding of their data, smart features make data preparation simple, and integration with the Tableau analytical workflow allows for faster speed to insight. Tableau Prep allows users to connect to data on-premises or in the cloud, whether it’s a database or a…
One of AWS Glue's most notable features for creating and transforming data is its Data Catalog. Beyond that, its support, scheduling, and automated schema recognition make it superior to its competitors. It also integrates perfectly with other AWS tools. The main restriction may be integration with systems outside of the AWS environment: it functions flawlessly with existing AWS services but not with other products. Another potential restriction is that Glue runs on Spark, which means the engineer needs to be conversant in it.
Medium to Large data throughput shops will benefit the most from Databricks Spark processing. Smaller use cases may find the barrier to entry a bit too high for casual use cases. Some of the overhead to kicking off a Spark compute job can actually lead to your workloads taking longer, but past a certain point the performance returns cannot be beat.
If your data sets are coming in without much stewardship then Tableau Prep can help to clean the data before you start trying to create visualizations for your end users. You will save a lot of time this way - rather than seeing problems once you are creating dashboards. If you don't have large data sets or your data is relatively simple, then Tableau Prep may not be needed.
It is extremely fast, easy, and intuitive. Although it is a suite of services, it takes very little time to get comfortable with it.
As it is a managed service, one need not take care of a lot of underlying details. The identification of data schema, code generation, customization, and orchestration of the different job components allows the developers to focus on the core business problem without worrying about infrastructure issues.
It is a pay-as-you-go service, so there is no need to provision any capacity in advance, which makes scheduling much easier.
Monitoring for large datasets is easy to set up and manage, but the service's complexity can be a barrier for new users. Integration with the AWS ecosystem and managed monitoring: dashboards and monitoring tools for AWS Glue are generally easy to set up and maintain. Automated data pipelines: it automates data pipeline creation, which makes certain data integration efficient.
It is an amazing platform for designing experiments and delivering deep-dive analyses that require executing highly complex queries. It also allows sharing information and insights across the company through shared workspaces while keeping them secure.
Its UI and UX for graph generation and interaction could be improved.
Amazon responds in good time once a ticket has been raised, but tickets need to be raised frequently because very few code samples are available, and they do not cover all scenarios.
One of the best customer and technology support experiences I have ever had in my career. You get what you pay for, and you get the Rolls Royce. It reminds me of the customer support of SAS in the 2000s, when the tools were reaching some limits and their engineers wanted to know more about what we were doing, long before "data science" was even a name. Databricks truly embraces the partnership with its customers and helps them with any given challenge.
I have not really had to reach out for any kind of customer support for Tableau Prep, so I can't really say. However, the support that Tableau has given for their other products has been great, so I would assume it would be the same here. They are also constantly adding new features and providing software updates, and that is always a plus.
Live connections to cloud services (Google Sheets, for example) and cloud-hosted databases (a cloud-hosted SIS, for example) are not supported for scheduled flows.
AWS Glue is a fully managed ETL service that automates many ETL tasks. It simplifies ETL through a visual interface and automated code generation.
The most important differentiating factor for Databricks Lakehouse Platform from these other platforms is support for ACID transactions and the time travel feature. Also, native integration with managed MLflow is a plus. EMR, Cloudera, and Hortonworks are not as optimized when it comes to Spark Job Execution. Other platforms need to be self-managed, which is another huge hassle.
Before Prep, we had to do all the data joining and connecting in a Tableau Workbook. Not only did this cause workbooks connected to live data to run frustratingly slowly, but a new connection and setup also had to be established every time a new workbook was created, even if you were working with the same data. The extracts produced by Prep allow several workbooks to work from the same data setup without any additional work, saving time and stress.
We are using AWS Glue for our ETL. Its ease of use with our other AWS services gives us a 100% ROI.
One missing piece was compatibility with other data sources. We found a workaround by making S3 our only data source, so our dependency on other data sources is also decreasing.