TrustRadius: an HG Insights company

Azure Databricks

Score: 8.5 out of 10

59 Reviews and Ratings

What is Azure Databricks?

Azure Databricks is a service available on Microsoft's Azure platform and suite of products. It provides the latest versions of Apache Spark so users can integrate with open source libraries, or spin up clusters and build in a fully managed Apache Spark environment with the global scale and availability of Azure. Clusters are set up, configured, and fine-tuned to ensure reliability and performance without the need for monitoring. The solution includes autoscaling and auto-termination to improve total cost of ownership (TCO).
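As a rough illustration of the autoscaling and auto-termination settings mentioned above, a cluster definition might look like the sketch below. The field names `autoscale`, `min_workers`, `max_workers`, and `autotermination_minutes` follow the Databricks Clusters API; the helper `build_cluster_spec`, the cluster name, and the placeholder runtime and node type values are assumptions made for this example only.

```python
import json


def build_cluster_spec(name, min_workers, max_workers, idle_minutes):
    """Build a cluster definition with autoscaling and auto-termination.

    Hypothetical helper for illustration; it only assembles a dictionary.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        # Placeholders: choose a runtime and VM size valid in your workspace.
        "spark_version": "<runtime-version>",
        "node_type_id": "<node-type>",
        # The cluster scales between these bounds depending on load.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # The cluster shuts down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("nightly-etl", 2, 8, 30)
print(json.dumps(spec, indent=2))
```

A narrow worker range and a short idle timeout keep costs down, at the price of slower scale-up and more frequent cold starts.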

Categories & Use Cases

Top Performing Features

  • Data Encryption

    Data encryption to ensure data privacy

    Category average: 8.4

  • Data Transformations

    Use visual tools for standard transformations

    Category average: 9.1

  • Automated Machine Learning

    Tools to help automate algorithm development

    Category average: 8.9

Areas for Improvement

  • Multiple Model Development Languages and Tools

    Access to multiple popular languages, tools, and packages such as R, Python, SAS, Jupyter, RStudio, etc.

    Category average: 9.2

  • Connect to Multiple Data Sources

    Ability to connect to a wide variety of data sources including data lakes or data warehouses for data ingestion

    Category average: 8.7

  • Visualization

    The product’s support and tooling for analysis and visualization of data.

    Category average: 8.2

Always trustworthy

Use Cases and Deployment Scope

Azure Databricks is primarily used by our insight and analytics team for machine learning and reporting. We also use it as the data lake that feeds Braze. This gives us all the data we need for reporting and metrics on our customer base, and it reduces data silos.

Pros

  • Stops data silos
  • Collaborative
  • Single workspace

Cons

  • Quite expensive
  • Simple tasks can be difficult
  • Hard to learn

Return on Investment

  • Hard to find people who know how to use it.
  • Works well with Braze.
  • Cost management not fully developed.

Alternatives Considered

Snowflake

Scaling the Lakehouse

Use Cases and Deployment Scope

Azure Databricks is used for data analytics, modelling, and AI/ML use cases in our analytics architecture. Analytics modelling for non-SAP sources such as Partner Portal, Microsoft Dynamics CRM, and Oracle DB is done in Azure Databricks. Our AI/ML use cases for manufacturing defect support are also implemented in Azure Databricks.

Pros

  • Unity Catalog
  • Data Federation in Lakehouse Architecture
  • Integration of Mosaic AI in the SQL Layer

Cons

  • Data Orchestration limitations compared to Azure Data Factory
  • Limitations in Native Modelling Features
  • Integration with SAP sources needs SAP Datasphere

Return on Investment

  • Integrated MLflow has reduced modelling time
  • Reduced data load time because of the Lakehouse
  • Better Integration with SAP is needed to avoid tools like Datasphere

Alternatives Considered

SAP Datasphere and Snowflake

Other Software Used

Anaplan, Snowflake, SAP Datasphere, SAP BW/4HANA, SAP Analytics Cloud, Microsoft Power BI, SAP HANA Cloud

Azure Databricks: A Data Consultant's Dream

Use Cases and Deployment Scope

As a Big Data Consultant, Azure Databricks is my favorite tool in the house!

One of the biggest problems in data consulting is the plethora of programming languages involved: SQL, Scala, R, Python, Java, and so on.

That is exactly where Azure Databricks excels! It supports all of these languages in a single notebook, with equivalent performance for each. Combine that with a visually pleasing UI, features that cover the entire data lifecycle, and an architecture that gets the best out of Spark, and you have one of the best data tools in your hands!

Pros

  • Data Processing and Transformations based on Spark
  • Delta Lakehouse when combined with external cloud storage
  • Governance using Unity Catalog to unify IAM
  • Delta Live Tables is a relatively new product, but its visual view of a pipeline gives it great potential.

Cons

  • The new UI is a bit clunky compared to the old UI. It also adds new elements to the sidebar that are not relevant to the workspace. This could be improved.
  • Delta Live Tables, although powerful, has a lot of room for improvement, including error debugging and support for newer features.
  • Concurrent requests against Delta Lake tables need more optimisation work.

Return on Investment

  • The support team is amazing; they help you at every stage of a project, from sales to delivery.
  • At a framework level, it has had an amazing impact and has reduced clients' overall data platform costs by a staggering 65%.
  • There has been a 40% manual work requirement on average for clients when they move to the Azure Databricks data platform.

Alternatives Considered

Jupyter Notebook, Azure Synapse Analytics and Cloudera Data Platform

Other Software Used

Azure Data Factory, Cloudera Data Platform, Apache Iceberg

Azure Databricks! Best of cloud and big data

Use Cases and Deployment Scope

We are leveraging Databricks capabilities in various use cases. For instance, we designed a tailor-made change data capture process that keeps track of users' account details and keeps them updated in Delta Lake. We have also designed numerous ETL processes that are scheduled to deliver data to data analytics on strict timelines. Moreover, the workspaces are integrated with other Azure services such as Azure Synapse Analytics, Azure Data Lake, and Azure Data Factory. Some of our Databricks workloads are triggered by Azure Data Factory.
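The change data capture idea described above boils down to applying a stream of insert, update, and delete records to a target table keyed by account. The plain-Python sketch below shows only that upsert logic; it is not the reviewer's implementation. On Databricks the same effect would typically be achieved with a Delta Lake `MERGE` statement. The function `apply_changes`, the record layout (`op`, `account_id`, `data`), and the sample accounts are all hypothetical.

```python
def apply_changes(target, changes):
    """Apply ordered change records to a target keyed by account_id.

    Returns a new dictionary; the input target is left unmodified.
    """
    result = dict(target)
    for change in changes:
        key = change["account_id"]
        if change["op"] == "delete":
            # Remove the row if it exists; ignore deletes for unknown keys.
            result.pop(key, None)
        else:
            # "insert" and "update" are both handled as an upsert:
            # existing columns are kept unless the change overrides them.
            merged = dict(result.get(key, {}))
            merged.update(change["data"])
            result[key] = merged
    return result


# Hypothetical sample data for illustration.
accounts = {
    1: {"email": "a@example.com", "plan": "free"},
    2: {"email": "b@example.com", "plan": "pro"},
}
changes = [
    {"op": "update", "account_id": 1, "data": {"plan": "pro"}},
    {"op": "insert", "account_id": 3,
     "data": {"email": "c@example.com", "plan": "free"}},
    {"op": "delete", "account_id": 2},
]

updated = apply_changes(accounts, changes)
print(updated)
```

Processing the changes in order matters: if the same account appears more than once in a batch, the last record wins.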

Pros

  • Consistently great performance on huge-scale data, thanks to the Spark architecture
  • Magic commands for Spark SQL, PySpark, and Scala. These come in really handy in day-to-day work
  • Integration with other Azure services is super smooth and robust

Cons

  • Their pipeline workflow orchestration is pretty primitive and lacks some common features
  • The workspace UI and navigation have a steep learning curve
  • Personally, I am not fond of the autosave feature. It's dangerous for production-level notebook scripts

Other Software Used

Azure Data Factory, Azure Synapse Analytics, Azure Data Lake Storage

Our new go-to tool for managing large databases and tables!

Use Cases and Deployment Scope

We use Databricks to pull performance metrics for our content hosted on the company website. Having one tool to view and analyze the data has been a game changer for us, saving the many hours we used to spend collecting the data from various sources.

Pros

  • SQL
  • Data management
  • Data access

Cons

  • Intuitive interface
  • Ease of use
  • Providing FAQ or QRGs

Return on Investment

  • Helped reduce time for collecting data
  • Reduced cost in maintaining multiple data sources
  • Access for multiple users and management of users/data in a single platform

Other Software Used

Tableau Server, Tableau Cloud, Microsoft Excel