Databricks Data Intelligence Platform

Score 8.8 out of 10

108 Reviews and Ratings

What is Databricks Data Intelligence Platform?

Databricks offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service provides a platform for data pipelines, data lakes, and data platforms.

Categories & Use Cases

#1 most frequent: Professional, Scientific, and Technical Services, 26.5% (26 installations of 98)

#2 most frequent: Manufacturing, 17.3% (17 installations of 98)

#3 most frequent: Information, 14.3% (14 installations of 98)

Yuvaraj Mamidi

Data engineer in Customer Service at modak analytics (201-500 employees)

Use Cases and Deployment Scope

In our organization, we use the Databricks Data Intelligence Platform as the main platform for building and managing data products. I use Databricks to create notebooks for data transformation and data products, push them to project repositories, and schedule jobs with Databricks Workflows. I also create business data products as Delta tables in Unity Catalog. It helps with large-scale data manipulation and job management.

Pros

Large Data Processing: It handles large volumes of data efficiently, using Spark to transform data for business purposes, and runs jobs smoothly on clusters and workers.
Notebook-Oriented Development: Databricks notebooks make developing data transformations easy and flexible using SQL, Python, and R. They also help in testing notebooks before deployment.
Data Governance: Unity Catalog provides data governance for managing permissions and security.

Cons

Job Monitoring and Alerts: A better visual dashboard for tracking pipeline dependencies and failures would improve visibility.
Serverless Limitation: SQL variables set with 'set' in notebooks are limited on serverless compute and can't be initialized; this could be improved.

Return on Investment

Reduced Manual Effort: Automated workflows and scheduling reduced manual monitoring.
Faster Transformation Development: With the integrated assistant, bug resolution and code development became faster.
Faster Data Availability: Optimized and reduced processing time for daily ETL runs.

Alternatives Considered

Snowflake and SnapLogic

Other Software Used

SnapLogic, GitLab, SAP HANA Cloud

Raghwendra Singh

SDE 4 in Information Technology at Meesho (1001-5000 employees)

Use Cases and Deployment Scope

Databricks is used to run all the data engineering, data science, and analytics jobs related to Spark, including the data science feature engineering jobs and all the model training jobs. Databricks SQL Warehouse is also used as the main compute for serving analytics, powering queries that take between 10 and 60 minutes.

Pros

Databricks provides an amazing notebook, which serves as the backbone of unified analytics for all things data engineering, data science, and analytics.
The performance of Databricks SQL Warehouse is top-notch and very hard for competitors to beat, with amazing performance optimization for the Delta file format.
Databricks' support for the Iceberg format is also very good, so there is no vendor lock-in to the Delta format.

Cons

Unity Catalog is restrictive in terms of policy definition due to user-defined functions, compared to market alternatives like Apache Ranger or Open Policy Agent.
Unity Catalog has a table quota limitation: a schema with a large number of tables requires increasing the table quota repeatedly, which is not an issue if you use Hive Metastore.
SQL Warehouse as a solution is amazing and performant, but there should have been some support for adding Spark plugins or extensions, so that there is some room for changes or custom dependencies.

Return on Investment

The SLA has reduced due to the performance improvement when we migrated from OSS Spark to Databricks.
The notebook is very well polished with lots of features, boosting our analysts' productivity.
Databricks Workflows is a powerful tool for creating complex dependency graphs and DAGs in one place.

Alternatives Considered

Jupyter Notebook, Apache Spark and Google Cloud Dataproc

Other Software Used

Apache Hive, Apache Flink, Open Policy Agent

Verified User

Engineer in Information Technology (201-500 employees)

Use Cases and Deployment Scope

I use Databricks every day. We use multiple environments, such as dev, stage, and prod. Our primary use case is to get data from SnapLogic into the Bronze layer of Databricks. So, it handles both full and incremental loads as per the use case. Our workflows also support versioning across releases. This is really good for minimizing prod risks. We also perform many transformations on silver tables for end-use data products in the gold layer.

Pros

First, it handles large amounts of data. We run daily and weekly jobs that process a lot of records. Databricks manages it very well, with no issues, if the cluster is set up properly.
Second, it really works well for incremental updates. We load only new or changed data, which makes it easy to update existing tables without duplicating records.
Third, job scheduling is useful. We can schedule the jobs easily and monitor them. The best part is that we can retry or repair the failed runs.
The last one is about the notebook interface that I really love. It makes development and debugging easy. We can test logic step by step, validate data, and fix all our issues.
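
As an illustration only, the incremental-update pattern mentioned in the second point can be sketched in plain Python. This is not the reviewer's code and it does not call any Databricks API; the `upsert` helper, the `id` key column, and the sample rows are all hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def upsert(target: List[Row], changes: Iterable[Row], key: str = "id") -> List[Row]:
    """Merge changed rows into target by key: update matches, insert the rest."""
    # Index the existing rows by key (copying them so the input is not mutated).
    merged = {row[key]: dict(row) for row in target}
    for row in changes:
        # Matching key: overwrite the changed columns. New key: add the row.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())


# Hypothetical sample data: one changed record (id 2) and one new record (id 3).
existing = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
incoming = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
result = upsert(existing, incoming)
```

Applying the same batch a second time leaves the result unchanged, which is why a keyed merge of this kind avoids duplicate records.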

Cons

Sometimes, when multiple jobs depend on each other in different environments, it is not always easy to see the full workflow in one place.
It is sometimes difficult to determine which job or cluster contributes more to the overall cost.
For beginners, cluster configuration may be a little difficult. More recommendations in the platform would help.

Return on Investment

Scheduled jobs have reduced repetitive manual triggering of jobs.
As data volumes increased, it scaled with our workload, which really saves us time.
Job monitoring and repair have reduced the number of failures and improved data availability.

Other Software Used

SnapLogic, Atlassian Jira, Atlassian Confluence

Austin Franchino

Senior Data and Security Engineer in Engineering at Lumos Data (11-50 employees)

Use Cases and Deployment Scope

Databricks is the primary data platform where we land, standardize, clean, and transform our data sources. We utilize the Workflows feature to automate recurring tasks and have built internal applications around the reusable workflows. We use the dashboard feature internally to allow customer success teams and business analysts to keep tabs on the performance and outputs of our products. The workloads are orchestrated in Databricks but executed within our own AWS accounts, allowing us to stay compliant with our stringent security requirements.

Pros

Thoughtful application of AI assistants during the coding and analysis steps.
Intuitive UI for users of varying skill sets.
Frequently updated documentation.

Cons

Greater support for non-Spark workloads.
Ability to host JAR files on serverless endpoints.

Return on Investment

Greater democratization of access to data sources.
Migration took a while, as we were largely a Pandas shop.

Alternatives Considered

Snowflake

Other Software Used

Notion, Datadog

Axel Richier

Tech Lead Data Engineer in Engineering at Ekimetrics (201-500 employees)

Use Cases and Deployment Scope

I use Databricks Lakehouse Platform in my Data Science & AI consulting company to help various business entities with data-driven solutions. The platform can handle large and complex data sets and enables us to build and deploy applications using the latest technologies. The openness of Databricks allows us to seamlessly integrate with and adapt to our clients' requirements:
* Creating dashboards with Tableau, Redash, Qlik
* Feeding their CRM tools like Salesforce, SAP
* Developing chatbots for Knowledge Management
* Serving ML models behind API endpoints
Databricks Lakehouse Platform is a versatile and open product that saves us a lot of time and helps us control cloud costs and human effort!

Pros

Enhanced Data Science & Data Engineering collaboration
Complete Infrastructure-as-code Terraform provider
Very easy streaming capabilities
Multiple Git providers integration with merge assistant

Cons

VS Code IDE support for local development
Python SDK for Workflows
Poetry support

Most Important Features

Unity Catalog
Collaborative Spark Notebook supporting Python, SQL, Scala, R
Serverless Endpoints
MLflow integration

Return on Investment

Data Science environment is ready in a matter of minutes, not days.
Much better cost control
Easy onboarding for all clouds

Alternatives Considered

Azure Synapse Analytics and Snowflake

Other Software Used

Bitbucket, Azure Data Factory, Tableau Desktop
