Databricks Lakehouse Platform Reviews & Ratings 2024

Starting at $0.07 Per DBU

Overview

What is Databricks Lakehouse Platform?

Recent Reviews

TrustRadius Insights

December 15, 2023

The Databricks Lakehouse Platform, also known as the Unified Analytics Platform, has been widely used by multiple departments to address a …


Pricing

Plan	Price	Deployment
Standard	$0.07 per DBU	Cloud
Premium	$0.10 per DBU	Cloud
Enterprise	$0.13 per DBU	Cloud

Entry-level setup fee	No setup fee
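
As a rough illustration of how the per-DBU prices listed above turn into a bill, the sketch below multiplies a tier's rate by DBU consumption and runtime. The workload figures (4 DBUs per hour, 100 hours) and the function name `estimate_dbu_cost` are hypothetical, and the result covers only the DBU charge; cloud infrastructure is typically billed separately by the cloud provider.

```python
# Per-DBU list prices quoted in the pricing section above (USD).
RATES_PER_DBU = {"Standard": 0.07, "Premium": 0.10, "Enterprise": 0.13}


def estimate_dbu_cost(tier: str, dbus_per_hour: float, hours: float) -> float:
    """Return the estimated DBU charge in USD, rounded to cents."""
    rate = RATES_PER_DBU[tier]
    return round(rate * dbus_per_hour * hours, 2)


# Hypothetical workload: a cluster consuming 4 DBUs per hour for 100 hours.
for tier in RATES_PER_DBU:
    print(tier, estimate_dbu_cost(tier, dbus_per_hour=4, hours=100))
```

Under these assumed figures, the gap between tiers scales linearly with usage, so the tier choice matters more as workloads grow.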

Offerings

Free Trial
Free/Freemium Version
Premium Consulting/Integration Services

Product Details

What is Databricks Lakehouse Platform?

Databricks Lakehouse Platform Technical Details

Deployment Types	Software as a Service (SaaS), Cloud, or Web-Based
Operating Systems	Unspecified
Mobile Application	No

Frequently Asked Questions

Databricks in San Francisco offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data lakes, and data platforms. Users can manage the full data journey: ingesting, processing, storing, and exposing data throughout an organization. Its Data Science Workspace is a collaborative environment for practitioners to run all analytic processes in one place and manage ML models across the full lifecycle. The Machine Learning Runtime (MLR) provides data scientists and ML practitioners with scalable clusters that include popular frameworks, built-in AutoML, and optimizations.

Reviewers rate Usability highest, with a score of 9.4.

The most common users of Databricks Lakehouse Platform are from Enterprises (1,001+ employees).

Reviews and Ratings

(73)

January 31st 2024

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources.

Business Problems Solved
Pros
Cons
Recommendations

The Databricks Lakehouse Platform, also known as the Unified Analytics Platform, has been widely used by multiple departments to address a range of data engineering and analytics challenges. Users have leveraged the platform to initiate data warehousing, SQL analytics, real-time monitoring, and data governance. The versatility and openness of the platform have allowed users to save a significant amount of time and effectively manage cloud costs and human resources.

Customers have utilized the Databricks Lakehouse Platform for various use cases, including creating dashboards with tools like Tableau, Redash, and Qlik, as well as integrating with CRM systems like Salesforce and SAP. The platform has also been employed for developing chatbots in Knowledge Management and serving machine learning models behind API endpoints. Furthermore, it is extensively used for data science project development, facilitating tasks such as data analysis, wrangling, feature creation, training, model testing, validation, and deployment.

Databricks' integration capabilities, including Git integration and integration with Azure or AWS, enable users to leverage the power of integrated machine learning features. Additionally, the platform's reliability and excellent technical support make it a preferred choice for building data pipelines and solving big data engineering problems. It is widely used by engineering and IT teams to transform IoT data, build data models for business intelligence tools, and run daily/hourly jobs to create BI models.

Moreover, the Databricks Lakehouse Platform serves as an invaluable learning tool for individuals in the Computer Information System department. The community forum proves particularly helpful for self-learners with questions. Furthermore, the platform supports deep-dive analysis on metrics by Data and Product teams, facilitates client reporting and analytics through data mining capabilities, and replaces traditional RDBMSs like Oracle for big batch ETL jobs on big data sets.

In summary, the Databricks Lakehouse Platform is employed across organizations to solve a variety of data engineering and analytics use cases. Its seamless integration with cloud platforms, support for different data formats, and scalability make it suitable for tasks such as data ingestion and cleansing, interactive analysis, and development of analytic services.

User-Friendly SQL: Users have found the SQL in Databricks to be user-friendly, allowing them to easily write and execute queries. Several reviewers have praised the intuitive nature of the SQL interface, making it accessible for users of different skill levels.

Enhanced Collaboration: The enhanced collaboration between data science and data engineering teams is seen as a positive feature by many users. They appreciate how Databricks facilitates seamless communication and knowledge sharing among team members, ultimately leading to improved productivity and efficiency.

Versatile Integration: The integration with multiple Git providers and the merge assistant is highly valued by users. This feature allows for smooth version control and simplifies the collaborative development process. With this capability, developers can easily manage their codebase, track changes, resolve conflicts, and ensure a streamlined workflow.

Confusing Workspace Navigation: Several users have found the navigation to create a workspace in the Databricks Lakehouse Platform confusing and time-consuming, hindering their productivity. They have expressed frustration over the complex steps involved, resulting in wasted time.

Difficulty Locating Tables: Many reviewers have expressed difficulty in locating tables after they were created, often leading to the need for deletion and recreation. This issue has caused frustration and wasted time for users who struggle to find their data within the platform.

Random Task Failures: Some users have experienced random task failures while using the platform, making it challenging for them to debug and profile code effectively. These unexpected failures undermine confidence in the system's stability and result in delays as users attempt to identify and fix these issues.

Users highly recommend the Lakehouse platform for various data-related tasks, such as building cloud-native lakehouse platforms, ingesting and transforming big data batches/streams, and implementing medallion lakehouse architectures. They find the platform simple to use and appreciate its hassle-free administration and maintenance.

The Lakehouse platform is also highly recommended for setting up Hadoop clusters and dealing with big data, analytics, and machine learning workflows. Users believe that it provides a comprehensive and open solution for these tasks.

Users suggest exploring features of the Lakehouse platform such as Partner Connect and its advanced analytics, MLOps, and data science AutoML capabilities. They find these features useful and believe that they enhance the platform's functionality.

Overall, users highly recommend the Lakehouse platform for its ease of use, support for major cloud providers (AWS, Azure, GCP), and useful features like data sharing (Delta Sharing). However, users also recommend weighing the level of reliance on proprietary technology against industry standards like Spark, SQL, and dbt. They advise reading through the documentation and gathering firsthand experiences from individuals who have used the Lakehouse platform.

Attribute Ratings

Reviews

(1-17 of 17)

July 26, 2023

Most collaborative Data Science & AI workspace !

Axel Richier

Tech Lead Data Engineer

Ekimetrics (Information Technology & Services, 201-500 employees)

Score 10 out of 10

Vetted Review

Verified User

Use Cases and Deployment Scope

I use Databricks Lakehouse Platform in my Data Science & AI consulting company to help various business entities with data-driven solutions. The platform can handle large and complex data sets and enables us to build and deploy applications using the latest technologies. The openness of Databricks allows us to seamlessly integrate and adapt to our clients' requirements:
* Creating dashboards with Tableau, Redash, Qlik,
* Feeding their CRM tools like Salesforce, SAP,
* Developing chatbots for Knowledge Management,
* Serving ML models behind API endpoints.
Databricks Lakehouse Platform is a versatile and open product that saves us a lot of time and helps us control cloud costs and human resources effort!

Pros and Cons

Enhanced Data Science & Data Engineering collaboration
Complete Infrastructure-as-code Terraform provider
Very easy streaming capabilities
Multiple Git providers integration with merge assistant

VsCode IDE support for local development
Python SDK for Workflows
Poetry support

Likelihood to Recommend

Databricks shines when you are working with a growing team spanning multiple data professions. By providing an easy-to-instantiate common workspace for Data Engineers, Data Scientists, ML Engineers and Data Analysts, fully integrated with Active Directory security, it makes your data projects more likely to go to production. There is no need to switch between tools or to transfer data: the Unity Catalog centralizes all the assets, so all your data citizens can find them in a second and benefit from the Spark engine whatever language they use.

It would be less appropriate for very small data projects, as the entry cost may be high. Yet, if the data is meant to grow, Databricks will scale horizontally without requiring a rewrite of your codebase.

March 09, 2023

Databricks Lakehouse Platform: A 2-year user review

Verified User

Employee in Engineering

Retail Company, 10,001+ employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

I use Databricks Lakehouse Platform to build data-science-based solutions that address many problems in my business. This includes incrementally adding to our data in the lakehouse, using the platform's computational capabilities to analyze and feature-engineer our data, building different machine learning models and tracking different experiments, and finally registering our trained models so they can be used by the business.

Pros and Cons

MLflow Experiment
MLflow Registry
Databricks Lakehouse Platform Notebook

Connecting my local code in Visual Studio Code to my Databricks Lakehouse Platform cluster so I can run the code on the cluster. The old databricks-connect approach has many bugs and is hard to set up. The new Databricks Lakehouse Platform extension for Visual Studio Code doesn't allow developers to debug their code line by line (we can only run the code).
Maybe offer a dedicated Databricks Lakehouse Platform IDE that users can use to develop locally.
Visualization in MLflow experiments can be enhanced.

Likelihood to Recommend

Well Suited: Dealing with big data and being able to train different models that address many problems in my business. In addition to its computational capabilities, using Databricks Lakehouse Platform allowed us to do all development in one platform. Less Appropriate: Having a small dataset that doesn't need parallel processing. Local development is easier to develop and track so if no parallelization is needed (data is not big or parallelized computations is not required), I prefer local development.

May 15, 2022

Databricks Lakehouse Platform for all your analytics requirements

Verified User

Engineer in Engineering

Computer Software Company, 1001-5000 employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We used Databricks Lakehouse platform for running all our Machine Learning workloads as well as storing large amounts of data in our data lake backend. The data stored in the Databricks lakehouse was used to train state-of-the-art ML and Deep Learning models on text and image datasets. Databricks' Spark jobs as well as the Delta Lake lakehouse backend are well equipped for these kinds of tasks.

Pros and Cons

Very well optimized Spark Jobs Execution Engine.
Time travel in Databricks Lakehouse Platform allows you to version your datasets.
Newly integrated Analytics feature allows you to build visualization dashboards.
Native integration with managed MLflow service.

Running MLflow jobs remotely is extremely cluttered and needs to be simplified.
All the runnable code has to stay in Notebooks which are not very production-friendly.
File management on DBFS can be improved.
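
The time-travel point in the pros above can be pictured with a small toy model: every write produces a new numbered version, and a reader may ask for any earlier version. This is a conceptual sketch in plain Python, not Delta Lake code; the `VersionedTable` class and its methods are hypothetical, and Delta Lake itself records changes in a transaction log rather than copying full snapshots as done here.

```python
from copy import deepcopy


class VersionedTable:
    """Toy table that keeps one full snapshot per committed version."""

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    def commit(self, rows):
        """Store `rows` as a new version and return its version number."""
        self._snapshots.append(deepcopy(rows))
        return len(self._snapshots) - 1

    def read(self, version_as_of=None):
        """Return the latest rows, or the rows of `version_as_of` if given."""
        index = -1 if version_as_of is None else version_as_of
        return deepcopy(self._snapshots[index])


table = VersionedTable()
v1 = table.commit([{"id": 1, "label": "cat"}])
v2 = table.commit([{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}])

# A model trained against v1 can be reproduced later by reading v1 again.
print(len(table.read(version_as_of=v1)), len(table.read()))
```

Pinning a training run to a version number is what makes a dataset reproducible even after later writes.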

Likelihood to Recommend

If you need a managed big data megastore, which has native integration with highly optimized Apache Spark Engine and native integration with MLflow, go for Databricks Lakehouse Platform. The Databricks Lakehouse Platform is a breeze to use and analytics capabilities are supported out of the box. You will find it a bit difficult to manage code in notebooks but you will get used to it soon.

February 08, 2022

Best in the industry

Jonatan Bouchard

Director Data Science

CN (Transportation/Trucking/Railroad, 10,001+ employees)

Score 9 out of 10

Vetted Review

Verified User

Use Cases and Deployment Scope

This product is used for Data Science project development, from data analysis/wrangling to feature creation, training, fine-tuning, model testing and validation, and finally deployment. While Databricks is used by many users, we also use GitHub and code QA to promote code to production. One of the advantages of Databricks is integration: not only with Git, but whether you use it on Azure or AWS, you can also leverage the integrated machine learning in those platforms, such as AutoML or Azure ML.

Pros and Cons

Data Science code agnostic (SQL, R, Python, PySpark, Scala)
Customer Service with REAL support from data engineers and data scientists
Integration with many technologies: Tableau, Azure, AWS, Spark, etc.

Visualization
Collaboration

Likelihood to Recommend

Currently the best Data Science tool for a large-scale company that needs strong tech support once in a while. The performance and the connectivity/integration with a broad range of tools and platforms are also important when you don't want to change your whole stack. Databricks is a great non-drag-and-drop tool for real Data Scientists who know their stuff.

November 09, 2021

The wonders of all your data analysis in one place

Verified User

Manager in Product Management

Financial Services Company, 201-500 employees

Score 8 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

It is currently used by our Data and Product teams to perform deep-dive analysis on how our current metrics (KPIs, OKRs) are performing and to develop tools for metric predictions based on data models, mixing languages such as SQL and Python. Shared workspaces give the entire company visibility of the results through graphs.

Pros and Cons

Cross company shared workspaces for unified comprehension of the data
Combining different languages such as SQL and Python in one single space in order to make data analysis
Quick execution of highly complex queries

Graph creation requires a certain level of expertise in the platform; it could be more intuitive and user friendly
More guidance on the basics, since some of the new users come from different platforms expecting a similar UI
An option where all the tables are shown with their respective fields, when a DB is selected for a query

Likelihood to Recommend

I reckon it is an amazing platform for users with a certain level of expertise for designing experiments and delivering deep-dive analysis that requires execution of highly complex queries. It is also very useful when it comes to cross-company shared workspaces for unified comprehension of the data.

It is less appropriate for users who don't have full knowledge of the tables they are going to query and need more support on the data, since the platform doesn't give an option to see the fields in a table before querying it.

August 13, 2021

Positive review for Databricks Lakehouse Platform

Verified User

Analyst in Marketing

Marketing & Advertising Company, 5001-10,000 employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We currently use the Databricks Lakehouse Platform for a client. My team specifically uses it to data-mine, create reports and analytics for the client. Depending on where the data is stored, various Analytics teams in my company use different platforms - GCP, AWS, Databricks, etc.

Pros and Cons

Scheduling jobs to automate queries
User friendly - a new user can easily navigate through SQL/Python queries
Options to code in multiple languages (SQL, Python, Scala, R) and easy to switch with the use of the % operator

Errors can be difficult to understand at times
Session resets automatically at times, which leads to the temporary tables being wiped out from memory
Git connections are dicey
Very inconsistent with job success/failure notification emails

Likelihood to Recommend

Databricks is great for beginners as well as advanced coders. The interface is extremely user-friendly and the learning curve is quite short. It is well suited for automation, where we can have scripts running late at night when the load is lower and wake up to an email notification of success or failure. It is also well suited for writing code that requires the use of multiple languages (in some cases of data modeling).

The ability to store temporary/permanent tables on data lakes is a fabulous feature as well. PySpark is an excellent language to learn and it works really fast with large datasets.

August 11, 2021

My Lakehouse experiences

Stefan Panayotov

Lead Data Engineer

Cadent (Broadcast Media, 201-500 employees)

Score 10 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We build all our data pipelines with Databricks Lakehouse technology. It is reliable and the tech support from Databricks is very good.

Pros and Cons

Better performance through consolidating small files in delta tables
ACID functionality on delta tables
Delta Live Tables

Make it easier to test features in public preview, like Delta Live Tables.
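
The first pro above, consolidating small files, can be illustrated with a simple planning step: group many small files into a few bins close to a target size, so a reader opens fewer files. The sketch below is a generic greedy grouping in plain Python and is not the algorithm Databricks uses; the function name `plan_compaction` and the 128 MB target are assumptions made for the example.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group file sizes into bins that stay within `target_mb`.

    A single file larger than the target ends up alone in its own bin.
    """
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        # Close the current bin when adding this file would exceed the target.
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins


# Six small files (sizes in MB, hypothetical) are planned into two larger ones.
plan = plan_compaction([10, 20, 60, 5, 100, 30])
print(len(plan), plan)
```

Fewer, larger files generally mean less per-file overhead when a query scans the table, which is the performance benefit the reviewer points to.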

Likelihood to Recommend

We can run data pipelines and use SQL Analytics to build dynamic dashboards for clients. The same platform can be used for running ML pipelines.

August 09, 2021

Databricks is Great Platform for Data Virtualization based on Delta Lake

Verified User

Director in Information Technology

Hospitality Company, 10,001+ employees

Score 10 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use Databricks to replace traditional RDBMSs like Oracle. We have big batch ETL, ingestion and extraction jobs for big data running across different products, where we leverage the Lakehouse platform to put our raw data in the data lake and build a Delta Lake platform based on high-performing Parquet.
It is proposed for use across the whole organization and its different BUs. Databricks will be our main virtualized platform.
It provides very fast data ingestion and reduces the overall ETL window. It integrates different data sources and also helps Machine Learning jobs run and scale. The idea is to reduce overall computation time to save on on-prem costs.

Pros and Cons

Data Virtualization
Spark Real time and Batch streaming
Notebook to run Jobs
integrate Python and Apache Spark SQL
SQL Analytics

SQL Analytics Performance
Help migration for RDBMS sources
To make Transactional OLTP aspects faster

Likelihood to Recommend

Delta Share, data virtualization, open data integration with other data sources, Parquet ingestion

July 12, 2021

Data for insights

Gilrod Maerina

Analyst

Riverbed Technology (Internet, 1001-5000 employees)

Score 7 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

[Databricks Lakehouse Platform (Unified Analytics Platform) is] used by a few departments to start off with data warehousing, SQL analytics, real-time monitoring and data governance.

Pros and Cons

SQL
User friendly
Great development environment

Errors are not explained
No data back up feature
Interface can be more intuitive

Likelihood to Recommend

[Databricks Lakehouse Platform (Unified Analytics Platform)] makes the power of Spark accessible. Databricks's service is proactive and customer-centric. It is a highly adaptable solution for data engineering, data science, and AI. However, load times are not consistent, and there is no ability to restrict data access to specific users or groups.

July 07, 2021

Databricks Lakehouse is modern solutions for current big data problems

Surendranatha Reddy Chappidi

Senior Data Engineer

A.P. Moller - Maersk (Logistics & Supply Chain, 10,001+ employees)

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Databricks Lakehouse platform is used across all departments in my current organization.
It is used as part of solving different data engineering and data analytics use cases in different teams.
Databricks Lakehouse platform provides seamless integration with Azure cloud in Maersk. It uses Spark, MLOps, and Delta to solve recent big data engineering problems.

Pros and Cons

Seamless integration with Azure cloud platform services like Azure Data Lake Storage, Blob Storage, Azure Data Factory, and Azure DevOps.
Databricks Lakehouse platform uses Apache Spark in the backend to make all computation faster and distributed. It helps data pipelines process huge amounts [of] big data in less time and at low cost.
Databricks Lakehouse solves the problems of data lakes by introducing the Delta Lake concept. It provides support for updates, deletes, and schema evolution.

Databricks Lakehouse platform could provide a better platform for managing and monitoring cluster performance and utilization, with optimization suggestions. That would help developers leverage those insights to build better data pipelines.
Databricks Lakehouse platform could provide a GUI for creating Spark jobs by click, drag and drop. That would significantly reduce the time needed to develop code.
Databricks Lakehouse platform could provide better insights and details regarding job failures and resource consumption.
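
To make the "updates and deletes" point from the pros above concrete, here is a toy model of merge (upsert) semantics on keyed rows: matching keys are updated, new keys are inserted, and listed keys are deleted. It is plain Python on in-memory dictionaries, not Delta Lake code; the function name `merge_into` and the sample rows are hypothetical.

```python
def merge_into(target, updates, deleted_keys=(), key="id"):
    """Return a new row list after applying updates, inserts, and deletes."""
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        # Matched keys are updated; unmatched keys are inserted as new rows.
        # Columns absent from the old row are simply added to it.
        by_key[row[key]] = {**by_key.get(row[key], {}), **row}
    for k in deleted_keys:
        by_key.pop(k, None)
    return list(by_key.values())


target = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]
updates = [{"id": 2, "city": "Quito"}, {"id": 3, "city": "Rome", "zip": "00100"}]

merged = merge_into(target, updates, deleted_keys=[1])
print(merged)
```

A plain file-based data lake only appends or rewrites whole files, so row-level operations like these are the gap the reviewer credits Delta Lake with closing.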

Likelihood to Recommend

Databricks Lakehouse platform is well suited for the use cases below:
1. Processing different types of data, such as structured, semi-structured and unstructured data.
2. Processing data from different sources like RDBMSs, REST APIs, file servers, and IoT sensors.
3. Supporting updates, deletes, and schema evolution.

Databricks Lakehouse platform is not well suited for the use cases below:
1. Low data volume with no analytics requirements.
2. Developers who don't have a skill set in Spark and Hive.

May 28, 2021

Databricks--a good all-rounder

Verified User

Professional in Information Technology

Information Technology & Services Company, 201-500 employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We use Databricks Lakehouse Platform (Unified Analytics Platform) in our ETL process (data loading) to perform transformations and to implement the toughest loading strategies on huge datasets. It is very easy to understand and it can connect to almost all the modern data formats like Avro, Parquet, and JSON. It supports almost every popular cloud platform, like Azure and AWS, and offers better performance in terms of data processing speed.

Pros and Cons

Complex transformations
Supports major data sources
Great performance

User interface to connect data sources
Pricing
Community support

Likelihood to Recommend

Databricks Lakehouse Platform (Unified Analytics Platform) can be used to process raw data from any system like IoT, structured, and unstructured data sources. Since it supports Pyspark and Scala to do data processing, it can do any complex business transformation very easily. Also, the Databricks Lakehouse Platform (Unified Analytics Platform) architecture is very similar to Big Data; it can process huge datasets from Hadoop systems and machine learning models in minutes.

May 15, 2021

Great for both ad-hoc analyses and scheduled jobs

Ender Ortak

Senior Data Analyst

Ford Otosan (Automotive, 10,001+ employees)

Score 8 out of 10

Vetted Review

Verified User

Use Cases and Deployment Scope

We use Databricks Lakehouse Platform to transform IoT data and build data models for BI tools. It is being used by engineering and IT teams. We use it with a data lake platform, read the raw data and transform it to a suitable format for analytics tools. We run daily/hourly jobs to create BI models and save the resulting models back to data lake or SQL tables.

Pros and Cons

Ready-2-use Spark environment with zero configuration required
Interactive analysis with notebook-style coding
Variety of language options (R, Scala, Python, SQL, Java)
Scheduled jobs

Random task failures
Hard to debug code
Hard to profile code

Likelihood to Recommend

It is great for both ad-hoc analyses and scheduled jobs. It supports most cloud storage technologies and provides an easy-to-use API to connect with them. Clusters can be auto-scaled with the load, and you can also create temporary clusters for job runs, which cost less than all-purpose clusters.

January 31, 2019

Databricks for modern day ETL

Verified User

Team Lead in Engineering

Financial Services Company, 10,001+ employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Data from APIs is streamed into our One Lake environment. This one lake is S3 on AWS.
Once this raw data is on S3, we use Databricks to write Spark SQL queries and pySpark to process this data into relational tables and views.

Then those views are used by our data scientists and modelers to generate business value, and they are used in a lot of places, such as creating new models, new audit files, exports, etc.

Pros and Cons

Process raw data in One Lake (S3) env to relational tables and views
Share notebooks with our business analysts so that they can use the queries and generate value out of the data
Try out PySpark and Spark SQL queries on raw data before using them in our Spark jobs
Modern day ETL operations made easy using Databricks. Provide access mechanism for different set of customers

Databricks should come with a fine grained access control mechanism. If I have tables or views created then access mechanism should be able to restrict access to certain tables or columns based on the logged in user
There should be improved graphing and dashboarding provided from within Databricks
Better integration with AWS could help me code jobs in Databricks and run them in AWS EMR more easily using better devops pipelines

Likelihood to Recommend

Databricks has helped my teams write PySpark and Spark SQL jobs and test them out before formally integrating them in Spark jobs. Through Databricks we can create parquet and JSON output files. Datamodelers and scientists who are not very good with coding can get good insight into the data using the notebooks that can be developed by the engineers.

August 22, 2018

Databricks provides a cost effective end to end solution for Enterprise analytics

Verified User

Strategist in Engineering

Computer Hardware Company, 10,001+ employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

It's being used for:

Ingestion and cleansing of data
Interactive Analysis of data
Development of Analytic Services
Production Environment Customer Facing Analytic Services

Pros and Cons

Collaborative Development Environment using Notebooks.
Stable and Secure Cloud Development Environment requiring minimum DevOps support
Fast with excellent scalability reduces time to market
Open source library support

Automation of Machine Learning Development
Optimization of GPU usage

Likelihood to Recommend

Great end-to-end analytics solution on AWS or Azure. Databricks continues to grow based on customer feedback. Like everyone in the industry, they are focused on Machine Learning, but they also understand that a complete solution is needed.

August 22, 2018

Databricks Review

Verified User

Director in Information Technology

Financial Services Company, 201-500 employees

Score 9 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

We leverage Databricks (DB) to run Big Data workloads. Primarily, we build a JAR and attach it to DB. We do not use the notebooks except for prototyping.

Pros and Cons

Extremely Flexible in Data Scenarios
Fantastic Performance
DB is always updating the system, so we have the latest features.

Better Localized Testing
When they were primarily OSS Spark, it was easier to test and manage releases than with the newer DB Runtime. I wish there were more configuration options in the Runtime rather than just picking a version.
Graphing support became non-existent, even though it was once one of their compelling features.

Likelihood to Recommend

DB generally fits 95% of what you need to do
Primarily the ability to transform data and/or do ad-hoc DS work

March 28, 2018

If you want to be an effective ML learner, use Databricks

Ann Le

Freelance Translator

ZOO Digital Group plc (Entertainment, 501-1000 employees)

Score 7 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

I use Databricks for experiments and research for my master's program. I mostly use it to implement Python code and test the viability of the programs that I write. Many individuals in the Computer Information Systems department are using this software platform to implement programs. It is a good tool for us to learn [and] includes a community forum that is rather helpful if you are self-learning and have questions.

Pros and Cons

There is Databricks Community, which is a free version. It gives beginners an easy start with a big data platform. It does not have every feature of the full version but is still adequate for extremely new coders.
There are many useful training resources available to developers, data scientists, data engineers and other IT professionals to learn Apache Spark.

The navigation for creating a workspace is a bit confusing at first. It takes a couple of minutes to figure out how to create a folder and upload files, since it is not the same as traditional file systems such as box.com.
Also, when you create a table, if you forget to copy the link where the table is stored, it is hard to locate it again. Most of the time I would have to delete the table and re-create it.

Likelihood to Recommend

Right now, I am learning about Spark ML and general machine learning concepts. It is a good practice space for running different Spark ML code. Databricks provides valid errors and detailed descriptions of where I can fix my code. It is also very easy to run a set of code and maneuver around. If you do not know how to code, Databricks might be less appropriate. But then again, they do have a large community where help can be found.

September 15, 2017

Databricks Review

Verified User

Director in Engineering

Financial Services Company, 10,001+ employees

Score 6 out of 10

Vetted Review

Verified User

Incentivized

Use Cases and Deployment Scope

Across the whole organization.

[It's] used by self-service analysts to do analysis quickly.

Pros and Cons

Very simplified infrastructure initialization
Seamless and automated optimization of job execution
Simple tool to get used to

Visualization: a major area for improvement
Integration with Git
Cost

Likelihood to Recommend

When you have analysts who are not cloud-savvy, this tool helps them quickly run code without being overwhelmed by infrastructure and optimization. [It's] less appropriate in production deployments.

Standard: $0.07
Premium: $0.10
Enterprise: $0.13

Azure HDInsight

AWS Glue

Microsoft SQL Server

Denodo

Azure SQL Database

Apache Kafka

Elasticsearch

SAP Datasphere

Google BigQuery

Amazon S3

Community Insights