Databricks Lakehouse Platform

Score 8.8 out of 100
Databricks Lakehouse Platform (Unified Analytics Platform)

Overview

Recent Reviews

Databricks--a good all-rounder

9 out of 10
May 28, 2021
We use Databricks Lakehouse Platform (Unified Analytics Platform) in our ETL process (data loading) to perform transformations and to …

Databricks for modern day ETL

9 out of 10
January 31, 2019
Data from APIs is streamed into our One Lake environment. This one lake is S3 on AWS.
Once this raw data is on S3, we use Databricks to …

Databricks Review

9 out of 10
August 22, 2018
We leverage Databricks (DB) to run Big Data workloads. Primarily we build a Jar and attach to DB. We do not leverage the notebooks except …

Databricks Review

6 out of 10
September 15, 2017
Across whole organization.

[It's] Used by self-service analysts to quickly do analysis

Pricing

  • Standard: $0.07 per DBU (Cloud)
  • Premium: $0.10 per DBU (Cloud)
  • Enterprise: $0.13 per DBU (Cloud)

Entry-level set up fee?

  • No setup fee
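
As a rough illustration of how the per-DBU prices listed above turn into a bill, the sketch below multiplies a plan's rate by DBUs consumed per hour and by hours run. The workload figures (4 DBUs per hour, 10 hours) and the function name are hypothetical, and the estimate covers only the per-DBU charge; any cloud infrastructure cost is not modeled.

```python
# Per-DBU list prices quoted in the pricing section (USD).
RATES_PER_DBU = {"Standard": 0.07, "Premium": 0.10, "Enterprise": 0.13}


def estimate_dbu_cost(plan, dbus_per_hour, hours):
    """Estimate the DBU charge in USD: rate * DBUs per hour * hours."""
    return round(RATES_PER_DBU[plan] * dbus_per_hour * hours, 2)


# Hypothetical workload: a cluster that consumes 4 DBUs per hour for 10 hours.
for plan in RATES_PER_DBU:
    print(plan, estimate_dbu_cost(plan, dbus_per_hour=4, hours=10))
```

Actual DBU consumption varies with cluster size and workload type, so the numbers here are placeholders for comparing plans, not a quote.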

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting / Integration Services

Product Details

What is Databricks Lakehouse Platform?

Databricks in San Francisco offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data lakes, and data platforms. Users can manage the full data journey: ingesting, processing, storing, and exposing data throughout an organization. Its Data Science Workspace is a collaborative environment where practitioners can run all analytic processes in one place and manage ML models across the full lifecycle. The Machine Learning Runtime (MLR) provides data scientists and ML practitioners with scalable clusters that include popular frameworks, built-in AutoML, and optimizations.

Databricks Lakehouse Platform Technical Details

Deployment Types: SaaS
Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

What is Databricks Lakehouse Platform's best feature?

Reviewers rate Usability highest, with a score of 9.

Who uses Databricks Lakehouse Platform?

The most common users of Databricks Lakehouse Platform are from Enterprises (1,001+ employees) and the Information Technology & Services industry.

Reviews

Score 8 out of 10
Vetted Review
Verified User
Review Source
We used the Databricks Lakehouse Platform to run all our machine learning workloads and to store large amounts of data in our data lake backend. The data stored in the Databricks lakehouse was used to train state-of-the-art ML and deep learning models on text and image datasets. Databricks' Spark jobs and the Delta Lake backend are well equipped for these kinds of tasks.
  • Very well optimized Spark Jobs Execution Engine.
  • Time travel in Databricks Lakehouse Platform allows you to version your datasets.
  • Newly integrated Analytics feature allows you to build visualization dashboards.
  • Native integration with managed MLflow service.
  • Running MLflow jobs remotely is extremely cluttered and needs to be simplified.
  • All the runnable code has to stay in Notebooks which are not very production-friendly.
  • File management on DBFS can be improved.
If you need a managed big data megastore, which has native integration with highly optimized Apache Spark Engine and native integration with MLflow, go for Databricks Lakehouse Platform. The Databricks Lakehouse Platform is a breeze to use and analytics capabilities are supported out of the box. You will find it a bit difficult to manage code in notebooks but you will get used to it soon.
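
The dataset-versioning point in the review above refers to Delta Lake time travel, which lets a query read an earlier snapshot of a table with `VERSION AS OF` or `TIMESTAMP AS OF`. The sketch below only builds such a query string; the helper name and the table name `training_images` are hypothetical, and running the query would need a Spark session with Delta Lake, which is assumed rather than shown.

```python
def time_travel_query(table, version=None, timestamp=None):
    """Build a Delta Lake time-travel SELECT for one snapshot of a table."""
    # Exactly one of the two snapshot selectors must be supplied.
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# Hypothetical table name; version 3 is an arbitrary earlier snapshot.
print(time_travel_query("training_images", version=3))
```

Pinning a training run to a fixed version in this way is one approach to keeping model experiments reproducible as the underlying table changes.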
February 08, 2022

Best in the industry

Jonatan Bouchard | TrustRadius Reviewer
Score 9 out of 10
This product is used for data science project development: from data analysis and wrangling to feature creation, training, fine-tuning, model testing and validation, and finally deployment. While Databricks is used by many users, we also use GitHub and code Q/A to promote code to production. One of the advantages of Databricks is its integration: not only with Git, but whether you use it on Azure or AWS, you can also leverage the integrated machine learning of those platforms, such as AutoML or Azure ML.
  • Data science language agnostic (SQL, R, Python, PySpark, Scala)
  • Customer service with REAL support from data engineers and data scientists
  • Integration with many technologies: Tableau, Azure, AWS, Spark, etc.
  • Visualization
  • Collaboration
Currently the best data science tool for a large-scale company that needs strong tech support once in a while. The performance and the connectivity/integration with a large breadth of tools and platforms are also important when you don't want to change your whole stack. Databricks is a great non-drag-and-drop tool for real data scientists who know their stuff.
One of the best customer and technology support experiences I have ever had in my career. You get what you pay for, and you get the Rolls-Royce. It reminds me of SAS's customer support in the 2000s, when the tools were reaching some limits and their engineers wanted to know more about what we were doing, long before "data science" was even a name. Databricks truly embraces the partnership with its customers and helps them with any given challenge.
Score 8 out of 10
It is currently used by our Data and Product teams to perform deep-dive analyses of how our current metrics (KPIs, OKRs) are performing, and to develop metric-prediction tools based on data models, mixing languages such as SQL and Python. Results are made visible to the entire company as graphs in shared workspaces.
  • Cross company shared workspaces for unified comprehension of the data
  • Combining different languages such as SQL and Python in a single space for data analysis
  • Quick execution of highly complex queries
  • How graphs are created, it requires a certain level of expertise in the platform and it could be more intuitive and user friendly
  • More guidance on the basics, since some of the new users come from different platforms expecting a similar UI
  • An option where all the tables are shown with their respective fields, when a DB is selected for a query
I reckon it is an amazing platform for users with a certain level of expertise who design experiments and deliver deep-dive analyses that require executing highly complex queries. It is also very useful for cross-company shared workspaces that give a unified comprehension of the data.

It is less appropriate for users who don't have full knowledge of the tables they are going to query and need more support with the data, since the platform doesn't give an option to see a table's fields before querying it.
I have not needed support from the Databricks [Lakehouse Platform] team itself to date.
Score 9 out of 10
We currently use the Databricks Lakehouse Platform for a client. My team specifically uses it to data-mine, create reports and analytics for the client. Depending on where the data is stored, various Analytics teams in my company use different platforms - GCP, AWS, Databricks, etc.
  • Scheduling jobs to automate queries
  • User friendly - a new user can easily navigate through SQL/Python queries
  • Options to code in multiple languages (SQL, Python, Scala, R) and easy to switch with the use of the % operator
  • Errors can be difficult to understand at times
  • Session resets automatically at times, which leads to the temporary tables being wiped out from memory
  • Git connections are dicey
  • Very inconsistent with job success/failure notification emails
Databricks is great for beginners as well as advanced coders. The interface is extremely user-friendly and the learning curve is quite short. It is well suited for automation: we can have scripts running late at night when the load is lower and wake up to an email notification of success or failure. It is also well suited for writing code that requires multiple languages (in some cases of data modeling).

The ability to store temporary/permanent tables on data lakes is a fabulous feature as well. PySpark is an excellent language to learn and it works really fast with large datasets.
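
The `%` operator mentioned in this review refers to notebook magic commands such as `%sql` or `%scala`, placed on the first line of a cell to switch that cell's language. The sketch below is a simplified, hypothetical model of that dispatch in plain Python; it is not Databricks' implementation, and the function name and default language are assumptions.

```python
# Language magics named in the review (SQL, Python, Scala, R).
LANGUAGE_MAGICS = {"%python": "python", "%sql": "sql", "%scala": "scala", "%r": "r"}


def cell_language(cell_source, default="python"):
    """Return (language, code) for a cell, honoring a leading language magic."""
    first, _, rest = cell_source.lstrip().partition("\n")
    token = first.strip().split(" ", 1)
    magic = token[0].lower()
    if magic in LANGUAGE_MAGICS:
        # Code may follow the magic on the same line or on later lines.
        inline = token[1] if len(token) > 1 else ""
        return LANGUAGE_MAGICS[magic], (inline + "\n" + rest).strip()
    # No magic: the cell runs in the notebook's default language.
    return default, cell_source.strip()


print(cell_language("%sql\nSELECT COUNT(*) FROM events"))
print(cell_language("print('hello')"))
```

In this model a cell without a magic falls back to the notebook's default language, which is why mixing languages only needs a one-line prefix per cell.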
Stefan Panayotov | TrustRadius Reviewer
Score 10 out of 10
We build all our data pipelines with Databricks Lakehouse technology. It is reliable and the tech support from Databricks is very good.
  • Better performance through consolidating small files in delta tables
  • ACID functionality on delta tables
  • Live delta tables
  • Make it easier to test features in public preview, like delta live tables.
We can run data pipelines and use SQL Analytics to build dynamic dashboards for clients. The same platform can be used for running ML pipelines.
Score 10 out of 10
We use Databricks to replace traditional RDBMSs like Oracle. We have big batch ETL, ingestion, and extraction jobs for big data running across different products, where we leverage the Lakehouse platform to put our raw data in the data lake and create a Delta Lake platform based on high-performing Parquet.
It has been proposed for use across the whole organization and its different BUs. Databricks will be our key virtualized platform.
It provides very fast data ingestion and reduces the overall ETL window. It integrates different data sources and also helps machine learning jobs run and scale. The idea is to reduce overall computation time to save on on-prem costs.
  • Data Virtualization
  • Spark Real time and Batch streaming
  • Notebook to run Jobs
  • integrate Python and Apache Spark SQL
  • SQL Analytics
  • SQL Analytics Performance
  • Help migration for RDBMS sources
  • To make Transactional OLTP aspects faster
Delta Share, data virtualization, open data integration with other data sources, Parquet ingestion
July 12, 2021

Data for insights

Score 7 out of 10
[Databricks Lakehouse Platform (Unified Analytics Platform) is] used by a few departments to start off with data warehousing, SQL analytics, real-time monitoring, and data governance.
  • SQL
  • User friendly
  • Great development environment
  • Errors are not explained
  • No data back up feature
  • Interface can be more intuitive
[Databricks Lakehouse Platform (Unified Analytics Platform)] makes the power of Spark accessible, and Databricks's service is proactive and customer-centric. It is a highly adaptable solution for data engineering, data science, and AI. However, load times are not consistent, and there is no ability to restrict data access to specific users or groups.
Surendranatha Reddy Chappidi | TrustRadius Reviewer
Score 9 out of 10
The Databricks Lakehouse platform is used across all departments in my current organization.
It is used to solve different data engineering and data analytics use cases in different teams.
The Databricks Lakehouse platform provides seamless integration with the Azure cloud at Maersk. It uses Spark, MLOps, and Delta to solve current big data engineering problems.
  • Seamless integration with Azure cloud platform services like Azure Data Lake Storage, Blob Storage, Azure Data Factory, and Azure DevOps.
  • The Databricks Lakehouse platform uses Apache Spark in the backend, so computation is faster and distributed. It helps data pipelines process huge amounts [of] big data in less time and at low cost.
  • Databricks Lakehouse solves the problems of the data lake by introducing the Delta Lake concept. It provides support for updates, deletes, and schema evolution.
  • The Databricks Lakehouse platform could provide a better platform for managing and monitoring cluster performance and utilization, with optimization suggestions. That would help developers leverage those insights to build better data pipelines.
  • The Databricks Lakehouse platform could provide a GUI for creating Spark jobs by click, drag, and drop. That would significantly reduce the time needed to develop code.
  • The Databricks Lakehouse platform could provide better insights and details regarding job failures and resource consumption.
The Databricks Lakehouse platform is well suited for the following use cases:
1. Processing different types of data, such as structured, semi-structured, and unstructured data.
2. Processing data from different sources like RDBMSs, REST APIs, file servers, and IoT sensors.
3. Supporting updates, deletes, and schema evolution.

The Databricks Lakehouse platform is not well suited for the following use cases:
1. Data volume is small and there are no analytics requirements.
2. Developers don't have Spark and Hive skills.


Score 9 out of 10
We use Databricks Lakehouse Platform (Unified Analytics Platform) in our ETL process (data loading) to perform transformations and to implement the toughest loading strategies on huge datasets. It is very easy to understand and it can connect to almost all the modern data formats like Avro, Parquet, and JSON. It supports almost every popular cloud platform, like Azure and AWS, and offers better performance in terms of data processing speed.
  • Complex transformations
  • Supports major data sources
  • Great performance
  • User interface to connect data sources
  • Pricing
  • Community support
Databricks Lakehouse Platform (Unified Analytics Platform) can be used to process raw data from any system like IoT, structured, and unstructured data sources. Since it supports Pyspark and Scala to do data processing, it can do any complex business transformation very easily. Also, the Databricks Lakehouse Platform (Unified Analytics Platform) architecture is very similar to Big Data; it can process huge datasets from Hadoop systems and machine learning models in minutes.
Score 8 out of 10
We use Databricks Lakehouse Platform to transform IoT data and build data models for BI tools. It is being used by engineering and IT teams. We use it with a data lake platform, read the raw data and transform it to a suitable format for analytics tools. We run daily/hourly jobs to create BI models and save the resulting models back to data lake or SQL tables.
  • Ready-2-use Spark environment with zero configuration required
  • Interactive analysis with notebook-style coding
  • Variety of language options (R, Scala, Python, SQL, Java)
  • Scheduled jobs
  • Random task failures
  • Hard to debug code
  • Hard to profile code
It is great for both ad-hoc analyses and scheduled jobs. It supports most of the cloud storage technologies and provides an easy-to-use API to connect with them. Clusters can be auto-scaled with the load, and you can also create temporary clusters for job runs, which cost less than all-purpose clusters.
Score 9 out of 10
Data from APIs is streamed into our One Lake environment. This one lake is S3 on AWS.
Once this raw data is on S3, we use Databricks to write Spark SQL queries and pySpark to process this data into relational tables and views.

Then those views are used by our data scientists and modelers to generate business value in a lot of places, such as creating new models, creating new audit files, exports, etc.
  • Process raw data in One Lake (S3) env to relational tables and views
  • Share notebooks with our business analysts so that they can use the queries and generate value out of the data
  • Try out PySpark and Spark SQL queries on raw data before using them in our Spark jobs
  • Modern day ETL operations made easy using Databricks. Provide access mechanism for different set of customers
  • Databricks should come with a fine grained access control mechanism. If I have tables or views created then access mechanism should be able to restrict access to certain tables or columns based on the logged in user
  • There should be improved graphing and dashboarding provided from within Databricks
  • Better integration with AWS could help me code jobs in Databricks and run them in AWS EMR more easily using better devops pipelines
Databricks has helped my teams write PySpark and Spark SQL jobs and test them out before formally integrating them in Spark jobs. Through Databricks we can create Parquet and JSON output files. Data modelers and scientists who are not very good with coding can get good insight into the data using the notebooks that can be developed by the engineers.
Score 9 out of 10
It's being used for:

  • Ingestion and cleansing of data
  • Interactive Analysis of data
  • Development of Analytic Services
  • Production Environment Customer Facing Analytic Services
  • Collaborative Development Environment using Notebooks.
  • Stable and Secure Cloud Development Environment requiring minimum DevOps support
  • Fast with excellent scalability reduces time to market
  • Open source library support
  • Automation of Machine Learning Development
  • Optimization of GPU usage
Great end to end analytics solution on AWS or Azure. Databricks continues to grow based on customer feedback. Just like everyone in the industry, they are focused on Machine Learning, but they also understand a complete solution is needed.
August 22, 2018

Databricks Review

Score 9 out of 10
We leverage Databricks (DB) to run Big Data workloads. Primarily we build a Jar and attach to DB. We do not leverage the notebooks except for prototyping.
  • Extremely Flexible in Data Scenarios
  • Fantastic Performance
  • DB is always updating the system so we can have latest features.
  • Better Localized Testing
  • When they were primarily OSS Spark, it was easier to test/manage releases than with the newer DB Runtime. Wish there was more configuration in the Runtime and less "pick a version."
  • Graphing Support went non-existent; when it was one of their compelling general engine.
  • DB generally fits 95% of what you need to do
  • Primarily the ability to transform data and or do ad-hoc DS work
Ann Le | TrustRadius Reviewer
Score 7 out of 10
I actually use Databricks for experiments and research for my master's program. I mostly use it to implement Python code and test the viability of the programs that I write. Many individuals in the Computer Information System department are using this software platform to implement programs. It is a good tool for us to learn [and] includes a community forum that is rather helpful if you are self-learning and have questions.
  • There is Databricks Community, which is a free version. It is available for beginners to have an easy start with a big data platform. It does not have every feature of the full version but is still adequate for extremely new coders.
  • There are many resourceful training elements that are available to developers, data scientists, data engineers and other IT professionals to learn Apache Spark.
  • The navigation through which one would create a workspace is a bit confusing at first. It takes a couple minutes to figure out how to create a folder and upload files since it is not the same as traditional file systems such as box.com
  • Also, when you create a table, if you forget to copy the link where the table is stored, it is hard to relocate it. Most of the time I would have to delete the table and re-create it.
Right now, I am learning about Spark ML and general machine learning concepts. It is a good practice space to run different Spark ML code. Databricks provides valid errors and detailed descriptions of where I can fix my code, and running a set of code is easy. If you do not know how to code, it might be less appropriate to use Databricks. But then again, they do have a large community where help can be found.
September 15, 2017

Databricks Review

Score 6 out of 10
Across whole organization.

[It's] Used by self-service analysts to quickly do analysis
  • Very simplified infrastructure initialization
  • Seamless and automated optimization of job execution
  • Simple tool to get used to
  • Visualization - Great area of improvement
  • Integration with Git
  • COST
When you have analysts that are not cloud-savvy, this tool helps them quickly run code and not be overwhelmed by infrastructure and optimization. [It's] Less appropriate in production deployments.