TrustRadius Insights for Databricks Data Intelligence Platform are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Pros
User-Friendly SQL: Users have found the SQL in Databricks to be user-friendly, allowing them to easily write and execute queries. Several reviewers have praised the intuitive nature of the SQL interface, making it accessible for users of different skill levels.
Enhanced Collaboration: The enhanced collaboration between data science and data engineering teams is seen as a positive feature by many users. They appreciate how Databricks facilitates seamless communication and knowledge sharing among team members, ultimately leading to improved productivity and efficiency.
Versatile Integration: The integration with multiple Git providers and the merge assistant is highly valued by users. This feature allows for smooth version control and simplifies the collaborative development process. With this capability, developers can easily manage their codebase, track changes, resolve conflicts, and ensure a streamlined workflow.
Databricks Data Intelligence Platform Reviews
6 Reviews
Professional, Scientific, and Technical Services: Information Technology & Services (5), Marketing & Advertising (1)
I use Databricks every day. We use multiple environments, such as dev, stage, and prod. Our primary use case is to get data from SnapLogic into the Bronze layer of Databricks. So, it handles both full and incremental loads as per the use case. Our workflows also support versioning across releases. This is really good for minimizing prod risks. We also perform many transformations on silver tables for end-use data products in the gold layer.
Pros
First, it handles large amounts of data. We run daily and weekly jobs that process a lot of records. Databricks manages it very well, with no issues, if the cluster is set up properly.
Second, it really works well for incremental updates. We load only new or changed data, which makes it easy to update existing tables without duplicating records.
Third, job scheduling is useful. We can schedule the jobs easily and monitor them. The best part is that we can retry or repair the failed runs.
The last one is about the notebook interface that I really love. It makes development and debugging easy. We can test logic step by step, validate data, and fix all our issues.
Cons
Sometimes, when multiple jobs depend on each other in different environments, it is not always easy to see the full workflow in one place.
It is sometimes difficult to determine which job or cluster contributes more to the overall cost.
For beginners, cluster configuration may be a little difficult, so more in-platform recommendations would help.
Likelihood to Recommend
Table merges: When we have to update existing tables with new records, Databricks makes it very simple and also reliable. The notebook environment helps multiple team members work together, test logic, and debug issues very quickly. It also works well when we need separate environments for dev and prod. Jobs can be tested safely in dev before moving to prod.
Verified User
Engineer in Information Technology (Information Technology & Services company, 201-500 employees)
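The incremental loads and table merges described in the review above follow an upsert pattern: rows whose key already exists are updated, and new keys are inserted, so reloading the same batch does not duplicate records. The following is a minimal sketch of that logic in plain Python, not the Databricks or Delta Lake API; `merge_rows`, the `id` key, and the sample rows are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def merge_rows(target: List[Row], updates: Iterable[Row], key: str = "id") -> List[Row]:
    """Upsert: update rows whose key already exists, insert the rest."""
    # Index the existing rows by key (copies, so the input is not mutated).
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        # A matching key overwrites the changed columns; a new key is inserted.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical sample data: one changed record (id 2) and one new record (id 3).
existing = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
incoming = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]
result = merge_rows(existing, incoming)
```

Applying the same `incoming` batch a second time leaves `result` unchanged, which is the property that makes incremental reloads safe to retry.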
Databricks is used to run all of our Spark-related data engineering, data science, and analytics jobs, including the feature engineering and model training jobs for data science. Databricks SQL Warehouse is also used as the main compute for serving analytics, powering queries that take between 10 and 60 minutes.
Pros
Databricks provides an amazing notebook which serves as the backbone of unified analytics for all things data engineering, data science, and analytics.
The performance of Databricks SQL Warehouse is top notch and very hard for many competitors to beat, with amazing performance optimization for the Delta file format.
Databricks' support for the Iceberg format is also very good, which avoids vendor lock-in to the Delta format.
Cons
Unity Catalog is restrictive in terms of policy definition due to user-defined functions, compared to market alternatives like Apache Ranger or Open Policy Agent.
Unity Catalog has a table quota limitation: a schema with a large number of tables requires increasing the table quota repeatedly, which is not an issue if you use Hive Metastore.
SQL Warehouse as a solution is amazing and performant, but there should be some support for adding Spark plugins or extensions, so that there is room for changes or custom dependencies.
Likelihood to Recommend
For a new team with a small number of users, Databricks stands out because of its unified analytics: it provides everything from data ingestion to data lake ETL to data consumption, with a modern notebook. The Databricks GPU and Spark runtimes are also very well maintained and regularly updated, so there is very little maintenance. If someone uses Databricks without Unity Catalog, they miss out on a lot of advanced features like serverless SQL Warehouse. Making Unity Catalog optional would have made it easier for non-Unity Catalog customers to use the full capabilities of Databricks and the newer features that currently require Unity Catalog.
In our organization, we use the Databricks Data Intelligence Platform as the main platform for building and managing data products. I use Databricks to create notebooks for data transformation and data products, push them to project repositories, and schedule jobs using Databricks Workflows. We also create business data products as Delta tables in Unity Catalog. It helps with big data manipulation and handling and with job management.
Pros
Large Data Processing: It handles large volumes of data efficiently, using Spark to transform data for business purposes, and runs the jobs smoothly using clusters and workers.
Notebook-Oriented Development: Databricks notebooks make developing data transformations easy and flexible using SQL, Python, and R. They also help in testing the logic before deploying.
Data Governance: It provides data governance through Unity Catalog for managing permissions and security.
Cons
Job Monitoring and Alerts: A better visual dashboard for tracking pipeline dependencies and failures would improve visibility.
Serverless limitation: SQL variables set using 'set' in notebooks are limited in serverless and can't be initialized. This could be improved.
Likelihood to Recommend
Based on my experience, the Databricks Platform is well suited for huge-scale data processing and building end-to-end data pipelines. It works very well for developing complete data products using the bronze, silver, and gold architecture. It is also well suited for faster development with the help of the integrated AI assistant. The notebook environment allows us to quickly develop, test, and deploy.
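The bronze, silver, and gold layering mentioned above can be sketched in plain Python. This assumes the common convention that bronze holds raw records, silver holds cleaned and deduplicated records, and gold holds aggregated data products; the review does not spell out its own layer definitions. The function names, column names, and sample orders are hypothetical, and a real pipeline would use Spark tables instead of lists.

```python
from typing import Dict, List, Optional

RawRecord = Dict[str, Optional[str]]


def to_silver(bronze: List[RawRecord]) -> List[dict]:
    """Clean raw records: drop rows without a key, deduplicate, normalize types."""
    seen = set()
    silver = []
    for rec in bronze:
        order_id = rec.get("order_id")
        if not order_id or order_id in seen:
            continue  # skip rows with a missing key and repeated keys
        seen.add(order_id)
        silver.append({
            "order_id": int(order_id),
            "customer": (rec.get("customer") or "").strip().lower(),
            "amount": float(rec.get("amount") or 0),
        })
    return silver


def to_gold(silver: List[dict]) -> Dict[str, float]:
    """Aggregate cleaned records into a per-customer total."""
    totals: Dict[str, float] = {}
    for rec in silver:
        totals[rec["customer"]] = totals.get(rec["customer"], 0.0) + rec["amount"]
    return totals


# Hypothetical raw input, including a duplicate and a row with no key.
bronze = [
    {"order_id": "1", "customer": " alice ", "amount": "10.5"},
    {"order_id": "1", "customer": " alice ", "amount": "10.5"},
    {"order_id": None, "customer": "carol", "amount": "7"},
    {"order_id": "2", "customer": "Bob", "amount": "4"},
    {"order_id": "3", "customer": "Alice", "amount": "2.5"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping each layer as a separate step means the cleaning rules can be tested on their own before the aggregated product is rebuilt.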
I use Databricks Lakehouse Platform in my Data Science & AI consulting company to help various business entities with data-driven solutions. The platform can handle large and complex data sets and enables us to build and deploy applications using the latest technologies. The openness of Databricks allows us to seamlessly integrate with and adapt to our clients' requirements:
* Creating dashboards with Tableau, Redash, Qlik
* Feeding their CRM tools like Salesforce, SAP
* Developing chatbots for Knowledge Management
* Serving ML models behind API endpoints
Databricks Lakehouse Platform is a versatile and open product that saves us a lot of time and helps us control cloud costs and the energy our people spend.
Pros
Enhanced Data Science & Data Engineering collaboration
Multiple Git providers integration with merge assistant
Cons
VS Code IDE support for local development
Python SDK for Workflows
Poetry support
Likelihood to Recommend
Databricks shines when you are working with a growing team spanning multiple data professions. By providing an easy-to-instantiate common workspace for Data Engineers, Data Scientists, ML Engineers, and Data Analysts, fully integrated with Active Directory security, it makes your data projects more likely to go to production. There is no need to switch between tools or transfer data: Unity Catalog centralizes all the assets, and all your data citizens can find them in a second and benefit from the Spark engine whatever language they use.
It would be less appropriate for very small data projects, as the entry cost may be high. Yet, if the data is meant to grow, Databricks will scale horizontally without requiring a rewrite of your codebase.
We currently use the Databricks Lakehouse Platform for a client. My team specifically uses it to data-mine, create reports and analytics for the client. Depending on where the data is stored, various Analytics teams in my company use different platforms - GCP, AWS, Databricks, etc.
Pros
Scheduling jobs to automate queries
User friendly - a new user can easily navigate through SQL/Python queries
Options to code in multiple languages (SQL, Python, Scala, R) and easy to switch with the use of the % operator
Cons
Errors can be difficult to understand at times
Session resets automatically at times, which leads to the temporary tables being wiped out from memory
Git connections are dicey
Very inconsistent with job success/failure notification emails
Likelihood to Recommend
Databricks is great for beginners as well as advanced coders. The interface is extremely user-friendly and the learning curve is quite short. It is well suited for automation, where we can have scripts running late at night when the load is lower and wake up to an email notification of success or failure. It is also well suited for writing code that requires multiple languages (in some cases of data modeling).
The ability to store temporary/permanent tables on data lakes is a fabulous feature as well. PySpark is an excellent language to learn and it works really fast with large datasets.
Verified User
Analyst in Marketing (Marketing & Advertising company, 5001-10,000 employees)
We use Databricks Lakehouse Platform (Unified Analytics Platform) in our ETL process (data loading) to perform transformations and to implement the toughest loading strategies on huge datasets. It is very easy to understand and it can connect to almost all the modern data formats like Avro, Parquet, and JSON. It supports almost every popular cloud platform, like Azure and AWS, and offers better performance in terms of data processing speed.
Pros
Complex transformations
Supports major data sources
Great performance
Cons
User interface to connect data sources
Pricing
Community support
Likelihood to Recommend
Databricks Lakehouse Platform (Unified Analytics Platform) can be used to process raw data from any system, such as IoT, structured, and unstructured data sources. Since it supports PySpark and Scala for data processing, it can handle any complex business transformation very easily. Also, the Databricks Lakehouse Platform (Unified Analytics Platform) architecture is very similar to Big Data; it can process huge datasets from Hadoop systems and machine learning models in minutes.
Verified User
Professional in Information Technology (Information Technology & Services company, 201-500 employees)