Likelihood to Recommend
If you need a managed big data megastore with native integration with a highly optimized engine and with MLflow, go for Databricks Lakehouse Platform. The platform is a breeze to use, and analytics capabilities are supported out of the box. You will find it a bit difficult to manage code in notebooks, but you will get used to it soon.
If you can load your data first into your warehouse, dbt is excellent. It does the T(ransformation) part of ELT brilliantly but does not do the E(xtract) or L(oad) part. If you or your development team know SQL, dbt is a framework and extension around that, so it's easy to learn and easy to hire people with that skill (as opposed to specific Informatica, etc. experience). dbt uses plain text files and integrates with GitHub, so you can easily see the changes made between versions; in GUI-based tools it was always hard to tell what someone had changed. Each "model" is essentially a "SELECT" statement. You never need to do a "CREATE TABLE" or "CREATE VIEW" - it's all done for you, leaving you to work on the business logic. Instead of saying "FROM specific_db.schema.table" you indicate "FROM {{ ref('my_other_model') }}". dbt builds an internal dependency graph (a DAG) that you can view. When you deploy, the dependencies work like magic in your various environments. They also have great documentation, an active Slack community, training, and support. I like the enhancements they have been making and I believe they are headed in a good direction.
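To make the "ref" idea above concrete, here is a minimal Python sketch of how references between models can be turned into a dependency graph and a build order. It is an illustration only, not dbt's actual implementation: the model names, the `analytics_dev` schema, and the helper functions `find_refs`, `build_order`, and `compile_model` are all hypothetical, and real dbt resolves references through its Jinja-based compiler rather than a regular expression.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical models: name -> SELECT statement. Downstream models point at
# upstream ones with {{ ref('...') }} instead of a hard-coded table name.
MODELS = {
    "stg_orders": "SELECT id, customer_id, amount FROM raw.shop.orders",
    "stg_customers": "SELECT id, name FROM raw.shop.customers",
    "customer_revenue": (
        "SELECT c.name, SUM(o.amount) AS revenue "
        "FROM {{ ref('stg_orders') }} AS o "
        "JOIN {{ ref('stg_customers') }} AS c ON c.id = o.customer_id "
        "GROUP BY c.name"
    ),
}

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(sql):
    """Return the set of model names a SELECT statement refers to."""
    return set(REF_PATTERN.findall(sql))


def build_order(models):
    """Order models so that every model comes after the models it refs."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def compile_model(sql, schema):
    """Swap each ref for a schema-qualified name in the target environment."""
    return REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", sql)


if __name__ == "__main__":
    for name in build_order(MODELS):
        print(name, "->", compile_model(MODELS[name], "analytics_dev"))
```

Because the target schema is filled in only at compile time, the same model files can be built in a development or production environment without editing the SQL, which is the effect the review describes as dependencies working "like magic".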
Pros
Process raw data in One Lake (S3) env to relational tables and views
Share notebooks with our business analysts so that they can use the queries and generate value out of the data
Try out PySpark and Spark SQL queries on raw data before using them in our Spark jobs
Modern day ETL operations made easy using Databricks.
Provide access mechanism for different sets of customers

User experience makes it easy to work with SQL and version control
Customer success team and the dbt (data build tool) community help establish best practices
Thorough and clear documentation

Cons
Connect my local code in Visual Studio Code to my Databricks Lakehouse Platform cluster so I can run the code on the cluster. The old databricks-connect approach has many bugs and is hard to set up. The new Databricks Lakehouse Platform extension for Visual Studio Code doesn't allow developers to debug their code line by line (we can only run the code).
Maybe have a specific Databricks Lakehouse Platform IDE that can be used by Databricks Lakehouse Platform users to develop locally.
Visualization in MLflow experiments can be enhanced

Slow load times of the dbt Cloud environment (they're working on it via a new UI though)
More out-of-the-box solutions for managing procedures, functions, etc. would be nice to have, but honestly, it's pretty easy to figure out how to adapt dbt macros

Usability
Because it is an amazing platform for designing experiments and delivering deep-dive analyses that require highly complex queries, and it lets you share information and insights across the company through shared workspaces while keeping them secure.
In terms of graph generation and interaction, the UI and UX could be improved.
Support Rating
Some of the best customer and technology support that I have ever experienced in my career. You pay for what you get and you get the Rolls Royce. It reminds me of the customer support of SAS in the 2000s, when the tools were reaching some limits and their engineers wanted to know more about what we were doing, long before "data science" was even a name. Databricks truly embraces the partnership with its customers and helps them with any given challenge.
Alternatives Considered
Databricks has an edge over Synapse in a hundred different ways. Databricks has the Photon engine and gets new releases in the cloud faster, and it does not run on the open-source Spark version, so it can achieve better optimization, better performance, better agility, and all kinds of performance boosts compared with the open-source Spark that Synapse uses.
Most ETL pipeline products have a T layer, but dbt just does it better. The transformation is on steroids compared to the others. It also allows many more ad hoc solutions for very specific projects. Those ETL tools are probably better on the T part if you don't need too many transforms. dbt is also pretty much free depending on how you work it, and extremely scalable.
Contract Terms and Pricing Model
The problem with this tool, and all the others at the top of the industry, is that it's so expensive that as soon as another one comes on the market and delivers the same or different value, it will be catastrophic for them. So you get the sense that they are cashing in every dime right now, like SAS or Hadoop once did. Now, look at them.
Professional Services
Again, another level of professional services. This is not their biggest strength, but it is the cherry on top. I can't think of any other professional services like this one. I'm talking about meaningful services that really help our project and delivery.
Return on Investment
The ability to spin up a big data platform with little infrastructure overhead allows us to focus on business value, not admin.
Databricks can terminate or time out instances, which helps manage cost.
The ability to quickly access data scenarios that are typically hard to build is a strength.

Simplified our BI layer for faster load times
Increased the quality of data reaching our end users
Makes complex transformations manageable