Azure Databricks is used for data analytics, modelling, and AI/ML use cases in our analytics architecture. Analytics modelling for non-SAP sources like Partner Portal, Microsoft Dynamics CRM, and Oracle DB is done using Azure Databricks. The AI/ML use case for manufacturing defect support is also implemented using Azure Databricks in our organization.
Pros
Unity Catalog
Data Federation in Lakehouse Architecture
Integration of Mosaic AI in the SQL Layer
Cons
Data Orchestration limitations compared to Azure Data Factory
Limitations in Native Modelling Features
Integration with SAP sources needs SAP Datasphere
Likelihood to Recommend
Azure Databricks is well suited for scenarios where AI/ML use cases sit on top of the analytics models. The integration of LLM models with SQL is good with Azure Databricks. The costing model needs simplification. The integration of SAP source systems is not straightforward, which mandates the licensing cost for Datasphere in the architecture.
Verified User
Employee in Information Technology (10,001+ employees)
Azure Databricks is primarily used by our insight and analytics team for machine learning and reporting. We use Azure Databricks as our data lake feeding into Braze. This gives us all the data we need, which is very important for reporting and metrics on our customer base. It reduces data silos for us.
Pros
Stops data silos
Collaborative
Single workspace
Cons
Quite expensive
Simple tasks can be difficult
Hard to learn
Likelihood to Recommend
Centralised notebooks are put directly into production. This can lead to poorly engineered code. It is very good for fast queries, and our data team are always able to provide what we ask for. It is a big cost to our business, so it is important it runs efficiently and returns on our investment.
As a big data consultant, Azure Databricks is my favorite tool in the house!
One of the biggest problems with data consulting is the plethora of programming languages it deals in: SQL, Scala, R, Python, Java, etc.
That is exactly where Azure Databricks excels! It supports all of these languages in a single notebook, with equivalent performance for all! Club that with a visually pleasing UI, features that integrate the entire data lifecycle, and an architecture that gets the best out of Spark, and you have one of the best data tools in your hand!
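In a Databricks notebook, that per-cell language switch is done with magic commands. A rough sketch of what the reviewer describes (the `sales` table and cell contents are hypothetical, for illustration only):

```
-- Cell 1: notebook default language is SQL
SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

%python
# Cell 2: switch this cell to Python, same session and data
df = spark.table("sales")
display(df.groupBy("region").count())

%scala
// Cell 3: switch this cell to Scala, again over the same table
val df = spark.table("sales")
println(df.count())
```

All cells run against the same Spark session, which is what makes mixing languages in one notebook practical.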
Pros
Data Processing and Transformations based on Spark
Delta Lakehouse when clubbed with an external cloud storage
Governance using Unity Catalog to unify IAM
Delta Live Tables, although a relatively newer product, has great potential with its pipeline visuals.
Cons
The new UI is a bit clunky compared to the old UI. It also adds new elements in the sidebar which are not relevant to the workspace. Can be worked upon
Delta Live Tables, although powerful, has a lot of things that can be improved, including error debugging and support for newer features
Concurrent requests need some more optimisation and work in the Delta Lake tables
Likelihood to Recommend
Suppose you have multiple data sources and you want to bring the data into one place, transform it, and make it into a data model. Azure Databricks is a perfectly suited solution for this. Leverage Spark JDBC or any external cloud-based tool (ADF, AWS Glue) to bring the data into cloud storage. From there, Azure Databricks can handle everything. The data can be ingested by Azure Databricks into a three-layer architecture based on Delta Lake tables. The first layer, the raw layer, has the raw, as-is data from the source. The enrich layer acts as the cleaning and filtering layer, cleaning the data at an individual table level. The gold layer is the final layer, responsible for the data model, and acts as the serving layer for BI. For BI needs, if you need simple dashboards, you can leverage Azure Databricks to create them with a simple click! For complex dashboards, just like any SQL DB, you can hook it up with a simple JDBC string to any external BI tool.
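The three-layer flow described above can be sketched in miniature. This is a minimal illustration using sqlite3 as a stand-in for Delta Lake tables (on Databricks these would be Delta tables manipulated via Spark); all table and column names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Raw layer: as-is data from the source (note the bad negative row).
cur.execute("CREATE TABLE raw_orders (order_id, customer, amount)")
cur.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "acme", 100.0), (2, "acme", -5.0), (3, "globex", 40.0)],
)

# Enrich layer: clean and filter at an individual table level.
cur.execute(
    "CREATE TABLE enriched_orders AS "
    "SELECT order_id, customer, amount FROM raw_orders WHERE amount > 0"
)

# Gold layer: the aggregated data model served to BI.
cur.execute(
    "CREATE TABLE gold_revenue AS "
    "SELECT customer, SUM(amount) AS revenue "
    "FROM enriched_orders GROUP BY customer"
)

print(dict(cur.execute("SELECT customer, revenue FROM gold_revenue")))
# {'acme': 100.0, 'globex': 40.0}
```

Each layer is a table derived from the previous one, so a BI tool only ever reads the gold layer.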
We are leveraging Databricks capabilities in various use cases. For instance, to design a tailor-made change data capture process that keeps track of user account details and keeps them updated in Delta Lake. We have also designed numerous ETL processes which are scheduled to provide data to data analytics on strict delivery timelines. Moreover, the workspace is integrated with other Azure services such as Azure Synapse Analytics, Azure Data Lake, and Azure Data Factory. Some of our Databricks jobs are triggered by Azure Data Factory.
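The change-data-capture idea above boils down to an upsert: apply a batch of changed account rows so existing users are updated and new ones inserted. A rough sketch, again using sqlite3 as a stand-in (in Delta Lake this would typically be a `MERGE INTO` statement); names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, email TEXT)")
cur.execute("INSERT INTO accounts VALUES (1, 'old@example.com')")

# Incoming change feed: one update (user 1) and one new insert (user 2).
changes = [(1, "new@example.com"), (2, "second@example.com")]

# INSERT OR REPLACE keys on the primary key, so existing rows are
# replaced with the new values and unseen rows are inserted.
cur.executemany("INSERT OR REPLACE INTO accounts VALUES (?, ?)", changes)

print(sorted(cur.execute("SELECT user_id, email FROM accounts")))
# [(1, 'new@example.com'), (2, 'second@example.com')]
```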
Pros
Consistently great performance when dealing with huge-scale data, with the help of the Spark architecture
Magic commands for Spark SQL, PySpark, and Scala come in really handy in day-to-day work
Integration with other Azure services is super smooth and robust
Cons
Their pipeline workflow orchestration is pretty primitive. Lacks some common features
Workspace UI and navigation require a steep learning curve
Personally, I am not fond of their autosave feature. It's dangerous for production-level notebook scripts
Likelihood to Recommend
It works great for use cases where you want a more customized solution able to handle huge data volumes (cluster node power and Spark), if you want to migrate a native Spark solution to the cloud, or if you want to integrate your existing Azure data services together.
We use Databricks to pull performance metrics for our content hosted on the company website. Having one tool to view and analyze the data has been a game changer for us, saving many hours previously spent collecting the data from various sources.
Pros
SQL
Data management
Data access
Cons
Intuitive interface
Ease of use
Providing FAQ or QRGs
Likelihood to Recommend
Having access to all databases and tables in one place is what has helped me and my team function better. The in-built functionality and access to SQL and Python is definitely an added bonus! The icing on the cake is the ability to export your data into an Excel spreadsheet for additional analysis. If you have little to no working knowledge of SQL or Python, it's better to look at alternatives.
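The "run SQL, then export to a spreadsheet" workflow mentioned above can be sketched with the standard library. sqlite3 stands in for a Databricks SQL query, the table is hypothetical, and the resulting CSV opens directly in Excel:

```python
import csv
import io
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
con.executemany("INSERT INTO page_views VALUES (?, ?)",
                [("home", 120), ("docs", 45)])

# Run the analysis query...
cur = con.execute("SELECT page, views FROM page_views ORDER BY views DESC")

# ...then write a header row plus the result rows as CSV for Excel.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([col[0] for col in cur.description])  # column names
writer.writerows(cur)

print(buf.getvalue())
```

In practice you would write to a real file (or use Databricks' built-in result download button), but the shape of the export is the same.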