Databricks for modern-day ETL
January 31, 2019

Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Databricks Unified Analytics Platform

Data from APIs is streamed into our One Lake environment, which is S3 on AWS.
Once the raw data is in S3, we use Databricks to write Spark SQL and PySpark code that processes it into relational tables and views.

Those views are then used by our data scientists and modelers to generate business value in many places, such as building new models, creating new audit files, and producing exports.
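To illustrate the kind of notebook code this involves, here is a minimal PySpark sketch of the raw-to-relational step; the S3 path, column names, and table/view names below are placeholders rather than our actual ones:

    # Runs in a Databricks notebook, where `spark` is the predefined SparkSession.
    from pyspark.sql import functions as F

    # Read raw API data that has landed in the One Lake (S3) bucket (placeholder path).
    raw_events = spark.read.json("s3://one-lake-raw/api_events/")

    # Light cleanup and typing before exposing the data relationally
    # (event_id and event_ts are hypothetical columns).
    events = (
        raw_events
        .withColumn("event_ts", F.to_timestamp("event_ts"))
        .dropDuplicates(["event_id"])
    )

    # Persist as a table and expose a curated view for the data scientists and modelers.
    spark.sql("CREATE DATABASE IF NOT EXISTS curated")
    events.write.mode("overwrite").saveAsTable("curated.api_events")
    spark.sql("""
        CREATE OR REPLACE VIEW curated.daily_event_counts AS
        SELECT DATE(event_ts) AS event_date, COUNT(*) AS events
        FROM curated.api_events
        GROUP BY DATE(event_ts)
    """)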
  • Process raw data in the One Lake (S3) environment into relational tables and views
  • Share notebooks with our business analysts so that they can use the queries and generate value out of the data
  • Try out PySpark and Spark SQL queries on raw data before using them in our Spark jobs
  • Databricks makes modern-day ETL operations easy and provides access mechanisms for different sets of customers
  • Databricks should come with a fine-grained access control mechanism. If I have created tables or views, the access mechanism should be able to restrict access to certain tables or columns based on the logged-in user
  • Improved graphing and dashboarding should be provided from within Databricks
  • Better integration with AWS would make it easier to develop jobs in Databricks and run them on AWS EMR through better DevOps pipelines
  • ROI for us has been tremendous. Time to market for processing raw data in our big data infrastructure has been very fast.
  • Non-engineers can easily use Databricks, which helps our business customers.
  • Thousands of different data combinations can easily be joined and used by our data teams.
Databricks was picked over other competitors. The closest competition in our organization was H2O.ai, and in our internal research Databricks came out ahead on ROI and time to market.
We could have used AWS products; however, Databricks notebooks and the ability to launch clusters directly from notebooks were seen as very helpful for non-technical users.
Databricks has been very useful in my organization for shared notebooks, integrated data pipeline automation, and data source integrations. Integration with AWS is seamless, and non-technical users can easily learn how to use Databricks.
You can connect your company LDAP to it for login-based access controls, to some extent.
Databricks has helped my teams write and test PySpark and Spark SQL code before formally integrating it into Spark jobs. Through Databricks we can create Parquet and JSON output files. Data modelers and scientists who are not strong coders can get good insight into the data using notebooks developed by the engineers.
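As a rough sketch of that test-then-export pattern, the following could run in a notebook before the query is promoted into a scheduled Spark job; the view name and S3 output paths are placeholders:

    # `spark` is the SparkSession predefined in Databricks notebooks.
    # Try the Spark SQL query interactively first (curated.transactions is a placeholder view).
    result = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_spend
        FROM curated.transactions
        GROUP BY customer_id
    """)

    # Sanity-check the output before the query goes into a formal Spark job.
    result.show(10)

    # Export the same result as Parquet and JSON for downstream consumers (placeholder paths).
    result.write.mode("overwrite").parquet("s3://one-lake-exports/total_spend/parquet/")
    result.write.mode("overwrite").json("s3://one-lake-exports/total_spend/json/")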
