Databricks for modern day ETL
Anonymous | TrustRadius Reviewer
January 31, 2019

Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Databricks Unified Analytics Platform

Data from APIs is streamed into our One Lake environment, which is S3 on AWS.
Once this raw data is on S3, we use Databricks to write Spark SQL queries and PySpark code that process it into relational tables and views.

Our data scientists and modelers then use those views to generate business value in a lot of places, such as building new models, creating new audit files, and producing exports.
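The raw-data-to-views flow described above can be sketched locally. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for Spark SQL so that it runs anywhere, and the record layout, table name (orders) and view name (revenue_by_region) are hypothetical, not taken from our actual pipeline.

```python
import json
import sqlite3

# Hypothetical raw API payloads, standing in for JSON objects landed on S3.
RAW_RECORDS = [
    '{"order_id": 1, "customer": {"id": "c1", "region": "east"}, "amount": 120.0}',
    '{"order_id": 2, "customer": {"id": "c2", "region": "west"}, "amount": 80.5}',
    '{"order_id": 3, "customer": {"id": "c1", "region": "east"}, "amount": 42.0}',
]


def flatten(raw_line):
    """Turn one nested JSON record into a flat (relational) row tuple."""
    rec = json.loads(raw_line)
    customer = rec["customer"]
    return (rec["order_id"], customer["id"], customer["region"], rec["amount"])


def build_tables(conn, raw_lines):
    """Load flattened rows into a table and define a view on top of it."""
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer_id TEXT, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [flatten(line) for line in raw_lines],
    )
    # The view is what analysts and modelers would query, not the raw data.
    conn.execute(
        "CREATE VIEW revenue_by_region AS "
        "SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count "
        "FROM orders GROUP BY region"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
build_tables(conn, RAW_RECORDS)
rows = conn.execute(
    "SELECT region, total_amount, order_count FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)
```

In a Databricks notebook the same shape would presumably be expressed with a Spark DataFrame read from S3 and a view defined in Spark SQL; the SQLite version here only shows the idea of flattening nested records once and exposing an aggregated view to downstream users.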
  • Process raw data in the One Lake (S3) environment into relational tables and views
  • Share notebooks with our business analysts so that they can use the queries and generate value out of the data
  • Try out PySpark and Spark SQL queries on raw data before using them in our Spark jobs
  • Databricks makes modern-day ETL operations easy and provides an access mechanism for different sets of customers
  • Databricks should come with a fine-grained access control mechanism. If I have created tables or views, the access mechanism should be able to restrict access to certain tables or columns based on the logged-in user
  • Graphing and dashboarding from within Databricks should be improved
  • Better integration with AWS would help me code jobs in Databricks and run them on AWS EMR more easily through better DevOps pipelines
  • ROI for us has been tremendous. Processing raw data in our big data infrastructure has made our time to market pretty fast.
  • Non-engineers can easily use Databricks, which helps our business customers.
  • Thousands of different data combinations can easily be joined and used by our data teams.
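The per-user table and column restriction asked for in the list above could look roughly like the following. This is a hypothetical sketch of the idea, not a Databricks feature or API: the policy dictionary, the user names and the select_for_user helper are all invented for illustration.

```python
# Hypothetical policy: the columns each user may read from each table or view.
COLUMN_POLICY = {
    "analyst": {"orders": ["order_id", "region", "amount"]},
    "auditor": {"orders": ["order_id", "customer_id", "region", "amount"]},
}


def select_for_user(user, table, policy=COLUMN_POLICY):
    """Build a SELECT that exposes only the columns the user is allowed to see.

    Table and column names come from the trusted policy, never from user input,
    so they are safe to place in the SQL text.
    """
    allowed = policy.get(user, {}).get(table)
    if not allowed:
        raise PermissionError(f"{user!r} has no access to {table!r}")
    return f"SELECT {', '.join(allowed)} FROM {table}"


print(select_for_user("analyst", "orders"))
print(select_for_user("auditor", "orders"))
```

Here the analyst never sees customer_id, while the auditor does, and a user missing from the policy gets no access at all. A real mechanism would have to be enforced by the platform itself rather than by query-building code like this.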
Databricks was picked over its competitors. The closest competition in our organization was H2O.ai, and Databricks came out ahead on ROI and time to market in our internal research.
We could have used AWS products; however, Databricks notebooks and the ability to launch clusters directly from them were seen as very helpful for non-technical users.
Databricks has been very useful in my organization for shared notebooks, integrated data pipeline automation, and data source integrations. Integration with AWS is seamless. Non-technical users can easily learn how to use Databricks.
You can connect your company LDAP to it for login-based access control, to some extent.
Databricks has helped my teams write PySpark and Spark SQL jobs and test them out before formally integrating them into Spark jobs. Through Databricks we can create Parquet and JSON output files. Data modelers and scientists who are not very good with coding can get good insight into the data using notebooks developed by the engineers.

Databricks Unified Analytics Platform Feature Ratings

Connect to Multiple Data Sources: 9
Extend Existing Data Sources: 9
Automatic Data Format Detection: 7
Visualization: 6
Interactive Data Analysis: 6
Interactive Data Cleaning and Enrichment: 8
Data Transformations: 9
Data Encryption: 7
Built-in Processors: 8
Multiple Model Development Languages and Tools: 9
Automated Machine Learning: 8
Single platform for multiple model development: 9
Self-Service Model Delivery: 7
Flexible Model Publishing Options: 7
Security, Governance, and Cost Controls: 8