Based on 43 reviews and ratings
Based on 24 reviews and ratings
Highlights
Databricks and Amazon EMR (Elastic MapReduce) are solutions for processing big data workloads. Both tend to be deployed at larger enterprises. Databricks handles data ingestion, data pipeline engineering, and ML/data science with its collaborative workbook for writing in R, Python, etc. Amazon EMR lets users rely on multiple open-source tools, such as Apache Spark, Apache Hive, HBase, or Presto, to integrate and process big data workloads more simply.
Features
Databricks and Amazon EMR boast distinct advantages for processing big data workloads.
Amazon EMR/Elastic MapReduce is described as ideal when managing big data housed in multiple open-source tools such as Apache Hadoop or Spark. Users state that relative to other big data processing tools it is simple to use, and AWS pricing is very simple and appealing compared to competitors. It is secure, scalable, and highly available for a cloud service.
Databricks is praised for its core competencies; users rate its data science notebook above alternatives (e.g. Jupyter Notebook) for enabling flexible and fast analysis on massive amounts of data while switching between SQL, R, Scala, and Python. Its open-source community documentation, available to all, is well regarded. And because the Databricks Community Edition is free and open-source, it is one of the relatively few options that presents a lower-cost solution than Amazon EMR, though only for the right users and use cases.
Limitations
Users remark on similar limitations when considering Databricks and Amazon EMR for big data.
Amazon EMR is not a fast processor and shines primarily where users need a simplified framework for managing data from multiple tools. Also, particularly when compared to Databricks, the Amazon workbook and its machine learning capabilities are not as mature.
The licensed edition of Databricks is costly, as is its certification. Additionally, Databricks can be hard to use for non-technical users, who say its in-app help is unclear and hard to use. A few also say Databricks lacks good visualizations for displaying work.
Pricing
Databricks is available open-source and free via its community edition, or through its Enterprise Cloud editions, on Azure or AWS. Pricing can be complex.
Azure Databricks “Databricks Units” (DBUs) are priced by workload type (Data Engineering, Data Engineering Light, or Data Analytics) and service tier (Standard or Premium). Premium adds authentication, access features, and audit logs. The Data Analytics workload is $.40 per DBU hour ($.55 Premium tier) and includes data prep and the data science notebook. The Data Engineering workload includes data pipeline and workload processing, for $.15 per DBU hour ($.30 Premium tier). Data Engineering Light is $.07 per DBU hour ($.22 Premium tier) and only allows users to run jobs.
Databricks on AWS is also priced based on service tier (Standard, Premium, Enterprise) and workload type. Higher service tiers add Optimized Autoscaling, role-based access, federated IAM, HIPAA-compliant storage, access lists for audit, and customer-managed keys. The Jobs Compute workload allows users to run data engineering pipelines and manage and clean data lakes (priced at $.07, $.10, and $.13 per DBU hour for the three service tiers, respectively). The All-Purpose Compute service ($.40, $.55, $.65) is fully featured.
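Because the Databricks charge depends on both workload type and service tier, a small lookup can make the arithmetic concrete. The following is only a rough sketch using the AWS per-DBU-hour rates quoted above; the function name `dbu_cost` and the `dbus_per_hour` input are hypothetical, the number of DBUs a cluster consumes per hour is assumed to be known by the user, and underlying cloud infrastructure charges are not included.

```python
# Per-DBU-hour rates for Databricks on AWS as quoted in this comparison
# (October 2020 figures; current prices may differ).
AWS_DBU_RATES = {
    "jobs_compute": {"standard": 0.07, "premium": 0.10, "enterprise": 0.13},
    "all_purpose_compute": {"standard": 0.40, "premium": 0.55, "enterprise": 0.65},
}


def dbu_cost(workload: str, tier: str, dbus_per_hour: float, hours: float) -> float:
    """Estimate the Databricks charge (hypothetical helper, DBU charge only).

    dbus_per_hour is an assumption supplied by the caller: how many DBUs
    the cluster consumes each hour.
    """
    rate = AWS_DBU_RATES[workload][tier]
    return rate * dbus_per_hour * hours


if __name__ == "__main__":
    # Example: a cluster assumed to consume 4 DBUs per hour, running for 10 hours.
    for tier in ("standard", "premium", "enterprise"):
        cost = dbu_cost("jobs_compute", tier, dbus_per_hour=4, hours=10)
        print(f"Jobs Compute, {tier}: ${cost:.2f}")
```

Running the same job on All-Purpose Compute instead of Jobs Compute multiplies the DBU charge several times over at every tier, which is one reason the pricing is described as complex.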
Amazon EMR is available from AWS and is billed per second of use, with a one-minute minimum. Its hourly rate depends on instance type (e.g. standard, high CPU, high memory, high storage), with current prices ranging from $0.011/hour to $0.27/hour. Amazon EMR is also available as an add-on service for Amazon EC2, and is available reserved, on-demand, or on lower-cost Spot Instances (i.e. AWS’s discounted service using EC2’s unused capacity). Pricing still falls within the range of $0.011 to $0.27 per hour.
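Per-second billing with a one-minute minimum can be illustrated with a short calculation. This is a minimal sketch, not AWS's billing logic: the function name `emr_charge` is hypothetical, rounding partial seconds up is an assumption, and the result covers only the EMR rate quoted above, not any separate EC2 instance charge.

```python
import math

MINIMUM_BILLABLE_SECONDS = 60  # one-minute minimum, as described above


def emr_charge(seconds_used: float, hourly_rate: float) -> float:
    """Estimate the EMR charge for one instance (hypothetical helper).

    Assumption: partial seconds are rounded up to the next whole second.
    """
    billable_seconds = max(MINIMUM_BILLABLE_SECONDS, math.ceil(seconds_used))
    return billable_seconds * hourly_rate / 3600


if __name__ == "__main__":
    # A 10-second job is still billed for 60 seconds.
    print(f"10 s at $0.27/hour:  ${emr_charge(10, 0.27):.4f}")
    # A 90-minute job at the lowest quoted rate.
    print(f"90 min at $0.011/hour: ${emr_charge(90 * 60, 0.011):.4f}")
```

For short-lived clusters the one-minute minimum is the only rounding that applies, so the charge tracks actual usage closely.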
Provided by the TrustRadius Research Team
Published on October 30, 2020
Likelihood to Recommend
Amazon EMR
Databricks Unified Analytics Platform

Feature Rating Comparison
Platform Connectivity
Connect to Multiple Data Sources
Extend Existing Data Sources
Automatic Data Format Detection
Data Exploration
Visualization
Interactive Data Analysis
Data Preparation
Interactive Data Cleaning and Enrichment
Data Transformations
Data Encryption
Built-in Processors
Platform Data Modeling
Multiple Model Development Languages and Tools
Automated Machine Learning
Single platform for multiple model development
Self-Service Model Delivery
Model Deployment
Flexible Model Publishing Options
Security, Governance, and Cost Controls
Pros
Amazon EMR
- Easier to implement than older on-premise solutions
- Works with open source technologies.
- Keeps processing cost low.
- It is flexible and also works for short-term workloads, with pricing that adapts to that model.
Databricks Unified Analytics Platform
- Extremely Flexible in Data Scenarios
- Fantastic Performance
- Databricks is always updating the system, so we have the latest features.

Cons
Amazon EMR
- Its machine learning capabilities could be more mature.
- The support material available online on Elastic MapReduce is limited and we might end up spending more time in understanding/researching the tool.

Databricks Unified Analytics Platform
- The navigation through which one would create a workspace is a bit confusing at first. It takes a couple minutes to figure out how to create a folder and upload files since it is not the same as traditional file systems such as box.com
- Also, when you create a table, if you forget to copy the link where the table is stored, it is hard to locate it again. Most of the time I would have to delete the table and re-create it.
Return on Investment
Amazon EMR
- It was obviously cheaper and more convenient to use, as most of our data processing and pipelines are on AWS. It was fast and readily available with a click, which saved a ton of time compared with having to plan around cluster downtime on premises.
- It saved time on processing chunks of big data that had to be processed in a short period at minimal cost. EMR solved this, as the cluster setup and processing were simple, easy, cheap, and fast.
- It had a negative impact in that submitting test jobs was very difficult, as it lacks a UI for submitting Spark code snippets.

Databricks Unified Analytics Platform
- Rapid growth of analytics within our company.
- Cost model aligns with usage allowing us to make a reasonable initial investment and scale the cost as we realize the value.
- Platform is easy to learn and Databricks provides excellent support and training.
- Platform does not require a large DevOps investment.
