Amazon EMR

24 Ratings
Score 8.3 out of 10

Databricks Unified Analytics Platform

10 Ratings
Score 8.3 out of 10


Likelihood to Recommend

Amazon EMR

If you don't have big data, i.e., petabytes of data with terabytes generated every day, then don't use Hadoop. Relational databases are enough for terabytes of data. Hadoop is also not well suited to transactional systems or data.

Databricks Unified Analytics Platform

Right now, I am learning about Spark ML and general machine learning concepts. Databricks is a good practice space for running different Spark ML code. It provides valid error messages and detailed descriptions of where I can fix my code, and it is easy to move around and run a set of code. If you do not know how to code, Databricks might be less appropriate. But then again, it has a large community where help can be found.
Ann Le profile photo

Pros

Amazon EMR

  • Distributed computing
  • Fault tolerant
  • Uptime

Databricks Unified Analytics Platform

  • Databricks Community Edition is a free version that gives beginners an easy start with a big data platform. It does not have every feature of the full version but is still adequate for brand-new coders.
  • Many training resources are available for developers, data scientists, data engineers, and other IT professionals to learn Apache Spark.

Cons

Amazon EMR

  • Cost overhead is a bit high
  • Limited versions of frameworks that can be used

Databricks Unified Analytics Platform

  • The navigation for creating a workspace is a bit confusing at first. It takes a couple of minutes to figure out how to create a folder and upload files, since it does not work like traditional file systems such as box.com.
  • Also, when you create a table, if you forget to copy the link where the table is stored, it is hard to find it again. Most of the time I would have to delete the table and re-create it.

Alternatives Considered

Amazon EMR

The alternatives to EMR are mainly Hadoop distributions owned by the 3 companies above. I have not used the other distributions, so it is difficult to comment, but the general tradeoff is that, at the cost of a longer setup time and more infrastructure management, you get more flexible versioning and potentially faster access to newer versions of some frameworks such as Spark.

Databricks Unified Analytics Platform

I also use Microsoft Azure Machine Learning in parallel with Databricks. They use different file formats, which teaches me to be flexible and to write different programs. They are equally useful to me, and I would like to master both platforms for future use. I do prefer Databricks because it could be free if I decided to use only the Databricks Community Edition.

Return on Investment

Amazon EMR

  • It was easy to set up initial versions of Spark on this
  • Still used as our compute platform as it's easy to manage
  • At times we forgot to shut down clusters and were overcharged

Databricks Unified Analytics Platform

  • Machine learning is a very new concept and not many universities teach it. My school and a few others have been using Databricks as one of the tools to teach and learn machine learning. By doing this, my university is creating a strong future workforce for the job market.
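
The EMR reviewer's note about forgetting to shut down clusters suggests a simple periodic check. The following is only a minimal sketch of that idea in plain Python: `ClusterRecord`, `find_idle_clusters`, the state strings, and the two-hour threshold are all hypothetical names and values chosen for illustration, not part of any AWS API. In practice the records would have to come from your own inventory or from the provider's tooling, which is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ClusterRecord:
    """Hypothetical snapshot of a cluster; not an AWS API type."""
    cluster_id: str
    state: str                # assumed labels, e.g. "RUNNING", "WAITING", "TERMINATED"
    last_activity: datetime   # when the cluster last finished any work


def find_idle_clusters(clusters, now, max_idle=timedelta(hours=2)):
    """Return IDs of clusters that are still up but idle longer than max_idle."""
    idle = []
    for c in clusters:
        if c.state == "TERMINATED":
            continue  # already shut down, so it is no longer accruing cost
        if now - c.last_activity > max_idle:
            idle.append(c.cluster_id)
    return idle


# Made-up example data, pinned to a fixed time so the result is repeatable.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
clusters = [
    ClusterRecord("c-busy", "RUNNING", now - timedelta(minutes=10)),
    ClusterRecord("c-forgotten", "WAITING", now - timedelta(hours=9)),
    ClusterRecord("c-done", "TERMINATED", now - timedelta(days=3)),
]
print(find_idle_clusters(clusters, now))  # prints ['c-forgotten']
```

A check like this only flags candidates. Whether to terminate a flagged cluster automatically or just send a reminder is a policy decision for the team.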

Pricing Details

Amazon EMR

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee? No
Additional Pricing Details

Databricks Unified Analytics Platform

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level set up fee? No
Additional Pricing Details