What users are saying about Apache Spark and Data Science Workbench
102 Ratings
Score 8.5 out of 10
11 Ratings
Score 7.6 out of 10


Likelihood to Recommend

Apache Spark

Spark is great as a workflow-processing and extract-transform layer tool. It is really good for machine learning, especially on large datasets that can be split across files and processed in parallel. Spark Streaming scales well for near-real-time data workflows. What it's not good for is processing smaller subsets of data.
Anson Abraham

Data Science Workbench

  • If you already have a Cloudera partnership and a cluster, having this is a no-brainer.
  • It integrates well with your existing ecosystem and lets you start working on projects immediately, accessing full datasets and sharing analysis and results.
  • With the inclusion of Kubernetes, CPU and memory across worker nodes can be managed effectively.
Bharadwaj (Brad) Chivukula

Pros

  • Ease of use: the Spark API requires minimal boilerplate and can be written in a variety of languages, including Python, Scala, and Java.
  • Performance: for most applications, we have found that jobs perform better on Spark than on other distributed processing technologies such as MapReduce, Hive, and Pig.
  • Flexibility: the framework supports streaming, batch processing, SQL queries, machine learning, etc. It can be used in a variety of applications without needing to integrate many other distributed processing technologies.
  • A single IDE (browser-based application) that integrates Scala, R, and Python under one tool
  • For larger organizations/teams, it lets you be self-reliant
  • As it sits on your cluster, it has very easy access to all the data on HDFS
  • Linking with GitHub is a very good way to keep code versions intact
Bharadwaj (Brad) Chivukula

Cons

  • More documentation and training should come with the application, especially for debugging, since the process is difficult to understand.
  • It should be more attentive to users and provide tutorials to reduce the learning curve.
  • There should be more grouping algorithms.
Carla Borges
  • Not as good as RStudio; it lacks some features by comparison
  • It is still quite simple (because it is very early in its development), and companies may want to wait until they see a more mature product
Bharadwaj (Brad) Chivukula

Alternatives Considered

I prefer Apache Spark to Hadoop, since in my experience Spark is more usable: it comes with simple APIs for Scala, Python, Java, and Spark SQL, and it gives interactive feedback on commands through a REPL. Apache Spark also seems to perform best when processing large datasets because it works in memory, so more processes can be loaded onto Spark than onto Hadoop, although Hadoop is also a very useful tool.
Carla Borges
Both tools have similar features and make installation, deployment, and use fairly easy. Which Workbench you pick depends on your existing platform (Cloudera vs. Azure). Another observation is that Cloudera has better support: you can get answers to your questions quickly (unlike MS). As it's a new product, I expect MS to become more efficient in handling customer questions.
Bharadwaj (Brad) Chivukula

Return on Investment

  • We were able to make batch jobs 20 times faster than with MapReduce
  • With support for languages like Scala, Java, and Python, it is easily manageable
  • As the tool itself can access all the HDFS and Spark data easily, the wait time between teams has been reduced
  • Installation was a breeze, and ramp-up time was fairly short
Bharadwaj (Brad) Chivukula

Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details

Data Science Workbench

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details