What users are saying about

Apache Spark

98 Ratings
Score 8.6 out of 10

Data Science Workbench

10 Ratings
Score 7.4 out of 10

Likelihood to Recommend

Apache Spark

Well suited:
  1. Data can be integrated from several sources, including clickstream, logs, and transactional systems.
  2. Real-time ingestion through Kafka, Kinesis, and other streaming platforms.

Data Science Workbench

  • If you already have a Cloudera partnership and a cluster, having this is a no-brainer.
  • It integrates well with your existing ecosystem, so you can immediately start working on projects, access full datasets, and share analyses and results.
  • With the inclusion of Kubernetes, CPU and memory across worker nodes can be managed effectively.
Bharadwaj (Brad) Chivukula

Pros

  • Spark uses Scala, a functional programming language that is easy to use. Its syntax is simple and human-readable.
  • It can run transformations on huge datasets in parallel across a cluster. It automatically optimizes the process to produce output efficiently and in less time.
  • It also provides a machine learning API for data science applications, as well as Spark SQL for fast queries in data analysis.
  • I also use the Zeppelin online tool for fast querying; it is very helpful for BI guys who need to visualize query outputs.
Kamesh Emani
  • A single browser-based IDE that integrates Scala, R, and Python in one tool
  • For larger organizations/teams, it lets you be self-reliant
  • Because it sits on your cluster, it has very easy access to all the data on HDFS
  • Linking with GitHub is a very good way to keep code versions intact
Bharadwaj (Brad) Chivukula

Cons

  • No true streaming.
  • Lack of strongly typed yet convenient APIs.

  • Not as great as RStudio; lacks some features when compared with it
  • It is still quite simple (because it's very early in its initiative), and companies may want to wait until they see a more developed product
Bharadwaj (Brad) Chivukula

Alternatives Considered

There are a few newer frameworks for general processing, like Flink and Beam; frameworks for streaming, like Samza and Storm; and traditional MapReduce. I think Spark is at a sweet spot: it's clearly better than MapReduce for many workflows, yet it has enough community support that there is little risk in deploying it. It also integrates batch and streaming workflows and APIs, giving you an all-in-one package for multiple use cases.

Both tools have similar features and have made it pretty easy to install, deploy, and use. Depending on your existing platform (Cloudera vs. Azure), you need to pick the Workbench. Another observation is that Cloudera has better support, where you can get feedback on your questions pretty fast (unlike MS). As it's a new product, I expect MS to be more efficient in handling customers' questions.
Bharadwaj (Brad) Chivukula

Return on Investment

  • Positive: we don't worry about scale.
  • Positive: large support community.
  • Negative: takes time to set up; overkill for many simpler workflows.

  • Because the tool itself can easily access all the HDFS and Spark data, the wait time between teams has been reduced
  • Installation was a breeze, and ramp-up was fairly easy
Bharadwaj (Brad) Chivukula

Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details

Data Science Workbench

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details