Apache Spark is great for high-volume production workflows
August 02, 2017

Score 10 out of 10
Vetted Review
Verified User
Overall Satisfaction with Apache Spark
We use it primarily in our department as part of a machine learning and data processing platform to build enterprise-scale predictive applications.
- Great APIs and tools.
- Scale.
- Speed for iterative algorithms.
- No true streaming (Spark Streaming processes data in micro-batches).
- Lack of strongly typed yet convenient APIs.
- Positive: we don't worry about scale.
- Positive: large support community.
- Negative: takes time to set up and is overkill for many simpler workflows.
There are a few newer frameworks for general processing, like Flink and Beam, frameworks for streaming, like Samza and Storm, and traditional MapReduce. I think Spark is at a sweet spot: it's clearly better than MapReduce for many workflows, and it has enough community support that there is little risk in deploying it. It also integrates batch and streaming workflows and APIs, providing an all-in-one package for multiple use cases.