Apache Spark is great for high volume production workflows
August 02, 2017

Anonymous | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

We use it primarily in our department as part of a machine learning and data processing platform to build enterprise-scale predictive applications.
  • Great APIs and tools.
  • Scale.
  • Speed for iterative algorithms.
  • No true streaming.
  • Lack of strongly typed yet convenient APIs.
  • Positive: we don't worry about scale.
  • Positive: large support community.
  • Negative: takes time to set up, and is overkill for many simpler workflows.
There are a few newer frameworks for general processing, like Flink and Beam; frameworks for streaming, like Samza and Storm; and traditional MapReduce. I think Spark is at a sweet spot: it's clearly better than MapReduce for many workflows, yet it has enough community support that there is little risk in deploying it. It also integrates batch and streaming workflows and APIs, offering an all-in-one package for multiple use cases.
Amazon Redshift, Amazon S3 (Simple Storage Service), Amazon Elastic Compute Cloud (EC2), Amazon Elastic MapReduce, Salesforce Analytics Cloud, Looker
Well suited for batch and near-real-time data processing tasks, as well as production deployments of machine learning, especially at large scale. Not well suited for general analytics workflows on small and medium-sized data sets; SQL-based data warehouses such as Redshift and Vertica are better for those use cases.