Item: Apache Spark
Rating: 10
Author: Verified User

Use Cases and Deployment Scope

We use Spark primarily in our department as part of a machine learning and data processing platform to build enterprise-scale predictive applications.

Pros and Cons

Great APIs and tools.
Scale.
Speed for iterative algorithms.

No true streaming.
Lack of strongly typed yet convenient APIs.

Return on Investment

Positive: we don't worry about scale.
Positive: large support community.
Negative: takes time to set up and is overkill for many simpler workflows.

Alternatives Considered

There are a few newer frameworks for general processing, such as Flink and Beam, frameworks for streaming, such as Samza and Storm, and traditional MapReduce. I think Spark is at a sweet spot: it is clearly better than MapReduce for many workflows, yet it has enough community support that there is little risk in deploying it. It also integrates batch and streaming workflows and APIs, providing an all-in-one package for multiple use cases.

Other Software Used

Amazon Redshift, Amazon S3 (Simple Storage Service), Amazon Elastic Compute Cloud (EC2), Amazon Elastic MapReduce, Salesforce Analytics Cloud, Looker

Likelihood to Recommend

Well suited for batch and near-real-time data processing tasks as well as production deployments of machine learning, especially at large scale. Not well suited for general analytics workflows on small and medium-sized data sets; SQL-based data warehouses such as Redshift and Vertica are better for those use cases.

Apache Spark is great for high-volume production workflows
