Item: Apache Spark
Rating: 9
Author: Shiv Shivakumar

Use Cases and Deployment Scope

We use Apache Spark as the in-memory data engine for big data analytics, streaming data, and SQL workloads. We are also in the process of trying it out for certain machine learning algorithms. It processes data for the analytical needs of the business and coexists well with the Hadoop file system.

Pros and Cons

In-memory data engine, and hence faster processing
Layers well on top of the Hadoop file system for big data analytics
Very good tool for streaming data

Could do a better job with analytics dashboards that provide insights on a data stream, so that users do not have to rely on separate data visualization tools alongside Spark
There is also room for improvement in the area of data discovery

Return on Investment

Overall positive impact on the business for analysis of big data using the Hadoop file system
Very well received by data scientists in the business despite its shortcomings in analytical dashboarding

Alternatives Considered

We evaluated SAS alongside Apache Spark, but during the proof of concept we found that Apache Spark supported the Hadoop ecosystem and the Hadoop file system much better. It was much faster at that time, processed data quickly for the analytical needs of the business, and also scaled up well.

Other Software Used

Apache Kafka, SAP HANA, Couchbase Data Platform

Likelihood to Recommend

Apache Spark is very well suited for big data analytics in conjunction with the Hadoop file system. It also provides fast access to data in SQL workloads, since its in-memory processing engine can process data very quickly. In addition, it can be used for streaming data processing.

Apache Spark - de facto standard for big data processing/analytics

Overall Satisfaction with Apache Spark