Overall Satisfaction with Apache Spark
We use Apache Spark as the in-memory data engine for big data analytics, streaming data, and SQL workloads. We are also in the process of trying it out for certain machine learning algorithms. It processes data for the analytical needs of the business and coexists well with the Hadoop file system.
- in-memory data engine and hence faster processing
- works well layered on top of the Hadoop file system for big data analytics
- very good tool for streaming data
- could do a better job with analytics dashboards that provide insights on a data stream, so that users do not have to rely on separate data visualization tools alongside Spark
- also there is room for improvement in the area of data discovery
- overall positive impact on the business for analysis of big data using the Hadoop file system
- very well received by data scientists in the business despite its shortcomings in analytical dashboarding
We evaluated SAS alongside Apache Spark, but during the proof of concept we found that Apache Spark supported the Hadoop ecosystem and the Hadoop file system much better. At that time it was also much faster, processed data quickly for the business's analytical needs, and scaled up well.