Apache Spark, the be all End All.
March 27, 2018

Apache Spark, the be all End All.

Anson Abraham | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

Spark was/is being used in myriad of ways. With Kafka, using Spark Streams to grab data from kafka queue into our hdfs environment. SparkSQL used for analysis of data for those not familiar with spark. Using Spark for data analysis as well and for main workflow process. Using spark over mapreduce. Using Spark for some machine learning algo's with the data.
  • Machine Learning.
  • Data Analysis
  • WorkFlow process (faster than MapReduce).
  • SQL connector to multiple data sources
  • Memory management. Very weak on that.
  • PySpark not as robust as scala with spark.
  • spark master HA is needed. Not as HA as it should be.
  • Locality should not be a necessity, but does help improvement. But would prefer no locality
  • Workflow process using spark went from 1 day to 2 hours
  • Spark Streaming allowed for quick determiniation of data validity
  • spark on yarn was good for manangement. But Spark with Kubernetes was easier to use.
  • mapreduce and apache storm
vs MapRedce, it was faster and easier to manage. Especially for Machine Learning, where MapReduce is lacking. Also Apache Storm was slower and didn't scale as much as Spark does. Spark elasticity was easier to apply compared to storm and MapReduce.
managing resources for Spark was easier compared to storm as well. MapReduce is slower than spark.
Spark is great as a workflow process and extract transform layer process tool. Is really good for machine learning especially for large datasets that can be processed in split file paralallelization.
Spark streaming is scalable for close to real-time data workflow process.
what it's not good for, is smaller subset of data processing.