Apache Spark, the be-all and end-all.
Overall Satisfaction with Apache Spark
Spark was, and still is, used in a myriad of ways. With Kafka, we use Spark Streaming to move data from Kafka queues into our HDFS environment. SparkSQL is used for data analysis by people who are not familiar with Spark. We also use Spark for general data analysis and for our main workflow process, in place of MapReduce, and for running some machine learning algorithms on the data.
Pros
- Machine learning
- Data analysis
- Workflow processing (faster than MapReduce)
- SQL connector to multiple data sources
Cons
- Memory management is very weak.
- PySpark is not as robust as using Spark with Scala.
- Spark master high availability (HA) is needed; it is not as highly available as it should be.
- Data locality should not be a necessity. It does improve performance, but I would prefer not to depend on it.
Return on Investment
- Our workflow process went from 1 day to 2 hours after moving to Spark.
- Spark Streaming allowed us to determine data validity quickly.
- Spark on YARN was good for management, but Spark on Kubernetes was easier to use.
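For a sense of scale, the workflow improvement above works out as follows, assuming "1 day" means 24 hours of wall-clock run time (if it instead means an 8-hour working day, the speedup would be 4× rather than 12×):

```latex
\text{speedup} = \frac{24\ \text{h}}{2\ \text{h}} = 12,
\qquad
\text{time saved} = 1 - \frac{2\ \text{h}}{24\ \text{h}} = \frac{11}{12} \approx 91.7\%
```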
Alternatives Considered
- MapReduce and Apache Storm
Compared with MapReduce, Spark was faster and easier to manage, especially for machine learning, where MapReduce is lacking. Apache Storm was slower and did not scale as well as Spark does, and Spark's elasticity was easier to apply than that of Storm or MapReduce.
Managing resources for Spark was also easier than with Storm, and MapReduce is slower than Spark.