Overall Satisfaction with Apache Spark
We were working for one of our products, which has a requirement for developing an enterprise-level product catering to manage a vast amount of Big data involved. We wanted to use a technology that is faster than Hadoop and can process large scale data by providing a streamlined process for the data scientists. Apache Spark is a powerful unified solution as we thought to be.
The main problem that we identified in our existing approach was that it was taking a large amount of time to process the data, and also the statistical analysis of the data was not up to the mark. We wanted a sophisticated analytical solution that was easy and fast to use. With using Apache Spark, the processing was made 5 times faster than earlier, giving rise to pretty good analytics. With Spark, across a cluster of machines, the data abstraction was achieved by using RDDs.
The main problem that we identified in our existing approach was that it was taking a large amount of time to process the data, and also the statistical analysis of the data was not up to the mark. We wanted a sophisticated analytical solution that was easy and fast to use. With using Apache Spark, the processing was made 5 times faster than earlier, giving rise to pretty good analytics. With Spark, across a cluster of machines, the data abstraction was achieved by using RDDs.
- DataFrames, DataSets, and RDDs.
- Spark has in-built Machine Learning library which scales and integrates with existing tools.
- The data processing done by Spark comes at a price of memory blockages, as in-memory capabilities of processing can lead to large consumption of memory.
- The caching algorithm is not in-built in Spark. We need to manually set up the caching mechanism.
- The ROI was increased by considerable percentage after using Apache Spark.
- Apache Spark provided the agility towards supporting multiple applications.
- Hadoop and Amazon EMR (Elastic MapReduce)
1. Apache Spark is almost 100 % faster than Hadoop.
2. Apache Spark is more stable than Amazon EMR.
3. The end to end distributed machine library is more robust in Apache Spark.
4. For very large data sets, Apache Spark is more trustworthy than the other two.
5. For data transformations, Apache Spark provides a very rich set of APIs.
6. The interface provided for SQL in Apache Spark is easy to understand as compared to others.
2. Apache Spark is more stable than Amazon EMR.
3. The end to end distributed machine library is more robust in Apache Spark.
4. For very large data sets, Apache Spark is more trustworthy than the other two.
5. For data transformations, Apache Spark provides a very rich set of APIs.
6. The interface provided for SQL in Apache Spark is easy to understand as compared to others.
Do you think Apache Spark delivers good value for the price?
Yes
Are you happy with Apache Spark's feature set?
Yes
Did Apache Spark live up to sales and marketing promises?
Yes
Did implementation of Apache Spark go as expected?
Yes
Would you buy Apache Spark again?
Yes