Apache Spark Review
September 19, 2020
Score 8 out of 10
Overall Satisfaction with Apache Spark
Our organization currently uses Apache Spark to process large volumes of data, mainly for machine learning and large-scale SQL queries. We use its high-level APIs to perform complex tasks, and our developers and data scientists incorporate Spark into their applications to transform large datasets. It is also used for LOT, ETL, and similar workloads.
Pros
- It provides high-level APIs for working with big data.
- Reduces the number of read and write actions to disk.
- Data is kept primarily in memory and is written to disk only when required.
- Easy to program.
- Runs complex jobs in a fraction of the time.
Cons
- It lacks automation, i.e. an automatic optimization process; jobs have to be tuned manually.
- It has no file management system of its own and relies on external storage such as HDFS.
- It does not handle large numbers of concurrent users well.
Return on Investment
- It helped reduce our churn rate by 8%.
- Because it is open source, it saved us significant licensing costs.
Each technology has its own advantages and disadvantages, and we do not have much experience with Cassandra or Apache Flume. However, what makes Apache Spark stand out from its alternatives is its ability to integrate with multiple big data frameworks. It is no surprise, then, that Spark has more GitHub stars than its competitors.
We have been using Spark for a very long time and are very happy with its service and support. It has a very good, active community, which has been enough to solve every problem we have encountered. The tool itself is very easy to use, and combined with that support, it is a very useful tool.
Apache Spark integrates with multiple big data frameworks. It puts little load on the disks. Moreover, it is easy to program and use. Its high-level APIs remove the hassle of running different applications separately for each task. Big data processing has never been as easy as it is with Apache Spark.
Apache Spark is well suited to the following scenarios:
- Processing large chunks of data.
- Working across multiple big data frameworks, since Spark supports many of them.
- Workloads that need high scalability.
Apache Spark is not well suited to the following scenarios:
- Real-time analytics where results are needed immediately.
- Replacing existing infrastructure; it is better used as a parallel framework alongside it.
- Working with small datasets.