Apache Spark is still a valid DE tool
December 28, 2024

Score 9 out of 10
Vetted Review
Verified User
Overall Satisfaction with Apache Spark
We use Apache Spark daily as the main computation engine for most of our data pipelines, critical and non-critical alike. We mostly work with batch processing, but we also use Spark Streaming in some cases. The scope covers all analysis pipelines, machine learning datasets, and several operational use cases.
Pros
- Parallel processing
- Configurability
- Usage with other tools
Cons
- Needs more ready-to-use solutions for tuning the Apache Spark configs
- Should reduce the need to write UDFs in PySpark by implementing more transformations directly
Return on Investment
- Increased data literacy and adherence to best data engineering practices across the organization
- Increased ability for the data analysts to quickly and reliably have access to their data, better supporting data driven decisions
- Decreased costs due to better parallelization of resources
Do you think Apache Spark delivers good value for the price?
Yes
Are you happy with Apache Spark's feature set?
Yes
Did Apache Spark live up to sales and marketing promises?
I wasn't involved with the selection/purchase process
Did implementation of Apache Spark go as expected?
Yes
Would you buy Apache Spark again?
Yes
