Great open source tool for data processing
December 13, 2019

Score 9 out of 10
Vetted Review
Verified User
Overall Satisfaction with Apache Spark
We use Apache Spark for cluster computing in our ETL environment, for data and analytics, and for machine learning. It is mainly used by our data engineering team to support the entire data lake foundation. Because we have huge amounts of information coming from multiple sources, we needed an effective cluster management system to handle capacity and deliver the performance and throughput we required.
- Cluster management for ETL.
- Data processing engine for our data lake.
- You still need Hive, HDFS, or another storage layer to store information.
- Security lags behind MapReduce.
- Simplified our landscape.
- Drove great performance for data processing.
Databricks uses Spark as a foundation and is also a great platform. It brings several add-ons, which we did not feel we needed when we evaluated it and have not needed since. One interesting plus, in our opinion, was the engineering support, which is valuable depending on how critical your platform is.
Do you think Apache Spark delivers good value for the price?
Yes
Are you happy with Apache Spark's feature set?
Yes
Did Apache Spark live up to sales and marketing promises?
Yes
Did implementation of Apache Spark go as expected?
Yes
Would you buy Apache Spark again?
Yes