Great open source tool for data processing
December 13, 2019

Anonymous | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

We use Apache Spark for cluster computing in our ETL environment, for data and analytics, and for machine learning. It is used mainly by our data engineering team and supports our entire data lake foundation. Because we have huge amounts of information coming from multiple sources, we needed an effective cluster management system to handle capacity and deliver the performance and throughput we required.
  • Cluster management for ETL.
  • Data processing engine for our data lake.
  • You still need Hive, HDFS, or another storage layer to hold your data.
  • Security lags behind MapReduce.
  • Simplified our landscape.
  • Drove great performance for data processing.
Databricks uses Spark as its foundation and is also a great platform. It brings several add-ons, which we did not feel we needed when we evaluated it and have not needed since. One interesting plus, in our opinion, was the engineering support, which can be valuable depending on how critical your platform is.
As with every open source tool, you have to rely on forums, consulting companies, and your own engineering power for support and maintenance. There is plenty of documentation available, so you will be in good hands. You can also find small to mid-size consulting companies that can support your environment at a decent cost. Another alternative is going to Databricks, if support is a key criterion for your decision.

Do you think Apache Spark delivers good value for the price?

Yes

Are you happy with Apache Spark's feature set?

Yes

Did Apache Spark live up to sales and marketing promises?

Yes

Did implementation of Apache Spark go as expected?

Yes

Would you buy Apache Spark again?

Yes

Spark is a one-size-fits-all data processing platform. You can run batch jobs and in-motion streams, and you can use it for ETL, machine learning, or even graph processing. You do not need multiple tools, which lowers your TCO and makes management tasks much easier. As with every new platform, it has room to grow: storage and security are the main opportunities we found.