Apache Spark is the next generation of big data computing.
April 18, 2022

Steven Li | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

We need to calculate risk-weighted assets (RWA) daily and monthly for the different positions the bank holds, on a T+1 basis. The volume of calculations is large: millions of records per day, with very complicated formulas and algorithms. In our applications/projects, we used Scala on Apache Spark clusters to load all the data needed for the calculations and implemented the formulas and algorithms through Spark's DataFrame and Dataset APIs.
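
Below is a minimal sketch of the kind of Spark job described above; the input paths, column names, and the risk-weight formula are simplified placeholders for illustration, not the bank's actual regulatory logic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RwaBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-rwa-batch")
      .getOrCreate()

    // Hypothetical input: one row per position held on the T+1 business date.
    val positions = spark.read.parquet("/data/positions/2022-04-18")

    // Illustrative risk-weight lookup keyed by asset class; the real
    // regulatory formulas are far more involved than a single multiply.
    val riskWeights = spark.read.parquet("/data/risk_weights")

    // Simplified stand-in for RWA: exposure * risk weight, aggregated per desk.
    val rwa = positions
      .join(riskWeights, Seq("asset_class"))
      .withColumn("rwa", col("exposure") * col("risk_weight"))
      .groupBy("desk")
      .agg(sum("rwa").as("total_rwa"))

    // Results are saved as Parquet for the downstream reporting step.
    rwa.write.mode("overwrite").parquet("/reports/rwa/2022-04-18")
    spark.stop()
  }
}
```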

Without adopting an Apache Spark cluster, it would have been pretty hard for us to build such a big system to handle this volume of daily calculations. After the system was successfully deployed to PROD, we've been able to provide capital risk control reports to regulation/compliance controllers in different regions around the world.
  • DataFrames as distributed collections of data: easy for developers to implement algorithms and formulas.
  • In-memory calculation.
  • Clustering to distribute large calculation workloads.
  • It would be great if Apache Spark could provide a native database to manage the file info of all saved Parquet data.
  • The speed of processing a large volume of data.
  • DataFrames with SQL-like operations reduce the learning curve for new developers who already have good knowledge of databases and SQL (see the sketch after this list).
  • Cluster to scale up/down easily.
  • With the daily risk reports calculated via Apache Spark, the bank is able to comply with the FHC rule in the US and similar rules in other regions, and to control capital with counterparties much better.
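
As a rough illustration of the SQL-like workflow mentioned in the list above, the following snippet (runnable in spark-shell; the table and column names are made up) shows how existing SQL knowledge carries over directly:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sql-demo").getOrCreate()

// Register a Parquet dataset as a temporary view so it can be queried
// with plain SQL; the path and names here are illustrative.
spark.read.parquet("/data/positions/2022-04-18")
  .createOrReplaceTempView("positions")

// A developer comfortable with SQL can write the same aggregation
// without learning the DataFrame API first.
val byRegion = spark.sql(
  """SELECT region, COUNT(*) AS num_positions, SUM(exposure) AS total_exposure
    |FROM positions
    |GROUP BY region""".stripMargin)

byRegion.show()
```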
Other teams used to work on Apache Hadoop, but our team started with Apache Spark directly.

Do you think Apache Spark delivers good value for the price?

Yes

Are you happy with Apache Spark's feature set?

Yes

Did Apache Spark live up to sales and marketing promises?

Yes

Did implementation of Apache Spark go as expected?

Yes

Would you buy Apache Spark again?

Yes

For large volumes of data that need to be calculated, Apache Spark is the go-to choice; for intermediate or small data sets, Apache Spark is still an option.

Using Apache Spark

100 - Regulation/compliance for capital risk control.
10 - This is supported by our EI team.
  • Large volume of data computing.
  • Speed of computing.
  • Extending it to other computing tasks in different departments.
The capacity to compute large volumes of data across a cluster, at high speed.