Apache Spark is the next generation of big data computing.
Overall Satisfaction with Apache Spark
We need to calculate risk-weighted assets (RWA) daily and monthly, on a T+1 basis, for the different positions the bank holds. The volume is large: millions of records per day run through very complicated formulas and algorithms. In our applications we used Scala on Apache Spark clusters to load all the data needed for the calculations, and implemented the formulas and algorithms with Spark's DataFrame and Dataset APIs.
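As a rough illustration of that pattern only, not the reviewer's actual code: the column names (counterparty, notional, riskWeight), paths, and the flat formula below are simplified assumptions, since real regulatory RWA formulas are far more involved.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RwaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rwa-daily")
      .getOrCreate()

    // Load the day's positions, e.g. from Parquet on shared storage.
    val positions = spark.read.parquet("/data/positions/2024-01-15")

    // Apply a (heavily simplified) per-position formula, then
    // aggregate per counterparty for the T+1 report.
    val rwa = positions
      .withColumn("rwa", col("notional") * col("riskWeight"))
      .groupBy("counterparty")
      .agg(sum("rwa").as("totalRwa"))

    rwa.write.mode("overwrite").parquet("/reports/rwa/2024-01-15")
    spark.stop()
  }
}
```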
Without the Apache Spark cluster, it would have been very hard for us to build such a large system to handle this volume of daily calculations. Since the system was deployed to production, we have been able to provide capital risk control reports to regulation/compliance controllers in different regions around the world.
Pros
- DataFrame as a distributed collection of data: easy for developers to implement algorithms and formulas.
- In-memory computation, which speeds up repeated passes over the same data (see the caching sketch after this list).
- The cluster distributes large calculations across many nodes.
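Where the in-memory point matters in practice is reuse. A minimal sketch, assuming a hypothetical positions DataFrame plus a riskWeights lookup table (none of these names come from the review itself): persisting the joined result lets several report aggregations share one computation.

```scala
import org.apache.spark.sql.functions.{col, sum}
import org.apache.spark.storage.StorageLevel

// Assumed inputs: positions and riskWeights DataFrames with the
// illustrative columns used below.
val enriched = positions
  .join(riskWeights, Seq("assetClass"))
  .withColumn("rwa", col("notional") * col("riskWeight"))
  .persist(StorageLevel.MEMORY_AND_DISK) // spill to disk if memory runs short

// Both aggregations reuse the cached join instead of recomputing it.
val byDesk   = enriched.groupBy("desk").agg(sum("rwa").as("rwaByDesk"))
val byRegion = enriched.groupBy("region").agg(sum("rwa").as("rwaByRegion"))

byDesk.show()
byRegion.show()

enriched.unpersist() // release executor memory once both reports are done
```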
Cons
- It would be great if Apache Spark could provide a native database to manage the metadata of saved Parquet files.
- Processing speed on very large data volumes could still be better.
Return on Investment
- DataFrames with SQL-like operations reduce the learning curve for new developers who already have good knowledge of databases and SQL (see the SQL sketch after this list).
- The cluster scales up and down easily.
- With the daily risk reports calculated via Apache Spark, the bank can comply with the FHC rule in the US and with rules in other regions, and can control capital with counterparties much better.
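As a hedged illustration of that SQL-like surface (the table and column names are again hypothetical, carried over from the earlier sketch), the same aggregation can be written as plain SQL against a temporary view, which is exactly why developers with a database background ramp up quickly:

```scala
// Register the DataFrame so it can be queried with ordinary SQL.
positions.createOrReplaceTempView("positions")

val summary = spark.sql(
  """SELECT counterparty, SUM(notional * riskWeight) AS total_rwa
    |FROM positions
    |GROUP BY counterparty""".stripMargin)

summary.show()
```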
Other teams used to work on Apache Hadoop, but our team started with Apache Spark directly.
Do you think Apache Spark delivers good value for the price?
Yes
Are you happy with Apache Spark's feature set?
Yes
Did Apache Spark live up to sales and marketing promises?
Yes
Did implementation of Apache Spark go as expected?
Yes
Would you buy Apache Spark again?
Yes
Using Apache Spark
100 - People in the organization use it, mainly for regulation/compliance for capital risk control.
10 - People support it; it is supported by our EI team.
- Computing over large volumes of data.
- Speed of computation.
- Extending it to other computing workloads in different departments.