Apache Spark is the next generation of big data computing.
April 18, 2022

Steven Li | TrustRadius Reviewer
Score 9 out of 10
Vetted Review
Verified User

Overall Satisfaction with Apache Spark

We need to calculate risk-weighted assets (RWA) daily and monthly for the different positions the bank holds, on a T+1 basis. The volume of calculations is large: millions of records per day, with very complicated formulas and algorithms. In our applications/projects, we used Scala on Apache Spark clusters to load all the data needed for the calculations and implemented the formulas and algorithms through Spark's DataFrame and Dataset APIs.
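
Below is a minimal sketch of the kind of Spark job described above; the input paths, column names, and the risk-weight formula are simplified placeholders for illustration, not the bank's actual regulatory logic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RwaBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-rwa-batch")
      .getOrCreate()

    // Hypothetical input: one row per position held on the T+1 business date.
    val positions = spark.read.parquet("/data/positions/2022-04-18")

    // Illustrative risk-weight lookup keyed by asset class; the real
    // regulatory formulas are far more involved than a single multiply.
    val riskWeights = spark.read.parquet("/data/risk_weights")

    // Simplified stand-in for RWA: exposure * risk weight, aggregated per desk.
    val rwa = positions
      .join(riskWeights, Seq("asset_class"))
      .withColumn("rwa", col("exposure") * col("risk_weight"))
      .groupBy("desk")
      .agg(sum("rwa").as("total_rwa"))

    // Results are saved as Parquet for the downstream reporting step.
    rwa.write.mode("overwrite").parquet("/reports/rwa/2022-04-18")
    spark.stop()
  }
}
```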

Without adopting an Apache Spark cluster, it would have been pretty hard for us to build such a big system to handle this volume of daily calculations. After the system was successfully deployed to PROD, we've been able to provide capital risk control reports to regulation/compliance controllers in different regions around the world.
  • DataFrames as distributed collections of data: easy for developers to implement algorithms and formulas.
  • In-memory calculation.
  • Clustering to distribute large calculation workloads.
  • It would be great if Apache Spark could provide a native database to manage the file info of all saved Parquet data.
  • The speed of processing a large volume of data.
  • DataFrames with SQL-like operations reduce the learning curve for new developers who already have good knowledge of databases and SQL (see the sketch after this list).
  • Cluster to scale up/down easily.
  • With the daily risk reports calculated via Apache Spark, the bank is able to comply with the FHC rule in the US and similar rules in other regions, and to control capital with counterparties much better.
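
As a rough illustration of the SQL-like workflow mentioned in the list above, the following snippet (runnable in spark-shell; the table and column names are made up) shows how existing SQL knowledge carries over directly:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sql-demo").getOrCreate()

// Register a Parquet dataset as a temporary view so it can be queried
// with plain SQL; the path and names here are illustrative.
spark.read.parquet("/data/positions/2022-04-18")
  .createOrReplaceTempView("positions")

// A developer comfortable with SQL can write the same aggregation
// without learning the DataFrame API first.
val byRegion = spark.sql(
  """SELECT region, COUNT(*) AS num_positions, SUM(exposure) AS total_exposure
    |FROM positions
    |GROUP BY region""".stripMargin)

byRegion.show()
```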
Other teams used to work on Apache Hadoop, but our team started with Apache Spark directly.

Do you think Apache Spark delivers good value for the price?

Yes

Are you happy with Apache Spark's feature set?

Yes

Did Apache Spark live up to sales and marketing promises?

Yes

Did implementation of Apache Spark go as expected?

Yes

Would you buy Apache Spark again?

Yes

For large volumes of data that need to be calculated, Apache Spark is the go-to choice; for intermediate or small data sets, Apache Spark is still an option.

Using Apache Spark

100 - Regulation/compliance for capital risk control.
10 - This is supported by our EI team.
  • Large volume of data computing.
  • Speed of computing.
  • Extending it to other computing tasks in different departments.
The capacity to compute large volumes of data across a cluster, at high speed.