
© Cloudera, Inc. All rights reserved.

How Apache Spark and Apache Hadoop are helping to keep the banking regulators happy


Agenda

• Existing Architecture for Analytics & Risk

• Ever-changing Regulatory Landscape

• Challenges with existing architectures

• Modern architecture for Financial Risk

• Demo of key capabilities


Typical Existing Analytical Architecture

Diagram components: Data Sources, ETL/Staging, EDW, Archive, Data Marts, Canned Reports, Dashboards/Analytic Applications, Non-SQL Workloads, Self-Service BI/Ad Hoc.


Regulatory Landscape

Timeline, 2012 to 2019, covering: ICB Ring-fencing; ICB Loss Absorbency; Leverage Ratio – Basel III; NSFR – Basel III; MiFID II; T2S; LCR – Basel III; ICB / Competition; Audit Policy; Cross Border Debt Recovery; Financial Transaction Tax; Market Abuse Directive (MAD II); PRIP; Accounting Directive Review; AIFM Directive; EU Transparency Directive; EU Regulation on Credit Rating Agencies; CRD V; Internal Governance Guidelines; FATCA; PD; EMIR; Swaps Push Out – Dodd Frank; Securities Law Directive (SLD); Volcker Rule – Dodd Frank; Short Selling; Close Out Netting; Crisis Management; Recovery & Resolution; BCBS 239; FRTB.

Effective dates yet to be confirmed.


Existing Architectures under Pressure: Limited Data – Incorporating New Risk Factors

Diagram: the typical existing analytical architecture shown earlier, annotated with:

Limited Data & Insight
• Adding new data sources
• Adding new risk factors

Latent Value
• How long does it take to get new reports that include new risk factors?


Existing Architectures under Pressure: Missed SLAs for VaR, ES & Stress Scenarios

Diagram: the typical existing analytical architecture shown earlier, annotated with:

Overloaded Bottlenecks
• Ever-increasing ETL windows
• Ever-increasing batch windows to extract data
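The VaR and ES runs that miss their SLAs are, at their core, simple calculations repeated over very large P&L vectors. The sketch below computes historical-simulation VaR and Expected Shortfall from one vector of scenario P&L. It is a minimal plain-Python illustration, not the method used in the deck; the function name and the tail convention (VaR as the k-th worst loss, k = ceil(n × (1 − confidence)), ES as the average of the k worst losses) are assumptions.

```python
import math
from typing import Sequence, Tuple


def historical_var_es(pnl: Sequence[float], confidence: float = 0.99) -> Tuple[float, float]:
    """Historical-simulation VaR and Expected Shortfall, both returned as positive losses.

    pnl holds one P&L value per scenario (gains positive, losses negative).
    Assumed convention: VaR is the k-th worst loss with k = ceil(n * (1 - confidence)),
    and ES is the average of the k worst losses.
    """
    if not pnl:
        raise ValueError("need at least one scenario")
    losses = sorted((-x for x in pnl), reverse=True)  # largest loss first
    # The small epsilon guards against floating-point error in n * (1 - confidence).
    k = max(1, math.ceil(len(losses) * (1.0 - confidence) - 1e-9))
    var = losses[k - 1]
    es = sum(losses[:k]) / k
    return var, es


# Ten hypothetical scenario P&Ls; at 80% confidence the two worst losses form the tail.
print(historical_var_es([-10, -8, -6, -4, -2, 0, 2, 4, 6, 8], confidence=0.8))  # (8, 9.0)
```

On a distributed platform the same function would typically be applied once per desk or portfolio, in parallel, over P&L vectors already stored in the cluster.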


Existing Architectures under Pressure: Frustrated Quants on the “Edge” Nodes (Not-Only-SQL)

Diagram: the typical existing analytical architecture shown earlier, annotated with:

Lack of Tooling
• Ad-hoc, on-demand complex risk modeling requirements


BCBS 239, Principles for effective risk data aggregation and risk reporting: http://www.bis.org/publ/bcbs239.pdf


BCBS-239: Principles for Risk Data Aggregation

III - Accuracy & Integrity: Strive for a single authoritative source for risk data. Aggregate on an automated basis.

IV - Completeness: Capture and aggregate all material risk data. Data available by business line, legal entity, asset type, industry, region.…

V - Timeliness: Generate aggregate and up-to-date risk data in a timely manner.

VI - Adaptability: Meet a broad range of on-demand, ad-hoc risk management reporting requests.

Typical challenges:
• Data, models and processes live in silos
• Hard to get an enterprise-wide view of risk
• Difficult to aggregate
• Lack of an enterprise data taxonomy
• Failed audits
• Aggregated/reported risk data is infrequent and stale
• Unable to handle crisis situations
• Complex risk modeling process
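Principle IV asks for risk data that can be broken down by business line, legal entity, asset type, industry and region. A minimal sketch, assuming a hypothetical flat record layout (the `RiskRecord` fields are illustrative and omit industry for brevity), aggregates exposure along any one dimension and measures how much exposure lacks a taxonomy value, one simple way to surface completeness gaps instead of silently dropping them:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RiskRecord:
    # Hypothetical record layout; an empty string means the taxonomy value is missing.
    business_line: str
    legal_entity: str
    asset_type: str
    region: str
    exposure: float


def aggregate(records: Iterable[RiskRecord], dimension: str) -> Dict[str, float]:
    """Sum exposure by one taxonomy dimension; unmapped records go to an explicit bucket."""
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        key = getattr(r, dimension) or "UNMAPPED"
        totals[key] += r.exposure
    return dict(totals)


def unmapped_share(records: List[RiskRecord], dimension: str) -> float:
    """Fraction of gross exposure that cannot be attributed along `dimension`."""
    gross = sum(abs(r.exposure) for r in records)
    missing = sum(abs(r.exposure) for r in records if not getattr(r, dimension))
    return missing / gross if gross else 0.0


records = [
    RiskRecord("Rates", "UK Ltd", "Bond", "EMEA", 100.0),
    RiskRecord("Rates", "US Inc", "Swap", "AMER", 50.0),
    RiskRecord("FX", "UK Ltd", "Forward", "", 50.0),  # region not captured
]
print(aggregate(records, "region"))       # {'EMEA': 100.0, 'AMER': 50.0, 'UNMAPPED': 50.0}
print(unmapped_share(records, "region"))  # 0.25
```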


A modern risk platform calls for…

Scalability
More risk measures, more scenarios. Fine-grained risk data results in an order-of-magnitude increase in volume.

Speed
More frequent stress testing and regulatory reporting. High-velocity scenario development and deployment.

Agility
Support for a variety of languages. Pre-trade decisions. “What-if” scenarios.

Transparency
Verifiable data. Timely response to audits. Data quality and lineage. Data and model governance.
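Pre-trade “what-if” questions can be illustrated with incremental VaR: revalue the portfolio with and without a candidate trade over the same scenarios and compare. A sketch, assuming historical-simulation VaR with the k-th-worst-loss convention and hypothetical function names:

```python
import math
from typing import Sequence


def hist_var(pnl: Sequence[float], confidence: float) -> float:
    """Historical VaR as a positive loss: the k-th worst loss, k = ceil(n * (1 - confidence))."""
    losses = sorted((-x for x in pnl), reverse=True)
    k = max(1, math.ceil(len(losses) * (1.0 - confidence) - 1e-9))
    return losses[k - 1]


def incremental_var(portfolio_pnl: Sequence[float], trade_pnl: Sequence[float],
                    confidence: float = 0.99) -> float:
    """Change in portfolio VaR if the candidate trade were added (negative means risk-reducing)."""
    if len(portfolio_pnl) != len(trade_pnl):
        raise ValueError("both P&L vectors must be computed over the same scenarios")
    combined = [p + t for p, t in zip(portfolio_pnl, trade_pnl)]
    return hist_var(combined, confidence) - hist_var(portfolio_pnl, confidence)


# Hypothetical P&L vectors over the same ten scenarios; the candidate trade is a partial hedge.
portfolio = [-10, -8, -6, -4, -2, 0, 2, 4, 6, 8]
candidate = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4]
print(incremental_var(portfolio, candidate, confidence=0.8))  # -4
```

Because only the candidate's P&L vector is new, the check is cheap enough to run interactively before the trade is booked.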


Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop

Storage
• Archival
• Traceability

Batch
• ETL
• Data Validation
• Reg Reporting

Interactive
• Risk Aggregation
• Stress Testing

HPC
• Risk Modeling
• Backtesting
• Simulation

Streaming & Real Time
• Mkt Surveillance
• Best Execution

HDFS: High-throughput, scalable, fault-tolerant, distributed file system.
MapReduce: Distributed parallel processing framework.
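The MapReduce model behind the batch use cases can be sketched in-process: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. This sketch assumes a hypothetical `trade_id,desk,pnl` line format and aggregates P&L by desk; the shuffle is a dictionary here, whereas on Hadoop the framework performs it across nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, float]]:
    """Emit (desk, pnl) for one input line in the hypothetical format 'trade_id,desk,pnl'."""
    _trade_id, desk, pnl = line.split(",")
    yield desk, float(pnl)


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key; on Hadoop this step is done by the framework across nodes."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[float]) -> Tuple[str, float]:
    return key, sum(values)


def run_job(splits: List[List[str]]) -> Dict[str, float]:
    # Each split would be handled by a separate map task on the cluster.
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


splits = [["t1,Rates,100.0", "t2,FX,-20.5"], ["t3,Rates,-40.0", "t4,FX,10.5"]]
print(run_job(splits))  # {'Rates': 60.0, 'FX': -10.0}
```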


Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop

Use cases as on the previous slide (Storage, Batch, Interactive, HPC, Streaming & Real Time).

Apache Impala: Massively Parallel Processing (MPP) SQL engine.
Apache Spark: In-memory distributed processing framework.
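Interactive risk aggregation and stress testing map naturally to SQL. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in for an MPP engine such as Impala; the table and column names are hypothetical, and stressed P&L is approximated to first order as sensitivity × shock, summed per desk and scenario. The `JOIN ... GROUP BY` shape of the query is standard SQL.

```python
import sqlite3
from typing import List, Sequence, Tuple

# sqlite3 stands in for an MPP SQL engine here; table and column names are hypothetical.
# delta is assumed to be P&L per unit of shock for that risk factor.
SCHEMA = """
CREATE TABLE sensitivities (desk TEXT, risk_factor TEXT, delta REAL);
CREATE TABLE scenario_shocks (scenario TEXT, risk_factor TEXT, shock REAL);
"""

STRESS_QUERY = """
SELECT s.desk, sh.scenario, SUM(s.delta * sh.shock) AS stressed_pnl
FROM sensitivities AS s
JOIN scenario_shocks AS sh ON s.risk_factor = sh.risk_factor
GROUP BY s.desk, sh.scenario
ORDER BY s.desk, sh.scenario
"""


def stressed_pnl(sensitivities: Sequence[Tuple[str, str, float]],
                 shocks: Sequence[Tuple[str, str, float]]) -> List[Tuple[str, str, float]]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO sensitivities VALUES (?, ?, ?)", sensitivities)
        conn.executemany("INSERT INTO scenario_shocks VALUES (?, ?, ?)", shocks)
        return conn.execute(STRESS_QUERY).fetchall()
    finally:
        conn.close()


sens = [("Rates", "USD_10Y", -2000.0), ("Rates", "EUR_10Y", 500.0), ("FX", "EURUSD", 10000.0)]
shocks = [("RatesUp", "USD_10Y", 0.5), ("RatesUp", "EUR_10Y", 0.5), ("FXShock", "EURUSD", -0.25)]
print(stressed_pnl(sens, shocks))
# [('FX', 'FXShock', -2500.0), ('Rates', 'RatesUp', -750.0)]
```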


Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop

Use cases as on the previous slides (Storage, Batch, Interactive, HPC, Streaming & Real Time).

Apache Spark: Distributed compute framework. Supports Python, Java and Scala, and can also call existing C++ code.
Cloudera Data Science Workbench: Fully integrated data science notebook application.
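Backtesting, one of the HPC use cases, compares each day's VaR forecast with the realized P&L and counts exceptions. A sketch, assuming VaR is reported as a positive loss and using the Basel traffic-light zones for 250 daily observations at 99% confidence (green for 0 to 4 exceptions, yellow for 5 to 9, red for 10 or more); the function names are hypothetical.

```python
from typing import Sequence


def count_exceptions(var_forecasts: Sequence[float], realized_pnl: Sequence[float]) -> int:
    """Days on which the realized loss exceeded that day's VaR (VaR given as a positive loss)."""
    if len(var_forecasts) != len(realized_pnl):
        raise ValueError("need one VaR forecast per realized P&L")
    return sum(1 for var, pnl in zip(var_forecasts, realized_pnl) if -pnl > var)


def traffic_light(exceptions: int) -> str:
    """Basel traffic-light zone, assuming 250 daily observations of 99% VaR."""
    if exceptions <= 4:
        return "green"
    if exceptions <= 9:
        return "yellow"
    return "red"


# Hypothetical five-day sample: losses of 12 and 11 breach a VaR of 10; a loss of exactly 10 does not.
n = count_exceptions([10.0] * 5, [-5.0, -12.0, 3.0, -10.0, -11.0])
print(n, traffic_light(n))  # 2 green
```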


Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop

Use cases as on the previous slides (Storage, Batch, Interactive, HPC, Streaming & Real Time).

Cloudera Data Science Workbench
Apache Kudu: Real-time streaming architectures for true aggregated risk on demand.
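Aggregated risk on demand depends on a store that accepts updates to existing rows as well as inserts, which Kudu provides through upserts. The sketch below is an in-memory stand-in, not the Kudu client API: a hypothetical `DeskRiskView` keeps one row per trade and adjusts running desk totals as trades are upserted or deleted, so the aggregate stays current without a batch recompute.

```python
from collections import defaultdict
from typing import Dict, Tuple


class DeskRiskView:
    """In-memory stand-in for an upsertable table keyed by trade_id with running desk totals.

    Hypothetical illustration only; it does not use the Kudu client API.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[str, float]] = {}  # trade_id -> (desk, exposure)
        self._totals: Dict[str, float] = defaultdict(float)

    def upsert(self, trade_id: str, desk: str, exposure: float) -> None:
        previous = self._rows.get(trade_id)
        if previous is not None:
            old_desk, old_exposure = previous
            self._totals[old_desk] -= old_exposure  # retract the superseded version
        self._rows[trade_id] = (desk, exposure)
        self._totals[desk] += exposure

    def delete(self, trade_id: str) -> None:
        desk, exposure = self._rows.pop(trade_id)
        self._totals[desk] -= exposure

    def totals(self) -> Dict[str, float]:
        return dict(self._totals)


view = DeskRiskView()
view.upsert("t1", "Rates", 100.0)
view.upsert("t2", "FX", 50.0)
view.upsert("t1", "Rates", 80.0)  # amendment to an existing trade
view.delete("t2")                 # cancellation
print(view.totals())  # {'Rates': 80.0, 'FX': 0.0}
```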


Modern Platform for Analytics and Machine Learning

Diagram components: Data Sources; EDW, Analytic Database, Operational Database and Data Science & Engineering on a Shared Data Layer (the Modern Data Platform); Fixed Reports, Dashboards/Analytic Applications, Non-SQL Workloads, Self-Service BI/Ad Hoc, Flexible Reporting.

Regulations addressed: MiFID II, FRTB, IFRS-9, BCBS-239, MAD/MAR, GDPR, ….


BCBS 239 / FRTB “Illustrative” Architecture

Stages: Market Data, Revaluation, Calculation & Aggregation, Reporting

Market Data
• Market Data Feeds
• IPV: Independent Price Valuation Function
• MRF / NMRF: Modelable & Non-Modelable Risk Factors
• Calibration

Revaluation
• Fixed Income: Front Office Pricing Engines
• Equity Mkts: Front Office Pricing Engines
• FX: Front Office Pricing Engines
• … Other Mkts: Front Office Pricing Engines

Enterprise Data Hub
• Static Data, Market Data, Configuration
• P&L Vectors, Sensitivities, Events
• Positions & Transaction Data
• Scenarios: Current, Historic, Stressed, Projected

Risk Metrics
• SA-related Risk Components
• Counterparty Credit Risk
• XVA
• ES & Stressed ES
• P&L Attribution
• VaR

Regulatory Applications
• MiFID 2, Stress Testing, GDPR
• FRTB SA, FRTB IMA, EMIR

Reporting
• Regulatory Reporting
• Management Reporting

Flows between stages: Scenarios, Risk Sensitivities
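The path from risk sensitivities and scenarios to ES and Stressed ES can be sketched in a few lines. This assumes a first-order (sensitivity × shock) revaluation rather than full revaluation in the front-office pricing engines, and it omits FRTB's liquidity-horizon scaling; 97.5% is the confidence level FRTB uses for ES. Function and risk-factor names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pnl_vector(sensitivities: Dict[str, float], scenarios: List[Dict[str, float]]) -> List[float]:
    """First-order P&L per scenario: sum over risk factors of sensitivity * shock.

    Risk factors missing from a scenario are treated as unshocked.
    """
    return [sum(s * sc.get(rf, 0.0) for rf, s in sensitivities.items()) for sc in scenarios]


def expected_shortfall(pnl: Sequence[float], confidence: float = 0.975) -> float:
    """Average of the k worst losses, k = ceil(n * (1 - confidence)); returned as a positive loss."""
    losses = sorted((-x for x in pnl), reverse=True)
    k = max(1, math.ceil(len(losses) * (1.0 - confidence) - 1e-9))
    return sum(losses[:k]) / k


# Hypothetical sensitivities (P&L per unit shock) and four scenarios.
sens = {"USD_10Y": -100.0, "EURUSD": 50.0}
scenarios = [
    {"USD_10Y": 1.0, "EURUSD": 0.0},
    {"USD_10Y": -1.0},
    {"EURUSD": -2.0},
    {"USD_10Y": 0.5, "EURUSD": 1.0},
]
pnl = pnl_vector(sens, scenarios)
print(pnl)                           # [-100.0, 100.0, -100.0, 0.0]
print(expected_shortfall(pnl, 0.5))  # 100.0
```

Running `expected_shortfall` on a P&L vector built from historic scenarios, and again on one built from a stressed-period scenario set, gives the ES and Stressed ES figures respectively.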


BCBS 239 – Timeliness (Real-time Risk): Simplifying Lambda Architectures with Apache Kudu

Components: Data Sources, Kafka, Spark Streaming, Kudu, Spark MLlib and an Application showing real-time risk with Greeks. Diagram labels: Individual Session, Full Model/Learning, Genesis.

Flow:
1. Event occurs
2. Market data
3. Stream processing
4. Land in RDBMS
5. Batch valuation
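Real-time risk with Greeks usually means approximating each position's P&L from its sensitivities as market events arrive, rather than fully revaluing it. A minimal sketch of the delta-gamma approximation, ΔP ≈ Δ·ΔS + ½·Γ·ΔS², applied in a hypothetical per-event handler; in the architecture above the events would arrive through Kafka and Spark Streaming, and the Greeks would be refreshed by the batch valuation step.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Greeks:
    delta: float  # first derivative of position value with respect to the underlying price
    gamma: float  # second derivative


def delta_gamma_pnl(g: Greeks, d_s: float) -> float:
    """Second-order Taylor approximation of P&L for a price move d_s."""
    return g.delta * d_s + 0.5 * g.gamma * d_s * d_s


def on_market_event(book: List[Tuple[str, Greeks]], last_price: Dict[str, float],
                    underlying: str, new_price: float) -> float:
    """Hypothetical per-event handler: approximate book P&L for one price update."""
    d_s = new_price - last_price[underlying]
    last_price[underlying] = new_price  # state carried between events
    return sum(delta_gamma_pnl(g, d_s) for und, g in book if und == underlying)


book = [("ABC", Greeks(delta=100.0, gamma=4.0)), ("XYZ", Greeks(delta=50.0, gamma=0.0))]
prices = {"ABC": 10.0, "XYZ": 20.0}
print(on_market_event(book, prices, "ABC", 12.0))  # 208.0
```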


Modern Platform for Analytics and Machine Learning: Supporting the Complete Development Lifecycle for Risk

Data Management (User: Data Engineer)
• Ingest
• Validation
• Profiling

Exploration / Model Development (User: Quant Analyst)
• Feature Engineering
• Model Training & Testing
• Visualization

Production / Model Deployment (Users: Data / Dev / Ops Engineer)
• Production Feature Generation
• Production Model Port
• Production Testing
• Result Validation
• Serving

Metadata Management
Developer Tools: IDEs, Notebooks, SCM
Operations Tools: Scheduling, Workflow, Publishing
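The Ingest, Validation and Profiling steps can be illustrated with simple record-level rules and a null-count profile. The required fields and rules below are hypothetical examples, not a standard schema:

```python
from typing import Dict, List, Mapping

# Hypothetical required fields and rules for an incoming trade record.
REQUIRED_FIELDS = ("trade_id", "desk", "notional", "currency")


def _missing(value: object) -> bool:
    return value is None or value == ""


def validate(record: Mapping[str, object]) -> List[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = [f"missing {f}" for f in REQUIRED_FIELDS if _missing(record.get(f))]
    notional = record.get("notional")
    if isinstance(notional, (int, float)) and notional <= 0:
        errors.append("notional must be positive")
    currency = record.get("currency")
    if isinstance(currency, str) and currency:
        if not (len(currency) == 3 and currency.isalpha() and currency.isupper()):
            errors.append("currency must be a 3-letter upper-case code")
    return errors


def profile(records: List[Mapping[str, object]]) -> Dict[str, int]:
    """Null/empty count per required field: a basic profiling statistic."""
    return {f: sum(1 for r in records if _missing(r.get(f))) for f in REQUIRED_FIELDS}


good = {"trade_id": "t1", "desk": "Rates", "notional": 1_000_000.0, "currency": "USD"}
bad = {"trade_id": "t2", "desk": "", "notional": -5.0, "currency": "usd"}
print(validate(good))        # []
print(validate(bad))         # ['missing desk', 'notional must be positive', 'currency must be a 3-letter upper-case code']
print(profile([good, bad]))  # {'trade_id': 0, 'desk': 1, 'notional': 0, 'currency': 0}
```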


Risk Footprint with Apache Spark and Hadoop
• 19 GSIB customers
• 9 banks with risk use cases in production
• 6000+ nodes deployed
• >5 years in production


Market Risk aggregation platform for a Global Systemically Important Bank

55x faster processing, 8x more data capacity

300+ daily interactive users analyzing current and historical data


Global Systemically Important Bank

On-premises and cloud-based Hadoop clusters, used according to workload.

Tested on AWS to 40,000 cores. Demonstrated linear scaling of simulation workloads.
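Simulation workloads scale linearly when each slice of paths is independent. A sketch under stated assumptions (geometric Brownian motion for a single position, hypothetical parameter values, one seeded generator per partition with seeds `base_seed + i` chosen only for illustration): each `simulate_partition` call depends only on its own seed, so a cluster could run partitions on separate executors. In PySpark the same shape would typically be written with `sc.parallelize(range(n_partitions), n_partitions).flatMap(...)`; here it runs locally with `map`.

```python
import math
import random
from typing import List


def simulate_partition(seed: int, n_paths: int, s0: float, mu: float,
                       sigma: float, horizon: float) -> List[float]:
    """P&L of one unit of an asset over `horizon` years under geometric Brownian motion."""
    rng = random.Random(seed)  # independent, reproducible generator for this partition
    drift = (mu - 0.5 * sigma ** 2) * horizon
    vol = sigma * math.sqrt(horizon)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) - s0 for _ in range(n_paths)]


def monte_carlo_pnl(n_partitions: int, paths_per_partition: int, s0: float = 100.0,
                    mu: float = 0.0, sigma: float = 0.2, horizon: float = 1 / 252,
                    base_seed: int = 42) -> List[float]:
    # Each partition depends only on its own seed, so partitions could run on separate
    # executors; here they run sequentially with the built-in map.
    parts = map(lambda i: simulate_partition(base_seed + i, paths_per_partition,
                                             s0, mu, sigma, horizon), range(n_partitions))
    return [x for part in parts for x in part]


pnl = monte_carlo_pnl(n_partitions=4, paths_per_partition=2500)
losses = sorted((-x for x in pnl), reverse=True)
# Prints the path count (10000) and the 100th-worst loss, a rough 99% one-day VaR estimate.
print(len(pnl), losses[99])
```

A production simulation would normally use a generator designed for parallel streams rather than consecutive integer seeds.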


Demo


Q&A