Welcome to Redefining Perspectives November 2012

Risk Management Using Hadoop


Page 1: Risk Management Using Hadoop

Welcome to Redefining Perspectives November 2012

Page 2: Risk Management Using Hadoop

2 © COPYRIGHT 2012 SAPIENT CORPORATION | CONFIDENTIAL

Capital Markets Risk Management and Hadoop

Kevin Samborn and Nitin Agrawal

Page 3: Risk Management Using Hadoop

Agenda

• Risk Management

• Hadoop

• Monte Carlo VaR Implementation

• Q & A

Page 4: Risk Management Using Hadoop

Risk Management

Page 5: Risk Management Using Hadoop

What is Risk Management?

• Risk is a tool – the goal is to optimize and understand risk

o Too much risk is locally and systemically dangerous

o Too little risk means the firm may be “leaving profit on the table”

• Portfolio exposure

o Modern portfolios contain many different types of assets

o Simple instruments, complex instruments, and derivatives

• Many types of risk measures

o Defined scenario-based stress testing

o Value at Risk (VaR)

o “Sensitivities”

• Key is valuation under different scenarios

• VaR is used in banking regulations, margin calculations and risk management

Page 6: Risk Management Using Hadoop

Value at Risk (VaR)

• VaR is a statistical measure of risk – expressed as an amount of loss at a given probability. E.g., a 97.5% chance that the firm will not lose more than USD 1 million over the next 5 days

• Computing VaR is a challenging data-sourcing and compute-intensive process

• VaR calculation:

o Generate statistical scenarios of market behavior

o Revalue the portfolio for each scenario, compare returns to today’s value

o Sort results and select the desired percentage return: VALUE AT RISK

• Different VaR techniques:

o Parametric – analytic approximation

o Historical – captures real (historical) market dynamics

o Monte Carlo – many scenarios, depends on statistical distributions
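The three calculation steps above can be sketched end-to-end for a single linear position. This is a minimal, illustrative Monte Carlo sketch; the class, method and parameter names are ours, not from the deck, and a real implementation would revalue a full portfolio under each scenario:

```java
import java.util.Arrays;
import java.util.Random;

public class MonteCarloVaRSketch {

    // Step 1 + 2: generate scenarios and revalue. For a linear position of
    // worth `value` with daily volatility `sigma`, the 1-day P&L under a
    // normal scenario is simply value * sigma * Z (mean return assumed 0).
    static double[] simulatePnl(double value, double sigma, int nSims, long seed) {
        Random rng = new Random(seed);
        double[] pnl = new double[nSims];
        for (int i = 0; i < nSims; i++) {
            pnl[i] = value * sigma * rng.nextGaussian();
        }
        return pnl;
    }

    // Step 3: sort results and pick the desired percentile loss.
    static double var(double[] pnl, double confidence) {
        double[] sorted = pnl.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.floor((1.0 - confidence) * sorted.length);
        return -sorted[idx]; // report the loss as a positive number
    }

    public static void main(String[] args) {
        double[] pnl = simulatePnl(1_000_000, 0.01, 100_000, 42L);
        System.out.printf("97.5%% 1-day VaR: %.0f%n", var(pnl, 0.975));
    }
}
```

With a 1% daily volatility on a USD 1 million position, the 97.5% VaR converges toward the parametric value 1.96 × 10,000 as the simulation count grows.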

Page 7: Risk Management Using Hadoop

VaR Graphically

Source: An Introduction To Value at Risk (VAR), Investopedia, May 2010

Page 8: Risk Management Using Hadoop

Complexities

• For modern financial firms, VaR is complex. Calculation requirements:

o Different types of assets require different valuation models

• Risk-based approach

• Full revaluation

o With large numbers of scenarios, many thousands of calculations are required

o Monte Carlo simulations require significant calibration, depending on large historical data

• Many different reporting dimensions

o VaR is not additive across dimensions: product/asset class, currency

o Portfolio – including “what-if” and intraday activity

• Intraday market changes requiring new simulations

• Incremental VaR – how does a single (new) trade contribute to the total?
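The non-additivity point can be shown with a tiny worked example: two desks that each lose big, but in different scenarios. The scenario P&L numbers below are invented for illustration:

```java
import java.util.Arrays;

public class VarNotAdditive {

    // 95% VaR from a set of scenario P&Ls: sort, take the 5% quantile loss.
    static double var(double[] pnl, double confidence) {
        double[] s = pnl.clone();
        Arrays.sort(s);
        int idx = (int) Math.floor((1.0 - confidence) * s.length);
        return -s[idx];
    }

    public static void main(String[] args) {
        // Each desk loses 10 in one scenario and gains 1 in the other nine
        double[] a = {-10, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        double[] b = {1, -10, 1, 1, 1, 1, 1, 1, 1, 1};
        double[] total = new double[a.length];
        for (int i = 0; i < a.length; i++) total[i] = a[i] + b[i];

        System.out.println(var(a, 0.95) + var(b, 0.95)); // 20.0
        System.out.println(var(total, 0.95));            // 9.0 — not the sum
    }
}
```

Because the two losses never coincide, the combined book's worst scenario loses only 9, so desk-level VaRs cannot simply be summed up a hierarchy; each level must be recomputed from the underlying scenarios.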

Page 9: Risk Management Using Hadoop

Backtesting VaR

Page 10: Risk Management Using Hadoop

Page 11: Risk Management Using Hadoop

• Data stored with REDUNDANCY on a Distributed File System

• Abstracts H/W FAILURES, delivering a highly available service on COMMODITY H/W

• SCALES OUT from a single node to thousands of nodes

• Data stored WITHOUT A SCHEMA

• Tuned for SEQUENTIAL DATA ACCESS

• Provides an EASY ABSTRACTION for processing large data sets

• Infrastructure for PARALLEL DATA PROCESSING across a huge COMMODITY cluster

• Infrastructure for TASK and LOAD MANAGEMENT

• Framework achieves DATA-PROCESS LOCALITY

Hadoop Core makes two critical assumptions, though:

• Data doesn’t need to be updated

• Data doesn’t need to be accessed randomly

Page 12: Risk Management Using Hadoop

A Simple MapReduce Job
Problem Statement: From historical price data, create a frequency distribution of 1-day % change for various stocks

Map 1

Map 2

Stock  Date    Open    Close
BP     23-Nov  435.25  435.5
NXT    23-Nov  3598    3620
MKS    23-Nov  378.5   380.7
BP     22-Nov  434.8   433.6
NXT    22-Nov  3579    3603
MKS    22-Nov  377.8   378
BP     21-Nov  430.75  433
NXT    21-Nov  3574    3582
MKS    21-Nov  375     376
BP     20-Nov  430.9   432.25
NXT    20-Nov  3592    3600
MKS    20-Nov  373.7   375.3
BP     19-Nov  422.5   431.6
NXT    19-Nov  3560    3600
MKS    19-Nov  368.5   372.6
BP     16-Nov  423.9   416.6
NXT    16-Nov  3575    3542
MKS    16-Nov  370.3   366.4
BP     15-Nov  422     425.4
NXT    15-Nov  3596    3550
MKS    15-Nov  376.5   370.6

Map M

Reduce 1

Reduce 2

Reduce 3

Reduce N

SORT / SHUFFLE

BP|1, 33 BP|2, 64 …

NXT|81, 2 NXT|-20, 5 …

Output 3

Output N

public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  // Parse one price record and emit (ticker, 1-day % change)
  SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
  context.write(new Text(sa.getTicker()), new IntWritable(sa.getPercentChange()));
}

public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  // Build the frequency distribution of % changes for this ticker
  Map<Integer, Long> freqDist = buildFreqDistribution(values);
  for (Integer percentChange : freqDist.keySet()) {
    context.write(new Text(key.toString() + "|" + percentChange),
        new LongWritable(freqDist.get(percentChange)));
  }
}
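The helper buildFreqDistribution is not shown on the slide. A minimal plain-Java version of what it presumably does (the signature here is an assumption, using plain ints instead of Hadoop's IntWritable) is just a counting pass:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FreqDist {

    // Count how many days each 1-day % change occurred for one ticker.
    static Map<Integer, Long> buildFreqDistribution(Iterable<Integer> changes) {
        Map<Integer, Long> freq = new HashMap<>();
        for (Integer c : changes) {
            freq.merge(c, 1L, Long::sum); // increment the count for this % change
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<Integer, Long> f = buildFreqDistribution(List.of(1, 2, 1, -1, 1));
        System.out.println(f.get(1)); // 3
    }
}
```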

Page 13: Risk Management Using Hadoop

Hadoop Ecosystem | How/Where These Fit

[Diagram: Hadoop ecosystem layers – STORAGE and PROCESSING at the core; LOAD tools (Sqoop, hiho, Scribe, Flume); SUPPORT tools (HUE, ZooKeeper); outputs feeding a DATA WAREHOUSE and VISUALIZATION TOOLS for USERS]

Page 14: Risk Management Using Hadoop

Monte Carlo VaR Implementation

Page 15: Risk Management Using Hadoop

Monte Carlo VaR

Challenges:

• Daily trade data could be massive

• Valuations are compute-intensive

• VaR is not a simple arithmetic sum across hierarchies

2 Steps:

1. SIMULATION – for each instrument (e.g. IBM, MSFT, IBM.CO), produce 10,000 simulated values V1 … V10,000

2. AGGREGATION – for each simulation j, compute the hierarchy-level value HLVj = (∑AiVi)j, for j = 1 … 10k

Page 16: Risk Management Using Hadoop

Simulation Step - MapReduce

MAP
- Read through portfolio data
- Emit (K,V) as (Underlyer, InstrumentDetails), e.g. (IBM, IBM.CO.DEC14.225)


REDUCE
- For the Underlyer, perform 10k random walks in parallel
- For each random walk output, simulate derivative prices
- Emit 10k sets of simulated prices of the stock and associated derivatives, i.e.
  IBM, [V1, V2, … V10000]
  IBM.CO.DEC14.225, [V1, V2, … V10000]

Job job = new Job(getConf());
job.setJobName("RandomValuationGenerator");
job.setMapperClass(SecurityAttributeMapper.class);
job.setReducerClass(PriceSimulationsReducer.class);

public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
  // Key by underlyer so a stock and its derivatives reach the same reducer
  context.write(new Text(sa.getUnderlyer()), sa);
}

// Reducer body: simulate the underlying stock first, then price each
// derivative off the same simulated paths
SecurityAttributes stockAttrib = (SecurityAttributes) iter.next();
simPricesStock = getSimPricesForStock(stockAttrib);
writeReducerOutput(stockAttrib, simPricesStock, context);
…
bsmp = new BlackScholesMertonPricingOption();
while (iter.hasNext()) {
  SecurityAttributes secAttribs = iter.next();
  writeReducerOutput(secAttribs,
      getSimPricesForOptions(simPricesStock, bsmp, secAttribs), context);
}
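The body of getSimPricesForStock is not shown. One common choice for the random walk (an assumption here, not confirmed by the deck) is a one-step geometric Brownian motion per simulated path; all names below are ours:

```java
import java.util.Random;

public class RandomWalkSketch {

    // One-step geometric Brownian motion: simulate nSims terminal prices
    // over horizon t (in years) from spot s0, drift mu and volatility sigma:
    //   S_T = S_0 * exp((mu - sigma^2/2) * t + sigma * sqrt(t) * Z)
    static double[] simPricesForStock(double s0, double mu, double sigma,
                                      double t, int nSims, long seed) {
        Random rng = new Random(seed);
        double[] prices = new double[nSims];
        double drift = (mu - 0.5 * sigma * sigma) * t;
        double vol = sigma * Math.sqrt(t);
        for (int i = 0; i < nSims; i++) {
            prices[i] = s0 * Math.exp(drift + vol * rng.nextGaussian());
        }
        return prices;
    }

    public static void main(String[] args) {
        // 10k simulated 5-day-ahead prices for a stock trading at 191.23
        double[] prices = simPricesForStock(191.23, 0.05, 0.2, 5.0 / 252, 10_000, 7L);
        System.out.printf("first simulated price: %.2f%n", prices[0]);
    }
}
```

Each simulated stock price then feeds the Black-Scholes-Merton pricer for the associated options, as in the reducer above.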

Page 17: Risk Management Using Hadoop

Aggregation Step - MapReduce

MAP
- Read through de-normalized portfolio data
- Emit (K,V) as (Hierarchy-level, PositionDetails), e.g.
  US, [IBM, 225, 191.23]
  US|Tech, [IBM, 400, 191.23]
  US|Tech|Eric, [IBM, 400, 191.23]

REDUCE
• For the hierarchy level (e.g. US|ERIC), perform ∑AiVi for each simulation to get the simulated portfolio values HLVi
• Sort HLVi, find the 1%, 5% and 10% values, and emit position and VaR data


protected void map(LongWritable key, HoldingWritable value, Context context)
    throws java.io.IOException, InterruptedException {
  SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
  // Emit the position once for every hierarchy level it rolls up to
  Set<String> hierarchyLevels = sa.getHierarchyLevels();
  for (String hierarchyLevel : hierarchyLevels) {
    context.write(new Text(hierarchyLevel), new Text(sa.getPositionDtls()));
  }
}

Map<String, Double> portfolioPositionData = combineInputForPFPositionData(rows);
Map<String, Double[]> simulatedPrices =
    loadSimulatedPrices(portfolioPositionData.keySet());
// Compute one simulated portfolio value per simulation
for (long i = 0; i < NO_OF_SIMULATIONS; i++) {
  simulatedPFValues.add(getPFSimulatedValue(i, portfolioPositionData, simulatedPrices));
}
Collections.sort(simulatedPFValues);
emitResults(portfolioPositionData, simulatedPFValues);
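getPFSimulatedValue is the ∑AiVi step for one simulation: multiply each position's quantity by that instrument's simulated price and sum. A stand-alone sketch (names and signature are ours, not the deck's):

```java
import java.util.Map;

public class PortfolioValueSketch {

    // HLV for simulation `sim`: sum over positions of quantity Ai times
    // simulated price Vi. positionQty maps instrument -> quantity;
    // simPrices maps instrument -> simulated prices, indexed by simulation.
    static double pfSimulatedValue(int sim, Map<String, Double> positionQty,
                                   Map<String, double[]> simPrices) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : positionQty.entrySet()) {
            total += e.getValue() * simPrices.get(e.getKey())[sim];
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> qty = Map.of("IBM", 400.0, "IBM.CO.DEC14.225", 10.0);
        Map<String, double[]> sims = Map.of(
                "IBM", new double[]{190.0, 195.0},
                "IBM.CO.DEC14.225", new double[]{1.5, 2.0});
        System.out.println(pfSimulatedValue(0, qty, sims)); // 400*190 + 10*1.5 = 76015.0
    }
}
```

Repeating this for all 10k simulations, sorting, and reading off the 1%, 5% and 10% entries yields the hierarchy-level VaR figures the reducer emits.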

Page 18: Risk Management Using Hadoop

DEMO RUN

Page 19: Risk Management Using Hadoop

Observations

• As expected, processing time of Map jobs increased only marginally when the input data volume was increased

• The process was IO-bound on the Simulation step's Reduce job, as the intermediate data emitted was huge

• The data replication factor needs to be chosen carefully

• MapReduce jobs should be designed such that Map/Reduce output is not huge

Page 20: Risk Management Using Hadoop

Questions?

Page 21: Risk Management Using Hadoop

Thank You!

Page 22: Risk Management Using Hadoop

Appendix

Page 23: Risk Management Using Hadoop

Page 24: Risk Management Using Hadoop

Page 25: Risk Management Using Hadoop

Let's Build a Simple MapReduce Job
Problem Statement: Across a huge set of documents, we need to find all locations (i.e. document, page, line) for all words having more than 10 characters.

[Diagram: Store and Map steps – documents stored (STORAGE) across DATA NODE 1 and DATA NODE 2, with Map tasks running where the data is stored]
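The map logic for this problem can be sketched in a single process; names here are hypothetical, and the shuffle/reduce grouping that Hadoop would perform across nodes is simulated with an in-memory map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LongWordLocations {

    // Map phase: for each input line, emit (word, "doc:page:line") for
    // every word longer than 10 characters. The shared `out` map stands in
    // for the framework's grouping of values by key.
    static void map(String doc, int page, int line, String text,
                    Map<String, List<String>> out) {
        for (String word : text.split("\\s+")) {
            if (word.length() > 10) {
                out.computeIfAbsent(word, k -> new ArrayList<>())
                   .add(doc + ":" + page + ":" + line);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> locations = new HashMap<>();
        map("doc1", 3, 12, "risk management diversification example", locations);
        map("doc2", 1, 4, "diversification again", locations);
        System.out.println(locations.get("diversification")); // [doc1:3:12, doc2:1:4]
    }
}
```

In a real job, each mapper would see only the document blocks stored on its own node, and the identity-style reduce would simply concatenate the location lists per word.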

Page 26: Risk Management Using Hadoop