Welcome to Redefining Perspectives November 2012
© COPYRIGHT 2012 SAPIENT CORPORATION | CONFIDENTIAL
Capital Markets Risk Management and Hadoop
Kevin Samborn and Nitin Agrawal
Agenda
• Risk Management
• Hadoop
• Monte Carlo VaR Implementation
• Q & A
Risk Management
What is Risk Management?
• Risk is a tool – the goal is to optimize and understand risk
o Too much risk is locally and systemically dangerous
o Too little risk means the firm may be “leaving profit on the table”
• Portfolio exposure
o Modern portfolios contain many different types of assets
o Simple instruments, complex instruments, and derivatives
• Many types of risk measures
o Defined scenario-based stress testing
o Value at Risk (VaR)
o “Sensitivities”
• Key is valuation under different scenarios
• VaR is used in banking regulations, margin calculations and risk management
Value at Risk (VaR)
• VaR is a statistical measure of risk, expressed as an amount of loss at a given probability. E.g. there is a 97.5% chance that the firm will not lose more than USD 1 million over the next 5 days
• Computing VaR is challenging: it is both a data-sourcing and a compute-intensive process
• VaR calculation:
o Generate statistical scenarios of market behavior
o Revalue the portfolio for each scenario, compare returns to today’s value
o Sort results and select the desired percentage return: VALUE AT RISK
• Different VaR techniques:
o Parametric – analytic approximation
o Historical – captures real (historical) market dynamics
o Monte Carlo – many scenarios, depends on statistical distributions
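The three calculation steps above can be sketched in plain Java for the Monte Carlo technique. This is a hedged illustration, not the presenters' implementation: the normal return distribution, its parameters, the scenario count and the method name are all assumptions for the example.

```java
import java.util.Arrays;
import java.util.Random;

public class MonteCarloVarSketch {

    // Returns VaR at the given confidence level (e.g. 0.975) as a positive loss amount.
    static double monteCarloVar(double portfolioValue, double mu, double sigma,
                                int scenarios, double confidence, long seed) {
        Random rng = new Random(seed);
        double[] pnl = new double[scenarios];
        for (int i = 0; i < scenarios; i++) {
            // Step 1: generate a statistical scenario (assumed normal daily return)
            double simulatedReturn = mu + sigma * rng.nextGaussian();
            // Step 2: revalue the portfolio and compare to today's value
            pnl[i] = portfolioValue * simulatedReturn;
        }
        // Step 3: sort results and select the desired percentile
        Arrays.sort(pnl);
        int index = (int) Math.floor((1.0 - confidence) * scenarios);
        return -pnl[index]; // report the loss as a positive number
    }

    public static void main(String[] args) {
        // 97.5% 1-day VaR of a 1m portfolio with 1% daily volatility
        System.out.println(monteCarloVar(1_000_000, 0.0, 0.01, 100_000, 0.975, 42L));
    }
}
```

With a 1% daily volatility the 2.5% tail of the normal distribution sits near 1.96 standard deviations, so the sketch reports a VaR of roughly 19,600.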
VaR Graphically
Source: An Introduction To Value at Risk (VAR), Investopedia, May 2010
Complexities
• For modern financial firms, VaR is complex. Calculation requirements:
o Different types of assets require different valuation models
• Risk-based approach
• Full revaluation
o With large numbers of scenarios, many thousands of calculations are required
o Monte Carlo simulations require significant calibration, which depends on large amounts of historical data
• Many different reporting dimensions
o VaR is not additive across dimensions (e.g. product/asset class, currency)
o Portfolio – including “what-if” and intraday activity
• Intraday market changes requiring new simulations
• Incremental VaR – how does a single (new) trade contribute to the total?
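A tiny numeric illustration of why VaR cannot simply be summed across reporting dimensions (the scenario P&Ls below are made up): the worst scenarios of two books rarely coincide, so the combined worst loss differs from the sum of the parts.

```java
import java.util.Arrays;

public class VarNotAdditive {

    // Worst-case loss across scenarios, as a positive number (a crude "100% VaR").
    static double worstLoss(double[] pnl) {
        return -Arrays.stream(pnl).min().getAsDouble();
    }

    public static void main(String[] args) {
        double[] deskA = {-10, 2, 3, -1, 5};   // hypothetical scenario P&Ls, desk A
        double[] deskB = {4, -8, 1, 2, -3};    // hypothetical scenario P&Ls, desk B
        double[] combined = new double[deskA.length];
        for (int i = 0; i < deskA.length; i++) {
            combined[i] = deskA[i] + deskB[i]; // same scenario, both desks
        }

        System.out.println(worstLoss(deskA) + worstLoss(deskB)); // 18.0
        System.out.println(worstLoss(combined));                 // 6.0 — not the sum
    }
}
```

This is why each reporting dimension needs its own aggregation pass over the simulated values rather than a roll-up of child VaR numbers.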
Backtesting VaR
Hadoop Core
• Data stored with REDUNDANCY on a Distributed File System
• Abstracts H/W FAILURES, delivering a highly-available service on COMMODITY H/W
• SCALES OUT from a single node to thousands of nodes
• Data stored WITHOUT A SCHEMA
• Tuned for SEQUENTIAL DATA ACCESS
• Provides an EASY ABSTRACTION for processing large data sets
• Infrastructure for PARALLEL DATA PROCESSING across a huge commodity cluster
• Infrastructure for TASK and LOAD MANAGEMENT
• Framework achieves DATA-PROCESS LOCALITY
It makes two critical assumptions, though:
• Data doesn’t need to be updated
• Data doesn’t need to be accessed randomly
A Simple Map Reduce Job
Problem Statement: From historical price data, create a frequency distribution of 1-day percentage changes for various stocks
Sample input (historical prices):

Stock  Date    Open    Close
BP     23-Nov  435.25  435.5
NXT    23-Nov  3598    3620
MKS    23-Nov  378.5   380.7
BP     22-Nov  434.8   433.6
NXT    22-Nov  3579    3603
MKS    22-Nov  377.8   378
BP     21-Nov  430.75  433
NXT    21-Nov  3574    3582
MKS    21-Nov  375     376
BP     20-Nov  430.9   432.25
NXT    20-Nov  3592    3600
MKS    20-Nov  373.7   375.3
BP     19-Nov  422.5   431.6
NXT    19-Nov  3560    3600
MKS    19-Nov  368.5   372.6
BP     16-Nov  423.9   416.6
NXT    16-Nov  3575    3542
MKS    16-Nov  370.3   366.4
BP     15-Nov  422     425.4
NXT    15-Nov  3596    3550
MKS    15-Nov  376.5   370.6

[Diagram: input splits feed Map tasks 1…M; their output is SORT/SHUFFLEd to Reduce tasks 1…N, each of which writes per-stock frequency counts to its own output file, e.g.
BP|1, 33   BP|2, 64 …
NXT|81, 2   NXT|-20, 5 …]
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
    context.write(new Text(sa.getTicker()),
                  new IntWritable(sa.getPercentChange()));
}
public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    Map<Integer, Long> freqDist = buildFreqDistribution(values);
    Set<Integer> percentChanges = freqDist.keySet();
    for (Integer percentChange : percentChanges) {
        context.write(new Text(key.toString() + "|" + percentChange.toString()),
                      new LongWritable(freqDist.get(percentChange)));
    }
}
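RecordsReadHelper and SecurityAttributes are not shown in the deck; a plain-Java sketch of the per-record logic the mapper depends on — bucketing a day's open→close move into an integer percent change — might look like this (the rounding rule is an assumption):

```java
public class PercentChangeSketch {

    // Bucket a day's move into an integer percent change (assumed rounding rule).
    static int percentChange(double open, double close) {
        return (int) Math.round(100.0 * (close - open) / open);
    }

    public static void main(String[] args) {
        // Rows from the sample table above: BP on 23-Nov and on 16-Nov
        System.out.println(percentChange(435.25, 435.5)); // 0
        System.out.println(percentChange(423.9, 416.6));  // -2
    }
}
```

The reducer then only has to count how often each integer bucket occurs per ticker, which is what the frequency-distribution output shows.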
Hadoop Ecosystem | How/Where These Fit
[Diagram: USERS and source systems LOAD data into STORAGE via tools such as Sqoop, hiho, Scribe and Flume; PROCESSING runs on the cluster; HUE and ZooKeeper provide SUPPORT; results feed a DATA WAREHOUSE and VISUALIZATION TOOLS]
Monte Carlo VaR Implementation
Monte Carlo VaR
Challenges:
• Daily trade data could be massive
• Valuations are compute intensive
• VaR is not a simple arithmetic sum across hierarchies

2 steps:
1. SIMULATION – for each underlyer (e.g. IBM, MSFT) and its derivatives (e.g. IBM.CO), generate 10,000 simulated prices V1, V2, … V10,000
2. AGGREGATION – for each hierarchy level, compute the simulated portfolio values
   HLV1 = (∑AiVi)1
   HLV2 = (∑AiVi)2
   …
   HLV10k = (∑AiVi)10k
Simulation Step – MapReduce

MAP
- Read through portfolio data
- Emit (K,V) as (Underlyer, InstrumentDetails), e.g. (IBM, IBM.CO.DEC14.225)

REDUCE
- For the underlyer, perform 10k random walks in parallel
- For each random walk output, simulate derivative prices
- Emit 10k sets of simulated prices of the stock and its associated derivatives, i.e.
  IBM, [V1, V2, … V10000]
  IBM.CO.DEC14.225, [V1, V2, … V10000]

Job job = new Job(getConf());
job.setJobName("RandomValuationGenerator");
job.setMapperClass(SecurityAttributeMapper.class);
job.setReducerClass(PriceSimulationsReducer.class);

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
    context.write(new Text(sa.getUnderlyer()), sa);
}

// In the reducer: the first record for an underlyer is the stock itself,
// the remaining records are its derivatives
SecurityAttributes stockAttrib = (SecurityAttributes) iter.next();
simPricesStock = getSimPricesForStock(stockAttrib);
writeReducerOutput(stockAttrib, simPricesStock, context);
…
bsmp = new BlackScholesMertonPricingOption();
while (iter.hasNext()) {
    SecurityAttributes secAttribs = iter.next();
    writeReducerOutput(secAttribs,
        getSimPricesForOptions(simPricesStock, bsmp, secAttribs), context);
}
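The pricing helpers (getSimPricesForStock, BlackScholesMertonPricingOption) are not shown in the deck. One common way the 10k random walks for the stock leg could be generated is a one-step geometric Brownian motion; the sketch below assumes that model, with illustrative drift and volatility parameters.

```java
import java.util.Random;

public class RandomWalkSketch {

    // One-step GBM: S = S0 * exp((mu - sigma^2/2) * dt + sigma * sqrt(dt) * z)
    static double[] simulateSpots(double s0, double mu, double sigma,
                                  double dt, int paths, long seed) {
        Random rng = new Random(seed);
        double[] spots = new double[paths];
        for (int i = 0; i < paths; i++) {
            double z = rng.nextGaussian(); // standard normal shock
            spots[i] = s0 * Math.exp((mu - 0.5 * sigma * sigma) * dt
                                     + sigma * Math.sqrt(dt) * z);
        }
        return spots;
    }

    public static void main(String[] args) {
        // 10,000 one-day simulated spots for a stock at 190 (illustrative parameters)
        double[] spots = simulateSpots(190.0, 0.05, 0.25, 1.0 / 252, 10_000, 7L);
        System.out.println(spots.length);
    }
}
```

Each simulated spot would then be fed through a Black-Scholes-style formula to produce the corresponding derivative price, giving the [V1, V2, … V10000] vectors the reduce step emits.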
Aggregation Step – MapReduce

MAP
- Read through de-normalized portfolio data
- Emit (K,V) as (Hierarchy-level, PositionDetails), e.g.
  US, [IBM, 225, 191.23]
  US|Tech, [IBM, 400, 191.23]
  US|Tech|Eric, [IBM, 400, 191.23]

REDUCE
• For the hierarchy level (e.g. US|Tech|Eric), perform ∑AiVi for each simulation to get the simulated portfolio values HLVi
• Sort the HLVi, find the 1%, 5% and 10% values, and emit position and VaR data

protected void map(LongWritable key, HoldingWritable value, Context context)
        throws java.io.IOException, InterruptedException {
    SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
    Set<String> hierarchyLevels = sa.getHierarchyLevels();
    for (String hierarchyLevel : hierarchyLevels) {
        context.write(new Text(hierarchyLevel), new Text(sa.getPositionDtls()));
    }
}

// In the reducer: combine position rows, load the simulated prices,
// value the portfolio under every simulation, then sort the results
Map<String, Double> portfolioPositionData = combineInputForPFPositionData(rows);
Map<String, Double[]> simulatedPrices =
    loadSimulatedPrices(portfolioPositionData.keySet());
for (long i = 0; i < NO_OF_SIMULATIONS; i++) {
    simulatedPFValues.add(getPFSimulatedValue(i, portfolioPositionData, simulatedPrices));
}
Collections.sort(simulatedPFValues);
emitResults(portfolioPositionData, simulatedPFValues);
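The ∑AiVi-per-simulation step of the reduce can be sketched without Hadoop: for each simulation i, sum quantity × simulated price over all positions, then sort the resulting portfolio values so the tail percentiles can be read off. The quantities and price matrix below are made up for illustration.

```java
import java.util.Arrays;

public class AggregationSketch {

    // quantities: A_j per position; simPrices[j][i]: price of instrument j in simulation i.
    // Returns the sorted simulated hierarchy-level values HLV_1 … HLV_n.
    static double[] hierarchyLevelValues(double[] quantities, double[][] simPrices) {
        int sims = simPrices[0].length;
        double[] hlv = new double[sims];
        for (int i = 0; i < sims; i++) {
            double sum = 0;
            for (int j = 0; j < quantities.length; j++) {
                sum += quantities[j] * simPrices[j][i]; // ∑ A_j * V_j for simulation i
            }
            hlv[i] = sum;
        }
        Arrays.sort(hlv); // the 1%/5%/10% points of this array are the VaR reads
        return hlv;
    }

    public static void main(String[] args) {
        double[] qty = {100, -50}; // long 100 of one instrument, short 50 of another
        double[][] prices = {{10, 11, 9, 12}, {20, 19, 21, 18}};
        System.out.println(Arrays.toString(hierarchyLevelValues(qty, prices)));
        // [-150.0, 0.0, 150.0, 300.0]
    }
}
```

Because every hierarchy level repeats this sum over the same simulations, the map side fans each position out to all of its hierarchy keys rather than trying to roll VaR numbers up afterwards.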
DEMO RUN
Observations
• As expected, processing time of Map jobs increased only marginally when the input data volume was increased
• The process was IO-bound on the Simulation step’s Reduce job, as the intermediate data emitted was huge
• The data replication factor needs to be chosen carefully
• MapReduce jobs should be designed so that Map/Reduce output is not huge
Questions?
Thank You!
Appendix
Let’s build a Simple Map Reduce Job
Problem Statement: Across a huge set of documents, we need to find all locations (i.e. document, page, line) for all words having more than 10 characters.
[Diagram: documents stored as blocks across DATA NODE 1 and DATA NODE 2 (STORAGE), with Map tasks reading the stored blocks]
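A plain-Java sketch of the appendix exercise (the document name and location format are made up; the MapReduce version would emit (word, location) pairs from the map and concatenate the locations per word in the reduce):

```java
import java.util.ArrayList;
import java.util.List;

public class LongWordLocator {

    // Scan the lines of one document and record a location string
    // for every word longer than 10 characters.
    static List<String> locateLongWords(String doc, String[] lines) {
        List<String> hits = new ArrayList<>();
        for (int lineNo = 0; lineNo < lines.length; lineNo++) {
            for (String word : lines[lineNo].split("\\s+")) {
                if (word.length() > 10) {
                    hits.add(word + " -> " + doc + ":line " + (lineNo + 1));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] lines = {"risk management and parallelization",
                          "short words only"};
        System.out.println(locateLongWords("doc1", lines));
        // [parallelization -> doc1:line 1]
    }
}
```

In the distributed version this per-line logic is exactly what each map task runs against the blocks stored on its own data node, which is the data-process locality the Hadoop Core slide describes.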