Andre Langevin, March 9th 2016

Building Wall St Risk Systems with Apache Geode




Wall Street Derivative Risk Solutions Using Apache Geode (Incubating)

Design Whiteboards for Solution Architects

Design Pattern

Event-based cross-product risk system using Geode

A Crash Course in Wall Street Trading

•  Big Wall Street firms have “FICC” trading businesses organized by market:
   •  Each business trades both “cash” and derivative products, but individual traders specialize in one or the other
   •  There may be a team of traders working on a single product

•  Trading systems are product-specific and often highly specialized:
   •  A firm may have up to 50 different booking points for transactions
   •  Multiple instances of the same trading system are deployed in different regions
   •  Electronic markets mean there are often external booking points to consolidate

•  Managing these businesses requires a consolidated view of risk:
   •  Risk factors span products and markets – it is not sufficient to look at risk by trade or book alone
   •  Risk measures must be both fast and detailed to be relevant on the trading floor
   •  Desk heads aggregate risk from individual trades to stay within desk risk limits
   •  Business heads aggregate risk across desks to stay within the firm’s risk appetite and regulatory limits

FICC: “Fixed Income, Commodities & Currencies”

Calculating Risk

•  What is the “risk” that we are trying to measure?
   •  Trades are valued by discounting their estimated future cash flows
   •  Discount factors are based on observable market data
   •  Movements in markets can change the value of your trades
   •  “Trade risk” is the sensitivity of each trade to changes in market data

•  Markets are represented using curves:
   •  A curve is defined using observable rates and prices and then “built” into a smooth, consistent “zero curve” using interpolation

•  Consistency is paramount:
   •  Most firms have a proprietary math library used for valuation and risk
   •  Use the same market data in all calculations to avoid basis differences
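The curve-building idea above can be illustrated with a minimal sketch. A real curve builder bootstraps discount factors from instrument prices and uses spline interpolation; this version, with illustrative names, just linearly interpolates zero rates between observable points and derives continuously compounded discount factors:

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal zero-curve sketch: observable tenor -> zero rate points,
// interpolated into a smooth, consistent curve.
public class ZeroCurve {
    private final TreeMap<Double, Double> points = new TreeMap<>();

    // Add an observable point (tenor in years, zero rate as a decimal).
    public void addPoint(double tenorYears, double zeroRate) {
        points.put(tenorYears, zeroRate);
    }

    // Linear interpolation between the bracketing observable points.
    public double zeroRate(double t) {
        Map.Entry<Double, Double> lo = points.floorEntry(t);
        Map.Entry<Double, Double> hi = points.ceilingEntry(t);
        if (lo == null) return hi.getValue();
        if (hi == null || lo.getKey().equals(hi.getKey())) return lo.getValue();
        double w = (t - lo.getKey()) / (hi.getKey() - lo.getKey());
        return lo.getValue() + w * (hi.getValue() - lo.getValue());
    }

    // Discount factor implied by the continuously compounded zero rate.
    public double discountFactor(double t) {
        return Math.exp(-zeroRate(t) * t);
    }
}
```

Because every trade is discounted off the same built curve, two desks valuing the same cash flow get the same number – which is the consistency point the slide is making.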

Technology Solutions that Work Badly

•  The easiest thing to do is just book all of your trades in one trading system!
   •  Trading systems are product-specific for many very good reasons, so this idea is a non-starter

•  How about booking all of the hedges into the primary trading system?
   •  Cash product systems can’t price derivatives, so you have to invent simple “proxies” for them
   •  You have to build live feeds from one trading system into another – or book duplicates by hand
   •  The back office has to remove the duplicates before settlement and accounting

•  How about adding up all of the risk from each trading system into a report?
   •  It is almost impossible to make the valuations consistent across systems:
      •  Different yield curve definitions, and different market data sources feeding the curves
      •  Different math libraries, and often a technology challenge to make them run fast enough
      •  Different calculation methodologies (is relative risk up or down?)
   •  It is difficult to achieve the speed needed to accurately compute hedge requirements

Cash Products: securities that are settled immediately for a cash payment, such as stocks and bonds.

Filling in the Details of the Design

Event-based cross-product risk system using Geode

PDX Integration Glue

•  PDX serialization is an effective cross-language bridge:
   •  PDX data objects bridge solution components in Java, C# and C++
   •  Avoid language-specific data types (e.g. C# date types) that don’t have equivalents in other languages

•  Structure PDX objects to optimize performance:
   •  You may want to externalize sub-objects or lists into separate objects
   •  Balance speed of lookup with memory consumption
   •  Consider cluster locality

•  JSON is a good message format:
   •  PDX natively handles JSON, but not XML
   •  C# works well with JSON, so the calculation engine and the downstream consumers should consume it easily
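A small sketch of the language-neutral payload idea: dates travel as ISO-8601 strings rather than language-specific date types, so Java, C# and C++ consumers can all parse the same JSON before it lands in PDX (Geode's `JSONFormatter` can turn such JSON into PDX instances). The field names here are illustrative, not a real trade schema:

```java
import java.time.LocalDate;

// Sketch of a language-neutral trade message. Dates are serialized as
// ISO-8601 strings, avoiding C#/Java date types that have no equivalent
// in the other languages. Field names are illustrative assumptions.
public class TradeMessage {
    public static String toJson(String tradeId, String curveKey,
                                double notional, LocalDate maturity) {
        // LocalDate.toString() emits ISO-8601 (yyyy-MM-dd).
        return String.format(
            "{\"tradeId\":\"%s\",\"curve\":\"%s\",\"notional\":%s,\"maturity\":\"%s\"}",
            tradeId, curveKey, notional, maturity);
    }
}
```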

Designing and Naming Data Objects

•  The trade data model serves two distinct purposes:
   •  Descriptive data is only used for aggregation and viewing
   •  Model parameters are only needed to calculate risk
   •  The model can be split into two regions to optimize performance

•  Market data should follow the calculation design:
   •  Model the data to align with the calculation engine’s math library, reducing format conversions downstream

•  Use “dot” notation to give business-friendly keys to objects:
   •  Create compound keys like “USD.LIBOR.3M” and “USD.LIBOR.6M” that business users can “guess” easily – this promotes use of Geode data in secondary applications and spreadsheets
   •  Values in the “dot” name are repeated as attributes of the object
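A minimal sketch of such a compound key, showing the convention that the key is assembled from attributes that are also repeated on the object itself (class and field names are illustrative):

```java
// "Dot" notation compound key: the key string is derived from business
// attributes, so users can guess keys like "USD.LIBOR.3M" in
// spreadsheets and secondary applications.
public class MarketDataKey {
    public final String currency;
    public final String index;
    public final String tenor;

    public MarketDataKey(String currency, String index, String tenor) {
        this.currency = currency;
        this.index = index;
        this.tenor = tenor;
    }

    // The key values are repeated as attributes of the object (above),
    // so the object remains self-describing when fetched by key.
    public String key() {
        return currency + "." + index + "." + tenor;
    }
}
```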

Region Design

•  Trade and market data regions:
   •  Both may be high velocity, but with a low number of contributors
   •  Curve definitions are updated slowly but used constantly
   •  A curve typically embeds a list of rates – leave it denormalized if the rates are updated slowly
   •  If the calculation engine supports it, create a second region to cache built interest rate and credit curves (building a credit curve is 80% of the valuation time for a credit default swap)
   •  Consider splitting model parameters from descriptive data to reduce the amount of data flowing to the compute grid
   •  Foreign exchange quotes are typically small and updated daily
   •  Interest rates change slowly and are referenced constantly

•  Computational results and aggregation:
   •  Risk results will be the largest and highest-traffic region
   •  Pre-aggregate risk inside Geode to support lower-powered consumers (e.g. web pages)
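The pre-aggregation step can be sketched as follows. In production this would run server-side (for example in a Geode function over the results region) so consumers read one small rolled-up entry per desk; here a plain map stands in for the region, and the class and measure names ("dv01") are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of pre-aggregating trade-level risk up to desk level, so that
// low-powered consumers (web pages) never scan the full results region.
public class RiskAggregator {
    public static class TradeRisk {
        public final String desk;
        public final double dv01; // rate sensitivity per trade
        public TradeRisk(String desk, double dv01) {
            this.desk = desk;
            this.dv01 = dv01;
        }
    }

    // Roll trade-level results (tradeId -> risk) up to desk totals.
    public static Map<String, Double> byDesk(Map<String, TradeRisk> results) {
        Map<String, Double> totals = new HashMap<>();
        for (TradeRisk r : results.values()) {
            totals.merge(r.desk, r.dv01, Double::sum);
        }
        return totals;
    }
}
```

Desk heads read the small aggregate to check desk limits; business heads can apply the same roll-up one level higher across desks.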

Region Placement On the Geode Cluster

•  Region placement optimizes the solution’s performance:
   •  Consider placement of market data and trades holistically to make the risk calculation efficient – keep all related data on one machine

•  Partition the trade regions to balance the cluster:
   •  Partition the trade region to maximize parallel execution during compute
   •  Use a business term (e.g. valuation curve, currency, industry) that can be used to partition both the trade and market data sets

•  Partition or replicate market data to optimize computations:
   •  Replicate interest rates and foreign exchange rates to all nodes
   •  Replicate or partition curve data to maximize collocation of trades with their market data, minimizing cross-member network traffic
   •  When using an external compute grid, this technique should also be applied to the local Geode cache on the compute grid
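The collocation rule above can be sketched without a cluster: route both a trade and its market data by the same business term (the valuation curve) so they hash to the same bucket, and hence the same member. Geode expresses this with the `PartitionResolver` interface; this standalone version, with an assumed bucket count, shows only the routing rule:

```java
// Sketch of the routing rule behind Geode data colocation: the trade
// region and the curve region both route by valuation curve name, so a
// trade and its curve land in the same bucket (and member). In Geode
// this logic lives in a PartitionResolver; the bucket count here is an
// illustrative assumption.
public class CurveRouting {
    static final int PARTITIONS = 16;

    public static int bucketFor(String valuationCurve) {
        return Math.abs(valuationCurve.hashCode() % PARTITIONS);
    }
}
```

Because the function is deterministic, a trade valued off "USD.LIBOR.3M" always lands in the same bucket as the "USD.LIBOR.3M" curve entry, which is what keeps the risk calculation local to one machine.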

Getting Trade Data into Geode

•  Message formats vary by product type:
   •  OTC derivatives are usually captured in XML documents
   •  Bond trading systems use FIX or similar formats (e.g. TOMS)
   •  Legacy trading systems use proprietary formats

•  Broker messages in an application server:
   •  A transactional message consumer is the best pattern
   •  XML-to-object parsing tools are readily available

•  Trade data capture is transactional:
   •  Best practice is to make the end-to-end process a transaction, but it may need to be split into two legs based on the source of the messages
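The XML-to-object leg can be sketched with the JDK's built-in DOM parser. The `<trade>` message shape below is illustrative, not a real OTC derivative schema (real feeds would use something like FpML):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Sketch of parsing an inbound XML trade message before putting the
// resulting object into a Geode region. Schema is an assumption.
public class TradeParser {
    public static String parseTradeId(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement()
                      .getElementsByTagName("tradeId").item(0).getTextContent();
        } catch (Exception e) {
            // In the transactional-consumer pattern, a parse failure would
            // roll the message back onto the queue rather than lose it.
            throw new RuntimeException(e);
        }
    }
}
```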

Getting Market Data into Geode

•  Market data feeds have many proprietary formats

•  Market data is often exceptionally fast moving:
   •  Foreign exchange quotes for the major currency pairs can reach 70,000 messages/second

•  Market data can also be very slow moving:
   •  Rate fixings like LIBOR are published once per day
   •  Illiquid securities may not be quoted daily

•  Conflate fast market data by sampling:
   •  Discard inbound ticks that don’t move the market sufficiently
   •  Sample down to a rate that your compute farm can accommodate
   •  An external client is required to conflate within the message queue

•  Gate market data into batches:
   •  Push a complete update of all market data at pre-determined intervals
   •  Day open and close by trading location (NY, London, Hong Kong)
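The conflation-by-sampling rule can be sketched in a few lines: forward a tick only when it moves the quote by more than a threshold, discarding the rest before they reach the cluster. The threshold value is an illustrative assumption to be tuned to what the compute farm can absorb:

```java
// Sketch of conflating a fast market data feed (e.g. 70,000 FX
// messages/second) by discarding ticks that don't move the market
// sufficiently. This would run in the external client that conflates
// within the message queue.
public class TickConflator {
    private final double minMove;   // minimum move worth forwarding
    private Double lastForwarded;   // last quote pushed downstream

    public TickConflator(double minMove) {
        this.minMove = minMove;
    }

    // Returns true if this tick should be forwarded to the cluster.
    public boolean accept(double quote) {
        if (lastForwarded == null || Math.abs(quote - lastForwarded) >= minMove) {
            lastForwarded = quote;
            return true;
        }
        return false; // conflated away
    }
}
```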

Crunching Numbers on a Shared Grid

•  Most trading firms have a proprietary math library:
   •  Developed by internal quantitative teams to ensure consistency
   •  Usually coded in C++ or C# to take advantage of Intel compute grids

•  Pushing Geode events to an external compute grid:
   •  A typical compute grid has a “head node” or “broker”
   •  Use a client-side Asynchronous Event Queue (“AEQ”) to collect events for the grid’s broker to process
   •  Stateless grid nodes can synchronously put results back to Geode regions to ensure results are captured

•  Caching locally on the grid to accelerate performance:
   •  Grid nodes can use Geode client-side caching proxies
   •  Use client-side region interest registration to ensure updates are pushed to grid nodes
   •  Interest registration can use wildcards on keys (see dot notation)
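The wildcard-interest idea combines naturally with the "dot" keys: a grid node valuing only USD LIBOR trades registers interest in the matching curve keys. Geode's `Region.registerInterestRegex` accepts a pattern like the one below; this standalone sketch uses a plain regex filter in place of the client cache call:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch of wildcard interest on "dot" notation keys. In Geode this
// pattern would be passed to registerInterestRegex on a client-side
// caching proxy region; here it just filters a key list.
public class KeyInterest {
    public static List<String> matching(List<String> keys, String regex) {
        Pattern p = Pattern.compile(regex);
        return keys.stream()
                   .filter(k -> p.matcher(k).matches())
                   .collect(Collectors.toList());
    }
}
```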

Crunching Numbers Inside Geode

•  Running the math inside Geode is dramatically faster:
   •  STAC Report Issue 0.99 in 2010 found that trade valuations running inside GemFire 6.3 were 76 times faster than on a traditional grid

•  Using the Geode grid as a compute grid:
   •  The math library must be coded in Java (most are C++ or C#)
   •  Try to use function parameters to define the data model
   •  There are opportunities to cache frequently used derived results

•  Using cache listeners to propagate valuation events:
   •  Use a cache listener to detect data updates in regions that contain valuation inputs (e.g. new trades, market data updates)
   •  Do not listen to “jittery” regions, such as exchange rates
   •  Encapsulate the math into functions that the cache listener can execute
   •  Ensure regions are partitioned in order to get parallel execution across the grid
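The listener-driven loop can be sketched without a cluster: a put into a region holding valuation inputs fires a callback that kicks off the math for affected trades. In Geode this is a `CacheListener`'s `afterCreate`/`afterUpdate` on a partitioned region (executing in parallel across members); here a plain observer callback stands in for it, and the "revalue" hook is an illustrative assumption:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of a listener-driven valuation trigger: every put fires the
// listener, as afterCreate/afterUpdate would on a Geode region holding
// valuation inputs (new trades, market data updates).
public class ListeningRegion {
    private final Map<String, Double> data = new HashMap<>();
    private final BiConsumer<String, Double> listener;

    public ListeningRegion(BiConsumer<String, Double> listener) {
        this.listener = listener;
    }

    public void put(String key, Double value) {
        data.put(key, value);
        listener.accept(key, value); // e.g. kick off revaluation
    }
}
```

Note the slide's caveat: you would attach this to curve and trade regions, but not to "jittery" regions like FX rates, or every tick would trigger a revaluation storm.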

Ticking Risk Views

•  Roll-your-own client applications to view ticking risk:
   •  Desktop applications can use the client libraries to receive events from the cluster using Continuous Queries, which can then be displayed in real time
   •  Server-hosted applications can use Continuous Queries or Asynchronous Event Queues

•  Integrating packaged products:
   •  Some specialty products handle streaming risk:
      •  Armanta Theatre™
      •  ION Enterprise Risk™
   •  Integrate using custom Java components

•  The traders will always want spreadsheets:
   •  Write an Excel plug-in
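The Continuous Query flow can be sketched as a registered predicate plus a callback: the client subscribes to "risk results for my desk" and the view updates whenever a matching result arrives. Geode's real API is `QueryService.newCq` with an OQL query string and a `CqListener`; this standalone version, with illustrative names, shows only the event flow:

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch of a Continuous Query feeding a ticking risk view: only rows
// matching the registered predicate reach the display callback.
public class TickingView {
    public static class RiskRow {
        public final String desk;
        public final double dv01;
        public RiskRow(String desk, double dv01) {
            this.desk = desk;
            this.dv01 = dv01;
        }
    }

    private final Predicate<RiskRow> query;   // stands in for the OQL query
    private final Consumer<RiskRow> onTick;   // stands in for the CqListener

    public TickingView(Predicate<RiskRow> query, Consumer<RiskRow> onTick) {
        this.query = query;
        this.onTick = onTick;
    }

    // Called for every new risk result published by the cluster.
    public void deliver(RiskRow row) {
        if (query.test(row)) onTick.accept(row);
    }
}
```

The same pattern serves all three consumers on the slide: a desktop grid, a server-hosted aggregator, or an Excel plug-in repainting cells on each tick.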

Join the Apache Geode Community!

•  Check out: http://geode.incubator.apache.org

•  Subscribe: [email protected]

•  Download: http://geode.incubator.apache.org/releases/

Thank you!