Andre Langevin, March 9th 2016

Building Wall St Risk Systems with Apache Geode




Wall Street Derivative Risk Solutions Using Apache Geode (Incubating)

Design Whiteboards for Solution Architects

Design Pattern

Event-based cross-product risk system using Geode

A Crash Course in Wall Street Trading

•  Big Wall Street firms have “FICC” trading businesses organized by market:
   •  Each business trades both “cash” and derivative products, but individual traders specialize in one or the other
   •  There may be a team of traders working on a single product

•  Trading systems are product-specific and often highly specialized:
   •  A firm may have up to 50 different booking points for transactions
   •  Multiple instances of the same trading system are deployed in different regions
   •  Electronic markets mean there are often external booking points to consolidate

•  Managing these businesses requires a consolidated view of risk:
   •  Risk factors span products and markets – it is not sufficient to look at risk by trade or book alone
   •  Risk measures must be both fast and detailed to be relevant on the trading floor
   •  Desk heads aggregate risk from individual trades to stay within desk risk limits
   •  Business heads aggregate risk across desks to stay within the firm’s risk appetite and regulatory limits

FICC: “Fixed Income, Commodities & Currencies”

Calculating Risk

•  What is the “risk” that we are trying to measure?
   •  Trades are valued by discounting their estimated future cash flows
   •  Discount factors are based on observable market data
   •  Movements in markets can change the value of your trades
   •  “Trade risk” is the sensitivity of each trade to changes in market data

•  Markets are represented using curves:
   •  A curve is defined using observable rates and prices and then “built” into a smooth, consistent “zero curve” using interpolation

•  Consistency is paramount:
   •  Most firms have a proprietary math library used for valuation and risk
   •  Use the same market data in all calculations to avoid basis differences
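The curve-building idea above can be illustrated with a minimal sketch. A real curve builder bootstraps discount factors from instrument prices and uses spline interpolation; this version, with illustrative names, just linearly interpolates zero rates between observable points and derives continuously compounded discount factors:

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal zero-curve sketch: observable tenor -> zero rate points,
// interpolated into a smooth, consistent curve.
public class ZeroCurve {
    private final TreeMap<Double, Double> points = new TreeMap<>();

    // Add an observable point (tenor in years, zero rate as a decimal).
    public void addPoint(double tenorYears, double zeroRate) {
        points.put(tenorYears, zeroRate);
    }

    // Linear interpolation between the bracketing observable points.
    public double zeroRate(double t) {
        Map.Entry<Double, Double> lo = points.floorEntry(t);
        Map.Entry<Double, Double> hi = points.ceilingEntry(t);
        if (lo == null) return hi.getValue();
        if (hi == null || lo.getKey().equals(hi.getKey())) return lo.getValue();
        double w = (t - lo.getKey()) / (hi.getKey() - lo.getKey());
        return lo.getValue() + w * (hi.getValue() - lo.getValue());
    }

    // Discount factor implied by the continuously compounded zero rate.
    public double discountFactor(double t) {
        return Math.exp(-zeroRate(t) * t);
    }
}
```

Because every trade is discounted off the same built curve, two desks valuing the same cash flow get the same number – which is the consistency point the slide is making.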

Technology Solutions that Work Badly

•  The easiest thing to do is just book all of your trades in one trading system!
   •  Trading systems are product-specific for many very good reasons, so this idea is a non-starter

•  How about booking all of the hedges into the primary trading system?
   •  Cash product systems can’t price derivatives, so you have to invent simple “proxies” for them
   •  You have to build live feeds from one trading system into another – or book duplicates by hand
   •  The back office has to remove the duplicates before settlement and accounting

•  How about adding up all of the risk from each trading system into a report?
   •  It is almost impossible to make the valuations consistent across systems:
      •  Different yield curve definitions, and different market data sources feeding the curves
      •  Different math libraries, and often a technology challenge to make them run fast enough
      •  Different calculation methodologies (is relative risk up or down?)
   •  It is difficult to achieve the speed needed to accurately compute hedge requirements

Cash Products: securities that are settled immediately for a cash payment, such as stocks and bonds.

Filling in the Details of the Design

Event-based cross-product risk system using Geode

PDX Integration Glue

•  PDX serialization is an effective cross-language bridge:
   •  PDX data objects bridge solution components in Java, C# and C++
   •  Avoid language-specific data types (e.g. C# date types) that don’t have equivalents in other languages

•  Structure PDX objects to optimize performance:
   •  You may want to externalize sub-objects or lists into separate objects
   •  Balance speed of lookup with memory consumption
   •  Consider cluster locality

•  JSON is a good message format:
   •  PDX natively handles JSON, but not XML
   •  C# works well with JSON, so the calculation engine and the downstream consumers should consume it easily
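A small sketch of the language-neutral payload idea: dates travel as ISO-8601 strings rather than language-specific date types, so Java, C# and C++ consumers can all parse the same JSON before it lands in PDX (Geode's `JSONFormatter` can turn such JSON into PDX instances). The field names here are illustrative, not a real trade schema:

```java
import java.time.LocalDate;

// Sketch of a language-neutral trade message. Dates are serialized as
// ISO-8601 strings, avoiding C#/Java date types that have no equivalent
// in the other languages. Field names are illustrative assumptions.
public class TradeMessage {
    public static String toJson(String tradeId, String curveKey,
                                double notional, LocalDate maturity) {
        // LocalDate.toString() emits ISO-8601 (yyyy-MM-dd).
        return String.format(
            "{\"tradeId\":\"%s\",\"curve\":\"%s\",\"notional\":%s,\"maturity\":\"%s\"}",
            tradeId, curveKey, notional, maturity);
    }
}
```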

Designing and Naming Data Objects

•  The trade data model serves two distinct purposes:
   •  Descriptive data is only used for aggregation and viewing
   •  Model parameters are only needed to calculate risk
   •  The model can be split into two regions to optimize performance

•  Market data should follow the calculation design:
   •  Model the data to align with the calculation engine’s math library, reducing format conversions downstream

•  Use “dot” notation to give business-friendly keys to objects:
   •  Create compound keys like “USD.LIBOR.3M” and “USD.LIBOR.6M” that business users can “guess” easily – this promotes use of Geode data in secondary applications and spreadsheets
   •  Values in the “dot” name are repeated as attributes of the object
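A minimal sketch of such a compound key, showing the convention that the key is assembled from attributes that are also repeated on the object itself (class and field names are illustrative):

```java
// "Dot" notation compound key: the key string is derived from business
// attributes, so users can guess keys like "USD.LIBOR.3M" in
// spreadsheets and secondary applications.
public class MarketDataKey {
    public final String currency;
    public final String index;
    public final String tenor;

    public MarketDataKey(String currency, String index, String tenor) {
        this.currency = currency;
        this.index = index;
        this.tenor = tenor;
    }

    // The key values are repeated as attributes of the object (above),
    // so the object remains self-describing when fetched by key.
    public String key() {
        return currency + "." + index + "." + tenor;
    }
}
```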

Region Design

•  Trade and market data regions:
   •  Both may be high velocity, but with a low number of contributors
   •  Curve definitions are updated slowly but used constantly
   •  A curve typically embeds a list of rates – leave it denormalized if the rates are updated slowly
   •  If the calculation engine supports it, create a second region to cache built interest rate and credit curves (building a credit curve is 80% of the valuation time for a credit default swap)
   •  Consider splitting model parameters from descriptive data to reduce the amount of data flowing to the compute grid
   •  Foreign exchange quotes are typically small and updated daily
   •  Interest rates change slowly and are referenced constantly

•  Computational results and aggregation:
   •  Risk results will be the largest and highest-traffic region
   •  Pre-aggregate risk inside Geode to support lower-powered consumers (e.g. web pages)
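The pre-aggregation step can be sketched as follows. In production this would run server-side (for example in a Geode function over the results region) so consumers read one small rolled-up entry per desk; here a plain map stands in for the region, and the class and measure names ("dv01") are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of pre-aggregating trade-level risk up to desk level, so that
// low-powered consumers (web pages) never scan the full results region.
public class RiskAggregator {
    public static class TradeRisk {
        public final String desk;
        public final double dv01; // rate sensitivity per trade
        public TradeRisk(String desk, double dv01) {
            this.desk = desk;
            this.dv01 = dv01;
        }
    }

    // Roll trade-level results (tradeId -> risk) up to desk totals.
    public static Map<String, Double> byDesk(Map<String, TradeRisk> results) {
        Map<String, Double> totals = new HashMap<>();
        for (TradeRisk r : results.values()) {
            totals.merge(r.desk, r.dv01, Double::sum);
        }
        return totals;
    }
}
```

Desk heads read the small aggregate to check desk limits; business heads can apply the same roll-up one level higher across desks.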

Region Placement On the Geode Cluster

•  Region placement optimizes the solution’s performance:
   •  Consider placement of market data and trades holistically to make the risk calculation efficient – keep all related data on one machine

•  Partition the trade regions to balance the cluster:
   •  Partition the trade region to maximize parallel execution during compute
   •  Use a business term (e.g. valuation curve, currency, industry) that can be used to partition both the trade and market data sets

•  Partition or replicate market data to optimize computations:
   •  Replicate interest rates and foreign exchange rates to all nodes
   •  Replicate or partition curve data to maximize collocation of trades with their market data, minimizing cross-member network traffic
   •  When using an external compute grid, this technique should also be applied to the local Geode cache on the compute grid
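The collocation rule above can be sketched without a cluster: route both a trade and its market data by the same business term (the valuation curve) so they hash to the same bucket, and hence the same member. Geode expresses this with the `PartitionResolver` interface; this standalone version, with an assumed bucket count, shows only the routing rule:

```java
// Sketch of the routing rule behind Geode data colocation: the trade
// region and the curve region both route by valuation curve name, so a
// trade and its curve land in the same bucket (and member). In Geode
// this logic lives in a PartitionResolver; the bucket count here is an
// illustrative assumption.
public class CurveRouting {
    static final int PARTITIONS = 16;

    public static int bucketFor(String valuationCurve) {
        return Math.abs(valuationCurve.hashCode() % PARTITIONS);
    }
}
```

Because the function is deterministic, a trade valued off "USD.LIBOR.3M" always lands in the same bucket as the "USD.LIBOR.3M" curve entry, which is what keeps the risk calculation local to one machine.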

Getting Trade Data into Geode

•  Message formats vary by product type:
   •  OTC derivatives are usually captured in XML documents
   •  Bond trading systems use FIX or similar formats (e.g. TOMS)
   •  Legacy trading systems use proprietary formats

•  Broker messages in an application server:
   •  A transactional message consumer is the best pattern
   •  XML-to-object parsing tools are readily available

•  Trade data capture is transactional:
   •  Best practice is to make the end-to-end process a transaction, but it may need to be split into two legs based on the source of the messages
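The XML-to-object leg can be sketched with the JDK's built-in DOM parser. The `<trade>` message shape below is illustrative, not a real OTC derivative schema (real feeds would use something like FpML):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Sketch of parsing an inbound XML trade message before putting the
// resulting object into a Geode region. Schema is an assumption.
public class TradeParser {
    public static String parseTradeId(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement()
                      .getElementsByTagName("tradeId").item(0).getTextContent();
        } catch (Exception e) {
            // In the transactional-consumer pattern, a parse failure would
            // roll the message back onto the queue rather than lose it.
            throw new RuntimeException(e);
        }
    }
}
```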

Getting Market Data into Geode

•  Market data feeds have many proprietary formats

•  Market data is often exceptionally fast moving:
   •  Foreign exchange quotes for the major currency pairs can reach 70,000 messages/second

•  Market data can also be very slow moving:
   •  Rate fixings like LIBOR are published once per day
   •  Illiquid securities may not be quoted daily

•  Conflate fast market data by sampling:
   •  Discard inbound ticks that don’t move the market sufficiently
   •  Sample down to a rate that your compute farm can accommodate
   •  An external client is required to conflate within the message queue

•  Gate market data into batches:
   •  Push a complete update of all market data at pre-determined intervals
   •  Day open and close by trading location (NY, London, Hong Kong)
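The conflation-by-sampling rule can be sketched in a few lines: forward a tick only when it moves the quote by more than a threshold, discarding the rest before they reach the cluster. The threshold value is an illustrative assumption to be tuned to what the compute farm can absorb:

```java
// Sketch of conflating a fast market data feed (e.g. 70,000 FX
// messages/second) by discarding ticks that don't move the market
// sufficiently. This would run in the external client that conflates
// within the message queue.
public class TickConflator {
    private final double minMove;   // minimum move worth forwarding
    private Double lastForwarded;   // last quote pushed downstream

    public TickConflator(double minMove) {
        this.minMove = minMove;
    }

    // Returns true if this tick should be forwarded to the cluster.
    public boolean accept(double quote) {
        if (lastForwarded == null || Math.abs(quote - lastForwarded) >= minMove) {
            lastForwarded = quote;
            return true;
        }
        return false; // conflated away
    }
}
```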

Crunching Numbers on a Shared Grid

•  Most trading firms have a proprietary math library:
   •  Developed by internal quantitative teams to ensure consistency
   •  Usually coded in C++ or C# to take advantage of Intel compute grids

•  Pushing Geode events to an external compute grid:
   •  A typical compute grid has a “head node” or “broker”
   •  Use a client-side Asynchronous Event Queue (“AEQ”) to collect events for the grid’s broker to process
   •  Stateless grid nodes can synchronously put results back to Geode regions to ensure results are captured

•  Caching locally on the grid to accelerate performance:
   •  Grid nodes can use Geode client-side caching proxies
   •  Use client-side region interest registration to ensure updates are pushed to grid nodes
   •  Interest registration can use wildcards on keys (see dot notation)
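The wildcard-interest idea combines naturally with the "dot" keys: a grid node valuing only USD LIBOR trades registers interest in the matching curve keys. Geode's `Region.registerInterestRegex` accepts a pattern like the one below; this standalone sketch uses a plain regex filter in place of the client cache call:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch of wildcard interest on "dot" notation keys. In Geode this
// pattern would be passed to registerInterestRegex on a client-side
// caching proxy region; here it just filters a key list.
public class KeyInterest {
    public static List<String> matching(List<String> keys, String regex) {
        Pattern p = Pattern.compile(regex);
        return keys.stream()
                   .filter(k -> p.matcher(k).matches())
                   .collect(Collectors.toList());
    }
}
```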

Crunching Numbers Inside Geode

•  Running the math inside Geode is dramatically faster:
   •  STAC Report Issue 0.99 in 2010 found that trade valuations running inside GemFire 6.3 were 76 times faster than on a traditional grid

•  Using the Geode grid as a compute grid:
   •  The math library must be coded in Java (most are C++ or C#)
   •  Try to use function parameters to define the data model
   •  There are opportunities to cache frequently used derived results

•  Using cache listeners to propagate valuation events:
   •  Use a cache listener to detect data updates in regions that contain valuation inputs (e.g. new trades, market data updates)
   •  Do not listen to “jittery” regions, such as exchange rates
   •  Encapsulate the math into functions that the cache listener can execute
   •  Ensure regions are partitioned in order to get parallel execution across the grid
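The listener-driven loop can be sketched without a cluster: a put into a region holding valuation inputs fires a callback that kicks off the math for affected trades. In Geode this is a `CacheListener`'s `afterCreate`/`afterUpdate` on a partitioned region (executing in parallel across members); here a plain observer callback stands in for it, and the "revalue" hook is an illustrative assumption:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of a listener-driven valuation trigger: every put fires the
// listener, as afterCreate/afterUpdate would on a Geode region holding
// valuation inputs (new trades, market data updates).
public class ListeningRegion {
    private final Map<String, Double> data = new HashMap<>();
    private final BiConsumer<String, Double> listener;

    public ListeningRegion(BiConsumer<String, Double> listener) {
        this.listener = listener;
    }

    public void put(String key, Double value) {
        data.put(key, value);
        listener.accept(key, value); // e.g. kick off revaluation
    }
}
```

Note the slide's caveat: you would attach this to curve and trade regions, but not to "jittery" regions like FX rates, or every tick would trigger a revaluation storm.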

Ticking Risk Views

•  Roll-your-own client applications to view ticking risk:
   •  Desktop applications can use the client libraries to receive events from the cluster using Continuous Queries, which can then be displayed in real time
   •  Server-hosted applications can use Continuous Queries or Asynchronous Event Queues

•  Integrating packaged products:
   •  Some specialty products handle streaming risk:
      •  Armanta Theatre™
      •  ION Enterprise Risk™
   •  Integrate using custom Java components

•  The traders will always want spreadsheets:
   •  Write an Excel plug-in
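The Continuous Query flow can be sketched as a registered predicate plus a callback: the client subscribes to "risk results for my desk" and the view updates whenever a matching result arrives. Geode's real API is `QueryService.newCq` with an OQL query string and a `CqListener`; this standalone version, with illustrative names, shows only the event flow:

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Sketch of a Continuous Query feeding a ticking risk view: only rows
// matching the registered predicate reach the display callback.
public class TickingView {
    public static class RiskRow {
        public final String desk;
        public final double dv01;
        public RiskRow(String desk, double dv01) {
            this.desk = desk;
            this.dv01 = dv01;
        }
    }

    private final Predicate<RiskRow> query;   // stands in for the OQL query
    private final Consumer<RiskRow> onTick;   // stands in for the CqListener

    public TickingView(Predicate<RiskRow> query, Consumer<RiskRow> onTick) {
        this.query = query;
        this.onTick = onTick;
    }

    // Called for every new risk result published by the cluster.
    public void deliver(RiskRow row) {
        if (query.test(row)) onTick.accept(row);
    }
}
```

The same pattern serves all three consumers on the slide: a desktop grid, a server-hosted aggregator, or an Excel plug-in repainting cells on each tick.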

Join the Apache Geode Community!

•  Check out: http://geode.incubator.apache.org

•  Subscribe: [email protected]

•  Download: http://geode.incubator.apache.org/releases/

Thank you!