How Capital Markets Firms Use MongoDB as a Tick Database
Matt Kalan
Sr. Solution Architect, MongoDB
Agenda
• MongoDB One Slide Overview
• FS Use Cases
• Writing/Capturing Market Data
• Reading/Analyzing Market Data
• Performance, Scalability, & High Availability
• Q&A
MongoDB Technical Benefits
• Horizontally scalable: sharding
• Agile & flexible
• High performance: indexes, RAM
• Highly available: replica sets
{ name: "John Smith", date: "2013-08-01", address: "10 3rd St.", phone: [ { home: 1234567890 }, { mobile: 1234568138 } ] }
> db.cust.insert({…})
> db.cust.find({ name: "John Smith" })
Most Common FS Use Cases
1. Tick Data Capture & Analysis
2. Reference Data Management
3. Risk Analysis & Reporting
4. Trade Repository
5. Portfolio Reporting
Writing and Capturing Tick Data
Tick Data Capture & Analysis Requirements
• Capture real-time market data (multi-asset, top of book, depth of book, even news)
• Load historical data
• Aggregate data into bars at intraday, daily, and monthly intervals
• Enable queries & analysis on raw ticks or aggregates
• Drive backtesting or automated signals
Tick Data Capture & Analysis: Why MongoDB?
• High throughput => can capture real-time feeds for all products/asset classes needed
• High scalability => all data and depth for all historical time periods can be captured
• Flexible & range-based indexing => fast querying on time ranges and any fields
• Aggregation Framework => can shape raw data into aggregates (e.g. ticks to bars)
• Map-reduce capability (native MR or Hadoop Connector) => batch analysis looking for patterns and opportunities
• Easy to use => native language drivers and JSON expressions that you can apply to most operational database needs as well
• Low TCO => low software license cost and commodity hardware
High Level Trading Architecture
[Diagram components: Exchanges/Markets/Brokers; News & social networking sources; Feed Handler; Capturing Application; Market Data; Cached Static & Aggregated Data; Low Latency Applications; Higher Latency Trading Applications; Backtesting and Analysis Applications; Orders; Trades/metrics]
Data Types
• Top of book
• Depth of book
• Multi-asset
• Derivatives (e.g. strips)
• News (text, video)
• Social networking
Top of Book [e.g. equities]
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrice: 55.37,
  offerPrice: 55.58,
  bidQuantity: 500,
  offerQuantity: 700
}
> db.ticks.find( {symbol: "DIS", bidPrice: {$gt: 55.36} } )
Depth of Book
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
  bidQuantities: [500, 1000, 2000],
  offerQuantities: [1000, 2000, 3000]
}
> db.ticks.find( {bidPrices: {$gt: 55.36} } )
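A query condition on an array field such as bidPrices matches a document if any element of the array satisfies it. The plain JavaScript sketch below models that rule in memory so the behavior can be checked without a server; matchesAny is a hypothetical helper, not part of the MongoDB API.

```javascript
// Hypothetical helper: models how MongoDB applies a scalar condition to an
// array field. The document matches if ANY element satisfies the predicate.
function matchesAny(value, predicate) {
  return Array.isArray(value) ? value.some(predicate) : predicate(value);
}

// Two sample depth-of-book documents (illustrative values).
const ticks = [
  { symbol: "DIS", bidPrices: [55.37, 55.36, 55.35] },
  { symbol: "DIS", bidPrices: [55.30, 55.29, 55.28] },
];

// In-memory analogue of db.ticks.find( {bidPrices: {$gt: 55.36} } )
const hits = ticks.filter((t) => matchesAny(t.bidPrices, (p) => p > 55.36));

console.log(hits.length); // 1: only the first book has a level above 55.36
```

The same rule explains why an index on bidPrices is a multikey index: MongoDB creates one index entry per array element.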
Or However Your App Uses It
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bids: [ {price: 55.37, amount: 500}, {price: 55.36, amount: 1000}, {price: 55.35, amount: 2000} ],
  offers: [ {price: 55.58, amount: 1000}, {price: 55.59, amount: 2000}, {price: 55.60, amount: 3000} ]
}
> db.ticks.find( {"bids.price": {$gt: 55.36} } )
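Storing each level as a subdocument keeps price and amount paired. One subtlety is worth knowing: two dotted conditions such as {"bids.price": {$gt: 55.36}, "bids.amount": {$gte: 2000}} may be satisfied by different levels of the book, whereas $elemMatch requires one level to satisfy both. The sketch below models the two behaviors in plain JavaScript; dottedMatch and elemMatch are hypothetical helper names and the book values are illustrative.

```javascript
// A depth-of-book document in the subdocument layout (illustrative values).
const book = {
  symbol: "DIS",
  bids: [
    { price: 55.37, amount: 500 },
    { price: 55.36, amount: 1000 },
    { price: 55.35, amount: 2000 },
  ],
};

// Models {"bids.price": {$gt: 55.36}, "bids.amount": {$gte: 2000}}:
// each condition is checked independently, so different levels may satisfy them.
function dottedMatch(doc) {
  return doc.bids.some((b) => b.price > 55.36) &&
         doc.bids.some((b) => b.amount >= 2000);
}

// Models {bids: {$elemMatch: {price: {$gt: 55.36}, amount: {$gte: 2000}}}}:
// a single level must satisfy both conditions.
function elemMatch(doc) {
  return doc.bids.some((b) => b.price > 55.36 && b.amount >= 2000);
}

console.log(dottedMatch(book)); // true  (55.37 level plus the 2000-lot level)
console.log(elemMatch(book));   // false (no single level has both)
```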
Synthetic Spreads
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  spreadPrice: 0.58,
  leg1: {symbol: "CLM13", price: 97.34},
  leg2: {symbol: "CLK13", price: 96.92}
}
> db.ticks.find( { "leg1.symbol": "CLM13", "leg2.symbol": "CLK13", spreadPrice: {$gt: 0.50} } )
News
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  title: "Disney Earnings…",
  body: "Walt Disney Company reported…",
  tags: ["earnings", "media", "walt disney"]
}
Social Networking
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  timestamp: ISODate("2013-02-15 10:00"),
  twitterHandle: "jdoe",
  tweet: "Heard @DisneyPictures is releasing…",
  usernamesIncluded: ["DisneyPictures"],
  hashTags: ["movierumors", "disney"]
}
Aggregates (bars, daily, etc.)
{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  symbol : "DIS",
  openTS: ISODate("2013-02-15 10:00"),
  closeTS: ISODate("2013-02-15 10:05"),
  open: 55.36,
  high: 55.80,
  low: 55.20,
  close: 55.70
}
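To build documents like this bar, each tick has to be assigned to the window between openTS and closeTS. A minimal sketch, assuming fixed-length bars aligned to UTC clock boundaries (so a 5-minute bar covers 10:00 up to, but not including, 10:05); barBounds is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the [openTS, closeTS) window of the fixed-length
// bar containing a tick timestamp. Assumes bars are aligned to UTC boundaries.
function barBounds(ts, minutes) {
  const widthMs = minutes * 60 * 1000;
  const openMs = Math.floor(ts.getTime() / widthMs) * widthMs;
  return { openTS: new Date(openMs), closeTS: new Date(openMs + widthMs) };
}

const { openTS, closeTS } = barBounds(new Date("2013-02-15T10:03:27Z"), 5);
console.log(openTS.toISOString());  // 2013-02-15T10:00:00.000Z
console.log(closeTS.toISOString()); // 2013-02-15T10:05:00.000Z
```

Daily or monthly bars usually follow an exchange's local trading calendar rather than UTC, so those need time-zone and session handling that this sketch leaves out.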
Querying/Analyzing Tick Data
Architecture for Querying Data
[Diagram: Higher Latency Trading Applications, Backtesting Applications, and Research & Analysis Applications reading ticks, bars, and other analysis]
Index Any Fields: Arrays, Nested, etc.
// ensureIndex is the 2.x shell helper; MongoDB 3.0+ shells use createIndex with the same key document
// Compound indexes
> db.ticks.ensureIndex( {symbol: 1, timestamp: 1} )
// Index on arrays
> db.ticks.ensureIndex( {bidPrices: -1} )
// Index on any depth
> db.ticks.ensureIndex( {"bids.price": 1} )
// Full text search
> db.ticks.ensureIndex( {tweet: "text"} )
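Conceptually, the compound index {symbol: 1, timestamp: 1} keeps entries ordered by symbol and then by timestamp, so all ticks for one symbol within a time range sit next to each other. The sketch below models that with a sorted array and a binary search: one seek to the start of the range, then a contiguous scan. It is a simplified illustration of the idea, not MongoDB's B-tree implementation, and the function names are hypothetical.

```javascript
// Order keys the way a {symbol: 1, timestamp: 1} index would: symbol first, then time.
function compareKeys(a, b) {
  if (a.symbol !== b.symbol) return a.symbol < b.symbol ? -1 : 1;
  return a.timestamp - b.timestamp;
}

// Position of the first entry whose key is >= target (lower-bound binary search).
function lowerBound(entries, target) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(entries[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Entries for `symbol` with from <= timestamp < to: one seek, then a contiguous scan.
function rangeScan(entries, symbol, from, to) {
  const out = [];
  for (let i = lowerBound(entries, { symbol, timestamp: from }); i < entries.length; i++) {
    const e = entries[i];
    if (e.symbol !== symbol || e.timestamp >= to) break;
    out.push(e);
  }
  return out;
}

const t = (s) => new Date(s).getTime(); // timestamps as epoch milliseconds
const index = [
  { symbol: "VIA", timestamp: t("2013-01-12T10:00:00Z") },
  { symbol: "DIS", timestamp: t("2013-01-20T10:00:00Z") },
  { symbol: "CBS", timestamp: t("2013-01-15T10:00:00Z") },
  { symbol: "DIS", timestamp: t("2013-02-05T10:00:00Z") },
  { symbol: "DIS", timestamp: t("2013-01-10T10:00:00Z") },
].sort(compareKeys);

const jan = rangeScan(index, "DIS", t("2013-01-01T00:00:00Z"), t("2013-02-01T00:00:00Z"));
console.log(jan.length); // 2
```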
Query for ticks by time and price threshold
// Ticks for last month (January 2013) for media companies
> db.ticks.find({ symbol: {$in: ["DIS", "VIA", "CBS"]}, timestamp: {$gte: new ISODate("2013-01-01"), $lt: new ISODate("2013-02-01")} })
// Ticks when Disney's bid breached 55.50 this month
> db.ticks.find({ symbol: "DIS", bidPrice: {$gt: 55.50}, timestamp: {$gte: new ISODate("2013-02-01")} })
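When a range query needs both a lower and an upper bound, both operators have to go inside a single timestamp object. Writing timestamp twice in one object literal is an easy slip: JavaScript, and therefore the mongo shell, keeps only the last occurrence, so one bound silently disappears before the query reaches the server. This plain JavaScript snippet shows the effect.

```javascript
// Two separate `timestamp` keys: the second silently overwrites the first.
const wrong = {
  symbol: "DIS",
  timestamp: { $gt: new Date("2013-01-01T00:00:00Z") },
  timestamp: { $lte: new Date("2013-01-31T00:00:00Z") },
};

// Both bounds inside one `timestamp` object; $lt on the next month's start
// also keeps ticks from January 31 after midnight.
const right = {
  symbol: "DIS",
  timestamp: {
    $gte: new Date("2013-01-01T00:00:00Z"),
    $lt: new Date("2013-02-01T00:00:00Z"),
  },
};

console.log(Object.keys(wrong.timestamp)); // [ '$lte' ]
console.log(Object.keys(right.timestamp)); // [ '$gte', '$lt' ]
```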
Analyzing/Aggregating Options
• Custom application code: run your queries, compute your results
• Aggregation framework: declarative, pipeline-based approach
• Native Map/Reduce in MongoDB: JavaScript functions distributed across the cluster
• Hadoop Connector: offline batch processing/computation
Aggregate into minute bars
// Aggregate minute bars for Disney for February
> db.ticks.aggregate(
    { $match: { symbol: "DIS", timestamp: { $gte: new ISODate("2013-02-01"), $lt: new ISODate("2013-03-01") } } },
    { $project: { year: { $year: "$timestamp" }, month: { $month: "$timestamp" }, day: { $dayOfMonth: "$timestamp" }, hour: { $hour: "$timestamp" }, minute: { $minute: "$timestamp" }, timestamp: 1, price: 1 } },
    { $sort: { timestamp: 1 } },
    { $group: { _id: { year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute" }, open: { $first: "$price" }, high: { $max: "$price" }, low: { $min: "$price" }, close: { $last: "$price" } } }
  )
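The pipeline sorts by timestamp before $group so that $first and $last pick each minute's opening and closing prices. The plain JavaScript sketch below performs the same steps in memory (sort, bucket by minute, track open/high/low/close) to make the logic explicit. Like the pipeline, it assumes each tick carries a numeric price field; minuteBars and the sample ticks are hypothetical.

```javascript
// Build one-minute OHLC bars from ticks, mirroring the $sort + $group stages.
function minuteBars(ticks) {
  const sorted = [...ticks].sort((a, b) => a.timestamp - b.timestamp); // $sort
  const bars = new Map(); // key: minute start in epoch ms (UTC); keeps time order
  for (const { timestamp, price } of sorted) {
    const minute = Math.floor(timestamp.getTime() / 60000) * 60000;
    const bar = bars.get(minute);
    if (!bar) {
      // First tick of the minute: it is the $first price.
      bars.set(minute, { minute: new Date(minute), open: price, high: price, low: price, close: price });
    } else {
      bar.high = Math.max(bar.high, price); // $max
      bar.low = Math.min(bar.low, price);   // $min
      bar.close = price;                    // $last (input is time-sorted)
    }
  }
  return [...bars.values()];
}

const ticks = [
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.40 },
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.36 },
  { timestamp: new Date("2013-02-15T10:00:20Z"), price: 55.80 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.20 },
];
console.log(minuteBars(ticks));
```

One difference worth noting: this sketch returns bars in time order, but MongoDB's $group does not guarantee output order, so a pipeline that needs ordered bars would add a $sort on _id after the $group.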
Add Analysis on the Bars
…
// then count the number of down bars
{ $project: { downBar: { $lt: ["$close", "$open"] }, open: 1, high: 1, low: 1, close: 1 } },
{ $group: { _id: "$downBar", sum: { $sum: 1 } } }
)
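Grouping on the boolean downBar yields two result documents, one with _id: true and one with _id: false. The plain JavaScript sketch below mirrors that count over a few hypothetical bars; countByDirection is an illustrative name. As with $lt, a flat bar (close equal to open) is not counted as down.

```javascript
// Count down bars (close below open), mirroring the $project + $group stages.
function countByDirection(bars) {
  const counts = { down: 0, notDown: 0 };
  for (const { open, close } of bars) {
    if (close < open) counts.down += 1; // downBar: {$lt: ["$close", "$open"]} is true
    else counts.notDown += 1;           // false: flat bars land here too
  }
  return counts;
}

// Hypothetical sample bars.
const bars = [
  { open: 55.36, high: 55.80, low: 55.20, close: 55.70 },
  { open: 55.70, high: 55.75, low: 55.40, close: 55.45 },
  { open: 55.45, high: 55.50, low: 55.45, close: 55.45 },
];
console.log(countByDirection(bars)); // { down: 1, notDown: 2 }
```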
MapReduce Example: Sum
// Sum of bid prices per symbol
var mapFunction = function () {
  emit(this.symbol, this.bidPrice);
};
var reduceFunction = function (symbol, priceList) {
  return Array.sum(priceList);
};
> db.ticks.mapReduce(mapFunction, reduceFunction, { out: "tickSums" })
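MongoDB may call the reduce function more than once for the same key, feeding earlier reduce results back in alongside new values. The reduce function therefore has to return the same type of value it receives and give the same answer however the values are batched, and summation satisfies both conditions. The sketch below simulates this contract in plain JavaScript. The local emit and sumValues are hypothetical stand-ins for the server-side emit and Array.sum, and this is a model of the contract, not MongoDB's execution engine.

```javascript
// Hypothetical stand-ins for the server-side map/reduce environment.
const emitted = new Map(); // key -> list of emitted values
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}
const sumValues = (values) => values.reduce((acc, v) => acc + v, 0); // like Array.sum

const mapFunction = function () {
  emit(this.symbol, this.bidPrice);
};
const reduceFunction = function (symbol, priceList) {
  return sumValues(priceList);
};

// Illustrative prices chosen so floating-point sums are exact.
const ticks = [
  { symbol: "DIS", bidPrice: 55.5 },
  { symbol: "DIS", bidPrice: 56.25 },
  { symbol: "DIS", bidPrice: 57.25 },
  { symbol: "CBS", bidPrice: 40 },
];
ticks.forEach((doc) => mapFunction.call(doc)); // map phase: `this` is each document

const dis = emitted.get("DIS");
const oneShot = reduceFunction("DIS", dis);
// Re-reduce: reduce a partial batch first, then reduce that result with the rest.
const reReduced = reduceFunction("DIS", [reduceFunction("DIS", dis.slice(0, 2)), ...dis.slice(2)]);
console.log(oneShot, reReduced); // 169 169
```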
Process Data in Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop jobs: no need to go through HDFS
• Leverage the power of the Hadoop ecosystem against operational data in MongoDB
Performance, Scalability, and High Availability
Why MongoDB Is Fast and Scalable
• Better data locality (diagram: relational vs. MongoDB)
• In-memory caching
• Auto-sharding: read/write scaling
Auto-sharding for Horizontal Scale: Read/Write Scalability
[Diagram sequence: a single mongod holding the key range Symbol A…Z is split into two shards (Symbol A…J, K…Z) and then four (A…F, G…J, K…O, P…Z). In the final build each shard is a replica set with one primary and two secondaries, the application connects through several mongos routers, and the key ranges are Symbol plus Time.]
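The diagrams partition the data by ranges of the shard key. The sketch below models how a router, the role mongos plays, could map a tick to a shard using the four symbol ranges shown. The shard names and the routeTick helper are hypothetical. Real MongoDB tracks many smaller chunks in its config servers and rebalances them automatically.

```javascript
// Lower bounds of the four symbol ranges from the diagram (A…F, G…J, K…O, P…Z).
// Each range runs from its lower bound up to, but not including, the next one.
const ranges = [
  { min: "A", shard: "shard-A-F" },
  { min: "G", shard: "shard-G-J" },
  { min: "K", shard: "shard-K-O" },
  { min: "P", shard: "shard-P-Z" },
];

// Hypothetical router: pick the last range whose lower bound is <= the symbol.
function routeTick(symbol) {
  let target = ranges[0].shard;
  for (const r of ranges) {
    if (r.min <= symbol) target = r.shard;
    else break;
  }
  return target;
}

for (const s of ["CBS", "DIS", "VIA"]) console.log(s, "->", routeTick(s));
// CBS -> shard-A-F, DIS -> shard-A-F, VIA -> shard-P-Z
```

Adding Time to the shard key, as in the last diagram, lets chunk boundaries fall inside a single symbol, so a heavily traded symbol's ticks can be spread over several chunks instead of sitting in one range.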
Summary
• MongoDB delivers high performance for tick data
• Scales horizontally through auto-sharding
• Fast, flexible querying, analysis, & aggregation
• Dynamic schema handles any data type
• MongoDB provides all of these features at a low TCO
• We can support you with anything discussed
Questions?
Thank You