How a hedge fund uses MongoDB
Roman Shtylman, Athena Capital Research

Making money in the stock market.

1. Listen to market data
2. ????
3. Profit!

Agenda

● About Athena Capital Research
● 3 uses of MongoDB at Athena
  ○ Dropcopy
  ○ BSON Logging
  ○ Realtime Monitoring
● Wrap-Up
● Questions

Athena Capital Research

● Strong focus on technical talent and technology
  ○ 90% of employees come from engineering, math, or hard science backgrounds
● Quantitative investment manager
  ○ math
● Automated trading
  ○ robots
● C++
  ○ speed
● Open source stack
  ○ freedom

MongoDB at Athena

● Lots of unstructured data
● Many sources of data
● Want to be able to query quickly
● Not everything goes into a database
● Avoid creating schema after schema

Dropcopy

● Third parties require near-real-time reporting of trading activity
  ○ Accounting
  ○ Risk management
  ○ Compliance
● Exchanges provide a "drop-copy"
  ○ FIX protocol
● Scrub the messages and forward to said third party
  ○ MongoDB for message passing

FIX Protocol

● Financial Information eXchange
● Key/value based ASCII
  ○ Header + body + trailer
  ○ Key is numeric (maps to some "standard" name)
  ○ Value is string
● Good fit for MongoDB
  ○ Key / value
  ○ Flexible document sizes
  ○ Easier to query than SQL alternatives
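
To make the key/value point concrete, here is a minimal sketch of turning one raw FIX message into a flat document. It assumes the standard SOH (0x01) field delimiter and ignores repeating groups, which a flat dict would collapse. The function name `fix_to_dict` and the sample values are hypothetical and are not taken from the talk's fix2json tool.

```python
import json

SOH = "\x01"  # standard FIX field delimiter


def fix_to_dict(raw: str) -> dict:
    """Split a raw FIX message into a {tag: value} dict of strings."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[tag] = value  # a repeated tag would overwrite the earlier one
    return fields


# Hypothetical execution report fragment; all values are made up.
# 8=BeginString, 35=MsgType, 49=SenderCompID, 56=TargetCompID,
# 55=Symbol, 38=OrderQty
raw = SOH.join(
    ["8=FIX.4.2", "35=8", "49=EXCH", "56=FIRM", "55=IBM", "38=100"]
) + SOH

doc = fix_to_dict(raw)
print(json.dumps(doc))
```

Keeping the numeric tags as string keys means the stored document can be mapped back to FIX without a dictionary lookup.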

Architecture

● We have incoming FIX session (drop copy)
● Need to have outgoing FIX session

MongoDB acts as the glue (message passing layer)

1. Incoming drop copy -> FIX log file
2. fix2json
3. MongoDB
4. Tail cursor
5. Client

Drop side

● C++ client application for the drop copy connection
  ○ Known system and can be kept database free
  ○ QuickFix
● fix2json
  ○ Tail reading of output FIX log files
  ○ Easy to represent FIX as JSON and subsequently BSON
  ○ Keep db inserts independent of FIX connection
● Downsides of combining
  ○ Re-population
  ○ Data will not be resent
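
The tail-reading step can be sketched as a function that returns only the complete lines appended since a remembered byte offset. This is an illustration under assumptions, not the talk's implementation: it assumes one FIX message per newline-terminated line, and `read_new_lines` is a hypothetical name. Because the offset is the only state, the database can be re-populated by starting again from offset 0 without touching the FIX connection.

```python
import os
import tempfile


def read_new_lines(path: str, offset: int):
    """Return (complete_lines, new_offset) for data appended since offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline so that a half-written
    # message is picked up again on the next call.
    end = chunk.rfind(b"\n") + 1
    lines = [part.decode("ascii") for part in chunk[:end].split(b"\n")[:-1]]
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "fix.log")
    with open(log_path, "wb") as f:
        # One complete message followed by a partially written one.
        f.write(b"8=FIX.4.2\x0135=8\x01\n8=FIX.4.2\x0135=")
    first, offset = read_new_lines(log_path, 0)

    with open(log_path, "ab") as f:
        f.write(b"0\x01\n")  # the writer finishes the second message
    second, offset = read_new_lines(log_path, offset)

print(first)
print(second)
```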

MongoDB setup

● Capped collection
  ○ Natural index
● Data is purged daily using a simple MongoDB shell script
● Important to keep tabs on the data size if your data requirements change often
  ○ Mitigated intraday if you are constantly reading
  ○ Critical if you want full replay
● Easy to reconcile with Drop FIX logs
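
Why sizing matters for full replay can be shown with a small in-memory stand-in for a capped collection. This is a simulation only: the class `CappedLog` is hypothetical and capped by document count for simplicity, whereas a real capped collection is sized in bytes. It keeps insertion order and silently drops the oldest entries once full.

```python
from collections import deque


class CappedLog:
    """In-memory stand-in for a capped collection (fixed capacity, insertion order)."""

    def __init__(self, max_docs: int):
        self._docs = deque(maxlen=max_docs)
        self._next_pos = 0  # position in natural (insertion) order

    def insert(self, doc: dict) -> None:
        self._docs.append((self._next_pos, doc))
        self._next_pos += 1

    def read_from(self, pos: int):
        """Return (docs, next_pos, lost); lost counts docs overwritten before pos was read."""
        oldest = self._docs[0][0] if self._docs else self._next_pos
        lost = max(0, oldest - pos)
        docs = [doc for p, doc in self._docs if p >= pos]
        return docs, self._next_pos, lost


log = CappedLog(max_docs=3)
for seq in range(1, 6):
    log.insert({"34": str(seq)})  # 34 = MsgSeqNum

# A reader that kept up only needs the newest entry and loses nothing.
live_docs, _, live_lost = log.read_from(4)

# A full replay from the start finds the first two messages already gone.
docs, next_pos, lost = log.read_from(0)
print(len(docs), next_pos, lost)  # 3 5 2
```

A constantly reading consumer never notices the overwrite, while a replay from the beginning of the day does, which is why the collection has to be sized for the full day's volume if replay is required.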

Outgoing side

● C++ FIX application
  ○ QuickFix
● Tail cursor
  ○ Handling restarts
● Select only required fields
● Filter and alter any field before sending
● Outgoing message log in FIX
● Easily handle different clients
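
The select, filter, and alter steps can be sketched as a per-client projection over a stored document. Everything named here is an assumption for illustration: `prepare_outgoing`, the `CLIENTS` table, the tag selections, and the comp IDs are hypothetical and do not describe Athena's actual configuration.

```python
def prepare_outgoing(doc: dict, required_tags: list, overrides: dict) -> dict:
    """Keep only required_tags (in the given order), then apply per-client overrides."""
    out = {tag: doc[tag] for tag in required_tags if tag in doc}
    out.update(overrides)
    return out


# Hypothetical per-client settings.
# 35=MsgType, 55=Symbol, 54=Side, 38=OrderQty, 44=Price, 56=TargetCompID
CLIENTS = {
    "risk": {
        "required_tags": ["35", "55", "54", "38", "44"],
        "overrides": {"56": "RISK"},
    },
    "accounting": {
        "required_tags": ["35", "55", "38"],
        "overrides": {"56": "ACCT"},
    },
}

# A stored drop copy document; tag 1 (Account) is internal and never forwarded.
stored = {
    "35": "8", "49": "EXCH", "56": "FIRM", "55": "IBM",
    "54": "1", "38": "100", "44": "125.50", "1": "INTERNAL-ACCT",
}

outgoing = {name: prepare_outgoing(stored, **cfg) for name, cfg in CLIENTS.items()}
for name, message in outgoing.items():
    print(name, message)
```

Adding a client in this sketch is a configuration change rather than a code change, which is one way to read "easily handle different clients".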

Benefits

● Full copy of incoming data for querying
  ○ Aggregation queries
● Easy replay
  ○ Client disconnects
● Easy verification
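
Both benefits follow from having every message stored in order. The sketch below uses plain Python lists as a stand-in for the stored collection. It assumes replay is keyed on MsgSeqNum (tag 34), which is an illustrative choice and not something the talk specifies, and the helper names are hypothetical.

```python
from collections import Counter


def replay_after(stored_docs, last_seen_seq: int):
    """Yield stored messages whose MsgSeqNum (tag 34) is greater than last_seen_seq."""
    for doc in stored_docs:  # assumed to be in insertion (natural) order
        if int(doc["34"]) > last_seen_seq:
            yield doc


def filled_qty_by_symbol(stored_docs) -> dict:
    """Sum the last filled quantity (tag 32) per symbol (tag 55) over execution reports."""
    totals = Counter()
    for doc in stored_docs:
        if doc.get("35") == "8":  # 35=8 is an execution report
            totals[doc["55"]] += int(doc.get("32", 0))
    return dict(totals)


# Hypothetical stored messages.
stored = [
    {"34": "1", "35": "8", "55": "IBM", "32": "100"},
    {"34": "2", "35": "8", "55": "MSFT", "32": "50"},
    {"34": "3", "35": "0"},  # heartbeat
    {"34": "4", "35": "8", "55": "IBM", "32": "25"},
]

# A client that disconnected after message 2 gets messages 3 and 4 again.
missed = [doc["34"] for doc in replay_after(stored, last_seen_seq=2)]
totals = filled_qty_by_symbol(stored)
print(missed, totals)
```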

BSON Logging

● Event logging
  ○ Independent of std::cout
● Relevant for tracking down problems and keeping records
● Logging time is "wasted" time
● Previous logging solution was slow
  ○ XML based
  ○ String conversions
● XML is easy to read after logging

BSON Benefits

● Binary with loose document format
  ○ Defined by the app during logging
● Internal data format for MongoDB
  ○ mongorestore
● Exists sequentially in flat files
● Easily rendered as JSON
● Numbers:
  ○ original XML implementation: 1k ops/s
  ○ improved XML implementation: 3k ops/s
  ○ first pass BSON implementation: ~20k ops/s
  ○ current BSON implementation: ~30k ops/s
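
To show why BSON avoids string conversions and sits naturally in flat files, here is a minimal hand-written encoder that follows the layout published at bsonspec.org. It is a sketch covering only flat documents with string, integer (stored as int64), and double values. It is not the MongoDB C++ driver's API, and the function names are hypothetical.

```python
import struct


def _cstring(text: str) -> bytes:
    return text.encode("utf-8") + b"\x00"


def encode_document(doc: dict) -> bytes:
    """Encode a flat dict of str/int/float values as one BSON document."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            raise TypeError("bool is outside the scope of this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            body += b"\x12" + _cstring(key) + struct.pack("<q", value)
        elif isinstance(value, float):
            body += b"\x01" + _cstring(key) + struct.pack("<d", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # The total length includes the 4-byte length prefix and the trailing NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def split_documents(buf: bytes):
    """Walk back-to-back BSON documents using each one's length prefix."""
    pos = 0
    while pos < len(buf):
        (length,) = struct.unpack_from("<i", buf, pos)
        yield buf[pos:pos + length]
        pos += length


# Two hypothetical log events written one after the other, as in a flat log file.
log = encode_document({"hello": "world"}) + encode_document(
    {"event": "fill", "qty": 100, "px": 125.5}
)
docs = list(split_documents(log))
print(len(docs), [len(d) for d in docs])
```

Numbers are copied in as fixed-width binary rather than formatted as text, and the length prefix lets a reader skip from one record to the next without parsing it.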

BSON Gotchas

● BSON date/time (UTC datetime) type is int64_t milliseconds
● BSON not a standalone library
  ○ Highly coupled to MongoDB C++ driver
● Like MongoDB, schema-less
  ○ Just something to remember if creating post-processing tools
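
The millisecond gotcha is easy to demonstrate: anything finer than a millisecond is dropped when a time is stored as an int64 count of milliseconds since the Unix epoch. The helper names and the sample time below are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_bson_millis(ts: datetime) -> int:
    """Whole milliseconds since the Unix epoch; sub-millisecond digits are dropped."""
    delta = ts - EPOCH
    return delta.days * 86_400_000 + delta.seconds * 1000 + delta.microseconds // 1000


def from_bson_millis(ms: int) -> datetime:
    return EPOCH + timedelta(milliseconds=ms)


# A made-up event time with microsecond resolution.
ts = datetime(2012, 5, 23, 14, 30, 0, 123456, tzinfo=timezone.utc)
back = from_bson_millis(to_bson_millis(ts))

print(ts.microsecond, back.microsecond)  # 123456 123000
print(ts - back)
```

If microsecond ordering matters, one option is to log the finer-grained value as a separate integer field alongside the date.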

Realtime Monitoring

● Log entries are similar to one another
  ○ Some can have extra fields
● Each machine contains independent logs
  ○ Each log could be a different format
  ○ Daemon to read and insert into MongoDB
    ■ Central location, no hunting when problems happen
● Real-time monitoring and alerting
  ○ Human intervention required
● Web based tools to "tail" view log entries
  ○ WebSockets
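
Because entries share a shape but may carry extra fields, an alerting pass has to treat every field as optional. The sketch below illustrates that idea only. The field names (`host`, `level`, `msg`, `latency_us`), the threshold, and `find_alerts` are hypothetical and are not taken from the talk.

```python
def find_alerts(entries, max_latency_us: int = 500) -> list:
    """Return human-readable alerts for entries that need attention."""
    alerts = []
    for entry in entries:
        host = entry.get("host", "unknown")
        if entry.get("level") == "error":
            alerts.append(f"{host}: {entry.get('msg', '')}")
        # 'latency_us' is an optional extra field; when absent there is nothing to check.
        latency = entry.get("latency_us")
        if latency is not None and latency > max_latency_us:
            alerts.append(f"{host}: latency {latency}us over {max_latency_us}us")
    return alerts


# Made-up entries from two machines, with differing fields.
entries = [
    {"host": "trade01", "level": "info", "msg": "order sent", "latency_us": 120},
    {"host": "trade02", "level": "error", "msg": "session dropped"},
    {"host": "trade01", "level": "info", "msg": "order sent", "latency_us": 900},
]

alerts = find_alerts(entries)
for line in alerts:
    print(line)
```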

Wrap-Up

● "Realtime" is relative
  ○ Benchmark to meet your needs
● Disjoint pieces can be less prone to failure
● Other MongoDB uses
  ○ Contribute to LuaMongo driver
  ○ BSON code contributions
  ○ Bugfixes

Questions?

Reference:
http://www.mongodb.org/display/DOCS/Tailable+Cursors

FIX:
http://en.wikipedia.org/wiki/Financial_Information_eXchange
http://www.quickfixengine.org/
http://www.onixs.biz/tools/fixdictionary/

BSON:
http://bsonspec.org/