Building a Dark Pool
Third Year Honours Project

Author: Dan-Matei Arsenie, BSc (Hons) Computer Science with Industrial Experience
Supervisor: Prof. John Keane
May, 2016


Abstract

The present financial landscape is fast-paced and constantly changing. A recent revision of regulatory requirements for public stock exchanges has left investors open to speculative behaviour. We will first cover the financial background of how a private stock exchange (also known as a dark pool) can solve this issue. The report will then focus on how a complex system such as a stock exchange was designed, built and tested.

The project deliverable achieves market-accurate behaviour in matching stock orders and has a scalable architecture that will run on almost any hardware/software configuration that supports Java 6. An investigation of canonical computer science algorithms adapted for the graph matching problem is also provided, together with performance comparisons in terms of speed and results.


Acknowledgments

I would like to extend my gratitude to my tutor, Prof. John Keane, for his continuous support over the course of the project. I would further like to thank my industry contacts, Edmondo Porcu for the technical advice he has given me and Irene Oppong for clarifying the regulatory obligations that financial institutions need to fulfil.


Abstract
Acknowledgments
Chapter 1. Context
  1.1 An Overview of Financial Markets
  1.2 Project Motivation
  1.3 Project Goals
Chapter 2. Requirements Gathering
  2.1 Client Communication API
  2.2 Order Storage
  2.3 Matching Engine
  2.4 User Interface
Chapter 3. System Design
  3.1 Placing Orders
  3.2 Withdrawing Orders
  3.3 Querying for Orders
  3.4 Matching Orders
  3.5 Notifying Clients of Order Completion
  3.6 Modelling Orders
  3.7 Order Storage
  3.8 Communication Layer
  3.9 Matching Engine
Chapter 4. System Implementation
  4.1 Development Processes
  4.2 Development Languages
  4.3 Database Implementation
  4.4 REST Service
  4.5 Matching Engine
  4.6 User Interface
Chapter 5. Testing
  5.1 Unit Tests
  5.2 Specification Tests
  5.3 System Tests
Chapter 6. Evaluation
  6.1 Test Data Generation
  6.2 Expected Behaviour
  6.3 System Results
  6.4 Algorithm Performance
Chapter 7. Reflection and Conclusions
  7.1 Changes to Milestone Timing
  7.2 New Knowledge
  7.3 Project Results
  7.4 Future Work
References
Appendix A
Appendix B
Appendix C


Chapter 1. Context

When did people first start trading goods? The original Sumerian writing system derives from a system of clay tokens used to represent commodities[1]. This means that simple and localised forms of commerce precede written history. Over time, trading routes evolved and wherever they intersected, trading hubs formed. Merchants would gather in these markets to price their goods and either sell or exchange them.

As civilisation evolved, so did the nature of transacted goods. Debt issuance for individuals and even governments was introduced, with fourteenth-century Venetians being the leaders in this field. The first stock exchange was opened in Antwerp, Belgium in 1531. The imperialism of the 1600s brought about the first truly international trading corporations in the form of the numerous East India Companies. Closer to the present day, the world became more interconnected and the products exchanged grew more abstract, with pricing determined by complex mathematical formulas like the Black-Scholes model[2].

The large investment banks of today trace their origins back to the late 19th or early 20th century. The modern financial landscape in which they operate is as interesting as it is complicated. While a series of major financial events, like the 2008 economic contraction or the European sovereign debt crisis, have been the topic of extensive media coverage, there are other, more frequently occurring dangers to investors that rarely get discussed. One such threat, closely related to my project, involves speculative behaviour in a transparent stock exchange market. However, before discussing it any further, I need to provide a view of how the relevant parts of the present financial environment work.

1.1 An Overview of Financial Markets

So far, the historic evolution from small markets and bazaars to present-day stock exchanges has only been alluded to. Modern stock exchanges are governed by industry-specific fairness rules and have reporting obligations to multiple government regulators, but their purpose has remained the same - to be the forum where supply and demand meet.

1.1.1 Financial Basics

In order to introduce the financial concepts used in this project, we will discuss a simple hypothetical scenario. Once an intuitive explanation of a topic has been presented, it will be followed up with a formal economic definition.

Let us assume that through her company, Alice owns a can factory. She sells her product to the local soft drink factory, which Bob owns. Their contract requires Alice to deliver ten thousand cans per day to Bob’s factory and in order to do so Alice needs one ton of aluminium per day.

Since their collaboration is mutually profitable, Alice wants to make sure that her factory will always deliver the required amount of cans to Bob. She has enough raw materials to maintain the pace of production for the current month, but would also like to secure supplies for the coming month (22 working days). Alice also knows the most she can afford to pay for a ton of aluminium while still operating profitably - assume that amount is £500. This means she has to buy 22 tons of metal on her preferred stock exchange, under the condition that the price per ton is below £500.

Stock exchanges function by matching “trade orders” - instructions given to brokers to enter or exit a position. For example, Alice’s requirements are packaged by the stock exchange in the form of a buy limit order.

Definition: “A buy limit order is an order to purchase a security at or below a specified price. A buy limit order allows traders and investors to specify the price that they are willing to pay for a security, such as a stock. By using a buy limit order, the investor is guaranteed to pay that price or better, meaning that he or she will pay the specified price or less for the purchase of the security.”[3]

Alice would like the aluminium to be delivered to her factory on a daily basis, starting at a future date. The price of the contract is settled on the day she finds a provider willing to accept her terms. This is called a futures contract.

Definition: “A futures contract is a contractual agreement, generally made on the trading floor of a futures exchange, to buy or sell a particular commodity […] at a pre-determined price in the future. Futures contracts detail the quality and quantity of the underlying asset; they are standardised to facilitate trading on a futures exchange. Some futures contracts may call for physical delivery of the asset.”[4]

One day, Bob’s factory is asked by his customers to supply an additional thousand cans of soda the very next day. This means Alice urgently needs more aluminium in order to fulfil this surge in demand. She goes to her broker and asks for a same-day delivery of aluminium to cover Bob’s updated order. This is a spot trade.

Definition: “A spot trade is the purchase or sale of a commodity for immediate delivery.”[5] As an observation, since spot trades are fulfilled at the time they are traded, they are more susceptible to market volatility than futures contracts. Because of this, spot contracts are on average more expensive than futures.

1.1.2 Public Stock Exchanges

In the previous scenario, the stock exchange was regarded as a black-box system. Input orders were given and matched contracts resulted as outputs. Some of the principles of typical stock exchange operation that are relevant to the current project will now be presented.

To begin with, stock exchanges make a profit by charging either a fixed fee on executed transactions or a small percentage of the transacted value. The pricing model used differs between stock exchanges and is often dependent on the type of contract being traded.

When pairing orders, stock exchanges are legally required to act impartially, without favouring either the buying or the selling party[6]. In order to achieve this, they employ social surplus maximisation, by ensuring goods transact at their highest possible value (further details as to why this is true can be found in Appendix A).
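The idea of social surplus can be made concrete with a small sketch. This is an illustrative example, not code from the project: the surplus of pairing a buyer whose limit is `buyLimit` with a seller asking `sellLimit` is their difference, and an impartial exchange prefers pairings with the largest total surplus.

```java
// Illustrative example of social surplus: pairing a buyer willing to pay up
// to `buyLimit` with a seller willing to accept at least `sellLimit` creates
// a surplus of (buyLimit - sellLimit), shared between the two parties.
public class SurplusExample {
    static int surplus(int buyLimit, int sellLimit) {
        return buyLimit - sellLimit; // a negative value means the pair cannot trade
    }

    public static void main(String[] args) {
        // A buyer capped at £500/ton and a seller asking £470/ton can trade,
        // generating a surplus of 30...
        System.out.println(surplus(500, 470)); // prints 30
        // ...while a seller asking £520/ton cannot be matched with this buyer.
        System.out.println(surplus(500, 520)); // prints -20
    }
}
```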

Modern regulations require maximum transparency with regard to what transactions are being executed on the market at any point in time. This means stock exchanges need to report all executed contracts “as fast as technologically feasible”[7]. Real time transaction information is stored in publicly visible, free to access data repositories, like the one managed by the US based Depository Trust & Clearing Corporation (DTCC)[8].

Market transparency helps governmental bodies ensure fair trade, prevent monopoly, duopoly or oligopoly scenarios from forming and stop large scale default events like the one in 2008 from happening again. However, it opens the door to speculation. Going back to the example from the previous section, if someone notices a buying pattern in Alice’s behaviour, they can intentionally drive aluminium prices up in order to maximise their profits, at the expense of buyers like Alice.

Moreover, stock exchanges usually divide large orders into multiple smaller ones, in order to guarantee the shortest possible execution time (in general, there are more investors capable of fulfilling a small order than a large one). This means that, more often than not, someone can start manipulating the price to their interest as soon as they observe a relatively small buy/sell order being executed. It will most likely be immediately followed by others similar to it.

1.1.3 Private Stock Exchanges

We have seen how the transparency that allows government regulators to safeguard the market from large scale financial difficulties comes at the price of exposing investors to speculative behaviour. It is worth noting that this is not just a theoretical threat. In his book “Flash Boys: A Wall Street Revolt”[9], investigative journalist and author Michael Lewis describes the phenomenon as systemic. The question then becomes: how do we protect investors from this type of speculative practice? What has been done to address this issue, and by whom?

One solution that has grown in popularity involves executing orders on private stock exchanges. These so-called “dark pools” resemble their public counterparts in all but two important ways. Firstly, they are hosted within the walls of investment banks, as opposed to being publicly listed companies of their own. Secondly, all orders executed on a dark pool are anonymous, and clients can specify when they want the bank to first report the transaction (usually as a percentage of how much of the order has been executed).

Dark pools allow their clients to decide when an order should be reported. Until the client-specified threshold of completion has been reached, the matching of orders has two important properties: it is only theoretical (no real transfer of goods has yet taken place) and it is known only to the investment bank hosting the exchange. Only when the threshold has been reached does the transaction (or a part of it) get executed and need to be reported.

It might now seem desirable to completely deprecate public stock exchanges and switch entirely to dark pools. However, this is not feasible, as each dark pool only contains offers from the clients of the host bank (as opposed to public stock exchanges, where any investor can participate). This lack of offer diversity greatly reduces the market share dark pools can hold. In fact, recent statistics posted by Bats Global Markets[10] show that, added together, the top twelve largest private stock exchanges account for only around 9% of the market. However, in his book[9] Michael Lewis quotes figures from his interviews according to which 70-80% of orders placed on a dark pool would be executed there. This means that dark pools do not currently accommodate the needs of most market investors, but cater well for the segment they cover.

1.2 Project Motivation

As has already been established, the topic of protecting investors from speculative behaviour is both relevant and open ended. Dark pools are the most popular solution to this problem at the moment. Their study, and their potential increase in market share, is relevant for both economic and academic reasons. To begin with, having more transactions executed on private stock exchanges would save more of the investors’ money, given that their market moves would appear to the outside world as atomic transactions. This would reduce speculation against large orders being split into multiple smaller ones. From an academic perspective, it would be beneficial to observe the effects an increase in dark pool market share would have on trading dynamics. In the same way in which introducing real-time transaction reporting rules has had the unwanted side effect of speculative behaviour, the impact of a higher dark pool market share is hard to predict. Dark pool simulations, such as my project, could expose potential faults with this trading model in a safe environment.

Building a dark pool is also a complex process from a technological point of view. To begin with, it involves writing a concurrent, web-based application for accepting orders. Database and data structure knowledge is required in order to safely and efficiently store potentially large (starting at an order of magnitude of several GB per minute) volumes of data. Furthermore, if stock orders are represented using a bipartite graph, it provides an opportunity to apply canonical algorithmic approaches to the matching problem and study their performance using real world data. Such a project also requires extensive and varied testing, ranging from unit tests to end-to-end demonstrations. Finally, the system architecture needs to be easily scalable.


1.3 Project Goals

There are two main goals that this project aims to achieve:

• Build a simplified dark pool application programming interface (API) that can be used as a sandbox environment for further research. The technological objective here was to code the three main components of a stock exchange - an order entry point, an order matching engine and a form of order storage. It was also desirable that the resulting system architecture be easily deployable to most available hardware/software platforms and be effortlessly scalable. From a financial correctness perspective, I wanted the system to behave similarly to its commercial counterparts (like IEX, NASDAQ etc). This meant that when provided with historical stock order data, the system was expected to produce similar matches to what had happened on the market.

• Evaluate the performance of algorithmic matching solutions in the context of big data and suggest domain-specific optimisations. Starting from an exponential-class brute-force algorithm and subsequently implementing a greedy strategy, a dynamic programming solution and a flow network, the aim was to compare the time and space performance of each algorithm. Furthermore, I wanted to draw conclusions as to which algorithm produced the most accurately matched orders.
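The goal of comparing interchangeable algorithms suggests a common interface behind which each strategy can be swapped. The sketch below is a hypothetical illustration of that idea, not the project's actual code: a `MatchingStrategy` interface and a simple greedy implementation that repeatedly pairs the highest remaining bid with the lowest remaining ask (anonymous comparator classes keep it Java 6 compatible).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical common interface: each algorithm module returns (buy, sell)
// index pairs; a pair is valid when the bid covers the ask.
interface MatchingStrategy {
    List<int[]> match(int[] buyLimits, int[] sellLimits);
}

// A simple greedy strategy: pair the highest remaining bid with the lowest
// remaining ask while they remain compatible.
class GreedyStrategy implements MatchingStrategy {
    public List<int[]> match(final int[] buyLimits, final int[] sellLimits) {
        Integer[] buyIdx = indices(buyLimits.length);
        Integer[] sellIdx = indices(sellLimits.length);
        // Highest bids first.
        Arrays.sort(buyIdx, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) { return buyLimits[b] - buyLimits[a]; }
        });
        // Lowest asks first.
        Arrays.sort(sellIdx, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) { return sellLimits[a] - sellLimits[b]; }
        });
        List<int[]> pairs = new ArrayList<int[]>();
        int n = Math.min(buyIdx.length, sellIdx.length);
        for (int i = 0; i < n; i++) {
            // Stop as soon as the best remaining bid cannot cover the lowest remaining ask.
            if (buyLimits[buyIdx[i]] < sellLimits[sellIdx[i]]) break;
            pairs.add(new int[] { buyIdx[i], sellIdx[i] });
        }
        return pairs;
    }

    private static Integer[] indices(int n) {
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        return idx;
    }
}
```

Being greedy, this strategy is fast but does not necessarily maximise total surplus, which is exactly the trade-off the second goal sets out to measure.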


Chapter 2. Requirements Gathering

Based on the provided description of how stock exchanges operate, one can already envision a series of requirements. In this chapter, the specifications used in designing and implementing the project will be provided. They have been divided into four categories, reflecting the main work components: client communication API, stock order storage, matching engine and client UI.

2.1 Client Communication API

The purpose of this component is to allow clients to programmatically communicate with the system. For the clients, it serves as the only way to place or withdraw orders. The system uses it to acknowledge client messages and inform them when their orders have expired or been completed.

Functional Requirements:
• Authenticate a client to the system using their unique ID
• Allow incoming messages from authenticated clients, as long as the message is either a new order, amend order or withdraw order
• Reject any invalid or incorrect messages
• Reply to clients with a message of acknowledgement or rejection, as the case dictates
• Maintain message ordering in client-server interactions (sending order is the same as receiving order)
• Handle multiple stock orders arriving simultaneously

Non-Functional Requirements:
• Reply within the standard 20-120 second timeout most web frameworks use [11][12]
• Use standard HTTP response codes
• Use HTTP verbs suitable for each operation (for example, DELETE for withdrawing orders, POST for placing, etc.)
• Be extensible, as new forms of client-system interactions may be added in the future
• Maintain client privacy, denying requests to disclose confidential information to other clients
• Operate asynchronously (clients should be able to place an order without waiting for the previous one to be executed)
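The HTTP conventions among these requirements can be sketched as a small mapping. The verb choices for placing and withdrawing come from the requirements themselves; the GET verb for queries and the specific status codes are my illustrative assumptions, not the project's documented choices.

```java
// Illustrative mapping of operations to HTTP verbs and outcomes to standard
// status codes. POST/DELETE follow the requirements list above; GET for
// queries and the exact codes are assumptions for the sake of the example.
public class HttpConventions {
    static String verbFor(String operation) {
        if ("place".equals(operation))    return "POST";   // create a new order
        if ("withdraw".equals(operation)) return "DELETE"; // remove an existing order
        if ("query".equals(operation))    return "GET";    // read-only lookup
        throw new IllegalArgumentException("unsupported operation: " + operation);
    }

    static int statusFor(boolean authenticated, boolean valid) {
        if (!authenticated) return 401; // Unauthorized: unknown client ID
        if (!valid)         return 400; // Bad Request: rejected message
        return 202;                     // Accepted: processed asynchronously
    }
}
```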


2.2 Order Storage

Once an order has been received through the client communication API, it needs to be stored for processing and subsequent audit purposes. The role of this component is to ensure correct data persistence before and after matching.

Functional Requirements:
• Persist all accepted new orders
• Differentiate between orders that have been matched and those that have not
• Propagate newly added orders to the matching engine
• Respond to valid queries

Non-Functional Requirements:
• Support simultaneous read operations
• Support simultaneous independent write operations
• Be fast enough to support potentially large volumes of data being added (order of GB per second)
• Provide redundancy
• Provide a failover mechanism
• Ensure efficient storage of information (be normalised)

2.3 Matching Engine

The purpose of the matching engine is to pair together buy and sell orders from the database in a way that best serves the interests of the customers and favours the health of the market. As such, it performs the most challenging function of a stock exchange and represents a core focus of my project.

Functional Requirements:
• Pair orders in a way that maximises social surplus
• Given real market data from the past as input, produce output similar to historic values
• Create logs as audit trails that explain the matching of orders that has been performed
• Respond to database triggers of new information being available and update internal data structures at the soonest available opportunity
• Remove matched orders from internal data structures and mark them appropriately in the database
• Allow for interchangeable algorithm modules that handle the pairing of orders
• Allow for testability and performance analytics for all algorithm modules

Non-Functional Requirements:
• Deterministic behaviour
• Operate as fast as possible, given large volumes of work data
• Be scalable


2.4 User Interface

While providing a user interface is not one of the main goals of my project, there are clear benefits to visualising its results. As such, the role of this component is to provide a visual aid in understanding the output data.

Functional Requirements:
• Allow dark pool users to see their orders on a time scale and filter them by date range, matching state and financial details
• Update displayed information live
• Allow users to see individual order details

Non-Functional Requirements:
• Ease of use (intuitive design)
• Maintain user privacy by only displaying data for the current user
• Reply within the standard 20-120 second timeout most web frameworks use [11][12]
• Be compatible with all major web browsers


Chapter 3. System Design

There are five main backend flows that constitute my project - placing orders on the system, withdrawing an existing order, querying the system for existing orders, matching orders and notifying the client of order completion. In this chapter, a description of each of them will be given. We will then examine each component in further detail, explaining the design choices made and relating them back to the requirements from the previous chapter.

3.1 Placing Orders

The client communication API represents the interface to the system. With the help of a provided message factory, customers can input the details of their order. A round of logical correctness checking is performed by the factory. A syntactically correct message is said to have illogical field values when it tries to trade a negative quantity, place an order that expires on a past date or set a limit price below or equal to zero. Such messages will be rejected, with the list of logical errors identified included in the error message. The order is then sent to the dark pool server, where a round of integrity checks follows (to confirm that the client sending the request is registered with the dark pool). The flow for successfully placing an order, showing the components that the client interacts with, can be seen in Figure 1.

Once a valid stock order has reached the server, it is inserted in the database of currently unpaired contracts (from hereon called “live orders database”). Upon successful insertion in the live orders database, the database interactions layer returns its unique identifier. The server forwards this identifier to the client as part of the confirmation message. The interaction between the Representational State Transfer (REST) service and live orders database is represented in Figure 2.


Figure 1. Client using order factory
Figure 2. REST to Live Orders Database


3.2 Withdrawing Orders

Similarly to the process of placing an order, withdrawing one begins by formulating a request to the server via the web service. The request need only contain the ID the server had returned for the order upon insertion. On the server side, checks are performed to ensure the order still exists in the live orders database and that it belongs to the client who formulated the request. If all conditions are met, the order is deleted from the live orders database and a confirmation of this is propagated to the client via the REST service. While the components involved in this flow are the same as the ones involved in placing an order, the messages they exchange are different. Figure 3 shows a successful end-to-end order removal.

3.3 Querying for Orders

As seen before, the client has to initiate the process by formulating a request via the REST service. This request needs to contain the financial details of the orders they want to see, a private client identifier and a time frame to bound the results. Once a request has reached the server, it is translated into a query, executed, and the marshalled results are returned to the client via the REST interface. This flow can be seen in Figure 4. An example array of results is shown in Figure 5.


Figure 3. Withdrawing Orders Flow

Figure 4. Order Query Flow
Figure 5. Envisioned Result Set


3.4 Matching Orders

The process of order matching begins with a database trigger listener. Such triggers are created by the database interaction layer whenever a new order has been added or an existing order has been removed.

The listener process is responsible for consuming database triggers and notifying the matching engine whenever new data has become available or an order has been withdrawn. In the case of a withdrawn order, the matching engine simply removes it from its data structures. In the case of new orders being added, the matching engine sends an update request to the database interaction layer, specifying the unique ID of the most recent order it has. The database interaction layer queries the live orders database for data newer than that and returns it (Figure 6 refers to this new data as “deltas”).

The matching engine performs the order pairing (details of which are provided in Section 3.9) and sends the IDs of the successfully paired orders to the database interaction layer. The latter then removes the orders from the live orders database and stores them in the closed orders database.

This flow and the messages exchanged by the described components are shown below in Figure 6.


Figure 6. Matching Order Flow
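The delta-fetching step above can be sketched as follows. A sorted in-memory map stands in for the live orders database; in the real system this would be a SQL query of the shape `SELECT * FROM StockOrder WHERE ID > ?`, which is my illustrative guess at the query, not the project's code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of the "delta" update: the engine remembers the highest order ID it
// has seen and, on a new-data trigger, fetches only rows beyond it. A TreeMap
// stands in for the live orders database here.
public class DeltaFetcher {
    private final TreeMap<Long, String> liveOrders = new TreeMap<Long, String>();
    private long lastSeenId = 0;

    void insert(long id, String order) {
        liveOrders.put(id, order);
    }

    // Returns every order whose ID is greater than the last one fetched.
    Collection<String> fetchDeltas() {
        SortedMap<Long, String> deltas = liveOrders.tailMap(lastSeenId + 1);
        if (!deltas.isEmpty()) lastSeenId = deltas.lastKey();
        return new ArrayList<String>(deltas.values());
    }
}
```

Because IDs are assigned in insertion order, keeping a single high-water mark is enough to make repeated fetches idempotent: calling `fetchDeltas` twice in a row returns nothing the second time.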


3.5 Notifying Clients of Order Completion

Once an order has been paired and added to the closed orders database, a trigger is created, containing the order ID, the settle price and the client who had placed it. The activity listener consumes this trigger and notifies the client via the REST service. This flow is represented in Figure 7.

3.6 Modelling Orders

Thus far, the implementation details of an order have been abstracted away in the description of the system flows. We will now provide a more detailed look at how dark pool orders have been modelled, leveraging the financial concepts introduced in Chapter 1. A visualisation of the data structure used is shown in Figure 8.

To begin with, a choice was made to model the most commonly used stock instrument - the limit order. The meaning of each field is provided below:
• ID serves as a unique system identifier and is assigned by the live orders database upon insertion.
• Client ID uniquely identifies the customer who has placed the order.
• Type of security is a financial term that defines what kind of instrument the order refers to. In the present case, orders are allowed to be either futures or spot contracts.
• Amount represents the total size of the order (tons of aluminium, for instance).
• Direction can be either buy or sell, depending on client intention.
• Expiry date is used by clients to specify how long they are willing to wait until the order is completed.
• Arrival time is populated by the live orders database and is used for audit purposes.
• Paired Order ID is populated by the matching engine once an order has been completed. It serves as an audit trail for how supply has met demand.
• Previous Order ID is populated when the matching engine divides a larger order into smaller parts. It serves as an audit trail of how the algorithm has behaved (this mechanic is discussed in more detail in Section 3.9).
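The order model described above can be rendered as a plain Java class. The field names follow the description; the types, and the limit price field (implied by the limit-price check in Section 3.1 rather than listed explicitly here), are my illustrative choices.

```java
// A plain-Java rendering of the order model (Figure 8). Field names follow
// the description above; types are illustrative, not the project's own.
// limitPrice is implied by the limit-price check in Section 3.1.
public class StockOrder {
    long id;              // assigned by the live orders database on insertion
    long clientId;        // the customer who placed the order
    String securityType;  // "spot" or "future"
    int amount;           // total order size, e.g. tons of aluminium
    String direction;     // "buy" or "sell"
    double limitPrice;    // the most (buy) / least (sell) the client will accept
    long expiryDate;      // how long the client will wait for completion
    long arrivalTime;     // populated by the database, kept for audit
    Long pairedOrderId;   // set by the matching engine on completion (null until then)
    Long previousOrderId; // set when a larger order was split (null otherwise)
}
```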

3.7 Order Storage

As has already been mentioned, this project makes use of two separate databases: one for orders that are yet to be matched and another for those that have already been paired. We will begin by discussing the common traits of the live and closed order databases, then highlight their design differences.

To begin with, both are relational databases that share the same structure, consisting of two tables - Client and StockOrder. The Client table of the live orders database holds information about currently registered clients of the system, while its counterpart in the closed orders database maintains information on both past and current clients. A client entry is a tuple composed of an ID, a name, an account number and a means of contact. For the purpose of the system, this is enough information. The unique client ID is also used as a primary key to the table.


Figure 7. Order Closing Flow

Figure 8. Order Model


StockOrder has columns for all of the attributes used in modelling an order. The unique ID field is used as a primary key for the table. An additional foreign key, ClientID, is present, joining the two tables. Constraints on order direction being either a buy or a sell and type of security traded being either a spot or a future are also present.

The entity relationship diagram for the database is shown in Figure 9.
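The schema constraints just described can be made concrete. The literal DDL string below is an illustrative guess at what such a schema might look like, not the project's actual DDL; the Java helper methods mirror the two CHECK-style constraints on direction and security type.

```java
// Sketch of the StockOrder schema constraints. The DDL is an illustrative
// guess at the schema described above; the helper methods mirror the two
// CHECK constraints in plain Java.
public class SchemaConstraints {
    static final String STOCK_ORDER_DDL =
        "CREATE TABLE StockOrder ("
      + "  ID INTEGER PRIMARY KEY,"
      + "  ClientID INTEGER NOT NULL REFERENCES Client(ID),"
      + "  Direction VARCHAR(10) CHECK (Direction IN ('buy', 'sell')),"
      + "  SecurityType VARCHAR(10) CHECK (SecurityType IN ('spot', 'future'))"
      + ")";

    static boolean validDirection(String d) {
        return "buy".equals(d) || "sell".equals(d);
    }

    static boolean validSecurity(String s) {
        return "spot".equals(s) || "future".equals(s);
    }
}
```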

It can be observed that the above schema is normalised to at least the second normal form. This is because there can be no duplicate fields in the table (which satisfies the first normal form) and there is no partial dependency on a concatenated key (which satisfies the second normal form).

Redundancy and fallback are ensured by having periodic copies of both databases stored in a separate location. These copies are created individually for each database when a user-definable number of entries has been added. Creating back-ups for every newly added record is infeasible given the speed at which new data arrives in the system, while backing up once a day would risk losing too much information in the event of a system failure. I believe there is no universally correct solution to this problem. In the present design, it has been decided to implement a back-up every five minutes, as a proof of concept.

The main difference between the two databases relates to the trade-off between speed and storage. The live order database needs to be fast enough to support potentially tens of thousands of orders per second[13]. Assuming all string data fields in the database are of maximum size 10 (the only strings currently allowed define the direction of trade and the type of security traded, the longest value being "future" at 6 characters) and all integers (including date/time fields) are 32 bit, a limit order database entry will be at most 52 bytes. Ten thousand orders per second therefore amounts to just over 0.5 MB of raw data per second, which suggests the limiting factor is per-write access latency rather than raw bandwidth. A database writing to a fast SSD can store at most 509 MB/s[14], whereas an in-memory database (held in RAM) using DDR4 SDRAM can write at up to 14513 MB/s[15] with far lower access latency. As such, the live order database is held in RAM.

In contrast, storage space is more important than speed to the closed order database. Writing operations to it can be buffered if necessary, but the emphasis is on maintaining an audit


Figure 9. Entity Relationship Diagram


trail of transactions performed over a long period of time. As such, standard HDD or SSD persisted database technologies will suffice.

Finally, as evidenced in the sections on system flows, database interactions are designed to be either insert or withdraw operations. The lack of updates to records means that multiple read operations can take place on the same entry and are guaranteed to return the same result. This is extremely useful when designing a parallel system.

3.8 Communication Layer

The client-to-server communication layer is designed as an asynchronous representational state transfer (REST) web service that exposes a dedicated URL for each of the supported client interactions. These include placing an order, cancelling an order and querying the dark pool for orders belonging to a given client. As a means of guaranteeing that all messages received by the system follow the expected format, a request factory is provided. Its secondary role is to perform logical checks on the messages (as described in Section 3.1). Finally, once users have programmatically filled in the details of the desired message type, they use the request factory to send their request to the REST service.
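As an illustration of the kind of logical checks the request factory might perform, consider the following sketch. The OrderRequest fields, the RequestFactory name and the specific rules are assumptions for illustration, not the project's actual API:

```scala
// OrderRequest fields and the validation rules shown are illustrative
// assumptions about the checks described in Section 3.1.
case class OrderRequest(direction: String, security: String,
                        amount: Int, limitPrice: BigDecimal, expiry: Long)

object RequestFactory {
  private val directions = Set("buy", "sell")
  private val securities = Set("spot", "future")

  /** Right(request) if all logical checks pass, Left(reason) otherwise. */
  def validated(r: OrderRequest, now: Long): Either[String, OrderRequest] =
    if (!directions(r.direction))     Left(s"unknown direction: ${r.direction}")
    else if (!securities(r.security)) Left(s"unknown security: ${r.security}")
    else if (r.amount <= 0)           Left("amount must be positive")
    else if (r.limitPrice <= 0)       Left("limit price must be positive")
    else if (r.expiry <= now)         Left("order already expired")
    else Right(r)
}

val ok  = RequestFactory.validated(
  OrderRequest("buy", "spot", 100, BigDecimal(100), expiry = 10), now = 1)
val bad = RequestFactory.validated(
  OrderRequest("buy", "spot", -5, BigDecimal(100), expiry = 10), now = 1)
```

Returning an Either rather than throwing lets the REST layer map a Left directly onto a 400 response body.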

The Richardson Maturity Model[16] is used to grade an API against the constraints of REST. It distinguishes between four levels of sophistication, of which only the highest describes a fully RESTful service. We will now briefly cover the four levels of the model and argue how the presented design obeys all of them:

• Level 0 (Swamp of POX) uses only one entry point (Universal Resource Identifier, or URI) and one kind of HTTP method. The fact that the current system has multiple URIs and uses three different HTTP methods (POST, DELETE and GET) means it passes this level. Figure 10 lists the URIs supported by the system and the methods that are used for each.

• Level 1 (Resources) enriches Level 0 by adding support for multiple resources. However, it is still confined to using only one HTTP predicate. Figure 10 shows how the current system passes the requirements for the level.

• Level 2 (HTTP verbs) requires the proper use of dedicated HTTP methods and response codes for each scenario. In the project design, POST is used to send new orders to the dark pool, GET is used to retrieve order information for the given client and DELETE is used to send withdraw order requests. Furthermore, Figure 11 explains the usage of HTTP codes in the system.

Code Meaning

200 Request successfully fulfilled

400 Bad request received (potential problems with message body)

401 Client is trying to access resources/information not intended for them (i.e. trying to query other clients’ orders)

404 URI attempted is not valid

405 Client has attempted to use the wrong HTTP method for the given URI (i.e. GET on /orders/withdraw)

500 The system is experiencing an internal error and requires maintenance work


Figure 11. HTTP response code meaning in system

Figure 10. Available REST resources


• Level 3 (Hypermedia controls) introduces Hypermedia As The Engine Of Application State (HATEOAS). This implies that further service functionality is exposed as part of the reply. In our case, this is done on POST orders, where a link to the withdraw URI is provided.

3.9 Matching Engine

Section 3.4 has already established the purpose and the flow of the matching engine. It is now time to present in more detail how this component operates. There are three main parts to the matching engine. To begin with, there is a pre-matching module, whose purpose is to reduce the number of orders that end up in the matching pool (the data structure which holds all open orders). Following the pre-matching module, there is a runner system, whose purpose is to manage the matching pool and feed orders from it to the pairing algorithm. Finally, the pairing algorithm attempts to match the orders it is given in a way that maximises social surplus.

There is an additional component, called the activity listener (pictured in Figure 6), whose purpose is to monitor database activity and notify the matching engine that it needs to update its internal data structures. However, constant interruptions to the matching engine for the purpose of updating its data store are time expensive. A better solution is therefore required. There is nothing that can be done about delete interruptions, as the system cannot risk pairing withdrawn orders. However, new order interruptions can benefit from the following optimisation: only interrupt the matching engine when a set number of new orders have been added. If a fixed time has passed since the last listener interruption, it may be the case that new orders have arrived but not reached the interruption threshold. In this case, the matching engine will query the database for new orders without the need for listener interruption.
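The batching optimisation just described can be sketched as follows. Class and method names are illustrative, not taken from the project code (the real listener is wired to database triggers):

```scala
// Illustrative sketch: interrupt the engine only every `threshold` inserts;
// a timer-driven poll picks up any stragglers below the threshold.
class BatchedListener(threshold: Int) {
  private var pending = 0
  private var interrupts = 0

  /** Called for every database insert; interrupts the engine only at the threshold. */
  def onInsert(): Unit = {
    pending += 1
    if (pending >= threshold) { interrupts += 1; pending = 0 }
  }

  /** Called by the engine on a fixed timer: returns true when orders arrived
    * without the threshold being reached, i.e. the engine should poll the
    * database itself rather than wait for an interruption. */
  def onTimer(): Boolean = {
    val mustPoll = pending > 0
    pending = 0
    mustPoll
  }

  def interruptCount: Int = interrupts
}

val listener = new BatchedListener(threshold = 10)
(1 to 25).foreach(_ => listener.onInsert()) // 25 inserts: 2 interrupts, 5 pending
val polled = listener.onTimer()             // picks up the 5 stragglers
```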

The pre-matching module is the first component in the matching engine to receive new order information. It operates by trying to pair orders that perfectly match each other. For example, assume an existing spot buy order of 100 tons of aluminium priced at most at £100 represents the highest amount anyone on the market is willing to pay for that quantity. Such an order will be matched by this module with a newly arrived spot sell order of 100 tons of aluminium priced at a value less than £100. Observe how in this situation, the rule of maximising social surplus has not been violated.

In order to implement its functionality, the pre-matching module is designed to store all orders currently in the matching pool in a look-up table, indexed by the order type, amount, price and type of security traded fields. When a new order arrives, this module pulls all currently stored data that match the new arrival in terms of amount, price and type of security traded, but are of opposing direction. The one that ensures the highest price to the seller is then paired with the new arrival. The polling of a look-up table for compatible values takes linear time and so does finding the element with the maximum price. As such, some orders can be paired up in linear time, before any algorithm is even employed.

The orders that could not be closed by the pre-matching module arrive at the runner system, which stores them in the matching pool. The runner system uses one of the available pairing algorithms on the currently open orders. It is important that the runner system and pairing algorithms employed be decoupled, in order to allow system extensibility. In this way, when more performant algorithmic modules become available, they can easily be swapped in favour of the older ones. The other role fulfilled by the runner system is interrupting the matching process when it has been notified of an order withdraw. When this happens, the order that has been withdrawn is removed from the matching pool and any tentative matches of which it was a part are undone. Furthermore, any contracts closed that involved the withdrawn order are reverted. The internal flow of the matching engine for new order updates is represented in Figure 12. Figure 13 presents the same flow for when a withdraw interrupt has been received.


The system makes use of four different matching algorithm modules, which will now be presented in more detail:

• Naive Matching is a brute-force approach that exhaustively attempts to match any two trades. Given that it explores the whole set of potential pairings, it is guaranteed to produce an optimal match. It can be observed that the running time of such an algorithm is bounded below by an exponential function of the input size.

• Greedy Matching is similar to the pre-matching module solution. It works by first considering each sell order and finding its potential matches in terms of price and quantity. It then sorts these matches descending by price and starts pairing them up with the current sell order, splitting the buy orders as needed (Figure 14). This approach ensures social surplus and runs in polynomial time. Figure 14 also shows the downside of this approach: the sell order will be fully paired, but it might not be closed until the matching threshold is reached for all the buy orders whose pieces it has been paired with. Lowering the matching threshold speeds up order closing, but calls into question the usefulness of a private stock exchange.


Figure 12. New order interruption

Figure 13. Delete order interruption

Figure 14. Issue of pairing with matching threshold
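The greedy pairing step described above can be sketched as follows. The Buy fields and the helper name are assumptions; prices play the role of limit prices and amounts are in tons:

```scala
// Buy order fields are assumed for illustration.
case class Buy(id: Int, amount: Int, price: BigDecimal)

/** Pair one sell order greedily against compatible buys: highest price
  * first, splitting the final buy order if necessary. Returns (buyId, tons). */
def greedyPair(sellAmount: Int, buys: List[Buy]): List[(Int, Int)] = {
  var remaining = sellAmount
  val pairs = List.newBuilder[(Int, Int)]
  for (b <- buys.sortBy(b => -b.price) if remaining > 0) {
    val taken = math.min(remaining, b.amount) // split the buy if needed
    pairs += ((b.id, taken))
    remaining -= taken
  }
  pairs.result()
}

val pairs = greedyPair(100, List(Buy(1, 60, BigDecimal(98)),
                                 Buy(2, 70, BigDecimal(101)),
                                 Buy(3, 30, BigDecimal(99))))
// highest price first: all of buy 2 (70 tons), then buy 3 (30 tons)
```

Sorting descending by price before pairing is what preserves social surplus: the seller is always matched against the highest remaining bid.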


• The Dynamic Programming approach to the matching problem can be seen as a rephrased 0/1 Knapsack Problem: given a sell request and a list of buy orders, each defined by its amount (weight) and limit price (value), find the buy orders that ensure a maximum value for the sell request while completely fulfilling it and not dividing any of the buy orders. This solution also runs in polynomial time, but is very dependent on an even distribution of order amounts, such that fast and complete pairings can be produced.

• Using a Flow Network has the logical advantage of overlaying a graph model onto the problem space. It employs standard computer science algorithms, such as Ford-Fulkerson[17], but suffers from the same limitation as the Greedy Matching solution: until all orders that make up the pairing have reached their matching threshold, the match cannot be completed.
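The knapsack formulation from the Dynamic Programming bullet above can be sketched as follows. The representation (value as an integer, the BuyOrder name) is an assumption; the table best(w) holds the highest-value selection of whole buy orders summing to exactly w:

```scala
case class BuyOrder(id: Int, amount: Int, value: Int) // value ~ amount * limit price

/** 0/1-knapsack sketch: choose whole buy orders whose amounts sum exactly
  * to the sell amount, maximising total value. None if no exact fit exists. */
def knapsackMatch(sellAmount: Int, buys: List[BuyOrder]): Option[List[Int]] = {
  // best(w) = (maximum value, chosen ids) achievable at exactly weight w
  val best = Array.fill[Option[(Int, List[Int])]](sellAmount + 1)(None)
  best(0) = Some((0, Nil))
  for (b <- buys; w <- sellAmount to b.amount by -1) // descending: each buy used once
    best(w - b.amount).foreach { case (v, ids) =>
      if (best(w).forall(_._1 < v + b.value))
        best(w) = Some((v + b.value, b.id :: ids))
    }
  best(sellAmount).map(_._2.sorted)
}

val chosen = knapsackMatch(100, List(BuyOrder(1, 60, 6000),
                                     BuyOrder(2, 40, 4100),
                                     BuyOrder(3, 100, 9900)))
// orders 1 and 2 (value 10100) beat the single order 3 (value 9900)
```

Returning None when no subset sums exactly to the sell amount reflects the "completely fulfilling" constraint, and is exactly why this strategy depends on an even distribution of order amounts.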


Chapter 4. System Implementation

So far, we have seen how the domain requirements have been reflected in the system design. The aim of this chapter is to cover the development methodologies used during project realisation, discuss implementation details and provide an overview of technologies involved in building a dark pool.

4.1 Development Processes

During the course of this project, three main development processes have been used. For each of them, a brief explanation will be provided, followed by individual descriptions of how they were applied.

4.1.1 Test Driven Development

To begin with, the project was developed in line with Test Driven Development (TDD) recommendations. This involves repeating a short development cycle that begins with writing an (initially failing) automated test case, in order to define desired functionality. Then, the minimum amount of code is written to pass the test. Refactoring the code to acceptable standards follows. For the current project, the two types of tests used were unit and specification tests.

An example unit test written during the database implementation phase required the storage of an existing database to disk. The minimum amount of code required to pass it was a function that took a database instance as a parameter and persisted it to disk under a generic name. Subsequent refactoring then ensured that the name given to the newly stored database was unique, so as not to overwrite previous saves.
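The refactoring step can be sketched as follows; the naming scheme and function name are assumptions for illustration:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Illustrative: a timestamped name guarantees earlier saves are never overwritten
// (at one save per second or slower).
def uniqueSaveName(dbName: String, now: LocalDateTime): String = {
  val stamp = now.format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"))
  s"$dbName-$stamp.db"
}

val name = uniqueSaveName("liveOrders", LocalDateTime.of(2016, 5, 1, 12, 30, 0))
// "liveOrders-20160501-123000.db"
```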

Specification tests were written to ensure that the REST service behaved according to expectations. Such an example required that a valid message sent to a valid URI, using the wrong HTTP verb be rejected with a standard 405 error code and an explanation of what the correct HTTP method should have been. The minimum code required to pass this test involved an overriding of the API error handling, in order to insert the hardcoded message. Upon refactoring, a dedicated exception class was created for this purpose and its instances used as responses.

4.1.2 Learning Spikes

Learning spikes are an Agile practice used to minimise the time required to gain a working knowledge of a new tool/technique/language. They involve studying just enough of the new element to produce a set of tests that model desired behaviour. Following this first step, developers proceed to study only as much as is needed to write code that passes each test. During this project, learning spikes were used for both REST API implementation and database design. On average, each new component required 2-3 hours of investigation before tests could be written for it, followed by a similar time investment for implementation research.

4.1.3 Kanban Task Board

Agile task boards are a lightweight mechanism for tracking progress on a project. User stories represent rows on the planning board, each story having multiple features associated with it. Kanban boards build on the idea of an Agile task board by adding the concept of limiting work-in-progress (WIP)[18]. The board layout used during this project was a standard Kanban design, consisting of the following headings: Story, Task Backlog, In Development, In Testing, Done. Details of what each of these categories means, together with an example board, can be seen in Appendix B.


4.2 Development Languages

The two main development languages used as part of this project are Scala and SQL. The purpose of this section is to elaborate on the reasons behind using these technologies and compare them with other available options.

4.2.1 Scala

Scala is a statically typed, JVM-compiled language. This means it can interact seamlessly with Java byte code, as both run in the same virtual machine. It also natively supports all the programming paradigms that Java uses (such as object-oriented programming). Furthermore, it offers developers the ability to write native functional code (a capability only added to Java in version 8), while requiring only Java Runtime Environment (JRE) 6[19].

It has been argued that Scala is more expressive than Java[20][21]. Other advantages over Java include superior pattern-matching functionality[22], better native support for multithreaded computing[23] and a richer type system[24].

Scala also benefits from a powerful dependency management and build tool called Simple Build Tool (SBT)[25]. A detailed comparison between SBT and other build tools, which also explains the advantages of the former, can be found in [26].

For all the above reasons, Scala was chosen as the main development language of the project. A more detailed discussion on the topic of using Scala over Java falls outside of the scope of this report.

4.2.2 SQL

Structured Query Language (SQL) is the domain-specific language designed for managing data held in a relational database management system (RDBMS). As such, the use of SQL is dictated by the type of database used. The alternative to an RDBMS is a non-relational (NoSQL) database. Given that relational databases guarantee a fixed schema (useful for consistency) and ACID compliance, while NoSQL implementations do not, the choice was made to use the former.

4.3 Database Implementation

As the previous section has already established, the system will make use of a RDBMS. As described in Section 3.7, the live orders database will be held in RAM memory, while the closed orders database will be stored on disk. This section aims to first cover the decision concerning what relational database framework to use. Implementation details will then be presented, including storage redundancy and fail-over.

From a software engineering perspective, the best way to implement all database-related functionality was by means of a library singleton object.

4.3.1 Database Engine

H2[27] is an open-source relational database engine written in Java that supports the standard Java Database Connectivity (JDBC) API. It fits the requirements set out in Section 2.2 in the following ways:

• Allows for the creation of both in-memory (fast) and disk-persisted (large) databases.
• Has support for SQL triggers, like the ones needed to notify of new order arrivals/order withdrawals.
• Supports multiple connections and concurrency.
• Has support for two-phase commit transactions.
• Is faster than other open-source or commercial solutions[28].
• Is lightweight, as the JAR file only requires 1-2 MB of disk space.

For all of the above reasons, H2 was selected for use in this project.
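H2 selects in-memory versus file-backed storage purely from the JDBC URL, which maps directly onto the live/closed split described in Section 3.7. The database names below are illustrative assumptions:

```scala
// Illustrative H2 JDBC URLs; database names are assumptions.
val liveOrdersUrl   = "jdbc:h2:mem:liveOrders;DB_CLOSE_DELAY=-1" // held in RAM
val closedOrdersUrl = "jdbc:h2:./data/closedOrders"              // persisted on disk

// With the H2 jar on the classpath, either would be opened via plain JDBC:
//   val conn = java.sql.DriverManager.getConnection(liveOrdersUrl)
```

The DB_CLOSE_DELAY=-1 flag keeps an in-memory database alive after the last connection closes, which matters when multiple components connect intermittently.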


4.3.2 Redundancy and Fail-Over

Redundancy is implemented for both the live and closed orders databases. For the former, given that it is an in-memory database, redundancy is ensured by periodically creating copies of it on disk and maintaining logs of all orders being inserted. Redundant storage for the already disk-persisted closed orders database is ensured by storing a copy of it in a different location (ideally, a different disk). Additionally, two copies of the closed orders logs are kept: one for each location where the database is stored.

If an unexpected disk/memory failure happens, fail-over is ensured by first loading the safety back-up from its remote location. Any changes that had occurred on that database, recorded by the log files after the safety snapshot had been taken, are then performed. While the backloading of changes is happening, any new orders/withdrawals are queued and the matching process is paused. Once the information held in the failed database(s) is up to date with the logs, normal system operation is resumed.

4.4 REST Service

With the resources and behaviour of the communication layer defined in Section 3.8, it is now time to present the implementation of the REST API. The first part of this section will cover the choice of framework and briefly explain the actor model design pattern. Details of message processing and multithreaded programming will then be presented.

4.4.1 REST Framework

The two most popular JVM web development frameworks are spray.io[29] and Play[30]. Both allow developers to build scalable, modern web and mobile applications. In terms of community and documentation, there is ample literature on both, and there are no significant performance differences between the two. However, spray.io has a more modular design; as such, the decision was made to use it for this project. The modules used were spray-can (HTTP server), spray-routing (for exposing resources), spray-client (further HTTP functionality such as request building) and spray-json (for JSON marshalling/unmarshalling).

4.4.2 Actor Model and Akka

Spray uses an asynchronous mathematical model of concurrent computation called "the actor model". Actors are treated as the universal primitives of concurrent computation. Each actor has an inbox and an outbox for communicating with the other actors in the system. Notably, the literature[31] describes the actor model as the most scalable multithreaded design to date.

Akka[32] is a toolkit and runtime for building highly concurrent, distributed and resilient message-driven applications on the JVM. It is built in Scala and implements the actor model concisely. An important feature of Akka is that it ensures messages are delivered to a receiving actor in the same order in which they were dispatched by the sending actor. This satisfies the formal requirement of maintaining client message ordering.

4.4.3 Message Processing

Communication with the REST API is achieved by sending JavaScript Object Notation (JSON)[33] messages to the appropriate web resource. For each client communicating with the system, an actor is created. These actors are deleted once they have been inactive for a fixed amount of time.

Upon message arrival, the JSON is unmarshalled into dedicated case classes[34] for each expected request type. For example, there is an Order case class for new order requests, whose fields store all the details from Figure 8. Similarly, there are predefined case classes for all system-to-client interactions. The factory design pattern[35] is used to create instances of these case classes without exposing their creation logic to the REST API. Asynchronous replies to client requests are sent by marshalling these case classes into JSON.


4.4.4 Multithreaded Computing

As previously mentioned, all communications between clients and the dark pool are asynchronous. This is achieved with the help of a Scala feature called a "future". "Futures provide a way to reason about performing many operations in parallel - in an efficient and non-blocking way. A Future is a placeholder object for a value that may not yet exist. Generally, the value of the Future is supplied concurrently and can subsequently be used. Composing concurrent tasks in this way tends to result in faster, asynchronous, non-blocking parallel code"[36]. Coupled with the use of actors, the result is a scalable and low-complexity multithreaded system. More importantly, the fact that actor tasks are simple in nature makes the system scale both horizontally (requires more CPU cores) and vertically (requires more powerful CPU cores).
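A minimal illustration of composing non-blocking work with Scala Futures follows; the lookup functions are hypothetical stand-ins for the system's asynchronous calls:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical lookups standing in for the system's asynchronous calls.
def lookupPrice(security: String): Future[BigDecimal] = Future(BigDecimal(100))
def lookupAmount(security: String): Future[Int]       = Future(250)

val price  = lookupPrice("aluminium-spot")   // both futures start immediately
val amount = lookupAmount("aluminium-spot")  // and run concurrently

// The for-comprehension composes the results without blocking.
val orderValue: Future[BigDecimal] =
  for { p <- price; a <- amount } yield p * a

val v = Await.result(orderValue, 5.seconds) // blocking here only for the demo
```

In the actual service no thread blocks on Await; results are instead piped back to the requesting actor as they complete.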

4.5 Matching Engine

Over the course of this section, an overview of the implementation details concerning the three most important matching engine components described in Section 3.9 will be provided.

4.5.1 Pre-Matching Filter

The pre-matching component is designed around the look-up table described in Section 3.9. This table is implemented as a HashMap whose hash function is given below:

hashValue = order.direction.hashCode + “ ” + order.amount.hashCode + “ ” + order.limitPrice.hashCode + “ ” + order.typeOfSecurityTraded.hashCode

The above hash function will clearly produce collisions whenever two or more identical orders exist. Collisions are resolved by appending to an array of orders that share the same hash code, kept sorted by expiry date. Assuming a fast hash function, the cost of adding a new order to this data structure is at most O(N).

When a new order arrives and it is determined that at least one identical request of opposing direction exists in the look-up table, the first element from the array of values with the correct hash value is removed. This process takes O(1) time. The memory footprint of the look-up table will grow linearly with the number of entries it has to store.
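The look-up table and its collision buckets can be sketched as follows. The Order fields mirror Figure 8 but are assumptions for illustration, as are the class and method names:

```scala
import scala.collection.mutable

// Field names mirror the order model but are assumptions for illustration.
case class Order(id: Int, direction: String, amount: Int,
                 price: BigDecimal, security: String, expiry: Long)

class PreMatcher {
  // Buckets of identical orders, keyed on direction, amount, price and security.
  private val table = mutable.HashMap[String, mutable.ArrayBuffer[Order]]()

  private def key(dir: String, o: Order) =
    s"$dir ${o.amount} ${o.price} ${o.security}"

  /** Pair the new arrival with an identical opposing order if one exists,
    * otherwise store it for later matching. */
  def submit(o: Order): Option[Order] = {
    val opposing = if (o.direction == "buy") "sell" else "buy"
    table.get(key(opposing, o)).filter(_.nonEmpty) match {
      case Some(bucket) =>
        Some(bucket.remove(0)) // buckets are sorted, so this is the earliest expiry
      case None =>
        val bucket = table.getOrElseUpdate(key(o.direction, o), mutable.ArrayBuffer())
        val i = bucket.indexWhere(_.expiry > o.expiry)
        if (i == -1) bucket += o else bucket.insert(i, o) // keep sorted by expiry
        None
    }
  }
}

val pm = new PreMatcher
pm.submit(Order(1, "sell", 100, BigDecimal(100), "spot", 50))
val matched = pm.submit(Order(2, "buy", 100, BigDecimal(100), "spot", 60))
```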

4.5.2 Algorithm Runner

The algorithm runner is implemented as a thin layer whose purpose is to instantiate each algorithm module class. Upon creation, the algorithm module instance is given the list of orders currently in the matching pool and starts running. Interruptions to its run are made by the runner as described in the design section of this report.

4.5.3 Algorithmic Modules

Each of the four algorithmic strategies described in the design section of the report is implemented as a stand-alone class. Each class also contains metric collection code. The two most important performance metrics recorded are the total number of paired orders and the average time taken for a match to be executed.


4.6 User Interface

While building a user interface for the project was not one of the main goals, it proved to be technologically challenging. This section will cover the issue of cross-domain resource sharing and the solution implemented in the project.

The REST API used by the dark pool is hosted on localhost, port 8080 - together, the hostname and port form the web domain. Any web interface to the system will inevitably have a different domain, either because it is stored on a different host or on the same host under a different port. Because of the browser's same-origin security policy, AJAX requests to what the browser perceives to be a different server are not allowed.

One solution that bypasses the cross-domain policies of web browsers is JSON with Padding (JSONP). JSONP requests are not dispatched using the XMLHttpRequest object and its associated browser methods. Instead, they create a script tag whose source is set to the target URL; this script tag is then added to the DOM[37]. On the client side, a JavaScript function with the same name as the JSONP callback exists and will process the incoming message. An example of this behaviour can be seen in Figure 15.

A screenshot of the UI showing available query options for a given client and returning a set of orders can be seen in Figure 16. CanvasJS [38] was used to create the graph. The vertical axis represents the value of the order (amount multiplied by limit price), while the horizontal axis represents a time series. Circles and triangles differentiate between buy and sell orders respectively.


Figure 15. JSON-P Illustration

Response:

processJSONP({ key1: value1, key2: value2, … keyN: valueN })

Client-side JavaScript:

function processJSONP(data) { … /* JSON processing code */ … }

Figure 16. UI screenshot


Chapter 5. Testing

During the implementation phase of this project, a number of different types of tests were written to ensure the correct behaviour of individual components. Furthermore, end-to-end testing was performed to guarantee overall system correctness. This chapter will give an overview of all the aforementioned types of testing as well as show code coverage statistics.

5.1 Unit Tests

As per TDD recommendations, unit tests were written for all relevant classes prior to their implementation. The two most important sets of unit tests concerned database interactions and matching algorithm behaviour under various inputs. ScalaTest[39] was the testing framework used in this project, and individual tests were written using the FlatSpec style. Figure 17 lists all database tests written.

Tested Component | Test Setup | Tested Behaviour | Expected Outcome
def createDB | None | Create an empty in-memory/disk database | Return created DB
def createDB | Save an empty database to disk | Load an existing database from disk into memory | Return loaded DB
def createDB | Save an empty database to disk | Load an existing database from disk and continue using it on disk | Return loaded DB
def saveDB | None | Persist an in-memory/disk database to disk, using a unique name | Save DB to disk with date/time in name
def insert | Create an empty temporary test database | Insert a new entry in the Clients table | Successful insertion
def insert | Create an empty temporary test database | Attempt to insert an invalid entry in the Clients table | Failed insertion
def insert | Create an empty temporary test database | Insert a new entry in the StockOrder table | Successful insertion
def insert | Create an empty temporary test database | Attempt to insert an invalid entry in the StockOrder table | Failed insertion
def pullData | Create an empty temporary test database and add one record to it, matching test query parameters | Query StockOrder table using valid criteria that are guaranteed to return a result | Return query result set containing at least one record
def pullData | Create an empty temporary test database and add one record to it | Query StockOrder table using valid criteria that are guaranteed to return no result | Return empty query result set
def pullData | Create an empty temporary test database | Query StockOrder table using invalid criteria | Return empty query result set
def removeOrderByID | Create an empty temporary test database and add one record to it, matching withdraw request | Withdraw an existing order from StockOrder table | Return confirmation of order withdrawal
def removeOrderByID | Create an empty temporary test database and add one record to it, not matching withdraw request | Attempt to withdraw a non-existing order from StockOrder table | Return failure message for order withdrawal

Figure 17. Database Unit Tests


The matching engine tests are common across all algorithm modules. Figure 18 shows the passing tests for the greedy matching algorithm. It also highlights the expressive power of FlatSpec style tests through scenario nesting and associated output messages.

Unit test code coverage for the whole project is at 84%. Literature [40] describes a code coverage between 70-80% as the main goal for most projects, as such the above figure is in line with industry expectations.

5.2 Specification Tests

As previously noted, specification tests are used to ensure expected REST service behaviour. Spray-testkit and specs2-core were the modules used for this purpose. In order to test only the communication layer, a mock database implementation had to be created (H2 provides this functionality). Each of the three REST resources described in Section 3.8 was tested by sending both valid and invalid messages and expecting the correct HTTP response code to be returned.

5.3 System Tests

In order to perform an end-to-end system test, client processes are required to interact with the system. The client process behaviour necessary for a system test is based on two interactions: sending new orders and withdraw requests. Each client process can be configured with the number of new orders to place, a pivot point around which random values will be generated and a standard deviation to bound the order values. Furthermore, such a process can also be instructed to randomly withdraw a placed order every T seconds.

Figure 19 shows the flow of client process generation and their interactions with the dark pool.


Figure 18. Greedy Matching FlatSpec style tests

Figure 19. System test client creation and usage


Chapter 6. Evaluation

This chapter will first describe how the system testing design presented in Section 5.3 can be used to evaluate dark pool performance under real-world conditions. A behaviour prediction will then be given and compared against the results obtained from the system. Finally, the individual algorithms will be compared in terms of performance.

6.1 Test Data Generation

As discussed in Section 1.1.2, all public stock exchanges post their daily executed trade records, together with closing prices for all transacted goods. Using historic aluminium closing prices provided by the London Metal Exchange (LME) [41] as pivots for the client processes, market-accurate test data can be produced. An interval of six months between January 2015 and June 2015 was used as source data, and a standard deviation of 10% of the pivot value was chosen for 90% of the placed orders. The remaining 10% of new orders were allowed to vary from the historical data by as much as 200% (in order to simulate market irregularities and absurd orders that would never be executed under normal circumstances). Figure 20 shows the historic closing prices for aluminium spot and future contracts executed on the LME and the deviation intervals of a given batch of test data.
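The 90/10 generation scheme described above can be sketched as follows. This is illustrative only: the names, the uniform deviation draw, and the clamping of prices to a small positive minimum are assumptions, not details taken from the project code.

```scala
import scala.util.Random

// Hypothetical sketch of the test data mix: 90% of generated orders deviate
// from the historical pivot price by at most 10%, while the remaining 10%
// may stray by up to 200% to simulate market irregularities and absurd
// orders. Prices are clamped to a small positive minimum.
object TestData {
  def dailyOrders(pivot: Double, n: Int,
                  rng: Random = new Random(7)): Seq[Double] = {
    val tightCount = (n * 9) / 10          // 90% of the batch
    val wildCount  = n - tightCount        // the remaining 10%
    def draw(maxDev: Double): Double = {
      val dev = (rng.nextDouble() * 2 - 1) * maxDev  // uniform in [-maxDev, +maxDev]
      math.max(0.01, pivot * (1.0 + dev))
    }
    val tight = Seq.fill(tightCount)(draw(0.10))
    val wild  = Seq.fill(wildCount)(draw(2.00))
    rng.shuffle(tight ++ wild)
  }
}
```

For a batch of 1000 orders around a pivot of 1800, at least the 900 tight orders land within 10% of the pivot; the 100 wild orders may land almost anywhere positive.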


Figure 20. LME historic closing prices for aluminium spot and future contracts


6.2 Expected Behaviour

Ideally, the system would produce a set of pairings whose closing day price would differ from the historical data by no more than the normal 5-6% variation observed between consecutive trading days in the real world. Furthermore, the generated orders that differ from the average price by wide margins (such as the 10% of orders allowed to deviate by up to 200%) should get very few matches and mostly be paired among themselves.

6.3 System Results

The following results were obtained after running the end-to-end test suite five times. Each time, ten client processes were used to place orders onto the system. Each client was programmed to place ten thousand orders per simulated day of testing and to randomly withdraw 1% of its orders. The figures given here represent the average results across all runs. The computational intractability of the naive solution meant it was not tested for such large volumes of data. For the greedy solution, three different thresholds of order-book completion before a matching may be executed were tested: 25%, 50% and 75% (the underlying problem is described in the greedy design section).

To begin with, there was very little difference (at most 8% in any single run) between the average numbers of orders matched by each of the three algorithms, with the greedy algorithm performing best (on a 25% threshold) at 92.1% of orders closed. Figure 21 shows how the matching performance of the algorithms compares. It can also be observed that the number of unmatchable orders remains roughly constant at around 7-8%, while the number of orders that expire before a match can be found depends on algorithm performance.


Algorithm                  Orders matched   No match available   Orders expired
Greedy on 25% threshold    92.1%            7.3%                 0.6%
Greedy on 50% threshold    87.0%            8.6%                 4.4%
Greedy on 75% threshold    81.3%            7.9%                 10.8%
Dynamic Programming        85.0%            8.4%                 6.6%
Flow Network               79.0%            9.2%                 11.8%

Figure 21. Algorithm matching performance comparison chart


In terms of closing day prices, an average difference of 6.87% was found between the average closing day price of the above solutions and the historical data. This figure is only 0.87% away from the expected behaviour, confirming system correctness. The dynamic programming solution scored closest to the historical closing prices, averaging a 5.34% difference over the five test runs. Unsurprisingly, the greedy strategy on a 25% threshold ranked at the opposite end of the scale, straying from historical closing prices by 8.93%. The average results for all algorithms across the six-month time frame are presented in Figure 22. The fact that the line chart follows the same trends as the original dataset (Figure 20) serves as further visual confirmation of the accuracy of the results (blue lines for spot orders, green for futures).

6.4 Algorithm Performance

All of the algorithms for which statistics have been given above have polynomial time complexity with respect to the size of their inputs. In order to understand the importance of time performance for large datasets, Figure 23 compares the execution times of the naive (O(N!)) and greedy (O(N log N)) implementations for a small dataset. As can be seen, the difference is significant even for inputs as small as 6 orders requiring matching.
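The gap between the two growth rates can be checked with a quick calculation (an illustrative sketch, not taken from the project code):

```scala
// Back-of-envelope comparison of the two growth rates from Figure 23: an
// exhaustive O(N!) search over all orderings versus an O(N log N) greedy
// pass. Even at N = 6 the factorial term dominates by a wide margin.
object Complexity {
  def factorial(n: Int): Long = (1L to n).product
  def nLogN(n: Int): Double   = n * math.log(n) / math.log(2)
}
// For N = 6 orders: factorial(6) == 720 candidate orderings,
// while nLogN(6) is roughly 15.5 greedy steps.
```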


Figure 22. Plot of closing prices averaged for all algorithms

Figure 23. Factorial vs Polynomial time complexity


Chapter 7. Reflection and Conclusions

From requirements gathering to design and implementation, working on this project has been a great learning experience for me. To begin with, it reinforced various aspects of my past and present learning. Such examples include leveraging database and software engineering knowledge from the second year together with advanced algorithms and agile software engineering from the third year of university. It also served as a catalyst for my study of new technologies and development methodologies. This chapter is first dedicated to highlighting the evolution of my initial timing of milestones. A listing of the new knowledge I have acquired with respect to the financial domain and new technologies will then follow. A project conclusion and suggestions for future work in the area will close the report.

7.1 Changes to Milestone Timing

There have been a small number of discrepancies between my initial plan (Appendix C) and the actual implementation work undertaken, with respect to milestone timing. The most relevant example was postponing the REST service development until after the January exams had finished, due to revision work.

Another significant difference involved the number of algorithms implemented for the matching engine. Whereas the project plan only allowed two weeks for implementing one algorithm, I was able to implement four.

7.2 New Knowledge

To begin with, this project has provided me with a great avenue for exploring how stock exchanges operate. As a result, I have researched financial literature in both electronic and printed form. Furthermore, I have liaised with industry contacts from my work placement in order to seek a better understanding of the current regulatory environment.

From a technology and tooling perspective, the current project has been a great opportunity to explore topics outside of my comfort zone. The first such example that comes to mind was the study of Scala - a language I have come to understand and appreciate for its versatility. Other such occurrences involved using Akka, spray.io and sbt, all of which were new to me.

Finally, working on this project and its associated tasks has also contributed to my soft skills development. I am confident that my abilities to manage a project, estimate task duration and communicate progress in both written and verbal form have improved as a result of working on my third year project.


7.3 Project Results

The current project set out to build a dark pool API and investigate the performance of various computer science matching algorithms applied in a financial context. It has achieved both these goals and can now be used as a starting point for other explorations of the financial landscape.

Even though the communication layer and database have only been tested with 0.1% of actual market data volumes, I am confident that my proposed architecture would scale up well to handle much higher loads.

The matching engine manages to close between 79% and 92.1% of placed orders. Furthermore, the resulting end of day price for the computed pairings is only 0.87% away from the expected margin (Sections 6.2 and 6.3).

Finally, the project has highlighted that while some matching algorithms lend themselves better to the domain space than others, there is currently no ideal solution that can guarantee 0% wastage of orders due to expiry time.

7.4 Future Work

Potential future improvements to the current project include exploring other matching algorithms (or mixing existing strategies) in order to either increase the number of closed orders or speed up the pairing process.

Expanding the scope of the project can be done in a number of ways. One example might involve adding more types of financial instruments (options, swaps etc). Another avenue that would be interesting to explore involves the use of machine learning and heuristics to provide superior order matching and even incoming order predictions.

As long as the financial landscape continues to change at its current pace, I believe that tools such as the one presented in this report will become increasingly important research instruments for both economists and academics to test their market theories in a safe environment.


References

[1] Peter T. Daniels, "The First Civilizations", in The World's Writing Systems, ed. Bright and Daniels, p.24, accurate as of 28.04.2016
[2] http://www.investopedia.com/university/options-pricing/black-scholes-model.asp, Black Scholes Model, accurate as of 28.04.2016
[3] Buy Limit Order Definition | Investopedia, http://www.investopedia.com/terms/b/buy-limit-order.asp, accurate as of 23.04.2016
[4] Futures Contract Definition | Investopedia, http://www.investopedia.com/terms/f/futurescontract.asp, accurate as of 23.04.2016
[5] Spot Trade Definition | Investopedia, http://www.investopedia.com/terms/s/spottrade.asp, accurate as of 23.04.2016
[6] http://www.londonstockexchange.com/traders-and-brokers/rules-regulations/rules-lse.pdf, London Stock Exchange Rules, accurate as of 02.05.2016
[7] http://www.londonstockexchange.com/traders-and-brokers/rules-regulations/rules-lse.pdf, London Stock Exchange Rules, accurate as of 02.05.2016
[8] https://rtdata.dtcc.com/gtr/dashboard.do, Real Time DTCC Data, accurate as of 02.05.2016
[9] Michael Lewis, "Flash Boys: A Wall Street Revolt", Penguin, ISBN 978-0141981031, 2014
[10] https://batstrading.co.uk/market_data/market_share/market/, BATS Europe | Market Share, accurate as of 02.05.2016
[11] https://www.playframework.com/documentation/2.5.x/ScalaWS, Play Framework Documentation, section on play.ws.timeout.request, accurate as of 24.04.2016
[12] http://spray.io/documentation/1.1.2/spray-can/configuration/, Spray Framework Documentation, see configuration concerning request-timeout, accurate as of 24.06.2016
[13] http://www.phlx.com/Trader.aspx?id=DailyMarketSummary, Daily Market Summary for Nasdaq; quotes 1,570,182,907 transactions executed on Nasdaq on 25.06.2016, or about 18,000 transactions a second (on average the volume is around 1.7 billion trades a day)
[14] http://ssd.userbenchmark.com/, SSD Benchmarks, accurate as of 25.06.2016, figure quoted for Corsair Neutron XT 240GB
[15] http://www.memorybenchmark.net/write_ddr3_intel.html, RAM Memory Benchmarks, accurate as of 02.05.2016
[16] http://restcookbook.com/Miscellaneous/richardsonmaturitymodel/, Richardson Maturity Model, accurate as of 27.05.2016
[17] L. R. Ford, Jr. & D. R. Fulkerson, "Flows in Networks", Princeton University Press, ISBN 9780691146676, 1962
[18] "On Setting Your Initial WIP Limits", The Agile Director, http://theagiledirector.com/, 2014-12-07, retrieved 2015-06-08
[19] http://scala-lang.org/news/2.11.1/, Scala Documentation, accurate as of 29.04.2016
[20] http://redmonk.com/dberkholz/2013/03/25/programming-languages-ranked-by-expressiveness/, Programming Languages Ranked by Expressiveness, accurate as of 29.04.2016
[21] Dean Wampler, Alex Payne, "Chapter 2: Type Less, Do More", Programming Scala, 2nd Edition, O'Reilly Media, ISBN 978-1-4919-4984-9, 2014
[22] Dean Wampler, Alex Payne, "Chapter 4: Pattern Matching", Programming Scala, 2nd Edition, O'Reilly Media, ISBN 978-1-4919-4984-9, 2014
[23] Dean Wampler, Alex Payne, "Chapter 17: Tools for Concurrency", Programming Scala, 2nd Edition, O'Reilly Media, ISBN 978-1-4919-4984-9, 2014
[24] Dean Wampler, Alex Payne, "Chapter 1: Zero to Sixty: Introducing Scala", Programming Scala, 2nd Edition, O'Reilly Media, ISBN 978-1-4919-4984-9, 2014
[25] http://www.scala-sbt.org/, sbt Documentation, accurate as of 30.04.2016
[26] Joshua Suereth and Matthew Farwell, "Chapter 1: Why sbt?", sbt in Action, Manning Publications, ISBN 9781617291272, 2015
[27] http://www.h2database.com/html/main.html, Official H2 Database Documentation, accurate as of 29.04.2016
[28] http://www.h2database.com/html/performance.html, Official H2 Database Documentation, accurate as of 29.04.2016


[29] http://spray.io/, Home Page for spray.io, accurate as of 29.04.2016
[30] https://www.playframework.com/, Home Page for Play Framework, accurate as of 29.04.2016
[31] Paul Chiusano and Rúnar Bjarnason, "Chapter 7: Purely Functional Parallelism", Functional Programming in Scala, Manning Publications, ISBN 9781617290657, 2014
[32] http://akka.io/, Home Page for Akka, accurate as of 29.04.2016
[33] http://www.json.org/, Home Page for JSON Protocol Description, accurate as of 29.04.2016
[34] http://docs.scala-lang.org/tutorials/tour/case-classes.html, Guide to Scala Case Classes, accurate as of 29.04.2016
[35] Eric Freeman, Elisabeth Robson, Bert Bates, Kathy Sierra, "Chapter 4: The Factory Pattern: Baking with OO Goodness", Head First Design Patterns, O'Reilly Media, Ebook ISBN 978-0-596-55656-3, 2004
[36] http://docs.scala-lang.org/overviews/core/futures.html, Scala Futures, accurate as of 30.04.2016
[37] http://json-p.org/, Home Page for JSON-P Protocol Description, accurate as of 30.04.2016
[38] http://canvasjs.com/, CanvasJS Landing, accurate as of 30.04.2016
[39] http://www.scalatest.org/, ScalaTest Home, accurate as of 30.04.2016
[40] http://www.bullseye.com/minimum.html, Code Coverage Guide by Steve Cornett, accurate as of 30.04.2016
[41] https://www.quandl.com/data/LME/PR_AL-Aluminum-Prices?utm_medium=graph&utm_source=quandl, London Metal Exchange Historical Data, accurate as of 1.05.2016
[42] Vickrey, William (1961), "Counterspeculation, Auctions, and Competitive Sealed Tenders", The Journal of Finance 16 (1): 8-37, doi:10.1111/j.1540-6261.1961.tb02789.x


Appendix A

Social surplus, as defined in COMP34120 AI and Games, is "the total value added by a given activity to all members of society who are affected by that activity." The activity in question for this project is the act of placing an order on a stock exchange.

All potential buyers of a given good have a predefined knowledge of how much that good is worth to them. In the case of Alice from Section 1, a ton of aluminium is only worth buying if its price is less than £500; this is called her true valuation for a ton of aluminium. More formally, the true valuation of an item is the utility value an agent has assigned to that item. It can be demonstrated [42] that a dominant strategy in any stock exchange or bidding scenario is to bid the true valuation for an item. This means that if the highest bid gets executed, then the buyer who would find the item most useful will receive it. Such a strategy is said to guarantee social surplus.
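The allocation argument above can be illustrated with a minimal sketch (all names are hypothetical): under truthful bidding each agent's bid equals its true valuation, so awarding the good to the highest bid gives it to the agent who values it most.

```scala
// Illustrative sketch of the social surplus argument (names hypothetical).
// With truthful bidding, each agent's bid is its true valuation, so the
// highest bid identifies the agent who values the good most, which is
// exactly the allocation that maximises social surplus.
case class Agent(name: String, trueValuation: Double)

object Auction {
  def winner(agents: Seq[Agent]): Agent = agents.maxBy(_.trueValuation)
}
```

In the Alice example, a truthful bid of £500 wins against any buyer whose own valuation is lower.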


Appendix B

The contents of this appendix are extracts from Coursework 1 of COMP33711: Agile Software Engineering 2015-2016, which I submitted on October 25th 2015.

“Kanban column explanations:

• Story: an overall view of what wants to be achieved. Each story entry is a new row on the board, to which all the other headings will be referring.

• Task Backlog: a list of features that need to be implemented in order to complete the story in question. All new tickets are created in this state initially.

• In Development: a list of features currently being worked on. Tickets move from the backlog to the development column once work begins on them.

• In Testing: a list of tickets for which testing is currently taking place. The tests can either be unit tests or complete system integration tests, to check that adding the given component does not break any of the working components. Tickets get moved into this column when development is completed and if it makes logical sense for them to be tested (i.e. the writing of tests before developing the code to pass them does not need to be tested).

• Done: a list of delivered features that have passed all tests, when tests were applicable. Tickets get put into this category when there is no development/testing work left to be done for them.”

Example Kanban board at a given point in time (also taken from same coursework):

Story: As a customer to the system, I want to be able to place orders onto the dark pool, so as to change my market position
Task Backlog: 1. Built REST interface; 2. Test client connectivity to interface; 3. etc
In Development: 1. Research available REST frameworks compatible with Scala and AKKA
In Testing: -
Done: -


Appendix C

A higher resolution version of the below diagram can be found here: https://github.com/mateiarsenie/darkpool-project/blob/master/Initial%20Plan.pdf
