ATHABASCA UNIVERSITY
MESSAGE ORIENTED SOFTWARE DESIGN USING COMMAND QUERY
RESPONSIBILITY SEGREGATION
BY
SIMON TIMMS
A project submitted in partial fulfillment
Of the requirements for the degree of
MASTER OF SCIENCE in INFORMATION SYSTEMS
Athabasca, Alberta
March, 2012
© Simon Timms, 2012
ABSTRACT
The design of software is an ever-evolving process. It seems that every year there are
new technologies and new techniques which require that software developers adopt new
methodologies for the development of software. As the popularity of computers and
the Internet increases, the computational and storage requirements of Internet services
also rise. In order to address these needs, large distributed systems such as Amazon's
EC2 and Microsoft's Azure were created. However, it is not enough to simply scale up
an application to larger and faster computers, as the growth of demand is outstripping
even the astronomical pace of improvements to memory and CPU. Instead of scaling up,
one must scale out to a greater number of computers, as growth in the number of
computers is not as limited. Scaling out is a complex undertaking due to a number of
factors. One possible approach to scaling out is to make use of a technique known as
Command Query Responsibility Segregation (CQRS). In its purest form CQRS dictates
that different data models be used for reading and writing data, synchronized through
the use of messaging.
A design used in many CQRS systems is to retain a stream of events rather than a
rich domain model. This event sourcing relies on the assumption that it is very quick
to recreate the current state of domain objects and views from a stream of messages.
However, this assumption has not been well proven and there is especially little research
on the best method of storing messages and the actual serialization of the messages. An
exploration of the best method of using event sourcing will be made and conclusions
drawn about the advantages of the various approaches.
TABLE OF CONTENTS
CHAPTER I - INTRODUCTION
CHAPTER II - REVIEW OF RELATED LITERATURE
Architectures for Development of Data Driven Applications
Scaling Databases
Consistency is a Lie
CQRS
CQRS Explained
Idempotence of Commands
CQRS vs. ActiveRecord (CRUD)
Relationship with Domain Driven Design
CQRS in Real World Situations
Research Question
Expected Outcome
CHAPTER III - METHODOLOGY
Environment
CHAPTER IV - RESULTS
Serialization and Deserialization
Results
Storage Technologies
CHAPTER V - RESEARCH IMPLICATIONS
Serialization Technology Selection
Database Selection
General Applicability to CQRS
CHAPTER VI - CONCLUSIONS
Glossary
References
List of Figures
1 Single Tiered Pattern
2 Repository Pattern
3 The CAP theorem
4 Order database model
5 Order database model after denormalization
6 A standard queue data structure
7 A queue data structure altered to function using the CQS pattern
8 CQRS pattern diagram
9 A series of commands arriving out of order
10 A comparison of a customer as seen by traditional and DDD architectures
11 Traditional synchronous persistence model
12 Asynchronous persistence model
13 The average time, in ms, for serialization of 50 000 small messages
14 The average time, in ms, for deserialization of 50 000 small messages
15 The on-disk size, in bytes, of a serialized small message
16 The average time, in ms, for serialization of 50 000 large messages
17 The average time, in ms, for deserialization of 50 000 large messages
18 The on-disk size, in bytes, of a serialized large message
19 The average time, in ms, for serialization of 50 000 sparsely populated large messages
20 The average time, in ms, for deserialization of 50 000 sparsely populated large messages
21 The on-disk size, in bytes, of a serialized sparsely populated large message
22 The time taken, in ms, to serialize and deserialize 500 messages to a variety of different storage technologies
List of Code Listings
1 An example of active record used to save a customer record to the database
2 An example of retrieving a collection of customers from an Active Record system and updating a property
3 An example of a scenario which would trigger the n+1 problem
4 Working with a set of entities
5 Simple Message Router
6 A simple message
7 A complex message with 28 fields
CHAPTER I
INTRODUCTION
Building software is a difficult problem and the field of software engineering is still very
new. Large systems have been built for less than 50 years and the process is far from
well understood. The variety of different systems means that no single approach is always
correct. Unlike a bridge, a computer system can simply be copied, so should the same
solution be needed somewhere else there is almost zero cost to reapplying it. Business
software also tends to be put in place to replace existing manual processes, whereas
bridges are built where no passage previously existed. This means that software must fit
in with existing constraints while bridge builders are able to dictate how the bridge is
used. Over the years many suggestions have been made as to how to build systems and
many have claimed to have the one true solution. What has become apparent is that a
single solution for all problems is ill advised. The architecture of a stock trading system
must be different from that of an off-line reporting system. It even seems that one stock
trading system is significantly different from other systems that seek to solve the same
set of problems. Often it is possible to create working systems using two completely
different architectures; the advantages of one over the other may only become apparent
years later during maintenance or expansion. These architectures are often formalized
in patterns - high level explanations of how to apply a particular architecture.
Command Query Responsibility Segregation is a relatively new development in enterprise
architecture, having only been suggested in the last few years. Perhaps the most
amusing indicator of the newness of CQRS is that, at the time of writing, a Google search
for CQRS will result in the suggestion "CQRS... did you mean cars?". This has become
something of a running joke in the CQRS community. This essay will explore a number
of aspects of CQRS from its history, to its design, to the situations in which it should and
should not be applied.
Section 2 will cover the current state of CQRS. It will explore what defines CQRS,
the current state of research and how to address some of the shortcomings of CQRS.
In section 3 a research problem related to event sourcing will be introduced. Section 4
will outline the results of the experiments. Section 5 will suggest an approach which can
be followed to make use of the results of the experiments. Finally, section 6 will draw
conclusions from the literature in section 2 as well as from the experiments.
The experiments in this paper provide a number of benchmarks for the serialization,
persistence and deserialization of messages, which are central to CQRS and other message
based architectures. In order for messages to be durable1 they must be written to disk.
While the experiments here focus on writing large numbers of messages as part of event
sourcing, the concepts are applicable to any persistence of messages.
One of the applications of CQRS is in high performance distributed computing where
message size and processing speed are critical. A study of which message formats and
storage mechanisms should be used is crucial for ensuring that these systems can be
performant. There is limited information on the size of systems deployed using CQRS
but there are reports of systems processing 250 million messages a day. The author has
worked on systems processing almost two million messages across half a dozen nodes.
1Durable messages are recoverable after a process or machine crash. Messages stored solely in volatile memory such as that used in processor caches and main memory are not durable as they are wiped after a power outage.
CHAPTER II
REVIEW OF RELATED LITERATURE
There is very little academic research on CQRS as it is only about a year old. The
term CQRS was coined in 2010 by Greg Young, who split the CQRS mailing list
from the Domain Driven Design group. Fortunately the relationship between CQRS and
DDD, as well as a number of other established technologies, is well understood and there
exists a large body of work on these topics. Because of the youth of CQRS, limiting this
essay to recognized academic resources is illogical; thus I shall draw on sources such as
mailing lists and blog posts on the topic of CQRS by some of the thought leaders in the
field. There is, however, a great deal of excitement about the potential of CQRS. At the
time of writing the Patterns and Practices group at Microsoft has announced that it will
be producing a set of recommendations and, possibly, some demonstration applications
centered around CQRS. Despite some controversy and early skepticism on the mailing
list it is likely that a recommendation from one of the largest software vendors in the world
will spur significant adoption. The Patterns and Practices group's initial publication is
expected early in 2012[23].
ARCHITECTURES FOR DEVELOPMENT OF DATA DRIVEN
APPLICATIONS
A huge number of developers are employed to create systems for the entry, analysis and
reporting of data. This is simply because businesses produce a great deal of data and
have a need to make decisions based on that data. These enterprise systems rely heavily
on relational database systems. Table based relational database systems
have been at the forefront of enterprise systems for the last thirty years. There has
been almost no other computer technology which has been as persistent or as pervasive
as relational database systems. An SQL query which worked on a database from 1985
is almost certain to work on the latest relational databases. What has evolved is the
method for accessing the data in the database.
Monolithic Single Tiered
There are a number of different architectural patterns which have seen common use for
accessing data from an application. The simplest is a single tiered monolithic system,
in which the application and data access form a single tier on a single computer.
This sort of system is useful for applications which are not collaborative and have no
need to share data. Traditionally systems such as word processors and spreadsheets fall
into this category, although those systems are now making their way online and becoming
collaborative. This method has also seen use in mainframe systems where the applications
are accessed via dumb terminals. Often these systems make use of flat files or files which
contain a proprietary data structure optimized for access by the application.
Figure 1: Single Tiered Pattern
N-Tiered
As data access requirements become more complex, a popular approach is to put in place
an architectural layer dedicated to data access. When using object oriented programming
there is a discord between the table based structure of traditional relational databases
and the data transfer objects (DTOs) found in an application. Application developers
prefer to walk the object to get access to its properties. For instance it would be expected
ICustomer customer = new Customer(); // active record customer
customer.FirstName = "Frank";
customer.LastName = "Berry";
customer.Save(); // persist the object to the database
return customer.ID; // the ID is populated from the database

Code Listing 1: An example of active record used to save a customer record to the database.
IEnumerable<Customer> customers = CustomerTable.ID.GreaterThan(7);
foreach (var customer in customers)
{
    customer.IsPreferred = true;
    customer.Save();
}

Code Listing 2: An example of retrieving a collection of customers from an Active Record system and updating a property.
that the orders collection hanging off of the Customer object would be populated with
the customer's orders. Often this functionality can be provided by an object relational
mapper such as Hibernate, Entity Framework or, very popular in the Ruby on Rails
community, ActiveRecord. These tools are often used against a remote database in order
to gain the advantages of an N-Tier architecture. The operations available through these
systems are usually limited to Create, Read, Update and Delete, commonly known
as CRUD. We will refer to this style of operation as Active Record architecture.
Each system which provides Active Record architecture differs in its implementation;
it is very common to see the data objects extended or decorated with methods allowing
for database access. Languages such as Ruby which permit monkey patching2 usually
implement the data methods by patching objects, while other languages, such as C#, may
extend the objects or generate them through static analysis and code generation
prior to compile time.
Queries are constructed behind the scenes and are usually provided as decorators on
a collection of objects.
2Monkey patching is the ability to add or even change the methods of an existing object at run-time. The ability to monkey patch allows for much easier unit testing although it does introduce some uncertainty about whether calling a method on two objects of the same type will have the same effect.
As the system again increases in complexity it becomes undesirable to use Active
Record, as the data operations become spread throughout the code and changes to the data
access tools or methods become major operations. It is simply too easy for developers
to access the database from any context they desire. To alleviate this concern many
systems centralize data operations into a layer containing repositories. These repositories
are classes through which all the data access requests pass. They may contain methods
which simulate the CRUD operations of Active Record but they may also start to make
use of task based data operations. This is preferable as it provides some abstraction
around the data access. Consider the act of saving a new customer and adding an order.
Using Active Record this would consist of first creating a customer, retrieving the ID of
the created record and then creating an order related to that customer and saving it. This
operation can be made atomic through the use of transactions at the database level, but
the logic would still be decentralized: every place in the application which created a
customer and an order would need to implement the operation. Should the requirements
change and there be a need to log the creation of a customer, making the change would
be difficult. A repository might provide a method CreateCustomerWithOrder which would
be called from throughout the application, providing an easy extension point and reducing
the amount of duplication of code - always a good thing.
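The task based repository method just described can be sketched as follows. This is a minimal, in-memory illustration; the type and member names are assumptions for the example rather than part of any particular framework, and a real repository would wrap database access in a transaction.

```csharp
using System.Collections.Generic;

// A sketch of the CreateCustomerWithOrder idea: every caller goes through
// this one method, so later requirements such as logging the creation of a
// customer mean changing a single place in the code.
public class Customer { public int Id; public string Name; }
public class Order { public int Id; public int CustomerId; }

public class CustomerRepository
{
    public List<Customer> Customers = new List<Customer>();
    public List<Order> Orders = new List<Order>();
    private int nextCustomerId = 1;
    private int nextOrderId = 1;

    public int CreateCustomerWithOrder(string name)
    {
        // create the customer and take its new ID
        var customer = new Customer { Id = nextCustomerId++, Name = name };
        Customers.Add(customer);
        // create the related order in the same operation
        Orders.Add(new Order { Id = nextOrderId++, CustomerId = customer.Id });
        return customer.Id;
    }
}
```

The point is only that the two steps are centralized; atomicity would still come from a database transaction inside the method.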
Figure 2: Repository Pattern
All these database access patterns are based on manipulating a single set of domain
objects. A page showing a list of customers might use the customers collection and
a projection to display the customer id, name and city. Difficulty arises when there
is a requirement to display information which crosses the boundaries of an entity. The
collection of orders is not contained in the same entity as the customer information. Were
it desirable to display the customer name and a count of their orders then multiple queries
are required. In an Active Record scenario this typically takes the form of decorating the
Customer object with a lazily instantiated collection of orders. Accessing the collection in
code will trigger the population of the records from the data source. One of the problems
associated with this method of data access is the n+1 problem. n+1 is a result of the
object relational mapper's inability to predict the future operations on a retrieved entity.
When iterating over a sub collection of objects the object relational mapper will
only load each object as it is needed. Thus when retrieving the entire sub collection each
Customer customer = Customers.Load(15); // load the customer with ID 15
foreach (Order order in customer.Orders)
{
    // some operation on order
}

Code Listing 3: An example of a scenario which would trigger the n+1 problem.
object is retrieved with its own query rather than multiple rows being returned from a
single query. Accessing each entity individually is less efficient than accessing all of them
at once, but the object relational mapper acts in this way because it does not know
whether the code is interested only in the first entity or in the entire collection. As the
collection might be prohibitively large, a decision is made to retrieve exactly what the
code requests and not assume that the collection is being iterated over fully. Programmers
need to explicitly instruct the object relational mapper to load the collection, but n+1 is
an extremely subtle bug because the code does work, it just works slowly. Often it is not
noticed in testing because of the limited size of the test database. Once in production,
with huge quantities of data, the issue is enough to bring a database to its knees.
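The arithmetic behind the name can be made concrete with a toy query counter. This is not a real ORM, only an illustration: lazily loading each of a customer's n orders costs one query apiece (n+1 in total once the customer itself is loaded), where eager loading of the collection costs a single query.

```csharp
using System.Collections.Generic;

// A toy "mapper" that counts the queries it would issue, illustrating the
// n+1 problem described above.
public class FakeMapper
{
    public int QueriesIssued;

    // Lazy style: one round trip per entity as it is touched.
    public List<string> LoadOrdersLazily(List<int> orderIds)
    {
        var orders = new List<string>();
        foreach (var id in orderIds)
        {
            QueriesIssued++;               // a separate query per order
            orders.Add("order " + id);
        }
        return orders;
    }

    // Eager style: one round trip returning all rows at once.
    public List<string> LoadOrdersEagerly(List<int> orderIds)
    {
        QueriesIssued++;                   // a single batched query
        return orderIds.ConvertAll(id => "order " + id);
    }
}
```

Both methods return the same data; only the number of round trips differs, which is exactly why the bug is invisible against a small test database.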
SCALING DATABASES
As the quantity of data which the application needs to access grows, it becomes necessary
to scale the database infrastructure. Unfortunately it is very difficult to scale table-
based databases because of the nature of the queries run against them. Almost every
meaningful data retrieval action against a relational database requires that tables be
crossed against one another. In our customer and customer order example above, if
we wished to list the name of a customer and a list of their orders we would need
to perform a cross between the customers and customerOrders tables. This operation
usually consists of matching records in one table and then augmenting and crossing them
against another table. If the two tables are stored on a single computer the operation
is not that difficult, but as soon as you attempt to cross the tables against data on
another computer the limitations of network speed become apparent. One cannot
transfer large quantities of data over a comparatively slow network in this fashion. Despite
the growth in storage media, the amount of data generated is growing far faster. Data growth
is proving to be somewhere on the order of 60% a year[16]. The result is that querying
this quantity of data cannot be performed on a single machine and must be distributed
over a large number of machines. This is known as scaling out, as opposed to simply
buying a faster computer, known as scaling up. The best option for scaling table based
databases is to distribute the data in a logical fashion. One popular technique is to make
use of sharding. This technique partitions the database in a horizontal fashion, splitting
tables into groups of rows each of which is distributed to a different server[29]. Specialized
techniques and tools must be used to reassemble the data after it has been split across
many instances. Care must also be taken to ensure that the partitioning scheme is both
efficient and easy to calculate such that the destination of an inserted row can be found
efficiently.
Sharding is a technique designed to help scale out databases which were never originally
designed for it. It involves partitioning the data within a table across multiple nodes.
For instance one scheme might be to keep all the customers whose ID is even on node A
and all those whose ID is odd on node B. It is then apparent, given an ID, which server
should be queried. Of course, if the ID of the customer is not known in the query then both
databases need to be queried, and if an additional table needs to be referenced then there
are significant data transfer requirements. If a query calls for a list of all customers
in Calgary with more than 12 orders then either the full list of customers in Calgary
must be calculated and aggregated somewhere or the list of customer IDs with more than
12 orders must be distributed to each of the nodes containing customer data. In either
instance the data sent back and forth between nodes is potentially large.
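The even/odd scheme above amounts to a routing function that any client can compute locally; a sketch, with hypothetical node names:

```csharp
// A sketch of the partitioning scheme described above: the shard holding a
// customer is a pure function of the customer's ID, so an insert or an
// ID-based lookup touches exactly one node.
public static class ShardRouter
{
    public static string NodeFor(int customerId)
    {
        // even IDs live on node A, odd IDs on node B
        return customerId % 2 == 0 ? "node A" : "node B";
    }
}
```

Queries that do not include the ID, such as the Calgary example, get no help from this function and must fan out to every node.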
Often a better solution is to make use of the new generation of NOSQL databases
which are constructed with a goal of allowing for easy scalability across a large number of
nodes. Distributed databases usually provide a restricted set of query tools but are always
limited by the CAP theorem. Established by Brewer[5] and later proved by Gilbert and
Lynch[13], the CAP theorem states that a distributed computer system can only ever
fulfill two of
• Consistency
• Availability
• Partition tolerance
at any one time. Traditional methods for scaling table based databases, such as sharding,
ensure consistency and availability but require that each node in the database be
reachable at all times. As soon as the network is partitioned the entire system fails.
Many of the NoSQL databases are constructed in a way that does not allow for
crossing tables. For instance CouchDB relies on map-reduce functions to filter datasets.
These functions can be run on many nodes at once and the results aggregated for return.
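The shape of such a query can be sketched in C# (CouchDB itself expresses map and reduce functions in JavaScript; this shows only the pattern): each node runs the map step over its own slice of the data, and the small partial results, rather than the rows themselves, are shipped back and reduced.

```csharp
using System.Collections.Generic;
using System.Linq;

// A sketch of the map-reduce pattern: map runs independently on each node's
// rows; reduce combines the per-node partial results. Only the partial
// counts cross the network, never the rows themselves.
public static class OrderCount
{
    // Map: count the orders on one node belonging to the given customer.
    public static int Map(IEnumerable<int> customerIdsOnNode, int customerId)
    {
        return customerIdsOnNode.Count(id => id == customerId);
    }

    // Reduce: sum the partial counts returned by each node.
    public static int Reduce(IEnumerable<int> partialCounts)
    {
        return partialCounts.Sum();
    }
}
```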
Figure 3: The CAP theorem from http://guide.couchdb.org/draft/consistency.html
As has been observed[21], availability is highly desirable for any sort of online system
so the question is really reduced to "Is it more desirable to have consistency or partition
tolerance?". NoSQL databases usually answer that partition tolerance is preferable.
Customer customer = session.Load<Customer>(id);
if (customer.Country == "Canada")
{
    foreach (Order order in customer.Orders)
    {
        order.TaxRate = 0.05;
        session.Update(order);
    }
}

Code Listing 4: Working with a set of entities. This code uses the conventions from the NHibernate ORM but should be familiar to anybody who has used Hibernate.
Often this results in the use of eventual consistency. An update to a record may not
be visible right away but will eventually be visible to subsequent queries. This sort of
database is useful for information which does not have real time requirements. However,
even if you do have real time requirements there are some significant issues related to the
loading and use of data.
CONSISTENCY IS A LIE
We have always been told that transactions and database consistency levels will save us
from getting inconsistent data from a traditional relational database; after all, most of the
world's computer systems are based on these systems. Unfortunately that is not really
the case. In order to see the issues let's look at a couple of examples which initially seem
to be okay but turn out to be problematic.
Object relational mappers often offer a facility to perform lazy loading of database
objects. Thus you might load a customer entity and then explore the collection of orders
related to that customer; as you explore the collection the ORM will load the entities.
In code listing 4 a customer is loaded and then their orders collection is lazily loaded.
Because these events occur separately it is possible that another process has updated the
customer entity between it being loaded and the collection of orders being loaded. The
only isolation level which would prevent this is serializable, the highest isolation level[27].
This is certainly not the default level for any database. It is likely that code of this
sort exists in many systems which do not make use of serializable isolation. I know that
I have written systems which would fail in this situation. But if we use serializable isolation
then at least we can be assured that all the records for that customer have been updated,
right? Unfortunately not: it is entirely possible that a new record has been added for
the customer after we've populated our collection. The only way to prevent such an
addition from occurring is to lock the entire table any time a transaction runs. Because
it is impossible to know what a transaction is going to do before running it (see the halting
problem), every single transaction would need to lock every table it uses for the duration
of the transaction.
It is the conjecture of DDD expert Udi Dahan that race conditions do not actually
exist[9]. His argument is that very slight differences in the timing of transactions should
not have an impact on business logic for the vast majority of businesses. He gives the
example of having the requirements:
1. If the order was already shipped, don't let the user cancel the order.
2. If the order was already canceled, don't let the user ship the order.
If two users issue the ship and cancel commands at almost the same time, what should
happen? The solution comes from the observation that the refund to the customer does
not have to be issued at once. We can issue a command to refund the customer once we
have assurances that the order can actually be canceled.
Computer systems in businesses are put in place to replace manual processes. Prior
to the advent of computers a memo would have been issued by the order department and
sent to the shipping department instructing them to cancel the order. A follow up memo
would be sent by the shipping department to the billing department informing them that
either the order had already shipped or that they had canceled the shipment. The near
instantaneous nature of computers and networks has led to the false assumption that
millisecond timings are important to the business. This is simply not the case3, for
3There is a business where this sort of timing is actually important and that is in stock trading. The requirements there are unlike those of traditional systems and are not really applicable to this race condition discussion.
the most part these conflicts can be solved by using compensating logic. For instance if
the customer cancels the order after it has shipped then the compensating action might
be to e-mail the customer to let them know that their order shipped anyway and that they
can ship it back for a refund.
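The compensating approach can be sketched as follows. Rather than locking to prevent the ship/cancel race, a late cancel is accepted and answered with a compensating action (here, queuing the "ship it back for a refund" e-mail). The names are illustrative, not a prescribed design.

```csharp
using System.Collections.Generic;

// A sketch of compensating logic in place of locking: commands are never
// rejected for losing a race; a cancel that arrives after shipment simply
// triggers a compensating action instead.
public class OrderProcess
{
    public bool Shipped;
    public bool Canceled;
    public List<string> Outbox = new List<string>();

    public void Ship()
    {
        if (!Canceled) Shipped = true;
    }

    public void Cancel()
    {
        if (Shipped)
            // compensating action: notify rather than fail
            Outbox.Add("Your order has already shipped; ship it back for a refund.");
        else
            Canceled = true;
    }
}
```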
Even the most common example of using a transaction to prevent race conditions,
withdrawing money from a bank account, can be restated such that it doesn't rely on
concurrency. Normally the argument is that the bank account balance must be locked
before the withdrawal to ensure no competing process withdraws all the money and
leaves the bank with a negative balance it must cover. In practice, however, it is rare
that this situation arises, and when it does the bank simply allows the withdrawal,
places the account in the negative and bills the customer some outrageous amount in
overdraft charges. In this situation the bank actually makes money from not observing
transactional consistency. Certainly there are cases where the bank will never recover its
money but these are so few that they are more than paid for by the overdraft fees.
CQRS Data Access
Command Query Responsibility Segregation provides an alternative way of looking
at data access. The vast majority of data access operations are reads. Anecdotal evidence
suggests that 99% of data access operations are reads[34]; however, the database models
and systems commonly in use tend to optimize for write operations. The goal of database
normalization is to reduce the number of places in a database where a piece of information
exists. Traditionally it is the goal of developers to keep their database in one of the normal
forms, ideally BCNF. However this optimization serves to make writing to the database
easier. Each unique piece of information should appear in exactly one location so that
updates do not create inconsistencies. If 99% of database operations are reads then why
are we optimizing for the operations which take up only 1%?
CQRS breaks up the entire data model into a domain model and read views. User
actions result in a command being issued, which triggers an update to the domain model
facilitated by a command handler. The command handler may then publish a message
informing a number of listeners on the read model side that an update has been performed
and that they should update their state.
This is truly the crux of CQRS - separate database structures for reading
and writing.
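That separation can be sketched end to end: a command handler updates the write side and publishes an event, and a read-side listener keeps its own denormalized view in step. All of the type names here are illustrative assumptions, not a prescribed CQRS API.

```csharp
using System;
using System.Collections.Generic;

// A minimal sketch of the flow described above: the handler mutates the
// domain (write) model, then publishes an event; each subscribed read model
// updates its own view from the event.
public class CustomerRenamed
{
    public int CustomerId;
    public string NewName;
}

public class RenameCustomerHandler
{
    public Dictionary<int, string> DomainModel = new Dictionary<int, string>();
    private readonly List<Action<CustomerRenamed>> listeners =
        new List<Action<CustomerRenamed>>();

    public void Subscribe(Action<CustomerRenamed> listener)
    {
        listeners.Add(listener);
    }

    public void Handle(int customerId, string newName)
    {
        DomainModel[customerId] = newName;   // update the write side
        var evt = new CustomerRenamed { CustomerId = customerId, NewName = newName };
        foreach (var listener in listeners)  // publish to the read side
            listener(evt);
    }
}

public class CustomerListView
{
    public Dictionary<int, string> View = new Dictionary<int, string>();

    public void On(CustomerRenamed e)
    {
        View[e.CustomerId] = e.NewName;      // read-optimized copy
    }
}
```

In a production system the publish step would go over a durable message queue, which is what makes the read side eventually consistent rather than immediately consistent.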
The issue with a normalized database is that data which users would like to see on
the screen at the same time is not necessarily proximal in the data model. An example
of a traditional data model for a commerce system is shown in figure 4. The
database in this model is normalized and is in BCNF. It has the advantage that the
data is consistent and there is no duplication of data - from a disk usage and ease of
update perspective it is highly efficient. However, using this model to display the user's
order complete with item names, item prices accurate for the order date and the latest
record from the shipping history table requires crossing all the tables. This is not only
a complicated query for developers to write but is also expensive for the database to
process. The problem is, of course, compounded by the possibility that the database
could be distributed across a number of shards. In order to make queries to the database
more efficient one might consider denormalizing the database.
Figure 4: Order database model
Denormalization is a database design which aims to optimize the read queries against
a database by shifting fields and combining tables such that data is closer to the structure
required by the query. It is an intentional move away from strong normal forms towards
weaker ones. The cost of denormalization is that it makes updates to the database more
difficult to perform as data may be duplicated in multiple tables. The database from
figure 4 has undergone denormalization in figure 5. Using this model it is clearly much
easier to run the same order information query. Now only a handful of tables need to
be joined and this very common query can be performed far more efficiently. However
the cost is clear: what would happen if the name of an item needed to be updated?
Now it would be necessary to update not only the name in the Item table but also in
the denormalized OrderItem table. While the cost of this operation is not staggeringly
high from a computational stance there is a lot of developer overhead. It is likely that
the denormalization in figure 5 is simply one of many denormalizations which would be
required to feed the various screens and reports in the commerce site. Developers must
now remember to update the item name in all the denormalized tables associated with the
order. One possible solution is to make use of database triggers to push updates to the
various child tables. This has a couple of disadvantages: first, it moves a lot of the business
logic into the database which, as we've already discussed, is difficult to scale; second, it
makes updates take longer as the user now needs to wait for all the triggers to complete
as part of their transaction. Another oft-cited argument against denormalization is that
the data duplication may have significant data storage costs.
Figure 5: Order database model after denormalization
CQRS
Origins in CQS
CQRS, as a pattern, is closely related to the Command Query Separation pattern which
was first proposed by Meyer[22]. Meyer suggests that "Functions should not produce
abstract side effects" and that only procedures should be permitted to make changes to
the state of the object4. The metaphor behind this pattern is that of a machine which
has two kinds of buttons, command buttons and query buttons. Only command buttons
may be used to change the state of the object and this state is not directly observable.
The only way to extract information is by pressing one of the query buttons which will
change the output displayed on the indicator lights. No matter how many query buttons
are pressed and in what order the internal state of the machine will not change. Queries
are side-effect free.
Applying this pattern to a common programming problem may give some idea of how
it can be used. Consider a queue data structure which has the normal queue operations
such as push, pop and peek. How can we update this object to follow the principles of
CQS?
4Many of today's most popular programming languages do not distinguish between a function and a procedure at the syntactic level. This means that there is a whole generation of developers who have no idea that there is even a difference. Procedures do not return anything; in languages with C-style syntaxes this would be represented as a method with a return type of void. Functions do return a value.
Figure 6: A standard queue data structure
The push operation is a procedure which makes changes to the internal state of the
object. This already complies with the CQS pattern so no change is needed. Peek returns
an object so it is a function. Peek does not change the internal state of the object so it too
complies with CQS. Finally the pop operation returns an object and alters the state of
the queue. This is a violation of CQS. We can alter the queue so that the pop operation
does not return the object from the queue but simply removes it. Now when interacting
with the queue users will need to peek and pop as two separate operations, however
they can peek as much as they like and be assured that there will be no unexpected
consequences.
Figure 7: A queue data structure altered to function using the CQS pattern
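The altered queue can be sketched as follows (shown here in Java; the class name and backing structure are illustrative, not part of Meyer's formulation):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// A queue reworked to follow CQS: pop is a command (returns nothing),
// peek is a query (observes state without changing it).
class CqsQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();

    // Command: changes state, returns nothing.
    public void push(T item) {
        items.addLast(item);
    }

    // Command: removes the head but does not return it.
    public void pop() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("queue is empty");
        }
        items.removeFirst();
    }

    // Query: side-effect free, may be called any number of times.
    public T peek() {
        return items.peekFirst();
    }
}
```

Callers now peek to observe the head and then issue a separate pop command to remove it; repeated peeks are guaranteed to leave the queue unchanged.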
Meyer feels that adhering to a strict separation between commands and queries in
a large system is key to success. Modules can safely call into other modules knowing
that any queries they run will be without side effect. He goes on to suggest that a class
which implements CQS should implement at least two interfaces, one which describes
the functions and one which describes the procedures. If it is undesirable that other
modules be able to update the state of the system then all that need be exported is
the interface which defines the functions. This is a theory which is in strong agreement
with the SOLID principles of software design[18] specifically the Interface Segregation
Principle which states that "Clients should not be forced to depend upon interfaces that
they do not use". Interfaces should be as narrow as possible to reduce the risk from
changing the underlying class. If the method to change was not one of the ones presented
in an interface to another component then that component need not worry about changes
to the underlying object.
CQRS takes the separation principles applied in CQS at the class level and expands
them to work at the system level. In the same way that CQS restricts data updates to
procedures CQRS restricts business model updates to commands. Queries against the
read model are equivalent to functions and must be side-effect free.
CQRS EXPLAINED
Thus far we have satisfied ourselves with an informal description of how CQRS works
and some of the implications of the pattern; let us formalize the pattern and explain it
fully. As we know the crux of CQRS is the separation of database read and write models
such that each one can be optimized for its specific purpose.
Figure 8: CQRS pattern diagram
Let us use the example of managing the information about a customer as our example
domain. In DDD parlance this would be the customer relationship management bounded
context5. Within this bounded context we wish to perform actions against the customer
aggregate which contains the fields
• ID - a UUID for identifying the aggregate
• Customer Name6
• Street Address
• City
• Postal Code/Zip Code
• Province/State
• Country
• EMail Address
• Phone Number
• Hashed Password
• Has Gold Status
As a user the first difference you would notice is that the user interface is not organized
around making ad hoc updates to the customer entity. In many systems the edit customer
screen would have a field for each one of the fields in the customer object and a single
"Save" or "Update" button at the bottom of the page. In a CQRS system a task-based UI
5For more information about bounded contexts and DDD in general please see the section "Relationship with Domain Driven Design". Because of the close relationship between CQRS and DDD I will be using a lot of DDD terminology; if you are unfamiliar with DDD it would be useful to read that section first.
6Unrelated to this essay, but an interesting diversion, is the paper found at http://www.w3.org/International/questions/qa-personal-names which talks about the complexities of how people are named around the world. The general advice is that there is no standard form for first and last names and that systems should attempt to just track a name rather than a first and last name.
is used. The goal of a user interface of this sort is to capture the intent of the user rather
than simply the after effects. The update view is gone and has been replaced by a series
of business tasks related to the customer. For instance if the customer had moved and
changed their address there would be a function for ChangeUserAddress which would
provide fields for changing the fields of the aggregate related to the address, likely street
address, city, postal code, province and country. You’ll notice that phone number was
not included in that list of fields. This was an intentional exclusion as there is a loose
coupling between address and phone number: moving does not always result in a change
to the phone number. Alterations to the phone number are a separate business process
and would be handled through the ChangePhoneNumber use case and command. The
user interface will, through some layers of abstraction, publish a ChangeUserAddress
command.
The structure of the command will be very simple: it will contain the Aggregate ID
and the fields to set. This command will be sent to a command handler which will load
the aggregate from the domain model and update the required fields. Because CQRS is
designed to work in collaborative domains, there could be multiple updates to the
customer aggregate which arrive at almost the same time. Thus the command might also
contain a version number of the object which was displayed to the user; if this version
number does not match the version in the database then there have been updates between
the user being shown the aggregate and the arrival of the command.
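A command of this shape might be sketched as follows (in Java; the class and field names are illustrative):

```java
import java.util.UUID;

// Illustrative sketch of a task-based command: the aggregate identity,
// the version the user saw, and only the fields this use case may change.
final class ChangeUserAddress {
    final UUID aggregateId;
    final long expectedVersion;   // version of the aggregate shown to the user
    final String streetAddress;
    final String city;
    final String postalCode;
    final String province;
    final String country;

    ChangeUserAddress(UUID aggregateId, long expectedVersion, String streetAddress,
                      String city, String postalCode, String province, String country) {
        this.aggregateId = aggregateId;
        this.expectedVersion = expectedVersion;
        this.streetAddress = streetAddress;
        this.city = city;
        this.postalCode = postalCode;
        this.province = province;
        this.country = country;
    }
}
```

Note that the command deliberately carries no phone number field; that change belongs to a separate use case.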
There are a couple of strategies for dealing with such race conditions. The
first is to minimize the actions each command performs. In much the same way as the
Interface Segregation Principle dictates that as little should be made public as possible
the commands should be as granular as possible. A ChangeUserAddress command would
not at all conflict with a ChangePhoneNumber command as they update different fields.
If there is actually a conflict then it should be up to the business to decide what should
be done. If the conflict is rare then it might be flagged for an administrator to review if it
class MessageRouter
{
    private List<KeyValuePair<String, IMessageHandler>> handlers =
        new List<KeyValuePair<String, IMessageHandler>>();

    public void RegisterHandler(string messageTypeToHandle, IMessageHandler handler)
    {
        handlers.Add(new KeyValuePair<String, IMessageHandler>(messageTypeToHandle, handler));
    }

    public void RouteMessage(IMessage message)
    {
        string messageName = message.GetType().Name;
        handlers.Where(x => x.Key == messageName)
                .ToList()
                .ForEach(x => x.Value.Handle(message));
    }
}

Code Listing 5: A message router which allows for registering message handlers and routing messages to them.
is more common, then a more automated solution may be required. It is not the place of
the software to dictate what the business should do, the behaviour of the software should
be dictated by business needs.
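A minimal sketch of the optimistic version check described above might look like this inside a command handler (in Java; all names are hypothetical and the aggregate is reduced to two fields):

```java
// Hypothetical sketch of an optimistic-concurrency check: the command
// carries the version the user saw, and the handler rejects the update
// if the stored aggregate has moved on in the meantime.
class ConcurrencyException extends RuntimeException {
    ConcurrencyException(String message) { super(message); }
}

class CustomerAggregate {
    long version;
    String address;
}

class ChangeAddressHandler {
    void handle(CustomerAggregate stored, long expectedVersion, String newAddress) {
        if (stored.version != expectedVersion) {
            // Someone else updated the aggregate between the user reading
            // it and this command arriving; let the business decide what to do.
            throw new ConcurrencyException(
                "expected version " + expectedVersion + " but found " + stored.version);
        }
        stored.address = newAddress;
        stored.version++;
    }
}
```

Whether a mismatch is rejected, flagged for review, or merged automatically is a business decision, as discussed above.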
Once the command handler has completed its processing it may raise an event to
inform interested parties that something has changed. In the case of the ChangePho-
neNumber handler an event such as PhoneNumberChanged would be published. Views
on the read model side which contain the customer’s phone number would subscribe to
this event and update their view of the world when it was received. This is often the
most difficult part of CQRS for people to understand, as there is often a concern about
what happens if the event is missed. If the event is missed it will throw the view model
into an inconsistent state, which is, for the most part, highly undesirable.
In pure CQRS the message delivery is all performed inside a single process. This can
often be achieved by registering command and event handlers in some sort of a container
and iterating over the container contents each time a message is sent. A very basic
implementation can be seen in listing 5. Because these handlers are in process they are
as reliable as calling any other piece of in process code.
Message Queues
As the scale of a CQRS solution grows it may become necessary to scale out to multiple
machines or, at the very least, out of a single process. Usually this is done by distributing
the message handlers to multiple machines and decentralizing the read model database.
In order to achieve this distribution one can make use of one of the many reliable mes-
saging technologies which have existed for many years. Message queues are the simplest
solutions: they provide a queue of messages which can be read from remote machines and
often also provide some support for transactions so that a process will read the message,
process it and only then will it be removed from the queue. Because queues of this nature
are so simple a Google search for Message Queue will return hundreds of products which
deal with ensuring that messages, once sent, will arrive. What cannot be guaranteed is
that the messages will arrive rapidly.
If the computer responsible for updating the CustomerOrder view model is broken
then it may not receive the update to the customer’s shipping address in a timely fashion.
Proponents of traditional architectures would point at this and claim it to be a major
flaw; however, the message is safe: it is simply sitting in a message queue somewhere. It
will eventually be delivered. In a traditional system if the database server is down an
error message is displayed for the user or the update is simply swallowed. These sorts
of failures often go unreported as it is very difficult to log events which did not happen.
With CQRS you can be assured that, barring a calamitous hardware failure, the message
will arrive at its destination. There are even some techniques which can be used to
alleviate the calamitous loss of a server; see the section on event sourcing.
Although CQRS is a technology agnostic pattern much of the development of it has
come out of the Windows .net community. It is therefore unsurprising that one of the
most popular message queuing technologies for use in CQRS projects is Microsoft’s own
Microsoft Message Queue (MSMQ). This system comes built into Windows operating
systems and has done for many years. It is one of the few queues which has support for
the Microsoft Distributed Transaction Coordinator (MSDTC). Although it is not necessary
to support transactions it can be useful especially in long running processes which are
usually known as sagas.
The Advanced Message Queuing Protocol (AMQP) is an open protocol for message queu-
ing systems which was developed out of banking giant JPMorgan Chase. This standard
just reached version 1.0 this year although development began some years ago. AMQP
provides standards for message queues as well as message brokering which is unnecessary
for CQRS implementations. One of the goals for which CQRS strives is to maintain
simplicity in all of its aspects. The full AMQP is far more powerful than is required but
utilizing a subset of its capabilities allows for the use of one of the many implementations
of AMQP. These implementations include RabbitMQ, StormMQ and Apache Qpid7.
For applications with cloud based requirements there are cloud based messaging so-
lutions. Windows Azure provides Windows Azure Queues (WAQs) and the Azure Service
Bus, which are lightweight but still functional queues. WAQs support message queuing
and expiry but not distributed transactions. Based on the direction Microsoft is taking,
placing cloud computing at the top of its priority list, it seems likely that WAQs will be
the focus of a great deal of development time. Amazon also has its Simple Queue
Service (Amazon SQS) which has many of the same features as a WAQ. Windows Service
Bus was developed with the help of Udi Dahan and is especially designed to allow for
use in CQRS-like solutions.
Event Sourcing
It should be clear by now that CQRS is heavily dependent on messages. The domain model
in CQRS is built up from a series of commands and the view model is constructed from
a series of event messages. As the system grows and evolves the views must also evolve.
Consider a situation where a page on a website showed the price of a stock and the stock
7These queues have much niftier names than MSMQ and must therefore be superior as per Web 2.0 nomenclature rules.
symbol. The requirements change such that the view should also contain the name of
the company. How would one go about populating the view with the new information?
Usually it would be done by reading the information from some source and updating the
view table. In order to ensure that the page stays up to date a change will also have
to be made to the view population message handler. An alternative is to maintain a
collection of all the events which have occurred since the creation of the system and to
simply replay them when changes to the view model are needed. Ensuring that there is
a single source of all the events for the system is known as event sourcing[11].
Event sourcing is much more powerful than simply storing the current state of the
system because we gain a historical perspective of how we reached the current state.
Consider the insight we might gain from building an event sourced shopping cart on a
website. In a traditional system the shopping cart might store
• ItemID
• Quantity
This information is sufficient to allow us to process a checkout for the customer.
However if we wish to perform analysis of how people are shopping then the shopping
cart needs to gain some more fields. Perhaps we want to know what shoppers end up
buying when they have first added product X to the cart. For this we will need to also
store the time at which each item was added to the cart. The model now grows to
• ItemID
• Quantity
• AddedTime
Now consider the case where the shopper has removed an item and replaced it with
another. How would we store this in the shopping cart model?
• ItemID
• Quantity
• AddedTime
• RemovedTime
What if the customer changes their mind and adds the item back?
• ItemID
• Quantity
• AddedTime
• RemovedTime
• ReaddedTime
As you can see the model rapidly loses any elegance it had before. We must spend
a terrible amount of time designing this cart to ensure that we have covered all the
situations business might wish to analyze in future. In order to deal with the ever
changing requirements of business we would have to be continually changing the cart
and any new analysis would be unable to run on shopping carts created before the
requirement.
If we instead stored the stream of events which created the shopping cart then all the
analysis above becomes trivial. If there is a need to find out which product was added
after product X we can simply examine the order of the events. Even if an item is added
and removed many times we store each event and can create an accurate time line of the
state of the cart. With event sourcing we not only have the same data we would normally
have with a traditional model, we also have all the metadata which describes the story of
how the data came into being.
Event sourcing and event storage have a number of other advantages. The foremost of
these is to keep track of the history of an object for audit purposes. There are countless
systems which need to know how an object came into being. This sort of architecture is
often built at the database level using shadow tables. Shadow tables are log tables which
are inserted into by a trigger on the parent table. Their structure mirrors the parent table
except that it adds a user, a date and an action. When a row in the table is altered the
original row is inserted into the shadow table by the trigger. Equally a trigger on delete
keeps track of when and by whom the row was deleted. With an event store tracking the
changes to the object is easy as you can quickly see exactly which fields were changed
and, if the system supports proper message names, the reasons for the change. The
reason is a key advantage over the shadow table approach because you can distinguish a
change to the address because of a move from one because of a typo. Messages can, if properly
constructed, convey intent. One can also perform analysis of the events which have been
fired for a particular root which can give the power to extract meaningful metrics about
how the system is being used.
The ability to replay events is also very useful for debugging the application workflow.
Events can be restored from the event store and run against a handler to build up a
snapshot of the system state at any point in time. The resulting state can be examined
and poked to isolate errors. Usually it is not advisable to change the history of a system
by altering the commands fed into it as that alters history and corrupts the audit trail.
However with a complete history of the commands one could change the way in which a
command is handled and rebuild the system state from the commands. In most cases we
treat the event store in much the same way that an accountant would treat a ledger. Line
items cannot simply be erased from the ledger if a mistake is made; instead a correcting
entry is made to adjust the ledger back into balance. In CQRS correcting actions are
issued to bring the system back into line.
Replaying events to restore a past snapshot of the system is not only useful in debug-
ging and auditing, it can also have applications in every day activities. Consider the case
of requesting a copy of an invoice from six months ago. Since that invoice was issued
some of the data in the system may have changed, perhaps the name of the company has
changed. We don’t want the new data showing up on the invoice but this is a common
occurrence in many systems which are not time aware. It would not do for the invoice to
be issued to a company which did not exist six months ago.
The key to event sourcing is to ensure that all the state changes in a system occur
through events. Every event raised by the command handlers should be written
to some sort of a persistent store for later use. The storage of these events is very
simple as one really needs only to store
• The date and time of the event
• A unique identifier for the event (UUID style ID is best for this application)
• The name or ID of the bounded context to which this command is tied.
• A serialized version of the event
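A stored event record holding these four pieces of information might be sketched as follows (in Java; the field names are illustrative):

```java
import java.time.Instant;
import java.util.UUID;

// Minimal sketch of a persisted event record carrying the four pieces
// of information listed above; names are illustrative.
final class StoredEvent {
    final Instant occurredAt;       // date and time of the event
    final UUID eventId;             // unique identifier for the event
    final String boundedContext;    // name of the owning bounded context
    final String payload;           // serialized event body (e.g. JSON)

    StoredEvent(Instant occurredAt, UUID eventId, String boundedContext, String payload) {
        this.occurredAt = occurredAt;
        this.eventId = eventId;
        this.boundedContext = boundedContext;
        this.payload = payload;
    }
}
```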
A date is useful for purposes of ordering and also to be able to restrict the set of
events being replayed to a narrower window than the entire lifetime of the system.
The unique identifier is useful for keeping track of which events have been run and
which might still need to be run. It is also very useful to give each item in the system
a unique name. Database administrators are not, generally, fans of UUIDs or GUIDs
as they are more commonly known in the .net world. UUIDs tend to play havoc with
clustered indexes as they are completely random and inserting them into a clustered index
causes the database tree to have to be rebalanced frequently[6]. This is a computationally
complex operation, especially on a database such as an event store which experiences a
large number of insertions. There are two good solutions to this issue: the first is to use a
non-clustered index and the second is to make use of Comb-GUIDs. These are GUIDs
which are generated using a sequential and a non-sequential portion which allows them
to be efficiently clustered and, at the same time, retain their global uniqueness.
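A rough illustration of the idea, not any particular library's algorithm: place the current time in the most significant bits so that successive identifiers sort in roughly insertion order, leaving the remaining bits random for uniqueness.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Rough sketch of a comb-style GUID: the high 48 bits come from the
// current time in milliseconds so that successive identifiers cluster
// well in an index, while the remaining 80 bits stay random.
final class CombGuid {
    private static final SecureRandom RANDOM = new SecureRandom();

    static UUID next() {
        long millis = System.currentTimeMillis();
        long mostSig = (millis << 16) | (RANDOM.nextLong() & 0xFFFFL);
        long leastSig = RANDOM.nextLong();
        return new UUID(mostSig, leastSig);
    }
}
```

Identifiers generated this way remain globally unique in practice while inserting near the end of a clustered index rather than at random positions.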
It is often useful to have some record of the bounded context to which the command
belongs. While it is unlikely that the name of a command would be shared between BCs it
does allow for faster processing by filtering the commands and avoids the possibility that
a developer might rely on replaying the commands and events from a different bounded
context in order to obtain some sort of additional information. There are methods for
inter-bounded-context communication, usually in the form of sagas or of passing the required
information along with the command.
Finally the serialized version of the event is what is reloaded and fired through the
event handlers to rebuild a view model. This same process can be used if a new view is
needed or even if the views are lost. In fact many proponents of event sourcing suggest
that it is not necessary to store the view models on non-volatile storage at all. They
claim that as the views can be rebuilt quickly there is no point in storing them on disk
or backing them up. Keeping the data in memory is becoming very cheap8 and accessing
in memory information is at least an order of magnitude faster than accessing it from
disk[17]. For high performance applications event sourcing and an in memory view model
is ideal.
The serialized version of the event can be represented in a number of ways. The easiest
is to simply use the built in object serializer which many modern languages now provide.
Java and .net languages provide binary serializers, as do Ruby, Python, and pretty much
every other major language[1, 24, 2, 3]. Unfortunately the format for each of these is
different, which is somewhat limiting if the messages need to be replayed in systems which
are written in different languages9. Binary serializers often have issues with missing or
8Amazon have released a product called ElastiCache (http://aws.amazon.com/elasticache/) which is an in-memory key-value store. For about $1700 a month it is possible to rent 68GB of extremely fast memory which would be perfect even for fairly sizable databases.
9The use of multiple languages is starting to become the norm in large systems. Each language is used for the specific application where it is best suited. This is commonly known as polyglot programming (http://memeagora.blogspot.com/2006/12/polyglot-programming.html).
added fields. Unless the definition of the object being deserialized is exactly the same as
the current version in the application the deserialization may fail. A better approach is to
use one of the many text-based representations of object data such as JavaScript Object
Notation (JSON) or YAML, or a schema-based format such as Protocol Buffers. Protocol
Buffers are particularly resilient to changing objects as they use a numeric identifier for
each field, which will accommodate a spelling correction from a field called Csutomer to
Customer. All of these alternatives have wide cross-platform acceptance and libraries for
most languages exist.
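The resilience of name-based formats can be illustrated with a reader that looks fields up by key and defaults anything missing (a Java sketch using a plain map in place of a real JSON parser; the field names are invented):

```java
import java.util.Map;

// Sketch of why key-based payloads survive schema change: the reader
// looks fields up by name and supplies a default for anything absent,
// so events serialized before a field existed still deserialize cleanly.
class CustomerEvent {
    final String name;
    final String goldStatus;

    CustomerEvent(Map<String, String> payload) {
        this.name = payload.getOrDefault("Customer", "");
        // Field added in a later version; older events simply lack it.
        this.goldStatus = payload.getOrDefault("HasGoldStatus", "false");
    }
}
```

A binary serializer that relies on field position would instead fail outright on such a payload.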
On the other side of CQRS the domain model is constructed from a series of com-
mands. Domain models can be quite complex and retrieving them from structured
databases can be slow for high performance systems. It is also difficult to continually
change the domain model in the database to deal with ever evolving domains. Fields may
need to be added, removed or altered as the system evolves. An interesting approach
to dealing with an evolving domain is to simply not store the domain entities at all.
That seems highly unusual after all the domain model is needed for validating data and
ensuring functions are permitted. Advantage can be taken of the fact that the only way
to alter the domain is through a command. The event store can be used to keep track of
all the commands used to build up a specific aggregate root. When this aggregate root is
needed the system can query the event store, retrieve the historical stream of commands
which built the model and replay them to create an in memory representation of the
object. This object is used for the lifetime of the operation and is then deallocated.
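A simplified sketch of this replay, using a shopping cart reduced to a single counter (in Java; the event and aggregate types are illustrative):

```java
import java.util.List;

// Simplified sketch of rebuilding an aggregate from its event stream:
// start from a blank object and apply each historical event in order.
class ShoppingCart {
    int itemCount = 0;

    void apply(CartEvent event) {
        switch (event.type) {
            case ITEM_ADDED:   itemCount++; break;
            case ITEM_REMOVED: itemCount--; break;
        }
    }

    static ShoppingCart replay(List<CartEvent> history) {
        ShoppingCart cart = new ShoppingCart();
        for (CartEvent event : history) {
            cart.apply(event);
        }
        return cart;
    }
}

class CartEvent {
    enum Type { ITEM_ADDED, ITEM_REMOVED }
    final Type type;
    CartEvent(Type type) { this.type = type; }
}
```

The rebuilt object lives only as long as the operation needs it; nothing but the events themselves is ever persisted.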
Using events to build up objects is not a new approach. It was used in the Smalltalk
language developed in the early 1970s. It was possible to examine the log of changes made
to an object and even replay them for debugging purposes[12]. The LMAX architecture
also makes use of in memory images, though they use them to avoid the cost of traveling
to the disk[32].
It may seem that the retrieval and replaying of events would be slow and complicated
but it is usually not at all that complex. The actual performance of replaying events in
this fashion will be examined during the research phase of this project. On older systems
or systems which have a large number of transactions on a single object it may become
time prohibitive to reconstruct the object from its history. In these rare cases the objects
can be checkpointed from time to time. This involves replaying the messages used to
construct the object then saving the resulting object to some data store. When it is to
be retrieved again the checkpoint is loaded and only messages which arrived after
the checkpoint was created are applied. This checkpointing can be performed again
and again as the system ages to ensure that restoring the domain object is quick.
As the system matures it is likely that the events and commands will also change
to match changing requirements. In order to handle this change it is often desirable to
version events so that correct handlers will receive the messages. It is not uncommon
to see commands with names like IssueReceipt20110101 to denote a version of the event
which was current on the first of January 2011. This will ensure that the handler with
the correct logic for this sort of event is fired.
Many of the advantages of event storage can be realized without having to resort
to using an event store. As Udi Dahan points out if a message queue is used a simple
audit queue will allow for the replaying of events[25]. The advantages of a fully fledged
event store are that it provides a single source of truth, is easily replayed with the same
mechanism as is already in use and allows for in memory images.
The Language of CQRS
There are two message types in CQRS, commands and events. In order to distinguish
the one from the other we use different tenses and moods. Commands are in the present
tense and use the imperative mood; examples are
• GrantCustomerGoldStatus
• ChangeCustomerAddressDueToMove
• ChangeCustomerAddressDueToTypo
• IncrementCustomerRank
Usually a command starts with a verb followed by a subject.
Events are notifications of things which have already occurred, so they appear in
the past tense:
• CustomerGrantedGoldStatus
• CustomerAddressChangedDueToMove
• CustomerAddressChangedDueToTypo
• CustomerRankIncremented
The names of the commands are crafted in a way that they suggest the intention
of the user so we know not only what changed but why. There is a difference between
changing a customer’s address due to their having moved and correcting a typo. In the
first case business processes to welcome the customer to their new home may be kicked
off, it would be embarrassing if the same processes were started for the correction of a
typo. Ideally the names of the commands should come from the business during the
domain analysis.
IDEMPOTENCE OF COMMANDS
One of the properties of a message based architecture is that messages may arrive out of
order. If a user creates an order and adds two items to it not only may the items added
to the order arrive at a service in the wrong order but the request to add an item to
the order may arrive before the message to create the order. In order to deal with this
a couple of things can be done. First, the out-of-order messages can be returned to the
end of the queue in the hopes that by the time they are processed again the required
messages will have arrived. A preferable alternative is to build a skeleton object which will have additional
fields filled in as information arrives. Consider the case of the order and two items added
above:
Figure 9: A series of commands arriving out of order.
The command handler starts by looking up the order in order to save the item to it.
However because the order has yet to be created it cannot find an order. Thus it creates
a place holder or skeleton order which is missing some of the fields. Next an additional
AddItemToOrder command arrives. Now the handler is able to find an order and
goes about adding the item to it. Finally the create order command arrives and, finding
the order already partially created, adds the additional information.
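The skeleton-order approach might be sketched as follows (in Java; the command handling is reduced to plain method calls and all names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of tolerating out-of-order commands by creating a
// skeleton order that is fleshed out as the missing information arrives.
class Order {
    final UUID id;
    String customerName;            // null until CreateOrder arrives
    final List<String> items = new ArrayList<>();

    Order(UUID id) { this.id = id; }
}

class OrderHandler {
    private final Map<UUID, Order> orders = new HashMap<>();

    // May arrive before the order itself has been created.
    void handleAddItem(UUID orderId, String item) {
        orders.computeIfAbsent(orderId, Order::new).items.add(item);
    }

    // Fills in the skeleton if items arrived first.
    void handleCreateOrder(UUID orderId, String customerName) {
        orders.computeIfAbsent(orderId, Order::new).customerName = customerName;
    }

    Order get(UUID orderId) { return orders.get(orderId); }
}
```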
It is also possible that a message may arrive multiple times. Most message queues
guarantee that a message will arrive but offer no assurance that it will not arrive
more than once. A situation where this might happen is easy to imagine. Suppose a
command is sent but the reply confirming that the message has arrived is not received
by the sender; this could be due to a network issue or a crash of a server. The sender has
no option but to send the message again. In these situations either the system can check
to see if the message has already been processed or ensure that message processing is
idempotent. Checking to see if a message has been processed is difficult if the history is
stored in a database which exhibits eventual consistency. Even in a traditional database
there is opportunity for a race condition such that the message will be processed again.
By ensuring that the processing of commands is idempotent these problems can be avoided.
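A simplified sketch of de-duplication by message identifier (in Java; a real system would persist the set of processed identifiers and contend with the race conditions noted above):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Simplified sketch of de-duplicating message processing: the handler
// records the identifiers it has already seen and ignores redeliveries,
// making the overall effect of handling a message idempotent.
class DeduplicatingHandler {
    private final Set<UUID> processed = new HashSet<>();
    int timesApplied = 0;

    void handle(UUID messageId) {
        if (!processed.add(messageId)) {
            return; // already processed; the redelivery is ignored
        }
        timesApplied++; // actual business logic would go here
    }
}
```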
CQRS VS. ACTIVERECORD (CRUD)
CRUD is the typical architecture of data driven applications. In comparison with CQRS
there are a number of places where each holds advantages over the other.
In terms of speed of development CRUD is almost always faster. Because CRUD
focuses on updating entire entities at a time there is no need to spend development
time defining the various use cases around the updates to an entity. Users can simply
update any field of the entity on a generic update screen. The similarity of updates
to different objects presents the opportunity for building very generic update screens.
Indeed, frameworks such as ASP.net MVC and Ruby on Rails provide scaffold tools which
can generate all the pages for performing updates to an entity. When coupled with
an object database such as MongoDB or RavenDB, development using CRUD can be
amazingly easy. With a properly developed business model, validations on such entries as
phone numbers and addresses are also much easier with CRUD based approaches. Some even
suggest that an entire application can be more or less generated from the basic business
entities[26].
One area in which CQRS is easier is constructing queries for complex reports. Especially
in the case of heavily normalized databases, constructing reporting queries can
involve joining a dozen or more tables. Building a denormalized view for the report
is typically far faster and easier. Many queries which are cumbersome in a relational
database become easy if they are constructed as views. For instance, queries which
require that row data be transformed into columns require that the developers be familiar
with the rather complicated PIVOT command10.
It is important to appreciate that the initial development of an application is only
part of the story. If the move away from waterfall to agile has proven anything, it is
that designs and needs change over time. The world is dynamic and applications need
to be dynamic as well. An example given by Young is that of a changed requirement for an
e-commerce website. The CEO decides that an important metric to capture is how many
times a customer removes an item from their cart before purchasing it. With a CRUD
approach the history of the cart is likely not captured, so the best that can be done is
to start capturing that information after the change has been requested. With
CQRS+ES, however, the cart is represented by a stream of events. From this stream it is trivial
to construct a denormalizer which can provide the required information. In this case the
lost data is not particularly important, but it is not difficult to think of a situation where
the lost data has value. Any time that code contains an update statement, data is lost.
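A denormalizer for the CEO's new metric can be written directly over the historical event stream. The event shapes below are hypothetical, and the sketch is in Python rather than the C# used elsewhere in this project:

```python
# A recorded cart history: because every change is an event, the
# removals are still present even though the final cart does not
# contain the removed item.
events = [
    {"type": "ItemAdded", "cart": 1, "item": "book"},
    {"type": "ItemRemoved", "cart": 1, "item": "book"},
    {"type": "ItemAdded", "cart": 1, "item": "pen"},
    {"type": "CheckedOut", "cart": 1},
]

def removals_before_purchase(stream):
    """Replay the stream and count removal events, retroactively
    answering a question nobody thought to ask at design time."""
    removed = 0
    for event in stream:
        if event["type"] == "ItemRemoved":
            removed += 1
    return removed
```

Running `removals_before_purchase(events)` over the stream above reports one removal, data a CRUD system would have overwritten and lost.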
Adapting to changing business requirements is easier with CQRS, but a real
advantage is the ease of adapting to changing technologies. The read model database is
likely to be the part of the application which experiences the most load. Most companies
do not yet feel comfortable with web-scale databases, but with an event store rebuilding
the database is easy, which means that moving to a new database technology is also easy.
Debugging a production problem in a CQRS system with an event log is remarkably
easy. While in a CRUD system the database must be carefully set up to mirror production,
a CQRS developer can simply replay the production event log and bring the system to
any point in time. The ability to pick and choose a point in time also allows for examining
what-if scenarios. Getting the same functionality in a CRUD system is very difficult.
10 http://msdn.microsoft.com/en-us/library/ms177410.aspx
As data grows to what is commonly referred to as web scale, CRUD based approaches
break down. It is impossible to scale a single server up to the point which many companies
need. Instead applications must be scaled out. In order to facilitate multiple servers
making use of data at the same time, complex synchronization techniques must be used.
While not a panacea, messaging allows for much easier communication and coordination
between a number of machines. Applying messaging to CRUD tends to be quite difficult
as the messages which are the result of CRUD perform monolithic updates to all the
fields of an object at once. This approach increases the chances of a conflict.
There are advantages to both CRUD and CQRS based solutions. In an ideal world
developers would be able to pick and choose the technique used in any one domain (see
"CQRS as a Top Level Architecture").
RELATIONSHIP WITH DOMAIN DRIVEN DESIGN
Just as CQS was a strong influence on CQRS, Domain Driven Design (DDD) has been
instrumental in helping to outline the modular approach that CQRS takes when dealing
with an entire business.
Domain driven design is a set of patterns and methodologies for building business
software from the domain model out. All the code in the application comes from the
domain model. If the model changes then the code must also change; equally, a change in
the code implies a change in the model[10, 15]. The business is divided into bounded
contexts (BCs) which are usually delimited by a business function. For instance the
marketing department might be one bounded context and the shipping department another.
Each department is responsible for a set of information and is the canonical source of
that information. Should one bounded context require access to information owned by
another, it must request that information from the other
BC. For instance it is conceivable that the shipping department, billing department and
marketing department are all interested in knowing a customer's address. Instead of
keeping multiple copies of the customer's address which must be updated whenever a
change is made, only one of the BCs owns this information. Let's say that it is the billing
department. None of the other BCs are permitted to perform updates to the customer
information. Should the shipping department require the address of the customer, it will
request the information from the billing BC.
This, of course, plays very well into a CQRS/ES setup. Each BC can provide a read
only view of the data which other services might like to consume. Alternately messages
may be sent from BC to BC in order to complete a full workflow. Consider the business
process behind ordering a product. In order to fulfill an order a number of different
activities need to be orchestrated. To a customer a usual workflow might look like
1. Log in to the site
2. Add items to the shopping cart
3. Proceed to the checkout
4. Enter a shipping address
5. Enter credit card information
6. Receive confirmation of the order
Behind the scenes there are all sorts of actions which cross between different BCs.
Logging onto the site is an operation which might run against the Security BC. Logging
in is likely not part of the order workflow since it can occur without an order being placed
and is common to a number of other workflows. We’ll call it a prerequisite step.
Adding items to the cart is likely to be managed by the sales BC. It will keep track
of the items and their quantities. When finished selecting items the customer checks out.
Checking out is largely handled by the billing BC, which will be responsible for collecting the
customer's address and payment information. On many sites it is possible to ship to a
different address from the payment address; in this case the shipping address should be
owned by the shipping BC. Finally the order is sent to the shipping BC for fulfillment.
Figure 10: A comparison of a customer as seen by traditional and DDD architectures.
What is apparent is that an item which would be considered a distinct entity in a
traditional system, the customer, is actually divided into many pieces. In DDD the data is
divided not into groupings which make sense for the entity but rather into groups which make
sense for the business. Dividing responsibility means that each BC can be implemented in
any way that the company sees fit. Some of the BCs might be collaborative in nature and
would benefit from CQRS while others may be implemented using CRUD. A pragmatic
approach like this is key for DDD, and the ability to compartmentalize the business
into BCs allows for a high degree of flexibility.
There remains a great deal of discussion on the CQRS mailing list as to the necessity
of using DDD in a CQRS implementation. It is certainly not a settled question. There
exists at least some level of symbiosis between DDD and CQRS: DDD solutions may be
implemented without CQRS just as CQRS solutions can be implemented without DDD.
However, without defined domain boundaries, mapping the flow of information between
services is very difficult. Equally, traditional approaches lack much of the malleability of
historical data which is crucial for evolving domains.
CQRS IN REAL WORLD SITUATIONS
CQRS as a Top Level Architecture
With the various advantages of CQRS it is tempting to make use of CQRS in every
situation. However it is costly to implement CQRS and many domains may not need
CQRS. In a typical application there are at least two domains. The first domain is
the domain which the application serves directly. For instance in a loan application for
a bank the served domain would be that of loans. Functions are provided to transfer
money to loans and apply for new ones. However a second domain must also be called
upon to perform the authentication and authorization of users. This is likely a completely
different domain from the loan domain. We shall refer to this as the security domain.
The security domain is a low value domain: it does not provide any significant business
advantage to the bank. The functions of the security domain are so common that there
are numerous third party tools which could be plugged into the application to
provide the required functions11. There are a limited number of developers available, so
it is logical to have those developers focus on the areas of the business which generate
the most revenue or provide the largest competitive advantage. As such it is not
advisable to make use of CQRS as a top level architecture. Instead CQRS should be
applied selectively to the domains which are important.
Fitting CQRS Into Teams
For most developers CQRS is a departure from traditional development methods. The
largest concern I have heard in talking to developers and watching the CQRS mailing
list is that the domain model and the view models will become desynchronized. This
is certainly a concern, but the use of messaging, or of transactions if the view updates are
synchronous, ensures that everything remains in sync. Even if some sort of disaster
does throw the view model out of sync, event sourcing allows the view models to be
rebuilt.
11 Services such as LDAP and OpenID provide at least authentication and, in the case of LDAP, authorization. They are both well known and well tested and are likely to be cheaper to purchase and plug in than building a custom version. Indeed authentication in particular is far more difficult than it first appears.
Jonathan Oliver, an early proponent of CQRS, has suggested that teams working with
CQRS can make better use of the varying skills of developers. McConnell suggests that
there can be as much as two orders of magnitude difference in the skills of developers[20].
Unfortunately there are too few developers at the high end of the spectrum for current
business needs[28]. Oliver suggests directing less experienced developers to create
the view models and the message handlers required to update the views. These are areas
which are easy to fix in the event of an issue, and the knowledge required to build the
views is minimal.
While there are frameworks and tools for CQRS, the general feeling among early
adopters is that it is best to build your own framework. The code required to develop
a CQRS solution is minimal. In his example application Young builds an entire CQRS
application in a scant 500 lines of code12. Oliver suggests that the more senior developers
be focused on this sort of structural code as well as any code which requires special
attention. Ideally the loose coupling of components in a CQRS project means that the
command handlers should be fairly minimal. In many cases the handlers will do nothing
but publish an event.
In his Advanced Distributed Systems course Udi Dahan places a great deal of emphasis
on keeping the code as simple as possible. He recommends against the use of heavyweight
object relational mappers, dependency injection and all manner of other architectural
niceties. Keeping the handlers short means that such techniques as
separation of concerns and abstraction layers melt away; n-tiered architecture was only
developed as a method of dealing with large code bases. The division of handlers into
those associated with a single bounded context keeps the code minimal. If you
find yourself with a large number of handlers, it is a sign that the bounded context
is poorly defined.
12 https://github.com/gregoryyoung/m-r
Adding CQRS to existing applications is a difficult problem. Usually existing
applications have not been built with DDD in mind. Actions cross over what would be
aggregate root (AR) boundaries because nobody has defined them. The UI is likely to be CRUD based
and the user feedback dependent on rapid synchronous actions. Even without undertaking
a full DDD analysis of the domain, some progress towards CQRS is possible. A place to
start is to commence
the creation of commands and the publishing of messages from the current application.
Nothing need listen to the messages initially. Any new functionality should make use
of the messages whenever possible. The UI should also be realigned towards a
task based UI instead of CRUD operations. Perhaps the only saving grace is that CQRS need
not be applied to an entire application at once. DDD teaches that isolating each AR is
key to managing complexity. Some of the ARs may benefit from
CQRS while others have no need of it. There is added complexity for domains which are
implemented using CQRS, so domains which have no need of CQRS can be implemented
with traditional techniques.
Providing User Feedback in Asynchronous Applications
It is commonly accepted that users benefit from rapid and consistent feedback in an
application. Clicking on a button should have some effect and it is helpful if that effect
is as close to instant as possible. Everybody has suffered the frustration of clicking on a
link and waiting, breath held, to see if the requested page will load. However providing
feedback in an asynchronous message based architecture can be difficult. Consider adding
a new item to a collection. In a traditional design the database is updated synchronously
as part of the post back and the subsequently generated index page will contain any
updates. However asynchronous applications may not perform the update before the
index page is rendered.
Figure 11: Traditional synchronous persistence model. Data is persisted to the database before the listing page is rendered.
It is confusing to users to rename, remove or add an item and not see that change
reflected on the listing page which is typically returned after an update. This issue is
not limited to CQRS situations where the change is performed via a command
and subsequent events. Many of the newer databases such as MongoDB, CouchDB and
Cassandra perform updates in an asynchronous fashion, favoring eventual consistency.
Even in cases where the data for the listing is rendered from the same database node
as the update, there is no certainty that the update will have been processed before the
listing. To mitigate user confusion a number of techniques may be used. The first is
to change the flow of the update use case such that instead of returning
to an index page the update returns either to itself or to a page which states that the update has
been submitted. This is something of a hack as it depends on introducing a slight delay
into the workflow, permitting the data to be processed on the server. If the processing
delay is short then this is a very good solution; of course, the processing time cannot be
guaranteed. Another option is to make use of local storage to persist the change. HTML
5 web browsers support saving data offline through local or session storage. The listing
page can then be rendered with a combination of cached data and database data.
Finally attempts should be made to educate users about the limitations of computers.
In the general public there is a belief that computers are instant. This is not the case and
many user issues stem from this belief. Giving users more realistic expectations about
how data flows and is saved will be helpful in this case and many others.
Figure 12: Asynchronous persistence model in which the data may or may not appear in the database before the listing page is rendered.
Solving the Difficult Offline Problem
Thus far we have spent a lot of time discussing how to deal with processing messages
from web or desktop applications. These situations assume that the client is connected
to the server at all times. As mobile phones increase in both power and popularity they,
along with the relatively young tablet platform, are becoming very important13. Mobile
devices are not always connected so for applications to function in a disconnected state
special allowances must be made. There are two problems with being disconnected:
• Updating the device with the latest data
• Ensuring that actions which occurred on the device in a disconnected state are
persisted back to the server.
13 Shipments of iPads have been very strong for some time and the trend continues[31].
The first item is usually resolved through a process of syncing. If you have ever
plugged in an iPod and waited while iTunes painstakingly compares the content on the
iPod with that on the computer, it should be obvious that running naive comparisons is
slow. Instead of comparing the entire database it would make much more sense for the
iDevice to store a timestamp of the last sync and then have iTunes simply replay the
addition and deletion commands which were subsequent to that timestamp. This
more or less eliminates the need for expensive comparison operations. All that is being
done is that a view model is being updated from a stream of events. This is no different
from how a view would be updated in normal CQRS operations.
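The timestamp-replay idea can be sketched as follows. Sequence numbers stand in for timestamps, the event shapes are invented for illustration, and the sketch is in Python rather than C#:

```python
# The library's event stream, as it would exist on the desktop side.
library_events = [
    {"seq": 1, "type": "SongAdded", "song": "a.mp3"},
    {"seq": 2, "type": "SongAdded", "song": "b.mp3"},
    {"seq": 3, "type": "SongRemoved", "song": "a.mp3"},
]

def sync(device_songs, last_seq, events):
    """Replay only the events the device has not yet seen, instead of
    diffing the two libraries item by item. Returns the new marker."""
    for e in events:
        if e["seq"] <= last_seq:
            continue  # already applied during a previous sync
        if e["type"] == "SongAdded":
            device_songs.add(e["song"])
        elif e["type"] == "SongRemoved":
            device_songs.discard(e["song"])
        last_seq = e["seq"]
    return last_seq

device = {"a.mp3"}  # state after a previous sync up to seq 1
new_last = sync(device, 1, library_events)
```

After the call the device holds only "b.mp3" and records seq 3 as its new marker, so the next sync again touches only unseen events.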
Commands issued on the device can be used to solve the second problem. Imagine
that you are creating a playlist on the iDevice for later synchronization to iTunes. Each
action taken, be it adding a song, removing a song or renaming the playlist can be
stored in an event store. When the device is reconnected these events are replayed and
interwoven with any commands which may have happened in iTunes. In most cases the
commands from the two sources will not be in conflict however in the rare case when
they are then the user may be prompted for action. Synchronizing these two devices is
no different from dealing with delayed events in a web based system; something at which
CQRS excels.
Many mobile applications are smaller versions of a larger application. While the
main application may be constructed with multitenancy in mind and needs to
process all the messages for all the users, the mobile version has less onerous requirements.
For instance a typical mobile application might require only the records for a specific user.
It would, in fact, be a security concern to allow the records for another user to be loaded
onto the device. This situation is easily handled by offline CQRS: the messages can
be filtered by customer or by user so that only pertinent updates are sent to the device.
Examples
There are not a huge number of products which make use of CQRS at this juncture but
there are several projects which do. The CQRS mailing list is more than a thousand
people strong now, so it is likely that there will be some not insignificant projects soon.
CQRS is also a technique which tends to lend itself to being used inside a
business, so it is likely that there are many implementations which are not publicly
known as a result of confidentiality agreements. Dahan claims that a significant portion
of Amazon's architecture is built using techniques which are very similar to CQRS[8]
and the description Yegge gives of the internal workings of Amazon confirms it[33]. In
particular the composite user interface is a huge component of building the front ends
for CQRS systems.
One company which is very publicly building its systems using CQRS techniques
is the business forecasting company Lokad. Their involvement is largely due to their
chief technologist Rinat Abdullin's interest in the CQRS community14. Lokad makes use
of streams of events to provide not only predictions about inventory but also realtime
updates of stock levels. Their predictions benefit from being able to analyze historical
streams of events. As mentioned in the event sourcing section, all the data about
transactions is preserved for later analysis, which lends itself well to such data mining questions
as
• During what part of the day do I sell the most ice cream?
• What percentage of product X is returned?
• When should I reorder a product from its supplier based on historical consumption
levels?
Lokad also makes use of the data available in the event stream about why an event
occurred. Having available a differentiation between a customer created by an import
from an old system and one created on the company's website is useful in
performing in-depth analysis.
14 Rinat hosts the distributed systems podcast and is also very active on the mailing lists.
RESEARCH QUESTION
CQRS is obviously a huge and complicated topic which has only just begun to be explored
by the community and is more or less absent from the academic literature. I propose
to explore only a tiny fragment of the CQRS problem space, specifically the area around
using an event store to rebuild the domain model whenever it is queried.
I propose to answer a number of questions:
• How quickly can we replay messages using various different serialization approaches?
• How quickly can we replay messages using various different data storage techniques?
• How many messages can reasonably be processed to rebuild an object?
• How large can objects become before they need to be checkpointed?
• How much does the size of the objects (number of fields) impact the
speed of message serialization and deserialization?
EXPECTED OUTCOME
There is a great deal of hype about the efficiency of NoSQL databases, and their typical
eventual consistency model does allow for much greater burst throughput than SQL
databases. However the data does eventually need to be made consistent, so while the
burst rate may be higher I would expect the performance over the long run to degrade
somewhat. Still, I expect that NoSQL solutions will outperform their SQL based
competitors. The data being added is so simple and non-relational that the extensive
optimizations in SQL solutions will be neither necessary nor effective.
There is also a lot of hype in the technical community about the effectiveness of
lightweight serialization formats such as JSON and Protocol Buffers. Certainly these formats
should be faster to read and write on a bit for bit level than XML, which is far more
verbose. It is, however, possible that what these formats gain in serialization length
and size they will lose to complex parsing requirements which tax the CPU. I do
not believe that will be the case, as processors are stunningly fast and generally limited
by disk I/O. However, even the most efficient text based serialization is likely to be far
less efficient than a binary serializer. I expect that the binary format will be the most
efficient, followed by text based serializations and finally XML.
CHAPTER III
METHODOLOGY
There are a number of pre-built solutions for the storage and replay of events, the most
popular of which is Jonathan Oliver's EventStore. EventStore is an amazingly extensible
system which allows for storing events in a large number of different databases in a large
number of formats. Many of the supported databases are simply there to prove that
it is easy to store events in almost any datastore15. There are, however, a significant
number of solutions which may be reasonable. I propose to explore some of the more
likely candidates and benchmark them to establish whether the ideal event stores are relational
databases, cloud databases or one of the newer NoSQL style databases such as a
key-value store or a document database.
In much the same way that there are many databases supported, there are also a large
number of potential serializers. Many developers are most comfortable using the built in
serialization. The format of a language's native serialization varies from implementation
to implementation: Java and C# tend towards a binary serialization format whereas
languages such as Python, Ruby and JavaScript use text based serialization. Binary
serialization formats tend to be smaller than their text based alternatives, which is highly
advantageous in an environment which has limited disk space or bandwidth. Their smaller
size also tends to make people believe that the mechanics of serializing to and from them
will be faster. As previously discussed there are some shortcomings when using a
binary serializer to store events. Largely these relate to language lock-in, a lack of
readability and poor adaptability to changing object definitions. Even small variations
such as changing the namespace of an object can make deserialization in older software
impossible.
15 The complete list of supported data stores can be found at https://github.com/joliver/EventStore and includes such ridiculous datastores as Microsoft Access 2000, which should not be used to run a lemonade stand let alone a complex CQRS system.
EventStore provides a common platform for evaluating combinations of storage and
serialization format. The list of supported storage technologies is almost endless. The
supported formats list is slightly shorter, with support for
• JSON
• BSON
• .net Binary
There are also secondary serializers which add compression or encryption on top of the
standard serializers. I believe that Protocol Buffers are an ideal serialization format for
EventStore; however, there is currently no support for them so I will add it.
In order to answer the research questions I shall generate a number of different domain
objects, ranging from trivial objects to complex objects. A trivial object might have only
two or three fields while a more complex object might have dozens. These
objects will be constructed in memory by replaying a number of events from the event
store. The messages will also vary in complexity, from setting a single field to
setting multiple fields using logic based on the current in-memory representation of the
object. The time required to reconstruct the objects in memory from their serialized
message stream will be measured using a variety of data storage tools and serializations.
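As a rough illustration of the measurement, the following Python sketch rebuilds an object by replaying serialized events and times the reconstruction. The event format is invented, and the numbers will not match the C#/EventStore benchmarks reported later:

```python
import json
import time

def make_events(n):
    """Generate n serialized field-setting events; the last write to
    each field wins, mimicking a simple event-sourced object."""
    return [json.dumps({"field": "f%d" % (i % 5), "value": i})
            for i in range(n)]

def rebuild(serialized_events):
    """Reconstruct the in-memory object by deserializing and applying
    every event in order."""
    obj = {}
    for raw in serialized_events:
        e = json.loads(raw)
        obj[e["field"]] = e["value"]
    return obj

events = make_events(50000)
start = time.perf_counter()
obj = rebuild(events)
elapsed = time.perf_counter() - start
print("rebuilt from 50000 events in %.3f s" % elapsed)
```

The real methodology varies the serializer and the backing store around exactly this replay loop, so only the two middle lines of `rebuild` change between trials.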
I will benchmark these data stores:
• SQL Server (traditional relational database)
• MySQL (traditional relational database)
• SQLite (embedded relational database)
• MongoDB (document database)
Each of these will be locally installed in order to be tested.
The serializers which will be tested are:
• JSON
• BSON
• Protocol Buffers
• XML
• Binary
For each of these serializations I will track the time taken as well as the size of the
resulting serialized object. This will provide metrics for the amount of disk space or
network bandwidth needed for each type. As mentioned, there is currently no
implementation in EventStore for Protocol Buffers, so I will add support using the protobuf-net16
library, which is a well used Protocol Buffers implementation. There are many other text
based serialization formats which could be used, such as Apache Thrift17 and YAML18,
but they tend to have lower adoption than the formats being tested. XML is also
a favorite serialization format; however, it has fallen out of favour in most development
communities (except Java) due to its high file size overhead and the perceived complexity
of the myriad of associated standards.
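To illustrate the kind of size metric being collected, this Python sketch serializes one small message with the standard library's JSON, binary (pickle) and XML facilities. It is a stand-in only: the actual benchmarks use the .net serializers and protobuf-net, and the message fields mirror the simple message defined in the results chapter.

```python
import json
import pickle
from xml.etree import ElementTree

message = {"ID": "0001",
           "CustomerName": "Alice",
           "CustomerAddress": "123 Main St"}

# Text serialization: human readable, moderately compact.
as_json = json.dumps(message).encode("utf-8")

# Binary serialization: Python's pickle stands in for .net Binary here.
as_pickle = pickle.dumps(message)

# XML serialization: one element per field, as the .net serializer
# would roughly produce.
root = ElementTree.Element("SimpleMessage")
for key, value in message.items():
    ElementTree.SubElement(root, key).text = value
as_xml = ElementTree.tostring(root)

sizes = {"json": len(as_json), "binary": len(as_pickle), "xml": len(as_xml)}
print(sizes)
```

Even on this toy message the XML payload comes out noticeably larger than the JSON one, which is the verbosity penalty the methodology sets out to quantify.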
ENVIRONMENT
All the testing will be done using the .net framework and any additional code will be
written in C#. There are no complexities in the testing or in EventStore proper which
would preclude the use of other languages. The same environment is simply kept to
ensure as much consistency as possible.
The tests will be run on a computer with the following configuration:
• Windows 7 - 64 bit
16 http://marcgravell.blogspot.com/2011/05/protobuf-net-v2-beta.html
17 http://thrift.apache.org/
18 http://yaml.org/
• Intel i7-2600K 3.40GHz CPU overclocked to 4.10GHz which contains 4 physical
cores and 4 hyperthreading cores
• 16GB of memory
• 120GB Corsair Force3 Solid State Drive
• 2TB WD20EARS Rotational Disk
Serialized objects will be persisted to the rotational disk.
The tests will each be run 20 times to ensure that any outlying datapoints are
smoothed away. The actual variation in results is likely to be extremely low.
CHAPTER IV
RESULTS
SERIALIZATION AND DESERIALIZATION
The first things investigated were the properties of the various serializers and
deserializers. JavaScript Object Notation, or JSON, is a human readable serialization format
which was originally proposed by Crockford[7]. BSON is a binary version of JSON: it
loses the human readability but is generally more space efficient and is also faster to scan.
For both the JSON and BSON serialization experiments the Newtonsoft JSON
serialization library was used19. This is the preeminent .net library for JSON serialization. XML
serialization is simply the transformation of the messages into the very well known XML
format. XML is human readable and is known for being quite verbose. The serialization
was handled by the built in serialization system in .net. Protocol Buffers are a product
of Google's internal workings[14]. They strive for speed and adaptability to changing
message fields. Unlike the other formats they ignore the names of the fields and instead
number them. This means that a Protocol Buffers serialization is not as easy to create
as the other formats: a .proto file must be created which acts as a mapping between
the field names and their numbers. It was for this reason that it was difficult to include
Protocol Buffers serialization in the EventStore library, which aims to require nothing from
the developer but an object to serialize. There are a number of libraries for performing
Protocol Buffers serialization in .net; the tests made use of the protobuf-net library20.
Three different metrics were examined for each of the serializers: first the speed of
serialization, second the speed of deserialization and finally the size of the serialized
objects. Two different objects were serialized. The first was a very simple message consisting
of only three fields; the second was a large object with twenty eight fields. The large
19 http://james.newtonking.com/pages/json-net.aspx
20 http://code.google.com/p/protobuf-net/
public Guid ID { get; set; }
public string CustomerName { get; set; }
public string CustomerAddress { get; set; }
Code Listing 6: A simple message.
public Guid ID { get; set; }
public string Name { get; set; }
public string HouseNumber { get; set; }
public string StreetAddress { get; set; }
public string PostalCode { get; set; }
public string City { get; set; }
public string Province { get; set; }
public float Height { get; set; }
public string HeightUnits { get; set; }
public float Weight { get; set; }
public string WeightUnits { get; set; }
public string ShoeSize { get; set; }
public decimal LeftArmLength { get; set; }
public string LeftArmLengthUnits { get; set; }
public decimal LeftWristCircumfrence { get; set; }
public string LeftWristCircumfrenceUnits { get; set; }
public decimal LeftLegLength { get; set; }
public string LeftLegLengthUnits { get; set; }
public decimal LeftAnkleCircumfrence { get; set; }
public string LeftAnkleCircumfrenceUnits { get; set; }
public decimal RightArmLength { get; set; }
public string RightArmLengthUnits { get; set; }
public decimal RightWristCircumfrence { get; set; }
public string RightWristCircumfrenceUnits { get; set; }
public decimal RightLegLength { get; set; }
public string RightLegLengthUnits { get; set; }
public decimal RightAnkleCircumfrence { get; set; }
public string RightAnkleCircumfrenceUnits { get; set; }
Code Listing 7: A complex message with 28 fields.
object was also serialized with only three fields filled out in an attempt to test how well
the serializers worked on a sparsely populated object.
RESULTS
50 000 messages were serialized to individual files on a rather slow rotational disk. The
trial was repeated 20 times and the averages recorded.
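The measurement loop itself is straightforward. The following is a sketch of the kind of harness used; the delegate-based design and the file naming are assumptions, not the actual test code.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class SerializerBenchmark
{
    // Serializes 'message' 'count' times, writing each result to its own
    // file, and returns the elapsed time in milliseconds. Any of the five
    // formats under test can be supplied as the 'serialize' delegate.
    public static double TimeSerialization(
        Func<object, byte[]> serialize, object message, int count, string dir)
    {
        Directory.CreateDirectory(dir);
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            File.WriteAllBytes(Path.Combine(dir, i + ".msg"), serialize(message));
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

Each trial calls a method like this once per serializer, and the twenty returned values are averaged.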
Serializer Avg Serialization(ms) Avg Deserialization(ms) Size on disk (b)
JSON 12326.25 2747.70 101
BSON 11476.55 2507.84 108
ProtocolBuffer 11654.91 1907.70 41
XML 13035.59 3870.07 293
Binary 11938.38 2751.45 360
Serializing 50 000 small messages
Figure 13: The average time, in ms, for serialization of 50 000 small messages.
Figure 14: The average time, in ms, for deserialization of 50 000 small messages.
Figure 15: The on disk size of a serialized small message. In bytes.
Serializer Avg Serialization(ms) Avg Deserialization(ms) Size on disk (b)
JSON 15911.86 11285.09 722
BSON 23865.26 4398.40 759
ProtocolBuffer 14895.85 2149.17 193
XML 15389.13 6285.25 1382
Binary 14767.49 4931.53 1405
Serializing 50 000 large messages
Figure 16: The average time, in ms, for serialization of 50 000 large messages.
Figure 17: The average time, in ms, for deserialization of 50 000 large messages.
Figure 18: The on disk size of a serialized large message. In bytes.
Serializer Avg Serialization(ms) Avg Deserialization(ms) Size on disk (b)
JSON 16400.63 11712.37 676
BSON 18244.99 2789.06 350
ProtocolBuffer 18889.18 1907.61 41
XML 16074.92 6297.91 679
Binary 14258.41 5006.68 1270
Serializing 50 000 sparsely populated large messages (only 3 of the 28 fields were
populated with non-default values)
Figure 19: The average time, in ms, for serialization of 50 000 sparsely populated large messages.
Figure 20: The average time, in ms, for deserialization of 50 000 sparsely populated large messages.
Figure 21: The on disk size of a serialized sparsely populated large message. In bytes.
STORAGE TECHNOLOGIES
An examination was also made of a variety of different storage technologies. Relational
databases have long been the default tool for storing any sort of structured
data, but are they the best tool for storing events?
The relational databases were represented by Microsoft SQL Server and MySQL, with SQLite standing in as an embedded option and MongoDB representing the NoSQL document databases.
All experiments were performed using 500 messages and were repeated 20 times.
Storage Technology Avg Serialization(ms) Avg Deserialization(ms)
SQL Server 39.35 11.10
MySQL 101.20 56.70
SQLite 3506.60 38.50
MongoDB 40.60 7.90
Figure 22: The time taken to serialize and deserialize 500 messages to a variety of different storage technologies. In ms.
CHAPTER V
RESEARCH IMPLICATIONS
What is apparent is that rebuilding domain objects from a stream of messages is not
very time consuming at all. There are clear advantages to maintaining a rich history of
domain objects and, should the replay of messages become expensive, checkpointing can
be used. However, a thousand messages can easily be replayed in half a second, so unless
the object's history is in excess of 1000 messages checkpointing should not be needed.
If an object is really so mutable that it requires 1000 messages then there may well be
some design issues, such as too large a domain object. It is unimaginable that an object
such as a customer address would ever be changed 1000 times.
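As a sketch of what checkpointing looks like in practice, the replay loop below applies events to an aggregate and saves a snapshot every N events, so that a later rebuild can begin from the most recent snapshot rather than from the first event. All of the names here are illustrative.

```csharp
using System;
using System.Collections.Generic;

public class CustomerAddress
{
    public string Address { get; set; }
}

public static class Replayer
{
    // Rebuilds the aggregate by applying each AddressChanged event in
    // order, invoking 'saveSnapshot' with the running event count and
    // the current state every 'checkpointInterval' events.
    public static CustomerAddress Rebuild(
        IEnumerable<string> addressChangedEvents,
        int checkpointInterval,
        Action<int, string> saveSnapshot)
    {
        var state = new CustomerAddress { Address = "" };
        int applied = 0;
        foreach (var newAddress in addressChangedEvents)
        {
            state.Address = newAddress; // apply the event to the aggregate
            applied++;
            if (applied % checkpointInterval == 0)
                saveSnapshot(applied, state.Address);
        }
        return state;
    }
}
```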
Software development is such that it is very difficult to suggest a single solution as a
panacea. The same is true of event storage: there is no one solution which is best for all
situations. As with all technologies, it is best to examine the technology choice within
the context of the business. Adoption of NoSQL technologies such as MongoDB has been
slow in large enterprises and likely even slower in smaller companies[30]. In such cases it
may be difficult to get buy-in from the company to use a non-traditional data storage
technology. Event storage is quite simple and can be done without a database at all.
Abdullin now uses simple on-disk storage for his messages[4] and has even proposed
writing raw data to the disk to avoid the overhead of the file system. Because messages
are accessed in a sequential fashion for replay, disk storage is a reasonable approach;
however, storing the events in a database does allow for some additional analysis of
metadata. For instance, it is easy to extract statistics about the number and type of
messages and when they are most prevalent. No matter which data storage choice is
made, the messages will still need to be serialized.
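A minimal version of such a database-free event store is sketched below. The length-prefixed framing is an assumption of mine, not Abdullin's actual format, but it shows how naturally an append-only file supports the sequential replay pattern.

```csharp
using System.Collections.Generic;
using System.IO;

public class FlatFileEventStore
{
    private readonly string path;
    public FlatFileEventStore(string path) { this.path = path; }

    // Appends one serialized event, prefixed with its 4-byte length.
    public void Append(byte[] serializedEvent)
    {
        using (var stream = new FileStream(path, FileMode.Append))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(serializedEvent.Length);
            writer.Write(serializedEvent);
        }
    }

    // Replays every event in the order it was written.
    public IEnumerable<byte[]> Replay()
    {
        using (var stream = new FileStream(path, FileMode.OpenOrCreate))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
                yield return reader.ReadBytes(reader.ReadInt32());
        }
    }
}
```

A store like this provides none of the metadata queries a database offers, which is exactly the trade-off discussed above.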
SERIALIZATION TECHNOLOGY SELECTION
While the binary serializers seem to be faster in general than the text based serializers,
there is a development cost: any time a message needs to be debugged or investigated
it must first be opened in some sort of tool which understands the serialization
format. Text serializations are easily read by the operator and issues are far more
apparent. Overall, the difference in serialization time is rarely significant unless the
number of messages is astronomical. It is a far more common task to deserialize a
message than to serialize it, so serializers which are fast at deserialization should be
favoured over those which are fast at serialization. In this respect protocol buffers do
seem to have the advantage; in some cases protocol buffers demonstrated a full order of
magnitude difference between the time taken to serialize a message and the time taken
to deserialize it. However, protocol buffers require that an explicit schema be created
for each object. In many CQRS systems this would require that hundreds of messages
be explicitly set up for serialization. The other serializers tested are schemaless in that
they do not require any serialization schema other than the class file. As mentioned
previously, serializers which fail on missing or new attributes should be avoided, to allow
for easier message mutation as the system grows and changes.
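The kind of tolerance meant here can be seen with Json.NET, which the tests used for JSON serialization: a payload written by a newer version of a message, carrying an added field, still deserializes into the older class, and the unknown field is simply ignored. The message class and field names below are illustrative.

```csharp
using System;
using Newtonsoft.Json;

// Version 1 of a message: it has no PostalCode property.
public class CustomerMovedV1
{
    public Guid ID { get; set; }
    public string CustomerAddress { get; set; }
}

public static class MutationDemo
{
    public static CustomerMovedV1 ReadNewPayload()
    {
        // A payload produced by a newer version of the message with an
        // extra PostalCode field. Json.NET ignores the unknown field by
        // default rather than failing, which is what permits messages
        // to mutate as the system evolves.
        string json = "{\"ID\":\"75e67b67-b605-4ff7-bdc7-51e5ca7f0a0b\"," +
                      "\"CustomerAddress\":\"1 Main St\",\"PostalCode\":\"T2P 1J9\"}";
        return JsonConvert.DeserializeObject<CustomerMovedV1>(json);
    }
}
```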
We can still draw some best practices from the results. Large messages always took
longer to work with than small messages, even when the large messages were sparsely
populated. It is therefore preferable to use a number of small messages rather than one
large message. This practice mirrors programming best practices, which suggest that
smaller classes are far more maintainable than large classes.
DATABASE SELECTION
SQL Server and MongoDB turned out to be the best databases for storing event streams.
The advantage of MongoDB is that it is not a single node database and, through eventual
consistency, it is possible to have multiple nodes to which messages can be written. This
would speed up writing significantly. However, a similar approach can be taken with
SQL Server. Because a message, once created, is immutable, the messages can be
written across a number of smaller SQL Server nodes and then interwoven at replay time.
Selecting the appropriate writing node may be done in a simple round robin fashion. This
approach allows for massive scalability with almost no increase in licensing costs.
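A sketch of this scheme, with in-memory lists standing in for the SQL Server nodes: each write is stamped with a global sequence number and sent to the next node in turn, and replay merges the per-node streams back into order. The global sequence counter is an assumption; in a real deployment it would itself need to be a shared, coordinated service.

```csharp
using System.Collections.Generic;
using System.Linq;

public class ShardedEventLog
{
    private readonly List<List<KeyValuePair<long, byte[]>>> nodes =
        new List<List<KeyValuePair<long, byte[]>>>();
    private long sequence;
    private int next;

    public ShardedEventLog(int nodeCount)
    {
        for (int i = 0; i < nodeCount; i++)
            nodes.Add(new List<KeyValuePair<long, byte[]>>());
    }

    // Messages are immutable once written, so each node only ever
    // appends; a simple round robin spreads the write load evenly.
    public void Write(byte[] serializedEvent)
    {
        nodes[next].Add(new KeyValuePair<long, byte[]>(sequence++, serializedEvent));
        next = (next + 1) % nodes.Count;
    }

    // Replay interweaves the per-node streams back into global order
    // using the sequence number each event was stamped with.
    public IEnumerable<byte[]> Replay()
    {
        return nodes.SelectMany(n => n).OrderBy(e => e.Key).Select(e => e.Value);
    }
}
```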
The embedded solution, SQLite, was significantly slower than the dedicated solutions,
and in most cases it should be avoided. However, in the case of an offline or occasionally
connected client, an embedded database would be ideal for maintaining a collection
of messages generated on the client. Embedded databases might also be a good approach
for low traffic systems or for packaged systems which are delivered to a client.
GENERAL APPLICABILITY TO CQRS
Event storage, and more specifically the EventStore technology, is a very small part of
CQRS and may not even be used in many implementations. Indeed the most well
written about CQRS implementations, those from the company Lokad, do not make use of
EventStore. However, the underlying database and serialization technologies are of
importance to CQRS developers, and these results should be taken as a general examination
of serialization techniques for messages.
CHAPTER VI
CONCLUSIONS
CQRS+ES is a fascinating solution to a number of problems which are common
in enterprise computer programming. While CQRS may not be applicable to all situations,
there are many places in which the advantages are clear. Any sort of financial problem
benefits greatly from having a historical stream of events to aid with auditing. Equally,
the ease of scaling CQRS means that adapting to a growing business is simplified. Perhaps
the single greatest advantage is that adapting to changing business requirements is much
easier with a CQRS+ES solution.
There is a definite overhead to developing with CQRS+ES. The initial project setup
is far more difficult than with a traditional project. Many of the tools, such as object
relational mappers, to which developers are accustomed are no longer as heavily used as
they were in the past. This means that finding developers to work on CQRS+ES solutions
is much more difficult. This is especially true at the moment as CQRS+ES is very new
and relatively few developers have had a chance to become familiar with its ins and
outs. There is a lack of literature and training material on CQRS+ES; for the most
part the training is limited to a handful of videos recorded by the likes of Udi Dahan
and Greg Young. There are limited training classes and the mailing list is extremely
verbose. Obviously these situations will improve as CQRS+ES becomes more accepted.
That Microsoft has directed their Patterns and Practices group to examine CQRS and
develop guidelines around it will likely spur adoption in a community which is largely
suspicious of anything which is not backed by a large software vendor.
There is also a lack of frameworks for CQRS. While the general feeling on the mailing
list is that frameworks are totally unnecessary and, in fact, undesirable, many people
will avoid making use of CQRS without some sort of a framework to help them along.
With such hostility towards frameworks it is unlikely that a common framework will be
adopted at any point in the near future.
Developers are starting to bump into the limits of traditional relational databases
and application architecture. As applications scale, many of the old ways of working
are proving to be simply too slow. There is also a large discrepancy in the cost of
computers: it is far cheaper to buy a large number of small computers than to buy an
expensive high end multi-hundred CPU system. This commodity computing approach
is one which is well proven by companies such as Google and Amazon. At the same
time the popularization of cloud computing is presenting the opportunity to scale out an
application quickly. Under these conditions CQRS is an ideal architecture, for it allows for
quick scale out. Some cloud systems offer discounts for performing processing at off-peak
hours, which is a great model for CQRS as much of the message processing can be delayed
and developers are already thinking about how quickly processes need to run.
I will certainly be aware of the possible applications of CQRS in any greenfield
development projects in which I am involved in the near future.
Glossary
• OODBMS - Object Oriented Database Management System. Similar to a tradi-
tional RDBMS but object oriented rather than table based.
• RDBMS - Relational Database Management System. A system which uses tables
and relationships to organize, sort and query structured information. Examples
include systems such as Oracle, MySQL and Microsoft SQL Server.
• DTO - Data Transfer Object. A very light weight object in an OO language
which simply acts as a container for transporting data from one system or location
to another. Usually these objects have no functions or methods, only fields or
properties.
• DDD - Domain Driven Design. A software engineering approach which favours
creation of software through the establishment of business models developed in
close conjunction with domain experts.
• BCNF - Boyce-Codd Normal Form. A strong normal form for databases in which,
for each and every non-trivial functional dependency X → Y, X is a superkey.
• JSON - JavaScript Object Notation. A standard method of serializing JavaScript
objects. Serialized objects take the form { <property>: <value> } with arrays
denoted by square braces. It was originally proposed by Douglas Crockford in
RFC 4627 21 and has since progressed from an add-on library to being natively
implemented by all major browsers.
• BSON - A binary representation of JSON data. Its goal is to be more efficient
to decode than JSON; however, as a binary format it loses JSON's advantage of
being human readable.22
21 http://tools.ietf.org/html/rfc4627
22 http://bsonspec.org/#/specification
• Protocol Buffer - A language and platform neutral serialization format developed
by Google for use in their internal RPCs. Google claims that they are far more
efficient than using XML both in terms of the size of the payload and in the speed
of serialization and deserialization. 23
• SOLID Principles - A set of software engineering principles most often ascribed
to "Uncle" Robert C. Martin[19, 18]. In short they are:
– Single Responsibility Principle - There should never be more than one
reason for a class to change
– Open Closed Principle - Software entities (classes, modules, functions, etc.)
should be open for extension, but closed for modification. That is to say that
the source should not change but the objects can be modified by extending
them.
– Liskov Substitution Principle - Functions that use pointers or references
to base classes must be able to use objects of derived classes without knowing
it. For instance, a function which takes a shape object should be able to take
a triangle object as it is more specific. This is also known as behavioural
subtyping.
– Dependency Inversion Principle - High level modules should not depend
upon low level modules. Both should depend upon abstractions. Abstractions
should not depend upon details. Details should depend upon abstractions.
– Interface Segregation Principle - Clients should not be forced to depend
upon interfaces that they do not use, instead specific interfaces should be
created which present only the used methods.
• UUID - Universally Unique Identifier. A 128-bit record identifier used for identifying
pieces of data in a large system. Often represented as a hexadecimal string with
23 http://code.google.com/apis/protocolbuffers/docs/overview.html
hyphens in the form 8-4-4-4-12; an example might be
75e67b67-b605-4ff7-bdc7-51e5ca7f0a0b. The 128-bit keyspace is so large that it is
highly improbable that any two generated keys will be the same, even taking into
account the birthday paradox.
• MSDTC - Microsoft Distributed Transaction Coordinator. A system used to com-
mit transactions across a number of databases or systems ensuring that information
is committed in a consistent fashion.
References
[1] Serialization (C# and visual basic). http://msdn.microsoft.com/en-us/library/ms233843.aspx, 2010.
[2] Module: Marshal. http://ruby-doc.org/core/classes/Marshal.html, 2011.
[3] Python object serialization. http://docs.python.org/library/pickle.html, 2011.
[4] Rinat Abdullin. Event sourcing: Projections. http://bliki.abdullin.com/event-sourcing/projections.
[5] Eric Brewer. Towards robust distributed systems, July 2000.
[6] S. Choenni, H. Blanken, and T. Chang. Index selection in relational databases. In
Fifth International Conference on Computing and Information, 1993. Proceedings
ICCI '93, pages 491–496. IEEE, May 1993.
[7] Douglas Crockford. The application/json media type for JavaScript object notation
(JSON). http://tools.ietf.org/html/rfc4627, 2006.
[8] Udi Dahan. Advanced distributed systems, October 2010.
[9] Udi Dahan. Race conditions don't exist. http://www.udidahan.com/2010/08/31/race-conditions-dont-exist/, 2010.
[10] Eric Evans. Domain-Driven Design: Tackling Complexity in the Heart of Software.
Addison-Wesley Professional, 1 edition, August 2003.
[11] Martin Fowler. Event sourcing. http://martinfowler.com/eaaDev/EventSourcing.html,
December 2005.
[12] Martin Fowler. MemoryImage. http://martinfowler.com/bliki/MemoryImage.html,
August 2011.
[13] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent,
available, Partition-Tolerant web services, 2002.
[14] Google. Developer guide - protocol buffers.
http://code.google.com/apis/protocolbuffers/docs/overview.html.
[15] Dan Haywood. An introduction to domain driven design.
http://www.methodsandtools.com/archive/archive.php?id=97, 2009.
[16] IDC. The diverse and exploding digital universe, 2008.
[17] Tobias Kind. RAMDISK benchmarks, 2009.
[18] Robert C. Martin. Agile Software Development, Principles, Patterns, and Practices.
Prentice Hall, 1 edition, October 2002.
[19] Robert C. Martin. ArticleS.UncleBob.PrinciplesOfOod.
http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod, 2005.
[20] Steve McConnell. Code Complete: A Practical Handbook of Software Construction.
Microsoft Press, 2nd edition, July 2004.
[21] Dwight Merriman. NoSQL and MongoDB.
[22] Bertrand Meyer. Object-oriented software construction. Prentice Hall, Upper Saddle
River, NJ, 1997.
[23] Microsoft. Patterns and practices: Upcoming releases.
http://msdn.microsoft.com/en-us/practices/bb232643.
[24] Sun Microsystems. Java object serialization specification.
http://download.oracle.com/javase/1.5.0/docs/guide/serialization/spec/serialTOC.html, 2004.
[25] Jonathan Oliver, Udi Dahan, Rinat Abdulin, and Jonathan Matheus. When to avoid
CQRS - clarified.
[26] R. Pawson. Naked objects. IEEE Software, 19(4):81–83, August 2002.
[27] Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems.
McGraw-Hill Science/Engineering/Math, 3 edition, August 2002.
[28] Matt Richtel. Tech recruiting clashes with immigration rules. The New York Times,
April 2009.
[29] Stefano Rivera. Storage & retrieval for an immensely scalable monitoring system,
2008.
[30] Darryl K. Taft. NoSQL makes big inroads in enterprise development: Survey.
http://www.eweek.com/c/a/Desktops-and-Notebooks/NoSQL-Makes-Big-Inroads-in-Enterprise-Development-Survey-500444/, June 2011.
[31] Trefis Team. Surging iPad shipments this quarter likely to top forecasts, boost $510 target price.
http://www.forbes.com/sites/greatspeculations/2011/09/06/surging-ipad-shipments-this-quarter-likely-to-top-forecasts-boost-510-target-price/, September 2011.
[32] Martin Thompson and Michael Barker. LMAX - how to do 100K TPS at less than
1ms latency. http://www.infoq.com/presentations/LMAX, 2010.
[33] Steve Yegge. Rip rowan - google+ - stevey’s google platforms rant, October 2011.
[34] Greg Young. CQRS workshop, July 2011.