ATHABASCA UNIVERSITY
MESSAGE ORIENTED SOFTWARE DESIGN USING COMMAND QUERY
RESPONSIBILITY SEGREGATION
BY
SIMON TIMMS
A project submitted in partial fulfillment
Of the requirements for the degree of
MASTER OF SCIENCE in INFORMATION SYSTEMS
Athabasca, Alberta
March, 2012
© Simon Timms, 2012
ABSTRACT
The design of software is an ever-evolving process. It seems that every year there are
new technologies and new techniques which require that software developers adopt new
methodologies for the development of software. As the popularity of computers and
the Internet increases, the computational and storage requirements of Internet services
also rise. In order to address these needs, large distributed systems such as Amazon's
EC2 and Microsoft's Azure were created. However, it is not enough to simply scale up
an application to larger and faster computers, as the growth of demand is outstripping
even the astronomical pace of improvements to memory and CPU. Instead of scaling up,
one must scale out to a greater number of computers, as growth in the number of
computers is not as limited. Scaling out is a complex undertaking due to a number of
factors. One possible approach to scaling out is to make use of a technique known as
Command Query Responsibility Segregation (CQRS). In its purest form CQRS dictates
that different data models be used for reading and writing data, synchronized through
the use of messaging.
A design used in many CQRS systems is to retain a stream of events rather than a
rich domain model. This event sourcing relies on the assumption that it is very quick
to recreate the current state of domain objects and views from a stream of messages.
However, this assumption has not been well proven and there is especially little research
on the best method of storing messages and the actual serialization of the messages. An
exploration of the best method of using event sourcing will be made and conclusions
drawn about the advantages of the various approaches.
TABLE OF CONTENTS
CHAPTER I - INTRODUCTION
CHAPTER II - REVIEW OF RELATED LITERATURE
Architectures for Development of Data Driven Applications
Scaling Databases
Consistency is a Lie
CQRS
CQRS Explained
Idempotence of Commands
CQRS vs. ActiveRecord (CRUD)
Relationship with Domain Driven Design
CQRS in Real World Situations
Research Question
Expected Outcome
CHAPTER III - METHODOLOGY
Environment
CHAPTER IV - RESULTS
Serialization and Deserialization
Results
Storage Technologies
CHAPTER V - RESEARCH IMPLICATIONS
Serialization Technology Selection
Database Selection
General Applicability to CQRS
CHAPTER VI - CONCLUSIONS
Glossary
References
List of Figures
1 Single Tiered Pattern
2 Repository Pattern
3 The CAP theorem
4 Order database model
5 Order database model after denormalization
6 A standard queue data structure
7 A queue data structure altered to function using the CQS pattern
8 CQRS pattern diagram
9 A series of commands arriving out of order
10 A comparison of a customer as seen by traditional and DDD architectures
11 Traditional synchronous persistence model
12 Asynchronous persistence model
13 The average time, in ms, for serialization of 50 000 small messages
14 The average time, in ms, for deserialization of 50 000 small messages
15 The on-disk size, in bytes, of a serialized small message
16 The average time, in ms, for serialization of 50 000 large messages
17 The average time, in ms, for deserialization of 50 000 large messages
18 The on-disk size, in bytes, of a serialized large message
19 The average time, in ms, for serialization of 50 000 sparsely populated large messages
20 The average time, in ms, for deserialization of 50 000 sparsely populated large messages
21 The on-disk size, in bytes, of a serialized sparsely populated large message
22 The time taken, in ms, to serialize and deserialize 500 messages to a variety of different storage technologies
List of Code Listings
1 An example of active record used to save a customer record to the database
2 An example of retrieving a collection of customers from an Active Record system and updating a property
3 An example of a scenario which would trigger the n+1 problem
4 Working with a set of entities
5 Simple Message Router
6 A simple message
7 A complex message with 28 fields
CHAPTER I
INTRODUCTION
Building software is a difficult problem and the field of software engineering is still very
new. Large systems have been built for less than 50 years and the process is far from
well understood. The variety of different systems means that no single approach is always
correct. Unlike a bridge, a computer system can simply be copied, so should the same
solution be needed somewhere else there is almost zero cost to reapplying it. Business
software also tends to be put in place to replace existing manual processes, whereas
bridges are built where no passage previously existed. This means that software must fit
in with existing constraints while bridge builders are able to dictate how the bridge is
used. Over the years many suggestions have been made as to how to build systems and
many have claimed to have the one true solution. What has become apparent is that a
single solution for all problems is ill advised. The architecture of a stock trading system
must be different from that of an off-line reporting system. It even seems that one stock
trading system is significantly different from other systems that seek to solve the same
set of problems. Often it is possible to create working systems using two completely
different architectures; the advantages of one over the other may only become apparent
years later during maintenance or expansion. These architectures are often formalized
in patterns - high level explanations of how to apply a particular architecture.
Command Query Responsibility Segregation is a relatively new development in enterprise
architecture, having only been suggested in the last few years. Perhaps the most
amusing indicator of the newness of CQRS is that, at the time of writing, a Google search
for CQRS will result in the suggestion "CQRS... did you mean cars?". This has become
something of a running joke in the CQRS community. This essay will explore a number
of aspects of CQRS from its history, to its design, to the situations in which it should and
should not be applied.
Section 2 will cover the current state of CQRS. It will explore what defines CQRS,
the current state of research and how to address some of the shortcomings of CQRS.
In section 3 a research problem related to event sourcing will be introduced. Section 4
will outline the results of the experiments. Section 5 will suggest an approach which can
be followed to make use of the results of the experiments. Finally, section 6 will draw
conclusions from the literature in section 2 as well as from the experiments.
The experiments in this paper provide a number of benchmarks for the serialization,
persistence and deserialization of messages, which are central to CQRS and other message
based architectures. In order for messages to be durable1 they must be written to disk.
While the experiments here focus on writing large numbers of messages as part of event
sourcing, the concepts are applicable to any persistence of messages.
One of the applications of CQRS is in high performance distributed computing where
message size and processing speed are critical. A study of which message formats and
storage mechanisms should be used is crucial for ensuring that these systems can be
performant. There is limited information on the size of systems deployed using CQRS
but there are reports of systems processing 250 million messages a day. The author has
worked on systems processing almost two million messages across half a dozen nodes.
1Durable messages are recoverable after a process or machine crash. Messages stored solely in volatile memory such as that used in processor caches and main memory are not durable as they are wiped after a power outage.
CHAPTER II
REVIEW OF RELATED LITERATURE
There is very little academic research on CQRS as it is only about a year old. The
term CQRS was coined in 2010 by Greg Young, who split the CQRS mailing list
from the Domain Driven Design group. Fortunately the relationship between CQRS and
DDD, as well as a number of other established technologies, is well understood and there
exists a large body of work on these topics. Because of the youth of CQRS, limiting this
essay to recognized academic resources is illogical; thus I shall draw on sources such as
mailing lists and blog posts on the topic of CQRS by some of the thought leaders in the
field. There is, however, a great deal of excitement about the potential of CQRS. At the
time of writing the Patterns and Practices group at Microsoft has announced that it will
be producing a set of recommendations and, possibly, some demonstration applications
centered around CQRS. Despite some controversy and early skepticism on the mailing
list it is likely that a recommendation from one of the largest software vendors in the world
will spur significant adoption. The Patterns and Practices group's initial publication is
expected early in 2012[23].
ARCHITECTURES FOR DEVELOPMENT OF DATA DRIVEN
APPLICATIONS
A huge number of developers are employed to create systems for the entry, analysis and
reporting of data. This is simply because businesses produce a great deal of data and
have a need to make decisions based on that data. These enterprise systems rely heavily
on relational database systems. Table based relational database systems
have been at the forefront of enterprise systems for the last thirty years. There has
been almost no other computer technology which has been as persistent or as pervasive
as relational database systems. An SQL query which worked on a database from 1985
is almost certain to work on the latest relational databases. What has evolved is the
method for accessing the data in the database.
Monolithic Single Tiered
There are a number of different architectural patterns which have seen common use for
accessing data from an application. The simplest is a single tiered monolithic system,
in which the application and data access form a single tier on a single computer.
This sort of system is useful for applications which are not collaborative and have no
need to share data. Traditionally systems such as word processors and spreadsheets fall
into this category, although those systems are now making their way online and becoming
collaborative. This method has also seen use in mainframe systems where the applications
are accessed via dumb terminals. Often these systems make use of flat files or files which
contain a proprietary data structure optimized for access by the application.
Figure 1: Single Tiered Pattern
N-Tiered
As data access requirements become more complex, a popular approach is to put in place
an architectural layer dedicated to data access. When using object oriented programming
there is a discord between the table based structure of traditional relational databases
and the data transfer objects (DTOs) found in an application. Application developers
prefer to walk the object to get access to its properties. For instance it would be expected
ICustomer customer = new Customer(); // active record customer
customer.FirstName = "Frank";
customer.LastName = "Berry";
customer.Save(); // persist the object to the database
return customer.ID; // the ID is populated from the database

Code Listing 1: An example of active record used to save a customer record to the database.
IEnumerable<Customer> customers = CustomerTable.ID.GreaterThan(7);
foreach (var customer in customers)
{
    customer.IsPreferred = true;
    customer.Save();
}

Code Listing 2: An example of retrieving a collection of customers from an Active Record system and updating a property.
that the orders collection hanging off of the Customer object would be populated with
the customer's orders. Often this functionality can be provided by an object relational
mapper such as Hibernate, Entity Framework or, very popular in the Ruby on Rails
community, ActiveRecord. These tools are often used against a remote database in order
to gain the advantages of an N-Tier architecture. The operations available through these
systems are usually limited to Create, Read, Update and Delete, commonly known
as CRUD. We will refer to this style of operation as Active Record architecture.
Each system which provides Active Record architecture differs in its implementation;
it is very common to see the data objects extended or decorated with methods allowing
for database access. Languages such as Ruby which permit monkey patching2 usually
implement the data methods by patching objects, while other languages, such as C#, may
extend the objects or generate them through static analysis and code generation
prior to compile time.
Queries are constructed behind the scenes and are usually provided as decorators on
a collection of objects.
2Monkey patching is the ability to add or even change the methods of an existing object at run-time. The ability to monkey patch allows for much easier unit testing although it does introduce some uncertainty about whether calling a method on two objects of the same type will have the same effect.
As the system again increases in complexity it becomes undesirable to use Active
Record, as the data operations become spread throughout the code and changes to the data
access tools or methods become major operations. It is simply too easy for developers
to access the database from any context they desire. To alleviate this concern many
systems centralize data operations into a layer containing repositories. These repositories
are classes through which all the data access requests pass. They may contain methods
which simulate the CRUD operations of Active Record but they may also start to make
use of task based data operations. This is preferable as it provides some abstraction
around the data access. Consider the act of saving a new customer and adding an order.
Using Active Record this would consist of first creating a customer, retrieving the ID of
the created record and then creating an order related to that customer and saving it. This
operation can be made atomic through the use of transactions at the database level, but
the logic would still be decentralized: every place in the application which created a
customer and an order would need to implement the operation. Should the requirements
change and there be a need to log the creation of a customer, making the change would
be difficult. A repository might provide a method CreateCustomerWithOrder which would
be called from throughout the application, providing an easy extension point and reducing
the amount of duplication of code - always a good thing.
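The task based repository method just described can be sketched as follows. This is a minimal, in-memory illustration; the type and member names are assumptions for the example rather than part of any particular framework, and a real repository would wrap database access in a transaction.

```csharp
using System.Collections.Generic;

// A sketch of the CreateCustomerWithOrder idea: every caller goes through
// this one method, so later requirements such as logging the creation of a
// customer mean changing a single place in the code.
public class Customer { public int Id; public string Name; }
public class Order { public int Id; public int CustomerId; }

public class CustomerRepository
{
    public List<Customer> Customers = new List<Customer>();
    public List<Order> Orders = new List<Order>();
    private int nextCustomerId = 1;
    private int nextOrderId = 1;

    public int CreateCustomerWithOrder(string name)
    {
        // create the customer and take its new ID
        var customer = new Customer { Id = nextCustomerId++, Name = name };
        Customers.Add(customer);
        // create the related order in the same operation
        Orders.Add(new Order { Id = nextOrderId++, CustomerId = customer.Id });
        return customer.Id;
    }
}
```

The point is only that the two steps are centralized; atomicity would still come from a database transaction inside the method.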
Figure 2: Repository Pattern
All these database access patterns are based on manipulating a single set of domain
objects. A page showing a list of customers might use the customers collection and
a projection to display the customer id, name and city. Difficulty arises when there
is a requirement to display information which crosses the boundaries of an entity. The
collection of orders is not contained in the same entity as the customer information. Were
it desirable to display the customer name and a count of their orders then multiple queries
are required. In an Active Record scenario this typically takes the form of decorating the
Customer object with a lazily instantiated collection of orders. Accessing the collection in
code will trigger the population of the records from the data source. One of the problems
associated with this method of data access is the n+1 problem. n+1 is a result of the
object relational mapper's inability to predict the future operations on a retrieved entity.
When iterating over a sub collection of objects the object relational mapper will
only load each object as it is needed. Thus when retrieving the entire sub collection each
Customer customer = Customers.Load(15); // load the customer with ID 15
foreach (Order order in customer.Orders)
{
    // some operation on order
}

Code Listing 3: An example of a scenario which would trigger the n+1 problem.
object is retrieved with its own query rather than multiple rows being returned from a
single query. Accessing each entity individually is less efficient than accessing all of them
at once, but the object relational mapper acts in this way because it does not know
whether the code is interested only in the first entity or in the entire collection. As the
collection might be prohibitively large, a decision is made to retrieve exactly what the
code requests and not assume that the collection is being iterated over fully. Programmers
need to explicitly instruct the object relational mapper to load the collection, but n+1 is
an extremely subtle bug because the code does work, it just works slowly. Often it is not
noticed in testing because of the limited size of the test database. Once in production,
with huge quantities of data, the issue is enough to bring a database to its knees.
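The arithmetic behind the name can be made concrete with a toy query counter. This is not a real ORM, only an illustration: lazily loading each of a customer's n orders costs one query apiece (n+1 in total once the customer itself is loaded), where eager loading of the collection costs a single query.

```csharp
using System.Collections.Generic;

// A toy "mapper" that counts the queries it would issue, illustrating the
// n+1 problem described above.
public class FakeMapper
{
    public int QueriesIssued;

    // Lazy style: one round trip per entity as it is touched.
    public List<string> LoadOrdersLazily(List<int> orderIds)
    {
        var orders = new List<string>();
        foreach (var id in orderIds)
        {
            QueriesIssued++;               // a separate query per order
            orders.Add("order " + id);
        }
        return orders;
    }

    // Eager style: one round trip returning all rows at once.
    public List<string> LoadOrdersEagerly(List<int> orderIds)
    {
        QueriesIssued++;                   // a single batched query
        return orderIds.ConvertAll(id => "order " + id);
    }
}
```

Both methods return the same data; only the number of round trips differs, which is exactly why the bug is invisible against a small test database.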
SCALING DATABASES
As the quantity of data which the application needs to access grows, it becomes necessary
to scale the database infrastructure. Unfortunately it is very difficult to scale table-
based databases because of the nature of the queries run against them. Almost every
meaningful data retrieval action against a relational database requires that tables be
crossed against one another. In our customer and customer order example above, if
we wished to list the name of a customer and a list of their orders we would need
to perform a cross between the customers and customerOrders tables. This operation
usually consists of matching records in one table and then augmenting and crossing them
against another table. If the two tables are stored on a single computer the operation
is not that difficult, but as soon as you attempt to cross the tables against data on
another computer the limitations of network speed become apparent. One cannot
transfer large quantities of data over a comparatively slow network in this fashion. Despite
the growth in storage media, the amount of data generated is growing far faster. Data growth
is proving to be somewhere on the order of 60% a year[16]. The result is that querying
this quantity of data cannot be performed on a single machine and must be distributed
over a large number of machines. This is known as scaling out, as opposed to simply
buying a faster computer, known as scaling up. The best option for scaling table based
databases is to distribute the data in a logical fashion. One popular technique is to make
use of sharding. This technique partitions the database in a horizontal fashion, splitting
tables into groups of rows each of which is distributed to a different server[29]. Specialized
techniques and tools must be used to reassemble the data after it has been split across
many instances. Care must also be taken to ensure that the partitioning scheme is both
efficient and easy to calculate such that the destination of an inserted row can be found
efficiently.
Sharding is a technique designed to help scale out databases which were never originally
designed for it. It involves partitioning the data within a table across multiple nodes.
For instance one scheme might be to keep all the customers whose ID is even on node A
and all those whose ID is odd on node B. It is then apparent, given an ID, which server
should be queried. Of course, if the ID of the customer is not known in the query then both
databases need to be queried, and if an additional table needs to be referenced then there
are significant data transfer requirements. If a query calls for a list of all customers
in Calgary with more than 12 orders then either the full list of customers in Calgary
must be calculated and aggregated somewhere or the list of customer IDs with more than
12 orders must be distributed to each of the nodes containing customer data. In either
instance the data sent back and forth between nodes is potentially large.
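The even/odd scheme above amounts to a routing function that any client can compute locally; a sketch, with hypothetical node names:

```csharp
// A sketch of the partitioning scheme described above: the shard holding a
// customer is a pure function of the customer's ID, so an insert or an
// ID-based lookup touches exactly one node.
public static class ShardRouter
{
    public static string NodeFor(int customerId)
    {
        // even IDs live on node A, odd IDs on node B
        return customerId % 2 == 0 ? "node A" : "node B";
    }
}
```

Queries that do not include the ID, such as the Calgary example, get no help from this function and must fan out to every node.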
Often a better solution is to make use of the new generation of NOSQL databases
which are constructed with a goal of allowing for easy scalability across a large number of
nodes. Distributed databases usually provide a restricted set of query tools but are always
limited by the CAP theorem. Established by Brewer[5] and later proved by Gilbert and
Lynch[13], the CAP theorem states that a distributed computer system can only ever
fulfill two of
• Consistency
• Availability
• Partition tolerance
at any one time. Traditional methods for scaling table based databases, such as sharding,
ensure consistency and availability but require that each node in the database be
reachable at all times. As soon as the network is partitioned the entire system fails.
Many of the NoSQL databases are constructed in a way that does not allow for
crossing tables. For instance CouchDB relies on map-reduce functions to filter datasets.
These functions can be run on many nodes at once and the results aggregated for return.
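The shape of such a query can be sketched in C# (CouchDB itself expresses map and reduce functions in JavaScript; this shows only the pattern): each node runs the map step over its own slice of the data, and the small partial results, rather than the rows themselves, are shipped back and reduced.

```csharp
using System.Collections.Generic;
using System.Linq;

// A sketch of the map-reduce pattern: map runs independently on each node's
// rows; reduce combines the per-node partial results. Only the partial
// counts cross the network, never the rows themselves.
public static class OrderCount
{
    // Map: count the orders on one node belonging to the given customer.
    public static int Map(IEnumerable<int> customerIdsOnNode, int customerId)
    {
        return customerIdsOnNode.Count(id => id == customerId);
    }

    // Reduce: sum the partial counts returned by each node.
    public static int Reduce(IEnumerable<int> partialCounts)
    {
        return partialCounts.Sum();
    }
}
```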
Figure 3: The CAP theorem from http://guide.couchdb.org/draft/consistency.html
As has been observed[21], availability is highly desirable for any sort of online system
so the question is really reduced to "Is it more desirable to have consistency or partition
tolerance?". NoSQL databases usually answer that partition tolerance is preferable.
Customer customer = session.Load<Customer>(id);
if (customer.Country == "Canada")
{
    foreach (Order order in customer.Orders)
    {
        order.TaxRate = 0.05;
        session.Update(order);
    }
}

Code Listing 4: Working with a set of entities. This code uses the conventions from the NHibernate ORM but should be familiar to anybody who has used Hibernate.
Often this results in the use of eventual consistency. An update to a record may not
be visible right away but will eventually be visible to subsequent queries. This sort of
database is useful for information which does not have real time requirements. However,
even if you do have real time requirements there are some significant issues related to the
loading and use of data.
CONSISTENCY IS A LIE
We have always been told that transactions and database consistency levels will save us
from getting inconsistent data from a traditional relational database; after all, most of the
world's computer systems are based on these systems. Unfortunately that is not really
the case. In order to see the issues let's look at a couple of examples which initially seem
to be okay but turn out to be problematic.
Object relational mappers often offer a facility to perform lazy loading of database
objects. Thus you might load a customer entity and then explore the collection of orders
related to that customer; as you explore the collection the ORM will load the entities.
In code listing 4 a customer is loaded and then their orders collection is lazily loaded.
Because these events occur separately it is possible that another process has updated the
customer entity between it being loaded and the collection of orders being loaded. The
only isolation level which would prevent this is serializable, the highest isolation level[27].
This is certainly not the default level for any database. It is likely that code of this
sort exists in many systems which do not make use of serializable isolation. I know that
I have written systems which would fail in this situation. But if we use serializable isolation
then at least we can be assured that all the records for that customer have been updated,
right? Unfortunately not: it is entirely possible that a new record has been added for
the customer after we've populated our collection. The only way to prevent such an
addition from occurring is to lock the entire table any time a transaction runs. Because
it is impossible to know what a transaction is going to do before running it (see the halting
problem), every single transaction would need to lock every table it uses for the duration
of the transaction.
It is the conjecture of DDD expert Udi Dahan that race conditions do not actually
exist[9]. His argument is that very slight differences in the timing of transactions should
not have an impact on business logic for the vast majority of businesses. He gives the
example of having the requirements:
1. If the order was already shipped, don't let the user cancel the order.
2. If the order was already canceled, don't let the user ship the order.
If two users issue the ship and cancel commands at almost the same time, what should
happen? The solution comes from the observation that the refund to the customer does
not have to be issued at once. We can issue a command to refund the customer once we
have assurances that the order can actually be canceled.
Computer systems in businesses are put in place to replace manual processes. Prior
to the advent of computers a memo would have been issued by the order department and
sent to the shipping department instructing them to cancel the order. A follow up memo
would be sent by the shipping department to the billing department informing them that
either the order had already shipped or that they had canceled the shipment. The near
instantaneous nature of computers and networks has led to the false assumption that
millisecond timings are important to the business. This is simply not the case3, for
3There is a business where this sort of timing is actually important and that is in stock trading. The requirements there are unlike those of traditional systems and are not really applicable to this race condition discussion.
the most part these conflicts can be solved by using compensating logic. For instance if
the customer cancels the order after it has shipped then the compensating action might
be to e-mail the customer to let them know that their order shipped anyway and that they
can ship it back for a refund.
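The compensating approach can be sketched as follows. Rather than locking to prevent the ship/cancel race, a late cancel is accepted and answered with a compensating action (here, queuing the "ship it back for a refund" e-mail). The names are illustrative, not a prescribed design.

```csharp
using System.Collections.Generic;

// A sketch of compensating logic in place of locking: commands are never
// rejected for losing a race; a cancel that arrives after shipment simply
// triggers a compensating action instead.
public class OrderProcess
{
    public bool Shipped;
    public bool Canceled;
    public List<string> Outbox = new List<string>();

    public void Ship()
    {
        if (!Canceled) Shipped = true;
    }

    public void Cancel()
    {
        if (Shipped)
            // compensating action: notify rather than fail
            Outbox.Add("Your order has already shipped; ship it back for a refund.");
        else
            Canceled = true;
    }
}
```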
Even the most common example of using a transaction to prevent race conditions,
withdrawing money from a bank account, can be restated such that it doesn't rely on
concurrency. Normally the argument is that the bank account balance must be locked
before the withdrawal to ensure no competing process withdraws all the money and
leaves the bank with a negative balance it must cover. In practice, however, it is rare
that this situation arises, and when it does the bank simply allows the withdrawal,
places the account in the negative and bills the customer some outrageous amount in
overdraft charges. In this situation the bank actually makes money from not observing
transactional consistency. Certainly there are cases where the bank will never recover its
money but these are so few that they are more than paid for by the overdraft fees.
CQRS Data Access
Command Query Responsibility Segregation provides an alternative way of looking
at data access. The vast majority of data access operations are reads. Anecdotal evidence
suggests that 99% of data access operations are reads[34]; however, the database models
and systems commonly in use tend to optimize for write operations. The goal of database
normalization is to reduce the number of places in a database where a piece of information
exists. Traditionally it is the goal of developers to keep their database in one of the normal
forms, ideally BCNF. However this optimization serves to make writing to the database
easier. Each unique piece of information should appear in exactly one location so that
updates do not create inconsistencies. If 99% of database operations are reads then why
are we optimizing for the operations which take up only 1%?
CQRS breaks up the entire data model into a domain model and read views. User
actions result in a command being issued, which triggers an update to the domain model
facilitated by a command handler. The command handler may then publish a message
informing a number of listeners on the read model side that an update has been performed
and that they should update their state.
This is truly the crux of CQRS - separate database structures for reading
and writing.
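That separation can be sketched end to end: a command handler updates the write side and publishes an event, and a read-side listener keeps its own denormalized view in step. All of the type names here are illustrative assumptions, not a prescribed CQRS API.

```csharp
using System;
using System.Collections.Generic;

// A minimal sketch of the flow described above: the handler mutates the
// domain (write) model, then publishes an event; each subscribed read model
// updates its own view from the event.
public class CustomerRenamed
{
    public int CustomerId;
    public string NewName;
}

public class RenameCustomerHandler
{
    public Dictionary<int, string> DomainModel = new Dictionary<int, string>();
    private readonly List<Action<CustomerRenamed>> listeners =
        new List<Action<CustomerRenamed>>();

    public void Subscribe(Action<CustomerRenamed> listener)
    {
        listeners.Add(listener);
    }

    public void Handle(int customerId, string newName)
    {
        DomainModel[customerId] = newName;   // update the write side
        var evt = new CustomerRenamed { CustomerId = customerId, NewName = newName };
        foreach (var listener in listeners)  // publish to the read side
            listener(evt);
    }
}

public class CustomerListView
{
    public Dictionary<int, string> View = new Dictionary<int, string>();

    public void On(CustomerRenamed e)
    {
        View[e.CustomerId] = e.NewName;      // read-optimized copy
    }
}
```

In a production system the publish step would go over a durable message queue, which is what makes the read side eventually consistent rather than immediately consistent.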
The issue with a normalized database is that data which users would like to see on
the screen at the same time is not necessarily proximal in the data model. An example
of a traditional data model for a commerce system is shown in figure 4. The
database in this model is normalized and is in BCNF. It has the advantage that the
data is consistent and there is no duplication of data - from a disk usage and ease of
update perspective it is highly efficient. However, using this model to display the user's
order complete with item names, item prices accurate for the order date and the latest
record from the shipping history table requires crossing all the tables. This is not only
a complicated query for developers to write but is also expensive for the database to
process. The problem is, of course, compounded by the possibility that the database
could be distributed across a number of shards. In order to make queries to the database
more efficient one might consider denormalizing the database.
Figure 4: Order database model
Denormalization is a database design which aims to optimize the read queries against
a database by shifting fields and combining tables such that data is closer to the structure
required by the query. It is an intentional move away from strong normal forms towards
weaker ones. The cost of denormalization is that it makes updates to the database more
difficult to perform as data may be duplicated in multiple tables. The database from
figure 4 has undergone denormalization in figure 5. Using this model it is clearly much
easier to run the same order information query. Now only a handful of tables need to
be joined and this very common query can be performed far more efficiently. However
the cost is clear: what would happen if the name of an item needed to be updated?
Now it would be necessary to update not only the name in the Item table but also in
the denormalized OrderItem table. While the cost of this operation is not staggeringly
high from a computational stance there is a lot of developer overhead. It is likely that
the denormalization in figure 5 is simply one of many denormalizations which would be
required to feed the various screens and reports in the commerce site. Developers must
now remember to update the item name in all the denormalized tables associated with the
order. One possible solution is to make use of database triggers to push updates to the
various child tables. This has a couple of disadvantages: first, it moves a lot of the business
logic into the database which, as we've already discussed, is difficult to scale; second, it
makes updates take longer as the user now needs to wait for all the triggers to complete
as part of their transaction. Another oft-cited argument against denormalization is that
the data duplication may have significant data storage costs.
Figure 5: Order database model after denormalization
CQRS
Origins in CQS
CQRS, as a pattern, is closely related to the Command Query Separation pattern which
was first proposed by Meyer[22]. Meyer suggests that "Functions should not produce
abstract side effects" and that only procedures should be permitted to make changes to
the state of the object4. The metaphor behind this pattern is that of a machine which
has two kinds of buttons, command buttons and query buttons. Only command buttons
may be used to change the state of the object and this state is not directly observable.
The only way to extract information is by pressing one of the query buttons which will
change the output displayed on the indicator lights. No matter how many query buttons
are pressed and in what order the internal state of the machine will not change. Queries
are side-effect free.
Applying this pattern to a common programming problem may give some idea of how
it can be used. Consider a queue data structure which has the normal queue operations
such as push, pop and peek. How can we update this object to follow the principles of
CQS?
4Many of today's most popular programming languages do not distinguish between a function and a procedure at the syntactic level. This means that there is a whole generation of developers who have no idea that there is even a difference. Procedures do not return anything; in languages with C-style syntaxes this would be represented as a method with a return type of void. Functions do return a value.
Figure 6: A standard queue data structure
The push operation is a procedure which makes changes to the internal state of the
object. This already complies with the CQS pattern so no change is needed. Peek returns
an object so it is a function. Peek does not change the internal state of the object so it too
complies with CQS. Finally the pop operation returns an object and alters the state of
the queue. This is a violation of CQS. We can alter the queue so that the pop operation
does not return the object from the queue but simply removes it. Now when interacting
with the queue users will need to peek and pop as two separate operations, however
they can peek as much as they like and be assured that there will be no unexpected
consequences.
Figure 7: A queue data structure altered to function using the CQS pattern
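The altered queue can be sketched as follows (shown here in Java; the class name and backing structure are illustrative, not part of Meyer's formulation):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// A queue reworked to follow CQS: pop is a command (returns nothing),
// peek is a query (observes state without changing it).
class CqsQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();

    // Command: changes state, returns nothing.
    public void push(T item) {
        items.addLast(item);
    }

    // Command: removes the head but does not return it.
    public void pop() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("queue is empty");
        }
        items.removeFirst();
    }

    // Query: side-effect free, may be called any number of times.
    public T peek() {
        return items.peekFirst();
    }
}
```

Callers now peek to observe the head and then issue a separate pop command to remove it; repeated peeks are guaranteed to leave the queue unchanged.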
Meyer feels that adhering to a strict separation between commands and queries in
a large system is key to success. Modules can safely call into other modules knowing
that any queries they run will be without side effect. He goes on to suggest that a class
which implements CQS should implement at least two interfaces, one which describes
the functions and one which describes the procedures. If it is undesirable that other
modules be able to update the state of the system then all that need be exported is
the interface which defines the functions. This is a theory which is in strong agreement
with the SOLID principles of software design[18] specifically the Interface Segregation
Principle which states that "Clients should not be forced to depend upon interfaces that
they do not use". Interfaces should be as narrow as possible to reduce the risk from
changing the underlying class. If the method to change was not one of the ones presented
in an interface to another component then that component need not worry about changes
to the underlying object.
CQRS takes the separation principles applied in CQS at the class level and expands
them to work at the system level. In the same way that CQS restricts data updates to
procedures CQRS restricts business model updates to commands. Queries against the
read model are equivalent to functions and must be side-effect free.
CQRS EXPLAINED
Thus far we have satisfied ourselves with an informal description of how CQRS works
and some of the implications of the pattern; let us formalize the pattern and explain it
fully. As we know the crux of CQRS is the separation of database read and write models
such that each one can be optimized for its specific purpose.
Figure 8: CQRS pattern diagram
Let us use the example of managing the information about a customer as our example
domain. In DDD parlance this would be the customer relationship management bounded
context5. Within this bounded context we wish to perform actions against the customer
aggregate which contains the fields
• ID - a UUID for identifying the aggregate
• Customer Name6
• Street Address
• City
• Postal Code/Zip Code
• Province/State
• Country
• EMail Address
• Phone Number
• Hashed Password
• Has Gold Status
As a user the first difference you would notice is that the user interface is not organized
around making ad hoc updates to the customer entity. In many systems the edit customer
screen would have a field for each one of the fields in the customer object and a single
"Save" or "Update" button at the bottom of the page. In a CQRS system a task-based UI
5For more information about bounded contexts and DDD in general please see the section "Relationship with Domain Driven Design". Because of the close relationship between CQRS and DDD I will be using a lot of DDD terminology; if you are unfamiliar with DDD it would be useful to read that section first.
6Unrelated to this essay, but an interesting diversion, is the paper found at http://www.w3.org/International/questions/qa-personal-names which talks about the complexities of how people are named around the world. The general advice is that there is no standard form for first and last names and that systems should attempt to just track a name rather than a first and last name.
is used. The goal of a user interface of this sort is to capture the intent of the user rather
than simply the after effects. The update view is gone and has been replaced by a series
of business tasks related to the customer. For instance if the customer had moved and
changed their address there would be a function for ChangeUserAddress which would
provide fields for changing the fields of the aggregate related to the address, likely street
address, city, postal code, province and country. You’ll notice that phone number was
not included in that list of fields. This was an intentional exclusion as there is a loose
coupling between address and phone number: moving does not always result in a change
to the phone number. Alterations to the phone number are a separate business process
and would be handled through the ChangePhoneNumber use case and command. The
user interface will, through some layers of abstraction, publish a ChangeUserAddress
command.
The structure of the command will be very simple: it will contain the Aggregate ID
and the fields to set. This command will be sent to a command handler which will load
the aggregate from the domain model and update the required fields. Because CQRS is
designed to work in collaborative domains, there could be multiple updates to the
customer aggregate which arrive at almost the same time. Thus the command might also
contain a version number of the object which was displayed to the user; if this version
number does not match the version in the database then there have been updates between
the user being shown the aggregate and the arrival of the command.
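A command of this shape might be sketched as follows (in Java; the class and field names are illustrative):

```java
import java.util.UUID;

// Illustrative sketch of a task-based command: the aggregate identity,
// the version the user saw, and only the fields this use case may change.
final class ChangeUserAddress {
    final UUID aggregateId;
    final long expectedVersion;   // version of the aggregate shown to the user
    final String streetAddress;
    final String city;
    final String postalCode;
    final String province;
    final String country;

    ChangeUserAddress(UUID aggregateId, long expectedVersion, String streetAddress,
                      String city, String postalCode, String province, String country) {
        this.aggregateId = aggregateId;
        this.expectedVersion = expectedVersion;
        this.streetAddress = streetAddress;
        this.city = city;
        this.postalCode = postalCode;
        this.province = province;
        this.country = country;
    }
}
```

Note that the command deliberately carries no phone number field; that change belongs to a separate use case.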
There are a couple of strategies for dealing with such race conditions. The
first is to minimize the actions each command performs. In much the same way as the
Interface Segregation Principle dictates that as little should be made public as possible
the commands should be as granular as possible. A ChangeUserAddress command would
not at all conflict with a ChangePhoneNumber command as they update different fields.
If there is actually a conflict then it should be up to the business to decide what should
be done. If the conflict is rare then it might be flagged for an administrator to review if it
class MessageRouter
{
    private List<KeyValuePair<String, IMessageHandler>> handlers =
        new List<KeyValuePair<String, IMessageHandler>>();

    public void RegisterHandler(string messageTypeToHandle, IMessageHandler handler)
    {
        handlers.Add(new KeyValuePair<String, IMessageHandler>(messageTypeToHandle, handler));
    }

    public void RouteMessage(IMessage message)
    {
        string messageName = message.GetType().Name;
        handlers.Where(x => x.Key == messageName)
                .ToList()
                .ForEach(x => x.Value.Handle(message));
    }
}

Code Listing 5: A message router which allows for registering message handlers and routing messages to them.
is more common, then a more automated solution may be required. It is not the place of
the software to dictate what the business should do, the behaviour of the software should
be dictated by business needs.
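A minimal sketch of the optimistic version check described above might look like this inside a command handler (in Java; all names are hypothetical and the aggregate is reduced to two fields):

```java
// Hypothetical sketch of an optimistic-concurrency check: the command
// carries the version the user saw, and the handler rejects the update
// if the stored aggregate has moved on in the meantime.
class ConcurrencyException extends RuntimeException {
    ConcurrencyException(String message) { super(message); }
}

class CustomerAggregate {
    long version;
    String address;
}

class ChangeAddressHandler {
    void handle(CustomerAggregate stored, long expectedVersion, String newAddress) {
        if (stored.version != expectedVersion) {
            // Someone else updated the aggregate between the user reading
            // it and this command arriving; let the business decide what to do.
            throw new ConcurrencyException(
                "expected version " + expectedVersion + " but found " + stored.version);
        }
        stored.address = newAddress;
        stored.version++;
    }
}
```

Whether a mismatch is rejected, flagged for review, or merged automatically is a business decision, as discussed above.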
Once the command handler has completed its processing it may raise an event to
inform interested parties that something has changed. In the case of the ChangePho-
neNumber handler an event such as PhoneNumberChanged would be published. Views
on the read model side which contain the customer’s phone number would subscribe to
this event and update their view of the world when it was received. This is often the
most difficult part of CQRS for people to understand, as there is often a concern about
what happens if the event is missed. If the event is missed it will throw the view model
into an inconsistent state, which is, for the most part, highly undesirable.
In pure CQRS the message delivery is all performed inside a single process. This can
often be achieved by registering command and event handlers in some sort of a container
and iterating over the container contents each time a message is sent. A very basic
implementation can be seen in listing 5. Because these handlers are in process they are
as reliable as calling any other piece of in process code.
Message Queues
As the scale of a CQRS solution grows it may become necessary to scale out to multiple
machines or, at the very least, out of a single process. Usually this is done by distributing
the message handlers to multiple machines and decentralizing the read model database.
In order to achieve this distribution one can make use of one of the many reliable mes-
saging technologies which have existed for many years. Message queues are the simplest
solutions: they provide a queue of messages which can be read from remote machines and
often also provide some support for transactions so that a process will read the message,
process it and only then will it be removed from the queue. Because queues of this nature
are so simple a Google search for Message Queue will return hundreds of products which
deal with ensuring that messages, once sent, will arrive. What cannot be guaranteed is
that the messages will arrive rapidly.
If the computer responsible for updating the CustomerOrder view model is broken
then it may not receive the update to the customer’s shipping address in a timely fashion.
Proponents of traditional architectures would point at this and claim it to be a major
flaw; however, the message is safe: it is simply sitting in a message queue somewhere. It
will eventually be delivered. In a traditional system if the database server is down an
error message is displayed for the user or the update is simply swallowed. These sorts
of failures often go unreported as it is very difficult to log events which did not happen.
With CQRS you can be assured that, barring a calamitous hardware failure, the message
will arrive at its destination. There are even some techniques which can be used to
alleviate the calamitous loss of a server; see the section on event sourcing.
Although CQRS is a technology agnostic pattern much of the development of it has
come out of the Windows .net community. It is therefore unsurprising that one of the
most popular message queuing technologies for use in CQRS projects is Microsoft’s own
Microsoft Message Queue (MSMQ). This system comes built into Windows operating
systems and has done for many years. It is one of the few queues which has support for
the Microsoft Distributed Transaction Coordinator (MSDTC). Although it is not necessary
to support transactions it can be useful especially in long running processes which are
usually known as sagas.
The Advanced Message Queuing Protocol (AMQP) is an open protocol for message queu-
ing systems which was developed out of banking giant JPMorgan Chase. This standard
just reached version 1.0 this year although development began some years ago. AMQP
provides standards for message queues as well as message brokering which is unnecessary
for CQRS implementations. One of the goals for which CQRS strives is to maintain
simplicity in all of its aspects. The full AMQP is far more powerful than is required but
utilizing a subset of its capabilities allows for the use of one of the many implementations
of AMQP. These implementations include RabbitMQ, StormMQ and Apache Qpid7.
For applications with cloud based requirements there are cloud based messaging so-
lutions. Windows Azure provides Windows Azure Queues (WAQs) and the Azure Service
Bus, which are lightweight but still functional queues. WAQs support message queuing
and expiry but not distributed transactions. Based on the direction Microsoft is taking,
placing cloud computing at the top of its priority list, it seems likely that WAQs will be
the focus of a great deal of development time. Amazon also has its Simple Queue
Service (Amazon SQS) which has many of the same features as a WAQ. Windows Service
Bus was developed with the help of Udi Dahan and is especially designed to allow for
use in CQRS-like solutions.
Event Sourcing
It should be clear by now that CQRS is heavily dependent on messages. The domain model
in CQRS is built up from a series of commands and the view model is constructed from
a series of event messages. As the system grows and evolves the views must also evolve.
Consider a situation where a page on a website showed the price of a stock and the stock
7These queues have much niftier names than MSMQ and must therefore be superior as per Web 2.0 nomenclature rules.
symbol. The requirements change such that the view should also contain the name of
the company. How would one go about populating the view with the new information?
Usually it would be done by reading the information from some source and updating the
view table. In order to ensure that the page stays up to date a change will also have
to be made to the view population message handler. An alternative is to maintain a
collection of all the events which have occurred since the creation of the system and to
simply replay them when changes to the view model are needed. Ensuring that there is
a single source of all the events for the system is known as event sourcing[11].
Event sourcing is much more powerful than simply storing the current state of the
system because we gain a historical perspective of how we reached the current state.
Consider the insight we might gain from building an event sourced shopping cart on a
website. In a traditional system the shopping cart might store
• ItemID
• Quantity
This information is sufficient to allow us to process a checkout for the customer.
However if we wish to perform analysis of how people are shopping then the shopping
cart needs to gain some more fields. Perhaps we want to know what shoppers end up
buying when they have first added product X to the cart. For this we will need to also
store the time at which each item was added to the cart. The model now grows to
• ItemID
• Quantity
• AddedTime
Now consider the case where the shopper has removed an item and replaced it with
another. How would we store this in the shopping cart model?
• ItemID
• Quantity
• AddedTime
• RemovedTime
What if the customer changes their mind and adds the item back?
• ItemID
• Quantity
• AddedTime
• RemovedTime
• ReaddedTime
As you can see the model rapidly loses any elegance it had before. We must spend
a terrible amount of time designing this cart to ensure that we have covered all the
situations business might wish to analyze in future. In order to deal with the ever
changing requirements of business we would have to be continually changing the cart
and any new analysis would be unable to run on shopping carts created before the
requirement.
If we instead stored the stream of events which created the shopping cart then all the
analysis above becomes trivial. If there is a need to find out which product was added
after product X we can simply examine the order of the events. Even if an item is added
and removed many times we store each event and can create an accurate time line of the
state of the cart. With event sourcing we not only have the same data we would normally
have with a traditional model, we also have all the metadata which describes the story of
how the data came into being.
Event sourcing and event storage have a number of other advantages. The foremost of
these is to keep track of the history of an object for audit purposes. There are countless
systems which need to know how an object came into being. This sort of architecture is
often built at the database level using shadow tables. Shadow tables are log tables which
are inserted into by a trigger on the parent table. Their structure mirrors the parent table
except that it adds a user, a date and an action. When a row in the table is altered the
original row is inserted into the shadow table by the trigger. Equally a trigger on delete
keeps track of when and by whom the row was deleted. With an event store tracking the
changes to the object is easy as you can quickly see exactly which fields were changed
and, if the system supports proper message names, the reasons for the change. The
reason is a key advantage over the shadow table approach because you can distinguish a
change to the address because of a move from one because of a typo. Messages can, if properly
constructed, convey intent. One can also perform analysis of the events which have been
fired for a particular root which can give the power to extract meaningful metrics about
how the system is being used.
The ability to replay events is also very useful for debugging the application workflow.
Events can be restored from the event store and run against a handler to build up a
snapshot of the system state at any point in time. The resulting state can be examined
and poked to isolate errors. Usually it is not advisable to change the history of a system
by altering the commands fed into it as that alters history and corrupts the audit trail.
However with a complete history of the commands one could change the way in which a
command is handled and rebuild the system state from the commands. In most cases we
treat the event store in much the same way that an accountant would treat a ledger. Line
items cannot simply be erased from the ledger if a mistake is made; instead a correcting
entry is made to adjust the ledger back into balance. In CQRS correcting actions are
issued to bring the system back into line.
Replaying events to restore a past snapshot of the system is not only useful in debug-
ging and auditing, it can also have applications in every day activities. Consider the case
of requesting a copy of an invoice from six months ago. Since that invoice was issued
some of the data in the system may have changed, perhaps the name of the company has
changed. We don’t want the new data showing up on the invoice but this is a common
occurrence in many systems which are not time aware. It would not do for the invoice to
be issued to a company which did not exist six months ago.
The key to event sourcing is to ensure that all the state changes in a system occur
through events. Every event raised by the command handlers should be written
to some sort of a persistent store for later use. The storage of these events is very
simple as one really needs only to store
• The date and time of the event
• A unique identifier for the event (UUID style ID is best for this application)
• The name or ID of the bounded context to which this command is tied.
• A serialized version of the event
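A stored event record holding these four pieces of information might be sketched as follows (in Java; the field names are illustrative):

```java
import java.time.Instant;
import java.util.UUID;

// Minimal sketch of a persisted event record carrying the four pieces
// of information listed above; names are illustrative.
final class StoredEvent {
    final Instant occurredAt;       // date and time of the event
    final UUID eventId;             // unique identifier for the event
    final String boundedContext;    // name of the owning bounded context
    final String payload;           // serialized event body (e.g. JSON)

    StoredEvent(Instant occurredAt, UUID eventId, String boundedContext, String payload) {
        this.occurredAt = occurredAt;
        this.eventId = eventId;
        this.boundedContext = boundedContext;
        this.payload = payload;
    }
}
```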
A date is useful for purposes of ordering and also to be able to restrict the set of
events being replayed to a narrower window than the entire lifetime of the system.
The unique identifier is useful for keeping track of which events have been run and
which might still need to be run. It is also very useful to give each item in the system
a unique name. Database administrators are not, generally, fans of UUIDs or GUIDs
as they are more commonly known in the .net world. UUIDs tend to play havoc with
clustered indexes as they are completely random and inserting them into a clustered index
causes the database tree to have to be rebalanced frequently[6]. This is a computationally
complex operation, especially on a database such as an event store which experiences a
large number of insertions. There are two good solutions to this issue: the first is to use a
non-clustered index and the second is to make use of Comb-GUIDs. These are GUIDs
which are generated using a sequential and a non-sequential portion which allows them
to be efficiently clustered and, at the same time, retain their global uniqueness.
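A rough illustration of the idea, not any particular library's algorithm: place the current time in the most significant bits so that successive identifiers sort in roughly insertion order, leaving the remaining bits random for uniqueness.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Rough sketch of a comb-style GUID: the high 48 bits come from the
// current time in milliseconds so that successive identifiers cluster
// well in an index, while the remaining 80 bits stay random.
final class CombGuid {
    private static final SecureRandom RANDOM = new SecureRandom();

    static UUID next() {
        long millis = System.currentTimeMillis();
        long mostSig = (millis << 16) | (RANDOM.nextLong() & 0xFFFFL);
        long leastSig = RANDOM.nextLong();
        return new UUID(mostSig, leastSig);
    }
}
```

Identifiers generated this way remain globally unique in practice while inserting near the end of a clustered index rather than at random positions.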
It is often useful to have some record of the bounded context to which the command
belongs. While it is unlikely that the name of a command would be shared between BCs it
does allow for faster processing by filtering the commands and avoids the possibility that
a developer might rely on replaying the commands and events from a different bounded
context in order to obtain some sort of additional information. There are methods for
inter-bounded-context communication, usually in the form of sagas or of passing the required
information along with the command.
Finally the serialized version of the event is what is reloaded and fired through the
event handlers to rebuild a view model. This same process can be used if a new view is
needed or even if the views are lost. In fact many proponents of event sourcing suggest
that it is not necessary to store the view models on non-volatile storage at all. They
claim that as the views can be rebuilt quickly there is no point in storing them on disk
or backing them up. Keeping the data in memory is becoming very cheap8 and accessing
in memory information is at least an order of magnitude faster than accessing it from
disk[17]. For high performance applications event sourcing and an in memory view model
is ideal.
The serialized version of the event can be represented in a number of ways. The easiest
is to simply use the built in object serializer which many modern languages now provide.
Java and .net languages provide binary serializers, as do Ruby, Python, and pretty much
every other major language[1, 24, 2, 3]. Unfortunately the format for each of these is
different, which is somewhat limiting if the messages need to be replayed in systems which
are written in different languages9. Binary serializers often have issues with missing or
8Amazon have released a product called ElastiCache (http://aws.amazon.com/elasticache/) which is an in-memory key-value store. For about $1700 a month it is possible to rent 68GB of extremely fast memory which would be perfect even for fairly sizable databases.
9The use of multiple languages is starting to become the norm in large systems. Each language is used for the specific application where it is best suited. This is commonly known as polyglot programming (http://memeagora.blogspot.com/2006/12/polyglot-programming.html).
added fields. Unless the definition of the object being deserialized is exactly the same as
the current version in the application the deserialization may fail. A better approach is to
use one of the many text-based representations of object data such as JavaScript Object
Notation (JSON) or YAML, or a schema-based format such as Protocol Buffers. Protocol
Buffers are particularly resilient to changing objects as they use a numeric identifier for
each field, which will accommodate a spelling correction from a field called Csutomer to
Customer. All of these alternatives have wide cross-platform acceptance and libraries for
most languages exist.
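The resilience of name-based formats can be illustrated with a reader that looks fields up by key and defaults anything missing (a Java sketch using a plain map in place of a real JSON parser; the field names are invented):

```java
import java.util.Map;

// Sketch of why key-based payloads survive schema change: the reader
// looks fields up by name and supplies a default for anything absent,
// so events serialized before a field existed still deserialize cleanly.
class CustomerEvent {
    final String name;
    final String goldStatus;

    CustomerEvent(Map<String, String> payload) {
        this.name = payload.getOrDefault("Customer", "");
        // Field added in a later version; older events simply lack it.
        this.goldStatus = payload.getOrDefault("HasGoldStatus", "false");
    }
}
```

A binary serializer that relies on field position would instead fail outright on such a payload.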
On the other side of CQRS the domain model is constructed from a series of com-
mands. Domain models can be quite complex and retrieving them from structured
databases can be slow for high performance systems. It is also difficult to continually
change the domain model in the database to deal with ever evolving domains. Fields may
need to be added, removed or altered as the system evolves. An interesting approach
to dealing with an evolving domain is to simply not store the domain entities at all.
That seems highly unusual after all the domain model is needed for validating data and
ensuring functions are permitted. Advantage can be taken of the fact that the only way
to alter the domain is through a command. The event store can be used to keep track of
all the commands used to build up a specific aggregate root. When this aggregate root is
needed the system can query the event store, retrieve the historical stream of commands
which built the model and replay them to create an in memory representation of the
object. This object is used for the lifetime of the operation and is then deallocated.
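A simplified sketch of this replay, using a shopping cart reduced to a single counter (in Java; the event and aggregate types are illustrative):

```java
import java.util.List;

// Simplified sketch of rebuilding an aggregate from its event stream:
// start from a blank object and apply each historical event in order.
class ShoppingCart {
    int itemCount = 0;

    void apply(CartEvent event) {
        switch (event.type) {
            case ITEM_ADDED:   itemCount++; break;
            case ITEM_REMOVED: itemCount--; break;
        }
    }

    static ShoppingCart replay(List<CartEvent> history) {
        ShoppingCart cart = new ShoppingCart();
        for (CartEvent event : history) {
            cart.apply(event);
        }
        return cart;
    }
}

class CartEvent {
    enum Type { ITEM_ADDED, ITEM_REMOVED }
    final Type type;
    CartEvent(Type type) { this.type = type; }
}
```

The rebuilt object lives only as long as the operation needs it; nothing but the events themselves is ever persisted.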
Using events to build up objects is not a new approach. It was used in the Smalltalk
language developed in the early 1970s. It was possible to examine the log of changes made
to an object and even replay them for debugging purposes[12]. The LMAX architecture
also makes use of in memory images, though they use them to avoid the cost of traveling
to the disk[32].
It may seem that the retrieval and replaying of events would be slow and complicated
but it is usually not at all that complex. The actual performance of replaying events in
this fashion will be examined during the research phase of this project. On older systems
or systems which have a large number of transactions on a single object it may become
time prohibitive to reconstruct the object from its history. In these rare cases the objects
can be checkpointed from time to time. This involves replaying the messages used to
construct the object then saving the resulting object to some data store. When it is to
be retrieved again the checkpoint is loaded and only messages which arrived after
the checkpoint was created are applied. This checkpointing can be performed again
and again as the system ages to ensure that restoring the domain object is quick.
As the system matures it is likely that the events and commands will also change
to match changing requirements. In order to handle this change it is often desirable to
version events so that correct handlers will receive the messages. It is not uncommon
to see commands with names like IssueReceipt20110101 to denote a version of the event
which was current on the first of January 2011. This will ensure that the handler with
the correct logic for this sort of event is fired.
Many of the advantages of event storage can be realized without having to resort
to using an event store. As Udi Dahan points out if a message queue is used a simple
audit queue will allow for the replaying of events[25]. The advantages of a fully fledged
event store are that it provides a single source of truth, is easily replayed with the same
mechanism as is already in use and allows for in memory images.
The Language of CQRS
There are two message types in CQRS, commands and events. In order to distinguish
the one from the other we use different tenses and moods. Commands are in the present
tense and use the imperative mood; examples are
• GrantCustomerGoldStatus
• ChangeCustomerAddressDueToMove
• ChangeCustomerAddressDueToTypo
• IncrementCustomerRank
Usually a command starts with a verb followed by a subject.
Events are notifications of things which have already occurred, so they appear in
the past tense:
• CustomerGrantedGoldStatus
• CustomerAddressChangedDueToMove
• CustomerAddressChangedDueToTypo
• CustomerRankIncremented
The names of the commands are crafted in a way that they suggest the intention
of the user so we know not only what changed but why. There is a difference between
changing a customer’s address due to their having moved and correcting a typo. In the
first case business processes to welcome the customer to their new home may be kicked
off, it would be embarrassing if the same processes were started for the correction of a
typo. Ideally the names of the commands should come from the business during the
domain analysis.
IDEMPOTENCE OF COMMANDS
One of the properties of a message based architecture is that messages may arrive out of
order. If a user creates an order and adds two items to it not only may the items added
to the order arrive at a service in the wrong order but the request to add an item to
the order may arrive before the message to create the order. In order to deal with this
a couple of things can be done. First, the out-of-order messages can be returned to the
end of the queue in the hopes that by the time they are processed again the required
messages will have arrived. A preferable alternative is to build a skeleton object which will have additional
fields filled in as information arrives. Consider the case of the order and two items added
above:
Figure 9: A series of commands arriving out of order.
The command handler starts by looking up the order in order to save the item to it.
However because the order has yet to be created it cannot find an order. Thus it creates
a place holder or skeleton order which is missing some of the fields. Next an additional
AddItemToOrder command arrives. Now the handler is able to find an order and
goes about adding the item to it. Finally the create order command arrives and, finding
the order already partially created, adds the additional information.
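The skeleton-order approach might be sketched as follows (in Java; the command handling is reduced to plain method calls and all names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of tolerating out-of-order commands by creating a
// skeleton order that is fleshed out as the missing information arrives.
class Order {
    final UUID id;
    String customerName;            // null until CreateOrder arrives
    final List<String> items = new ArrayList<>();

    Order(UUID id) { this.id = id; }
}

class OrderHandler {
    private final Map<UUID, Order> orders = new HashMap<>();

    // May arrive before the order itself has been created.
    void handleAddItem(UUID orderId, String item) {
        orders.computeIfAbsent(orderId, Order::new).items.add(item);
    }

    // Fills in the skeleton if items arrived first.
    void handleCreateOrder(UUID orderId, String customerName) {
        orders.computeIfAbsent(orderId, Order::new).customerName = customerName;
    }

    Order get(UUID orderId) { return orders.get(orderId); }
}
```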
It is also possible that a message may arrive multiple times. Most message queues
guarantee that a message will arrive but offer no assurance that it will not arrive
more than once. A situation where this might happen is easy to imagine. Suppose a
command is sent but the reply confirming that the message has arrived is not received
by the sender; this could be due to a network issue or a crash of a server. The sender has
no option but to send the message again. In these situations either the system can check
to see if the message has already been processed or ensure that message processing is
idempotent. Checking to see if a message has been processed is difficult if the history is
stored in a database which exhibits eventual consistency. Even in a traditional database
there is opportunity for a race condition such that the message will be processed again.
By ensuring that the processing of commands is idempotent these problems can be avoided.
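A simplified sketch of de-duplication by message identifier (in Java; a real system would persist the set of processed identifiers and contend with the race conditions noted above):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Simplified sketch of de-duplicating message processing: the handler
// records the identifiers it has already seen and ignores redeliveries,
// making the overall effect of handling a message idempotent.
class DeduplicatingHandler {
    private final Set<UUID> processed = new HashSet<>();
    int timesApplied = 0;

    void handle(UUID messageId) {
        if (!processed.add(messageId)) {
            return; // already processed; the redelivery is ignored
        }
        timesApplied++; // actual business logic would go here
    }
}
```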
CQRS VS. ACTIVERECORD (CRUD)
CRUD is the typical architecture of data driven applications. In comparison with CQRS
there are a number of places where each holds advantages over the other.
In terms of speed of development CRUD is almost always faster. Because CRUD
focuses on updating entire entities at a time there is no need to spend development
time defining the various use cases around the updates to an entity. Users can simply
update any field of the entity on a generic update screen. The similarity of updates
to different objects presents the opportunity for building very generic update screens.
Indeed, frameworks such as ASP.net MVC and Ruby on Rails provide scaffold tools which
can generate all the pages for performing updates to an entity. When coupled with
an object database such as MongoDB or RavenDB, development using CRUD can be
amazingly easy. With a properly developed business model, validations on such entries as
phone numbers and addresses are also much easier with CRUD based approaches. Some even
suggest that an entire application can be more or less generated from the basic business
entities[26].
One area in which CQRS is easier is constructing queries for complex reports. Especially
in the case of heavily normalized databases, constructing reporting queries can
involve joining a dozen or more tables. Building a denormalized view for the report
is typically far faster and easier. Many queries which are cumbersome in a relational
database become easy if they are constructed as views. For instance, queries which
require that row data be transformed into columns require that the developers be familiar
with the rather complicated PIVOT command10.
It is important to appreciate that the initial development of an application is only
part of the story. If the move away from waterfall to agile has proven anything, it is
that designs and needs change over time. The world is dynamic and applications need
to be dynamic as well. An example given by Young is that of a changed requirement for an
e-commerce website. The CEO decides that an important metric to capture is how many
times a customer removes an item from their cart before purchasing it. With a CRUD
approach the history of the cart is likely not captured, so the best that can be done is
to start capturing that information after the change has been requested. With
CQRS+ES, however, the cart is represented by a stream of events. From this stream it is trivial
to construct a denormalizer which can provide the required information. In this case the
lost data is not particularly important, but it is not difficult to think of a situation where
the lost data has value. Any time that code contains an update statement, data is lost.
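A denormalizer for the CEO's new metric can be written directly over the historical event stream. The event shapes below are hypothetical, and the sketch is in Python rather than the C# used elsewhere in this project:

```python
# A recorded cart history: because every change is an event, the
# removals are still present even though the final cart does not
# contain the removed item.
events = [
    {"type": "ItemAdded", "cart": 1, "item": "book"},
    {"type": "ItemRemoved", "cart": 1, "item": "book"},
    {"type": "ItemAdded", "cart": 1, "item": "pen"},
    {"type": "CheckedOut", "cart": 1},
]

def removals_before_purchase(stream):
    """Replay the stream and count removal events, retroactively
    answering a question nobody thought to ask at design time."""
    removed = 0
    for event in stream:
        if event["type"] == "ItemRemoved":
            removed += 1
    return removed
```

Running `removals_before_purchase(events)` over the stream above reports one removal, data a CRUD system would have overwritten and lost.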
Adapting to changing business requirements is easier with CQRS, but a real
advantage is the ease of adapting to changing technologies. The read model database is
likely to be the part of the application which experiences the most load. Most companies
do not yet feel comfortable with web-scale databases, but with an event store rebuilding
the database is easy, which means that moving to a new database technology is also easy.
Debugging a production problem in a CQRS system with an event log is remarkably
easy. While in a CRUD system the database must be carefully set up to mirror production,
a CQRS developer can simply replay the production event log and bring the system to
any point in time. The ability to pick and choose a point in time also allows for examining
what-if scenarios. Getting the same functionality in a CRUD system is very difficult.
10 http://msdn.microsoft.com/en-us/library/ms177410.aspx
As data grows to what is commonly referred to as web scale, CRUD based approaches
break down. It is impossible to scale a single server up to the point which many companies
need. Instead applications must be scaled out. In order to facilitate multiple servers
making use of data at the same time, complex synchronization techniques must be used.
While not a panacea, messaging allows for much easier communication and coordination
between a number of machines. Applying messaging to CRUD tends to be quite difficult
as the messages which are the result of CRUD perform monolithic updates to all the
fields of an object at once. This approach increases the chances of a conflict.
There are advantages to both CRUD and CQRS based solutions. In an ideal world
developers would be able to pick and choose the technique used in any one domain (see
"CQRS as a Top Level Architecture").
RELATIONSHIP WITH DOMAIN DRIVEN DESIGN
Just as CQS was a strong influence on CQRS, Domain Driven Design (DDD) has been
instrumental in helping to outline the modular approach that CQRS takes when dealing
with an entire business.
Domain driven design is a set of patterns and methodologies for building business
software from the domain model out. All the code in the application comes from the
domain model. If the model changes then the code must also change; equally, a change in
the code implies a change in the model[10, 15]. The business is divided into bounded
contexts (BCs) which are usually delimited by a business function. For instance the
marketing department might be one bounded context and the shipping department another.
Each department is responsible for a set of information and is the canonical source of
that information. Should one bounded context require access to information owned by
another, it must request that information from the other
BC. For instance it is conceivable that the shipping department, billing department and
marketing department are all interested in knowing a customer's address. Instead of
keeping multiple copies of the customer's address which must be updated whenever a
change is made, only one of the BCs owns this information. Let's say that it is the billing
department. None of the other BCs are permitted to perform updates to the customer
information. Should the shipping department require the address of the customer, it will
request the information from the billing BC.
This, of course, plays very well into a CQRS/ES setup. Each BC can provide a read
only view of the data which other services might like to consume. Alternately messages
may be sent from BC to BC in order to complete a full workflow. Consider the business
process behind ordering a product. In order to fulfill an order a number of different
activities need to be orchestrated. To a customer a usual workflow might look like
1. Log in to the site
2. Add items to the shopping cart
3. Proceed to the checkout
4. Enter a shipping address
5. Enter credit card information
6. Receive confirmation of the order
Behind the scenes there are all sorts of actions which cross between different BCs.
Logging onto the site is an operation which might run against the Security BC. Logging
in is likely not part of the order workflow since it can occur without an order being placed
and is common to a number of other workflows. We’ll call it a prerequisite step.
Adding items to the cart is likely to be managed by the sales BC. It will keep track
of the items and their quantities. When finished selecting items the customer checks out.
Checking out is largely handled by the billing BC, which will be responsible for collecting the
customer's address and payment information. On many sites it is possible to ship to a
different address from the payment address; in this case the shipping address should be
owned by the shipping BC. Finally the order is sent to the shipping BC for fulfillment.
Figure 10: A comparison of a customer as seen by traditional and DDD architectures.
What is apparent is that an item which would be considered a distinct entity in a
traditional system, the customer, is actually divided into many pieces. In DDD the data is
divided not into groupings which make sense for the entity but rather into groups which make
sense for the business. Dividing responsibility means that each BC can be implemented in
any way that the company sees fit. Some of the BCs might be collaborative in nature and
would benefit from CQRS while others may be implemented using CRUD. A pragmatic
approach like this is key for DDD, and the ability to compartmentalize the business
into BCs allows for a high degree of flexibility.
There remains a great deal of discussion on the CQRS mailing list as to the necessity
of using DDD in a CQRS implementation. It is certainly not a settled question. There
exists at least some level of symbiosis between DDD and CQRS: DDD solutions may be
implemented without CQRS just as CQRS solutions can be implemented without DDD.
However, without defined domain boundaries, mapping the flow of information between
services is very difficult. Equally, traditional approaches lack much of the malleability of
historical data which is crucial for evolving domains.
CQRS IN REAL WORLD SITUATIONS
CQRS as a Top Level Architecture
With the various advantages of CQRS it is tempting to make use of CQRS in every
situation. However it is costly to implement CQRS and many domains may not need
CQRS. In a typical application there are at least two domains. The first domain is
the domain which the application serves directly. For instance in a loan application for
a bank the served domain would be that of loans. Functions are provided to transfer
money to loans and apply for new ones. However a second domain must also be called
upon to perform the authentication and authorization of users. This is likely a completely
different domain from the loan domain. We shall refer to this as the security domain.
The security domain is a low value domain: it does not provide any significant business
advantage to the bank. The functions of the security domain are so common that there
are numerous third party tools which could be plugged into the application to
provide the required functions11. There are a limited number of developers available, so
it is logical to have those developers focus on the areas of the business which generate
the most revenue or provide the largest competitive advantage. As such it is not
advisable to make use of CQRS as a top level architecture. Instead CQRS should be
applied selectively to the domains which are important.
Fitting CQRS Into Teams
For most developers CQRS is a departure from traditional development methods. The
largest concern I have heard in talking to developers and watching the CQRS mailing
list is that the domain model and the view models will become desynchronized. This
is certainly a concern, but the use of messaging, or of transactions if the view updates are
synchronous, ensures that everything remains in sync. Even if some sort of disaster
does throw the view model out of sync, event sourcing allows the view models to be
rebuilt.
11 Services such as LDAP and OpenID provide at least authentication and, in the case of LDAP, authorization. They are both well known and well tested and are likely to be cheaper to purchase and plug in than building a custom version. Indeed authentication in particular is far more difficult than it first appears.
Jonathan Oliver, an early proponent of CQRS, has suggested that teams working with
CQRS can make better use of the varying skills of developers. McConnell suggests that
there can be as much as two orders of magnitude difference in the skills of developers[20].
Unfortunately there are too few developers at the high end of the spectrum for current
business needs[28]. Oliver suggests directing less experienced developers to create
the view models and the message handlers required to update the views. These are areas
which are easy to fix in the event of an issue, and the knowledge required to build the
views is minimal.
While there are frameworks and tools for CQRS, the general feeling among early
adopters is that it is best to build your own framework. The code required to develop
a CQRS solution is minimal. In his example application Young builds an entire CQRS
application in a scant 500 lines of code12. Oliver suggests that the more senior developers
be focused on this sort of structural code as well as any code which requires special
attention. Ideally the loose coupling of components in a CQRS project means that the
command handlers should be fairly minimal. In many cases the handlers will do nothing
but publish an event.
In his Advanced Distributed Systems course Udi Dahan places a great deal of emphasis
on keeping the code as simple as possible. He recommends against the use of heavyweight
object relational mappers, dependency injection and all manner of other architectural
niceties. Keeping the handlers short means that such techniques as
separation of concerns and abstraction layers melt away; n-tiered architecture was only
developed as a method of dealing with large code bases. The division of handlers into
those associated with a single bounded context keeps the code minimal. If you
find yourself with a large number of handlers, it is a sign that the bounded context
is poorly defined.
12 https://github.com/gregoryyoung/m-r
Adding CQRS to existing applications is a difficult problem. Usually existing
applications have not been built with DDD in mind. Actions cross over what would be
aggregate root (AR) boundaries because nobody has defined them. The UI is likely to be CRUD based
and the user feedback dependent on rapid synchronous actions. Even without undertaking
a full DDD analysis of the domain, some progress towards CQRS is possible. A place to
start is to commence
the creation of commands and the publishing of messages from the current application.
Nothing need listen to the messages initially. Any new functionality should make use
of the messages whenever possible. The UI should also be realigned towards a
task based UI instead of CRUD operations. Perhaps the only saving grace is that CQRS need
not be applied to an entire application at once. DDD teaches that isolating each AR is
key to managing complexity. Some of the ARs may benefit from
CQRS while others have no need of it. There is added complexity for domains which are
implemented using CQRS, so domains which have no need of CQRS can be implemented
with traditional techniques.
Providing User Feedback in Asynchronous Applications
It is commonly accepted that users benefit from rapid and consistent feedback in an
application. Clicking on a button should have some effect and it is helpful if that effect
is as close to instant as possible. Everybody has suffered the frustration of clicking on a
link and waiting, breath held, to see if the requested page will load. However providing
feedback in an asynchronous message based architecture can be difficult. Consider adding
a new item to a collection. In a traditional design the database is updated synchronously
as part of the post back and the subsequently generated index page will contain any
updates. However asynchronous applications may not perform the update before the
index page is rendered.
Figure 11: Traditional synchronous persistence model. Data is persisted to the database before the listing page is rendered.
It is confusing to users to rename, remove or add an item and not see that change
reflected on the listing page which is typically returned after an update. This issue is
not limited to CQRS situations where the change is performed via a command
and subsequent events. Many of the newer databases such as MongoDB, CouchDB and
Cassandra perform updates in an asynchronous fashion, favoring eventual consistency.
Even in cases where the data for the listing is rendered from the same database node
as the update, there is no certainty that the update will have been processed before the
listing. To mitigate user confusion a number of techniques may be used. The first is
to change the flow of the update use case such that instead of returning
to an index page the update returns either to itself or to a page which states that the update has
been submitted. This is something of a hack as it depends on introducing a slight delay
into the workflow, permitting the data to be processed on the server. If the processing
delay is short then this is a very good solution; of course, the processing time cannot be
guaranteed. Another option is to make use of local storage to persist the change. HTML
5 web browsers support saving data offline through local or session storage. The listing
page can then be rendered with a combination of cached data and database data.
Finally attempts should be made to educate users about the limitations of computers.
In the general public there is a belief that computers are instant. This is not the case and
many user issues stem from this belief. Giving users more realistic expectations about
how data flows and is saved will be helpful in this case and many others.
Figure 12: Asynchronous persistence model in which the data may or may not appear in the database before the listing page is rendered.
Solving the Difficult Offline Problem
Thus far we have spent a lot of time discussing how to deal with processing messages
from web or desktop applications. These situations assume that the client is connected
to the server at all times. As mobile phones increase in both power and popularity they,
along with the relatively young tablet platform, are becoming very important13. Mobile
devices are not always connected so for applications to function in a disconnected state
special allowances must be made. There are two problems with being disconnected:
• Updating the device with the latest data
• Ensuring that actions which occurred on the device in a disconnected state are
persisted back to the server.
13 Shipments of iPads have been very strong for some time and the trend continues[31].
The first item is usually resolved through a process of syncing. If you have ever
plugged in an iPod and waited while iTunes painstakingly compares the content on the
iPod with that on the computer, it should be obvious that running naive comparisons is
slow. Instead of comparing the entire database it would make much more sense for the
iDevice to store a timestamp of the last sync and then have iTunes simply replay the
addition and deletion commands which were subsequent to that timestamp. This
more or less eliminates the need for expensive comparison operations. All that is being
done is that a view model is being updated from a stream of events. This is no different
from how a view would be updated in normal CQRS operations.
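The timestamp-replay idea can be sketched as follows. Sequence numbers stand in for timestamps, the event shapes are invented for illustration, and the sketch is in Python rather than C#:

```python
# The library's event stream, as it would exist on the desktop side.
library_events = [
    {"seq": 1, "type": "SongAdded", "song": "a.mp3"},
    {"seq": 2, "type": "SongAdded", "song": "b.mp3"},
    {"seq": 3, "type": "SongRemoved", "song": "a.mp3"},
]

def sync(device_songs, last_seq, events):
    """Replay only the events the device has not yet seen, instead of
    diffing the two libraries item by item. Returns the new marker."""
    for e in events:
        if e["seq"] <= last_seq:
            continue  # already applied during a previous sync
        if e["type"] == "SongAdded":
            device_songs.add(e["song"])
        elif e["type"] == "SongRemoved":
            device_songs.discard(e["song"])
        last_seq = e["seq"]
    return last_seq

device = {"a.mp3"}  # state after a previous sync up to seq 1
new_last = sync(device, 1, library_events)
```

After the call the device holds only "b.mp3" and records seq 3 as its new marker, so the next sync again touches only unseen events.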
Commands issued on the device can be used to solve the second problem. Imagine
that you are creating a playlist on the iDevice for later synchronization to iTunes. Each
action taken, be it adding a song, removing a song or renaming the playlist can be
stored in an event store. When the device is reconnected these events are replayed and
interwoven with any commands which may have happened in iTunes. In most cases the
commands from the two sources will not be in conflict however in the rare case when
they are then the user may be prompted for action. Synchronizing these two devices is
no different from dealing with delayed events in a web based system; something at which
CQRS excels.
Many mobile applications are smaller versions of a larger application. While the
main application may be constructed with multitenancy in mind and needs to
process all the messages for all the users, the mobile version has less onerous requirements.
For instance a typical mobile application might require only the records for a specific user.
It would, in fact, be a security concern to allow the records for another user to be loaded
onto the device. This situation is easily handled by offline CQRS: the messages can
be filtered by customer or by user so that only pertinent updates are sent to the device.
Examples
There are not a huge number of products which make use of CQRS at this juncture but
there are several projects which do. The CQRS mailing list is more than a thousand
people strong now, so it is likely that there will be some not insignificant projects soon.
CQRS is also a technique which tends to lend itself to being used inside a
business, so it is likely that there are many implementations which are not publicly
known as a result of confidentiality agreements. Dahan claims that a significant portion
of Amazon's architecture is built using techniques which are very similar to CQRS[8]
and the description Yegge gives of the internal workings of Amazon confirms it[33]. In
particular the composite user interface is a huge component of building the front ends
for CQRS systems.
One company which is very publicly building its systems using CQRS techniques
is the business forecasting company Lokad. Their involvement is largely due to their
chief technologist Rinat Abdullin's interest in the CQRS community14. Lokad makes use
of streams of events to provide not only predictions about inventory but also realtime
updates of stock levels. Their predictions benefit from being able to analyze historical
streams of events. As mentioned in the event sourcing section, all the data about
transactions is preserved for later analysis, which lends itself well to such data mining questions
as
• During what part of the day do I sell the most ice cream?
• What percentage of product X is returned?
• When should I reorder a product from its supplier based on historical consumption
levels?
Lokad also makes use of the data available in the event stream about why an event
occurred. Having available a differentiation between a customer created by an import
from an old system and one created on the company's website is useful in
performing in-depth analysis.
14 Rinat hosts the distributed systems podcast and is also very active on the mailing lists.
RESEARCH QUESTION
CQRS is obviously a huge and complicated topic which has only just begun to be explored
by the community and is more or less absent from the academic literature. I propose
to explore only a tiny fragment of the CQRS problem space, specifically the area around
using an event store to rebuild the domain model whenever it is queried.
I propose to answer a number of questions:
• How quickly can we replay messages using various different serialization approaches?
• How quickly can we replay messages using various different data storage techniques?
• How many messages can reasonably be processed to rebuild an object?
• How large can objects become before they need to be checkpointed?
• How much does the size of the objects (number of fields) impact the
speed of message serialization and deserialization?
EXPECTED OUTCOME
There is a great deal of hype about the efficiency of NoSQL databases, and their typical
eventual consistency model does allow for much greater burst throughput than SQL
databases. However the data does eventually need to be made consistent, so while the
burst rate may be higher I would expect the performance over the long run to degrade
somewhat. Still, I expect that NoSQL solutions will outperform their SQL based
competitors. The data being added is so simple and non-relational that the extensive
optimizations in SQL solutions will be neither necessary nor effective.
There is also a lot of hype in the technical community about the effectiveness of
lightweight serialization formats such as JSON and Protocol Buffers. Certainly these formats
should be faster to read and write on a bit for bit level than XML, which is far more
verbose. It is, however, possible that what these formats gain in serialization length
and size they will lose to complex parsing requirements which tax the CPU. I do
not believe that will be the case, as processors are stunningly fast and generally limited
by disk I/O. However, even the most efficient text based serialization is likely to be far
less efficient than a binary serializer. I expect that the binary format will be the most
efficient, followed by text based serializations and finally XML.
CHAPTER III
METHODOLOGY
There are a number of pre-built solutions for the storage and replay of events, the most
popular of which is Jonathan Oliver's EventStore. EventStore is an amazingly extensible
system which allows for storing events in a large number of different databases in a large
number of formats. Many of the supported databases are simply there to prove that
it is easy to store events in almost any datastore15. There are, however, a significant
number of solutions which may be reasonable. I propose to explore some of the more
likely candidates and benchmark them to establish whether the ideal event stores are relational
databases, cloud databases or one of the newer NoSQL style databases such as a
key-value store or a document database.
In much the same way that there are many databases supported, there are also a large
number of potential serializers. Many developers are most comfortable using the built in
serialization. The format of a language's native serialization varies from implementation
to implementation: Java and C# tend towards a binary serialization format whereas
languages such as Python, Ruby and JavaScript use text based serialization. Binary
serialization formats tend to be smaller than their text based alternatives, which is highly
advantageous in an environment which has limited disk space or bandwidth. Their smaller
size also tends to make people believe that the mechanics of serializing to and from them
will be faster. As previously discussed there are some shortcomings when using a
binary serializer to store events. Largely these relate to language lock-in, a lack of
readability and poor adaptability to changing object definitions. Even small variations
such as changing the namespace of an object can make deserialization in older software
impossible.
15 The complete list of supported data stores can be found at https://github.com/joliver/EventStore and includes such ridiculous datastores as Microsoft Access 2000, which should not be used to run a lemonade stand let alone a complex CQRS system.
EventStore provides a common platform for evaluating combinations of storage and
serialization format. The list of supported storage technologies is almost endless. The
supported formats list is slightly shorter, with support for
• JSON
• BSON
• .net Binary
There are also secondary serializers which add compression or encryption on top of the
standard serializers. I believe that Protocol Buffers are an ideal serialization format for
EventStore; however, there is currently no support for them so I will add it.
In order to answer the research questions I shall generate a number of different domain
objects, ranging from trivial objects to complex objects. A trivial object might have only
two or three fields while a more complex object might have dozens. These
objects will be constructed in memory by replaying a number of events from the event
store. The messages will also vary in complexity, from setting a single field to
setting multiple fields using logic based on the current in-memory representation of the
object. The time required to reconstruct the objects in memory from their serialized
message stream will be measured using a variety of data storage tools and serializations.
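As a rough illustration of the measurement, the following Python sketch rebuilds an object by replaying serialized events and times the reconstruction. The event format is invented, and the numbers will not match the C#/EventStore benchmarks reported later:

```python
import json
import time

def make_events(n):
    """Generate n serialized field-setting events; the last write to
    each field wins, mimicking a simple event-sourced object."""
    return [json.dumps({"field": "f%d" % (i % 5), "value": i})
            for i in range(n)]

def rebuild(serialized_events):
    """Reconstruct the in-memory object by deserializing and applying
    every event in order."""
    obj = {}
    for raw in serialized_events:
        e = json.loads(raw)
        obj[e["field"]] = e["value"]
    return obj

events = make_events(50000)
start = time.perf_counter()
obj = rebuild(events)
elapsed = time.perf_counter() - start
print("rebuilt from 50000 events in %.3f s" % elapsed)
```

The real methodology varies the serializer and the backing store around exactly this replay loop, so only the two middle lines of `rebuild` change between trials.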
I will benchmark these data stores:
• SQL Server (traditional relational database)
• MySQL (traditional relational database)
• SQLite (embedded relational database)
• MongoDB (document database)
Each of these will be locally installed in order to be tested.
The serializers which will be tested are:
• JSON
• BSON
• Protocol Buffers
• XML
• Binary
For each of these serializations I will track the time taken as well as the size of the
resulting serialized object. This will provide metrics for the amount of disk space or
network bandwidth needed for each type. As mentioned, there is currently no
implementation in EventStore for Protocol Buffers, so I will add support using the protobuf-net16
library, which is a well used Protocol Buffers implementation. There are many other text
based serialization formats which could be used, such as Apache Thrift17 and YAML18,
but they tend to have lower adoption than the formats being tested. XML is also
a favorite serialization format; however, it has fallen out of favour in most development
communities (except Java) due to its high file size overhead and the perceived complexity
of the myriad of associated standards.
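To illustrate the kind of size metric being collected, this Python sketch serializes one small message with the standard library's JSON, binary (pickle) and XML facilities. It is a stand-in only: the actual benchmarks use the .net serializers and protobuf-net, and the message fields mirror the simple message defined in the results chapter.

```python
import json
import pickle
from xml.etree import ElementTree

message = {"ID": "0001",
           "CustomerName": "Alice",
           "CustomerAddress": "123 Main St"}

# Text serialization: human readable, moderately compact.
as_json = json.dumps(message).encode("utf-8")

# Binary serialization: Python's pickle stands in for .net Binary here.
as_pickle = pickle.dumps(message)

# XML serialization: one element per field, as the .net serializer
# would roughly produce.
root = ElementTree.Element("SimpleMessage")
for key, value in message.items():
    ElementTree.SubElement(root, key).text = value
as_xml = ElementTree.tostring(root)

sizes = {"json": len(as_json), "binary": len(as_pickle), "xml": len(as_xml)}
print(sizes)
```

Even on this toy message the XML payload comes out noticeably larger than the JSON one, which is the verbosity penalty the methodology sets out to quantify.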
ENVIRONMENT
All the testing will be done using the .net framework and any additional code will be
written in C#. There are no complexities in the testing or in EventStore proper which
would preclude the use of other languages. The same environment is simply kept to
ensure as much consistency as possible.
The tests will be run on a computer with the following configuration:
• Windows 7 - 64 bit
16 http://marcgravell.blogspot.com/2011/05/protobuf-net-v2-beta.html
17 http://thrift.apache.org/
18 http://yaml.org/
• Intel i7-2600K 3.40GHz CPU overclocked to 4.10GHz which contains 4 physical
cores and 4 hyperthreading cores
• 16GB of memory
• 120GB Corsair Force3 Solid State Drive
• 2TB WD20EARS Rotational Disk
Serialized objects will be persisted to the rotational disk.
The tests will each be run 20 times to ensure that any outlying datapoints are
smoothed away. The actual variation in results is likely to be extremely low.
CHAPTER IV
RESULTS
SERIALIZATION AND DESERIALIZATION
The first things investigated were the properties of the various serializers and
deserializers. JavaScript Object Notation, or JSON, is a human readable serialization format
which was originally proposed by Crockford[7]. BSON is a binary version of JSON: it
loses the human readability but is generally more space efficient and is also faster to scan.
For both the JSON and BSON serialization experiments the Newtonsoft JSON
serialization library was used19. This is the preeminent .net library for JSON serialization. XML
serialization is simply the transformation of the messages into the very well known XML
format. XML is human readable and is known for being quite verbose. The serialization
was handled by the built in serialization system in .net. Protocol Buffers are a product
of Google's internal workings[14]. They strive for speed and adaptability to changing
message fields. Unlike the other formats they ignore the names of the fields and instead
number them. This means that a Protocol Buffers serialization is not as easy to create
as the other formats: a .proto file must be created which acts as a mapping between
the field names and their numbers. It was for this reason that it was difficult to include
Protocol Buffers serialization in the EventStore library, which aims to require nothing from
the developer but an object to serialize. There are a number of libraries for performing
Protocol Buffers serialization in .net; the tests made use of the protobuf-net library20.
Three different metrics were examined for each of the serializers: first the speed of
serialization, second the speed of deserialization and finally the size of the serialized
objects. Two different objects were serialized. The first was a very simple message consisting
of only three fields; the second was a large object with twenty eight fields. The large
19 http://james.newtonking.com/pages/json-net.aspx
20 http://code.google.com/p/protobuf-net/
public Guid ID { get; set; }
public string CustomerName { get; set; }
public string CustomerAddress { get; set; }
Code Listing 6: A simple message.
public Guid ID { get; set; }
public string Name { get; set; }
public string HouseNumber { get; set; }
public string StreetAddress { get; set; }
public string PostalCode { get; set; }
public string City { get; set; }
public string Province { get; set; }
public float Height { get; set; }
public string HeightUnits { get; set; }
public float Weight { get; set; }
public string WeightUnits { get; set; }
public string ShoeSize { get; set; }
public decimal LeftArmLength { get; set; }
public string LeftArmLengthUnits { get; set; }
public decimal LeftWristCircumfrence { get; set; }
public string LeftWristCircumfrenceUnits { get; set; }
public decimal LeftLegLength { get; set; }
public string LeftLegLengthUnits { get; set; }
public decimal LeftAnkleCircumfrence { get; set; }
public string LeftAnkleCircumfrenceUnits { get; set; }
public decimal RightArmLength { get; set; }
public string RightArmLengthUnits { get; set; }
public decimal RightWristCircumfrence { get; set; }
public string RightWristCircumfrenceUnits { get; set; }
public decimal RightLegLength { get; set; }
public string RightLegLengthUnits { get; set; }
public decimal RightAnkleCircumfrence { get; set; }
public string RightAnkleCircumfrenceUnits { get; set; }
Code Listing 7: A complex message with 28 fields.
object was also serialized with only three fields filled out in an attempt to test how well
the serializers worked on a sparsely populated object.
RESULTS
50 000 messages were serialized to individual files on a rather slow rotational disk. The
trial was repeated 20 times and the averages recorded.
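The measurement loop itself is straightforward. The following is a sketch of the kind of harness used; the delegate-based design and the file naming are assumptions, not the actual test code.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class SerializerBenchmark
{
    // Serializes 'message' 'count' times, writing each result to its own
    // file, and returns the elapsed time in milliseconds. Any of the five
    // formats under test can be supplied as the 'serialize' delegate.
    public static double TimeSerialization(
        Func<object, byte[]> serialize, object message, int count, string dir)
    {
        Directory.CreateDirectory(dir);
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            File.WriteAllBytes(Path.Combine(dir, i + ".msg"), serialize(message));
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

Each trial calls a method like this once per serializer, and the twenty returned values are averaged.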
Serializer Avg Serialization(ms) Avg Deserialization(ms) Size on disk (b)
JSON 12326.25 2747.70 101
BSON 11476.55 2507.84 108
ProtocolBuffer 11654.91 1907.70 41
XML 13035.59 3870.07 293
Binary 11938.38 2751.45 360
Serializing 50 000 small messages
Figure 13: The average time, in ms, for serialization of 50 000 small messages.
Figure 14: The average time, in ms, for deserialization of 50 000 small messages.
Figure 15: The on disk size of a serialized small message. In bytes.
Serializer Avg Serialization(ms) Avg Deserialization(ms) Size on disk (b)
JSON 15911.86 11285.09 722
BSON 23865.26 4398.40 759
ProtocolBuffer 14895.85 2149.17 193
XML 15389.13 6285.25 1382
Binary 14767.49 4931.53 1405
Serializing 50 000 large messages
Figure 16: The average time, in ms, for serialization of 50 000 large messages.
Figure 17: The average time, in ms, for deserialization of 50 000 large messages.
Figure 18: The on disk size of a serialized large message. In bytes.
Serializer Avg Serialization(ms) Avg Deserialization(ms) Size on disk (b)
JSON 16400.63 11712.37 676
BSON 18244.99 2789.06 350
ProtocolBuffer 18889.18 1907.61 41
XML 16074.92 6297.91 679
Binary 14258.41 5006.68 1270
Serializing 50 000 sparsely populated large messages (only 3 of the 28 fields were
populated with non-default values)
Figure 19: The average time, in ms, for serialization of 50 000 sparsely populated large messages.
Figure 20: The average time, in ms, for deserialization of 50 000 sparsely populated large messages.
Figure 21: The on disk size of a serialized sparsely populated large message. In bytes.
STORAGE TECHNOLOGIES
An examination was also made of a variety of different storage technologies. Relational
databases have long been the default tool for storing any sort of structured
data, but are they the best tool for storing events?
The relational databases were represented by Microsoft SQL Server and MySQL, with SQLite standing in as an embedded option and MongoDB representing the NoSQL document databases.
All experiments were performed using 500 messages and were repeated 20 times.
Storage Technology Avg Serialization(ms) Avg Deserialization(ms)
SQL Server 39.35 11.10
MySQL 101.20 56.70
SQLite 3506.60 38.50
MongoDB 40.60 7.90
Figure 22: The time taken to serialize and deserialize 500 messages to a variety of different storage technologies. In ms.
CHAPTER V
RESEARCH IMPLICATIONS
What is apparent is that rebuilding domain objects from a stream of messages is not
very time consuming at all. There are clear advantages to maintaining a rich history of
domain objects and, should the replay of messages become expensive, checkpointing can
be used. However, a thousand messages can easily be replayed in half a second, so unless
the object's history is in excess of 1000 messages checkpointing should not be needed.
If an object is really so mutable that it requires 1000 messages then there may well be
some design issues, such as too large a domain object. It is unimaginable that an object
such as a customer address would ever be changed 1000 times.
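As a sketch of what checkpointing looks like in practice, the replay loop below applies events to an aggregate and saves a snapshot every N events, so that a later rebuild can begin from the most recent snapshot rather than from the first event. All of the names here are illustrative.

```csharp
using System;
using System.Collections.Generic;

public class CustomerAddress
{
    public string Address { get; set; }
}

public static class Replayer
{
    // Rebuilds the aggregate by applying each AddressChanged event in
    // order, invoking 'saveSnapshot' with the running event count and
    // the current state every 'checkpointInterval' events.
    public static CustomerAddress Rebuild(
        IEnumerable<string> addressChangedEvents,
        int checkpointInterval,
        Action<int, string> saveSnapshot)
    {
        var state = new CustomerAddress { Address = "" };
        int applied = 0;
        foreach (var newAddress in addressChangedEvents)
        {
            state.Address = newAddress; // apply the event to the aggregate
            applied++;
            if (applied % checkpointInterval == 0)
                saveSnapshot(applied, state.Address);
        }
        return state;
    }
}
```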
Software development is such that it is very difficult to suggest a single solution as a
panacea. The same is true of event storage: there is no one solution which is best for all
situations. As with all technologies, it is best to examine the technology choice within
the context of the business. Adoption of NoSQL technologies such as MongoDB has been
slow in large enterprises and likely even slower in smaller companies[30]. In such cases it
may be difficult to get buy-in from the company to use a non-traditional data storage
technology. Event storage is quite simple and can be done without a database at all.
Abdullin now uses simple on-disk storage for his messages[4] and has even proposed
writing raw data to the disk to avoid the overhead of the file system. Because messages
are accessed in a sequential fashion for replay, disk storage is a reasonable approach;
however, storing the events in a database does allow for some additional analysis of
metadata. For instance, it is easy to extract statistics about the number and type of
messages and when they are most prevalent. No matter which data storage choice is
made, the messages will still need to be serialized.
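A minimal version of such a database-free event store is sketched below. The length-prefixed framing is an assumption of mine, not Abdullin's actual format, but it shows how naturally an append-only file supports the sequential replay pattern.

```csharp
using System.Collections.Generic;
using System.IO;

public class FlatFileEventStore
{
    private readonly string path;
    public FlatFileEventStore(string path) { this.path = path; }

    // Appends one serialized event, prefixed with its 4-byte length.
    public void Append(byte[] serializedEvent)
    {
        using (var stream = new FileStream(path, FileMode.Append))
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(serializedEvent.Length);
            writer.Write(serializedEvent);
        }
    }

    // Replays every event in the order it was written.
    public IEnumerable<byte[]> Replay()
    {
        using (var stream = new FileStream(path, FileMode.OpenOrCreate))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
                yield return reader.ReadBytes(reader.ReadInt32());
        }
    }
}
```

A store like this provides none of the metadata queries a database offers, which is exactly the trade-off discussed above.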
SERIALIZATION TECHNOLOGY SELECTION
While the binary serializers seem to be faster in general than the text based serializers,
there is a development cost: any time a message needs to be debugged or investigated
it must first be opened in some sort of tool which understands the serialization
format. Text serializations are easily read by the operator and issues are far more
apparent. Overall, the difference in serialization time is rarely significant unless the
number of messages is astronomical. It is a far more common task to deserialize a
message than to serialize it, so serializers which are fast at deserialization should be
favoured over those which are fast at serialization. In this respect protocol buffers do
seem to have the advantage; in some cases protocol buffers demonstrated a full order of
magnitude difference between the time taken to serialize a message and the time taken
to deserialize it. However, protocol buffers require that an explicit schema be created
for each object. In many CQRS systems this would require that hundreds of messages
be explicitly set up for serialization. The other serializers tested are schemaless in that
they do not require any serialization schema other than the class file. As mentioned
previously, serializers which fail on missing or new attributes should be avoided, to allow
for easier message mutation as the system grows and changes.
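The kind of tolerance meant here can be seen with Json.NET, which the tests used for JSON serialization: a payload written by a newer version of a message, carrying an added field, still deserializes into the older class, and the unknown field is simply ignored. The message class and field names below are illustrative.

```csharp
using System;
using Newtonsoft.Json;

// Version 1 of a message: it has no PostalCode property.
public class CustomerMovedV1
{
    public Guid ID { get; set; }
    public string CustomerAddress { get; set; }
}

public static class MutationDemo
{
    public static CustomerMovedV1 ReadNewPayload()
    {
        // A payload produced by a newer version of the message with an
        // extra PostalCode field. Json.NET ignores the unknown field by
        // default rather than failing, which is what permits messages
        // to mutate as the system evolves.
        string json = "{\"ID\":\"75e67b67-b605-4ff7-bdc7-51e5ca7f0a0b\"," +
                      "\"CustomerAddress\":\"1 Main St\",\"PostalCode\":\"T2P 1J9\"}";
        return JsonConvert.DeserializeObject<CustomerMovedV1>(json);
    }
}
```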
We can still draw some best practices from the results. Large messages always took
longer to work with than small messages, even when the large messages were sparsely
populated. It is therefore preferable to use a number of small messages rather than one
large message. This practice mirrors programming best practices, which suggest that
smaller classes are far more maintainable than large classes.
DATABASE SELECTION
SQL Server and MongoDB turned out to be the best databases for storing event streams.
The advantage of MongoDB is that it is not a single node database and, through eventual
consistency, it is possible to have multiple nodes to which messages can be written. This
would speed up writing significantly. However, a similar approach can be taken with
SQL Server. Because a message, once created, is immutable, the messages can be
written across a number of smaller SQL Server nodes and then interwoven at replay time.
Selecting the appropriate writing node may be done in a simple round robin fashion. This
approach allows for massive scalability with almost no increase in licensing costs.
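A sketch of this scheme, with in-memory lists standing in for the SQL Server nodes: each write is stamped with a global sequence number and sent to the next node in turn, and replay merges the per-node streams back into order. The global sequence counter is an assumption; in a real deployment it would itself need to be a shared, coordinated service.

```csharp
using System.Collections.Generic;
using System.Linq;

public class ShardedEventLog
{
    private readonly List<List<KeyValuePair<long, byte[]>>> nodes =
        new List<List<KeyValuePair<long, byte[]>>>();
    private long sequence;
    private int next;

    public ShardedEventLog(int nodeCount)
    {
        for (int i = 0; i < nodeCount; i++)
            nodes.Add(new List<KeyValuePair<long, byte[]>>());
    }

    // Messages are immutable once written, so each node only ever
    // appends; a simple round robin spreads the write load evenly.
    public void Write(byte[] serializedEvent)
    {
        nodes[next].Add(new KeyValuePair<long, byte[]>(sequence++, serializedEvent));
        next = (next + 1) % nodes.Count;
    }

    // Replay interweaves the per-node streams back into global order
    // using the sequence number each event was stamped with.
    public IEnumerable<byte[]> Replay()
    {
        return nodes.SelectMany(n => n).OrderBy(e => e.Key).Select(e => e.Value);
    }
}
```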
The embedded solution, SQLite, was significantly slower than the dedicated solutions,
and in most cases it should be avoided. However, in the case of an offline or occasionally
connected client, an embedded database would be ideal for maintaining a collection
of messages generated on the client. Embedded databases might also be a good approach
for low traffic systems or for packaged systems which are delivered to a client.
GENERAL APPLICABILITY TO CQRS
Event storage, and more specifically the EventStore technology, is a very small part of
CQRS and may not even be used in many implementations. Indeed the most well
written about CQRS implementations, those from the company Lokad, do not make use of
EventStore. However, the underlying database and serialization technologies are of
importance to CQRS developers, and these results should be taken as a general examination
of serialization techniques for messages.
CHAPTER VI
CONCLUSIONS
CQRS+ES is a fascinating solution to a number of problems which are common
in enterprise computer programming. While CQRS may not be applicable to all situations,
there are many places in which the advantages are clear. Any sort of financial problem
benefits greatly from having a historical stream of events to aid with auditing. Equally,
the ease of scaling CQRS means that adapting to a growing business is simplified. Perhaps
the single greatest advantage is that adapting to changing business requirements is much
easier with a CQRS+ES solution.
There is a definite overhead to developing with CQRS+ES. The initial project setup
is far more difficult than with a traditional project. Many of the tools, such as object
relational mappers, to which developers are accustomed are no longer as heavily used as
they were in the past. This means that finding developers to work on CQRS+ES solutions
is much more difficult. This is especially true at the moment as CQRS+ES is very new
and relatively few developers have had a chance to become familiar with its ins and
outs. There is a lack of literature and training material on CQRS+ES; for the most
part the training is limited to a handful of videos recorded by the likes of Udi Dahan
and Greg Young. There are limited training classes and the mailing list is extremely
verbose. Obviously these situations will improve as CQRS+ES becomes more accepted.
That Microsoft has directed their Patterns and Practices group to examine CQRS and
develop guidelines around it will likely spur adoption in a community which is largely
suspicious of anything which is not backed by a large software vendor.
There is also a lack of frameworks for CQRS. While the general feeling on the mailing
list is that frameworks are totally unnecessary and, in fact, undesirable, many people
will avoid making use of CQRS without some sort of a framework to help them along.
With such hostility towards frameworks it is unlikely that a common framework will be
adopted at any point in the near future.
Developers are starting to bump into the limits of traditional relational databases
and application architecture. As applications scale, many of the old ways of working
are proving to be simply too slow. There is also a large discrepancy in the cost of
computers: it is far cheaper to buy a large number of small computers than to buy an
expensive high end multi-hundred CPU system. This commodity computing approach
is one which is well proven by companies such as Google and Amazon. At the same
time the popularization of cloud computing is presenting the opportunity to scale out an
application quickly. Under these conditions CQRS is an ideal architecture, for it allows for
quick scale out. Some cloud systems offer discounts for performing processing at off-peak
hours, which is a great model for CQRS as much of the message processing can be delayed
and developers are already thinking about how quickly processes need to run.
I will certainly be aware of the possible applications of CQRS in any greenfield
development projects in which I am involved in the near future.
Glossary
• OODBMS - Object Oriented Database Management System. Similar to a tradi-
tional RDBMS but object oriented rather than table based.
• RDBMS - Relational Database Management System. A system which uses tables
and relationships to organize, sort and query structured information. Examples
include systems such as Oracle, MySQL and Microsoft SQL Server.
• DTO - Data Transfer Object. A very light weight object in an OO language
which simply acts as a container for transporting data from one system or location
to another. Usually these objects have no functions or methods, only fields or
properties.
• DDD - Domain Driven Design. A software engineering approach which favours
creation of software through the establishment of business models developed in
close conjunction with domain experts.
• BCNF - Boyce-Codd Normal Form. A strong normal form for databases in which,
for each and every non-trivial functional dependency X → Y, X is a superkey.
• JSON - JavaScript Object Notation. A standard method of serializing JavaScript
objects. Serialized objects take the form { <property>: <value> } with arrays
denoted by square braces. It was originally proposed by Douglas Crockford in
RFC 4627 21 and has since progressed from an add-on library to being natively
implemented by all major browsers.
• BSON - A binary representation of JSON data. Its goal is to be more efficient
to decode than JSON; however, as a binary format it loses JSON's advantage of
being human readable.22
21 http://tools.ietf.org/html/rfc4627
22 http://bsonspec.org/#/specification
• Protocol Buffer - A language and platform neutral serialization format developed
by Google for use in their internal RPCs. Google claims that they are far more
efficient than using XML both in terms of the size of the payload and in the speed
of serialization and deserialization. 23
• SOLID Principles - A set of software engineering principles most often ascribed
to "Uncle" Robert C. Martin[19, 18]. In short they are:
– Single Responsibility Principle - There should never be more than one
reason for a class to change
– Open Closed Principle - Software entities (classes, modules, functions, etc.)
should be open for extension, but closed for modification. That is to say that
the source should not change but the objects can be modified by extending
them.
– Liskov Substitution Principle - Functions that use pointers or references
to base classes must be able to use objects of derived classes without knowing
it. For instance, a function which takes a shape object should be able to take
a triangle object as it is more specific. This is also known as behavioural
subtyping.
– Dependency Inversion Principle - High level modules should not depend
upon low level modules. Both should depend upon abstractions. Abstractions
should not depend upon details. Details should depend upon abstractions.
– Interface Segregation Principle - Clients should not be forced to depend
upon interfaces that they do not use, instead specific interfaces should be
created which present only the used methods.
• UUID - Universally Unique Identifier. A 128-bit record identifier used for identifying
pieces of data in a large system. Often represented as a hexadecimal string with
23 http://code.google.com/apis/protocolbuffers/docs/overview.html
hyphens in the form 8-4-4-4-12; an example might be
75e67b67-b605-4ff7-bdc7-51e5ca7f0a0b. The 128-bit keyspace is so large that it is
highly improbable that any two generated keys will be the same, even taking into
account the birthday paradox.
• MSDTC - Microsoft Distributed Transaction Coordinator. A system used to com-
mit transactions across a number of databases or systems ensuring that information
is committed in a consistent fashion.
References
[1] Serialization (C# and visual basic). http://msdn.microsoft.com/en-us/library/ms233843.aspx, 2010.
[2] Module: Marshal. http://ruby-doc.org/core/classes/Marshal.html, 2011.
[3] Python object serialization. http://docs.python.org/library/pickle.html, 2011.
[4] Rinat Abdullin. Event sourcing: Projections. http://bliki.abdullin.com/event-sourcing/projections.
[5] Eric Brewer. Towards robust distributed systems, July 2000.
[6] S. Choenni, H. Blanken, and T. Chang. Index selection in relational databases. In
Fifth International Conference on Computing and Information, 1993. Proceedings
ICCI '93, pages 491–496. IEEE, May 1993.
[7] Douglas Crockford. The application/json media type for JavaScript object notation
(JSON). http://tools.ietf.org/html/rfc4627, 2006.
[8] Udi Dahan. Advanced distributed systems, October 2010.
[9] Udi Dahan. Race conditions don't exist. http://www.udidahan.com/2010/08/31/race-conditions-dont-exist/, 2010.
[10] Eric Evans. Domain-Driven Design: Tackling Complexity in the Heart of Software.
Addison-Wesley Professional, 1 edition, August 2003.
[11] Martin Fowler. Event sourcing. http://martinfowler.com/eaaDev/EventSourcing.html,
December 2005.
[12] Martin Fowler. MemoryImage. http://martinfowler.com/bliki/MemoryImage.html,
August 2011.
[13] Seth Gilbert and Nancy Lynch. Brewer’s conjecture and the feasibility of consistent,
available, Partition-Tolerant web services, 2002.
[14] Google. Developer guide - protocol buffers.
http://code.google.com/apis/protocolbuffers/docs/overview.html.
[15] Dan Haywood. An introduction to domain driven design.
http://www.methodsandtools.com/archive/archive.php?id=97, 2009.
[16] IDC. The diverse and exploding digital universe, 2008.
[17] Tobias Kind. RAMDISK benchmarks, 2009.
[18] Robert C. Martin. Agile Software Development, Principles, Patterns, and Practices.
Prentice Hall, 1 edition, October 2002.
[19] Robert C. Martin. ArticleS.UncleBob.PrinciplesOfOod.
http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod, 2005.
[20] Steve McConnell. Code Complete: A Practical Handbook of Software Construction.
Microsoft Press, 2nd edition, July 2004.
[21] Dwight Merriman. NoSQL and MongoDB.
[22] Bertrand Meyer. Object-oriented software construction. Prentice Hall, Upper Saddle
River, NJ, 1997.
[23] Microsoft. Patterns and practices: Upcoming releases.
http://msdn.microsoft.com/en-us/practices/bb232643.
[24] Sun Microsystems. Java object serialization specification.
http://download.oracle.com/javase/1.5.0/docs/guide/serialization/spec/serialTOC.html, 2004.
[25] Jonathan Oliver, Udi Dahan, Rinat Abdulin, and Jonathan Matheus. When to avoid
CQRS - clarified.
[26] R. Pawson. Naked objects. IEEE Software, 19(4):81–83, August 2002.
[27] Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems.
McGraw-Hill Science/Engineering/Math, 3 edition, August 2002.
[28] Matt Richtel. Tech recruiting clashes with immigration rules. The New York Times,
April 2009.
[29] Stefano Rivera. Storage & retrieval for an immensely scalable monitoring system,
2008.
[30] Darryl K. Taft. NoSQL makes big inroads in enterprise development: Survey.
http://www.eweek.com/c/a/Desktops-and-Notebooks/NoSQL-Makes-Big-Inroads-in-Enterprise-Development-Survey-500444/, June 2011.
[31] Trefis Team. Surging iPad shipments this quarter likely to top forecasts, boost $510 target price.
http://www.forbes.com/sites/greatspeculations/2011/09/06/surging-ipad-shipments-this-quarter-likely-to-top-forecasts-boost-510-target-price/, September 2011.
[32] Martin Thompson and Michael Barker. LMAX - how to do 100K TPS at less than
1ms latency. http://www.infoq.com/presentations/LMAX, 2010.
[33] Steve Yegge. Rip rowan - google+ - stevey’s google platforms rant, October 2011.
[34] Greg Young. CQRS workshop, July 2011.