
Decision Support and Business Intelligence

Information Technologies for Business Intelligence

Master Thesis

Kofi Nyamekye Manful

Visualizing Economic Metrics of the Bitcoin Transaction Graph

prepared at AVIZ/INRIA
Defended on September 3-4, 2015

Advisor : Petra Isenberg - AVIZ/INRIA [email protected]

Advisor : Tobias Isenberg - AVIZ/INRIA [email protected]


Acknowledgments

I am immensely grateful to Daniel Augot of the Aije-Bitcoin team for making it possible for me to be a part of this research project. My sincere gratitude also goes to Tobias and Petra Isenberg, who supervised and advised me during the thesis work. Without the time and attention they devoted to me, and the detailed feedback they provided, this thesis would not have been possible.

I thank all my friends and family who supported me at various points throughout the project. Most of all I would like to express my gratitude to God whose divine providence has sustained and continues to sustain me.


Contents

1 Introduction
    1.1 Motivation
    1.2 Problem Statement
    1.3 Methodology
    1.4 Contribution
    1.5 Thesis Organization

2 Related Work
    2.1 User Clustering
    2.2 Economic Analysis on the Transaction Graph
    2.3 Visualizations of the Transaction Graph
        2.3.1 Chart Views
        2.3.2 Overview Graph views
        2.3.3 Detail Graph Views
    2.4 Visualization Framework

3 Requirement Gathering
    3.1 Task Descriptions
    3.2 Prototype Demo
    3.3 Card Sorting Exercise

4 Solution
    4.1 Data Extraction
    4.2 Data Warehouse Design
    4.3 Entity Clustering
        4.3.1 Extended Clustering Heuristic
    4.4 Preview-Then-Detail
    4.5 System Architecture
    4.6 Visualizations

5 Evaluation
    5.1 Qualitative Evaluation
    5.2 Quantitative Evaluation
    5.3 Further Work

6 Conclusion

A Card Sorting Results

B Data Warehouse Logical Schema

Bibliography


Chapter 1

Introduction

The last decade has witnessed a substantial increase in digital currencies. Although research into digital currencies has been going on for longer than that [Flohr 1996, Gladstone 1996], the rise of Bitcoin and other publicly accessible digital currencies has brought additional interest from the field of Economics. The details behind what makes Bitcoin technologically feasible are well documented [Nakamoto 2008, Eyal 2014]; however, its behaviour as a currency and how people interact with it are not as well understood. In this thesis I discuss the design of a system to support the study of economic features of Bitcoin.

1.1 Motivation

Bitcoin is one of several digital currencies that facilitate online payments. At the time of writing (June 2015), it was priced at approximately US$250 with a market capitalization of US$3.5 billion and an average of 100,000 daily transactions¹. With over 80% of the total market share of digital currencies, Bitcoin is the largest digital currency by transaction volume and value. Digital currencies like Bitcoin are of particular interest to economists. There are questions regarding its similarity to traditional fiat currencies, whether it follows similar economic models to traditional currencies, and whether it can be a viable alternative to existing payment systems [Ali 2014, Evans 2014]. One big advantage for economists studying Bitcoin is that the complete transaction history is stored in an easily accessible and verifiable public ledger.

However, analysis of this ledger is not straightforward, especially if the economist does not have a technical background in Computer Science. The data is large (in excess of 60 million transactions) and is stored in a graph data structure. The protocol by which the Bitcoin network operates results in a different concept of an account and account holder than economists might be used to. By applying data analysis strategies, the work presented in this thesis attempts to make study of the public ledger more straightforward for researchers with an economic background.

¹ Market capitalization and transaction volumes from http://www.coinmarketcap.com and https://blockchain.info respectively


1.2 Problem Statement

Bitcoin, unlike traditional fiat currencies such as the dollar or euro, can be considered a decentralized currency because it operates without a single overall administrator. It makes use of public key signatures and other cryptographic tools to enable decentralized processing and to obfuscate the identities of its users. Since its invention in 2008 and launch in 2009, the number of users and the amount of currency available have steadily grown [Ron 2013, Spagnuolo 2014, Reid 2013].

On the Bitcoin network, a user can own one or more addresses (each identified by a public key) and verifies ownership of an address by providing the private key for the given address. A transaction using the Bitcoin protocol takes place when a user sends bitcoins (the name of the unit of value on the network) from one or more source addresses to one or more target addresses. As there is no central repository of transactions, the Bitcoin protocol relies on an innovative way of preventing double spending and other forms of fraud. This is done through the use of blockchain technology, which allows all participants on the network to come to a consensus regarding the state of the network [Nakamoto 2008].

The information verified and secured by the blockchain is stored in a distributed public ledger which records all transactions that have publicly taken place since the start of the network. The cryptographic techniques used in the protocol and blockchain mean that all recorded transactions in the ledger are immutable. This makes the transaction ledger an excellent source of information for studying the characteristics of the Bitcoin currency, its usage, and its evolution. The entire transaction history in the ledger can be represented as a graph in which nodes represent addresses (which hold bitcoins) and edges represent transactions (which transfer bitcoins). Two nodes are connected if there was a transfer of funds from one to the other.

One challenge is that this data structure does not easily lend itself to economic analysis. The large number of transactions (over 65 million) makes this graph complicated to process. This is compounded by the Bitcoin protocol’s policies on address ownership and transactions. Users can use, and are encouraged to use, as many different addresses as possible². The protocol also requires that a source address be completely emptied in a transaction. This results in new addresses being created for virtually every transaction, thereby further increasing the cardinality of addresses.
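As a rough illustration of this graph representation, the sketch below stores a handful of made-up transfers as a directed graph keyed by source address. The addresses, amounts, and the function name build_address_graph are hypothetical, and a real transaction with several inputs and outputs would need a richer edge model than a single source-target pair.

```python
from collections import defaultdict


def build_address_graph(transfers):
    """Map each source address to the list of (target, amount) edges leaving it."""
    graph = defaultdict(list)
    for source, target, amount in transfers:
        graph[source].append((target, amount))
    return dict(graph)


# Hypothetical transfers: (source address, target address, amount in BTC)
transfers = [("A", "B", 5.0), ("A", "C", 15.0), ("B", "D", 2.0)]
graph = build_address_graph(transfers)
print(graph["A"])  # [('B', 5.0), ('C', 15.0)]
```

Addresses that only ever receive funds (such as D here) have no outgoing edges and therefore do not appear as keys in this simplified structure.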

² Address reuse: https://en.bitcoin.it/wiki/Address_reuse

Figure 1.1: Two instances of multi-address bitcoin transactions: (a) single input, where Address A (20 BTC) pays 5 BTC to Address B and 15 BTC goes to Address C; (b) multiple inputs (Addresses A to E; amounts shown: 2.5, 1.7, 1.2, 5 and 0.4 BTC). In these examples, the owner of the gray addresses is sending bitcoins from the address on the left to the owner of the pink address and putting the remainder in a change address on the right.

Figure 1.1a shows a user in control of address A making a payment of 5 bitcoins to another user who controls address B. Since the source address holds more bitcoins than necessary for the payment, the remainder is placed in a new address C, which is under the control of the first user. In Figure 1.1b, the first user needs to make a payment to the second user but does not have enough in any single address under his/her control. Therefore the addresses are combined to make the payment to the second user, and the remainder is placed into a new address. This means addresses cannot be thought of as bank accounts in the traditional economic sense.
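The arithmetic behind such a payment can be sketched as follows. This is a minimal illustration that assumes all input addresses are emptied completely and ignores transaction fees; the function name change_amount and the second example are hypothetical.

```python
def change_amount(input_balances, payment):
    """Return the remainder sent to a change address when all inputs are fully spent."""
    total = sum(input_balances)
    if total < payment:
        raise ValueError("inputs do not cover the payment")
    return total - payment


# Single input, as in Figure 1.1a: 20 BTC in, 5 BTC paid, 15 BTC change.
single = change_amount([20], 5)
print(single)  # 15

# Hypothetical multi-input case: no single address covers the payment of 6.
combined = change_amount([3, 4], 6)
print(combined)  # 1
```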

While details at the micro-economic level (such as transaction amounts) are relatively easy to extract from the graph structure, it is much more difficult to retrieve macro-economic measures (for example money velocity). The obfuscation of accounts and their owners, as well as the unusual transaction policies, mean some pre-processing has to be done to transform the public ledger into a data structure more suitable for economic analysis. A tool that abstracts the transaction graph would remove the need for technical knowledge of the graph and would provide a means for economists to study the public ledger.

To that end, this thesis focuses on describing a structure for a data warehouse that will help to reorganize the data in the transaction graph so that it not only provides the economic metrics of interest to researchers studying Bitcoin, but also provides them in a manner that facilitates visual and interactive data exploration.

1.3 Methodology

My research was carried out in collaboration with another student whose focus was on designing the visualization tool that would use the data generated and stored in the data warehouse. To ensure that the data warehouse and visualization tool would meet the actual needs of economists, we followed a general design life-cycle model [Sharp 2007]. The end users of the tool were economists from the RITM³ research lab. Before any design work began, we met with the economists to identify their needs and requirements. This involved eliciting the kinds of metrics and economic phenomena they were looking to investigate using the Bitcoin transaction graph. Following from this, we produced alternative designs in an attempt to match the data needs of the economists. Subsequent meetings were organized where the designs were presented to the economists for feedback. That feedback was used to refine the features and designs of the visualization tool, which I in turn used to modify the design of the data warehouse. A demo of the tool was given at a conference⁴ and we used the user feedback to further modify the designs. This feedback and refinement process continued iteratively until we produced a close-to-final product that met the needs of the end users.

1.4 Contribution

Previous attempts at performing economic analysis on the Bitcoin transaction graph have lacked comprehensive support for visual-aided analysis. Some of those with a visual element have lacked an interactive component. This thesis describes a structure for building a data warehouse that supports interactive investigation of different economic phenomena associated with Bitcoin. Also included in this thesis is a preview-then-detail approach that improves the interactivity of the tool when certain ad hoc queries are sent to the data warehouse. This is necessary because the visualization tool being developed by the other student allows users to request data which cannot be supplied by the slice, dice and drill-down operations provided by a standard data warehouse. In order to keep the visualization tool interactive even when unplanned queries are executed, novel querying techniques were needed.

1.5 Thesis Organization

The remainder of the thesis is divided as follows: Section 2 discusses previous research efforts into analyzing and visualizing the Bitcoin transaction graph. Section 3 covers the methods I used in the design process for the visualization tool. Section 4 introduces the data warehouse I designed as a solution to the needs discussed in the Introduction and explains my research contributions. In Section 5, I evaluate the solution I designed and discuss potential future improvements. Finally, Section 6 summarizes the work presented in the thesis.

³ Réseaux Innovation Territoires et Mondialisation (Networks, Innovation, Space, and Globalization)

⁴ La 6e édition de l’Atelier sur la Protection de la Vie Privée, 2015, Mosnes


Chapter 2

Related Work

As Bitcoin has gained traction over the last six years, the level of interest from researchers seeking to understand what it is and how it operates has increased as well. As a result, some analysis has been done on the Bitcoin transaction graph to try and explain some of its interesting features. A few attempts have also been made to build tools to aid in this analysis. This section reviews some of the most important existing work in this area.

2.1 User Clustering

One of the biggest challenges to performing any kind of analysis on the Bitcoin transaction graph is the pseudo-anonymity it provides its users. The public ledger only contains addresses which represent sources and destinations in a transaction. Each address is a randomly generated, case-sensitive string of 26-35 alphanumeric characters¹. Within the transaction graph, no association is made between an address and a real world person or entity.

Since a user can, and is encouraged to, use different addresses rather than reusing an address [Ron 2013, Reid 2013], identifying which addresses belong to a single user is complicated. Identifying users is important because much of the economic analysis to be performed depends on distinguishing between users. For example, to determine the flow of the currency, we need to know if the bitcoins being sent between two addresses constitute an economic transaction, or just a transfer of funds from one address to another of the same user. Without being able to determine which addresses belong to the same user, the size of economic activity in the transaction graph will be grossly overestimated.

Within the domain of privacy research, a lot of work has gone into examining the level of anonymity provided by the Bitcoin protocol. While this thesis is not focused on de-anonymization, I discuss and use some of the techniques suggested for clustering addresses into single users/entities.

In 2013, Fergal Reid and Martin Harrigan carried out an analysis of anonymity in the Bitcoin system [Reid 2013]. Their aim was to show that the use of randomly generated addresses would not provide complete anonymity for users within the Bitcoin system. To do this, they built a network graph of transactions within the public ledger. A user network graph was also built showing how bitcoins moved from one user to another. To build the user graph, they exploited a by-product of a feature of Bitcoin transactions.

¹ https://en.bitcoin.it/wiki/Address

In the Bitcoin system, addresses are made up of a public and private key pair. A user can only transfer bitcoins from an address if he/she possesses the private key for that address. As explained in Section 1.2, a user can send money from multiple addresses in a single transaction. Since a user must possess the private keys for all the source addresses in a transaction, we can link all addresses which are combined as inputs in a transaction. By using this method, Reid and Harrigan were able to contract the network of addresses into clusters in which all addresses belong to a single user/entity. They went on to compare the user graph with the transaction graph and information scraped from websites such as Bitcoin forums. They used this information to de-anonymize some addresses and trace an allegedly stolen 25,000 bitcoins.

Another strategy they proposed for clustering addresses was to examine transactions over an extended period of time and look for addresses which tend to be used around the same times. Although this offers the possibility of further clustering addresses, the degree of uncertainty with this approach would result in addresses being added to wrong clusters, thereby offsetting any gains. Consequently, in this thesis, I decided to use the first approach for creating clusters.

Another approach for augmenting the clusters created is described by Michele Spagnuolo in his paper on BitIodine [Spagnuolo 2014]. Building on the work done by Reid and Harrigan in clustering input addresses, Spagnuolo suggests a basis on which to add output addresses to a cluster. As was shown in Figure 1.1b, sometimes the combined balance of the input addresses exceeds what the user wishes to transfer to another user. In that case the remainder is placed into a "change" address which is controlled by the original user. Unlike the case with the input addresses, deciding which of the output addresses is the change address is not deterministic. Spagnuolo outlines a heuristic for making that decision.

For the majority of users, the choice of change address is made by a piece of software which has been programmed to choose a new address as a change address. Therefore, we can infer that, of the output addresses, the change address has to be one which has never been seen in the transaction history until that point. With this heuristic, Spagnuolo is able to add more addresses to the clusters generated using the input address heuristic. This approach is not as reliable, however, and carries the potential to create false positives. In other words, it is possible that an output address which has not been seen before may belong to a different user than the one controlling the input addresses. In Section 4.3.1, I discuss an extended clustering heuristic that minimizes this issue.
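The input-address linking described above can be sketched with a union-find structure. This is an illustrative sketch rather than the implementation used by Reid and Harrigan; the function name cluster_by_shared_inputs and the example addresses are hypothetical, and each transaction is reduced to its list of input addresses.

```python
def cluster_by_shared_inputs(transactions):
    """Group addresses that appear together as inputs of any transaction."""
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:  # e.g. newly generated coins have no input address
            continue
        first = inputs[0]
        find(first)  # register addresses that are only ever used alone
        for other in inputs[1:]:
            union(first, other)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical input-address lists, one per transaction
txs = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
clusters = cluster_by_shared_inputs(txs)
print(clusters)  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Note that A and C end up in the same cluster even though they never appear in the same transaction: both are linked through B, which is what lets the heuristic contract long chains of addresses into a single entity.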
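A minimal sketch of the "never seen before" rule follows. It assumes transactions are processed in chronological order and, as an additional assumption not stated above, only makes a guess when exactly one output address is new; the function names and example addresses are hypothetical.

```python
def guess_change_address(outputs, seen_addresses):
    """Return the single never-before-seen output address, or None if ambiguous."""
    fresh = [addr for addr in outputs if addr not in seen_addresses]
    return fresh[0] if len(fresh) == 1 else None


def find_change_addresses(transactions):
    """Walk (inputs, outputs) pairs in chronological order and record each guess."""
    seen = set()
    guesses = []
    for inputs, outputs in transactions:
        seen.update(inputs)
        guesses.append(guess_change_address(outputs, seen))
        seen.update(outputs)
    return guesses


txs = [
    (["A"], ["B", "C"]),  # both outputs are new: ambiguous
    (["C"], ["B", "D"]),  # only D is new
    (["D"], ["B", "C"]),  # no new output
]
guesses = find_change_addresses(txs)
print(guesses)  # [None, 'D', None]
```

The first transaction shows where false positives come from: when a payee also uses a fresh address, the rule alone cannot tell payment from change, so this sketch declines to guess.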

Using the approaches mentioned in the above papers it is possible to create clusters and thereby identify users/entities within the transaction graph, in the process overcoming the challenge created by anonymization of addresses in the public ledger.

Cluster Size        Number of Clusters
1                   2,214,186
2-10                234,015
11-100              12,026
101-500             499
501-1,000           35
1,001-5,000         41
5,001-10,000        5
10,001-50,000       5
50,001-100,000      1
100,000<            1

Table 2.1: The distribution of the number of addresses per cluster. Table taken from [Ron 2013, p.6]

2.2 Economic Analysis on the Transaction Graph

With the ability to identify clusters, researchers were able to study economic features of the Bitcoin network. One of the first attempts to analyze the Bitcoin transaction graph was made by Dorit Ron and Adi Shamir. In their 2013 paper [Ron 2013], they processed the transaction graph and, by analyzing various statistical properties, were able to answer some interesting questions about the behaviour of users of Bitcoin.

One of the questions they examined was how users of the system were using their bitcoins. They wanted to know if bitcoins were being used in transactions or whether they were being stored in "savings accounts". To do this, they summed up the balances of all addresses that had been active until three months before their cut-off date. They also calculated all the bitcoins that were in existence up until their cut-off date. By comparing the two, they discovered that between 51% and 55% of all the existing bitcoins were stored in addresses which had not been active in over three months, suggesting that the majority of bitcoins were being stored rather than used.

Ron and Shamir were also able to show that several economic metrics of the transaction graph have long tail distributions. We can see this in Tables 2.1 to 2.3, where we are able to observe the cluster size, current balance, and number of transactions and their distribution with respect to the number of addresses and clusters.
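The comparison described above can be sketched as a simple ratio. This is an illustration under stated assumptions, not Ron and Shamir's code: the total supply is approximated by the sum of all address balances, "three months" is taken as 90 days, and the balances, dates, and the function name dormant_share are hypothetical.

```python
from datetime import date, timedelta


def dormant_share(addresses, cutoff, dormancy_days=90):
    """Fraction of all bitcoins held by addresses last active before the threshold."""
    threshold = cutoff - timedelta(days=dormancy_days)
    total = sum(balance for balance, _ in addresses)
    dormant = sum(
        balance for balance, last_active in addresses if last_active < threshold
    )
    return dormant / total if total else 0.0


# Hypothetical (balance in BTC, date of last activity) pairs
addresses = [
    (50, date(2012, 6, 1)),
    (30, date(2013, 5, 1)),
    (20, date(2013, 1, 15)),
]
share = dormant_share(addresses, cutoff=date(2013, 5, 13))
print(share)  # 0.7
```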

Current Balance (in bitcoins)   Number of Clusters   Number of Addresses
0 ≤ 0.01                        2,097,245            3,399,539
0.01 ≤ 0.1                      192,931              152,890
0.1 ≤ 10                        95,396               101,186
10 ≤ 100                        67,579               68,907
100 ≤ 1,000                     6,746                6,778
1,000 ≤ 10,000                  841                  848
10,000 ≤ 50,000                 71                   65
50,000 ≤ 100,000                5                    3
100,000 ≤ 200,000               1                    1
200,000 ≤ 400,000               1                    1
400,000 <                       0                    0

Table 2.2: The distribution of bitcoins per cluster and address as of May 13 2013. Table taken from [Ron 2013, p.9]

This long tail distribution also applies to the transaction size and accumulated income of clusters. Though the metrics are presented in tabular form in the paper, the concept of the long tail distribution would be much easier and quicker to present in graphical form. An interactive tool would also allow researchers to modify the boundaries of the discrete groupings (e.g. 1,000 ≤ 5,000) that were chosen in this paper.
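To illustrate what adjustable grouping boundaries could look like, the sketch below counts values per bin for caller-supplied lower bounds. The cluster sizes are made-up data and the function name bin_counts is hypothetical; the last bin is treated as open-ended.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per bin: bin i covers edges[i] <= v < edges[i + 1]; the last bin is open-ended."""
    counts = [0] * len(edges)
    for value in values:
        index = bisect_right(edges, value) - 1
        if index >= 0:  # values below the first edge are ignored
            counts[index] += 1
    return counts


cluster_sizes = [1, 1, 1, 2, 7, 15, 120, 1500]  # hypothetical data

fine = bin_counts(cluster_sizes, [1, 2, 11, 101])
coarse = bin_counts(cluster_sizes, [1, 10, 1000])
print(fine)    # [3, 2, 1, 2]
print(coarse)  # [5, 2, 1]
```

Re-running the same data with different edges is the operation an interactive tool would expose, for example through a slider or editable axis.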

Another economic analysis of Bitcoin can be found in the third of the Bank of England’s quarterly bulletins for 2014 [Ali 2014]. The Bank of England carried out research and analysis on digital currencies using Bitcoin as a representative. The authors of the article sought to assess the extent to which digital currencies are used as money. They define money using economic theory as that which serves three purposes: a store of value, a medium of exchange, and a unit of account. The authors state that the short term volatility in prices makes digital currencies such as Bitcoin a poor short term store of value. However, they go on to note that "...the worth of bitcoin as a medium or long-term store of value, however, depends on the strength of demand over time..." [Ali 2014].

Although Bitcoin is not currently a widely used medium of exchange, its future widespread adoption will be affected by the transaction fees on the platform. The authors show in a thought experiment that Bitcoin’s fixed money supply will in the long run lead to higher transaction fees [Ali 2014]. Evaluating statements such as these is difficult to do directly from the transaction graph. An interactive visualization tool would be extremely useful in studying the transaction graph to examine user demand, price activity, and transaction fees, and to compare how those have varied over time.

Transaction Volume     Number of Clusters   Number of Addresses
1 ≤ 2                  557,783              495,773
2 ≤ 4                  1,615,899            2,197,836
4 ≤ 10                 222,433              780,433
10 ≤ 100               55,875               228,275
100 ≤ 1,000            8,464                26,789
1,000 ≤ 5,000          287                  1,032
5,000 ≤ 10,000         35                   51
10,000 ≤ 100,000       32                   24
100,000 ≤ 500,000      7                    3
500,000                1                    2

Table 2.3: The distribution of transaction volume per cluster and address as of May 13 2013. Table taken from [Ron 2013, p.9]

2.3 Visualizations of the Transaction Graph

As stated in the previous section, visual tools can be beneficial for understanding the processes around Bitcoin. Visualizations of the Bitcoin transaction graph typically come in one of three forms:

Chart Views
    This refers to views in which a chosen metric is plotted with respect to some variable of interest (usually time). These can be bar charts, line charts, or scatter plots, among others. A frequently occurring example of this view is a graph showing the variation of transaction volume over time.

Overview Graph Views
    In this view, clusters of addresses in the transaction graph are the focus. This is because, with such a large amount of data, a graph visualization of all addresses and transactions would be indecipherable. An overview graph visualization may therefore show how the clusters interact with each other over a period of time. An alternative version of this view could be a visualization that focuses on how bitcoins are distributed within the cluster.

Detail Graph Views
    Detail views are usually used to depict details within the transaction graph. Once again, since displaying the whole graph would make it impossible to track any level of detail, detail views focus on a subsection of the transaction graph. A typical visualization in this category would be one that focuses on a set of bitcoins and tracks how they get dispersed within the network.


Figure 2.1: Chart showing a strong correlation between transaction fees per block (in black) and the price of bitcoins (in red). The smoothed curve shows that total fees have stabilized at about 45 USD per block in 2014. Image taken from [Möser 2015, p.8].

2.3.1 Chart Views

Creating chart visualizations for the Bitcoin transaction graph can be a resource intensive process. It involves traversing the entire graph and aggregating the desired metric (for example transaction fees) at each node. This was the approach taken in the 2015 paper, "Trends, Tips, Tolls: A Longitudinal Study of Bitcoin Transaction Fees" [Möser 2015]. Möser and Böhme were interested in how transaction fees had developed and changed over time as well as how transaction fees affected the speed with which transactions were processed. To do this, they imported the transaction graph into an instance of the Neo4j graph database and then extracted the transaction amounts and fees. From the website blockchain.info, they gathered additional data on the transaction processing time for a randomly chosen sample of 9000 transactions. The visualizations they produced revealed a couple of insights:

The authors plotted the price of bitcoins in US dollars and the transaction fee per block (a block is a group of transactions that are processed at the same time) in US dollars. This is shown in Figure 2.1. The graph showed a strong correlation between the value of transaction fees per block and the price of bitcoins. The authors concluded that, with regard to transaction fees, bitcoins rather than US dollars seem to be the unit of account. They also studied the distribution of transaction fees per transaction and how that had varied with time. As shown in Figure 2.2, the majority of transaction fees belong to one of a few values (0, 0.0001, 0.0002, 0.0005, 0.001, 0.01). These values tend to coincide with the update of the software used by participants on the Bitcoin network. The authors state that these values are the default transaction fees provided in the software. This would suggest that the majority of users do not actually set transfer fees or strategically use the ability to set transfer fees.

Figure 2.2: Distribution of transaction fees (in bitcoins). Image taken from [Möser 2015, p.9].
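The per-block fee aggregation discussed in this subsection can be sketched in a few lines. This is not the Neo4j-based pipeline used by Möser and Böhme; it is a plain in-memory illustration in which the fee of a transaction is the total of its input values minus the total of its output values. The data layout, the satoshi amounts, and the function name fees_per_block are hypothetical.

```python
from collections import defaultdict


def fees_per_block(transactions):
    """Sum transaction fees by block height; a fee is total inputs minus total outputs."""
    totals = defaultdict(int)
    for block_height, input_values, output_values in transactions:
        totals[block_height] += sum(input_values) - sum(output_values)
    return dict(totals)


# Hypothetical transactions: (block height, input values, output values) in satoshis
txs = [
    (100, [50_000], [30_000, 19_000]),
    (100, [20_000, 5_000], [24_500]),
    (101, [10_000], [10_000]),
]
block_fees = fees_per_block(txs)
print(block_fees)  # {100: 1500, 101: 0}
```

Working in integer satoshis avoids floating point rounding when many small fees are summed.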

Another use of a chart view can be found in the paper by Ober, Katzenbeisser and Hamacher, where they plot a line chart of the number of addresses created on the network, the number of addresses used, and the number of entities on the network (Figure 2.3) [Ober 2013]. Using the same clustering technique proposed by Reid and Harrigan, they are able to group addresses into clusters, treating each cluster as an economically active entity. The chart shows clearly that the ratio of addresses to entities has dropped from over 100:1 to little more than 2:1, implying that the Bitcoin system has changed from having a few users with hundreds of addresses to having hundreds of thousands of users with one or two addresses each. The drastic reduction in the address-entity ratio is marked by the leftmost line on the chart and coincides with the start of public trading of bitcoins. The launch of Bitcoin exchanges which allowed the trade of bitcoins resulted in the expansion of Bitcoin’s user base from technology enthusiasts to the more mainstream population who acquire bitcoins without requiring knowledge of the technical details or software and hardware to operate on the network.

Figure 2.3: Number of all public keys, used public keys and entities. Image taken from [Ober 2013, p.240]

A final example of a chart view is a scatter plot of user activity versus price of bitcoins. This chart can be found in the paper "Exploring the Bitcoin Network" [Baumann 2014], in which the authors show that increased prices of bitcoin coincided with increased activity by users on the network. The authors used this to suggest that users were engaging in financial speculation with bitcoins.

2.3.2 Overview Graph Views

Creating graph views of the Bitcoin transaction graph is a difficult challenge. With the numbers of addresses and transactions in the tens of millions, a direct visualization of the transaction graph is impossible to decipher. One way of dealing with this is to cluster addresses and transactions according to entities. Reid and Harrigan used this approach to create a graph visualization [Reid 2013]. The clustering algorithm they used is explained in Section 2.1.

In their paper, they examine anonymity in the Bitcoin system using donations to Wikileaks as a case study. After aggregating the data in the transaction graph they created the egocentric visualization in Figure 2.4. In this graph, the Wikileaks cluster is at the center and other entities that have made donations to that address are shown in varying colors. The authors attempted to use the ’warmth’ of a color to differentiate the volume of bitcoins those entities have handled. The size of each cluster is proportional to the number of addresses it has. The visualization also shows interactions of the clusters with each other, if any exist.


Figure 2.4: Egocentric visualization showing the Wikileaks address and entities that have donated bitcoins to Wikileaks. Image taken from [Reid 2013, p.19]

The Bitcoin Big Bang² is another graph-based visualization of the transaction graph. The creators of this visualization attempted to show "the emergence over time of the largest entities on the Bitcoin blockchain, and their interconnectivity". This graph view (Figure 2.5) also uses the size of a node to reflect the total volume of bitcoins used by the node, while using concentric circles to show the year in which each node first showed up in the transaction graph. The thickness of a connecting line is proportional to the volume of bitcoins in the transactions represented by that line.

2.3.3 Detail Graph Views

Economic research on the Bitcoin transaction graph can also be aided by detail views. In this case, overall views of the transaction graph are replaced with localized views. This was the case when Ron and Shamir tried to track transactions with a value greater than 50,000 bitcoins [Ron 2013]. The visualization produced showed addresses that had taken part in those transactions and how the bitcoins had been transferred between them.

An interactive version of a detail view can be found at CoinAlytics³. This visualization tool allows a user to select a particular address and track the flow of bitcoins through the transaction graph. The interactive nature of the tool means a user can decide which area of the subgraph to view in more detail (Figure 2.6).

² https://www.elliptic.co/bigbang-v1.html
³ http://coinalytics.co/tools/tracker.html


Figure 2.5: Entity interactions between a known Bitcoin exchange and other entities. Image from Bitcoin Big Bang (https://www.elliptic.co/bigbang-v1.html).

2.4 Visualization Framework

A complete tool that supports visualizations of the Bitcoin transaction graph should allow for these three kinds of views. One of the more comprehensive attempts at creating a visualization framework is BitIodine [Spagnuolo 2014]. BitIodine aims to organize and classify the Bitcoin transaction graph to make analysis more meaningful. Its creator, Michele Spagnuolo, describes six main modules in its architecture. A block parser module processes the transaction graph, stores it in a relational database, and updates it with new data. A clusterizer module uses the heuristics identified by Reid and Harrigan [Reid 2013] to group addresses into clusters. A scraper module scrapes publicly accessible websites and uses their contents to generate a list of possible cluster identifiers. The classifier module uses the data generated by the scraper module to label the clusters created by the clusterizer. A grapher module recreates a transaction graph from the relational database to aid in transaction tracking, and finally an exporter module allows portions of the transaction graph to be exported for use in other software.

The way the data is structured in BitIodine’s relational database makes it more suitable for tracking individual transactions (detail view) and transactions between/within clusters (graph view) than for extracting global aggregate metrics (chart view). Spagnuolo demonstrates this capability by creating visualizations that track the movement of money on the Silk Road, a large bitcoin black market. As this is a visualization framework, BitIodine does not focus on creating visualizations itself but rather on providing the data needed to make visualizations possible.

Figure 2.6: Sub-graph taken from the Bitcoin transaction graph. Image from CoinAlytics Tracker Beta (http://coinalytics.co/tools/tracker.html).

I used the same approach in this thesis, focusing on creating a data warehouse from which visualizations can be made quickly and easily. A fellow research student created visualizations using the data infrastructure I designed. In designing the data warehouse, I took into consideration the three types of views to support. This meant my solution benefited from the speed and efficiency data warehouses have in calculating aggregate metrics for chart views but was also tailored to support graph and detail views. The next section describes the procedure by which the metrics to be included in the data warehouse were elicited from the end users.


Chapter 3

Requirement Gathering

In designing the data warehouse for the visualization tool, I followed the interaction design lifecycle model [Sharp 2007]. This model describes iterative stages to continually refine the understanding of the needs of the end users and adjust the design accordingly. The iterative nature of the model is shown in Figure 3.1. I chose this model because the end users made it clear that, as they did not have a good understanding of Bitcoin and how it operates, they were not sure what metrics to look for in their study.

Given the level of uncertainty, a design approach that encouraged continual feedback would be more appropriate than a linear approach with defined stages of requirement gathering and design. A continual feedback approach would allow the design to rapidly adjust to the requirements as they evolved. Therefore, I held multiple meetings in collaboration with a team of economists to gather requirements to shape the design of the data warehouse and evaluate the extent to which it met those requirements.

The team of economists I worked with consisted of researchers at the RITM¹ research lab. Their domains of research cover topics such as the economics of privacy, economics of networks, economics of innovation and ICT, among others. Also included in the team was a former consultant for the International Monetary Fund.

3.1 Task Descriptions

After the preliminary meeting held with the economists in February 2015, we proceeded to generate initial task descriptions [Diaper 2003, Sharp 2007] to identify what economic phenomena they would be interested in studying. Task descriptions are an informal way of representing a task that a user wants to perform. I used them as a means to elicit the metrics that I would need to extract from the transaction graph. The task descriptions also served as a means of evaluating whether the data warehouse design met the needs of the economists.

¹ Réseaux Innovation Territoires et Mondialisation (Networks, Innovation, Space, and Globalization)

[Diagram: a cycle linking the stages "Identify needs/establish requirements", "(Re)Design", "Build an interactive version" and "Evaluate", leading to "Final Product"]

Figure 3.1: A simple interaction design lifecycle model. Image taken from [Sharp 2007, p.448]

One economic question stated in a task description was whether Bitcoin behaves more like a currency or a commodity. To answer it, an economist would look at the rate at which bitcoins are used in transactions to find out whether people are storing or using them. One way to find out if people are storing their bitcoins would be to check the average age of bitcoins within the network. An economist would look at the network at a point in time and, for each transaction, calculate the number of days since the coins were last used. This would be done for all transactions at that point in time to produce the average age of bitcoins. The economist would do this for different points in time and then plot a graph of the average age versus time. If the average age of bitcoins is rising, it would suggest most people are acquiring and storing rather than spending.

An economist could also look at the distribution of the age of the bitcoins involved in transactions and examine how that distribution varies with price. The economist would create discrete groupings of age ranges (e.g. 0-7 days) and count the frequency of transactions using bitcoins of that age. The economist would calculate this distribution for different days. By comparing the distribution plots for days with different prices, the economist can assess whether users store their bitcoins during periods of high prices or if they are not responsive to differences in price.
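The two age-based task descriptions above can be sketched as follows. This is a minimal illustration rather than the procedure used in the thesis: the record layout (a pair of spend date and last-use date), the sample data and the function names are assumptions, and the bins are treated as half-open intervals so that the 0-7 day grouping becomes the interval [0, 8).

```python
from datetime import date
from statistics import mean

# Hypothetical records: (date the coins are spent, date they were last used).
spends = [
    (date(2015, 3, 1), date(2015, 2, 27)),
    (date(2015, 3, 1), date(2014, 12, 1)),
    (date(2015, 3, 2), date(2015, 3, 1)),
]


def coin_ages(records):
    """Group the age in days of spent coins by the day on which they are spent."""
    ages = {}
    for spent_on, last_used in records:
        ages.setdefault(spent_on, []).append((spent_on - last_used).days)
    return ages


def average_age_by_day(records):
    """Average coin age per day, i.e. the series an economist would plot over time."""
    return {day: mean(values) for day, values in coin_ages(records).items()}


def age_histogram(ages, edges):
    """Count ages falling in the half-open bins [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for age in ages:
        for i in range(len(edges) - 1):
            if edges[i] <= age < edges[i + 1]:
                counts[i] += 1
                break
    return counts
```

Comparing `age_histogram` results for days with different market prices corresponds to the second task description.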

From an initial task description such as the one above, I was able to deduce that metrics such as coin age and Bitcoin market price would be needed in the data warehouse. The initial list of metrics we generated can be seen in Figure 3.2.


Metric                Aggregation                        Filter Categories
Age of Coins          Avg, Max, Min                      Cluster, Time, Discrete Grouping
Transaction Value     Sum, Max, Min, Avg                 Cluster, Time, Discrete Grouping
Transaction Volume    Sum, Avg, Max, Min                 Cluster, Time
Transaction Fee       Sum, Avg, Max, Min, Non-zero Min   Cluster, Time, Discrete Grouping
Market Price          Avg, Max, Min                      Time
Account Balance       Avg, Max, Min                      Cluster, Time
Input Address Count   Avg, Max, Min                      Cluster, Time, Discrete Grouping
Output Address Count  Avg, Max, Min                      Cluster, Time, Discrete Grouping

Figure 3.2: Initial list of metrics produced after the first meeting with the economists

3.2 Prototype Demo

In keeping with the iterative nature of the design model I had chosen, I designed and implemented an early prototype of the data warehouse. This was combined with a prototype of the visualization tool being developed by another student and presented at the 6th Edition of the Workshop on the Protection of Privacy² in June 2015. While the conference was not focused on economic issues, there was a lot of interest in the Bitcoin network because of the measures it implements to protect the identities of users who make transactions on the network. Attendees of the conference who viewed the demo expressed an interest in being able to view the economic metrics not only with respect to time but with respect to users as well. Another feature that was highly requested was the functionality to go beyond aggregated metrics and view details at an individual transaction level. This feedback was useful in validating certain design decisions that I took in designing the data warehouse.

3.3 Card Sorting Exercise

Another means by which I was able to gather further requirements from the economists was through a card sorting exercise [Nielsen 1995]. In a subsequent meeting with the economists, the other research student and I presented the economists with small cards we had prepared. On each card we had written a variety of features related to the visualization tool to be developed. Some of these features were more related to the data analysis (such as the metrics to be calculated), other features were more related to the presentation of the data (such as graph types), and yet other cards held features related to the interface and interactivity of the tool (such as annotation or filtering).

Figure 3.3: Top row: Economists taking part in the card sorting exercise. Bottom row: Card groups created by participants in the card sorting exercise.

² La 6e édition de l’Atelier sur la Protection de la Vie Privée, 2015, Mosnes

We asked the economists to make groups out of the cards according to whatever criteria made sense to them (see Figure 3.3). We also provided extra cards so they could add features that came to mind that we may have overlooked. The reasoning behind this exercise was to get better insight into how the end users understood the tool and how they wanted to use it. We used the groups of concepts they created (Appendix A) to provide structure to the design of the visualization tool. That structure also helped reveal features which were of importance to the economists but were not yet included. For example, when we discussed the resulting groups with the economists, we discovered that although aggregations (sum, minimum, maximum, average) of the transaction fees for all transactions were provided for, they were also interested in being able to view these aggregations for only those transactions that actually had a transaction fee (non-zero transaction fees).

Another insight we gained through this card sorting exercise was the level of granularity at which the economists wanted to view the metrics. With respect to time, they were interested in daily, weekly and monthly aggregations, while with respect to clusters they were interested in aggregations per cluster and per address in a cluster. This feedback was extremely useful in updating the design of the data warehouse, which is explained in detail in Section 4.2.


Chapter 4

Solution

Based on the requirements I gathered, I decided to implement a data warehouse to power the visualization tool. A data warehouse is naturally suited to examining multiple metrics across different categories. It stores information in fact tables and dimension tables. Fact tables are tables that store quantitative data for chosen metrics at a desired level of granularity (for example storing transaction values and transaction fees at the transaction level).

The metrics in the fact table can be analyzed and filtered across different categories (for example time, input address). These categories are stored in dimension tables. Through the use of special queries known as MultiDimensional Expressions (MDX), the metrics in the fact table can be aggregated and analyzed according to the needs of the user [Kimball 2011]. This made a data warehouse a logical choice for use in a visualization tool that was aimed at supporting exploratory studies of the Bitcoin transaction graph.
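A minimal sketch of the fact/dimension split may help here. It assumes a hypothetical two-table schema (`fact_transaction`, `dim_time`) with made-up rows, and it uses a plain SQL `GROUP BY` through Python's `sqlite3` module in place of MDX; none of the table or column names are taken from the schema in Appendix B.

```python
import sqlite3

# In-memory database holding one hypothetical dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE fact_transaction (
    tx_id   INTEGER PRIMARY KEY,
    time_id INTEGER REFERENCES dim_time(time_id),
    value   REAL,
    fee     REAL
);
INSERT INTO dim_time VALUES (1, '2015-03-01', '2015-03'), (2, '2015-03-02', '2015-03');
INSERT INTO fact_transaction VALUES
    (1, 1, 10.0, 0.0001), (2, 1, 2.5, 0.0), (3, 2, 40.0, 0.0005);
""")


def daily_totals(connection):
    """Aggregate the fact table along the day level of the time dimension."""
    query = """
        SELECT d.day, COUNT(*), SUM(f.value), MAX(f.fee)
        FROM fact_transaction AS f
        JOIN dim_time AS d ON d.time_id = f.time_id
        GROUP BY d.day
        ORDER BY d.day
    """
    return connection.execute(query).fetchall()
```

Grouping by `d.month` instead of `d.day` would give the coarser level of the same dimension without touching the fact table, which is the property that makes this layout convenient for exploratory queries.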

4.1 Data Extraction

To get the most up-to-date data, I installed the Bitcoin client software, which downloaded the transaction graph into data files. As this format was not useful for exploring the transaction graph, I used Bitcoin-Abe¹, a tool which processes the data files and exports them into an SQL database. The export procedure generated 120GB of data, which we stored in a MariaDB database using the InnoDB storage engine.

The data consisted of approximately 62 million transactions and 65 million addresses from Bitcoin’s launch in January 2009 to March 3rd 2015, the date on which I downloaded the transaction graph. I then wrote scripts in Python to extract this information into temporary tables where the desired metrics were calculated and imported into the data warehouse. The design of the data warehouse is explained below.

¹ https://github.com/bitcoin-abe/bitcoin-abe

4.2 Data Warehouse Design

In Figure 4.1 and Appendix B, the conceptual and logical designs for the data warehouse are displayed in detail. I chose four fact tables to store different metrics at different granularities. The Transaction fact table was used to store metrics at the transaction level, such as the transaction value and transaction volume. Through the use of filters and calculated measures, this fact table was also used to support queries on metrics such as the transaction fee to transaction value ratio. This metric is useful for comparing the cost of transactions between Bitcoin and other payment platforms.

Figure 4.1: Conceptual model of the data warehouse showing the fact tables and dimensions.

The Sub Transaction fact table was used to keep track of metrics that deal with the sub-components of a transaction, that is, the different inputs that make up a transaction. This covered metrics such as the age and value of the bitcoins in that input. In order to track the account balances of different addresses over time, I included an Address Instance fact table which records the account balance of each address every time the transaction graph is modified. To keep the size of the fact table manageable, only addresses with a non-zero balance were included. Finally, I stored the price information in a fact table of its own.
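As a rough illustration of how rows for an Address Instance fact table could be derived, the sketch below keeps a running balance per address and emits a snapshot row only when the resulting balance is non-zero. The event layout (block height, address, signed amount in satoshis) and the function name are assumptions made for this example, and the sketch writes a row only for the address whose balance changed, which is one possible reading of the description above.

```python
def address_instances(events):
    """Build balance snapshot rows from balance-changing events.

    events: iterable of (block_height, address, delta) in block order, where
    delta is a signed integer amount (hypothetically in satoshis).
    Returns (block_height, address, balance) rows, skipping zero balances.
    """
    balances = {}
    rows = []
    for height, address, delta in events:
        balances[address] = balances.get(address, 0) + delta
        # Addresses that have been emptied are left out to limit table size.
        if balances[address] != 0:
            rows.append((height, address, balances[address]))
    return rows
```

Integer amounts are used so that a fully spent address compares exactly equal to zero.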

From the interaction with the economists, I chose two main dimensions for exploring the data. One of these, the time dimension, is arranged in a hierarchy with a block being the lowest level. A block in the Bitcoin transaction graph is a point in time at which transactions were processed and added to the transaction graph. On the Bitcoin network, transactions are not processed individually, but grouped into a block and processed in bulk. There is also a cluster dimension. This dimension contains all the input and output addresses that can be found in the transaction graph. Using an extended clustering heuristic, I was able to group these into clusters which represent users/entities. These can all be seen in Figure 4.1.

4.3 Entity Clustering

One of the user requirements was to be able to perform analysis on the entities. In the Bitcoin transaction graph, addresses can be grouped into clusters that belong to a single entity. To create these entities I had to cluster the addresses in the transaction graph. Using the work done on user clustering by Reid and Harrigan, and Spagnuolo (see Section 2.1) as a basis, I extended their heuristics in an attempt to assign more addresses to clusters. Their two heuristics can be summarized as follows:

1. If multiple addresses are used together as inputs in a single transaction, those addresses belong to the same cluster.

2. If there are multiple output addresses, the one which has not yet been seen in the transaction graph can be considered the change address and therefore part of the same cluster as the input addresses.
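The two heuristics above can be sketched with a disjoint-set (union-find) structure. This is an illustrative reading, not the implementation of Reid and Harrigan or Spagnuolo: the transaction layout (a pair of input and output address lists, in graph order) and all names are assumptions, and the second heuristic is applied only when exactly one output address is unseen.

```python
class UnionFind:
    """Minimal disjoint-set structure keyed by address string."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            # Path halving keeps later lookups short.
            self.parent[item] = self.parent[self.parent[item]]
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_addresses(transactions):
    """transactions: list of (inputs, outputs) address lists in graph order."""
    clusters = UnionFind()
    seen = set()
    for inputs, outputs in transactions:
        # Heuristic 1: all input addresses of a transaction share a cluster.
        for address in inputs:
            clusters.union(inputs[0], address)
        seen.update(inputs)
        # Heuristic 2: with several outputs, a single unseen output address
        # is treated as the change address of the input cluster.
        unseen = [a for a in outputs if a not in seen]
        if inputs and len(outputs) > 1 and len(unseen) == 1:
            clusters.union(inputs[0], unseen[0])
        for address in outputs:
            clusters.find(address)  # register outputs as their own clusters
        seen.update(outputs)
    return clusters
```

When two or more outputs are unseen, this sketch assigns none of them to the input cluster.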

One challenge with the second heuristic is that sometimes in transactions with multiple output addresses, there is more than one address which has not been seen before. In that case it is impossible to assign any of them to a cluster using the second heuristic. The approach I proposed was designed to assign weights to the output addresses in order to determine which was most likely to be the change address.

4.3.1 Extended Clustering Heuristic

In Figure 4.2, a hypothetical transaction t4 has two input addresses B and C. There are four output addresses A, D, E, and F. A has already been seen earlier in the transaction graph but the other three output addresses have not. The table shows a subsection of the Address dimension table after transaction t4 has been processed. The cluster each address belongs to is displayed, as is the strength of that association as a value between 0 and 1.

Since B and C belong to the same cluster, the strength of their relationship is 1. A has already been seen before and belongs to another cluster. For the output addresses which have not yet been seen in the graph (i.e. D, E and F) we reflect the potential for any one of them to be the change address by assigning them to the same cluster as the input addresses and giving each a weight as a fraction of the unseen output addresses, in this case 1/3. Given that D, E and F may well belong to independent clusters of their own, additional entries for those addresses are made with unique cluster ids and a strength of 1. Using address D as an example, we outline a number of situations that can occur as the address clustering algorithm proceeds through the transaction graph.
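The row-generation step described above can be sketched as follows. The function name, its parameters and the use of a counter for fresh cluster ids are assumptions for illustration; the cluster ids 7 to 10 are the ones shown in Figure 4.2.

```python
from fractions import Fraction
from itertools import count


def rows_for_unseen_outputs(outputs, seen, input_cluster, next_cluster_id):
    """Return (address, cluster, strength) rows for unseen output addresses.

    Each unseen output gets a tentative link to the input cluster with an
    equal share of the weight, plus an independent cluster of its own.
    """
    unseen = [a for a in outputs if a not in seen]
    if not unseen:
        return []
    share = Fraction(1, len(unseen))
    rows = [(address, input_cluster, share) for address in unseen]
    rows += [(address, next(next_cluster_id), Fraction(1)) for address in unseen]
    return rows


# Transaction t4: inputs B and C (cluster 7), outputs A, D, E and F.
rows = rows_for_unseen_outputs(["A", "D", "E", "F"], {"A", "B", "C"}, 7, count(8))
```

Exact fractions are used so that the weights can be compared without floating-point rounding.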


[Diagram: transaction graph with transactions t1, t2, t3 and t4 and addresses A, B, C, D, E and F]

Address  Cluster  Strength
B        7        1
C        7        1
A        4        1
D        7        1/3
E        7        1/3
F        7        1/3
D        8        1
E        9        1
F        10       1

Figure 4.2: Subsection of the address dimension table after transaction t4 has taken place.

Case 1
The algorithm may never encounter the address D again in the remainder of the transaction graph. In this case, the table remains unmodified.

Case 2a - Figure 4.3
Further on in the transaction graph (at a hypothetical transaction t20), the address D may be encountered as an input address together with address B or C. We can conclude that D belongs to the same cluster as B and C and is therefore most likely to be the change address in transaction t4. The table is updated and the entry for D belonging to cluster 7 is given a weight of 1 while the weight of the other output addresses’ relationship to cluster 7 is reduced to 1/3000. I chose 3000 for the denominator of the minimum weight because there is no transaction in the graph with as many as 3000 output addresses. This ensures that 1/3000 will be the lowest weight possible.

Case 2b - Figure 4.4
It is also possible that the address D may be seen later with another address K that was seen at an earlier point in the transaction graph than t4. If this occurs, then D belongs to the same cluster as K and so cannot be the change address for transaction t4. The table is modified to reflect this. The weight of the relationship between D and cluster 7 is reduced to 1/3000. To reflect the increased likelihood of E or F being the change address, the weight of their relationships to cluster 7 is increased from 1/3 to 1/2.

Case 2c - Figure 4.5
Another possibility is that the address D may be seen later with another address Y that was seen at a later point in the transaction graph than t4. While we can conclude that D and Y belong to the same cluster, we still have no further information to determine which of the output addresses of t4 is the change address. Y is therefore added to the table as part of cluster 8, which is the independent cluster for D. If at a later point D is confirmed to be the change address and therefore part of cluster 7, then Y along with any other addresses that form a part of cluster 8 will be added to cluster 7 as well. If, however, D is later confirmed not to be the change address, it will form its own cluster with Y and any other addresses that are subsequently added to cluster 8.

Case 3 - Figure 4.6
Finally, address D could be the output address in a later transaction. If that is the case, then address D is much less likely to be the change address for transaction t4. This is because the official software used for transferring bitcoins on the Bitcoin network does not use existing addresses as output addresses unless a user specifically instructs that it should do so. Therefore if an output address is repeated in the transaction graph, it is much more likely that this is an address specified by a user for receiving payments rather than a change address. In the table, the weight for the relationship between D and cluster 7 is reduced and the weights for addresses E and F are increased.

The evaluation of these cases is done for each of the output addresses whenever they are encountered later in the transaction graph. At the end of processing the entire graph, the output address with the highest weight (if there is only one) is added to the same cluster as the input addresses. Any additional addresses that were added to its independent cluster (as in case 2c) are added to the input address cluster as well. In this way, this extended heuristic is able to reduce the number of disjoint address clusters.
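The weight updates in cases 2a and 2b and the final selection step can be sketched as below. Only the weights linking the candidate addresses to the input cluster are modelled; the function names are hypothetical, and sharing the weight equally among the remaining candidates is an assumption that generalizes the single 1/3 to 1/2 example given in the text. Figure 4.6 shows the same resulting weights for case 3.

```python
from fractions import Fraction

MIN_WEIGHT = Fraction(1, 3000)  # minimum weight chosen in the text


def confirm_change(weights, address):
    """Case 2a: the address was later co-spent with an input of the transaction."""
    return {a: Fraction(1) if a == address else MIN_WEIGHT for a in weights}


def rule_out(weights, address):
    """Case 2b: the address belongs elsewhere, so the remaining candidates
    share the weight equally (an assumed generalization of 1/3 -> 1/2)."""
    remaining = [a for a, w in weights.items() if a != address and w > MIN_WEIGHT]
    updated = dict(weights)
    updated[address] = MIN_WEIGHT
    for candidate in remaining:
        updated[candidate] = Fraction(1, len(remaining))
    return updated


def change_address(weights):
    """Return the unique highest-weight address, or None if there is a tie."""
    best = max(weights.values())
    winners = [a for a, w in weights.items() if w == best]
    return winners[0] if len(winners) == 1 else None


# Weights of D, E and F towards cluster 7 after transaction t4.
initial = {"D": Fraction(1, 3), "E": Fraction(1, 3), "F": Fraction(1, 3)}
```

With a tie, as after a single call to `rule_out`, no address is selected, matching the "if there is only one" condition above.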

If this extended heuristic is implemented on a machine with enough memory to store the entire address table in memory, minimum weights need not be used at all. Those rows can simply be deleted rather than having their weights reduced. If the computer does not have enough memory to hold the complete address table, then each delete will require an update on disk of the index for that table, resulting in a prolonged run time.


Address  Cluster  Strength
B        7        1
C        7        1
A        4        1
D        7        1
E        7        1/3000
F        7        1/3000
D        8        1/3000
E        9        1
F        10       1

[Diagram: transaction graph with transactions t1 to t4 and a later transaction t20 involving addresses D, C and G]

Figure 4.3: Address D is seen later with one of the input addresses.

4.4 Preview-Then-Detail

Another challenge that arose as a result of the requirements was related to the performance of the system when running certain ad-hoc queries. The economists wanted to be able to perform the kind of economic analysis done by Ron and Shamir (see Section 2.2), where the distribution of a particular metric is examined with respect to discrete groupings of another metric. These distributions, such as the one in Table 2.3, could be represented in the form of bar charts.

However, the boundaries for the discrete groupings (e.g. 1≤2, 2≤4) were arbitrarily chosen by the authors. Users of the visualization tool would be able to specify their own boundaries (e.g. 1≤5). Updating the visualization to reflect the new groupings would involve recounting the records in the fact table according to the new groupings. As this could be a time-consuming process, I designed an approach that would allow the visualization tool to temporarily show a preview based on approximated data while the recounting is done and then display the accurate chart upon completion.

Address  Cluster  Strength
B        7        1
C        7        1
A        4        1
D        7        1/3000
E        7        1/2
F        7        1/2
D        2        1
E        9        1
F        10       1
K        2        1

[Diagram: transaction graph with transactions t1 to t4 and a later transaction t20 involving addresses D, K and G]

Figure 4.4: Address D is seen later with an address appearing earlier than t4.

ID  label  min  max  count    mean  std
1   0-10   0    10   560,000  3.8   0.6
2   11-20  11   20   98,000   14.3  0.6
3   21-30  21   30   12,000   21.2  0.9
4   31-40  31   40   2,000    35.1  0.5

Table 4.1: Examples of possible values in the DimTransactionValueGroup table.

To do this, I created several group tables in the data warehouse (e.g. DimTransactionValueGroup, see Appendix B). These tables contained pre-computed statistics for the metric of interest. Using a subsection of the Transaction Value Group table as an example (Table 4.1), the count of all transactions with value 0–10 is recorded along with the mean transaction value of those transactions and the standard deviation. The data from this table could be used to create the initial visualization. If a user of the visualization should decide to change the boundaries of the first grouping in the chart from 0–10 to 0–16, the Preview-then-Detail approach would estimate the number of transactions that fall in this new grouping. All 560,000 transactions from row 1 and some of the transactions from row 2 would be a part of this new grouping. To estimate the number from row 2 to include in the new grouping, I used the following formula:

N = m × n

where N is the number of transactions from row 2 to include in the new grouping, m is the fraction of the transactions in row 2 whose value is less than the new upper boundary of 16, and n is the total count of transactions in row 2.

Address  Cluster  Strength
B        7        1
C        7        1
A        4        1
D        7        1/3
E        7        1/3
F        7        1/3
D        8        1
Y        8        1
E        9        1
F        10       1

[Diagram: transaction graph with transactions t1 to t4 and a later transaction t20 involving addresses D, Y and G]

Figure 4.5: Address D is seen later with an address appearing later than t4.

If the distribution of transaction values followed a Gaussian distribution, it would have been possible to use the 68-95-99.7 rule to estimate m. However, as mentioned in Section 2.2, this metric and many others follow a long tail distribution. To get around this I used Chebyshev’s inequality from probability theory. This states that in any probability distribution with a finite mean and variance, at least 1 − 1/k² of the distribution’s values are within k standard deviations of the mean.

Using the values from Table 4.1, 16 is 2.83 standard deviations from the mean of 14.3. By applying Chebyshev’s inequality, the percentage likely to be less than the upper boundary of 16, m, would be approximately 87.5%. Plugging that into the original formula gives:

N = 0.875 × 98,000 ≈ 85,750

Therefore the new group 0–16 would be estimated to have the 560,000 transactions from row 1 and approximately 85,750 from row 2. This information could be displayed in the visualization tool with some cues given to the user that it is an approximation (e.g. through the use of fuzzy borders of the bars in a bar chart). When the complete recounting is done from the database, the accurate details would then replace the approximation which had been displayed. Through the use of this preview-then-detail approach, the responsiveness of the visualization tool can be improved.
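The estimate above can be reproduced with a short sketch. The function names are hypothetical and a positive standard deviation is assumed. Note that Chebyshev's inequality gives a lower bound on the share of values within k standard deviations on both sides of the mean, so using it for m yields a conservative preview rather than an exact proportion.

```python
def chebyshev_fraction(mean, std, boundary):
    """Lower bound on the share of values within |boundary - mean| of the mean.

    Assumes std > 0. For k <= 1 the inequality gives no information, so 0 is
    returned.
    """
    k = abs(boundary - mean) / std
    if k <= 1:
        return 0.0
    return 1.0 - 1.0 / k ** 2


def estimate_count(count, mean, std, boundary):
    """Preview estimate N = m * n for one pre-computed group."""
    return chebyshev_fraction(mean, std, boundary) * count


# Row 2 of Table 4.1 with the new upper boundary of 16.
m = chebyshev_fraction(14.3, 0.6, 16)
preview_total = 560_000 + estimate_count(98_000, 14.3, 0.6, 16)
```

Because `m` is not rounded to 0.875 before multiplying, the result differs slightly from a hand calculation with the rounded percentage.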

4.5 System Architecture

As mentioned earlier, the visualization tool was developed in collaboration with another student whose focus was on the actual visualization of the data produced from the data warehouse. The architecture of the system can be seen in Figure 4.7. The data warehouse resided on a server running Fedora Linux with 1 terabyte of main memory and 1.5 terabytes of hard disk space. As this server belonged to a research lab, the lab’s security policy did not allow external connections directly to the server.

Address   Cluster   Strength
B         7         1
C         7         1
A         4         1
D         7         1/3000
E         7         1/2
F         7         1/2
D         8         1
E         9         1
F         10        1
P         11        1
Q         11        1

[Diagram: transactions on a timeline at t1, t2, t3, t4 and t20 involving addresses A, B, C, D, E, F, P and Q.]

Figure 4.6: Address D is seen later as an output address.

To deal with this problem, a single port was opened to receive connections from a proxy server. Communication between the two servers was achieved using JSON files. All end-user requests from the visualization tool were transformed into simplified parameters which were transferred to the data server, where an Apache server triggered the scripts needed to retrieve the data for the visualization. The Apache server would then forward the data in the form of a JSON file through the secured port to the proxy server, which would forward it to the individual end user who had initiated the request.

4.6 Visualizations

As stated earlier, I worked with another research student whose focus was to create visualizations from the data produced from the data warehouse. These visualizations covered the three main areas discussed in Section 2.3 (chart views, overview graph views, detail graph views). At the time of writing, the student had not yet completed implementing all the visualizations; however, I have included a few here to demonstrate the tool’s capability.

One of the initial task descriptions described how an economist would go about assessing whether users of Bitcoin were storing up their bitcoins like a commodity (Section 3.1). To do this, the economist would plot one graph showing the variation of the age of bitcoins spent versus time and another graph showing the variation of the price of bitcoins versus time. If users are treating their bitcoins like a commodity, the average age of bitcoins spent should increase during periods when the price of bitcoins is on the rise. This would be because users would be holding back the bitcoins they buy, rather than using them.

[Diagram: Client 1 and Client 2 (JavaScript) connect to a reverse proxy server (8 GB RAM, 50 GB HDD, Node.js). The proxy sends a JSON object containing the query and parameters to the data server (Fedora, 1 TB RAM, 1.5 TB HDD, MariaDB, Apache), which replies with a JSON object containing the data.]

Figure 4.7: System architecture of the visualization tool.

In order to create this chart the user selects the parameters they want to visualize (in this case, price versus time and coin age versus time). This request is relayed through the reverse proxy server using a JSON file. The Apache server receives the file and parses it to determine which fact tables and dimension tables are needed from the data warehouse. A script is then executed to retrieve the data from the data warehouse. The data is returned to the proxy server, which relays it to the client of the end user.
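The request-handling step can be sketched as follows. This is an illustration only: the JSON field names (`metric`, `granularity`, `from`, `to`) and the lookup dictionaries are assumptions made for the example, while the table and column names follow the logical schema in Appendix B. The actual scripts triggered by the Apache server are not reproduced here.

```python
import json

# Hypothetical mapping from a requested metric to a fact table and aggregate.
METRICS = {
    "price": ("FactPrice", "AVG(f.value)"),
    "coin_age": ("FactSubTransaction", "AVG(f.age_of_coin)"),
}

# Hypothetical mapping from a granularity to DimTime grouping columns.
GRANULARITY = {
    "day": "t.date_value",
    "week": "t.year, t.week_no",
    "month": "t.year, t.month",
}


def build_query(request_json):
    """Turn a JSON request into a parameterized SQL string and its parameters."""
    request = json.loads(request_json)
    # Lookups act as a whitelist: unknown names raise KeyError instead of
    # being pasted into the SQL text.
    fact_table, aggregate = METRICS[request["metric"]]
    group_cols = GRANULARITY[request["granularity"]]
    sql = (
        f"SELECT {group_cols}, {aggregate} AS metric_value "
        f"FROM {fact_table} f JOIN DimTime t ON f.time_id = t.time_id "
        "WHERE t.date_value BETWEEN ? AND ? "
        f"GROUP BY {group_cols} ORDER BY {group_cols}"
    )
    return sql, (request["from"], request["to"])


example = json.dumps(
    {"metric": "price", "granularity": "day",
     "from": "2013-04-01", "to": "2013-04-30"}
)
sql, params = build_query(example)
print(sql)
print(params)
```

Keeping user-supplied values in the parameter tuple, and restricting table and column names to fixed lookups, is one way to keep the single open port from exposing arbitrary queries.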

When this chart is plotted with our visualization tool (Fig 4.8), we see that there does seem to be some correlation between the price and the coin age. When the price graph rises and peaks in April 2013, the coin age graph follows a similar pattern. This correlation is weaker but still visible around the second peak in December 2013. This suggests that users do hold on to their bitcoins during periods of price increases.

Another chart mentioned in the task description is the scatter plot showing the variation of the coin age with price. This chart can also be created with our visualization tool (Fig 4.9). Through the careful use of indexes, I have optimized the data warehouse for queries which focus on a time period. When we focus on a period during which the price was increasing (April 2013), we can see a positive correlation between the coin age and the price. This gives further credence to the idea of bitcoins being treated as a commodity.

Figure 4.8: Comparing the variation of the coin age with time and the variation of price with time. In April and December 2013, upward turns in the price are matched by increases in coin age. This suggests commodity usage of bitcoins.
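The correlation is read off the charts in this thesis. One way such a reading could be quantified is with Pearson’s correlation coefficient, sketched below. The two short series are made-up placeholder values chosen only to exercise the function; they are not output from the data warehouse.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Placeholder daily values (hypothetical, for illustration only).
price = [100.0, 120.0, 150.0, 140.0, 180.0]
coin_age = [10.0, 12.0, 16.0, 13.0, 19.0]
print(f"r = {pearson(price, coin_age):.2f}")
```

A coefficient close to 1 over a period of rising prices would be consistent with the commodity interpretation, although it would not establish it on its own.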

The above visualizations show that our tool caters to chart views of the transaction graph. For overview graph views, which focus on how entities/clusters relate to each other, the research student implemented a novel visualization that represents the clusters and relationships in a matrix-like display (Fig 4.10). The user can zoom in to an area of interest to get more detail on which clusters are involved in a relationship (Fig 4.11). When combined with de-anonymization, this view can be a useful tool for spotting patterns in relationships between clusters.

A second overview graph, which gives information on the details of the transactions between clusters (such as transaction volume and transaction value), was not yet complete at the time of writing. This was also the case with detail views. However, because the data warehouse stores information on the source and destination of all transactions, the information needed to create these visualizations is all available in the data warehouse.

After implementing the system I went through some steps to perform a quantitative and qualitative evaluation of my proposed solution. The details of these can be found in Chapter 5.


Figure 4.9: Scatter plot of the variation of the coin age with respect to price. During the period of price increase in April 2013, there appears to be a positive correlation between the coin age and the price. This also suggests commodity usage of bitcoins.

[Matrix display: rows and columns are labeled with cluster identifiers between Cluster 1 and Cluster 78.]

Figure 4.10: Overview of relationship between clusters/entities


Figure 4.11: Close up of overview of relationship between clusters


Chapter 5

Evaluation

Although there was continual evaluation and feedback throughout the process of designing the visualization tool, I carried out a final evaluation to determine the suitability of the design solution to the needs of the economists. I have classified the evaluation criteria into qualitative and quantitative categories.

5.1 Qualitative Evaluation

In evaluating how well the system supports the needs of the end users, some assessment can be done qualitatively. Using the table of proposed metrics (Figure 3.2) created earlier in the requirements gathering phase, I was able to outline those which were supported. At the time of writing, all metrics with the exception of Account Balance had been calculated and stored in the data warehouse. I chose to implement that metric last because it would require the most computational resources and disk space.

Task descriptions also served as a useful means of assessing the suitability of the design tool. These task descriptions were also created during the requirements gathering stage and identified some of the main tasks the economists wanted to perform. One of these task descriptions is shown with accompanying visualizations in Section 4.6, demonstrating the capability of the tool to perform the tasks outlined in the task description.

In the Related Work section (Section 2.3), I discussed three main kinds of visualizations of the Bitcoin transaction graph: chart views, overview graph views and detail graph views. A data warehouse stores metrics in fact tables and their various values with respect to different dimensions. This makes data warehouses inherently suitable for supporting chart views, and my data warehouse design benefited from this feature. Additionally, I was able to support detail graph views by separating address information into incoming and outgoing addresses. This allowed for tracking the flow of money through the graph. I also implemented an extended clustering heuristic that made it possible to group addresses into clusters/entities and study their interaction. Thus, all three view types are supported by the visualization tool.

Another means of evaluating the visualization tool is by assessing its flexibility. By using a data warehouse as the basis for the design solution and choosing the fact tables I did, I ensured that adding future metrics will not be too complicated. I covered the different levels of granularity of interest to the economists, making it easier to add metrics. In deciding which metrics to include, I chose metrics which could be used as a base for calculating other metrics. For example, although transaction fee ratios did not initially show up as a metric in the requirements gathering phase, they can still be calculated using the transaction fee and transaction value metrics. With the granularity provided, the transaction fee ratios can be calculated at a daily, weekly or monthly level or even at the address or cluster level.
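As an illustration of deriving a metric from the stored base metrics, the sketch below computes a fee ratio at several time granularities. It uses an in-memory SQLite database rather than the MariaDB server, a cut-down version of the FactTransaction and DimTime tables from Appendix B, and a handful of made-up rows. Defining the ratio as total fees divided by total value is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimTime (
        time_id INTEGER PRIMARY KEY, date_value TEXT,
        week_no INTEGER, month INTEGER, year INTEGER);
    CREATE TABLE FactTransaction (
        transaction_id INTEGER PRIMARY KEY, time_id INTEGER,
        transaction_value REAL, transaction_fee REAL);
    """
)
# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO DimTime VALUES (?, ?, ?, ?, ?)",
    [(1, "2015-01-05", 2, 1, 2015), (2, "2015-01-06", 2, 1, 2015)],
)
conn.executemany(
    "INSERT INTO FactTransaction VALUES (?, ?, ?, ?)",
    [(1, 1, 10.0, 0.001), (2, 1, 30.0, 0.003), (3, 2, 5.0, 0.0005)],
)

# Grouping columns per granularity; a fixed lookup, not user input.
LEVELS = {
    "day": "t.date_value",
    "week": "t.year, t.week_no",
    "month": "t.year, t.month",
}


def fee_ratio_by(level):
    """Return rows of (grouping columns..., total fee / total value)."""
    cols = LEVELS[level]
    sql = (
        f"SELECT {cols}, SUM(f.transaction_fee) / SUM(f.transaction_value) "
        "FROM FactTransaction f JOIN DimTime t ON f.time_id = t.time_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


for level in LEVELS:
    print(level, fee_ratio_by(level))
```

Switching granularity only changes the grouping columns taken from the time dimension; the fact table is left untouched.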

5.2 Quantitative Evaluation

There are several aspects of the designed solution that can be evaluated quantitatively. One of these is the efficacy of the extended clustering heuristic. Implementing and running my extended clustering heuristic produced 12,484,375 unique clusters. This represents a reduction of almost 200,000 clusters when compared to the implementation of the clustering heuristic discussed in Section 2.1, which yielded 12,679,162 unique clusters.

Although the extended clustering heuristic was able to merge almost 200,000 clusters, this was only a 1.6% reduction in the number of clusters. This can be explained by examining the distribution of transactions with respect to the number of output addresses in a transaction (Table 5.1). Almost 92% of all transactions have either one or two outputs. This means the extended clustering heuristic would need to make its gains from the remaining 8% of transactions.

Of this 8%, not all will have an identifiable change address, because in some cases no further information about the output addresses could be found in the transaction graph to confirm or eliminate their candidacy as change addresses. With this knowledge, only modest gains in reducing the number of clusters can be expected using the extended clustering heuristic.
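The arithmetic behind these figures can be checked with a few lines of Python. The counts are copied from Tables 5.1 and 5.2 and from the cluster totals quoted above; using the sum of Table 5.2 as the total number of transactions is an assumption, based on both tables covering January 2009 to February 2015.

```python
# Transactions with one or two output addresses (Table 5.1).
one_or_two_outputs = 4_858_565 + 51_194_896

# Yearly transaction volumes, 2009 to February 2015 (Table 5.2).
yearly_volumes = [
    32_711, 185_425, 1_902_252, 8_455_509,
    19_649_296, 25_278_892, 5_553_980,
]
total_transactions = sum(yearly_volumes)

share = one_or_two_outputs / total_transactions
print(f"total transactions: {total_transactions:,}")
print(f"share with one or two outputs: {share:.1%}")

# Clusters merged by the extended heuristic (counts quoted in the text).
clusters_baseline = 12_679_162
clusters_extended = 12_484_375
print(f"clusters merged: {clusters_baseline - clusters_extended:,}")
```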

Another feature of the visualization tool that can be measured quantitatively is the response time for queries. There is bound to be variation in the response times depending on the query issued. However, in the worst-case scenario of selecting an aggregation of a metric over the complete data set of 61 million transactions, the query responded in approximately 30 seconds. When filters were used to restrict the data returned to a particular time range, the response times were even lower. For example, selecting the 2.9 million transactions from the two busiest months (January and February 2015) took 1.5 seconds.

A response time of approximately 1.5 seconds is enough to give users of the visualization tool the feeling of interactivity [Nielsen 1993]. For some metrics, such as the discrete distribution shown in Table 5.1, the response times averaged between 25 and 30 seconds. However, using the preview-then-detail approach discussed in Section 4.4, the visualization tool took less than a second to generate the data to create preview visualizations while the accurate figures continued to be calculated.

Number of output addresses   Transaction volume
1                            4,858,565
2                            51,194,896
3                            3,235,680
4                            395,054
5                            296,231
6                            218,994
7                            86,862
8                            59,665
9                            46,568
10                           44,000

Table 5.1: Distribution of the total number of transactions with respect to the number of output addresses in a transaction for the period January 2009 to February 2015.

One major drawback of the design used to support the preview-then-detail approach is that the counts, standard deviations and means can only be calculated and stored for the transaction graph as a whole. If the user decides to filter according to any criteria, the pre-calculated values cannot be used to make estimations. However, this drawback is offset by the fact that introducing filters into a query also reduces the response time and therefore reduces the likelihood of needing the preview-then-detail approach.
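The control flow of the preview-then-detail approach can be sketched as below. This is a minimal illustration under assumptions, not the tool’s implementation: the callback names are hypothetical, the slow database recount is simulated with a short sleep over a small in-memory list, and the preview value is a fixed placeholder standing in for the estimate described in Section 4.4.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def preview_then_detail(estimate_fn, exact_fn, on_update):
    """Show a cheap estimate first, then replace it with the exact result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        exact_future = pool.submit(exact_fn)       # slow query starts right away
        on_update("preview", estimate_fn())        # estimate is displayed first
        on_update("exact", exact_future.result())  # exact figure replaces it


# Toy stand-ins (hypothetical): values to be recounted for the group 0-16.
values = [3, 8, 15, 15.5, 17, 22, 40]


def slow_exact_count():
    time.sleep(0.05)  # simulates the long-running recount in the database
    return sum(1 for v in values if v < 16)


def quick_estimate():
    return 3.5  # placeholder for an estimate from pre-computed statistics


updates = []
preview_then_detail(quick_estimate, slow_exact_count,
                    lambda kind, value: updates.append((kind, value)))
print(updates)
```

In a visualization client the `on_update` callback would redraw the chart, for instance with fuzzy bar borders for the preview and solid ones once the exact count arrives.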

5.3 Further Work

Despite the efforts that went into designing a system that would meet the needs of the end users as closely as possible, there are a few areas which could benefit from future work. One of the primary areas of improvement (which also showed up in subsequent feedback from the economists) is refreshing the data warehouse. In the current design iteration, the data warehouse has a cut-off date of March 3rd, 2015. However, the transaction graph is continually expanding and adding more transactions each year as the user base of Bitcoin increases (Table 5.2). Being able to analyze up-to-date data will be an important part of the economists’ study of Bitcoin.

Another way to improve the visualization tool for economists is through the use of entity de-anonymization. In the current system, entities are only identified through the use of unique numerical IDs. This can make analysis of entity relationships rather difficult when the user cannot identify who the entity is or at least what type of entity it is. Although this was beyond the scope of the current project, future work can use the web scraping methods described in BitIodine [Spagnuolo 2014] to identify entities.

Time period           Transaction volume
Jan 2009 – Dec 2009   32,711
Jan 2010 – Dec 2010   185,425
Jan 2011 – Dec 2011   1,902,252
Jan 2012 – Dec 2012   8,455,509
Jan 2013 – Dec 2013   19,649,296
Jan 2014 – Dec 2014   25,278,892
Jan 2015 – Feb 2015   5,553,980

Table 5.2: Growth of transaction volume since the launch of Bitcoin

Another direction for future research is the parallelization of computational tasks. The majority of the queries on the data warehouse consist of additive aggregations (such as Sum and Count) as well as semi-additive aggregations (such as Average). Therefore parallel computing techniques such as MapReduce could be used to reduce response times significantly [Dean 2008]. With techniques such as MapReduce, rather than a single high-powered machine performing the calculations, the data is sub-divided into smaller sets on a network of machines which work in parallel to process the data with greater speed. This is an important need as the transaction graph continues to increase in size.
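The idea can be illustrated with a small single-machine sketch in the MapReduce style. It is not a distributed implementation: the partitions are plain Python lists and the map step runs sequentially, whereas in a real deployment each partition would be processed on a separate machine. The function names are hypothetical. The sketch also shows why Average needs slightly different handling from Sum and Count: it is recombined from partial sums and counts rather than from partial averages.

```python
from functools import reduce


def map_partition(values):
    """Map step: reduce one partition to a partial (sum, count) pair."""
    return (sum(values), len(values))


def combine(left, right):
    """Reduce step: partial sums and counts are additive across partitions."""
    return (left[0] + right[0], left[1] + right[1])


def aggregate(partitions):
    """Return (sum, count, average) computed from per-partition partials."""
    total, count = reduce(combine, map(map_partition, partitions), (0.0, 0))
    average = total / count if count else None
    return total, count, average


# Hypothetical transaction values split across three partitions.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(aggregate(partitions))
```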

Finally, collaboration among end users can be supported using collaborative filtering techniques. By keeping a log of queries executed against the data warehouse, the visualization tool can use the collaborative filtering approach proposed by Aligon et al. to recommend queries of interest to users of the tool [Aligon 2015]. As the data for the visualization tool is stored in a data warehouse, the collaborative filtering approach, which is based on OLAP sessions, will be a natural fit for this system.


Chapter 6

Conclusion

In this thesis, I have discussed the concept of digital currencies and explained the basic workings of the Bitcoin protocol. For economists looking to study Bitcoin, the large volume of transactions and addresses makes it difficult to extract the metrics they need. The calculation of metrics is also complicated by the nature of the transaction graph, which obfuscates identities and makes it difficult to tell which addresses belong to the same user. To help economists deal with this problem, I (in collaboration with another research student) designed and created a visualization tool.

I collaborated with economists to gather their requirements using an iterative design life-cycle model which incorporated multiple rounds of feedback. Based on these requirements, I designed a data warehouse to restructure the transaction graph data into a format that supported speedy and easy access to the metrics of interest. For certain queries with long response times, I described a preview-then-detail approach in which the visualization tool could create estimates to display in a preview, while accurate figures continued to be calculated.

To deal with the challenge of identifying users on the transaction graph, I proposed an extension of an address clustering heuristic with the aim of improving the assignment of addresses to clusters. My implementation of the heuristic resulted in a 1.6% reduction in the number of clusters.

One area of improvement on the visualization tool will be the implementation of user de-anonymization. This will make analysis of relationships between users much easier. Additional scripts to ensure regular refreshing of the data warehouse will also be needed to keep the data warehouse up to date as new transactions are added to the transaction graph. Finally, the use of parallel computing techniques such as MapReduce could greatly improve response times on aggregation queries.


Appendix A

Card Sorting Results

The diagrams on the next two pages show the results of the card sorting exercise held with the economists. We divided them into two groups and asked them to sort the features of the visualization tool into categories that made sense. For each category they gave headings (in green for the first group and yellow for the second group). These categorizations were useful in determining metrics to include in the data warehouse and other design decisions for the visualization tool.


[Card sorting diagram. Headings and cards, in the order they appear in the diagram: Transaction Types; Variables; Non-zero transaction fees (min, max, avg, sum); transaction value (min, max, avg, sum); transaction fees (min, max, avg, sum); non-zero coin age (min, max, avg, sum); coin age (min, max, avg, sum); transaction volume; market price (min, max, avg, sum); Export Data to csv + filter; Data Management; Export Graph figures; Add Series; Network Statistics on Bitcoin transaction graph; Advanced Statistics; details of daily activities; details on individual transactions; details of all past activities; addresses sending Bitcoin; addresses receiving Bitcoins; de-anonymization; details on individual addresses; aggregate statistics per month; Basic Statistics; aggregate statistics per day; aggregate statistics per week; aggregate statistics per day of week; aggregate statistics per day of month; moving average; time series statistics; daily difference (boxplot); Statistics (Metrics); details on weekly activities; platform independence (mac, windows); Software updates and technical features/Graph Management; dynamic updates; customize plots; binned scales; compare different plots; re-arrange plots; timelines; specify exact time range; logarithmic scales; modify the data (e.g. remove outlier); moving scales; fixed scales; annotate graph/chart; important events highlighted.]

Group 1


[Card sorting diagram, second page. Headings and cards, in the order they appear in the diagram: important events highlighted; Network; Non-zero transaction fees (min, max, avg, sum); transaction value (min, max, avg, sum); transaction fees (min, max, avg, sum); non-zero coin age (min, max, avg, sum); coin age (min, max, avg, sum); transaction volume; market price (min, max, avg, sum); Export Data to csv; Data Management; Export Graph figures; Data from different platforms; Network Statistics on Bitcoin transaction graph; Customization; details of daily activities; details on individual transactions; details of all past activities; addresses sending Bitcoins; addresses receiving Bitcoins; de-anonymization; details on individual addresses; aggregate statistics per month; Descriptive Stats; aggregate statistics per day; aggregate statistics per week; aggregate statistics per day of week; aggregate statistics per day of month; moving average; details on weekly activities; platform independence (mac, windows); dynamic updates; customize plots; binned scales; compare different plots; re-arrange plots; timelines; specify exact time range; logarithmic scales; modify the data (e.g. remove outlier); moving scales; fixed scales; annotate graph/chart; transaction types; Network Visualization; ex. Louvain method; Transactions.]


Appendix B

Data Warehouse Logical Schema

The diagrams on the next two pages show the logical schema for the tables in the data warehouse. The first diagram shows the Transaction fact table and its relationship with the dimension tables. The other three fact tables are shown in the second diagram.


[Logical schema diagram 1: the Transaction fact table and its dimension tables. Tables and their columns:]

FactTransaction: transaction_value, transaction_fee, incoming_address_count, outgoing_address_count, transaction_id, tv_id, tf_id, inc_count_id, out_count_id, time_id, tx_id

DimTransactionFeeGroup: tf_id, tf_label, tf_min, tf_max

DimTransactionValueGroup: tv_id, tv_label, tv_min, tv_max

DimIncomingAddressCountGroup: inc_count_id, inc_count_label, int_count_min, inc_count_max

DimOutgoingAddressCountGroup: out_count_id, out_count_label, out_count_min, out_count_max

DimTime: time_id, block_no, date_value, day_of_month, day_of_week, week_no, month, year

DimAddress: address_id, address_label, cluster_id, cluster_type, first_seen, last_seen

DimOutgoingAddress: transaction_id, address_id

DimIncomingAddress: transaction_id, address_id


[Logical schema diagram 2: the remaining three fact tables and their dimension tables. Tables and their columns:]

FactAddressInstance: account_balance, instance_id, address_id, time_id

DimTime: time_id, block_no, date_value, day_of_month, day_of_week, week_no, month, year

DimAddress: address_id, address_label, address_hash, cluster_id, cluster_type, first_seen, last_seen

FactSubTransaction: age_of_coin, subtransaction_value, address_id, time_id, tv_id, ca_id, subtransaction_id

DimTransactionValueGroup: tv_id, tv_label, tv_min, tv_max

FactPrice: value, time_id, price_id

DimCoinAgeGroup: ca_id, ca_label, ca_min, ca_max


Bibliography

[Ali 2014] Robleh Ali, John Barrdear, Roger Clews and James Southgate. The Economics of Digital Currencies. Bank of England Quarterly Bulletin, vol. 54, no. 3, pages 276–286, 2014. (Cited on pages 1 and 8.)

[Aligon 2015] Julien Aligon, Enrico Gallinucci, Matteo Golfarelli, Patrick Marcel and Stefano Rizzi. A collaborative filtering approach for recommending OLAP sessions. Decision Support Systems, vol. 69, pages 20–30, 2015. (Cited on page 40.)

[Baumann 2014] Annika Baumann, Benjamin Fabian and Matthias Lischke. Exploring the Bitcoin Network. In WebDB’04: Proceedings of the 10th International Conference on Web Information Systems and Technologies, Barcelona, 2014. (Cited on page 12.)

[Dean 2008] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, vol. 51, no. 1, pages 107–113, January 2008. (Cited on page 40.)

[Diaper 2003] D. Diaper and N. Stanton. The handbook of task analysis for human-computer interaction. Taylor & Francis, 2003. (Cited on page 17.)

[Evans 2014] David S Evans. Economic aspects of bitcoin and other decentralized public-ledger currency platforms. University of Chicago Coase-Sandor Institute for Law & Economics Research Paper, no. 685, 2014. (Cited on page 1.)

[Eyal 2014] Ittay Eyal and Emin Gün Sirer. Majority Is Not Enough: Bitcoin Mining Is Vulnerable. In Nicolas Christin and Reihaneh Safavi-Naini, editors, Financial Cryptography and Data Security, volume 8437 of Lecture Notes in Computer Science, pages 436–454. Springer Berlin Heidelberg, 2014. (Cited on page 1.)

[Flohr 1996] Udo Flohr. Electric money. Byte, vol. 21, no. 6, pages 74–87, 1996. (Cited on page 1.)

[Gladstone 1996] Julia Alpert Gladstone. Exploring the Role of Digital Currency in the Retail Payments System. New Eng. L. Rev., vol. 31, page 1193, 1996. (Cited on page 1.)

[Kimball 2011] R. Kimball and M. Ross. The data warehouse toolkit: The complete guide to dimensional modeling. Wiley, 2011. (Cited on page 23.)


[Möser 2015] Malte Möser and Rainer Böhme. Trends, Tips, Tolls: A Longitudinal Study of Bitcoin Transaction Fees. In 2nd Workshop on Bitcoin Research, affiliated with the 19th International Conference on Financial Cryptography and Data Security, Puerto Rico, 2015. (Cited on pages 10 and 11.)

[Nakamoto 2008] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Consulted, vol. 1, no. 2012, page 28, 2008. (Cited on pages 1 and 2.)

[Nielsen 1993] Jakob Nielsen. Usability engineering. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993. (Cited on page 38.)

[Nielsen 1995] Jakob Nielsen. Card sorting to discover the users’ model of the information space, 1995. http://www.useit.com/papers/sun/cardsort.html, Last visited: June 7th, 2015. (Cited on page 19.)

[Ober 2013] Micha Ober, Stefan Katzenbeisser and Kay Hamacher. Structure and anonymity of the bitcoin transaction graph. Future internet, vol. 5, no. 2, pages 237–250, 2013. (Cited on pages 11 and 12.)

[Reid 2013] Fergal Reid and Martin Harrigan. An Analysis of Anonymity in the Bitcoin System. In Yaniv Altshuler, Yuval Elovici, Armin B. Cremers, Nadav Aharony and Alex Pentland, editors, Security and Privacy in Social Networks, pages 197–223. Springer New York, 2013. (Cited on pages 2, 5, 12, 13 and 14.)

[Ron 2013] Dorit Ron and Adi Shamir. Quantitative Analysis of the Full Bitcoin Transaction Graph. In Ahmad-Reza Sadeghi, editor, Financial Cryptography and Data Security, volume 7859 of Lecture Notes in Computer Science, pages 6–24. Springer Berlin Heidelberg, 2013. (Cited on pages 2, 5, 7, 8, 9 and 13.)

[Sharp 2007] H. Sharp, Y. Rogers and J. Preece. Interaction design: Beyond human-computer interaction. Wiley, 2007. (Cited on pages 3, 17 and 18.)

[Spagnuolo 2014] Michele Spagnuolo, Federico Maggi and Stefano Zanero. Bitiodine: Extracting intelligence from the bitcoin network. In Financial Cryptography and Data Security, pages 457–468. Springer, 2014. (Cited on pages 2, 6, 14 and 40.)