
Decentralized User-Centric Access Control using PubSub over Blockchain

Sayed Hadi Hashemi 1, Faraz Faghri 1, Roy H Campbell
University of Illinois at Urbana-Champaign

Abstract

We present a mechanism that puts users at the center of control and empowers them to dictate access to their collections of data. Revisiting the fundamental mechanisms in security for providing protection, our solution uses capabilities, access lists, and access rights following well-understood formal notions for reasoning about access. This contribution presents a practical, correct, auditable, transparent, distributed, and decentralized mechanism that is well-matched to current emerging environments including the Internet of Things, smart cities, precision medicine, and autonomous cars. It is based on well-tested principles and practices used in distributed authorization, cryptocurrencies, and scalable computing.

1 Introduction

Before long, a fully deployed Internet of Things (IoT), smart cities, precision medicine, and autonomous cars will become a reality, bringing the promise of reduced friction and increased efficiency. This promise relies on the analysis of massive amounts of data, often referred to as big data analytics. However, gaining access to this data at the scale required by these environments, while preserving users' rights, is proving difficult. Data is stored in a variety of locations, managed by different service providers, analyzed by diverse entities, and subject to a range of regulations concerning public records and individual privacy.

1 Joint first authors

To make these emerging environments a reality, it is crucial to address the aforementioned challenges with more flexible and scalable data sharing models. Here, we present our solution, a user-centric data dissemination and distribution system that enables scalable data sharing and addresses the following problems:

1. User-centric. Data collected from the data sources potentially represents the activities of millions of users (individuals or organizations) and their possessions. Commonly, in these environments users themselves do not know what private data has been stored, who manages it, or where it resides. Arguably, the user should be informed and should be the ultimate deciding factor in the control of their data.

2. Privacy-preserving. Many users are willing to grant access to their data as long as their privacy is preserved. The goal here is to prevent data analyzers from inferring the identity of the data owner through protocol mechanisms. The only way an adversary can learn the identity of the data owner is through analysis of the data content, which is outside the scope of this work.

3. Endorsed. Naturally, trusted third parties should help ascertain the trustworthiness of a party. But this role should be advisory, whereas today they actively make these decisions on behalf of data owners.

4. Accessible. To take full advantage of data and produce actionable insight, data needs to be available for analysis to a diverse group of external parties such as researchers, social entrepreneurs, city departments, healthcare providers, and the private sector. Also, any user, such as a data owner or data analyzer, should be able to join the system easily.

5. Scalable. As the emerging environments scale to billions of data sources and millions of users, granting control to users becomes a very challenging problem. At this scale, conventional best practices are not practical: it becomes impossible to maintain an access control list on every single sensor, and nearly impossible to rely on a centralized access control server or on protocols based on agreed-upon trusted parties. Networks of billions of devices and millions of users would require maintaining enormous, always-accessible access control lists that, if possible at all, would be very expensive and challenging given their strict availability and consistency needs.

arXiv:1710.00110v1 [cs.CR] 29 Sep 2017

6. Decentralized. Current environments less and less frequently have centralized control and are typically managed by a very diverse group of service providers. Because these environments are heterogeneous and disparate, it is often difficult to handle trust relationship management, data management, regulatory compliance, and data sharing in a centrally controlled system. Client-server models such as Kerberos [5] cannot be used in such environments due to the lack of a central point of trust.

7. Distributed. Data resides in the cloud and at the edge on billions of data sources. As the emerging environments grow and data-generating devices are deployed at ever-increasing rates, data remains siloed and inaccessible, mainly because the generated data is owned by millions of users and managed by a wide range of organizations.

8. Asynchronous. Lack of connectivity is a traditional challenge in many environments. Devices might not be well connected; for example, a network of sensors deployed in a house is not necessarily exposed to the whole world. This requires systems to function without persistent connectivity.

9. Compliant. Many countries prescribe specific requirements and liabilities for organizations that hold or manage data. These entities are often subject to regulatory requirements, e.g. HIPAA, Directive 2011/24/EU, and other similar laws for entities covered by healthcare privacy regulations.

10. Transparent. In the absence of a centralized trusted party, transparency is a key principle for keeping parties honest. Enhancing transparency helps ensure that legislation is adhered to in the emerging environments.

11. Auditable. Users should be able to share their data with any chosen data requester, as well as track and monitor the requester's access. Auditing is a critical feature for tracking who has accessed the data, when, and from where.

12. Correct. Demonstrating security correctness is a major requirement for practical applications in order to prevent protocol and implementation flaws.

13. Usable. We need a system that can securely provide the above functionalities for locating and sharing data between data sources and/or entities. Distribution and privacy concerns make the implementation of even very basic big data analytics primitives nearly impossible. For instance, in the current setting, ‘dataset lookup’, the simple task of finding a data set, is a major quest.

Our solution provides a mechanism for user control at scale. Its two differentiating contributions are, first, the separation of the data store from data management and, second, the design of both components to be scalable, decentralized, and distributed. These contributions substantially help resolve the aforementioned challenges: they make it possible to keep data at its origin, overcome the single point of trust and failure, and adapt to the growth of users and data sources.1

This paper describes and evaluates a practical and correct implementation of this solution in a system called Dusc. The paper contains an improved version of the system, a full discussion of an implementation, a proof of the system's correctness, and an evaluation of its practicality in terms of performance, storage costs, query times, and Blockchain consistency times. This version provides substantial improvements, including support for local parallelization.

Dusc has three main components, which utilize best practices in computer security and distributed systems: a data management protocol, a data store system, and a messaging service. First, a data management protocol in which different roles can interact with each other in a secure and private manner, based on capability-based access control. Second, a decentralized and distributed data store system, based on Blockchain, that allows the roles to communicate with each other without the need for a trusted third party. Although our solution has a different purpose, recent cryptocurrencies such as Bitcoin provide an interesting and successful example of the decentralized and secure circulation of currency. Third, a scalable messaging service based on the publish-subscribe model, designed to provide flexible and reliable communication between many senders and receivers without the need for persistent connectivity. Our solution is fully decentralized and distributed in order to accommodate the massive scale of the IoT, smart city, precision medicine, and autonomous car paradigms. Because the system is user-centric, no trust in any participant is implied or automatic. While maintaining the ability to scale, this provides full control over access and its propagation, system security, and user privacy.

1 A preliminary version of this idea in the context of the Internet of Things has recently been proposed by the authors. The manuscript will appear soon in an IoT conference. Due to the “Anonymous submission” policy, the paper is not referenced. The contribution of the current paper compared to the IoT one is highlighted in the next paragraph.

The remainder of the paper is organized as follows. In Section 2, we give an overview of our solution. Sections 3, 4, and 5 describe the three components of our solution: the data management protocol, the data store system, and the messaging service, respectively. We discuss the implementation and evaluation of our proposed solution in Section 6. Section 7 looks back at how Dusc has addressed the aforementioned challenges.

2 System Model and Design

In this section, we explain the roles, the primitives offered to these roles, and the scalability and threat assumptions. For the rest of this paper we use the notation of Abadi et al. [2], [3] to formally describe the system and protocols. The notation is explained in Table 1.

2.1 Roles

Four main roles are introduced in our model:

• Data Owner: an individual or organization who is in possession of data. This role does not necessarily generate or store the data. In our model, the data owner grants access to the data.

• Data Source: a computer system, individual, or organization that manages and stores data objects, be they at rest or in motion. Examples include cloud providers, managers of Electronic Health Record (EHR) systems, application gateways, and archival systems. A sensor can act as a data source if it has enough performance, connectivity, and storage; otherwise its data is stored elsewhere.

• Data Requester: an individual or organization that requests access to data owners' data available within the network. Examples are a researcher, a company, a data aggregator, or another device.

• Endorser: a third-party individual or organization that validates a request. This may be a trusted authority, organization, or known individual. An endorser either provides supplementary information regarding credibility or validates the authenticity of a role's identity. Examples of endorsers include, but are not limited to: the FDA, NIH, DMV, a municipality, an Institutional Review Board (IRB), and external review by an additional research institute (Hospital A asks Hospital B for review).

2.2 Primitives

The following primitives are available to the above roles in our model:

• Data Discovery: users have many data-generating devices and most likely no substantial data storage. They should know what data they own or have rights to, and they should have a single view that shows a portfolio of all the data they own.

• Data Request: the subject should be able to search for a collection of data that meets their conditions. The request should be easily converted into a data access authorization.

• Audit: users should be able to share their data with any chosen data requester, as well as track and monitor the requester's access.

2.3 Assumptions

The following scalability and threat assumptions are made in our model:

A1. We assume that data sources are aware of their users' identities (public keys), i.e. data source $S$ knows its user's public key $K_X$ and can securely authenticate messages signed by that user:

$$\{\text{message}\}_{X'} \Rightarrow X \text{ says } \text{message}$$

A2. We assume that during user registration, users are provided with a mechanism to authenticate the identity of the data source. This mechanism can be verification by a certificate authority or a token:

$$\{K_C\}_{X} \Rightarrow C$$

A3. We assume that the endorsers' identities can be relied upon as valid via currently available technologies such as PKI, i.e. if a subject trusts a certificate authority $C$ which signs an endorser identity $E$, then:

$$K_C \Rightarrow C$$

$$K_C \text{ says } (K_E \Rightarrow E) \rightarrow K_E \Rightarrow E$$

Table 1: Notation used in this paper, adopted from Abadi et al. [2], [3]

| Notation | Description |
|---|---|
| $\{M\}_{X}$ | Message $M$ encrypted using $X$'s public key. |
| $\{M\}_{X'}$ | Message $M$ signed using $X$'s private key. |
| $X$ | Identity of $X$. |
| $K_X$ | $X$'s public key. |
| $O_i$ | Data Owner $i$. |
| $S_i$ | Data Source $i$. |
| $R_i$ | Data Requester $i$. |
| $E_i$ | Endorser $i$. |
| DOT | Data Object Ticket. |
| DAP | Data Access Path. |
| RT | Request Ticket. |
| DAT | Data Access Ticket (capability). |
| $X$ says $Y$ | $X$ makes the statement $Y$. |
| $X$ for $Y$ | $Y$ on behalf of $X$. |
| $X$ controls $Y$ | $X$ is trusted on $Y$: if $X$ says $Y$ then $Y$. This is the meaning of $X$ appearing in the ACL for $Y$. [2] |

A4. We address the issue of traffic analysis in this paper; however, we make a best-effort attempt rather than attempting to completely obfuscate users' access patterns.

A5. We assume that users can trust their local computing environment. We understand that this assumption is difficult to guarantee and that the execution of sensitive programs in untrusted environments will continue to be a risk.

A6. We assume that attackers have a specific set of abilities: they can view the system's data, present modified data to participants, and impersonate roles. We also assume that attackers do not have access to private keys and cannot gain local administrator access to the systems.

A7. We assume the security of the Blockchain system and the cryptographic integrity of the encryption protocols used.

Figure 1: Our solution consists of three main components: i. data management protocol, ii. data store system, and iii. messaging service. The data management protocol is a secure capability-based system. The messaging service is based on a publish-subscribe architecture. The data store system is based on the Blockchain.

A8. We assume that ‘things’ either have the capability to act as a data source or are able to transmit their data to a data source securely.

A9. We assume that each data object has at least one data owner.

A10. We do not address DoS attacks in the current system. In future work, this attack can be addressed using existing best practices.

2.4 Our solution

Having reviewed the roles, primitives, and assumptions in our model, we now introduce our solution for empowering users to control access to their data at scale. Our system consists of three main components:

1. Data management protocol

2. Data store system

3. Messaging service

An overview of our system with the three components is illustrated in Figure 1. The data management protocol provides a framework for interaction among the different roles with an access control mechanism. The protocol is a secure capability-based system. The data store system provides persistent, distributed, and decentralized storage for the access control component; it is based on Blockchain and provides full transparency of transactions, very similar to Bitcoin. The messaging service plays an important role in the scalability of our solution; it is based on a publish-subscribe architecture and is designed to provide scalable, flexible, and reliable communication between many senders and receivers without the need for persistent connectivity.

Our solution is user-oriented; users are the bridge between different networks of data-generating devices, and they ultimately authorize access to their data. Information regarding the user's possessions is sent to the user securely and privately and is accessible whenever the user wishes to view it. Data owners are aware of the existence of data through direct knowledge of, or control over, the sensor or actuator producing the data. Additionally, they may learn about the existence of data via notifications generated by the data sources and transmitted directly to the data owner, mainly to communicate that the data is managed by the data source.

Data requests are broadcast to all the data owners within the system. Data owners are not required to review all the requests. A data owner who is not interested in sharing any data can filter out all requests, or can selectively filter requests based on their interests. Also, a user's trusted endorsers may verify a data request. This helps protect users against malicious or unethical data requests and assists them with the risk assessment of received data requests.

When a data request arrives at the user, the client checks it against the user's portfolio. If the conditions are met and the data owner is willing to share the data, a capability ticket is issued to the data requester, providing the necessary authorization for accessing the requested data.
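The client-side check described above can be illustrated with a minimal sketch. The class and function names, the equality-based query format, and the `blocked` filter are assumptions made for illustration; the paper leaves the query format to the data sources.

```python
from dataclasses import dataclass, field


@dataclass
class DataObjectTicket:
    data_id: str
    source: str                      # stands in for the data source identity K_S
    metadata: dict = field(default_factory=dict)


@dataclass
class RequestTicket:
    request_id: str
    requester: str                   # stands in for the requester identity K_R
    query: dict                      # required metadata values of a data object (assumed format)
    conditions: dict                 # required attributes of the data owner (assumed format)


def satisfies(required: dict, actual: dict) -> bool:
    """True if every required key is present in `actual` with the same value."""
    return all(actual.get(key) == value for key, value in required.items())


def matching_tickets(request, owner_profile, portfolio, blocked=frozenset()):
    """Return the tickets in the owner's portfolio that could be shared for this request."""
    if request.requester in blocked:                       # owner-side filtering of requests
        return []
    if not satisfies(request.conditions, owner_profile):   # participation conditions
        return []
    return [dot for dot in portfolio if satisfies(request.query, dot.metadata)]


portfolio = [
    DataObjectTicket("d1", "clinic-A", {"kind": "heart_rate"}),
    DataObjectTicket("d2", "clinic-A", {"kind": "location"}),
]
request = RequestTicket("r1", "lab-X", {"kind": "heart_rate"}, {"country": "US"})
matches = matching_tickets(request, {"country": "US"}, portfolio)
print([dot.data_id for dot in matches])  # prints ['d1']
```

Only the matching step is shown; the owner's consent and the issuing of the capability ticket would follow it.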

Whenever two roles want to interact, one sends a message to the messaging service component. The elements of the messaging service then send the message to the data store system, which is based on Blockchain. After the message has been successfully stored in the data store system, the recipient fetches it through the messaging service.
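The store-then-fetch flow can be sketched with an in-memory append-only list standing in for the Blockchain-backed data store. All names here (`AppendOnlyStore`, `MessagingService`, the topic strings, the cursor) are hypothetical and chosen only to show how a recipient without persistent connectivity could catch up later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    topic: str
    sender: str
    payload: str


class AppendOnlyStore:
    """Stand-in for the Blockchain-backed data store: records are only ever appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1      # position of the stored record

    def read_from(self, position):
        return list(self._records[position:])

    def __len__(self):
        return len(self._records)


class MessagingService:
    """Publish-subscribe front end that persists every message before delivery."""

    def __init__(self, store):
        self._store = store

    def publish(self, topic, sender, payload):
        return self._store.append(Record(topic, sender, payload))

    def fetch(self, topic, cursor=0):
        """Return (records on `topic` stored at or after `cursor`, next cursor)."""
        new = [r for r in self._store.read_from(cursor) if r.topic == topic]
        return new, len(self._store)


service = MessagingService(AppendOnlyStore())
service.publish("requests", "R1", "RT#1")
service.publish("tickets/O1", "S1", "DOT#7")
service.publish("requests", "R2", "RT#2")

# A data owner that was offline connects later and catches up from its last cursor.
pending, cursor = service.fetch("requests")
```

Because delivery is a read from persistent storage, the sender and the recipient never need to be online at the same time.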

In the next sections, we explain these three components in more detail.

3 Data Management Protocol

The data management protocol provides a framework for different roles to interact in a secure and private manner. The protocol is a decentralized capability-based access control scheme. Communication between roles is performed by message passing. Each message contains a sender, a receiver, and a payload. Messages are delivered through the underlying messaging service, which is described later in Section 5.

In this section, we first compare our data management protocol to capability-based access control systems. We then introduce the data structures used in the message payloads, along with the messages transmitted in the protocol. Throughout this section, we use the notation described in Table 1.

3.1 Capability-Based Access Control

Our system implements a distributed and decentralized capability-based access control system. A capability is a token that gives a data requester permission to access a data owner's data object on a data source. By using capability-based access control, our system avoids the need for a centralized trusted party to confer trust. Instead, the data owners are responsible for issuing capabilities to other users.

In our implementation, capabilities are issued by the data owners ($O$). Each one indicates who the issuer is, to which data requester ($R$) the capability is issued, and the object to which access is granted. Additionally, each capability contains the access rights in the form of data queries, as well as information about where and how the data can be accessed. Capabilities are not transferable or usable by any subject other than the intended data requester. As a result, the data owner remains the sole controller of the data:

$$(R \text{ for } O) \text{ says } d$$

Our capability is implemented as a data structure that contains:

• Access rights: every capability issued contains a query that places additional restrictions on the data access.

• Identities: these uniquely describe the capability issuer and the party to whom it is issued. Due to the decentralized nature of the system, identifying every subject is not an easy task. Therefore, public keys are used to identify subjects in the capability. Based on implementations such as RSA or PKI, it is assumed that generated key pairs are always unique. There is a many-to-one relationship between capabilities and a public key. Multiple entities can share a public/private key pair in order to implement joint ownership.

$$K_X \Rightarrow X$$

Our implementation of capabilities supports the following operations:

• Create: the right to create capabilities is restricted to the data owners ($O$). As soon as the data object ($d$) is generated and stored on a data source ($S$), this right is delegated to the data owners so that they can grant access to potential data requesters ($R$). This right of creation is not transferable.

$$\forall R,\; O \text{ controls } [(R \text{ for } O) \text{ says } d]$$

• Delete: deleting a capability is not possible, for the sake of auditing and non-repudiation. However, access to a data object referred to by the capability can be revoked by moving (or removing) the data on the data source.

Auditing is a critical feature for tracking who has accessed the data, when, and from where. In the absence of a centralized trusted party, our implementation provides the audit feature through the capability callback key: data sources inform the data owners when capabilities are used to access data objects.

The capability is stored by the data requester. When it is presented, the capability can be verified without an external access control list. As a result, additional capabilities or data objects need not require extra storage or computing power on the data source or from the data owner.
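The properties discussed in this subsection (no per-capability access list at the data source, non-transferability, revocation by removing the data, and audit through a callback identity) can be illustrated with a small sketch. The field names and the `DataSource` class are assumptions, identities are plain strings rather than public keys, and signature checks are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    issuer: str        # data owner identity (a public key in the paper)
    requester: str     # the only identity allowed to use this capability
    data_id: str
    query: str         # access rights expressed as a query
    callback: str      # identity notified when the capability is used


class DataSource:
    """Keeps data objects only; it stores no list of issued capabilities."""

    def __init__(self):
        self.objects = {}      # data_id -> (owner identity, data)
        self.audit_log = []    # (callback identity, data_id, requester) notifications

    def store(self, data_id, owner, data):
        self.objects[data_id] = (owner, data)

    def remove(self, data_id):
        """Revocation: the data is removed, the capability itself is never deleted."""
        self.objects.pop(data_id, None)

    def access(self, cap, presenter):
        if presenter != cap.requester:
            raise PermissionError("capability is not transferable")
        if cap.data_id not in self.objects:
            raise LookupError("data object moved or removed")
        owner, data = self.objects[cap.data_id]
        if owner != cap.issuer:
            raise PermissionError("issuer does not own this data object")
        # Audit: record a notification addressed to the owner's callback identity.
        self.audit_log.append((cap.callback, cap.data_id, presenter))
        return data


source = DataSource()
source.store("d1", "O", "72 bpm")
cap = Capability("O", "R", "d1", "kind == 'heart_rate'", "O-callback")
reading = source.access(cap, "R")
```

Everything the data source needs for the decision travels inside the capability, which is why issuing more capabilities adds no state at the source in this sketch.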

3.2 Protocol Data Structures

Five data structures are used in the payload of messages. Data Objects are data at rest or data in motion stored in the data sources. Even though data objects are not directly used or transmitted in the messages, the ultimate goal is to manage access to them. Data objects could be files or records within a file repository, database, cloud provider, or any other data storage system. In practice, nothing prevents our system from supporting other kinds of documents such as paper documents, film, recordings, or other non-digitized media. Data Objects in motion could be a variety of live data sources, including but not limited to sensors, video streams, audio streams, health trackers, and location data. Lastly, a Data Object may represent an actuator or device that the data requester may interact with. Examples include a physical access control system (door locks and alarm systems), a PTZ camera, a traffic light, or a shared vehicle ignition system.

The five data structures used in the payload of messages transmitted in the Data Management Protocol are:

1. Data Object Ticket (DOT): the right to issue a capability to access a data object. This ticket is issued by the data source and contains the unique ID of the specified data object, the identity of the owner ($K_O$), the identity of the data source ($K_S$), the data access path (DAP), and metadata describing attributes of the data object, signed by the data source:

$$\text{DOT} = \{\text{Data ID}, K_O, \text{metadata}, \text{DAP}\}_{K'_S}, K_S$$

2. Data Access Path (DAP): this structure represents a mechanism for an authorized role to gain access to the data object. Some examples of a DAP are:

• URL/URI (FTP/SFTP/SCP/HTTP/HTTPS)

• Record Locator

• Contact Information

• Instructions

• Physical Location

Note: In our implementation, tickets can be used as the authentication mechanism for TLS/SFTP/SSH, as they are generated from the TLS standard.

3. Request Ticket (RT): these are messages broadcast by the data requester to all the subscribers. A request ticket typically consists of a data query, participation conditions, the duration of access, and some relevant metadata:

$$\text{RT} = \{\text{Request ID}, K_O, \text{Query}, \text{Conditions}, \text{Duration}, \text{Metadata}, K_R\}_{K'_R}$$

A data query indicates to the data owner what data the data requester intends to collect from data owners. The query format is determined by the data sources. Participation conditions are constraints on the data owner that determine whether they qualify to participate in the data request. For example, in a medical data collection these conditions can concern:

• Nationality of the participant

• Demographic information about the participant: age, gender, or residency

• Relationships with certain data source providers, e.g. only data stored with Google or Apple

4. Endorsements: provide a method for one or more endorsers to attach additional third-party information to the data request. This information helps data owners decide whether they are willing to share data for a request. For example, an endorsement can be:

• Metadata regarding the data privacy policy of the data requester

• Results of a third-party audit or attestation regarding the information security controls in place at the organization

• Results of an Institutional Review Board (IRB) decision

• Authorization by a government entity to conduct a trial or research (FAA, NIH, HHS)

• Indication that data will be shared in a joint collaboration

• Verification of the data requester's authenticity by a service provider or other trusted third party (e.g. CAs or trusted roots in PKI)

Endorsement is done by a chain of signatures. For example, the following request ticket is endorsed by two endorsers, $E_1$ and $E_2$:

$$\{\{\text{RT}, K_R, \text{Feedback}_{E_1}\}_{K'_{E_1}}, E_1, \text{Feedback}_{E_2}\}_{K'_{E_2}}$$

5. Data Access Ticket (DAT): a capability that gives authorization to access a data object. The ticket is issued by the data owner to grant the specified data requester access to the data source, and it contains the identity of the entity who will access the data. It is not sent directly to the data source. Additionally, it contains an identity that receives the acknowledgment when the ticket is used.

$$\text{DAT} = \{K_O, \text{DOT}, \{\text{Data ID}, \text{Query}, K_O, K^3_O, K_R\}_{K'_O}\}_{K_S}$$

In order to protect the real identity of data owners, a data owner may choose to use multiple identities during the ticket exchange process:

• $K_O$ is the identity that is shared with the data source.

• $K^2_O$ is the identity used to communicate with the data requester. This identity is not known by the data source.

• $K^3_O$ is the identity that receives the acknowledgment of the ticket being used. This identity is not known by the data requester.

Figure 2: Data Management Protocol with five classes of messages transmitted among four roles. Numbers on connections correspond to the message types.
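The chain-of-signatures endorsement described under item 4 can be sketched as nested signed envelopes. This is an illustration under stated assumptions, not the paper's implementation: signatures use toy textbook RSA built from small Mersenne primes (insecure, chosen only so the example is self-contained), and the helper names `make_key`, `endorse`, and `verify_chain` are hypothetical.

```python
import hashlib
import json


def make_key(p, q, e=65537):
    """Toy textbook-RSA key pair from two primes. Illustration only, NOT secure."""
    n, phi = p * q, (p - 1) * (q - 1)
    return {"pub": (n, e), "priv": pow(e, -1, phi)}  # modular inverse, Python 3.8+


def digest(obj, n):
    data = json.dumps(obj, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(obj, key):
    n, _ = key["pub"]
    return pow(digest(obj, n), key["priv"], n)


def verify(obj, sig, pub):
    n, e = pub
    return pow(sig, e, n) == digest(obj, n)


def endorse(inner, feedback, key):
    """Wrap a request (or an already endorsed request) with signed feedback."""
    body = {"inner": inner, "feedback": feedback}
    return {"body": body, "endorser": list(key["pub"]), "sig": sign(body, key)}


def verify_chain(envelope, trusted):
    """Unwrap endorsements from the outside in; return (request, feedbacks in signing order)."""
    feedbacks = []
    while isinstance(envelope, dict) and "endorser" in envelope:
        pub = tuple(envelope["endorser"])
        if pub not in trusted or not verify(envelope["body"], envelope["sig"], pub):
            raise ValueError("invalid or untrusted endorsement")
        feedbacks.append(envelope["body"]["feedback"])
        envelope = envelope["body"]["inner"]
    return envelope, feedbacks[::-1]


e1 = make_key(2**31 - 1, 2**61 - 1)    # hypothetical endorser E1
e2 = make_key(2**89 - 1, 2**107 - 1)   # hypothetical endorser E2
rt = {"request_id": "r1", "query": "kind == 'heart_rate'"}
envelope = endorse(endorse(rt, "IRB approved", e1), "security audit passed", e2)
request, feedbacks = verify_chain(envelope, {e1["pub"], e2["pub"]})
```

Because each outer signature covers the whole inner envelope, altering an earlier endorsement invalidates every later one in this sketch.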

3.3 Authorization Messages in the Pro-tocol

In this section, we describe the messages transmitted in the protocol. These five classes of messages are illustrated in Figure 2, and a summary is provided in Figure 3. The messages in the protocol are:

• Message 1: Data Source Ticket Generation. The goal of this message is to let the data owner ($O$) know about the existence of a new data object that belongs to them. Furthermore, the included data object ticket allows the owner to delegate discretionary access to the data object. The other included token, $\{K_S\}_{K'_O}$ (from assumption A1), authenticates the data source to the data owner. Later, the DOT can be used to authenticate the recipient.

• Message 2: Data Request. The data requester broadcasts a data request to the system subscribers. This data request may contain endorsements from one or more endorsers, which may provide supplementary information regarding the data request.

The data owner's client processes the data requests against its library of data object tickets in order to check whether the participation conditions are met and to identify which tickets meet the specifications of the data query.

Endorsements, as well as supplementary information, can also help the data owner decide whether to participate in the data request. The data owner can verify the origin of an endorsement by checking the endorsers' identities (from assumption A3) and by confirming that the $K_R$ in the RT matches the sender of the request.

$$(\text{M1})\quad S \to O : \{\text{DOT}, \{K_S\}_{K'_O}\}_{K_O}$$

$$\text{DOT} = \{\text{Data ID}, K_O, \text{metadata}, \text{DAP}\}_{K'_S}$$

$$(\text{M2})\quad R \to * : \{[\{\text{Hash}\{\text{RT}\}, \text{Feedback}\}_{K'_{E_1}}]^{+}, \text{RT}, K_{E_1}, K_R\}_{K'_R}, K_R$$

$$\text{RT} = \{\text{Request ID}, \text{Query}, \text{Conditions}, \text{Metadata}, \text{Duration}, K_R\}_{K'_R}$$

$$(\text{M3})\quad O \text{ via } K^2_O \to R : [\{\text{Data ID}, \text{DAP}, \text{DAT}, K_S\}_{K_R}]^{+}, \{\text{Request ID}, K^2_O, \{\text{Checksum}\}_{K^{2\prime}_O}\}_{K^{2\prime}_O}$$

$$\text{DAT} = \{K_O, \text{DOT}, \{\text{Data ID}, \text{Request ID}, \text{Query}, K_O, K^3_O, K_R\}_{K'_O}\}_{K_S}$$

$$(\text{M4})\quad R \to S : [\text{DAT}]^{+}, \{\{\text{Hash}\{[\text{DAT}]^{+}\}\}_{K'_R}, K_R\}_{K_S}$$

$$(\text{M5})\quad S \to O : \{K_S, \text{DOT}, \{\text{Query}, K_O, K^3_O, K_R\}_{K'_O}\}_{K^3_O}$$

Figure 3: Five access control messages in the Data Management Protocol.

• Message 3: Ticket Exchange. When the data owner consents to grant access to the data object(s), a ticket exchange message is sent to the data requester. This message contains one or more data access tickets per data source, each covering part of the requested data. Each data access ticket includes the identity $K_S$ of the data source that stores the data object, as well as the address at which the data can be accessed (DAP).

• Message 4: Data Access. As a result of Message 3, the data requester has now received the data access path as well as any other relevant information needed to access the data. Depending on the application, the data requester may contact the data source(s) directly or indirectly through the system.

The data source can verify the access by checking that:

1. The DOT is signed by the data source $S$ and includes $K_O$. This verifies $O$'s right to grant a capability on Data ID:

$$\text{DOT} \Rightarrow O \text{ controls } \text{Data ID}$$

2. The DAT includes Data ID, Query, and $K_R$ signed with $K_O$, and $K_R$ matches the requester's key. This verifies that $O$ granted a capability on Data ID using Query for $R$:

$$\{K_R, \text{Data ID}, \text{Query}\}_{K'_O} \Rightarrow O \text{ says } (K_R \wedge \text{Data ID} \wedge \text{Query})$$

$$(R \text{ for } O) \text{ says } \text{Data ID} \wedge \text{Query}$$

3. Optionally, the Request ID is not on a data request blacklist.

• Message 5: Access Announcement. In step 5, data sources are expected to announce to the system that access has been made using the DAP(s) and credentials provided in step 3 and used in step 4. This allows the data owner to monitor accesses to their data and informs other parties about the successful completion of the transaction.
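The three data source checks listed under Message 4 can be sketched as follows. As an assumption for illustration, signatures use toy textbook RSA built from small Mersenne primes (insecure), the encryption layers and the callback identity are omitted, and the names `issue_dot`, `issue_dat`, and `source_accepts` are hypothetical. The `proof` argument plays the role of the requester's signed hash of the presented tickets in message M4.

```python
import hashlib
import json


def make_key(p, q, e=65537):
    """Toy textbook-RSA key pair from two primes. Illustration only, NOT secure."""
    n, phi = p * q, (p - 1) * (q - 1)
    return {"pub": (n, e), "priv": pow(e, -1, phi)}  # modular inverse, Python 3.8+


def digest(obj, n):
    data = json.dumps(obj, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(obj, key):
    n, _ = key["pub"]
    return pow(digest(obj, n), key["priv"], n)


def verify(obj, sig, pub):
    n, e = pub
    return pow(sig, e, n) == digest(obj, n)


def issue_dot(data_id, owner_pub, source_key):
    """Data source signs a ticket naming the owner of a data object."""
    body = {"data_id": data_id, "owner": list(owner_pub)}
    return {"body": body, "sig": sign(body, source_key)}


def issue_dat(dot, query, request_id, requester_pub, owner_key):
    """Data owner signs a grant for one requester, one data object, one query."""
    grant = {"data_id": dot["body"]["data_id"], "query": query,
             "request_id": request_id, "requester": list(requester_pub)}
    return {"dot": dot, "grant": grant, "sig": sign(grant, owner_key)}


def source_accepts(dat, proof, source_pub, blacklist=()):
    dot, grant = dat["dot"], dat["grant"]
    owner_pub = tuple(dot["body"]["owner"])
    requester_pub = tuple(grant["requester"])
    return (
        verify(dot["body"], dot["sig"], source_pub)        # check 1: DOT signed by S, names K_O
        and dot["body"]["data_id"] == grant["data_id"]
        and verify(grant, dat["sig"], owner_pub)           # check 2: grant signed by K_O
        and verify(dat, proof, requester_pub)              # presenter holds the key named in the grant
        and grant["request_id"] not in blacklist           # check 3: optional blacklist
    )


source = make_key(2**31 - 1, 2**61 - 1)
owner = make_key(2**89 - 1, 2**107 - 1)
requester = make_key(2**19 - 1, 2**127 - 1)
dot = issue_dot("d1", owner["pub"], source)
dat = issue_dat(dot, "kind == 'heart_rate'", "r1", requester["pub"], owner)
proof = sign(dat, requester)
accepted = source_accepts(dat, proof, source["pub"])
```

The data source needs only its own public key and the presented ticket to reach a decision, which matches the claim that no external access control list is consulted.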

4 Data Store System

In this section we explain the design of the access control data store system. This storage system supports the distributed access control and is separate from the source data. Separating it from the source data makes it possible to keep the source data at its origin. The access control data store component is a persistent, scalable, decentralized, and distributed storage system based on Blockchain. We model the Blockchain as a form of persistent data storage with update notification support. It overcomes the single point of trust and failure and provides full transparency of transactions, very similar to Bitcoin. Note that Bitcoin's Blockchain is just one example of an appropriate protocol; other appropriate protocols may exist. Before explaining the system, we review the core components that are adopted from Bitcoin's Blockchain. We then show how Blockchain can be used as data storage.


4.1 Blockchain

Bitcoin, the system first introduced by Nakamoto [4], is the first truly decentralized global currency system. Like any other currency, its main purpose is to facilitate the exchange of goods and services by offering a commonly accepted value. Unlike traditional currencies, however, Bitcoin does not rely on a centralized authority to issue currency, control its supply and distribution, or verify the validity of transactions. Bitcoin enables a network of computers to maintain collective bookkeeping via the Internet. This bookkeeping is neither closed nor under the control of one party. Rather, it is public and available in a single digital ledger called the Blockchain, which is fully distributed across the network and relies on a network of volunteers that collectively implement a replicated ledger and verify transactions. Traditionally, Blockchain has been discussed in the context of Bitcoin; however, Blockchain goes beyond consensus currency and introduces many new and innovative methods, such as propagating information in the network, a public transaction history, multi-granularity, and many others.

Blockchain uses a multi-hop broadcast to propagate transactions and blocks through the network to update the ledger replicas. In the Blockchain, all transactions are logged, including information on the date, time, participants, and amount of every single transaction. Each node in the network holds a full copy of the Blockchain, and, on the basis of cryptographic principles, the transactions are verified by the so-called Bitcoin miners, who maintain the ledger. Eventual consistency ensures that these nodes automatically and continuously agree on the current state of the ledger and every transaction in it. If anyone attempts to corrupt a transaction, the nodes will not arrive at a consensus and hence will refuse to incorporate the transaction into the Blockchain. So every transaction is public, and thousands of nodes unanimously agree that a transaction occurred at a particular date and time. In Blockchain, trust comes from the fact that everyone has access to a shared single source of truth. The decentralized property is the fundamental difference from previous systems, which relied on a centralized block issuer and required users to trust the original issuer, which was still used to eventually clear transactions.

In our solution, we use the Blockchain to store the access control data in a decentralized manner. Before describing the decentralized access control, we need to discuss the Blockchain as data storage.

Figure 4: Overview of the Blockchain. Every block contains a hash of the previous block. New transactions are constantly added to the end of the chain. Source: [4].

4.2 Data Model in Blockchain

A block chain is a transaction database shared by all nodes participating in a system. As pointed out earlier, Blockchain can be used as data storage for many different applications; it provides various storage functionalities, among which three primitives are essential to our system: i. retrieve, ii. update, and iii. add. Below, we explain how these primitives are supported in the Blockchain and form the data flow, but first we review some definitions in the Blockchain adapted from the Nakamoto paper [4] and the Bitcoin Developer Guide [1], illustrated in Figure 4:

• Transaction: a transfer of value (e.g., Bitcoin, information) that is broadcast to the network and collected into blocks. A transaction references previous transaction outputs as new transaction inputs and dedicates all input Bitcoin values to new outputs. It is possible to browse and view every transaction ever collected into a block.

• Block: transaction data is permanently recorded in files called blocks. Blocks are organized into a linear sequence over time (also known as the block chain). As Figure 4 illustrates, every block contains a hash of the previous block. New transactions are constantly being processed by miners into new blocks, which are added to the end of the chain and can never be changed or removed once accepted by the network.

• Mining: a distributed consensus system that is used to confirm transactions and add transaction records to the public ledger of past transactions (the Blockchain). It enforces a chronological order in the Blockchain and allows different computers to agree on the state of the system.


• Miner: an individual or an organization performing the mining. Miners dedicate considerable computation power to maintaining the Blockchain. In Bitcoin, miners are incentivized by a reward, i.e., Bitcoins. In our system, miners are researchers and organizations requesting and analyzing the data; it is to their benefit to mine the Blockchain.

• Blockchain: a transaction database shared by all nodes participating in a system. A full copy of a Blockchain contains every transaction in order, dating back to the very first one. The entire Blockchain can be downloaded and openly reviewed by anyone.

• Genesis block: the first block of a Blockchain. The genesis block is hardcoded into the software and is a special case in that it does not reference a previous block.
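The definitions above can be illustrated with a minimal Python sketch of a hash-linked chain. It is a simplification for illustration only, not the Bitcoin block format or the authors' implementation: the `Block` fields, the JSON encoding, and the helper names `make_chain` and `is_valid` are all assumptions.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """A batch of transactions plus the hash of the previous block."""
    prev_hash: Optional[str]       # None only for the genesis block
    transactions: Tuple[str, ...]  # illustrative payload: plain strings

    def digest(self) -> str:
        # Hash a canonical JSON encoding of the block contents.
        payload = json.dumps(
            {"prev": self.prev_hash, "txs": list(self.transactions)},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def make_chain(batches: List[List[str]]) -> List[Block]:
    """Build a chain; the first batch becomes the genesis block."""
    chain: List[Block] = []
    prev: Optional[str] = None
    for batch in batches:
        block = Block(prev_hash=prev, transactions=tuple(batch))
        chain.append(block)
        prev = block.digest()
    return chain


def is_valid(chain: List[Block]) -> bool:
    """Check that every block references the hash of its predecessor."""
    if not chain or chain[0].prev_hash is not None:
        return False
    return all(
        chain[i].prev_hash == chain[i - 1].digest()
        for i in range(1, len(chain))
    )


chain = make_chain([["genesis"], ["S -> O: ownership"], ["O -> R: ticket"]])
print(is_valid(chain))

# Rewriting an accepted block changes its hash and breaks the link
# stored in the next block, so the tampering is detectable.
tampered = list(chain)
tampered[1] = Block(chain[1].prev_hash, ("S -> X: ownership",))
print(is_valid(tampered))
```

Because each block commits to its predecessor's hash, changing any accepted block invalidates every later link, which is the property the Block definition relies on.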

Our system’s data flow is based on three essential primitives enabled by the Blockchain:

1. Retrieve: by design, the entire Blockchain can be retrieved and downloaded by anyone. With this information, one can find out how much value (e.g., Bitcoin, information) belonged to each entity at any point in history.

2. Update: transactions and mining results are broadcast in the network. Every new block is ordered and linked to the previous block, which prevents nodes from missing any added information.

3. Add: adding data is the same process as transferring data:

(a) When a node in a Blockchain wants to transfer data, it broadcasts the request, which is received by all the nodes on the Blockchain network.

(b) After receiving the request, nodes that are miners add this most recent transaction request to a block. They then feed the new block and the previous block into a set of hash-function-based calculations.

(c) All the miners race to solve the resulting cryptographic puzzle. The first miner to solve the block adds it to the end of the Blockchain and broadcasts it to its peers.

(d) After the broadcast, peers check the transaction and start using the new version of the Blockchain.
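Steps (b) to (d) of the Add primitive can be sketched as a toy proof-of-work loop. This is a hedged illustration rather than Bitcoin's actual puzzle or the prototype's code: the string encoding, the `difficulty` parameter (number of leading zero hex digits), and the function names are assumptions chosen to keep the example short.

```python
import hashlib
from typing import List, Tuple


def block_hash(prev_hash: str, transactions: List[str], nonce: int) -> str:
    """Hash the previous block's hash together with the new transactions."""
    data = prev_hash + "|" + ";".join(transactions) + "|" + str(nonce)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def mine(prev_hash: str, transactions: List[str],
         difficulty: int = 3) -> Tuple[int, str]:
    """Step (c): search for a nonce whose hash has `difficulty` leading zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        candidate = block_hash(prev_hash, transactions, nonce)
        if candidate.startswith(target):
            return nonce, candidate
        nonce += 1


def peer_accepts(prev_hash: str, transactions: List[str], nonce: int,
                 claimed_hash: str, difficulty: int = 3) -> bool:
    """Step (d): a peer recomputes the hash once instead of redoing the search."""
    recomputed = block_hash(prev_hash, transactions, nonce)
    return recomputed == claimed_hash and recomputed.startswith("0" * difficulty)


previous = "00" * 32                     # placeholder hash of the last block
pending = ["O grants R access to d"]     # step (b): transactions to include
nonce, new_hash = mine(previous, pending)
print(peer_accepts(previous, pending, nonce, new_hash))
```

The asymmetry is the point of the design: finding the nonce takes many hash evaluations, while checking a broadcast block takes one.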

Figure 5: Client access models to the Blockchain. Left: Direct Access. Right: Server-Client. Blue lines indicate Blockchain access; the red line indicates an API call.

In the next section, we discuss different mechanisms that allow our clients to connect to the data store system. We use the above primitives to implement a publish-subscribe messaging service that delivers transactions to their intended recipients.

5 Messaging Service

In all the emerging big data analytics domains, interaction with the data flow is a major issue, and it is crucial to adopt the most scalable solution. In general, there are three client data access models available for users to access and communicate with the data store system: i. direct access, ii. server-client model, and iii. publish-subscribe model.

In this section, we review these three options. We explain how the publish-subscribe architecture provides scalable, flexible, and reliable communication between many senders and receivers without the need for persistent connectivity. Furthermore, we explain how we adapt the publish-subscribe model to our solution. In our system, whenever two roles want to interact, one sends a message to the messaging service component. The messaging service then sends the message to the data store system, which is based on Blockchain. After the message is successfully stored in the Blockchain (data store system), it is fetched by the recipient through the messaging service.

5.1 Data Access Model

In order to address a variety of use cases, we present three different client data access models for data owners and other participants to interact with the data flow.


• Direct Access: the openness of the Blockchain allows anyone to download the whole distributed database up to the present, and subsequent updates as deltas (blocks) (Figure 5-Left). This method is very straightforward and requires minimal implementation. It provides full validation of the Blockchain and maintains the highest level of safety and privacy. However, this model is not feasible in all environments: it requires a powerful, always-on computer to store and query a large amount of data. Devices such as mobile phones, embedded systems, or other less powerful systems typically lack the resources to interact directly with the data flow. Moreover, each entity, whether it is a Data Owner, Data Source, or Data Requester, is interested in only a small fraction of the data. Because each client must download and process the entire Blockchain, this model results in a large amount of redundant processing.

• Server-Client: In environments where there is a high degree of trust between two entities, a client-server model avoids the redundant processing associated with the Direct Access model. In a client-server model, the server handles all of the client’s interactions with the data flow (Figure 5-Right). In these environments, the client’s public and private keys are transmitted to the server, which acts on the client’s behalf. Typically, the client’s interaction with the data flow is completely abstracted by an application that interprets communication received from the server. One strength of this model is that it allows for password resets or other ways for the server to validate the client’s identity in case the client loses its keys. However, this model introduces risks into the system. Since the client’s keys are stored on the server, a compromised server may expose the client’s keys. Because these servers typically store a large number of keys, they are often the target of attackers.

• Publish-Subscribe (Pub-Sub): This model represents a mixture of the previous two models. In a Pub-Sub system, the Publisher or Server monitors the data flow on behalf of the Subscriber (Figure 6). This substantially reduces the processing load placed on the data owner. Clients subscribe to a set of queries. The publisher then filters incoming traffic based on these queries and communicates only the traffic that matches them to the subscriber. In these systems, the publisher does not have access to the data. It communicates the encrypted information, via a subscription, to the subscriber, who is then able to decrypt the information with their private key. The traffic is decrypted and processed on the client side. The only query required in the protocol is matching the destination to an identity. This query can be implemented very efficiently using a Bloom filter and provided by the pub-sub model.

Figure 6: Publish-Subscribe client access model to the Blockchain. Blue lines indicate Blockchain access, red lines indicate API calls, and black lines indicate publish-subscribe data transfer.
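To illustrate the identity-matching query, the following is a minimal Bloom filter sketch in Python. It is not the scalable Bloom filter of [6] used in the prototype; the class name, the SHA-256-based double hashing, the textbook sizing formulas, and the `matches` helper are assumptions made for brevity.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, capacity: int, error_rate: float) -> None:
        if capacity <= 0 or not 0.0 < error_rate < 1.0:
            raise ValueError("capacity must be > 0 and 0 < error_rate < 1")
        # Textbook sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = math.ceil(
            -capacity * math.log(error_rate) / (math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def matches(identities: BloomFilter, sender: bytes, recipient: bytes) -> bool:
    """A transaction is of interest if either party is a subscribed identity."""
    return sender in identities or recipient in identities


subscribed = BloomFilter(capacity=1000, error_rate=0.001)
subscribed.add(b"K_alice")  # hypothetical public-key identifier
print(matches(subscribed, b"K_source", b"K_alice"))
```

A false positive only causes the publisher to forward a transaction the subscriber cannot use; a subscribed identity is never missed, which is the property the delivery guarantee depends on.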

Next, we explain how we adopt the publish-subscribe model into our solution.

5.2 Publish-Subscribe Model

In this section, we explain how users’ clients utilize the subscribers to receive specific updates from the Blockchain. As explained in the previous section, a Blockchain is a collection of blocks. Each block contains a set of transactions, such as ticket requests and data accesses, between different roles in the system. Using a publish-subscribe model, publishers join the Blockchain, collect and filter the appropriate transactions, and provide subscribers with their requested transactions. This process includes the following mechanisms:

1. Join: after joining the Blockchain, each publisher receives updates and is able to access previous blocks in the Blockchain. Subscribers then subscribe to one or more publishers, request that a dedicated cache space be initiated on the publisher, and provide it with a set of identities (public keys). The set of identities determines which transactions the subscriber is interested in receiving. The cache space stores data so that it can be served faster when the subscriber requests it. It should be noted that publishers are located on machines that have access to the Blockchain updates and that also store a copy of the Blockchain for fast local access. Subscribers are located on the client’s machine.

2. Receive updates: whenever a new subscriber is added, it typically specifies the past block from which it would like to start receiving new blocks. Because of this, publishers can ensure that they send updates beginning with the correct block. Ideally, all subscribers receive updates from the same block, i.e., the last added block. Hence, publishers use one reader to retrieve updates and then relay them to all interested subscribers. In the case of error conditions, the publisher needs to restart sending updates from the last acknowledged location. For this, the publisher may need to read an older update. If many subscribers are recovering simultaneously, a naive implementation would have the publisher read from many positions in the Blockchain simultaneously, which might cause high publisher I/O load. Another approach is to have one recovery thread that starts from the oldest position and works towards the end.

3. Filter: after receiving the updates from the Blockchain, the publisher breaks the blocks down into the original transactions. Transactions are then filtered on the publisher side; the subscribers inform the publishers of what filters they need. The publisher delivers only updates that meet the specifications of the supplied filters. While the evaluation of filters does place some additional processing overhead on the publisher, it helps conserve both memory and network bandwidth. This is especially the case when there are many subscribers that require only a small subset of the data.

Subscribers request filtering for a set of identities that may be matched with either the sender or the recipient of a transaction. For example, a subscriber may request that a publisher deliver only messages marked as broadcast to all members. Using this structure, we can easily implement a Bloom filter, which works efficiently under these conditions. If a transaction is matched, it is kept and stored in the cache corresponding to the subscriber.

4. Deliver: reliability is an important requirement. For example, one missed update could lead to permanent corruption in a user’s portfolio. Subscribers receive the update stream of blocks. Publishers periodically track the delivery of updates by having subscribers acknowledge the delivery of updates. Subscribers are assumed to be stateless, i.e., they are not required to keep track of the state of deliveries. When a subscriber asks for its updates, the publisher sends all the transactions in the cache corresponding to the subscriber and asks for the subscriber’s acknowledgment. Upon receiving the acknowledgment, the publisher clears the cache. Otherwise, the publisher keeps trying until it receives the acknowledgment.

It should be noted that one client may deploy many subscribers, and one subscriber could join many unique publishers. Each publisher joins only one Blockchain.
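The Join, Filter, and Deliver mechanisms can be sketched as a small in-memory publisher. This is an illustrative sketch under stated assumptions, not the prototype's code: the `Transaction` fields, the method names, and the use of an exact set in place of a Bloom filter are hypothetical, and networking and persistence are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    payload: str  # encrypted in the real system; the publisher never reads it


class Publisher:
    """Filters Blockchain updates per subscriber and caches them until acked."""

    def __init__(self) -> None:
        self.filters: Dict[str, Set[str]] = {}
        self.caches: Dict[str, List[Transaction]] = {}

    def subscribe(self, subscriber_id: str, identities: Iterable[str]) -> None:
        # Join: register the identities of interest and allocate a cache.
        self.filters[subscriber_id] = set(identities)
        self.caches.setdefault(subscriber_id, [])

    def on_block(self, transactions: Iterable[Transaction]) -> None:
        # Filter: keep a transaction if its sender or recipient matches.
        for tx in transactions:
            for subscriber_id, identities in self.filters.items():
                if tx.sender in identities or tx.recipient in identities:
                    self.caches[subscriber_id].append(tx)

    def deliver(self, subscriber_id: str) -> List[Transaction]:
        # Deliver: send everything cached; nothing is dropped before the ack.
        return list(self.caches[subscriber_id])

    def acknowledge(self, subscriber_id: str, count: int) -> None:
        # Clear only the acknowledged prefix, so transactions that arrived
        # after the delivery stay cached for the next request.
        del self.caches[subscriber_id][:count]


publisher = Publisher()
publisher.subscribe("alice-phone", {"K_alice"})
publisher.on_block([
    Transaction("K_source", "K_alice", "ownership ticket"),
    Transaction("K_bob", "K_carol", "unrelated"),
])
batch = publisher.deliver("alice-phone")
publisher.acknowledge("alice-phone", len(batch))
print([tx.payload for tx in batch])
```

Keeping the delivery state on the publisher is what lets subscribers stay stateless: an unacknowledged batch is simply served again on the next request.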

6 Implementation and Evaluation

To estimate the performance of Dusc, we conduct a number of experiments to collect real performance characteristics of our prototype implementation. Our experiments are performed in the following settings:

• Server. We use a 2.2 GHz Intel(R) Xeon(R) CPU E5-2660 machine with 128 GB of RAM, running Ubuntu 14.04 x64. Code on this platform is developed using Python 2.7 and the PyCrypto library for cryptographic algorithms.

• Client. We use an iPhone 6 with iOS 9.2.1. Code on this platform is developed using Swift 2 and Xcode 7. The Apple Security Framework and the Heimdall library are used for cryptographic algorithms.

To minimize the error and increase the confidence level, each experiment is repeated 100 times and the median measured value is reported. Below, we describe the performance of three components of Dusc:
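The repeat-and-take-the-median procedure can be sketched as follows. The text does not say how timings were collected, so the helper name `median_runtime` and the use of `time.perf_counter` are assumptions for illustration only.

```python
import statistics
import time
from typing import Callable


def median_runtime(operation: Callable[[], object], repetitions: int = 100) -> float:
    """Run `operation` repeatedly and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Example: time a stand-in workload (a real run would time one protocol step).
elapsed = median_runtime(lambda: sum(range(10_000)))
print(f"median: {elapsed:.2e} s")
```

The median is less sensitive than the mean to occasional slow runs caused by scheduling or caching effects.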

6.0.1 Data Management Protocol

Authorization costs. As mentioned earlier, due to scalability challenges, it is not feasible for data sources to maintain an ACL. Therefore, data sources compute the authorization upon receipt of data requests rather than performing a lookup. To evaluate the overhead of this operation, we execute a batch of authorization checks on data requests with a variable number of capabilities. Each request can contain multiple capabilities; the time spent to process one capability is constant at 3.53e−2 s. Figure 7 shows the measured time for the authorization check of a data request with different numbers of capabilities on the server with a single-core resource allocation. This time grows linearly with the number of capabilities, but is constant with respect to the number of owners and data objects. For a request with 1M capabilities, authorization takes 2.6 s.

Figure 7: Authorization costs in the data management protocol. (x-axis: number of capabilities per request, 0 to 300; y-axis: average time to process the request in seconds, 0 to 10.)

Figure 8: Bloom filter performance in messaging service Pub-sub. (x-axis: request keys, log10 scale; y-axis: average lookup time per transaction in seconds, 1.00E-05 to 1.60E-05.)
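The linear per-capability cost reported for authorization can be turned into a rough estimator. The 3.53e−2 s constant is taken from the text; the function name and the assumption of ideal speedup across cores are illustrative and not claims made by the paper.

```python
PER_CAPABILITY_SECONDS = 3.53e-2  # per-capability cost reported in the text


def estimated_authorization_time(num_capabilities: int, cores: int = 1) -> float:
    """Linear cost model; dividing by `cores` assumes ideal parallel speedup."""
    if num_capabilities < 0 or cores < 1:
        raise ValueError("num_capabilities must be >= 0 and cores >= 1")
    return num_capabilities * PER_CAPABILITY_SECONDS / cores


# Compare with the Message 4 (Verify) rows of Table 2:
# 100 capabilities on 1 core, then 100 and 1000 capabilities on 16 cores.
for capabilities, cores in [(100, 1), (100, 16), (1000, 16)]:
    seconds = estimated_authorization_time(capabilities, cores)
    print(f"{capabilities} capabilities on {cores} core(s): {seconds:.3f} s")
```

Under these assumptions the estimates (about 3.5 s, 0.22 s, and 2.2 s) are close to the measured 3.26 s, 0.200 s, and 2.05 s in Table 2.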

Create and verify messages. The data management protocol contains five different classes of messages. For each message communicated, the sender creates a message and the recipient verifies the integrity of the message. We have evaluated these overheads for three different roles: data source, data owner, and data requester. We have also considered two scenarios in which each role is executed on either the server or the client side. As shown in Table 2, the overhead in all the above scenarios is negligible.

6.0.2 Messaging Service

Pub-Sub performance. The persistence of messages and the scalability of the messaging service in Dusc rely on the performance of the implemented Pub-Sub. For our Pub-Sub implementation, we use the Bloom filter implementation of [6] to achieve near-linear filtering performance. Figure 8 shows the time spent to filter transactions with respect to the number of requested keys. For each transaction in Dusc, there exist two keys, corresponding to the sender and the receiver. We use RSA-generated 1024-bit public keys. On average, it takes 1.12e−5 s to filter each transaction when 1M keys are present, under a 0.1% Bloom filter error rate. The increase in this time as the number of keys grows is almost negligible.
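For intuition about the memory cost of filtering at this scale, the textbook sizing formulas for a plain Bloom filter can be evaluated at the parameters reported above (n = 10^6 keys, error rate p = 10^-3). This is an assumption made for illustration: the prototype uses the scalable Bloom filter of [6], whose sizing differs, and the symbols n, p, m (number of bits), and k (number of hash functions) do not appear in the original text.

```latex
m = -\frac{n \ln p}{(\ln 2)^2}
  = \frac{10^{6} \times 6.908}{0.4805}
  \approx 1.44 \times 10^{7} \text{ bits} \approx 1.8 \text{ MB},
\qquad
k = \frac{m}{n} \ln 2 \approx 14.38 \times 0.693 \approx 10
```

Under these assumptions, a filter for one million identities fits comfortably in memory, and each lookup costs about ten hash probes regardless of how many keys are stored.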

7 Discussion

In this section, we look back at the problems facing scalable data access and how Dusc addresses the aforementioned challenges.

The following challenges have been addressed through the data management protocol:

1. User-centric. Users are in full control of granting access to their collective data. They can always query the system and discover data that belongs to them.

2. Privacy-preserving. Users preserve their privacy by consciously choosing who can access what data.

3. Endorsed. A well-defined role with well-defined responsibilities.

12. Correct. We have shown the correctness in Appendix I.

The following challenges have been addressed through the messaging service and data store system:

4. Accessible. Every individual or organization can easily join the Blockchain and Pub-Sub system.

5. Scalable. The scalability of both components has been discussed in Section 6.

6. De-centralized. There is no need for a trusted centralized point; Blockchain ensures that anyone can take the role of maintaining the ledger.

7. Distributed. Data can be stored on various data sources. Since our system separates the actual data store from data management, no data has to be moved from its point of origin.

8. Asynchronous. Both data and service are always available to clients with bad connectivity. Pub-Sub ensures messages are cached by the Publishers, and Blockchain assures the availability of data.

10. Transparent. This is a property of Blockchain.

11. Auditable. Another property of Blockchain.

The following are general properties of our system:

9. Compliant. Our model provides a particular advantage: because it does not store any data or user-identifiable information, it may not be subject to the regulatory requirements.

13. Usable. All the data owner functionalities are exposed to a user through a mobile app; users can simply view their data portfolio and grant access to data requesters.

Table 2: Overhead of create and verify messages in the data management protocol.

Step                Role       Quantity           # Processing Cores  Processing Overhead (second)
Message 1 (Create)  Source     N/A                1                   1.20E−02
Message 1 (Verify)  Owner      N/A                1                   2.00E−02
Message 2 (Verify)  Owner      1 Endorser         1                   1.66E−02
Message 2 (Verify)  Owner      10 Endorsers       1                   2.20E−01
Message 3 (Create)  Owner      1 Capability       1                   8.77E−03
Message 3 (Create)  Owner      10 Capabilities    1                   8.57E−02
Message 3 (Create)  Owner      100 Capabilities   1                   8.54E−01
Message 3 (Verify)  Requester  1 Capability       1                   2.37E−02
Message 3 (Verify)  Requester  10 Capabilities    1                   1.89E−01
Message 3 (Verify)  Requester  100 Capabilities   1                   1.84E+00
Message 4 (Create)  Requester  1 Capability       1                   1.59E−04
Message 4 (Create)  Requester  10 Capabilities    1                   3.98E−04
Message 4 (Create)  Requester  100 Capabilities   1                   2.79E−03
Message 4 (Verify)  Source     1 Capability       1                   3.51E−02
Message 4 (Verify)  Source     10 Capabilities    1                   3.28E−01
Message 4 (Verify)  Source     100 Capabilities   1                   3.26E+00
Message 4 (Verify)  Source     100 Capabilities   16                  2.00E−01
Message 4 (Verify)  Source     1000 Capabilities  16                  2.05E+00
Message 5 (Create)  Source     N/A                1                   1.02E−02
Message 5 (Verify)  Owner      N/A                1                   1.54E−02

8 Related Works

References

[1] Bitcoin developer guide.

[2] ABADI, M., BURROWS, M., LAMPSON, B., AND PLOTKIN, G. A calculus for access control in distributed systems. ACM Transactions on Programming Languages and Systems (TOPLAS) 15, 4 (1993), 706–734.

[3] LAMPSON, B., ABADI, M., BURROWS, M., AND WOBBER, E. Authentication in distributed systems: Theory and practice. ACM Transactions on Computer Systems (TOCS) 10, 4 (1992), 265–310.

[4] NAKAMOTO, S. Bitcoin: A peer-to-peer electronic cash system. Consulted 1, 2012 (2008), 28.

[5] NEUMAN, B. C., AND TS’O, T. Kerberos: An authentication service for computer networks. Communications Magazine, IEEE 32, 9 (1994), 33–38.

[6] XIE, K., MIN, Y., ZHANG, D., WEN, J., AND XIE, G. A scalable bloom filter for membership queries. In Global Telecommunications Conference, 2007. GLOBECOM’07. IEEE (2007), IEEE, pp. 543–547.

Appendix I: Data Management Protocol Analysis

Security correctness is a major requirement for practical applications. In Dusc, this requirement is assured through the delegation-by-certificate method. In this section, we provide a more formal proof using the notion and method of Abadi et al. introduced in [2] and [3].


The data management protocol relies on two abilities:

D1. Ability of the Source (S) to give to another principal, the Owner (O), the authority to act on the Source’s behalf

D2. Ability of the Owner (O) to give to another principal, the Requester (R), the authority to act on the Owner’s behalf

In this section, these abilities are discussed as a form of delegation (more thoroughly studied in [2]). In our delegation model, when a data requester (R) requests data from a data source (S), R presents a token demonstrating that R is making the request and that the owner (O) has delegated to R. Upon R’s request, access is granted only if S decides this delegation is permitted. This decision is usually made by looking up an ACL. However, in our system, S delegates the decision on the former delegation to O through another delegation, as a substitute for an ACL record. In other words, O is delegated to decide whether the delegation from O to R is permitted to access a certain object. This positions O at the center of access control.

In the proposed protocol, the rights in each delegation are as follows:

1. In D1, the Source delegates the access control decision on a certain data object to the Owner.

2. In D2, the Owner delegates access to this data object on the Source, with a certain query, to the Requester.

In our system, delegation certificates are not transferable, and access control decisions depend only on the identities of both O and R.

Delegation relies on some form of trust among R, O, and S, preferably established by authentication. In our system:

1. Owner and source are assumed authenticated (Assumptions A1, A2)

2. Owner authenticates endorsers by using a trusted third party and may trust the requester by endorsement (Assumption A3)

We consider three instances of delegation. In each case we are led to ask whether composite principals, such as B|A, appear on C’s ACL. The simplest instance of delegation is delegation without certificates:

1. When B wishes to make a request r on A’s behalf, B sends the signed request along with A’s name, for example in the format $K_B \text{ says } A \text{ says } r$

2. When C receives the request r, he has evidence that B has said that A requests r, but not that A has delegated to B; C then consults the ACL for request r and determines whether the request should be granted under these circumstances

The following is the reasoning of the data source(S), who makes the access control decision.

In message 4, when R wishes to access the data object (d) on the data owner (O)’s behalf, the data requester (R) sends the data access ticket (DAT) received in step 3, signed with the private key corresponding to its public key ($K_R$), to the data source (S). It contains two delegation certificates:

1. DOT, which is the certificate of the D1 delegation.

2. $\{Query, O, K^{3}_{O}, K_R\}_{K'_O}$, which is the certificate of the D2 delegation.

The signed message $\{Query, O, K^{3}_{O}, K_R\}_{K'_O}$ verifies that R is delegated by O under the $R \wedge Query \wedge d$ access right:

$O \text{ says } (R \wedge Query \wedge d)$

At this point, S needs to decide whether this delegation has access to data object d. Using DOT, S verifies that O is delegated to control d by S itself:

$S \text{ says } (O \text{ controls } d)$

Therefore access will be granted.
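The decision made by S can be sketched as a check over the two delegation certificates. This is a simplified model, not the protocol's wire format: the class and field names are hypothetical, the key material carried in the certificates is omitted, and both signatures are assumed to have been verified before this function is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnershipTicket:
    """DOT, the D1 certificate: S says (O controls d)."""
    source: str
    owner: str
    data_object: str


@dataclass(frozen=True)
class AccessCertificate:
    """The D2 certificate: O says (R AND Query AND d)."""
    owner: str
    requester: str
    query: str
    data_object: str


def source_grants(source: str, requester: str, query: str, data_object: str,
                  dot: OwnershipTicket, cert: AccessCertificate) -> bool:
    """Decision by S, assuming both certificate signatures already verified."""
    # D1: S itself delegated control over this data object to the owner.
    d1_ok = dot.source == source and dot.data_object == data_object
    # D2: that same owner delegated exactly this requester, query, and object.
    d2_ok = (
        cert.owner == dot.owner
        and cert.requester == requester
        and cert.query == query
        and cert.data_object == data_object
    )
    return d1_ok and d2_ok


dot = OwnershipTicket(source="S", owner="O", data_object="d")
cert = AccessCertificate(owner="O", requester="R", query="q", data_object="d")
print(source_grants("S", "R", "q", "d", dot, cert))
```

Because the check compares the requester named in the certificate with the party making the request, a certificate passed on to another principal is rejected, which matches the non-transferability stated earlier.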
