Chen Chen*, Anrin Chakraborti, and Radu Sion PD-DM ...€¦ · areeithersloworinsecureagainst“multi-snapshot”ad-versaries that can gain access to the storage multiple timesoveralongerperiod.Forexample,steganographic

Proceedings on Privacy Enhancing Technologies 2019 (1)153ndash171

Chen Chen Anrin Chakraborti and Radu Sion

PD-DM An efficient locality-preserving blockdevice mapper with plausible deniabilityAbstract Encryption protects sensitive data fromunauthorized access yet is not sufficient when users areforced to surrender keys under duress In contrast plau-sible deniability enables users to not only encrypt databut also deny its existence when challengedMost existing plausible deniability work (eg the suc-cessful and unfortunately now-defunct TrueCrypt) tack-les ldquosingle snapshotrdquo adversaries and cannot handle themore realistic scenario of adversaries gaining access toa device at multiple time points Such ldquomulti-snapshotrdquoadversaries can simply observe modifications betweensnapshots and detect the existence of hidden dataExisting ideas handling ldquomulti-snapshotrdquo scenariosfeature prohibitive overheads when deployed onpractically-sized disks This is mostly due to a lackof data locality inherent in certain standard access-randomization mechanisms one of the building blocksused to ensure plausible deniabilityIn this work we show that such randomization is notnecessary for strong plausible deniability Instead it canbe replaced by a canonical form that permits most ofwrites to be done sequentially This has two key advan-tages 1) it reduces the impact of seek due to randomaccesses 2) it reduces the overall number of physicalblocks that need to be written for each logical write Asa result PD-DM increases IO throughput by ordersof magnitude (10ndash100times in typical setups) over exist-ing work while maintaining strong plausible deniabilityagainst multi-snapshot adversariesNotably PD-DM is the first plausible-deniable systemgetting within reach of the performance of standard en-crypted volumes (dm-crypt) for random IO

Keywords Plausible deniability Storage security

DOI 102478popets-2019-0009Received 2018-05-31 revised 2018-09-15 accepted 2018-09-16

Corresponding Author Chen Chen Stony Brook Uni-versity E-mail chen18csstonybrookeduAnrin Chakraborti Stony Brook University E-mail an-chakrabortcsstonybrookeduRadu Sion Stony Brook University E-mailsioncsstonybrookedu

1 IntroductionAlthough encryption is widely used to protect sensitivedata it alone is usually not enough to serve usersrsquo in-tentions At the very least the existence of encryptioncan draw the attention of adversaries that may have thepower to coerce users to reveal keys

Plausible deniability(PD) aims to tackle this prob-lem It defines a security property making it possibleto claim that ldquosome information is not in possession [ofthe user] or some transactions have not taken placerdquo[23] In the context of storage devices as addressed inthis paper PD refers to the ability of a user to plausiblydeny the existence of certain stored data even when anadversary has access to the storage medium

In practice PD storage is often essential for sur-vival and sometimes a matter of life and death Ina notable example the human rights group Networkfor Human Rights Documentation-Burma (ND-Burma)carried data proving hundreds of thousands of humanrights violations out of the country on mobile devicesrisking exposure at checkpoints and border crossings[27] Videographers brought evidence of human rightsviolations out of Syria by hiding a micro-SD card inbody wounds [24] again at high risk of life

Yet unfortunately existing PD storage solutionsare either slow or insecure against ldquomulti-snapshotrdquo ad-versaries that can gain access to the storage multipletimes over a longer period For example steganographicfilesystems [5 23 26] (also known as ldquodeniable filesys-temsrdquo) resist only adversaries who can access storagemediums once (ldquosingle-snapshotrdquo adversaries) And al-though some attempts [26] have been made to handlemulti-snapshot adversaries they are insecure A strongadversary can straightforwardly infer what data is hid-den with a relatively high probability better than ran-dom guessing On the other hand while block-level PDsolutions [6 8] handling multi-snapshot adversaries ex-ist their performance is often simply unacceptable andmany orders of magnitude below the performance ofnon-PD storage

Nevertheless most realistic PD adversaries are ul-timately multi-snapshot Crossing a border twice orhaving an oppressive government collude with a hotel

PD-DM 154

maid and subsequently a border guard provides easyand cheap multi-snapshot capabilities to any adversary

It is obvious that the security of a PD systemshould not break down completely (under reasonableuser behavior) and should be resilient to such real-istic externalities (hotel maids border guards airlinechecked luggage etc) Thus undeniably a practical andsound plausible-deniable storage solution crucially needsto handle multi-snapshot accesses

In virtually all PD designs handling multi-snapshotadversaries requires hiding the ldquoaccess patternsrdquo to hid-den data Otherwise adversaries could trivially observeimplausible modifications between snapshots and inferthe existence of hidden data

Some of the first solutions [8 31] hide access pat-terns by rendering them indistinguishable from randomUnfortunately existing randomization-based mecha-nisms completely break data locality and naturally im-pose large numbers of physical writes that are neededto successfully complete one logical write This resultsin unacceptable performance especially for high-latencystorage (eg rotational disks) For example HIVE [8]performs 3 orders of magnitude slower than a non-PDdisk baseline

We observe that randomization is not the only wayto hide access patterns Canonical forms that dependonly on non-hidden public data (eg sequential struc-turing) suffice and can yield dramatic throughput in-creases This is mainly due to two reasons First it iswell known that sequential structuring can vastly speedup IO by reducing the impact of random seeks [14]This has been previously well explored in log-structuredfilesystems [13 28] Second the predictable nature ofsuch sequential access leads to more efficient searchesfor free block locations (to write new data) with lowernumbers of IO operations when compared with exist-ing (mostly randomized access) solutions

Finally we note that by definition a well-definedmulti-snapshot adversary cannot observe the device (in-cluding its RAM etc) between snapshots This enablesadditional optimizations for increased throughput

The result is PD-DM a new efficient block-levelstorage solution that is strongly plausibly deniableagainst multi-snapshot adversaries preserves localityand dramatically increases performance over existingwork In typical setups PD-DM is orders of magnitude(10ndash100times) faster than existing approaches

2 Storage-centric PD ModelPD can be provided at different layers in a storage sys-tem However properly designing multi-snapshot PD atthe filesystem level is not trivial As filesystems involvemany different types of data and associated metadataresolving all data and metadata modifications and as-sociated chain reactions in a plausibly deniable mannerrequires significant and careful consideration We arenot aware of any successful fully secure design More-over data loss (overwriting of data at a high deniabilitylevel by writes to data at a lower deniability level) islikely an unavoidable drawback of filesystem PD ideas

Instead the focus of this work is on storage-centricPD for block devices ndash allowing a user to store files of dif-ferent confidentiality levels to different logical volumesstored on the same underlying physical device Thesevolumes are accessed independently by upper-layer ap-plications such as filesystems and databases

It is important to note that no existing PD solu-tion hides the fact that a PD system is in place So far(and in this paper) the role of a sound PD system hasbeen to hide whether a user stores any hidden files (forfilesystems) or uses any hidden volumes (in block levelschemes) In other words it is assumed that adversarieswonrsquot punish users for merely using a PD-enabled sys-tem at least as long as the system allows also non-PDpurposes eg storing in a public volume

21 PD Framework at Block Level

A PD system at the block level intermediates betweena raw storage device and upper logical volumes (Figure1) It structures accesses to the underlying device whileproviding upper-level abstractions for multiple indepen-dent logical volumes Logical public volumes V ip can berevealed to adversaries whereas the existence and us-age of hidden volumes V jh should be hidden from ad-versaries The physical device contains Nd blocks of Bbytes each while each logical volume has N i

pNjh logi-

cal data blocks respectively The volumes are encryptedwith different encryption keys Ki

p and Kjh (eg derived

from passwords P ip and P jh) If coerced users can pro-vide the public passwords and deny the existence or useof hidden volumes even to multi-snapshot adversariesAdversary We consider a strong computationallybounded ldquomulti-snapshotrdquo adversary able to get com-plete representations of the devicersquos static state at mul-tiple times of her choosing Importantly the adversarycannot see the running state (memory caches etc) of

PD-DM 155

Fig 1 Overview of a PD system with one public volume and onehidden volume as an example PD-DM provides virtual volumeabstractions The choice of whether the hidden volume is usedwill be hidden

the PD layer occurring before she takes possession ofthe device For example a border guard in an oppres-sive government can check a userrsquos portable hard drivesevery time he crosses the border but the guard cannotinstall malwares or otherwise spy on the deviceAccess Pattern An access can be either aread or write to a logical volume Pub_read(lip)Pub_write(lip dip) denote accesses to public volume V ip while Hid_read(ljh) Hid_write(ljh d

jh) denote accesses

to hidden volume V jh lip and ljh are logical block IDs

while dip and djh are data items An access pattern Orefers to an ordered sequence of accesses being submit-ted to logical volumes eg by upper-layer filesystemsWrite Trace A write trace W(O) refers to a series ofactual modifications to the physical block device result-ing from a logical access patternO Let Phy_write(α d)denote an atomic physical device block write operationof data d at location α (physical block ID) Then a writetrace W(O) can be represented as an ordered sequenceof physical block writes Phy_write(α d)

A multi-snapshot adversary may get informationabout a write trace by observing differences in the ldquobe-forerdquo and ldquoafterrdquo device states A PD system is designedto prevent the adversary from inferring that the writetrace corresponds to hidden volume accesses

Not unlike existing work PD-DM achieves this byproviding plausible explanations for accesses to hiddenvolumes using accesses to public volumes Figure 2 illus-trates Consider two access patterns O0 and O1 O1 con-tains hidden accesses while O0 does not The two corre-sponding write traces bring the disk from input state S1to output state S20 and S21 according to which of O0and O1 has been executed The idea then is to empowera user to plausibly deny the execution of hidden accessesby arguing that any observed device state changes be-tween snapshots would have been indistinguishably in-

Fig 2 Consider two access patterns O0 and O1 that transformthe state of the disk from S1 to S20 and S21 respectively Ad-versaries should not be able to distinguish S20 from S21 basedon the snapshots Effectively the adversary should not be able todistinguish W(O0) from W(O1)

duced by O1 or O0 Effectively the corresponding writetraces W(O0) and W(O1) should be indistinguishable

One key observation is that only write accesses areof concern here since read accesses can be designed tohave no effect on write traces This also illustrates whyit is important to guarantee PD at block level ndash the abovemay not necessarily hold for filesystems which may up-date ldquolast-accessedrdquo and similar fields even on reads

22 Security Game

An intuitive explanation of PD at block level is illus-trated in Figure 2 Towards more formal insights weturn to existing work and observe that security require-ments are sometimes quite divergent across differentworks For example Blass et al [8] introduced a hid-den volume encryption (HIVE) game It outlines twomain concerns frequency of snapshots and ldquodegree ofhidingrdquo ldquoRestrictedrdquo ldquoopportunisticrdquo and ldquoplausiblerdquohiding are defined along with three kinds of adversariesldquoarbitraryrdquo ldquoon-eventrdquo and ldquoone-timerdquo Chakraborti etal [6] proposed the PD-CPA game (Appendix A1) tar-geting a more powerful ldquoarbitraryrdquo adversary furtherexcluding reliance on any specific device property

We discover that both models feature a sharedstructural backbone coupled with certain adversary-specific features The games are defined between an ad-versary A and a challenger C on a device D with a ldquometagamerdquo structure

Before presenting the meta game we first introduceIsExecutable(O) ndash a system-specific predicate that re-turns True if and only if the access patterns O can beldquofully executedrdquo It is hard to have a universal definitionof IsExecutable(O) for all the PD solutions For exam-ple in HIVE IsExecutable(O) models whether there

PD-DM 156

exists one corresponding read after each access to ahidden volume In PD-CPA IsExecutable(O) modelswhether the ratio between the number of hidden writesand the number of public writes is upper bounded by aparameter φ Then the meta game is defined as follows

1 Let np and nh denote the number of public and hiddenvolumes respectively

2 Challenger C chooses encryption keys Kipiisin[1np]

and Kjhjisin[1nh] using security parameter s

3 C creates logical volumes V ipiisin[1np] and

V jhjisin[1nh] in D encrypted with keys Ki

p andKj

hrespectively

4 C selects a random bit b5 C returns Ki

piisin[1np] to A6 The adversary A and the challenger C then engage in

a polynomial number of rounds in which(a) A selects a set of access patterns Oi

piisin[1np] andOj

hjisin[1nh] with the following restrictions

i Oip includes arbitrary writes to V i

p ii Oj

hincludes writes to V j

h subject to restric-

tion 6(a)iv belowiii O0 includes all the accesses in Oi

piisin[1np]iv There exists O1 a sequence composed (eg

by adversary A) of all accesses in O0 andOj

hjisin[1nh] such that IsExecutable(O1) =

(b) C executes Ob on D(c) C sends a snapshot of the resulting device to A

7 A outputs bprime Arsquos guess for b

Definition 1 A storage mechanism is plausibly-deniable if it ensures A cannot win the security gamewith a non-negligible advantage over random guessingOr more formally there exists a function ε(s) negli-gible in security parameter s such that Pr[bprime = b] le12 + ε(s)

The restrictions on access pattern choices (in step 6a)are mandatory in the game to prevent ldquotrivialrdquo impos-sibility In the absence of access patterns restrictions itmay be impossible to find a solution Eg an adversarymay choose different public accesses for O0 and O1 sothat she can distinguish access patterns by public ac-cesses only and trivially win the game The restrictions6(a)i - 6(a)ii ensure that the public writes chosen tocover the hidden writes are not a function of the hiddenwrites lest information about the existence of the hid-den volume is leaked by the public write choices (whichare unprotected from adversaries)

The IsExecutable(O1) in the restriction 6(a)iv is re-lated to the systemsrsquo ability to combine public and hid-den accesses to ensure PD It prevents adversaries win-

ning the game by trivially choosing access patterns withhidden accesses that cannot be hidden by a given sys-tem In existing work IsExecutable(O1) depends onlyon the ratio of hidden to public operations (either publicread or public write) This yields actual practical solu-tions but also limits the solution space to ORAM-basedmechanisms Yet this is not necessary if we consider thefact that adversaries in reality have much less freedomin choosing access patterns

Specifically requiring adversaries to choose no mat-ter how many public accesses that are needed to coverthose hidden accesses in the access pattern is quite rea-sonable The number of public accesses chosen couldbe related to not only the number of hidden accessesbut also the current state of hidden data on disk Inreality this corresponds to a multi-snapshot adversarywho gains access to storage devices at different pointsin time but does not monitor the system running As aresult the adversary has limited knowledge about whathas happened in the meantime in the hidden volumes(eg hard disk snooping [18 22])

In this case IsExecutable() can still be easily full-filled by access patterns coming from many realisticupper-layer workloads For example if PD-DM runsunderneath log-structure filesystems on both volumesIsExecutable() becomes exactly the same as that in PD-CPA Further once the public and hidden volume work-loads sufficiently correlate the PD-DM IsExecutable()is True with high probability (see Section 42)Timing Although a system log may record the publicwrites which cover the hidden accesses in PD-DM it isimportant to note that timing information is assumedto not provide adversaries any significant advantage

3 System DesignThe key idea in PD-DM is to ensure a canoni-calsequential physical device write trace notwith-standing the logical layer workload All writes to thephysical device are located at sequentially increasingphysical addresses similar to append operations in log-structure filesystems

This canonical nature eliminates the ability of anadversary to infer logical layer access patterns and thepossible existence of a hidden volume stored in appar-ently ldquofreerdquo space ndash eg by noticing that this ldquofreerdquospace is being written to in between observed snapshots

More importantly it brings two significant perfor-mance benefits Firstly it reduces the impact of disk

PD-DM 157

seeks This is very important as the seek time dominatesthe disk access time [20] For example the average seektime for modern drives today may range from 8 to 10ms [21] while the average access time is around 14 ms[20] Secondly as we will see a block that is chosen tobe written in PD-DM is more likely to contain invaliddata compared to a randomly-selected block This re-duces the average number of physical blocks that needto be accessed to serve one logical write requestSystem layout Figure 3 illustrates the disk layout ofPD-DM and how logical volumes are mapped to the un-derlying physical device For simplicity we consider onepublic volume and one hidden volume in PD-DM1 Asshown later it is easy to add more volumes In generalthe disk is divided into two segments a data segmentand a mapping segment Both public and hidden datais mapped to the data segment more specifically theData area (in Figure 3) Several data structures (PPMPBM HM_ROOT in Figure 3) are stored in the map-ping segment to maintain the logical volume to physicaldevice mapping obliviously and provide physical blockstatus information As will be detailed later the PPMand PBM are associated with the public volume andHM_ROOT with the hidden volume

For better performance recently used mapping-related blocks are cached in memory using assignedmemory pools2

Log-structure The Data area is written sequentiallyas a log and the Phead records its head An atomic writeto the Data area writes not only public data but also ei-ther hidden data as well as its related mapping blocks orsome dummy data to several successive blocks pointedby Phead3 The pointer Phead indicates the block af-ter the last written physical block and wraps around topoint again to the first physical block once the last blockof the Data area is writtenUnderlying Atomic Operation PD_write Asmentioned PD-DM writes to the Data area sequentiallywith an atomic operation Thus we define the atomicoperation as PD_write (see Section 33 for details) Thecontent written by a PD_write operation can be ad-justed by users We assume that one block of public

1 Vp and Vh are used for the two volumes instead of V ip and V j

and the index i or j is also omitted in all the other notationsdefined in Section 2 for simplicity2 Not to be confused with chip level caches (PLB) used in someORAM-based designs [15]3 This is a key point and reduces the impact of disk seeks byan order of magnitude

Fig 3 PD-DM layout The left-most dotted rectangle (blue)highlights the data structures used in mapping logical volumeblocks to physical locations PPM PBM and HM_ROOT Theright-most dotted rectangle (red) highlights the data storagesegment including ldquoPheadrdquo and a set of consecutive blocks on thedisk This segment stores all data from both public and hiddenvolumes as well as parts of the hidden map

data and one block of hidden data can be included inone PD_write operation without loss of generality

PD-DM receives upper-layer IO requests andtranslates them into physical IO to underlying devicesFor the sake of simplicity we first discuss write requestsand later read requests The remaining part of this sec-tion is structured as follows Section 31 describes howpublic and hidden data is arranged in the Data areaunder the assumption that mappings are in memorySection 32 details disk layout of mapping data and itsassociated plausibly deniable management Finally Sec-tion 33 outlines the entire end-to-end design of PD-DM

31 Basic Design

In this section we simplify the discussion by consider-ing certain part of the system as black boxes Specifi-cally to handle logical write requests (Pub_write(lp dp)and Hid_write(lh dh)) PD-DM needs to arrange thereceived data dpdh on the physical device and recordthe corresponding mappings from logical volumes to thephysical device for future reference For now we willconsider the maps as black boxes managing the logical-physical mappings as well as the status (free valid in-valid etc) of each physical block on the device Themap blackbox internals will be provided in Sections 32The black boxes support the following operations whereα is the ID of the physical location storing the logicalblock (lp or lh) and S denotes the ldquostatusrdquo of a physicalblock free valid or invalidndash α = Get_Pub_map(lp) Set_Pub_map(lp α)ndash α = Get_Hid_map(lh) Set_Hid_map(lh α)ndash S = Get_status(α) Set_status(α S)With these auxiliary functions incoming pairs ofPub_write(lp dp) and Hid_write(lhdh) requests are

PD-DM 158

handled straightforwardly as described in Algorithm 1As we assumed that one block of public data and oneblock of hidden data are written together in this papertwo blocks in the Data area are written Specificallypublic data dp and hidden data dh are encrypted withKp and Kh respectively and written to the two suc-cessive blocks at physical locations Phead and Phead+1Initially Phead is set to 0 and PD-DM is ready to receivelogical write requests

In a public-volume-only setting random data is sub-stituted for the encrypted hidden data In Algorithm 1dh = Null signifies the lack of hidden write requestsfrom the upper layers

Algorithm 1 Pub_write + Hid_write ndash Simple version

Input Pub_write(lp dp) Hid_write(lh dh) Phead

1 if dh is not Null then2 Phy_write(Phead dp)3 Phy_write(Phead + 1 dh)4 Set_Pub_map(lp Phead)5 Set_Hid_map(lh Phead + 1)6 else a public volume only no hidden write request7 Randomize data dummy8 Phy_write(Phead dp)9 Phy_write(Phead + 1 dummy)10 Set_Pub_map(lp Phead)11 end if12 Phead larr Phead + 2

Wrap around As described PD-DM always aims towrite to the ldquooldestrdquo (least recently written) block inthe Data area To achieve this it needs to mitigate thepossibility that after a wrap-around valid data is infact stored in that location and cannot be overwrittenFor simplicity consider the size of the Data area beinga multiple of 2 and aligned by public-hidden data pairs

We denote the existing public-hidden data pairpointed to by Phead dexistingp and dexistingh Their statuscan be either valid or invalid Invalid data may be eitherthe result of a previous dummy block write or the caseof data being an ldquooldrdquo version of a logical block

Now the key to realize here is that preserv-ing any of the valid existing data can be done byincluding it into the current input of the ongoinground of writing Table 1 illustrates how functionGet_new_data_pair(dp dh Phead) generates the datapair (dnewp dnewh ) written in the ongoing round in PD-DM The data pair is generated based on the statusof existing data and dh In brief the priority for con-structing the new pair of data is dexistingp dexistingh gt

dpdh gt dummy Then logical requests Pub_write andHid_write are handled with Algorithm 2

Note that the while loop in this algorithm may endwithout setting dh to Null ie the hidden write requestis not completely satisfied within one public write Inthis case more public writes are required to completethe same hidden write After a sufficient number ndash aconstant as will be shown later ndash of rounds the hiddenwrite request can be guaranteed to go through

Algorithm 2 Pub_write + Hid_write ndash Considering wrap-aroundInput Pub_write(lp dp) Hid_write(lh dh) Phead

1 while dp is not Null do The new pair of data is constructed based on Table 1

2 (dnewp dnew

h ) = Get_new_data_pair(dp dh Phead)3 Phy_write(Phead d

newp )

4 Phy_write(Phead + 1 dnewh )

5 if dnewp == dp then

6 Set_Pub_map(lp Phead) dp = Null

7 end if8 if dnew

h == dh then9 Set_Hid_map(lh Phead + 1) dh = Null

10 end if11 Phead larr Phead + 212 end while

Buffering and seek-optimized writes Notwith-standing the logical data layout PD-DM always writesphysical data blocks sequentially starting from the cur-rent value of Phead until enough blocks with invalid datahave been found to store the current payload (Algorithm2) Depending on the underlying stored data writtencontents are either re-encrypted valid existing data ornewly written payload data (Table 1) For example con-sider the following case where the pattern of existingvalid (V) and invalid (I) data is [V IIV V I middot middot middot ] Insteadof seeking to position 2 to write new data there PD-DM writes both position 1 and 2 with re-encryptednewdata sequentially

dh dexistingp d

existingh dnew

p dnewh

invalid invalid dp dummy

invalid valid dp dexistingh

valid invalid dexistingp dummy

valid valid dexistingp d

existingh

invalid invalid dp dh

not invalid valid dp dexistingh

NULL valid invalid dexistingp dh

existingh

Table 1 PD-DM constructs the data pair (dnewp dnew

h ) that canbe written by a PD_write operation according to the status ofexisting data (dexisting

p dexistingh

) and the availability of hiddenwrite requests (dh)

PD-DM 159

As a result by construction all physical data writesto the underlying device happen sequentially The pat-tern is in fact composed of reads and writes to the sameblock followed by the next block and so on For rota-tional media this means two things (i) disk heads do notneed to switch cylinders to write which results in sig-nificantly lower (mostly rotational-only) latencies and(ii) any predictive readwrite-ahead buffering (at thevarious layers within the drive and the OS) can workextremely well especially when compared with existingwork that randomizes physical access patternsReduced number of writes The sequential natureof the physical write trace results in a smaller averagenumber of blocks written per logical write comparedto existing work [11] In a PD solution a few blockswill be accessed following certain rules to find a physi-cal block for the logical data lest the write trace leaksthe access pattern It is more likely that the block se-lected in PD-DM contains no valid data as it is the leastrecently written physical block To set the stage for adetailed analysis we first introduce two related compo-nents ldquospare factorrdquo and ldquotraffic modelrdquo

ldquoSpare factorrdquo ρ In order to store all the logicalblocks of each logical volume the Data area in PD-DM should contain at least 2N physical blocks (N =max(Np Nh) In addition to these blocks similar to ex-isting work [11] ldquosparerdquo space is reserved so that it canbe easy to find available blocks to write We denote theratio of this additional space to the total size of the Dataarea as the ldquospare factorrdquo ρ Then the size of the Dataarea can be represented as 2N

1minusρ Note that the numberof required blocks will change from 2N to (k+1)N oncewe store maps on the disk (detailed in Section 322)and the ldquospare factorrdquo will be (k+1)N

1minusρ thenTraffic model We model the logical write traffic

for each volume with Rosenblumrsquos hotcold data model[28] It captures the spatial locality in real storage work-loads [12] and can be applied to PD-DM by ldquoseparatingrdquothe logical address space of each volume into ldquohotrdquo andldquocoldrdquo groups respectively Two parameters r and f areintroduced in the model Specifically a fraction f of theaddress space is ldquohotrdquo and hit by a fraction r of theoverall write requests The remaining 1 minus f fraction isldquocoldrdquo and receives a fraction of 1 minus r of all write re-quests The write traffic inside each group is uniformlydistributed When r = f = 100 this becomes a specialcase where the entire traffic is uniformly distributed

The average number of blocks written per logicalwrite is related to the probability that the selected blockis available for writing Given a spare factor ρ this prob-ability is actually ρ for a randomly selected block The

0 01 02 03 04

physic

Spare factor

r=100 f=100r= 90 f= 5r= 80 f= 20

Fig 4 Average number of physical writes required for one publicwrite request under different types of logical write workload

average number of blocks needed to be accessed for eachlogical write is then 1

ρ The sequential write nature inPD-DM ensures that no less than 2N middot ρ

1minusρ logical writescan be performed during the period when the entireData area is written once from beginning to end Thisalso implies that average number of blocks needed to beaccessed for each logical write is then at most 1

ρ = O(1)Moreover as PD-DM always writes to the ldquoleast re-

cently writtenrdquo physical block it is more likely to havebeen ldquoinvalidatedrdquo before being overwritten The prob-ability that the written block contains invalid data de-pends on both ρ and the incoming workloadtraffic pat-tern For example if the traffic is uniformly distributedthe probability that the current written block containsinvalid data can be represented as the closed-form Equa-tion 1 [12] HereW (x) denote the Lambertrsquos W function[10] which is the inverse of xex (ie W (y) = x|xex = y)

1 + (1minus ρ) middotW ( minus11minus ρ middot e

minus11minusρ ) (1)

In general cases the probability that the writtenblock in PD-DM contains invalid data cannot be repre-sented with a closed-form equation But it can be cal-culated numerically [12]

Figure 4 shows how the average number of physi-cal writes for one logical write changes when the logicaltraffic and spare factor ρ vary The average number ofphysical writes decreases as ρ increases Moreover themore focused the write traffic (on a smaller logical ad-dress space) the smaller the average number of requiredphysical write operations To sum up given a relativelysmall spare factor the average number of physical writeper logical write in PD-DM is much less compared toother randomization-based solutions

PD-DM 160

32 Mapping amp Block Occupancy Status

So far we have considered the mapping structures asblack boxes We now discuss their design implementa-tion and associated complexities

321 Mapping-related Data Structures

Two different data structures are used to store logical-physical mappings for the two volumes respectivelythe public position map (PPM) for the public volumeand the hidden map B+ tree (HM) with its root node(HM_ROOT) for the hidden volume

The PPM maps public logical block IDs to corre-sponding physical blocks on disk Itrsquos an array storedat a fixed location starting at physical block ID αPPM subsequent offset logical block IDs are used as indicesin the array Its content requires no protection since itis related only to public data and leaks no informationabout the existence of hidden data

Unlike in the case of the PPM for the public vol-ume we cannot store an individual position map forthe hidden volume at a fixed location since accesses tothis map may expose the existence of and access patternto hidden data Instead we deploy an HM tree whichis stored distributed across the entire disk leaving onlythe root node (denoted by HM_ROOT) in a fixed lo-cation (physical block ID = αHM_Root) of the mappingsegment To facilitate easy access and reconstruction ofthe HM each node contains the location of all of itschildren nodes

More specifically HM indexes physical locations bylogical block IDs The leaf nodes are similar to the PPMwhich contains the mapping entries and they are or-dered from left to right (from leaf_1 to leaf_n in Fig-ure 5) An example of a HM tree of height 3 is shownin Figure 5 A logical hidden block is mapped to itscorresponding physical block by tracing the tree branchfrom the root to the corresponding leaf node contain-ing the relevant mapping entry Suppose b is the size ofphysical block IDs in bytes then the height of the HMtree is O(logbBbc(Nh)) The last entry of each HM treeleaf node is used for a special purpose discussed in Sec-tion 323 Thus only bBbcminus 1 mapping entries can bestored in one leaf node Further each internal node canhave bBbc child nodes Finally the HM mapping entryof the i-th logical block is stored in the ( i

bBbcminus1 )-th leafnode at offset i(bBbcminus1) For a block size of 4KB andblock IDs of 32 bits the height of the map tree remainsk = 3 for any volume size between 4GB and 4TB

Fig 5 Hidden logical blocks are mapped to the physical diskwith a B+ tree map (eg of height 3 in this figure) A datablock and the related HM blocks construct a ldquohidden tuplerdquodenoted as T (dh) The HM blocks are the tree nodes in the pathfrom the root to the relevant leaf containing the correspondingmapping entry for the data block In this figure a ldquohidden tuplerdquoand the corresponding path of HM tree are colored gray

The HM_ROOT block is encrypted with the hid-den key Kh Adversaries cannot read the map in theabsence of Kh Additionally the HM_ROOT block willbe accessed following a specific rule independent of theexistence of a hidden volume This ensures that accessesto HM_ROOT do not leak hidden information Moredetails in Section 33

322 GetSet_map

Managing the PPM is relatively simple Logical blockID lp is used to retrieve the physical block where thecorresponding PPM entry resides (Algorithms 6 and 7in Appendix A2)

Managing the HM tree however requires muchmore care (Algorithms 8 and 9 in Appendix A2) Theaccess pattern to the HM tree should not be accessi-ble to a curious multi-snapshot adversary Fortunatelyas the map entries are stored in the leaf nodes access-ing one map entry always requires traversing one of thetree paths from the root node to the corresponding leafThus each path (from the root to a leaf) is linked withand embedded within a given corresponding data blockas a ldquohidden tuplerdquo in PD-DM And a ldquohidden tuplerdquois written to successive blocks in the Data area

Figure 5 illustrates the ldquohidden tuplerdquo concept foran example HM tree of height 3 The hidden tupleshere consist of the data block itself and the nodes inthe path from the root of HM to the relevant leaf node

PD-DM 161

containing the mapping entry for the data block4 Lethmi denote the HM tree node at level i in the pathfrom the root to the leaf Then the hidden tuple corre-sponding to hidden data dh is denoted as T (dh) =lthm1 hm2 middot middot middot hmkminus1 dh gt where k is the height ofthe HM tree In order to accomplish a write requestHid_write(lh dh) we need to not only write the datadh but also update the HM entry to reflect the correctnew mapping This is achieved by also writing a hiddentuple and updating HM_ROOT

Note that accessing one HM entry does not affectany other branches of the tree unrelated to it Thusimportantly no cascading effects ensue To understandthis better consider that as leaves are ordered on logicaladdresses changing the mapping entry in one leaf nodeonly requires updating the corresponding index in itsancestor nodes which indicate the physical location of itscorresponding child in the path Other indexes remainthe same and the rest of the tree is unchanged

323 GetSet_Status

PD-DM requires one more key structure providing theoccupancy status of individual blocks So far this wasassumed to be a black box (Section 31) We now detail

Keeping track of the status of individual blocks isdone using a public bit map (PBM) It contains one bitand one IV for each physical block in the Data area Abit is set once public data is being written to the cor-responding physical block The bitmap ensures that weare able to check the state of a physical block beforewriting to it to avoid public data loss eg by overwrit-ing The PBM is public and not of concern with respectto adversaries It is worth noting that the blocks occu-pied by hidden data are shown as unused in the PBM

To protect the hidden data from overwriting wedeploy a number of additional mechanisms We cannotuse a public bitmap since it would simply leak with-out additional precautions which necessarily introduceinefficiencies Examples of such inefficient bitmaps in-clude the in-memory bitmap used by Peters [27] andthe fuzzy bitmap used by Pang [26] which introducedldquodummy blocksrdquo to explain blocks which were markedas in-use but unreadable by the filesystem

4 The root node is not included in the hidden tuple since it isalways stored as HM_ROOT as explained before Updating theroot node can be done in memory only so that the sequentialdisk write behavior is not affected

Instead PD-DM deploys a ldquoreverse maprdquo mecha-nism (similar to HIVE [8]) to map physical to logicalblocks The corresponding logical ID of each block con-taining hidden data is stored directly in the correspond-ing hidden tuple or more specifically in the last entryof the leaf node in the hidden tuple (see Figure 5)

This makes it easy to identify whether a hidden tu-ple (and its associated block) is still valid ie by check-ing whether the logical block is still mapped to wherethe hidden tuple currently resides For example supposethe hidden tuple is stored in the physical blocks β + 1to β+ k then the hidden data is in block β+ k and thelogical ID of this hidden tuple is in block β + 1 Thusthe state of the hidden tuple will be obtained by firstreading block β+1 for the logical ID l and then readingthe current map entry for index l from the HM tree IfHM [l] = β+k is true then the hidden data there is stillup to date Otherwise the hidden tuple is invalid (andcan be overwritten)

As an optimization in this process one doesnrsquot al-ways need to read all k nodes in the path of the HMtree for the map entry Any node read indicating thata node on the path is located between β + 1 to β + k

establishes HM [l] = β + k as trueImportantly the above mechanisms do not leak any-

thing to multi-snapshot adversaries since all enabling in-formation is encrypted (eg the logical ID is part of thehidden tuple) and no data expansion occurs (would havecomplicated block mapping significantly) Status man-agement (Set_status(α S) and Get_status(α)) can bedone through readingwriting the hidden tuples Thusno additional Phy_write operations are needed

Algorithm 3 Set_Hid_map_full(lh α) ndash update a hiddenmap entry or fake map accessesInput lh α αHM_Root k1 if l_h = 0 then Randomized data dummy is written2 mapblock = Phy_read(αHM_Root)3 Phy_write(αHM_Rootmapblock)4 for i isin 2 k do5 Fake map block mapdummy

6 Phy_write(αminus k + iminus 1mapdummy)7 end for8 else real hidden data is written9 Set_Hid_map(lh α)10 end if

33 Full PD-DM With Maps On Disk

Once all mapping structures are on disk ndash in additionto replacing the GetSet_map black boxes with Algo-

PD-DM 162

rithms 6 ndash 9 a few more things need to be taken careof to ensure end-to-end PD

Most importantly accesses to the maps should notexpose the existence of hidden data and behave iden-tically regardless of whether users store data in hid-den volumes or not It is easy to see that no Set_mapfunctions are called whenever the written data is eitherldquoexistingrdquo or ldquodummyrdquo in Algorithm 2 which skewsthe access distribution and may arouse suspicion Totackle this additional fake accesses to the mapping-related data structures are performed as necessary TheSet_Hid_map_full function either sets a real hiddenmap entry or fakes those accesses (Algorithm 3) wherelh = 0 means faking accesses to related data structures

As a result the same number of physical writeoperations are performed regardless of whether thepair of public-hidden data written contains new dataexisting data or dummy data in Algorithm 2 Anatomic operation PD_write can be defined as Al-gorithm 4 to describe all those physical writesPD_write(Phead lp dp lh dh) arranges a public-hiddendata pair (dp dh) together in the Data area and up-date HM (real or fake) using a number of sequentialPhy_write operations

Then the resulting complete set of steps for handlinga logical write request can be illustrated with Algorithm5 The lnewh is set to be 0 when dh is randomized datadummy and lnewp is 0 when dnewp is existing public dataLogical read requests are straightforwardly handled byaccessing the data and existing mapping structures asillustrated in Algorithms 10 and 11 (Appendix A2)

Algorithm 4 PD_write(Phead lp dp lh dh)

Input Phead lp dp lh dh

1 Phy_write(Phead dp) Write the public data2 Set_Hid_map_full(lh Phead + k) Write the hidden

map3 Phy_write(Phead + k dh) Write the hidden data

Algorithm 5 Public write + Hidden write ndash Full PD-DMwith maps on the diskInput Pub_write(lp dp) Hid_write(lh dh) Phead

1 while dp is not Null do2 (dnew

p dnewh ) = Get_new_data_pair(dp dh Phead)

3 PD_write(Phead lnewp dnew

p lnewh dnew

h )4 if dnew

p == dp then5 Set_Pub_map(lp Phead) dp = Null

6 end if7 if dnew

h == dh then8 dh = Null

9 end if10 Phead larr Phead + (k + 1)11 end while

With maps on disk PD_write operations writesmore than just data blocks to the physical device Butthose physical writes still happen sequentially Howeveraccessing the map-related data structures (PPM PBM) will bring in a few seeks In-memory caches aremaintained in PD-DM for those data structures whichcan significantly reduce the number of seeks Considercaching PPM blocks as an example as bBbc mappingentries are stored in one mapping block caching it al-lows the reading of map entries for bBbc logical ad-dresses (in a sequential access) without seeking Thuscomparing with the previous works which seeks beforewriting every randomly chosen block PD-DM has lowerseek-related expense

Finally it is easy to generalize PD-DM to have mul-tiple volumes A PD_write operation can write hiddendata from any hidden volume

34 Encryption

All blocks in the Data area are encrypted using AES-CTR with one-time-use random per-block IVs ThoseIVs are stored as part of the PBM In fact other similarschemes of equivalent security level can be envisionedand the deployed cipher mode can be any acceptablemode providing IND-CPA security

4 Analysis41 Security

We now prove that adversaries cannot distinguish thetwo write traces W(O0) and W(O1) and thus can winthe game of Definition 1 with at most negligible advan-tage

Theorem 1 PD-DM is plausibly deniable

Proof (sketch) Each access pattern results in a writetrace transcript available to the adversary through dif-ferent device snapshots The transcript contains theseexact entriesndash Write k+ 1 blocks to the Data area and update the

HM_ROOT with PD_write operationsndash Update the PPM accordinglyndash Update the PBM accordinglyndash Update the PheadIn this transcript Lemmas 4 5 6 and 7 (in AppendixA3) show that the locations of all writes depend only onpublic information (public writes) Further the contentsof every write is either fully randomized or depends on

PD-DM 163

public data only (public writes) This holds also for thewrite traces of O0 and O1 in Definition 1 As a resultadversaries cannot win the game of Definition 1 betterthan random guessing

42 IO Complexity

As discussed above to hide the existence of the hid-den data more work is being done in addition to justreading and writing the data itself Here we discuss theassociated IO complexities and constantsPublic Read A Pub_read request requires readinga PPM block for the map entry plus reading the corre-sponding data block Thus compared with the raw diskPub_read roughly reads twice as often and halves thethroughput It may be reduced using a smart memorycaching strategy for the PPM blocks since consecutivePPM entries are stored in one PPM blockHidden Read Hid_read on the other hand requiresreading one HM tree path for the map entry first Thusk + 1 blocks (k HM tree path blocks + data block) areread in total which results in B middot (k + 1) = O(logN)complexity where k = logN is the height of the HMtreeUnderlying Atomic Write Operation PD_writechecks if the next k+ 1 blocks are free fills those blockswith the newly formed pair of public data and hiddentuple and updates the HM_ROOT block Checking thestate of the k+ 1 candidate blocks requires reading onePBM block and at most k HM tree nodes (Section 323)This first step gives B middot (k + 2) and results in O(logN)since k = logN Further the step of writing the k + 1blocks in order gives B middot (k+ 1) resulting in O(logN) aswell Thus the IO complexity of the overall PD_writeoperation is O(logN)PublicHidden Write As described in Algorithm5 only a few additional IOs (constant number) areneeded besides a PD_write for each iteration of theloop As the IO complexity of the PD_write opera-tion is O(logN) the complexity of each iteration is alsoO(logN) The analysis in Section 31 shows that eithera Pub_write or a Hid_write can be completed withinO(1) iterations (O(1) PD_writes) Thus the overallIO complexity is O(logN) for publichidden writeThroughput Another fact worth noting is that afteran initial period in which sufficient writes have been seenfor each volume the device usually ends up with N validblocks for each volume regardless of the workload Atthat point throughput stabilizes and is not a functionof uptime anymore

43 Throughput Comparison

PD-DM accesses at most O(logN) blocks for each log-ical access with relatively small constants (eg 2-3 forjudiciously chosen spare factors) Moreover physical ac-cesses are mostly sequential thus reducing latency com-ponents and improving performance

By comparison existing work such as HIVE fea-tures an IO complexity of O(log2N) [8] with constantsof either 64 (without a stash) or 3 (with a stash) Fur-thermore access patterns are random making seeks im-pactful and cachingbuffering difficult As a result onrotational media PD-DM outperforms existing work byup to two orders of magnitude

5 EvaluationPD-DM is implemented as a Linux block device map-per Encryption uses AES-CTR with 256 bit keys andrandom IVs Blocks are 4KB each Two virtual logicaldevices (one public and one hidden) are created on onephysical disk partition

Experiments were run on a machine with dual-corei5-3210M CPUs 2GB DRAM and Linux kernel 3136The underlying block devices were a WD10SPCX-00KHST0 5400 rpm HDD and a Samsung-850 EVOSSD The average seek time of the HDD is 89ms [4]

Two volumes of 40GB each were created on a 180GBdisk partition (ρ asymp 01) Ext4 FSes were installed oneach volume in ordered mode Additionally an ext4 FSmounted on top of a dm-crypt (encryption only) vol-ume was benchmarked on the same disk HIVE [2] wascompiled and run on the same platform

51 HDD Benchmarks

Two benchmarks (Bonnie++[1] IoZone[25]) were usedto evaluate performance under sequential and random

Operation dm-crypt PD-DM HIVE [8]Public Read 973 17452 plusmn 0367 0095Public Write 880 1347 plusmn 0093 0013Hidden Read na 12030 plusmn 0088 0113Hidden Write na 1456 plusmn 0086 0016

Table 2 Sequential workload throughput (MBs) comparisonwith the Bonnie++ benchmark PD-DM is 100-150 times fasterthan HIVE for sequential read and write The PD-DM through-puts reported here are the average value over 40 runs with thestandard error

PD-DM 164

0 5 10 15 20 25 30 35 40

Run number

Hidden volume Seq readMean

(a) Seq read

0 5 10 15 20 25 30 35 40

Run number

Hidden volume Seq writeMean

(b) Seq write

Fig 6 PD-DM hidden IO throughput results of multiple runs

workloads respectively Results are in Tables 2 and3 To further evaluate the impact of caching parts ofmapping-related data structures the IoZone experimentwas run multiple times for different cache sizes Resultsare illustrated in Figure 8

511 Throughput

Sequential IO For consistency in comparing withHIVE Bonnie++ was first used to benchmark thethroughput of sequential read and write for both vol-umes The mapping-related structures cache was lim-ited to 1MB Results averaged over 40 runs are shownin Table 2 Overall PD-DM performs about two ordersof magnitude faster for both public and hidden volumesFor example the basic Bonnie++ benchmark requiredabout one full week to complete on existing work [8]in the same setup The same benchmark completed inunder 4 hours on PD-DM

Figure 6 shows the result of every run as a sam-ple point as well as the mean value for the hidden vol-ume IO throughput Although the throughput variesaround the mean value for both read and write it doesnot decrease as more runs are completed Note thatBonnie++ writes more than 8GB of data to the diskin each run ndash as a result the disk has been written overand over during the test Moreover write throughputnears the mean with increasing run number ie thedisk is converging to a stable state The results for thepublic volume are similar and not presented here

As expected PD-DM and all existing PD systemsare slower when compared to simple encryption (dm-crypt) without PD However notably PD-DM is thefirst PD system protecting against a multi-snapshot ad-versary with performance in the MBs range on harddisks and 15MBs+ for public readsRandom IO IoZone was run to evaluate system be-havior under random IO workloads IoZone reads filesof different sizes using random intra-file accesses Direct

IO is set for all file operations so that IO caching bythe OS is bypassed Further the RAM allocated for PD-DM mapping-related data structures (PPM PBM etc)is limited to 1MB Results of random IO throughputsfor file sizes of 1GB are shown in Table 3

It can be seen that the PD-DM sequential writemechanism really makes an impact and performs 3ndash30timesfaster than HIVE Notably even standard dm-crypt isonly 2ndash3times faster

Overall PD-DM performs orders of magnitudefaster than existing solutions for both sequential andrandom workloads Moreover a great potential advan-tage of PD-DM for random workloads can be seen byanalyzing the results in Figure 7 depicting the perfor-mance for sequential and random IO of the dm-cryptand the two volumes in PD-DM respectively

Note that both read and write throughputs dropsignificantly faster for dm-crypt when the IO pat-tern changes from sequential to random In contrastthroughputs of the two PD-DM volumes are signifi-cantly less affected especially for writes This is becausethe underlying write pattern is always sequential in PD-DM regardless of logical IO request locality

It is important to note however that accessing map-ping structures may disturb the sequentiality of result-ing write patterns Fortunately the design allows foreven a small amount of cache to have a great impact

512 Cache Size

To evaluate the influence of cache sizes on performanceunder sequential or random access with files of differentsizes IoZone is run for different cache sizes Results arein Figure 8

dm-crypt PD-DMPublic

PD-DMHidden

seq readrandom read

seq writerandom write

Fig 7 PD-DM throughput comparison with dm-crypt RandomIO results in a throughput drop PD-DM is less affected becausethe underlying write pattern is always sequential regardless oflogical IO request locality

PD-DM 165

100 1000

File size (MB)

Cache size = 4KBCache size = 400KB

Cache size = 1MBCache size = 4MB

dm-crypt

(a) Hidden volume of PD-DMvs dm-crypt Random write

100 1000

ughput (K

File size (MB)

dm-crypt

(b) Hidden volume of PD-DMvs dm-crypt Random read

100 1000

ughput (K

File size (MB)

dm-crypt

(c) Public volume of PD-DMvs dm-crypt Random read

100 1000

ughput (K

File size (MB)

(d) PD-DM Hidden volumeSequential write

100 1000

ughput (M

File size (MB)

dm-crypt

(e) dm-crypt Sequential write

100 1000

ughput (M

File size (MB)

(f) PD-DM Hidden volumeSequential read

100 1000

ughput (M

File size (MB)

(g) PD-DM Public volume Se-quential read

100 1000

ughput (M

File size (MB)

dm-crypt

(h) dm-crypt Sequential read

Fig 8 IoZone throughput comparison for different workloads cache sizes and file sizes between PD-DM and dm-crypt Figure 8(a)shows how PD-DM random-write throughput is affected by cache size Write throughput exceeds that of dm-crypt as the cache sizeincreases since caching the mapping data structures reduces extraneous seeks and increases localitysequentiality in the underlyingwrites Similarly for random read (Figure 8(b) and 8(c)) increased caching improves performance for both public and hidden volumesespecially when the file size is relatively large But the throughput cannot be better than dm-crypt Cache size determines the turn-ing point Sequential IO is illustrated in Figures 8(d)ndash8(h) Throughputs of dm-crypt are shown in Figures 8(e) and 8(h) PD-DMwrite throughput (Figure 8(d)) varies in a small range except from one special case (cache size=4KB) since one hidden map entryis related to 2 map blocks (8KB) Read throughput (Figures 8(f) and 8(d)) is also less affected by caching as long as the number ofcache blocks is larger than the number of mapping blocks read for one mapping entry (4KB for public volume and 8KB for hidden vol-ume) Contrastingly the sequential-write throughput of dm-crypt increases with the increasing file size at first then remains in a stablerange

Operation dm-crypt PD-DM HIVE [8]Public Read 0526 0309 plusmn 0005 0012Public Write 1094 0276 plusmn 0017 0010Hidden Read na 0309 plusmn 0011 0097Hidden Write na 0282 plusmn 0024 0014Seeks Per Second 1227 33689 10357

Table 3 Performance comparison for random workloads ThePD-DM throughputs reported here are the average value over 40runs with the standard error The first four rows show throughput(MBs) of random read and write for different volumes respec-tively as measured by IoZone The last row represents the numberof seeks per second (the number for public and hidden volume areseparated by a slash) reported by Bonnie++

Write throughput Figure 8(a) illustrates random-write throughput of the hidden volume with differentcache sizes Random-write throughput for dm-crypt de-creases gradually (from 3MBs to 1MBs) with increas-ing file sizes mainly because of higher number of seeksfor larger files However PD-DM random-write through-put behaves differently For caches above 400KB it re-mains stable for relatively small files and then drops

for file sizes beyond a threshold As expected largercaches result in larger thresholds For caches over 4MBwrite throughput stabilizes for files smaller than 1GBThis means that map blocks are mostly read fromwriteto the caches without disk seeks As a result PD-DMrandom-write throughput exceeds that of dm-crypt

It is worth noting that the same cache mechanismdoes not work for dm-crypt as its logical-to-physicalblock mapping is fixed and the larger number of seeksderives from the logical-layer access randomness In con-trast in PD-DM additional seeks occur only when ac-cessing mapping blocks where multiple entries are storedtogether This effectively maximizes the overall cache-hit rate for the mapping data structures

Sequential-write throughput for the hidden volume(Figure 8(d)) is compared with the dm-crypt curve (Fig-ure 8(e)) As seen the cache size has a decreasing effectabove caches larger than 4KB (1 block) Moreover thethroughput is not greatly affected by file size (unlike inthe case of dm-crypt) although a larger file still resultsinto a lower throughput

PD-DM 166

Write throughput for the public volume is similarto the hidden volume mostly because hidden writes areexecuted together with public writesRead throughput Random-read throughput forboth volumes (Figures 8(b) and 8(c)) follow a similartrend as the random-write throughput (Figure 8(a)) ndashthroughput decreases significantly for file sizes beyond acertain threshold Further larger caches result in largerthresholds Finally the threshold for both volumes isthe same for the same cache size

Sequential-read throughputs for both volumes areless affected by either file or cache sizes (Figures 8(f)ndash8(h)) A special interesting case occurs for caches ofonly 4KB (1 block) In that case the sequential-readthroughput of the hidden volume drops by half (Fig-ures 8(f)) similar to what happens in Figures 8(d) Thereason for this is that reading a hidden mapping entryrequires reading more than one block when the heightof the hidden map tree is larger than 2 This resultsin continuous cache misses The same situation doesnrsquothappen for the public volume mainly because a sin-gle physical block can contain multiple adjacent publicmapping entries thus even a one-block cache can makean impact

The variation in sequential-read throughput ismildly affected by the existing layout of file blocks ondisk After all a continuous file may be fragmented byexisting data on disk

52 SSD Benchmarks

PD-DM is obviously expected to perform most of itsmagic on HDDs where reduced numbers of disk seekscan result in significant performance benefits Fortu-nately however PD-DM performs also significantly bet-ter than existing work when run on SSDs

A throughput comparison including PD-DM HIVEand dm-crypt is shown in Table 4 Only a minimal 40KBcache is being used ndash since seeks are cheap in the caseof SSDs we expected (and found) that cache size haslittle effect on performance

Overall although the performance advantage of re-ducing the number of seeks decreases for SSDs PD-DM still outperforms existing work due to two impor-tant facts (i) reads are unlinked with writes and (ii)PD-DM write complexity is only O(logN) compared toO(log2N) in HIVE

Table 4 Throughput (MBs) comparison on SSD for a sequen-tial workload using the Bonnie++ PD-DM is 30-180x faster thanHIVE for sequential read and 5x faster for sequential write

6 DiscussionAdvantages PD-DM has a number of other advan-tages over existing schemes aside from the performanceFirst PD-DM does not require secure stash queues aslong as there are enough public writes ongoing Notwith-standing it is worth noting that caches and filesystemrelated buffers should be protected from adversariessince they contain information about hidden volumesSecond PD-DM preserves data locality from logical ac-cesses The logical data written together will be arrangetogether on the device with high probability Third PD-DM breaks the bond between reads and writes and en-sures security Adversaries may have a prior knowledgeof the relationship between physical reads and writessince reads and writes are usually generated togetherby a filesystem HIVE masks hidden writes using pub-lic reads which may break this FS-implied relationshipand arouse suspicions For example a hidden write mas-queraded as a read to a public file data block may raisesuspicion if usually real public reads are coupled withwriting metadata or journaling writes which may beabsent in this casePhysical Space Utilization Considering only thedata ORAM in HIVE with two logical volumes HIVEcan use at most 50 of the disk for both public dataand hidden data since half the disk is required to befree Thus assuming that the two logical volumes are ofthe same size each would be of 25 of the total device

In PD-DM ρ and k decide volume sizes Both thepublic and the hidden volume is a fraction of (1minusρ)middot 1

k+1of the entire device (considering only the Data area)Assuming that k = 3 (works for volume size from 4GBto 4TB) an empirically chosen sweet spot balancingspace and performance can be found at ρ = 20 whichresults in an overall volume size of around 20 of thedeviceOptimization Knobs Performance is impacted by anumber of tuning knobs that we discuss here

PD-DM 167

Increasing ρ results in increased throughput at thecost of decreased space utilization The intuition hereis that the probability that the k + 1 target blocksfor PD_writes contain overwritable (invalid) data in-creases with increasing ρ ρ enables fine-tuning of thetrade-off between throughput and space utilization

Further changing the ratio between the numberof public data blocks and that of hidden data blockswritten together can help balancing the public writethroughput and the waiting time of hidden writes Inthis paper we assume that one public data block is al-ways written together with one hidden data block Forworkloads where the user accesses public data more of-ten than hidden data we can actually change the ratioso that more than one public data blocks are writtenwith one hidden data block (eg 2 public data pairedwith 1 hidden data) This results in a higher throughputfor public writes at the cost of delayed writes for hiddendata and a requirement for more memory to queue upoutstanding hidden data writes

The physical space utilization will also be affectedwhen the above ratio changes The size of public volumeincreases while the size of hidden volume decreases Forexample if 2 public data are paired with 1 hidden datathen k + 2 instead of k + 1 blocks will be written inthe Data area for that For an unchanged ρ the size ofpublic volume becomes (1minus ρ) middot 1

k2+1 gt (1minus ρ) middot 1k+1 of

the size of the Data area while the size of hidden volumebecomes (1minus ρ) middot 1

2(k2+1) = (1minus ρ) middot 1k+2 lt (1minus ρ) middot 1

k+1of the size of the Data area

7 Related WorkPlausibly-deniable encryption (PDE) is related toPD and was first introduced by Canetti et al [9] PDEallows a given ciphertext to be decrypted to multipleplaintexts by using different keys The user reveals thedecoy key to the adversary when coerced and plausiblyhides the actual content of the message Some examplesof PDE enabled systems are [7 19] Unfortunately mostPDE schemes only work for very short messages and arenot suitable for larger data storage devicesSteganographic Filesystems Anderson et al [5] ex-plored ldquosteganographic filesystemsrdquo and proposed twoideas for hiding data A first idea is to use a set of coverfiles and their linear combinations to reconstruct hiddenfiles This solution is not practical since the associatedperformance penalties are significant A second idea isto store files at locations determined by a crypto hash

of the filename This solution required storing multi-ple copies of the same file at different locations to pre-vent data loss McDonald and Kahn [23] implemented asteganographic filesystem for Linux on the basis of thisidea Pang et al [26] improved on the previous construc-tions by avoiding hash collisions and providing more ef-ficient storage

These steganographic filesystem ideas defendonly against a single-snapshot adversary Han etal [17] designed a ldquoDummy-Relocatable Stegano-graphicrdquo (DRSteg) filesystem that allows multiple usersto share the same hidden file The idea is to runtime-relocate data to confuse a multi-snapshot adversary andprovide a degree a deniability However such ideas donot scale well to practical scenarios as they require as-sumptions of multiple users with joint-ownership andplausible activities on the same devices Gasti et al [16]proposed PD solution (DenFS) specifically for cloudstorage Its security depends on processing data tem-porarily on a client machine and it is not straightfor-ward to deploy DenFS for local storage

DEFY [27] is a plausibly-deniable file system forflash devices targeting on multi-snapshot adversaries Itis based on WhisperYAFFS [30] a log structured filesys-tem which provides full disk encryption for flash devicesIt is important to note that due to its inherent log-structure nature derived from WhisperYAFFS similarto PD-DM DEFY also writes data sequentially under-neath on flash devices with the intention of maintaininga reasonable flash wear-leveling behavior and thus notsignificantly reduce device lifespan

However unlike PD-DM PD in DEFY is unrelatedto the sequential access pattern and instead is achievedthrough encryption-based secure deletion AdditionallyDEFY takes certain performance-enhancing shortcutswhich immediately weaken PD assurances and associ-ated security properties as follows

Firstly DEFY doesnrsquot guarantee that a blockDEFY writes is independent from the existence of hid-den levels Specifically all writes in DEFY will bypassthe blocks occupied by hidden data unless the filesystemis always mounted at the public level This is a signif-icant potential vulnerability that undermines PD andrequires careful analysis

Secondly DEFY writes checkpoint blocks for hid-den levels after all data has been written to the deviceThus these checkpoint blocks cannot be explained asdeleted public data any more which leaks the existenceof hidden levels

Thirdly DEFY requires all filesystem-related meta-data to be stored in memory This does not scale for

PD-DM 168

memory constrained systems with large external stor-age The use of an in-memory free-page bitmap aggra-vates this problem In contrast PD-DM stores mapping-related data structures securely on disk and allows theoption of caching portions thereofBlock Devices At block device level disk encryptiontools such as Truecrypt [3] and Rubberhose [19] pro-vide PD against single-snapshot adversaries Mobiflage[29] provides PD for mobile devices but cannot protectagainst multi-snapshot adversaries as well

Blass et al [8] (HIVE) are the first to deal with PDagainst a multi-snapshot adversary at device level Theyuse a write-only ORAM for mapping data from logicalvolumes to an underlying storage device and hiding ac-cess patterns for hidden data within reads to non-hidden(public) data Chakraborti et al [6] use writes to publicdata to hide accesses to hidden data based on anotherwrite-only ORAM that does not need recursive maps

8 ConclusionPD-DM is a new efficient block device with plausible de-niability resistant against multi-snapshot adversariesIts sequential physical write trace design reduces im-portant seek components of random access latencies onrotational media Moreover it also reduces the numberof physical accesses required to serve a logical write

Overall this results in orders of magnitude (10ndash100times) speedup over existing work Notably PD-DM isthe first plausible-deniable system with a throughputfinally getting within reach of the performance of stan-dard encrypted volumes (dm-crypt) for random IOFurther under identical fine-tuned caching PD-DMoutperforms dm-crypt for random IO

AcknowledgmentsThis work was supported by the National ScienceFoundation through awards 1161541 1318572 15261021526707

References[1] Bonnie++ httpwwwcokercomaubonnie++[2] Hive httpwwwonarlioglucomhive[3] TrueCrypt httptruecryptsourceforgenet

[4] Western digital blue wd10spcx httpswwwgoharddrivecomWestern-Digital-Blue-WD10SPCX-1TB-5400RPM-2-5-HDD-pg02-0319htm

[5] R Anderson R Needham and A Shamir The stegano-graphic file system In Information Hiding pages 73ndash82Springer 1998

[6] C C Anrin Chakraborti and R Sion DataLair Efficientblock storage with plausible deniability against multi-snapshot adversaries Proceedings on Privacy EnhancingTechnologies 2017(3) 2017

[7] D Beaver Plug and play encryption In Advances in Cryp-tology ndash CRYPTOrsquo97 pages 75ndash89 springer

[8] E-O Blass T Mayberry G Noubir and K OnarliogluToward robust hidden volumes using write-only obliviousram In Proceedings of the 2014 ACM SIGSAC Conferenceon Computer and Communications Security pages 203ndash214ACM 2014

[9] R Canetti C Dwork M Naor and R Ostrovsky Deniableencryption In Advances in Cryptology ndash CRYPTOrsquo97 pages90ndash104 Springer 1997

[10] R M Corless G H Gonnet D E Hare D J Jeffrey andD E Knuth On the lambertw function Advances in Com-putational mathematics 5(1)329ndash359 1996

[11] P Desnoyers Analytic models of ssd write performanceTrans Storage 10(2)81ndash825 Mar 2014

[12] P Desnoyers Analytic models of SSD write performanceACM Transactions on Storage (TOS) 10(2)8 2014

[13] F Douglis and J Ousterhout Log-structured file systemsIn COMPCON Springrsquo89 Thirty-Fourth IEEE Computer So-ciety International Conference Intellectual Leverage Digestof Papers pages 124ndash129 IEEE 1989

[14] J Dursi On random vs streaming IO performanceor seek() and you shall find ndash eventually 2015 httpsimpsonlabgithubio20150519io-performance

[15] C W Fletcher L Ren A Kwon M van Dijk and S De-vadas Freecursive ORAM[nearly] free recursion and in-tegrity verification for position-based oblivious ram In ACMSIGPLAN Notices volume 50 pages 103ndash116 ACM 2015

[16] P Gasti G Ateniese and M Blanton Deniable cloud stor-age sharing files via public-key deniability In Proceedings ofthe 9th annual ACM workshop on Privacy in the electronicsociety pages 31ndash42 ACM 2010

[17] J Han M Pan D Gao and H Pang A multi-usersteganographic file system on untrusted shared storage InProceedings of the 26th Annual Computer Security Applica-tions Conference pages 317ndash326 ACM 2010

[18] J Hinks Nsa copycat spyware could be snooping on yourhard drive 2015 httpsgooglKtesvl

[19] R P W J Assange and S Dreyfus Rubber-hosecryptographically deniable transparent disk encryptionsystem 1997 httpmarutukkuorg

[20] C M Kozierok Access time 2001 httppcguidecomrefhddperfperfspecpos_Accesshtm

[21] C M Kozierok Seek time 2001 httppcguidecomrefhddperfperfspecposSeek-chtml

[22] L Mathews Adobe software may be snooping through yourhard drive right now 2014 httpsgoogllXzPS3

[23] A D McDonald and M G Kuhn StegFS A stegano-graphic file system for Linux In Information Hiding pages463ndash477 Springer 1999

PD-DM 169

[24] J Mull How a Syrian refugee risked his life to bear witnessto atrocities Toronto Star Online posted 14-March-20122012 httpsgooglQsivgB

[25] W Norcott and D Capps IOZone filesystem benchmark2016 httpwwwiozoneorg

[26] H Pang K-L Tan and X Zhou StegFS A steganographicfile system In Data Engineering 2003 Proceedings 19thInternational Conference on pages 657ndash667 IEEE 2003

[27] T Peters M Gondree and Z N J Peterson DEFY Adeniable encrypted file system for log-structured storage In22nd Annual Network and Distributed System Security Sym-posium NDSS 2015 San Diego California USA February8-11 2014 2015

[28] M Rosenblum and J K Ousterhout The design and imple-mentation of a log-structured file system ACM Transactionson Computer Systems (TOCS) 10(1)26ndash52 1992

[29] A Skillen and M Mannan On implementing deniable stor-age encryption for mobile devices 2013

[30] WhisperSystems Github WhispersystemswhisperyaffsWiki 2012 httpsgithubcomWhisperSystemsWhisperYAFFSwiki

[31] X Zhou H Pang and K-L Tan Hiding data accessesin steganographic file system In Data Engineering 2004Proceedings 20th International Conference on pages 572ndash583 IEEE

PD-DM 170

A Appendix

A1 PD-CPA Game

1 Adversary A provides a storage device D (the adver-sary can decide its state fully) to challenger C

2 C chooses two encryption keys Kpub and Khid usingsecurity parameter λ and creates two logical volumesVpub and Vhid both stored in D Writes to Vpub andVhid are encrypted with keys Kpub and Khid respec-tively C also fairly selects a random bit b

3 C returns Kp to A4 The adversary A and the challenger C then engage in

a polynomial number of rounds in which(a) A selects access patterns O0 and O1 with the fol-

lowing restrictioni O1 and O0 include the same writes to Vpubii Both O1 and O0 may include writes to Vhid

iii O0 and O1 should not include more writes toVhid than φ times the number of operationsto Vpub in that sequence

(b) C executes Ob on D and sends a snapshot of thedevice to A

(c) A outputs bprime (d) A is said to have ldquowonrdquo the round iff bprime = b

A2 Algorithms

Algorithm 6 Set_Pub_map(lp α)

Input lp α αP P M

Find the physical block containing the map entry1 αM = αP P M + blp bBbcc2 mapblock = Phy_read(αM ) Set the map entry and write back

3 mapblock[lpmod bBbc]larr α

4 Phy_write(αM mapblock)

Algorithm 7 α = Get_Pub_map(lp)

Input lp αP P M αP P M is the starting block ID of PPM inthe physical device

Ouput α Find the physical block containing the map entry

1 αM = αP P M + blp bBbcc2 mapblock = Phy_read(αM ) Get the map entry

3 α = mapblock[lpmod bBbc]

Algorithm 6 to 9 define how the logical to physicalmaps for public and hidden volumes are maintained inPD-DM

Algorithm 8 Set_Hid_map(lh α)Input lh α αHM_Root k

Update the HM_ROOT1 αM larr αHM_Root X larr (bBbc minus 1) middot bBbckminus2

2 mapblock = Phy_read(αM )3 αM = mapblock[lhX]4 mapblock[lhX] = αminus k + 15 Phy_write(αHM_Rootmapblock)6 lh larr lhmodX

7 for i isin 2 k do Update each layer of the HM tree by writing sequentially

8 X larr (bBbc minus 1) middot bBbckminusiminus1

9 mapblock = Phy_read(αM )10 αM = mapblock[lhX]11 mapblock[lhX] = αminus k + i

12 Phy_write(αminus k + iminus 1mapblock)13 lh larr lhmodX

14 end for

Algorithm 9 α = Get_Hid_map(lh)Input lh αHM_Root k αHM_Root is the block ID

of HM_ROOT in the physical device k is the height of theHM tree

Ouput α1 αM larr αHM_Root

2 for i isin 1 k minus 1 do Traverse one path in the HM tree from root to corre-sponding leaf node

4 mapblock = Phy_read(αM )5 αM = mapblock[lhX]6 lh larr lhmodX

7 end for8 mapblock = Phy_read(αM ) Get the map entry

9 α = mapblock[lh]

Algorithm 10 and 11 illustrate how logical read re-quests are performed in PD-DM for public and hiddenvolume respectively

A3 Security Proofs

Lemma 1 Write traces of a PD_write without a hid-den write are indistinguishable from write traces of aPD_write with a hidden write request

PD-DM 171

Algorithm 10 dp = Pub_read(lp)

Input lpOuput dp

1 α = Get_Pub_map(lp)2 dp = Phy_read(α)

Algorithm 11 dh = Hid_read(lh)Input lhOuput dh

1 α = Get_Hid_map(lh)2 dh = Phy_read(α)

Proof (sketch) As shown in Algorithm 4 anyPD_write writes a pair of public data dp and hiddentuple T (dh) or T (dummy) to the next k + 1 blocks ofthe Data area and updates the HM_ROOT ndash notwith-standing of whether hidden write requests exist or notFurther the written public data block and hidden tupleare encrypted with a semantically secure cipher whichprevents adversaries from distinguishing between tu-ples with new hidden data existing hidden data or justdummy data Overall both the location and the contentof write traces are independent fixed and unrelated towhether or not a Hid_write(lh dh) request was associ-ated with this underlying PD_write

Lemma 2 The number of PD_write operations exe-cuted for a Pub_write request is not related to whetherHid_write requests are performed together with it

Proof (sketch) As illustrated in Algorithm 2 the whileloop will not stop until the public data dp is written byone PD_write and then set as Null This will happenonly when the obtained public status sp is invalid whichonly relates to the state of the public data on the diskndash and is thus independent on whether any Hid_writesare performed

Lemma 3 The number of underlying PD_write oper-ations triggered by any access pattern is not related towhether it contains any hidden writes

Proof (sketch) This follows straightforwardly now sincePD_write operations can only be triggered by pub-lic write requests Note that hidden writes can cer-tainly be completed with those PD_writes under therestrain 6(a)iv Accordingly (Lemma 2) the number ofPD_write operations needed to execute any access pat-tern depends on the public writes and has nothing to dowith the hidden writes

Lemma 4 Write traces to the Data area are indistin-guishable for any access pattern that contains the samepublic writes notwithstanding their contained hiddenwrites

Proof (sketch) For access patterns that contain thesame public writes Lemma 1 and Lemma 3 ensure thatthe corresponding number of PD_write operations arethe same and their actual write traces are indistinguish-able As PD-DM always write to the Data area withPD_write operations the write traces to the Data areaare indistinguishable

Lemma 5 Write traces to the PPM are identical forany access pattern that contains the same public writesnotwithstanding their contained hidden writes

Proof (sketch) PPM is written to only in the case of aPub_write For access patterns that contain the samePub_write operations the corresponding PPM writetraces are the same

Lemma 6 Write traces to the PBM are indistinguish-able for any access pattern that contains the same publicwrites notwithstanding their contained hidden writes

Proof (sketch) Each Pub_write(lp dp) marks the cor-responding bits in the PBM bitmap for the writtenphysical blocks as well as the ldquorecycledrdquo physical blockwhere the ldquooldrdquo version of lp resided (Section 32) Sinceno other operations modify the bitmap any modifica-tions thereto relate only to Pub_write operations in anaccess pattern

Lemma 7 Write trace to the Phead is indistinguish-able for any access pattern that contains the same publicwrites notwithstanding their contained hidden writes

Proof (sketch) Phead is incremented by k + 1 for eachPD_write As a result the associated write trace isonly related to the number of PD_write operations ndashthis number is decided by the public writes in an accesspattern only (Lemma 3) Thus the resulting write traceis indistinguishable for any access pattern that containsthe same public writes

PD-DM An efficient locality-preserving block device mapper with plausible deniability

1 Introduction

2 Storage-centric PD Model


22 Security Game

3 System Design

31 Basic Design



322 GetSet_map

323 GetSet_Status


34 Encryption

4 Analysis

41 Security

42 IO Complexity


5 Evaluation

51 HDD Benchmarks

511 Throughput

512 Cache Size

52 SSD Benchmarks

6 Discussion

7 Related Work

8 Conclusion

Acknowledgment

A Appendix

A1 PD-CPA Game

A2 Algorithms

A3 Security Proofs