Performing Initiative Data Prefetching in Distributed File Systems for Cloud Computing

Jianwei Liao, Francois Trahay, Guoqiang Xiao, Li Li, Yutaka Ishikawa, Member, IEEE

Abstract—This paper presents an initiative data prefetching scheme on the storage servers in distributed file systems for cloud computing. In this prefetching technique, the client machines are not substantially involved in the process of data prefetching; instead, the storage servers directly prefetch data after analyzing the history of disk I/O access events, and then send the prefetched data to the relevant client machines proactively. To put this technique to work, information about client nodes is piggybacked onto real client I/O requests and forwarded to the relevant storage server. Next, two prediction algorithms are proposed to forecast future block access operations, directing what data should be fetched on storage servers in advance. Finally, the prefetched data is pushed to the relevant client machine from the storage server. Through a series of evaluation experiments with a collection of application benchmarks, we demonstrate that the presented initiative prefetching technique helps distributed file systems for cloud environments achieve better I/O performance. In particular, configuration-limited client machines in the cloud are not responsible for predicting I/O access operations, which clearly contributes to better system performance on them.

Index Terms—Mobile Cloud Computing, Distributed File Systems, Time Series, Server-side Prediction, Initiative Data Prefetching.


1 INTRODUCTION

THE assimilation of distributed computing for search engines, multimedia websites, and data-intensive applications has brought about the generation of data at unprecedented speed. For instance, the amount of data created, replicated, and consumed in the United States may double every three years through the end of this decade, according to the EMC-IDC Digital Universe 2020 study [1]. In general, the file system deployed in a distributed computing environment is called a distributed file system, which usually serves as a backend storage system providing I/O services for various sorts of data-intensive applications in cloud computing environments. In fact, a distributed file system employs multiple distributed I/O devices by striping file data across the I/O nodes, and uses the high aggregate bandwidth to meet the growing I/O requirements of distributed and parallel scientific applications [2], [10], [21], [24].

However, because distributed file systems scale both numerically and geographically, network delay is becoming the dominant factor in remote file system access [26], [34]. To address this issue, numerous data prefetching mechanisms have been proposed to hide the latency in distributed file systems caused by network communication and disk operations.

• J. Liao is now with the College of Computer and Information Science, Southwest University of China, Chongqing, China, 400715. E-mail: [email protected]. F. Trahay is an associate professor at Telecom SudParis, France. G. Xiao is a professor at Southwest University of China. L. Li is a professor at Southwest University of China. Y. Ishikawa is a professor at the University of Tokyo, Japan.

In these conventional prefetching mechanisms, the client file system (which is a part of the file system and runs on the client machine) is supposed to predict future accesses by analyzing the history of I/O accesses without any application intervention. After that, the client file system may send relevant I/O requests to storage servers to read the relevant data in advance [22], [24], [34]. Consequently, applications with intensive read workloads can automatically yield not only better use of available bandwidth, but also fewer file operations via batched I/O requests through prefetching [29], [30].

On the other hand, mobile devices generally have limited processing power, battery life and storage, while cloud computing offers an illusion of infinite computing resources. The research field of mobile cloud computing emerged to combine mobile devices and cloud computing into a new infrastructure [45]. Namely, mobile cloud computing provides mobile applications with data storage and processing services in clouds, obviating the need for a powerful hardware configuration, because all resource-intensive computing can be completed in the cloud [46]. Thus, conventional prefetching schemes are not the best-suited optimization strategies for distributed file systems to boost I/O performance in mobile clouds, since these schemes require the client file systems running on client machines to proactively issue prefetching requests after analyzing the access events they have recorded themselves, which inevitably burdens the client nodes.

Furthermore, considering that disk I/O events alone can reveal disk-track information that is critical for I/O optimization tactics [41], certain prefetching techniques have been proposed in succession to read data on the disk in advance after analyzing disk I/O traces [25], [28].


However, this kind of prefetching only works for local file systems, and the prefetched data is cached on the local machine to fulfill the application's I/O requests passively. In brief, although block access history reveals the behavior of disk tracks, there are no prefetching schemes on storage servers in distributed file systems for yielding better system performance. The reason lies in the difficulty of modeling the block access history to generate block access patterns, and of deciding the destination client machine to which the storage servers should push the prefetched data.

To yield attractive I/O performance in a distributed file system deployed in a mobile cloud environment, or in a cloud environment that has many resource-limited client machines, this paper presents an initiative data prefetching mechanism. The proposed mechanism first analyzes disk I/O traces to predict future disk I/O accesses, so that the storage servers can fetch data in advance, and then forwards the prefetched data to the relevant client file systems for potential future use. In short, this paper makes the following two contributions:

1) Chaotic time series prediction and linear regression prediction to forecast disk I/O access. We have modeled the disk I/O access operations and classified them into two kinds of access patterns, i.e. the random access pattern and the sequential access pattern. In order to predict future I/O accesses belonging to the different access patterns as accurately as possible (the future I/O access indicates what data will be requested in the near future), two prediction algorithms, a chaotic time series prediction algorithm and a linear regression prediction algorithm, have been proposed respectively.

2) Initiative data prefetching on storage servers. Without any intervention from client file systems except for piggybacking their information onto relevant I/O requests, the storage servers log disk I/O accesses and classify access patterns after modeling disk I/O events. Next, by properly applying the two proposed prediction algorithms, the storage servers can predict future disk I/O accesses to guide data prefetching. Finally, the storage servers proactively forward the prefetched data to the relevant client file systems to satisfy future application requests.

The remainder of the paper is organized as follows: background knowledge and related work regarding I/O access prediction and data prefetching are described in Section 2. The design and implementation details of the newly proposed mechanism are illustrated in Section 3. Section 4 introduces the evaluation methodology and discusses experimental results. Finally, we make concluding remarks in Section 5.

2 RELATED WORK

For the purpose of improving I/O performance in distributed/parallel file systems, many sophisticated data prefetching techniques have been proposed and developed. These techniques indeed contribute to reducing the overhead brought about by network communication or disk operations when accessing data, and they are generally implemented in either the I/O library layer on client file systems or the file server layer.

As a matter of fact, data prefetching has been proven to be an effective way to hide the latency resulting from network communication or disk operations. V. Padmanabhan and J. Mogul [19] proposed a technique to reduce the latency perceived by users by predicting and prefetching files that are likely to be requested shortly in World Wide Web (WWW) servers. There are also several data prefetching tactics for distributed/parallel file systems. For instance, informed prefetching leverages hints from the application to determine what data to fetch beforehand, on the assumption that better file system performance can be yielded with information from the running application [20], [27], [30], [31], [33]. However, this kind of prefetching mechanism cannot make accurate decisions when there are no appropriate hints (I/O access patterns) from the running applications, and inaccurate predictions may result in negative effects on system performance [22], [23]. Moreover, the client file systems are obliged to trace logical I/O events and conduct I/O access prediction, which places overhead on the client nodes.

Prefetching in shipped distributed file systems. Regarding the prefetching schemes used in real-world distributed file systems, the Ceph file system [32] is able to prefetch and cache files' metadata to reduce the communication between clients and the metadata server for better system performance. Although the Google file system [3] does not support predictive prefetching, it handles I/O reads with quite a large block size to prefetch data that might be used by the following read requests, and thereby achieves better I/O data throughput. The technique of read cache prefetching has been employed by the Lustre file system using Dell PowerVault MD Storage for processing sequential I/O patterns [37].

Prefetching on file servers. Z. Li and Y. Zhou first investigated block correlations on storage servers by employing data mining techniques, to benefit I/O optimization on file servers [35], [39]. S. Narayan and J. Chandy [41] researched disk I/O traffic under different workloads and different file systems, and showed that modeling information about physical I/O operations can contribute to I/O optimization tactics for better system performance [42]. In [43], an automatic locality-improving storage system has been presented, which automatically reorganizes selected disk blocks based on the dynamic reference stream to effectively boost storage performance. After that, DiskSeen has been presented to support prefetching block data directly at the level of disk layout [25], [28]. H. Song et al. [44] have proposed a server-side I/O collection mechanism to coordinate file servers so that they serve one application at a time, decreasing the completion time.


J. He et al. [29] have explored and classified patterns of I/O within applications, thereby allowing powerful I/O optimization strategies including pattern-aware prefetching. Besides, M. Bhadkamkar et al. [38] have proposed BORG, a block reorganization technique that intends to reorganize hot data blocks sequentially for smaller disk I/O times, which can also contribute to prefetching on the file server. However, the prefetching schemes used by local file systems aim to reduce disk latency; they fail to hide the latency caused by network communication, as the prefetched data is still buffered on the local storage server side.

Distributed file systems for mobile clouds. Moreover, many studies about storage systems for cloud environments that enable mobile client devices have been published. A new mobile distributed file system called mobiDFS has been proposed and implemented in [6], which aims to reduce computing in mobile devices by transferring computing requirements to servers. Hyrax is an infrastructure derived from Hadoop to support cloud computing on mobile devices [7]. But Hadoop is designed for general distributed computing, and the client machines are assumed to be traditional computers.

In short, none of the related work targets clouds that have certain resource-limited client machines for yielding attractive performance enhancements. Namely, there are no prefetching schemes for distributed file systems deployed in clouds that offer computing and storage services to mobile client machines.

3 ARCHITECTURE AND IMPLEMENTATION

This paper proposes a novel prefetching scheme for distributed file systems in cloud computing environments to yield better I/O performance. In this section, we first introduce the application contexts assumed for the proposed prefetching mechanism; then the architecture and related prediction algorithms of the prefetching mechanism are discussed specifically; finally, we briefly present the implementation details of the file system used in the evaluation experiments, which enables the proposed prefetching scheme.

3.1 Assumptions in Application Contexts

This newly presented prefetching mechanism cannot work well for all workloads in the real world, and its target application contexts must meet two assumptions:

• Assumption 1: resource-limited client machines. This newly proposed prefetching mechanism is intended primarily for clouds that have many resource-limited client machines, not for generic cloud environments. This is a reasonable assumption given that mobile cloud computing employs powerful cloud infrastructures to offer computing and storage services on demand, alleviating resource utilization in mobile devices [46].

• Assumption 2: On-Line Transaction Processing (OLTP) applications. It is true that all prefetching schemes in distributed file systems make sense only for a limited number of read-intensive applications, such as database-related OLTP and server-like applications. That is because these long-running applications may have a limited number of access patterns, and the patterns may occur repeatedly during the lifetime of execution, which definitely contributes to the effectiveness of prefetching.

3.2 Piggybacking Client Information

Most of the I/O tracing approaches proposed by other researchers focus on the logical I/O access events that occur on the client file systems, which might be useful for affirming an application's I/O access patterns [16], [30]. Nevertheless, without relevant information about physical I/O accesses, it is difficult to build the connection between the applications and the distributed file system for improving I/O performance to a great extent. In the presented initiative prefetching approach, the data is prefetched by storage servers after analyzing disk I/O traces, and the data is then proactively pushed to the relevant client file system to satisfy potential application requests. Thus, the storage servers need to understand information about the client file systems and applications. To this end, we leverage a piggybacking mechanism, illustrated in Figure 1, to transfer related information from the client node to the storage servers, which contributes to modeling disk I/O access patterns and forwarding the prefetched data.

As described in Figure 1, when sending a logical I/O request to the storage server, the client file system piggybacks information about the client file system and the application. In this way, the storage servers are able to record disk I/O events with the associated client information, which plays a critical role in classifying access patterns and determining the destination client file system for the prefetched data. In other words, because the client information is piggybacked to the storage servers, the storage servers can record the disk I/O operations together with the information about the relevant logical I/O events. Figure 2 shows the structure of each piece of logged information stored on the relevant storage server. The information about the logical access includes inode information, file descriptor, offset and requested size, while the information about the corresponding physical access contains storage server ID, stripe ID, block ID and requested size.
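
To make the layout concrete, the following is a minimal Python sketch of such a log record; the field names are illustrative, chosen to mirror Figure 2, and are not taken from the PARTE source code.

    from dataclasses import dataclass

    @dataclass
    class AccessRecord:
        """One logged disk I/O event together with the piggybacked
        logical context, following the field layout of Figure 2."""
        time: float           # arrival time of the physical request
        # logical access info, piggybacked by the client file system
        cfs_id: int           # which client file system issued the request
        is_write: bool        # R/W flag
        inode: int            # inode of the accessed file
        fd: int               # file descriptor on the client side
        offset: int           # logical offset within the file
        logical_size: int     # requested size of the logical access
        # physical access info, filled in by the storage server
        server_id: int
        disk_id: int
        stripe_id: int
        block_no: int
        physical_size: int    # requested size of the physical access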

3.3 I/O Access Prediction

Many heuristic algorithms have been proposed to shepherd the distribution of file data on disk storage so that data stripes expected to be used together are located close to one another [17], [18]. Moreover, J. Oly and D. Reed discovered that the spatial patterns of I/O requests in scientific codes can be represented with Markov models, so that future accesses can be predicted by Markov models with proper state definitions [36].


Fig. 1. Workflow among the client that runs the application, the client file system, the storage server and the low level file system: the client file system generates extra information about the application, the client file system (CFS) and the logical access attributes; it then piggybacks this extra information onto the relevant I/O request and sends them to the corresponding storage server. The storage server parses the request to separate the piggybacked information from the real I/O request. Apart from forwarding the I/O request to the low level file system, the storage server records the disk I/O access together with the information about the corresponding logical I/O access.

Fig. 2. Logged information about both the logical access and its corresponding physical access: each record holds a timestamp, the logical information (client file system, R/W flag, inode, file descriptor, offset, request size) and the physical information (storage server ID, disk ID, stripe ID, block number, request size).

N. Tran and D. Reed have presented an automatic time series modeling and prediction framework to direct prefetching adaptively; this work employs ARIMA (Autoregressive Integrated Moving Average) time series models to forecast future I/O accesses by resorting to the temporal patterns of I/O requests [27]. However, none of the mentioned techniques aims to guide the prediction of physical accesses by analyzing disk I/O traces, since two neighboring I/O operations in a raw disk trace might not have any dependency. This section illustrates how to predict disk I/O operations, covering the modeling of disk I/Os and the two prediction algorithms adopted by our newly presented initiative prefetching mechanism.

3.3.1 Modeling Disk I/O Access Patterns

Modeling and classifying logical I/O access patterns is beneficial for performing I/O optimization strategies, including data prefetching, on the client nodes [29], [36]. But unlike logical I/O access patterns, disk I/O access patterns may change without any regularity, because disk I/O accesses carry no information about the applications.

Fig. 3. Disk I/O traces of 200 (randomly) selected I/O operations while running a database-related OLTP benchmark, i.e. Sysbench [13] (logical block address versus operation sequence number): all of them are read operations, and most of the operations exhibit certain access regularities. Specifically, two kinds of trends appear in the figure, a sequential access pattern and a random access pattern with certain regularities (chaotic time series access).

As mentioned before, Z. Li et al. have confirmed that block correlations are common semantic patterns in storage systems, and the information about these correlations can be employed to enhance the effectiveness of storage caching, prefetching and disk scheduling [39]. This finding inspired us to explore the regularities in the disk I/O access history for hinting data prefetching; currently, we simply model disk I/O access patterns as two types, i.e. the sequential access pattern and the random access pattern.

For modeling disk I/O accesses, i.e. determining whether an I/O operation belongs to a sequential stream or a random stream, we propose a working set-like algorithm to rudimentarily classify I/O access patterns into either a sequential tendency or a random tendency. Actually, C. Ruemmler and J. Wilkes [40] verified that the working set size on the selected systems is small compared to the total storage size, though the size exhibits considerable variability. Our working set algorithm keeps a working set W(t, T), defined for time t as the set of access addresses referenced within the process time interval (t−T, t). The working set size w(t, T) is the offset difference in W(t, T), where noise operations are ignored while computing the working set size. The algorithm then compares the working set size against a defined threshold to indicate whether an access pattern change has occurred. As a matter of fact, the employed working set-like algorithm has very similar effects to techniques that utilize Hidden Markov Models (HMM) to classify access patterns after analyzing client I/O logs [36], but it does not introduce any overhead resulting from maintaining extra data structures and performing complex algorithms.

Figure 3 illustrates that the disk access patterns of an OLTP application benchmark, i.e. Sysbench, can be classified into either a sequential access tendency or a random access tendency by using our presented working set-like algorithm.
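
As a rough illustration, the following Python sketch shows one plausible reading of this working set-like classifier; the window length T, the span threshold, and the noise-trimming rule are assumed parameters, not values given in the paper.

    def classify_pattern(events, T=0.5, threshold=1 << 20, keep_ratio=0.9):
        """Working set-like pattern classifier (a sketch).

        'events' is a chronological list of (time, offset) disk accesses.
        W(t, T) is the set of offsets referenced in (t - T, t); its size
        w(t, T) is taken here as the offset span of W(t, T) after dropping
        a fraction of extreme offsets as noise.  A small span suggests a
        sequential stream, a large span a random one.
        """
        if not events:
            return "sequential"
        t_now = events[-1][0]
        window = sorted(off for (t, off) in events if t_now - T < t <= t_now)
        if len(window) < 2:
            return "sequential"
        keep = max(2, int(len(window) * keep_ratio))   # ignore noise operations
        lo = (len(window) - keep) // 2
        trimmed = window[lo:lo + keep]
        w = trimmed[-1] - trimmed[0]                   # working set size w(t, T)
        return "sequential" if w <= threshold else "random"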


Fig. 4. The flow chart of the linear regression prediction algorithm, which forecasts the future access event when the current access events follow a linear access pattern: after counting the total number of access events in Access_set, if the count does not exceed the threshold, the series is not a linear sequence and the algorithm ends; otherwise SLOPE = Data_amount / Time_used and T_serving = Size_cur / DataThroughput are computed, and the prediction is Addr_new = Addr_cur + SLOPE × T_serving, T_new = T_cur + T_serving, and Size_new = Size_cur (or the average size).

After modeling I/O access patterns, we can see that block I/Os may exhibit certain regularities, i.e. a linear tendency or a chaotic tendency. Therefore, we have proposed two prediction algorithms to forecast future I/Os when the I/O history follows either of the two tendencies after pattern classification using the working set-like approach.

3.3.2 Linear Regression Prediction

Linear regression prediction can be employed to forecast future access events only if the sequence of current access events follows a linear access pattern. That is to say, confirming whether the current access series is sequential or not is critical to prediction accuracy. To this end, assume a new access request arrives at time T_cur, starting at address Addr_cur and with request size Size_cur, which can be labeled as Access(Addr_cur, Size_cur, T_cur). We want to forecast the information about the following read request, Access(Addr_new, Size_new, T_new). From the available block access history, the proposed prediction algorithm attempts to find all block access events that occurred recently after a given time point. Namely, all Access(Addr_prev, Size_prev, T_prev) with T_prev greater than (T_cur − T_def) are collected and put into a set, Access_set. The variable T_def is a pre-defined threshold based on the density of I/O operations; it limits the time range of the previous access operations used for judging whether the current access sequence is sequential or not.

Figure 4 demonstrates the work flow of the proposed linear prediction algorithm. We first count the total number of access events in Access_set whose target block addresses fall within the range [Addr_cur − Addr_def, Addr_cur]. Similar to T_def, Addr_def is another pre-defined threshold, confining the offset range of the counted access events. After that, if the total number of found access events is greater than our pre-defined threshold value, we ascertain that this access request is part of a linear access sequence. As a result, the slope of this sequential access pattern, SLOPE, can be calculated as (Data_amount / Time_used), where Data_amount is the total size of the data processed during this linear access sequence, and Time_used is the time consumed for completing all relevant access requests during this period. Furthermore, T_serving is the time required for processing the currently received read request, which implies that the next read request can be handled only after this one has been fulfilled. In this case, we can assert that the following access request is probably part of this linear access sequence, as long as the current access series follows a sequential tendency. Therefore, the future request can be predicted by linear regression, and the details of forecasting the address, request size and time of the future access request are shown in the last two steps of the flow chart.

3.3.3 Chaotic Time Series Prediction

Although certain disk I/Os may be part of a sequential sequence, for which the linear regression prediction algorithm can forecast future I/O accesses, random I/Os make up a sizable portion of the track of all I/Os. For predicting a future I/O access that follows a random trend, many prediction algorithms have been proposed, such as the Markov model prediction algorithm [36].

All I/O operations in a random access tendency are in chronological order, and we found that neighboring I/O operations might not have any dependency, as the relationship between I/O operations is influenced by several factors including network traffic and application behavior. Therefore, we employ the Lyapunov characteristic exponent, which characterizes the rate of separation of infinitesimally close trajectories (offsets of block I/O operations), to affirm whether the random access tendency is a chaotic time series or not. In fact, two offsets belonging to neighboring I/O operations in a phase space diverge from an initial separation δZ₀, and after t iterations the separation δZ(t) is given by Equation 1.

$$|\delta Z(t)| \approx e^{\lambda t}\,|\delta Z_0| \quad (1)$$

where λ is the Lyapunov exponent. The largest Lyapunov exponent is defined as the Maximal Lyapunov Exponent (MLE), which can be employed to determine a notion of predictability for a dynamical system or a sequence [14]. A positive MLE is usually taken as an indication that the system or sequence is chaotic, in which case we can utilize chaotic time series prediction algorithms to forecast future disk I/O accesses.


Otherwise, I/O access prediction and the corresponding data prefetching should not be performed while the access pattern is random but not chaotic. In this paper, we propose a chaotic time series prediction algorithm to forecast future I/O accesses. The algorithm has two steps: the first is to calculate the Lyapunov exponent, which indicates whether the series is chaotic; the second is to predict the future I/O access by employing the obtained Lyapunov exponent.

Step 1: Lyapunov exponent estimation. Because the offsets of disk I/O operations are what guides data prefetching, we only take the offsets of the I/O operations in the access history into account when designing the prediction algorithm. To be specific, each I/O operation is treated as a point in the time series, and its value is the operation's offset.

(1) We employ the offsets of disk I/O operations to build an m-dimensional phase space $R^m$:

$$X_i = [x_i, x_{i+k}, x_{i+2k}, \cdots, x_{i+(m-1)k}]^T \quad (2)$$

where $i = 1, 2, \cdots, M$ and $X_i \in R^m$. Here $k$ is a delay exponent ($\tau = k\tau_s$, where $\tau$ represents the delay time and $\tau_s$ is the time interval for sampling points in the offset series); $N$ is the total number of points in the series, and $M = N - (m-1)k$ is the total number of phase points in $R^m$.

(2) We find all points $X_j$ that are near the point $X_i$ within the neighbourhood $\varepsilon$, excluding time-correlated points in the sequence:

$$\begin{cases} Dist(X_j, X_i; 0) = \|X_j - X_i\| \le \varepsilon \\ |j - i| \ge p \end{cases} \quad (3)$$

where $Dist(X_j, X_i; 0)$ is the distance between $X_j$ and $X_i$ at the starting time, $\|\cdot\|$ is the Euclidean distance, and $p$ is the average time of each cycle in the time series. All points satisfying Equation 3 make up the assembly $u_j$.

(3) After the time interval $t + \tau_s$, $X_i$ evolves to $X_{i+1}$; consequently, the nearby point of $X_i$ changes from $X_j$ to $X_{j+1}$, and the distance can be calculated as:

$$Dist(X_j, X_i; t) = \|X_{j+1} - X_{i+1}\|, \quad t = 1, 2, \cdots \quad (4)$$

According to the definition of the Maximal Lyapunov exponent, we obtain:

$$Dist(X_j, X_i; t) = Dist(X_j, X_i; 0)\, e^{\lambda t \tau_s} \quad (5)$$

Then, taking the logarithm of both sides of Equation 5 gives:

$$\frac{1}{\tau_s} \ln Dist(X_j, X_i; t) = \frac{1}{\tau_s} \ln Dist(X_j, X_i; 0) + \lambda t \quad (6)$$

Thus $\lambda$ is the slope of $\frac{1}{\tau_s} \ln Dist(X_j, X_i; t) \sim t$; we apply the optimization in Equation 7 to reduce the error range of the slope:

$$S(t) = \frac{1}{M} \sum_{i=1}^{M} \ln \Big( \frac{1}{|u_j|} \sum_{X_j \in u_j} Dist(X_j, X_i; t) \Big) \quad (7)$$

where $M$ is the total number of phase points and $|u_j|$ is the total number of elements in the assembly $u_j$.

(4) We draw the curve of $\frac{1}{\tau_s} S(t) \sim t$; the slope of this curve is $\lambda$. The value of $\lambda$ indicates whether the access series presents a chaotic trend or not. If the series is not chaotic, the prediction is not performed, and the relevant data prefetching is not triggered either.
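A compact numerical sketch of steps (1)-(4) is given below in Python; it follows the structure of Equations 2-7 but, for brevity, uses a single nearest neighbour per phase point instead of the full neighbourhood $u_j$, so it should be read as an approximation of the paper's estimator rather than its exact implementation.

    import numpy as np

    def lyapunov_exponent(offsets, m=4, k=1, p=10, t_max=20, tau_s=1.0):
        """Estimate the maximal Lyapunov exponent of an offset series."""
        x = np.asarray(offsets, dtype=float)
        N = len(x)
        M = N - (m - 1) * k                     # number of phase points
        # Phase space R^m (Eq. 2): X_i = [x_i, x_{i+k}, ..., x_{i+(m-1)k}].
        X = np.array([x[i:i + (m - 1) * k + 1:k] for i in range(M)])

        # Nearest neighbour of each phase point, excluding time-correlated
        # points with |j - i| < p (Eq. 3).
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        for i in range(M):
            D[i, max(0, i - p + 1):min(M, i + p)] = np.inf
        nn = np.argmin(D, axis=1)

        # Average log-divergence of neighbour pairs over time (Eqs. 4-7).
        S = []
        for t in range(1, t_max):
            dists = [np.linalg.norm(X[i + t] - X[nn[i] + t]) for i in range(M)
                     if i + t < M and nn[i] + t < M]
            dists = [d for d in dists if d > 0]
            if not dists:
                break
            S.append(np.mean(np.log(dists)))
        if len(S) < 2:
            return 0.0
        # lambda is the slope of (1/tau_s) * S(t) versus t (step 4).
        ts = np.arange(1, len(S) + 1)
        return np.polyfit(ts, np.array(S) / tau_s, 1)[0]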

Step 2: Lyapunov exponent-based prediction. Once the maximal Lyapunov exponent $\lambda$ is obtained, we can estimate whether the series is chaotic. If it is, it is possible to predict the offsets of future I/O operations. We set $X_M$ as the center of each prediction process and $X_k$ as the nearest point to $X_M$, with the distance between the two points being $d_M(0)$:

$$d_M(0) = \min_j \|X_M - X_j\| = \|X_M - X_k\| \quad (8)$$

$$\|X_{M+1} - X_{k+1}\| = \|X_M - X_k\|\, e^{\lambda} \quad (9)$$

$$\sqrt{\sum_{j=1}^{m} |x_{M+1}(j) - x_{k+1}(j)|^2} = \sqrt{\sum_{j=1}^{m} |x_M(j) - x_k(j)|^2}\; e^{\lambda} \quad (10)$$

As for the phase point $X_{M+1}$, only its last component is unknown; this component can be predicted with the given $\lambda$ by using Equation 10.
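Given $\lambda$, the last component of $X_{M+1}$ can be recovered as sketched below (continuing the previous Python sketch); note that Equation 10 leaves a sign ambiguity for the unknown component, and this sketch simply takes the positive root.

    import numpy as np

    def predict_next_offset(offsets, lam, m=4, k=1):
        """Predict the next offset of a chaotic series (Equations 8-10)."""
        x = np.asarray(offsets, dtype=float)
        N = len(x)
        M = N - (m - 1) * k
        X = np.array([x[i:i + (m - 1) * k + 1:k] for i in range(M)])

        center = X[-1]                                 # X_M, prediction center
        dists = np.linalg.norm(X[:-1] - center, axis=1)
        nb = int(np.argmin(dists))                     # nearest point X_k (Eq. 8)

        # Eqs. 9/10: ||X_{M+1} - X_{k+1}||^2 = ||X_M - X_k||^2 * e^{2*lambda};
        # every component of X_{M+1} is known except the last one.
        target_sq = dists[nb] ** 2 * np.exp(2 * lam)
        base = N - (m - 1) * k                         # index of X_{M+1}'s first component
        known_sq = sum((x[base + j * k] - X[nb + 1][j]) ** 2 for j in range(m - 1))
        resid = max(target_sq - known_sq, 0.0)
        return X[nb + 1][-1] + np.sqrt(resid)          # positive root chosen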

3.4 Initiative Data Prefetching

The scheme of initiative data prefetching is the novel idea presented in this paper, and the architecture of the scheme while it handles read requests is demonstrated in Figure 5 (the assumed synopsis of a read operation is read(int fildes, size_t size, off_t off)). In the figure, the storage server predicts the future read operation by analyzing the history of disk I/Os, so that it can directly issue a physical read request to fetch data in advance.

The most attractive idea in the figure is that the prefetched data is forwarded to the relevant client file system proactively, while the client file system is not involved in either the prediction or the prefetching procedure. Finally, the client file system can respond to a hit read request sent by the application with the buffered data. As a result, the read latency on the application side can be reduced significantly. Furthermore, the client machine, which might have limited computing power and energy supply, can focus on its own work rather than on prediction-related tasks.
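Putting the pieces together, the server-side read path can be pictured with the following self-contained Python sketch; the dict-backed "disk", the per-client push buffers, and the trivial run-detecting predictor are all stand-ins for the real PARTE components (in the actual system, the predictor slot is filled by the working set classifier and the two prediction algorithms of Section 3.3, and the pushed block lands in the client file system's cache).

    import collections

    Request = collections.namedtuple("Request", "cfs_id time offset size")

    class InitiativeServer:
        """Sketch of the server-side initiative prefetching path (Figure 5)."""

        def __init__(self, disk, block_size=4096):
            self.disk = disk                  # dict: offset -> block data
            self.block_size = block_size
            self.log = []                     # disk I/O history with client info
            self.pushed = collections.defaultdict(list)  # stand-in for the network

        def read(self, req):
            self.log.append(req)              # log the access (Section 3.2)
            data = self.disk.get(req.offset)  # normal service path
            nxt = self._predict()             # Section 3.3 predictors go here
            if nxt is not None and nxt in self.disk:
                # Proactively push the prefetched block to the issuing client.
                self.pushed[req.cfs_id].append((nxt, self.disk[nxt]))
            return data

        def _predict(self):
            # Stand-in predictor: extend a sequential run of three blocks.
            offs = [r.offset for r in self.log[-3:]]
            if (len(offs) == 3 and
                    offs[2] - offs[1] == offs[1] - offs[0] == self.block_size):
                return offs[2] + self.block_size
            return None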


Fig. 5. The algorithm of initiative data prefetching (illustrated with read(fd, 4096, 0) followed by read(fd, 4096, 8192)): the first read request is fulfilled as in the normal case in the distributed file system, so that the application on the client machine can perform its computing tasks after receiving the required data. At the same time, the storage server predicts the future I/O access by analyzing the history of disk I/O accesses, and issues a physical read request to fetch the data (that will be accessed by the predicted read) in advance. Finally, the prefetched data is pushed to the relevant client file system proactively. When the prediction of an I/O access hits, the buffered data on the client file system is returned to the application directly, which is why the read latency can be reduced.

Fig. 6. Architecture of the PARTE file system, in which the initiative prefetching mechanism has been implemented: the application talks to the PARTE client file system, which exchanges file metadata with the PARTE metadata server and performs file I/O and file locking against the PARTE storage servers, while the metadata server and storage servers exchange file status and file creation messages.

3.5 Implementation

We have implemented the newly proposed initiative prefetching scheme in the PARTE file system [15], [47], a C-implemented prototype of a distributed file system. Figure 6 shows the architecture of the PARTE file system used in our evaluation experiments; it has three components, all of which work at user level:

• PARTE client file system (partecfs), which is designed based on Filesystem in Userspace (FUSE) and aims to provide POSIX interfaces for the applications to use the file system. Normally, partecfs sends metadata requests or file I/O requests to the metadata server or the storage servers separately to obtain the desired metadata or file data. Moreover, partecfs is responsible for collecting the information about the client file system and the applications, as well as piggybacking it onto real I/O requests.

TABLE 1
Specification of Nodes on the Cluster and the LANs

             Cluster                          LANs
CPU          4 x Intel(R) E5410 2.33GHz       Intel(R) E5800 3.20GHz
Memory       1 x 4GB 1066MHz DDR3             4GB DDR3-SDRAM
Disk         6 x 114GB 7200rpm SATA           500GB 7200rpm SATA
Network      Intel 82598EB, 10GbE             1000Mb or 100Mb
OS           Ubuntu 13.10                     Debian 6.0.4

• PARTE metadata server (partemds), which manages the metadata of all objects, including files and directories, in the file system. Besides, it monitors all storage servers and issues commands to create or remove stripes on them. Note that the initiative prefetching is transparent to partemds, which works like the metadata server in a traditional distributed file system.

• PARTE storage server (parteost), which handles file management (with a stripe rather than a file as the basic unit) and provides the actual file I/O service for the client file systems. Furthermore, parteost performs the initiative prefetching: it records block access events with the piggybacked information sent by the client file systems, then conducts pattern classification and access prediction to guide the prefetching of block data.

4 EVALUATION AND EXPERIMENTS

We first introduce the experimental platform, the comparison counterparts and the selected benchmarks used in the evaluation experiments. Next, the evaluation methodology and the relevant experimental results are discussed in detail. Then, an analysis of the prediction accuracy of the two proposed prediction algorithms is presented. Finally, we conduct a case study with real block traces.

4.1 Experimental Setup

4.1.1 Experimental Platform

One cluster and two LANs are used for conducting the experiments: the active metadata server and 4 storage servers are deployed on 5 nodes of the cluster, and all client file systems are located on the 12 nodes of the LANs. Table 1 shows the specification of the nodes.

Moreover, to emulate a distributed computing environment, 6 client file systems are installed on the nodes of the LAN that is connected to the cluster by 1 GigE Ethernet, and another 6 client file systems are installed on the nodes of the LAN that is connected to the cluster by 100M Ethernet; both LANs are equipped with MPICH2-1.4.1. Figure 7 demonstrates the topology of our experimental platform.

4.1.2 Evaluation Counterparts


Fig. 7. The topology of our experimental platform: the metadata server (MDS) and four storage servers reside in a 1000 Mbit/s connected cluster; six client nodes (C1-C6) sit on a 1000 Mbit/s switched network and another six on a 100 Mbit/s switched network.

To illustrate the effectiveness of the initiative prefetching scheme, we employed a non-prefetching scheme and two other prefetching schemes as comparison counterparts in our experiments:

• Non-prefetching scheme (Non-prefetch), meaning that no prefetching scheme is enabled in the distributed file system. That is to say, although no benefits can be yielded from prefetching, there is also no overhead caused by the trace logging and access prediction required for prefetching data.

• Readahead prefetching scheme (Readahead), which may prefetch the data in the next neighboring blocks on the storage servers, and can adjust the amount of prefetched data based on the amount of data already requested by the application. This mechanism is simple but has been proven an effective scheme for better I/O performance [24].

• Signature-based prefetching scheme (IOSig+), which is a typical data prefetching scheme on the client file systems based on analyzing applications' access patterns; its source code is available [4]. IOSig+ allows users to characterize the I/O access patterns of an application in two steps: 1) a trace collecting tool obtains the trace of all I/O operations of the application; 2) through offline analysis of the trace, an analysis tool derives the I/O signature [30]. The file system can then enable data prefetching by employing the fixed I/O signatures.

• Initiative prefetching scheme (Initiative), which is our proposed prefetching technique. In addition to predicting future I/O access events, the storage servers are responsible for prefetching the block data and forwarding it to the corresponding client file systems.

4.1.3 Benchmarks

It is well known that a prefetching scheme improves I/O performance only if the running applications are database-related OLTP and server-like programs. Therefore, we selected three related benchmarks, including an OLTP application benchmark, a server-like application benchmark and a benchmark that has certain regular access patterns, to evaluate the initiative prefetching scheme. In the evaluation experiments, all benchmarks run on the 12 client nodes of the two LANs by default unless otherwise specified.

• Sysbench, which is a modular, cross-platform and multi-threaded benchmark tool for evaluating OS parameters that are important for a system running a database under intensive load [13]. It contains several programs, each of which exploits the performance of specific aspects under Online Transaction Processing (OLTP) workloads. As mentioned before, the prefetching mechanism makes sense for database-related OLTP workloads; thus, we utilized Sysbench to measure the time required for processing online transactions, to show the merits brought about by our proposed prefetching mechanism.

• Filebench, which allows generating a large variety of workloads to assess the performance of storage systems. Besides, Filebench is quite flexible and makes it possible to specify in detail a collection of applications, such as mail, web, file, and database servers [12]. We chose Filebench as one of the benchmarks because it has been widely used to evaluate file systems by emulating a variety of server-like applications [9].

• IOzone, which is a micro-benchmark that evaluates the performance of a file system by employing a collection of access workloads with regular patterns, such as sequential, random, reverse order, and strided [11]. That is why we utilized it to measure the read data throughput of the file systems under the various prefetching schemes, when the workloads have different access patterns.

4.2 I/O Processing Acceleration

In fact, the I/O responsiveness of a file system is a crucial indicator of its performance, and our primary motivation for introducing the initiative prefetching scheme is to provide I/O service for OLTP applications. Thus, we employed the online transaction processing benchmark of the SysBench suite with a MySQL-based database setup to examine the validity of the Initiative prefetching scheme. We selected SysBench to create the OLTP workloads since it is able to create OLTP workloads similar to those in real systems. All the configured client file systems executed the same script, and each of them ran several threads that issue OLTP requests. Because Sysbench requires MySQL installed as a backend for OLTP workloads, we configured the mysqld process to run on 16 cores of the storage servers. As a consequence, it is possible to measure the response time to client requests while handling the generated workloads.

In the experiments, we created multiple tables with a total of 16 GB of data, and then configured the benchmark with a fixed number of threads on each client node to perform 100,000 database transactions. Moreover, Sysbench has two operating modes: the default mode (r/w) that reads and writes to the database, and a read-only mode (ro). The r/w mode executes the following database operations: 5 SELECT operations, 2 UPDATE operations, 1 DELETE operation and 1 INSERT operation. In proportion, the observed read/write ratio is about 75% reads and 25% writes.


Fig. 8. Sysbench experimental results on I/O responsiveness for (a) OLTP multiple table (read only) and (b) OLTP multiple table (read/write), comparing Non-prefetch, Readahead, IOSig+ and Initiative: the X-axis represents the number of threads on each client node, and the Y-axis indicates the time required for handling each OLTP transaction, where a lower value is preferred.

Figures 8(a) and 8(b) illustrate the response times in the ro and r/w operating modes respectively. It is clear that our Initiative scheme produced the lowest response times in both operating modes. That is to say, as the storage servers could make accurate predictions about future I/O events for Sysbench, the Initiative prefetching technique outperformed the other prefetching schemes, even when the servers were heavily loaded.

4.3 Data Bandwidth Improvement

In this section, we first measure the read data throughput using the IOzone benchmark, and then evaluate overall data throughput and processing throughput by executing the Filebench benchmark. The experimental results and related discussions are presented in the following two sub-sections.

4.3.1 Read Data Throughput

We have verified the read data throughput of the file system under the mentioned prefetching schemes by performing the multiple kinds of read operations generated by the IOzone benchmark. Figure 9 reports the experimental results, and each sub-figure shows the data throughput when the read operations follow a specific access pattern. In all sub-figures, the horizontal axis shows the record size of each read operation and the vertical axis shows the data rate; a higher data rate indicates better performance of the prefetching scheme.

From the data presented in the figure, we can safely conclude that the Initiative prefetching scheme performed best in most cases when running the IOzone benchmark. In particular, for both backward reads and stride reads, whose results are shown in sub-figures (b) and (c) of Figure 9, our proposed scheme increased data throughput by 30% to 160% compared with the Readahead prefetching scheme and the Non-prefetching scheme. Besides, it is obvious that IOSig+, a typical client-side prefetching scheme, does not work better than the Non-prefetching scheme in the majority of cases. That might be because IOSig+ lacks an effective algorithm to classify access patterns, or a practical implementation of this notion [29].

4.3.2 Overall Data Throughput

As demonstrated before, IOzone can measure the read data throughput under various types of read access patterns, but it fails to unveil the overall performance of the file system when applications have mixed access patterns. Thus, we also leveraged Filebench to measure overall data and processing throughput, since Filebench is able to create filesets with varying file sizes prior to actually running a workload. The web, database, email and file server workloads were chosen in our tests, because they represent the most common server workloads and differ from each other distinctly [8], [9]. All tests were driven by the workload definitions provided by Filebench with run 60, and we only recorded the results of the server applications, i.e. mail server (varmail), file server, web server and database server (oltp), because the prefetching scheme can benefit these server-like applications.

Figure 10 shows the overall data throughput and the processing throughput achieved by the various prefetching schemes. Without doubt, compared with the other prefetching schemes, the newly proposed Initiative prefetching technique resulted in 16% - 28% improvements in overall data throughput, as well as 11% - 31% enhancements in processing throughput. This is because the Initiative prefetching scheme enhances the data throughput of read operations, and the overall data throughput improves as a consequence.

4.4 Benefits upon Client Nodes

We have moved the prefetching functionality from the client nodes, which might be resource-limited, to the storage servers in the distributed file system, to alleviate the consumption of computing and memory resources on the client nodes. In order to show the advantages for the client nodes of performing prefetching on the servers, we have programmed a matrix multiplication benchmark that runs on the client nodes. Figure 11 demonstrates its workflow, which reads the matrix elements from files stored on the distributed file system.


Fig. 9. Read data throughput (data rate versus record size) while running the IOzone benchmark with Non-prefetch, Readahead, IOSig+ and Initiative: (a) in the read report, all prefetching schemes yield similar results; prefetching seems to have no positive effect on I/O performance when read operations lack regularity. (b) In the backward read report, both the IOSig+ and Initiative prefetching schemes achieve preferable data throughput. (c) In the stride read report, the proposed Initiative prefetching scheme outperforms the others. (d) In the random read report, the Non-prefetching scheme performs better than the three prefetching schemes, because a prefetching scheme consumes computing power and memory to predict future I/O operations and to guide prefetching, but the prefetched data may not be used under random read patterns, i.e. prefetching has negative effects in this case.

Fig. 10. Filebench experimental results on overall data throughput and processing throughput: (a) shows overall data throughput, where the X-axis represents the four selected server applications in Filebench (varmail, fileserver, webserver, oltp) and the Y-axis shows overall data throughput (MB/s), the higher the better. (b) reports processing throughput, where the vertical axis represents the number of handled read & write operations per second (ops/s); the average latency per operation (read and write) in the selected server applications was 4.4 ms, 1.5 ms, 0 ms and 0.2 ms respectively.

[Figure 11 depicts the benchmark workflow: matrices A and B are read from the file system, the products AB, B(AB), and AB(B(AB)) are computed on the client node, and each intermediate result is written back to the file system.]
Fig. 11. The algorithm of the matrix multiplication benchmark: there are three multiplications, and the intermediate results are flushed back to the file system; in other words, all input data must be read from the file system.
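For concreteness, the benchmark's read-compute-write loop can be sketched as follows. This is a minimal illustration in Python/NumPy, not the authors' implementation; the mount point and file names are hypothetical.

```python
import numpy as np

MOUNT = "/mnt/dfs"  # hypothetical mount point of the distributed file system

def load(name):
    # every input matrix is read back from the file system
    return np.load(f"{MOUNT}/{name}.npy")

def store(name, m):
    # intermediate results are flushed back to the file system
    np.save(f"{MOUNT}/{name}.npy", m)

# Three multiplications; each one re-reads its operands from files,
# so the I/O pattern is dominated by regular reads of matrix data.
ab = load("A") @ load("B")          # AB
store("AB", ab)
bab = load("B") @ load("AB")        # B(AB)
store("BAB", bab)
abbab = load("AB") @ load("BAB")    # AB(B(AB))
store("ABBAB", abbab)
```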


After running the benchmark to multiply matrices of varying sizes, we measured the time needed to complete the multiplication tasks. Figure 12(a) shows the time required to perform matrix multiplication on the file system under the different prefetching schemes. The newly presented Initiative prefetching scheme clearly needed the least time to conduct the multiplication tasks once the matrix size exceeds 200. For instance, with a matrix size of 1000, i.e. 1000x1000 matrix elements, the Initiative scheme reduced the completion time by up to 21% compared to the other schemes. In short, less completion time means less CPU and memory consumption on the client node.


[Figure 12 comprises two panels: (a) execution time on client nodes and (b) time required for prefetching on client nodes; both plot completion time (ms) against matrix size and file size, from 200 (0.4 MB) to 1,000 (10.4 MB), for the Non-prefetch, Readahead, IOsig+, and Initiative schemes.]
Fig. 12. The time needed for matrix multiplication on the client node: the X-axis gives the size of the matrix and the size of the file that stores one of the target matrices. The Y-axis in (a) presents the time needed to conduct the matrix multiplication (lower is better), and the Y-axis in (b) shows the time required to perform prefetching-related tasks while conducting the multiplication.

Another critical piece of information revealed by the matrix multiplication experiments is the overhead caused by conducting data prefetching on the client side. The newly proposed Initiative scheme is a server-side prefetching mechanism that aims to relieve the resource-limited client machines of prefetching overhead. Therefore, we also recorded the time required to complete prefetching-related tasks, such as predicting future accesses by executing complicated algorithms on the client machines, while running the matrix multiplication. Figure 12(b) shows the overhead introduced on the client nodes by the different prefetching mechanisms. Clearly, neither Non-prefetching nor Readahead incurred any prefetching overhead, because neither scheme performs trace logging or access prediction. The Initiative prefetching scheme caused a little client-side overhead, because it requires the client file system to piggyback the client information onto the I/O requests. But Initiative greatly outperformed IOsig+ in the time needed to support the prefetching functionality: when the matrix size is 1000x1000, it reduced the prefetching-related time spent on the client nodes by a factor of more than 37 in contrast to IOsig+.

Note that although IOSig+ introduces a certain prefetching overhead on the client nodes, it still works better than Non-prefetching, as verified in Figure 12(a). This is because the matrix multiplication benchmark reads data with strong regularity, so the gains produced by the client-side prefetching scheme can offset the cost of conducting prediction on the client machines.

4.5 Overhead on Storage Servers

It has been confirmed that the proposed server-side prefetching scheme, i.e. Initiative, indeed achieves better I/O performance as well as less overhead on client nodes, but it is necessary to check the prefetching overhead on the storage servers to determine whether the mechanism is practical. Table 2 reports the running time, space overhead and network delays of initiative prefetching on the storage servers. The results demonstrate that the Initiative prefetching technique can effectively and practically forecast future disk operations to guide data prefetching for different workloads with acceptable overhead. Since IOzone and Filebench are typical I/O benchmarks for testing the I/O performance of storage systems, they incurred time overheads of 7.2% or more for prefetching and pushing the prefetched data. For compute-intensive applications and OLTP workloads, however, our proposed mechanism spent little time to yield preferable system performance. For example, with the matrix multiplication workload, the time required to predict future disk operations to direct data prefetching and to push the data to the relevant client file systems was around 2.2% of the total processing time. Therefore, this newly presented server-side prefetching is practical for storage systems in cloud computing environments.

Table 2 also shows the space consumed for storing the disk traces and the classified access patterns used to conduct I/O prediction. The presented initiative prefetching scheme is clearly space-efficient for most traces. Running the IOzone benchmark produced more than 100 MB but less than 180 MB of trace logs, as it is a typical I/O benchmark that focuses on I/O operations rather than computing tasks. Moreover, less than 80 MB of space was used for storing the disk traces needed to forecast future disk I/Os with the Sysbench benchmark (i.e. OLTP workloads). In short, analyzing disk traces and prefetching block data can run on the same machine as the storage system without causing too much memory and disk overhead. For long-running applications, the trace logs may grow extraordinarily large; in that case, initiative prefetching may discard the logs that occurred at the early stages, because disk block correlations are relatively stable over a given period and early disk access logs are not instructive for forecasting future disk accesses [35], [39].
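One simple way to realize this log-discarding policy is a bounded buffer that evicts the oldest events once a capacity limit is reached. The snippet below is a minimal illustration; the record format and the capacity constant are assumptions, not the paper's implementation.

```python
from collections import deque

# Keep only the most recent disk I/O events; block correlations are
# relatively stable over a window, so the oldest records can be dropped.
MAX_EVENTS = 1_000_000  # assumed capacity, tuned to available memory

trace_log = deque(maxlen=MAX_EVENTS)  # appending past capacity evicts the oldest entry

def record_io(timestamp, block_addr, nblocks, op):
    """Append one disk I/O event; the field layout is illustrative only."""
    trace_log.append((timestamp, block_addr, nblocks, op))
```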


TABLE 2
Initiative Prefetching Overhead

Benchmark       Time (%)   Space (MB)   Delay / Delay+3R (%)
IOzone          10.5       170.7        4.8 / 18.3
Sysbench         3.8        79.2        1.3 / 3.1
Filebench        7.2       124.1        5.1 / 19.6
Matrix (1000)    2.2         0.32       0.8 / 1.9

Moreover, transferring the prefetched data inevitably generates extra network traffic, which may affect the response times of I/O requests other than the prediction-hit requests. To quantify the network delay imposed on those requests, we first measured their response times in the system with prefetching and without prefetching, and then computed the percentage of network delay according to Equation 11.

$$\mathit{Delay} = \frac{\sum_{i=1}^{n}\left[(T_i^{recv'} - T_i^{snd'}) - (T_i^{recv} - T_i^{snd})\right]}{\sum_{i=1}^{n}(T_i^{recv} - T_i^{snd})} \quad (11)$$

where $(T_i^{recv'} - T_i^{snd'})$ denotes the response time to request $i$ in the system with initiative prefetching, and $(T_i^{recv} - T_i^{snd})$ denotes the response time in the system without prefetching.
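Given per-request send/receive timestamps collected in the two measurement runs, Equation 11 reduces to a few lines of arithmetic. The sketch below is a minimal illustration; the list-of-(send, receive)-pairs input format is an assumption, not the paper's measurement harness.

```python
def network_delay_pct(with_pref, without_pref):
    """Equation 11: extra response time introduced by pushing prefetched
    data, as a fraction of the baseline (no-prefetching) response time.

    Each argument is a list of (t_snd, t_recv) pairs for the same n
    requests, measured with and without initiative prefetching.
    """
    extra = sum((r2 - s2) - (r1 - s1)
                for (s2, r2), (s1, r1) in zip(with_pref, without_pref))
    base = sum(r1 - s1 for (s1, r1) in without_pref)
    return extra / base

# e.g. network_delay_pct(...) * 100 gives the Delay column of Table 2.
```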

Table 2 reports the network delay percentages when each stripe has no replica and when it has three replicas, respectively. The traffic of prefetched data imposed no more than 5.1% additional network delay on the response times of the requests (excluding prediction-hit ones) when each data stripe had no replica (labeled Delay in the table). The percentage of network delay increases significantly when there are three replicas for each stripe (labeled Delay+3R): all storage servers holding the same stripe replica may push the prefetched data to the same client file system, so the amount of data on the network can grow manyfold.

4.6 Analysis of Prediction Algorithms

We have not only examined the advantages brought by the initiative prefetching scheme, but also analyzed the prediction error/deviation of applying our estimation models to the different workloads, to show the effectiveness of our prediction algorithms. Specifically, we recorded the number of prediction hits, the number of predictions, and the total read operations occurring on the storage servers while running the selected benchmarks in the evaluation experiments.

Table 3 presents the related statistics in detail. From the reported data, we conclude that our prediction models are not responsible for predicting all future read operations on the disks: read predictions are performed only for the disk I/Os that follow either a linear trend or a chaotic trend, which can be distinguished by modeling the disk I/O access patterns.

TABLE 3
Prediction Hit Ratio

Benchmark              Reads     Predictions   Hits (Percentage)
IOzone: read/re-read   2097184   1235247       1016608 (82%)
IOzone: backward-read  1048576   924739        780479 (84.4%)
IOzone: stride-read    1048576   943718        823866 (87.3%)
IOzone: random-read    2097152   734003        416179 (56.7%)
Sysbench: read/write   3326112   2135436       1665640 (78.0%)
Sysbench: read-only    2372024   1549248       1275031 (82.3%)
Matrix (200x200)       91        88            87 (98.9%)
Matrix (400x400)       355       352           347 (99.1%)
Matrix (600x600)       795       790           782 (99.0%)
Matrix (800x800)       1410      1401          1389 (99.1%)
Matrix (1000x1000)     2201      2188          2170 (99.2%)
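The three counters summarized in Table 3 can be turned into the two ratios that matter for this analysis: how often a prediction is even attempted, and how often an attempted prediction hits. The following sketch shows the arithmetic; the function and argument names are illustrative, not taken from the paper's implementation.

```python
def prediction_stats(total_reads, predictions, hits):
    """Derive the ratios behind Table 3.

    coverage: fraction of read operations for which a prediction was
              attempted (reads without a recognizable linear or chaotic
              trend are skipped by the models);
    accuracy: fraction of attempted predictions that hit.
    """
    return {"coverage": predictions / total_reads,
            "accuracy": hits / predictions}

# IOzone stride-read row of Table 3:
# prediction_stats(1048576, 943718, 823866)
# -> {'coverage': ~0.90, 'accuracy': ~0.873}, i.e. the reported 87.3% hit ratio
```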


Fig. 13. The average I/O response time (ms) for the Financial-1 and Financial-2 block traces, split by read and write operations, under the Non-prefetch, Readahead, IOsig+, and Initiative schemes.

Moreover, our two presented prediction algorithms achieved attractive prediction accuracy, i.e. a high hit percentage. As a consequence, I/O performance can be enhanced to a great extent by employing the initiative data prefetching mechanism.

4.7 Case Study with Real Block Traces

To check the feasibility of the proposed mechanism in practice, we chose two widely used I/O traces, Financial 1 and Financial 2, which record the block-level activity of OLTP applications at two large financial institutions [5]. In the experiments, the client file system issues I/O requests to the storage servers to write/read block data according to the log events in the traces. Figure 13 shows the average response times of read and write requests in the two traces; the X-axis gives the trace name and operation type, and the Y-axis the average I/O response time (lower is better).
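To make the replay procedure concrete, the sketch below shows one way a client could drive the storage servers from such a trace. It is an illustration only, assuming the SPC-style record layout distributed by the UMass repository (ASU, LBA, size, opcode, timestamp); read_fn and write_fn are hypothetical stand-ins for the client file system's block I/O calls.

```python
import csv

def replay(trace_path, read_fn, write_fn):
    # Replay a Financial-style block trace in log order. Each record is
    # assumed to follow the SPC layout: ASU, LBA, size, opcode, timestamp.
    with open(trace_path, newline="") as f:
        for row in csv.reader(f):
            asu, lba, size, opcode, _ts = row[:5]
            if opcode.strip().lower().startswith("r"):
                read_fn(int(asu), int(lba), int(size))   # read request
            else:
                write_fn(int(asu), int(lba), int(size))  # write request
```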

Non-prefetching clearly achieved the best I/O response time on write operations, because it is affected by prefetching neither on the client machines nor on the storage servers. Furthermore, compared with the other prefetching schemes, the newly proposed Initiative prefetching reduced the average I/O response time on read operations by 10.6%-26.4% relative to Readahead and by 16.7%-19.4% relative to IOSig+. This is because the proposed mechanism can model block access events and


classify the access patterns among them to precisely guide data prefetching on the storage servers, which benefits the read requests in the stream of I/O requests.

5 CONCLUSIONS

We have proposed, implemented and evaluated an initiative data prefetching approach on the storage servers of distributed file systems, which can serve as backend storage in a cloud environment that may include resource-limited client machines. Specifically, after analyzing the existing access logs, the storage servers can predict future disk I/O accesses in order to fetch data in advance, and they then proactively push the prefetched data to the relevant client file systems to satisfy applications' future requests. To effectively model disk I/O access patterns and accurately forward the prefetched data, information about the client file systems is piggybacked onto the relevant I/O requests and transferred from the client nodes to the corresponding storage server nodes. As a result, the client file systems running on the client nodes neither log I/O events nor conduct I/O access prediction; consequently, the thin client nodes can focus on performing their necessary tasks despite limited computing capacity and energy endurance. Besides, because the prefetched data are proactively forwarded to the relevant client file system, the latter need not issue prefetching requests, so both network traffic and network latency can be reduced to a certain extent, as demonstrated in our evaluation experiments.

To sum up, although the initiative data prefetching approach may place extra overhead on the storage servers, which must predict future I/Os by analyzing the history of disk I/Os, it is an attractive way to build a storage system that improves I/O performance when the client nodes have limited hardware and software configurations. For instance, this initiative prefetching scheme can be applied in a distributed file system for a mobile cloud computing environment populated by tablet computers and smart terminals.

The current implementation of our proposed initiative prefetching scheme can classify only two access patterns and supports two corresponding prediction algorithms for forecasting future disk I/O accesses. Since block access events generally follow the locality of reference, we plan to classify the patterns of a wider range of application benchmarks in the future by utilizing the horizontal visibility graph technique. Because the proposed initiative prefetching scheme cannot work well when a mix of different workloads runs in the system, classifying block access patterns from a block I/O trace produced by several concurrent workloads is one direction of our future work. Besides, applying network-delay-aware replica selection techniques to reduce the network transfer time when prefetching data from among several replicas is another future task.

ACKNOWLEDGMENTS

This work was partially supported by the National Natural Science Foundation of China (No. 61303038) and the Natural Science Foundation Project of CQ CSTC (No. CSTC2013JCYJA40050). We would like to thank Dr. Yanlong Yin for providing specifications of IOSig+.

REFERENCES

[1] J. Gantz and D. Reinsel. The Digital Universe in 2020: Big Data, Bigger Digital Shadows, Biggest Growth in the Far East - United States. http://www.emc.com/collateral/analyst-reports/idc-digital-universe-united-states.pdf [Accessed on Oct. 2013], 2013.

[2] J. Kunkel and T. Ludwig. Performance Evaluation of the PVFS2 Architecture. In Proceedings of the 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP '07), 2007.

[3] S. Ghemawat, H. Gobioff, S. Leung. The Google file system. In Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP '03), pp. 29-43, 2003.

[4] IO Signature Plus (IOSIG+) Software Suite [Accessed on Nov. 2012]. https://code.google.com/p/iosig/.

[5] UMass Trace Repository: OLTP Application I/O [Accessed on Jan. 2014]. http://traces.cs.umass.edu/index.php/Storage/Storage.

[6] MobiDFS: Mobile Distributed File System [Accessed on Nov. 2014]. https://code.google.com/p/mobidfs/.

[7] E. E. Marinelli. Hyrax: Cloud computing on mobile devices using MapReduce. CMU, Tech. Rep., 2009.

[8] P. Sehgal, V. Tarasov, E. Zadok. Evaluating Performance and Energy in File System Server Workloads. In the 8th USENIX Conference on File and Storage Technologies (FAST '10), pp. 253-266, 2010.

[9] V. Tarasov, S. Bhanage, E. Zadok, et al. Benchmarking file system benchmarking: It *is* rocket science. In Proceedings of HotOS XIII, 2011.

[10] N. Nieuwejaar and D. Kotz. The galley parallel file system. Parallel Computing, 23(4-5):447-476, 1997.

[11] IOzone Filesystem Benchmark. http://www.iozone.org.

[12] Filebench Ver. 1.4.9.1 [Accessed on Apr. 2012]. http://filebench.sourceforge.net.

[13] SysBench benchmark. http://sysbench.sourceforge.net.

[14] M. Cencini, F. Cecconi and N. Vulpiani. Chaos: From Simple Models to Complex Systems. World Scientific, ISBN: 9814277657, 2010.

[15] J. Liao, Y. Ishikawa. Partial Replication of Metadata to Achieve High Metadata Availability in Parallel File Systems. In Proceedings of the 41st International Conference on Parallel Processing (ICPP '12), pp. 168-177, 2012.

[16] I. Y. Zhang. Efficient File Distribution in a Flexible, Wide-area File System. Master thesis, Massachusetts Institute of Technology, 2009.

[17] S. Akyurek and K. Salem. Adaptive block rearrangement. ACM Transactions on Computer Systems, Vol. 13(2):89-121, 1995.

[18] B. Bakke, F. Huss, D. Moertl and B. Walk. Method and apparatus for adaptive localization of frequently accessed, randomly accessed data. US Patent No. 5765204, Filed June, 1998.

[19] V. Padmanabhan and J. Mogul. Using predictive prefetching to improve World Wide Web latency. ACM SIGCOMM Computer Communication Review, Vol. 26(3):22-36, 1996.

[20] H. Lei and D. Duchamp. An analytical approach to file prefetching. In Proceedings of the USENIX Annual Technical Conference (ATC '97), Anaheim, California, USENIX Association, 1997.

[21] E. Shriver, C. Small, and K. A. Smith. Why does file system prefetching work? In Proceedings of the USENIX Annual Technical Conference (ATC '99), USENIX Association, 1999.

[22] T. M. Kroeger and D. Long. Predicting file system actions from prior events. In Proceedings of the USENIX Annual Technical Conference, USENIX Association, 1996.

[23] T. M. Madhyastha and D. A. Reed. Exploiting global input/output access pattern classification. In Proceedings of the 1997 ACM/IEEE conference on Supercomputing (SC '97), ACM, 1997.

[24] J. Stribling, Y. Sovran, I. Zhang and R. Morris et al. Flexible, wide-area storage for distributed systems with WheelFS. In Proceedings of the 6th USENIX symposium on Networked systems design and implementation (NSDI '09), USENIX Association, pp. 43-58, 2009.


[25] X. Ding, S. Jiang, F. Chen, K. Davis, and X. Zhang. DiskSeen: Exploiting Disk Layout and Access History to Enhance I/O Prefetch. In Proceedings of the USENIX Annual Technical Conference (ATC '07), USENIX, 2007.

[26] A. Reda, B. Noble, and Y. Haile. Distributing private data in challenged network environments. In Proceedings of the 19th international conference on World Wide Web (WWW '10), ACM, pp. 801-810, 2010.

[27] N. Tran and D. A. Reed. Automatic ARIMA Time Series Modeling for Adaptive I/O Prefetching. IEEE Trans. Parallel Distrib. Syst., Vol. 15(4):362-377, 2004.

[28] S. Jiang, X. Ding, Y. Xu, and K. Davis. A Prefetching Scheme Exploiting both Data Layout and Access History on Disk. ACM Transactions on Storage, Vol. 9(3), Article 10, 23 pages, 2013.

[29] J. He, J. Bent, A. Torres, and X. Sun et al. I/O Acceleration with Pattern Detection. In Proceedings of the 22nd International ACM Symposium on High Performance Parallel and Distributed Computing (HPDC '13), pp. 26-35, 2013.

[30] S. Byna, Y. Chen, X. Sun, R. Thakur, and W. Gropp. Parallel I/O prefetching using MPI file caching and I/O signatures. In Proceedings of the 2008 ACM/IEEE conference on Supercomputing (SC '08), pp. 1-12, 2008.

[31] R. H. Patterson, G. A. Gibson, E. Ginting and J. Zelenka et al. Informed prefetching and caching. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP '95), 1995.

[32] S. A. Weil, S. A. Brandt, E. L. Miller, and C. Maltzahn et al. Ceph: a scalable, high-performance distributed file system. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI '06), USENIX Association, 2006.

[33] M. Al Assaf, X. Jiang, M. Abid and X. Qin. Eco-Storage: A Hybrid Storage System with Energy-Efficient Informed Prefetching. Journal of Signal Processing Systems, Vol. 72(3):165-180, 2013.

[34] J. Griffioen and R. Appleton. Reducing file system latency using a predictive approach. In Proceedings of the 1994 USENIX Annual Technical Conference (ATC '94), pp. 197-107, 1994.

[35] Z. Li, Z. Chen and Y. Zhou. Mining Block Correlations to Improve Storage Performance. ACM Transactions on Storage, Vol. 1(2):213-245, May 2005.

[36] J. Oly and D. A. Reed. Markov model prediction of I/O requests for scientific applications. In Proceedings of the 16th international conference on Supercomputing (ICS '02), ACM, New York, NY, USA, pp. 147-105, 2002.

[37] W. Turek and P. Calleja. Technical Bulletin: High Performance Lustre Filesystems using Dell PowerVault MD Storage. University of Cambridge.

[38] M. Bhadkamkar, J. Guerra, L. Useche and V. Hristidis. BORG: block-reORGanization for self-optimizing storage systems. In Proceedings of the 7th conference on File and storage technologies (FAST '09), pp. 183-196, 2009.

[39] Z. Li, Z. Chen, S. M. Srinivasan and Y. Zhou. C-Miner: Mining Block Correlations in Storage Systems. In Proceedings of the 3rd conference on File and storage technologies (FAST '04), 2004.

[40] C. Ruemmler and J. Wilkes. A trace-driven analysis of disk working set sizes. Technical report HPL-OSR-93-23, HP Labs, 1993.

[41] S. Narayan, J. A. Chandy. Trace Based Analysis of File System Effects on Disk I/O. In Proceedings of the 2004 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS '04), 2004.

[42] S. Narayan. File System Optimization Using Block Reorganization Techniques. Master Thesis, University of Connecticut, 2004.

[43] W. W. Hsu, A. Smith, and H. Young. The automatic improvement of locality in storage systems. ACM Trans. Comput. Syst., Vol. 23(4):424-473, 2005.

[44] H. Song, Y. Yin, X. Sun, R. Thakur and S. Lang. Server-side I/O coordination for parallel file systems. In Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11), ACM, 2011.

[45] M. S. Obaidat. QoS-Guaranteed Bandwidth Shifting and Redistribution in Mobile Cloud Environment. IEEE Transactions on Cloud Computing, Vol. 2(2):181-193, April-June 2014, DOI: 10.1109/TCC.2013.19.

[46] C. Chen, M. Won, R. Stoleru and G. Xie. Energy-Efficient Fault-Tolerant Data Storage & Processing in Mobile Cloud. IEEE Transactions on Cloud Computing, DOI: 10.1109/TCC.2014.2326169, online in 2014.

[47] J. Liao, L. Li, H. Chen and X. Liu. Adaptive Replica Synchronization for Distributed File Systems. IEEE Systems Journal, DOI: 10.1109/JSYST.2014.2300611, 2014.

Jianwei Liao received his M.S. degree in computer science from the University of Electronic Science and Technology of China in 2006, and received his Ph.D. degree in computer science from the University of Tokyo, Japan in 2012. He now works for the College of Computer and Information Science, Southwest University of China. His research interests are dependable operating systems and high performance storage systems for distributed computing environments.

Francois Trahay received his M.S. degree and Ph.D. degree in computer science from the University of Bordeaux in 2006 and 2009, respectively. He is now an associate professor at Telecom SudParis. He was a postdoc researcher at the RIKEN institute in the Ishikawa Lab (University of Tokyo) in 2010. His research interests include runtime systems for high performance computing (HPC) and performance analysis.

Guoqiang Xiao received his B.E. degree in electrical engineering from Chongqing University of China in 1986, and received his Ph.D. degree in computer science from the University of Electronic Science and Technology of China in 1999. He is a full professor at the School of Computer and Information Science, Southwest University of China. He was a postdoc researcher at the University of Hong Kong from 2001 to 2003. His research interests include computer architecture and system software.

Li Li received her M.S. degree in computing science from Southwest University of China in 1992, and received her Ph.D. degree in computing from Swinburne University of Technology, Australia in 2006. She is a full professor at the Department of Computer Science, Southwest University, China. She was a postdoc researcher at the Commonwealth Scientific and Industrial Research Organisation, Australia from 2007 to 2009. Her research interests include service computing and social network analysis.

Yutaka Ishikawa received his B.E., M.Tech. and Ph.D. degrees in electrical engineering from Keio University, Japan in 1982, 1984 and 1987, respectively. He is a professor at the Department of Computer Science, the University of Tokyo, Japan. He was a visiting scientist in the School of Computer Science at Carnegie Mellon University from 1988 to 1989. His current research interests include parallel/distributed systems and dependable system software.
