Simulation of Dynamic Grid Replication Strategies in OptorSim

William H. Bell (1), David G. Cameron (1), Luigi Capozza (2), A. Paul Millar (1), Kurt Stockinger (3), Floriano Zini (2)

(1) University of Glasgow, Glasgow, G12 8QQ, Scotland
(2) ITC-irst, Via Sommarive 18, 38050 Povo (Trento), Italy
(3) CERN, European Organization for Nuclear Research, 1211 Geneva, Switzerland

ITC (Istituto Trentino di Cultura) Technical Report #0207-19, July 2002.

Abstract. Computational Grids normally deal with large, computationally intensive problems on small data sets. In contrast, Data Grids mostly deal with large computational problems that in turn require evaluating and mining large amounts of data. Replication is regarded as one of the major optimisation techniques for providing fast data access. Within this paper, several replication algorithms are studied. This is achieved using the Grid simulator OptorSim. OptorSim provides a modular framework within which optimisation strategies can be studied under different Grid configurations. The goal is to explore the stability and transient behaviour of selected optimisation techniques.

1 Introduction

Within the Grid community much work has been done on providing the basic infrastructure for a typical Grid environment. Globus [4], Condor [1] and recently the EU DataGrid [3] have contributed substantially to core Grid middleware services and software that are available as the basis for further application development. However, little effort has been made so far to optimise the use of Grid resources.

To use a Data Grid, users typically submit jobs. In order for a job to be executed, three types of resources are required: computing facilities, data access and storage, and network connectivity. The Grid must make scheduling decisions for each job based on the current state of these resources (workload and features of Computing Elements, location of data, network load). Complete optimisation is achieved when the combined resource impact of all jobs is minimised, allowing jobs to run as fast as possible.

File replication (i.e. the spread of multiple copies of files across the Grid) is an effective technique for reducing data access overhead. Maintaining an optimal distribution of replicas implies that the Grid optimisation service [7] must be able to modify the geographic location of data files. This is achieved by triggering both replication and deletion of data files. By reflecting the dynamic load on the Grid, such replica management will affect the migration of particular files toward sites that show an increased frequency of file-access requests.

In order to study the complex nature of a typical Grid environment and to evaluate various replica optimisation algorithms, a Grid simulator (called OptorSim) was developed. In this paper the design concepts of OptorSim are discussed and preliminary results based on selected replication algorithms are reported.

The paper is structured as follows. Section 2 describes the design of the simulator OptorSim. Various replication algorithms are discussed in Section 3. After setting out the simulation configuration in Section 4, Section 5 is dedicated to a description of the simulation results. Section 6 highlights related work. Finally, Section 7 concludes the paper and reports on future work.

2 Simulation Design

OptorSim [2] is a simulation package written in Java. It was developed to study the effectiveness of replica optimisation algorithms within a Data Grid environment.

2.1 Architecture

One of the main design considerations for OptorSim is to model the interactions of the individual Grid components of a running Data Grid as realistically as possible. Therefore, the simulation is based on the architecture of the EU DataGrid project [14], as illustrated in Figure 1.

[Figure 1 omitted: three job execution sites, each containing a Storage Element, a Computing Element and a Replica Manager with its Replica Optimiser; jobs are submitted through a User Interface to the Resource Broker.]
Fig. 1. Simulated DataGrid architecture.

The simulation was constructed assuming that the Grid consists of several sites, each of which may provide computational and data-storage resources for submitted jobs. Each site consists of zero or more Computing Elements and zero or more Storage Elements. Computing Elements run jobs, which use the data in files stored on Storage Elements, and a Resource Broker controls the scheduling of jobs to Computing Elements. Sites without Storage or Computing Elements act as network nodes or routers. The decision about data movement associated with jobs between sites is performed by a component called the Replica Manager. Within the Replica Manager, the decision to create or delete replicas is controlled by a Replica Optimiser called Optor. At the heart of Optor is a replica optimisation algorithm, the properties of which are discussed in Section 3.
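This component model can be made concrete with a minimal sketch. The Java fragment below (class and field names are our own illustration, not OptorSim's actual API) captures the invariants just described: a site owns zero or more Computing and Storage Elements, and a site with neither acts purely as a router.

    import java.util.ArrayList;
    import java.util.List;

    // Minimal model of the simulated architecture described in Section 2.1.
    public class GridModel {

        static class StorageElement {
            final long capacityGB;
            long usedGB = 0;
            StorageElement(long capacityGB) { this.capacityGB = capacityGB; }
            boolean hasSpaceFor(long fileGB) { return usedGB + fileGB <= capacityGB; }
        }

        static class ComputingElement {
            // Runs at most one job at a time (see Section 2.2).
            String currentJob = null;
            boolean isIdle() { return currentJob == null; }
        }

        static class Site {
            final String name;
            final List<ComputingElement> computingElements = new ArrayList<>();
            final List<StorageElement> storageElements = new ArrayList<>();
            Site(String name) { this.name = name; }
            // A site with neither CEs nor SEs acts as a network node (router).
            boolean isRouter() {
                return computingElements.isEmpty() && storageElements.isEmpty();
            }
        }

        public static void main(String[] args) {
            Site cern = new Site("CERN");
            cern.storageElements.add(new StorageElement(10000)); // master files only
            Site ral = new Site("RAL");
            ral.computingElements.add(new ComputingElement());
            ral.storageElements.add(new StorageElement(50));
            Site router = new Site("router-1");
            System.out.println(router.name + " is router: " + router.isRouter());
        }
    }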

2.2 Internals

In the simulation each Computing Element is represented by a thread. Job submission to the Computing Elements is managed by another thread: the Resource Broker. The Resource Broker ensures every Computing Element continuously runs jobs by frequently attempting to distribute jobs to all the Computing Elements. When the Resource Broker finds an idle Computing Element, it selects a job to run on it according to the policy of the Computing Element, i.e. which types of job it will run and how often it will run each one. At any time, a Computing Element will be running at most one job. As soon as the job finishes, another is assigned by the Resource Broker. So, although there is no explicit job scheduling algorithm, all Computing Elements process jobs for the duration of the simulation but are never overloaded.

Currently, optimisation only occurs after a job has been scheduled to a Computing Element. The more complex scenario of optimising both job scheduling and data access will be part of future work.
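As a rough illustration of this scheduling loop, the sketch below models each Computing Element as a thread and the Resource Broker as a polling loop. A synchronous hand-off guarantees that a Computing Element accepts a job only while it is idle. All names are hypothetical, and the per-CE job-selection policy is reduced to a numbered job for brevity.

    import java.util.List;
    import java.util.concurrent.SynchronousQueue;

    public class BrokerSketch {

        static class ComputingElement extends Thread {
            // offer() on a SynchronousQueue succeeds only while this thread
            // is blocked in take(), i.e. while the CE is idle.
            final SynchronousQueue<String> handoff = new SynchronousQueue<>();
            public void run() {
                try {
                    while (true) {
                        String job = handoff.take();  // idle until the broker hands over a job
                        Thread.sleep(50);             // "execute" the job
                        System.out.println(getName() + " finished " + job);
                    }
                } catch (InterruptedException e) { /* end of simulation */ }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            List<ComputingElement> ces =
                    List.of(new ComputingElement(), new ComputingElement());
            ces.forEach(Thread::start);
            // Resource Broker: repeatedly try to hand a job to every CE; the
            // offer fails harmlessly if a CE is busy, so no CE is overloaded.
            int jobId = 0;
            while (jobId < 10) {
                for (ComputingElement ce : ces) {
                    if (jobId < 10 && ce.handoff.offer("job-" + jobId)) jobId++;
                }
                Thread.sleep(10);  // poll again shortly
            }
            Thread.sleep(200);     // let the last jobs finish
            ces.forEach(Thread::interrupt);
        }
    }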

Each job has a set of files it may request. Two types of reference may be used for a file: a logical file name (LFN) and a physical file name (PFN). An LFN is an abstract reference to a file that is independent of both where the file is stored and how many replicas exist. A PFN refers to a specific replica of some LFN, located at a definite site. Each LFN will have one PFN for each replica in the Grid.
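The LFN/PFN relationship amounts to a one-to-many mapping. The sketch below (illustrative names only, not an OptorSim or EU DataGrid interface) registers replicas under a logical name and resolves the logical name to all of its physical copies:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // One logical file name (LFN) maps to the physical file names (PFNs)
    // of all of its current replicas.
    public class ReplicaMapping {
        private final Map<String, List<String>> lfnToPfns = new HashMap<>();

        // Register a new replica of a logical file at a given site.
        public void addReplica(String lfn, String site) {
            lfnToPfns.computeIfAbsent(lfn, k -> new ArrayList<>())
                     .add(site + "/" + lfn); // PFN: one concrete replica location
        }

        public List<String> lookup(String lfn) {
            return lfnToPfns.getOrDefault(lfn, List.of());
        }

        public static void main(String[] args) {
            ReplicaMapping rm = new ReplicaMapping();
            rm.addReplica("file42", "CERN");  // master copy
            rm.addReplica("file42", "RAL");   // replica created later
            System.out.println(rm.lookup("file42")); // [CERN/file42, RAL/file42]
        }
    }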

A job will typically request a set of LFNs for data access. The order in which those files are requested is determined by the access pattern. The following access patterns were considered: sequential (the set of LFNs is ordered, forming a list of successive requests), random (files are selected randomly from the set with a flat probability distribution), unitary random walk (the set is ordered and successive file requests are exactly one element away from the previous file request, the direction being random) and Gaussian random walk (as with the unitary random walk, but files are selected from a Gaussian distribution centred on the previous file request).
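The four access patterns can be summarised as simple generators over an ordered set of N files. The sketch below is a minimal rendering under assumed parameters (for example, the Gaussian width of 2.0 is an arbitrary choice; the values used in OptorSim are not given here):

    import java.util.Random;

    // Returns the index of the next file to request, for each access pattern.
    public class AccessPatterns {
        static final Random rng = new Random();

        // sequential: walk the ordered list of LFNs one by one
        static int sequential(int current, int n) { return (current + 1) % n; }

        // random: flat distribution over the whole set
        static int random(int n) { return rng.nextInt(n); }

        // unitary random walk: one element away, direction chosen at random
        static int unitaryWalk(int current, int n) {
            int next = current + (rng.nextBoolean() ? 1 : -1);
            return Math.floorMod(next, n); // wrap around the ordered set
        }

        // Gaussian random walk: step drawn from a Gaussian centred on the
        // current file (the width 2.0 is an illustrative choice)
        static int gaussianWalk(int current, int n) {
            int next = (int) Math.round(current + rng.nextGaussian() * 2.0);
            return Math.floorMod(next, n);
        }

        public static void main(String[] args) {
            int file = 0;
            for (int i = 0; i < 5; i++) {
                file = gaussianWalk(file, 100);
                System.out.println("request file " + file);
            }
        }
    }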

When a file is required by a job, the file's LFN is used to locate the best replica via the Replica Optimiser function getBestFile(LFN, destinationStorageElement), where destinationStorageElement is the Storage Element to which the replica may be copied. It is assumed that the Computing Element on which the job is running and the requested Storage Element are located at the same site. getBestFile() checks the Replica Catalogue for copies of the file. The Replica Catalogue is a Grid middleware service, currently implemented within the simulation as a table of LFNs and all corresponding PFNs. By examining the available bandwidth between destinationStorageElement and all sites on which a replica of the file is stored, getBestFile() can choose the PFN that will be accessed fastest and hence decrease the job running time.

The simulated version of getBestFile() partially fulfils the functionality described in [7]. It is a blocking call that may cause replication to a Storage Element located at the site where the job is running. After any replication has completed, the PFN of the best available replica is returned to the job. If replication has not occurred, the best replica is located on a remote site and is accessed by the job using remote I/O. Both the replication time (if replication occurs) and the file access time (if from a remote site) depend on the network characteristics over the duration of the connection. At any time, the bandwidth available to a transfer is limited by the lowest bandwidth along the transfer path. For transfers utilising a common network element, the bandwidth of that element is shared so that each transfer receives an equal share.
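The replica-selection core of getBestFile() reduces to picking, among all PFNs of an LFN, the one reachable over the highest bottleneck bandwidth. The sketch below assumes the network model has already been collapsed into a per-site bottleneck-bandwidth table; the names are illustrative, not OptorSim's real signatures:

    import java.util.List;
    import java.util.Map;

    public class BestReplica {

        // bottleneckMbps: available bandwidth (Mbps) from each replica-holding
        // site to the destination site, already reduced to the lowest link
        // along the path, as described above.
        static String getBestFile(List<String> pfns, Map<String, Double> bottleneckMbps) {
            String best = null;
            double bestBw = -1;
            for (String pfn : pfns) {
                String site = pfn.substring(0, pfn.indexOf('/'));
                double bw = bottleneckMbps.getOrDefault(site, 0.0);
                if (bw > bestBw) { bestBw = bw; best = pfn; }
            }
            return best; // fastest-to-access replica; the caller may then replicate it locally
        }

        public static void main(String[] args) {
            List<String> pfns = List.of("CERN/file42", "RAL/file42");
            Map<String, Double> bw = Map.of("CERN", 155.0, "RAL", 622.0);
            System.out.println(getBestFile(pfns, bw)); // RAL/file42
        }
    }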

3 Optimisation Algorithms

Replica optimisation algorithms are the core of the Replica Optimiser. Over the duration of a submitted job, PFNs for each LFN are requested by calling getBestFile(). Optimisation algorithms implement getBestFile() so that it may copy the requested file from the remote site to a Storage Element on the same site as the requesting Computing Element. If all Storage Elements on this site are full, then a file must be deleted for the replication to succeed. The strategy used to decide which file should be deleted is what differentiates the optimisation algorithms. In the following, we briefly present three simple algorithms and a more sophisticated one in greater detail. These algorithms have been implemented in OptorSim.

3.1 Simple Algorithms

No replication. This algorithm never replicates a file. The distribution of initial file replicas is decided at the beginning of the simulation and does not change during its execution. This algorithm returns the PFN with the largest expected bandwidth. Since the network load varies during the simulation, the optimal PFN may change.

Unconditional replication, oldest file deleted. This algorithm always replicates a file to the site where the job is executing. If there is no space to accommodate the replication, the oldest file in the Storage Element is deleted.

Unconditional replication, least accessed file deleted. This algorithm behaves as the previous method, except that the least accessed file in the past time interval δt is deleted.
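The two unconditional-replication variants differ only in their eviction rule, which can be stated compactly. In the sketch below (illustrative types, not OptorSim's), each stored file carries its creation time and its access history:

    import java.util.Comparator;
    import java.util.List;

    public class EvictionRules {

        record StoredFile(String lfn, long createdAt, List<Long> accessTimes) {}

        // "oldest file deleted": evict the file created earliest
        static StoredFile oldest(List<StoredFile> files) {
            return files.stream()
                    .min(Comparator.comparingLong(StoredFile::createdAt))
                    .orElseThrow();
        }

        // "least accessed file deleted": evict the file with the fewest
        // accesses in the past interval deltaT, counting back from 'now'
        static StoredFile leastAccessed(List<StoredFile> files, long now, long deltaT) {
            return files.stream()
                    .min(Comparator.comparingLong((StoredFile f) ->
                            f.accessTimes().stream()
                             .filter(t -> t >= now - deltaT).count()))
                    .orElseThrow();
        }

        public static void main(String[] args) {
            List<StoredFile> se = List.of(
                    new StoredFile("a", 10, List.of(90L, 95L)),
                    new StoredFile("b", 50, List.of(20L)));
            System.out.println(oldest(se).lfn());                 // a
            System.out.println(leastAccessed(se, 100, 30).lfn()); // b
        }
    }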

3.2 An Economic Approach

This section presents a replication strategy based on an economic model for Grid resource optimisation. A general description of this economic approach can be found in [9]. The economic model we propose includes actors (autonomous, goal-seeking entities) and the resources in the Grid. Optimisation is achieved via interaction of the actors in the model, whose goals are maximising the profits and minimising the costs of data resource management. Data files represent the goods in the market. They are purchased by Computing Elements for jobs and by Storage Elements in order to make an investment that will improve their revenues in the future. They are sold by Storage Elements to Computing Elements and to other Storage Elements. Computing Elements try to minimise the file purchase cost, while Storage Elements have the goal of maximising profits.

This economic model is utilised to solve two distinct problems: deciding whether replication should occur, and selecting the expendable file(s) when creating space for a new replica. When a job running on a Computing Element requests a file, the optimisation tries to locate the cheapest copy of it in the Grid by starting an auction. Storage Elements that have the file locally may reply, bidding a price that indicates the file transfer cost. A site that does not have a file locally may initiate its own auction to establish whether, by replication, it can satisfy the file request. This mechanism realises the global optimisation mentioned above. Currently, the auction protocol has still to be integrated into OptorSim; in the following discussion we used the simpler protocol described in Section 2.2. The mechanism for deciding if replication should occur is implemented in OptorSim. It is described in the following section.

Replication Decision. Within our economic model the Replica Optimiser needs to make an informed decision about whether it should replicate a file to a local Storage Element. This decision is based on whether the replication (with its associated file transfer and file deletion) will result in a reduced expected future file access cost for the local Computing Element. In order to make this decision, the Replica Optimiser keeps track of the file requests it receives and uses this history as input to an evaluation function E(f, r, n). This function, defined in [9], returns the predicted number of times a file f will be requested in the next n requests, based on the past r requests in the history.

After any new file request is received by the Replica Optimiser (say, for file f), the prediction function E is calculated for f and for every file in the storage. If there is no file in the Storage Element that has a value less than the value of f, then no replication occurs. Otherwise, the least valuable file is selected for deletion and a new replica of f is created on the Storage Element. If multiple files on the Storage Element share the minimum value, the file having the earliest last access time is deleted.

The evaluation function E(f, r, n) is defined by the equation

    E(f, r, n) = \sum_{i=1}^{n} p_i(f),    (1)

with the following argument. Assuming that requests for files containing similar data are clustered in spatial and time locality, the request history can be described as a random walk in the space of integer file identifiers¹. In the random walk, the identifier of the next requested file is obtained from the current identifier by the addition of a step, the value of which is given by some probability distribution. Assuming a binomial distribution of the steps, the probability of receiving a request for file f at step i of the random walk is given by the equation

    p_i(f) = \frac{1}{2^{2iS}} \binom{2iS}{id(f) - s + iS}, \quad |id(f) - s| \le iS,    (2)

where s is the mean value of the binomial distribution, S is the maximum value for the step, and id(f) is a unique file identifier (for instance, the LFN). Then, the most probable number of times file f will be requested during the next n requests is given by (1).

A time interval δt describes how far back the history goes and thus determines the number r of previous requests which are considered in the prediction function. We assume that the mean arrival rate of requests is constant. Once δt has been decided, n is obtained by

    n = r \, \frac{\delta t'}{\delta t},    (3)

where δt' is the future interval for which we intend to do the prediction. The value of S in (2) depends on the value of r. The mean value s is obtained from the recent values of the step in the random walk. In particular, s is calculated as the weighted average of the last r steps, where the weights decrease over past time.

¹ We assume a mapping between file names and identifiers that preserves file content similarity.
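For concreteness, the sketch below evaluates equations (1) and (2) numerically. It uses a Stirling-series approximation of ln Γ to keep the binomial coefficient numerically stable for large 2iS; the parameter values in main() are arbitrary illustrations, not values used in the paper:

    public class EvaluationFunction {

        // ln C(n, k) computed via ln Gamma, to avoid overflow for large 2iS
        static double lnChoose(long n, long k) {
            return lnGamma(n + 1) - lnGamma(k + 1) - lnGamma(n - k + 1);
        }

        // Stirling-series approximation of ln Gamma(x); adequate for a sketch
        static double lnGamma(double x) {
            return (x - 0.5) * Math.log(x) - x
                    + 0.5 * Math.log(2 * Math.PI) + 1.0 / (12 * x);
        }

        // Equation (2): p_i(f) = C(2iS, id(f) - s + iS) / 2^(2iS), |id(f)-s| <= iS
        static double p(long i, long S, long idF, long s) {
            if (Math.abs(idF - s) > i * S) return 0.0;
            long trials = 2 * i * S;
            long successes = idF - s + i * S;
            return Math.exp(lnChoose(trials, successes) - trials * Math.log(2));
        }

        // Equation (1): E(f, r, n) = sum over i = 1..n of p_i(f)
        static double E(long n, long S, long idF, long s) {
            double sum = 0;
            for (long i = 1; i <= n; i++) sum += p(i, S, idF, s);
            return sum;
        }

        public static void main(String[] args) {
            // a file one identifier away from the walk's mean, max step S = 2
            System.out.printf("E = %.4f%n", E(10, 2, 101, 100));
        }
    }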

4 Simulation Configuration

4.1 Grid Configuration

The study of optimisation algorithms was carried out using a model of the EU DataGrid TestBed 1 sites and their associated network geometry, as illustrated in Figure 2. Within this model, each site was allocated storage resources proportional to its actual hardware allocation. Each TestBed site, excluding CERN, was assigned a Computing Element and a Storage Element. CERN was allocated a Storage Element to hold all of the master files but was not assigned a Computing Element. Routers, as previously stated, were described by creating a site without Computing or Storage Elements. The sizes of the Storage Elements for each TestBed site are given in Table 1.

[Figure 2 omitted: map of the TestBed 1 sites (CERN, Lyon, NIKHEF, RAL, NorduGrid, Imperial College, Padova, Bologna, Catania, Torino, Milano) and routers, with link bandwidths ranging from 10 Mbps to 10 Gbps.]
Fig. 2. The EU DataGrid TestBed 1 sites and the approximate network geometry. The numbers indicate the bandwidth between two sites.

Site Name          Storage Element (GBytes)
Bologna            30
Catania            30
CERN               10000
Imperial College   80
Lyon               50
Milano             50
NIKHEF             70
NorduGrid          63
Padova             50
RAL                50
Torino             50

Table 1. A list of resources allocated to the TestBed 1 sites, from which the results in this paper were generated.

4.2 Job Configuration

Initially, all files were placed on the CERN Storage Element. Jobs were based on the CDF use-case described in [12]. There were six job types, with no overlap between the sets of files each job accessed. The total size of the files accessed by each job type was estimated in [12]; the sizes are summarised in Table 2. Each set of files was assumed to be composed of 10 GByte files.

There will be some distribution of jobs each site performs. In the simulation, we modelled this distribution such that each site ran an equal number of jobs of each type, except for a preferred job type, which ran twice as often. This job type was chosen for each site based on storage considerations; for the replication algorithms to be effective, the local storage on each site had to be able to hold all the files for the preferred job type.

Data Sample            Total Size (GBytes)
Central J/ψ            1200
High pt leptons        200
Inclusive electrons    5000
Inclusive muons        1400
High Et photons        5800
Z0 → bb                600

Table 2. Estimated sizes of CDF secondary data sets (from [12]).

5 Results

[Figure 3 omitted.]
Fig. 3. A histogram of job duration (left) and the progression of job duration over the course of the simulation (right).

The left histogram in Figure 3 shows a typical spread of job durations for a single job type at a selected Computing Element over the course of a simulation run. The large spike near zero is due to jobs requesting files that are available on the local site, hence no time-consuming file transfers need to take place. The longer durations are due to jobs requesting some files not present at the local site. The spread is due to the network load, which can vary over time, affecting the file transfer times.

The variation of job duration over the simulation is shown in the right histogram in Figure 3 for the same job type and Computing Element as above. There is clearly a large variation in the job duration due to the factors already mentioned, but the general trend is for jobs to be executed more quickly over time, indicating movement toward a more optimal replica configuration.

[Figure 4 omitted.]
Fig. 4. Integrated running times for 10000 jobs using each access pattern and replica optimisation algorithm.

Further tests were conducted simulating 10000 jobs using each of the four algorithms:

1. No replication
2. Unconditional replication, oldest file deleted
3. Unconditional replication, least accessed file deleted
4. Economic Model

For each replication algorithm, each of the following four file access patterns (as defined in Section 2.2) was tested:

1. Sequential
2. Random
3. Unitary random walk
4. Gaussian random walk

Figure 4 shows the total time to complete 10000 jobs for each of the four access patterns using the four optimisation algorithms. With no optimisation, the jobs take much longer than with even the simplest optimisation algorithm; this is not surprising, since without replication all the files for every job have to be transferred from CERN every time a job is run. The three algorithms that perform replication all show a marked reduction in the time to execute the 10000 jobs, with similar performance for the Random, Unitary random walk and Gaussian random walk patterns. For sequential access patterns, the running time is at least 10% faster using the Economic Model optimiser than the other optimisers. These results were expected, as the Economic Model assumes a sequential access pattern. However, this can be adjusted to match the observed distribution, if needed.

6 Related Work

Recently there has been great interest in modelling Data Grid environments. A simulator for modelling the complex data access patterns of concurrent users in a distributed system is found in [13]. These studies were mainly conducted within the setting of scientific experiments such as the LHC, which finally resulted in the creation of the EU DataGrid project [3]. MicroGrid [18] is a simulation tool for designing and evaluating Grid middleware, applications and network services for the computational Grid. Currently, this simulator does not take data management issues into consideration. Further Grid simulators are presented in [11, 6].

In [15] an approach is proposed for automatically creating replicas in a typical decentralised Peer-to-Peer network. The goal is to create a certain number of replicas on a given site in order to guarantee some minimal availability requirements. In Nimrod-G [8, 5] an economic model for job scheduling is introduced, where "Grid credits" are assigned to users in proportion to their level of priority. In this model, optimisation is achieved at the scheduling stage of a job. However, our approach differs by including both optimal replica selection and automated replica creation in addition to scheduling-stage optimisation.

Various replication and caching strategies within a simulated Grid environment are discussed in [16], and their combination with scheduling algorithms is studied in [17]. The replication algorithms proposed there are based on the assumption that files popular at one site are also popular at other sites. Replication from one site to another is triggered when the popularity of a file overcomes a threshold, and the destination site is chosen either randomly or by selecting the least loaded site. We take a complementary approach. Our replication algorithms are used by Grid sites when they need data locally, and are based on the assumption that in computational Grids there are areas (so-called "data hot-spots") where particular sets of data are highly requested. Our algorithms have been designed to move data files toward such "data hot-spots".

7 Conclusions and Future Work

In this paper we described the design of the Grid simulator OptorSim. In particular, OptorSim allows the analysis of various replication algorithms. The goal is to evaluate the impact of the choice of algorithm on the throughput of typical Grid jobs. We chose two traditional cache management algorithms (oldest file deletion and least accessed file deletion) and compared them to a novel algorithm based on an economic model. We based our analysis on several Grid scenarios with various workloads. The results obtained from OptorSim suggest that the economic model performs at least as well as the traditional methods. In addition, there are specific, realistic cases where the economic model shows marked performance improvements.

Our future work will extend the simulator by including the auction protocol proposed in [10]. This is motivated by the additional functionality of automatic replication to third-party sites, allowing file migration to accurately match demand.

Acknowledgements

The authors thank Erwin Laure, Heinz Stockinger and Ekow Otoo for valuable discussions during the preparation of this paper. William Bell and David Cameron thank PPARC for funding as part of the GridPP(EDG) project and as an e-Science student respectively; Paul Millar thanks SHEFC for funding under the ScotGRID project.

References

1. The Condor Project. http://www.cs.wisc.edu/condor/.
2. OptorSim - A Replica Optimiser Simulation. http://grid-data-management.web.cern.ch/grid-data-management/optimisation/optor/.
3. The DataGrid Project. http://www.eu-datagrid.org.
4. The Globus Project. http://www.globus.org.
5. D. Abramson, R. Buyya, and J. Giddy. A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Broker. In Future Generation Computer Systems, to appear.
6. K. Aida, A. Takefusa, H. Nakada, S. Matsuoka, S. Sekiguchi, and U. Nagashima. Performance Evaluation Model for Scheduling in a Global Computing System. International Journal of High Performance Applications, 14(3), 2000.
7. W. H. Bell, D. G. Cameron, L. Capozza, P. Millar, K. Stockinger, and F. Zini. Design of a Query Optimisation Service. Technical report, CERN, 2002. WP2 - Data Management, EU DataGrid Project. http://edms.cern.ch/document/337977.
8. R. Buyya, H. Stockinger, J. Giddy, and D. Abramson. Economic Models for Management of Resources in Peer-to-Peer and Grid Computing. In Commercial Applications for High-Performance Computing, SPIE's International Symposium on the Convergence of Information Technologies and Communications (ITCom 2001), Denver, Colorado, USA, August 2001.
9. L. Capozza, K. Stockinger, and F. Zini. Preliminary Evaluation of Revenue Prediction Functions for Economically-Effective File Replication, June 2002.
10. M. Carman, F. Zini, L. Serafini, and K. Stockinger. Towards an Economy-Based Optimisation of File Access and Replication on a Data Grid. In International Workshop on Agent based Cluster and Grid Computing at the International Symposium on Cluster Computing and the Grid (CCGrid 2002), Berlin, Germany, May 2002. IEEE Computer Society Press. Also appears as IRST Technical Report 0112-04, Istituto Trentino di Cultura, December 2001.
11. H. Casanova, G. Obertelli, F. Berman, and R. Wolski. The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid. In Proc. of Super Computing 2002, Dallas, Texas, USA, November 2002.
12. B. T. Huffman et al. The CDF/D0 UK GridPP Project. CDF Internal Note 5858.
13. I. C. Legrand. Multi-Threaded, Discrete Event Simulation of Distributed Computing Systems. In Proc. of Computing in High Energy Physics (CHEP 2000), Padova, Italy, February 2000.
14. EU DataGrid Project. The DataGrid Architecture, 2001.
15. K. Ranganathan, A. Iamnitchi, and I. Foster. Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities. In Global and Peer-to-Peer Computing on Large Scale Distributed Systems Workshop, Berlin, Germany, May 2002.
16. K. Ranganathan and I. Foster. Identifying Dynamic Replication Strategies for a High Performance Data Grid. In Proc. of the International Grid Computing Workshop, Denver, Colorado, USA, November 2001.
17. K. Ranganathan and I. Foster. Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications. In International Symposium on High Performance Distributed Computing, Edinburgh, Scotland, July 2002. To appear.
18. H. J. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang, K. Taura, and A. Chien. The MicroGrid: a Scientific Tool for Modeling Computational Grids. Scientific Programming, 8(3):127-141, 2000.