





Journal of Grid Computing (2004) 2: 341–351 © Springer 2005

Replica Management in the European DataGrid Project*

David Cameron2, James Casey1, Leanne Guy1, Peter Kunszt1,**, Sophie Lemaitre1, Gavin McCance2, Heinz Stockinger1, Kurt Stockinger1, Giuseppe Andronico3, William Bell2, Itzhak Ben-Akiva4, Diana Bosio1, Radovan Chytracek1, Andrea Domenici3, Flavia Donno3, Wolfgang Hoschek1, Erwin Laure1, Levi Lucio1, Paul Millar2, Livio Salconi3, Ben Segal1 and Mika Silander5

1CERN, European Organization for Nuclear Research, CH-1211 Geneva 23, Switzerland
2University of Glasgow, Glasgow G12 8QQ, Scotland
3INFN, Istituto Nazionale di Fisica Nucleare, Italy
4Weizmann Institute of Science, Rehovot 76100, Israel
5University of Helsinki, Finland

Key words: data intensive computing, data replication, data transfer, Grid computing

Abstract

Within the European DataGrid project, Work Package 2 has designed and implemented a set of integrated replica management services for use by data intensive scientific applications. These services, based on the web services model, enable movement and replication of data at high speed from one geographical site to another, management of distributed replicated data, optimization of access to data, and the provision of a metadata management tool. In this paper we describe the architecture and implementation of these services and evaluate their performance under demanding Grid conditions.

1. Introduction

The European DataGrid (EDG) project was charged with providing a Grid infrastructure for the massive computational and data handling requirements of several large scientific experiments. The size of these requirements brought the need for scalable and robust data management services. These services had to manage replication of large amounts of data across wide area networks and provide transparent access to distributed storage systems holding this data. Creating these services was the task of EDG Work Package 2 (WP2).

The first prototype replica management system was implemented early in the lifetime of the project in C++ and comprised the edg-replica-manager [24], based on the Globus toolkit, and the Grid Data Mirroring Package (GDMP) [25].

* This work was partially funded by the European Commission program IST-2000-25182 through the European DataGrid Project.

** Corresponding author.

GDMP was a service for the replication (mirroring) of file sets between Storage Elements and, together with the edg-replica-manager, it provided basic replication functionality.

After the experience gained from deployment of these prototypes and feedback from users, it was decided to adopt the web services paradigm [31] and implement the replica management components in Java. The second generation replica management system now includes the following services: the Replica Location Service, the Replica Metadata Catalog, and the Replica Optimization Service. The primary interface between users and these services is the Replica Manager client.

In this paper we discuss the architecture and functionality of these components and analyse their performance. The results show that they can handle user loads as expected and scale well. WP2 services have already been used as production services for the LHC Computing Grid [22] in preparation for the start of the next generation of physics experiments at CERN in 2007. A “data challenge” run by the CMS experiment from March to May 2004 successfully used the Replica Location Service to store and query information on over two million replicated data files.

The paper is organised as follows: in Section 2 we give an overview of the architecture of the WP2 services and in Section 3 we describe the replication services in detail. In Section 4 we evaluate the performance of the replication services and Section 5 discusses directions of possible future work. Related work is described in Section 6 and we conclude in Section 7.

2. Design and Architecture

The WP2 replica management services [10, 20] are based on web services and implemented in Java, adhering to the following principles:

− Modularity: Replica management components were designed to be modular so that plug-ins and future extensions are easy to apply and each service can operate independently of the others.

− Evolution: The future directions and standards for Grid services are not well defined, but several groups are working to implement a framework for Service Oriented Computing [17, 18]. The replica management components were designed to be flexible with respect to these evolving areas and the design allowed for an easy adoption of these concepts.

− Deployment: A vendor neutral approach was adopted for all components to allow for different deployment scenarios. The data management services are independent of the underlying operating system and have been tested on Tomcat and the Oracle 9i Application Servers, interfacing to MySQL and Oracle database back-ends.

2.1. Web Service Design

Web service technologies [31] provide an easy and standardized way to logically connect distributed services via XML (eXtensible Markup Language) messaging. They provide a platform and language independent way of accessing the information held by the service and, as such, are highly suited to a multi-language, multi-domain environment such as a DataGrid.

All the replica management services have been designed and deployed as web services and run on Apache Axis [4] inside a Java servlet engine. All services use the Java reference servlet engine, Tomcat [5], from the Apache Jakarta project [28]. The Replica Metadata Catalog and Replica Location Service have also been successfully deployed into the Oracle 9i Application Server and are being used in production mode in the LCG project [22].

The services expose a standard interface in WSDL format [32] from which client stubs can be generated automatically in any of the common programming languages. A user application can then invoke the remote service directly. Pre-built client stubs are packaged as Java JAR files and shared and static libraries for Java and C++, respectively. C++ clients are built based on the gSOAP toolkit [29]. Client Command Line Interfaces are also provided.

The communication between the client and server components is via the HTTP(S) protocol and the data format of the messages is XML, with the request being wrapped using standard SOAP Remote Procedure Call (RPC). Persistent data is stored in a relational database management system. Services that make data persistent have been tested and deployed with both open source (MySQL) and commercial (Oracle 9i) database back-ends, using abstract interfaces so that other RDBMS systems can be easily slotted in.
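To illustrate the call pattern, the following minimal sketch invokes a catalog operation through the Axis dynamic invocation API rather than a pre-built stub; the endpoint URL, namespace and operation name are placeholders and not the actual WP2 WSDL.

import java.net.URL;
import javax.xml.namespace.QName;
import org.apache.axis.client.Call;
import org.apache.axis.client.Service;

public class SoapRpcExample {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint of a catalog service; a real deployment would
        // take this from configuration or from an Information Service.
        URL endpoint = new URL("https://lrc.example.org:8443/services/LocalReplicaCatalog");

        Service service = new Service();
        Call call = (Call) service.createCall();
        call.setTargetEndpointAddress(endpoint);
        // Hypothetical namespace and operation name, for illustration only.
        call.setOperationName(new QName("urn:example-lrc", "addMapping"));

        // The arguments are wrapped into a SOAP RPC request, sent over HTTP(S)
        // and the reply is unmarshalled from the XML response.
        call.invoke(new Object[] {
            "guid:123e4567-e89b-12d3-a456-426614174000",
            "srm://se.example.org/grid/data/file1"
        });
    }
}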

3. Replication Services

The design of the replica management system is modular, with several independent services interacting via the Replica Manager, a logical single point of entry to the system for users and other external services. The Replica Manager coordinates the interactions between all components of the system and uses underlying file transport services for replica creation and deletion. Query functionality and cataloging are provided by the Replica Metadata Catalog and Replica Location Service. Optimized access to replicas is provided by the Replica Optimization Service, which aims to minimize file access times by directing file requests to appropriate replicas.

The Replica Manager is implemented as a client side tool. The Replica Metadata Catalog, Replica Location Service and the Replica Optimization Service are all stand-alone services, allowing for a multitude of deployment scenarios in a distributed environment. One advantage of such a design is that if any service is unavailable, the Replica Manager can still provide the functionality that does not make use of that particular service. Also, critical service components may have more than one instance to provide a higher level of availability and avoid service bottlenecks.

The downside is that much of the coordinating logic happens on the client side, so asynchronous interaction is not possible. In cases of failure on the client side, the user has little possibility to automatically retry the operations and has to handle such failures manually. Before a Replica Management Service could be provided we would need a secure, fine-grained delegation mechanism by means of which the client could safely request the service to perform a well-defined, limited set of tasks. Full delegation is available, but this gives little advantage over the simple authentication that is already built into the services.

3.1. Replica Manager

For the user, the main entry point to the Replication Services is the Replica Manager client interface, which is provided via C++ and Java APIs and a command line interface. The actual choice of the service component to be used can be specified through configuration files. Java dynamic class loading features are exploited to make these components available at execution time in the Replica Manager.
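As a rough illustration of this dynamic loading approach, the sketch below chooses a transport implementation named in a properties file at run time; the interface, property key, file name and class names are hypothetical and do not reflect the actual WP2 configuration format.

import java.io.FileInputStream;
import java.util.Properties;

/** Hypothetical plug-in contract; the real WP2 interfaces differ. */
interface TransportService {
    void copy(String sourceUrl, String destUrl) throws Exception;
}

public class PluginLoaderExample {
    public static void main(String[] args) throws Exception {
        // The configuration file names the implementation class to use, e.g.
        //   transport.class=org.example.GridFtpTransport
        Properties config = new Properties();
        config.load(new FileInputStream("replica-manager.properties"));
        String className = config.getProperty("transport.class");

        // Dynamic class loading: the implementation is chosen at run time,
        // so new transports can be plugged in without recompiling the client.
        Class<?> clazz = Class.forName(className);
        TransportService transport = (TransportService) clazz.getDeclaredConstructor().newInstance();
        transport.copy("gsiftp://se1.example.org/data/f1", "gsiftp://se2.example.org/data/f1");
    }
}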

The Replica Manager also calls upon services external to WP2. An Information Service such as MDS (Monitoring and Discovery Service) or R-GMA (Relational Grid Monitoring Architecture) needs to be present, as well as storage resources with a well-defined interface, in our case SRM (Storage Resource Manager) or the EDG-SE (EDG Storage Element). The Replica Manager also makes use of transport mechanisms such as GridFTP.

3.2. Replica Location Service

In a highly geographically distributed environment, providing global access to data can be facilitated via replication, the creation of remote read-only copies of files. In addition, data replication can reduce access latencies and improve system robustness and scalability. However, the existence of multiple replicas of files in a system introduces additional issues. The replicas must be kept consistent, they must be locatable and their lifetime must be managed. The Replica Location Service (RLS) is a system that maintains and provides access to information about the physical locations of copies of files [13].

The RLS architecture defines two types of components: the Local Replica Catalog (LRC) and the Replica Location Index (RLI). The LRC maintains information about replicas at a single site or on a single storage resource, thus maintaining reliable, up to date information about the independent local state. The RLI is a (distributed) index that maintains soft collective state information obtained from any number of LRCs.

Globally Unique IDentifiers (GUIDs) are 128-bit numbers used as guaranteed unique identifiers for data on the Grid. In the LRC each GUID is mapped to one or more physical file names identified by Storage URLs (SURLs), which represent paths to the physical locations of each replica of the data. The RLI stores mappings between GUIDs and the LRCs that hold a mapping for that GUID. A query on a replica is a two-stage process. The client first queries the RLI in order to determine which LRCs contain mappings for a given GUID. One or more of the identified LRCs is then queried to find the associated SURLs.
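The two-stage lookup can be summarised with the following in-memory sketch; real deployments issue remote calls to the RLI and LRC services, and the endpoints, GUIDs and SURLs here are purely illustrative.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RlsLookupExample {
    public static void main(String[] args) {
        String guid = "guid:123e4567-e89b-12d3-a456-426614174000";

        // RLI: GUID -> endpoints of the LRCs that hold a mapping for it.
        Map<String, List<String>> rliIndex = new HashMap<>();
        rliIndex.put(guid, List.of("lrc://cern.ch", "lrc://gla.ac.uk"));

        // Each LRC: GUID -> SURLs of the local replicas.
        Map<String, Map<String, List<String>>> lrcCatalogs = new HashMap<>();
        lrcCatalogs.put("lrc://cern.ch", Map.of(guid,
                List.of("srm://castor.cern.ch/grid/data/file1")));
        lrcCatalogs.put("lrc://gla.ac.uk", Map.of(guid,
                List.of("srm://se.gla.ac.uk/grid/data/file1")));

        // Stage 1: ask the RLI which LRCs know about this GUID.
        List<String> candidateLrcs = rliIndex.getOrDefault(guid, List.of());

        // Stage 2: query each candidate LRC for the concrete SURLs.
        List<String> replicas = new ArrayList<>();
        for (String lrc : candidateLrcs) {
            replicas.addAll(lrcCatalogs.get(lrc).getOrDefault(guid, List.of()));
        }
        System.out.println("Replicas of " + guid + ": " + replicas);
    }
}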

An LRC is configured at deployment time to subscribe to one or more RLIs. The LRCs periodically publish the list of GUIDs they maintain to the set of RLIs that index them using a soft state protocol, meaning that the information in the RLI will time out and must be refreshed periodically. The soft state information is sent to the RLIs in a compressed format using Bloom filter objects [9].
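The following toy Bloom filter shows the kind of compact, lossy digest an LRC could publish: membership tests can yield false positives (the RLI may point to an LRC that no longer holds the GUID) but never false negatives. The filter size and hash scheme are illustrative assumptions, not the parameters of the EDG implementation.

import java.util.BitSet;

public class GuidBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public GuidBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive k bit positions from the GUID; a real filter would use stronger hashes.
    private int bitIndex(String guid, int i) {
        return Math.floorMod((guid + "#" + i).hashCode(), size);
    }

    public void add(String guid) {
        for (int i = 0; i < hashCount; i++) bits.set(bitIndex(guid, i));
    }

    /** May return true for a GUID never added (false positive), never false for one added. */
    public boolean mightContain(String guid) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(bitIndex(guid, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        GuidBloomFilter filter = new GuidBloomFilter(1 << 20, 4);
        filter.add("guid:123e4567-e89b-12d3-a456-426614174000");
        System.out.println(filter.mightContain("guid:123e4567-e89b-12d3-a456-426614174000")); // true
        System.out.println(filter.mightContain("guid:00000000-0000-0000-0000-000000000000")); // almost surely false
    }
}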

An LRC is typically deployed on a per site basis, or on a per storage resource basis, depending on the site’s resources, needs and configuration. A site will typically deploy one or more RLIs depending on usage patterns and need. The LRC can also be deployed to work in stand-alone mode instead of fully distributed mode, providing the functionality of a replica catalog operating in a fully centralized manner. In stand-alone mode, one central LRC holds the GUID to SURL mappings for all the distributed Grid files.

3.3. Replica Metadata Catalog Service

The GUIDs stored in the RLS are neither intuitive nor user friendly. The Replica Metadata Catalog (RMC) allows the user to define and store Logical File Name (LFN) aliases to GUIDs. Many LFNs may exist for one GUID but the LFN must be unique within the RMC. The relationship between LFNs, GUIDs and SURLs and how they are stored in the catalogs is summarised in Figure 1.

In addition, the RMC can store GUID metadata such as file size, owner and creation date. The RMC is not intended to manage all generic experimental metadata; however, it is possible to extend the RMC to maintain O(10) items of user-definable metadata. This metadata provides a means for a user to query the file catalog based upon application-defined attributes.
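As a rough model of this mapping, the sketch below keeps several LFN aliases per GUID together with a few attributes per GUID and preselects files by an application-defined attribute; the LFNs, GUIDs, attribute names and values are hypothetical.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RmcExample {
    public static void main(String[] args) {
        // LFN alias -> GUID; several LFNs may point at one GUID, each LFN is unique.
        Map<String, String> aliases = new HashMap<>();
        aliases.put("lfn:/higgs/run42/event-summary.root", "guid:aaaa-1111");
        aliases.put("lfn:/higgs/run42/esd.root",           "guid:aaaa-1111");
        aliases.put("lfn:/higgs/run43/event-summary.root", "guid:bbbb-2222");

        // GUID -> O(10) user-definable attributes (size, owner, application fields).
        Map<String, Map<String, Object>> metadata = new HashMap<>();
        metadata.put("guid:aaaa-1111", Map.of("size", 1_500_000_000L, "owner", "cms", "run", 42));
        metadata.put("guid:bbbb-2222", Map.of("size",   900_000_000L, "owner", "cms", "run", 43));

        // Preselection: find all LFNs whose GUID carries the attribute run == 42.
        List<String> selected = aliases.entrySet().stream()
                .filter(e -> Integer.valueOf(42).equals(metadata.get(e.getValue()).get("run")))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
        System.out.println(selected);
    }
}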

The RMC is implemented using the same technology choices as the RLS, and thus supports different back-end database implementations and can be hosted within different application server environments.

The reason for providing an RMC service separate from the RLS for the LFN mapping is the different expected usage patterns of the LFN and replica lookups. The LFN to GUID mapping and the corresponding metadata are used by the users for preselection of the data to be processed. However, the replica lookup happens at job scheduling time, when the locations of the replicas need to be known, and at application runtime, when the user needs to access the file.

Figure 1. The logical file name to GUID mapping is maintained in the Replica Metadata Catalog, the GUID to physical file name (SURL) mapping in the RLS.

3.4. Replica Optimization Service

Optimization of the use of computing, storage and network resources is essential for application jobs to be executed efficiently. The Replica Optimization Service (ROS) [7] focuses on the selection of the best replica of a data file for a given job, taking into account the location of the computing resources and network and storage access latencies.

Network monitoring information provided by external EDG services is used by the ROS to obtain information on network latencies between the various Grid resources. This information is used to calculate the expected transfer time of a given file with a specific size. The ROS can also be used by the Resource Broker to schedule user jobs to the site from which the required data files can be accessed in the shortest time.
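A simplified version of such a selection rule is sketched below: each candidate replica's transfer time is estimated from monitored bandwidth and round-trip time, and the minimum is chosen. The cost model and the numbers are illustrative assumptions, not the actual ROS algorithm.

import java.util.Comparator;
import java.util.List;

public class ReplicaSelectionExample {
    record Replica(String surl, double bandwidthMBps, double rttMs) {}

    // Cost = network round-trip overhead + size / available bandwidth.
    static double expectedTransferSeconds(Replica r, double fileSizeMB) {
        return r.rttMs() / 1000.0 + fileSizeMB / r.bandwidthMBps();
    }

    public static void main(String[] args) {
        double fileSizeMB = 2000.0; // size of the requested file
        List<Replica> replicas = List.of(
                new Replica("srm://castor.cern.ch/data/f1", 40.0, 15.0),
                new Replica("srm://se.gla.ac.uk/data/f1",   90.0, 30.0),
                new Replica("srm://se.infn.it/data/f1",     25.0, 25.0));

        // Pick the replica with the smallest estimated transfer time.
        Replica best = replicas.stream()
                .min(Comparator.comparingDouble(r -> expectedTransferSeconds(r, fileSizeMB)))
                .orElseThrow();
        System.out.printf("Best replica: %s (%.1f s estimated)%n",
                best.surl(), expectedTransferSeconds(best, fileSizeMB));
    }
}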

The ROS is implemented as a light-weight web service that gathers information from the European DataGrid network monitoring service and performs file access optimization calculations based on this information.

3.5. Service Interactions

The interaction between the various replica management services can be explained through a simple case of a user wishing to make a copy of a file currently available on the Grid to another Grid site (Figure 2). The user supplies the LFN of the file and the destination storage location to the Replica Manager (1).

Figure 2. Typical usage of the EDG data management services.



The Replica Manager contacts the RMC to obtain the GUID of the file (2), then uses this to query the RLS for the locations of all currently existing replicas (3). The ROS calculates the best site from which the file should be copied, based on network monitoring information (4). The Replica Manager then copies the file (5) and registers the new replica information in the RLS (6).
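The sequence of calls can be written out as pseudocode-level Java; the interfaces below (Rmc, Rls, Ros, Transport) are stand-ins for the real client APIs, and the destination SURL naming is only illustrative.

public class ReplicateFileExample {
    interface Rmc       { String guidForLfn(String lfn); }
    interface Rls       { java.util.List<String> listReplicas(String guid);
                          void registerReplica(String guid, String surl); }
    interface Ros       { String bestSource(java.util.List<String> replicas, String destSe); }
    interface Transport { void thirdPartyCopy(String sourceSurl, String destSurl); }

    static String replicate(String lfn, String destSe,
                            Rmc rmc, Rls rls, Ros ros, Transport transport) {
        String guid = rmc.guidForLfn(lfn);                         // (2) LFN -> GUID
        java.util.List<String> replicas = rls.listReplicas(guid);  // (3) GUID -> existing SURLs
        String source = ros.bestSource(replicas, destSe);          // (4) pick the cheapest source
        String destSurl = "srm://" + destSe + "/grid" + lfn;       // destination SURL (naming is illustrative)
        transport.thirdPartyCopy(source, destSurl);                // (5) copy the file
        rls.registerReplica(guid, destSurl);                       // (6) register the new replica
        return destSurl;
    }
}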

4. Evaluation of Data Management Services

Grid middleware components must be designed to withstand heavy and unpredictable usage and their performance must scale well with the demands of the Grid. Therefore all the WP2 services were tested for performance and scalability under stressful conditions. Some results of these tests are presented in this section; they show that the services can handle the loads as expected and scale well.

Clients for the services are available in three forms: C++ API, Java API, and a Command Line Interface (CLI). It was envisaged that the CLI, typing a command by hand in a terminal, would be mainly used for testing an installation or an individual command. The APIs, on the other hand, would be used directly by application code and would avoid the need for the user to interact directly with the middleware. Tests were carried out using all three clients for each component and, as the results will show, using the API gives far better performance than using the CLI. The reasons for this will be explained in this section.

All the services can be run as secure or insecure services. Security within the services is based on the Grid Security Infrastructure (GSI) [11] developed by Globus, adapted for the web services architecture used by the WP2 replica management services. The use of secure services involves mutual authentication to establish the identities of both the client and server, and authorization mechanisms to determine the access rights of the user for the particular service. This can add a significant overhead to the performance of the services involved. Here results are shown from tests using both secure and insecure services.

The performance tests were run on the WP2 testbed, consisting of 13 machines in 5 different sites. All the machines had similar specifications and operating systems and ran identical versions of the WP2 middleware. The application server used to deploy the services was Apache Tomcat 4 and MySQL was used for storing data on the server side. For most of the performance tests small test applications were developed; these are packaged with the software and can therefore be re-run to check the results obtained.

4.1. Replica Location Service

Within the European DataGrid testbed, the RLS has so far only been used with a single LRC per Virtual Organization (a group of users collaborating on the same experiment or project). Therefore results are presented showing the performance of a single LRC.

Firstly, the C++ client was tested using a test suite which inserts a number of GUID : SURL mappings, queries for one GUID and then deletes the mappings. This tests how each of these operations on the LRC scales with the number of entries in the catalog.

Figure 3(a) shows the total time to insert and delete up to 10 million mappings, and Figure 3(b) shows how the time to query one entry varies with the number of entries in the LRC. These tests were run without GSI security.

The results show that insert and delete operations have stable behaviour, in that the total time to insert or delete mappings scales linearly with the number of mappings inserted or deleted. A single transaction with a single client thread takes about 25–29 ms, with delete operations tending to be slightly slower than inserts. The query time is independent of the number of entries in the catalog up to around 1 million entries, when it tends to increase.

Taking advantage of the multi-threading capabilities of Java, it was possible to simulate many concurrent users of the catalog and monitor the performance of the Java API.

The first test was done with 10 concurrent threads, where at any given moment 5 threads would be inserting a mapping and 5 threads would be querying a mapping. Figure 4 compares insert time and query time for the LRC with varying numbers of entries using (a) a secure service and (b) an insecure service. Using security creates significant overheads due to the creation of a secure communications channel between the client and server and the time to authorise client requests according to a certain access control policy. In this case these extra processes make catalog operations take up to 20 times longer than operations on an insecure catalog.
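The threading pattern of this test is roughly the following; the LrcClient interface stands in for the real Java API, and only the concurrency structure is meant to be representative.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentLrcTest {
    /** Stand-in for the real LRC Java API; only the threading pattern matters here. */
    interface LrcClient {
        void addMapping(String guid, String surl) throws Exception;
        java.util.List<String> getSurls(String guid) throws Exception;
    }

    static void runTest(LrcClient client, int opsPerThread) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int t = 0; t < 5; t++) {
            final int id = t;
            // One inserting task ...
            Callable<Void> inserter = () -> {
                for (int i = 0; i < opsPerThread; i++) {
                    client.addMapping("guid:" + id + "-" + i, "srm://se.example.org/grid/f" + id + "-" + i);
                }
                return null;
            };
            // ... and one querying task per loop iteration: 5 + 5 concurrent clients.
            Callable<Void> querier = () -> {
                for (int i = 0; i < opsPerThread; i++) {
                    client.getSurls("guid:" + id + "-" + i);
                }
                return null;
            };
            pool.submit(inserter);
            pool.submit(querier);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}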

In Figure 4(a) the variation of operation times with the number of entries is somewhat obscured by the large security overhead, but in general queries take slightly less time than inserts.

Page 6: Replica Management in the European DataGrid Project

346


Figure 3. (a) Total time to add and delete mappings and (b) query the LRC using the C++ API.


Figure 4. Time to insert mappings and query one GUID for different numbers of entries in the LRC, using 5 concurrent inserting clients and 5 concurrent querying clients with (a) a secure service and (b) an insecure service.

The trend is a lot clearer in Figure 4(b), which shows the insert time rising from 140 ms to 200 ms as the catalog fills up but the query time remaining at a constant 100 ms, not varying with the number of entries.

To measure the effective throughput of the LRC, i.e. the time to complete the insert of a certain number of entries, the total time to insert 500,000 mappings was measured for different numbers of concurrent threads using the Java API on an insecure service. Figure 5(a) shows that the time falls rapidly with increasing numbers of threads, bottoming out after 10 or 20 threads. For 20 threads the total time taken is about 40% less than using one thread. Although an individual operation is slower the more concurrent operations are taking place, the overall throughput actually increases, showing the ability of the LRC to handle concurrent, multi-threaded operation.

A direct comparison between the Java and C++ API performance when inserting one mapping into the LRC for varying numbers of entries in the catalog is given in Figure 5(b). This test was carried out with one client thread operating on an insecure service and the results confirm the behaviour seen in Figures 3(a) and 4(b): the C++ API shows much more stable and performant behaviour than the Java API. The total time to complete the 500,000 operations was just over 9000 s using the C++ API and almost 13,500 s with the Java API, a difference of 50%.

Similar performance tests on the C-based Globus implementation of the Replica Location Service framework are presented in [14].




Figure 5. (a) Total time to add 500,000 mappings to the LRC using concurrent threads and (b) comparison of Java and C++ API performance.


Figure 6. Total time to (a) insert and delete 10 GUIDs with varying number of LFNs, and (b) query for one LFN.

In that work, peak performance is also seen when around 10 concurrent client threads are using the service; however, the rates of operations achieved are much higher, reaching 750 inserts and 2000 queries per second (after tuning certain database back-end parameters).

4.2. Replica Metadata Catalog

The Replica Metadata Catalog can be regarded as an add-on to the RLS system and is used by the Replica Manager to provide a complete view of the LFN : GUID : SURL mapping (Figure 1). In fact the RMC and LRC are used in exactly the same way; only the data stored is different, and thus one would expect similar performance from both components.

In the European DataGrid model there can be many user-defined LFNs for a single GUID, and so the behaviour of the RMC was analysed in the case where multiple LFNs are mapped to a single GUID. Figure 6(a) shows the time to insert and delete 10 GUIDs with different numbers of LFNs mapped to each GUID, and Figure 6(b) shows the time to query for 1 LFN with varying numbers of LFNs per GUID. These tests used the C++ API and an insecure service.

The insert/delete times increase linearly as one might expect, since each new LFN mapping to the GUID is treated in a similar way to inserting a new mapping; the effect is thus to give results similar to the insert times for the LRC seen in Figure 3 in terms of the number of operations performed.



Figure 7. Variation of insert time with different numbers of entries in the catalog and different numbers of attributes per entry.

Query operations take longer the more LFNs exist for a single GUID; however, the query time per LFN mapped to the GUID actually decreases the more mappings there are, hence the RMC performance scales well with the number of mappings associated with each GUID.

Along with LFN aliases, in the RMC users can define O(10) associated metadata attributes for each alias, such as date created, file size or application-specific data such as the energy used to create a specific physics event or the coordinates of a satellite image. This data can be used to query a subset of data from the catalog.

The time to insert mappings into the catalog including these attributes was measured and the results are shown in Figure 7. Mappings were inserted with 1, 5 or 10 attributes, where half the attributes were string attributes and half were integer attributes. In all cases one thread performed the inserts using the Java API and an insecure RMC service.

It can be seen that adding an attribute to an alias takes an equivalent amount of time to adding the mapping itself, i.e. 25–30 ms. The number of attributes added, and hence the increasing amount of information held by the database, does not affect the insert timings, as the three sets of results have essentially the same shape, with a very gradual increase in insert time as the number of entries in the catalog increases.

The command line interface for all the services is implemented in Java using the Java API. Table 1 shows some timing statistics giving the time to execute different parts of the addAlias command, used to insert a GUID : LFN mapping into the RMC using a secure service.

Table 1. Timing statistics for adding a GUID : LFN mapping in the RMC using the CLI.

Time (s)    Operation
0–1.0       Start-up script and JVM start-up time
1.0–1.1     Parse command and options
1.1–2.1     Get RMC service locator
2.1–2.3     Get RMC object
2.3–3.7     Call to the rmc.addAlias( ) method
  2.3–3.0     Java class loading
  3.0–3.6     Security authorisation
  3.6–3.7     Database operations
3.7         End


The total time to execute the command was 3.7 s, broken down as follows. The start-up script sets various options such as logging parameters and the class-path for the Java executable, and this, along with the time to start the Java Virtual Machine, took 1.0 s. After parsing the command line it took a further 1.0 s to get the RMC service locator; during this time many external classes had to be loaded.

The call to the addAlias( ) method within the Java API took around 1.4 s, mainly due to the Java dynamic class loading that takes place the first time a method is called (0.7 s) and the security authorisation (0.6 s). The database operations themselves took less than 0.1 s.



Compared to the average of around 25 ms over many calls observed above in the API tests, this is very large; and because a new JVM is started up every time the CLI is used, the time to execute the command is the same every time.

In short, the time taken to insert a GUID : LFN mapping into the RMC using the command line interface is about two orders of magnitude longer than the average time taken using the Java or C++ API. Therefore the command line tool is only recommended for simple testing and not for large scale operations on the catalog.

5. Open Issues and Future Work

Most of the services provided by WP2 have satisfied the basic user requirements and the software system can be used efficiently in a Data Grid environment. However, several areas still need work.

5.1. User Feedback

There are a number of capabilities that have been requested by the users of our services, or that we have described and planned in the overall architecture but did not implement within the project.

There is currently no proper transaction support in the Replica Management services. This means that if a seemingly atomic operation is composite, like copying a file and registering it in a catalog, there is no transactional safety mechanism if only half of the operation is successful. This may leave the content of the catalogs inconsistent with respect to the actual files in storage. A consistency service scanning the catalog content and checking its validity would also add to the quality of service.

The other extreme is the grouping of several operations into a single transaction. Use cases from the high energy physics community have shown that the granularity of interaction is not at the level of a single file or even of a collection of files. Instead, they would like to see several operations managed as a single operative entity. These are operations on sets of files, spread across several jobs, involving operations like replication, registration, unregistration, deletion, etc. This can be managed in a straightforward manner if replica management jobs are assigned to a session. The Session Manager would hand out session IDs and finalize sessions when they are closed, i.e. only at that time would all changes to the catalogs be visible to all other sessions. In this context sessions are not to be misinterpreted as transactions, as transactions may not span different client processes; sessions are also managed in a much more lazy fashion.

We have also received requests for the support of file-system semantics in the logical file namespace, including support for directories and fine-grained access control mechanisms in the catalog layer. Directory support in the logical namespace would satisfy most requests for proper file collections if metadata can also be associated with directory entries. For more complex collection semantics there might be a need for additional metadata.

In the original WP2 design, pre- and post-processing hooks were foreseen to be available through the Replica Manager. The reason to have such hooks is that many Virtual Organizations have use cases concerning replication where they have to add some processing before the data can be replicated. Examples are checksums, specialized validation after copy, encryption and decryption of data, additional entries to be made in application catalogs, actual data generation from templates, etc.

5.2. Future Services

There are several other services that need to be addressed in future work. In the first prototype of the European DataGrid testbed, WP2 provided a replica subscription facility, GDMP [25]. The hope was to replace GDMP with a more robust and versatile facility fully integrated with the rest of the replication system, but this was not done due to time pressures. The functionality to automatically distribute files based on some subscription mechanism is still much needed.

In terms of metadata management, the metadata support in the RMC is currently limited to O(10) basic typed attributes, which can be used to select sets of LFNs. The RMC cannot support many more metadata attributes or more complex metadata structures. There is ongoing work in the context of the GGF DAIS working group to define proper interfaces for data access and integration, and much of their findings can be used to refine and re-define the metadata structures of the RMC.

5.3. Future Optimisation Strategies

The current ROS component is conservative in terms of functionality, and there has been much recent research in this area [12, 21, 23] that should be leveraged to improve the capabilities and performance of the optimisation systems.



This would include the facility for the service to automatically replicate popular data sets based on previous access history, or to pre-stage files based on job requirements before a job starts running.

5.4. Adhering to Standards

The Open Grid Services Architecture (OGSA) [18] is a GGF effort to describe Grid service architectures, defining in generic terms the structure and mechanisms that have to be made available by Grid services in order to be “OGSA compliant”. The current versions of these Grid standards are extensions of the standard web service frameworks already used for the WP2 software; consequently the migration to any of these newly emerging Grid standards should be straightforward.

6. Related Work

As mentioned, one of the first Grid replica management prototypes was GDMP [25]. In its first toolkit, the Globus project [1] provided an LDAP-based replica catalog service and a simple replica manager that could manage file copy and registration as a single step. The initial implementation of the EDG Replica Manager simply wrapped these tools, providing a more user-friendly API and mass storage bindings. Later, we developed the concept of the Replica Location Service (RLS) together with Globus [13]. Both projects have their own implementation of the RLS, and the performance of the Globus implementation is examined in [14].

In terms of storage management, we have participated actively in the definition of the Storage Resource Management (SRM) [8] interface specification. Work is being carried out on a Reliable File Transfer service [3] by the Globus Alliance, which may be exploited by future high-level data management services for reliable data movement. An integrated approach to data and metadata management is provided by the Storage Resource Broker (SRB) [6].

Within the high energy physics community, one of the most closely related projects is SAM [26] (Sequential data Access via Metadata), which was initially designed to handle the data management issues of the D0 experiment at Fermilab. Another data management system, part of the Condor project, is Kangaroo [27], which provides a reliable data movement service. It also makes use of all available replicas in its system in a way that is transparent to the application.

Related work with respect to replica access optimization has been done in the Earth Science Grid (ESG) [2] project, which makes use of the Network Weather Service (NWS) [33]. Optimized replica selection based on ClassAd match-making between storage resources and application requirements, whereby the storage resources can be ranked by attributes such as available space or maximum data transfer rate, is discussed in [30]. Further studies in the field of optimized data replication have been presented in [12] and [23].

Various solutions have been proposed for data management in peer-to-peer environments; however, these tend to concentrate on a specific requirement of a certain group of users. Gnutella [19] provides a rapid querying system at the expense of flooding the network with messages, while Freenet [15] uses a more conservative querying system to pin-point data location and guarantees anonymity for all users. Free Haven [16] also provides anonymous read and write access on the network and concentrates on guaranteeing the persistence of data. These systems work well for millions of users sharing small files but are not robust or reliable enough for the specialised requirements of smaller groups of scientists.

Some companies are now starting to offer replication-based solutions to the data handling needs of industry. Avaki, for example, focuses on simplifying access to data from multiple heterogeneous sources. A data service layer lies between any number of data sources and any number of data applications, so that to the applications it appears as one virtual data source.

7. Conclusion

In this paper we have described the design and architecture, and examined the performance, of the data management services provided to the European DataGrid project by Work Package 2. The web services model was used to create a set of independent replication, cataloging and optimization services accessed via a single entry point, the Replica Manager. The adoption of the web services model enables a platform and vendor independent means of accessing and managing the data and associated metadata of the user applications. Performance analysis has shown that when the services are used as intended, they can cope under stressful conditions and scale well with increasing user load.

It remains to be seen what the final standard for a Grid services framework will be, but the data management services we have developed should be adaptable with minimal effort to the emerging standards and can provide a solid base for any future efforts in this area.



References

1. B. Allcock, J. Bester, J. Bresnahan, A.L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnal and S. Tuecke, “Data Management and Transfer in High Performance Computational Grid Environments”, Parallel Computing Journal, Vol. 28, No. 5, pp. 749–771, 2002.

2. B. Allcock, I. Foster, V. Nefedov, A. Chervenak, E. Deelman, C. Kesselman, J. Lee, A. Sim, A. Shoshani, B. Drach and D. Williams, “High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies”, in 14th International IEEE Supercomputing Conference (SC 2001), Denver, Texas, USA, 2001.

3. W.E. Allcock, I. Foster and R. Madduri, “Reliable Data Transport: A Critical Service for the Grid”, in Global Grid Forum 11, Honolulu, Hawaii, USA, 2004.

4. “Apache Axis”. http://ws.apache.org/axis/

5. “Apache Tomcat”. http://jakarta.apache.org/tomcat/

6. C. Baru, R. Moore, A. Rajasekar and M. Wan, “The SDSC Storage Resource Broker”, in CASCON’98, Toronto, Canada, 1998.

7. W.H. Bell, D.G. Cameron, L. Capozza, A.P. Millar, K. Stockinger and F. Zini, “Design of a Replica Optimisation Framework”, Technical Report DataGrid-02-TED-021215, CERN, Geneva, Switzerland, 2002.

8. I. Bird, B. Hess, A. Kowalski, D. Petravick, R. Wellner, J. Gu, E. Otoo, A. Romosan, A. Sim, A. Shoshani, W. Hoschek, P. Kunszt, H. Stockinger, K. Stockinger, B. Tierney and J.-P. Baud, “SRM Joint Functional Design”, in Global Grid Forum 4, Toronto, Canada, 2002.

9. B. Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors”, Communications of the ACM, Vol. 13, No. 7, pp. 422–426, 1970.

10. D. Bosio, J. Casey, A. Frohner, L. Guy, P. Kunszt, E. Laure, S. Lemaitre, L. Lucio, H. Stockinger, K. Stockinger, W. Bell, D. Cameron, G. McCance, P. Millar, J. Hahkala, N. Karlsson, V. Nenonen, M. Silander, O. Mulmo, G.-L. Volpato, G. Andronico, F. DiCarlo, L. Salconi, A. Domenici, R. Carvajal-Schiaffino and F. Zini, “Next-Generation EU DataGrid Data Management Services”, in Computing in High Energy Physics (CHEP 2003), La Jolla, California, USA, 2003.

11. R. Butler, D. Engert, I. Foster, C. Kesselman, S. Tuecke, J. Volmer and V. Welch, “A National-Scale Authentication Infrastructure”, IEEE Computer, Vol. 33, No. 12, pp. 60–66, 2000.

12. D.G. Cameron, R. Carvajal-Schiaffino, P. Millar, C. Nicholson, K. Stockinger and F. Zini, “Evaluating Scheduling and Replica Optimisation Strategies in OptorSim”, in 4th International Workshop on Grid Computing (Grid2003), Phoenix, Arizona, USA, 2003.

13. A. Chervenak, E. Deelman, I. Foster, L. Guy, W. Hoschek, A. Iamnitchi, C. Kesselman, P. Kunszt, M. Ripeanu, B. Schwartzkopf, H. Stockinger, K. Stockinger and B. Tierney, “Giggle: A Framework for Constructing Scalable Replica Location Services”, in 15th International IEEE Supercomputing Conference (SC 2002), Baltimore, USA, 2002.

14. A. Chervenak, N. Palavalli, S. Bharathi, C. Kesselman and R. Schwartzkopf, “Performance and Scalability of a Replica Location Service”, in 13th IEEE Symposium on High Performance and Distributed Computing (HPDC-13), Honolulu, Hawaii, USA, 2004.

15. I. Clarke, S.G. Miller, T.W. Hong, O. Sandberg and B. Wiley, “Protecting Free Expression Online with Freenet”, IEEE Internet Computing, Vol. 6, No. 1, pp. 40–49, 2002.

16. R. Dingledine, M.J. Freedman and D. Molnar, Peer-To-Peer: Harnessing the Benefits of a Disruptive Technology, Chapter “Free Haven”, pp. 159–187, O’Reilly, 2001.

17. I. Foster, J. Frey, S. Graham, S. Tuecke, K. Czajkowski, D. Ferguson, F. Leymann, M. Nally, T. Storey, W. Vambenepe and S. Weerawarana, “Modeling Stateful Resources with Web Services”, in Globus World 2004, San Francisco, California, USA, 2004.

18. I. Foster, C. Kesselman, J.M. Nick and S. Tuecke, “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Technical Report, Global Grid Forum, 2002.

19. G. Kan, Peer-To-Peer: Harnessing the Benefits of a Disruptive Technology, Chapter “Gnutella”, pp. 94–122, O’Reilly, 2001.

20. P. Kunszt, E. Laure, H. Stockinger and K. Stockinger, “Replica Management with Reptor”, in 5th International Conference on Parallel Processing and Applied Mathematics, Czestochowa, Poland, 2003.

21. H. Lamehamedi, Z. Shentu, B. Szymanski and E. Deelman, “Simulation of Dynamic Data Replication Strategies in Data Grids”, in 12th Heterogeneous Computing Workshop (HCW2003), Nice, France, 2003.

22. “LCG: The LHC Computing Grid”. http://cern.ch/LCG/

23. K. Ranganathan and I. Foster, “Identifying Dynamic Replication Strategies for a High Performance Data Grid”, in 2nd International Workshop on Grid Computing (Grid2001), Denver, Colorado, USA, 2001.

24. H. Stockinger, F. Donno, E. Laure, S. Muzaffar, P. Kunszt, G. Andronico and P. Millar, “Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project”, in Computing in High Energy Physics (CHEP 2003), La Jolla, California, USA, 2003.

25. H. Stockinger, A. Samar, S. Muzaffar and F. Donno, “Grid Data Mirroring Package (GDMP)”, Scientific Programming Journal (Special Issue: Grid Computing), Vol. 10, No. 2, pp. 121–134, 2002.

26. I. Terekhov, R. Pordes, V. White, L. Lueking, L. Carpenter, H. Schellman, J. Trumbo, S. Veseli and M. Vranicar, “Distributed Data Access and Resource Management in the D0 SAM System”, in 10th IEEE Symposium on High Performance and Distributed Computing (HPDC-10), San Francisco, California, USA, 2001.

27. D. Thain, J. Basney, S. Son and M. Livny, “The Kangaroo Approach to Data Movement on the Grid”, in 10th IEEE Symposium on High Performance and Distributed Computing (HPDC-10), San Francisco, California, USA, 2001.

28. “The Jakarta Project”. http://jakarta.apache.org/

29. R.A. van Engelen and K.A. Gallivan, “The gSOAP Toolkit for Web Services and Peer-To-Peer Computing Networks”, in 2nd IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2002), Berlin, Germany, 2002.

30. S. Vazhkudai, S. Tuecke and I. Foster, “Replica Selection in the Globus Data Grid”, in 1st IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), Brisbane, Australia, 2001.

31. W3C. “Web Services Activity”. http://www.w3c.org/2002/ws/

32. “Web Service Definition Language”. http://www.w3.org/TR/wsdl/

33. R. Wolski, N. Spring and J. Hayes, “The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing”, Journal of Future Generation Computing Systems, Vol. 15, Nos. 5–6, pp. 757–768, 1999.