
DIRAC: Reliable Data Management for LHCb

Andrew C Smith1 and Andrei Tsaregorodtsev2, on behalf of the LHCb DIRAC Team
1 CERN, CH-1211 Geneva, Switzerland
2 CPPM, Marseille, France

E-mail: a.smith@cern.ch, atsareg@in2p3.fr

Abstract. DIRAC, LHCb's Grid Workload and Data Management System, utilizes WLCG resources and middleware components to perform distributed computing tasks satisfying LHCb's Computing Model. The Data Management System (DMS) handles data transfer and data access within LHCb. Its scope ranges from the output of the LHCb Online system to Grid-enabled storage for all data types. It supports metadata for these files in replica and bookkeeping catalogues, allowing dataset selection and localization. The DMS controls the movement of files in a redundant fashion whilst providing utilities for accessing all metadata. To do these tasks effectively the DMS requires complete self-integrity between its components and external physical storage. The DMS provides highly redundant management of all LHCb data to leverage available storage resources and to manage transient errors in underlying services. It provides data-driven and reliable distribution of files as well as reliable job output upload, utilizing VO Boxes at LHCb Tier1 sites to prevent data loss.

This paper presents several examples of mechanisms implemented in the DMS to increase reliability, availability and integrity, highlighting successful design choices and limitations discovered.

1. Introduction
DIRAC is LHCb's combined Grid Workload and Data Management System [1], providing the functionalities to support the LHCb Computing Model. Since the inception of the project as a production system, DIRAC has evolved into a generic Community Grid Solution. DIRAC's DMS has also evolved over the lifetime of the project, building on operational experience to provide a redundant and reliable service that meets the needs of the LHCb VO. In the Grid computing environment 100% service availability is not always the reality. In this regime, ensuring no-loss Data Management requires a redundant system architecture. In addition, measures must be taken to ensure data integrity and data access, to avoid wasting computing and network resources.

2. Ensuring Data Distribution
When a computing task fails to upload its output data, the computing resources consumed can be considered wasted. The waste of computing resources is a concern for the VO using them and the site supplying them. Ensuring the ability to upload output data is therefore important when considering the global efficiency in the Grid environment. A simple solution to this problem is to attempt to upload output to all available SEs until success is achieved.


The LHCb Computing Model [2] states that simulation and processing jobs upload their output data to the associated Tier1 SE. In this Model, Tier2 sites provide computing resources for simulation jobs, where the output is uploaded to the associated Tier1 SE. Processing activities are conducted at the Tier1 sites, where input data is read from the local SE and the output data written to the local SE. The upload of output data to the location of the input data is important when resultant jobs require access to ancestor files. Given these requirements, the simple solution to ensuring data upload described above is inadequate.

Figure 1. Architecture for Ensuring Data and Request Persistency

Within DIRAC this problem is solved using a 'failover' mechanism. In the event that the desired output SE is unavailable, DIRAC attempts to upload data to available 'failover' SEs until success is achieved, like the simple solution described above. The 'failover' SEs are configured to be disk storage elements, such that subsequent availability of the data is assured. Once the job has successfully uploaded to a 'failover' SE, it places a Data Management request in the centralized TransferDB to move the file to the desired destination(s). The TransferDB contains all of LHCb's transfer requests, which are scheduled based on network availability and replicated using gLite's File Transfer Service [3]. In the event that the TransferDB is unavailable, the job attempts to place the request on one of the distributed RequestDBs present on the VO boxes at LHCb's Tier1s. These redundant recipients of the requests ensure persistency of the request. The requests are then forwarded to the TransferDB once it becomes available. This architecture is shown in Figure 1.
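The failover chain described above can be summarized with a short sketch. This is an illustrative Python outline under assumed interfaces; the upload, set_transfer_request and set_vobox_request callables are hypothetical placeholders standing in for the storage plugin and the request databases, not the actual DIRAC API.

def upload_with_failover(lfn, local_path, destination_se, failover_ses,
                         upload, set_transfer_request, set_vobox_request):
    # Try the desired destination SE first.
    if upload(lfn, local_path, destination_se):
        return destination_se
    # Otherwise try the 'failover' disk SEs in turn.
    for se in failover_ses:
        if not upload(lfn, local_path, se):
            continue
        # File is safe on disk; persist a request to move it to its destination.
        request = {"lfn": lfn, "source_se": se, "target_se": destination_se}
        try:
            set_transfer_request(request)   # centralized TransferDB (bulk FTS transfers)
        except ConnectionError:
            set_vobox_request(request)      # RequestDB on a Tier1 VO box, forwarded later
        return se
    raise RuntimeError("no storage element accepted " + lfn)

The key point is that the job succeeds as soon as the data is safe on any disk SE; responsibility for the final placement then passes to the request machinery.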

This architecture provides several advantages. The use of 'failover' disk SEs ensures that no output data is lost. In addition, multiple VO boxes ensure the persistency of the request to move output data to its desired final destination. Finally, the centralized database of replication requests allows aggregation of requests into bulk transfers managed by FTS, with access to dedicated high-bandwidth network connectivity.

3. Ensuring LFC Availability
Reliability and availability of replica information is fundamental to the functioning of the DIRAC Workload and Data Management Systems. LHCb's choice of replica catalogue is the LCG File Catalog (LFC) [4], which is used to map globally unique Logical File Names (LFNs) to Physical File Names (PFNs). Within DIRAC there are many consumers and producers of this replica information. The consumers include central WMS and DMS components performing scheduling and optimization tasks, as well as distributed job agents requiring access to replica information for data processing activity. The producers of replica information are the DMS components performing data replication and the distributed job agents producing output data. In all these cases the availability of the LFC service is crucial.

Other Grid Data Management systems [5][6], due to the volume of data being managed, have chosen an architecture with distributed 'site-specific' catalogues serving local replica information. In addition, central catalogues are required to map 'datasets' to the sites at which they can be found. This architecture reduces the volume of information stored in a single catalogue and, as a result, the load on each individual catalogue. But the distributed architecture also requires that consistency be maintained between the central and distributed catalogues. In comparison, DIRAC manages a smaller data volume and has a single centralized catalogue containing all managed files. This approach has been shown to be scalable to the current level of O(10M) replicas and has several advantages. A single central catalogue simplifies the operations required to obtain replica information and reduces the number of components required to be operational. The centralized architecture employed by DIRAC does, however, have a single major drawback: it is a single point of failure.

Figure 2. Architecture for Ensuring Availability of Replica Information

To provide redundancy in the availability of replica information, DIRAC uses distributed read-only catalogue mirrors, shown in Figure 2. The central LFC instance is replicated to LHCb's Tier1s using Oracle streaming technology [7]. Producers of replica information must contact the read/write master instance, while queries can use any of the read-only mirror catalogues, providing the additional benefit of reducing the load on the master. In the event that the master instance fails, registration Requests are persisted in one of several RequestDBs using the mechanism presented in Section 2. These Requests are retried until success and provide additional redundancy for the master instance.
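The read/write routing can be pictured with a small sketch. The class below is illustrative only: the master, mirrors and request_store objects are assumed to expose get_replicas, register_replica and put methods, which are hypothetical stand-ins for the LFC client and RequestDB interfaces.

import random

class ReplicaCatalogueRouter:
    def __init__(self, master, mirrors, request_store):
        self.master = master                # read/write master LFC instance
        self.mirrors = mirrors              # read-only Tier1 mirror catalogues
        self.request_store = request_store  # RequestDB used as a fallback buffer

    def get_replicas(self, lfn):
        # Any mirror can serve queries, spreading load away from the master.
        for catalogue in random.sample(self.mirrors, len(self.mirrors)):
            try:
                return catalogue.get_replicas(lfn)
            except ConnectionError:
                continue
        return self.master.get_replicas(lfn)   # last resort: ask the master

    def register_replica(self, lfn, pfn, se):
        # Only the master accepts writes; buffer the Request if it is down.
        try:
            self.master.register_replica(lfn, pfn, se)
        except ConnectionError:
            self.request_store.put({"operation": "register", "lfn": lfn,
                                    "pfn": pfn, "se": se})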

4. Ensuring Data Management Integrity [8]
Ensuring the availability of replica information, as shown in Section 3, is pointless if the metadata stored is incorrect. Similarly, storage resources are wasted if physical files present on these resources are orphans with no information in the replica catalogue. The consistency of DIRAC's metadata catalogues and Storage Elements is vital in the provision of reliable data management. This consistency is ensured with an integrity checking suite. The suite (shown in Figure 3) consists of a series of agents that check the mutual consistency of DIRAC's three main Data Management resources: the Storage Elements (SEs), the LFC, and LHCb's Bookkeeping and provenance DB. These agents report any inconsistencies found to a central repository (the IntegrityDB).

4.1. Bookkeeping vs LFC
LHCb's Bookkeeping DB contains provenance information regarding all of LHCb's jobs and files. Files which no longer physically exist are marked accordingly in the DB such that they are no longer 'visible'. The Bookkeeping also provides an interface for users to query for files with particular properties and is used extensively by physicists for selecting events for analysis. To ensure users are given files that exist, the consistency of the Bookkeeping must be maintained with the LFC. An agent is deployed (BK-LFC) which verifies that the 'visible' files in the Bookkeeping exist in the LFC.

Figure 3. Data Integrity Suite Architecture

4.2. LFC vs SE
The agent described in the previous section ensures users only receive files from the Bookkeeping that physically exist. The assumption made in the design of this agent is the consistency of the LFC contents. This itself must be ensured by checking that the replicas present in the LFC exist on the storage resources. An agent is deployed (LFC-SE) which loops over the LFC namespace, performing this check on the replicas found.

4.3. SE vs LFC
At the Petabyte scale of Grid computing, the pressure to use storage resources efficiently is paramount: even a small percentage of waste has non-negligible financial implications. Orphan physical files on storage resources, without registered replicas, can never be accessed by users and are one source of waste. To combat this, an agent is deployed (SE-LFC) to loop over the contents of the SE namespace and verify that the physical files found are registered in the LFC.
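The two directions of the catalogue/storage cross-check performed by the LFC-SE and SE-LFC agents can be sketched as follows. The iterators and predicates (lfc_replicas, se_exists, se_listing, lfc_has_replica, report) are hypothetical placeholders for the catalogue, storage and IntegrityDB interfaces.

def check_lfc_against_se(lfc_replicas, se_exists, report):
    # LFC-SE direction: every registered replica should exist on storage.
    for lfn, pfn, se in lfc_replicas():
        if not se_exists(se, pfn):
            report({"pathology": "missing-physical-file",
                    "lfn": lfn, "pfn": pfn, "se": se})

def check_se_against_lfc(se_listing, lfc_has_replica, report):
    # SE-LFC direction: every physical file should be registered in the LFC.
    for se, pfn in se_listing():
        if not lfc_has_replica(se, pfn):
            report({"pathology": "orphan-physical-file",
                    "pfn": pfn, "se": se})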

4.4. Resolving Data Integrity Problems
The agents described in the previous sections report problematic files to the IntegrityDB, where they are collated. A further agent is responsible for determining the reason for each inconsistency and, where possible, automatically resolving it. This resolution can take the form of re-replication, re-registration, physical removal or catalogue removal of files. Some pathologies may not be resolved automatically and remain in the DB for manual intervention.
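A minimal sketch of such a resolution agent, assuming the pathology names and handler callables are supplied by the surrounding system (they are illustrative labels, not DIRAC identifiers):

def resolve_integrity_problems(problems, rereplicate, reregister,
                               remove_physical, remove_from_catalogue):
    handlers = {
        "missing-physical-file": rereplicate,        # copy from another replica
        "unregistered-replica": reregister,          # add the replica to the LFC
        "orphan-physical-file": remove_physical,     # delete the orphan from storage
        "lost-file": remove_from_catalogue,          # drop the catalogue entry
    }
    unresolved = []
    for problem in problems:
        handler = handlers.get(problem["pathology"])
        if handler is None:
            unresolved.append(problem)   # left in the IntegrityDB for manual intervention
        else:
            handler(problem)
    return unresolved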

4.5. Determining Storage Usage
As mentioned in Section 4.3, the efficient use of storage resources is important. Likewise, it is important to know how storage resources are being used. An agent is deployed (StorageUsage) which loops over the namespace of the LFC and generates a high-granularity picture of storage usage based on the registered replicas and their sizes. This information is stored in the IntegrityDB, indexed by LFC directory, such that different views of this information can be generated.
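The aggregation performed by the StorageUsage agent amounts to summing replica sizes per LFC directory (and per SE). A minimal sketch, where the iterable of (lfn, se, size) tuples is an assumed input rather than a real catalogue call:

import posixpath
from collections import defaultdict

def storage_usage(replicas):
    # Accumulate file counts and bytes per (LFC directory, SE) pair.
    usage = defaultdict(lambda: {"files": 0, "bytes": 0})
    for lfn, se, size in replicas:
        directory = posixpath.dirname(lfn)
        usage[(directory, se)]["files"] += 1
        usage[(directory, se)]["bytes"] += size
    return usage

# Example with dummy replicas (paths and SE names are illustrative):
replicas = [("/lhcb/MC/2007/DST/0001/f1.dst", "CERN-disk", 2000000000),
            ("/lhcb/MC/2007/DST/0001/f2.dst", "CERN-disk", 1500000000)]
print(storage_usage(replicas))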


4.6. Issues with Scalability
Ensuring ultimate consistency and integrity in the Petabyte era of Grid computing is an impossible task. DIRAC's current approach assumes that the information provided by the underlying resources is an accurate representation of that system; the limitation of this approach is this very assumption. For example, the problem of ensuring the consistency of the catalogue with the underlying storage resource is reflected within the SEs themselves. The contents of the SE namespace, like the contents of replica catalogues (and Grid resources in general), are not 100% dependable. To solve data integrity issues a priori would require continual data access to all files, and until disk, computing and cooling power are infinite this is not possible. The future approach within DIRAC will include detecting integrity issues as they arise and resolving them automatically a posteriori. This approach is scalable and can be extended as new pathologies are observed.

5. Ensuring Data Access
The LHCb Computing Model [2] states that re-processing activity will be performed four times a year. During this exercise, access is required to files written to the SEs possibly months before. These files may have been migrated to tape and cleared from disk caches. To obtain access to these files on disk cache for data transfer and processing, DIRAC has implemented a staging service. Without this service both network and computing resources may be wasted while the processes attempting to access the files 'hang' until the files become available.

The DIRAC Stager, shown in Figure 4, overlays the SEs and receives staging requests from DIRAC WMS and DMS components. These staging requests are stored in a central DB, from which they are retrieved by an agent that issues pre-staging requests to the remote SEs. This agent monitors the issued requests until the files are available on disk. When deployed against Storage Resource Manager 2.2 [9][10] compliant SEs, files can be 'pinned' to ensure they are not removed before access is attempted. The availability of the files is then reported back to the system that made the original request.
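The agent's cycle can be sketched as below. The se_prestage, se_is_on_disk, se_pin and notify callables are hypothetical stand-ins for the SRM operations and the WMS/TransferDB callbacks; the pin lifetime and polling interval are assumed configuration values.

import time

def run_stager_cycle(pending_requests, se_prestage, se_is_on_disk, se_pin,
                     notify, pin_lifetime=86400, poll_interval=300):
    staging = []
    for request in pending_requests:                 # new requests from the central DB
        se_prestage(request["se"], request["pfn"])   # issue pre-stage to the remote SE
        staging.append(request)
    while staging:
        time.sleep(poll_interval)
        still_waiting = []
        for request in staging:
            if se_is_on_disk(request["se"], request["pfn"]):
                se_pin(request["se"], request["pfn"], pin_lifetime)  # SRM 2.2 pin
                notify(request["requester"], request["lfn"])         # WMS or TransferDB
            else:
                still_waiting.append(request)
        staging = still_waiting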

Figure 4. Architecture for the DIRAC Stager System

5.1. Workload Management Usage [11]
In the context of Workload Management, requests are submitted to the DIRAC Stager at the point of data optimization. At this point, the possible sites at which a job may run are determined using replica information obtained from the LFC. If the files to be processed are present on tape storage, the request to stage the files is passed to the Stager. Once these files are staged, the Stager Agent reports back to the WMS and the job waiting for these files is submitted to the Grid for execution.


5.2. Data Management Usage
The DMS usage of the Stager is similar to that of the WMS. The 'Replication Optimiser' obtains replication requests and assigns them to channels for execution. In the event that the files are registered on tape storage, the requests are passed to the Stager. Once the files are staged, the Stager Agent reports back to the TransferDB, where the files are made available for replication.

By combining the needs of the WMS and DMS into a single centralized service, it is possible to coordinate the usage of disk cache for all of LHCb's activities. This will be discussed further in Section 7.

6. Conclusions
This paper describes several mechanisms to increase the reliability of DIRAC activities while using Grid services. These examples provide insight into the problems that must be considered when doing large-scale Grid Data Management, and the solutions so far adopted within DIRAC. The solutions adopted include ensuring the persistency of data and of the requests to manipulate data, ensuring the ability of consumers and producers of replica information to contact catalogues, and finally allowing consumers of files to access them on disk without wasting resources.

7. Future Work
The issues of scalability mentioned with respect to Data Integrity in Section 4.6 have a solution in a posteriori resolution. The key to this development is making every DIRAC component a sensor reporting to a centralized repository. The DIRAC Logging Service is currently being deployed to meet this requirement [12].

The DIRAC Stager System discussed in Section 5 has been used extensively in LHCb computing exercises during 2007 [13]. The current system is still vulnerable to files being removed from the disk cache before the replication/job attempts to access the file. This possibility exists because of the time between the WMS/DMS components being informed of the data availability and their ability to use the files. The current system could be extended by maintaining a list of issued pins and file sizes which, coupled with knowledge of the size of the disk cache, would prevent stage requests being issued when the disk cache is already full.
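A possible shape for such pin accounting, as a sketch only (the cache size is an assumed configuration value per disk cache, and the interface names are illustrative):

class PinAccountant:
    def __init__(self, cache_size_bytes):
        self.cache_size = cache_size_bytes
        self.pinned = {}                     # pfn -> size of currently pinned files

    def can_stage(self, size):
        # Refuse new stage requests that would overflow the disk cache.
        return sum(self.pinned.values()) + size <= self.cache_size

    def record_pin(self, pfn, size):
        self.pinned[pfn] = size

    def release_pin(self, pfn):
        self.pinned.pop(pfn, None)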

References
[1] A Tsaregorodtsev et al, DIRAC: A community grid solution, CHEP'07 (2007)
[2] LHCb Collaboration, LHCb Computing TDR, Technical Report CERN-LHCC-05-119, CERN (2005)
[3] G McCance et al, Building the WLCG file transfer service, CHEP'07 (2007)
[4] S Lemaitre et al, Recent Developments in LFC, CHEP'07 (2007)
[5] M Branco et al, Managing ATLAS data on a petabyte-scale with DQ2, CHEP'07 (2007)
[6] L Tuura et al, Scaling CMS data transfer system for LHC start-up, CHEP'07 (2007)
[7] B Martelli et al, LHCb experience with LFC database replication, CHEP'07 (2007)
[8] A C Smith and M Bargiotti, DIRAC Data Management: consistency, integrity and coherence of data, CHEP'07 (2007)
[9] F Donno et al, Storage Resource Manager version 2.2: design, implementation and testing experience, CHEP'07 (2007)
[10] A Shoshani et al, Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations, 24th IEEE Conference on Mass Storage Systems and Technologies (2007)
[11] S Paterson et al, DIRAC Optimized Workload Management, CHEP'07 (2007)
[12] R Graciani et al, DIRAC Framework for Distributed Computing, CHEP'07 (2007)
[13] R Nandakumar et al, The LHCb Computing Data Challenge DC06, CHEP'07 (2007)

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

6

DIRAC Reliable Data Management for LHCb

Andrew C Smith1 and Andrei Tsaregorodtsev2 on behalf of theLHCb DIRAC Team1 CERN CH-1211 Geneva Switzerland2 CPPM Marseille France

E-mail asmithcernch atsaregin2p3fr

Abstract DIRAC LHCbrsquos Grid Workload and Data Management System utilizes WLCGresources and middleware components to perform distributed computing tasks satisfying LHCbrsquosComputing Model The Data Management System (DMS) handles data transfer and data ac-cess within LHCb Its scope ranges from the output of the LHCb Online system to Grid-enabledstorage for all data types It supports metadata for these files in replica and bookkeeping cat-alogues allowing dataset selection and localization The DMS controls the movement of filesin a redundant fashion whilst providing utilities for accessing all metadata To do these taskseffectively the DMS requires complete self integrity between its components and external phys-ical storage The DMS provides highly redundant management of all LHCb data to leverageavailable storage resources and to manage transient errors in underlying services It providesdata driven and reliable distribution of files as well as reliable job output upload utilizing VOBoxes at LHCb Tier1 sites to prevent data loss

This paper presents several examples of mechanisms implemented in the DMS to increase relia-bility availability and integrity highlighting successful design choices and limitations discovered

1 IntroductionDIRAC is LHCbrsquos combined Grid Workload and Data Management System [1] providing thefunctionalities to support the LHCb Computing Model Since the inception of the project as aproduction system DIRAC has evolved to a generic Community Grid Solution DIRACrsquos DMShas also evolved over the lifetime of the project building on operational experience to providea redundant and reliable service to meet the needs of the LHCb VO In the Grid computingenvironment 100 service availability is not always the reality In this regime insuring no-loss Data Management requires a redundant system architecture In addition measures mustbe taken to ensure data integrity and data access to avoid wasting computing and networkresources

2 Ensuring Data DistributionWhen a computing task fails to upload its output data the computing resources consumed canbe considered wasted The waste of computing resources is a concern for the VO using them andthe site supplying them Ensuring the ability to upload output data is therefore important whenconsidering the global efficiency in the Grid environment A simple solution to this problem isto attempt to upload ouput to all SEs available until success is achieved

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

ccopy 2008 IOP Publishing Ltd 1

The LHCb Computing Model [2] states that simulation and processing jobs upload their out-put data to the associated Tier1 SE In this Model Tier2 sites provide computing resources forsimulation jobs where the output is uploaded to the associated Tier1 SE Processing activitiesare conducted on the Tier1 sites where input data is read from the local SE and the outputdata written to the local SE The upload of output data to the location of the input data isimportant when resultant jobs require access to ancestor files Given these requirements thesimple solution to ensuring data upload described above is inadequate

Figure 1 Architecture for Ensuring Data and Request Persistency

Within DIRAC this problem is solved using a lsquofailoverrsquo mechanism In the event the desiredoutput SE is unavailable DIRAC attempts to upload data to available lsquofailoverrsquo SEs until successis achieved like the simple solution described above The lsquofailoverrsquo SEs are configured to bedisk storage elements such that subsequent availability of data is assured Once the job hassuccessfully uploaded to a lsquofailoverrsquo SE it places a Data Management request in the centralizedTransferDB to move the file to the desired destination(s) The TransferDB contains all LHCbrsquostransfer requests which are scheduled based on network availability and replicated using gLitersquosFile Transfer Service [3] In the event the TransferDB is unavailable the job attempts to placethe request on one of the distributed RequestDBs present on the VO boxes at LHCbrsquos Tier1sThese redundant recipients of the requests ensure persistency of the request The requests arethen forwarded to the TransferDB once it becomes available This architecture is shown inFigure 1

This architecture provides several advantages The use of lsquofailoverrsquo disk SEs ensures that nooutput data is lost In addition multiple VO boxes ensure the persistency of the request tomove output data to its desired final destination Finally the centralized database of replicationrequests allows aggregation of requests into bulk transfers managed by FTS with access todedicated high bandwidth network connectivity

3 Ensuring LFC AvailabilityReliability and availability of replica information is fundamental to the functioning of DIRACWorkload and Data Management Systems LHCbrsquos choice of replica catalogue is the LCG FileCatalog (LFC) [4] and is used to map globally unique Logical File Names (LFN) to PhysicalFile Names (PFN) Within DIRAC there are many consumers and producers of the replica in-formation The consumers include central WMS and DMS components performing schedulingand optimization tasks as well as distributed job agents requiring access to replica informationfor data processing activity The generators of replica information are the DMS componentsperforming data replication and the distributed job agents producing output data In all these

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

2

cases the availability of the LFC service is crucial

Other Grid Data Management systems [5][6] due to the volume of data being managed havechosen an architecture with distributed lsquosite-specificrsquo catalogues serving local replica informa-tion In addition central catalogues are required to map lsquodatasetsrsquo to the site at which they canbe found This architecture reduces the volume of information stored in a single catalogue andas a result the load on each individual catalogue But the distributed architecture also requiresconsistency is maintained between central and distributed catalogues In comparison DIRACmanages a smaller data volume and have a single centralized catalogue containing all managedfiles This approach has shown to be scalable to the current level of O(10M) replicas and hasseveral advantages A single central catalogue simplifies the operations required to obtain replicainformation and reduces the number of components required to be operational The central-ized architecture employed by DIRAC does have a single major drawback single point of failure

Figure 2 Architecture for Ensuring Availability of Replica Information

To provide redundancy in the availability of replica information DIRAC uses distributedread-only catalogue mirrors shown in Figure 2 The central LFC instance in replicated toLHCbrsquos Tier1s using Oracle Streaming Technology [7] Producers of replica information mustcontact the readwrite master instance while queries can use any of the read-only mirrorcatalogues providing the additional benefit of reducing the load on the master In the eventthe master instance fails registration Requests are persisted in one of several RequestDBs usingthe mechanism presented in Section 2 These Requests are retried until success and provideadditional redundancy for the master instance

4 Ensuring Data Management Integrity[8]Ensuring the availability of replica information as shown in Section 3 is pointless if the meta-data stored is incorrect Similarly storage resources are wasted if physical files present on theseresources are orphans with no information in the replica catalogue The consistency of DIRACsmetadata catalogues and Storage Elements is vital in the provision of reliable data managementEnsuring this consistency is performed with an integrity checking suite This suite (shown inFigure 3) consists of a series of agents that check the mutual consistency of DIRACrsquos three mainData Management resources Storage Elements (SE) LFC LHCbrsquos Bookkeeping and prove-nance DB These agents report any inconsistencies found to a central repository (IntegrityDB)

41 Bookkeeping vs LFCLHCbrsquos Bookkeeping DB contains provenance information regarding all LHCbrsquos jobs and filesFiles which no longer physically exist are marked accordingly in the DB such that they are no

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

3

Figure 3 Data Integrity Suite Architecture

longer lsquovisiblersquo The Bookkeeping also provides an interface for users to query for files withparticular properties and is used extensively by physicists for selecting events for analysis Toensure users are given files that exist the consistency of the Bookkeeping must be maintainedwith the LFC An agent is deployed (BK-LFC) which verifies the lsquovisiblersquo files in the Bookkeepingexist in the LFC

42 LFC vs SEThe agent described in the previous section ensures users only receive files from the Bookkeepingthat physically exist The assumption made in the design of this agent is the consistency of theLFC contents This itself must be insured by checking the replicas present in the LFC existon the storage resources An agent is deployed (LFC-SE) which loops over the LFC namespaceperforming this check on the replicas found

43 SE vs LFCAt Petabyte scale Grid computing the pressure to efficiently use storage resources is paramountA small percentage waste has non-negligible financial implications Orphan physical files onstorage resources without registered replicas can never be accessed by users and are one sourceof waste To combat this an agent is deployed (SE-LFC) to loop over the contents of the SEnamespace and verify the physical files found are registered in the LFC

44 Resolving Data Integrity ProblemsThe agents described in the previous sections report problematic files to the IntegrityDB wherethey are collated A further agent is responsible for determining the reason for the inconsistencyand where possible automatically resolving it This resolution can take the form of re-replicationre-registration physical removal or catalogue removal of files Some pathologies may not beresolved automatically and remain in the DB for manual intervention

45 Determining Storage UsageAs mentioned in Section 43 the efficient use of storage resources is important Likewise itis important to know how storage resources are being used An agent is deployed (StorageUsage) which loops over the namespace of the LFC and generates a high granularity pictureof storage usage based on the registered replicas and their sizes This information is stored inthe IntegrityDB indexed by LFC directory such that different views of this information can begenerated

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

4

46 Issues with ScalabilityEnsuring the ultimate consistency and integrity in the Petabyte era of Grid computing is animpossible task DIRACrsquos current approach assumes the information provided by the underlyingresources is an accurate representation of that system The limitation of this approach is thisassumption For example the problem of ensuring the consistency of the catalogue with theunderlying storage resource is reflected within the SEs themselves The contents of the SEnamespace like the contents of replica catalogues (and Grid resources in general) are not 100dependable To a-priori solve data integrity issues would require continual data access of allfiles and until disk computing and cooling power is infinite this is not possible The futureapproach within DIRAC will include detecting integrity issues as they arise and resolving themautomatically post-priori This approach is scalable and can be extended as new pathologies areobserved

5 Ensuring Data AccessThe LHCb Computing Model [2] states that re-processing activity will be performed four times ayear During this exercise access to files written to the SEs possibly months before is requiredThese files may have been migrated to tape and cleared from disk caches To obtain access tothese files on disk cache for data transfer and processing DIRAC has implemented a stagingservice Without this service both network and computing resources may be wasted while theprocesses attempting to access the files lsquohangrsquo until the files become available

The DIRAC Stager shown in Figure 4 overlays the SEs and receives staging requests fromDIRAC WMS and DMS components These staging requests are stored in a central DB wherethey are retrieved by an agent that issues pre-staging requests to the remote SEs This agentmonitors the issued requests until the files are available on disk When deployed against StorageResource Manager 22 [9][10] compliant SEs files can be lsquopinnedrsquo to ensure they are not removedbefore access is attempted The availability of the files is then reported back to the system thatmade the original request

Figure 4 Architecture for DIRAC Stager System

51 Workload Management Usage [11]In the context of Workload Management requests are submitted to the DIRAC Stager at thepoint of data optimization At this point the possible sites at which a job may run is determinedusing replica information obtained from the LFC If the files to be processed are present on tapestorage the request to stage the files is passed to the Stager Once these files are staged theStager Agent reports back to the WMS and the job waiting for these files is submitted to theGrid for execution

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

5

52 Data Management UsageThe DMS usage of the Stager is similar to that of the WMS The lsquoReplication Optimiserrsquo ob-tains replication requests and assigns them to channels for execution In the event the files areregistered on tape storage the requests are passed to the Stager Once the files are staged theStager Agent reports back to the TransferDB where the files are made available for replication

By combining the needs of the WMS and DMS into a single centralized service it is possibleto coordinate the usage of disk cache for all of LHCbrsquos activities This will be discussed furtherin Section 7

6 ConclusionsThis paper describes several mechanisms to increase the reliability of DIRAC activities whileusing Grid services These examples provide insight into the problems that must be consideredwhen doing large scale Grid Data Management and the solutions so far adopted within DIRACThe solutions adopted include ensuring the persistency of data and requests to manipulate datato ensure the ability for consumers and producers of replica information to contact cataloguesand finally for consumers of files to access these on disk without wasting resources

7 Future WorkThe issues of scalability mentioned with respect to Data Integrity in Section 46 have a solutionin post-priori resolution The key to this development is making every DIRAC component asensor reporting to a centralized repository The DIRAC Logging Service is currently beingdeployed to meet this requirement [12]

The DIRAC Stager System discussed in Section 5 has been extensively used in LHCbcomputing exercises during 2007 [13] The current system is still vulnerable to files being removedfrom disk cache before the replicationjob attempts to access the file This possibility existsbecause of the time between the WMSDMS components being informed of the data availabilityand their ability to use the files The current system could be extended by maintaining a listof issued pins and file sizes which coupled with knowledge of the size of the disk cache wouldprevent stage requests being issued when the disk cache is already full

References[1] A Tsaregorodtsev et al DIRAC A community grid solution CHEP07 (2007)[2] LHCb Collaboration LHCb Computing TDR Technical Report CERNLHCC-05-119 CERN (2005)[3] G McCance et al Building the WLCG file transfer service CHEP07 (2007)[4] S Lemaitre et al Recent Developments in LFC CHEP07 (2007)[5] M Branco et al Managing ATLAS data on a petabyte-scale with DQ2 CHEP07 (2007)[6] L Tuura et al Scaling CMS data transfer system for LHC start-up CHEP07 (2007)[7] B Martelli et al LHCb experience with LFC database replication CHEP07 (2007)[8] A C Smith M Bargiotti DIRAC Data Management consistency integrity and coherence of data CHEP07

(2007)[9] F Donno et al Storage Resource Manager version 22 design implementation and testing experience

CHEP07 (2007)[10] Arie Shoshani et al Storage Resource Managers Recent International Experience on Requirements and

Multiple Co-Operating Implementations 24th IEEE Conference on Mass Storage Systems and Technologies(2007)

[11] S Paterson et al DIRAC Optimized Workload Management CHEP07 (2007)[12] R Graciani et al DIRAC Framework for Distributed Computing CHEP07 (2007)[13] R Nandakumar et al The LHCb Computing Data Challenge DC06 CHEP07 (2007)

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

6

The LHCb Computing Model [2] states that simulation and processing jobs upload their out-put data to the associated Tier1 SE In this Model Tier2 sites provide computing resources forsimulation jobs where the output is uploaded to the associated Tier1 SE Processing activitiesare conducted on the Tier1 sites where input data is read from the local SE and the outputdata written to the local SE The upload of output data to the location of the input data isimportant when resultant jobs require access to ancestor files Given these requirements thesimple solution to ensuring data upload described above is inadequate

Figure 1 Architecture for Ensuring Data and Request Persistency

Within DIRAC this problem is solved using a lsquofailoverrsquo mechanism In the event the desiredoutput SE is unavailable DIRAC attempts to upload data to available lsquofailoverrsquo SEs until successis achieved like the simple solution described above The lsquofailoverrsquo SEs are configured to bedisk storage elements such that subsequent availability of data is assured Once the job hassuccessfully uploaded to a lsquofailoverrsquo SE it places a Data Management request in the centralizedTransferDB to move the file to the desired destination(s) The TransferDB contains all LHCbrsquostransfer requests which are scheduled based on network availability and replicated using gLitersquosFile Transfer Service [3] In the event the TransferDB is unavailable the job attempts to placethe request on one of the distributed RequestDBs present on the VO boxes at LHCbrsquos Tier1sThese redundant recipients of the requests ensure persistency of the request The requests arethen forwarded to the TransferDB once it becomes available This architecture is shown inFigure 1

This architecture provides several advantages The use of lsquofailoverrsquo disk SEs ensures that nooutput data is lost In addition multiple VO boxes ensure the persistency of the request tomove output data to its desired final destination Finally the centralized database of replicationrequests allows aggregation of requests into bulk transfers managed by FTS with access todedicated high bandwidth network connectivity

3 Ensuring LFC AvailabilityReliability and availability of replica information is fundamental to the functioning of DIRACWorkload and Data Management Systems LHCbrsquos choice of replica catalogue is the LCG FileCatalog (LFC) [4] and is used to map globally unique Logical File Names (LFN) to PhysicalFile Names (PFN) Within DIRAC there are many consumers and producers of the replica in-formation The consumers include central WMS and DMS components performing schedulingand optimization tasks as well as distributed job agents requiring access to replica informationfor data processing activity The generators of replica information are the DMS componentsperforming data replication and the distributed job agents producing output data In all these

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

2

cases the availability of the LFC service is crucial

Other Grid Data Management systems [5][6] due to the volume of data being managed havechosen an architecture with distributed lsquosite-specificrsquo catalogues serving local replica informa-tion In addition central catalogues are required to map lsquodatasetsrsquo to the site at which they canbe found This architecture reduces the volume of information stored in a single catalogue andas a result the load on each individual catalogue But the distributed architecture also requiresconsistency is maintained between central and distributed catalogues In comparison DIRACmanages a smaller data volume and have a single centralized catalogue containing all managedfiles This approach has shown to be scalable to the current level of O(10M) replicas and hasseveral advantages A single central catalogue simplifies the operations required to obtain replicainformation and reduces the number of components required to be operational The central-ized architecture employed by DIRAC does have a single major drawback single point of failure

Figure 2 Architecture for Ensuring Availability of Replica Information

To provide redundancy in the availability of replica information DIRAC uses distributedread-only catalogue mirrors shown in Figure 2 The central LFC instance in replicated toLHCbrsquos Tier1s using Oracle Streaming Technology [7] Producers of replica information mustcontact the readwrite master instance while queries can use any of the read-only mirrorcatalogues providing the additional benefit of reducing the load on the master In the eventthe master instance fails registration Requests are persisted in one of several RequestDBs usingthe mechanism presented in Section 2 These Requests are retried until success and provideadditional redundancy for the master instance

4 Ensuring Data Management Integrity[8]Ensuring the availability of replica information as shown in Section 3 is pointless if the meta-data stored is incorrect Similarly storage resources are wasted if physical files present on theseresources are orphans with no information in the replica catalogue The consistency of DIRACsmetadata catalogues and Storage Elements is vital in the provision of reliable data managementEnsuring this consistency is performed with an integrity checking suite This suite (shown inFigure 3) consists of a series of agents that check the mutual consistency of DIRACrsquos three mainData Management resources Storage Elements (SE) LFC LHCbrsquos Bookkeeping and prove-nance DB These agents report any inconsistencies found to a central repository (IntegrityDB)

41 Bookkeeping vs LFCLHCbrsquos Bookkeeping DB contains provenance information regarding all LHCbrsquos jobs and filesFiles which no longer physically exist are marked accordingly in the DB such that they are no

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

3

Figure 3 Data Integrity Suite Architecture

longer lsquovisiblersquo The Bookkeeping also provides an interface for users to query for files withparticular properties and is used extensively by physicists for selecting events for analysis Toensure users are given files that exist the consistency of the Bookkeeping must be maintainedwith the LFC An agent is deployed (BK-LFC) which verifies the lsquovisiblersquo files in the Bookkeepingexist in the LFC

42 LFC vs SEThe agent described in the previous section ensures users only receive files from the Bookkeepingthat physically exist The assumption made in the design of this agent is the consistency of theLFC contents This itself must be insured by checking the replicas present in the LFC existon the storage resources An agent is deployed (LFC-SE) which loops over the LFC namespaceperforming this check on the replicas found

43 SE vs LFCAt Petabyte scale Grid computing the pressure to efficiently use storage resources is paramountA small percentage waste has non-negligible financial implications Orphan physical files onstorage resources without registered replicas can never be accessed by users and are one sourceof waste To combat this an agent is deployed (SE-LFC) to loop over the contents of the SEnamespace and verify the physical files found are registered in the LFC

44 Resolving Data Integrity ProblemsThe agents described in the previous sections report problematic files to the IntegrityDB wherethey are collated A further agent is responsible for determining the reason for the inconsistencyand where possible automatically resolving it This resolution can take the form of re-replicationre-registration physical removal or catalogue removal of files Some pathologies may not beresolved automatically and remain in the DB for manual intervention

45 Determining Storage UsageAs mentioned in Section 43 the efficient use of storage resources is important Likewise itis important to know how storage resources are being used An agent is deployed (StorageUsage) which loops over the namespace of the LFC and generates a high granularity pictureof storage usage based on the registered replicas and their sizes This information is stored inthe IntegrityDB indexed by LFC directory such that different views of this information can begenerated

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

4

46 Issues with ScalabilityEnsuring the ultimate consistency and integrity in the Petabyte era of Grid computing is animpossible task DIRACrsquos current approach assumes the information provided by the underlyingresources is an accurate representation of that system The limitation of this approach is thisassumption For example the problem of ensuring the consistency of the catalogue with theunderlying storage resource is reflected within the SEs themselves The contents of the SEnamespace like the contents of replica catalogues (and Grid resources in general) are not 100dependable To a-priori solve data integrity issues would require continual data access of allfiles and until disk computing and cooling power is infinite this is not possible The futureapproach within DIRAC will include detecting integrity issues as they arise and resolving themautomatically post-priori This approach is scalable and can be extended as new pathologies areobserved

5 Ensuring Data AccessThe LHCb Computing Model [2] states that re-processing activity will be performed four times ayear During this exercise access to files written to the SEs possibly months before is requiredThese files may have been migrated to tape and cleared from disk caches To obtain access tothese files on disk cache for data transfer and processing DIRAC has implemented a stagingservice Without this service both network and computing resources may be wasted while theprocesses attempting to access the files lsquohangrsquo until the files become available

The DIRAC Stager shown in Figure 4 overlays the SEs and receives staging requests fromDIRAC WMS and DMS components These staging requests are stored in a central DB wherethey are retrieved by an agent that issues pre-staging requests to the remote SEs This agentmonitors the issued requests until the files are available on disk When deployed against StorageResource Manager 22 [9][10] compliant SEs files can be lsquopinnedrsquo to ensure they are not removedbefore access is attempted The availability of the files is then reported back to the system thatmade the original request

Figure 4 Architecture for DIRAC Stager System

51 Workload Management Usage [11]In the context of Workload Management requests are submitted to the DIRAC Stager at thepoint of data optimization At this point the possible sites at which a job may run is determinedusing replica information obtained from the LFC If the files to be processed are present on tapestorage the request to stage the files is passed to the Stager Once these files are staged theStager Agent reports back to the WMS and the job waiting for these files is submitted to theGrid for execution

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

5

52 Data Management UsageThe DMS usage of the Stager is similar to that of the WMS The lsquoReplication Optimiserrsquo ob-tains replication requests and assigns them to channels for execution In the event the files areregistered on tape storage the requests are passed to the Stager Once the files are staged theStager Agent reports back to the TransferDB where the files are made available for replication

By combining the needs of the WMS and DMS into a single centralized service it is possibleto coordinate the usage of disk cache for all of LHCbrsquos activities This will be discussed furtherin Section 7

6 ConclusionsThis paper describes several mechanisms to increase the reliability of DIRAC activities whileusing Grid services These examples provide insight into the problems that must be consideredwhen doing large scale Grid Data Management and the solutions so far adopted within DIRACThe solutions adopted include ensuring the persistency of data and requests to manipulate datato ensure the ability for consumers and producers of replica information to contact cataloguesand finally for consumers of files to access these on disk without wasting resources

7 Future WorkThe issues of scalability mentioned with respect to Data Integrity in Section 46 have a solutionin post-priori resolution The key to this development is making every DIRAC component asensor reporting to a centralized repository The DIRAC Logging Service is currently beingdeployed to meet this requirement [12]

The DIRAC Stager System discussed in Section 5 has been extensively used in LHCbcomputing exercises during 2007 [13] The current system is still vulnerable to files being removedfrom disk cache before the replicationjob attempts to access the file This possibility existsbecause of the time between the WMSDMS components being informed of the data availabilityand their ability to use the files The current system could be extended by maintaining a listof issued pins and file sizes which coupled with knowledge of the size of the disk cache wouldprevent stage requests being issued when the disk cache is already full

References[1] A Tsaregorodtsev et al DIRAC A community grid solution CHEP07 (2007)[2] LHCb Collaboration LHCb Computing TDR Technical Report CERNLHCC-05-119 CERN (2005)[3] G McCance et al Building the WLCG file transfer service CHEP07 (2007)[4] S Lemaitre et al Recent Developments in LFC CHEP07 (2007)[5] M Branco et al Managing ATLAS data on a petabyte-scale with DQ2 CHEP07 (2007)[6] L Tuura et al Scaling CMS data transfer system for LHC start-up CHEP07 (2007)[7] B Martelli et al LHCb experience with LFC database replication CHEP07 (2007)[8] A C Smith M Bargiotti DIRAC Data Management consistency integrity and coherence of data CHEP07

(2007)[9] F Donno et al Storage Resource Manager version 22 design implementation and testing experience

CHEP07 (2007)[10] Arie Shoshani et al Storage Resource Managers Recent International Experience on Requirements and

Multiple Co-Operating Implementations 24th IEEE Conference on Mass Storage Systems and Technologies(2007)

[11] S Paterson et al DIRAC Optimized Workload Management CHEP07 (2007)[12] R Graciani et al DIRAC Framework for Distributed Computing CHEP07 (2007)[13] R Nandakumar et al The LHCb Computing Data Challenge DC06 CHEP07 (2007)

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

6

cases the availability of the LFC service is crucial

Other Grid Data Management systems [5][6] due to the volume of data being managed havechosen an architecture with distributed lsquosite-specificrsquo catalogues serving local replica informa-tion In addition central catalogues are required to map lsquodatasetsrsquo to the site at which they canbe found This architecture reduces the volume of information stored in a single catalogue andas a result the load on each individual catalogue But the distributed architecture also requiresconsistency is maintained between central and distributed catalogues In comparison DIRACmanages a smaller data volume and have a single centralized catalogue containing all managedfiles This approach has shown to be scalable to the current level of O(10M) replicas and hasseveral advantages A single central catalogue simplifies the operations required to obtain replicainformation and reduces the number of components required to be operational The central-ized architecture employed by DIRAC does have a single major drawback single point of failure

Figure 2 Architecture for Ensuring Availability of Replica Information

To provide redundancy in the availability of replica information DIRAC uses distributedread-only catalogue mirrors shown in Figure 2 The central LFC instance in replicated toLHCbrsquos Tier1s using Oracle Streaming Technology [7] Producers of replica information mustcontact the readwrite master instance while queries can use any of the read-only mirrorcatalogues providing the additional benefit of reducing the load on the master In the eventthe master instance fails registration Requests are persisted in one of several RequestDBs usingthe mechanism presented in Section 2 These Requests are retried until success and provideadditional redundancy for the master instance

4 Ensuring Data Management Integrity[8]Ensuring the availability of replica information as shown in Section 3 is pointless if the meta-data stored is incorrect Similarly storage resources are wasted if physical files present on theseresources are orphans with no information in the replica catalogue The consistency of DIRACsmetadata catalogues and Storage Elements is vital in the provision of reliable data managementEnsuring this consistency is performed with an integrity checking suite This suite (shown inFigure 3) consists of a series of agents that check the mutual consistency of DIRACrsquos three mainData Management resources Storage Elements (SE) LFC LHCbrsquos Bookkeeping and prove-nance DB These agents report any inconsistencies found to a central repository (IntegrityDB)

41 Bookkeeping vs LFCLHCbrsquos Bookkeeping DB contains provenance information regarding all LHCbrsquos jobs and filesFiles which no longer physically exist are marked accordingly in the DB such that they are no

International Conference on Computing in High Energy and Nuclear Physics (CHEPrsquo07) IOP PublishingJournal of Physics Conference Series 119 (2008) 062045 doi1010881742-65961196062045

3

Figure 3 Data Integrity Suite Architecture

longer lsquovisiblersquo The Bookkeeping also provides an interface for users to query for files withparticular properties and is used extensively by physicists for selecting events for analysis Toensure users are given files that exist the consistency of the Bookkeeping must be maintainedwith the LFC An agent is deployed (BK-LFC) which verifies the lsquovisiblersquo files in the Bookkeepingexist in the LFC

42 LFC vs SEThe agent described in the previous section ensures users only receive files from the Bookkeepingthat physically exist The assumption made in the design of this agent is the consistency of theLFC contents This itself must be insured by checking the replicas present in the LFC existon the storage resources An agent is deployed (LFC-SE) which loops over the LFC namespaceperforming this check on the replicas found

43 SE vs LFCAt Petabyte scale Grid computing the pressure to efficiently use storage resources is paramountA small percentage waste has non-negligible financial implications Orphan physical files onstorage resources without registered replicas can never be accessed by users and are one sourceof waste To combat this an agent is deployed (SE-LFC) to loop over the contents of the SEnamespace and verify the physical files found are registered in the LFC

4.4 Resolving Data Integrity Problems
The agents described in the previous sections report problematic files to the IntegrityDB, where they are collated. A further agent is responsible for determining the reason for each inconsistency and, where possible, automatically resolving it. This resolution can take the form of re-replication, re-registration, physical removal or catalogue removal of files. Some pathologies may not be resolved automatically and remain in the DB for manual intervention.
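As an illustration of this pathology-driven resolution, the sketch below uses invented pathology names and action labels; the actual set of pathologies and corrective actions is defined operationally within DIRAC.

# Map each known pathology to a corrective action; unknown cases are left
# for manual intervention. Names here are illustrative only.
ACTIONS = {
    "BKFileMissingFromLFC": "re-register",         # catalogue entry lost
    "LFNMissingFromSE":     "re-replicate",        # physical copy lost
    "PFNNotRegistered":     "register-or-remove",  # orphan found on storage
}

def resolve_integrity_problems(integrity_db):
    for problem in integrity_db.pending():
        action = ACTIONS.get(problem["prognosis"])
        if action is None:
            integrity_db.mark_manual(problem)      # unknown pathology
        else:
            integrity_db.schedule(problem, action)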

4.5 Determining Storage Usage
As mentioned in Section 4.3, the efficient use of storage resources is important. Likewise, it is important to know how storage resources are being used. An agent is deployed (Storage Usage) which loops over the namespace of the LFC and generates a fine-grained picture of storage usage based on the registered replicas and their sizes. This information is stored in the IntegrityDB, indexed by LFC directory, such that different views of the information can be generated.
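A sketch of this accumulation is shown below, assuming a hypothetical lfc.walk_namespace() iterator yielding (LFN, size, replicas) tuples; the real agent obtains the same information from the catalogue and stores the per-directory totals in the IntegrityDB.

import os
from collections import defaultdict

def storage_usage_by_directory(lfc):
    # Accumulate replica counts and occupied bytes per LFC directory.
    usage = defaultdict(lambda: {"replicas": 0, "size": 0})
    for lfn, size, replicas in lfc.walk_namespace():
        directory = os.path.dirname(lfn)
        usage[directory]["replicas"] += len(replicas)
        usage[directory]["size"] += size * len(replicas)  # each copy occupies `size`
    return usage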


4.6 Issues with Scalability
Ensuring ultimate consistency and integrity in the Petabyte era of Grid computing is an impossible task. DIRAC's current approach assumes that the information provided by the underlying resources is an accurate representation of those systems, and this assumption is the limitation of the approach. For example, the problem of ensuring the consistency of the catalogue with the underlying storage resource is mirrored within the SEs themselves: the contents of the SE namespace, like the contents of replica catalogues (and Grid resources in general), are not 100% dependable. To solve data integrity issues a priori would require continual access to all files, and until disk, computing and cooling power are infinite this is not possible. The future approach within DIRAC will therefore be to detect integrity issues as they arise and resolve them automatically a posteriori. This approach is scalable and can be extended as new pathologies are observed.

5 Ensuring Data Access
The LHCb Computing Model [2] states that re-processing activity will be performed four times a year. During this exercise, access is required to files written to the SEs possibly months before. These files may have been migrated to tape and cleared from the disk caches. To obtain access to these files on disk cache for data transfer and processing, DIRAC has implemented a staging service. Without this service, both network and computing resources may be wasted while the processes attempting to access the files 'hang' until the files become available.

The DIRAC Stager, shown in Figure 4, overlays the SEs and receives staging requests from DIRAC WMS and DMS components. These staging requests are stored in a central DB, from which they are retrieved by an agent that issues pre-staging requests to the remote SEs. This agent monitors the issued requests until the files are available on disk. When deployed against Storage Resource Manager 2.2 [9][10] compliant SEs, files can be 'pinned' to ensure they are not removed before access is attempted. The availability of the files is then reported back to the system that made the original request.

Figure 4. Architecture for the DIRAC Stager System
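The core of the Stager cycle can be sketched as two phases, issuing and monitoring. The stage_db, storage_element and notify objects below are hypothetical stand-ins for the central DB, an SRM-capable SE and the WMS/DMS callback respectively; the method names are assumptions, not the SRM or DIRAC APIs.

def run_stager_cycle(stage_db, notify, pin_lifetime=86400):
    # Phase 1: issue pre-stage (bring-online) requests for new entries,
    # asking the SE to pin the files so they are not evicted before use.
    for request in stage_db.new_requests():
        se = request["storage_element"]
        request["token"] = se.prestage(request["replicas"],
                                       pin_lifetime=pin_lifetime)
        stage_db.mark_submitted(request)

    # Phase 2: monitor outstanding requests until the files are on disk,
    # then report availability back to the system that asked for them.
    for request in stage_db.submitted_requests():
        se = request["storage_element"]
        online = se.files_online(request["token"])
        if online:
            notify(request["requester"], online)  # WMS or TransferDB callback
            stage_db.mark_staged(request, online)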

5.1 Workload Management Usage [11]
In the context of Workload Management, requests are submitted to the DIRAC Stager at the point of data optimization. At this point, the possible sites at which a job may run are determined using replica information obtained from the LFC. If the files to be processed are present on tape storage, the request to stage the files is passed to the Stager. Once these files are staged, the Stager Agent reports back to the WMS and the job waiting for these files is submitted to the Grid for execution.
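A sketch of this optimization step is given below, under the assumption of simplified lfc, stager and wms interfaces and the purely illustrative convention that SE names ending in "-tape" denote tape-resident copies.

def sites_with_all_files(replicas):
    # Intersection of the SEs holding each input file (SE name used as a
    # site proxy in this sketch).
    se_sets = [set(ses) for ses in replicas.values()]
    return set.intersection(*se_sets) if se_sets else set()

def optimise_job_data(job, lfc, stager, wms):
    replicas = {lfn: lfc.get_replicas(lfn) for lfn in job.input_files}
    candidate_sites = sites_with_all_files(replicas)
    tape_resident = [lfn for lfn, ses in replicas.items()
                     if ses and all(se.endswith("-tape") for se in ses)]
    if tape_resident:
        # Park the job behind a staging request; the Stager Agent reports
        # back to the WMS once the files are on disk.
        stager.request_staging(job.id, tape_resident)
        wms.set_status(job.id, "Staging")
    else:
        wms.submit(job.id, candidate_sites)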


5.2 Data Management Usage
The DMS usage of the Stager is similar to that of the WMS. The 'Replication Optimiser' obtains replication requests and assigns them to channels for execution. In the event that the files are registered on tape storage, the requests are passed to the Stager. Once the files are staged, the Stager Agent reports back to the TransferDB, where the files are made available for replication.

By combining the needs of the WMS and DMS into a single centralized service, it is possible to coordinate the usage of disk cache across all of LHCb's activities. This will be discussed further in Section 7.

6 Conclusions
This paper describes several mechanisms to increase the reliability of DIRAC activities while using Grid services. These examples provide insight into the problems that must be considered when doing large scale Grid Data Management and the solutions so far adopted within DIRAC. The solutions adopted include ensuring the persistency of data and of the requests to manipulate it, ensuring that consumers and producers of replica information can always contact a catalogue, and ensuring that consumers of files can access them on disk without wasting resources.

7 Future Work
The issues of scalability mentioned with respect to Data Integrity in Section 4.6 have a solution in a posteriori resolution. The key to this development is making every DIRAC component a sensor reporting to a centralized repository. The DIRAC Logging Service is currently being deployed to meet this requirement [12].

The DIRAC Stager System discussed in Section 5 has been used extensively in LHCb computing exercises during 2007 [13]. The current system is still vulnerable to files being removed from disk cache before the replication or job attempts to access them. This possibility exists because of the delay between the WMS/DMS components being informed of the data availability and their ability to use the files. The current system could be extended by maintaining a list of issued pins and file sizes which, coupled with knowledge of the size of the disk cache, would prevent stage requests being issued when the disk cache is already full.
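A sketch of the suggested accounting is given below; the class is an assumption about how such bookkeeping might look, not part of the current system.

class CacheAccountant:
    """Hypothetical per-SE accounting of pinned files against disk cache size."""

    def __init__(self, cache_size_bytes):
        self.cache_size = cache_size_bytes
        self.pinned = {}  # pfn -> file size in bytes

    def pinned_total(self):
        return sum(self.pinned.values())

    def can_stage(self, files):
        # files: dict of pfn -> size for a candidate stage request; refuse
        # requests that would overflow the disk cache.
        return self.pinned_total() + sum(files.values()) <= self.cache_size

    def record_pins(self, files):
        self.pinned.update(files)

    def release(self, pfn):
        self.pinned.pop(pfn, None)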

References
[1] A. Tsaregorodtsev et al., "DIRAC: A community grid solution", CHEP'07 (2007)
[2] LHCb Collaboration, "LHCb Computing TDR", Technical Report CERN/LHCC-05-119, CERN (2005)
[3] G. McCance et al., "Building the WLCG file transfer service", CHEP'07 (2007)
[4] S. Lemaitre et al., "Recent Developments in LFC", CHEP'07 (2007)
[5] M. Branco et al., "Managing ATLAS data on a petabyte-scale with DQ2", CHEP'07 (2007)
[6] L. Tuura et al., "Scaling CMS data transfer system for LHC start-up", CHEP'07 (2007)
[7] B. Martelli et al., "LHCb experience with LFC database replication", CHEP'07 (2007)
[8] A. C. Smith and M. Bargiotti, "DIRAC Data Management: consistency, integrity and coherence of data", CHEP'07 (2007)
[9] F. Donno et al., "Storage Resource Manager version 2.2: design, implementation and testing experience", CHEP'07 (2007)
[10] A. Shoshani et al., "Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations", 24th IEEE Conference on Mass Storage Systems and Technologies (2007)
[11] S. Paterson et al., "DIRAC Optimized Workload Management", CHEP'07 (2007)
[12] R. Graciani et al., "DIRAC Framework for Distributed Computing", CHEP'07 (2007)
[13] R. Nandakumar et al., "The LHCb Computing Data Challenge DC06", CHEP'07 (2007)
