GGF-17 Astro Workshop www.gridforum.org
Preservation Environment Working Group
• Officers:
  - Bruce Barkstrom (NASA Langley)
  - Reagan Moore (SDSC)
• Goals
  - Demonstrate interoperability between multiple preservation environments that are based on data grid technology
• Interactions with Astro Working Group
  - IVOA preservation working group
  - Define standards for preservation of astronomy collections
    - Sustainability
    - Governance
    - Preservation: authenticity, integrity, infrastructure independence
  - Standards
    - FITS data format
    - UCD semantics
    - Hyperatlas plates
    - IVOA access services
Intellectual Property Policy
• I acknowledge that participation in GGF8 is subject to the GGF Intellectual Property Policy.
• Intellectual Property Notices Note Well: All statements related to the activities of the GGF and addressed to the GGF are subject to all provisions of Section 17 of GFD-C.1, which grants to the GGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in GGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
  - the GGF plenary session,
  - any GGF working group or portion thereof,
  - the GFSG, or any member thereof on behalf of the GFSG,
  - the GFAC, or any member thereof on behalf of the GFAC,
  - any GGF mailing list, including any working group or research group list, or any other list functioning under GGF auspices,
  - the GFD Editor or the GWD process
• Statements made outside of a GGF meeting, mailing list or other function, that are clearly not intended to be input to a GGF activity, group or function, are not subject to these provisions.
• Excerpt from Section 17 of GFD-C.1: Where the GFSG knows of rights, or claimed rights, the GGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant GGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the GGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the GGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.
• GGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.
Preservation Components
• Authenticity - manage links to preservation metadata
  - Data grid
  - OGSA naming / OGSA DAIS / Information Dissemination / DFDL
• Integrity - assure data and metadata are not corrupted, track chain of custody, manage access controls, update state information
  - Data grid
  - OGSA naming / OGSA DAIS / Grid File Systems / OGSA Data / Grid Information Retrieval / OGSA Authorization
• Infrastructure independence - assure that no dependencies are introduced on use of a particular vendor product
  - Data grid
  - Grid File Systems / DFDL / OGSA Data Replication / Grid Storage Management / GridFTP / Transaction Management / OGSA Data / Grid Remote Procedure Call
Preservation Approach
• Standard semantics
  - IVOA - Uniform Content Descriptors
• Standard data encoding format
  - IVOA - FITS file
• Standard access services
  - IVOA - Cone Search, Simple Image Access Protocol, Simple Spectrum Access Protocol, VOEvent notification, Mosaic service
• Standard validation services
  - FITS header validation - correct coordinate information
  - HyperAtlas standard plates - re-project pixels to standard plate
• Federation across independent systems
  - Address sustainability by replicating across sustainability models
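
The FITS header validation step named above could be sketched roughly as follows. This is a minimal illustration, not the working group's validator: it assumes the header has already been parsed into a plain Python dict, and the function name `validate_coordinates`, the set of required keywords (the standard FITS WCS keywords for a two-axis image), and the accepted value ranges are all assumptions made for the example.

```python
import math

# Standard FITS WCS keywords for a two-axis image (assumed minimal set).
REQUIRED_WCS_KEYS = ("CTYPE1", "CTYPE2", "CRVAL1", "CRVAL2", "CRPIX1", "CRPIX2")


def validate_coordinates(header):
    """Return a list of problems found in the coordinate keywords of a header dict."""
    problems = [f"missing {key}" for key in REQUIRED_WCS_KEYS if key not in header]
    if problems:
        return problems

    for axis in ("1", "2"):
        ctype = str(header["CTYPE" + axis])
        value = header["CRVAL" + axis]
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            problems.append(f"CRVAL{axis} is not a finite number")
        elif ctype.startswith("RA") and not 0.0 <= value < 360.0:
            # Assumed convention: right ascension in degrees, 0 <= RA < 360.
            problems.append(f"CRVAL{axis} right ascension out of range: {value}")
        elif ctype.startswith("DEC") and not -90.0 <= value <= 90.0:
            problems.append(f"CRVAL{axis} declination out of range: {value}")
    return problems
```

An empty list means the coordinate keywords passed these checks; a real validator would likely test more of the header.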
Data Grids as Basis for Preservation
• Authenticity mechanisms
  - Link images to preservation metadata
  - Provenance information for source of image (FITS header extraction)
  - Descriptive information - UCDs
• Integrity mechanisms
  - Chain of custody - tracking where images have been stored
  - Audit trail - tracking operations performed on images
  - Persistent name spaces for users, files, metadata
  - Checksums
  - Replicas
  - Validation of checksums, synchronization of replicas
  - Federation - managing integrity across independent data grids
• Infrastructure independence
  - Ability to migrate archives onto new technology
NOAO Preservation - Irene Barg
Federated SRB data grids
• Goals:
  - Replicate images
  - Deposit into an archive
  - Maintain availability
  - Capture data daily
• Implementation
  - Federation of data grids
  - Pull environment
  - Reliable transport
• Preservation environment
  - Separate data grid
  - Reliable storage
Archive
Sustainability - Federation of Federations
Data Grid  Country      SRB version  Demo user ggfsdsc  SRB Zone name  Storage Resource Logical Name  I/O MB/sec
APAC       Australia    3.4.0-P      yes                AU             StoreDemoResc_AU               3.9
NOAO       Chile/US     3.4.1        yes                noao-ls-t3-z1  noao-ls-t3-fs
ChinaGrid  China        CGSP-II      (software)
IN2P3      France       3.4.0-P      yes                ccin2p3        LyonFS4                        [25.]
DEISA      Italy        3.4.0-P      yes                DEISA          demo-cineca
KEK        Japan        3.4.0-P      yes                KEK-CRC        rsr01-ufs                      7.4
SARA       Netherlands  3.4.0-P      yes                SARA           SaraStore
IB         New Zealand  3.4.1        yes                aucklandZone   aucklandResc                   (0.3)
ASGC       Taiwan       3.4.0-P      yes                TWGrid         SDSC-GGF_LRS1                  (0.1)
NCHC       Taiwan       3.4.0-P      yes                ecogrid        ggf-test
RAL        UK           3.4.0-P      (firewall)         tdmg2zone
IB         UK           3.4.1        yes                avonZone       avonResc
WunGrid    UK           3.3.1        (hardware)         SDSC-wun       sfs-tape
Purdue     US           3.4.0-P      yes                Purdue         uxResc1                        (2.5)
Teragrid   US           3.4.0-P      yes                SDSC-GGF       sfs-disk
U Md       US           3.4.0-P      yes                umiacs         narasrb02-unix1
GGF Data Grid Interoperability Demonstration
Preservation at Scale
• Creation of standard plates for publication in a Hyperatlas - Roy Williams (Caltech)
• Used Montage mosaic code developed at IPAC/Caltech (John Good)
  - Created mosaics by re-projecting 4,121,440 images from the 2MASS archive of 8 TB that had been replicated to the Teragrid.
  - Because of overlap, required manipulating 6,275,494 files, and 14 TB of data.
  - Processing time was over 100,000 CPU-hours on the Teragrid.
  - Each mosaic covered a 6 degree square
  - Tiled each mosaic into a 12x12 array
  - Registered plates into the Hyperatlas
• Advantages
  - Standard projection
  - Ability to composite images for improved signal to noise ratio
  - Incorporated domain knowledge in generation of the standard product
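
The figures quoted above imply a few derived quantities, worked through in the short sketch below. It assumes the 12x12 tiling divides each 6 degree mosaic evenly with no overlap between plates, which the slide does not state; the variable names are illustrative only.

```python
# Figures quoted on the slide.
images_in_archive = 4_121_440
files_manipulated = 6_275_494
archive_tb = 8
data_manipulated_tb = 14
mosaic_side_deg = 6
tiles_per_side = 12

# Assumption: plates partition the mosaic evenly, with no overlap.
tile_side_deg = mosaic_side_deg / tiles_per_side   # side of one plate in degrees
tiles_per_mosaic = tiles_per_side ** 2             # plates produced per mosaic

# Extra work caused by overlapping source images.
file_overhead = files_manipulated / images_in_archive
data_overhead = data_manipulated_tb / archive_tb

print(f"plate side: {tile_side_deg} deg, plates per mosaic: {tiles_per_mosaic}")
print(f"file overhead from overlap: {file_overhead:.2f}x")
print(f"data overhead from overlap: {data_overhead:.2f}x")
```

Under that assumption each plate is half a degree on a side, and the overlap raises the file count by roughly half and the data volume by three quarters relative to the archive itself.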
Collection-based Approach
• Authenticity - assertions made by creator of records
  - Provenance metadata
  - Descriptive metadata
  - Encapsulation of metadata with data in an Archival Information Package
  - Validation of consistency between authenticity metadata and stored data
    - Verify data file exists for each metadata record
    - Verify for each stored data file, a metadata record exists
  - Validation of provenance metadata
    - Verify consistency of defined metadata attributes across all records
    - Verify preservation consistency constraints (a record appears only once)
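
The consistency checks listed above can be sketched as simple set comparisons. This is a minimal illustration, assuming each metadata record and each stored file can be reduced to a shared identifier; the function name `check_collection` and the report keys are hypothetical and not part of any data grid API.

```python
from collections import Counter


def check_collection(metadata_ids, stored_file_ids):
    """Compare metadata record identifiers against stored data file identifiers."""
    record_counts = Counter(metadata_ids)
    records = set(record_counts)
    files = set(stored_file_ids)
    return {
        # A data file should exist for each metadata record.
        "records_without_file": sorted(records - files),
        # A metadata record should exist for each stored data file.
        "files_without_record": sorted(files - records),
        # Preservation constraint: a record appears only once.
        "duplicate_records": sorted(r for r, n in record_counts.items() if n > 1),
    }
```

A collection passes when all three lists in the report are empty.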
Collection-based Approach
• Authenticity
  - Validation of assertions about the collection
    - Characterization of assertions as management policies
    - Mapping of management policies to executable rules
    - Specification of state information on which the rules operate
    - Specification of state information to manage rule outcomes
• Implementation

  Granularity of application  Type of rule
  Enterprise                  Setting of rule parameters
  Archives                    Aperiodic rule
  Collection                  Periodic rules
  Record                      Atomic rules
Collection-based Approach
• Integrity - assertions made by archivists that both the data and metadata are uncorrupted, the chain of custody can be tracked, all actions are performed by identified persons, and the risk of data loss has been minimized
• Requires mechanisms for:
  - Checksums - checks based on file size, System5 checksum, MD5 checksum
  - Replicas, backups, versions
  - Synchronization - between replicas, between system buffers and storage, between archives and local storage
  - Federation - replication of both metadata and data, while coordinating name spaces
  - Authentication - unique identity for archivists independently of storage system
  - Authorization - access controls managed independently of storage system
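
The checksum and replica synchronization mechanisms could look roughly like the sketch below, which uses two of the checks named above (file size and MD5) through the Python standard library. The function names `fixity` and `replicas_in_sync` are hypothetical, and replicas are assumed to be reachable as local file paths, which a real data grid would replace with its own access layer.

```python
import hashlib
import os


def fixity(path, chunk_size=1 << 20):
    """Return (size_in_bytes, md5_hex) for a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def replicas_in_sync(recorded, replica_paths):
    """Map each replica path to True if its size and MD5 match the recorded pair."""
    return {path: fixity(path) == tuple(recorded) for path in replica_paths}
```

The recorded pair would be captured when a file is first registered, then compared against every replica on a schedule; any path mapped to False needs to be re-synchronized from a good copy.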
Implementations
• NARA
  - Research prototype persistent archive
  - Electronic Records Archive
  - Persistent Archive Testbed
• SDSC
  - NSDL persistent archive
  - CDL Digital Preservation Repository
• NASA Langley
  - Archive Next Generation - ANGe
• Taiwan
• Caspar / Digital Curation Centre
• Diligent
Preservation Services
• Appraisal
  - DAIS / Grid File Systems
• Accession
  - GridFTP / Grid File Systems / DAIS / Transaction Management / OGSA Data / OGSA Naming / GridFTP
• Description
  - DAIS / OGSA Naming / DFDL / Transaction Management
• Arrangement
  - Grid File Systems / DAIS
• Preservation
  - Grid File Systems / Grid Storage Management / OGSA Data Replication / GridFTP / Transaction Management / OGSA Naming
• Access
  - DAIS / DFDL / Grid File Systems / GridFTP / Transaction Management
Propose Preservation Demonstration
• Formal validation of existing archives
  - Consistency between metadata and stored data
  - Verification of name space integrity
• Formal extraction of records
  - Bulk operations to extract metadata
• Formal deposition of records into a federated data grid
  - Federation with a second data grid
  - Bulk operations to load metadata and data into remote data grid
• Formal validation of new archives
  - Consistency between metadata and stored data
  - Verification of name space integrity
• Formal export of records from the new archive and import back into the original archives, without loss of authenticity or integrity
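
The round trip proposed above amounts to an invariant: the records that come back must match the records that went out. The sketch below illustrates that check under strong simplifying assumptions. Archives are modeled as in-memory dicts keyed by logical name, and `snapshot`, `bulk_transfer`, and `round_trip_preserved` are hypothetical helpers, not calls from SRB or any other data grid software.

```python
import hashlib


def snapshot(archive):
    """Map each logical name to (sorted metadata items, MD5 of the data)."""
    return {
        name: (tuple(sorted(record["metadata"].items())),
               hashlib.md5(record["data"]).hexdigest())
        for name, record in archive.items()
    }


def bulk_transfer(source):
    """Copy every record (metadata and data) into a new archive dict."""
    return {
        name: {"metadata": dict(record["metadata"]), "data": bytes(record["data"])}
        for name, record in source.items()
    }


def round_trip_preserved(original):
    """Deposit into a second archive, import back, and compare snapshots."""
    before = snapshot(original)
    remote = bulk_transfer(original)    # deposition into the federated data grid
    returned = bulk_transfer(remote)    # export and import back into the original
    return snapshot(remote) == before and snapshot(returned) == before
```

Comparing snapshots at both the remote and the returned stage localizes a failure to the leg of the transfer where authenticity metadata or data integrity was lost.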