Digital Preservation and Archiving at the Institute for Advanced Computer Studies
University of Maryland, College Park
Mike Smorul
Saurabh Channan
Overview
- Digital Preservation Research
  - ADAPT Project and Components
  - Pilot Persistent Archive
- Digital Library and Production Data Distribution
  - Global Land Cover Facility
- Conclusion
A Digital Approach to Preservation Technology (ADAPT)
Premise:
- Preservation of digital entities as self-describing objects
- OAIS Information Packet model as a framework
- Separation of management into three layers: bitstream, semantic, and access/discovery
- Distributed and secure infrastructure
- Automatic ingestion and replication
- Policy-driven management of preservation processes
- Global format registry
- Separate peer-to-peer deep archive
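As a rough illustration of the premise, the sketch below wraps a bitstream into a self-describing object whose metadata is split along the three management layers named above. The field names, the toy format registry and its identifiers, and the JSON serialization are all hypothetical; this is not ADAPT's actual packet schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

# Hypothetical stand-in for a global format registry; the identifiers are
# invented for this sketch and do not come from any real registry.
FORMAT_REGISTRY = {
    ".tif": {"format_id": "example:tiff", "name": "Tagged Image File Format"},
    ".txt": {"format_id": "example:text", "name": "Plain text"},
}


@dataclass
class InformationPackage:
    """Self-describing object with one metadata section per management layer."""
    bitstream: dict = field(default_factory=dict)  # bitstream layer: name, size, fixity
    semantic: dict = field(default_factory=dict)   # semantic layer: format description
    access: dict = field(default_factory=dict)     # access/discovery layer: descriptive metadata

    def to_json(self) -> str:
        # Serialize the whole package so it can be stored alongside the data.
        return json.dumps(asdict(self), sort_keys=True)


def package(name: str, data: bytes, title: str) -> InformationPackage:
    """Wrap raw bytes and minimal metadata into an InformationPackage."""
    dot = name.rfind(".")
    ext = name[dot:].lower() if dot != -1 else ""
    return InformationPackage(
        bitstream={"name": name, "size": len(data),
                   "sha256": hashlib.sha256(data).hexdigest()},
        # Copy the registry entry so later edits to a package cannot alter the registry.
        semantic=dict(FORMAT_REGISTRY.get(ext, {"format_id": "unknown"})),
        access={"title": title},
    )


pkg = package("notes.txt", b"hello", "Example note")
print(pkg.semantic["format_id"])  # example:text
```

Keeping each layer in its own section means the fixity data, the format description, and the discovery metadata can be managed and updated independently, which is the point of the three-layer separation.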
ADAPT Architecture [diagram with labels: Data Management; Metadata Management; Descriptive, Preservation, and Administrative Metadata; Deep Archive; Data Grid; Conventional Archive; PAWN; Management of Preservation Processes; CAN; Metadata; Data]
ADAPT Components
- Ingestion
  - Producer-Archive Workflow Network (PAWN)
- Management of Preservation Processes
  - Lightweight Preservation Environment (LPE)
- Access and Discovery
  - Grid Retrieval and Search Platform (GRASP)
  - EAP Collection browser
Overall Principles (PAWN)
- Distributed, secure ingestion
- OAIS-based Information Packet creation
- Use of web/grid technologies for platform independence
- Minimal client-side requirements
- Ease of integration with archive and data grid systems
- Designed to satisfy the data integrity requirements of scientific collections and digital preservation
Distributed Ingestion (PAWN)
Ingestion Workflow (PAWN)
1. Negotiate a Submission Agreement.
2. Initialize the workflow and create Submission Information Packages (SIPs).
3. Transfer the SIPs to the Data Grid site.
4. Validate the SIP transfer.
5. Organize the data into collections and transfer it into the Data Grid.
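The five workflow steps above can be sketched as a small pipeline in which transfer integrity is checked by recomputing checksums on the receiving side. This is a hypothetical illustration: the agreement fields, the choice of SHA-256, and the dictionaries standing in for the transfer channel and the data grid are assumptions, not PAWN's actual interfaces.

```python
import hashlib
from dataclasses import dataclass, field


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class SubmissionAgreement:
    """Hypothetical agreement terms negotiated before any data moves."""
    producer: str
    collection: str
    max_bytes: int


@dataclass
class SIP:
    """Submission Information Package: the files plus a checksum manifest."""
    producer: str
    files: dict[str, bytes]
    manifest: dict[str, str] = field(default_factory=dict)


def create_sip(agreement: SubmissionAgreement, producer: str,
               files: dict[str, bytes]) -> SIP:
    # Steps 1-2: check the submission against the agreement, then build the SIP.
    if producer != agreement.producer:
        raise PermissionError("producer is not covered by the agreement")
    if sum(len(b) for b in files.values()) > agreement.max_bytes:
        raise ValueError("submission exceeds the agreed size")
    return SIP(producer, dict(files), {n: sha256(b) for n, b in files.items()})


def transfer(sip: SIP, staging: dict[str, bytes]) -> None:
    # Step 3: copy the files to a (simulated) staging area at the data grid site.
    staging.update(sip.files)


def validate(sip: SIP, staging: dict[str, bytes]) -> list[str]:
    # Step 4: recompute checksums on the receiving side; return mismatched names.
    return [n for n, digest in sip.manifest.items()
            if n not in staging or sha256(staging[n]) != digest]


def organize(agreement: SubmissionAgreement, sip: SIP,
             staging: dict[str, bytes], grid: dict[str, dict[str, bytes]]) -> None:
    # Step 5: only a fully validated SIP is moved into its collection.
    failed = validate(sip, staging)
    if failed:
        raise OSError(f"transfer validation failed for: {failed}")
    collection = grid.setdefault(agreement.collection, {})
    for name in sip.manifest:
        collection[name] = staging.pop(name)


agreement = SubmissionAgreement("lab-a", "landsat", max_bytes=1_000)
sip = create_sip(agreement, "lab-a", {"scene1.tif": b"\x00\x01", "scene1.xml": b"<md/>"})
staging, grid = {}, {}
transfer(sip, staging)
organize(agreement, sip, staging, grid)
print(sorted(grid["landsat"]))  # ['scene1.tif', 'scene1.xml']
```

Because the manifest is computed before transfer and checked after it, a corrupted or missing file blocks the move into the collection instead of silently entering the archive.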
Component Overview (PAWN)
Target Collections (PAWN)
- Digital Image Collection
  - Rich metadata in various formats
- Web site crawling
  - Online and interactive content
- GLCF Landsat data
  - Spatial and temporal metadata
  - Large quantity (over 15,000 objects)
Lightweight Preservation Environment (LPE)
- The Lightweight Preservation Environment is an archival system with a modular design based on grid and web services.
- The current implementation relies mostly on Globus technologies.
- Primarily, we've focused on wrapping logic around those components.
Developed Components (LPE)
- Data Manager (DM): Organizes data and queries between the user and the other components
- Policy Manager (PM): Ensures that a minimum number of copies exist for any given file
- Transformation Manager (TM): Executes specific transformations on a named file on a given storage node and returns the results
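The Policy Manager's rule (a minimum number of copies for every file) can be reduced to a planning step like the one below. This is a sketch under stated assumptions: the replica map, the node names, and the least-loaded placement heuristic are hypothetical, and in LPE the resulting copies would be carried out through the underlying Globus-based services, which are not modeled here.

```python
from collections import Counter


def plan_replication(replicas: dict[str, set[str]], nodes: list[str],
                     min_copies: int) -> list[tuple[str, str, str]]:
    """Return (file, source_node, target_node) copy operations that bring every
    file up to at least `min_copies` replicas on distinct nodes."""
    # Current load = number of replicas each node already holds.
    load = Counter({node: 0 for node in nodes})
    for holders in replicas.values():
        load.update(holders)

    plan = []
    for name in sorted(replicas):
        holders = set(replicas[name])  # copy, so the caller's map is not modified
        if not holders:
            raise ValueError(f"{name} has no surviving copy to replicate from")
        source = min(holders)  # any existing replica can serve as the source
        while len(holders) < min_copies:
            candidates = [n for n in nodes if n not in holders]
            if not candidates:
                raise RuntimeError(f"too few nodes for {min_copies} copies of {name}")
            # Least-loaded placement, ties broken by node name (an assumed heuristic).
            target = min(candidates, key=lambda n: (load[n], n))
            plan.append((name, source, target))
            holders.add(target)
            load[target] += 1
    return plan


replicas = {"a.hdf": {"node1"}, "b.hdf": {"node1", "node2"}}
nodes = ["node1", "node2", "node3"]
print(plan_replication(replicas, nodes, min_copies=2))
# [('a.hdf', 'node1', 'node3')]
```

Separating planning from execution keeps the policy check cheap to run periodically; only the returned copy operations need to touch storage.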
Grid Retrieval and Search Platform (GRASP)
- Based on concepts from the Earth Science Data Interface (ESDI), developed at the UMIACS GLCF
- Provides a graphical interface into data grid holdings
- Access to the entire GLCF holdings through the Storage Resource Broker (SRB)
GRASP Architecture [diagram with label: I/O Abstraction Layer]
GRASP Architecture
- GRASP uses a data grid as an abstract storage repository.
- Metadata in the grid is mined from the grid itself or from external sources and published into a browsable form.
- Data grids may allow for platform-independent metadata, but may not be optimal for access.
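The storage abstraction and metadata publishing described above can be sketched as an interface plus one function that groups holdings into a browsable index. In GRASP the backend would be the SRB; here a dictionary stands in for it, and the class names, function name, and paths are hypothetical.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Optional


class StorageBackend(ABC):
    """Hypothetical I/O abstraction layer: browsing code depends only on this."""

    @abstractmethod
    def list_objects(self) -> list[str]: ...

    @abstractmethod
    def get_metadata(self, path: str) -> dict: ...


class InMemoryGrid(StorageBackend):
    """Toy stand-in for a data grid collection (an SRB client would go here)."""

    def __init__(self, objects: dict[str, dict]):
        self._objects = objects

    def list_objects(self) -> list[str]:
        return sorted(self._objects)

    def get_metadata(self, path: str) -> dict:
        return dict(self._objects[path])


def publish_browse_index(store: StorageBackend, facet: str,
                         external: Optional[dict[str, dict]] = None) -> dict[str, list[str]]:
    """Group object paths by one metadata attribute. Attributes mined from the
    grid take precedence; an external source fills in what the grid lacks."""
    external = external or {}
    index: dict[str, list[str]] = defaultdict(list)
    for path in store.list_objects():
        metadata = {**external.get(path, {}), **store.get_metadata(path)}
        index[metadata.get(facet, "unknown")].append(path)
    return dict(index)


grid = InMemoryGrid({
    "/glcf/landsat/scene_a.tif": {"sensor": "Landsat"},
    "/glcf/landsat/scene_b.tif": {},
    "/glcf/modis/tile_c.hdf": {"sensor": "MODIS"},
})
external = {"/glcf/landsat/scene_b.tif": {"sensor": "Landsat"}}
print(publish_browse_index(grid, "sensor", external))
```

Since the index is built only through `StorageBackend`, swapping the grid for another repository changes one class, not the browsing code; the published index can also be cached, which addresses the concern that a data grid may not be optimal for interactive access.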
GRASP Screenshot
Global Land Cover Facility
Mission: The GLCF mission is to encourage the use of remotely sensed imagery, derived products and applications within a broad range of science communities in a manner that improves comprehension of the nature and causes of land cover change and its impact on the Earth.
Goal: The GLCF goal is to provide free access to an integrated collection of critical land cover and Earth science data through systems that are designed to maximize user outreach and that promote development of novel tools for ordering, visualizing and manipulating spatial data.
Data Collections
- The majority of the holdings are Landsat and MODIS data.
Data Distribution
Data at the GLCF
- Approximately 5.1 TB compressed
- Approximately 13 TB uncompressed
Anticipated Production Rate
- Triple or quadruple the current data holdings within the next two years
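The projection above is simple arithmetic on the stated figures. The sketch assumes that the compression ratio of new data matches that of the current holdings, which the slides do not state.

```python
# Figures from the slide, in TB (both are approximate).
COMPRESSED_TB = 5.1
UNCOMPRESSED_TB = 13.0

# Implied compression ratio of the current holdings (about 2.5:1).
ratio = UNCOMPRESSED_TB / COMPRESSED_TB


def project(factor: float) -> tuple[float, float]:
    """Holdings after growing by `factor`, assuming the compression ratio holds."""
    return COMPRESSED_TB * factor, UNCOMPRESSED_TB * factor


print(f"compression ratio: {ratio:.2f}")
for factor in (3, 4):  # "triple or quadruple" within two years
    compressed, uncompressed = project(factor)
    print(f"{factor}x: {compressed:.1f} TB compressed, {uncompressed:.0f} TB uncompressed")
```

Under that assumption, tripling or quadrupling means roughly 15.3 to 20.4 TB compressed, or about 39 to 52 TB uncompressed.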
Chart: Data Traffic (Megabytes per Month, Aug-02 to Jan-05)
Month       Hits  Percent Hits     Megabytes  Percent Megabytes
Aug-02       204         0.00%       3069.45              0.00%
Sep-02      1200         0.00%      10815.11              0.00%
Oct-02     29236         0.20%     269121.47              0.20%
Nov-02     66575         0.40%     647432.61              0.50%
Dec-02     93848         0.60%    1047074.00              0.80%
Jan-03     50413         0.30%     565833.28              0.40%
Feb-03     87220         0.50%    1133355.52              0.90%
Mar-03    118209         0.70%    1151651.19              0.90%
Apr-03      4395         0.00%      42959.72              0.00%
May-03    108673         0.60%     974617.09              0.80%
Jun-03    453071         2.70%    2179940.59              1.70%
Jul-03    369807         2.20%    2771023.11              2.20%
Aug-03    236600         1.40%    2205639.40              1.80%
Sep-03    406597         2.40%    4140785.21              3.30%
Oct-03    459309         2.70%    3436878.95              2.70%
Nov-03    524100         3.10%    4539745.95              3.60%
Dec-03    437363         2.60%    4101629.09              3.30%
Jan-04    391131         2.30%    3249906.85              2.60%
Feb-04    745425         4.40%    4020107.61              3.20%
Mar-04    581670         3.50%    4903974.43              3.90%
Apr-04    528674         3.10%    5438624.95              4.30%
May-04    690935         4.10%    7707098.48              6.10%
Jun-04   1158653         6.90%    8626152.86              6.80%
Jul-04    729302         4.30%    7245293.24              5.70%
Aug-04   1178884         7.00%    9072188.58              7.20%
Sep-04   1045282         6.20%    8458979.08              6.70%
Oct-04   1164782         6.90%    8109359.08              6.40%
Nov-04   1171139         6.90%    6427214.83              5.10%
Dec-04   1421328         8.40%    9061943.48              7.20%
Jan-05   1623997         9.60%   10023031.44              8.00%
Feb-05    981081         5.80%    4444209.12              3.50%
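The Percent columns in the traffic table appear to be each month's share of the Aug-02 to Feb-05 totals, rounded to one decimal place. The sketch below recomputes them from the Hits and Megabytes columns and finds the peak month. Treating 1 TB as 10^6 MB is an assumption; the slides do not say whether decimal or binary units are meant.

```python
# Monthly GLCF traffic transcribed from the table above: (month, hits, megabytes).
TRAFFIC = [
    ("Aug-02", 204, 3069.45), ("Sep-02", 1200, 10815.11),
    ("Oct-02", 29236, 269121.47), ("Nov-02", 66575, 647432.61),
    ("Dec-02", 93848, 1047074.00), ("Jan-03", 50413, 565833.28),
    ("Feb-03", 87220, 1133355.52), ("Mar-03", 118209, 1151651.19),
    ("Apr-03", 4395, 42959.72), ("May-03", 108673, 974617.09),
    ("Jun-03", 453071, 2179940.59), ("Jul-03", 369807, 2771023.11),
    ("Aug-03", 236600, 2205639.40), ("Sep-03", 406597, 4140785.21),
    ("Oct-03", 459309, 3436878.95), ("Nov-03", 524100, 4539745.95),
    ("Dec-03", 437363, 4101629.09), ("Jan-04", 391131, 3249906.85),
    ("Feb-04", 745425, 4020107.61), ("Mar-04", 581670, 4903974.43),
    ("Apr-04", 528674, 5438624.95), ("May-04", 690935, 7707098.48),
    ("Jun-04", 1158653, 8626152.86), ("Jul-04", 729302, 7245293.24),
    ("Aug-04", 1178884, 9072188.58), ("Sep-04", 1045282, 8458979.08),
    ("Oct-04", 1164782, 8109359.08), ("Nov-04", 1171139, 6427214.83),
    ("Dec-04", 1421328, 9061943.48), ("Jan-05", 1623997, 10023031.44),
    ("Feb-05", 981081, 4444209.12),
]


def shares(column: int) -> dict[str, float]:
    """Each month's percentage of the column total, rounded to one decimal place."""
    total = sum(row[column] for row in TRAFFIC)
    return {row[0]: round(100 * row[column] / total, 1) for row in TRAFFIC}


hit_pct = shares(1)
mb_pct = shares(2)
peak_month, _, peak_mb = max(TRAFFIC, key=lambda row: row[2])

print(hit_pct["Jan-05"], mb_pct["Jan-05"])    # 9.6 8.0
print(peak_month, f"{peak_mb / 1e6:.1f} TB")  # Jan-05 10.0 TB
```

The recomputed shares agree with the table's 9.60% and 8.00% for Jan-05, which supports reading the Percent columns as shares of the whole reporting period rather than month-over-month changes.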
Data Discovery Applications
- ESDI Web Interface
  - User friendly
  - Search, retrieve, discover
  - Scalable: over 9 TB a month!
GLCF Architecture: Scalable and Reliable [diagram with labels: SunFire V100; Sun; ProFTPD servers]
Participation Possibilities
- PAWN ingestion component
  - Minimal geospatial metadata support is planned and can be expanded to support an NGDA endpoint
- GRASP display component
  - Solid core components; end-user interfaces need additional polishing
- GLCF data holdings
  - Additional hardware is required if additional data and access mechanisms (grid, etc.) are needed
- Other possibilities include grid infrastructure, GSI security, a format registry, etc.
Questions