ICADL 2004 Tutorial
Digital Library: Overview and Framework
Edward A. Fox
Digital Library Research Laboratory, Dept. of CS
Virginia Tech, Blacksburg, VA 24061 USA
http://fox.cs.vt.edu/talks/2004/
http://fox.cs.vt.edu/cv.htm
Acknowledgements (Selected)
• Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR-0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS
Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, Qinwei Zhu, …
For More Information
• Magazine: www.dlib.org
• Books: http://fox.cs.vt.edu/DLSB.html (1994)
  – MIT Press: Arms; also Borgman, and Licklider (1965)
  – Morgan Kaufmann: Witten... (several), Lesk (2nd edition)
• Conferences
  – ECDL: www.ecdl2005.org
  – ICADL: http://icadl2004.sjtu.edu.cn
  – JCDL: www.jcdl2005.org
• Associations
  – ASIS&T DL SIG
  – IEEE TCDL: www.ieee-tcdl.org (student awards, consortium)
• NSF: www.dli2.nsf.gov
• Labs: VT: www.dlib.vt.edu, http://ei.cs.vt.edu/~dlib/
Outline
• 1. 5S Framework for DL
  – 1.1. Motivation: the problem
  – 1.2. Theory
  – 1.3. Tools/Applications
  – 1.4. Quality
  – 1.5. Conclusions, Future Work
• 2. DL Integration
• 3. DL Overview
• 4. OAI, OCKHAM, CSTC, NSDL, NDLTD
• 5. Open Source, Repositories, DigArch, ODL
Outline
• 1. 5S Framework for DL
• 1.1. Motivation: the problem
  – Hypotheses and research questions
• 1.2. Theory
  – 5S: introduction, formal definitions
  – The formal ontology
• 1.3. Tools/Applications
  – Language
  – Visualization
  – Generation
  – Logging
• 1.4. Quality
• 1.5. Conclusions, Future Work
1.1. Motivation
• Digital Libraries (DLs): what are they?
  – No definitional consensus
  – Conflicting views
  – This makes interoperability a hard problem
• DLs do not benefit from formal theories the way other CS fields do: databases (DB), information retrieval (IR), programming languages (PL), etc.
• DL construction is difficult and ad hoc, with little support for tailoring/customization
• Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
  – Lack of specific DL models, formalisms, and languages
Hypotheses
• A formal theory for DLs can be built based on 5S.
• The formalization can serve as a basis for modeling and building high-quality DLs.
Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss and high-level DL concepts?
4. How can we allow digital librarians to easily express those relationships?
5. What are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties?
6. Where in the life cycle of digital libraries can key aspects of quality be measured, and how?
1.2. Informal 5S Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
5Ss
• Streams
  – Examples: text; video; audio; image
  – Objective: describes properties of the DL content, such as encoding and language for textual material or particular forms of multimedia data
• Structures
  – Examples: collection; catalog; hypertext; document; metadata
  – Objective: specifies organizational aspects of the DL content
• Spaces
  – Examples: measure; measurable, topological, vector, and probabilistic spaces
  – Objective: defines logical and presentational views of several DL components
• Scenarios
  – Examples: searching, browsing, recommending
  – Objective: details the behavior of DL services
• Societies
  – Examples: service managers, learners, teachers, etc.
  – Objective: defines managers, responsible for running DL services; actors, who use those services; and relationships among them
Digital Objects (DOs)
• Born digital
• Digitized version of a “real” object
  – Is the DO version the same, better, or worse?
  – Decision for ETDs: structured + rendered
• Surrogate for a “real” object
  – Not covered explicitly in the metamodel for a minimal DL
  – Crucial in the metamodel for an archaeology DL
Metadata Objects (MDOs)
• MARC
• Dublin Core
• RDF
• IMS
• OAI (Open Archives Initiative)
• Crosswalks, mappings
• Ontologies
• Topic maps, concept maps
Other Key Definitions
– Collection, catalog, repository, service, archive, (minimal) DL
– See Gonçalves et al., ACM Transactions on Information Systems (TOIS), April 2004
5S and DL formal definitions and compositions (April 2004 TOIS)
Figure: dependency map of the formal definitions. Mathematical primitives (relation d.1, function d.2, sequence d.3, tuple d.4, language d.5, graph d.6, grammar d.7) underpin the five Ss: streams (d.9), structures (d.10), spaces (d.18; measurable d.12, measure d.13, probability d.14, vector d.15, topological d.16), scenarios (d.21), and societies (d.24). These compose into services (d.22), transmission (d.23), structural and descriptive metadata specifications (d.25, d.26), structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33), indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37), and the minimal digital library (d.38).
Glossary: Concepts in the Minimal DL and Representing Symbols
• Digital object: do
• Metadata specification: ms
• Set of metadata specifications: mss
• Collection: C
• Catalog: DMC
• Repository: R
• Event: e
• Scenario: Sc
• Services: Se
• Actor: Ac
• Service manager: SM
• Operation: op
• Society: Soc
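5S
Figure: the minimal-DL concepts (do, ms, mss, R, C, DMC, Ic, Se, Sc, e, op, SM, Ac) positioned among the five Ss (streams: text, audio, image, video; structures; spaces: topological, probabilistic, metric, measurable, measure, vector; scenarios; societies), divided into a static/passive part and a dynamic/active part.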
Digital Library Formal Ontology
Figure: ontology graph linking streams (text, audio, image, video), structures (digital object, metadata specifications, collection, catalog, repository, index), spaces (topological, probabilistic, metric, measurable, measure, vector), scenarios (service, scenario, event, operation), and societies (service manager, actor). Relationships include describes, stores, contains, is_version_of/cites/links_to, is_a, inherits_from/includes, extends, reuses, redefines, invokes, precedes, happens_before, executes, participates_in, recipient, runs, association, uses, employs, and produces.
Ontology: Applications
• Expand the definition of a minimal DL by characterizing typical DL services in the context of the “employs” and “produces” relationships
• Use this characterization to:
  – Reason about how DL services can be built from other DL components
  – Reason about how services can be composed with other services through extension or reuse
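The employs/produces characterization lends itself to simple forward reasoning about which services can be built from available components. The Python sketch below assumes each service is described only by the set of concepts it employs and the set it produces; the service signatures (for example, Searching employing an index and a query) are hypothetical illustrations loosely based on the composition figures, not definitions from the 5S papers.

```python
# Hypothetical service signatures: (concepts employed, concepts produced).
# Loosely modeled on the composition figures; illustrative only.
SERVICES = {
    "Acquiring": ({"universal collection"}, {"collection"}),
    "Describing": ({"collection"}, {"metadata specifications"}),
    "Cataloguing": ({"metadata specifications"}, {"catalog"}),
    "Indexing": ({"collection"}, {"index"}),
    "Searching": ({"index", "query"}, {"result set"}),
}


def buildable_services(available, services=SERVICES):
    """Forward chaining: repeatedly enable any service whose employed
    concepts are all available, then add what that service produces."""
    available = set(available)  # copy, so the caller's set is not modified
    enabled = []
    progress = True
    while progress:
        progress = False
        for name, (employs, produces) in services.items():
            if name not in enabled and employs <= available:
                enabled.append(name)
                available |= produces
                progress = True
    return enabled, available


enabled, available = buildable_services({"universal collection", "query"})
print(enabled)
```

Starting from a universal collection and a query, every listed service becomes buildable in turn; starting from only a collection, Acquiring and Searching stay unavailable because nothing supplies a universal collection or a query.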
Ontology: Taxonomy of Services
• Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
• Infrastructure Services
  – Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
  – Repository-Building
    • Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
    • Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
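Composition of key fundamental / infrastructure services
Figure: Acquiring, Submitting, Authoring, Digitizing, Describing, Cataloguing, Indexing, and Linking services, connected by edges labeled e (employs) and p (produces) to the objects they use and create: the universal collection, collection C, digital objects do_i, metadata specifications ms_kj, catalog DMC, index Ic, and hypertext.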
Composition of additional services
Figure: information satisfaction services (Searching, Browsing, Recommending, Filtering, Binding, Visualizing, Expanding query, Requesting) composed with infrastructure (add-value) services (Rating, Training, Indexing), with e (employs) and p (produces) edges to the objects involved: queries and anchors, society and actors, collection C and its digital objects, user models, ratings, classes, the index IC, transformers, and handles. Services are marked as fundamental or composite.
Approach
Figure: relates domain concepts (theory), the modeling language (meta-model), the DL model, the DL architecture, the running DL, its actors, and “real”-world objects through “abstracted from”, “used to compose”, “instance of”, “represented by”, and “interpreted as” relationships.
1.3. Tools/Applications
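Figure: tool chain. Starting from the 5S metamodel, a DL designer (with a DL expert) uses 5SGraph to build a 5SL DL model; 5SLGen combines the model with a component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …) to produce a tailored DL for practitioners, researchers, and teachers; a logging module records XML logs.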
5SL: a DL design language
• Domain-specific languages
  – Address a particular class of problems by offering specific abstractions and notations for the domain at hand
  – Advantages: domain-specific analysis, program management, visualization, testing, maintenance, modeling, and rapid prototyping
• XML-based realization of 5S
  – Interoperability
  – Use of many sub-languages (e.g., MIME types, XML Schemas, UML notations)
5SL – The Minimal DL Metamodel
Figure: 5SL meta-models built on primitives: a stream (meta-)model (text, audio, video, image); a structural (meta-)model (document, metadata, collection, catalog, index); a spatial (meta-)model (retrieval model, user interface); a scenarios (meta-)model (service, scenario, event); and a societal (meta-)model (community, actor, and the search, index, browsing, interface, and repository managers), linked by “uses”, “runs”, and “receiver” relationships.
<document name='ETD'>
  <stream_enumeration>
    <stream value='ETDText'/>
    <stream value='ETDAudio'/>
    <!-- ... -->
  </stream_enumeration>
  <structured_stream>
    %XMLSchema%
  </structured_stream>
</document>
Example of a Document declaration in the Structures Model
<Society>
  <Actor>
    <Community name='Patron'>
      <Attribute name='name' type='String'/>
      <Attribute name='ID' type='Integer'/>
    </Community>
    <Community name='Student'>
      <Service>Converting</Service>
    </Community>
    <Community name='ETDReviewer'>
      <Service>Reviewing</Service>
    </Community>
    <Community name='ETDCataloguer'>
      <Service>Cataloguing</Service>
    </Community>
  </Actor>
  <!-- ……… -->
</Society>
Example of an Actors declaration in the Societies Model
<SERVICE name='Searching'>
  <SCENARIO name='SimpleSearching'>
    <NOTE>Simple scenario for an NDLTD site searching service</NOTE>
    <EVENT>
      <SENDER>Patron</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <OPERATION name='SearchCriteria'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>InterfaceManager</SENDER>
      <RECEIVER>SearchManager</RECEIVER>
      <OPERATION name='Search'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>SearchManager</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <PARAMETER name='Results'>WtdSet</PARAMETER>
    </EVENT>
    <!-- …. -->
  </SCENARIO>
</SERVICE>
Example of a Service declaration in the Scenarios Model
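Because 5SL is XML-based, a tool can read a scenario's event flow with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the SimpleSearching scenario; the function name `scenario_events` is hypothetical and is not part of 5SL, 5SGraph, or 5SLGen.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the SimpleSearching scenario from the Scenarios model example.
SCENARIO_XML = """
<SERVICE name='Searching'>
  <SCENARIO name='SimpleSearching'>
    <EVENT>
      <SENDER>Patron</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <OPERATION name='SearchCriteria'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>InterfaceManager</SENDER>
      <RECEIVER>SearchManager</RECEIVER>
      <OPERATION name='Search'/>
      <PARAMETER>collection</PARAMETER>
      <PARAMETER>query</PARAMETER>
    </EVENT>
    <EVENT>
      <SENDER>SearchManager</SENDER>
      <RECEIVER>InterfaceManager</RECEIVER>
      <PARAMETER name='Results'>WtdSet</PARAMETER>
    </EVENT>
  </SCENARIO>
</SERVICE>
"""


def scenario_events(xml_text):
    """Return (sender, receiver, operation, parameters) for each EVENT, in document order."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.iter("EVENT"):
        op = event.find("OPERATION")
        # Events without an OPERATION element (e.g., returning results) get None.
        operation = op.get("name") if op is not None else None
        params = [(p.text or "").strip() for p in event.findall("PARAMETER")]
        events.append((event.findtext("SENDER"), event.findtext("RECEIVER"), operation, params))
    return events


for sender, receiver, operation, params in scenario_events(SCENARIO_XML):
    print(f"{sender} -> {receiver}: {operation} {params}")
```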
5SGraph: A DL Modeling Tool
• Helps users model their own instances of a digital library (DL) in the 5S language (5SL)
• A simple modeling process that enables rapid generation of digital libraries
• Features
  – 5SGraph loads and displays a metamodel in a structured toolbox.
  – The structured editor of 5SGraph provides a top-down visual building environment for the DL designer.
  – 5SGraph produces syntactically correct 5SL files according to the visual model built by the designer.
Overview of 5SGraph
Figure: the 5SGraph window, with the structured toolbox (metamodel) and the workspace (instance model).
5SGraph: Other Key Features
• Flexible and extensible architecture
• Reuse of models
  – Load, save, and change common (sub-)models
• Synchronization of views
• Enforcement of semantic constraints
5SGraph Evaluation: Usability Study
• Completion rate (%): Task 1: 100; Task 2: 100; Task 3: 100
• Mean task time (min): Task 1: 11.3; Task 2: 11.4; Task 3: 15.1
• Mean closeness to expertise: Task 1: 0.483; Task 2: 0.752; Task 3: 0.712
• Mean goal achievement (%): Task 1: 97.4; Task 2: 97.4; Task 3: 98.2
Figure: satisfaction and usefulness ratings (0–10 scale) for subjects 1–16, and pre- vs. post-understanding ratings (0–10 scale).
5SGen
• Version 1: MARIAN as the target system
  – Focused on rich structures: semantic networks
  – Behavior attached to nodes/links
• Version 2: shifted, for later work, to a componentized (ODL) approach
  – Focused on scenarios/societies
  – Structures/Spaces encapsulated within components (e.g., relational tables, indexes)
  – Only textual streams supported
5SLGen – Version 2: ODL, Services, Scenarios
Figure: the 5SLGen pipeline. The 5SL Societies model (1) is transformed via XPath/JDOM (2) into an XMI class model (3), which Xmi2Java (4) turns into Java model classes (5). The 5SL Scenarios model (6) is transformed via XPath/JDOM (7) into a statechart model (8); scenario synthesis (9) derives a deterministic FSM (10), which SMC (11) compiles into a Java finite-state-machine controller class (12). Java wrappers import components (ODLSearch, ODLBrowse, …) from the component pool, and a JSP user-interface view (13) exposes the generated DL services; the DL designer binds the pieces together.
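Steps (9) to (12) turn a scenario into a deterministic finite-state machine that drives a controller. As a rough illustration only (this is not SMC output, and the Java controller 5SLGen generates may look quite different), the Python sketch below encodes the SimpleSearching event sequence as a transition table; the state names are hypothetical.

```python
# Hypothetical states for the SimpleSearching scenario; names are illustrative.
TRANSITIONS = {
    ("Idle", "SearchCriteria"): "CriteriaReceived",  # Patron -> InterfaceManager
    ("CriteriaReceived", "Search"): "Searching",      # InterfaceManager -> SearchManager
    ("Searching", "Results"): "ResultsShown",         # SearchManager -> InterfaceManager
}


class ScenarioController:
    """Deterministic FSM: at most one next state per (state, operation) pair."""

    def __init__(self, transitions, start="Idle"):
        self.transitions = transitions
        self.state = start

    def fire(self, operation):
        key = (self.state, operation)
        if key not in self.transitions:
            raise ValueError(f"operation {operation!r} not allowed in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state


controller = ScenarioController(TRANSITIONS)
for op in ["SearchCriteria", "Search", "Results"]:
    print(op, "->", controller.fire(op))
```

Rejecting an operation that has no transition from the current state is what makes the controller enforce the scenario's event order.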
5SLGen
• Proof of concept: prototyping
  – CITIDEL
  – Viaduct
  – NDLTD Union Catalog
  – BDBComp
XML-based DL Log Standard
• Log analysis is a source of information on:
  – How patrons really use DL services
  – How systems behave while supporting user information-seeking activities
• Used to:
  – Evaluate and enhance services
  – Guide allocation of resources
• Common practice in the web setting
  – Supported by web servers and proxy caches
• DL logging can be more detailed
DL Logging Features
• Captures high-level user and system behaviors
• Organized according to the 5S framework
  – Hierarchical organization (XML-based)
  – Centered on the notion of events
• Records only events related to initial user inputs and final system outputs
• Helps to understand user interactions and the perceived value of responses
The XML Log Format
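Figure: element hierarchy of the XML log format. The root Log element contains session and transaction information (SessionId, MachineInfo, SessionInfo, RegisterInfo, Timestamp) and transaction and event statements; each Action is one of Search, Browse, Store, or SysInfoUpdate, with details such as SearchBy, QueryString, Catalog, Collection, and PresentationInfo, plus StatusInfo and Timeout.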
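A logging module writing this format only needs to emit nested elements per event. The Python sketch below uses the standard `xml.etree.ElementTree` API; the element names come from the log-format figure, but the nesting, the `search_log_entry` function, and the `"keyword"` default for SearchBy are assumptions, not the published log schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def search_log_entry(session_id, query, collection, search_by="keyword", when=None):
    """Serialize one search event as XML. Element names follow the log-format
    figure; the nesting and the 'keyword' default are assumptions."""
    when = when or datetime.now(timezone.utc)
    log = ET.Element("Log")
    ET.SubElement(log, "SessionId").text = session_id
    ET.SubElement(log, "Timestamp").text = when.isoformat()
    action = ET.SubElement(log, "Action")
    search = ET.SubElement(action, "Search")
    ET.SubElement(search, "SearchBy").text = search_by
    ET.SubElement(search, "QueryString").text = query
    ET.SubElement(search, "Collection").text = collection
    return ET.tostring(log, encoding="unicode")


entry = search_log_entry("s-042", "digital libraries", "NDLTD",
                         when=datetime(2004, 12, 13, tzinfo=timezone.utc))
print(entry)
```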
1.4. Describing Quality in Digital Libraries
• What is a “good” digital library?
  – Central concept: quality
  – Hypothesis of this work: formal theory can help define what a good digital library is through:
    • New formalizations of quality indicators for DLs within our 5S framework
    • Contextualization of these measures within the information life cycle
Quality Dimensions (DL concept: dimensions of quality)
• Digital object: accessibility, pertinence, preservability, relevance, similarity, significance, timeliness
• Metadata specification: accuracy, completeness, conformance
• Collection: completeness, impact factor
• Catalog: completeness, consistency
• Repository: completeness, consistency
• Services: composability, efficiency, effectiveness, extensibility, reusability, reliability
Digital Objects: Accessibility
• A digital object is accessible by a DL actor or patron if:
  1. it exists in the DL collections;
  2. it is retrievable from the repository; and
  3. it is not restricted from access
     – by metadata on rights
     – for the actor or the actor's society
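The three accessibility conditions translate directly into a predicate. The Python sketch below assumes a deliberately simple data model: collections as sets of handles, the repository as the set of handles it can return, and rights metadata reduced to a set of actors or societies barred from the object. All names (`DigitalObject`, `is_accessible`, the handles) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    handle: str
    # Actors or societies barred by rights metadata (a simplified, assumed encoding).
    restricted_for: set = field(default_factory=set)


def is_accessible(do, actor, society, collections, repository):
    """True if do (1) is in some DL collection, (2) is retrievable from the
    repository, and (3) is not restricted for the actor or the actor's society.

    collections: dict of collection name -> set of handles (hypothetical model)
    repository:  set of handles the repository can currently return
    """
    in_collection = any(do.handle in handles for handles in collections.values())
    retrievable = do.handle in repository
    unrestricted = not ({actor, society} & do.restricted_for)
    return in_collection and retrievable and unrestricted


etd = DigitalObject("etd-1", restricted_for={"Public"})
collections = {"ETDs": {"etd-1"}}
repository = {"etd-1"}
print(is_accessible(etd, "alice", "Student", collections, repository))  # True
print(is_accessible(etd, "bob", "Public", collections, repository))     # False
```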
Digital Objects: Pertinence
• $Inf(do_i)$ = information carried by a digital object or any of its descriptions
• $IN(ac_j)$ = information need of an actor
• $Context_{jk}$ = an amalgam of societal factors that can affect the judgment of pertinence by $ac_j$ at time $k$
  – Factors include time, place, the actor's history of interaction, the task at hand, and factors implicit in the interaction and ambient environment.
Digital Objects: Pertinence
• The pertinence of a digital object to a user $ac_j$ is an indicator function $Pertinence(do_i, ac_j): Inf(do_i) \times IN(ac_j) \times Context_{jk} \rightarrow \{0, 1\}$, defined as:
  – 1, if $Inf(do_i)$ is judged by $ac_j$ to be informative with regard to $IN(ac_j)$ in context $Context_{jk}$;
  – 0, otherwise
Digital Objects: Relevance
• $Relevance(do_i, q) = 1$ if $do_i$ is judged by an external judge to be relevant to $q$; $0$ otherwise
• Relevance estimate:
  – $Rel(do_i, q) = \frac{\vec{do_i} \cdot \vec{q}}{|\vec{do_i}|\,|\vec{q}|}$
• Objective, public, social notion
  – Established by general consensus in the field, not by the subjective, private judgment of an actor with an information need
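The relevance estimate is the cosine of the angle between the object and query vectors. The Python sketch below builds those vectors from raw term counts; using unweighted counts (no tf-idf) and whitespace tokenization are simplifying assumptions, and `cosine_relevance` is a hypothetical name.

```python
import math
from collections import Counter


def cosine_relevance(doc_terms, query_terms):
    """Rel(do_i, q) = (do_i . q) / (|do_i| |q|) over raw term-frequency vectors."""
    d, q = Counter(doc_terms), Counter(query_terms)
    dot = sum(d[t] * q[t] for t in q)  # Counter returns 0 for absent terms
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)


doc = "digital library quality digital object".split()
print(round(cosine_relevance(doc, ["digital", "library"]), 3))
```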
Metadata Specifications and Metadata Format: Completeness
• Completeness refers to the degree to which values are present in a description, according to a metadata standard. For an individual property only two situations are possible: either a value is assigned to the property in question, or it is not.
• $Completeness(ms_x) = 1 - \frac{\text{no. of missing attributes in } ms_x}{\text{total attributes of the schema to which } ms_x \text{ conforms}}$
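The completeness formula can be computed per record by checking each schema attribute for a non-empty value. The Python sketch below uses the 15 Dublin Core elements purely as an illustrative schema; treating an empty string as missing is an assumption, and the record is hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set, used here only as an
# illustrative schema.
DUBLIN_CORE = ["title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"]


def completeness(record, schema):
    """Completeness(ms_x) = 1 - missing attributes / total schema attributes.
    An attribute counts as missing if it is absent or has an empty value."""
    missing = [att for att in schema if not record.get(att)]
    return 1 - len(missing) / len(schema)


# Hypothetical ETD record: 9 elements filled, "rights" present but empty.
record = {
    "title": "An example thesis", "creator": "Doe, Jane",
    "subject": "Digital libraries", "description": "Dissertation abstract",
    "publisher": "Example University", "date": "2004", "type": "Text",
    "format": "application/pdf", "identifier": "etd-1", "rights": "",
}
print(round(completeness(record, DUBLIN_CORE), 3))  # 0.6
```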
Metadata Specifications and Metadata Format: Completeness
• OCLC NDLTD Union catalog
Figure: bar chart of metadata completeness (0–1 scale) for each contributing institution in the OCLC NDLTD Union Catalog (e.g., LSU, VTETD, MIT, UBC, NCSU, PITT, HKU, OCLC, ETSU, USF, Caltech, UCL).
Metadata Specifications and Metadata Format: Conformance
• An attribute $att_{xy}$ of a metadata specification $ms_x$ is cardinally conformant to a metadata format/standard if:
  – it appears at least once, if $att_{xy}$ is marked as mandatory;
  – its value is from the domain defined for $att_{xy}$;
  – it does not appear more than once, if it is not marked as repeatable.
• $Conformance(ms_x) = \frac{\sum_{att_{xy} \in ms_x} \text{degree of conformance of } att_{xy}}{\text{total attributes}}$
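The three conformance conditions can be checked per attribute and averaged. The Python sketch below makes two assumptions the slide leaves open: the degree of conformance of an attribute is binary (1 if all conditions hold, else 0), and the sum runs over the schema's attributes so that a missing mandatory attribute counts against the record. The rules in `schema` are hypothetical and not taken from ETD-MS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeRule:
    mandatory: bool
    repeatable: bool
    domain: Optional[frozenset] = None  # None: any value is allowed


def attribute_conformance(values, rule):
    """Degree of conformance of one attribute, taken here as binary (1 or 0)."""
    if rule.mandatory and not values:
        return 0  # mandatory attribute does not appear
    if rule.domain is not None and any(v not in rule.domain for v in values):
        return 0  # a value falls outside the attribute's domain
    if not rule.repeatable and len(values) > 1:
        return 0  # non-repeatable attribute appears more than once
    return 1


def conformance(record, schema):
    """Sum of per-attribute conformance divided by the number of schema attributes."""
    total = sum(attribute_conformance(record.get(name, []), rule)
                for name, rule in schema.items())
    return total / len(schema)


# Hypothetical rules, not taken from ETD-MS or any other published standard.
schema = {
    "title": AttributeRule(mandatory=True, repeatable=False),
    "creator": AttributeRule(mandatory=True, repeatable=True),
    "type": AttributeRule(mandatory=False, repeatable=False,
                          domain=frozenset({"Text", "Image"})),
    "date": AttributeRule(mandatory=True, repeatable=False),
}
record = {"title": ["A thesis"], "creator": ["Doe, Jane"],
          "type": ["Thesis"], "date": ["2003", "2004"]}
print(conformance(record, schema))  # 0.5
```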
Metadata Specifications and Metadata Format: Conformance
• Based on ETD-MS
Figure: bar chart of metadata conformance (0.75–1 scale) for each contributing institution (e.g., LSU, VTETD, MIT, UBC, NCSU, PITT, HKU, OCLC, ETSU, USF, Caltech, UCL).
Services: Efficiency / Effectiveness
• Effectiveness
  – Very common measures: precision, recall, F1, 10-precision, R-precision
  – Other services may have different measures (e.g., recommending)
• Efficiency
  – Let $t(e)$ be the time of an event $e$.
  – Let $ei_x$ and $ef_x$ be the initial and the final event of service $se_x$.
  – For service $se_x$, efficiency is defined as:
    • $Efficiency(se_x) = t(ef_x) - t(ei_x)$
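Several of the effectiveness measures and the efficiency definition are simple to compute once result sets and event timestamps are logged. The Python sketch below shows set-based precision, recall, F1, R-precision, and the elapsed-time efficiency; the document identifiers and timestamps are hypothetical, and "10-precision" is not implemented because its exact definition is not given here.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def r_precision(ranking, relevant):
    """Precision over the top R results, where R is the number of relevant objects."""
    r = len(relevant)
    return len(set(ranking[:r]) & set(relevant)) / r if r else 0.0


def efficiency(t_initial, t_final):
    """Efficiency(se_x) = t(ef_x) - t(ei_x): elapsed time between the service's
    initial and final events (in whatever unit the timestamps use)."""
    return t_final - t_initial


ranking = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(precision_recall_f1(ranking, relevant))
print(r_precision(ranking, relevant))
print(efficiency(10.0, 11.5))  # 1.5
```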
Services: Extensibility & Reusability
• A service Y reuses a service X if the behavior of Y incorporates the behavior of X.
• A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.
Services: Extensibility & Reusability (2)
• $Macro\text{-}Reusability(Serv) = \frac{\text{no. of reused services}}{\text{total no. of services}}$
• $Micro\text{-}Reusability(Serv) = \frac{\text{no. of lines of code of managers that implement (run) reused services}}{\text{total lines of code}}$
Services: Extensibility and Reusability
Service | Component-based | LOC for implementing service | LOC reused from component | Total LOC
Searching – back-end | Yes | – | 1650 | 1650
Search wrapping | No | 100 | – | 100
Recommending | Yes | – | 700 | 700
Recommend wrapping | No | 200 | – | 200
Annotating – back-end | Yes | 50 | 600 | 600
Annotate wrapping | No | 50 | – | 50
Union catalog | Yes | – | 680 | 680
User interface service | No | 1800 | – | 1600
Browsing | No | 1390 | – | 1390
Comparing (objects) | No | 650 | – | 650
Marking items | No | 550 | – | 550
Items of interest | No | 480 | – | 480
Recent searches/discussions | No | 230 | – | 230
Collections description | No | 250 | – | 250
User management | No | 600 | – | 600
Framework code | No | 2000 | – | 2000
Total | | 8280 | 3630 | 11910
Macro-Reusability = 4/16 = 0.25; Micro-Reusability = 3630 / 11910 ≈ 0.305
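The two reusability figures can be recomputed from the table. The Python sketch below treats each table row as one service, as the 4/16 ratio does, and takes the reused-LOC column and the reported total of 11910 lines as given; `macro_reusability` and `micro_reusability` are hypothetical helper names.

```python
# (service, component-based?, LOC reused from component), copied from the table.
ROWS = [
    ("Searching - back-end", True, 1650), ("Search wrapping", False, 0),
    ("Recommending", True, 700), ("Recommend wrapping", False, 0),
    ("Annotating - back-end", True, 600), ("Annotate wrapping", False, 0),
    ("Union catalog", True, 680), ("User interface service", False, 0),
    ("Browsing", False, 0), ("Comparing (objects)", False, 0),
    ("Marking items", False, 0), ("Items of interest", False, 0),
    ("Recent searches/discussions", False, 0), ("Collections description", False, 0),
    ("User management", False, 0), ("Framework code", False, 0),
]
TOTAL_LOC = 11910  # total lines of code as reported in the table


def macro_reusability(rows):
    """Reused (component-based) services / total number of services."""
    return sum(1 for _, component_based, _ in rows if component_based) / len(rows)


def micro_reusability(rows, total_loc):
    """Lines of code reused from components / total lines of code."""
    return sum(reused for _, _, reused in rows) / total_loc


print(macro_reusability(ROWS))                       # 0.25
print(round(micro_reusability(ROWS, TOTAL_LOC), 3))  # 0.305
```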
Quality and the Information Life Cycle
Figure: quality dimensions mapped onto the information life cycle: creation, distribution, utilization, and seeking phases; active, semi-active, and inactive states; and discard. Activities such as authoring, modifying, describing, organizing, indexing, storing, archiving, retention, mining, networking, accessing, filtering, searching, browsing, and recommending are annotated with the relevant dimensions: accuracy, completeness, conformance, significance, similarity, pertinence, relevance, timeliness, accessibility, and preservability.
Quality Model: Evaluation
• Focus groups
  – 3 librarians
  – Major points:
    • Focus on DLs, not traditional libraries
    • Some indicators may have more theoretical than practical use in some contexts
    • Liked the minimalist approach
    • Interesting and potentially useful, mainly for education and evaluation
1.5. Conclusions
• We have answered the almost 40-year-old challenge of Licklider to build a unified CS/LIS theory by:
  – Proposing and formalizing the first comprehensive formal framework for digital libraries
• Showed how to move from theory to practice by:
  – Applying the framework to the problems of …
  – Materializing these applications into languages, tools, formats, etc.
  – Explaining and evaluating these applications (usability studies, focus groups, prototyping, etc.)
Future Work
• Theory
  – Apply to formally describe other systems
  – Complete formal definitions of all services with further events
  – Load axioms into a knowledge base to automatically assess the quality of models (correctness, etc.)
• Applications/Tools
  – Language
    • Make different versions uniform
    • Extend with METS, less complex scenario and society models
    • New metamodels
      – Domain/application oriented (e.g., archaeology, education)
      – For traditional libraries
Future Work (cont’d)
• Applications/Tools
  – Visualization
    • Integration with other tools through a wizard
    • New visualizations
    • Application as an educational tool
  – Generation
    • Use of Web services
    • Incorporation of native XML repositories
    • Improvement of scenario algorithms
  – Logging
    • Promote use
    • Consider privacy issues
    • New actions
    • Deal with scalability issues
Future Work (cont’d)
• Quality
  – Development of more usage-oriented measures
    • Current measures are mostly system-oriented
    • Focus on log format and evaluation
  – Development of a Quality Toolkit (5SQual) for DL managers with the following features:
    • Mapping tool to map a local log format to the standard XML log format
    • Components to implement all measures
    • Visualization of data and measures
    • Broken into several logical pieces to be used in the different phases of the information life cycle
• Others, e.g., personalization
  – Create theories, tools, languages, and methods for personalization based on 5S
2. DL Integration
• What is “DL integration”?
  – Hide distribution
  – Hide heterogeneity
  – Enable autonomy of individual components
• Why integration?
  – Island-DLs
  – Inability to seamlessly and transparently access knowledge across DLs
• Utilize various autonomous DLs in concert
Integration: Rationale
• We can read any paper book (ignoring limitations of language, vision, …).
• Scholarship requires access, analysis, and synthesis spanning disciplines and sources.
• New theories, systems, and services build upon our past accomplishments.
• Our “Small World” and the “Internet Age” demand that we, and our computers, work together and interoperate.
Integration: Urgency, Longevity
• If we collect, capture, acquire, or produce information, will it be usable in 100 years?
• NSF Digital Archiving Program
• Library of Congress National Digital Information Infrastructure and Preservation Program
Integration: Standards
• Standards don’t exist in many areas.
• Standards that do exist create a jumble:
  – Conversion between them (without loss?)
  – Bridging gaps (Z39.50 -> OAI)
  – Managing legacy content and systems
• Standards in DLs have focused on:
  – Metadata (e.g., Dublin Core)
  – Architecture (e.g., handles, repositories)
Integration: Challenges
• The “Semantic Web” is a vision, not reality.
• How can we integrate without a theory?
• How can we interoperate without a common framework?
• How can we have a science of DLs if we lack agreement on definitions (so we can reason and discuss) and on measures of quality (so we can compare and improve)?
Hypothesis and Research Questions
• The 5S framework provides effective solutions to DL integration. Can it:
  – Formally define the DL integration problem?
  – Guide integration of domain-focused DLs?
    • How to formally model such domain-specific DLs?
    • How to integrate formally defined DL models into a union DL model?
    • How to use the union DL model to help design and implement high-quality integrated DLs?
  – Assess the integration?
Related Work
Figure: DL interoperability approaches. Intermediary-based approaches consist of mediators, wrappers, and agents and are used in two architectures (federation and union archiving); mapping-based approaches use schema mapping through hybrid and composite mappers (examples: SemInt, LSD). The two families are interrelated. A second version of the diagram adds a GA node (“trained by”) and the DL integration formalization (“based on”).
Formal Definition of DL Integration
• $DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$, where:
  – $R_i$ is a network-accessible repository
  – $DM_i$ is a set of metadata catalogs for all collections
  – $Serv_i$ is a set of services
  – $Soc_i$ is a society
• UnionRep
• UnionCat
• UnionServices
• UnionSociety
Formal Definition of DL Integration (Cont.)
• DL integration problem definition:
Given n individual libraries, integrate the n DLs to create a UnionDL.
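To make the tuple structure concrete, the Python sketch below models each $DL_i$ as its four components and forms a naive UnionDL by component-wise set union. This is an assumed simplification: each component is reduced to a set of names, and real integration also needs metadata mapping and de-duplication, which this sketch omits. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DL:
    """DL_i = (R_i, DM_i, Serv_i, Soc_i), with each part reduced to a set of names."""
    repository: frozenset  # R_i: handles of stored digital objects
    catalogs: frozenset    # DM_i: metadata catalogs
    services: frozenset    # Serv_i
    society: frozenset     # Soc_i: actor communities


def union_dl(dls):
    """Naive UnionDL: component-wise set union (no mapping or de-duplication)."""
    return DL(
        repository=frozenset().union(*(d.repository for d in dls)),
        catalogs=frozenset().union(*(d.catalogs for d in dls)),
        services=frozenset().union(*(d.services for d in dls)),
        society=frozenset().union(*(d.society for d in dls)),
    )


dl1 = DL(frozenset({"vn:1", "vn:2"}), frozenset({"VNCatalog"}),
         frozenset({"Searching"}), frozenset({"Archaeologists"}))
dl2 = DL(frozenset({"hd:7"}), frozenset({"HDCatalog"}),
         frozenset({"Browsing"}), frozenset({"General Public"}))
print(union_dl([dl1, dl2]))
```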
Architecture of a Union DL
Figure: two DLs (DL1, DL2), each with its own repository, catalog, services (e.g., searching, browsing), and society (archaeologists; general public), are integrated into a Union DL with a union repository, union catalog, union society (archaeologists and general public), and union services (harvesting, mapping, searching, browsing, clustering, visualization).
Example of Union Service: CitiViz
Integration of Domain-Focused DLs
• Union archaeological metadata catalog generation
• Modeling archaeological DLs (ArchDLs) in the 5S framework
• ArchDL integration case study: ETANA-DL
Union Catalog Integration
Figure: the catalogs of Virtual Nimrin (VN) and Halif DigMaster (HD), each in its own metadata format, pass through a wrapper and a mapping tool into the global metadata format of the union catalog of the union ArchDL.
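The wrapper and mapping-tool step amounts to renaming each local field into the global format before records enter the union catalog. The Python sketch below shows that crosswalk idea; the VN and HD field names and the global field names are hypothetical, since neither the local formats nor the ETANA-DL global format are spelled out here. Unmapped fields are returned separately so nothing is silently lost.

```python
# Hypothetical field names: the VN/HD local formats and the global format are not
# specified here, so these mappings are illustrative only.
MAPPINGS = {
    "VN": {"obj_id": "identifier", "objtype": "type", "locus_no": "locus"},
    "HD": {"ID": "identifier", "ObjectType": "type", "Locus": "locus", "Period": "period"},
}


def to_global(source, record):
    """Wrapper + mapping step: rename mapped local fields to the global format
    and return unmapped fields separately."""
    mapping = MAPPINGS[source]
    mapped = {mapping[k]: v for k, v in record.items() if k in mapping}
    unmapped = {k: v for k, v in record.items() if k not in mapping}
    mapped["source"] = source
    return mapped, unmapped


def build_union_catalog(catalogs):
    """catalogs: dict of source name -> list of local records."""
    union = []
    for source, records in catalogs.items():
        for record in records:
            mapped, _ = to_global(source, record)
            union.append(mapped)
    return union


print(build_union_catalog({"VN": [{"obj_id": "vn-1", "objtype": "bone"}],
                           "HD": [{"ID": "hd-9", "Period": "Iron II"}]}))
```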
Modeling ArchDLs in the 5S Framework
• Modeling archaeological information systems using the 5S theory to better understand the domain and design the system and the supported services
• Minimal DL
• Minimal ArchDL
A Minimal DL in the 5S Framework
Figure: the five Ss (streams, structures, spaces, scenarios, societies) combine into structured streams, hypertext, digital objects, structural and descriptive metadata specifications, metadata catalogs, collections, a repository, and indexing, browsing, and searching services, which together form a minimal DL.
A Minimal ArchDL in the 5S Framework
Figure: the minimal DL extended with archaeology-specific concepts: SpaTemOrg, StraDia, ArchObj, ArchDO, an Arch descriptive metadata specification, an Arch metadata catalog, ArchColl, ArchDColl, and ArchDR, which together form a minimal ArchDL.
Integration of Domain-Focused DLs
• Modeling archaeological DLs (ArchDLs) in the 5S framework
• Union archaeological metadata catalog generation
• ArchDL integration case study: ETANA-DL
ETANA-DL
• Archaeological DL
• Integrated DL
  – Heterogeneous data handling
• Applies and extends OAI-PMH
  – Open Archives Initiative Protocol for Metadata Harvesting
• Design considerations
  – Componentized
  – Extensible
  – Portable
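Harvesting under OAI-PMH is a sequence of HTTP requests with the verb and its arguments in the query string. The Python sketch below only builds standard OAI-PMH 2.0 request URLs (ListRecords with `metadataPrefix=oai_dc`, then follow-up requests carrying a `resumptionToken`, which must be sent as the sole argument besides the verb); it does not show ETANA-DL's own extensions. The base URL is a placeholder and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder base URL: substitute the OAI-PMH endpoint of an actual data provider.
BASE_URL = "http://example.org/oai"


def oai_request(verb, **args):
    """Build an OAI-PMH request URL; protocol arguments go in the query string."""
    return BASE_URL + "?" + urlencode({"verb": verb, **args})


def list_records_requests(metadata_prefix="oai_dc", resumption_tokens=()):
    """First ListRecords request, then one request per resumptionToken returned
    by the provider (a token is sent alone, without metadataPrefix)."""
    urls = [oai_request("ListRecords", metadataPrefix=metadata_prefix)]
    urls += [oai_request("ListRecords", resumptionToken=t) for t in resumption_tokens]
    return urls


for url in list_records_requests(resumption_tokens=["abc"]):
    print(url)
```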
5SSuite
Figure: the 5SSuite (5SGraph, the mapping tool, and 5SGen) supports requirements (1), analysis (2), design (3), and implementation (4). Starting from the 5S metamodel, a DL expert and DL designer build a 5SL DL model, and 5SGen uses a component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …) to generate tailored DL services for practitioners, researchers, and teachers.
Figure: applying the 5S suite to ETANA-DL. An ArchDL expert and ArchDL designer use 5SGraph with the 5S archaeology metamodel to produce structure and scenario sub-models and descriptions of the ETANA-DL union services (harvesting, mapping, searching, browsing, …). Wrappers (Wrapper4VN, Wrapper4HD) and the mapping tool convert the VN and HD catalogs, each in its own metadata format, via XOAI into a union catalog in the ETANA-DL metadata format. 5SGen draws on the component pool (browsing, …) to generate the search and browse services, with their indexes, inverted files, browse DB, and services DB, alongside other ETANA-DL services and a web interface.
ETANA-DL Architecture
Figure: ETANA-DL architecture in users, services, and data layers: union services serve users on top of DigBase and DigKit.
ETANA-DL Architecture: DigBase and DigKit
Figure: site databases (Lahav, Nimrin, Umayri, Hisban, Megiddo, Jalul, and new sites) feed, through wrappers, the ETANA-DL union catalog, which supports user-interface services: search, browse, recommend, note, personalize, review, visualizations, and archaeology-specific services (work in progress, …).
Assessment of Integrated DL
• Union catalog quality measurement
• Union service quality measurement
• Initial example
Union Catalog Quality Measurement
• Complete
  – All the catalogs to be integrated are complete.
• Consistent
  – All the catalogs to be integrated are consistent.
  – Each descriptive metadata specification in the union catalog describes only one digital object.
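Union Catalog Quality Measurement (Cont.)
• Mapping-Completeness
  – $Mapping\text{-}Completeness = \frac{1}{n} \sum_{x=1}^{n} \frac{|MapEle(S_x)|}{|S_x|}$
  – $n$ is the total number of local schemas; $S_x$ is the $x$-th local schema, $|S_x|$ its number of elements, and $MapEle(S_x)$ its mapped elements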
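Mapping-completeness averages, over the local schemas, the fraction of each schema's elements that have been mapped. The Python sketch below assumes each local schema is given as a pair of lists (its elements, and the elements that have a mapping); the element names are hypothetical.

```python
def mapping_completeness(schemas):
    """(1/n) * sum over local schemas S_x of |MapEle(S_x)| / |S_x|.

    schemas: list of (elements, mapped) pairs, where elements are the local
    schema's elements and mapped are those with a mapping to the global format.
    """
    ratios = [len(set(mapped) & set(elements)) / len(set(elements))
              for elements, mapped in schemas]
    return sum(ratios) / len(ratios)


# Hypothetical local schemas (element names are illustrative only).
vn = (["obj_id", "objtype", "locus_no", "notes"], ["obj_id", "objtype", "locus_no"])
hd = (["ID", "ObjectType", "Locus", "Period", "Area"],
      ["ID", "ObjectType", "Locus", "Period", "Area"])
print(mapping_completeness([vn, hd]))  # 0.875
```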
Union Services Quality Measurement
• Internal quality measurement
  – Composability: reusability and extensibility
• External quality measurement
  – Searching: $coverage_q = \frac{|Retr(UnionDL)|}{\left|\bigcup_{i=1}^{n} Retr(DL_i)\right|}$
  – Browsing: $knowledge\text{-}gain_{browse} = \frac{Path(UnionDL)}{\sum_{i=1}^{n} Path(DL_i)} - 1$
Browsing: knowledge-gain_browse (example)
• ArchDL1: Site, *Sub-partition, *Container, *Artifact, *Locus, *Partition; Path(ArchDL1) = 6
• ArchDL2: Bone, *BoneName; Path(ArchDL2) = 2
• UnionArchDL: Sites (Site1 and Site2, each with *Sub-partition, *Container, *Artifact, *Locus, *Partition) and Artifacts (Bone, *BoneName)
  – Path(UnionArchDL) = (6 + 6 + 2) + 4 × 6 × 2 = 62
• $knowledge\text{-}gain_{browse} = \frac{62}{6 + 2} - 1 = 6.75$
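Both external measures are short computations. The Python sketch below assumes the readings used above: coverage divides the size of the union DL's result set by the size of the pooled results of the individual DLs, and knowledge gain divides the union DL's browsing paths by the sum of the individual DLs' paths, minus one. The result sets in the usage line are hypothetical; the path counts come from the ArchDL example.

```python
def coverage(union_results, per_dl_results):
    """coverage_q = |Retr(UnionDL)| / |union of Retr(DL_i)| for one query q."""
    pooled = set().union(*per_dl_results)
    return len(set(union_results)) / len(pooled) if pooled else 0.0


def knowledge_gain_browse(union_paths, dl_paths):
    """Path(UnionDL) / sum_i Path(DL_i) - 1."""
    return union_paths / sum(dl_paths) - 1


# Path counts from the ArchDL example.
union_paths = (6 + 6 + 2) + 4 * 6 * 2  # 62
print(knowledge_gain_browse(union_paths, [6, 2]))  # 6.75
print(coverage({"a", "b", "c"}, [{"a", "b"}, {"b", "c", "d"}]))  # 0.75
```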
3. DL Overview: Why of Global Interest?
• National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly
• Knowledge and information are essential to economic and technological growth and to education
• DL: a domain for international collaboration
  – wherein all can contribute and benefit
  – which leverages investment in networking
  – which provides useful content on the Internet & WWW
  – which will tie nations and peoples together more strongly and through deeper understanding
Libraries of the Future (J.C.R. Licklider, 1965, MIT Press)
Figure: nested levels: world, nation, state, city, community.
Synchronous Scholarly Communication
Same time, same or different place
Asynchronous, Digital Library-Mediated Scholarly Communication
Different time and/or place
Digital Libraries Shorten the Chain from
Figure: the traditional chain through editor, publisher, A&I (abstracting and indexing) services, consolidator, library, and reviewer.
DLs Shorten the Chain to
Figure: author, digital library, and reader, together with editor, reviewer, teacher, learner, and librarian.
Locating Digital Libraries in Computing and Communications Technology Space
Figure: axes for computing (flops), communications (bandwidth, connectivity), and digital content, each running from less to more; the digital libraries technology trajectory points toward intellectual access to globally distributed information.
Note: we should consider 4 dimensions: computing, communications, content, and community (people).
Digital Library Content Types
• Text documents: articles, reports, books
• Audio: speech, music
• Video
• (Aerial) photos
• Geographic information
• Models, simulations
• Software, programs
• Bio information: genome (human, animal, plant)
• Images and graphics: 2D, 3D, VR, CAT
AmericanSouth.Org – Roles, Content
Figure: division of roles among SOLINET libraries (data providers), scholars, and central and local sites: intellectual organization (controlled vocabulary, metadata extension development); collection decisions and selection criteria; central and local server maintenance; provision of context; metadata repository and metadata creation/maintenance; organizational structure and annotation tools; central and local interface design/maintenance; selection of other annotation tools; central and local indices; selection of thesauri; coordination of metadata gateway development; gateway implementation; concept mapping; and digital objects.
Content Area Description Audio
Digital
Finding Aid
MSS Other
Photo
Video
MF
Total
African-American cultural life 6 4 6 9 4 12 3 10 18 72
Agricultural crisis of late 19th century 1 1 3 1 1 4 8 19
Codification of segregation laws 1 3 2 1 1 8 16
Configuration of white supremacy 1 3 3 3 1 9 20
Cultural values and activities 3 1 5 17 4 15 1 5 20 71
Disenfranchising movements 1 2 2 1 2 1 6 15
Educational movements 6 1 1 18 6 21 3 5 27 98
Emergence of Holiness & Pentecostal Groups 1 1 1 7 10
Emergence of new musical forms 3 1 1 1 2 8
Emergence of organized groups expressing farmers' concerns 2 2 1 8 13
… … … … … … … … … … …
Total Each Format 41 14 51 161 38 133 13 79 301 831
Application Domains, with Related Institutions, Examples, Technical Challenges, and Benefit / Impact
• Publishing. Institutions: publishers, e-print archives. Examples: OAI. Challenges: quality control, openness. Benefit/impact: aggregation, organization.
• Education. Institutions: schools, colleges, universities. Examples: NSDL, NCSTRL. Challenges: knowledge management, reusability. Benefit/impact: access to data.
• Art, Culture. Institutions: museums. Examples: AMICO, PRDLA. Challenges: digitization, describing, cataloging. Benefit/impact: global understanding.
• Science. Institutions: government, academia, commerce. Examples: NVO, PDG, SwissProt, UK eScience, European Union Commission. Challenges: data models. Benefit/impact: reproducibility, faster reuse, faster advance.
• (e) Government. Institutions: government agencies (all levels). Examples: Census. Challenges: intellectual property rights, privacy, multi-national. Benefit/impact: accountability, homeland security.
• (e) Commerce, (e) Industry. Institutions: legal institutions. Examples: court cases, patents. Challenges: developing standards. Benefit/impact: standardization, economic development.
• History, Heritage. Institutions: foundations. Examples: American Memory. Challenges: content, context, interpretation. Benefit/impact: long-term view, perspective, documentation, recording, facilitating, interpretation, understanding.
• Cross-cutting. Institutions: library, archive. Examples: Web, personal collections. Challenges: multi-language, preservation, scalability, interoperability, dynamic behavior, workflow, sustainability, ontologies, distributed data, infrastructure. Benefit/impact: reduced cost, increased access, preservation, democratization, leveling, peace, competitiveness.
Reagan Moore, Ed Fox, June 2002, for NSF
As data, information, and knowledge play increasingly central roles … digital library research should focus on:
• Increasing the scope and scale of information resources and services;
• Employing context at the individual, community, and societal levels to improve performance;
• Developing algorithms and strategies for transforming data into actionable information;
• Demonstrating the integration of information spaces into everyday life; and
• Improving availability, accessibility, and, thereby, productivity.
An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions:
• Acquisition of new information resources;
• Effective access mechanisms that span media type, mode, and language;
• Facilities to leverage the utilization of humankind’s knowledge resources;
• Assured stewardship over humanity’s scholarly and cultural legacy; and
• Efficient and accountable management of systems, services, and resources.
4. OAI, OCKHAM, CSTC, NSDL, NDLTD: Open Archives Initiative
• Advocacy for interoperability
• Standard for transferring metadata among digital libraries
– Protocol for Metadata Harvesting (PMH)
• Simplicity
• Generality
• Extensibility
• Support for PMH => Open Archive (OA)
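To make the harvesting protocol concrete, here is a minimal sketch of the client side of an OAI-PMH ListRecords exchange, using only the Python standard library. It builds the request URL and extracts record identifiers, Dublin Core titles, and the resumptionToken (which pages through large result sets) from one response page. The base URL and the sample response are hypothetical; a real harvester would also fetch each page over HTTP and handle OAI error codes such as noRecordsMatch.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses and by oai_dc (Dublin Core) metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", token=None):
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    # An empty or missing resumptionToken means this was the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)

# Hypothetical one-record response page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.edu:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

A harvester loop would call `list_records_url(base, token=token)` for the next page until `parse_list_records` returns `None` as the token; keeping the protocol to a handful of such verbs is part of the simplicity the slide lists.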
OAI = Technical Umbrella for Practical Interoperability…
…that can be exploited by different communities
[Figure: communities under the umbrella: reference libraries, publishers, e-print archives, museums]
OAI – Repository Perspective
Required: protocol
[Figure: a repository of digital objects (DO), each described by one or more metadata objects (MDO)]
OAI – Black Box Perspective
[Figure: seven Open Archives (OA 1 to OA 7), each treated as a black box]
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
The World According to OAI
[Figure: data providers expose metadata; service providers gather it by metadata harvesting and offer discovery, current awareness, and preservation services]
LOCKSS
• Lots of copies keep stuff safe
• Stanford (Vicky Reich)
• Initial focus on lower levels
• Initial content: journals
• Emory (Martin Halbert)
– Help deploy and adapt
– Help apply in other contexts
• Another registry
• Set of publisher manifests (information providers)
• Set of storage systems (archival storage)
– NDIIPP: AmericanSouth, MetaArchive
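The "lots of copies" idea can be illustrated with a toy audit: hash every copy, and repair any copy that disagrees with a clear majority. This is a deliberately simplified sketch, not the LOCKSS protocol itself (which runs peer-to-peer opinion polls over hashed content); the peer names and content are hypothetical.

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def audit_and_repair(copies: dict) -> dict:
    """copies maps peer name -> bytes held by that peer; returns the repaired copies."""
    votes = Counter(digest(c) for c in copies.values())
    winner, count = votes.most_common(1)[0]
    # Require a strict majority; a tie gives no safe basis for repair.
    if count <= len(copies) // 2:
        raise ValueError("no majority among copies; needs manual review")
    good = next(c for c in copies.values() if digest(c) == winner)
    # Replace every copy whose hash disagrees with the majority version.
    return {peer: (c if digest(c) == winner else good) for peer, c in copies.items()}

# Hypothetical example: peer-c holds a damaged copy.
repaired = audit_and_repair({"peer-a": b"issue 1", "peer-b": b"issue 1", "peer-c": b"issu3 1"})
```

The more independent copies there are, the larger the margin by which a damaged copy is outvoted, which is the intuition behind the slogan.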
OCKHAM Library Network
[Figure: teachers, learners, and librarians reach NSDL services, OCKHAM services, and library services through NSDL and the OCKHAM Library Network]
OCKHAM
• Simplicity (à la Occam’s razor)
• Supported by Mellon and DLF
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
• Funded by NSF in NSDL, with P2P
Lightweight Protocols
• “Lightweight” (relatively small and simple) protocols seem to have clear advantages over “full” protocols that attempt to be comprehensive.
• The success of protocols considered lightweight is illuminating.
• Examples: TCP/IP, HTTP, LDAP, and the OAI-PMH
Reference Models
• Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration
• Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement
OCKHAM Proposed Services
• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
• (plus others such as from adapted ODL)
CS -> CSTC -> CRIM
• NSF and the ACM Education Committee are funding a 2-year project, “A Computer Science Teaching Center” (CSTC): http://www.cstc.org/
• College of NJ, U. Ill. Springfield, Virginia Tech
• Focus initially on labs, visualization, multimedia
• The multimedia part is also supported by a 2nd grant to Virginia Tech and The George Washington University: http://www.cstc.org/~crim/ (with curricular guidelines also under development)
CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia packages that become obsolete and are difficult to reuse, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules that have been reviewed and tested.
• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
• ACM support led to Journal of Educational Resources in Computing (JERIC), accessible from www.cstc.org
Browsing (1)
Browsing (2)
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections www.citidel.org
www.CITIDEL.org
• Led by Virginia Tech, with co-PIs:
– Fox (director, DL systems)
– Lee (history)
– Perez (user interface, Spanish support)
• Partners
– College of New Jersey (Knox)
– Hofstra (Impagliazzo)
– Villanova (Cassel)
– Penn State (Giles)
Multi-dimensional Categorization
• Language: English, Spanish
• Topic: e.g., Java, Multimedia, Algorithms
• Quality: Nominated, Editor reviewed, Identified by crawl, Peer reviewed
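The three categorization dimensions behave like independent facets. The minimal Python sketch below keeps the resources that match every selected facet and counts values per facet for a browse display; the `Resource` fields mirror the slide's dimensions, while the records and function names are hypothetical, not CITIDEL code.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    title: str
    language: str  # e.g. "English", "Spanish"
    topic: str     # e.g. "Java", "Multimedia", "Algorithms"
    quality: str   # e.g. "Peer reviewed", "Identified by crawl"

def facet_filter(resources, **selected):
    """Keep resources whose value for every selected facet is among the allowed values."""
    return [r for r in resources
            if all(getattr(r, facet) in allowed for facet, allowed in selected.items())]

def facet_counts(resources, facet):
    """Count resources per value of one facet, e.g. to label browse links."""
    return dict(Counter(getattr(r, facet) for r in resources))

# Hypothetical records.
catalog = [
    Resource("Intro to Java", "English", "Java", "Peer reviewed"),
    Resource("Algoritmos basicos", "Spanish", "Algorithms", "Editor reviewed"),
    Resource("Multimedia demo", "English", "Multimedia", "Identified by crawl"),
]
reviewed_spanish = facet_filter(catalog, language={"Spanish"},
                                quality={"Peer reviewed", "Editor reviewed"})
```

Treating quality as just another facet lets a user combine "reviewed only" with any language or topic choice, instead of maintaining separate reviewed and unreviewed collections.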
Overview of CITIDEL architecture
[Figure: three layers: user portals, digital library services, repositories]
Distributed repository structure
[Figure: Laboratories, Applets, Papers, and Syllabi repositories (. . .) linked through an OAI data provider and an OAI data harvester to a Union Metadata Repository that serves the Digital Library Services]
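A union metadata repository of this kind is typically kept current by incremental harvesting: remember the newest datestamp seen from each provider and ask only for records changed since then (the OAI-PMH `from` argument). The Python sketch below assumes a hypothetical `fetch(since)` callable standing in for the OAI data harvester; it is an illustration, not CITIDEL's implementation. Because OAI-PMH `from` is inclusive, boundary records come back again, so the merge replaces a record only when its datestamp is newer.

```python
from dataclasses import dataclass, field

@dataclass
class UnionRepository:
    # identifier -> (datestamp, metadata, source name)
    records: dict = field(default_factory=dict)
    # source name -> newest datestamp harvested so far
    last_harvest: dict = field(default_factory=dict)

    def harvest(self, source, fetch):
        """fetch(since) returns (identifier, datestamp, metadata) tuples changed since `since`."""
        since = self.last_harvest.get(source)
        newest = since
        for identifier, datestamp, metadata in fetch(since):
            current = self.records.get(identifier)
            # ISO 8601 datestamps of the same granularity compare correctly as strings.
            if current is None or datestamp > current[0]:
                self.records[identifier] = (datestamp, metadata, source)
            if newest is None or datestamp > newest:
                newest = datestamp
        if newest is not None:
            self.last_harvest[source] = newest
```

Re-running `harvest` is idempotent for unchanged records, so a scheduler can call it for every repository at fixed intervals without special bookkeeping.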
Digital library architecture for local and interoperable CITIDEL services
[Figure: three layers. Portals: educators, administrators, learners. Services: multilingual searching, revising, annotating, filtering, browsing, administering, supported by filtering profiles, user profiles, and annotations. Repositories: Union Metadata, with an OAI data harvester and an OAI data provider connecting to remote and peer digital libraries (e.g., NSDL-CIS)]
CITIDEL: Computing & Information Technology Interactive Digital Education Library
133
134
135
136
137
138
CITIDEL Technology Features
• Component architecture (Open Digital Library)
– Re-use and compose re-deployable digital library components
• Built using open standards & technologies
– OAI: used to collect DL resources and for DL interoperability
– XSL and XML: interface rendering with multilingual, community-based translation of screens and content (Spanish, …)
– Perl: component integration
• ESSEX: search engine functionality
– Very fast, utilizing in-memory processing
– Includes snapshots for persistence
• Multi-scheming
– Integrates multiple classifications / views through maps, closure
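One way to read "maps, closure": map each category in one scheme to equivalent categories in another, then take the transitive closure over both those mappings and each scheme's broader-term links, so a resource labeled in any scheme also turns up when browsing broader or equivalent categories. The Python sketch below implements that reading; it is not ESSEX or CITIDEL code, and the schemes and category codes are hypothetical placeholders.

```python
from collections import defaultdict, deque

# A label is (scheme, category). Schemes and codes below are hypothetical.
def closure(label, broader, maps):
    """All labels implied by `label`: follow broader-term links and cross-scheme maps."""
    seen, queue = {label}, deque([label])
    while queue:
        current = queue.popleft()
        neighbors = list(maps.get(current, []))
        if current in broader:
            neighbors.append(broader[current])
        for other in neighbors:
            if other not in seen:  # `seen` also guards against cycles in the maps
                seen.add(other)
                queue.append(other)
    return seen

def build_browse_index(resources, broader, maps):
    """Index every resource under each label its own labels imply."""
    index = defaultdict(set)
    for rid, labels in resources.items():
        for label in labels:
            for implied in closure(label, broader, maps):
                index[implied].add(rid)
    return index

broader = {("SchemeA", "D.3.2"): ("SchemeA", "D.3"), ("SchemeA", "D.3"): ("SchemeA", "D")}
maps = {("SchemeB", "Java"): [("SchemeA", "D.3.2")]}
resources = {"r1": [("SchemeB", "Java")], "r2": [("SchemeA", "D.3")]}
index = build_browse_index(resources, broader, maps)
```

Precomputing the index this way trades memory for fast browsing, which fits the in-memory design the slide describes.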
Cluster Search Results from CITIDEL
Cluster NDLTD-Computing
CITIDEL + PIPE
• Adds interaction personalization to CITIDEL
• Automatically handles multi-modal conversion for cell phones, PDAs, etc.
• Can be adapted to any digital data set; it requires only an XML file of the content with its hierarchy maintained
CITIDEL -> NSDL
• A collection project in the National STEM (science, technology, engineering, and mathematics) Education Digital Library (NSDL)
• National Science Digital Library
• www.nsdl.org
• (Next slides courtesy of Lee Zia, NSF)
Connects:
Users: students, educators, life-long learners
Content: structured learning materials; large real-time or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g. applets); interactive (virtual, remote) laboratories; ...
Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate; ...
Supports:
• Users (profiles): learning communities
• Content (metadata): customizable collections
• Tools (protocols): application services
Enables:
Environments for
• Communication
• Collaboration
• Creation
• Validation
• Evaluation
• Recognition
• ...
AND
• Discovery
• Stability
• Reliability
• Reusability
• Interoperability
• Customizability
• ...
of Resources
NSDL Program Tracks
• Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources
• Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty
• Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form
• Targeted (Applied) Research: have immediate impact on one or more of the other three tracks
• Pathways: large efforts across broad ranges of areas or approaches or users
Collections
• Discovery of content
• Classification and cataloguing
• Acquisition and/or linking; referencing
• Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged
• Access to massive real-time or archived datasets
• Software tool suites for analysis, modeling, simulation, or visualization
• Reviewed commentary on learning materials and pedagogy
Services
• Help services, frequently asked questions, etc.
• Synchronous/asynchronous collaborative learning environments using shared resources
• Mechanisms for building personal annotated digital information spaces
• Reliability testing for applets or other digital learning objects
• Audio, image, and video search capability
• Metadata system translation
• Community feedback mechanisms
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure: components attached to the Core NSDL “Bus”, grouped as User Interfaces (portals & clients), Usage Enhancement, and Collection Building. Components include NSDL services and other NSDL services; CI services for annotation, discussion, personalization, authentication, and browsing; core services for information retrieval and metadata gathering; core collection-building services for harvesting and protocols; special databases; and NSDL collections of referenced items & collections]
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs (electronic theses & dissertations)
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org
NDLTD: How can a university get involved?
• Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.
• Join online, give us contact names
– www.ndltd.org/join
• Adapt Virginia Tech or other proven approach
– Build interest and consensus
– Start trial / allow optional submission
[Figure: ETD workflow. The student gets committee signatures and submits the ETD; the signed ETD goes to the Graduate School; the Library catalogs the ETD, and access to the new research is opened on the WWW and through NDLTD]
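The submission workflow in the figure is essentially a small state machine. A minimal Python sketch, assuming hypothetical stage names and a rule that submission requires every committee signature (actual graduate-school procedures vary):

```python
from enum import Enum

class Stage(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"    # student submits the signed ETD
    APPROVED = "approved"      # Graduate School accepts it
    CATALOGED = "cataloged"    # Library catalogs it
    OPEN = "open"              # access opened on the WWW / NDLTD

# Each stage has exactly one successor; OPEN is terminal.
NEXT = {
    Stage.DRAFT: Stage.SUBMITTED,
    Stage.SUBMITTED: Stage.APPROVED,
    Stage.APPROVED: Stage.CATALOGED,
    Stage.CATALOGED: Stage.OPEN,
}

class ETD:
    def __init__(self, title, committee):
        self.title = title
        self.committee = set(committee)
        self.signed_by = set()
        self.stage = Stage.DRAFT

    def sign(self, member):
        if member not in self.committee:
            raise ValueError(f"{member} is not on the committee")
        self.signed_by.add(member)

    def advance(self):
        """Move to the next stage, enforcing the signature rule before submission."""
        if self.stage is Stage.DRAFT and self.signed_by != self.committee:
            raise RuntimeError("committee signatures are missing")
        if self.stage not in NEXT:
            raise RuntimeError("ETD is already openly accessible")
        self.stage = NEXT[self.stage]
        return self.stage
```

Making the stages explicit is what lets a graduate school and library share one system: each office only needs the permission to advance records out of its own stage.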
ETD Union Collection (OAI)
[Figure: a Merged Metadata Collection built by OAI harvesting from data providers including the Virginia Tech ETD Archive, Brazil ETD Archive, and OCLC ETD Archive (…); VIRTUA and ODL (VT) also appear; Future: recommender, … (Legend: OAI Data Provider, OAI Service Provider, OAI Harvesting)]
Union catalog: OCLC
• OCLC will expand its OAI data provider for TDs.
• Is getting data from WorldCat (so, from many sites!).
• Will harvest from all others who contact them.
• Needs DC and either ETD-MS or MARC.
• Has a set for ETDs.
OCLC SRU Interface
Union catalog: VTLS, VT
• VTLS will enhance search/browse service for ETDs
– Will harvest from OCLC’s set of ETD records
– Will receive through other mechanisms
– Will work with MARC 21 and ETD-MS
• VT will continue to offer experimental services
ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn; a popular site!)
VTLS Union Catalog Content Languages
The VTLS NDLTD Union Catalog has data in 6 languages: English, German, Greek, Korean, Portuguese, and Spanish.
Examples follow
Language = German; hits = 137
Full record display
Complex to Simple
[Figure: MARC ($50) simplified to Dublin Core (DC) + thesis]
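Moving from MARC to Dublin Core is a crosswalk: each mapped MARC tag/subfield becomes a DC element, and everything without a mapping is dropped, which is why the result is simpler but lossy. The Python sketch below uses a small illustrative subset of mappings in the spirit of the Library of Congress MARC-to-Dublin Core crosswalk (check the official crosswalk before relying on any particular mapping). The "+ thesis" part would add ETD-MS elements such as thesis.degree (name, level, discipline, grantor), which are not modeled here; the sample record is hypothetical.

```python
from collections import defaultdict

# Illustrative subset only: (MARC tag, subfield code) -> unqualified Dublin Core element.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("700", "a"): "contributor",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
    ("856", "u"): "identifier",
}

def marc_to_dc(fields):
    """fields: iterable of (tag, subfield code, value); unmapped fields are dropped."""
    dc = defaultdict(list)
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:
            # Trim trailing ISBD punctuation such as " /" or " :" left in MARC subfields.
            dc[element].append(value.rstrip(" /:;,"))
    return dict(dc)

# Hypothetical record; the local 999 field has no mapping and is lost.
sample = [
    ("245", "a", "Digital library quality /"),
    ("100", "a", "Smith, Jane"),
    ("650", "a", "Digital libraries"),
    ("650", "a", "Metadata"),
    ("260", "c", "2004"),
    ("999", "z", "local data"),
]
dc = marc_to_dc(sample)
```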
Why ETD? Short Answer
• For Students:
– Gain knowledge and skills for the Information Age
– Richer communication (digital information, multimedia, …)
• For Universities:
– Easy way to enter the digital library field and benefit thereby
• For the World:
– Global digital library: large, useful, many services
• General:
– Save time and money
– Increased visibility for all associated with research results
5. Open Source, Repositories, DigArch, ODL
Open Source DL Examples
• Eprints (www.eprints.org)
• Fedora
• Greenstone (www.greenstone.org)
• Many systems in NSF DLI projects
• VT systems: CITIDEL, CSTC, DL-in-a-box, ETANA, MARIAN, NCSTRL, NDLTD
What is a Digital Object Repository?
• Also called: digital repository, digital asset repository, institutional repository
• Stores and maintains digital objects (assets)
• Provides an external interface for digital objects
– Creation, modification, access
• Enforces access policies
• Provides for content type disseminations
Adapted from a slide by V. Chachra, VTLS
Goals of Institutional Repositories (by Stevan Harnad, U. Southampton)
• Self-archiving of institutional research
– Theses and dissertations (VTLS NDLTD Project)
– Article preprints and postprints
– Internal documents and maps
• Management of digital collections
• Preservation of materials: decentralized approach
• Housing of teaching materials
• Electronic publishing of journals, books, posters, maps, audio, video, and other multimedia objects
Adapted from a slide by V. Chachra, VTLS
Fedora™ Digital Object Architecture
• Persistent ID (PID): globally unique persistent id
• Disseminators (public view): access methods for obtaining “disseminations” of digital object content
• System metadata (internal view): metadata necessary to manage the object
• Datastreams (protected view): content that makes up the “basis” of the object
– EAD, TEI, DC, MARC, VRA Core, MIX, etc.
– Images, e-books, e-journals, music, video, etc.
The Mellon Fedora Project
Adapted from a slide by V. Chachra, VTLS
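The four parts of the object model map naturally onto a small data structure. This Python sketch illustrates the concepts only and is not Fedora's API: the `namespace:number` PID generator, the `"DC"` datastream ID, the `"state"` entry, and the `title` disseminator are assumptions chosen for the example.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

DC_NS = "{http://purl.org/dc/elements/1.1/}"
_pid_counter = itertools.count(1)

def new_pid(namespace="demo"):
    """Generate a 'namespace:number' identifier (hypothetical PID scheme)."""
    return f"{namespace}:{next(_pid_counter)}"

@dataclass
class Datastream:
    ds_id: str
    mime_type: str
    content: bytes

@dataclass
class DigitalObject:
    pid: str
    system_metadata: dict = field(default_factory=dict)  # internal view: management data
    datastreams: dict = field(default_factory=dict)      # protected view: the object's content
    disseminators: dict = field(default_factory=dict)    # public view: named access methods

    def add_datastream(self, ds):
        self.datastreams[ds.ds_id] = ds

    def disseminate(self, name):
        # Clients call a disseminator instead of reading datastreams directly.
        return self.disseminators[name](self)

def dc_title(obj):
    """Example disseminator: extract dc:title from a Dublin Core datastream."""
    root = ET.fromstring(obj.datastreams["DC"].content)
    return root.findtext(f"{DC_NS}title")

obj = DigitalObject(pid=new_pid(), system_metadata={"state": "Active"})
obj.add_datastream(Datastream(
    "DC", "text/xml",
    b'<dc xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Sample ETD</dc:title></dc>'))
obj.disseminators["title"] = dc_title
```

Separating disseminators from datastreams is what lets the repository change how content is stored (or add new renderings) without changing what clients call.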
Fedora™ Repository
[Figure: Fedora repository architecture. Clients (client application, batch program, server application, web browser) connect over HTTP and SOAP to a Web Service Exposure Layer offering Manage, Access, Search, and OAI Provider interfaces. Behind it are the Management, Access, Security, and Digital Objects Storage subsystems, with functions for component management, object management, object validation, PID generation, object dissemination, object reflection, policy enforcement, policy management, session management, and user authentication; security data covers policies and users/groups. Storage uses XML files, a relational DB, and datastreams; an External Content Retriever fetches content from external content sources over HTTP and FTP; remote and local services are also shown.]
Adapted from a slide by V. Chachra, VTLS
Digitization and Preservation Community and Activity (selected)
• Archivists worldwide
• International collaboration
– Million Book Project in US, China, India (Reddy, Chen, Balakrishnan)
• US Library of Congress
– Matching funds
– American Memory
– Infrastructure: NDIIPP
• Dutch National Library + IBM
• Associations: ARL, DLF
• People
– Harnad: self-archiving movement
– Lorie: Universal Virtual Computer
– Gladney: technology, philosophy (http://home.pacbell.net/hgladney/ddq_3_1.htm)
– Besser, Trant, …
DigArch Complexities: Document Models, Representations, and Accesses
• Doc = stream + structure + use-scenario; hybrid (paper/electronic), digital only
• Multilingual: content, summary, metadata
• Structured: MARC; SGML, HTML, XML
• Distributed collection: Kleisli, CIMI, Z39.50
• Federated search: collecting, picking site(s), parallel search / fall-back, fusing results
• Access: IPR, payment, security, scenarios
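The federated-search steps (search the picked sites in parallel, fall back when a site fails or is slow, fuse the results) can be sketched in Python as follows. The slide does not name a fusion method; reciprocal rank fusion (score = sum of 1/(k + rank), here with k = 60) is swapped in as one common choice, and the site callables are hypothetical stand-ins for remote searches such as Z39.50 or HTTP clients.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def federated_search(query, sites, timeout=2.0, k=60):
    """sites maps a site name to a callable(query) -> ranked list of document ids."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(sites)))
    futures = [pool.submit(search, query) for search in sites.values()]
    # Parallel search: wait at most `timeout` seconds for all sites together.
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on sites that are still running
    scores = {}
    for fut in done:
        if fut.exception() is not None:
            continue  # fall back: ignore sites that failed
        for rank, doc in enumerate(fut.result(), start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Fuse: highest combined score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Rank-based fusion needs no comparable relevance scores from the sites, which matters when the sites run different engines; documents found by several sites rise to the top.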
DigArch Complexities: Multimedia
• Multiple media types, representations
– Self-describing (structures), provenance
• Text, audio, image, video, graphics, animation
• Capture, digitization, standards, interchange
• Compression, content-based retrieval
• Playback (real time), QoS, rendering
– Popularity (e.g., PowerPoint) vs. longevity (SMIL?)
• JPEG, MPEG (and versions)
ODL: Open Digital Library
[Figure: users on one side and digital objects (programs, documents, images, videos) on the other, with a question mark between them]
[Figure: the same users and digital objects, connected through a digital library built as a monolithic and/or custom-built web-based application]
[Figure: the same digital objects behind a componentized digital library, whose components and interconnections are marked with question marks]
[Figure: an open digital library over the same digital objects, in which every component is an Open Archive (OA); the links between them are labeled PMH and XPMH (extended PMH)]
Open Digital Library Protocol
Extended OAI-PMH
Protocol for Metadata Harvesting
Open Digital Library Component
[Figure: an Extended Open Archive built around an Open Archive]
Open Digital Library Deployments
• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG): enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box
Open Digital Library
• Network of Extended Open Archives where each node acts as either a provider of data, services or both.
• Component = Node
• Protocol = Arc
Open Digital Library Components
• Running now
– XML-File (data provider from file system)
– Search: simple or in-memory (Essex) or generalized
– Union, browse, recent, filter
– E-journal/review, submit, edit, annotation
– Recommender, rating; mirroring (see JCDL’02)
– Working with NCSA: from DB, unstructured text
• Others in process
– Classification/categorization
– Registry (and other connections with web services)
ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
[Figure: Example Open Digital Library. Students and researchers use a user interface built on ODL Search, ODL Browse, ODL Recent, and Filter components, which draw over PMH on ODL Union components that harvest the ETD collections (ETD-1 to ETD-4)]
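In ODL terms each component is a node that consumes and exposes the same harvesting interface, so components can be stacked as in the ETD example. The Python sketch below imitates that layering in-process: `list_records()` is a hypothetical stand-in for the extended OAI-PMH requests real ODL components exchange over HTTP, and the class names and records are illustrative, not the ODL software's API.

```python
class Provider:
    """Leaf node: an archive exposing its records (stand-in for an OAI data provider)."""
    def __init__(self, records):
        self._records = list(records)

    def list_records(self):
        return list(self._records)

class Union:
    """Merges several upstream nodes, keeping the first record seen for each id."""
    def __init__(self, *sources):
        self.sources = sources

    def list_records(self):
        merged = {}
        for source in self.sources:
            for rec in source.list_records():
                merged.setdefault(rec["id"], rec)
        return list(merged.values())

class Filter:
    """Passes through only the upstream records that satisfy a predicate."""
    def __init__(self, source, predicate):
        self.source = source
        self.predicate = predicate

    def list_records(self):
        return [r for r in self.source.list_records() if self.predicate(r)]

class Search:
    """Simple title search over whatever node it is stacked on."""
    def __init__(self, source):
        self.source = source

    def query(self, term):
        term = term.lower()
        return sorted(r["id"] for r in self.source.list_records() if term in r["title"].lower())

# Hypothetical ETD collections and a Union -> Filter -> Search stack.
etd1 = Provider([{"id": "vt:1", "title": "Digital Library Quality", "lang": "en"}])
etd2 = Provider([{"id": "br:7", "title": "Bibliotecas Digitais", "lang": "pt"},
                 {"id": "vt:1", "title": "Digital Library Quality", "lang": "en"}])
union = Union(etd1, etd2)
english = Filter(union, lambda r: r["lang"] == "en")
search = Search(english)
```

Because every node speaks the same interface, the Filter could be removed or a second Union added without touching the Search component, which is the reuse argument behind componentized DLs.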
OAI, ODL, DL-in-a-box
• Open Archives Initiative
– since 1999, www.openarchives.org
• Open Digital Libraries
– since 2001, from www.dlib.vt.edu
– with Hussein Suleman (now U. Cape Town)
• DL-in-a-box
– NSDL support since 2001
– Aimed to help new collections / services projects
– http://dlbox.nudl.org