©CLRC/BITD/Keith G Jeffery
CERIF in CRISs & Scientific Repositories 20020823 1
Comparative Study of Metadata for Scientific Information: The Place of CERIF in CRISs & Scientific Repositories
Keith G Jeffery, Director, IT CLRC [email protected]
Andrei Lopatenko, Manchester University [email protected]
Anne Asserson, University of Bergen [email protected]
Series of Presentations
• CERIF: Past, Present and Future: An Overview
  Anne Asserson, Andrei Lopatenko, Keith Jeffery
• CERIF - Information Retrieval of Research Information in a Distributed Heterogeneous Environment
  Andrei Lopatenko, Keith Jeffery, Anne Asserson
• Comparative Study of Metadata for Scientific Information: The place of CERIF in CRISs and Scientific Repositories
  Keith Jeffery, Andrei Lopatenko, Anne Asserson
STRUCTURE
• DATA, INFORMATION & KNOWLEDGE
• DATA DELUGE, INFORMATION EXPLOSION AND METADATA
• USAGE OF METADATA IN CRISs
• METADATA AND CERIF
• CONCLUSIONS AND RECOMMENDATIONS
DATA, INFORMATION & KNOWLEDGE
Data
• DATA: 06032002
  – Representation of an observation of the real world
  – A lexical string of characters or symbols
DATA, INFORMATION & KNOWLEDGE
Information
• INFORMATION: 06-03-2002
  – USA: 3rd June 2002
  – UK: 6th March 2002
  – Structured data in context
• Instead use:
  – Data: 20020603
  – Metadata:
    • yyyymmdd: a ‘format template’
    • Date: a type
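The ambiguity on this slide can be sketched in Python: the same string parses to two different dates depending on the assumed convention, while a yyyymmdd value carried with its type and format template decodes one way only. The record layout and the mapping from `yyyymmdd` to a `strptime` pattern are assumptions made for illustration.

```python
from datetime import date, datetime

raw = "06-03-2002"

# The same characters yield different dates under different conventions.
as_us = datetime.strptime(raw, "%m-%d-%Y").date()  # month first (USA)
as_uk = datetime.strptime(raw, "%d-%m-%Y").date()  # day first (UK)

# Data plus metadata: the value travels with its type and format template.
# This record layout and the template table are illustrative assumptions.
TEMPLATES = {"yyyymmdd": "%Y%m%d"}
record = {"value": "20020603", "type": "Date", "format": "yyyymmdd"}


def interpret(rec: dict) -> date:
    """Decode a value using the format template carried in its metadata."""
    return datetime.strptime(rec["value"], TEMPLATES[rec["format"]]).date()


print(as_us, as_uk, interpret(record))
```

Without the metadata, a consumer has to guess the convention; with it, the decoding step is mechanical.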
DATA, INFORMATION & KNOWLEDGE
Knowledge
• KNOWLEDGE
  – Theories or hypotheses
  – Representation of:
    • Facts (i.e. information)
    • Rules (when a, if b, then x, else y)
  – Processing of them by inference:
    • Deduction, induction, abduction
  – Commonly accepted justified belief
DATA, INFORMATION & KNOWLEDGE
Knowledge: Facts
Start-Time  Departure-Airport  Flight  Arrival-Airport  End-Time
0800        LHR                BA123   FRA              1000
0900        LHR                BA125   FRA              1100
1000        LHR                BA127   FRA              1200
1100        LHR                BA129   FRA              1300
Etc etc
1800        LHR                BA137   FRA              2000
DATA, INFORMATION & KNOWLEDGE
Knowledge: Induction
(The flight table from the Knowledge: Facts slide is shown again.)
INDUCTION (data mining): between 0800 and 1800, every hour on the hour, a BA flight leaves LHR for FRA
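A minimal sketch of the induction step, assuming only the rows actually listed on the slide (the "Etc etc" rows are not filled in). It keeps every property that holds for all observed facts; the function and key names are illustrative. With the abbreviated rows it can confirm "on the hour" and the 0800 to 1800 window, but not "every hour".

```python
FIELDS = ("start", "departure", "flight", "arrival", "end")
ROWS = [
    ("0800", "LHR", "BA123", "FRA", "1000"),
    ("0900", "LHR", "BA125", "FRA", "1100"),
    ("1000", "LHR", "BA127", "FRA", "1200"),
    ("1100", "LHR", "BA129", "FRA", "1300"),
    ("1800", "LHR", "BA137", "FRA", "2000"),
]
observed = [dict(zip(FIELDS, row)) for row in ROWS]


def induce(rows):
    """Generalise from observations: keep what is true of every row."""
    rule = {}
    for name in ("departure", "arrival"):
        values = {row[name] for row in rows}
        if len(values) == 1:  # same airport in every observed fact
            rule[name] = values.pop()
    carriers = {row["flight"][:2] for row in rows}
    if len(carriers) == 1:  # same two-letter carrier prefix
        rule["carrier"] = carriers.pop()
    rule["on_the_hour"] = all(row["start"].endswith("00") for row in rows)
    starts = [row["start"] for row in rows]
    rule["window"] = (min(starts), max(starts))  # zero-padded, so string order works
    return rule


rule = induce(observed)
print(rule)
```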
DATA, INFORMATION & KNOWLEDGE
Putting it together
• DATA: Collecting Observed Facts
• INFORMATION: Structuring in Context
• KNOWLEDGE: Inducing commonly accepted belief
• INSIGHT: Value-Adding for Business Needs
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Technology Capacity
• Communications
  – 2.4 Kb/s to 20 Gb/s in 30 years (2 × 10^6)
• Online Storage
  – 1.2 Mb to 40 Gb in 30 years (4 × 10^4)
• Processor Speed increased even more
With acknowledgements to
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Application Areas
• e-Science
  – Petabytes per year
  – Particle Physics
  – Space Science
  – Genomics
• e-Information
  – Terabytes per year
  – Eprints
  – Hyperlinked data
  – Hypermedia
• e-Learning
• e-Business
With acknowledgements to CLRC/BITD/PS
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Technology Takeup
[Figure: technology takeup; recoverable label: WWW (1989)]
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Data & Metadata
• Much of this data is inaccessible
• Need to be able to
  – Find relevant data as information
  – Understand it: syntax, semantics
  – Understand any restrictions on its use
[Diagram labels: data, required METADATA]
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Data & Metadata
• Metadata is data about data
• Metadata to one application is data to another
[Diagram labels: Application 1, Application 2]
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Three Kinds of Metadata
For the data (document):
• SCHEMA: constrain it
• NAVIGATIONAL: how to get it
• ASSOCIATIVE: view to users
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Metadata Kinds: Schema
• Intensional description of extensional instances
  – database:
    • name
    • size
    • security authorisations
  – attributes:
    • name
    • type
    • constraints
• Formal logic relationship to data instances
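As a rough illustration of schema metadata constraining data instances, the sketch below describes attributes by name, type and constraint and checks a record against that description. The class names and the example attributes (`title`, `start_year`) are hypothetical and not taken from any CRIS schema.

```python
from dataclasses import dataclass
from typing import Any, Callable


def always_valid(value: Any) -> bool:
    """Default constraint: accept any value of the right type."""
    return True


@dataclass
class Attribute:
    name: str
    type: type
    constraint: Callable[[Any], bool] = always_valid


@dataclass
class Schema:
    database: str
    attributes: list

    def violations(self, record: dict) -> list:
        """List every way the record fails the intensional description."""
        problems = []
        for attr in self.attributes:
            value = record.get(attr.name)
            if not isinstance(value, attr.type):
                problems.append(f"{attr.name}: expected {attr.type.__name__}")
            elif not attr.constraint(value):
                problems.append(f"{attr.name}: constraint violated")
        return problems


# Hypothetical schema for project records.
schema = Schema(
    database="projects",
    attributes=[
        Attribute("title", str, lambda s: len(s) > 0),
        Attribute("start_year", int, lambda y: y >= 1900),
    ],
)

good = schema.violations({"title": "CERIF study", "start_year": 2002})
bad = schema.violations({"title": "", "start_year": "2002"})
print(good, bad)
```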
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Metadata Kinds: Navigational
• How to get to the information resource directly
  – filename
  – DB name + navigational algorithm
  – DB name + predicate (query)
  – URL
  – URL + predicate (query)
• or any of the above via
  – web indexing system (e.g. AltaVista, ExCite…)
  – local indexing system (bookmarks or proxy server)
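The "URL" and "URL + predicate (query)" cases can be sketched as a small locator type. The `Locator` name, the example.org address and the query parameters are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class Locator:
    """Navigational metadata: where a resource is and how to select from it."""
    url: str
    predicate: Optional[dict] = None  # query terms, if any

    def resolve(self) -> str:
        """Return the address to fetch: the bare URL, or URL plus query."""
        if not self.predicate:
            return self.url
        return f"{self.url}?{urlencode(self.predicate)}"


# Hypothetical resource address and query terms.
plain = Locator("http://example.org/projects").resolve()
queried = Locator(
    "http://example.org/projects", {"orgunit": "CLRC", "year": 2002}
).resolve()
print(plain)
print(queried)
```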
DATA DELUGE, INFORMATION EXPLOSION AND METADATA
Metadata Kinds: Associative
• Information for application assistance
  – catalog record (e.g. Dublin Core) - descriptive
  – content rating (e.g. PICS) - restrictive
  – security, privacy (cryptography, digital signatures) - restrictive
  – information from dictionaries, thesauri, hyperglossaries, domain ontologies - supportive
• No formal logic relationship to data instances
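A descriptive catalog record of the Dublin Core kind might look like the sketch below, which serialises a few Dublin Core elements to XML with the Python standard library. The wrapping `record` element and the choice of fields are assumptions; only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set 1.1.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)


def catalog_record(fields: dict) -> str:
    """Serialise Dublin Core element/value pairs inside a wrapper element."""
    root = ET.Element("record")  # wrapper name is an illustrative assumption
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = catalog_record(
    {
        "title": "Comparative Study of Metadata for Scientific Information",
        "creator": "Keith G Jeffery",
    }
)
print(xml_text)
```

The record describes the resource for a user or application but sits beside the data rather than constraining it, which is the sense of "no formal logic relationship to data instances".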
USAGE OF METADATA IN CRISs
Benefits
• Data quality
• Access
• Understanding answers
• Improving Queries
• Interoperability with other CRISs
• Interoperability with other Systems e.g.
  – Local management information systems
  – Bibliographic systems
  – Scientific data systems
USAGE OF METADATA IN CRISs
Schema Metadata
SCHEMA: constrain it
• All CRISs based on
  – DB SYSTEM
  – IR SYSTEM
• Have schema metadata
• It may not be sufficient
  – To ensure integrity
  – To provide a rich enough program interface
  – To ensure integrity in foreign key - primary key linkage to associated CRISs or other systems
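The foreign key - primary key concern can be illustrated with a small check for references that point at nothing in the associated system. The record shapes, key names and identifiers are hypothetical.

```python
def dangling_references(records, foreign_key, primary_keys):
    """Return the records whose foreign key matches no known primary key."""
    known = set(primary_keys)
    return [rec for rec in records if rec[foreign_key] not in known]


# Hypothetical data: projects in one CRIS pointing at org units held elsewhere.
orgunits = [{"id": "OU1"}, {"id": "OU2"}]
projects = [
    {"id": "P1", "orgunit_id": "OU1"},
    {"id": "P2", "orgunit_id": "OU9"},  # no such org unit
]

broken = dangling_references(
    projects, "orgunit_id", (ou["id"] for ou in orgunits)
)
print(broken)
```

Inside a single database the DBMS can enforce this; across separate CRISs the check needs metadata that both sides share.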
USAGE OF METADATA IN CRISs
Navigational Metadata
NAVIGATIONAL: how to get it
• ‘Base CRISs’ may have navigational metadata
  – If provide raw information only: no
  – If provide URLs to e.g. publications, scientific datasets: yes
• ‘Meta-CRISs’ which act as catalogues or indexes to other CRISs do have navigational metadata
USAGE OF METADATA IN CRISs
Associative Metadata
ASSOCIATIVE: view to users
• AdM
  – Associative descriptive
• ArM
  – Associative restrictive
• AsM
  – Associative supportive
USAGE OF METADATA IN CRISs
Associative descriptive Metadata
ASSOCIATIVE: view to users
• CRISs have AdM if
  – Provide summary record of >= 1 {<project> | <person> | <orgunit>} and point to detailed records
  – The AdM provides machine-readable (syntax) and machine-understandable (semantics) information
USAGE OF METADATA IN CRISs
Associative restrictive Metadata
ASSOCIATIVE: view to users
• CRISs have ArM if
  – Provide separate metadata record with information on access rights, copyright, IPR, 3rd party liability disclaimer, pricing
  – The ArM provides machine-readable (syntax) and machine-understandable (semantics) information
USAGE OF METADATA IN CRISs
Associative supportive Metadata
ASSOCIATIVE: view to users
• CRISs have AsM if
  – Provide >= 1 {dictionary | hyperglossary | thesaurus | domain ontology}
  – The AsM provides machine-readable (syntax) and machine-understandable (semantics) information and / or knowledge
USAGE OF METADATA IN CRISs
Typical CRIS and Metadata
[Diagram labels: Metadata for whole collection of base CRIS data records; Metadata for data record in base CRIS; Metadata within base CRIS; Data; schema; navigational; associative; associative; Other data system]
METADATA & CERIF
The CERIF Models
[Diagram labels: Metadata Model; Export Model; Full CRIS Model; CRIS C; CRIS B; CRIS A]
METADATA & CERIF
The CERIF Metadata Model
• Metadata Model designed in from the start
  – Schema and navigational metadata defined
  – AdM: e.g. to be used as catalog in ERGO
  – AsM: e.g. controlled lists of terms
• Also designed to assist evolution of further linkages
  – Flexible ‘articulated’ structure
  – Links (metadata within records) to e.g. bibliographic, scientific datasets
METADATA & CERIF
CERIF: What Next?: Associative descriptive Metadata
• AdM: provide crosswalks from CERIF Metadata Standard to:
  – Dublin Core (at least formalised version)
  – Other relevant standards as they emerge
  – Using RDF, XML-Schema
  – Coded in XML
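In its simplest form a crosswalk is a field-to-field mapping. The sketch below assumes hypothetical source field names (they are placeholders, not actual CERIF attribute names) and maps them to Dublin Core element names, reporting anything it cannot map.

```python
# Hypothetical source field names; NOT taken from the CERIF specification.
# Targets are element names from the Dublin Core Metadata Element Set.
CROSSWALK = {
    "project_title": "title",
    "project_abstract": "description",
    "person_name": "creator",
    "orgunit_name": "publisher",
    "start_date": "date",
}


def to_dublin_core(record: dict) -> dict:
    """Map known fields to Dublin Core elements; collect unmapped field names."""
    mapped, unmapped = {}, []
    for key, value in record.items():
        if key in CROSSWALK:
            # Dublin Core elements are repeatable, so values are kept in lists.
            mapped.setdefault(CROSSWALK[key], []).append(value)
        else:
            unmapped.append(key)
    return {"dc": mapped, "unmapped": unmapped}


result = to_dublin_core(
    {
        "project_title": "Example project",
        "person_name": "A. Researcher",
        "funding_code": "X-1",  # no Dublin Core target in this sketch
    }
)
print(result)
```

Reporting unmapped fields makes the information lost in the crosswalk visible, which matters when the source model is richer than the target.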
METADATA & CERIF
CERIF: What Next?: Associative restrictive Metadata
• ArM: define metadata standard taking into account existing ones
  – Access rights
  – IPR, copyright, right to use
  – 3rd party liability disclaimer
  – Charges
• XrML ? (PARC, Palo Alto)
METADATA & CERIF
CERIF: What Next?: Associative supportive Metadata
• AsM: take current work on formal domain ontologies and push further
  – Using pre-existing ontologies when relevant
  – Providing crosswalks to related ontologies
• DAML/OIL
CONCLUSIONS & RECOMMENDATIONS
CERIF: Good Basis
• CERIF already provides a good metadata standard
  – Formally defined
  – Proper subset of export and full CRIS models
  – Recognised by EU
CONCLUSIONS & RECOMMENDATIONS
CERIF Metadata Provides
• CERIF Metadata already provides to a large extent the facility for:
  – Data quality
  – Access
  – Understanding answers
  – Improving Queries
  – Interoperability with other CRISs
  – Interoperability with other Systems e.g.
    • Local management information systems
    • Bibliographic systems
    • Scientific data systems
CONCLUSIONS & RECOMMENDATIONS
More can be done
• But can do more
  – Improvements proposed earlier
  – Plenty of scope for ideas, enthusiasm
• euroCRIS CERIF Task Group
• www.eurocris.org/cerif
CONCLUSIONS & RECOMMENDATIONS
CERIF Metadata: Use It!
[Image labels: hieroglyphics, demotic, greek]
CERIF METADATA is developed to assist:
• Quality
• Understanding (answers)
• Precision (queries)
• Interoperability
of CRISs
With acknowledgements to the British Museum