Controlled Vocabularies in TELPlus Antoine ISAAC Vrije Universiteit Amsterdam EDLProject Workshop 22-23 November 2007


Page 1: Controlled Vocabularies in TELPlus

Controlled Vocabularies in TELPlus

Antoine ISAAC
Vrije Universiteit Amsterdam

EDLProject Workshop
22-23 November 2007

Page 2: Controlled Vocabularies in TELPlus

Agenda

• TELPlus Context

• Improving subject access
  – 3 sub-tasks

• Services for TEL

Page 3: Controlled Vocabularies in TELPlus

TELPlus Context

• Started October 2007
• Running 27 months
• Content WPs
  – OCRing previously digitised material
  – Improving the usability of TEL through OAI-PMH compliance
  – Improving Access
  – Integrating services with TEL portal
  – User personalisation services
  – Extending TEL to Bulgaria & Romania

Page 4: Controlled Vocabularies in TELPlus

WP3 – Improving Access

• Task 1: Indexing for usability
  – Review/test state-of-the-art semantic search engines
    • On content of documents
• Task 2: Improving subject access
• Task 3: FRBR aggregation, search and browsing
  – Create/exploit FRBR metadata repositories
• Task 4: Focus on users
  – Focus groups on prototypes

Page 5: Controlled Vocabularies in TELPlus

WP 3 Task 2 – Improving Subject Access

• Improving subject access via semantic alignment between subjects
• Search through collections
  – Using metadata
  – In a controlled setting
• Paving the way for enhanced usages
  – Advanced treatments mentioned in TELplus need conceptual structures and links between these structures
    • E.g. clustering

Page 6: Controlled Vocabularies in TELPlus

WP 3 Task 2 – Improving Subject Access

• Improving subject access via semantic alignment between subjects
• Reference: MACS project
  – Manually-built semantic equivalences between Rameau, SWD & LCSH headings

Page 7: Controlled Vocabularies in TELPlus

MACS: Querying Collections

Page 8: Controlled Vocabularies in TELPlus

MACS: Query Reformulation Options

Page 9: Controlled Vocabularies in TELPlus

WP 3 Task 2 – Improving Subject Access

• Improving subject access via semantic alignment between subjects
• Reference: MACS project
  – Manual equivalences between Rameau, SWD, LCSH headings
• Here: an experiment on deploying automatic alignment techniques
  – Determining possible strategies
  – Assessing feasibility and usefulness
  – MACS context

Page 10: Controlled Vocabularies in TELPlus

WP3.2 Sub-tasks

• 3.2.1. Converting the subjects to standard representation language
  – Semantic web format (SKOS)
• 3.2.2. Aligning the vocabularies
  – Semantic correspondences between subjects
• 3.2.3. Deploying the alignment knowledge obtained into TEL framework
  – E.g. using links to reformulate queries from one subject list to the other
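
As a rough illustration of sub-task 3.2.3, the sketch below rewrites a query expressed with headings from one subject list into headings from another by following alignment links. The heading strings, the `ALIGNMENT` table and the `reformulate` function are hypothetical; none of them is taken from MACS or TEL.

```python
# Hypothetical alignment links from a source subject list to a target one.
# Each source heading maps to zero or more target headings.
ALIGNMENT = {
    "Littérature néerlandaise": ["Dutch literature"],
    "Calendriers": ["Calendars", "Almanacs"],
}


def reformulate(query_headings, alignment):
    """Return (target headings, source headings that had no link)."""
    mapped, unmapped = [], []
    for heading in query_headings:
        targets = alignment.get(heading)
        if targets:
            for target in targets:
                if target not in mapped:  # keep order, drop duplicates
                    mapped.append(target)
        else:
            unmapped.append(heading)
    return mapped, unmapped


mapped, unmapped = reformulate(["Calendriers", "Imprimerie"], ALIGNMENT)
print(mapped)
print(unmapped)
```

Returning the unmapped headings separately leaves the caller free to fall back on free-text search or to ask the user, rather than silently dropping part of the query.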

Page 11: Controlled Vocabularies in TELPlus

Converting subjects to standard representation language

Goal: solving syntactic heterogeneity between vocabularies

• Enabling the use of standard tools
  – E.g. for query (re)formulation
• Paving the way for dealing with semantic heterogeneity
  – Definitions of concepts expressed according to a common model

Page 12: Controlled Vocabularies in TELPlus

Converting subjects to standard representation language

Approach: Semantic Web and SKOS

• Semantic Web
  – Knowledge objects as web resources (URIs)
  – Description by linking resources (RDF)
  – Description using shared formal vocabularies (ontologies)
• SKOS
  – A standard Semantic Web model (ontology)
  – For knowledge organization systems (thesauri, subject heading lists…)

Page 13: Controlled Vocabularies in TELPlus

SKOS: Example

[Figure: RDF graph of an Iconclass concept. Nodes: http://www.iconclass.nl/s_11, http://www.iconclass.nl/s_11F, http://www.iconclass.nl/, skos:Concept, skos:ConceptScheme. Edge labels: rdf:type, skos:broader, skos:inScheme, skos:prefLabel (“the Virgin Mary”@en, “la Vierge Marie”@fr).]
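
The graph on this slide can be written down as a handful of RDF triples. The sketch below does so with plain Python tuples and prints them in an N-Triples-like form. The edge directions are an assumption read from the flattened diagram (that `s_11F` is the concept carrying the two labels and `s_11` is its broader concept); the two namespace URIs are the standard RDF and SKOS ones. Literal escaping is deliberately left out.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

SCHEME = "http://www.iconclass.nl/"
CONCEPT = "http://www.iconclass.nl/s_11F"
PARENT = "http://www.iconclass.nl/s_11"

# (subject, predicate, object); literal objects are (text, language) pairs.
triples = [
    (SCHEME, RDF + "type", SKOS + "ConceptScheme"),
    (CONCEPT, RDF + "type", SKOS + "Concept"),
    (CONCEPT, SKOS + "inScheme", SCHEME),
    (CONCEPT, SKOS + "broader", PARENT),  # assumed direction
    (CONCEPT, SKOS + "prefLabel", ("the Virgin Mary", "en")),
    (CONCEPT, SKOS + "prefLabel", ("la Vierge Marie", "fr")),
]


def to_ntriples(s, p, o):
    """Serialise one triple; tuple objects become language-tagged literals."""
    if isinstance(o, tuple):
        text, lang = o
        obj = '"%s"@%s' % (text, lang)
    else:
        obj = "<%s>" % o
    return "<%s> <%s> %s ." % (s, p, obj)


lines = [to_ntriples(*t) for t in triples]
print("\n".join(lines))
```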

Page 14: Controlled Vocabularies in TELPlus

Converting subjects to standard representation language - Process

• Getting processable versions from owners
  – E.g. XML

• Analyzing the models

• Converting to SKOS
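
A minimal sketch of the last step, assuming the owner delivers XML. The element and attribute names (`heading`, `label`, `id`, `parent`), the sample data and the `http://example.org/shl/` namespace are all invented for illustration; a real subject heading list has its own model, which is exactly what the "analyzing the models" step has to establish first.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format; not the format of any real vocabulary.
SAMPLE_XML = """
<headings>
  <heading id="h1"><label lang="en">Literature</label></heading>
  <heading id="h2" parent="h1"><label lang="en">Dutch literature</label></heading>
</headings>
"""

BASE = "http://example.org/shl/"  # hypothetical namespace for concept URIs


def convert(xml_text, base=BASE):
    """Turn each <heading> into SKOS-style (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for heading in root.findall("heading"):
        uri = base + heading.get("id")
        triples.append((uri, "rdf:type", "skos:Concept"))
        triples.append((uri, "skos:inScheme", base))
        for label in heading.findall("label"):
            # Assumes at most one label per language in the source data.
            triples.append((uri, "skos:prefLabel", (label.text, label.get("lang"))))
        parent = heading.get("parent")
        if parent is not None:
            triples.append((uri, "skos:broader", base + parent))
    return triples


triples = convert(SAMPLE_XML)
for triple in triples:
    print(triple)
```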

Page 15: Controlled Vocabularies in TELPlus

WP3.2 Sub-tasks

• 3.2.1. Converting the subjects to standard representation language
  – Semantic web format (SKOS)
• 3.2.2. Aligning the vocabularies
  – Semantic correspondences between subjects
• 3.2.3. Deploying the alignment knowledge obtained into TEL framework
  – E.g. using links to reformulate queries from one subject list to the other

Page 16: Controlled Vocabularies in TELPlus

Vocabulary Alignment

• Specifying required alignment format (links)
  – Type of mapping links: equivalence, broader
  – Cardinality: one-to-one, one-to-many
  – Taking application context (TEL) into account
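
One possible way to make such a format concrete is sketched below: a small record type for a mapping link plus a helper that reports cardinality per source concept. The class name, field names and concept identifiers are hypothetical, and cardinality is judged from the source side only.

```python
from collections import Counter
from dataclasses import dataclass

ALLOWED_RELATIONS = {"equivalence", "broader"}


@dataclass(frozen=True)
class MappingLink:
    source: str    # concept identifier in the first vocabulary
    target: str    # concept identifier in the second vocabulary
    relation: str  # one of ALLOWED_RELATIONS


def cardinality(links):
    """Classify each source concept as one-to-one or one-to-many."""
    counts = Counter(link.source for link in links)
    return {
        source: ("one-to-one" if n == 1 else "one-to-many")
        for source, n in counts.items()
    }


links = [
    MappingLink("shl1:A", "shl2:X", "equivalence"),
    MappingLink("shl1:B", "shl2:Y", "equivalence"),
    MappingLink("shl1:B", "shl2:Z", "broader"),
]
assert all(link.relation in ALLOWED_RELATIONS for link in links)
print(cardinality(links))
```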

Page 17: Controlled Vocabularies in TELPlus

Vocabulary Alignment

• Specifying required alignment format (links)

• Selecting (& running) alignment techniques/tools
  – Inspired by semantic web approaches

Page 18: Controlled Vocabularies in TELPlus

Vocabulary Alignment Techniques

• Similar to ontology alignment problem
• Existing approaches for (semi-)automatic ontology alignment
  – Using techniques from linguistics, computer science, statistics
• Problem: performance does not allow 100% automatic alignment
• Problem: multilingual case
  – Some techniques cannot be used

Page 19: Controlled Vocabularies in TELPlus

Potential Technique: Using Background Knowledge

• Using a shared conceptual reference to find links

[Figure: subject heading lists SHL 1 and SHL 2 linked through a shared background knowledge source; example concepts “Calendar” and “Publication”.]
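
The idea can be sketched as follows, assuming the background knowledge is a simple child-to-parent hierarchy and that headings are anchored to it by lower-cased label. Both assumptions, the `BACKGROUND_PARENT` table and the function names are illustrative only; in a multilingual setting the anchoring step would need labels in each language.

```python
# Hypothetical background knowledge: child concept -> parent concept.
BACKGROUND_PARENT = {
    "calendar": "publication",
    "publication": "artefact",
}


def ancestors(concept, parent_of):
    """Yield every ancestor of a concept, guarding against cycles."""
    seen = set()
    while concept in parent_of and concept not in seen:
        seen.add(concept)
        concept = parent_of[concept]
        yield concept


def relate(heading1, heading2, parent_of):
    """Suggest a link type between two headings via the shared hierarchy."""
    a, b = heading1.lower(), heading2.lower()
    if a == b:
        return "equivalence"
    if b in ancestors(a, parent_of):
        return "broader"   # heading2 is broader than heading1
    if a in ancestors(b, parent_of):
        return "narrower"  # heading2 is narrower than heading1
    return None


print(relate("Calendar", "Publication", BACKGROUND_PARENT))
```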

Page 20: Controlled Vocabularies in TELPlus

Potential Technique: Statistical Alignment

• Object information (book indexing)

[Figure: SHL 1 and SHL 2 connected through dually-indexed books; example headings “Dutch Literature” and “Dutch”.]
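
A minimal sketch of this kind of instance-based alignment: two headings are proposed as a candidate link when the sets of books indexed with them overlap strongly. The Jaccard coefficient and the 0.5 threshold are choices made for this example, not something the slides specify, and the book data is invented.

```python
def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical dually-indexed books: id -> (SHL 1 headings, SHL 2 headings).
BOOKS = {
    "b1": ({"Dutch Literature"}, {"Dutch"}),
    "b2": ({"Dutch Literature"}, {"Dutch"}),
    "b3": ({"Dutch Literature"}, {"Poetry"}),
    "b4": ({"History"}, {"Dutch"}),
}


def candidate_links(books, threshold=0.5):
    """Score every heading pair by the overlap of the books they index."""
    books_by_h1, books_by_h2 = {}, {}
    for book, (headings1, headings2) in books.items():
        for h in headings1:
            books_by_h1.setdefault(h, set()).add(book)
        for h in headings2:
            books_by_h2.setdefault(h, set()).add(book)
    scored = []
    for h1, set1 in books_by_h1.items():
        for h2, set2 in books_by_h2.items():
            score = jaccard(set1, set2)
            if score >= threshold:
                scored.append((h1, h2, score))
    return sorted(scored, key=lambda item: -item[2])


print(candidate_links(BOOKS))
```

Because the score depends only on shared books, this approach does not rely on the language of the headings, which is relevant to the multilingual case.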

Page 21: Controlled Vocabularies in TELPlus

Vocabulary Alignment

• Specifying required alignment format (links)

• Selection (& running) of tool/method

• Evaluation (& cleaning)
  – Considering application

Page 22: Controlled Vocabularies in TELPlus

Evaluation of Alignments

• MACS has produced mappings!
  – Possible gold standard
• But: has MACS produced all mappings?
  – Which proportion of the SHLs is covered?
  – Taking into account all indexing strings?
• Are MACS mappings the only interesting ones?
  – “Serendipity” mappings
    • Concepts that are not equivalent but could bring useful results when added to queries
  – Compensating for indexing variability
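
Using manual mappings as a gold standard usually comes down to precision and recall, with coverage answering the "which proportion is covered?" question. The sketch below assumes links are `(source, target)` pairs; the identifiers and the sample sets are invented.

```python
def evaluate(candidate, gold):
    """Precision and recall of candidate links against a gold standard."""
    candidate, gold = set(candidate), set(gold)
    correct = candidate & gold
    precision = len(correct) / len(candidate) if candidate else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


def coverage(gold, vocabulary):
    """Share of a vocabulary's headings that occur as a source in the gold standard."""
    mapped = {source for source, _ in gold}
    return len(mapped & set(vocabulary)) / len(vocabulary) if vocabulary else 0.0


# Hypothetical data: links are (source heading, target heading) pairs.
GOLD = {("A", "X"), ("B", "Y")}
CANDIDATE = {("A", "X"), ("B", "Z")}
VOCABULARY = ["A", "B", "C", "D"]

precision, recall = evaluate(CANDIDATE, GOLD)
print(precision, recall, coverage(GOLD, VOCABULARY))
```

Note that a link absent from the gold standard counts against precision here even when it is a useful "serendipity" mapping, which is one reason a gold standard alone may not be a sufficient measure.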

Page 23: Controlled Vocabularies in TELPlus

Evaluation of Alignments

• Several scenarios for using and evaluating alignments
  – Concept-based search
  – Re-indexing
  – Integration of one SHL into the other
  – SHL Merging
  – Free-text search
  – Navigation

Page 24: Controlled Vocabularies in TELPlus

Evaluation of Alignments

• Several scenarios for using and evaluating alignments
  – Concept-based search
    • Retrieving books indexed by SHL1 using SHL2 concepts
  – Re-indexing
  – Integration of one SHL into the other
  – SHL Merging
  – Free-text search
    • Matching user search terms to both SHL1 and SHL2 concepts
  – Navigation
    • Browsing several collections using one SHL structure

Page 25: Controlled Vocabularies in TELPlus

Evaluation of Alignments

• Several settings for a single scenario
  – Fully automatic reformulation vs assisted reformulation (candidates)
• Different evaluation measures
  – Good mappings vs acceptable ones
  – Number of candidates for reformulation
  – Semantic closeness to original query

Page 26: Controlled Vocabularies in TELPlus

Vocabulary Alignment

• Specifying required alignment format (links)

• Selection (& running) of tool/method

• Evaluation (& cleaning)

• Assessment of the approach
  – Efforts required, quality, extendibility

Page 27: Controlled Vocabularies in TELPlus

WP3.2 Sub-tasks

• 3.2.1. Converting the subjects to standard representation language
  – Semantic web format (SKOS)
• 3.2.2. Aligning the vocabularies
  – Semantic correspondences between subjects
• 3.2.3. Deploying the alignment knowledge obtained into TEL framework
  – E.g. using links to reformulate queries from one subject list to the other

Page 28: Controlled Vocabularies in TELPlus

Deploying the alignment knowledge obtained into TEL framework

• Observing integration of MACS data into TEL
  – Conceptual input for alignment requirements
• Integration of the obtained alignment in TEL
• Assessment of the alignment integration
  – Technical aspects, usage aspects

Page 29: Controlled Vocabularies in TELPlus

Reminder

• Alignment is a difficult problem
• Application-specific alignment pretty much unexplored in Semantic Web research

More a feasibility study than a complete solution to the problem

Practical goal: investigate how automatic techniques could help MACS-like initiatives

• Manual mapping is labour-intensive

Page 30: Controlled Vocabularies in TELPlus

Agenda

• TELPlus Context

• Improving subject access
  – 3 sub-tasks

• Services for TEL

Page 31: Controlled Vocabularies in TELPlus

WP4 – Integrating services with the European Library portal

Theo van Veen (KB)

Tasks:
• Identifying services that are going to give the user the greatest return
• Creating new services
• Integrating services within TEL…

Page 32: Controlled Vocabularies in TELPlus

WP4 – Some Services Mentioned

Preliminary inventory: no official commitment!

Services based on controlled vocabularies:
• Thesaurus and name authority service
  – Providing terms linked to query terms
• Semantic enrichment service
  – Users can annotate search results with terms
• Distance between terms and related terms

Page 33: Controlled Vocabularies in TELPlus

WP4 – Some Services Mentioned

Preliminary inventory: no official commitment!

Services based on controlled vocabularies:
• Thesaurus and name authority service
• Semantic enrichment service
• Distance between terms and related terms

Adding more value from controlled vocabularies and alignments between them

Page 34: Controlled Vocabularies in TELPlus

Thanks!