Towards Terminology Services: Reflections from the FACET Project Doug Tudhope Hypermedia Research Unit University of Glamorgan OCLC seminar, April, 2006

Towards Terminology Services: Reflections from the FACET Project

Doug Tudhope
Hypermedia Research Unit

University of Glamorgan

OCLC seminar, April, 2006

Presentation

• FACET Project
  – Faceted Knowledge Organisation Systems (KOS)
  – Semantic query expansion
  – Web Demonstrator
  – Evaluation
  – Need for standard representations and API

• Current work
  – Terminology Services
  – Pilot KOS web service browser
  – Semantic expansion service?

• Role for KOS in the Semantic Web?
  – Need to articulate context/rationale for KOS
  – What kind of Semantic Web?

FACET - Faceted Access to Cultural hEritage Terminology

FACET - a collaborative project investigating the potential of semantic expansion in retrieval

Aims:
• Integration of thesaurus into search process / interface
• Semantic query expansion taking advantage of facet structure

http://www.comp.glam.ac.uk/~FACET/

FACET Collaborators

• Research Council funding: EPSRC, 3 years

• National Museum of Science and Industry (NMSI): National Railway Museum and Science Museum Collections Database

• J. Paul Getty Trust: Art and Architecture Thesaurus (AAT)

• Museum Documentation Association (MDA): Railway Thesaurus

• Canadian Heritage Information Network (CHIN): Advisors

Semantic Expansion

Expanding over relationships in thesauri and related KOS allows the system to play an active role

• Ranking of matching results by semantic closeness
• Query expansion (automatic/interactive)
• Augmented browsing tools

Underpinning technologies:
• Measures of distance over the semantic index space
• Multi-concept matching function

• Immediate application is to controlled vocabulary indexing, but also relevant to free-text query expansion
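
A minimal sketch of semantic expansion as a distance-bounded traversal of thesaurus relationships. It assumes each relationship type carries a traversal cost and that expansion stops at a distance threshold. The cost values, the toy thesaurus and the function names are hypothetical illustrations, not the FACET implementation.

```python
import heapq

# Hypothetical traversal costs per relationship type (not FACET's actual values).
COST = {"NT": 1.0, "BT": 1.5, "RT": 2.0}

# Toy thesaurus: term -> list of (relationship, neighbouring term). Illustrative only.
THESAURUS = {
    "chairs": [("NT", "armchairs"), ("NT", "rocking chairs"), ("BT", "seating furniture")],
    "armchairs": [("BT", "chairs"), ("NT", "Carver chairs")],
    "Carver chairs": [("BT", "armchairs")],
    "rocking chairs": [("BT", "chairs")],
    "seating furniture": [("NT", "chairs"), ("RT", "upholstery")],
    "upholstery": [("RT", "seating furniture")],
}


def expand(term, max_distance):
    """Return {term: distance} for every term within max_distance of `term`.

    Dijkstra-style search: the distance of a term is the cheapest sum of
    relationship costs along any path from the starting term.
    """
    best = {term: 0.0}
    frontier = [(0.0, term)]
    while frontier:
        dist, current = heapq.heappop(frontier)
        if dist > best.get(current, float("inf")):
            continue  # stale queue entry
        for relation, neighbour in THESAURUS.get(current, []):
            new_dist = dist + COST[relation]
            if new_dist <= max_distance and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(frontier, (new_dist, neighbour))
    return best


def closeness(distance, max_distance):
    """Map a distance to a 0..1 score (1.0 means an exact match)."""
    return 1.0 - distance / max_distance


if __name__ == "__main__":
    expanded = expand("armchairs", max_distance=3.0)
    for t, d in sorted(expanded.items(), key=lambda item: item[1]):
        print(f"{t}: distance {d}, closeness {closeness(d, 3.0):.2f}")
```

Ranking results by `closeness` gives the "semantic closeness" ordering; raising or lowering `max_distance` is one way a user or system could control how far expansion reaches.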

Faceted Knowledge Organisation Systems

Faceted systems based on primary division into fundamental, high-level categories (facets)

Compound descriptors (multi-concept headings) are synthesised by combination of terms from limited number of fundamental facets

In constructing the AAT, adjectival noun phrases were very common, e.g. painted oak furniture

“Rather than enumerate the nearly infinite number of object and subject descriptions needed by thesaurus users, the AAT decided to pursue the building blocks of these descriptors in the form of a faceted vocabulary”

(Guide to Indexing and Cataloging with the Art & Architecture Thesaurus)
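
A small sketch of how a compound descriptor might be represented as a combination of terms, each drawn from one facet. The facet names assigned to each term here are illustrative assumptions, and `Term` and `synthesise` are hypothetical names, not part of the AAT or FACET.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str
    facet: str  # facet assignments below are illustrative assumptions


def synthesise(*terms):
    """Combine single-concept terms into a compound descriptor, keyed by facet."""
    descriptor = {}
    for term in terms:
        descriptor.setdefault(term.facet, []).append(term.label)
    return descriptor


# "painted oak furniture" built from three building-block terms.
painted_oak_furniture = synthesise(
    Term("painted", "Processes and Techniques"),
    Term("oak", "Materials"),
    Term("furniture", "Objects"),
)
```

Keeping the facet with each term is what later allows each part of the descriptor to be expanded and matched separately.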

Compound Descriptors and Queries

e.g. painted oak furniture

• Multi-concept subject headings allow highly specific descriptions and offer promise of precise queries

• However practical focus has tended to be on cataloguing rather than searching

• Poses problems for recall in retrieval and for browsing. Full potential yet to be exploited in retrieval

Matching Problem

“The major problem lies in developing a system whereby individual parts of subject headings containing multiple AAT terms are broken apart, individually exploded hierarchically, and then reintegrated to answer a query with relevance”

(Toni Petersen, AAT Director)

e.g.
Query: mahogany, dark yellow, brocading, Edwardian, armchair
Descriptor: oak, light yellow, crests, ovals, brocade, Victorian, Carver chair

Potentially extra / missing / partially and non-matching terms

Matching Problem (continued)

• Focus term must match after expansion
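
A hedged sketch of a multi-concept matching function in the spirit of the problem statement: each query term is compared separately against the descriptor terms, the best partial match per query term is kept, and the parts are recombined into one score. The toy hierarchy, the linear closeness measure, the averaging rule and the hard focus-term requirement are all assumptions made for illustration. They are not the published FACET function.

```python
# Toy broader-term hierarchy (child -> parent); illustrative only, not AAT data.
BROADER = {
    "mahogany": "wood",
    "oak": "wood",
    "dark yellow": "yellow",
    "light yellow": "yellow",
    "Carver chair": "armchair",
    "armchair": "chair",
    "Edwardian": "British periods",
    "Victorian": "British periods",
}


def ancestors(term):
    """Return [term, parent, grandparent, ...] following broader-term links."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def closeness(a, b, max_steps=3):
    """1.0 for identical terms, falling linearly with hierarchy steps; 0.0 if unrelated."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = [t for t in up_a if t in up_b]
    if not common:
        return 0.0
    steps = up_a.index(common[0]) + up_b.index(common[0])
    return max(0.0, 1.0 - steps / max_steps)


def match(query, descriptor, focus):
    """Score a descriptor against a multi-concept query (0.0 to 1.0).

    Extra descriptor terms are ignored; missing or non-matching query terms
    contribute 0; the focus term must match at least partially.
    """
    best = {q: max((closeness(q, d) for d in descriptor), default=0.0) for q in query}
    if best[focus] == 0.0:
        return 0.0
    return sum(best.values()) / len(query)


query = ["mahogany", "dark yellow", "brocading", "Edwardian", "armchair"]
descriptor = ["oak", "light yellow", "crests", "ovals", "brocade", "Victorian", "Carver chair"]
score = match(query, descriptor, focus="armchair")
```

In this toy data "brocading" and "brocade" are unrelated, so that query term scores 0. A real thesaurus could link them through an associative relationship.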

Query expansion (on query as a whole)

FACET Queries with Results

Evaluation and user study with standalone version

• Exploratory to assess how people search for information and how thesauri can inform this process.

• Formative to support further development of the research prototype.

• Dorothee Blocks PhD Thesis 2004. A qualitative study of thesaurus integration for end-user searching.

About the Evaluation (from Blocks 2004)

• Qualitative evaluation methodology employed
• Participants were professionals in collaborating institutions
• 20 sessions totalling 22 hours were conducted
• Each participant was given 3 tasks to complete, e.g. “Please search the collection for something similar to the item in the photograph. Please try to be specific.”

Some issues from the evaluation

• Initial allocation of functionality to interface elements did not support the stages of the search process

• Breaking down tasks into components from different facets

• Reformulating queries

• Expansion control on individual terms

• Model of controlled vocabulary search process

The complete model diagram (Blocks 2004)

• Matching user terms to KOS

• Selecting suitable KOS terms

• Including terms in the query

• Setting up the query

• Executing the query

• Retrieval of results

• Evaluating the success of the query

• Inspecting individual results – using information can lead to query reformulation

System Architecture

[Diagram: system architecture. Component labels:]

• Database
• SQL Server databases (collections & thesaurus)
• Transact-SQL stored procedures
• Database interaction module
• Active-X Data Objects (ADO) data access components
• Application data objects
• Expansion engine (and data structure)
• Query and matching functions
• Persistent XML data: queries, parameters etc.
• Application interfaces
• Compiled VB client interface and web browser interface

Interactive/Automatic Thesaurus Query Expansion

• Statistical IR
  – Uncontrolled vocabulary, auto-indexing
  – IQE/AQE: terms added to query
  – Exact match with probabilistic weighting
  – Tampere experiments with thesaurus AQE and strongly-structured queries support faceted approach
  – Greenberg recent experiments on QE by thesaurus relationships

• FACET
  – Controlled vocabulary, intellectual indexing
  – Hybrid I/A QE: user selects terms to expand, then AQE
  – Semantic degree-of-match with faceted queries

FACET Web Demonstrator

• Illustrates thesaurus based expansion and faceted search

• Intended as an exploration of FACET research outcomes via dynamically generated Web components rather than a complete final interface

• Based on custom API for thesaurus programmatic access

• Browser-based interface (ASP application), using a combination of server-side scripting and compiled components

http://www.comp.glam.ac.uk/~FACET/webdemo/
http://jodi.tamu.edu/Articles/v04/i04/Binding/

FACET Web Demonstrator

Semantic Query Expansion

Some lessons learned

• Results show potential of faceted KOS for
  – Query expansion with semantically ranked results
  – Real-time implementation of multi-concept matching function
  – Semantic expansion as a browsing tool
  – Potential to combine with statistical and linguistic techniques

How to generalise?

• Need for common KOS representations and APIs

Towards Terminology Services

• KOS-based services as elements of applications with some form of search/indexing component

• Next phase of work looks at common KOS representation formats and API protocols - making content available via programmatic interfaces

• Eg SKOS Core (RDF/XML) Schema and SKOS API deliverables of SWAD-Europe Thesaurus Activity - http://www.w3.org/2001/sw/Europe/reports/thes

• Experiments with XPath-based KOS interfaces (using XML and SKOS schemas) are promising for relatively small KOS held within the web browser, e.g. for interactive possibilities such as rollover
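
A minimal sketch of path-based access to a small KOS held in memory as SKOS RDF/XML. It uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath, in place of the browser-side XPath used in the experiments. The two-concept fragment and its `example.org` URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Hypothetical two-concept fragment; the example.org URIs are placeholders.
SKOS_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/kos/chairs">
    <skos:prefLabel>chairs</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/kos/armchairs">
    <skos:prefLabel>armchairs</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/kos/chairs"/>
  </skos:Concept>
</rdf:RDF>"""

# ElementTree exposes namespaced attributes under "{namespace-uri}name" keys.
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

root = ET.fromstring(SKOS_XML)

# Index preferred labels by concept URI.
labels = {
    concept.get(RDF_ABOUT): concept.findtext("skos:prefLabel", namespaces=NS)
    for concept in root.findall("skos:Concept", NS)
}


def broader_labels(uri):
    """Return the preferred labels of the concepts broader than `uri`."""
    for concept in root.findall("skos:Concept", NS):
        if concept.get(RDF_ABOUT) == uri:
            return [labels[b.get(RDF_RESOURCE)] for b in concept.findall("skos:broader", NS)]
    return []
```

Because the whole vocabulary sits in memory, lookups like `broader_labels` need no server round trip, which is what makes interactions such as rollover feasible for small KOS.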

SKOS API

• SKOS Core (RDF/XML) Schema and SKOS API deliverables of SWAD-Europe Thesaurus Activity - http://www.w3.org/2001/sw/Europe/reports/thes

• SKOS API designed to provide programmatic access to thesauri and related KOS in SKOS Core – builds on previous NKOS work on KOS protocols

• Example SKOS API calls
  – getConcept (uri)
  – getConceptsMatchingKeyword/Regex (string)
  – getAllConceptRelatives (concept)
  – getSupportedSemanticRelations
  – getAllConceptRelatives (concept, relation)
  – getAllConceptsByPath (concept, relation, distance)
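
To make the shape of these calls concrete, here is a toy in-memory stand-in. Only the call names come from the list above; the parameter types, return values and exact semantics are assumptions, and `InMemoryThesaurus` is a hypothetical class, not the SKOS API reference implementation. Only the Regex variant of the keyword/regex call is sketched.

```python
import re
from collections import deque


class InMemoryThesaurus:
    """Toy stand-in for a thesaurus service; semantics are assumed, not specified."""

    def __init__(self, concepts, relations):
        # concepts: {uri: label}; relations: {(uri, relation): [uri, ...]}
        self.concepts = concepts
        self.relations = relations

    def getConcept(self, uri):
        return {"uri": uri, "label": self.concepts[uri]}

    def getConceptsMatchingRegex(self, pattern):
        return [u for u, label in self.concepts.items() if re.search(pattern, label)]

    def getSupportedSemanticRelations(self):
        return sorted({rel for (_, rel) in self.relations})

    def getAllConceptRelatives(self, uri, relation=None):
        """Directly related concepts, optionally restricted to one relation."""
        return [
            target
            for (source, rel), targets in self.relations.items()
            if source == uri and relation in (None, rel)
            for target in targets
        ]

    def getAllConceptsByPath(self, uri, relation, distance):
        """Concepts reachable by following `relation` at most `distance` times."""
        seen, queue, found = {uri}, deque([(uri, 0)]), []
        while queue:
            current, depth = queue.popleft()
            if depth == distance:
                continue
            for nxt in self.getAllConceptRelatives(current, relation):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, depth + 1))
        return found


# Hypothetical data: short ids stand in for concept URIs.
kos = InMemoryThesaurus(
    concepts={"c1": "furniture", "c2": "chairs", "c3": "armchairs", "c4": "upholstery"},
    relations={
        ("c1", "narrower"): ["c2"],
        ("c2", "narrower"): ["c3"],
        ("c2", "broader"): ["c1"],
        ("c2", "related"): ["c4"],
        ("c3", "broader"): ["c2"],
    },
)
```

A call such as `kos.getAllConceptsByPath("c1", "narrower", 2)` walks two levels down the hierarchy in a single request, the kind of operation a semantic expansion service would build on.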

Pilot KOS Browser Client Web Service

• Developed pilot to work with a remote server as an initial experiment with the SKOS API, a 'rich client' browser displaying details for thesaurus concepts via web service calls

• Uses GEMET - GEneral Multilingual Environmental Thesaurus

• DREFT demonstration web services server based on SKOS API developed at ILRT, Bristol University http://www.w3.org/2001/sw/Europe/reports/thes/dreft/

• Only a subset of SKOS API calls were available at time of work due to local requirements

So we investigated possibilities with just 2 API calls

Pilot SKOS API Web Service Browser

getConcept and getAllConceptRelatives show semantically connected concepts but not relationships

Navigation history and local cache of retrieved concepts implemented

API needs more work but is a possible basis for web services

Caching

• Thesaurus data relatively static - change unlikely during a session

• Caching of concepts helps prevent unnecessary repeated server calls.

• Implementation of concept caching made a significant difference to apparent speed of operation

[Diagram: a navigation history (list of URIs with Previous, Current and Next pointers) alongside a concept cache, shown in three states]

• Navigate to next concept in the history
• Navigate to previous concept in the history
• Navigate to new concept not previously retrieved (concept fetched from the server via the web service)

Retrieved concepts are cached to avoid repeated server calls.
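
A minimal sketch of the navigation history and concept cache described above. The `fetch` callable stands in for the web service call; `ConceptBrowser` and its method names are hypothetical, and discarding "forward" history on a new navigation is an assumption borrowed from web browser behaviour.

```python
class ConceptBrowser:
    """Navigation history plus a local cache of retrieved concepts."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable: uri -> concept (stands in for the web service)
        self.cache = {}         # uri -> concept
        self.history = []       # visited URIs, oldest first
        self.position = -1      # index of the current URI in history
        self.server_calls = 0   # counts cache misses, i.e. real server requests

    def _load(self, uri):
        """Return the concept for `uri`, calling the server only on a cache miss."""
        if uri not in self.cache:
            self.server_calls += 1
            self.cache[uri] = self.fetch(uri)
        return self.cache[uri]

    def _current(self):
        if not self.history:
            return None
        return self._load(self.history[self.position])

    def navigate(self, uri):
        """Go to a concept, discarding any 'forward' history (an assumption)."""
        del self.history[self.position + 1:]
        self.history.append(uri)
        self.position += 1
        return self._load(uri)

    def previous(self):
        if self.position > 0:
            self.position -= 1
        return self._current()

    def next(self):
        if self.position < len(self.history) - 1:
            self.position += 1
        return self._current()


# Usage: five navigation steps, but only two distinct concepts are ever fetched.
browser = ConceptBrowser(fetch=lambda uri: {"uri": uri})
browser.navigate("a")
browser.navigate("b")
browser.previous()
browser.next()
browser.navigate("a")
```

Since thesaurus data is unlikely to change during a session, the cache here never expires entries; a longer-lived client would need some invalidation policy.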

Future issues

More complex services as API protocol elements:
• more advanced natural language functionality
• cross-mapping provision
• data-dependent filters (such as number of postings)
• semantic expansion as a service
  – different configurations of KOS interface displays by single call
  – novel interfaces, such as navigation via semantic expansion
  – query expansion for various ranked result query services
  – term suggestion to assist indexing/annotation

More details: KOS at your Service: Programmatic Access to Knowledge Organisation Systems http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Binding/

Taxonomy of Knowledge Organisation Systems (Gail Hodge)

Term Lists
• Authority Files, Glossaries, Gazetteers, Dictionaries

Classification and Categorization
• Subject Headings
• Classification Schemes and Taxonomies, e.g. DDC, scientific taxonomies

Relationship Schemes
• Thesauri
• Semantic Networks (e.g. WordNet)
• (Ontologies)

http://www.clir.org/pubs/abstract/pub91abst.html

Bridge/migration between KOS and Ontologies?

• KOS as elements of higher level ontologies and schemas – can help leverage them.

• Eg map a thesaurus to a top Ontology

• SKOS RDF/XML Schemas as a possible bridging step

• Ontologies (taken as formal precise definition of relationships) can be combined with inference rules and logic systems in applications with well defined objects and operations

But rationale behind KOS not well understood in Semantic Web

How do intended contexts of use compare?

Types of Knowledge Organisation System (KOS)

from Zeng & Salaba: FRBR Workshop, OCLC 2005

[Diagram: KOS types arranged from weakly-structured to strongly-structured and from natural language to controlled language]

Term Lists: Pick lists, Synonym Rings, Authority Files, Glossaries/Dictionaries, Gazetteers

Classification & Categorization: Subject Headings, Classification schemes, Taxonomies, Categorization schemes

Relationship Groups: Ontologies, Semantic networks, Thesauri

Ontology and Information Systems (Barry Smith)

• “Philosophical ontology as I shall conceive it here is what is standardly called descriptive or realist ontology. It seeks not explanation but rather a description of reality in terms of a classification of entities that is exhaustive in the sense that it can serve as an answer to such questions as: What classes of entities are needed for a complete description and explanation of all the goings-on in the universe?”

• Ontological Commitment: “Some philosophers have thought that the way to do ontology is exclusively through the investigation of scientific theories. With the work of Quine (1953) there arose in this connection a new conception of the proper method of ontology, according to which the ontologist’s task is to establish what kinds of entities scientists are committed to in their theorizing.”

Two Types of Ontology Systems (Barry Smith)

• “Perhaps we can resolve our puzzle as to the degree to which information systems ontologists are indeed concerned to provide theories which are true of reality – as Patrick Hayes would claim – by drawing on a distinction made by Andrew Frank (1997) between two types of information systems ontology.

• On the one hand there are ontologies – like Ontek’s PACIS and IFOMIS’s BFO – which were built to represent some pre-existing domain of reality. Such ontologies must reflect the properties of the objects within its domain in such a way that there obtain substantial and systematic correlations between reality and the ontology itself.

• On the other hand there are administrative information systems, where (as Frank sees it) there is no reality other than the one created through the system itself. The system is thus, by definition, correct.”

AI Ontology Background (Barry Smith)

• Knowledge Representation Ontologies growing out of background in:
  – “Database Tower of Babel Problem” (e-commerce)
  – Modelling of scientific theories (Gene Ontology etc.)

• AI goal of radically extending scope of automation

• “Generally, and in part for reasons of computational efficiency rather than ontological adequacy, information systems ontologists have devoted the bulk of their efforts to constructing concept-hierarchies; they have paid much less attention to the question of how the concepts represented within such hierarchies are in fact instantiated in the real world of what happens and is the case.”

What is an Ontology? (T. Gruber) - http://ksl-web.stanford.edu/people/gruber/

• “In the context of knowledge sharing, I use the term ontology to mean a specification of a conceptualization. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.

• Practically, an ontological commitment is an agreement to use a vocabulary (i.e., ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology. We build agents that commit to ontologies. We design ontologies so we can share knowledge with and among these agents.

• A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.

• For AI systems, what "exists" is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse.”

Semiotic Triangle (Ogden and Richards, 1923) reproduced in Campbell et al. 1998, Representing Thoughts, Words, and Things in the UMLS

Often referred to in Semantic Web literature

Needs to be problematised

Only indirect link via an interpreter

Semiotic Triangle (Ogden and Richards, 1923) reproduced in Campbell et al. 1998, Representing Thoughts, Words, and Things in the UMLS

(AI) Ontology tends to be …

Instance of scientific concept

Fact in a ‘possible world’

Semiotic Triangle (Ogden and Richards, 1923) reproduced in Campbell et al. 1998, Representing Thoughts, Words, and Things in the UMLS

Information retrieval (subject) KOS tends to be …

Probable relevance - aboutness

Inter/Intra indexer consistency ? (eg Bates 1986)

typically a complex entity

KOS as metadata - Index (or classify) a resource

[Diagram: Semiotic Triangle (after Ogden & Richards) with vertices Term (Symbol), Concept (Thought) and Resource (Referent), and a SubjectOf relationship]

Indexed resource traditionally a complex entity such as a ‘document’ or image. Semantic Web a wider context for resource

Resource probably about concept, to some extent, based on probable relevance judgments

• SubjectOf is via “aboutness”, not a clear-cut instance relationship
• Indexer (searcher) vocabulary consistency (e.g. Bates 1986)
  – likely to differ in terminology judgments
• One reason for informal modelling approach of KOS

KOS - Informal by design?

• KOS designed to assist perceived needs of information retrieval users rather than modelling a simplified reality of a domain
  – basis of (much) KOS construction is intended assistance in indexing/searching/browsing and generalised retrieval as much as logical properties of attributes
  – implications: levels of specialisation, granularity of relationships

• Many KOS by design informal structures
  – pragmatic compromises for different uses
  – semantic relationships often ‘fuzzy’

• Semantic organisation understood as conventional
  – could be otherwise, different viewpoints inevitable
  – users assisted to explore and appropriate

Distributed KOS meaningful?

• Meaning of a concept depends on its semantic context within a KOS (and indexing practice, relevance judgements)

Eg of KOS fragment (Getty AAT in FACET Web Demonstrator)

Not necessarily straightforward to
• apply KOS concepts out of this context (e.g. magenta)
• link in to other distributed structures and contexts

• Some ‘open world’ Semantic Web implications problematic?

How to apply KOS?

• What is the purpose of a given KOS? We need to specify/articulate this more clearly

• Domain-dependent level of precision in concept use. Important to take into account how applications will process concepts

Current KOS relationships at a useful level of generality for many retrieval-based applications (with some specialisation?)

• Cost/benefit issues for KOS applications in granularity of relationships and degree of formalisation

KOS in what kind of Semantic Web?

• Role for knowledge-based interactive tools in semantic web applications

(in addition to emphasis on AI machine reasoning)

– Reminiscent of old debates on appropriate limits to automation

– A balance between system and human ‘agency’

– Expert Systems or … Systems for Experts?

Smart, interactive tools allowing scope for tacit knowledge, informal representations

Contact Information

Doug Tudhope
School of Computing
University of Glamorgan
Pontypridd CF37 1DL
Wales, UK

[email protected]
http://www.comp.glam.ac.uk/pages/staff/dstudhope

References

Bates M. 1986. Subject access in online catalogs: a design model. Journal of the American Society for Information Science, 37(6), 357-376.

Binding C., Tudhope D. 2004. KOS at your Service: Programmatic Access to Knowledge Organisation Systems. JoDI 4(4), http://jodi.tamu.edu/Articles/v04/i04/Binding/

Blocks D., Cunliffe D., Tudhope D. 2006 (in press). A reference model for user-system interaction in thesaurus-based searching. Journal of the American Society for Information Science and Technology.

Campbell K., Oliver D., Spackman K., Shortliffe E. 1998. Representing Thoughts, Words, and Things in the UMLS. Journal of the American Medical Informatics Association, 5(5), 421-431.

FACET Web demonstrator. http://www.comp.glam.ac.uk/~FACET/webdemo/

FACET XPath browsers. http://www.comp.glam.ac.uk/~FACET/formats/

Greenberg J. 2001. Automatic query expansion via lexical-semantic relationships. Journal of the American Society for Information Science and Technology, 52(5), 402-415.

Gruber T. What is an ontology? http://ksl-web.stanford.edu/people/gruber/

Hendler J. Ontologies on the Semantic Web. In (S. Staab, ed.) Trends & Controversies, IEEE Intelligent Systems, 73-74.

Järvelin K., Kekäläinen J., Niemi T. 2001. ExpansionTool: concept-based query expansion and construction. Information Retrieval, 4(3/4), 231-255.

Smith B. 2003. Ontology. In L. Floridi (ed.), Blackwell Guide to the Philosophy of Computing and Information, Oxford: Blackwell, 155–166. (Longer draft at http://ontology.buffalo.edu/ontology(PIC).pdf)

Tudhope D., Binding C., Blocks D., Cunliffe D. 2002. Compound Descriptors in Context: A Matching Function for Classifications and Thesauri. JCDL 2002, 84-93.

Tudhope D., Binding C. 2005. Towards Terminology Services: experiences with a pilot web service thesaurus browser. Proc. International Conference on Dublin Core and Metadata Applications (DC 2005), 269-273. (version forthcoming in ASIST Bulletin).

Tudhope D., Binding C., Blocks D., Cunliffe D. 2006 (in press). Query expansion via conceptual distance in thesaurus indexed collections. Journal of Documentation.