Knowledge Organization Systems and Information Discovery Douglas Tudhope Inaugural Lecture

Knowledge Organization Systems and Information Discovery

Douglas Tudhope

Inaugural Lecture

Acknowledgements

Research team members and collaborators

– Ceri Binding (University of Glamorgan)

– Andreas Vlachidis (University of Glamorgan)

– Keith May, English Heritage (EH)

– Stuart Jeffrey, Julian Richards, Archaeology Data Service (ADS), Archaeology Department, University of York

Collaborative acknowledgements

Harith Alani, Paul Beynon-Davies, Dorothee Block, Daniel Cunliffe, Emlyn Everitt, Kora Golub, Rachel Heery, Chris Jones, Iolo Jones, Steve Harris, Traugott Koch, Marianne Lykke, Brian Matthews, Stuart Lewis, Hugh Mackay, Jim Moon, Renato Souza, Carl Taylor

Information Discovery

• Literal string match (eg Google) is good for some kinds of searches:

specific concrete topics

where all we want are some relevant results

- we do not care how many we miss!

• Google is less good at more conceptual (re)search topics

where it is important to be sure we have not missed anything

eg medical, legal, scholarly research

-------------

• Searching data and documents is a recent general research focus

variously termed ... eScience, Digital Humanities, Cyberinfrastructure

- data.gov.uk a recent initiative for government data

Words are tricky!

"When I use a word," Humpty Dumpty said in rather a scornful tone,

"it means just what I choose it to mean--neither more nor less." (Lewis Carroll)

• Various potential problems with literal string search

• Different words mean the same thing

• The same word means different things

• Results can be affected by trivial spelling differences, a particular choice of synonym, or a slightly different perspective in choice of concept

- How to address this issue?

This lecture

• Brief look at the history of work on this topic at Glamorgan

• Examples from recent AHRC funded research

on cross search of different archaeological datasets and reports

- try to give a general flavour

• Discuss some current research issues

This lecture

• Part of a general move towards

a (more) machine understandable Web

Machine readable vs machine understandable

What we say to the machine:

<h1>The Cat in the Hat</h1>
<ul>
<li>ISBN: 0007158440</li>
<li>Author: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>

What the machine understands:

<h1></h1>
<ul>
<li></li>
<li></li>
<li></li>
</ul>

(More) machine understandable

What we say to the machine:

<h1>Title: The Cat in the Hat</h1>
<ul>
<li>ISBN: 0007158440</li>
<li>Author: Dr. Seuss</li>
<li>Publisher: Collins</li>
</ul>

What the machine understands:

<h1></h1>
<ul>
<li></li>
<li></li>
<li></li>
</ul>

Conceptual structure (ontology): Book, ID, Author, Publisher

Vocabularies for terminology and knowledge organization (eg Dr. Seuss = Theodor Geisel)

Knowledge Organization Systems

• Knowledge Organization Systems

eg classifications, thesauri and ontologies

help semantic interoperability

• Reduce ambiguity by defining terms

and providing synonyms

• Organise concepts via semantic relationships
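The three roles listed above (defining terms, providing synonyms, relating concepts) can be illustrated with a minimal sketch. The class names, fields and the three entries are assumptions made for illustration; they are not taken from any published thesaurus.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Concept:
    """One thesaurus concept: a preferred label, a definition, and its links."""
    preferred: str
    scope_note: str = ""                              # definition that reduces ambiguity
    synonyms: Set[str] = field(default_factory=set)   # non-preferred entry terms
    broader: Optional[str] = None                     # preferred label of the parent concept


class Thesaurus:
    def __init__(self, concepts):
        self.concepts = {c.preferred.lower(): c for c in concepts}
        # Entry vocabulary: every label, preferred or not, points at one concept.
        self.entry = {}
        for c in concepts:
            for label in {c.preferred, *c.synonyms}:
                self.entry[label.lower()] = c

    def resolve(self, word):
        """Map any known label to its concept, or None if the word is unknown."""
        return self.entry.get(word.lower())

    def ancestors(self, word):
        """Follow broader links upwards from the concept a word resolves to."""
        concept, chain = self.resolve(word), []
        while concept and concept.broader:
            concept = self.concepts[concept.broader.lower()]
            chain.append(concept.preferred)
        return chain


# Hypothetical entries for illustration only.
kos = Thesaurus([
    Concept("structure"),
    Concept("domestic feature", broader="structure"),
    Concept("hearth", "Place where a fire was made",
            {"fireplace", "fire pit"}, "domestic feature"),
])

print(kos.resolve("Fireplace").preferred)
print(kos.ancestors("fire pit"))
```

Resolving every label to a single concept is what lets a search on one synonym find material indexed under another.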


EH Monuments Type Thesaurus


Origins of research

Polytechnic of Wales Research Assistantship (collaborating with Paul Beynon-Davies, Chris Jones - Carl Taylor’s PhD)

Experimental museum exhibit

Extract of collections database - Pontypridd Historical and Cultural Centre


• Hard to generalise and maintain if based on manual linking of information

• Alternative: dynamic implicit links

In this case based on the Social History and Industrial Classification (SHIC) and indexing for place and time period

Indexing on subject, period, place

Similar or different?

Semantic similarity measure

[Diagram: Source Context and Destination Context compared to give a Similarity Coefficient]

Based on comparison of sets of SHIC concepts via a computed measure of semantic closeness

Hypermedia navigation tool (find similar) rather than a formal query

General Costume, Social Organisation, Entertainment

Mens-Costume, Sporting Organisation
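A minimal sketch of how a similarity coefficient between two sets of classification concepts might be computed. This is not the measure used in the exhibit, which the slides do not specify. It assumes each concept is identified by a dotted hierarchical code (the codes below are invented), takes closeness as the proportion of leading levels two codes share, and averages best matches in both directions.

```python
def closeness(a: str, b: str) -> float:
    """Closeness of two hierarchical codes: shared leading levels / depth of the deeper code."""
    pa, pb = a.split("."), b.split(".")
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return shared / max(len(pa), len(pb))


def similarity(source: set, destination: set) -> float:
    """Symmetric coefficient in [0, 1]: mean best-match closeness in both directions."""
    if not source or not destination:
        return 0.0
    forward = sum(max(closeness(s, d) for d in destination) for s in source) / len(source)
    backward = sum(max(closeness(d, s) for s in source) for d in destination) / len(destination)
    return (forward + backward) / 2


# Invented codes standing in for the indexing of two exhibit items.
source_context = {"1.2", "3.1"}
destination_context = {"1.2.4", "3.1.7"}
print(similarity(source_context, destination_context))
```

A graded coefficient like this supports a "find similar" navigation tool: items can be ranked by closeness instead of being filtered by an exact match.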

FACET - Faceted Access to Cultural hEritage Terminology

Subsequent EPSRC funded project

with Science Museum, National Railway Museum

and J. Paul Getty Trust - Art & Architecture Thesaurus (AAT)

Aims:

• Integration of thesaurus into user interface

• Semantic query expansion

FACET research question

“The major problem lies in developing a system whereby individual parts of subject headings containing multiple AAT terms are broken apart, individually exploded hierarchically, and then reintegrated to answer a query with relevance”

(Toni Petersen, AAT Director)

Example Query: mahogany, dark yellow, brocading, Edwardian, armchair

for National Railway Museum collection - eg royal carriage
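The research question quoted above can be sketched in three steps: break the query into terms, explode each term along the hierarchy, then reintegrate the per-term matches into one relevance score. The hierarchy, the decay weighting and the function names are all assumptions for illustration; they are not the AAT or the FACET implementation.

```python
# Hypothetical mini-hierarchy: narrower term -> broader term. Not taken from the AAT.
BROADER = {
    "mahogany": "hardwood",
    "oak": "hardwood",
    "hardwood": "wood",
    "armchair": "chair",
    "rocking chair": "chair",
    "chair": "seating",
}


def expand(term, max_steps=2, decay=0.5):
    """Explode one term along hierarchy links; the weight shrinks with each step away."""
    weights = {term: 1.0}
    frontier = [term]
    for step in range(1, max_steps + 1):
        next_frontier = []
        for t in frontier:
            neighbours = [p for c, p in BROADER.items() if c == t]    # broader terms
            neighbours += [c for c, p in BROADER.items() if p == t]   # narrower terms
            for n in neighbours:
                if n not in weights:
                    weights[n] = decay ** step
                    next_frontier.append(n)
        frontier = next_frontier
    return weights


def score(query_terms, item_terms):
    """Reintegrate: average, over query terms, of the best expanded match in the item."""
    total = 0.0
    for q in query_terms:
        expanded = expand(q)
        total += max((expanded.get(t, 0.0) for t in item_terms), default=0.0)
    return total / len(query_terms)


print(score(["mahogany", "armchair"], {"mahogany", "armchair"}))
print(score(["mahogany", "armchair"], {"oak", "chair"}))
```

Scoring each query term separately means an item that matches every facet loosely can still outrank one that matches a single facet exactly.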

FACET Web Demonstrator - Semantic Query Expansion

FACET Web Demonstrator - how to generalise?

FACET - more sophisticated search but still a single database

How to generalise to multiple datasets and thesauri?

How to connect with text documents?

STAR Semantic Technologies for Archaeological Resources

• AHRC funded project(s) with English Heritage and the ADS

Generalise previous methods to:

• Different datasets with different structures

• Reports of excavations

ADS OASIS Grey Literature Library (unpublished reports)

OASIS: Online AccesS to the Index of archaeological investigationS

STAR Semantic Technologies for Archaeological Resources

• Currently excavation datasets are isolated, with different terminology systems

• Currently there is no connection with grey literature excavation reports

Aims

• Cross search archaeological datasets and associated grey literature at a conceptual level

STAR Semantic Technologies for Archaeological Resources

• Need for an integrating conceptual framework and terminology control via thesauri and glossaries

• EH (Keith May) designed an ontology describing the archaeological process

The archaeological process

• Events in the present and events in the past,

related by the place in which they occur

and the physical remains in that place

• Activities in the present investigate the remains of the past

(affecting them in the process)

Events in the present

Excavation // Drawing and Photography

Survey // Sampling

Treatments and Processing

Classification // Grouping and Phasing

Measuring including scientific dating

Recording of observations

Dissemination // Interpretation // Analysis

Events in the past have results in the present

• Events shaping natural environment

geological, environmental and biological processes

• Events concerned with object production, disposal or loss (how ‘finds’ were produced and later deposited in an archaeological context)

• Construction, modification and destruction events relating to human buildings

Events in the past have results in the present

• Conceptual framework to model these archaeological events (an EH extension of a standard cultural heritage ontology)

• Need to move beyond the simple Who – What – Where – When model typically used in state of the art cultural heritage databases

Typical ‘Advanced Search’ model - does not deal with events

Typical Who - What - Where - When advanced search user interface

[Search form: Who, What, Where and When fields, each with and/or options, returning Resources]

Typical ‘Advanced Search’ limitations

Typical Who - What - Where - When model - needs more semantics


Archaeological ‘find’ (eg coin)

Archaeological ‘context’ (eg hearth)

Typical ‘Advanced Search’ limitations

Need to define relationships between entities and allow multiple connections

When photo was taken? When ‘find’ originally made? When ‘find’ deposited?

Typical ‘Advanced Search’ limitations

Assigning dates and classifying are important ‘events’ in the present - outcomes of the archaeological process (interpretations can differ)

Who made dating judgment?

Broader conceptual framework (ontology)

Modeling multiple interpretations, linked to underlying data within the ontology: ‘multivocality’ in archaeology

Broader conceptual framework (ontology)

EH extension of the CIDOC Conceptual Reference Model (CRM): explicit modelling of archaeological events – complicated!
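A minimal sketch of the difference an event-based model makes, compared with a single "When" field. The class and field names are plain illustrative choices, not CRM or EH ontology identifiers, and the dates and actors are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    kind: str            # e.g. "production", "deposition", "photography", "dating"
    when: str = ""       # when the event itself took place
    who: str = ""        # actor responsible, where known
    outcome: str = ""    # e.g. the period assigned by a dating judgment


@dataclass
class Entity:
    label: str                                        # e.g. "coin" or "hearth"
    events: List[Event] = field(default_factory=list)

    def when(self, kind):
        """All recorded answers to 'when?' for one kind of event."""
        return [e.when for e in self.events if e.kind == kind]

    def judgments(self, kind):
        """Who reached which outcome, for interpretive events such as dating."""
        return [(e.who, e.outcome) for e in self.events if e.kind == kind]


@dataclass
class Context(Entity):
    finds: List[Entity] = field(default_factory=list)  # finds deposited in this context


# Invented values for illustration only.
coin = Entity("coin", [
    Event("production", when="period A"),
    Event("deposition", when="period B"),
    Event("photography", when="2009", who="site photographer"),
    # Two dating judgments by different people can coexist (multivocality).
    Event("dating", when="2008", who="specialist 1", outcome="period A"),
    Event("dating", when="2010", who="specialist 2", outcome="period A or B"),
])
hearth = Context("hearth", finds=[coin])

print(coin.when("deposition"))
print(coin.judgments("dating"))
```

Because each "when" belongs to a specific event, the three questions on the slide (photo taken, find made, find deposited) get separate answers, and differing interpretations stay attached to whoever made them.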

STAR general architecture

STAR web services

STAR datasets (expressed in terms of CRM)

• EH Thesauri and CRM ontology
• Archaeological Datasets (CRM)
• Grey literature indexing (CRM)

STAR client applications

• Windows applications
• Browser components
• Full text search
• Browse concept space
• Navigate via expansion
• Cross search archaeological datasets

Natural Language Processing (NLP) of archaeological grey literature

Extract key concepts in same semantic representation as for data.

Allows unified searching of different datasets and grey literature

in terms of same underlying conceptual structure

“ditch containing prehistoric pottery dating to the Late Bronze Age”

NLP output – what the machine sees!
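As a rough illustration of what such output might look like, the sketch below annotates the example phrase with a tiny hand-made gazetteer. The gazetteer entries and the type labels (Context, Find, Period) are assumptions for illustration; the STAR pipeline and its ontology classes are not reproduced here.

```python
import re

# Hypothetical gazetteer: surface phrase -> illustrative concept type.
GAZETTEER = {
    "ditch": "Context",
    "hearth": "Context",
    "pottery": "Find",
    "coin": "Find",
    "prehistoric": "Period",
    "late bronze age": "Period",
}


def annotate(text):
    """Return (start, end, phrase, type) tuples in text order.

    Longer gazetteer phrases are matched first so that a multi-word
    phrase is not split up by a shorter entry overlapping it.
    """
    spans = []
    taken = [False] * len(text)
    for phrase in sorted(GAZETTEER, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
                spans.append((m.start(), m.end(), m.group(0), GAZETTEER[phrase]))
    return sorted(spans)


text = "ditch containing prehistoric pottery dating to the Late Bronze Age"
for start, end, phrase, kind in annotate(text):
    print(start, end, phrase, kind)
```

The point of typing each span is that the report text ends up described with the same kinds of concept as the excavation data, which is what makes unified searching possible.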

STAR Demonstrator – search for a conceptual pattern

An Internet Archaeology publication on one of the (Silchester Roman) datasets we used in STAR discusses the finding of a coin within a hearth. Does the same thing occur in any of the grey literature reports?

Requires comparison of extracted data with NLP indexing in terms of the ontology.

STAR Demonstrator – search for a conceptual pattern

Research paper reports finding a coin in a hearth – does this exist elsewhere?
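A minimal sketch of such a pattern search, assuming that statements extracted from datasets and statements produced by NLP indexing have already been reduced to one shared form. The statement fields, source identifiers and index entries are invented for illustration.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    source: str      # dataset or report identifier
    origin: str      # "data" (extracted from a dataset) or "nlp" (indexed from a report)
    find: str        # type of find
    context: str     # type of context the find was deposited in


# Hypothetical index entries for illustration only.
INDEX = [
    Statement("dataset-1", "data", "coin", "hearth"),
    Statement("report-17", "nlp", "coin", "hearth"),
    Statement("report-17", "nlp", "pottery", "ditch"),
    Statement("report-42", "nlp", "coin", "pit"),
]


def find_in_context(index, find, context):
    """Sources asserting the pattern 'find of this type within a context of this type'."""
    hits = {}
    for s in index:
        if s.find == find and s.context == context:
            hits.setdefault(s.source, set()).add(s.origin)
    return hits


print(find_in_context(INDEX, "coin", "hearth"))
```

Keeping the origin alongside each hit lets a user see whether a match comes from recorded data or from automatically indexed text, which carry different levels of reliability.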

Current issues and goals

a) Apply research outcomes in practice (knowledge transfer)

semantic terminology services

‘rubbish example’ using the ADS Archaeology Image Bank

b) NLP challenges

negation!

Negative findings?

c) Multivocality in archaeology

broader picture of the research issues

Archaeology is rubbish!

• Google search for archaeology rubbish

ADS Archaeology Image Bank Example

No results when searching for rubbish or refuse – what to do?

STAR Semantic Terminology Services - concept expansion (as web service): midden

MIDDEN n dunghill, refuse heap

midden

dunghill, compost heap, refuse heap,

... muddle, mess

... dirty slovenly person

... midden mavis or midden raker --- searchers of refuse heaps

(Concise Scots dictionary - Mairi Robinson, Scottish National Dictionary Association)

 

ADS Archaeology Image Bank Example

No results when searching for rubbish or refuse – try midden!

NLP challenges – not just negation detection

NLP challenges – need for negative findings!
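A minimal sketch of trigger-phrase negation detection, in the general spirit of approaches such as NegEx. The trigger list, the window size and the example sentences are assumptions, and real report text would need far more care. In line with the slide, a negated mention is flagged, not discarded, since "no pottery was found" is itself a finding.

```python
import re

# Illustrative trigger phrases; a real list would be much longer.
NEGATION_TRIGGERS = ("no", "not", "without", "absence of", "lack of")


def is_negated(sentence, term, window=5):
    """True if a negation trigger occurs within `window` words before `term`."""
    words = re.findall(r"[a-z']+", sentence.lower())
    term_words = term.lower().split()
    n = len(term_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == term_words:
            # Pad with spaces so triggers only match whole words.
            preceding = " " + " ".join(words[max(0, i - window):i]) + " "
            if any(" " + t + " " in preceding for t in NEGATION_TRIGGERS):
                return True
    return False


def index_mention(sentence, term):
    """Record the mention together with its polarity, keeping negative findings."""
    return {"term": term, "negated": is_negated(sentence, term)}


print(index_mention("No evidence of Roman pottery was found in the ditch", "pottery"))
print(index_mention("The ditch contained Roman pottery", "pottery"))
```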

Archaeologists have to plan for the future

“Research excavations, therefore, must be planned for posterity, eschewing the quick answer and setting up a framework of excavation and recording which can be handed over, extended, modified and improved over decades and in some cases, centuries.”

Techniques of Archaeological Excavation, Philip Barker (1993)

• Archaeology in particular lends itself to the reuse of (excavation) data

• Connect interpretations with the underlying data

• Revisit previous archaeological interpretations and findings

- excavations inevitably based on a limited sample

Archaeological Multivocality - more voices involved than just original project team?

• Expose (invisible) datasets for wider analysis and reuse

• Meta studies comparing different excavation projects

• Connect datasets and wider grey literature – look for wider patterns

• Open up a broader range of research questions that might be answered when we connect currently isolated excavation datasets

• Allow different communities to share data and expertise

Words are tricky!

We should have a great many fewer disputes in the world if words were taken for what they are,

the signs of our ideas only, and not for things themselves. (John Locke)

• Emergent classification? – an outcome of the archaeological process

- both constructing and constraining the world

• Map between different classifications and glossaries

rather than one imposed standard?

Words are tricky!

Words are not as satisfactory as we should like them to be, but, like our neighbours,

we have got to live with them and must make the best and not the worst of them. (Samuel Butler)

• Major issues remain

• but knowledge organization systems offer some current assistance for moving beyond literal string search and making the best of the words we have to use