www.ontopedia.net
ONTOPEDIA: The Identity of Everything
Computing and Linguistics: A Cognitive Approach
or, Computing “As We May Think”
Steve Pepper
University of Oslo, 2009-04-21
Today’s “research questions”
How can linguistics – and in particular cognitive linguistics – inform our work with Topic Maps?
Can Topic Maps contribute in any way to the cognitive linguistics project?
Plan of action
– I tell you about Topic Maps (conceptual model)
– I draw some parallels with natural language
– You correct me, elaborate and suggest new directions
Relevance to you as linguists
As users of the technology
– organizing data collected in your research
As consultants to users of the technology
– e.g. universities, government agencies, private enterprise
As contributors to the standard
– clarify some of the cognitive issues, establish best practices, help extend the standard
As lobbyists to the University of Oslo
– if you think the new UiO web site should be based on Topic Maps, please make your views known to the project group: http://www.admin.uio.no/prosjekter/nyuioweb/
Relevance in general
1. We need to organize information in a new way
– “The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.” (Vannevar Bush, As We May Think, 1945)
2. We need new ways of managing knowledge
– In today’s global knowledge economy, knowledge is the key asset in many organizations...
Topic Maps makes major contributions in both areas
– See the use cases presented at recent Topic Maps conferences: http://www.topicmaps.com
What is Topic Maps?
An ISO standard for computer-based information and knowledge management
– “Provides the ability to control infoglut and share knowledge by connecting any kind of information from any kind of source based on its meaning”
A “semantic technology”
– Cf. Semantic Web (RDF, OWL)
– A form of knowledge representation (primitive perhaps, but useful)
Widely used for web-based delivery of information
– Plus: Information Integration, eLearning, Business Process Modeling, Product Configuration, Business Rules Management, Asset Management, Knowledge Management, …
The problem with computing...
...is that it’s inside-out! People used to think the sun revolved around the earth
– Copernicus’ heliocentric theory turned this idea inside out and revolutionized our understanding of the universe
Today we face a similar situation in computing
– Our computing universe has computers, applications and documents at the centre
– The concepts that our information is about are somewhere in outer space where they can’t be found
A subject-centric revolution
This is wrong, because it does not reflect how humans think
– We think in terms of interrelated concepts (or subjects)
– Subjects are what interest us, not documents or applications
– And so subjects must be given centre stage
We need a subject-centric revolution
– This has ramifications for every aspect of human-computer interaction, including user interfaces, operating systems, file systems, etc.
– Consider the typical user desktop...
Today our desktops are application-centric and document-centric: icons represent applications and documents
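[Figure: a subject-centric desktop whose icons represent subjects (topic maps, tm2008, bantu semantics, LING 2110, INF 2820, rana, keynote, OOXML, K185, gambia, opera, janacek, bayreuth, håkon), with a context menu for TM2008 offering: Topic page, Emails, Documents, Web pages, Copy PSI]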
Why can’t they be subject-centric, with icons that represent the subjects we are interested in?
With links between related icons? And with context menus that allow us to find everything related to a particular subject?
Computing “As We May Think”
Bush’s solution to information overload:
– Organize information “As We May Think”, i.e. associatively
His vision spawned the hypertext movement
– Doug Engelbart, Ted Nelson, Bill Atkinson, Tim Berners-Lee, ...
– The World Wide Web is its greatest triumph to date
But hypertext does not correspond to how we think
– Our heads are not full of millions of interlinked documents
– They are full of “interlinked” concepts (or subjects)
Topic Maps provides a close approximation to this
– It is a technology that is based on cognitive principles
Background to Topic Maps
Emerged from the SGML community in the 1990s
– Use case: How to merge (digital) back-of-book indexes
– Some input from library science
– No input from linguists
– Precious little input from computer scientists before 2001
– Most of the SGML community came from the humanities
ISO 13250 first published in 2000 (recently revised)
– A model for representing knowledge organization structures (indexes, glossaries, thesauri, encyclopedias)
– Plus interchange syntax, query language, constraint language, ...
Widely adopted in Norway (esp. public sector)
– And gaining ground elsewhere
The TAO of Topic Maps
The core concepts are derived from the back-of-book index
Extended and generalized for use with digital information
Consider a two-layer model consisting of
– a set of information resources (below)
– a “knowledge map” (above)
This is like the division of a book into content and index
[Figure: the knowledge layer (INDEX) sits above the information layer (CONTENT)]
Sample index:
Callas, Maria … 42
Cavalleria Rusticana … 71, 203-204
Mascagni, Pietro
  Cavalleria Rusticana … 71, 203-204
Pavarotti, Luciano … 45
Puccini, Giacomo … 23, 26-31
  Tosca … 65, 201-202
Rustic Chivalry, see Cavalleria Rusticana
singers … 39-52
  baritone … 46
  bass … 46-47
  soprano … 41-42, 337
  tenor … 44-45
  see also Callas, Pavarotti
Tosca … 65, 201-202
(1) The information layer
The lower layer contains the content
– usually digital, but need not be
– can be in any format or notation or location
– can be text, graphics, video, audio, whatever
This is like the content of the book to which the back-of-book index belongs
[Figure: the information layer (CONTENT)]
(2) The knowledge layer
The upper layer consists of (typed) topics and associations
– Topics represent the subjects that the information is about, like the list of topics that forms a back-of-book index
– Associations represent relationships between those subjects, like “see also” relationships in a back-of-book index
[Figure: the knowledge layer (INDEX) for the domain Italian opera: Tosca and Madame Butterfly are “composed by” Puccini, who was “born in” Lucca]
Occurrences link the layers
Occurrences represent relationships between information resources and the subjects that they are “about”
The links (or locators) are like page numbers in a back-of-book index
Occurrences can also be typed (e.g. bio, map, synopsis)
[Figure: occurrences linking the topics Puccini, Tosca, Lucca and Madame Butterfly in the knowledge layer to resources in the information layer]
Summary of core concepts
A pool of information or data, and a knowledge layer consisting of
– Topics: a set of topics representing the key subjects of the domain in question
– Associations: representing relationships between subjects
– Occurrences: links to information that is somehow relevant to a given subject
= The TAO of Topic Maps
[Figure: the topics Puccini, Tosca, Lucca and Madame Butterfly linked by “composed by” and “born in” associations, with occurrences pointing down into the information layer]
Plus: topic types, association types, occurrence types, each of which is represented by a topic...
Let’s look at some TAOs in the Omnigator…
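The TAO triad maps naturally onto three record types. The sketch below is a minimal illustration in Python, not the ISO 13250 data model: the class names, the string ids ("puccini", "born-in", and so on) and the `related` helper are hypothetical, and associations are reduced to typed tuples of players with no roles.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Represents exactly one subject (hypothetical minimal model)."""
    id: str
    names: list[str] = field(default_factory=list)
    types: list[str] = field(default_factory=list)  # ids of typing topics

@dataclass
class Association:
    """A typed relationship between subjects (roles omitted here)."""
    type: str                  # id of the association-type topic
    members: tuple[str, ...]   # ids of the participating topics

@dataclass
class Occurrence:
    """A typed link from a subject to a relevant information resource."""
    topic: str    # id of the topic the resource is about
    type: str     # id of the occurrence-type topic
    locator: str  # plays the part of a page number in a book index

# Knowledge layer: typing topics and subject topics are all just topics.
topics = {t.id: t for t in [
    Topic("composer", ["composer"]), Topic("opera", ["opera"]),
    Topic("city", ["city"]), Topic("biography", ["biography"]),
    Topic("born-in", ["born in"]), Topic("composed-by", ["composed by"]),
    Topic("puccini", ["Puccini"], ["composer"]),
    Topic("tosca", ["Tosca"], ["opera"]),
    Topic("butterfly", ["Madame Butterfly"], ["opera"]),
    Topic("lucca", ["Lucca"], ["city"]),
]}
associations = [
    Association("composed-by", ("tosca", "puccini")),
    Association("composed-by", ("butterfly", "puccini")),
    Association("born-in", ("puccini", "lucca")),
]
# Occurrences point down into the information layer.
occurrences = [
    Occurrence("puccini", "biography", "http://www.opera.net/puccini/bio.html"),
]

def related(topic_id: str):
    """Collect the associations and locators recorded for one subject."""
    assocs = [a for a in associations if topic_id in a.members]
    locators = [o.locator for o in occurrences if o.topic == topic_id]
    return assocs, locators
```

Calling `related("puccini")` returns the three associations Puccini takes part in plus the biography locator: the "one-stop shop" idea in miniature.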
About the Omnigator
A free topic map browser from Ontopia
– Download from http://www.ontopia.net (part of “OKS Samplers”)
– Java-based, runs on any computer
Completely generic
– Not optimized for any particular ontology
– Displays and navigates any conforming topic map
A teaching aid
– Not designed for end-users (no attempt to hide technical jargon)
– Also used for prototyping and debugging
Not to be used for most real-world applications!
– These require custom interfaces based on a specific ontology
– (see http://www.topicmaps.com for a good example)
Omnigator interface
[Figure: a typical topic page in the Omnigator, annotated to show the current topic, its multiple (typed) names, topic type(s), identifier(s), typed occurrences (internal and external) and typed associations]
Demo
Typing topics revisited
Basic building blocks of the TAO model are
– Topics: e.g. “Puccini”, “Lucca”, “Tosca”
– Associations: e.g. “Puccini was born in Lucca”
– Occurrences: e.g. “http://www.opera.net/puccini/bio.html is a biography of Puccini”
Each of these constructs can be typed
– Topic types: “composer”, “city”, “opera”
– Association types: “born in”, “composed by”
– Occurrence types: “biography”, “street map”, “synopsis”
All such types are also topics
– The set of typing topics constitutes an ontology
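Since every type is itself a topic, one way to see the ontology of a map is simply to collect whatever appears in a typing position. A minimal sketch, assuming a hypothetical flat representation of topics, associations and occurrences (the ids are illustrative):

```python
# Hypothetical flat statements; every type slot holds a topic id.
topic_types = {            # topic -> its topic types
    "puccini": ["composer"],
    "lucca": ["city"],
    "tosca": ["opera"],
}
associations = [           # (association type, players)
    ("born-in", ("puccini", "lucca")),
    ("composed-by", ("tosca", "puccini")),
]
occurrences = [            # (occurrence type, topic, locator)
    ("biography", "puccini", "http://www.opera.net/puccini/bio.html"),
]

def ontology() -> set[str]:
    """The set of typing topics: every topic used as a type somewhere."""
    typing = {t for types in topic_types.values() for t in types}
    typing |= {assoc_type for assoc_type, _ in associations}
    typing |= {occ_type for occ_type, _, _ in occurrences}
    return typing
```

Here `ontology()` yields the six typing topics, while "puccini", "lucca" and "tosca" stay out because they only ever appear as instances.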
Capabilities of the TAO model (1)
Represent subjects explicitly
– Topics represent the “things” users are interested in
Capture relationships between subjects
– Associations provide user-friendly navigation paths to information (navigation “as we may think”)
– Associations also promote serendipitous knowledge discovery through browsing
Make information findable
– Topics provide a “one-stop shop” for everything that is known about a subject (collocation of information and knowledge)
– Occurrences allow information about a common subject to be aggregated across multiple systems, irrespective of location
Capabilities of the TAO model (2)
Represent taxonomies and thesauri
– Associations can (also) represent hierarchical relationships
– With Topic Maps you can have multiple, interlinked hierarchies and faceted classification
Transcend simple hierarchies
– Rich associative structures capture the complexity of knowledge and reflect the way people think
Manage knowledge
– The topic map is the embodiment of “organizational memory”
– Provides a structured way to capture people’s knowledge of things, events, relationships, etc.
Beyond the TAO
Formal data model
– Topic maps can be queried, e.g. “Give me all composers that composed operas that were based on plays that were written by Shakespeare”
Interchange syntax
– Topic maps can be interchanged
– Increased reuse = added value
Robust identity model
– Topic maps can be merged
– Potential to federate knowledge
Scope
– Topic maps can capture context
Reification
– Topic maps can express different levels of detail
– Similar to scaling in cartography
For more details, see Pepper 2009
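The Shakespeare query is a chain of three hops across typed associations. The sketch below hand-codes it in plain Python rather than in a topic map query language; the sample map and its ids are hypothetical illustrations, as is the "written-by" association type with roles author and work ("based-on" and "composed" reuse the role names of the Italian Opera Topic Map).

```python
# Hypothetical sample map: (association type, {role type: player}).
ASSOCIATIONS = [
    ("composed", {"composer": "verdi", "work": "otello"}),
    ("composed", {"composer": "verdi", "work": "falstaff"}),
    ("composed", {"composer": "puccini", "work": "tosca"}),
    ("based-on", {"source": "othello", "result": "otello"}),
    ("based-on", {"source": "merry-wives", "result": "falstaff"}),
    ("based-on", {"source": "la-tosca", "result": "tosca"}),
    ("written-by", {"author": "shakespeare", "work": "othello"}),
    ("written-by", {"author": "shakespeare", "work": "merry-wives"}),
    ("written-by", {"author": "sardou", "work": "la-tosca"}),
]
# Hypothetical topic types for the works.
TYPES = {"otello": "opera", "falstaff": "opera", "tosca": "opera",
         "othello": "play", "merry-wives": "play", "la-tosca": "play"}

def players(assoc_type, known_role, known_player, wanted_role):
    """Players of wanted_role in assoc_type associations where
    known_role is played by known_player."""
    return {roles[wanted_role] for t, roles in ASSOCIATIONS
            if t == assoc_type and roles.get(known_role) == known_player}

def composers_of_operas_based_on_plays_by(author):
    result = set()
    for play in players("written-by", "author", author, "work"):
        if TYPES.get(play) != "play":
            continue
        for opera in players("based-on", "source", play, "result"):
            if TYPES.get(opera) == "opera":
                result |= players("composed", "work", opera, "composer")
    return result
```

Because every hop is a typed association with named roles, the same `players` helper serves all three steps; only the association type and role types change.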
Break – any questions so far?
After the break: Topic Maps and natural language, towards a linguistic perspective
Parallels with natural language
Basic grammatical classes ↔ TAO model
Nouns and verbs ↔ Topics and associations
Nominals and nouns ↔ Topics and their types
Clauses and verbs ↔ Associations and their types
Valency ↔ Arity
Semantic roles ↔ Association roles
Categories and schemas ↔ Typing topics
Hyponymy ↔ Type hierarchies
Synonymy and homonymy ↔ Naming
Nominalization ↔ Reification
Grounding / co-reference ↔ Subject identity / collocation
Information structure ↔ Navigation
Basic principles, basic classes
In elementary school, I was taught that a noun is the name of a person, place, or thing. In college, I was taught the basic linguistic doctrine that a noun can only be defined in terms of grammatical behavior, conceptual definitions of grammatical classes being impossible. Here, several decades later, I demonstrate the inexorable progress of grammatical theory by claiming that a noun is the name of a thing.(Langacker 2008)
The basic grammatical classes are nouns and verbs
– They prototypically profile things and relationships
They correspond to topics and associations
Grounding
“Grounding is characteristic of the structure referred to in CG as nominals and finite clauses. More specifically, a nominal or a finite clause profiles a grounded instance of a thing or process type.”
“A noun designates a type of thing, and a verb a type of process.”
“A nominal or a finite clause profiles a grounded instance of a thing or process type.”
Nominal grounding (determiners and quantifiers)
– the, this, that, some, a, each, every, no, any
Clausal grounding (mood and tense)
– -s, -ed, may, will, should
Langacker 2008: 259ff (esp. 264)
Nouns and nominals
Topic types represent classes of topics
– Conceptual “groupings of things”, e.g. composer, opera, city, ...
– They correspond to Langacker’s nouns (“types of thing”)
However, topics can have multiple names
– (This is how we handle synonymy and multilingualism)
– In one sense it is topic names that correspond to nouns
Topic instances represent individual subjects
– They correspond to Langacker’s nominals (“instances of types”)
– Their names are typically proper nouns, e.g. Puccini, Tosca, Lucca
Verbs and clauses
Association types represent classes of relationships
– They correspond to Langacker’s verbs (“types of process”)
– (Often named accordingly, e.g. born in, composed by, killed by, ...)
Individual associations represent specific relationships
– They correspond to Langacker’s clauses (“instances of processes”)
– e.g. Puccini was born in Lucca; Tosca was composed by Puccini
Langacker distinguishes processes (temporal) and non-processual relationships (non-temporal). The latter are (prototypically) profiled by adjectives, adverbs, prepositions, and participles. This distinction is not made explicitly in Topic Maps.
Note: There are two predefined association types
– type-instance (the relationship between a topic and its type)
– supertype-subtype (a relationship between types, see Hyponymy)
Valency
Associations can involve one, two or more topics
– Binary associations, e.g. Puccini composed Tosca, are most common and correspond to transitive verbs
– Ternary associations, e.g. Tosca killed Scarpia with a knife, can correspond to ditransitive verbs
– Unary associations, e.g. Turandot was unfinished, correspond (sort of) to intransitive verbs (or binary properties)
The arity of an association
– Corresponds to the valency of a verb
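Arity is just the number of role players in an association. A minimal sketch, assuming hypothetical role types (killer, victim, instrument for the ternary case) and a `VALENCY` table that merely restates the rough correspondence drawn above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Association:
    type: str
    roles: tuple[tuple[str, str], ...]  # (role type, player) pairs

    @property
    def arity(self) -> int:
        """Number of role players: the analogue of a verb's valency."""
        return len(self.roles)

# Rough correspondence from the slide (hypothetical lookup table).
VALENCY = {1: "intransitive", 2: "transitive", 3: "ditransitive"}

examples = [
    Association("unfinished", (("work", "turandot"),)),
    Association("composed", (("composer", "puccini"), ("work", "tosca"))),
    # Hypothetical role types for the ternary example.
    Association("killed", (("killer", "tosca"), ("victim", "scarpia"),
                           ("instrument", "knife"))),
]
```

Mapping `VALENCY[a.arity]` over `examples` gives intransitive, transitive and ditransitive in turn.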
Semantic roles
An association does not have “directionality”
Instead of direction, Topic Maps uses roles
– Roles are classified by type
– Role types specify the nature of each topic’s involvement in the relationship. They correspond to semantic roles.
– (Role types are also topics)
Role types are different from topic types...
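[Figure: RDF vs. Topic Maps. In RDF, Puccini and Tosca are linked by a directed arc (“composed”, or inversely “composed by”); in Topic Maps, they are linked by a “composed” association in which Puccini plays the role composer and Tosca the role work]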
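The difference between direction and roles can be made concrete. In the sketch below (plain Python data, with hypothetical structures standing in for RDF and Topic Maps), the triple has to be read from a fixed subject position, while the role-based association can be entered from either player:

```python
# RDF-style (simplified): a directed triple. Reading it "backwards" needs
# either a query on the object position or a second, inverse predicate.
triples = {("puccini", "composed", "tosca")}

# Topic Maps-style (simplified): one undirected association in which
# each player is labelled with a role type.
association = {"type": "composed",
               "roles": {"composer": "puccini", "work": "tosca"}}

def role_of(assoc: dict, player: str):
    """Return the role type a topic plays in the association, if any."""
    for role, p in assoc["roles"].items():
        if p == player:
            return role
    return None

def others(assoc: dict, player: str) -> dict:
    """The other players, keyed by role type, as seen from `player`."""
    return {r: p for r, p in assoc["roles"].items() if p != player}
```

From Tosca's side, `others(association, "tosca")` gives `{"composer": "puccini"}`; from Puccini's side it gives `{"work": "tosca"}`. No inverse relation is needed.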
Roles and types
[Figure: Puccini (topic type composer) plays the role composer, and Tosca (topic type opera) plays the role work, in an association of type composed; topic types, role types and the association type are all topics]
The role type can be
– the same as the role-playing topic’s topic type (composer = composer)
– a supertype of the topic type (work > opera)
– a subtype of the topic type (teacher < person)
– a subtype of the topic type’s supertype (source < work)
Association roles and semantic roles
Association roles in the Italian Opera Topic Map
– composed: composer, work
– born in: person, place
– appears in: character, work
– based on: source, result
– revision of: source, result
– part of: part, whole
– exponent of: person, style
– located in: container, containee
– pupil of: teacher, pupil
Association roles tend to be much more specific
– Variable practice, as yet no established conventions
– Might (cognitive) linguists have something to offer here?
Semantic roles (Frawley 1992)
– (logical actors) agent, author, instrument
– (logical recipients) patient, experiencer, benefactive
– (spatial roles) theme, source, goal
– (non-participant roles) locative, reason, purpose
Naming of associations
Intuitive naming requires flexibility
– i.e. multiple association type names that change depending on the “direction” of the association: “Puccini was born in Lucca”, “Lucca was the birthplace of Puccini”
Alternative CG view
– Naming should be based on whether the agent or the theme is in focus; the focus becomes the trajector
– Point of focus = current topic
Some strategies...
Voice-based
– Active / passive forms of the verb: composed (V, active) / composed (V, passive) by
– Works well in SVO languages. Less satisfactory with SOV.
Role-based
– teacher (N) of / pupil (N) of
Nominalization
– composition
– Tends to be used by Japanese, Koreans (and Germans??)
Combinations
– born (V) in / birthplace (N) of
– part (N) of / consists (V) of
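One way to implement focus-dependent naming is to key the association type's names by the role the current topic plays, so that the topic in focus becomes the trajector of the phrase. This is a sketch under that assumption; the name table and the `describe` helper are hypothetical, not Omnigator behaviour:

```python
# Hypothetical names for the "born in" association type, keyed by the role
# type played by the current topic (the point of focus / trajector).
BORN_IN_NAMES = {"person": "was born in", "place": "was the birthplace of"}

def describe(roles: dict, names: dict, current: str) -> str:
    """Phrase a binary association from the current topic's point of view."""
    my_role = next(r for r, p in roles.items() if p == current)
    other = next(p for p in roles.values() if p != current)
    return f"{current} {names[my_role]} {other}"

birth = {"person": "Puccini", "place": "Lucca"}
```

The same association reads "Puccini was born in Lucca" from Puccini's page and "Lucca was the birthplace of Puccini" from Lucca's page.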
Categories and prototypes
Topic types define categories of things
– But are they Aristotelian or prototypical categories?
Aristotelian
– Category membership is binary
– All instances are equally representative. No standard notion of “similarity”.
Prototypical
– Not defined by “necessary and sufficient conditions” (cf. OWL)
The decision is up to the conceptualizer (a.k.a. topic map author)
– A topic can have more than one type, e.g. Boïto is a composer and a librettist
– The same topic can be a topic type and a role type, e.g. Puccini is a composer; Puccini plays the role of composer in …
– Should we establish conventions for goodness of example? These could be useful in automated classification
Schemas and constraints
Other types can also be said to define categories– association types, (occurrence types, name types, role types)
But these are more schematic (in the CG sense)
– Schemas are “abstract templates obtained by reinforcing the commonality inherent in a set of instances” (Langacker 2008, p.23, in the context of grammatical rules)
Rules can be defined as templates and constraints:
“Puccini composed Tosca”
– The composer Puccini plays the role of composer in the “composition” relationship in which the role of work is played by the opera Tosca.
[Figure: the association template behind this sentence, with two elaboration sites: AGENT (role composer, filled by a topic of type composer) and THEME (role work, filled by a topic of type opera)]
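The template idea can be turned into a simple constraint check: each elaboration site (role type) states which topic type may fill it, and subtypes are accepted via the supertype chain. A minimal sketch; the `SUPERTYPE`, `TOPIC_TYPES` and `SCHEMA` tables are hypothetical, and this is not the constraint language of the standard:

```python
# Hypothetical ontology fragments (not from any published schema language).
SUPERTYPE = {"opera": "work", "play": "work"}        # subtype -> supertype
TOPIC_TYPES = {"puccini": {"composer"}, "tosca": {"opera"}, "lucca": {"city"}}
# Template for "composed": the topic type each elaboration site must have.
SCHEMA = {"composed": {"composer": "composer", "work": "work"}}

def is_a(topic_type: str, required: str) -> bool:
    """True if topic_type equals required or has it as a (transitive)
    supertype. Assumes the supertype chain is acyclic."""
    t = topic_type
    while t is not None:
        if t == required:
            return True
        t = SUPERTYPE.get(t)
    return False

def violations(assoc_type: str, roles: dict) -> list:
    """Role types whose player does not satisfy the template."""
    template = SCHEMA[assoc_type]
    bad = []
    for role, player in roles.items():
        required = template.get(role)
        ok = required is not None and any(
            is_a(t, required) for t in TOPIC_TYPES.get(player, ()))
        if not ok:
            bad.append(role)
    return bad
```

"Puccini composed Tosca" passes because opera is a subtype of work; putting Lucca in the work role is flagged, as is any role type the template does not declare.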
Hyponymy
Topic Maps has two predefined association types:
– type-instance (relationship between a topic and its type)
– supertype-subtype (relationship between the denotations of a hyponym and its hyperonym)
[Figure: a type hierarchy. Supertype-subtype links: Mammal > Primate, Canine; Primate > Human, Chimp; Canine > Wolf, Dog. Type-instance links connect types to the instances Steve and Ron.]
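Reading supertype-subtype as transitive, an instance of a subtype also counts as an instance of every supertype. A minimal sketch using the hierarchy from the diagram; placing Steve under Human is an assumption made for illustration, and the helper names are hypothetical:

```python
# Supertype-subtype pairs from the diagram: (supertype, subtype).
SUPERTYPE_SUBTYPE = [
    ("mammal", "primate"), ("mammal", "canine"),
    ("primate", "human"), ("primate", "chimp"),
    ("canine", "wolf"), ("canine", "dog"),
]
# Type-instance pairs; Steve as an instance of human is an assumption here.
TYPE_INSTANCE = [("human", "steve")]

def supertypes(t: str) -> set:
    """All direct and indirect supertypes of t (transitive closure)."""
    found, stack = set(), [t]
    while stack:
        current = stack.pop()
        for sup, sub in SUPERTYPE_SUBTYPE:
            if sub == current and sup not in found:
                found.add(sup)
                stack.append(sup)
    return found

def instance_of(instance: str, t: str) -> bool:
    """True if instance has type t directly or via a supertype."""
    direct = {typ for typ, inst in TYPE_INSTANCE if inst == instance}
    return any(t == d or t in supertypes(d) for d in direct)
```

With these pairs, `supertypes("dog")` is `{"canine", "mammal"}`, and Steve is a mammal but not a canine.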
Synonymy and homonymy
Synonyms
– One subject, multiple names
– In thesauri: USE and USED FOR
TMs are subject-centric
– A topic can have multiple names
– Names can be typed
Typical name types:
– nickname, synonym, alternate name
– Context can be expressed using scope, typically names in different natural languages: composer, komponist, 작곡가, ...
– Names can also have “variants”, often used to capture orthographic variation: Tchaikovsky, Чайко́вский, Tsjajkovskij, Tschaikowski. Also useful for sort names, pronunciation, etc.
Homonyms
– One name, multiple subjects
– In thesauri: problematic
TMs are based on identifiers
– The same name can be used by more than one topic
– Disambiguation in the UI is left to the application
– Two main disambiguation strategies
Default: qualify by type, e.g.
– Tosca (opera) vs. Tosca (character)
Fallback: qualify by some other relationship, e.g.
– Paris (France) vs. Paris (Texas)
– La Bohème (Puccini) vs. La Bohème (Leoncavallo)
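Names, scope, variants and type-based disambiguation can all be shown on a toy model. The sketch below is deliberately simplified (real Topic Maps variants belong to a name and carry their own scope); the `Name`/`Topic` classes, the language codes used as scope, and the helper functions are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Name:
    value: str
    scope: frozenset = frozenset()   # hypothetical: a language code as scope
    variants: tuple = ()             # e.g. transliterations

@dataclass
class Topic:
    id: str
    types: list
    names: list = field(default_factory=list)

tchaikovsky = Topic("tchaikovsky", ["composer"], [Name(
    "Tchaikovsky", variants=("Чайко́вский", "Tsjajkovskij", "Tschaikowski"))])
composer = Topic("composer", ["topic type"], [
    Name("composer", frozenset({"en"})),
    Name("komponist", frozenset({"no"})),
    Name("작곡가", frozenset({"ko"}))])
tosca_opera = Topic("tosca-opera", ["opera"], [Name("Tosca")])
tosca_character = Topic("tosca-character", ["character"], [Name("Tosca")])
ALL = [tchaikovsky, composer, tosca_opera, tosca_character]

def name_in(topic, lang):
    """Pick the name scoped to lang, falling back to the first name."""
    for n in topic.names:
        if lang in n.scope:
            return n.value
    return topic.names[0].value

def find(text):
    """Synonymy: match any name or variant of any topic."""
    return [t for t in ALL
            if any(text == n.value or text in n.variants for n in t.names)]

def label(topic):
    """Homonymy: qualify by topic type when another topic shares the name."""
    base = topic.names[0].value
    clash = any(other is not topic and base in (n.value for n in other.names)
                for other in ALL)
    return f"{base} ({topic.types[0]})" if clash else base
```

Searching for "Tschaikowski" finds the single Tchaikovsky topic, while the two Tosca topics are labelled "Tosca (opera)" and "Tosca (character)" because their names clash.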
Nominalization
(A topic map consists of assertions about subjects)
Assertions are made using statements:
– names, e.g. a certain subject has the name “Tosca”
– associations, e.g. “Tosca is set in Rome”
– occurrences, e.g. “http://en.wikipedia.org/wiki/Rome is a web page about Rome”
Any statement can be reified
– Reification results in a topic that has the same referent as the reified statement
– e.g. “Tosca is set in Rome” (association) → “The setting of Tosca in Rome” (topic)
– The (new) reifying topic can have names and occurrences, and it can play roles in associations
Nominalization: derivation of nouns from other words, including verbs, adjectives etc.
– e.g. meet (V) → meeting (N)
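Reification can be sketched as creating a new topic that points at a statement, with the statement pointing back at it. A minimal illustration with hypothetical ids; the "set-in" roles (work, place) and the follow-up "discussed-in" association are invented for the example:

```python
# Hypothetical store: associations and topics keyed by id.
associations = {
    "a1": {"type": "set-in", "roles": {"work": "tosca", "place": "rome"}},
}
topics = {"tosca": {"names": ["Tosca"]}, "rome": {"names": ["Rome"]}}

def reify(assoc_id: str, name: str) -> str:
    """Create a topic with the same referent as the association,
    linking the two in both directions."""
    topic_id = f"reifier-of-{assoc_id}"
    topics[topic_id] = {"names": [name], "reifies": assoc_id}
    associations[assoc_id]["reifier"] = topic_id
    return topic_id

setting = reify("a1", "The setting of Tosca in Rome")
# Being a topic, it can now play a role in a further (hypothetical) association.
associations["a2"] = {"type": "discussed-in",
                      "roles": {"subject": setting, "document": "programme-notes"}}
```

As with nominalization, the statement "Tosca is set in Rome" becomes something that can itself be named and talked about.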
Subjects and topics
Topics represent subjects
– the topic is the representation
– the subject is the referent
Or, in Saussure’s terms
– signifiant and signifié
A subject can be anything: A subject is any “thing” whatsoever, whether or not it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.
Is the topic/subject pairing a symbolic assembly?
[Figure: a subject in the real world paired with a topic (T) in the computer domain]
Co-reference and collocation
Grounding singles out referents and enables co-reference
– between speaker and listener
– across a sequence of utterances
In Topic Maps the central objective is collocation
– By definition, each topic represents a single subject (one subject per topic)
– A topic is intended to be a point of collocation for everything that is known about a particular subject
– Therefore the goal is to have only one topic per subject
To achieve that we need to know which subject a topic represents
– (This is sometimes referred to as the “intentionality” of the relation between a symbol and its referent.)
– We call it subject identity.
Subject identity
The identity of a subject is expressed using globally unique identifiers called subject identifiers
– If two topics share a subject identifier, they are deemed to represent the same subject and must be merged
[Figure: the topics Madame Butterfly, Tosca, Lucca and Puccini, each linked to the subject it represents]
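Merging on subject identifiers amounts to grouping topics whose identifier sets overlap, transitively. A minimal sketch, assuming topics are plain dicts with sets of identifiers and names; the Domenico Puccini identifier is hypothetical:

```python
def merge(topics: list[dict]) -> list[dict]:
    """Merge topics that share a subject identifier (transitively).
    Identifiers are compared as plain strings and never dereferenced."""
    merged: list[dict] = []
    for topic in topics:
        current = {"identifiers": set(topic["identifiers"]),
                   "names": set(topic["names"])}
        remaining = []
        for other in merged:
            if other["identifiers"] & current["identifiers"]:
                # Shared identifier: same subject, so fold `other` into `current`.
                current["identifiers"] |= other["identifiers"]
                current["names"] |= other["names"]
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged

PSI = "http://psi.ontopedia.net/Giacomo_Puccini"
map_a = [{"identifiers": {PSI}, "names": {"Puccini"}}]
map_b = [{"identifiers": {PSI}, "names": {"Giacomo Puccini"}},
         # Hypothetical identifier: a different string, hence a different subject.
         {"identifiers": {"http://psi.ontopedia.net/Domenico_Puccini"},
          "names": {"Domenico Puccini"}}]
result = merge(map_a + map_b)
```

The two Giacomo Puccini topics collapse into one with both names, while Domenico stays separate. Under purely lexical comparison, even a trailing slash would make two identifiers, and hence two subjects, distinct.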
Subject identifiers
• The subject is identified by a URL
• The URL is called a subject identifier, e.g. http://psi.ontopedia.net/Giacomo_Puccini for the topic Giacomo Puccini
• The URL is the address of a web page
• The web page describes the subject such that a human can know what subject is referred to
• This web page is called a subject descriptor, e.g.: “Giacomo Puccini. Italian composer, b. Lucca 22nd Dec 1858, d. Brussels, 29th Nov 1924. Best known for his operas, of which Tosca is one of the most popular and well-known.”
Humans use the descriptor
– By inspecting the web page, the person responsible for assigning the identifier can be sure that it does not refer to, say, Giacomo’s grandfather Domenico (who was also a composer of operas)
Machines use the identifier
– The link is not resolved. Instead, simple lexical comparison is used. If the strings are identical, the subject is deemed to be the same and the topics are merged.
Is the subject identifier/ subject descriptor pairing a symbolic assembly?
Information structure
Intuitive navigation is a key feature of Topic Maps. But what is its cognitive basis?
– I claim that it corresponds to the way we think (i.e., associatively)
– Can linguistics back up this claim?
Topic vs. comment in linguistics (Bussmann, 487)
– “Analysis of sentences according to communicative criteria into the topic (what is being talked about) and the comment (what is being said about the topic)”
– “Analysis of utterances according to the communicative criteria of given/known information vs. new information”
– Cf. theme vs. rheme in Halliday’s functional grammar
Consider our earlier tour of Italian opera...
Navigation as narrative
Giacomo Puccini was a composer. He was born in Lucca in 1858.
Lucca is a city, located in Italy. It was the birthplace of Puccini and Catalani.
Catalani was a composer who composed 5 operas. He died in Milan.
Milan is the home of La Scala, which was the venue for many première performances, including that of Madame Butterfly.
Madame Butterfly is set in Nagasaki, which is located in Japan.
Japan is (also) the setting for Iris, [which is] an opera [which was] composed by Mascagni, who was a pupil of Ponchielli who was (also) the teacher of Puccini...
Legend: THEME (new theme / continuing theme); RHEME (predicate with potential new theme)
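The narrative can be read as a walk through the map in which the rheme of one sentence supplies the theme of the next. A minimal sketch, assuming hypothetical pre-phrased readings of a few of the associations mentioned in the narrative:

```python
# Hypothetical readings of associations, each already phrased from the point
# of view of the topic in focus (the theme). Facts follow the narrative above.
READINGS = {
    "Puccini": [("was born in", "Lucca")],
    "Lucca": [("was the birthplace of", "Catalani")],
    "Catalani": [("died in", "Milan")],
    "Milan": [("is the home of", "La Scala")],
}

def narrate(start: str, steps: int) -> list[str]:
    """Follow one association per step; each rheme's topic becomes the next theme."""
    sentences, theme = [], start
    for _ in range(steps):
        if not READINGS.get(theme):
            break  # dead end: no association to follow from here
        phrase, rheme = READINGS[theme][0]
        sentences.append(f"{theme} {phrase} {rheme}.")
        theme = rheme
    return sentences
```

Starting from Puccini, three steps produce "Puccini was born in Lucca.", "Lucca was the birthplace of Catalani." and "Catalani died in Milan.", which is the theme/rheme chaining described in the legend.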
“Now! … That should clear up a few things around here!”
Discussion
Questions, comments, corrections?
– What have I missed? Where else should I look?
What might linguists contribute?
– A better understanding of the nature of roles?
– Approaches to representing temporal knowledge?
– ...
Can Topic Maps inform linguistics?
– After all, it is a technology that captures (some degree of) (some form of) knowledge
– It seems to have a reasonable cognitive basis
– It emerged through usage (librarians, indexers, etc.)
– And last but not least, it works!
References
Bussmann, H. Routledge Dictionary of Language and Linguistics (London 1996)
Frawley, W. Linguistic Semantics (Hillsdale 1992)
Langacker, R. Cognitive Grammar (Oxford 2008)
Pepper, S. Italian Opera Topic Map, http://www.ontopedia.net/ItalianOpera
Pepper, S. “Topic Maps” in Bates, M.J. and Maack, M.N. (eds) Encyclopedia of Library and Information Sciences (CRC Press, forthcoming 2009), http://www.ontopedia.net/pepper/papers/ELIS-TopicMaps.pdf