Terminology work and term databases in Estonia
With emphasis on termbase data structures
Arvi Tavast, PhD
qlaara
Riga, 4 November 2015
Lexicography Terminology What’s wrong Quantitative
Introduction
From Estonian terminology to termbase data structures
We used to have specialised lexicography that people affectionately called terminology
Then we had a bit of terminology
(even applied to general language)
There were calls for a unified termbase of all terms
Which is unfortunately not doable:
- coverage
- reliability
- lack of convention
- theoretical issues
The following presentation gives a bit more detail
Outline
1 Lexicography: semasiological data structures
2 Terminology: onomasiological data structures
3 What’s wrong
   Data structures
   Metaphors of communication
4 Quantitative dictionary data structures
   Data structures
   Division of labour
Semasiological data structures
Words and what they mean
en: table
  1. a piece of furniture with four legs and a flat top
     de: Tisch
  2. layout of data in rows and columns
     de: Tabelle
en: desk
  - an office table
    de: Tisch
    de: Schreibtisch
en: spreadsheet
  - a data layout consisting of rows and columns
    de: Tabelle
    de: Arbeitsblatt
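The word-oriented entries above could be modelled roughly as follows. This is a minimal sketch, not the layout of any particular dictionary system: the class names `Entry` and `Sense` and the helpers `equivalents` and `headwords_for` are hypothetical, and only the words and definitions come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    """One meaning of a headword, with its German equivalents."""
    definition: str
    equivalents: List[str] = field(default_factory=list)


@dataclass
class Entry:
    """A semasiological entry: the word comes first, its senses hang off it."""
    headword: str
    senses: List[Sense]


# Data copied from the slide; the structure itself is an assumption.
DICTIONARY = [
    Entry("table", [
        Sense("a piece of furniture with four legs and a flat top", ["Tisch"]),
        Sense("layout of data in rows and columns", ["Tabelle"]),
    ]),
    Entry("desk", [
        Sense("an office table", ["Tisch", "Schreibtisch"]),
    ]),
    Entry("spreadsheet", [
        Sense("a data layout consisting of rows and columns",
              ["Tabelle", "Arbeitsblatt"]),
    ]),
]


def equivalents(headword: str) -> List[str]:
    """Forward lookup: all equivalents of a headword, in sense order."""
    return [eq
            for entry in DICTIONARY if entry.headword == headword
            for sense in entry.senses
            for eq in sense.equivalents]


def headwords_for(equivalent: str) -> List[str]:
    """Reverse lookup: has to scan every entry, nothing links the senses."""
    return [entry.headword
            for entry in DICTIONARY
            if any(equivalent in sense.equivalents for sense in entry.senses)]
```

Forward lookup is direct, but nothing in the structure says that sense 2 of "table" and the only sense of "spreadsheet" describe the same thing; that link exists only in the editor's head.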
Onomasiological data structures
Concepts and what they are called
1 A piece of furniture with four legs and a flat top, for eating
  en: table
  de: Tisch
2 A piece of furniture with four legs and a flat top, for writing
  en: desk
  de: Tisch
  de: Schreibtisch
3 Layout of data in rows and columns
  en: table
  en: spreadsheet
  de: Tabelle
  de: Arbeitsblatt
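A concept-oriented termbase turns the same data around: the concept is the record and the terms are attached to it. The sketch below assumes a hypothetical `Concept` class and two helper functions, `terms_for` and `concepts_for`; only the concepts and terms are taken from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Concept:
    """An onomasiological record: one concept, many terms per language."""
    concept_id: int
    definition: str
    terms: Dict[str, List[str]]  # language code -> terms naming the concept


# Data copied from the slide; the structure itself is an assumption.
TERMBASE = [
    Concept(1, "A piece of furniture with four legs and a flat top, for eating",
            {"en": ["table"], "de": ["Tisch"]}),
    Concept(2, "A piece of furniture with four legs and a flat top, for writing",
            {"en": ["desk"], "de": ["Tisch", "Schreibtisch"]}),
    Concept(3, "Layout of data in rows and columns",
            {"en": ["table", "spreadsheet"], "de": ["Tabelle", "Arbeitsblatt"]}),
]


def terms_for(concept_id: int, lang: str) -> List[str]:
    """All terms that name the given concept in one language."""
    for concept in TERMBASE:
        if concept.concept_id == concept_id:
            return list(concept.terms.get(lang, []))
    return []


def concepts_for(term: str, lang: str) -> List[int]:
    """IDs of every concept that the term names in one language."""
    return [c.concept_id for c in TERMBASE if term in c.terms.get(lang, [])]
```

Every term either belongs to a concept record or it does not, so the structure forces a yes-or-no decision for each pair; "table" ends up under concepts 1 and 3 with nothing to say which reading is more typical.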
Example
Latvian-Estonian dictionary
What’s wrong
Data structures
Semasiology
- Pro: easy for the editor, understandable for the reader
- Con: no support for consistency
- A narrative about the editor, not a data source about language
Onomasiology
- Pro: consistency, scalability, standardisation
- Con: need for explicit binary decisions
- An oversimplified data source about language; works if concepts are known
Both
- Binary: either means or does not mean, there is no scale
- Introspective: claims are not falsifiable
- Simplistic: assume the concepts are (or can be) known
- The channel metaphor of communication
What’s wrong
The channel metaphor vs uncertainty reduction
Encoding of a message must contain a set of discriminable states that is greater than or equal to the number of discriminable states in the to-be-encoded message
or:
Encoding thoughts with words can only work if the number of possible thoughts is smaller than or equal to the number of possible words
This is the case only in very restricted domains (e.g. weather forecasts)
Ramscar, M. et al. 2010. The Effects of Feature-Label-Order and Their Implications for Symbolic Learning. Cognitive Science 34(6): 909–957.
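The counting constraint can be illustrated with a toy code. This is only a sketch of the pigeonhole argument, not anything from the cited paper: the function `build_code`, the two-symbol alphabet and the four forecast messages are all made up for the example.

```python
from itertools import product
from typing import Dict, List, Optional


def build_code(messages: List[str], symbols: str,
               length: int) -> Optional[Dict[str, str]]:
    """Give every message its own fixed-length code word, or return None.

    A lossless assignment exists only if there are at least as many
    distinct code words as there are messages to tell apart.
    """
    code_words = ["".join(p) for p in product(symbols, repeat=length)]
    if len(code_words) < len(messages):
        return None  # too few discriminable states in the encoding
    return dict(zip(messages, code_words))


# Hypothetical restricted domain: four possible forecasts.
FORECASTS = ["sunny", "rain", "snow", "fog"]

long_enough = build_code(FORECASTS, "ab", 2)  # 4 code words for 4 messages
too_short = build_code(FORECASTS, "ab", 1)    # 2 code words for 4 messages
```

With two symbols and length 2 each forecast gets a distinct code word; with length 1 the function gives up, which is the situation the slide describes for unrestricted language, where possible thoughts outnumber possible words.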
Quantitative data structures
Words (lexomes), their relatedness and other numerical parameters
Empirical data sources, rather than introspective
- Corpus research, frequencies, collocations, distributional semantics
- Human experimental judgements
- NB Meaning is inherently introspective, not measurable. Relative meaning is measurable
Quantified data, rather than binary
- Types of relatedness: synonyms, equivalents, cohyponyms, etc.
- Other numerical parameters: frequency, valence, emotion, reaction times, naming latencies, neighbourhood density, relative entropy, median absolute deviation, morphological distribution, search statistics etc.
Quantitative data structures
Relatedness can be quantified and presented as a graph or a table

              table1  table2  desk  spreadsheet  Tisch  Schreibtisch  Tabelle  Arbeitsblatt
table1        1       0       0.1   0            0.6    0.4           0        0
table2        0       1       0     0.5          0      0             0.8      0.8
desk          0.1     0       1     0            0.6    0.8           0        0
spreadsheet   0       0.5     0     1            0      0             0.7      0.8
Tisch         0.6     0       0.6   0            1      0.8           0        0
Schreibtisch  0.4     0       0.8   0            0.8    1             0        0
Tabelle       0       0.8     0     0.7          0      0             1        0.8
Arbeitsblatt  0       0.8     0     0.8          0      0             0.8      1
Fictional data for demonstration purposes only
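As a rough sketch, the same fictional matrix can be stored as a plain list of rows and queried for ranked hints instead of yes-or-no answers. The names `WORDS`, `RELATEDNESS`, `relatedness` and `most_related` are hypothetical; the numbers are the demonstration values from the slide, not measurements.

```python
from typing import List, Tuple

WORDS = ["table1", "table2", "desk", "spreadsheet",
         "Tisch", "Schreibtisch", "Tabelle", "Arbeitsblatt"]

# Fictional relatedness scores from the slide; row and column order = WORDS.
RELATEDNESS = [
    [1,   0,   0.1, 0,   0.6, 0.4, 0,   0],    # table1
    [0,   1,   0,   0.5, 0,   0,   0.8, 0.8],  # table2
    [0.1, 0,   1,   0,   0.6, 0.8, 0,   0],    # desk
    [0,   0.5, 0,   1,   0,   0,   0.7, 0.8],  # spreadsheet
    [0.6, 0,   0.6, 0,   1,   0.8, 0,   0],    # Tisch
    [0.4, 0,   0.8, 0,   0.8, 1,   0,   0],    # Schreibtisch
    [0,   0.8, 0,   0.7, 0,   0,   1,   0.8],  # Tabelle
    [0,   0.8, 0,   0.8, 0,   0,   0.8, 1],    # Arbeitsblatt
]


def relatedness(a: str, b: str) -> float:
    """Score for one pair of words, read straight from the matrix."""
    return RELATEDNESS[WORDS.index(a)][WORDS.index(b)]


def most_related(word: str, threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Other words scoring above the threshold, strongest first."""
    row = RELATEDNESS[WORDS.index(word)]
    scored = [(other, score) for other, score in zip(WORDS, row)
              if other != word and score > threshold]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, `most_related("desk")` ranks "Schreibtisch" (0.8) ahead of "Tisch" (0.6) and "table1" (0.1), leaving the final choice to the reader.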
Division of labour
Dumb user, smart dictionary vs smart user, dumb dictionary
A smart dictionary provides the correct answers
A dumb dictionary provides hints, like a thesaurus or synonym dictionary
A dumb user looks for definite answers
A smart user can figure out the answer based on even subtle hints
Thanks for listening
Contacts and recommended reading
Slides: www.slideshare.net/arvitavast
Contact: [email protected]
Easy reading: blog.qlaara.com
Pointer to the real stuff: Ramscar, M. et al. 2010. The Effects of Feature-Label-Order and Their Implications for Symbolic Learning. Cognitive Science 34(6): 909–957