Terminology work and term databases in Estonia
With emphasis on termbase data structures

Arvi Tavast, PhD
qlaara
Riga, 4 November 2015



Introduction: From Estonian terminology to termbase data structures

We used to have specialised lexicography that people affectionately called terminology

Then we had a bit of terminology

(even applied to general language)

There were calls for a unified termbase of all terms

This is unfortunately not doable, due to issues of:

coverage
reliability
lack of convention
theoretical issues

The following presentation gives a bit more detail


Outline

1 Lexicography: semasiological data structures

2 Terminology: onomasiological data structures

3 What’s wrong
  Data structures
  Metaphors of communication

4 Quantitative dictionary data structures
  Data structures
  Division of labour


Semasiological data structures: Words and what they mean

en: table
  1. a piece of furniture with four legs and a flat top
     de: Tisch
  2. layout of data in rows and columns
     de: Tabelle

en: desk
  an office table
     de: Tisch
     de: Schreibtisch

en: spreadsheet
  a data layout consisting of rows and columns
     de: Tabelle
     de: Arbeitsblatt
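A minimal sketch of how the word-oriented entries above might be stored, assuming Python dataclasses. The names `Sense` and `translations` are illustrative, not taken from any real dictionary system: each headword owns its numbered senses, and each sense carries its own equivalents.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of a headword: a definition plus its equivalents (hypothetical layout)."""
    definition: str
    equivalents: dict[str, list[str]] = field(default_factory=dict)  # language code -> terms


# Word-oriented (semasiological) dictionary: the headword is the entry point,
# and meanings are listed underneath it. Data from the example entries above.
dictionary: dict[str, list[Sense]] = {
    "table": [
        Sense("a piece of furniture with four legs and a flat top", {"de": ["Tisch"]}),
        Sense("layout of data in rows and columns", {"de": ["Tabelle"]}),
    ],
    "desk": [
        Sense("an office table", {"de": ["Tisch", "Schreibtisch"]}),
    ],
    "spreadsheet": [
        Sense("a data layout consisting of rows and columns",
              {"de": ["Tabelle", "Arbeitsblatt"]}),
    ],
}


def translations(headword: str, lang: str) -> list[str]:
    """Collect the equivalents in `lang` across all senses of `headword`, without duplicates."""
    result: list[str] = []
    for sense in dictionary.get(headword, []):
        for term in sense.equivalents.get(lang, []):
            if term not in result:
                result.append(term)
    return result


print(translations("table", "de"))  # ['Tisch', 'Tabelle']
```

Sense 2 of table and the only sense of spreadsheet describe the same thing with two independently written definitions, and nothing in this structure links them; keeping such entries aligned is left entirely to the editor.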


Onomasiological data structures: Concepts and how they are called

1 A piece of furniture with four legs and a flat top, for eating
  en: table
  de: Tisch

2 A piece of furniture with four legs and a flat top, for writing
  en: desk
  de: Tisch
  de: Schreibtisch

3 Layout of data in rows and columns
  en: table
  en: spreadsheet
  de: Tabelle
  de: Arbeitsblatt
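The same data in concept-oriented form, as a hypothetical sketch (`Concept`, `equivalents` and `word_index` are illustrative names). Equivalence and synonymy are not stored as links between words here; they follow from terms sharing a concept, and a word-oriented view can be derived by inverting the structure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Concept:
    """A concept entry: a definition plus the terms designating it, per language."""
    concept_id: int
    definition: str
    terms: dict[str, list[str]]  # language code -> terms


# Concept-oriented (onomasiological) termbase with the three concepts above.
termbase = [
    Concept(1, "A piece of furniture with four legs and a flat top, for eating",
            {"en": ["table"], "de": ["Tisch"]}),
    Concept(2, "A piece of furniture with four legs and a flat top, for writing",
            {"en": ["desk"], "de": ["Tisch", "Schreibtisch"]}),
    Concept(3, "Layout of data in rows and columns",
            {"en": ["table", "spreadsheet"], "de": ["Tabelle", "Arbeitsblatt"]}),
]


def equivalents(term: str, src: str, tgt: str) -> list[str]:
    """Every `tgt`-language term that shares at least one concept with `term` in `src`."""
    result: list[str] = []
    for concept in termbase:
        if term in concept.terms.get(src, []):
            for t in concept.terms.get(tgt, []):
                if t not in result:
                    result.append(t)
    return result


def word_index() -> dict[tuple[str, str], list[int]]:
    """Invert the termbase: (language, term) -> ids of the concepts it designates."""
    index: dict[tuple[str, str], list[int]] = defaultdict(list)
    for concept in termbase:
        for lang, terms in concept.terms.items():
            for t in terms:
                index[(lang, t)].append(concept.concept_id)
    return dict(index)


print(equivalents("spreadsheet", "en", "de"))  # ['Tabelle', 'Arbeitsblatt']
print(word_index()[("de", "Tisch")])           # [1, 2]
```

Inverting the termbase recovers the polysemy that the word-oriented entries list as numbered senses: de Tisch maps to concepts 1 and 2, en table to concepts 1 and 3.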


Example: Latvian-Estonian dictionary



What’s wrong: Data structures

Semasiology

Pro: easy for the editor, understandable for the reader
Con: no support for consistency
A narrative about the editor, not a data source about language

Onomasiology

Pro: consistency, scalability, standardisation
Con: need for explicit binary decisions
An oversimplified data source about language; works if concepts are known

Both

Binary: a word either means something or it does not; there is no scale
Introspective: claims are not falsifiable
Simplistic: assume that the concepts are (or can be) known
Built on the channel metaphor of communication


What’s wrong: The channel metaphor vs uncertainty reduction

Encoding of a message must contain a set of discriminable states that is greater than or equal to the number of discriminable states in the to-be-encoded message

or:

Encoding thoughts with words can only work if the number of possible thoughts is smaller than or equal to the number of possible words

This is the case only in very restricted domains (e.g. weather forecasts)

Ramscar, M. et al. 2010. The Effects of Feature-Label-Order and Their Implications for Symbolic Learning. Cognitive Science 34(6): 909–957.
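The condition quoted above can be written as a plain pigeonhole argument. This formalisation is our own reading, not a formula from the slides or the cited paper: let $T$ be the set of discriminable message states (possible thoughts) and $W$ the set of discriminable code states (possible words). A lossless encoding must send different states to different codes, i.e. it must be injective.

```latex
% T: discriminable states of the message (possible thoughts)
% W: discriminable states of the code (possible words); both finite and non-empty
\exists\, e\colon T \to W \text{ injective}
  \iff |W| \ge |T|
  \iff \log_2 |W| \ge \log_2 |T| \quad \text{(in bits)}
```

For a hypothetical weather domain with 3 sky states and 3 temperature bands, $|T| = 3 \times 3 = 9$, so nine distinct labels (about 3.17 bits) suffice. For thoughts in general no comparable finite count of $T$ is available, which is why the slide limits the condition to very restricted domains.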


Quantitative data structures: Words (lexomes), their relatedness and other numerical parameters

Empirical data sources, rather than introspective

Corpus research, frequencies, collocations, distributional semantics
Human experimental judgements
NB: Meaning itself is inherently introspective and not measurable. Relative meaning is measurable.

Quantified data, rather than binary

Types of relatedness: synonyms, equivalents, co-hyponyms, etc.
Other numerical parameters: frequency, valence, emotion, reaction times, naming latencies, neighbourhood density, relative entropy, median absolute deviation, morphological distribution, search statistics, etc.
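To illustrate how corpus data can give a graded rather than yes/no measure of relatedness, here is a minimal sketch of one common distributional-semantics technique: cosine similarity between co-occurrence count vectors. The context words and counts are invented for illustration only (just as the relatedness table later in the talk is fictional); a real system would collect them from a corpus, usually with additional weighting.

```python
import math

# Hypothetical co-occurrence counts: how often each target word appears near
# each context word. Invented numbers, for illustration only.
contexts = ["chair", "wooden", "row", "column", "cell", "office"]
counts = {
    "desk":        [4, 3, 0, 0, 0, 6],
    "table":       [5, 4, 3, 3, 1, 2],
    "spreadsheet": [0, 0, 4, 5, 6, 2],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two count vectors: 1.0 = same direction, 0.0 = no overlap."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


for w1, w2 in [("desk", "table"), ("table", "spreadsheet"), ("desk", "spreadsheet")]:
    print(f"{w1}-{w2}: {cosine(counts[w1], counts[w2]):.2f}")
```

With these invented counts, table scores about 0.70 with desk and 0.51 with spreadsheet, while desk and spreadsheet score only about 0.17: the ambiguous word sits between two otherwise weakly related neighbours, on a continuous scale rather than as a yes/no relation.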


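Quantitative data structures: Relatedness can be quantified and presented as a graph or a table

| | table1 | table2 | desk | spreadsheet | Tisch | Schreibtisch | Tabelle | Arbeitsblatt |
|---|---|---|---|---|---|---|---|---|
| table1 | 1 | 0 | 0.1 | 0 | 0.6 | 0.4 | 0 | 0 |
| table2 | 0 | 1 | 0 | 0.5 | 0 | 0 | 0.8 | 0.8 |
| desk | 0.1 | 0 | 1 | 0 | 0.6 | 0.8 | 0 | 0 |
| spreadsheet | 0 | 0.5 | 0 | 1 | 0 | 0 | 0.7 | 0.8 |
| Tisch | 0.6 | 0 | 0.6 | 0 | 1 | 0.8 | 0 | 0 |
| Schreibtisch | 0.4 | 0 | 0.8 | 0 | 0.8 | 1 | 0 | 0 |
| Tabelle | 0 | 0.8 | 0 | 0.7 | 0 | 0 | 1 | 0.8 |
| Arbeitsblatt | 0 | 0.8 | 0 | 0.8 | 0 | 0 | 0.8 | 1 |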

Fictional data for demonstration purposes only
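A hypothetical sketch of moving between the two presentations: it stores the fictional matrix above, keeps every non-zero off-diagonal cell as a weighted undirected edge, and lists a word's neighbours by strength. `to_graph` and `neighbours` are illustrative names, not an existing API.

```python
# The fictional relatedness matrix from the table above, one row per word.
words = ["table1", "table2", "desk", "spreadsheet",
         "Tisch", "Schreibtisch", "Tabelle", "Arbeitsblatt"]
matrix = [
    [1,   0,   0.1, 0,   0.6, 0.4, 0,   0],
    [0,   1,   0,   0.5, 0,   0,   0.8, 0.8],
    [0.1, 0,   1,   0,   0.6, 0.8, 0,   0],
    [0,   0.5, 0,   1,   0,   0,   0.7, 0.8],
    [0.6, 0,   0.6, 0,   1,   0.8, 0,   0],
    [0.4, 0,   0.8, 0,   0.8, 1,   0,   0],
    [0,   0.8, 0,   0.7, 0,   0,   1,   0.8],
    [0,   0.8, 0,   0.8, 0,   0,   0.8, 1],
]


def to_graph(words: list[str], matrix: list[list[float]]) -> dict[tuple[str, str], float]:
    """Table -> graph: each non-zero off-diagonal cell becomes a weighted, undirected edge."""
    edges = {}
    for i in range(len(words)):
        for j in range(i + 1, len(words)):  # upper triangle only; the matrix is symmetric
            if matrix[i][j] > 0:
                edges[(words[i], words[j])] = matrix[i][j]
    return edges


def neighbours(word: str, words: list[str], matrix: list[list[float]]) -> list[tuple[str, float]]:
    """All other words with non-zero relatedness to `word`, strongest first."""
    i = words.index(word)
    related = [(words[j], w) for j, w in enumerate(matrix[i]) if j != i and w > 0]
    return sorted(related, key=lambda pair: pair[1], reverse=True)


print(neighbours("desk", words, matrix))
# [('Schreibtisch', 0.8), ('Tisch', 0.6), ('table1', 0.1)]
```

On this data the twelve edges fall into two disconnected clusters, furniture (table1, desk, Tisch, Schreibtisch) and data layout (table2, spreadsheet, Tabelle, Arbeitsblatt), so the two senses of table show up as graph structure rather than as numbered senses.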


Division of labour: Dumb user, smart dictionary vs smart user, dumb dictionary

A smart dictionary provides the correct answers

A dumb dictionary provides hints, like a thesaurus or synonym dictionary

A dumb user looks for definite answers

A smart user can figure out the answer based on even subtle hints
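A hypothetical sketch of the two divisions of labour, reusing the English-to-German cells of the fictional relatedness table. The "smart dictionary" collapses the scores into one answer; the "dumb dictionary" returns ranked hints with their scores and leaves the choice to the user. The function names and the optional `min_score` cut-off are illustrative assumptions, not part of the talk.

```python
# English -> German cells copied from the fictional relatedness table.
scores = {
    "table1":      {"Tisch": 0.6, "Schreibtisch": 0.4, "Tabelle": 0.0, "Arbeitsblatt": 0.0},
    "table2":      {"Tisch": 0.0, "Schreibtisch": 0.0, "Tabelle": 0.8, "Arbeitsblatt": 0.8},
    "desk":        {"Tisch": 0.6, "Schreibtisch": 0.8, "Tabelle": 0.0, "Arbeitsblatt": 0.0},
    "spreadsheet": {"Tisch": 0.0, "Schreibtisch": 0.0, "Tabelle": 0.7, "Arbeitsblatt": 0.8},
}


def smart_dictionary(word: str) -> str:
    """One 'correct' answer: the highest-scoring equivalent (ties go to the first listed)."""
    candidates = scores[word]
    return max(candidates, key=candidates.get)


def dumb_dictionary(word: str, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Hints: every candidate scoring above `min_score`, with its score, strongest first."""
    hints = [(t, s) for t, s in scores[word].items() if s > min_score]
    return sorted(hints, key=lambda pair: pair[1], reverse=True)


print(smart_dictionary("table2"))  # Tabelle
print(dumb_dictionary("table2"))   # [('Tabelle', 0.8), ('Arbeitsblatt', 0.8)]
```

For table2 the single answer hides the fact that Tabelle and Arbeitsblatt score equally; the ranked hints make that visible and let the user decide.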


Thanks for listening: Contacts and recommended reading

Slides: www.slideshare.net/arvitavast

Contact: [email protected]

Easy reading: blog.qlaara.com

Pointer to the real stuff: Ramscar, M. et al. 2010. The Effects of Feature-Label-Order and Their Implications for Symbolic Learning. Cognitive Science 34(6): 909–957.