Making Terms Matter 2015. Kara Warburton, Termologic

New Frontiers in Terminology Work

Kara Warburton, kara@termologic.com

The frontiers of terminology work are extending to such a degree

that we are no longer dealing with terms, but with

subsegment level linguistic data of various kinds,

which are needed to process information in the digital age.

Lexical data has many uses

● Computer-assisted translation (CAT)

● Controlled authoring (CA)

● Content Management Systems (CMS)

● Globalization Management Systems (GMS)

● Business process management (BPM)

● Global branding: products, features, marketing

● SEO and search keywords

● Spell checkers, typeahead, machine translation, indexing

● NLP and text mining, e.g. sentiment analysis, opinion mining, information forensics

A new operational framework is needed

● Factors driving the changes: advances in technology, diversity of applications, increased availability of large-scale corpora, “industrialization” of terminology

● Changes in the notion of termhood – what we agree to “manage”

● Changes in theory, mission, basic principles

● Changes in methodology – how and what we do

Trends we have been witnessing

[Chart: trends from 1990 to 2020 (horizontal axis) on a scale of 0 to 14 (vertical axis). Series: Normalization aim, Crowdsourcing, Range of tools, Role of text, Units accepted, Role of concept, Scope of applications.]

Classical notions of termhood are being challenged

● Classical definition of a term:

– the designation of a concept in a structured concept system of a field of special knowledge (subject field).

● Now guided by two factors**:

– relevance to the corpus – lexical structures that are “stable” and “salient” in a given corpus

– relevance to the intended application – purposeful, productive, economical, efficient, internally coherent

** Bourigault, D., and Jacquemin, C. 2000. Construction de ressources terminologiques. In J-M. Pierrel, editor, Ingénierie des langues. Hermès, Paris.

Definition from the “Textual theory of terminology*”

A term is a construct that takes shape through an analysis which gives consideration to corpus evidence, validation by subject-matter experts, and the purpose of the terminographical product

According to the intended purpose, a collection of “terms” can differ according to

● which lexical units are retained

● how they are documented

* See works of D. Bourigault, C. Roche, A. Condamines, Slodzian, and M-C. L'Homme.

Repurposability requires...

● A detailed, comprehensive data model

– Adherence to ISO standards and principles

– Takes into account different applications

– Emphasis on textual context and concept relations

● A terminology management system (TMS) that supports such a data model

● Term selection criteria (termhood) according to purpose
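As a rough illustration of the requirements above, a concept-oriented entry and a purpose-based selection step might be sketched as follows. This is only a minimal sketch, not the data model of any particular TMS or ISO standard: the class names, field names, and purpose tags (`ConceptEntry`, `Term`, `purposes`, `select_terms`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One designation of a concept (all field names are illustrative)."""
    text: str
    language: str                      # e.g. "en", "fr"
    context: str = ""                  # sentence showing the term in use
    usage_status: str = ""             # e.g. "preferred", "admitted", "prohibited"
    purposes: set[str] = field(default_factory=set)  # hypothetical application tags


@dataclass
class ConceptEntry:
    """A concept with its terms, subject-field path, and relations."""
    concept_id: str
    subject_field: list[str]           # path through a multi-level hierarchy
    definition: str = ""
    terms: list[Term] = field(default_factory=list)
    # relation type -> ids of related concepts, e.g. {"broader": ["c000"]}
    relations: dict[str, list[str]] = field(default_factory=dict)


def select_terms(entries, purpose):
    """Return (concept_id, term text) pairs tagged for the given purpose."""
    return [
        (entry.concept_id, term.text)
        for entry in entries
        for term in entry.terms
        if purpose in term.purposes
    ]


if __name__ == "__main__":
    entry = ConceptEntry(
        concept_id="c001",
        subject_field=["hardware", "storage"],
        terms=[
            Term("hard disk drive", "en", purposes={"translation", "authoring"}),
            Term("HDD", "en", purposes={"translation", "search"}),
        ],
    )
    print(select_terms([entry], "search"))
```

The same entries can then feed different applications: a search index might take every variant tagged for search, while a controlled-authoring tool takes only the variants tagged for authoring.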

Lack of structure reduces reuse potential

Knowledge bases

Are more repurposable than “flat” termbases

● Rich with concept relations

● Multi-level subject-field hierarchy

● Multi-media

[Figures: multi-level subject field hierarchy; multimedia]

© Termologic, 2014. All rights reserved.

Search query contraction

Faceted search without structured lexical resources

Global Search Engine Optimization

● Increase traffic to a website by improving the site's rank in search engines

● A key SEO method is to add search keywords strategically to web sites

Keyword Effectiveness Index (KEI)

KEI = (volume of searches* per day)² / number of competing pages (hits)

● A value greater than 1 is ideal but often difficult to achieve

● Values lower than 1 can still indicate good keywords

* you can get this data from: adwords.google.com/KeywordPlanner
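The KEI calculation can be sketched in a few lines. The function name and the sample figures below are made up for illustration; real search volumes and hit counts would come from a keyword tool and a search engine.

```python
def keyword_effectiveness_index(daily_searches: float, competing_pages: int) -> float:
    """KEI = (searches per day) squared, divided by the number of competing pages."""
    if competing_pages <= 0:
        raise ValueError("competing_pages must be positive")
    return daily_searches ** 2 / competing_pages


if __name__ == "__main__":
    # Hypothetical candidates: (keyword, searches per day, competing pages)
    candidates = [
        ("terminology management", 200, 50_000),
        ("termbase software", 1_000, 500_000),
    ]
    # Rank candidates from highest to lowest KEI
    ranked = sorted(
        candidates,
        key=lambda c: keyword_effectiveness_index(c[1], c[2]),
        reverse=True,
    )
    for keyword, searches, pages in ranked:
        print(keyword, keyword_effectiveness_index(searches, pages))
```

With these invented figures the first keyword scores 0.8 and the second 2.0, so the second would be ranked higher even though it faces ten times as many competing pages.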

Enterprise search can beat Google

● How can we associate the user's search words with different but closely related words that are present in the text?

➔ Load the search engine (SE) with a lexical resource (LR) comprising terms from the domain in question.

● Can we do this for global SEO (e.g. on Google, Baidu, Yandex)?

➔ No. The target domain of a search in a global SE is unknown

➔ We can't load a global SE with an LR

● Can we do this for an enterprise search (e.g. www.ibm.com or www.scania.com)?

➔ YES!

loafers, shoes, moccasins, chappals, sandals

knowledge base feeds into enterprise search

If a user searches for “Venus”, the SE knows it is not the tennis player.

A search for “planet” could suggest all individual planets as alternate searches.
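One way to picture how a lexical resource could feed an enterprise search is a simple query-expansion lookup. The sketch below is an assumption-laden illustration, not a description of any actual search engine: the dictionary layout, the `expand_query` function, and the grouping of the footwear terms under one concept are all hypothetical.

```python
# Hypothetical lexical resource: concept -> related terms and narrower concepts
LEXICAL_RESOURCE = {
    "footwear": {
        "terms": ["shoes", "loafers", "moccasins", "chappals", "sandals"],
        "narrower": [],
    },
    "planet": {
        "terms": ["planet"],
        "narrower": ["Mercury", "Venus", "Earth", "Mars",
                     "Jupiter", "Saturn", "Uranus", "Neptune"],
    },
}


def expand_query(query, resource=LEXICAL_RESOURCE):
    """Suggest alternate search words for a query found in the resource."""
    needle = query.strip().lower()
    suggestions = []
    for entry in resource.values():
        terms = entry["terms"]
        if needle not in (t.lower() for t in terms):
            continue  # the query does not designate this concept
        # Offer the other terms of the concept, then its narrower concepts
        for candidate in terms + entry["narrower"]:
            if candidate.lower() != needle and candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions


if __name__ == "__main__":
    print(expand_query("loafers"))  # ['shoes', 'moccasins', 'chappals', 'sandals']
    print(expand_query("planet"))
```

Because the resource is restricted to the enterprise's own domain, a word such as “Venus” only ever resolves to the sense that the resource records.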

Leveraging big data

Using various NLP tools, terminologists can base decisions on objective statistical measures

● Generation of salient unigrams

● Term extraction tools

● Concordancing software

● Collocations

● Pattern clustering

● Concept maps
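To give a feel for the first item in the list, salient unigrams can be approximated by comparing how often a word occurs in a domain corpus with how often it occurs in a general reference corpus. The sketch below uses a plain smoothed frequency ratio, which is an assumption chosen for brevity; actual tools may use other statistics. The function names and the tiny sample texts are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def salient_unigrams(domain_text, reference_text, top_n=5, min_count=2):
    """Rank domain words by smoothed relative-frequency ratio against a reference."""
    domain = Counter(tokenize(domain_text))
    reference = Counter(tokenize(reference_text))
    n_domain = sum(domain.values())
    n_reference = sum(reference.values())
    scores = {}
    for word, count in domain.items():
        if count < min_count:
            continue  # ignore words too rare to be "stable" in the corpus
        rel_domain = (count + 1) / (n_domain + 1)
        rel_reference = (reference[word] + 1) / (n_reference + 1)  # add-one smoothing
        scores[word] = rel_domain / rel_reference
    # Highest ratio first; ties broken alphabetically for a stable result
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


if __name__ == "__main__":
    domain = "the termbase stores the term entry and the termbase exports the term entry"
    reference = "the cat sat on the mat and the dog sat on the rug"
    print(salient_unigrams(domain, reference))
```

In this toy example the function word “the” is equally frequent in both texts and therefore ranks below the domain words, which is the behaviour a terminologist would want from a salience measure.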

[Screenshots: salient unigrams; concordance; collocations; patterns; collocations of “dimension”; expansion of bigram to trigram]
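The expansion of a bigram to a trigram can be sketched as a count of the words that occur immediately before or after each occurrence of the bigram. This is a simplified illustration under stated assumptions: the function name `expand_bigram`, the `min_count` threshold, and the sample sentence are hypothetical, and real extraction tools typically add part-of-speech filtering and statistical tests.

```python
from collections import Counter


def expand_bigram(tokens, bigram, min_count=2):
    """Return trigrams that extend the bigram left or right, with their counts."""
    first, second = bigram
    counts = Counter()
    for i in range(len(tokens) - 1):
        if tokens[i] == first and tokens[i + 1] == second:
            if i > 0:  # extend to the left
                counts[(tokens[i - 1], first, second)] += 1
            if i + 2 < len(tokens):  # extend to the right
                counts[(first, second, tokens[i + 2])] += 1
    return [(trigram, n) for trigram, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    text = ("the data model supports reuse and the data model supports export "
            "while a data model helps")
    for trigram, n in expand_bigram(text.split(), ("data", "model")):
        print(" ".join(trigram), n)
```

Only extensions that recur are kept, so one-off neighbours such as “a data model” in the sample are filtered out while recurring ones are offered as trigram candidates for review.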

Like the chameleon that changes colour to adapt to its environment, terminologists need to adapt to new conditions.

While respecting the traditions of the past where that makes sense, we also need to be prepared to unshackle ourselves from those traditions in order to play a greater role in the evolution of information technology.
