
1

Languages for aboutness

Indexing languages:
– Terminological tools
  • Thesauri (CV – controlled vocabulary)
  • Subject headings lists (CV)
  • Authority files for named entities (people, places, structures, organizations)
– Classification
– Keyword lists
– Natural language systems (broad interpretation)

2

Subject Analysis

What is something about?
– What is the content of an object “about”?

Different methods (Wilson, 1968)
– Counting (objective method)
– Purposive method
– Method appealing to unity
– What stands out

Challenges
– Non-text
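As a rough illustration of the counting (objective) method, the sketch below tallies how often each word occurs and proposes the most frequent ones as candidate subjects. It is only an approximation of the idea: the stop list, the function name `count_candidates`, and the sample text are assumptions made for this example, not part of Wilson's account.

```python
import re
from collections import Counter

# Hypothetical stop list; a working indexer would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on"}


def count_candidates(text, top_n=3):
    """Return the top_n most frequent non-stop words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Illustrative text, invented for this example.
sample = "Robins are birds. Birds migrate. The robin is one of the birds of spring."
print(count_candidates(sample, top_n=1))
```

Raw counts say nothing about what the author intended or what stands out, which is why the other methods on this slide exist.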

3

Aboutness: How to do it!

Read the document [Intellectual reading]
– look for key features
– many indexers mark up the items
– rarely have time to read the whole document

Determine aboutness [Conceptual analysis]

Translate aboutness into the vocabulary or scheme you are using
– In general: Subject headings: 1-3 headings
– Descriptors: 5-8 descriptors
– Classification: 1 notation (should it only be one!?).

4

Features of indexing languages:

Involve rules and require maintenance

Can be generated via automatic, human, or auto-human processes

Different processes generally display different strengths and weaknesses.

5

Features of indexing languages:

With the exception of a few general domain tools, they are generally domain specific.
– MeSH
– NASA Thesaurus
– Astronomy Thesaurus
– ERIC thesaurus
http://www.darmstadt.gmd.de/~lutes/thesoecd.html

Concepts (or concept representations) are arranged in a discernable order

6

Language schema designs

Classified -- grouping
– Hierarchies and facets

MeSH Browser http://www.nlm.nih.gov/mesh/MBrowser.html

Art and Architecture (Getty AAT) http://www.getty.edu/research/conducting_research/vocabularies/aat/

Alphabetical -- horizontal
– Verbal/Alphabetical (ordering/filing challenges)

7

Controlled Vocabulary

A list or a database of subject terms in which each concept has a preferred term or phrase that will be used to represent it in the retrieval tool; the terms not used have references (syndetic structure), and often scope notes.

8

Thesaurus (structured thesaurus)

Lexical semantic relationships

Composed of indexing terms/descriptors

Descriptors = representations of concepts

Concepts = Units of meaning (Svenonius)

9

Thesaurus

Preferred terms
Non-preferred terms
Semantic relations between terms
How to apply terms (guidelines, rules)
Scope notes
Adding terms (how to produce terms that are not listed explicitly in the thesaurus)

10

Preferred Terms

Control form of the term
• Spelling, grammatical form
• Theatre / Theater
• MLA / Modern Language Association

Choose preferred term between synonyms
• Brain cancer or Brain Neoplasms?

11

Common thesaural identifiers

SN Scope Note
– Instruction, e.g. don’t invert phrases

USE Use (another term in preference to this one)
UF Used For
BT Broader Term
NT Narrower Term
RT Related Term
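One way to picture these identifiers is as fields on a single term record. The sketch below is a minimal illustration; the class name `TermRecord`, its field names, and the display layout are assumptions made for this example rather than a format prescribed by any standard. The sample terms come from the equivalence slides later in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TermRecord:
    """Hypothetical record holding the common thesaural identifiers."""
    term: str
    scope_note: str = ""                                 # SN
    use: str = ""                                        # USE (non-preferred terms only)
    used_for: List[str] = field(default_factory=list)    # UF
    broader: List[str] = field(default_factory=list)     # BT
    narrower: List[str] = field(default_factory=list)    # NT
    related: List[str] = field(default_factory=list)     # RT

    def display(self) -> str:
        """Render the record in a conventional indented layout."""
        lines = [self.term]
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        if self.use:
            lines.append(f"  USE {self.use}")
        labelled = (("UF", self.used_for), ("BT", self.broader),
                    ("NT", self.narrower), ("RT", self.related))
        for label, values in labelled:
            for value in values:
                lines.append(f"  {label} {value}")
        return "\n".join(lines)


preferred = TermRecord("Nuclear Energy", used_for=["Nuclear Power"])
non_preferred = TermRecord("Nuclear Power", use="Nuclear Energy")
print(preferred.display())
print(non_preferred.display())
```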

12

Semantic Relationships

Hierarchy
Equivalence
Association

13

Hierarchies of Meaning

‘Glass’
  ‘Beer Glass’
  ‘Wine Glass’
    ‘Red wine glass’
    ‘White wine glass’

From: Controlled Vocabularies/ Paul Miller Interoperability Focus UKOLN

14

Hierarchy

Level of generality – both preferred terms

BT (broader term)
– Robins BT Birds

NT (narrower term)
– Birds NT Robins

– Inheritance, very specific rules
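BT and NT are two views of the same link, so recording one should imply the other. The sketch below stores both directions and walks upward to collect every broader term. The class `Hierarchy` and its method names are hypothetical, and the glass terms assume the reading of the "Hierarchies of Meaning" slide in which wine glasses sit under 'Glass'.

```python
from collections import defaultdict


class Hierarchy:
    """Hypothetical store for reciprocal BT/NT links."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its BT terms
        self.narrower = defaultdict(set)  # term -> its NT terms

    def add_bt(self, term, broader_term):
        """Record 'term BT broader_term' and the reciprocal NT link."""
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def ancestors(self, term):
        """Return all broader terms, nearest first along each path."""
        found = []
        stack = sorted(self.broader.get(term, ()))
        while stack:
            current = stack.pop()
            if current not in found:
                found.append(current)
                stack.extend(sorted(self.broader.get(current, ())))
        return found


h = Hierarchy()
h.add_bt("Robins", "Birds")
h.add_bt("Red wine glass", "Wine Glass")
h.add_bt("Wine Glass", "Glass")
print(h.ancestors("Red wine glass"))
print(sorted(h.narrower["Birds"]))
```

An item indexed under a narrow term can then also be retrieved by a search on any of its ancestors, which is one practical sense of inheritance.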

15

Equivalence

When two or more terms represent the same concept

One is the preferred term (descriptor), where all the information is collected

The other is the non-preferred term, which helps the user find the appropriate preferred term

16

Equivalence

Non-preferred term USE Preferred term
– Nuclear Power USE Nuclear Energy
– Periodicals USE Serials

Preferred term UF (used for) Non-preferred term
– Nuclear Energy UF Nuclear Power
– Serials UF Periodicals
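The USE and UF references above are inverses of each other, so one table can be derived from the other. A minimal sketch, assuming a plain dictionary of USE references; the names `USE_REFERENCES`, `resolve`, and `build_used_for` are hypothetical.

```python
# USE references taken from the examples above: non-preferred -> preferred.
USE_REFERENCES = {
    "Nuclear Power": "Nuclear Energy",
    "Periodicals": "Serials",
}


def resolve(term, use_refs):
    """Map a non-preferred term to its preferred term; pass others through."""
    return use_refs.get(term, term)


def build_used_for(use_refs):
    """Derive the reciprocal UF table: preferred -> list of non-preferred."""
    used_for = {}
    for non_preferred, preferred in use_refs.items():
        used_for.setdefault(preferred, []).append(non_preferred)
    return used_for


print(resolve("Periodicals", USE_REFERENCES))
print(build_used_for(USE_REFERENCES))
```

Resolving a searcher's term before lookup is what lets all the information collect under the preferred term.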

17

Association

One preferred term is related to another preferred term

Non-hierarchical

“See also” function

In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy

18

Association

Related Terms (RT) can be used to show these links within the thesaurus
– Bed RT Bedding
– Paint Brushes RT Painting
– Vandalism RT Hostility
– Programming RT Software
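RT links are usually treated as reciprocal: if Bed is related to Bedding, Bedding is related to Bed. The sketch below stores each pair in both directions so a "see also" lookup works from either side. The function name `build_related` is an assumption for this example.

```python
from collections import defaultdict

# RT pairs taken from the examples above.
RT_PAIRS = [
    ("Bed", "Bedding"),
    ("Paint Brushes", "Painting"),
    ("Vandalism", "Hostility"),
    ("Programming", "Software"),
]


def build_related(pairs):
    """Store every RT pair in both directions."""
    related = defaultdict(set)
    for first, second in pairs:
        related[first].add(second)
        related[second].add(first)
    return related


related = build_related(RT_PAIRS)
print(sorted(related["Bedding"]))
print(sorted(related["Programming"]))
```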

19

Thesauri Guides

National Information Standards Organization. (1993). Guidelines for the construction, format, and management of monolingual thesauri. ANSI/NISO Z39.19-1993. Bethesda, MD: NISO Press. [SILS reference Z695.N36 1994 or http://www.niso.org/standards/resources/z39-19.pdf]

Aitchison, Jean & Gilchrist, Alan. Thesaurus Construction: A Practical Guide. 3rd ed. London: Aslib, 1997.

Willpower Information Management Consultants http://www.willpower.demon.co.uk/thesprin.htm

20

Thesauri Directory

Indexing Resources on the WWW
– http://www.slais.ubc.ca/resources/indexing/database1.htm
– explore ASIST Thesaurus

Controlled vocabularies
– http://sky.fit.qut.edu.au/~middletm//cont_voc.html

Web Compendium
– http://www.darmstadt.gmd.de/~lutes/thesauri.html

21

Thesauri/Keywords

Created according to standards
Z39.19 (ANSI)

Single term concepts/postcoordination

“Wireless network” & “home computer”

“Terrorism” “Attacks” & “United States”

More popular in the online environment
Lend to recall
Lend to multilingual environment

Subject Heading Lists

Rules and guidelines
“Thesaurification”
multi-word concepts/pre-coordination

“Wireless home computer network”

$y Terrorism attacks $z United States

STRINGS

Lend to precision
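The contrast between the two columns can be sketched in code. With postcoordination, single-concept descriptors are combined at search time (a Boolean AND over an index); with pre-coordination, the combination is fixed in one heading string at indexing time. The document identifiers, the record contents, and the function names below are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical records indexed with single-concept descriptors.
RECORDS = {
    "doc1": {"Wireless network", "Home computer"},
    "doc2": {"Wireless network"},
    "doc3": {"Home computer"},
}

# Hypothetical pre-coordinated heading assigned by the indexer.
PRECOORDINATED = {"Wireless home computer network": {"doc1"}}


def build_index(records):
    """Build an inverted index: descriptor -> set of document ids."""
    index = defaultdict(set)
    for doc_id, descriptors in records.items():
        for descriptor in descriptors:
            index[descriptor].add(doc_id)
    return index


def postcoordinate(index, *terms):
    """Combine descriptors at search time with a Boolean AND."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


INDEX = build_index(RECORDS)
print(sorted(postcoordinate(INDEX, "Wireless network")))
print(sorted(postcoordinate(INDEX, "Wireless network", "Home computer")))
print(sorted(PRECOORDINATED["Wireless home computer network"]))
```

A search on one descriptor returns more documents (recall), while the pre-coordinated string matches only items the indexer judged to be about the whole combination (precision).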