LIS 677 1

Indexing with a Controlled Vocabulary

Basic Concepts

LIS 677 2

Indexing: Topics Covered

The “concept triangle”

The five-axiom theory of indexing

The indexing process

LIS 677 3

The “Concept Triangle”

Referent, Concept, Expression

LIS 677 4

The Referent

“The referent is everything about which a meaningful statement can be made.”

For example, about a certain table many statements can be made concerning the material of which it is made, its price, purpose, producer, weight, the structure of its surface, etc.

LIS 677 5

The Concept

“We define the concept as the sum of the essential statements that can be made about a referent.”

Essential statements are those which contribute to the characterization of the referent itself.

Inessential statements are those which do not contribute to the characterization of the referent itself.

LIS 677 6

Kinds of Concepts

General concepts

  The general concept describes a class of interrelated referents.

  For example: metal, oxidation, information.

Individual concepts

  The individual concept is one to which no meaningful conceptual feature can be added.

  For example: Albert Einstein; Fritz the Cat.

LIS 677 7

General vs. Individual Concepts in Indexing

“It is the task of subject indexes to provide access to documents or text passages relevant to general concepts.”

“An information system which works quite well for individual concepts, may totally fail when it is required to manage general concepts too.”

LIS 677 8

The Mode of Expression

Lexical expressions

  Linear strings of characters commonly agreed upon to express concepts or concept connections

Non-lexical expressions

  Linear strings of characters by which concepts or concept relations are expressed and upon which no firm agreement has been made

LIS 677 9

Forms of Expression & Indexing

Lexical expressions require little indexing work

  They often appear in Identifier fields rather than in Descriptor fields of database records

Non-lexical expressions require indexing work

  Non-lexical expressions exhibit ambiguity and multiplicity

LIS 677 10

Concepts & Expressions

Individual concepts are almost always expressed lexically

General concepts are almost always expressed non-lexically

In natural, uncontrolled language there is an unlimited multitude of non-lexical, paraphrasing expressions for concepts

Multiplicity & ambiguity of natural language expressions are largely restricted to general concepts
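To make the multiplicity problem concrete, the following minimal sketch shows how a controlled vocabulary might collapse several paraphrasing expressions for one general concept into a single descriptor. The entry vocabulary, the descriptor OXIDATION as used here, and the function names are illustrative assumptions, not part of any particular thesaurus.

```python
# Hypothetical entry vocabulary: every natural-language paraphrase of a
# general concept points to exactly one controlled descriptor.
ENTRY_VOCABULARY = {
    "oxidation": "OXIDATION",
    "rusting": "OXIDATION",
    "corrosion by oxygen": "OXIDATION",
    "reaction with oxygen": "OXIDATION",
}


def to_descriptor(expression):
    """Return the controlled descriptor for a free-text expression, or None."""
    return ENTRY_VOCABULARY.get(expression.strip().lower())


def index_terms(expressions):
    """Map free-text expressions to a sorted, duplicate-free descriptor list."""
    found = {to_descriptor(e) for e in expressions}
    found.discard(None)  # expressions the vocabulary does not cover
    return sorted(found)


# Three different wordings end up under one predictable descriptor.
print(index_terms(["Rusting", "reaction with oxygen", "corrosion by oxygen"]))
```

An individual concept such as a personal name would not need this mapping step, since its lexical expression is already agreed upon.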

LIS 677 11

Five-Axiom Theory of Indexing

Definability

Order

Sufficient degree of order

Representational predictability

Representational fidelity

LIS 677 12

Axiom of Definability

The compilation of information relevant to a topic can be delegated (to a skilled specialist or a programmed search mechanism) only to the extent to which the inquirer can define the topic in terms of concepts and concept relations.

LIS 677 13

Axiom of Order

Any compilation of information relevant to a topic is an order-creating process.

Order is defined as the meaningful proximity of the parts of a whole at a foreseeable place.

LIS 677 14

Axiom of Sufficient Degree of Order

The demands made on the degree of order increase as the size of the collection and/or the frequency of the searches and/or the specificity of the searches increases.

LIS 677 15

Axiom of Representational Predictability

The completeness of any search for documents relevant to a topic of interest depends on the predictability of the modes of expression for concepts in the search file.

Successful searches require a language with predictable modes of expression for concepts.

LIS 677 16

Axiom of Representational Fidelity

The precision of any search for documents relevant to a topic of interest depends on the fidelity with which the modes of expression for concepts can be expressed in the system’s language.
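The two representational axioms speak of the completeness and the precision of a search. As a rough sketch, and assuming completeness is measured in the conventional way as recall, both values can be computed from the set of retrieved documents and the set of relevant documents. The function name and document identifiers below are illustrative only.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    recall    = relevant documents retrieved / all relevant documents
    precision = relevant documents retrieved / all retrieved documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: four documents retrieved, three are actually relevant,
# and only two of the relevant ones were found.
recall, precision = recall_and_precision(
    retrieved={"d1", "d2", "d3", "d4"},
    relevant={"d1", "d2", "d5"},
)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Read against the axioms, unpredictable expressions tend to lower recall (relevant documents are missed), while unfaithful expressions tend to lower precision (irrelevant documents are retrieved).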

LIS 677 17

The Indexing Process

Step 1: Determine the essence of a document

Step 2: Represent this essence with sufficient degrees of predictability and fidelity

LIS 677 18

Importance of Categories

“The predictability of essence selection is markedly enhanced when the indexers have an orientation to conceptual categories.”

For example, in some chemistry databases, all descriptors belong to the following categories: MATTER, LIVING ENTITY, APPARATUS, PROCESS.

In ERIC, the nine Descriptor Groups serve as categories.

LIS 677 19

Natural Language Indexing

“Natural language expressions, as derived from original texts, can only in the case of individual concepts lead to an information system of adequate quality and survival power.”

The specificity of natural language expressions is compromised by their lack of predictability.

LIS 677 20

Importance of “Cutter’s Rule”

Precise and complete searches require that the most specific descriptors that the vocabulary provides be chosen for the indexing of a subject.

A query with a specific descriptor must not retrieve concepts that are more general than the search descriptor.
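The rule can be sketched in a few lines, assuming a small hypothetical thesaurus in which each descriptor has at most one broader term. The descriptors, the BROADER table, and the function names are invented for illustration. Treating narrower descriptors as matches for a broader query is an additional assumption of this sketch, not something the rule itself requires.

```python
# Hypothetical thesaurus fragment: descriptor -> its broader term.
BROADER = {
    "METAL": "MATTER",
    "IRON": "METAL",
    "COPPER": "METAL",
}


def ancestors(term):
    """Return all broader terms of `term`, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def most_specific(candidates):
    """Indexing side: drop any candidate that is broader than another one."""
    candidates = set(candidates)
    too_general = {a for c in candidates for a in ancestors(c)}
    return sorted(candidates - too_general)


def matches(query, doc_descriptors):
    """Search side: a document matches if one of its descriptors equals the
    query or is narrower than it; broader descriptors never match."""
    return any(d == query or query in ancestors(d) for d in doc_descriptors)


print(most_specific(["METAL", "IRON"]))  # only the most specific term is kept
print(matches("IRON", ["METAL"]))        # a broader descriptor is not retrieved
print(matches("METAL", ["IRON"]))        # a narrower descriptor is retrieved
```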

LIS 677 21

Importance of Syntax

In the interests of enhanced representational fidelity any advanced indexing language needs a syntax in addition to its vocabulary.

The syntax should represent the manner in which the concepts are connected with each other in the texts to be stored.
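One way to see why vocabulary alone is not enough is to compare a plain set of descriptors with a role-tagged statement. The sketch below is only an illustration: the roles (process, agent, target), the descriptors, and the two sample documents are assumptions made for this example and do not come from any specific indexing language.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """A hypothetical indexing statement: descriptors plus their roles."""
    process: str
    agent: str
    target: str


# Two documents about opposite situations that share the same descriptors.
doc_a = Statement(process="COATING", agent="COPPER", target="IRON")
doc_b = Statement(process="COATING", agent="IRON", target="COPPER")


def bag(statement):
    """Vocabulary only: the descriptors with their roles thrown away."""
    return frozenset(statement)


def search(docs, **roles):
    """Return the names of documents whose statement fills the given roles."""
    return [
        name
        for name, stmt in docs.items()
        if all(getattr(stmt, role) == value for role, value in roles.items())
    ]


docs = {"a": doc_a, "b": doc_b}
print(bag(doc_a) == bag(doc_b))       # without syntax the two are identical
print(search(docs, agent="COPPER"))   # with syntax only document "a" matches
```

Without the roles, a search for copper used as a coating would retrieve both documents, which is a loss of representational fidelity.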