INDEXES AND INDEXING
Ma. Theresa B. Villanueva
Head, Microforms and Digital Resource Center
Rizal Library, Ateneo De Manila University
April 15-16, 2013
James O’Brien Library-Ateneo de Naga University
DEFINITION OF TERMS
Index
a tool that indicates to a user the information, or a source of information, that one needs
a systematic guide designed to indicate subjects, topics, or features of documents in order to facilitate their retrieval
Indexing
the process of identifying and assigning index terms to a document, either to describe its physical characteristics, give facts about its creator or distribution, or describe its content
General Purposes of Indexes
To construct representations of documents in a form that is suitable to the users to browse through
To maximize the searching success of the users
To minimize the time and effort in finding information
Uses of Indexes
• facilitate reference to specific material or locate wanted information
• serve as a filter to withhold irrelevant materials
• make the information storage and retrieval system useful to individual users
• disclose related information
• serve as a tool for current awareness services
By Arrangement
a. Alphabetical Index - is based on the orderly principle of letters of the alphabet; used for the arrangement of subheadings, cross references as well as main headings
b. Classified Index – contents are arranged systematically by classes or subject headings
c. Concordance – is an alphabetical index of all principal words appearing in a single text or in the multi-volume works of a single author, with a pointer to the precise point at which each word occurs.
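A concordance as defined above can be sketched in a few lines. This is only an illustration: the stop list, the sample text, and the use of line numbers as locators are assumptions, not part of the original slides.

```python
import re
from collections import defaultdict

# Hypothetical stop list; a real concordance would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "that"}

def build_concordance(lines):
    """Map each principal word to the line numbers where it occurs."""
    concordance = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z']+", line.lower()):
            if word not in STOP_WORDS and line_no not in concordance[word]:
                concordance[word].append(line_no)
    # Alphabetical arrangement of the entry words.
    return dict(sorted(concordance.items()))

text = [
    "An index is a guide to the contents of a document",
    "A concordance is an index of the words in a text",
]
concordance = build_concordance(text)
for word, locations in concordance.items():
    print(word, locations)
```

Each entry word points to every line on which it occurs, which is the "precise pointer" the definition calls for.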
By Physical Form
a) Card index – an index in which 3” x 5” cards are used as the tools
b) Printed index – a tool for indexing or for researching and retrieval of information that is in printed form
c) Microform index – index to microforms such as microfiche and microfilm
d) Computerized index – uses computers to construct indexes
By Type of Material Indexed
a. Audiovisual Material Index
- textual labeling (index terms or description) is needed along with image matching
- search on words may retrieve a particular image related to the search term which in turn can be used as input to find other related entries
b. Book index
- a list of words or groups of words arranged alphabetically at the back of the book, giving the page location of the subject or name associated with each word.
c. Periodical Index/Newspaper Index
- open-ended projects usually performed by a group of people
- consistency is a challenge since each periodical issue may deal with unrelated topics by several authors, written in different styles and aimed at different users.
Periodicals Indexes
Classified Index
Entry points are arranged in a hierarchy of related topics, starting with generic or broad topics and working down to the specific ones.
Examples:
- Index Medicus – classified index in the field of medicine and related disciplines
- Engineering Index – classified index in the field of engineering and related disciplines
Alphabetical Subject Index
An alphabetical subject index covers a number of different kinds of indexes. The arrangement is in alphabetical order and follows a familiar pattern.
Examples:
- Readers’ Guide to Periodical Literature (RGPL)
- Index to Philippine Periodicals (IPP)
Author Index
Entry points are names of persons, organizations, government agencies, institutions, etc.
Examples:
- Development Bank of the Philippines
- Philippine Chamber of Commerce and Industry
- Romulo, Carlos P.
INDEXING PRINCIPLES
Exhaustivity – refers to the extent to which a document is analyzed to identify its subject content
Specificity – refers to the extent to which a concept or topic in a document is identified by a precise term in the hierarchy of its genus-species relations
Consistency – refers to the extent to which agreement exists on the terms to be used to index the contents of documents
Principle of Exhaustivity
• Exhaustive indexing
use of various index terms to fully cover the major and minor themes of a document
• Selective indexing
use of a few terms to cover only the main or major theme of a document
Exhaustivity results in high recall but low precision.
Principle of Specificity
Example:
Genus: CITRUS FRUITS
Species: ORANGES, LEMONS, LIMES, GRAPEFRUITS
Specificity results in high precision but low recall.
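The recall and precision trade-off can be made concrete with the standard definitions: recall is the share of relevant documents that a search retrieves, and precision is the share of retrieved documents that are relevant. The document identifiers and the two result sets below are invented for illustration.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical collection: four documents are truly relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# Broad, exhaustively indexed terms pull in many documents.
exhaustive_search = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
# Narrow, highly specific terms pull in few documents.
specific_search = {"d1", "d2"}

print(recall_and_precision(exhaustive_search, relevant))
print(recall_and_precision(specific_search, relevant))
```

In this toy case the broad search finds every relevant document but half of what it returns is noise, while the narrow search returns only relevant documents but misses half of them.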
Principle of Consistency
There are two types of consistency:
Inter-indexer consistency refers to the agreement between or among indexers in assigning subject terms to a particular article.
Intra-indexer consistency refers to the extent to which one indexer is consistent with himself/herself in assigning subject terms.
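The slides do not name a way to measure consistency. One widely cited option is Hooper's measure: the number of terms two indexers have in common, divided by the common terms plus the terms unique to each indexer. The sketch below assumes that measure, and the term lists are invented.

```python
def indexer_consistency(terms_a, terms_b):
    """Hooper-style consistency: shared terms / all distinct terms used."""
    a = {t.lower() for t in terms_a}
    b = {t.lower() for t in terms_b}
    if not a and not b:
        return 1.0  # two empty term sets agree trivially
    return len(a & b) / len(a | b)

# Hypothetical terms assigned to the same article by two indexers.
indexer_1 = ["Hybrid rice", "Agriculture", "Philippines"]
indexer_2 = ["Hybrid rice", "Food supply", "Philippines"]

score = indexer_consistency(indexer_1, indexer_2)
print(score)
```

The same function covers intra-indexer consistency if the two lists come from one indexer working on the same article at different times.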
Indexing Methods
1. Derived or derivative indexing
– a method by which words and phrases occurring in the title or text of a documentary unit are extracted by a human or computer to serve as indexing terms.
- also called extractive indexing.
2. Assigned indexing
- a method by which terms, descriptors or subject headings are selected by a human or computer to represent the topics or features of a documentary unit
- assigned terms are often taken from a source other than the document itself.
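Derived indexing is the easier of the two methods to automate, because the terms come straight from the document. A minimal sketch, assuming a small hypothetical stop list and using only the title as the source of terms:

```python
import re

# Hypothetical stop list of words that carry no subject meaning.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "towards", "for", "on"}

def derive_terms(title):
    """Extract candidate index terms from the words of a title."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", title):
        lowered = word.lower()
        if lowered not in STOP_WORDS and lowered not in terms:
            terms.append(lowered)
    return terms

title = "Hybrid rice: a new hope towards a bountiful Philippines"
terms = derive_terms(title)
print(terms)
```

Assigned indexing would instead map these extracted words onto terms chosen from an outside source such as a thesaurus, so a word like "bountiful" would likely be dropped in favour of an established subject heading.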
Indexing Language
An indexing language is a language that is used by the indexer to represent the subject content of a document.
Purposes and Uses of Indexing Language:
to represent the subject content of a document either using the words of the author or assigning appropriate descriptors from a controlled vocabulary
to help users discriminate between terms and reduce ambiguity in the language
Types of Indexing Language
1. Natural Language
- uses index terms/words occurring in the printed text as index entries; it is sometimes called a derived-term system
Characteristics of using Natural Language:
• Improves recall because it provides more access points but reduces precision
• Redundancy is greater
• Uses more current terms
• Tends to be favored by end-users
2. Controlled vocabulary
- represents the general conceptual structure of one or more subject areas and presents a guide to the users of the index
- categorized as an assigned-term system
Controlled vocabulary provides cross references to show the three relationships of terms:
a) equivalence
b) hierarchical
c) associative
This is achieved by providing or showing under each term:
broader term (BT)
narrower term (NT)
related term (RT)
use for (UF)
see also (SA)
Relationships of Terms:
a. Equivalence relationship - implies that there will be more than one term denoting the same concept
Equivalence relationship:
Example 1
Use for (UF) or Use reference (see reference)
- refers the user from a non-usable term to a preferred descriptor
Example: EMPLOYEES
UF: Personnel
Staff
Workers
Equivalence relationship
Example 2:
BIRTH CONTROL
UF: Family Planning
- reference deals primarily with synonymous or variant forms of the preferred descriptor
- it is also used to lead the indexer to more general terms
Examples that indicate Equivalence relationship:
Synonyms (e.g. Reason; Cause)
Quasi-synonyms (e.g. Law; Law Management)
Preferred spelling (e.g. Catalog; Catalogue)
Acronyms and abbreviations (e.g. ASEAN; Association of Southeast Asian Nations)
Current and established terms (e.g. Cellular Radio; Cellular Phone)
Translation (e.g. Coconut Coir; Bunot)
b. Hierarchical relationship
– refers to the general and specific or broad and narrow type of relationship
Broader term (BT)
Employees
BT: People
- shows the hierarchical relationship upward in the classification ranking
- it differs from the use for reference in that both the basic term and its broader term are descriptors and both can be used
Hierarchical relationship Example 1 :
Cats
BT: ANIMALS
"ANIMALS" is a broader term to "CATS" because all cats are animals.
Reference: http://publish.uwo.ca/~craven/677/thesaur/main05.htm
Hierarchical relationship:
Example 2
Narrower term (NT)
Employees
NT: HOTEL EMPLOYEES
RAILROAD EMPLOYEES
- reference is similar to the broader term reference, except it goes down in the classification ranking
Hierarchical relationship: Example 3
Head
NT: NOSE
“NOSE” might be a narrower term to “HEAD”, because noses are normally parts of heads.
Reference: http://publish.uwo.ca/~craven/677/thesaur/main05.htm
Hierarchical relationship: Example 4
Genus-species relationship (represents class inclusion)
Example: Animals – Domestic Animals – Cats
Whole-part relationship
Example: Hand – Fingers
Instance relationship
Example: Mountains – Mount Apo
c. Associative relationship
Example 1:
Related term (RT)
EMPLOYEE
RT: EMPLOYMENT
- reference refers to a descriptor that can be used in addition to the basic term but is not in a hierarchical relationship with it
Other Examples:
Teachers – Students
Tables – Chairs
Education – Teaching
Men – Women
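The three relationships can be held together in one thesaurus record. The sketch below is an illustrative data structure, not the format of any particular thesaurus software; the class and function names are hypothetical, and the sample values are the EMPLOYEES examples used in these slides.

```python
from dataclasses import dataclass, field

@dataclass
class Descriptor:
    """One preferred term with its cross references."""
    term: str
    used_for: list = field(default_factory=list)   # UF: equivalence
    broader: list = field(default_factory=list)    # BT: hierarchical, upward
    narrower: list = field(default_factory=list)   # NT: hierarchical, downward
    related: list = field(default_factory=list)    # RT: associative

    def display(self):
        """Render the record in the usual thesaurus layout."""
        lines = [self.term.upper()]
        for label, values in (("UF", self.used_for), ("BT", self.broader),
                              ("NT", self.narrower), ("RT", self.related)):
            for value in values:
                lines.append(f"  {label}: {value}")
        return "\n".join(lines)

def see_references(descriptor):
    """Map each non-preferred term to the descriptor it points to."""
    return {variant: descriptor.term for variant in descriptor.used_for}

employees = Descriptor(
    term="Employees",
    used_for=["Personnel", "Staff", "Workers"],
    broader=["People"],
    narrower=["Hotel employees", "Railroad employees"],
    related=["Employment"],
)

print(employees.display())
print(see_references(employees))
```

The UF list doubles as the source of "see" references: a user who looks up Staff is sent to Employees, while BT, NT and RT all point to terms that may themselves be used for indexing.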
Scope Note (SN) & Qualifier
- used to inform users about a descriptor’s usage restrictions or to clarify ambiguity; a scope note may give additional instructions to indexers
Scope Note:
Examples:
INDEXING
(SN) Assigning of natural language terms to documents
HOSPITALIZATION
(SN) Assign also terms for the conditions for which patients were hospitalized, if applicable
Qualifier:
Example:
Security (Law)
Security (Psychology)
Reference: http://publish.uwo.ca/~craven/677/thesaur/main08.htm
Functions of Controlled Vocabulary:
• To control synonyms by choosing one form as the standard term
• To distinguish among homographs
• To link or bring together those terms whose meanings are closely related
Example: Cereals and Wheat
• Controls variant spelling
A controlled vocabulary may take the form of verbal expressions, as illustrated by subject headings lists and thesauri, or coded/nonverbal expressions, as shown by classification schemes.
Subject headings lists – lists of terms representing several subject fields; some focus on specific fields
Thesauri – authority devices that cover more specific or narrower subject fields
Classification schemes – generally contain coded expressions or notations for the relevant topics in a particular class or subclass
INDEXING PROCESS:
1. Recording of bibliographic data
- recording of the important information or the elements that identify a particular document
The International Organization for Standardization (ISO) set a standard for bibliographic references:
ISO 690-1975 (E) - “Bibliographic References – Essential and Supplementary Elements”
- When indexing contents of a collection of documents, locators should give complete information about each document.
- For periodical articles, each entry normally consists of the following elements:
Essential elements for an article or contribution in a periodical are:
Name(s) of Author(s) with forenames
Title of the article
Title of the periodical or Source
Volume Number
Issue Number
Date of the issue
Page number
Example:
Name(s) of Author(s): [Xian, Jie]
Title of the article: [Hybrid rice: a new hope towards a bountiful Philippines]
Title of the periodical or Source: [Impact]
Volume Number: [46]
Issue Number: [9]
Date of the issue: [September 2007]
Page number: [4-8]
Sample entry:
ISO FORMAT:
________________ (subject/topic)
Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact, Vol. 46, no.9, S ‘12, p. 4-8.
Format comparison:
ISO FORMAT:
_______________ (subject/topic)
Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact, Vol. 46, no.9, S ‘12, p. 4-8.
ATENEO FORMAT:
_______________ (subject/topic)
Hybrid rice: a new hope towards a bountiful Philippines. Xian, Jie. Impact 46 (9) : 4-8. S ‘12.
OTHER FORMAT:
________________ (subject/topic)
Xian, Jie. Hybrid rice: a new hope towards a bountiful Philippines. Impact 46 (9) : 4-8. S ‘12
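The sample entries above carry the same recorded elements and differ only in order and punctuation, so one bibliographic record can feed several entry styles. The sketch below is an assumption about how that might be done; the class and function names are hypothetical, and the date is passed through exactly as abbreviated in the sample entries.

```python
from dataclasses import dataclass

@dataclass
class ArticleRecord:
    """Essential elements recorded for one periodical article."""
    author: str
    title: str
    periodical: str
    volume: int
    issue: int
    date: str    # kept as already abbreviated by the indexer
    pages: str

def long_entry(rec):
    """Author-first entry with spelled-out volume, number and page labels."""
    return (f"{rec.author}. {rec.title}. {rec.periodical}, "
            f"Vol. {rec.volume}, no.{rec.issue}, {rec.date}, p. {rec.pages}.")

def compact_entry(rec):
    """Author-first entry with volume (issue) : pages notation."""
    return (f"{rec.author}. {rec.title}. {rec.periodical} "
            f"{rec.volume} ({rec.issue}) : {rec.pages}. {rec.date}")

rec = ArticleRecord(
    author="Xian, Jie",
    title="Hybrid rice: a new hope towards a bountiful Philippines",
    periodical="Impact",
    volume=46,
    issue=9,
    date="S '12",
    pages="4-8",
)

print(long_entry(rec))
print(compact_entry(rec))
```

Keeping the elements separate until output time means an index can switch house styles without re-recording any bibliographic data.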
2. Subject determination
- determining the “aboutness” of the material and formulating a concept list
• Choose the most appropriate concepts; consider the users & the purpose of the index
• No arbitrary limit should be set to the number of terms or descriptors which can be assigned to a document.
- it should be determined fully by the amount of information contained in the document
- it should be related to the expected needs of the users of the index.
• Modify the indexing guidelines and procedures if needed; but modification should not compromise the structure or logic of the indexing language.
• Concepts should be as specific as possible. More general concepts may be preferred in some circumstances, depending upon the following factors:
– over-specificity might adversely affect the performance of the indexing system.
– if an idea is not fully developed, or is referred to only casually by the author, then it might be justified to index at a more general level
3. Content/Conceptual analysis
– identifying the topics discussed in a document and determining which aspects of it users will be interested in
Content Analysis
- Decide which topics in the item are relevant to the potential user of the document.
- Decide which topics truly capture the content of the document.
- Determine terms that come as close as possible to the terminology used in the document.
- Decide on index terms and the specificity of those terms.
Parts of the document that have to be analyzed
Title of the document/article - it is considered the basic indexing unit
- it is the first stop in determining the subject content
Abstract - an information-packed miniature of the document;
- a good abstract can be a fundamental indicator of subject content
Text itself - includes introduction, summary, conclusion, section heading, first & last sentences of the paragraph
Illustrations, diagrams, tables and captions
References - reference sources cited by the author may also be considered as subject indicators
Factors that may affect content analysis:
a labor shortage or other critical time factor
the guidelines and policies imposed by institutions, which generally concern the selection of index content
the indexer’s decisions on which aspects of the subject will be emphasized and which will be de-emphasized
4. Translation
- involves the conversion of terms in the natural language into standard terms drawn from a controlled vocabulary such as a thesaurus, subject headings list, etc.
- match terms in the concept list against those available in the controlled vocabulary
Practices to follow in the Translation process:
- Concepts which are already translated into indexing terms should be translated into their preferred terms
- Terms which represent new concepts should be checked for accuracy and acceptability in reference tools such as:
◦ Dictionaries and encyclopedias
◦ Thesauri (UNBIS Thesaurus)
◦ Classification schemes (Library of Congress)
◦ Established indexes (Readers’ Guide to Periodical Literature)
- Subject specialists, particularly those with some knowledge of indexing or documentation, may also be consulted
- If the concepts are not found in an existing thesaurus or classification scheme, these may be:
• expressed by terms or descriptors which are admitted into the indexing language
• represented temporarily by more general terms; the new concepts being proposed as candidates for later addition
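The translation practices above amount to a lookup against the controlled vocabulary, with unmatched concepts set aside as candidate terms. A minimal sketch, assuming a tiny hypothetical vocabulary built from the equivalence examples earlier in these slides:

```python
# Hypothetical controlled vocabulary: entry term -> preferred descriptor.
PREFERRED = {
    "employees": "Employees",
    "personnel": "Employees",
    "staff": "Employees",
    "workers": "Employees",
    "birth control": "Birth control",
    "family planning": "Birth control",
}

def translate(concepts, vocabulary=PREFERRED):
    """Split a concept list into preferred descriptors and candidate terms."""
    descriptors, candidates = [], []
    for concept in concepts:
        preferred = vocabulary.get(concept.strip().lower())
        if preferred is None:
            # Not in the vocabulary: propose for later addition.
            candidates.append(concept)
        elif preferred not in descriptors:
            descriptors.append(preferred)
    return descriptors, candidates

concept_list = ["Staff", "Family planning", "Workers", "Telecommuting"]
descriptors, candidates = translate(concept_list)
print(descriptors)
print(candidates)
```

Synonyms collapse onto a single descriptor, and the one concept the vocabulary does not cover is returned separately so that it can be indexed under a more general term for now and proposed as a new descriptor.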
Translation
- Group references to information that is scattered in the text of the document.
- Combine heading and subheadings into related multilevel headings.
- Direct the user seeking information under terms not used to those that are being used by means of see references and to related terms with see also references.
- Arrange the index into a systematic presentation
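The compilation steps above (grouping scattered references, combining headings with subheadings, and arranging the result) can be sketched as follows. The postings and page numbers are invented, and cross references are left out to keep the example short.

```python
from collections import defaultdict

def compile_index(postings):
    """Build display lines from (heading, subheading, page) postings."""
    index = defaultdict(lambda: defaultdict(set))
    for heading, subheading, page in postings:
        # Scattered references to the same topic collect under one entry.
        index[heading][subheading].add(page)

    lines = []
    for heading in sorted(index, key=str.lower):
        subs = index[heading]
        main_pages = sorted(subs.get("", ()))
        suffix = ", " + ", ".join(map(str, main_pages)) if main_pages else ""
        lines.append(heading + suffix)
        for sub in sorted((s for s in subs if s), key=str.lower):
            pages = ", ".join(map(str, sorted(subs[sub])))
            lines.append(f"  {sub}, {pages}")
    return lines

# Hypothetical postings gathered while reading a document.
postings = [
    ("Thesauri", "", 24),
    ("Indexing", "derived", 17),
    ("Indexing", "assigned", 18),
    ("Thesauri", "", 40),
    ("Indexing", "derived", 21),
]

for line in compile_index(postings):
    print(line)
```

An empty subheading stands for a reference to the main heading itself; alphabetical order is used here, though a classified arrangement would only change the sort step.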
Generating Index Entries
Index entries may be generated manually or using the computer.
Manual generation - involves generation of index entries one by one using an ordinary or electric typewriter
Machine generation - involves the use of computers in generating index entries; various software packages are available
Indexing Techniques for Periodicals
1. Topics that can be considered for indexing are the following:
- persons
- local politics
- sports events
- entertainment
- economic news
- editorials & columns
- special features
- first and last events
- social trends
• All articles that have permanent value should be indexed under all topics and issues dealt with
• Editorials should be indexed under their topics like any other article but differentiated from others by adding (Ed.) or (E). The titles of editorials may be indexed under a collective heading “Editorials”.
• Letters to the editor, if considered indexable, should be indexed by topic, not under a caption that may have been assigned by the editor. It is advisable to index at least the name of the person who criticized an article as well as the author’s response.
2. Preference and Forms of Headings based on the International Organization for Standardization (ISO 999)
Personal Names:
– Provide as full a form as possible
– Choose the most recent/most commonly used form of personal name as the heading and add “see” cross-reference from other forms
– Personal names should take the form used in the document, but if the text is not consistent the indexer should adopt one form.
– Compound and multiple surnames, whether hyphenated or not, should be indexed under the first part
e.g. Lee Chua, Queena, Loren ; Perez de Cuellar, Javier
– Persons normally identified by title of honor or nobility should be indexed under the first name
e.g. Prince Charles see Charles, Prince of Wales
Queen Elizabeth I see Elizabeth I, Queen of England
Corporate Bodies
• Names of corporate bodies should normally be indexed without transposition and in as full a form as necessary. An initial article is omitted, unless specifically required for semantic or grammatical reasons
e.g. Lopez Museum
• Transposition may be used if it is considered that this would help the users of the index
e.g. Department of Energy see Energy, Department of
• Choose the most recent, or the most commonly used, form of corporate name as the main heading and add “see” cross references from other forms
e.g. Philippine Normal College see Philippine Normal University
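Where an index does choose to transpose corporate names, the "see" reference from the direct form can be generated mechanically. The sketch below is only an illustration: the list of leading phrases is a hypothetical house rule, not something ISO 999 prescribes.

```python
# Hypothetical house rule: leading phrases to move behind the key word.
LEAD_PHRASES = ("Department of", "Bureau of", "Office of")

def transpose_corporate_name(name, lead_phrases=LEAD_PHRASES):
    """Return the transposed heading, or the name unchanged if no rule applies."""
    for lead in lead_phrases:
        if name.startswith(lead + " "):
            rest = name[len(lead) + 1:]
            return f"{rest}, {lead}"
    return name

def see_reference(name):
    """Build a 'see' reference from the direct form to the transposed heading."""
    heading = transpose_corporate_name(name)
    return None if heading == name else f"{name} see {heading}"

print(see_reference("Department of Energy"))
print(see_reference("Lopez Museum"))
```

Names that match no rule, such as Lopez Museum, stay in direct order and need no cross reference.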
Geographic Names
• Geographic names should be as full as is necessary for clarity, with additions to avoid confusion with otherwise identical names
Example: J.P. Rizal (Quezon City)
J.P. Rizal (Marikina)
• An article or preposition should be retained in a geographic name of which it forms an integral part
Example: Santolan, Pasig City
• Where the article or preposition does not form an integral part of a name, it should be omitted
Example: New Day rather than The New Day
Standards serve as models and guidelines for the analysis of documents, construction and organization of indexes, indexing terminology, construction and use of thesauri, etc. They promote consistency and uniformity.
A. International Organization for Standardization
- is a network of the national standards institutes of 146 countries, on the basis of one member per country, with a Central Secretariat in Geneva, Switzerland that coordinates the system.
ISO 5963: 1985 Documentation – Methods for examining documents, determining their subjects, and selecting indexing terms
ISO 999: 1996 Information and documentation – Guidelines for the content, organization and presentation of indexes
ISO 4: 1997 Information and documentation – Rules for the abbreviation of title words and titles of publications. It publishes a List of Serial Title Word Abbreviations which includes title word abbreviations in over 50 languages.
B. National Information Standards Organization (NISO)
A nonprofit association accredited by the American National Standards Institute (ANSI) that identifies, develops, maintains and publishes technical standards to manage information in our changing and ever-more digital environment.
NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, repurposing, storage, metadata, and presentation.
Standards developed by NISO:
– ANSI/NISO Z39.2 – 1994 (R2001) Information interchange format
*Equivalent international standard: ISO 2709
– ANSI/NISO Z39.19 – 2003 Guidelines for the construction, format, and management of Monolingual Thesauri
*Equivalent international standard: ISO 2788
C. British Standards Institution (BSI)
– as the National Standards Body of the UK, it develops standards and applies innovative standardization solutions to meet the needs of business and society.
Standards developed by BSI (related to library and information science):
– BS 1749: 1985 Recommendations for alphabetical arrangement and the filing order of numbers and symbols
• Provides guidance on arranging entries within lists of all kinds, e.g. bibliographies, catalogues, directories and indexes.
– BS ISO 999: 1996 Information and Documentation – guidelines for the content, organization and presentation of indexes
Automatic Indexing
refers to indexing by machine, or the analysis of text by means of computer algorithms.
- The focus is on automatic methods used behind the scenes with little or no input from individual searchers, with the exception of relevance feedback.
- It does not include searching options and techniques used by human searchers, such as methods for creating effective search statements, adding weights to terms, specifying proximity requirements, using truncation, wild cards or combining terms with Boolean or role operators.
Four Types of Approaches
• Statistical – based on counts of words, statistical associations, and collation techniques that assign weights and cluster similar words
Example: tf-idf (term frequency-inverse document frequency), which is frequently used in many search engines.
The intuitive philosophy behind tf-idf is that terms that are frequent in many documents are less suited to make discriminations, while terms that are frequent within a single document may indicate that this document has much information about the things the terms are referring to.
Source: Cleveland & Cleveland, 2001, p. 211
• Syntactical – stresses grammar and parts of speech, identifying concepts found in designated grammatical combinations, such as noun phrases
• Semantic – systems are concerned with the context sensitivity of words in the text
Example: What does cat mean in terms of its context? House cats? Heavy earthmoving equipment?
• Knowledge-based – systems go beyond thesaurus or equivalent relationships to knowing the relationships between words
Example: ‘tibia’ is part of a leg, thus the document is indexed under ‘leg injuries’.
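Of the four approaches, the statistical one is the simplest to demonstrate. The sketch below uses one common tf-idf variant (term frequency normalized by document length, multiplied by the natural log of the number of documents over the number containing the term); other weightings exist, and the three tiny documents are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: dict of doc id -> list of tokens. Returns doc id -> {term: weight}."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))  # count each term once per document

    weights = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights

# Hypothetical mini-collection, already split into words.
docs = {
    "d1": "rice farming and hybrid rice yields".split(),
    "d2": "library indexing and abstracting".split(),
    "d3": "indexing and thesaurus construction".split(),
}
weights = tf_idf(docs)

for term, weight in sorted(weights["d1"].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

A word that occurs in every document ("and") gets a weight of zero and is useless for discrimination, while a word repeated within one document only ("rice") rises to the top of that document's candidate index terms.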
Human / Manual Indexing vs. Automatic Indexing
• Automatic methods have trouble handling synonyms, homonyms, and semantic relations. Conceptualizing is very poor. Human indexers go through cognitive processes that may be influenced by their background experience, education, training, intelligence, and common sense.
• Computers can, and humans cannot, organize all words in a text and in a given database and make statistical operations on them (e.g. tf-idf).
Websites for Indexers
Indexing Services
H.W. Wilson Home Page (http://www.hwwilson.com/)
Wright Information (http://mindspring.com/~jancw/)
Susan Holbert Indexing Services (http://abbington.com/holbert/)
Special Formats and Subjects Indexing
ASIS Thesaurus of Information Science (http://www.asis.org/Publications/Thesaurus/isframe.htm)
The Library of Congress Thesauri (http://lcweb.loc.gov/pmei/lexico/liv/bsearch.html)
Standards
National Information Standards Organization (http://www.niso.org/)
ANSI/NISO Z39.14-1997 Guidelines for Abstracts (http://www.ansi.org/)
ANSI Z39.4-1984 Basic Criteria for Indexes (http://www.ansi.org/)
Indexing software
HTML Indexer (for Windows) http://www.html-indexer.com/
Cindex (for DOS, Windows, and Macintosh) http://www.indexres.com