Taxonomy Fundamentals - SLA 2014

DESCRIPTION

An all-day version of Access Innovations' Taxonomy Fundamentals workshop, presented by Marjorie M.K. Hlava and Bob Kasenchak at the 2014 Special Libraries Association (SLA) annual meeting in Vancouver, British Columbia on June 7, 2014.

Taxonomy Fundamentals

Why build a taxonomy?

SLA – Vancouver – June 7, 2014

Marjorie M.K. Hlava
President and Chief Scientist

Bob Kasenchak
Project Coordinator

Access Innovations, Inc.
www.accessinn.com
www.dataharmony.com
505-998-0800

Copyright © 2013 Access Innovations, Inc.

A fast-moving and powerful introduction to both the theoretical and practical aspects of building a taxonomy, thesaurus, and ontology. A well-built taxonomy is part of the foundation of the information architecture underlying web sites, corporate intranets, search/retrieval, and access to relevant content in databases. After defining controlled vocabularies and identifying core standards, you will explore key concepts of taxonomy, thesaurus, indexing, classification, and filtering. Discussion will include the basics of a taxonomy record and fundamental term relationships. Attendees will put concepts into practice through multiple exercises. Taxonomy, indexing, and related software tools will be demonstrated.

Introduction To Taxonomy Concepts

About Access Innovations

Access Innovations are experts in content creation, enrichment, and conversion services. We provide services to semantically enrich and tag raw text into highly structured data. We deliver clean, well-formed, metadata-enriched content so our clients can reuse, repurpose, store, and find their knowledge assets. We go beyond the standards to build taxonomies and other data control structures as a solid foundation for your information. Our services and software allow organizations to use and present their information to both internal and external constituents by leveraging search, presentation, and e-commerce. We change search to found!

Quick Facts
• Founded in 1978
• Headquartered in Albuquerque, NM
• Privately held
• Delivered more than 2000 engagements

What we do

Access Innovations
• Ensure clean, well-formed content
• Create Knowledge Organization Systems (KOS)

Data Harmony Tools
• To automatically index content
• To manage KOS and more
• To semantically enrich the content
• To organize the content

Visualization tools to portray the data

Outline of the Day
• Why the excitement
• What is a Taxonomy
• Card Sort
• How to build a taxonomy
• Term relationships
• Thesaurus Examples
• Pre and Post Coordination
• What are we controlling
• Vocabulary Options
• TaxoMatch
• Term Forms
• Facets / Notation / Roles / Treatment / Weighting
• Auto Indexing
• A Taxing Situation
• Search
• Where do I use it?
• Standards and references

Why The Excitement?
• Makes information findable!
• Cut search time by 50%! (The Weather Channel)
• Leverages information in new ways
• User satisfaction
• Organizes topical areas and web sites
• Provides better online help
• Customer support 30x more costly than web self-service*

*(Forrester Research "Tier Zero Customer Support" 1999)

Taxonomies are found…

• In “indexing”, tagging, categorizing, subject metadata
• In search - precision, recall
• In content management systems, web sites
• In SharePoint to replace term tree, tag uploads
• In mashups, repackaging, repurposing data
• In social networking sites
• In author tagging - peer reviewer selection
• In filtering data – e.g., spam filters and RSS feeds
• In web crawlers
• In text analytics – trend analysis
• … and much more

Because taxonomies make them work

Where Does Implementation Happen?

• At the backend
• When the records / articles are added to the production system
• When the search software’s “inverted file” is created
• When the HTML for the web page is created

Heart Of The “Big Data” Production Process

From the production side to the website display, carry the taxonomy descriptors for use in precision search

Taxonomy

Authors at a place

MASHUP locations to a GPS grid of an area

Two data points
• GPS Coordinates
• Taxonomy description of the place

Watch Crime In Action

Two data points
• GPS Coordinates
• Taxonomy description of the crime

Visualization Strategies

Matrix Visualization Software

All Data Up-posted To The Top Level

Pattern Analysis: Indexing Clusters

Pattern Analysis: Domain Associations

Pattern Analysis: Domain Correlations

Pattern Analysis: Gap Analyses

Pattern Analysis: Component Gaps

More Like This - Recommender

Cancer Epidemiology Biomarkers & Prevention Vol. 12, 161-164, February 2003
© 2003 American Association for Cancer Research
Short Communications

Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the American Cancer Society Cancer Prevention Study II Nutrition Cohort
Heather Spencer Feigelson, Carolyn R. Jonas, Andreas S. Robertson, Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle
Department of Epidemiology and Surveillance Research, American Cancer Society, National Home Office, Atlanta, Georgia 30329-4251

Recent studies suggest that the increased risk of breast cancer associated with alcohol consumption may be reduced by adequate folate intake. We examined this question among 66,561 postmenopausal women in the American Cancer Society Cancer Prevention Study II Nutrition Cohort.

Related Press Releases
• How What and How Much We Eat (And Drink) Affects Our Risk of Cancer
• Novel COX-2 Combination Treatment May Reduce Colon Cancer Risk Combination Regimen of COX-2 Inhibitor and Fish Oil Causes Cell Death
• COX-2 Levels Are Elevated in Smokers

Related AACR Workshops and Conferences
• Frontiers in Cancer Prevention Research
• Continuing Medical Education (CME)
• Molecular Targets and Cancer Therapeutics

Related Meeting Abstracts
• Association between dietary folate intake, alcohol intake, and methylenetetrahydrofolate reductase C677T and A1298C polymorphisms and subsequent breast
• Folate, folate cofactor, and alcohol intakes and risk for colorectal adenoma
• Dietary folate intake and risk of prostate cancer in a large prospective cohort study

Related Working Groups
• Finance
• Charter
• Molecular Epidemiology

Related Education Book Content
• Oral Contraceptives, Postmenopausal Hormones, and Breast Cancer
• Physical Activity and Cancer
• Hormonal Interventions: From Adjuvant Therapy to Breast Cancer Prevention

Related Awards
• AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards
• ACS Award
• Weinstein Distinguished Lecture

Webcasts
• Related Webcasts

Think Tank Report
• Related Think Tank Report Content
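
One plausible way to drive a "more like this" panel is to compare the taxonomy descriptors attached to each item. The sketch below is illustrative only: the item names and descriptors are hypothetical, and Jaccard overlap is an assumed similarity measure, not necessarily what the system on this slide uses.

```python
def jaccard(a, b):
    """Share of descriptors two items have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def more_like_this(target, index, top_n=3):
    """Rank the other items by descriptor overlap with the target item."""
    scores = [
        (jaccard(index[target], terms), item)
        for item, terms in index.items()
        if item != target
    ]
    # Highest overlap first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for score, item in scores[:top_n] if score > 0]


# Hypothetical items, each tagged with hypothetical taxonomy descriptors.
index = {
    "journal article": {"Breast cancer", "Folate", "Alcohol"},
    "press release": {"Cancer risk", "Alcohol", "Folate"},
    "meeting abstract": {"Folate", "Prostate cancer"},
    "working group": {"Finance"},
}
related = more_like_this("journal article", index)
```

Items that share no descriptor with the target are left out, which is why consistent indexing matters more here than the choice of similarity measure.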

Link to Society Resources

• Journal Article on Topic A
• Other Journal Articles on Topic A
• Upcoming Conference on Topic A
• Podcast Interview with Researcher Working on Topic A
• Grant Available for Researchers Working on Topic A
• CME Activity on Topic A
• Job Posting for Expert on Topic A

Author Connections

What is a taxonomy?

Vocabulary Control - Options
• Classification systems*
• Authority files
• Controlled term lists
• Uncontrolled term lists
• Thesauri

[*We will concentrate on taxonomies and thesauri, first, and then cover the others as time permits.]

Taxonomy Standards
• Z39.19 (2005) Controlled Vocabularies
• BS 8723 Parts 1 – 5
• ISO 25964 Parts 1 – 2
• TAG 37 and 46 standards
• SKOS - Simple Knowledge Organization System
• OWL - Web Ontology Language
• AND more!

A Taxonomy is a Knowledge Organization System (KOS)

Not complex
• Uncontrolled list
• Name authority file
• Synonym set/ring
• Controlled vocabulary
• Taxonomy
• Thesaurus
• Ontology
• Semantic network
Highly complex

Structure Of Controlled Vocabularies

Lists Synonyms Taxonomy Thesaurus Ontology

Ambiguity Ambiguity Ambiguity Specifies a KOS Synonym Synonym Additional kinds of

Hierarchy Hierarchy Relationships Relationships relationships

INCREASING COMPLEXITY and CONTROL

What is a Taxonomy? ANSI/NISO Z39.19-2005

“A collection of controlled vocabulary terms organized into a hierarchical structure.”

controlled

Missing: equivalence, homographic, and associative relationships and notes

Yes!

Taxonomy? Thesaurus?

• Often used interchangeably
• Thesaurus is a taxonomy with extras
  – Related Terms
  – Non-preferred Terms (USE/Used for)
  – Scope Notes
  – More
• Taxonomies often have the actual information object at the final node.
• CMS and SharePoint tend to the hierarchical view only, definition, and USE

Taxonomy? Thesaurus?

• Main Term (MT)
• Top Term (TT)
• Broader Terms (BT)
• Narrower Terms (NT)
• Related Terms (RT)
• See also (SA)
• Non-Preferred Term (NP)
• Used for (UF), See (S)
• Scope Note (SN)
• History (H)

= subject term, heading, node, category, descriptor, class

TAXONOMY

THESAURUS

OWL can specify
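
The fields listed above can be pictured as one record per term. The following is a minimal sketch, assuming a simple in-memory record; the class name `TermRecord`, its attribute names, and the sample values are hypothetical and are not taken from any particular thesaurus software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermRecord:
    """One term record; comments give the matching slide abbreviation."""
    term: str                                            # MT
    top_term: Optional[str] = None                       # TT
    broader: List[str] = field(default_factory=list)     # BT
    narrower: List[str] = field(default_factory=list)    # NT
    related: List[str] = field(default_factory=list)     # RT / SA
    used_for: List[str] = field(default_factory=list)    # UF (non-preferred terms)
    scope_note: str = ""                                 # SN
    history: str = ""                                    # H

    def taxonomy_view(self):
        """Only the hierarchical fields, as a plain taxonomy would show them."""
        return {"term": self.term, "BT": self.broader, "NT": self.narrower}


# Hypothetical record for illustration.
record = TermRecord(
    term="Automobiles",
    broader=["Motor vehicles"],
    used_for=["Cars"],
    scope_note="Hypothetical note for illustration.",
)
```

The taxonomy view uses a subset of the same record, which matches the idea that a thesaurus is a taxonomy with extras.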

The Semantic Roadmap: Knowledge Organization Systems

• Semantic network
• Ontology
• Thesaurus
• Taxonomy
• Controlled vocabulary
• Synonym set/ring
• Name authority file
• Uncontrolled list

• Unrelated Entities
• Ambiguity

• Linked Entities
• Contextual Specificity

• Simple
• Low Value

• Complex
• High value

Uncontrolled list has the Highest Cost over Time!

Taxonomy view

Thesaurus Term Record view

CARD SORT

Taxonomy 101

How do you build a taxonomy?

How Do You Build a Taxonomy?
• Define subject field
• Collect terms
• Organize terms
• Fill in gaps
• Flesh out and interrelate terms
• Apply to your data

You’re done!

Foundations
• Start with what is known
• Build from there
• Use the literature, your data
• Use the lists you already have internally
• Built-in continuous review throughout the process, and beyond
• Who is involved?
  – Taxonomists
  – Subject matter experts (SME)
  – Project management
  – Users

Define Subject Field

• Review representative collection of content
• Determine:
  – Core areas
  – Peripheral topics

Psychology
Education
Sociology
Law

Scope can be modified later

Where Do I Get the Terms?

• Your documents and databases
• Departmental terminology
• Text books and their indexes
• Book tables of contents and indexes
• Journal quarterly indexes
• Encyclopedias
• Lexicons, glossaries on the topic
• Web resources
• Users and experts
• Search logs

How Do You Choose Terms?

• Importance in the subject area
• Use in the literature, by the organization or community
• Necessary degree of specificity or detail
• Relationship with other controlled vocabularies
• Single concept = single term

Build, Buy, Augment?
• Survey existing thesaurus/taxonomy resources for your domain
• Test for
  – Scope
  – Depth
  – Make-or-break terms
  – Cost
• Adoption of existing taxonomies
• Term registries
  – Taxobank
  – Taxonomy Warehouse
  – Other resources

Don’t reinvent the wheel!

Gather Terms From Search Logs

• Top ~100 search terms from search logs
• Terms used more than 50 times
• Match to web site with appropriate answer
• Basis for favorites or best bets, presented at the top of results list
• Behavior-based taxonomy
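
The first two bullets can be sketched as a simple frequency count. This is a minimal illustration, assuming the search log has already been reduced to a list of query strings; the function name `candidate_terms` and the sample log are hypothetical.

```python
from collections import Counter


def candidate_terms(queries, top_n=100, min_count=50):
    """Return the most frequent queries that were used more than min_count times."""
    # Normalise case and whitespace so "Taxonomy " and "taxonomy" count together.
    counts = Counter(" ".join(q.lower().split()) for q in queries if q.strip())
    return [(term, n) for term, n in counts.most_common(top_n) if n > min_count]


# Hypothetical log; in practice this would be read from the search engine's logs.
log = ["taxonomy"] * 60 + ["Taxonomy "] * 5 + ["thesaurus"] * 51 + ["ontology"] * 12
terms = candidate_terms(log)
```

The resulting list is only a starting point: each candidate still has to be matched to the page that answers it before it can serve as a best bet.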

Vocabulary Control – How?

• Use unambiguous terms, clear to the user group
• Distinguish between terms that appear similar
• Use Scope Notes when necessary
• Use terms as elements that can be coordinated in a flexible manner
• Create compound terms, if necessary

Term Format

• KISS – Keep it short and simple
  – 1-2-3 words
  – Effect on search
  – Pre and Post Coordination
• Establish a policy
  – follow Chicago Manual of Style
• Grammatical issues
  – Nouns and noun phrases
  – Verbs - gerunds
  – Adjectives - no
  – Adverbs - no
  – Initial articles - no

Thesaurus - Format

• Main Entries
• Top Terms - TT
• Broader Terms - BT
• Narrower Terms - NT
• Related Terms - RT
• Scope Notes - SN
• History - HI
• Date term added/changed - DA

Thesaurus - Format

• Related terms - RT
• See - S
• See also - SA
• Use - U
  – Preferred Term PT
• Use for - UF
  – Non Preferred Term NP

Definitions

• Index term
  – the representation of a concept
• Preferred term (International)
  – a term used consistently to index a concept
  – descriptor (USE)
  – what the “USED FOR” reference points to

Definitions

• Non preferred term (International)
  – synonym or quasi synonym of a preferred term
  – non-descriptor (USE)
  – the “USE” reference
  – the “SEE” reference
• Related term
  – the “SEE ALSO”

Indexing Terms

Three main categories
• concrete entities
• abstract concepts
• proper nouns

One Term / One Concept

• Importance in the subject area
• Use in the literature, by the organization or community
• Necessary degree of specificity or detail
• Relationship with other controlled vocabularies

One Term / One Concept

• Terms represent simple or unitary concept
• A unit of thought
• Can be a single-word term
• Can be a multiword term, if required to represent the concept
• Three main categories
  – Concrete entities
  – Abstract concepts
  – Proper nouns

“A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.”

Concrete Entities

Things and their physical parts
• primates
  – head
• buildings
  – floors
• islands

Concrete Entities as Terms

• Things and their physical parts
  – Birds
    • Feathers
  – Buildings
    • Floors
• Materials
  – Cement
  – Wood
  – Lead
  – Cards and Chips

Concrete Entities

Materials
• cement
• wood
• lead
• cars
• refrigerators

Abstract Concepts

Actions and events
• evolution
• respiration
• skating
• management
• wars
• ceremonies

Abstract Concepts

Abstract entities, properties of things, materials and actions
• law
• theory
• strength
• efficiency
• lead (management)

Abstract Concepts

Disciplines and sciences
• physics
• meteorology
• mathematics
• psychology

Abstract Concepts

Units of measurement
• kilograms
• pounds
• meters
• miles

Abstract Concepts as Terms
• Actions and events
  – evolution, skating, management, ceremonies
• Abstract entities
  – law, theory
• Properties of things, materials, and actions
  – strength, efficiency
• Disciplines and sciences
  – physics, meteorology, mathematics
• Units of measurement
  – pounds, kilograms, miles, meters, nanoseconds

Proper Nouns*

Individual entities, or “classes of one”, expressed as proper nouns
• San Francisco
• United States of America
• Lake Michigan

* Proper names – of persons – are not included

Proper Nouns as Terms

Individual entities – “classes of one” – expressed as proper nouns
• San Francisco, Lake Michigan

Thesaurus standards exclude proper names, persons, and trade names; these go in authority files.

Taxonomies include them as final nodes.

Most Terms Are Nouns

• Nouns or simple noun phrases
• Adj + Noun – Art history (ANSI/NISO standard)
• Noun + Prep + Noun – History of art (ISO standard)
• Exceptions – Burden of proof, Coats of arms, Prisoners of war, Birds of prey, etc.

About “and”

Avoid “and” in terms – not a single concept

Instead of: Children and television

Factor and postcoordinate

USE Media influence + Television + Children

“And” is not in the standard

In real life, need for granularity may dictate your choice

Compound Terms – Nope!

“Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard)

“Compound terms should be factored (split) into simple elements…” (ANSI/NISO standard)

• Term phrases are okay (bigrams)
  – Adjective Noun
  – American history
• Two concepts combined are not
  – Aromatherapy for bloating

Organize Terms – Roughly

• Sort terms into several major categories – logical groups of similar concepts as Top Terms
  – Identify core areas and peripheral topics
  – 10 – 20 to start
  – Consider moving proper names to authority files
• Result: loose collection of terms under several main headings
  – Rough and tentative – see how it fits as you go
  – Initial gap analysis
  – Add / modify / delete as needed

Term Relationships

How Do Terms Relate?

• Hierarchical relationships -- Parents and their children
• Equivalence relationships -- Aliases
• Associative relationships -- Cousins

TAXONOMY

THESAURUS

Hierarchical Relationships

• Broader Term (BT) represents the class, whole, or genus
• Narrower Term (NT) is a member, part, or species
  – Generic relationship
  – Whole-part relationship
  – Instance relationship
• NTs inherit all the BT characteristics
• BTs/NTs have a reciprocal relationship
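
Reciprocity is easiest to guarantee when both directions of a link are written in one step. The sketch below assumes a plain in-memory store; the class name `Hierarchy` and its attributes are hypothetical.

```python
from collections import defaultdict


class Hierarchy:
    """Stores BT/NT links so that each one always has its reciprocal."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)

    def add(self, broader_term, narrower_term):
        # Writing both directions together keeps BT and NT reciprocal.
        self.narrower[broader_term].add(narrower_term)
        self.broader[narrower_term].add(broader_term)


h = Hierarchy()
h.add("Rodents", "Squirrels")
```

With this design there is no way to record "Rodents NT Squirrels" without also recording "Squirrels BT Rodents".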

Hierarchical Relationships

• Class as a whole
  – superordination
  – broader term (BT)
  – sometimes top term (TT)
• Members or parts of the class
  – subordination
  – narrower term (NT)
• Reciprocal

Hierarchical Relationships

• BT/NT based on being part of same class
• Same fundamental category
  – entities
  – activities
  – agents
  – properties

Hierarchical Relationships

Museums
  Archaeological museum    type of entity    NT
  Ethnological museum      type of entity    NT
  Curators                 agents            RT
  Museum techniques        action            RT
  Scientific museum        type of entity    NT

Hierarchy – Whole-Part Relationships

Four general types
1. Body systems and organs
   Ear
     Middle ear
2. Geographical locations
   Bernalillo County
     Albuquerque
3. Fields of study
   Geology
     Physical geology
4. Hierarchical social structures
   Ontario
     Manitoulin District

Hierarchy – Instance Relationships

General category (common noun) as BT, with individual example (proper noun) as Narrower Term Instance (NTI)

Seas
  Baltic Sea
  Caspian Sea
  Mediterranean Sea

French cathedrals
  Chartres Cathedral
  Rheims Cathedral
  Rouen Cathedral

Essentially identical to “final node” in taxonomies

Hierarchical Types of Display
• Systematic
• Alphabetic
• other, but less common views

DTIC Hierarchy

Polyhierarchical Relationship

• Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)
• Part of ISO standards, new to ANSI/NISO

Nurses
  Nurse administrators
Health administrators
  Nurse administrators

Finance
  Accounting
Careers
  Accounting

Polyhierarchical Relationships
• Great for the web click environment
• Terms occur in multiple categories
• Can be generic as well as hierarchical

Engineering
  NT Nanotechnology
Physics
  NT Nanotechnology

Nanotechnology
  BT Engineering
  BT Physics
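
In a polyhierarchy a term has more than one route to the top, which is what lets it appear in several click paths. The sketch below lists every such route; the function name `paths_to_top` is hypothetical, and the code assumes the BT links contain no cycles.

```python
def paths_to_top(term, broader):
    """List every chain of broader terms from `term` up to a top term."""
    parents = broader.get(term, [])
    if not parents:
        return [[term]]  # no BT: this term is a top term
    paths = []
    for parent in parents:
        for path in paths_to_top(parent, broader):
            paths.append([term] + path)
    return paths


# BT links from the slide's example: one term with two broader terms.
broader = {"Nanotechnology": ["Engineering", "Physics"]}
paths = paths_to_top("Nanotechnology", broader)
```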

DTIC Alpha

Generic Relationship Tests

Rodents, Squirrels, Pests

ALL squirrels are rodents
x NOT ALL squirrels are pests
x NOT ALL pests are rodents

Generic Relationship Tests

• Both terms in same fundamental category
• “All-and-some” test

Rodents / Squirrels: SOME rodents are squirrels, ALL squirrels are rodents
Pests / Squirrels: SOME pests are squirrels, NOT ALL squirrels are pests

Consider concepts of marketing and advertising

Generic Relationships

“Identifies the link between a class or category and its members or species.”

Easy in biology
Rodents
  NT Squirrels
All and some rule
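
The all-and-some rule can be read as a proper-subset check: all members of the narrower class belong to the broader class, while only some members of the broader class belong to the narrower one. The sketch below assumes each class can be listed as a set of members; the function name and the member lists are hypothetical.

```python
def all_and_some(narrower_members, broader_members):
    """True when ALL narrower members are in the broader class
    and only SOME broader members are in the narrower class."""
    return set(narrower_members) < set(broader_members)  # proper subset


# Hypothetical class memberships, for illustration only.
rodents = {"grey squirrel", "red squirrel", "house mouse", "brown rat"}
squirrels = {"grey squirrel", "red squirrel"}
pests = {"house mouse", "brown rat", "grey squirrel", "cockroach"}

squirrels_under_rodents = all_and_some(squirrels, rodents)
squirrels_under_pests = all_and_some(squirrels, pests)
```

Squirrels pass the test under Rodents but fail it under Pests, so Pests is a candidate for a Related Term rather than a Broader Term.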

All and Some Rule

Rodents
  NT Squirrels
  RT Pests

Q. Is this an example of polyhierarchy?
Q. Do you need to make RT relationships for “Pests” to all of the NTs under “Rodents”?

Instance Relationships

Seas (ISO)
  NT Baltic Sea
  NT Caspian Sea
  NT Mediterranean Sea

French Cathedrals (NISO / ANSI)
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  RT Gothic cathedrals

Instance Relationships

French Cathedrals (NISO / ANSI)
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  RT Gothic cathedrals

French Gothic Cathedral
  NTI Chartres Cathedral
  NTI Rheims Cathedral
  NTI Rouen Cathedral
  BT Gothic cathedrals

Q. Why/how do these differ?

CABI Pages

Instance Relationships

“…a general category of things and events expressed by a common noun, and an individual instance of that category, the instance then forming a class of one which is represented by a proper name.”

A way of adding the proper names and items from the Authority files to the thesaurus

Questions before moving on to Associative Relationships?

Associative Relationships

• Related Terms (RTs) – cousins
• “…terms related conceptually, but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
• Both terms are valid thesaurus terms for indexing and have reciprocal relationship
• Expands user’s awareness and reflects thesaurus coverage of unanticipated areas
• Standards describe specific types

Associative Relationships

Related terms

Physicians Medicine

(“Reciprocal posting” done automatically is highly desirable.)
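
Automatic reciprocal posting can be as simple as scanning for RT links that exist in one direction only and adding the return link. This is a minimal sketch over a dictionary of sets; both function names are hypothetical.

```python
def missing_reciprocals(related):
    """Find RT links that are recorded in one direction only."""
    missing = []
    for term, others in related.items():
        for other in others:
            if term not in related.get(other, set()):
                missing.append((other, term))  # `other` lacks an RT back to `term`
    return sorted(missing)


def post_reciprocals(related):
    """Add the missing return links in place."""
    for term, back_to in missing_reciprocals(related):
        related.setdefault(term, set()).add(back_to)


# "Physicians RT Medicine" from the slide, entered in one direction only.
related = {"Physicians": {"Medicine"}}
gaps = missing_reciprocals(related)
post_reciprocals(related)
```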

Associative Relationships
• Sibling relationships
• Examples:
  – Brother : Sister
  – Desk : Chair
• Easier to create within well defined facets (e.g. AAT)
• Usual step in building process
• Can be identified automatically

Associative Relationships

RT relationships
Braking systems
  RT Trains
  RT Bicycle
  RT Motor vehicle
Office furniture
  RT Office buildings
  RT Ergonomics

Associative Relationships

Field of study and objects studied
Seismology
  RT Earthquakes
Meteorology
  RT Weather patterns

Associative Relationships

Operation or process and the agent or instrument
Hairdressing
  RT Hair dryers
Word processing
  RT Typing skills

Associative Relationships

Occupation and person in occupation
Social work
  RT Social workers
Information science
  RT Special librarians

Associative Relationships

Action and the product of the action
Publishing
  RT Music scores
Landscaping
  RT Lawn mowers
  RT Irrigation systems

Associative Relationships

Action and its patient
Teaching
  RT Students
Conducting
  RT Musicians

Associative Relationships

Concepts related to their properties
Women
  RT Femininity
Automobiles
  RT Automotive safety

Associative Relationships

Concepts related to their origins
Water
  RT Water wells
Carpet
  RT Thread

Associative Relationships

Concepts linked by causal dependence
Injuries
  RT Accidents
Cultural stress
  RT Culture shock

Associative Relationships

Action and counter action
Pests
  RT Pesticides
Log on
  RT Log off

Associative Relationships

Raw material and its product
Hides
  RT Leather
Clothing
  RT Fabric

Associative Relationships

Action and associated property
Precision instrument
  RT Accuracy
Production processes
  RT Quality control

Associative Relationships

Concept and its opposite
Single People
  RT Married people
Height
  RT Depth
  RT Weight

If not hierarchical, probably associative

Questions before moving on to Equivalence Relationships?

Equivalence Relationships

Refer to the same concept

(Use for) Prefix for non-preferred terms
(Use) Prefix for preferred terms

Automobiles
  used for Cars
Cars
  use Automobiles

Equivalence Relationships

Use

Use for

Physicians

Doctors

Equivalence Relationships

Synonyms
• popular and scientific
  – spiders - arachnida
• scientific and trade names
  – Motrin (TM) - ibuprofen
• standard names and slang
  – hi fi - high fidelity
• different linguistic origin
  – home care - domiciliary care

Equivalence Relationships

Synonyms cont’d
• different cultures
  – aerials - antenna
  – trunk - boot
  – hire - rent
• emerging concepts
  – telecommuting - distance working
• outdated
  – refrigerators - iceboxes

A “Term” Synonym Ring

• Term
• Node
• Subject heading
• Category
• Descriptor

Equivalence Relationships

Lexical variants
• variant spellings
  – Muslim - Moslem
  – center - centre
• direct and indirect forms
  – electric power plants
  – power plants, electric
• abbreviations
  – ECG - electrocardiograph

Equivalence Relationships

• Quasi synonyms
  – urban areas - cities
  – gifted people - geniuses
• Antonyms
  – height - depth
  – literacy - illiteracy

Equivalence Relationships

Up posting (generic posting)
• useful for web interfaces
• NT equivalent to their BT
• not sub species of BT
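
One reading of up-posting is that anything indexed under a narrower term is also made retrievable under its broader terms. The sketch below rolls document postings up the hierarchy under that assumption; the function name `up_post` and the sample data are hypothetical.

```python
def up_post(postings, broader):
    """Add each document indexed under a term to all of that term's ancestors."""
    rolled = {term: set(docs) for term, docs in postings.items()}
    for term, docs in postings.items():
        stack = list(broader.get(term, []))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue  # guard against revisiting shared ancestors
            seen.add(ancestor)
            rolled.setdefault(ancestor, set()).update(docs)
            stack.extend(broader.get(ancestor, []))
    return rolled


# Hypothetical hierarchy and document postings.
broader = {"Squirrels": ["Rodents"], "Rodents": ["Mammals"]}
postings = {"Squirrels": {"doc1"}, "Rodents": {"doc2"}}
rolled = up_post(postings, broader)
```

A click on the top-level category then returns everything indexed anywhere beneath it, which suits browse-style web interfaces.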

Equivalence Relationships

PsychInfo Rotated

Equivalence Relationships

Factored terms
• express terms in their combinations
• Milk hygiene use milk and hygiene

Equivalence Relationship

• Preferred Term
  – Thesaurus term and valid for indexing
  – Thesaurus notation: USE
• Non-Preferred Term
  – Not valid for indexing
  – An alias or imposter
  – Entry point, directs user to Preferred Term
  – Thesaurus notation: UF or NPT

Spiders
  UF Arachnids
Plant pathology
  USE Phytopathology
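
The USE / UF pair can be sketched as a lookup that sends any entry term to the term valid for indexing. This is a minimal illustration; the class name `Equivalences` and its methods are hypothetical.

```python
class Equivalences:
    """Maps non-preferred entry terms to the preferred term (USE / UF)."""

    def __init__(self):
        self.use = {}        # non-preferred term (lower-cased) -> preferred term
        self.used_for = {}   # preferred term -> list of non-preferred terms

    def add(self, preferred, non_preferred):
        # One call records both notations, so USE and UF stay reciprocal.
        self.use[non_preferred.lower()] = preferred
        self.used_for.setdefault(preferred, []).append(non_preferred)

    def resolve(self, entry):
        """Return the term valid for indexing; unknown entries pass through."""
        return self.use.get(entry.lower(), entry)


eq = Equivalences()
eq.add("Spiders", "Arachnids")                # Spiders UF Arachnids
eq.add("Phytopathology", "Plant pathology")   # Plant pathology USE Phytopathology
```

Calling `eq.resolve("Plant pathology")` directs the user to the preferred term, while a preferred term resolves to itself.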

Equivalence – When to Use

• Synonyms, slang, quasi-synonyms
• Scientific and trade names
  – Ibuprofen UF Motrin™
• Lexical variants
  – Fiber optics UF Fibre optics
  – Mouse UF Mice
• Upward posting of narrow concepts not specified in taxonomy or thesaurus
  – Social class UF Elite, Middle class, Working class
• Get equivalent terms from search logs, brainstorming…

Scope Notes (SN)

• Indicate meaning of the term in the context of this thesaurus, for this audience
  – Stress – Metal, Psychological, Physiological
• Indicate any restriction in meaning
• Indicate range of topics covered
• Provide direction for indexers; for terms often confused, may suggest an alternative term
• Use only as needed – not for every term
• Establish and stick with consistent format
• Be concise

Scope Notes (SN)

• Restrictions on meaning
• Range of topics covered
• Instructions to indexers
• Term histories
• Reciprocal scope notes

Questions before moving on to more thesaurus examples?

Thesaurus - Examples
• Roget's 1852
  – synonyms
• COSATI - 1964
  – concept linking
  – NASA
  – AEC - ERDA - DOE - ESA
• National Library of Medicine
  – outline of a field
  – Medical Subject Headings - MeSH

NASA Alphabetic

NASA Hierarchical

Thesaurus - Examples

• INSPEC - multifaceted
  – Thesaurus
  – Classification system
  – Free text terms
  – Variant spellings
• NICEM
  – 27 Top Terms

INSPEC

INSPEC Hierarchy

Merged Vocabularies

• Yahoo!
• Subject headings
• Authority files
• In a single list

Yahoo! Hierarchy

Merged Vocabularies - continued

• Office.com
• Multiple broader terms
• Concept mapping

Eurovoc Thesaurus Pages

Eurovoc Thesaurus Hierarchy

Eurovoc Terms

So far you’ve got…
• Hierarchy
  – Broader and Narrower Terms
  – Polyhierarchies when needed
• Equivalence relationships
  – Preferred/Non-Preferred Terms
• Associative relationships
  – Related Terms
• Complete term records
  – Scope Notes
• Correct term format

So far you’ve got…

• Hierarchical relationships -- Parents and their children
• Equivalence relationships -- Aliases
• Associative relationships -- Cousins -- See Also’s

TAXONOMY

THESAURUS

So far you’ve got…

• Term format
• Grammatical issues
• Singular and plural forms
• Spelling
• Abbreviations and acronyms
• Capitalization
• Other punctuation
• Consistency

Pre and Post Coordination

Pre and Post Coordinate Terms

• Pre coordinates – two concepts
  – Subject headings – Library of Congress
  – American history – Civil War
  – Back of the book
  – Put together in advance by the publisher
• Post Coordinate
  – Taxonomy terms
  – Single concept
  – Put together by the user / searcher

Copyright © 2013 Access Innovations, Inc.

Pre-coordination

Card catalogs - printed indexes Links and roles defined Controlled vocabularies High input costs Precise recall - easier searching

Copyright © 2013 Access Innovations, Inc.

Post-coordination

Starting with punch cards Machine readable Frequently natural language Currency and specificity Exhaustive coverage - loss of precision Low input costs False drops
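A minimal sketch of post-coordination and of a false drop, assuming a toy collection in which every document carries only single-concept terms. The document IDs, the terms, and the `search_all` helper are illustrative and do not come from the workshop. The searcher coordinates the terms at query time with a Boolean AND.

```python
# Hypothetical index: each document is tagged with single-concept terms only.
documents = {
    "doc1": {"Libraries", "Schools"},     # about school libraries
    "doc2": {"Libraries", "Schools"},     # about library schools
    "doc3": {"Libraries", "Automation"},
}

def search_all(terms):
    """Return IDs of documents tagged with every requested term (Boolean AND)."""
    wanted = set(terms)
    return sorted(doc_id for doc_id, tags in documents.items() if wanted <= tags)

print(search_all(["Libraries", "Schools"]))
```

A searcher looking for school libraries also retrieves doc2, because single terms do not record how the concepts relate. That unwanted hit is a false drop; a pre-coordinated heading would have kept the two subjects apart at a higher input cost.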

Copyright © 2013 Access Innovations, Inc.

Work first from the literature
Establish literary warrant for terms
Have someone else do the clerical work
Differentiate the lexicography work from the subject matter expert work
Let SMEs do the review and tailoring
Expert review ensures proper term use and application
Advisory Board…advisable!

Subject Matter Experts (SME)

Copyright © 2013 Access Innovations, Inc.

Again, why do we index?

Improve precision define scope of terms

Improve recall different terms for same concept

Guide to a field of expertise Learning tool Richer expression

Copyright © 2013 Access Innovations, Inc.

Uses?

Indexing: the process by which subject terms or classification symbols are assigned to concepts in documents

A thesaurus is also known as an indexing language

M.A.I.™ is an automated indexing system

Copyright © 2013 Access Innovations, Inc.

What are We Controlling?

What are We Controlling?

Synonyms different terms same concept

Polysemes or Homonyms same word different meanings lead or mercury

Copyright © 2013 Access Innovations, Inc.

How? Meaning

delineation of scope of a term Term equivalence

linking of synonyms Disambiguation of homonyms

lead (metal) lead (element) lead (management)

Copyright © 2013 Access Innovations, Inc.

Disambiguation

Bridge Structure

Bridge Dentistry

Bridge Game

Bridge Concept

Disambiguation

Restriction and clarification of meaning Cells

biological microsystems electrical equipment prison housing

Reading town in England communication process

Copyright © 2013 Access Innovations, Inc.

Disambiguation

Bill Invoice

Bill Legislative

Bill Sport

Bill Person

Disambiguation: Pre-Coordinate vs. Post-Coordinate Forms

Cells (biology) Cells (electric) Cells (prison)

Reading (place) Reading (process)

Biological cells Electric cells
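The parenthetical qualifiers above can be read mechanically. The following is a small sketch, assuming terms are stored as "Word (qualifier)" strings; the `senses` helper and the regular expression are illustrative choices, not part of any named tool.

```python
import re

# Matches "Word (qualifier)" and captures both parts.
QUALIFIED = re.compile(r"^(?P<word>.+?) \((?P<qualifier>[^()]+)\)$")

terms = [
    "Cells (biology)", "Cells (electric)", "Cells (prison)",
    "Reading (place)", "Reading (process)",
]

def senses(word, vocabulary):
    """List the qualifiers recorded in the vocabulary for an ambiguous word."""
    found = []
    for term in vocabulary:
        match = QUALIFIED.match(term)
        if match and match.group("word").lower() == word.lower():
            found.append(match.group("qualifier"))
    return found

print(senses("cells", terms))
```

A search interface could use such a list to ask the user which sense is meant before running the query.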

Copyright © 2013 Access Innovations, Inc.

Precision Options

Language specificity
Coordination
Compound terms - level of precoordination
Homographs and scope notes
Word distance indication

Copyright © 2013 Access Innovations, Inc.

Precision Options

Structural relationships Links and roles Treatment and aspect codes Weighting

Copyright © 2013 Access Innovations, Inc.

Maintenance of a Controlled Vocabulary

Allow for new jargon to be added Any living field will have new terms Identifier field Candidate terms Consider multiple broader terms

Copyright © 2013 Access Innovations, Inc.

Review, edit, test, edit, use, edit, and maintain, i.e., edit

Review Users Expert reviewers

Test Index 500+ documents

(more for variable writing style; fewer for strict style)

Monitor search log

Edit and maintain Add term Change existing term Change term status Delete term Add term relationship Delete term relationship Add/modify Scope Note Change overall structure

Consider automated / assisted indexing software

Copyright © 2013 Access Innovations, Inc.

When Do You Add More Terms?

On demand When usage changes Stewardess – flight attendant

As the field evolves 8 changes to 64 colors

In Use Don’t freeze waiting for perfection

Copyright © 2013 Access Innovations, Inc.

Vocabulary Control - Options

Classification systems

Authority files Controlled term lists Uncontrolled term

lists Thesauri

Copyright © 2013 Access Innovations, Inc.

Classification Systems - Defined

Are used to put an object in a specific place. In the traditional classification system each item has a single spot to go.

Follow an outline of knowledge Used to shelve books in a library

Copyright © 2013 Access Innovations, Inc.

Catalog Systems - Defined

Used to catalog the object to identify its contents

Based on perception Multiple terms are used to identify a

single object Not natural language Pre-coordinated - subheadings

Copyright © 2013 Access Innovations, Inc.

Classification Systems - Examples

Classification of actual collections New York State Library - Dewey

810.01 Cutter - Universities 1800 - 1960’s

Z34 Lan

Thomas Jefferson - Library of Congress z34.18 la

Government Documents Numbers based on government structure

Copyright © 2013 Access Innovations, Inc.

Catalog Systems - Examples

Library of Congress Subject Headings Sears Subject Headings

(used with Dewey)

Copyright © 2013 Access Innovations, Inc.

King of Catalogers

Charles Ammi Cutter rules for alphabetical subject indexing

most specific heading put two topics under two headings use English if possible x ref antonyms careful with homographs

1895 ALA Subject Headings following Cutter

Copyright © 2013 Access Innovations, Inc.

Politics in Libraries

In 1905 Dewey was president of ALA (American Library Association)

LC adopted DDC Threw out Cutter The two never spoke again.

Copyright © 2013 Access Innovations, Inc.

Types of Headings Single word

Botany or Ethics Adjective noun

Capital punishment Noun - noun

Death penalty American Standard

Noun preposition noun Penalty of death International Standard

Noun conjunction noun Nurses and nursing

Copyright © 2013 Access Innovations, Inc.

Cutter Guidelines File under the phrase “as it reads” Use the most significant words Reduce adjective nouns to noun

phrases Use singular rather than plural File compound words under the first

word No subheadings

Copyright © 2013 Access Innovations, Inc.

Cross References

Cross reference synonyms main heading should be what the class uses use the common term use the unambiguous heading prefer the one which brings relations “…with a well defined network of cross

references the mob becomes an army.. “ C.A. Cutter

Copyright © 2013 Access Innovations, Inc.

Library of Congress (LC) Subject Headings

1911 - List of Subject Headings extensive use of sub-headings invert phrases for main subject file under the noun not the adjective see references not cross filing place holder terms homographs defined parenthetically

Copyright © 2013 Access Innovations, Inc.

Classification vs. Subject Headings

Classification single spot or placement browse physical list often a numbering system clear hierarchy no or few cross references Like Yahoo!

Copyright © 2013 Access Innovations, Inc.

Classification vs. Subject Headings

Subject headings generic search hidden classification system related terms and cross references in heavy use usually the inverted form

cells, electric

Copyright © 2013 Access Innovations, Inc.

Vocabulary Control - Options

Classification systems

Authority files Controlled term lists Uncontrolled term

lists Thesauri

Copyright © 2013 Access Innovations, Inc.

Authority Systems - Defined

Frequently have cross references Widely available Frequently coded lists Brand names .. Lists of terms in the preferred format for

use.

Copyright © 2013 Access Innovations, Inc.

Authority Files - Defined

People Places Things ……..NOT Concepts Methods Processes

Copyright © 2013 Access Innovations, Inc.

Authority Files - Examples

ISO Country Name and Code
International Organization for Standardization

ISO Language list

NAICS (SIC)
Standard Industrial Classification Code (SIC), replaced by the North American Industry Classification System (NAICS)

Copyright © 2013 Access Innovations, Inc.

Authority Lists - Format

Belgian Congo use Congo

Bill Gates use William H. Gates III (computer scientist) see also William Gates (basketball player)

Copyright © 2013 Access Innovations, Inc.

Authority Lists - Need Style Sheets

Names
AACR2: Anglo-American Cataloguing Rules
AAP: Association of American Publishers
Chicago Manual of Style
Dun & Bradstreet Style Sheet

Copyright © 2013 Access Innovations, Inc.

Vocabulary Control - Options

Classification systems

Authority files Controlled term

lists Uncontrolled term

lists Thesauri

Copyright © 2013 Access Innovations, Inc.

Controlled Term Lists - Defined

State the preferred terms Provide allowed term entry Heavily cross referenced Not generally hierarchical Popular Easy to create

Copyright © 2013 Access Innovations, Inc.

Controlled Term Lists - Examples

ABI/Inform Predicasts RDS - Responsive Data Services Back of book indexes Art and Architecture Thesaurus …....These are not FULL thesauri

Copyright © 2013 Access Innovations, Inc.

Controlled Term List - Format

Cars use Automobiles

Personal Computer use Microcomputer
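A USE reference is a simple lookup from an entry term to the preferred term. This is a minimal sketch using the two examples above; the `use_references` table and the `preferred_term` helper are hypothetical names chosen for illustration.

```python
# Entry term (lowercased) -> preferred term, taken from the USE examples above.
use_references = {
    "cars": "Automobiles",
    "personal computer": "Microcomputer",
}

def preferred_term(entry):
    """Follow a USE reference; an entry with no reference is returned unchanged."""
    cleaned = entry.strip()
    return use_references.get(cleaned.lower(), cleaned)

print(preferred_term("Cars"))
print(preferred_term("Automobiles"))
```

Applying the lookup at both indexing time and search time is one way to keep indexers and searchers on the same term.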

Copyright © 2013 Access Innovations, Inc.

Vocabulary Control - Options

Classification systems

Authority files Controlled term lists Uncontrolled term

lists Thesauri

Copyright © 2013 Access Innovations, Inc.

Uncontrolled List - Defined

Add terms as they occur No cross reference Simple flat structure

Copyright © 2013 Access Innovations, Inc.

Uncontrolled List - Example

List of names Grocery list Candidate term list

Copyright © 2013 Access Innovations, Inc.

Uncontrolled List - Format

Laundry Trim bushes Cat box needs cleaning Tommy’s birthday (bake cake) Iron Water plants ….other natural language lists

Copyright © 2013 Access Innovations, Inc.

Trying to Impose Control...

Do laundry Trim bushes Clean cat box Bake birthday cake Iron shirts Water plants

Copyright © 2013 Access Innovations, Inc.

Designed to enhance understanding and retention of the vocabulary concepts necessary for creating a taxonomy, ontology, thesaurus, or controlled vocabulary.

Game supplies: 1 Deck of Orange Question and Challenge Cards 1 Deck of Green Answer Cards

Game setup:
Shuffle the deck of Green Answer cards.
Deal the entire deck to the players.
Shuffle the deck of Orange Question and Challenge cards.
Place them facedown in a pile in the middle of the table so that all players can reach the pile.

Reinforce what you just heard! Have fun!

TAXONOMATCH

Copyright © 2013 Access Innovations, Inc.

1. Play moves to the left of the dealer

2. Draw a card from the top of the Orange cards. Read it aloud to all of the players.

3. The player who read the card says out loud what they think the answer is.

4. Each player looks at the Green Answer cards in their hand.

1. If they have the correct answer to the Question or Challenge, they show their card to everyone at the table.

2. If everyone agrees that the answer is correct, the player holding the correct answer card gives it to the player who read the Question or Challenge card.

5. The player places their associated pair of cards – one Orange Question and Challenge card and one Green Answer card – face up on the table in front of them.

6. Play passes to the person who held the correct Green Answer card in their hand. Play continues as in step 2 above.

7. Discussion among the players to arrive at the correct answer is permissible and encouraged!

8. If players do not arrive at a consensus regarding the correct answer, the Orange Question and Challenge card may be returned to the bottom of the pile, and play passes to the person to the left of the player who drew the previous card.

9. When all of the Orange Question and Challenge cards have been drawn, read aloud, and matched with their Green Answer cards, the game ends.

10. If there are any Orange Question and Challenge cards remaining to which players cannot agree on an answer, players may consult their notes or ask the session speaker.

Copyright © 2013 Access Innovations, Inc.

TAXONOMATCH RULES

Term Forms

Term Forms

Nouns Prepositional forms Adjectives Adverbs Initial Articles Singular and plural

Copyright © 2013 Access Innovations, Inc.

Term Forms - Noun and Noun Phrases

Nouns and noun phrases print media carpet

Copyright © 2013 Access Innovations, Inc.

Term Forms - Prepositional Forms

Prepositional forms are seldom used okay in International Standard ISO

Philosophy of Education ANSI / NISO

Educational philosophy

Copyright © 2013 Access Innovations, Inc.

Term Forms – Adjectives

Adjectives not used in isolation may be used for coordination Miniature paintings

USE PAINTINGS AND MINIATURE Portable typewriters

USE TYPEWRITERS AND PORTABLE

Copyright © 2013 Access Innovations, Inc.

Term Forms – Adjectives

Adjectives may convert to noun forms

MINIATURE SIZE PORTABLE DEVICES TRIANGULAR SHAPE

Copyright © 2013 Access Innovations, Inc.

Term Forms - Adverbs

Adverbs not used unless part of a compound term VERY LARGE ARRAY RADIO TELESCOPE

Used for VLA

Copyright © 2013 Access Innovations, Inc.

Term Forms - Verbs Verbs

no infinitive or participle forms for actions that can be expressed as nouns and retain

clear meaning, use noun form or gerunds

Examples Speaking (not Speech) Walking (not Ambulation) Communication (not Communicate) Administration (not Administer)

Copyright © 2013 Access Innovations, Inc.

Term Forms - Initial Articles

AVOID THEM Example

Theater not The theater State (political entity) not The state

Use if part of a proper name Le Mans El Salvador

Copyright © 2013 Access Innovations, Inc.

Term Forms - Singular and Plural

Concrete entities count nouns are plurals - how many?

planets children

non count nouns - how much? nickel snow lace

Copyright © 2013 Access Innovations, Inc.

Term Forms - Singular and Plural

fully formed organism eyes mouth

objects are singular lamp

classes of things fruits

Copyright © 2013 Access Innovations, Inc.

Term Forms - Singular and Plural

Abstract concepts Show in the singular form

authority socialism packaging biochemistry

Copyright © 2013 Access Innovations, Inc.

Term Forms - Singular and Plural

Unique entities Show in the singular

Big Ben Grand Canyon

Copyright © 2013 Access Innovations, Inc.

Other Formatting

Spelling Punctuation Capitalization Abbreviations ...

Copyright © 2013 Access Innovations, Inc.

Spelling

Use what the users will use and cross post for multilingual
fiber - fibre
center - centre
organization - organisation
hemo - haemo
pediatrics - paediatrics

Copyright © 2013 Access Innovations, Inc.

Punctuation

Parentheses only for qualifiers
Apostrophes are retained
Hyphens - avoid

Copyright © 2013 Access Innovations, Inc.

Capitalization

NISO = initial only AACR2 format

Practice is to follow a manual of style
Chicago Manual of Style
Associated Press
Association of American Publishers

Copyright © 2013 Access Innovations, Inc.

Abbreviations

Use only when well known Always include the full meaning LASER

Scope Note Light Amplification by Stimulated Emission of Radiation

WHO World Health Organization

Copyright © 2013 Access Innovations, Inc.

Other Ways of Adding Value

Cross references Facets Notation Roles Treatment Term weighting

Copyright © 2013 Access Innovations, Inc.

Cross References

See - S See also - SA Not related or associated Not opposite Just helpful guides

Copyright © 2013 Access Innovations, Inc.

Synthesis in Classification

S.R. Ranganathan 1933
Colon Classification
analytico-synthetic classification
analyze subject into component parts (facets)
arrange facets into schedules
combine facets to express subject complexity

Copyright © 2013 Access Innovations, Inc.

Ranganathan

A General Properties Ab Configuration

Ac Tubular B Materials Bc Metals

Bcc ferrous Bcd steels

Bcf Chromium steels Bcfi Chromium-nickel steels

K Modes of failure Kg Creep

Kgb Creep rupture L Stresses and loads

Lb Tensile

Copyright © 2013 Access Innovations, Inc.

Ranganathan

Tubular Chromium Nickel steel creep rupture Tensile strength

Ac Bcfi Kgb Lb

Chain indexing
Tubular
Chromium Nickel steel creep rupture
Tensile strength

Copyright © 2013 Access Innovations, Inc.

Other Ways of Adding Value

Cross references Facets Notation Roles Treatment Term weighting

Copyright © 2013 Access Innovations, Inc.

Facets

Additional ways to add meaning Divide terms into categories using a

single characteristic Limited number of categories

Copyright © 2013 Access Innovations, Inc.

Facets and Roles

PRECIS - Austin 1984 order of terms post-coordinate indexing system role of the term is important tomato

living plant? marketable product?

Facet role indicator organism end product

Copyright © 2013 Access Innovations, Inc.

Many Faceted Vocabularies

UMLS Semantic Network Unified Medical Language System - 49

Bliss Bibliographic Classification (Bliss Classification Association)

Dewey Decimal Classification System Universal Decimal Classification

System Art and Architecture Thesaurus

Copyright © 2013 Access Innovations, Inc.

MeSH and Tree Pages

Copyright © 2013 Access Innovations, Inc.


MeSH Alpha

Copyright © 2013 Access Innovations, Inc.

Order of Facets

Post-coordinate Means before order Notation becomes important Breaks down for large classes

(more than 5,000 terms)

Copyright © 2013 Access Innovations, Inc.

Other Ways of Adding Value

Cross references Facets Notation Roles Treatment Term weighting

Copyright © 2013 Access Innovations, Inc.

Notation Options

Expressive Ordinal Synthetic Enumeration Many style options

Copyright © 2013 Access Innovations, Inc.

Expressive Notation

83 Hazards
831 Fire
831.5 Fire fighting
831.53 Fire fighting equipment
831.532 Fire extinguishers
831.532.5 Carbon dioxide fire extinguishers
832 Explosions
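With expressive notation the hierarchy can be derived from the notation itself. The sketch below assumes that each added digit marks one step down and that the dots are only punctuation; the `broader` and `path` helpers are illustrative names.

```python
# The schedule from the slide: notation -> class name.
schedule = {
    "83": "Hazards",
    "831": "Fire",
    "831.5": "Fire fighting",
    "831.53": "Fire fighting equipment",
    "831.532": "Fire extinguishers",
    "831.532.5": "Carbon dioxide fire extinguishers",
    "832": "Explosions",
}

def digits(notation):
    """Drop the dots, which only aid readability."""
    return notation.replace(".", "")

def broader(notation):
    """Return the notation of the nearest broader class, or None at the top."""
    target = digits(notation)
    candidates = [
        other for other in schedule
        if other != notation and target.startswith(digits(other))
    ]
    return max(candidates, key=lambda other: len(digits(other)), default=None)

def path(notation):
    """Return class names from the top of the schedule down to the given class."""
    chain = []
    while notation is not None:
        chain.append(schedule[notation])
        notation = broader(notation)
    return list(reversed(chain))

print(path("831.532"))
```

An ordinal notation cannot be walked this way, because its codes only fix the filing order and say nothing about parentage.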

Copyright © 2013 Access Innovations, Inc.

Ordinal and Semi-ordinal Notation

HK Hazards
HL Fire
HM Fire fighting
HN Fire fighting equipment
HNB Fire extinguishers
HNE Carbon dioxide fire extinguishers
HO Explosions

Indention is the sole indication of hierarchy

Copyright © 2013 Access Innovations, Inc.

Synthetic and Enumeration Notation

Need to allow the classification system to grow

Synthetic example
P Architecture
PAT Architectural information
PAT.M Architectural information services

Copyright © 2013 Access Innovations, Inc.


Notation Examples - AAT Facets

Copyright © 2013 Access Innovations, Inc.

Systematic Display

Paints (By composition)

Oil paints Water paints Cement paints

(By use) Primers Undercoats Top coats

Copyright © 2013 Access Innovations, Inc.


AAT Pages

Notice faceted indentions

Copyright © 2013 Access Innovations, Inc.


AAT Term

Copyright © 2013 Access Innovations, Inc.

Alphabetical Display

Paints NT

Cement paints Oil paints Primers Top coats Undercoats Water paints

Copyright © 2013 Access Innovations, Inc.

Other Ways of Adding Value

Cross references Facets Notation Roles Treatment Term weighting

Copyright © 2013 Access Innovations, Inc.

Roles

ERIC Thesaurus - role indicators Adjectives - bibliographic terms Input or raw material Output or product Undesirables Indicated uses Materials “In which” Affects Primary topics of discussion Passive recipients, possessors, location Means used

Copyright © 2013 Access Innovations, Inc.

Roles

CAS - Super roles Analytical study Biological study Formation, nonpreparative Occurrence Preparation Process Uses

CAS Specific roles Miscellaneous Properties Reactant

Copyright © 2013 Access Innovations, Inc.

Subheadings as Roles

MeSH Therapeutic use Drug treatment (disease) Adverse effect (drug treatment) Diagnosis

Copyright © 2013 Access Innovations, Inc.

Other Ways of Adding Value

Cross references Facets Notation Roles Treatment Term weighting

Copyright © 2013 Access Innovations, Inc.

Treatment and Aspect Codes

Apply codes or types at article level Theoretical New development Experimental Practical

Copyright © 2013 Access Innovations, Inc.

Other Ways of Adding Value

Cross references Facets Notation Roles Treatment Term weighting

Copyright © 2013 Access Innovations, Inc.

Cranfield Project - Cleverdon 1966

Concepts in the main theme 9/10 Major subsidiary theme 7/8 Minor subsidiary theme 5/6

Copyright © 2013 Access Innovations, Inc.

Internet Engines

Complex weighting of terms Use term frequency Rank output wholly automatic Output based on input term weights Can also use “well formed” data -

like a thesaurus hierarchy field formatted data XML files

Copyright © 2013 Access Innovations, Inc.

Automatic and Semi-automatic Classification?

Data Harmony® M.A.I.™ Semio Autonomy - Muscat Net Owl - Names n-Stein Quiver Smart Logic

Copyright © 2013 Access Innovations, Inc.

Machine Aided Indexing Goals Improve

Indexing efficiency Indexing consistency Reduce editorial drift Depth of Indexing

Reduce Over and under indexing Term over use and under use

Copyright © 2013 Access Innovations, Inc.

Machine Aided Indexing Goals

Improve productivity

Indexer Information worker

Disambiguate terms Increase clarity

Copyright © 2013 Access Innovations, Inc.

Machine Aided Indexing - Intellectual Components

Word List or Thesaurus

Knowledge base Rules based

Natural Language (Semantic)

Editorial evaluation

Copyright © 2013 Access Innovations, Inc.

Example: M.A.I.™ Software Components

Rule Builder

Concept Extractor

Statistics Collector

Copyright © 2013 Access Innovations, Inc.

DATA HARMONY DISCOVERY

TOUR

Copyright © 2013 Access Innovations, Inc.

Taxonomies in Search

Copyright © 2013 Access Innovations, Inc.

Do the Data FIRST

What do you have?
What does it need?
How would you LIKE to access it?
Look at the data BEFORE you create the specifications
A DTD built without data is not going to work

Then choose the system that will support your data

Copyright © 2013 Access Innovations, Inc.

My Main Frustration

1. Select hardware

2. Select software

3. Design system

4. Try to load the data

5. Add the taxonomy, if at all That’s BACKWARDS

Copyright © 2013 Access Innovations, Inc.

Why Does Search Fail?

Most large organizations have 5 to 7 different search systems
All disappointing and sitting on the shelf
Inconsistent results
Unclear path to results
Lack of a single, unified, clear, consistent vocabulary
Not tied to data governance
Taxonomy
Other metadata

Copyright © 2013 Access Innovations, Inc.

SEARCH

How search works Measuring accuracy in search

Precision Recall Relevance

Search theoretical basis Bayes, Boole, and the rest of the guys

The taxonomy effect

Copyright © 2013 Access Innovations, Inc.

Parts of Search

Search software Inverted Index Search algorithms

Presentation layer Search box Autocompletion Related and narrower terms Hierarchical display

Copyright © 2013 Access Innovations, Inc.

Hierarchical Display

Inverted File Index

Searchable Index

Taxonomy / Thesaurus

Inverted Files and Boolean are Basic to ALL Search

Copyright © 2013 Access Innovations, Inc.

Note: not available in all systems!

“Outline of Presentation”
1 Define key terminology
2 Thesaurus tools
Features
Functions
3 Costs
Thesaurus construction
Thesaurus tools
4 Why & when?

Creating an Inverted File Index

Sample DOCUMENT

Copyright © 2013 Access Innovations, Inc.

Simple Inverted File Index of the Terms from the “Outline”

&
1
2
3
4
construction
costs
define
features
functions
key
of
outline
presentation
terminology
thesaurus
tools
when
why

Copyright © 2013 Access Innovations, Inc.

Complex Inverted File Index - Placement, Location added

& - Stop
1 - Stop
2 - Stop
3 - Stop
4 - Stop
construction - L7, P2, SH
costs - L6, P1, H
define - L2, P1, H
features - L4, P1, SH
functions - L5, P1, SH
key - L2, P2, H
of - Stop
outline - L1, P1, T
presentation - L1, P3, T
terminology - L2, P3, H
thesaurus - (1) - L3, P1, H (2) - L7, P1, SH (3) - L8, P1, SH
tools - (1) - L3, P2, H (2) - L8, P2, SH
when - L9, P3, H
why - L9, P1, H
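The index above can be rebuilt from the sample outline with a few lines of code. This sketch reads L as line number, P as word position within the line, and T, H, SH as title, heading, and subheading; those readings are assumptions inferred from the entries. It also assumes that outline numerals do not occupy a position and that stop words occupy a position but get no posting, which is what the slide's numbers imply.

```python
import re
from collections import defaultdict

# (text, placement) for each line of the sample document.
OUTLINE = [
    ("Outline of Presentation", "T"),
    ("1 Define key terminology", "H"),
    ("2 Thesaurus tools", "H"),
    ("Features", "SH"),
    ("Functions", "SH"),
    ("3 Costs", "H"),
    ("Thesaurus construction", "SH"),
    ("Thesaurus tools", "SH"),
    ("4 Why & when?", "H"),
]
STOP_WORDS = {"&", "of"}

def build_index(lines):
    """Map each non-stop word to a list of (line, position, placement) postings."""
    index = defaultdict(list)
    for line_no, (text, placement) in enumerate(lines, start=1):
        tokens = re.findall(r"[a-z]+|&|\d+", text.lower())
        words = [token for token in tokens if not token.isdigit()]
        for position, word in enumerate(words, start=1):
            if word not in STOP_WORDS:
                index[word].append((line_no, position, placement))
    return dict(index)

index = build_index(OUTLINE)
for word in sorted(index):
    print(word, index[word])
```

Keeping line and position in each posting is what makes phrase and proximity searching possible later; keeping the placement allows a hit in a title to be weighted above a hit in a subheading.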

Copyright © 2013 Access Innovations, Inc.

Search Presentation Layer

Automatic completion and type ahead

from Thesaurus

Copyright © 2013 Access Innovations, Inc.

Search Presentation Layer

Related

Narrower

Copyright © 2013 Access Innovations, Inc.

Search Presentation Layer

The hierarchical view of the thesaurus is also a browsable view of the content.

The numbers include the number of hits
1. For the term
2. For the branch
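The two hit counts can be computed from the term postings and the narrower-term links. The sketch below uses invented terms and document IDs; it counts a branch as the set of documents indexed with the term or any of its narrower terms, so a document tagged at two levels is counted once.

```python
# Hypothetical narrower-term links and postings (document IDs per term).
narrower = {
    "Paints": ["Oil paints", "Water paints"],
    "Oil paints": [],
    "Water paints": [],
}
postings = {
    "Paints": {"d1", "d2"},
    "Oil paints": {"d2", "d3"},
    "Water paints": {"d4"},
}

def branch_docs(term):
    """Documents indexed with the term or with anything below it."""
    docs = set(postings.get(term, set()))
    for child in narrower.get(term, []):
        docs |= branch_docs(child)
    return docs

def hit_counts(term):
    """Return (hits for the term itself, hits for the whole branch)."""
    return len(postings.get(term, set())), len(branch_docs(term))

print(hit_counts("Paints"))
```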

Copyright © 2013 Access Innovations, Inc.

Many parts
Search software – of course
Computer network
Parsing of text – the “inverted file”
Well formed or structured text
CLEAN DATA
Computer software – network
Computer hardware
Telecommunications connection
Training sets for statistical systems

How Does Search Work?

Copyright © 2013 Access Innovations, Inc.

Technical Parts of Search

Search technology Ranking algorithms Query language Federators Cache

Inverted index – as discussed above Other enhancements Presentation Layer

Copyright © 2013 Access Innovations, Inc.

Access Innovations – Complex Farm With Perfect Search

[Diagram labels: Source Data; Cleanup, etc.; Index Builders; Cache Builders; Deploy Hub; Repository XIS (cache); Federators; Query Servers; Search Harmony; Presentation Layer; Query]

Copyright © 2013 Access Innovations, Inc.

FAST Search Example

[Diagram: Core Architectural Components. Labels shown: Web Content; Files, Documents; Databases; Custom Applications; Email, Groupware; Web Crawler; File Traverser; Database Connector; Email Connector; Custom Connector; Content Push; Content API; Document Processor; Pipeline; Search Server; Index DB; Filter Server; Agent DB; Alerts; Query Processor; Query API; Query; Results; Vertical Applications; Portals; Custom Front-Ends; Mobile Devices; Management API; Administrator’s Dashboard; Data Harmony Governance API; MAIstro; Search Harmony]

Copyright © 2013 Access Innovations, Inc.

Measuring Accuracy in Search

Relevance Recall Precision Accuracy – Hits, miss, noise Ranking Linguistics Query Processing Results Processing Display Search refinement Usability Business Rules


Relevance

How well a set of returned documents answers the information need

“Accuracy” Related to objective of search

Different user communities Information resources

Tension of user needs and context available A confidence “guesstimate”

Copyright © 2013 Access Innovations, Inc.

Recall = (Number of relevant items retrieved) / (Number of relevant items in the collection)

Precision = (Number of relevant items retrieved) / (Number of items retrieved)

Relevance = Germane (Precision), Pertinent (Recall)

The Formulas
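A worked example of the two formulas, using made-up document IDs. The function names are illustrative; the arithmetic follows the definitions given on the slide.

```python
def precision(retrieved, relevant):
    """Relevant items retrieved divided by all items retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Relevant items retrieved divided by all relevant items in the collection."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

retrieved = {"d1", "d2", "d3", "d4"}        # what the search returned
relevant = {"d2", "d4", "d5", "d6", "d7"}   # what a reviewer judged relevant

print(precision(retrieved, relevant))  # 2 of 4 retrieved are relevant: 0.5
print(recall(retrieved, relevant))     # 2 of 5 relevant were retrieved: 0.4
```

Note that the `relevant` set has to come from a human judgment, which is why relevance cannot be determined purely statistically.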

Copyright © 2013 Access Innovations, Inc.

Measuring Relevance

Concepts Context Age of documents Completeness (recall) Quality Statistically determined ? Nope, it is subjective

Someone has to determine the rightness of the item A confidence factor = canard!

Copyright © 2013 Access Innovations, Inc.

Kinds of Search

Bayesian – FAST, Lucene, Autonomy / Verity

Boolean – Dialog, Endeca, Perfect Search

Ranking algorithms – Google

George Boole and Boolean Algebra

George Boole
Mathematician 1815-1864

Boolean algebra
An algebraic system of logic
AND, OR, NOT, ANDNOT
Dialog, BRS, STAIRS

Boolean Representation

Venn diagram showing the intersection of sets A AND B (in violet), the union of sets A OR B (all the colored regions), and set A XOR B (all the colored regions except the violet).

The "universe" is represented by the rectangular frame.
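The Venn diagram regions correspond directly to set operations. A minimal sketch with invented document IDs, where each set holds the documents containing one search term:

```python
a = {"doc1", "doc2", "doc3"}                         # documents containing term A
b = {"doc3", "doc4"}                                 # documents containing term B
universe = {"doc1", "doc2", "doc3", "doc4", "doc5"}  # the whole collection

a_and_b = a & b        # intersection: both terms
a_or_b = a | b         # union: either term
a_xor_b = a ^ b        # exactly one of the two terms
a_andnot_b = a - b     # A but not B
not_a = universe - a   # everything in the collection without term A

print(sorted(a_and_b), sorted(a_xor_b), sorted(not_a))
```

With an inverted file, each of these sets is simply the posting list for a term, which is why Boolean retrieval and inverted files fit together so naturally.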


Bayes and Bayes’ Theorem

Thomas Bayes
Mathematician 1702 - 1761

Bayesian theorem
Uses probability inductively
Established a mathematical basis for probability inference

WHAT?
A means of calculating, from the number of times an event has or has not occurred, the probability that it will occur in future trials
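For reference, the theorem in its standard textbook form, which the slide does not spell out. The symbols are labels chosen here: H is a hypothesis (for example, "this document is about topic X") and E is the observed evidence (for example, the words in the document).

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

P(H) is the prior belief, so a search system built on this formula can only be as good as the priors it was trained with.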


Bayesian Methods –Cautions

A user might wish to change the distribution of probabilities.

A user will make a novel request for information in a previously unanticipated way.

The computational difficulty of exploring a previously unknown network.

The quality and extent of the prior beliefs used in Bayesian inference processing.

Copyright © 2013 Access Innovations, Inc.

Bayesian Methods - Cautions (continued)

A Bayesian network is only as useful as the prior knowledge is reliable.

An optimistic or pessimistic expectation of the quality of these prior beliefs will distort the entire network and invalidate the results.

Must ensure the selection of the statistical distribution induced in modeling the data.

Must have the proper distribution model to describe the data.

That is… you have to constantly train and retrain the data

Copyright © 2013 Access Innovations, Inc.

Basic Areas of Natural Language Processing (NLP)

Syntactic Semantic Morphological Phraseological Lemmatization (stemming) Statistical Grammatical Common Sense

Copyright © 2013 Access Innovations, Inc.

Basic Areas of Automatic Language Processing (ALP)

Auto Translation Auto Indexing Auto Abstracting Artificial Intelligence Searching Spell Checking Semantic Web Natural Language Processes (NLP) Computational Linguistics

Copyright © 2013 Access Innovations, Inc.

Statistical Search

Cluster analysis Neural networks Co-occurrence Bayesian inference Latent Semantic Etc.


Word and Term Parsing

Stemming
-ing, -ed, -es, -’s, -s’, etc.
Depluralization

Truncation
Left and right

Wild cards
Organi*ation

Variant Spellings
Centre, Center

Hyphens
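A small sketch of wild cards and right truncation, using Python's standard `fnmatch` module against an invented word list. The helper names are illustrative; real search engines implement these operators inside the index rather than by scanning a list.

```python
from fnmatch import fnmatchcase

vocabulary = ["organization", "organisation", "organism", "center", "centre"]

def wildcard(pattern, words):
    """Return the words matching a pattern in which * stands for any characters."""
    return [word for word in words if fnmatchcase(word, pattern)]

def right_truncate(stem, words):
    """Return the words that begin with the given stem."""
    return [word for word in words if word.startswith(stem)]

print(wildcard("organi*ation", vocabulary))
print(right_truncate("organi", vocabulary))
```

The wild card picks up both spellings of one concept, while truncation also pulls in "organism", a different concept. That loss of precision is one reason to record variant spellings in the thesaurus instead.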

The Taxonomy Effect

Where do the terms go? How are they used in search What other ways can I use the taxonomy

in search?

Copyright © 2013 Access Innovations, Inc.

For search all publications

Search database for Journals and pubs

Bookstore search

Search of 53 crawled sites including journals, books, web site, conference sites, etc.

Site search

Navigation

Copyright © 2013 Access Innovations, Inc.

Taxonomy Driven Search Presentation

Navigate the full taxonomy “tree”

BROWSE

Auto-completion using the taxonomy

Guide the user

Copyright © 2013 Access Innovations, Inc.

Subject Browsing

Copyright © 2013 Access Innovations, Inc.

Targeted Resources Based on Subject or User Role

Member Profile Tagging

User pastes or uploads CV

Button to auto-extract taxonomy attributes

Copyright © 2013 Access Innovations, Inc.

Adding Terms to SharePoint

User uploads a document to SharePoint space

Before uploading to SharePoint server, the EventHandler sends the document to Data Harmony.

Data Harmony automatically attaches indexing terms before uploading to MOSS

[Diagram labels: TaxoTerm Server; Data Harmony (M.A.I.); Event Handler; Returns subject metadata; Microsoft SharePoint Server 2010]

Copyright © 2013 Access Innovations, Inc.

SharePoint 2010 Only Shows 10 Lines of the Taxonomy


This add on makes it all viewable

Copyright © 2013 Access Innovations, Inc.

Taxonomies Added in Search Example

[Diagram: Core Architectural Components, annotated “Use taxonomy terms here”. Labels shown: Web Content; Files, Documents; Databases; Custom Applications; Email, Groupware; Web Crawler; File Traverser; Database Connector; Email Connector; Custom Connector; Content Push; Content API; Document Processor; Pipeline; Search Server; Index DB; Filter Server; Agent DB; Alerts; Query Processor; Query API; Query; Results; Vertical Applications; Portals; Custom Front-Ends; Mobile Devices; FAST Management API; Administrator’s Dashboard; Data Harmony Governance API; MAIstro; Search Harmony]

Copyright © 2013 Access Innovations, Inc.

Auto suggestion of Taxonomy Terms

Populate Keywords, Descriptors, Indexing terms, etc.

Allow for manual review of auto-tagging for quality assurance.

Copyright © 2013 Access Innovations, Inc.

Where do I use a taxonomy?

Copyright © 2013 Access Innovations, Inc.

The Workflow

[Diagram labels: Client Data (full text: HTML, PDF, data feeds, etc.); Client Taxonomy; Thesaurus Master; Machine Aided Indexer (M.A.I.™); Inline Tagging; Metadata and Entity Extractor; Automatic Summarization; Database Repository; Search Software; Search Presentation Layer (increases accuracy): Browse by Subject, Auto-completion, Broader Terms, Narrower Terms, Related Terms]

• Gather source data
• Tag and create metadata
• Put in database with tags
• Build search inverted index
• Create user interface
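
The workflow steps (gather, tag, store, index) can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the substring matcher in `tag` is a deliberately naive stand-in for real indexing rules, and the sample taxonomy and records are invented for illustration.

```python
from collections import defaultdict

def tag(text, taxonomy):
    """Return the taxonomy terms that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in taxonomy if term.lower() in lowered]

def build_inverted_index(records):
    """Map each tag to the sorted ids of the records that carry it."""
    index = defaultdict(list)
    for record_id in sorted(records):
        for term in records[record_id]["tags"]:
            index[term].append(record_id)
    return dict(index)

taxonomy = ["indexing", "metadata", "search"]
# Gather source data.
source = {1: "Indexing adds metadata.", 2: "Search uses metadata."}
# Tag each item and store it together with its tags.
records = {rid: {"text": text, "tags": tag(text, taxonomy)}
           for rid, text in source.items()}
# Build the inverted index that a search interface would query.
index = build_inverted_index(records)
print(index["metadata"])
```

A user interface would then look up a term in `index` instead of scanning the full text of every record.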

[Diagram labels: Client Data (full text: HTML, PDF, data feeds, etc.); Client Taxonomy; Thesaurus Master; Machine Aided Indexer (M.A.I.™); Inline Tagging; Metadata and Entity Extractor; Automatic Summarization; Repository; Search Software; Search Presentation (90% accuracy): Browse by Subject, Auto-completion, Broader Terms, Narrower Terms, Related Terms]

Taxonomy in SharePoint

[Data Harmony fully integrated with MOSS.]

Adding Terms to Information Objects

Part of the record: XML, MARC

A relational table pointing the terms to a record ID number (Secondary key)

Adding data to the HTML META NAME KEYWORD Element

Many other options

Part of the Record - XML

• Added as an element in the XML record
• Need an element to put the data in: <Taxonomy Term>
• Capture the terms when creating the records
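
As a rough illustration of storing terms inside the record, the sketch below appends one element per term using Python's standard `xml.etree.ElementTree`. The element name `TaxonomyTerm` is an assumption (XML element names cannot contain a space, so the slide's `<Taxonomy Term>` is written as one word), and the sample record and the `add_taxonomy_terms` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

def add_taxonomy_terms(record_xml, terms):
    """Append one <TaxonomyTerm> child per term and return the new XML string."""
    root = ET.fromstring(record_xml)
    for term in terms:
        ET.SubElement(root, "TaxonomyTerm").text = term
    return ET.tostring(root, encoding="unicode")

record = "<record><title>Fundamentals</title></record>"
tagged = add_taxonomy_terms(record, ["Thesauri", "Indexing"])
print(tagged)
```

One element per term, rather than a single delimited string, keeps each term separately addressable for search and display.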

Author Submission Module

The author pastes the data into the document template, attaching images and graphs as necessary.

Editorial Workflow Integration: Author Submission Module

The author enters the data into the document template, attaching images and graphs as necessary.

An API calls Data Harmony and generates a list of indexing terms based on the content.

Authors review the indexing and may change it.

Content is stored into a data repository as HTML, XML, etc.

Editorial Workflow Integration: Author Submission Module

In the HTML Record

• Makes it crawlable for the internet
• Used in CMS applications (Content Management Systems)
• Add to the HTML: manually, in Dreamweaver, or in your CMS (like Ektron)
• Author Submissions example
• Do the same with SharePoint

META NAME “KEYWORDS”
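
A keywords element of the kind this slide refers to can be generated from a list of taxonomy terms. The sketch below is illustrative only: `keywords_meta` is a hypothetical helper and the sample terms are invented. It uses `html.escape` from the standard library so that characters such as `&` or quotes inside a term do not break the attribute.

```python
from html import escape

def keywords_meta(terms):
    """Build a META NAME="KEYWORDS" element from a list of taxonomy terms."""
    content = ", ".join(terms)
    return '<meta name="keywords" content="{}">'.format(escape(content, quote=True))

print(keywords_meta(["Taxonomies", "Thesauri", "Research & development"]))
```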

In Relational Database Table

• Primary key: the record
• Secondary key: all the metadata (like taxonomy terms, like author, like publication date)
• Used in Oracle, SQL, etc.
• Need a field to put the taxonomy data in
• Supports “Faceted Search”: each item in a separate field or element or table

RDBMS Connection

Taxonomy term table
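
One way to picture a taxonomy term table is a second table that points each term at a record ID. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table and column names (`record`, `record_term`) and the sample rows are assumptions made for illustration, not the schema of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE record_term (
        record_id INTEGER REFERENCES record(id),
        term TEXT
    );
""")
conn.executemany("INSERT INTO record VALUES (?, ?)",
                 [(1, "Thesaurus basics"), (2, "Search tuning")])
conn.executemany("INSERT INTO record_term VALUES (?, ?)",
                 [(1, "Thesauri"), (1, "Indexing"), (2, "Indexing")])

def titles_for_term(term):
    """Return the titles of records tagged with the given taxonomy term."""
    rows = conn.execute(
        "SELECT r.title FROM record r "
        "JOIN record_term t ON t.record_id = r.id "
        "WHERE t.term = ? ORDER BY r.id", (term,))
    return [title for (title,) in rows]

def facet_counts():
    """Return {term: number of tagged records}, the basis of a facet display."""
    rows = conn.execute("SELECT term, COUNT(*) FROM record_term GROUP BY term")
    return dict(rows.fetchall())

print(titles_for_term("Indexing"))
print(facet_counts())
```

Keeping terms in their own table, one row per record and term, is what makes the per-term counts of a faceted display a simple `GROUP BY`.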

Using Taxonomies in Applications

• Improve search
• Subject browsing
• Mobile intelligence
• Targeted resources based on subject or user role
• Link to society resources
• Author submission module
• Author authority database
• Expert reviewer identification
• Member profiles
• Data visualization
• More like this

• In “indexing” or categorizing, as subject metadata
• In content management systems
• In SharePoint
• In mashups
• In social networking sites
• In author tagging
• In filtering data – e.g., spam filters and RSS feeds
• In web crawlers
• Social media - community

A Quick Look Behind the Scenes

[Diagram components: Database Management System; Thesaurus tool; Indexing tool]

• Validate terms
• Add terms and rules
• Change terms and rules
• Delete terms and rules

• Search thesaurus
• Validate term entry
• Block invalid terms
• Record candidates

• Establish rules for term use
• Suggest indexing terms
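
The validate, block, and record-candidates behavior listed above can be sketched with a toy term store. This is an illustration under stated assumptions, not how any thesaurus product is implemented: the `Thesaurus` class, its method names, and the sample terms are all hypothetical.

```python
class Thesaurus:
    """Toy term store: preferred terms plus a list of candidate terms."""

    def __init__(self, terms):
        self.terms = {t.lower(): t for t in terms}
        self.candidates = []

    def validate(self, entry):
        """Return the preferred form of a valid term, or None if it is blocked."""
        entry = entry.strip()
        preferred = self.terms.get(entry.lower())
        if preferred is None and entry not in self.candidates:
            self.candidates.append(entry)  # kept for later editorial review
        return preferred

    def add(self, term):
        """Add a term to the controlled vocabulary."""
        self.terms[term.lower()] = term

    def delete(self, term):
        """Remove a term; unknown terms are ignored."""
        self.terms.pop(term.lower(), None)

thesaurus = Thesaurus(["Indexing", "Taxonomies"])
print(thesaurus.validate("indexing"))    # valid: preferred form is returned
print(thesaurus.validate("Folksonomy"))  # blocked: not in the vocabulary
print(thesaurus.candidates)
```

Recording rejected entries rather than discarding them gives the taxonomy editor a queue of candidate terms to review.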

Taxonomy view

Thesaurus Term Record view

Where Does the Subject Metadata Go?

• Apply to content itself
• Use meta name field in HTML header
• Connect search to the keywords in the SQL or other database tables

HTML Header

Suggested taxonomy descriptors

Integrate Taxonomy to Enhance Find-ability

• Browsable categories of a directory
• Browsable faceted navigation
• Smart search for term equivalents
• Taxonomy terms (original or modified) as labels
• Navigation aids incorporate taxonomy terms and relationships
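
"Smart search for term equivalents" can be pictured as mapping whatever the user types to the preferred term before matching. The sketch below is a minimal illustration: the `USE` equivalence table, the tagged `DOCS`, and the `search` function are all invented for the example.

```python
# Hypothetical equivalence table: non-preferred entry -> preferred term.
USE = {"cars": "Automobiles", "autos": "Automobiles", "automobiles": "Automobiles"}

# Hypothetical tagged collection: document id -> set of taxonomy terms.
DOCS = {1: {"Automobiles"}, 2: {"Bicycles"}, 3: {"Automobiles", "Bicycles"}}

def search(query):
    """Map the query to its preferred term, then return the matching doc ids."""
    cleaned = query.strip()
    preferred = USE.get(cleaned.lower(), cleaned)
    return sorted(doc_id for doc_id, tags in DOCS.items() if preferred in tags)

print(search("cars"))
```

Because documents are tagged only with preferred terms, the synonym handling lives in one small table rather than in every document.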

More Taxonomy Enrichment

• Spelling alternatives and correction
• Related concepts
• Statistical information about the metadata
• Navigation or drill downs
• Search refinement
• Recursive sets
• Concept linking
• Dictionary lookup (in taxonomy glossary)

Brand is repeated in several spots and tied to search as well

Database Plus Search Workflow

[Diagram labels: Raw full-text data feeds; Printed source materials; Data crawls on 53+ sources; XIS™ Creation; Add metadata; Taxonomy (Thesaurus Master®); Taxonomy terms; M.A.I.™ Concept Extractor; M.A.I.™ Rule Base; XIS™ repository; SQL for e-commerce; Load to Perfect Search; Search Harmony™; Display Search]

Save data to search and repositories at the same time

Database Plus Search Workflow

[Diagram labels: Raw full-text data feeds; Printed source materials; Data crawls on data sources; XIS Creation; Add metadata; Taxonomy (Thesaurus Master); Taxonomy terms; MAI Concept Extractor; MAI Rule Base; XIS repository; SQL for e-commerce; Load to Search; Search Harmony; Display Search.
Stages: Source data; Clean and enhance data; Search data]

Use Case: Inline Tagging

Show the exact point where the concept is mentioned

Mouse-over to view the term record

Statistical summary, showing the number of times each term is mentioned in the article
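
The three features of this use case (mention positions, mouse-over, and per-term counts) can be sketched with the standard `re`, `html`, and `collections` modules. This is an illustration under stated assumptions: the whole-word matcher stands in for real indexing rules, a `title` attribute stands in for the mouse-over term record, and `find_mentions`, `inline_tag`, and the sample article are hypothetical.

```python
import re
from collections import Counter
from html import escape

def find_mentions(text, terms):
    """Return (start, end, term) for each whole-word, case-insensitive mention."""
    mentions = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        mentions.extend((m.start(), m.end(), term) for m in pattern.finditer(text))
    return sorted(mentions)

def inline_tag(text, terms):
    """Wrap each mention in a <span> whose title attribute names the term."""
    out, cursor = [], 0
    for start, end, term in find_mentions(text, terms):
        if start < cursor:  # skip a mention that overlaps an earlier one
            continue
        out.append(escape(text[cursor:start]))
        out.append('<span class="term" title="{}">{}</span>'.format(
            escape(term, quote=True), escape(text[start:end])))
        cursor = end
    out.append(escape(text[cursor:]))
    return "".join(out)

article = "Indexing relies on a thesaurus. A thesaurus guides indexing."
terms = ["indexing", "thesaurus"]
counts = Counter(term for _, _, term in find_mentions(article, terms))
print(inline_tag(article, terms))
print(counts)
```

The same list of mentions drives both the inline markup and the statistical summary, so the two views cannot disagree.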

Inline Tagging HTML View

XML View for Inline Tagging

Taxonomy view

Thesaurus Term Record view

A TAXING SITUATION

• The New Board Game
• Applications
• Implementation
• The taxonomy

The Changing Faces of Web Taxonomies

...and how the information is delivered
• From current site
• To new version

Depends on TAXONOMY
• Personalization
• Feeding ads
• Consistent information

HTML Headers: META NAME KEYWORD

Use the taxonomy here

More Innovations!

• Link topic to article to author to event
• Make visual links within domain
• Enable authors to submit and categorize conference submissions
• Create author authority database linking to co-authors, topics, locations, etc.
• Create expert reviewer database
• Create member profiles with alternate names, publications, tagged by topic
• Visualize data and domain distribution
• Display interest connections in social network
• Deliver accurate targeted information through mobile applications
• Etc.

Change to Ready, Aim, Fire!

• Follow the data
• Look at the data, format and content
• Design taxonomy for data
• Leverage the standards
• Use taxonomy to tag data
• Choose search and repository software for data
• Load the data into the system
• Keep your eye on the target

Standards for Monolingual Thesauri

• TEST - Thesaurus of Engineering and Scientific Terms - COSATI, 1967
• AFNOR NF Z 47-100 - 1981 - French
• DIN 1463 - 1987-1993 - German
• NISO Z39.19 - 1993 - American

Where Can I Get Taxonomy Standards?

• www.niso.org: Z39.19 (2010), Controlled Vocabularies
• www.ISO.ce: ISO 25964 parts 1 and 2 (2011 and 2013)
• www.bsi.uk.co
• www.w3c.org: SKOS and OWL
• www.accessinn.com/library

Suggested Reading

• F.W. Lancaster, Vocabulary Control, 1986
• Aitchison, Gilchrist and Bawden, Thesaurus Construction and Use: A Practical Manual, 4th edition
• Heather Hedden, The Accidental Taxonomist
• TaxoDiary.com blog site

Suggested Reading

Introduction to any thesaurus:
• INSPEC
• NICEM
• Psychological Abstracts
• etc.

It Just Takes a Little Imagination

Thank you

Marjorie M.K. Hlava, President
Bob Kasenchak, Project Coordinator
Access Innovations
505-998-0800
mhlava@accessinn.com
Bob_kasenchak@accessinn.com
