High Level Browse

Automatic Assignment of Broad Subject Categories Using Pre-existing Data from Catalog Records

Jonathan Rothman

Senior Systems Librarian / Analyst

University of Michigan University Library

[email protected]

Context / History

People like lists (aka browsable, categorized access tools)

WWW = demand for browsable, clickable lists.

“Hand-made” web lists.

Manually-Maintained Resource List

…and another…

Demand for Comprehensive Lists

Manual maintenance is plausible for selected lists

… but it is not supportable for “comprehensive” tools.

Manually Built and Maintained Electronic Journals List

The Issues

Inconsistent Categories

Categories require a lot of maintenance work

Alternatives We Considered

Using LC Classification As Interface

Order record fund codes

Mapping from data in Bibliographic Records

LC Class Schedule as Interface

Doesn’t accommodate local Dewey numbers

Assumes user knowledge of classification schedule organization, unintuitive

Scatters items of interest to Departments and programs across categories

Doesn’t take advantage of local expertise

Using Fund Codes

Not presently available outside of our Acquisitions System

Codes don’t map neatly to topics

Master list of codes would need to be carefully maintained along with maps from codes to topics

Data Mapping Pros and Cons

Pros

Uses data that already exists in records.

Mapping allows adjustments to topics without changing individual records.

Cons

Some materials don’t historically contain class numbers.

Some records don’t contain the numbers which will get them to appropriate categories.

High Level Browse Project

High Level Browse: Two Project Components

Create a single set of topics to be used across access tools

Create an infrastructure that allows bibliographic data to be associated with topics in a maintainable way

Unified Topic List

Start with merger of existing lists.

Review in light of local programs and units

Broad Input

Design principles

Limit number of headings at a given level

Limit number of levels

Mostly a Political Process – A lot of discussion, compromise and iteration.

Topic List, Level One Topics

There are nine Level One topics:

Arts & Humanities
Business & Economics
Engineering
General Reference
Government Information & Law
Health Sciences
News & Current Events
Science
Social Sciences

Topic List, Level Two Topics

110 total. Some examples:

African Studies
African-American Studies
American and Canadian Studies
Architecture
Art and Design
Art History
Classical Studies
East Asian Languages and Cultures
English Language and Literature
Film and Video Studies
Gay/Lesbian/Bisexual/Transgender Studies
General and Comparative Literature
Germanic Languages and Literature
History (General)
Humanities (General)

Biological Chemistry
Biomedical Engineering
Complementary and Alternative Medicine
Dentistry
Dermatology
Family Medicine and Primary Care
Genetics
Geriatrics
Internal Medicine and Specialties
Kinesiology and Sports
Medicine (General)
Microbiology and Immunology
Molecular, Cellular and Developmental Biology
Neurosciences

Overview of Work Involved

Development of initial maps by teams of catalogers and subject-selectors.

Technical infrastructure development.

Integration of high-level browse infrastructure with existing retrieval tools.

Evaluation / Tuning.

Principles for Technical Development

Mapping Infrastructure Should be Independent of Any Specific Access Tool

Regular Maintenance of Maps Should be Possible Without Programmer Intervention

What Do We Mean by a Map?

BC => Philosophy
BD => Philosophy
BF 432.N5 => Afro-American and African Studies
BR 128.A16 => Afro-American and African Studies
E 185 => Afro-American and African Studies
F 1435.3.P5 => Philosophy
HF 5387-5387.5 => Philosophy
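The map entries above amount to a lookup from a call number (or call number range) to a topic. The following is a minimal Python sketch of that idea, not the project's actual engine: the names `MapEntry`, `ENTRIES` and `topics_for` are hypothetical, only a few entries from the slide are included, and cutters (such as ".N5") are ignored to keep the example short.

```python
import re
from typing import NamedTuple, Optional


class MapEntry(NamedTuple):
    alpha: str                   # LC class letters, e.g. "HF"
    num_start: Optional[float]   # None means "any number in this class"
    num_end: Optional[float]
    topic: str


# A few entries taken from the slide; cutter-level entries are left out.
ENTRIES = [
    MapEntry("BC", None, None, "Philosophy"),
    MapEntry("BD", None, None, "Philosophy"),
    MapEntry("E", 185, 185, "Afro-American and African Studies"),
    MapEntry("HF", 5387, 5387.5, "Philosophy"),
]

# Class letters followed by an optional class number; anything after is ignored.
CALL_NO = re.compile(r"^\s*([A-Za-z]{1,3})\s*(\d+(?:\.\d+)?)?")


def topics_for(call_number):
    """Return the topics whose map entries cover the given call number."""
    m = CALL_NO.match(call_number)
    if not m:
        return []
    alpha = m.group(1).upper()
    num = float(m.group(2)) if m.group(2) else None
    found = []
    for e in ENTRIES:
        if e.alpha != alpha:
            continue
        if e.num_start is None:
            found.append(e.topic)          # whole-class entry
        elif num is not None and e.num_start <= num <= e.num_end:
            found.append(e.topic)          # number falls inside the range
    return found
```

For example, `topics_for("HF 5387.2 .B3")` falls inside the HF 5387-5387.5 range, while a call number in a class with no entry, such as `"QA 76"`, yields an empty list.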

Topic Map

African and Afro-American Studies

DT1.A N1. A26 E184.7 F189.B19N4 HQ768

Revised Topic Map

African Studies

Afro-American Studies

DT1.A N1. A26 E184.7 F189.B19N4 HQ768
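The revision above changes only the map: the records keep their call numbers, and the topic assignment is recomputed from the new map. A small Python sketch of that property follows. The record ids, the simplified call numbers and, in particular, the way the call numbers are divided between the two revised topics are assumptions made for illustration; the slide does not say which call number goes to which topic.

```python
# Records carry only call numbers; they never store a topic name.
records = {"rec1": "DT1", "rec2": "E184.7", "rec3": "HQ768"}


def build_index(topic_map, records):
    """Group record ids under every topic whose call number prefix matches."""
    index = {}
    for rec_id, call_no in records.items():
        for prefix, topics in topic_map.items():
            if call_no.startswith(prefix):
                for topic in topics:
                    index.setdefault(topic, []).append(rec_id)
    return index


# Original map: one combined topic.
original_map = {
    "DT1": ["African and Afro-American Studies"],
    "E184.7": ["African and Afro-American Studies"],
    "HQ768": ["African and Afro-American Studies"],
}

# Revised map: the topic is split in two (this split is illustrative only).
revised_map = {
    "DT1": ["African Studies"],
    "E184.7": ["Afro-American Studies"],
    "HQ768": ["African Studies", "Afro-American Studies"],
}

before = build_index(original_map, records)
after = build_index(revised_map, records)
```

The `records` dictionary is identical in both calls; only the map passed to `build_index` differs.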

Map Creation Statistics

Creation of initial maps is about 80% complete.

On average, a consultation session to define a map takes about 3-4 hours.

Map size ranges from one entry (Science (General) map) to 1656 entries (Middle Eastern, Near Eastern and North African Studies map).

The Map Database

Map Tables 1

levelTwoTopic

id  name
1   History (General)
2   Religious Studies
3   West European Studies

encompasses

levelOne  levelTwo
1         1
9         1
1         2

levelOneTopic

id  name
1   Arts & Humanities
2   Business & Economics
3   Engineering
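The three tables above form a many-to-many relationship between Level One and Level Two topics. As a rough sketch, they could be expressed in SQLite as follows. The table and column names come from the slide; the column types, the use of SQLite and the helper `level_two_under` are assumptions, and only the rows shown on the slide are loaded (so the `encompasses` row for Level One id 9 has no matching topic row here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE levelOneTopic (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE levelTwoTopic (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (Level One, Level Two) pairing; a Level Two topic
-- may sit under more than one Level One topic.
CREATE TABLE encompasses (levelOne INTEGER NOT NULL, levelTwo INTEGER NOT NULL);

INSERT INTO levelOneTopic VALUES
    (1, 'Arts & Humanities'), (2, 'Business & Economics'), (3, 'Engineering');
INSERT INTO levelTwoTopic VALUES
    (1, 'History (General)'), (2, 'Religious Studies'), (3, 'West European Studies');
INSERT INTO encompasses VALUES (1, 1), (9, 1), (1, 2);
""")


def level_two_under(conn, level_one_name):
    """List the Level Two topics grouped under a Level One topic."""
    rows = conn.execute(
        """
        SELECT t2.name
        FROM levelOneTopic AS t1
        JOIN encompasses AS e ON e.levelOne = t1.id
        JOIN levelTwoTopic AS t2 ON t2.id = e.levelTwo
        WHERE t1.name = ?
        ORDER BY t2.name
        """,
        (level_one_name,),
    )
    return [row[0] for row in rows]
```

A query of this shape is one way a browse menu could be populated from the topic tables.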

Map Tables 2

levelTwoTopic

id  name
1   History (General)
2   Religious Studies
3   West European Studies

lcId  alphaStart  numStart  cutStart  alphaEnd  numEnd    cutEnd  notes
1     az          200.000   NULL      az        361.000   NULL    NULL
29    bl          1.000     NULL      bx        999.000   NULL    religion
34    z           7963.000  r45       z         7963.000  r45     women

lcMap

lc  levelTwo
1   1
2   1
1   3
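A lookup against the LC tables above has to treat each row as a range that may span several classes (for example bl 1 through bx 999). The sketch below shows one possible query. Several things in it are assumptions: the range table is named `lcRange` because the slide does not show its name, cutters are ignored, the helper `lc_topics` is hypothetical, and the `lcMap` row linking range 29 to Religious Studies is added for illustration and does not appear on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE levelTwoTopic (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- "lcRange" is an assumed name; the slide shows only the columns.
CREATE TABLE lcRange (
    lcId INTEGER PRIMARY KEY,
    alphaStart TEXT, numStart REAL, cutStart TEXT,
    alphaEnd TEXT, numEnd REAL, cutEnd TEXT,
    notes TEXT
);
CREATE TABLE lcMap (lc INTEGER NOT NULL, levelTwo INTEGER NOT NULL);

INSERT INTO levelTwoTopic VALUES
    (1, 'History (General)'), (2, 'Religious Studies'), (3, 'West European Studies');
INSERT INTO lcRange VALUES
    (1, 'az', 200.0, NULL, 'az', 361.0, NULL, NULL),
    (29, 'bl', 1.0, NULL, 'bx', 999.0, NULL, 'religion'),
    (34, 'z', 7963.0, 'r45', 'z', 7963.0, 'r45', 'women');
INSERT INTO lcMap VALUES (1, 1), (2, 1), (1, 3);
-- Assumed row, not on the slide: links the "religion" range to topic 2.
INSERT INTO lcMap VALUES (29, 2);
""")


def lc_topics(conn, alpha, num):
    """Topics for an LC class (letters + number); cutters are ignored here."""
    a = alpha.lower()
    rows = conn.execute(
        """
        SELECT DISTINCT t.name
        FROM lcRange AS r
        JOIN lcMap AS m ON m.lc = r.lcId
        JOIN levelTwoTopic AS t ON t.id = m.levelTwo
        WHERE (r.alphaStart < ? OR (r.alphaStart = ? AND r.numStart <= ?))
          AND (r.alphaEnd > ? OR (r.alphaEnd = ? AND r.numEnd >= ?))
        ORDER BY t.name
        """,
        (a, a, num, a, a, num),
    )
    return [row[0] for row in rows]
```

The start and end of each range are compared as (letters, number) pairs, so a class such as BM falls inside the bl to bx range regardless of its number.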

Map Tables 3

levelTwoTopic

id  name
1   History (General)
2   Religious Studies
3   West European Studies

deweyId  numStart  numEnd   notes
1        350.000   359.000  NULL
29       840.000   849.990  French
34       850.000   859.940  Italian

deweyMap

dewey  levelTwo
1      1
2      60
3      63
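Dewey numbers are purely numeric, so the corresponding lookup reduces to a single range test. A brief sketch under assumptions: the range table is named `deweyRange` here (the slide shows only its columns), `dewey_topics` is a hypothetical helper, and only the rows visible on the slide are loaded, which means most of the `deweyMap` rows point at ranges or topics that are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE levelTwoTopic (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- "deweyRange" is an assumed name; the slide shows only the columns.
CREATE TABLE deweyRange (
    deweyId INTEGER PRIMARY KEY, numStart REAL, numEnd REAL, notes TEXT
);
CREATE TABLE deweyMap (dewey INTEGER NOT NULL, levelTwo INTEGER NOT NULL);

INSERT INTO levelTwoTopic VALUES
    (1, 'History (General)'), (2, 'Religious Studies'), (3, 'West European Studies');
INSERT INTO deweyRange VALUES
    (1, 350.0, 359.0, NULL),
    (29, 840.0, 849.99, 'French'),
    (34, 850.0, 859.94, 'Italian');
INSERT INTO deweyMap VALUES (1, 1), (2, 60), (3, 63);
""")


def dewey_topics(conn, number):
    """Topics whose Dewey range contains the given class number."""
    rows = conn.execute(
        """
        SELECT DISTINCT t.name
        FROM deweyRange AS r
        JOIN deweyMap AS m ON m.dewey = r.deweyId
        JOIN levelTwoTopic AS t ON t.id = m.levelTwo
        WHERE ? BETWEEN r.numStart AND r.numEnd
        ORDER BY t.name
        """,
        (number,),
    )
    return [row[0] for row in rows]
```

With only the slide's sample rows, a number in the 350 to 359 range resolves to a topic, while the French and Italian ranges return nothing because their `deweyMap` rows are not shown.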

Infrastructure Software Elements

Mapping Engine

Batch Load Script

Map Maintenance Interface

API Call to the Mapping Engine

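#! /l/local/bin/perl

use CallNoToTopicMap;

# Set up the mapping engine before any lookups.
CallNoToTopicMap::init();

print "enter call numbers (ctrl-d when done): ";
while ( <STDIN> ) {
    # topics() returns a reference to an array of topic names
    # for the call number just read into $_.
    print "\ntopic(s): "
        . join("\n ", @{&CallNoToTopicMap::topics($_)})
        . "\n\n: ";
}

# Release the mapping engine once input is exhausted.
CallNoToTopicMap::finish();

print "\n";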

Infrastructure Demonstrations

Simple Demonstration Interface to Mapping Engine

Maintenance Interface

Integration with Existing Access Tools

Use to pre-generate categories associated with bibliographic items when data is updated in batch.

Use to populate menus of categories in real time

Use to generate categories associated with bibliographic items in real time.
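The batch and menu uses listed above can be sketched in a few lines of Python. Everything named here is hypothetical: `stub_topics` is a stand-in for a call to the mapping engine, and `pregenerate` and `categories_menu` illustrate the integration pattern rather than the project's actual batch load script.

```python
import re


def stub_topics(call_no):
    """Stand-in for the mapping engine: look up topics by class letters only."""
    table = {"BC": ["Philosophy"], "BD": ["Philosophy"]}
    m = re.match(r"\s*([A-Za-z]+)", call_no)
    return list(table.get(m.group(1).upper(), [])) if m else []


def pregenerate(records, topics):
    """Batch use: compute categories for every (record id, call number) pair."""
    return {rec_id: topics(call_no) for rec_id, call_no in records}


def categories_menu(assignments):
    """Menu use: the sorted set of categories that actually have items."""
    return sorted({t for ts in assignments.values() for t in ts})


# Sample records are invented for the example.
assignments = pregenerate([("r1", "BC 177"), ("r2", "QA 76")], stub_topics)
menu = categories_menu(assignments)
```

For the real-time case, an access tool would call the topics function per request instead of reading stored assignments; because the lookup is passed in as a parameter, the same code serves either mode.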

Integration Demonstrations

New Books – new interface complete

Ejournals – integration still to be completed

Addressing Identified Issues

Types of materials that do not traditionally contain classification numbers in our system (e.g. Newspapers).

Individual items that are not classified so that they appear in all desired categories.

Implementation Status

New Books – move to production is imminent.

Electronic Journals and Newspapers – planned by end of 2003

NetER – Selection remains manual for now but new level one categories are integrated.

Work Outstanding

Completion of Initial Map Definition

Integration with Electronic Journals and Newspapers List

Tuning of Maps

Contact Information

Jonathan Rothman

Senior Systems Librarian / Analyst

University of Michigan University Library

[email protected]

Questions?