Taxonomies in Electronic Records Management Systems May 21, 2002

Terms

Controlled Vocabulary

» A collection of preferred terms that indicates which terms are preferred and which are variants of the preferred terms.

Thesaurus

» A type of controlled vocabulary that shows the hierarchical (parent-child), associative (related terms) and equivalent (synonymous) relationships among terms.

Taxonomy

» Hierarchical classification of elements within a domain. One type of taxonomy is a File Plan.

Ontology

» A hierarchical classification that is more complex and subtle than a taxonomy. It explains relationships between objects by mapping relationships, such as “part of” or “located in”. Also called knowledge mapping.
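
The controlled vocabulary definition above can be pictured as a lookup from variant terms to preferred terms. The following is a minimal sketch only; the sample terms and the names `CONTROLLED_VOCABULARY`, `build_variant_index`, and `preferred_term` are illustrative assumptions, not part of the presentation or of any product.

```python
# Hypothetical controlled vocabulary: each preferred term lists its variants.
CONTROLLED_VOCABULARY = {
    "automobile": ["car", "auto", "motorcar"],
    "retention schedule": ["records schedule", "disposition schedule"],
}


def build_variant_index(vocabulary):
    """Map every variant (and each preferred term itself) to its preferred term."""
    index = {}
    for preferred, variants in vocabulary.items():
        index[preferred.lower()] = preferred
        for variant in variants:
            index[variant.lower()] = preferred
    return index


def preferred_term(term, index):
    """Return the preferred term for `term`, or None if it is not in the vocabulary."""
    return index.get(term.strip().lower())


VARIANT_INDEX = build_variant_index(CONTROLLED_VOCABULARY)
print(preferred_term("Car", VARIANT_INDEX))      # a variant resolves to its preferred term
print(preferred_term("bicycle", VARIANT_INDEX))  # unknown terms resolve to None
```

A thesaurus would extend this structure with broader, narrower, and related-term links for each preferred term.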

Why Use a Taxonomy

Management of Records

» Structure for Classification

» Navigational Tool

» Reduced Burden on Users

More Consistent Than Humans

Sheer Volume of Information

» Document Level Vs Folder Level

» High Speed Processing

More Than 80% of All Information Is Unstructured

Example: FirstGov.gov

Example: File Plan

Example: Visual Map

How Do Taxonomy Tools Work?

General

» Understand Relevancy to Categories

» Create Knowledge Clusters

» Enable Types to Be Combined

Training Based

» Require Representative Samples

» Identify Patterns

» Create Statistical Models

Rule Based

» Process Rules Devised and Hand-coded by Humans

» Contain Keywords and Logical Relationships

Linguistics Based

» Use Algorithms

» Understand Linguistic and Semantic Elements
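
As an illustration of the rule-based approach listed above (hand-coded rules that combine keywords with logical relationships), here is a minimal sketch. The rule format, the keywords, and the names `RULES`, `tokenize`, and `categorize` are assumptions made for this example and do not describe any specific taxonomy product.

```python
import re

# Hypothetical hand-coded rules: a category matches when ALL of its "all"
# keywords and AT LEAST ONE of its "any" keywords appear in the text.
RULES = {
    "Budget": {"all": ["budget"], "any": ["fiscal", "funding", "appropriation"]},
    "Quality Control": {"all": ["quality"], "any": ["audit", "inspection"]},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, rules=RULES):
    """Return every category whose rule is satisfied by the text."""
    words = tokenize(text)
    matches = []
    for category, rule in rules.items():
        has_all = all(keyword in words for keyword in rule["all"])
        has_any = any(keyword in words for keyword in rule["any"])
        if has_all and has_any:
            matches.append(category)
    return matches


print(categorize("The fiscal year budget call is due Friday."))
print(categorize("Lunch invite for Tuesday"))
```

A training-based tool would replace the hand-written `RULES` with a statistical model learned from representative sample documents.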

Taxonomy Uses in Electronic Recordkeeping Systems

Auto Categorization

Searching and Browsing

File Plan Creation and Maintenance

Auto Categorization

Auto Categorization Case Studies

National Archives and Records Administration

» 12,000 Documents

» Granular File Plan

» Single Repository

University of Nevada for Department of Energy

» 150,000 Documents

» 99.5% Accuracy in Identifying Non Records

» Less Than 1 in 20 Documents Required Human Intervention

Department of Education

» 90,000 Documents

» Accuracy Enhanced by Narrowing Categories

» 100% Accuracy Categorizing to Retention Periods

Auto Categorization Anecdotes

Factiva

» 1,500 Topics

» Target of 45% Accuracy

» Achieving 60-80% Accuracy

Gartner Group Findings

» Typical Accuracy Is 80-95% When Broad Non-overlapping Categories Are Used

One Vendor’s Literature

» 75-80% Accuracy Is Typical

Common Themes

Mutually Exclusive Categories Increase Accuracy

Big Bucket Theory

Easy Retrieval Vs Easy Filing

Stove Piping Vs Open System

Human Effort Necessary

» Select Training Set

» Quality Control

» Fine Tune

Comments on Accuracy

No Case Study Achieved 100% in Categorization

Accuracy Rises With Fewer Categories

Short Documents Can Have Too Little Content

Long Documents Can Cover Too Many Topics

Fly in Ointment

» Accuracy Diminishes at Each Level Down in the File Plan

» In a System Where Auto Categorization Is 80% Accurate, the Expected Accuracy for the Proper Assignment of a Document at the Third Level Down Would Be About 51%

Critical Element - Records Management

» Control of File Plan

» Understanding of Technology
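
The 51% figure quoted for the third level follows if the 80% accuracy applies at each level and errors at different levels are independent. That independence is an assumption of this sketch (the slide does not state how the figure was derived), and `expected_accuracy` is a name chosen here for illustration.

```python
def expected_accuracy(per_level_accuracy, levels):
    """Probability that every one of `levels` independent decisions is correct."""
    return per_level_accuracy ** levels


# Accuracy compounds multiplicatively as a document is filed deeper in the plan.
for level in range(1, 4):
    print(f"level {level}: {expected_accuracy(0.80, level):.1%}")
```

With 80% per level this gives 80% at the first level, 64% at the second, and about 51% at the third, which is why broad categories and shallow file plans tend to categorize more accurately.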

Searching and Browsing

The only thing harder than finding something is finding it again.

Searching

» Looking For Something You Know About

» Generally Easy in Electronic Documents

» The Document Comes to You

Browsing

» Looking Through a Collection to See What Is There

» Generally Difficult in Electronic Documents

» You Go to the Document(s)

Contextual Browsing

» Accessing Other Relevant Content Related to the Content Being Viewed

» Other Objects May Not Have Been Grouped Together

» Prospective Navigation

The Beauty of a Taxonomy Tool

Delivers Information You Did Not Know You Had

Identifies Unknown Associations Between Documents

Summarizes or Abstracts Content

Uses Visual Maps

Does Not Require User to Know Location of the Information

Visual Map

Visual Map Drilled to Document Level

File Plan Creation

File Plan Creation Using a Taxonomy Tool

Information Architecture Based on Content

Electronically Generated File Plan

“It is possible to produce affinities through automatic categorization without a pre-existing taxonomy. These categories can then be edited and renamed. Once categories have been created by humans, documents and other information objects can be automatically assigned to those categories.”

Gartner Group

Feasibility of Using Taxonomy Software for File Plan Creation

Feasible to Develop a True Records Management File Plan Using Software

Feasible to Populate an RMA With Electronically Generated File Plan

Feasible to Compile a Quantity of Quality Documents to Mine for Creating the Taxonomy

Then Why Hasn’t It Been Done?

Existing Retention Schedules Not Built This Way

» Map Required File Plan Elements to Appropriate Retention Classification

OR

» Re-Engineer Retention Schedules

Usability for File Plan Development Untested

» Statistically Correct

BUT

» May Not Appear Natural to Users

Scenario

Humans Create Top Level of File Plan

Software Mines Data - Free Categorization

Software Forms Category Patterns

Humans Use Results to Create One Subsidiary Level in File Plan

Humans Associate Retention Schedules at Secondary Level of File Plan

Software Auto Categorizes Documents Into File Plan

Hybrid Solution

[Diagram: four-stage workflow alternating between Records Management Staff and Software]

» Top Level of File Plan (Records Management Staff): Categories of Budget, Quality Control, Test Facility, Reliability

» Formation of Clusters (Software): clusters found in the Records Text, including Audit, Test Facility, Reliability Report, Budget, Budget Call, Meeting Room, Change, Lunch Invite, Resume, and No Pattern

» Secondary Level of Taxonomy with Retention Schedules (Records Management Staff): Budget Correspondence Files, Quality Control Reports, Test Facility Log Books, Formal Reliability Reports, Budget Policy Files, Administrative Meeting Files; Retention Schedules of 1 year, 2 years, 10 years, and Permanent

» Auto Categorization (Software): Records such as Reliability Reports, Quality Control Reports, Administrative Meeting, and Budget Correspondence; other labels are Reminders, Personal, and Confidence Below Threshold (Review Folder)
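
The final stage of the hybrid solution, in which software files documents and anything whose confidence falls below a threshold goes to a review folder for records management staff, can be sketched as follows. This is only an illustration: the `Prediction` type, the 0-to-1 confidence scale, the 0.80 default threshold, and the function names are assumptions, since the presentation does not specify how any particular tool scores or routes documents.

```python
from dataclasses import dataclass

REVIEW_FOLDER = "Review Folder"


@dataclass
class Prediction:
    document_id: str
    category: str      # file plan category proposed by the software
    confidence: float  # score in [0, 1]; the scale is an assumption


def route(prediction, threshold=0.80):
    """Return the destination folder for one predicted document."""
    if prediction.confidence < threshold:
        return REVIEW_FOLDER  # left for a human to classify
    return prediction.category


def file_documents(predictions, threshold=0.80):
    """Group document ids by destination folder."""
    folders = {}
    for prediction in predictions:
        destination = route(prediction, threshold)
        folders.setdefault(destination, []).append(prediction.document_id)
    return folders


predictions = [
    Prediction("doc-1", "Budget Correspondence Files", 0.93),
    Prediction("doc-2", "Quality Control Reports", 0.55),
]
print(file_documents(predictions))
```

Raising the threshold sends more documents to human review and lowering it files more automatically, so the setting is one of the fine-tuning decisions that remains with records management staff.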

Conclusion

Use for Support – Not Full Automation

Ongoing Human Commitment to Plan, Create, and Maintain

Consider Portfolio Approach – Mixing Products

Very Effective for Searching and Browsing

Capture and Search Legacy Documents That Otherwise Would Be Too Costly to Process

Integrate With Document Imaging System

Potential Is Huge

Resources

Web Sites With Energy Glossaries/Thesauri

www.eia.doe.gov

http://www.nerc.com/glossary/

http://www.eren.doe.gov/consumerinfo/glossary/

http://www.naruc.org/resources/glossary.shtml

www.powermarketers.com/glossary.htm

http://hilt.cdlr.strath.ac.uk/Sources/thesauri.html

Cool Stuff

Thesaurus Management Tools

» www.multites.com

» www.synaptica.com

» www.pmei.com/lexico.html

Books

» Content Management Bible, Bob Boiko

» Information Architecture for the World Wide Web, Louis Rosenfeld & Peter Morville

Free Search Engine for Your Web Site

» http://www.freefind.com/

More Cool Stuff

DOE Related Use of Taxonomy Tool for Searching and Browsing

» www.lsnnet.gov

Controlled Vocabularies, Thesauri and Classification Systems Available on the Web

» www.lub.lu.se/metadata/subject-help.html

» http://sky.fit.qut.edu.au/~middletm//cont_voc.html

Information Architecture White Papers and Publications

» http://argus-acia.com/index.html

Virtual Library

» www.vlib.org/overview.html

THANK YOU!

Angela Tayfun, CRM

AT&T Government Solutions, Inc.

1900 Gallows Road

Vienna, VA 22182

Ph: 703.506.5562

E-mail: [email protected]