Metadata Quality Assurance in the DLESE Community Collection

2

DLESE Community Collection

• Initial DLESE collection, continues to grow

• Approx. 4200 items

• Public cataloging tool, but the majority of items come from “known” sources and funded catalogers

3

Distribution of Cataloging

• DPC: 5%

• AGI: 64%

• MSU: 19%

• Other Community: 12%

Data estimated as of June 2003, little change over last year

4

Quality Assurance Measures – Four stages

• Catalog system provides feedback on duplicate and similar entries

• Every record is reviewed by a person for metadata completeness and quality

• Additional technical checks for vocabulary and required metadata completeness

• Regular, periodic checks for URL viability, syntax and duplication/mirrors

5

DLESE Catalog System

• Disallows exact duplicate URLs

• Provides a list of similar URLs at all stages of submission, to inform the decision to catalog or not

• Discourages overlapping records
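
The duplicate and similar-URL feedback described on this slide can be sketched roughly as follows. This is a minimal illustration, not the DLESE Catalog System's actual logic: the normalization rules, the "same host and overlapping path" notion of similarity, and the names `normalize` and `check_submission` are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and trailing slash (assumed rules)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def check_submission(url: str, cataloged: list[str]) -> dict:
    """Reject exact duplicates; otherwise list cataloged URLs that look similar."""
    new = normalize(url)
    known = [normalize(u) for u in cataloged]
    if new in known:
        # Exact duplicate: the submission is disallowed.
        return {"accepted": False, "similar": [new]}
    new_host = urlsplit(new).netloc
    # Hypothetical similarity rule: same host and one URL is a prefix of the other.
    similar = [
        k for k in known
        if urlsplit(k).netloc == new_host
        and (k.startswith(new) or new.startswith(k))
    ]
    return {"accepted": True, "similar": similar}
```

In this sketch the cataloger still makes the final call: an accepted submission with a non-empty `similar` list is only a prompt to consider whether the new record would overlap an existing one.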

6

Human-mediated checks - 1

• URL functional

• Appropriate URL is cataloged (granularity and duplication)

• Written description aligns with content at site

• Complete sentences, spelling

• Avoid repeating redundant information (-: (technical info, creator)

7

Human-mediated checks - 2

• Required metadata is present; review resource and add or amend to follow best practices

• Controlled vocabularies properly assigned: resource type, technical

• Suggested metadata reviewed for accuracy, if present
  • Keywords
  • Relation
  • Coverage
  • Standards

8

Pre-accessioning technical checks

• URL viability checked

• Check for missing required metadata and proper vocabularies

• Coverage errors are flagged, though some (crossing the date line) require a move to a special directory for editing and subsequent accessioning

• Upon accessioning, additional check for duplicate ID numbers and duplicate resource content
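
A technical check of this kind might look roughly like the sketch below. The field names, the small resource-type vocabulary, and the bounding-box representation of coverage are hypothetical stand-ins, not the actual DLESE metadata framework; the date-line test (western longitude greater than eastern longitude) is likewise an assumed reading of the slide.

```python
# Hypothetical field names and vocabulary, for illustration only.
REQUIRED_FIELDS = ("title", "url", "description", "resource_type")
RESOURCE_TYPES = {"map", "lesson plan", "data set"}


def technical_check(record: dict) -> list[str]:
    """Return a list of problems that should block accessioning."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field}")
    rtype = record.get("resource_type")
    if rtype and rtype not in RESOURCE_TYPES:
        problems.append(f"resource_type not in vocabulary: {rtype}")
    box = record.get("coverage")
    # Assumed convention: a box whose west edge exceeds its east edge
    # spans the date line and needs manual handling.
    if box and box["west"] > box["east"]:
        problems.append("coverage crosses the date line: route to manual edit")
    return problems
```

A record that returns an empty list would pass on to accessioning; anything else goes back to a cataloger.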

9

Post-accessioning, ongoing checks

• Link checking twice a day; reports issued twice a week or on demand

• Provides report on resource and relation URLs, indicating error type

• “Vitality” over time (too low is <50% available over 6 previous days)

• Alerts for duplication of URL or content (catches mirrors) and for mirror URLs that differ from the primary URL

• Email syntax
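
The "vitality" measure can be illustrated with a short sketch. It assumes that vitality is simply the fraction of successful link checks in the window (two checks a day over six days gives twelve results) and that "too low" means strictly below 50%; the function names are made up for the example.

```python
def vitality(results: list[bool]) -> float:
    """Fraction of link checks in the window that found the resource available."""
    if not results:
        raise ValueError("no link-check results in the window")
    return sum(results) / len(results)


def vitality_too_low(results: list[bool], threshold: float = 0.5) -> bool:
    """True when availability over the window falls below the threshold."""
    return vitality(results) < threshold


# Twelve checks (2 per day x 6 days), 5 of them successful: 5/12 is below 50%.
window = [True] * 5 + [False] * 7
print(vitality_too_low(window))
```

Under these assumptions a resource that was reachable on exactly half of the checks would not be flagged, since the slide gives the cutoff as "<50%".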

10

Actions taken

• Email syntax and permanent redirects fixed

• Duplications investigated

• “Vitality too low” group receives further investigation to repair

11

“Vitality too low” = broken link

• First try to sleuth out new URL and fix it

• If unsuccessful, send email to creator/contact inquiring about status

• If creator replies, fix as indicated

• If no reply, remove from discovery but don’t delete

• <1% of DCC collection is “broken” at any given time
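
The broken-link procedure on this slide amounts to a small decision sequence, sketched below. The `Action` names and the `triage` function are hypothetical, and the slide does not say how long to wait for a reply, so that is left to the caller.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    UPDATE_URL = "fix the record with the new URL"
    EMAIL_CREATOR = "email creator/contact about status"
    FIX_AS_INDICATED = "fix as the creator indicates"
    HIDE_FROM_DISCOVERY = "remove from discovery, keep the record"


def triage(new_url: Optional[str], emailed: bool, reply: Optional[str]) -> Action:
    """Pick the next step for a record whose vitality is too low."""
    if new_url:
        return Action.UPDATE_URL        # sleuthing found where the resource moved
    if not emailed:
        return Action.EMAIL_CREATOR     # no new URL found, so ask the creator
    if reply:
        return Action.FIX_AS_INDICATED  # creator answered
    return Action.HIDE_FROM_DISCOVERY   # no answer: hide the record, never delete it
```

Keeping the hidden record rather than deleting it means it can be restored if the resource or its creator reappears later.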

12

Ongoing development

• New DCS will support
  • multiple frameworks (ADN, collection, anno)
  • more front-end quality controls: spell check, completeness notification during cataloging

• Suggest-a-URL to replace full public cataloging

• Ongoing cataloging training and discussion with regular catalogers

