
Metadata for Institutional Repositories

Carolyn Hansen | @meta_caro

NISO Training, Thursday, February 23, 2017


About Me

• Metadata Librarian at University of Cincinnati (UC) Libraries

• Chair, Project Hydra Descriptive Metadata Working Group

• Embedded Metadata Subject Expert on the UC development team for scholar@UC, a self-submission IR

• Research: linked data, digital humanities, public history, GIS & data visualization


Agenda

• Defining high quality metadata

• The trouble with IRs and metadata

• Case Study: scholar@UC

• Best practices for getting high quality metadata

• Resources and guidelines

Image: Silvia Sala. “Ma perché scrivo? È l'unico mio conforto.” (“But why do I write? It is my only comfort.”) https://flic.kr/p/cok3GL


Defining high quality metadata

According to Europeana’s “Report & Recommendations from the Task Force on Metadata Quality,” high quality metadata is…

• Resulting from a series of trusted processes

• Findable

• Readable

• Standardised

• Meaningful to audiences

• Clear on re-use

• Visible


Barriers to getting high quality metadata

• In even the best situations…

• lack of foresight for online discovery

• treating metadata as an afterthought

• lack of funding and resources

• describing digitized items with little information

— Europeana’s “Report & Recommendations from the Task Force on Metadata Quality”


The Trouble with IRs and Metadata

• IRs are not controlled environments like traditional cataloging applications; they may be vendor-created or open-source

• IRs may or may not have mediated submissions or quality control measures

• Metadata is created by different streams according to different standards (ex. ETDs from the graduate school v. self-submission by faculty)

• Librarians/developers/vendors make assumptions about how users will interpret submission forms


Why IRs need good metadata

• discovery (in the application and on the Web) depends on high quality, consistent metadata, particularly in facetable fields and subject/keyword terms

• faculty are more likely to use an IR if their material is discoverable on the Web (Ex. Google Scholar)

• IR metadata may be mixed with existing library metadata through an integrated discovery layer; interoperability can be a significant problem


Diverse user expectations also make getting good metadata difficult

• “Why aren’t there ways to express geospatial data?”

• “I just want a DOI for my journal article”

• “Can I just use this for dark storage?”

• “Why are there so many required fields? I don’t have time to enter all this information.”


Case Study: scholar@UC

• self-submission repository; open-source development based on the Project Hydra stack (https://scholar.uc.edu/)

• metadata profiles pre-populate submission fields based on the work type the user chooses at the beginning of submission (a sketch of such a profile follows this list)

• the only metadata remediation is a quarterly “clean” of facetable fields; LCSH is not used, but the NAF is applied in remediation when possible

• in 2015-2016, a cohort of early adopters populated the application with approximately 300 records

• despite “help” content embedded in the application showing how metadata should be entered, many metadata problems occurred
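The deck does not include the actual scholar@UC profile configuration, so the following is only a minimal sketch, in Python, of how a work-type-driven metadata profile might pre-populate a submission form; the work types and field names are assumptions for illustration.

```python
# Hypothetical sketch: metadata profiles keyed by work type.
# Work types and field names are illustrative, not the actual scholar@UC configuration.

WORK_TYPE_PROFILES = {
    "article": {
        "required": ["title", "creator", "date_created", "rights"],
        "optional": ["abstract", "subject", "journal_title", "doi"],
    },
    "dataset": {
        "required": ["title", "creator", "date_created", "rights"],
        "optional": ["description", "subject", "geographic_subject", "readme_file"],
    },
}

def build_submission_form(work_type: str) -> dict:
    """Return the fields to display for the work type chosen at the start of submission."""
    profile = WORK_TYPE_PROFILES.get(work_type)
    if profile is None:
        raise ValueError(f"No metadata profile defined for work type: {work_type}")
    return {
        "required_fields": profile["required"],
        "optional_fields": profile["optional"],  # can be hidden behind an "additional fields" view
    }

if __name__ == "__main__":
    print(build_submission_form("dataset"))
```

Keeping the profile data separate from the form-building logic is the main point: the same lookup can drive both the submission UI and any later validation.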


Example problems

• “Title” field: some faculty users were unsure whether to enter the title of their work or their academic title

• Unintelligible titles and incomplete metadata (ex. the title of a dataset: “p.txt”)

• Multiple instances of a submitter’s name as “creator”

• Subjects entered in facetable fields using multiple variant spellings, capitalizations, etc.

• Technical problems like trailing/leading whitespace impacting facets (see the normalization sketch after this list)
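A quarterly facet “clean” like the one described in the case study can be partly scripted. This is a minimal sketch only; the variant-spelling map and field handling are assumptions, not scholar@UC’s actual remediation code.

```python
import re

# Hypothetical mapping of known variant spellings to a preferred form;
# in practice this list would be built up during remediation review.
PREFERRED_FORMS = {
    "gis": "GIS",
    "geographic information systems": "GIS",
}

def normalize_facet_value(value: str) -> str:
    """Trim leading/trailing whitespace, collapse internal runs of spaces, apply preferred forms."""
    cleaned = re.sub(r"\s+", " ", value).strip()
    return PREFERRED_FORMS.get(cleaned.lower(), cleaned)

def normalize_subjects(subjects: list[str]) -> list[str]:
    """Normalize a record's subject values and drop case-insensitive duplicates, keeping order."""
    seen, result = set(), []
    for subject in subjects:
        norm = normalize_facet_value(subject)
        if norm and norm.lower() not in seen:
            seen.add(norm.lower())
            result.append(norm)
    return result

if __name__ == "__main__":
    print(normalize_subjects([" GIS ", "gis", "Geographic Information Systems", "public  history"]))
    # -> ['GIS', 'public history']
```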


Subject field problems seen in the submitted records:

• Personal names entered as subjects in direct order, with no controlled vocabulary

• Similar concepts entered with variant spellings and acronyms

• Terms that could be mapped to LCSH during remediation

• Geographic names entered in the general subject field rather than the separate geographic subject field


Whether designing GUIs for open-source IRs or customizing vendor-created interfaces, remember: with IRs and metadata, you get what you ask for, or less.


Example: Datasets & Complex Files


What is the role of the metadata specialist?

cheerleader?

police?

cleaner?

educator?

all of the above?


Our imperfect solution

• for scholar@UC, we decreased the number of required fields to essentially only the fields needed for a DOI (see the sketch after this list)

• we expanded the help information and examples embedded within the submission form

• additional descriptive fields are available but are hidden from the main view to prevent input burnout

• the sustainability of manual metadata remediation is an unsolved issue as the amount of content in the IR grows

• datasets and complex files present continual challenges; we encourage users to add “Read Me” files and other explanatory documents, but not many users are doing so
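The slides do not spell out the exact DOI-minimum field set; as a rough illustration, the check below assumes a set loosely based on DataCite’s mandatory properties (creator, title, publisher, publication year, resource type) and hypothetical record keys.

```python
# Hypothetical sketch: flag submissions missing a DOI-minimum field set.
# Field names are illustrative and loosely follow DataCite's mandatory properties.

DOI_MINIMUM_FIELDS = {"creator", "title", "publisher", "publication_year", "resource_type"}

def missing_doi_fields(record: dict) -> set[str]:
    """Return the DOI-minimum fields that are absent or empty in a record."""
    return {field for field in DOI_MINIMUM_FIELDS if not str(record.get(field, "")).strip()}

if __name__ == "__main__":
    record = {"title": "p.txt", "creator": "A. Researcher", "publication_year": 2016}
    print(sorted(missing_doi_fields(record)))  # ['publisher', 'resource_type']
```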


Best Practices for getting high quality metadata

• Develop a baseline metadata profile based on your institution’s internal needs as well as harvesting requirements for other platforms; this profile should define required and optional submission fields

• Determine whether baseline metadata profiles differ depending on the material type (ex. documents v. datasets)

• Be very conscious of the user experience (UX) of submission forms; do focus group testing of forms if possible; consider customization of vendor-created interfaces if necessary

• Determine an assessment schedule early, so that metadata specialists can identify trends in what is and isn’t working in metadata submission (see the assessment sketch after this list)

• Determine which fields are candidates for remediation or enrichment

• Enrichment projects and remediation should involve users AND metadata specialists
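There is no assessment script in the deck itself; as a minimal sketch under assumed record and field names, a simple field-completeness report like the following can support a scheduled assessment by showing which submission fields users skip or leave empty.

```python
from collections import Counter

# Hypothetical sketch of a basic metadata assessment report: for each field,
# report the share of records that supply a non-empty value. The record
# structure is an assumption, not a particular repository's export format.

def field_completeness(records: list[dict]) -> dict[str, float]:
    """Return the share of records (0.0-1.0) with a non-empty value for each field seen."""
    counts: Counter[str] = Counter()
    fields: set[str] = set()
    for record in records:
        fields.update(record)
        for field, value in record.items():
            if str(value).strip():
                counts[field] += 1
    total = len(records)
    return {field: counts[field] / total for field in sorted(fields)} if total else {}

if __name__ == "__main__":
    sample = [
        {"title": "p.txt", "subject": "", "creator": "A. Researcher"},
        {"title": "Survey data", "subject": "GIS", "creator": "B. Researcher"},
    ]
    for field, share in field_completeness(sample).items():
        print(f"{field}: {share:.0%} complete")
```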


Resources

• Chapman, Joyce Celeste. “User Feedback and Cost/Value Analysis of Metadata Creation.” http://www2.archivists.org/sites/all/files/saa_description_presentation_2010_chapman.pdf

• Digital Library Federation (DLF) Assessment Interest Group (AIG) Metadata Working Group. http://dlfmetadataassessment.github.io/

• “Draft Principles for Evaluating Metadata Standards.” ALCTS/LITA Metadata Standards Committee. http://metaware.buzz/2015/10/27/draft-principles-for-evaluating-metadata-standards/

• Europeana Pro Data Quality Committee. http://pro.europeana.eu/page/data-quality-committee

• “Final Report.” Task Force on Enrichment and Evaluation. http://pro.europeana.eu/files/Europeana_Professional/Publications/Metadata%20Quality%20Report.pdf

• Harlow, Christina. “Metadata Quality Analysis.” https://github.com/cmh2166/ShareFest15MetadataQA

• Harper, Corey. “Can Metadata be Quantified?” (slides presented at 2015 DPLAFest, 2015-04-18). https://schd.ws/hosted_files/dplafest2015/c1/CanMetadataBeQuantifiedSlides.pdf

• Hydra Metadata Interest Group. https://wiki.duraspace.org/display/hydra/Hydra+Metadata+Interest+Group

• Neatrour, Anna and Myntti, Jeremy. “Automating Controlled Vocabulary Reconciliation.” (slides presented at DLF Forum 2015). http://www.slideshare.net/aneatrour/automating-controlled-vocabulary-reconciliation

• “Report and Recommendations from the Task Force on Metadata Quality.” http://pro.europeana.eu/files/Europeana_Professional/Publications/Metadata%20Quality%20Report.pdf


Connect

Email: [email protected]

Twitter: @meta_caro

Web: www.carolynhansen.org


Thank you!

Questions?