
Metadata for Institutional Repositories

Carolyn Hansen | @meta_caro

NISO Training, Thursday, February 23, 2017


About Me

• Metadata Librarian at University of Cincinnati (UC) Libraries

• Chair, Project Hydra Descriptive Metadata Working Group

• Embedded Metadata Subject Expert on the UC development team for scholar@UC, a self-submission IR

• Research: linked data, digital humanities, public history, GIS & data visualization


Agenda

• Defining high quality metadata

• The trouble with IRs and metadata

• Case Study: scholar@UC

• Best practices for getting high quality metadata

• Resources and guidelines

Image: Silvia Sala. “Ma perché scrivo? È l'unico mio conforto.” (“But why do I write? It is my only comfort.”) https://flic.kr/p/cok3GL


Defining high quality metadata

According to Europeana’s “Report & Recommendations from the Task Force on Metadata Quality,” high quality metadata is…

• Resulting from a series of trusted processes

• Findable

• Readable

• Standardised

• Meaningful to audiences

• Clear on re-use

• Visible


Barriers to getting high quality metadata

• In even the best situations…

• lack of foresight for online discovery

• treating metadata as an afterthought

• lack of funding and resources

• describing digitized items with little information

— Europeana’s “Report & Recommendations from the Task Force on Metadata Quality”


The Trouble with IRs and Metadata

• IRs are not controlled environments like traditional cataloging applications; they may be vendor-created or open-source

• IRs may or may not have mediated submissions or quality control measures

• Metadata is created by different streams according to different standards (ex. ETDs from the graduate school v. self-submission by faculty)

• Librarians/developers/vendors make assumptions about how users will interpret submission forms


Why IRs need good metadata

• discovery (in the application and on the Web) depends on high quality, consistent metadata, particularly in facetable fields and subject/keyword terms

• faculty are more likely to use an IR if their material is discoverable on the Web (Ex. Google Scholar)

• IR metadata may be mixed with existing library metadata through an integrated discovery layer; interoperability can be a significant problem


Diverse user expectations also make getting good metadata difficult

• “Why aren’t there ways to express geospatial data?”

• “I just want a DOI for my journal article”

• “Can I just use this for dark storage?”

• “Why are there so many required fields? I don’t have time to enter all this information.”


Case Study: scholar@UC

• self-submission repository; open-source development based on the Project Hydra stack (https://scholar.uc.edu/)

• metadata profiles pre-populate submission fields based on the work type the user chooses at the beginning of submission (a sketch of such a profile follows this list)

• the only metadata remediation is a quarterly “clean” of facetable fields; LCSH is not used, but the NAF is applied in remediation when possible

• in 2015-2016, a cohort of early adopters populated the application with approximately 300 records

• despite “help” content embedded in the application showing how metadata should be entered, many metadata problems occurred
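The deck does not include the actual scholar@UC profile configuration, so the following is only a minimal sketch, in Python, of how a work-type-driven metadata profile might pre-populate a submission form; the work types and field names are assumptions for illustration.

```python
# Hypothetical sketch: metadata profiles keyed by work type.
# Work types and field names are illustrative, not the actual scholar@UC configuration.

WORK_TYPE_PROFILES = {
    "article": {
        "required": ["title", "creator", "date_created", "rights"],
        "optional": ["abstract", "subject", "journal_title", "doi"],
    },
    "dataset": {
        "required": ["title", "creator", "date_created", "rights"],
        "optional": ["description", "subject", "geographic_subject", "readme_file"],
    },
}

def build_submission_form(work_type: str) -> dict:
    """Return the fields to display for the work type chosen at the start of submission."""
    profile = WORK_TYPE_PROFILES.get(work_type)
    if profile is None:
        raise ValueError(f"No metadata profile defined for work type: {work_type}")
    return {
        "required_fields": profile["required"],
        "optional_fields": profile["optional"],  # can be hidden behind an "additional fields" view
    }

if __name__ == "__main__":
    print(build_submission_form("dataset"))
```

Keeping the profile data separate from the form-building logic is the main point: the same lookup can drive both the submission UI and any later validation.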


Example problems

• “Title” field: some faculty users were unsure whether to enter the title of their work or their academic title

• Unintelligible titles and incomplete metadata (ex. the title of a dataset: “p.txt”)

• Multiple instances of a submitter’s name as “creator”

• Subjects entered in facetable fields using multiple variant spellings, capitalizations, etc.

• Technical problems like trailing/leading whitespace impacting facets (see the normalization sketch after this list)
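A quarterly facet “clean” like the one described in the case study can be partly scripted. This is a minimal sketch only; the variant-spelling map and field handling are assumptions, not scholar@UC’s actual remediation code.

```python
import re

# Hypothetical mapping of known variant spellings to a preferred form;
# in practice this list would be built up during remediation review.
PREFERRED_FORMS = {
    "gis": "GIS",
    "geographic information systems": "GIS",
}

def normalize_facet_value(value: str) -> str:
    """Trim leading/trailing whitespace, collapse internal runs of spaces, apply preferred forms."""
    cleaned = re.sub(r"\s+", " ", value).strip()
    return PREFERRED_FORMS.get(cleaned.lower(), cleaned)

def normalize_subjects(subjects: list[str]) -> list[str]:
    """Normalize a record's subject values and drop case-insensitive duplicates, keeping order."""
    seen, result = set(), []
    for subject in subjects:
        norm = normalize_facet_value(subject)
        if norm and norm.lower() not in seen:
            seen.add(norm.lower())
            result.append(norm)
    return result

if __name__ == "__main__":
    print(normalize_subjects([" GIS ", "gis", "Geographic Information Systems", "public  history"]))
    # -> ['GIS', 'public history']
```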


Subject field problems seen in the submitted records:

• Personal names entered as subjects in direct order, with no controlled vocabulary

• Similar concepts entered with variant spellings and acronyms

• Terms that could be mapped to LCSH during remediation

• Geographic names entered in the general subject field rather than the separate geographic subject field


Whether designing GUIs for open-source IRs or customizing vendor-created interfaces, remember: with IRs and metadata, you get what you ask for, or less.


Example: Datasets & Complex Files


What is the role of the metadata specialist?

cheerleader?

police?

cleaner?

educator?

all of the above?


Our imperfect solution

• for scholar@UC, we decreased the number of required fields to essentially only the fields needed for a DOI (see the sketch after this list)

• we expanded the help information and examples embedded within the submission form

• additional descriptive fields are available but are hidden from the main view to prevent input burnout

• the sustainability of manual metadata remediation is an unsolved issue as the amount of content in the IR grows

• datasets and complex files present continual challenges; we encourage users to add “Read Me” files and other explanatory documents, but not many users are doing so
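The slides do not spell out the exact DOI-minimum field set; as a rough illustration, the check below assumes a set loosely based on DataCite’s mandatory properties (creator, title, publisher, publication year, resource type) and hypothetical record keys.

```python
# Hypothetical sketch: flag submissions missing a DOI-minimum field set.
# Field names are illustrative and loosely follow DataCite's mandatory properties.

DOI_MINIMUM_FIELDS = {"creator", "title", "publisher", "publication_year", "resource_type"}

def missing_doi_fields(record: dict) -> set[str]:
    """Return the DOI-minimum fields that are absent or empty in a record."""
    return {field for field in DOI_MINIMUM_FIELDS if not str(record.get(field, "")).strip()}

if __name__ == "__main__":
    record = {"title": "p.txt", "creator": "A. Researcher", "publication_year": 2016}
    print(sorted(missing_doi_fields(record)))  # ['publisher', 'resource_type']
```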


Best Practices for getting high quality metadata

• Develop a baseline metadata profile based on your institution’s internal needs as well as harvesting requirements for other platforms; this profile should define required and optional submission fields

• Determine whether baseline metadata profiles differ depending on the material type (ex. documents v. datasets)

• Be very conscious of the user experience (UX) of submission forms; do focus group testing of forms if possible; consider customization of vendor-created interfaces if necessary

• Determine an assessment schedule early, so that metadata specialists can identify trends in what is and isn’t working in metadata submission (see the assessment sketch after this list)

• Determine which fields are candidates for remediation or enrichment

• Enrichment projects and remediation should involve users AND metadata specialists
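There is no assessment script in the deck itself; as a minimal sketch under assumed record and field names, a simple field-completeness report like the following can support a scheduled assessment by showing which submission fields users skip or leave empty.

```python
from collections import Counter

# Hypothetical sketch of a basic metadata assessment report: for each field,
# report the share of records that supply a non-empty value. The record
# structure is an assumption, not a particular repository's export format.

def field_completeness(records: list[dict]) -> dict[str, float]:
    """Return the share of records (0.0-1.0) with a non-empty value for each field seen."""
    counts: Counter[str] = Counter()
    fields: set[str] = set()
    for record in records:
        fields.update(record)
        for field, value in record.items():
            if str(value).strip():
                counts[field] += 1
    total = len(records)
    return {field: counts[field] / total for field in sorted(fields)} if total else {}

if __name__ == "__main__":
    sample = [
        {"title": "p.txt", "subject": "", "creator": "A. Researcher"},
        {"title": "Survey data", "subject": "GIS", "creator": "B. Researcher"},
    ]
    for field, share in field_completeness(sample).items():
        print(f"{field}: {share:.0%} complete")
```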


Resources

• Chapman, Joyce Celeste. “User Feedback and Cost/Value Analysis of Metadata Creation.” http://www2.archivists.org/sites/all/files/saa_description_presentation_2010_chapman.pdf

• Digital Library Federation (DLF) Assessment Interest Group (AIG) Metadata Working Group. http://dlfmetadataassessment.github.io/

• “Draft Principles for Evaluating Metadata Standards.” ALCTS/LITA Metadata Standards Committee. http://metaware.buzz/2015/10/27/draft-principles-for-evaluating-metadata-standards/

• Europeana Pro Data Quality Committee. http://pro.europeana.eu/page/data-quality-committee

• “Final Report.” Task Force on Enrichment and Evaluation. http://pro.europeana.eu/files/Europeana_Professional/Publications/Metadata%20Quality%20Report.pdf

• Harlow, Christina. “Metadata Quality Analysis.” https://github.com/cmh2166/ShareFest15MetadataQA

• Harper, Corey. “Can Metadata be Quantified?” (slides presented at 2015 DPLAFest, 2015-04-18). https://schd.ws/hosted_files/dplafest2015/c1/CanMetadataBeQuantifiedSlides.pdf

• Hydra Metadata Interest Group. https://wiki.duraspace.org/display/hydra/Hydra+Metadata+Interest+Group

• Neatrour, Anna and Myntti, Jeremy. “Automating Controlled Vocabulary Reconciliation.” (slides presented at DLF Forum 2015). http://www.slideshare.net/aneatrour/automating-controlled-vocabulary-reconciliation

• “Report and Recommendations from the Task Force on Metadata Quality.” http://pro.europeana.eu/files/Europeana_Professional/Publications/Metadata%20Quality%20Report.pdf


Connect

Email: [email protected]

Twitter: @meta_caro

Web: www.carolynhansen.org


Thank you!

Questions?