Upload
makreal
View
219
Download
0
Embed Size (px)
Citation preview
7/30/2019 Metadata Creation Practices in Digital
1/13
104 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010
developmentof such amediationmechanism calls foran empirical assessmentofvarious issues surrounding
metadata-creationpractices.The critical issues concerning metadata practices
across distributed digital collections have been rela-tivelyunexplored.Whileexamininglearningobjectsande-printscommunitiesofpractice,Barton,Currier,andHeypointoutthelackofformalinvestigationofthemetadata-creationprocess.2Aswillbe discussedin the followingsection,someresearchershavebeguntoassessthecurrentstate of descriptive practices, metadata schemata, andcontent standards. However, the literature has not yetdeveloped toa pointwhereit affords a comprehensivepicture.Giventhepropagationofmetadataprojects,itisimportanttocontinuetotrackchangesinmetadata-cre-ationpracticeswhiletheyarestillinconstantflux.Sucheffortsareessentialforaddingnewperspectivestodigitallibrary researchand practicesin an environmentwheremetadata best practices are being actively sought afterto aid in thecreation and management ofhigh-qualitydigitalcollections.
This study examines theprevailing current state ofmetadata-creation practices in digital repositories, col-lections,andlibraries,whichmayincludebothdigitizedand born-digital resources. Using nationwide surveydata, mostly drawn from the community of catalog-ingandmetadata professionals,we seek to investigateissues in creatingdescriptivemetadata elements, usingcontrolled vocabularies for subject access, and propa-
gatingmetadata andmetadata guidelines beyondlocalenvironments.Wewilladdressthefollowingresearchquestions:
1. Whichmetadataschema(ta)andcontentstandard(s)areemployed in individualdigital repositories andcollections?
2. Whichcontrolledvocabularyschema(ta)areusedtofacilitatesubjectaccess?
3. Whatcriteriaareappliedin selectingmetadataandcontrolled-vocabularyschema(ta)?
4. To what extent are mechanisms for exposing andsharingmetadata integrated intocurrentmetadata-creationpractices?
Inthisarticle,wefirstreviewrecentstudiesrelatingtocurrentmetadata-creationpracticesacrossdigitalcol-lections.Thenwepresentthesurveymethodemployedtoconductthisstudy,thegeneralcharacteristicsofsurveyparticipants, and thevalidityof the collected data, fol-lowedbythestudyresults.Wereportonhowmetadataand controlled vocabulary schema(ta) are being usedacross institutions, and we present a data analysis ofcurrent metadata-creation practices. The final sectionsummarizesthestudyandpresentssomesuggestionsforfuturestudies.
This study explores the current state of metadata-creation
practices across digital repositories and collections by
using data collected from a nationwide survey of mostly
cataloging and metadata professionals. Results show
that MARC, AACR2, and LCSH are the most widely
used metadata schema, content standard, and subject-
controlled vocabulary, respectively. Dublin Core (DC) is
the second most widely used metadata schema, followed
by EAD, MODS, VRA, and TEI. Qualified DCs wider
use vis--vis Unqualified DC (40.6 percent versus 25.4
percent) is noteworthy. The leading criteria in selecting
metadata and controlled-vocabulary schemata are collec-
tion-specific considerations, such as the types of resources,nature of the collection, and needs of primary users and
communities. Existing technological infrastructure and
staff expertise also are significant factors contributing
to the current use of metadata schemata and controlled
vocabularies for subject access across distributed digital
repositories and collections. Metadata interoperability
remains a major challenge. There is a lack of exposure of
locally created metadata and metadata guidelines beyond
the local environments. Homegrown locally added meta-
data elements may also hinder metadata interoperability
across digital repositories and collections when there is alack of sharable mechanisms for locally defined extensions
and variants.
Metadata is an essential building block in facili-tating effective resource discovery, access, andsharingacross ever-growingdistributed digital
collections. Quality metadata is becoming critical in anetworked world in which metadata interoperabilityis among the top challenges faced by digital libraries.However,thereisnocommondatamodelthatcatalog-ing and metadata professionals can readily reference
as a mediation mechanism during the processes ofdescriptive metadata creation and controlled vocabu-lary schemata application for subject description.1 The
Jng-an pak ([email protected]) is assistntPrfessr, Cege f Infrmtin Science nd Techngy, Drex-e niversity, Phidephi, nd Yj taka ([email protected])is Ctging/Metdt librrin, TCJ librry, The Cege fe Jersey, Eing, e Jersey.
Jung-ran Park andYuji Tosaka
Metadata Creation Practices in Digital
Repositories and Collections: Schemata,
Selection Criteria, and Interoperability
7/30/2019 Metadata Creation Practices in Digital
2/13
MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 105
possibleincreaseintheuseoflocallydevelopedschemataasmanyprojectsaddednewtypesofnontextualdigital
objectsthatcouldnotbeadequatelydescribedbyexistingmetadataschemata.6
Thereisalackofresearchconcerningthecurrentuseofcontentstandards;however,itisreasonabletosuspectthat content-standards use exhibits patterns similar tothatofmetadatabecauseoftheiroftencloseassociationwithparticularmetadataschemata.TheOCLCRLGsur-veyrevealsthatAnglo-AmericanCataloguingRules,2ndedition(AACR2)thetraditionalcatalogingrulethathasmostoftenbeenusedinconjunctionwithMARCisthemostwidelyusedcontentstandard(81percent).AACR2isfollowedbyDescribingArchives:AContentStandard(DACS)with42percent;DescriptiveCatalogingofRareMaterials with 33 percent; Archives, Personal Papers,
Manuscripts (APPM) with 25 percent; and CatalogingCulturalObjects(CCO)with21percent.7
Inthesamewayasmetadataschemata,thereappearsto be a concentration of a few controlled vocabularyschemata at research institutions.MasARL survey, forexample, shows that the Library of Congress SubjectHeadings(LCSH)andNameAuthorityFile(NAF)wereused by most survey respondents (96 percent and 88percent,respectively).Thesetwopredominantlyadoptedvocabularies are followed by several domain-specificvocabularies, such as Art and Architecture Thesaurus(AAT), Library of Congress Thesaurus for GraphicalMaterials(TGM)IandII,GettyThesaurusofGeographic
Names(TGN),andtheGettyUnionListofArtistsNames(ULAN),whichwereusedbybetween30percenttomorethan60percentofrespondents.8TheOCLCRLGsurveyreportssimilarresults;however,nearlyhalfoftheOCLCRLGsurveyrespondents(N=9)indicatedthattheyhadalsobuiltandmaintainedoneormorelocallydevelopedthesauri.9
While creating and sharing information about localmetadata implementations is an important step towardincreasedinteroperability, recent studies tendto paintagrimpictureofcurrentlocaldocumentationpracticesandopenaccessibility. Ina nationwidestudyof institutionalrepositories in U.S. academic libraries, Markey et al.foundthatonly61.3percentofthe446surveyparticipants
with operational institutional repositories had imple-mentedpolicies formetadata schemata and authorizedmetadata creators.10 TheOCLCRLG survey also high-lightslimitedcollaborationandsharingofthemetadataguidelinesbothwithinandacrosstheinstitutions.Itfindsthatevenwhentherearemultipleunitscreatingmetadatawithinthesameinstitution,metadata-creationguidelinesoftenareunlikelytobeshared(28percentdonotshare;53percentsometimesshare).11
Amixedresultisreportedontheexposureofmeta-datatooutsideserviceproviders.InanARLsurvey,theUniversityofHoustonLibrariesInstitutionalRepository
Literature Review
As evinced by the principles and practices of bib-liographiccontrolthroughsharedcataloging,successfulresource access and sharing in the networked envi-ronment demands semantic interoperability based onaccurate,complete,andconsistentresourcedescription.TherecentsurveybyMafindsthattheOpenArchivesInitiativeProtocolforMetadataHarvesting(OAI-PMH)and metadata crosswalks have been adopted by 83percent and 73 percent of respondents, respectively.Even though the sample comes only from sixty-eightAssociationofResearchLibraries(ARL)memberlibrar-ies, and the figures thusmay be skewed higher thanthose of the entire population of academic libraries,there is little doubt that interoperability is a critical
issuegiventherapidproliferationofmetadataschematathroughoutdigitallibraries.3
While there is a variety ofmetadata schemata cur-rently in use for organizing digital collections, only afew of them arewidely used indigital repositories. InherARLsurvey,MareportsthattheMARCformatisthemostwidelyusedmetadataschema(91percent),followedby Encoded Archival Description (EAD) (84 percent),UnqualifiedDublinCore(DC)(78percent),andQualifiedDC (67 percent).4 Similarly, a 2007 member survey byOCLCResearchLibrariesGroup (RLG) programsgath-eredinformation from eighteenmajor research librariesand cultural heritage institutions and also found that
MARCisthemostwidelyusedscheme(65percent),fol-lowedbyEAD(43percent),UnqualifiedDC(30percent),andQualifiedDC(29percent).Thedifferentlevelsofusereportedby thesestudies are probably dueto differentsample sizesand compositions, but results nonethelesssuggestthatmetadatauseatresearchinstitutionstendstorelyonasmallnumberofmajorschemata.5
Theremayinfactbemuchgreaterdiversityinmeta-datausepatternswhenthescopeisexpandedtoincludeboth research and nonresearch institutions. Palmer,Zavalina,andMustafoff,forexample,trackedtrendsfrom2003through2006inmetadataselectionandapplicationpracticesatmore than160digitalcollectionsdevelopedthroughInstituteofMuseumandLibraryServicesgrants.
Theyfoundthatdespiteperceivedlimitations,useofDCisthemostwidespread,withmorethanhalfofthedigitalcollectionsusing it alone or in combinationwith otherschemata.MARC rankssecond,with nearly 30percentusingitaloneorincombination.Theauthorsfoundthatthechoiceofmetadataschema is largely influenced bypractices at peer institutions and compatibility with acontentmanagementsystem.Whatismoststriking,how-ever, is the finding that locallydevelopedschemataareusedasoftenasMARC.Thereisadeclineinthepercent-ageofdigitalprojectsusingmultiplemetadataschemata(from53percentto38percent).Yettheauthorsalsosawa
7/30/2019 Metadata Creation Practices in Digital
3/13
106 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010
Method
Theobjectiveoftheresearchreportedinthispaperistoexaminethecurrentstateofmetadata-creationpracticesintermsofthecreationofdescriptivemetadataelements,the use of controlled vocabularies for subject access,and the exposureofmetadata andmetadataguidelinesbeyondlocalenvironments.WeconductedaWebsurveyusing WebSurveyor (now Vovici: http://www.vovici.com). The survey included both structured and open-endedquestions.Itwasextensivelyreviewedbymembersof an advisory boarda group of three experts in thefieldand it was pilot-tested prior to being officiallylaunched.Thesurvey includedmanymultiple-responsequestionsthatcalledforrespondentstocheckallappli-cableanswers.
We recruitedparticipants through survey invitationmessages and subsequent reminders to the electronicmailinglistsofcommunitiesofmetadataandcatalogingprofessionals.Table1 showsthemailing lists employedforthestudy.Wealsosentoutindividualinvitationsanddistributedflyerstoselectedmetadataandcatalogingses-sionsduringthe2008ALAMidwinterMeeting,heldthatyearinPhiladelphia.
The survey attracted a large number of initial par-ticipants(N=1,371),butduringthesixty-twodaysfromAugust6toOctober6,2008,weonlyreceived303com-pletedresponsesviathesurveymanagementsystem.Wesuspect that the high incompletion rate (77.9 percent)
stemsfromthefactthatthesubjectmattermayhavebeenoutsidethescopeofmanyparticipantsjobresponsibili-ties.Thelengthofthesurveymayalsohavebeenafactorintheincompletionrate.
The profiles of respondents job titles (see table 2)
Task Force found that exposingmetadata toOAI-PMHservice providers is an established practice used by
nearly90percentoftherespondents.12MasARLsurveyalso reports the wide adoption of OAI-PMH (83 per-cent).Theseresultsunderscorethevirtualconsensusonthecritical importance ofexposingmetadatatoachieveinteroperabilityandmakelocallycreatedmetadatausefulacrossdistributeddigitalrepositoriesandcollections.13Bycontrast,theOCLCRLGsurveyshowsthatonlyone-tenthoftherespondentsstatedthatallnon-MARCmetadataisexposed to OAI harvesters, while 30 percent indicatedthatonlysomeofitwasavailable.TheprominentthemerevealedbytheOCLCRLGsurveyisaninwardfocusincurrentmetadatapractices,markedbytheuseoflocaltoolstoreachagenerallylocalaudience.14
In summary, recent studies show that the current
practiceofmetadatacreation isproblematic due to thelack of a mechanism for integrating various types ofmetadata schemata, content standards, and controlledvocabularies inways that promote an optimal level ofinteroperabilityacrossdigitalcollectionsandrepositories.Theproblemsareexacerbatedinanenvironmentwheremany institutions lack local documentation delineatingthemetadata-creationprocess.
At the same time, researchers have only recentlybegun studying these issues, and the body of literatureis at an incipient stage. The research that was doneoften targeted different populations, and sample sizeswere different (some very small). In some cases the
literature exhibits contradictory findings about issuessurroundingmetadatapractices,increasingthedifficultyinunderstandingthecurrentstateofmetadatacreation.Thispointsoutthe need for further researchof currentmetadata-creationpractice.
Table 1. Electronic mailing lists for the survey
Electronic Mailing Lists E-mail Address
Autocat
Dublin Core Listserv
Metadata Librarians Listserv
Library and Information Technology Association Listserv
Online Audiovisual Catalogers Electronic Discussion List
Subject Authority Cooperative Program Listserv
Serialist
Text Encoding Initiative Listserv
Electronic Resources in Libraries Listserv
Encoded Archival Description Listserv
7/30/2019 Metadata Creation Practices in Digital
4/13
MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 107
and job responsibilities (see table 3) clearly show thatmost of the individuals who completed the surveyengage professionally in activities directly relevant tothe research objectives, such asdescriptive and subject
cataloging,metadatacreationandmanagement, author-ity control, nonprint and special material cataloging,electronicresourceanddigitalprojectmanagement,andintegratedlibrarysystem(ILS)management.
Althoughthelargestnumberofparticipants(135,or44.6percent)chosetheOthercategoryregardingtheirjobtitle (see table2), it is reasonabletoassume that thevastmajoritycanbecategorizedascatalogingandmeta-dataprofessionals.15MostjobtitlesgivenasOtherareassociatedwithoneoftheprofessionalactivitieslistedintable4.
Thusitisreasonabletoassumethattherespondentsare in an appropriate position to provide first-hand,accurateinformationaboutthecurrentstateofmetadata
creationintheirinstitutions.Concerning the institutional background of partici-
pants, of the 303 survey participants, fewer than half(121,or39.9percent)providedinstitutionalinformation.We believe that this is mostly due to the fact that thequestionwasoptional,followinga suggestionfrom theInstitutionalReviewBoardatDrexelUniversity.Ofthosethatprovidedtheirinstitutionalbackground,themajority(75.2 percent) are from academic libraries, followedbyparticipantsfrompubliclibraries(17.4percent)andfromotherinstitutions(7.4percent).
Table 3. Participants job responsibilities (multiple responses)
Job Responsibilities
Number of
Participants
General cataloging
(e.g., descriptive and subject
cataloging)
171 (56.4%)
Metadata creation and
management
153 (50.5%)
Authority control 147 (48.5%)
Nonprint cataloging
(e.g., microform, music scores,
photographs, video recordings)
133 (43.9%)
Special material cataloging
(e.g., rare books, foreign language
materials, government documents)
126 (41.6%)
Digital project management 101 (33.3%)
Electronic resource management 62 (20.5%)
ILS management 59 (19.5%)
Other 51 (16.8%)
Survey questin: wht re yur primry jb respnsibiities? (pese check tht ppy)
Table 2. Job titles of participants (multiple responses)
Job Titles
Number of
Participants
Other 135 (44.6%)
Cataloger/cataloging librarian/
catalog librarian
99 (32.7%)
Metadata librarian 29 (9.6%)
Catalog & metadata librarian 26 (8.6%)
Head, cataloging 26 (8.6%)
Electronic resources cataloger 17 (5.6%)
Cataloging coordinator 15 (5.0%)
Head, cataloging & metadata
services
15 (5.0%)
N= 227. Survey questin: wht is yur rking jb tite? (pese check tht ppy)
Table 4. Professional activities specified in Other category intable 2
Professional Activities
Number of
Participants
Cataloging & metadata creation 31 (10.2%)
Digital projects management 23 (7.6%)
Technical services 17 (5.6%)
Archiving 16 (5.3%)
Electronic resources and
serials management
6 (2.0%)
Library system administration/
other
6 (2.0%)
N= 99. Survey questin: If yu seected ther, pese specify.
7/30/2019 Metadata Creation Practices in Digital
5/13
108 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010
ItisnoteworthythatuseofQualifiedDCwashigherthan that of Unqualified DC. This result is differentfrom theARL survey and amember survey conducted
Results
In this section, we will present thefindings ofthis studyin the followingthreeareas:(1)metadataandcontrolledvocabulary schemata and metadatatools used, (2) criteria for selectingmetadata and controlled vocabularyschemata, and (3) exposing metadataand metadata guidelines beyond localenvironments.
Madaa and cndVabay shmaa andMadaa t ud
A great variety ofdigital objectswerehandled by the survey participants,asfigure1shows.Themostfrequentlyhan-dledobjectwastext,citedby86.5percentoftherespondents.Aboutthree-fourthsof the respondents described audiovi-sualmaterials(75.2percent),while60.1percentdescribedimagesand51.8per-centdescribedarchivalmaterials.Morethan65percentoftherespondentshan-dledelectronicresources (68.3percent)and digitized resources (66.7 percent),whileapproximatelyhalfhandledborn-
digital resources (52.5 percent). Thetypes ofmaterials described in digitalcollectionswerediverse,encompassingboth digitizedandborn-digitalmateri-als;however,digitizationaccountedfora slightly greater percentage of meta-datacreation.
To handle these diverse digitalobjects, the respondents institutionsemployed a wide range of metadataschemata, as figure 2 shows. Yet therewereafewschematathatwerewidelyused by cataloging and metadata pro-fessionals. Specifically, 84.2 percent
of the respondents institutions usedMARC;DCwasalsopopular,with25.4percentusingUnqualifiedDCand40.6percent using Qualified DC to createmetadata. EAD also was frequentlycited(31.7percent).Inadditiontothesemajortypesofmetadataschemata,therespondentsinstitutionsalsoemployedMetadata Object Description Schema(MODS) (17.8 percent), Visual Resource Association(VRA)Core (14.9 percent),and TextEncoding Initiative(TEI)(12.5percent).
Figure 1. Materials/resources handled (multiple responses)
Survey questin: wht type f mteris/resurces d yu nd yur fe ctgers/metdt ibrr-ins hnde? (pese check tht ppy)
Figure 2. Metadata schemata used (multiple responses)
Survey questin: which metdt schem(s) d yu nd yur fe ctgers/metdt ibrrinsuse? (pese check tht ppy)
7/30/2019 Metadata Creation Practices in Digital
6/13
MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 109
custommetadataelementsderivesfromtheimperativeto accommodate the perceived needs of local collec-
tionsandusers,asindicatedbythetwomostcommonresponses: (1) to reflect the nature of local collec-tions/resources(76.9 percent) and (2)to reflect thecharacteristics of target audience/community of localcollections (58.3 percent). Local conditions were alsocitedfrominstitutionalandtechnicalstandpoints.Manyinstitutions(34.3percent)followexistinglocalpracticesforcatalogingandmetadatacreationwhile other insti-tutions(18.5percent)aremakinghomegrownmetadataadditionsbecauseofconstraintsimposedbytheirlocalsystems.
Table 6 summarizes themost frequentlyusedcon-trolled vocabulary schematas by resource type. By far
themostwidelyusedschemaacrossallresourcetypeswasLCSH.ThepreeminenceofLCSHevincesthecriti-calrolethatitplaysasthedefactoformofcontrolledvocabularyforsubjectdescription.LibraryofCongressClassification (LCC) was the second choice for allresourcetypesotherthanimages,culturalobjects,andarchives.Fordigital collectionsof these resource typesanddigitizedresources,AATwasthesecondmostusedcontrolledvocabulary,afactthatreflectsitspurposeasadomain-specificterminologyusedfordescribingworksof art, architecture, visual resources, material culture,andarchivalmaterials.
While traditional metadata schemata, content stan-dards, and controlled vocabularies such as MARC,
AACR2, and LCSH clearly were preeminent in themajority of the respondents institutions, currentmeta-datacreationindigitalrepositoriesandcollectionsfacesnew challenges from the enormous volume of onlineand digital resources.19 Approximately one-third of therespondents institutions (33.8 percent) were meetingthis challenge with tools for semiautomatic metadatageneration.Yeta majorityof respondents(52.5percent)indicated that their institutions did not use any suchtoolsformetadatacreationandmanagement.ThisresultseemstocontrastwithMasfindingthatautomaticmeta-data generation was used in some capacity in nearly
by OCLC RLG programs (as described in LiteratureReviewonpage105).16Inthesesurveys,UnqualifiedDC
wasmorefrequentlycitedthanQualifiedDC.Onepos-sibleexplanationofthislessfrequentuseofUnqualifiedDCmay lie inthe limitationsofUnqualifiedDCmeta-data semantics. Survey respondents also reported onproblemsusingDCmetadata,whichweremostlycausedby semantic ambiguities and semantic overlaps of cer-tainDCmetadata elements.17 Limitationsandissues ofUnqualified DC metadata semantics are discussed indepthinParksstudy. 18Inlightoftheseresults,examin-ingtrendsofQualifiedDCuseinafuturestudywouldbeinteresting.
Despitethewidevarietyofschematareportedinuse,thereseemedtobeaninclinationtouseonlyoneortwometadataschemataforresourcedescription.Asshownin
table5,themajorityoftherespondentsinstitutions(53.6percent) used only one schema for metadata creation,whileapproximately37percentusedtwoorthreesche-mata (26.2 percent and 10.3 percent, respectively). Theinstitutions usingmore than three schemataduring themetadata-creationprocessescomprisedonly9.9 percentoftherespondents.
Turningtocontentstandards(seefigure3),wefoundthatAACR2was themostwidelyused standard, indi-catedby84.5percentofrespondents.Thishighpercentageclearly reflects the continuing preeminence of MARCasthemetadataschemaofchoicefordigitalcollections.DC applicationprofiles also showeda largeuser base,
indicated bymore than one-third of respondents (37.0percent).Morethanonequarteroftherespondents(28.4percent)used EADapplicationguidelinesas developedby the Society of AmericanArchivists and the LibraryofCongress,while 10.6percentused RLGBestPracticeGuidelines for Encoded Archival Description (2002).Aboutonequarter(25.7percent)indicatedDACSastheircontentstandard.
Homegrownstandardsandguidelinesarelocalappli-cationprofilesthatclarifyexistingcontentstandardsandspecify howvalues for metadata elements are selectedandrepresentedtomeettherequirementsofaparticularcontext.Asshownintheresultsonmetadataschemata,itisnoteworthythathomegrowncontentstandardsand
guidelinesconstitutedoneofthemajorchoicesofpartici-pants,indicatedbymorethanone-fifthoftheinstitutions(22.1 percent). Almost two-fifths of the survey partici-pants(38percent)alsoreportedthattheyaddhomegrownmetadataelementstoagivenmetadataschema.Slightlylessthanhalfoftheparticipants(47.2percent)indicatedotherwise.
The local practice of creating homegrown contentguidelinesandmetadataelementsduringthemetadata-creationprocessdeserves a separate study; this studyonly briefly touches on the basis for locally addedcustom metadata elements. The motivation to create
Table 5. Number of metadata schemata in use
Number of Metadata
Schemata in Use Number of Participants
1 141 (53.6%)
2 69 (26.2%)
3 27 (10.3%)
4 or more 26 (9.9%)
N=263. Survey questin: which metdt schem(s) d yu nd yur fectgers/metdt ibrrins use the mst? (pese check tht ppy)
7/30/2019 Metadata Creation Practices in Digital
7/13
110 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010
ca f sng Madaa and cndVabay shmaa
Whatare the factors thathaveshaped the current stateofmetadata-creationpracticesreported thusfar? In this
section,we turn our attention to constraints that affectdecisionmakingatinstitutionsintheselectionofmeta-data and controlled vocabulary schemata for subjectdescription.
Figure4presentsthepercentageofdifferentmetadataschemata selection criteria described by survey par-ticipants. First, collection-specific considerations clearlyplayedamajorroleintheselection.Themostfrequentlycitedreasonwastypesofresources(60.4percent).Thisresponsereflectsthefactthatalargenumberofmetadataschematahave been developed, oftenwithwide varia-tion in content and format, to better handle particular
two-thirdsofARLlibraries.20
Because semiautomatic metadata application isreported in-depth in a separate study, we only brieflysketch the topic here.21 The semiautomatic metadataapplicationtoolsusedinthe respondentsdigitalreposi-
toriesandcollectionscanbeclassifiedintofivecategoriesof common characteristics: (1)metadata format conver-sion, (2) templates and editors for metadata creation,(3) automatic metadata creation, (4) library system forbibliographic and authority control, and (5) metadataharvestingandimportingtools.
As table 7 illustrates, among those institutions thathave introduced semiautomatic metadata generationtools, metadata format conversion (38.6 percent) andtemplates and editors for metadata creation (27 per-cent)arethetwomostfrequentlycitedtools.
Figure 3. Content standards used (multiple responses)
Survey questin: wht cntent stndrd(s) nd/r guideines d yu nd yur fe ctgers/metdt ibrrins use?(pese check tht ppy)
7/30/2019 Metadata Creation Practices in Digital
8/13
MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 111
job responsibility, expertiseof staff (44.2percent)andintegrated library system (39.9 percent) appeared tohighlightthekeyrolethatMARCcontinuestoplayinthemetadata-creationprocessfordigitalcollections(seefig-ure2).Budgetalsoappearedtobeanimportantfactorinmetadataselection(17.2percent),showingthatfundinglevelsplayedaconsiderableroleinmetadatadecisions.
typesofinformationresources.Theprimaryfactorinselectingmetadataschemataistheirsuit-ability fordescribingthemostcommontypeofresourceshan-dledbythesurveyparticipants.
Thesecondandthirdmostcommoncriteria,targetusers/audience (49.8 percent) andsubjectmatters of resources
(46.9 percent), also seem toreflect how domain-specificmetadataschemataareapplied.Inmaking decisions onmeta-data schemata, respondentsweighed materials in particular subject areas (e.g., art,education, and geography) and the needs of particu-lar communitiesofpractice as their primary users andaudiences.
However, existing technological infrastructure andresource constraints also determine options. Given theprominence of general library cataloging as a primary
Table 6. The most frequently used controlled vocabulary schema(s) by resource type (multiple responses)
LCSH LCC DDC AAT TGM ULAN TGN Other
Text79.5%
(241)
35.6%
(108)
16.8%
(51)
10.2%
(31)
6.9%
(21)
3.6%
(11)
5.0%
(15)
14.2%
(43)
Audiovisual materials67.3%
(204)
25.1%
(76)
12.9%
(39)
9.2%
(28)
8.6%
(26)
4.0%
(12)
5.0%
(15)
14.5%
(44)
Cartographic materials44.9%
(136)
17.5%
(53)
7.3%
(22)
5.0%
(15)
4.3%
(13)
1.3%
(4)
4.3%
(13)
6.3%
(19)
Images43.2%
(131)
11.9%
(36)
5.6%
(17)
25.7%
(78)
20.1%
(61)
9.9%
(30)
10.6%
(32)
11.2%
(34)
Cultural objects
(e.g., museum objects)
20.1%
(61)
7.3%
(22)
4.3%
(13)
13.2%
(40)
6.3%
(19)
4.6%
(14)
3.0%
(9)
7.9%
(24)
Archives44.2%
(134)
11.6%
(35)
6.3%
(19)
11.9%
(36)
6.6%
(20)
3.0%
(9)
2.6%
(8)
12.2%
(37)
Electronic resources60.7%
(184)
23.4%
(71)
8.6%
(26)
5.3%
(16)
3.6%
(11)
1.7%
(5)
3.0%
(9)
14.2%
(43)
Digitized resources51.8%
(157)
15.5%
(47)
5.0%
(15)
15.5%
(47)
10.2%
(31)
6.6%
(20)
7.6%
(23)
15.2%
(46)
Born-digital resources43.9%
(133)
13.5%
(41)
5.6%
(17)
8.3%
(25)
7.3%
(22)
4.3%
(13)
4.6%
(14)
13.9%
(42)
Survey questin: which cntred vcbury schem(s) d yu nd yur fe ctgers/metdt ibrrins use mst? (Pese check tht ppy)
Table 7. Types of semi-automatic metadata generation tools in use
Types Response Rating
Metadata format conversion 38 (38.6%)
Templates and editors for metadata creation 26 (27.0%)
Automatic metadata creation 16 (16.7%)
Library system for bibliographic and authority control 15 (15.6%)
Metadata harvesting and importing tools 8 (8.3%)
N= 96. Survey questin: Pese describe the (semi)utmtic metdt genertin ts yu use.
7/30/2019 Metadata Creation Practices in Digital
9/13
112 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010
the software used by their insti-tutionsi.e., integrated library
system (39.9 percent), digitalcollection or asset managementsoftware (25.4 percent), institu-tional repository software (19.8percent), union catalogs (14.9percent), and archival manage-mentsoftware(5.6percent)asareason for their selection ofmeta-dataschemata.Metadatadecisionsthus seem to be drivenby avari-etyoflocaltechnologychoicesfordevelopingdigitalrepositoriesandcollections.
As shown in figure 5, similar
patterns areobservedwith regardto selection criteria for controlledvocabulary schemata. Three ofthe four selection criteria receiv-ing majority responsestargetusers/audience (55.4 percent),type of resources (54.8 percent),andnatureofthecollection(50.2percent)suggest that controlledvocabulary decisions are influ-encedprimarilybythesubstantivepurpose and scope of controlledvocabularies for local collections.
A major consideration seems tobe whether particular controlledvocabularies are suitable for rep-resenting standard data values toimprove access and retrieval fortargetaudiences.
Metadata standards, anotherselection criteria frequently citedin the survey (54.1 percent),reflects how some domain-spe-cific metadata schemata tend todictate the use of particular con-trolled vocabularies. At the sametime, the results also suggest that
resources and technological infra-structure available to institutionswere also important reasons fortheir selections. Expertise ofstaff (38.3 percent) seems to bea straightforward practical rea-son: the application of controlledvocabularies is highly dependenton the width and depth of staffexpertiseavailable.Likewise,when
implementing controlled vocabularies in the digitalenvironment, some institutions also took into account
Atthesametime,itisnoteworthythatwhileresponseswere not mutually exclusive, many respondents cited
Figure 4. Criteria for selecting metadata schemata (multiple responses)
Questin: which criteri ere ppied in seecting metdt schemt? (pese check tht ppy)
Figure 5. Criteria for selecting controlled vocabulary schemata (multiple responses)
Questin: which criteri re ppied in seecting cntred vcbury schemt? (Pese check tht ppy)
7/30/2019 Metadata Creation Practices in Digital
10/13
MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 113
forsearchengines and63.2 percentfor OAI harvesters), a result that
maybeinterpretedasatendencytocreate metadata primarily for localaudiences.
Why do many institutions failtomake their locally createdmeta-data available to other institutionsdespite wide consensus on theimportance of metadata sharing inanetworkedworld?Responsesfromthose institutions exposingnoneornot all of theirmetadata (see table8) reveal that financial, personnel,and technical issues are major hin-drances inpromoting the exposure
ofmetadataoutside the immediatelocalenvironment.Someinstitutionsare notconfident that their currentmetadata practices are able to sat-isfy the technical requirements forproducingstandards-basedinterop-erable metadata. Another reasonfrequently mentioned is copyrightconcernsaboutlimited-accessmate-rials. Yet some respondents simplydo not see any merit to exposingtheir item-levelmetadata, citing its
relativeuselessnessfor resourcediscoveryoutside their
institutions.As stated earlier, the practice of adding home-grownmetadataelementsseemscommonamongmanyinstitutions. While locally created metadata elementsaccommodate local needs and requirements, they mayalso hinder metadata interoperability across digitalrepositories and collections if mechanisms for findinginformation about such locally defined extensions andvariants are absent. Homegrown metadata guidelinesdocumentlocaldatamodelsandfunctionasanessentialmechanismformetadatacreationandqualityassurancewithin and across digital repositories and collections.23
In this regard, it is essential toexamine locally createdmetadata guidelines and best practices.24 However, the
resultsofthesurveyanalysisevincethatthevastmajorityofinstitutions(72.0percent)providednopublicaccesstolocalapplicationprofilesontheirwebsiteswhileonly19.6percentofrespondentsinstitutionsmadethemavailableonlinetothepublic.
Conclusion
Metadataplaysanessentialroleinmanaging,organizing,andsearchingforinformationresources.Inthenetworked
existing system features for authority control and con-
trolledvocabularysearching,asexhibitedby17.2percentofresponsesfordigitalcollection/orassetmanagementsoftware.
exng Madaa and Madaa Gdnbynd la envnmn
Metadata interoperability across distributed digitalrepositories and collections is fast becoming a majorissue.22Theproliferationofopen-sourceandcommercialdigitallibraryplatformsusingavarietyofmetadatasche-matahasimplicationsonthelibrariansabilitytocreateshareable and interoperable metadata beyond the localenvironment.Towhatextentaremechanismsforsharing
metadata integrated into the current metadata-creationpracticesdescribedbytherespondents?
Figure 6 summarizes the responses concerning theusesofthreemajormechanismsformetadataexposure.Approximatelyhalfofrespondentsexposedatleastsomeof their metadata to search engines (52.8 percent) andunion catalogs such asOCLCWorldCat (50.6 percent).More than one-third of the respondents exposed all orsome of their metadata through OAI harvesters (36.8percent).About half ormore of the respondents eitherdidnotexposetheirmetadataorwerenotsureaboutthecurrentoperationsat theirinstitutions(e.g.,47.2percent
Figure 6. Mechanism to expose metadata (multiple responses)
Survey questin: D yu/yur rgniztin expse yur metdt t oaI (open archives Inititive)hrvesters, unin ctgs r serch engines?
7/30/2019 Metadata Creation Practices in Digital
11/13
114 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010
TheDCmetadataschemaisthesecondmostwidelyemployed according to this study, with Qualified DC
used by 40.6 percent of responding institutions andUnqualifiedDCusedby25.4percent.EADisanotherfre-quentlycitedschema(31.7percent),followedbyMODS(17.8percent),VRA(14.9percent),andTEI(12.5percent).AtrendofQualifiedDCbeingused(40.6percent)moreoftenthanUnqualifiedDC(25.4percent)isnoteworthy.One possibleexplanationof this trendmay bederivedfromthefactthatsemanticambiguitiesandoverlapsinsomeoftheUnqualifiedDCelementsinterferewithuseinresourcedescription.25Giventheearliersurveysreport-ingthehigheruseofUnqualifiedDCoverQualifiedDC,morein-depthexaminationoftheirusetrendsmaybeanimportantavenueforfuturestudies.
Despiteactiveresearchandpromisingresultsobtained
from some experimental tools, practical applications ofsemiautomatic metadata generation have been incor-porated into the metadata-creation processes by onlyone-thirdofsurveyparticipants.
The leading criteria in selecting metadata andcontrolledvocabularyschemataarederivedfromcollec-tion-specificconsiderationsof thetypeof resources, thenatureofthecollections,andtheneedsofprimaryusersand communities. Existing technological infrastructure,encompassing digital collection or asset managementsoftware, archival management software, institutionalrepository software, integrated library systems, andunion catalogs also greatly influence the selection pro-
cess.Theskillsandknowledgeofmetadataprofessionalsand the expertise of staff also are significant factors inunderstanding current practices in theuse ofmetadataschemataandcontrolled vocabularies forsubject accessacrossdistributeddigitalrepositoriesandcollections.
The survey responses reveal thatmetadata interop-erability remains a challenge in the current networkedenvironment despite growing awareness of its impor-tance. For half of the survey respondents, exposingmetadatatotheserviceproviders,suchasOAIharvesters,unioncatalogs,andsearchengines,doesnotseemtobeahighprioritybecauseof localfinancial,personnel,andtechnicalconstraints.Locallycreatedmetadataelementsareaddedinmanydigitalrepositoriesandcollectionsin
largeparttomeetlocaldescriptiveneedsandservethetarget user community. While locally createdmetadataelementsaccommodatelocalneeds,theymayalsohindermetadata interoperability acrossdigital repositoriesandcollectionswhenshareablemechanismsarenotinplaceforsuchlocallydefinedextensionsandvariants.
Locallycreatedmetadataguidelinesandapplicationprofiles are essential for metadata creation and qualityassurance;however,mostcustomcontentguidelinesandbest practices (72 percent) are not made publicly avail-able.Thelackofamechanismtofacilitatepublicaccesstolocalapplicationprofilesandmetadataguidelinesmay
environment,theenormousvolumeofonlineanddigitalresources createsan impendingresearchneed to evalu-atetheissuessurroundingthemetadata-creationprocessandtheemploymentofcontrolledvocabularyschemataacross ever-growingdistributeddigital repositories andcollections.Inthispaperweexploredthecurrentstatusof metadata-creation practices through an examinationof survey responsesdrawnmostly from cataloging andmetadataprofessionals(seetables2,3,and4).Theresultsofthestudyindicatethatcurrentmetadatapracticesstilldonotcreateconditionsforinteroperability.
Despite the proliferation of newer metadata sche-mata,thesurveyresponsesshowedthatMARCcurrently
remains the most widely used schema for providingresource description and access in digital repositories,collections, and libraries.The continuingpredominanceofMARCgoeshand-in-handwiththeuseofAACR2astheprimarycontentstandardforselectingandrepresent-ingdatavaluesfordescriptivemetadataelements.LCSHisusedasthedefactocontrolledvocabularyschemaforprovidingsubjectaccessinalltypesofdigitalrepositoriesandcollections,whiledomain-specificsubjectterminolo-giessuchasAATareappliedatsignificantlyhigherratesindigital repositorieshandlingnonprint resourcessuchasimages,culturalobjects,andarchivalmaterials.
Table 8. Sample reasons for not exposing metadata
Not all our metadata conforms to standardsrequired
Not all our metadata is OAI compliant
Lack of expertise and time and money to
develop it
IT restrictions
Security concerns on the part of our information
technology department
Some collections/records are limited access and
not open to the general public
We think that having WorldCat available fortraditional library materials that many libraries
have is a better service to people than having
each library dump our catalog out on the web
Varies by tool and collection, but usually a
restriction on the material, a technical barrier, or
a feeling that for some collections the data is not
yet sufficiently robust still in a work in progress
Survey questin: If yu seected sme, but nt r n in questin 13 [seefigure 6], pese te hy yu d nt expse yur metdt.
7/30/2019 Metadata Creation Practices in Digital
12/13
MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 115
presentedat2003DublinCoreConference:SupportingCommu-nities ofDiscourse andPracticeMetadata Research&Appli-
cations,Seattle,Wash.,Sept.28Oct. 2,2003), http://dcpapers.dublincore.org/ojs/pubs/article/view/732/728(accessedMar.24, 2009); Sarah Currier et al.,QualityAssurance forDigitalLearningObjectRepositories: Issues for theMetadata-creationprocess,ALT-J12(2004):520.
3. JinMa,Metadata,SPECKit298(Washington,D.C.:Asso-ciationofResearchLibraries,2007):13,28.
4. Ibid.,12,2122.5. KarenSmith-Yoshimura,RLG Programs Descriptive Meta-
data Practices Survey Results (Dublin, Ohio:OCLC,2007): 67,http://www.oclc.org/programs/publications/reports/2007-03.pdf (accessed Mar. 24, 2009); Karen Smith-Yoshimura andDiane Cellentani, RLG Programs Descriptive Metadata PracticesSurvey Results: Data Supplement(Dublin,Ohio:OCLC,2007):16,http://www.oclc.org/programs/publications/reports/2007-04
.pdf(accessedMar.24,2009).6. CarolePalmer,OksanaZavalina,andMeganMustafoff,
Trends inMetadata Practices:A Longitudinal Studyof Col-lection Federation (paper presented at the Seventh ACM/IEES-CSJointConferenceonDigitalLibraries,Vancouver,Brit-ish Columbia, Canada, June 1823, 2007), http://hdl.handle.net/2142/8984(accessedMar.24,2009).
7. Smith-Yoshimura,RLG Programs Descriptive Metadata Prac-tices Survey Results, 7; Smith-Yoshimura and Cellentani, RLGPrograms Descriptive Metadata Practices Survey Results,17.
8. Ma,Metadata,12,2223.9. Smith-Yoshimura,RLG Programs Descriptive Metadata Prac-
tices Survey Results, 7; Smith-Yoshimura and Cellentani, RLGPrograms Descriptive Metadata Practices Survey Results,1821.10. KarenMarkeyetal.,Census of Institutional Repositories in
the United States: MIRACLE Project Research Findings(Washing-ton,D.C.:CouncilonLibrary&InformationResources,2007):3,4650, http://www.clir.org/pubs/reports/pub140/pub140.pdf(accessedMar.24,2009).11. YoshimuraandCellentani,RLG Programs Descriptive Meta-
data Practices Survey Results,24.12. University ofHoustonLibrariesInstitutionalRepository
TaskForce,Institutional Repositories,SPECKit292(Washington,D.C.:AssociationofResearchLibraries,2006):18,78.13. Ma,Metadata,13,28.14. Smith-Yoshimura,RLG Programs Descriptive Metadata Prac-
tices Survey Results,9,11;Smith-YoshimuraandCellentani,RLGPrograms Descriptive Metadata Practices Survey Results,2729.15. For the metrics of job responsibilities used to analyze
jobdescriptionsand competenciesof catalogingandmetadata
professionals,seeJung-ranPark,CaimeiLu,andLindaMarion,CatalogingProfessionalsintheDigitalEnvironment:AContentAnalysisofJobDescriptions,Journal of the American Society forInformation Science & Technology60(2009):84457;Jung-ranParkandCaimeiLu,MetadataProfessionals:RolesandCompeten-ciesasReflectedinJobAnnouncements,20032006, Cataloging& Classification Quarterly47(2009):14560.16. Ma,Metadata;Smith-Yoshimura,RLG Programs Descriptive
Metadata Practices Survey Result.17. Jung-ranParkandEricChildress,DublinCoreMetadata
Semantics:AnAnalysisofthePerspectivesofInformationPro-fessionals,Joural of Information Science35,no.6(2009):72739.18. Park,SemanticInteroperability.19. Jung-ranPark,MetadataQualityinDigitalRepositories:
hindercross-checkingforqualitymetadataandcreatingshareablemetadatathatcanbeharvestedforahighlevel
of consistency and interoperability across distributeddigital collections and repositories. Development of asearchableregistryforpubliclyavailablemetadataguide-lineshaspotentialtoenhancemetadatainteroperability.
Aconstraining factor ofthisstudyderives fromtheparticipant population; thus we have not attempted togeneralize the findings of the study. However, resultsindicateapressingneedforacommondatamodelthatis shareableand interoperable across ever-growing dis-tributeddigitalrepositoriesandcollections.Developmentofsuchacommondatamodeldemandsfutureresearchof a practical and interoperable mediation mechanismunderlying local implementationofmetadata elements,semantics,content standards,and controlled vocabular-
ies in aworldwheremetadata can be distributed andsharedwidelybeyondtheimmediatelocalenvironmentand user community. (Other issues such as semiauto-matic metadata application, DC metadata semantics,custommetadata elements, and the professional devel-opment of cataloging and metadata professionals areexplained in-depth in separate studies.)26 For futurestudies,incorporationofotherresearchmethods(suchasfollow-uptelephonesurveysandface-to-facefocusgroupinterviews)couldbeusedtobetterunderstandthecur-rent status of metadata-creation practices. Institutionalvariationalsoneedsbetakenintoaccountinthedesignoffuturestudies.
Acknowledgments
Thisstudyissupportedthroughanearlycareerdevelop-mentresearchawardfromtheInstituteofMuseumandLibraryServices.Wewouldliketoexpressourapprecia-tiontothereviewersfortheirinvaluablecomments.
References
1.Jung-ranPark, SemanticInteroperability andMetadataQuality:AnAnalysisofMetadataItemRecordsofDigitalImage
Collections, Knowledge Organization 33 (2006): 2034; RachelHeery, Metadata Futures: Steps toward Semantic Interoper-ability,inMetadata in Practice,ed.DianeI.HillmanandElaineL. Westbrooks, 25771 (Chicago: ALA, 2004); Jung-ran Park,Semantic Interoperability across Digital Image Collections:APilotStudyonMetadataMapping (paperpresented at theCanadianAssociationforInformationScience2005AnnualCon-ference,London,Ontario,June24,2005),http://www.cais-acsi.ca/proceedings/2005/park_J_2005.pdf(accessedMar.24,2009).
2. JaneBarton,SarahCurrier,andJessieM.N.Hey,BuildingQualityAssuranceintoMetadataCreation:AnAnalysisBasedontheLearningObjectsandE-PrintsCommunitiesofPractice(paper
7/30/2019 Metadata Creation Practices in Digital
13/13
116 iNForMAtioNtecHNoloGY AND liBrAries | septeMBer 2010
A Survey of the Current Stateof theArt, inMetadata andOpenAccessRepositories, ed.MichaelS. BabinecandHollyMercer,specialissue,Cataloging & Classification Quarterly47,no.3/4(2009):21338.20. Ma,Metadata,12,24.TheOCLCRLGsurveyfoundthat
about40percentoftherespondentswereabletogeneratesomemetadata automatically. See Smith-Yoshimura, RLG ProgramsDescriptive Metadata Practices Survey Results, 6;YoshimuraandCellentani,RLG Programs Descriptive Metadata Practices SurveyResults,35.21. Jung-ran Park and Caimei Lu, Application of Semi-
AutomaticMetadataGenerationinLibraries:Types,Tools,andTechniques, Library & Information Science Research 31, no. 4(2009):22531.22. Park,SemanticInteroperability;SarahL.Shreevesetal.,
IsQualityMetadata ShareableMetadata?The ImplicationsofLocalMetadataPractices forFederated Collections (paperpresentedatthe12thNationalConferenceoftheAssociationof
CollegeandResearchLibraries,Apr. 710, 2005,Minneapolis,Minnesota), https://www.ideals.uiuc.edu/handle/2142/145(accessedMar. 24, 2009);AmyS. Jacksonet al.,Dublin CoreMetadataHarvestedthroughOAI-PMH,Journal of Library Meta-data8,no.1(2008):521;LoisMaiChanandMarciaLeiZeng,
Metadata Interoperability and StandardizationA Study ofMethodologyPartI:AchievingInteroperability at theSchemaLevel,D-Lib Magazine 12,no. 6 (2006),http://www.dlib.org/dlib/june06/chan/06chan.html(accessedMar.24,2009);MarciaLeiZeng andLoisMaiChan, Metadata InteroperabilityandStandardizationA Study ofMethodology Part II:AchievingInteroperability at the Record and Repository Levels,D-LibMagazine 12, no. 6 (2006), http://www.dlib.org/dlib/june06/zeng/06zeng.html(accessedMar.24,2009).23. Thomas R. Bruce and Diane I. Hillmann, The Con-
tinuumofMetadataQuality:Defining,Expressing,Exploiting,inMetadata in Practice, ed. Hillman and Westbrooks, 23856;Heery,MetadataFutures;Park,MetadataQualityinDigitalRepositories.24. Jung-ran Park, ed., Metadata Best Practices: Current
IssuesandFutureTrends,specialissue,Journal of Library Meta-data9,no.3/4(2009).25. SeePark,SemanticInteroperability;ParkandChildress,
DublinCoreMetadataSemantics.26. Park andChildress,DublinCoreMetadataSemantics;
ParkandLu,ApplicationofSemi-AutomaticMetadataGenera-tioninLibraries.