Metadata Creation Practices in Digital

7/30/2019 Metadata Creation Practices in Digital

1/13

104 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010

developmentof such amediationmechanism calls foran empirical assessmentofvarious issues surrounding

metadata-creationpractices.The critical issues concerning metadata practices

across distributed digital collections have been rela-tivelyunexplored.Whileexamininglearningobjectsande-printscommunitiesofpractice,Barton,Currier,andHeypointoutthelackofformalinvestigationofthemetadata-creationprocess.2Aswillbe discussedin the followingsection,someresearchershavebeguntoassessthecurrentstate of descriptive practices, metadata schemata, andcontent standards. However, the literature has not yetdeveloped toa pointwhereit affords a comprehensivepicture.Giventhepropagationofmetadataprojects,itisimportanttocontinuetotrackchangesinmetadata-cre-ationpracticeswhiletheyarestillinconstantflux.Sucheffortsareessentialforaddingnewperspectivestodigitallibrary researchand practicesin an environmentwheremetadata best practices are being actively sought afterto aid in thecreation and management ofhigh-qualitydigitalcollections.

This study examines theprevailing current state ofmetadata-creation practices in digital repositories, col-lections,andlibraries,whichmayincludebothdigitizedand born-digital resources. Using nationwide surveydata, mostly drawn from the community of catalog-ingandmetadata professionals,we seek to investigateissues in creatingdescriptivemetadata elements, usingcontrolled vocabularies for subject access, and propa-

gatingmetadata andmetadata guidelines beyondlocalenvironments.Wewilladdressthefollowingresearchquestions:

1. Whichmetadataschema(ta)andcontentstandard(s)areemployed in individualdigital repositories andcollections?

2. Whichcontrolledvocabularyschema(ta)areusedtofacilitatesubjectaccess?

3. Whatcriteriaareappliedin selectingmetadataandcontrolled-vocabularyschema(ta)?

4. To what extent are mechanisms for exposing andsharingmetadata integrated intocurrentmetadata-creationpractices?

Inthisarticle,wefirstreviewrecentstudiesrelatingtocurrentmetadata-creationpracticesacrossdigitalcol-lections.Thenwepresentthesurveymethodemployedtoconductthisstudy,thegeneralcharacteristicsofsurveyparticipants, and thevalidityof the collected data, fol-lowedbythestudyresults.Wereportonhowmetadataand controlled vocabulary schema(ta) are being usedacross institutions, and we present a data analysis ofcurrent metadata-creation practices. The final sectionsummarizesthestudyandpresentssomesuggestionsforfuturestudies.

This study explores the current state of metadata-creation

practices across digital repositories and collections by

using data collected from a nationwide survey of mostly

cataloging and metadata professionals. Results show

that MARC, AACR2, and LCSH are the most widely

used metadata schema, content standard, and subject-

controlled vocabulary, respectively. Dublin Core (DC) is

the second most widely used metadata schema, followed

by EAD, MODS, VRA, and TEI. Qualified DCs wider

use vis--vis Unqualified DC (40.6 percent versus 25.4

percent) is noteworthy. The leading criteria in selecting

metadata and controlled-vocabulary schemata are collec-

tion-specific considerations, such as the types of resources,nature of the collection, and needs of primary users and

communities. Existing technological infrastructure and

staff expertise also are significant factors contributing

to the current use of metadata schemata and controlled

vocabularies for subject access across distributed digital

repositories and collections. Metadata interoperability

remains a major challenge. There is a lack of exposure of

locally created metadata and metadata guidelines beyond

the local environments. Homegrown locally added meta-

data elements may also hinder metadata interoperability

across digital repositories and collections when there is alack of sharable mechanisms for locally defined extensions

and variants.

Metadata is an essential building block in facili-tating effective resource discovery, access, andsharingacross ever-growingdistributed digital

collections. Quality metadata is becoming critical in anetworked world in which metadata interoperabilityis among the top challenges faced by digital libraries.However,thereisnocommondatamodelthatcatalog-ing and metadata professionals can readily reference

as a mediation mechanism during the processes ofdescriptive metadata creation and controlled vocabu-lary schemata application for subject description.1 The

Jng-an pak ([email protected]) is assistntPrfessr, Cege f Infrmtin Science nd Techngy, Drex-e niversity, Phidephi, nd Yj taka ([email protected])is Ctging/Metdt librrin, TCJ librry, The Cege fe Jersey, Eing, e Jersey.

Jung-ran Park andYuji Tosaka

Metadata Creation Practices in Digital

Repositories and Collections: Schemata,

Selection Criteria, and Interoperability


2/13

MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 105

possibleincreaseintheuseoflocallydevelopedschemataasmanyprojectsaddednewtypesofnontextualdigital

objectsthatcouldnotbeadequatelydescribedbyexistingmetadataschemata.6

Thereisalackofresearchconcerningthecurrentuseofcontentstandards;however,itisreasonabletosuspectthat content-standards use exhibits patterns similar tothatofmetadatabecauseoftheiroftencloseassociationwithparticularmetadataschemata.TheOCLCRLGsur-veyrevealsthatAnglo-AmericanCataloguingRules,2ndedition(AACR2)thetraditionalcatalogingrulethathasmostoftenbeenusedinconjunctionwithMARCisthemostwidelyusedcontentstandard(81percent).AACR2isfollowedbyDescribingArchives:AContentStandard(DACS)with42percent;DescriptiveCatalogingofRareMaterials with 33 percent; Archives, Personal Papers,

Manuscripts (APPM) with 25 percent; and CatalogingCulturalObjects(CCO)with21percent.7

Inthesamewayasmetadataschemata,thereappearsto be a concentration of a few controlled vocabularyschemata at research institutions.MasARL survey, forexample, shows that the Library of Congress SubjectHeadings(LCSH)andNameAuthorityFile(NAF)wereused by most survey respondents (96 percent and 88percent,respectively).Thesetwopredominantlyadoptedvocabularies are followed by several domain-specificvocabularies, such as Art and Architecture Thesaurus(AAT), Library of Congress Thesaurus for GraphicalMaterials(TGM)IandII,GettyThesaurusofGeographic

Names(TGN),andtheGettyUnionListofArtistsNames(ULAN),whichwereusedbybetween30percenttomorethan60percentofrespondents.8TheOCLCRLGsurveyreportssimilarresults;however,nearlyhalfoftheOCLCRLGsurveyrespondents(N=9)indicatedthattheyhadalsobuiltandmaintainedoneormorelocallydevelopedthesauri.9

While creating and sharing information about localmetadata implementations is an important step towardincreasedinteroperability, recent studies tendto paintagrimpictureofcurrentlocaldocumentationpracticesandopenaccessibility. Ina nationwidestudyof institutionalrepositories in U.S. academic libraries, Markey et al.foundthatonly61.3percentofthe446surveyparticipants

with operational institutional repositories had imple-mentedpolicies formetadata schemata and authorizedmetadata creators.10 TheOCLCRLG survey also high-lightslimitedcollaborationandsharingofthemetadataguidelinesbothwithinandacrosstheinstitutions.Itfindsthatevenwhentherearemultipleunitscreatingmetadatawithinthesameinstitution,metadata-creationguidelinesoftenareunlikelytobeshared(28percentdonotshare;53percentsometimesshare).11

Amixedresultisreportedontheexposureofmeta-datatooutsideserviceproviders.InanARLsurvey,theUniversityofHoustonLibrariesInstitutionalRepository

Literature Review

As evinced by the principles and practices of bib-liographiccontrolthroughsharedcataloging,successfulresource access and sharing in the networked envi-ronment demands semantic interoperability based onaccurate,complete,andconsistentresourcedescription.TherecentsurveybyMafindsthattheOpenArchivesInitiativeProtocolforMetadataHarvesting(OAI-PMH)and metadata crosswalks have been adopted by 83percent and 73 percent of respondents, respectively.Even though the sample comes only from sixty-eightAssociationofResearchLibraries(ARL)memberlibrar-ies, and the figures thusmay be skewed higher thanthose of the entire population of academic libraries,there is little doubt that interoperability is a critical

issuegiventherapidproliferationofmetadataschematathroughoutdigitallibraries.3

While there is a variety ofmetadata schemata cur-rently in use for organizing digital collections, only afew of them arewidely used indigital repositories. InherARLsurvey,MareportsthattheMARCformatisthemostwidelyusedmetadataschema(91percent),followedby Encoded Archival Description (EAD) (84 percent),UnqualifiedDublinCore(DC)(78percent),andQualifiedDC (67 percent).4 Similarly, a 2007 member survey byOCLCResearchLibrariesGroup (RLG) programsgath-eredinformation from eighteenmajor research librariesand cultural heritage institutions and also found that

MARCisthemostwidelyusedscheme(65percent),fol-lowedbyEAD(43percent),UnqualifiedDC(30percent),andQualifiedDC(29percent).Thedifferentlevelsofusereportedby thesestudies are probably dueto differentsample sizesand compositions, but results nonethelesssuggestthatmetadatauseatresearchinstitutionstendstorelyonasmallnumberofmajorschemata.5

Theremayinfactbemuchgreaterdiversityinmeta-datausepatternswhenthescopeisexpandedtoincludeboth research and nonresearch institutions. Palmer,Zavalina,andMustafoff,forexample,trackedtrendsfrom2003through2006inmetadataselectionandapplicationpracticesatmore than160digitalcollectionsdevelopedthroughInstituteofMuseumandLibraryServicesgrants.

Theyfoundthatdespiteperceivedlimitations,useofDCisthemostwidespread,withmorethanhalfofthedigitalcollectionsusing it alone or in combinationwith otherschemata.MARC rankssecond,with nearly 30percentusingitaloneorincombination.Theauthorsfoundthatthechoiceofmetadataschema is largely influenced bypractices at peer institutions and compatibility with acontentmanagementsystem.Whatismoststriking,how-ever, is the finding that locallydevelopedschemataareusedasoftenasMARC.Thereisadeclineinthepercent-ageofdigitalprojectsusingmultiplemetadataschemata(from53percentto38percent).Yettheauthorsalsosawa


3/13


Method

Theobjectiveoftheresearchreportedinthispaperistoexaminethecurrentstateofmetadata-creationpracticesintermsofthecreationofdescriptivemetadataelements,the use of controlled vocabularies for subject access,and the exposureofmetadata andmetadataguidelinesbeyondlocalenvironments.WeconductedaWebsurveyusing WebSurveyor (now Vovici: http://www.vovici.com). The survey included both structured and open-endedquestions.Itwasextensivelyreviewedbymembersof an advisory boarda group of three experts in thefieldand it was pilot-tested prior to being officiallylaunched.Thesurvey includedmanymultiple-responsequestionsthatcalledforrespondentstocheckallappli-cableanswers.

We recruitedparticipants through survey invitationmessages and subsequent reminders to the electronicmailinglistsofcommunitiesofmetadataandcatalogingprofessionals.Table1 showsthemailing lists employedforthestudy.Wealsosentoutindividualinvitationsanddistributedflyerstoselectedmetadataandcatalogingses-sionsduringthe2008ALAMidwinterMeeting,heldthatyearinPhiladelphia.

The survey attracted a large number of initial par-ticipants(N=1,371),butduringthesixty-twodaysfromAugust6toOctober6,2008,weonlyreceived303com-pletedresponsesviathesurveymanagementsystem.Wesuspect that the high incompletion rate (77.9 percent)

stemsfromthefactthatthesubjectmattermayhavebeenoutsidethescopeofmanyparticipantsjobresponsibili-ties.Thelengthofthesurveymayalsohavebeenafactorintheincompletionrate.

The profiles of respondents job titles (see table 2)

Task Force found that exposingmetadata toOAI-PMHservice providers is an established practice used by

nearly90percentoftherespondents.12MasARLsurveyalso reports the wide adoption of OAI-PMH (83 per-cent).Theseresultsunderscorethevirtualconsensusonthecritical importance ofexposingmetadatatoachieveinteroperabilityandmakelocallycreatedmetadatausefulacrossdistributeddigitalrepositoriesandcollections.13Bycontrast,theOCLCRLGsurveyshowsthatonlyone-tenthoftherespondentsstatedthatallnon-MARCmetadataisexposed to OAI harvesters, while 30 percent indicatedthatonlysomeofitwasavailable.TheprominentthemerevealedbytheOCLCRLGsurveyisaninwardfocusincurrentmetadatapractices,markedbytheuseoflocaltoolstoreachagenerallylocalaudience.14

In summary, recent studies show that the current

practiceofmetadatacreation isproblematic due to thelack of a mechanism for integrating various types ofmetadata schemata, content standards, and controlledvocabularies inways that promote an optimal level ofinteroperabilityacrossdigitalcollectionsandrepositories.Theproblemsareexacerbatedinanenvironmentwheremany institutions lack local documentation delineatingthemetadata-creationprocess.

At the same time, researchers have only recentlybegun studying these issues, and the body of literatureis at an incipient stage. The research that was doneoften targeted different populations, and sample sizeswere different (some very small). In some cases the

literature exhibits contradictory findings about issuessurroundingmetadatapractices,increasingthedifficultyinunderstandingthecurrentstateofmetadatacreation.Thispointsoutthe need for further researchof currentmetadata-creationpractice.

Table 1. Electronic mailing lists for the survey

Electronic Mailing Lists E-mail Address

Autocat

Dublin Core Listserv

Metadata Librarians Listserv

Library and Information Technology Association Listserv

Online Audiovisual Catalogers Electronic Discussion List

Subject Authority Cooperative Program Listserv

Serialist

Text Encoding Initiative Listserv

Electronic Resources in Libraries Listserv

Encoded Archival Description Listserv

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]

[email protected]


4/13


and job responsibilities (see table 3) clearly show thatmost of the individuals who completed the surveyengage professionally in activities directly relevant tothe research objectives, such asdescriptive and subject

cataloging,metadatacreationandmanagement, author-ity control, nonprint and special material cataloging,electronicresourceanddigitalprojectmanagement,andintegratedlibrarysystem(ILS)management.

Althoughthelargestnumberofparticipants(135,or44.6percent)chosetheOthercategoryregardingtheirjobtitle (see table2), it is reasonabletoassume that thevastmajoritycanbecategorizedascatalogingandmeta-dataprofessionals.15MostjobtitlesgivenasOtherareassociatedwithoneoftheprofessionalactivitieslistedintable4.

Thusitisreasonabletoassumethattherespondentsare in an appropriate position to provide first-hand,accurateinformationaboutthecurrentstateofmetadata

creationintheirinstitutions.Concerning the institutional background of partici-

pants, of the 303 survey participants, fewer than half(121,or39.9percent)providedinstitutionalinformation.We believe that this is mostly due to the fact that thequestionwasoptional,followinga suggestionfrom theInstitutionalReviewBoardatDrexelUniversity.Ofthosethatprovidedtheirinstitutionalbackground,themajority(75.2 percent) are from academic libraries, followedbyparticipantsfrompubliclibraries(17.4percent)andfromotherinstitutions(7.4percent).

Table 3. Participants job responsibilities (multiple responses)

Job Responsibilities

Number of

Participants

General cataloging

(e.g., descriptive and subject

cataloging)

171 (56.4%)

Metadata creation and

management

153 (50.5%)

Authority control 147 (48.5%)

Nonprint cataloging

(e.g., microform, music scores,

photographs, video recordings)

133 (43.9%)

Special material cataloging

(e.g., rare books, foreign language

materials, government documents)

126 (41.6%)

Digital project management 101 (33.3%)

Electronic resource management 62 (20.5%)

ILS management 59 (19.5%)

Other 51 (16.8%)

Survey questin: wht re yur primry jb respnsibiities? (pese check tht ppy)

Table 2. Job titles of participants (multiple responses)

Job Titles

Number of

Participants

Other 135 (44.6%)

Cataloger/cataloging librarian/

catalog librarian

99 (32.7%)

Metadata librarian 29 (9.6%)

Catalog & metadata librarian 26 (8.6%)

Head, cataloging 26 (8.6%)

Electronic resources cataloger 17 (5.6%)

Cataloging coordinator 15 (5.0%)

Head, cataloging & metadata

services

15 (5.0%)

N= 227. Survey questin: wht is yur rking jb tite? (pese check tht ppy)

Table 4. Professional activities specified in Other category intable 2

Professional Activities

Number of

Participants

Cataloging & metadata creation 31 (10.2%)

Digital projects management 23 (7.6%)

Technical services 17 (5.6%)

Archiving 16 (5.3%)

Electronic resources and

serials management

6 (2.0%)

Library system administration/

other

6 (2.0%)

N= 99. Survey questin: If yu seected ther, pese specify.


5/13


ItisnoteworthythatuseofQualifiedDCwashigherthan that of Unqualified DC. This result is differentfrom theARL survey and amember survey conducted

Results

In this section, we will present thefindings ofthis studyin the followingthreeareas:(1)metadataandcontrolledvocabulary schemata and metadatatools used, (2) criteria for selectingmetadata and controlled vocabularyschemata, and (3) exposing metadataand metadata guidelines beyond localenvironments.

Madaa and cndVabay shmaa andMadaa t ud

A great variety ofdigital objectswerehandled by the survey participants,asfigure1shows.Themostfrequentlyhan-dledobjectwastext,citedby86.5percentoftherespondents.Aboutthree-fourthsof the respondents described audiovi-sualmaterials(75.2percent),while60.1percentdescribedimagesand51.8per-centdescribedarchivalmaterials.Morethan65percentoftherespondentshan-dledelectronicresources (68.3percent)and digitized resources (66.7 percent),whileapproximatelyhalfhandledborn-

digital resources (52.5 percent). Thetypes ofmaterials described in digitalcollectionswerediverse,encompassingboth digitizedandborn-digitalmateri-als;however,digitizationaccountedfora slightly greater percentage of meta-datacreation.

To handle these diverse digitalobjects, the respondents institutionsemployed a wide range of metadataschemata, as figure 2 shows. Yet therewereafewschematathatwerewidelyused by cataloging and metadata pro-fessionals. Specifically, 84.2 percent

of the respondents institutions usedMARC;DCwasalsopopular,with25.4percentusingUnqualifiedDCand40.6percent using Qualified DC to createmetadata. EAD also was frequentlycited(31.7percent).Inadditiontothesemajortypesofmetadataschemata,therespondentsinstitutionsalsoemployedMetadata Object Description Schema(MODS) (17.8 percent), Visual Resource Association(VRA)Core (14.9 percent),and TextEncoding Initiative(TEI)(12.5percent).

Figure 1. Materials/resources handled (multiple responses)

Survey questin: wht type f mteris/resurces d yu nd yur fe ctgers/metdt ibrr-ins hnde? (pese check tht ppy)

Figure 2. Metadata schemata used (multiple responses)

Survey questin: which metdt schem(s) d yu nd yur fe ctgers/metdt ibrrinsuse? (pese check tht ppy)


6/13


custommetadataelementsderivesfromtheimperativeto accommodate the perceived needs of local collec-

tionsandusers,asindicatedbythetwomostcommonresponses: (1) to reflect the nature of local collec-tions/resources(76.9 percent) and (2)to reflect thecharacteristics of target audience/community of localcollections (58.3 percent). Local conditions were alsocitedfrominstitutionalandtechnicalstandpoints.Manyinstitutions(34.3percent)followexistinglocalpracticesforcatalogingandmetadatacreationwhile other insti-tutions(18.5percent)aremakinghomegrownmetadataadditionsbecauseofconstraintsimposedbytheirlocalsystems.

Table 6 summarizes themost frequentlyusedcon-trolled vocabulary schematas by resource type. By far

themostwidelyusedschemaacrossallresourcetypeswasLCSH.ThepreeminenceofLCSHevincesthecriti-calrolethatitplaysasthedefactoformofcontrolledvocabularyforsubjectdescription.LibraryofCongressClassification (LCC) was the second choice for allresourcetypesotherthanimages,culturalobjects,andarchives.Fordigital collectionsof these resource typesanddigitizedresources,AATwasthesecondmostusedcontrolledvocabulary,afactthatreflectsitspurposeasadomain-specificterminologyusedfordescribingworksof art, architecture, visual resources, material culture,andarchivalmaterials.

While traditional metadata schemata, content stan-dards, and controlled vocabularies such as MARC,

AACR2, and LCSH clearly were preeminent in themajority of the respondents institutions, currentmeta-datacreationindigitalrepositoriesandcollectionsfacesnew challenges from the enormous volume of onlineand digital resources.19 Approximately one-third of therespondents institutions (33.8 percent) were meetingthis challenge with tools for semiautomatic metadatageneration.Yeta majorityof respondents(52.5percent)indicated that their institutions did not use any suchtoolsformetadatacreationandmanagement.ThisresultseemstocontrastwithMasfindingthatautomaticmeta-data generation was used in some capacity in nearly

by OCLC RLG programs (as described in LiteratureReviewonpage105).16Inthesesurveys,UnqualifiedDC

wasmorefrequentlycitedthanQualifiedDC.Onepos-sibleexplanationofthislessfrequentuseofUnqualifiedDCmay lie inthe limitationsofUnqualifiedDCmeta-data semantics. Survey respondents also reported onproblemsusingDCmetadata,whichweremostlycausedby semantic ambiguities and semantic overlaps of cer-tainDCmetadata elements.17 Limitationsandissues ofUnqualified DC metadata semantics are discussed indepthinParksstudy. 18Inlightoftheseresults,examin-ingtrendsofQualifiedDCuseinafuturestudywouldbeinteresting.

Despitethewidevarietyofschematareportedinuse,thereseemedtobeaninclinationtouseonlyoneortwometadataschemataforresourcedescription.Asshownin

table5,themajorityoftherespondentsinstitutions(53.6percent) used only one schema for metadata creation,whileapproximately37percentusedtwoorthreesche-mata (26.2 percent and 10.3 percent, respectively). Theinstitutions usingmore than three schemataduring themetadata-creationprocessescomprisedonly9.9 percentoftherespondents.

Turningtocontentstandards(seefigure3),wefoundthatAACR2was themostwidelyused standard, indi-catedby84.5percentofrespondents.Thishighpercentageclearly reflects the continuing preeminence of MARCasthemetadataschemaofchoicefordigitalcollections.DC applicationprofiles also showeda largeuser base,

indicated bymore than one-third of respondents (37.0percent).Morethanonequarteroftherespondents(28.4percent)used EADapplicationguidelinesas developedby the Society of AmericanArchivists and the LibraryofCongress,while 10.6percentused RLGBestPracticeGuidelines for Encoded Archival Description (2002).Aboutonequarter(25.7percent)indicatedDACSastheircontentstandard.

Homegrownstandardsandguidelinesarelocalappli-cationprofilesthatclarifyexistingcontentstandardsandspecify howvalues for metadata elements are selectedandrepresentedtomeettherequirementsofaparticularcontext.Asshownintheresultsonmetadataschemata,itisnoteworthythathomegrowncontentstandardsand

guidelinesconstitutedoneofthemajorchoicesofpartici-pants,indicatedbymorethanone-fifthoftheinstitutions(22.1 percent). Almost two-fifths of the survey partici-pants(38percent)alsoreportedthattheyaddhomegrownmetadataelementstoagivenmetadataschema.Slightlylessthanhalfoftheparticipants(47.2percent)indicatedotherwise.

The local practice of creating homegrown contentguidelinesandmetadataelementsduringthemetadata-creationprocessdeserves a separate study; this studyonly briefly touches on the basis for locally addedcustom metadata elements. The motivation to create

Table 5. Number of metadata schemata in use

Number of Metadata

Schemata in Use Number of Participants

1 141 (53.6%)

2 69 (26.2%)

3 27 (10.3%)

4 or more 26 (9.9%)

N=263. Survey questin: which metdt schem(s) d yu nd yur fectgers/metdt ibrrins use the mst? (pese check tht ppy)


7/13


ca f sng Madaa and cndVabay shmaa

Whatare the factors thathaveshaped the current stateofmetadata-creationpracticesreported thusfar? In this

section,we turn our attention to constraints that affectdecisionmakingatinstitutionsintheselectionofmeta-data and controlled vocabulary schemata for subjectdescription.

Figure4presentsthepercentageofdifferentmetadataschemata selection criteria described by survey par-ticipants. First, collection-specific considerations clearlyplayedamajorroleintheselection.Themostfrequentlycitedreasonwastypesofresources(60.4percent).Thisresponsereflectsthefactthatalargenumberofmetadataschematahave been developed, oftenwithwide varia-tion in content and format, to better handle particular

two-thirdsofARLlibraries.20

Because semiautomatic metadata application isreported in-depth in a separate study, we only brieflysketch the topic here.21 The semiautomatic metadataapplicationtoolsusedinthe respondentsdigitalreposi-

toriesandcollectionscanbeclassifiedintofivecategoriesof common characteristics: (1)metadata format conver-sion, (2) templates and editors for metadata creation,(3) automatic metadata creation, (4) library system forbibliographic and authority control, and (5) metadataharvestingandimportingtools.

As table 7 illustrates, among those institutions thathave introduced semiautomatic metadata generationtools, metadata format conversion (38.6 percent) andtemplates and editors for metadata creation (27 per-cent)arethetwomostfrequentlycitedtools.

Figure 3. Content standards used (multiple responses)

Survey questin: wht cntent stndrd(s) nd/r guideines d yu nd yur fe ctgers/metdt ibrrins use?(pese check tht ppy)


8/13


job responsibility, expertiseof staff (44.2percent)andintegrated library system (39.9 percent) appeared tohighlightthekeyrolethatMARCcontinuestoplayinthemetadata-creationprocessfordigitalcollections(seefig-ure2).Budgetalsoappearedtobeanimportantfactorinmetadataselection(17.2percent),showingthatfundinglevelsplayedaconsiderableroleinmetadatadecisions.

typesofinformationresources.Theprimaryfactorinselectingmetadataschemataistheirsuit-ability fordescribingthemostcommontypeofresourceshan-dledbythesurveyparticipants.

Thesecondandthirdmostcommoncriteria,targetusers/audience (49.8 percent) andsubjectmatters of resources

(46.9 percent), also seem toreflect how domain-specificmetadataschemataareapplied.Inmaking decisions onmeta-data schemata, respondentsweighed materials in particular subject areas (e.g., art,education, and geography) and the needs of particu-lar communitiesofpractice as their primary users andaudiences.

However, existing technological infrastructure andresource constraints also determine options. Given theprominence of general library cataloging as a primary

Table 6. The most frequently used controlled vocabulary schema(s) by resource type (multiple responses)

LCSH LCC DDC AAT TGM ULAN TGN Other

Text79.5%

(241)

35.6%

(108)

16.8%

(51)

10.2%

(31)

6.9%

(21)

3.6%

(11)

5.0%

(15)

14.2%

(43)

Audiovisual materials67.3%

(204)

25.1%

(76)

12.9%

(39)

9.2%

(28)

8.6%

(26)

4.0%

(12)

5.0%

(15)

14.5%

(44)

Cartographic materials44.9%

(136)

17.5%

(53)

7.3%

(22)

5.0%

(15)

4.3%

(13)

1.3%

(4)

4.3%

(13)

6.3%

(19)

Images43.2%

(131)

11.9%

(36)

5.6%

(17)

25.7%

(78)

20.1%

(61)

9.9%

(30)

10.6%

(32)

11.2%

(34)

Cultural objects

(e.g., museum objects)

20.1%

(61)

7.3%

(22)

4.3%

(13)

13.2%

(40)

6.3%

(19)

4.6%

(14)

3.0%

(9)

7.9%

(24)

Archives44.2%

(134)

11.6%

(35)

6.3%

(19)

11.9%

(36)

6.6%

(20)

3.0%

(9)

2.6%

(8)

12.2%

(37)

Electronic resources60.7%

(184)

23.4%

(71)

8.6%

(26)

5.3%

(16)

3.6%

(11)

1.7%

(5)

3.0%

(9)

14.2%

(43)

Digitized resources51.8%

(157)

15.5%

(47)

5.0%

(15)

15.5%

(47)

10.2%

(31)

6.6%

(20)

7.6%

(23)

15.2%

(46)

Born-digital resources43.9%

(133)

13.5%

(41)

5.6%

(17)

8.3%

(25)

7.3%

(22)

4.3%

(13)

4.6%

(14)

13.9%

(42)

Survey questin: which cntred vcbury schem(s) d yu nd yur fe ctgers/metdt ibrrins use mst? (Pese check tht ppy)

Table 7. Types of semi-automatic metadata generation tools in use

Types Response Rating

Metadata format conversion 38 (38.6%)

Templates and editors for metadata creation 26 (27.0%)

Automatic metadata creation 16 (16.7%)

Library system for bibliographic and authority control 15 (15.6%)

Metadata harvesting and importing tools 8 (8.3%)

N= 96. Survey questin: Pese describe the (semi)utmtic metdt genertin ts yu use.


9/13


the software used by their insti-tutionsi.e., integrated library

system (39.9 percent), digitalcollection or asset managementsoftware (25.4 percent), institu-tional repository software (19.8percent), union catalogs (14.9percent), and archival manage-mentsoftware(5.6percent)asareason for their selection ofmeta-dataschemata.Metadatadecisionsthus seem to be drivenby avari-etyoflocaltechnologychoicesfordevelopingdigitalrepositoriesandcollections.

As shown in figure 5, similar

patterns areobservedwith regardto selection criteria for controlledvocabulary schemata. Three ofthe four selection criteria receiv-ing majority responsestargetusers/audience (55.4 percent),type of resources (54.8 percent),andnatureofthecollection(50.2percent)suggest that controlledvocabulary decisions are influ-encedprimarilybythesubstantivepurpose and scope of controlledvocabularies for local collections.

A major consideration seems tobe whether particular controlledvocabularies are suitable for rep-resenting standard data values toimprove access and retrieval fortargetaudiences.

Metadata standards, anotherselection criteria frequently citedin the survey (54.1 percent),reflects how some domain-spe-cific metadata schemata tend todictate the use of particular con-trolled vocabularies. At the sametime, the results also suggest that

resources and technological infra-structure available to institutionswere also important reasons fortheir selections. Expertise ofstaff (38.3 percent) seems to bea straightforward practical rea-son: the application of controlledvocabularies is highly dependenton the width and depth of staffexpertiseavailable.Likewise,when

implementing controlled vocabularies in the digitalenvironment, some institutions also took into account

Atthesametime,itisnoteworthythatwhileresponseswere not mutually exclusive, many respondents cited

Figure 4. Criteria for selecting metadata schemata (multiple responses)

Questin: which criteri ere ppied in seecting metdt schemt? (pese check tht ppy)

Figure 5. Criteria for selecting controlled vocabulary schemata (multiple responses)

Questin: which criteri re ppied in seecting cntred vcbury schemt? (Pese check tht ppy)


10/13


forsearchengines and63.2 percentfor OAI harvesters), a result that

maybeinterpretedasatendencytocreate metadata primarily for localaudiences.

Why do many institutions failtomake their locally createdmeta-data available to other institutionsdespite wide consensus on theimportance of metadata sharing inanetworkedworld?Responsesfromthose institutions exposingnoneornot all of theirmetadata (see table8) reveal that financial, personnel,and technical issues are major hin-drances inpromoting the exposure

ofmetadataoutside the immediatelocalenvironment.Someinstitutionsare notconfident that their currentmetadata practices are able to sat-isfy the technical requirements forproducingstandards-basedinterop-erable metadata. Another reasonfrequently mentioned is copyrightconcernsaboutlimited-accessmate-rials. Yet some respondents simplydo not see any merit to exposingtheir item-levelmetadata, citing its

relativeuselessnessfor resourcediscoveryoutside their

institutions.As stated earlier, the practice of adding home-grownmetadataelementsseemscommonamongmanyinstitutions. While locally created metadata elementsaccommodate local needs and requirements, they mayalso hinder metadata interoperability across digitalrepositories and collections if mechanisms for findinginformation about such locally defined extensions andvariants are absent. Homegrown metadata guidelinesdocumentlocaldatamodelsandfunctionasanessentialmechanismformetadatacreationandqualityassurancewithin and across digital repositories and collections.23

In this regard, it is essential toexamine locally createdmetadata guidelines and best practices.24 However, the

resultsofthesurveyanalysisevincethatthevastmajorityofinstitutions(72.0percent)providednopublicaccesstolocalapplicationprofilesontheirwebsiteswhileonly19.6percentofrespondentsinstitutionsmadethemavailableonlinetothepublic.

Conclusion

Metadataplaysanessentialroleinmanaging,organizing,andsearchingforinformationresources.Inthenetworked

existing system features for authority control and con-

trolledvocabularysearching,asexhibitedby17.2percentofresponsesfordigitalcollection/orassetmanagementsoftware.

exng Madaa and Madaa Gdnbynd la envnmn

Metadata interoperability across distributed digitalrepositories and collections is fast becoming a majorissue.22Theproliferationofopen-sourceandcommercialdigitallibraryplatformsusingavarietyofmetadatasche-matahasimplicationsonthelibrariansabilitytocreateshareable and interoperable metadata beyond the localenvironment.Towhatextentaremechanismsforsharing

metadata integrated into the current metadata-creationpracticesdescribedbytherespondents?

Figure 6 summarizes the responses concerning theusesofthreemajormechanismsformetadataexposure.Approximatelyhalfofrespondentsexposedatleastsomeof their metadata to search engines (52.8 percent) andunion catalogs such asOCLCWorldCat (50.6 percent).More than one-third of the respondents exposed all orsome of their metadata through OAI harvesters (36.8percent).About half ormore of the respondents eitherdidnotexposetheirmetadataorwerenotsureaboutthecurrentoperationsat theirinstitutions(e.g.,47.2percent

Figure 6. Mechanism to expose metadata (multiple responses)

Survey questin: D yu/yur rgniztin expse yur metdt t oaI (open archives Inititive)hrvesters, unin ctgs r serch engines?


11/13


TheDCmetadataschemaisthesecondmostwidelyemployed according to this study, with Qualified DC

used by 40.6 percent of responding institutions andUnqualifiedDCusedby25.4percent.EADisanotherfre-quentlycitedschema(31.7percent),followedbyMODS(17.8percent),VRA(14.9percent),andTEI(12.5percent).AtrendofQualifiedDCbeingused(40.6percent)moreoftenthanUnqualifiedDC(25.4percent)isnoteworthy.One possibleexplanationof this trendmay bederivedfromthefactthatsemanticambiguitiesandoverlapsinsomeoftheUnqualifiedDCelementsinterferewithuseinresourcedescription.25Giventheearliersurveysreport-ingthehigheruseofUnqualifiedDCoverQualifiedDC,morein-depthexaminationoftheirusetrendsmaybeanimportantavenueforfuturestudies.

Despiteactiveresearchandpromisingresultsobtained

from some experimental tools, practical applications ofsemiautomatic metadata generation have been incor-porated into the metadata-creation processes by onlyone-thirdofsurveyparticipants.

The leading criteria in selecting metadata andcontrolledvocabularyschemataarederivedfromcollec-tion-specificconsiderationsof thetypeof resources, thenatureofthecollections,andtheneedsofprimaryusersand communities. Existing technological infrastructure,encompassing digital collection or asset managementsoftware, archival management software, institutionalrepository software, integrated library systems, andunion catalogs also greatly influence the selection pro-

cess.Theskillsandknowledgeofmetadataprofessionalsand the expertise of staff also are significant factors inunderstanding current practices in theuse ofmetadataschemataandcontrolled vocabularies forsubject accessacrossdistributeddigitalrepositoriesandcollections.

The survey responses reveal thatmetadata interop-erability remains a challenge in the current networkedenvironment despite growing awareness of its impor-tance. For half of the survey respondents, exposingmetadatatotheserviceproviders,suchasOAIharvesters,unioncatalogs,andsearchengines,doesnotseemtobeahighprioritybecauseof localfinancial,personnel,andtechnicalconstraints.Locallycreatedmetadataelementsareaddedinmanydigitalrepositoriesandcollectionsin

largeparttomeetlocaldescriptiveneedsandservethetarget user community. While locally createdmetadataelementsaccommodatelocalneeds,theymayalsohindermetadata interoperability acrossdigital repositoriesandcollectionswhenshareablemechanismsarenotinplaceforsuchlocallydefinedextensionsandvariants.

Locallycreatedmetadataguidelinesandapplicationprofiles are essential for metadata creation and qualityassurance;however,mostcustomcontentguidelinesandbest practices (72 percent) are not made publicly avail-able.Thelackofamechanismtofacilitatepublicaccesstolocalapplicationprofilesandmetadataguidelinesmay

environment,theenormousvolumeofonlineanddigitalresources createsan impendingresearchneed to evalu-atetheissuessurroundingthemetadata-creationprocessandtheemploymentofcontrolledvocabularyschemataacross ever-growingdistributeddigital repositories andcollections.Inthispaperweexploredthecurrentstatusof metadata-creation practices through an examinationof survey responsesdrawnmostly from cataloging andmetadataprofessionals(seetables2,3,and4).Theresultsofthestudyindicatethatcurrentmetadatapracticesstilldonotcreateconditionsforinteroperability.

Despite the proliferation of newer metadata sche-mata,thesurveyresponsesshowedthatMARCcurrently

remains the most widely used schema for providingresource description and access in digital repositories,collections, and libraries.The continuingpredominanceofMARCgoeshand-in-handwiththeuseofAACR2astheprimarycontentstandardforselectingandrepresent-ingdatavaluesfordescriptivemetadataelements.LCSHisusedasthedefactocontrolledvocabularyschemaforprovidingsubjectaccessinalltypesofdigitalrepositoriesandcollections,whiledomain-specificsubjectterminolo-giessuchasAATareappliedatsignificantlyhigherratesindigital repositorieshandlingnonprint resourcessuchasimages,culturalobjects,andarchivalmaterials.

Table 8. Sample reasons for not exposing metadata

Not all our metadata conforms to standardsrequired

Not all our metadata is OAI compliant

Lack of expertise and time and money to

develop it

IT restrictions

Security concerns on the part of our information

technology department

Some collections/records are limited access and

not open to the general public

We think that having WorldCat available fortraditional library materials that many libraries

have is a better service to people than having

each library dump our catalog out on the web

Varies by tool and collection, but usually a

restriction on the material, a technical barrier, or

a feeling that for some collections the data is not

yet sufficiently robust still in a work in progress

Survey questin: If yu seected sme, but nt r n in questin 13 [seefigure 6], pese te hy yu d nt expse yur metdt.


12/13


presentedat2003DublinCoreConference:SupportingCommu-nities ofDiscourse andPracticeMetadata Research&Appli-

cations,Seattle,Wash.,Sept.28Oct. 2,2003), http://dcpapers.dublincore.org/ojs/pubs/article/view/732/728(accessedMar.24, 2009); Sarah Currier et al.,QualityAssurance forDigitalLearningObjectRepositories: Issues for theMetadata-creationprocess,ALT-J12(2004):520.

3. JinMa,Metadata,SPECKit298(Washington,D.C.:Asso-ciationofResearchLibraries,2007):13,28.

4. Ibid.,12,2122.5. KarenSmith-Yoshimura,RLG Programs Descriptive Meta-

data Practices Survey Results (Dublin, Ohio:OCLC,2007): 67,http://www.oclc.org/programs/publications/reports/2007-03.pdf (accessed Mar. 24, 2009); Karen Smith-Yoshimura andDiane Cellentani, RLG Programs Descriptive Metadata PracticesSurvey Results: Data Supplement(Dublin,Ohio:OCLC,2007):16,http://www.oclc.org/programs/publications/reports/2007-04

.pdf(accessedMar.24,2009).6. CarolePalmer,OksanaZavalina,andMeganMustafoff,

Trends inMetadata Practices:A Longitudinal Studyof Col-lection Federation (paper presented at the Seventh ACM/IEES-CSJointConferenceonDigitalLibraries,Vancouver,Brit-ish Columbia, Canada, June 1823, 2007), http://hdl.handle.net/2142/8984(accessedMar.24,2009).

7. Smith-Yoshimura,RLG Programs Descriptive Metadata Prac-tices Survey Results, 7; Smith-Yoshimura and Cellentani, RLGPrograms Descriptive Metadata Practices Survey Results,17.

8. Ma,Metadata,12,2223.9. Smith-Yoshimura,RLG Programs Descriptive Metadata Prac-

tices Survey Results, 7; Smith-Yoshimura and Cellentani, RLGPrograms Descriptive Metadata Practices Survey Results,1821.10. KarenMarkeyetal.,Census of Institutional Repositories in

the United States: MIRACLE Project Research Findings(Washing-ton,D.C.:CouncilonLibrary&InformationResources,2007):3,4650, http://www.clir.org/pubs/reports/pub140/pub140.pdf(accessedMar.24,2009).11. YoshimuraandCellentani,RLG Programs Descriptive Meta-

data Practices Survey Results,24.12. University ofHoustonLibrariesInstitutionalRepository

TaskForce,Institutional Repositories,SPECKit292(Washington,D.C.:AssociationofResearchLibraries,2006):18,78.13. Ma,Metadata,13,28.14. Smith-Yoshimura,RLG Programs Descriptive Metadata Prac-

tices Survey Results,9,11;Smith-YoshimuraandCellentani,RLGPrograms Descriptive Metadata Practices Survey Results,2729.15. For the metrics of job responsibilities used to analyze

jobdescriptionsand competenciesof catalogingandmetadata

professionals,seeJung-ranPark,CaimeiLu,andLindaMarion,CatalogingProfessionalsintheDigitalEnvironment:AContentAnalysisofJobDescriptions,Journal of the American Society forInformation Science & Technology60(2009):84457;Jung-ranParkandCaimeiLu,MetadataProfessionals:RolesandCompeten-ciesasReflectedinJobAnnouncements,20032006, Cataloging& Classification Quarterly47(2009):14560.16. Ma,Metadata;Smith-Yoshimura,RLG Programs Descriptive

Metadata Practices Survey Result.17. Jung-ranParkandEricChildress,DublinCoreMetadata

Semantics:AnAnalysisofthePerspectivesofInformationPro-fessionals,Joural of Information Science35,no.6(2009):72739.18. Park,SemanticInteroperability.19. Jung-ranPark,MetadataQualityinDigitalRepositories:

hindercross-checkingforqualitymetadataandcreatingshareablemetadatathatcanbeharvestedforahighlevel

of consistency and interoperability across distributeddigital collections and repositories. Development of asearchableregistryforpubliclyavailablemetadataguide-lineshaspotentialtoenhancemetadatainteroperability.

Aconstraining factor ofthisstudyderives fromtheparticipant population; thus we have not attempted togeneralize the findings of the study. However, resultsindicateapressingneedforacommondatamodelthatis shareableand interoperable across ever-growing dis-tributeddigitalrepositoriesandcollections.Developmentofsuchacommondatamodeldemandsfutureresearchof a practical and interoperable mediation mechanismunderlying local implementationofmetadata elements,semantics,content standards,and controlled vocabular-

ies in aworldwheremetadata can be distributed andsharedwidelybeyondtheimmediatelocalenvironmentand user community. (Other issues such as semiauto-matic metadata application, DC metadata semantics,custommetadata elements, and the professional devel-opment of cataloging and metadata professionals areexplained in-depth in separate studies.)26 For futurestudies,incorporationofotherresearchmethods(suchasfollow-uptelephonesurveysandface-to-facefocusgroupinterviews)couldbeusedtobetterunderstandthecur-rent status of metadata-creation practices. Institutionalvariationalsoneedsbetakenintoaccountinthedesignoffuturestudies.

Acknowledgments

Thisstudyissupportedthroughanearlycareerdevelop-mentresearchawardfromtheInstituteofMuseumandLibraryServices.Wewouldliketoexpressourapprecia-tiontothereviewersfortheirinvaluablecomments.

References

1.Jung-ranPark, SemanticInteroperability andMetadataQuality:AnAnalysisofMetadataItemRecordsofDigitalImage

Collections, Knowledge Organization 33 (2006): 2034; RachelHeery, Metadata Futures: Steps toward Semantic Interoper-ability,inMetadata in Practice,ed.DianeI.HillmanandElaineL. Westbrooks, 25771 (Chicago: ALA, 2004); Jung-ran Park,Semantic Interoperability across Digital Image Collections:APilotStudyonMetadataMapping (paperpresented at theCanadianAssociationforInformationScience2005AnnualCon-ference,London,Ontario,June24,2005),http://www.cais-acsi.ca/proceedings/2005/park_J_2005.pdf(accessedMar.24,2009).

2. JaneBarton,SarahCurrier,andJessieM.N.Hey,BuildingQualityAssuranceintoMetadataCreation:AnAnalysisBasedontheLearningObjectsandE-PrintsCommunitiesofPractice(paper


13/13

116 iNForMAtioNtecHNoloGY AND liBrAries | septeMBer 2010

A Survey of the Current Stateof theArt, inMetadata andOpenAccessRepositories, ed.MichaelS. BabinecandHollyMercer,specialissue,Cataloging & Classification Quarterly47,no.3/4(2009):21338.20. Ma,Metadata,12,24.TheOCLCRLGsurveyfoundthat

about40percentoftherespondentswereabletogeneratesomemetadata automatically. See Smith-Yoshimura, RLG ProgramsDescriptive Metadata Practices Survey Results, 6;YoshimuraandCellentani,RLG Programs Descriptive Metadata Practices SurveyResults,35.21. Jung-ran Park and Caimei Lu, Application of Semi-

AutomaticMetadataGenerationinLibraries:Types,Tools,andTechniques, Library & Information Science Research 31, no. 4(2009):22531.22. Park,SemanticInteroperability;SarahL.Shreevesetal.,

IsQualityMetadata ShareableMetadata?The ImplicationsofLocalMetadataPractices forFederated Collections (paperpresentedatthe12thNationalConferenceoftheAssociationof

CollegeandResearchLibraries,Apr. 710, 2005,Minneapolis,Minnesota), https://www.ideals.uiuc.edu/handle/2142/145(accessedMar. 24, 2009);AmyS. Jacksonet al.,Dublin CoreMetadataHarvestedthroughOAI-PMH,Journal of Library Meta-data8,no.1(2008):521;LoisMaiChanandMarciaLeiZeng,

Metadata Interoperability and StandardizationA Study ofMethodologyPartI:AchievingInteroperability at theSchemaLevel,D-Lib Magazine 12,no. 6 (2006),http://www.dlib.org/dlib/june06/chan/06chan.html(accessedMar.24,2009);MarciaLeiZeng andLoisMaiChan, Metadata InteroperabilityandStandardizationA Study ofMethodology Part II:AchievingInteroperability at the Record and Repository Levels,D-LibMagazine 12, no. 6 (2006), http://www.dlib.org/dlib/june06/zeng/06zeng.html(accessedMar.24,2009).23. Thomas R. Bruce and Diane I. Hillmann, The Con-

tinuumofMetadataQuality:Defining,Expressing,Exploiting,inMetadata in Practice, ed. Hillman and Westbrooks, 23856;Heery,MetadataFutures;Park,MetadataQualityinDigitalRepositories.24. Jung-ran Park, ed., Metadata Best Practices: Current

IssuesandFutureTrends,specialissue,Journal of Library Meta-data9,no.3/4(2009).25. SeePark,SemanticInteroperability;ParkandChildress,

DublinCoreMetadataSemantics.26. Park andChildress,DublinCoreMetadataSemantics;

ParkandLu,ApplicationofSemi-AutomaticMetadataGenera-tioninLibraries.

Documents

Metadata Creation Practices in Digital