Discovering and Describing DPLA Collections
Students: Madhura Parikh, Zhang Zhang
Karen Wickett, Unmil P. Karadkar
School of Information
The University of Texas at Austin
in the portal
in the data
so what?
• Collection-level entities and collection descriptions can support a range of functions:
– representing data providers
– providing context for items
– managing and presenting search results
– assessing relevance and accessibility
– supporting the contribution of collections by users
Modeling Cultural Collections for Digital Aggregation and Exchange Environments. CIRSS Technical Report 201310-1, University of Illinois at Urbana-Champaign.
Approach
• Based on collection/item propagation rules
– link item-level attribute/value pairs to collection-level attribute/value pairs
  • collection attributes from a collection-level schema
  • item attributes from DPLA’s Metadata Application Profile
– in general, allow reasoning in either direction
– we are experimenting with building descriptions of collections, using:
  • descriptions of items
  • collection membership
  • a guiding propagation rule
collection-level properties
• Collection title
• Collection description
• Begin date
• End date
• Geographic boundary
• Places
• Subjects
• Formats
• Languages
• Genres
• Rights
Approach
• Take data from the DPLA based on collection membership
– e.g. all items in the “Minnesota Newspapers Collection”
• Pick a target collection-level field
– e.g. dc:subject
• Identify source data fields in item records
– e.g. dc:subject and dc:description
Approach (cont’d)
• Aggregate item data from across the collection
– e.g. all unique subject strings, along with frequency counts
• Derive collection-level values for the selected attribute
– e.g. five subjects for the Minnesota Newspapers Collection
• Add attribute/value pair to collection record
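The aggregate, derive, and add steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the function names, the `subject` key, and the sample items are all invented for the example, and "most frequent values" is only one possible derivation rule.

```python
from collections import Counter

def aggregate_values(items, source_fields):
    """Count every unique string found in the source fields across all items."""
    counts = Counter()
    for item in items:
        for field in source_fields:
            value = item.get(field, [])
            # Accept either a single string or a list of strings.
            values = [value] if isinstance(value, str) else value
            counts.update(v.strip() for v in values if v and v.strip())
    return counts

def derive_collection_values(counts, limit=5):
    """Assumed rule: keep the `limit` most frequent values."""
    return [value for value, _ in counts.most_common(limit)]

# Invented sample items standing in for records selected by collection membership.
items = [
    {"subject": ["Labor unions", "Strikes"]},
    {"subject": ["Labor unions", "Minneapolis"]},
    {"subject": "Strikes"},
    {"subject": ["Labor unions"]},
]

counts = aggregate_values(items, ["subject"])
collection_record = {"title": "Minnesota Newspapers Collection"}
# Add the derived attribute/value pair to the collection record.
collection_record["subject"] = derive_collection_values(counts, limit=2)
```

Passing several source fields (for example subject and description) to `aggregate_values` pools their strings into one frequency table, which mirrors the idea of feeding one target field from more than one item-level field.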
WARNING: The following presentation contains strong,
graphic imagery.
Collection-level Metadata Generation
Support for: Portal users, Humanities scholars
Architecture
[Diagram: a collection (C) and its items (I). Stage labels: Extract, Aggregate, Derive, Populate, Enrich. Aggregated date, subject, and spatial values feed a Date Deriver, a Subject Deriver, and a Spatial Deriver, whose outputs are the collection-level values that populate the Collection Description.]
ArtStor
Date Format Variations
Date Processing
Begin and end dates
imperfect but consistent
Inside the Date Deriver
[Diagram: aggregated date values (D) enter a Parser Factory, which yields years with known formats. A Rule Factory then applies a begin-year rule, an end-year rule, and additional rules to produce the collection date values.]
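One way to read the parser and rule stages is sketched below: a list of small parsers, each handling one date format, followed by rules that take the earliest and latest year. The parser names, the regular expressions, and the sample strings are assumptions made for illustration; the formats the project actually handles are not listed on the slides.

```python
import re

# Assumed range of plausible years: 1500-2099.
YEAR = r"(1[5-9]\d{2}|20\d{2})"

def parse_year_range(text):
    """Formats such as '1867-1870' or '1867/1870'."""
    m = re.fullmatch(rf"\s*{YEAR}\s*[-/]\s*{YEAR}\s*", text)
    return [int(m.group(1)), int(m.group(2))] if m else []

def parse_iso_date(text):
    """Formats such as '1934-05' or '1934-05-01'."""
    m = re.fullmatch(rf"\s*{YEAR}-\d{{2}}(-\d{{2}})?\s*", text)
    return [int(m.group(1))] if m else []

def parse_embedded_year(text):
    """Fallback: any standalone four-digit year inside free text."""
    return [int(y) for y in re.findall(rf"\b{YEAR}\b", text)]

# Parsers are tried in order; the first one that recognises the string wins.
PARSERS = [parse_year_range, parse_iso_date, parse_embedded_year]

def extract_years(text):
    for parser in PARSERS:
        years = parser(text)
        if years:
            return years
    return []  # unknown format: contributes nothing

def derive_date_range(date_strings):
    """Begin-year rule = earliest year seen; end-year rule = latest year seen."""
    years = [y for s in date_strings for y in extract_years(s)]
    if not years:
        return None
    return {"begin": min(years), "end": max(years)}

# Invented sample strings showing the kind of variation involved.
sample = ["1867-1870", "1934-05-01", "May 5, 1921", "3/30/2009", "undated"]
result = derive_date_range(sample)
```

Strings that no parser recognises are skipped rather than guessed at, which fits the "imperfect but consistent" goal: the derived range may miss some items, but it is computed the same way for every collection.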
Subject - Phrases
Commonalities and Differences
Thresholded Boundaries
Variants
Ojibwe-Ojibway
GLBT-LGBT
Hierarchies
Labor Unions
Minnesota
Minneapolis
Newspapers
Labor Unions
Organizing
Automatic? Descriptions
Inside the Subject Deriver
[Diagram: aggregated subject, title, and description values each pass through a Tokenizer supplied by a Parser Factory. A Rule Factory then applies a threshold detector, a cluster generator, a WordNet analyzer, and other rules to produce the collection subject values.]
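A rough sketch of two of the rule types named above, a threshold detector and a cluster generator for spelling variants, is given below. The function names, the cutoff values, and the use of `difflib` string similarity are all assumptions for illustration; the slides do not say how the project implements these rules, and the WordNet analyzer is left out.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

def tokenize(text):
    """Split a subject, title, or description string into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def threshold_detect(item_texts, min_share=0.5):
    """Keep tokens that occur in at least `min_share` of the items."""
    doc_freq = Counter()
    for text in item_texts:
        doc_freq.update(set(tokenize(text)))  # count each token once per item
    cutoff = min_share * len(item_texts)
    return {tok: n for tok, n in doc_freq.items() if n >= cutoff}

def cluster_variants(tokens, min_ratio=0.8):
    """Group near-identical spellings by character similarity to a cluster's first member."""
    clusters = []
    for tok in sorted(tokens):
        for cluster in clusters:
            if SequenceMatcher(None, cluster[0], tok).ratio() >= min_ratio:
                cluster.append(tok)
                break
        else:
            clusters.append([tok])
    return clusters

# Invented sample strings, loosely modelled on the variants shown in the slides.
subjects = ["Ojibwe Indians", "Ojibway Indians missions", "Ojibwa Indians", "Minnesota strikes"]
frequent = threshold_detect(subjects, min_share=0.5)
clusters = cluster_variants(["ojibwe", "ojibway", "ojibwa", "indians", "indian"])
```

In this sample only "indians" passes the threshold, because the three Ojibwe spellings split their counts; clustering variants before thresholding would avoid that. Character similarity also handles transposed acronyms such as GLBT and LGBT poorly, so pairs like that would probably need an explicit synonym rule.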
Current Description
id: 49b09ce719c5184f166920a1a7c1e8cd
Title: Minnesota Newspapers Collection
Description: The Minnesota Digital Library is now providing access to some of Minnesota's historical newspapers. We are focusing our attention on titles, volumes and issues that were never microfilmed, and where the originals are frail and not frequently available to the public
.collectionResource.<property>
dateCreated: 3/30/2015
itemCount: 3528
date.begin: 1867
date.end: 2009
subjects: [Helpers, lockouts, Drivers, Indian, Indians, American, Sauk, Minnesota, Minneapolis, Gay, GLBT, Homosexuality, missions, Mission, Community, Ojibwa, Ojibway, Ojibwe, Pine, River, County, Strikes, Petroleum, Union]
Enhanced Description
spatial.boundary: [[153.06667, -27.28333], [-99.8111038208, 41.5272712708], [-94.8796463013, 47.4731407166], [132.270004272, -14.4532003403], [153.06667, -27.28333]]
formats: newspapers
languages: English, Dakota
dataProviders: [“Bemidji State University”, “Center for Human Resources and Labor Studies”, “Heritage Group North”, “Morrison County Historical Society”, “Morrison County Historical Society”, “Quatrefoil Library”, “Sauk Centre Area Historical Society”, “Synod of Lakes and Prairies”]
rights:
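The spatial.boundary value above is a closed ring: its first and last points are the same. The slides do not say how it is computed. As one plausible reading, the sketch below derives such a ring as the convex hull of item coordinates, using Andrew's monotone chain algorithm. The function name and the sample points are hypothetical, and the code treats longitude and latitude as flat x/y values, which ignores the antimeridian and map projection.

```python
def convex_hull(points):
    """Return the convex hull of (lon, lat) points as a closed ring (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts  # too few distinct points to form a ring

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        hull = []
        for p in seq:
            # Drop points that would make a clockwise or straight turn.
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    ring = lower[:-1] + upper[:-1]
    return ring + [ring[0]]  # repeat the first point to close the ring

# Invented sample coordinates: four corners plus one interior point.
boundary = convex_hull([(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)])
```

A hull is sensitive to outliers: a single mis-geocoded item far from the rest stretches the boundary to include it, so a real deriver would likely need some outlier filtering first.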
Visual Assessment
[Diagram: DPLA shown with groups of nodes labeled S, C, and D.]
Collection Profiles
Support for: DPLA, hub, and data provider staff
Approach
• Numeric characterization
– (for now) ignore semantic assessment
• Assess consistency
– enhance automation, computation
• Assess compliance with MAP (3.1)
– required, recommended fields
• Support visual analysis
– (early stage)
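A numeric characterization of this kind can be as simple as counting, per field, how many items carry a non-empty value. The sketch below assumes item records are plain dictionaries. The function name, the field lists, and the sample items are hypothetical; the actual required and recommended fields should be taken from the MAP itself.

```python
def profile_collection(items, required, recommended):
    """Report, per field, how many items have a non-empty value and the resulting share."""
    total = len(items)
    profile = {}
    for level, fields in (("required", required), ("recommended", recommended)):
        for field in fields:
            filled = sum(1 for item in items if item.get(field))
            profile[field] = {
                "level": level,
                "filled": filled,
                "completeness": filled / total if total else 0.0,
            }
    return profile

# Hypothetical field lists for illustration only; consult the MAP for the real ones.
REQUIRED = ["title", "rights"]
RECOMMENDED = ["subject", "spatial"]

# Invented sample items.
items = [
    {"title": "Item A", "rights": "", "subject": ["Strikes"]},
    {"title": "Item B", "rights": "Public domain", "spatial": ["Minneapolis"]},
]
profile = profile_collection(items, REQUIRED, RECOMMENDED)
```

Because the profile only checks presence, it says nothing about whether a value is meaningful, which matches the decision to set semantic assessment aside for now. The per-field numbers can feed charts directly for the visual analysis.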
Collection Profile
DPLA Collection data
Administrative data
Collection and item details
Collection Details
Item Details
Visual Analysis
id: 49b09ce719c5184f166920a1a7c1e8cd
Title: Minnesota Newspapers Collection
Item titles Item rights
Other Fields
publisher format
coordinates names
spatial
subjects
Subjects - Assessment
Subjects - Analysis
Correlations
coordinates names
Correlations
coordinates names subjects
Implications and Ongoing Work
• Collection description dashboard
• Evaluation of developed algorithms and metrics
Contact
Unmil P. Karadkar <unmil@ischool.utexas.edu>
Karen Wickett <wickett@ischool.utexas.edu>
Temple Teaching Fellowship, School of Information, UT Austin
Acknowledgements
Mark Matienzo, Tom Johnson, Gretchen Gueguen, and the DPLA staff
Student programmers: Jiexian Li, Zheyuan Zhu, Nan Guo, Ruoying Li, Jeremy Tzou, Julia Link, Andrew Florance, Joshua Sheehy, Meghanath Reddy, Robert Flores, Sowmya Sadhasivam
Discussion
• Collection description dashboard
– which features?
– which fields?
• Evaluation of developed algorithms and metrics