Discovering and Describing DPLA Collections

Students: Madhura Parikh, Zhang Zhang

Karen Wickett, Unmil P. Karadkar

School of Information

The University of Texas at Austin

in the portal

in the data

so what?

• Collection-level entities and collection descriptions can support a range of functions:
  – representing data providers
  – providing context for items
  – managing and presenting search results
  – assessing relevance and accessibility
  – supporting the contribution of collections by users.

Modeling Cultural Collections for Digital Aggregation and Exchange Environments. CIRSS Technical Report 201310-1, University of Illinois at Urbana-Champaign.

Approach

• Based on collection/item propagation rules
  – link item-level attribute/value pairs to collection-level attribute/value pairs
    • collection attributes from a collection-level schema
    • item attributes from DPLA’s Metadata Application Profile
  – in general, allow reasoning in either direction
  – we are experimenting with building descriptions of collections, using:
    • descriptions of items
    • collection membership
    • a guiding propagation rule

collection-level properties

• Collection title
• Collection description
• Begin date
• End date
• Geographic boundary
• Places
• Subjects
• Formats
• Languages
• Genres
• Rights

Approach

• Take data from the DPLA based on collection membership
  – e.g. all items in the “Minnesota Newspapers Collection”
• Pick a target collection-level field
  – e.g. dc:subject
• Identify source data fields in item records
  – e.g. dc:subject and dc:description

Approach (cont’d)

• Aggregate item data from across the collection
  – e.g. all unique subject strings, along with frequency counts
• Derive collection-level values for the selected attribute
  – e.g. five subjects for the Minnesota Newspapers Collection.
• Add attribute/value pair to collection record
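The aggregate, derive, and add steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary-shaped records, and the sample items are hypothetical, and "most frequent values" is just one possible derivation rule, not necessarily the one used in the project.

```python
from collections import Counter

def aggregate_field(items, source_field):
    """Count every value of source_field across the item records."""
    counts = Counter()
    for item in items:
        values = item.get(source_field, [])
        if isinstance(values, str):  # tolerate single-valued fields
            values = [values]
        counts.update(v.strip() for v in values if v.strip())
    return counts

def derive_top_values(counts, n=5):
    """Assumed derivation rule: keep the n most frequent values."""
    return [value for value, _ in counts.most_common(n)]

def describe_collection(collection, items, target_field, source_field, n=5):
    """Return a copy of the collection record with the derived field added."""
    counts = aggregate_field(items, source_field)
    enriched = dict(collection)
    enriched[target_field] = derive_top_values(counts, n)
    return enriched

# Made-up sample item records, for illustration only.
items = [
    {"subject": ["Strikes", "Labor unions"]},
    {"subject": ["Labor unions", "Minneapolis"]},
    {"subject": "Labor unions"},
]
collection = {"title": "Minnesota Newspapers Collection"}
record = describe_collection(collection, items, "subjects", "subject", n=2)
```

Keeping aggregation separate from derivation means the same frequency counts can feed different derivation rules without re-reading the items.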

WARNING: The following presentation contains strong, graphic imagery.

Collection-level Metadata Generation

Support for: Portal users, Humanities scholars

Architecture

[Figure: architecture diagram. A collection (C) and its items (I) are processed in stages labeled Extract, Aggregate, Derive, Populate, and Enrich. Aggregated date, subject, and spatial values feed a Date Deriver, a Subject Deriver, and a Spatial Deriver; the derived collection date values and collection subject values go into the Collection Description.]

ArtStor

Dates Format Variations

Date Processing

Begin and end dates

imperfect but consistent

Inside the Date Deriver

[Figure: aggregated date values (D) pass through a Parser Factory, which yields years with known formats. A Rule Factory then applies a begin-year rule, an end-year rule, and additional rules to produce the collection date values.]
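A minimal sketch of the parser-then-rule idea, assuming a handful of date formats and the simplest possible rules (earliest year as begin year, latest year as end year). The parser functions, the format list, and the sample strings are all hypothetical; the slides do not specify which formats or rules the Date Deriver actually uses.

```python
import re

# Hypothetical parsers: each returns the years found in one date string, or [].
def parse_year(text):
    m = re.fullmatch(r"\s*(\d{4})\s*", text)
    return [int(m.group(1))] if m else []

def parse_iso_date(text):
    m = re.fullmatch(r"\s*(\d{4})-\d{2}(?:-\d{2})?\s*", text)
    return [int(m.group(1))] if m else []

def parse_us_date(text):
    m = re.fullmatch(r"\s*\d{1,2}/\d{1,2}/(\d{4})\s*", text)
    return [int(m.group(1))] if m else []

def parse_year_range(text):
    m = re.fullmatch(r"\s*(\d{4})\s*-\s*(\d{4})\s*", text)
    return [int(m.group(1)), int(m.group(2))] if m else []

PARSERS = [parse_year, parse_iso_date, parse_us_date, parse_year_range]

def years_with_known_formats(date_strings):
    """Try each parser in turn; strings in unknown formats are skipped."""
    years = []
    for text in date_strings:
        for parser in PARSERS:
            found = parser(text)
            if found:
                years.extend(found)
                break
    return years

def derive_date_range(date_strings):
    """Assumed rules: begin year is the minimum, end year is the maximum."""
    years = years_with_known_formats(date_strings)
    if not years:
        return None
    return {"date.begin": min(years), "date.end": max(years)}

# Made-up sample values, for illustration only.
sample = ["1867", "1934-05-16", "3/30/2009", "1901-1910", "circa the strike"]
result = derive_date_range(sample)
```

Unparseable strings are dropped rather than guessed at, which fits the "imperfect but consistent" goal: the derived range reflects only the values whose format is understood.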

Subject - Phrases

Commonalities and Differences

Thresholded Boundaries

Variants

Ojibwe-Ojibway

GLBT-LGBT

Hierarchies

Labor Unions

Minnesota

Minneapolis

Newspapers

Labor Unions

Organizing

Automatic? Descriptions

Inside the Subject Deriver

[Figure: aggregated subject, title, and description values pass through a Parser Factory (tokenizers). A Rule Factory then applies a threshold detector, a cluster generator, a WordNet analyzer, and other rules to produce the collection subject values.]
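The tokenizer, threshold detector, and cluster generator can be illustrated with a small standard-library sketch. Everything here is an assumption made for illustration: the threshold is defined as the share of values a term appears in, variants such as Ojibwa, Ojibway, and Ojibwe are grouped by plain string similarity, and the WordNet analysis is left out. The actual rules in the Subject Deriver may differ.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

def tokenize(text):
    """Split a subject, title, or description string into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text)

def threshold_terms(values, min_share=0.5):
    """Keep terms that occur in at least min_share of the values (assumed rule)."""
    doc_freq = Counter()
    for value in values:
        doc_freq.update({t.lower() for t in tokenize(value)})
    cutoff = min_share * len(values)
    return sorted(t for t, n in doc_freq.items() if n >= cutoff)

def cluster_variants(terms, min_ratio=0.75):
    """Greedy clustering: a term joins the first cluster whose seed it resembles."""
    clusters = []
    for term in terms:
        for cluster in clusters:
            if SequenceMatcher(None, term, cluster[0]).ratio() >= min_ratio:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters

# Made-up sample values, for illustration only.
values = [
    "Labor unions -- Minnesota",
    "Strikes and lockouts -- Minnesota",
    "Ojibwe Indians",
]
frequent = threshold_terms(values, min_share=0.5)
clusters = cluster_variants(["ojibwa", "ojibway", "ojibwe", "strikes"])
```

String similarity catches spelling variants but not synonyms or broader and narrower terms, which is where a lexical resource such as WordNet would come in.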

Current Description

id: 49b09ce719c5184f166920a1a7c1e8cd

Title: Minnesota Newspapers Collection

Description: The Minnesota Digital Library is now providing access to some of Minnesota's historical newspapers. We are focusing our attention on titles, volumes and issues that were never microfilmed, and where the originals are frail and not frequently available to the public.

collectionResource.<property>

dateCreated: 3/30/2015

itemCount: 3528

date.begin: 1867

date.end: 2009

subjects: [Helpers, lockouts, Drivers, Indian, Indians, American, Sauk, Minnesota, Minneapolis, Gay, GLBT, Homosexuality, missions, Mission, Community, Ojibwa, Ojibway, Ojibwe, Pine, River, County, Strikes, Petroleum, Union]

Enhanced Description

spatial.boundary: [[153.06667, -27.28333], [-99.8111038208, 41.5272712708], [-94.8796463013, 47.4731407166], [132.270004272, -14.4532003403], [153.06667, -27.28333]]

formats: newspapers

languages: English, Dakota

dataProviders: [“Bemidji State University”, “Center for Human Resources and Labor Studies”, “Heritage Group North”, “Morrison County Historical Society”, “Morrison County Historical Society”, “Quatrefoil Library”, “Sauk Centre Area Historical Society”, “Synod of Lakes and Prairies”]

rights:

Visual Assessment

[Figure: diagram centered on DPLA with nodes labeled S, C, and D.]

Collection Profiles

Support for: DPLA, Hub, Data provider Staff

Approach

• Numeric characterization
  – (for now) ignore semantic assessment
• Assess consistency
  – enhance automation, computation
• Assess compliance to MAP (3.1)
  – required, recommended fields
• Support visual analysis
  – (early stage)
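As a rough illustration of numeric characterization, the sketch below computes, for each field, the share of items in a collection that carry a non-empty value. The function name, the record shape, the sample items, and the REQUIRED and RECOMMENDED groupings are placeholders chosen for the example; they are not the field lists defined in MAP 3.1.

```python
# Placeholder field groupings for illustration; not taken from MAP 3.1.
REQUIRED = ("title", "rights")
RECOMMENDED = ("subject",)

def field_completeness(items, fields):
    """Share of items with a non-empty value for each field."""
    total = len(items)
    profile = {}
    for field in fields:
        filled = sum(
            1 for item in items if item.get(field) not in (None, "", [], {})
        )
        profile[field] = filled / total if total else 0.0
    return profile

# Made-up sample item records, for illustration only.
items = [
    {"title": "A", "rights": "Public domain", "subject": ["Strikes"]},
    {"title": "B", "rights": ""},
    {"title": "C", "subject": []},
]
profile = {
    "required": field_completeness(items, REQUIRED),
    "recommended": field_completeness(items, RECOMMENDED),
}
```

A purely numeric profile like this says nothing about whether the values are meaningful, which matches the decision to set semantic assessment aside for now.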

Collection Profile

DPLA Collection data

Administrative data

Collection and item details

Collection Details

Item Details

Visual Analysis

id: 49b09ce719c5184f166920a1a7c1e8cd

Title: Minnesota Newspapers Collection

Item titles Item rights

Other Fields

publisher format

coordinates names

spatial

subjects

Subjects - Assessment

Subjects - Analysis

Correlations

coordinates names

Correlations

coordinates names subjects

Implications and Ongoing Work

Contact

Unmil P. Karadkar <unmil@ischool.utexas.edu>

Karen Wickett <wickett@ischool.utexas.edu>

Temple Teaching Fellowship, School of Information, UT Austin

Acknowledgements

Mark Matienzo, Tom Johnson, Gretchen Gueguen, and the DPLA staff

Student programmers: Jiexian Li, Zheyuan Zhu, Nan Guo, Ruoying Li, Jeremy Tzou, Julia Link, Andrew Florance, Joshua Sheehy, Meghanath Reddy, Robert Flores, Sowmya Sadhasivam

• Collection description dashboard
  – which features?
  – which fields?
• Evaluation of developed algorithms and metrics

Discussion
