Discovering and Describing DPLA Collections Students: Madhura Parikh, Zhang Zhang Karen Wickett, Unmil P. Karadkar School of Information The University of Texas at Austin

Page 1

Discovering and Describing DPLA Collections

Students: Madhura Parikh, Zhang Zhang

Karen Wickett, Unmil P. Karadkar

School of Information

The University of Texas at Austin

Page 2

in the portal

Page 3

in the portal

Page 4

in the data

Page 5

in the data

Page 6

so what?

• Collection-level entities and collection descriptions can support a range of functions:
– representing data providers
– providing context for items
– managing and presenting search results
– assessing relevance and accessibility
– supporting the contribution of collections by users.

Modeling Cultural Collections for Digital Aggregation and Exchange Environments. CIRSS Technical Report 201310-1, University of Illinois at Urbana-Champaign.

Page 7

Approach

• Based on collection/item propagation rules
– link item-level attribute/value pairs to collection-level attribute/value pairs
  • collection attributes from a collection-level schema
  • item attributes from DPLA’s Metadata Application Profile
– in general, allow reasoning in either direction
– we are experimenting with building descriptions of collections, using:
  • descriptions of items
  • collection membership
  • a guiding propagation rule
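A propagation rule of this kind can be sketched in a few lines. The sketch below is only one possible reading of "reasoning in either direction": a value moves up to the collection when every item carries it, and moves down to an item that lacks a value. The function names and the dictionary-of-sets record shape are assumptions made for illustration, not part of the project's code.

```python
from typing import Dict, List, Set

Record = Dict[str, Set[str]]  # hypothetical shape: attribute name -> set of values


def shared_values(items: List[Record], attr: str) -> Set[str]:
    """Item -> collection: values of `attr` that every item carries."""
    if not items:
        return set()
    common = set(items[0].get(attr, set()))
    for item in items[1:]:
        common &= item.get(attr, set())
    return common


def inherit_values(collection: Record, item: Record, attr: str) -> Set[str]:
    """Collection -> item: use the collection's values when the item has none."""
    return item.get(attr) or set(collection.get(attr, set()))


items = [{"language": {"English"}}, {"language": {"English", "Dakota"}}]
collection = {"language": shared_values(items, "language")}
```

Other rules (for example, "some item has the value") would swap the intersection for a union; the choice of rule is what the slide calls the guiding propagation rule.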

Page 8

collection-level properties

• Collection title
• Collection description
• Begin date
• End date
• Geographic boundary
• Places
• Subjects
• Formats
• Languages
• Genres
• Rights

Page 9

Approach

• Take data from the DPLA based on collection membership
– e.g. all items in the “Minnesota Newspapers Collection”
• Pick a target collection-level field
– e.g. dc:subject
• Identify source data fields in item records
– e.g. dc:subject and dc:description

Page 10

Approach (cont’d)

• Aggregate item data from across the collection
– e.g. all unique subject strings, along with frequency counts
• Derive collection-level values for the selected attribute
– e.g. five subjects for the Minnesota Newspapers Collection.
• Add attribute/value pair to collection record
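The aggregate, derive, and add steps can be illustrated with a small sketch. It assumes item records shaped like DPLA API item documents, where `sourceResource.subject` is a list of objects with a `name` key; that shape, the function names, and the "take the most frequent values" derivation are assumptions for illustration rather than the method used in the project.

```python
from collections import Counter


def aggregate_subjects(items):
    """Count every unique subject string across the collection's items."""
    counts = Counter()
    for item in items:
        for subject in item.get("sourceResource", {}).get("subject", []):
            name = subject.get("name", "").strip()
            if name:
                counts[name] += 1
    return counts


def derive_collection_subjects(counts, n=5):
    """Derive collection-level values: here, simply the n most frequent."""
    return [name for name, _ in counts.most_common(n)]


def add_to_collection_record(record, attribute, values):
    """Return a copy of the collection record with the new attribute/value pair."""
    updated = dict(record)
    updated[attribute] = values
    return updated


items = [
    {"sourceResource": {"subject": [{"name": "Strikes"}, {"name": "Minneapolis"}]}},
    {"sourceResource": {"subject": [{"name": "Strikes"}]}},
    {"sourceResource": {"subject": [{"name": "Strikes"}, {"name": "Minneapolis"}, {"name": "Petroleum"}]}},
]
counts = aggregate_subjects(items)
record = add_to_collection_record(
    {"title": "Minnesota Newspapers Collection"},
    "subjects",
    derive_collection_subjects(counts, n=2),
)
```

Keeping the frequency counts separate from the derivation step makes it easy to try other derivation rules on the same aggregated data.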

Page 11

WARNING: The following presentation contains strong, graphic imagery.

Page 12

Collection-level Metadata Generation

Support for: Portal users, Humanities scholars

Page 13

Architecture

[Diagram: a collection (C) and its items (I). Stage labels: Extract, Aggregate, Derive, Populate, Enrich. Aggregated subject, date, and spatial values feed the Subject Deriver, Date Deriver, and Spatial Deriver; the derivers output collection subject values and collection date values, which go into the Collection Description.]

Page 14

ArtStor

Date Format Variations

Page 15

Date Processing

Begin and end dates

imperfect but consistent

Page 16

Inside the Date Deriver

[Diagram: aggregated date values (D) pass through a Parser Factory, which yields years with known formats; a Rule Factory (begin year, end year, additional rules) then produces the collection date values.]
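The parser-then-rule flow of the Date Deriver can be sketched as follows. The list of date formats is hypothetical (the slides do not enumerate the formats handled), and the begin/end rules shown are simply the minimum and maximum parsed year. Values that match no known format are skipped, which is one way to get results that are imperfect but consistent.

```python
import re

YEAR = r"(1[0-9]{3}|20[0-9]{2})"

# Hypothetical "known formats"; each pattern captures one or two years.
PARSERS = [
    re.compile(rf"^{YEAR}$"),                                  # 1867
    re.compile(rf"^{YEAR}-\d{{2}}-\d{{2}}$"),                  # 1867-05-01
    re.compile(rf"^\d{{1,2}}/\d{{1,2}}/{YEAR}$"),              # 3/30/2015
    re.compile(rf"^{YEAR}\s*-\s*{YEAR}$"),                     # 1880-1885
    re.compile(rf"^(?:ca\.|circa)\s*{YEAR}$", re.IGNORECASE),  # ca. 1900
]


def parse_years(value):
    """Return the years found in one date string, or [] if the format is unknown."""
    text = value.strip()
    for pattern in PARSERS:
        match = pattern.match(text)
        if match:
            return [int(group) for group in match.groups()]
    return []


def derive_date_range(values):
    """Begin-year and end-year rules: earliest and latest parsed year."""
    years = [year for value in values for year in parse_years(value)]
    if not years:
        return None
    return min(years), max(years)


date_range = derive_date_range(["1867", "2009-01-15", "ca. 1900", "undated", "1880-1885"])
```

New formats are added by appending a pattern to `PARSERS`; the rules that turn years into begin and end dates stay unchanged.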

Page 17

Subject - Phrases

Page 18

Commonalities and Differences

Page 19

Thresholded Boundaries

Variants

Ojibwe-Ojibway

GLBT-LGBT

Hierarchies

Labor Unions

Minnesota

Minneapolis

Newspapers

Labor Unions

Organizing
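Spelling variants such as Ojibwe/Ojibway and GLBT/LGBT can be grouped with a string-similarity threshold. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the 0.7 threshold, the greedy grouping strategy, and the counts in the example are assumptions for illustration and are not taken from the slides.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.7):
    """True when two terms are close enough to be treated as variants."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def group_variants(counts, threshold=0.7):
    """Greedy grouping: the most frequent form becomes the representative."""
    clusters = {}
    for term in sorted(counts, key=counts.get, reverse=True):
        for representative in clusters:
            if similar(term, representative, threshold):
                clusters[representative].append(term)
                break
        else:
            clusters[term] = [term]
    return clusters


# Made-up frequency counts for the terms shown on the slide.
counts = {
    "Ojibwe": 40, "Ojibway": 12, "Ojibwa": 3,
    "GLBT": 20, "LGBT": 5,
    "Minnesota": 90, "Minneapolis": 30,
}
clusters = group_variants(counts)
```

A character-level threshold handles variants but not hierarchies: Minnesota and Minneapolis stay apart here, and relating them would need a gazetteer or a controlled vocabulary.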

Page 20

Automatic? Descriptions

Page 21

Automatic? Descriptions

Page 22

Inside the Subject Deriver

[Diagram: aggregated subject, title, and description values pass through a Parser Factory (tokenizers), then a Rule Factory (threshold detector, cluster generator, WordNet analyzer, other rules), producing the collection subject values.]
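The tokenizer stage that feeds titles and descriptions into the Subject Deriver might look like the sketch below. The stopword list, the flat `title`/`description` record keys, and the function names are all hypothetical; the WordNet analysis named on the slide is not attempted here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "on", "at", "by"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[A-Za-z]+", text.lower()) if t not in STOPWORDS]


def candidate_terms(items, fields=("title", "description")):
    """Count tokens drawn from the chosen item fields across the collection."""
    counts = Counter()
    for item in items:
        for field in fields:
            counts.update(tokenize(item.get(field, "")))
    return counts


items = [
    {"title": "Strikes and lockouts in Minneapolis",
     "description": "Petroleum drivers union strike"},
    {"title": "Minneapolis labor union news"},
]
terms = candidate_terms(items)
```

The resulting counts can then go through the same thresholding and clustering rules as the aggregated subject values.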

Page 23

Current Description

id: 49b09ce719c5184f166920a1a7c1e8cd

Title: Minnesota Newspapers Collection

Description: The Minnesota Digital Library is now providing access to some of Minnesota's historical newspapers. We are focusing our attention on titles, volumes and issues that were never microfilmed, and where the originals are frail and not frequently available to the public.

Page 24

Enhanced Description

.collectionResource.<property>

dateCreated: 3/30/2015

itemCount: 3528

date.begin: 1867

date.end: 2009

subjects: [Helpers, lockouts, Drivers, Indian, Indians, American, Sauk, Minnesota, Minneapolis, Gay, GLBT, Homosexuality, missions, Mission, Community, Ojibwa, Ojibway, Ojibwe, Pine, River, County, Strikes, Petroleum, Union]

Page 25

Enhanced Description

spatial.boundary: [[153.06667, -27.28333], [-99.8111038208, 41.5272712708], [-94.8796463013, 47.4731407166], [132.270004272, -14.4532003403], [153.06667, -27.28333]]

formats: newspapers

languages: English, Dakota

dataProviders: [“Bemidji State University”, “Center for Human Resources and Labor Studies”, “Heritage Group North”, “Morrison County Historical Society”, “Morrison County Historical Society”, “Quatrefoil Library”, “Sauk Centre Area Historical Society”, “Synod of Lakes and Prairies”]

rights:
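The spatial.boundary value on this slide is a closed ring of coordinate pairs (the first and last points coincide). The slides do not say how the ring is computed. One way to obtain such a ring from item coordinates is a convex hull, sketched below with Andrew's monotone chain algorithm. Treating coordinates as planar points is a simplifying assumption, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def half_hull(points):
    """Build one half of the hull, dropping points that do not turn left."""
    chain = []
    for p in points:
        while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
            chain.pop()
        chain.append(p)
    return chain


def boundary_ring(points):
    """Convex hull of the item coordinates, returned as a closed ring."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts + pts[:1]
    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    hull = lower[:-1] + upper[:-1]
    return hull + [hull[0]]


ring = boundary_ring([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```

A single outlying coordinate stretches the ring considerably, so a hull like this is best read as an outer bound rather than as the area the collection covers.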

Page 26

Visual Assessment

Page 27

Collection Profiles

Support for: DPLA, Hub, Data provider Staff

[Diagram: nodes labeled S, C, and D arranged around DPLA.]

Page 28

Approach

• Numeric characterization
– (for now) ignore semantic assessment
• Assess consistency
– enhance automation, computation
• Assess compliance with MAP (3.1)
– required, recommended fields
• Support visual analysis
– (early stage)
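A numeric characterization of compliance can be as simple as the share of items that populate each field. In the sketch below the field lists are placeholders and are not taken from MAP 3.1; the `sourceResource` record shape and the function name are also assumptions.

```python
# Placeholder field lists for illustration; consult the MAP for the real ones.
REQUIRED_FIELDS = ["title", "rights"]
RECOMMENDED_FIELDS = ["subject", "spatial"]


def field_completeness(items, fields):
    """Fraction of items with a non-empty value for each field."""
    total = len(items)
    profile = {}
    for field in fields:
        filled = sum(1 for item in items if item.get("sourceResource", {}).get(field))
        profile[field] = filled / total if total else 0.0
    return profile


items = [
    {"sourceResource": {"title": "Issue 1", "rights": "Public domain", "subject": ["Strikes"]}},
    {"sourceResource": {"title": "Issue 2", "rights": ""}},
    {"sourceResource": {"title": "Issue 3", "rights": "Public domain"}},
    {"sourceResource": {"title": "Issue 4", "rights": "Public domain", "subject": []}},
]
required = field_completeness(items, REQUIRED_FIELDS)
recommended = field_completeness(items, RECOMMENDED_FIELDS)
```

This measures only whether a value is present, in keeping with the slide's decision to set semantic assessment aside for now.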

Page 29

Collection Profile

DPLA Collection data

Administrative data

Collection and item details

Page 30

Collection Details

Page 31

Item Details

Page 32

Visual Analysis

id: 49b09ce719c5184f166920a1a7c1e8cd

Title: Minnesota Newspapers Collection

Item titles Item rights

Page 33

Other Fields

publisher

format

coordinates names

spatial

subjects

Page 34

Subjects - Assessment

Page 35

Subjects - Analysis

Page 36

Correlations

coordinates names

Page 37

Correlations

coordinates names

subjects

Page 38

Implications and Ongoing Work

Collection description dashboard

Evaluation of developed algorithms and metrics

Page 39

Contact

Unmil P. Karadkar <[email protected]>

Karen Wickett <[email protected]>

Temple Teaching Fellowship, School of Information, UT Austin

Acknowledgements

Mark Matienzo, Tom Johnson, Gretchen Gueguen, and the DPLA staff

Student programmers: Jiexian Li, Zheyuan Zhu, Nan Guo, Ruoying Li, Jeremy Tzou, Julia Link, Andrew Florance, Joshua Sheehy, Meghanath Reddy, Robert Flores, Sowmya Sadhasivam

Page 40

Discussion

Collection description dashboard

which features?

which fields?

Evaluation of developed algorithms and metrics