Ray R. Larson, School of Information, UC Berkeley
Daniel Pitti, University of Virginia, Institute for Advanced Technology in the Humanities
Yiming Liu, School of Information, UC Berkeley
Brian Tingle, California Digital Library
Adrian Turner, California Digital Library
Rachael Hu, California Digital Library

PNC 2013 – Kyoto, Japan

Towards a Social Network of History

http://socialarchive.iath.virginia.edu

Archival Name Authority System

Hamilton, Alexander, 1757-1804

Luce, Clare Boothe, 1903-1987

Oppenheimer, J. Robert, 1904-1967

Patton family

Patton, George S. (George Smith), 1885-1945

Sontag, Susan, 1933-2004

Washington, George, 1732-1799

Whitman, Walt, 1819-1892

Wright, Lloyd, 1890-1978

Anthony, Susan B

Berkeley Free Church

Bernstein, Leonard, 1918-

Block, Herbert, 1909-2001

Bush, Vannevar, 1890-1974

Frankfurter, Felix, 1882-1965

Franklin, Benjamin, 1706-1790

Fuller, R. Buckminster (Richard Buckminster), 1895-1983

Engelland, Jurgen (George).
Enwall, Ogie (Aage).
Erickson, Selma Inez.
Fahl, Hans Johan Fredrik.
Fet, Peter Laurits.
Flones, Edward.
Fredrickson, Hans.
Fredrickson, Sven Fredrick.
Garberg, Peder.
Gillam, Chandler B., 1833-1899.
Halseth, Otto Hjalmer.
Handeland, Martha Tweiten.
Hansen, Anne Schmidt.
Hansen, Sylvia (Solveig).
Haug, Olga Karoline Nilsen.
Hemmestad, Olga Kristine Brodahl.
Henry, Oscar M., 1851-1916.
Holmes, Anna Gudrun Hauge.
Holmes, Elias Kristofferson Velholmen.
Hoset, Ole.
Howard, Barnett Allen, b. 1827.
Hytmo, Guri Olsdatter.
Johnson, Andrew (Anders Johansson).
Johnson, Phiea Petersen Stahl.
Johnson, Thelma Irene Underdal.
Jorgenson, Jorgen Aadneram.
Kjersem, Ole Johnson.
Knudsen, Johanne.
Kofoed, Thorvald Andreas.
Larsen, Elias.
Lillelien, Thor.
Loe, Otto Calvin.
Molund, Erik Wilhelm.
Nakkerud, Inga Amanda Treland.
Nakkerud, Trygve Bloch.
Nelson, Amanda.
Nerland, Einar Magnus.
Nielsen, Einer.
Nilsen, Martha Dagsvik.
Nissen, Ole Andreas Nissen.
Norberg, Jonas Walfred.
Norwick, Goodman.
Nygaard, Lars Thomas.
Odmark, Elsie Karlson.
Ohrt, Sigfrid Eidsness.
Oliver, Kole Skaflestad.
Olson, Alvin E.
Opsal, Cato Torvald.
Petersen, Greta Jensen.
Rasmussen, Martin.
Rinne, Esther Wiirre.
Rodney family
Sandback, George Brun.
Saure, Sivert Andreas.

Background

• Research and demonstration project
• Multi-year funding
• National Endowment for the Humanities (2010-2012)
• Andrew W. Mellon Foundation (2012-2014)
• Planning Project for Cooperative Service (2014-15, pending)

Objectives

1. Develop tools for extracting EAC-CPF records, drawing on existing data (EAD finding aids, MARC records)

2. Match, merge, and enhance; build a large test corpus of EAC-CPF records

3. Create a prototype biographical resource and access system, using those records

Project Team

• University of Virginia, Institute for Advanced Technology in the Humanities
  – Daniel Pitti (PI) and Worthy Martin
• UC Berkeley School of Information
  – Ray Larson and Yiming Liu
• California Digital Library
  – Rachael Hu, Brian Tingle, and Adrian Turner

Project Team

• Terry Catapano (Columbia University)
• Sara Sprenkle (Washington and Lee University)
• Sarah Wells (University of Virginia)
• Kathy Wisser (Simmons Graduate School of Library and Information Science)
• Tom Lynch (University of Illinois School of Library and Information Science)

EAC-CPF

• XML-based data structure standard for encoding archival authority records

• Authorized name headings for the entity
• Biographical/historical context for the entity
• Links to resources created by the entity
• Links to resources about the entity

Example EAD - Creator

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Transformed with v1v2002_4.xsl -->
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd" [
  <!ENTITY lcseal SYSTEM "http://lcweb2.loc.gov/xmlcommon/lcseal.jpg " NDATA jpeg>
]>
<ead>
  <eadheader repositoryencoding="iso15511" … >
    <eadid mainagencycode="dlc" countrycode="us" …>http://hdl.loc.gov/loc.mss/eadmss.ms003073</eadid>
    <filedesc>
      <titlestmt>
        <titleproper encodinganalog="245$a">Clement F. Haynsworth Papers</titleproper>
…
<unitid label="ID No." encodinganalog="590" countrycode="US" …>MSS79781</unitid>
<origination label="Creator">
  <persname source="lcnaf" encodinganalog="100">Haynsworth, Clement F. (Clement Furman), 1912-1989</persname>
</origination>
<physdesc label="Extent">…

Example EAD - bioghist

…
<bioghist encodinganalog="545">
  <head>Biographical Note</head>
  <chronlist>
    <listhead>
      <head01>Date</head01>
      <head02>Event</head02>
    </listhead>
    <chronitem>
      <date>1912, Oct. 30</date>
      <event>Born, Greenville, S.C.</event>
    </chronitem>
    <chronitem>
      <date>1933</date>
      <event>A.B., Furman University, Greenville, S.C.</event>
    </chronitem>
    <chronitem>
      <date>1936</date>
      <event>LL.B., Harvard University, Cambridge, Mass.</event>
    </chronitem>
…

[Figure: diagram with four boxes labeled "Title" and the names John Brennan, George Jones, Thomas Smith, Frederick Jones, and Martha Jones]

Example EAD - controlaccess

…
</note>
<controlaccess>
  <head>People</head>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Barzun%2C+Jacques%2C+1907-+Correspondence.^">Barzun, Jacques, 1907- --Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Brennan%2C+William+J.+%28William+Joseph%29%2C+1906-1997+Correspondence.^">Brennan, William J. (William Joseph), 1906-1997--Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Burger%2C+Warren+E.%2C+1907-1995+Correspondence.^">Burger, Warren E., 1907-1995--Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Clark%2C+Tom+C.+%28Tom+Campbell%29%2C+1899-1977+Correspondence.^">Clark, Tom C. (Tom Campbell), 1899-1977--Correspondence.</persname>
…

Example EAD - scopecontent

…The most significant and frequent of Haynsworth's correspondents are Jacques Barzun, William J. Brennan, Warren E. Burger, Tom C. Clark, John Paul Frank, Ernest F. Hollings, Edward Moore Kennedy, J. Woodrow Lewis, Daniel John Meador, Arthur Raphael Miller, Richard M. Nixon, Lewis F. Powell, Jr., Strom Thurmond, Johnnie McKeiver Walters, Bernard J. Ward, and Charles Alan Wright.</p> </scopecontent>

Example EAD – unittitle

<c04 level="file">
  <did>
    <unitid>No. 7383 </unitid>
    <unittitle encodinganalog="245$a">Long Mfg. Co. v. Holliday</unittitle>
  </did>
</c04>
…
<c04 level="file">
  <did>
    <unitid>No. 7416 </unitid>
    <unittitle encodinganalog="245$a">Norfolk and Portsmouth Belt Line R.R. v. Brotherhood of R.R. Trainmen, Lodge No. 514</unittitle>
  </did>
</c04>
…
<c03 level="file">
  <did>
    <container type="box">201</container>
    <unittitle encodinganalog="245$a">Wright, Charles Alan, 1970-1989 </unittitle>
    <physdesc>
      <extent encodinganalog="300">(10 folders)</extent>

Data Sources

• EAD finding aids [~150,000]
  – 13 regional and statewide consortia
  – 35 repositories in US, UK, and France; multiple US federal agencies
• MARC21 records [~4.5 million]
  – OCLC WorldCat
• Authority records
  – OCLC Research: Virtual International Authority File (VIAF) [~16 million]
  – Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
  – Additional EAC-CPF (or other) name records from Archives nationales de France, British Library, NARA, New York State Archives, and Smithsonian Institution Archives

Consortia

• Archives Florida
• ArchivesHub (UK)
• Arizona Archives Online
• EAD FACTORY (OhioLink)
• Five Colleges
• Maine Archival Collections Online (MACON)
• Northwest Digital Archives (NWDA)
• Online Archive of California
• Philadelphia Area Consortium of Special Collections Libraries (PACSCL)
• Rhode Island Archival & Manuscript Collections Online (RIAMCO)
• Rocky Mountain Online Archive (RMOA)
• Texas Archival Resources Online (TARO)
• Virginia Heritage

Individual institutions

• American Philosophical Society
• Archives nationales (France)
• Archives of American Art
• Bibliothèque nationale de France
• BnF Archives et manuscripts
• French Union Catalog
• Brigham Young University
• Church of Latter Day Saints Archives
• Columbia University
• Cornell University
• Duke University
• Harvard University
• Indiana University
• Library of Congress (publicly available without restriction)
• Minnesota Historical Society
• Massachusetts Institute of Technology
• National Library of Medicine
• New York Public Library
• New York University
• North Carolina State
• Northwestern University
• Princeton University
• Rutgers University
• Smithsonian Institution Archives
• Syracuse University
• University of Alabama
• University of Chicago
• University of Connecticut
• University of Delaware
• University of Florida
• University of Illinois
• University of Kansas
• University of Maryland
• University of Michigan Bentley & Special Collections
• University of Minnesota
• University of Nebraska
• University of North Carolina, Chapel Hill
• University of Utah
• Utah State Archives
• Utah State University
• Yale University

Methods and Processing

• Extract EAC-CPF records from existing EAD-encoded archival descriptions
  – Extracting both creators and referenced CPF names
• Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF)
  – Enhance EAC-CPF by normalizing entries, adding alternative entries, titles (VIAF), and historical data (ULAN)
• Create a prototype historical resource and access system
  – Historical data and social-professional networks
  – Links to archive, library, and museum resources (by and about)
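
The extraction step can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal illustration, not the project's actual tooling: the sample is a cut-down version of the Haynsworth finding aid, wrapped in a simplified <archdesc> root so that it parses on its own, and the names extract_names, clean_heading, and the returned dictionary layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down EAD fragment based on the Haynsworth example; the wrapper
# structure is simplified so the snippet is well-formed on its own.
EAD_SAMPLE = """
<archdesc>
  <did>
    <origination label="Creator">
      <persname source="lcnaf" encodinganalog="100">Haynsworth, Clement F. (Clement Furman), 1912-1989</persname>
    </origination>
  </did>
  <controlaccess>
    <persname role="subject" source="lcnaf">Barzun, Jacques, 1907- --Correspondence.</persname>
    <persname role="subject" source="lcnaf">Burger, Warren E., 1907-1995--Correspondence.</persname>
  </controlaccess>
</archdesc>
"""

# EAD name elements for persons, corporate bodies, and families (the CPF).
NAME_TAGS = ("persname", "corpname", "famname")


def clean_heading(text):
    """Drop subject subdivisions such as '--Correspondence.' from a heading."""
    return text.split("--")[0].strip()


def extract_names(ead_xml):
    """Return creator names and referenced names found in an EAD fragment."""
    root = ET.fromstring(ead_xml)
    creators = [
        clean_heading(el.text)
        for tag in NAME_TAGS
        for el in root.findall(f".//origination/{tag}")
    ]
    referenced = [
        clean_heading(el.text)
        for tag in NAME_TAGS
        for el in root.findall(f".//controlaccess/{tag}")
    ]
    return {"creators": creators, "referenced": referenced}


names = extract_names(EAD_SAMPLE)
print(names["creators"])
print(names["referenced"])
```

Names mentioned only in running prose, such as the correspondents listed in the scopecontent example, are not tagged and would need a separate approach.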

The Problem

• Proliferation of the forms of names
  – Different names for the same person
  – Different people with the same names
• Examples
  – from Books in Print (semi-controlled but not consistent)
  – ERIC author index (not controlled)

Goethe

…etc…

John Muir

Library and Archive Authority Control

• Library (or bibliographic) authority control is almost exclusively about the control of names

• Archival authority control involves biographical-historical description of the CPF entity
  – Descriptions based on controlled vocabularies, for example, occupations, place of birth and death
  – But also biographical-historical description
    • Prose
    • Chronological list

• Archival authority control provides context for understanding records, the context of their creation, the provenance

Matching and Merging in SNAC 2

• Developing an updatable database of merged EAC data (replacing Mongo with PostgreSQL)
  – Will permit incremental addition of new data and support editing and “forced” merges
• All original records and merged records will be in the database
• Permanent identifiers will be assigned to merged (and unmerged) EAC output records
  – Track these in the database

[Figure: Merging EAC-CPF Records. Labels: EAC Record Input; Connect exactly matching records; Connect records using name authority information; Cheshire Search; VIAF Repository; LCNAF Repository; ULAN Repository; Merge; Postgres EAC Repository & Repository of merged EAC Records; Merged EAC Records Output]

Merge System Step 1: Load Original Records

CREATE TABLE original_records (
    id bigint NOT NULL,
    name character varying(255) DEFAULT ''::character varying NOT NULL,
    source_id character varying(255),
    collection_id character varying(64),
    path character varying(255),
    r_type character varying(64) NOT NULL,
    from_date date,
    from_date_type character varying(64),
    to_date date,
    to_date_type character varying(64),
    processed boolean DEFAULT false NOT NULL,
    last_processed timestamp without time zone,
    record_data text,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    record_group_id bigint
);

• Parse source EAC
  – Key attributes extracted for merge use
  – Original XML stored
• Timestamp for last merge run on record
  – Resumption of aborted merge runs or reruns
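
As a rough illustration of this step, the Python sketch below parses a simplified EAC-CPF record and pulls out a few of the attributes held in original_records. The OriginalRecord class, the parse_eac and pending helpers, and the sample record are assumptions made for this example; the project's loader is not shown in the slides, and a real record would also carry a <control> section.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

EAC_NS = {"eac": "urn:isbn:1-931666-33-4"}  # EAC-CPF namespace

# Simplified EAC-CPF record used only for this sketch.
EAC_SAMPLE = """
<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <cpfDescription>
    <identity>
      <entityType>person</entityType>
      <nameEntry><part>Tabb, John B. (John Banister), 1845-1909.</part></nameEntry>
    </identity>
    <description>
      <existDates>
        <dateRange>
          <fromDate standardDate="1845">1845</fromDate>
          <toDate standardDate="1909">1909</toDate>
        </dateRange>
      </existDates>
    </description>
  </cpfDescription>
</eac-cpf>
"""


@dataclass
class OriginalRecord:
    """Subset of the original_records columns used by this sketch."""
    name: str
    r_type: str
    from_date: Optional[str]
    to_date: Optional[str]
    record_data: str          # original XML kept verbatim
    processed: bool = False   # set once a merge run has handled the row


def parse_eac(xml_text):
    """Pull the key merge attributes out of one EAC-CPF record."""
    root = ET.fromstring(xml_text)
    from_el = root.find(".//eac:fromDate", EAC_NS)
    to_el = root.find(".//eac:toDate", EAC_NS)
    return OriginalRecord(
        name=root.findtext(".//eac:nameEntry/eac:part", default="", namespaces=EAC_NS),
        r_type=root.findtext(".//eac:entityType", default="", namespaces=EAC_NS),
        from_date=from_el.get("standardDate") if from_el is not None else None,
        to_date=to_el.get("standardDate") if to_el is not None else None,
        record_data=xml_text,
    )


def pending(records):
    """Rows that a resumed or repeated merge run still has to visit."""
    return [r for r in records if not r.processed]


record = parse_eac(EAC_SAMPLE)
print(record.name, record.r_type, record.from_date, record.to_date)
```

Keeping the processed flag on each row is what lets an aborted run pick up where it stopped: only rows returned by pending need to be visited again.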

But…

• Exact merging assumes that archives are following LC cataloging practice in their EAD records
  – There are some problems with this assumption

Some failures for merging…

• Different abbreviations:
  – A. & G. Carisch & C.
  – A. & G. Carisch & Co.
• And spacing issues:
  – A. C. Peters & Bro.
  – A. C. Peters & Brother.
  – A. C. Peters. (??)
  – A. C.Peters & Bro.
• Completeness and alternate rules
  – Tabb, John B. (John Banister), 1845-1909.
  – Tabb, John Banister, 1845-1909.
• Also differing transliterations for non-Latin scripts

More…

• Variant romanizations (and spacing):
  – M. P. Belaieff.
  – M. P. Belaïeff.
  – M.P. Belaïeff.
  – M.P.Belaïeff.
  – M. P. Bieliaev.
• Initials vs. names:
  – Zabolotskii, N.A.
  – Zabolotskii, Nikolai Alekseevich, 1903-1958.
  – Zabolotskii.
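
To make these failure cases concrete, here is a small Python sketch of one possible normalization key. It is an illustration only, not the project's matching code: the function normalize_name is hypothetical, and stripping all punctuation and diacritics this aggressively could also cause false matches in real data.

```python
import re
import unicodedata


def normalize_name(name):
    """Build a comparison key that ignores case, diacritics, and spacing."""
    # Decompose accented characters and drop the combining marks (ï -> i).
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove everything except letters and digits, so "M.P." equals "M. P.".
    return re.sub(r"[^a-z0-9]", "", stripped.lower())


variants = ["M. P. Belaieff.", "M. P. Belaïeff.", "M.P. Belaïeff.", "M.P.Belaïeff."]
keys = {normalize_name(v) for v in variants}
print(keys)  # the four spacing and diacritic variants share one key

# Different romanizations and abbreviations still produce different keys.
print(normalize_name("M. P. Bieliaev.") in keys)
print(normalize_name("A. C. Peters & Bro.") == normalize_name("A. C. Peters & Brother."))
```

Spacing and diacritic variants collapse to a single key, while the variant romanization "Bieliaev" and the "Bro." versus "Brother." abbreviation do not, which is why string cleanup alone cannot replace authority-based matching.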

Search Authority Files

• For each name, formulate a search of the VIAF database using the Cheshire system (SGML/XML retrieval system with probabilistic and Boolean matching)
  – Search both the “authoritative” and “non-authoritative” forms
  – Consider any name matching a non-authoritative form to be a candidate match for the authoritative form
  – Flag EAC records that match the same authority record as potential matches
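
The flagging logic can be sketched in Python as follows. Note the substitution: an in-memory dictionary with exact lookups stands in for the Cheshire search over VIAF, so this shows only the grouping idea, not probabilistic retrieval. The authority IDs, the EAC record IDs, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy stand-in for an authority file: one authoritative form plus
# non-authoritative variants per record. IDs and pairings are hypothetical.
AUTHORITY = {
    "auth-1": {
        "authoritative": "Tabb, John B. (John Banister), 1845-1909.",
        "variants": ["Tabb, John Banister, 1845-1909."],
    },
    "auth-2": {
        "authoritative": "Zabolotskii, Nikolai Alekseevich, 1903-1958.",
        "variants": ["Zabolotskii, N.A."],
    },
}


def build_index(authority):
    """Map every known form, authoritative or not, to its authority ID."""
    index = {}
    for auth_id, rec in authority.items():
        for form in [rec["authoritative"], *rec["variants"]]:
            index[form] = auth_id
    return index


def flag_candidates(eac_names, authority):
    """Group EAC record IDs that resolve to the same authority record."""
    index = build_index(authority)
    groups = defaultdict(list)
    for eac_id, name in eac_names.items():
        auth_id = index.get(name)
        if auth_id is not None:
            groups[auth_id].append(eac_id)
    # Only authority records hit by two or more EAC records yield a merge flag.
    return {a: ids for a, ids in groups.items() if len(ids) > 1}


eac_names = {
    "eac-001": "Tabb, John B. (John Banister), 1845-1909.",
    "eac-002": "Tabb, John Banister, 1845-1909.",
    "eac-003": "Zabolotskii, N.A.",
}
print(flag_candidates(eac_names, AUTHORITY))
```

The two Tabb records are flagged because one matches the authoritative form and the other a non-authoritative form of the same authority record; the single Zabolotskii record matches an authority record but has nothing to merge with.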

Shingle Language Model for names

Name: Einstein Albert

Shingle sequence: ein, ins, nst, ste, tei, ein … , ert

Probability that the sequence (ins, nst, ste) follows ein is very high for the name einstein

Krishna Janakiraman and Sean Marimpietri - Biograph

NGRAM or Shingle Matching

Name 1: Einstein Albert
Name 2: Ainshtain Albert
Name 3: Albert Einstein

[Figure: shingle graphs for the three names, built from overlapping fragments such as ein, ins, nst, ste, tei, alb, lbe, ert (Names 1 and 3) and ain, ins, nsh, sht, hta, tai, alb, lbe, ert (Name 2)]
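
A simplified way to experiment with shingles is to compare the sets of character trigrams of two names. The Python sketch below uses Jaccard set overlap, which is a deliberate simplification and not the Shingle Language Model of Janakiraman and Marimpietri (that model works with the probability of shingle sequences). The function names and the choice of per-word trigrams are assumptions for this example.

```python
def shingles(name, n=3):
    """Return the set of character n-grams taken from each word of a name."""
    grams = set()
    for word in name.lower().split():
        if len(word) < n:
            grams.add(word)  # keep very short words whole
            continue
        grams.update(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def jaccard(a, b):
    """Jaccard similarity of two shingle sets (0.0 = disjoint, 1.0 = equal)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


name1 = shingles("Einstein Albert")
name2 = shingles("Ainshtain Albert")
name3 = shingles("Albert Einstein")

print(sorted(name1))
print(jaccard(name1, name3))  # word order does not affect the shingle set
print(jaccard(name1, name2))  # partial overlap for the variant romanization
```

Because shingles are taken per word, "Einstein Albert" and "Albert Einstein" yield identical sets, while "Ainshtain Albert" shares only the shingles of "Albert" plus "ins" with them.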

Merge System Step 2: Record Matches

• Execute merge algorithm and create record groups
  – pointers from original records to record groups
  – Can be invalidated
• Matched authority record stored for reference

CREATE TABLE record_groups (
    id bigint NOT NULL,
    name character varying(255) DEFAULT ''::character varying NOT NULL,
    g_type character varying(64) NOT NULL,
    viaf_record text,
    ulan_record text,
    is_valid boolean,
    invalidated_by bigint,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

[Figure: each original record belongs to a record group; a record group has many original records]
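
The relationship between record groups and invalidation can be sketched in Python as follows. This is an assumption-laden illustration: the slides do not say what invalidated_by refers to, so the sketch assumes it holds the ID of the superseding group, and it stores member IDs on the group for brevity even though the schema keeps the pointer on each original record (record_group_id).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecordGroup:
    """Mirrors a few record_groups columns; the rest are omitted."""
    id: int
    name: str
    is_valid: bool = True
    invalidated_by: Optional[int] = None
    member_ids: list = field(default_factory=list)  # original_records.id values


def invalidate(old, new):
    """Supersede one group by another, for example after a forced merge."""
    new.member_ids.extend(old.member_ids)
    old.is_valid = False
    old.invalidated_by = new.id


g1 = RecordGroup(id=1, name="Tabb, John B. (John Banister), 1845-1909.", member_ids=[101])
g2 = RecordGroup(id=2, name="Tabb, John Banister, 1845-1909.", member_ids=[102])

# A reviewer decides that both groups describe the same person.
merged = RecordGroup(id=3, name=g1.name)
invalidate(g1, merged)
invalidate(g2, merged)

valid = [g for g in (g1, g2, merged) if g.is_valid]
print([g.id for g in valid], merged.member_ids)
```

Invalidated groups stay in the data rather than being deleted, so the history of how records were grouped remains traceable.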

Merge Flagged Records

• For all of the exact matches and authority matches
  – Use the authoritative form of the name
  – Combine data from each match into a single EAC-CPF record
  – Retain all source record IDs and information
• Finally, output the merged EAC-CPF records
  – Actually, store how to build the merged record in the database as well
    • Records can be regenerated as needed from the merge data
  – Assign permanent identifier for the merged record
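
The combination rules above can be sketched in Python with plain dictionaries. The field names (source_id, alternative_names, resources) and the function merge_group are hypothetical and chosen for this example; the real merge works on full EAC-CPF XML and also assigns a permanent identifier, which is left out here.

```python
def merge_group(authoritative_name, records):
    """Combine the records of one group into a single merged description."""
    merged = {
        "name": authoritative_name,   # the authoritative form wins
        "alternative_names": [],      # other forms seen in the sources
        "sources": [],                # every contributing source record ID
        "resources": [],              # linked resources, without repeats
    }
    for rec in records:
        merged["sources"].append(rec["source_id"])
        if rec["name"] != authoritative_name and rec["name"] not in merged["alternative_names"]:
            merged["alternative_names"].append(rec["name"])
        for res in rec.get("resources", []):
            if res not in merged["resources"]:
                merged["resources"].append(res)
    return merged


group = [
    {"source_id": "src-a", "name": "Tabb, John B. (John Banister), 1845-1909.",
     "resources": ["finding-aid-1"]},
    {"source_id": "src-b", "name": "Tabb, John Banister, 1845-1909.",
     "resources": ["finding-aid-1", "finding-aid-2"]},
]
result = merge_group("Tabb, John B. (John Banister), 1845-1909.", group)
print(result)
```

Keeping every source ID in the merged record is what makes it possible to regenerate or undo a merge later.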

Merge System Step 3: Create Output

• Using valid record groups:
  – generate merged EAC
  – assign permanent ARK ID
  – write to new EAC file
• Merged XML stored in db, referenced by record group
  – Do not need to regenerate XML
  – Keep track of assigned permanent IDs

Merging Conclusions

• There is no single merging method, but rather a staged set of approaches that lets us move from the simplest exact matches to (we hope) reliably identifying variant forms of a name when they are corroborated by contextual information (dates, including “active” dates, etc.)

Prototype Access System

http://socialarchive.iath.virginia.edu

SNAC: Social Networks and Archival Context

NAAC: National Archival Authorities Cooperative

Not the final name

http://socialarchive.iath.virginia.edu/NAAC_index.html

Activities

1. Cultivate EAC-CPF expertise across the archival community, through 140 SAA-hosted workshops

2. Develop a blueprint for a sustainable, national archival authority cooperative

Planning is being extended with a proposal to the Mellon Foundation.

Stay tuned for Spring 2014!

Brian Tingle and Adrian Turner

RBMS Pre-Conference 2012, San Diego, CA
