Ray R. Larson, School of Information, UC Berkeley Daniel Pitti, University of Virginia, Institute for Advanced Technology in the Humanities Yiming Liu, School of Information, UC Berkeley Brian Tingle, California Digital Library Adrian Turner, California Digital Library Rachel Hu, California Digital Library PNC 2013 – Kyoto, Japan Towards a Social Network of History


Ray R. Larson, School of Information, UC Berkeley
Daniel Pitti, University of Virginia, Institute for Advanced Technology in the Humanities
Yiming Liu, School of Information, UC Berkeley
Brian Tingle, California Digital Library
Adrian Turner, California Digital Library
Rachel Hu, California Digital Library

PNC 2013 – Kyoto, Japan

Towards a Social Network of History


http://socialarchive.iath.virginia.edu


Archival Name Authority System


Hamilton, Alexander, 1757-1804

Luce, Clare Boothe, 1903-1987

Oppenheimer, J. Robert, 1904-1967

Patton family

Patton, George S. (George Smith), 1885-1945

Sontag, Susan, 1933-2004

Washington, George, 1732-1799

Whitman, Walt, 1819-1892

Wright, Lloyd, 1890-1978

Archival Name Authority System


Anthony, Susan B

Berkeley Free Church

Bernstein, Leonard, 1918-

Block, Herbert, 1909-2001

Bush, Vannevar, 1890-1974

Frankfurter, Felix, 1882-1965

Franklin, Benjamin, 1706-1790

Fuller, R. Buckminster (Richard Buckminster), 1895-1983

Hamilton, Alexander, 1757-1804

Luce, Clare Boothe, 1903-1987

Oppenheimer, J. Robert, 1904-1967

Patton family

Patton, George S. (George Smith), 1885-1945

Sontag, Susan, 1933-2004

Washington, George, 1732-1799

Whitman, Walt, 1819-1892

Wright, Lloyd, 1890-1978

Archival Name Authority System


Engelland, Jurgen (George).
Enwall, Ogie (Aage).
Erickson, Selma Inez.
Fahl, Hans Johan Fredrik.
Fet, Peter Laurits.
Flones, Edward.
Fredrickson, Hans.
Fredrickson, Sven Fredrick.
Garberg, Peder.
Gillam, Chandler B., 1833-1899.
Halseth, Otto Hjalmer.
Handeland, Martha Tweiten.
Hansen, Anne Schmidt.
Hansen, Sylvia (Solveig).
Haug, Olga Karoline Nilsen.
Hemmestad, Olga Kristine Brodahl.
Henry, Oscar M., 1851-1916.
Holmes, Anna Gudrun Hauge.
Holmes, Elias Kristofferson Velholmen.
Hoset, Ole.
Howard, Barnett Allen, b. 1827.
Hytmo, Guri Olsdatter.
Johnson, Andrew (Anders Johansson).
Johnson, Phiea Petersen Stahl.
Johnson, Thelma Irene Underdal.
Jorgenson, Jorgen Aadneram.
Kjersem, Ole Johnson.
Knudsen, Johanne.
Kofoed, Thorvald Andreas.
Larsen, Elias.
Lillelien, Thor.
Loe, Otto Calvin.
Molund, Erik Wilhelm.
Nakkerud, Inga Amanda Treland.
Nakkerud, Trygve Bloch.
Nelson, Amanda.
Nerland, Einar Magnus.
Nielsen, Einer.
Nilsen, Martha Dagsvik.
Nissen, Ole Andreas Nissen.
Norberg, Jonas Walfred.
Norwick, Goodman.
Nygaard, Lars Thomas.
Odmark, Elsie Karlson.
Ohrt, Sigfrid Eidsness.
Oliver, Kole Skaflestad.
Olson, Alvin E.
Opsal, Cato Torvald.
Petersen, Greta Jensen.
Rasmussen, Martin.
Rinne, Esther Wiirre.
Rodney family
Sandback, George Brun.
Saure, Sivert Andreas.

Hamilton, Alexander, 1757-1804
Luce, Clare Boothe, 1903-1987
Oppenheimer, J. Robert, 1904-1967
Patton family
Patton, George S. (George Smith), 1885-1945
Sontag, Susan, 1933-2004
Washington, George, 1732-1799
Whitman, Walt, 1819-1892
Wright, Lloyd, 1890-1978

Archival Name Authority System


Anthony, Susan B
Berkeley Free Church
Bernstein, Leonard, 1918-
Block, Herbert, 1909-2001
Bush, Vannevar, 1890-1974
Frankfurter, Felix, 1882-1965
Franklin, Benjamin, 1706-1790
Fuller, R. Buckminster (Richard Buckminster), 1895-1983
Hamilton, Alexander, 1757-1804
Luce, Clare Boothe, 1903-1987
Oppenheimer, J. Robert, 1904-1967
Patton family
Patton, George S. (George Smith), 1885-1945
Sontag, Susan, 1933-2004
Washington, George, 1732-1799
Whitman, Walt, 1819-1892
Wright, Lloyd, 1890-1978

Engelland, Jurgen (George).
Enwall, Ogie (Aage).
Erickson, Selma Inez.
Fahl, Hans Johan Fredrik.
Fet, Peter Laurits.
Flones, Edward.
Fredrickson, Hans.
Fredrickson, Sven Fredrick.
Garberg, Peder.
Gillam, Chandler B., 1833-1899.
Halseth, Otto Hjalmer.
Handeland, Martha Tweiten.
Hansen, Anne Schmidt.
Hansen, Sylvia (Solveig).
Haug, Olga Karoline Nilsen.
Hemmestad, Olga Kristine Brodahl.
Henry, Oscar M., 1851-1916.
Holmes, Anna Gudrun Hauge.
Holmes, Elias Kristofferson Velholmen.
Hoset, Ole.
Howard, Barnett Allen, b. 1827.
Hytmo, Guri Olsdatter.
Johnson, Andrew (Anders Johansson).
Johnson, Phiea Petersen Stahl.
Johnson, Thelma Irene Underdal.
Jorgenson, Jorgen Aadneram.
Kjersem, Ole Johnson.
Knudsen, Johanne.
Kofoed, Thorvald Andreas.
Larsen, Elias.
Lillelien, Thor.
Loe, Otto Calvin.
Molund, Erik Wilhelm.
Nakkerud, Inga Amanda Treland.
Nakkerud, Trygve Bloch.
Nelson, Amanda.
Nerland, Einar Magnus.
Nielsen, Einer.
Nilsen, Martha Dagsvik.
Nissen, Ole Andreas Nissen.
Norberg, Jonas Walfred.
Norwick, Goodman.
Nygaard, Lars Thomas.
Odmark, Elsie Karlson.
Ohrt, Sigfrid Eidsness.
Oliver, Kole Skaflestad.
Olson, Alvin E.
Opsal, Cato Torvald.
Petersen, Greta Jensen.
Rasmussen, Martin.
Rinne, Esther Wiirre.
Rodney family
Sandback, George Brun.
Saure, Sivert Andreas.


Archival Name Authority System


Archival Name Authority System


Archival Name Authority System


Archival Name Authority System


Background

• Research and demonstration project
• Multi-year funding
• National Endowment for the Humanities (2010-2012)
• Andrew W. Mellon Foundation (2012-2014)
• Planning Project for Cooperative Service (2014-15 - Pending)


Objectives

1. Develop tools for extracting EAC-CPF records, drawing on existing data (EAD finding aids, MARC records)

2. Match, merge, and enhance; build a large test corpus of EAC-CPF records

3. Create a prototype biographical resource and access system, using those records


Project Team

• University of Virginia, Institute for Advanced Technology in the Humanities
  – Daniel Pitti (PI) and Worthy Martin
• UC Berkeley School of Information
  – Ray Larson and Yiming Liu
• California Digital Library
  – Rachael Hu, Brian Tingle, and Adrian Turner


Project Team

• Terry Catapano (Columbia University)
• Sara Sprenkle (Washington and Lee University)
• Sarah Wells (University of Virginia)
• Kathy Wisser (Simmons Graduate School of Library and Information Science)
• Tom Lynch (University of Illinois School of Library and Information Science)


EAC-CPF

• XML-based data structure standard for encoding archival authority records

• Authorized name headings for the entity
• Biographical/historical context for the entity
• Links to resources created by the entity
• Links to resources about the entity


Example EAD - Creator

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Transformed with v1v2002_4.xsl -->
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN" "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd" [
  <!ENTITY lcseal SYSTEM "http://lcweb2.loc.gov/xmlcommon/lcseal.jpg " NDATA jpeg>
]>
<ead>
  <eadheader repositoryencoding="iso15511" … >
    <eadid mainagencycode="dlc" countrycode="us" …>http://hdl.loc.gov/loc.mss/eadmss.ms003073</eadid>
    <filedesc>
      <titlestmt>
        <titleproper encodinganalog="245$a">Clement F. Haynsworth Papers</titleproper>
  …
  <unitid label="ID No." encodinganalog="590" countrycode="US" …>MSS79781</unitid>
  <origination label="Creator">
    <persname source="lcnaf" encodinganalog="100">Haynsworth, Clement F. (Clement Furman), 1912-1989</persname>
  </origination>
  <physdesc label="Extent">…
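As a rough illustration of how a creator name might be pulled out of a finding aid like the one above, here is a minimal Python sketch. It is not the project's extraction tool. The sample string is a trimmed, well-formed stand-in for the excerpt, the `archdesc`/`did` wrapper around `origination` is an assumption based on the usual EAD 2002 layout, and the function name `extract_creators` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the finding aid excerpt. The <archdesc>/<did> wrapper
# is an assumption; the slide only shows <unitid> and <origination>.
EAD_SAMPLE = """
<ead>
  <archdesc level="collection">
    <did>
      <unitid label="ID No." encodinganalog="590" countrycode="US">MSS79781</unitid>
      <origination label="Creator">
        <persname source="lcnaf" encodinganalog="100">Haynsworth, Clement F. (Clement Furman), 1912-1989</persname>
      </origination>
    </did>
  </archdesc>
</ead>
"""

# EAD name elements for persons, corporate bodies, and families (CPF).
NAME_TAGS = ("persname", "corpname", "famname")


def extract_creators(ead_xml):
    """Return one dict per name element found inside <origination>."""
    root = ET.fromstring(ead_xml)
    creators = []
    for origination in root.iter("origination"):
        for child in origination:
            if child.tag in NAME_TAGS:
                # Collapse internal whitespace left over from pretty-printing.
                text = " ".join("".join(child.itertext()).split())
                creators.append({
                    "type": child.tag,
                    "name": text,
                    "source": child.get("source"),
                })
    return creators


creators = extract_creators(EAD_SAMPLE)
```

Each returned dictionary keeps the element type and the `source` attribute, since a name marked `lcnaf` is a stronger candidate for exact matching than an uncontrolled one.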


Example EAD - bioghist

…
<bioghist encodinganalog="545">
  <head>Biographical Note</head>
  <chronlist>
    <listhead>
      <head01>Date</head01>
      <head02>Event</head02>
    </listhead>
    <chronitem>
      <date>1912, Oct. 30</date>
      <event>Born, Greenville, S.C.</event>
    </chronitem>
    <chronitem>
      <date>1933</date>
      <event>A.B., Furman University, Greenville, S.C.</event>
    </chronitem>
    <chronitem>
      <date>1936</date>
      <event>LL.B., Harvard University, Cambridge, Mass.</event>
    </chronitem>
    …
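The chronology above maps naturally onto date/event pairs. The following Python sketch shows one way to read it, assuming the fragment has been closed off into well-formed XML; the function name `chronology` is hypothetical and this is not the project's code.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, closed off so that it parses on its own.
BIOGHIST_SAMPLE = """
<bioghist encodinganalog="545">
  <head>Biographical Note</head>
  <chronlist>
    <listhead><head01>Date</head01><head02>Event</head02></listhead>
    <chronitem><date>1912, Oct. 30</date><event>Born, Greenville, S.C.</event></chronitem>
    <chronitem><date>1933</date><event>A.B., Furman University, Greenville, S.C.</event></chronitem>
    <chronitem><date>1936</date><event>LL.B., Harvard University, Cambridge, Mass.</event></chronitem>
  </chronlist>
</bioghist>
"""


def chronology(bioghist_xml):
    """Return (date, event) pairs in document order."""
    root = ET.fromstring(bioghist_xml)
    rows = []
    for item in root.iter("chronitem"):
        date = item.findtext("date", default="").strip()
        event = item.findtext("event", default="").strip()
        rows.append((date, event))
    return rows


events = chronology(BIOGHIST_SAMPLE)
```

The dates are kept as free text here because finding aids record them in varying forms ("1912, Oct. 30" next to "1933").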


[Figure: example names John Brennan, George Jones, Thomas Smith, Frederick Jones, Martha Jones]


Example EAD - controlaccess

…
</note>
<controlaccess>
  <head>People</head>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Barzun%2C+Jacques%2C+1907-+Correspondence.^">Barzun, Jacques, 1907- --Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Brennan%2C+William+J.+%28William+Joseph%29%2C+1906-1997+Correspondence.^">Brennan, William J. (William Joseph), 1906-1997--Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Burger%2C+Warren+E.%2C+1907-1995+Correspondence.^">Burger, Warren E., 1907-1995--Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Clark%2C+Tom+C.+%28Tom+Campbell%29%2C+1899-1977+Correspondence.^">Clark, Tom C. (Tom Campbell), 1899-1977--Correspondence.</persname>
  …
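Referenced names in `controlaccess` carry subject subdivisions such as "--Correspondence." that must be separated from the name before matching. The Python sketch below shows one way to do that, under the assumption that the first "--" in a heading marks the start of the subdivision. The sample omits the `altrender` attributes for brevity, and `referenced_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the excerpt, with the altrender attributes omitted.
CONTROLACCESS_SAMPLE = """
<controlaccess>
  <head>People</head>
  <persname encodinganalog="600" role="subject" source="lcnaf">Barzun, Jacques, 1907- --Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf">Brennan, William J. (William Joseph), 1906-1997--Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf">Burger, Warren E., 1907-1995--Correspondence.</persname>
</controlaccess>
"""


def referenced_names(xml_text):
    """Return (name, subdivision) pairs; subdivision is None when absent."""
    root = ET.fromstring(xml_text)
    names = []
    for element in root.iter("persname"):
        heading = "".join(element.itertext())
        # Assumption: the first "--" separates the name from its subdivision.
        name, _, subdivision = heading.partition("--")
        names.append((name.strip(), subdivision.strip().rstrip(".") or None))
    return names


names = referenced_names(CONTROLACCESS_SAMPLE)
```

Note that the open date in "Barzun, Jacques, 1907-" survives because the single hyphen is followed by a space before the "--" separator.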


Example EAD - scopecontent

…The most significant and frequent of Haynsworth's correspondents are Jacques Barzun, William J. Brennan, Warren E. Burger, Tom C. Clark, John Paul Frank, Ernest F. Hollings, Edward Moore Kennedy, J. Woodrow Lewis, Daniel John Meador, Arthur Raphael Miller, Richard M. Nixon, Lewis F. Powell, Jr., Strom Thurmond, Johnnie McKeiver Walters, Bernard J. Ward, and Charles Alan Wright.</p> </scopecontent>


Example EAD – unittitle

<c04 level="file">
  <did>
    <unitid>No. 7383 </unitid>
    <unittitle encodinganalog="245$a">Long Mfg. Co. v. Holliday</unittitle>
  </did>
</c04>
…
<c04 level="file">
  <did>
    <unitid>No. 7416 </unitid>
    <unittitle encodinganalog="245$a">Norfolk and Portsmouth Belt Line R.R. v. Brotherhood of R.R. Trainmen, Lodge No. 514</unittitle>
  </did>
</c04>
…
<c03 level="file">
  <did>
    <container type="box">201</container>
    <unittitle encodinganalog="245$a">Wright, Charles Alan, 1970-1989 </unittitle>
    <physdesc>
      <extent encodinganalog="300">(10 folders)</extent>


Data Sources

• EAD finding aids [~150,000]
  – 13 regional and statewide consortia
  – 35 repositories in US, UK, and France; multiple US federal agencies
• MARC21 records [~4.5 million]
  – OCLC WorldCat
• Authority records
  – OCLC Research: Virtual International Authority File (VIAF) [~16 million]
  – Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
  – Additional name records from Archives nationales, British Library, NARA, New York State Archives, and Smithsonian Institution Archives


Consortia

• Archives Florida
• ArchivesHub (UK)
• Arizona Archives Online
• EAD FACTORY (OhioLink)
• Five Colleges
• Maine Archival Collections Online (MACON)
• Northwest Digital Archives (NWDA)
• Online Archive of California
• Philadelphia Area Consortium of Special Collections Libraries (PACSCL)
• Rhode Island Archival & Manuscript Collections Online (RIAMCO)
• Rocky Mountain Online Archive (RMOA)
• Texas Archival Resources Online (TARO)
• Virginia Heritage

Individual institutions

• American Philosophical Society
• Archives nationales (France)
• Archives of American Art
• Bibliothèque nationale de France
• BnF Archives et manuscripts
• French Union Catalog
• Brigham Young University
• Church of Latter Day Saints Archives
• Columbia University
• Cornell University
• Duke University
• Harvard University
• Indiana University
• Library of Congress (publicly available without restriction)
• Minnesota Historical Society
• Massachusetts Institute of Technology
• National Library of Medicine
• New York Public Library
• New York University
• North Carolina State
• Northwestern University
• Princeton University
• Rutgers University
• Smithsonian Institution Archives
• Syracuse University
• University of Alabama
• University of Chicago
• University of Connecticut
• University of Delaware
• University of Florida
• University of Illinois
• University of Kansas
• University of Maryland
• University of Michigan Bentley & Special Collections
• University of Minnesota
• University of Nebraska
• University of North Carolina, Chapel Hill
• University of Utah
• Utah State Archives
• Utah State University
• Yale University


Data Sources

• EAD finding aids [~150,000]
  – 13 regional and statewide consortia
  – 35 repositories in US, UK, and France; multiple US federal agencies
• MARC21 records [~4.5 million]
  – OCLC WorldCat
• Authority records
  – OCLC Research: Virtual International Authority File (VIAF) [~16 million]
  – Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
  – Additional EAC-CPF (or other) name records from Archives nationales de France, British Library, NARA, New York State Archives, and Smithsonian Institution Archives


Methods and Processing

• Extract EAC-CPF records from existing EAD-encoded archival descriptions
  – Extracting both creators and referenced CPF names
• Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF)
  – Enhance EAC-CPF by normalizing entries, adding alternative entries, titles (VIAF), and historical data (ULAN)
• Create a prototype historical resource and access system
  – Historical data and social-professional networks
  – Links to archive, library, and museum resources (by and about)


The Problem

• Proliferation of the forms of names
  – Different names for the same person
  – Different people with the same names
• Examples
  – from Books in Print (semi-controlled but not consistent)
  – ERIC author index (not controlled)


Goethe

…etc…


John Muir


Library and Archive Authority Control

• Library (or bibliographic) authority control is almost exclusively about the control of names
• Archival authority control involves biographical-historical description of the CPF entity
  – Descriptions based on controlled vocabularies, for example, occupations, place of birth and death
  – But also biographical-historical description
    • Prose
    • Chronological list
• Archival authority control provides context for understanding records, the context of their creation, the provenance


Matching and Merging in SNAC 2

• Developing an updateable database of merged EAC data (moving from MongoDB to PostgreSQL)
  – Will permit incremental addition of new data and support editing and “forced” merges
• All original records and merged records will be in the database
• Permanent identifiers will be assigned to merged (and unmerged) EAC output records
  – Track these in the database


Merging EAC-CPF Records

[Diagram. Components: EAC Record Input; Postgres EAC Repository; Cheshire Search; VIAF Repository; LCNAF Repository; ULAN Repository. Steps: Connect exactly matching records; Connect records using name authority information; Merge. Output: Merged EAC Records Output & Repository of merged EAC Records]


Merge System Step 1: Load Original Records

CREATE TABLE original_records (
    id bigint NOT NULL,
    name character varying(255) DEFAULT ''::character varying NOT NULL,
    source_id character varying(255),
    collection_id character varying(64),
    path character varying(255),
    r_type character varying(64) NOT NULL,
    from_date date,
    from_date_type character varying(64),
    to_date date,
    to_date_type character varying(64),
    processed boolean DEFAULT false NOT NULL,
    last_processed timestamp without time zone,
    record_data text,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    record_group_id bigint
);

• Parse source EAC
  – Key attributes extracted for merge use
  – Original XML stored
• Timestamp for last merge run on record
  – Resumption of aborted merge runs or reruns
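To make the resumption idea concrete, here is a small Python sketch of how the `processed` flag and `last_processed` timestamp could drive a restartable merge run. It rests on several assumptions: an in-memory SQLite database stands in for PostgreSQL, only a subset of the columns is reproduced, the `r_type` values are illustrative guesses, and the helper names are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_demo_db():
    """Create an in-memory stand-in with a subset of the table's columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE original_records (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL DEFAULT '',
            r_type TEXT NOT NULL,
            processed INTEGER NOT NULL DEFAULT 0,
            last_processed TEXT,
            record_data TEXT
        )
        """
    )
    return conn


def load_record(conn, name, r_type, xml_text):
    """Store the key attributes alongside the original XML."""
    cursor = conn.execute(
        "INSERT INTO original_records (name, r_type, record_data) VALUES (?, ?, ?)",
        (name, r_type, xml_text),
    )
    return cursor.lastrowid


def pending_records(conn):
    """Records that no merge run has handled yet."""
    return conn.execute(
        "SELECT id, name FROM original_records WHERE processed = 0 ORDER BY id"
    ).fetchall()


def mark_processed(conn, record_id):
    """Flag a record as handled and stamp the time of the merge run."""
    conn.execute(
        "UPDATE original_records SET processed = 1, last_processed = ? WHERE id = ?",
        (datetime.now(timezone.utc).isoformat(), record_id),
    )


conn = open_demo_db()
# The r_type values and the XML bodies below are placeholders.
first = load_record(conn, "Tabb, John Banister, 1845-1909.", "person", "<eac-cpf/>")
second = load_record(conn, "A. C. Peters & Bro.", "corporateBody", "<eac-cpf/>")
mark_processed(conn, first)        # simulate a run that stopped after one record
remaining = pending_records(conn)  # a resumed run would start from these
```

A rerun could instead compare `last_processed` against the time of a rule change to decide which records need to be merged again.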


But…

• Exact merging assumes that archives are following LC cataloging practice in their EAD records
  – There are some problems with this assumption


Some failures for merging…

• Different abbreviations:
  – A. & G. Carisch & C.
  – A. & G. Carisch & Co.
• And spacing issues:
  – A. C. Peters & Bro.
  – A. C. Peters & Brother.
  – A. C. Peters. (??)
  – A. C.Peters & Bro.
• Completeness and alternate rules
  – Tabb, John B. (John Banister), 1845-1909.
  – Tabb, John Banister, 1845-1909.
• Also differing transliterations for non-Latin scripts
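The examples above show why byte-for-byte comparison misses obvious matches. As a hedged illustration, the Python sketch below builds a simple comparison key that ignores case, punctuation, and spacing. The function `match_key` is hypothetical and is not the project's matching rule. It repairs the spacing variants but deliberately leaves abbreviation and completeness differences unresolved, which is where authority files come in.

```python
import re


def match_key(name):
    """Lowercase the name and collapse punctuation and whitespace runs."""
    key = name.lower()
    key = re.sub(r"[.,;:()]", " ", key)     # punctuation becomes whitespace
    key = re.sub(r"\s+", " ", key).strip()  # squeeze runs of whitespace
    return key


spaced = match_key("A. C. Peters & Bro.")
unspaced = match_key("A. C.Peters & Bro.")
spelled_out = match_key("A. C. Peters & Brother.")
```

Here `spaced` and `unspaced` come out identical, while `spelled_out` still differs, so "Bro." versus "Brother" would need either an abbreviation table or evidence from an authority record.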


More…

• Variant romanizations (and spacing):
– M. P. Belaieff.
– M. P. Belaïeff.
– M. P. Bieliaev.
– M.P. Belaïeff.
– M.P.Belaïeff.

• Initials vs. names:
– Zabolotskii, N.A.
– Zabolotskii, Nikolai Alekseevich, 1903-1958.
– Zabolotskii.
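Some of the romanization variants differ only in diacritics and spacing, which can be folded away mechanically. The sketch below uses Python's standard unicodedata module; the helper names fold_diacritics and compact_key are hypothetical, and dropping every non-alphanumeric character is a deliberately aggressive assumption.

```python
import unicodedata


def fold_diacritics(name):
    """Decompose characters and drop combining marks, e.g. ï becomes i."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def compact_key(name):
    """Lowercase, fold diacritics, and keep only letters and digits."""
    folded = fold_diacritics(name).lower()
    return "".join(ch for ch in folded if ch.isalnum())


same = {compact_key(n) for n in ["M. P. Belaieff.", "M. P. Belaïeff.", "M.P.Belaïeff."]}
other = compact_key("M. P. Bieliaev.")
```

The three Belaieff forms collapse to a single key, but "Bieliaev" does not. A true variant romanization like that one, or initials versus full forenames, cannot be resolved by character folding alone.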


[Figure repeated: Merging EAC-CPF Records workflow diagram.]


Search Authority Files

• For each name, formulate a search of the VIAF database using the Cheshire system (SGML/XML retrieval system with probabilistic and Boolean matching)
– Search both the “authoritative” and “non-authoritative” forms
– Consider any name matching a non-authoritative form to be a candidate match for the authoritative form
– Flag EAC records that match the same authority record as potential matches
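The flagging logic in the bullets above can be sketched as follows. This is a toy stand-in: a dictionary lookup replaces the Cheshire probabilistic search, and the authority entries, identifiers, and function names are all invented for illustration (the names are borrowed from the earlier slides, but their pairing as authoritative and non-authoritative forms is an assumption).

```python
from collections import defaultdict

# Hypothetical toy authority file; real VIAF records look nothing like this.
AUTHORITY_FILE = {
    "auth-1": {
        "authoritative": "Tabb, John B. (John Banister), 1845-1909",
        "non_authoritative": ["Tabb, John Banister, 1845-1909"],
    },
    "auth-2": {
        "authoritative": "Zabolotskii, Nikolai Alekseevich, 1903-1958",
        "non_authoritative": ["Zabolotskii, N.A."],
    },
}


def normalize(name):
    """Light cleanup: lowercase, drop trailing period, collapse spaces."""
    return " ".join(name.lower().rstrip(".").split())


def build_index(authority_file):
    """Map every authoritative and non-authoritative form to its record ID."""
    index = {}
    for auth_id, entry in authority_file.items():
        for form in [entry["authoritative"], *entry["non_authoritative"]]:
            index[normalize(form)] = auth_id
    return index


def flag_potential_matches(eac_names, index):
    """Group EAC record IDs by matched authority record; keep groups of 2+."""
    groups = defaultdict(list)
    for record_id, name in eac_names.items():
        auth_id = index.get(normalize(name))
        if auth_id is not None:
            groups[auth_id].append(record_id)
    return {auth_id: ids for auth_id, ids in groups.items() if len(ids) > 1}


eac_names = {
    "rec-1": "Tabb, John B. (John Banister), 1845-1909.",
    "rec-2": "Tabb, John Banister, 1845-1909.",
    "rec-3": "Zabolotskii, N.A.",
}
flagged = flag_potential_matches(eac_names, build_index(AUTHORITY_FILE))
```

Here rec-1 and rec-2 are flagged together because both resolve to the same authority record, one through the authoritative form and one through a non-authoritative form. rec-3 finds an authority record but has no partner, so it is not flagged.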


Data Sources

• EAD finding aids [~150,000]
– 13 regional and statewide consortia
– 35 repositories in US, UK, and France; multiple US federal agencies

• MARC21 records [~4.5 million]
– OCLC WorldCat

• Authority records
– OCLC Research: Virtual International Authority File (VIAF) [~16 million]
– Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
– Additional name records from Archives nationales, British Library, NARA, New York State Archives, and Smithsonian Institution Archives


Shingle Language Model for names

Name: Einstein Albert

Shingle sequence: ein, ins, nst, ste, tei, ein … , ert

Probability that the sequence (ins, nst, ste) follows ein is very high for the name einstein

Krishna Janakiraman and Sean Marimpietri - Biograph

NGRAM or Shingle Matching
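A minimal Python sketch of the idea: split a name into overlapping three-character shingles, then count which shingle follows which. The first-order transition counts used here are a simplifying assumption, not the Biograph model itself, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def shingles(name, n=3):
    """Overlapping character n-grams over the lowercased name, spaces kept."""
    text = " ".join(name.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def transition_counts(names, n=3):
    """Count how often each shingle is immediately followed by another."""
    counts = defaultdict(Counter)
    for name in names:
        seq = shingles(name, n)
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def transition_probability(counts, current, following):
    """Estimate P(following | current) from the raw counts."""
    total = sum(counts[current].values())
    return counts[current][following] / total if total else 0.0


counts = transition_counts(["Einstein Albert"])
p_ins_after_ein = transition_probability(counts, "ein", "ins")
p_nst_after_ins = transition_probability(counts, "ins", "nst")
```

With a single training name, "ein" occurs twice (at the start and end of "einstein"), so it is followed by "ins" only half the time, while "ins" is always followed by "nst". A model trained on many name records would sharpen these estimates.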


Name 1: Einstein Albert
Name 2: Ainshtain Albert
Name 3: Albert Einstein

[Figure: shingle graphs for the three names. Nodes are character shingles such as ein, ins, nst, ste, tei, alb, lbe, and ert for "Einstein Albert" and "Albert Einstein", and ain, ins, nsh, sht, hta, tai, alb, lbe, and ert for "Ainshtain Albert".]

Shingle Language Model for names
Krishna Janakiraman and Sean Marimpietri - Biograph
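One simple way to compare the three names is the overlap of their shingle sets. The sketch below uses Jaccard similarity, which is a swapped-in, simpler measure chosen for illustration and not the language model named on the slide; the function names are hypothetical.

```python
def shingle_set(name, n=3):
    """Set of overlapping character n-grams over the lowercased name."""
    text = " ".join(name.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(name_a, name_b, n=3):
    """Shared shingles divided by all distinct shingles of both names."""
    a, b = shingle_set(name_a, n), shingle_set(name_b, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


reordered = jaccard("Einstein Albert", "Albert Einstein")
romanized = jaccard("Einstein Albert", "Ainshtain Albert")
```

Word order barely matters to shingle sets, so the reordered name scores higher than the variant romanization, which shares mostly the "albert" shingles. Any threshold for treating such a score as a match would have to be tuned on real data.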


Merge System Step 2: Record Matches

• Execute merge algorithm and create record groups
– pointers from original records to record groups
– Can be invalidated

• Matched authority record stored for reference

CREATE TABLE record_groups (
    id bigint NOT NULL,
    name character varying(255) DEFAULT ''::character varying NOT NULL,
    g_type character varying(64) NOT NULL,
    viaf_record text,
    ulan_record text,
    is_valid boolean,
    invalidated_by bigint,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

[Diagram: Original Records belong to Record Groups; a Record Group has many Original Records.]
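The relationship between the two tables can be exercised with a small Python sketch using the standard sqlite3 module. The schema is cut down to the columns needed here and uses SQLite types instead of PostgreSQL ones. The helper functions are hypothetical, and reading invalidated_by as "the ID of the group that supersedes this one" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record_groups (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    is_valid BOOLEAN,
    invalidated_by INTEGER
);
CREATE TABLE original_records (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL DEFAULT '',
    record_group_id INTEGER REFERENCES record_groups(id)
);
""")


def create_group(conn, name, record_ids):
    """Create a valid record group and point the original records at it."""
    cur = conn.execute(
        "INSERT INTO record_groups (name, is_valid) VALUES (?, 1)", (name,)
    )
    group_id = cur.lastrowid
    conn.executemany(
        "UPDATE original_records SET record_group_id = ? WHERE id = ?",
        [(group_id, rid) for rid in record_ids],
    )
    return group_id


def invalidate_group(conn, group_id, replacement_id):
    """Mark a group invalid; invalidated_by semantics are assumed here."""
    conn.execute(
        "UPDATE record_groups SET is_valid = 0, invalidated_by = ? WHERE id = ?",
        (replacement_id, group_id),
    )


conn.executemany(
    "INSERT INTO original_records (id, name) VALUES (?, ?)",
    [(1, "A. C. Peters & Bro."), (2, "A. C.Peters & Bro."), (3, "A. C. Peters & Brother.")],
)
first = create_group(conn, "A. C. Peters & Bro.", [1, 2])
second = create_group(conn, "A. C. Peters & Bro.", [1, 2, 3])
invalidate_group(conn, first, second)

valid_groups = conn.execute(
    "SELECT id FROM record_groups WHERE is_valid = 1"
).fetchall()
members = conn.execute(
    "SELECT id FROM original_records WHERE record_group_id = ? ORDER BY id",
    (second,),
).fetchall()
```

Invalidating a group rather than deleting it keeps a trail of earlier merge decisions, which fits the slide's note that groups "can be invalidated".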


[Figure repeated: Merging EAC-CPF Records workflow diagram.]


Merge Flagged Records

• For all of the exact matches and authority matches
– Use the Authoritative form of the name
– Combine data from each match into a single EAC-CPF record
– Retain all source record IDs and information

• Finally, output the merged EAC-CPF records
– Actually – store how to build the merged record in the database as well
• Records can be regenerated as needed from the merge data
– Assign permanent identifier for the merged record
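The idea of storing how to build a merged record, rather than only the result, can be sketched in Python. The recipe structure, field names, and sample data below are all invented for illustration, and the choice of which Tabb form is authoritative is arbitrary.

```python
def build_merged_record(recipe, records_by_id):
    """Rebuild a merged record from a stored recipe (hypothetical structure)."""
    merged = {
        "name": recipe["authoritative_name"],
        "alternate_names": [],
        "source_ids": list(recipe["source_ids"]),  # every source ID is retained
        "collections": [],
    }
    for source_id in recipe["source_ids"]:
        record = records_by_id[source_id]
        # Keep variant name forms that differ from the authoritative one.
        if record["name"] != merged["name"] and record["name"] not in merged["alternate_names"]:
            merged["alternate_names"].append(record["name"])
        # Combine data from each source without duplicating entries.
        for collection in record["collections"]:
            if collection not in merged["collections"]:
                merged["collections"].append(collection)
    return merged


# Toy source records, invented for this sketch.
records_by_id = {
    "src-1": {
        "name": "Tabb, John B. (John Banister), 1845-1909.",
        "collections": ["collection-A"],
    },
    "src-2": {
        "name": "Tabb, John Banister, 1845-1909.",
        "collections": ["collection-A", "collection-B"],
    },
}
recipe = {
    "authoritative_name": "Tabb, John B. (John Banister), 1845-1909.",
    "source_ids": ["src-1", "src-2"],
}
merged = build_merged_record(recipe, records_by_id)
```

Because the recipe holds only the authoritative name and the list of source IDs, the merged record can be rebuilt whenever a source record changes, without rerunning the matching stage.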


Merge System Step 3: Create Output

• Using valid record groups:
– generate merged EAC
– assign permanent ARK ID
– write to new EAC file

• Merged XML stored in db, referenced by record group
– Do not need to regenerate XML
– Keep track of assigned permanent IDs


Merging Conclusions

• There is not a single merging method, but a staged set of approaches that will allow us to go from the simplest exact matches to (we hope) reliably identifying variant forms of a name when corroborated by contextual information (dates, including “active” dates, etc.)


Prototype Access System


http://socialarchive.iath.virginia.edu


SNAC: Social Networks and Archival Context


NAAC: National Archival Authorities Cooperative

Not the final name


NAAC: National Archival Authorities Cooperative

http://socialarchive.iath.virginia.edu/NAAC_index.html


Activities

1. Cultivate EAC-CPF expertise across the archival community, through 140 SAA-hosted workshops

2. Develop a blueprint for a sustainable, national archival authority cooperative


Planning is being extended with a proposal to the Mellon Foundation.

Stay tuned for Spring 2014!


Brian Tingle and Adrian Turner

RBMS Pre-Conference 2012, San Diego, CA