Ray R. Larson, School of Information, UC Berkeley
Daniel Pitti, University of Virginia, Institute for Advanced Technology in the Humanities
Yiming Liu, School of Information, UC Berkeley
Brian Tingle, California Digital Library
Adrian Turner, California Digital Library
Rachael Hu, California Digital Library
PNC 2013 – Kyoto, Japan
Towards a Social Network of History
http://socialarchive.iath.virginia.edu
Archival Name Authority System
Hamilton, Alexander, 1757-1804
Luce, Clare Boothe, 1903-1987
Oppenheimer, J. Robert, 1904-1967
Patton family
Patton, George S. (George Smith), 1885-1945
Sontag, Susan, 1933-2004
Washington, George, 1732-1799
Whitman, Walt, 1819-1892
Wright, Lloyd, 1890-1978
Anthony, Susan B
Berkeley Free Church
Bernstein, Leonard, 1918-
Block, Herbert, 1909-2001
Bush, Vannevar, 1890-1974
Frankfurter, Felix, 1882-1965
Franklin, Benjamin, 1706-1790
Fuller, R. Buckminster (Richard Buckminster), 1895-1983
Engelland, Jurgen (George).
Enwall, Ogie (Aage).
Erickson, Selma Inez.
Fahl, Hans Johan Fredrik.
Fet, Peter Laurits.
Flones, Edward.
Fredrickson, Hans.
Fredrickson, Sven Fredrick.
Garberg, Peder.
Gillam, Chandler B., 1833-1899.
Halseth, Otto Hjalmer.
Handeland, Martha Tweiten.
Hansen, Anne Schmidt.
Hansen, Sylvia (Solveig).
Haug, Olga Karoline Nilsen.
Hemmestad, Olga Kristine Brodahl.
Henry, Oscar M., 1851-1916.
Holmes, Anna Gudrun Hauge.
Holmes, Elias Kristofferson Velholmen.
Hoset, Ole.
Howard, Barnett Allen, b. 1827.
Hytmo, Guri Olsdatter.
Johnson, Andrew (Anders Johansson).
Johnson, Phiea Petersen Stahl.
Johnson, Thelma Irene Underdal.
Jorgenson, Jorgen Aadneram.
Kjersem, Ole Johnson.
Knudsen, Johanne.
Kofoed, Thorvald Andreas.
Larsen, Elias.
Lillelien, Thor.
Loe, Otto Calvin.
Molund, Erik Wilhelm.
Nakkerud, Inga Amanda Treland.
Nakkerud, Trygve Bloch.
Nelson, Amanda.
Nerland, Einar Magnus.
Nielsen, Einer.
Nilsen, Martha Dagsvik.
Nissen, Ole Andreas Nissen.
Norberg, Jonas Walfred.
Norwick, Goodman.
Nygaard, Lars Thomas.
Odmark, Elsie Karlson.
Ohrt, Sigfrid Eidsness.
Oliver, Kole Skaflestad.
Olson, Alvin E.
Opsal, Cato Torvald.
Petersen, Greta Jensen.
Rasmussen, Martin.
Rinne, Esther Wiirre.
Rodney family
Sandback, George Brun.
Saure, Sivert Andreas.
Background
• Research and demonstration project
• Multi-year funding
• National Endowment for the Humanities (2010-2012)
• Andrew W. Mellon Foundation (2012-2014)
• Planning Project for Cooperative Service (2014-15, pending)
Objectives
1. Develop tools for extracting EAC-CPF records, drawing on existing data (EAD finding aids, MARC records)
2. Match, merge, and enhance; build a large test corpus of EAC-CPF records
3. Create a prototype biographical resource and access system, using those records
Project Team
• University of Virginia, Institute for Advanced Technology in the Humanities
– Daniel Pitti (PI) and Worthy Martin
• UC Berkeley School of Information
– Ray Larson and Yiming Liu
• California Digital Library
– Rachael Hu, Brian Tingle, and Adrian Turner
• Terry Catapano (Columbia University)
• Sara Sprenkle (Washington and Lee University)
• Sarah Wells (University of Virginia)
• Kathy Wisser (Simmons Graduate School of Library and Information Science)
• Tom Lynch (University of Illinois School of Library and Information Science)
EAC-CPF
• XML-based data structure standard for encoding archival authority records
• Authorized name headings for the entity
• Biographical/historical context for the entity
• Links to resources created by the entity
• Links to resources about the entity
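The four parts listed above can be sketched as a skeletal EAC-CPF record. This is an illustration only, not the project's extraction code: the helper `build_eac_cpf` and its parameters are hypothetical, the required `control` section is omitted, and the result is not meant to be schema-valid.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # namespace of the EAC-CPF schema


def build_eac_cpf(name, bio, created, about):
    """Build a skeletal EAC-CPF tree (hypothetical helper, not schema-valid)."""
    ET.register_namespace("", EAC_NS)

    def el(parent, tag, text=None, **attrs):
        node = ET.SubElement(parent, f"{{{EAC_NS}}}{tag}", attrs)
        node.text = text
        return node

    root = ET.Element(f"{{{EAC_NS}}}eac-cpf")
    cpf = el(root, "cpfDescription")

    # Authorized name heading for the entity
    identity = el(cpf, "identity")
    el(identity, "entityType", "person")
    el(el(identity, "nameEntry"), "part", name)

    # Biographical/historical context
    description = el(cpf, "description")
    el(el(description, "biogHist"), "p", bio)

    # Links to resources created by, and about, the entity
    relations = el(cpf, "relations")
    for rel_type, titles in (("creatorOf", created), ("subjectOf", about)):
        for title in titles:
            rel = el(relations, "resourceRelation", resourceRelationType=rel_type)
            el(rel, "relationEntry", title)
    return root


record = build_eac_cpf(
    "Haynsworth, Clement F. (Clement Furman), 1912-1989",
    "Born 1912, Greenville, S.C.",
    created=["Clement F. Haynsworth Papers"],
    about=[],
)
xml_text = ET.tostring(record, encoding="unicode")
```

The name and collection title come from the EAD example in this deck; the one-line biography is a placeholder.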
Example EAD - Creator
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Transformed with v1v2002_4.xsl -->
<!DOCTYPE ead PUBLIC "+//ISBN 1-931666-00-8//DTD ead.dtd (Encoded Archival Description (EAD) Version 2002)//EN"
  "http://lcweb2.loc.gov/xmlcommon/dtds/ead2002/ead.dtd" [
  <!ENTITY lcseal SYSTEM "http://lcweb2.loc.gov/xmlcommon/lcseal.jpg" NDATA jpeg>
]>
<ead>
  <eadheader repositoryencoding="iso15511" … >
    <eadid mainagencycode="dlc" countrycode="us" …>http://hdl.loc.gov/loc.mss/eadmss.ms003073</eadid>
    <filedesc>
      <titlestmt>
        <titleproper encodinganalog="245$a">Clement F. Haynsworth Papers</titleproper>
  …
  <unitid label="ID No." encodinganalog="590" countrycode="US" …>MSS79781</unitid>
  <origination label="Creator">
    <persname source="lcnaf" encodinganalog="100">Haynsworth, Clement F. (Clement Furman), 1912-1989</persname>
  </origination>
  <physdesc label="Extent">…
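A minimal sketch of how the creator name might be pulled out of `origination`, using only the Python standard library. The function name `extract_creators` and the trimmed-down sample document are assumptions for illustration; real finding aids carry a DOCTYPE, entities, and far more structure than this handles.

```python
import xml.etree.ElementTree as ET

# EAD 2002 name elements that can appear inside <origination>
NAME_TAGS = ("persname", "corpname", "famname")


def extract_creators(ead_xml):
    """Return (element name, heading, source) for each name in <origination>."""
    root = ET.fromstring(ead_xml)
    creators = []
    for origination in root.iter("origination"):
        for child in origination:
            if child.tag in NAME_TAGS:
                # Collapse any internal line breaks or runs of spaces
                heading = " ".join("".join(child.itertext()).split())
                creators.append((child.tag, heading, child.get("source")))
    return creators


# Reduced sample based on the slide; the archdesc/did wrapper is assumed
SAMPLE = """<ead><archdesc><did>
  <origination label="Creator">
    <persname source="lcnaf" encodinganalog="100">Haynsworth, Clement F. (Clement Furman), 1912-1989</persname>
  </origination>
</did></archdesc></ead>"""

creators = extract_creators(SAMPLE)
```

Each tuple carries the element name, which tells a downstream step whether the entity is a person, corporate body, or family.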
Example EAD - bioghist
…
<bioghist encodinganalog="545">
  <head>Biographical Note</head>
  <chronlist>
    <listhead>
      <head01>Date</head01>
      <head02>Event</head02>
    </listhead>
    <chronitem>
      <date>1912, Oct. 30</date>
      <event>Born, Greenville, S.C.</event>
    </chronitem>
    <chronitem>
      <date>1933</date>
      <event>A.B., Furman University, Greenville, S.C.</event>
    </chronitem>
    <chronitem>
      <date>1936</date>
      <event>LL.B., Harvard University, Cambridge, Mass.</event>
    </chronitem>
    …
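The chronology above maps naturally onto (date, event) pairs that could later feed the biographical part of an EAC-CPF record. A small sketch, assuming a complete `bioghist` element as input; `parse_chronlist` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_chronlist(bioghist_xml):
    """Return a list of (date, event) string pairs from an EAD <chronlist>."""
    root = ET.fromstring(bioghist_xml)
    events = []
    for item in root.iter("chronitem"):
        date = item.findtext("date", default="").strip()
        event = item.findtext("event", default="").strip()
        events.append((date, event))
    return events


# First two entries from the slide, closed off so the sample is well-formed
SAMPLE = """<bioghist encodinganalog="545">
  <head>Biographical Note</head>
  <chronlist>
    <chronitem><date>1912, Oct. 30</date><event>Born, Greenville, S.C.</event></chronitem>
    <chronitem><date>1933</date><event>A.B., Furman University, Greenville, S.C.</event></chronitem>
  </chronlist>
</bioghist>"""

events = parse_chronlist(SAMPLE)
```

Dates are kept as the free-text strings found in the finding aid; normalizing them is a separate problem.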
[Figure: diagram with four boxes labelled "Title" and the names John Brennan, George Jones, Thomas Smith, Frederick Jones, and Martha Jones]
Example EAD - controlaccess
…
</note>
<controlaccess>
  <head>People</head>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Barzun%2C+Jacques%2C+1907-+Correspondence.^">Barzun, Jacques, 1907- --Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Brennan%2C+William+J.+%28William+Joseph%29%2C+1906-1997+Correspondence.^">Brennan, William J. (William Joseph), 1906-1997--Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Burger%2C+Warren+E.%2C+1907-1995+Correspondence.^">Burger, Warren E., 1907-1995--Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf" altrender=":::PWEBRECON=^Clark%2C+Tom+C.+%28Tom+Campbell%29%2C+1899-1977+Correspondence.^">Clark, Tom C. (Tom Campbell), 1899-1977--Correspondence.</persname>
  …
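Names in `controlaccess` carry a subject subdivision after a double hyphen, so the heading has to be separated before it can be matched against other records. A sketch under that assumption; `referenced_names` is a hypothetical helper and the sample omits the `altrender` attributes for brevity.

```python
import xml.etree.ElementTree as ET


def referenced_names(controlaccess_xml):
    """Split each <persname> into its name heading and subject subdivision."""
    root = ET.fromstring(controlaccess_xml)
    names = []
    for node in root.iter("persname"):
        text = " ".join("".join(node.itertext()).split())
        # "Heading--Subdivision." ; partition leaves the subdivision empty
        # when no double hyphen is present
        heading, _, subdivision = text.partition("--")
        names.append({
            "heading": heading.strip(),
            "subdivision": subdivision.strip().rstrip("."),
            "role": node.get("role"),
            "source": node.get("source"),
        })
    return names


SAMPLE = """<controlaccess>
  <head>People</head>
  <persname encodinganalog="600" role="subject" source="lcnaf">Barzun, Jacques, 1907- --Correspondence.</persname>
  <persname encodinganalog="600" role="subject" source="lcnaf">Brennan, William J. (William Joseph), 1906-1997--Correspondence.</persname>
</controlaccess>"""

names = referenced_names(SAMPLE)
```

Note that an open life date such as "1907-" ends in a single hyphen, which is why the split is on the double hyphen and the heading is stripped afterwards.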
Example EAD - scopecontent
…The most significant and frequent of Haynsworth's correspondents are Jacques Barzun, William J. Brennan, Warren E. Burger, Tom C. Clark, John Paul Frank, Ernest F. Hollings, Edward Moore Kennedy, J. Woodrow Lewis, Daniel John Meador, Arthur Raphael Miller, Richard M. Nixon, Lewis F. Powell, Jr., Strom Thurmond, Johnnie McKeiver Walters, Bernard J. Ward, and Charles Alan Wright.</p> </scopecontent>
…
Example EAD – unittitle
<c04 level="file">
  <did>
    <unitid>No. 7383 </unitid>
    <unittitle encodinganalog="245$a">Long Mfg. Co. v. Holliday</unittitle>
  </did>
</c04>
…
<c04 level="file">
  <did>
    <unitid>No. 7416 </unitid>
    <unittitle encodinganalog="245$a">Norfolk and Portsmouth Belt Line R.R. v. Brotherhood of R.R. Trainmen, Lodge No. 514</unittitle>
  </did>
</c04>
…
<c03 level="file">
  <did>
    <container type="box">201</container>
    <unittitle encodinganalog="245$a">Wright, Charles Alan, 1970-1989 </unittitle>
    <physdesc>
      <extent encodinganalog="300">(10 folders)</extent>
Data Sources
• EAD finding aids [~150,000]
– 13 regional and statewide consortia
– 35 repositories in US, UK, and France; multiple US federal agencies
• MARC21 records [~4.5 million]
– OCLC WorldCat
• Authority records
– OCLC Research: Virtual International Authority File (VIAF) [~16 million]
– Getty Vocabulary Program: Union List of Artist Names (ULAN) [~120,000]
– Additional EAC-CPF (or other) name records from Archives nationales de France, British Library, NARA, New York State Archives, and Smithsonian Institution Archives
Consortia
• Archives Florida
• ArchivesHub (UK)
• Arizona Archives Online
• EAD FACTORY (OhioLink)
• Five Colleges
• Maine Archival Collections Online (MACON)
• Northwest Digital Archives (NWDA)
• Online Archive of California
• Philadelphia Area Consortium of Special Collections Libraries (PACSCL)
• Rhode Island Archival & Manuscript Collections Online (RIAMCO)
• Rocky Mountain Online Archive (RMOA)
• Texas Archival Resources Online (TARO)
• Virginia Heritage
Individual institutions
• American Philosophical Society
• Archives nationales (France)
• Archives of American Art
• Bibliothèque nationale de France
• BnF Archives et manuscripts
• French Union Catalog
• Brigham Young University
• Church of Latter Day Saints Archives
• Columbia University
• Cornell University
• Duke University
• Harvard University
• Indiana University
• Library of Congress (publicly available without restriction)
• Minnesota Historical Society
• Massachusetts Institute of Technology
• National Library of Medicine
• New York Public Library
• New York University
• North Carolina State
• Northwestern University
• Princeton University
• Rutgers University
• Smithsonian Institution Archives
• Syracuse University
• University of Alabama
• University of Chicago
• University of Connecticut
• University of Delaware
• University of Florida
• University of Illinois
• University of Kansas
• University of Maryland
• University of Michigan Bentley & Special Collections
• University of Minnesota
• University of Nebraska
• University of North Carolina, Chapel Hill
• University of Utah
• Utah State Archives
• Utah State University
• Yale University
Methods and Processing
• Extract EAC-CPF records from existing EAD-encoded archival descriptions
– Extracting both creators and referenced CPF names
• Match EAC-CPF records against one another and against existing authority records (ULAN, VIAF, LCNAF)
– Enhance EAC-CPF by normalizing entries, adding alternative entries, titles (VIAF), and historical data (ULAN)
• Create a prototype historical resource and access system
– Historical data and social-professional networks
– Links to archive, library, and museum resources (by and about)
The Problem
• Proliferation of the forms of names
– Different names for the same person
– Different people with the same names
• Examples
– from Books in Print (semi-controlled but not consistent)
– ERIC author index (not controlled)
Goethe
…etc…
John Muir
Library and Archive Authority Control
• Library (or bibliographic) authority control is almost exclusively about the control of names
• Archival authority control involves biographical-historical description of the CPF entity
– Descriptions based on controlled vocabularies, for example, occupations, place of birth and death
– But also biographical-historical description
  • Prose
  • Chronological list
• Archival authority control provides context for understanding records, the context of their creation, the provenance
Matching and Merging in SNAC 2
• Developing an updateable database of merged EAC data (moving from MongoDB to PostgreSQL)
– Will permit incremental addition of new data and support editing and “forced” merges
• All original records and merged records will be in the database
• Permanent identifiers will be assigned to merged (and unmerged) EAC output records
– Track these in the database
[Figure: Merging EAC-CPF Records. Components shown: EAC record input; connect exactly matching records; connect records using name authority information (Cheshire search over the VIAF, LCNAF, and ULAN repositories); merge; merged EAC records output; Postgres EAC repository and repository of merged EAC records.]
Merge System Step 1: Load Original Records
• Parse source EAC
– Key attributes extracted for merge use
– Original XML stored
• Timestamp for last merge run on record
– Resumption of aborted merge runs or reruns

CREATE TABLE original_records (
    id bigint NOT NULL,
    -- key attributes extracted from the source EAC for merge use
    name character varying(255) DEFAULT ''::character varying NOT NULL,
    source_id character varying(255),
    collection_id character varying(64),
    path character varying(255),
    r_type character varying(64) NOT NULL,
    from_date date,
    from_date_type character varying(64),
    to_date date,
    to_date_type character varying(64),
    -- bookkeeping for the last merge run on this record
    processed boolean DEFAULT false NOT NULL,
    last_processed timestamp without time zone,
    -- original XML
    record_data text,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL,
    record_group_id bigint
);
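The `processed` flag and `last_processed` timestamp are what make a merge run resumable. The sketch below shows the idea with Python's built-in `sqlite3` standing in for PostgreSQL and a cut-down version of the table; the function names and the `handle` callback are assumptions, not the project's code.

```python
import sqlite3
from datetime import datetime, timezone


def open_db():
    """In-memory stand-in for the PostgreSQL table (reduced column set)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE original_records (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL DEFAULT '',
               r_type TEXT NOT NULL,
               processed INTEGER NOT NULL DEFAULT 0,
               last_processed TEXT,
               record_data TEXT)"""
    )
    return db


def load_record(db, name, r_type, xml):
    """Store the extracted key attributes alongside the original XML."""
    db.execute(
        "INSERT INTO original_records (name, r_type, record_data) VALUES (?, ?, ?)",
        (name, r_type, xml),
    )


def run_merge_pass(db, handle, limit=None):
    """Process unprocessed rows in id order; safe to call again after an abort."""
    rows = db.execute(
        "SELECT id, name FROM original_records WHERE processed = 0 ORDER BY id"
    ).fetchall()
    done = 0
    for rec_id, name in rows[:limit]:
        handle(rec_id, name)
        stamp = datetime.now(timezone.utc).isoformat()
        db.execute(
            "UPDATE original_records SET processed = 1, last_processed = ? WHERE id = ?",
            (stamp, rec_id),
        )
        db.commit()  # commit per record so an interrupted run loses no progress
        done += 1
    return done
```

Calling `run_merge_pass` a second time picks up only the rows the first call did not reach, which is the "resumption of aborted merge runs" behaviour the slide describes.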
But…
• Exact merging assumes that archives are following LC cataloging practice in their EAD records
– There are some problems with this assumption
Some failures for merging…
• Different abbreviations:
– A. & G. Carisch & C.
– A. & G. Carisch & Co.
• And spacing issues:
– A. C. Peters & Bro.
– A. C. Peters & Brother.
– A. C. Peters. (??)
– A. C.Peters & Bro.
• Completeness and alternate rules
– Tabb, John B. (John Banister), 1845-1909.
– Tabb, John Banister, 1845-1909.
• Also differing transliterations for non-Latin scripts
More…
• Variant romanizations (and spacing):
– M. P. Belaieff.
– M. P. Belaïeff.
– M. P. Bieliaev.
– M.P. Belaïeff.
– M.P.Belaïeff.
• Initials vs. names:
– Zabolotskii, N.A.
– Zabolotskii, Nikolai Alekseevich, 1903-1958.
– Zabolotskii.
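Some of the failures above disappear under simple string normalization, while others do not. The sketch below is one possible normalization (strip diacritics, ignore case, punctuation, and spacing), offered as an assumption rather than the rule set SNAC applies; `normalize_name` is a hypothetical helper.

```python
import re
import unicodedata


def normalize_name(name):
    """Fold case, drop diacritics, and reduce punctuation/spacing to single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", without_marks.casefold()).strip()


# Spacing and diacritic variants collapse to one key...
same_peters = normalize_name("A. C. Peters & Bro.") == normalize_name("A. C.Peters & Bro.")
same_belaieff = normalize_name("M. P. Belaieff.") == normalize_name("M.P.Belaïeff.")

# ...but abbreviations, fuller forms, and different romanizations do not
diff_abbrev = normalize_name("A. C. Peters & Bro.") != normalize_name("A. C. Peters & Brother.")
diff_roman = normalize_name("M. P. Belaieff.") != normalize_name("M. P. Bieliaev.")
diff_tabb = normalize_name("Tabb, John B. (John Banister), 1845-1909.") != normalize_name(
    "Tabb, John Banister, 1845-1909."
)
```

The cases that survive normalization are the ones that need authority files or fuzzier matching.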
Search Authority Files
• For each name, formulate a search of the VIAF database using the Cheshire system (SGML/XML retrieval system with probabilistic and Boolean matching)
– Search both the “authoritative” and “non-authoritative” forms
– Consider any name matching a non-authoritative form to be a candidate match for the authoritative form
– Flag EAC records that match the same authority record as potential matches
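The flagging logic can be illustrated with a toy, in-memory authority file and exact string lookup. This is not Cheshire and not VIAF: the identifiers, the choice of which form counts as authoritative, and the function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical authority file: id -> (authoritative form, non-authoritative forms).
# Which form is authoritative here is an assumption made for illustration.
AUTHORITY = {
    "auth-1": ("Tabb, John B. (John Banister), 1845-1909.",
               ["Tabb, John Banister, 1845-1909."]),
    "auth-2": ("M. P. Belaieff.", ["M. P. Belaïeff.", "M. P. Bieliaev."]),
}


def build_index(authority):
    """Map every authoritative and non-authoritative form to its authority id."""
    index = {}
    for auth_id, (authoritative, variants) in authority.items():
        for form in [authoritative, *variants]:
            index[form] = auth_id
    return index


def flag_potential_matches(eac_names, index):
    """Return {authority id: [record ids]} for authorities hit by 2+ EAC records."""
    hits = defaultdict(list)
    for record_id, name in eac_names.items():
        auth_id = index.get(name)
        if auth_id is not None:
            hits[auth_id].append(record_id)
    return {auth_id: ids for auth_id, ids in hits.items() if len(ids) > 1}


eac_names = {
    "r1": "Tabb, John Banister, 1845-1909.",
    "r2": "Tabb, John B. (John Banister), 1845-1909.",
    "r3": "M. P. Bieliaev.",
    "r4": "Zabolotskii.",
}
flagged = flag_potential_matches(eac_names, build_index(AUTHORITY))
```

Records r1 and r2 reach the same authority entry through different forms, so they are flagged as a potential match; r3 matches an authority record alone and r4 matches nothing.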
Shingle Language Model for names
Name: Einstein Albert
Shingle sequence: ein, ins, nst, ste, tei, ein … , ert
Probability that the sequence (ins, nst, ste) follows ein is very high for the name einstein
Krishna Janakiraman and Sean Marimpietri - Biograph
NGRAM or Shingle Matching
Name 1: Einstein Albert
Name 2: Ainshtain Albert
Name 3: Albert Einstein
[Figure: character trigram (shingle) sets for the three names, for example ein, ins, nst, ste, tei, alb, lbe, ert for "Einstein Albert" and ain, ins, nsh, sht, hta, tai for "Ainshtain", showing which shingles the names share]
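A minimal sketch of shingle matching using set overlap (Jaccard similarity) over character trigrams. Note that this is a simpler technique than the shingle language model named on the slide, which scores the probability of shingle sequences; the function names and the choice to shingle each word separately are assumptions.

```python
def shingles(name, n=3):
    """Set of character n-grams taken from each word of the name."""
    grams = set()
    for token in name.casefold().split():
        if len(token) < n:
            grams.add(token)  # keep short tokens such as initials whole
        for i in range(len(token) - n + 1):
            grams.add(token[i:i + n])
    return grams


def jaccard(a, b):
    """Shared shingles divided by all shingles; 0.0 when both names are empty."""
    sa, sb = shingles(a), shingles(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


reordered = jaccard("Einstein Albert", "Albert Einstein")
romanized = jaccard("Einstein Albert", "Ainshtain Albert")
unrelated = jaccard("Einstein Albert", "Muir John")
```

Because shingles are taken per word, word order does not matter, so the reordered name scores as identical; the variant romanization scores between the identical and unrelated cases.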
Merge System Step 2: Record Matches
• Execute merge algorithm and create record groups
– pointers from original records to record groups
– Can be invalidated
• Matched authority record stored for reference

CREATE TABLE record_groups (
    id bigint NOT NULL,
    name character varying(255) DEFAULT ''::character varying NOT NULL,
    g_type character varying(64) NOT NULL,
    -- matched authority records, stored for reference
    viaf_record text,
    ulan_record text,
    is_valid boolean,
    invalidated_by bigint,
    created_at timestamp without time zone NOT NULL,
    updated_at timestamp without time zone NOT NULL
);

[Diagram: each original record belongs to a record group; a record group has many original records]
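The belongs-to / has-many relationship and the ability to invalidate a group can be sketched in memory as follows. The grouping key is passed in as a function, and the reading of `invalidated_by` as "the group that supersedes this one" is an assumption; the slide does not define it.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class RecordGroup:
    id: int
    name: str
    is_valid: bool = True
    invalidated_by: Optional[int] = None  # assumed: id of the superseding group
    member_ids: list = field(default_factory=list)


def build_groups(records, key):
    """Group {record id: name} by key(name); return (groups, record -> group id)."""
    ids = count(1)
    by_key = {}
    pointers = {}
    for rec_id, name in records.items():
        k = key(name)
        group = by_key.get(k)
        if group is None:
            group = by_key[k] = RecordGroup(id=next(ids), name=name)
        group.member_ids.append(rec_id)   # group has many records
        pointers[rec_id] = group.id       # record belongs to one group
    return list(by_key.values()), pointers


def invalidate(group, replacement):
    """Mark a group invalid without deleting it, remembering what replaced it."""
    group.is_valid = False
    group.invalidated_by = replacement.id


records = {1: "Hoset, Ole.", 2: "hoset, ole.", 3: "Nelson, Amanda."}
groups, pointers = build_groups(records, key=str.casefold)
```

Keeping invalidated groups around, rather than deleting them, leaves an audit trail when a later run or a manual edit overturns an earlier merge.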
Merge Flagged Records
• For all of the exact matches and authority matches
– Use the Authoritative form of the name
– Combine data from each match into a single EAC-CPF record
– Retain all source record IDs and information
• Finally, output the merged EAC-CPF records
– Actually, store how to build the merged record in the database as well
  • Records can be regenerated as needed from the merge data
– Assign permanent identifier for the merged record
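The merge step can be pictured as combining plain dictionaries: take the authoritative name, keep the other forms as alternatives, and retain every source ID. The record layout and both function names below are hypothetical; real EAC-CPF records are XML and carry much more.

```python
def merge_records(authoritative_name, records):
    """Combine matched records into one, keeping all source ids and resources."""
    merged = {
        "name": authoritative_name,
        "alternative_names": [],
        "source_ids": [],
        "resources": [],
    }
    for rec in records:
        if rec["name"] != authoritative_name and rec["name"] not in merged["alternative_names"]:
            merged["alternative_names"].append(rec["name"])
        merged["source_ids"].append(rec["source_id"])
        for resource in rec.get("resources", []):
            if resource not in merged["resources"]:
                merged["resources"].append(resource)
    return merged


def merge_recipe(authoritative_name, records):
    """The 'how to build it' data: enough to regenerate the merged record later."""
    return {
        "authoritative_name": authoritative_name,
        "source_ids": [rec["source_id"] for rec in records],
    }


matched = [
    {"name": "Tabb, John Banister, 1845-1909.", "source_id": "ead-17",
     "resources": ["Papers A"]},
    {"name": "Tabb, John B. (John Banister), 1845-1909.", "source_id": "marc-42",
     "resources": ["Papers A", "Letters B"]},
]
authoritative = "Tabb, John B. (John Banister), 1845-1909."
merged = merge_records(authoritative, matched)
recipe = merge_recipe(authoritative, matched)
```

Storing the recipe next to the merged output means a record can be rebuilt from its sources if the merge rules change. The source IDs and resource titles here are invented placeholders.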
Merge System Step 3: Create Output
• Using valid record groups:
– generate merged EAC
– assign permanent ARK ID
– write to new EAC file
• Merged XML stored in db, referenced by record group
– Do not need to regenerate XML
– Keep track of assigned permanent IDs
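Keeping track of assigned IDs matters because a rerun must hand the same record group the same identifier. The sketch below shows only that bookkeeping: the `ArkRegistry` class is hypothetical, the NAAN `99999` is a placeholder, and the random name part is not how a real ARK minter works.

```python
import uuid


class ArkRegistry:
    """Remembers which ARK was assigned to which record group (illustrative only)."""

    def __init__(self, naan="99999"):  # placeholder NAAN, not a real assignment
        self.naan = naan
        self._by_group = {}

    def ark_for(self, group_id):
        # Mint once; every later request for the same group returns the same ARK
        if group_id not in self._by_group:
            self._by_group[group_id] = f"ark:/{self.naan}/{uuid.uuid4().hex[:12]}"
        return self._by_group[group_id]


def output_ids(groups, registry):
    """Assign ARKs to valid groups only; groups are (group_id, is_valid) pairs."""
    return {gid: registry.ark_for(gid) for gid, is_valid in groups if is_valid}


registry = ArkRegistry()
first_run = output_ids([(1, True), (2, False), (3, True)], registry)
second_run = output_ids([(1, True), (3, True)], registry)
```

In the system described on the slide the mapping lives in the database rather than in memory, so it survives between runs.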
Merging Conclusions
• There is no single merging method but a staged set of approaches, moving from the simplest exact matches to (we hope) reliable identification of variant forms of a name when corroborated by contextual information such as dates (including “active” dates)
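The staged idea can be sketched as a function that tries progressively looser tests and reports which stage succeeded. Everything here is an assumption made for illustration: the stage names, the record layout, the trigram overlap measure, and the 0.3 threshold are not taken from the SNAC system.

```python
import re
import unicodedata


def _norm(name):
    """Case-, diacritic-, punctuation-, and spacing-insensitive form of a name."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.casefold()).strip()


def _shingles(text, n=3):
    """Character trigrams of each word (short words kept whole)."""
    return {tok[i:i + n] for tok in text.split()
            for i in range(max(len(tok) - n + 1, 1))}


def dates_corroborate(a, b):
    """True if at least one known date agrees and none conflict."""
    agreed = 0
    for key in ("born", "died"):
        x, y = a.get(key), b.get(key)
        if x is not None and y is not None:
            if x != y:
                return False
            agreed += 1
    return agreed > 0


def match_stage(a, b, threshold=0.3):  # threshold is an arbitrary example value
    """Return the first stage at which two records match, or None."""
    if a["name"] == b["name"]:
        return "exact"
    na, nb = _norm(a["name"]), _norm(b["name"])
    if na == nb:
        return "normalized"
    sa, sb = _shingles(na), _shingles(nb)
    similarity = len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    if similarity >= threshold and dates_corroborate(a, b):
        return "fuzzy+dates"
    return None


einstein = {"name": "Einstein Albert", "born": 1879, "died": 1955}
variant = {"name": "Ainshtain Albert", "born": 1879, "died": None}
other = {"name": "Ainshtain Albert", "born": 1901, "died": None}
```

Here `match_stage(einstein, variant)` succeeds only at the last stage because the birth year agrees, while `match_stage(einstein, other)` fails because the dates conflict; that is the role contextual dates play in the conclusion above.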
Prototype Access System
http://socialarchive.iath.virginia.edu
SNAC
Social Networks and Archival Context
NAAC
National Archival Authorities Cooperative
Not the final name
http://socialarchive.iath.virginia.edu/NAAC_index.html
Activities
1. Cultivate EAC-CPF expertise across the archival community, through 140 SAA-hosted workshops
2. Develop a blueprint for a sustainable, national archival authority cooperative
Planning is being extended with a proposal to the Mellon Foundation.
Stay tuned for Spring 2014!