Data archiving for the Irish Record Linkage project

Rebecca Grant, Digital Archivist, Digital Repository of Ireland
Dolores Grant, IRL-DRI Digital Archivist, Digital Repository of Ireland



Irish Record Linkage project 1864-1913

Irish Record Linkage is an Irish Research Council-funded project running from 2014 to June 2016

Collaboration between the University of Limerick (historians), the Digital Repository of Ireland at the Royal Irish Academy (archivists), and Insight@NUI Galway (knowledge engineers, Linked Data experts)

Constructing a Knowledge Platform – Linked Data based on Vital Registration Data (digitised registers of Births, Marriages and Deaths) in order to answer research questions around infant and maternal mortality


Irish Record Linkage project 1864-1913

The Linked Data concept and the project’s dataset

Extracting data from the vital records

Approaches to archival authenticity

Preservation of the records


The Digital Repository of Ireland

DRI is a trusted digital repository for Humanities and Social Sciences data, launched in June 2015 and based at the Royal Irish Academy.

Linking and preserving the rich collections held by Irish institutions (archives, museums, libraries, galleries, universities, research projects, etc.).

Focal point for the development of national guidelines and policy for digital preservation and access.

repository.dri.ie


INSIGHT@NUI Galway

Insight is a joint initiative between University College Dublin, the National University of Ireland at Galway, University College Cork, and Dublin City University. Insight was established in 2013 by Science Foundation Ireland with funding of €75m.

The Semantic Web, Sensors and the Sensor Web, Social network analysis, Decision Support and Optimization, and Connected Health.


Irish Record Linkage and Linked Data Queries

• How many women died within 42 days following childbirth due to complications related to labour and how does that figure correspond with the official reports?

• Which women died of causes that can be attributed to maternal death, but for which no corresponding birth certificate exists?

• How did various socio-economic conditions affect maternal and infant mortality rates?
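The first query rests on a date test: did the mother die within 42 days after the birth? A minimal Python sketch of that test only, assuming the birth and death records have already been linked and their dates parsed. The helper names (`within_maternal_window`, `count_maternal_deaths`) are hypothetical, the window is assumed to include day 42, and the cause-of-death check ("complications related to labour") is not covered.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

# Assumption: the window is counted from the date of birth and includes day 42.
MATERNAL_WINDOW = timedelta(days=42)

def within_maternal_window(birth_date: date, death_date: date) -> bool:
    """True if the mother died on or after the birth date and no more
    than 42 days later (hypothetical helper, date test only)."""
    elapsed = death_date - birth_date
    return timedelta(0) <= elapsed <= MATERNAL_WINDOW

def count_maternal_deaths(pairs: Iterable[Tuple[date, date]]) -> int:
    """Count linked (birth_date, death_date) pairs that fall inside the window."""
    return sum(1 for birth, death in pairs if within_maternal_window(birth, death))

# Invented dates, for illustration only.
pairs = [
    (date(1880, 3, 1), date(1880, 3, 20)),  # 19 days later: counted
    (date(1880, 3, 1), date(1880, 5, 1)),   # 61 days later: not counted
]
print(count_maternal_deaths(pairs))  # 1
```

Comparing the resulting count against the official reports, as the query asks, would then be a separate step.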


The General Register Office (GRO) – civil registry responsible for recording information on births, deaths and marriages.

Records of 5,847,323 births (from 1864 to 1912), 4,236,922 deaths (from 1864 to 1912) and 1,160,546 marriages (from 1845 to 1912) were transferred to the project team under strict terms and conditions.

Events were captured on register pages (up to 10 entries per page for births and deaths, and up to 4 for marriages), divided by district and sent to the GRO, where volumes were then created and an index compiled. The project received a dump of the GRO's database with digitised versions (TIFFs) of the register pages and indexes.

General Register Office records
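The register page described above is a small, bounded structure: one district, one event type, and a fixed number of entries. A minimal Python sketch of that structure, assuming the per-page limits stated on this slide; the class `RegisterPage` and its fields are hypothetical and do not reflect the GRO's actual database.

```python
from dataclasses import dataclass, field
from typing import List

# Per-page limits taken from the slide: up to 10 entries for births and
# deaths, up to 4 for marriages.
PAGE_CAPACITY = {"birth": 10, "death": 10, "marriage": 4}

@dataclass
class RegisterPage:
    """Hypothetical model of one register page for a single district."""
    district: str
    event_type: str  # "birth", "death" or "marriage"
    entries: List[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.event_type not in PAGE_CAPACITY:
            raise ValueError(f"unknown event type: {self.event_type!r}")

    @property
    def capacity(self) -> int:
        return PAGE_CAPACITY[self.event_type]

    def add_entry(self, entry: dict) -> int:
        """Append an entry and return its 1-based position on the page."""
        if len(self.entries) >= self.capacity:
            raise ValueError("register page is full")
        self.entries.append(entry)
        return len(self.entries)
```

For example, `RegisterPage("South City no. 1", "marriage")` accepts four entries and raises `ValueError` on a fifth.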


The Linked Data Concept

In the example statement that James Joyce was born in Dublin, the subject (James Joyce) is related by a predicate ("was born in") to an object (Dublin). By semantically separating the elements of the information in this way, datasets stored as such statements can be easily queried.
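The subject-predicate-object idea can be sketched in a few lines of Python. This is an in-memory illustration only, assuming short hypothetical identifiers (`ex:JamesJoyce`, `ex:bornIn`, `ex:Dublin`) in place of the full URIs a real Linked Data set would use; pattern matching with wildcards here is an informal analogue of what a SPARQL query does over a triplestore.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical identifiers; real Linked Data would use full URIs.
triples: List[Triple] = [
    ("ex:JamesJoyce", "ex:bornIn", "ex:Dublin"),
    ("ex:JamesJoyce", "ex:occupation", "ex:Writer"),
]

def match(store: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return every triple matching the pattern; None matches anything."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Who was born in Dublin?"
print(match(triples, p="ex:bornIn", o="ex:Dublin"))
```

Because each element sits in its own position, the same store answers "who was born in Dublin?" and "what do we know about James Joyce?" without any change to how the data is stored.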


Birth Records

Register TIFF Index TIFF System Pre 1900 System Post 1900 Superintendent Registrar’s District

Registrar’s District Registration district District District Union County County County Province Province Number in register Entry number Date & place of birth Year of event Date of birth, year of event Name (if any) Name Forename, Surname Forename, Surname Sex Sex Name, surname & dwelling place of father

Name & surname & maiden surname of mother

Mother’s maiden name

Rank or profession of father

Signature, qualification, and residence of informant

When Registered Returns year Returns year Returns quarter Returns quarter Signature of Registrar Name & surname & maiden surname of mother

Rank or profession of father

Signature, qualification, and residence of informant

Signature of Registrar Signature of Superintendent Registrar and date

Baptismal name if added after registration of birth and date

Stamp Number Stamp number Stamp number Volume number Returns volume number Returns volume number Page number Page number Returns page number Returns page number Stamped number Page ID 2nd Stamped number

Index entry number Index page number


Archival principles

The principle of provenance: Provenance means the history of ownership of a group of records or of an individual item in a collection. Preserving information on these relationships is essential, as it provides evidence of who created and used the records, and how, before they became part of the archives. Provenance provides essential contextual information for understanding the content and history of an archival collection.

The principle of original order: Archives are kept in the order in which they were originally created or used. This original order allows custodians to protect the authenticity of the records and provides essential information as to how they were created, kept and used.


Data (e.g. database records and TIFFs) are stored only for the duration of the project, and must be destroyed following its completion

Data can only be accessed by the IRL project team after an access agreement has been signed

Records cannot be duplicated, downloaded or brought off-site

Personal, identifying information cannot be published

Copyright and related rights remain vested in the General Register Office.

Terms of transfer


Archival authenticity

The quality of being genuine, not a counterfeit, and free from tampering, and is typically inferred from internal and external evidence, including its physical characteristics, structure, content, and context.

The presence of a signature serves as a fundamental test for authenticity; the signature identifies the creator and establishes the relationship between the creator and the record.

The style and language of the document must be consistent with other, related documents that are accepted as authentic.

Society of American Archivists http://www2.archivists.org/glossary/terms/a/authenticity


Archival authenticity

Only records that are complete can ensure accountability and protect personal rights […] Individual records must be complete; they must contain all the information they had when they were created. They must also maintain their original structure and context. (Hirtle)

An authentic record is one that is what it purports to be and has not been tampered with or otherwise corrupted. (InterPARES 2)

For a record to be considered trustworthy […] it must accurately reflect the event it records and be uncontaminated by the distorting influence of time, bias, interpretation, or unwarranted opinion on the part of the record-maker (McNeil)


Initial data preparation

Final dataset comprises birth, marriage and death records from 2 districts in Dublin (South City no. 1 and South City no. 3).

Separate database constructed to enable the encoding of the IRL records

Tables represent both the register pages and the records (“record” = historical event)

Each event links back to the register page

Fields created reflect the original record information, and the structure enables transformation to RDF
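The structure above (page table, event table, each event linking back to its page, then RDF) can be sketched with Python's built-in `sqlite3`. This is a minimal illustration, assuming a hypothetical schema (`register_page`, `event`) and hypothetical `ex:` predicates; it is not the project's actual database or ontology, and the sample row uses placeholder names.

```python
import sqlite3

# Hypothetical schema: one table for register pages and one for the events
# (records) written on them; every event points back to its page.
SCHEMA = """
CREATE TABLE register_page (
    page_id    INTEGER PRIMARY KEY,
    district   TEXT NOT NULL,
    event_type TEXT NOT NULL,
    tiff_file  TEXT NOT NULL      -- digitised image of the page
);
CREATE TABLE event (
    event_id   INTEGER PRIMARY KEY,
    page_id    INTEGER NOT NULL REFERENCES register_page(page_id),
    entry_no   INTEGER NOT NULL,  -- position of the entry on the page
    forename   TEXT,
    surname    TEXT,
    notes      TEXT               -- anomalies noted by the cataloguer
);
"""

def event_to_triples(conn: sqlite3.Connection, event_id: int) -> list:
    """Turn one event row into (subject, predicate, object) triples,
    including a link back to the register page it came from."""
    row = conn.execute(
        "SELECT event_id, page_id, entry_no, forename, surname "
        "FROM event WHERE event_id = ?", (event_id,)).fetchone()
    if row is None:
        raise KeyError(event_id)
    ev, page_id, entry_no, forename, surname = row
    subject = f"ex:event/{ev}"
    triples = [(subject, "ex:onRegisterPage", f"ex:page/{page_id}"),
               (subject, "ex:entryNumber", str(entry_no))]
    # Blank fields stay blank: no triple is emitted for a missing value.
    for predicate, value in (("ex:forename", forename), ("ex:surname", surname)):
        if value is not None:
            triples.append((subject, predicate, value))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the link back to the page
conn.executescript(SCHEMA)
conn.execute("INSERT INTO register_page VALUES "
             "(1, 'South City no. 1', 'birth', 'page_0001.tif')")
conn.execute("INSERT INTO event VALUES (1, 1, 3, 'Jane', 'Doe', NULL)")
print(event_to_triples(conn, 1))
```

With foreign keys enabled, SQLite rejects an event that points at a page that does not exist, so every record stays tied to its register page and TIFF.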


• Whole, authentic record maintained to represent the original record and preserve context of creation

• Every database record linked to the TIFF image – TIFFs stored in semi-meaningful arrangement

• Consistent cataloguing practices (dates, square brackets, [sic], notes field to capture anomalies)

• Paleography

• Controlled vocabulary of death terms and professions

• Archiving databases: preserving content, structure and processes

(RODA toolkit (Repository of Authentic Digital Objects), SIARD (Software Independent Archiving of Relational Databases))

Data challenges


Separation of concerns – transcription vs interpretation

Variance in how subject names and places were recorded (initials, shorthand, the name of a building versus a street name) might imply something of which we are currently unaware.

The transcription of the register pages records exactly what was written down.

However, some interpretation is necessary in order to use the data, e.g. to account for street names changing over time, new insights into medical conditions, or the adoption of new social theory (e.g. class distinctions).

The data are captured in two separate ontologies, one for transcription and one for interpretation. For example, an age at death recorded in days in the first can be interpreted and queried as hours in the second.
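The two-layer approach can be illustrated with the days-to-hours example. A minimal Python sketch, assuming a hypothetical `Transcription` record that holds the verbatim age text and a hypothetical `interpret_age_hours` function for the interpretation layer. The simple "<number> <unit>" pattern is an assumption; anything else (for example an illegible entry) is left uninterpreted rather than guessed.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Transcription:
    """Transcription layer: exactly what the register says (hypothetical model).
    frozen=True stops the interpretation step from overwriting it."""
    event_id: int
    age_at_death: str  # verbatim, e.g. "3 days" or "[illegible]"

# Assumed conversion factors for the interpretation layer.
HOURS_PER_UNIT = {"hour": 1, "day": 24, "week": 24 * 7}
AGE_PATTERN = re.compile(r"\s*(\d+)\s+(hour|day|week)s?\s*", re.IGNORECASE)

def interpret_age_hours(t: Transcription) -> Optional[int]:
    """Interpretation layer: derive an age in hours from the verbatim text,
    or None if it does not fit the simple '<number> <unit>' pattern."""
    m = AGE_PATTERN.fullmatch(t.age_at_death)
    if m is None:
        return None
    return int(m.group(1)) * HOURS_PER_UNIT[m.group(2).lower()]

# The interpretation is stored separately, keyed by event, never in place.
records = [Transcription(1, "3 days"), Transcription(2, "[illegible]")]
interpretation = {t.event_id: interpret_age_hours(t) for t in records}
print(interpretation)  # {1: 72, 2: None}
```

Keeping the interpretation in its own structure means it can be revised (for example, if new insights change how a term is read) without touching the transcribed record.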


Diagram: Separation of concerns (GRO Triplestore, Triplestore 2, Data Analysis)


Register page as EAD (database crosswalk)
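A database-to-EAD crosswalk maps each register-page row onto EAD description elements. A minimal Python sketch using the standard `xml.etree.ElementTree`, assuming hypothetical column names for the row. The element names (`c`, `did`, `unitid`, `unittitle`, `unitdate`, `dao`) are EAD 2002 elements, but this is a simplified fragment: a real EAD document also needs the surrounding `ead`, `eadheader` and `archdesc` elements and should be validated against the EAD schema.

```python
import xml.etree.ElementTree as ET

def page_to_ead_component(page: dict) -> ET.Element:
    """Map one register-page row to a simplified EAD-style <c> component.
    The keys of `page` are hypothetical column names."""
    c = ET.Element("c", level="item")
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unitid").text = str(page["page_id"])
    ET.SubElement(did, "unittitle").text = (
        f"{page['event_type'].capitalize()} register page, {page['district']}")
    ET.SubElement(did, "unitdate").text = str(page["year"])
    # Point the description at the digitised TIFF of the page.
    ET.SubElement(did, "dao", href=page["tiff_file"])
    return c

# Invented row, for illustration only.
row = {"page_id": 1, "district": "South City no. 1", "event_type": "birth",
       "year": 1880, "tiff_file": "page_0001.tif"}
print(ET.tostring(page_to_ead_component(row), encoding="unicode"))
```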


Archival authenticity and preservation

The archivist encoded entire register pages rather than lines of data regarding an individual (e.g. a single life event such as a death)

Database records refer back to digitised TIFFs created by General Register Office

Interpretation of the dataset occurs separately – all records are transcribed exactly including typos, blank fields, details crossed out, Xs etc.

TIFFs can be preserved with EAD or QDC metadata, and associated databases preserved separately and linked

Querying of the data occurs only on an obfuscated dataset with personal names excluded; linked data can contain outbound links but is protected by a firewall

Authenticity of the dataset
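The obfuscated query dataset can be produced as a filtered copy, so the exact transcription is never altered. A minimal Python sketch, assuming hypothetical field names for the personal-name columns (`NAME_FIELDS`) and a hypothetical `obfuscate` function; which fields count as personal names in the real dataset is not specified here.

```python
from typing import Dict, Iterable, List

# Hypothetical names of the fields that hold personal names.
NAME_FIELDS = frozenset({"forename", "surname", "father_name",
                         "mother_name", "mother_maiden_name", "informant_name"})

def obfuscate(records: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Build the query copy of the dataset: same records, with every
    personal-name field dropped. The input records are not modified."""
    return [{k: v for k, v in r.items() if k not in NAME_FIELDS} for r in records]

# Invented record, for illustration only.
transcribed = [{"event_id": 1, "forename": "Jane", "surname": "Doe",
                "district": "South City no. 3"}]
query_copy = obfuscate(transcribed)
print(query_copy)  # [{'event_id': 1, 'district': 'South City no. 3'}]
```

Building a new list rather than deleting keys in place keeps the transcribed records whole, in line with transcribing everything exactly and interpreting or querying separately.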


Bibliography

Hirtle, Peter. “Archival Authenticity in a Digital Age”. Authenticity in a digital environment, 2000.

Lee, Brent. Authenticity, Accuracy and Reliability: Reconciling Arts-related and Archival Literature, 2005.

McNeil, Heather. “Trusting Records in a Postmodern World”. Archivaria 51, 2001.

Pearce-Moses, Richard. A Glossary of Archival and Records Terminology, 2005.

SIARD Suite: http://www.bar.admin.ch/dienstleistungen/00823/01911/index.html?lang=en


@beck_grant @IRL_project


http://repository.dri.ie

The content of this presentation is licensed as CC-BY. Please attribute to Rebecca Grant, Digital Archivist, Digital Repository of Ireland, 2015.

https://irishrecordlinkage.wordpress.com/