Approaching Archival Authenticity when “Records” become “Data”

Rebecca Grant, Digital Archivist, Digital Repository of Ireland
Dolores Grant, DRI-IRL Digital Archivist, Digital Repository of Ireland
Dr. Sharon Webb, Knowledge Transfer Manager, Digital Arts & Humanities PhD Programme
Dr. Sandra Collins, Director, Digital Repository of Ireland

Rebecca Grant - Approaching Archival Authenticity: when 'Records' become 'Data



The Digital Repository of Ireland

DRI is a trusted digital repository for humanities and social sciences data, launched in June 2015.

Linking and preserving the rich collections held by Irish institutions (archives, museums, libraries, galleries, universities, research projects, etc.)

Focal point for the development of national guidelines and policy for digital preservation and access.

repository.dri.ie


Irish Record Linkage project 1864-1913

Irish Record Linkage is an Irish Research Council funded project running from 2014 – September 2015

Collaboration between the University of Limerick (medical historians), the Digital Repository of Ireland at the Royal Irish Academy (archivists!), and Insight@NUI Galway (knowledge engineers, Linked Data experts)

Constructing a Knowledge Platform – Linked Data based on Vital Registration Data (digitised registers of Births, Marriages and Deaths) in order to answer research questions around infant and maternal mortality


Irish Record Linkage and Linked Data Queries

• How many women died within 42 days following childbirth due to complications related to labour and how does that figure correspond with the official reports?

• Which women died of causes that can be attributed to maternal death, but for which no corresponding birth certificate exists?

• How did various socio-economic conditions affect maternal and infant mortality rates?


General Register Office records

The General Register Office (GRO): the civil registry responsible for recording information on births, deaths and marriages.

Records of 6,009,781 births (from 1864 to 1912), 4,314,963 deaths (from 1864 to 1912) and 1,443,110 marriages (from 1845 to 1912) transferred to the project team with strict terms and conditions.

Events were captured on register pages (up to 10 for births and deaths, and up to 4 for marriages) divided by district and sent to the GRO, where volumes were then created and an index compiled.

Database dump of the GRO's database with digitised versions of the register pages and indexes (TIFFs)


Terms of transfer

Data (e.g. database records and TIFFs) are only stored for the duration of the project, and must be destroyed following its completion

Data can only be accessed by the IRL project team after an access agreement has been signed

Records cannot be duplicated, downloaded or brought off-site

Personal, identifying information cannot be published

Copyright and related rights remain vested in the General Register Office.


Birth records with redactions

The IRL project team are not data owners.

The security and authenticity of the dataset were critical to the success of the project.


The Linked Data Concept

A method of publishing structured data on the Web, allowing it to be connected and enriched, and facilitating linking between related resources.

Linked Data standards such as RDF allow semantic definitions to be applied to information, using statements called ‘triples’ in the form subject, predicate, object.

A key principle of Linked Data is that HTTP URIs are used to name the semantic elements of the dataset


The Linked Data Concept

The example above describes the subject (James Joyce) and his relationship (predicate) to an object (Dublin). By semantically separating the elements of the information (that James Joyce was born in Dublin), datasets stored in this way can be easily queried.
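A minimal sketch of the triple idea, assuming plain Python tuples rather than a real RDF library. The `http://example.org/` URIs, the predicate names and the `match` helper are hypothetical and chosen only for illustration; they are not the project's vocabulary.

```python
# Each triple is (subject, predicate, object). The example.org URIs are placeholders.
EX = "http://example.org/"

triples = [
    (EX + "James_Joyce", EX + "bornIn", EX + "Dublin"),
    (EX + "James_Joyce", EX + "occupation", EX + "Writer"),
    (EX + "Dublin", EX + "locatedIn", EX + "Ireland"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Who was born in Dublin?" expressed as a pattern over predicate and object.
born_in_dublin = match(triples, p=EX + "bornIn", o=EX + "Dublin")
print([s for s, _, _ in born_in_dublin])
```

Because subject, predicate and object are held separately, the same helper answers other questions (for example, everything stated about one subject) without changing how the data is stored.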


Competency questions for ontology construction

ID Competency Question

C01 Women died within 41 days after giving birth (the date of birth counted as day 1 and day 41 is included)

C02 Women died within 41 days after giving birth AND in their death certificate ‘complication 1’ is mentioned.

C03 Women died within 41 days after giving birth AND in their death certificate ‘complication 2’ is mentioned.

C04 Women having official maternal death reports including “XXXX”

C05 Women having official maternal death reports including “cause 1”

C06 Women having official maternal death reports including “cause 2 and cause 3 together”

C07 For each record in C04 find the ones with corresponding birth record (the date of death counted as day 1 and day 41 is included)
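The day-counting rule in C01 can be sketched as follows. This is an illustration only, assuming the rule means that the date of birth is day 1 and that a death on day 41 still qualifies; the function names and the sample rows are hypothetical and are not project data.

```python
from datetime import date


def day_number(birth: date, death: date) -> int:
    """Day index of the death, counting the date of birth as day 1."""
    return (death - birth).days + 1


def died_within_41_days(birth: date, death: date) -> bool:
    """C01-style test: day 1 (the birth date) through day 41 inclusive."""
    return 1 <= day_number(birth, death) <= 41


# Hypothetical (mother_id, date of childbirth, date of death) rows.
rows = [
    ("m1", date(1870, 3, 1), date(1870, 3, 1)),   # death on day 1
    ("m2", date(1870, 3, 1), date(1870, 4, 10)),  # death on day 41
    ("m3", date(1870, 3, 1), date(1870, 4, 11)),  # death on day 42
]

c01 = [mother for mother, birth, death in rows if died_within_41_days(birth, death)]
print(c01)
```

Writing the boundary down as code makes the inclusive or exclusive choice explicit, which matters when the result is compared with official figures.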


A General Register Office Birth Record, 1870


Linked Data (logainm.ie)


Birth Records

Register TIFF; Index TIFF; System Pre 1900; System Post 1900

Superintendent Registrar’s District

Registrar’s District Registration district District District Union County County County Province Province Number in register Entry number Date & place of birth Year of event Date of birth, year of event Name (if any) Name Forename, Surname Forename, Surname Sex Sex Name, surname & dwelling place of father

Name & surname & maiden surname of mother

Mother’s maiden name

Rank or profession of father

Signature, qualification, and residence of informant

When Registered Returns year Returns year Returns quarter Returns quarter Signature of Registrar Name & surname & maiden surname of mother

Rank or profession of father

Signature, qualification, and residence of informant

Signature of Registrar Signature of Superintendent Registrar and date

Baptismal name if added after registration of birth and date

Stamp Number Stamp number Stamp number Volume number Returns volume number Returns volume number Page number Page number Returns page number Returns page number Stamped number Page ID 2nd Stamped number

Index entry number Index page number


DRI Presentation

Archival authenticity

The quality of being genuine, not a counterfeit, and free from tampering, and is typically inferred from internal and external evidence, including its physical characteristics, structure, content, and context.

The presence of a signature serves as a fundamental test for authenticity; the signature identifies the creator and establishes the relationship between the creator and the record.

The style and language of the document must be consistent with other, related documents that are accepted as authentic.

Society of American Archivists http://www2.archivists.org/glossary/terms/a/authenticity


Archival authenticity

Only records that are complete can ensure accountability and protect personal rights […] Individual records must be complete; they must contain all the information they had when they were created. They must also maintain their original structure and context. (Hirtle)

An authentic record is one that is what it purports to be and has not been tampered with or otherwise corrupted. (InterPARES 2)

For a record to be considered trustworthy […] it must accurately reflect the event it records and be uncontaminated by the distorting influence of time, bias, interpretation, or unwarranted opinion on the part of the record-maker. (MacNeil)


Approaching authenticity for the IRL project

The dataset cannot provide evidence of structure, context, standardised style, signatures – therefore the data “record” must always be linked to the TIFF

The “records” transcribed must be complete – all data must be transcribed, even if it is not currently used to answer our research questions

The “records” should not be biased by interpretation – each piece of data should be transcribed faithfully.


Initial data preparation

Final dataset comprises death records from 2 districts in Dublin (South City no. 1 and South City no. 3)

Separate database constructed to enable the encoding of the IRL records

Tables represent both the register pages and the records (“record” = historical event)

The register page and record are linked to the index page

Fields created reflect the original record information, and the structure enables transformation to RDF
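One way such a database could be laid out is sketched below with Python's built-in `sqlite3` module. The table names, column names and file paths are assumptions made for illustration; the slides do not give the actual schema. Here a record reaches the index page through its register page, which is one possible reading of the linkage described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE index_page (
    id INTEGER PRIMARY KEY,
    tiff_path TEXT NOT NULL
);
CREATE TABLE register_page (
    id INTEGER PRIMARY KEY,
    tiff_path TEXT NOT NULL,
    district TEXT,
    index_page_id INTEGER REFERENCES index_page(id)
);
CREATE TABLE record (
    id INTEGER PRIMARY KEY,
    register_page_id INTEGER NOT NULL REFERENCES register_page(id),
    entry_number TEXT,
    cause_of_death TEXT
);
""")

# Hypothetical sample rows, not project data.
conn.execute("INSERT INTO index_page VALUES (1, 'index/0001.tif')")
conn.execute(
    "INSERT INTO register_page VALUES (1, 'register/0001.tif', 'South City no. 1', 1)"
)
conn.execute("INSERT INTO record VALUES (1, 1, '12', 'Phthisis')")

# Every record should resolve to the TIFF of the register page it came from.
linked = conn.execute("""
    SELECT r.id, p.tiff_path
    FROM record r JOIN register_page p ON p.id = r.register_page_id
""").fetchall()
print(linked)
```

Keeping the page and the record in separate tables means each row in `record` maps cleanly to one historical event, which is the shape an RDF transformation needs.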


Data challenges

• Whole, authentic record maintained to represent the original record and preserve context of creation

• Every database record linked to the TIFF image; TIFFs stored in semi-meaningful arrangement

• Consistent cataloguing practices (dates, square brackets, [sic], notes field to capture anomalies)

• Paleography

• Controlled vocabulary of death terms and professions

• Archiving databases: preserving content, structure and processes (RODA toolkit (Repository of Authentic Digital Objects), SIARD (Software Independent Archiving of Relational Databases))
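As a rough sketch of how a controlled vocabulary can sit alongside a faithful transcription, the snippet below keeps the transcribed term untouched and attaches a preferred term where one is known. The vocabulary entries, field names and the `catalogue` function are hypothetical; the project's actual vocabulary of death terms is not reproduced in the slides.

```python
# Hypothetical controlled vocabulary: transcribed term -> preferred term.
DEATH_TERMS = {
    "phthisis": "Tuberculosis",
    "consumption": "Tuberculosis",
}


def catalogue(transcribed: str) -> dict:
    """Keep the transcription verbatim and attach a controlled term if known."""
    term = DEATH_TERMS.get(transcribed.strip().lower())
    return {
        "transcribed": transcribed,   # never altered
        "controlled_term": term,      # None when no mapping exists
        "note": None if term else "term not in controlled vocabulary",
    }


print(catalogue("Phthisis"))
print(catalogue("Decline"))
```

Unmapped terms are flagged in a note rather than guessed, in line with the practice of using a notes field to capture anomalies.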


GRO Records annotation vs. Data Analysis

GRO Triplestore

Triplestore 2 Data Analysis

SEPARATION OF CONCERNS

Transformation from one model to another
• SPIN – SPARQL Inference
• SWRL / RuleML
• SPARQL Construct
• …


Separation of concerns: transcription vs interpretation

Variance in how subject names and places were recorded (initials, shorthand, the name of a building versus a street name) might imply something of which we are currently unaware.

Transcription of the register pages transcribes exactly what was written down.

Some interpretation is necessary in order to use the data, however, e.g. street names changing over time, new insights into medical conditions, adoption of new social theory (e.g. class distinctions)

Captured data in two separate ontologies: one for transcription, one for interpretation. For example, a death recorded in days in the first database can be interpreted/queried as hours in the second.
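The two-layer idea can be sketched as below. This is an illustration under assumptions: it reads the days-to-hours example as referring to an age at death, and the `Transcription` class, its fields and the unit table are hypothetical names rather than the project's ontologies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transcription:
    """Verbatim values as written on the register page (hypothetical fields)."""
    record_id: str
    age_value: str  # e.g. "3", exactly as recorded
    age_unit: str   # e.g. "days", exactly as recorded


HOURS_PER_UNIT = {"hours": 1, "days": 24, "weeks": 24 * 7}


def interpret_age_hours(t: Transcription) -> Optional[int]:
    """Derive the age in hours without modifying the transcription."""
    unit = t.age_unit.strip().lower()
    value = t.age_value.strip()
    if unit not in HOURS_PER_UNIT or not value.isdigit():
        return None  # leave values that cannot be interpreted for manual review
    return int(value) * HOURS_PER_UNIT[unit]


t = Transcription("d1", "3", "days")
print(t.age_value, t.age_unit, "->", interpret_age_hours(t), "hours")
```

The transcription object is frozen, so the interpreted value can be revised later (for instance if a different reading of the unit is adopted) while the transcribed layer stays exactly as written.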


Register page as EAD (database crosswalk)


Thinking about archival authenticity

Archivist encoded entire register pages rather than lines of data regarding an individual (eg. a single life event such as a death)

Database records refer back to digitised TIFFs created by General Register Office

Interpretation of the dataset occurs separately – all records are transcribed exactly including typos, blank fields, details crossed out, Xs etc.

TIFFs can be preserved with EAD metadata, and associated databases preserved separately and linked

Querying of the data occurs only on an obfuscated dataset with personal names excluded; linked data can contain outbound links but is protected by a firewall

Authenticity of the dataset
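A minimal sketch of querying on an obfuscated copy, assuming that obfuscation here means dropping personal-name fields before any query is run. The field names and the sample record (including the name in it) are invented for illustration and do not come from the GRO data.

```python
# Hypothetical field names; the real schema is not given in the slides.
PERSONAL_NAME_FIELDS = {"forename", "surname", "informant_name"}


def obfuscate(record: dict) -> dict:
    """Return a copy of the record without personal-name fields."""
    return {k: v for k, v in record.items() if k not in PERSONAL_NAME_FIELDS}


full = {
    "record_id": "d1",
    "forename": "Mary",
    "surname": "Byrne",
    "cause_of_death": "Phthisis",
    "tiff_path": "register/0001.tif",
}

query_view = obfuscate(full)
print(sorted(query_view))
```

The full record is left unchanged, so the link back to the TIFF and the complete transcription is preserved for authorised members of the project team.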


Bibliography

Hirtle, Peter. “Archival Authenticity in a Digital Age”. Authenticity in a digital environment, 2000.

Lee, Brent. Authenticity, Accuracy and Reliability: Reconciling Arts-related and Archival Literature, 2005.

MacNeil, Heather. “Trusting Records in a Postmodern World”. Archivaria 51, 2001.

Pearce-Moses, Richard. A Glossary of Archival and Records Terminology, 2005.

SIARD Suite: http://www.bar.admin.ch/dienstleistungen/00823/01911/index.html?lang=en


@beck_grant
@dri_ireland


http://repository.dri.ie

The content of this presentation is licensed as CC-BY. Please attribute to Rebecca Grant, Digital Archivist, Digital Repository of Ireland, 2015.

https://irishrecordlinkage.wordpress.com/