Linking historical ship records to a newspaper archive

Andrea Bravo Balado, Victor de Boer, Guus Schreiber

VU University Amsterdam

Context: dutchshipsandsailors.nl/

Dutch Ships and Sailors (DSS) datasets

Results published as Linked Data

Data visualizations

This study

• An increasing number of historical databases are being digitized

• Finding matching occurrences of the same object in different datasets is both relevant (for historical research) and non-trivial
– “Instance mapping”

• This paper: case study of linking ship instances in two maritime datasets

Focus on methodology

• This study is not about developing new techniques

• This study is about methodology:
– What combination of existing techniques gets the “best” result?
– What the “best” result is depends on context (i.e., goal of the historical research)

• This is a case study, so be wary of generalization

• Muster rolls (Northern Dutch Maritime Museum)
– Period: 1803-1937
– 77,043 records of 34,552 seamen
– 17,098 mentions of 4,935 ships

• Newspaper archive (Dutch National Library)
– Period: 1618-1995
– 7K newspapers, 9M pages (coverage: 10%)
– Text generated via OCR

Timeline newspapers in the archive

Example muster roll record (in Dutch)

Example newspaper article (in Dutch)

Approach

• Generate candidate set of links

• Apply two types of filters to the candidate set
– Domain-specific filtering: using domain heuristics about ship identification
– Text classification of newspaper articles: determine whether the article is about a ship

• Combine filters

Baseline generation

• Find all ship instances in the muster rolls

• Query newspaper archive for first 100 hits with this name
– API: http://www.delpher.nl/

• Result set is expected to have high recall but low precision
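The baseline step can be sketched as follows. This is a minimal illustration, not the authors' code: the `Candidate` type, the `(ship_id, name)` input shape, and the `search` callable are assumptions, and `fake_search` is a hypothetical in-memory stand-in for the newspaper archive rather than the real Delpher API. Only the limit of 100 hits per name comes from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

MAX_HITS = 100  # from the slides: keep the first 100 hits per ship name


@dataclass(frozen=True)
class Candidate:
    """One candidate link between a ship instance and a newspaper article."""
    ship_id: str
    ship_name: str
    article_id: str


def generate_candidates(
    ships: Iterable[Tuple[str, str]],
    search: Callable[[str, int], List[str]],
) -> List[Candidate]:
    """Pair every ship instance with the first MAX_HITS articles matching its name.

    `ships` yields (ship_id, ship_name) pairs; `search(name, limit)` is an
    assumed interface returning article identifiers for a name query.
    """
    hits_by_name: Dict[str, List[str]] = {}  # query each distinct name only once
    candidates: List[Candidate] = []
    for ship_id, name in ships:
        if name not in hits_by_name:
            hits_by_name[name] = search(name, MAX_HITS)
        for article_id in hits_by_name[name]:
            candidates.append(Candidate(ship_id, name, article_id))
    return candidates


def fake_search(name: str, limit: int) -> List[str]:
    """Hypothetical stand-in for the archive search, with invented data."""
    archive = {"Aurora": ["a1", "a2"], "Hoop": ["a3"]}
    return archive.get(name, [])[:limit]


ships = [("s1", "Aurora"), ("s2", "Hoop"), ("s3", "Aurora")]
baseline = generate_candidates(ships, fake_search)
```

Because different ships can share a name, every article hit is linked to each ship instance with that name, which is one reason to expect low precision from the baseline.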

Evaluation

• No gold standard

• Manual assessment of all links is infeasible

• Sampling method for evaluating candidates
– 50 candidates per technique
– 3 assessors (domain expert plus two authors)
– Inter-observer agreement: Cohen’s kappa = 0.65

• Recall: approximation, based on the estimated number of correct links (using the baseline)
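The two evaluation quantities above can be sketched in a few lines. `cohen_kappa` is the standard two-rater formula; how the three assessors were paired is not stated on the slides. `approximate_recall` is one plausible reading of the recall approximation (correct links kept by a filter, divided by correct links estimated in the baseline); the function name and its exact form are assumptions, not the authors' definition.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both raters used one and the same label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def approximate_recall(
    filter_size: int,
    filter_precision: float,
    baseline_size: int,
    baseline_precision: float,
) -> float:
    """Estimated correct links kept by a filter, relative to the baseline.

    Precisions are the sample-based estimates (e.g. from 50 assessed
    candidates); the baseline is treated as the upper bound on correct links.
    """
    estimated_correct_in_baseline = baseline_size * baseline_precision
    estimated_correct_in_filter = filter_size * filter_precision
    return estimated_correct_in_filter / estimated_correct_in_baseline


# Invented toy judgements (1 = correct link, 0 = incorrect link).
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
recall = approximate_recall(100, 0.9, 1000, 0.3)
```

Under this reading, recall can never be measured against links that the baseline query itself missed, which is why the slides call it an approximation.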

Domain-specific filtering

• Heuristic 1: co-occurrence of name of ship captain
– Common practice in historical maritime documentation

• Heuristic 2: date of newspaper article is within ship lifetime (as indicated by muster roll)
– Average life span of ship is 30 years
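A minimal sketch of the two heuristics as predicates over a candidate link. The field names are hypothetical, and two simplifications are assumed: the captain check is an exact, case-insensitive substring match (which OCR errors would defeat), and the ship lifetime is taken as the span between the first and last muster roll year. The slides do not specify either detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate link; all field names are illustrative assumptions."""
    captain_surname: str     # from the muster roll record
    first_muster_year: int   # earliest muster roll mention of the ship
    last_muster_year: int    # latest muster roll mention of the ship
    article_text: str        # OCR text of the newspaper article
    article_year: int


def mentions_captain(c: Candidate) -> bool:
    """Heuristic 1: the captain's name co-occurs in the article text."""
    return c.captain_surname.lower() in c.article_text.lower()


def within_lifetime(c: Candidate) -> bool:
    """Heuristic 2: the article date falls within the assumed ship lifetime."""
    return c.first_muster_year <= c.article_year <= c.last_muster_year


# Invented example data.
good = Candidate("Jansen", 1850, 1862, "Aangekomen: Aurora, kapt. Jansen, van Riga", 1855)
late = Candidate("Jansen", 1850, 1862, "De Aurora is verkocht", 1900)
```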

Text classification

• Task: decide whether a newspaper article is about a ship

• Two techniques used
– Naive Bayes and Support Vector Machine (SVM) with Sequential Minimal Optimisation (SMO)
– WEKA implementation
– Training set: 200 samples (121 positive, 79 negative)
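To illustrate the classification task, here is a small multinomial Naive Bayes written from scratch. It is not the WEKA implementation used in the study: the whitespace tokenizer, the add-one smoothing, the class name `NaiveBayes`, and the four training sentences are all assumptions chosen to keep the sketch self-contained.

```python
import math
from collections import Counter, defaultdict
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer (an assumption, not the study's preprocessing)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; the study used 200 labelled articles.
texts = [
    "ship sailed captain harbour",
    "captain ship arrived cargo",
    "concert in the city tonight",
    "council meets about taxes",
]
labels = ["ship", "ship", "other", "other"]
model = NaiveBayes().fit(texts, labels)
```

For example, `model.predict("the ship with captain Jansen")` scores the "ship" class higher because "ship" and "captain" occur only in the positive training examples.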

Configuration

• Filter 1a: captain name
• Filter 1b: time restriction
• Filter 2: combine filters 1a + 1b
• Filter 2 + text classification
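The four configurations can be expressed as conjunctions of boolean checks. In this sketch each candidate is a dictionary with precomputed flags; the key names (`captain`, `in_lifetime`, `about_ship`) and the data are hypothetical, and combining by logical AND is an assumption about how the filters were composed.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, object]
Filter = Callable[[Candidate], bool]


def all_of(*filters: Filter) -> Filter:
    """Combine filters so that a candidate must pass every one of them."""
    return lambda candidate: all(f(candidate) for f in filters)


def apply_filter(candidates: List[Candidate], keep: Filter) -> List[Candidate]:
    """Return the candidates accepted by the filter."""
    return [c for c in candidates if keep(c)]


# Individual checks read hypothetical precomputed flags.
captain_name: Filter = lambda c: bool(c["captain"])
time_restriction: Filter = lambda c: bool(c["in_lifetime"])
about_ship: Filter = lambda c: bool(c["about_ship"])

configurations: Dict[str, Filter] = {
    "1a": captain_name,
    "1b": time_restriction,
    "2": all_of(captain_name, time_restriction),
    "2 + text classification": all_of(captain_name, time_restriction, about_ship),
}

# Invented candidate set.
candidates: List[Candidate] = [
    {"id": 1, "captain": True, "in_lifetime": True, "about_ship": True},
    {"id": 2, "captain": True, "in_lifetime": False, "about_ship": True},
    {"id": 3, "captain": False, "in_lifetime": True, "about_ship": False},
    {"id": 4, "captain": True, "in_lifetime": True, "about_ship": False},
]
kept = {name: len(apply_filter(candidates, f)) for name, f in configurations.items()}
```

Each added conjunct can only shrink the kept set, which matches the pattern reported later: combining filters raises precision at the cost of recall.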

Results

Analysis

• Captain’s name turns out to be a strong heuristic

• Time restriction much less useful

• When combined, precision becomes very high, at the cost of (approximate) recall

• Text classification has high precision (no false positives)

• Text classification combined with heuristic filtering has negative effect

Discussion

• Interestingly, the historian preferred very high precision at the cost of recall

• Consequently, 16K links published as Linked Data (precision 0.96; approximate recall 0.13)

• Links are to departure/arrival listing, but also to shipwrecks and sales

• In case of good heuristics the contribution of generic techniques is at best minimal

• Absence of gold standard is realistic

Limitations

• Evaluation
– 50 samples
– Choice of assessors
– Approximation of recall

• Data
– OCR quality of newspaper articles
– Digitized newspaper archive covers only 10%

Acknowledgements

• Jurjen Leinenga, domain expert

• CLARIN-NL http://www.clarin.nl

• BiographyNet, Netherlands eScience Center http://esciencecenter.nl

• Online appendix with details of results at http://dx.doi.org/10.6084/m9.figshare.1189228

QUESTION TIME
