Family History and Linked Data: Free UK Genealogy Open Data Conference, 30 January 2016 (Richard Light)

Open data and Free UK Genealogy


Page 1: Open data and Free UK Genealogy

Family History and Linked Data

Free UK Genealogy Open Data Conference, 30 January 2016

Richard Light

Page 2: Open data and Free UK Genealogy

Lights

Page 3: Open data and Free UK Genealogy

Kerridges

Page 4: Open data and Free UK Genealogy

Kerridges + Lights!

Page 5: Open data and Free UK Genealogy

Kerridge and Light

• … and Weissbeck
• relatively uncommon names
• How can FreeBMD and FreeCen help?

Page 6: Open data and Free UK Genealogy

Other people were here first …

• Lots of Kerridge research
• Lights actually feature in a book: Common People (Alison Light)

Page 7: Open data and Free UK Genealogy

Kerridge

Page 8: Open data and Free UK Genealogy

Light

Page 9: Open data and Free UK Genealogy

Pooling results

• Do we want to do it? (Not everyone does …)
• If so, how can it be done?
• How do you say that you’re both talking about the same person?

Page 10: Open data and Free UK Genealogy

Current FreeUKGen search facilities

• BMD search is sophisticated and flexible
• Only one result type: people who match
• Census search has same approach, with links to individual households

Page 11: Open data and Free UK Genealogy

BMD search

Page 12: Open data and Free UK Genealogy

Register search

Page 13: Open data and Free UK Genealogy

Census search

Page 14: Open data and Free UK Genealogy

Limitations of current search

• Limit of 3000 hits per BMD search
• Difficult to get to household info
• Result pages can’t be bookmarked
  – http://www.freecen.org.uk/cgi/search.pl
• Main problem: searches all return HTML!

Page 15: Open data and Free UK Genealogy

Getting machine-processible data

• Save FreeBMD HTML results page
• Copy table of results
• Paste into spreadsheet
• Save as CSV file
• Convert to XML and load into Modes
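The last two steps (CSV to XML) lend themselves to scripting. The following is a minimal sketch only: the column names in the sample, the `entries`/`entry` element names and the sample row itself are all hypothetical, since the slides show neither the FreeBMD table layout nor the schema that Modes expects.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Turn CSV text (with a header row) into a simple XML document."""
    root = ET.Element("entries")  # hypothetical wrapper element
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = ET.SubElement(root, "entry")  # hypothetical record element
        for column, value in row.items():
            # Column headers become element names; spaces are not legal there.
            field = ET.SubElement(entry, column.strip().replace(" ", "_"))
            field.text = (value or "").strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample row, not taken from FreeBMD.
sample = "Surname,Given,District,Volume,Page\nLIGHT,John,Truro,5c,123\n"
print(csv_to_xml(sample))
```

A real conversion would need to map the columns onto whatever element names the target database requires, rather than reusing the spreadsheet headers.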

Page 16: Open data and Free UK Genealogy

BMD data in Modes

Page 17: Open data and Free UK Genealogy

Limitations

• Imprecision
  – temporal, e.g. BMD ‘after the event’ and grouped by quarter
  – geographical: BMD only specifies District; Census -> Parish
  – names: variations in spelling
  – copying/transcription errors
• Incompleteness
  – overseas births/deaths
  – non-registration
  – transcription backlog
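The temporal imprecision can at least be made explicit by turning a registration quarter into a range of possible event dates. This is an illustrative sketch, not a method from the talk: the function names are invented, and the default lag of 42 days (the usual time limit for registering a birth in England and Wales) is an assumption that should be tuned, since late registrations did occur.

```python
from datetime import date, timedelta
from typing import Tuple

# First and last month of each registration quarter.
QUARTER_MONTHS = {1: (1, 3), 2: (4, 6), 3: (7, 9), 4: (10, 12)}


def quarter_window(year: int, quarter: int) -> Tuple[date, date]:
    """Return the first and last day of a registration quarter."""
    first_month, last_month = QUARTER_MONTHS[quarter]
    start = date(year, first_month, 1)
    if last_month == 12:
        end = date(year, 12, 31)
    else:
        # Last day of the quarter is the day before the next quarter starts.
        end = date(year, last_month + 1, 1) - timedelta(days=1)
    return start, end


def event_window(year: int, quarter: int, max_lag_days: int = 42) -> Tuple[date, date]:
    """Widen the window backwards: the event may precede its registration."""
    start, end = quarter_window(year, quarter)
    return start - timedelta(days=max_lag_days), end


print(event_window(1881, 1))
```

An entry indexed in the first quarter of a year may therefore describe an event from late in the previous year, which matters when matching against ages given elsewhere.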

Page 18: Open data and Free UK Genealogy

Encoding a BMD entry as XML

Page 19: Open data and Free UK Genealogy

Indexed search, e.g. places

Page 20: Open data and Free UK Genealogy

Inference of birth data

Page 21: Open data and Free UK Genealogy

Speculative matching death -> birth

Page 22: Open data and Free UK Genealogy

Working with census data

• Initial efforts ‘broke’ FreeCen!
• Data had to be loaded from a full dump
• Loaded all Districts, Pieces and Households
• Selectively loaded Light and Kerridge records
• Then loaded all people registered in one of these Light or Kerridge households
• Shows up Lights/Kerridges as servants, in institutions, etc.

Page 23: Open data and Free UK Genealogy

Districts

Page 24: Open data and Free UK Genealogy

Pieces

Page 25: Open data and Free UK Genealogy

Households

Page 26: Open data and Free UK Genealogy

Census data: co-contextuality

• Each ‘household’ records relationships between people

• Binary links between ‘Head’ and others, but other family relationships can be inferred

• Nothing like the completeness of FreeBMD, but more can be done with the data that is there
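As a sketch of what "can be inferred" might mean in practice, the example below derives links between household members from each person's recorded relationship to the Head. The household rows are hypothetical placeholders, the relationship terms are assumed to be already standardised, and the inference rules are deliberately cautious guesses rather than anything shown in the talk.

```python
from itertools import combinations

# Hypothetical household: (name, relationship to Head as recorded).
household = [
    ("Person A", "Head"),
    ("Person B", "Wife"),
    ("Person C", "Son"),
    ("Person D", "Daughter"),
    ("Person E", "Servant"),
]

CHILD_TERMS = {"Son", "Daughter"}


def infer_links(members):
    """Derive links that the return does not state directly."""
    wives = [name for name, rel in members if rel == "Wife"]
    children = [name for name, rel in members if rel in CHILD_TERMS]
    links = []
    # Children of the same Head are taken to be siblings (possibly half-siblings).
    for a, b in combinations(children, 2):
        links.append((a, "sibling of", b))
    # A child of the Head is only *probably* a child of the Head's wife.
    for wife in wives:
        for child in children:
            links.append((child, "probable child of", wife))
    return links


for link in infer_links(household):
    print(link)
```

Labelling the second rule "probable" keeps the inferred links distinguishable from what the census return actually records.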

Page 27: Open data and Free UK Genealogy

Household summaries

Page 28: Open data and Free UK Genealogy

Occupations - Kerridge

Occupations of Kerridges (>1)

KERRIDGE Scholar
KERRIDGE -
KERRIDGE Ag Labr
KERRIDGE Agricultural Labourer(Em'ee)
KERRIDGE Farmer's Son
KERRIDGE Farm Labourer(Em'ee)
KERRIDGE Farmer(Em'er)
KERRIDGE Labourer(Em'ee)
KERRIDGE Domestic Servant
KERRIDGE Farm Labr
KERRIDGE Agricultural Laborer(Em'ee)
KERRIDGE Brickmaker(Em'ee)
KERRIDGE Farm Labourer (Em'ee)
KERRIDGE Retired Ag Labr

Page 29: Open data and Free UK Genealogy

Occupations - Light

Occupations of Lights (>1)

LIGHT Scholar
LIGHT Ag Lab
LIGHT Ag Laborer
LIGHT Labourer
LIGHT Copper Miner
LIGHT Female Servant
LIGHT Miner
LIGHT Pauper
LIGHT Sawyer
LIGHT Tin Miner(Em'ee)
LIGHT -
LIGHT Butcher(Em'ee)
LIGHT Coal Miner(Em'ee)
LIGHT Cordwainer
LIGHT Gardener
LIGHT General Servant
LIGHT Independent
LIGHT Mariner
LIGHT Milliner
LIGHT Miner Copper

Page 30: Open data and Free UK Genealogy

Cross-linking census data to BMD

• Census records include place of birth and age
• Can use same inference techniques to match against BMD data
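One way such a match might be sketched: convert the census age into a range of birth years, then filter birth-index entries by surname and year. The field names and sample records are hypothetical, and place of birth is left out because comparing a census parish with a BMD district would need a lookup table that the slides do not provide.

```python
def birth_year_range(census_year: int, age: int, tolerance: int = 0) -> tuple:
    """Years in which someone of the stated age could have been born.

    The census was taken part-way through the year, so a stated age maps
    to two calendar years; `tolerance` widens the range for misreported ages.
    """
    return census_year - age - 1 - tolerance, census_year - age + tolerance


def candidate_births(person: dict, births: list, tolerance: int = 0) -> list:
    """Filter birth-index entries that are consistent with a census person."""
    earliest, latest = birth_year_range(person["census_year"], person["age"], tolerance)
    return [
        entry
        for entry in births
        if entry["surname"].upper() == person["surname"].upper()
        and earliest <= entry["year"] <= latest
    ]


# Hypothetical records for illustration only.
census_person = {"surname": "Light", "age": 10, "census_year": 1881}
birth_index = [
    {"surname": "LIGHT", "year": 1870, "quarter": 3},
    {"surname": "LIGHT", "year": 1875, "quarter": 1},
    {"surname": "KERRIDGE", "year": 1870, "quarter": 3},
]
print(candidate_births(census_person, birth_index))
```

The result is a list of candidates, not a confirmed match: exact surname comparison will miss spelling variants, and several entries may fit equally well.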

Page 31: Open data and Free UK Genealogy

An Open Data FreeUKGen API …

• … could be HTTP-based; RESTful
• would support a wide variety of information needs
• would deliver a variety of machine-processible formats
• would allow re-use of the data
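To illustrate the kind of request such an API could accept, here is a sketch that builds a bookmarkable search URL in which the whole query lives in the address. Everything about the API is hypothetical: the host is a placeholder, and the `births` resource and the parameter names are invented for this example, because the slides only propose that an API could exist.

```python
from urllib.parse import urlencode

# Placeholder host: no such FreeUKGen API is described in the slides.
BASE_URL = "https://api.example.org"


def search_url(resource: str, **criteria) -> str:
    """Build a URL that fully describes a search, so it can be bookmarked."""
    # Sorting the criteria gives the same URL for the same query every time.
    query = urlencode(sorted(criteria.items()))
    return f"{BASE_URL}/{resource}?{query}"


print(search_url("births", surname="Light", year=1870))
```

Because the URL alone identifies the search, the same address could be shared, cited, or fetched by a program, which a form-driven HTML search page does not allow.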

Page 32: Open data and Free UK Genealogy

The problem of identity

• All my data files use invented primary keys for people, places, … which are only significant within my database

• In general, how do we assert that two statements are about the same person?

• None of these is sufficient on its own:
  – Name
  – Date of birth/death
  – Place of birth/death
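A simple sketch of combining the three kinds of evidence rather than relying on any one of them. The field names, the one-year date tolerance and the sample statements are assumptions made for illustration; even agreement on all three does not prove identity (two cousins with the same name born in the same parish would pass).

```python
def agreeing_fields(a: dict, b: dict) -> list:
    """List the kinds of evidence on which two statements agree."""
    agreed = []
    same_surname = a["surname"].lower() == b["surname"].lower()
    same_forename = a["forename"].lower() == b["forename"].lower()
    if same_surname and same_forename:
        agreed.append("name")
    # Allow a year of slack, since dates are often inferred rather than recorded.
    if abs(a["birth_year"] - b["birth_year"]) <= 1:
        agreed.append("birth date")
    if a["birth_place"].lower() == b["birth_place"].lower():
        agreed.append("birth place")
    return agreed


def plausibly_same(a: dict, b: dict) -> bool:
    """Treat two statements as candidates only when all three kinds agree."""
    return len(agreeing_fields(a, b)) == 3


# Hypothetical statements from two researchers' files.
mine = {"surname": "Light", "forename": "John", "birth_year": 1870, "birth_place": "Truro"}
theirs = {"surname": "LIGHT", "forename": "john", "birth_year": 1871, "birth_place": "truro"}
print(agreeing_fields(mine, theirs), plausibly_same(mine, theirs))
```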

Page 33: Open data and Free UK Genealogy

Linked Data

• One step beyond Open Data
• Combines idea of machine-processible data with a persistent identity for each concept
• Uses content negotiation to return RDF, XML, JSON, … for each URL
• Allows programmatic access to data; processing chains (‘follow your nose’)
• Requires suitably open licensing

Page 34: Open data and Free UK Genealogy

Linked Data example: Wordsworth Trust

Page 35: Open data and Free UK Genealogy

Museum catalogue data as RDF

Page 36: Open data and Free UK Genealogy

Everything comes from the same URL

http://collections.wordsworth.org.uk/Object/WTcoll/id/GRMDC.C144.9

By default, return HTML:
http://collections.wordsworth.org.uk/Object/WTcoll/id/html/GRMDC.C144.9

When RDF requested (in Accept header), redirect to a variant URL:
http://collections.wordsworth.org.uk/Object/WTcoll/id/rdf/GRMDC.C144.9

Can support lots of variant formats, e.g. XML, JSON, …

This approach relies on a technique called Content Negotiation

Linked Data URLs are unique; persistent; dereferenceable
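The redirect logic described on this slide can be sketched as a small function that maps an Accept header to a variant URL. This is not the Wordsworth Trust's actual implementation: the pairing of media types with the `html` and `rdf` path segments is an assumption, and the Accept parsing is simplified (it takes the first supported type listed and ignores quality values).

```python
BASE = "http://collections.wordsworth.org.uk/Object/WTcoll/id"

# Media type -> path segment. The segments appear on the slide; the media
# types paired with them here are assumptions.
VARIANTS = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
}


def variant_url(object_id: str, accept_header: str = "") -> str:
    """Pick the variant URL to redirect to for a given Accept header."""
    for item in accept_header.split(","):
        # Strip parameters such as ";q=0.9" and normalise case.
        media_type = item.split(";")[0].strip().lower()
        if media_type in VARIANTS:
            return f"{BASE}/{VARIANTS[media_type]}/{object_id}"
    # Default, as on the slide: return HTML.
    return f"{BASE}/html/{object_id}"


print(variant_url("GRMDC.C144.9", "application/rdf+xml"))
print(variant_url("GRMDC.C144.9"))
```

Adding another format would only need one more entry in `VARIANTS`, which is what makes the single generic URL workable as a persistent identifier.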

Page 37: Open data and Free UK Genealogy

What FreeUKGen resources could we publish as Linked Data?

• Can only assign identifiers to data we have
  – BMD registration events
  – Census return events
  – Pieces, Districts etc.
• Can’t assign identifiers to people
• Problem: current database update strategy generates identifiers afresh each time
  – Conflicts with need for persistent identifiers
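One possible way around regenerated identifiers is to derive each identifier from what the record says, so that reloading the database yields the same value. The sketch below uses name-based UUIDs for this; the namespace URL is a placeholder, and the choice of key fields (year, quarter, district, volume, page) is an assumption, not FreeUKGen's scheme.

```python
import uuid

# Placeholder namespace; a real scheme would use a domain the publisher controls.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/freeukgen/")


def persistent_id(record_type: str, *key_parts) -> str:
    """Derive an identifier from the record's content, not from load order."""
    # Normalise case and whitespace so trivial differences do not change the ID.
    natural_key = "|".join(str(part).strip().upper() for part in key_parts)
    return str(uuid.uuid5(NAMESPACE, f"{record_type}|{natural_key}"))


first_load = persistent_id("birth", 1870, 3, "Truro", "5c", 123)
second_load = persistent_id("birth", 1870, 3, " truro ", "5C", "123")
print(first_load == second_load)
```

The trade-off is that correcting a transcription error in a key field changes the identifier, so a production scheme would probably also need a registry that records superseded identifiers.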

Page 38: Open data and Free UK Genealogy

Potential Linked Data projects

• Produce authorities which can be integrated into current approach:
  – Geographical units: Districts, Parishes, Pieces, named places. Link to Geonames, OS Gazetteer
  – Occupations: potential for useful groupings (e.g. Ag Lab and variants). Link to SIC, SHIC?
• Generate persistent identifiers for the primary references published by FreeUKGen
  – e.g. a page within the BMD index
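The grouping of occupation variants could start from a simple normalisation step, sketched here using strings from the occupation slides. The abbreviation table is chosen for illustration and is not an agreed authority list; deciding which normalised terms belong together (for example, whether farm labourers and agricultural labourers form one group) would be the job of the authority file itself.

```python
import re
from collections import defaultdict

# Abbreviation expansions chosen for illustration; not an agreed authority list.
EXPANSIONS = {
    "ag": "agricultural",
    "lab": "labourer",
    "labr": "labourer",
    "laborer": "labourer",
}


def normalise(occupation: str) -> str:
    """Reduce an occupation string to a grouping key."""
    # Drop the employment-status suffix, e.g. "(Em'ee)" or "(Em'er)".
    text = re.sub(r"\(.*?\)", "", occupation).lower()
    words = [EXPANSIONS.get(word, word) for word in re.findall(r"[a-z']+", text)]
    return " ".join(words)


def group(occupations):
    """Collect the original spellings under their normalised key."""
    groups = defaultdict(list)
    for occupation in occupations:
        groups[normalise(occupation)].append(occupation)
    return dict(groups)


recorded = ["Ag Labr", "Agricultural Labourer(Em'ee)", "Ag Lab",
            "Agricultural Laborer(Em'ee)", "Farm Labr", "Farm Labourer (Em'ee)"]
for key, spellings in group(recorded).items():
    print(key, spellings)
```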

Page 39: Open data and Free UK Genealogy

Let the computer work harder!

• Current approach makes very little use of the computer as a data-processing tool

• FreeUKGen resources as Open Data would support new types of research and simplify e.g. Single Name Studies

• FreeUKGen resources as Linked Data would give the community a common frame of reference for its work

Page 40: Open data and Free UK Genealogy

Cultural Heritage Linked Data

Page 41: Open data and Free UK Genealogy

Thank you!

Richard Light
FreeUKGen Trustee
@richardofsussex

[email protected]