Family History and Linked Data
Free UK Genealogy Open Data Conference, 30 January 2016
Richard Light
Lights
Kerridges
Kerridges + Lights!
Kerridge and Light
• … and Weissbeck
• relatively uncommon names
• How can FreeBMD and FreeCen help?
Other people were here first …
• Lots of Kerridge research
• Lights actually feature in a book: Common People (Alison Light)
Kerridge
Light
Pooling results
• Do we want to do it? (Not everyone does …)
• If so, how can it be done?
• How do you say that you’re both talking about the same person?
Current FreeUKGen search facilities
• BMD search is sophisticated and flexible
• Only one result type: people who match
• Census search has same approach, with links to individual households
BMD search
Register search
Census search
Limitations of current search
• Limit of 3000 hits per BMD search
• Difficult to get to household info
• Result pages can’t be bookmarked
  – http://www.freecen.org.uk/cgi/search.pl
• Main problem: searches all return HTML!
Getting machine-processible data
• Save FreeBMD HTML results page
• Copy table of results
• Paste into spreadsheet
• Save as CSV file
• Convert to XML and load into Modes
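The CSV-to-XML step can be sketched in a few lines of Python. This is only an illustration: the column names, the element names and the sample row are invented here, and the XML structure that Modes actually expects is not shown in the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="entries", row_tag="entry"):
    """Turn CSV text with a header row into a simple XML document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_tag)
    for row in reader:
        entry = ET.SubElement(root, row_tag)
        for column, value in row.items():
            if column is None:
                continue  # ignore cells beyond the header row
            # XML element names cannot contain spaces
            tag = column.strip().lower().replace(" ", "_")
            ET.SubElement(entry, tag).text = (value or "").strip()
    return ET.tostring(root, encoding="unicode")


# Invented sample row: not a real index entry
sample = "Surname,Given name,District,Volume,Page\nKERRIDGE,John,Hoxne,4a,123\n"
xml_text = csv_to_xml(sample)
```

Each spreadsheet row becomes one `entry` element, with one child element per column.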
BMD data in Modes
Limitations
• Imprecision
  – temporal, e.g. BMD ‘after the event’ and grouped by quarter
  – geographical: BMD only specifies District; Census -> Parish
  – names: variations in spelling
  – copying/transcription errors
• Incompleteness
  – overseas births/deaths
  – non-registration
  – transcription backlog
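The temporal imprecision can be made explicit by turning a registration quarter into a date window. A minimal sketch, assuming quarters are labelled Mar, Jun, Sep and Dec; the `lag_days` allowance for events registered some time after they happened is an adjustable assumption, not a rule taken from the slides.

```python
from datetime import date, timedelta

# Assumed quarter labels, keyed to the month in which the quarter ends
QUARTER_END_MONTH = {"Mar": 3, "Jun": 6, "Sep": 9, "Dec": 12}


def quarter_bounds(year, quarter):
    """Return the first and last day of a registration quarter."""
    end_month = QUARTER_END_MONTH[quarter]
    start = date(year, end_month - 2, 1)
    if end_month == 12:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, end_month + 1, 1)
    return start, next_start - timedelta(days=1)


def event_window(year, quarter, lag_days=42):
    """Widen the quarter backwards, since registration follows the event."""
    start, end = quarter_bounds(year, quarter)
    return start - timedelta(days=lag_days), end
```

A birth registered in the Jun quarter of 1871 is then treated as having happened somewhere in the window returned by `event_window(1871, "Jun")`, rather than on a single date.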
Encoding a BMD entry as XML
Indexed search, e.g. places
Inference of birth data
Speculative matching death -> birth
Working with census data
• Initial efforts ‘broke’ FreeCen!
• Data had to be loaded from a full dump
• Loaded all Districts, Pieces and Households
• Selectively loaded Light and Kerridge records
• Then loaded all people registered in one of these Light or Kerridge households
• Shows up Lights/Kerridges as servants, in institutions, etc.
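The two-pass selection described above (find the households containing a target surname, then load everyone in those households) might look like this. The `Person` record and its field names are hypothetical; the real FreeCen dump has its own layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical fields, not the actual FreeCen column names
    household_id: int
    surname: str
    forename: str
    relationship: str


def select_households(people, surnames):
    """Return every person living in a household that contains a target surname."""
    wanted = {s.upper() for s in surnames}
    # Pass 1: households with at least one person of a wanted surname
    household_ids = {p.household_id for p in people if p.surname.upper() in wanted}
    # Pass 2: everyone in those households, whatever their surname
    return [p for p in people if p.household_id in household_ids]


# Invented sample data
people = [
    Person(1, "SMITH", "Thomas", "Head"),
    Person(1, "KERRIDGE", "Ann", "Servant"),
    Person(2, "JONES", "William", "Head"),
]
selected = select_households(people, ["Light", "Kerridge"])
```

Keeping the whole household is what lets a Kerridge recorded as a servant appear alongside the employing family.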
Districts
Pieces
Households
Census data: co-contextuality
• Each ‘household’ records relationships between people
• Binary links between ‘Head’ and others, but other family relationships can be inferred
• Nothing like the completeness of FreeBMD, but more can be done with the data that is there
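As a rough illustration of inferring further relationships from the links to the ‘Head’: two people who are both recorded as a child of the Head are probably siblings, and the Head's wife is probably their parent. The relationship terms used here are assumptions about how the column is filled in, and every inferred link is labelled "possible" because step-relationships and transcription errors cannot be ruled out.

```python
CHILD_TERMS = {"son", "daughter"}   # assumed values of the relationship column
SPOUSE_TERMS = {"wife", "husband"}


def infer_relationships(household):
    """household: list of (name, relationship_to_head) pairs."""
    children = [n for n, rel in household if rel.strip().lower() in CHILD_TERMS]
    spouses = [n for n, rel in household if rel.strip().lower() in SPOUSE_TERMS]
    inferred = []
    # Children of the same Head are likely siblings (or half-siblings)
    for i, first in enumerate(children):
        for second in children[i + 1:]:
            inferred.append((first, "possible sibling of", second))
    # The Head's spouse is likely, but not certainly, a parent of those children
    for spouse in spouses:
        for child in children:
            inferred.append((spouse, "possible parent of", child))
    return inferred


# Invented sample household
household = [("John", "Head"), ("Mary", "Wife"), ("Ann", "Daughter"),
             ("Tom", "Son"), ("Eliza", "Servant")]
links = infer_relationships(household)
```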
Household summaries
Occupations - Kerridge
Occupations of Kerridges (>1)
KERRIDGE Scholar
KERRIDGE -
KERRIDGE Ag Labr
KERRIDGE Agricultural Labourer(Em'ee)
KERRIDGE Farmer's Son
KERRIDGE Farm Labourer(Em'ee)
KERRIDGE Farmer(Em'er)
KERRIDGE Labourer(Em'ee)
KERRIDGE Domestic Servant
KERRIDGE Farm Labr
KERRIDGE Agricultural Laborer(Em'ee)
KERRIDGE Brickmaker(Em'ee)
KERRIDGE Farm Labourer (Em'ee)
KERRIDGE Retired Ag Labr
Occupations - Light
Occupations of Lights (>1)
LIGHT Scholar
LIGHT Ag Lab
LIGHT Ag Laborer
LIGHT Labourer
LIGHT Copper Miner
LIGHT Female Servant
LIGHT Miner
LIGHT Pauper
LIGHT Sawyer
LIGHT Tin Miner(Em'ee)
LIGHT -
LIGHT Butcher(Em'ee)
LIGHT Coal Miner(Em'ee)
LIGHT Cordwainer
LIGHT Gardener
LIGHT General Servant
LIGHT Independent
LIGHT Mariner
LIGHT Milliner
LIGHT Miner Copper
Cross-linking census data to BMD
• Census records include place of birth and age
• Can use same inference techniques to match against BMD data
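One way to use the census age: a person aged N on census night was born in one of two calendar years, and a tolerance allows for misreported ages. The sketch below then filters BMD birth entries by surname and year. The dictionary keys and the sample records are invented, and matching on place is left out because it needs a Parish-to-District mapping that is not covered here.

```python
def birth_year_range(census_year, age, tolerance=1):
    """Return (earliest, latest) plausible birth year for a census age."""
    latest = census_year - age   # birthday already passed by census night
    earliest = latest - 1        # birthday still to come that year
    return earliest - tolerance, latest + tolerance


def candidate_births(person, births, tolerance=1):
    """Filter BMD birth entries that could belong to a census person."""
    first, last = birth_year_range(person["census_year"], person["age"], tolerance)
    return [b for b in births
            if b["surname"].upper() == person["surname"].upper()
            and first <= b["year"] <= last]


# Invented sample records
person = {"surname": "Light", "age": 30, "census_year": 1871}
births = [
    {"surname": "LIGHT", "year": 1840, "district": "A"},
    {"surname": "LIGHT", "year": 1850, "district": "A"},
    {"surname": "KERRIDGE", "year": 1840, "district": "B"},
]
matches = candidate_births(person, births)
```

The result is a list of candidates to review, not an assertion that any of them is the same person.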
An Open Data FreeUKGen API …
• … could be HTTP-based; RESTful
• would support a wide variety of information needs
• would deliver a variety of machine-processible formats
• would allow re-use of the data
The problem of identity
• All my data files use invented primary keys for people, places, … which are only significant within my database
• In general, how do we assert that two statements are about the same person?
• None of these is sufficient on its own:
  – Name
  – Date of birth/death
  – Place of birth/death
Linked Data
• One step beyond Open Data
• Combines idea of machine-processible data with a persistent identity for each concept
• Uses content negotiation to return RDF, XML, JSON, … for each URL
• Allows programmatic access to data; processing chains (‘follow your nose’)
• Requires suitably open licensing
Linked Data example: Wordsworth Trust
Museum catalogue data as RDF
Everything comes from the same URL
http://collections.wordsworth.org.uk/Object/WTcoll/id/GRMDC.C144.9
By default, return HTML:
http://collections.wordsworth.org.uk/Object/WTcoll/id/html/GRMDC.C144.9
When RDF requested (in Accept header), redirect to a variant URL:
http://collections.wordsworth.org.uk/Object/WTcoll/id/rdf/GRMDC.C144.9
Can support lots of variant formats, e.g. XML, JSON, …
This approach relies on a technique called Content Negotiation
Linked Data URLs are unique; persistent; dereferenceable
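The redirect rule in the Wordsworth Trust example can be sketched as a function from the canonical URL and the Accept header to a variant URL. This is a simplification: the table of recognised media types is an assumption, quality values (`q=`) in the header are ignored, and it is not a description of how the Wordsworth Trust server is actually implemented.

```python
# Assumed mapping from requested media type to the URL segment for that variant
MEDIA_TYPE_TO_SEGMENT = {
    "application/rdf+xml": "rdf",
    "text/html": "html",
}


def variant_url(canonical_url, accept_header=""):
    """Choose the variant URL to redirect to, defaulting to HTML."""
    segment = "html"
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in MEDIA_TYPE_TO_SEGMENT:
            segment = MEDIA_TYPE_TO_SEGMENT[media_type]
            break
    # Insert the format segment just before the final identifier
    head, _, identifier = canonical_url.rpartition("/")
    return f"{head}/{segment}/{identifier}"


canonical = "http://collections.wordsworth.org.uk/Object/WTcoll/id/GRMDC.C144.9"
rdf_url = variant_url(canonical, "application/rdf+xml")
html_url = variant_url(canonical)
```

The client only ever needs to know the canonical URL; the format it receives depends on what it asks for.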
What FreeUKGen resources could we publish as Linked Data?
• Can only assign identifiers to data we have
  – BMD registration events
  – Census return events
  – Pieces, Districts etc.
• Can’t assign identifiers to people
• Problem: current database update strategy generates identifiers afresh each time
  – Conflicts with need for persistent identifiers
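One possible way round regenerated keys is to derive the identifier from the content of the reference itself, so that a database rebuild yields the same value every time. The sketch below uses name-based UUIDs; the namespace URL and the choice of fields (event type, year, quarter, district, volume, page) are assumptions made for illustration, not FreeUKGen's design.

```python
import uuid

# Hypothetical namespace: any fixed URL works, as long as it never changes
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/freeukgen/")


def persistent_id(event_type, year, quarter, district, volume, page):
    """Derive a stable identifier from the fields of an index reference."""
    parts = (event_type, year, quarter, district, volume, page)
    # Normalise case and whitespace so trivial differences do not change the id
    key = "|".join(str(part).strip().upper() for part in parts)
    return str(uuid.uuid5(NAMESPACE, key))


# Invented sample reference
first_build = persistent_id("birth", 1871, "Jun", "Hoxne", "4a", 123)
second_build = persistent_id("Birth", 1871, "jun", " hoxne ", "4A", 123)
```

Because several entries can share one index page, an identifier built this way names the reference, not a person, which fits the constraint that identifiers can only be assigned to data that exists.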
Potential Linked Data projects
• Produce authorities which can be integrated into current approach:
  – Geographical units: Districts, Parishes, Pieces, named places. Link to Geonames, OS Gazetteer
  – Occupations: potential for useful groupings (e.g. Ag Lab and variants). Link to SIC, SHIC?
• Generate persistent identifiers for the primary references published by FreeUKGen
  – e.g. a page within the BMD index
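Grouping ‘Ag Lab and variants’ could start from a simple normalisation rule like the one below. The pattern is tuned only to the spellings that appear in the occupation lists earlier in this talk, and treating farm labourers as agricultural labourers is a judgement call made for the sketch; a real authority file would need a much broader, reviewed mapping.

```python
import re

# Matches e.g. "ag lab", "ag labr", "agricultural laborer", "farm labourer"
AG_LAB_PATTERN = re.compile(r"(ag(ricultural)?|farm) lab(ou?rer|r)?")
# Employee/employer markers such as "(Em'ee)" and "(Em'er)"
EMPLOYMENT_MARKER = re.compile(r"\(em'e[er]\)")


def normalise(occupation):
    """Lower-case, drop the employment marker and tidy whitespace."""
    text = EMPLOYMENT_MARKER.sub("", occupation.lower())
    return re.sub(r"\s+", " ", text).strip()


def occupation_group(occupation):
    """Map known variants to one group label; leave everything else unchanged."""
    if AG_LAB_PATTERN.fullmatch(normalise(occupation)):
        return "Agricultural labourer"
    return occupation.strip()


variants = ["Ag Labr", "Ag Lab", "Ag Laborer", "Agricultural Labourer(Em'ee)",
            "Agricultural Laborer(Em'ee)", "Farm Labourer (Em'ee)", "Farm Labr"]
groups = {occupation_group(v) for v in variants}
```

Unmatched values such as "Scholar" or plain "Labourer" pass through untouched, so the rule can be extended one group at a time.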
Let the computer work harder!
• Current approach makes very little use of the computer as a data-processing tool
• FreeUKGen resources as Open Data would support new types of research and simplify e.g. Single Name Studies
• FreeUKGen resources as Linked Data would give the community a common frame of reference for its work
Cultural Heritage Linked Data