Linked Data for Digital History
Connecting Data for Research
Victor de Boer
With input from Christophe Guéret, Serge ter Braake, Niels Ockeloen, Antske Fokkens, Dirk Roorda, Lora
Aroyo, Johan Oomen, Oana Inel, Jan Wielemaker, Jeroen Entjes
Victor de Boer
Web & Media Group, CS, Vrije Universiteit Amsterdam
Netherlands Institute for Sound and Vision
Cultural Heritage
Digital History
Linked Data for Development
Digital History
Sub-discipline of digital humanities
Part of the historian's effort has moved from physical archives to digital ones
Cross-domain collaboration
Img: www.doaks.org, www.dkrz.de
Tools and visualisations
http://armstrongdigitalhistory.org/, http://www.vcdh.virginia.edu/courses/fall07/hius401-f/, http://digitalhistory.unl.edu/essays/thomasessay.php, http://www.philipvickersfithian.com/2013/05/gender-in-stacks-on-managing-small.html
“That is great. I would love that…
…but my research questions are slightly different.”
Img:Monty Python
Aging
Data Tool
C. Guéret based on http://redmonk.com/jgovernor/2007/04/05/why-applciations-are-like-fish-and-data-is-like0wine/
Even better
Do not bake the data into the tool and treat data as an end product.
Build tools on top of the data.
Make sure others can do so as well.
Fig: C. Guéret
Linked Data for Digital History
• Represent heterogeneous datasets with their own data models in a common format: Resource Description Framework (RDF)
– Link what can be linked
• re-use and re-usability
• Linked Data is the (technically) best way to publish and share your (research) data
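The idea behind these bullets can be sketched in a few lines of plain Python. This is only an illustration, assuming two made-up datasets with their own vocabularies (the `a:` and `b:` names, the ship name and the crew size are all hypothetical) and one explicit link between them; real Linked Data would use full URIs and an RDF store rather than Python sets.

```python
# A minimal sketch: two toy datasets keep their own vocabularies, are stored
# as (subject, predicate, object) triples, and are connected by one link.
# Every identifier and value below is hypothetical illustration data.

ships_a = {
    ("a:ship1", "a:name", "Example Ship"),
    ("a:ship1", "a:type", "a:TypeX"),
}
crew_b = {
    ("b:muster7", "b:vessel", "b:vessel42"),
    ("b:muster7", "b:crewSize", "21"),
}
# "Link what can be linked": state that both identifiers denote the same ship.
links = {("a:ship1", "owl:sameAs", "b:vessel42")}

# Merging datasets is just a set union; no shared schema is required.
graph = ships_a | crew_b | links


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """All subjects of triples with the given predicate and object."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def crew_sizes_for_ship_named(graph, name):
    """Answer a cross-dataset question by following the sameAs link."""
    sizes = set()
    for ship in subjects(graph, "a:name", name):
        same = objects(graph, ship, "owl:sameAs") | {ship}
        for vessel in same:
            for muster in subjects(graph, "b:vessel", vessel):
                sizes |= objects(graph, muster, "b:crewSize")
    return sizes
```

Calling `crew_sizes_for_ship_named(graph, "Example Ship")` only finds the crew size because the link triple is present; on `ships_a | crew_b` alone the same question returns an empty set.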
OBJECT EVENT
PLACE
TIME
PERSON
CONCEPT
PROVENANCE
Some examples
Dutch Ships and Sailors
The Problem: ((Maritime) historical) data is not integrated
KB NEWSPAPERS
Dutch-Asiatic Shipping “VOC Opvarenden”
Jur Leinenga Matthias van Rossum
Elbing voyages
Archangel voyages
DIFFERENT but LINKED DATAMODELS BASED ON COMPETENCY QUESTIONS
dss:Record; gzmvoc:Telling
gzmvoc:telling-1046-De_Berkel
__bnode_1
gzmvoc:aziatischeBemanning
dss:Ship; gzmvoc:Schip
gzmvoc:schip-1046-De_Berkel
dss:has_ship; gzmvoc:schip
"1046"
“Schip”
“De Berkel”
rdfs:label; dss:scheepsnaam
gzmvoc:scheepsnaam
dss:ShipType; gzmvoc:Scheepstype
gzmvoc:type-Ship; dss:has_shiptype
gzmvoc:has_shiptype
gzmvoc:scheepstype
“21”
“Moorse mattroosen”
dss:azRegistratieKop
gzmvoc:azAantalMatrozen
gzmvoc:telling
gzmvoc:heeft DAS heenreis
dss:Record; das:Voyage
das:voyage-1918_61
ACCESS IT AT HTTP://DUTCHSHIPSANDSAILORS.NL/DATA
OR
HTTP://SEMANTICWEB.CS.VU.NL/DSS
SELECT * WHERE {
  ?record dss:hasOriginalScan ?scan .
  ?record dss:has_kb_link ?kblink .
  ?record mdb:schip ?schip .
  ?schip mdb:scheepstype ?shiptype .
  ?shiptype skos:exactMatch ?em .
  ?em skos:broader* aat:kustvaarders .
}
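The last pattern in the query, `skos:broader*`, is a SPARQL property path: it matches concepts that reach `aat:kustvaarders` in zero or more `skos:broader` steps. The Python sketch below shows what that traversal amounts to. It is an illustration only: the `ex:` concepts and the hierarchy are invented and say nothing about the real AAT thesaurus, and a SPARQL engine does this work internally.

```python
# A minimal sketch of "zero or more skos:broader steps", on a made-up hierarchy.
# The ex: concepts are hypothetical; only aat:kustvaarders comes from the query.

TARGET = "aat:kustvaarders"

# concept -> list of directly broader concepts (hypothetical data)
broader = {
    "ex:typeA": ["ex:typeB"],
    "ex:typeB": [TARGET],
    "ex:typeC": ["ex:other"],
}


def broader_closure(start, broader):
    """Every concept reachable from start via zero or more broader steps."""
    seen = {start}          # zero steps: the start concept itself
    stack = [start]
    while stack:
        concept = stack.pop()
        for parent in broader.get(concept, ()):
            if parent not in seen:   # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def matches_target(concept, broader, target=TARGET):
    """True if the concept satisfies `?em skos:broader* target`."""
    return target in broader_closure(concept, broader)
```

Because the path allows zero steps, a ship type mapped directly to the target concept matches as well as one mapped to any narrower concept.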
Data analysis and visualisation
MEDIA HISTORIANS AND RESEARCHERS
Media researcher Lars Arve Røssland of the University of Bergen. (Photo: Andreas R. Graven)
EXPLORATIVE SEARCH
Digital Hermeneutics: The combination of digital (Web) technology and theory of interpretation
DATA: OPENIMAGES.EU and DELPHER.NL
ENTITY EXTRACTION
CROWDTRUTH.ORG
ENTITY EXTRACTION
EVENTS CROWDSOURCING AND LINKING TO CONCEPTS THROUGH CROWDTRUTH.ORG
SEGMENTATION & KEYFRAMES
LINKING EVENTS AND CONCEPTS TO KEYFRAMES
DATA CONNECTED IN KNOWLEDGE GRAPH
DIVE:MEDIA OBJECT
SEM:EVENT
SEM:PLACE
SEM:TIME
SEM:ACTOR
SKOS:CONCEPT
OA:ANNOTATION
LINKS TO EUROPEANA
LINKS TO DBPEDIA
BiographyNet
Starting Point: Biography Portal of the Netherlands; www.biografischportaal.nl
125,000 short biographical descriptions with limited metadata from 23 Dutch biographical dictionaries (~76,000 individuals)
What kind of historical questions can be answered with these data with the help of computational methods?
Biographynet.nl
Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duitse… (English: "Johan Rudolph Thorbecke was born in 1798 on 14 January in Zwolle and comes from a half-German…")
Linked Data for BiographyNet
Thorbecke
Biographical Description
Provenance Meta Data
NNBW
Person Meta Data
“Thorbecke”
Biography Parts
Birth; 1798; Event
Biographical Description
Enrichment NLP Tool
Person Meta Data
Event; Birth
Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle en komt uit een half-Duitse…
Zwolle; 1798-01-14
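The enrichment step above turns the biography sentence into a structured birth event (place Zwolle, date 1798-01-14). The toy Python sketch below reproduces that one result with a regular expression. It is not the project's NLP tool: the pattern, the function name `extract_birth` and the output keys are assumptions made for illustration, and the pattern covers only this exact Dutch phrasing.

```python
import re

# Dutch month names mapped to month numbers.
DUTCH_MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "augustus": 8,
    "september": 9, "oktober": 10, "november": 11, "december": 12,
}

# Hypothetical pattern for "in <year> geboren op <day> <month> in <Place>".
BIRTH_PATTERN = re.compile(
    r"in (?P<year>\d{4}) geboren op (?P<day>\d{1,2}) "
    r"(?P<month>[a-z]+) in (?P<place>[A-Z][\w-]*)"
)


def extract_birth(text):
    """Return a birth event dict, or None if the phrasing is not recognised."""
    match = BIRTH_PATTERN.search(text)
    if match is None:
        return None
    month = DUTCH_MONTHS.get(match.group("month"))
    if month is None:
        return None
    date = "{:04d}-{:02d}-{:02d}".format(
        int(match.group("year")), month, int(match.group("day"))
    )
    return {"type": "Birth", "date": date, "place": match.group("place")}


sentence = (
    "Johan Rudolph Thorbecke werd in 1798 geboren op 14 januari in Zwolle "
    "en komt uit een half-Duitse…"
)
event = extract_birth(sentence)
# event == {"type": "Birth", "date": "1798-01-14", "place": "Zwolle"}
```

A rule this narrow misses most real biographies, which is one reason the extracted events need provenance that records which tool produced them.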
Provenance in BiographyNet
Ensure the credibility of the demonstrator, evaluate its performance, and improve the academic status of the tool
Information involved: sources, but also NER input data, etc.
Processes involved: all steps in enrichment, aggregation…
People involved: who was responsible for pipeline, tool,
*Daniel Garijo, Yolanda Gil; http://www.opmw.org/model/p-plan
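The three kinds of provenance listed above (information, processes, people) can be pictured as a chain that leads from an enriched statement back to its source. The Python sketch below is a plain stand-in for that idea and does not use the P-Plan vocabulary; the class `Step`, the item identifiers and the agent names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """One processing step: what ran, who was responsible, what it used."""
    name: str
    agent: str
    inputs: Tuple[str, ...]


# Hypothetical pipeline: item identifier -> the step that produced it.
# Items without an entry are treated as original sources.
produced_by: Dict[str, Step] = {
    "birth-event": Step("event extraction", "NLP tool maintainer", ("ner-output",)),
    "ner-output": Step("named entity recognition", "pipeline maintainer", ("biography-text",)),
}


def trace_sources(item: str, produced_by: Dict[str, Step]) -> List[str]:
    """Follow provenance links back to the original material.

    Assumes the provenance graph has no cycles.
    """
    step = produced_by.get(item)
    if step is None:
        return [item]
    sources: List[str] = []
    for used in step.inputs:
        for source in trace_sources(used, produced_by):
            if source not in sources:
                sources.append(source)
    return sources


def responsible_agents(item: str, produced_by: Dict[str, Step]) -> List[str]:
    """List everyone responsible for a step on the way to this item."""
    step = produced_by.get(item)
    if step is None:
        return []
    agents = [step.agent]
    for used in step.inputs:
        for agent in responsible_agents(used, produced_by):
            if agent not in agents:
                agents.append(agent)
    return agents
```

With this toy data, `trace_sources("birth-event", produced_by)` leads back to `"biography-text"`, and `responsible_agents` names both hypothetical maintainers along the way.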
Interface for historians
Framework: generic solutions with historians
1. Preprocess, clean, model, link and enrich data in collaboration with domain experts;
2. Access heterogeneous datasets in a convenient way to get an intuition of the character and anomalies of the (linked) data;
3. Perform arbitrary queries to retrieve results relevant to their research questions;
4. Verify the veracity of query results, by following provenance links to original material
5. Retrieve and analyze the data with the tool of their preference;
6. Republish and share results
Historical tool criticism
… willingness from historians to invest the time to learn about computer processes (at least the basic principles)
Possibilities for education at universities to bridge the gap between computer science and humanities studies and make tool criticism an integral part of students’ curricula
“Why do we still teach history students to decipher 17th-century handwriting, but not SQL?”
Verrijkt Koninkrijk
National-Socialist 29%
Social-Democrat 21%
Protestant 13%
Liberal 12%
R-Catholic 12%
Communist 8%
Jewish 5%
http://semanticweb.cs.vu.nl/verrijktkoninkrijk/
http://search.loedejongdigitaal.nl/
Results are links to paragraphs
re-usability
http://qhp.science.uva.nl/