Historical Quantitative Reasoning on the Web
Albert Meroño-Peñuela, Ashkan Ashkpour
Historical Open Data on the Web
• Volume
• Velocity
• Variety
• Veracity
(Historical) Knowledge Discovery
Data Preparation
• Many interesting datasets are messy, incomplete and incorrect
• Data analysis requires clean data
• Cleaning data involves careful interpretation and study
• Values and variables in the data are replaced with (more) standard terms (coding)
• Cross-dataset analyses require a further data harmonization step
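The coding step described above can be sketched as a lookup from raw spellings to standard terms. This is only an illustration: the coding table, the field name and the sample records are hypothetical and do not come from any CEDAR dataset.

```python
# Hypothetical coding table: raw spellings found in a source -> standard term.
OCCUPATION_CODES = {
    "smid": "blacksmith",
    "smith": "blacksmith",
    "black smith": "blacksmith",
    "boer": "farmer",
    "landbouwer": "farmer",
}

def code_value(raw, codes):
    """Return the standard term for a raw value, or None if it is not coded yet."""
    key = " ".join(raw.strip().lower().split())  # normalize case and whitespace
    return codes.get(key)

def code_records(records, field, codes):
    """Split records into coded ones and ones that still need manual study."""
    coded, uncoded = [], []
    for record in records:
        standard = code_value(record[field], codes)
        if standard is None:
            uncoded.append(record)
        else:
            coded.append({**record, field: standard})
    return coded, uncoded

# Hypothetical sample records from two census years.
records = [
    {"year": 1889, "occupation": " Smid "},
    {"year": 1899, "occupation": "Black  Smith"},
    {"year": 1899, "occupation": "tollenaar"},
]
coded, uncoded = code_records(records, "occupation", OCCUPATION_CODES)
```

Values that are not in the table are kept apart rather than guessed, since cleaning them requires the careful interpretation the slide mentions.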
Data Preparation
This ‘data preparation’ step can take up to 60% of the total work
We do this repeatedly for the same datasets!
Linking Social History Data
• Linked Open Data – machine-readable Web graph with 100 billion statements [1]
• Sharing (socio-historical) knowledge for reusability
• Solves integration
[1] http://lodlaundromat.org/
• Tablinker: conversion of Excel spreadsheets to RDF
• Integrator: attaches harmonization rules to the raw RDF
• Qber: crowd-based, interactive coding and harmonization
• LSD Dimensions: index of statistical variables on the Web
http://lod.cedar-project.nl/maps/
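To give a feel for what converting a spreadsheet to RDF involves, here is a minimal sketch that turns table cells into RDF Data Cube observations serialized as N-Triples. It is not Tablinker's actual output format: the `http://example.org/census/` namespace, the URI patterns and the sample table are assumptions made for illustration. Only the `qb:` namespace is the real Data Cube vocabulary.

```python
import re

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
BASE = "http://example.org/census/"  # hypothetical namespace

def slug(label):
    """Lower-case a header label and replace non-alphanumeric runs with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")

def table_to_ntriples(header, rows):
    """Emit one qb:Observation per value cell; the first column holds row labels.

    Simplification: labels are assumed to contain no quotes or backslashes,
    and every value cell is assumed to be an integer.
    """
    lines = []
    for r, row in enumerate(rows, start=1):
        row_label = row[0]
        for c in range(1, len(header)):
            obs = f"<{BASE}obs/r{r}c{c}>"
            lines.append(f"{obs} <{RDF_TYPE}> <{QB}Observation> .")
            lines.append(f'{obs} <{BASE}prop/{slug(header[0])}> "{row_label}" .')
            lines.append(f"{obs} <{BASE}prop/{slug(header[c])}> "
                         f'"{row[c]}"^^<{XSD_INTEGER}> .')
    return lines

# Hypothetical one-row table.
lines = table_to_ntriples(["Municipality", "Men", "Women"],
                          [["Amsterdam", 1200, 1350]])
```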
Edit Rules
• Data is good
• Knowledge to assess quality of data is good++
http://linkededitrules.org/
• Reusable rules hub
• Quality assessment tool
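An edit rule is a consistency constraint that a record must satisfy. The sketch below shows the general idea of applying reusable rules to assess data quality. It is not the Linked Edit Rules implementation: the rule names, record fields and sample rows are hypothetical.

```python
# Each rule is (name, predicate); the predicate returns True when a record
# is consistent. Both rules are hypothetical examples.
EDIT_RULES = [
    ("non_negative_counts",
     lambda r: all(r[k] >= 0 for k in ("men", "women", "total"))),
    ("totals_add_up",
     lambda r: r["men"] + r["women"] == r["total"]),
]

def assess(records, rules):
    """Return (record index, rule name) for every rule a record violates."""
    violations = []
    for index, record in enumerate(records):
        for name, is_consistent in rules:
            if not is_consistent(record):
                violations.append((index, name))
    return violations

# Hypothetical census rows.
records = [
    {"men": 10, "women": 12, "total": 22},  # consistent
    {"men": 5, "women": -1, "total": 4},    # negative count
    {"men": 3, "women": 3, "total": 7},     # total does not match
]
violations = assess(records, EDIT_RULES)
```

Keeping the rules in a list separate from the data is what makes them shareable: the same rules can be run against any dataset with the same fields.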
SCRY
Web standards compatible statistical functions in SPARQL
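PREFIX : <http://scry.rocks/example/>
PREFIX scry: <http://scry.rocks/>
PREFIX impute: <http://scry.rocks/math/impute?>
PREFIX mean: <http://scry.rocks/math/mean?>
PREFIX sd: <http://scry.rocks/math/stdev?>
PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT ?obs ?dim ?imputed_val WHERE {
  ?obs a qb:Observation .
  # Match both dimension and measure properties
  VALUES ?prop_type { qb:DimensionProperty qb:MeasureProperty }
  ?dim a ?prop_type .
  # Keep only (observation, property) pairs that have no value
  FILTER NOT EXISTS { ?obs ?dim ?val . }
  # Values that other observations have for the same property
  ?other_obs ?dim ?other_val .
  # Hand those values to the remote service and read back the imputed value
  SERVICE <http://sparql.scry.rocks/> {
    SELECT ?imputed_val {
      GRAPH ?g1 {
        impute:v scry:input ?other_val ;
                 scry:output ?imputed_val .
      }
    }
  }
}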
Delegation of a non-standard function to a remote SCRY orb
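The query above finds observations that lack a value for some property and passes the values of other observations to the remote service. The slide does not say which imputation method the service applies. As a rough local analogue, the sketch below assumes the simplest choice, mean imputation. The function name, record layout and sample data are hypothetical.

```python
from statistics import mean

def impute_missing(observations, dimension):
    """Map the id of each observation lacking `dimension` to the mean of the
    values that other observations have for it (assumed imputation method)."""
    known = [o[dimension] for o in observations if o.get(dimension) is not None]
    if not known:
        return {}  # nothing to base an imputation on
    fill = mean(known)
    return {o["id"]: fill for o in observations if o.get(dimension) is None}

# Hypothetical observations; obs2 and obs4 have no population value.
observations = [
    {"id": "obs1", "population": 100},
    {"id": "obs2", "population": None},
    {"id": "obs3", "population": 200},
    {"id": "obs4"},
]
imputed = impute_missing(observations, "population")
```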
Don’t like SPARQL? Neither do we!
https://github.com/CEDAR-project/Queries
http://grlc.clariah-sdh.eculture.labs.vu.nl/CEDAR-project/Queries/api-docs
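The point of wrapping stored queries in a Web API is that a client only needs a plain HTTP GET. The sketch below builds such a request URL. The route pattern (host, repository owner, repository, query name) is an assumption inferred from the api-docs address above, and `example_query` and the `year` parameter are hypothetical, so check the api-docs page for the operations that actually exist.

```python
from urllib.parse import quote, urlencode

GRLC_HOST = "http://grlc.clariah-sdh.eculture.labs.vu.nl"  # host from the slide

def grlc_url(owner, repo, query_name, **params):
    """Build a GET URL for one stored query (assumed route pattern)."""
    path = "/".join(quote(part, safe="") for part in (owner, repo, query_name))
    url = f"{GRLC_HOST}/{path}"
    if params:
        # Sort parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url

# "example_query" and "year" are hypothetical placeholders.
url = grlc_url("CEDAR-project", "Queries", "example_query", year=1889)
```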
Conclusion
• Data preparation: an expensive task (60%)
• Linked Data is good for (socio-historical) data integration on the Web
• But data quality issues remain
  – Linked Edit Rules: rule hub and data quality assessment
  – SCRY: Linked Data compatible statistical functionality
  – grlc: you don’t need to know Linked Data to use Linked Data
Refining Statistical Data on the Web
Thank you
@albertmeronyo
https://github.com/CEDAR-project/
https://github.com/CLARIAH/