Validating statistical Index Data represented in RDF using SPARQL Queries: Computex

WESO Research Group, University of Oviedo, Spain

Jose Emilio Labra Gayo, Jose María Álvarez Rodríguez

Motivation

The WebIndex Project
Measures the impact of the Web in different countries
First publication: 2012
2 web sites:
  Visualization: http://thewebindex.org
  Data portal: http://data.webfoundation.org

WebIndex Workflow

Raw data (Excel)
Computed data (Excel)
External sources
All data (RDF datastore)
Visualizations

Technical details

Index made from:
  61 countries (100 planned for 2013)
  85 indicators:
    51 primary (questionnaires)
    34 secondary (external sources)

WebIndex RDF data:
  Modeled on top of RDF Data Cube
  More than 1 million triples
  Linked data: DBpedia, organizations, etc.
  Records data computation

WebIndex computation process (1)

Simplified with one indicator, 3 years and 4 countries

Raw Data

Country   2009   2010   2011
Spain     4      5      3
Finland   4      -      6
Armenia   1 (the only value reported; the other two years are missing)
Chile     6      8      -

Imputed Data (missing values estimated, e.g. by average growth)

Country   2009   2010   2011
Spain     4      5      3
Finland   4      5      6
Armenia   1      1      1
Chile     6      8      10.6

Filtered Data (Indicator A)

Country   2009   2010   2011
Spain     4      5      3
Finland   4      5      6
Chile     6      8      10.6

Normalized Data (z-scores)

Country   2009    2010    2011
Spain     -0.57   -0.57   -0.92
Finland   -0.57   -0.57   -0.14
Chile     1.15    1.15    1.06

More details can be found here: http://thewebindex.org/about/methodology/computation/
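The normalization step can be reproduced with a short sketch. It assumes that each year is normalized across countries and that the sample standard deviation (n - 1) is used; neither is stated on the slide, but both are consistent with the figures shown. The dictionary layout and function name are illustrative only.

```python
from statistics import mean, stdev

# Filtered data for Indicator A, copied from the slide (Armenia already removed).
filtered = {
    "Spain":   {2009: 4, 2010: 5, 2011: 3},
    "Finland": {2009: 4, 2010: 5, 2011: 6},
    "Chile":   {2009: 6, 2010: 8, 2011: 10.6},
}

def z_scores_by_year(data):
    """Normalize each year's column: z = (x - mean) / sample standard deviation."""
    years = sorted(next(iter(data.values())))
    result = {country: {} for country in data}
    for year in years:
        column = [values[year] for values in data.values()]
        mu = mean(column)
        sigma = stdev(column)  # assumption: sample standard deviation (n - 1)
        for country, values in data.items():
            result[country][year] = (values[year] - mu) / sigma
    return result

normalized = z_scores_by_year(filtered)
for country, values in normalized.items():
    print(country, [round(values[year], 2) for year in sorted(values)])
```

Under these assumptions the computed z-scores agree with the slide's table to within 0.01.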

WebIndex computation Process (2)

Simplified with one indicator, 3 years and 4 countries

Normalized Data (z-scores)

Country   2009    2010    2011
Spain     -0.57   -0.57   -0.92
Finland   -0.57   -0.57   -0.14
Chile     1.15    1.15    1.06

More details can be found here: http://thewebindex.org/about/methodology/computation/

Adjusted data

Country   A   B   C     D     ...
Spain     8   7   9.1   7.1   ...
Finland   7   8   7.1   8     ...
Chile     8   9   7.6   6     ...

Group indicators

Country   Readiness   Impact   Web   Composite
Spain     5.7         3.5      5.1   4.5
Finland   5.5         3.9      7.1   4.9
Chile     6.7         4.5      7.6   5.1

Rankings

Country   Readiness   Impact   Web   Composite
Spain     2           3        3     3
Finland   3           2        2     2
Chile     1           1        1     1

x'_i = x_i + δ

Example of data representation in RDF

Example:

obs:obsM23 a qb:Observation ;
    cex:computation [
        a cex:Z-Score ;
        cex:observation obs:obsA23 ;
        cex:slice slice:sliceA09 ;
    ] ;
    cex:value -0.57 ;
    cex:md5-checksum "2917835203..." ;
    cex:indicator indicator:A ;
    cex:concept country:ESP ;
    qb:dataSet dataset:A-Normalized ;
    # ... other declarations omitted for brevity

It declares that the value of this observation was obtained as the z-score of obs:obsA23 over slice:sliceA09.

Each observation follows the RDF Data Cube vocabulary extended with metadata about how it was obtained.

Computex

Vocabulary for statistical computations
Extends RDF Data Cube
Ontology at: http://purl.org/weso/computex
Some terms:
  cex:Concept
  cex:Indicator
  cex:Computation
  cex:WeightSchema
  qb:Observation
  ...

Validation process

Last year (2012):
  Template-based validation
  MD5 checksum of each observation and some fields

This year (2013):
  SPARQL-based validation
  3 levels of validation:
    RDF Data Cube
    Shapes of data
    Computation process

Ultimate goal: automatically compute the index

Validation approach

SPARQL CONSTRUCT queries instead of ASK:
  IF no error THEN empty model
  ELSE RDF graph with error information

CONSTRUCT {
  [ a cex:Error ;
    cex:errorParam
      [ cex:name "obs" ; cex:value ?obs ] ,
      [ cex:name "value1" ; cex:value ?value1 ] ,
      [ cex:name "value2" ; cex:value ?value2 ] ;
    cex:msg "Observation has two different values"
  ] .
} WHERE {
  ?obs a qb:Observation .
  ?obs cex:value ?value1 .
  ?obs cex:value ?value2 .
  FILTER ( ?value1 != ?value2 )
}

CONSTRUCT queries enable debugging
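The "empty result means valid, otherwise return the error details" idea can be sketched outside SPARQL as well. The snippet below is only an illustration: the list of tuples is a hypothetical stand-in for an RDF graph, and `check_unique_value` is an invented name, not part of the Computex tool. Unlike the CONSTRUCT query, which reports each conflicting pair of values, this sketch reports one error per observation.

```python
# Hypothetical in-memory stand-in for an RDF graph: (subject, property, value) triples.
triples = [
    ("obs:o1", "cex:value", 4.0),
    ("obs:o2", "cex:value", 5.0),
    ("obs:o2", "cex:value", 5.5),  # conflicting second value for the same observation
]

def check_unique_value(triples):
    """Return a list of error records; an empty list means the data is valid."""
    values = {}
    for subject, prop, value in triples:
        if prop == "cex:value":
            values.setdefault(subject, set()).add(value)
    errors = []
    for obs, found in sorted(values.items()):
        if len(found) > 1:
            errors.append({
                "msg": "Observation has two different values",
                "params": {"obs": obs, "values": sorted(found)},
            })
    return errors

for error in check_unique_value(triples):
    print(error["msg"], error["params"])
```

Because the result carries the offending observation and its values, a failed check points directly at the data to fix, which a bare true/false answer cannot do.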

SPARQL queries: RDF Data Cube

We converted the RDF Data Cube integrity constraints from ASK to CONSTRUCT queries.

Original ASK constraint:

ASK WHERE {
  ?dim a qb:DimensionProperty .
  FILTER NOT EXISTS { ?dim rdfs:range [] }
}

Converted CONSTRUCT query:

CONSTRUCT {
  [ a cex:Error ;
    cex:errorParam [ cex:name "dim" ; cex:value ?dim ] ;
    cex:msg "Every Dimension must have a declared range"
  ] .
} WHERE {
  ?dim a qb:DimensionProperty .
  FILTER NOT EXISTS { ?dim rdfs:range [] }
}

SPARQL expressivity

SPARQL can express complex validation patterns. Example: Mean

CONSTRUCT {
  [ a cex:Error ;
    # cex:errorParam ... omitted
    cex:msg "Mean value does not match"
  ] .
} WHERE {
  ?obs a qb:Observation ;
       cex:computation ?comp ;
       cex:value ?val .
  ?comp a cex:Mean .
  { SELECT (AVG(?value) AS ?mean) ?comp WHERE {
      ?comp cex:observation ?obs1 .
      ?obs1 cex:value ?value .
    } GROUP BY ?comp
  }
  FILTER ( abs(?mean - ?val) > 0.0001 )
}

Limitations of SPARQL expressivity

Some built-in functions are not standardized
  Example: z-score employs standard deviation
  It requires the built-in function sqrt
  Available in some SPARQL implementations

Ranking of values was not obvious (*)
  2 solutions:
    Using GROUP_CONCAT
    Checking a value against all the other values
  Neither solution is efficient

(*) http://stackoverflow.com/questions/17313730/how-to-rank-values-in-sparql

SPARQL expressivity

Handling series with RDF Collections
Average growth:

CONSTRUCT {
  # ... omitted for brevity
} WHERE {
  ?obs cex:computation [ a cex:AverageGrowth ; cex:observations ?ls ] ;
       cex:value ?val .
  ?ls rdf:first [ cex:value ?v1 ] .
  # ?ls is projected and grouped so that each series gets its own mean growth
  { SELECT ?ls (SUM(?v_n / ?v_n1) / COUNT(*) AS ?meanGrowth) WHERE {
      ?ls rdf:rest* [ rdf:first [ cex:value ?v_n ] ;
                      rdf:rest [ rdf:first [ cex:value ?v_n1 ] ] ] .
    } GROUP BY ?ls
  }
  FILTER ( abs(?meanGrowth * ?v1 - ?val) > 0.001 )
}

* This solution was provided by Joshua Taylor
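The arithmetic that this query checks can be written out as a small sketch. It assumes the collection lists the known values from most recent to oldest, which is what the ratio ?v_n / ?v_n1 and the final multiplication by the first element suggest; the function names are illustrative and not taken from the Computex code.

```python
def project_by_average_growth(series):
    """Estimate the next value of a series.

    series: known values ordered most recent first (assumed list order).
    The mean of the ratios between consecutive values is applied to the
    most recent value.
    """
    if len(series) < 2:
        raise ValueError("need at least two values to estimate growth")
    ratios = [newer / older for newer, older in zip(series, series[1:])]
    mean_growth = sum(ratios) / len(ratios)
    return mean_growth * series[0]

def matches_declared_value(series, declared, tolerance=0.001):
    """Mirror of the FILTER: no error is reported when the difference is within tolerance."""
    return abs(project_by_average_growth(series) - declared) <= tolerance

# Chile in the simplified example: 8 in 2010 and 6 in 2009.
print(project_by_average_growth([8, 6]))
```

For Chile this gives 8 × 8/6 ≈ 10.67, which corresponds to the 10.6 shown in the imputed data table.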

RDF Profiles

Descriptions (with constraints) of dataset families
Can be used to validate (and compute) RDF graphs
Examples:
  Cube (statistical data): normalization update queries + integrity constraints
  Computex (statistical computations): adds more queries
  Other user-defined profiles on top of Computex: WebIndex, CloudIndex

Cube Profile

W3C Candidate Recommendation (http://www.w3.org/TR/vocab-data-cube/)

1 base ontology + RDF Schema inference
2 update queries (normalization algorithm)
21 integrity constraint queries

cex:cube a cex:ValidationProfile ;
    cex:name "RDF Data Cube" ;
    cex:ontologyBase cube:cube.ttl ;
    cex:expandSteps (
        [ cex:name "Closure" ; cex:uri cube:closure.ru ]
        [ cex:name "Flatten" ; cex:uri cube:flatten.ru ]
    ) ;
    cex:integrityQuery
        [ cex:name "IC-1. Unique DataSet" ; cex:uri cube:q1.sparql ] ;
    # ...
    .

Computex Profile

1 base ontology: http://purl.org/weso/computex
Imports RDF Data Cube

cex:computex a cex:ValidationProfile ;
    cex:name "Computex" ;
    cex:ontologyBase cex:computex.ttl ;
    cex:import profile:cube.ttl ;
    cex:expandSteps (
        [ cex:name "Copy Raw" ; cex:uri cex:copyRaw.sparql ]
        # Other UPDATE queries
    ) ;
    cex:integrityQuery
        [ cex:name "Adjusted" ; cex:uri cex:adjusted.sparql ] ,
        # Other integrity queries
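One way to read a profile is as a small pipeline: run the imported profiles, apply the expand steps, then collect whatever the integrity queries report. The sketch below illustrates that reading only. The `Profile` class, the `validate` function, the set-of-tuples graph and the order in which imports are processed are all assumptions made for illustration, not the behaviour of the Computex tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Graph = set  # hypothetical stand-in: a graph is a set of (s, p, o) tuples

@dataclass
class Profile:
    name: str
    expand_steps: List[Callable[[Graph], Graph]] = field(default_factory=list)
    integrity_queries: List[Callable[[Graph], list]] = field(default_factory=list)
    imports: List["Profile"] = field(default_factory=list)

def validate(profile, graph):
    """Return (expanded graph, errors). Assumed order: imports, expand steps, checks."""
    errors = []
    for imported in profile.imports:
        graph, imported_errors = validate(imported, graph)
        errors.extend(imported_errors)
    for step in profile.expand_steps:
        graph = step(graph)
    for query in profile.integrity_queries:
        errors.extend(query(graph))
    return graph, errors

def dimension_has_range(graph):
    """Python counterpart of the 'Every Dimension must have a declared range' check."""
    dims = {s for s, p, o in graph if p == "rdf:type" and o == "qb:DimensionProperty"}
    ranged = {s for s, p, o in graph if p == "rdfs:range"}
    return [f"Every Dimension must have a declared range: {d}" for d in sorted(dims - ranged)]

cube = Profile("RDF Data Cube", integrity_queries=[dimension_has_range])
computex = Profile("Computex", imports=[cube])

graph = {("ex:year", "rdf:type", "qb:DimensionProperty")}
_, errors = validate(computex, graph)
print(errors)
```

Keeping each step as a separate entry is what lets a profile such as Computex reuse the Cube checks and add its own on top.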

Computex Validation tool

Available at: http://computex.herokuapp.com

Features:
  Informative error messages
  Option to generate EARL report
  2 validation profiles (Cube and Computex)
  Based on profiles

Source code in Scala available at: http://github.com/weso/computex

Conclusions

WebIndex = use case for RDF validation
SPARQL queries as an implementation approach
Challenges:
  Expressivity limits of SPARQL
  Complexity of some queries
Concept of RDF Profiles:
  Declarative, Turtle syntax
  Intermediate level between OWL and SPARQL

END OF PRESENTATION

COMPLEMENTARY SLIDES

Limits of SPARQL expressivity

Ranking of values using GROUP_CONCAT

SELECT ?x ?v ?ranking {
  ?x :value ?v .
  { SELECT (GROUP_CONCAT(?x; separator="") AS ?ordered) {
      { SELECT ?x {
          ?x :value ?v .
        } ORDER BY DESC(?v)
      }
    }
  }
  BIND (str(?x) AS ?xName)
  BIND (strbefore(?ordered, ?xName) AS ?before)
  BIND ((strlen(?before) / strlen(?xName)) + 1 AS ?ranking)
} ORDER BY ?ranking

Limits of SPARQL expressivity

Ranking of values comparing all values

SELECT ?x ?v (COUNT(*) AS ?ranking) WHERE {
  ?x :value ?v .
  [] :value ?u .
  FILTER ( ?v <= ?u )
}
GROUP BY ?x ?v
ORDER BY ?ranking

* This solution was provided by Joshua Taylor
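The comparison-based ranking translates directly into a few lines of code: the rank of an item is the number of items whose value is greater than or equal to its own. The sketch below applies that rule to the Composite column of the simplified example; the function name is illustrative.

```python
def rank_by_comparison(values):
    """Rank each entry by counting the entries whose value is >= its own (1 = highest)."""
    return {
        name: sum(1 for other in values.values() if value <= other)
        for name, value in values.items()
    }

# Composite scores from the simplified example.
composite = {"Spain": 4.5, "Finland": 4.9, "Chile": 5.1}
print(rank_by_comparison(composite))  # {'Spain': 3, 'Finland': 2, 'Chile': 1}
```

Every value is compared against every other value, so the cost grows quadratically with the number of items. Tied values all receive the lower position (two items tied for first place are both ranked 2).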
