Exploring Web Data & Knowledge through the Semantic Web Dr. Stefan Dietze L3S Research Center 27/11/13 1 Stefan Dietze

Web Science Synergies: Exploring Web Knowledge through the Semantic Web


Page 1: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

Exploring Web Data & Knowledge through

the Semantic Web

Dr. Stefan Dietze

L3S Research Center

27/11/13 1 Stefan Dietze

Page 2: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

Pluto & the seven Dwarfs?


„…solar system… #pluto“

pluto the dwarf planet ?

Page 3: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

“A little semantics goes a long way” (J. Hendler [1])

dbp:CelestialBody

dbp:Pluto

dbp:Pluto(mythology)

typeOf

dwarfPlanetOf

Semantic Web

Adding meaning through shared vocabularies and schemas (e.g. DBpedia)

W3C standards RDF & SPARQL for data & knowledge representation and querying

Persistent URIs to reference & interlink data on the Web

[1] Hendler, J., The Dark Side of the Semantic Web, IEEE Intelligent Systems, Jan/Feb 2007

yago:AstronomicalObjects

typeOf

dbp:SolarSystem

dbp:DwarfPlanetPluto

redirectOf namedAfter

„…solar system… #pluto“
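As a rough illustration of how a few typed links help resolve the "#pluto" mention, the graph on this slide can be written as subject-predicate-object triples. This is a minimal sketch in plain Python rather than a real RDF library. The prefixed names are copied from the slide, but the exact edges are only one reading of the diagram, and `neighbours` and `disambiguate` are hypothetical helpers.

```python
# Triples read off the slide's diagram (the exact edges are an assumption).
TRIPLES = [
    ("dbp:Pluto", "typeOf", "dbp:CelestialBody"),
    ("dbp:Pluto", "typeOf", "yago:AstronomicalObjects"),
    ("dbp:Pluto", "dwarfPlanetOf", "dbp:SolarSystem"),
    ("dbp:Pluto", "namedAfter", "dbp:Pluto(mythology)"),
    ("dbp:DwarfPlanetPluto", "redirectOf", "dbp:Pluto"),
]


def neighbours(entity, triples=TRIPLES):
    """Return every resource directly linked to `entity`, in either direction."""
    linked = set()
    for s, _p, o in triples:
        if s == entity:
            linked.add(o)
        elif o == entity:
            linked.add(s)
    return linked


def disambiguate(candidates, context, triples=TRIPLES):
    """Pick the candidate whose graph neighbourhood overlaps most with the context."""
    return max(candidates, key=lambda c: len(neighbours(c, triples) & context))


# The text mentions the solar system, so that resource forms the context.
best = disambiguate(
    candidates=["dbp:Pluto", "dbp:Pluto(mythology)"],
    context={"dbp:SolarSystem"},
)
print(best)
```

With only one link to `dbp:SolarSystem` in the graph, the dwarf planet is preferred over the mythological figure, which is the "little semantics" the slide refers to.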

Page 4: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

Semantic Web / Linked Data

FOAF

Gene Ontology

BIBO

Geo Ontology

DBpedia Ontology

Dublin Core

BBC Programmes

„HTTP-accessibility“ (SPARQL, URI-dereferencing)

„Structure“ & „Semantics“ (=> shared/linked vocabularies)

„Interlinked“

„Persistent“

De-facto standard for sharing data on the Web

Vision: well connected graph of open Web data

350+ datasets and 32 billion triples in LOD Cloud alone

Other „incarnations“:

Google Knowledge Graph

Facebook Open Graph

http://schema.org
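The "HTTP-accessibility" property on this slide means a dataset can be queried with an ordinary HTTP request. The sketch below only builds such a request URL, using the `query` parameter of the SPARQL protocol, and does not send it. The endpoint URL and the query are assumptions chosen for illustration; the slides do not name a specific endpoint.

```python
from urllib.parse import urlencode

# Endpoint chosen for illustration only; the slides do not name one.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
SELECT ?type WHERE {
  <http://dbpedia.org/resource/Pluto> a ?type .
} LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL (query text in the `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

A client would typically send this URL with an `Accept` header naming the desired result format. Whether a given endpoint answers is a separate question.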


Page 5: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

…why are there so few datasets actually used?

Data reuse and in-links are focused on trusted „reference graphs“ such as DBpedia (i.e. Wikipedia)

Long tail of LD datasets which are neither reused nor linked to (LOD Cloud alone consists of 300+ datasets)

Explanations?

That’s awesome, but...


„HTTP-accessibility“ (SPARQL, URI-dereferencing)

„Structure“ & „Semantics“ (=> shared/linked vocabularies)

„Interlinked“

„Persistent“

Hm, really?


Page 6: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

Open data is more diverse than we think

SPARQL Web-Querying Infrastructure: Ready for Action?, Carlos Buil-Aranda, Aidan Hogan, Jürgen Umbrich, Pierre-Yves Vandenbussche, International Semantic Web Conference 2013 (ISWC2013).

SPARQL endpoint availability over time [Buil-Aranda et al 2013]

Accessibility of datasets?

Fewer than 50% of all SPARQL endpoints are actually responsive at any given point in time

“THE” SPARQL protocol? No, but many variants & subsets
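The availability figure above is a simple ratio: endpoints that answered a probe divided by all endpoints probed at that moment. A minimal sketch follows, assuming the probe results are already collected. The endpoint URLs and outcomes are hypothetical and are not data from the cited study.

```python
def availability(probes):
    """Fraction of endpoints that answered, given {endpoint_url: responded?}."""
    if not probes:
        raise ValueError("no probe results")
    return sum(1 for ok in probes.values() if ok) / len(probes)


# Hypothetical probe results for one point in time (not data from the study).
probes = {
    "http://example.org/a/sparql": True,
    "http://example.org/b/sparql": False,
    "http://example.org/c/sparql": False,
    "http://example.org/d/sparql": True,
    "http://example.org/e/sparql": False,
}

share = availability(probes)
print(f"{share:.0%} of endpoints responsive")
```

Repeating the probe at regular intervals and plotting `share` over time would give a curve of the kind the slide attributes to Buil-Aranda et al.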

Shared vocabularies & schemas, but:

…still very heterogeneous [d’Aquin, WebSci13]

…data partially messy and not conformant (RDFS, schemas) [HoganJWS2012]

…even widely used reference datasets such as DBpedia noisy [Paulheim2013]

Co-occurrence graph of data types in 146 datasets: 144 Vocabularies, 588 highly overlapping types, 719 Properties
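One plausible reading of such a co-occurrence graph is that two types are connected when at least one dataset uses both, with the edge weight counting those datasets. The sketch below builds the weighted edges under that assumption. The dataset names and type assignments are hypothetical and far smaller than the 146 datasets surveyed.

```python
from collections import Counter
from itertools import combinations


def type_cooccurrence(dataset_types):
    """Count, for each unordered pair of types, how many datasets use both."""
    edges = Counter()
    for types in dataset_types.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(types)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical datasets and the types they use (illustrative only).
dataset_types = {
    "dataset-1": ["foaf:Person", "bibo:Document"],
    "dataset-2": ["foaf:Person", "bibo:Document", "po:Programme"],
    "dataset-3": ["po:Programme"],
}

edges = type_cooccurrence(dataset_types)
for pair, weight in edges.most_common():
    print(pair, weight)
```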

Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.

Type Inference on Noisy RDF Data, Paulheim, H., Bizer, C., Semantic Web – ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp. 510–525.

An empirical survey of Linked Data conformance, Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., Journal of Web Semantics 14, pp. 14–44, 2012.


Page 7: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

Too many/diverse datasets, too little information


? ? ?

Which datasets are useful & trustworthy for case XY (e.g. „learning about the solar system“)?

Which topics (e.g. „Astronomy“) are covered by dataset X?

Which datasets describe/offer videos (slides, publications, statistics etc.)?

Page 8: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

Data curation and dataset profiling

LinkedUp

Dataset Catalog


Catalog of data (LinkedUp Catalog): classification of datasets according to resource types, disciplines/topics, data quality, accessibility, etc.

Infrastructure for distributed/federated querying

describes

Which datasets are useful & trustworthy for case XY (e.g. „learning about the solar system“)?

Which topics (e.g. „Astronomy“) are covered by dataset X?

Which datasets describe/offer videos (slides, publications, statistics etc.)?
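The three questions above become simple lookups once each dataset has a catalog entry recording its resource types, topics and accessibility. The sketch below is a toy in-memory catalog, not the LinkedUp Catalog's actual schema. The field names, the `find_datasets` helper and all entry values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetProfile:
    """One catalog entry; the field names are illustrative, not the catalog's schema."""
    name: str
    resource_types: set = field(default_factory=set)
    topics: set = field(default_factory=set)
    endpoint_available: bool = False


def find_datasets(catalog, resource_type=None, topic=None, require_available=True):
    """Return names of datasets matching the requested type, topic and availability."""
    hits = []
    for d in catalog:
        if resource_type and resource_type not in d.resource_types:
            continue
        if topic and topic not in d.topics:
            continue
        if require_available and not d.endpoint_available:
            continue
        hits.append(d.name)
    return hits


# Illustrative entries; the values are made up for this example.
catalog = [
    DatasetProfile("Yovisto Video", {"yov:Video"}, {"db:Astronomy"}, True),
    DatasetProfile("BBC Programme", {"po:Programme"}, {"db:Astronomy"}, True),
    DatasetProfile("hypothetical-stats", {"qb:Observation"}, {"db:Economics"}, False),
]

videos_on_astronomy = find_datasets(catalog, resource_type="yov:Video", topic="db:Astronomy")
print(videos_on_astronomy)
```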

Page 9: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

db:Astro. Objects

Dataset profiling: what’s all the data about

Dataset Metadata


Schema mappings

BIBO

AAISO

FOAF

contains

Entity disambiguation

Topic profile extraction

db:Astronomy

db:Astro. Objects

LinkedUp

Dataset Catalog

yov:Video

po:Programme

BBC Programme

<po:Programme …>

<po:Series>Wonders of the Solar System</.>

<po:Actor>Brian Cox</…>

</po:Programme…>

<yo:Video …>

<dc:title>Pluto & the Dwarf Planets</dc:title>

</yo:Video…>

Yovisto Video

bibo:Film
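The "topic profile extraction" step on this slide can be approximated by counting which categories the disambiguated entities of a dataset fall under. The sketch below uses plain frequency counting, which is a deliberately simple stand-in and not the ranking method of the cited profiling work. The entity identifiers and category assignments are hypothetical.

```python
from collections import Counter


def topic_profile(entity_categories):
    """Rank categories by the share of a dataset's entities that fall under them."""
    counts = Counter()
    for categories in entity_categories.values():
        counts.update(set(categories))  # count each category once per entity
    total = len(entity_categories)
    return [(cat, n / total) for cat, n in counts.most_common()]


# Hypothetical output of an entity-disambiguation step over one dataset.
entity_categories = {
    "ex:Pluto": ["db:Astronomy", "db:Astro. Objects"],
    "ex:SolarSystem": ["db:Astronomy"],
    "ex:BrianCox": ["db:Physicists"],
}

profile = topic_profile(entity_categories)
for category, share in profile:
    print(f"{category}: {share:.2f}")
```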

Page 10: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

LinkedUp Data Catalog in a nutshell

http://data.linkededucation.org/linkedup/catalog/

Explore & query for datasets/types & topics

Federated queries using type mappings
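Federated querying with type mappings can be pictured as rewriting one query over a shared type into one query per dataset, each using that dataset's own type. The sketch below assumes the mapping suggested by the profiling slide, where `po:Programme` and `yov:Video` are both mapped to `bibo:Film`. The `federated_queries` helper is hypothetical, and prefix declarations are omitted, so a real endpoint would need them added.

```python
# Mapping from a shared type to dataset-specific types (an assumption based on
# the profiling slide, not the catalog's actual mapping table).
TYPE_MAPPINGS = {
    "bibo:Film": {
        "BBC Programme": "po:Programme",
        "Yovisto Video": "yov:Video",
    },
}


def federated_queries(shared_type, mappings=TYPE_MAPPINGS, limit=10):
    """Rewrite a query over a shared type into one SPARQL query string per dataset."""
    queries = {}
    for dataset, local_type in mappings.get(shared_type, {}).items():
        # PREFIX declarations are left out for brevity.
        queries[dataset] = (
            f"SELECT ?resource WHERE {{ ?resource a {local_type} . }} LIMIT {limit}"
        )
    return queries


queries = federated_queries("bibo:Film")
for dataset, query in queries.items():
    print(dataset, "->", query)
```

Each generated query would be sent to the corresponding dataset's endpoint and the results merged by the caller.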


http://data.linkededucation.org/linkedup/categories-explorer

Page 11: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

LinkedUp Challenge: using open data for learning

Open Data Competition to promote tools and applications that analyse / integrate (Linked) Web data

Organised by the LinkedUp project over 2 years (“Veni”, “Vidi”, “Vici”) with 40,000 EUR in awards

Veni Competition: 22 submissions, 8 shortlisted for presentation at the Open Knowledge Conference (17 September, Geneva, Switzerland)

http://linkedup-challenge.org


Page 12: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

1st Place: PoliMedia

Exploring political debates & events

Cross-media exploration & analysis of political events (parliament debates and media coverage)

Automatically generated links between debate transcripts, newspaper articles, and radio bulletins.

(Linked) Data available at http://data.polimedia.nl

Data sources: 1) newspapers of the historical newspaper archive, 2) radio bulletins of the Dutch National Press Agency (ANP)

9000+ debates (1945 – 1995)

Over 3000 media links

Martijn Kleppe, Max Kemman, Henri Beunders (Erasmus Universiteit Rotterdam), Laura Hollink, Damir Juric (Vrije Universiteit Amsterdam), Johan Oomen, Jaap Blom (Nederlands Instituut voor Beeld en Geluid)

http://www.polimedia.nl/


Page 13: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

Simplifying complex information to make it accessible (example: publications from Elsevier)

Scalable tools and applications using (Linked) open data for educational purposes

LinkedUp data catalog

Promotion of selected Veni submissions

Open Track

Approx. 20,000 EUR awards budget

Final events at 11th Extended Semantic Web Conference (ESWC2014)

Outlook: more “focused” data reuse challenges


http://linkedup-challenge.org/

Recommender system for educational resources (courses, MOOCs) relevant to user interests

Focused Track

Submission: 14 February 2014


Page 14: Web Science Synergies: Exploring Web Knowledge through the Semantic Web

Thank you!

See also (data)

http://datahub.io/group/linked-education

http://data.linkededucation.org

http://data.linkededucation.org/linkedup/catalog/

http://lak.linkededucation.org

See also (general)

http://linkedup-project.eu

http://linkedup-challenge.org

http://linkededucation.org

http://linkeduniversities.org

REFERENCES

Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.

Generating structured Profiles of Linked Data Graphs, Fetahu, B., Dietze, S., d’Aquin, M., Nunes, B.P., ISWC2013 – 12th International Semantic Web Conference.

Combining a co-occurrence-based and a semantic measure for entity linking, Nunes, B.P., Dietze, S., Casanova, M.A., Kawase, R., Fetahu, B., Nejdl, W., ESWC 2013 – 10th Extended Semantic Web Conference, May 2013.

Type Inference on Noisy RDF Data, Paulheim, H., Bizer, C., Semantic Web – ISWC 2013, Lecture Notes in Computer Science Volume 8218, 2013, pp. 510–525.

An empirical survey of Linked Data conformance, Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., Decker, S., Journal of Web Semantics 14, pp. 14–44, 2012.

SPARQL Web-Querying Infrastructure: Ready for Action?, Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.-Y., International Semantic Web Conference 2013 (ISWC2013).
