Linked Data Profiling

Andrejs Abele

National University of Ireland, Galway

Supervisor: Paul Buitelaar

Overview

• Terminology

• Motivation

• My approach

• Evaluation

• Conclusion

• Future work

Terminology

Linked Data is about using the Web to connect related data that was not previously linked.

The Resource Description Framework (RDF) represents data as sets of subject-predicate-object triples, whose elements may be URIs or literals. For example:

https://www.insight-centre.org/users/andrejs-ābele foaf:name “Andrejs Ābele”

The Linked Open Data (LOD) Cloud is a collection of Linked Data resources that are open and freely available.
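The example triple above can be written down in code. The following is a minimal sketch, not part of the original slides: the `Triple` type and the `to_ntriples` helper are hypothetical names introduced for illustration, and the escaping only covers backslashes and double quotes. The only outside fact it relies on is that the `foaf:` prefix expands to `http://xmlns.com/foaf/0.1/`.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (hypothetical helper type)."""
    subject: str          # a URI
    predicate: str        # a URI
    obj: str              # a URI or a literal value
    obj_is_literal: bool  # True when obj is a literal, False when it is a URI


# Full URI behind the foaf:name shorthand used on the slide.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples-style line.

    Simplified: only backslashes and double quotes are escaped in literals.
    """
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


triple = Triple(
    subject="https://www.insight-centre.org/users/andrejs-ābele",
    predicate=FOAF_NAME,
    obj="Andrejs Ābele",
    obj_is_literal=True,
)
line = to_ntriples(triple)
print(line)
```

The subject and predicate are URIs, while the object here is a literal, which is why it is quoted rather than wrapped in angle brackets.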

Linked Open Data Cloud Diagram

[Figure: LOD Cloud Diagram. Legend domains: Publications, Life Sciences, Cross-Domain, Social Networking, Geographic, Government, User-Generated Content, Linguistics]

Motivation

• Linked Data is hard for humans to understand

• Only a small number of datasets provide a human-readable overview or comprehensive metadata

• When adding a new dataset to the LOD cloud, connections have to be identified to as many other relevant LOD datasets as possible

• The LOD Cloud Diagram relies on human classification

Existing solutions for LD profiling

• Loupe [1]

• ProLOD++ [2]

• LOD Laundromat [3]

• LODStat [4]

• Aether [5]

• RDF-stats [6]

[1] http://demo.seco.tkk.fi/aether/#/

[2] https://www.hpi.uni-potsdam.de/naumann/sites/prolod++/#

[3] http://lodlaundromat.org/

[4] http://stats.lod2.eu/

[5] http://demo.seco.tkk.fi/aether/#/

[6] http://rdfstats.sourceforge.net/

Domain identification method using DBpedia

[Figure: pipeline. Topic Extraction → Domain Identification → Domain]

Example

• Input: Bio2RDF-sgd

• Description: The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae

1. Find the most frequent terms (sgd_vocabulary, query, proper, phenotype, experiment)

2. Find a literal containing one of the terms ("protein [sgd_vocabulary:protein]@en")

3. Identify the DBpedia concept (http://dbpedia.org/resource/Protein)

4. Identify the category (http://dbpedia.org/resource/Category:Molecular_biology)

5. Identify the domain under which the category fits best (Biology => Life Sciences)
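The five steps of the example can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation behind the slides: the sample literals, the simple tokenizer, and the three lookup tables (`CONCEPTS`, `CATEGORIES`, `DOMAINS`) are assumptions that stand in for the real dataset and for DBpedia lookups, and all function names are hypothetical.

```python
import re
from collections import Counter

# Toy literals standing in for the literals of the input dataset (assumed sample).
LITERALS = [
    "protein [sgd_vocabulary:protein]@en",
    "phenotype [sgd_vocabulary:phenotype]@en",
    "experiment [sgd_vocabulary:experiment]@en",
]

# Hypothetical lookup tables standing in for DBpedia queries.
CONCEPTS = {"protein": "http://dbpedia.org/resource/Protein"}
CATEGORIES = {
    "http://dbpedia.org/resource/Protein":
        "http://dbpedia.org/resource/Category:Molecular_biology",
}
# On the slide the category fits under Biology, which maps to Life Sciences.
DOMAINS = {
    "http://dbpedia.org/resource/Category:Molecular_biology": "Life Sciences",
}


def tokenize(text):
    """Split a literal into lowercase word-like tokens (simplified)."""
    return re.findall(r"[a-z_]+", text.lower())


def frequent_terms(literals, top_n=5):
    """Step 1: return the top_n most frequent tokens across all literals."""
    counts = Counter(tok for lit in literals for tok in tokenize(lit))
    return [term for term, _ in counts.most_common(top_n)]


def identify_domain(literals):
    """Steps 2 to 5: literal -> concept -> category -> domain, or None."""
    terms = set(frequent_terms(literals))
    for lit in literals:
        words = tokenize(lit)
        if not terms.intersection(words):     # step 2: literal must contain a term
            continue
        for word in words:
            concept = CONCEPTS.get(word)      # step 3: DBpedia concept
            if concept is None:
                continue
            category = CATEGORIES.get(concept)  # step 4: DBpedia category
            domain = DOMAINS.get(category)      # step 5: best-fitting domain
            if domain is not None:
                return domain
    return None


print(identify_domain(LITERALS))
```

Because the lookups are plain dictionaries, a dataset whose literals match no known concept simply yields `None`, which mirrors the limitation that the method works only when usable literals are present.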

Baseline approach

Datasets

• LOD cloud datasets (annotated in LOD Cloud Diagram)

• 405 datasets, 9 domains: Media (13), Linguistics (34), Publications (111), Social Networking (41), Geography (29), Government (65), Cross Domain (25), User Generated (52), Life Sciences (35)

1. Extract URIs of properties and classes from datasets

2. Use classes and properties as features

3. Classify using Support Vector Machine classifier

4. Use Precision and Recall as metrics

Extended baseline

Enrich the data with human-annotated tags from Linked Open Vocabularies [1]

[1] http://lov.okfn.org/dataset/lov/
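The feature-building and evaluation steps of the baseline can be sketched as follows. This is a hedged illustration, not the code used for the reported results: the toy datasets, the gold and predicted labels, and all function names are assumptions. The SVM training step itself is left out; the binary vectors produced here could be passed to any SVM implementation (for example scikit-learn's `SVC`), and the slides do not say which one was used.

```python
def build_vocabulary(datasets):
    """Collect every class/property URI seen in any dataset, sorted for stable order."""
    return sorted({uri for uris in datasets.values() for uri in uris})


def to_feature_vector(uris, vocabulary):
    """Binary features: 1 if the dataset uses the URI, else 0."""
    present = set(uris)
    return [1 if uri in present else 0 for uri in vocabulary]


def precision_recall(gold, predicted, domain):
    """Precision and recall for one domain, given gold and predicted labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == domain and p == domain)
    fp = sum(1 for g, p in pairs if g != domain and p == domain)
    fn = sum(1 for g, p in pairs if g == domain and p != domain)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy input: class and property URIs extracted from two hypothetical datasets.
datasets = {
    "ds_a": ["http://xmlns.com/foaf/0.1/Person", "http://xmlns.com/foaf/0.1/knows"],
    "ds_b": ["http://purl.org/dc/terms/title", "http://xmlns.com/foaf/0.1/Person"],
}
vocabulary = build_vocabulary(datasets)
vectors = {name: to_feature_vector(uris, vocabulary) for name, uris in datasets.items()}

# Toy labels standing in for a classifier's output on four datasets.
gold = ["Publications", "Publications", "Government", "Life Sciences"]
predicted = ["Publications", "Government", "Publications", "Life Sciences"]
print(vectors)
print(precision_recall(gold, predicted, "Publications"))
```

Computing precision and recall per domain, rather than a single accuracy figure, makes it visible when a classifier does well on large domains such as Publications but poorly on small ones.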

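Precision and Recall for different domains using SVM

[Figure: Precision and Recall per domain. Legible domain labels: Linguistics, Publications, Social Networking, Geography, User Generated, Life Sciences]

Correctly Classified Instances

[Figure: correctly classified instances for the feature sets Classes, Properties, and Classes + Properties, with features taken from Dataset, Dataset + LOV, and LOV]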

Conclusion

• Does not require training

• Works with new and customized vocabularies

• Works only if datasets contain literals

• Cannot identify User-Generated Content and Cross-Domain

• Using only classes and properties, it is hard to improve results above 75%

Future Work

• Evaluate alternative classification algorithms

• Use Literals and URIs for classification

• Classify datasets into more specific subdomains

Thank you!
