Using Linked Data Traversal to Label Academic Communities Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta Knowledge Media Institute, The Open University

Motivation

We
• Explain data patterns automatically
• Using Linked Data background knowledge

Scholarly data
• Growing interest and techniques
• Mine and visualise data
• Reveal hidden knowledge
• Forecast

Data interpretation still manual

Use-case: Community Detection

Aim
• Detecting communities of research topics
• The Open University papers (ORO [1])

Usual text-mining methods
• Groups of similar documents
• Probabilistically extracted topics
• Based on word co-occurrence

[1] http://oro.open.ac.uk/

Use-case: Community Detection

Problem: Labeling requires human interpretation

Linked Data can help!

• Scholarly data: big portion within Linked Data
• RDF structure (machine understandable)
• Linked datasets
• Across disciplines

• Easier discovery of unrevealed knowledge
• Easier result interpretation

Proposition

• Automatic topic detection (labels)
• With Linked Data background knowledge

• Machine Learning approach
• A* search over the Linked Data graph
• Link traversal (vs. literature based on SPARQL)

Approach

Document clustering
• text pre-processing (normalise, stem, filter)
• Latent Semantic Analysis space of word vectors
• clustering according to LSA distance
• community: a group of similar words

Communities networking
• connecting clusters’ centroids (the closest one)
• network graph of communities
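The networking step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each community is a list of non-zero word vectors, that "closest" means smallest cosine distance between centroids, and that every community is linked to its single nearest neighbour. The names `centroid`, `cosine_distance`, `community_network` and the toy 2-D vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def community_network(clusters):
    """Link every cluster to the cluster whose centroid is closest to its own."""
    centroids = {name: centroid(vecs) for name, vecs in clusters.items()}
    edges = set()
    for name, c in centroids.items():
        nearest = min(
            (other for other in centroids if other != name),
            key=lambda other: cosine_distance(c, centroids[other]),
        )
        # Undirected edge: A-B and B-A collapse into one entry.
        edges.add(frozenset((name, nearest)))
    return edges


# Hypothetical 2-D vectors standing in for an LSA word space.
clusters = {
    "A": [[1.0, 0.1], [0.9, 0.2]],
    "B": [[0.8, 0.4], [0.7, 0.5]],
    "C": [[0.1, 1.0], [0.2, 0.9]],
}
network = community_network(clusters)
```

With these toy vectors, A and B point at each other and C attaches to B, so the network has two undirected edges.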

Initial dataset
• Word URIs
• connected to DBpedia

Machine Learning/Logic Programming approach
• Given
  • Positive examples E+: the cluster (words) to label
  • Negative examples E-: words not in E+
  • Background Knowledge from Linked Data
• Derive
  • Explanations of the grouping for E+ (topic)

Approach

Explanation
• RDF property chains
• Leading to the same entity
• shared by a subset of initial words

Linked Data Background Knowledge

Topic: many words of the cluster that share the same explanation

Aim: find the explanation shared by the biggest number of words in the cluster

Linked Data Traversal

e.g. <skos:relatedMatch-dc:subject-skos:broader.db:Creativity>

How: A* search to iteratively explore new parts of the graph and improve the explanation
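A rough sketch of the traversal idea, under stated assumptions: the graph is a small in-memory dictionary rather than dereferenced Linked Data, and the priority queue simply prefers explanations that already cover more words, which is a plain best-first simplification and not the A* heuristic used in the talk. All URIs in `GRAPH`, and the names `traverse`, `format_explanation`, `max_depth` and `budget`, are hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical toy graph: node -> list of (property, neighbour).
GRAPH = {
    "w:painting": [("skos:relatedMatch", "db:Painting")],
    "w:sculpture": [("skos:relatedMatch", "db:Sculpture")],
    "w:poetry": [("skos:relatedMatch", "db:Poetry")],
    "db:Painting": [("dc:subject", "db:Visual_arts")],
    "db:Sculpture": [("dc:subject", "db:Visual_arts")],
    "db:Poetry": [("dc:subject", "db:Literature")],
    "db:Visual_arts": [("skos:broader", "db:Arts")],
    "db:Literature": [("skos:broader", "db:Arts")],
}


def traverse(words, graph, max_depth=3, budget=1000):
    """Follow property chains from each word, most-shared explanations first.

    Returns {(chain, entity): set of words reaching entity via chain}.
    """
    support = defaultdict(set)
    # Heap items: (priority, tie-breaker, word, chain, current node).
    heap = [(0, i, w, (), w) for i, w in enumerate(sorted(words))]
    heapq.heapify(heap)
    counter = len(heap)
    seen = set()
    while heap and budget > 0:
        _, _, word, chain, node = heapq.heappop(heap)
        budget -= 1
        if len(chain) == max_depth:
            continue  # do not extend chains beyond the depth limit
        for prop, target in graph.get(node, []):
            new_chain = chain + (prop,)
            state = (word, new_chain, target)
            if state in seen:
                continue
            seen.add(state)
            support[(new_chain, target)].add(word)
            # More shared words -> smaller priority value -> explored earlier.
            priority = -len(support[(new_chain, target)])
            heapq.heappush(heap, (priority, counter, word, new_chain, target))
            counter += 1
    return dict(support)


def format_explanation(chain, entity):
    """Render an explanation in the <p1-p2-...-pn.entity> notation of the slides."""
    return "<" + "-".join(chain) + "." + entity + ">"


words = {"w:painting", "w:sculpture", "w:poetry"}
support = traverse(words, GRAPH)
best = max(support, key=lambda key: len(support[key]))
label = format_explanation(*best)
```

On this toy graph the three words only meet after three hops, so the most shared explanation is the chain ending at the hypothetical `db:Arts` node.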

Linked Data Traversal

<skos:relatedMatch-dc:subject-skos:broader-skos:broader.db:Aesthetics>

Ranking explanations according to F-Measure

Take the best explanation and label the cluster

Explanation Evaluation

[Figure: the cluster (E+), the words sharing the explanation, and a word outside E+]
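The slides do not spell out the formula, so the following is a sketch that assumes the standard F1 definition: precision is the share of words reached by an explanation that belong to E+, and recall is the share of E+ that the explanation reaches. The function name `f_measure`, the word sets and the explanation strings are hypothetical.

```python
def f_measure(covered, positives):
    """Harmonic mean of precision and recall of an explanation against E+."""
    hits = len(covered & positives)
    if hits == 0:
        return 0.0
    precision = hits / len(covered)   # reached words that are in E+
    recall = hits / len(positives)    # words of E+ that are reached
    return 2 * precision * recall / (precision + recall)


# Hypothetical cluster to label (E+).
positives = {"painting", "sculpture", "poetry", "music"}

# Hypothetical explanations and the words (inside or outside E+) sharing them.
explanations = {
    "<skos:relatedMatch-dc:subject-skos:broader.db:Arts>":
        {"painting", "sculpture", "poetry", "rock"},
    "<skos:relatedMatch-dc:subject.db:Visual_arts>":
        {"painting", "sculpture"},
    "<skos:relatedMatch-dc:subject-skos:broader-skos:broader.db:Culture>":
        {"painting", "sculpture", "poetry", "music",
         "rock", "law", "sport", "food", "trade", "war"},
}

scores = {e: f_measure(words, positives) for e, words in explanations.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
best_label = ranking[0]
```

In this toy ranking the first explanation wins because it balances coverage of the cluster against words reached outside it; the broadest one covers all of E+ but is penalised for the many outside words it also reaches.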

Community Labeling

Examples of topics:

<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Geology>
<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Chemistry>
<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Mathematics>

Conclusion and future work

Facilitating data interpretation by combining
• scholarly data
• Machine Learning
• Linked Data graph search

Future work
• improve the graph exploration to discover more knowledge
• focus on the definition of “explanation”

Thank you! Questions?

Many thanks to him and him