Using Linked Data Traversal to Label Academic Communities
Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta
Knowledge Media Institute, The Open University
Motivation
We
• Explain data patterns automatically
• Using Linked Data background knowledge
Scholarly data
• Growing interest and techniques
• Mine and visualise data
• Reveal hidden knowledge
• Forecast
Data interpretation still manual
Use-case: Community Detection
Aim
• Detecting communities of research topics
• The Open University papers (ORO [1])
Usual text-mining methods
• Groups of similar documents
• Probabilistically extracted topics
• Based on word co-occurrence
[1] http://oro.open.ac.uk/
Linked Data can help!
• Scholarly data: big portion within Linked Data
• RDF structure (machine understandable)
• Linked datasets
• Across disciplines
• Easier discovery of unrevealed knowledge
• Easier result interpretation
Proposition
• Automatic topic detection (labels)
• With Linked Data background knowledge
• Machine Learning approach
• A* search over the Linked Data graph
• Link traversal (vs. literature based on SPARQL)
Approach
Document clustering
• Text pre-processing (normalise, stem, filter)
• Latent Semantic Analysis space of word vectors
• Clustering according to LSA distance
• Community: a group of similar words
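The pre-processing bullet (normalise, stem, filter) can be sketched as follows. This is only an illustration: the stopword list, the suffix rules and the function names are assumptions, and the crude suffix stripping stands in for whatever stemmer was actually used. The LSA and clustering steps are not covered here.

```python
import re

# Hypothetical stopword list and suffix rules, kept tiny for illustration.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "on"}
SUFFIXES = ("ing", "ed", "es", "s")

def normalise(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Strip at most one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Normalise, filter out stopwords, then stem what is left."""
    return [stem(t) for t in normalise(text) if t not in STOPWORDS]

tokens = preprocess("Detecting communities of research topics in the papers")
```

The resulting tokens are what a term-document matrix for LSA would then be built from.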
Communities networking
• Connecting clusters’ centroids (each to the closest one)
• Network graph of communities
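A minimal sketch of the networking step, assuming each cluster is a list of word vectors and that "closest" means smallest Euclidean distance between centroids (the slides only say "LSA distance", so the metric is an assumption, as are the function names and the toy vectors).

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance; an assumed stand-in for the LSA distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def link_communities(clusters):
    """Return undirected edges joining each cluster to its nearest neighbour."""
    centroids = {name: centroid(vecs) for name, vecs in clusters.items()}
    edges = set()
    for name, c in centroids.items():
        nearest = min(
            (other for other in centroids if other != name),
            key=lambda other: euclidean(c, centroids[other]),
        )
        # Sort the pair so that (a, b) and (b, a) collapse into one edge.
        edges.add(tuple(sorted((name, nearest))))
    return edges

# Toy two-dimensional word vectors, invented for illustration.
clusters = {
    "c1": [[0.0, 0.0], [0.2, 0.0]],
    "c2": [[1.0, 0.0], [1.2, 0.0]],
    "c3": [[5.0, 5.0], [5.2, 5.0]],
}
edges = link_communities(clusters)
```

Every cluster contributes one edge, so the resulting network has no isolated community as long as there are at least two clusters.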
Initial dataset
• Word URIs
• Connected to DBpedia
Machine Learning/Logic Programming approach
• Given
  • Positive examples E+: cluster (words) to label
  • Negative examples E-: words not in E+
  • Background knowledge from Linked Data
• Derive
  • Explanations of the grouping for E+ (topic)
Approach
Explanation
• RDF property chains
• Leading to the same entity
• Shared by a subset of initial words
Linked Data Background Knowledge
Topic: many words of the cluster that share the same explanation
Aim: find the explanation shared by the biggest number of words in the cluster
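To make the aim concrete, here is a small sketch that enumerates property chains from each word of a cluster and keeps the (chain, entity) pair shared by the most words. The graph is a hypothetical in-memory toy (all node names are invented), standing in for data that would really be obtained from Linked Data; the exhaustive enumeration up to a fixed depth is also an assumption made for brevity.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> list of (property, target) edges.
GRAPH = {
    "w:painting": [("skos:relatedMatch", "db:Painting")],
    "w:poetry": [("skos:relatedMatch", "db:Poetry")],
    "w:rock": [("skos:relatedMatch", "db:Rock")],
    "db:Painting": [("dc:subject", "cat:Visual_arts")],
    "db:Poetry": [("dc:subject", "cat:Literature")],
    "db:Rock": [("dc:subject", "cat:Petrology")],
    "cat:Visual_arts": [("skos:broader", "db:Creativity")],
    "cat:Literature": [("skos:broader", "db:Creativity")],
    "cat:Petrology": [("skos:broader", "db:Geology")],
}

def explanations_from(word, max_depth=3):
    """Yield (property_chain, entity) pairs reachable from one word."""
    frontier = [((), word)]
    for _ in range(max_depth):
        next_frontier = []
        for chain, node in frontier:
            for prop, target in GRAPH.get(node, []):
                step = (chain + (prop,), target)
                yield step
                next_frontier.append(step)
        frontier = next_frontier

def shared_explanations(words, max_depth=3):
    """Map each explanation to the set of words that reach it."""
    support = defaultdict(set)
    for word in words:
        for explanation in explanations_from(word, max_depth):
            support[explanation].add(word)
    return support

cluster = ["w:painting", "w:poetry", "w:rock"]
support = shared_explanations(cluster)
best = max(support, key=lambda e: len(support[e]))
```

In this toy cluster the chain ending at `db:Creativity` is shared by two of the three words, so it is the candidate topic.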
Linked Data Traversal
e.g. <skos:relatedMatch-dc:subject-skos:broader.db:Creativity>
How: A* search to iteratively explore new parts of the graph and improve the explanation
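The iterative exploration can be sketched as a best-first search over property chains with a priority queue. The slides name A* but do not give its cost or heuristic functions, so this sketch simply scores each chain by the best F-measure among the explanations it yields; that score, the toy graph and all names are assumptions. In a real link-traversal setting, `GRAPH.get(node)` would presumably be replaced by dereferencing the node's URI.

```python
import heapq
from collections import defaultdict

# Hypothetical toy graph: node -> list of (property, target) edges.
GRAPH = {
    "w:painting": [("skos:relatedMatch", "db:Painting")],
    "w:poetry": [("skos:relatedMatch", "db:Poetry")],
    "w:rock": [("skos:relatedMatch", "db:Rock")],
    "db:Painting": [("dc:subject", "cat:Visual_arts")],
    "db:Poetry": [("dc:subject", "cat:Literature")],
    "db:Rock": [("dc:subject", "cat:Petrology")],
    "cat:Visual_arts": [("skos:broader", "db:Creativity")],
    "cat:Literature": [("skos:broader", "db:Creativity")],
    "cat:Petrology": [("skos:broader", "db:Geology")],
}

def follow(word, chain):
    """Entities reached from a word by following a property chain."""
    nodes = {word}
    for prop in chain:
        nodes = {t for n in nodes for p, t in GRAPH.get(n, []) if p == prop}
    return nodes

def f_measure(covered, positives):
    """F1 of the words covered by an explanation, measured against E+."""
    hits = len(covered & positives)
    if hits == 0:
        return 0.0
    precision = hits / len(covered)
    recall = hits / len(positives)
    return 2 * precision * recall / (precision + recall)

def search(positives, negatives, max_iterations=10):
    """Expand the most promising chain first; return (score, (chain, entity))."""
    words = positives | negatives
    best = (0.0, None)
    frontier = [(0.0, ())]  # min-heap of (-score, chain)
    for _ in range(max_iterations):
        if not frontier:
            break
        _, chain = heapq.heappop(frontier)
        # Properties leaving the ends of this chain: the new part of the graph.
        next_props = set()
        for w in words:
            for node in follow(w, chain):
                next_props |= {p for p, _ in GRAPH.get(node, [])}
        for prop in sorted(next_props):
            new_chain = chain + (prop,)
            coverage = defaultdict(set)
            for w in words:
                for entity in follow(w, new_chain):
                    coverage[entity].add(w)
            chain_score = 0.0
            for entity, covered in sorted(coverage.items()):
                score = f_measure(covered, positives)
                chain_score = max(chain_score, score)
                if score > best[0]:
                    best = (score, (new_chain, entity))
            heapq.heappush(frontier, (-chain_score, new_chain))
    return best

score, (chain, entity) = search({"w:painting", "w:poetry"}, {"w:rock"})
```

Each iteration extends one chain by a single hop, so the best explanation found so far can only improve as more of the graph is explored.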
Linked Data Traversal
<skos:relatedMatch-dc:subject-skos:broader-skos:broader.db:Aesthetics>
Ranking explanations according to F-Measure
Take the best explanation and label the cluster
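A sketch of the ranking step, assuming the usual definitions: precision is the share of words covered by an explanation that belong to E+, recall is the share of E+ that the explanation covers, and the F-measure is their harmonic mean. The slides do not spell these definitions out, and the example words and coverage sets below are invented.

```python
def f_measure(covered, positives):
    """Harmonic mean of precision and recall for one explanation."""
    hits = len(covered & positives)
    if hits == 0:
        return 0.0
    precision = hits / len(covered)   # covered words that are in E+
    recall = hits / len(positives)    # E+ words that are covered
    return 2 * precision * recall / (precision + recall)

def rank(explanations, positives):
    """Return (score, label) pairs sorted by descending F-measure."""
    scored = [(f_measure(covered, positives), label)
              for label, covered in explanations.items()]
    return sorted(scored, key=lambda pair: -pair[0])

# Invented example: E+ has four words; e..h are words outside the cluster.
positives = {"a", "b", "c", "d"}
explanations = {
    "db:Aesthetics": {"a", "b", "c"},
    "db:Creativity": {"a", "b", "e"},
    "db:Thing": {"a", "b", "c", "d", "e", "f", "g", "h"},
}
ranking = rank(explanations, positives)
label = ranking[0][1]
```

An explanation that covers everything (`db:Thing` here) has perfect recall but poor precision, which is why the F-measure rather than raw coverage is used for ranking.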
Explanation Evaluation
[Figure: the cluster (E+), the words sharing the explanation, and a word outside E+]
Community Labeling
Examples of topics:
<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Geology>
<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Chemistry>
<skos:relatedMatch-dc:subject-skos:broader-skos:broader-skos:broader.db:Mathematics>
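The topic strings above follow a visible pattern: properties joined by "-", then "." and the final entity. A small parser for that notation might look like this; it assumes property names contain no hyphens and the entity contains no dots, and the function names are hypothetical.

```python
def parse_explanation(text):
    """Split '<p1-p2-...-pn.entity>' into (property_chain, entity)."""
    body = text.strip().lstrip("<").rstrip(">")
    chain_part, entity = body.rsplit(".", 1)
    return tuple(chain_part.split("-")), entity

def label_of(entity):
    """Human-readable label: the local name after the prefix."""
    return entity.split(":", 1)[1].replace("_", " ")

chain, entity = parse_explanation(
    "<skos:relatedMatch-dc:subject-skos:broader.db:Creativity>"
)
label = label_of(entity)
```

The number of `skos:broader` steps in the chain indicates how far up the category hierarchy the shared entity sits.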
Conclusion and future work
Facilitating data interpretation by combining
• Scholarly data
• Machine Learning
• Linked Data graph search
Future work
• Improve the graph exploration to discover more knowledge
• Focus on the definition of “explanation”