50
The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi http:// ebiq.org /r/3

The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Embed Size (px)

Citation preview

Page 1: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

The Semantic Web:there and back again

Tim FininUniversity of Maryland, Baltimore County

Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

http://ebiq.org/r/353

Page 2: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

LOD 123: Making the semantic web easier to use

Tim FininUniversity of Maryland, Baltimore County

Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Page 3: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Semantic Web: then and now

• Ten years ago the we developed complex ontologies used to encode and reason over small datasets of 1000s of facts

• Recently the focus has shifted to using simple ontologies and minimal reasoning over very large datasets of 100s of millions of facts

• Major companies are moving: Google Know-ledge Graph, Facebook Open Graph, Microsoft Satori, Apple Siri KB, IMB Watson KB Linked open data or “Things, not strings”

Page 4: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Linked Open Data (LOD)• Linked data is just RDF data, lots of it,

with a small schema• RDF data is a graph of triples (subject, predicate

– URI URI String: dbr:Barack_Obama dbo:spouse “Michelle Obama”

– URI URI URI: dbr:Barack_Obama dbo:spouse dbpedia:Michelle_Obama

• Best linked data practice prefers 2nd pattern, using nodes rather than strings for “entities”– Things, not strings!

• Linked open data is just linked data freely acces-sible on the Web along with their ontologies

Page 5: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Semantic web technologies allow machines to share data and knowledge using common web language and protocols.

~ 1997

Semantic Web

Semantic Web beginning

Use Semantic Web Technology to publish shared data & knowledge

Page 6: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

2007

Semantic Web => Linked Open DataUse Semantic Web Technology to publish shared data & knowledge

Data is inter-linked to support inte-gration and fusion of knowledge

LOD beginning

Page 7: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

2008

Semantic Web => Linked Open DataUse Semantic Web Technology to publish shared data & knowledge

Data is inter-linked to support inte-gration and fusion of knowledge

LOD growing

Page 8: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

2009

Semantic Web => Linked Open DataUse Semantic Web Technology to publish shared data & knowledge

Data is inter-linked to support inte-gration and fusion of knowledge

… and growing

Page 9: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Linked Open Data

2010

LOD is the new Cyc: a common source of background

knowledge

Use Semantic Web Technology to publish shared data & knowledge

Data is inter-linked to support inte-gration and fusion of knowledge

…growing faster

Page 10: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Linked Open Data

2011: 31B facts in 295 datasets interlinked by 504M assertions on ckan.net

LOD is the new Cyc: a common source of background

knowledge

Use Semantic Web Technology to publish shared data & knowledge

Data is inter-linked to support inte-gration and fusion of knowledge

Page 11: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Exploiting LOD not (yet) Easy

• Publishing or using LOD data hasinherent difficulties for the potential user

– It’s difficult to explore LOD data and to query it for answers

– It’s challenging to publish data using appropriate LOD vocabularies & link it to existing data

• Problem: O(104) schema terms, O(1011) instances

• I’ll describe two ongoing research projects that are addressing these problems

Page 12: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

GoRelations:Intuitive Query System

for Linked Data

Research with Lushan Han

http://ebiq.org/j/93

Page 13: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Dbpedia is the Stereotypical LOD•DBpedia is an important example of Linked Open Data–Extracts structured data from Infoboxes in Wikipedia –Stores in RDF using custom ontologies Yago terms

•The major integration point for the entire LOD cloud•Explorable as HTML, but harder to query in SPARQL

DBpedia

Page 14: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Browsing DBpedia’s

Mark Twain

Page 15: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Why it’s hard to query LOD• Querying DBpedia requires a lot of a user– Understand the RDF model– Master SPARQL, a formal query language– Understand ontology terms: 320 classes & 1600 properties !– Know instance URIs (>2M entities !)– Term heterogeneity (Place vs. PopulatedPlace)

• Querying large LODsets overwhelming

• Natural languagequery systems stilla research goal

Page 16: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Goal

• Let users with a basic understanding of RDF query DBpedia and other LOD collections– Explore what data is in the system– Get answers to question– Create SPARQL queries for reuse or adaptation

• Desiderata– Easy to learn and to use– Good accuracy (e.g., precision and recall)– Fast

Page 17: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Key Idea

Structured keyword queries reduce problem complexity:

– User enters a simple graph, and– Annotates the nodes and arcs with

words and phrases

Page 18: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Structured Keyword Queries

• Nodes are entities and links binary relations• Entities described by two unrestricted terms:

name or value and type or concept• Outputs marked with ? • Compromise between a natural language Q&A

system and formal query–Users provide compositional structure of the question–Free to use their own terms to annotate structure

Page 19: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Translation – Step Onefinding semantically similar ontology terms

For each graph concept/relation, generate k most semantically similar ontology classes/properties

Lexical similarity metric based on distributional similarity, LSA, and WordNet

Page 20: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Semantic similarity: http://bit.ly/SEMSIM

Page 21: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Semantic Textual Similarity task• 2013 lexical and computational semantics conference• Do two sentences have same meaning (0…5)

1: “The woman is playing the violin” vs. “The young lady enjoys listening to the guitar”4: "In May 2010, the troops attempted to invade Kabul” vs. "The US army invaded Kabul on May 7th last year, 2010"

• 2012: 35 teams, 88 runs, 2013: 36 teams, 89 runs• 2250 sentence pairs

from four domains• Our three runs

#1, #2 and #4

Dataset PairingWords Galactus Saiyan

Headlines (750 pairs) 0.7642 (3) 0.7428 (7) 0.7838 (1)

OnWN (561 pairs) 0.7529 (5) 0.7053 (12) 0.5593 (36)

FNWN (189 pairs) 0.5818 (1) 0.5444 (3) 0.5815 (2)

SMT (750 pairs) 0.3804 (8) 0.3705 (11) 0.3563 (16)

Weighted mean 0.6181 (1) 0.5927 (2) 0.5683 (4)

Page 22: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

• Assemble best interpretation using statistics of the data

• Use pointwise mutual informa-tion (PMI) between RDF terms in the LOD collection

Measures degree to which two RDF terms co-occur in knowledge base

• In a good interpretation, ontology terms associate like their corresponding user terms connect in the query

Translation – Step Twodisambiguation algorithm

Page 23: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Three aspects are combined to derive an overall goodness measure for each candidate interpretation

Translation – Step Twodisambiguation algorithm

Joint disam-biguation

Resolvingdirection

Link reason-ableness

Page 24: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Translation resultConcepts: Place => Place, Author => Writer, Book => BookProperties: born in => birthPlace, wrote => author (inverse direction)

Page 25: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

The translation of a semantic graph query to SPARQL is straightforward given the mappings

SPARQL Generation

Concepts•Place => Place•Author => Writer•Book => Book

Relations•born in => birthPlace•wrote => author

Page 26: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Evaluation• 33 test questions from 2011 Workshop on Question

Answering over Linked Data answerable using DBpedia• Three human subjects unfamiliar with DBpedia translated

the test questions into semantic graph queries• Compared with two top natural language QA systems:

PowerAqua and True Knowledge

Page 27: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Current work

• Baseline system works well for Dbpedia; we’re testing a second use case now

• Current work– Better entity matching– Relaxing the need for type information– A better Web interface with user feedback & advice

• See http://ebiq.org/93 for more information & try our alpha version at http://ebiq.org/GOR

Page 28: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Generating Linked Databy Inferring the

Semantics of Tables

Research with Varish Mulwad

http://ebiq.org/j/96

Page 29: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Goal: Table => LOD*

Name Team Position Height

Michael Jordan Chicago Shooting guard 1.98

Allen Iverson Philadelphia Point guard 1.83

Yao Ming Houston Center 2.29

Tim Duncan San Antonio Power forward 2.11

http://dbpedia.org/class/yago/NationalBasketballAssociationTeams

http://dbpedia.org/resource/Allen_Iverson Player height in meters

dbprop:team

* DBpedia

Page 30: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Goal: Table => LOD*

Name Team Position HeightMichael Jordan Chicago Shooting guard 1.98

Allen Iverson Philadelphia Point guard 1.83

Yao Ming Houston Center 2.29

Tim Duncan San Antonio Power forward 2.11

@prefix dbpedia: <http://dbpedia.org/resource/> .@prefix dbo: <http://dbpedia.org/ontology/> .@prefix yago: <http://dbpedia.org/class/yago/> .

"Name"@en is rdfs:label of dbo:BasketballPlayer ."Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .

"Michael Jordan"@en is rdfs:label of dbpedia:Michael Jordan .dbpedia:Michael Jordan a dbo:BasketballPlayer .

"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago Bulls .dbpedia:Chicago Bulls a yago:NationalBasketballAssociationTeams .

RDF Linked Data

All this in a completely automated way* DBpedia

Page 31: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Tables are everywhere !! … yet …

The web – 154 million high quality relational tables

Fewer than 1% of the 400K tables at data.gov have rich semantic schemas

Page 32: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Sampling Acronym detection

Pre-processing modules

Query and generate initial mappings

2 1

Generate Linked RDF Verify (optional) Store in a knowledge base & publish as LOD

Joint Inference/Assignment

A Domain Independent Framework

Page 33: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Query and Rank

Rank(String Similarity,

Popularity)Chicago

Boston

Allen Iverson 1. Chicago_Bulls2. Chicago3. Judy_Chicago

possible entitiesfor Chicago

Can be replaced by Domain Specific / other LOD knowledge bases

Page 34: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Generating candidate ‘types’ for Columns

Team

Chicago

Philadelphia

Houston

San Antonio

Class

Instance

1. Chicago Bulls2. Chicago3. Judy Chicago

{dbpedia-owl:Place,dbpedia-owl:City,yago:WomenArtist,yago:LivingPeople,yago:NationalBasketballAssociationTeams }

{dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film,yago:NationalBasketballAssociationTeams …. ….. ….. }

{……………………………………………………………. }

dbpedia-owl:Place, dbpedia-owl:City, yago:WomenArtist, yago:LivingPeople, yago:NationalBasketballAssociationTeams, dbpedia-owl:PopulatedPlace, dbpedia-owl:Film ….

Page 35: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Joint Inference / Assignment

Page 36: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

A graphical model for tablesJoint inference over evidence in a table

C1 C2 C3

R11

R12

R13

R21

R22

R23

R31

R32

R33

Team

Chicago

Philadelphia

Houston

San Antonio

Class

Instance

Page 37: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Parameterized Graphical Model

C1 C2C3

𝝍𝟓

R11 R12 R13 R21 R22 R23 R31 R32 R33

Function capturing affinity between column headers and row values

Row value

Variable Node: Column header

Captures interaction between column headers

Factor Node

Captures interaction between row values

Page 38: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Standard message passingP(C1, C2, C3, R11, R12 ,R13, R21, R22, R23, R31, R32, R33)Joint Assignment :

P(C1, R11, R12 ,R13)

P(C3, R31, R32, R33)

P(C2,R21, R22, R23)

P(R31, R32, R33)

Graphical Models : Exploit Conditional

Independences

Still …

C1 R11 R12 R13 Val

4 Variables; Each having

25 options -- 390,625

entries !

Page 39: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Semantic message passing

Michael_I_Jordan Chicago_BullsShooting_guard

“Change”Related to Chicago_Bulls

“No Change”“No Change”

R11:[Michael_I_Jordan]

R12:[Yao_Ming]

R13:[Allen_Iverson]

R21:[Chicago_Bulls]

R31:[Shooting_Guard]

……

C1:[BasketballPlayer] C2:[NBATeam] C3:[BasketBallPositions]

Yao_Ming Allen_Iverson

BasketballPlayer

NBATeam BasketBallPositions

“Change”BasketBall

Player

“No Change”

“No Change”

“No Change”“No Change”

“No Change”

“No Change”

Page 40: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Inference – ExampleR11:

[Michael_I_Jordan]

R12:[Allen_Iverson] R13:[Yao_Ming]

C1:[Name]

(Michael_I_Jordan, Yao_Ming, Allen_Iverson)

“Change”“No Change”“No Change”

“BasketBallPlayer”

R11

Michael Jordan

1. Michael_I_Jordan (Professor)2. ….. 3. Michael_Jordan (BasketballPlayer)

….

Michael_I_Jordan Allen_Iverson Yao_Ming

“BasketBallPlayer”

Page 41: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Column header – row value agreement[Michael_I_Jordan, Allen_Iverson, Yao_Ming]

WomenArtistCityPopulatedPlaceAthleteFilm

LivingPeopleGeoPopulatedPlaceBasketBallPlayerArtWork

Name

Michael_I_Jordan

Allen_Iverson

Yao_Ming BasketballPlayerAtheleteLivingPeople

LivingPeopleAI_Researchers +1

+1

+1

+1

+1 1. Athlete2. City 3. …

1.LivingPeople2. BasketBallPlayer3.GeoPopulatedPlace….

Yago Tie-breaker/Re-order : Choose more ‘descriptive’ class. E.g. BasketBallPlayer better than LivingPeople

ClassGranularityScore = 1-[]

1: Majority Voting

2: Choose the top Class

Top Yago : BasketBallPlayer TopDBpedia : Athelete

Page 42: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Column header – row value agreementtopClassScore = (numberOfVotes)/(numberofRows)

Compute topScoreYago & topScoreDBpedia

(both)Score < Threshold(topScoreYago || topScoreDBpedia) >= Threshold

Check for AlignmentIs Athelete sub/superClass of BasketBallPlayer ?

Columnn Header Annotation = BasketBallPlayer, Athlete

Name

Michael_I_Jordan

Allen_Iverson

Yao_Ming BasketballPlayerAtheleteLivingPeople

LivingPeopleAI_Researchers Change

No - Change

Update Column Header Annotation = “No-Annotation”

Page 43: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Update row value entity annotations

R11

Michael Jordan

1. Michael_I_Jordan 2. ….. 3. Michael_Jordan ….

LivingPeopleAI_Researchers

BasketBallPlayerAthlete

𝝍𝟑 “CHANGE”

Entity Class : BasketBallPlayer or

Athlete

R11 Michael Jordan Michael_Jordan

Candidate Entities for R11

Page 44: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Evaluation

• Dataset of 80 tables (Wikipedia tables; part of larger dataset released by IIT-Bombay)

• Evaluated Column Header Annotation Accuracy– How good was the mapping Team to

NationalBasketballAssociationTeams• Evaluated Entity Linking Accuracy

– Mapping Michael Jordan to Michael_Jordan

Page 45: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Column Header Annotation Accuracy

• System produced a ranked a list of Yago & DBpedia classes

• Human judges evaluated each class• For precision, judges scored each class

• 1 if the class was accurate• 0.5 if the class ok, but not best (e.g., Place vs. City)• 0 if it was incorrect

• For Recall, score 1 if accurate/correct, 0 for incorrect

522

259

422

Accurate

Okay

Incorrect

Page 46: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Top K classes, F-measure

Yago rank 1

Yago rank 2

Yago rank 3

dbp rank 1

dbp rank 2

dbp rank 3

GOOG IIT-B

F-Measure 0.688360450563

204

0.487568135310

842

0.438263244010

532

0.677154394707

449

0.626750158305

777

0.612644415917

844

0.67 0.56

0.05

0.15

0.25

0.35

0.45

0.55

0.65

0.75

Column Header Annotations

F-Measure

Page 47: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Entity linking accuracy

http://dbpedia.org/resource/Allen_IversonAllen Iverson

Correctly Linked Entities

3022

Incorrectly Linked Entities

959

Total Entities 3981

Accuracy : 75.91 %

Page 48: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Other Challenges• Using table captions and other text is

associated documents to provide context• Size of some data.gov tables (> 400K rows!)

makes using full graphical model impractical– Sample table and run model on the subset

• Achieving acceptable accuracy may require human input– 100% accuracy unattainable automatically– How best to let humans offer advice and/or

correct interpretations?

Page 49: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

Final Conclusions

• Linked data great for sharing structured and semi-structured data– Backed by machine-understandable semantics– Uses successful Web languages and protocols

• Generating and exploring linked data resources is challenging– Schemas are too large, too many URIs

• New tools mapping tables to linked data and translating structured natural language queries reduce the barriers

Page 50: The Semantic Web: there and back again Tim Finin University of Maryland, Baltimore County Joint work with Lushan Han, Varish Mulwad, Anupam Joshi

http://ebiq.org/