Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
1/50
Expanding the YAGO knowledge base
Thomas Rebele
Télécom ParisTech
2018-07-05
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge baseWhat is a knowledgebase?
What is YAGO?
Accuracy
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is a knowledge base?
2/50
Albert EinsteinMileva Maric
married
Alfred Kleinerhas advisor
Nobel Prize in Physics
won prize
Applications of knowledge basesI question answeringI semantic searchI text analysisI machine translation
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge baseWhat is a knowledgebase?
What is YAGO?
Accuracy
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is a knowledge base?
2/50
Albert EinsteinMileva Maric
marriedAlfred Kleinerhas advisor
Nobel Prize in Physics
won prize
Applications of knowledge basesI question answeringI semantic searchI text analysisI machine translation
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge baseWhat is a knowledgebase?
What is YAGO?
Accuracy
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is a knowledge base?
2/50
Albert EinsteinMileva Maric
marriedAlfred Kleinerhas advisor
Nobel Prize in Physics
won prize
Applications of knowledge basesI question answeringI semantic searchI text analysisI machine translation
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge baseWhat is a knowledgebase?
What is YAGO?
Accuracy
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is a knowledge base?
2/50
Albert EinsteinMileva Maric
marriedAlfred Kleinerhas advisor
Nobel Prize in Physics
won prize
Applications of knowledge basesI question answeringI semantic searchI text analysisI machine translation
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge baseWhat is a knowledgebase?
What is YAGO?
Accuracy
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is YAGO?
3/50
I knowledge base with 10 million entities and >210 million factsI automatically extracted from Wikipedia, Wordnet, and GeonamesI multilingual facts from 10 languagesI focus on precisionI developed by Max-Planck Institute for Informatics and Télécom
ParisTech
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge baseWhat is a knowledgebase?
What is YAGO?
Accuracy
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is YAGO?
4/50
I I joined the project in 2015I coordinated / contributed to the evaluationI maintenance, participating in open source releaseI development
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge baseWhat is a knowledgebase?
What is YAGO?
Accuracy
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
What is YAGO?
5/50
<Albert_Einstein>
Mileva
mar
ried
Alfred Kleiner
advi
sor
Nobel Prize
won
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge baseWhat is a knowledgebase?
What is YAGO?
Accuracy
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Accuracy
6/50
Figure: Screenshot of evaluation result
I 2 months evaluation, 15 participantsI evaluated 4412 facts of 76 relations (with 60m total facts)I 98% facts of the sample were correctI Wilson center: 95%, interval width: 4.2%
Now that we have this knowledge base, what can we do with it?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Related Work
7/50
Similar studies using Semantic Web for Digital Humanities
I [Schich et al., 2014]: about 150,000 peopleI [de la Croix et al., 2015]: about 300,000 peopleI [Gergaud et al., 2017]: about 1,100,000 people
These studies are only about few people. Can we do better with YAGO?
YAGO has 2,200,000 people, but, e.g., locations only for 700,000 peopleHow can we make YAGO more complete?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Related Work
7/50
Similar studies using Semantic Web for Digital Humanities
I [Schich et al., 2014]: about 150,000 peopleI [de la Croix et al., 2015]: about 300,000 peopleI [Gergaud et al., 2017]: about 1,100,000 people
These studies are only about few people. Can we do better with YAGO?
YAGO has 2,200,000 people, but, e.g., locations only for 700,000 peopleHow can we make YAGO more complete?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Birth and Death Dates
8/50
Previous algorithm:
Languages
EnglishGermanFrench
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
Extracted birth dates
428
- 427
- 42#
bornOn
- 427
- 42#
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Birth and Death Dates
8/50
Previous algorithm:
Languages
EnglishGermanFrench
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
Extracted birth dates
428
bornOn- 427
- 42#
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Birth and Death Dates
9/50
New algorithm: filtering with category dates
Languages
EnglishGermanFrench
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
428
- 427
- 42#
bornOn (infobox)
bornOn (category)
?
?
"
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Birth and Death Dates
9/50
New algorithm: filtering with category dates
Languages
EnglishGermanFrench
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
428
- 427
- 42#
bornOn (infobox)
bornOn (category)
?
?
"
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Birth and Death Dates
9/50
New algorithm: filtering with category dates
Languages
EnglishGermanFrench
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
428
- 427
- 42#
bornOn (infobox)
bornOn (category)
?
?
"
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Birth and Death Dates
9/50
New algorithm: filtering with category dates
Languages
EnglishGermanFrench
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
428
- 427
- 42#
bornOn (infobox)
bornOn (category)
??
"
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Place of residence
10/50
Extract mapping from demonyms / adjectives to locations
”Austrian“ Austria
”Greek“ Greece
Take most frequent location as place of residence
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
Greece: 2Austria: 1
Greece
location withmax. occurence
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Place of residence
11/50
Caveat: only take outermost text spans
”Holy Roman Empire“ → <Holy_Roman_Empire>”Roman Empire“ → <Roman_Empire>
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Gender
12/50
Previous algorithm:
Languages
EnglishGermanFrench
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
Extracted gender
malegender
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Gender
12/50
Previous algorithm:
Languages
EnglishGermanFrench
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
Extracted gender
malegender
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Gender
13/50
New algorithm:
Languages
EnglishGermanFrench
Albert Einstein
Albert Einstein was aphysicist. Einstein de-veloped the theory ofrelativity.
Albert Einstein
Categories: Male scientist | Swissphysicists
Extracted gender
malegender
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Gender
13/50
New algorithm:
Languages
EnglishGermanFrench
Albert Einstein
Albert Einstein was aphysicist. Einstein de-veloped the theory ofrelativity.
Albert Einstein
Categories: Male scientist | Swissphysicists
Extracted gender
malegender
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Gender
14/50
Albert Schweitzer
Albert Schweitzer was aFrench-German writer,and philosopher.
Albert Schweitzer
Categories: Male writer
Albert Camus
Albert Camus was aFrench philosopher, au-thor, and journalist.
Albert Camus
Categories: Male philosopher
Albert Einstein
Albert Einstein was aphysicist. Einstein de-veloped the theory ofrelativity.
Albert Einstein
Categories: Male scientist | Swissphysicists
”Albert“ male
”Francesca“ female
”Kathleen“ female
in total:1206 first names
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Gender
14/50
Albert Schweitzer
Albert Schweitzer was aFrench-German writer,and philosopher.
Albert Schweitzer
Categories: Male writer
Albert Camus
Albert Camus was aFrench philosopher, au-thor, and journalist.
Albert Camus
Categories: Male philosopher
Albert Einstein
Albert Einstein was aphysicist. Einstein de-veloped the theory ofrelativity.
Albert Einstein
Categories: Male scientist | Swissphysicists
”Albert“ male
”Francesca“ female
”Kathleen“ female
in total:1206 first names
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Gender
15/50
Prioritize extracted facts
Albert Einstein
Albert Einstein was aphysicist. Einstein de-veloped the theory ofrelativity.
Albert Einstein
Categories: Male scientist | Swissphysicists
1. extract gender by category
Albert Camus
Albert Camus was aFrench philosopher, au-thor, and journalist.
Albert Camus
Categories: Male philosopher
2. extract gender by first name
Plato
Plato was a philoso-pher. He founded theAcademy in Athens. Helaid the foundation forphilosophy.
Plato
Birth 428 or 427 BCDeath 348 BC
Categories: 420s BC births | 340s BCdeaths | Greek philosoph | Greekmale wrestler | Austrian writer
3. extract gender by pronoun
Using YAGO for the Humanities: Evaluation
16/50
I compare extraction process on Wikipedia dump from 2017-02-20I extracted on 11 languagesI evaluate precision based on a sample of 100 people
Extraction YAGObefore
Recall YAGOnow
Recall Precision DBpedia (en)
Birth dates 1.6m 69% 1.7m 74% (+8%) 100% 0.8m
Death dates 0.7m 33% 0.8m 36% (+10%) 100% 0.3m
Place ofresidence 0.7m 30% 2.1m 91% (+201%) 97% (*) 0.7m
Gender 1.5m 64% 2.0m 87% (+35%) 98% 4k
Table: Coverage and precision of our methods.Recall relative to total number of people in YAGO (2.2m).
m million k thousand(*) 6% of anachronistic residencies (e.g., German Empire instead of Germany)
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Life expectancy over time
17/50
100 500 1000 1500 1900
45
50
55
60
65
70
75
80
85
Year
Med
ian
age
malefemale
Figure: Median age over time, by year of birth(with the Student’s t confidence interval at α = 95%).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Life expectancy over time
18/50
1100 1200 1300 1400 1500 1600 1700 1800 1900
50
55
60
65
70
75
80
Year
Med
ian
age
Great BritainItalyIndiaChina
Figure: Median age over time, by year of birth(with the Student’s t confidence interval at α = 95%).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Births per month
19/50
0 2 4 6 8 10 12
7.5%
8%
8.5%
9%
Month
Rel
ativ
ebi
rths
YAGO - allNational Center for Health Statistics
Figure: Births per month in the United States between 2003 and 2015(with the Student’s t confidence interval at α = 95%).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Births per month
20/50
Possible explanation:
Languages
EnglishEuskara
Relative age effect
The relative age effectdescribes a bias. Peopleborn early in the selec-tion period of sports oracademia are more likelyto perform well.
Relative age effect
Categories: Ageism | Epidemiology
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Births per month
21/50
0 2 4 6 8 10 12
7.5%
8%
8.5%
9%
Month
Rel
ativ
ebi
rths
YAGO - no sportsmenNational Center for Health Statistics
Figure: Births per month in the United States between 2003 and 2015(with the Student’s t confidence interval at α = 95%).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Relative population size
22/50
−3000 −2500 −2000 −1500 −1000 −500 0 500 1000 1500 20000
0.1
0.2
0.3
0.4
0.5
0.6
0.70.8
1
Year
Rel
ativ
epo
pula
tion
size
EgyptBabylonian-Empire
SyriaChinaGreece
Ancient-RomeBritainFranceItaly
GermanyUnited-States
Figure: Relative population size, by century. The y-axis is scaled by a quadraticfunction.
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe HumanitiesRelated Work
Extensions
Birth and Death Dates
Place of residence
Gender
Evaluation
Life expectancy overtime
Births per month
Relative populationsize
Summary
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Using YAGO for the Humanities: Summary
23/50
I extension of YAGOI more birth and death dates (+8%/10%, 100% precision)I more people with locations (+201%, 97% precison)I more people with genders (+35%, 98% precision)
I case studiesI life expectancyI births per monthI relative population size
Thomas Rebele Arash Nekoei Fabian Suchanek
publication: ISWC 2017 (workshop paper)
We often had to repair regular expressions (e.g., for matching dates).Can we automate this step?
100 500 1000 1500 1900
50
60
70
80
Year
Med
ian
age
malefemale
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Introduction
24/50
Why does YAGO not knowthe ISBN numbers of my books?
I we want to find ISBN numbers in Wikipedia to include it in YAGOI we try the regex ISBN(978|979)?\d{10}
I why does the regex not find I978-2-1234-5680-3 ?I how can we modify the regex automatically to match the word?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Introduction
24/50
Why does YAGO not knowthe ISBN numbers of my books?
I we want to find ISBN numbers in Wikipedia to include it in YAGOI we try the regex ISBN(978|979)?\d{10}I why does the regex not find I978-2-1234-5680-3 ?I how can we modify the regex automatically to match the word?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Problem statement
25/50
Problem statement, first try:
GivenI a regular expression r and ISBN(978|979)?\d{10}I a set of strings S, { I978-2-1234-5680-3 }
find a regular expression r′ such thatI L(r) ⊆ L(r′)I S ⊆ L(r′)
Solution:
r′ = .∗
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Problem statement
25/50
Problem statement, first try:
GivenI a regular expression r and ISBN(978|979)?\d{10}I a set of strings S, { I978-2-1234-5680-3 }
find a regular expression r′ such thatI L(r) ⊆ L(r′)I S ⊆ L(r′)
Solution:
r′ = .∗
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Problem statement
26/50
Problem statement:
GivenI a regular expression r, ISBN(978|979)?\d{10}I a set of strings S, { I978-2-1234-5680-3 }I a set of negative examples E−, { 0612345678 }
find a regular expression r′ such thatI L(r) ⊆ L(r′)I S ⊆ L(r′)I L(r′) ∩ E− is small
Additional goals:I precision of r′ ≥ or ≈ precision of rI recall of r′ ≥ recall of r
(w.r.t. the intended meaning of the regex)
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Problem statement
26/50
Problem statement:
GivenI a regular expression r, ISBN(978|979)?\d{10}I a set of strings S, { I978-2-1234-5680-3 }I a set of negative examples E−, { 0612345678 }
find a regular expression r′ such thatI L(r) ⊆ L(r′)I S ⊆ L(r′)I L(r′) ∩ E− is small
Additional goals:I precision of r′ ≥ or ≈ precision of rI recall of r′ ≥ recall of r
(w.r.t. the intended meaning of the regex)
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: What is new in our approach
27/50
Previous approaches
(regex
)E+ E− regex+ + −→
Our approach
regex S E− regex+ + −→
Rationale: creating a large set of positive examples is difficult
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Approximate regex matching
28/50
Step 1: match string and regex approximately [Myers et al. 1989].
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3
...
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Finding the gaps
29/50
Step 2: find the gapsI between regex leaves
I between characters of the string
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \dS B N
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3
...
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Finding the gaps
29/50
Step 2: find the gapsI between regex leavesI between characters of the string
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-
...
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
30/50
Step 3 (simple approach): adapt regex, so that it includes the missingparts
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \dS B N
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3
...
.
.
I ?
.
S B N
?
|
.
9 7 8
.
9 7 9
{10}
\d
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
30/50
Step 3 (simple approach): adapt regex, so that it includes the missingparts
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-
...
.
.
I ?
.
S B N
?
|
.
9 7 8
.
9 7 9
?
-
{10}
\d
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
30/50
Step 3 (simple approach): adapt regex, so that it includes the missingparts
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-
...
.
.
I ?
.
S B N
?
|
.
9 7 8
.
9 7 9
?
-
.
\d ?
-
\d ... \d \d
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
30/50
Step 3 (simple approach): adapt regex, so that it includes the missingparts
.
.
I S B N
?
|
.
9 7 8
.
9 7 9
.
\d \d ... \d \d
I 9 7 8 - 2 - 1 2 3 4 - 5 6 8 0 - 3-
...
.
.
I ?
.
S B N
?
|
.
9 7 8
.
9 7 9
?
-
.
\d ?
-
\d ... \d ?
-
\d
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
31/50
Step 3 (adaptive approach): adapt regex, so that it includes the missingpartsCases:
.
a b ... c d
{g1, g2, g3}
→
.
a
?
b ... c d
{g1, g ′2} {g ′′
2 , g3}
|
a ... b → only recursive call
{n,m}
r
{g1, g2, g3}
→
.
{...}
r
{...}
r
{g1, g ′2} {g ′′
2 , g3}
*
r
{g1, g2}
→
*
r
{g1, g ′2, g
′′2 }
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
31/50
Step 3 (adaptive approach): adapt regex, so that it includes the missingpartsCases:
.
a b ... c d
{g1, g2, g3}
→
.
a
?
b ... c d
{g1, g ′2} {g ′′
2 , g3}
|
a ... b → only recursive call
{n,m}
r
{g1, g2, g3}
→
.
{...}
r
{...}
r
{g1, g ′2} {g ′′
2 , g3}
*
r
{g1, g2}
→
*
r
{g1, g ′2, g
′′2 }
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
31/50
Step 3 (adaptive approach): adapt regex, so that it includes the missingpartsCases:
.
a b ... c d
{g1, g2, g3}
→
.
a
?
b ... c d
{g1, g ′2} {g ′′
2 , g3}
|
a ... b → only recursive call
{n,m}
r
{g1, g2, g3}
→
.
{...}
r
{...}
r
{g1, g ′2} {g ′′
2 , g3}
*
r
{g1, g2}
→
*
r
{g1, g ′2, g
′′2 }
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Add missing parts
31/50
Step 3 (adaptive approach): adapt regex, so that it includes the missingpartsCases:
.
a b ... c d
{g1, g2, g3}
→
.
a
?
b ... c d
{g1, g ′2} {g ′′
2 , g3}
|
a ... b → only recursive call
{n,m}
r
{g1, g2, g3}
→
.
{...}
r
{...}
r
{g1, g ′2} {g ′′
2 , g3}
*
r
{g1, g2}
→
*
r
{g1, g ′2, g
′′2 }
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Feedback function
32/50
ExampleI now we want to find URLsI we try regex r = http://[a-zA-Z\.]+I it does not find s = wikipedia.orgI repaired regex r′ = (http://)?[a-zA-Z\.]+
I problem: r′ finds all wordsI precision drops
Solution: use feedback on set of negative examples E−
I determine the parts of the regex that we can make optionalI we use the number of false positives, i.e.,
f (r′) = |E− ∩ L(r′)| ≤ α|E− ∩ L(r)|
I if f (r′) = false, add the word as disjunction instead:http://[a-zA-Z\.]+|wikipedia.org
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Feedback function
32/50
ExampleI now we want to find URLsI we try regex r = http://[a-zA-Z\.]+I it does not find s = wikipedia.orgI repaired regex r′ = (http://)?[a-zA-Z\.]+
I problem: r′ finds all wordsI precision drops
Solution: use feedback on set of negative examples E−
I determine the parts of the regex that we can make optionalI we use the number of false positives, i.e.,
f (r′) = |E− ∩ L(r′)| ≤ α|E− ∩ L(r)|
I if f (r′) = false, add the word as disjunction instead:http://[a-zA-Z\.]+|wikipedia.org
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Feedback function
33/50
Summary of the algorithm:
1. match strings in S approximately to r2. find gaps in the regex or in the strings
3. (adaptive:) find overlaps within the gaps
4. (simple:) add missing parts for every missing word one after theother(adaptive:) add missing parts and check intermediate steps withthe feedback
5. (adaptive:) add a generalization of non-repaired words(similar to [Babbar et al. 2010])
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Experiments
34/50
Input dataI datasets:
ReLIE [Li et al., 2008],Enron [Babbar et al., 2010], andYAGO infobox attributes
I in total 8 tasksI in total 52 regexes
Experimental approachI 5× 2 train/test splitI missing words S are selected randomly from E+ \ L(r), |S| ≤ 10I we draw 10 different sets S
Adding Words to Regexes: Experiments
35/50
BaselinesI dis: r|s1| · · · |sn
I star: .*CompetitorsI B&S: [Babbar et al., 2010] (reimplementation)I simpleI adaptive
baseline adaptive
measure original dis star B&S simple α = 1.0 α = 1.1 α = 1.20
F1 55 55 21 40 56 60 60 60recall 66 67 62 35 69 75 76 77precision 64 64 14 71 64 63 63 63
length 56 270 2 3929 250 76 80 81
Table: Averaged measures for the different systems. Length is # of characters ofthe regex.
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexesIntroduction
Problem statement
What is new in ourapproach
Approximate regexmatching
Finding the gaps
Add missing parts
Feedback function
Experiments
Summary
AnsweringQueries with UnixShell
Conclusion
Adding Words to Regexes: Summary
36/50
SummaryI algorithm for adding missing words to regexesI increases recall, while keeping precision stableI Source code available at
https://github.com/thomasrebele/regex-repair
Future workI decrease dependency on E−
I add a generalization step as postprocessing
Thomas Rebele Katerina Tzompanaki Fabian Suchanek
publications: ISWC 2017 (demo), PAKDD 2018 (full paper)
Now that we have all this data, how can we process it efficiently?
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Motivation
37/50
How can I find all teachers?
Albert Einstein
Relativity
Cosmology
teaches
teaches
Alfred Kleiner Statistical physics
teaches
successorOf
Person
Person
type
type
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Motivation
38/50
Observation:
database
qresult
importing querying
//
qresult
’grep’ing
/
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Motivation
38/50
Observation:
database
qresult
importing querying
//
qresult
’grep’ing
/
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Motivation
38/50
Observation:
database
qresult
importing querying
//
qresult
’grep’ing
/
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: System
39/50
3transformation
QSPARQL/OWL
QDatalog
or
&Bash script
qTSV/n-triples files
qresult
QDatalog
σπAlgebra
÷Optimize
^Translate
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: System
39/50
3transformation
QSPARQL/OWL
QDatalog
or
&Bash script
qTSV/n-triples files
qresult
QDatalog
σπAlgebra
÷Optimize
^Translate
Answering Queries with Unix Shell: Approach
40/50
Query "Which people teach a course?" in SPARQL
SELECT ?X WHERE {?X <type> <Person>.?X <teachesCourse> ?Y.
}
Translating the query to Datalog
Person(X) :-facts(X, "type", "Person").
teaches(X, Y) :-facts(X, "teaches", Y).
Teacher(X) :- Person(X),teaches(X,Y).
π1
on1=1
σ3=”Person”
σ2=”type”
facts
σ2=”teaches”
facts
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
41/50
Optimization:
π1
on1=1
σ3=”Person”
σ2=”type”
facts
σ2=”teaches”
facts
π1
on1=1
π1
σ3=”Person”
σ2=”type”
facts
π1
σ2=”teaches”
facts
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
42/50
Algebra plan
π1
on1=1
π1
σ3=”Person”
σ2=”type”
facts
π1
σ2=”teaches”
facts
Bash code
sort -u \<(join -1 1 -2 1 -o 1.1 \
<(sort -k 1 \<(awk ’($3 == "Person" && $2 == "type")
{ print $1 };’ facts))<(sort -k 1 \
<(awk ’($2 == "teaches"){ print $1 };’ facts))
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
42/50
Algebra plan
π1
on1=1
π1
σ3=”Person”
σ2=”type”
facts
π1
σ2=”teaches”
facts
Bash code
sort -u \<(join -1 1 -2 1 -o 1.1 \
<(sort -k 1 \<(awk ’($3 == "Person" && $2 == "type")
{ print $1 };’ facts))<(sort -k 1 \
<(awk ’($2 == "teaches"){ print $1 };’ facts))
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
42/50
Algebra plan
π1
on1=1
π1
σ3=”Person”
σ2=”type”
facts
π1
σ2=”teaches”
facts
Bash code
sort -u \<(join -1 1 -2 1 -o 1.1 \
<(sort -k 1 \<(awk ’($3 == "Person" && $2 == "type")
{ print $1 };’ facts))<(sort -k 1 \
<(awk ’($2 == "teaches"){ print $1 };’ facts))
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
42/50
Algebra plan
π1
on1=1
π1
σ3=”Person”
σ2=”type”
facts
π1
σ2=”teaches”
facts
Bash code
sort -u \<(join -1 1 -2 1 -o 1.1 \
<(sort -k 1 \<(awk ’($3 == "Person" && $2 == "type")
{ print $1 };’ facts))<(sort -k 1 \
<(awk ’($2 == "teaches"){ print $1 };’ facts))
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Approach
42/50
Algebra plan
π1
on1=1
π1
σ3=”Person”
σ2=”type”
facts
π1
σ2=”teaches”
facts
Bash code
sort -u \<(join -1 1 -2 1 -o 1.1 \
<(sort -k 1 \<(awk ’($3 == "Person" && $2 == "type")
{ print $1 };’ facts))<(sort -k 1 \
<(awk ’($2 == "teaches"){ print $1 };’ facts))
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
43/50
OptimizationsI algebraic, e.g., merge union / projectsI semi-naive evaluationI join reorderingI remove superfluous recursive callsI materialize repeated subplansI read files only onceI tweak Unix commands, e.g., using LANG=C and MAWK
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
44/50
How can I find all professors?
Professor(X) :- Person(X),teachesCourse(X,Y).
Professor(X) :- successorOf(X,Y),Professor(Y).
Person(X) :- Employee(X).Person(X) :- Professor(X).
Combining the first and the last rule leads to
Professor(X) :- Professor(X),teachesCourse(X,Y).
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
45/50
µx
∪
π1
on1=1
∪
employee x
teachesCourse
π1
on2=1
successorOf x
{ } {?}
{ } {?}
{ }
{ , ?, ?}
{ }
{ }
⇒ superfluous
{?}
{?, ?, }
{ }
⇒ necessary
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
45/50
µx
∪
π1
on1=1
∪
employee x
teachesCourse
π1
on2=1
successorOf x
{ } {?}
{ } {?}
{ }
{ , ?, ?}
{ }
{ }
⇒ superfluous
{?}
{?, ?, }
{ }
⇒ necessary
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
45/50
µx
∪
π1
on1=1
∪
employee x
teachesCourse
π1
on2=1
successorOf x
{ }
{?}
{ }
{?}
{ }
{ , ?, ?}
{ }
{ }
⇒ superfluous
{?}
{?, ?, }
{ }
⇒ necessary
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
45/50
µx
∪
π1
on1=1
∪
employee x
teachesCourse
π1
on2=1
successorOf x
{ }
{?}
{ }
{?}
{ }
{ , ?, ?}
{ }
{ }
⇒ superfluous
{?}
{?, ?, }
{ }
⇒ necessary
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
45/50
µx
∪
π1
on1=1
∪
employee x
teachesCourse
π1
on2=1
successorOf x
{ } {?}
{ } {?}
{ }
{ , ?, ?}
{ }
{ }
⇒ superfluous
{?}
{?, ?, }
{ }
⇒ necessary
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
45/50
µx
∪
π1
on1=1
∪
employee x
teachesCourse
π1
on2=1
successorOf x
{ } {?}
{ } {?}
{ }
{ , ?, ?}
{ }
{ }
⇒ superfluous
{?}
{?, ?, }
{ }
⇒ necessary
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
45/50
µx
∪
π1
on1=1
∪
employee x
teachesCourse
π1
on2=1
successorOf x
{c1} {?}
{c1} {?}
{c1}
{c1, ?, ?}
{c1}
{c1}
⇒ superfluous
{?}
{?, ?, c1}
{c1}
⇒ necessary
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Optimization
46/50
Assume that employee is an expensive subplan
µx
employee ...
£ y
employee µx
y ...
Answering Queries with Unix Shell: Experiments
47/50
I Dataset: LUBM university benchmarkI 14 different queriesI competitors: Datalog-based (DLV, Souffle, RDFox),
Triple store (Jena, Stardog, Virtuoso),Database management system (MonetDB, Postgres)
Number of finished queries
LUBM Bash DLV Souffle RDFox Jena Stardog Virtuoso MonetDB* Postgres*
10 14 14 13 14 5 14 6 10 10500 14 11 14 14 6 101000 14 4 14 14 10
Runtime in seconds
LUBM Bash DLV Souffle RDFox Jena Stardog Virtuoso MonetDB* Postgres*
10 1.6 9.3 (21.9) 2.2 (78.7) 13.6 (11.8) (5.2) (20.6)500 83 (310) 132 676 (1581) (600)1000 258 (346) 278 2009 (1187)
* = we folded the TBox into the query
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Experiments
48/50
Figure: Screenshot of the web interface
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShellMotivation
System
Approach
Optimization
Experiments
Experiments
Summary
Conclusion
Answering Queries with Unix Shell: Summary
49/50
SummaryI Preprocess large datasets without installing softwareI Supports OWL RL subset and Datalog as query languageI Try it online at
https://www.thomasrebele.org/projects/bashlogI Source code available at
https://github.com/thomasrebele/bashlog
Future workI numerical comparisonsI aggregations (e.g., max, count)
Thomas Rebele Thomas P. Tanon Fabian Suchanek
publication: ISWC 2018 (full paper)
Expanding theYAGO knowledge
base
Rebele
The YAGOknowledge base
Using YAGO forthe Humanities
Adding Words toRegexes
AnsweringQueries with UnixShell
Conclusion
Conclusion
50/50
This thesis showed how to extend YAGO along several axes:
I Improve completeness w.r.t. peopleI Automatically repairing of its regular expressionsI Preprocessing queries using only a Bash shell
I Interdisciplinary projectI Source code of all contributions is available onlineI Publications in ISWC 2016, ISWC 2017, ISWC 2018, PAKDD 2018
(other publication in TPDL 2016 (demo))