Efficient Probabilistic Logic Reasoning with Graph Neural Networks
Le Song
Associate Professor, CSE, College of Computing; Associate Director, Center for Machine Learning
Georgia Institute of Technology
Learning with small data
• Combine learning and logic reasoning
  • learning: generalize with noisy input
  • logic/symbolic reasoning: predict without data
[Figure: deep learning vs. symbolic reasoning plotted on data-requirement and noise-tolerance axes; the desired approach combines low data requirement with high noise tolerance]
Example of logic inference: HorseShape(c) ∧ StripePattern(c) ⇒ Zebra(c)
[Figure: horse shape + stripe pattern ⇒ Zebra]
Graph neural networks (GNN / structure2vec)
• Obtain embedding via iterative updates:
  1. Initialize μ_i^(0) = σ(W1 x_i), ∀i
  2. Iterate T times: μ_i^(t) ← σ(W1 x_i + W2 Σ_{j∈N(i)} μ_j^(t-1))
• σ is parameterized as a neural network
[Figure: example graph with nodes x1 … x6; the updated embeddings μ_1^(1) … μ_6^(1) feed into supervised learning, generative models, or reinforcement learning]
[Dai, et al. ICML'16]
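The iterative update on this slide can be sketched in a few lines of NumPy. The function name `structure2vec`, the parameter shapes, and the choice of `tanh` for σ are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def structure2vec(adj, x, W1, W2, T=4):
    """Iterate mu_i^(t) = sigma(W1 x_i + W2 * sum_{j in N(i)} mu_j^(t-1)).

    adj: (n, n) 0/1 adjacency matrix; x: (n, d_in) node features;
    W1: (d, d_in), W2: (d, d). sigma is a stand-in nonlinearity here;
    the paper composes small neural nets instead.
    """
    sigma = np.tanh
    mu = sigma(x @ W1.T)                 # mu^(0) = sigma(W1 x_i)
    for _ in range(T):
        neighbor_sum = adj @ mu          # sum_{j in N(i)} mu_j^(t-1)
        mu = sigma(x @ W1.T + neighbor_sum @ W2.T)
    return mu
```

Each of the T rounds aggregates information one hop further, so after T iterations μ_i summarizes the T-hop neighborhood of node i.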
Logical expressiveness of GNN
• Logical classifiers
  • classify each node according to whether it satisfies a formula
  • the formula is expressed in first-order predicate logic (FO)
graded modal logic ⊂ FOC2 ⊂ FO
[Diagram: WL-test = GNN (∞ depth) = GNN (finite depth) ≤ R-GNN with readout (global information)]
[Barceló, et al. ICLR'20]
FOC2
• 2 variables
• with counting quantifiers ∃≥N
α(x) ← Red(x) ∧ ∃≥3 y [¬Edge(x, y) ∧ Blue(y)]
graded modal logic ⊂ FOC2
• quantifiers are guarded by Edge(x, y):
  • allow ∃≥N y [Edge(x, y) ∧ Predicate(y)],
  • but do not allow ∃≥N y Predicate(y)
α(x) ← Red(x) ∧ ∃≥3 y [Edge(x, y) ∧ Blue(y)]
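A guarded classifier like the one above can be checked directly on a small graph. The helper `alpha` and the toy color/edge encoding below are hypothetical, chosen only to make the guarded counting quantifier concrete:

```python
def alpha(x, color, edges):
    """alpha(x): Red(x) AND exists>=3 y [Edge(x, y) AND Blue(y)].

    The counting quantifier is guarded: y ranges only over neighbors of x.
    """
    if color[x] != "red":
        return False
    blue_neighbors = sum(1 for u, v in edges if u == x and color[v] == "blue")
    return blue_neighbors >= 3

# toy graph: node 0 is red with three blue neighbors, node 4 is red with one
color = {0: "red", 1: "blue", 2: "blue", 3: "blue", 4: "red"}
edges = [(0, 1), (0, 2), (0, 3), (4, 1)]
```

Because the quantifier only looks at neighbors, this is exactly the kind of per-node test a message-passing GNN can learn to imitate.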
More compact representation and lower error
• Harvard clean energy dataset: 2.3 million organic molecules; predict power conversion efficiency (0-12%)
[Figure: MAE vs. model dimension (0.1M to 1000M) for embedded mean field, embedded BP, Weisfeiler-Lehman level 6, and hashed WL level 6; structure2vec reduces model size by 10,000x!]
[Dai, et al. ICML'16]
Fraudulent account detection
• Alipay: online payment platform
• new accounts in a month: millions of nodes and edges
• fake accounts can increase system-level risk
• account-device network
[Figure: account-device graph labeling good and bad accounts; precision-recall detection comparison of rule-based, structure2vec, and node2vec methods]
[Liu, et al. CCS'17, CIKM'17, AAAI'19]
Knowledge base reasoning
• Many domains need knowledge bases
  • they contain incorrect, incomplete, and duplicated records
  • they need link prediction, attribute classification, and record de-duplication
• Many existing rules / logic formulas are available
• Graph neural networks do not explicitly use this existing knowledge
UW-CSE dataset details
• 22 relations
  • Teach, Publish, …
• Task goal
  • predict who is whose advisor
  • zero observed facts for query predicates
• 94 crowd-sourced FOL formulas:
  advisedBy(s, p) ⇒ professor(p)
  advisedBy(s, p) ⇒ ¬yearsInProgram(s, Year_1)
  professor(x) ⇒ ¬student(y)
  publication(p, x) ∨ publication(p, y) ∨ student(x) ∨ ¬student(y) ⇒ professor(y)
  student(x) ∨ ¬advisedBy(x, y) ⇒ tempAdvisedBy(x, y)
  …
[Figure: students S1, S3, professor P2, Paper1 (publish), Course1]
Bipartite graph representation for knowledge base
• Entity set C = {A, B, C, D, …}
• Predicate (attribute | relation): r: C × C × ⋯ ↦ {0, 1}
  • e.g. Smoke(x), Friend(x, x′), Like(x, x′)
[Figure: bipartite knowledge graph G_K linking entities A, B, C, D to ground predicates; observed facts F(A,B)=1, F(A,C)=1, F(A,D)=1, S(A)=1, S(B)=1 are filled in, while F(B,C), F(B,D), F(C,D), S(C), S(D) remain latent binary variables]
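A minimal sketch of this grounding step, assuming the Smoke/Friend example on the slide: enumerate the ground atoms, separate observed from latent ones, and connect each entity to every atom it appears in. All names and the `(predicate, args)` encoding are illustrative:

```python
from itertools import combinations

entities = ["A", "B", "C", "D"]

# observed facts from the slide; an atom is a (predicate, argument-tuple) pair
observed = {("F", ("A", "B")), ("F", ("A", "C")), ("F", ("A", "D")),
            ("S", ("A",)), ("S", ("B",))}

# all ground atoms for Smoke/1 and (symmetric) Friend/2
atoms = [("S", (e,)) for e in entities]
atoms += [("F", pair) for pair in combinations(entities, 2)]

latent = [a for a in atoms if a not in observed]

# bipartite edges: entity e -- ground atom a whenever e appears in a's arguments
bipartite_edges = [(e, a) for a in atoms for e in a[1]]
```

With four entities this yields 10 ground atoms, of which five (F(B,C), F(B,D), F(C,D), S(C), S(D)) are latent, matching the figure.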
Markov logic networks
• Use logic formulas f: C × C × ⋯ × C ↦ {0, 1} as potential functions
  • e.g. formula f(x, x′): Friend(x, x′) ∧ Smoke(x) ⇒ Smoke(x′)
[Figure: grounded network connecting observed facts F(A,B)=1, F(A,C)=1, F(A,D)=1, S(A)=1, S(B)=1 and latent variables F(B,C), F(B,D), F(C,D), S(C), S(D) through formula groundings f(A,B), f(A,C), f(A,D), f(B,C), f(B,D), f(C,D)]

P(O, H) = (1/Z) exp( Σ_f w_f Σ_{a_f} φ_f(a_f) )

w_f: formula weight; as a clause, φ_f(x, x′): ¬Friend(x, x′) ∨ ¬Smoke(x) ∨ Smoke(x′)
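A sketch of the MLN score for this single Friend-Smoke formula, under an illustrative encoding of a world as a dict of truth values; `phi` is the 0/1 clause value and `w_f` a hand-picked weight:

```python
import itertools

def phi(world, x, y):
    """phi_f for f(x, x'): Friend(x,x') AND Smoke(x) => Smoke(x'),
    i.e. the clause (not Friend(x,x')) OR (not Smoke(x)) OR Smoke(x')."""
    return (not world["F", (x, y)]) or (not world["S", (x,)]) or world["S", (y,)]

def log_potential(world, entities, w_f=1.5):
    """w_f * number of satisfied groundings (single-formula MLN log-score)."""
    return sum(w_f * phi(world, x, y)
               for x, y in itertools.permutations(entities, 2))
```

The log-potential grows with every satisfied grounding, so worlds that violate fewer groundings of the formula get exponentially more probability mass under P(O, H).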
Challenges in inference
• Large grounded network: O(M²) in the number M of entities!
• Enumerating configurations over the O(M²) binary variables gives O(2^{M²}) possibilities.
[Figure: the grounded network again, highlighting latent variables such as F(B,C) and S(C) among the formula groundings f(A,B) … f(C,D)]

Z = Σ_{all configurations of unobserved variables} exp( Σ_f w_f Σ_{a_f} φ_f(a_f) )
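The cost of exact inference is visible in a brute-force partition function: the loop below visits all 2^n latent assignments, which is exactly the O(2^{M²}) blow-up the slide describes. The `chain_potential` is a toy log-potential for illustration, not an MLN:

```python
import itertools, math

def brute_force_Z(n_latent, log_potential):
    """Exact partition function by enumerating all 2^n latent assignments.

    O(2^n) time: feasible for toy problems only, hence the need for
    approximate (e.g. variational) inference.
    """
    return sum(math.exp(log_potential(bits))
               for bits in itertools.product([0, 1], repeat=n_latent))

def chain_potential(bits, w=0.5):
    """Toy log-potential rewarding agreement of adjacent variables."""
    return w * sum(a == b for a, b in zip(bits, bits[1:]))
```

Doubling the number of latent variables squares the enumeration cost, so even a few hundred ground predicates already put exact Z far out of reach.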
Variational EM
[Diagram: the EM loop alternates between the likelihood side (MLN with formula potentials over the knowledge graph) and the posterior side (GNN giving the predicate posterior Q_θ(H))]

P(O, H) = (1/Z) exp( Σ_f w_f Σ_{a_f} φ_f(a_f) ),   Q_θ(H)
[Zhang et al. ICLR'20]
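A minimal Monte Carlo sketch of the variational objective, assuming a generic sampler/log-density interface for Q; this is not ExpressGNN's implementation, just the ELBO that the E-step (updating Q) and M-step (updating formula weights) alternate on:

```python
import math, random

def elbo_estimate(sample_q, log_q, log_joint, n_samples=500, seed=0):
    """Monte Carlo ELBO: E_{H ~ Q}[log P(O, H) - log Q(H)].

    sample_q(rng) draws a latent configuration H, log_q evaluates its
    density under Q, and log_joint evaluates log P(O, H) up to log Z.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        h = sample_q(rng)
        total += log_joint(h) - log_q(h)
    return total / n_samples
```

When Q matches the true posterior the ELBO equals the log-evidence; in the sanity check below both Q and P are uniform over one bit, so every sample contributes exactly zero.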
ExpressGNN: use GNN in variational inference
• GNN on the original KB (G_K) to get embeddings μ_A, μ_B, μ_C, μ_D for entities:
  μ_A, μ_B, μ_C, μ_D = GNN(G_K; θ)
  with the graph neural network update μ_i^(t) ← σ(W1 x_i + W2 Σ_{j∈N(i)} μ_j^(t-1))
[Figure: knowledge graph with observed facts F(A,B)=1, F(A,C)=1, F(A,D)=1, S(A)=1, S(B)=1 and entities A, B, C, D carrying embeddings μ_A, μ_B, μ_C, μ_D]
[Zhang et al. ICLR'20]
ExpressGNN: use GNN in variational inference
• GNN on the original KB (G_K) to get embeddings μ_A, μ_B, μ_C, μ_D for entities:
  μ_A, μ_B, μ_C, μ_D = GNN(G_K; θ)
• Parameterize the posterior of each latent ground predicate with these embeddings:
  Q(F(B, C) = 1 | θ) := 1 / (1 + exp(μ_B^⊤ Θ_F μ_C + μ_B^⊤ w_F))
  Q(S(C) = 1 | θ) := 1 / (1 + exp(w_S^⊤ μ_C))
[Figure: latent ground predicates F(B,C), F(B,D), F(C,D), S(C), S(D) scored from the entity embeddings produced by the graph neural network update μ_i^(t) ← σ(W1 x_i + W2 Σ_{j∈N(i)} μ_j^(t-1))]
[Zhang et al. ICLR'20]
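A sketch of this embedding-based posterior, assuming a single bilinear parameter Θ_r per predicate (a simplification of the slide's form, with names chosen for illustration):

```python
import numpy as np

def posterior_prob(mu_x, mu_y, Theta_r):
    """Q(r(x, y) = 1) as a logistic function of a bilinear score of the
    two entity embeddings; Theta_r is a per-predicate parameter matrix."""
    score = mu_x @ Theta_r @ mu_y
    return 1.0 / (1.0 + np.exp(-score))
```

Because every latent ground predicate is scored from a handful of shared parameters plus the entity embeddings, the number of variational parameters grows with the number of entities, not with the O(M²) number of ground predicates.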
Is GNN embedding expressive enough?
• Not always: two entities x and x′ can have isomorphic neighborhoods in the knowledge graph G_K, so the GNN assigns them identical embeddings, even though F(x, y) holds while F(x′, y) does not; "F(x, y) if and only if F(x′, y)" is not true.
• The posterior Q(F(B, C) = 1 | θ) := 1 / (1 + exp(μ_B^⊤ Θ_F μ_C + μ_B^⊤ w_F)) then assigns both facts the same probability; ExpressGNN therefore augments each GNN embedding with a tunable entity-specific embedding.
[Figure: two knowledge graphs, G_K and G_K′, in which x and x′ are indistinguishable to the GNN but differ on F(·, y)]
[Zhang et al. ICLR'20]
Variational EM (recap)
[Diagram: EM alternates between the MLN likelihood P(O, H) = (1/Z) exp(Σ_f w_f Σ_{a_f} φ_f(a_f)) with its formula potentials, and the GNN-parameterized predicate posterior Q_θ(H) over the knowledge graph]
[Zhang et al. ICLR'20]
Cora dataset details (CS paper citations)
• 10 relation types
  • Author, Title, Venue, HasWordTitle, …
• Task goal (entity resolution)
  • de-duplicate citations, authors, and venues
  • zero observed facts for query predicates
• 46 crowd-sourced FOL formulas:
  Author(bc1, a1) ∨ Author(bc2, a2) ∨ SameAuthor(a1, a2) ⇒ SameBib(bc1, bc2)
  HasWordAuthor(a1, w) ∨ HasWordAuthor(a2, w) ⇒ SameAuthor(a1, a2)
  Title(bc1, t1) ∨ Title(bc2, t2) ∨ SameBib(bc1, bc2) ⇒ SameTitle(t1, t2)
  SameVenue(v1, v2) ∨ SameVenue(v2, v3) ⇒ SameVenue(v1, v3)
  Title(bc1, t1) ∨ Title(bc2, t2) ∨ HasWordTitle(t1, +w) ∨ HasWordTitle(t2, +w) ⇒ SameBib(bc1, bc2)
  …
[Figure: authors A1 and A2 connected to Paper1 through title word W1 (HasWordTitle); query: SameAuthor?]
Inference accuracy and time
• Area under the precision-recall curve (AUC-PR)
• Inference wall-clock time
[Figure: ExpressGNN compared against baselines on AUC-PR and inference time]
Large Freebase (FB15K-237) dataset
• # entities: 15K, # ground predicates: 50M, # ground formulas: 679B
• Increase training data for target predicates
Learning with small data
• Combining learning and logic reasoning
  • learning: generalize with noisy input
  • logic/symbolic reasoning: predict without data
[Figure: deep learning vs. symbolic reasoning on the data-requirement and noise-tolerance axes; the desired approach combines low data requirement with high noise tolerance]
Symbolic reasoning: HorseShape(c) ∧ StripePattern(c) ⇒ Zebra(c)
Learning-based approaches: convolutional NN (images), recurrent NN (text, e.g. "Life is great")