Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Reasoning with Deterministic and Probabilistic graphical models
Class 2: Inference in Constraint Networks
Rina Dechter
class1 Padova
Road Map
n Graphical models
n Constraint networks Model n Inference
n Search
n Probabilistic Networks
class1 Padova
A B red green red yellow green red green yellow yellow green yellow red
Example: map coloring Variables - countries (A,B,C,etc.) Values - colors (red, green, blue) Constraints:
C
A
B
D E
F
G
A Constraint Networks
A
B
E
G
D F
C
Constraint graph
class1 Padova
etc. ,ED D, AB,A ≠≠≠
Example: map coloring Variables - countries (A,B,C,etc.) Values - colors (e.g., red, green, yellow) Constraints:
Constraint Satisfaction Tasks
Are the constraints consistent? Find a solution, find all solutions Count all solutions Find a good (optimal) solution
class1 Padova
etc. ,ED D, AB,A ≠≠≠
Constraint Network
n A constraint network is: R=(X,D,C)
n X variables
n D domain
n C constraints
n R expresses allowed tuples over scopes n A solution is an assignment to all variables that satisfies all constraints (join of all relations). n Tasks: consistency?, one or all solutions, counting, optimization
class1 Padova
},...,{ 1 nXXX =},...{},,...,{ 11 kin vvDDDD ==
),(},...{ 1
iii
t
RSCCCC
=
=
Crossword puzzle
n Variables: x1, …, x13 n Domains: letters n Constraints: words from
{HOSES, LASER, SHEET, SNAIL, STEER, ALSO, EARN, HIKE, IRON, SAME, EAT, LET, RUN, SUN, TEN, YES, BE, IT, NO, US}
class1 Padova
The Queen problem
The network has four variables, all with domains Di = {1, 2, 3, 4}. (a) The labeled chess board. (b) The constraints between variables.
class1 Padova
Constraint’s representations
n Relation: allowed tuples n Algebraic expression: n Propositional formula: n Semantics: by a relation
class1 Padova
YXYX ≠≤+ ,102
cba ¬→∨ )(
312231ZYX
Partial solutions
Not all consistent instantiations are part of a solution: (a) A consistent instantiation that is not part of a solution. (b) The placement of the queens corresponding to the solution (2, 4, 1, 3). (c) The placement of the queens corresponding to the solution (3, 1, 4, 2).
class1 Padova
Constraint Graphs (primal)
Queen problem
class1 Padova
Constraint Graphs: Primal, Dual and Hypergraphs
A (primal) constraint graph: a node per variable arcs connect constrained variables. A dual constraint graph: a node per constraint’s scope, an arc connect nodes sharing variables =hypergraph
class1 Padova
Graph Concepts Reviews: Primal, Hyper and Dual Graphs
n A hypergraph n Dual graphs n A primal graph
class1 Padova
Propositional Satisfiability
ϕ = {(¬C), (A v B v C), (¬A v B v E), (¬B v C v D)}.
class1 Padova
Constraint graphs of 3 instances of the Radio frequency assignment problem in CELAR’s benchmark
class1 Padova
Operations With Relations
n Intersection
n Union
n Difference
n Selection
n Projection
n Join
n Composition
class1 Padova
n Join : n Logical AND:
Local Functions
Combination
class1 Padova
=
gf
gf ∧
= ∧
Global View of the Problem
C1 C2 Global View
What about counting?
Number of true tuples Sum over all the tuples
true is 1 false is 0 logical AND?
TASK
class1 Padova
=
Road Map
n Graphical models
n Constraint networks Model n Inference
n Variable elimination: n Tree-clustering
n Constraint propagation
n Search
n Probabilistic Networks
class1 Padova
Bucket Elimination Adaptive Consistency (Dechter & Pearl, 1987)
Bucket E: E ≠ D, E ≠ C Bucket D: D ≠ A Bucket C: C ≠ B Bucket B: B ≠ A Bucket A:
A ≠ C
contradiction
=
D = C
B = A
= ≠
class1 Padova
widthinduced -*
*
w ))exp(w O(n :Complexity
The Idea of Elimination
3 value assignment
D
B
C RDBC
eliminating E
class1 Padova
project and join E variableEliminate
⇔
=∏ ECDBC EBEDDBC RRRR
A
E D C B
E
A
D C B
|| RDBE , || RE
|| RDB
|| RDCB || RACB || RAB
RA
RCBE
Bucket Elimination Adaptive Consistency (Dechter & Pearl, 1987)
class1 Padova
d ordering along widthinduced -(d) ,
*
*
w(d)))exp(w O(n :Complexity
≠
≠≠
≠
≠
≠
E
D
A
C
B
}2,1{
}2,1{}2,1{
}2,1{ }3,2,1{
:)(AB :)(BC :)(AD :)(
BE C,E D,E :)(
ABucketBBucketCBucketDBucketEBucket
≠≠≠
≠≠≠
:)(EB :)(
EC , BC :)(ED :)(
BA D,A :)(
EBucketBBucketCBucketDBucketABucket
≠≠≠
≠≠≠
Adaptive Consistency, Bucket-elimination
Initialize: partition constraints into For i=n down to 1 along d // process in reverse order
for all relations do // join all relations and “project-out”
If is not empty, add it to where k is the largest variable index in Else problem is unsatisfiable Return the set of all relations (old and new) in the buckets
class1 Padova
nbucketbucket ,...,1
im bucketRR ∈,...,1
) ()( jX jnew RRi
∏−←
iX
newR ,, ikbucketk <newR
Properties of bucket-elimination (adaptive consistency)
n Adaptive consistency generates a constraint network that is backtrack-free (can be solved without deadends). n The time and space complexity of adaptive consistency along ordering d is exponential in w* . n Therefore, problems having bounded induced width are tractable (solved in polynomial time).
n trees ( w*=1), n series-parallel networks ( w*=2 ), n and in general k-trees ( w*=k).
class1 Padova
Solving Trees (Mackworth and Freuder, 1985)
Adaptive consistency is linear for trees and equivalent to enforcing directional arc-consistency (recording only unary constraints)
class1 Padova
Tree Solving is Easy
1,2,3 1,2,3 1,2,3 1,2,3
1,2,3 1,2,3
1,2,3
<
<
Y
X
Z
T S R U
class1 Padova
Tree Solving is Easy
1,2,3 1,2,3 1,2,3 1,2,3
1,2,3 1,2,3
1,2,3
<
<
Y
X
Z
T S R U
class1 Padova
Tree Solving is Easy
1,2,3 1,2,3 1,2,3 1,2,3
1,2 1,2
1,2,3
<
<
Y
X
Z
T S R U
class1 Padova
Tree Solving is Easy
1,2,3 1,2,3 1,2,3 1,2,3
1,2 1,2
1,2,3
<
<
Y
X
Z
T S R U
class1 Padova
Tree Solving is Easy
1,2,3 1,2,3 1,2,3 1,2,3
1,2 1,2
1
<
<
Y
X
Z
T S R U
class1 Padova
Tree Solving is Easy
1,2,3 1,2,3 1,2,3 1,2,3
2 2
1
<
<
Y
X
Z
T S R U
class1 Padova
Tree Solving is Easy
3 3 3 3
2 2
1
<
<
Y
X
Z
T S R U
class1 Padova
Tree Solving is Easy
3 3 3 3
2 2
1
<
<
Y
X
Z
T S R U
Constraint propagation Solves trees in linear time
class1 Padova
Road Map
n Graphical models
n Constraint networks Model n Inference
n Variable elimination: n Tree-clustering
n Constraint propagation
n Search
n Probabilistic Networks
class1 Padova
Tree Decomposition
Each function in a cluster Satisfy running intersection property
G
E
F
C D
B
A A B C
R(a), R(b,a), R(c,a,b)
B C D F R(d,b), R(f,c,d)
B E F R(e,b,f)
E F G R(g,e,f)
2
4
1
3
EF
BC
BF
R(a), R(b,a), R(c,a,b) R(d,b), R(f,c,d) R(e,b,f), R(g,e,f)
class1 Padova
Cluster Tree Elimination
A B C R(a), R(b,a), R(c,a,b)
B C D F R(d,b), R(f,c,d),h(1,2)(b,c)
B E F R(e,b,f), h(2,3)(b,f)
E F G R(g,e,f)
2
4
1
3
EF
BC
BF sep(2,3)={B,F} elim(2,3)={C,D}
Computes the minimal domains
class1 Padova
),,(),()(),()2,1( bacRabRaRcbh a ⊗⊗=⇓
),(),,(),(),( )2,1(,)3,2( cbhdcfRbdRfbh dc ⊗⊗=⇓
CTE: Cluster Tree Elimination
ABC
2
4
1
3 BEF
EFG
EF
BF
BC
BCDF
Time: O ( exp(w*+1 )) Space: O ( exp(sep))
G
E
F
C D
B
A
Computes the minimal domains
class1 Padova
),,(),()(),()2,1( bacRabRaRcbh a ⊗⊗=⇓
),(),,(),(),( )2,3(,)1,2( fbhdcfRbdRcbh fd ⊗⊗=⇓
),(),,(),(),( )2,1(,)3,2( cbhdcfRbdRfbh dc ⊗⊗=⇓
),(),,(),( )3,4()2,3( fehfbeRfbh e ⊗=⇓
),(),,(),( )3,2()4,3( fbhfbeRfeh b ⊗=⇓
),,(),()3,4( fegRfeh g=⇓
Tree decompositions
A B C R(a), R(b,a), R(c,a,b)
B C D F R(d,b), R(f,c,d)
B E F R(e,b,f)
E F G R(g,e,f)
EF
BF
BC
G
E
F
C D
B A
Belief network Tree decomposition
class1 Padova
property) onintersecti (running subtree connected a forms set the variableeach For 2.
and thatsuch vertex oneexactly is there function each For 1.
:satisfying and x over vertelabelings are and and a tree is where,,, triple
a is for A
χ(v)}V|X{vXXχ(v))scope(Cψ(v)C
CCCψ(v)Xχ(v)Vv
ψχ(V,E)TTX,D,CRiondecomposit tree
ii
ii
i
∈∈∈
⊆∈
∈
⊆⊆∈
=><
>=<
ψχ
Induced-width and Tree-width
EDCB
DCBA
DBE
ADB
CBE
Tree-width =3
Tree-width =2
Induced-width Of ordering
Tree-width of a graph = smallest cluster in a cluster-tree Path-width of a graph = smallest cluster in a cluster-path
class1 Padova
E
D
A
C
B
Road Map
n Graphical models
n Constraint networks Model n Inference
n Variable elimination: n Tree-clustering
n Constraint propagation
n Search
n Probabilistic Networks
class1 Padova
Sudoku – Approximation: Constraint Propagation
Each row, column and major block must be alldifferent “Well posed” if it has unique solution: 27 constraints
2 3 4 6 2
〉 Variables: empty slots 〉 Domains = {1,2,3,4,5,6,7,8,9} 〉 Constraints:
〉 27 all-different
〉 Constraint 〉 Propagation 〉 Inference
class1 Padova
Approximating Inference: Local Constraint Propagation
n Problem: bucket-elimination/tree-clustering algorithms are intractable when induced width is large
n Approximation: bound the size of recorded dependencies, i.e. perform local constraint propagation (local inference)
class1 Padova
From Global to Local Consistency
class1 Padova
Arc-consistency
A binary constraint R(X,Y) is arc-consistent w.r.t. X is every value In x’s domain has a match in y’s domain.
Only domains are reduced:
class1 Padova
YXRR YX <== constraint },3,2,1{ },3,2,1{
∏←X YXYX DRR
3 2, 1,
3 2, 1, 3 2, 1,
1 ≤ X, Y, Z, T ≤ 3 X < Y Y = Z T < Z X ≤ T
X Y
T Z
3 2, 1, <
=
<
∧
Arc-consistency
class1 Padova
1 ≤ X, Y, Z, T ≤ 3 X < Y Y = Z T < Z X ≤ T
X Y
T Z
<
=
<
∧
1 3
2 3
Arc-consistency
class1 Padova
∏←X YXYX DRR
From Arc-consistency to relational arc-consistency
n Sound n Incomplete n Always converges
(polynomial)
A B
C D
3 2 1 A
3 2 1 B
3 2 1 D
3 2 1 C
<
<
< =
class1 Padova
Relational Distributed Arc-Consistency
Primal Dual
AB
AD
A
A B
C D
3 2 1 A
3 2 1 B
3 2 1 D
3 2 1 C
DC
BC
D
C
B
class1 Padova
AC-3
Complexity: Best case O(ek), since each arc may be processed in O(2k)
class1 Padova
)( 3ekO
Arc-consistency Algorithms
n AC-1: brute-force, distributed
n AC-3, queue-based
n AC-4, context-based, optimal n AC-5,6,7,…. Good in special cases
n Important: applied at every node of search n (n number of variables, e=#constraints, k=domain size) n Mackworth and Freuder (1977,1983), Mohr and Anderson, (1985)…
class1 Padova
)( 3ekO)( 3nekO
)( 2ekO
Path-consistency
n A pair (x, y) is path-consistent relative to Z, if every consistent assignment (x, y) has a consistent extension to z.
class1 Padova
Example: path-consistency
class1 Padova
Path-consistency Algorithms
n Apply Revise-3 (O(k^3)) until no change
n Path-consistency (3-consistency) adds binary constraints. n PC-1:
n PC-2:
n PC-4 optimal:
class1 Padova
)( kjkikijijij RDRRR ⊗⊗∩← π
)( 55knO)( 53knO
)( 33knO
Local i-consistency
i-consistency: Any consistent assignment to any i-1 variables is consistent with at least one value of any i-th variable
class1 Padova
Gausian and Boolean Propagation, Resolution
n Linear inequalities n Boolean constraint propagation, unit resolution
class1 Padova
2,2 ≤≤ yx
)( CA ¬∨
⇒≥≤++ 13,15 zzyx
⇒¬¬∨∨ )(),( BCBA
Directional Resolution ó Adaptive Consistency
class1 Padova
))exp(( :space and timeDR))(exp(||
*
*
wnOwObucketi =
Directional i-consistency
A
E
C D
B
D
C B
E
D
C B
E
D
C B
E
Adaptive d-arc d-path
class1 Padova
DCBR
≠
≠ ≠
≠ ≠≠≠
:AB A:BBC :C
AD C,D :DBE C,E D,E :E
≠
≠
≠≠
≠≠≠
DBDC RR ,CBR
DRCRDR
class1 Padova
Greedy Algorithms for Induced-Width
n Min-width ordering
n Min-induced-width ordering
n Max-cardinality ordering
n Min-fill ordering
n Chordal graphs
n Hypergraph partitionings (Project: present papers on induced-width, run algorithms for induced-width on new benchmarks…)
Min-width ordering
Proposition: algorithm min-width finds a min-width ordering of a graph Complexity:? O(e)
class1 Padova
Greedy orderings heuristics
min-induced-width (miw) input: a graph G = (V;E), V = {v1; :::; vn} output: A miw ordering of the nodes d = (v1; :::; vn). 1. for j = n to 1 by -1 do 2. r ß a node in V with smallest degree. 3. put r in position j. 4. connect r's neighbors: E ß E union {(vi; vj)| (vi; r) in E; (vj ; r) 2 in E}, 5. remove r from the resulting graph: V ßV - {r}. min-fill (min-fill) input: a graph G = (V;E), V = {v1; :::; vn} output: An ordering of the nodes d = (v1; :::; vn). 1. for j = n to 1 by -1 do 2. r ßa node in V with smallest fill edges for his parents. 3. put r in position j. 4. connect r's neighbors: E ßE union {(vi; vj)| (vi; r) 2 E; (vj ; r) in E}, 5. remove r from the resulting graph: V ßV –{r}.
Theorem: A graph is a tree iff it has both width and induced-width of 1.
class1 Padova
class1 Padova
Example
We see again that G in the Figure (a) is not chordal since the parents of A are not connected in the max-cardinality ordering in Figure (d). If we connect B and C, the resulting induced graph is chordal.
class1 Padova Cordal Graphs; Max-Cardinality Ordering
n A graph is cordal if every cycle of length at least 4 has a chord
n Finding w* over chordal graph is easy using the max-cardinality ordering
n If G* is an induced graph it is chordal n K-trees are special chordal graphs. n Finding the max-clique in chordal graphs is easy (just enumerate all cliques in a max-cardinality ordering
class1 Padova
Max-cardinality ordering
Graph Concepts: Hypergraphs and Dual Graphs
n A hypergraph is H = (V,S) , V= {v_1,..,v_n} and a set of subsets Hyperegdes: S={S_1, ..., S_l }. n Dual graphs of a hypergaph: The nodes are the hyperedges and a pair of nodes is connected if they share vertices in V. The arc is labeled by the shared vertices. n A primal (Markov, moral) graph of a hypergraph H = (V,S) has V as its nodes, and any two nodes are connected by an arc if they appear in the same hyperedge. n Factor Graphs. .
class1 Padova
Which greedy algorithm is best?
n MinFill, prefers a node who add the least number of fill-in arcs.
n Empirically, fill-in is the best among the greedy algorithms (MW,MIW,MF,MC)
n Complexity of greedy orderings?
n MW is O(?), MIW: O(?) MF (?) MC is O(mn)
class1 Padova
Road Map
n Graphical models
n Constraint networks Model n Inference
n Variable elimination: n Tree-clustering
n Constraint propagation
n Search
n Probabilistic Networks
class1 Padova