o “Anytime” property: once the iterate is initialized to be asymptotically consistent, it remains asymptotically consistent at every iteration.
•ADMM:
• Binary Two-node Toy Model: estimating the pairwise parameter (true value fixed) as the two singleton parameters (both known) are varied.
• Matrix Consensus (introduced for theoretical purposes):
• Matrix Consensus reduces to linear consensus when the Wi are diagonal matrices.
•Asymptotic covariance of :
•Joint optimization consensus is asymptotically equivalent to matrix consensus with Wi=Hi:
•The optimal weights should minimize the asymptotic mean square error (MSE):
• The optimal weights of matrix consensus are given by
• Let ; reformulate the optimization:
Choosing the Optimal Weights
Distributed Parameter Estimation via Pseudo-likelihood Qiang Liu Alexander Ihler
Department of Computer Science, University of California, Irvine
Motivation
• Graphical models in exponential family form: p(x; θ) = exp{Σ_α θ_α φ_α(x) − Φ(θ)}, with log-partition function Φ(θ).
• Task: distributed algorithms for estimating the parameters θ given i.i.d. data.
• Example: wireless sensor network as an MRF.
• Limited computational power and memory on local sensors.
•High communication cost.
Task: calculate the partition function Z
• Important for: probability of evidence, parameter estimation
• #P-complete in general graphs
• Approximations and bounds are needed
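To make the cost concrete, here is a hypothetical brute-force computation of Z for a small binary pairwise model (names and parameter values are illustrative, not from the poster); exact enumeration is exponential in the number of variables, which is why Z is intractable for general graphs.

```python
# Illustrative sketch: brute-force partition function of a small
# binary (+/-1) pairwise MRF. Exponential in the number of variables.
import itertools
import math

def log_partition(theta_node, theta_edge, edges):
    """log Z by exhaustive enumeration over all 2^n states."""
    n = len(theta_node)
    total = 0.0
    for x in itertools.product([-1, 1], repeat=n):
        energy = sum(theta_node[i] * x[i] for i in range(n))
        energy += sum(theta_edge[e] * x[i] * x[j]
                      for e, (i, j) in enumerate(edges))
        total += math.exp(energy)
    return math.log(total)

# Two-node model with one edge: Z is a sum over just 4 states.
logZ = log_partition([0.0, 0.0], [0.5], [(0, 1)])
```

For this two-node example Z = 2e^θ + 2e^{−θ}, which the enumeration reproduces.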
M-Estimators
• M-estimator: θ̂ = arg max_θ Σ_n f(x^n; θ) for a chosen objective f.
• Asymptotic consistency and normality: hold if the expected objective is uniquely maximized at the true parameter, under standard regularity conditions.
•Intuition:
• Maximum likelihood estimator (MLE): f(x; θ) = log p(x; θ).
• Maximum Pseudo-likelihood (PL) estimator (MPLE): f(x; θ) = Σ_i log p(x_i | x_N(i); θ).
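A small numerical sketch contrasting the MLE and MPLE objectives, assuming a made-up two-node binary model p(x) ∝ exp(θ·x1·x2) with a single coupling parameter (data and helper names are illustrative); the MLE needs the partition function, while the MPLE uses only local conditionals. For this tiny model the two objectives happen to share the same maximizer.

```python
# Hedged sketch: MLE vs. MPLE for p(x1, x2) ∝ exp(theta * x1 * x2),
# x1, x2 in {-1, +1}. Illustrative only, not the poster's experiments.
import itertools
import math

def neg_log_lik(theta, data):
    # MLE objective requires the global partition function Z(theta).
    logZ = math.log(sum(math.exp(theta * x1 * x2)
                        for x1, x2 in itertools.product([-1, 1], repeat=2)))
    return -sum(theta * x1 * x2 - logZ for x1, x2 in data)

def neg_log_pseudo_lik(theta, data):
    # MPLE objective: local conditionals p(x_i | x_neighbor), no global Z.
    total = 0.0
    for x1, x2 in data:
        for xi, xn in ((x1, x2), (x2, x1)):
            # p(xi | xn) = exp(theta*xi*xn) / (e^{theta*xn} + e^{-theta*xn})
            total -= theta * xi * xn - math.log(
                math.exp(theta * xn) + math.exp(-theta * xn))
    return total

def minimize_1d(f, lo=-3.0, hi=3.0, tol=1e-8):
    # Golden-section search; both objectives are convex in theta.
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

# Toy dataset with empirical mean(x1*x2) = 0.5.
data = [(1, 1)] * 6 + [(1, -1)] * 2 + [(-1, 1)] * 2 + [(-1, -1)] * 6
mle = minimize_1d(lambda t: neg_log_lik(t, data))
mple = minimize_1d(lambda t: neg_log_pseudo_lik(t, data))
```

Both fits solve tanh(θ) = mean(x1·x2), i.e. θ = atanh(0.5) here.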
A Distributed Paradigm
ADMM for Joint Optimization Consensus
Choosing the Optimal Weights (cont.)
• Optimal weights for linear consensus:
• If corr(s_α^i, s_α^j) = 0 for i ≠ j, then the optimal weights are
• Optimal weights for max consensus:
• If corr(s_α^i, s_α^j) = 1 for i ≠ j, then max consensus (with the best choice of node) achieves the performance of the best linear consensus.
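The optimal linear-consensus weights can be sketched numerically; the snippet below assumes V denotes the asymptotic covariance of the local estimates of one parameter and uses the standard minimum-variance solution w* = V⁻¹1 / (1ᵀV⁻¹1) (the matrix V here is made up for illustration).

```python
# Hedged sketch: minimum-asymptotic-variance weights for combining
# local estimates, subject to the weights summing to 1.
import numpy as np

def optimal_linear_weights(V):
    """w* = V^{-1} 1 / (1^T V^{-1} 1), i.e. normalized row sums of V^{-1}."""
    ones = np.ones(V.shape[0])
    u = np.linalg.solve(V, ones)   # V^{-1} 1 (row sums of V^{-1})
    return u / u.sum()             # normalize so weights sum to 1

# Uncorrelated local estimates: weights proportional to inverse variances.
V = np.diag([1.0, 2.0, 4.0])
w = optimal_linear_weights(V)
```

In the uncorrelated case this reduces to inverse-variance weighting, matching the corr = 0 bullet above.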
Experiments
(Sandwich formula) Fisher information: J = cov(∇_θ f); Hessian: H = E[∇²_θ f]; asymptotic covariance: H⁻¹ J H⁻¹.
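The sandwich formula can be checked numerically on a simple case; the Gaussian-mean setup below is illustrative (not from the poster), where H⁻¹JH⁻¹/n collapses to the familiar σ²/n.

```python
# Hedged numerical check of the sandwich formula for the Gaussian mean
# (f = log density of N(theta, sigma^2), estimated at theta = 0).
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=100_000)

# Score of N(theta, sigma^2) at theta = 0 is x / sigma^2.
scores = x / sigma**2
J = np.var(scores)            # Fisher information term, cov of the score
H = -1.0 / sigma**2           # Hessian of the log density (constant here)
sandwich = (1.0 / H) * J * (1.0 / H) / len(x)   # H^{-1} J H^{-1} / n
```

For a correctly specified model J = −H, so the sandwich reduces to the inverse Fisher information scaled by 1/n, i.e. σ²/n.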
(Figure: non-zero vs. zero elements.)
• Run local estimators on sensor nodes:
• Combine the local estimates:
o Joint Optimization Consensus (joint MPLE):
o Linear Consensus:
o Max Consensus:
o Max consensus is a special case of linear consensus.
o Under mild conditions, all these consensus estimators are asymptotically consistent if the local estimators are asymptotically consistent.
o A quadratic program: solvable, but requires global computation.
o If corr(s^i, s^j) = 0 for i ≠ j, then joint MPLE (Wi = Hi) achieves the optimal asymptotic MSE.
(Sum of rows of V_α⁻¹)
(Recall that max consensus is a special case of linear consensus.)
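The two simple combination rules above can be sketched in a few lines (values are illustrative, not from the poster's experiments): linear consensus averages the local estimates with weights summing to 1, and max consensus selects a single local estimate, i.e. linear consensus with an indicator weight vector.

```python
# Hedged sketch of the two basic combination rules for local estimates
# of one shared parameter. The local estimates below are made up.
def linear_consensus(local_estimates, weights):
    """Weighted average of local estimates; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-12
    return sum(w, ) if False else sum(
        w * t for w, t in zip(weights, local_estimates))

def max_consensus(local_estimates, chosen_index):
    """Pick one local estimate: linear consensus with indicator weights."""
    return local_estimates[chosen_index]

theta_locals = [0.9, 1.1, 1.3]          # hypothetical local estimates
theta_lin = linear_consensus(theta_locals, [0.5, 0.3, 0.2])
theta_max = max_consensus(theta_locals, 1)
```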
• Star Graphs (unbalanced degrees, max consensus preferred):
• 4×4 Grid (balanced degrees, joint MPLE preferred):
• Large-scale Models (100 nodes, similar trends as small models):
o Requires calculating the partition function; NP-hard in general.
o Important: each term of the pseudo-likelihood involves only local data and parameters.
Joint optimization consensus can be solved in a distributed fashion via the alternating direction method of multipliers (ADMM):
•Augmented Lagrangian:
“Iterative” linear consensus; see a similar algorithm in Wiesel & Hero (2012).
ADMM Iteration
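A minimal consensus-ADMM sketch, assuming quadratic local objectives f_i(t) = (a_i/2)(t − b_i)² as stand-ins for the local pseudo-likelihood terms (curvatures a_i, local estimates b_i, all values illustrative); the z-update is a plain averaging step, which is why the scheme reads as "iterative" linear consensus.

```python
# Hedged sketch of consensus ADMM for min sum_i f_i(theta_i)
# s.t. theta_i = z, with illustrative quadratic f_i.
def consensus_admm(a, b, rho=1.0, iters=200):
    n = len(a)
    theta = list(b)            # local variables, start at local estimates
    u = [0.0] * n              # scaled dual variables
    z = sum(theta) / n         # global consensus variable
    for _ in range(iters):
        # Local updates (closed form for quadratic objectives).
        theta = [(a[i] * b[i] + rho * (z - u[i])) / (a[i] + rho)
                 for i in range(n)]
        # Consensus update: average of theta_i + u_i ("linear consensus").
        z = sum(theta[i] + u[i] for i in range(n)) / n
        # Dual updates.
        u = [u[i] + theta[i] - z for i in range(n)]
    return z

# Converges to the curvature-weighted average sum(a*b) / sum(a).
z = consensus_admm([1.0, 2.0, 4.0], [0.5, 1.0, 2.0])
```

The fixed point weights each local estimate by its curvature a_i, echoing the Wi = Hi weighting of joint optimization consensus.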