ADMM:

o “Anytime” property: once the iterate is initialized to be asymptotically consistent, it remains asymptotically consistent at every iteration.


•Binary Two-node Toy Model: estimating one parameter (true value known) as the other parameters (both known) are varied.

•Matrix Consensus (introduced for theoretical purposes):

•Matrix Consensus reduces to linear consensus when Wi are diagonal matrices.
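As a quick illustration (a sketch with made-up numbers; `matrix_consensus` is a hypothetical helper, assuming the combination rule is the weighted average (Σi Wi)⁻¹ Σi Wi θi), diagonal Wi reduce the combination to an elementwise weighted average, i.e. ordinary linear consensus:

```python
import numpy as np

def matrix_consensus(thetas, Ws):
    """theta_hat = (sum_i W_i)^{-1} (sum_i W_i theta_i)."""
    return np.linalg.solve(sum(Ws), sum(W @ t for W, t in zip(Ws, thetas)))

rng = np.random.default_rng(0)
thetas = [rng.normal(size=3) for _ in range(4)]                  # made-up local estimates
Ws = [np.diag(rng.uniform(1.0, 2.0, size=3)) for _ in range(4)]  # diagonal weight matrices

mc = matrix_consensus(thetas, Ws)

# With diagonal W_i this is exactly linear consensus: an elementwise
# weighted average of the local estimates.
w = np.array([np.diag(W) for W in Ws])                           # (4, 3) scalar weights
lc = (w * np.array(thetas)).sum(axis=0) / w.sum(axis=0)
```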

•Asymptotic covariance of the matrix consensus estimator:

•Joint optimization consensus is asymptotically equivalent to matrix consensus with Wi=Hi:

•The optimal weights should minimize the asymptotic mean square error (MSE):

•The optimal weights of matrix consensus are given by

•With a change of variables, reformulate the optimization:

Choosing the Optimal Weights

Distributed Parameter Estimation via Pseudo-likelihood Qiang Liu Alexander Ihler

Department of Computer Science, University of California, Irvine

Motivation

•Graphical models in exponential family form:
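Written out in generic notation (a reconstruction of the standard form; the poster's exact symbols may differ):

```latex
p(x \mid \theta) \;=\; \exp\!\Big(\sum_{\alpha} \theta_\alpha\, \phi_\alpha(x) \;-\; \Phi(\theta)\Big),
\qquad
\Phi(\theta) \;=\; \log \sum_{x} \exp\!\Big(\sum_{\alpha} \theta_\alpha\, \phi_\alpha(x)\Big).
```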

•Task: distributed algorithms for estimating parameters given i.i.d. data.

•Example: wireless sensor network as an MRF.

•Limited computational power and memory on local sensors.

•High communication cost.

Task: calculate the partition function Z

• Important: probability of evidence, parameter estimation

• #P-complete in general graphs

• Approximations and bounds are needed
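To make the cost concrete, here is a brute-force computation of Z for a tiny Ising-style model (a sketch; the model and coupling J are made up). It enumerates all 2^n states, which is exactly what becomes infeasible for large graphs:

```python
import numpy as np
from itertools import product

def partition_function(n, edges, J=0.5):
    """Brute-force Z for p(x) ∝ exp(sum over edges of J*x_i*x_j),
    with x_i in {-1, +1}.  Cost: 2^n terms."""
    return sum(np.exp(sum(J * x[i] * x[j] for i, j in edges))
               for x in product((-1, 1), repeat=n))

Z = partition_function(3, [(0, 1), (1, 2)])  # 3-node chain
```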

M-Estimators

•M-estimator:

• Asymptotic consistency and normality hold under standard regularity conditions:

•Intuition:

•Maximum likelihood estimator (MLE):

•Maximum Pseudo-likelihood (PL) estimator (MPLE):
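As a toy illustration (an assumed model, not the poster's experiment): for a two-node binary model p(x1, x2) ∝ exp(θ·x1·x2) with x in {−1, +1}², each conditional is logistic, p(x1|x2) = sigmoid(2θ·x1·x2), so the pseudo-likelihood can be written down and maximized directly:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_pl(theta, X):
    """Negative log-pseudo-likelihood for p(x1,x2) ∝ exp(theta*x1*x2):
    both conditionals equal sigmoid(2*theta*x1*x2), so the per-sample
    log-PL is 2*log(sigmoid(s)) with s = 2*theta*x1*x2."""
    s = 2.0 * theta * X[:, 0] * X[:, 1]
    return 2.0 * np.mean(np.log1p(np.exp(-s)))  # = -2*mean(log sigmoid(s))

# made-up data: 70 agreeing pairs, 30 disagreeing pairs
X = np.array([[1, 1]] * 70 + [[1, -1]] * 30)
res = minimize_scalar(neg_log_pl, args=(X,), bounds=(-3, 3), method="bounded")
theta_hat = res.x
```

For this symmetric pair model the maximizer has the closed form ½·log(n₊/n₋), and it coincides with the MLE; the point of the pseudo-likelihood is that each conditional term needs no partition function.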

A Distributed Paradigm

ADMM for Joint Optimization Consensus

Choosing the Optimal Weights (cont.)

• Optimal weights for linear consensus:

•If corr(siα, sjα) = 0 for i ≠ j, then the optimal weights are
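Concretely (a sketch with a made-up covariance V of the local estimates of one parameter), minimizing the combined variance wᵀVw subject to Σi wi = 1 gives w* proportional to V⁻¹1, i.e. to the row sums of V⁻¹:

```python
import numpy as np

# Made-up covariance of three local estimates of one scalar parameter.
V = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 3.0]])

# Minimize w^T V w  s.t.  sum(w) = 1   =>   w* ∝ V^{-1} 1,
# i.e. w* is proportional to the row sums of V^{-1}.
w = np.linalg.solve(V, np.ones(3))
w /= w.sum()
var_opt = w @ V @ w  # smallest achievable variance of the combination
```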

•Optimal weights for max consensus are

•If corr(siα, sjα) = 1 for i ≠ j, then max consensus with these weights achieves the performance of the best linear consensus.

Experiments

(Sandwich formula) Fisher information: , Hessian: .
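In standard M-estimator notation (a reconstruction with generic symbols, m the per-sample objective), the sandwich formula for the asymptotic covariance is:

```latex
\sqrt{n}\,\big(\hat\theta - \theta^{*}\big) \;\xrightarrow{d}\; \mathcal{N}\!\big(0,\; H^{-1} J H^{-1}\big),
\qquad
H = \mathbb{E}\big[\nabla_\theta^{2}\, m(\theta^{*}; x)\big],
\quad
J = \mathbb{E}\big[\nabla_\theta m\, \nabla_\theta m^{\top}\big].
```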

[Figure: non-zero elements vs. zero elements]

• Perform local estimators on sensor nodes:

• Combine the local estimates:

o Joint Optimization Consensus (joint MPLE):

o Linear Consensus:

o Max Consensus:

o Max consensus is a special linear consensus.

o Under mild conditions, all these consensus estimators are asymptotically consistent if the local estimators are asymptotically consistent.

o A quadratic program: solvable, but requires global computation.

o If corr(si, sj) = 0 for i ≠ j, then joint MPLE (Wi = Hi) achieves the optimum asymptotic MSE.

• The optimal weights are proportional to the sums of the rows of Vα-1.

// Recall that max consensus is a special case of linear consensus.

• Star Graphs (unbalanced degrees, max consensus preferred):

• 4X4 Grid (balanced degrees, joint MPLE preferred):

• Large-scale Models (100 nodes, similar trends as small models):

o Requires calculating the partition function; NP-hard.

o Important: each term of PL only involves local data and parameters.

Joint optimization consensus can be solved in a distributed fashion via the alternating direction method of multipliers (ADMM):

•Augmented Lagrangian:
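In standard consensus-ADMM form (a reconstruction with generic symbols: f_i the local objectives, θ the global consensus variable, λ_i the multipliers):

```latex
L_\rho\big(\{\theta_i\}, \theta, \{\lambda_i\}\big)
\;=\; \sum_{i}\Big[\, f_i(\theta_i)
\;+\; \lambda_i^{\top}\big(\theta_i - \theta\big)
\;+\; \tfrac{\rho}{2}\,\big\|\theta_i - \theta\big\|_2^{2}\, \Big].
```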

“Iterative” linear consensus; see a similar algorithm in Wiesel & Hero 2012.

ADMM Iteration
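A minimal runnable sketch of the consensus-ADMM iteration, using quadratic stand-ins f_i(x) = ½(x−aᵢ)ᵀHᵢ(x−aᵢ) for the local objectives (Hᵢ, aᵢ are made up; the poster's local objectives are pseudo-likelihoods, but the update structure is the same):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 4
As = [rng.normal(size=d) for _ in range(n)]          # made-up local "data"
Hs = []
for _ in range(n):                                   # random SPD Hessians
    M = rng.normal(size=(d, d))
    Hs.append(M @ M.T + np.eye(d))

rho = 1.0
xs = [np.zeros(d) for _ in range(n)]                 # local variables theta_i
us = [np.zeros(d) for _ in range(n)]                 # scaled dual variables
z = np.zeros(d)                                      # global consensus variable
for _ in range(200):
    # local updates: argmin f_i(x) + (rho/2)||x - z + u_i||^2
    xs = [np.linalg.solve(H + rho * np.eye(d), H @ a + rho * (z - u))
          for H, a, u in zip(Hs, As, us)]
    z = np.mean([x + u for x, u in zip(xs, us)], axis=0)  # consensus (averaging) step
    us = [u + x - z for u, x in zip(us, xs)]              # dual ascent step

# The joint optimum satisfies (sum_i H_i) z* = sum_i H_i a_i.
z_star = np.linalg.solve(sum(Hs), sum(H @ a for H, a in zip(Hs, As)))
```

The z-update is a plain average of the (dual-adjusted) local estimates, which is why each ADMM pass can be read as an iterated linear consensus step.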
