Federated Optimization in Heterogeneous Networks


Tian Li (CMU), Anit Kumar Sahu (BCAI), Manzil Zaheer (Google Research), Maziar Sanjabi (Facebook AI), Ameet Talwalkar (CMU & Determined AI), Virginia Smith (CMU)


tianli@cmu.edu

Federated Learning
Privacy-preserving training in heterogeneous, (potentially) massive networks

• Networks of remote devices (e.g., cell phones): next-word prediction
• Networks of isolated organizations (e.g., hospitals): healthcare

Example Applications

• Voice recognition on mobile phones
• Adapting to pedestrian behavior on autonomous vehicles
• Personalized healthcare on wearable devices
• Predictive maintenance for industrial machines

Workflow & Challenges

[Workflow diagram: the server broadcasts the global model w^t to selected devices; each device performs local training and returns an updated model w′; the server aggregates these into w^{t+1}]

• Systems heterogeneity: variable hardware, network connectivity, power, etc.
• Statistical heterogeneity: highly non-identically distributed data
• Expensive communication: potentially massive network; wireless communication
• Privacy concerns: privacy leakage through parameters


Objective (a standard setup):

\min_w f(w) = \sum_{k=1}^{N} p_k F_k(w)

where F_k is the loss on device k and p_k is the weight of device k (e.g., proportional to its local sample count).

A Popular Method: Federated Averaging (FedAvg) [1]

[1] McMahan, H. Brendan, et al. "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS, 2017.

Works well in many settings! (especially non-convex)


At each communication round:

• The server randomly selects a subset of devices & sends the current global model w^t
• Each selected device k updates w^t for E epochs of SGD to optimize F_k & sends the new local model back
• The server aggregates the local models to form a new global model w^{t+1}
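For concreteness, a minimal sketch of one such round in NumPy. This is not the authors' implementation; local_sgd, the device tuples, and the use of the standard sample-count-weighted average in the aggregation are assumptions for illustration.

```python
import numpy as np

def fedavg_round(global_w, devices, num_selected, epochs, local_sgd, rng):
    """One FedAvg communication round (simplified sketch).

    devices: list of (local_data, num_samples) pairs, one per device.
    local_sgd(w, local_data, epochs): runs `epochs` epochs of SGD on the
        device's loss F_k starting from w and returns the new weights.
    """
    # Server randomly selects a subset of devices and broadcasts w^t.
    selected = rng.choice(len(devices), size=num_selected, replace=False)

    local_models, sample_counts = [], []
    for k in selected:
        local_data, n_k = devices[k]
        # Each selected device runs E local epochs of SGD on its own F_k.
        local_models.append(local_sgd(np.copy(global_w), local_data, epochs))
        sample_counts.append(n_k)

    # Server aggregates the returned models into w^{t+1}
    # (here: the standard sample-count-weighted average).
    p = np.asarray(sample_counts, dtype=float)
    p /= p.sum()
    return np.sum([p_k * w_k for p_k, w_k in zip(p, local_models)], axis=0)

# Usage (hypothetical):
# rng = np.random.default_rng(0)
# w_next = fedavg_round(w, devices, num_selected=10, epochs=20,
#                       local_sgd=my_local_sgd, rng=rng)
```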

What can go wrong? What are the issues?

• Statistical heterogeneity: highly non-identically distributed data, yet FedAvg simply averages the local updates
• Systems heterogeneity: stragglers; deployed systems simply drop slow devices [2]
• FedAvg is a heuristic method and is not guaranteed to converge

[Plot: FedAvg training loss with 0% vs. 90% stragglers]

[2] Bonawitz, Keith, et al. "Towards Federated Learning at Scale: System Design." MLSys, 2019.

Outline

• Motivation
• FedProx Method
• Theoretical Analysis
• Experiments
• Future Work

FedProx: High Level

• Systems heterogeneity: instead of simply dropping stragglers, allow for variable amounts of work & safely incorporate it
• Statistical heterogeneity: instead of averaging simple SGD updates, encourage more well-behaved updates
• Theory: convergence rate as a function of statistical heterogeneity; accounts for stragglers

Contributions: (1) convergence guarantees and (2) more robust empirical performance for federated learning in heterogeneous networks.

FedProx: A Framework for Federated Optimization

Objective: \min_w f(w) = \sum_{k=1}^{N} p_k F_k(w)

At each communication round, each selected device k optimizes its local objective: \min_{w_k} F_k(w_k)

Idea 1: Allow for variable amounts of work to be performed on local devices to handle stragglers.

Idea 2: Modified local subproblem with a proximal term:

\min_{w_k} F_k(w_k) + \frac{\mu}{2}\,\|w_k - w^t\|^2

The proximal term in the modified local subproblem (1) safely incorporates noisy updates and (2) explicitly limits the impact of local updates.

• A generalization of FedAvg
• Can use any local solver
• More robust and stable empirical performance
• Strong theoretical guarantees (with some assumptions)
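For concreteness, a minimal sketch (not the authors' code) of the local update implied by Idea 2, using plain gradient steps as the local solver; grad_Fk, the learning rate, and the step count are placeholders.

```python
import numpy as np

def fedprox_local_update(w_global, grad_Fk, mu, lr, num_steps):
    """Approximately solve  min_w  F_k(w) + (mu / 2) * ||w - w_global||^2.

    grad_Fk(w): returns the gradient of the local loss F_k at w.
    mu: proximal coefficient; mu = 0 recovers a plain local SGD update
        (i.e., the FedAvg-style local step).
    """
    w = np.copy(w_global)
    for _ in range(num_steps):
        # Gradient of the proximal objective: grad F_k(w) + mu * (w - w_global).
        g = grad_Fk(w) + mu * (w - w_global)
        w -= lr * g
    return w
```

The mu * (w - w_global) term pulls the local iterate back toward the current global model, which is how the proximal term limits the impact of highly variable local updates.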


Convergence Analysis

Challenges: device subsampling, non-IID data, local updates.

High level: FedProx converges despite these challenges. The analysis introduces a notion of B-dissimilarity* to characterize statistical heterogeneity.

Assumption 1 (dissimilarity is bounded):

\mathbb{E}_k\left[\|\nabla F_k(w)\|^2\right] \le \|\nabla f(w)\|^2 B^2

IID data: B = 1; non-IID data: B > 1.

*Used in other contexts, e.g., gradient diversity [3], to quantify the benefits of scaling distributed SGD.

[3] Yin, Dong, et al. "Gradient Diversity: a Key Ingredient for Scalable Distributed Learning." AISTATS, 2018.
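To make Assumption 1 concrete, a small sketch (an illustration, not from the paper) that estimates B(w) from per-device gradients, assuming the same device weights p_k as in the global objective.

```python
import numpy as np

def estimate_B(local_grads, weights):
    """Estimate the dissimilarity B(w) at a point w.

    local_grads: per-device gradients grad F_k(w), flattened to 1-D vectors.
    weights: device weights p_k (non-negative, summing to 1), so that
        grad f(w) = sum_k p_k * grad F_k(w).

    Returns sqrt( E_k[||grad F_k(w)||^2] / ||grad f(w)||^2 ), which equals 1
    when all local gradients coincide and grows with statistical heterogeneity.
    """
    p = np.asarray(weights, dtype=float)
    global_grad = np.sum([p_k * g for p_k, g in zip(p, local_grads)], axis=0)
    expected_sq_norm = float(np.sum([p_k * np.dot(g, g)
                                     for p_k, g in zip(p, local_grads)]))
    return np.sqrt(expected_sq_norm / np.dot(global_grad, global_grad))
```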

Convergence Analysis (continued)

The proximal term makes the method more amenable to theoretical analysis!

Assumption 2: The modified local subproblem is convex & smooth.

Assumption 3: Each local subproblem is solved to some accuracy (a flexible communication/computation tradeoff; the partial work is accounted for in the rates).

The rate is general:
• Covers both convex and non-convex loss functions
• Independent of the local solver; agnostic of the sampling method
• Gives the same asymptotic convergence guarantee as SGD, and can converge much faster than distributed SGD in practice

[Theorem] FedProx obtains ε-suboptimality after T rounds, with

T = O\!\left(\frac{f(w^0) - f^*}{\rho\,\varepsilon}\right)

where ρ is a constant that depends on (B, μ, …).
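The "solved to some accuracy" in Assumption 3 is formalized in the paper via γ-inexact local solutions. The following is a paraphrase of that definition (taken from the FedProx paper rather than these slides), using the same γ_k^t notation as the complete theorem in Backup 4:

```latex
% Let h_k(w; w^t) = F_k(w) + (\mu/2)\|w - w^t\|^2 be the modified local subproblem.
\[
w^{*} \text{ is a } \gamma_k^t\text{-inexact solution of } \min_w h_k(w;\, w^t)
\quad \text{if} \quad
\|\nabla h_k(w^{*};\, w^t)\| \le \gamma_k^t\,\|\nabla F_k(w^t)\|,
\]
% using that \nabla h_k(w^t; w^t) = \nabla F_k(w^t), since the proximal term's
% gradient vanishes at w = w^t. A smaller \gamma_k^t means a more accurate
% (more expensive) local solve.
```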


Experiments: Zero Systems Heterogeneity + Fixed Statistical Heterogeneity

Benchmark: LEAF (leaf.cmu.edu)

[Plots: training loss, FedAvg vs. FedProx (μ > 0)]

FedProx with μ > 0 leads to more stable convergence under statistical heterogeneity. Similar benefits for all datasets.

Experiments: High Systems Heterogeneity + Fixed Statistical Heterogeneity

[Plots: training loss, FedAvg vs. FedProx (μ = 0) vs. FedProx (μ > 0)]

Allowing for variable amounts of work to be performed helps convergence in the presence of systems heterogeneity.

FedProx with μ > 0 leads to more stable convergence under statistical & systems heterogeneity. Similar benefits for all datasets.

In terms of test accuracy: on average, a 22% absolute accuracy improvement compared with FedAvg in highly heterogeneous settings.

Experiments: Impact of Statistical Heterogeneity

Increasing heterogeneity leads to worse convergence. Setting μ > 0 can help to combat this. In addition, B-dissimilarity captures statistical heterogeneity (see paper).


Future Work

• Privacy & security: better privacy metrics & mechanisms
• Personalization: automatic fine-tuning
• Productionizing: cold-start problems
• Hyper-parameter tuning: set μ automatically
• Diagnostics: determining heterogeneity a priori; leveraging the heterogeneity for improved performance

Whitepaper: Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Processing Magazine, 2020 (also on arXiv).

Thanks!

Paper & code: cs.cmu.edu/~litian/
Benchmark: leaf.cmu.edu

Poster:#3,thisroom


On-device Intelligence Workshop, Wednesday, this room

Backup 1: Relations with previous works

• Proximal term
  • Elastic SGD: employs a more complex moving average to update parameters; limited to SGD as a local solver; has only been analyzed for quadratic problems
  • DANE and inexact DANE: add an additional gradient correction term and assume full device participation (unrealistic); discouraging empirical performance
  • FedDANE: A Federated Newton-Type Method, arXiv
  • Other works: different purposes, such as speeding up SGD on a single machine; different analysis assumptions (IID data, solving subproblems exactly)
• B-dissimilarity term
  • Used for other purposes, such as quantifying the benefit of scaling SGD for IID data

Backup 2

• Data statistics
• Systems heterogeneity simulation
  • Fix a global number of epochs E, and force some devices to perform fewer updates than E epochs. In particular, for the varying-heterogeneity settings, assign x (chosen uniformly at random from [1, E]) epochs to 0%, 50%, and 90% of the selected devices (see the sketch below).
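A minimal sketch of this simulation under the description above; the function and parameter names are ours, not from the paper's code.

```python
import numpy as np

def assign_local_epochs(num_selected, straggler_fraction, E, rng):
    """Assign per-device epochs to simulate systems heterogeneity (sketch).

    A `straggler_fraction` of the selected devices (e.g., 0.0, 0.5, or 0.9)
    performs x epochs, with x drawn uniformly at random from [1, E];
    the remaining devices perform the full E epochs.
    """
    epochs = np.full(num_selected, E, dtype=int)
    num_stragglers = int(straggler_fraction * num_selected)
    stragglers = rng.choice(num_selected, size=num_stragglers, replace=False)
    epochs[stragglers] = rng.integers(1, E + 1, size=num_stragglers)
    return epochs

# Usage (hypothetical):
# assign_local_epochs(10, 0.9, E=20, rng=np.random.default_rng(0))
```

FedAvg would discard these partial solutions (or wait for them), whereas FedProx incorporates them via the proximal local subproblem.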

Backup 3

• The original FedAvg algorithm

Backup 4: Complete theorem

Assume the functions F_k are non-convex, L-Lipschitz smooth, and that there exists L_- > 0 such that ∇²F_k ⪰ −L_- I, with μ̄ := μ − L_- > 0. Suppose that w^t is not a stationary solution and the local functions F_k are B-dissimilar, i.e., B(w^t) ≤ B. If μ, K, and γ_k^t are chosen such that

\rho^t = \left( \frac{1}{\mu} - \frac{\gamma^t B}{\mu} - \frac{B(1+\gamma^t)\sqrt{2}}{\bar{\mu}\sqrt{K}} - \frac{L B(1+\gamma^t)}{\bar{\mu}\,\mu} - \frac{L(1+\gamma^t)^2 B^2}{2\bar{\mu}^2} - \frac{L B^2 (1+\gamma^t)^2}{\bar{\mu}^2 K}\left(2\sqrt{2K} + 2\right) \right) > 0,

then at iteration t of FedProx, we have the following expected decrease in the global objective:

\mathbb{E}_{S_t}\left[ f(w^{t+1}) \right] \le f(w^t) - \rho^t \|\nabla f(w^t)\|^2,

where S_t is the set of K devices chosen at iteration t and \gamma^t = \max_{k \in S_t} \gamma_k^t.
