
M.Sc. Thesis presentation



Page 1: M.Sc. Thesis presentation

Distributed Learning by Neural Networks

Rome, 28/01/2015

Candidate: Roberto Fierimonte
Advisor: Prof. Massimo Panella
Co-advisor: Dr. Simone Scardapane

Page 2: M.Sc. Thesis presentation

OUTLINE  


• Technological Context

• Distributed Machine Learning

• Algorithms Development

• Experimental Results

• Conclusions

Page 3: M.Sc. Thesis presentation

TECHNOLOGICAL CONTEXT

Increasing trends in ubiquitous information-collecting devices, in storage capability, and in communication speed (Moore's law of ICT). As a consequence, a growing amount of data is available to be stored in databases and data warehouses, whose location is often distributed over a network of interconnected nodes.

PROBLEMS:

•  Data may not be allowed to be shared

•  Network topology changes dynamically

•  Dependence  on  a  central  hub  

•  Constraints  on  resources  


TARGET: developing efficient algorithms to perform decentralized analysis and inference on distributed data

Page 4: M.Sc. Thesis presentation

DISTRIBUTED  MACHINE  LEARNING  

A machine learning problem that requires the cooperation of several agents (or nodes) is called a distributed learning problem.

Two paradigms for distributed learning problems:

•  Model-distributed, in which only a part of the overall model's parameters is known to each node

•  Data-distributed, in which only a fraction of the overall training set is available to each node (a minimal sketch of this setting follows the figure below)

Distributed  Learning  by  Neural  Networks   4/17

[Figure: data-distributed setting: the dataset is split into shards S1, S2, S3, S4, each held by one of four interconnected nodes, and each node trains a model on its own shard. Legend: Dataset, Node, Model, Input/Output Link]
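To make the data-distributed setting concrete, the following minimal Python sketch (the function name and parameters are illustrative only; the thesis code itself is written in MATLAB) partitions a training set into per-node shards, so that node k only ever accesses its own pair (X_k, Y_k).

```python
import numpy as np

def split_among_nodes(X, Y, n_nodes, seed=0):
    """Data-distributed setting: the training set is partitioned into
    shards and node k only ever sees its own shard (X_k, Y_k)."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_nodes)
    return [(X[idx], Y[idx]) for idx in parts]
```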

Page 5: M.Sc. Thesis presentation

DISTRIBUTED  MACHINE  LEARNING  

"Given a network and a segregated dataset T such that T_k is associated with the k-th node of the network, how can each machine learn a mapping of examples to class labels in T without communicating any data point of T_k?"


PRACTICAL APPLICATIONS:

•  Big data problems
   •  Music classification
   •  Financial forecasting

•  Learning over sensor networks
   •  Environmental monitoring
   •  Infomobility

•  Privacy-preserving data mining
   •  Electronic health
   •  Fraud detection

Page 6: M.Sc. Thesis presentation

DISTRIBUTED  MACHINE  LEARNING  

LEARNING BY CONSENSUS

Distributed Average Consensus (DAC) is a distributed protocol for computing the average of a set of measurements over a network.

•  No need to exchange data between nodes

•  Robustness with respect to the network topology

•  Totally distributed strategy

•  Ease of implementation


IDEA: train a common learning machine on the local dataset at every node, then compute the global solution by averaging the local solutions through the Consensus protocol.
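As an illustration of how the DAC protocol behaves, here is a minimal Python sketch, under the assumption of a connected network with a doubly-stochastic mixing matrix W; the function and variable names are illustrative and not taken from the thesis. Each node repeatedly replaces its value with a weighted average of its neighbours' values, and all values converge to the network-wide average. In the proposed algorithms the averaged quantities are the nodes' local RVFL weight vectors rather than scalars.

```python
import numpy as np

def consensus_average(values, W, n_iter=100):
    """DAC sketch: x(t+1) = W x(t). If W is doubly stochastic and the
    network is connected, every node's value converges to the average
    of the initial measurements, without any node ever seeing the raw
    data of the others."""
    x = np.asarray(values, dtype=float)
    for _ in range(n_iter):
        x = W @ x          # each node only combines its neighbours' values
    return x

# Illustrative 3-node example with a symmetric, doubly-stochastic mixing matrix
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
print(consensus_average([1.0, 2.0, 6.0], W))   # all entries approach 3.0
```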

Page 7: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

Proposed algorithms are a combination of the Consensus protocol and Random Vector Functional-Link networks (RVFLs).

BATCH LEARNING: the entire training set is available before the training starts.

If the size of the hidden layer is sufficiently large, RVFLs are universal approximators for a wide range of basis functions h [Igelnik and Pao, 1995].


[Figure: RVFL network with inputs $x_1, x_2$, random hidden units $h_1, h_2, h_3$, output weights $\beta_1, \beta_2, \beta_3$, and output $y$]

$y = \sum_{i=1}^{m} \beta_i \, h_i(x, w_i) = \beta^T h(x, w_1, \ldots, w_m)$

REGULARIZED LEAST SQUARES: $\beta^* = (H^T H + \lambda I)^{-1} H^T Y$
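The RVFL model and its regularized least-squares training can be sketched as follows (illustrative Python/NumPy; a sigmoid basis is assumed, direct input-output links are omitted to match the formula above, and the actual thesis implementation is in MATLAB).

```python
import numpy as np

rng = np.random.default_rng(0)

def rvfl_hidden(X, W, b):
    """Random hidden layer h(x) = sigma(Wx + b); W and b are drawn once
    at random and never trained (a sigmoid basis is assumed here)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def rvfl_train(X, Y, m=50, lam=0.1):
    """Fit the output weights by regularized least squares:
    beta* = (H^T H + lambda*I)^(-1) H^T Y."""
    W = rng.standard_normal((X.shape[1], m))   # random input-to-hidden weights
    b = rng.standard_normal(m)                 # random biases
    H = rvfl_hidden(X, W, b)
    beta = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ Y)
    return W, b, beta

def rvfl_predict(X, W, b, beta):
    return rvfl_hidden(X, W, b) @ beta
```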

Page 8: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

CONSENSUS-BASED BATCH DISTRIBUTED RVFL:

1. INITIALIZATION: the RVFLs' hidden parameters $w_1, \ldots, w_m$ are generated randomly and then shared among the nodes.

2. LOCAL SOLUTION COMPUTATION: each node computes its local estimate $\beta_i^*$ from its own local dataset (first formula below).

3. GLOBAL SOLUTION COMPUTATION: once every node has computed its local estimate, the global solution is obtained through the Consensus protocol as the average of the local estimates (second formula below).


$\beta_i^* = (H_i^T H_i + \lambda I)^{-1} H_i^T Y_i$

$\beta^* = \frac{1}{N} \sum_{i=1}^{N} \beta_i^*$
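Putting the three steps together, a minimal sketch of the consensus-based batch distributed RVFL might look like this; for brevity, the DAC protocol is replaced here by a plain arithmetic mean of the local estimates, and all names are illustrative.

```python
import numpy as np

def distributed_batch_rvfl(local_data, W, b, lam=0.1):
    """Consensus-based batch distributed RVFL (sketch).
    local_data: list of (X_i, Y_i) pairs, one per node.
    W, b: random hidden-layer parameters shared by all nodes (step 1)."""
    m = W.shape[1]
    local_betas = []
    for X_i, Y_i in local_data:
        H_i = 1.0 / (1.0 + np.exp(-(X_i @ W + b)))              # local hidden matrix
        beta_i = np.linalg.solve(H_i.T @ H_i + lam * np.eye(m),
                                 H_i.T @ Y_i)                   # step 2: local estimate
        local_betas.append(beta_i)
    return np.mean(local_betas, axis=0)                         # step 3: average of local estimates
```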

Page 9: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

ONLINE LEARNING: it is not always possible to have the entire dataset available before the training starts:

•  Data are obtained in real time

•  Datasets are too large to be processed in batch


RECURSIVE LEAST SQUARES (RLS):

$K_{k+1} = K_k + H_{k+1}^T H_{k+1}$
$\beta^{k+1} = \beta^k + K_{k+1}^{-1} H_{k+1}^T (Y_{k+1} - H_{k+1} \beta^k)$

LEAST MEAN SQUARES (LMS):

$\beta^{k+1} = \beta^k - \alpha_k \left[ H_{k+1}^T H_{k+1} \beta^k - H_{k+1}^T Y_{k+1} + \lambda \beta^k \right]$, where the bracketed term is denoted $L_{k+1}$.

ORIGINAL PROPOSAL: extending the results obtained for distributed batch learning to distributed online learning problems.
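The two online updates can be sketched as follows (illustrative Python; function names are not from the thesis). Initializing K to λI and β to zero makes the RLS recursion consistent with the regularized batch solution shown earlier; this initialization is an assumption of the sketch rather than something stated here.

```python
import numpy as np

def rls_step(beta, K, H_new, Y_new):
    """RLS update for a new data chunk (H_new, Y_new):
    K_{k+1} = K_k + H^T H,
    beta_{k+1} = beta_k + K_{k+1}^{-1} H^T (Y - H beta_k)."""
    K = K + H_new.T @ H_new
    beta = beta + np.linalg.solve(K, H_new.T @ (Y_new - H_new @ beta))
    return beta, K

def lms_step(beta, H_new, Y_new, alpha=0.01, lam=0.1):
    """LMS (gradient) update on the regularized squared loss:
    beta_{k+1} = beta_k - alpha*(H^T H beta_k - H^T Y + lambda*beta_k)."""
    grad = H_new.T @ (H_new @ beta) - H_new.T @ Y_new + lam * beta
    return beta - alpha * grad
```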

Page 10: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

CONSENSUS-BASED ONLINE DISTRIBUTED RVFL (RLS version):

1. INITIALIZATION: the RVFLs' hidden parameters $w_1, \ldots, w_m$ are generated randomly and then shared among the nodes.

2. LOCAL SOLUTION UPDATE: each node updates its local estimate with its new data according to RLS (first two formulas below).

3. GLOBAL SOLUTION UPDATE: once every node has computed its local update, the global solution is computed through the Consensus protocol as the average of the local updates (last formula below).

                   


$K_{k+1}^i = K_k^i + (H_{k+1}^i)^T H_{k+1}^i$
$\beta_i^{k+1} = \beta_C^k + (K_{k+1}^i)^{-1} (H_{k+1}^i)^T (Y_{k+1}^i - H_{k+1}^i \beta_C^k)$

$\beta_C^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \beta_i^{k+1}$
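A sketch of one round of this RLS-based distributed update (illustrative Python; the Consensus averaging step is again replaced by a plain mean):

```python
import numpy as np

def distributed_rls_round(beta_c, K_list, chunks):
    """One round of the consensus-based online RVFL, RLS version (sketch).
    beta_c: current global estimate; K_list[i]: node i's accumulated matrix;
    chunks: list of new local chunks (H_i, Y_i), one per node."""
    local_betas = []
    for i, (H_i, Y_i) in enumerate(chunks):
        K_list[i] = K_list[i] + H_i.T @ H_i                     # local K update
        beta_i = beta_c + np.linalg.solve(K_list[i],
                                          H_i.T @ (Y_i - H_i @ beta_c))
        local_betas.append(beta_i)                              # local solution update
    return np.mean(local_betas, axis=0), K_list                 # global solution update
```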

Page 11: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

CONSENSUS-BASED ONLINE DISTRIBUTED RVFL (LMS version):

1. INITIALIZATION: the RVFLs' hidden parameters $w_1, \ldots, w_m$ are generated randomly and then shared among the nodes.

2. LOCAL SOLUTION UPDATE: each node updates its local estimate with its new data according to LMS (first formula below).

3. GLOBAL SOLUTION UPDATE: once every node has computed its local update, the global solution is computed through the Consensus protocol as the average of the local updates (last formula below).

                   


$\beta_i^{k+1} = \beta_C^k - \alpha_k^i \left[ (H_{k+1}^i)^T H_{k+1}^i \beta_C^k - (H_{k+1}^i)^T Y_{k+1}^i + \lambda \beta_C^k \right]$, where the bracketed term is denoted $L_{k+1}$.

$\beta_C^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \beta_i^{k+1}$
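The corresponding LMS-based round, sketched under the same assumptions (illustrative names, plain mean in place of the Consensus step):

```python
import numpy as np

def distributed_lms_round(beta_c, chunks, alpha=0.01, lam=0.1):
    """One round of the consensus-based online RVFL, LMS version (sketch).
    Every node takes a gradient step from the global estimate beta_c on its
    new chunk (H_i, Y_i); the updates are then averaged."""
    local_betas = []
    for H_i, Y_i in chunks:
        grad = H_i.T @ (H_i @ beta_c) - H_i.T @ Y_i + lam * beta_c
        local_betas.append(beta_c - alpha * grad)               # local solution update
    return np.mean(local_betas, axis=0)                         # global solution update
```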

Page 12: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

ALGORITHMS IMPLEMENTATION:

The proposed algorithms are implemented in MATLAB, using the Parallel Computing Toolbox (PCT), in order to test their efficacy.

•  Use of the spmd command to run the code on a pool of machines or on a simulated distributed architecture (up to 12 nodes) on a single machine.

•  A serial version of the code is implemented to perform large-scale simulations.


Page 13: M.Sc. Thesis presentation

EXPERIMENTAL RESULTS

EXPERIMENTAL SETUP

Simulations performed on 4 freely available datasets (see the table below).

Types of simulations performed:

•  Performance and scalability

•  Influence of the network topology

•  Comparison in training time

•  Performance of the original online algorithms

All results are obtained on a machine equipped with an Intel i5 @ 3.00 GHz processor and 16 GB of RAM, running MATLAB R2013a.

Dataset    | N° of Features | N° of Instances | Predicted Value | Task Type
Banknote   | 4              | 1372            | Banknote class  | Binary classification
g50c       | 50             | 550             | Gaussian label  | Binary classification
CCPP       | 4              | 9568            | Energy output   | Regression
Garageband | 44             | 1856            | Music genre     | 9-class classification

Page 14: M.Sc. Thesis presentation

EXPERIMENTAL RESULTS

PERFORMANCE AND SCALABILITY


[Figure: test error [%] (NRMSE for the CCPP regression dataset) vs. number of network nodes, for Centralized-RVFL, Consensus-RVFL, and Local-RVFL on the four datasets]

Page 15: M.Sc. Thesis presentation

EXPERIMENTAL RESULTS

INFLUENCE OF NETWORK TOPOLOGY

COMPARISON IN TRAINING TIME


[Figures: number of iterations vs. number of network nodes for different topologies (Cyclic Lattice, Fully Connected, Linear K=1, Linear K=4, Random p=0.25, Random p=0.5); training time [s] vs. number of network nodes for Centralized-RVFL, Consensus-RVFL, and Local-RVFL]

Page 16: M.Sc. Thesis presentation

EXPERIMENTAL RESULTS

ORIGINAL ONLINE ALGORITHMS PERFORMANCE


[Figure: test error vs. number of iterations for Centralized-RVFL, Consensus-RVFL (RLS), Consensus-RVFL (LMS), and Local-RVFL on the four datasets]

Page 17: M.Sc. Thesis presentation

CONCLUSIONS

The proposed algorithms solve the distributed online learning problem in a totally decentralized way, requiring only local communications and no exchange of data points.

Experimental results show that the algorithms are able to achieve performance comparable with a centralized solution, except for the LMS version of the online algorithm.

The application of the proposed algorithms to a music classification problem is investigated in [Scardapane et al., 2015b].

All the code developed for the thesis is freely available at: https://github.com/roberto-fierimonte/tesi-rvfl-online

FUTURE DEVELOPMENTS:

•  Application to model-distributed problems

•  Application to problems with constraints on energy and time

•  Application to unsupervised learning problems


Page 18: M.Sc. Thesis presentation

Distributed  Learning  by  Neural  Networks  

THANK YOU FOR YOUR ATTENTION