
M.Sc. Thesis presentation



Page 1: M.Sc. Thesis presentation

Distributed Learning by Neural Networks

Rome, 28/01/2015

Candidate: Roberto Fierimonte
Advisor: Prof. Massimo Panella
Co-advisor: Dr. Simone Scardapane

Page 2: M.Sc. Thesis presentation

OUTLINE  


• Technological Context

• Distributed Machine Learning

• Algorithms Development

• Experimental Results

• Conclusions

Page 3: M.Sc. Thesis presentation

TECHNOLOGICAL CONTEXT

Increasing trends in ubiquitous information-collecting devices, in storage capability, and in communication speed (Moore's law of ICT). As a consequence, a growing amount of data is available to be stored in databases and data warehouses, whose location is often distributed over a network of interconnected nodes.

PROBLEMS:

•  Data may not be allowed to be shared

•  Network topology changes dynamically

•  Dependence  on  a  central  hub  

•  Constraints  on  resources  


TARGET: developing efficient algorithms to perform decentralized analysis and inference on distributed data

Page 4: M.Sc. Thesis presentation

DISTRIBUTED  MACHINE  LEARNING  

A machine learning problem that requires the cooperation of several agents (or nodes) is called a distributed learning problem.

Two paradigms for distributed learning problems:

•  Model-distributed, in which only a part of the overall model's parameters is known to each node

•  Data-distributed, in which only a fraction of the overall training set is available to each node (a minimal sketch of this setting follows the figure below)

Distributed  Learning  by  Neural  Networks   4/17

[Figure: data-distributed setting: the dataset is split into shards S1, S2, S3, S4, each held by one of four interconnected nodes, and each node trains a model on its own shard. Legend: Dataset, Node, Model, Input/Output Link]
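To make the data-distributed setting concrete, the following minimal Python sketch (the function name and parameters are illustrative only; the thesis code itself is written in MATLAB) partitions a training set into per-node shards, so that node k only ever accesses its own pair (X_k, Y_k).

```python
import numpy as np

def split_among_nodes(X, Y, n_nodes, seed=0):
    """Data-distributed setting: the training set is partitioned into
    shards and node k only ever sees its own shard (X_k, Y_k)."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_nodes)
    return [(X[idx], Y[idx]) for idx in parts]
```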

Page 5: M.Sc. Thesis presentation

DISTRIBUTED  MACHINE  LEARNING  

"Given a network and a segregated dataset T such that T_k is associated with the k-th node of the network, how can each machine learn a mapping of examples to class labels in T without communicating any data point of T_k?"


PRACTICAL APPLICATIONS:

•  Big data problems
   •  Music classification
   •  Financial forecasting

•  Learning over sensor networks
   •  Environmental monitoring
   •  Infomobility

•  Privacy-preserving data mining
   •  Electronic health
   •  Fraud detection

Page 6: M.Sc. Thesis presentation

DISTRIBUTED  MACHINE  LEARNING  

LEARNING BY CONSENSUS

Distributed Average Consensus (DAC) is a distributed protocol for computing the average of a set of measurements over a network.

•  No need to exchange data between nodes

•  Robustness with respect to the network topology

•  Totally distributed strategy

•  Ease of implementation


IDEA: train a common learning machine on the local dataset at every node, then compute the global solution by averaging the local solutions through the Consensus protocol.
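As an illustration of how the DAC protocol behaves, here is a minimal Python sketch, under the assumption of a connected network with a doubly-stochastic mixing matrix W; the function and variable names are illustrative and not taken from the thesis. Each node repeatedly replaces its value with a weighted average of its neighbours' values, and all values converge to the network-wide average. In the proposed algorithms the averaged quantities are the nodes' local RVFL weight vectors rather than scalars.

```python
import numpy as np

def consensus_average(values, W, n_iter=100):
    """DAC sketch: x(t+1) = W x(t). If W is doubly stochastic and the
    network is connected, every node's value converges to the average
    of the initial measurements, without any node ever seeing the raw
    data of the others."""
    x = np.asarray(values, dtype=float)
    for _ in range(n_iter):
        x = W @ x          # each node only combines its neighbours' values
    return x

# Illustrative 3-node example with a symmetric, doubly-stochastic mixing matrix
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
print(consensus_average([1.0, 2.0, 6.0], W))   # all entries approach 3.0
```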

Page 7: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

Proposed algorithms are a combination of the Consensus protocol and Random Vector Functional-Link networks (RVFLs).

BATCH LEARNING: the entire training set is available before the training starts.

If the size of the hidden layer is sufficiently large, RVFLs are universal approximators for a wide range of basis functions h [Igelnik and Pao, 1995].


[Figure: RVFL network with inputs $x_1, x_2$, random hidden units $h_1, h_2, h_3$, output weights $\beta_1, \beta_2, \beta_3$, and output $y$]

$y = \sum_{i=1}^{m} \beta_i \, h_i(x, w_i) = \beta^T h(x, w_1, \ldots, w_m)$

REGULARIZED LEAST SQUARES: $\beta^* = (H^T H + \lambda I)^{-1} H^T Y$
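The RVFL model and its regularized least-squares training can be sketched as follows (illustrative Python/NumPy; a sigmoid basis is assumed, direct input-output links are omitted to match the formula above, and the actual thesis implementation is in MATLAB).

```python
import numpy as np

rng = np.random.default_rng(0)

def rvfl_hidden(X, W, b):
    """Random hidden layer h(x) = sigma(Wx + b); W and b are drawn once
    at random and never trained (a sigmoid basis is assumed here)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def rvfl_train(X, Y, m=50, lam=0.1):
    """Fit the output weights by regularized least squares:
    beta* = (H^T H + lambda*I)^(-1) H^T Y."""
    W = rng.standard_normal((X.shape[1], m))   # random input-to-hidden weights
    b = rng.standard_normal(m)                 # random biases
    H = rvfl_hidden(X, W, b)
    beta = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ Y)
    return W, b, beta

def rvfl_predict(X, W, b, beta):
    return rvfl_hidden(X, W, b) @ beta
```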

Page 8: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

CONSENSUS-BASED BATCH DISTRIBUTED RVFL:

1. INITIALIZATION: the RVFLs' hidden parameters $w_1, \ldots, w_m$ are generated randomly and then shared among the nodes.

2. LOCAL SOLUTION COMPUTATION: each node computes its local estimate $\beta_i^*$ from its own local dataset (first formula below).

3. GLOBAL SOLUTION COMPUTATION: once every node has computed its local estimate, the global solution is obtained through the Consensus protocol as the average of the local estimates (second formula below).


$\beta_i^* = (H_i^T H_i + \lambda I)^{-1} H_i^T Y_i$

$\beta^* = \frac{1}{N} \sum_{i=1}^{N} \beta_i^*$
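Putting the three steps together, a minimal sketch of the consensus-based batch distributed RVFL might look like this; for brevity, the DAC protocol is replaced here by a plain arithmetic mean of the local estimates, and all names are illustrative.

```python
import numpy as np

def distributed_batch_rvfl(local_data, W, b, lam=0.1):
    """Consensus-based batch distributed RVFL (sketch).
    local_data: list of (X_i, Y_i) pairs, one per node.
    W, b: random hidden-layer parameters shared by all nodes (step 1)."""
    m = W.shape[1]
    local_betas = []
    for X_i, Y_i in local_data:
        H_i = 1.0 / (1.0 + np.exp(-(X_i @ W + b)))              # local hidden matrix
        beta_i = np.linalg.solve(H_i.T @ H_i + lam * np.eye(m),
                                 H_i.T @ Y_i)                   # step 2: local estimate
        local_betas.append(beta_i)
    return np.mean(local_betas, axis=0)                         # step 3: average of local estimates
```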

Page 9: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

ONLINE LEARNING: it is not always possible to have the entire dataset available before the training starts:

•  Data are obtained in real time

•  Datasets are too large to be processed in batch


RECURSIVE LEAST SQUARES (RLS):

$K_{k+1} = K_k + H_{k+1}^T H_{k+1}$
$\beta^{k+1} = \beta^k + K_{k+1}^{-1} H_{k+1}^T (Y_{k+1} - H_{k+1} \beta^k)$

LEAST MEAN SQUARES (LMS):

$\beta^{k+1} = \beta^k - \alpha_k \left[ H_{k+1}^T H_{k+1} \beta^k - H_{k+1}^T Y_{k+1} + \lambda \beta^k \right]$, where the bracketed term is denoted $L_{k+1}$.

ORIGINAL PROPOSAL: extending the results obtained for distributed batch learning to distributed online learning problems.
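The two online updates can be sketched as follows (illustrative Python; function names are not from the thesis). Initializing K to λI and β to zero makes the RLS recursion consistent with the regularized batch solution shown earlier; this initialization is an assumption of the sketch rather than something stated here.

```python
import numpy as np

def rls_step(beta, K, H_new, Y_new):
    """RLS update for a new data chunk (H_new, Y_new):
    K_{k+1} = K_k + H^T H,
    beta_{k+1} = beta_k + K_{k+1}^{-1} H^T (Y - H beta_k)."""
    K = K + H_new.T @ H_new
    beta = beta + np.linalg.solve(K, H_new.T @ (Y_new - H_new @ beta))
    return beta, K

def lms_step(beta, H_new, Y_new, alpha=0.01, lam=0.1):
    """LMS (gradient) update on the regularized squared loss:
    beta_{k+1} = beta_k - alpha*(H^T H beta_k - H^T Y + lambda*beta_k)."""
    grad = H_new.T @ (H_new @ beta) - H_new.T @ Y_new + lam * beta
    return beta - alpha * grad
```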

Page 10: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

CONSENSUS-BASED ONLINE DISTRIBUTED RVFL (RLS version):

1. INITIALIZATION: the RVFLs' hidden parameters $w_1, \ldots, w_m$ are generated randomly and then shared among the nodes.

2. LOCAL SOLUTION UPDATE: each node updates its local estimate with its new data according to RLS (first two formulas below).

3. GLOBAL SOLUTION UPDATE: once every node has computed its local update, the global solution is computed through the Consensus protocol as the average of the local updates (last formula below).

                   


$K_{k+1}^i = K_k^i + (H_{k+1}^i)^T H_{k+1}^i$
$\beta_i^{k+1} = \beta_C^k + (K_{k+1}^i)^{-1} (H_{k+1}^i)^T (Y_{k+1}^i - H_{k+1}^i \beta_C^k)$

$\beta_C^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \beta_i^{k+1}$
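A sketch of one round of this RLS-based distributed update (illustrative Python; the Consensus averaging step is again replaced by a plain mean):

```python
import numpy as np

def distributed_rls_round(beta_c, K_list, chunks):
    """One round of the consensus-based online RVFL, RLS version (sketch).
    beta_c: current global estimate; K_list[i]: node i's accumulated matrix;
    chunks: list of new local chunks (H_i, Y_i), one per node."""
    local_betas = []
    for i, (H_i, Y_i) in enumerate(chunks):
        K_list[i] = K_list[i] + H_i.T @ H_i                     # local K update
        beta_i = beta_c + np.linalg.solve(K_list[i],
                                          H_i.T @ (Y_i - H_i @ beta_c))
        local_betas.append(beta_i)                              # local solution update
    return np.mean(local_betas, axis=0), K_list                 # global solution update
```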

Page 11: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

CONSENSUS-BASED ONLINE DISTRIBUTED RVFL (LMS version):

1. INITIALIZATION: the RVFLs' hidden parameters $w_1, \ldots, w_m$ are generated randomly and then shared among the nodes.

2. LOCAL SOLUTION UPDATE: each node updates its local estimate with its new data according to LMS (first formula below).

3. GLOBAL SOLUTION UPDATE: once every node has computed its local update, the global solution is computed through the Consensus protocol as the average of the local updates (last formula below).

                   


$\beta_i^{k+1} = \beta_C^k - \alpha_k^i \left[ (H_{k+1}^i)^T H_{k+1}^i \beta_C^k - (H_{k+1}^i)^T Y_{k+1}^i + \lambda \beta_C^k \right]$, where the bracketed term is denoted $L_{k+1}$.

$\beta_C^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \beta_i^{k+1}$
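The corresponding LMS-based round, sketched under the same assumptions (illustrative names, plain mean in place of the Consensus step):

```python
import numpy as np

def distributed_lms_round(beta_c, chunks, alpha=0.01, lam=0.1):
    """One round of the consensus-based online RVFL, LMS version (sketch).
    Every node takes a gradient step from the global estimate beta_c on its
    new chunk (H_i, Y_i); the updates are then averaged."""
    local_betas = []
    for H_i, Y_i in chunks:
        grad = H_i.T @ (H_i @ beta_c) - H_i.T @ Y_i + lam * beta_c
        local_betas.append(beta_c - alpha * grad)               # local solution update
    return np.mean(local_betas, axis=0)                         # global solution update
```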

Page 12: M.Sc. Thesis presentation

ALGORITHMS DEVELOPMENT

ALGORITHMS IMPLEMENTATION:

The proposed algorithms are implemented in MATLAB, using the Parallel Computing Toolbox (PCT), in order to test their efficacy.

•  Use of the spmd command to run the code on a pool of machines or on a simulated distributed architecture (up to 12 nodes) on a single machine.

•  A serial version of the code is implemented to perform large-scale simulations.


Page 13: M.Sc. Thesis presentation

EXPERIMENTAL RESULTS

EXPERIMENTAL SETUP

Simulations performed on 4 freely available datasets (see the table below).

Types of simulations performed:

•  Performance and scalability

•  Influence of the network topology

•  Comparison in training time

•  Performance of the original online algorithms

All results are obtained on a machine equipped with an Intel i5 @ 3.00 GHz processor and 16 GB of RAM, running MATLAB R2013a.

Dataset    | N° of Features | N° of Instances | Predicted Value | Task Type
Banknote   | 4              | 1372            | Banknote class  | Binary classification
g50c       | 50             | 550             | Gaussian label  | Binary classification
CCPP       | 4              | 9568            | Energy output   | Regression
Garageband | 44             | 1856            | Music genre     | 9-class classification

Page 14: M.Sc. Thesis presentation

EXPERIMENTAL RESULTS

PERFORMANCE AND SCALABILITY


[Figure: test error [%] (NRMSE for the CCPP regression dataset) vs. number of network nodes, for Centralized-RVFL, Consensus-RVFL, and Local-RVFL on the four datasets]

Page 15: M.Sc. Thesis presentation

EXPERIMENTAL RESULTS

INFLUENCE OF NETWORK TOPOLOGY

COMPARISON IN TRAINING TIME


[Figures: number of iterations vs. number of network nodes for different topologies (Cyclic Lattice, Fully Connected, Linear K=1, Linear K=4, Random p=0.25, Random p=0.5); training time [s] vs. number of network nodes for Centralized-RVFL, Consensus-RVFL, and Local-RVFL]

Page 16: M.Sc. Thesis presentation

EXPERIMENTAL RESULTS

ORIGINAL ONLINE ALGORITHMS PERFORMANCE


[Figure: test error vs. number of iterations for Centralized-RVFL, Consensus-RVFL (RLS), Consensus-RVFL (LMS), and Local-RVFL on the four datasets]

Page 17: M.Sc. Thesis presentation

CONCLUSIONS

The proposed algorithms solve the distributed online learning problem in a totally decentralized way, requiring only local communications and no exchange of data points.

Experimental results show that the algorithms are able to achieve performance comparable with a centralized solution, except for the LMS version of the online algorithm.

The application of the proposed algorithms to a music classification problem is investigated in [Scardapane et al., 2015b].

All the code developed for the thesis is freely available at: https://github.com/roberto-fierimonte/tesi-rvfl-online

FUTURE DEVELOPMENTS:

•  Application to model-distributed problems

•  Application to problems with constraints on energy and time

•  Application to unsupervised learning problems


Page 18: M.Sc. Thesis presentation

Distributed  Learning  by  Neural  Networks  

THANK YOU FOR YOUR ATTENTION