
Page 1

Gokhan ALTAN*,1, Yakup KUTLU1

1 İskenderun Technical University, Computer Engineering Dept.

Page 2

Problem Definition

Deep Learning

Extreme Learning Machines

Deep Extreme Learning Machines (Deep ELM)

Autoencoder Models

Comparison of Deep Learning and Deep ELM

Page 3

The aim of the study is to present the advantages and weaknesses of recent Deep Learning technologies,

especially the Deep ELM and the Convolutional Neural Network models.

Page 4

Page 5

Deep Learning is a special machine learning algorithm that can generate high-level abstraction models using multiple processing layers in a complex structure consisting of non-linear transformations with a random number of neurons.

Page 6

Advantages

• Deep Analysis
• Feature Learning
• Generalization performance
• Stable against overfitting
• Suitable for Big Data
• More hidden layers

Disadvantages

• Sample Size
• Training Time

Page 7

Krizhevsky, Sutskever and Hinton (2012)

- CNN with 8 layers [LeCun et al. 1989]

- 7 hidden layers, 650,000 neurons, ~60,000,000 parameters

- 1.2 million images on ImageNet database

- GPU use (50x CPU performance)
- Training time: 1 week (even with GPUs)

Page 8

Advantages

• Feature Learning
• Stable against overfitting
• Suitable for Big Data
• More hidden layers

Disadvantages

• Sample Size
• Training Time
• GPU is needed
• CPU is not enough

Page 9

• ELM is a generalized single-hidden-layer neural network model.
• No tuning is needed for ordinary implementations.
• Single hidden layer feedforward neural network (SLFN).
• It is a fast and efficient procedure: hidden node parameters are defined randomly, and the output weights are calculated by simple analytical mathematics.

$$f_L(x) = \sum_{i=1}^{L} G_i(x, a_i, b_i)\,\beta_i$$

$$\beta = H^T \left(\frac{I}{\lambda} + H H^T\right)^{-1} T$$

$$\beta = H^{\dagger} T, \qquad H^{\dagger}: \text{Moore-Penrose inverse of the matrix } H$$
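To make the "no tuning" point concrete, here is a minimal ELM sketch in NumPy. It is an illustrative assumption of ours, not the authors' implementation: the names elm_train and elm_predict, the tanh activation, and the regularization value are hypothetical choices; only the closed form $\beta = H^T(I/\lambda + HH^T)^{-1}T$ comes from the slide.

```python
# Minimal ELM sketch (illustrative assumption, not the authors' code).
import numpy as np

def elm_train(X, T, n_hidden=100, lam=1e3, rng=np.random.default_rng(0)):
    """X: (n_samples, n_features), T: (n_samples, n_outputs)."""
    a = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never tuned
    b = rng.standard_normal(n_hidden)                # random hidden biases, never tuned
    H = np.tanh(X @ a + b)                           # hidden-layer output matrix H = g(aX + b)
    # Output weights from the regularized closed form: beta = H^T (I/lam + H H^T)^(-1) T
    beta = H.T @ np.linalg.solve(np.eye(H.shape[0]) / lam + H @ H.T, T)
    return a, b, beta

def elm_predict(X, a, b, beta):
    return np.tanh(X @ a + b) @ beta
```

Training amounts to one random projection plus one linear solve, which is where the speed advantage over backpropagation-based training comes from.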

Page 10

• Input and output are the same for the ELM kernel.
• Constructs different representations of the input at different depths using decoding weights.
• Feature learning, decoding.
• Unsupervised learning model.
• Different representations of the given X.

$$H = g(aX + b)$$

$$\beta = H^T \left(\frac{I}{C} + H H^T\right)^{-1} X$$

C : sparsity parameter
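Under the slide's formulation (the target is the input X itself), the ELM autoencoder differs from the ELM above only in what the output weights reconstruct. The sketch below is again an assumption of ours: the name elm_autoencoder, the tanh activation, and the stacking line at the end are hypothetical illustrations, while the formula $\beta = H^T(I/C + HH^T)^{-1}X$ is taken from the slide.

```python
# Minimal ELM autoencoder sketch (illustrative assumption, not the authors' code).
import numpy as np

def elm_autoencoder(X, n_hidden=64, C=1e3, rng=np.random.default_rng(0)):
    """Unsupervised ELM-AE: the input X is also the reconstruction target."""
    a = rng.standard_normal((X.shape[1], n_hidden))  # random encoding weights
    b = rng.standard_normal(n_hidden)                # random biases
    H = np.tanh(X @ a + b)                           # H = g(aX + b)
    # Decoding weights: beta = H^T (I/C + H H^T)^(-1) X, shape (n_hidden, n_features)
    beta = H.T @ np.linalg.solve(np.eye(H.shape[0]) / C + H @ H.T, X)
    return beta

# One possible stacking step for a Deep ELM (hypothetical): project X with the
# learned decoding weights to obtain the next layer's input representation.
# X_next = np.tanh(X @ beta.T)
```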

Page 11

• The matrix H must be a square matrix (Least Mean Square Estimation).
• The matrix H must be a non-singular (invertible) matrix to avoid producing singular L and U.
• The biggest advantage of LU decomposition is that the elimination step is not time-consuming.

$$H = LU, \qquad HW = b \;\Rightarrow\; HW = L(UW) = b$$

$$\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}
\begin{bmatrix} W_1 \\ W_2 \\ W_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

Forward substitution, $Ld = b$:

$$\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

Back substitution, $UW = d$:

$$\begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}
\begin{bmatrix} W_1 \\ W_2 \\ W_3 \end{bmatrix}
=
\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}$$
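As a sketch of this LU route, the snippet below solves HW = b with SciPy's lu_factor/lu_solve: one factorization, then forward and back substitution, with no explicit inverse. Note that SciPy factors with partial pivoting (H = PLU) rather than the plain H = LU shown above, and the matrices here are made-up toy values.

```python
# LU-based solve of H W = b (toy example; SciPy factors with partial pivoting, H = P L U).
import numpy as np
from scipy.linalg import lu_factor, lu_solve

H = np.array([[4.0, 3.0, 2.0],
              [2.0, 5.0, 1.0],
              [1.0, 2.0, 6.0]])   # must be square and non-singular
b = np.array([1.0, 2.0, 3.0])

lu, piv = lu_factor(H)        # factor once: the elimination step
W = lu_solve((lu, piv), b)    # forward substitution (L d = b) + back substitution (U W = d)

assert np.allclose(H @ W, b)
```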

Page 12

Page 13

CNN

• Object Recognition
• Image Editing
• Computer Vision
• Shape Modelling
• Image Generating

Deep ELM

• Object Recognition
• Image Editing
• Computer Vision
• Time Series Analysis
• …

Page 14

Deep ELM

• Generative Autoencoder Kernels on Deep Learning for Brain Activity Analysis
• Hessenberg Elm Autoencoder Kernel For Deep Learning
• Performance of Deep Extreme Learning Machines with LU Kernel on Morphometric Fish Recognition
• An Application of Steady Deep Autoencoding Algorithm on Analysing Multi-channel Lung Sounds for classification of COPD

Superiorities against DL

1. High Generalization Capability
2. Suitable for integration of feature extraction models
3. Fast Training
4. Simple mathematical solutions
5. An Open Box

Page 15

Assist. Prof. Gökhan ALTAN ([email protected])