Neural Network 𝜶 - Kangwon · cs.kangwon.ac.kr/.../05_neural_network.pdf · 2016-06-17

𝑠𝑖𝑔𝑚𝑎 𝜶

Machine Learning

2015.06.27.

Neural Network

Neural Network

• Human Neuron

• Perceptron

• Artificial Neural Network

• Feed-forward Neural Nets.

• Gradient

• Least Square Error

• Cross Entropy

• Back-propagation

• Conclusion


Issues

• Inceptionism of Google



Human Neuron

Input → Weighted sum → Activation function → Output

• The inputs are defined as vectors

• The net input is calculated as the weighted sum of the input vector

• The weighted sum is transformed into an output signal via an activation function

• The output signal is either binary (0 or 1) or a real number between 0 and 1


Perceptron

Raw data → Input vector → Weight → Activation function → Output

• Inputs are features

• Each feature has a weight

• The weighted sum is the activation

  • Positive: output 1

  • Negative: output 0

$$z = \sum_{i}^{N} w_i \cdot x_i, \qquad y = f(z)$$

• The activation function $f$ can be a step function, a sigmoid function, or a Gaussian function
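A minimal Python sketch of the perceptron output $y = f(z)$ with the three activation choices named above. The function names and the unit-width Gaussian $\exp(-z^2)$ are illustrative assumptions, not taken from the slides.

```python
import math

def weighted_sum(weights, inputs):
    """z = sum_i w_i * x_i"""
    return sum(w * x for w, x in zip(weights, inputs))

def step(z):
    """Positive activation gives 1, anything else gives 0."""
    return 1 if z > 0 else 0

def sigmoid(z):
    """Smooth activation with values between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def gaussian(z):
    """Assumed form: a unit-width bump centred at z = 0."""
    return math.exp(-z * z)

def perceptron(weights, inputs, f=step):
    """y = f(z) for the chosen activation function f."""
    return f(weighted_sum(weights, inputs))

# Same weighted sum, three different activation functions.
w, x = [1.0, -2.0], [3.0, 1.0]
for f in (step, sigmoid, gaussian):
    print(f.__name__, perceptron(w, x, f))
```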


Perceptron & Logistic Regression

[Figure: Perceptron and Logistic Regression side by side, each with inputs $x_i$ and weights $w$]

Parametric problem


Perceptron learning rule

• On-line, error-driven (mistake-driven) learning

• Rosenblatt (1959, a psychologist)

  • suggested that when a target output value is provided for a single neuron with fixed input, the neuron can incrementally change its weights and learn to produce that output using the perceptron learning rule

• Perceptron == Linear Threshold Unit

$$z = \sum_{i}^{N} w_i \cdot x_i = w^T x$$
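The slides do not spell out the update formula, so the sketch below assumes the standard textbook form of the perceptron learning rule, $w \leftarrow w + \eta\,(t - y)\,x$, applied only when the prediction is wrong. The AND data set with bias input $-1$ and the names `predict` and `train_perceptron` are illustrative choices.

```python
def predict(w, x):
    """Linear threshold unit: 1 if w^T x > 0, otherwise 0."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z > 0 else 0

def train_perceptron(samples, eta=1.0, max_epochs=100):
    """On-line, mistake-driven training: weights change only on errors."""
    w = [0.0] * len(samples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in samples:
            y = predict(w, x)
            if y != t:
                # Assumed standard rule: w <- w + eta * (t - y) * x
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every sample classified correctly
            break
    return w

# AND, with a constant bias input x0 = -1 as the first component.
and_samples = [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]
w = train_perceptron(and_samples)
print(w, [predict(w, x) for x, _ in and_samples])
```

Because AND is linearly separable, the loop stops after a finite number of mistakes; for data that is not linearly separable it would simply run until `max_epochs`.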


Geometric View


Deriving the delta rule


Perceptron Example

Raw data → Input vector (bias input -1, x1, x2) → Weight (?) → Activation function → Output

Target function: AND

X1 | X2 | Output
0  | 0  | 0
0  | 1  | 0
1  | 0  | 0
1  | 1  | 1

With bias input X0 = -1 and weights (0.5, 0.4, 0.4), the perceptron reproduces AND:

X0 | X1 | X2 | Summation                             | Output
-1 | 0  | 0  | (-1*0.5) + (0*0.4) + (0*0.4) = -0.5   | 0
-1 | 0  | 1  | (-1*0.5) + (0*0.4) + (1*0.4) = -0.1   | 0
-1 | 1  | 0  | (-1*0.5) + (1*0.4) + (0*0.4) = -0.1   | 0
-1 | 1  | 1  | (-1*0.5) + (1*0.4) + (1*0.4) = 0.3    | 1
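The table can be reproduced with a few lines of Python. This is a sketch that assumes the weights (0.5, 0.4, 0.4) read off the summation column and a step activation that fires only for a strictly positive sum; the name `and_perceptron` is illustrative.

```python
WEIGHTS = (0.5, 0.4, 0.4)  # (w0, w1, w2), read off the summation column

def and_perceptron(x1, x2, x0=-1):
    """Return (summation, output) for one row of the table."""
    z = x0 * WEIGHTS[0] + x1 * WEIGHTS[1] + x2 * WEIGHTS[2]
    return z, (1 if z > 0 else 0)

for x1 in (0, 1):
    for x2 in (0, 1):
        z, out = and_perceptron(x1, x2)
        # round() only hides floating-point noise in the printed sum
        print(-1, x1, x2, round(z, 1), out)
```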


Limitation of a Perceptron: Linear Separability


Decision surface of a perceptron

• A perceptron can represent some useful functions

• AND(x1, x2): choose weights w0 = -1.5, w1 = 1, w2 = 1

• But functions that are not linearly separable (e.g. XOR) cannot be represented
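A small Python check of both claims. It assumes a bias input of $+1$, so that $w_0 = -1.5$ acts as the threshold, and it searches a coarse, arbitrarily chosen weight grid. Finding no XOR solution on a grid is only an illustration, not a proof that none exists.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]

def ltu(w0, w1, w2, x1, x2):
    """Linear threshold unit with an assumed bias input of +1."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

def represents(w, targets):
    """True if the weight triple w reproduces the target truth table."""
    return [ltu(*w, x1, x2) for x1, x2 in INPUTS] == targets

def search(targets, grid):
    """Return the first weight triple on the grid that fits, or None."""
    for w in product(grid, repeat=3):
        if represents(w, targets):
            return w
    return None

grid = [i / 2 for i in range(-8, 9)]  # -4.0 ... 4.0 in steps of 0.5

print(represents((-1.5, 1, 1), AND))  # the weights suggested on the slide
print(search(XOR, grid))              # no single unit on this grid computes XOR
```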


Perceptrons...

• Perceptron: Mistake Bound Theorem

• Dual Perceptron

• Voted-Perceptron

• Regularization: Average Perceptron

• Passive-Aggressive Algorithm

• Unrealizable Case


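We need to handle non-linearly separable problems

Structure    | Types of Decision Regions                          | Exclusive-OR Problem | Classes with Meshed Regions | Most General Region Shapes
Single-Layer | Half plane bounded by hyperplane                   | [figure: A/B regions] | [figure: A/B regions]      | [figure]
Two-Layer    | Convex open or closed regions                      | [figure: A/B regions] | [figure: A/B regions]      | [figure]
Three-Layer  | Arbitrary (complexity limited by number of nodes)  | [figure: A/B regions] | [figure: A/B regions]      | [figure]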


Artificial Neural Network

Raw data → Input vector → Weight → Activation function → Output

Add units and layers!



Raw data → Input layer → Weight → Activation function → Hidden layer → Weight → Activation function → Output layer



Solving XOR!

$$z_k = y_1 \text{ AND NOT } y_2 = (x_1 \text{ OR } x_2) \text{ AND NOT } (x_1 \text{ AND } x_2) = x_1 \text{ XOR } x_2$$

Figure source: Pattern Classification



[Figure labels: input value, weight, activation function, emission value]

The output is a combination of the states of the hidden units

Figure source: Pattern Classification
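The XOR decomposition above can be sketched as a two-layer network of threshold units. The weights and thresholds below are hypothetical values chosen so that $y_1$ computes OR, $y_2$ computes AND, and the output computes $y_1$ AND NOT $y_2$; the figure in the slides may use different numbers.

```python
def threshold(net):
    """Step activation: 1 for a strictly positive net input, else 0."""
    return 1 if net > 0 else 0

def xor_net(x1, x2):
    y1 = threshold(-0.5 + x1 + x2)  # hidden unit 1: x1 OR x2
    y2 = threshold(-1.5 + x1 + x2)  # hidden unit 2: x1 AND x2
    z = threshold(-0.5 + y1 - y2)   # output unit: y1 AND NOT y2
    return z

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```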


Feed-forward Neural Nets.

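• Net activation (scalar, hidden unit 'j')

  • input-to-hidden

$$(1)\quad net_j = \sum_{i=1}^{d} x_i w_{ij} + w_{0j} = \sum_{i=0}^{d} x_i w_{ij} \equiv w_j^T x$$

  • $i$: input-layer index, $j$: hidden-layer index, $w_{ij}$: weight from $i$ to $j$

  • $x$: units (= nodes), $w$: weights

  • $x_0 = 1$, $w_0 = 0 \sim 1$ (bias value)



• Activation function (non-linear function)

$$(2)\quad y_j = f(net_j)$$

  • Here $f$ is the sign (signum) function $\mathrm{sgn}$ (also written $\varphi$)

$$(3)\quad f(net) = \mathrm{sgn}(net) \equiv \begin{cases} 1, & net \ge 0 \\ -1, & net < 0 \end{cases}$$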



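• Activation functions

logistic sigmoid

$$f(net) = \frac{1}{1 + \exp(-net)}, \qquad \frac{\partial f(net)}{\partial net} = f(net)\,(1 - f(net))$$

tanh

$$f(net) = \tanh(net) = \frac{e^{net} - e^{-net}}{e^{net} + e^{-net}}, \qquad \tanh'(net) = 1 - \tanh^2(net)$$

hard tanh

$$f(net) = \mathrm{HardTanh}(net) = \begin{cases} -1, & net < -1 \\ net, & -1 \le net \le 1 \\ 1, & net > 1 \end{cases}$$

Figure source: Torch7 Documentation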
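A short Python sketch of these three functions, with the two closed-form derivatives compared against a central finite difference. The helper `numeric_derivative` and the test point 0.3 are illustrative choices.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_prime(net):
    """f'(net) = f(net) * (1 - f(net))"""
    f = sigmoid(net)
    return f * (1.0 - f)

def tanh_prime(net):
    """tanh'(net) = 1 - tanh(net)^2"""
    return 1.0 - math.tanh(net) ** 2

def hard_tanh(net):
    """Clip net to the interval [-1, 1]."""
    return max(-1.0, min(1.0, net))

def numeric_derivative(f, x, h=1e-6):
    """Central finite difference, used only to sanity-check the formulas."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 0.3
print(sigmoid_prime(x), numeric_derivative(sigmoid, x))
print(tanh_prime(x), numeric_derivative(math.tanh, x))
print([hard_tanh(v) for v in (-2.0, 0.5, 3.0)])
```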



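• Activation functions

SoftSign

$$f(net) = \mathrm{SoftSign}(net) = \frac{net}{1 + |net|}$$

SoftMax

$$f(net_i) = \mathrm{SoftMax}(net)_i = \frac{\exp(net_i - shift)}{\sum_j \exp(net_j - shift)}, \qquad shift = \max_i(net_i)$$

Rectifier

$$f(net) = \mathrm{rect}(net) = \max(0, net)$$

Leaky variant of the rectifier (a small slope of 0.01 for non-positive inputs instead of 0):

$$f(net) = \begin{cases} net, & net > 0 \\ 0.01\,net, & \text{otherwise} \end{cases}$$

Figure source: Wikipedia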
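These functions translate directly into Python. The sketch below follows the formulas on this slide, including the max-shift in SoftMax, which keeps `exp` from overflowing for large inputs. The name `leaky_rect` and its `slope` parameter are illustrative.

```python
import math

def softsign(net):
    return net / (1.0 + abs(net))

def softmax(nets):
    """Shift by the maximum before exponentiating; the result is unchanged."""
    shift = max(nets)
    exps = [math.exp(n - shift) for n in nets]
    total = sum(exps)
    return [e / total for e in exps]

def rect(net):
    """Rectifier: max(0, net)."""
    return max(0.0, net)

def leaky_rect(net, slope=0.01):
    """Leaky variant: small slope for non-positive inputs."""
    return net if net > 0 else slope * net

print(softsign(1.0), softsign(-3.0))
print(softmax([1000.0, 1001.0, 1002.0]))  # would overflow without the shift
print(rect(-2.0), leaky_rect(-2.0))
```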



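• Output layer (output unit 'k')

  • hidden-to-output

$$(4)\quad net_k = \sum_{j=1}^{n_H} y_j w_{jk} + w_{0k} = \sum_{j=0}^{n_H} y_j w_{jk} = w_k^T y$$

  • $k$: output-layer index, $n_H$: the number of hidden units

  • $y_0 = 1$: bias value in the hidden layer

• Output unit

  • $\mathrm{sgn}(\cdot)$ is applied here as well

$$(5)\quad z_k = f(net_k)$$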
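Equations (1) to (5) together form one forward pass. The sketch below assumes $\pm 1$ inputs, the signum activation from (3), and a weight layout in which each unit's list starts with its bias weight. The numeric weights are illustrative values chosen so that the network computes XOR; they are not taken from these slides.

```python
def sgn(net):
    """Eq. (3): 1 if net >= 0, otherwise -1."""
    return 1 if net >= 0 else -1

def layer(inputs, weights, f):
    """One layer: prepend the bias unit (value 1), then apply f to each net activation.

    weights[u] holds [bias weight, w_1, ..., w_n] for unit u.
    """
    augmented = [1.0] + list(inputs)
    return [f(sum(w * x for w, x in zip(w_u, augmented))) for w_u in weights]

def forward(x, w_hidden, w_output, f=sgn):
    y = layer(x, w_hidden, f)  # eqs. (1)-(2): input-to-hidden
    z = layer(y, w_output, f)  # eqs. (4)-(5): hidden-to-output
    return y, z

# Illustrative weights for XOR on +/-1 inputs.
w_hidden = [[0.5, 1.0, 1.0],    # y1 fires unless both inputs are -1
            [-1.5, 1.0, 1.0]]   # y2 fires only if both inputs are +1
w_output = [[-1.0, 0.7, -0.4]]  # z fires if y1 = +1 and y2 = -1

for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x, forward(x, w_hidden, w_output)[1])
```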


Gradient

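• A vector composed of the first-order partial derivatives with respect to each variable

  • Direction of the vector: the direction in which the value of $f(\cdot)$ rises most steeply

  • Magnitude of the vector: the slope of that increase

• For a multivariable function $f(x_1, x_2, \ldots, x_n)$, the gradient of $f$ is

$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right)$$

• Linear approximation of a multivariable scalar function $f$ near the point $a_k$, using the gradient (Taylor expansion)

$$f(a) = f(a_k) + \nabla f(a_k)^T (a - a_k) + o(\lVert a - a_k \rVert)$$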


Gradient Descent

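• Formula

$$a_{k+1} = a_k - \eta_k \nabla f(a_k), \quad k \ge 0$$

$\eta_k$: learning rate

• Algorithm

begin init $a$, threshold $\theta$, $\eta$, $k \leftarrow 0$
    do $k \leftarrow k + 1$
        $a \leftarrow a - \eta \nabla f(a)$
    until $\lVert \eta \nabla f(a) \rVert < \theta$
    return $a$
end

Source: Wikipedia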
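A minimal Python version of this loop, using a fixed learning rate and stopping once the step length falls below the threshold $\theta$. The quadratic test function, the default parameter values, and the `max_iter` safety cap are assumptions added for the example.

```python
import math

def gradient_descent(grad, a, eta=0.1, theta=1e-8, max_iter=10_000):
    """Repeat a <- a - eta * grad(a) until the step length is below theta."""
    for _ in range(max_iter):
        step = [eta * g for g in grad(a)]
        a = [ai - si for ai, si in zip(a, step)]
        if math.sqrt(sum(s * s for s in step)) < theta:
            break
    return a

def grad_f(a):
    """Gradient of f(a) = (a1 - 3)^2 + (a2 + 1)^2, whose minimum is at (3, -1)."""
    return [2 * (a[0] - 3), 2 * (a[1] + 1)]

a_min = gradient_descent(grad_f, [0.0, 0.0])
print(a_min)
```

With this function each step shrinks the distance to the minimum by a constant factor, so the loop terminates long before `max_iter`. A learning rate that is too large would make the iterates diverge instead.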


Least Square Error

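• When estimating the parameters of a model, choose them so that the sum of squared residuals ($residual^2$) between the true data and the estimated data is minimized

[Figure: true model vs. estimated model, with true and estimated data points and the residuals $r_1, \ldots, r_5$ between them]

Residual: $r$ (= $\varepsilon$)

$$\min \sum_i r_i^2 = \min \sum_i (y_i - \hat{y}_i)^2$$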


Least Square Error

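• Suppose the estimated model is $f(x) = ax + b$

• The residual is then

$$residual_i = y_i - f(x_i)$$

• That is, estimating the parameters by LSE means finding $\min \sum residual^2$

• Written as a formula

$$\sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2$$

• For the model above, i.e. a straight line

$$\sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} \left(y_i - (a x_i + b)\right)^2$$

• Therefore, determine the parameters $a$, $b$ that minimize $\sum r^2$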
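For the straight-line model, the minimizing $a$ and $b$ have a closed form, obtained by setting the partial derivatives of $\sum r^2$ with respect to $a$ and $b$ to zero. The sketch below uses that standard result; the slides do not derive it, and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a and b for f(x) = a*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """sum_i (y_i - (a*x_i + b))^2"""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b, sum_squared_residuals(xs, ys, a, b))
```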


Back-propagation

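• A method based on the delta rule

  • Based on LSE, it minimizes the squared error between the target (t) and the output (z)

• Credit assignment problem

  • There is no way to check the correct answer at the hidden layer of a NN

  • Therefore the weights are updated using back-propagation

[Figure: the output (z) is compared with the target (t); the difference is the error (a scalar function). ∴ The weights are adjusted to reduce this error, and they are learned pattern by pattern.]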


Back-propagation

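• Training error for an arbitrary pattern

$$(9)\quad J(w) \equiv \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 = \frac{1}{2} \lVert t - z \rVert^2$$

  • $t_k$: target (correct answer), $z_k$: network output (training result)

  • $t$, $z$: target and network output vectors of length $c$

  • $w$: all the weights of the network

• Back-propagation training rule

  • Based on gradient descent (init: random weights)

$$(10)\quad \Delta w = -\eta \frac{\partial J}{\partial w}, \quad \text{or} \quad (11)\quad \Delta w_{pq} = -\eta \frac{\partial J}{\partial w_{pq}}$$

  • $\eta$: learning rate, the relative size of the weight change

  • At iteration $m$, gradient descent moves so as to lower the criterion function $J(w)$

$$(12)\quad w(m+1) = w(m) + \Delta w(m)$$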


Back-propagation

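• Back Prop. of Hidden-to-Output

  • The training error must be optimized with respect to $w_{jk}$ (∴ optimize $J(w)$ over $w$)

  • $J$ does not depend explicitly on $w_{jk}$

  • That is, $J$ depends on $net$: (9) $J = \frac{1}{2} \lVert t - z \rVert^2$, (5) $z_k = f(net_k)$

  • $net$ depends on $w$: (4) $net_k = w_k^T y$

  • Therefore the chain rule can be applied

[Figure: layers I, J, K; the training-error gradient is computed for the hidden-to-output weights $w_{jk}$]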


Back-propagation

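• Chain rule through $net_k$ for optimizing $w_{jk}$

$$(13)\quad \frac{\partial J}{\partial w_{jk}} = \frac{\partial J}{\partial net_k} \frac{\partial net_k}{\partial w_{jk}}$$

• $\delta_k$ of unit $k$: delta rule [$(t_k - z_k)$]

  • Describes how the overall error (LSE) changes with the unit's net activation

$$(14)\quad -\delta_k = \frac{\partial J}{\partial net_k}$$

• Assuming the activation function $f(\cdot)$ is differentiable, and using (5) $z_k = f(net_k)$ and (9) $J = \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2$, $\delta_k$ for an output unit is

$$(15)\quad \delta_k = -\frac{\partial J}{\partial net_k} = -\frac{\partial J}{\partial z_k} \frac{\partial z_k}{\partial net_k} = (t_k - z_k)\, f'(net_k)$$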


Back-propagation

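• Chain rule through $net_k$ for optimizing $w_{jk}$

$$(13)\quad \frac{\partial J}{\partial w_{jk}} = \frac{\partial J}{\partial net_k} \frac{\partial net_k}{\partial w_{jk}}$$

• The last derivative on the right-hand side follows from (4) $net_k = w_k^T y$

$$(16)\quad \frac{\partial net_k}{\partial w_{jk}} = y_j$$

• Learning rule for the hidden-to-output weights

$$(17)\quad \Delta w_{jk} = \eta\, \delta_k\, y_j = \eta\, (t_k - z_k)\, f'(net_k)\, y_j$$

∴ When the output unit is linear

  • i.e. $f(net_k) = net_k$, $f'(net_k) = 1$

  • $\Delta w_{jk} = \eta\, (t_k - z_k)\, y_j$

  • Equation (17) is then the same as the LSE (Least Square Error) rule

  • LSE: $a_{k+1} = a_k + \eta_k \left(b_k - f(a_k)\right) y_k$, $f(a_k) = a_k^T y_k$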
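The hidden-to-output rule can be checked numerically. The sketch below computes $\Delta w_{jk} = \eta\, \delta_k\, y_j$ for a logistic sigmoid output layer and compares it with $-\eta\, \partial J / \partial w_{jk}$ estimated by finite differences. The hidden activations, weights, target, and learning rate are made-up values, and the learning rate $\eta$ is written explicitly, as in (11).

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def output_layer(y, w):
    """z_k = f(net_k) with net_k = sum_j y[j] * w[j][k]; y[0] = 1 is the bias unit."""
    return [sigmoid(sum(y[j] * w[j][k] for j in range(len(y))))
            for k in range(len(w[0]))]

def criterion(y, w, t):
    """J = 1/2 * sum_k (t_k - z_k)^2"""
    z = output_layer(y, w)
    return 0.5 * sum((tk - zk) ** 2 for tk, zk in zip(t, z))

def delta_w_output(y, w, t, eta):
    """dw[j][k] = eta * delta_k * y_j with delta_k = (t_k - z_k) * f'(net_k)."""
    z = output_layer(y, w)
    delta = [(tk - zk) * zk * (1.0 - zk) for tk, zk in zip(t, z)]  # f' = f(1 - f)
    return [[eta * delta[k] * y[j] for k in range(len(delta))]
            for j in range(len(y))]

def numeric_dJ_dw(y, w, t, j, k, h=1e-5):
    """Central finite difference of J with respect to w[j][k]."""
    plus = [row[:] for row in w]
    minus = [row[:] for row in w]
    plus[j][k] += h
    minus[j][k] -= h
    return (criterion(y, plus, t) - criterion(y, minus, t)) / (2 * h)

# Hypothetical values: two hidden units plus bias, two output units.
y = [1.0, 0.5, -0.3]
w = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
t = [1.0, 0.0]
eta = 0.5

dw = delta_w_output(y, w, t, eta)
max_gap = max(abs(dw[j][k] + eta * numeric_dJ_dw(y, w, t, j, k))
              for j in range(len(y)) for k in range(len(t)))
print(max_gap)  # close to zero if the analytic rule matches -eta * dJ/dw
```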


Back-propagation

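• Back Prop. of Input-to-Hidden

  • The training error must be optimized with respect to $w_{ij}$ (∴ optimize $J(w)$ over $w$)

[Figure: layers I, J, K; the training-error gradient is computed for the input-to-hidden weights $w_{ij}$]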


Back-propagation

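• Back Prop. of Input-to-Hidden

  • Use (11) $\Delta w_{pq} = -\eta \frac{\partial J}{\partial w_{pq}}$ and the chain rule

$$(18)\quad \frac{\partial J}{\partial w_{ij}} = \frac{\partial J}{\partial y_j} \frac{\partial y_j}{\partial net_j} \frac{\partial net_j}{\partial w_{ij}}$$

  • The first term on the right-hand side involves all of the weights $w_{jk}$

$$(19)\quad \begin{aligned} \frac{\partial J}{\partial y_j} &= \frac{\partial}{\partial y_j} \left[ \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 \right] \\ &= -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial y_j} \\ &= -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial net_k} \frac{\partial net_k}{\partial y_j} \\ &= -\sum_{k=1}^{c} (t_k - z_k)\, f'(net_k)\, w_{jk} = -\sum_{k=1}^{c} w_{jk}\, \delta_k \end{aligned}$$

Definitions used in the chain rule:

  • (9) $J(w) \equiv \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 = \frac{1}{2} \lVert t - z \rVert^2$

  • $z_k = f(net_k)$

  • $net_k = \sum_j y_j w_{jk}$

  • $\delta_k = (t_k - z_k)\, f'(net_k)$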


Back-propagation

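• $\delta_j$ of hidden unit $j$ (from (19) and the second term of (18))

$$(20)\quad \delta_j \equiv f'(net_j) \sum_{k=1}^{c} w_{jk}\, \delta_k, \qquad f'(net_j) = \frac{\partial y_j}{\partial net_j} = \frac{\partial f(net_j)}{\partial net_j}$$

• Learning rule for the input-to-hidden weights

$$(21)\quad \Delta w_{ij} = \eta\, x_i\, \delta_j = \eta \left[ \sum_{k=1}^{c} w_{jk}\, \delta_k \right] f'(net_j)\, x_i$$

  • $x_i$ is the last term of (18): $\frac{\partial net_j}{\partial w_{ij}} = \frac{\partial \sum_i x_i w_{ij}}{\partial w_{ij}} = x_i$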
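Putting the output-layer and hidden-layer rules together gives one full back-propagation step. The sketch below assumes a logistic sigmoid in both layers and made-up inputs, weights, and target. It compares the input-to-hidden update $\Delta w_{ij} = \eta\, x_i\, \delta_j$ with a finite-difference estimate of $-\eta\, \partial J / \partial w_{ij}$, then applies one update to all weights. In the code, hidden unit `h` (0-based) corresponds to `y[h + 1]` because `y[0]` is the bias unit.

```python
import math

def f(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, w_ih, w_ho):
    """x[0] = 1 and y[0] = 1 are bias units; w_ih[i][h] and w_ho[j][k] follow the i -> j -> k order."""
    y = [1.0] + [f(sum(x[i] * w_ih[i][h] for i in range(len(x))))
                 for h in range(len(w_ih[0]))]
    z = [f(sum(y[j] * w_ho[j][k] for j in range(len(y))))
         for k in range(len(w_ho[0]))]
    return y, z

def criterion(x, t, w_ih, w_ho):
    """J = 1/2 * sum_k (t_k - z_k)^2"""
    _, z = forward(x, w_ih, w_ho)
    return 0.5 * sum((tk - zk) ** 2 for tk, zk in zip(t, z))

def backprop(x, t, w_ih, w_ho, eta):
    y, z = forward(x, w_ih, w_ho)
    # Output units: delta_k = (t_k - z_k) * f'(net_k), with f' = f(1 - f)
    delta_k = [(tk - zk) * zk * (1.0 - zk) for tk, zk in zip(t, z)]
    # Hidden units: delta_j = f'(net_j) * sum_k w_jk * delta_k
    delta_j = [y[h + 1] * (1.0 - y[h + 1]) *
               sum(w_ho[h + 1][k] * delta_k[k] for k in range(len(delta_k)))
               for h in range(len(w_ih[0]))]
    dw_ho = [[eta * delta_k[k] * y[j] for k in range(len(delta_k))]
             for j in range(len(y))]
    dw_ih = [[eta * x[i] * delta_j[h] for h in range(len(delta_j))]
             for i in range(len(x))]
    return dw_ih, dw_ho

def numeric_dJ_dw_ih(x, t, w_ih, w_ho, i, h, eps=1e-5):
    """Central finite difference of J with respect to w_ih[i][h]."""
    plus = [row[:] for row in w_ih]
    minus = [row[:] for row in w_ih]
    plus[i][h] += eps
    minus[i][h] -= eps
    return (criterion(x, t, plus, w_ho) - criterion(x, t, minus, w_ho)) / (2 * eps)

# Hypothetical pattern and weights: 2 inputs + bias, 2 hidden units + bias, 2 outputs.
x = [1.0, 0.5, -1.0]
t = [1.0, 0.0]
w_ih = [[0.1, -0.3], [0.2, 0.4], [-0.5, 0.6]]
w_ho = [[0.05, -0.1], [0.7, 0.3], [-0.2, 0.5]]
eta = 0.1

dw_ih, dw_ho = backprop(x, t, w_ih, w_ho, eta)
max_gap = max(abs(dw_ih[i][h] + eta * numeric_dJ_dw_ih(x, t, w_ih, w_ho, i, h))
              for i in range(3) for h in range(2))

# One update step w(m+1) = w(m) + dw(m) on both weight layers.
J_before = criterion(x, t, w_ih, w_ho)
new_ih = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(w_ih, dw_ih)]
new_ho = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(w_ho, dw_ho)]
J_after = criterion(x, t, new_ih, new_ho)
print(max_gap, J_before, J_after)
```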


Conclusion

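• Back-propagation is gradient descent applied to a multi-layer model, with the derivatives of the objective function computed by the chain rule

• Like all gradient descent, the behaviour of back-propagation depends on the starting point

  • The start, i.e. the weight initialization, should avoid 0 where possible (because of the multiplications)

• From equation (17), the weight update at unit $k$ must be proportional to $(t_k - z_k)$

  • If $t_k = z_k$, i.e. the output equals the target, the weights do not change

• For a sigmoid function, $f'(net)$ is always positive (the sigmoid itself takes values between 0 and 1)

  • If $(t_k - z_k)$ and $y_j$ are both positive, the output is too small and the weight must be increased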


Conclusion

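• The weight update must be proportional to the input value

  • If $y_j = 0$, hidden unit $j$ has no effect on the output or the error, so changing $w_{jk}$ has no effect on the error for that pattern

• Generalizing back-propagation by generalizing the feed-forward network

  • Input units include a bias unit

  • Input units can be connected directly to output units as well as to hidden units (see figure)

  • Each layer can have a different nonlinearity

    • e.g. NN [i-to-h: sigmoid, h-to-o: ReLU]

  • Each unit can have its own nonlinearity

  • Each unit can have a different learning rate ($\eta$)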


References

• https://photos.google.com/share/AF1QipPX0SCl7OzWilt9LnuQliattX4OUCj_8EP65_cTVnBmS1jnYgsGQAieQUc1VQWdgQ?key=aVBxWjhwSzg2RjJWLWRuVFBBZEN1d205bUdEMnhB

• http://cs.kangwon.ac.kr/~leeck/Advanced_algorithm/4_Perceptron.pdf

• Pattern Classification, Richard O. Duda et al.


QA

Thank you.

박천음, 박찬민, 최재혁, 박세빈, 이수정

𝑠𝑖𝑔𝑚𝑎 𝜶, Kangwon National University

Email: [email protected]


