Page 1:

Classification: Perceptron

Prof. Seungchul Lee

Industrial AI Lab.

Page 2:

Classification

• Where $y$ is a discrete value

– Develop a classification algorithm to determine which class a new input should fall into

• We will learn

– Perceptron

– Logistic regression

• To find a classification boundary

Page 3:

Perceptron

• For input $x = \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix}$ ('attributes of a customer')

• Weights $\omega = \begin{bmatrix} \omega_1 \\ \vdots \\ \omega_d \end{bmatrix}$

(figure: Data → Features → Classification)


Page 5:

Perceptron

• Introduce an artificial coordinate $x_0 = 1$, so that $x = \begin{bmatrix} 1 & x_1 & \cdots & x_d \end{bmatrix}^T$ and $\omega = \begin{bmatrix} \omega_0 & \omega_1 & \cdots & \omega_d \end{bmatrix}^T$

• In a vector form, the perceptron implements

$h(x) = \operatorname{sign}(\omega^T x)$

• Let's see the geometrical meaning of the perceptron

Page 6:

$\omega$

• If $\vec{p}$ and $\vec{q}$ are on the decision line, then $\omega^T \vec{p} = \omega^T \vec{q} = 0$, so $\omega^T (\vec{p} - \vec{q}) = 0$: $\omega$ is orthogonal to the decision line

Page 7:

Signed Distance from a Line: $h$

• Sign with respect to the line: for $g(x) = \omega^T x$, the signed distance from a point $x$ to the line $g(x) = 0$ is $h = \dfrac{g(x)}{\lVert \omega \rVert}$, positive on one side of the line and negative on the other
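A quick numeric check of this formula (a minimal sketch; the weights and points below are made up for illustration):

import numpy as np

# Boundary g(x) = w^T x = 0, with the artificial coordinate x0 = 1 absorbed into w
w = np.array([-1.0, 2.0, 1.0])        # assumed weights [w0, w1, w2], illustration only
X = np.array([[1.0, 3.0, -2.0],       # each row: [x0 = 1, x1, x2]
              [1.0, 0.0, -1.0]])

g = X @ w                             # g(x) for every point
h = g / np.linalg.norm(w[1:])         # signed distance: divide by ||(w1, w2)||
print(h)                              # one positive, one negative: opposite sides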

Page 8:

How to Find $\omega$

• All data in class 1 ($y = 1$)

– $g(x) > 0$

• All data in class 0 ($y = -1$)

– $g(x) < 0$

• Equivalently, a point is correctly classified when $y \cdot g(x) > 0$

Page 9:

Perceptron Algorithm

• The perceptron implements

$h(x) = \operatorname{sign}(\omega^T x)$

• Given the training set $\{(x_1, y_1), \cdots, (x_m, y_m)\}$:

1) pick a misclassified point $(x_n, y_n)$, i.e. one with $\operatorname{sign}(\omega^T x_n) \neq y_n$

2) and update the weight vector

$\omega \leftarrow \omega + y_n x_n$

Page 10:

Iterations of Perceptron

1. Randomly assign $\omega$

2. One iteration of the PLA (perceptron learning algorithm):

$\omega \leftarrow \omega + y x$

where $(x, y)$ is a misclassified training point

3. At iteration $t = 1, 2, 3, \cdots$, pick a misclassified point from the training set

4. And run a PLA iteration on it

5. That's it!

Page 11:

Diagram of Perceptron

• The perceptron will later be shown to be the basic building block of neural networks and deep learning

Page 12:

Scikit-Learn for Perceptron
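The code on this slide did not survive extraction; below is a minimal sketch of what a scikit-learn perceptron fit typically looks like, on made-up toy data:

import numpy as np
from sklearn.linear_model import Perceptron

# Made-up linearly separable toy data with labels +/-1
X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

clf = Perceptron()                      # scikit-learn's perceptron classifier
clf.fit(X, y)

print(clf.coef_, clf.intercept_)        # learned weights omega and bias omega_0
print(clf.predict([[1.0, 1.0]]))        # classify a new point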

Page 13:

Perceptron Algorithm in Python
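The original code did not survive extraction either; here is a from-scratch sketch of the PLA described on pages 9-10 (the toy data is assumed):

import numpy as np

# Toy data with the artificial coordinate x0 = 1 prepended; labels are +/-1
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 1.5],
              [1.0, -1.0, -1.0],
              [1.0, -2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.random.randn(3)                     # 1. randomly assign omega

for _ in range(100):                       # run PLA iterations
    misclassified = np.sign(X @ w) != y    # points with sign(w^T x) != y
    if not misclassified.any():            # all points correct: done
        break
    n = np.flatnonzero(misclassified)[0]   # pick a misclassified point
    w = w + y[n] * X[n]                    # PLA update: w <- w + y x

print(w)                                   # a separating hyperplane (if data is separable)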

Page 15:

The Best Hyperplane Separator?

• The perceptron finds one of the many possible hyperplanes separating the data, if one exists

• Of the many possible choices, which one is the best?

• Idea: utilize the distance information from all data samples

– We will see this formally when we discuss logistic regression

Page 16:

Classification: Logistic Regression

Page 17:

Classification: Logistic Regression

• Perceptron: makes use of the sign of the data

• Logistic regression: makes use of the distance of the data

• Logistic regression is a classification algorithm

– don't be confused by its name

• Goal: to find a classification boundary

Page 18:

Using Distances

(figures: the distances $h_i$ from each data sample to the decision boundary $g(x) = 0$)

Page 20:

Using all Distances

• Basic idea: find the decision boundary (hyperplane) $g(x) = \omega^T x = 0$ that maximizes $\prod_i h_i$ → optimization

– Inequality of arithmetic and geometric means:

$\dfrac{x_1 + x_2 + \cdots + x_m}{m} \geq \sqrt[m]{x_1 x_2 \cdots x_m}$

and equality holds if and only if $x_1 = x_2 = \cdots = x_m$

Page 21:

Using all Distances

• Roughly speaking, this optimization of $\max \prod_i h_i$ tends to position the hyperplane in the middle of the two classes

Page 22:

Sigmoid Function

• We link or squeeze $(-\infty, +\infty)$ to $(0, 1)$ for several reasons:

• $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is the sigmoid function, or the logistic function

– The logistic function always generates a value between 0 and 1

– It crosses 0.5 at the origin, then flattens out

(figure: the sigmoid compared with the step function)
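A quick numeric check of these properties (a minimal sketch; the sample inputs are arbitrary):

import numpy as np

def sigmoid(z):
    # logistic function: squeezes (-inf, +inf) into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))   # all values in (0, 1); sigmoid(0) = 0.5; flattens at the tails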

Page 23:

Sigmoid Function

• Benefits of mapping via the logistic function

– Monotonic: same or similar optimization solution

– Continuous and differentiable: good for gradient descent optimization

– Probability or confidence: the output can be considered as a probability

– $P(y = 1 \mid x) = \sigma(\omega^T x)$: probability that the label is $+1$

– $P(y = 0 \mid x) = 1 - \sigma(\omega^T x)$: probability that the label is $0$

Page 24:

Goal: We Need to Fit $\omega$ to our Data

• For a single data point $(x, y)$ with parameters $\omega$:

$P(y = 1 \mid x; \omega) = \sigma(\omega^T x), \qquad P(y = 0 \mid x; \omega) = 1 - \sigma(\omega^T x)$

• It can be compactly written as

$P(y \mid x; \omega) = \left(\sigma(\omega^T x)\right)^{y} \left(1 - \sigma(\omega^T x)\right)^{1 - y}$

Page 25:

Scikit-Learn for Logistic Regression
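As on page 12, the slide's code did not extract; a minimal sketch of a scikit-learn logistic regression fit, with made-up toy data and 0/1 labels matching the probability model above:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up toy data; labels are 0/1 as in P(y | x; omega)
X = np.array([[2.0, 2.0], [3.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, 0, 0])

clf = LogisticRegression()
clf.fit(X, y)

print(clf.coef_, clf.intercept_)         # learned omega and bias
print(clf.predict_proba([[1.0, 1.0]]))   # [P(y = 0 | x), P(y = 1 | x)]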

Page 26:

Multiclass Classification

• Generalization to more than 2 classes is straightforward (a sketch follows below)

– one vs. all (one vs. rest)

– one vs. one
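A minimal sketch of both strategies using scikit-learn's meta-estimators (the three-class toy data is made up):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Made-up three-class toy data
X = np.array([[0.0, 0.0], [0.5, 0.2],
              [3.0, 3.0], [3.2, 2.8],
              [-3.0, 3.0], [-2.8, 3.2]])
y = np.array([0, 0, 1, 1, 2, 2])

ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)  # one vs. all (rest)
ovo = OneVsOneClassifier(LogisticRegression()).fit(X, y)   # one vs. one

print(ovr.predict([[2.9, 3.1]]))   # both should agree on this easy point
print(ovo.predict([[2.9, 3.1]]))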

Page 27:

Classifying Non-linearly Separable Data

• Consider the binary classification problem

– each example is represented by a single feature $x$

– no linear separator exists for this data

Page 28:

Classifying Non-linearly Separable Data

• Consider the binary classification problem

– each example is represented by a single feature $x$

– no linear separator exists for this data

• Now map each example as $x \to (x, x^2)$ (see the sketch below)

• Data now becomes linearly separable in the new representation

• Linear in the new representation = nonlinear in the old representation
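A small illustration of this lift (the numbers are made up; any linear classifier works once the data is lifted):

import numpy as np
from sklearn.linear_model import Perceptron

# 1-D toy data: class 1 lies on both sides of class 0, so no single threshold on x works
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])

# Lift each example: x -> (x, x^2); a horizontal line x^2 = c now separates the classes
Z = np.column_stack([x, x ** 2])

clf = Perceptron().fit(Z, y)
print(clf.score(Z, y))   # should reach 1.0: linearly separable in the new representation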

Page 29:

Classifying Non-linearly Separable Data

• Let's look at another example

– each example is defined by two features, $x = (x_1, x_2)$

– no linear separator exists for this data

• Now map each example as $x = (x_1, x_2) \to z = \left(x_1^2, \sqrt{2}\, x_1 x_2, x_2^2\right)$ (see the identity below)

– each example now has three features (derived from the old representation)

• Data now becomes linearly separable in the new representation
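Assuming the standard quadratic feature map $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$ (the $\sqrt{2}$ is what makes the algebra exact), this particular lift connects directly to the kernel idea of the next page, since inner products in the new space reduce to inner products in the old one:

$$\phi(x)^T \phi(x') = x_1^2 {x'_1}^2 + 2\, x_1 x_2\, x'_1 x'_2 + x_2^2 {x'_2}^2 = (x_1 x'_1 + x_2 x'_2)^2 = (x^T x')^2$$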

Page 30:

Kernel

• Often we want to capture nonlinear patterns in the data

– nonlinear regression: the input-output relationship may not be linear

– nonlinear classification: classes may not be separable by a linear boundary

• Linear models (e.g., linear regression, linear SVM) are just not rich enough; instead:

– map the data to higher dimensions where it exhibits linear patterns

– apply the linear model in the new input feature space

– mapping = changing the feature representation

• Kernels: make linear models work in nonlinear settings (a sketch follows below)
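A minimal scikit-learn sketch of this idea, letting a kernel perform the mapping implicitly (the ring-shaped toy data is made up; a degree-2 polynomial kernel corresponds to the quadratic feature map of page 29):

import numpy as np
from sklearn.svm import SVC

# Made-up data: class 0 is a blob at the origin, class 1 a surrounding ring
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 50)
inner = rng.normal(0.0, 0.3, (50, 2))
outer = np.column_stack([3.0 * np.cos(theta), 3.0 * np.sin(theta)])
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

# No line separates the classes in 2-D; a degree-2 polynomial kernel can
clf = SVC(kernel="poly", degree=2).fit(X, y)
print(clf.score(X, y))   # should be 1.0 on this easy toy set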

Page 31:

Nonlinear Classification

https://www.youtube.com/watch?v=3liCbRZPrZA