Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Learning From Data
Lecture 10
Nonlinear Transforms
The Z-space
Polynomial transforms
Be careful
M. Magdon-IsmailCSCI 4100/6100
recap: The Linear Model
linear in x: gives the line/hyperplane separator
↓
s = wtx
↑linear in w: makes the algorithms work
CreditAnalysis
Amountof Credit
Approveor Deny
Probabilityof Default
Perceptron
Logistic Regression
Linear Regression
Classification ErrorPLA, Pocket,. . .
Cross-entropy ErrorGradient descent
Squared ErrorPseudo-inverse
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 2 /17 Limitations of linear −→
The Linear Model has its Limits
(a) Linear with outliers (b) Essentially nonlinear
To address (b) we need something more than linear.
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 3 /17 Change the features −→
Change Your Features
Years in Residence, Y
Income
Y ≫ 3 yearsno additional effect beyond Y = 3;
Y ≪ 0.3 yearsno additional effect below Y = 0.3.
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 4 /17 ‘Transform’ your features −→
Change Your Features Using a Transform
Years in Residence, Y
Income
z1Income
z1
Y
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 5 /17 Feature transform I: Z-space −→
Mechanics of the Feature Transform I
Transform the data to a Z-space in which the data is separable.
x1
x2 −→
z1 = x21
z2=x2 2
x =
1
x1
x2
−→ z = Φ(x) =
1
x21
x22
=
1
Φ1(x)
Φ2(x)
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 6 /17 Feature transform II: classify in Z-space −→
Mechanics of the Feature Transform II
Separate the data in the Z-space with w̃:
g̃(z) = sign(w̃tz)
−→
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 7 /17 Feature transform III: bring back to X -space −→
Mechanics of the Feature Transform III
To classify a new x, first transform x to Φ(x) ∈ Z-space and classify there with g̃.
g(x) = g̃(Φ(x))
= sign(w̃tΦ(x))g̃(z) = sign(w̃tz)
←−
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 8 /17 Summary of nonlinear transform −→
The General Feature Transform
X -space is Rd Z-space is Rd̃
x =
1x1...xd
z = Φ(x) =
1Φ1(x)
...Φd̃(x)
=
1z1...zd̃
x1,x2, . . . ,xN z1, z2, . . . , zN
y1, y2, . . . , yN y1, y2, . . . , yN
no weights w̃ =
w0
w1...wd̃
g(x) = sign(w̃tΦ(x))
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 9 /17 Generalization −→
Generalization
dvc d̃vc
d + 1 −→ d̃+ 1
Choose the feature transform with smallest d̃
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 10 /17 Many possibilities to choose from −→
Many Nonlinear Features May Work
x1
x2 ←− x2
1 + x22 = 0.6
z1 = (x1 + 0.05)2
z2=x2
0
z1 = x21
z2=x2 2
z1 = x21 + x22 − 0.6
A rat! A rat!
This is called data snooping: looking at your data and tailoring your H.
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 11 /17 Choose before looking at data −→
Must Choose Φ BEFORE Your Look at the Data
After constructing features carefully, before seeing the data . . .
. . . if you think linear is not enough, try the 2nd order polynomial transform.
1
x1
x2
= x −→ Φ(x) =
1
Φ1(x)
Φ2(x)
Φ3(x)
Φ4(x)
Φ5(x)
=
1
x1
x2
x21
x1x2
x22
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 12 /17 The polynomial transform −→
The General Polynomial Transform Φk
We can get even fancier: degree-k polynomial transform:
Φ1(x) = (1, x1, x2),
Φ2(x) = (1, x1, x2, x21, x1x2, x
22),
Φ3(x) = (1, x1, x2, x21, x1x2, x
22, x
31, x
21x2, x1x
22, x
32),
Φ4(x) = (1, x1, x2, x21, x1x2, x
22, x
31, x
21x2, x1x
22, x
32, x
41, x
31x2, x
21x
22, x1x
32, x
42),
...
– Dimensionality of the feature space increases rapidly (dvc)!
– Similar transforms for d-dimensional original space.
– Approximation-generalization tradeoffHigher degree gives lower (even zero) Ein but worse generalization.
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 13 /17 Be carefull with nonlinear transforms −→
Be Careful with Feature Transforms
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 14 /17 Insist on Ein = 0 −→
Be Careful with Feature Transforms
High order polynomial transform leads to “nonsense”.
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 15 /17 Digits data −→
Digits Data “1” Versus “All”
Average Intensity
Sym
metry
0.35
Average Intensity
Sym
metry
Linear model
Ein = 2.13%Eout = 2.38%
3rd order polynomial model
Ein = 1.75%Eout = 1.87%
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 16 /17 Use the linear model! −→
Use the Linear Model!
• First try a linear model – simple, robust and works.
• Algorithms can tolerate error plus you have nonlinear feature transforms.
• Choose a feature transform before seeing the data. Stay simple.Data snooping is hazardous to your Eout.
• Linear models are fundamental in their own right; they are also the building blocksof many more complex models like support vector machines.
• Nonlinear transforms also apply to regression and logistic regression.
c© AML Creator: Malik Magdon-Ismail Nonlinear Transforms: 17 /17