Linear Regression

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.


Regression

Given:
–  Data $X = \{x^{(1)}, \ldots, x^{(n)}\}$ where $x^{(i)} \in \mathbb{R}^d$
–  Corresponding labels $y = \{y^{(1)}, \ldots, y^{(n)}\}$ where $y^{(i)} \in \mathbb{R}$

[Figure: September Arctic sea ice extent (1,000,000 sq km) vs. year, 1975–2015, with linear regression and quadratic regression fits. Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)]

Prostate Cancer Dataset

•  97 samples, partitioned into 67 train / 30 test
•  Eight predictors (features):
–  6 continuous (4 log transforms), 1 binary, 1 ordinal
•  Continuous outcome variable:
–  lpsa: log(prostate specific antigen level)

Based on slide by Jeff Howbert

Linear Regression

•  Hypothesis:
   $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j$   (assume $x_0 = 1$)
•  Fit model by minimizing sum of squared errors

[Figures: example least-squares fits in one and two input dimensions; figures are courtesy of Greg Shakhnarovich]

Least Squares Linear Regression

•  Cost function:
   $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$
•  Fit by solving $\min_\theta J(\theta)$

Intuition Behind Cost Function

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$

Based on example by Andrew Ng

Intuition Behind Cost Function

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$

[Figure: left, $h_\theta(x)$ plotted against $x$ (for fixed $\theta$, this is a function of $x$); right, $J(\theta_1)$ plotted against $\theta_1$ (a function of the parameter)]

Based on example by Andrew Ng

Intuition Behind Cost Function

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$

$J([0, 0.5]) = \frac{1}{2 \times 3}\left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] \approx 0.58$

[Figure: left, the training points (1, 1), (2, 2), (3, 3) and the line $h_\theta(x) = 0.5x$; right, the corresponding point on the curve of $J(\theta_1)$]

Based on example by Andrew Ng
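As a quick check of this worked example, here is a minimal Python sketch (assuming the three training points (1, 1), (2, 2), (3, 3) read off the example figure) that evaluates the cost for θ = [0, 0.5] and θ = [0, 0]:

```python
import numpy as np

# Toy training set from the example: x = 1, 2, 3 with labels y = 1, 2, 3 (assumed from the figure)
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def cost(theta0, theta1):
    """Least-squares cost J(theta) = 1/(2n) * sum over i of (h_theta(x_i) - y_i)^2."""
    n = len(x)
    predictions = theta0 + theta1 * x      # h_theta(x) = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * n)

print(cost(0.0, 0.5))   # ~0.583, matching J([0, 0.5]) on this slide
print(cost(0.0, 0.0))   # ~2.333, matching J([0, 0]) on the next slide
```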

Intuition Behind Cost Function

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$

$J([0, 0]) \approx 2.333$

$J(\theta)$ is convex

[Figure: left, the training points and the horizontal line $h_\theta(x) = 0$; right, the curve of $J(\theta_1)$ with both evaluated points marked]

Based on example by Andrew Ng

Intuition Behind Cost Function

[Figures: a sequence of slides showing, side by side, $h_\theta(x)$ for a fixed choice of parameters (for fixed $\theta_0, \theta_1$, this is a function of $x$) and the corresponding point on the contour plot of $J(\theta_0, \theta_1)$ (a function of the parameters), for several different parameter settings]

Slides by Andrew Ng

Basic Search Procedure

•  Choose initial value for $\theta$
•  Until we reach a minimum:
–  Choose a new value for $\theta$ to reduce $J(\theta)$

[Figure: surface plot of $J(\theta_0, \theta_1)$ with the search descending toward a minimum; figure by Andrew Ng]

Since the least squares objective function is convex, we don't need to worry about local minima.

Gradient Descent

•  Initialize $\theta$
•  Repeat until convergence:
   $\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$   (simultaneous update for $j = 0 \ldots d$)

α is the learning rate (small), e.g., α = 0.05

[Figure: the curve of $J(\theta)$ with gradient steps descending toward the minimum]

Gradient Descent

•  Initialize $\theta$
•  Repeat until convergence:
   $\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$   (simultaneous update for $j = 0 \ldots d$)

For linear regression:

$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$

$\qquad = \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^2$

$\qquad = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)$

$\qquad = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)}$

$\qquad = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$

Gradient Descent for Linear Regression

•  Initialize $\theta$
•  Repeat until convergence:
   $\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$   (simultaneous update for $j = 0 \ldots d$)

•  To achieve simultaneous update:
•  At the start of each GD iteration, compute $h_\theta\!\left(x^{(i)}\right)$
•  Use this stored value in the update step loop

•  Assume convergence when $\|\theta_{new} - \theta_{old}\|_2 < \epsilon$

L2 norm: $\|v\|_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \ldots + v_{|v|}^2}$
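A minimal Python sketch of this procedure, assuming a numpy design matrix X whose first column is already all ones and a label vector y; the function name, the default learning rate, and the tolerance are illustrative choices:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-6, max_iters=100000):
    """Batch gradient descent for least-squares linear regression.

    X : (n, d+1) design matrix with a leading column of ones (x0 = 1)
    y : (n,) label vector
    """
    n, d_plus_1 = X.shape
    theta = np.zeros(d_plus_1)                 # initialize theta
    for _ in range(max_iters):
        h = X @ theta                          # compute h_theta(x^(i)) once per iteration
        grad = X.T @ (h - y) / n               # 1/n * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
        theta_new = theta - alpha * grad       # simultaneous update for j = 0 ... d
        if np.linalg.norm(theta_new - theta) < eps:   # ||theta_new - theta||_2 < eps
            return theta_new
        theta = theta_new
    return theta
```

For example, on the toy data above, gradient_descent(np.c_[np.ones(3), np.array([1.0, 2.0, 3.0])], np.array([1.0, 2.0, 3.0])) converges to roughly θ ≈ [0, 1].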

Gradient Descent

[Figures: a sequence of slides showing, side by side, the current hypothesis $h_\theta(x)$ over the training data (for fixed $\theta$, this is a function of $x$) and the corresponding point on the contour plot of $J(\theta_0, \theta_1)$ (a function of the parameters) as gradient descent proceeds; the sequence starts from the initial hypothesis $h(x) = -900 - 0.1x$]

Slides by Andrew Ng

Choosing α

α too small: slow convergence

α too large: increasing value for $J(\theta)$
•  May overshoot the minimum
•  May fail to converge
•  May even diverge

To see if gradient descent is working, print out $J(\theta)$ each iteration
•  The value should decrease at each iteration
•  If it doesn't, adjust α

Extending Linear Regression to More Complex Models

•  The inputs X for linear regression can be:
–  Original quantitative inputs
–  Transformations of quantitative inputs
   •  e.g., log, exp, square root, square, etc.
–  Polynomial transformations
   •  example: y = β0 + β1·x + β2·x² + β3·x³
–  Basis expansions
–  Dummy coding of categorical inputs
–  Interactions between variables
   •  example: x3 = x1·x2

This allows use of linear regression techniques to fit non-linear datasets.

Linear Basis Function Models

•  Generally,
   $h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$
   where $\phi_j(x)$ is a basis function
•  Typically, $\phi_0(x) = 1$ so that $\theta_0$ acts as a bias
•  In the simplest case, we use linear basis functions: $\phi_j(x) = x_j$

Based on slide by Christopher Bishop (PRML)

Linear Basis Function Models

•  Polynomial basis functions: $\phi_j(x) = x^j$
–  These are global; a small change in x affects all basis functions
•  Gaussian basis functions: $\phi_j(x) = \exp\!\left( -\frac{(x - \mu_j)^2}{2s^2} \right)$
–  These are local; a small change in x only affects nearby basis functions. μj and s control location and scale (width).

Based on slide by Christopher Bishop (PRML)

Linear Basis Function Models

•  Sigmoidal basis functions: $\phi_j(x) = \sigma\!\left( \frac{x - \mu_j}{s} \right)$, where $\sigma(a) = \frac{1}{1 + \exp(-a)}$
–  These are also local; a small change in x only affects nearby basis functions. μj and s control location and scale (slope).

Based on slide by Christopher Bishop (PRML)
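A small sketch of the two local basis functions described above, following the standard PRML parameterization; the particular inputs, centers, and widths below are arbitrary illustrative values:

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)): a local, bump-shaped basis function."""
    return np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoidal_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s), with sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

# Map scalar inputs to feature vectors [phi_1(x), ..., phi_m(x)] using a grid of centers mu_j
x = np.linspace(0.0, 1.0, 5)[:, None]        # 5 example inputs (column)
centers = np.linspace(0.0, 1.0, 4)[None, :]  # 4 centers mu_j (row)
Phi_gauss = gaussian_basis(x, centers, s=0.2)    # shape (5, 4)
Phi_sigm = sigmoidal_basis(x, centers, s=0.1)    # shape (5, 4)
```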

Example of Fitting a Polynomial Curve with a Linear Model

$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j$
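As a concrete sketch of fitting such a curve, the snippet below builds the design matrix with columns 1, x, x², ..., x^p and solves for θ by least squares; the synthetic sine data and the use of numpy.vander / numpy.linalg.lstsq are illustrative choices, not something the slides prescribe:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)   # noisy nonlinear target

p = 3                                               # polynomial degree
X_poly = np.vander(x, p + 1, increasing=True)       # columns: 1, x, x^2, ..., x^p
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)  # least-squares fit of the linear model

y_hat = X_poly @ theta                              # fitted curve: sum_j theta_j x^j
```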

Linear Basis Function Models

•  Basic linear model:
   $h_\theta(x) = \sum_{j=0}^{d} \theta_j x_j$
•  Generalized linear model:
   $h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$
•  Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model
–  Unless we use the kernel trick – more on that when we cover support vector machines
–  Therefore, there is no point in cluttering the math with basis functions

Based on slide by Geoff Hinton

Linear Algebra Concepts

•  A vector in $\mathbb{R}^d$ is an ordered set of d real numbers
–  e.g., v = [1, 6, 3, 4] is in $\mathbb{R}^4$
–  "[1, 6, 3, 4]" is a column vector: $\begin{pmatrix} 1 \\ 6 \\ 3 \\ 4 \end{pmatrix}$
–  as opposed to a row vector: $\begin{pmatrix} 1 & 6 & 3 & 4 \end{pmatrix}$
•  An m-by-n matrix is an object with m rows and n columns, where each entry is a real number

Based on slides by Joseph Bradley

Linear Algebra Concepts

•  Transpose: reflect vector/matrix across the diagonal:
   $\begin{pmatrix} a \\ b \end{pmatrix}^{\!T} = \begin{pmatrix} a & b \end{pmatrix}$, $\qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{\!T} = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$
–  Note: $(Ax)^T = x^T A^T$ (we'll define multiplication soon…)
•  Vector norms:
–  Lp norm of $v = (v_1, \ldots, v_k)$ is $\left( \sum_i |v_i|^p \right)^{1/p}$
–  Common norms: L1, L2
–  L-infinity = $\max_i |v_i|$
•  Length of a vector v is L2(v)

Based on slides by Joseph Bradley

Linear Algebra Concepts

•  Vector dot product:
   $u \cdot v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \cdot \begin{pmatrix} v_1 & v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$
–  Note: dot product of u with itself = length(u)² = $\|u\|_2^2$
•  Matrix product:
   $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, $\quad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$
   $AB = \begin{pmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{pmatrix}$

Based on slides by Joseph Bradley

Linear Algebra Concepts

•  Vector products:
–  Dot product:
   $u \cdot v = u^T v = \begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$
–  Outer product:
   $u v^T = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \begin{pmatrix} v_1 & v_2 \end{pmatrix} = \begin{pmatrix} u_1 v_1 & u_1 v_2 \\ u_2 v_1 & u_2 v_2 \end{pmatrix}$

Based on slides by Joseph Bradley
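The same operations in numpy, for reference; this is a minimal sketch and the particular vectors and matrices are arbitrary:

```python
import numpy as np

u = np.array([1.0, 6.0, 3.0, 4.0])       # the example vector from the slides
v = np.array([2.0, 0.0, 1.0, 5.0])       # an arbitrary second vector

u_col = u.reshape(-1, 1)                 # column vector; u_col.T is the row vector

dot = u @ v                              # dot product: u1*v1 + ... + u4*v4
outer = np.outer(u, v)                   # outer product u v^T, a 4x4 matrix
length_sq = u @ u                        # dot product of u with itself = ||u||_2^2

l1 = np.linalg.norm(u, 1)                # L1 norm
l2 = np.linalg.norm(u)                   # L2 norm (the length of u)
linf = np.linalg.norm(u, np.inf)         # L-infinity norm = max_i |u_i|

A = np.arange(6.0).reshape(2, 3)         # a 2x3 matrix
B = np.arange(12.0).reshape(3, 4)        # a 3x4 matrix
AB = A @ B                               # matrix product, shape (2, 4)
x_vec = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose((B @ x_vec).T, x_vec.T @ B.T)   # (Ax)^T = x^T A^T
```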

Vectorization

•  Benefits of vectorization
–  More compact equations
–  Faster code (using optimized matrix libraries)
•  Consider our model:
   $h(x) = \sum_{j=0}^{d} \theta_j x_j$
•  Let
   $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix}$, $\qquad x^T = \begin{bmatrix} 1 & x_1 & \ldots & x_d \end{bmatrix}$
•  Can write the model in vectorized form as $h(x) = \theta^T x$

Vectorization

•  Consider our model for n instances:
   $h\!\left(x^{(i)}\right) = \sum_{j=0}^{d} \theta_j x_j^{(i)}$
•  Let
   $\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \in \mathbb{R}^{(d+1) \times 1}$, $\qquad X = \begin{bmatrix} 1 & x_1^{(1)} & \ldots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(i)} & \ldots & x_d^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \ldots & x_d^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times (d+1)}$
•  Can write the model in vectorized form as $h_\theta(x) = X\theta$

Vectorization

•  For the linear regression cost function:

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 = \frac{1}{2n} \sum_{i=1}^{n} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2n} (X\theta - y)^T (X\theta - y)$

where $X \in \mathbb{R}^{n \times (d+1)}$, $\theta \in \mathbb{R}^{(d+1) \times 1}$, and

Let: $y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times 1}$
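A sketch of the vectorized cost in numpy, assuming X already includes the leading column of ones:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2n) * (X theta - y)^T (X theta - y)."""
    n = X.shape[0]
    residual = X @ theta - y          # vector of h_theta(x^(i)) - y^(i)
    return (residual @ residual) / (2 * n)
```

This computes exactly the same value as the per-example summation form, but in two matrix operations rather than an explicit loop over i.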

Closed Form Solution

•  Instead of using GD, solve for the optimal $\theta$ analytically
–  Notice that the solution is where $\frac{\partial}{\partial \theta} J(\theta) = 0$

•  Derivation:

$J(\theta) = \frac{1}{2n} (X\theta - y)^T (X\theta - y) \propto \theta^T X^T X \theta - y^T X \theta - \theta^T X^T y + y^T y \propto \theta^T X^T X \theta - 2\,\theta^T X^T y + y^T y$

(the two middle terms are equal because $y^T X \theta$ is 1×1)

Take the derivative, set it equal to 0, then solve for $\theta$:

$\frac{\partial}{\partial \theta} \left( \theta^T X^T X \theta - 2\,\theta^T X^T y + y^T y \right) = 0$

$(X^T X)\theta - X^T y = 0$

$(X^T X)\theta = X^T y$

$\theta = (X^T X)^{-1} X^T y$

Closed Form Solution

•  Can obtain $\theta$ by simply plugging X and y into

$\theta = (X^T X)^{-1} X^T y$, $\qquad X = \begin{bmatrix} 1 & x_1^{(1)} & \ldots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(i)} & \ldots & x_d^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \ldots & x_d^{(n)} \end{bmatrix}$, $\qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$

•  If $X^T X$ is not invertible (i.e., singular), may need to:
–  Use the pseudo-inverse instead of the inverse
   •  In Python, numpy.linalg.pinv(a)
–  Remove redundant (not linearly independent) features
–  Remove extra features to ensure that d ≤ n
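A sketch of the closed-form solution; numpy.linalg.solve handles the usual case by solving (XᵀX)θ = Xᵀy directly, and numpy.linalg.pinv is the fallback the slide mentions for a singular XᵀX:

```python
import numpy as np

def fit_closed_form(X, y):
    """theta = (X^T X)^(-1) X^T y, with a pseudo-inverse fallback if X^T X is singular."""
    XtX = X.T @ X
    Xty = X.T @ y
    try:
        return np.linalg.solve(XtX, Xty)     # solve (X^T X) theta = X^T y
    except np.linalg.LinAlgError:
        return np.linalg.pinv(XtX) @ Xty     # pseudo-inverse when X^T X is singular

# Example with the toy data used earlier (y = x exactly), so theta comes out near [0, 1]
X = np.c_[np.ones(3), np.array([1.0, 2.0, 3.0])]
y = np.array([1.0, 2.0, 3.0])
print(fit_closed_form(X, y))
```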

Gradient Descent vs. Closed Form Solution

Gradient Descent:
•  Requires multiple iterations
•  Need to choose α
•  Works well when n is large
•  Can support incremental learning

Closed Form Solution:
•  Non-iterative
•  No need for α
•  Slow if n is large
–  Computing the solution requires forming $X^T X$ (roughly $O(nd^2)$) and inverting it (roughly $O(d^3)$)

Improving Learning: Feature Scaling

•  Idea: Ensure that features have similar scales
•  Makes gradient descent converge much faster

[Figure: contours of $J(\theta_1, \theta_2)$ before feature scaling and after feature scaling]

Feature Standardization

•  Rescales features to have zero mean and unit variance
–  Let μj be the mean of feature j:
   $\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_j^{(i)}$
–  Replace each value with:
   $x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$   for j = 1 ... d (not x0!)
•  sj is the standard deviation of feature j
•  Could also use the range of feature j (maxj – minj) for sj
•  Must apply the same transformation to instances for both training and prediction
•  Outliers can cause problems
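A sketch of feature standardization, assuming the raw features are the columns of a numpy array; the key points from the slide are that μ and s are computed on the training set only and then reused at prediction time, and that the bias column x0 = 1 is added after scaling:

```python
import numpy as np

def standardize_fit(X_raw):
    """Compute the per-feature mean mu_j and standard deviation s_j from training data."""
    return X_raw.mean(axis=0), X_raw.std(axis=0)

def standardize_apply(X_raw, mu, s):
    """Replace each value x_j with (x_j - mu_j) / s_j."""
    return (X_raw - mu) / s

# Illustrative training data: two raw features (e.g., size and number of rooms)
X_train = np.array([[2000.0, 3.0],
                    [1500.0, 2.0],
                    [2500.0, 4.0]])
mu, s = standardize_fit(X_train)
X_train_std = standardize_apply(X_train, mu, s)
X_design = np.c_[np.ones(len(X_train_std)), X_train_std]   # add x0 = 1 after scaling

# At prediction time, apply the *same* mu and s to new instances:
x_new_std = standardize_apply(np.array([[1800.0, 3.0]]), mu, s)
```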

Quality of Fit

Overfitting:
•  The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
•  ... but fails to generalize to new examples

[Figure: price vs. size with three fits – underfitting (high bias), correct fit, and overfitting (high variance)]

Based on example by Andrew Ng

Regularization

•  A method for automatically controlling the complexity of the learned hypothesis
•  Idea: penalize large values of $\theta_j$
–  Can incorporate into the cost function
–  Works well when we have a lot of features, each of which contributes a bit to predicting the label
•  Can also address overfitting by eliminating features (either manually or via model selection)

Regularization

•  Linear regression objective function:

$J(\theta) = \underbrace{\frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2}_{\text{model fit to data}} + \underbrace{\lambda \sum_{j=1}^{d} \theta_j^2}_{\text{regularization}}$

–  λ is the regularization parameter (λ ≥ 0)
–  No regularization on θ0!

•  The same objective, with the penalty written with the λ/2 scaling used on the remaining slides:

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

Understanding Regularization

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

•  Note that $\sum_{j=1}^{d} \theta_j^2 = \|\theta_{1:d}\|_2^2$
–  This is the magnitude of the feature coefficient vector!
•  We can also think of this as:
   $\sum_{j=1}^{d} (\theta_j - 0)^2 = \|\theta_{1:d} - \vec{0}\|_2^2$
•  L2 regularization pulls coefficients toward 0

Understanding Regularization

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

•  What happens if we set λ to be huge (e.g., $10^{10}$)?

[Figure: price vs. size with a high-degree polynomial fit]

Based on example by Andrew Ng

Understanding Regularization

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

•  What happens if we set λ to be huge (e.g., $10^{10}$)?
–  The penalized coefficients are driven to ≈ 0

[Figure: price vs. size; with a huge λ, the coefficients θ1, θ2, θ3, θ4 are all ≈ 0 and the fit reduces to the horizontal line $h_\theta(x) = \theta_0$]

Based on example by Andrew Ng

Regularized Linear Regression

•  Cost function:

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

•  Fit by solving $\min_\theta J(\theta)$

•  Gradient update (using $\frac{\partial}{\partial \theta_0} J(\theta)$ and $\frac{\partial}{\partial \theta_j} J(\theta)$):

$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$

$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha \lambda \theta_j$   (the $-\alpha\lambda\theta_j$ term is the regularization)

Regularized Linear Regression

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$

$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$

$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha \lambda \theta_j$

•  We can rewrite the gradient step as:

$\theta_j \leftarrow \theta_j (1 - \alpha\lambda) - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$
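A sketch of this regularized update in numpy: the shrinkage factor (1 − αλ) is applied to θ1 ... θd but not to θ0, matching the two update rules above:

```python
import numpy as np

def regularized_gd_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for linear regression.

    theta_0 <- theta_0 - alpha * (1/n) * sum_i (h(x^(i)) - y^(i))
    theta_j <- theta_j * (1 - alpha*lam) - alpha * (1/n) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
    """
    n = X.shape[0]
    grad = X.T @ (X @ theta - y) / n           # unregularized gradient for every j
    theta_new = theta - alpha * grad
    theta_new[1:] -= alpha * lam * theta[1:]   # extra shrinkage on every theta_j except theta_0
    return theta_new
```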

Regularized Linear Regression

•  To incorporate regularization into the closed form solution:

$\theta = \left( X^T X + \lambda \begin{bmatrix} 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \end{bmatrix} \right)^{-1} X^T y$

Regularized Linear Regression

•  To incorporate regularization into the closed form solution:

$\theta = \left( X^T X + \lambda \begin{bmatrix} 0 & 0 & 0 & \ldots & 0 \\ 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \end{bmatrix} \right)^{-1} X^T y$

•  Can derive this the same way, by solving $\frac{\partial}{\partial \theta} J(\theta) = 0$
•  Can prove that for λ > 0, the inverse exists in the equation above
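A sketch of the regularized closed-form solution, building the diagonal matrix with a zero in the θ0 position exactly as written above; the slide's claim that the inverse exists for λ > 0 is what lets us call a direct solver here:

```python
import numpy as np

def fit_regularized_closed_form(X, y, lam):
    """theta = (X^T X + lam * E)^(-1) X^T y, where E = diag(0, 1, ..., 1)."""
    d_plus_1 = X.shape[1]
    E = np.eye(d_plus_1)
    E[0, 0] = 0.0                       # do not regularize the bias term theta_0
    return np.linalg.solve(X.T @ X + lam * E, X.T @ y)
```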