UVA CS 4501: Machine Learning. Lecture 8: Review of Regression. Dr. Yanjun Qi, University of Virginia, Department of Computer Science


  • UVA CS 4501: Machine Learning

    Lecture 8: Review of Regression

    Dr. Yanjun Qi

    University of Virginia, Department of Computer Science

  • Where are we? → Five major sections of this course

    - Regression (supervised)
    - Classification (supervised)
    - Unsupervised models
    - Learning theory
    - Graphical models

  • Regression (supervised)

    - Four ways to train / perform optimization for linear regression models:
      - Normal Equation
      - Gradient Descent (GD)
      - Stochastic GD
      - Newton's method

    - Supervised regression models:
      - Linear regression (LR)
      - LR with non-linear basis functions
      - Locally weighted LR
      - LR with regularizations

  • Lecture 3

    - Linear regression (aka least squares)
    - Learn to derive the least-squares estimate by normal equation
    - Evaluation with cross-validation

  • Lecture 4

    - More ways to train / perform optimization for linear regression models:
      - Review: Gradient Descent
      - Gradient Descent (GD) for LR
      - Stochastic GD (SGD) for LR

  • Lecture 5

    - Regression models beyond linear:
      - LR with non-linear basis functions
      - Instance-based regression: K-Nearest Neighbors
      - Locally weighted linear regression
      - Regression trees and multilinear interpolation (later)

  • Lecture 6

    - Linear regression models with regularizations:
      - Review: (Ordinary) least squares: squared loss (normal equation)
      - Ridge regression: squared loss with L2 regularization
      - Lasso regression: squared loss with L1 regularization
      - Elastic net regression: squared loss with L1 AND L2 regularization
      - WHY, and the influence of the regularization parameter

  • Lecture 7

    - Feature selection:
      - General introduction
      - Filtering
      - Wrapper
      - Embedded methods

  • An operational model of machine learning (figure): during training/evaluation, a Learner consumes Reference Data, which consists of (x, y) pairs, and produces a Model f; at deployment (e.g., security), an Execution Engine applies Model f to Production Data to produce Tagged Data.

  • This Course: Before Deployment (figure of a typical ML pipeline): low-level sensing → pre-processing (e.g., data cleaning) → feature extraction → feature selection (task-relevant) → training (optimization, with label collection) → evaluation → testing (inference, prediction, recognition).

  • Machine Learning in a Nutshell

    - Task
    - Representation
    - Score Function
    - Search/Optimization
    - Models, Parameters

  • Multivariate Linear Regression

    - Task: Regression
    - Representation: Y = weighted linear sum of X's: $\hat{y} = f(x) = \theta^T x$
    - Score Function: Least-squares
    - Search/Optimization: GD / SGD / linear algebra (normal equation)
    - Models, Parameters: Regression coefficients
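As a concrete illustration of the normal-equation and GD/SGD training options above, here is a minimal NumPy sketch; the synthetic data, sizes, and learning rate are illustrative assumptions, not from the slides:

```python
import numpy as np

# Synthetic data: n samples, p features (illustrative only)
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])  # prepend a bias column
theta_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ theta_true + 0.1 * rng.normal(size=n)

# Normal equation: theta* = (X^T X)^{-1} X^T y, solved without explicit inversion
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the least-squares objective
theta_gd = np.zeros(p + 1)
lr = 0.01
for _ in range(5000):
    grad = X.T @ (X @ theta_gd - y) / n   # gradient of (1/2n) * ||X theta - y||^2
    theta_gd -= lr * grad

print(theta_ne, theta_gd)  # both should be close to theta_true
```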

  • Multivariate Linear Regression with Basis Expansion

    - Task: Regression
    - Representation: Y = weighted linear sum of (X basis expansion)
    - Score Function: SSE
    - Search/Optimization: Linear algebra
    - Models, Parameters: Regression coefficients

    $\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \varphi(x)^T \theta$
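A minimal sketch of a basis-expanded fit, assuming polynomial features $\varphi_j(x) = x^j$ (one of many possible basis choices; the data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=80)
y = np.sin(x) + 0.1 * rng.normal(size=80)   # non-linear target

# Basis expansion: Phi[i, j] = phi_j(x_i) = x_i^j for j = 0..m (phi_0 = 1 is the bias)
m = 5
Phi = np.vander(x, m + 1, increasing=True)

# Same least-squares machinery as plain LR, applied to Phi instead of X
theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
y_hat = Phi @ theta
print("training SSE:", np.sum((y - y_hat) ** 2))
```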

  • K-Nearest Neighbor

    - Task: Regression / classification
    - Representation: Local smoothness
    - Score Function: NA
    - Search/Optimization: NA
    - Models, Parameters: Training samples
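Since K-NN has no explicit score function or optimization step (the training samples themselves are the model), prediction is just a local average. A minimal sketch, assuming Euclidean distance and the mean of the k nearest targets:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=5):
    """Predict y at x_query as the mean of its k nearest training targets."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                     # indices of the k closest samples
    return y_train[nearest].mean()

# Tiny usage example (illustrative data)
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 1.1, 1.9, 3.2, 4.1])
print(knn_regress(X_train, y_train, np.array([2.5]), k=2))  # averages y at x=2 and x=3
```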

  • Locally Weighted / Kernel Linear Regression

    - Task: Regression
    - Representation: Y = weighted linear sum of X's
    - Score Function: Weighted SSE
    - Search/Optimization: Linear algebra
    - Models, Parameters: Local regression coefficients (conditioned on each test point)

    $\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0$

    $\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\left[\, y_i - \alpha(x_0) - \beta(x_0)\, x_i \,\right]^2$

    $\theta^*(x_0) = (B^T W(x_0) B)^{-1} B^T W(x_0)\, y$
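A minimal sketch of the closed-form solution above for 1-D inputs, assuming a Gaussian kernel $K_\lambda(x_0, x_i) = \exp(-(x_0 - x_i)^2 / (2\lambda^2))$ for the weights (the kernel choice and bandwidth are illustrative assumptions):

```python
import numpy as np

def lwr_predict(x_train, y_train, x0, lam=0.5):
    """Locally weighted linear regression at query point x0 (1-D inputs)."""
    B = np.column_stack([np.ones_like(x_train), x_train])   # design matrix [1, x]
    w = np.exp(-(x_train - x0) ** 2 / (2 * lam ** 2))       # Gaussian kernel weights
    W = np.diag(w)
    # theta*(x0) = (B^T W B)^{-1} B^T W y, solved without explicit inversion
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y_train)
    return theta[0] + theta[1] * x0                         # alpha(x0) + beta(x0) * x0

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 50))
y = np.sin(x) + 0.1 * rng.normal(size=50)
print(lwr_predict(x, y, x0=3.0))   # should be near sin(3.0)
```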

  • Regularized Multivariate Linear Regression

    - Task: Regression
    - Representation: Y = weighted linear sum of X's
    - Score Function: Least-squares + regularization
    - Search/Optimization: Linear algebra for Ridge / sub-GD for Lasso & Elastic net
    - Models, Parameters: Regression coefficients (regularized weights)

    $\min_\beta\; J(\beta) = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 + \lambda \left( \sum_{j=1}^{p} |\beta_j|^q \right)^{1/q}$
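For the q = 2 (ridge) case, the penalized least-squares problem keeps a closed-form solution via linear algebra. A minimal sketch (for brevity this version also penalizes the bias term, which implementations often exclude):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: beta = (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
beta_true = np.zeros(10)
beta_true[:3] = [3.0, -2.0, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=50)

for lam in [0.0, 1.0, 100.0]:
    beta = ridge_fit(X, y, lam)
    print(lam, np.round(beta[:4], 2))   # a larger lam shrinks coefficients toward 0
```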

  • Feature selection: filters vs. wrappers vs. embedding

    - Main goal: rank subsets of useful features

    From Dr. Isabelle Guyon

  • Complexity versus Goodness of Fit: Model Selection

    (Figure: four panels of the same training data (x, y), fit with curves of increasing complexity: too simple? about right? too complex?)

    - Too simple: low variance / high bias
    - Too complex: low bias / high variance
    - What ultimately matters: GENERALIZATION

  • e.g., by k = 10-fold Cross-Validation

    model  P1     P2     P3     P4     P5     P6     P7     P8     P9     P10
    1      train  train  train  train  train  train  train  train  train  test
    2      train  train  train  train  train  train  train  train  test   train
    3      train  train  train  train  train  train  train  test   train  train
    4      train  train  train  train  train  train  test   train  train  train
    5      train  train  train  train  train  test   train  train  train  train
    6      train  train  train  train  test   train  train  train  train  train
    7      train  train  train  test   train  train  train  train  train  train
    8      train  train  test   train  train  train  train  train  train  train
    9      train  test   train  train  train  train  train  train  train  train
    10     test   train  train  train  train  train  train  train  train  train

    •  Divide data into 10 equal pieces
    •  9 pieces as training set, the remaining 1 as test set
    •  Collect the scores from the diagonal
    •  We normally use the mean of the scores

    Make sure that the train/test/validation folds are indeed independent samples.
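A minimal sketch of this 10-fold procedure, assuming a least-squares fit and test MSE as the per-fold score (the data and shuffling seed are illustrative):

```python
import numpy as np

def ten_fold_cv_mse(X, y, k=10):
    """Mean test MSE over k folds, each fold taking one turn as the test set."""
    n = len(y)
    idx = np.random.default_rng(0).permutation(n)   # shuffle before splitting
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        theta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]  # fit on 9 pieces
        resid = X[test] @ theta - y[test]                           # score the held-out piece
        scores.append(np.mean(resid ** 2))
    return np.mean(scores)   # report the mean of the fold scores

rng = np.random.default_rng(1)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=100)
print(ten_fold_cv_mse(X, y))
```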

  • Evaluation, e.g., Regression (1-D example)

    Model and least-squares estimate:

    $\hat{y} = \theta_0 + \theta_1 x_1$, with $\theta^* = (X^T X)^{-1} X^T y$

    (Figure: fitted line with residuals $\varepsilon_1, \varepsilon_2, \ldots$ marked at each test point, where $\varepsilon_i := x_i^T \theta^* - y_i$)

    Testing MSE error to report:

    $J_{test} = \frac{1}{m} \sum_{i=n+1}^{n+m} (x_i^T \theta^* - y_i)^2 = \frac{1}{m} \sum_{i=n+1}^{n+m} \varepsilon_i^2$
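A minimal sketch of reporting $J_{test}$, assuming the first n samples are the training split and the next m the test split (the data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 80, 20
x = rng.uniform(0, 10, size=n + m)
X = np.column_stack([np.ones(n + m), x])        # [1, x1] design matrix
y = 2.0 + 0.5 * x + 0.3 * rng.normal(size=n + m)

# theta* from the normal equation, fit on the training split only
theta_star = np.linalg.solve(X[:n].T @ X[:n], X[:n].T @ y[:n])

eps = X[n:] @ theta_star - y[n:]                # residuals on the m test points
J_test = np.mean(eps ** 2)                      # J_test = (1/m) * sum(eps_i^2)
print("test MSE:", J_test)
```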

  • e.g., A Practical Application of a Regression Model

    Movie Reviews and Revenues: An Experiment in Text Regression. Proceedings of HLT '10: Human Language Technologies.

  • The feature weights can be directly interpreted as U.S. dollars contributed to the predicted value ŷ by each occurrence of the feature.

    A REAL APPLICATION: Movie Reviews and Revenues: An Experiment in Text Regression. Proceedings of HLT '10: Human Language Technologies.

  • A combination of the meta and text features achieves the best performance both in terms of MAE and Pearson's r.

  • Movie Reviews and Revenues: An Experiment in Text Regression. Proceedings of HLT '10: Human Language Technologies.

    The features are from the text-only model annotated in Table 2 (total, not per screen). The feature weights can be directly interpreted as U.S. dollars contributed to the predicted value by each occurrence of the feature. Sentiment-related text features are not as prominent as might be expected, and their overall proportion in the set of features with non-zero weights is quite small (estimated in preliminary trials at less than 15%). Phrases that refer to metadata are the more highly weighted and frequent ones.

  • More for Measuring Regression Predictions: Correlation Coefficient

    •  Pearson correlation coefficient: measures the linear correlation between two sequences x and y, giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.

    $r(x, y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2}}$

    where $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$ and $\bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i$, and $|r(x, y)| \le 1$.

    •  For regression: $r(\vec{y}_{predicted},\, \vec{y}_{known})$
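A minimal sketch of computing Pearson's r between predicted and known values; NumPy's np.corrcoef implements the same formula and is shown for comparison:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    xm, ym = x - x.mean(), y - y.mean()
    return np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2))

y_known = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
print(pearson_r(y_pred, y_known))              # close to +1: strong positive correlation
print(np.corrcoef(y_pred, y_known)[0, 1])      # same value from NumPy's built-in
```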

  • An Operational Model of Machine Learning (figure): a Learner consumes Reference Data, which consists of input-output pairs, and produces a Model; an Execution Engine applies the Model to Production Data, yielding Tagged Data at deployment.

  • More Goals in General

    •  1. Generalize well
       –  Connecting to asymptotic ERROR BOUND
    •  2. Interpretable
       –  Especially for some domains, this is about trust!
    •  3. Computational efficiency + scalable
    •  4. Robustness

  • Probabilistic Interpretation of Linear Regression (LATER)

    •  Let us assume that the target variable and the inputs are related by the equation:

       $y_i = \theta^T x_i + \varepsilon_i$

       where $\varepsilon$ is an error term of unmodeled effects or random noise.

    •  Now assume that $\varepsilon$ follows a Gaussian $N(0, \sigma)$; then we have:

       $p(y_i \mid x_i; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right)$

    •  By the i.i.d. (among samples) assumption:

       $L(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^{n} \exp\!\left( -\frac{\sum_{i=1}^{n} (y_i - \theta^T x_i)^2}{2\sigma^2} \right)$

    Many more variations of linear regression follow from this perspective, e.g., binomial / Poisson (LATER)
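A minimal sketch verifying numerically that maximizing this Gaussian likelihood is equivalent to minimizing the sum of squared errors (the log-likelihood is a constant minus SSE / (2σ²); the data and σ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(60), rng.normal(size=60)])
theta_true = np.array([1.0, 2.0])
sigma = 0.5
y = X @ theta_true + sigma * rng.normal(size=60)

def log_likelihood(theta):
    """Gaussian log-likelihood of the data under y_i = theta^T x_i + eps_i."""
    resid = y - X @ theta
    n = len(y)
    return -n * np.log(np.sqrt(2 * np.pi) * sigma) - np.sum(resid ** 2) / (2 * sigma ** 2)

def sse(theta):
    return np.sum((y - X @ theta) ** 2)

# The least-squares solution also maximizes the likelihood
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
for theta in [theta_ls, theta_ls + 0.1]:
    print(np.round(theta, 2), round(sse(theta), 3), round(log_likelihood(theta), 3))
# Any deviation from theta_ls raises the SSE and lowers the log-likelihood together
```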

  • References

    - Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
    - Prof. Alexander Gray's slides