UVA CS 4501: Machine Learning Lecture 22: Review...(3) Locally Weighted / Kernel Regression Regression Y = local weighted linear sum of X’s Least-squares Linear algebra Local Regression

UVACS4501:MachineLearning

Lecture22:Review

Dr.YanjunQi

UniversityofVirginia

DepartmentofComputerScience

12/5/18

Dr.YanjunQi/UVACS

1

Objective

•  Tohelpstudentsbeabletobuildmachinelearningtools–  (notjustatooluser!!!)

•  KeyResults:– Abletobuildafewsimplemachinelearningmethodsfromscratch

– Abletounderstandafewcomplexmachinelearningmethodsatthesourcecodelevel

12/5/18

Dr.YanjunQi/UVACS

2

Today

q ReviewofMLmethodscoveredsofarq Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheory

q ReviewofAssignmentscoveredsofar

12/5/18 3

Dr.YanjunQi/UVACS

ATypicalMachineLearningPipeline

12/5/18

Low-level sensing

Pre-processing

Feature Extract

Feature Select

Inference, Prediction, Recognition

Label Collection

4

Evaluation

Optimization

e.g. Data Cleaning Task-relevant

Dr.YanjunQi/UVACS

12/5/18 5

AnOperationalModelofMachineLearning

Learner Reference Data

Model

Execution Engine

Model Tagged Data

Production Data Deployment

Consistsofinput-outputpairs

Dr.YanjunQi/UVACS

Machine Learning in a Nutshell

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 6

MLgrewoutofworkinAIOptimizeaperformancecriterionusingexampledataorpastexperience,Aimingtogeneralizetounseendata

Dr.YanjunQi/UVACS

Whatwehavecovered

12/5/18

Dr.YanjunQi/UVACS

7

Task Regression,classification,clustering,dimen-reduction

Representation

Linearfunc,nonlinearfunction(e.g.polynomialexpansion),locallinear,logisticfunction(e.g.p(c|x)),tree,multi-layer,prob-densityfamily(e.g.Bernoulli,multinomial,Gaussian,mixtureofGaussians),localfuncsmoothness,

ScoreFunction

MSE,Hinge,log-likelihood,EPE(e.g.L2lossforKNN,0-1lossforBayesclassifier),cross-entropy,clusterpointsdistancetocenters,variance,

Search/Optimization

Normalequation,gradientdescent,stochasticGD,Newton,Linearprogramming,Quadraticprogramming(quadraticobjectivewithlinearconstraints),greedy,EM,asyn-SGD,eigenDecomp

Models,Parameters

Regularization(e.g.L1,L2)

Scikit-learnalgorithmcheat-sheet

12/5/18 8

http://scikit-learn.org/stable/tutorial/machine_learning_map/Dr.YanjunQi/UVACS

12/5/18 9

Scikit-learn:Regression

LinearmodelfittedbyminimizingaregularizedempiricallosswithSGD

Dr.YanjunQi/UVACS

Scikit-learn:Classification

12/5/18 10

Linearclassifiers(SVM,logisticregression…)withSGDtraining.

approximatetheexplicitfeaturemappingsthatcorrespondtocertainkernels

Tocombinethepredictionsofseveralbaseestimatorsbuiltwithagivenlearningalgorithminordertoimprovegeneralizability/robustnessoverasingleestimator.(1)averaging/bagging(2)boosting

Dr.YanjunQi/UVACS

12/5/18 11

BasicPCA

Bayes-NetHMM

Kmeans+GMM

ThenafterclassificationDr.YanjunQi/UVACS

12/5/18 12

http://scikit-learn.org/stable/Dr.YanjunQi/UVACS

Today



12/5/18 13

Dr.YanjunQi/UVACS

SUPERVISED LEARNING

•  Find function to map input space X to output space Y

•  Generalisation:learnfunction/hypothesisfrompastdatainorderto“explain”,“predict”,“model”or“control”newdataexamples

12/5/18 14KEY

Dr.YanjunQi/UVACS

Whatwehavecovered(I)q SupervisedRegressionmodels

– Linearregression(LR)– LRwithnon-linearbasisfunctions– LocallyweightedLR– LRwithRegularizations–  Featureselection*

12/5/18 15

Dr.YanjunQi/UVACS

ADataset

•  Data/points/instances/examples/samples/records:[rows]•  Features/attributes/dimensions/independentvariables/covariates/

predictors/regressors:[columns,exceptthelast]•  Target/outcome/response/label/dependentvariable:special

columntobepredicted[lastcolumn]

12/5/18 16

YiscontinuousOutput Y as

continuous values

Dr.YanjunQi/UVACS

(1) Multivariate Linear Regression

Regression

Y = Weighted linear sum of X’s

Least-squares

Linear algebra

Regression coefficients

Task

Representation

Score Function

Search/Optimization

Models, Parameters

ŷ = f (x) =θ0 +θ1x1 +θ2x

2

12/5/18 17θ

Dr.YanjunQi/UVACS

(2) Multivariate Linear Regression with basis Expansion

Regression

Y = Weighted linear sum of (X basis expansion)

Least-squares

Linear algebra

Regression coefficients

Task

Representation

Score Function

Search/Optimization

Models, Parameters

ŷ =θ0 + θ jϕ j (x)j=1m

∑ =ϕ(x)θ12/5/18 18

Dr.YanjunQi/UVACS

(3) Locally Weighted / Kernel Regression

Regression

Y = local weighted linear sum of X’s

Least-squares

Linear algebra

Local Regression coefficients

(conditioned on each test point)

Task

Representation

Score Function

Search/Optimization

Models, Parameters

0000 )(ˆ)(ˆ)(ˆ xxxxf βα +=min

α (x0 ),β (x0 )Kλ (xi, x0 )[yi −α(x0 )−β(x0 )xi ]

2

i=1

N

∑12/5/18 19

Dr.YanjunQi/UVACS

(4) Regularized multivariate linear regression

Regression

Y = Weighted linear sum of X’s

Least-squares+ Regularization -norm

Normal Eq / SGD / GD

Regularized Regression coefficients

Task

Representation

Score Function

Search/Optimization

Models, Parameters

min J(β) = Y −Y^"

#$

%&'2

i=1

n

∑ +λ β j2j=1

p

∑12/5/18 20

Dr.YanjunQi/UVACS

(5) Feature Selection

Dimension Reduction

n/a

Many possible options

Greedy (mostly)

Reduced subset of features

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 21

Dr.YanjunQi/UVACS

(5)FeatureSelection•  Thousandstomillionsoflowlevelfeatures:selectthemostrelevantonetobuildbetter,faster,andeasiertounderstandlearningmachines.

X

p

n

p’

FromDr.IsabelleGuyon12/5/18 22

Dr.YanjunQi/UVACS

Today



12/5/18 23

Dr.YanjunQi/UVACS

Whatwehavecovered(II)q SupervisedClassificationmodels

– K-nearestNeighbor– SupportVectorMachine– LogisticRegression– NeuralNetwork(e.g.MLP)– BayesClassifier– Randomforest/DecisionTree

12/5/18 24

Dr.YanjunQi/UVACS

Threemajorsectionsforclassification

•  We can divide the large variety of classification approaches into roughly three major types

1. Discriminative - directly estimate a decision rule/boundary - e.g., logistic regression, support vector machine, decisionTree 2. Generative: - build a generative statistical model - e.g., naïve bayes classifier, Bayesian networks 3. Instance based classifiers - Use observation directly (no models) - e.g. K nearest neighbors

12/5/18 25

Dr.YanjunQi/UVACS

ADatasetforclassification

•  Data/points/instances/examples/samples/records:[rows]•  Features/attributes/dimensions/independentvariables/covariates/predictors/regressors:[columns,exceptthelast]•  Target/outcome/response/label/dependentvariable:specialcolumntobepredicted[lastcolumn]

12/5/18 26

Output as Discrete Class Label

C1, C2, …, CL

C

CDr.YanjunQi/UVACS

(1) K-Nearest Neighbor

classification

Local Smoothness

EPE with L2 loss è conditional mean (Extra)

NA

Training Samples

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 27

Dr.YanjunQi/UVACS

(2) Support Vector Machine

classification

Kernel Func K(xi, xj)

Margin + Hinge Loss (optional)

QP with Dual form

Dual Weights

Task

Representation

Score Function

Search/Optimization

Models, Parameters

€

w = α ixiyii∑

argminw,b

wi2

i=1p∑ +C εi

i=1

n

∑

subject to ∀xi ∈ Dtrain : yi xi ⋅w+b( ) ≥1−εi

K(x, z) :=Φ(x)TΦ(z)

12/5/18 28

Dr.YanjunQi/UVACS

(3) Logistic Regression

classification

Log-odds(Y) = linear function of X’s

EPE, with conditional Log-likelihood

Iterative (Newton) method

Logistic weights

Task

Representation

Score Function

Search/Optimization

Models, Parameters

P(c =1 x) = eα+βx

1+ eα+βx12/5/18 29

Dr.YanjunQi/UVACS

(4) Neural Network

conditional Log-likelihood , Cross-Entropy / MSE

SGD / Backprop

NN network Weights

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 30

Classification / Regression

Multilayer Network topology

Dr.YanjunQi/UVACS

Logistic regression

Logistic regression could be illustrated as a module

On input x, it outputs ŷ:

where

Draw a logisticregression unit as:

Σ

x1

x2

x3

+1

ŷ=P(Y=1|X,Θ)wT !x + b

11+ e− z

Bias

Summing function

Activation function

Output

12/5/18 31

Dr.YanjunQi/UVACS

Multi-LayerPerceptron(MLP)String a lot of logistic units together. Example: 3 layer network:

x1

x2

x3

+1 +1

a3

a2

a1

hiddeninput output

y

12/5/18 32

Dr.YanjunQi/UVACS

Backpropagation

●  Back-propagationtrainingalgorithm

Network activation Forward Step

Error propagation Backward Step

12/5/18 33

Dr.YanjunQi/UVACS

DeepLearningWay:Learningfeatures/Representationfromdata

Feature Engineering ü  Most critical for accuracy ü  Account for most of the computation for testing ü  Most time-consuming in development cycle ü  Often hand-craft and task dependent in practice

Feature Learning ü  Easily adaptable to new similar tasks ü  Layerwise representation ü  Layer-by-layer unsupervised training ü  Layer-by-layer supervised training 3412/5/18

Dr.YanjunQi/UVACS

(5) Bayes Classifier

classification

Prob. models p(X|C)

EPE with 0-1 loss, with Log likelihood(optional)

MLE

Prob. Models’ Parameter

Task

Representation

Score Function

Search/Optimization

Models, Parameters

P(X1, ⋅ ⋅ ⋅,Xp |C)

argmaxk

P(C _ k | X) = argmaxk

P(X,C) = argmaxk

P(X |C)P(C)

P̂(Xj |C = ck ) =1

2πσ jkexp −

(X j−µ jk )2

2σ jk2

"

#$$

%

&''

P(W1 = n1,...,Wv = nv | ck ) =N !

n1k !n2k !..nvk !θ1kn1kθ2k

n2k ..θvknvk

p(Wi = true | ck ) = pi,kBernoulliNaïve

GaussianNaive

Multinomial12/5/18

Dr.YanjunQi/UVACS

35

(6) Decision Tree / Random Forest

Greedy to find partitions

Split Purity measure / e.g. IG / cross-entropy / Gini /

Tree Model (s), i.e. space partition

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 37

Classification

Partition feature space into set of rectangles, local smoothness

Dr.YanjunQi/UVACS

Anatomyofadecisiontree

overcast

high normal falsetrue

sunny rain

No NoYes Yes

Yes

Outlook

HumidityWindy

Eachnodeisatestononefeature/attribute

Possibleattributevaluesofthenode

Leavesarethedecisions

12/5/18 38

Dr.YanjunQi/UVACS

Informationgain

•  IG(X_i,Y)=H(Y)-H(Y|X_i)ReductioninuncertaintybyknowingafeatureX_i

Informationgain:=(informationbeforesplit)–(informationaftersplit)=entropy(parent)–[averageentropy(children)]

Fixed thelower,thebetter(childrennodesarepurer)

– ForIG,thehigher,thebetter=

12/5/18 39

Dr.YanjunQi/UVACS

RandomForestClassifierNexamples

....…

....…

Takehemajorityvote

Mfeatures

12/5/18 40

Dr.YanjunQi/UVACS

12/5/18 41

ü differentassumptionsondataü differentscalabilityprofilesattrainingtimeü differentlatenciesatprediction(test)timeü differentmodelsizes(embedabilityinmobiledevices)ü differentlevelofmodelinterpretability

AdaptfromOlivierGrisel’stalk

Dr.YanjunQi/UVACS

https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

Today



12/5/18 42

Dr.YanjunQi/UVACS

Whatwehavecovered(III)

q Unsupervisedmodels– DimensionReduction(PCA)– Hierarchicalclustering– K-meansclustering– GMM/EMclustering

12/5/18 43

Dr.YanjunQi/UVACS

AnunlabeledDatasetX

•  Data/points/instances/examples/samples/records:[rows]•  Features/attributes/dimensions/independentvariables/covariates/predictors/regressors:[columns]12/5/18 44

a data matrix of n observations on p variables x1,x2,…xp

Unsupervisedlearning=learningfromraw(unlabeled,unannotated,etc)data,asopposedtosuperviseddatawherealabelofexamplesisgiven

Dr.YanjunQi/UVACS

(0) Principal Component Analysis (Self Learn)

Dimension Reduction

Gaussian assumption

Direction of maximum variance

Eigen-decomp

Principal components

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 45

Dr.YanjunQi/UVACS

Whatwehavecovered(III)

q Unsupervisedmodels– DimensionReduction(PCA)– Hierarchicalclustering– K-meansclustering– GMM/EMclustering

12/5/18 46

Dr.YanjunQi/UVACS

•  Find groups (clusters) of data points such that data points in a group will be similar (or related) to one another and different from (or unrelated to) the data points in other groups

Whatisclustering?

Inter-cluster distances are maximized

Intra-cluster distances are

minimized

12/5/18 47

Dr.YanjunQi/UVACS

12/5/18

Issuesforclustering•  Whatisanaturalgroupingamongtheseobjects?

–  Definitionof"groupness"•  Whatmakesobjects“related”?

–  Definitionof"similarity/distance"•  Representationforobjects

–  Vectorspace?Normalization?•  Howmanyclusters?

–  Fixedapriori?–  Completelydatadriven?

•  Avoid“trivial”clusters-toolargeorsmall•  ClusteringAlgorithms

–  Partitionalalgorithms–  Hierarchicalalgorithms

•  Formalfoundationandconvergence48

Dr.YanjunQi/UVACS

12/5/18

ClusteringAlgorithms

•  Partitionalalgorithms– Usuallystartwitharandom(partial)partitioning

–  Refineititeratively•  Kmeansclustering•  Mixture-Modelbasedclustering

•  Hierarchicalalgorithms–  Bottom-up,agglomerative–  Top-down,divisive

49

Dr.YanjunQi/UVACS

(1) Hierarchical Clustering

Clustering

Distance measure

No clearly defined loss

greedy bottom-up (or top-down)

Dendrogram (tree)

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 50

Dr.YanjunQi/UVACS

Example: single link

⎥⎥⎥

⎦

⎤

⎢⎢⎢

⎣

⎡

04507

0

54)3,2,1(

54)3,2,1(

12 3 4

5

⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢

⎣

⎡

0458079

030

543)2,1(

543)2,1(

⎥⎥⎥⎥⎥⎥

⎦

⎤

⎢⎢⎢⎢⎢⎢

⎣

⎡

0458907910

03602

0

54321

54321

5},min{ 5),3,2,1(4),3,2,1()5,4(),3,2,1( == ddd

12/5/18 51

Dr.YanjunQi/UVACS

(2) K-means Clustering

Clustering

Spherical clusters

Sum-of-square distance to centroid

K-means (special case of EM) algorithm

Cluster membership &

centroid

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 52

Dr.YanjunQi/UVACS

12/5/18

53

K-meansClustering:Step2-Determinethemembershipofeachdatapoints

Dr.YanjunQi/UVACS

(3) GMM Clustering

Clustering

Likelihood

EM algorithm

Each point’s soft membership &

mean / covariance per cluster

Task

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 54

MixtureofGaussians

Dr.YanjunQi/UVACS

log p(x = xi )i=1

n

∏i∑ = log p(µ = µ j )

1

2π( )p/2 Σ j1/2

e−1

2!x−!µ j( )T Σ j−1 !x− !µ j( )

µ j∑⎡

⎣

⎢⎢⎢

⎤

⎦

⎥⎥⎥i

∑

12/5/18

55

Expectation-MaximizationfortrainingGMM

•  Start:– "Guess"thecentroidmkandcovarianceSkofeachoftheKclusters

•  Loop each cluster, revising both the mean (centroid position) and covariance (shape)

Dr.YanjunQi/UVACS

For each point, revising its proportions belonging to each of the K clusters

Dr.YanjunQi/UVACS

56

Compare:K-means

•  TheEMalgorithmformixturesofGaussiansislikea"softversion"oftheK-meansalgorithm.

•  IntheK-means“E-step”wedohardassignment:

•  IntheK-means“M-step”weupdatethemeansastheweightedsumofthedata,butnowtheweightsare0or1:

12/5/18

12/5/18 57

ü differentassumptionsondataü differentscalabilityprofilesü differentmodelsizes(embedabilityinmobiledevices)

Dr.YanjunQi/UVACShttps://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html

Today



12/5/18 58

Dr.YanjunQi/UVACS

Whatwehavecovered(IV)

q Learningtheory/Modelselection– K-foldscrossvalidation– Expectedpredictionerror– Biasandvariancetradeoff

12/5/18 59

Dr.YanjunQi/UVACS

CV-basedModelSelectionWe’retryingtodecidewhichalgorithm/

hyperparametertouse.•  Wetraineachmodelandmakeatable...

12/5/18

Dr.YanjunQi/UVACS

60

Hyperparametertuning….

StatisticalDecisionTheory(Extra)

•  Random input vector: X •  Random output variable:Y •  Joint distribution: Pr(X,Y ) •  Loss function L(Y, f(X))

•  Expected prediction error (EPE): • 

€

EPE( f ) = E(L(Y, f (X))) = L(y, f (x))∫ Pr(dx,dy) e.g. = (y − f (x))2∫ Pr(dx,dy)

Consider population distribution

e.g. Squared error loss (also called L2 loss )

12/5/18 61

Dr.YanjunQi/UVACS

Bias-VarianceTrade-offforEPE: (Extra)

EPE = noise2 + bias2 + variance

Unavoidable error

Error due to incorrect

assumptions

Error due to variance of training

samples

Slidecredit:D.Hoiem12/5/18 62

Dr.YanjunQi/UVACS

Bias-Variance Tradeoff / Model Selection

63

underfit region overfit region

12/5/18

Dr.YanjunQi/UVACS

Model “bias” & Model “variance”

•  Middle RED: –  TRUE function

•  Error due to bias: –  How far off in general

from the middle red

•  Error due to variance: –  How wildly the blue

points spread

12/5/18 64

Dr.YanjunQi/UVACS

needtomakeassumptionsthatareabletogeneralize

•  Components of generalization error –  Bias: how much the average model over all training sets differ from

the true model? •  Error due to inaccurate assumptions/simplifications made by the

model –  Variance: how much models estimated from different training sets

differ from each other •  Underfitting: model is too “simple” to represent all the

relevant class characteristics –  High bias and low variance –  High training error and high test error

•  Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data –  Low bias and high variance –  Low training error and high test error

Slidecredit:L.Lazebnik12/5/18 65

Dr.YanjunQi/UVACS

Task

Machine Learning in a Nutshell

Representation

Score Function

Search/Optimization

Models, Parameters

12/5/18 66

MLgrewoutofworkinAIOptimizeaperformancecriterionusingexampledataorpastexperience,Aimingtogeneralizetounseendata

Dr.YanjunQi/UVACS

Whatwehavecoveredforeachcomponent

12/5/18

Dr.YanjunQi/UVACS

67

Task Regression,classification,clustering,dimen-reduction

Representation

Linearfunc,nonlinearfunction(e.g.polynomialexpansion),locallinear,logisticfunction(e.g.p(c|x)),tree,multi-layer,prob-densityfamily(e.g.Bernoulli,multinomial,Gaussian,mixtureofGaussians),localfuncsmoothness,kernelmatrix,localsmoothness,partitionoffeaturespace,

ScoreFunction

MSE,Margin,log-likelihood,EPE(e.g.L2lossforKNN,0-1lossforBayesclassifier),cross-entropy,clusterpointsdistancetocenters,variance,conditionallog-likelihood,completedata-likelihood,regularizedlossfunc(e.g.L1,L2),goodnessofinter-clustersimilar

Search/Optimization

Normalequation,gradientdescent,stochasticGD,Newton,Linearprogramming,Quadraticprogramming(quadraticobjectivewithlinearconstraints),greedy,EM,asyn-SGD,eigenDecomp,backprop

Models,Parameters

Linearweightvector,basisweightvector,localweightvector,dualweights,trainingsamples,tree-dendrogram,multi-layerweights,principlecomponents,member(soft/hard)assignment,clustercentroid,clustercovariance(shape),…

Today



12/5/18 68

Dr.YanjunQi/UVACS

HW1

•  Q1:Linearalgebrareview•  Q2:Linearregressionmodelfitting– Dataloading– Basiclinearregression– Threewaystotrain:Normalequation/SGD/BatchGD

•  SampleexamQ:–  regressionmodelfitting

12/5/18

Dr.YanjunQi/UVACS6316/f16

69

Task

Representation

Score Function

Search/Optimization

Models, Parameters

HW2

•  Q1:Polynomialregression•  Q2:Ridgeregression

– Mathderivationofridge– Understandwhy/howRidge– ModelselectionofRidgewithKCV

•  SampleQs:– Regularization

12/5/18


70

Task

Representation

Score Function

Search/Optimization

Models, Parameters

HW3

•  Q1:KNNandmodelselection•  Q2:SupportVectorMachineswithScikit-Learn– Datapreprocessing– HowtouseSVMpackage– ModelselectionforSVM– Modelselectionpipelinewithtrain-vali,ortrain-CV;thentest

•  SampleQAs:–  SVM

12/5/18


71

Task

Representation

Score Function

Search/Optimization

Models, Parameters

HW4•  Q1:NeuralNetworkTensorflowPlayground–  InteractivelearningofMLP–  Featureengineeringvs.–  Featurelearning

•  Q2:ImageClassification/Keras– DNNTool:Kerasusing–  Extra:PCAforimageclassification

•  SampleQs:– NeuralNets

12/5/18


72

Task

Representation

Score Function

Search/Optimization

Models, Parameters

HW5•  Q1:NaiveBayesClassifierforText-baseMovieReviewClassification– Preprocessingoftextsamples– BOWDocumentRepresentation– MultinomialNaiveBayesClassifierBOWway

– MultivariateBernoulliNaiveBayesClassifier

•  SampleQs:– BayesClassifier12/5/18


73

Task

Representation

Score Function

Search/Optimization

Models, Parameters

HW6

•  Q1:UnsupervisedClusteringofaudiodataandconsensusdata– Dataloading– K-meanclustering– HowtofindK:knee-findingplot– Howtomeasureclustering:purityMetric

•  SampleQs:– KmeansandGMM– Decisiontrees

12/5/18


74

Task

Representation

Score Function

Search/Optimization

Models, Parameters

HighlyRecommendFourExtra-curriculumbooks

•  1.Book-AlgorithmstoLiveBy:TheComputerScienceofHumanDecisions– https://books.google.com/books/about/Algorithms_to_Live_By_The_Computer_Scien.html?id=xmeJCgAAQBAJ&source=kp_book_description

– Thisbookprovidesafascinatingexplorationofhowcomputeralgorithmscanbeappliedtooureverydaylives.

12/5/18

Dr.YanjunQi/UVACS

75


•  2.AMindforNumbers:HowtoExcelatMathandScience(EvenIfYouFlunkedAlgebra)–  https://www.goodreads.com/book/show/18693655-a-mind-for-numbers

– Agoodsummaryhttps://fastertomaster.com/a-mind-for-numbers-barbara-oakley/

–  Thebookisgearedtowardhelpingstudentsstudywell.Despitethetitleformath,mostoftheadviceinthisbookisappropriateforjustaboutanysubject.

12/5/18

Dr.YanjunQi/UVACS

76


•  3.Book:SoGoodTheyCannotIgnoreYou-WhySkillsTrumpPassionintheQuestforWorkYouLove– https://www.amazon.com/Good-They-Cant-Ignore-You-ebook/dp/B0076DDBJ6/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=1497747881&sr=1-1\

– TheideaofCareercapital-rareandvaluableskillsneeddeliberatepractice

– 10,000hoursofdeliberatepracticeèExpert!

12/5/18

Dr.YanjunQi/UVACS

77


•  4.Book:HomoDeus-ABriefHistoryofTomorrow–  https://www.goodreads.com/book/show/31138556-homo-deus

–  “HomoDeusexplorestheprojects,dreamsandnightmaresthatwillshapethetwenty-firstcentury—fromovercomingdeathtocreatingartificiallife.Itasksthefundamentalquestions:Wheredowegofromhere?Andhowwillweprotectthisfragileworldfromourowndestructivepowers?Thisisthenextstageofevolution.ThisisHomoDeus.””

12/5/18

Dr.YanjunQi/UVACS

78

References

q Hastie,Trevor,etal.Theelementsofstatisticallearning.Vol.2.No.1.NewYork:Springer,2009.

q Prof.M.A.Papalaskar’sslidesq Prof.AndrewNg’sslides

12/5/18 79

Dr.YanjunQi/UVACS

Documents

UVA CS 4501: Machine Learning Lecture 22: Review...(3) Locally Weighted / Kernel Regression Regression Y = local weighted linear sum of X’s Least-squares Linear algebra Local Regression