Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
UVACS4501:MachineLearning
Lecture22:Review
Dr.YanjunQi
UniversityofVirginia
DepartmentofComputerScience
12/5/18
Dr.YanjunQi/UVACS
1
Objective
• Tohelpstudentsbeabletobuildmachinelearningtools– (notjustatooluser!!!)
• KeyResults:– Abletobuildafewsimplemachinelearningmethodsfromscratch
– Abletounderstandafewcomplexmachinelearningmethodsatthesourcecodelevel
12/5/18
Dr.YanjunQi/UVACS
2
Today
q ReviewofMLmethodscoveredsofarq Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheory
q ReviewofAssignmentscoveredsofar
12/5/18 3
Dr.YanjunQi/UVACS
ATypicalMachineLearningPipeline
12/5/18
Low-level sensing
Pre-processing
Feature Extract
Feature Select
Inference, Prediction, Recognition
Label Collection
4
Evaluation
Optimization
e.g. Data Cleaning Task-relevant
Dr.YanjunQi/UVACS
12/5/18 5
AnOperationalModelofMachineLearning
Learner Reference Data
Model
Execution Engine
Model Tagged Data
Production Data Deployment
Consistsofinput-outputpairs
Dr.YanjunQi/UVACS
Machine Learning in a Nutshell
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 6
MLgrewoutofworkinAIOptimizeaperformancecriterionusingexampledataorpastexperience,Aimingtogeneralizetounseendata
Dr.YanjunQi/UVACS
Whatwehavecovered
12/5/18
Dr.YanjunQi/UVACS
7
Task Regression,classification,clustering,dimen-reduction
Representation
Linearfunc,nonlinearfunction(e.g.polynomialexpansion),locallinear,logisticfunction(e.g.p(c|x)),tree,multi-layer,prob-densityfamily(e.g.Bernoulli,multinomial,Gaussian,mixtureofGaussians),localfuncsmoothness,
ScoreFunction
MSE,Hinge,log-likelihood,EPE(e.g.L2lossforKNN,0-1lossforBayesclassifier),cross-entropy,clusterpointsdistancetocenters,variance,
Search/Optimization
Normalequation,gradientdescent,stochasticGD,Newton,Linearprogramming,Quadraticprogramming(quadraticobjectivewithlinearconstraints),greedy,EM,asyn-SGD,eigenDecomp
Models,Parameters
Regularization(e.g.L1,L2)
Scikit-learnalgorithmcheat-sheet
12/5/18 8
http://scikit-learn.org/stable/tutorial/machine_learning_map/Dr.YanjunQi/UVACS
12/5/18 9
Scikit-learn:Regression
LinearmodelfittedbyminimizingaregularizedempiricallosswithSGD
Dr.YanjunQi/UVACS
Scikit-learn:Classification
12/5/18 10
Linearclassifiers(SVM,logisticregression…)withSGDtraining.
approximatetheexplicitfeaturemappingsthatcorrespondtocertainkernels
Tocombinethepredictionsofseveralbaseestimatorsbuiltwithagivenlearningalgorithminordertoimprovegeneralizability/robustnessoverasingleestimator.(1)averaging/bagging(2)boosting
Dr.YanjunQi/UVACS
12/5/18 11
BasicPCA
Bayes-NetHMM
Kmeans+GMM
ThenafterclassificationDr.YanjunQi/UVACS
12/5/18 12
http://scikit-learn.org/stable/Dr.YanjunQi/UVACS
Today
q ReviewofMLmethodscoveredsofarq Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheory
q ReviewofAssignmentscoveredsofar
12/5/18 13
Dr.YanjunQi/UVACS
SUPERVISED LEARNING
• Find function to map input space X to output space Y
• Generalisation:learnfunction/hypothesisfrompastdatainorderto“explain”,“predict”,“model”or“control”newdataexamples
12/5/18 14KEY
Dr.YanjunQi/UVACS
Whatwehavecovered(I)q SupervisedRegressionmodels
– Linearregression(LR)– LRwithnon-linearbasisfunctions– LocallyweightedLR– LRwithRegularizations– Featureselection*
12/5/18 15
Dr.YanjunQi/UVACS
ADataset
• Data/points/instances/examples/samples/records:[rows]• Features/attributes/dimensions/independentvariables/covariates/
predictors/regressors:[columns,exceptthelast]• Target/outcome/response/label/dependentvariable:special
columntobepredicted[lastcolumn]
12/5/18 16
YiscontinuousOutput Y as
continuous values
Dr.YanjunQi/UVACS
(1) Multivariate Linear Regression
Regression
Y = Weighted linear sum of X’s
Least-squares
Linear algebra
Regression coefficients
Task
Representation
Score Function
Search/Optimization
Models, Parameters
ŷ = f (x) =θ0 +θ1x1 +θ2x
2
12/5/18 17θ
Dr.YanjunQi/UVACS
(2) Multivariate Linear Regression with basis Expansion
Regression
Y = Weighted linear sum of (X basis expansion)
Least-squares
Linear algebra
Regression coefficients
Task
Representation
Score Function
Search/Optimization
Models, Parameters
ŷ =θ0 + θ jϕ j (x)j=1m
∑ =ϕ(x)θ12/5/18 18
Dr.YanjunQi/UVACS
(3) Locally Weighted / Kernel Regression
Regression
Y = local weighted linear sum of X’s
Least-squares
Linear algebra
Local Regression coefficients
(conditioned on each test point)
Task
Representation
Score Function
Search/Optimization
Models, Parameters
0000 )(ˆ)(ˆ)(ˆ xxxxf βα +=min
α (x0 ),β (x0 )Kλ (xi, x0 )[yi −α(x0 )−β(x0 )xi ]
2
i=1
N
∑12/5/18 19
Dr.YanjunQi/UVACS
(4) Regularized multivariate linear regression
Regression
Y = Weighted linear sum of X’s
Least-squares+ Regularization -norm
Normal Eq / SGD / GD
Regularized Regression coefficients
Task
Representation
Score Function
Search/Optimization
Models, Parameters
min J(β) = Y −Y^"
#$
%&'2
i=1
n
∑ +λ β j2j=1
p
∑12/5/18 20
Dr.YanjunQi/UVACS
(5) Feature Selection
Dimension Reduction
n/a
Many possible options
Greedy (mostly)
Reduced subset of features
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 21
Dr.YanjunQi/UVACS
(5)FeatureSelection• Thousandstomillionsoflowlevelfeatures:selectthemostrelevantonetobuildbetter,faster,andeasiertounderstandlearningmachines.
X
p
n
p’
FromDr.IsabelleGuyon12/5/18 22
Dr.YanjunQi/UVACS
Today
q ReviewofMLmethodscoveredsofarq Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheory
q ReviewofAssignmentscoveredsofar
12/5/18 23
Dr.YanjunQi/UVACS
Whatwehavecovered(II)q SupervisedClassificationmodels
– K-nearestNeighbor– SupportVectorMachine– LogisticRegression– NeuralNetwork(e.g.MLP)– BayesClassifier– Randomforest/DecisionTree
12/5/18 24
Dr.YanjunQi/UVACS
Threemajorsectionsforclassification
• We can divide the large variety of classification approaches into roughly three major types
1. Discriminative - directly estimate a decision rule/boundary - e.g., logistic regression, support vector machine, decisionTree 2. Generative: - build a generative statistical model - e.g., naïve bayes classifier, Bayesian networks 3. Instance based classifiers - Use observation directly (no models) - e.g. K nearest neighbors
12/5/18 25
Dr.YanjunQi/UVACS
ADatasetforclassification
• Data/points/instances/examples/samples/records:[rows]• Features/attributes/dimensions/independentvariables/covariates/predictors/regressors:[columns,exceptthelast]• Target/outcome/response/label/dependentvariable:specialcolumntobepredicted[lastcolumn]
12/5/18 26
Output as Discrete Class Label
C1, C2, …, CL
C
CDr.YanjunQi/UVACS
(1) K-Nearest Neighbor
classification
Local Smoothness
EPE with L2 loss è conditional mean (Extra)
NA
Training Samples
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 27
Dr.YanjunQi/UVACS
(2) Support Vector Machine
classification
Kernel Func K(xi, xj)
Margin + Hinge Loss (optional)
QP with Dual form
Dual Weights
Task
Representation
Score Function
Search/Optimization
Models, Parameters
€
w = α ixiyii∑
argminw,b
wi2
i=1p∑ +C εi
i=1
n
∑
subject to ∀xi ∈ Dtrain : yi xi ⋅w+b( ) ≥1−εi
K(x, z) :=Φ(x)TΦ(z)
12/5/18 28
Dr.YanjunQi/UVACS
(3) Logistic Regression
classification
Log-odds(Y) = linear function of X’s
EPE, with conditional Log-likelihood
Iterative (Newton) method
Logistic weights
Task
Representation
Score Function
Search/Optimization
Models, Parameters
P(c =1 x) = eα+βx
1+ eα+βx12/5/18 29
Dr.YanjunQi/UVACS
(4) Neural Network
conditional Log-likelihood , Cross-Entropy / MSE
SGD / Backprop
NN network Weights
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 30
Classification / Regression
Multilayer Network topology
Dr.YanjunQi/UVACS
Logistic regression
Logistic regression could be illustrated as a module
On input x, it outputs ŷ:
where
Draw a logisticregression unit as:
Σ
x1
x2
x3
+1
ŷ=P(Y=1|X,Θ)wT !x + b
11+ e− z
Bias
Summing function
Activation function
Output
12/5/18 31
Dr.YanjunQi/UVACS
Multi-LayerPerceptron(MLP)String a lot of logistic units together. Example: 3 layer network:
x1
x2
x3
+1 +1
a3
a2
a1
hiddeninput output
y
12/5/18 32
Dr.YanjunQi/UVACS
Backpropagation
● Back-propagationtrainingalgorithm
Network activation Forward Step
Error propagation Backward Step
12/5/18 33
Dr.YanjunQi/UVACS
DeepLearningWay:Learningfeatures/Representationfromdata
Feature Engineering ü Most critical for accuracy ü Account for most of the computation for testing ü Most time-consuming in development cycle ü Often hand-craft and task dependent in practice
Feature Learning ü Easily adaptable to new similar tasks ü Layerwise representation ü Layer-by-layer unsupervised training ü Layer-by-layer supervised training 3412/5/18
Dr.YanjunQi/UVACS
(5) Bayes Classifier
classification
Prob. models p(X|C)
EPE with 0-1 loss, with Log likelihood(optional)
MLE
Prob. Models’ Parameter
Task
Representation
Score Function
Search/Optimization
Models, Parameters
P(X1, ⋅ ⋅ ⋅,Xp |C)
argmaxk
P(C _ k | X) = argmaxk
P(X,C) = argmaxk
P(X |C)P(C)
P̂(Xj |C = ck ) =1
2πσ jkexp −
(X j−µ jk )2
2σ jk2
"
#$$
%
&''
P(W1 = n1,...,Wv = nv | ck ) =N !
n1k !n2k !..nvk !θ1kn1kθ2k
n2k ..θvknvk
p(Wi = true | ck ) = pi,kBernoulliNaïve
GaussianNaive
Multinomial12/5/18
Dr.YanjunQi/UVACS
35
Naïve Bayes Classifier
Difficulty: learning the joint probability
• Naïve Bayes classification – Assumption that all input attributes are conditionally independent!
P(X1, ⋅ ⋅ ⋅,Xp |C)
P(X1,X2, ⋅ ⋅ ⋅,Xp |C) = P(X1 | X2, ⋅ ⋅ ⋅,Xp,C)P(X2, ⋅ ⋅ ⋅,Xp |C) = P(X1 |C)P(X2, ⋅ ⋅ ⋅,Xp |C) = P(X1 |C)P(X2 |C) ⋅ ⋅ ⋅P(Xp |C)
12/5/18
AdaptfromProf.KeChenNBslides36
C
X1 X2 X5 X3 X4 X6
Dr.YanjunQi/UVACS
(6) Decision Tree / Random Forest
Greedy to find partitions
Split Purity measure / e.g. IG / cross-entropy / Gini /
Tree Model (s), i.e. space partition
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 37
Classification
Partition feature space into set of rectangles, local smoothness
Dr.YanjunQi/UVACS
Anatomyofadecisiontree
overcast
high normal falsetrue
sunny rain
No NoYes Yes
Yes
Outlook
HumidityWindy
Eachnodeisatestononefeature/attribute
Possibleattributevaluesofthenode
Leavesarethedecisions
12/5/18 38
Dr.YanjunQi/UVACS
Informationgain
• IG(X_i,Y)=H(Y)-H(Y|X_i)ReductioninuncertaintybyknowingafeatureX_i
Informationgain:=(informationbeforesplit)–(informationaftersplit)=entropy(parent)–[averageentropy(children)]
Fixed thelower,thebetter(childrennodesarepurer)
– ForIG,thehigher,thebetter=
12/5/18 39
Dr.YanjunQi/UVACS
RandomForestClassifierNexamples
....…
....…
Takehemajorityvote
Mfeatures
12/5/18 40
Dr.YanjunQi/UVACS
12/5/18 41
ü differentassumptionsondataü differentscalabilityprofilesattrainingtimeü differentlatenciesatprediction(test)timeü differentmodelsizes(embedabilityinmobiledevices)ü differentlevelofmodelinterpretability
AdaptfromOlivierGrisel’stalk
Dr.YanjunQi/UVACS
https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
Today
q ReviewofMLmethodscoveredsofarq Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheory
q ReviewofAssignmentscoveredsofar
12/5/18 42
Dr.YanjunQi/UVACS
Whatwehavecovered(III)
q Unsupervisedmodels– DimensionReduction(PCA)– Hierarchicalclustering– K-meansclustering– GMM/EMclustering
12/5/18 43
Dr.YanjunQi/UVACS
AnunlabeledDatasetX
• Data/points/instances/examples/samples/records:[rows]• Features/attributes/dimensions/independentvariables/covariates/predictors/regressors:[columns]12/5/18 44
a data matrix of n observations on p variables x1,x2,…xp
Unsupervisedlearning=learningfromraw(unlabeled,unannotated,etc)data,asopposedtosuperviseddatawherealabelofexamplesisgiven
Dr.YanjunQi/UVACS
(0) Principal Component Analysis (Self Learn)
Dimension Reduction
Gaussian assumption
Direction of maximum variance
Eigen-decomp
Principal components
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 45
Dr.YanjunQi/UVACS
Whatwehavecovered(III)
q Unsupervisedmodels– DimensionReduction(PCA)– Hierarchicalclustering– K-meansclustering– GMM/EMclustering
12/5/18 46
Dr.YanjunQi/UVACS
• Find groups (clusters) of data points such that data points in a group will be similar (or related) to one another and different from (or unrelated to) the data points in other groups
Whatisclustering?
Inter-cluster distances are maximized
Intra-cluster distances are
minimized
12/5/18 47
Dr.YanjunQi/UVACS
12/5/18
Issuesforclustering• Whatisanaturalgroupingamongtheseobjects?
– Definitionof"groupness"• Whatmakesobjects“related”?
– Definitionof"similarity/distance"• Representationforobjects
– Vectorspace?Normalization?• Howmanyclusters?
– Fixedapriori?– Completelydatadriven?
• Avoid“trivial”clusters-toolargeorsmall• ClusteringAlgorithms
– Partitionalalgorithms– Hierarchicalalgorithms
• Formalfoundationandconvergence48
Dr.YanjunQi/UVACS
12/5/18
ClusteringAlgorithms
• Partitionalalgorithms– Usuallystartwitharandom(partial)partitioning
– Refineititeratively• Kmeansclustering• Mixture-Modelbasedclustering
• Hierarchicalalgorithms– Bottom-up,agglomerative– Top-down,divisive
49
Dr.YanjunQi/UVACS
(1) Hierarchical Clustering
Clustering
Distance measure
No clearly defined loss
greedy bottom-up (or top-down)
Dendrogram (tree)
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 50
Dr.YanjunQi/UVACS
Example: single link
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
04507
0
54)3,2,1(
54)3,2,1(
12 3 4
5
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
0458079
030
543)2,1(
543)2,1(
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
0458907910
03602
0
54321
54321
5},min{ 5),3,2,1(4),3,2,1()5,4(),3,2,1( == ddd
12/5/18 51
Dr.YanjunQi/UVACS
(2) K-means Clustering
Clustering
Spherical clusters
Sum-of-square distance to centroid
K-means (special case of EM) algorithm
Cluster membership &
centroid
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 52
Dr.YanjunQi/UVACS
12/5/18
53
K-meansClustering:Step2-Determinethemembershipofeachdatapoints
Dr.YanjunQi/UVACS
(3) GMM Clustering
Clustering
Likelihood
EM algorithm
Each point’s soft membership &
mean / covariance per cluster
Task
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 54
MixtureofGaussians
Dr.YanjunQi/UVACS
log p(x = xi )i=1
n
∏i∑ = log p(µ = µ j )
1
2π( )p/2 Σ j1/2
e−1
2!x−!µ j( )T Σ j−1 !x− !µ j( )
µ j∑⎡
⎣
⎢⎢⎢
⎤
⎦
⎥⎥⎥i
∑
12/5/18
55
Expectation-MaximizationfortrainingGMM
• Start:– "Guess"thecentroidmkandcovarianceSkofeachoftheKclusters
• Loop each cluster, revising both the mean (centroid position) and covariance (shape)
Dr.YanjunQi/UVACS
For each point, revising its proportions belonging to each of the K clusters
Dr.YanjunQi/UVACS
56
Compare:K-means
• TheEMalgorithmformixturesofGaussiansislikea"softversion"oftheK-meansalgorithm.
• IntheK-means“E-step”wedohardassignment:
• IntheK-means“M-step”weupdatethemeansastheweightedsumofthedata,butnowtheweightsare0or1:
12/5/18
12/5/18 57
ü differentassumptionsondataü differentscalabilityprofilesü differentmodelsizes(embedabilityinmobiledevices)
Dr.YanjunQi/UVACShttps://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
Today
q ReviewofMLmethodscoveredsofarq Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheory
q ReviewofAssignmentscoveredsofar
12/5/18 58
Dr.YanjunQi/UVACS
Whatwehavecovered(IV)
q Learningtheory/Modelselection– K-foldscrossvalidation– Expectedpredictionerror– Biasandvariancetradeoff
12/5/18 59
Dr.YanjunQi/UVACS
CV-basedModelSelectionWe’retryingtodecidewhichalgorithm/
hyperparametertouse.• Wetraineachmodelandmakeatable...
12/5/18
Dr.YanjunQi/UVACS
60
Hyperparametertuning….
StatisticalDecisionTheory(Extra)
• Random input vector: X • Random output variable:Y • Joint distribution: Pr(X,Y ) • Loss function L(Y, f(X))
• Expected prediction error (EPE): •
€
EPE( f ) = E(L(Y, f (X))) = L(y, f (x))∫ Pr(dx,dy) e.g. = (y − f (x))2∫ Pr(dx,dy)
Consider population distribution
e.g. Squared error loss (also called L2 loss )
12/5/18 61
Dr.YanjunQi/UVACS
Bias-VarianceTrade-offforEPE: (Extra)
EPE = noise2 + bias2 + variance
Unavoidable error
Error due to incorrect
assumptions
Error due to variance of training
samples
Slidecredit:D.Hoiem12/5/18 62
Dr.YanjunQi/UVACS
Bias-Variance Tradeoff / Model Selection
63
underfit region overfit region
12/5/18
Dr.YanjunQi/UVACS
Model “bias” & Model “variance”
• Middle RED: – TRUE function
• Error due to bias: – How far off in general
from the middle red
• Error due to variance: – How wildly the blue
points spread
12/5/18 64
Dr.YanjunQi/UVACS
needtomakeassumptionsthatareabletogeneralize
• Components of generalization error – Bias: how much the average model over all training sets differ from
the true model? • Error due to inaccurate assumptions/simplifications made by the
model – Variance: how much models estimated from different training sets
differ from each other • Underfitting: model is too “simple” to represent all the
relevant class characteristics – High bias and low variance – High training error and high test error
• Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data – Low bias and high variance – Low training error and high test error
Slidecredit:L.Lazebnik12/5/18 65
Dr.YanjunQi/UVACS
Task
Machine Learning in a Nutshell
Representation
Score Function
Search/Optimization
Models, Parameters
12/5/18 66
MLgrewoutofworkinAIOptimizeaperformancecriterionusingexampledataorpastexperience,Aimingtogeneralizetounseendata
Dr.YanjunQi/UVACS
Whatwehavecoveredforeachcomponent
12/5/18
Dr.YanjunQi/UVACS
67
Task Regression,classification,clustering,dimen-reduction
Representation
Linearfunc,nonlinearfunction(e.g.polynomialexpansion),locallinear,logisticfunction(e.g.p(c|x)),tree,multi-layer,prob-densityfamily(e.g.Bernoulli,multinomial,Gaussian,mixtureofGaussians),localfuncsmoothness,kernelmatrix,localsmoothness,partitionoffeaturespace,
ScoreFunction
MSE,Margin,log-likelihood,EPE(e.g.L2lossforKNN,0-1lossforBayesclassifier),cross-entropy,clusterpointsdistancetocenters,variance,conditionallog-likelihood,completedata-likelihood,regularizedlossfunc(e.g.L1,L2),goodnessofinter-clustersimilar
Search/Optimization
Normalequation,gradientdescent,stochasticGD,Newton,Linearprogramming,Quadraticprogramming(quadraticobjectivewithlinearconstraints),greedy,EM,asyn-SGD,eigenDecomp,backprop
Models,Parameters
Linearweightvector,basisweightvector,localweightvector,dualweights,trainingsamples,tree-dendrogram,multi-layerweights,principlecomponents,member(soft/hard)assignment,clustercentroid,clustercovariance(shape),…
Today
q ReviewofMLmethodscoveredsofarq Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheory
q ReviewofAssignmentscoveredsofar
12/5/18 68
Dr.YanjunQi/UVACS
HW1
• Q1:Linearalgebrareview• Q2:Linearregressionmodelfitting– Dataloading– Basiclinearregression– Threewaystotrain:Normalequation/SGD/BatchGD
• SampleexamQ:– regressionmodelfitting
12/5/18
Dr.YanjunQi/UVACS6316/f16
69
Task
Representation
Score Function
Search/Optimization
Models, Parameters
HW2
• Q1:Polynomialregression• Q2:Ridgeregression
– Mathderivationofridge– Understandwhy/howRidge– ModelselectionofRidgewithKCV
• SampleQs:– Regularization
12/5/18
Dr.YanjunQi/UVACS6316/f16
70
Task
Representation
Score Function
Search/Optimization
Models, Parameters
HW3
• Q1:KNNandmodelselection• Q2:SupportVectorMachineswithScikit-Learn– Datapreprocessing– HowtouseSVMpackage– ModelselectionforSVM– Modelselectionpipelinewithtrain-vali,ortrain-CV;thentest
• SampleQAs:– SVM
12/5/18
Dr.YanjunQi/UVACS6316/f16
71
Task
Representation
Score Function
Search/Optimization
Models, Parameters
HW4• Q1:NeuralNetworkTensorflowPlayground– InteractivelearningofMLP– Featureengineeringvs.– Featurelearning
• Q2:ImageClassification/Keras– DNNTool:Kerasusing– Extra:PCAforimageclassification
• SampleQs:– NeuralNets
12/5/18
Dr.YanjunQi/UVACS6316/f16
72
Task
Representation
Score Function
Search/Optimization
Models, Parameters
HW5• Q1:NaiveBayesClassifierforText-baseMovieReviewClassification– Preprocessingoftextsamples– BOWDocumentRepresentation– MultinomialNaiveBayesClassifierBOWway
– MultivariateBernoulliNaiveBayesClassifier
• SampleQs:– BayesClassifier12/5/18
Dr.YanjunQi/UVACS6316/f16
73
Task
Representation
Score Function
Search/Optimization
Models, Parameters
HW6
• Q1:UnsupervisedClusteringofaudiodataandconsensusdata– Dataloading– K-meanclustering– HowtofindK:knee-findingplot– Howtomeasureclustering:purityMetric
• SampleQs:– KmeansandGMM– Decisiontrees
12/5/18
Dr.YanjunQi/UVACS6316/f16
74
Task
Representation
Score Function
Search/Optimization
Models, Parameters
HighlyRecommendFourExtra-curriculumbooks
• 1.Book-AlgorithmstoLiveBy:TheComputerScienceofHumanDecisions– https://books.google.com/books/about/Algorithms_to_Live_By_The_Computer_Scien.html?id=xmeJCgAAQBAJ&source=kp_book_description
– Thisbookprovidesafascinatingexplorationofhowcomputeralgorithmscanbeappliedtooureverydaylives.
12/5/18
Dr.YanjunQi/UVACS
75
HighlyRecommendFourExtra-curriculumbooks
• 2.AMindforNumbers:HowtoExcelatMathandScience(EvenIfYouFlunkedAlgebra)– https://www.goodreads.com/book/show/18693655-a-mind-for-numbers
– Agoodsummaryhttps://fastertomaster.com/a-mind-for-numbers-barbara-oakley/
– Thebookisgearedtowardhelpingstudentsstudywell.Despitethetitleformath,mostoftheadviceinthisbookisappropriateforjustaboutanysubject.
12/5/18
Dr.YanjunQi/UVACS
76
HighlyRecommendFourExtra-curriculumbooks
• 3.Book:SoGoodTheyCannotIgnoreYou-WhySkillsTrumpPassionintheQuestforWorkYouLove– https://www.amazon.com/Good-They-Cant-Ignore-You-ebook/dp/B0076DDBJ6/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=1497747881&sr=1-1\
– TheideaofCareercapital-rareandvaluableskillsneeddeliberatepractice
– 10,000hoursofdeliberatepracticeèExpert!
12/5/18
Dr.YanjunQi/UVACS
77
HighlyRecommendFourExtra-curriculumbooks
• 4.Book:HomoDeus-ABriefHistoryofTomorrow– https://www.goodreads.com/book/show/31138556-homo-deus
– “HomoDeusexplorestheprojects,dreamsandnightmaresthatwillshapethetwenty-firstcentury—fromovercomingdeathtocreatingartificiallife.Itasksthefundamentalquestions:Wheredowegofromhere?Andhowwillweprotectthisfragileworldfromourowndestructivepowers?Thisisthenextstageofevolution.ThisisHomoDeus.””
12/5/18
Dr.YanjunQi/UVACS
78
References
q Hastie,Trevor,etal.Theelementsofstatisticallearning.Vol.2.No.1.NewYork:Springer,2009.
q Prof.M.A.Papalaskar’sslidesq Prof.AndrewNg’sslides
12/5/18 79
Dr.YanjunQi/UVACS