Recap

clustering and regression - luthuli.cs.uiuc.edu/~daf/courses/Probcourse/Probcourse-2013... · "merge clusters bottom up to form a hierarchical cluster tree"



Page 1

Recap

Page 2

Recap

Page 3

Be cautious…

Data may not be in one blob; we may need to separate the data into groups

Clustering

Page 4

CS 498 Probability & Statistics

Clustering methods

Zicheng Liao

Page 5

What is clustering?

•  “Grouping”

–  A fundamental part in signal processing

•  “Unsupervised classification”

•  Assign the same label to data points that are close to each other

Why?

Page 6

We live in a universe full of clusters

Page 7

Two (types of) clustering methods

•  Agglomerative/Divisive clustering

•  K-means

Page 8

Agglomerative/Divisive clustering

Agglomerative clustering (bottom up)

Hierarchical cluster tree

Divisive clustering (top down)

Page 9

Algorithm

Page 10

Agglomerative clustering: an example

•  "Merge clusters bottom up to form a hierarchical cluster tree"

Animation from Georg Berber, www.mit.edu/~georg/papers/lecture6.ppt

Page 11

Dendrogram

>> X = rand(6, 2);   % create 6 points on a plane
>> Z = linkage(X);   % Z encodes a tree of hierarchical clusters
>> dendrogram(Z);    % visualize Z as a dendrogram

[Dendrogram figure; vertical axis: distance]
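The MATLAB calls above map directly onto SciPy (a Python sketch, not from the slides; the choice of single-link and the random data are assumptions for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Mirror the MATLAB snippet: 6 random points on a plane.
rng = np.random.default_rng(0)
X = rng.random((6, 2))

# Z encodes the hierarchical cluster tree: each of the n-1 rows merges
# two clusters and records the distance at which they were merged.
Z = linkage(X, method="single")

print(Z.shape)  # (5, 4)
```

`scipy.cluster.hierarchy.dendrogram(Z)` then draws the same tree that MATLAB's `dendrogram` does.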

Page 12

Distance measure

Page 13

Inter-cluster distance

•  Treat each data point as a single cluster
•  Only need to define inter-cluster distance
   –  The distance between one set of points and another set of points
•  3 popular inter-cluster distances
   –  Single-link
   –  Complete-link
   –  Averaged-link
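The three definitions can be written down directly (a numpy sketch; `intercluster_distances` is an illustrative helper, not from the slides):

```python
import numpy as np

def intercluster_distances(A, B):
    """Single-, complete-, and averaged-link distances between
    two clusters of points (the rows of A and B)."""
    # All pairwise Euclidean distances between points of A and B.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(), d.max(), d.mean()

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [5.0, 0.0]])
single, complete, average = intercluster_distances(A, B)
print(single, complete, average)  # 3.0 5.0 4.0
```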

Page 14

Single-link

•  Minimum of all pairwise distances between points from two clusters
•  Tends to produce long, loose clusters

Page 15

Complete-link

•  Maximum of all pairwise distances between points from two clusters
•  Tends to produce tight clusters

Page 16

Averaged-link

•  Average of all pairwise distances between points from two clusters

Page 17

How many clusters are there?

•  Intrinsically hard to know
•  The dendrogram gives some insight
•  Choose a threshold to split the dendrogram into clusters

[Dendrogram figure; vertical axis: distance]
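Cutting the tree at a distance threshold is one call in SciPy (a sketch with made-up two-blob data; `fcluster` with `criterion="distance"` cuts the tree at the given height):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated blobs; cutting the tree at an intermediate
# threshold should recover them as two flat clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
Z = linkage(X, method="single")

labels = fcluster(Z, t=1.0, criterion="distance")
print(len(set(labels)))  # 2
```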

Page 18

An example

do_agglomerative.m

Page 19

Divisive clustering

•  "Recursively split a cluster into smaller clusters"
•  It is hard to choose where to split: a combinatorial problem
•  Can be easier when the data has special structure (e.g., a pixel grid)

Page 20

K-means

•  Partition data into clusters such that:
   –  Clusters are tight (distance to cluster center is small)
   –  Every data point is closer to its own cluster center than to all other cluster centers (Voronoi diagram)

[figures  excerpted  from  Wikipedia]  
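The alternation the slides describe (assign points, then move centers) is short enough to write out. A minimal numpy sketch of Lloyd's algorithm, with made-up data and a naive deterministic initialization (real implementations use random restarts):

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Minimal Lloyd's algorithm: alternate assignment and
    center updates until the centers stop moving."""
    # Initialize centers at the first k points (naive choice).
    centers = X[:k].copy()
    for _ in range(iters):
        # Assign each point to its closest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.],
              [10., 10.], [10., 11.], [11., 10.], [11., 11.]])
labels, centers = kmeans(X, 2)
print(labels)  # [0 0 0 0 1 1 1 1]
```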

Page 21

Formulation

Cluster  center  

Page 22

K-means algorithm

Page 23

Illustration

1.  Randomly initialize 3 cluster centers (circles)
2.  Assign each point to the closest cluster center
3.  Update each cluster center
4.  Re-iterate from step 2

[figures  excerpted  from  Wikipedia]  

Page 24

Example

do_Kmeans.m (shows step-by-step updates and the effect of the cluster number)

Page 25

Discussion

K = 2? K = 3? K = 5?

Page 26

Discussion

•  K-means converges to a local minimum of the cost, which can produce counterintuitive clusterings

[figures  excerpted  from  Wikipedia]  

Page 27

Discussion

•  Favors spherical clusters
•  Poor results for long/loose/stretched clusters

Input data (color indicates true labels); K-means results

Page 28

Discussion

•  Cost is guaranteed to decrease in every step
   –  Assigning a point to the closest cluster center minimizes the cost for the current cluster-center configuration
   –  Choosing the mean of each cluster as the new cluster center minimizes the squared distance for the current cluster assignment
•  Terminates in finitely many steps (each iteration takes polynomial time)

Page 29

Summary

•  Clustering as grouping "similar" data together
•  A world full of clusters/patterns
•  Two algorithms
   –  Agglomerative/divisive clustering: hierarchical cluster tree
   –  K-means: vector quantization

Page 30

CS 498 Probability & Statistics

Regression

Zicheng Liao

Page 31

Example-I

•  Predict stock price

[Figure: stock price vs. time; predict the price at time t+1]

Page 32

Example-II

•  Fill in missing pixels in an image: inpainting

Page 33

Example-III

•  Discover relationships in data

Amount of hormones by devices from 3 production lots

Time in service for devices from 3 production lots

Page 34

Example-III

•  Discover relationships in data

Page 35

Linear regression

Page 36

Parameter estimation

•  MLE of the linear model with Gaussian noise

[Least squares, Carl F. Gauss, 1809]

Likelihood function

Page 37

Parameter estimation

•  Closed-form solution

Normal equation

Cost function

(Expensive to compute the matrix inverse in high dimensions)
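The slide's equations did not survive the transcript; the standard closed form for least squares is the normal equation, w = (AᵀA)⁻¹Aᵀy. A numpy sketch with made-up data (solving the linear system instead of forming the inverse, which is cheaper and more stable):

```python
import numpy as np

# Synthetic data from the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Append a column of ones so the model has an intercept.
A = np.hstack([X, np.ones((len(X), 1))])

# Normal equation: solve (A^T A) w = A^T y.
w = np.linalg.solve(A.T @ A, A.T @ y)
print(w)  # [2. 1.]
```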

Page 38

Gradient descent

•  http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=02.5-LinearRegressionI-GradientDescentForLinearRegression&speed=100 (Andrew Ng)

(For this convex least-squares cost, gradient descent converges to the global minimum, though only asymptotically rather than in finitely many steps)
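The same toy line-fitting problem solved by gradient descent on the mean-squared-error cost (a sketch; the step size and iteration count are hand-picked assumptions, not from the slides):

```python
import numpy as np

# Design matrix with an intercept column; targets from y = 2x + 1.
A = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

w = np.zeros(2)
lr = 0.05
for _ in range(5000):
    # Gradient of the mean squared error (2/n) A^T (Aw - y).
    grad = 2 * A.T @ (A @ w - y) / len(y)
    w -= lr * grad

print(np.round(w, 3))  # [2. 1.]
```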

Page 39

Example

do_regression.m  

Page 40

Interpreting a regression

Page 41

Interpreting a regression

•  Zero-mean residual
•  Zero correlation between the residual and the explanatory variables
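Both properties follow from the residual being orthogonal to the columns of the design matrix, and are easy to verify numerically (a sketch with made-up noisy-line data):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3 * x + 2 + rng.normal(0, 0.1, 50)

# Least-squares fit with an intercept column.
A = np.column_stack([x, np.ones_like(x)])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
residual = y - A @ w

print(abs(residual.mean()) < 1e-10)     # zero-mean residual
print(abs(np.dot(residual, x)) < 1e-8)  # uncorrelated with x
```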

Page 42

Interpreting the residual

Page 43

Interpreting the residual

Page 44

How good is a fit?

Page 45

How good is a fit?

•  R-squared measure
   –  The percentage of variance explained by the regression
   –  Used in hypothesis tests for model selection
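"Percentage of variance explained" translates directly into code: R² = 1 − SS_res/SS_tot (a numpy sketch; the example numbers are made up):

```python
import numpy as np

def r_squared(y, y_hat):
    """Fraction of the variance of y explained by the fit y_hat."""
    ss_res = ((y - y_hat) ** 2).sum()        # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()     # total sum of squares
    return 1 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(round(r_squared(y, y_hat), 3))  # 0.98
```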

Page 46

Regularized linear regression

•  Cost

•  Closed-form solution

•  Gradient descent
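The closed-form equations are missing from this transcript; for ridge (L2-regularized) regression the standard closed form adds λI to the normal equation: w = (AᵀA + λI)⁻¹Aᵀy. A numpy sketch with made-up data:

```python
import numpy as np

def ridge(A, y, lam):
    """Closed-form ridge regression: solve (A^T A + lam*I) w = A^T y."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)

A = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

w0 = ridge(A, y, lam=0.0)   # no regularization: exact fit [2, 1]
w1 = ridge(A, y, lam=10.0)  # heavier penalty shrinks the weights
print(np.linalg.norm(w1) < np.linalg.norm(w0))  # True
```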

Page 47

Why regularization?

•  Handle small eigenvalues
   –  Avoid dividing by small values by adding the regularizer

Page 48

Why regularization?

•  Avoid over-fitting:
   –  Over-fitting: hard to generalize to new data
   –  Small parameters give a simpler model, which is less prone to over-fitting

Smaller parameters make a simpler (better) model

Page 49

L1 regularization (Lasso)
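The slide's formulas are lost in this transcript. As a sketch of why the L1 penalty yields sparse weights: for an orthonormal design, the lasso solution is soft-thresholding of the OLS coefficients (a standard result, stated here as an assumption about what the slide covered):

```python
import numpy as np

def soft_threshold(w_ols, lam):
    """Lasso solution under an orthonormal design: shrink each OLS
    coefficient toward zero, clipping small ones exactly to zero."""
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

w_ols = np.array([3.0, -0.4, 0.05, -2.0])
w_lasso = soft_threshold(w_ols, lam=0.5)
# Coefficients with magnitude below lam become exactly zero:
# [2.5, 0., 0., -1.5]
print(w_lasso)
```

This exact zeroing is what distinguishes L1 from L2 regularization, which only shrinks coefficients without making them zero.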

Page 50

How does it work?

Page 51

Summary