68
Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan 1

Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Embed Size (px)

Citation preview

Page 1: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

1

Machine Learning Overview

Tamara Berg

CS 560 Artificial Intelligence

Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan

Page 2: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Announcements

• HW3 due Oct 29, 11:59pm

Page 3: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Machine learning

Image source: https://www.coursera.org/course/ml

Page 4: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Machine learning

• Definition– Getting a computer to do well on a task

without explicitly programming it– Improving performance on a task based on

experience

Page 5: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Big Data!

Page 6: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

What is machine learning?

• Computer programs that can learn from data

• Two key components– Representation: how should we represent the data?– Generalization: the system should generalize from its

past experience (observed data items) to perform well on unseen data items.

Page 7: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Types of ML algorithms

• Unsupervised– Algorithms operate on unlabeled examples

• Supervised– Algorithms operate on labeled examples

• Semi/Partially-supervised– Algorithms combine both labeled and unlabeled

examples

Page 8: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 9: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Clustering

– The assignment of objects into groups (aka clusters) so that objects in the same cluster are more similar to each other than objects in different clusters.

– Clustering is a common technique for statistical data analysis, used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.

Page 10: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Euclidean distance, angle between data vectors, etc

Page 11: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 12: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

K-means clustering

• Want to minimize sum of squared Euclidean distances between points xi and their nearest cluster centers mk

k

ki

ki mxMXDcluster

clusterinpoint

2)(),(

Page 13: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 14: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 15: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 16: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 17: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 18: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 19: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 20: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 21: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 22: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 23: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 24: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

K-means

0 0.5 1 1.5 2 2.5 30

0.5

1

1.5

2

2.5

3

3.5

4

xx

x

x’s – indicate initialization for 3 cluster centers

Iterate until convergence:

1) Compute assignment of data points to cluster centers

2) Update cluster centers with mean of assigned points

Page 25: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 26: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 27: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Flat vs Hierarchical Clustering

• Flat algorithms– Usually start with a random partitioning of docs into

groups– Refine iteratively– Main algorithm: k-means

• Hierarchical algorithms– Create a hierarchy– Bottom-up: agglomerative– Top-down: divisive

Page 28: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Hierarchical clustering strategies

• Agglomerative clustering• Start with each data point in a separate cluster• At each iteration, merge two of the “closest” clusters

• Divisive clustering• Start with all data points grouped into a single cluster• At each iteration, split the “largest” cluster

Page 29: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

PProduces a hierarchy of clusterings

P

P

P

Page 30: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

P

Page 31: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Divisive Clustering

• Top-down (instead of bottom-up as in Agglomerative Clustering)

• Start with all data points in one big cluster

• Then recursively split clusters

• Eventually each data point forms a cluster on its own.

Page 32: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Flat or hierarchical clustering?

• For high efficiency, use flat clustering (e.g. k means)

• For deterministic results: hierarchical clustering

• When a hierarchical structure is desired: hierarchical algorithm

• Hierarchical clustering can also be applied if K cannot be predetermined (can start without knowing K)

Page 33: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Clustering in Action – example from computer vision

Page 34: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Recall: Bag of Words Representation

· Represent document as a “bag of words”

Page 35: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Bag of features for images

· Represent images as a “bag of words”

Page 36: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Bags of features for image classification

1. Extract features

Page 37: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

1. Extract features

2. Learn “visual vocabulary”

Bags of features for image classification

Page 38: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

1. Extract features

2. Learn “visual vocabulary”

3. Represent images by frequencies of “visual words”

Bags of features for image classification

Page 39: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

1. Feature extraction

Page 40: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

2. Learning the visual vocabulary

Page 41: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

2. Learning the visual vocabulary

Clustering

Page 42: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

2. Learning the visual vocabulary

Clustering

…Visual vocabulary

Page 43: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Example visual vocabulary

Fei-Fei et al. 2005

Page 44: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

3. Image representation

…..

fre

que

ncy

Visual words

Page 45: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Types of ML algorithms

• Unsupervised– Algorithms operate on unlabeled examples

• Supervised– Algorithms operate on labeled examples

• Semi/Partially-supervised– Algorithms combine both labeled and unlabeled

examples

Page 46: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 47: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 48: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 49: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 50: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Example: Sentiment analysis

http://gigaom.com/2013/10/03/stanford-researchers-to-open-source-model-they-say-has-nailed-sentiment-analysis/

http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

Page 51: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Example: Image classification

apple

pear

tomato

cow

dog

horse

input desired output

Page 52: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

http://yann.lecun.com/exdb/mnist/index.html

Page 53: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Example: Seismic data

Body wave magnitude

Sur

face

wav

e m

agni

tude

Nuclear explosions

Earthquakes

Page 54: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 55: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

The basic classification framework

y = f(x)

• Learning: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the parameters of the prediction function f

• Inference: apply f to a never before seen test example x and output the predicted value y = f(x)

output classification function

input

Page 56: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Naïve Bayes Classification

The class that maximizes:

Page 57: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Example: Image classification

Car

Input: Image Representation Classifier (e.g. Naïve Bayes, Neural Net, etc

Output: Predicted label

Page 58: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Example: Training and testing

• Key challenge: generalization to unseen examples

Training set (labels known) Test set (labels unknown)

Page 59: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart
Page 60: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Some classification methods

106 examples

Nearest neighbor

Shakhnarovich, Viola, Darrell 2003Berg, Berg, Malik 2005…

Neural networks

LeCun, Bottou, Bengio, Haffner 1998Rowley, Baluja, Kanade 1998…

Support Vector Machines and Kernels Conditional Random Fields

McCallum, Freitag, Pereira 2000Kumar, Hebert 2003…

Guyon, VapnikHeisele, Serre, Poggio, 2001…

Page 61: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Classification … more soon

Page 62: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Types of ML algorithms

• Unsupervised– Algorithms operate on unlabeled examples

• Supervised– Algorithms operate on labeled examples

• Semi/Partially-supervised– Algorithms combine both labeled and unlabeled

examples

Page 64: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

However, for many problems, labeled data can be rare or expensive.

Unlabeled data is much cheaper.Need to pay someone to do it, requires special testing,…

Slide Credit: Avrim Blum

Page 65: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

However, for many problems, labeled data can be rare or expensive.

Unlabeled data is much cheaper.

Speech

Images

Medical outcomes

Customer modeling

Protein sequences

Web pages

Need to pay someone to do it, requires special testing,…

Slide Credit: Avrim Blum

Page 66: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

However, for many problems, labeled data can be rare or expensive.

Unlabeled data is much cheaper.

[From Jerry Zhu]

Need to pay someone to do it, requires special testing,…

Slide Credit: Avrim Blum

Page 67: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Need to pay someone to do it, requires special testing,…

However, for many problems, labeled data can be rare or expensive.

Unlabeled data is much cheaper.

Can we make use of cheap unlabeled data?

Slide Credit: Avrim Blum

Page 68: Machine Learning Overview Tamara Berg CS 560 Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart

Semi-Supervised LearningCan we use unlabeled data to augment a

small labeled sample to improve learning?

But unlabeled data is missing the most important info!!

But maybe still has useful regularities that

we can use.

But…But…But…Slide Credit: Avrim Blum