Machine Learning Overview
Tamara Berg
CS 590-133 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell, Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Killian Weinberger, Deva Ramanan
Announcements
• HW4 is due April 3
• Reminder: Midterm 2 is next Thursday
– Next Tuesday’s lecture topics will not be included (but the material will be on the final, so attend!)
• Midterm review – Monday, 5pm in FB009
Midterm Topic List
Be able to define the following terms and answer basic questions about them:
Reinforcement learning
– Passive vs Active RL
– Model-based vs model-free approaches
– Direct utility estimation
– TD Learning and TD Q-learning
– Exploration vs exploitation
– Policy Search
– Application to Backgammon/Aibos/helicopters (at a high level)

Probability
– Random variables
– Axioms of probability
– Joint, marginal, conditional probability distributions
– Independence and conditional independence
– Product rule, chain rule, Bayes rule
Midterm Topic List
Bayesian Networks – General
– Structure and parameters
– Calculating joint and conditional probabilities
– Independence in Bayes Nets (Bayes Ball)

Bayesian Inference
– Exact Inference (Inference by Enumeration, Variable Elimination)
– Approximate Inference (Forward Sampling, Rejection Sampling, Likelihood Weighting)
– Networks for which efficient inference is possible

Naïve Bayes
– Parameter learning including Laplace smoothing
– Likelihood, prior, posterior
– Maximum likelihood (ML), maximum a posteriori (MAP) inference
– Application to spam/ham classification
– Application to image classification (at a high level)
Midterm Topic List
HMMs
– Markov Property
– Markov Chains
– Hidden Markov Model (initial distribution, transitions, emissions)
– Filtering (forward algorithm)

Machine Learning
– Unsupervised/supervised/semi-supervised learning
– K-means clustering
– Training, tuning, testing, generalization
Machine learning
Image source: https://www.coursera.org/course/ml
Machine learning
• Definition
– Getting a computer to do well on a task without explicitly programming it
– Improving performance on a task based on experience
Big Data!
What is machine learning?
• Computer programs that can learn from data
• Two key components
– Representation: how should we represent the data?
– Generalization: the system should generalize from its past experience (observed data items) to perform well on unseen data items.
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled examples
Clustering
– The assignment of objects into groups (aka clusters) so that objects in the same cluster are more similar to each other than objects in different clusters.
– Clustering is a common technique for statistical data analysis, used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
– Similarity can be measured with Euclidean distance, the angle between data vectors, etc.
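As a concrete illustration of these two similarity measures, here is a minimal Python sketch (the function names are my own, not from the slides):

```python
import math

def euclidean_distance(x, y):
    # Straight-line distance between two data vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    # Cosine of the angle between two data vectors (1 = same direction).
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(cosine_similarity([1, 0], [1, 1]))   # ~0.707
```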
K-means clustering
• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:

D(X, M) = \sum_{k} \sum_{x_i \in \text{cluster } k} \| x_i - m_k \|^2
Source: Hinrich Schutze
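A minimal sketch of the standard alternating procedure (Lloyd’s algorithm) for locally minimizing this objective; plain Python with illustrative names, not an authoritative implementation:

```python
import random

def kmeans(points, k, iters=100):
    # Initialize the cluster centers m_k with k randomly chosen points.
    centers = [list(p) for p in random.sample(points, k)]
    for _ in range(iters):
        # Assignment step: put each x_i in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(x, centers[c])))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:  # leave a center untouched if its cluster is empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, clusters

# Toy usage: two well-separated groups in 2-D.
centers, clusters = kmeans([[1, 1], [1.5, 2], [8, 8], [9, 9]], k=2)
print(centers)
```

Neither step can increase D(X, M), so the procedure converges, though only to a local minimum that depends on the random initialization.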
Hierarchical clustering strategies
• Agglomerative clustering
– Start with each data point in a separate cluster
– At each iteration, merge two of the “closest” clusters
• Divisive clustering
– Start with all data points grouped into a single cluster
– At each iteration, split the “largest” cluster
Produces a hierarchy of clusterings
Divisive Clustering
• Top-down (instead of bottom-up as in Agglomerative Clustering)
• Start with all data points in one big cluster
• Then recursively split clusters
• Eventually each data point forms a cluster on its own.
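A minimal sketch of the agglomerative (bottom-up) strategy, assuming single-link distance (the closest pair of points across two clusters) as the notion of “closest”; it stops at a target number of clusters instead of recording the full hierarchy:

```python
def agglomerative(points, target_k):
    # Start with each data point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_k:
        # Find the pair of clusters with the smallest single-link
        # (closest-pair) squared distance, then merge them.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(sum((a - b) ** 2 for a, b in zip(p, q))
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

print(agglomerative([[1, 1], [1.5, 2], [8, 8], [9, 9]], target_k=2))
```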
Flat or hierarchical clustering?
• For high efficiency, use flat clustering (e.g., k-means)
• For deterministic results: hierarchical clustering
• When a hierarchical structure is desired: hierarchical algorithm
• Hierarchical clustering can also be applied if K cannot be predetermined (can start without knowing K)
Source: Hinrich Schutze
Clustering in Action – example from computer vision
Recall: Bag of Words Representation
• Represent a document as a “bag of words”
Bag-of-features models
Slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Bags of features for image classification
1. Extract features
2. Learn “visual vocabulary”
3. Represent images by frequencies of “visual words”
1. Feature extraction
2. Learning the visual vocabulary
[Figure: the extracted features are clustered; the cluster centers form the visual vocabulary]
Example visual vocabulary
Fei-Fei et al. 2005
3. Image representation
[Figure: histogram over the visual vocabulary – frequency of each visual word]
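Combining steps 2 and 3: a minimal sketch of representing one image, assuming the visual vocabulary (cluster centers) has already been learned, e.g. with k-means as above; all names and numbers are toy illustrations:

```python
def nearest_word(descriptor, vocabulary):
    # Index of the closest visual word (cluster center) to this descriptor.
    return min(range(len(vocabulary)),
               key=lambda j: sum((a - b) ** 2
                                 for a, b in zip(descriptor, vocabulary[j])))

def bag_of_words(image_descriptors, vocabulary):
    # Histogram of visual-word frequencies for one image.
    hist = [0] * len(vocabulary)
    for d in image_descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist

# Toy example: a "vocabulary" of two 2-D words and one image's descriptors.
vocab = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[0.1, 0.2], [9.5, 10.1], [10.2, 9.8]]
print(bag_of_words(descriptors, vocab))  # [1, 2]
```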
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled examples
Example: Sentiment analysis
http://gigaom.com/2013/10/03/stanford-researchers-to-open-source-model-they-say-has-nailed-sentiment-analysis/
http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
Example: Image classification
[Figure: input images with desired output labels – apple, pear, tomato, cow, dog, horse]
http://yann.lecun.com/exdb/mnist/index.html
Example: Seismic data
[Figure: scatter plot of surface wave magnitude vs. body wave magnitude – nuclear explosions and earthquakes form separable classes]
The basic classification framework
y = f(x)
• Learning: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the parameters of the prediction function f
• Inference: apply f to a never before seen test example x and output the predicted value y = f(x)
(y: output, f: classification function, x: input)
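A minimal sketch of this learning/inference split, using a 1-nearest-neighbor classifier as a stand-in for f (the seismic-style labels below are just a toy illustration):

```python
def learn(training_set):
    # "Learning": for 1-nearest-neighbor, the parameters of f are simply
    # the stored labeled examples {(x1, y1), ..., (xN, yN)}.
    return list(training_set)

def f(x, examples):
    # "Inference": predict y = f(x) for a never-before-seen example x by
    # returning the label of the closest training example.
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    _, label = min(examples, key=lambda ex: dist2(ex[0], x))
    return label

examples = learn([([0.0, 0.0], "earthquake"), ([5.0, 5.0], "explosion")])
print(f([4.0, 6.0], examples))  # -> explosion
```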
Naïve Bayes classifier
f(\mathbf{x}) = \arg\max_y P(y \mid \mathbf{x})
             = \arg\max_y P(y) \, P(\mathbf{x} \mid y)
             = \arg\max_y P(y) \prod_d P(x_d \mid y)

where x_d is a single dimension or attribute of \mathbf{x}.
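A minimal sketch of this argmax for binary attributes x_d, computed in log space to avoid underflow; the priors and likelihoods below are made-up toy numbers, not from the slides:

```python
import math

def naive_bayes_predict(x, priors, likelihoods):
    # f(x) = argmax_y P(y) * prod_d P(x_d | y), in log space.
    best_y, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior)
        for d, x_d in enumerate(x):
            # likelihoods[y][d] stores P(x_d = 1 | y).
            p = likelihoods[y][d] if x_d else 1.0 - likelihoods[y][d]
            score += math.log(p)
        if score > best_score:
            best_y, best_score = y, score
    return best_y

# Toy spam/ham example with two binary attributes.
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {"spam": [0.8, 0.6], "ham": [0.1, 0.4]}
print(naive_bayes_predict([1, 1], priors, likelihoods))  # -> spam
```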
Example: Image classification
Input: image → representation → classifier (e.g. Naïve Bayes, neural net, etc.)
Output: predicted label (“Car”)
Example: Training and testing
• Key challenge: generalization to unseen examples
Training set (labels known) Test set (labels unknown)
Some classification methods
• Nearest neighbor (10^6 examples)
Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; …
• Neural networks
LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998; …
• Support Vector Machines and Kernels
Guyon, Vapnik; Heisele, Serre, Poggio 2001; …
• Conditional Random Fields
McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003; …
Classification … more soon
Types of ML algorithms
• Unsupervised
– Algorithms operate on unlabeled examples
• Supervised
– Algorithms operate on labeled examples
• Semi/Partially-supervised
– Algorithms combine both labeled and unlabeled examples
Supervised learning has many successes
• recognize speech
• steer a car
• classify documents
• classify proteins
• recognize faces and objects in images
• …
Slide Credit: Avrim Blum
However, for many problems, labeled data can be rare or expensive.
(Need to pay someone to do it, requires special testing, …)
Unlabeled data is much cheaper:
• Speech
• Images
• Medical outcomes
• Customer modeling
• Protein sequences
• Web pages
[From Jerry Zhu]

Can we make use of cheap unlabeled data?

Slide Credit: Avrim Blum
Semi-Supervised Learning
Can we use unlabeled data to augment a small labeled sample to improve learning?
• But unlabeled data is missing the most important info!!
• But maybe it still has useful regularities that we can use.
• But… But… But…

Slide Credit: Avrim Blum
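One standard way to exploit those regularities (my own illustration, not from these slides) is self-training: pseudo-label the unlabeled pool with the current classifier, retrain on labeled plus pseudo-labeled data, and repeat. A minimal, heavily simplified sketch with a 1-nearest-neighbor base classifier:

```python
def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nn_label(examples, x):
    # 1-nearest-neighbor prediction: label of the closest stored example.
    return min(examples, key=lambda ex: dist2(ex[0], x))[1]

def self_train(labeled, unlabeled, rounds=3):
    # Repeatedly pseudo-label the unlabeled pool with the current
    # classifier and retrain on labeled + pseudo-labeled data.
    examples = list(labeled)
    for _ in range(rounds):
        pseudo = [(x, nn_label(examples, x)) for x in unlabeled]
        examples = list(labeled) + pseudo
    return examples

labeled = [([0, 0], "A"), ([10, 10], "B")]
unlabeled = [[1, 1], [2, 2], [9, 8]]
print(self_train(labeled, unlabeled))
```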