Machine Learning Week 2 Lecture 1


Page 1

Machine Learning

Week 2 Lecture 1

Page 2

Quiz and Hand-in Data

• Test what you know so I can adapt!

• We need data for the hand-in

Page 3

Quiz

Any problems? Any questions?

Page 4

Recap

Data Set

Learning Algorithm

Hypothesis h: h(x) ≈ f(x)

Unknown Target f

Hypothesis Set

Supervised Learning

[Figure: handwritten digit images labeled 5, 0, 4, 1, 9, 2, 1, 3, 1, 4]

Target: House Price
Input: Size, Rooms, Age, Garage, …
Data: Historical Data of House Sales

Regression

Classification (10 classes)

Page 5

Linear Models

House Price = 1234 · 1 + 88 · Size + 42 · Rooms − 666 · Age + 0.01 · Garage

Example:
Target: House Price
Input: Size, Rooms, Age, Garage, …
Data: Historical House Sales

Weight each input dimension so that it affects the target function in a good way.

h(x) = θ0 x0 + θ1 x1 + θ2 x2 + θ3 x3 + θ4 x4

Linear in θ

Nonlinear Transform

(matrix product)
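As a rough illustration of the matrix-product view, here is a minimal Python sketch; the weights are the slide's example numbers, and the two houses are made-up values.

```python
import numpy as np

# Minimal sketch of the linear model as a matrix product.
theta = np.array([1234.0, 88.0, 42.0, -666.0, 0.01])  # θ0 (bias), size, rooms, age, garage
X = np.array([
    [1.0, 120.0, 4.0, 10.0, 1.0],  # x0 = 1, size 120, 4 rooms, age 10, garage
    [1.0,  80.0, 2.0,  5.0, 0.0],  # x0 = 1, size 80, 2 rooms, age 5, no garage
])
prices = X @ theta  # h(x) = Xθ: one predicted price per row
print(prices)
```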

Page 6

Three Models

Logistic Regression

Estimating Probabilities

Classification (Perceptron)

Regression

Classify y = 1 if σ(wᵀx) ≥ 1/2, which is equivalent to wᵀx ≥ 0.

Page 7

Maximum Likelihood

Likelihood

Use the logarithm to turn the product into a sum, then optimize.

Assumption: Independent Data

For Logistic Regression we get cross entropy error:
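A reconstruction of the missing formula, assuming the book's convention y_n ∈ {−1, +1} and P(y | x) = σ(y wᵀx), with σ the logistic function: the negative log-likelihood becomes

\[
-\frac{1}{N}\ln \prod_{n=1}^{N} \sigma\big(y_n\, w^T x_n\big)
= \frac{1}{N}\sum_{n=1}^{N} \ln\Big(1 + e^{-y_n w^T x_n}\Big)
\]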

Page 8

Convex Optimization

[Figure: a convex and a non-convex function; for a convex function the chord between (x, f(x)) and (y, f(y)) lies above the graph, and the tangent line f(x) + f'(x)(y − x) lies below it]

f and g are convex, h is affine

Local Minima are Global Minima
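For reference, the two characterizations the figure alludes to (standard definitions, not from the slide text): f is convex if every chord lies above the graph, and, for differentiable f, if every tangent lies below it:

\[
f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y), \quad \lambda \in [0,1]
\]
\[
f(y) \ge f(x) + f'(x)(y - x)
\]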

Page 9

Descent Methods

Iteratively move toward a better solution, where f is assumed twice continuously differentiable.

• Pick start point x
• Repeat until stopping criterion satisfied:
  • Compute descent direction v
  • Line search: compute step size t
  • Update: x = x + t·v
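The loop above leaves the direction and step-size rules open. Here is a minimal Python sketch of it, using the negative gradient as the direction and backtracking (Armijo) line search for the step size; both choices are assumptions, not the slide's prescription.

```python
import numpy as np

def descent(f, grad, x0, tol=1e-8, max_iter=1000):
    """Generic descent method: direction = negative gradient,
    step size from backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        v = -grad(x)                    # compute descent direction
        if np.linalg.norm(v) < tol:     # stopping criterion
            break
        t = 1.0                         # line search: shrink t until
        while f(x + t * v) > f(x) - 0.5 * t * np.dot(v, v):
            t *= 0.5                    # ... sufficient decrease holds
        x = x + t * v                   # update
    return x

# Usage on a toy quadratic (example function assumed, not from the slide)
print(descent(lambda x: np.sum((x - 3) ** 2), lambda x: 2 * (x - 3), [0.0, 0.0]))
```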

Page 10

Simple Gradient Descent

• Pick start point x
• LR = 0.1
• Repeat 50 rounds:
  • Set v = −∇f(x)
  • Update: x = x + LR·v

The descent direction is the negative gradient, v = −∇f(x). The step size is the fixed learning rate LR = 0.1.
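The same loop with the slide's fixed settings (LR = 0.1, 50 rounds), in Python; the objective is a made-up one-dimensional example.

```python
# Simple gradient descent with the slide's fixed settings.
f = lambda x: (x - 3.0) ** 2        # example objective (assumed)
grad = lambda x: 2.0 * (x - 3.0)    # its gradient

x = 10.0                            # pick a start point
LR = 0.1                            # learning rate
for _ in range(50):                 # repeat 50 rounds
    v = -grad(x)                    # set v to the negative gradient
    x = x + LR * v                  # update with the fixed step size
print(x, f(x))                      # x ≈ 3.0, the minimizer
```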

Page 11

[Figure: three runs of gradient descent with different learning rates]

Page 12

Gradient Descent Jumps Around

Using exact line search, starting from (10, 1).

Page 13

Gradient Checking

If you use gradient descent, make sure you compute the gradient correctly.

Choose a small h and compute (f(x + h) − f(x − h)) / (2h).

Use this two-sided formula; it reduces the estimation error significantly.

For an n-dimensional gradient, apply the formula to each variable separately. This usually works well.
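A small Python sketch of the coordinate-wise two-sided check; the test function and the value of h are example choices.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Two-sided finite differences, applied to each variable in turn."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)  # two-sided formula
    return g

# Compare against a known analytic gradient (example function assumed)
f = lambda x: np.sum(x ** 2)
x0 = np.array([1.0, -2.0, 0.5])
print(np.max(np.abs(numerical_gradient(f, x0) - 2 * x0)))  # prints a tiny number
```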

Page 14

Hand-in 1

• It comes online after class today
• It includes Matlab examples but not a long intro; Google is your friend.
• Questions are always welcome
• Get busy

Supervised Learning

[Figure: handwritten digit images labeled 5, 0, 4, 1, 9, 2, 1, 3, 1, 4]

Page 15

Today

• Learning feasibility

• Probabilistic Approach

• Learning Formalized

Page 16

Learning Diagram

Data Set (x1, y1, …, xn, yn)

Learning Algorithm

Hypothesis h: h(x) ≈ f(x)

Unknown Target f

Hypothesis Set

Page 17

Impossibility of Learning!

x1 x2 x3 | f(x)
0  0  0  | 1
1  0  0  | 0
0  1  0  | 1
1  1  0  | 1
0  0  1  | 0
1  0  1  | ?
0  1  1  | ?
1  1  1  | ?

What is f?

There are 2⁸ = 256 potential functions; 2³ = 8 of them have in-sample error 0 on the five observed points.

Assumptions are needed

Page 18

No Free Lunch"All models are wrong, but some models are useful.” George Box

Machine Learning has many different models and algorithms.

Assumptions that work well in one domain may fail in another.

There is no single model that works best for all problems (the No Free Lunch theorem).

Page 19

Probabilistic Games

Page 20

Probabilistic Approach

Repeat N times independently.

What does the sample mean say about μ?

Sample mean: ν = #heads / N

With certainty? Nothing, really.

Probabilistically? Yes: the sample mean is likely close to the bias.

Sample: h, h, h, t, t, h, t, t, h

μ is unknown
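A quick Python simulation of this setup; the bias μ = 0.6 and N = 1000 are made-up example values.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, N = 0.6, 1000              # pretend-unknown bias and sample size
flips = rng.random(N) < mu     # True = heads, with probability mu
nu = flips.mean()              # sample mean: #heads / N
print(nu)                      # likely close to mu, but never guaranteed
```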

Page 21

Hoeffding's Inequality (Binary Variables)

Sample mean is probably close to μ

The bound is independent of the sample mean and of the actual probability distribution P(x).

The probability that they are close increases with the number of samples N.

Hoeffding's Inequality

Sample mean ν

coin bias μ

The sample mean is probably approximately correct (PAC).
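The inequality itself, reconstructed in its standard form for N independent binary samples:

\[
P\big[\,|\nu - \mu| > \varepsilon\,\big] \le 2e^{-2\varepsilon^2 N}
\]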

Page 22

Classification Connection

Testing a Hypothesis

Fixed Hypothesis, Unknown Target

μ is the probability of picking x such that f(x) ≠ h(x); 1 − μ is the probability of picking x such that f(x) = h(x).

μ is the sum of the probabilities of all the points x where the hypothesis is wrong.

Probability Distribution over x

The sample mean corresponds to the in-sample error; μ corresponds to the out-of-sample error.

Page 23

Learning Diagram

Data Set (x1, y1, …, xn, yn)

Learning Algorithm

Hypothesis h: h(x) ≈ f(x)

Unknown Target f

Hypothesis Set

Unknown Input Probability Distribution P(x)

Page 24

Coins to hypotheses

Sample of size N: h, h, h, t, t, h, t, t, h

Sample mean ν

unknown μ

Page 25

Not Learning Yet

• Hypothesis fixed before seeing data
• Every hypothesis has its own error (a different coin for each hypothesis)
• In learning we have a training algorithm that picks the "best" hypothesis from the set
• We are only verifying a fixed hypothesis
• Hoeffding has left the building again.

Page 26

Coin Analogy – Exercise 1.10 in the Book

• Flip a fair coin 10 times. What is the probability of 10 heads?
• Repeat 1000 times (1000 coins). What is the probability that some coin has 10 heads? Approximately 63%.
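The arithmetic behind the two answers:

\[
P[\text{10 heads}] = (1/2)^{10} = 1/1024 \approx 0.1\%
\]
\[
P[\text{some coin has 10 heads}] = 1 - (1 - 2^{-10})^{1000} \approx 63\%
\]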

Page 27

Crude Approach

Apply the Union Bound

Union Bound: P(true for some hypothesis) ≤ P(true for h1) + P(true for h2) + … + P(true for hM)

Apply the union bound, then Hoeffding to each term.
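Written out, with ν_h and μ_h the in- and out-of-sample errors of hypothesis h:

\[
P\Big[\,\exists\, h_i : |\nu_{h_i} - \mu_{h_i}| > \varepsilon\,\Big]
\le \sum_{i=1}^{M} P\big[\,|\nu_{h_i} - \mu_{h_i}| > \varepsilon\,\big]
\le 2M e^{-2\varepsilon^2 N}
\]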

Page 28

Result

Finite Hypothesis set with M hypotheses.

Data Set with N points

Classification problem; the error measure is f(x) ≠ h(x).
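The bound being referenced, reconstructed in standard form for the final hypothesis g the algorithm picks:

\[
P\big[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \varepsilon\,\big] \le 2M e^{-2\varepsilon^2 N}
\]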

It captures the idea of what we are looking for (model complexity appears as a factor). But our "simple" linear models have infinite hypothesis sets…

Page 29

New Learning Diagram

Data Set (x1, y1, …, xn, yn)

Learning Algorithm

Hypothesis h: h(x) ≈ f(x)

Unknown Target f

Hypothesis Set

Input Probability Distribution P(x)

finite X

Page 30

Learning Feasibility

• Deterministic / no assumptions: NOT SO MUCH
• Probabilistically: YES
  – Generalization: out-of-sample error close to in-sample error
  – Make in-sample error small
• If the target function is complex, learning should be harder?

Page 31

Error Functions

User specified, heavily problem dependent.

Identity system, fingerprints: is the person who he says he is?

h(x) / f(x) | Lying          | True
Est. Lying  | True Negative  | False Negative
Est. True   | False Positive | True Positive

Walmart: discount for a given person. Error function:

h(x) / f(x) | Lying | True
Est. Lying  | 0     | 10
Est. True   | 1     | 0

CIA access (Friday bar stock). Error function:

h(x) / f(x) | Lying | True
Est. Lying  | 0     | 1
Est. True   | 1000  | 0
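A sketch in Python of how such a cost matrix turns into a weighted error; the matrix layout (rows = estimate, columns = truth) and the sample labels are assumptions for illustration.

```python
import numpy as np

def weighted_error(y_est, y_true, cost):
    """Average cost of the estimates, looked up in the cost matrix.
    Labels: 0 = 'lying', 1 = 'true'; rows = estimate, columns = truth."""
    return cost[y_est, y_true].mean()

cost_cia = np.array([[0,    1],    # est. lying: true negative 0, false negative 1
                     [1000, 0]])   # est. true: false positive 1000, true positive 0
y_true = np.array([0, 1, 1, 0, 1])
y_est  = np.array([0, 1, 0, 1, 1])              # one false negative, one false positive
print(weighted_error(y_est, y_true, cost_cia))  # (0+0+1+1000+0)/5 = 200.2
```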

Page 32

Error Functions

If Not Given

Base it on making the problem "solvable": making the problem smooth and convex seems like a good idea. Least-squares linear regression was very nice indeed.

Or base it on assumptions about the target and the noise. Logistic regression gives the cross-entropy error; assuming a linear target with Gaussian noise gives least squares.
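The second route, written out: assuming y = wᵀx + ε with Gaussian noise ε ~ N(0, σ²), the log-likelihood of the data is, up to constants,

\[
\ln L(w) = -\frac{1}{2\sigma^2}\sum_{n=1}^{N} \big(y_n - w^T x_n\big)^2 + \text{const},
\]

so maximizing the likelihood is exactly minimizing the least-squares error.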

Page 33

Formalize Everything

Data Set (x1, y1, …, xn, yn)

Learning Algorithm

Hypothesis h: h(x) ≈ f(x)

Unknown Target

Hypothesis Set

Unknown Probability Distribution P(x)

Page 34

Final Diagram

Unknown Target Distribution P(y | x) (this is what we learn)

Unknown Input Probability Distribution P(x) (this sets the importance of each input point)

Data Set

Learning Algorithm

Hypothesis Set

Final Hypothesis

Error Measure e

Page 35

Words on Out-of-Sample Error

Imagine X and Y are finite sets.
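In that finite setting, the out-of-sample error is just the P(x)-weighted average of the pointwise error (standard definition, matching the final diagram):

\[
E_{\text{out}}(h) = \mathbb{E}_{x \sim P}\big[e(h(x), f(x))\big] = \sum_{x \in X} P(x)\, e\big(h(x), f(x)\big)
\]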

Page 36

Quick Summary

• Learning without assumptions is impossible
• Probabilistically, learning is possible
  – Hoeffding bound
  – Work needed for infinite hypothesis spaces!
• Error function depends on the problem
• Formalized learning approach:
  – Ensure out-of-sample error is close to in-sample error
  – Minimize in-sample error
  – Complexity of the hypothesis set (size M currently) matters
  – More data helps