
Machine Learning and Pattern Recognition, Tutorial Sheet

Number 2

School of Informatics, University of Edinburgh, Instructor: Charles Sutton

1. Most of this week’s tutorial work time is dedicated to going over the later questions on the last sheet in light of what you learnt in the first tutorial. Your tutor will spend more time on that in the next tutorial. It is always worth reviewing material after the tutorial, and so there will be a second opportunity to cover those questions. Worked answers are available for those after Friday, and so you can use those to check where you may have got stuck, and to help you in formulating questions to ask in the tutorials.

If you are fully happy with the answers you have for last week’s questions, here are some other questions you could try.

2. Consider a bivariate Gaussian p(x1, x2) = N(x | µ, Σ) with

   \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad
   \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}

a. What is p(x1)?

b. Suppose µ = 0. What is p(x1|x2)?

c. Contrast your answers from a. and b.
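Though the sheet doesn’t ask for it, a quick numerical sanity check of your answer to a. is easy: sample from a bivariate Gaussian and look at the empirical moments of x1. This is a Python sketch (the sheet otherwise uses Matlab), and the values of µ and Σ below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])            # made-up example mean
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])        # sigma1^2 = 1, sigma12 = 0.5, sigma2^2 = 2

samples = rng.multivariate_normal(mu, Sigma, size=100_000)
x1 = samples[:, 0]

# Whatever form you derived for p(x1), its mean and variance should
# match the empirical moments of the x1 samples.
print(x1.mean())   # should be close to mu[0] = 1.0
print(x1.var())    # should be close to Sigma[0, 0] = 1.0
```

Comparing these empirical moments against your derived p(x1) is a cheap way to catch algebra slips before the tutorial.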

3. TFT panels (the screens) in computer monitors sometimes have dead pixels. During manufacture, the location of these pixels is uniformly distributed over the panel. Typically a good panel manufacturer will reject a panel if the dead pixel is in the central region. Define the centre (x, y) = (0, 0) to be the origin, and the whole 2D panel to be the region |x| < 1, |y| < 1. Then the manufacturer will reject the panel if |x| < dr and |y| < dr. Rejected panels never get built into monitors, and so are never seen in shops. Different manufacturers use different values for dr.

Suppose you see a number n of monitors (e.g. n = 4) with dead pixels in a shop, all from the same manufacturer. But you don’t know what the dr value is for that manufacturer. Write down the likelihood for the value of dr given the data points (xi, yi), i = 1, 2, ..., n, with (xi, yi) being the location of the dead pixel on each screen i. What is the maximum likelihood value for dr given this data? Plot made-up example locations for the dead pixels in monitor space on a diagram, and show, by indicating the rejection region, what the maximum likelihood value for dr would be in those circumstances.

4. This question is just for fun, because it is a cool illustration of the problems of maximum likelihood, and a chance to do some Matlab. Suppose you now had a prior for P(dr), for example a uniform prior between 0 and 1. Now simulate the above situation in Matlab: generate dr from the prior (the Matlab rand function generates random numbers uniformly between 0 and 1). Choose a large K (e.g. 10000). Repeatedly (for k = 1, 2, ..., K) generate n = 4 data points compatible with valid panels (using the rejection approach in Q3). This involves:

1. Set i = 1.

2. Repeat

3. Generate x ∼ Uniform(-1,1). Generate y ∼ Uniform(-1,1).


4. If |x| < dr and |y| < dr, reject the panel and return to the previous step.

5. Otherwise set xi = x and yi = y, and increment i.

6. Until i > n.
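The numbered steps above can be sketched in code. The sheet asks for Matlab; the following is an equivalent Python sketch, and the dr value passed in at the bottom is made up for illustration, since in the full simulation dr is drawn from the prior.

```python
import numpy as np

def sample_dead_pixels(dr, n, rng):
    """Generate n dead-pixel locations on panels that passed inspection,
    by rejection: a draw in the central square |x| < dr, |y| < dr means
    the panel was rejected, so that draw is discarded (steps 1-6 above)."""
    xs, ys = [], []
    while len(xs) < n:                       # "Until i > n"
        x = rng.uniform(-1.0, 1.0)           # step 3
        y = rng.uniform(-1.0, 1.0)
        if abs(x) < dr and abs(y) < dr:      # step 4: panel rejected
            continue
        xs.append(x)                         # step 5: panel accepted
        ys.append(y)
    return np.array(xs), np.array(ys)

rng = np.random.default_rng(0)
xs, ys = sample_dead_pixels(dr=0.5, n=4, rng=rng)   # dr = 0.5 is made up
```

Note that every accepted point necessarily satisfies max(|x|, |y|) >= dr, which is worth keeping in mind for the rmin quantity defined below.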

Now for each set of n = 4 data points, compute rmin = min_i max(|xi|, |yi|) over those n data points i = 1, 2, ..., n. Collect the dr value and rmin value pairs for each set k, and store them in Matlab variables dr(k) and rmin(k). Then after doing this for all k = 1, 2, ..., K sets of data points, plot

plot(rmin, dr, 'b.')

to show you a sample plot of the map from the observed minimum distances (from the centre to the dead pixels) to the actual panel manufacturer thresholds. This shows the difference between the maximum likelihood solution and the posterior distribution for this case. You can change the number of monitors n you observe and rerun: you should see that the maximum likelihood value becomes a good estimate for the posterior as n gets larger if dr is near to 1, but continues to be a pretty bad estimate for small values of dr even for larger amounts of data. You may also believe our uniform prior is a bad thing, and can try others.

Try programming this yourself. You can check your results against the script given in paneltest.m.
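If you want a second implementation to compare against paneltest.m, here is a hedged Python sketch of the whole procedure (K is kept smaller than the suggested 10000 for speed, and all variable names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
K, n = 2000, 4                   # K smaller than the suggested 10000, for speed
dr_all = np.empty(K)
rmin_all = np.empty(K)

for k in range(K):
    dr = rng.uniform(0.0, 1.0)   # draw dr from the uniform prior
    pts = []
    while len(pts) < n:          # rejection sampling, steps 1-6 above
        x, y = rng.uniform(-1.0, 1.0, size=2)
        if not (abs(x) < dr and abs(y) < dr):   # panel was not rejected
            pts.append((x, y))
    # rmin = min_i max(|x_i|, |y_i|) over the n observed dead pixels
    rmin = min(max(abs(x), abs(y)) for x, y in pts)
    dr_all[k], rmin_all[k] = dr, rmin

# Every observed dead pixel lies outside the rejection square, so rmin
# always over-estimates the true dr:
print((rmin_all >= dr_all).all())   # True
```

Scattering rmin_all against dr_all (e.g. matplotlib’s plot(rmin_all, dr_all, 'b.')) reproduces the Matlab plot described above, and makes the gap between the maximum likelihood estimate and the posterior visible.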

5. This is really, really just for those who have nothing better to do! We certainly won’t do this in the tutorial. However, if you use v = √(1 − dr²) as your parameterisation instead of dr and, say, put a uniform prior on P(v), then the Bayesian integrals are simple to do analytically and you can work out all the answers on paper... Didn’t you just want to know that...

Once again there will be a chance to go over these questions again in Tutorial 3.
