Corrections & Hints
•The formula for the entropy of a split (Lec 4, Sep 2) is incorrect. Should be:
•  H(split) = Σᵢ fᵢ · H(branchᵢ)
•where fᵢ is the fraction of the complete data that ends up in branch i
Corrections & Hints
•Usual way to shuffle data is an index shuffle
•Generate an index vector and a random vector:
•  idx = [1, 2, ..., N];  r = [r₁, r₂, ..., r_N], rᵢ drawn uniformly at random
•Then sort the index vector on the random vector
•Finally, use the “sorted” index row as an index into X
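The index shuffle above can be sketched in NumPy (array names are my own; in practice `np.random.permutation` does this in one call):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[10, 20], [30, 40], [50, 60], [70, 80]])  # 4 data points

idx = np.arange(len(X))        # index vector: [0, 1, 2, 3]
r = rng.random(len(X))         # random vector, one draw per point

order = idx[np.argsort(r)]     # sort the index vector on the random vector
X_shuffled = X[order]          # use the sorted indices to permute X

print(X_shuffled)
```

Sorting the indices by an independent random key gives a uniformly random permutation, which is exactly what a fair shuffle requires.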
Corrections & Hints
•Concavity: the definition given in the homework is buggy
•The final criterion is wrong (to see this, consider ...)
•A more usual definition is:
•  g() is concave iff, for all x, y and all λ ∈ [0, 1]:
•  g(λx + (1−λ)y) ≥ λ·g(x) + (1−λ)·g(y)
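A quick numeric illustration of the usual definition, with g(x) = √x (concave) and g(x) = x² (not concave) as my own example functions:

```python
import math

def concave_ok(g, x, y, lam):
    """Check the defining inequality: g(lam*x + (1-lam)*y) >= lam*g(x) + (1-lam)*g(y)."""
    return g(lam * x + (1 - lam) * y) >= lam * g(x) + (1 - lam) * g(y)

print(concave_ok(math.sqrt, 0.0, 4.0, 0.5))        # sqrt(2) >= 1  -> True
print(concave_ok(lambda t: t * t, 0.0, 4.0, 0.5))  # 4 >= 8       -> False
```

A single violating (x, y, λ) triple is enough to show a function is not concave, which is why a buggy criterion is easy to refute by example.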
Corrections & Hints
•Missing values
•Both hepatitis and hypothyroid have many missing attribute (feature) values
•You have 3 choices
•Drop all points (instances) w/ missing values
•Report on effects on your data/results
•Handle missing attr as in Sec 4.7 or 4.1 of text
•Report (if you can) on effects
•Find new similar data sets with no missing values
•Many available in UCI data or other data sets from Weka group
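A minimal sketch of choice 1 (dropping instances with missing values), assuming the common UCI convention of “?” marking a missing entry; the toy rows are made up:

```python
# Tiny stand-in for a UCI-style table; "?" marks a missing attribute value
rows = [
    ["30", "1.2", "yes"],
    ["?",  "0.8", "no"],   # missing first attribute -> dropped
    ["45", "?",   "yes"],  # missing second attribute -> dropped
    ["52", "2.1", "no"],
]

# Keep only fully observed instances
complete = [r for r in rows if "?" not in r]

print(len(complete))  # 2 instances survive
```

Per the slide, if you do this you should report how much data you lose and how it changes your results, since dropping rows can bias the remaining sample.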
Reminders
•Office hours: Wed 9:00-11:00 AM, FEC345B
•Homework due at start of next class (Thurs)
•Late policy (from front material):
•1 day late => 50% off
•I will relax for this assignment. Policy for HW1:
•15% per day late (incl weekend days)
•You may “checkpoint” by handing in what you have on the due date -- penalty points applied only to the “delta”
•E.g., turn in 60% of proj on Thurs, 40% on Sat before 4:00 (electronic)
•Total score is 60% + 0.7*40% = 88%
•Read cheating policy -- don’t do it!!!
Reminder: 1-NN alg
•Nearest neighbor: find the nearest instance to the query point in feature space; return the class of that instance
•Simplest possible distance-based classifier
•With more notation:
•  ŷ(x_q) = Class(x_{i*}),  where i* = argminᵢ d(x_q, xᵢ)
•Distance function d() is anything appropriate to your data
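A minimal 1-NN sketch in NumPy; Euclidean distance is my choice for illustration, since the slide allows any d() appropriate to the data:

```python
import numpy as np

def nn_classify(X_train, y_train, x_query):
    """Return the label of the training instance nearest to x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # d(x_q, x_i) for every i
    i_star = np.argmin(dists)                          # i* = argmin_i d(x_q, x_i)
    return y_train[i_star]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])

print(nn_classify(X_train, y_train, np.array([4.0, 4.5])))  # nearest is [5,5] -> label 1
```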
Reminder: k-NN
•Slight generalization: k-Nearest neighbors (k-NN)
•Find k training instances closest to query point
•Vote among them for label
•Q: How does this affect the system?
•Q: Why does it work?
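A sketch of the k-NN generalization, again assuming Euclidean distance; ties in the vote fall to the most common label `Counter` sees first, an arbitrary choice:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_query, k=3):
    """Label x_query by majority vote among its k nearest training instances."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]             # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())  # vote among their labels
    return votes.most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1])

print(knn_classify(X_train, y_train, np.array([0.4, 0.4]), k=3))  # three 0-labeled neighbors -> 0
```

With k=1 this reduces exactly to the 1-NN rule; larger k smooths the decision boundary by averaging over more neighbors, which previews the questions on this slide.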
Exercise•Show that k-NN does something reasonable:
•Assume binary data
•Let X be query point, X’ be any k-neighbor of X
•Let p=Pr[Class(X’)==Class(X)] (p>1/2)
•What is Pr[X receives correct label]?
•What happens as k grows?
•But there are tradeoffs...
•Let V(k,N)=volume of sphere enclosing k neighbors of X, assuming N points in data set
•For fixed N, what happens to V(k,N) as k grows?
•For fixed k, what happens to V(k,N) as N grows?
•What about radius of V(k,N)?
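A quick numeric take on the first half of the exercise: if each of the k neighbors independently has the query's class with probability p > 1/2 (independence is an assumption of this sketch; odd k avoids ties), the majority vote is correct with the upper binomial tail probability:

```python
from math import comb

def pr_correct(k, p):
    """Pr[majority of k independent neighbors, each correct w.p. p, votes correctly] (odd k)."""
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(k // 2 + 1, k + 1))

for k in (1, 3, 5, 11):
    print(k, round(pr_correct(k, 0.7), 4))
```

The probability of a correct label grows toward 1 as k grows, illustrating why voting helps; the volume/radius tradeoff in the remaining bullets is the catch.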
The volume question
•Let V(k,N) = volume of the sphere enclosing the k nearest neighbors of X, assuming N points in the data set
•Assume a uniform point distribution
•Total volume of the data space is 1 (w.l.o.g.)
•So, on average, V(k,N) ≈ k/N
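A Monte Carlo spot check of V(k,N) ≈ k/N in one dimension, where “volume” is just interval length; the setup (query fixed at the middle of [0,1]) is my own choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_volume(k, N, trials=2000):
    """Average length of the interval around x=0.5 reaching its k-th nearest of N uniform points."""
    lengths = []
    for _ in range(trials):
        pts = rng.random(N)                      # N uniform points in [0, 1]
        d_k = np.sort(np.abs(pts - 0.5))[k - 1]  # distance to the k-th nearest neighbor
        lengths.append(2 * d_k)                  # "volume" of the enclosing interval
    return float(np.mean(lengths))

for k, N in [(5, 100), (20, 100), (5, 1000)]:
    print(k, N, round(avg_volume(k, N), 3), "vs k/N =", k / N)
```

The simulation shows both directions of the tradeoff: for fixed N the volume grows with k, and for fixed k it shrinks as N grows.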
1-NN & speech proc.
•1-NN closely related to the technique of Vector Quantization (VQ)
•Used to compress speech data for cell phones
•CD quality sound:
•16 bits/sample
•44.1 kHz
•⇒ 88.2 kB/sec ⇒ ~705 kbps
•Telephone (land line) quality:
•~10 bits/sample
•10 kHz
•⇒ ~12.5 kB/sec ⇒ 100 kbps
•Cell phones run at ~9600 bps...
Speech compression via VQ
[Figure: speech signal → D.S./cepstrum → vector representation → vector quantize (1-NN) → transmitted exemplar (cell centroid)]
Compression ratio
•Original:
•10 bits/sample
•10 kHz; 250 samples/“frame” (25 ms/frame)
•⇒ 100 kbps; 2500 bits/frame
•VQ compressed:
•40 frames/sec
•1 VQ centroid/frame
•~1M centroids ⇒ ~20 bits/centroid
•⇒ ~800 bits/sec!
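The slide's arithmetic, checked in a few lines (all figures are from the slide):

```python
from math import ceil, log2

# Original telephone-quality stream
bits_per_sample = 10
sample_rate_hz = 10_000
frame_ms = 25
samples_per_frame = sample_rate_hz * frame_ms // 1000   # 250 samples/frame
raw_bps = bits_per_sample * sample_rate_hz              # 100,000 bps
bits_per_frame = bits_per_sample * samples_per_frame    # 2,500 bits/frame

# VQ-compressed stream: one centroid index per frame
frames_per_sec = 1000 // frame_ms                       # 40 frames/sec
n_centroids = 1_000_000
bits_per_centroid = ceil(log2(n_centroids))             # ~20 bits to name a centroid
vq_bps = frames_per_sec * bits_per_centroid             # ~800 bps

print(raw_bps, bits_per_frame, vq_bps, "ratio:", raw_bps // vq_bps)
```

That is a ~125x compression: each 2500-bit frame is replaced by a single ~20-bit centroid index, which is what brings the stream under the ~9600 bps cell budget.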
Signal reconstruction
[Figure: transmitted cell centroid → look up cepstral coefficients → reconstruct cepstrum → convert to audio]
Linear methods•Both methods we’ve seen so far:
•Are classification methods
•Use intersections of linear boundaries
•What about regression?
•What can we do with a single linear surface?
•Not as much as you’d like...
•... but still surprisingly a lot
•Linear regression is the proto-learning method
Linear regression prelims
•Basic idea: assume y is a linear function of x:
•  y ≈ w·x + b
•Our job: find the best w (and b) to fit y “as well as possible”
•By “as well as possible”, we mean here minimum squared error:
•  L(w) = Σᵢ (yᵢ − (w·xᵢ + b))²
Useful definitions
•Definition: A trick is a clever mathematical hack
•Definition: A method is a trick you use more than once
A helpful “method”
•Recall y ≈ w·x + b
•Want to be able to easily write y ≈ w·x
•Introduce a “pseudo-feature” of X, x₀ = 1
•Now have: x = [1, x₁, ..., x_d]
•And: w = [b, w₁, ..., w_d]
•So: y ≈ wᵀx
•And our “loss function” becomes: L(w) = Σᵢ (yᵢ − wᵀxᵢ)²
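A sketch of the pseudo-feature trick in NumPy; the toy data is made up for illustration:

```python
import numpy as np

X = np.array([[2.0], [4.0], [6.0]])        # three 1-feature instances
y = np.array([5.0, 9.0, 13.0])             # constructed to satisfy y = 2x + 1

# Prepend the pseudo-feature x_0 = 1, folding the intercept b into w
X1 = np.hstack([np.ones((len(X), 1)), X])

def loss(w):
    """Squared-error loss L(w) = sum_i (y_i - w^T x_i)^2 on the augmented X."""
    return float(np.sum((y - X1 @ w) ** 2))

print(loss(np.array([1.0, 2.0])))  # w = [b, w_1] = [1, 2] fits exactly -> 0.0
```

The payoff of the trick is exactly this: a single weight vector, no separate intercept term anywhere in the loss.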
Minimizing loss
•Finally, can write: L(w) = (y − Xw)ᵀ(y − Xw)
•Want the “best” set of w: the weights that minimize the above
•Q: how do you find the minimum of a function w.r.t. some parameter?
Minimizing loss
•Back up to the 1-d case
•Suppose you had a function l(w)
•And wanted to find the w that minimizes l()
•Std answer: take the derivative, set it equal to 0, and solve:
•  dl/dw = 0 ⇒ solve for w
•To be sure of a min, check the 2nd derivative too: d²l/dw² > 0
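A worked 1-d instance of this recipe; the particular l(w) here, a one-weight squared-error loss, is my choice (the slide's own function is not shown):

```python
# l(w) = sum_i (y_i - w * x_i)^2
# dl/dw = -2 * sum_i x_i * (y_i - w * x_i); setting it to 0 and solving gives
#   w* = (sum_i x_i * y_i) / (sum_i x_i^2)
# and d2l/dw2 = 2 * sum_i x_i^2 > 0, so w* is indeed a minimum.

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # exactly y = 2x, so w* should come out to 2

w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(w_star)  # 2.0
```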
5 minutes of math...
•Some useful linear algebra identities:
•If A and B are matrices, (AB)ᵀ = BᵀAᵀ
•(AB)⁻¹ = B⁻¹A⁻¹ (for invertible square matrices)
5 minutes of math...
•What about derivatives of vectors/matrices?
•There’s more than one kind...
•For the moment, we’ll need the derivative of a vector function with respect to a vector
•If x is a vector of variables, y is a vector of constants, and A is a matrix of constants, then:
•  ∂(yᵀx)/∂x = y
•  ∂(xᵀAx)/∂x = (A + Aᵀ)x
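Both derivative identities can be spot-checked against central finite differences; the test vectors and matrix below are arbitrary random choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
x = rng.random(n)        # vector of variables
y = rng.random(n)        # vector of constants
A = rng.random((n, n))   # matrix of constants
eps = 1e-6

def num_grad(f, x):
    """Central-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

g1 = num_grad(lambda v: y @ v, x)      # should match y
g2 = num_grad(lambda v: v @ A @ v, x)  # should match (A + A^T) x

print(np.allclose(g1, y, atol=1e-5), np.allclose(g2, (A + A.T) @ x, atol=1e-5))
```

These two identities are exactly what's needed to differentiate L(w) = (y − Xw)ᵀ(y − Xw) with respect to w in the next step of the derivation.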