# Learning with Similarity Functions


Maria-Florina Balcan & Avrim Blum, CMU, CSD

Kernels and Similarity Functions

Kernels have become a powerful tool in ML.

• Useful in practice for dealing with many different kinds of data.

• Elegant theory about what makes a given kernel good for a given learning problem.

Our Goal: analyze more general similarity functions.

• In the process we describe ways of constructing good data-dependent kernels.

Kernels

• A kernel $K$ is a pairwise similarity function s.t. $\exists$ an implicit mapping $\Phi$ s.t. $K(x,y) = \Phi(x) \cdot \Phi(y)$.

• Point is: many learning algorithms can be written so that they only interact with data via dot-products.

• If we replace $x \cdot y$ with $K(x,y)$, the algorithm acts implicitly as if the data were in the higher-dimensional $\Phi$-space.

• If data is linearly separable by a large margin in $\Phi$-space, we don’t have to pay for the high dimension in terms of data or comp time.

If margin $\gamma$ in $\Phi$-space, only need about $1/\gamma^2$ examples to learn well.

[Figure: large-margin linear separator $w$ over points $\Phi(x)$.]
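As a small illustration of the implicit mapping, the sketch below uses the degree-2 polynomial kernel $K(x,y) = (x \cdot y)^2$ on 2-D inputs. This kernel and the names `poly2_kernel` and `phi` are chosen for illustration only and do not come from the slides. It checks that evaluating $K$ directly gives the same value as mapping both points through $\Phi$ and taking a dot product.

```python
import math

def poly2_kernel(x, y):
    """K(x, y) = (x . y)^2, computed directly in the 2-D input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit 3-D feature map with phi(x) . phi(y) = (x . y)^2."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly2_kernel(x, y)    # never constructs phi
k_explicit = dot(phi(x), phi(y))   # constructs phi, then takes the dot product
```

Both values agree up to floating-point rounding, so an algorithm that only needs dot products can call `poly2_kernel` and never build `phi` at all.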

General Similarity Functions

Goal: definition of good similarity function for a learning problem that:

1) Talks in terms of natural direct properties:

• no implicit high-dimensional spaces

• no requirement of positive-semidefiniteness

2) If K satisfies these properties for our given problem, then this has implications for learning.

3) Is broad: includes the usual notion of “good kernel” (induces a large margin separator in $\Phi$-space).

A First Attempt: Definition satisfying properties (1) and (2)

Let $P$ be a distribution over labeled examples $(x, l(x))$.

• $K:(x,y) \to [-1,1]$ is an $(\epsilon,\gamma)$-good similarity for $P$ if at least a $1-\epsilon$ probability mass of $x$ satisfy:

$$E_{y \sim P}[K(x,y) \mid l(y)=l(x)] \ge E_{y \sim P}[K(x,y) \mid l(y) \ne l(x)] + \gamma$$

Note: this might not be a legal kernel.

• Suppose that any two positives have $K(x,y) \ge 0.2$, any two negatives have $K(x,y) \ge 0.2$, but for a positive and a negative the values $K(x,y)$ are uniform random in $[-1,1]$.

[Figure: points A, B, C among positive (+) and negative (−) examples.]
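The definition can be checked empirically on a finite labeled sample. The sketch below is one plausible way to do that, not something given in the slides: it replaces the expectations over $P$ with sample averages (leaving out the point itself) and reports the fraction of points whose gap falls below $\gamma$. The function names and the toy similarity $K(x,y) = 1 - |x-y|$ are assumptions made for illustration.

```python
def goodness_gaps(sample, K):
    """For each (x, label): mean K to same-label points minus mean K to other-label points."""
    gaps = []
    for i, (x, lx) in enumerate(sample):
        same = [K(x, y) for j, (y, ly) in enumerate(sample) if j != i and ly == lx]
        other = [K(x, y) for (y, ly) in sample if ly != lx]
        gaps.append(sum(same) / len(same) - sum(other) / len(other))
    return gaps

def empirical_epsilon(sample, K, gamma):
    """Fraction of sample points whose gap is below gamma (an estimate of epsilon)."""
    gaps = goodness_gaps(sample, K)
    return sum(1 for g in gaps if g < gamma) / len(gaps)

# Toy data on the interval [-1, 1]; K stays within [-1, 1] there.
K = lambda x, y: 1.0 - abs(x - y)
sample = [(0.8, +1), (0.9, +1), (1.0, +1), (-1.0, -1), (-0.9, -1), (-0.8, -1)]
eps_at_half = empirical_epsilon(sample, K, 0.5)
```

On this well-separated toy sample every point clears $\gamma = 0.5$, so the estimated $\epsilon$ is zero. With real data the estimate depends on the sample size and should be read as a rough diagnostic.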

A First Attempt: How to use it?

Algorithm

• Draw $S^+$ of $O((1/\gamma^2)\ln(1/\delta^2))$ positive examples.

• Draw $S^-$ of $O((1/\gamma^2)\ln(1/\delta^2))$ negative examples.

• Classify $x$ based on which gives better score, i.e. whichever of $S^+$, $S^-$ has the higher average similarity to $x$.
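The classification rule can be sketched in a few lines. The function name `classify`, the tie-breaking toward the positive class, and the toy similarity are assumptions for illustration. The sample sizes here are tiny and do not follow the $O((1/\gamma^2)\ln(1/\delta^2))$ bound.

```python
def classify(x, S_pos, S_neg, K):
    """Label x by whichever sample has the higher average similarity to it."""
    score_pos = sum(K(x, y) for y in S_pos) / len(S_pos)
    score_neg = sum(K(x, z) for z in S_neg) / len(S_neg)
    return +1 if score_pos >= score_neg else -1  # ties go to +1 (arbitrary choice)

# Toy similarity on [-1, 1]; values stay within [-1, 1].
K = lambda x, y: 1.0 - abs(x - y)
S_pos = [0.7, 0.9]
S_neg = [-0.8, -0.6]

label_a = classify(0.5, S_pos, S_neg, K)
label_b = classify(-0.3, S_pos, S_neg, K)
```

Nothing here requires `K` to be symmetric or positive semidefinite. The rule only evaluates `K(x, y)` with the test point in the first argument.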

Guarantee: with probability $\ge 1-\delta$, error $\le \epsilon + \delta$.

Proof

• Hoeffding: for any given “good x”, probability of error w.r.t. x (over draw of $S^+$, $S^-$) is at most $\delta^2$.

• By Markov, at most $\delta$ chance that the error rate over GOOD is more than $\delta$. So overall error rate $\le \epsilon + \delta$.
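The two proof steps can be written out with explicit constants. The constants below are one consistent choice worked out here, not values from the slides, which only state the $O(\cdot)$ form. Here $n$ denotes the common size of $S^+$ and $S^-$.

```latex
% Fix a good x: E[K(x,y) | same label] - E[K(x,y) | other label] >= \gamma.
% Each K(x,y) lies in [-1,1], so Hoeffding over n i.i.d. draws y_1,...,y_n gives
\Pr\left[\,\left|\frac{1}{n}\sum_{i=1}^{n} K(x,y_i) - \mathbb{E}\,K(x,y)\right| \ge \frac{\gamma}{2}\right]
  \le 2\exp\left(-\frac{n\gamma^2}{8}\right).

% x can only be misclassified if one of the two averages (over S^+ or S^-)
% is off by at least \gamma/2, so by a union bound
\Pr[\text{error on } x] \le 4\exp\left(-\frac{n\gamma^2}{8}\right) \le \delta^2
  \quad\text{whenever}\quad n \ge \frac{8}{\gamma^2}\ln\frac{4}{\delta^2}.

% The expected error rate over good x is then at most \delta^2, and Markov gives
\Pr\left[\operatorname{err}_{\mathrm{GOOD}} > \delta\right] \le \frac{\delta^2}{\delta} = \delta .
```

Adding the at most $\epsilon$ mass of points that are not good gives the overall error bound $\epsilon + \delta$ with probability at least $1-\delta$.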

A First Attempt: Not Broad Enough

• $K(x,y) = x \cdot y$ has a good (large margin) separator but doesn’t satisfy our definition.

[Figure: positive (+) and negative (−) points; annotation: more similar to negs than to typical pos.]

Idea: would work if we didn’t pick y’s from top-left.

Broaden to say: OK if $\exists$ a large region $R$ s.t. most $x$ are on average more similar to $y \in R$ of same label than to $y \in R$ of other label.

[Figure: the same points with region R marked.]

• Main Definition: $K:(x,y) \to [-1,1]$ is an $(\epsilon,\gamma)$-good similarity for $P$ if there exists a weighting function $w(y) \in [0,1]$ such that at least a $1-\epsilon$ probability mass of $x$ satisfy:

$$E_{y \sim P}[w(y)K(x,y) \mid l(y)=l(x)] \ge E_{y \sim P}[w(y)K(x,y) \mid l(y) \ne l(x)] + \gamma$$
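A concrete instance may help show what the weighting buys. The three-point distribution below is a hypothetical example constructed here, not the one drawn on the slides. It uses unit vectors at 30° and 150° (positives) and at −30° (negatives), with $K(x,y) = x \cdot y$. The separator $(0,1)$ has margin 0.5, yet with $w \equiv 1$ the positives at 30° are more similar to the negatives than to positives on average. A suitable $w$ repairs this.

```python
import math

def unit(deg):
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))

def K(x, y):
    return x[0] * y[0] + x[1] * y[1]

# Hypothetical distribution: (point, label, probability mass).
A, B, N = unit(30), unit(150), unit(-30)
P = [(A, +1, 0.25), (B, +1, 0.25), (N, -1, 0.50)]

def gap(x, lx, w):
    """E[w(y)K(x,y) | same label] - E[w(y)K(x,y) | other label] under P."""
    def cond_mean(keep):
        pts = [(y, p) for (y, ly, p) in P if keep(ly)]
        total = sum(p for _, p in pts)
        return sum(p * w(y) * K(x, y) for y, p in pts) / total
    return cond_mean(lambda ly: ly == lx) - cond_mean(lambda ly: ly != lx)

uniform = lambda y: 1.0                  # first-attempt definition
weights = {A: 1.0, B: 0.0, N: 0.5}       # one weighting that works here
reweighted = lambda y: weights[y]

gaps_uniform = [gap(x, lx, uniform) for (x, lx, _) in P]
gaps_weighted = [gap(x, lx, reweighted) for (x, lx, _) in P]
```

With uniform weights the gap for point `A` is negative, so a quarter of the probability mass fails the first-attempt definition for every $\gamma > 0$. With the chosen weights all three gaps equal 0.25, so in this toy case $K$ is $(0, 0.25)$-good under the main definition.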

Main Definition, How to Use It

Algorithm

• Draw $S^+=\{y_1, \ldots, y_d\}$, $S^-=\{z_1, \ldots, z_d\}$, $d=O((1/\gamma^2)\ln(1/\delta^2))$.

• Use to “triangulate” data:

$$F(x) = [K(x,y_1), \ldots, K(x,y_d), K(x,z_1), \ldots, K(x,z_d)]$$

• Take a new set of labeled examples, project to this space, and run your favorite alg for learning lin. separators.

Point is: with probability $\ge 1-\delta$, there exists a linear separator of error $\le \epsilon+\delta$ at margin $\gamma/4$, namely

$$w = [w(y_1), \ldots, w(y_d), -w(z_1), \ldots, -w(z_d)]$$
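The mapping and the follow-up learning step can be sketched as below. The plain perceptron stands in for “your favorite alg”; the slides do not prescribe a particular learner. The helper names, the toy similarity, and the toy data are assumptions for illustration.

```python
def make_feature_map(S_pos, S_neg, K):
    """F(x) = [K(x,y_1..y_d), K(x,z_1..z_d)] for landmarks drawn from S+ and S-."""
    landmarks = list(S_pos) + list(S_neg)
    return lambda x: [K(x, y) for y in landmarks]

def perceptron(data, epochs=50):
    """Plain perceptron (no bias term) on (feature_vector, label) pairs."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        mistakes = 0
        for f, label in data:
            if label * sum(wi * fi for wi, fi in zip(w, f)) <= 0:
                w = [wi + label * fi for wi, fi in zip(w, f)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

K = lambda x, y: 1.0 - abs(x - y)          # toy similarity on [-1, 1]
F = make_feature_map([0.7, 0.9], [-0.8, -0.6], K)

# A fresh labeled set, projected into the landmark space before learning.
train = [(0.2, +1), (0.5, +1), (0.8, +1), (-0.2, -1), (-0.5, -1), (-0.9, -1)]
w = perceptron([(F(x), label) for x, label in train])
```

The learner never sees the weighting function $w(y)$ from the definition. The guarantee only says that a good separator exists in the mapped space, and the linear learner is left to find one.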

Main Definition, Implications

Guarantee: with prob. $\ge 1-\delta$, there exists a linear separator of error $\le \epsilon+\delta$ at margin $\gamma/4$.

Implications

• $K$ an arbitrary sim. function that is an $(\epsilon,\gamma)$-good sim. function $\Rightarrow$ a legal kernel that is an $(\epsilon+\delta,\gamma/4)$-good kernel function.

Good Kernels are Good Similarity Functions

Theorem

• An $(\epsilon,\gamma)$-good kernel is an $(\epsilon',\gamma')$-good similarity function under the main definition.

Our current proofs incur some penalty: $\epsilon' = \epsilon + \epsilon_{\text{extra}}$, $\gamma' = \gamma^3 \epsilon_{\text{extra}}$.

Proof Sketch

• Suppose K is a good kernel in the usual sense.

• Then, standard margin bounds imply:

– if $S$ is a random sample of size $\tilde{O}(1/(\epsilon\gamma^2))$, then whp we can give weights $w_S(y)$ to all examples $y \in S$ so that the weighted sum of these examples defines a good LTF.

• But, we want sample-independent weights [and bounded].

– Boundedness not too hard (imagine a margin-perceptron run over just the good y).

– Get sample-independence using an averaging argument.

Learning with Multiple Similarity Functions

• Let $K_1, \ldots, K_r$ be similarity functions s.t. some (unknown) convex combination of them is $(\epsilon,\gamma)$-good.

Algorithm

• Draw $S^+=\{y_1, \ldots, y_d\}$, $S^-=\{z_1, \ldots, z_d\}$, $d=O((1/\gamma^2)\ln(1/\delta^2))$.

• Use to “triangulate” data:

$$F(x) = [K_1(x,y_1), \ldots, K_r(x,y_d), K_1(x,z_1), \ldots, K_r(x,z_d)]$$

Guarantee: The induced distribution $F(P)$ in $\mathbb{R}^{2dr}$ has a separator of error $\le \epsilon+\delta$ at margin at least [formula missing in source].

Sample complexity is roughly [formula missing in source].
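The concatenated mapping for several similarity functions can be sketched as follows. The helper name and the two toy similarity functions are assumptions for illustration. The ordering (all $r$ functions for one landmark, then the next landmark) is one reading of the slide's formula.

```python
def make_multi_feature_map(S_pos, S_neg, kernels):
    """F(x) lists K_1..K_r against each landmark in S+ and then S-: 2*d*r features."""
    landmarks = list(S_pos) + list(S_neg)
    return lambda x: [K(x, y) for y in landmarks for K in kernels]

# Two toy similarity functions on [-1, 1]; both stay within [-1, 1].
K1 = lambda x, y: 1.0 - abs(x - y)
K2 = lambda x, y: x * y

F = make_multi_feature_map([0.7, 0.9], [-0.8, -0.6], [K1, K2])
features = F(0.5)   # d = 2 landmarks per class, r = 2 functions -> 8 features
```

The unknown convex combination never has to be identified up front. A linear learner run on `features` is free to weight the coordinates of each $K_i$ as needed.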

Implications & Conclusions

• Develop theory that provides a formal way of understanding kernels as similarity functions.

• Our algorithms work for similarity fns that aren’t necessarily PSD (or even symmetric).

Open Problems

• Better results for learning with multiple similarity functions. Extending [SB’06].

• Improve existing bounds.