64
Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13 th , 2009

Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Some Mathematical Tools

for Machine Learning

Chris Burges

Microsoft Research

February 13th, 2009

Page 2: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Contents – Part 1

1.Lagrange Multipliers: An Overview

2.Basic Concepts in Functional Analysis

3.Some Notes on Matrix Analysis

4.Convex Optimization: A Brief Tour

Page 3: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

A Brief Diversion

Leslie Lamport: “How to Write a Proof”: DEC tech report, Feb 14, 1993

Abstract: A method of writing proofs is proposed that makes it much harder

to prove things that are not true. The method, based on hierarchical

structuring, is simple and practical.

“Anecdotal evidence suggests that as many as a third of all papers

published in mathematical journals contain mistakes – not just minor errors,

but incorrect theorems and proofs.”

Page 4: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Lagrange Multipliers:

An Overview, and Some Examples

Page 5: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Lagrange the Mathematician

• Born 1736 in Turin, one of two of 11 to survive infancy

• “Responsible for much fine mathematics published under the names

of other mathematicians”

• Believed that a mathematician has not thoroughly understood his

own work till he has made it so clear that he can go out and explain

it to the first person he meets on the street

• Worked on mechanics, calculus, the calculus of variations

astronomy, probability, group theory, and number theory

• At least partly responsible for the choice of base 10 for the metric

system, rather than 12

• Supported by Euler and d’Alembert, financed by Frederick and Louis

XIV, close to Lavoisier, Marie Antoinette

Page 6: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

An indirect approach can be easier( ) ( ) 1,

0,

1,

Example: Minimize subject to

If could rotate to coordinate system and rescale so that

constraints take the form substitute with a parameterization

that encodes the c

nf x c x x Ax x

A

y y

1

1 1 2 1

2 1 2 1

:

sin sin sin

sin sin cos

onstraints that lives on

solve, and then apply the inverse mapping. But this can get

very complicated!

It is often not even possible to parameterize co

n

n

n

x

y

y

S

nstraints (for

example, polynomial constraints in several variables)

Page 7: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

One equality constraint

2( ) ( ,) 0 .Minimize subject to f x xc x

0f 1f

2f

3f

( ) 0c x

f

c

Page 8: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

One equality constraint, cont.

,

( ) 0

f c

f c

L f c

Hence at the optimum, we must have or:

0f 1f

2f

3f

( ) 0c x

f

c

Page 9: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Multiple equality constraints

( ) 0, 1, , .

( ) ( ).

,

constraints:

Define gradients:

Let be the subspace spanned by the and let be its

orthogonal complement.

Suppose that at some point, all constraints hold, and

i

i i

i

n c x i n

g x c x

S g S

0

0, ( ) : ( ) 0

Then can increase (or decrease) by moving along

Hence or: i i i i

i i

f

f f

f f c x L f c

Puzzle: why not multiple Lagrangians?

Page 10: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

One inequality constraint

*

*

( ) ( ) 0.

( ) 0.

( ) ', ( ) 0.)

Find that minimizes subject to

What's new? At the solution, it's possible that

(Simple but not guaranteed: solve 'minimize check that

x f x c x

c x

f x c x

0f

1f

2f

3f

( ) 0c x

f

c

( ) 0c x

( ) 0c x

: ( ) 0, 0f c f c

Page 11: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Multiple inequality constraints

( ) 0, 0i i ii

f c

Suppose that at the solution

Then removing makes no difference, and we must drop

from the sum in

Equivalently we can set

Hence, always impose

Page 12: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

A simple example

21 11 2 1 2

2 2

1 1 2 1 2 1 2 2

2 2 2

1 2 1 2 1 1 2 2

1 1 2 1 1

:

: , ,

( , ) 1 0, ( , ) 1

( , ) (1 ) (1 )

0 ( )

Extremize the distance between two points on

Embed in extremize

subject to

n

n n

i i

i

f x x x x

c x x x c x x x

L x x f c x x x x

L x x x

n

S

R

2 2 1 2 2

2 1 1 1 2 2

0

0 ( ) 0

(1 ) (1 )

2 0.

,

antipodal or equal: or i

L x x x

x x x x

Page 13: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Another simple example

c

Given a parallelogram whose sidelengths you can choose but

whose perimeter is fixed - what shape has the largest area?

h

b

2( )

( , ) (2( ) )

0

bh b h c

L b h bh b h c

L b h

Maximize subject to

Again, not explicitly needed: hence “method of

undetermined multipliers”

Page 14: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Simple exercises

2

2

1.

1 0 0

Puzzle: what coefficients maximize a convex sum of fixed numbers?

Puzzle: minimize subject to

Puzzle: maximize subject to and (hint: use )

i i

i i

i i i i i

i i

x x

x x x x

Page 15: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Resource Allocation

Distortion

Rate

inNumber of widgets

( )

Cost

i ic n

Fiber has bit errors per second, and sends bits total, per second. We wish to maximize the bit rate at a fixed distortion rate:

Puzzle: Does this work for economics, too?

Page 16: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

A variational problem

An isoperimetric problem: find the curve of fixed length and

fixed endpoints {a,b} that encloses maximum area above [a,b].

x=1x=0

y

1 12 1/ 2

0 0

1 12 1/ 2

0 0

1 12 1/ 2

0 0

12 1/ 2

0

12 3/ 2

0

2 3/ 2

(1 ' )

(1 ' )

(1 ' ) ' '

1 '(1 ' )

1 ''(1 ' )

1 ''(1 ' ) 1 0

Area , length y dx y dx

L y dx y dx

L y dx y y y dx

dy y y dx

dx

y y y dx

y y

…straight line, or arc of circle.

Page 17: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Which univariate distribution has max entropy?

( ) log ( )f x f x dx

Minimize

1

2

2

( ) 0 ,

( ) 1,

( ) ,

( )

f x x

f x dx

x f x dx c

x f x dx c

subject to:

( )( )

( )

functional derivative

g xx y

g y

Need :

2

1 2( ) log ( ) (1 ( ) ) ( ) ( )L f x f x dx f x dx x f x dx x f x dx

2

1 20, log ( ) log( ) 0( )

Lx f y e y y

f y

Impose integrate w.r.t

→ f must be Gaussian!

Page 18: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Which univariate distribution has max

entropy?

Puzzle: I thought the uniform distribution has max entropy.

What’s going on?

Puzzle: What distribution do you get if you fix the mean,

but not the variance?

Puzzle: What distribution do you get if you fix only the

function’s support?

Page 19: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Max Entropy for Discrete Distribn + Linear

Constraints

: 1, 0

Have discrete distribution

Suppose you also have known linear constraints:

but you are maximally uncertain about everything else. So want max

entropy distribution subject to

i i ii

ij j ji

P P P

a P C

log 1

0 exp( 1 ) (1/ )exp( )

these constraints.

i i j ij j j i i ii j i i i

k k j kj j kjj j

L P P a P C P P

L P a Z a

→ logistic regression!

Page 20: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Are Lagrange Multipliers Really That Common?

Yes.

• Most flavors of Support Vector Machines

• Principal Component Analysis

• Canonical Correlation Analysis

• Locally Linear Embedding

• Laplacian Eigenmaps

• …

For example:

Page 21: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Basic Concepts in

Functional Analysis:

A Brief Tour

of Hilbert spaces, Norms,

and All That

Page 22: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

What is a Field?

{ , , }: , { , } F F a set operations

{ , } is an Abelian group with identity denoted by 0F

{ 0, }F is an Abelian group with identity denoted by 1

( ) { , , }x y z x y x z x y z F

A field generalizes the notion of arithmetic on reals.

Page 23: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Field : Examples

F

F

F

R

Q

C

With + meaning addition and meaning multiplication,

the reals:

the rationals:

the complex numbers:

2

{0

{

,1

: 0

}

} { , }?

2

+

Puzzle 1: For , why doesn't using "

with + meaning "X

OR" for + work?

Pu

What's the s

zzle 2: Is

OR" a

mallest fiel

nd meaning "AND"

a field und

?

er

d

x x

Z

Z

Page 24: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

How Many Fields Are There?

p

2

Actually infinitely many - e.g. , prime.

However, can define an 'ordering;' for fields (like <, =, > for ).

, can be ordered; and cannot; in fact all finite fields

cannot be ordered.

For orde

pZ

R

R Q C Z

.

red fields, 'supremum' and 'infimum' can be defined.

An ordered field is 'complete' iff every nonempty subset of

that has an upper bound in also has a supremum in

is not complete, but is. In

F

F F

Q R fact: every complete, ordered

field is isomorphic to !R

However, can define an ordering for some fields.

Page 25: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

What is a Vector Space?

,

( , )

( , )

;

( ) (

A vector space is a nonempty set a field and operations 'addition'

( from ) and 'multiplication by a scalar'

( from into ) such that:

(a)

(b)

E F

x y x y E E E

x x F E E

x y y x

x y z z y

);

, , ;

) ( ) ;

) ;

( ) ;

1 .

(c) For every there exists such that

(d) (

(e) (

(f)

(g)

z

x y E z E x y z

x x

x x x

x y x y

x x

nA vector space generalizes the notion of vectors in nR

Page 26: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Vector Spaces: Field Matters!

{ , } { , } );

{ , } { , } );

{ , } { , } 2 );

(dimension

(dimension

(dimension

N

N

N

E F N

E F N

E F N

{ , }, 1

{ , }

:

For the vector space vectors and are linearly

independent.

For

'Linear

the vec

dependence' depends on field

tor space they are not.

i

F

Page 27: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Vector Spaces: More Examples

( )( ) ( ) ( );

( )( ) ( ).

f g x f x g x

f x f x

(2) (complex by matrices), over the field ;mnM m n

(1) Functions, whose range is a vector space, also

themselves form a vector space:

1

1 1

1,

( ),

(3) : for the space of all infinite sequences of

complex numbers such that

p

p

n

n

n n n

n n

l p

z

x y x y x x

Page 28: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

What is an Inner Product?

, :

, , ,

, 0

, 0 0

, , ,

, ,

, ,

Let be a vector space over or . A function

is an if for all

(a)

(b)

(c)

(d) for all scalars

inner produ

)

ct

(e

V V V F

x y z V

x x

x x x

x y z x z y z

cx y c x y c F

x y y x

The inner product generalizes the notion of dot product

Page 29: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Inner Product: Examples

2 1

{ , } ,

,

, ( )

,

(1) Vector space with , defined by

(2) Vector space over with , defined by

(3) Vector space of matrices over with

( )

Ni ii

i ii

T

pm

x y x y

l x y x y

X Y Trace X Y

X Y M

Page 30: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Inner Product: Trace

2

2

( ) 0

( ) 0

(( ) , ) ( )

, 0

, 0 0

, , , :

,

(

,

)

(a)

(b)

(c)

(d) f

Ti

i

Ti

i

T T T

X X

X X x

x y

Tr X X X

Tr X X X

Tr Xz x z Yy z

cx y c

Z Tr X Z Tr Y Z

x y

( ) ( )

( ) ( ), ,

or all scalars

(e)

T T

T T

Tr X Y Tr X Y

Tr X Y Tr Yx X

c F

y y x

Page 31: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

2

1

1/ 2

| , | , , ,

| , |cos : 0

( , , )

Cauchy Schartz inequality holds for any inner product , :

for all

"Angle" between two vectors can be defined for any inner product:

x y x x y y x y V

x y

x x y y

2

1 2 1 0

3 4 1 2E.g. the angle between and is 78.4 degrees !

Inner Product is General

Page 32: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

What is a Norm?

:

, ,

0

0 0

| |

Let be a vector space over or . A function is

a norm if for all

(a)

(b)

(c) for all scalars

(d)

If condition (b) is droppe

Wh

d, it's a 'seminorm'.

at is the

V V

x y V

x

x x

cx c x c F

x y x y

simplest seminorm? : 0Ans x

Page 33: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Seminorm splits the space

0

1 0 1

1.

{ : 0}

,

0.

If is a seminorm on , then is a subspace

(called the null space).

If is a subspace of such that then is a norm

on

Define The corresponding cosets form a vec

V V v V v

V V V V

V

x y x y

5 , ' , (3,2,1,0,0),

(0,0,0,1,0) (0,0,0,0,1).

tor

space, and on that sp

Example seminorm on : null space

spanned by

ace, is a no

and

rm.

x Dx D diag

Page 34: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

2 2 21 2 1 2

1/

( , , , )

{ } . .n=1

for is the Euclidean

norm.

Let The function defined by is a norm in

This works for finite sums too: define the norm on as:

nn n

pp

n p n p

np

x x x x x x x x

z z l z z l

l

x

1/

1 1 2| | | | | |

max({ })

{ }.

i=1

The norm is the 'Manhattan' norm

The norm is the 'max' norm

A normed vector space is a pair vector space, norm

np

p

i

n

i

x

l x x x x

l x x

Norm Generalizes Length

Page 35: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

What is convexity?

1 2 1

1 2( ) ( ).

A function is if, between and , it lies below the

chord joi

convex

strictly convexning and It is if it lies strictly below.

x x x

f x f x

1 2 1 2((1 ) ) (1 ) ( ) ( )

0 1

f x x f x f x

.

A set is if, for all and , all points on the line

joining and

convex

lie in

S a S b S

a b S

1x

2x

Page 36: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

When is a sphere a square?

2

1

: the 1-sphere

: the

Green

1-sphere

: the

Blu

Red

1-spher

e

e

l

l

l

Interesting... each ball looks convex

Page 37: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

When is a sphere a cross?

0

0

0

0 0,

0

| | 1.0

In this context, never. The " norm" (count the number of

nonzero elements) is not a norm (for over or ).

if

if

unless

n

l

v

v

v

00? lim ?

p

p iix x

Page 38: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

All norms are convex functions

1 2 1 2

1 2

1 2

(1 ) 1

| 1 | | |

1

x x x x

x x

x x

1 2

1 2 1 2

2

( ) ( ) 0

: ( ) 0 . ( 0 ( 0,

((1 ) ) (1 ) ( ) ( ) 0

1,

If is convex, then defines a convex set:

let Then if ) and )

the unit ball, for any norm is a convex region.

f x f x

R x f x f x f x

f x x f x f x

x

Page 39: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Open, Closed, Compact

, :

: , 0 . . ( , )

:

: 0 ( ,0)

Given a norm and a set in a vector space

n

n

i

n S V

Open x S s t B x S

Closed complement of open

Bounded r suchthat S B r

Compact : Every sequence x in S contains convergent subsequence

with limit in S

O

1 2, , , , , ,

Compact sets are closed and bounded.

For finite dimensional normed vector spaces, every closed bounded

set is compa

NN i i

R : Every cover has a finite sub - cover :

S S S S S N finite suchthat S S

,

ct.

If for a normed vector space the unit ball is compact, then has

finite dimension.

V V

Page 40: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Making your own norm

In fact, in real finite dimensional spaces, any symmetric,

compact, convex region centered on the origin defines

a norm (as the unit ball for that norm):

Page 41: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

0Rescuing : the Hamming norml

2

2

0

.

:

( )

( ) ( )

1

.

0

Consider n-tuples taking values in These form a vector

space over the field for example,

Now define the norm to be the number of nonzero elements

of Is

x y x y

x x

x x

l x N

x

Z

Z

0

0

0 0

0 0 0

0

0 0

| |

this a norm?

(a)

(b)

(c)

(d)

x

x x

cx c x

x y x y

Page 42: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Hamming norm, cont.

2

2

1)

0

Puzzle: ( - how can this be correct?

Puzzle: What is the subtraction operation, in ?

Puzzle: What is the Hamming distance ?

N

x y

Z

Z

Page 43: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

When does a norm come from an inner product?

2 2

,

1,

4

Every inner product defines a norm:

Does every norm define an inner product? If so, for real vector spaces,

No! A necessary and sufficient condition for a norm to correspond t

x x x

x y x y x y

2 2 2 2

o an

inner product is the paralellogram identity:

1

2

(Jordan and von Neumann, 1935)

x y x y x y

Page 44: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Lp norms, inner products

1/

2 /2

2 // 2

2 // 2

1 2 2 1 1 2 2 1 1 2 2

| | , | |

| | ? ,

, | | : , ,

, | ( ) | , ,

2.

1

where is integrable

E.g. try then but

unless

p

pp p

L

pp

pp

pp

f f f

f f f f

f g fg f g f g

f f g f f g f g f g

p

Page 45: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Lp norms, inner products cont.

2 2

1

)

1,

4

Maybe we could find some other inner product that works for

all ?

No: if a norm is derivable from an inner product (over , then

. Choose two functions:

p

x y x y x y

1

1

1

1

( ) 0 : 0

( ) 1 : 0 1

( ) 2 : 1 2

( ) 0 : 2

f x x

f x x

f x x

f x x

2

2

2

0 : 1

1 : 1 2

0 : 2

f x

f x

f x

1 2

1

Page 46: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Lp norms, inner products cont.

We’d like:

1 2 3 4 5 6 7 8 9 101.5

1.6

1.7

1.8

1.9

2

2.1

2.2

2.3

2.4

2.5

p

Page 47: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

norm on Rn has no inner product

2

2 2 2 2

2 2 2 22 2

: [1,0], [0,2]

1?

2

1 1(2 2 ) 4 1 4 5

2 2

[1,0,0, ,0], [0,2,0, ,0]

Example in

Use parallelogram test:

Extend to : n

x y

x y x y x y

x y x y x y

x y

R

R

Page 48: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

What is a Cauchy Sequence?

{ }

, .

Cauchy sequenc

A sequence of vectors in a normed vector space is called

a if for every >0 there exists a number

such that for all

e

n

m n

x

M

x x m n M

1x2x 3x

4x 5x 6x

Key idea: the Cauchy sequence

allows us to define notions of

convergence without ever

leaving the space

Page 49: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Cauchy sequences, cont.

Every convergent sequence is a Cauchy sequence.

Not every Cauchy sequence is a convergent sequence.

Note Cauchy sequence requires choice of norm!

[0,1]

2 3

(0,1) [0,1].

max | ( ) | .

( ) 12! 3! !

1,2, { }

(0,1)

: the space of polynomials on Choose the

norm Define

for . Then is a Cauchy sequence but it does

not converge in becaus

n

n

n

l

P P x

x x xP x x

n

n P

P

P e its limit is not a polynomial.

Page 50: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

The notion of completeness

1 .

A normed vector space is called if every Cauchy

sequence in converges to an element of .

complete

Banach spaA complete normed space is called a .

with norm is complete, for all

The se

ce

q

np

E

E E

l p

2 1

2

1

[ , ]

[ , ]

uence space is complete, for all .

with norm is complete.

with norm or norm is not complete.

"All square integrable functions with norm" is complete.

A complete space "con

pl p

C a b L

C a b L L

L

tains all its limit points."

It is always possible to 'complete' a non-complete space.

Page 51: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Hilbert and Banach spaces

A Hilbert space is a complete inner product space.

Normed Inner Product

Banach Hilbert

[ , ] : ,C a b x y xy

[0,1]Polynomials on with max norm

Page 52: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Hilbert and Banach spaces, cont.

Normed

Banach

Hilbert

2[ , ] with

norm

C a b L

Polynomials on [0,1], max norm

[ , ] with normC a b L2pL

2pl

Finite dim. inner product spaces

2, all square

integrable fns

L

2l

Page 53: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

How many kinds of Hilbert spaces are there?

1 2 1,2

1

: ,

) ( ) ( ) ,linear mappin

A mapping where are vector spaces, is called

a if ( and

all scalar .

g

s ,

T E E E

T x y T x T y x y E

1 2

1 2

1

( ), ( ) ,

, .

A Hilbert space is said to be to a Hilbert space

if there exists a 1-1 linear mapping from to such that

for every Such a

isomorphic

Hilbert spac is called a e i

H H

T H H

T x T y x y

x y H T

1 2.of

somo

onto

rp ism

h

H H

Page 54: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

How many kinds of Hilbert spaces are there?

2 2[ , ]E.g. , are separable Hilbert spaces.l L a b

2.

, .

Let be separable:

If is infinite dimensional, then it is isomorphic to

If has dimension then it is is

Answe

omorp

r: 2..

hic to

.

N

H

H l

H N

A Hilbert space H is called separable if and only if it admits a

countable orthonormal basis. (So, all finite dimensional

Hilbert spaces are separable).

Page 55: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

The Riesz Representation Theorem

1sup | ( ) |

. . ( ) )

The of a linear functional is defined:

A linear functional on a normed vector space is bounded

( if and only if its operator

norm

operato

is finite

r

.

norm

x

f

f f x

K s t f x K x x V

{ , }

: .

A linear on a normed vector space is

a linear mappi

functi

ng

onal V F

V F

Page 56: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

The Riesz Representation Theorem

0 0

0

.

( ) ,

, .

Let be a bounded linear functional on a Hilbert space

Then there exists exactly one such that

for all and in fact

f H

x H f x x x

x H f x

2

2

[ , ], .

( ) ( )

? ( ) ( ) ( ) ( ) ( ) ( ) ( )

? ( ) ( ) ( ) ( )1

( )

Example: Define a linear functional by

b

a

b b b

a a a

b b b

a a a

b

a

H L a b a b

f x x t dt

Linear f x y x t y t dt x t dt y t dt f x f y

Bounded f x x t dt x t dt x t dt

x t dt

1 12 2

1

b

a

dt b a x

Page 57: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Riesz Representation Theorem, cont.

0 0

0

1/ 22

0

2

1 1

1: ,1 ( ).1 ( ) ( )

?

1

sup | ( ) | sup | ( ) | sup | ( ) | : ( ) 1

1/ ,

Can we find ? Try

Check:

The sup is found by choosing

b b

a a

b

a

b b b

x x

a a a

x x x x t dt x t dt f x

f x

x b a

f f x x t dt

x b a

x t dt x t dt

a

, 0 otherwise.

f

x

b a

t b

Page 58: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

A Brief Look Ahead

The Riesz representation theorem can be used to show that any Hilbert

space for which the evaluation functional is continuous, is a Reproducing

Kernel Hilbert Space: there exists K such that

In particular:

Representer Theorem (Kimeldorf and Wahba, 1971; Schölkopf and

Smola, 2002): Let be a strictly monotonic increasing function

and let be an arbitrary loss function. Then each minimizer of

the “regularized risk”

admits a representation of the form

Page 59: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

What is a metric space?

( , )

( , ) ,

( , ) ( , ) ;

For any set , let be a function (with range in ) defined on

the set of all ordered pairs of members of satisfying:

(i) is a finite real number for every pair of

(i

E x y

E E x y E

x y x y E E

( , ) 0 ;

( , ) ( , ) ( , ), { , , } .

:

i)

(iii)

Such a function is called a on ; a set with

metric is called a . Different choices of m

metric

metric sp etric on

give different me

ace

t

x y x y

y z x y x z x y z E

E E E E

E

ric spaces.

( , ) 0?Puzzle: What about x y

( , ) ( , )?Puzzle: How about x y y x

( , ) ( , ) ( , )?Puzzle: Where's the triangle inequality: x y x z z y

Page 60: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Metric versus norm

( , )Every normed vector space is a metric space: define

But metric spaces are much more general:

x y x y

Normed spaces

Metric spaces

2.2 1.0

3.4

4.5 3.1

6.0

1.1 0.5

5.9

1.1 6.4

2.3

9.6

19.5d

( ( )) ( )Metrics extend "continuity": f B x B f x

Page 61: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Some metrics on function spaces

12

[ , ]

1

2

2

:[ , ]

, , ( , ) sup ( ) ( )

[ , ] : ( , ) ( ) ( ) ,

( , ) ( ) ( ( , ) ?) ,

Let be the set of all bounded functions . For two

points is a metric.

Let be

Sup

x a b

b

ap

a

b

A f a b

f g A f g f x g x

A C a b f g f x g x dx

f g f x g x dx f g

R.

( ) ( ), [ , ]

[ , ] :

( , ) sup ( ) ( ) , '( ) '( ) , , ( ) ( )

pose instead thenr

r rr x a b

A C a b

f g f x g x f x g x f x g x

Page 62: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Topological Spaces

, :

,

,

A is a non-empty set, a fixed collection

of subsets of satisfying

(1) ,

(2) Intersection of any two sets in is in ,

(3) Unio

topological space

n of any collection of sets in is

T A A

A

A

S S a

S

S i S

S

1 1

,

.

,

in .

is called a topology for and the members of are called the

of

Topological spaces are more general than metric spaces!

Topological spaces extend "continuity": Given an

open

sets

A

T

T A 1

S.

S S

S 2 2

11 2 2 1

,

: , ( )

d and a

map is "continuous" if

T A

A A U A U A

2S

..

Continuity, convergence, connectedness… without distance!

Page 63: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

Putting Spaces in their Places…

Hilbert spaces

Banach spaces

Normed vector spaces

Metric spaces

Topological spaces

, : the "indiscrete" topology on A A S

Page 64: Some Mathematical Tools for Machine Learning (part 1)€¦ · Some Mathematical Tools for Machine Learning Chris Burges Microsoft Research February 13th, 2009. Contents –Part 1

~ The Middle ~

Thanks!