51
Continuous Joint Distributions Chris Piech CS109, Stanford University

Continuous Joint Distributions - Stanford University

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Continuous Joint Distributions - Stanford University

Continuous Joint DistributionsChris Piech

CS109, Stanford University

Page 2: Continuous Joint Distributions - Stanford University

1. Know how to use a multinomial2. Be able to calculate large bayes problems using a computer

3. Use a Joint CDF

Learning Goals

Page 3: Continuous Joint Distributions - Stanford University

Motivating Examples

Page 4: Continuous Joint Distributions - Stanford University

Four Prototypical Trajectories

Recall logs

Page 5: Continuous Joint Distributions - Stanford University

Log Reviewlog(x) = y implies e

y= x

log(x) = y implies e

y= x

Page 6: Continuous Joint Distributions - Stanford University

Log Identities

log(a · b) = log(a) + log(b)

log(a/b) = log(a)� log(b)

log(an) = n · log(a)

Page 7: Continuous Joint Distributions - Stanford University

Products become Sums!

log(

Y

i

ai) =X

i

log(ai)

log(a · b) = log(a) + log(b)

* Spoiler alert: This is important because the product of many small numbers gets hard for computers to represent.

Page 8: Continuous Joint Distributions - Stanford University

Four Prototypical Trajectories

Where we left off

Page 9: Continuous Joint Distributions - Stanford University

Joint Probability TableJointProbabilityTable

DiningHall EatingClub Cafe Self-madeMarginalYear

Freshman 0.02 0.00 0.02 0.00 0.04Sophomore 0.51 0.15 0.03 0.03 0.69

Junior 0.08 0.02 0.02 0.02 0.13Senior 0.02 0.05 0.01 0.01 0.085+ 0.02 0.01 0.05 0.05 0.07

MarginalStatus 0.65 0.23 0.13 0.11

Page 10: Continuous Joint Distributions - Stanford University

Fall2017 Spring2017

Change in Marginal!

Page 11: Continuous Joint Distributions - Stanford University

• Multinomial distribution§ n independent trials of experiment performed§ Each trial results in one of m outcomes, with

respective probabilities: p1, p2, …, pm where§ Xi = number of trials with outcome i

where and

å=

=m

iip

11

mcm

cc

mmm ppp

cccn

cXcXcXP ...,...,,

),...,,( 2121

212211 ÷÷

ø

öççè

æ====

ncm

ii =å

=1!!!

!,...,, 2121 mm ccc

nccc

n×××

=÷÷ø

öççè

æ

The Multinomial

Joint distribution Multinomial # ways of ordering the successes

Probabilities of each ordering are equal and

mutually exclusive

Page 12: Continuous Joint Distributions - Stanford University

• 6-sided die is rolled 7 times§ Roll results: 1 one, 1 two, 0 three, 2 four, 0 five, 3 six

• This is generalization of Binomial distribution§ Binomial: each trial had 2 possible outcomes§ Multinomial: each trial has m possible outcomes

7302011654321

61420

61

61

61

61

61

61

!3!0!2!0!1!1!7

)3,0,2,0,1,1(

÷øö

çèæ=÷

øö

çèæ

÷øö

çèæ

÷øö

çèæ

÷øö

çèæ

÷øö

çèæ

÷øö

çèæ=

====== XXXXXXP

Hello Die Rolls, My Old Friends

Page 13: Continuous Joint Distributions - Stanford University

• Ignoring order of words, what is probability of any given word you write in English?§ P(word = “the”) > P(word = “transatlantic”)§ P(word = “Stanford”) > P(word = “Cal”)§ Probability of each word is just multinomial distribution

• What about probability of those same words in someone else’s writing?§ P(word = “probability” | writer = you) >

P(word = “probability” | writer = non-CS109 student)§ After estimating P(word | writer) from known writings,

use Bayes’ Theorem to determine P(writer | word) for new writings!

Probabilistic Text Analysis

Page 14: Continuous Joint Distributions - Stanford University

According tothe GlobalLanguageMonitor there are 988,968 wordsintheenglishlanguageusedontheinternet.

A Document is a Large Multinomial

Page 15: Continuous Joint Distributions - Stanford University

Example document:“Pay for Viagra with a credit-card. Viagra is great. So are credit-cards. Risk free Viagra. Click for free.”n = 18

Text is a Multinomial

Viagra=2Free=2Risk=1Credit-card:2…For=2

P

✓1

2|spam

◆=

n!

2!2! . . . 2!p2viagra

p2free

. . . p2for

P

✓1

2|spam

◆=

n!

2!2! . . . 2!p2viagra

p2free

. . . p2for

P

✓1

2|spam

◆=

n!

2!2! . . . 2!p2viagra

p2free

. . . p2for

Probability of seeing this document | spam

It’s a Multinomial!

The probability of a word in spam email being viagra

Page 16: Continuous Joint Distributions - Stanford University

Four Prototypical Trajectories

Who wrote the federalist papers?

Page 17: Continuous Joint Distributions - Stanford University
Page 18: Continuous Joint Distributions - Stanford University

• Authorship of “Federalist Papers”

§ 85 essays advocating ratification of US constitution

§ Written under pseudonym “Publius”o Really, Alexander Hamilton, James

Madison and John Jay

§ Who wrote which essays?o Analyzed probability of words in each

essay versus word distributions from known writings of three authors

Old and New Analysis

Page 19: Continuous Joint Distributions - Stanford University

Four Prototypical Trajectories

Let’s write a program!

Page 20: Continuous Joint Distributions - Stanford University

Joint Expectation

E[g(X,Y )] =X

x,y

g(x, y)p(x, y)

E[g(X)] =X

x

g(x)p(x)

E[X] =X

x

xp(x)

• Expectation over a joint isn’t nicely defined because it is not clear how to compose the multiple variables:• Add them? Multiply them?

• Lemma: For a function g(X,Y) we can calculate the expectation of that function:

• Recall, this also holds for single random variables:

Page 21: Continuous Joint Distributions - Stanford University

E[X + Y] = E[X] + E[Y]

Generalized:

Holds regardless of dependency between Xi’s

åå==

=úû

ùêë

é n

ii

n

ii XEXE

11][

Expected Values of Sums

Page 22: Continuous Joint Distributions - Stanford University

E[X + Y ] = E[g(X,Y )] =X

x,y

g(x, y)p(x, y)

=X

x,y

[x+ y]p(x, y)

=X

x,y

xp(x, y) +X

x,y

yp(x, y)

=X

x

x

X

y

p(x, y) +X

y

y

X

x

p(x, y)

=X

x

xp(x) +X

y

yp(y)

= E[X] + E[Y ]

Skeptical Chris Wants a Proof!Let g(X,Y) = [X + Y]

E[X + Y ] = E[g(X,Y )] =X

x,y

g(x, y)p(x, y)

=X

x,y

[x+ y]p(x, y)

=X

x,y

xp(x, y) +X

x,y

yp(x, y)

=X

x

x

X

y

p(x, y) +X

y

y

X

x

p(x, y)

=X

x

xp(x) +X

y

yp(y)

= E[X] + E[Y ]

E[X + Y ] = E[g(X,Y )] =X

x,y

g(x, y)p(x, y)

=X

x,y

[x+ y]p(x, y)

=X

x,y

xp(x, y) +X

x,y

yp(x, y)

=X

x

x

X

y

p(x, y) +X

y

y

X

x

p(x, y)

=X

x

xp(x) +X

y

yp(y)

= E[X] + E[Y ]

E[X + Y ] = E[g(X,Y )] =X

x,y

g(x, y)p(x, y)

=X

x,y

[x+ y]p(x, y)

=X

x,y

xp(x, y) +X

x,y

yp(x, y)

=X

x

x

X

y

p(x, y) +X

y

y

X

x

p(x, y)

=X

x

xp(x) +X

y

yp(y)

= E[X] + E[Y ]

E[X + Y ] = E[g(X,Y )] =X

x,y

g(x, y)p(x, y)

=X

x,y

[x+ y]p(x, y)

=X

x,y

xp(x, y) +X

x,y

yp(x, y)

=X

x

x

X

y

p(x, y) +X

y

y

X

x

p(x, y)

=X

x

xp(x) +X

y

yp(y)

= E[X] + E[Y ]

E[X + Y ] = E[g(X,Y )] =X

x,y

g(x, y)p(x, y)

=X

x,y

[x+ y]p(x, y)

=X

x,y

xp(x, y) +X

x,y

yp(x, y)

=X

x

x

X

y

p(x, y) +X

y

y

X

x

p(x, y)

=X

x

xp(x) +X

y

yp(y)

= E[X] + E[Y ]

By the definition of g(x,y)

What a useful lemma

Break that sum into parts!

Change the sum of (x,y) into

separate sums

That is the definition of marginal probability

That is the definition of expectation

Page 23: Continuous Joint Distributions - Stanford University

Continuous Random Variables

Joint Distributions

Page 24: Continuous Joint Distributions - Stanford University

Four Prototypical Trajectories

Continuous Joint Distribution

Page 25: Continuous Joint Distributions - Stanford University

Riding the Marguerite

Youarerunningtothebusstop.Youdon’tknowexactlywhenthebusarrives.Youarriveat2:20pm.

WhatisP(wait<5min)?

Page 26: Continuous Joint Distributions - Stanford University

Joint Dart Distribution

P(hit within R pixels of center)?

Whatistheprobabilitythatadarthitsat(456.234231234122355,532.12344123456)?

Page 27: Continuous Joint Distributions - Stanford University

Joint Dart Distribution

Dart x location

Dar

t y lo

catio

n

0.005

0.12

P(hit within R pixels of center)?

Page 28: Continuous Joint Distributions - Stanford University

Joint Dart Distribution

Dart x location

Dar

t y lo

catio

n

0.005

0.12

P(hit within R pixels of center)?

Page 29: Continuous Joint Distributions - Stanford University

Joint Dart Distribution

Dart x location

Dar

t y lo

catio

nP(hit within R pixels of center)?

Page 30: Continuous Joint Distributions - Stanford University

0

y

x900

900

Joint Dart Distribution

Inthelimit,asyoubreakdowncontinuousvaluesintointestinallysmallbuckets,youendupwith

multidimensionalprobabilitydensity

Page 31: Continuous Joint Distributions - Stanford University

Ajointprobabilitydensityfunction givestherelativelikelihoodofmorethanonecontinuousrandomvariableeachtakingonaspecificvalue.

ò ò=£<£<2

1

2

1

),( ) ,(P ,2121

a

a

b

bYX dxdyyxfbYbaXa

Joint Probability Density Funciton

0

y

x 900

900

0 900

900

Page 32: Continuous Joint Distributions - Stanford University

ò ò=£<£<2

1

2

1

),( ) ,(P ,2121

a

a

b

bYX dxdyyxfbYbaXa

a1a2

b2b1

fX,Y (x, y)

x

y

Joint Probability Density Funciton

Page 33: Continuous Joint Distributions - Stanford University

𝑦

plot by Academo

Joint Probability Density Funciton

ò ò=£<£<2

1

2

1

),( ) ,(P ,2121

a

a

b

bYX dxdyyxfbYbaXa

Page 34: Continuous Joint Distributions - Stanford University

l Let X and Y be two continuous random variables§ where 0 ≤ X ≤ 1 and 0 ≤ Y ≤ 2

l We want to integrate g(x,y) = xy w.r.t. X and Y:§ First, do “innermost” integral (treat y as a constant):

§ Then, evaluate remaining (single) integral:

òòò òò ò=== == =

=úû

ùêë

é=÷÷

ø

öççè

æ=

2

0

2

0

22

0

1

0

2

0

1

0 21

2

0

1

yyy xy x

dyydyxydydxxydydxxy

101 42

10

222

0

=-=úû

ùêë

é=ò

=

ydyyy

Multiple Integrals Without Tears

Page 35: Continuous Joint Distributions - Stanford University

Marginalprobabilitiesgivethedistributionofasubsetofthevariables(often,justone)ofajointdistribution.

Sum/integrateoverthevariablesyoudon’tcareabout.

Marginalization

pX(a) =X

y

pX,Y (a, y)

fX(a) =

1Z

�1

fX,Y (a, y) dy

fY (b) =

1Z

�1

fX,Y (x, b) dx

Page 36: Continuous Joint Distributions - Stanford University

Darts!

X-Pixel Marginal

y

xY-Pixel Marginal

X ⇠ N✓900

2,900

2

◆Y ⇠ N

✓900

3,900

5

Page 37: Continuous Joint Distributions - Stanford University

• Cumulative Density Function (CDF):

ò ò¥- ¥-

=a b

YXYX dxdyyxfbaF ),( ),( ,,

),(),( ,

2

, baFbaf YXYX ba ¶¶¶=

Jointly Continuous

ò ò=£<£<2

1

2

1

),( ) ,(P ,2121

a

a

b

bYX dxdyyxfbYbaXa

Page 38: Continuous Joint Distributions - Stanford University

𝐹#,% 𝑥, 𝑦 = 𝑃 𝑋 ≤ 𝑥, 𝑌 ≤ 𝑦

𝑥

𝑦

to 0 asx → -∞,y → -∞,

to 1 asx → +∞,y → +∞,

plot by Academo

Jointly CDF

Page 39: Continuous Joint Distributions - Stanford University

Jointly Continuous

ò ò=£<£<2

1

2

1

),( ) ,(P ,2121

a

a

b

bYX dxdyyxfbYbaXa

a1a2

b2b1

fX,Y (x, y)

x

y

Page 40: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

a1

a2

b1

b2

Probabilities from Joint CDF

Page 41: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

a1

a2

b1

b2

Probabilities from Joint CDF

Page 42: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

a1

a2

b1

b2

Probabilities from Joint CDF

Page 43: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

−𝐹#,% 𝑎-,𝑏/

a1

a2

b1

b2

Probabilities from Joint CDF

Page 44: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

−𝐹#,% 𝑎-,𝑏/

a1

a2

b1

b2

Probabilities from Joint CDF

Page 45: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

−𝐹#,% 𝑎-,𝑏/

−𝐹#,% 𝑎/,𝑏-

a1

a2

b1

b2

Probabilities from Joint CDF

Page 46: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

−𝐹#,% 𝑎-,𝑏/

−𝐹#,% 𝑎/,𝑏-

a1

a2

b1

b2

Probabilities from Joint CDF

Page 47: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

−𝐹#,% 𝑎-,𝑏/

−𝐹#,% 𝑎/,𝑏-

+𝐹#,% 𝑎-,𝑏-

a1

a2

b1

b2

Probabilities from Joint CDF

Page 48: Continuous Joint Distributions - Stanford University

𝑃 𝑎- < 𝑋 ≤ 𝑎/,𝑏- < 𝑌 ≤ 𝑏/ = 𝐹#,% 𝑎/,𝑏/

−𝐹#,% 𝑎-,𝑏/

−𝐹#,% 𝑎/,𝑏-

+𝐹#,% 𝑎-,𝑏-

a1

a2

b1

b2

Probabilities from Joint CDF

Page 49: Continuous Joint Distributions - Stanford University

Probability for Instagram!

Page 50: Continuous Joint Distributions - Stanford University

Gaussian BlurInimageprocessing,aGaussianbluristheresultofblurringanimagebyaGaussianfunction.Itisawidelyusedeffectingraphicssoftware,typicallytoreduceimagenoise.

GaussianblurringwithStDev =3,isbasedonajointprobabilitydistribution:

fX,Y (x, y) =1

2⇡ · 32 e� x

2+y

2

2·32

FX,Y (x, y) = �⇣x

3

⌘· �

⇣y

3

Joint PDF

Joint CDF

Used to generate this weight matrix

Page 51: Continuous Joint Distributions - Stanford University

Gaussian Blur

fX,Y (x, y) =1

2⇡ · 32 e� x

2+y

2

2·32

FX,Y (x, y) = �⇣x

3

⌘· �

⇣y

3

Joint PDF

Joint CDF

EachpixelisgivenaweightequaltotheprobabilitythatX andY arebothwithinthepixelbounds.Thecenterpixelcoverstheareawhere

-0.5 ≤ x ≤ 0.5 and-0.5 ≤ y ≤ 0.5Whatistheweightofthecenterpixel?

P (�0.5 < X < 0.5,�0.5 < Y < 0.5)

=P (X < 0.5, Y < 0.5)� P (X < 0.5, Y < �0.5)

� P (X < �0.5, Y < 0.5) + P (X < �0.5, Y < �0.5)

=�

✓0.5

3

◆· �

✓0.5

3

◆� 2�

✓0.5

3

◆· �

✓�0.5

3

+ �

✓�0.5

3

◆· �

✓�0.5

3

=0.56622 � 2 · 0.5662 · 0.4338 + 0.43382 = 0.206

P (�0.5 < X < 0.5,�0.5 < Y < 0.5)

=P (X < 0.5, Y < 0.5)� P (X < 0.5, Y < �0.5)

� P (X < �0.5, Y < 0.5) + P (X < �0.5, Y < �0.5)

=�

✓0.5

3

◆· �

✓0.5

3

◆� 2�

✓0.5

3

◆· �

✓�0.5

3

+ �

✓�0.5

3

◆· �

✓�0.5

3

=0.56622 � 2 · 0.5662 · 0.4338 + 0.43382 = 0.206

P (�0.5 < X < 0.5,�0.5 < Y < 0.5)

=P (X < 0.5, Y < 0.5)� P (X < 0.5, Y < �0.5)

� P (X < �0.5, Y < 0.5) + P (X < �0.5, Y < �0.5)

=�

✓0.5

3

◆· �

✓0.5

3

◆� 2�

✓0.5

3

◆· �

✓�0.5

3

+ �

✓�0.5

3

◆· �

✓�0.5

3

=0.56622 � 2 · 0.5662 · 0.4338 + 0.43382 = 0.206

P (�0.5 < X < 0.5,�0.5 < Y < 0.5)

=P (X < 0.5, Y < 0.5)� P (X < 0.5, Y < �0.5)

� P (X < �0.5, Y < 0.5) + P (X < �0.5, Y < �0.5)

=�

✓0.5

3

◆· �

✓0.5

3

◆� 2�

✓0.5

3

◆· �

✓�0.5

3

+ �

✓�0.5

3

◆· �

✓�0.5

3

=0.56622 � 2 · 0.5662 · 0.4338 + 0.43382 = 0.206

P (�0.5 < X < 0.5,�0.5 < Y < 0.5)

=P (X < 0.5, Y < 0.5)� P (X < 0.5, Y < �0.5)

� P (X < �0.5, Y < 0.5) + P (X < �0.5, Y < �0.5)

=�

✓0.5

3

◆· �

✓0.5

3

◆� 2�

✓0.5

3

◆· �

✓�0.5

3

+ �

✓�0.5

3

◆· �

✓�0.5

3

=0.56622 � 2 · 0.5662 · 0.4338 + 0.43382 = 0.206

P (�0.5 < X < 0.5,�0.5 < Y < 0.5)

=P (X < 0.5, Y < 0.5)� P (X < 0.5, Y < �0.5)

� P (X < �0.5, Y < 0.5) + P (X < �0.5, Y < �0.5)

=�

✓0.5

3

◆· �

✓0.5

3

◆� 2�

✓0.5

3

◆· �

✓�0.5

3

+ �

✓�0.5

3

◆· �

✓�0.5

3

=0.56622 � 2 · 0.5662 · 0.4338 + 0.43382 = 0.206

Weight Matrix