The Poisson distribution

Will Monroe
July 12, 2017

with materials by Mehran Sahami and Chris Piech

web.stanford.edu/class/archive/cs/cs109/cs109.1178/lectures/08-poisson.pdf



Announcements: Problem Set 2

(Cell phone location sensing)

Due today!


Announcements: Problem Set 3

(election prediction)

To be released later today.

Due next Wednesday, 7/19, at 12:30pm (before class).


Announcements: Midterm

Two weeks from yesterday:

Tuesday, July 25, 7:00-9:00pm

Tell me by the end of this week if you have a conflict!


Review: Basic distributions

Many types of random variables come up repeatedly. Knowing the frequently occurring distributions lets you do computations without deriving formulas from scratch.

$X \sim \mathrm{Bin}(n, p)$ (X: variable; Bin: family; n, p: parameters)

We have ________ [integer] independent ________ [plural noun], each of which ________ [verb ending in -s] with probability ________ [real number]. How many of the ________ [repeat plural noun] ________ [repeat verb -s]?


Review: Bernoulli random variable

An indicator variable (a possibly biased coin flip) obeys a Bernoulli distribution. Bernoulli random variables can be 0 or 1.

$X \sim \mathrm{Ber}(p)$

$p_X(1) = p$, $p_X(0) = 1 - p$ (0 elsewhere)


Review: Bernoulli fact sheet

$X \sim \mathrm{Ber}(p)$, where $p$ is the probability of “success” (heads, ad click, ...)

PMF: $p_X(1) = p$, $p_X(0) = 1 - p$ (0 elsewhere)

expectation: $E[X] = p$

variance: $\mathrm{Var}(X) = p(1 - p)$

image (right): Gabriela Serrano


Review: Binomial random variable

The number of heads on n (possibly biased) coin flips obeys a binomial distribution.

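$X \sim \mathrm{Bin}(n, p)$

$$p_X(k) = \begin{cases} \binom{n}{k} p^k (1 - p)^{n-k} & \text{if } k \in \mathbb{N},\ 0 \le k \le n \\ 0 & \text{otherwise} \end{cases}$$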


Review: Binomial fact sheet

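$X \sim \mathrm{Bin}(n, p)$, where $n$ is the number of trials (flips, program runs, ...) and $p$ is the probability of “success” (heads, crash, ...)

PMF:

$$p_X(k) = \begin{cases} \binom{n}{k} p^k (1 - p)^{n-k} & \text{if } k \in \mathbb{N},\ 0 \le k \le n \\ 0 & \text{otherwise} \end{cases}$$

expectation: $E[X] = np$

variance: $\mathrm{Var}(X) = np(1 - p)$

note: $\mathrm{Ber}(p) = \mathrm{Bin}(1, p)$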
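The binomial fact sheet can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative assumptions. It evaluates the PMF over its support and compares the resulting mean and variance with np and np(1 − p).

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p); 0 outside the support 0..n."""
    if not (isinstance(k, int) and 0 <= k <= n):
        return 0.0
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # illustrative values (assumed, not from the slides)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the PMF.
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(sum(pmf))             # total probability, should be close to 1
print(mean, n * p)          # E[X] vs. np
print(var, n * p * (1 - p)) # Var(X) vs. np(1 - p)
```

Setting n = 1 reduces the same function to the Bernoulli PMF, matching the note Ber(p) = Bin(1, p).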


Poisson random variable

The number of occurrences of an event that occurs with constant rate λ (per unit time), in 1 unit of time, obeys a Poisson distribution.

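$X \sim \mathrm{Poi}(\lambda)$

$$p_X(k) = \begin{cases} e^{-\lambda} \dfrac{\lambda^k}{k!} & \text{if } k \in \mathbb{Z},\ k \ge 0 \\ 0 & \text{otherwise} \end{cases}$$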
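The Poisson PMF is short enough to write out directly. This is a minimal Python sketch under stated assumptions: the function name `poisson_pmf` and the rate λ = 2 are chosen here for illustration and do not come from the slides.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poi(lam); 0 unless k is a non-negative integer."""
    if not (isinstance(k, int) and k >= 0):
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0  # illustrative rate (assumed): 2 events per unit of time
for k in range(8):
    print(k, round(poisson_pmf(k, lam), 4))

# Unlike the binomial, the support is unbounded, but the tail shrinks fast.
print(sum(poisson_pmf(k, lam) for k in range(60)))
```

Because k! grows much faster than λ^k, summing the first few dozen terms already accounts for essentially all of the probability.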

Algorithmic ride sharing

A ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

Which looks like the most likely PMF for the distribution of X?

image: Zbigniew Bzdak, Chicago Tribune

[Figure: four candidate PMF plots labeled A), B), C), and D), each with x from 0 to 9 on the horizontal axis and pX(x) on the vertical axis]

https://bit.ly/1a2ki4G → https://b.socrative.com/login/student/ (Room: CS109SUMMER17)


Algorithmic ride sharing

A ride-sharing program gets an average of 5 requests per minute from a region of the city.

X: number of requests this minute

What is P(X = 3)?

1 minute = 60 seconds

[Figure: the minute drawn as a row of one-second slots, each marked 0 or 1]

$X \sim \mathrm{Bin}(n, p)$

image: Zbigniew Bzdak, Chicago Tribune

Algorithmic ride sharing

1 minute = 60 seconds

$E[X] = 60p = 5$, so $p = \frac{5}{60}$ and $X \sim \mathrm{Bin}\left(60, \frac{5}{60}\right)$

Algorithmic ride sharing

1 minute = 60000 milliseconds

$E[X] = 60000p = 5$, so $p = \frac{5}{60000}$ and $X \sim \mathrm{Bin}\left(60000, \frac{5}{60000}\right)$

Algorithmic ride sharing

1 minute = ∞ instants

$E[X] = \infty \cdot p = 5$, so $p = \frac{5}{\infty}$ and $X \sim \mathrm{Bin}\left(\infty, \frac{5}{\infty}\right)$

Binomial in the limit

$$\begin{aligned}
P(X = k) &= \lim_{n \to \infty} \binom{n}{k} \left(\frac{5}{n}\right)^k \left(1 - \frac{5}{n}\right)^{n-k} \\
&= \lim_{n \to \infty} \frac{n!}{k!\,(n-k)!} \cdot \frac{5^k}{n^k} \cdot \frac{(1 - 5/n)^n}{(1 - 5/n)^k} \\
&= \lim_{n \to \infty} \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{5^k}{n^k} \cdot \frac{(1 - 5/n)^n}{(1 - 5/n)^k} \\
&= \lim_{n \to \infty} \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \cdot \frac{5^k}{k!} \cdot \frac{(1 - 5/n)^n}{(1 - 5/n)^k}
\end{aligned}$$

As $n \to \infty$:

$\frac{n}{n}, \frac{n-1}{n}, \ldots, \frac{n-k+1}{n} \to 1$

$(1 - 5/n)^k \to 1^k = 1$

$(1 - 5/n)^n \to e^{-5}$

Binomial in the limit

$$\begin{aligned}
P(X = k) &= \lim_{n \to \infty} \binom{n}{k} \left(\frac{5}{n}\right)^k \left(1 - \frac{5}{n}\right)^{n-k} \\
&= \lim_{n \to \infty} \frac{n!}{k!\,(n-k)!} \cdot \frac{5^k}{n^k} \cdot \frac{(1 - 5/n)^n}{(1 - 5/n)^k} \\
&= \lim_{n \to \infty} \frac{n(n-1)\cdots(n-k+1)}{k!} \cdot \frac{5^k}{n^k} \cdot \frac{(1 - 5/n)^n}{(1 - 5/n)^k} \\
&= \lim_{n \to \infty} \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \cdot \frac{5^k}{k!} \cdot \frac{(1 - 5/n)^n}{(1 - 5/n)^k} \\
&= \frac{5^k}{k!}\, e^{-5}
\end{aligned}$$
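The limit can also be watched numerically. The sketch below is an illustration, not part of the lecture: the helper name `binomial_pmf` and the third value of n are assumptions. It evaluates P(X = 3) under Bin(n, 5/n) for finer and finer subdivisions of the minute and compares each result with the limiting value 5^3 e^{-5} / 3!.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p), for 0 <= k <= n."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

lam, k = 5, 3
poisson_limit = math.exp(-lam) * lam**k / math.factorial(k)

# Seconds, milliseconds, and a still finer subdivision (the last is assumed).
subdivisions = (60, 60_000, 6_000_000)
errors = []
for n in subdivisions:
    value = binomial_pmf(k, n, lam / n)
    errors.append(abs(value - poisson_limit))
    print(n, value, errors[-1])

print("limit:", poisson_limit)
```

The gap to the limit shrinks as n grows, which is what the derivation predicts.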

Algorithmic ride sharing

1 minute = ∞ instants

$X \sim \mathrm{Bin}\left(\infty, \frac{5}{\infty}\right)$

$$P(X = 3) = \frac{5^3}{3!}\, e^{-5} \approx 0.14$$

Algorithmic ride sharing

1 minute = ∞ instants

$X \sim \mathrm{Poi}(5)$

$$P(X = 3) = \frac{5^3}{3!}\, e^{-5} \approx 0.14$$


Poisson: Fact sheet

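$X \sim \mathrm{Poi}(\lambda)$, where $\lambda$ is the rate of events (requests, earthquakes, chocolate chips, …) per unit time (hour, year, cookie, ...)

PMF:

$$p_X(k) = \begin{cases} e^{-\lambda} \dfrac{\lambda^k}{k!} & \text{if } k \in \mathbb{Z},\ k \ge 0 \\ 0 & \text{otherwise} \end{cases}$$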


Poisson and Cookies

Make a very large chocolate chip cookie recipe.

The recipe tells you the overall ratio of chocolate chips per cookie (λ).

But some cookies get more, some get less!

X ~ Poi(λ) is the number of chocolate chips in some individual cookie.

(This is called a “Poisson process”: independent discrete events [chocolate chips] scattered throughout continuous time or space [batter], with constant overall rate.)


Poisson: Fact sheet

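$X \sim \mathrm{Poi}(\lambda)$, where $\lambda$ is the rate of events (requests, earthquakes, chocolate chips, …) per unit time (hour, year, cookie, ...)

PMF:

$$p_X(k) = \begin{cases} e^{-\lambda} \dfrac{\lambda^k}{k!} & \text{if } k \in \mathbb{Z},\ k \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

expectation: $E[X] = \lambda$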


Expectation of Poisson

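The cheater’s way:

$\mathrm{Poi}(\lambda) = \mathrm{Bin}\left(n = \infty,\ p = \frac{\lambda}{\infty}\right)$

$E[X] = np = \infty \cdot \frac{\lambda}{\infty} = \lambda$

The real way:

$$\begin{aligned}
E[X] &= \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} \cdot k \\
&= \sum_{k=1}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} \cdot k \\
&= \sum_{k=1}^{\infty} e^{-\lambda} \frac{\lambda^k}{(k-1)!} \\
&= \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} \\
&= \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} \\
&= \lambda e^{-\lambda} \cdot e^{\lambda} = \lambda
\end{aligned}$$

The last step uses $\sum_{j=0}^{\infty} \frac{\lambda^j}{j!} = e^{\lambda}$.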
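The series for E[X] can be summed numerically as a sanity check. This is a minimal sketch, assuming that truncating the infinite sum at 100 terms is accurate enough for small rates; the names `poisson_pmf` and `truncated_mean` and the three test rates are illustrative choices, not from the slides.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poi(lam), k a non-negative integer."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def truncated_mean(lam, terms=100):
    """Sum k * P(X = k) for k = 0..terms-1, dropping the (tiny) tail."""
    return sum(k * poisson_pmf(k, lam) for k in range(terms))

for lam in (0.4, 2.0, 5.0):  # illustrative rates (assumed)
    print(lam, truncated_mean(lam))
```

Each printed sum should agree with its λ up to floating-point error, matching the result of the derivation.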


Variance of Poisson

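The cheater’s way:

$\mathrm{Poi}(\lambda) = \mathrm{Bin}\left(n = \infty,\ p = \frac{\lambda}{\infty}\right)$

$\mathrm{Var}(X) = np(1 - p) = \infty \cdot \frac{\lambda}{\infty} \cdot \left(1 - \frac{\lambda}{\infty}\right) = \lambda \cdot 1 = \lambda$

The real way:

$$\begin{aligned}
E[X^2] &= \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} \cdot k^2 \\
&= \sum_{k=1}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} \cdot k^2 \\
&= \lambda \sum_{k=1}^{\infty} e^{-\lambda} \frac{\lambda^{k-1}}{(k-1)!} \cdot k \\
&= \lambda \sum_{j=0}^{\infty} e^{-\lambda} \frac{\lambda^j}{j!} \cdot (j + 1) \\
&= \lambda \left[ \sum_{j=0}^{\infty} e^{-\lambda} \frac{\lambda^j}{j!} \cdot j + \sum_{j=0}^{\infty} e^{-\lambda} \frac{\lambda^j}{j!} \right] \\
&= \lambda \left[ E[X] + e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} \right] \\
&= \lambda(\lambda + 1)
\end{aligned}$$

using $\sum_{j=0}^{\infty} \frac{\lambda^j}{j!} = e^{\lambda}$.

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \lambda(\lambda + 1) - \lambda^2 = \lambda$$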
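The second-moment calculation can be checked the same way as the expectation. The sketch below is illustrative only: the helper names, the 100-term truncation, and the rate λ = 2.8 are assumptions. It computes E[X] and E[X²] from truncated sums and then applies Var(X) = E[X²] − (E[X])².

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poi(lam), k a non-negative integer."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def moments(lam, terms=100):
    """Return (E[X], E[X^2]) from truncated sums over k = 0..terms-1."""
    pmf = [poisson_pmf(k, lam) for k in range(terms)]
    first = sum(k * pk for k, pk in enumerate(pmf))
    second = sum(k * k * pk for k, pk in enumerate(pmf))
    return first, second

lam = 2.8  # illustrative rate (assumed)
first, second = moments(lam)
variance = second - first**2

print(second, lam * (lam + 1))  # E[X^2] vs. lambda * (lambda + 1)
print(variance, lam)            # Var(X) vs. lambda
```

Both pairs should match up to floating-point error, so the mean and the variance of a Poisson variable come out equal.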


Poisson: Fact sheet

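$X \sim \mathrm{Poi}(\lambda)$, where $\lambda$ is the rate of events (requests, earthquakes, chocolate chips, …) per unit time (hour, year, cookie, ...)

PMF:

$$p_X(k) = \begin{cases} e^{-\lambda} \dfrac{\lambda^k}{k!} & \text{if } k \in \mathbb{Z},\ k \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

expectation: $E[X] = \lambda$

variance: $\mathrm{Var}(X) = \lambda$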


Siméon Denis Poisson

French mathematician (1781-1840)

● First paper at age 18
● Became professor at 21
● Published over 300 papers

« La vie n'est bonne qu'à deux choses : à faire des mathématiques et à les professer. »

—quoted in Arago (1854)

(“Life is only good for two things: doing mathematics and teaching it.”)


Break time!


Web server hits

Your web server gets an average of 2 requests each second.

What’s the probability of exactly 5 requests in a second?

X: number of requests in the next second, $X \sim \mathrm{Poi}(2)$

$$P(X = 5) = e^{-2}\, \frac{2^5}{5!} \approx 0.0361$$


Earthquakes

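There is an average of 2.8 major earthquakes in the world each year.

What’s the probability of more than 1 major earthquake next year?

X: number of earthquakes next year, $X \sim \mathrm{Poi}(2.8)$

$$\begin{aligned}
P(X > 1) &= 1 - P(X = 0) - P(X = 1) \\
&= 1 - e^{-2.8}\, \frac{2.8^0}{0!} - e^{-2.8}\, \frac{2.8^1}{1!} \\
&\approx 1 - 0.06 - 0.17 \\
&= 0.77
\end{aligned}$$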
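Both worked examples (web server hits and earthquakes) reduce to a few PMF evaluations. The following Python sketch reproduces them; the helper names `poisson_pmf` and `poisson_tail` are illustrative assumptions, and the tail is computed with the same complement rule used on the earthquake slide.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poi(lam), k a non-negative integer."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_tail(threshold, lam):
    """P(X > threshold) = 1 - sum of P(X = k) for k = 0..threshold."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(threshold + 1))

# Web server: exactly 5 requests in a second when lambda = 2.
print(round(poisson_pmf(5, 2.0), 4))

# Earthquakes: more than 1 major earthquake in a year when lambda = 2.8.
print(round(poisson_tail(1, 2.8), 2))
```

Working with the complement matters because "more than 1" covers infinitely many outcomes, while its complement covers only k = 0 and k = 1.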


Poisson approximation of binomial

We derived Poisson as the limit of the binomial as n → ∞, with λ = np.

But it works (approximately) for finite n as well!

$\mathrm{Bin}(n, p) \approx \mathrm{Poi}(\lambda = np)$ if $n$ is large and $p$ is small.


More space messages

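Sending a 4000-bit message through space. Each bit is corrupted (flipped) with probability p = 0.0001.

X: number of bits flipped, $X \sim \mathrm{Bin}(4000, 0.0001) \approx \mathrm{Poi}(0.4)$

Exact (binomial):

$$\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
&= \binom{4000}{0} (0.0001)^0 (0.9999)^{4000-0} + \binom{4000}{1} (0.0001)^1 (0.9999)^{4000-1} \\
&= 0.9999^{4000} + 4000 \cdot 0.0001 \cdot 0.9999^{3999} \\
&\approx 0.67031 + 0.26815 = 0.93846
\end{aligned}$$

Approximate (Poisson):

$$\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
&\approx e^{-0.4}\, \frac{0.4^0}{0!} + e^{-0.4}\, \frac{0.4^1}{1!} \\
&= e^{-0.4} + 0.4\, e^{-0.4} \\
&\approx 0.67032 + 0.26813 = 0.93845
\end{aligned}$$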
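The two calculations are easy to reproduce side by side. This is a minimal Python sketch of the comparison; the variable names are illustrative assumptions, and the numbers are the ones from the slide.

```python
import math

n, p = 4000, 0.0001  # message length in bits, per-bit flip probability
lam = n * p          # Poisson rate, 0.4

# Exact binomial: P(X = 0) + P(X = 1).
binom_le1 = (1 - p)**n + n * p * (1 - p)**(n - 1)

# Poisson approximation: e^{-lam} + lam * e^{-lam}.
poisson_le1 = math.exp(-lam) + lam * math.exp(-lam)

print(round(binom_le1, 5), round(poisson_le1, 5))
print(abs(binom_le1 - poisson_le1))  # size of the approximation error
```

With n = 4000 and p = 0.0001 the two answers differ by less than 0.00001, which is why the approximation is attractive when n is large and p is small.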


Poisson approximation of binomial

[Figure: P(X = k) plotted against k]


Poisson approximations are chill

Poisson often still works despite “mild” violations of the binomial distribution assumptions:

– Independence

e.g., # of entries in each bucket in a hash table

– Same p:

e.g. prob. of birthdays is somewhat uneven
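The hash table case can be illustrated with a small simulation. This sketch makes several assumptions that are not in the lecture: a uniformly random bucket choice stands in for a real hash function, and the sizes (20,000 keys, 10,000 buckets) and the seed are arbitrary. Bucket counts are not independent, since they must sum to the number of keys, yet the fraction of buckets holding k entries should still sit close to the Poi(λ) PMF with λ = keys / buckets.

```python
import math
import random
from collections import Counter

random.seed(109)  # arbitrary fixed seed so the run is repeatable

num_keys, num_buckets = 20_000, 10_000  # illustrative sizes (assumed)
lam = num_keys / num_buckets            # average entries per bucket

# Stand-in for hashing: each key lands in a uniformly random bucket.
bucket_sizes = Counter(random.randrange(num_buckets) for _ in range(num_keys))

# How many buckets hold exactly k entries (empty buckets count as size 0).
size_histogram = Counter(bucket_sizes[b] for b in range(num_buckets))

for k in range(7):
    observed = size_histogram[k] / num_buckets
    predicted = math.exp(-lam) * lam**k / math.factorial(k)
    print(k, round(observed, 4), round(predicted, 4))
```

The observed and predicted columns should track each other closely even though the independence assumption is mildly violated.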


Birthdays

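m people in a room. What is the probability none share the same birthday?

$E_{xy}$: x and y have the same birthday (“success”), with $P(E_{xy}) \approx 1/365$

$\binom{m}{2}$ trials, one for each distinct pair of people (x, y) (not all independent)

X: number of pairs that share a birthday

Approximate: $X \sim \mathrm{Poi}(\lambda)$ with $\lambda = \binom{m}{2} \frac{1}{365}$

$$P(X = 0) = e^{-\binom{m}{2}\frac{1}{365}} \cdot \frac{\left(\binom{m}{2}\frac{1}{365}\right)^0}{0!} = e^{-\binom{m}{2}\frac{1}{365}}$$

$P(X = 0) < 1/2$ first holds at $m = 23$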
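The threshold m = 23 can be verified, and compared with the exact answer, in a few lines. This is a sketch under stated assumptions: the function names are illustrative, and the exact calculation assumes 365 equally likely, independent birthdays.

```python
import math

def poisson_no_match(m, days=365):
    """Poisson approximation: P(X = 0) = exp(-C(m, 2) / days)."""
    return math.exp(-math.comb(m, 2) / days)

def exact_no_match(m, days=365):
    """Exact probability that m people all have distinct birthdays."""
    prob = 1.0
    for i in range(m):
        prob *= (days - i) / days
    return prob

def smallest_m(no_match):
    """Smallest group size at which P(no shared birthday) drops below 1/2."""
    m = 1
    while no_match(m) >= 0.5:
        m += 1
    return m

print(smallest_m(poisson_no_match), smallest_m(exact_no_match))  # 23 23
print(poisson_no_match(23), exact_no_match(23))
```

The approximation and the exact calculation agree on the threshold even though the pairwise trials are not all independent.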


A real license plate seen at Stanford


Randomness and clumping

[Figure: three point patterns, labeled “uniformly random”, “regular pattern”, and “low-discrepancy” pseudorandom]


Application: Low-discrepancy sampling for rendering

[Figure: three point patterns, labeled “uniformly random”, “regular pattern”, and “low-discrepancy” pseudorandom]


Philosophical note: The structure of the universe

image: Andrew Z. Colvin