View
216
Download
1
Embed Size (px)
Citation preview
Probability Theory:Paradoxes and Pitfalls
Great Theoretical Ideas In Computer ScienceGreat Theoretical Ideas In Computer Science
Steven Rudich, Steven Rudich, Anupam GuptaAnupam Gupta CS 15-251 Spring CS 15-251 Spring 20042004
Lecture 19 March 23, 2004Lecture 19 March 23, 2004 Carnegie Mellon Carnegie Mellon UniversityUniversity
Probability Distribution
A (finite) A (finite) probability distributionprobability distribution DD • a finite set a finite set SS of elements (samples) of elements (samples)• each each xx22SS has has probabilityprobability p(x)p(x) 22 [0,1] [0,1]
S
weights must sum to 1
0.30.30.2
0.1
0.050.050
“Sample space”
Probability Distribution
S
0.3
0.30.2
0.1
0.050.050
An “Event” is a subset
S
0.3
0.30.2
0.1
0.050.050
Pr[A] = 0.55
A
Probability Distribution
S
0.3
0.30.2
0.1
0.050.050
Total money = 1
Conditional probabilities
SA
Pr[x | A] = 0
Pr[y | A] = Pr[y] / Pr[A]
Conditional probabilities
SA
B
Pr [ B | A ] = x 2 B Pr[ x | A ]
Conditional probabilities
SA
B
Pr [ B | A ] = x 2 B Pr[ x | A ] = x 2 A Å B Pr[ x | A ]
= x 2 A Å B Pr[ x ] / Pr[A] = Pr[ A Å B ] / Pr[A]
Now, on to some fun puzzles!Now, on to some fun puzzles!
You have 3 dice
2 Players each rolls a 2 Players each rolls a die.die.
The player with the The player with the higher numberhigher number wins wins
A
B
C
You have 3 dice
Which die is Which die is best to have best to have – – AA, B, or , B, or CC ? ?
A
B
C
A is better than B
When rolled, 9 equally likely When rolled, 9 equally likely outcomesoutcomes
77 9 977 5 577 1 1
66 9 966 5 566 1 1
22 9 922 5 5 22 1 1
A A beats B beats B 5/95/9 of the time of the time
B is better than C
Again, 9 equally likely outcomesAgain, 9 equally likely outcomes
1 1 33 1 1 44 1 1 88
5 5 33 5 5 44 5 5 8 8
9 9 33 9 9 44 9 9 88
B beats B beats C C 5/95/9 of the time of the time
A beats B with Prob. 5/9B beats C with Prob. 5/9
Q) If you chose first, which die would Q) If you chose first, which die would you take?you take?
Q) If you chose second, which die Q) If you chose second, which die would you take?would you take?
C is better than A!
Alas, the same story!Alas, the same story!
3 3 22 3 3 66 3 3 7 7
4 4 22 4 4 66 4 4 77
88 22 8 8 66 8 8 77
C C beats beats A A 5/95/9 of the time! of the time!
First Moral
““Obvious” properties, such as Obvious” properties, such as transitivity, associativity, transitivity, associativity,
commutativity, etc… commutativity, etc… need to be rigorously argued.need to be rigorously argued.
Because sometimes they are Because sometimes they are
FALSE.FALSE.
Second Moral
Stay on your toes!Stay on your toes!
When reasoning about probabilities….When reasoning about probabilities….
Third Moral
To make money from a sucker in a To make money from a sucker in a bar, offer him the first choice of die. bar, offer him the first choice of die.
(Allow him to change to your “lucky” (Allow him to change to your “lucky” die any time he wants.)die any time he wants.)
Coming up next…
More of the pitfalls of probability.More of the pitfalls of probability.
NameName a body part that almost everyone on a body part that almost everyone on earth had an above average number of.earth had an above average number of.
FINGERS !!FINGERS !!
• Almost everyone has 10Almost everyone has 10• More people are missing some than haveMore people are missing some than have extras (# fingers missing > # of extras) extras (# fingers missing > # of extras)• Average: 9.99 …Average: 9.99 …
A Puzzle…
Almost everyone can be
above average!
Is a simple average a good statistic?Is a simple average a good statistic?
Several years ago Berkeley faced a law suit …
1.1. % of male applicants admitted to % of male applicants admitted to graduate school was 10%graduate school was 10%
2.2. % of female applicants admitted % of female applicants admitted to graduate school was 5%to graduate school was 5%
Grounds for discrimination? Grounds for discrimination?
SUITSUIT
Berkeley did a survey of its departments to find out which
ones were at fault
The result wasThe result was
SHOCKING…SHOCKING…
Every department was more likely to admit a female than a
male
#of females accepted #of females accepted to department Xto department X
#of males accepted #of males accepted to department Xto department X
#of female #of female applicants to applicants to department Xdepartment X
#of male applicants #of male applicants to department Xto department X
>>
How can this be ?
Answer
Women tend to apply to Women tend to apply to departments that admit a smaller departments that admit a smaller
percentage of their applicantspercentage of their applicants
WomenWomen MenMenDeptDept AppliedApplied AccepteAccepte
ddAppliedApplied AcceptedAccepted
AA 9999 44 11 00
BB 11 11 9999 1010
totaltotal 100100 55 100100 1010
Newspapers would publish these data…
Meaningless junk!Meaningless junk!
A single summary statistic A single summary statistic (such as an (such as an averageaverage, or a , or a medianmedian) ) may not summarize the data well ! may not summarize the data well !
Try to get a white ball
Choose one box and pick a random ball from it.
Max the chance of getting a white ball…
5/11 > 3/7
Better
Try to get a white ball
Better
6/9 > 9/14 Better
Try to get a white ball
Better
Better
Try to get a white ball
Better
Better
Better
11/20 < 12/21 !!!
Simpson’s Paradox
Arises all the time…Arises all the time…
Be careful when you interpret Be careful when you interpret numbersnumbers
Department of Transportation requires that each month all airlines report their “on-time
record”
# of on-time flights landing at # of on-time flights landing at nation’s 30 busiest airportsnation’s 30 busiest airports
# of total flights into those airports# of total flights into those airports
http://www.bts.gov/programs/oai/
Different airlines serve different airports with different frequency
An airline sending most of its planes into An airline sending most of its planes into fair weather airports will crush an airline fair weather airports will crush an airline
flying mostly into foggy airportsflying mostly into foggy airports
It can even happen that an airline has a better record at each airport, but gets a worse overall
rating by this method.
Alaska Alaska airlinesairlines
America WestAmerica West
% on % on timetime
# # flightsflights
% on % on timetime
# # flightsflights
LALA 88.988.9 559559 85.685.6 811811
PhoenixPhoenix 94.894.8 233233 92.192.1 52555255
San San DiegoDiego
91.791.7 232232 85.585.5 448448
SFSF 83.183.1 605605 71.371.3 449449
SeattleSeattle 85.885.8 21421466
76.776.7 262262
OVERALLOVERALL 86.786.7 37737755
89.189.1 72257225Alaska Air beats America West at each airport
but America West has a better overall rating!
An average may have several An average may have several different possible explanations… different possible explanations…
““Physicians are growing in number, Physicians are growing in number, but not in pay”but not in pay”
# Doctors# Doctors Average salary (1982)Average salary (1982)
19701970 334,000334,000 $103,900$103,900
19821982 480,000480,000 $99,950$99,950
Thrust of article: Market forces are at work
US News and World Report (’83)
Doctors earn more than ever.
But many old doctors have retired and been replaced with younger
ones.
Here’s another possibility
Rare diseases
Rare Disease
A person is selected at random and A person is selected at random and given test for rare disease given test for rare disease
“painanosufulitis”.“painanosufulitis”.
Only 1/10,000 people have it.Only 1/10,000 people have it.
The test is 99% accurate: it gives the wrong The test is 99% accurate: it gives the wrong answer (positive/negative) only 1% of the time.answer (positive/negative) only 1% of the time.
Does he have the disease?Does he have the disease?
What is the probability that he has the disease?What is the probability that he has the disease?
The person tests The person tests POSITIVE!!!POSITIVE!!!
sufferers· k/10,000
Disease Probability
•Suppose there are Suppose there are kk people in the population people in the population•At most At most k/k/10,00010,000 have the diseasehave the disease
k people
•But But k/100k/100 have false test results have false test results
So So k/k/100100 – k/ – k/10,00010,000 have false test results but have have false test results but have no disease!no disease!
false results k/100
It’s about 100 times more likely that he got a false positive!!
And we thought 99% accuracy was pretty good.
Conditional ProbabilitiesConditional Probabilities
You walk into a pet shop…
Shop A:Shop A: there are two parrots in a cage there are two parrots in a cage
The owner says “At least one parrot is male.”The owner says “At least one parrot is male.”
What is the chance that you get two males?What is the chance that you get two males?
Shop B:Shop B: again two parrots in a cage again two parrots in a cage
The owner says “The darker one is male.”The owner says “The darker one is male.”
What is the chance they are both male?What is the chance they are both male?FFFFFM FM MFMFMMMM
FFFF
FMFM
MFMF
MMMM
1/21/2 chance they are both chance they are both malemale
1/31/3 chance they are chance they are both maleboth male
Shop owner A says “At least one of the two is male”Shop owner A says “At least one of the two is male”
Shop owner B says “The dark one is male”Shop owner B says “The dark one is male”
Pet Shop Quiz
Intuition in probabilityIntuition in probability
Playing Alice and Bob
you beat Alice with probabilty 1/3you beat Alice with probabilty 1/3
you beat Bob with probability 5/6 you beat Bob with probability 5/6
You need to win two You need to win two consecutiveconsecutive games out of 3. games out of 3.
Should you playShould you play
Bob Alice BobBob Alice Bob or or Alice Bob AliceAlice Bob Alice??
Look closely
To win, we need To win, we need win middle game win middle game win one of {first, last} game.win one of {first, last} game.
must beat second player (for sure)must beat second player (for sure) must beat first player once in two tries.must beat first player once in two tries.
Should you playShould you playBob Alice BobBob Alice Bob or or Alice Bob AliceAlice Bob Alice??
Playing Alice and Bob
Bob Alice Bob:Bob Alice Bob:
Pr[ {WWW, WWL, LWW} ] Pr[ {WWW, WWL, LWW} ]
= = 11//3 3 (1 - (1 - 11//66* * 11//66) = 35/108.) = 35/108.
Alice Bob Alice:Alice Bob Alice:
Pr[ {WWW, WWL, LWW} ] Pr[ {WWW, WWL, LWW} ]
= = 55//6 6 (1 - (1 - 22//33* * 22//33)) = 50/108= 50/108
Bridge Hands have 13 cards
5 3 3 2 ?5 3 3 2 ?4 4 3 2 ?4 4 3 2 ?4 3 3 3 ?4 3 3 3 ?
What distribution of the 4 suits is most likely?
4 3 3 3
4 4 3 2
5 3 3 2
313 13
44 3
213 13 13
4 34 3 2
213 13 13
4 35 3 2
10 3#(4333) 34 11
4 9#(4432)510
Intuition could be wrongIntuition could be wrong
Work out the math to be 100% sureWork out the math to be 100% sure
“Law of Averages”
I flip a coin 10 times. It comes up heads each time!
What are the chances that my next coin flip is also heads?
“Law of Averages”?
“The number of heads and tails have to even out…”
Be Careful
Though the Though the sample averagesample average gets closer to ½, gets closer to ½, the deviation from the average may grow!the deviation from the average may grow!
After 100: 52 heads, sample average 0.52After 100: 52 heads, sample average 0.52deviation = 2deviation = 2
After 1000: 511 heads, sample average 0.511After 1000: 511 heads, sample average 0.511deviation = 11deviation = 11
After 10000: 5096 heads, sample average 0.5096After 10000: 5096 heads, sample average 0.5096deviation = 96deviation = 96
NN (odd) people, each of whom has a random bit (odd) people, each of whom has a random bit (50/50) on his/her forehead. (50/50) on his/her forehead.
No communication allowed. Each person goes to No communication allowed. Each person goes to a private voting booth and casts a vote for 1 or a private voting booth and casts a vote for 1 or 0. 0.
If the outcome of the election coincided with the If the outcome of the election coincided with the parityparity of the N bits, the voters “win” the election of the N bits, the voters “win” the election
A voting puzzle
Example:Example:
NN = 5, with bits 1 0 1 1 0 = 5, with bits 1 0 1 1 0
Parity = 1Parity = 1
If they vote If they vote 1 0 0 1 1,1 0 0 1 1, then majority = 1, they win. then majority = 1, they win.
If they vote If they vote 0 0 1 1 00 0 1 1 0, then majority = 0, they lose., then majority = 0, they lose.
A voting puzzle
NN (odd) people, each of whom has a (odd) people, each of whom has a randomrandom bit bit on his/her forehead. on his/her forehead.
No communication allowed. Each person goes to No communication allowed. Each person goes to a private voting booth and casts a vote for 1 or a private voting booth and casts a vote for 1 or 0. 0.
If the outcome of the election coincided with the If the outcome of the election coincided with the parityparity of the N bits, the voters “win” the election. of the N bits, the voters “win” the election.
How do voters maximize the How do voters maximize the probability of winning?probability of winning?
A voting puzzle
Note that each individual has no information about the parity
Since each individual is wrong half Since each individual is wrong half the time, the outcome of the the time, the outcome of the election is wrong half the timeelection is wrong half the time
Beware of the Fallacy!Beware of the Fallacy!
Solution
Note: to know parity is equivalent to knowing the bit on your forehead
STRATEGY:STRATEGY: Each person assumes the bit on his/her head is the same as the majority of bits he/she sees.
Vote accordingly (in the case of even split, vote 0).
STRATEGY:STRATEGY: Each person assumes the bit on Each person assumes the bit on his/her head is the same as the majority of bits his/her head is the same as the majority of bits he/she sees. Vote accordingly (in the case of he/she sees. Vote accordingly (in the case of even split, vote 0).even split, vote 0).
Two cases:Two cases:• difference of (# of 1’s) and (# of 0’s) > 1difference of (# of 1’s) and (# of 0’s) > 1• difference = 1difference = 1
Analysis
STRATEGY:STRATEGY: Each person assumes the bit on Each person assumes the bit on his/her head is the same as the majority of bits his/her head is the same as the majority of bits he/she sees. Vote accordingly (in the case of he/she sees. Vote accordingly (in the case of even split, vote 0).even split, vote 0).
ANALYSIS:ANALYSIS: The strategy works so long as the The strategy works so long as the difference in the number of 1’s and the number difference in the number of 1’s and the number of 0’s is at least two.of 0’s is at least two.
ProbabilityProbability
of winning = of winning = 2
1
( )21 1
N
NN
O N
Analysis
A Final Game
Greater or Smaller?
Alice and Bob play a gameAlice and Bob play a game
Alice picks two distinct random numbers Alice picks two distinct random numbers xx and and yy between 0 and 1 between 0 and 1
Bob chooses to know any one of them, say Bob chooses to know any one of them, say xx
Now, Bob has to tell whether Now, Bob has to tell whether x < yx < y or or x > yx > y
If Bob guesses at random, If Bob guesses at random,
chances of winning are 50%chances of winning are 50%
Can Bob improve his chances of Can Bob improve his chances of winning?winning?
Bob picks a number between 0 and 1 Bob picks a number between 0 and 1 at random, say at random, say zz..
If If x > zx > z, he says , he says xx is greater is greater
If If x < zx < z, he says , he says xx is smaller is smaller
Analysis
0 1x yz
If z lies between x and y, Bob’s answer is correctIf z lies between x and y, Bob’s answer is correct
Analysis
Since x and y are distinct, there is a non-zero Since x and y are distinct, there is a non-zero probability for z to lie between x and yprobability for z to lie between x and y
Hence, Bob’s probability of winning is more than 50%Hence, Bob’s probability of winning is more than 50%
0 1x yzz
If z lies between x and y, Bob’s answer is correctIf z lies between x and y, Bob’s answer is correct
If z does not lie between x and y, Bob’s If z does not lie between x and y, Bob’s answer is wrong 50% of the times.answer is wrong 50% of the times.
Final Lesson for today…
Keep your mind open towards new Keep your mind open towards new possibilities !possibilities !