7/29/2019 cat. data
1/11
Lec4 1. April 18, 2005
Relations in categorical data
Conditional probability
Outline:
Conditional probability;
Independence of events; Bayes' rule;
Tree diagram;
Simpson's paradox.
Conditional Probability
If P(B) ≠ 0, the conditional probability of
event A given that B has occurred, denoted by P(A | B), is defined by
P(A | B) = P(A and B) / P(B)
We can rewrite the formula:
P(A and B) = P(A |B)P(B)
= P(B |A)P(A)
Example
A focus group of 10 consumers has been
selected to view a new TV commercial. After
the viewing, 2 members of the focus group
will be randomly selected and asked to
answer detailed questions about the
commercial. The group contains 4 men and 6
women.
P(first chosen person is female) =?
P(second person is female | first person
is female) = ?
P(both people are female) =?
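The three questions above can be worked through with exact fractions. This is a minimal sketch, assuming the two people are drawn without replacement and that every member is equally likely to be chosen; the variable names are illustrative.

```python
from fractions import Fraction

men, women = 4, 6
total = men + women

# P(first chosen person is female)
p_first_f = Fraction(women, total)

# P(second is female | first is female):
# one woman has already been removed from the group (assumed: no replacement)
p_second_f_given_first_f = Fraction(women - 1, total - 1)

# Multiplication rule: P(A and B) = P(B | A) P(A)
p_both_f = p_second_f_given_first_f * p_first_f

print(p_first_f, p_second_f_given_first_f, p_both_f)  # 3/5 5/9 1/3
```

The second probability differs from the first (5/9 versus 3/5), so the two draws are not independent.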
Independence
Events A and B are independent if
P(A |B) = P(A)
or equivalently
P(B |A) = P(B)
P(A and B) = P(A)P(B)
This means that the probability that A occurs is unchanged by information about
whether B has occurred (and vice versa).
If A1, ..., An are independent, then
P(A1 and A2 and ... and An) = P(A1) P(A2) ... P(An)
Note: When A and B are disjoint
P(A or B) = P(A) + P(B)
P(A and B) = P(∅) = 0
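The independence condition can be checked directly on a small sample space. The sketch below uses a hypothetical example that is not from the lecture: one roll of a fair die, with A = "even number" and B = "at most 4".

```python
from fractions import Fraction

# Hypothetical illustration: a single roll of a fair six-sided die
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}       # roll is even
B = {1, 2, 3, 4}    # roll is at most 4

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)

# Product form of independence: P(A and B) = P(A) P(B)
independent = (p_ab == p_a * p_b)

# Conditional form: P(A | B) = P(A)
p_a_given_b = p_ab / p_b

print(p_a, p_b, p_ab, independent)  # 1/2 2/3 1/3 True
```

Here A and B are independent but not disjoint. Disjoint events with positive probability are never independent, since P(A and B) = 0 while P(A)P(B) > 0.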
Example
John and Paul go duck hunting together.
Suppose that John hits a target with
probability 0.3 and Paul, independently,
with probability 0.1. They both fire one
shot at a duck.
Given that exactly one shot hits the
duck, what is the conditional
probability that it is John's shot?
That it is Paul's?
Given that the duck is hit, what is the
conditional probability that John hit
it? That Paul hit it?
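Both questions reduce to the definition P(A | B) = P(A and B) / P(B), with independence used to multiply the individual hit and miss probabilities. A minimal sketch with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

p_john = Fraction(3, 10)  # P(John hits)
p_paul = Fraction(1, 10)  # P(Paul hits), independent of John

# Exactly one shot hits: John only, or Paul only (disjoint cases)
john_only = p_john * (1 - p_paul)       # 27/100
paul_only = (1 - p_john) * p_paul       # 7/100
exactly_one = john_only + paul_only     # 17/50

p_john_given_one = john_only / exactly_one   # 27/34
p_paul_given_one = paul_only / exactly_one   # 7/34

# The duck is hit: at least one shot hits
duck_hit = 1 - (1 - p_john) * (1 - p_paul)   # 37/100

# "John hit it" is contained in "the duck is hit", so P(John and hit) = P(John)
p_john_given_hit = p_john / duck_hit   # 30/37
p_paul_given_hit = p_paul / duck_hit   # 10/37

print(p_john_given_one, p_paul_given_one)  # 27/34 7/34
print(p_john_given_hit, p_paul_given_hit)  # 30/37 10/37
```

The last two probabilities add up to more than 1 because both hunters can hit the duck at the same time; in the "exactly one" case the two events are complementary, so those probabilities sum to 1.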
Bayes' rule
For any two events A and B,
A = (A and B) or (A and B^c)
  = (A ∩ B) ∪ (A ∩ B^c)
where B^c denotes the complement of B.
Therefore, since the two pieces are disjoint,
P(A) = P(A and B) + P(A and B^c)
     = P(A | B)P(B) + P(A | B^c)P(B^c)
If we know the probabilities P(A | B), P(A | B^c) and P(B), we can reverse the
conditioning using Bayes' rule:
P(B | A) = P(A | B)P(B) / P(A)
         = P(A | B)P(B) / [P(A | B)P(B) + P(A | B^c)P(B^c)]
Example
A blood test is 99% effective in detecting
a certain disease when the disease is
present. However, the test also yields a
false-positive result for 2% of healthy
patients tested. Suppose 0.1% of the
population has the disease. What is the
probability that a randomly tested
individual actually has the disease given
that his or her test result is positive?
D = an individual has the disease
T = test result is positive
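With D and T defined as above, the question asks for P(D | T), which follows from the total probability formula and the definition of conditional probability. A minimal sketch, reading "99% effective" as P(T | D) = 0.99 and the false-positive rate as P(T | not D) = 0.02; variable names are illustrative.

```python
from fractions import Fraction

p_d = Fraction(1, 1000)             # P(D): prevalence, 0.1%
p_t_given_d = Fraction(99, 100)     # P(T | D): assumed meaning of "99% effective"
p_t_given_not_d = Fraction(2, 100)  # P(T | not D): false-positive rate

# Total probability: P(T) = P(T | D)P(D) + P(T | not D)P(not D)
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

# Reverse the conditioning: P(D | T) = P(T | D)P(D) / P(T)
p_d_given_t = p_t_given_d * p_d / p_t

print(p_d_given_t, round(float(p_d_given_t), 4))  # 11/233 0.0472
```

Under these assumptions, fewer than 5% of positive results come from people who actually have the disease, because healthy people vastly outnumber sick ones.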
The Tree Diagram
5% of male high school athletes go on to play
at college level. Of these, 1.7% enter major
league professional sports. For an athlete who
does not compete in college, the probability
of reaching professional play is 0.01%.
A = {competes in college}
B = {competes professionally}
What is the probability that a high
school athlete competes in college and
enters major league professional sports?
What is the probability that a high
school athlete will go on to professional
sports?
What proportion of professional athletes
competed in college?
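Each question corresponds to a path, or a sum of paths, through the tree: multiply along branches, then add across the branches that end in B. A minimal sketch with exact fractions, using the events A and B defined above; variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(5, 100)                 # P(A): competes in college
p_b_given_a = Fraction(17, 1000)       # P(B | A): 1.7%
p_b_given_not_a = Fraction(1, 10000)   # P(B | not A): 0.01%

# Branch A -> B: P(A and B) = P(B | A) P(A)
p_a_and_b = p_b_given_a * p_a

# Branch not A -> B: P(not A and B) = P(B | not A) P(not A)
p_not_a_and_b = p_b_given_not_a * (1 - p_a)

# Add the two branches that end in B
p_b = p_a_and_b + p_not_a_and_b

# Proportion of professionals who competed in college: P(A | B)
p_a_given_b = p_a_and_b / p_b

print(p_a_and_b, p_b, p_a_given_b)  # 17/20000 189/200000 170/189
```

In decimals, P(A and B) = 0.00085, P(B) = 0.000945, and P(A | B) is about 0.90.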
Relationships in Categorical Data
A two-way table is a way to display the
data from two categorical variables.
Example: Surgery death rates in two
hospitals
          Hospital A   Hospital B
died          63           16
survived    2037          784
total       2100          800
Hospital A: 3% of surgery patients die
Hospital B: 2% of surgery patients die
Question: Which hospital is safer?
Example (cont.)
Patients are classified as being in either
poor or good condition before surgery.
            good cond.       poor cond.
             A      B         A      B
died         6      8        57      8
survived   594    592      1443    192
total      600    600      1500    200
For patients in good condition:
hospital A: 1% of surgery patients die
hospital B: 1.3% of surgery patients die
For patients in poor condition:
hospital A: 3.8% of surgery patients die
hospital B: 4% of surgery patients die
An association or comparison that holds
for all of several groups can reverse
direction when the data are combined to
form a single group. This reversal is called
Simpson's paradox.
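The reversal can be verified from the hospital counts given earlier. The sketch below recomputes the death rates within each condition group and after combining the groups; the dictionary layout and names are illustrative.

```python
from fractions import Fraction

# (died, total) counts per patient condition and hospital, from the tables above
counts = {
    "good": {"A": (6, 600), "B": (8, 600)},
    "poor": {"A": (57, 1500), "B": (8, 200)},
}

def death_rate(died, total):
    return Fraction(died, total)

# Death rates within each condition group
within = {
    cond: {h: death_rate(*dt) for h, dt in hospitals.items()}
    for cond, hospitals in counts.items()
}

# Death rates after combining the two groups for each hospital
combined = {}
for h in ("A", "B"):
    died = sum(counts[c][h][0] for c in counts)
    total = sum(counts[c][h][1] for c in counts)
    combined[h] = death_rate(died, total)

a_better_in_each_group = all(within[c]["A"] < within[c]["B"] for c in counts)
a_worse_combined = combined["A"] > combined["B"]

print(a_better_in_each_group, a_worse_combined)  # True True
```

Hospital A has the lower death rate in both groups, yet the higher rate overall, because it treats a much larger share of patients in poor condition (1500 of 2100, compared with 200 of 800 at Hospital B).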