cat. data

  • 7/29/2019 cat. data

    Lec4 1. April 18, 2005

    Relations in categorical data

    Conditional probability

    Outline:

    Conditional probability;

    Independence of events; Bayes' rule;

    Tree diagram;

    Simpson's paradox.

    Conditional Probability

    If P(B) ≠ 0, the conditional probability of event A given that B has occurred, denoted by P(A | B), is defined by

    P(A | B) = P(A and B) / P(B)

    We can rewrite the formula:

    P(A and B) = P(A | B) P(B)
               = P(B | A) P(A)

    Example

    A focus group of 10 consumers has been

    selected to view a new TV commercial. After

    the viewing, 2 members of the focus group will be randomly selected and asked to

    answer detailed questions about the

    commercial. The group contains 4 men and 6

    women.

    P(first chosen person is female) =?

    P(second person is female | first person

    is female) = ?

    P(both people are female) =?
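
    The three questions can be checked with exact fractions. This is a minimal sketch that assumes the two members are drawn without replacement; the variable names are illustrative and do not come from the lecture.

```python
from fractions import Fraction

women, men = 6, 4
total = women + men

# First chosen person is female: 6 women among 10 people.
p_first_female = Fraction(women, total)

# Given the first person was female, 5 women remain among 9 people.
p_second_given_first = Fraction(women - 1, total - 1)

# Multiplication rule: P(A and B) = P(B | A) P(A).
p_both_female = p_second_given_first * p_first_female

print(p_first_female, p_second_given_first, p_both_female)  # 3/5 5/9 1/3
```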

    Independence

    Events A and B are independent if

    P(A | B) = P(A)

    or equivalently

    P(B | A) = P(B)

    or equivalently

    P(A and B) = P(A) P(B)

    This means that the probability that A occurs is unchanged by information about whether B has occurred (and vice versa).

    If A1, ..., An are independent, then

    P(A1 and A2 and ... and An) = P(A1) × ... × P(An)

    Note: When A and B are disjoint,

    P(A or B) = P(A) + P(B)

    P(A and B) = P(∅) = 0
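
    Independent and disjoint are easy to confuse, so a small check may help. The sketch below uses one roll of a fair six-sided die, an example chosen here for illustration and not taken from the lecture; the helper names `prob` and `independent` are likewise hypothetical.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair six-sided die (assumed example)

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

def independent(a, b):
    """Check the product rule P(A and B) = P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
low = {1, 2}
odd = {1, 3, 5}

print(independent(even, low))  # True: 1/6 == 1/2 * 1/3
print(independent(even, odd))  # False: P(A and B) = 0 but P(A) P(B) = 1/4
print(prob(even | odd) == prob(even) + prob(odd))  # True: addition rule for disjoint events
```

    Disjoint events with positive probability fail the product rule, so they are never independent: knowing that one occurred tells us the other did not.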

    Example

    John and Paul go duck hunting together.

    Suppose that John hits a target with

    probability 0.3 and Paul, independently,

    with probability 0.1. They both fire one

    shot at a duck.

    Given that exactly one shot hits the

    duck, what is the conditional

    probability that it is John's shot?

    That it is Paul's?

    Given that the duck is hit, what is the

    conditional probability that John hit it? That Paul hit it?
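
    One way to work both questions is to multiply hit and miss probabilities, which independence allows, and then divide by the probability of the conditioning event. The following is a sketch with illustrative variable names, not the lecture's own solution.

```python
from fractions import Fraction

p_john = Fraction(3, 10)  # P(John hits)
p_paul = Fraction(1, 10)  # P(Paul hits)

# Independence: joint probabilities are products.
p_only_john = p_john * (1 - p_paul)      # John hits, Paul misses
p_only_paul = (1 - p_john) * p_paul      # Paul hits, John misses
p_exactly_one = p_only_john + p_only_paul
p_hit = 1 - (1 - p_john) * (1 - p_paul)  # at least one shot hits

# Conditioning on "exactly one shot hits".
john_given_one = p_only_john / p_exactly_one
paul_given_one = p_only_paul / p_exactly_one

# Conditioning on "the duck is hit"; "John hits" is contained in that event.
john_given_hit = p_john / p_hit
paul_given_hit = p_paul / p_hit

print(john_given_one, paul_given_one)  # 27/34 7/34
print(john_given_hit, paul_given_hit)  # 30/37 10/37
```

    The last two probabilities add up to more than 1 because both hunters can hit the duck at the same time.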

    Bayes' rule

    For any two events A and B, where B^c denotes the complement of B,

    A = (A and B) or (A and B^c)
      = (A ∩ B) ∪ (A ∩ B^c)

    Therefore, since these two events are disjoint,

    P(A) = P(A and B) + P(A and B^c)
         = P(A | B) P(B) + P(A | B^c) P(B^c)

    If we know the probabilities P(A | B), P(A | B^c) and P(B), we can reverse the conditioning using Bayes' rule:

    P(B | A) = P(A | B) P(B) / P(A)
             = P(A | B) P(B) / [P(A | B) P(B) + P(A | B^c) P(B^c)]

    Example

    A blood test is 99% effective in detecting

    a certain disease when the disease is

    present. However, the test also yields a false-positive result for 2% of healthy

    patients tested. Suppose 0.1% of the

    population has the disease. What is the

    probability that a randomly tested

    individual actually has the disease given

    that his or her test result is positive?

    D = an individual has the disease

    T = test result is positive
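
    With D and T defined this way, the question asks for P(D | T), which Bayes' rule gives from P(T | D) = 0.99, P(T | D^c) = 0.02 and P(D) = 0.001. The sketch below carries out that arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

p_d = Fraction(1, 1000)            # P(D): 0.1% of the population has the disease
p_t_given_d = Fraction(99, 100)    # P(T | D): test detects the disease 99% of the time
p_t_given_not_d = Fraction(2, 100) # P(T | D^c): false-positive rate among healthy patients

# Total probability of a positive test.
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

# Bayes' rule: reverse the conditioning.
p_d_given_t = p_t_given_d * p_d / p_t

print(p_d_given_t, round(float(p_d_given_t), 4))  # 11/233 0.0472
```

    Under these numbers, fewer than 5% of positive results come from people who have the disease, because healthy patients vastly outnumber sick ones.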

    The Tree Diagram

    5% of male high school athletes go on to play at college level. Of these, 1.7% enter major

    league professional sports. Given that he

    doesn't compete in college, the probability

    that a high school athlete reaches

    professional play is 0.01%.

    A = {competes in college}

    B = {competes professionally}

    What is the probability that a high

    school athlete competes in college and

    enters major league professional sports?

    What is the probability that a high

    school athlete will go on to professional

    sports?

    What proportion of professional athletes

    competed in college?
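
    Each question corresponds to a path, or a sum of paths, through the tree: multiply along a branch, add across branches, then apply Bayes' rule for the last question. The following sketch assumes P(A) = 0.05, P(B | A) = 0.017 and P(B | A^c) = 0.0001 as read from the slide; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(5, 100)                # P(A): competes in college
p_b_given_a = Fraction(17, 1000)      # P(B | A): turns professional after college play
p_b_given_not_a = Fraction(1, 10000)  # P(B | A^c): turns professional without college play

# Branch A then B: competes in college and turns professional.
p_a_and_b = p_b_given_a * p_a

# Sum over both branches that end in B.
p_b = p_a_and_b + p_b_given_not_a * (1 - p_a)

# Proportion of professionals who competed in college.
p_a_given_b = p_a_and_b / p_b

print(float(p_a_and_b))  # 0.00085
print(float(p_b))        # 0.000945
print(p_a_given_b)       # 170/189
```

    So roughly 90% of professional athletes competed in college, even though only 5% of high school athletes do.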

    Relationships in Categorical Data

    A two-way table is a way to display the

    data from two categorical variables.

    Example: Surgery death rates in two

    hospitals

                Hospital A   Hospital B
    died              63           16
    survived        2037          784
    total           2100          800

    Hospital A: 3% of surgery patients die

    Hospital B: 2% of surgery patients die

    Question: Which hospital is safer?

    Example (cont.)

    Patients are classified as being in either poor or good condition before surgery.

                 good cond.        poor cond.
                 A       B         A       B
    died         6       8         57      8
    survived     594     592       1443    192
    total        600     600       1500    200

    For patients in good condition:

    hospital A: 1% of surgery patients die

    hospital B: 1.3% of surgery patients die

    For patients in poor condition:

    hospital A: 3.8% of surgery patients die

    hospital B: 4% of surgery patients die

    An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called

    Simpson's paradox.
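
    The reversal can be reproduced directly from the hospital counts. The sketch below recomputes the death rates within each patient condition and after pooling; the data layout and the helper name `death_rate` are illustrative choices, while the counts are copied from the slides.

```python
from fractions import Fraction

# (deaths, patients) by pre-surgery condition and hospital, copied from the slides.
counts = {
    "good": {"A": (6, 600), "B": (8, 600)},
    "poor": {"A": (57, 1500), "B": (8, 200)},
}

def death_rate(deaths, patients):
    """Exact death rate as a fraction."""
    return Fraction(deaths, patients)

# Within each condition, compare the two hospitals.
within = {
    condition: {h: death_rate(*c) for h, c in by_hospital.items()}
    for condition, by_hospital in counts.items()
}
for condition, rates in within.items():
    print(condition, {h: f"{float(r):.1%}" for h, r in rates.items()})

# Pool the two conditions and compare again.
combined = {}
for hospital in ("A", "B"):
    deaths = sum(counts[c][hospital][0] for c in counts)
    patients = sum(counts[c][hospital][1] for c in counts)
    combined[hospital] = death_rate(deaths, patients)
print("combined", {h: f"{float(r):.1%}" for h, r in combined.items()})
```

    Hospital A has the lower death rate in both conditions but the higher rate overall, because 1500 of its 2100 patients arrive in poor condition, compared with 200 of 800 at hospital B.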
