Biostat 200 Lecture 2

Trimmed mean (chapter 3)
• To reduce the influence of extreme values on the mean, you can calculate a trimmed mean
• Remove the bottom 5% and the top 5% of values, then average the rest
  – Be careful not to remove too much data: with tied values, the 5th percentile may also be the 10th percentile
• There is no easy way to do this in Stata (I went in and did it by hand)
• Extra credit for doing this on Assignment 1
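Outside Stata, the trimming step is short. The sketch below is a minimal Python version under one stated assumption: trimming is done by count, so k = floor(0.05 × n) observations are dropped from each end of the sorted data. Dropping by count rather than by percentile value means tied values at the cutoff are not all removed at once. The function name `trimmed_mean` and the data are hypothetical.

```python
import math


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of observations.

    Assumption: trimming is by count. k = floor(proportion * n) values are
    removed from each end of the sorted data, so ties at the cutoff are
    split rather than all dropped.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    k = math.floor(proportion * n)
    kept = data[k:n - k]  # drop k values from the bottom and k from the top
    return sum(kept) / len(kept)


# Hypothetical data: 19 ordinary values plus one extreme value.
values = list(range(1, 20)) + [1000]
print(sum(values) / len(values))  # ordinary mean: 59.5
print(trimmed_mean(values))       # 5% trimmed mean: 10.5
```

With n = 20, k = 1, so only the single smallest and single largest values are dropped; for n < 20 a 5% trim removes nothing.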

Grouped data (chapter 3)
• Sometimes you are given data in aggregate form
• The data consist of the frequencies of each individual value or range of values
• For example:

Grouped mean
• The mean uses the midpoint of each group
• For the highest group, use the midpoint between the cutpoint and the maximum
• Grouped mean:
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i}$$
  where f_i = the frequency in the ith group, m_i = the midpoint of the ith group, and k = the number of groups
• x̄ = (25*40 + 125*72 + 275.5*58 + 860.5*98) / 268 = 411.6 cells/mm3 (the mean from the original data was 296.9)
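The grouped mean is a frequency-weighted average of the group midpoints. A minimal Python sketch (the function name is illustrative), applied to the four groups in the worked example:

```python
def grouped_mean(midpoints, freqs):
    """Grouped mean: sum of f_i * m_i divided by sum of f_i."""
    if len(midpoints) != len(freqs):
        raise ValueError("need one frequency per group")
    n = sum(freqs)
    return sum(f * m for m, f in zip(midpoints, freqs)) / n


# Midpoints m_i and frequencies f_i from the worked example (n = 268).
midpoints = [25, 125, 275.5, 860.5]
freqs = [40, 72, 58, 98]
print(round(grouped_mean(midpoints, freqs), 1))  # 411.6
```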

Grouped standard deviation
• The standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\left(\sum_{i=1}^{k} f_i\right) - 1}}$$
• s = sqrt( ( (25-411.6)^2*40 + (125-411.6)^2*72 + (275.5-411.6)^2*58 + (860.5-411.6)^2*98 ) / 267 ) = 350.0 cells/mm3 (the SD from the original data was 255.4)
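The same midpoints and frequencies give the grouped standard deviation. A minimal Python sketch, assuming the n - 1 denominator in the formula; it recomputes the grouped mean internally instead of reusing the rounded 411.6. Evaluated on these four groups, the expression comes to about 350.0 cells/mm3.

```python
import math


def grouped_sd(midpoints, freqs):
    """Grouped SD: sqrt(sum f_i * (m_i - mean)^2 / (sum f_i - 1))."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
    return math.sqrt(ss / (n - 1))


midpoints = [25, 125, 275.5, 860.5]
freqs = [40, 72, 58, 98]
print(round(grouped_sd(midpoints, freqs), 1))  # 350.0
```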

Basic probability

Basic probability
• Why?
  – Probability is the foundation of statistical inference
  – It provides the methods needed to infer the characteristics of the population from which a sample was drawn
[Diagram: Population → Sample]
Pagano and Gauvreau, Chapter 6

Basic probability
• Event
  – The result of an experiment or observation
  – Either occurs or does not occur
  – Denoted by uppercase letters, e.g. A, B, X
  – We will apply probability to events, i.e. we will want to know the probability that an event occurs
  – E.g. a disease occurrence, an extreme laboratory value
Pagano and Gauvreau, Chapter 6

Basic probability
• Frequentist definition of probability: if an experiment is repeated n times under essentially identical conditions, and the event A occurs m times, then as n grows large, the ratio m/n approaches a fixed limit that is the probability of A
Pagano and Gauvreau, Chapter 6
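The frequentist definition can be illustrated by simulation. The sketch below is a hypothetical example, not from the lecture: it assumes a fair six-sided die, so the event A = "roll a 6" has P(A) = 1/6, and uses Python's `random` module (with a fixed seed) to stand in for repeating the experiment under identical conditions.

```python
import random


def relative_frequency_of_six(n, seed=1):
    """Roll a fair die n times; return m/n, where m counts rolls equal to 6."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    m = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    return m / n


for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency_of_six(n))
# As n grows, m/n settles near P(A) = 1/6, about 0.167.
```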

Basic probability
• Probability of an event: the relative frequency of its occurrence in a large number of trials repeated under the same conditions
  – E.g. the probability that a child has malaria at the time of the study
  – Always lies between 0 and 1 (inclusive)
  – Denoted P(A) or P(X)
Pagano and Gauvreau, Chapter 6

Basic probability
• Complement of an event: Ā or A^c (read "not A" or "A complement")
  – E.g. the event that the person does not have malaria
  – P(A) = 1 - P(Ā)
• In epidemiology, we often write E for exposed and Ē for not exposed
• Ω is the universe: all the possible outcomes of an experiment
• P(Ω) = P(A) + P(Ā) = 1
[Venn diagram: A and Ā within the universe Ω]
Pagano and Gauvreau, Chapter 6

Complement example

• Probability that someone has extensively drug-resistant tuberculosis (XDR TB) versus the probability that they do not

• P(XDR TB+) + P(XDR TB-) = 1

Basic probability
• The intersection of 2 events is written A ∩ B
• The intersection is when both A and B occur
  – E.g. the event that a person has both malaria and pulmonary tuberculosis
  – The probability that both occur is written P(A ∩ B)
Pagano and Gauvreau, Chapter 6

Basic probability
• The union of 2 events is written A ∪ B
• The union is when A or B or both occur
  – E.g. the event that a person has malaria or tuberculosis or both
  – P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
  – The probability of A or B is the sum of their individual probabilities minus the probability of their intersection
Pagano and Gauvreau, Chapter 6
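The union formula can be checked by counting outcomes. This sketch uses a hypothetical example that is not from the lecture (one roll of a fair die), with Python sets for events and `fractions.Fraction` for exact probabilities. A and B share the outcomes 4 and 6, so adding P(A) and P(B) alone would count them twice; subtracting P(A ∩ B) corrects for that.

```python
from fractions import Fraction

# One roll of a fair die: every outcome in omega is equally likely.
omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """P(event) = (# outcomes in event) / (# outcomes in omega)."""
    return Fraction(len(event & omega), len(omega))


A = {2, 4, 6}  # even number
B = {4, 5, 6}  # number greater than 3

lhs = prob(A | B)                      # P(A ∪ B) counted directly
rhs = prob(A) + prob(B) - prob(A & B)  # addition rule
print(lhs, rhs)  # 2/3 2/3
```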

Basic probability
• Two events are mutually exclusive if they cannot occur together
  – In English: for mutually exclusive events, the probability of A or B occurring is the sum of their individual probabilities; both cannot occur together, so P(A ∩ B) = 0
  – In probability notation: P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = P(A) + P(B)
Pagano and Gauvreau, Chapter 6

Basic probability
• Two events are mutually exclusive if they cannot occur together
  – This is true for complements
  – E.g. being pregnant and not pregnant: you cannot be both
Pagano and Gauvreau, Chapter 6

Basic probability
• If A and B are mutually exclusive, P(A ∪ B) = P(A) + P(B)
• This is the additive rule of probability
• E.g.
  P(HCV genotype 1) in the US = .70
  P(HCV genotype 2) in the US = .15
  P(HCV genotype 3, 4, or 6) = .15
  P(HCV genotype 1 or 2) = .70 + .15 = .85
Pagano and Gauvreau, Chapter 6

Basic probability
• The additive rule of probability can be applied to three or more mutually exclusive events
• If none of the events can occur together, then P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)
Pagano and Gauvreau, Chapter 6

Probability summary
• Complement: P(A) = 1 - P(Ā)
• Union: Prob(A or B or both) = P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
• Intersection: Prob(A and B) = P(A ∩ B)
• For mutually exclusive events: P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B) (additive rule)
• A and Ā are mutually exclusive, so P(A ∪ Ā) = P(A) + P(Ā) = 1
Pagano and Gauvreau, Chapter 6

Basic probability example
• A = the event that an individual is exposed to high levels of carbon monoxide
• B = the event that an individual is exposed to high levels of nitrogen dioxide
  – What is the event A ∩ B called? What is it in this example?
  – What is the event A ∪ B called? What is it in this example?
  – What is the complement of A?
  – Are A and B mutually exclusive?
Pagano and Gauvreau, Chapter 6

Basic probability example
  – A ∩ B is the intersection of A and B. It is the event that the person is exposed to high levels of both gases.
  – A ∪ B is the union of A and B. It is the event that the person is exposed to one or the other or both.
  – A^c is the event that the person is not exposed to high levels of carbon monoxide.
  – Are A and B mutually exclusive? Can they both occur? Yes. So they are NOT mutually exclusive.
Pagano and Gauvreau, Chapter 6

Conditional probability
• The probability that an event B will occur given that event A has occurred
  – Notation: P(B|A)
  – Read: the probability of B given A
• Example: the probability of a person becoming infected with malaria given that he/she uses a bed net at night
• Event A is using a bed net
• Event B is becoming infected with malaria

Conditional probability
• Multiplicative rule of probability: P(A ∩ B) = P(A) P(B|A)
• So P(B|A) = P(A ∩ B) / P(A)
• Example: P(becoming infected with malaria | using a bed net)
  Answer: P(becoming infected and using a bed net) / P(using a bed net) = number of people who use a bed net and become infected with malaria / number of people who use a bed net

Probability example: 1992 U.S. birth statistics

Age of mother   Probability
<15             0.003
15-19           0.124
20-24           0.263
25-29           0.290
30-34           0.220
35-39           0.085
40-44           0.014
45-49           0.001
Total           1.000

• Probability that mother’s age was ≤24 = 0.003 + 0.124 + 0.263 = 0.390 (What probability rule?)
• Given that a mother is under age 30, what is the probability that she is under age 20?
  P(mother’s age <20 | mother’s age <30) = P(mother’s age <20 and <30) / P(mother’s age <30) = (0.003 + 0.124) / (0.003 + 0.124 + 0.263 + 0.290) = 0.127 / 0.680 = 0.187
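Both calculations in the birth-statistics example can be reproduced from the table. In this sketch each age group is stored by its exclusive upper age bound (an encoding chosen here for illustration, with "<15" given bound 15 and "15-19" bound 20); the additive rule sums mutually exclusive groups, and the conditional probability divides one such sum by another.

```python
# 1992 U.S. birth statistics from the table, as
# (exclusive upper age bound, probability).
AGE_DIST = [
    (15, 0.003), (20, 0.124), (25, 0.263), (30, 0.290),
    (35, 0.220), (40, 0.085), (45, 0.014), (50, 0.001),
]


def p_age_below(cutoff):
    """P(mother's age < cutoff); cutoff must be one of the group boundaries.

    Additive rule: the age groups are mutually exclusive, so their
    probabilities can simply be summed.
    """
    return sum(p for upper, p in AGE_DIST if upper <= cutoff)


p_under_20 = p_age_below(20)
p_under_30 = p_age_below(30)
# "<20" is contained in "<30", so P(<20 and <30) = P(<20).
p_cond = p_under_20 / p_under_30

print(round(p_age_below(25), 3))  # P(age <= 24): 0.39
print(round(p_cond, 3))           # P(age < 20 | age < 30): 0.187
```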

Examples of conditional probabilities

• Relative risk is the ratio of 2 conditional probabilities:
  P(disease | exposed) / P(disease | not exposed)
• Odds also involve conditional probabilities:
  Odds of disease among the exposed = P(disease | exposed) / (1 - P(disease | exposed))
  Odds of disease among the unexposed = P(disease | not exposed) / (1 - P(disease | not exposed))
• An odds ratio is the ratio of the two odds above
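Relative risk and the odds ratio both start from the same two conditional probabilities. A minimal Python sketch, assuming a hypothetical 2x2 table of counts laid out as in the docstring; the counts are invented for illustration only.

```python
def risk_and_odds_ratios(a, b, c, d):
    """Relative risk and odds ratio from a 2x2 table of counts.

    Assumed layout (rows = exposure, columns = disease):
                  disease   no disease
      exposed        a          b
      not exposed    c          d
    """
    p_d_exposed = a / (a + b)      # P(disease | exposed)
    p_d_unexposed = c / (c + d)    # P(disease | not exposed)
    relative_risk = p_d_exposed / p_d_unexposed
    odds_exposed = p_d_exposed / (1 - p_d_exposed)
    odds_unexposed = p_d_unexposed / (1 - p_d_unexposed)
    odds_ratio = odds_exposed / odds_unexposed
    return relative_risk, odds_ratio


# Hypothetical counts, for illustration only.
rr, odds_ratio = risk_and_odds_ratios(a=20, b=80, c=10, d=90)
print(round(rr, 2), round(odds_ratio, 2))  # 2.0 2.25
```

Here P(disease | exposed) = 0.2 and P(disease | not exposed) = 0.1, so the relative risk is 2; the odds are 0.25 and 0.111, giving an odds ratio of 2.25.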

Independence

• If the occurrence of B does not depend on A,
  – then P(B|A) = P(B)
  – Example: the probability of becoming infected with malaria given that you wear a blue shirt = the probability of becoming infected with malaria
  – Then the multiplicative rule becomes P(A ∩ B) = P(A) P(B)
  – Example: coin tosses: the probability of heads on the 2nd toss is independent of the outcome of the 1st toss
Pagano and Gauvreau, Chapter 6

Independence

• Note that independence ≠ mutual exclusivity!
  – Mutual exclusivity
    • 2 events cannot both occur
    • P(A ∩ B) = 0
  – Independence
    • 2 events do not depend on each other
    • P(B|A) = P(B)
    • P(A ∩ B) = P(A) P(B)
Pagano and Gauvreau, Chapter 6
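The difference between the two properties shows up clearly on a small sample space. This hypothetical die-roll example (not from the lecture) checks each definition exactly with `fractions.Fraction`: "even" and "1 or 2" are independent but can occur together, while "even" and "odd" are mutually exclusive and therefore not independent.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))


def independent(a, b):
    return prob(a & b) == prob(a) * prob(b)


def mutually_exclusive(a, b):
    return prob(a & b) == 0


even = {2, 4, 6}
low = {1, 2}       # roll of 1 or 2
odd = {1, 3, 5}

print(independent(even, low), mutually_exclusive(even, low))  # True False
print(independent(even, odd), mutually_exclusive(even, odd))  # False True
```

Two mutually exclusive events with nonzero probabilities can never be independent: knowing one occurred tells you the other did not.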

Law of Total Probability
• The law of total probability:
  P(B) = P(B ∩ A) + P(B ∩ Ā)
  P(B) = P(B|A)P(A) + P(B|Ā)P(Ā)
• More generally, if A1, A2, …, An are mutually exclusive and P(A1 ∪ A2 ∪ … ∪ An) = 1:
  P(B) = P(B ∩ A1) + P(B ∩ A2) + … + P(B ∩ An)
  P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)
Pagano and Gauvreau, Chapter 6

Law of Total Probability
• Helpful when you cannot directly calculate a probability
• Example:
  – Suppose you know the TB prevalence in different areas and the population size in those areas, and you want to know the worldwide TB prevalence
  – P(TB+) = P(TB+ | live in lower income country)*P(live in lower income country) + P(TB+ | live in upper income country)*P(live in upper income country)
  – This is a weighted average of the 2 TB rates
Pagano and Gauvreau, Chapter 6
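The TB example is a weighted sum over a partition. A minimal Python sketch of the law of total probability; the input numbers are hypothetical placeholders, not real TB prevalences, and the function name is illustrative.

```python
import math


def total_probability(conditionals, priors):
    """P(B) = sum_i P(B | A_i) P(A_i) over a partition A_1, ..., A_n.

    `priors` are P(A_i); they must sum to 1 (the A_i are mutually
    exclusive and together cover every outcome).
    """
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("P(A_i) must sum to 1 for a partition")
    return sum(pb * pa for pb, pa in zip(conditionals, priors))


# Hypothetical inputs, NOT real TB figures:
# P(TB+ | lower income) = 0.005, P(TB+ | upper income) = 0.001,
# P(lower income) = 0.8, P(upper income) = 0.2
p_tb = total_probability([0.005, 0.001], [0.8, 0.2])
print(round(p_tb, 4))  # 0.0042
```

Because the weights sum to 1, the result always lies between the smallest and largest of the conditional rates.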

Diagnostic tests

• Diagnostic tests of disease are rarely perfect
  – True positives: the test is positive given the person has the disease
    • The probability of this is P(T+|D+) = sensitivity
  – False positives: the test is positive although the person does not have the disease
  – True negatives: the test is negative given the person does not have the disease
    • The probability of this is P(T-|D-) = specificity
  – False negatives: the test is negative even though the person has the disease
Pagano and Gauvreau, Chapter 6

Diagnostic tests

• Sensitivity = P(T+|D+) = P(T+ ∩ D+)/P(D+) = TP/(TP+FN)
• Specificity = P(T-|D-) = P(T- ∩ D-)/P(D-) = TN/(FP+TN)

                 TRUTH
                 D+     D-
Test     T+      TP     FP
         T-      FN     TN

Pagano and Gauvreau, Chapter 6

Diagnostic tests

• Diagnostic test characteristics (sensitivity and specificity) are based on experiments in which the test is compared to a “gold standard”

Pagano and Gauvreau, Chapter 6

Diagnostic test validation example

• New biological markers of chronic alcohol consumption are being developed. Phosphatidylethanol (PEth) is a metabolite of alcohol that is formed only in the presence of alcohol.

• Researchers examined a group of alcoholics being admitted to inpatient alcohol detoxification (n=56) and a group of abstainers in a closed psychiatric ward (n=35).

Hartmann, Addiction Biology, 2006
Pagano and Gauvreau, Chapter 6

Diagnostic tests example

• Proportion of the alcoholics with a positive PEth test, using a cutoff of 0.36 µmol/l = sensitivity = 53/56 = 94.6%
• Proportion of the abstainers with a negative PEth test, using a cutoff of 0.36 µmol/l = specificity = 35/35 = 100%

                       "TRUTH"
                       Alc+    Alc-
PEth test     +        53      0
              -        3       35

Hartmann, Addiction Biology, 2006
Pagano and Gauvreau, Chapter 6
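Sensitivity and specificity are column proportions of the 2x2 table. A minimal Python sketch (the function name is illustrative) applied to the PEth counts reported in the table:

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP/(TP+FN); specificity = TN/(FP+TN)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return sensitivity, specificity


# PEth counts from the table (cutoff 0.36 µmol/l).
sens, spec = sensitivity_specificity(tp=53, fp=0, fn=3, tn=35)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# sensitivity = 94.6%, specificity = 100.0%
```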

Diagnostic tests
• The cutoff for a diagnostic test can be set to:
  – Maximize sensitivity: this will decrease specificity!
    • This might be ideal if a follow-up confirmatory test is easy and you want to be sure not to miss any positives
  – Maximize specificity: this will decrease sensitivity!
    • This might be necessary if a false positive test has grave ramifications
• Receiver operating characteristic (ROC) curves illustrate this tension
  – The ROC curve plots sensitivity against 1 - specificity for a test at every possible cutoff
Pagano and Gauvreau, Chapter 6
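The points on an ROC curve come from recomputing sensitivity and specificity at each candidate cutoff. This sketch assumes a result is called positive when the test value is at or above the cutoff and uses hypothetical test values (not the PEth data); it returns the conventional ROC coordinates, sensitivity and 1 - specificity, and leaves plotting out.

```python
def roc_points(scores_diseased, scores_healthy):
    """(cutoff, sensitivity, 1 - specificity) for every candidate cutoff.

    Assumption: a result is called positive when score >= cutoff.
    """
    cutoffs = sorted(set(scores_diseased) | set(scores_healthy))
    points = []
    for c in cutoffs:
        sens = sum(s >= c for s in scores_diseased) / len(scores_diseased)
        spec = sum(s < c for s in scores_healthy) / len(scores_healthy)
        points.append((c, sens, 1 - spec))
    return points


# Hypothetical test values, for illustration only.
diseased = [0.5, 0.8, 0.9]
healthy = [0.1, 0.3, 0.6]
for cutoff, sens, fpr in roc_points(diseased, healthy):
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, 1 - specificity {fpr:.2f}")
```

Moving the cutoff up never raises sensitivity and never lowers specificity, which is the trade-off described in the bullets.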

Diagnostic tests example
• ROC curve of PEth to detect a maximum breathalyzer result ≥.1% g/l over 21 days in Mbarara, Uganda
Pagano and Gauvreau, Chapter 6

Bayes’ theorem for diagnostic tests
• Suppose you know from diagnostic testing that
  – The sensitivity of a new rapid HIV antibody test, P(T+|HIV+), is 0.96
  – The specificity of the test, P(T-|HIV-), is 0.99
• You want to know the probability that someone who tests positive with this test is truly infected with HIV: what is P(HIV+|T+)?
• This is called the positive predictive value (PPV) of the test
Pagano and Gauvreau, Chapter 6

Bayes’ theorem for diagnostic tests
• We want to know P(HIV+|T+)
• Instead we know:
  – Sensitivity P(T+|HIV+) and specificity P(T-|HIV-)
  – P(T-|HIV+) = 1 - sensitivity (false negatives)
  – P(T+|HIV-) = 1 - specificity (false positives)
Pagano and Gauvreau, Chapter 6

Bayes’ theorem

• P(A|B) = P(B|A)P(A) / P(B)
• Proof:
  – By the definition of conditional probability:
    P(A|B) = P(A ∩ B)/P(B), so P(A ∩ B) = P(A|B)P(B)
    P(B|A) = P(A ∩ B)/P(A), so P(A ∩ B) = P(B|A)P(A)
  – So P(A|B)P(B) = P(B|A)P(A), and therefore P(A|B) = P(B|A)P(A) / P(B)
Pagano and Gauvreau, Chapter 6

Bayes’ theorem for diagnostic tests
By Bayes’ theorem: P(HIV+|T+) = P(T+|HIV+) P(HIV+) / P(T+)
• P(T+|HIV+) = 0.96 (sensitivity)
• P(HIV+) in sub-Saharan Africa = 0.02
• P(T+|HIV-) = 1 - specificity = 0.01
• P(T+) = P(T+|HIV+) P(HIV+) + P(T+|HIV-) P(HIV-) by the law of total probability
  = 0.96*0.02 + 0.01*0.98 = 0.0192 + 0.0098 = 0.029
• P(HIV+|T+) = 0.96*0.02/(0.0192 + 0.0098) = 0.0192/0.029 = 0.662
Pagano and Gauvreau, Chapter 6

Prior and posterior probability
• The prevalence of HIV was assumed to be 2%
• So before testing, the probability that a randomly selected person is infected with HIV is 0.02: this is the prior probability
• The probability that someone who tests positive has HIV is 0.662: this is the posterior probability
• It incorporates the information gained by doing the test
Pagano and Gauvreau, Chapter 6

Bayes’ theorem for diagnostic tests
What is P(HIV+|T+) in a population in which the HIV prevalence is 0.004?
• P(HIV+|T+) = P(T+|HIV+) P(HIV+) / P(T+), with P(T+|HIV+) = 0.96 and P(HIV+) = 0.004
• P(T+) = P(T+|HIV+) P(HIV+) + P(T+|HIV-) P(HIV-) = 0.96*0.004 + 0.01*0.996 = 0.00384 + 0.00996 = 0.0138
• P(HIV+|T+) = 0.96*0.004/(0.00384 + 0.00996) = 0.00384/0.0138 = 0.278
Pagano and Gauvreau, Chapter 6
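Both HIV calculations follow the same two steps: the law of total probability for P(T+), then Bayes’ theorem. A minimal Python sketch (the function name is illustrative) that reproduces the two positive predictive values from the sensitivity 0.96, specificity 0.99, and the two assumed prevalences:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(D+ | T+) by Bayes' theorem.

    The denominator P(T+) comes from the law of total probability:
    P(T+) = P(T+|D+) P(D+) + P(T+|D-) P(D-), with P(T+|D-) = 1 - specificity.
    """
    p_test_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_test_pos


for prevalence in (0.02, 0.004):
    ppv = positive_predictive_value(0.96, 0.99, prevalence)
    print(prevalence, round(ppv, 3))
# 0.02 0.662
# 0.004 0.278
```

Holding the test fixed, the PPV falls from 0.662 to 0.278 when the prior probability (prevalence) drops from 2% to 0.4%.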

Bayes’ theorem
• Bayes’ theorem allows you to use what you know about the conditional probability of one event given another to work out the inverse conditional probability
  P(A1|B) = P(A1 ∩ B) / P(B)
          = P(B|A1) P(A1) / P(B)
          = P(B|A1) P(A1) / (P(B|A1)P(A1) + P(B|A2)P(A2))
• Remember P(B) = P(B|A1)P(A1) + P(B|A2)P(A2), where A1 and A2 are mutually exclusive and together cover all outcomes (e.g. A2 = Ā1)
Pagano and Gauvreau, Chapter 6

For next time

• Read Pagano and Gauvreau
  – Chapter 6 (review of today’s material)
  – Chapter 7