Biometrics 2007 Lecture 8


László Pótó

From the sample to the population…

Remember: biometrics is about drawing conclusions about the unknown population based on the collected data (the sample).

Typical questions:
- Is a given lab value (of a group of patients) different from the "healthy" value? (What is the expected value for healthy people?)
- Is a measuring tool or process precise enough (pipette, drug content of pills, box of sugar, and so on)?
- Does a complete series of measurements prove that the values are over a certain limit (air or water pollution, …)?

The problem: how to draw a conclusion from x̄ and sx about µ and σ. The measures of the sample (x̄, sx and n) are known, but what about µ and σ? In other words: which population does the sample come from?

Two methods:
- estimation
- hypothesis testing


The confidence interval for µ

For 100 intervals created from samples of n data each as x̄ ± t·sx/√n …

For 16 data, the interval mean ± 2.13·sx/√n contains the expected value of the population with 95% probability.

t values (p = 95%)

n-1     t
2       4.30
5       2.57
8       2.31
10      2.23
15      2.13
20      2.09
50      2.01
1000    1.96

Normal distribution: z = 1.96

[Figure: confidence intervals no. 1 to 100 from repeated samples, drawn over a horizontal axis running from -4 to 5]

Summary: because the intervals are widened (by the t value, which depends on n), 95 of the 100 intervals again contain µ, even though the intervals differ in center (the means are different) and in length (the standard deviations are different).

x̄ ± t·sx/√n is the p% confidence interval of µ (for n = 16 and p = 95%, t = 2.13).

A value inside the interval is a possible µ, a value outside is not (5% error risk).
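The "95 out of 100 intervals" statement can be tried out with a small simulation. The sketch below is only an illustration: the population values (µ = 100, σ = 4) and the number of repetitions are assumptions chosen for the demonstration, while n = 16 and t = 2.13 are the values used in the lecture.

```python
import random
import statistics

# Assumed population parameters (hypothetical, for illustration only)
MU, SIGMA = 100.0, 4.0
N = 16        # sample size used in the lecture
T_95 = 2.13   # t value for n-1 = 15 and p = 95% (lecture table)


def confidence_interval(sample, t_value):
    """Return (low, high) of the interval mean ± t * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    half_width = t_value * statistics.stdev(sample) / n ** 0.5
    return mean - half_width, mean + half_width


def coverage(repetitions, seed=0):
    """Fraction of simulated intervals that contain the true MU."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        low, high = confidence_interval(sample, T_95)
        if low <= MU <= high:
            hits += 1
    return hits / repetitions


fraction = coverage(5000)
print(f"intervals containing mu: {fraction:.1%}")
```

Each simulated interval has a different center and length, yet the fraction that contains µ should stay close to 95%.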

Calculation of the confidence interval

The drug content of pills at a pharmaceutical factory was checked by measuring a sample of 16 pills. The results: n = 16, mean = 102.1 mg, S.D. = 4 mg.

Can the expected value be 100 mg?

The 95% confidence interval (in mg): 102.1 ± 2.13·4/√16 = 102.1 ± 2.13 = (99.97, 104.23) mg

100 mg is inside the interval, so 100 mg is a possible µ (with 95% confidence, or 5% error risk).

Interpretation: if we repeated the experiment 100 times, obtaining 100 datasets with 100 different means and S.D.s, and calculated the 95% CI from each in the above way, then about 95 of the 100 different CIs would contain the real expected value (µ) and only about 5 would not. But note that we cannot know which of these 100 our single CI is: one of the 95 (that "contain" µ) or one of the 5 (that do not).

Let's see the second method for answering this kind of question: hypothesis testing!
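As a quick check of the pill example, the interval can be recomputed in a few lines. This is a minimal sketch: the function name `t_confidence_interval` is made up for the illustration, and the t value is taken from the lecture table rather than computed.

```python
import math


def t_confidence_interval(mean, sd, n, t_value):
    """Return (low, high) of mean ± t * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    half_width = t_value * standard_error
    return mean - half_width, mean + half_width


# Pill example: n = 16, mean = 102.1 mg, S.D. = 4 mg, t(95%, df = 15) = 2.13
low, high = t_confidence_interval(mean=102.1, sd=4.0, n=16, t_value=2.13)

hypothetical_mu = 100.0
is_possible = low <= hypothetical_mu <= high

print(f"95% CI: ({low:.2f}, {high:.2f}) mg")
print("100 mg is a possible mu:", is_possible)
```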

The hypothesis testing – 1

An "everyday life" model

I seem to remember hearing the noise of heavy rain during the night. How can I decide in the morning whether it really rained or it was just a dream?

1. Suppose it did not rain (it was just a dream).
(2.) Decide what I mean by "probable" and by "not probable" (this is more or less obvious here).
3. Estimate how probable the observed facts would be if the hypothesis of step 1 were true (suppose it IS true for now).
4. Decide about the hypothesis ("no rain" in this case):
   a. when the result of step 3 is "not probable", reject it;
   b. when the result of step 3 is "probable", do not reject it.
5. Conclusion.

Checking the method: try the opposite hypothesis in step 1.

The hypothesis testing – 2

Hypothesis testing in biometrics

"The drug content of 16 pills…" example: mean 102.1 mg, S.D. 4 mg. Can the expected value be 100 mg?

1. Suppose that µ = 100 mg is true: there is no significant difference, the difference is just by chance. This is the null hypothesis, H0.
2. Choose the lower end of "probable" as 5%. This "border for decision" is α, so let α = 0.05.
3. If µ = 100 mg, how probable is it that the mean of 16 data differs from it by at least 2.1 mg?
   - As seen last week, the difference between the mean and µ is t·S.E. (here S.E. = 4 mg/√16 = 1 mg), where t follows the t distribution with df = n-1 (here 15).
   - In our case t = 2.1/1 = 2.1 (times the S.E.). On the t15 curve, t = 2.13 would "cut off" 5% of the area (probability), so the probability of a difference of "at least 2.1 times" the S.E. is more than 5%. So p > 0.05 ("probable"). (figure)
4. Decide about the hypothesis ("µ = 100 mg"): because p > α in step 3 ("probable"), do not reject it.
5. Conclusion: the mean is not significantly different from the hypothetical expected value, so µ can be 100 mg.
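The five steps can be followed in code for the pill example. The sketch below uses the table-based decision only (comparing t with the tabulated 2.13); the function name `one_sample_t` is an assumption made for the illustration.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """Distance of the sample mean from mu0, measured in S.E. units."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


T_CRITICAL = 2.13  # lecture table: df = 15, alpha = 0.05

# Step 1: H0 says mu = 100 mg; step 3: how far is the mean from it?
t = one_sample_t(mean=102.1, sd=4.0, n=16, mu0=100.0)

# Step 4: reject H0 only when |t| exceeds the tabulated border
reject_h0 = abs(t) > T_CRITICAL

print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

Since |t| stays below 2.13, the decision agrees with step 4 above: H0 is not rejected.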

The one sample t test

We checked how far the mean is from a hypothetical (H0) expected value, measured in S.E. units ("t" times).

When the difference t is big (the area under the t curve outside of ±t is small, meaning that a difference of at least this size has a small probability if H0 is true), our sample (the facts) speaks against the null hypothesis. See the everyday life model of hypothesis testing: reject the null hypothesis!

When the difference t is small (the area under the t curve outside of ±t is big, meaning that a difference of at least this size has a large probability if H0 is true), our sample (the facts) does not speak against the null hypothesis. See the everyday life model of hypothesis testing: do not reject the null hypothesis!

This is the one sample t test.

The probability (area) can be calculated from t (and n) using the probability density function: by computer as an exact value ("p ="), or from a table as an upper bound ("p <").
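To illustrate the "by computer" route, the area outside (-t, t) can be obtained by integrating the t density numerically. The sketch below uses the standard formula of the Student t density and Simpson's rule; the function names and the number of integration steps are assumptions of this illustration, and a statistics package would normally do this step.

```python
import math


def t_density(x, df):
    """Probability density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Area under the t density outside (-|t|, |t|), by Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_density(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric, so the inside area is twice the area from 0 to |t|
    return 1 - 2 * area_zero_to_t


p = two_sided_p(2.1, 15)  # pill example: t = 2.1, df = 15
print(f"p = {p:.3f}")
```

For the pill example the result is slightly above 0.05, in line with the table-based conclusion p > 0.05.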

The Conf. Intv. and the t test – compare – 1

The confidence interval: the interval x̄ ± t·sx/√n contains the expected value of the population with a probability that depends on the t value. (For 16 data, the interval mean ± 2.13·sx/√n contains the expected value with 95% probability.)

The hypothesis test (one sample t test): for any hypothetical µ (H0), a difference of at least that size (t) between the mean of our actual sample and µ (so: x̄ - µ) could happen with probability p.

The probability is the area outside of the (-t, t) interval under the t density function.

- When t is big (the probability is small, p < α), reject the null hypothesis.
- When t is not big (the probability is not small, p > α), do not reject H0.

What is the case when the µ examined by hypothesis testing
- is inside of the confidence interval, and what if it
- is outside of the confidence interval?

The "inside" of the interval means confidence (95 cases out of 100).
The "outside" of the interval means error risk (5 cases out of 100).

The Conf. Intv. and the t test – compare – 2

The meaning of the x̄ ± t·sx/√n interval (right at the border).

An equivalent meaning is that of the µ ± t·sx/√n interval.

[Figure: t density curve with 95% of the area between -t(95%) and t(95%) and 5% outside; x̄ and µ marked on the axis]

The probability of "at least this difference" is just 5%.

The Conf. Intv. and the t test – compare – 3

For an "outside" hypothetical µ (= "big" distance):
t > t(95%), p < α → reject H0

For an "inside" hypothetical µ (= "small" distance):
t < t(95%), p > α → do not reject H0

[Figure: positions of x̄ and µ for the two cases, with area labels 45% and 55%]
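The correspondence between the two methods can be checked on the pill example (n = 16, mean 102.1 mg, S.D. 4 mg, t(95%) = 2.13). In the sketch below the list of hypothetical µ values is made up for the illustration, and the border itself is counted as "inside" in both functions.

```python
import math

MEAN, SD, N = 102.1, 4.0, 16   # pill example from the lecture
T_95 = 2.13                    # lecture table: df = 15, p = 95%
SE = SD / math.sqrt(N)


def inside_interval(mu0):
    """True when mu0 lies in the 95% confidence interval around the mean."""
    return MEAN - T_95 * SE <= mu0 <= MEAN + T_95 * SE


def t_test_keeps_h0(mu0):
    """True when the one sample t test does not reject H0: mu = mu0."""
    return abs(MEAN - mu0) / SE <= T_95


# Hypothetical values of mu, chosen only to show both cases
candidates = [98.0, 99.9, 100.0, 102.1, 104.0, 104.5, 105.0]
results = [(mu0, inside_interval(mu0), t_test_keeps_h0(mu0)) for mu0 in candidates]

for mu0, inside, kept in results:
    print(f"mu0 = {mu0:6.1f}  inside CI: {inside!s:5}  H0 not rejected: {kept}")
```

Every hypothetical µ that falls inside the interval is kept by the test, and every one outside is rejected.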

The error risk of a decision

For an "outside" hypothetical µ:
t > t(95%), p < α → reject H0
The sample would occur with probability p < α, which is "unlikely", so reject the null hypothesis.
This is a wrong decision with probability p: the risk of the Type 1 error.

For an "inside" hypothetical µ:
t < t(95%) and p > α → do not reject H0
This can also be wrong: Type 2 error.

[Figure: positions of x̄ and µ for the two cases, with area labels 45% and 55%]

How to decrease the error risk(s)?

For the Type 1 error: decrease α
- the "do not reject H0" interval grows
- we would reject H0 less and less often, because the area for "small" p values shrinks
- the type 1 error risk decreases: it is the p itself (p < α)
But meanwhile
- H0 would be accepted more and more often, so
- the risk of type 2 error increases

For the Type 2 error: increase α
- the "do not reject H0" interval shrinks
- we would accept H0 less and less often, because the area for "large" p values shrinks
- the type 2 error risk decreases
But meanwhile
- H0 would be rejected more and more often, so
- the risk of type 1 error increases: it is the p itself (p < α)
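The trade-off between the two error risks can be made visible with a simulation. This is a sketch under stated assumptions: the population S.D. (4), the "true" mean used when H0 is false (102) and the number of repetitions are hypothetical choices, and the critical t values for df = 15 come from a standard t table, not from the lecture.

```python
import random
import statistics

N = 16
MU_H0 = 100.0            # hypothetical expected value under H0
SIGMA = 4.0              # assumed population S.D. (hypothetical)
MU_ALTERNATIVE = 102.0   # assumed true mean when H0 is false (hypothetical)

# Two-sided critical t values for df = 15, taken from a standard t table
T_CRITICAL = {0.01: 2.947, 0.05: 2.131, 0.10: 1.753}


def abs_t_values(true_mu, repetitions, rng):
    """|t| against MU_H0 for samples drawn from a normal population with mean true_mu."""
    values = []
    for _ in range(repetitions):
        sample = [rng.gauss(true_mu, SIGMA) for _ in range(N)]
        se = statistics.stdev(sample) / N ** 0.5
        values.append(abs(statistics.mean(sample) - MU_H0) / se)
    return values


rng = random.Random(1)
t_when_h0_true = abs_t_values(MU_H0, 4000, rng)
t_when_h0_false = abs_t_values(MU_ALTERNATIVE, 4000, rng)

error_rates = {}
for alpha, t_crit in sorted(T_CRITICAL.items()):
    # Type 1: H0 is true but rejected; Type 2: H0 is false but not rejected
    type1 = sum(t > t_crit for t in t_when_h0_true) / len(t_when_h0_true)
    type2 = sum(t <= t_crit for t in t_when_h0_false) / len(t_when_h0_false)
    error_rates[alpha] = (type1, type2)
    print(f"alpha = {alpha:.2f}  type 1: {type1:.3f}  type 2: {type2:.3f}")
```

As α grows, the simulated type 1 rate follows it upward while the type 2 rate falls. The size of the type 2 risk depends on the assumed alternative mean.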

What is the best α?

Example: the case of a murder trial. H0: innocent
1. "Just not to imprison an innocent!"
   prefer H0 (decrease α) → increasing risk of type 2 error (accept a wrong null hypothesis)
2. "Just not to let a murderer go free!"
   prefer rejecting H0 (increase α) → increasing risk of type 1 error (reject a good null hypothesis)

Example: is the new (say, the 35th of this kind) drug effective? H0: no
1. "Just not to produce an ineffective product!" (money!)
   prefer H0 (decrease α: α = 0.01) → increasing risk of type 2 error (accept a wrong null hypothesis)

Example: is there some side effect of the new drug? H0: no
2. "Just not to have a hidden side effect!"
   prefer rejecting H0 (increase α: α = 0.1) → increasing risk of type 1 error (reject a good null hypothesis)

α = 0.05 is a "golden middle" when there are no special preferences.

What was it today?

The hypothesis testing
- H0 is the "no effect" case (only "blind chance" is acting)

The confidence interval and the hypothesis testing (for µ)
1. reject every hypothetical "outside" µ, but not the "inside" ones
2. both decisions have an error risk: the risks of the Type 1 and Type 2 errors
3. when one is decreased, the other increases (border: α)
4. the choice of this border is not arbitrary: depending on the problem, increase or decrease it
   - 5% is a good "middle"

Coming next: comparing the means of two independent samples (which one is the more effective drug, …?)

From the textbooks:

Belágyi: pp. 50-58
Moore: pp. 365-407

Thank you for your attention!

What have we learned so far?

Basic terms
- Why biometrics (importance): decisions based on probability
- The interpretation of probability, variables
- The data: the population and the sample
- Description and measures of the sample: histogram, mean, S.D.
- The population: distributions, density function, parameters µ and σ
- The normal distribution

Applications: is there a difference between a sample mean and an expected value? (Which population does the sample come from?)

Two methods
- The confidence interval for µ (estimation)
- The one sample t test (hypothesis testing)
