Exercises for the Course “Methodology of Scientific Research”

prepared by
Dr. Amke Caliebe
Dr. Sandra Freitag-Wolf
Dipl.Math. Elfriede Fritzer
Dipl.Math. Arne Jochens
Dipl.Inf. Jürgen Hedderich
Dipl.Math. Oliver Vollrath
Prof. Dr. Michael Krawczak

Institut für Medizinische Informatik und Statistik
Medizinische Fakultät der Christian-Albrechts-Universität Kiel
Universitäts-Klinikum Schleswig-Holstein, Campus Kiel
Brunswiker Strasse 10
24105 Kiel



Module 1 (Descriptive Statistics)

1. (measures of location and dispersion, box-whisker plot) In the course of a population health study, the body mass index (BMI) of 100 randomly chosen males aged 18 to 65 years was determined. The results are given in ascending order in Table 1.1.

Table 1.1: BMI (kg/m²) of 100 randomly chosen males, aged 18 to 65 years.

19.02 19.85 20.09 20.27 20.31 20.54 20.54 20.57 20.75 21.00
21.00 21.14 21.25 21.25 21.29 21.32 21.43 21.49 21.80 21.87
22.16 22.32 22.73 22.81 22.85 22.94 23.01 23.14 23.19 23.23
23.24 23.30 23.31 23.33 23.35 23.50 23.51 23.56 23.59 23.60
23.61 23.62 23.71 24.47 24.49 24.49 24.53 24.60 24.64 24.70
24.78 24.85 25.05 25.07 25.13 25.24 25.36 25.48 25.51 25.74
25.77 25.80 25.80 25.99 26.08 26.12 26.23 26.24 26.26 26.42
26.60 26.75 26.82 27.18 27.27 27.29 27.50 27.52 27.56 27.84
28.11 28.27 28.39 28.94 29.12 29.19 29.45 29.66 30.43 31.10
31.75 31.87 32.14 32.71 32.94 33.17 34.99 35.28 39.03 41.85

a. From the 100 probands, 15 candidates for an embedded intervention study were selected at random. Their BMI values are highlighted in grey in Table 1.1. Calculate the mean, standard deviation, median and inter-quartile range for this subsample.
b. Prepare a box-whisker plot of the complete data set in Table 1.1.
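The grey highlighting that marks the 15-value subsample did not survive in this copy, so the following minimal Python sketch applies the descriptive measures to the full data set; `describe()` can be called on the subsample list once it is identified. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is only one of several quartile conventions (hand calculations may differ slightly), and the whiskers follow the common Tukey rule of 1.5 × IQR, which is an assumption rather than a rule stated in the exercise.

```python
import statistics

# BMI values (kg/m²) from Table 1.1, in ascending order.
BMI = [
    19.02, 19.85, 20.09, 20.27, 20.31, 20.54, 20.54, 20.57, 20.75, 21.00,
    21.00, 21.14, 21.25, 21.25, 21.29, 21.32, 21.43, 21.49, 21.80, 21.87,
    22.16, 22.32, 22.73, 22.81, 22.85, 22.94, 23.01, 23.14, 23.19, 23.23,
    23.24, 23.30, 23.31, 23.33, 23.35, 23.50, 23.51, 23.56, 23.59, 23.60,
    23.61, 23.62, 23.71, 24.47, 24.49, 24.49, 24.53, 24.60, 24.64, 24.70,
    24.78, 24.85, 25.05, 25.07, 25.13, 25.24, 25.36, 25.48, 25.51, 25.74,
    25.77, 25.80, 25.80, 25.99, 26.08, 26.12, 26.23, 26.24, 26.26, 26.42,
    26.60, 26.75, 26.82, 27.18, 27.27, 27.29, 27.50, 27.52, 27.56, 27.84,
    28.11, 28.27, 28.39, 28.94, 29.12, 29.19, 29.45, 29.66, 30.43, 31.10,
    31.75, 31.87, 32.14, 32.71, 32.94, 33.17, 34.99, 35.28, 39.03, 41.85,
]

def describe(values):
    """Mean, SD (n-1 denominator), median, quartiles and IQR of a sample."""
    # "inclusive" is one quartile convention among several.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

def box_whisker(values):
    """Five-number box plot summary with Tukey whiskers (assumed 1.5 x IQR)."""
    d = describe(values)
    low_fence = d["q1"] - 1.5 * d["iqr"]
    high_fence = d["q3"] + 1.5 * d["iqr"]
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": d["q1"],
        "median": d["median"],
        "q3": d["q3"],
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }

summary = box_whisker(BMI)
```

With these conventions the box spans roughly 22.92 to 27.28 kg/m², and the four largest values lie beyond the upper whisker and would be drawn as individual points.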


Module 2 (Probability Theory)

1. (discrete random variable, probability function, expected value) Approximately 5% of Germans are so-called “non-responders” to hepatitis B vaccination, i.e. they do not become immune after the usual three-step treatment (injections at 0, 1 and 6 months).
a. What is the probability function of the random variable “number of non-responders among 5 treated individuals” (“X” for short)? Calculate the expected value of X.
b. Calculate the probability of finding at most one non-responder among 5 treated individuals. Calculate the probability of finding at least one non-responder among 5 treated individuals.

2. (continuous random variable, density function, normal distribution) The body height of male Kiel students can be assumed to follow a normal distribution with an expected value of 183 cm and a variance of 36 cm².
a. Draw a sketch of the density function of the random variable “body height of a male Kiel student” (“Y” for short). How can the expected value and variance of Y be interpreted in relation to its density function?
b. Calculate the probability that a randomly chosen male Kiel student is at most 189 cm tall. To solve this problem, you may want to transform Y into a random variable Z with a standard normal distribution (the distribution function of the N(0,1) distribution is given in Table A.1).
c. Calculate the probability that a randomly chosen male Kiel student is between 184.5 cm and 192 cm tall.
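Both exercises reduce to short formulas: X follows a binomial distribution Bin(5, 0.05), and Y is standardized as Z = (Y - 183)/6. A minimal Python sketch, using the error function in place of a lookup in Table A.1:

```python
from math import comb, erf, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exercise 1: 5 vaccinees, 5% non-responder probability.
n, p = 5, 0.05
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
expected = sum(k * pk for k, pk in enumerate(pmf))  # equals n * p
p_at_most_one = pmf[0] + pmf[1]
p_at_least_one = 1 - pmf[0]

def phi(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Exercise 2: variance 36 cm² corresponds to a standard deviation of 6 cm.
mu, sigma = 183.0, 6.0
p_le_189 = phi((189 - mu) / sigma)  # Φ(1.00)
p_between = phi((192 - mu) / sigma) - phi((184.5 - mu) / sigma)  # Φ(1.50) - Φ(0.25)
```

Rounded to four decimals, `p_le_189` and `p_between` agree with the Table A.1 values Φ(1.00) = 0.8413 and Φ(1.50) - Φ(0.25) = 0.9332 - 0.5987.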


Module 3 (Parameter Estimation)

1. (binomial distribution, maximum likelihood estimation) In the course of one year, 1631 children were born in the gynecology clinic of a German university hospital (see Table 3.1).

Table 3.1: Annual number of newborns in a German university hospital

Sex      Jan.-Mar.   Apr.-Jun.   Jul.-Sep.   Oct.-Dec.
female   202         184         213         170
male     206         205         224         227

a. Which parameters can sensibly be estimated from the data in Table 3.1?
b. Which estimator π̂ is usually used to estimate, from these data, the probability π that a given newborn is male? Why is π̂ a sensible estimator?
c. Estimate π by means of π̂ for each quarter, each half-year and the whole year.

2. (normal distribution, point estimation, confidence interval) For a sports medicine study among male Kiel students, 100 volunteers were recruited and subjected to various measurements. Table 3.2 contains the body height data obtained in the course of these examinations.

Table 3.2: Body height (in cm) of 100 male Kiel students

188.4 171.5 187.3 181.7 176.1 170.0 168.3 190.8 188.2 191.3
183.3 177.9 184.7 183.1 177.4 182.0 194.8 182.0 186.7 180.1
183.4 175.2 187.9 179.9 185.1 183.4 186.8 178.6 191.4 195.8
182.7 173.5 180.4 189.7 180.3 183.1 184.6 187.6 204.8 182.1
182.0 190.1 191.1 180.8 182.3 175.0 201.6 200.3 181.1 178.8
192.4 181.1 181.6 196.9 186.7 183.0 191.9 197.3 182.3 178.7
188.2 186.4 176.6 171.3 182.0 180.4 166.9 179.0 177.1 174.8
189.3 173.4 184.7 182.5 171.5 183.0 178.1 186.5 181.8 192.6
189.8 181.0 181.4 187.7 186.3 184.5 171.5 172.7 183.6 186.2
184.6 176.7 186.1 182.3 182.6 173.2 189.2 179.6 180.3 182.5

a. Which estimators should be used to sensibly estimate the expected value µ and the standard deviation σ of the (normally distributed) body height of male Kiel students from the data in Table 3.2?
b. Estimate µ and σ from the sample data and calculate the 95% and 99% confidence intervals for µ (Cues: the sum of all values in Table 3.2 equals 18324.8; the sum of all squares equals 3362986.0; t0.975,99 = 1.99; t0.995,99 = 2.63). Also note that

$$\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}x_i^2 - n\cdot\bar{x}^2$$

c. A subsample of 10 students extracted for another study corresponds to the first line of Table 3.2. Calculate a 95% and a 99% confidence interval for µ based on the estimate x̄ = 181.36 obtained from this subsample (Cues: s = 9.09 for the subsample; t0.975,9 = 2.26; t0.995,9 = 3.25).
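The Module 3 computations can be sketched in a few lines of Python. For exercise 1 the sketch uses the observed fraction of boys as π̂; for exercise 2 it recovers x̄ and s from the cue sums via the identity above and builds two-sided t intervals x̄ ± t·s/√n with the t quantiles given in the cues (no distribution library is assumed).

```python
from math import sqrt

# Exercise 1: newborns per quarter (Table 3.1).
FEMALE = [202, 184, 213, 170]
MALE = [206, 205, 224, 227]

def pi_hat(male, female):
    """ML estimate of P(newborn is male): the observed fraction of boys."""
    return male / (male + female)

quarters = [pi_hat(m, f) for m, f in zip(MALE, FEMALE)]
halves = [pi_hat(sum(MALE[i:i + 2]), sum(FEMALE[i:i + 2])) for i in (0, 2)]
year = pi_hat(sum(MALE), sum(FEMALE))

# Exercise 2: mean and SD from the cue sums, using
# sum((x - xbar)^2) = sum(x^2) - n * xbar^2.
def mean_sd_from_sums(n, sum_x, sum_x2):
    xbar = sum_x / n
    s = sqrt((sum_x2 - n * xbar**2) / (n - 1))
    return xbar, s

def t_interval(xbar, s, n, t_quantile):
    """Two-sided CI xbar +/- t * s / sqrt(n); t is taken from the cues."""
    half_width = t_quantile * s / sqrt(n)
    return xbar - half_width, xbar + half_width

xbar, s = mean_sd_from_sums(100, 18324.8, 3362986.0)
ci95 = t_interval(xbar, s, 100, 1.99)
ci99 = t_interval(xbar, s, 100, 2.63)

# Subsample (c): first line of Table 3.2.
SUB = [188.4, 171.5, 187.3, 181.7, 176.1, 170.0, 168.3, 190.8, 188.2, 191.3]
sub_mean = sum(SUB) / len(SUB)
sub_s = sqrt(sum((x - sub_mean) ** 2 for x in SUB) / (len(SUB) - 1))
sub_ci95 = t_interval(sub_mean, sub_s, len(SUB), 2.26)
sub_ci99 = t_interval(sub_mean, sub_s, len(SUB), 3.25)
```

The recomputed subsample mean and standard deviation reproduce the cue values x̄ = 181.36 and s ≈ 9.09, which is a useful check that the first line of the table was read correctly.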


Module 4 (Epidemiology)

1. (point prevalence, period prevalence) Each year, an unusually high number of adults on the Pacific island of Gose contract Lusu, a neurodegenerative disease that is otherwise extremely rare. Since 1995, the WHO has therefore conducted an annual census to closely monitor the course of the disease on the island (Table 4.1).

Table 4.1: Epidemiological data on the occurrence of Lusu among the adult inhabitants of the Pacific island of Gose (1995-1999)

Date       Population size (≥18 years)   Total number of patients
1.1.1995   3102                          86
1.1.1996   3118                          88
1.1.1997   3129                          91
1.1.1998   3107                          92
1.1.1999   3091                          83

a. Estimate the point prevalence of the disease for each census time point.
b. Estimate the period prevalence for the whole survey period (1.1.1995-1.1.1999). Assume that a total of 101 patients lived on Gose during this period.

2. (incidence proportion, odds ratio, relative risk) A retrospective analysis of 58 hospitalized leukaemia cases revealed that seven of them had worked in a nearby synthetic rubber factory for more than a year. In a comparable control group of 1662 patients who had suffered a leisure accident, only 11 employees of the same factory were identified. Therefore, 8728 synthetic rubber factory workers from all over Germany were prospectively monitored for 5 years, and the number of new leukaemia cases in this group was compared to that observed among 11214 metal workers over the same period. In the first cohort, 17 new cases were identified, compared to only one case in the second cohort.
a. Compare the two epidemiological studies with regard to their design and evidential power, using the criteria in Table A.3 (see Appendix).
b. Estimate the effect measures discussed in (a) and calculate a 95% confidence interval for each of them.
c. From the relative risk, calculate the attributable risk (AR) and the population attributable risk (PAR), assuming that 0.05% of the German population work in a synthetic rubber factory.
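A minimal Python sketch of the Module 4 computations follows. Several choices here are assumptions, not prescriptions from the exercise: the period prevalence uses the mean census population as denominator; the odds ratio interval uses Woolf's log method and the relative risk interval the usual log-scale (Katz) method; AR is taken as the attributable fraction among the exposed, (RR - 1)/RR, and PAR as Levin's formula p(RR - 1)/(1 + p(RR - 1)). With only one case in the metal-worker cohort, the relative risk interval is very wide.

```python
from math import exp, log, sqrt

# Exercise 1: (adult population, patients) per census date, Table 4.1.
CENSUS = {
    "1.1.1995": (3102, 86),
    "1.1.1996": (3118, 88),
    "1.1.1997": (3129, 91),
    "1.1.1998": (3107, 92),
    "1.1.1999": (3091, 83),
}
point_prev = {date: k / n for date, (n, k) in CENSUS.items()}

# Assumption: mean census population as the period-prevalence denominator.
mean_pop = sum(n for n, _ in CENSUS.values()) / len(CENSUS)
period_prev = 101 / mean_pop

Z = 1.96  # 97.5% quantile of the standard normal distribution

def odds_ratio_ci(a, b, c, d):
    """OR = ad/bc with Woolf (log-scale) 95% CI.
    a, b: exposed/unexposed cases; c, d: exposed/unexposed controls."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - Z * se), exp(log(or_) + Z * se)

def relative_risk_ci(x1, n1, x0, n0):
    """RR = (x1/n1) / (x0/n0) with log-scale 95% CI."""
    rr = (x1 / n1) / (x0 / n0)
    se = sqrt(1 / x1 - 1 / n1 + 1 / x0 - 1 / n0)
    return rr, exp(log(rr) - Z * se), exp(log(rr) + Z * se)

# Case-control study: 7 of 58 cases and 11 of 1662 controls were exposed.
or_, or_lo, or_hi = odds_ratio_ci(7, 58 - 7, 11, 1662 - 11)
# Cohort study: 17 of 8728 rubber workers vs 1 of 11214 metal workers.
rr, rr_lo, rr_hi = relative_risk_ci(17, 8728, 1, 11214)

# Assumed definitions of AR (among the exposed) and PAR (Levin).
p_exposed = 0.0005
ar = (rr - 1) / rr
par = p_exposed * (rr - 1) / (1 + p_exposed * (rr - 1))
```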


Module 5 (Diagnostic Testing)

1. (conditional probability, Bayes theorem) HIV infections can be detected by means of the ELISA test. This procedure, which targets antibodies against HIV in the proband’s blood but not the virus itself, has a sensitivity and a specificity of 99.5% each. According to the Robert Koch Institute, the prevalence of HIV infection equals 0.01% in the “low risk” part of the German population (i.e., heterosexual persons who do not abuse drugs), compared to 15% among intravenous drug abusers.
a. Calculate the positive and negative predictive values of the ELISA test for both subpopulations.
b. What would be the positive predictive value of a second ELISA test carried out for a low risk proband with a positive result in the first test?

2. (sensitivity, specificity) Gestational diabetes (GD) can be diagnosed by two alternative procedures that are less cumbersome than the usual oral 100 g glucose tolerance test (the current “gold standard” for GD diagnostics): the 50 g glucose challenge test (GCT) and the measurement of the fasting plasma glucose concentration (FPGC). In the late 1990s, the evidential value and the reliability of the two procedures were compared in a prospective study by Perucchini et al. (BMJ 319:812, 1999). Of a total of 520 randomly selected pregnant women, 53 were diagnosed with gestational diabetes using the oral 100 g glucose tolerance test. Prior to this assessment, the same women had also undergone GCT and FPGC measurement (see Table 5.1).

Table 5.1: Comparison of GCT and FPGC measurement as diagnostic tests for gestational diabetes (GD) in 520 Canadian pregnancies (Perucchini et al. 1999)

        GCT                      FPGC
GD      Positive   Negative      Positive   Negative
Yes     36         17            43         10
No      84         383           112        355

a. From the data in Table 5.1, estimate the sensitivity and specificity of both diagnostic procedures.
b. From a medical professional perspective, which procedure would be more suitable for routine GD screening? Why?
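Both Module 5 exercises rest on the same 2×2 logic. The following Python sketch applies Bayes' theorem for the predictive values and reads sensitivity and specificity off Table 5.1. For the second ELISA test it treats the first test's PPV as the new prior probability, which assumes that the two test results are conditionally independent given the true infection status.

```python
def predictive_values(sens, spec, prev):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes)."""
    tp = sens * prev              # P(test +, infected)
    fp = (1 - spec) * (1 - prev)  # P(test +, not infected)
    tn = spec * (1 - prev)        # P(test -, not infected)
    fn = (1 - sens) * prev        # P(test -, infected)
    return tp / (tp + fp), tn / (tn + fn)

SENS = SPEC = 0.995
ppv_low, npv_low = predictive_values(SENS, SPEC, 0.0001)   # low risk group
ppv_ivdu, npv_ivdu = predictive_values(SENS, SPEC, 0.15)   # IV drug abusers
# Second test: PPV of the first test as prior (assumes conditional independence).
ppv_second, _ = predictive_values(SENS, SPEC, ppv_low)

def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)

gct = sens_spec(tp=36, fn=17, fp=84, tn=383)
fpgc = sens_spec(tp=43, fn=10, fp=112, tn=355)
```

In the low risk group, fewer than 2% of positive first tests are true positives, whereas a second positive result raises the PPV to roughly 80%.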


Module 6 (Statistical Testing I)

1. (hypotheses, significance level, power, one-sample t-test) Blood naturally contains lipids and lipoproteins. If their concentration is too high, however, lipids or lipoproteins may accumulate at the artery walls. If these so-called plaques rupture, they can enter the circulatory system and cause infarctions such as heart attack or stroke. In the adult female population, the concentration of lipids and lipoproteins in blood can be assumed to follow a normal distribution with an expected value of 5.1 g/l. To investigate the hypothesis that this concentration is higher during pregnancy, it was measured in six pregnant women (Table 6.1).

Table 6.1: Concentration of lipids and lipoproteins in the blood of six pregnant women (in g/l)

Proband   Concentration
1         5.24
2         5.33
3         5.10
4         5.07
5         5.85
6         5.59

a. Formulate the scientific question outlined above as a statistical decision problem. Characterize the null hypothesis (H0) and the alternative hypothesis (HA) and determine a sensible significance level.
b. Which statistical test can be used to decide between H0 and HA, assuming that the concentration of lipids and lipoproteins follows a normal distribution in pregnant women as well?
c. What are the critical values of the test statistic identified in (b) for a significance level of 5%, 1% or 0.1%, respectively (see Table A.2)?
d. Calculate the test statistic identified in (b) for the data in Table 6.1 and interpret the result.
e. With a sufficiently precise tabulation of the t distribution, it would be possible to determine a p value for the test statistic calculated in (d). Nevertheless, a qualitative conclusion about the size of the p value can be drawn from the results of (d) and the t quantiles given in Table A.2 alone. What is this conclusion?


Module 7 (Statistical Testing II)

1. (sample size calculation, significance level, power, two-sample t-test) The influence of a daily dose of 250 mg M-Oglitin on body weight was assessed in a phase II study of 20 adipose probands. Within the first 2 weeks, the probands achieved a mean weight reduction of 4.3 kg, with an estimated standard deviation of 2.6 kg.
a. How can the results of this study be interpreted? What consequences does this interpretation have for choosing a sensible research question for a follow-up study?
b. How many probands must be included in a placebo-controlled phase III study to verify the observed effect with 80% power at the 5% significance level? To this end, assume a placebo effect of 3.2 kg weight reduction with the same standard deviation (i.e., 2.6 kg).
c. Keeping the power at 80%, how many probands would be required if the significance level were decreased to 1%? Keeping the significance level at 5%, how many probands would be required if the power were increased to 90%?

2. (χ² test, multiple testing) A prospective, double-blind, placebo-controlled study was carried out to determine the efficacy of the drug Bulliforton for the treatment of postprandial digestion problems in type II diabetes patients. The primary endpoint (EP) of the study was a reduction in the UADS (“upper abdominal discomfort severity”) score after 4 weeks (UADS4) by at least 200 points relative to baseline (UADS0). The two secondary EPs were (i) a reduction in the UADS score by at least 200 points after 2 weeks (UADS2) and (ii) a UADS score of less than 150 points after 4 weeks. The results of the study are summarized in Table 7.1.

Table 7.1: Results of a phase III study on the efficacy of Bulliforton as a treatment of postprandial digestion problems in type II diabetes patients

                     primary EP            secondary EPs
Treatment   Total    UADS4 ≤ UADS0-200     UADS2 ≤ UADS0-200     UADS4 ≤ 150
Verum       493      148                   128                   98
Placebo     487      122                   101                   71

a. Formulate the scientific question of the Bulliforton study as a statistical decision problem regarding the primary EP. Which statistical test can sensibly be used for decision making in this instance?
b. Transform the primary EP data of Table 7.1 into a 2×2 table and calculate the test statistic identified in (a).
c. For the secondary EPs, calculation of the test statistic identified in (a) yielded the following results: 3.734 (UADS2 ≤ UADS0-200) and 4.821 (UADS4 ≤ 150). Assuming a 5% significance level, how would you interpret these results and the result under (b)? Cue: χ²0.95,1 = 3.841
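A minimal Python sketch for both Module 7 exercises. The sample size uses the common normal-approximation formula for a two-sided comparison of two means, n per group = 2(z1-α/2 + z1-β)²σ²/δ², with δ = 4.3 - 3.2 kg. Whether the course intends a one- or two-sided test is an assumption here, and a t-based calculation would add a few probands. The χ² statistic is Pearson's statistic for a 2×2 table without continuity correction, and its p value uses the fact that, for 1 degree of freedom, the upper tail equals erfc(√(χ²/2)).

```python
from math import ceil, erfc, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf  # standard normal quantile function

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm (two-sided, two-sample)."""
    z_sum = Z(1 - alpha / 2) + Z(power)
    return ceil(2 * (z_sum * sigma / delta) ** 2)

DELTA = 4.3 - 3.2  # verum minus placebo weight reduction (kg)
SIGMA = 2.6
n_base = n_per_group(DELTA, SIGMA)                  # alpha = 5%, power 80%
n_alpha1 = n_per_group(DELTA, SIGMA, alpha=0.01)    # alpha = 1%, power 80%
n_power90 = n_per_group(DELTA, SIGMA, power=0.90)   # alpha = 5%, power 90%

def chi2_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]] without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Primary EP: responders / non-responders under verum and placebo.
chi2 = chi2_2x2(148, 493 - 148, 122, 487 - 122)
p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
```

Under these assumptions the sketch yields 88, 131 and 118 probands per group, respectively, and χ² ≈ 3.03 for the primary EP, below the cue value 3.841.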


Module 8 (Correlation and Regression)

1. (linear regression, Pearson and Spearman correlation coefficient) Using a laser polarimeter, the mean thickness (in µm) of the retinal nerve fibre layer (RNFL) of 10 volunteers was measured and related to their mean visual field sensitivity (VFS; in dB) (see Table 8.1). The study aimed to model the functional relationship between the two quantities. Such a mathematical model would be highly relevant in glaucoma diagnostics because a measurable destruction of the nerve fibre layer appears to precede the loss of retinal function by a considerable time (a phenomenon known as “functional reserve”).

Table 8.1: Mean thickness of the retinal nerve fibre layer (RNFL) and mean visual field sensitivity (VFS) in 10 volunteers

Proband   RNFL (µm)   VFS (dB)
1         28          10
2         32          11
3         36          18
4         37          26
5         39          31
6         42          32
7         45          33
8         49          34
9         59          35
10        62          33

a. From the data in Table 8.1, calculate the intercept and slope of the least squares regression line, the Pearson correlation coefficient rXY, and the coefficient of determination R². Cues:

$$\sum_{i=1}^{10}(x_i-\bar{x})^2 = 1104.9,\qquad \sum_{i=1}^{10}(y_i-\bar{y})^2 = 848.1,\qquad \sum_{i=1}^{10}(x_i-\bar{x})\cdot(y_i-\bar{y}) = 774.3$$

For the calculation of sX, sY and sXY, see footnote 1.
b. Is the Pearson correlation coefficient rXY calculated in (a) significantly different from zero?
c. Prepare a scatter plot of the data in Table 8.1 and include the least squares regression line. How would you interpret the result of the linear regression analysis?
d. Calculate the Spearman correlation coefficient ρXY for the data in Table 8.1 and compare it to the Pearson correlation coefficient rXY from (a). How can the difference between ρXY and rXY be explained?

1 For a sample of paired observations (x1,y1),…,(xn,yn), we have

$$s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \qquad\text{and}\qquad s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})\cdot(y_i-\bar{y})$$
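A minimal Python sketch of the Module 8 quantities. It recomputes the centred sums given as cues, derives slope b = Sxy/Sxx, intercept a = ȳ - b·x̄, rXY = Sxy/√(Sxx·Syy) and R² = rXY², and tests rXY against zero with t = r√(n-2)/√(1-r²) on n - 2 degrees of freedom. Spearman's ρ is computed as the Pearson correlation of the ranks, with the two tied VFS values (33 dB) receiving their average rank; this mid-rank convention is an assumption, and the shortcut formula 1 - 6Σd²/(n(n²-1)) gives a slightly different value when ties are present.

```python
from math import sqrt

RNFL = [28, 32, 36, 37, 39, 42, 45, 49, 59, 62]  # x, µm (Table 8.1)
VFS = [10, 11, 18, 26, 31, 32, 33, 34, 35, 33]   # y, dB

def centred_sums(x, y):
    """Means plus sums of squares and cross-products about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy

mx, my, sxx, syy, sxy = centred_sums(RNFL, VFS)
slope = sxy / sxx
intercept = my - slope * mx
r = sxy / sqrt(sxx * syy)
r2 = r ** 2
n = len(RNFL)
t_r = r * sqrt(n - 2) / sqrt(1 - r2)  # compare with t_{0.975, 8}

def midranks(values):
    """Ranks 1..n; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average position
        i = j + 1
    return ranks

# Spearman's rho as the Pearson correlation of the mid-ranks.
_, _, rxx, ryy, rxy = centred_sums(midranks(RNFL), midranks(VFS))
rho = rxy / sqrt(rxx * ryy)
```

The sketch gives rXY ≈ 0.80 but ρXY ≈ 0.95: the ranks only reflect the monotone ordering, so the levelling-off of VFS at large RNFL values lowers rXY more than ρXY.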


Module 9 (Statistical Modeling)

1. (multiple linear regression) A representative sample of 100 workers from a cadmium-processing chemical plant was assessed for a possible relationship between vital lung capacity (in liters) and job tenure (in years). However, age (in years) is also known to influence vital lung capacity. Table 9.1 summarizes the results of two simple regression analyses and one multiple regression analysis of these data.

Table 9.1: Regression coefficients b from simple and multiple linear regression analyses of vital lung capacity, job tenure and age in 100 chemical plant workers

                               Simple analyses                        Multiple analysis
Parameter                      b (p value)         b (p value)        b (p value)
Job tenure                     -0.048 (<0.001)     ---                -0.006 (0.452)
Age                            ---                 -0.051 (<0.001)    -0.046 (<0.001)
Intercept                      6.020 (<0.001)      7.023 (<0.001)     6.924 (<0.001)
Coefficient of determination   0.762               0.822              0.823

a. Formulate the model equations arising from the results of the simple and multiple regression analyses summarized in Table 9.1.
b. How can these results be interpreted (see also Figure A.4)?

2. (logistic regression, odds ratio) A retrospective study was carried out in 244 Dutch males, aged 55 to 65 years, who were suspected to have prostate cancer because their PSA (“prostate-specific antigen”) value fell into the diagnostic “grey zone” of 3-10 µg/l. In addition to the ratio (F:T) of free to total PSA, logistic regression analysis identified a positive digital rectal examination (DRE) result and a positive family history (FH) of the disease as significant risk factors for prostate cancer (Table 9.2).

Table 9.2: Logistic regression analysis of potential risk factors for prostate cancer in 244 Dutch males (55-65 years of age)

Parameter   Regression coefficient   95% CI              P value
Intercept   -0.169                   -.-                 -.-
F:T         -10.197                  (-18.932, -3.053)   0.0054
DRE (+)     1.643                    (0.752, 2.559)      0.0004
FH (+)      1.076                    (0.012, 2.131)      0.0437

a. How can the results of the regression analysis be interpreted? To answer this question, transform the regression coefficients of the two dichotomous explanatory variables DRE and FH into odds ratios.
b. Calculate the prostate cancer risk of a male with an F:T ratio of 0.07, a positive DRE result and a positive FH. Compare the result to the risk of a male with F:T = 0.35, a negative DRE result and a negative FH.
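The fitted models in Tables 9.1 and 9.2 can be written directly as functions. In this Python sketch, the three linear models use the tabulated coefficients only. For the logistic model, the odds ratio of a dichotomous variable is exp(b) (its 95% CI is obtained by exponentiating the CI limits), and the predicted risk is 1/(1 + exp(-η)) with the linear predictor η. The function and variable names are illustrative.

```python
from math import exp

# Exercise 1: fitted model equations from Table 9.1.
def vc_tenure(tenure):
    """Simple regression of vital lung capacity (l) on job tenure (years)."""
    return 6.020 - 0.048 * tenure

def vc_age(age):
    """Simple regression of vital lung capacity (l) on age (years)."""
    return 7.023 - 0.051 * age

def vc_multiple(tenure, age):
    """Multiple regression on job tenure and age."""
    return 6.924 - 0.006 * tenure - 0.046 * age

# Exercise 2: logistic model from Table 9.2.
COEF = {"intercept": -0.169, "ft": -10.197, "dre": 1.643, "fh": 1.076}
CI = {"dre": (0.752, 2.559), "fh": (0.012, 2.131)}

# Odds ratio and 95% CI for each dichotomous variable: exp(coefficient).
odds_ratios = {k: (exp(COEF[k]), exp(CI[k][0]), exp(CI[k][1])) for k in CI}

def cancer_risk(ft, dre, fh):
    """Predicted probability 1 / (1 + exp(-eta)); dre and fh are 0 or 1."""
    eta = (COEF["intercept"] + COEF["ft"] * ft
           + COEF["dre"] * dre + COEF["fh"] * fh)
    return 1 / (1 + exp(-eta))

risk_high = cancer_risk(ft=0.07, dre=1, fh=1)
risk_low = cancer_risk(ft=0.35, dre=0, fh=0)
```

For a hypothetical 45-year-old worker with 20 years of tenure, `vc_multiple(20, 45)` returns 4.734 l. The two men in (b) have predicted risks of about 86% and 2.3%, respectively.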


Module 10 (Survival Analysis)

1. (Kaplan-Meier estimate, log-rank test) In a clinical study, newly diagnosed cancer patients were randomly assigned to one of two chemotherapies, C1 or C2, with 10 patients per therapy. The question was whether post-therapeutic survival under C2 differs significantly from that under C1. The results are given in Table 10.1.

Table 10.1: Survival (in days) of cancer patients after chemotherapy

C1   4       18(+)   55   66(+)    90    101      148   207(+)   283   441(+)
C2   26(+)   70      93   105(+)   193   229(+)   242   455(+)   518   595

(+): right-censored observation

a. In the following table, fill in all information necessary to estimate the Kaplan-Meier survival function for therapy C2.

ti    ni   di   P̂(T>ti-1)   P̂(T>ti | T>ti-1)   P̂(T>ti)
70    9    1    1.000       0.889              0.889
93    8    1    0.889       0.875              0.778
…     …    …    …           …                  …
…     …    …    …           …                  …
…     …    …    …           …                  …
…     …    …    …           …                  …

b. Draw a sketch of the estimated survival function (i.e., a “Kaplan-Meier curve”) for therapy C2 and mark all censored observations. What would be the qualitative difference between this Kaplan-Meier curve and an analogous curve for therapy C1?
c. Complete the following table by including all information necessary to calculate a log-rank statistic.

ti    nC1,i   dC1,i   nC2,i   dC2,i   eC1,i   eC2,i
4     10      1       10      0       0.500   0.500
55    8       1       9       0       0.471   0.529
70    6       0       9       1       0.400   0.600
…     …       …       …       …       …       …
…     …       …       …       …       …       …
101   5       1       7       0       0.417   0.583
148   4       1       6       0       0.400   0.600
193   3       0       6       1       0.333   0.667
…     …       …       …       …       …       …
…     …       …       …       …       …       …
…     …       …       …       …       …       …
595   0       0       1       1       0.000   1.000
Sum                   …               …       …       …

d. Calculate the log-rank statistic and interpret the result.
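A minimal Python sketch of both survival calculations, with each observation stored as a (time, event) pair where `False` marks a right-censored time. At every event time the Kaplan-Meier estimate multiplies the previous survival probability by (ni - di)/ni, and the log-rank expectations follow the table layout above: ei = total deaths × (group at risk / total at risk). The statistic shown is the simple approximation Σ(O - E)²/E over both groups, matching the columns of the table. The exact variance-based log-rank statistic is slightly larger, so which version the course uses is an assumption.

```python
# Table 10.1 as (time, event) pairs; event=False marks a censored time (+).
C1 = [(4, True), (18, False), (55, True), (66, False), (90, True),
      (101, True), (148, True), (207, False), (283, True), (441, False)]
C2 = [(26, False), (70, True), (93, True), (105, False), (193, True),
      (229, False), (242, True), (455, False), (518, True), (595, True)]

def at_risk(group, t):
    """Number of patients still under observation just before time t."""
    return sum(1 for time, _ in group if time >= t)

def deaths(group, t):
    """Number of deaths (uncensored observations) at time t."""
    return sum(1 for time, event in group if time == t and event)

def kaplan_meier(group):
    """Rows (t_i, n_i, d_i, S(t_i)) at each distinct event time."""
    surv, rows = 1.0, []
    for t in sorted({time for time, event in group if event}):
        n, d = at_risk(group, t), deaths(group, t)
        surv *= (n - d) / n
        rows.append((t, n, d, surv))
    return rows

def logrank_simple(g1, g2):
    """Approximate log-rank statistic sum((O - E)^2 / E) over both groups."""
    event_times = sorted({t for t, e in g1 + g2 if e})
    o1 = e1 = o2 = e2 = 0.0
    for t in event_times:
        n1, n2 = at_risk(g1, t), at_risk(g2, t)
        d1, d2 = deaths(g1, t), deaths(g2, t)
        o1, o2 = o1 + d1, o2 + d2
        e1 += (d1 + d2) * n1 / (n1 + n2)
        e2 += (d1 + d2) * n2 / (n1 + n2)
    return (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2, (o1, e1, o2, e2)

km_c2 = kaplan_meier(C2)
chi2, (o1, e1, o2, e2) = logrank_simple(C1, C2)
```

The first two rows of `km_c2` reproduce the pre-filled entries (0.889 and 0.778). The resulting statistic (about 1.39) is well below χ²0.95,1 = 3.841.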


Table A.1: Distribution function Φ(z) of a standard normal distribution

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1   0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2   0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3   0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4   0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5   0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6   0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7   0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8   0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9   0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0   0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1   0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2   0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3   0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4   0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5   0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6   0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7   0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8   0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9   0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0   0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1   0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2   0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3   0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4   0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5   0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6   0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7   0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8   0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9   0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0   0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1   0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2   0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3   0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4   0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

Comment: For z values between 0.00 and 3.49, Φ(z) can be obtained directly from the above table, e.g. Φ(1.96)=0.9750. Separate tabulation of Φ(z) for negative z values is not necessary because, due to the symmetry of the normal distribution, Φ(-z)=1-Φ(z) so that, for example, Φ(-1.64)=1-Φ(1.64)=1-0.9495=0.0505.
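The tabulated values can also be reproduced from the error function, since Φ(z) = (1 + erf(z/√2))/2. This short Python check recomputes the two worked examples in the comment and confirms the symmetry relation Φ(-z) = 1 - Φ(z):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal distribution function: (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

phi_196 = phi(1.96)        # table value 0.9750
phi_neg_164 = phi(-1.64)   # table value 0.0505 via 1 - Φ(1.64)
symmetric = abs(phi(-1.64) - (1 - phi(1.64))) < 1e-12
```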


Table A.2: Selected quantiles t1-α,ν of the t distribution

Degrees of freedom (ν) t0.900,ν t0.950,ν t0.975,ν t0.990,ν t0.999,ν

1 3.0777 6.3138 12.7062 31.8205 318.3088

2 1.8856 2.9200 4.3027 6.9646 22.3271

3 1.6377 2.3534 3.1824 4.5407 10.2145

4 1.5332 2.1318 2.7764 3.7469 7.1732

5 1.4759 2.0150 2.5706 3.3649 5.8934

6 1.4398 1.9432 2.4469 3.1427 5.2076

7 1.4149 1.8946 2.3646 2.9980 4.7853

8 1.3968 1.8595 2.3060 2.8965 4.5008

9 1.3830 1.8331 2.2622 2.8214 4.2968

10 1.3722 1.8125 2.2281 2.7638 4.1437

11 1.3634 1.7959 2.2010 2.7181 4.0247

12 1.3562 1.7823 2.1788 2.6810 3.9296

13 1.3502 1.7709 2.1604 2.6503 3.8520

14 1.3450 1.7613 2.1448 2.6245 3.7874

15 1.3406 1.7531 2.1314 2.6025 3.7328

16 1.3368 1.7459 2.1199 2.5835 3.6862

17 1.3334 1.7396 2.1098 2.5669 3.6458

18 1.3304 1.7341 2.1009 2.5524 3.6105

19 1.3277 1.7291 2.0930 2.5395 3.5794

20 1.3253 1.7247 2.0860 2.5280 3.5518

Table A.3: Criteria for the comparison of cohort and case-control studies

data generation (retrospective vs. prospective)
data quality
costs
group formation (explanatory vs. response variable)
occurrence of response (before or after beginning of study)
availability of incidence information
scientific credibility
effect measure


Figure A.4: Simple and multiple linear regression analyses of the relationship between vital lung capacity and job tenure