
Statistical Methods For Health Research

History

• Blaise Pascal: tossing … probability
• William Gosset: standard error of the mean, “how large the sample should be for a given degree of precision, to extrapolate accuracy”

Why statistics?

1. You will be able to read research (read, understand, and evaluate)

2. You will be able to do your own research (you know what you are doing, you can defend what you do, and you can make changes when changes are in your favor)

Levels of measurement

• Need to understand levels of measurement to be able to evaluate the appropriateness of the analysis

• Nominal – classifying objects into mutually exclusive categories (various independent categories)

e.g. gender, marital status

• Ordinal – sorting of objects on the basis of their standing relative to each other
• Two rules: 1. Equality or non-equality rule 2. Greater than or less than rule
E.g. dependency:
1 = completely dependent on others
2 = needs another’s assistance
3 = needs mechanical assistance
4 = complete independence

• Interval – specifies the rank order and the distance between objects; theoretically, there is never an absolute zero
• “How much greater or how much less”
• “The distances between the categories are equal”
• E.g. temperature (the scale does not have an absolute zero)
• Interval is probably the most commonly used level of measurement in nursing research

• Ratio – equal intervals and an absolute zero starting point. Many physical characteristics; e.g. length, weight, volume

Levels and analysis

• The levels of measurement are the determining factor in deciding on the type of statistics to be used in the analysis

[the choice as to which statistical test can legitimately be used]

Statistics

• Descriptive: univariate, bivariate
• Inferential: parametric, nonparametric

Planning for data analysis

• Descriptive statistics provide a description of the data from your particular sample
[a technique used to describe large amounts of data in abbreviated symbolic form]

• Inferential analysis, on the other hand, allows you to draw inferences about the larger population from your sample data
[a technique used to measure a sample (subgroup) and then generalize these measures to the population (entire group)]

Descriptive statistics A: univariate

• Frequency distribution – the number of times each event occurs is counted; e.g. bar charts
• Measures of central tendency
- Mode (most frequent)
- Median (midpoint)
- Mean (average)

Measures of variability

• Refers to the spread or dispersion of the data
• Range – difference between the highest and lowest scores
• Standard deviation (SD) – the most frequently used measure of variability
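The measures of central tendency and variability listed above can be tried out with Python's standard `statistics` module. This is a minimal sketch; the scores are invented for illustration and the variable names are assumptions, not part of the lecture.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(scores)            # most frequent value
median = statistics.median(scores)        # midpoint of the ordered scores
mean = statistics.mean(scores)            # arithmetic average
value_range = max(scores) - min(scores)   # highest minus lowest score
sd = statistics.stdev(scores)             # sample SD (n - 1 in the denominator)

print(mode, median, mean, value_range, round(sd, 2))
```

Note that `statistics.stdev` uses the sample formula; `statistics.pstdev` would be the choice if the scores were an entire population.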

Descriptive statistics B: bivariate

• Degree and magnitude of the relationship between two variables
• Contingency table
• Correlation
– Pearson r for interval level data
– Spearman’s rho and Kendall’s tau for ordinal level data
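As a rough illustration of the two correlation coefficients named above, the sketch below computes Pearson r from its definition and Spearman's rho as Pearson r applied to ranks. The function names and the data are hypothetical; a statistics package would normally be used instead.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson r computed on the ranks of the data."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

Because the relationship is perfectly monotonic but curved, rho reaches 1 while r stays slightly below it, which is one reason the ordinal coefficient is preferred when only rank order is meaningful.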

Inferential statistics

• Parametric statistics – inferential statistics that assume a normal distribution of the variables and the use of interval or ratio measures
- t tests; analysis of variance; analysis of covariance
• Nonparametric statistics – do not require the same rigorous assumptions as parametric statistics. Often used when the sample size is small and the data are nominal or ordinal
- chi-square
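To make the chi-square example concrete, here is a minimal sketch that computes the chi-square statistic for a contingency table of counts. The table and the function name are hypothetical; the p value would still have to be looked up in a chi-square table or obtained from statistical software.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical 2 x 2 table: rows = two groups, columns = outcome yes / no.
observed_counts = [[30, 10],
                   [20, 40]]
statistic, df = chi_square(observed_counts)
print(round(statistic, 2), df)
```

With 1 degree of freedom the critical value at p = .05 is 3.84, so a statistic above that would be called significant at that level.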

Statistical Significance or p value (alpha level)

• Statistically significant means that the obtained results are not likely to be the result of chance fluctuations at the specified level of probability.

p < .05: if chance alone were operating (the null hypothesis were true), results at least this extreme would be expected fewer than 5 times out of 100. This is not the same as a 95% chance that the results are due to the intervention.

P = 0.05: 5 in 100
P = 0.01: 1 in 100
P = 0.001: 1 in 1000

Steps

1. Hypothesis stated
2. P value set: p = .05, p = .01, or p = .001
3. Analysis completed
4. If the results are as predicted in the hypothesis and the obtained p value is the same as or smaller than the set value, the results are supported

Interpretation of findings

• Results – the results of the analysis. The information from each research question or hypothesis should be presented
• Then the “So what” question
• Discussion – must address all results and flow clearly from the results

Practical vs statistical significance

The discussion section may include a discussion of practical or clinical significance vs statistical significance. The former relates to the importance of the findings to the clinical population, even if statistical significance is not found.

The normal curve (distribution)

Major characteristics
1. Most of the scores cluster around the middle of the distribution
2. Symmetrical
3. Mean = median = mode
4. Constant relationship (percentage) with SD
5. Asymptotic

Skewness and kurtosis of the distribution

• Negative skewness: Q2 – Q1 > Q3 – Q2
• Positive skewness: Q2 – Q1 < Q3 – Q2
• Normal (symmetrical) distribution: Q2 – Q1 = Q3 – Q2

• Skewness: –1 to +1
• Kurtosis: positive = leptokurtic (peaked); negative = platykurtic (flat)
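The quartile comparison above can be sketched in a few lines. The function name and data are hypothetical, and `statistics.quantiles` is only one of several conventions for locating quartiles, so results on small samples may differ slightly from SPSS or a textbook.

```python
import math
import statistics

def quartile_skew(values):
    """Classify skew direction by comparing the two halves of the interquartile range."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    lower_half, upper_half = q2 - q1, q3 - q2
    if math.isclose(lower_half, upper_half):
        return "symmetrical"
    return "negative" if lower_half > upper_half else "positive"

# Hypothetical scores with a long right tail.
right_tailed = [1, 1, 2, 2, 3, 4, 6, 9, 15]
print(quartile_skew(right_tailed))
```

This only indicates the direction of skew; it does not replace the skewness coefficient reported by statistical software.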

[Figure: normal curve, scores 55, 70, 85, 100, 115, 130, 145 marked at –3S, –2S, –1S, X, +1S, +2S, +3S]

[Figure: distribution shapes: normal, positive skew, negative skew, bimodal]

Area of the normal curve

• ± 1 s = 34 + 34 = 68%
• ± 2 s = 47.7 + 47.7 ≈ 95% (95% lies within ± 1.96 s)
• ± 3 s = 49.87 + 49.87 ≈ 99.7% (99% lies within ± 2.58 s)
• Z scores: a z score says how far above or below the mean a given score is, in s units
• A z score is a translation of a raw score into units of SD

• FOR RAW SCORE

• FOR SAMPLE

• Z scores are helpful for comparing performances
• Try: 1. x = 85, M = 65, s = 10 2. x = 60, M = 55, s = 10
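Following the definition of a z score as the distance from the mean in s units, the two practice items can be worked as z = (x − M) / s. The sketch below also checks the areas under the normal curve using the standard library's `NormalDist`; the function names are assumptions made for this example.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Distance of a raw score from the mean, expressed in SD units."""
    return (x - mean) / sd

def area_within(k):
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

z1 = z_score(85, 65, 10)  # practice item 1: 2.0
z2 = z_score(60, 55, 10)  # practice item 2: 0.5

# Percentage of scores within +/- k SD, rounded to one decimal place.
areas = {k: round(area_within(k) * 100, 1) for k in (1, 1.96, 2, 2.58, 3)}
print(z1, z2, areas)
```

The first score lies two SDs above its mean and the second only half an SD above, so the first is the stronger performance relative to its own group.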

Frequency And Visual Displays

• Range = X max – X min
• Width of interval = range / number of intervals
• Graphs: histogram, polygon, ogive curve
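A grouped frequency distribution built from the range and interval-width rules above might look like the following sketch. The scores, the function name, and the choice of four intervals are hypothetical; the cumulative counts are what an ogive curve would plot.

```python
from itertools import accumulate

def frequency_table(scores, n_intervals):
    """Count scores per class interval; assumes the scores are not all identical."""
    low, high = min(scores), max(scores)
    width = (high - low) / n_intervals  # width of interval = range / number of intervals
    counts = [0] * n_intervals
    for x in scores:
        # The highest score would land one past the end, so place it in the last interval.
        index = min(int((x - low) / width), n_intervals - 1)
        counts[index] += 1
    return width, counts

# Hypothetical test scores.
scores = [52, 55, 61, 64, 68, 70, 73, 77, 85, 92]
width, counts = frequency_table(scores, 4)
cumulative = list(accumulate(counts))  # running totals, as plotted in an ogive

for i, count in enumerate(counts):
    start = min(scores) + i * width
    print(f"{start:g} to {start + width:g}: {'*' * count}")
```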

[Figure: histogram]

[Figure: polygon]

[Figure: box plot with hinges at Q1 and Q3; outliers lie more than 1.5 IQR (IQR = Q3 – Q1) beyond the hinges; see page 55]
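The 1.5 IQR rule used in box plots can be sketched as below. The data and function name are hypothetical, and `statistics.quantiles` is used as a stand-in for the hinges, which other software may compute slightly differently.

```python
import statistics

def iqr_outliers(values):
    """Return values lying more than 1.5 IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical scores with one extreme value.
values = [1, 1, 2, 2, 3, 4, 6, 9, 40]
outliers = iqr_outliers(values)
print(outliers)
```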

STEM AND LEAF DISPLAYS
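A stem and leaf display splits each score into a stem (here the tens digit) and a leaf (the units digit). The following is a minimal sketch that assumes non-negative whole-number scores; the function name and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build display rows, using the tens as the stem and the units as the leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return [
        f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(stems.items())
    ]

# Hypothetical test scores.
rows = stem_and_leaf([52, 55, 61, 64, 68, 70, 73, 77, 85, 92])
print("\n".join(rows))
```

Unlike a histogram, the display keeps every individual score visible while still showing the shape of the distribution.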

Significance & Testing Hypothesis

• H0 : the event in question is only due to chance

• Significance: excluding chance as one of the explanations

• Significance = rejecting the null hypothesis

Testing hypothesis

• For example, the t test:
H0 : µ1 = µ2
Ha : µ1 ≠ µ2
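To illustrate testing H0 : µ1 = µ2, the sketch below computes an independent-samples t statistic with a pooled variance. The pooled form assumes both groups have similar variances, which is an assumption of this example rather than something stated in the lecture; the groups and function name are hypothetical.

```python
import math
import statistics

def pooled_t(group_1, group_2):
    """Independent-samples t statistic and degrees of freedom (equal variances assumed)."""
    n1, n2 = len(group_1), len(group_2)
    var_1 = statistics.variance(group_1)  # sample variance, n - 1 denominator
    var_2 = statistics.variance(group_2)
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / standard_error
    return t, df

# Hypothetical scores for two groups.
t, df = pooled_t([5, 6, 7, 8, 9], [2, 3, 4, 5, 6])
print(round(t, 2), df)
```

The obtained t is then compared with the critical value for the chosen p level; for 8 degrees of freedom and a two-tailed p = .05 that value is 2.306, so a larger absolute t leads to rejecting H0.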

Types of Error

• Sampling error: error in selecting the sample
• Error of inference: error in drawing the conclusion
• Type I (rejecting a true H0), type II (failing to reject a false H0)

• The main principle is that H0 is assumed to be true, and we test whether that assumption holds
• The researcher has to state an Ha that supports his or her perspective
• Therefore, we are testing the H0

Factors affecting type II error

1. Sample size

2. The difference between the H0 and Ha

3. Heterogeneity of the sample, relative to how heterogeneous the population is supposed to be

• The researcher should avoid committing a type II error.

Confidence interval

• The range within which the population mean is expected to fall
• Uses the standardized scores
• Therefore, t scores are mainly used for that purpose
• 95% = M ± 1.96 (SE)
• 99% = M ± 2.58 (SE)
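The two interval formulas above can be applied as in this sketch, which takes the standard error of the mean as SE = s / √n. The sample values (M = 100, s = 15, n = 36) and the function name are invented for illustration.

```python
import math

def confidence_interval(mean, sd, n, multiplier=1.96):
    """Return (lower, upper) limits as M +/- multiplier * SE, with SE = sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = multiplier * standard_error
    return mean - margin, mean + margin

# Hypothetical sample: M = 100, s = 15, n = 36, so SE = 2.5.
ci_95 = confidence_interval(100, 15, 36, multiplier=1.96)
ci_99 = confidence_interval(100, 15, 36, multiplier=2.58)
print(ci_95, ci_99)
```

The 99% interval is wider than the 95% interval: greater confidence is bought with less precision. With small samples the multiplier would come from the t distribution rather than 1.96 or 2.58.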

Data Preparation

• Checking the data for accuracy
– Are the responses legible/readable?
– Are all important questions answered?
– Are the responses complete?
– Is all relevant contextual information included (e.g., date, time, place, researcher)?

• Developing a database structure
– variable name
– variable description
– variable format (number, date, text)
– instrument/method of collection
– date collected
– respondent or group
– variable location (in database)
– notes

• Entering the data into the computer

• Data transformations
Once the data have been entered, it is almost always necessary to transform the raw data into variables that are usable in the analyses. There is a wide variety of transformations that you might perform. Some of the more common are:
– missing values
– item reversals
– scale totals
– categories: for many variables you will want to collapse them into categories. For instance, you may want to collapse income estimates (in dollar amounts) into income ranges.
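The four common transformations can be sketched in plain Python as follows. Everything specific here is an assumption made for illustration: the missing-value code 99, the 1 to 5 response scale, the income cut points, and all function names.

```python
MISSING_CODE = 99  # hypothetical code entered when a response is missing

def clean(value):
    """Missing values: turn the missing-value code into None."""
    return None if value == MISSING_CODE else value

def reverse(value, low=1, high=5):
    """Item reversal: flip a negatively worded item on a low..high scale."""
    return None if value is None else low + high - value

def scale_total(items):
    """Scale total: sum the items, or None if any item is missing."""
    return None if any(v is None for v in items) else sum(items)

def income_range(dollars):
    """Categories: collapse a dollar amount into a hypothetical income range."""
    if dollars is None:
        return None
    if dollars < 20000:
        return "under 20,000"
    if dollars < 50000:
        return "20,000 to 49,999"
    return "50,000 or more"

# One hypothetical respondent: three items, the second one negatively worded.
raw_items = [4, 2, 3]
items = [clean(raw_items[0]), reverse(clean(raw_items[1])), clean(raw_items[2])]
total = scale_total(items)
print(items, total, income_range(35000))
```

Returning None for a total with any missing item is only one possible policy; a study might instead substitute the mean of the answered items, and that decision should be recorded in the database notes.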

Outliers

• Steps in SPSS: 1. Analyze 2. Descriptive Statistics 3. Explore 4. Statistics … Plots 5. Outliers

Outliers

• SRESID: Studentized residual.

• ZRESID: Normalized residual: the residual divided by the square root of the product of PRED and 1 – PRED.

• COOK: Analog of Cook’s influence statistic (Cook's distance may be considered "large" if substantially larger than 1).
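The ZRESID definition above translates directly into a formula. The sketch assumes the residual is the observed outcome (0 or 1) minus the predicted probability PRED, as in logistic regression; the function name and values are hypothetical.

```python
import math

def normalized_residual(observed, pred):
    """Residual divided by the square root of pred * (1 - pred).

    Assumes observed is 0 or 1 and pred is a predicted probability strictly between 0 and 1.
    """
    return (observed - pred) / math.sqrt(pred * (1 - pred))

# Hypothetical cases, both with a predicted probability of 0.8.
z_event = normalized_residual(1, 0.8)     # event occurred, as predicted
z_no_event = normalized_residual(0, 0.8)  # event did not occur, against the prediction
print(round(z_event, 2), round(z_no_event, 2))
```

The case that went against a confident prediction gets the larger absolute residual, which is what flags it for inspection as a possible outlier.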
