vivien-pierce
Statistical Methods For Health Research
History
• Blaise Pascal: tossing …… probability
• William Gosset: standard error of the mean; “how large the sample should be for a given degree of precision”, to extrapolate accuracy
Why statistics?
1. You will be able to read research (read, understand, and evaluate)
2. You will be able to do your own research (you know what you do, you can defend what you do, and you can make changes if the changes work in your favor)
Levels of measurement
• Need to understand levels of measurement to be able to evaluate the appropriateness of the analysis
• Nominal – classifying objects into mutually exclusive categories (various independent categories)
e.g. gender, marital status
• Ordinal – sorting of objects on the basis of their standing relative to each other
• Two rules: 1. Equality – non-equality rule 2. Greater-than or less-than rule
E.g. dependency:
1 = completely dependent on others
2 = needs another’s assistance
3 = needs mechanical assistance
4 = complete independence
• Interval – specifies the rank order and the distance between objects; theoretically, there is never an absolute zero
• “How much greater or how much less”
• “The distances between the categories are equal”
• E.g. temperature in Celsius or Fahrenheit (the scale does not have an absolute zero)
• Interval is probably the most commonly used level of measurement in nursing research
• Ratio – equal intervals and an absolute zero starting point. Many physical characteristics; e.g. length, weight, volume
Levels and analysis
• The levels of measurement are the determining factor in deciding on the type of statistics to be used in the analysis
[the choice as to which statistical test can legitimately be used]
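As a rough illustration of how the level of measurement constrains the analysis, the sketch below maps each level to the measures of central tendency usually considered legitimate for it. The mapping follows the common textbook convention and is an assumption, not something stated in these slides; the names PERMISSIBLE and permissible_statistics are hypothetical.

```python
# Hypothetical lookup table (textbook convention, not taken from the slides):
# which measures of central tendency are usually considered legitimate
# at each level of measurement.
PERMISSIBLE = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}


def permissible_statistics(level):
    """Return the measures of central tendency allowed for a given level."""
    return PERMISSIBLE[level.lower()]


print(permissible_statistics("Ordinal"))
```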
Statistics
• Descriptive: univariate, bivariate
• Inferential: parametric, nonparametric
Planning for data analysis
• Descriptive statistics provide a description of the data from your particular sample
[a technique used to describe large amounts of data in abbreviated symbolic form]
• Inferential analysis, on the other hand, allows you to draw inferences about the larger population from your sample data.
[a technique used to measure a sample (subgroup) and then generalize these measures to the population (entire group)]
Descriptive statistics A: univariate
• Frequency distribution – the number of times each event occurs is counted; e.g. bar charts
• Measures of central tendency
- Mode (most frequent)
- Median (mid point)
- Mean (average)
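A minimal sketch of a frequency distribution and the three measures of central tendency, using Python's standard library. The scores are made-up illustration data.

```python
import statistics
from collections import Counter

scores = [2, 3, 3, 5, 7, 8, 3, 9]  # made-up illustration data

# Frequency distribution: how many times each value occurs.
frequencies = Counter(scores)

mode = statistics.mode(scores)      # most frequent value
median = statistics.median(scores)  # mid point of the sorted scores
mean = statistics.mean(scores)      # arithmetic average

print(frequencies[3], mode, median, mean)
```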
Measures of variability
• Refers to the spread or dispersion of the data
• Range – difference between the highest and lowest scores
• Standard deviation (SD) – the most frequently used measure of variability
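The two measures of variability can be sketched as follows. The data are made up; whether the population or the sample formula for the SD is appropriate depends on the study, so both are shown.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustration data

# Range: difference between the highest and lowest scores.
value_range = max(scores) - min(scores)

# Standard deviation: population formula (divide by n) and
# sample formula (divide by n - 1).
population_sd = statistics.pstdev(scores)
sample_sd = statistics.stdev(scores)

print(value_range, population_sd, round(sample_sd, 3))
```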
Descriptive statistics
B: Bivariate: degree and magnitude of relationship between two variables
• Contingency table
• Correlation
– Pearson r for interval level data
– Spearman’s rho and Kendall’s tau for ordinal level data
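To make the difference between the two correlation coefficients concrete, here is a small sketch written from the standard definitions: Pearson r on the raw values, and Spearman's rho as Pearson r computed on ranks (tied values receive the average rank). The function names and the data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # increases with x, but not linearly
print(round(pearson_r(x, y), 3), spearman_rho(x, y))
```

Because y always rises with x, the rank-based coefficient is exactly 1, while Pearson r is slightly below 1 since the relationship is not a straight line.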
Inferential statistics
• Parametric statistics – inferential statistics that assume a normal distribution of the variables and the use of interval or ratio measures.
- t tests; analysis of variance; analysis of covariance
• Nonparametric – do not require the same rigorous assumptions as parametric statistics. Often used when the sample size is small and data are nominal or ordinal
- chi-square
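As an example of a nonparametric test on nominal data, the sketch below computes the chi-square statistic for a contingency table from the usual observed-versus-expected formula. The table is made up, and the function name is hypothetical; the statistic still has to be compared with a critical value (3.841 for 1 degree of freedom at p = .05) or converted to a p value by statistical software.

```python
def chi_square_statistic(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Made-up 2 x 2 table: rows = group, columns = outcome (yes / no).
chi2, df = chi_square_statistic([[20, 30], [30, 20]])
print(chi2, df)
```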
Statistical Significance or p value (alpha level)
• Statistically significant means that the obtained results are not likely to be the result of chance fluctuations at the specified level of probability.
p < .05: if chance alone were at work, results like these would be expected fewer than 5 times out of 100. Loosely put, there are 5 chances in 100 that the results are due to chance; this is not the same as a 95% probability that the results are due to the intervention.
P = 0.05: 5 in 100
P = 0.01: 1 in 100
P = 0.001: 1 in 1000
Steps
1. Hypothesis stated
2. P value set: p = .05, p = .01, or p = .001
3. Analysis completed
4. If the results are as predicted in the hypothesis and the p value is the same or smaller, the hypothesis is supported.
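Step 4 of the procedure above amounts to a simple decision rule, sketched here. The function name and parameters are hypothetical; the p value itself would come from the completed analysis.

```python
def hypothesis_supported(p_value, alpha=0.05, as_predicted=True):
    """Supported only if the results are as predicted and the obtained
    p value is the same as or smaller than the level set in step 2."""
    return as_predicted and p_value <= alpha


print(hypothesis_supported(0.03))              # level set at .05
print(hypothesis_supported(0.03, alpha=0.01))  # stricter level set at .01
```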
Interpretation of findings
• Results – the results of the analysis. The information from each Research Question or Hypothesis should be presented
• Then the “So what” question
• Discussion – must address all results and flow clearly from the results
Practical vs statistical significance
The discussion section may include a discussion of practical or clinical significance vs statistical significance. The former relates to the importance of the findings to the clinical population, even if statistical significance is not found.
The normal curve (distribution)
Major characteristics
1. Most of the scores cluster around the middle of the distribution
2. Symmetrical
3. Mean = median = mode
4. Constant relationship (percentage) with SD
5. Asymptotic
Skewness and kurtosis of the distribution
• Negative skewness: Q2 – Q1 > Q3 – Q2
• Positive skewness: Q2 – Q1 < Q3 – Q2
• Normal distribution: Q2 – Q1 = Q3 – Q2
• Skewness: -1 to +1
• Kurtosis: +ve leptokurtic (peaked)
• -ve platykurtic (flat)
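The quartile comparison above can be sketched as follows. The data sets are made up, and the quartiles come from Python's default ("exclusive") method; other software may place the quartiles slightly differently, so treat this as an illustration of the rule only.

```python
import statistics


def quartile_skew(data):
    """Classify skew by comparing Q2 - Q1 with Q3 - Q2."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    lower, upper = q2 - q1, q3 - q2
    if lower > upper:
        return "negative"
    if lower < upper:
        return "positive"
    return "symmetric"


print(quartile_skew([1, 2, 3, 4, 5, 6, 7]))         # evenly spread
print(quartile_skew([1, 2, 3, 4, 5, 10, 20]))       # long right tail
print(quartile_skew([1, 11, 16, 17, 18, 19, 20]))   # long left tail
```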
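[Figure: normal curve with scores 55, 70, 85, 100, 115, 130, 145 marked at -3S, -2S, -1S, X, +1S, +2S, +3S]
[Figure: normal, positively skewed, negatively skewed, and bimodal distributions]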
Area of the normal curve
• ± 1 s = 34 + 34 = 68%
• ± 2 s ≈ 47.7 + 47.7 ≈ 95%
• ± 3 s ≈ 49.87 + 49.87 ≈ 99.7%
• Z scores: a z score says how far above or below the mean a given score is, in S units
• A z score is a translation of a raw score into units of SD
• FOR RAW SCORE
• FOR SAMPLE
• Z scores are helpful for comparing performances
• Try: 1. x = 85, M = 65, s = 10 2. x = 60, M = 55, s = 10
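A short sketch that works the two practice items, taking the z score of a raw score as (x - M) / s, which follows from the definition given above. It also computes the areas within 1, 2, and 3 standard deviations of the mean from the standard normal distribution (about 68%, 95%, and 99.7%). The function names are hypothetical.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """How far x lies above or below the mean, in SD units."""
    return (x - mean) / sd


def area_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(z_score(85, 65, 10))  # practice item 1
print(z_score(60, 55, 10))  # practice item 2

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1))
```

The first score lies 2 SDs above its mean and the second only 0.5 SD above, so the first performance is relatively stronger even though the raw scores come from different distributions.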
Frequency And Visual Displays
• Range = X max – X min
• Width of interval = range / number of intervals
• Graphs:
– Histogram
– Polygon
– Ogive curve
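The interval-width formula can be applied to build a grouped frequency table, which is the data behind a histogram. This is a minimal sketch with made-up scores; it assumes the scores are not all identical (otherwise the width would be zero), and it places the maximum score in the last interval.

```python
def frequency_table(scores, n_intervals):
    """Return (interval width, list of counts per interval)."""
    lowest, highest = min(scores), max(scores)
    width = (highest - lowest) / n_intervals  # range / number of intervals
    counts = [0] * n_intervals
    for s in scores:
        # The top score would fall just past the last interval, so clamp it.
        index = min(int((s - lowest) / width), n_intervals - 1)
        counts[index] += 1
    return width, counts


scores = [50, 55, 60, 65, 70, 75, 80, 85, 90, 100]  # made-up data
width, counts = frequency_table(scores, 5)
print(width, counts)
```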
[Figure: histogram]
[Figure: polygon]
[Figure: box plot with hinges at Q1 and Q3; outliers lie more than 1.5 IQR (Q3-Q1) beyond the hinges; see page 55]
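The 1.5 IQR rule for flagging outliers can be sketched as below. The data are made up, and the quartiles are computed with Python's default method, which may differ slightly from the hinges reported by other software.

```python
import statistics


def iqr_outliers(data):
    """Return values lying more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


print(iqr_outliers([1, 2, 3, 4, 5, 6, 50]))
```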
STEM AND LEAF DISPLAYS
Significance & Testing Hypothesis
• H0 : the event in question is only due to chance
• Significance: excluding chance as one of the explanations
• Significance = rejecting the null hypothesis
Testing hypothesis
• For example: t test
• H0 : µ1 = µ2
• Ha : µ1 ≠ µ2
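For the t test example, the test statistic for two independent groups can be sketched as follows. This version assumes equal variances in the two groups (the pooled-variance form), which the slides do not specify; the data are made up, and the function name is hypothetical. The resulting t is compared with a critical value from a t table, for instance 2.306 for 8 degrees of freedom at p = .05, two-tailed.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Return (t, degrees of freedom) for an independent-samples t test
    with pooled variance, testing H0: mu1 = mu2."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se_diff
    return t, n1 + n2 - 2


t, df = pooled_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])  # made-up scores
print(t, df)
```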
Types of Error
• Sampling error: error in selecting the sample
• Error of inference: error in drawing the conclusion.
• Type I (rejecting a true H0), type II (failing to reject a false H0)
• The main principle is that H0 is assumed to be true, and we test whether it is true or false
• The researcher has to formulate the Ha that supports his or her perspective
• Therefore, we are testing the H0
Factors affecting type II error
1. Sample size
2. The difference between the H0 and Ha
3. Heterogeneity of the sample compared with what the population is supposed to be
• The researcher should avoid committing a type II error.
Confidence interval
• The range within which the population mean is expected to fall.
• Uses the standardized scores
• Therefore, t scores are mainly used for that purpose.
• 95% = M ± 1.96 (SE)
• 99% = M ± 2.58 (SE)
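The two formulas above can be applied as in this sketch, where SE is taken as the sample SD divided by the square root of n. The multipliers 1.96 and 2.58 are the ones given above; with small samples a t-based multiplier would be larger. The data are made up.

```python
import math
import statistics


def confidence_interval(sample, multiplier=1.96):
    """Return (lower, upper) = M -/+ multiplier * SE."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - multiplier * se, m + multiplier * se


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean = 5
low95, high95 = confidence_interval(sample)        # 95% interval
low99, high99 = confidence_interval(sample, 2.58)  # 99% interval
print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

The 99% interval is wider than the 95% interval: more confidence costs precision.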
Data Preparation
• Checking the Data For Accuracy
– are the responses legible/readable?
– are all important questions answered?
– are the responses complete?
– is all relevant contextual information included (e.g., date, time, place, researcher)?
• Developing a Database Structure
– variable name
– variable description
– variable format (number, date, text)
– instrument/method of collection
– date collected
– respondent or group
– variable location (in database)
– notes
• Entering the Data into the Computer
• Data Transformations
• Once the data have been entered, it is almost always necessary to transform the raw data into variables that are usable in the analyses. There is a wide variety of transformations that you might perform. Some of the more common are:
• missing values
• item reversals
• scale totals
• categories
• For many variables you will want to collapse them into categories. For instance, you may want to collapse income estimates (in dollar amounts) into income ranges.
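The common transformations listed above can be sketched in a few lines. Everything specific here is an assumption made for illustration: missing values are coded as None, items are scored on a 1 to 5 scale, and the income cut-offs are arbitrary.

```python
def reverse_item(score, low=1, high=5):
    """Item reversal on a low..high scale, e.g. 2 becomes 4 on a 1-5 scale."""
    return high + low - score


def scale_total(items):
    """Scale total; returns None if any item is missing (coded as None)."""
    if any(value is None for value in items):
        return None
    return sum(items)


def income_category(dollars):
    """Collapse a dollar amount into a range (hypothetical cut-offs)."""
    if dollars < 25000:
        return "low"
    if dollars < 75000:
        return "middle"
    return "high"


responses = [4, 2, 5]                    # made-up answers to three items
responses[1] = reverse_item(responses[1])  # item 2 is negatively worded
print(scale_total(responses), income_category(30000))
```

Returning None for an incomplete scale is only one way to handle missing values; other studies substitute the respondent's mean on the remaining items.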
Outliers
• Steps in SPSS
1. Analyze
2. Descriptive Statistics
3. Explore
4. Statistics …… plots
5. Outliers
OUTLIERS
• SRESID: Studentized residual.
• ZRESID: Normalized residual: the residual divided by the square root of the product of PRED and 1–PRED
• COOK: Analog of Cook’s influence statistic (Cook's distance may be considered "large" if substantially larger than 1).
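The ZRESID definition above translates directly into a formula. The sketch assumes a binary outcome coded 0 or 1 and a predicted probability PRED strictly between 0 and 1, which is the setting where dividing by the square root of PRED × (1 – PRED) makes sense; that setting is an assumption, since it is not stated here. The function name is hypothetical.

```python
import math


def normalized_residual(observed, pred):
    """Residual divided by the square root of pred * (1 - pred).

    Assumes observed is 0 or 1 and 0 < pred < 1.
    """
    return (observed - pred) / math.sqrt(pred * (1 - pred))


# A case that occurred (1) with a predicted probability of 0.8.
print(round(normalized_residual(1, 0.8), 3))
```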