© 2006
Dr Rotimi Adigun
Recommended text
High Yield Biostatistics, Epidemiology and Public Health, Michael Glaser, fourth edition.
Pre-Test Preventive Medicine and Public Health, Sylvie Ratelle (Chapter 1), for practice questions.
I. Descriptive statistics
  Populations, samples and elements
  Probability
  Types of data
  Frequency distribution
  Measures of central tendency
  Measures of variability
  Z scores
II. Inferential statistics
  Statistics and parameters
  Estimating mean
  T-scores
  Hypothesis testing
  Steps of hypothesis testing
  Z-tests
  The meaning of statistical significance
  Type I and II errors
  Power
  Differences between groups
III. Correlation and predictive techniques
  Correlation
  Survival analysis
IV. Research methods
  Sampling techniques
  Assessing the evidence
  Hierarchy of evidence
  Systematic review
  Clinical decision making
04/21/23
Descriptive Statistics: Populations, Samples, Elements, Types of Data, Measures of Central Tendency, Measures of Variability, Z-scores
Descriptive statistics (DSs)
A way to summarize data from a sample or a population
DSs illustrate the shape, central tendency, and variability of a set of data
The shape of data has to do with the frequencies of the values of observations
Population
A population is the group from which a sample is drawn, e.g., automobile crash victims in an emergency room, or student scores on Block I exams at Windsor
In research, it is usually not practical to include all members of a population
Thus, a sample (a subset of a population) is taken
Elements
A single observation, such as one student's score, is an element, denoted by X
The number of elements in a population is denoted by N
The number of elements in a sample is denoted by n
Data
Measurements or observations of a variable
Variable
A characteristic that is observed or manipulated
Can take on different values
Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a study
Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable
Independent and dependent variables, e.g.
You study the effect of different drugs on colon cancer: the drugs, dosage, and timing would be the independent variables; the effect of the different drugs, dosages, and timing on the cancer would be the dependent variable.
We compare the effect of stimulants on memory and retention.
The independent variables would be the different stimulants and dosages; the dependent variables would be the different outcomes (improved retention, decreased retention, or no effect).
Mathematically, on a graph or in an equation where Y depends on the value of X, Y is a function of X, written Y = f(X)
The independent variable is usually plotted on the X axis, while the dependent variable is plotted on the Y axis.
Parameters
Summary data from a population
Statistics
Summary data from a sample
Central tendency describes the location of the middle of the data
Variability (a.k.a. dispersion) is the extent to which values are spread above and below the middle values
DSs can be distinguished from inferential statistics: DSs are not capable of testing hypotheses
Mean (a.k.a. average)
The most commonly used DS
To calculate the mean, add all values of a series of numbers and then divide by the total number of elements
Mean of a sample: $\bar{X} = \dfrac{\sum X}{n}$
Mean of a population: $\mu = \dfrac{\sum X}{N}$
$\bar{X}$ (X bar) refers to the mean of a sample and $\mu$ refers to the mean of a population
$\sum X$ is a command that adds all of the X values
n is the total number of values in the series of a sample and N is the same for a population
Mode
The most frequently occurring value in a series
The modal value is the highest bar in a histogram
[Figure: histogram with the mode marked]
Median
The value that divides a series of values in half when they are all listed in order
When there is an odd number of values, the median is the middle value
When there is an even number of values, count from each end of the series toward the middle and then average the 2 middle values
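The three measures of central tendency described so far can be checked with a short Python sketch. The exam scores below are hypothetical and were invented only for illustration; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical exam scores, invented for illustration only.
scores = [55, 60, 60, 65, 70, 75, 90]

mean_score = statistics.mean(scores)      # sum of all values / number of values
median_score = statistics.median(scores)  # middle value of the ordered series
mode_score = statistics.mode(scores)      # most frequently occurring value

# With an even number of values, the median is the average of the 2 middle values.
even_median = statistics.median([55, 60, 65, 70])

print(round(mean_score, 2), median_score, mode_score, even_median)
```

Here the series has an odd number of values (7), so the median is simply the 4th value in order; the second call shows the even case.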
Each of the three methods of measuring central tendency has certain advantages and disadvantages
Which method should be used?
It depends on the type of data being analyzed (e.g., categorical, continuous) and the level of measurement involved
There are 4 levels of measurement: nominal, ordinal, interval, and ratio
1. Nominal
Data are coded by a number, name, or letter that is assigned to a category or group
No ordering
Examples: gender (e.g., male, female), race, treatment preference (e.g., surgery, radiotherapy, hormone, chemotherapy)
2. Ordinal
Similar to nominal because the measurements involve categories
However, the categories are ordered by rank
Examples: pain level (e.g., mild, moderate, severe), military rank (e.g., lieutenant, captain, major, colonel, general), opinion (e.g., agree, strongly agree), severity of dysplasia (mild, moderate, severe)
Ordinal values only describe order, not quantity
Thus, severe pain is not the same as 2 times mild pain
The only arithmetic allowed for nominal and ordinal data is counting of categories, e.g., 25 males and 30 females (ordinal categories can additionally be ranked)
3. Interval
Measurements are ordered (like ordinal data)
Have equal intervals
Do not have a true zero
Example: the Fahrenheit scale, where 0° does not correspond to an absence of heat (no true zero), in contrast to Kelvin, which does have a true zero
4. Ratio
Measurements have equal intervals
There is a true zero
Ratio is the most advanced level of measurement and can handle most types of mathematical operations (including multiplication and division, which are not meaningful on interval scales)
Ratio examples
Range of motion: no movement corresponds to zero degrees; the interval between 10 and 20 degrees is the same as between 40 and 50 degrees
Lifting capacity: a person who is unable to lift scores zero; a person who lifts 30 kg can lift twice as much as one who lifts 15 kg
NOIR is a mnemonic to help remember the names and order of the levels of measurement: Nominal, Ordinal, Interval, Ratio
Measurement scale   Permissible mathematical operations                   Best measure of central tendency
Nominal             Counting                                              Mode
Ordinal             Greater-than or less-than operations                  Median
Interval            Addition and subtraction                              Symmetrical: mean; skewed: median
Ratio               Addition, subtraction, multiplication and division    Symmetrical: mean; skewed: median
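As a study aid, the table above can be written as a small lookup function. This is only a sketch: the function name `best_central_tendency` and its `skewed` flag are hypothetical names introduced here, not part of the original slides.

```python
def best_central_tendency(scale: str, skewed: bool = False) -> str:
    """Return the measure of central tendency suggested by the table.

    `scale` is one of the NOIR levels; `skewed` only matters for
    interval and ratio data.
    """
    scale = scale.lower()
    if scale == "nominal":
        return "mode"      # only counting is permitted
    if scale == "ordinal":
        return "median"    # values can be ranked but not added
    if scale in ("interval", "ratio"):
        # The mean is preferred unless the distribution is skewed.
        return "median" if skewed else "mean"
    raise ValueError(f"unknown measurement scale: {scale!r}")


print(best_central_tendency("ordinal"))
print(best_central_tendency("ratio", skewed=True))
```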
Histograms of frequency distributions have shape
Distributions are often symmetrical, with most scores falling in the middle and fewer toward the extremes
Many biological data are approximately symmetrically distributed and form a normal curve (a.k.a. bell-shaped curve)
[Figure: histogram with a line depicting the shape of the data]
Data whose frequency distribution follows a normal curve have a normal distribution (a.k.a. Gaussian distribution)
Properties of a normal distribution
It is symmetric about its mean
The highest point is at its mean
The height of the curve decreases as one moves away from the mean in either direction, approaching, but never reaching, zero
[Figure: normal curve with the mean at its highest point; mean = median = mode]
The data are not distributed symmetrically in skewed distributions
Consequently, the mean, median, and mode are not equal and are in different positions
Scores are clustered at one end of the distribution
A small number of extreme values are located at the opposite end
Skew is always toward the direction of the longer tail (not the hump)
Positive if skewed to the right
Negative if skewed to the left
In a skewed distribution, the mean is shifted the most (toward the longer tail)
Because the mean is shifted so much, it is not the best estimate of the average score for skewed distributions
The median is a better estimate of the center of skewed distributions
It will be the central point of any distribution: 50% of the values are above and 50% below the median
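A small numerical example may make the effect of skew concrete. The two data sets below are hypothetical: the second differs from the first only in having one extreme high value, which pulls the mean toward the longer (right) tail while the median stays put.

```python
import statistics

# Hypothetical data, invented for illustration only.
symmetric = [2, 3, 4, 5, 6]
skewed = [2, 3, 4, 5, 46]  # one extreme high value: positive (right) skew

sym_mean, sym_median = statistics.mean(symmetric), statistics.median(symmetric)
skw_mean, skw_median = statistics.mean(skewed), statistics.median(skewed)

print("symmetric:", sym_mean, sym_median)  # mean and median coincide
print("skewed:   ", skw_mean, skw_median)  # mean is pulled toward the tail
```

In the skewed set the mean (12) is larger than four of the five values, whereas the median (4) still has half the remaining values on each side.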
Mean, Median and Mode
Summary statistics
Midrange: (smallest observation + largest observation) / 2
Mode: the value which occurs with the greatest frequency, i.e., the most common value
Median: the observation which lies in the middle of the ordered observations
Arithmetic mean (mean): sum of all observations / number of observations
The mean represents the average of a group of scores, with some of the scores being above the mean and some below
This range of scores is referred to as variability or spread
Measures of variability: range, variance, standard deviation, semi-interquartile range, coefficient of variation, “standard error”
The standard deviation (SD) is roughly the average amount of spread in a distribution of scores
The next slide shows a group of 10 patients whose mean age is 40 years; some are older than 40 and some younger
[Figure: ages spread out along an X axis; the amount the ages are spread out is known as dispersion or spread]
[Figure: deviation of each age from the mean; adding the deviations always equals zero]
To find the average deviation, one would normally total the deviations of the scores above and below the mean and then divide by the number of values
However, the total always equals zero
The deviations must first be squared, which cancels the negative signs
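The reasoning above (deviations cancel, so they are squared first) can be traced step by step in Python. The ages are hypothetical, chosen only so that the 10 values have a mean of 40 years, and the sketch assumes the sample formulas with n - 1 in the denominator, since the original formula slide did not survive extraction.

```python
import math
import statistics

# Hypothetical ages of 10 patients, chosen so that the mean is 40 years.
ages = [25, 30, 35, 35, 40, 40, 45, 45, 50, 55]

mean_age = sum(ages) / len(ages)
deviations = [age - mean_age for age in ages]
total_deviation = sum(deviations)  # positive and negative deviations cancel

squared_deviations = [d ** 2 for d in deviations]  # squaring removes the signs
sum_of_squares = sum(squared_deviations)

# Assumption: sample variance and SD (denominator n - 1).
variance = sum_of_squares / (len(ages) - 1)  # units: years squared
sd = math.sqrt(variance)                     # units: years

print(mean_age, total_deviation, sum_of_squares, round(sd, 2))
# Cross-check against the standard library's sample SD.
print(math.isclose(sd, statistics.stdev(ages)))
```

Taking the square root at the end returns the result to the original units (years), which is why the SD is usually reported rather than the variance.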
Symbol for the SD: S for a sample, σ for a population
The variance (S²) is not in the same units as the data (age), but the SD is
Data set             Mean   SD
7, 7, 7, 7, 7, 7     7      0
7, 8, 7, 7, 7, 6     7      0.63
3, 2, 7, 8, 13, 9    7      4.05
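The three data sets above share the same mean but differ in spread, which can be verified with Python's `statistics` module. The grouping of the values into three sets of six is an assumption based on how the numbers appear on the slide, and `statistics.stdev` uses the sample formula (n - 1 in the denominator).

```python
import statistics

# Assumed grouping of the slide's values into three data sets of six.
datasets = {
    "no spread": [7, 7, 7, 7, 7, 7],
    "small spread": [7, 8, 7, 7, 7, 6],
    "large spread": [3, 2, 7, 8, 13, 9],
}

results = {}
for name, values in datasets.items():
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    results[name] = (mean, round(sd, 2))
    print(f"{name}: mean = {mean}, SD = {sd:.2f}")
```

All three means equal 7, while the SDs come out as 0, about 0.63, and about 4.05 (4.0497 before rounding).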
About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean
About 95.5% is within two SDs
About 99.7% is within three SDs
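These percentages can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def area_within(k: float) -> float:
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

Because every normal distribution has the same shape once it is expressed in SD units, the same proportions apply whatever the mean and SD are.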
Z-score
The number of SDs that a specific score is above or below the mean in a distribution
Raw scores can be converted to z-scores by subtracting the mean from the raw score and then dividing the difference by the SD
$z = \dfrac{X - \mu}{\sigma}$
If the element lies above the mean, it will have a positive z score;
if it lies below the mean, it will have a negative z score.
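The conversion from a raw score to a z-score is a one-line calculation. In this sketch the function name `z_score` and the example numbers (a distribution with mean 70 and SD 10) are assumptions made for illustration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of SDs that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd


# Hypothetical distribution: mean 70, SD 10.
above = z_score(80, mean=70, sd=10)  # element above the mean
below = z_score(55, mean=70, sd=10)  # element below the mean

print(above, below)
```

The first element lies one SD above the mean (z = 1.0) and the second lies one and a half SDs below it (z = -1.5), matching the sign rule stated above.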
Standardization
The process of converting raw scores to z-scores
The resulting distribution of z-scores will always have a mean of zero, an SD of one, and an area under the curve equal to one
The proportion of scores that are higher or lower than a specific z-score can be determined by referring to a z-table
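The claim that standardized scores always have a mean of zero and an SD of one can be checked numerically. The raw scores are hypothetical, and the sketch treats them as a complete population (so it uses `statistics.pstdev`, the population SD); both choices are assumptions made for illustration.

```python
import statistics

# Hypothetical raw scores, treated here as a complete population.
raw_scores = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(raw_scores)       # population mean
sigma = statistics.pstdev(raw_scores)  # population SD

# Standardization: convert every raw score to a z-score.
z_scores = [(x - mu) / sigma for x in raw_scores]

z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)
print(z_scores)
print(z_mean, z_sd)
```

Whatever the original units, the standardized values are expressed in SD units, which is what makes a single z-table usable for any normal distribution.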
[Figure: normal curve; refer to a z-table to find the proportion under the curve]
Partial z-table (to z = 1.5) showing proportions of the area under a normal curve for different values of z.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
The entry 0.9332 (z = 1.50) corresponds to the area under the curve shaded in black
Tables of z scores state what proportion of any normal distribution lies above or below any given z score, not just z scores of ±1, 2, or 3.
Z-score tables can be used to:
- Determine the proportion of the distribution above or below a certain score (e.g., finding the proportion of the class scoring below 65% on an exam)
- Find the scores that divide the distribution into certain proportions (e.g., finding what score separates the top 5% of the class from the remaining 95%)
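Both uses of a z-table can be sketched with `statistics.NormalDist`, which computes the same proportions the table lists. The exam distribution (mean 60, SD 10) is a hypothetical example, not data from the slides.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Use 1: proportion of the distribution below a given z-score.
proportion_below = std_normal.cdf(1.5)  # compare with the table entry for z = 1.50

# Use 2: z-score that separates the top 5% from the remaining 95%.
cutoff_z = std_normal.inv_cdf(0.95)

# Hypothetical exam with mean 60 and SD 10: convert the cutoff back to a raw score.
exam_mean, exam_sd = 60, 10
cutoff_score = exam_mean + cutoff_z * exam_sd

print(round(proportion_below, 4), round(cutoff_z, 3), round(cutoff_score, 1))
```

The first result agrees with the tabulated 0.9332, and the cutoff z of roughly 1.645 is the value usually read off a z-table for the top 5%.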
Z-scores also allow us to specify the probability that a randomly picked element will lie above or below a particular score.
For example, if we know that 5% of the population has a heart rate above 86.5 beats/min, then the probability of one randomly selected person from this population having a heart rate above 86.5 beats/min will be 5%.
What is the probability that a randomly picked person would have a heart rate of less than 50 beats per minute in a population with a mean heart rate of 70 beats/min and an SD of 10?
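One way to work this question is to convert 50 beats/min to a z-score and then find the proportion of the curve below it. The sketch assumes that heart rates in this population are normally distributed, which the question implies but does not state.

```python
from statistics import NormalDist

mean_hr, sd_hr = 70, 10  # values given in the question
threshold = 50

# Step 1: convert the raw score to a z-score.
z = (threshold - mean_hr) / sd_hr

# Step 2: proportion of a normal distribution lying below that z-score.
# Assumption: heart rates are normally distributed.
probability = NormalDist(mean_hr, sd_hr).cdf(threshold)

print(z, round(probability, 4))
```

A heart rate of 50 lies two SDs below the mean (z = -2), so the probability is about 0.023, or roughly 2.3%. This is consistent with about 95.5% of values lying within two SDs of the mean, leaving about 2.3% in each tail.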