tao-hong
Review of the following topics:
- Population vs. sample
- Measurement scales
- Plotting data
- Mean & standard deviation
- Degrees of freedom
- Transforming data
- Normal distribution
- Howell (2002) Chap 1-3. ‘Statistical Methods for Psychology’

Population vs. sample
Population: an entire collection of measurements (e.g. reaction times, IQ scores, height, or even the height of male Goldsmiths students).
Sample: a smaller subset of observations taken from the population.
- The sample should be drawn randomly to make inferences about the population.
- Random assignment to groups improves validity.

Population vs. sample
In general:
- population parameters = Greek letters
- sample statistics = English letters
- worth learning a glossary of other symbols now to avoid later confusion (e.g. Σ = the sum of)

            Population            Sample
mean        μ (mu)                X̄ (X-bar)
variance    σ² (sigma squared)    s²

Measurement scales
Categorical or ‘Nominal’: e.g. male/female, or catholic/protestant/other
Continuous:
- Ordinal: e.g. private/sergeant/admiral
- Interval: e.g. temperature in Celsius
- Ratio: e.g. weight, height etc.

Plotting data
The basic rule is to select the plot which represents what you want to say in the clearest and simplest way.
- Avoid ‘chart junk’ (e.g. plotting in 3D where 2D would be clearer).
- Popular options include bar charts, histograms, pie charts etc. See any textbook. SPSS charts are discussed in the workshop.

Summary statistics
Two essential components of data are:
(i) the central tendency of the data &
(ii) the spread of the data (e.g. standard deviation)
Although the mean (central tendency) and standard deviation (spread) are most commonly used, other measures can also be useful.

Measures of central tendency
Mode: the most frequent observation, e.g. 1, 2, 2, 3, 4, 5 (mode = 2)
Median: the middle number of a dataset arranged in numerical order, e.g. 0, 1, 2, 5, 1000 (median = 2)
- average of the middle two numbers when there is an even number of scores
- relatively uninfluenced by outliers
Mean: the sum of the scores divided by the number of scores, $\bar{X} = \frac{\sum X}{N}$
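
As a quick check of these three measures, here is a minimal Python sketch using the standard-library `statistics` module on the two example datasets from the slide. The variable names are illustrative and not part of the original material.

```python
import statistics

scores = [1, 2, 2, 3, 4, 5]          # mode example from the slide
with_outlier = [0, 1, 2, 5, 1000]    # median example from the slide

mode_value = statistics.mode(scores)             # most frequent observation
median_value = statistics.median(with_outlier)   # middle value after sorting
mean_value = statistics.mean(with_outlier)       # sum of scores / number of scores

print(mode_value, median_value, mean_value)
```

The single outlier (1000) pulls the mean far above most of the scores, while the median is unaffected by how extreme that value is.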

Measures of dispersion
Several ways to measure the spread of data:
- Range (max-min), IQR or Inter-Quartile Range (middle 50%), Average Deviation, Mean Absolute Deviation
- Variance: the average of the squared deviations from the mean
Variance for a population of 3 scores (-10, 0, 10) is 66.67 (200/3).
Standard deviation is simply the square root of the variance.
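
The three-score example can be reproduced with a short Python sketch. The function name `population_variance` is a hypothetical helper written for this illustration; it divides by N because the three scores are treated as the whole population.

```python
import math


def population_variance(values):
    """Average of the squared deviations from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


population = [-10, 0, 10]
variance = population_variance(population)   # (100 + 0 + 100) / 3
std_dev = math.sqrt(variance)                # square root of the variance

print(round(variance, 2), round(std_dev, 2))
```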

Calculating sample variance
Population variance ($\sigma^2$) is the true variance of the population, calculated by:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
- this equation is used when we have all values in a population (unusual)
However, the variance of a sample calculated this way tends to be smaller than the variance of the population from which it was drawn. So, for the sample variance ($s^2$) we use this equation:
$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
The correction factor of ‘N-1’ increases the variance to be closer to the true population variance (in fact, the average of all possible sample variances exactly equals $\sigma^2$).
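
The claim about the average of all possible sample variances can be checked by brute force on a tiny population. The sketch below makes two assumptions that are not in the slides: it reuses the population (-10, 0, 10) and it draws every possible sample of size 2 with replacement. The `ddof` parameter name is also just an illustrative choice.

```python
from itertools import product


def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by (N - ddof)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


population = [-10, 0, 10]
sigma_squared = variance(population, ddof=0)   # true population variance

# Every possible sample of size 2 drawn with replacement (3 * 3 = 9 samples).
samples = list(product(population, repeat=2))

# Average sample variance when dividing by N (uncorrected) and by N - 1 (corrected).
mean_uncorrected = sum(variance(s, ddof=0) for s in samples) / len(samples)
mean_corrected = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(sigma_squared, mean_uncorrected, mean_corrected)
```

Under these assumptions, dividing by N gives an average that is only half the population variance, whereas dividing by N-1 recovers it.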

Degrees of freedom
Why is ‘N-1’ used to calculate sample variance?
When calculating sample variance, we first calculate the sample mean, which makes the last number in the dataset redundant, i.e. we lose a ‘degree of freedom’ (the last number is not free to vary).
e.g. M=10, sample data: 12, 9, 10, 11, 8
- Calculating the sample mean (10) means that we have already (implicitly) included the last number in our calculations.
- If we (knew and) used the population mean rather than the sample mean this would not be the case, so we could use N, not N-1.
- Howell illustrates this with a worked example (and a mathematical proof can be retrieved with an internet search).
The bottom line is that whenever we have to estimate a parameter from the sample (e.g. the mean), we lose a degree of freedom.
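
The ‘not free to vary’ idea can be made concrete with the sample from the slide: once the mean and the first four scores are known, the fifth score is fixed. A minimal Python sketch (variable names are illustrative):

```python
sample = [12, 9, 10, 11, 8]
n = len(sample)
sample_mean = sum(sample) / n            # M = 10

# Pretend the last score is unknown: the mean and the other N-1 scores determine it.
known_scores = sample[:-1]
implied_last = sample_mean * n - sum(known_scores)

print(implied_last)
```

Only N-1 = 4 of the scores can vary freely once the sample mean has been computed.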

Transforming data
One reason we might ‘transform’ data is to convert from one scale to another, e.g. feet into inches, centigrade into Fahrenheit, raw IQ scores into standard IQ scores.
Scale conversion can usually be achieved by a simple linear transformation (multiplying/dividing by a constant and adding/subtracting a constant):
$X_{new} = b \cdot X_{old} + c$
So to convert centigrade data into Fahrenheit we would apply the following:
$X_{F} = 1.8 \cdot X_{C} + 32$
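
A linear transformation is a one-line function. The sketch below applies it to the centigrade-to-Fahrenheit case, where b = 9/5 and c = 32. The function name and the three temperatures are made up for illustration.

```python
def linear_transform(values, b, c):
    """Apply X_new = b * X_old + c to every value."""
    return [b * x + c for x in values]


celsius = [0, 37, 100]   # illustrative temperatures in centigrade
fahrenheit = linear_transform(celsius, b=9 / 5, c=32)

print(fahrenheit)
```

The results are 32, about 98.6, and 212 degrees Fahrenheit (floating-point arithmetic may print a value such as 98.6 with a tiny rounding error).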

Transforming data
The Z-transform (standardisation) is one common type of linear transform, which produces a new variable with M=0 & SD=1:
$Z = \frac{X - M}{SD}$
- Standardisation is useful when comparing the same dimension measured on different scales (e.g. anxiety scores measured on a VAS and a questionnaire).
- After standardisation these scales could also be added together (adding two quantities on different scales is obviously problematic).
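
A hedged sketch of standardisation in Python follows. The anxiety scores are invented for illustration, and the helper uses the sample standard deviation (N-1), which is an assumption since the slide does not say which version is intended.

```python
import math


def mean_and_sd(values):
    """Return the mean and the sample standard deviation (N - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, sd


def z_scores(values):
    """Standardise: subtract the mean, then divide by the standard deviation."""
    mean, sd = mean_and_sd(values)
    return [(x - mean) / sd for x in values]


# Hypothetical anxiety scores for the same five people on two different scales.
vas = [20, 35, 50, 65, 80]            # visual analogue scale, 0-100
questionnaire = [8, 11, 14, 17, 20]   # questionnaire total

z_vas = z_scores(vas)
z_questionnaire = z_scores(questionnaire)

# Once both are in SD units, the two measures can be added per person.
combined = [a + b for a, b in zip(z_vas, z_questionnaire)]

print(mean_and_sd(z_vas))
```

Each standardised variable has a mean of 0 and an SD of 1 (up to floating-point rounding), so the two scales contribute equally to the combined score.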

Normal Distribution
Many real-life variables (height, weight, IQ etc.) are distributed in a roughly symmetric, bell-shaped way.
A mathematical equation mimics this normal (or Gaussian) distribution.

Normal Distribution
The mathematical normal distribution is useful as its known mathematical properties give us useful information about our real-life variable (assuming our real-life variable is normally distributed).
For example, scores about 2 standard deviations above the mean (1.96, to be precise) represent the extreme 2.5% of scores (calculus is used to derive this).
Consequently, a person with an IQ score of 130 (M=100, SD=15) would be in roughly the top 2.5% (assuming IQ is normally distributed).
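
These tail areas can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch assumes, as the slide does, that IQ is normally distributed with M=100 and SD=15.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Proportion of the distribution above an IQ of 130 (exactly 2 SD above the mean).
top_share = 1 - iq.cdf(130)

# IQ score that cuts off exactly the top 2.5% (about 1.96 SD above the mean).
cutoff_top_2_5 = iq.inv_cdf(0.975)

print(round(top_share, 4), round(cutoff_top_2_5, 1))
```

Exactly 2 SD above the mean leaves about 2.3% of scores in the upper tail, and the exact 2.5% cut-off falls just below an IQ of 130, which is why ‘2 SD’ is a common rule of thumb for the extreme 2.5%.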

Normal Distribution
Normality is an important assumption (though more about this next week). Violations of normality generally take two forms:
- SKEWNESS (an asymmetric distribution)
- KURTOSIS (a distribution that is too peaked or too flat)

Association: Bivariate Statistics
In Figure 3.3 (a), Y and X are positively but weakly correlated, while in Figure 3.3 (b) they are negatively and strongly correlated.
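
The covariance between two real-valued random variables X and Y, with means (expected values) $E(X) = \mu_X$ and $E(Y) = \mu_Y$, is
$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Cov(X, Y) can be negative, zero, or positive.
Random variables whose covariance is zero are called uncorrelated. Independent random variables are always uncorrelated, but uncorrelated variables are not necessarily independent.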

The covariance is one measure of how closely the values taken by two variables X and Y vary together.
If we have a series of n measurements of X and Y written as $X_i$ and $Y_i$, where i = 1, 2, ..., n, then the sample covariance can be used to estimate the population covariance between X = (X1, X2, …, Xn) and Y = (Y1, Y2, …, Yn). The sample covariance is calculated as
$s_{XY} = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$
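
A minimal Python sketch of a sample covariance follows. It assumes the usual n-1 denominator, matching the sample variance earlier in these notes; the function name and the data are made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of measurements")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]    # increases with x
y_falling = [10, 8, 6, 4, 2]   # decreases with x

cov_rising = sample_covariance(x, y_rising)     # positive
cov_falling = sample_covariance(x, y_falling)   # negative
cov_self = sample_covariance(x, x)              # equals the sample variance of x

print(cov_rising, cov_falling, cov_self)
```

The sign of the result shows the direction of the relationship, and the covariance of a variable with itself reduces to its sample variance.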