Lect 2 basic ppt

Review of the following topics:

- Population vs. sample
- Measurement scales
- Plotting data
- Mean & standard deviation
- Degrees of freedom
- Transforming data
- Normal distribution

- Howell (2002), Chap. 1-3, ‘Statistical Methods for Psychology’

Population vs. sample

Population: an entire collection of measurements (e.g. reaction times, IQ scores, height, or even the height of male Goldsmiths students).

Sample: a smaller subset of observations taken from the population. The sample should be drawn randomly in order to make inferences about the population. Random assignment to groups improves validity.
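As a rough illustration of drawing a random sample, the sketch below simulates a population and samples from it without replacement. The population values (heights with mean 175 and SD 7), the sample size of 50 and the seed are all made-up assumptions for illustration, not data from the lecture.

```python
import random

# Hypothetical population: 10,000 simulated heights in cm (assumed values).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.gauss(175, 7) for _ in range(10_000)]

# Simple random sample: every member has the same chance of selection.
sample = rng.sample(population, 50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean, which is why sample statistics are treated as estimates of population parameters.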

Population vs. sample

In general:

- population parameters = Greek letters
- sample statistics = English letters

It is worth learning a glossary of other symbols now to avoid later confusion (e.g. Σ = the sum of).

            Population            Sample
mean        μ (mu)                X̄
variance    σ² (sigma squared)    s²

Measurement scales

Categorical or ‘Nominal’
- e.g. male/female, or catholic/protestant/other

Continuous
- Ordinal: e.g. private/sergeant/admiral
- Interval: e.g. temperature in Celsius
- Ratio: e.g. weight, height etc.

Plotting data

- The basic rule is to select the plot which represents what you want to say in the clearest and simplest way.
- Avoid ‘chart junk’ (e.g. plotting in 3D where 2D would be clearer).
- Popular options include bar charts, histograms, pie charts etc. (see any textbook). SPSS charts are discussed in the workshop.

Summary statistics

Two essential components of data are:
(i) the central tendency of the data, and
(ii) the spread of the data (e.g. standard deviation)

Although the mean (central tendency) and standard deviation (spread) are most commonly used, other measures can also be useful.

Measures of central tendency

- Mode: the most frequent observation, e.g. 2 in the dataset 1, 2, 2, 3, 4, 5
- Median: the middle number of a dataset arranged in numerical order, e.g. 2 in the dataset 0, 1, 2, 5, 1000 (the average of the middle two numbers when there is an even number of scores); relatively uninfluenced by outliers
- Mean: the sum of the scores divided by the number of scores, $\bar{X} = \frac{\sum X}{N}$
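The three measures can be checked on the example datasets above with Python's standard `statistics` module. This is only a small sketch; the variable names are illustrative.

```python
from statistics import mean, median, mode

mode_data = [1, 2, 2, 3, 4, 5]
median_data = [0, 1, 2, 5, 1000]

most_frequent = mode(mode_data)       # 2 occurs twice, all others once
middle_value = median(median_data)    # third of five ordered values
average = mean(median_data)           # pulled upwards by the outlier 1000

# With an even number of scores, the median averages the middle two.
even_median = median([0, 1, 2, 5])

print(most_frequent, middle_value, average, even_median)
```

For the dataset with the outlier, the median stays at 2 while the mean is 201.6, which illustrates why the median is described as relatively uninfluenced by outliers.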

Measures of dispersion

Several ways to measure the spread of data:

- Range (max - min), IQR or Inter-Quartile Range (the middle 50%), Average Deviation, Mean Absolute Deviation
- Variance: the average of the squared deviations from the mean
- The variance for a population of 3 scores (-10, 0, 10) is 66.67 (200/3)
- Standard deviation is simply the square root of the variance (here about 8.16)
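The three-score example can be reproduced with the standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N (population formulas). A minimal sketch:

```python
from statistics import pstdev, pvariance

scores = [-10, 0, 10]

value_range = max(scores) - min(scores)   # max - min
pop_variance = pvariance(scores)          # (100 + 0 + 100) / 3
pop_sd = pstdev(scores)                   # square root of the variance

print(value_range, round(pop_variance, 2), round(pop_sd, 2))
```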

Calculating sample variance

Population variance ($\sigma^2$) is the true variance of the population, calculated by

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

This equation is used when we have all the values in a population (unusual).

However, the variance of a sample ($s^2$), if calculated in the same way, tends to be smaller than that of the population from which it was drawn. So, we use this equation:

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$

The correction factor of ‘N-1’ increases the variance to be closer to the true population variance (in fact, the average of all possible sample variances exactly equals $\sigma^2$).
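The claim about the average of all possible sample variances can be checked by brute force on a tiny population. The sketch below reuses the three scores (-10, 0, 10) and enumerates every sample of size 2 drawn with replacement. The choice of sample size and of sampling with replacement are assumptions made to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product

population = [-10, 0, 10]


def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


# True population variance: divide by N.
pop_var = variance(population, ddof=0)

# Every possible ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average sample variance with the N-1 correction and without it.
avg_corrected = sum(variance(s, ddof=1) for s in samples) / len(samples)
avg_uncorrected = sum(variance(s, ddof=0) for s in samples) / len(samples)

print(pop_var, avg_corrected, avg_uncorrected)
```

Exact fractions are used so the equality is not blurred by rounding: the corrected average matches the population variance of 200/3, while dividing by N gives only 100/3.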

Degrees of freedom

Why is ‘N-1’ used to calculate sample variance?

When calculating sample variance, we first calculate the sample mean, which makes the last number in the dataset redundant, i.e. we lose a ‘degree of freedom’ (the last number is not free to vary).

e.g. M=10, sample data: 12, 9, 10, 11, 8

Calculating the sample mean (10) means that we have already (implicitly) included the last number in our calculations. If we knew and used the population mean rather than the sample mean, this would not be the case, so we could use N rather than N-1.

Howell illustrates this with a worked example (a mathematical proof can be found with an internet search).

Bottom line: whenever we have to estimate a parameter (e.g. the mean) from the sample, we lose a degree of freedom.
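The ‘not free to vary’ idea can be made concrete with the sample data above: once the mean and the first four scores are known, the fifth score is fixed. A short sketch, with illustrative variable names:

```python
data = [12, 9, 10, 11, 8]
n = len(data)
sample_mean = sum(data) / n                      # M = 10

# Deviations from the sample mean always sum to zero.
deviations = [x - sample_mean for x in data]
total_deviation = sum(deviations)

# Knowing the mean and the first N-1 scores determines the last score.
first_four = data[:-1]
implied_last = n * sample_mean - sum(first_four)

# Sample variance therefore divides by the N-1 free deviations.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

print(total_deviation, implied_last, sample_variance)
```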

Transforming data

One reason we might ‘transform’ data is to convert from one scale to another, e.g. feet into inches, centigrade into Fahrenheit, or raw IQ scores into standard IQ scores.

Scale conversion can usually be achieved by a simple linear transformation (multiplying/dividing by a constant and adding/subtracting a constant):

$$X_{new} = b \cdot X_{old} + c$$

So to convert centigrade data into Fahrenheit we would apply the following (b = 1.8, c = 32):

$$F = 1.8 \cdot C + 32$$
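A linear transformation is easy to express as a small function. The sketch below uses the standard centigrade-to-Fahrenheit constants (multiply by 9/5, add 32) and, for feet to inches, b = 12 and c = 0. The function names are illustrative.

```python
def linear_transform(x_old, b, c):
    """Apply X_new = b * X_old + c to a single value."""
    return b * x_old + c


def celsius_to_fahrenheit(celsius):
    # b = 9/5 (i.e. 1.8) and c = 32
    return linear_transform(celsius, b=9 / 5, c=32)


def feet_to_inches(feet):
    # Pure rescaling: b = 12, c = 0
    return linear_transform(feet, b=12, c=0)


temperatures_c = [0, 37, 100]
temperatures_f = [celsius_to_fahrenheit(t) for t in temperatures_c]
print(temperatures_f)
print(feet_to_inches(6))
```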

Transforming data

The Z-transform (standardisation) is one common type of linear transform, which produces a new variable with M=0 & SD=1:

$$z = \frac{X - \bar{X}}{s}$$

Standardisation is useful when comparing the same dimension measured on different scales (e.g. anxiety scores measured on a VAS and on a questionnaire).

After standardisation these scales could also be added together (adding two quantities on different scales is obviously problematic).
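Standardisation can be sketched in a few lines: subtract the mean from each score and divide by the standard deviation. The example below reuses the scores 12, 9, 10, 11, 8 and assumes the sample (N-1) standard deviation. The function name is illustrative.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values: subtract the mean, divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (N-1 denominator)
    return [(x - m) / s for x in values]


raw = [12, 9, 10, 11, 8]
z = z_scores(raw)

# The standardised variable has mean 0 and SD 1 (up to rounding error).
print([round(value, 2) for value in z])
print(round(mean(z), 6), round(stdev(z), 6))
```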

Normal Distribution

Many real-life variables (height, weight, IQ etc.) are distributed like this: a symmetric, bell-shaped curve.

A mathematical equation, the normal (or Gaussian) distribution, mimics this shape.

Normal Distribution

The mathematical normal distribution is useful because its known mathematical properties give us useful information about our real-life variable (assuming our real-life variable is normally distributed).

For example, scores more than about 2 standard deviations above the mean (more precisely, 1.96) represent the extreme 2.5% of scores (this is derived by integrating the normal curve).

Consequently, a person with an IQ score of 130 (M=100, SD=15) would be in roughly the top 2.5% (assuming IQ is normally distributed).
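The IQ example can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This sketch assumes IQ is exactly normal with M=100 and SD=15.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Proportion of the distribution above an IQ of 130 (2 SDs above the mean).
proportion_above_130 = 1 - iq.cdf(130)

# IQ score that cuts off exactly the top 2.5% (about 1.96 SDs above the mean).
cutoff_top_2_5_percent = iq.inv_cdf(0.975)

print(f"above 130: {proportion_above_130:.4f}")
print(f"top 2.5% cutoff: {cutoff_top_2_5_percent:.1f}")
```

The proportion above 130 comes out at about 2.3%, slightly less than 2.5%, because the 2.5% cutoff sits at 1.96 rather than exactly 2 standard deviations.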

Normal Distribution

Normality is an important assumption (more about this next week). Violations of normality generally take two forms:

- SKEWNESS (the distribution is asymmetric, with one tail longer than the other)
- KURTOSIS (the distribution is more peaked or flatter than the normal curve)

Association: Bivariate Statistics

In Figure 3.3 (a), Y and X are positively but weakly correlated, while in Figure 3.3 (b) they are negatively and strongly correlated.

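The covariance between two real-valued random variables X and Y, with means (expected values) $E(X) = \mu_X$ and $E(Y) = \mu_Y$, is

$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

Cov(X, Y) can be negative, zero, or positive. Random variables whose covariance is zero are called uncorrelated. Independent variables are always uncorrelated, but uncorrelated variables are not necessarily independent.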

The covariance is one measure of how closely the values taken by two variables X and Y vary together.

If we have a series of n measurements of X and Y, written as $X_i$ and $Y_i$ where i = 1, 2, ..., n, then the sample covariance can be used to estimate the population covariance between X = (X1, X2, …, Xn) and Y = (Y1, Y2, …, Yn). The sample covariance is calculated as

$$\mathrm{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$
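A minimal sketch of the sample covariance, assuming the usual n - 1 denominator (matching the sample variance earlier in the lecture). The function name and the small datasets are made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance of paired measurements, using an n - 1 denominator."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]      # increases with x
y_falling = [10, 8, 6, 4, 2]     # decreases as x increases

cov_positive = sample_covariance(x, y_rising)
cov_negative = sample_covariance(x, y_falling)
cov_with_itself = sample_covariance(x, x)   # equals the sample variance of x

print(cov_positive, cov_negative, cov_with_itself)
```

The sign of the result shows the direction of the relationship, and the covariance of a variable with itself reduces to its sample variance.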
