Lecture 3: Chi-Square, correlation and your dissertation proposal

• Non-parametric data: the Chi-Square test

• Statistical correlation and regression: parametric and non-parametric tests

• Break

• Regression in SPSS

• Writing a dissertation proposal when you plan to use statistics

• Exercises, assessment and assistance

Non-parametric statistics

• Non-parametric statistics in human geography

• Different types of non-parametric test:
– 1 sample
– 2 independent samples
– 2 tied samples
– 3 or more samples

The Chi-Square test

• Most versatile test in social science

• Can be used to examine nominal data, ordinal data and interval/ratio data in groups

• It makes no assumptions about the distribution of the data, although each case should be counted in only one cell

Theory of Chi-Square

• The test examines the difference between observed counts and expected values

• Suppose we wanted to compare the age groups in our sample with the same age groups in the UK population, or to compare age groups across two or three samples

• Chi-Square can examine these differences

The Chi-Square Equation

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

One way Chi-Square test

• Examines whether there is a difference between one sample and a population

• We can assume either that the expected counts will be equal between categories or that we know the proportions

• But, before we do the test, we have to cross-tabulate the data

The Cross-tabulation

Age      18-30   31-50   51-65   Over 65   Total
North    30      20      35      55        140
Total    30      20      35      55        140

The expected counts

• Expected counts relate to either equal proportions or previously known proportions (e.g. from a population)

• These are then compared to observed counts and the difference is calculated

• A significance level is selected and the null hypothesis is accepted or rejected

The Contingency Table

Age        18-30   31-50   51-65   Over 65   Total
North      30      20      35      55        140
Expected   35      35      35      35        140
Total      30      20      35      55        140

The test result

• Chi-Square is calculated by summing, over every cell, the squared difference between the observed and expected counts divided by the expected count

• Assessed as for other statistical tests

• χ² = 18.6 (p < 0.05)
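The one-way calculation can be worked through by hand or in a few lines of code. The following is a minimal sketch in Python (not the SPSS procedure used in the lecture), assuming equal expected counts across the four age groups, as in the contingency table above. The critical value is the standard table value for 3 degrees of freedom at the 0.05 level.

```python
# Observed counts from the cross-tabulation (North sample, four age groups).
observed = [30, 20, 35, 55]

# Assumption: equal expected counts across the four categories (140 / 4 = 35).
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Chi-Square: sum over categories of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for a one-way test: number of categories minus 1.
df = len(observed) - 1

# Standard table value of Chi-Square at p = 0.05 with 3 degrees of freedom.
CRITICAL_0_05_DF3 = 7.815

print(f"Chi-Square = {chi_square:.2f}, df = {df}")
print("Significant at 0.05:", chi_square > CRITICAL_0_05_DF3)
```

If the expected proportions are known from a population instead, the `expected` list would be replaced by those proportions multiplied by the sample total.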

Two way Chi-Square tests

• Very often, we want to go beyond comparing one sample with a population, for example by comparing it with another sample, or by comparing three or more samples

• Two way Chi-Square allows us to do this easily

• Again, we cross-tabulate the data

The Contingency table

Age      18-30   31-50   51-65   Over 65   Total
North    25      25      35      55        140
South    40      35      25      20        120
Total    65      60      60      75        260

Two-way analysis

• Chi-Square calculates expected values by multiplying the row and column totals and dividing by the grand total

• Expected values represent the number in each category which, given the sample sizes and distribution, we would expect to see in each cell
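The expected-value rule described above (row total × column total ÷ grand total) can be sketched as follows. This is an illustrative Python version using the North/South counts from the contingency table. The variable names are my own and this is not how SPSS presents the calculation.

```python
# Observed counts from the two-way contingency table (age groups by region).
observed = {
    "North": [25, 25, 35, 55],
    "South": [40, 35, 25, 20],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected count for each cell: row total * column total / grand total.
expected = {
    name: [row_totals[name] * c / grand_total for c in col_totals]
    for name in observed
}

# Chi-Square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for name in observed
    for o, e in zip(observed[name], expected[name])
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(col_totals) - 1)

for name, values in expected.items():
    print(name, [round(v, 1) for v in values])
print(f"Chi-Square = {chi_square:.1f}, df = {df}")
```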

The Chi-Square result

• Chi-Square gives the result and we evaluate the test with the use of significance tests

• χ² = 21.7 (p < 0.05)

• But, we can only state that there is a difference - not what the difference is. For example, does our sample from the north have more older people in it?

• We must examine the relative proportions of the contingency table to find this out
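One simple way to examine the relative proportions is to convert each row of the contingency table to percentages. The sketch below does this in Python for the North/South table. It is only an illustration of the idea, and the names used are assumptions.

```python
age_groups = ["18-30", "31-50", "51-65", "Over 65"]

# Observed counts from the two-way contingency table.
observed = {
    "North": [25, 25, 35, 55],
    "South": [40, 35, 25, 20],
}

# Row percentages: each cell as a share of its own sample.
row_percentages = {
    name: [100 * count / sum(counts) for count in counts]
    for name, counts in observed.items()
}

for name, percentages in row_percentages.items():
    cells = ", ".join(
        f"{group}: {pct:.1f}%" for group, pct in zip(age_groups, percentages)
    )
    print(f"{name} -> {cells}")
```

Comparing the rows shows where the samples differ: the share of over-65s is much larger in the North sample than in the South sample.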

The expected counts problem

• Chi-Square stipulates that no more than 20% of the expected counts in an analysis may be under 5. If more than 20% are, the test is invalid

• So, how can we get over this problem?

Recoding variables

• We can aggregate suitable variables to make the number of groups smaller

• Aggregating only works with ordinal data

• This reduces the number of groups and so reduces the likelihood of obtaining expected counts below 5

• We can also use this to make interval/ratio data into groups
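The effect of recoding on the expected counts rule can be checked directly. The sketch below uses a small, entirely hypothetical two-sample table (the counts are invented for illustration) and hypothetical helper functions. It merges adjacent age groups and compares the share of expected counts under 5 before and after.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def share_below_five(table):
    """Proportion of cells whose expected count is under 5."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < 5 for e in cells) / len(cells)


def merge_columns(table, groups):
    """Aggregate columns; each group is a tuple of column indices to add up."""
    return [[sum(row[i] for i in group) for group in groups] for row in table]


# Hypothetical small survey: two samples, four ordered age groups.
small_sample = [
    [2, 6, 7, 3],
    [3, 5, 9, 4],
]

before = share_below_five(small_sample)

# Recode four age groups into two by merging adjacent categories.
recoded = merge_columns(small_sample, [(0, 1), (2, 3)])
after = share_below_five(recoded)

print(f"Share of expected counts under 5 before recoding: {before:.0%}")
print(f"Share of expected counts under 5 after recoding:  {after:.0%}")
```

Only adjacent categories are merged here, which keeps the recoded groups meaningful for ordered data such as age bands.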

Chi-Square: Qualifications

• You should have no fewer than 20 cases

• As stated above, not more than 20% of cells should have expected values under 5

• You should not necessarily ignore a contingency table, even if the Chi-Square test is invalid

• Remember, above all, that Chi-Square is a test of difference, not correlation

Statistical correlation: relationships among variables

• Relationships are concerned with the extent to which variable A is related to B

• This is termed correlation

• Correlation does not necessarily imply causation, but merely a possible relationship

• There are parametric and non-parametric tests of correlation

Types of correlation

• Perfect positive correlation: +1

• Perfect negative correlation: -1

• Linear relationship

• No correlation: 0

• Non-linear relationship

[Figure: scatter plot, both axes running from 0 to 20]

Parametric correlation: Pearson’s r

• Assumes your data are on interval/ratio scales AND are normally distributed

• Measured on a scale from -1 to +1

• This result shows the strength of the relationship

• The test must be judged by its significance (as for other parametric tests: is p below or above 0.05?)
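To make the coefficient concrete, here is a minimal sketch of Pearson's r in plain Python. The function name and the data are hypothetical, invented purely for illustration. The sketch computes the coefficient only, and the significance test would still be read from the statistics package.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y scaled by their own variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical interval/ratio data, e.g. distance travelled and weekly trips.
distance = [2, 4, 6, 8, 10]
trips = [3, 5, 4, 9, 11]

print(f"r = {pearson_r(distance, trips):.2f}")
```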

Non-parametric correlation: Spearman’s rs

• Assumes ordinal data, or interval/ratio data that are not normally distributed

• Data are ranked for the test

• Measured as for Pearson’s

• Significance as for Pearson’s
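The ranking step can be sketched as follows: each variable is converted to ranks (tied values share the average rank) and the Pearson formula is then applied to the ranks. This is an illustrative Python version with hypothetical function names and invented data, not the output of any particular package.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r, reused here on the ranked data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


def spearman_rs(xs, ys):
    """Spearman's rs: Pearson's r calculated on the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical ordinal data: two 1-5 agreement scales from six respondents.
attitude = [1, 2, 2, 3, 4, 5]
behaviour = [2, 1, 3, 3, 5, 4]

print(f"rs = {spearman_rs(attitude, behaviour):.2f}")
```

Because only the ranks matter, a relationship that rises steadily but not in a straight line can still give a coefficient of +1.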

From correlation to explanation: regression analysis

• Regression seeks to examine the nature of the relationship between one or more independent variables and a dependent variable

• It is concerned with prediction, not just correlation

• To predict, there is an equation which describes the ‘line of best fit’ between variables

The Line of best fit

• Line of best fit ‘fits’ a straight line through the data points you observe

• Can be expressed by:

Y = mx + c

Where:

Y = Dependent variable

c = constant (intercept)

m = slope gradient

x = independent variable

[Figure: scatter plot with fitted line y = 0.9677x + 0.5895; both axes running from 0 to 20]
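The slope m and intercept c of the line of best fit are usually found by ordinary least squares. The sketch below assumes that method (the slides do not name it) and uses invented data and a hypothetical function name.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c in y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    m = sum_xy / sum_xx
    c = mean_y - m * mean_x  # the fitted line passes through the two means
    return m, c


# Hypothetical observations of an independent (x) and dependent (y) variable.
x_values = [1, 3, 5, 7, 9]
y_values = [2, 3, 6, 7, 10]

m, c = fit_line(x_values, y_values)
print(f"y = {m:.4f}x + {c:.4f}")
```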

Predicting using the regression equation

• You can use the equation to predict levels of Y for given levels of X

• This is often of use when looking at different outcome situations
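Prediction is simply a matter of substituting a value of x into the fitted equation. As a small illustration, the sketch below uses the equation from the line-of-best-fit slide, y = 0.9677x + 0.5895. The function name and the chosen x values are assumptions.

```python
# Coefficients taken from the fitted line shown earlier in the lecture.
SLOPE = 0.9677
INTERCEPT = 0.5895


def predict(x):
    """Predicted level of Y for a given level of X."""
    return SLOPE * x + INTERCEPT


# Compare predicted outcomes for a few different levels of X.
for x in (5, 10, 15):
    print(f"x = {x:>2} -> predicted y = {predict(x):.2f}")
```

Predictions are safest within the range of x values that were actually observed (roughly 0 to 20 in the plot).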

Interpreting regression results

• R²: the ‘goodness of fit’ that the model offers, i.e. the proportion of variation in the dependent variable that the model explains, expressed in per cent

• F: the significance of the model

• The regression coefficients and associated p values
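To show where R² and F come from, the following sketch computes both for a simple (one independent variable) regression. The data and function names are hypothetical. The F formula shown applies only to the single-predictor case, and the p values would still be taken from SPSS output.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = m*x + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    m = sum_xy / sum_xx
    return m, mean_y - m * mean_x


def r_squared(xs, ys):
    """Share of the variation in y explained by the fitted line."""
    m, c = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


def f_statistic(r2, n):
    """F for a regression with one independent variable and n cases."""
    return r2 * (n - 2) / (1 - r2)


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys)
f = f_statistic(r2, len(xs))
print(f"R squared = {r2:.0%}, F = {f:.2f}")
```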

Regression: assumptions

• Your data:
– Are measured on interval/ratio scales;
– Are normally distributed;
– And are therefore Parametric; and...
– Have a linear relationship

• You can use other techniques for non-linear regression and regression with nominal/ordinal variables

Is any of this relevant to me?

• YES - you have to write a dissertation proposal

• Saying you will ‘analyse’ the data using appropriate methods is not enough

• You will get a far higher mark if you follow these simple steps in the next two months when preparing your proposal:

Writing your Dissertation Proposal: key points

• Do you need to use a questionnaire/other quantitative instrument?

• If yes, what key questions are you posing?

• ALWAYS relate these questions to your plans for analysis

• How will you analyse these collected data to meet your aims and objectives?

Writing your proposal

• Methodology: Quantitative/qualitative?

• Questionnaire: Type: closed/open/both?

• Questions: Yes/no; frequency; categorical; multiple response?

• Data this will yield: Parametric/non-parametric?

• Analysis types: Description, differences, relationships?

• Analysis tools: Parametric/non-parametric?

Example of this process

Section of proposal    Abstract example    Specific example

Methodology            Quantitative        A questionnaire

Questionnaire          Type                Closed, with one open question

Questions              Dichotomous         Yes/no, M/F, etc.
                       Categorical         Family type, etc.
                       Agreement           Attitude questions
                       Frequency           Behaviour questions
                       Write-in answers    Age

Data this will yield   Parametric          Age
                       Non-parametric      All other variables

Analysis types         Description         Describe attitudes
                       Differences         Difference in attitudes (e.g. M/F)
                       Relationships       Correlation of attitudes/behaviour

Analysis tools         Visual              Bar and pie graphs, Pareto charts
                       Parametric          t-tests, ANOVA, Pearson’s r
                       Non-parametric      Chi-Square, Spearman’s rs

A final word

• Think carefully about your questionnaire - can you meet the objectives you have set yourself?

• Do you need to use every statistical test?

• Assessments (all 3) due in on 6 May

• Where can you get help?
– Friday 14th March, 9-11am;
– Monday 28th April, 11am-1pm

• E-mail: S.W.Barr@exeter.ac.uk
