Psychological Statistics 88

Embed Size (px)

Citation preview

  • psychological statistics

    B Sc. Counselling Psychology2011 Admission onwards

    III SEMESTER

    COMPLEMENTARY COURSE

    UNIVERSITY OF CALICUTSCHOOL OF DISTANCE EDUCATION

    CALICUT UNIVERSITY.P.O., MALAPPURAM, KERALA, INDIA 673 635

    psychological statistics

    B Sc. Counselling Psychology2011 Admission onwards

    III SEMESTER

    COMPLEMENTARY COURSE

    UNIVERSITY OF CALICUTSCHOOL OF DISTANCE EDUCATION

    CALICUT UNIVERSITY.P.O., MALAPPURAM, KERALA, INDIA 673 635

    psychological statistics

    B Sc. Counselling Psychology2011 Admission onwards

    III SEMESTER

    COMPLEMENTARY COURSE

    UNIVERSITY OF CALICUTSCHOOL OF DISTANCE EDUCATION

    CALICUT UNIVERSITY.P.O., MALAPPURAM, KERALA, INDIA 673 635

  • School of Distance Education

    Psychological Statistics Page 2

    UNIVERSITY OF CALICUTSCHOOL OF DISTANCE EDUCATION

    STUDY MATERIALB Sc. COUNSELLING PSYCHOLOGY

    III SemesterCOMPLEMENTARY COURSE

    PSYCHOLOGICAL STATISTICS

    Prepared by:Dr.Vijayakumari.K,Associate Professor,Farook Teacher Training College,Farook College.P.O.Feroke

    Scrutinised by:Prof.C.Jayan,Department of Psychology,University of Calicut.

    Layout & SettingsComputer Section, SDE

    Reserved

  • School of Distance Education

    Psychological Statistics Page 3

    CONTENTS

    MODULE I CORRELATION 04-13

    MODULE II PARAMETRIC AND NON-PARAMETRIC TESTS 14-20

    MODULE III MEDIAN TEST 21-26

  • School of Distance Education

    Psychological Statistics Page 4

    MODULE ICORRELATION

    Objectives:The student will be

    - acquainted with knowledge of correlation, types of correlation and differentmethods of calculating correlation;

    - able to calculate coefficient of correlation using different methods.- able to draw scatter diagram and interpret it.Correlation

    In Psychology, there are situations where two or more variables are involved.Some of them will be related to each other and some will be independent. To knowwhether these variables are related or not, one has to be thorough with the concept ofcorrelation.

    Correlation is a statistical technique that is used to measure and describerelationship between two variables. Two variables are said to be related, if for a changein one variable, there is a change in the other. That is, the two quantities vary in such away that movements in one are accompanied by movements in the other. For example,the scholastic achievement and general intelligence of a child will be related to eachother. The extent of relationship and the nature of relationship will be measured usinga correlation coefficient.

    If an increase in one variable brings a corresponding increase in the othervariable or a decrease in one variable brings a corresponding decrease in the othervariable, the two variables are said to be positively related.

    For example, it is supposed that as height of individuals increase weight alsoincreases. Hence the two variables height and weight are positively related.

    If increase (or decrease) in one variable brings a corresponding decrease (orincrease) in the other variable the two variable are negatively related.

    The relation between performance of an individual in a test and test anxiety willbe negative as for an increase in test anxiety, there will be a decrease in the performanceor vice versa.

    If there is no corresponding change in one variable for the changes in the othervariable, the two variables are said to be not related or there is zero correlation betweenthe variables.

    The height of an individual and his intelligence are independent (or not relatedto each other) as there will not be any changes in intelligence accompanied withchanges in height or vice versa.

    It is to be noted that correlation gives idea about the relation between thevariables, but not the causation. That is, the change in one variable is not the cause forchange in the other variable, but due to some reasons, the two variables vary together.

  • School of Distance Education

    Psychological Statistics Page 5

    The relation between two variables can be linear or curvilinear. If the relationbetween two variables can be represented graphically by a straight line, the relation islinear. If the graph is not a straight line, but a curve, the relation is curvilinear.

    In other words, if the amount of change in one variable tends to bear constantratio to the amount of change in the other variable, then the correlation is said to belinear. If the change in one variable does not bear a constant ratio to the amount ofchange in the other variable, the relation is non-linear or curvilinear.

    In common purposes, usually linear correlation is calculated, and it reveals howthe change in one variable is accompanied by a change in the other variable.

    If two variables are linearly related, the degree of relationship and the nature ofrelation can be measured using coefficient of correlation. It is a ratio which expressesthe extent to which changes in one variable are accompanied by changes in the othervariable.

    A coefficient of correlation ranges from -1 to +1.Two important methods of computing coefficient of correlation are

    1. Pearson's Product Moment method and2. Spearman's Rank different method.Pearson's Product Moment Method

    This method is the most common method of calculating the linear relationshipbetween the variables (usually with interval or ratio measure)

    Pearson's Coefficient of Correlation

    r =

    22YYXX

    YYXX

    WhereY&X are the arithmetic means of the two sets of values X and Y respectively.

    Machine formula for Pearson's coefficient of correlation is

    2222 YYNXXN

    YXXYNr

    Spearman's Rank difference methodPearson's product moment correlation is used when the variables are in interval

    or ratio scale and if the relation between them is linear. But in practice one may facesituation when the data are not in interval or ratio scale or the expected relationshipdoes not fit a linear form. Spearman's method is one alternative to Pearson's methodwhen the variables are in ordinal form (When X and Y values are ranks) or when theexpected relationship is not linear. (When one wants to measure the consistency of arelationship between X and Y, independent of the specific form of the relationship).

  • School of Distance Education

    Psychological Statistics Page 6

    Spearman's coefficient of correlation

    1nn D61 22

    Where D is the difference between X rank and the Y rank for

    each individual and n-number of individuals.If X and Y are not given as ranks, convert X and Y into ranks. Rank can be

    assigned by taking either highest value as 1 or the lowest value as 1, but the samemethod must be followed while ranking the other variable.

    If two or more entries are equal, each entry is given an average rank. Forexample if two scores are ranked equal at 6th place, they are each given the rank

    5.62

    76 , giving the 8th rank for the next item. If three scores are at 6th place, theaverage rank 7

    3876 will be given to the three values, the next item with rank 9.

    Interpretation of correlation co-efficientA correlation coefficient tells whether there is any relation between the variables

    and if a relationship exists between variables, is it positive or negative. It also indicatesthe degree of relationship, ie., closeness of relationship.

    Thus a correlation coefficient can be interpreted with respect to1) The sign : If the coefficient of correlation computed is positive, there is a positive

    relationship between the variables and if it is negative, the relationship betweenthe variables is negative.

    2) The magnitude: The value ranges from -1 to +1. One way of interpreting thecorrelation coefficient is,

    0 - zero relation or no relationship.0.00- 0.20 - Negligible to low relation0.20 - .40 - Low to moderate relation.0.40 - .60 - Moderate to high relation.0.60 - .80 - High to very high relation0.80 - 1 - Very high to perfect relation.

    IllustrationPearson's Method

    Following are the marks obtained by five students in two tests. CalculatePearson's Product moment coefficient of correlation and interpret it.

    Sl. No. Test 1 Test 21 5 102 3 123 6 144 8 85 4 10

  • School of Distance Education

    Psychological Statistics Page 7

    This can be written as

    X Y XX YY 2XX 2YY YYXX 5 10 -.2 -.8 0.04 0.64 .163 12 -2.2 1.2 4.84 1.44 -2.646 14 0.8 3.2 0.64 10.24 2.568 8 2.8 -2.8 7.84 7.84 -7.844 10 -1.2 -0.8 1.44 0.64 0.96

    14.80 20.80 -6.80

    2.5526

    548635X

    8.105

    545

    108141210Y

    r =

    22YYXX

    YYXX

    = 377.084.307

    80.680.2080.14

    80.6 x

    r = -0.387Negative sign of 'r' indicates a negative correlation between the two sets of test scores.That is, for an increase in the first test score, there will be a decrease in the second testscore or vice versa.

    The magnitude is 0.387 which is in between 0.20 and 0.40 and hence therelationship can be considered as moderate or substantial.Thus, the two sets of test scores are negatively related and the relationship is moderate.Using Machine formula

    X Y X2 Y2 XY5 10 25 100 503 12 9 144 366 14 36 196 848 8 64 64 644 10 16 100 4026 54 150 604 274

  • School of Distance Education

    Psychological Statistics Page 8

    r =

    2222 YYNXXN

    YXXYN

    = 29163020686750 14041370546045261505 52262745 22 xx xx= 387.07696

    3410474

    3429163020676750

    14041370

    x

    Illustration 2Spearman's Rank difference method

    The ranks given by two judges for ten items of a Thurston's scale of attitude aregiven below. Find the correlation between the two ranks.

    Item Judge 1 Judge 2Item 1 9 8Item 2 8 6Item 3 5 7Item 4 1 2Item 5 4 3Item 6 6 5Item 7 2 1Item 8 3 4Item 9 10 9Item 10 7 10

    R1 R2 D=R1-R2 D2

    9 8 1 18 6 2 45 7 -2 41 2 -1 14 3 1 16 5 1 12 1 1 13 4 -1 110 9 1 17 10 -3 9

    24

  • School of Distance Education

    Psychological Statistics Page 9

    p = 10nHere1nn D61 22

    p = 145.19901441

    99102461

    x

    x

    = 0.855The correlation coefficient obtained is 0.855. The positive sign indicates the

    direction of correlation, that is the two sets of ranks are positively related. Themagnitude is 0.855 which denote a high correlation between the two sets of data.Hence the two judges have agreement in the ranks given to the items.Illustration 3

    X Y R R1 D=R1-R2 D2

    5 10 5.5 5 .5 .253 12 8.5 2 6.5 42.256 14 4 1 3 98 8 2 8.5 -6.5 42.254 10 7 5 2 45 9 5.5 7 -1.5 2.253 10 8.5 5 3.5 12.259 8 1 8.5 -7.5 56.257 11 3 3 0 02 7 10 10 0 0

    168.5

    Here the value 5 repeats two times in X-set and the rank for the two values willbe five and six. Hence the average ie 5.5

    265 is given as rank for both the values.

    The next lower value 4 is given the rank '7'.Similarly, '3' repeats twice and the corresponding ranks are 8 and 9, the rank

    given is 5.82

    98 for the 3's. The next lower value 2 is given the rank 10.In the second set of values (y - values), '10' occurs three times occupying the

    positions 4, 5 and 6. Hence the average 53

    654 is given as rank for 10, the nextlower value '9' getting the rank '7'. Similarly, '8' has got the rank 8.5

    p = 1nn D61 22

    = 99010111)110010

    5.16861 x

    = 1 - 1.0212= -0.02

  • School of Distance Education

    Psychological Statistics Page 10

    The obtained value shows that the two sets of values are negatively related but therelationship is negligible.Scatter diagram

    Scatter diagram or scatter plot is the graphical representation of correlationbetween two variables. It gives the pattern of relationship between the two variables ata glance itself.

    To construct a scatter diagram the following steps can be followed. Draw the axes and decide which variable goes on which axis.

    Two perpendicular axes are drawn and one variable is taken on the horizontalaxis and the other on the vertical axis.

    Determine the range of values for each variable and mark them on the axes.Usually the point at which the axes meet is taken as 'zero' and the points are

    marked on the axes by taking an appropriate scale. The lowest and highest values ofboth variables must be included in the respective axes. It is conventional that thescatter diagrams are roughly square in shape and hence the scales must be taken so thatthe horizontal and vertical axes have almost the same length (1:1 ratio).

    Mark a 'dot' for each pair of scores.Plot points on the paper by taking the score of the variable on horizontal axis as

    the X-co-ordinate and that of the vertical axis as the Y-co-ordinate. Each pair of scorescan be represented as a point by following this method.Examples

    The scores obtained by 10 individuals in two tests are given below. Draw ascatter diagram.Individual 1 2 3 4 5 6 7 8 9 10Test A 3 6 1 4 3 4 2 6 3 5Test B 4 5 2 6 5 4 3 6 4 4

    Scatter diagram

    0

    1

    2

    3

    4

    5

    6

    7

    0 2 4 6 8

    Scor

    e Te

    st B

    Score Test A

  • School of Distance Education

    Psychological Statistics Page 11

    Here the points plotted tend to lie on a straight line and hence the two variablesare linearly related (or the relation is linear). The points are from left bottom to righttop which indicate a positive relationship between the variables.

    In the following diagram the points are arranged from the left top to rightbottom indicating a negative relationship between the variables.

    If the variables are not related or if there is zero relation between the variables,there will not be any pattern in the distribution of points, the points will be scattered inthe plane.

    0

    1

    2

    3

    4

    5

    6

    0 1 2 3 4 5

    Y

    X

    Individuals. X YA 4 2B 3 3C 2 3D 1 4E 2 5

    Indls. X YA 1 2B 2 4C 3 1D 4 3E 5 2

  • School of Distance Education

    Psychological Statistics Page 12

    But scatter diagram do not give any clear picture about the strength of therelation as the correlation coefficient give. Any how, whether the variables are highlyrelated or the relationship is low can be identified by looking into how do the pointsfall to a straight line. That is, if the points more tend to a straight line, the variables arehighly related. If they fall far away from the straight line, the relationship is zero orvery low. The correlation is moderate, if the pattern of dots is somewhere between alow and a high correlation. If the points lie exactly on a straight line (as shown in thefigure, the correlation is perfect).

    0

    0.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    0 2 4 6

    Y

    X

    0

    1

    2

    3

    4

    5

    6

    0 2 4 6

    Y

    X

    Indls. X YA 1 1B 2 2C 3 3D 4 4E 5 5

  • School of Distance Education

    Psychological Statistics Page 13

    Scatter diagram showing positive perfect linear relationship

    Indls. X YA 1 5B 2 4C 3 3D 4 2E 5 1

    Scatter diagram showing negative perfect linear relationshipScatter diagrams are important because they clearly exhibit the nature of

    relationship, whether linear or curvilinear. If the variables are found to be linearlyrelated, then only, coefficients of correlation can be calculated. When the relation iscurvilinear, calculating coefficient of correlation and interpreting it will be misleadingor will be an error.

    0

    1

    2

    3

    4

    5

    6

    0 2 4 6

    Y

    X

  • School of Distance Education

    Psychological Statistics Page 14

    MODULE 2PARAMETRIC AND NON-PARAMETRIC TESTS

    ObjectivesThe student will be able to

    - discriminate parametric and non-parametric tests.- use chi-square test and c-coefficient of contingency in appropriate situations.Introduction

    In order to understand this module, a more knowledge about statistics isneeded. First of all you should be familiar with the descriptive and inferential statistics.Descriptive statistics describes the data as does arithmetic mean or median or standarddeviation or correlation. But statistics is mainly used to predict or infer the unknown.This function of statistics is done by inferential statistics. Two terms related toinferential statistics are parameter and statistics. Usually we are conducting researchon a sample, not on the entire population. But the purpose of the study will be toknow about the population and not the sample. The value calculated from the sampleis called statistic and the corresponding value for the population is parameter. Forexample, if arithmetic mean of a sample is , it is a statistic. The correspondingarithmetic mean of the population can be taken as , the parameter. Statistics tries topredict the unknown parameter from the known statistics. In the process of predictingor inferring population values, hypotheses are formulated and tested. A hypothesis isa tentative solution to the problem which is tested through the study. Hypotheses arestated as affirmative sentences. Inferential statistics tests these hypotheses using thedata available through the study of sample, a representative group of the population.

    Hypotheses can be directional or non directional. The statement, Boys are betterthan girls in mechanical aptitude is a directional hypothesis as there is a clearindication of direction of change. But the statement, Boys and girls differ in theirmechanical aptitude is a non directional hypothesis as there is no indication ofdirection of change.Statistical procedures are there to test hypothesis in which no change or no effect isstated. Such hypothesis is known as null hypothesis denoted as H0. H0 is tested againstthe alternative hypothesis, H1. An alternative hypothesis is the hypothesis that isaccepted when the null hypothesis is rejected.Example: H0: There is no significant mean difference in mechanical aptitude betweenboys and girls.(1= 2).

    H1: There is significant difference in mechanical aptitude between boys andgirls. ( 1 2).

    While testing a hypothesis, there is a possibility of occurrence of two types oferrors. One error is rejecting H0 when it is true and the other is accepting H0 when itis not true. The first error is known as Type I error and the second is Type II error.That is any decision regarding hypothesis testing may include these two types oferrors. Probability of Type I Error ie., rejecting H0 when it is true is known as Level ofSignificance. In hypothesis testing level of significance is determined by the researcherin advance itself. In behavioural sciences, it is taken commonly as .05 or .01. As theprobability of type I error decreases, probability of type II error increases and viceversa.

  • School of Distance Education

    Psychological Statistics Page 15

    Decisions regarding hypothesis testing are made based on the concept of aNormal distribution. Normal distribution is a theoretical model of distribution ofquantitative data. A normal probability curve is a bell shaped curve which issymmetric with respect to the ordinate at mean. It is non-skewed, lepto kurtic curve forwhich mean, median and mode coincide. It has wide application in statistics. The twotails of the curve determines the chance of accepting or rejecting a null hypothesis. Thearea for which the hypothesis is rejected is known as rejection area or critical region.(More about normal curve is included in fourth semester).Parametric and Non-parametric Tests

    The researches in behavioural science need different types of statistical tests totest hypotheses about specific population parameters (Values corresponding topopulation are called parameters and those of sample are statistics). A number ofstatistical techniques like 't' tests and ANOVA make assumptions about the shape ofthe population distribution and about other population parameters. As these testsconcern parameters and require assumptions about parameters, they are called'Parametric tests'. Parametric tests require a numerical score for each individual in thesample. The scores then are added, squared, averaged and otherwise manipulatedusing basic arithmetic. That is, parametric tests require data from an interval or ratioscale.

    If some assumptions are not met, it may not be appropriate to use a parametrictest. Violation of assumptions of a test may lead to erroneous interpretation of the data.In such situations, there are several hypothesis testing techniques that providealternatives to parametric tests. These tests are known as 'non-parametric' tests.

    Non-parametric tests usually do not state hypotheses in terms of a specificparameter and there may few (if any) assumptions about population distribution.(Because of this reason, non-parametric tests are also known as distribution free tests)Chi-square test, Sign test etc are non-parametric tests.

    For non-parametric tests, the subjects (sample elements) are usually classifiedinto categories. (These measures are on nominal or ordinal scales and they do notproduce numerical values that can be used to calculate means and variances). The datafor many non-parametric tests are frequencies.

    Non-parametric tests are not as sensitive as parametric tests. (i.e., non-parametric tests are more likely to fail in detecting a real difference between twotreatments). Hence whenever there is a chance to use parametric test, it must bepreferred than a non-parametric test.

    Chi-Square Test

    Chi-square ( 2) test is a non-parametric test with wide applications in the fieldof social sciences. (The name of the test is related with the Greek letter '' 'Kye' used toidentify the test statistic).

    Chi-square test is used when the data is in the form of frequencies (ie., themeasurement is on nominal scale). Mainly the test is used for two purposes- testinggoodness of fit and to test the independence between two nominal variables.

  • School of Distance Education

    Psychological Statistics Page 16

    The Chi-square test for Goodness of fitHere Chi-square test is used for testing hypothesis related to sample proportions

    with respect to the corresponding population proportions.The Chi-square test for goodness of fit determines how well the obtained sample

    proportions fit the population proportions specified by the null hypothesis.Formula for calculating 2 is

    EEO 22 )( where

    O is the observed frequency andE is the expected frequency (as per the null hypothesis)Usually in the test of Goodness of fit, the null hypothesis will fall into one of the

    following categories.1. No preference: The null hypothesis states that there is no preference among the

    different categories. ie., Ho states that the population is divided equally amongthe categories.

    2. No difference from other populationThe null hypothesis can state that the frequency distribution for one population

    is not different from the distribution that is known to exist for another population.The null hypothesis for the goodness of fit test specifies an exact distribution for

    the population. So the alternative hypothesis (H1) state that the population distributionhas a different shape from that specified in Ho. For example, if Ho states that thepopulation is equally divided among three categories, the alternative hypothesis willsay that the population is not divided equally.

    2 measures the discrepancy between the observed frequencies (the data) andthe expected frequencies (Ho). When the calculated value of 2 is large, observed andexpected frequencies differ highly and hence Ho is rejected. If 2 value is small, thedifference between the observed and expected frequencies are less, then Ho is accepted.To determine whether the 2 value is large or small, the theoretical distribution of 2 isused. The table of 2 gives values for different levels of significance and degrees offreedom(df). In goodness of fit test, the degree of freedom of 2 is C-1 where C is thenumber of categories. If there are three groups or categories in the specifiedpopulation, the degrees of freedom will be 3-1=2. If there are five categories, the df willbe 5-1=4.

    In a 2 table, the first column lists df, the top row lists proportions of area in theextreme right hand tail of the distribution (or level of significance). The numbers in thebody of the table are the critical values of Chi-square. To test the hypothesis the criticalvalue of 2 for given df and level of significance is located from the table. Then thecalculated 2 value is compared with the tabled value. If the calculated value is greaterthan the tabled value, Ho is rejected and if it is less than the tabled 2 value, Ho isaccepted.

  • School of Distance Education

    Psychological Statistics Page 17

    Illustration 1Forty eight subjects were asked to express their response to an item in an

    attitude scale in three categories Agree, Undecided and Disagree. Of the members inthe group, 24 marked 'agree', 12 'undecided' and 12 'disagree'. Do these resultsindicate a significant difference among groups?

    Here the null hypothesis Ho can be stated asHo: There is no significant difference among groupsH1: The group differ significantly.

    Category O EAgree 24 16Undecided 12 16Disagree 12 16

    Since the assumption is that the groups do not differ in their size, the total 48 canbe equally divided into three groups. So the expected frequency of each group is

    16348

    Next step is finding 2 by summingE

    EO 2)(

    E

    EO 22 )(

    16

    )1624( 2 16

    )1612( 216

    )1612( 2

    =16

    161664 =1696 = 6

    With degree of freedom c-1=3-1=2. Fix the level of significance as 0.05. Fromtable of 2, the value of 2(2 df) 0.05 level is 5.991. Since the calculated value is greaterthan the tabled value of 2, Ho is rejected. That is one cannot expect equal preferenceamong the subjects at 0.05 level of significance or the subjects showed a significantdifference in the response to the given item.Illustration-2

    In a study to know if high-performance over powered cars are more likely to beinvolved in accidents than other types of cars, the investigator classified 50 cars as highperformance, sub compact, mid-size, or full size. The observed frequencies are givenbelow.

  • School of Distance Education

    Psychological Statistics Page 18

    Highperformance

    Subcompact Mid size Full size

    20 14 7 9

    It is assumed that only 10% of the cars in the population are the high-performance variety. The subcompact, mid size and full size cars are 40%, 30% and20% respectively. Can the researcher conclude that the observed pattern of accidentsdoes not fit the predicted values (Test with =0.05).Ho: In the population, no particular type of car shows a disproportionate number ofaccidents. (or the observed frequencies are in agreement with the expected frequencies)H1: In the population, a disproportionate number of accidents occur with certain typesof cars (There is disagreement with O and E).

    O E (O-E)2 (O-E)2 /E20 5 225 4514 20 36 1.87 15 64 4.269 10 1 0.1

    Here expected frequencies are calculated according to the assumption. So theexpected frequency of the first group is 50 x

    10010 = 5x1=5

    The E of second group is 20 (50 x10040 )

    The E of third group is 50 x10030 = 15

    and that of fourth group is 50 x10020 =10

    EEO 22 )( = 51.16

    degrees of freedom = C-1=4-1=3The tabled value of 2(3 df) at 0.05 level is 7.81. Since the calculated value is greater thanthe tabled value of 2 for 3df at 0.05 level, Ho is rejected. That is a disproportionatenumber of accidents occur with certain types of cars.

  • School of Distance Education

    Psychological Statistics Page 19

    Chi-square test of independenceChi-square test can be used to test whether two nominal variables (values

    expressed as frequencies) are independent or not (associated or not). The formula forcalculating 2 is same as that of Goodness of fit.

    EEO 22 )(

    Degrees of freedom in the case of Test of Independence is (C-1) (R-1) where C isthe number of columns and R number of rows. The steps involved in the test ofindependence are

    Calculate the expected frequencies. Expected frequency for each cell can becalculated by

    NRTxCTE

    E- Expected frequencyRT- The row total for the row containing the cellCT- The column total for the column containing the cellN- The total number of observations.

    Take the difference between observed and expected frequencies and obtain thesquares of these differences, i.e., obtain the values of (O-E)2.

    Divide the values of (O-E)2 obtained in step 2 by the respective expectedfrequency and obtain the total

    EEO 2)(

    The calculated value of 2 is compared with the table value.Example: Two hundred students were classified by personality and colour preferenceas given below. Test whether colour preference and personality are dependent for thestudent population.

    Red Yellow Green BlueIntrovert 10 3 15 22 50Extrovert 90 17 25 18 150

    100 20 40 40 N=200

    Here the null hypothesis isHo: Colour preference and personality of students are independent.The expected frequencies of the cells are calculated by multiplying the correspondingrow total and column total and then dividing it by total number of participants.

  • School of Distance Education

    Psychological Statistics Page 20

    The expected (theoretical) frequencies will beRed Yellow Green Blue

    Introvert200

    10050x

    =25200

    2050x =5200

    4050x =10200

    4050x =10

    Extrovert200

    100150x

    =75200

    20150x

    =15200

    40150x

    =30200

    40150x

    =30

    EEO 22 )( substituting the values

    25)2510( 22 +

    5)53( 2 +

    10)1015( 2 +

    10)1022( 2 +

    75)7590( 2 +

    15)1517( 2 +

    15)1525( 2

    +15

    )1518( 2

    = 37.4Degrees of freedom (df) is

    (R-1) (C-1) = (2-1) (4-1) = 1x3=3The table value for 3 df at 0.05 level is 7.815 and at 0.01 level is 9.210.

    Since the calculated value of 2 is greater than the tabled value for significance at0.01 level, the null hypothesis is rejected. That is, the hypothesis "Colour preferenceand personality of students are independent' is rejected. Hence the two variablescolour preference and personality are associated or dependent.Note: If two variables are found to be associated using 2 test of independence, thestrength of association can be calculated using c-coefficient of contingency.

    2

    2

    NC

    Basic AssumptionChi-square test is a non-parametric test and hence is distribution free. That is, as

    in parametric tests, variables need not be normally distributed.But, the variables are to be measured in a nominal scale. That is, the data is in

    the form of frequencies.

  • School of Distance Education

    Psychological Statistics Page 21

    MODULE 3MEDIAN TEST

    Objectives

    The student will know about various non-parametric tests likeMan-Whitney U test, Sign test, Wilcoxon Matched pairs signed ranks test.

    IntroductionNon-parametric tests can be used in situations where

    - the sample size is quite small (i.e., when N=5 or 6)- assumptions like normality of the distribution of values in the population is notensured.- When the measurement of data is either in ordinal (ranks) or nominal(frequencies). As non-parametric tests are less powerful than the parametric tests,when ever possible, parametric tests are to be used. But one cannot neglect the non-parametric tests because researchers in psychology need such tests for statisticalinferences in many situations.

    Some major non-parametric tests are discussed below.1. Mann-Whitney U Test

    To test the significance of difference between two independent means, usually t-test (parametric) is used. But when the normality of the variables are questioned, t-testis not applicable. Then Mann-Whitney U test can be used, but it tries to find out thedifference between population distributions, and not the population means. It is apowerful test to test the difference between two independent samples havinguncorrelated data.

    The basic assumption for Mann-Whitney U test is given below:"If two sets of data differ significantly, generally, the scores in one sample will be

    larger than the scores in the other. If the two samples are combined and all the scoresare placed in rank order on a line, then the scores from one sample will concentrate atone end of the line, the scores of the other sample will concentrate at the other end. Butif there is no significant difference between the groups, the large and small values willbe mixed evenly in the two samples, values of one sample not concentrating at one sideof the line.Step 1

    The null hypothesis Ho can be written as there is no tendency for the ranks inone group to be systematically higher or lower than the ranks in the other group. Fixthe level of significance as 0.05 or 0.01.Step 2

    Two samples are there sample A and sample B. Let nA be the number of subjectsin sample A and nb be the number of subjects (sample size) of sample B. Then combinethe scores of the two groups and the nA+ nB scores are ranked.

  • School of Distance Education

    Psychological Statistics Page 22

    Step 3The ranks in each group are summed up to get RAand RB

    (RA+ RB = N (N+1)/2 where N= nA+ nB)Step 4

    Then the Mann-Whitney U for the two sets are calculated using the formulaU = n n + n (n + 1)2 RU = n n + n (n + 1)2 RStep 5

    Take the Mann-Whitney U as the smaller of UA and UB. Refer the table of criticalvalues of Mann-Whitney U for the tabled value of U corresponding to the obtained UAand UB and the level of significance fixed. If the tabled value is less than theobtained U value, Ho can be accepted and if the obtained U value is less than thecritical value Ho is rejected.Example: The time in seconds recorded for 13 children (5 boys and 8 girls) in a block-manipulation task are given below. Test whether there is consistent difference betweenboys and girls in the time required for completing the task.

    Boys 23 18 29 42 21Girls 37 56 39 34 26 104 48 25

    Step 1The null hypothesis Ho: There is no consistent difference between boys and girls

    in the time needed for completion of the task.Alternate hypothesis: H1: There is a systematic difference.

    Take level of significance as =0.05.Step 2

    nA = 5 nB= 8 nA+ nB = 13Ranks

    Boys Girls18 - 121 - 223 - 329 - 642 - 10

    25 - 426 - 534 - 737 - 839- 948 - 1156 - 12104 - 13

    RA = 22 RB = 69

  • School of Distance Education

    Psychological Statistics Page 23

    U = n n + n (n + 1)2 RU = 5 8 + 5(5 + 1)2 22

    = 40+15- 22= 40- 7= 33U = n n + n (n + 1)2 R= 5 8 + ( ) 69= 40 + 36 69

    = 7

    The least value of UA and UB is 7 hence U=7The critical value is 6 (for nA =5 & nB =8) at 0.05 level of significance. Since the obtainedU value is greater than the tabled value, accept Ho. At the 0.05 level of significance, thedata do not provide sufficient evidence to conclude that there is significant differencebetween boys and girls in time needed for completing the task.2. The Sign Test

    Sign test is the most simple non-parametric test. It is used for comparing twocorrelated samples, i.e., two parallel set of scores which are paired off in some way(dependent samples). This test uses plus and minus signs instead of quantitativevalues and hence the name 'sign test'.

    Sign test is useful when1. Normality and homogeneity of variance of the variables are not sure, but the

    variable considered has a continuous distribution.2. All the subjects are not from the same population.3. Two correlated samples are to be compared, with the null hypothesis the median

    difference between the pairs is zero.4. There are two sets of measurements which can be matched (paired) with respect

    to relevant extraneous variables.5. When the data is not in interval or ratio scale, especially when the direction of

    change is given, not the quantitative measure.If the number of individuals in the single sample is less than or equal to 25, the

    table of probabilities associated with values as small as observed values of x (number offewer signs) in the Binomial test.

    If N is greater than 25, the normal approximation to the binomial distribution or2 may be used; Here

  • School of Distance Education

    Psychological Statistics Page 24

    = ( ) 1/2 . With Yate's correction this formula can be rewritten as= ( . ) / where x+.5 is takenwhen < , .5 is taken when >' ' is the total of '+' or '-' signs.Example: A researcher tests the effect of a diet plan on body weight. A sample of 50people is selected. The observation made at the start of the diet and one month laterrevealed that 37 people lost weight, 12 gained and one had no change in weight. Testwhether the treatment leads to gain in weight.

    Number of positive signs (initially high value, second one smaller, i.e., loss inweight) = 37

    Number of negative signs (gain in weight) =12Here N = Number of positive signs + Number of negative signs =49 (not the samplesize 50).Ho: The diet has no effect; H1: The diet has some effect.= ( 0.5) /2x can be taken as 37 or 12 leading to the same numerical value of z. The only changewill be in the sign.= ( . ) /= . .= . = 3.43

    Since this value is greater than 1.96, the critical value for significance at 0.05level, Ho is rejected. That is the assumption that 'the diet has no effect on weight' isrejected.(If N is small, instead of calculating z, the value of N and x is taken for finding thevalue of P from a table of probabilities associated with values in the Binomial test. If itis greater than 0.05, the null hypothesis is accepted, if it is less than 0.05, the nullhypothesis is rejected).3. Wilcoxon Matched-pairs Signed Ranks Test

    Wilcoxon Matched- Pairs Signed Ranks Test is more powerful than the sign testbecause it tests not only direction but also magnitude of differences within pairs ofmatched group. This test is used to test the difference between two related (dependent)samples/matched pairs of individuals and is not applicable to independent groups.MethodStep 1

    Let d1 be the difference of scores for any matched pair, representing thedifference between pair's scores under two treatments A and B.

  • School of Distance Education

    Psychological Statistics Page 25

    Step 2Delete all such pairs for which d1 = 0.

    Step 3Rank all the s with out regard to sign, giving rank 1 to the smallest d1, rank 2

    to the next smallest and so on.Step 4

    Indicate which ranks arose from negative d1s and which ranks arouse frompositive d'1 s by giving sign of difference to each rank.Step 5

    Sum the ranks for the positive differences and sum the ranks for the negativedifferences. Under the null hypothesis, the two sums are expected to be equal.Z is calculated using the formula= ( ) ( )( ) whereN- Number of pairs related.T- Sum of the ranks of the smaller, of the like signed ranksIllusteration

    The local Red Cross has conducted an intensive campaign to increase blooddonations. This campaign has been concentrated in 10 local businesses. In eachcompany, the goal was to increase the percentage of employees who participate in theblood donation program. Using Wilcoxon test to decide whether these data provideevidence that the campaign had a significant impact on blood donations. The datafrom each company has listed in a rank order to the absolute value of the differencescores.

    Company Percentage of participation RankBefore After Difference

    A 18 18 0 -B 24 24 0 -C 31 30 -1 1D 28 24 -4 2E 17 24 +7 3F 16 24 +8 4G 15 26 +11 5.5H 18 29 +11 5.5I 20 36 +16 7J 9 28 +19 8

  • School of Distance Education

    Psychological Statistics Page 26

    Step 1: The null hypotheses state that the campaign has no effect. Therefore, anydifferences are due to chance, and there should be no consistent pattern.

    Step 2: the two companies with difference scores of zero are discarded, and n is reducedto 8 and = .05, the critical value for the Wilcoxon test is T=3. A sample value thatis less than or equal to 3 will lead us to reject Ho.

    Step 3: For these data, the positive differences have ranks of 3, 4, 5.5, 5.5, 7, and 8;R+ = 33The negative differences have ranks of 1 and 2; = 3The Wilcoxon T is the smaller of these sums, so T = 3.

    Step 4: T vlue from the data is in the critical region. This value is very unlikely to occur bychance (p< .05); therefore, we reject Ho and conclude that there is a significant change inparticipation after Red Cross campaign.(source: Gravetter, F. J., & Wallnau, L. B. (2000). Statistics for the Behavioral Sciences (5th ed)