Statistical Reasoning III


    The Basics

    Statistical Analysis of Difference


Introduction: When are statistical tests used?

When researchers want to determine whether a statistically significant difference exists between two or more sets of numbers.

The decision to reject or accept the null hypothesis is based on whether or not the observed values are in the critical region.

What will we try to learn in the next few classes?
- Data handling
- Use of specific statistical tests


Distributions for Analysis of Difference
What type of distribution have you learned so far?

    - Standard Normal Distributions

    - Z-scores

- When we use these distributions, we assume that the population standard deviation is known.

- Because the population standard deviation is usually not known, we cannot ordinarily use the standard normal distribution and its z-scores to draw statistical conclusions from samples.


Distributions for Analysis of Difference
Then what should we do?

Researchers conduct most statistical tests using distributions that resemble the normal distribution but are altered somewhat to account for the errors that are made when population parameters are not known.

The three most common distributions used are the t, F and chi-square distributions.


How do we use these distributions?

Just as we determine the probability of certain z-scores based on the standard normal distribution, we can determine the probability of obtaining certain t, F and chi-square statistics based on their respective distributions.

The decision to reject or accept the null hypothesis is based on whether or not the observed values are in the critical region.
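
For readers working outside SPSS, here is a minimal sketch in Python (assuming scipy is available; the alpha level and degrees of freedom are illustrative, not from the slides) of how critical values and tail probabilities are obtained for the t, F and chi-square distributions, just as they are for z-scores.

```python
# A minimal sketch: critical values and tail probabilities for the
# t, F and chi-square distributions, analogous to z-score tables.
# The alpha level and degrees of freedom below are illustrative assumptions.
from scipy import stats

alpha = 0.05

# Two-tailed critical t value with 15 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=15)

# Upper-tail critical F value with 4 and 30 degrees of freedom
f_crit = stats.f.ppf(1 - alpha, dfn=4, dfd=30)

# Upper-tail critical chi-square value with 9 degrees of freedom
chi2_crit = stats.chi2.ppf(1 - alpha, df=9)

# Two-tailed probability (p-value) of observing t = 2.3 with 15 df
p_t = 2 * stats.t.sf(2.3, df=15)

# Reject the null hypothesis when the observed statistic falls in the
# critical region, i.e. beyond the critical value (equivalently, p < alpha).
print(t_crit, f_crit, chi2_crit, p_t)
```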


What are the shapes of these distributions?


What influences the shapes of these distributions?

    Degrees of Freedom

The degrees of freedom are calculated in different ways for the different distributions but in general are related to two things.

    1. Number of participants in study

    2. Number of levels of independent variable


t-distributions

The picture shows the shape of the t-distribution in comparison to the standard normal (or z) distribution. Notice that the t-distribution becomes flatter with a smaller value of n.


t-distribution
Some characteristics of the t-distribution (also known as Student's t distribution):

1. The mean of the distribution is equal to 0.
2. The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
3. The S.D. is always greater than 1.
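
These properties can be checked numerically; below is a minimal sketch (assuming scipy) that compares the variance reported for the t-distribution against v / (v - 2) for a few values of v.

```python
# A minimal sketch: verify that the t-distribution has mean 0 and
# variance v / (v - 2) for v > 2, which implies a standard deviation > 1.
from scipy import stats

for v in (5, 10, 30):
    mean, var = stats.t.stats(df=v, moments="mv")
    print(v, float(mean), float(var), v / (v - 2))
```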


    F-distribution

The shape of the F distribution is dependent upon the degrees of freedom of both the numerator and denominator. Red has df1 = 2 and df2 = 3, blue has df1 = 4 and df2 = 30, and black has df1 = 20 and df2 = 20.


F-distribution
Characteristics of the F-distribution:

1. It is not symmetric. The F-distribution is skewed right; that is, it is positively skewed.

2. The shape of the F-distribution depends upon the degrees of freedom in the numerator and denominator.

3. The total area under the curve is 1.

4. The values of F are always greater than or equal to zero; that is, the F distribution cannot be negative.

The F distribution is used to test whether two population variances are the same.
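
As an illustration of that last point, here is a minimal sketch (with made-up data, not from the course) of the variance-ratio F test for two population variances.

```python
# A minimal sketch of an F test for equality of two population variances.
# The two samples below are illustrative assumptions, not course data.
import numpy as np
from scipy import stats

group_a = np.array([4.1, 5.3, 6.2, 5.8, 4.9, 5.1])
group_b = np.array([3.9, 7.2, 2.8, 6.5, 4.4, 8.1])

f_stat = np.var(group_a, ddof=1) / np.var(group_b, ddof=1)  # ratio of sample variances
dfn, dfd = len(group_a) - 1, len(group_b) - 1               # numerator and denominator df

# Two-sided p-value: twice the smaller tail area under the F distribution
p_value = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
print(f_stat, p_value)
```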


    Chi-square distribution

Notice in this picture that as the df gets large, the curve becomes less skewed and more normal.


Properties of the chi-square distribution

Chi-square is non-negative: it is the ratio of two non-negative values, and therefore must be non-negative itself.

Chi-square is non-symmetric (asymmetric).

There are many different chi-square distributions, one for each value of the degrees of freedom.

The degrees of freedom when working with a single population variance are n - 1.
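
To make the n - 1 degrees of freedom concrete, here is a minimal sketch (with illustrative numbers) of the chi-square statistic for a single population variance.

```python
# A minimal sketch: chi-square statistic for a single population variance,
# (n - 1) * s^2 / sigma0^2, compared against a chi-square with n - 1 df.
# The sample and hypothesised variance are illustrative assumptions.
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6])
sigma0_sq = 0.25                     # hypothesised population variance
n = len(sample)

chi2_stat = (n - 1) * np.var(sample, ddof=1) / sigma0_sq
p_upper = stats.chi2.sf(chi2_stat, df=n - 1)  # upper-tail probability
print(chi2_stat, p_upper)
```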


Let us compare and review the three distributions.

t-distribution:
- A symmetric distribution.
- Consists of both positive and negative values.
- Its shape varies with the degrees of freedom, which are based on sample size. With a large sample size, the t-distribution becomes more like the z-distribution because the df and sample size are large.

F-distribution:
- A non-symmetric distribution; asymmetric because it is obtained from squared scores of the t-statistic.
- Consists of only positive values, and is therefore positively skewed.
- Its shape depends on two degrees of freedom, called the numerator and denominator: the first is associated with the number of groups being compared, the second with sample size.

Chi-square distribution:
- A non-symmetric distribution; as the df increases it becomes more symmetric. Obtained from the distribution of squared z-scores.
- The value of chi-square is never negative; therefore it is positively skewed.
- Its shape varies with its degrees of freedom.


    Types of Test

Parametric tests:
- Use sample statistics such as the mean, standard deviation and variance to estimate differences between population parameters.
- Major classes of parametric tests are the t-test, analysis of variance, and the Pearson product-moment correlation.
- Based on specific assumptions.
- More powerful and preferred; however, they cannot always be used because the assumptions on which they are based are not always met.

Non-parametric tests:
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Examples are chi-square and Spearman rank-order rho.
- Used when those assumptions are not met.
- Considered less powerful; however, they are used often in actual research because the assumptions of parametric tests frequently do not hold.
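
To illustrate the contrast, here is a minimal sketch (with made-up scores) that runs a parametric test and a rank-based non-parametric counterpart on the same two groups; the specific tests are standard choices, not prescribed by the course.

```python
# A minimal sketch: an independent-samples t-test (parametric, uses means and
# variances) versus the Mann-Whitney U test (non-parametric, uses ranks).
# The two groups of scores are illustrative assumptions.
from scipy import stats

group_a = [23, 25, 28, 30, 27, 26]
group_b = [31, 35, 29, 38, 33, 34]

t_stat, p_parametric = stats.ttest_ind(group_a, group_b)
u_stat, p_nonparametric = stats.mannwhitneyu(group_a, group_b,
                                             alternative="two-sided")
print(p_parametric, p_nonparametric)
```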


Assumptions of Tests of Difference

The assumptions for parametric tests are:
- Random selection
- Homogeneity of variance
- Level of measurement (controversial)


Assumption 1: Random Selection from a Normally Distributed Population

Participants are randomly selected from normally distributed populations.

Even data sets that are only approximately normally distributed are still accepted.

The extent to which a data set is normally distributed can be tested (we will practice this today in SPSS).

When a data set is not normally distributed, one strategy is to transform (convert) the data and then use parametric tests on the transformed data.

Otherwise, non-parametric tests can also be used.


Assumption 2: Homogeneity of Variance

The population variances of the groups being tested are equal, or homogeneous.

    This can also be tested statistically

    Will practice how to compute in next class

What should we do after checking homogeneity of variance?

If the variances of the groups are found to differ significantly, non-parametric tests must be used.

If the sample sizes of the groups being compared are the same, differences in the variances of the groups are of less concern.

Researchers therefore often design their studies to have equal sample sizes in the two groups.
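
One common statistical check of this assumption is Levene's test; the course will demonstrate the SPSS version in the next class, but a minimal Python sketch (with illustrative data) looks like this.

```python
# A minimal sketch of Levene's test for homogeneity of variance.
# The data are illustrative assumptions, not course data.
from scipy import stats

group_a = [12, 15, 14, 10, 13, 16]
group_b = [22, 29, 18, 31, 25, 17]

stat, p_value = stats.levene(group_a, group_b)
# p_value > 0.05 suggests the group variances do not differ significantly.
print(stat, p_value)
```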


Assumption 3: Level of Measurement

Do you know what levels of measurement are?
In the earlier slide comparing parametric and non-parametric tests, did you notice which type of sample statistics each uses?

Parametric tests:
- Use sample statistics such as the mean, standard deviation and variance to estimate differences between population parameters.
- Interval and ratio data meet this need.
- The controversy concerns the use of parametric tests with ordinal measurements, for which these statistics are not clearly valid.

Non-parametric tests:
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Nominal and ranked ordinal data meet this need.
- Interval and ratio data can be converted into ranks or grouped into categories to meet this need.


Assumption 3: Level of Measurement

Note: Regardless of the origin of the numbers, parametric tests can be conducted as long as the data themselves meet the assumptions of parametric tests.

However, the researcher must interpret parametric statistical conclusions based on ordinal data in light of their clinical and practical implications.

This can be illustrated with an example.


Assumption 3: Level of Measurement

Example from rehabilitation research.
Variable: the amount of assistance a patient needs to accomplish various functional tasks. The categories are:

Code  Category        Mean score of group
1     Maximal         1.0
2     Moderate        2.0
3     Minimal         3.0
4     Standby         4.0
5     No assistance   5.0

These group means were found to be significantly different from one another.

If the researchers believe that the real interval between maximal and moderate assistance is greater than the interval between standby and no assistance, they might interpret the difference between 1.0 and 2.0 as more clinically important than the difference between standby (4.0) and no assistance (5.0).


Checking Normality of Data
Let us learn this by example. The null hypothesis (H0) in this example is that the data are normally distributed, and the alternative hypothesis (Ha) is that the data are not normally distributed.

Steps and actions:

Step 1: Select "Analyze -> Descriptive Statistics -> Explore".

Step 2: From the list on the left, move the variable "Age" to the "Dependent List". Click "Plots" on the right. A new window will open. Check "None" for boxplot, uncheck everything for descriptives, and make sure the box "Normality plots with tests" is checked.

Step 3: The results now appear in the "Output" window.

Step 4: Interpret the result. Look at the third table, where two tests for normality are run. For a data set smaller than 2000 elements we use the Shapiro-Wilk test; otherwise, the Kolmogorov-Smirnov test is used. If the Sig. value of the Shapiro-Wilk test is greater than 0.05, the data are normal. If it is below 0.05, the data deviate significantly from a normal distribution.
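
For those working outside SPSS, here is a minimal Python sketch (assuming scipy and an illustrative numeric "Age" variable) of the same two normality tests and the same decision rule on the p-value.

```python
# A minimal sketch: Shapiro-Wilk (for samples smaller than about 2000) and
# Kolmogorov-Smirnov tests of normality. The "age" values are illustrative.
import numpy as np
from scipy import stats

age = np.array([21, 24, 19, 30, 27, 22, 25, 28, 23, 26])

w_stat, p_shapiro = stats.shapiro(age)

# Kolmogorov-Smirnov against a normal with the sample's mean and SD
d_stat, p_ks = stats.kstest(age, "norm", args=(age.mean(), age.std(ddof=1)))

# p > 0.05: no significant departure from normality;
# p < 0.05: the data deviate significantly from a normal distribution.
print(p_shapiro, p_ks)
```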


    Graphical Method

The normal quantile-quantile plot (Q-Q plot) is the most commonly used and effective diagnostic tool for checking normality of the data.

It is constructed by plotting the empirical quantiles of the data against the corresponding quantiles of the normal distribution.

If the empirical distribution of the data is approximately normal, the quantiles of the data will closely match the normal quantiles, and the points on the plot will fall near the line y = x.


    Graphical Method

It is impossible for real data to fall exactly on a straight line in a Q-Q plot, because random fluctuations cause the points to drift away from it and aberrant observations often contaminate the samples.

Only large or systematic departures from the line indicate abnormality of the data; the points will remain reasonably close to the line if there is just natural variability.

Therefore, the straightness of the normal Q-Q plot helps us judge whether the data have the same distributional shape as a normal distribution, while shifts and tilts away from the line y = x indicate differences in location and spread, respectively.
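
A minimal sketch of such a plot (assuming matplotlib is available; the data are randomly generated for illustration) using scipy's probplot:

```python
# A minimal sketch: normal Q-Q plot of illustrative data.
# Points falling near the straight reference line suggest normality.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=100)  # illustrative data

stats.probplot(data, dist="norm", plot=plt)
plt.show()
```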


Graphical Method (Q-Q Plot Interpretation Points)

If the data are normally distributed, the data points will be close to the diagonal line. If the data points stray from the line in an obvious non-linear fashion, the data are not normally distributed.

If you are at all unsure of being able to correctly interpret the graph, rely on the numerical methods instead, because it can take a fair bit of experience to correctly judge the normality of data based on plots.


Normality Check: Another Way

Histogram: when a histogram's shape approximates a bell curve, it suggests that the data may have come from a normal population.
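
A minimal sketch of this histogram check (assuming matplotlib; the data are randomly generated for illustration):

```python
# A minimal sketch: a roughly bell-shaped histogram is consistent with the
# data coming from a normal population.
import numpy as np
import matplotlib.pyplot as plt

data = np.random.default_rng(2).normal(loc=100, scale=15, size=200)  # illustrative
plt.hist(data, bins=20)
plt.show()
```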


Example from the Data Set

In both plots, there is a single value that appears to be considerably different: it is an outlier. This happens to be observation number 5 in the data set.


If we readjust the outlier:


Analysis of Skewness and Kurtosis

Since the skewness and kurtosis of the normal distribution are zero, values for these two parameters should be close to zero for data to follow a normal distribution.

A rough measure of the standard error of the skewness is √(6/n), where n is the sample size.

A rough measure of the standard error of the kurtosis is √(24/n), where n is the sample size.

If the absolute value of the skewness for the data is more than twice its standard error, this indicates that the data are not symmetric, and therefore not normal. Similarly, if the absolute value of the kurtosis is more than twice its standard error, this is also an indication that the data are not normal.
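
A minimal sketch of this twice-the-standard-error rule (with randomly generated illustrative data):

```python
# A minimal sketch: skewness and excess kurtosis with their rough standard
# errors sqrt(6/n) and sqrt(24/n), applying the twice-the-SE rule.
import numpy as np
from scipy import stats

data = np.random.default_rng(1).normal(size=40)  # illustrative data
n = len(data)

skew = stats.skew(data)
kurt = stats.kurtosis(data)       # excess kurtosis; 0 for a normal distribution
se_skew = np.sqrt(6 / n)
se_kurt = np.sqrt(24 / n)

# True here would flag a departure from normality.
print(abs(skew) > 2 * se_skew, abs(kurt) > 2 * se_kurt)
```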


Example

Suppose that in a data set the skewness is 0.23 in absolute value and the kurtosis is -1.53 (absolute value 1.53). The standard error for the skewness is 0.55 and the standard error for the kurtosis is 1.10.

Neither value is close to twice its standard error.

As on the previous slide: if the absolute value of the skewness or kurtosis is more than twice its standard error, this indicates that the data are not symmetric, and therefore not normal.

Here both statistics are within two standard errors (2 × 0.55 = 1.10 and 2 × 1.10 = 2.20), so these measures show no significant departure from normality.