Statistics for College Students-part 2

  • 8/6/2019 Statistics for College Students-part 2

    OVERVIEW

    STATISTICS

    Modules 5-8

    Prepared by:

    Mrs. Cristina H. Price

    Module 5 - The Normal Curve

    The normal curve, or normal frequency distribution, is a hypothetical distribution of scores that is widely used in statistical analysis. Since many psychological and physical measurements are normally distributed, the concept of the normal curve can be used with many scores. The characteristics of the normal curve make it useful in education and in the physical and social sciences.

    Characteristics of the Normal Curve

    Some of the important characteristics of the normal curve are:

    1. The normal curve is a symmetrical distribution of scores with an equal number of scores above and below the midpoint of the abscissa (horizontal axis of the curve).

    2. The tails of the curve are asymptotic to the horizontal axis.

    3. Since the distribution of scores is symmetrical, the mean, median, and mode are all at the same point on the abscissa. In other words, the mean = the median = the mode.

    4. If we divide the distribution up into standard deviation units, a known proportion of scores lies within each portion of the curve. The total area under the curve is equal to 1.

    5. Tables exist so that we can find the proportion of scores above and below any part of the curve, expressed in standard deviation units. Scores expressed in standard deviation units are referred to as Z-scores.

    Standard score

    It is the distance of an observed value (x) from the mean in terms of the standard deviation. It tells how many standard deviations the observed value lies above or below the mean of its distribution.

    $$z = \frac{x - \bar{x}}{s} \quad \text{OR} \quad z = \frac{x - \mu}{\sigma}$$

    Where: x = observed value or raw score

    $\bar{x}$ = sample mean

    s = sample standard deviation

    $\mu$ = population mean

    $\sigma$ = population standard deviation

    Using MS Excel's Statistical Functions

    Conversion of raw score to standard score

    =standardize(x, mean, s)

    Finding the area/probability value given the value of z

    =normsdist(z)

    Finding the value of z given the probability value

    =normsinv(p-value)
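    For readers without Excel, the three functions can be approximated in Python. This is a minimal sketch using the standard library's `statistics.NormalDist`; the function names mirror the Excel ones for convenience, and the sample inputs (80, 70, 5) are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

def standardize(x, mean, sd):
    """Convert a raw score to a z-score, like Excel's STANDARDIZE."""
    return (x - mean) / sd

def normsdist(z):
    """Area under the standard normal curve to the left of z."""
    return NormalDist().cdf(z)

def normsinv(p):
    """z-value whose left-tail area equals p."""
    return NormalDist().inv_cdf(p)

# Hypothetical inputs: raw score 80, mean 70, standard deviation 5.
z = standardize(80, 70, 5)
print(z)                           # z-score of the raw score
print(round(normsdist(z), 4))      # proportion of scores below it
print(round(normsinv(0.975), 2))   # z that leaves 97.5% of the area to its left
```

    Note that `normsdist` returns the cumulative area to the left of z, so an "at least" probability is one minus that value.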

    Sample problem

    The average daily income of 2000 workers is P362.00 with a standard deviation of P15.00. Assuming that the daily incomes are normally distributed,

    a) what percent of the workers earn at least P380.00 per day?

    b) what percent of the workers earn below P350.00 per day?

    c) determine the number of workers who earn from P350.00 to P375.00 per day.
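    One way to check the three answers is sketched below in Python. It assumes the stated mean and standard deviation describe the normal distribution exactly and uses `statistics.NormalDist` instead of a printed z-table, so the results may differ slightly from table-based answers.

```python
from statistics import NormalDist

income = NormalDist(mu=362, sigma=15)   # daily income in pesos
workers = 2000

at_least_380 = 1 - income.cdf(380)             # part (a): area to the right of 380
below_350 = income.cdf(350)                    # part (b): area to the left of 350
between = income.cdf(375) - income.cdf(350)    # part (c): area between 350 and 375

print(f"a) {at_least_380:.2%} of workers")
print(f"b) {below_350:.2%} of workers")
print(f"c) about {round(between * workers)} workers")
```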

    Exercises

    1. In a departmental examination in statistics, the mean grade was 74 and the standard deviation was 10. If the grades are approximately normally distributed and 40 students got grades between 70 and 80, how many students took the examination?

    2. The experience of a certain hospital showed that the distribution of length of stay of its patients is normal with a mean of 11.5 days and a standard deviation of 2 days.

    a) What percent of the patients stayed 9 days or less?

    b) If a new method in nursing care is to be administered to the middle 95% of the group, how long should a patient stay to be included in the study?

    3. A study finds that the time spent on advertisement per hour on a certain TV station is approximately normally distributed with mean equal to 12.8 minutes and standard deviation equal to 2.2 minutes. During a randomly selected hour, what is the probability that between 14 and 16 minutes were devoted to advertisements?

    Module 6 - Hypothesis Testing

    Hypothesis: a statement that is formulated which cannot be accepted to be true unless otherwise proven

    Assumption: a statement that is formulated and accepted to be true without the necessity of a proof. It serves as the springboard of the study

    Types of hypothesis: null and alternative

    Other key concepts

    Types of test: one-tailed and two-tailed

    Level of significance: alpha (0.01, 0.05, 0.1)

    Observed value: the computed value based on the data gathered

    Critical value: the value obtained from the table; the value that divides the distribution of the test statistic into the rejection and the acceptance regions

    Critical Values of z

    Test type     Level of significance
                  0.05      0.01
    One-tailed    1.645     2.33
    Two-tailed    1.96      2.575
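    The tabled values can be reproduced, up to rounding, from the inverse of the standard normal distribution. The sketch below is illustrative only; `critical_z` is a hypothetical helper name, and its results (for example about 2.326 and 2.576 at the 0.01 level) are the unrounded counterparts of the table's 2.33 and 2.575.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Positive critical z for a one-tailed (1) or two-tailed (2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between both tails.
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01):
    one = critical_z(alpha, 1)
    two = critical_z(alpha, 2)
    print(f"alpha={alpha}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```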

    Steps in performing hypothesis testing

    1. Write the original claim and identify whether it is the null hypothesis or the alternative hypothesis.

    2. Write the null and alternative hypotheses. Use the alternative hypothesis to identify the type of test.

    3. Write down all information from the problem.

    4. Determine the appropriate test statistic. Find the critical value using the tables.

    5. Compute the test statistic.

    6. Make a decision to reject or fail to reject the null hypothesis. A picture showing the critical value and test statistic may be useful.

    7. Write the conclusion.

    Bivariate Distribution

    Involving two variables

    - Significant difference (t-test, z-test, ANOVA)

    - Significant relationship (Pearson r, Spearman's rho, Chi-square and other correlational techniques)

    Testing significant difference using parametric test (two groups)

    t test

    - distribution is normal

    - homogeneous variance

    - sample std. deviation is known

    - n < 30

    z test

    - distribution is normal

    - homogeneous variance

    - population std. deviation is known

    - n ≥ 30

    t-test & z test (sample vs population)

    $$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \qquad df = n - 1$$

    $$z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$$

    Decision rules

    Observed value < Critical value

    OR

    p-value > level of significance (α)

    Accept the null hypothesis (There's not enough evidence to reject the null hypothesis)

    Observed value ≥ Critical value

    OR

    p-value ≤ level of significance (α)

    Reject the null hypothesis

    Sample Problems

    1. A certain rice miller claims that the average weight of a cavan of rice is 50 kilograms with a standard deviation of 5 kilograms. A retailer sampled 20 cavans of this rice and got an average weight of 46.6 kilograms. Is the claim of the rice miller valid using 5% level of significance?

    2. A standardized test was administered to thousands of pupils with a mean score of 85 and a standard deviation of 8. A random sample of 50 pupils were given the same test and showed an average score of 83.20. Is there evidence to show that this group has a lower performance than the ones in general at 0.05 level of significance?
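    As an illustration, problem 2 can be worked as a one-tailed (left-tailed) z test, since the population standard deviation is given and n = 50. The Python sketch below is one possible setup, not the only valid one; `one_sample_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) * sqrt(n) / population sd."""
    return (sample_mean - pop_mean) * sqrt(n) / pop_sd

# H0: the group's mean is 85; H1: the group's mean is lower than 85.
z = one_sample_z(sample_mean=83.20, pop_mean=85, pop_sd=8, n=50)
critical = NormalDist().inv_cdf(0.05)   # left-tail critical value at alpha = 0.05
p_value = NormalDist().cdf(z)           # area to the left of the observed z

reject_null = z <= critical
print(f"z = {z:.3f}, critical = {critical:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

    Under these assumptions the observed z does not fall in the rejection region, so the sample does not give enough evidence of lower performance at the 0.05 level.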

    t-test & z test

    (two-sample groups)

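    $$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}}$$

    $$t = \frac{\bar{x} - \bar{y}}{\sqrt{\left[\dfrac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2}\right]\left[\dfrac{1}{n_x} + \dfrac{1}{n_y}\right]}}, \qquad df = n_x + n_y - 2$$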

    Sample Problems

    1. A random sample of 20 newly-born baby boys showed an average weight of 7.4 pounds while a sample of 25 newly-born baby girls showed a mean weight of 6.5 pounds. If the variance of all newly-born babies is 1.25 pounds, can we say that newly-born baby boys are heavier than newly-born baby girls?

    2. Two hamburger stores were compared in terms of the number of orders of hamburger per day. The results of the ten-day observation were as follows:

    Day      1    2    3    4    5    6    7    8    9    10
    Nutri    148  126  103  169  135  152  144  124  132  128
    Deli     150  127  125  152  129  146  153  118  126  119

    Using the 0.05 level of significance, test if there is a significant difference in the number of orders of hamburger from the two stores.
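    Problem 2 can be sketched in Python with the pooled-variance t test for two sample groups. This treats the two stores as independent samples, which is an assumption; `pooled_t` is a hypothetical helper, and the critical value 2.101 is the usual two-tailed t-table entry for α = 0.05 and df = 18.

```python
from math import sqrt
from statistics import mean, variance

nutri = [148, 126, 103, 169, 135, 152, 144, 124, 132, 128]
deli = [150, 127, 125, 152, 129, 146, 153, 118, 126, 119]

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df

t, df = pooled_t(nutri, deli)
T_CRITICAL = 2.101   # assumed t-table value: two-tailed, alpha = 0.05, df = 18
print(f"t = {t:.3f}, df = {df}")
print("Reject H0" if abs(t) >= T_CRITICAL else "Fail to reject H0")
```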

    t-test for paired observations

    (dependent groups)

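    $$t = \frac{\bar{d}}{\sqrt{\dfrac{\sum d^2 - n\bar{d}^2}{n(n - 1)}}}, \qquad df = n - 1$$

    Where:

    d = difference between the paired observations

    $\bar{d}$ = mean of the differences

    n = no. of pairs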

    Sample Problem

    A certain diet pill was developed by a pharmaceutical company. To test the efficacy of the said pill, 10 individuals were randomly selected. The results of the study are presented in the following table:

    SUBJECTS    WEIGHT BEFORE    WEIGHT AFTER
    1           148              150
    2           142              139
    3           131              130
    4           128              128
    5           121              123
    6           118              115
    7           120              119
    8           152              151
    9           112              110
    10          110              105

    Use hypothesis testing to determine whether the diet pill is effective or not.
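    A possible computation of the paired t statistic for this table is sketched below in Python. The difference is taken as before minus after, so a positive t points to weight loss. The problem does not state a level of significance; the critical value 1.833 assumes a one-tailed test at α = 0.05 with df = 9 and is only an illustration.

```python
from math import sqrt

before = [148, 142, 131, 128, 121, 118, 120, 152, 112, 110]
after = [150, 139, 130, 128, 123, 115, 119, 151, 110, 105]

def paired_t(first, second):
    """Paired t statistic from the differences d = first - second."""
    n = len(first)
    d = [a - b for a, b in zip(first, second)]
    d_bar = sum(d) / n
    sum_d2 = sum(v * v for v in d)
    t = d_bar / sqrt((sum_d2 - n * d_bar ** 2) / (n * (n - 1)))
    return t, n - 1

t, df = paired_t(before, after)
T_CRITICAL = 1.833   # assumed: one-tailed t-table value, alpha = 0.05, df = 9
print(f"t = {t:.3f}, df = {df}")
print("Reject H0" if t >= T_CRITICAL else "Fail to reject H0")
```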

    Module 7 - Correlation and Simple Linear Regression

    1. Pearson's product-moment correlation coefficient (Pearson r)

    2. Spearman's rank-order correlation coefficient (Spearman's ρ)

    Pearson's product-moment correlation coefficient

    $$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

    Sample Problem

    Determine if there is a relationship between the number of years of service and the employees' monthly salary based on the data gathered from a certain company.

    No. of yrs. of service    Monthly salary (in T)
    5                         25
    7                         28
    8                         29
    10                        32
    12                        34
    2                         18
    11                        32
    15                        35
    20                        40
    25                        50
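    The raw-score formula for Pearson r can be applied to this table as in the Python sketch below. `pearson_r` is a hypothetical helper written for illustration, and the variable names are not from the original slides.

```python
from math import sqrt

years = [5, 7, 8, 10, 12, 2, 11, 15, 20, 25]
salary = [25, 28, 29, 32, 34, 18, 32, 35, 40, 50]

def pearson_r(x, y):
    """Pearson r from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(years, salary)
print(f"r = {r:.4f}")
```

    The result is close to +1, which indicates a very strong positive linear relationship in this sample.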

    Legend for Pearson's r and Spearman's rho

    0.00 to 0.30    Little or no positive correlation

    0.31 to 0.50    Low positive correlation

    0.51 to 0.70    Moderately positive correlation

    0.71 to 0.90    High positive correlation

    0.91 to 1.00    Very high positive correlation

    Testing the significance of the

    relationship

    $$t = r\sqrt{\frac{n - 2}{1 - r^2}}, \qquad df = n - 2$$

    Where:

    r = the correlation coefficient

    n = no. of pairs

    df = degrees of freedom
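    The significance statistic is simple to compute once r is known. The Python sketch below uses hypothetical inputs (r = 0.60 from 27 pairs) purely for illustration; the resulting t would then be compared with the t-table value for df = n - 2.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing a correlation coefficient; returns (t, df)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 2:
        raise ValueError("at least three pairs are needed")
    return r * sqrt((n - 2) / (1 - r ** 2)), n - 2

# Hypothetical example: r = 0.60 computed from n = 27 pairs.
t, df = t_for_r(r=0.60, n=27)
print(f"t = {t:.2f}, df = {df}")
```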

    Key concepts to remember

    Correlation simply describes a relationship between two variables. It does not explain why the two variables are related. Specifically, a correlation should not and cannot be interpreted as proof of a cause-and-effect relationship between the two variables.

    The value of a correlation can be affected greatly by the range of scores represented in the data.

    Key concepts to remember

    One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.

    A correlation measures the degree of relationship between two variables.

    The values of r range from -1.00 to +1.00.

    The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.

    Spearman's rank-order correlation coefficient (Spearman's ρ)

    $$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

    Where:

    n = no. of pairs

    d = difference between the ranks of each pair

    Statistic to test the significance of ρ

    $$z = \rho\sqrt{n - 1}$$

    Sample Problem

    Seven instructors are rated by freshmen and sophomore students on clarity of presentation and the results are tabulated. What is the Spearman rho for the following?

    Instructor    Freshmen    Sophomore
    1             44          58
    2             39          42
    3             36          18
    4             35          22
    5             33          31
    6             29          38
    7             22          38
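    A Python sketch of the Spearman computation for the instructor ratings follows. It ranks each set of ratings from highest to lowest and gives tied ratings the average of their ranks, which is a common convention assumed here; `average_ranks` and `spearman_rho` are hypothetical helper names. The last lines also apply the z statistic for testing the significance of ρ.

```python
from math import sqrt

freshmen = [44, 39, 36, 35, 33, 29, 22]
sophomore = [58, 42, 18, 22, 31, 38, 38]

def average_ranks(values):
    """Rank from highest (1) to lowest; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

rho = spearman_rho(freshmen, sophomore)
z = rho * sqrt(len(freshmen) - 1)   # test statistic for the significance of rho
print(f"rho = {rho:.4f}, z = {z:.3f}")
```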

    Regression Analysis

    Regression Analysis is a statistical technique used to describe relationships among variables. This relationship is expressed in the form of a mathematical equation. The simplest case of such a relationship is when there is a single independent variable (X) explaining the dependent variable (Y) in a linear fashion.

    $$y = a + bx$$

    Where a is the intercept and b is the slope or the incremental change in Y when X changes by one unit.

    Key concepts to remember

    Regression analysis is the most widely used technique of Multivariate Analysis with applications across all types of problems and all disciplines.

    It is a statistical technique that is concerned with describing and evaluating the relationship between a metric variable called dependent variable and one or more metric or non-metric variables called independent variables or regressors.

    It attempts to predict the change in the dependent variable as a result of changes in the independent variables. In addition, the analysis of the independent variables allows assessment of their respective explanatory impact on the dependent variable.

    NSAT    Achievement Grade
    78      82
    79      83
    80      82
    92      91
    93      94
    86      85
    88      87
    86      86

    SUMMARY OUTPUT

    Regression Statistics
    Multiple R           0.95
    R Square             0.90
    Adjusted R Square    0.88666
    Standard Error       1.45918
    Observations         8

    ANOVA
                  df    SS          MS         F          Significance F
    Regression    1     118.7248    118.725    55.7605    0.0003
    Residual      6     12.77516    2.1292
    Total         7     131.5

                     Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
    Intercept (a)    25.46           8.16              3.12      0.02       5.50         45.42
    NSAT (b)         0.71            0.10              7.47      0.00       0.48         0.95

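    Using the formula for an NSAT score of 70:

    Predicted Achievement grade = 25.46 + 0.71 * 70

    = 75.38 (computed with the unrounded coefficients; the rounded values shown give 75.16)

    The value of r-squared is the proportion of the variability in the achievement grade that can be determined from its relationship with the NSAT scores. Thus, about 90.29% of the variability in achievement grade is accounted for by the NSAT scores.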
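    The intercept, slope, and r-squared of the summary output can be reproduced with a short least-squares computation. The Python sketch below assumes the eight NSAT and achievement grade pairs as read from the extracted table; `fit_line` is a hypothetical helper, not part of the original material.

```python
nsat = [78, 79, 80, 92, 93, 86, 88, 86]
grade = [82, 83, 82, 91, 94, 85, 87, 86]

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared

a, b, r_squared = fit_line(nsat, grade)
predicted = a + b * 70            # predicted grade for an NSAT score of 70
print(f"a = {a:.2f}, b = {b:.2f}, r-squared = {r_squared:.4f}")
print(f"predicted grade at NSAT 70 = {predicted:.2f}")
```

    Because the prediction uses the unrounded coefficients, it matches the 75.38 shown above rather than the 75.16 obtained from the rounded values.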

    Module 8 - Selected Nonparametric Statistics

    Chi-square test (χ²)

    Mann-Whitney U test

    Kruskal-Wallis H test

    Chi-square Test

    Significant relationship

    Test of goodness-of-fit

    Test of independence

    $$\chi^2 = \sum \frac{(OF - EF)^2}{EF}, \qquad df = (r - 1)(c - 1)$$

    Where:

    OF = observed frequency

    EF = expected frequency

    r = no. of rows

    c = no. of columns

    Sample problem:

    Suppose we want to find out if there is a relationship between the students' color preference and personality. The data may be illustrated in the contingency table below:

    Observed frequencies:

                          Red    Yellow    Green    Blue    Row totals (fr)
    Introvert             10     3         15       22      50
    Extrovert             90     17        25       18      150
    Column totals (fc)    100    20        40       40      200 = Grand total (n)

    To determine the expected frequencies for each cell, we use the formula below:

    $$ef = \frac{(f_r)(f_c)}{n}$$

    Where:

    ef = expected frequency

    fr = total frequencies of the corresponding row

    fc = total frequencies of the corresponding column

    n = grand total
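    The expected frequencies and the chi-square statistic for the color preference table can be computed as in the Python sketch below. `chi_square` is a hypothetical helper, and the critical value 7.815 assumes the chi-square table entry for α = 0.05 and df = 3, since the sample problem does not state a level of significance.

```python
observed = [
    [10, 3, 15, 22],    # Introvert: Red, Yellow, Green, Blue
    [90, 17, 25, 18],   # Extrovert: Red, Yellow, Green, Blue
]

def chi_square(table):
    """Chi-square test of independence; returns (chi2, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, of in enumerate(row):
            ef = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (of - ef) ** 2 / ef
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2, df = chi_square(observed)
CHI2_CRITICAL = 7.815   # assumed chi-square table value: alpha = 0.05, df = 3
print(f"chi-square = {chi2:.2f}, df = {df}")
print("Reject H0" if chi2 >= CHI2_CRITICAL else "Fail to reject H0")
```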

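    Testing significant difference using nonparametric test (two groups)

    Mann-Whitney U test

    $$U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - \sum R_1$$

    $$U_2 = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - \sum R_2$$

    Where:

    N1, N2 = no. of observations in the first and second groups

    ΣR1, ΣR2 = sum of the ranks of the first and second groups when all observations are ranked together

    U is the smaller of U1 and U2.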

    Example:

    Treatment Control

    4 20

    7 17

    1 3

    12 15

    2 7

    2 12

    9 18
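    The U statistic for this example can be sketched in Python as follows. The pooled scores are ranked from lowest to highest and tied scores receive the average of their ranks, which is a convention assumed here; `rank_sums` and `mann_whitney_u` are hypothetical helper names.

```python
treatment = [4, 7, 1, 12, 2, 2, 9]
control = [20, 17, 3, 15, 7, 12, 18]

def rank_sums(group1, group2):
    """Rank the pooled scores from lowest (1) up, averaging tied ranks."""
    pooled = sorted(group1 + group2)

    def rank(v):
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    return sum(rank(v) for v in group1), sum(rank(v) for v in group2)

def mann_whitney_u(group1, group2):
    """Return (U, U1, U2), where U is the smaller of U1 and U2."""
    n1, n2 = len(group1), len(group2)
    r1, r2 = rank_sums(group1, group2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2), u1, u2

u, u1, u2 = mann_whitney_u(treatment, control)
print(f"U1 = {u1}, U2 = {u2}, U = {u}")
```

    The computed U is then compared with the critical value from a Mann-Whitney table for the two group sizes.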

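    Testing significant difference (3 or more groups)

    Parametric test (distribution is normal)

    - ANOVA (Analysis of variance)

    Nonparametric test

    - Kruskal-Wallis

    $$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1), \qquad df = k - 1$$

    Where:

    n = total no. of observations

    k = no. of groups

    Ri = sum of the ranks of group i

    ni = no. of observations in group i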

    To compare four bowling balls, a professional bowler bowls five games with each ball and gets the following scores:

    Bowling ball A    221  232  207  198  212
    Bowling ball B    202  225  252  218  226
    Bowling ball C    210  205  189  196  216
    Bowling ball D    229  192  247  220  208

    Use the H test at 0.05 level of significance to test the null hypothesis that on the average the bowler performs equally well with the four bowling balls.
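    A Python sketch of the H test for the bowling scores is shown below. It ranks all 20 scores together from lowest to highest (averaging ties, although none occur in this data) and compares H with 7.815, the chi-square table value assumed for α = 0.05 and df = 3. `kruskal_wallis_h` is a hypothetical helper name.

```python
balls = {
    "A": [221, 232, 207, 198, 212],
    "B": [202, 225, 252, 218, 226],
    "C": [210, 205, 189, 196, 216],
    "D": [229, 192, 247, 220, 208],
}

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from pooled ranks; returns (H, df)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    def rank(v):
        # Average rank of v among the pooled scores (1 = lowest).
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    total = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1

h, df = kruskal_wallis_h(list(balls.values()))
CHI2_CRITICAL = 7.815   # assumed chi-square table value: alpha = 0.05, df = 3
print(f"H = {h:.3f}, df = {df}")
print("Reject H0" if h >= CHI2_CRITICAL else "Fail to reject H0")
```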