The Factors that Affect GPA of Undergraduate Students


  • ABSTRACT

    The purpose of this case study is to examine the correlation between GPA and the number of hours worked per week. Using a two-tailed t-test, we failed to reject the null hypothesis µ1 = µ2. The regression equation for GPA vs. number of hours worked was f(x) = 8.899132 + 0.555767x. The coefficient of determination was 0.000602, and the correlation coefficient was 0.024546. As a second variable, the number of hours of sleep was tested as a factor that affects GPA. For GPA vs. number of hours of sleep, the regression equation was y = 3.357 - 0.00665x. The coefficient of determination was 0.021732, and the correlation coefficient was -0.147416.

    INTRODUCTION

    This case study examines the relationship between students’ GPA and the number of hours they work per week. It is thought that a student who works more will have a lower GPA because the student has less time to study. If so, the mean GPA of unemployed students should be higher than the mean GPA of employed students. The number of hours of sleep each student got was also recorded as an additional factor that could affect GPA. This hypothesis has been discussed in the article “Term-Time Employment and the Academic Performance of Undergraduates”. The authors state that their survey data of students from 2004 to 2008 “finds that an increase in work hours has negative effects on GPA” (Wenz). This was our original expectation as well, and the motivation behind collecting and analyzing the data.

    We collected our data by creating an online survey and inviting our friends through social media to take it. We collected 50 survey entries. The survey was conducted using the website Kwiksurveys.com and consisted of 16 questions (see attached survey). From these questions we chose the numerical responses that we thought would best correlate with GPA.


    To analyze the collected data, we used Excel to organize the data and to compute statistics and graphs.

    To test whether the mean GPA of employed students differed from the mean GPA of unemployed students, we used a two-tailed non-pooled t-test. The null hypothesis was that the mean GPA of unemployed students equals the mean GPA of employed students, H0: µ1 = µ2. The alternative hypothesis was that the two means are not equal, Ha: µ1 ≠ µ2. The test was performed at the 5% significance level, α = 0.05, which corresponds to a 95% confidence level. Failing to reject the null hypothesis would mean there is no significant difference, at the 5% significance level, between the GPA of students who work and the GPA of students who do not.

    To carry out the test, we first found the mean GPA of each group. The means were quite close: employed students had a mean GPA of 3.08, and unemployed students had a mean GPA of 3.016. This already goes against our expectation that unemployed students would have a higher GPA.

    The test statistic t was calculated to indicate whether any discrepancy between the two means was present. To compute t, subtract the mean of one group from the mean of the other and divide by the square root of the sum of each group's squared standard deviation divided by its number of observations. This calculation gave t = 0.530. Because this is a two-tailed test with α = 0.05, the significance level is split so that 0.025 lies in each tail, giving critical values of ±2.011. The value 0.530 falls between the critical values, in the region where we cannot reject the null hypothesis. This means there is no significant difference between the mean GPA of employed students and the mean GPA of unemployed students. This contradicts our expectation that the GPA of unemployed students would be higher since they would be able to study more.
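    The non-pooled t statistic described in words above can be sketched in a few lines of Python. This is only an illustration: the two GPA lists are hypothetical (the report does not list GPA by employment status or give the group standard deviations), and the function names `welch_t` and `reject_null` are made up for this sketch. Only the values t = 0.530 and ±2.011 come from the report.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Non-pooled two-sample t statistic: (mean_a - mean_b) / standard error."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance, s^2
    var_b = statistics.variance(sample_b)
    std_err = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / std_err


def reject_null(t_stat, t_crit):
    """Two-tailed rule: reject H0 only if t lies outside [-t_crit, +t_crit]."""
    return abs(t_stat) > t_crit


# Hypothetical GPAs, for illustration only (not the survey data).
employed = [3.1, 2.9, 3.4, 3.0, 3.2]
unemployed = [3.0, 2.8, 3.3, 3.1]
print("t for the illustrative samples:", round(welch_t(employed, unemployed), 3))

# Values reported in the text: t = 0.530, critical values ±2.011.
print("reject H0 for the reported t?", reject_null(0.530, 2.011))
```

    With the reported values the rule returns False, which is the "fail to reject" outcome described above.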

    Given this result, we considered other factors that could affect our findings and realized that just because students have more time to study does not mean they use it. The result could also be because working students are more motivated: they know what it is like to work, most likely for a wage they are not happy with.

    The purpose of a two-tailed hypothesis test is to determine whether the test statistic falls between the two critical values set by the chosen significance level. The higher the confidence level, the farther apart the two critical values are. If the test statistic does not fall between the critical values, the null hypothesis is rejected. If it lies within that range, the null hypothesis is not rejected, which is what happened in our study. This is how we concluded that the two means are not significantly different at the 5% significance level.

    Our hypothesis test found no significant difference between the mean GPA of employed students and the mean GPA of unemployed students. To examine this result further, we conducted a regression analysis, which shows whether there is any correlation between GPA and the number of hours a student works per week. If no correlation is seen between these two variables, that would support the result of our hypothesis test.

    This data was taken from 50 randomly selected college students, all attending different universities. The GPAs range from 2.0 to 3.95, and the number of hours worked per week ranges from 0 to 40. The regression analysis used the following data:

  • Table I: GPA and Number of Hours Worked

    GPA   Hours Worked per Week

    2 20

    2.4 10

    2.5 10

    2.5 3

    2.5 20

    2.56 8

    2.56 0

    2.56 12

    2.58 0

    2.6 10

    2.6 0

    2.7 15

    2.89 15

    2.9 20

    2.9 20

    2.9 25

    2.9 0

    2.9 0

    2.9 0

    2.9 0

    2.9 15

    3 15

    3 0

    3 15

    3 20


    3.01 5.5

    3.1 6

    3.1 20

    3.17 27

    3.2 0

    3.2 0

    3.2 20

    3.24 20

    3.27 10

    3.29 20

    3.3 10

    3.32 6.5

    3.33 12.5

    3.4 10

    3.4 0

    3.43 40

    3.5 5

    3.5 0

    3.5 0

    3.6 0

    3.67 9

    3.69 5

    3.7 20

    3.8 25

    3.95 5.5

    By placing this data into a scatter plot, it was easy to distinguish a weak relationship between the

    number of hours worked per week and GPA. A scatter plot for the data collected is shown as:

  • Figure 1: GPA and Number of Hours Worked

    Obtaining a regression equation for the data in the scatter plot does not seem reasonable because the data is spread out across the whole plot. If a line of best fit were drawn, most of the points would lie far from the line, which indicates that there is little to no correlation between GPA and number of hours worked per week. The Excel regression output for the data is:

    SUMMARY OUTPUT for Figure 1

    Regression Statistics
    Multiple R           0.024546
    R Square             0.000602
    Adjusted R Square    -0.02022
    Standard Error       9.493849
    Observations         50

    ANOVA
                   df    SS          MS          F           Significance F
    Regression     1     2.608213    2.608213    0.028937    0.865639
    Residual       48    4326.392    90.13316
    Total          49    4329

                   Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
    Intercept      8.899132       10.08839         0.882116    0.38211     -11.3849    29.18321
    X Variable 1   0.555767       3.267106         0.17011     0.865639    -6.01319    7.124724

    [Figure 1 plot: "Number of Hours worked and GPA"; x-axis: GPA (0 to 5); y-axis: Number of hours worked per week (0 to 45); series: Number of Hours working, with linear trend line]

    The regression equation for the data is f(x) = 8.899132 + 0.555767x, where x is GPA and f(x) is the number of hours worked per week. The slope of this regression line is 0.555767. The slope is slightly positive, suggesting that GPA and the number of hours worked increase together. The best fit line passes close to only a few of the points on the scatter plot, so it would not be a good idea to make an inference about GPA and number of hours worked from this regression equation. It does not represent enough of the data correctly.
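    As a cross-check on the Excel output, the least-squares line can be recomputed directly from the 50 (GPA, hours) pairs in Table I. The sketch below assumes, as the Excel output and Figure 1 suggest, that GPA is the x variable and hours worked is the y variable. The helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# (GPA, hours worked per week) pairs transcribed from Table I.
table_i = [
    (2, 20), (2.4, 10), (2.5, 10), (2.5, 3), (2.5, 20),
    (2.56, 8), (2.56, 0), (2.56, 12), (2.58, 0), (2.6, 10),
    (2.6, 0), (2.7, 15), (2.89, 15), (2.9, 20), (2.9, 20),
    (2.9, 25), (2.9, 0), (2.9, 0), (2.9, 0), (2.9, 0),
    (2.9, 15), (3, 15), (3, 0), (3, 15), (3, 20),
    (3.01, 5.5), (3.1, 6), (3.1, 20), (3.17, 27), (3.2, 0),
    (3.2, 0), (3.2, 20), (3.24, 20), (3.27, 10), (3.29, 20),
    (3.3, 10), (3.32, 6.5), (3.33, 12.5), (3.4, 10), (3.4, 0),
    (3.43, 40), (3.5, 5), (3.5, 0), (3.5, 0), (3.6, 0),
    (3.67, 9), (3.69, 5), (3.7, 20), (3.8, 25), (3.95, 5.5),
]

gpa = [g for g, _ in table_i]
hours = [h for _, h in table_i]
slope, intercept, r2 = fit_line(gpa, hours)
print(f"hours = {intercept:.6f} + {slope:.6f} * GPA, r^2 = {r2:.6f}")
```

    The printed coefficients can be compared with the Intercept, X Variable 1 and R Square entries in the summary output for Figure 1.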

    The coefficient of determination is 0.000602. This shows the proportion of the variation in GPA that is explained by the number of hours worked. Since only 0.06% of the variation is explained by the regression line, almost all of it goes unexplained by the line of best fit. The correlation coefficient is 0.024546, found by taking the square root of r² (0.000602). Because 0.024546 is not close to 1 (which would be a perfect line of fit), it would not be a good idea to say that GPA depends on number of hours worked. This is an example of an extremely weak positive linear correlation. Because r is so close to zero, the variables could be considered linearly uncorrelated. This implies that the regression equation is not useful for making predictions.

    An outlier is an observation that lies outside the overall pattern of the data. In this data set, an outlier is seen at (3.43, 40). This data point lies far from the regression line relative to the other data points, outside the region where the other points are concentrated, and it had a great effect on the regression equation. The point (3.43, 40) is therefore also a potential influential observation: its removal changes the regression line. Removing it changes the coefficient of determination to 0.001429 and the regression equation to f(x) = 12.35081 - 0.77004x. Both the slope and the y-intercept change considerably.

    By removing the outlier listed above, a new scatter plot would present as:

    Figure 2: Number of Hours Worked and GPA (removal of outlier)

    Obtaining a regression equation for the data in this scatter plot would once again be unreasonable, because the data points still lie far from the regression line. Most of the points do not fall on the line of best fit, indicating that there is little to no correlation between GPA and number of hours worked per week.

    The regression output with the outlier (3.43, 40) removed is:

    SUMMARY OUTPUT for Figure 2

    Regression Statistics
    Multiple R           0.037797
    R Square             0.001429
    Adjusted R Square    -0.01982
    Standard Error       8.557783
    Observations         49

    ANOVA
                   df    SS          MS          F           Significance F
    Regression     1     4.924384    4.924384    0.06724     0.796531
    Residual       47    3442.076    73.23565
    Total          48    3447

                   Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
    Intercept      12.35081       9.147798         1.35014     0.183437    -6.05218    30.75381
    X Variable 1   -0.77004       2.969591         -0.25931    0.796531    -6.74408    5.20401

    [Figure 2 plot: "Number of Hours worked and GPA"; x-axis: GPA (0 to 5); y-axis: Number of hours worked per week (0 to 30); series: Number of Hours working, with linear trend line]

    The new regression equation for this data is f(x) = 12.35081 - 0.77004x, with a slope of -0.77004. The slope is slightly negative, suggesting that the number of hours worked decreases as GPA increases. The best fit line still passes through only a few points on the scatter plot, so it is still not a good idea to make an inference about the data using this new regression equation. The coefficient of determination is 0.001429. This shows the proportion of the variation in GPA that is explained by the number of hours worked per week. Since only 0.14% of the variation is explained by the regression line, almost all of it goes unexplained by the line of best fit. The correlation coefficient is -0.037797. Its magnitude was found by taking the square root of r² (0.001429), and its sign matches the negative slope. Because -0.037797 is not close to ±1 (a perfect line of fit), it would not be a good idea to say that GPA depends on the number of hours worked per week. This is an example of a weak negative linear correlation. Because r is so close to zero, the variables could be considered linearly uncorrelated. This implies that the regression equation is not useful for making predictions.

    Removing the outlier at (3.43, 40) changed the slope of the regression equation from positive to negative. This dramatically changes the reading of the data, from suggesting that GPA and the number of hours worked increase together to suggesting that the number of hours worked decreases as GPA increases. The two regression equations are completely different and suggest different ideas about the relationship between GPA and number of hours worked. Although the two regression equations differ considerably, the amount of variation explained by either line is so small that neither equation can be used dependably. Both equations have extremely small coefficients of determination, suggesting that neither regression line is strong enough to support inferences about the data.
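    The sign flip can be illustrated without refitting from scratch, because a least-squares line depends on the data only through five running sums. The sketch below is a hedged illustration: the sums n, Σx, Σy, Σxy and Σx² were computed from Table I for this example and are not stated in the report, and the function names are hypothetical. Here x is GPA and y is hours worked per week.

```python
def line_from_sums(n, sum_x, sum_y, sum_xy, sum_xx):
    """Least-squares slope and intercept from the five sufficient sums."""
    sxy = sum_xy - sum_x * sum_y / n
    sxx = sum_xx - sum_x ** 2 / n
    slope = sxy / sxx
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


def drop_point(sums, x, y):
    """Return the sums that remain after removing one (x, y) observation."""
    n, sum_x, sum_y, sum_xy, sum_xx = sums
    return (n - 1, sum_x - x, sum_y - y, sum_xy - x * y, sum_xx - x * x)


# Sums computed from Table I for this sketch (x = GPA, y = hours worked).
full_sums = (50, 153.02, 530.0, 1626.705, 476.7466)

slope_full, intercept_full = line_from_sums(*full_sums)
slope_trim, intercept_trim = line_from_sums(*drop_point(full_sums, 3.43, 40))

print(f"all 50 points:     slope = {slope_full:.5f}, intercept = {intercept_full:.5f}")
print(f"without (3.43,40): slope = {slope_trim:.5f}, intercept = {intercept_trim:.5f}")
```

    Because the covariance term is so small relative to the spread of the data, removing a single high-leverage point is enough to change its sign, which is why neither slope should be interpreted.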

    The regression analysis supports the result of the hypothesis test of whether the mean GPA of unemployed students equals the mean GPA of employed students. Using the actual number of hours worked by employed and unemployed students, we found essentially no linear relationship between GPA and number of hours worked. This finding is consistent with the hypothesis test, which found no significant difference between the mean GPA of unemployed students and the mean GPA of employed students: if the number of hours worked (employment) has no effect on GPA, the two mean GPAs should be about equal.

    The original data of GPA and hours worked per week can be further tested by a residual analysis. The first assumption for regression inferences is that a plot of the residuals against the values of GPA should fall roughly in a horizontal band centered and symmetric about the x-axis. The second assumption is that a normal probability plot of the residuals should be roughly linear. If these assumptions are not met, regression inferences for the data are not valid. The residual plot of the data for GPA presents as:

    Figure 3: Residual Plot GPA Data

    This residual plot shows that the residuals fall roughly in a horizontal band that is

    centered and symmetric about the x-axis. This meets the first assumption of a residual plot.

    Figure 4: Normal Probability Plot for Residuals

    The normal probability plot shows that the plot for the residuals is roughly linear. This

    meets the second assumption of a normal probability plot. Interpreting these plots, we can

    conclude that there are no obvious violations of the assumptions for regression inferences for the

    variables GPA and number of hours worked per week.
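    The two checks above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the five data points are hypothetical, the function names are made up, and the plotting positions (i - 0.375) / (n + 0.25) are one common convention that may differ from what Excel uses for its normal probability plot.

```python
from statistics import NormalDist


def residuals(x, y):
    """Residuals y - (intercept + slope * x) from the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def normal_scores(n):
    """Expected standard-normal quantiles for n ordered observations."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


# Hypothetical (x, y) data, for illustration only.
x_demo = [2.0, 2.5, 3.0, 3.5, 4.0]
y_demo = [12.0, 8.0, 15.0, 5.0, 10.0]

res = residuals(x_demo, y_demo)
scores = normal_scores(len(res))

# Residual plot: plot res against x_demo and look for a horizontal band around 0.
# Normal probability plot: plot sorted residuals against their normal scores
# and look for a roughly straight line.
for score, r in zip(scores, sorted(res)):
    print(f"normal score {score:+.3f}   residual {r:+.3f}")
```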

    [Figure 3 plot: "GPA Residual Plot"; x-axis: GPA (0 to 5); y-axis: Residuals (-20 to 40)]

    [Figure 4 plot: "Normal Probability Plot"; axis labels: Number of hours worked (0 to 60) and Residual (0 to 120)]

  • Table II: GPA and Number of Hours of Sleep

    Table II: The data was obtained from a sample of undergraduate students at the University of Rhode Island.

    The previous attempt to determine whether the number of hours worked has an effect on students’ GPA showed an extremely weak correlation, close to no correlation. To explore additional factors associated with students’ GPA, the relationship between GPA and the number of hours of sleep per week was examined. First, a scatterplot was developed in order to visualize any apparent relationship between GPA and the number of hours of sleep:

    GPA Hours of Sleep GPA Hours of Sleep

    2 45 3.01 49

    2.4 75 3.09 35

    2.5 42 3.1 45

    2.5 50 3.1 49

    2.5 56 3.17 32

    2.56 30 3.2 50

    2.56 56 3.2 36

    2.56 35 3.2 35

    2.58 56 3.24 45

    2.6 40 3.27 42

    2.6 42 3.3 50

    2.7 49 3.32 42

    2.89 49 3.33 54

    2.9 42 3.4 50

    2.9 40 3.4 56

    2.9 32 3.45 49

    2.9 55 3.5 35

    2.9 50 3.5 40

    2.9 35 3.5 60

    2.9 56 3.6 49

    2.9 50 3.67 42

    3 35 3.69 35

    3 33 3.7 35

    3 50 3.8 35

    3 49 3.95 56

  • Figure 5: Scatterplot of GPA and the average number of hours of sleep students get over the span of 7 days (data from Table II).

    The scatterplot consists of the data from Table II, with the horizontal axis used for the number of hours of sleep and the vertical axis used for GPA. Although the data points do not fall exactly on a line, they appear to scatter about a line, so a regression equation is obtained to further examine the relationship between the two variables.

    Figure 6: Regression line and data points for GPA – Number of Hours of Sleep data

    [Figure 5 plot: "GPA vs. Number of Hours of Sleep"; x-axis: Number of Hours of Sleep (Hours), 0 to 80; y-axis: GPA, 0 to 4.5]

    [Figure 6 plot: "GPA vs. Number of Hours of Sleep"; x-axis: Number of Hours of Sleep (Hours), 0 to 80; y-axis: GPA, 0 to 4.5; series: GPA, Regression Equation]

  • SUMMARY OUTPUT for Figure 6

    Regression Statistics
    Multiple R           -0.147416
    R Square             0.021732
    Adjusted R Square    0.001351
    Standard Error       0.413929
    Observations         50

    ANOVA
                   df    SS          MS          F           Significance F
    Regression     1     0.182695    0.182695    1.066288    0.30696
    Residual       48    8.224193    0.171337
    Total          49    8.406888

                   Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
    Intercept      3.356989       0.296543         11.3204     3.75E-15    2.760749    3.953229
    X Variable 1   -0.00665       0.006437         -1.03261    0.30696     -0.01959    0.006296

    Table III: Summary of Regression Statistics; obtained from Microsoft Excel using Data Analysis.

    From the summary of regression statistics table, the linear equation can be obtained: y = 3.357 - 0.00665x, where the y-intercept is 3.357 and the slope is -0.00665. Because the slope is negative, the line slopes downward: the y-values decrease as x increases. This indicates that GPA decreases as the number of hours of sleep increases, which is no surprise. Although the data points are scattered, they are scattered about a line, so it is appropriate to determine a regression line and, in turn, the coefficient of determination.

    The coefficient of determination, denoted r², is a descriptive measure of the utility of the regression equation for making predictions. It is the proportion of variation in the observed values of the response variable that is explained by the regression, calculated as the regression sum of squares (SSR) divided by the total sum of squares (SST). From the summary of regression statistics table, SSR and SST are 0.182695 and 8.406888, respectively, giving a coefficient of determination of 0.021732. The coefficient of determination always lies between 0 and 1, where a value near 0 indicates that the regression equation is not very useful for making predictions, while a value near 1 indicates that the regression equation is very useful for making predictions. Since the value 0.021732 is extremely close to 0, the regression equation is not very useful for making predictions.
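    The arithmetic behind these values is short enough to check directly. The sketch below recomputes r² = SSR / SST and the signed correlation coefficient from the ANOVA values reported in the Excel outputs. The function names are hypothetical, and the sign of r is taken from the slope because a square root alone is never negative.

```python
import math


def r_squared(ssr, sst):
    """Coefficient of determination: regression SS divided by total SS."""
    return ssr / sst


def correlation(ssr, sst, slope):
    """Linear correlation coefficient: sqrt(r^2) with the sign of the slope."""
    return math.copysign(math.sqrt(ssr / sst), slope)


# Values reported in the Excel outputs.
sleep_r2 = r_squared(0.182695, 8.406888)
sleep_r = correlation(0.182695, 8.406888, slope=-0.00665)
hours_r2 = r_squared(2.608213, 4329)
hours_r = correlation(2.608213, 4329, slope=0.555767)

print(f"GPA vs. sleep: r^2 = {sleep_r2:.6f}, r = {sleep_r:.6f}")
print(f"GPA vs. hours: r^2 = {hours_r2:.6f}, r = {hours_r:.6f}")
```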

    Another measure of the relationship between two quantitative variables is the linear correlation coefficient, r, which measures the strength of the linear relationship between the two variables. From the regression statistics table, the linear correlation coefficient is -0.147416. From this value we can conclude a number of properties about the data. First, the sign of r suggests the type of linear relationship: r is negative, so the variables are negatively linearly correlated, meaning that y tends to decrease linearly as x increases. Second, the sign of r and the sign of the slope of the regression line are identical; here both are negative. This agreement always holds and does not by itself make the regression equation useful for making predictions. Third, the magnitude of r indicates the strength of the linear relationship between the two variables; here the magnitude is 0.147416, which is much nearer to 0 than to 1, so the linear relationship is very weak and x is a poor linear predictor of y. Overall, this is an extremely weak negative linear correlation with an r value of -0.147416, and the two variables can be considered essentially linearly uncorrelated.

    Both the coefficient of determination and the linear correlation coefficient indicate that GPA and the number of hours of sleep are essentially uncorrelated. This raises the possibility that outliers or influential observations are present in the data. An outlier is a data point that lies far from the regression line. An influential observation is a data point whose removal causes a considerable change in the regression equation. Two influential observations, (56, 3.95) and (45, 2), were removed and a new scatterplot was obtained.

    Figure 7: New regression line and data points for GPA – Number of Hours of Sleep data with 2

    influential observations, (56, 3.95) and (45, 2) removed.

    SUMMARY OUTPUT for Figure 7

    Regression Statistics
    Multiple R           -0.231041
    R Square             0.05338
    Adjusted R Square    0.032801
    Standard Error       0.365501
    Observations         48

    ANOVA
                   df    SS          MS          F           Significance F
    Regression     1     0.346528    0.346528    2.593951    0.114114
    Residual       46    6.14517     0.133591
    Total          47    6.491698

                   Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
    Intercept      3.477697       0.264531         13.14663    3.45E-17    2.945223    4.010172
    X Variable 1   -0.00929       0.005768         -1.61057    0.114114    -0.0209     0.002321

    [Figure 7 plot: "GPA vs. Number of Hours of Sleep"; x-axis: Number of Hours of Sleep, 0 to 80; y-axis: GPA, 0 to 4; series: GPA, Regression Equation]

    Table IV: Summary of Regression Statistics for new scatter plot with two influential observations

    removed; obtained from Microsoft Excel using Data Analysis.

    From the new summary of regression statistics table, the linear equation is obtained: y = 3.478 - 0.00929x, where the y-intercept is 3.478 and the slope is -0.00929. The slope is still negative, indicating that GPA decreases as the number of hours of sleep increases. The coefficient of determination for the new scatter plot is 0.05338. Although the r² value has more than doubled, it remains close to 0, indicating that the regression is still not useful for making predictions. The linear correlation coefficient r for the new scatter plot is -0.231041. Its magnitude increased noticeably after the two influential observations were removed. The negative value reflects the negative slope of the regression line and indicates a negative linear correlation. Although the magnitude of r, 0.231041, is still closer to 0 than to 1, it shows a somewhat stronger, though still weak, correlation between the two variables. From the linear correlation coefficient, we can conclude that there is a weak negative linear correlation with an r value of -0.231041.

    CONCLUSION

    The results were not what we initially expected. The mean GPAs of employed and unemployed students were not significantly different. This suggests that working does not noticeably affect a student’s GPA. Also, there was essentially no correlation between GPA and the number of hours a student slept. Possible reasons for this could be that students with jobs are more motivated to do well, or perhaps that the students all study for similar amounts of time even though some of them work during the week.


  • References

    Weiss, N. A. (2012). Introductory Statistics (9th ed.).

    Wenz, M., & Yu, W. (2010). Term-Time Employment and the Academic Performance of Undergraduates. Journal of Education Finance, 35(4), 358-373.

  • SAMPLE SURVEY (Factors Affecting GPA)

    1) Are you currently enrolled in college?

    Yes

    No

    2) Are you a full-time or part-time student?

    Full-time

    Part-time

    3) What is your current class status?

    Freshman

    Sophomore

    Junior

    Senior

    4) What is your current field of study (major)?

    Science

    Math

    English

    History

    Other

    5) What is your current GPA? (Response in 0.00 format)

    6) Do you live at home, on-campus or off-campus?

    Home

    On-campus

    Off-campus

    7) If you are an off-campus student, then how many hours per week do you spend driving to school? (If you

    live on campus, then mark 0 as your response)

    0

    0-5

    5-10

    10-15

    15 or more

    8) Are you currently employed or unemployed?

    Employed

    Unemployed

    9) On average, how many hours per week do you work? (Response in # hours format)

  • 10) If you commute to work, how many hours per week do you spend driving to work? (If you are not

    currently working, mark 0 as your answer)

    0

    0-5

    5-10

    10-15

    15 or more

    11) On average, how many hours of sleep do you get per week? (Response in 0.0 hours format)

    12) How many hours per week do you spend doing school-related work (ex. homework, projects, paper, lab

    reports, studying, etc)?

    0-5

    5-10

    10-15

    15 -20

    20-25

    25-30

    30 or more

    13) Are you currently involved in any extra-curricular activities? (Ex. sports, clubs, organizations, volunteer

    work, etc)

    Sports

    Organizations

    Clubs

    Volunteer Work

    None

    Other

    14) What is your family's average income?

    0-20,000

    20,000-40,000

    40,000-60,000

    60,000-80,000

    80,000-100,000

    100,000 or more

    15) Are you financially independent?

    Yes

    No

    16) What is your ethnicity?

  • Caucasian

    African American

    Native American

    Asian

    Hispanic

    Other