Running Head: EXCEL PROJECT
Final Project
Dina M. Kennedy
Seattle Pacific University
EDU 6976 Interpreting & Applying Educational Research II
Fall Quarter, 2009
The data were collected from elementary and secondary public schools in the United States. All 50 states are included, plus Washington D.C. The data were then separated into four regions: West, Midwest, South and Northeast. The data were taken from the Digest of Education Statistics and will be used to analyze whether school expenditures are related to academic performance. The data underwent various statistical analyses, and the results and their interpretation are presented below.
Part 1: Histograms, Box Plots and Frequency Distribution
The data collected from elementary and secondary schools in the four regions were: Current expenditure per pupil in average daily attendance (2005-06), Average pupil/teacher ratio (Fall 2005 and 2006), Estimated average annual teacher salary (2005-06), Percentage of graduates taking the SAT (2006-07), Average verbal SAT score (2005-06), Average math SAT score (2005-06), Average writing SAT score (2005-06), Percentage of students eligible for free or reduced-price lunch (2006-07), Number of students eligible for free or reduced-price lunch (2006-07), Percentage of students with disabilities (2006-07) and Total revenue for the year (2005-06), in thousands of dollars. We will analyze these data with histograms and box plots.
The histograms below give a visual representation of each measure and how often its values occur. There is one histogram per variable, and each histogram is followed by a summary of its distribution.
The histogram of expenditure per pupil has a right-skewed distribution. Expenditures do not follow a normal curve: the peak is clustered between $8,000 and $11,000 per pupil, in the lower portion of the distribution, with a longer tail toward higher spending.
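To put a number on "right-skewed", one option is the sample skewness coefficient, which is positive when the longer tail lies on the high side. The sketch below uses the adjusted Fisher-Pearson formula, the same one Excel's SKEW function uses. The function name and the spending values are hypothetical illustrations, not the project's data.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    # sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


# Hypothetical per-pupil expenditures: most values between $8,000 and $11,000,
# plus a few high-spending cases that form a long right tail.
spending = [8200, 8500, 8900, 9100, 9400, 9800, 10300, 10900, 13500, 16200]
print(round(sample_skewness(spending), 2))  # positive value => right-skewed
```

A value near zero suggests a roughly symmetric distribution; the sign tells which tail is longer.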
The histogram of pupil/teacher ratios follows a roughly normal distribution curve. There is a shift in frequency, with smaller ratios occurring less often. Average pupil/teacher ratios were clustered between 13:1 and 17:1.
The salary histogram has a well-defined peak near the lower end of the salary scale. This peak in the distribution of annual salaries is clustered between $40,000 and $45,000.
The histogram of verbal SAT scores has a large peak that is not close to the mean. This peak shows a high concentration of verbal SAT scores in the 490-500 range. The median and mean are difficult to determine. Apart from this peak, the distribution of scores resembles a bimodal distribution.
The histogram of math SAT scores has a right-skewed distribution, with a high concentration of scores in the 500-520 range. There is an outlier in the distribution near a score of 470. The mean and median are difficult to determine, which may be a result of the bin ranges chosen. Overall, math and verbal SAT scores fall in similar ranges of the distribution.
This histogram displays a bimodal distribution. The two clusters within the same set of data may be because of two different tests, or the shift in frequency may be caused by an extraneous variable. The histogram peaks near 485. This distribution does not resemble the distributions of the verbal and math SAT scores.
The histogram of the percentage of eligible students taking the SAT indicates an outlier in the 90-100% range. I believe this is due to an error in constructing the histogram, since the percentage of eligible students taking the SAT cannot exceed 100%.
The histogram of the percentage of students eligible for free or reduced-price lunch has a relatively normal distribution, except for the outlier scores in the 20-25% range. The mean and median fall in the 30-40% range. None of the data exceeds 70%, which means that none of the states sampled had more than 70% of students eligible.
The histogram of the percentage of students with disabilities has a relatively normal distribution. The percentage of students with disabilities does not exceed 20%, and the mean is clearly near 15%.
The histogram of total revenues has several outliers, because the bin ranges are too wide. The majority of annual revenues fall in the 0-5,000 range. This indicates that revenues vary widely across the population.
The enrollment histogram's bin ranges are also too wide. The median and mean are not clear, and it is difficult to read enrollment from the graph. There is, however, a peak in frequency in the 0-5,000 range.
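Whether a bin width is too wide can be checked by rebuilding the frequency distribution with a different width. The sketch below is a minimal illustration; the function name and the nine enrollment figures are hypothetical, not taken from the project's data.

```python
def frequency_table(values, start, width):
    """Count values into equal-width bins [start + i*width, start + (i+1)*width)."""
    if width <= 0:
        raise ValueError("bin width must be positive")
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for x in values:
        if x < start:
            raise ValueError(f"value {x} is below the first bin edge {start}")
        counts[int((x - start) // width)] += 1
    # each entry is (lower edge of bin, number of values in the bin)
    return [(start + i * width, count) for i, count in enumerate(counts)]


# Hypothetical fall enrollments (in thousands of students) for nine states.
enrollment = [90, 130, 280, 400, 650, 900, 1700, 2800, 6400]

print(frequency_table(enrollment, start=0, width=5000))  # nearly everything in one bin
print(frequency_table(enrollment, start=0, width=500))   # narrower bins reveal the shape
```

With a width of 5,000, eight of the nine values land in the first bin, which is the situation described above; a width of 500 spreads them over thirteen bins.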
Box plots are another method of displaying data. They summarize five statistical measures: the median, the upper and lower quartiles, and the minimum and maximum data values. Box plots divide the data into quartiles, each containing 25% of the data. The box itself contains the middle 50% of the data: the upper hinge of the box is the 75th percentile and the lower hinge is the 25th percentile. The range spanned by the middle two quartiles is known as the interquartile range.
When interpreting a box plot, it is also important to note that the line within the box represents the median of the data set. The “whiskers” are the vertical lines that extend to the minimum and maximum values that are not outliers. Points beyond the whiskers are outliers, similar to the outliers in the histograms. Box plots display a variable’s location and spread at a glance, and they show the symmetry or skewness of the data set. When box plots for each category of a variable (such as region) are placed side by side, it is quick and easy to compare the data sets.
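The quantities a box plot draws can be computed directly. The sketch below follows Tukey's convention: points more than 1.5 interquartile ranges beyond a hinge are outliers, points more than 3 are extreme values (the "outer fence" that packages such as SPSS mark separately), and each whisker stops at the most extreme value still inside the 1.5 × IQR fence. Quartile conventions differ between packages, so hinges computed this way may not match a given plot exactly. The function name and the data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Five-number summary, IQR, Tukey fences, whisker ends and flagged points."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # beyond: outliers
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # beyond: extreme values
    inside = [x for x in data if inner_lo <= x <= inner_hi]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        # outliers include any extreme values as well
        "outliers": [x for x in data if x < inner_lo or x > inner_hi],
        "extremes": [x for x in data if x < outer_lo or x > outer_hi],
    }


# Hypothetical pupil/teacher ratios for one region, with one unusually high value.
ratios = [12, 13, 14, 14, 15, 15, 16, 17, 18, 30]
summary = box_plot_summary(ratios)
for key, value in summary.items():
    print(f"{key:>9}: {value}")
```

Here the box runs from 14 to 16.75, the whiskers stop at 12 and 18, and the value 30 lies beyond both the inner and the outer fence.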
The median expenditure for the Western region is approximately $9,000, and 50% of the scores fall between $8,000 and $10,000. The Midwestern region’s median is approximately $10,000, with 50% of the scores between $9,000 and $10,500. The Southern region’s median is approximately $9,000, with 50% of the scores between $8,000 and $10,000. The Northeastern region’s median is approximately $13,750, with 50% of the scores between $12,000 and $14,000. The Northeast region has much higher expenditures per student. The West, Midwest and South regions sit together at a lower range and are more dispersed. The Western region’s third quartile is high.
The medians of the pupil/teacher ratios by region are: West=18:1, Midwest=14:1, South=15:1 and Northeast=13:1. The Western region has the widest spread. The Southern region has an outlier of 47. Overall, 50% of the ratios fall between 11:1 and 20:1.
This box plot has a weak positive skew. The Northeast’s box shifting toward the upper whisker weakens the skew.
The medians of each region’s estimated average salaries are: West=$45,500, Midwest=$45,000, South=$44,500 and Northeast=$57,000. 50% of all average salaries fall between $41,000 and $56,000. Again, the Southern region has outliers (three of them), and its distribution is narrow and low. This may be due to limited advancement opportunities, job turnover rate and/or low beginning salaries. The Northeast region has the highest concentration of higher salaries. The Midwest has the widest overall range of salaries, $36,000-$60,000.
This box plot displays the percentage of eligible students taking the SAT by region. The Western region ranges from 5% to 60%, with a median of 30%. The Midwest has a very narrow distribution of 3-5% but has outliers of 15 students at 60% and 36 students at 27%. The Southern region has the widest range, 8-75%, with a median of 50%. The Northeastern region’s range is narrow but lies in the highest quartile: 70-100%, with a median of 80%. This box plot has a positive skew.
The medians of the verbal SAT scores by region are: West=525, Midwest=590, South=500 and Northeast=510. The Midwest also has an outer fence of 36, with 16 scores at 500. The Midwest’s scores were located in the upper quartile, while the remaining regions fell within the interquartile range. The South had the widest spread of scores, but the Midwest had the highest scores overall. This box plot has a positive skew, but it is weakened by the West’s box shifting toward the upper whisker.
Verbal and writing SAT scores are very similar. The Midwest again has the outer fence of 36, with 15 scores at approximately 485. This may reflect a magnet school or homeschool sample. The median for each region is: West=510, Midwest=575, South=500 and Northeast=490. The Midwest is again in the upper quartile, and the South has the largest interquartile range. This box plot has a negative skew.
Median math scores by region: West=535, Midwest=590, South=510 and Northeast=510. The spread is smaller than for the verbal and writing SAT scores. An outer fence is again present in the Midwest. The South continues to have the largest interquartile range. The data in this box plot show a weak positive skew.
The medians in this box plot are closely matched. The West has 5 scores out of range above the upper quartile. The South has an outer fence with 11 scores and 54 scores outside of its range. The Northeast also has an outer fence with 33 scores. Scores are clustered in the lower range of the distribution. Overall, the box plot’s distribution is positively skewed.
In the box plot of the percentage of students eligible for free or reduced-price lunch, the West’s median is 36, with a lower whisker of 30 and an upper whisker of 50. There are 32 scores at the outer fence, located at 60. The Midwest’s median is 35, with an upper whisker of 40 and a lower whisker of 30. The South has a median of 50, with an upper whisker of 68 and a lower whisker of 30, and has the widest spread of scores. The Northeast has a median of 30, with an upper whisker of 34 and a lower whisker of 28. There are 33 scores on the upper outer fence and 30 scores on the lower outer fence. This box plot is positively skewed.
In the box plot of the percentage of students with disabilities, 50% of the West’s scores fall within the 11.5-13.75 range, 50% of the Midwest’s within 14-15.5, 50% of the South’s within 12.25-15.5 and 50% of the Northeast’s within 15.5-18. The Northeast had the widest spread of scores. The skew is positive but weak, because the South’s and Northeast’s boxes shift toward the upper whisker.
This box plot of total revenues within regions is positively skewed. The West has the lowest total revenues but has an outlier of 5 scores above the 60,000,000 point. The South also has outliers: its outer fence is 44, with 10 scores at that fence. The Northeast’s interquartile range is higher than those of all the other regions.
There were two outer fence limits present in this positively skewed box plot of fall enrollment: the West, with 5 scores, and the South, with 54. Enrollment was the most closely matched of all the variables across regions, with the Northeast the highest and the West the lowest.
The frequency distribution of the categorical variables, state name and region, is as follows:
Region 1 (13 Western states): Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington and Wyoming. Region 1 represents 25.5% of the 51 cases.
Region 2 (12 Midwestern states): Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota and Wisconsin. Region 2 represents 23.5% of the 51 cases.
Region 3 (16 Southern states plus Washington D.C.): Alabama, Arkansas, Delaware, Washington D.C., Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia and West Virginia. Region 3 represents 33.3% of the 51 cases.
Region 4 (9 Northeastern states): Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island and Vermont. Region 4 represents 17.6% of the 51 cases.
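The regional percentages are relative frequencies: each region's count divided by the 51 cases (50 states plus Washington D.C.). A short sketch that reproduces them; the function name is illustrative. Because each value is rounded to one decimal place, the four percentages sum to 99.9 rather than 100.

```python
def region_percentages(counts):
    """Percentage of all cases in each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Number of cases per region, as listed above.
region_counts = {"West": 13, "Midwest": 12, "South": 17, "Northeast": 9}

for region, pct in region_percentages(region_counts).items():
    print(f"{region:<10} {region_counts[region]:>3}  {pct:>5.1f}%")
```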
Part 2: ANOVA
We have finished analyzing the distributions of all the continuous variables. Now we will look at how the regions differ in 1) Expenditure per Pupil, 2) Pupil/Teacher Ratio, 3) Teacher Average Salary, 4) Percentage of All Students Taking the SAT, 5) Performance on the SAT, 6) SES as Measured by Percentage of Students on Free or Reduced-Price Lunch and 7) Percentage of Students with Disabilities. We will do this with an analysis of variance, or ANOVA. The ANOVA will allow us to establish whether there is a significant difference between the regions’ means. We will analyze the variance between groups (regions) and the variance within each region. The F ratio compares these two measures (between-groups variance divided by within-groups variance), and it is from this comparison that we determine whether a significant difference is present.
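For reference, the standard one-way ANOVA quantities can be written in terms of what the Descriptive Statistics tables below report: for region j, n_j is its N, x̄_j its mean and s_j its standard deviation; k is the number of regions and N the total number of cases. With k = 4 and N = 51, F is judged against an F distribution with (3, 47) degrees of freedom, and the p-value is the probability of an F at least this large if the regional means were truly equal.

```latex
\begin{aligned}
SS_{\text{between}} &= \sum_{j=1}^{k} n_j\,(\bar{x}_j - \bar{x})^2, &
MS_{\text{between}} &= \frac{SS_{\text{between}}}{k-1}, \\
SS_{\text{within}}  &= \sum_{j=1}^{k} (n_j - 1)\,s_j^2, &
MS_{\text{within}}  &= \frac{SS_{\text{within}}}{N-k}, \\
F &= \frac{MS_{\text{between}}}{MS_{\text{within}}}, &
\bar{x} &= \frac{1}{N}\sum_{j=1}^{k} n_j\,\bar{x}_j .
\end{aligned}
```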
Several things can be observed from the ANOVA run on expenditure per pupil within and between the regions. The Northeast’s mean ($13,601.44) is significantly higher than those of the other three regions. The South has the largest sample, 17, compared with 9, 12 and 13. The between-groups F ratio is 9.75 and the variance within groups is 1.45. When the variance between groups is higher than the variance within groups, we can determine that a significant difference is present. According to the data, there is a significant difference between the regions in expenditure per pupil.
Descriptive Statistics
Dependent Variable: current expenditure per pupil in average daily
attendance in public elem and sec schools 2005-06
Region Mean Std. Deviation N
West 9244.92 2024.64 13
Midwest 9905.42 814.05 12
South 9720.88 2626.95 17
Northeast 13601.44 1804.85 9
Total 10327.78 2502.20 51
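Because the table reports each region's N, mean and standard deviation, the one-way ANOVA F ratio can be recomputed without the raw data. This sketch does that (the function name is hypothetical). Applied to the expenditure table, it gives F ≈ 9.75 with (3, 47) degrees of freedom, matching the between-groups F ratio reported above. It does not compute a p-value; that step still needs a statistics package or an F table.

```python
def anova_from_summary(groups):
    """One-way ANOVA F ratio from per-group (n, mean, sd) summaries.

    Returns (F, (df_between, df_within)).
    """
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # between groups: how far each group mean sits from the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # within groups: spread inside each group, rebuilt from the SDs
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total_n - k)
    return ms_between / ms_within, (k - 1, total_n - k)


# (N, mean, SD) for expenditure per pupil, from the Descriptive Statistics table.
expenditure = [
    (13, 9244.92, 2024.64),   # West
    (12, 9905.42, 814.05),    # Midwest
    (17, 9720.88, 2626.95),   # South
    (9, 13601.44, 1804.85),   # Northeast
]

f_ratio, df = anova_from_summary(expenditure)
print(f"F{df} = {f_ratio:.2f}")
```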
Several things can be observed about pupil/teacher ratios across the regions. The West has the highest sample mean, 17.81. The difference between the Northeast mean of 12.73 and the West mean of 17.81 is 5.08. The difference between the means of the Midwest and South is .05. The within-groups variance (F ratio) is 4.92 and the between-groups variance is 13.08. Again, the between-groups value is much higher than the within-groups value, so we can determine that there is a significant difference between regions in pupil/teacher ratio.
Descriptive Statistics
Dependent Variable: average pupil/teacher ratio Fall 2005
Region Mean Std. Deviation N
West 17.81 2.90 13
Midwest 14.81 1.71 12
South 14.86 1.28 17
Northeast 12.73 1.46 9
Total 15.23 2.54 51
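A complementary measure is the share of total variation that lies between regions, SS_between / SS_total (eta squared, which is the "R Squared" that general linear model output typically reports for a one-way model like this one). A sketch computing it from the pupil/teacher table; the function name is illustrative. It comes out at about 0.45, so roughly 45% of the state-to-state variation in pupil/teacher ratio lies between regions.

```python
def eta_squared(groups):
    """SS_between / SS_total for a one-way layout, from per-group (n, mean, sd)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    # SS_total = SS_between + SS_within when the grand mean is N-weighted
    return ss_between / (ss_between + ss_within)


# (N, mean, SD) for average pupil/teacher ratio, from the table above.
pupil_teacher = [
    (13, 17.81, 2.90),   # West
    (12, 14.81, 1.71),   # Midwest
    (17, 14.86, 1.28),   # South
    (9, 12.73, 1.46),    # Northeast
]

print(f"eta squared = {eta_squared(pupil_teacher):.2f}")
```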
Several things are noticeable about the regional variances in average salaries. The Northeast has the highest salaries, with a mean of $53,864.89. The West, South and Midwest are all quite similar. The within-subjects variance is .53 and the between-subjects variance is 3.45. We can conclude, again, that there is a significant difference among the regions in annual salaries.
Descriptive Statistics
Dependent Variable: estimated ave salary 2005-2006
Region Mean Std. Deviation N
West 47223.38 5969.66 13
Midwest 46312.50 7302.27 12
South 45717.35 6086.72 17
Northeast 53864.89 6779.55 9
Total 47679.08 6942.01 51
Several things can be noticed about the percentage of eligible students taking the SAT. The between-subjects test has an F ratio of 16.66, and the within-subjects test has an F ratio of 16.53. The difference between these two F ratios is insignificant, so we can conclude that there is not a significant difference in means among the regions in the percentage of eligible students taking the SAT. From these data we can assume that regions do not differ significantly in the share of their eligible students who take the SAT. This is positive information for educators looking at students across the nation and their access to the SAT, which in turn may be a determining factor in students’ plans to continue their education at the collegiate level.
Descriptive Statistics
Dependent Variable: percentage of all eligible
students taking the SAT 2006-07
Region Mean Std. Deviation N
West 33.46 18.85 13
Midwest 12.67 16.75 12
South 40.35 30.89 17
Northeast 81.44 10.33 9
Total 39.33 31.12 51
Several things can be noticed about verbal, math and writing performance on the SAT. The regions did show a significant difference in verbal and math performance but did not show a significant difference in writing performance. The effect for writing was weak, with a within-subjects F ratio of 8.01 and a between-subjects F ratio of 9.62. This smaller difference between the tests, together with the R squared value of 38%, suggests that writing performance did not differ significantly across regions. This may indicate that writing instruction and/or curriculum is more aligned across the nation than math and verbal instruction.
When the regions are compared on SES, as measured by the percentage of students on free or reduced-price lunch, the South has the highest mean (49.14) with a standard deviation of 9.53. The Northeast has the lowest mean (29.78) with a standard deviation of 7.02. The West has a mean of 39.57 with a standard deviation of 8.60, and the Midwest has a mean of 34.43 with a standard deviation of 3.77. The within-subjects F ratio is 1.30, which tells us that the differences within the regions are relatively small. The between-subjects F ratio, however, is large at 14.87. Since the between-subjects effects are much higher than the within-subjects effects, we can determine that a significant difference is present. These data are important for educators because SES levels are indicators that could affect many student outcomes. It is important to note that the Western region’s sample is incomplete here (12 states rather than the 13 in the previous comparisons). This may affect the overall regional SES levels determined by the percentage of students eligible for free/reduced-price lunch.
Descriptive Statistics
Dependent Variable: % of students eligible for
free/reduced lunch 2006-07
Region Mean Std. Deviation N
West 39.57 8.600 12
Midwest 34.43 3.77 12
South 49.14 9.53 17
Northeast 29.78 7.02 9
Total 39.82 10.64 50
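The Total row can be rebuilt from the four regional rows: the overall mean is the N-weighted average of the regional means, and the total sum of squares is the within-region part, (n - 1)s², plus the between-region part, n(mean - overall mean)². The sketch below (function name hypothetical) does this for the free/reduced-price lunch table, where N = 50 because the West has 12 states. It agrees with the Total row to within the rounding of the regional values (mean ≈ 39.83 vs. 39.82, SD ≈ 10.64).

```python
import math


def combine_groups(groups):
    """Overall N, mean and SD of all cases from per-group (n, mean, sd)."""
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    # total sum of squares = within-group part + between-group part
    ss_total = sum((n - 1) * sd ** 2 + n * (m - mean) ** 2 for n, m, sd in groups)
    return total_n, mean, math.sqrt(ss_total / (total_n - 1))


# (N, mean, SD) for % eligible for free/reduced lunch, from the table above.
lunch = [
    (12, 39.57, 8.60),   # West (one state missing)
    (12, 34.43, 3.77),   # Midwest
    (17, 49.14, 9.53),   # South
    (9, 29.78, 7.02),    # Northeast
]

total_n, overall_mean, overall_sd = combine_groups(lunch)
print(f"Total: N = {total_n}, mean = {overall_mean:.2f}, SD = {overall_sd:.2f}")
```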
Several things can be noticed when the percentage of students with disabilities is examined statistically at the regional level. The between-subjects F ratio is 10.25 and the within-subjects value is 2.55. These data suggest that there is a significant difference between the regions in the percentage of students with disabilities. At a practical level, this may mean that state funding for schools is affected and that students with disabilities are receiving services for their disabilities.
Descriptive Statistics
Dependent Variable: % of students with
disabilities 2006-07
Region Mean Std. Deviation N
West 12.41 1.75 13
Midwest 14.91 .88 12
South 13.99 1.75 17
Northeast 16.34 2.31 9
Total 14.22 2.13 51
Several things can be noticed about total revenues for the year. The Midwest ($9,680,180.75 thousand) and South ($9,842,097.24 thousand) means are close in value, with a difference of $161,916.49 thousand. The West has the lowest mean but also the highest standard deviation. This may mean that state spending has not been allocated to certain schools due to performance standards. The Northeast has the highest revenues but a large standard deviation, which may mean that population levels (student enrollment) have an effect. The large standard deviation may also be due to the number of cities included in the sample and their overall population size. The between-subjects F ratio is small (.30), and the within-subjects F ratio is .77. These data suggest that there is not a significant difference between regions, nor a significant difference in variance between cities within each region.
Descriptive Statistics
Dependent Variable: Total revenues for the year 2005-06 (in thousands)
Region Mean Std. Deviation N
West 8785486.46 16766404.95 13
Midwest 9680180.75 7521910.91 12
South 9842097.24 9682419.61 17
Northeast 13661645.56 15114676.13 9
Total 10208705.02 12216880.36 51
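Because the regions' revenue means differ so much, raw standard deviations are hard to compare. One option is the coefficient of variation (SD divided by mean). This sketch computes it from the table, where values are in thousands of dollars (a mean of 9,680,180.75 is about $9.7 billion). The West's SD is about 1.9 times its mean, the largest relative spread of the four regions. For a quantity that cannot be negative, an SD larger than the mean usually points to a long right tail, consistent with the revenue outliers noted in Part 1.

```python
# (mean, SD) of total revenues, in thousands of dollars, from the table above.
revenue = {
    "West": (8785486.46, 16766404.95),
    "Midwest": (9680180.75, 7521910.91),
    "South": (9842097.24, 9682419.61),
    "Northeast": (13661645.56, 15114676.13),
}

# Coefficient of variation: SD relative to the mean (unit-free).
cv = {region: sd / mean for region, (mean, sd) in revenue.items()}

for region, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{region:<10} CV = {value:.2f}")
```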
It is interesting that there were not many statistically significant differences between the regions or between the cities within the regions. Looking at the F ratios and p-values of each variable was an effective way to determine significant differences. My analysis leads me to believe that the sizes of the populations within the samples and within the overall population are major factors in determining significance. If it were possible to look at the data again with samples that are more equal in size, the results could be generalized more accurately. With the data as collected, I am comfortable stating that overall significant differences in means between subjects (regions) were not present.
Part 3: Scatter Plots and Regression Equations
The last part of analysis and interpretation is looking closely at scatter plots and regression equations which were
created for the continuous variables: 1) Expenditure and SAT Scores, 2) Pupil/Teacher Ratio and SAT Scores, 3) Salary
and Expenditures and 4) Salary and Pupil/Teacher Ratio. Each scatter plot will help to determine if there is a strong,
medium or weak correlation. We will look closely at the data represented in each scatter plot to determine if there is a
relationship between the continuous variables when they are paired together. We will look at each of the scatter plots
individually to explain in detail the continuous variable relationship and what one can predict from the regression
equations that accompany each scatter plot. Let’s take a look at our data:
Scatter plot 1: Expenditure and SAT Scores
The relationship between expenditures and SAT scores is a negative correlation. SAT scores are on the y-axis and expenditures are on the x-axis. The data points are only moderately clustered around the line of best fit, so the relationship is moderate. The regression equation, y = -0.0059x + 601.4, shows that the slope of the line is negative (-0.0059). R squared = 0.154 tells us that about 15% of the variation in SAT scores can be explained by expenditures. This tells educators that expenditure per pupil does not have a strong effect on students’ SAT performance. This runs contrary to what many teachers believe: that more money brings more access to teaching and learning opportunities. Further research into what types of tutoring or SAT support resources could improve student performance on the SAT without adding cost would be very valuable. I know that many parents are eager to support teachers at no cost.
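A regression equation can be used for prediction by substituting an expenditure value for x. The sketch below uses the full-precision coefficients of the trend line printed in the Scatter plot 1 figure. Which of the three SAT series that single trend line was fitted to is not stated, so treat the predictions as illustrative. The predicted score drops by about 5.9 points for each additional $1,000 per pupil.

```python
# Coefficients of the trend line shown in Scatter plot 1.
SLOPE = -0.00589021277130288   # SAT points per dollar of expenditure per pupil
INTERCEPT = 601.421082358085


def predicted_sat(expenditure_per_pupil):
    """Predicted SAT score from the Scatter plot 1 regression line."""
    return SLOPE * expenditure_per_pupil + INTERCEPT


for spend in (8_000, 11_000, 14_000):
    print(f"${spend:,} per pupil -> predicted SAT {predicted_sat(spend):.1f}")

print(f"change per extra $1,000: {SLOPE * 1_000:.1f} points")
```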
[Figure: Scatter plot 1: Expenditures (x-axis, $4,000-$20,000) and SAT scores (y-axis, 0-700). Green = math, blue = verbal, brown = writing. Trend line: f(x) = −0.00589x + 601.42, R² = 0.1548]
Scatter plot 2: Pupil/Teacher Ratio and SAT Scores
The relationship between pupil/teacher ratio and SAT scores is a negative correlation. The low R value suggests there was virtually no relationship between pupil/teacher ratio and SAT scores. The regression equation is y = -0.692x + 535.9. R squared = .002 tells us that only about 0.2% of the variation in SAT scores can be explained by pupil/teacher ratio. The correlation is weak: looking at the scatter plot, we can see that most of the scores do not fall along the line. This may indicate to educators that more support in the classroom via support staff is not necessary to increase performance levels. More research on the amount of staff serving students and improvements in performance is necessary.
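The text refers to the R value. For a trend line with a single predictor, Pearson's r is the square root of R² carrying the sign of the slope, and R² is a proportion, so it must be multiplied by 100 to read as a percentage. A sketch using the coefficients printed in the Scatter plot 1 and 2 figures (the function name is hypothetical): expenditure gives r ≈ -0.39, a moderate negative correlation, while pupil/teacher ratio gives r ≈ -0.05, essentially none.

```python
import math


def correlation_from_fit(slope, r_squared):
    """Pearson r for a one-predictor linear fit: sqrt(R^2) with the slope's sign."""
    return math.copysign(math.sqrt(r_squared), slope)


# Slopes and R^2 values from the trend lines in Scatter plots 1 and 2.
fits = {
    "Expenditure vs. SAT": (-0.00589021277130288, 0.15481381442258),
    "Pupil/teacher ratio vs. SAT": (-0.691717731785425, 0.00218149562332148),
}

for name, (slope, r_squared) in fits.items():
    r = correlation_from_fit(slope, r_squared)
    print(f"{name}: r = {r:.2f}, R^2 = {100 * r_squared:.1f}% of variance")
```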
[Figure: Scatter plot 2: Pupil/teacher ratio (x-axis, 10-24) and SAT scores (y-axis, 0-700). Trend line: f(x) = −0.6917x + 535.90, R² = 0.0022]
Scatter plot 3: Salary and Expenditure
[Figure: Scatter plot 3: Salary (x-axis, $30,000-$65,000) and expenditure (y-axis, $0-$20,000). Trend line: f(x) = 0.2485x − 1520.47, R² = 0.4753]
The relationship between salary and expenditure is a positive correlation. The correlation is moderate because the scores are not tightly clustered near the line of best fit. R squared = 0.475 means that approximately 48% of the variation in expenditure can be explained by salary. The regression equation is y = 0.2485x - 1520.47, so the slope of the line is positive (0.2485) and the y-intercept is -1520.47. This information can indicate to educators that as salaries increase, expenditures increase as well. Further research into the relationship between increased expenditure and salary levels would help us understand the relationship more fully.
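Behind every trend line is an ordinary least-squares fit. This minimal sketch (function name hypothetical, data illustrative rather than the project's) shows how the slope, intercept and R² are obtained. Two consequences matter here. R² for such a fit always lies between 0 and 1, so it cannot be negative. The intercept is the predicted y at x = 0, a salary of $0, far outside the observed $30,000-$65,000 range, so a negative intercept such as -1520.47 has no practical meaning on its own.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, with its R^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Illustrative (not project) data: average salary vs. expenditure per pupil.
salaries = [40_000, 44_000, 47_000, 52_000, 58_000]
expenditures = [8_600, 9_300, 10_800, 11_200, 13_900]
slope, intercept, r_squared = linear_fit(salaries, expenditures)
print(f"y = {slope:.4f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```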
Scatter plot 4: Salary and Pupil/Teacher Ratio
[Figure: Scatter plot 4: Salary (x-axis, $30,000-$65,000) and pupil/teacher ratio (y-axis, 0-25). Trend line: f(x) = 3.51E-05x + 13.4853, R² = 0.0093]
The relationship between salary and pupil/teacher ratio is a very weak positive correlation. The correlation is weak because the scores are not clustered near the line of best fit. R squared = 0.009 means that only about 0.9% of the variation in pupil/teacher ratio can be explained by salary. The regression equation is y = 3.51E-05x + 13.48, that is, y = 0.0000351x + 13.48. The slope is therefore positive but very close to zero, and the y-intercept is 13.48. These data indicate that salary and the number of children per classroom are essentially uncorrelated. Further research into class sizes and teacher salaries in private schools may provide more information, since private schools have more freedom in setting class sizes.
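Excel writes very small coefficients in E-notation: 3.5100464412213E-05 means 3.51 × 10⁻⁵, or 0.0000351, a positive number. Scaling the slope to a more meaningful salary difference makes its size clear. A sketch using the Scatter plot 4 coefficients: a $10,000 higher average salary is associated with only about 0.35 more pupils per teacher.

```python
# Trend line coefficients from Scatter plot 4 (salary on x, pupil/teacher ratio on y).
SLOPE = 3.5100464412213e-05   # Excel displays this as 3.5100464412213E-05
INTERCEPT = 13.4852657337243


def predicted_ratio(salary):
    """Predicted pupil/teacher ratio from the Scatter plot 4 regression line."""
    return SLOPE * salary + INTERCEPT


print(f"slope as a decimal: {SLOPE:.7f}")   # positive, close to zero
print(f"change per $10,000 of salary: {SLOPE * 10_000:.2f} pupils per teacher")
for salary in (40_000, 60_000):
    print(f"${salary:,}: predicted ratio {predicted_ratio(salary):.1f}")
```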