
Running Head: EXCEL PROJECT

Final Project

Dina M. Kennedy

Seattle Pacific University

EDU 6976 Interpreting & Applying Educational Research II

Fall Quarter, 2009


The data was collected from elementary and secondary public schools in the United States. All 50 states are

included plus Washington D.C. The data was then separated into four regions: West, Midwest, South and Northeast.

The data was taken from the Digest of Educational Statistics. The data will be used to analyze whether school

expenditures are related to academic performance. The collected data underwent several statistical analyses. The

results and interpretation of the statistical analyses can be read below.

Part 1: Histograms, Box Plots and Frequency Distribution

The data collected from elementary and secondary schools represented in the four regions were: Current

expenditure per pupil in average daily attendance (2005-06), Average teacher/pupil ratio (Fall 2005 and 2006), Estimated

average annual salary (2005-2006), Percentage of graduates taking the SAT (2006-2007), Average verbal SAT score

(2005-2006), Average math SAT score (2005-2006), Average writing SAT score (2005-2006), Percent of students who

are eligible for free or reduced-price lunch (2006-2007), Number of students eligible for free or reduced-price lunch

(2006-2007), Percentage of students with disabilities (2006-2007) and Total revenue for the year (2005-2006) in

thousands. We will analyze this data with the use of histograms and box plots.

The histograms below will give us a visual representation of the measures of performance as well as the frequency

of occurrence. There is one histogram per variable. Each variable has a summary of its distribution analysis

located below the histogram.
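
As a rough illustration of how the bars of a histogram are tallied, the sketch below counts values into equal-width ranges. The expenditure figures and the bin width are hypothetical and are not taken from the Digest of Educational Statistics.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical per-pupil expenditures in dollars (illustration only).
expenditures = [7200, 8100, 8900, 9400, 9800, 10300, 10900, 13600, 15200]
width = 2000

# Print a simple text histogram: one '#' per value in each range.
for start, count in bin_counts(expenditures, width).items():
    print(f"{start:>6}-{start + width - 1:<6} {'#' * count}")
```

A tall bar at the low end with a thin tail of bars toward higher values is what a right-skewed distribution looks like in this kind of tally.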


This histogram has a right-skewed distribution. This indicates that the distribution of expenditure per pupil did not follow a

normal curve due to the peak in expenditures clustered between $8,000 and $11,000 per pupil. This cluster is located in

the lower portion of distribution.

The histogram’s distribution follows a normal distribution curve. There is a shift in frequency to the left which explains a lower

frequency of smaller pupil/teacher ratios. Average pupil/teacher ratios were clustered in the 13:1 to 17:1 range.


The histogram has a well defined peak that is close in value to the lower amount of money in the salary scale. This peak of the

distribution of annual salaries is clustered between $40,000 and $45,000.

The histogram has a large peak which is not close to the mean. This peak shows a large distribution of verbal SAT scores in the

range of 490-500. The median and mean are difficult to determine. The distribution of scores, minus the peak discussed

previously, resembles a bimodal distribution.


This histogram has a right-skewed distribution. According to the histogram’s data, there was a high distribution of SAT math scores

in the range of 500-520. There is an outlier in the distribution in the scoring range of 470. The mean and median are difficult to

determine. This may be a result of the range values. Overall, Math and Verbal SAT scores appear to plot in the same distribution

areas.

This histogram displays a bimodal distribution. These two distributions within the same set of data may be because of two

different tests. It appears that some type of shift in the frequency may be because of an extraneous variable. This histogram has a

peak in the 485 range. This data is not representative of the distribution in the Verbal and Math SAT scores.


This histogram indicates an outlier in the 90.0-100.0 range. I believe this distribution is due to a construction error. It is not normal to

have a frequency of distribution higher than 100% in eligible students taking the SAT.

The histogram has a relatively normal distribution except for the scores in the outlier range of 20-25. The mean and median are

located in the 30-40% range. This histogram also shows that none of the data exceeds 70%. This means that out of

the population sampled, none of the groups exceeded 70%.


The histogram has a relatively normal distribution. The population of students with disabilities does not exceed 20%, and the mean is

very clear at 15%.

The histogram has several outliers. This is due to the range values being too large. The majority of annual revenues are in the 0-5,000

dollar range. This indicates that revenues in the population are not equivalent.


Again, this histogram’s ranges are too large. The median and mean are not clear. It is difficult to determine enrollment

according to the graph. There is, however, a peak of frequency in the range of 0-5,000.

Following the histograms and their analyses, the box plots are another method to display various data. They summarize

the statistical measures: median, upper and lower quartiles and minimum and maximum data values. Box plots divide the data into

quartiles. Each of the quartiles includes 25% of the data. All quartiles total 100% of the data. The box itself contains the middle

50% of the data. The upper hinge of the box is the 75th percentile of the data. The lower hinge is the 25th percentile. The range of

the middle two quartiles is known as the inter-quartile range.

When interpreting a box plot, it is also important to note that the line within the box represents the median value of the

data set. “Whiskers” are the vertical lines that extend to the smallest and largest data values that are not outliers. Points on the outside of the

whiskers are outliers similar to the outliers in the histograms. Box plots are able to visually display a variable’s location and

frequency distribution at a glance. They are able to show the symmetry and skewness included in the data set. When using a box

plot for categorical variables side by side, it is quick and easy to compare data sets.
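
The quantities described above can be sketched in a few lines of Python. This is only an illustration: the ratios are hypothetical, the `box_plot_summary` helper is not part of the original analysis, and the 1.5 x IQR rule for flagging outliers is an assumed convention, since the exact rule used to draw the plots is not stated here.

```python
import statistics


def box_plot_summary(values):
    """Return the quantities a box plot draws, assuming the 1.5 * IQR outlier rule."""
    data = sorted(values)
    # Quartiles: 25th percentile, median (50th), and 75th percentile.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # inter-quartile range: the height of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),   # smallest value that is not an outlier
        "whisker_high": max(inside),  # largest value that is not an outlier
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical pupil/teacher ratios (illustration only).
ratios = [12, 13, 14, 14, 15, 15, 16, 17, 24]
summary = box_plot_summary(ratios)
print(summary)
```

In this made-up sample the value 24 falls above the upper fence, so it would be drawn as a separate point rather than as the end of the whisker.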


The median for the Western region’s distribution is approximately 9,000. 50% of the values fall between 8,000

and 10,000. The Midwestern region’s median is approximately 10,000. 50% of the values fall between 9,000

and 10,500. The Southern region’s median is approximately 9,000 and 50% of the values fall between 8,000 and

10,000. The Northeastern region’s median is approximately 13,750 and 50% of the values fall between 12,000 and

14,000. The Northeast region has much higher expenditures per student. The West, Midwest and South regions are

more dispersed at a lower range and are clustered together. The Western region’s third quartile is high.


The median pupil/teacher ratios by region are: West=18:1, Midwest=14:1, South=15:1 and

Northeast=13:1. The Western region has the largest frequency of distribution. The Southern region has an outlier of

47. 50% of the overall ratio scores for the data fall between 11:1 and 20:1.

This box plot has a weak positive skew. The Northeast’s box shifting toward the upper whisker weakens the skew.

The median estimated average salaries by region are: West=$45,500, Midwest=$45,000, South=$44,500 and

Northeast=$57,000. 50% of all average salaries fall between $41,000 and $56,000. Again, the southern region has

three outliers and their distribution is small and low. This may be due to advancement opportunities, job turn-over

rate and/or low beginning salaries. The northeast region has the largest frequency of higher salaries. The Midwest

has the overall largest distribution of salaries, $36,000-$60,000.


This box plot displays the percentage of eligible students taking the SAT by region. The western region has a

distribution of 5-60% with the median of 30%. The Midwest has a very small frequency distribution of 3-5% but has

outliers of 15 students at 60% and 36 students at 27%. The southern region has the highest frequency distribution of

8-75% with a median of 50%. The northeastern region’s distribution range is small but in the highest quartile. It is

70-100% with a median of 80%. This box plot has a positive skew.

The median Verbal SAT scores by region in this box plot are: West=525, Midwest=590 (with an outer

fence of 36 and 16 scores at 500), South=500 and Northeast=510. The Midwest’s scores were located in the 75th

percentile and the remaining regions were present in the inter-quartile. The south had the largest frequency


distribution of scores but the Midwest had the highest scores, overall. This box plot has a positive skew but it is

weakened by the West’s box shifting toward the higher whisker.

Verbal and Writing SAT scores are very similar. The Midwest has the outer fence of 36 again with 15 scores

represented at approximately 485. This may be a magnet school or a homeschool sample. The median for each

region: West=510, Midwest=575, South=500 and Northeast=490. The Midwest is, again, in the upper 75th percentile.

The South has the highest number of scores in the inter-quartile range. This box plot has a negative skew.


Median math scores by region: West=535, Midwest=590, South=510 and Northeast=510. The frequency

distribution has decreased compared with the Verbal and Writing SAT scores. An outer fence range is present in the Midwest.

The South continues to have the largest range of inter-quartile scores. A weak positive skew is displayed within the data

of this box plot.

Medians represented in this box plot are closely matched due to the frequency values. The West has 5 scores

out of range in the upper quartile. The South has an outer fence with 11 scores and 54 scores outside of its range.

The Northeast also has an outer fence with 33 scores. Scores are clustered in the lower range of distribution. Overall

the box plot’s distribution is positively skewed.


The West’s midpoint of distribution is 36 with a lower whisker of 30 and an upper whisker of 50. There are 32

scores at the outer fence located at 60. The Midwest’s midpoint of distribution is 35 with an upper whisker of 40 and

a lower whisker of 30. The South has a median of 50 with an upper whisker of 68 and a lower whisker of 30. The

South has the largest distribution of scores. The Northeast has a median of 30 with an upper whisker of 34 and a

lower whisker of 28. There are 33 scores on the upper outer fence and 30 scores on the lower outer fence. This box

plot is positively skewed.


50% of the West’s scores fall within the 11.5-13.75 range. 50% of the Midwest’s scores fall within the 14-15.5

range. 50% of the South’s scores fall within the 12.25-15.5 range. 50% of the Northeast’s scores fall within the 15.5-

18 range. The Northeast had the largest distribution of scores. The skew is positive but weak because the South

and Northeast regions’ boxes shift toward the higher end of the whisker.

This box plot distribution of total revenues within regions is positively skewed. The number of scores at the

outer fence (44) in the Southern region is 10. The West has the lowest total revenues but has an outlier of 5 scores

above the 60,000,000 point. The South also has outliers: the outer fence is 44 and the number of scores is 10 and

44. The Northeast had the highest distribution of scores within the inter-quartile range, exceeding all other regions.


There were two outer fence limits present in this positively skewed box plot displaying fall enrollment: the

West, with 5 scores, and the South, with 54. Enrollment was the closest of all variables with the Northeast being the

highest and the West being the lowest.

The frequency distribution of the categorical variables, state name and region, is as follows:

Region 1 (13 Western States): Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico,

Oregon, Utah, Washington and Wyoming. Region 1 represents 25.5% of the nation.

Region 2 (12 Midwestern States): Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North

Dakota, Ohio, South Dakota and Wisconsin. Region 2 represents 23.5% of the nation.

Region 3 (17 Southern States): Alabama, Arkansas, Delaware, Washington D.C., Florida, Georgia, Kentucky,

Louisiana, Maryland, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia and West

Virginia. Region 3 represents 33.3% of the nation.

Region 4 (9 Northeastern States): Connecticut, Maine, Massachusetts, New Hampshire, New Jersey, New York,

Pennsylvania, Rhode Island and Vermont. Region 4 represents 17.6% of the nation.
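
The regional percentages above follow directly from the counts. A minimal sketch of that arithmetic, using the four region sizes listed (the dictionary and variable names are only illustrative):

```python
# Number of states (plus Washington D.C. in the South) in each region, from the list above.
region_counts = {"West": 13, "Midwest": 12, "South": 17, "Northeast": 9}

total = sum(region_counts.values())  # 50 states plus Washington D.C.

# Each region's share of the sample, as a percentage rounded to one decimal place.
shares = {region: round(100 * n / total, 1) for region, n in region_counts.items()}

for region, share in shares.items():
    print(f"{region}: {region_counts[region]} of {total} = {share}%")
```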

Part 2: ANOVA

We have finished analyzing the distributions of all the continuous variables and now we will take a look at how

the regions differ in 1) Expenditure per pupil, 2) Pupil/Teacher ratio, 3) Teacher Average Salary, 4) Percentage of All


Students Taking SAT, 5) Performance on SAT, 6) SES as Measured by Percentage of Students on Free or Reduced Price

Lunch and 7) Percentage of Students with Disabilities. We will do this by using an analysis of variance, or ANOVA, test.

The ANOVA will allow us to establish whether there is a significant difference between the regions’ means. We will analyze the

variance between groups (regions) and the variance within each of the regions, independently. It will be from both of

these measures of variance, between-groups and within-groups effects that we determine if a significant difference is

present.

Several items can be observed from the ANOVA run on expenditures per pupil within and between the regions.

The Northeast’s mean ($13,601.44) is noticeably higher than those of the other three regions. The South is the largest group,

with an N of 17 compared to 9, 12 and 13. The between-groups F ratio is 9.75 and the within-groups value is 1.45.

When the variance between groups is sufficiently larger than the variance within groups, we can conclude that a significant

difference is present. According to the data, there is a significant difference between the regions regarding expenditures per pupil.

Descriptive Statistics

Dependent Variable: current expenditure per pupil in average daily attendance in public elem and sec schools 2005-06

Region        Mean        Std. Deviation    N
West          9244.92     2024.64           13
Midwest       9905.42     814.05            12
South         9720.88     2626.95           17
Northeast     13601.44    1804.85           9
Total         10327.78    2502.20           51
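
The between-groups F ratio can be cross-checked from the descriptive statistics alone, because a one-way ANOVA only needs each group's mean, standard deviation and size. The sketch below is an illustration of that calculation and not the software procedure used for this project; the function name `anova_from_summary` is made up for the example.

```python
def anova_from_summary(groups):
    """One-way ANOVA F ratio from (mean, std_dev, n) triples, one per group."""
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / total_n
    # Between-groups sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    # Within-groups sum of squares: rebuilt from each group's sample standard deviation.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Mean, standard deviation and N for expenditure per pupil, from the table above.
expenditure = [
    (9244.92, 2024.64, 13),   # West
    (9905.42, 814.05, 12),    # Midwest
    (9720.88, 2626.95, 17),   # South
    (13601.44, 1804.85, 9),   # Northeast
]

f_ratio, df_between, df_within = anova_from_summary(expenditure)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

With these inputs the result comes out close to the between-groups F ratio of 9.75 reported above, with 3 and 47 degrees of freedom.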

Several things can be observed from the pupil/teacher ratio and the regions. The West has the highest sample

mean at 17.81. The difference between the Northeast mean of 12.73 and the West mean of 17.81 is 5.08. The difference

between the means of the Midwest and South is .05. The within-groups value is 4.92 and the between-groups F ratio

is 13.08. Again, the between-groups variance is considerably higher than the within-groups variance, so we can conclude

that there is a significant difference between regions in their pupil/teacher ratios.


Descriptive Statistics

Dependent Variable: average pupil/teacher ratio Fall 2005

Region        Mean     Std. Deviation    N
West          17.81    2.90              13
Midwest       14.81    1.71              12
South         14.86    1.28              17
Northeast     12.73    1.46              9
Total         15.23    2.54              51


Several things are noticeable about the variances in regions and average salaries. The Northeast has the highest

salaries with a mean of $53,864.89. The West, South and Midwest are all quite similar. The within-subjects value

is .53 and the between-subjects F ratio is 3.45. We can conclude, again, that there is a significant difference among

the regions in their annual salaries.

Descriptive Statistics

Dependent Variable: estimated ave salary 2005-2006

Region        Mean        Std. Deviation    N
West          47223.38    5969.66           13
Midwest       46312.50    7302.27           12
South         45717.35    6086.72           17
Northeast     53864.89    6779.55           9
Total         47679.08    6942.01           51

Several things stand out about the percentage of eligible students taking the SAT. The between-subjects test

has an F ratio of 16.66, and the within-subjects value is 16.53. Although these two values are close to each other,

an F ratio of 16.66 with 3 and 47 degrees of freedom is well above the critical value of roughly 2.8 at the .05 level, and the

regional means range from 12.67% in the Midwest to 81.44% in the Northeast. We can conclude that there is a significant

difference in means among the regions in the percentage of eligible students taking the SAT. This is important

information for educators looking at students across the nation, because SAT participation may be a determining factor in

students’ plans to continue their education at the collegiate level.


Descriptive Statistics

Dependent Variable: percentage of all eligible students taking the SAT 2006-07

Region        Mean     Std. Deviation    N
West          33.46    18.85             13
Midwest       12.67    16.75             12
South         40.35    30.89             17
Northeast     81.44    10.33             9
Total         39.33    31.12             51


Several things stand out in the Verbal, Math and Writing student performances on the SAT. Regions

did indicate a significant difference in Verbal and Math performances. For Writing, the within-subjects value was 8.01

and the between-subjects F ratio was 9.62, with an R squared value of 38%. An F ratio of this size, together with

region explaining about 38% of the variation in writing scores, indicates that writing performance also differed

significantly across regions. The data therefore do not support the idea that writing

instruction and/or curriculum is more aligned across the nation than math and verbal.
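
The F ratio and the R squared value reported for Writing can be checked against each other, since in a one-way ANOVA the two are linked through the degrees of freedom. The sketch below assumes that all 51 observations and all 4 regions were included in the Writing analysis (3 and 47 degrees of freedom), which is not stated explicitly; the function name is only illustrative.

```python
def r_squared_from_f(f_ratio, df_between, df_within):
    """Share of total variance explained by the grouping factor in a one-way ANOVA."""
    return (f_ratio * df_between) / (f_ratio * df_between + df_within)


# Assumption: 4 regions and 51 observations, so df = 3 (between) and 47 (within).
r_squared = r_squared_from_f(9.62, 3, 47)
print(f"R squared = {r_squared:.2f}")
```

Under that assumption, the reported F ratio of 9.62 corresponds to an R squared of about 0.38, which agrees with the 38% quoted above.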

Several things stand out when the regions are compared on SES, as measured by the percentage of

students on free and reduced lunch. The South has the highest mean (49.14) with a standard deviation of 9.53. The

Northeast has the lowest mean (29.78) with a standard deviation of 7.02. The West has a mean of 39.57 with a standard

deviation of 8.60 and the Midwest has a mean of 34.43 with a standard deviation of 3.77. The within-subjects value

is 1.30. This tells us that the differences within the regions are relatively small. The between-subjects

differences, however, as measured by the F ratio, are large at 14.87. Since the between-subjects effects

are considerably larger than the within-subjects effects, we can conclude that a significant difference is present. This

data is important for educators to look at because SES levels are indicators that could have an effect on many student

outcomes. It is important to note that the Western region did not have a complete sample (N=12 rather than 13) compared to past comparisons.

This may have an effect on the overall regional SES levels determined by the percentage of students eligible for free/reduced

lunch.


Descriptive Statistics

Dependent Variable: % of students eligible for free/reduced lunch 2006-07

Region        Mean     Std. Deviation    N
West          39.57    8.60              12
Midwest       34.43    3.77              12
South         49.14    9.53              17
Northeast     29.78    7.02              9
Total         39.82    10.64             50


Several things stand out when looking at the percentage of students with disabilities at a regional

level. The between-subjects F ratio is 10.25 and the within-subjects value is 2.55. This data suggests that there is

a significant difference between the regions in regard to the percentage of students with disabilities. At

a practical level, this may mean that state funding for schools is affected and that students with disabilities are receiving

services for their disabilities.


Descriptive Statistics

Dependent Variable: % of students with disabilities 2006-07

Region        Mean     Std. Deviation    N
West          12.41    1.75              13
Midwest       14.91    .88               12
South         13.99    1.75              17
Total         14.22    2.13              51

Several things stand out with total revenues for the year. The Midwest ($9,680,180.75) and South

($9,842,097.24) means are close in value, with a difference of $161,916.49. The West has the lowest mean but it also has

the highest standard deviation. This may mean that state spending has not been allocated to certain schools due to

performance standards. The Northeast has the highest revenues but a large standard deviation, which may mean that

population levels (student enrollment) have an effect. The large standard deviation may also be due to the

number of states included in the sample and their overall population size. The between-subjects

F ratio is small (.30) and the within-subjects value is .77. This data suggests that there is not a significant difference

between regions and there is not a significant difference in variance between states in each individual region.

Descriptive Statistics

Dependent Variable: Total revenues for the year 2005-06 (in thousands)

Region        Mean           Std. Deviation    N
West          8785486.46     16766404.95       13
Midwest       9680180.75     7521910.91        12
South         9842097.24     9682419.61        17
Northeast     13661645.56    15114676.13       9
Total         10208705.02    12216880.36       51


It is interesting that most of the variables showed statistically significant variance between the regions, with total

revenue as the clearest exception. Looking at the F ratios and p values of each variable was an effective way to determine significant

differences. My analysis leads me to believe that group sizes within the sample and within the overall population are

major components in the determination of significance. If it were possible to look at the data again with sample sizes that are

more equal in size, the results would be easier to generalize. With the data collected, as is, I am comfortable stating

that significant differences in means between-subjects (regions) were present for most of the variables.

Part 3: Scatter Plots and Regression Equations

The last part of analysis and interpretation is looking closely at scatter plots and regression equations which were

created for the continuous variables: 1) Expenditure and SAT Scores, 2) Pupil/Teacher Ratio and SAT Scores, 3) Salary

and Expenditures and 4) Salary and Pupil/Teacher Ratio. Each scatter plot will help to determine if there is a strong,

medium or weak correlation. We will look closely at the data represented in each scatter plot to determine if there is a

relationship between the continuous variables when they are paired together. We will look at each of the scatter plots

individually to explain in detail the continuous variable relationship and what one can predict from the regression

equations that accompany each scatter plot. Let’s take a look at our data:

Scatter plot 1: Expenditure and SAT SCORES

The relationship between expenditures and SAT scores has a negative correlation. The SAT scores are on the

Y-axis and the expenditures are on the X-axis. The data points are only moderately clustered close to the line of best fit.

Since the points are only moderately close to the line, the relationship is moderate at best. The regression

equation, y = -0.0059x + 601.4, tells us that the slope of the line is negative (-0.0059). R squared = 0.155 tells us that about 15% of

the variance in SAT scores can be explained by expenditures. This information tells educators that expenditures per pupil do not have a

strong effect on performance on the SAT. This is contrary to what many teachers believe, namely that with more

money comes more access to teaching and learning opportunities. Further research into what types of tutoring/SAT

support resources could improve student performance on the SAT without adding any cost would be very valuable. I know

that many parents are anxious to support teachers at no cost.


[Scatter plot 1: Expenditures (x-axis) and SAT Scores (y-axis). Green = Math, Blue = Verbal, Brown = Writing. Trend line: f(x) = -0.00589x + 601.42, R² = 0.1548]
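
To make the regression equation for scatter plot 1 more concrete, the sketch below plugs two expenditure levels into the trend line and recovers the correlation coefficient r from R squared. The slope, intercept and R squared are the values printed on the chart; the two expenditure levels ($8,000 and $14,000) and the function names are chosen only for illustration.

```python
import math

# Trend line values as printed on scatter plot 1.
SLOPE = -0.00589021277130288
INTERCEPT = 601.421082358085
R_SQUARED = 0.15481381442258


def predict_sat(expenditure_per_pupil):
    """Predicted SAT score from the fitted line y = slope * x + intercept."""
    return SLOPE * expenditure_per_pupil + INTERCEPT


def correlation_from_r_squared(r_squared, slope):
    """r has the magnitude sqrt(R squared) and the same sign as the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Illustrative expenditure levels in dollars per pupil.
predicted_low = predict_sat(8000)
predicted_high = predict_sat(14000)
r = correlation_from_r_squared(R_SQUARED, SLOPE)

print(f"Predicted score at $8,000:  {predicted_low:.1f}")
print(f"Predicted score at $14,000: {predicted_high:.1f}")
print(f"Correlation coefficient r:  {r:.2f}")
```

The line predicts a drop of roughly 35 points across that $6,000 difference, and the correlation of about -0.39 is consistent with describing the relationship as negative and moderate at best.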

Scatter plot 2: Pupil/Teacher Ratio and SAT Scores

The relationship between pupil/teacher ratio and SAT scores is a negative correlation. The low R squared value

suggests there was virtually no relationship between pupil/teacher ratio and SAT scores. The regression equation is

y = -0.692x + 535.9. R squared = .002 tells us that only 0.2% of the variance in SAT scores can be explained by pupil/teacher ratio.

The correlation is weak. By looking at the scatter plot we can see that most of the scores do not fall along the line. This

weak correlation may indicate to educators that more support in the classroom via support

staff is not necessary to increase performance levels. More research on the number of staff serving students and

improvements in performance is necessary.

[Scatter plot 2: Pupil/Teacher Ratio (x-axis) and SAT Scores (y-axis). Trend line: f(x) = -0.6917x + 535.90, R² = 0.0022]


Scatter plot 3: Salary and Expenditure

[Scatter plot 3: Salary (x-axis) and Expenditure (y-axis). Trend line: f(x) = 0.2485x - 1520.47, R² = 0.4753]

The relationship between salary and expenditure has a positive correlation. The correlation is moderate because

the points are only loosely clustered near the line of best fit. R squared = 0.475 means that approximately 48% of the variance in

expenditure can be explained by salary. The regression equation is y = 0.248x - 1520. This means the slope of the line is .248. It is a

positive slope. The y-intercept is -1520. This information can indicate to educators that as salaries increase, expenditures

do as well. Further research into how increased expenditure relates to salary levels would help to further

understand the relationship.

Scatter plot 4: Salary and Pupil/Teacher Ratio

[Scatter plot 4: Salary (x-axis) and Pupil/Teacher Ratio (y-axis). Trend line: f(x) = 0.0000351x + 13.49, R² = 0.0093]


The relationship between salary and pupil/teacher ratio has a very weak positive correlation. The correlation is weak

because the points are not clustered near the line of best fit. R squared = 0.009 means that approximately 0.9% of the variance in

pupil/teacher ratio can be explained by salary. The regression equation is y = 0.000035x + 13.49. This means the slope of the

line is 0.000035. It is a positive slope, but it is very close to zero. The y-intercept is 13.49. This data can indicate that salary and the number of children

per classroom do not correlate. Further research into smaller class sizes in private schools and teacher salaries may help

to draw more information, since private schools have more freedom to set classroom sizes.
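
For readers who want to see where a trend line's slope, intercept and R squared come from, here is a minimal least-squares sketch. It is not the spreadsheet procedure used for the charts in this project, and the salary and ratio values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R squared is the squared correlation coefficient.
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical average salaries (dollars) and pupil/teacher ratios (illustration only).
salaries = [40000, 45000, 50000, 55000, 60000]
ratios = [15.0, 14.0, 16.0, 13.0, 15.5]

slope, intercept, r_squared = fit_line(salaries, ratios)
print(f"y = {slope:.6f}x + {intercept:.2f}, R squared = {r_squared:.3f}")
```

The sign of the slope gives the direction of the relationship, and an R squared near zero means the line explains almost none of the variation, whichever direction the slope points.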
