STATISTICS For Research
A Researcher Can:
1. Quantitatively describe and summarize data
2. Draw conclusions about large sets of data by sampling only small portions of them
3. Objectively measure differences and relationships between sets of data
Random Sampling
• Samples should be taken at random
• Each measurement has an equal opportunity of being selected
• Otherwise, sampling procedures may be biased
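One way to give every measurement an equal opportunity of being selected is to let a random number generator make the choice. A minimal Python sketch, assuming a hypothetical field of 100 numbered plots; the plot IDs, the sample size of 10, and the helper name draw_random_sample are illustrative and not from the slides:

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Pick sample_size items without replacement; every item is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(population), sample_size)


# Hypothetical population: ID numbers for 100 plots in a field.
plot_ids = range(1, 101)
sample = draw_random_sample(plot_ids, 10, seed=42)
print(sample)
```

Choosing plots "by eye" instead tends to favor the ones that are easiest to reach, which is one way a sampling procedure becomes biased.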
Sampling Replication
• A characteristic CANNOT be estimated from a single data point
• Replicated measurements should be taken, at least 10.
Mechanics
1. Write down a formula
2. Substitute numbers into the formula
3. Solve for the unknown.
The Null Hypothesis
• Ho = There is no difference between 2 or more sets of data
– any difference is due to chance alone
– Commonly tested at a probability level of 95% (P ≤ .05)
The Alternative Hypothesis
• HA = There is a difference between 2 or more sets of data
– the difference is due to more than just chance
– Commonly tested at a probability level of 95% (P ≤ .05)
Averages
• Population Average = mean (x̄)
• a Population mean = (μ)
– take the mean of a random sample from the population (n)
Population Means
To find the population mean (μ),
• add up (Σ) the values (x = grasshopper mass, tree height)
• divide by the number of values (n):
$$\mu = \frac{\sum x}{n}$$
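The two steps above (add up the values, divide by how many there are) can be sketched in Python. The tree heights below are made-up numbers used only for illustration, and the function name mean_of is an assumption:

```python
def mean_of(values):
    """Add up the values (sum of x) and divide by the number of values (n)."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


# Hypothetical tree heights in meters (not data from the slides).
tree_heights_m = [4.0, 5.5, 6.0, 4.5, 5.0]
print(mean_of(tree_heights_m))
```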
Measures of Variability
• Calculating a mean gives only a partial description of a set of data
– Set A = 1, 6, 11, 16, 21
– Set B = 10, 11, 11, 11, 12
• Means for A & B ??????
• Need a measure of how variable the data are.
Range
• Difference between the largest and smallest values
– Set A = 1, 6, 11, 16, 21
• Range = ???
– Set B = 10, 11, 11, 11, 12
• Range = ???
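To check the answers for Set A and Set B, here is a short Python sketch that computes the mean and the range of each set. The helper names are assumptions; the data are the two sets listed on the slides:

```python
def mean_of(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def range_of(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


set_a = [1, 6, 11, 16, 21]
set_b = [10, 11, 11, 11, 12]

for name, data in (("A", set_a), ("B", set_b)):
    print(f"Set {name}: mean = {mean_of(data):.1f}, range = {range_of(data)}")
# Set A: mean = 11.0, range = 20
# Set B: mean = 11.0, range = 2
```

Both sets have the same mean, but Set A is spread over a much wider range, which is why the mean alone is only a partial description.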
Standard Deviation
• A measure of the deviation of data from their mean.
The Formula
$$SD = \sqrt{\frac{N \sum X^2 - (\sum X)^2}{N(N-1)}}$$
SD Symbols
SD = Standard Dev
√ = Square Root
∑X² = Sum of the squared x's
(∑X)² = Sum of x's, then squared
N = # of samples (measurements)
X        X²
297      88,209
301      90,601
306      93,636
312      97,344
314      98,596
317      100,489
325      105,625
329      108,241
334      111,556
350      122,500
ΣX = 3,185        ΣX² = 1,016,797
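Substituting the table totals into the SD formula can be checked with a few lines of Python. This is a sketch that follows the slide's formula term by term; the function name sample_sd is an assumption, and the result is compared against the standard library's statistics.stdev as a cross-check:

```python
import math
import statistics


def sample_sd(values):
    """SD = sqrt((N * sum(X^2) - (sum X)^2) / (N * (N - 1)))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    sum_x = sum(values)                   # sum of X
    sum_x2 = sum(x * x for x in values)   # sum of X squared
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


x = [297, 301, 306, 312, 314, 317, 325, 329, 334, 350]
print(sum(x), sum(v * v for v in x))  # 3185 1016797
print(round(sample_sd(x), 2))         # 16.24
print(round(statistics.stdev(x), 2))  # 16.24
```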
Once You've got the Idea:
You can use your calculator to find SD!
The Normal Curve
SD & the Bell Curve
% Increments
Skewed Curves
[Figure label: median]
Critical Values
Standard Deviations: 2 SD above or below the mean = "due to more than chance alone."
THIS MEANS: The data lie outside the 95% confidence limits for probability. Your research shows there is something significant going on...
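As a rough illustration of the 2 SD rule, the sketch below checks whether a new measurement falls more than 2 SD away from the mean of a sample. It assumes the data are roughly normal, reuses the ten X values from the SD table, and uses a made-up new measurement of 360; the function name outside_two_sd is an assumption:

```python
import statistics


def outside_two_sd(value, mean, sd):
    """True if value lies more than 2 SD above or below the mean."""
    return abs(value - mean) > 2 * sd


x = [297, 301, 306, 312, 314, 317, 325, 329, 334, 350]
mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)

lower, upper = mean_x - 2 * sd_x, mean_x + 2 * sd_x
print(f"2 SD limits: {lower:.1f} to {upper:.1f}")

print(outside_two_sd(350, mean_x, sd_x))  # False: 350 is inside the limits
print(outside_two_sd(360, mean_x, sd_x))  # True: hypothetical value beyond 2 SD
```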
Chi-Square (χ²)
Chi-Square Test Requirements
• Quantitative data
• Simple random sample
• One or more categories
• Data in frequency (%) form
• Independent observations
• All observations must be used
• Adequate sample size (≥ 10)
Example
Chi-Square Symbols
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O = Observed Frequency
E = Expected Frequency
Σ = sum of
df = degrees of freedom (n - 1)
χ² = Chi Square
Chi-Square Worksheet
Chi-Square Analysis
Table value for Chi Square = 9.49
4 df
P = .05 level of significance
Is there a significant difference in car preference?
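The worksheet numbers did not survive in this text, so the sketch below uses made-up counts to show how the chi-square formula and the table value fit together. It assumes 100 people choosing among five car types (so df = 5 - 1 = 4) and an expected frequency of 20 per type under Ho; the counts and the function name chi_square are hypothetical:

```python
def chi_square(observed, expected):
    """Chi-square = sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical observed counts for five car types (not the worksheet data).
observed = [30, 25, 20, 15, 10]
expected = [20, 20, 20, 20, 20]  # Ho: no preference, 100 people / 5 types

chi2 = chi_square(observed, expected)
df = len(observed) - 1
table_value = 9.49  # critical value from the slide: 4 df, P = .05

print(chi2, df)  # 12.5 4
print("reject Ho" if chi2 > table_value else "fail to reject Ho")  # reject Ho
```

With these illustrative counts the calculated value exceeds the table value, so the difference in preference would be called significant at P = .05.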
SD & the Bell Curve
T-Tests
For populations that do follow a normal distribution
T-Tests
• To draw conclusions about similarities or differences between population means (μ)
• Is average plant biomass the same in
– two different geographical areas ???
– two different seasons ???
T-Tests
• To be COMPLETELY confident you would have to measure all plant biomass in each area.
– Is this PRACTICAL?
Instead:
• Take one sample from each population.
• Infer from the sample means and standard deviation (SD) whether the populations have the same or different means.
Analysis
• SMALL t values = high probability that the two population means are the same
• LARGE t values = low probability (the means are different)
Analysis
t calculated > t critical = reject Ho
[Figure labels: t critical, t critical]
We will be using computer analysis to perform the t-test
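The slides do not say which t-test the computer analysis runs, so the following is only a sketch of one common choice: a two-sample t-test with pooled variance, which assumes both populations are normal with similar spread. The biomass numbers and the function name two_sample_t are hypothetical; the critical value 2.306 is the standard two-tailed table value for 8 df at P = .05:

```python
import math
import statistics


def two_sample_t(sample_a, sample_b):
    """Return (t, df) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Hypothetical plant biomass (grams) from two areas, not data from the slides.
area_1 = [12.1, 13.4, 11.8, 14.0, 12.7]
area_2 = [15.2, 14.8, 16.1, 15.5, 14.9]

t_calculated, df = two_sample_t(area_1, area_2)
t_critical = 2.306  # two-tailed table value for df = 8, P = .05

print(f"t = {t_calculated:.2f}, df = {df}")
print("reject Ho" if abs(t_calculated) > t_critical else "fail to reject Ho")
```

The comparison uses the absolute value of t so that a difference in either direction counts.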
Simpson's Diversity Index
Nonparametric Testing
• For populations that do NOT follow a normal distribution
– includes most wild populations
Answers the Question
• If 2 individuals are taken at RANDOM from a community, what is the probability that they will be the SAME species?
– With the formula that follows, D is 1 minus this probability, i.e. the probability that they are DIFFERENT species
The Formula
$$D = 1 - \frac{\sum n_i (n_i - 1)}{N(N-1)}$$
Example
$$D = 1 - \frac{50(49) + 25(24) + 10(9)}{85(84)}$$
D = 0.56 (medium diversity)
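The example calculation can be reproduced in Python. This sketch follows the formula as written, taking the species counts 50, 25, and 10 from the example; the function name simpson_diversity is an assumption:

```python
def simpson_diversity(counts):
    """D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)), N = total individuals."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two individuals")
    same_species = sum(n * (n - 1) for n in counts) / (total * (total - 1))
    return 1 - same_species


counts = [50, 25, 10]  # individuals per species, N = 85
print(round(simpson_diversity(counts), 2))  # 0.56
```

A community with only one species gives D = 0 with this form of the index, since any two individuals drawn are certain to match.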
Analysis
• Closer to 1.0 =
– more Heterogeneous community (high diversity)
• Closer to 0 =
– more Homogeneous community (low diversity)
• You can calculate by hand to find "D"
• School Stats package MAY calculate it.
Designed by Anne F. Maben
Former AP Science Coach, LACOE
for the Los Angeles County Science Fair
© 2013 All rights reserved
This presentation is for viewing only and may not be published in any form