1.1 STATISTICS

STATISTICS!!!

The science of data

What is data?

Information, in the form of facts or figures obtained from experiments or surveys, used as a basis for making calculations or drawing conclusions

(Encarta dictionary)

Statistics in Science

Data can be collected about a population (surveys)

Data can be collected about a process (experimentation)

2 types of Data

Qualitative

Quantitative

Qualitative Data

Information that relates to characteristics or description (observable qualities)

Information is often grouped by descriptive category

Examples: species of plant, type of insect, shades of color, rank of flavor in taste testing

Remember: qualitative data can be “scored” and evaluated numerically

Qualitative data, manipulated numerically

[Figure: Survey results, teens and need for environmental action]

Quantitative data

Quantitative: measured using a naturally occurring numerical scale

Examples: chemical concentration, temperature, length, weight, etc.

Quantitation

Measurements are often displayed graphically

Quantitation = Measurement

In data collection for Biology, data must be measured carefully, using laboratory equipment (e.g. timers, metersticks, pH meters, balances, pipettes).

The limits of the equipment used add some uncertainty to the data collected: all equipment has a certain magnitude of uncertainty. For example, is a mass-produced ruler a good measure of 1 cm? 1 mm? 0.1 mm?

For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement!!

How to determine uncertainty?

Usually the instrument manufacturer will indicate this, so read what is provided by the manufacturer.

Be sure that the number of significant digits in the data table/graph reflects the precision of the instrument used. For example, if the manufacturer states that a balance is precise to 0.1 g and your average mass is 2.06 g, round the average to 2.1 g. Your data must be consistent with your measurement tool regarding significant figures.
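The rounding step can be sketched in Python. This sketch assumes that the instrument's resolution is a power of ten, such as 0.1 g. The function name `round_to_resolution` is hypothetical. It uses the standard-library `decimal` module so that values round half up exactly as written, without binary floating-point surprises.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_resolution(value: str, resolution: str) -> Decimal:
    """Round a measured or averaged value to the instrument's resolution.

    Both arguments are strings so that Decimal stores them exactly.
    `resolution` is assumed to be a power of ten, e.g. "0.1" or "0.01".
    """
    return Decimal(value).quantize(Decimal(resolution), rounding=ROUND_HALF_UP)


# Balance that reads to 0.1 g; calculated average mass of 2.06 g
print(round_to_resolution("2.06", "0.1"))  # prints 2.1
```

`quantize` only sets the number of decimal places. A resolution such as 0.5 would need a different approach.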

Finding the limits

As a “rule of thumb”, if not specified, use +/- 1/2 of the smallest measurement unit. For example, a metric ruler is marked in 1 mm divisions, so the limit of uncertainty of the ruler is +/- 0.5 mm.

If the room temperature is read as 25 degrees C with a thermometer marked at 1 degree intervals, what is the range of possible temperatures for the room?

(Answer: +/- 0.5 degrees Celsius. If you read 25 °C, the temperature may in fact be anywhere from 24.5 to 25.5 °C.)
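The half-division rule of thumb amounts to one subtraction and one addition. Here is a small Python sketch; the function `reading_range` is a hypothetical name, and it applies only when the manufacturer gives no uncertainty.

```python
def reading_range(reading: float, smallest_division: float) -> tuple[float, float]:
    """Return the (low, high) range for a single reading.

    Rule of thumb: when no uncertainty is specified, use
    +/- half of the smallest scale division.
    """
    uncertainty = smallest_division / 2
    return reading - uncertainty, reading + uncertainty


# Thermometer marked at 1 degree C intervals, read as 25 degrees C
low, high = reading_range(25, 1)
print(f"{low} to {high} degrees C")  # prints 24.5 to 25.5 degrees C
```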

Looking at Data

How accurate is the data? (How close are the data to the “real” results?) A consistent departure from the real value is also called BIAS.

How precise is the data? (How closely do repeated measurements agree? All test systems have some uncertainty, due to limits of measurement.) Estimating the limits of the experimental uncertainty is essential.
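The two questions can be made concrete with a short Python sketch. Accuracy is summarised as the bias (the average minus an accepted value). Precision is summarised, as one simple choice, by the range of the repeated readings. The readings and the accepted value below are made-up numbers for illustration, and `bias` and `spread` are hypothetical helper names.

```python
from statistics import mean


def bias(measurements: list[float], accepted_value: float) -> float:
    """Accuracy check: how far the average lies from the accepted ("real") value."""
    return mean(measurements) - accepted_value


def spread(measurements: list[float]) -> float:
    """A simple precision check: the range of the repeated measurements."""
    return max(measurements) - min(measurements)


# Hypothetical data: a 10.0 g reference mass weighed five times
readings = [10.3, 10.4, 10.3, 10.4, 10.3]
print(f"bias = {bias(readings, 10.0):+.2f} g, spread = {spread(readings):.2f} g")
# prints bias = +0.34 g, spread = 0.10 g
```

These readings are fairly precise (small spread) but not very accurate (consistently about 0.3 g too high).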

Comparing Averages

Once an average has been calculated for each of the 2 data sets, the average values can be plotted together on a graph to visualize the relationship between the 2.
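As a sketch, computing one average per data set before plotting might look like this in Python. The treatment names and plant heights are hypothetical. The plotting itself (in a spreadsheet or a charting library) is left out.

```python
from statistics import mean

# Hypothetical plant heights (cm) for two treatments
heights_cm = {
    "full sun": [12.1, 13.4, 12.8, 13.0, 12.6],
    "shade": [10.2, 11.0, 10.7, 10.4, 10.9],
}

# One average per data set; these are the values plotted side by side
averages = {group: mean(values) for group, values in heights_cm.items()}

for group, average in averages.items():
    print(f"{group}: {average:.2f} cm")
# prints:
# full sun: 12.78 cm
# shade: 10.64 cm
```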

Drawing error bars

The simplest way to draw an error bar is to use the mean as the central point. The endpoints of the bar go at the mean plus and minus the distance of the measurement that is farthest from the average.

[Figure: error bar diagram labelled “Average value”, “Value farthest from average” and “Calculated distance”]
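The simple error bar described here can be sketched in Python. The function name `max_deviation_error_bar` and the data are hypothetical. The bar's half-width is the largest absolute distance between any measurement and the mean.

```python
from statistics import mean


def max_deviation_error_bar(values: list[float]) -> tuple[float, float, float]:
    """Return (low, centre, high) for a maximum-deviation error bar.

    The bar is centred on the mean and extends, in both directions, by the
    distance between the mean and the value farthest from it.
    """
    centre = mean(values)
    half_width = max(abs(v - centre) for v in values)
    return centre - half_width, centre, centre + half_width


# Hypothetical repeated measurements
low, centre, high = max_deviation_error_bar([14, 15, 16, 19])
print(f"mean = {centre:.1f}, bar from {low:.1f} to {high:.1f}")
# prints mean = 16.0, bar from 13.0 to 19.0
```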

What do error bars suggest?

If the bars show extensive overlap, it is likely that there is NOT a significant difference between those values.
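To put a number on "how much" two bars overlap, here is a minimal Python sketch. The helper `overlap_length` is a hypothetical name, and each bar is given as a (low, high) pair. It only measures the shared length of the two bars. Like the visual rule itself, it is a hint, not a statistical test of significance.

```python
def overlap_length(bar_a: tuple[float, float], bar_b: tuple[float, float]) -> float:
    """Length of the region shared by two error bars given as (low, high); 0 if none."""
    low_a, high_a = bar_a
    low_b, high_b = bar_b
    return max(0.0, min(high_a, high_b) - max(low_a, low_b))


print(overlap_length((13.0, 19.0), (17.0, 23.0)))  # prints 2.0 (bars overlap)
print(overlap_length((13.0, 19.0), (20.0, 25.0)))  # prints 0.0 (no overlap)
```

Comparing the shared length with the widths of the bars gives a rough sense of whether the overlap is "extensive".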

Quick Review: 3 measures of “Central Tendency”

mode: the value that appears most frequently

median: when all data are listed from least to greatest, the value at which half of the observations are greater and half are lesser (with an even number of values, the average of the two middle values)

The most commonly used measure of central tendency is the mean, or arithmetic average (sum of data points divided by the number of points).
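Python's standard-library `statistics` module provides all three measures. Here is a quick check on a small made-up data set with an even number of values:

```python
from statistics import mean, median, mode

# Hypothetical data set with an even number of values
data = [1, 3, 3, 6, 7, 10]

print(f"mode = {mode(data)}")      # prints mode = 3 (appears twice)
print(f"median = {median(data)}")  # prints median = 4.5 (average of 3 and 6)
print(f"mean = {mean(data)}")      # prints mean = 5 (30 / 6)
```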

How can leaf lengths be displayed graphically?

Simply measure the length of each leaf and plot how many leaves there are of each length (a histogram).
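The counting behind such a histogram can be sketched in Python with `collections.Counter`. The leaf lengths below are hypothetical, rounded to the nearest centimetre. The output is a text histogram with one `#` per leaf, where a real graph would use bars.

```python
from collections import Counter

# Hypothetical leaf lengths, measured to the nearest cm
leaf_lengths_cm = [6, 7, 7, 8, 8, 8, 8, 9, 9, 10]

# Count how many leaves have each length (the heights of the histogram bars)
counts = Counter(leaf_lengths_cm)

# Text histogram: one '#' per leaf
for length in sorted(counts):
    print(f"{length:>2} cm | {'#' * counts[length]}")
```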

If smoothed, the histogram data assumes this shape

This Shape?

It is a classic bell-shaped curve, AKA a Gaussian distribution curve, AKA a normal distribution curve.

Essentially it means that in many studies with an adequate number of data points (a common rule of thumb is more than 30), most results tend to be near the mean, and fewer results are found farther from the mean.

The standard deviation is a statistic that tells you how tightly all the various data points are clustered around the mean in a set of data

Standard deviation

The STANDARD DEVIATION is a more sophisticated indicator of the precision of a set of measurements. The standard deviation is like an average deviation of measurement values from the mean. In large studies, the standard deviation is used to draw error bars, instead of the maximum deviation.
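A small Python comparison of the two half-widths for the same hypothetical data. It uses `statistics.stdev`, which divides by n - 1. For this particular data set the SD-based bar is narrower than the maximum-deviation bar, because the SD reflects all points rather than only the most extreme one. That ordering is not guaranteed for every data set.

```python
from statistics import mean, stdev

values = [14, 15, 16, 19]  # hypothetical repeated measurements
centre = mean(values)

# Error bar half-width from the maximum deviation (simple method)
max_dev = max(abs(v - centre) for v in values)

# Error bar half-width from the sample standard deviation (n - 1)
sd = stdev(values)

print(f"mean = {centre}, max deviation = {max_dev}, SD = {sd:.2f}")
# prints mean = 16, max deviation = 3, SD = 2.16
```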

A typical normal distribution curve

According to this curve:

One standard deviation away from the mean in either direction on the horizontal axis (the red area on the preceding graph) accounts for somewhere around 68 percent of the data in this group.

Two standard deviations away from the mean (the red and green areas) account for roughly 95 percent of the data.

Three Standard Deviations?

Three standard deviations (the red, green and blue areas) account for about 99.7 percent of the data.

[Figure: normal curve with the horizontal axis marked at -3 SD, -2 SD, +/-1 SD, +2 SD and +3 SD]
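These percentages come from the normal curve itself. For a standard normal variable Z, P(|Z| < k) = erf(k/√2), so Python's `math.erf` can reproduce them. This sketch assumes that the data really are normally distributed. The helper name `fraction_within` is hypothetical.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * fraction_within(k):.1f}%")
# prints:
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```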

How is Standard Deviation calculated?

With this formula!
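The formula image did not survive extraction. Assuming it is the usual sample ("n-1") standard deviation, which matches the Excel method given on the next slide, it reads:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```

Here each x_i is one measurement, x̄ is the mean of the measurements and n is the number of measurements.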

AGHHH! Ms. Pati

DO I NEED TO KNOW THIS FOR THE TEST?????

Not the formula!

This can be calculated on a scientific calculator OR… in Microsoft Excel. Type the following formula into the cell where you want the standard deviation result, using the "unbiased," or "n-1," method: =STDEV(A1:A30). Substitute the cell name of the first value in your dataset for A1, and the cell name of the last value for A30.
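For readers working in Python rather than Excel, here is a sketch. `statistics.stdev` uses the same n - 1 divisor as the `=STDEV(...)` formula (newer Excel versions also offer `STDEV.S` for this), while `statistics.pstdev` divides by n. The helper `sample_sd` is a hypothetical name that writes the calculation out step by step, and the data are made up.

```python
import math
from statistics import pstdev, stdev


def sample_sd(values: list[float]) -> float:
    """Sample ("n - 1") standard deviation, written out step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(values) / n                               # mean
    squared_devs = sum((x - xbar) ** 2 for x in values)  # sum of squared deviations
    return math.sqrt(squared_devs / (n - 1))             # divide by n - 1, take root


data = [14, 15, 16, 19]  # hypothetical measurements
print(f"{sample_sd(data):.4f}")  # hand-written n - 1 formula
print(f"{stdev(data):.4f}")      # library n - 1 version, like Excel's STDEV
print(f"{pstdev(data):.4f}")     # population (n) version, for comparison
```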

OR… Try this! http://www.pages.drexel.edu/~jdf37/mean.htm

You DO need to know the concept!

The standard deviation is a statistic that tells how tightly all the various data points are clustered around the mean in a set of data.

When the data points are tightly bunched together and the bell-shaped curve is steep, the standard deviation is small (precise results, smaller SD).

When the data points are spread apart and the bell curve is relatively flat, the standard deviation is large, which suggests less precise results.
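To see the concept in numbers, here is a Python sketch with two hypothetical data sets. Both have the same mean but different spread. The tightly bunched set gives a much smaller standard deviation.

```python
from statistics import mean, stdev

# Hypothetical data sets with the same mean (10.0) but different spread
tight = [9.9, 10.0, 10.1, 10.0, 10.0]
spread_out = [8.0, 9.5, 10.0, 10.5, 12.0]

for name, values in (("tight", tight), ("spread out", spread_out)):
    print(f"{name}: mean = {mean(values):.1f}, SD = {stdev(values):.2f}")
# prints:
# tight: mean = 10.0, SD = 0.07
# spread out: mean = 10.0, SD = 1.46
```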

THE END

For today……….