Mystery Data

Uploaded by gunnar

Page 1: Mystery Data

Mystery Data

[Chart: horizontal axis dated 3-Jan-08 to 29-Oct-08; vertical axis from 0.000 to 1.600; fitted curve f(x) = −1.78984476996036E-05 x² + 1.41853083716074 x − 28104.9051549717, R² = 0.818508472651409]

Can you figure out what this data is?

Page 2: Mystery Data

1.1 example

These are prices for Internet service packages.
Find the mean, median and mode
Determine what type of data this is
Create a suitable frequency table, stem and leaf plot and graph

13.60 15.60 17.20 16.00 17.50 18.60 18.70
12.20 18.60 15.70 15.30 13.00 16.40 14.30
18.10 18.60 17.60 18.40 19.30 15.60 17.10
18.30 15.20 15.70 17.20 18.10 18.40 12.00
16.40 15.60

Page 3: Mystery Data

Answers to yesterday’s problem

Mean = 494.30/30 ≈ 16.48
Median = average of the 15th and 16th numbers (in sorted order)
Median = (16.40 + 17.10)/2 = 16.75
Mode = 15.60 and 18.60 (bimodal)
What type of data? Numerical, so at least interval data. It has an absolute starting point, so it is ratio data.

Given this, a histogram is appropriate
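
The arithmetic above can be checked with a short script. This is a minimal sketch using Python's standard statistics module; the variable names are illustrative, and the 30 prices are copied from the 1.1 example.

```python
from statistics import mean, median, multimode

# The 30 Internet service package prices from the 1.1 example.
prices = [
    13.60, 15.60, 17.20, 16.00, 17.50, 18.60, 18.70,
    12.20, 18.60, 15.70, 15.30, 13.00, 16.40, 14.30,
    18.10, 18.60, 17.60, 18.40, 19.30, 15.60, 17.10,
    18.30, 15.20, 15.70, 17.20, 18.10, 18.40, 12.00,
    16.40, 15.60,
]

total = sum(prices)         # sum of all prices
avg = mean(prices)          # total divided by the number of prices
mid = median(prices)        # average of the 15th and 16th sorted values
modes = multimode(prices)   # every value tied for most frequent

print(f"n = {len(prices)}, sum = {total:.2f}")
print(f"mean = {avg:.2f}, median = {mid:.2f}, modes = {modes}")
```

multimode is used rather than mode because this data set has two modes, and mode would report only one of them.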

Page 4: Mystery Data

Frequency Table

Class Interval    Frequency
12.00 – 12.99     2
13.00 – 13.99     2
14.00 – 14.99     1
15.00 – 15.99     7
16.00 – 16.99     3
17.00 – 17.99     5
18.00 – 18.99     9
19.00 – 19.99     1
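
The same table can be tallied by machine. The sketch below assumes the one-dollar class intervals shown above and uses the whole-dollar part of each price as the class label; the names are illustrative.

```python
from collections import Counter

# The 30 Internet service package prices from the 1.1 example.
prices = [
    13.60, 15.60, 17.20, 16.00, 17.50, 18.60, 18.70,
    12.20, 18.60, 15.70, 15.30, 13.00, 16.40, 14.30,
    18.10, 18.60, 17.60, 18.40, 19.30, 15.60, 17.10,
    18.30, 15.20, 15.70, 17.20, 18.10, 18.40, 12.00,
    16.40, 15.60,
]

# Each class is one dollar wide (12.00 to 12.99, 13.00 to 13.99, ...),
# so the whole-dollar part of a price identifies its class.
counts = Counter(int(p) for p in prices)

for dollars in sorted(counts):
    print(f"{dollars}.00 to {dollars}.99: {counts[dollars]}")
```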

Page 5: Mystery Data

Stem and Leaf Plot

Stem  Leaf
12.   20 00
13.   60 00
14.   30
15.   60 70 30 60 20 70 60
16.   00 40 40
17.   20 50 60 10 20
18.   60 70 60 10 60 40 30 10 40
19.   30
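
A stem and leaf plot like this one can be generated with a few lines of code. This is a sketch only: it takes whole dollars as the stem and cents as the leaf, as in the plot above, but prints the leaves in sorted order rather than in the order the prices were listed.

```python
from collections import defaultdict

# The 30 Internet service package prices from the 1.1 example.
prices = [
    13.60, 15.60, 17.20, 16.00, 17.50, 18.60, 18.70,
    12.20, 18.60, 15.70, 15.30, 13.00, 16.40, 14.30,
    18.10, 18.60, 17.60, 18.40, 19.30, 15.60, 17.10,
    18.30, 15.20, 15.70, 17.20, 18.10, 18.40, 12.00,
    16.40, 15.60,
]

# Stem = whole dollars, leaf = cents.
# Converting to whole cents first avoids floating-point rounding surprises.
plot = defaultdict(list)
for p in prices:
    stem, leaf = divmod(round(p * 100), 100)
    plot[stem].append(leaf)

for stem in sorted(plot):
    leaves = " ".join(f"{leaf:02d}" for leaf in sorted(plot[stem]))
    print(f"{stem}. {leaves}")
```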

Page 6: Mystery Data

Histogram

How many class intervals?

What does the height of each bar mean?

What does the histogram tell us about the data?

[Figure: Internet Prices Histogram; horizontal axis: Price, 10 to 24; vertical axis ticks: 1 to 8]

Page 7: Mystery Data

Trends in Data

Chapter 1.3 – Visualizing Trends
Mathematics of Data Management (Nelson)
MDM 4U

Page 8: Mystery Data

Variables

Variable (Mathematics)
  a symbol denoting a quantity or symbolic representation
  an unknown quantity

Variable (Statistics)
  A measurable attribute; these typically vary over time or between individuals
  Can be discrete, continuous, or neither
  Temperature is a continuous variable
  Number of siblings is a discrete variable

Page 9: Mystery Data

The Two Types of Variables

Independent Variable
  values are arbitrarily chosen
  horizontal axis
  time is usually independent (why?)
  Counter-example: time to do the wave vs. size of group

Dependent Variable
  values depend on the independent variable
  vertical axis

Syntax: A graph of arm span vs. height means arm span is the dependent variable and height is independent

Page 10: Mystery Data

Scatter Plots

a graphical method of showing two variables
each axis represents a variable
each point indicates a pair of values (x, y)
may show a trend

Page 11: Mystery Data

What is a trend?

a pattern of average behavior that occurs over time
e.g., costs tend to increase over time
need two variables to exhibit a trend (time can be one)

Page 12: Mystery Data

An Example of a trend

U.S. population from 1780 to 1960

Describe the trend

[Figure: scatter plot of population in millions (0 to 140) against year (1780 to 1960); data set: PearlReedandKish1940_USpopulationfrom17901940]

Page 13: Mystery Data

Line of Best Fit

A line that represents the trend in the data
Can be used to make predictions
Can be drawn by hand or calculated (median-median and least squares)
Gives no indication of the strength of the trend (use the r or r² value, §1.4)

Page 14: Mystery Data

An example of the line of best fit

This is temperature data from New York over time, with a median-median line added
What type of trend are we looking at?

[Figure: scatter plot of mean temperature (14 to 32) against time (1900 to 2000), from the State of New York Historical Temperature Data, winter season; median-median line: meantemp = 0.0230x − 21.4, where x is the horizontal-axis variable]

Page 15: Mystery Data

Median-Median Line (10 points)

Page 16: Mystery Data

Creating a Median-Median Line

Divide the points into 3 symmetric groups
  If there is 1 extra point, include it in the middle group
  If there are 2 extra points, include one in each end group
Calculate the median x- and y-coordinates for each group and plot the median point (x, y)
If the median points are on a straight line, connect them
Otherwise, line up the two outer points, move 1/3 of the way to the middle point and draw a line of best fit
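
The steps above can be sketched in code. This is an illustration under stated assumptions, not the routine Fathom uses: the points are ordered by x before grouping, the slope is taken from the two outer median points, and "move 1/3 of the way to the middle point" is read as shifting the intercept one third of the way toward the parallel line through the middle median point. The function name and the sample data are hypothetical.

```python
from statistics import median


def median_median_line(points):
    """Return (slope, intercept) of the median-median line for (x, y) pairs."""
    pts = sorted(points)   # order the points by x
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 points")

    # Split into three groups. One extra point goes to the middle group;
    # two extra points go one to each end group.
    base, extra = divmod(n, 3)
    end_size = base + (1 if extra == 2 else 0)
    groups = [pts[:end_size], pts[end_size:n - end_size], pts[n - end_size:]]

    # Median point of each group: median x and median y taken separately.
    (x1, y1), (x2, y2), (x3, y3) = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]
    if x1 == x3:
        raise ValueError("outer median points share an x-value")

    # The slope comes from the two outer median points.
    slope = (y3 - y1) / (x3 - x1)
    # Intercept of the line through the outer points, and of the parallel
    # line through the middle point; move 1/3 of the way toward the latter.
    outer = y1 - slope * x1
    middle = y2 - slope * x2
    intercept = outer + (middle - outer) / 3
    return slope, intercept


# Hypothetical data for illustration only (not from the slides).
sample = [(1, 1), (2, 2), (3, 3), (4, 7), (5, 8), (6, 9), (7, 7), (8, 8), (9, 9)]
slope, intercept = median_median_line(sample)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

If the three median points already lie on one line, the shift is zero and the result is simply the line through them.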

Page 17: Mystery Data

Creating a Median-Median Line Using Technology

Click on the wikispace
Right-click the file armspan_v_height_4_ med-med.ftm and save to your M:\ or USB drive
Open the file
Create a scatter plot for each set of data
Right-click and select Median-Median Line

Page 18: Mystery Data

MSIP / Homework

Complete p. 37 #2, 3, 6, 8

Page 19: Mystery Data

Trends in Data Using Technology

Chapter 1.4 – Trends in Technology
Mathematics of Data Management (Nelson)
MDM 4U

Page 20: Mystery Data

Categories of Correlation

A scatter plot can show a correlation that is positive or negative and strong or weak
There can also be no correlation between two variables
Look at the Correlation Picture and Regression Line examples on this website to help you understand:

http://www.seeingstatistics.com/seeing1999/gallery/CorrelationPicture.html

Page 21: Mystery Data

Regression

a process of fitting a line or curve to a set of data
if a line is used, it is linear regression
if a curve is used, it may be quadratic regression, cubic regression, etc.
Why do we do this?
What can we do with the resulting function?

Page 22: Mystery Data

Correlation Coefficient

The correlation coefficient, r, is an indicator of the strength and direction of a linear relationship
  r = 0: no linear relationship
  r = 1: perfect positive correlation
  r = -1: perfect negative correlation

r² is the coefficient of determination
  Takes on values from 0 to 1
  If r² = 0.42, that means that 42% of the variation in y is explained by the linear relationship with x
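
To make these values concrete, here is a minimal sketch that computes r from its usual definition (the sum of products of deviations, divided by the square root of the product of the sums of squared deviations). The function name and the data are hypothetical and chosen only to show the three cases.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 6, 8, 10])     # points rise exactly along a line
r_down = correlation(xs, [10, 8, 6, 4, 2])   # points fall exactly along a line
r_mixed = correlation(xs, [2, 1, 4, 3, 5])   # rising trend with some scatter

print(r_up, r_down, r_mixed, r_mixed ** 2)
```

Squaring r_mixed gives the coefficient of determination for the scattered data set.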

Page 23: Mystery Data

Residuals

a residual is the vertical distance between a point and the line of best fit
if the model you are considering is a good fit, the residuals should be small and have no noticeable pattern
The least-squares line minimizes the sum of the squares of the residuals

[Figure: Collection 1 scatter plot of y (2 to 9) against x (1 to 9) with the line y = 0.0804x + 3.5, r² = 0.021, and a residual plot (residuals from -1 to 3) against x]

http://www.math.csusb.edu/faculty/stanton/m262/regress/
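
The idea of residuals and the least-squares line can be sketched as follows. The formulas are the standard closed-form ones for simple linear regression; the function names and the data are hypothetical. Residuals are computed as signed values (observed y minus predicted y), so points below the line have negative residuals.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Signed vertical distances: observed y minus the y predicted by the line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

slope, intercept = least_squares_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
sse = sum(e ** 2 for e in res)   # the quantity the least-squares line minimizes

# Any other line, such as one with a slightly steeper slope, gives a larger sum.
sse_other = sum(e ** 2 for e in residuals(xs, ys, slope + 0.1, intercept))

print(slope, intercept, res, sse, sse_other)
```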

Page 24: Mystery Data

MSIP / Homework

Fathom activity
  NHL Team Data http://www.nhl.com/ice/playerstats.htm
  TEAM: Pick your favourite
  Click SEARCH>
  Click BIOS
  Click # to sort (if desired)
  Copy URL
  File > Import > Import From URL

Complete p. 51 #1-6, 7 bcd, 8

Page 25: Mystery Data

Linear Regression

Weight vs. Height (NHL)
w = 7.23h – 325

Page 26: Mystery Data

Using the equation

How much does a player who is 180 cm tall weigh?
  180 cm ÷ 2.54 ≈ 71”
  w = 7.23(71) – 325 = 188.33 lbs

How tall is a player who weighs 180 lbs?
  w = 7.23h – 325
  h = (w + 325) ÷ 7.23
  So h = (180 + 325) ÷ 7.23 = 69.85” or 177.4 cm
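
Both directions of this calculation can be written as small functions. This is a sketch using the regression equation w = 7.23h – 325 from the previous slide, with h in inches and w in pounds; the function names are illustrative.

```python
CM_PER_INCH = 2.54


def predict_weight(height_inches):
    """Predicted weight in pounds from height in inches: w = 7.23h - 325."""
    return 7.23 * height_inches - 325


def predict_height(weight_lbs):
    """The same model solved for h: h = (w + 325) / 7.23, in inches."""
    return (weight_lbs + 325) / 7.23


w = predict_weight(71)    # a player 71 inches tall
h = predict_height(180)   # a player weighing 180 lbs

print(f"{w:.2f} lbs")
print(f"{h:.2f} in = {h * CM_PER_INCH:.1f} cm")
```

Keeping the unit conversion separate from the model makes it harder to substitute a height in centimetres into an equation fitted in inches.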


Page 28: Mystery Data

1.5 Comparing Apples to Oranges

http://www.smarter.org/research/apples-to-oranges/

Page 29: Mystery Data

The Power of Data

Chapter 1.5 – The Media
Mathematics of Data Management (Nelson)
MDM 4U

There are 3 kinds of lies: lies, damn lies and statistics.

Page 30: Mystery Data

Example 1 – Changing the scale on the axis

Why is the following graph misleading?

[Figure: bar graph "Favourite Teacher" for Mr. Lieff, Mr. Winter, Mr. Dickie and Mr. Frey; vertical axis runs from 40 to 52]

Page 31: Mystery Data

Example 1 – Scale from 0

Consider that this is a bar graph – could it still be misleading?

[Figure: bar graph "Favourite Teacher" for Mr. Lieff, Mr. Winter, Mr. Dickie and Mr. Frey; vertical axis runs from 0 to 60]

Page 32: Mystery Data

Include every category!

[Figure: bar graph "Favourite Teacher" for Mr. Lieff, Mr. Winter, Mr. Dickie, Mr. Frey and Mr. Villanueva; vertical axis runs from 0 to 70]

Page 33: Mystery Data

Example 2 – Using a Small Sample

For the following surveys, consider:
  The sample size
  If there is any (mis)leading language

Page 34: Mystery Data

Example 2 – Using a Small Sample

“4 out of 5 dentists recommend Trident sugarless gum to their patients who chew gum.”
“In the past, we found errors in 4 out of 5 of the returns people brought in for a Second Look review.” (H&R Block)
“Did you know that 1 in 4 women can misread a traditional pregnancy test result?” (Clearblue Easy Digital Pregnancy Test)
“Using Pedigree® DentaStix® daily can reduce the build up of tartar by up to 80%.”
“Did you know that the average Canadian wastes $500 of food in a year?” (Zip-Lock Freezer bags)

Page 35: Mystery Data

Details on the Trident Survey

How many dentists did they ask?
  Actual number: 1200
4 out of 5 is convincing but reasonable
  5 out of 5 is preposterous
  3 out of 5 is good but not great
  Actual statistic: 85%
Recommend Trident over what?
  There were 2 other options:
    Chewing sugared gum
    Not chewing gum

Page 36: Mystery Data

Misleading Statements(?)

How could these statements be misleading?
  “More people stay with Bell Mobility than any other provider.”
  “Every minute of every hour of every business day, someone comes back to Bell.”

Page 37: Mystery Data

“More people stay with Bell Mobility than any other provider.”

Does not specify how many more customers stay with Bell.
  e.g. Percentage of customers renewing their plan: Bell: 30%, Rogers: 29%, Telus: 25%, Fido: 28%
Did they compare percentages or totals?
What does it mean to “stay with Bell”? Honour entire contract? Renew contract at the end of a term?
Are early terminations factored in? If so, does Bell have a higher cost for early terminations?
Competitors’ renewal rates may have decreased due to family plans / bundling
Does the data include Private / Corporate plans?

Page 38: Mystery Data

“Every minute of every hour of every business day, someone comes back to Bell.”

60 mins × 7 hours × 5 days = 2100 per week
What does it mean to “Come back to Bell”?
How many hours in a business day?

Page 39: Mystery Data

How does the media use (misuse) data?

To inform the public about world events in an objective manner
It sometimes gives misleading or false impressions to sway the public or to increase ratings

It is important to:
  Study statistics to understand how information is represented or misrepresented
  Correctly interpret tables/charts presented by the media

Page 40: Mystery Data

MSIP / Homework

Read pp. 57 – 60, Ex. 1-2
Complete p. 60 #1-6
Final Project Example – Manipulating Data (on wiki)

Examples
http://junkcharts.typepad.com/
http://www.coolschool.ca/lor/AMA11/unit1/U01L02.htm
http://mediamatters.org/research/200503220005