Algebra 1 Statistics Part 1 Unit 1


DESCRIPTION

Today’s Objectives Calculate Mean, Median, Mode from a set of data. Calculate 5 number summary from a set of data. Calculate Range from a set of data. Calculate Interquartile Range (IQR) from a set of data. Common Core State Standard Focus standard for the day: S.ID.1 Represent data with plots on a real number line. (box-plots)


Today's Objectives
Calculate the mean, median, and mode from a set of data.
Calculate the 5 number summary from a set of data.
Calculate the range from a set of data.
Calculate the interquartile range (IQR) from a set of data.

Common Core State Standard
Focus standard for the day:
S.ID.1 Represent data with plots on a real number line (box plots).

Question: What is the first step in finding the median, and what is the 5 number summary?

Measures of Center
The MEAN of a data set is its average. Use the symbol $\bar{x}$ for the mean. Calculate it by adding all the numbers and dividing by how many individual numbers there are.
I have 5 different numbers that represent the ages of randomly selected people: 11, 72, 83, 94, 25
$\bar{x} = \frac{11 + 72 + 83 + 94 + 25}{5} = \frac{285}{5} = 57$
The average age of these 5 people is 57 years old.
The MEDIAN of a data set is its midpoint. That is, half the data fall above the median and half fall below the median (50% above, 50% below).
To find the median: sort the data from low to high, then count to the middle. If there is an even number of values, average the two middle values.
The MODE is the most frequently occurring value. A data set can have more than one mode.

Lesson 1.2.1

Rainy Days!
For the last 15 years, I have kept track of the number of rainy days we had in April. The results are below. Calculate the mean, median, and mode of these data.
Here are the data: 16, 3, 16, 15, 13, 26, 15, 13, 14, 3, 10, 8, 9, 2, 9
Mean: _________ Median: _________ Mode: __________

Rainy Days! (answers)
Rearranged: 2, 3, 3, 8, 9, 9, 10, 13, 13, 14, 15, 15, 16, 16, 26
Mean: 172/15 ≈ 11.47
Median: 13
Mode: 3, 9, 13, 15, 16 (each appears twice)

5 Number Summary
Consisting of: Minimum, Q1, Median, Q3, and Maximum
Minimum: smallest value of the sample data
Q1: first quartile, the median of the lower half of the data [the lowest 25% of the data fall at or below Q1]
Median: technically Q2, the middle point of the sample data
Q3: third quartile, the median of the upper half of the data [the highest 25% of the data fall at or above Q3]
Maximum: largest value of the sample data
This 5 number summary can be used to create a box plot (also known as a box-and-whisker plot).
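The measures of center from the Rainy Days slides can be checked with a short Python sketch. The function names (`mean`, `median`, `modes`) are illustrative choices, not part of the lesson; the data are the 15 April counts given above.

```python
from collections import Counter


def mean(values):
    """Add all the numbers and divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Sort from low to high, then take the middle value
    (or the average of the two middle values for an even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


rainy_days = [16, 3, 16, 15, 13, 26, 15, 13, 14, 3, 10, 8, 9, 2, 9]

print(round(mean(rainy_days), 2))  # 11.47
print(median(rainy_days))          # 13
print(modes(rainy_days))           # [3, 9, 13, 15, 16]
```

Returning a list of modes, rather than a single value, matches the slide answer, where five values tie for most frequent.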
Find the 5 number summary, IQR, and Range
Rainy Days! For the last 15 years, I have kept track of the number of rainy days we had in April. The results are below.
Here are the data: 2, 3, 3, 8, 9, 9, 10, 13, 13, 14, 15, 15, 16, 16, 26
Min: _______ Q1: _______ Med: _______ Q3: _______ Max: _______
Range: _______ IQR: _______

5 number summary for the Rainy Days data:
Min: 2
Q1: 8
Med: 13
Q3: 15
Max: 26

Measures of Spread: 5 number summary
RANGE = maximum value − minimum value
Interquartile range (IQR) = Q3 − Q1
Talk for next class: Outliers are observations (data points) too far removed from the main body of data.
Any observation above Q3 + 1.5 × IQR
Any observation below Q1 − 1.5 × IQR

Range and IQR for the Rainy Days data:
Range: 26 − 2 = 24
IQR: 15 − 8 = 7
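A minimal sketch of the 5 number summary, range, and IQR for the Rainy Days data. It assumes the "median of each half" rule for quartiles described in the slides, leaving the overall median out of both halves when the count is odd. Calculators and software libraries may use slightly different quartile conventions, so their Q1 and Q3 can differ a little. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted low to high."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value when forming the upper half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])


rainy_days = [2, 3, 3, 8, 9, 9, 10, 13, 13, 14, 15, 15, 16, 16, 26]
minimum, q1, med, q3, maximum = five_number_summary(rainy_days)
data_range = maximum - minimum
iqr = q3 - q1

print(minimum, q1, med, q3, maximum)  # 2 8 13 15 26
print(data_range, iqr)                # 24 7
```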
End of day 1
Let's practice what we've learned!

Today's Objectives
Using the 5 number summary, IQR, and range to describe the data.
Constructing a box plot.
Describing the Center, Unusual points, Spread, Shape (C.U.S.S.)
Communicate! Communicate!
Outliers: what are they?
Calculating outliers using the 1.5 rule.

Common Core State Standard
Focus standard for the day:
S.ID.1 Represent data with plots on a real number line (box plots).
S.ID.2 Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more data sets.
S.ID.3 Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

Question: What are the 4 main characteristics to describe when reporting your findings on a set of data?

Find the 5 number summary, IQR, and Range
Rainy Days! For the last 15 years, I have kept track of the number of rainy days we had in April.
Here are the data: 2, 3, 3, 8, 9, 9, 10, 13, 13, 14, 15, 15, 16, 16, 26
Min: 2  Q1: 8  Med: 13  Q3: 15  Max: 26
Range: 24  IQR: 7

Boxplots: Box and Whisker Plot
Make a box plot! Let's make a box plot using the same important information from the 5 number summary!
Answer
Data: 2, 3, 3, 8, 9, 9, 10, 13, 13, 14, 15, 15, 16, 16, 26
The mean is 11.5.
The median is 13.
For the box plot, we also need Minimum = 2 and Maximum = 26, First Quartile = 8 and Third Quartile = 15.

Outliers
Outliers are observations (data points) too far removed from the main body of data. Outliers often skew our data. We can find outliers with the 1.5 rule for outliers.
Upper outlier: any observation above Q3 + 1.5 × IQR
Lower outlier: any observation below Q1 − 1.5 × IQR
How does this apply to the Rainy Days data?

Outliers?
Upper: Q3 + 1.5(IQR) = 15 + 1.5(7) = 25.5
Lower: Q1 − 1.5(IQR) = 8 − 1.5(7) = −2.5
Max: 26

Outlier in our data!
26 is an outlier for our data, because 26 is above the upper boundary of 25.5!
We must take this into account when constructing a box plot.
When constructing a box plot, we now extend each whisker only to the most extreme data point that stays within our boundaries for outliers. For the Rainy Days data, the upper whisker ends at 16.
The way we represent an outlier on a box plot is a single point (dot).
Let's make a new box plot taking this into account!

Let's look at the box plot again!
Answer
Data: 2, 3, 3, 8, 9, 9, 10, 13, 13, 14, 15, 15, 16, 16, 26
The median is 13.
For the box plot, we also need Minimum = 2 and Maximum = 26, First Quartile = 8 and Third Quartile = 15.

C.U.S.S.
C: Center. Median: where is it? The mean can also describe the center, but it is not resistant to outliers.
U: Unusual data points. Outliers! Are there any? We can calculate them.
S: Spread. Describe the variability of the graph. Range (largest value − smallest value).
S: Shape. Is the data clumped in a general location? Is the data stretching to the right (skewed right)? Is the data stretching to the left (skewed left)?
LASTLY: always, ALWAYS describe data with these 4 main points. Communication is key in statistics!

C.U.S.S. put to use with the Rainy Days example:
After we have crunched numbers and calculated all of this information that describes this quantitative data, we need to communicate it back in context!
My write-up: The data recorded for the number of rainy days in April for the past 15 years appears to have a median of 13 rainy days, with one outlier of 26 rainy days. The data appears to be clumped together, with a spread of 24 rainy days; this includes the outlier found in our recorded data.
Center: median number of 13 rainy days
Unusual points: one outlier of 26 rainy days
Spread: a spread (range) of 24 rainy days
Shape: the data appears to be clumped together
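The outlier reported in the write-up can be double-checked with the 1.5 rule. This is a small sketch under the slides' values Q1 = 8 and Q3 = 15; the helper names `outlier_fences` and `split_outliers` are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) boundaries from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(values, q1, q3):
    """Separate the points inside the boundaries from the outliers."""
    lower, upper = outlier_fences(q1, q3)
    inside = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return inside, outliers


rainy_days = [2, 3, 3, 8, 9, 9, 10, 13, 13, 14, 15, 15, 16, 16, 26]
lower_fence, upper_fence = outlier_fences(8, 15)
inside, outliers = split_outliers(rainy_days, 8, 15)

print(lower_fence, upper_fence)  # -2.5 25.5
print(outliers)                  # [26]
print(min(inside), max(inside))  # 2 16  (where the whiskers end)
```

The last line shows why the upper whisker of the revised box plot stops at 16 rather than 26.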
End of day 2
Let's practice what we've learned!

Day 3: Activity! Putting it all together
Today's class will be spent being statisticians and collecting data from the field.
You will be required to collect data from everyone in this class!
Organize your data and create a presentation of your data to the class.
See handout!

Algebra 1 Statistics Part 2 Unit S2

Today's Objectives
Understand the bell curve and the Empirical rule (68%, 95%, 99.7%)
Calculate standard deviation from variance

Common Core State Standard
Focus standard for the day:
S.ID.2 Describe variability by calculating deviations from the mean

Bell Curve
Statistics is about representing data and analyzing it in order to report back the findings.
Part of representing data is through graphs: pie charts, histograms, stem-and-leaf plots, and box plots. Many of these graphs can be transformed from one to another.
One of the most useful visuals in statistics is the Standard Normal Curve, or bell curve.
Here is what a bell curve looks like, and also a matching box plot.
Notice how the median is centered in the middle of the bell curve. Could we break the box plot up into the percentages of data that fall between each quartile?

Empirical rule (68%, 95%, 99.7%)
Similarly to breaking up our box plots into quartiles and describing the percentage of data that falls in each section, we can also break up the bell curve into standard deviations. We denote standard deviation with the lowercase Greek letter sigma ($\sigma$).
Standard deviation describes the distance away from the center of the bell curve (mean/median).
1 standard deviation: about 68% of the data fall between −1 and +1 standard deviations of the mean.
2 standard deviations: about 95% of the data fall between −2 and +2 standard deviations of the mean.
3 standard deviations: about 99.7% of the data fall between −3 and +3 standard deviations of the mean.
Let's take a look at a visual!

Empirical rule and standard deviation
We can calculate the standard deviation.
Since we will be sampling from different areas of interest, such as baseball, medical records, insurance records, cars, machinery, aircraft, medical devices, household items, agriculture (the list goes on!), we need to make sure we are talking about standard deviation in the context of the problem! Every set of sample data has its own sample standard deviation.
Since we are only touching on the basics of statistics at this point, we will not worry about distinguishing between calculating the Sample Standard Deviation and the Population Standard Deviation. At this point we want to get down the basics; later in your math career we will make sure to distinguish between the two!

Calculating the Standard Deviation
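A minimal Python sketch of the five-step calculation this part of the lesson walks through (mean, differences, squares, divide by n − 1, square root). The pet counts below are an assumption: the children's responses are not preserved in these slides, so this is a hypothetical list chosen only because it reproduces the stated mean of 5, variance of 6.5, and standard deviation of 2.55. The function names are illustrative.

```python
import math


def sample_variance(values):
    """Average squared distance from the mean, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n                    # Step 1: find the mean
    differences = [x - mean for x in values]  # Step 2: subtract the mean
    squared = [d ** 2 for d in differences]   # Step 3: square each difference
    return sum(squared) / (n - 1)             # Step 4: add up, divide by n - 1


def sample_std_dev(values):
    """Step 5: the square root of the variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical responses (assumption, not from the slides): 9 values, mean 5.
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]

variance = sample_variance(pets)
std_dev = sample_std_dev(pets)
print(variance)           # 6.5
print(round(std_dev, 2))  # 2.55

# Empirical-rule intervals: mean plus or minus k standard deviations.
mean_pets = sum(pets) / len(pets)
for k, percent in [(1, 68), (2, 95), (3, 99.7)]:
    low, high = mean_pets - k * std_dev, mean_pets + k * std_dev
    print(f"about {percent}%: between {low:.2f} and {high:.2f} pets")
```

The empirical-rule intervals only describe the data well when the distribution is roughly bell-shaped, so for a small sample like this one they are a rough guide at best.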
To find the standard deviation, all we have to do is take the square root of the variance, but to find the variance we have to do a bit of work!
Variance is described as the average squared distance from the mean.
Step 1: Find the mean of your data.
Step 2: Subtract the mean from each and every data point you have.
Step 3: Square the difference for each data point.
Step 4: Add up all the squared differences, then divide the sum by n − 1. This is the variance!
Step 5: Take the square root to find the standard deviation.

Let's look at an example together and walk through this!
A group of 9 elementary school children was asked how many pets they have. Here are their responses, arranged from lowest to highest.
Step 1: Find the mean.
$\bar{x} = \frac{\sum x}{n} = \frac{45}{9} = 5$
So we now know that the mean number of pets owned by these 9 children is 5 pets.
Step 2: Subtract the mean from each and every data point we have.
Step 3: Square the difference for each and every data point we have.
Step 4: Add up all the squared differences, then divide the sum by n − 1. This is the variance!
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{52}{8} = 6.5$
Step 5: Take the square root to find the standard deviation.
$\sigma = \sqrt{6.5} \approx 2.55$
The average number of pets owned by this group of 9 elementary school children is 5 pets, with a standard deviation of 2.55 pets.

Empirical rule with pets
So back to the standard bell curve: if our mean is the center, and we know the standard deviation now, let me ask you this:
How many pets could we say 68% of students sampled have? [hint: $\bar{x} \pm 1\sigma$]
How many pets could we say 95% of students sampled have? [hint: $\bar{x} \pm 2\sigma$]
How many pets could we say 99.7% of students sampled have? [hint: $\bar{x} \pm 3\sigma$]
REMEMBER: STATISTICS IS ABOUT COMMUNICATION, COMMUNICATION!

Exit Ticket: You try!
Here are data from 5 different dogs and their heights. Find the standard deviation by hand.
600, 470, 170, 430, 300
Step 1: Mean
Step 2: Difference
Step 3: Square
Step 4: Add and divide by n − 1
Step 5: Square root

Today's Objectives
Scatter plots: positive, no, and negative correlation
Correlation does not imply causation!
Two-way tables
Probability vs. conditional probability

Common Core State Standard
Focus standard for the day:
S.ID.5 Interpret a table that divides data into different categories
S.ID.6 Represent data on two quantitative variables on a scatter plot
S.ID.6 Describe how two quantitative variables on a scatter plot are related

Scatter Plots
A scatter (XY) plot has points that show the relationship between two sets of data.
In this example, each dot shows one person's weight versus their height.

Scatterplots
The association between two quantitative variables can be shown on one graph by plotting data points as ordered pairs on axes. Such a graph is called a scatterplot.
Scatter plots do not have connected points.
If it seems that one variable is a response to the other, then plot that variable on the y-axis. It is called the response variable (dependent variable).
The x-axis then has the explanatory variable (independent variable).

The local ice cream shop keeps track of how much ice cream they sell versus the noon temperature on that day.

Temperature (°C)    Ice Cream Sales
14.2                $215
16.4                $325
11.9                $185
15.2                $332
18.5                $406
22.1                $522
19.4                $412
25.1                $614
23.4                $544
18.1                $421
22.6                $445
17.2                $408

A scatter plot can show that a relationship exists between two data sets.

Examples of correlations: positive, none, negative?

Correlation does not imply causation
Think about it. In a study of college freshmen, researchers found that students who watched TV for an hour or more on weeknights were significantly more likely to have high blood pressure, compared to those students who watched less than an hour of TV on weeknights. Does this mean that watching more TV raises one's blood pressure? Explain your reasoning.
Ask yourself: What possible outside factors could be in play here? Do those factors offer a more logical explanation for the effect on blood pressure?
Moral of the story: Just because there is a correlation DOES NOT imply that one variable causes the other! There can be a lurking variable: another factor that could be influencing both variables.

You try!
The Problem: Airlines have increasingly outsourced the maintenance of their planes to other companies. Critics say that the maintenance may be less carefully done, so that outsourcing creates a safety hazard. As evidence, they point to government data on percent of major maintenance outsourced and percent of flight delays blamed on the airline (often due to maintenance problems).

Airline          Outsourced Percent    Delay Percent
Air Tran         66                    14
Alaska           92                    42
American         46                    26
America West     76                    39
ATA              18                    19
Continental      69                    20
Delta            48                    (missing)
Frontier         65                    31
Hawaiian         80                    70
JetBlue          68                    (missing)
Northwest        43                    (missing)
Southwest        (missing)             (missing)
United           63                    27
US Airways       77                    24

Make a scatterplot that shows how delays depend on outsourcing.
Two-way tables
Probability vs. conditional probability

Basics
Two-way frequency tables are a visual representation of the possible relationships between two sets of categorical data. The categories are labeled at the top and the left side of the table, with the frequency information appearing in the interior cells of the table. The totals of each row appear at the right, and the totals of each column appear at the bottom.
If you could have a new vehicle, would you want a sport utility vehicle or a sports car?
Entries in the body of the table are called joint frequencies. The cells that contain the sums are called marginal frequencies.

Two-Way Relative Frequency Table
Displays percents or ratios instead of frequency counts. These tables can show relative frequencies for the whole table, for rows, or for columns. Relative frequencies can be shown as a ratio, decimal, or percent.

Probability
When looking at a relative frequency table for the whole table, the percent or ratio is also the probability of that event happening out of the ENTIRE TOTAL.
If asked, What's the probability a male selects an SUV? (that is, a person is male and selects an SUV) 21/240
If asked, What's the probability a female selects an SUV? 135/240
If asked, What's the probability that an SUV is selected? 156/240
Notice how all the probabilities have a denominator of 240! It's out of the entire table total!
Moral of the story: When asked for a probability that does not have a preexisting condition, look for the specific characteristics desired in the table, divided by the table total.
$P(\text{event } A) = \frac{\text{number of outcomes corresponding to event } A}{\text{total number of outcomes}}$
Or you can look at it this way:
$P(\text{specific characteristics}) = \frac{\text{count with the specific characteristics}}{\text{table total}}$

Conditional probability
When we are calculating the probability of an event occurring given that another event has occurred, we are describing conditional probability. Certain conditions have been preselected, and now we must calculate the probability based on that condition already happening.
When we have conditional probability, our denominator becomes the column total or the row total, depending on which condition is given.
Example: What is the probability of selecting a sports car given a male? vs. What's the probability a male selects an SUV?
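The difference between the two denominators can be sketched in Python. The vehicle table itself is not shown in these slides, so the counts below are an assumption, reconstructed from the probabilities the slides quote (21/240, 135/240, 156/240, 16.25%, and 65%). The helper names are made up for illustration.

```python
from fractions import Fraction

# Counts inferred from the quoted probabilities (assumption, not a source table).
table = {
    "male": {"SUV": 21, "sports car": 39},
    "female": {"SUV": 135, "sports car": 45},
}


def table_total(t):
    """Grand total of every cell in the two-way table."""
    return sum(sum(row.values()) for row in t.values())


def joint_probability(t, group, choice):
    """P(group and choice): the cell count over the TABLE total."""
    return Fraction(t[group][choice], table_total(t))


def conditional_probability(t, choice, given_group):
    """P(choice | group): the cell count over that group's ROW total."""
    return Fraction(t[given_group][choice], sum(t[given_group].values()))


print(table_total(table))                                            # 240
print(joint_probability(table, "male", "SUV"))                       # 7/80, i.e. 21/240
print(float(joint_probability(table, "male", "sports car")))         # 0.1625
print(float(conditional_probability(table, "sports car", "male")))   # 0.65
```

Using `Fraction` keeps the ratios exact, which makes it easy to see that only the denominator changes between the two questions.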
Notice how the total for each box is over the TABLE TOTAL? What if we knew one of the variables already? What is the probability that it's a sports car GIVEN that it's a male? Then our probability changes!!
$P(\text{sports car} \mid \text{male}) = \frac{39}{60} = 0.65 = 65\%$

Comparing two different questions
What is the probability of selecting a sports car given a male?
$P(\text{sports car} \mid \text{male}) = \frac{39}{60} = 0.65 = 65\%$
What's the probability a male selects a sports car?
$P(\text{male and sports car}) = \frac{39}{240} = 0.1625 = 16.25\%$

Flashback!
On April 15, 1912, the Titanic struck an iceberg and rapidly sank with only 710 of her 2,204 passengers and crew surviving. Data on survival of passengers are summarized in the table below.

                   Survival Status
Class of Travel    Survived    Died    Total
First Class        201         123     324
Second Class       118         166     284
Third Class        181         528     709
Total              500         817     1317

Conditional Probability
$P(\text{survived} \mid \text{first class}) = 201/324$
$P(\text{died} \mid \text{third class}) = 528/709$
$P(\text{first class} \mid \text{survived}) = 201/500$
$P(\text{survived}) = 500/1317$

Today's Objectives
Correlation coefficient r
Cover the understanding that r falls between −1 and 1

1) Compute the r value with a TI-83: ... #4 Linear Regression (ax + b) > L1, L2
You will get the linear regression line and R-squared. Take the square root to get r, and give r the same sign as the slope of the line!
2) Compute the r value with a website:

Let's try
The local ice cream shop keeps track of how much ice cream they sell versus the noon temperature on that day.
Compute the r value with the TI-83:
Compute the r value with

For Homework:
Collect data from your classmates (teacher choice: whole class / half class / section)
Make a scatter plot of your sample with the graph provided
Label the axes correctly and add a title
Calculate the correlation coefficient using technology
Interpret the correlation coefficient and scatter plot: direction, strength
Remember: communication is important, so report your answer in the context of the problem!

Today's Objectives
Regression line introduction
Line of best fit
Interpret the slope and y-intercept of a linear model in the context of the data

Common Core State Standard
Focus standard for the day:
S.ID.6a Practice steps to find the best line using technology of your choice
S.ID.6b Quantify the goodness of fit of a small data set by plotting and analyzing residuals
S.ID.6c Fit a linear function for a scatter plot that suggests a linear association
S.ID.7 Interpret the slope and the intercept of a linear model in the context of the data
F.IF.6 Calculate and interpret the average rate of change of a function (presented symbolically or as a table) over a specified interval. Estimate the rate of change from a graph.
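The calculator output for the ice cream data (a regression line, r, and r squared) can be cross-checked with a short Python sketch. It assumes the standard least-squares formulas and the 12 temperature and sales pairs from the ice cream shop table. The function name `linear_regression` is illustrative and is not the calculator's own routine.

```python
import math


def linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # r carries the same sign as the slope
    return slope, intercept, r


temperature_c = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
                 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

slope, intercept, r = linear_regression(temperature_c, sales)
print(f"predicted sales = {intercept:.2f} + {slope:.2f} * temperature")
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
```

For these data r comes out at roughly 0.96, a strong positive correlation. Computing r directly, instead of taking the square root of r squared, avoids losing the sign when the slope is negative.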
Warm up / pop quiz
Draw a quick sketch of three scatterplots:
Draw a plot with r ≈ 0.9
Draw a plot with r ≈ −0.5
Draw a plot with r ≈ 0

The local ice cream shop keeps track of how much ice cream they sell versus the noon temperature on that day. These are the same 12 temperature (°C) and sales pairs used in the scatter plot lesson.

Algebra Line of Best Fit
We can draw a Line of Best Fit on our scatter plot. When creating a line of best fit, we try to have the line as close as possible to all points, with about as many points above the line as below.
In Algebra, the line of best fit comes in the form $y = mx + b$.

Statistics Regression Line
In Algebra our line is known as a line of best fit. In statistics, this is called a regression line!
A regression line describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Formulas for Regression Line
The regression line is linear, so it follows the form y = mx + b.
In Statistics, we say $\hat{y} = a + bx$ ($\hat{y}$ is pronounced y-hat).
In this context, $\hat{y}$ is called the predicted value.
WARNING: We are entering predictive statistics; using the correct notation is very important!!
Variable breakdown in Algebra terms:
$\hat{y}$ is the predicted value
a is the y-intercept
b is the slope
$\hat{y} = a + bx$

The Meaning of Slope
In a simple algebraic equation such as $y = 2x + 17$, what is the real meaning of the slope? For every increase in x of 1 unit, y increases by 2.
In the function $y = 2x + 17$, what is the meaning of the y-intercept? It is the value y takes on when x = 0.
In statistics, if the regression line is $\hat{y} = 17 + 2x$: What is the slope? What is the y-intercept?

Example
Some data were collected on the weight of a male white laboratory rat for the first 25 weeks after its birth. A scatterplot of the weight (in grams) and time since birth (in weeks) shows a fairly strong, positive relationship. The linear regression equation models the data fairly well: [equation not shown]
1) What is the slope of the regression line? Explain what it means in context.
2) What's the y-intercept? Explain in context.
3) Predict the rat's weight at 16 weeks.

Today's Objectives
Interpret the $r^2$ value!
Fit a linear regression equation with the appropriate graph

Common Core State Standard
Focus standard for the day:
S.ID.6a Practice steps to find the best line using technology of your choice
S.ID.6b Quantify the goodness of fit of a small data set by plotting and analyzing residuals
S.ID.6c Fit a linear function for a scatter plot that suggests a linear association
S.ID.7 Interpret the slope and the intercept of a linear model in the context of the data

Remember our ice cream sales depending on temperature example?
Would you say our regression line drawn generally represents each data point, give or take some?

Regression lines and scatter plots
When looking at a scatter plot, someone could draw any line: perhaps a line shifted up, shifted down, more left, more right. But which line is best?
We can create a regression line to represent our data, and we can also measure how accurately that line represents the data. Our numerical value that indicates the strength of our line is $r^2$. Much like probability, our $r^2$ value falls between 0