Notes Unit 1 Chapters 2-5 Univariate Data

• Statistics is the science of data. A set of data includes information about individuals. This information is organized into different categories or characteristics called variables. For example, in our class survey, each of you is an individual represented in the data set, and we collected information about variables such as gender and height.

• We are always interested in the context of the data: where it came from, who was included, when it was collected, why we were interested, and what we collected. Without context, data is meaningless.

• After we understand the context, the next thing we should always do is GRAPH the data.

Graphs

• Be sure to always:

• Title your graphs

• Label your axes, including units of measure

• Number your axes in a consistent and reasonable manner

Categorical Data

Categorical variables record which of several groups or categories an individual belongs to.
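
As a small illustration, a categorical variable is usually summarized by counting how many individuals fall into each category. The sketch below assumes a made-up list of eye-color responses; the variable names and values are hypothetical and are not taken from the class survey.

```python
from collections import Counter

# Hypothetical responses for one categorical variable (illustrative only).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Count how many individuals belong to each category.
counts = Counter(eye_colors)

# Convert the counts to relative frequencies (proportion of all individuals).
total = len(eye_colors)
relative_freq = {category: count / total for category, count in counts.items()}

# Print a simple frequency table, most common category first.
for category, count in counts.most_common():
    print(f"{category:<6} {count}  {relative_freq[category]:.2f}")
```

Note that averaging these labels would make no sense, which is what separates categorical from quantitative variables.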

Quantitative Data

Quantitative variables take numerical values for which it makes sense to do arithmetic operations like adding or averaging.

Quantitative Data

The distribution of a variable tells us what values the variable typically takes and how often it takes them. It is a generalization about the variable values.
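
One way to see a distribution is to tally how often each value occurs and display the tallies. The following is a minimal sketch using a made-up list of quiz scores (hypothetical values) and a text dotplot instead of a real graph.

```python
from collections import Counter

# Hypothetical quiz scores (illustrative values only).
scores = [7, 8, 8, 9, 6, 8, 7, 10, 8, 9]

# Tally how often each value occurs.
frequency = Counter(scores)

# Print a text dotplot: one dot per observation at each value.
for value in range(min(scores), max(scores) + 1):
    print(f"{value:>2} | {'•' * frequency[value]}")
```

A real graph would still need a title and labeled, consistently numbered axes.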

• When describing any Quantitative distribution:

• C – Center

• U – Unusual Features

• S – Shape

• S – Spread

• &

• B – Be

• S – Specific

• Common shapes of distributions/graphs:

• Symmetric

• Skewed to the right

• Skewed to the left

• Bimodal

• Uniform

• Once you have identified the shape, choose the measures of center and spread based on that shape.

If a distribution is symmetric, we use mean for center.

Mean: the average.

Formula: $\bar{x} = \dfrac{\sum x_i}{n}$
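
The mean can be computed directly from its definition: add up the observations and divide by how many there are. This is a minimal sketch; the function name `mean` and the height values are hypothetical.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the observations divided by n."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


# Hypothetical heights in inches (illustrative values only).
heights_in = [64, 66, 67, 68, 70]

print(mean(heights_in))  # 67.0
```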

If the distribution is symmetric, we use standard deviation for spread.

Standard deviation:

$s_x = \sqrt{\dfrac{1}{n-1}\sum (x_i - \bar{x})^2}$
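
The sample standard deviation can be sketched step by step: find the mean, square each deviation from it, divide the total by n − 1, and take the square root. The function name `sample_sd` and the data are hypothetical.

```python
import math


def sample_sd(values):
    """Return the sample standard deviation (n - 1 divisor)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two observations")
    x_bar = sum(values) / n
    # Sum of squared deviations from the mean.
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical heights in inches (illustrative values only).
heights_in = [64, 66, 67, 68, 70]

# Deviations are -3, -1, 0, 1, 3, so the squared deviations total 20.
print(round(sample_sd(heights_in), 3))  # 2.236
```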

Measure of Center when the distribution is not symmetric:

Median – the middle value in an ordered list. If there are two values in the middle, then average them.
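
The rule above translates into a short function: sort the values, take the middle one, and average the two middle values when the count is even. This is a sketch with a hypothetical function name and made-up data.

```python
def median(values):
    """Return the middle value of the ordered list (average of two if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 7, 5, 9]))  # 5
print(median([3, 1, 7, 5]))     # 4.0
```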

Measure of Spread or Variability When the Distribution Is Not Symmetric

• We can also examine spread by looking at the range of the middle 50% of the data. This is called the interquartile range (IQR).

IQR = Q3 – Q1
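
A short worked example may help. It uses made-up data and assumes a common classroom convention in which Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when n is odd. Calculators and software may use slightly different quartile conventions.

```latex
\begin{aligned}
\text{Ordered data: } & 2,\ 4,\ 5,\ 7,\ 8,\ 10,\ 13 \\
\text{Median} &= 7 \\
Q_1 &= \text{median of } \{2, 4, 5\} = 4 \\
Q_3 &= \text{median of } \{8, 10, 13\} = 10 \\
\text{IQR} &= Q_3 - Q_1 = 10 - 4 = 6
\end{aligned}
```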

We also need to talk about the 5-number summary.

The 5-number summary is made up of the minimum, the first quartile, Q1 (where 25% of the data lies below this value), the median, the third quartile, Q3 (where 75% of the data lies below this value), and the maximum.
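
The five values can be computed together. The sketch below assumes the median-of-halves convention for quartiles (the overall median is excluded from both halves when n is odd); other conventions exist and can give slightly different Q1 and Q3. The function names and data are hypothetical.

```python
def _median_of_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2 :]  # values above the median position
    return (
        ordered[0],
        _median_of_sorted(lower_half),
        _median_of_sorted(ordered),
        _median_of_sorted(upper_half),
        ordered[-1],
    )


# Hypothetical data (illustrative values only).
data = [12, 15, 11, 20, 18, 14, 16, 25]
print(five_number_summary(data))  # (11, 13.0, 15.5, 19.0, 25)
```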

Another Measure of Spread or Variability

• Range – the difference between the maximum and the minimum observations. This is the simplest measure of spread. We typically use this as preliminary information or if it is the only measure of spread we can calculate.
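
As a quick sketch with made-up data, the range depends only on the two most extreme observations. The function name `data_range` is hypothetical.

```python
def data_range(values):
    """Return the range: maximum observation minus minimum observation."""
    if not values:
        raise ValueError("range requires at least one observation")
    return max(values) - min(values)


# Hypothetical data (illustrative values only).
data = [12, 15, 11, 20, 18, 14, 16, 25]
print(data_range(data))  # 14
```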

Another measure of spread or variability

• Variance is the average of the squared deviations of the observations from their mean, where the sum of squared deviations is divided by n – 1 for a sample. It is the standard deviation squared.
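
The relationship between variance and standard deviation can be checked numerically. This sketch assumes the n − 1 divisor used for sample data; the function name `sample_variance` and the values are hypothetical.

```python
import math


def sample_variance(values):
    """Return the sample variance: squared deviations summed, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical heights in inches (illustrative values only).
heights_in = [64, 66, 67, 68, 70]

variance = sample_variance(heights_in)
standard_deviation = math.sqrt(variance)

print(variance)  # 5.0
```

Squaring `standard_deviation` recovers the variance, up to floating-point rounding.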

• An outlier is an individual observation in data that falls outside the overall pattern of the data.

Using the IQR, we can perform a test for outliers.

Outlier Test:

Any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is considered an outlier.
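
The test can be sketched as a function that builds the two cutoffs and flags values outside them. Here Q1 and Q3 are passed in as inputs, since quartile conventions vary; the function name, the data, and the quartile values (13 and 19, found as medians of the lower and upper halves) are hypothetical.

```python
def find_outliers(values, q1, q3):
    """Return the values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr
    upper_cutoff = q3 + 1.5 * iqr
    return [x for x in values if x < lower_cutoff or x > upper_cutoff]


# Hypothetical ordered data (illustrative values only).
data = [5, 12, 13, 14, 15, 16, 18, 19, 20, 35]

# IQR = 19 - 13 = 6, so the cutoffs are 13 - 9 = 4 and 19 + 9 = 28.
print(find_outliers(data, q1=13, q3=19))  # [35]
```

In this example 5 looks low but is not below the lower cutoff of 4, so only 35 is flagged.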

Measures that are not strongly affected by extreme values are said to be resistant.

The median and IQR are more resistant than the mean and standard deviation.

The standard deviation is even less resistant than the mean.
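
Resistance can be seen by changing one value into an extreme one and comparing the summaries before and after. The sketch below uses Python's standard `statistics` module and made-up data; it illustrates the idea for one data set and is not a general proof.

```python
import statistics

# Hypothetical data, and the same data with the largest value made extreme.
typical = [10, 12, 13, 14, 16]
with_extreme = [10, 12, 13, 14, 60]

for label, data in (("typical", typical), ("with extreme value", with_extreme)):
    print(
        f"{label:<20}"
        f" median={statistics.median(data)}"
        f" mean={statistics.mean(data):.1f}"
        f" sd={statistics.stdev(data):.2f}"  # sample SD (n - 1 divisor)
    )

# Relative change in each measure caused by the single extreme value.
mean_ratio = statistics.mean(with_extreme) / statistics.mean(typical)
sd_ratio = statistics.stdev(with_extreme) / statistics.stdev(typical)
```

In this example the median does not move at all, the mean shifts noticeably, and the standard deviation grows by a much larger factor than the mean.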

Measures of Spread or Variability – Why?

We measure spread because it’s an important description of what is happening with the data. We need to know about the amount of variation we can expect in a data set.