Chapter 1 Notes for STAT400


    STAT400. Chapter 1. Overview and Descriptive Statistics

    Natalia Tchetcherina

    January 26, 28


    Populations, Samples, and Processes.

    Populations and Samples

    Random Variables and Statistical Populations

    Branches of statistics.

    Types of data.
        Categorical data.
        Numerical data.

    Describing data by tables and graphs.
        Categorical data.
        Discrete data.
        Continuous data.

    Measures of Location

    Measures of Variability


    Statistics.

    Statistics as a subject provides a body of principles and methodology for designing the process of data collection, summarizing and interpreting the data, and drawing conclusions or generalities.

    Examples.

    Employment. Monthly, as part of the Current Population Survey, the Bureau of the Census collects information about employment status from a sample of about 65,000 households. Households are contacted on a rotating basis, with three-fourths of the sample remaining the same for any two consecutive months. The survey data are analyzed by the Bureau of Labor Statistics, which reports monthly unemployment rates.


    Statistics.

    Examples.

    Gallup Poll. This, the best known of the national polls, produces estimates of the percentage of popular vote for each candidate based on interviews with a minimum of 1500 adults. Beginning several months before the presidential election, results are regularly published. These reports help predict winners and track changes in voter preferences.


    Statistics.

    Examples.

    Decision making in medical research studies. Heart disease is the most common cause of death in the industrialized nations. In the US and Canada nearly 30% of deaths each year are due to heart disease, mainly heart attacks. Does regular aspirin intake reduce deaths from heart attacks? The Harvard Medical School conducted a landmark study to investigate. The people participating in the study regularly took either aspirin or a placebo (a tablet with no active ingredient). Of those who took aspirin, 0.9% suffered heart attacks during the study. Of those who took the placebo, 1.7% had heart attacks. Could we conclude that it is beneficial for people to take aspirin?


    Populations, Units and Characteristics

    A population is a well-defined collection of objects or subjects. Studies involve the investigation of certain characteristic(s) of members (called units) of population(s).

    All water in a lake. Characteristic: mercury concentration.

    All items of a certain manufactured product (that have been, or will be, produced). Characteristic: proportion of defectives.

    All students enrolled in Big Ten universities during the 2013-14 academic year. Characteristics: favorite type of music; political affiliation.

    Two types of cleaning products. Characteristic: cleaning effectiveness.


    Variable = a Numerical Characteristic

    In most examples considered above, the characteristics we considered are quantities that can be measured and expressed as numbers, e.g. thermal expansion of a metal, hardness of cement, mercury concentration. Such characteristics are called quantitative.

    Examples of non-quantitative characteristics are gender, make of car, eye color, strength category, political affiliation. Such characteristics are called categorical or qualitative.

    Because statistical procedures are applied to numerical data sets, the categories of a categorical characteristic are labeled with arbitrarily chosen numbers (e.g. male = -1, female = +1).

    A characteristic expressed as a number is called a variable.


    Branches of statistics.

    Experimental design. Planning of data collection.

    Descriptive statistics. Summarizing and describing the prominent features of data.

    Inferential statistics. Evaluation of the information present in data (making conclusions).


    Categorical (qualitative) data

    When the characteristic under study concerns a qualitative trait that is only classified in categories and not numerically measured, the resulting data are called categorical data.

    Examples.

    Hair color: blond, brown, red, black

    Employment status: employed, unemployed

    Blood type: O, A, B, AB


    Numerical (measurement) data

    If the characteristic is measured on a numerical scale, the resulting data consist of a set of numbers and are called measurement data. We will use the term (numerical) variable to refer to a characteristic that is measured on a numerical scale.

    Examples.

    Shoe size of an adult male

    Height of a 1-year-old pine seedling

    The time in line at an automated teller

    The number of offspring in an animal litter.


    Frequency table.

    Relative frequency of a category = (Frequency in the category) / (Total number of observations).

    Opinion poll on new dorm regulations.

    Responses   Frequency   Relative Frequency
    Support        152      152/280 = .543
    Neutral         77       77/280 = .275
    Oppose          51       51/280 = .182
    Total          280      1.000
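    The relative-frequency formula can be checked with a few lines of Python. This is only a sketch: the names `responses` and `relative_frequencies` are illustrative and do not come from the notes.

```python
def relative_frequencies(counts):
    """Return {category: frequency / total} for a dict of category counts."""
    total = sum(counts.values())
    return {category: freq / total for category, freq in counts.items()}


# Frequencies from the opinion poll on new dorm regulations.
responses = {"Support": 152, "Neutral": 77, "Oppose": 51}
rel = relative_frequencies(responses)

# Print a table with the same three columns as the frequency table.
for category, value in rel.items():
    print(f"{category:<8}{responses[category]:>5}{value:>8.3f}")
print(f"{'Total':<8}{sum(responses.values()):>5}{sum(rel.values()):>8.3f}")
```

    Rounded to three decimals, the printed relative frequencies agree with the table, and they sum to 1.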


    Frequency table.

    Daily numbers (x) of internet system crashes.

    Data: 1,3,1,1,0,1,0,1,1,0,2,2,0,0,0,1,2,1,2,0,0,1,6,4,3,3,1,2,4,0.

    Value x   Frequency   Relative Frequency
       0          9          .300
       1         10          .333
       2          5          .167
       3          3          .100
       4          2          .067
       5          0          .000
       6          1          .033
    Total        30         1.000
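    For discrete data such as the crash counts, the tallying can be done mechanically. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# Daily numbers of internet system crashes, as listed in the notes.
crashes = [1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0,
           1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0]

counts = Counter(crashes)
n = len(crashes)

# Include every integer in the observed range, even values never seen (x = 5).
table = [(x, counts[x], counts[x] / n)
         for x in range(min(crashes), max(crashes) + 1)]

for x, freq, rel_freq in table:
    print(f"{x:>5}{freq:>10}{rel_freq:>10.3f}")
print(f"Total{n:>10}{sum(r for _, _, r in table):>10.3f}")
```

    Listing the whole range of values, not only those that occur, keeps the empty row for x = 5 visible, as in the table.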


    Histogram and line diagram


    Constructing a Frequency Distribution for a Continuous Variable

    1. Find the minimum and the maximum values in the data set.

    2. Choose intervals or cells of equal length that cover the range between the minimum and the maximum without overlapping. These are called class intervals, and their endpoints class boundaries.

    3. Count the number of observations in the data that belong to each class interval. The count in each class is the class frequency or cell frequency.

    4. Calculate the relative frequency of each class by dividing the class frequency by the total number of observations in the data:

       Relative frequency = Class frequency / Total number of observations.
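    The construction of class frequencies can be sketched in Python as follows. The function name `frequency_distribution` and the sales amounts are hypothetical, invented for illustration. The sketch assumes equal-width classes with left endpoints included and right endpoints excluded.

```python
def frequency_distribution(data, start, width, num_classes):
    """Count observations in equal-width classes [left, right).

    Left endpoints are included and right endpoints are excluded.
    Returns a list of (left, right, frequency, relative frequency).
    """
    freqs = [0] * num_classes
    for x in data:
        k = int((x - start) // width)  # index of the class containing x
        if not 0 <= k < num_classes:
            raise ValueError(f"{x} falls outside the class intervals")
        freqs[k] += 1
    n = len(data)
    return [
        (start + k * width, start + (k + 1) * width, f, f / n)
        for k, f in enumerate(freqs)
    ]


# Hypothetical sales amounts in dollars (made up for this sketch).
sales = [30, 110, 125, 180, 249.99, 250, 310, 374, 420, 560]

dist = frequency_distribution(sales, start=0, width=125, num_classes=5)
for left, right, freq, rel in dist:
    print(f"[{left}, {right}): frequency {freq}, relative frequency {rel:.3f}")
```

    A value equal to a class boundary, such as 125 or 250 here, is counted in the class that starts at that boundary.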


    Example.

    Frequency Distribution for Bookstore Sales Data (left endpoints included, but right endpoints excluded)

    Class Interval ($)   Frequency   Relative Frequency
      0-125                  5         5/40 = .125
    125-250                  8         8/40 = .200
    250-375                 13        13/40 = .325
    375-500                 11        11/40 = .275
    500-625                  3         3/40 = .075
    Total                   40        1.000


    Population Averages and Sample Averages

    Consider a population of $N$ units, and let $v_1, v_2, \ldots, v_N$ denote the statistical population corresponding to some variable. Then the population average or population mean, denoted by $\mu$, is the arithmetic average of all values in the statistical population. Thus,

    $$\mu = \frac{1}{N} \sum_{i=1}^{N} v_i.$$

    If the random variable $X$ denotes the value of the variable of a randomly selected population unit, then a synonymous terminology for the population mean is expected value of $X$, or mean value of $X$, and it is denoted by $\mu_X$ or $E(X)$.


    Population Averages and Sample Averages

    EXAMPLE: In a population of 500 tin plates, the number of plates with 0, 1 and 2 scratches is $N_0 = 190$, $N_1 = 160$ and $N_2 = 150$. Thus, in the statistical population $v_1, \ldots, v_{500}$, 190 of the $v_i$ equal 0, 160 equal 1, and 150 equal 2. The population mean is

    $$\mu = \frac{1}{500} \sum_{i=1}^{500} v_i = \frac{0 \cdot N_0}{500} + \frac{1 \cdot N_1}{500} + \frac{2 \cdot N_2}{500} = 0.92.$$

    If a tin plate is selected at random and $X$ is the rv denoting the number of scratches, the mean value of $X$ is 0.92. (We write $\mu_X = 0.92$, or $E(X) = 0.92$.)
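    The tin plate calculation can be reproduced in Python in two equivalent ways: averaging the full statistical population, and weighting each value by its share of the population. This is a sketch only; the variable names are illustrative.

```python
# Number of tin plates with 0, 1 and 2 scratches (from the example).
scratch_counts = {0: 190, 1: 160, 2: 150}

N = sum(scratch_counts.values())  # population size

# Expand the counts into the statistical population v_1, ..., v_N.
population = [value
              for value, count in scratch_counts.items()
              for _ in range(count)]

# Arithmetic average of all N values.
mu_direct = sum(population) / N

# Same mean as a weighted sum: each value times its population proportion.
mu_weighted = sum(value * count / N for value, count in scratch_counts.items())

print(N, round(mu_direct, 4), round(mu_weighted, 4))
```

    Both computations give the population mean 0.92 stated in the example.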


    Population Averages and Sample Averages

    If a sample of size $n$ is taken, and $x_1, x_2, \ldots, x_n$ denote the variable values of the sample units, then the sample average or sample mean, denoted by $\bar{x}$, is

    $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

    Under simple random (s.r.) sampling, a sample mean approximates, but in general is different from, the population mean.

    EXAMPLE: If a s.r. sample of $n = 100$ is taken from the 500 tin plates, it could be that there are $n_0 = 40$, $n_1 = 34$ and $n_2 = 26$ plates with 0, 1 and 2 scratches. In this case, $\bar{x} = (0 \cdot 40 + 1 \cdot 34 + 2 \cdot 26)/100 = 0.86$.
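    To see how a sample mean varies around the population mean, one can draw a few simple random samples in Python. This is a sketch under assumptions: sampling is done without replacement with `random.Random.sample`, and the seed value and the number of repetitions are arbitrary choices, not part of the notes.

```python
import random

# Population of 500 tin plates: 190, 160 and 150 plates with 0, 1 and 2 scratches.
population = [0] * 190 + [1] * 160 + [2] * 150
mu = sum(population) / len(population)

# Sample mean for the sample described in the example (40, 34 and 26 plates).
example_sample = [0] * 40 + [1] * 34 + [2] * 26
x_bar_example = sum(example_sample) / len(example_sample)

# Draw a few simple random samples (without replacement) of size n = 100.
rng = random.Random(400)  # arbitrary seed, used only for reproducibility
sample_means = []
for _ in range(5):
    sample = rng.sample(population, 100)
    sample_means.append(sum(sample) / len(sample))

print("population mean:", round(mu, 4))
print("example sample mean:", round(x_bar_example, 4))
print("simulated sample means:", sample_means)
```

    Each simulated sample mean will typically be close to, but not equal to, the population mean of 0.92.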


    Median

    The sample median of a set of $n$ measurements $x_1, x_2, \ldots, x_n$ is the middle value when the measurements are arranged from smallest to largest. It is denoted by $\tilde{x}$.

    How to compute the median

    1. Order the data from smallest to largest.

    2. When the number of observations n is ODD, the median is the middle observation of the ordered sample.

    3. When the number of observations n is EVEN, two observations from the ordered sample fall in the middle, and the median is their average.
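    The three steps for computing the median translate directly into code. The function name `sample_median` and the two small data sets are hypothetical, chosen only to exercise the odd and even cases.

```python
def sample_median(values):
    """Sample median: sort, then take the middle value or average the middle two."""
    ordered = sorted(values)              # step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:                        # step 2: n odd, single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # step 3: n even


odd_sample = [7, 1, 4, 9, 3]       # hypothetical data, n = 5
even_sample = [7, 1, 4, 9, 3, 10]  # hypothetical data, n = 6

print(sample_median(odd_sample))   # middle of 1, 3, 4, 7, 9
print(sample_median(even_sample))  # average of 4 and 7
```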


    Comparing mean and median

    The median is not affected by a few very small or very large observations, whereas the presence of such extremes can have a considerable effect on the mean. For extremely asymmetrical distributions, the median is likely to be a more sensible measure of center than the mean.
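    A small numerical illustration of this contrast, using the standard-library `statistics` module. Both data sets are hypothetical; the second replaces the largest value of the first with an extreme observation.

```python
from statistics import mean, median

typical = [11, 12, 13, 14, 15]        # hypothetical measurements
with_outlier = [11, 12, 13, 14, 150]  # largest value replaced by an extreme one

print("typical:      mean =", mean(typical), " median =", median(typical))
print("with outlier: mean =", mean(with_outlier), " median =", median(with_outlier))
```

    The single extreme value moves the mean from 13 to 40, while the median stays at 13.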


    Sample Quartiles

    Lower (first) quartile (designated Q1) = 25th percentile

    Second quartile (median) (designated Q2) = 50th percentile

    Upper (third) quartile (designated Q3) = 75th percentile


    Population Variance and Sample Variance

    Let $v_1, v_2, \ldots, v_N$ be a statistical population with mean $\mu$.

    DEFINITION: The population variance, $\sigma^2$, is defined as

    $$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (v_i - \mu)^2.$$

    The standard deviation is the positive square root of the variance: $\sigma = \sqrt{\sigma^2}$.

    If the rv $X$ denotes a randomly selected value from the statistical population, then a synonymous terminology for the population variance is variance of $X$, and it is denoted by $\sigma_X^2$, or $\mathrm{Var}(X)$.


    Population Variance and Sample Variance

    Similarly, the standard deviation of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.

    A simpler computational formula for the variance is

    $$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} v_i^2 - \mu^2.$$

    EXAMPLE: Consider the tin plate example, so the statistical population $v_1, \ldots, v_{500}$ has 190 of the $v_i$ equal 0, 160 equal 1, 150 equal 2, and $\mu = 0.92$. Then,

    $$\sigma^2 = \frac{0 \cdot 190}{500} + \frac{1 \cdot 160}{500} + \frac{4 \cdot 150}{500} - 0.92^2 = 0.6736.$$
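    As a check, the definition of the population variance and the computational formula can both be evaluated for the tin plate population. This is a sketch; the variable names are illustrative.

```python
import math

# Number of tin plates with 0, 1 and 2 scratches (from the example).
scratch_counts = {0: 190, 1: 160, 2: 150}
N = sum(scratch_counts.values())

mu = sum(v * c for v, c in scratch_counts.items()) / N

# Definition: average squared deviation from the population mean.
var_definition = sum(c * (v - mu) ** 2 for v, c in scratch_counts.items()) / N

# Computational formula: average of the squared values minus the squared mean.
var_shortcut = sum(c * v ** 2 for v, c in scratch_counts.items()) / N - mu ** 2

sigma = math.sqrt(var_definition)
print(round(var_definition, 4), round(var_shortcut, 4), round(sigma, 4))
```

    Both expressions agree with the value 0.6736 obtained in the example, up to floating-point rounding.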


    Population Variance and Sample Variance

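    If $x_1, x_2, \ldots, x_n$ denotes a sample from the statistical population, the sample variance and its computational formula are:

    $$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \right].$$

    The sample standard deviation is $S = \sqrt{S^2}$. Under simple random (s.r.) sampling, a sample variance approximates, but in general is different from, the population variance.

    EXAMPLE: Consider the s.r. sample of $n = 100$ tin plates, which has 40, 34 and 26 plates with 0, 1 and 2 scratches. Here $\sum x_i^2 = 138$ and $\frac{1}{n} \left( \sum x_i \right)^2 = 86^2/100 = 73.96$. Then,

    $$S^2 = \frac{1}{99} [138 - 73.96] = 0.647.$$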
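    The sample variance of the tin plate sample can be verified in Python with both forms of the formula. The standard-library function `statistics.variance` also uses the divisor n - 1 and serves here as a cross-check. Variable names are illustrative.

```python
import math
import statistics

# The s.r. sample from the example: 40, 34 and 26 plates with 0, 1 and 2 scratches.
sample = [0] * 40 + [1] * 34 + [2] * 26
n = len(sample)
x_bar = sum(sample) / n

# Definition: sum of squared deviations from the sample mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Computational formula: [sum of squares - (sum)^2 / n] / (n - 1).
sum_sq = sum(x ** 2 for x in sample)
sum_x = sum(sample)
s2_shortcut = (sum_sq - sum_x ** 2 / n) / (n - 1)

s = math.sqrt(s2_definition)
print(sum_sq, sum_x, round(s2_definition, 3), round(s2_shortcut, 3), round(s, 3))
print(round(statistics.variance(sample), 3))  # library cross-check
```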


    Other measures of variation

    Sample range = Largest observation - Smallest observation

    Sample interquartile range = Third quartile - First quartile

    Five-number summary: (minimum, Q1, Q2 (median), Q3, maximum).
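    These summaries can be computed for the internet crash data used earlier in the notes. The sketch below relies on `statistics.quantiles` (Python 3.8 or later) with its default method. The notes do not specify a quartile convention, so treating this method as the intended one is an assumption; other textbooks and software may give slightly different quartiles for some data sets.

```python
import statistics

# Daily numbers of internet system crashes, as listed in the notes.
crashes = [1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0,
           1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0]

ordered = sorted(crashes)

# Three cut points dividing the data into quarters: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(ordered, n=4)

sample_range = ordered[-1] - ordered[0]  # largest minus smallest observation
iqr = q3 - q1                            # third quartile minus first quartile
five_number = (ordered[0], q1, q2, q3, ordered[-1])

print("range:", sample_range)
print("interquartile range:", iqr)
print("five-number summary:", five_number)
```

    For this data set the second cut point coincides with the sample median computed by ordering the data.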


    Boxplot
