Introductory Statistical Concepts


  • 7/28/2019 Introductory Statistical Concepts

    1/118

    Introductory Statistical Concepts

    F. Michael Speed, Ph.D.

    Department of Statistics

    Texas A&M University


    Disclaimer

    I am not an expert SAS programmer.

    Nothing that I say is confirmed or denied by Texas A&M University.


    Why Are We Here?

    Deming

    To Learn

    To Have Fun

    Question: Who was Deming?


    Poll: What type of organization do you work for?

    Business

    Government

    Education

    Nonprofit

    Other


    Purpose of These Lectures

    A review of the statistical concepts used in most of the SAS Analytics Lecture Series.

    We will look at questions such as the following:

    What is the nature of statistical analyses?

    Why are population parameters so important?

    What is really being tested when you see a p-value?

    Why does regression handle missing data so well?

    What are residual analyses?


    Descriptive Statistics


    (Very important concepts)

    Variable of Interest

    The Distribution

    Parameters

    Mean Mode Range

    Median Variance

    Etc

    The Population


    Learning Outcomes

    You will learn

    basic statistical concepts

    the definition of mean, median, mode, and standard deviation

    the difference between populations and samples

    the difference between parameters and estimates

    about confidence intervals

    how to test a statistical hypothesis

    how to run a regression analysis


    Parameters

    Characteristics of the variable of interest

    It is how we describe the variable of interest

    Parameters are unknown


    Parameters

    (Characteristics)

    Central Tendency

    Mode

    Median

    Mean

    Measures of Variability

    Range

    Variance

    Standard Deviation

    More information on mode, mean, and median: http://dist.stat.tamu.edu/pub/speed/mean_mode.htm

    Applet: http://www.stat.tamu.edu/~west/ph/meanmedian.html
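As a rough illustration of the measures of central tendency and variability listed on this slide, the sketch below computes each one with Python's standard `statistics` module. The data values and variable names are hypothetical and were chosen only so that the arithmetic is easy to follow; they are not taken from the lecture.

```python
import statistics

# Hypothetical sample values (not from the lecture data).
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(data)       # arithmetic average
median = statistics.median(data)   # middle value of the sorted data
mode = statistics.mode(data)       # most frequent value

# Measures of variability
data_range = max(data) - min(data)     # largest minus smallest
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # square root of the sample variance

print(mean, median, mode, data_range, variance, std_dev)
```

Note that these are sample estimates; the population parameters they estimate are generally unknown.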

    Variability

    Change in the Data


    What is an Index?

    How SUNNY is SUNNY?

    THE UV Index

    http://www.epa.gov/sunwise/uviscale.html

    Air Quality Index

    What Does It Mean?


    DOW JONES INDUSTRIAL AVERAGE INDEX

    What does 10,971.16 really mean?

    Which is better: a DJIA of 10,000 or a DJIA of 12,000?


    Variability Index: A Simple One

    Find the Largest Value

    Find the Smallest Value

    Let Range = R = Largest - Smallest


    A More Complex Variation Index

    The Standard Deviation

    Statisticians use this index to indicate variability

    You will see it written as σ or S or s

    Widely available from SAS, Excel, and other statistical packages


    Details of the More Complex Index

    Example: Suppose that we observe the following three numbers:

    1 4 7

    The mean of these numbers is:

    (1 + 4 + 7)/3 = 4

    We now subtract the mean from each number, square each difference, and add the squares:

    (1-4)*(1-4) + (4-4)*(4-4) + (7-4)*(7-4) = 18

    Dividing by n - 1 = 2 and taking the square root gives

    The Standard Deviation = sqrt(18/2) = 3
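The same steps can be traced in a few lines of Python. This is only a sketch of the arithmetic on this slide; the variable names are illustrative, and the final check against `statistics.stdev` is an addition that is not part of the original example.

```python
import math
import statistics

values = [1, 4, 7]

# Step 1: the mean
mean = sum(values) / len(values)                  # (1 + 4 + 7) / 3

# Step 2: sum of squared deviations from the mean
sum_sq = sum((x - mean) ** 2 for x in values)     # 9 + 0 + 9

# Step 3: divide by n - 1 and take the square root
std_dev = math.sqrt(sum_sq / (len(values) - 1))   # sqrt(18 / 2)

# Cross-check against the standard library's sample standard deviation
assert math.isclose(std_dev, statistics.stdev(values))
print(mean, sum_sq, std_dev)
```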


    What Does This Mean?

    By itself, it may be confusing to some.

    When comparing populations, we can use it to say which population varies the most.

    Let us look at an applet: http://www.stat.tamu.edu/~west/ph/stddev.html

    Using Graphs to Determine Variability

    Box Plot: http://www.netmba.com/statistics/plot/box/

    [Figure: box plots by State for CALIFORN (N = 35) and NEW_YORK (N = 35); vertical axis from 0 to 400000]

    Describe What Is Happening

    You are giving the parameters of the picture


    Example Using SAS


    Distributions


    Known Distribution

    With a known distribution, we know the following:

    the shape

    the mean

    the variability (standard deviation)

    and/or some other information


    Classical Distributions: Normal


    Normal: Overlay


    Classical Distributions: Uniform


    Uniform: Overlay


    Classical Distributions: Chi-Square
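The three classical distributions named on these slides can be explored by simulation. The sketch below draws values from each one using only Python's standard library. The seed, the number of draws, the uniform range (0 to 1), and the chi-square degrees of freedom (3) are all assumptions made for illustration; the slides do not specify them.

```python
import random
import statistics

random.seed(42)  # arbitrary seed, assumed here for reproducibility
N = 5000         # number of draws per distribution (an assumption)

normal_draws = [random.gauss(0.0, 1.0) for _ in range(N)]
uniform_draws = [random.uniform(0.0, 1.0) for _ in range(N)]

# A chi-square variable with K degrees of freedom is the sum of K squared
# independent standard normal variables.
K = 3
chi_square_draws = [
    sum(random.gauss(0.0, 1.0) ** 2 for _ in range(K)) for _ in range(N)
]

for name, draws in [
    ("normal", normal_draws),
    ("uniform", uniform_draws),
    ("chi-square", chi_square_draws),
]:
    print(name, round(statistics.mean(draws), 2), round(statistics.stdev(draws), 2))
```

With a known distribution, the sample mean and standard deviation printed here can be compared against the known values, for example a mean of K for the chi-square distribution.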


    Survey

    The following are called parameters of the population:

    mean, median, mode

    variance, standard deviation, range, inter-quartile range (IQR)

    In general, are these known or unknown?

    Known = yes (select using your seat indicator)

    Unknown = no (select using your seat indicator)


    Generate a Sample from a Known Distribution

    Why? This is a simulation.

    It helps us to understand a process or an analysis.

    It helps to see if we are getting expected results.

    It is fun.


    MPG Example

    Suppose we want to simulate mpg for a car that weighs 3000 lbs.

    Let us assume that the mean mpg=24.

    Let us assume that the standard deviation=1 mpg.

    We will generate a number from the normal distribution with mean 0 and standard deviation=1.

    We will then add that number to 24 (a negative number lowers the mpg).


    MPG: Composition

    Let us generate 1000 mpg values.

    Observed = Essential Part +/- Leftovers

    where the Essential Part is 24 and the Leftovers are N(0,1).
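The composition "observed = essential part + leftover" can be sketched in Python as follows. This is a minimal illustration, not the lecture's SAS program: the seed and the variable names are assumptions, while the mean of 24, the standard deviation of 1, and the sample size of 1000 come from the slides.

```python
import random
import statistics

random.seed(12345)  # arbitrary seed, assumed here for reproducibility

MEAN_MPG = 24.0   # essential part
SD_MPG = 1.0      # standard deviation of the leftovers

# Observed = essential part + leftover, where leftover ~ N(0, 1)
mpg = [MEAN_MPG + SD_MPG * random.gauss(0.0, 1.0) for _ in range(1000)]

sample_mean = statistics.mean(mpg)
sample_sd = statistics.stdev(mpg)
print(round(sample_mean, 2), round(sample_sd, 2))
```

Because the population is known, the sample mean and standard deviation can be compared directly with the true values 24 and 1.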


    Simulated MPG


    MPG: Histogram

    Compare with true values!


    Simulated Sample

    In this example, we simulated taking a sample of size 1000 from one population of cars weighing 3000 pounds, with a normal distribution with mean=24 and standard deviation=1.

    You can practice this after class.


    After Class Practice

    Simulate 1000 data points for each of the following five populations. Run and explore your data.


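    SAS Code 4: Generate a Normal Leftover with Mean 0 and Standard Deviation s=1.5

    data mpg1;
      s = 1.5;                 /* standard deviation of the leftovers */
      mean = 24;               /* essential part: mean mpg */
      do i = 1 to 1000;
        lo = s*normal(-1);     /* leftover: normal, mean 0, standard deviation s */
        mpg = mean + lo;       /* observed = essential part + leftover */
        output;
      end;
    run;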


    This demonstration illustrates how to simulate data for a given population.

    Simulating Data: SAS_Code4.sas


    Demo: Simulation


    Summary


    Section 1.2

    Populations and Samples


    Objectives

    Understand the relationships between

    populations and samples

    parameters and estimates.

    Look at an overview of hypothesis testing.


    Population

    Mean, Variance, Median,

    Mode, Distribution,

    Parameters


    Example

    Mpg of American-made cars that weigh between 2000 and 3500 pounds and were built in the 1970s.

    Parameters: mean, variance, and so on

    In general, we do not know the parameters.


    Purpose of Statistical Analyses

    Estimate the parameters. (Make guesses.)

    Example: What is the population mean?

    Test hypotheses about the parameters. (Ask questions.)

    Example: Is the population mean = 30 mpg?


    Role of Samples

    Taking a sample of the population enables you to

    make estimates of the population parameters

    answer the questions about the population parameters.


    Population and Sample

    Mean, Variance, Median,

    Mode, Distribution,

    Parameters

    Sample mean

    Sample variance

    Sample

    S

    Inference:

    Estimates

    Test of hypotheses


    Results of Summary Statistics


    Results of Histogram

    continued...


    Results of Histogram


    This demonstration illustrates how to estimate and plot the sampling distribution of various statistics.

    Sampling Distribution Applet: sampling_dist
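For readers without access to the applet, a sampling distribution can also be approximated by simulation. The sketch below repeatedly draws samples from a normal population and records each sample mean. The population (mean 24, standard deviation 1) is borrowed from the earlier MPG example; the sample size of 25, the 2000 repetitions, the seed, and the function name are assumptions made for illustration.

```python
import random
import statistics

random.seed(2019)  # arbitrary seed, assumed here for reproducibility

def sample_mean(n, mu=24.0, sigma=1.0):
    """Draw one sample of size n from N(mu, sigma) and return its mean."""
    return statistics.mean([random.gauss(mu, sigma) for _ in range(n)])

# Repeat the sampling many times to approximate the sampling distribution
# of the sample mean for samples of size 25.
n = 25
means = [sample_mean(n) for _ in range(2000)]

center = statistics.mean(means)    # should be close to mu = 24
spread = statistics.stdev(means)   # should be close to sigma / sqrt(n) = 0.2
print(round(center, 2), round(spread, 2))
```

The spread of the sample means is much smaller than the spread of individual observations, which is why a sample mean is a useful estimate of the population mean.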


    Demo: Sampling Distributions Applet


    http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.h...


    Confidence Intervals on the Population Mean

    Level of Comfort

    50% {21.57 to 22.21}

    95% {20.96 to 22.82}

    99.9% {20.30 to 23.48}

    What does this mean?
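One way to see why a higher level of comfort produces a wider interval is to compute the intervals directly. The sketch below is a large-sample approximation under stated assumptions: the data are simulated (they are not the lecture's data set), and a normal quantile from the standard library stands in for the t quantile that a package such as SAS would typically use. The function name `conf_interval` is hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(7)  # arbitrary seed, assumed here for reproducibility

# Hypothetical sample of 100 mpg readings; not the lecture's data set.
sample = [random.gauss(22.0, 4.0) for _ in range(100)]

n = len(sample)
xbar = statistics.mean(sample)
std_err = statistics.stdev(sample) / math.sqrt(n)

def conf_interval(level):
    """Large-sample (normal approximation) interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (xbar - z * std_err, xbar + z * std_err)

for level in (0.50, 0.95, 0.999):
    low, high = conf_interval(level)
    print(f"{level:.1%}: {low:.2f} to {high:.2f}")
```

Every interval is centered on the sample mean; only the multiplier on the standard error changes with the level.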


    Test That the Population Mean = 30 mpg

    Use the one-sample t-test.

    Requirements for running this test:

    Large n (n > 35)

    Or leftovers are normal

    What is the p-value or sig value?


    Testing Mean = 30

    $H_0: \mu_{mpg} = 30$

    $H_A: \mu_{mpg} \neq 30$
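To make the test of mean = 30 concrete, the sketch below computes a one-sample t statistic by hand. The ten mpg values are made up for illustration, so with such a small sample the test would rely on the leftovers being normal. The p-value shown is only a rough normal approximation, since Python's standard library has no t distribution; an exact p-value would use the t distribution with n - 1 degrees of freedom.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical mpg sample (made up for illustration).
mpg = [21.0, 23.5, 19.8, 24.1, 22.7, 20.9, 25.3, 22.0, 21.4, 23.2]

MU_0 = 30.0  # hypothesized population mean under H0

n = len(mpg)
xbar = statistics.mean(mpg)
s = statistics.stdev(mpg)

# One-sample t statistic: distance of the sample mean from MU_0,
# measured in standard errors.
t_stat = (xbar - MU_0) / (s / math.sqrt(n))

# Rough two-sided p-value from the normal approximation (an assumption;
# the exact test uses the t distribution with n - 1 degrees of freedom).
p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))

print(round(t_stat, 2), p_approx < 0.05)
```

A p-value below the chosen alpha level leads to rejecting the hypothesis that the population mean is 30 mpg.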


    Conclusions of the Test

    Choose an alpha level, usually alpha=.05.

    If sig


    Sig and p-values

    When you see a sig value or p-value:

    You know that some hypothesis is being tested.

    You know whether or not the hypothesis is being rejected.

    You probably do not know what the hypothesis really is.

    Ask yourself these questions:

    What are the population parameters being tested?

    How is what is being tested related to those parameters?


    Requirements for Doing This Test

    Large n (n > 35)

    Or leftovers are normally distributed.

    Use a histogram to check for normality.


    This demonstration illustrates the testing of hypotheses using the data set cars_american.

    Testing Hypotheses


    Demo: Testing Hypotheses


    Populations: Which Ones Are Similar?


    Populations: Which Ones Are Similar?

    Take samples.


    Take Samples

    Use the samples to answer this question:

    Which populations are similar?

    Statistical translation:

    "Which populations are similar?" is the same as asking

    Are the following the same:

    distribution?

    mean?

    variance?


    Background/Requirements

    Before we jump into the analysis, we must ask the following questions:

    How many populations are there?

    How many population parameters are we interested in, and what are they?

    What tests do we want to do, and what are the requirements for doing those?

    Are we using everything we know?


    Example

    Suppose that we are interested in the mpg of American and European cars. How many populations are there?

    American Cars: Mpg (Distribution, Mean, Variance)

    European Cars: Mpg (Distribution, Mean, Variance)


    Poll: How many populations are there?

    One - MPG

    Two - American and European

    Depends on the sample size


    Parameters

                            Population 1      Population 2
                            American Cars     European Cars
    Variable of interest:   mpg               mpg
    Distribution:           Normal?           Normal?
    Mean:                   $\mu_A$           $\mu_E$
    Variance:               $\sigma_A^2$      $\sigma_E^2$


    Analyses

    1. We want to look at the distributions.

    2. We want to estimate the parameters.

    3. We want to answer these questions:

    Are the population means the same?

    Are the population variances the same?


    Example: Our Data Set car_am_eu

    Suppose that we are interested in the mpg of American and European cars.

    American Cars: Mpg (Distribution, Mean, Variance) -> Sample

    European Cars: Mpg (Distribution, Mean, Variance) -> Sample


    Results from the Sample

    continued...


    Box Plots

    American European


    Histograms

    American

    European


    Poll: Are the populations the same?

    Yes

    No


    Conclusion Based on Sample Numbers and Graphs

    Easy: based on the samples, the populations are different (no statistical jargon).

    But I must have a p-value for my boss, for my paper, and so on.


    Formal Tests

    The classical approach to determining whether two populations are the same is to test whether the two population means are equal:

    $H_0: \mu_A = \mu_E$

    But first we check whether the two population variances are equal:

    $H_0: \sigma_A^2 = \sigma_E^2$

    continued...


    Formal Tests

    We use the two-sample t-test.

    Test 2

    Test 1
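The two formal checks, equal variances and then equal means, can be sketched by hand as below. The mpg values are made up and are not the car_am_eu data. The pooled-variance form of the t statistic is shown because it is the version that assumes equal population variances. Turning either statistic into a p-value needs the F and t distributions, which this standard-library sketch leaves out.

```python
import math
import statistics

# Hypothetical mpg samples (made up; not the car_am_eu data).
american = [18.0, 20.5, 17.2, 22.1, 19.4, 21.0, 16.8, 20.0]
european = [26.3, 24.8, 28.1, 25.5, 27.0, 23.9, 26.6, 25.2]

n_a, n_e = len(american), len(european)
mean_a, mean_e = statistics.mean(american), statistics.mean(european)
var_a, var_e = statistics.variance(american), statistics.variance(european)

# Check on the variances: ratio of the larger to the smaller sample variance.
# Under equal population variances this ratio should be near 1.
f_ratio = max(var_a, var_e) / min(var_a, var_e)

# Pooled-variance two-sample t statistic (assumes equal variances).
pooled_var = ((n_a - 1) * var_a + (n_e - 1) * var_e) / (n_a + n_e - 2)
t_stat = (mean_a - mean_e) / math.sqrt(pooled_var * (1 / n_a + 1 / n_e))

print(round(f_ratio, 2), round(t_stat, 2))
```

A t statistic far from zero suggests that the two population means differ.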


    This demonstration shows how to compare two populations using the data set car_am_eu.

    Comparing Two Populations


    Demo: Comparing Two Populations


    Example

    1. Run summary statistics.

    2. Ask for histogram and box plot.

    What do you get?

    data temp1;
      x = 1;
      output;
    run;
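The same experiment can be tried in Python as a hedged analogue of the one-observation SAS data set. With a single value the mean is defined, but the sample standard deviation divides by n - 1 = 0 and so cannot be computed. The use of `None` as a stand-in for a missing value is an assumption of this sketch, not SAS behavior.

```python
import statistics

x = [1]  # a data set with a single observation, as in the SAS step above

mean = statistics.mean(x)   # the mean is defined for one observation

# The sample standard deviation divides by n - 1 = 0, so it is undefined.
try:
    std_dev = statistics.stdev(x)
except statistics.StatisticsError:
    std_dev = None   # no estimate of variability from one observation

print(mean, std_dev)
```

One observation says nothing about variability, which is the difficulty the regression section returns to when a sample of size 1 is taken from each population.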


    Section 1.3

    Simple Linear Regression


    Objectives

    Identify the following:

    the population parameters

    the appropriate model

    number of populations sampled

    the correct hypotheses

    what should be tested for normality

    what equal variances means.


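    MPG Example

    [Figure: four populations of cars, one for each weight (2300, 2600, 2900, and 3000), with means $\mu_1, \mu_2, \mu_3, \mu_4$ and variances $\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2$]

    Take a sample of size 1 from each population!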


    Data

    We should be in deep trouble with one sample from each population.

    We have eight unknown population parameters.

    Can you name them?

    But what do we know?


    Survey

    Name the population parameters.


    Essential Part and Leftovers

    We want to model the data as follows:

    MPG = Essential Part + Leftover

    or

    MPG = Mean + Leftover


    Know or Assumptions

    First, we know that

    $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma^2$

    Second, each population mean is related to weight by the following:

    $\mu_i = a + b \cdot weight_i$

    The population means fall on a straight line!!

    How many unknowns are there now?


    Poll: How many unknowns are there?

    1

    2

    3

    4

    5n


    Graph


    Observed, Essential Part, Leftover


    The Official Regression Model

    mpg = a + b*weight + leftover

    or

    mpg = a + b*weight + error

    or

    $mpg = a + b \cdot weight + \varepsilon$

    or

    $mpg = \beta_0 + \beta_1 \cdot weight + \varepsilon$

    The errors are known to be normal with mean 0 and variance $\sigma^2$.


    This demonstration illustrates the fundamental concepts of simple linear regression.

    Assumptions for Simple Linear Regression: Appendix A


    Demo: Linear.doc

    How Can We Estimate the Unknown Parameters?

    The Principle of Least Squares:

    Let $leftover_i = mpg_i - (\text{essential part})_i$

    or

    $leftover_i = mpg_i - (a + b \cdot weight_i)$

    or

    $r_i = mpg_i - (a + b \cdot weight_i)$

    Now, choose a and b so that $r_1^2 + r_2^2 + r_3^2 + r_4^2$ is as small as possible.

    or

    Minimize $(r_1^2 + r_2^2 + r_3^2 + r_4^2)$.
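The least-squares principle can be tried on a toy data set. The sketch below uses four made-up (weight, mpg) pairs, one per weight from the MPG example. It computes a and b with the standard closed-form least-squares formulas and then evaluates the sum of squared leftovers. The mpg values and all names are hypothetical.

```python
# Hypothetical (weight, mpg) pairs, one car per weight; values are made up.
weights = [2300.0, 2600.0, 2900.0, 3000.0]
mpgs = [27.0, 24.5, 21.0, 20.5]

n = len(weights)
mean_w = sum(weights) / n
mean_m = sum(mpgs) / n

# Closed-form least-squares estimates for the line mpg = a + b * weight
s_wm = sum((w - mean_w) * (m - mean_m) for w, m in zip(weights, mpgs))
s_ww = sum((w - mean_w) ** 2 for w in weights)
b = s_wm / s_ww
a = mean_m - b * mean_w

def sum_sq_residuals(a_, b_):
    """Sum of squared leftovers r_i = mpg_i - (a_ + b_ * weight_i)."""
    return sum((m - (a_ + b_ * w)) ** 2 for w, m in zip(weights, mpgs))

best = sum_sq_residuals(a, b)
print(round(a, 2), round(b, 4), round(best, 3))
```

Calling `sum_sq_residuals` with any other pair of values, for example `sum_sq_residuals(a + 0.5, b)`, gives a sum at least as large as `best`, which is what "as small as possible" means here.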


    This demonstration references David Lane's applet at Rice University.

    Regression Applet: Reg_by_eye


    Demo: David Lane's Applet

    http://www.ruf.rice.edu/~lane/stat_sim/reg_by_eye

    Demo: Output of SAS Regression

    OUTPUT_0


    OUTPUT


    OUTPUT_2


    OUTPUT_3


    OUTPUT_4


    Missing Values


    Suppose that we want to estimate the mean mpg when weight=2500.

    Predicted (Estimated) Mean MPG = 44.05 - .0078*weight

    For weight=2500, this gives 44.05 - .0078*2500 = 44.05 - 19.5 = 24.55 mpg.
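The fitted line can be wrapped in a small function so that the estimated mean mpg is available for any weight, including weights that do not appear in the data. The intercept and slope are the values shown on the slide; the function and constant names are hypothetical.

```python
# Fitted line from the slide: estimated mean mpg = 44.05 - 0.0078 * weight
INTERCEPT = 44.05
SLOPE = -0.0078

def predicted_mean_mpg(weight):
    """Estimated mean mpg for cars of the given weight (in pounds)."""
    return INTERCEPT + SLOPE * weight

print(round(predicted_mean_mpg(2500), 2))
```

Because the population means are assumed to fall on a straight line, the two estimated coefficients are enough to estimate the mean at a weight such as 2500 even when no car of that weight was sampled.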

    Why does this work?

    Survey

    Can anyone explain why this works?
