Lecture 02. Graphical Displays Part 1

  • 7/28/2019 Lecture 02. Graphical Displays Part 1

    Statistics

    ST 361

    Statistics for Engineers

    Graphical Displays

    Kimberly Weems

    [email protected]

    5260 SAS Hall

    Scales of Measure

    Nominal Data: places data in categories

    Another name for that group

    From the article: Do you believe that extraterrestrial beings have visited Earth at some time in the past? (Believe, Don't believe, Not sure)

    People are in one of three categories.

    Scales of Measure

    Ordinal Data: categories that have an order

    From the article: How superstitious are you? (Very, Somewhat, Not very, Not at all)

    We know that Very is more than Somewhat, which is more than Not very, etc.

    Scales of Measure

    Note: Numbers can be assigned to the categories, but doing arithmetic with those numbers does not make much sense.

    Code: 4 = Very, 3 = Somewhat, 2 = Not very, 1 = Not at all

    Very is not twice as much as Not very.
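
A minimal sketch of the point above, using the slide's coding scheme. The list of responses is invented for illustration and is not from the article. Sorting by code is meaningful for ordinal data; the ratio of two codes is not.

```python
# Codes from the slide: a larger code means "more superstitious".
CODES = {"Very": 4, "Somewhat": 3, "Not very": 2, "Not at all": 1}

# Hypothetical responses, invented for illustration only.
responses = ["Not very", "Very", "Not at all", "Somewhat", "Not at all"]

# Ordering by code is meaningful because the categories have an order.
ranked = sorted(responses, key=CODES.get)
print(ranked)

# The arithmetic works, but the result has no interpretation:
# 4 / 2 == 2.0, yet "Very" is not "twice as much" as "Not very".
ratio = CODES["Very"] / CODES["Not very"]
print(ratio)
```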

    Scales of Measure

    Interval/Ratio: numbers are actually numbers => they make sense as numbers and can be used that way

    From the article: What is your age in years?

    Someone who is 40 is twice as old as someone who is 20. The difference in age between someone 30 and 35 is the same as the difference between 25 and 30.

    Scales of Measure

    Note: Your text refers to this as interval data; some other texts refer to it as ratio data. For the purposes of this course we will not differentiate the two.

    No real difference between methods of analysis.

    Other terminology:

    Categorical data: nominal and ordinal; gives names to categories.

    Numeric data: uses meaningful numbers.

    Quantitative: another name for numeric.

    Qualitative: another name for categorical.

    Example: Intro Stat Students

    gender  height  textbooks  HSGPA  car
    female  69.5    320        3.1    yes
    female  65      250        4      no
    female  63      150        4      yes
    female  64      300        3.35   yes
    female  67      90         3.7    yes
    female  63      300        3.9    yes
    female  60      250        3      yes
    female  64      250        3.8    yes
    male    72      187        3.1    yes
    female  66      150        3.4    no

    Why do we care?

    The type of data dictates the summary that will be used. We must choose the analysis that will be used.

    Summaries of categorical data: proportions and counts

    Example: Superstitious? 2% very, 22% somewhat, 31% not very, 45% not at all.
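
As a rough illustration of counts and proportions, the sketch below summarizes the "car" column from the Intro Stat Students example. The variable names are chosen here for illustration; only the ten yes/no values come from the table.

```python
from collections import Counter

# "car" column from the Intro Stat Students example (10 students).
car = ["yes", "no", "yes", "yes", "yes", "yes", "yes", "yes", "yes", "no"]

counts = Counter(car)  # count of students in each category
n = len(car)
proportions = {category: count / n for category, count in counts.items()}

for category in counts:
    print(f"{category}: count = {counts[category]}, "
          f"proportion = {proportions[category]:.0%}")
```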

    Why do we care?

    Summaries of numeric data: averages, medians, standard deviations

    Example: Age? Average 35 years
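
The same three summaries can be sketched with Python's standard `statistics` module. The age data behind the slide's example is not given, so this uses the "height" column from the Intro Stat Students example instead.

```python
import statistics

# "height" column from the Intro Stat Students example (10 students).
heights = [69.5, 65, 63, 64, 67, 63, 60, 64, 72, 66]

mean_height = statistics.mean(heights)      # average
median_height = statistics.median(heights)  # middle value of the sorted data
sd_height = statistics.stdev(heights)       # sample standard deviation

print(f"mean = {mean_height:.2f}, median = {median_height}, sd = {sd_height:.2f}")
```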

    Graphics

    Statistical results are often presented in graphical displays

    1 picture = 1000 words

    Help understand the story behind the data.

    Visualize the distribution

    The three rules of data analysis won't be difficult to remember

    1. Make a picture: it reveals aspects not obvious in the raw data and enables you to think clearly about the patterns and relationships that may be hiding in your data.

    2. Make a picture: to show important features of and patterns in the data. You may also see things that you did not expect: extraordinary (possibly wrong) data values or unexpected patterns.

    3. Make a picture: the best way to tell others about your data is with a well-chosen picture.

    Graphics

    Bar Charts: graphical representation of categorical data

    Horizontal axis: categories

    Vertical axis: count or percentage of subjects in that category.

    Pie Charts: angle represents proportion of values
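
A minimal text-only sketch of a bar chart, using the "gender" column from the Intro Stat Students example. The `bar_chart` helper is hypothetical, and the bars are drawn sideways (categories down the left, counts as bar length) because that is easier to print than the vertical layout described above.

```python
from collections import Counter

# "gender" column from the Intro Stat Students example (10 students).
gender = ["female", "female", "female", "female", "female",
          "female", "female", "female", "male", "female"]

def bar_chart(counts):
    """Return one text line per category; bar length equals the count."""
    width = max(len(category) for category in counts)
    return "\n".join(
        f"{category:<{width}} | {'#' * count} ({count})"
        for category, count in counts.items()
    )

counts = Counter(gender)
print(bar_chart(counts))
```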

    Categorical data

    Story: More females than males

    What about numeric values?

    Plot numeric values to see where they are located.

    Stem and Leaf Displays

    Dot plot: graphic of numeric data.

    Stem and Leaf Displays

    Partition each number in the data into a stem and a leaf.

    Constructing a stem and leaf display:

    1) Determine the stem and leaf partition (5-20 stems).

    2) Write the stems in a column with the smallest stem at the top; include all stems in the range of the data.

    3) Use only 1 digit in the leaves; drop digits or round off.

    4) Record the leaf for each number in the corresponding stem row; ordering the leaves in each row helps.

    Example: employee ages at a small company

    18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

    stem: 10s digit; leaf: 1s digit

    18: stem = 1; leaf = 8; 18 = 1 | 8

    stem | leaf
       1 | 8 9
       2 | 1 2 8 9 9
       3 | 2 3 8 9
       4 | 0 1
       5 | 6 7
       6 | 4
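
The construction steps can be sketched in a few lines of Python for the employee ages. This assumes whole-number data with the tens digit as the stem and the ones digit as the leaf; the `stem_and_leaf` function name is chosen here for illustration.

```python
def stem_and_leaf(values):
    """Return display lines: tens digit as stem, ones digit as leaf."""
    stems = {}
    for value in sorted(values):          # sorting orders the leaves in each row
        stem, leaf = divmod(value, 10)    # e.g. 18 -> stem 1, leaf 8
        stems.setdefault(stem, []).append(leaf)

    lines = []
    # Include every stem in the range of the data, even those with no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
print("\n".join(stem_and_leaf(ages)))
```

Calling `stem_and_leaf(ages + [95])` keeps the empty 7 and 8 stems, so a gap between 64 and 95 stays visible in the display.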

    Suppose a 95 yr. old is hired

    stem | leaf
       1 | 8 9
       2 | 1 2 8 9 9
       3 | 2 3 8 9
       4 | 0 1
       5 | 6 7
       6 | 4
       7 |
       8 |
       9 | 5

    Advantages/Disadvantages of Stem-and-Leaf Displays

    Advantages

    1) each measurement displayed

    2) ascending order in each stem row

    3) relatively simple (data set not too large)

    Disadvantages

    display becomes unwieldy for large data sets

    Dot Plot

    A health researcher examined the amount of soda that a group of teenagers consumed during a day. The resulting amounts in ounces were: 9, 9, 6, 15, 12, 14, and 40.

    [Dot plot of the seven amounts on an axis from 0 to 40]
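
A text-only sketch of the dot plot for the soda data: one "o" per observation, with repeated values stacked vertically. The `dot_plot` helper is hypothetical and assumes whole-number values on an axis that starts at 0.

```python
from collections import Counter

def dot_plot(values, lo, hi):
    """Return text rows (top row first) with one 'o' per observation."""
    counts = Counter(values)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        row = "".join(
            "o" if counts.get(x, 0) >= level else " "
            for x in range(lo, hi + 1)
        )
        rows.append(row.rstrip())
    return rows

soda = [9, 9, 6, 15, 12, 14, 40]
rows = dot_plot(soda, 0, 40)

print("\n".join(rows))
print("-" * 41)
# Tick labels every 10 units; each label starts at its own axis position.
print("".join(f"{tick:<10}" for tick in range(0, 41, 10)).rstrip())
```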

    Dot Plot

    We can see the main cluster is around 10. Smallest at 6, largest at 40.

    Big gap between 40 and the other values.

    Dot Plot

    Good for small numbers of values, but becomes cumbersome with many values.

    For larger numbers of values we could stack up values in categories.

    Histograms

    Histogram: bar chart of quantitative data

    The range of possible values is broken into categories

    Example: Undergraduate university students survey: How much did you spend on textbooks this academic term?

    Categories: $0 to $100, $101 to $200, etc.
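
The survey data behind the textbook histogram is not reproduced in these slides, so the sketch below bins the ten "textbooks" values from the Intro Stat Students example into the same categories. It assumes whole-dollar amounts; `bin_index` and `bin_label` are hypothetical helper names.

```python
from collections import Counter

def bin_index(amount, width=100):
    """0 for $0-$100, 1 for $101-$200, and so on (whole dollars assumed)."""
    return max(0, (amount - 1) // width)

def bin_label(index, width=100):
    low = 0 if index == 0 else index * width + 1
    high = (index + 1) * width
    return f"${low} to ${high}"

# "textbooks" column from the Intro Stat Students example (10 students).
textbooks = [320, 250, 150, 300, 90, 300, 250, 250, 187, 150]

counts = Counter(bin_index(amount) for amount in textbooks)

# One bar per category, including any empty category inside the range.
for index in range(max(counts) + 1):
    print(f"{bin_label(index):<13} | {'#' * counts[index]}")
```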

    Textbooks

    About 20 people paid between $601 and $700

    Textbooks

    Amounts centered around $400

    More people around $400 than around $200

    Values as big as $800, as low as $0.

    Useful for understanding the distribution of quantitative variables.

    Where are the main chunks of data?

    How spread out are the values?

    What is the shape of the data?

    TV Time

    About 110 people spent between 11 and 15 hours

    TV Time

    Main cluster between 0 and 10

    Smallest value 0 and largest value around 60

    Can't be below 0, the minimum possible. Many people are around the lower limit.

    Max possible is 168. No one is around there or even close.

    Values tail off to the right side.

    Shapes of distributions

    Skewed to the right (or positively skewed): long tail to the right

    Generally because individuals are stacked up near a lower limit and unlimited on the upper end.

    Shapes of distributions

    Skewed to the left (or negatively skewed): long tail to the left

    Generally because individuals are stacked up near an upper limit and unlimited on the lower end.

    Birth Year

    [Histogram of birth year, with these groups labeled:]

    Traditional students

    Switched majors, five or six year students

    Nontraditional students

    Older students back for a life change

    Shapes of distributions

    Symmetric: tails approximately equal in both directions

    Major cluster far from limits on both ends.

    Textbooks

    Approximately symmetric

    Bimodal

    Peak 2

    Peak 1

    Shapes of distributions

    Bimodal: two peaks

    Caused by two or more groups

    Multi-modal: several peaks

    Shapes of distributions

    Help us understand the data

    Skewed => typically because of a natural limit that subjects are near

    Symmetric => subjects are not near the limit.

    Multi-modal => multiple distinct groups within the distribution.

    Outliers

    Recall the dot plot: 40 might be considered an outlier.

    May be a data entry error.

    May be an actual value.

    [Dot plot of the soda amounts on an axis from 0 to 40]
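
The slides judge the outlier by eye and give no numeric rule. As an assumption brought in from common practice, not from this lecture, the sketch below flags values more than 1.5 times the interquartile range beyond the quartiles. Quartile conventions differ between textbooks and software; this uses the default method of Python's `statistics.quantiles`.

```python
import statistics

# Soda amounts in ounces from the dot plot example.
soda = [9, 9, 6, 15, 12, 14, 40]

# Quartiles via the standard library's default ("exclusive") method.
q1, _median, q3 = statistics.quantiles(soda, n=4)
iqr = q3 - q1

# 1.5 * IQR fences: a common convention, assumed here for illustration.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [x for x in soda if x < low_fence or x > high_fence]
print(f"fences: ({low_fence}, {high_fence}); flagged: {outliers}")
```

A flagged value still needs the judgment described above: it may be a data entry error or an actual value.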

    Class Problem

    The following are 10 observations on October snow cover for Eurasia during the years 1970-1979 (in million km²):

    6.5 12.0 14.9 10.0 10.7

    7.9 21.9 12.5 14.5 9.2

    Create a stem & leaf display of the data.

    Is there an outlier in the data set?