Basic Elements of Descriptive Statistics

  • 7/28/2019 Basic Elements of Descritive Statistic

    What is Statistics?

    Statistics is the science of describing or making inferences about the world from a sample of data.

    Statistics has two branches: descriptive and inferential.

    Descriptive Statistics

    Descriptive statistics are methods for organizing and summarizing data.

    For example, tables or graphs are used to organize data, and descriptive values are used to summarize data.

    Inferential Statistics

    Two main methods:

    1. Estimation

    The sample statistic is used to estimate a population parameter. A confidence interval about the estimate is constructed.

    2. Hypothesis testing

    A null hypothesis is put forward. Analysis of the data is then used to determine whether to reject it.

    Definitions

    A variable is a characteristic or condition that can change or take on different values.

    A datum is one observation about the variable being measured.

    Data are a collection of observations.

    The goal of statistics is to help researchers organize and interpret the data.

    TYPES OF DATA

    Variables are either quantitative or qualitative:

    Quantitative: ratio, interval (discrete or continuous)
    Qualitative: ordinal, nominal

    Nominal data

    Nominal or categorical data comprise categories that cannot be rank ordered; each category is just different.

    The categories available cannot be placed in any order and no judgement can be made about the relative size or distance from one category to another.

    What does this mean? No mathematical operations can be performed on the data relative to each other.

    Therefore, nominal data reflect qualitative differences rather than quantitative ones.

    Examples:

    Nominal data

    What is your gender?

    (please tick)

    Male

    Female

    Did you enjoy the film?

    (please tick)

    Yes

    No

    Systems for measuring nominal data must ensure that

    each category is mutually exclusive and the system of

    measurement needs to be exhaustive.

    Variables that have only two possible responses (e.g. Yes or No) are known as dichotomies.

    Nominal data

    Ordinal data

    Ordinal data comprise categories that can be rank ordered.

    As with nominal data, the distance between each category cannot be calculated, but the categories can be ranked above or below each other.

    What does this mean? We can make statistical judgements and perform limited maths.

    Example:

    Ordinal data

    How satisfied are you with the level of

    service you have received? (please tick)

    Very satisfied

    Somewhat satisfied

    Neutral

    Somewhat dissatisfied

    Very dissatisfied

    Interval and ratio data

    Both interval and ratio data are examples of scale data.

    Scale data:

    the data is in numeric format (50, 100, 150)
    the data can be measured on a continuous scale
    the distance between values can be observed and, as a result, measured
    the data can be placed in rank order

    Interval data

    Interval data are measured on a continuous scale that has no true zero point. Examples:

    Time (clock or calendar time) moves along a continuous measure of seconds, minutes and so on, and has no zero point of time.

    Temperature (in degrees Celsius or Fahrenheit) moves along a continuous measure of degrees and has no true zero.

    Ratio data

    Ratio data can be measured on a continuous scale that does have a true zero point. Examples:

    Age

    Weight

    Height

    Ratio data can also be measured on a discrete scale that does have a true zero point. Example:

    Number of children

    These levels of measurement can be placed in hierarchical order.

    Hierarchical data order

    Ratio

    Interval

    Ordinal

    Nominal

    Population

    The entire group of individuals is called the

    population.

    Sample

    Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.

    Why sample?

    Measuring all units (trees, products, birds, etc.) is

    impractical, if not impossible.

    Sampling just a few units saves money.

    Sampling just a few units saves time.

    Some measurements are destructive:

    cutting down trees to inspect ring patterns or for stem analysis
    capturing wildlife to examine their morphology, etc.

    Sampling makes statistical methods attractive and powerful.

    Parameter versus Statistic

    A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.

    Population: parameter
    Sample: statistic

    Statistical tools

    Tables

    One-way frequency table

    Number of passengers   Frequency
    2                      2
    4                      23
    5                      41
    6                      18
    7                      8
    8                      1

    For nominal, ordinal and discrete variables.

    Tables

    Two-way frequency table

    Sex \ Hobby   Dance   Sports   TV   Total
    Men           2       10       8    20
    Women         16      6        8    30
    Total         18      16       16   50

    For nominal, ordinal and discrete variables.
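The totals in a two-way frequency table are just row and column sums of the cell counts. A minimal sketch that recomputes them from the counts shown in the table (the variable names are illustrative, not part of the original slides):

```python
# Cell counts copied from the two-way table (rows: sex, columns: hobby).
counts = {
    "Men": {"Dance": 2, "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6, "TV": 8},
}
hobby_names = ["Dance", "Sports", "TV"]

# Row totals: add across the hobbies for each sex.
row_totals = {sex: sum(hobbies.values()) for sex, hobbies in counts.items()}

# Column totals: add across the sexes for each hobby.
col_totals = {h: sum(counts[sex][h] for sex in counts) for h in hobby_names}

# The grand total is the number of people surveyed.
grand_total = sum(row_totals.values())

print(row_totals)   # {'Men': 20, 'Women': 30}
print(col_totals)   # {'Dance': 18, 'Sports': 16, 'TV': 16}
print(grand_total)  # 50
```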

    Tables

    Frequency table

    Age     Frequency   Percentage
    10-14   2           5
    15-19   16          40
    20-24   18          45
    25-29   3           7.5
    30-34   1           2.5
    Total   40          100

    For quantitative variables.
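The percentage column can be derived from the frequency column. A short sketch, assuming the 25-29 row reads as frequency 3 and percentage 7.5 (the only reading consistent with the total of 40):

```python
# Class frequencies from the age table (the 25-29 value is an assumed reading).
frequencies = {"10-14": 2, "15-19": 16, "20-24": 18, "25-29": 3, "30-34": 1}

total = sum(frequencies.values())

# Percentage of each class = 100 * class frequency / total frequency.
percentages = {age: 100 * f / total for age, f in frequencies.items()}

for age, f in frequencies.items():
    print(f"{age}  {f:>3}  {percentages[age]:>5.1f}")
```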

    Graphs

    Bar chart

    Pie chart

    Pictograms

    Histogram

    Density plot

    Scatter plot

    Time series plot

    Boxplot

    Graphs

    For nominal, ordinal and discrete variables.

    Graphs

    Statistical pictograms

    Not recommended

    Graphs

    Only for numerical variables

    Graph examples on the web

    Recommended book

    http://www.laeditorialvirtual.com.ar/Pages2/Huff_Darrell/Huff_ComoMentirConEstadisticas.html#_Toc334380216

    Recommended videos

    http://www.youtube.com/watch?v=nUJNstRFvvo

    http://www.youtube.com/watch?v=ETbc8GIhfHo

    A measure of central tendency is a value that represents a

    typical, or central, entry of a data set. The three most

    commonly used measures of central tendency are the mean,

    the median, and the mode.

    Measures of Central Tendency

    Mean

    The mean of a data set is the sum of the data entries divided by the number of entries.

    Population mean ("mu"): $\mu = \frac{\sum x}{N}$

    Sample mean ("x-bar"): $\bar{x} = \frac{\sum x}{n}$

    Mean

    Example: the following are the ages of all seven employees of a small company:

    53 32 61 57 39 44 57

    Calculate the population mean.

    Add the ages and divide by 7.

    $\mu = \frac{\sum x}{N} = \frac{343}{7} = 49$ years

    The mean age of the employees is 49 years.
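The same calculation can be checked in a few lines of Python. This is only a sketch of the formula for the population mean; the variable names are illustrative:

```python
# Ages of all seven employees (the whole population).
ages = [53, 32, 61, 57, 39, 44, 57]

total = sum(ages)                 # sum of the data entries
population_mean = total / len(ages)  # divide by the number of entries N

print(total, population_mean)  # 343 49.0
```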

    Median

    The median of a data set is the value that lies in the middle of the data when the data set is ordered. If the data set has an odd number of entries, the median is the middle data entry. If the data set has an even number of entries, the median is the mean of the two middle data entries.

    Example: calculate the median age of the seven employees.

    53 32 61 57 39 44 57

    To find the median, sort the data.

    32 39 44 53 57 57 61

    The median age of the employees is 53 years.
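The odd/even rule for the median can be written out directly. The helper name `median_of` below is hypothetical; Python's standard `statistics.median` follows the same rule:

```python
def median_of(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of entries: the middle entry.
        return ordered[mid]
    # Even number of entries: mean of the two middle entries.
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [53, 32, 61, 57, 39, 44, 57]
print(sorted(ages))     # [32, 39, 44, 53, 57, 57, 61]
print(median_of(ages))  # 53
```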

    Mode

    The mode of a data set is the data entry or category that occurs with the greatest frequency. If no entry is repeated, the data set has no mode. If two entries occur with the same greatest frequency, each entry is a mode and the data set is called bimodal.

    Example: find the mode of the ages of the seven employees.

    53 32 61 57 39 44 57

    The mode is 57 because it occurs the most times.

    An outlier is a datum that lies far from the other data in the set.
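The mode of the seven ages can be found by counting how often each value occurs. A sketch that follows the convention stated on this slide (no repeated entry means no mode); the helper name `modes_of` is illustrative:

```python
from collections import Counter

def modes_of(values):
    """Return every entry with the greatest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no entry is repeated, so there is no mode
    return [value for value, count in counts.items() if count == highest]

ages = [53, 32, 61, 57, 39, 44, 57]
print(modes_of(ages))  # [57]
```

A bimodal data set simply returns two values from the same function.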

    Comparing the Mean, Median and Mode

    Example: A 29-year-old employee joins the company and the ages of the employees are now:

    53 32 61 57 39 44 57 29

    Recalculate the mean, the median, and the mode. Which measure of central tendency was affected when this new age was added?

    Mean = 46.5

    Median = 48.5

    Mode = 57

    The mean takes every value into account, but is affected by outliers. The median and mode are generally not influenced much by extreme values. Here the mean falls from 49 to 46.5 and the median from 53 to 48.5, while the mode stays at 57.
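The three measures before and after the new employee joins can be compared with Python's standard `statistics` module. The dictionary name `summary` is just for illustration:

```python
from statistics import fmean, median, mode

before = [53, 32, 61, 57, 39, 44, 57]
after = before + [29]  # the 29-year-old joins

# Map each label to (mean, median, mode).
summary = {
    label: (fmean(ages), median(ages), mode(ages))
    for label, ages in (("before", before), ("after", after))
}

for label, values in summary.items():
    print(label, values)
# before (49.0, 53, 57)
# after (46.5, 48.5, 57)
```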

    Weighted Mean

    A weighted mean is the mean of a data set whose entries have varying weights. A weighted mean is given by

    $\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$

    where w is the weight of each entry x.

    Example: grades in a statistics class are weighted as follows.

    Tests are worth 50% of the grade, homework is worth 30% of the grade and the final is worth 20% of the grade. A student receives a total of 80 points on tests, 100 points on homework, and 85 points on his final. What is his current grade?

    Weighted Mean

    Begin by organizing the data in a table.

    Source     Score, x   Weight, w   xw
    Tests      80         0.50        40
    Homework   100        0.30        30
    Final      85         0.20        17

    $\bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{87}{1.00} = 87$

    The student's current grade is 87%.
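The weighted mean for this example can be reproduced as follows. This is a minimal sketch with the weights written as decimal fractions, as in the table; the dictionary names are illustrative:

```python
# Scores and weights from the grade example.
scores = {"Tests": 80, "Homework": 100, "Final": 85}
weights = {"Tests": 0.50, "Homework": 0.30, "Final": 0.20}

# Numerator: sum of score * weight; denominator: sum of the weights.
weighted_sum = sum(scores[k] * weights[k] for k in scores)
weight_total = sum(weights.values())

weighted_mean = weighted_sum / weight_total

print(round(weighted_mean, 2))  # about 87
```

Dividing by the sum of the weights means the same code also works when the weights do not add up to 1.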

    Shapes of Distributions

    A frequency distribution is symmetric when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images.

    A frequency distribution is uniform (or rectangular) when all entries, or classes, in the distribution have equal frequencies. A uniform distribution is also symmetric.

    A frequency distribution is skewed if the tail of the graph elongates more to one side than to the other. A distribution is skewed left (negatively skewed) if its tail extends to the left. A distribution is skewed right (positively skewed) if its tail extends to the right.

    Summary of Shapes of Distributions

    Symmetric: Mean = Median
    Skewed left: typically Mean < Median < Mode
    Skewed right: typically Mean > Median > Mode

    Measures of Variation

    The mean is a good indicator of the central tendency of a set of data, but it does not provide the whole picture about the data set.

    Example 1: comparison of the distribution of two data sets

                  Values         Mean   Median
    Data set A:   5 6 7 8 9      7      7
    Data set B:   1 2 7 12 13    7      7

    Note: Both distributions have the same mean and median, but beyond that they are quite different. In distribution A, 7 is a fairly typical value, but in distribution B most of the values differ quite a bit from 7. What is needed here is some measure of the dispersion or spread of the data. The following example further illustrates the importance of measuring the variability in a data set.

    Example 2: Suppose that in a hospital, each patient's pulse rate is taken in the morning, at noon, and in the evening. On a certain day, the pulse rates are:

                  Readings     Mean   Median
    Patient A:    72 76 74     74     74
    Patient B:    72 91 59     74     72

    Note: The mean pulse rate is the same for both patients. While patient A's pulse rate is stable, patient B's fluctuates widely.

    Range

    The range of a data set is the difference between the maximum and minimum data entries in the set.

    Range = (Maximum data entry) - (Minimum data entry)

    Example:

    The following data are the closing prices for a certain stock on ten successive Fridays. Find the range.

    Stock 56 56 57 58 61 63 63 67 67 67

    The range is 67 - 56 = 11.
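As a quick check, the range of the ten closing prices is a one-line computation (variable names are illustrative):

```python
# Closing prices on ten successive Fridays.
prices = [56, 56, 57, 58, 61, 63, 63, 67, 67, 67]

# Range = maximum data entry - minimum data entry.
price_range = max(prices) - min(prices)

print(price_range)  # 11
```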

    Population Variance and Standard Deviation

    The population variance of a population data set of N entries is

    Population variance ("sigma squared"): $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

    The population standard deviation of a population data set of N entries is the square root of the population variance.

    Population standard deviation ("sigma"): $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

    Sample Variance and Standard Deviation

    The sample variance of a sample data set of n entries is

    Sample variance ("s squared"): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

    The sample standard deviation of a sample data set of n entries is the square root of the sample variance.

    Sample standard deviation ("s"): $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
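The only difference between the population and sample formulas is the divisor (N versus n - 1). A minimal sketch applying both to data sets A and B from the earlier variation example; the function names are hypothetical, and treating each data set first as a population and then as a sample is purely for illustration:

```python
from math import sqrt

def population_variance(values):
    """Mean squared deviation from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Squared deviations from the sample mean, dividing by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data_a = [5, 6, 7, 8, 9]
data_b = [1, 2, 7, 12, 13]

for name, data in (("A", data_a), ("B", data_b)):
    pop_var = population_variance(data)
    samp_var = sample_variance(data)
    # The standard deviation is the square root of the variance.
    print(name, pop_var, round(sqrt(pop_var), 2), samp_var, round(sqrt(samp_var), 2))
```

Both data sets have mean 7, but B's variance is much larger, which is exactly the difference in spread the earlier example pointed out.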

    Interpreting Standard Deviation

    When interpreting standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

    [Figure: two histograms of frequency (axis from 0 to 14) against data value (2, 4, 6). In the first, $\bar{x} = 4$ and $s = 1.18$; in the second, $\bar{x} = 4$ and $s = 0$.]

    Measures of Position

    Quartiles

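    The three quartiles, Q1, Q2, and Q3, approximately divide an ordered data set into four equal parts.

    Q1 is the median of the data below Q2.

    Q2 is the median of the whole data set.

    Q3 is the median of the data above Q2.

    [Figure: number line from 0 to 100 with Q1, Q2 (the median) and Q3 marked at 25, 50 and 75.]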

    Finding Quartiles

    Example: The quiz scores for 15 students are listed below. Find the first, second and third quartiles of the scores.

    28 43 48 51 43 30 55 44 48 33 45 37 37 42 38

    Order the data.

    28 30 33 37 37 38 42 43 43 44 45 48 48 51 55

    Q2 = 43 is the middle (8th) entry. The lower half is the seven entries below Q2, and its median is Q1 = 37. The upper half is the seven entries above Q2, and its median is Q3 = 48.

    About one fourth of the students scored 37 or less; about one half scored 43 or less; and about three fourths scored 48 or less.
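The "median of each half" method used in this example can be sketched as below. The function name `quartiles` is hypothetical, and leaving the middle entry out of both halves when the count is odd is the convention this example follows; other textbooks and software use slightly different rules:

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count the middle entry (Q2) belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)  # 37 43 48
```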

    Interquartile Range

    The interquartile range (IQR) of a data set is the difference between the third and first quartiles.

    Interquartile range (IQR) = Q3 - Q1

    Example:

    The quartiles for 15 quiz scores are listed below. Find the interquartile range.

    Q1 = 37, Q2 = 43, Q3 = 48

    IQR = Q3 - Q1 = 48 - 37 = 11

    The quiz scores in the middle half of the data set vary by at most 11 points.

    Box and Whisker Plot (boxplot)

    A box-and-whisker plot is an exploratory data analysis tool

    that highlights the important features of a data set.

    The five-number summary is used to draw the graph:

    The minimum entry
    Q1
    Q2 (median)
    Q3
    The maximum entry

    Example:

    Use the data from the 15 quiz scores to draw a box-and-whisker plot.

    28 30 33 37 37 38 42 43 43 44 45 48 48 51 55

    Box and Whisker Plot

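    Five-number summary

    The minimum entry: 28
    Q1: 37
    Q2 (median): 43
    Q3: 48
    The maximum entry: 55

    [Figure: box-and-whisker plot of the quiz scores on an axis from 28 to 56, with the whiskers ending at 28 and 55 and the box marked at 37, 43 and 48.]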
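The five numbers behind the plot can also be obtained from Python's standard library. This sketch uses `statistics.quantiles`, whose default interpolation rule happens to agree with the median-of-halves method for these 15 scores; for other data the two rules may give slightly different quartiles:

```python
from statistics import quantiles

scores = [28, 30, 33, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(scores, n=4)

five_number_summary = (min(scores), q1, q2, q3, max(scores))
iqr = q3 - q1  # length of the box in the boxplot

print(five_number_summary)  # (28, 37.0, 43.0, 48.0, 55)
print(iqr)                  # 11.0
```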

    Parts of a boxplot

    Percentiles and Deciles

    Percentiles divide an ordered data set into 100 parts. There are 99 percentiles: P1, P2, P3, ..., P99.

    Deciles divide an ordered data set into 10 parts. There are 9 deciles: D1, D2, D3, ..., D9.

    Example: A test score at the 80th percentile (D8) indicates that the test score is greater than 80% of all other test scores and less than or equal to 20% of the scores.
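Deciles and percentiles are the same idea as quartiles with more cut points. A sketch using `statistics.quantiles` on the 15 quiz scores from the quartile example (reusing that data here is an assumption made for illustration; with so few values the percentiles are interpolated and only indicative):

```python
from statistics import quantiles

scores = [28, 30, 33, 37, 37, 38, 42, 43, 43, 44, 45, 48, 48, 51, 55]

deciles = quantiles(scores, n=10)       # 9 cut points: D1 ... D9
percentiles = quantiles(scores, n=100)  # 99 cut points: P1 ... P99

d8 = deciles[7]        # 8th decile
p80 = percentiles[79]  # 80th percentile

print(len(deciles), len(percentiles))  # 9 99
print(d8, p80)                         # 48.0 48.0
```

The last line shows that the 8th decile and the 80th percentile are the same cut point.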

    Design matrix

    Sex      Age   Smoke   Country    Married
    Female   23    Yes     USA        Yes
    Male     43    Yes     Colombia   Yes
    Male     19    No      Brazil     Yes
    Male     23    Yes     Brazil     No
    Female   56    No      Canada     Yes
    Female   78    Yes     USA        Yes
    Male     54    No      Spain      No
    Male     76    Yes     Colombia   No
    Female   43    No      Peru       Yes

    5 variables

    10 individuals

    Dimension: 10 x 5