
Lecture Material Stat PPT


  • Types of data

    Gender (male/female): categorical
    Age in years: continuous (time is on a continuous scale)
    Weight (nearest kg): continuous
    Weekly expenditure (£): continuous (again a continuous scale, but you might argue that it is discrete)
    Number of siblings: discrete
    Party (Labour, Cons, Lib Dem, other): categorical
    Term accommodation (Halls, Home, Private rented, other): categorical
    Assignment grade (A, B, C, D, E): ranked, because we put grade A as better than grade B, better than C, etc.

  • Tabulation of data

                      Males   Females   Total
    Halls                75       100     175
    Home                 30        38      68
    Private rented       21         9      30
    Other                 4         3       7
    Total               130       150     280

  • Pie chart angle formula

    Angle = (frequency / total) x 360

  • Pie chart angle calculations

    Halls     75/130 x 360 = 208
    Home      30/130 x 360 =  83
    Private   21/130 x 360 =  58
    Other      4/130 x 360 =  11

    Must add up to 360
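The angle calculation can be checked with a short Python sketch. It uses the male counts from the tabulation slide; the variable names are illustrative only and are not part of the lecture material.

```python
# Male students by term accommodation (counts from the tabulation slide).
males = {"Halls": 75, "Home": 30, "Private rented": 21, "Other": 4}

total = sum(males.values())  # 130 males in all

# Angle of each slice = frequency / total x 360 degrees.
angles = {place: count / total * 360 for place, count in males.items()}

for place, angle in angles.items():
    print(f"{place:15s}{angle:6.1f} degrees (rounds to {round(angle)})")

# The unrounded angles sum to 360; rounded angles can drift by a degree.
print("Sum of angles:", round(sum(angles.values()), 6))
```

Here the rounded angles happen to sum to exactly 360, but with other data the rounded values may need a small adjustment.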

  • Pie Chart

    [Figure: pie chart of term accommodation for males. Halls 75, Home 30, Private rented 21, Other 4.]

  • Presentation of information

    A simple bar chart has non-touching bars, with the height of each bar proportional to the frequency. In Excel, a bar chart with vertical bars is called a Column chart, and one with horizontal bars is called a Bar chart.

    Multiple bar chart: the bars are split into several to show another variable. In Excel this is known as a clustered column (vertical bars) or a clustered bar (horizontal bars).

    Component bar chart: the components are stacked together to show another variable. In Excel this is known as a stacked column (vertical bars) or a stacked bar (horizontal bars).

    Percentage component bar chart: conveys the information in percentage form rather than as actual frequency values, and so highlights differences in the proportions of one variable.

  • Frequency table

    Time | Frequency | Boundaries | Class size | Class midpoint

    0-

  • Open ended classes

    Under 5 becomes 0 to under 5

    40 and over becomes 40 to under 45

  • Histogram of delivery times

    [Figure: histogram of delivery times. Horizontal axis: delivery times (days), with class boundaries every 5 days from 5 to 45; vertical axis: frequency. Bar heights: 3, 7, 10, 12, 8, 6, 2, 2.]

  • Frequency polygon of delivery times

    [Figure: frequency polygon of delivery times. Horizontal axis: delivery times (days); vertical axis: frequency. Plotted frequencies: 0, 3, 7, 10, 12, 8, 6, 2, 2, 0.]

  • Unequal class sizes

    Delivery time | Number of deliveries | Frequency density | Adjusted height using standard class width 5

    0-

  • Blank grids

    [Figure: blank chart grids for drawing the histograms by hand.]

  • Histogram standard class size 5

    [Figure: histogram using standard class size 5. Horizontal axis: delivery times (days); vertical axis: frequency per class size 5. Bar heights: 1.5 (0 to 10 days), 7 (10 to 15), 10 (15 to 20), 12 (20 to 25), 8 (25 to 30), 3.3 (30 to 45).]

  • Histogram using Frequency density

    [Figure: histogram using frequency density. Horizontal axis: delivery times (days); vertical axis: frequency density. Bar heights: 0.3 (0 to 10 days), 1.4 (10 to 15), 2 (15 to 20), 2.4 (20 to 25), 1.6 (25 to 30), 0.7 (30 to 45).]
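The two ways of handling unequal class sizes can be compared in a minimal Python sketch. The class frequencies below are an assumption: the table in the slides is truncated, so they are reconstructed from the chart heights (for example, a height of 1.5 per 5 days over 0 to 10 days implies 3 deliveries).

```python
# Classes as (lower, upper, frequency). Frequencies are reconstructed from the
# chart values in the slides (an assumption; the source table is truncated).
classes = [(0, 10, 3), (10, 15, 7), (15, 20, 10),
           (20, 25, 12), (25, 30, 8), (30, 45, 10)]
STANDARD_WIDTH = 5  # standard class size used in the slides

densities = []
adjusted_heights = []
for lower, upper, freq in classes:
    width = upper - lower
    density = freq / width                # frequency per day
    adjusted = density * STANDARD_WIDTH   # frequency per standard class
    densities.append(density)
    adjusted_heights.append(adjusted)
    print(f"{lower:2d}-{upper:2d}  f={freq:2d}  density={density:.1f}  "
          f"height per {STANDARD_WIDTH} days={adjusted:.1f}")
```

Both scalings give histograms of the same shape, because the adjusted height is just the frequency density multiplied by the standard class width.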

  • Guidelines for grouping data

    So, in the example:
    Largest observation = 41, smallest = 5
    Require, say, 8 classes
    Class width = (41 - 5) / 8 = 4.5, so use a class width of 5

  • Cumulative frequency table

    Delivery time | Frequency | Cumulative frequency

    Less than 5: cumulative frequency 0

    5-

  • Cumulative frequency graph

  • Finding measures from the cumulative frequency graph

  • Measures for this example

    Median: look at a cumulative frequency of 25 on the graph: 22 days
    Upper quartile: cumulative frequency of 37.5 on the graph: 28 days
    Lower quartile: cumulative frequency of 12.5 on the graph: 16 days
    Inter-quartile range: UQ - LQ = 28 - 16 = 12 days
    20th percentile: look at a cumulative frequency of 10 on the graph: 15 days
    Deliveries over 30 days: 30 days on the horizontal axis gives 40 deliveries, so 50 - 40 = 10 deliveries take more than 30 days
    90% of deliveries take less than 34 days
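Reading values off the cumulative frequency graph amounts to linear interpolation between the plotted points. The sketch below assumes the cumulative frequencies implied by the histogram heights in the slides (the printed table is truncated), and the helper name `time_at` is purely illustrative.

```python
# Upper class boundaries (days) and cumulative frequencies. These are
# reconstructed from the histogram heights 3, 7, 10, 12, 8, 6, 2, 2
# (an assumption; the cumulative frequency table in the slides is truncated).
boundaries = [5, 10, 15, 20, 25, 30, 35, 40, 45]
cumulative = [0, 3, 10, 20, 32, 40, 46, 48, 50]

def time_at(cum_freq):
    """Delivery time at which the cumulative frequency graph reaches cum_freq."""
    for i in range(1, len(boundaries)):
        lo_cf, hi_cf = cumulative[i - 1], cumulative[i]
        if hi_cf > lo_cf and lo_cf <= cum_freq <= hi_cf:
            fraction = (cum_freq - lo_cf) / (hi_cf - lo_cf)
            return boundaries[i - 1] + fraction * (boundaries[i] - boundaries[i - 1])
    raise ValueError("cumulative frequency out of range")

n = cumulative[-1]            # 50 deliveries
median = time_at(n / 2)       # cumulative frequency 25
lower_q = time_at(n / 4)      # cumulative frequency 12.5
upper_q = time_at(3 * n / 4)  # cumulative frequency 37.5

print(f"median         {median:.1f} days")
print(f"lower quartile {lower_q:.1f} days")
print(f"upper quartile {upper_q:.1f} days")
print(f"IQR            {upper_q - lower_q:.1f} days")
print(f"20th percentile {time_at(0.2 * n):.1f} days")
print(f"90th percentile {time_at(0.9 * n):.1f} days")
```

The interpolated values agree with the graph readings to the nearest day; small differences are expected because the graph is read by eye.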

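  • Measures of location

    Data: 28 28 35 35 35 36 39 44 44 50

    The mode: the most frequently occurring item, 35

    The median: the middle number; with 10 values it is halfway between the 5th and 6th, (35 + 36) / 2 = 35.5

    The mean: the sum of the values divided by the number of values, 374 / 10 = 37.4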

  • Using frequency formula

    X     F     FX
    28    2     56
    35    3    105
    36    1     36
    39    1     39
    44    2     88
    50    1     50
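As a small sketch of the frequency formula, the mean is the total of the FX column divided by the total of the F column. The variable names are illustrative only.

```python
# (value X, frequency F) pairs from the frequency table.
table = [(28, 2), (35, 3), (36, 1), (39, 1), (44, 2), (50, 1)]

sum_f = sum(f for _, f in table)        # total frequency
sum_fx = sum(x * f for x, f in table)   # total of the FX column

mean = sum_fx / sum_f
print(f"sum F = {sum_f}, sum FX = {sum_fx}, mean = {mean}")
```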

  • Measure from a grouped frequency table

    Time | Frequency F | Midpoint X | FX | Cumulative frequency

    0-

  • Measures

    Mean =

    Mode is estimated to be 22.5, the middle of the modal class

    Median

  • Which measure is best

    Data: 3 3 3 4 5 7 10 10 10 25 40

    Mean = 10.9 ~ 11
    Mode = 3, 10
    Median = 7
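The three measures can be compared using Python's standard `statistics` module. The data list is read as eleven values (3, 3, 3, 4, 5, 7, 10, 10, 10, 25, 40), which is an interpretation of the run-together slide text that matches the stated mean, modes and median.

```python
import statistics

# Eleven observations, as read from the slide (an interpretation of the
# run-together digits that reproduces the stated mean, modes and median).
data = [3, 3, 3, 4, 5, 7, 10, 10, 10, 25, 40]

mean = statistics.mean(data)
modes = statistics.multimode(data)   # every value tied for most frequent
median = statistics.median(data)

print(f"mean   = {mean:.1f}")
print(f"modes  = {modes}")
print(f"median = {median}")
```

The two large values, 25 and 40, pull the mean above most of the data, and the mode is not unique, which is why the median is often preferred for skewed data such as this.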

  • Quartiles

    Lower quartile

    Upper quartile

  • Measures of spread

                                Armstrong            Barrett
    Data                        3 6 3 4 4 6 4 2      4 5 3 5 4 4 3 5
    Ordered                     2 3 3 4 4 4 6 6      3 3 4 4 4 4 5 5
    Mean                        4 weeks              4
    Mode                        4                    4
    Median                      4                    4
    Conclude                    Little difference
    Range                       6 - 2 = 4            5 - 3 = 2
    Inter-quartile range
    Standard deviation
    Coefficient of variation

  • Standard deviation

    X                2    3    3    4    4    4    6    6
    X - mean        -2   -1   -1    0    0    0    2    2
    (X - mean)^2     4    1    1    0    0    0    4    4     total = 14

    Standard deviation = sqrt(14 / 8) = 1.32
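The same working can be sketched in Python. The result of 1.32 corresponds to dividing by n = 8 (the population form), so `statistics.pstdev` is used for the check rather than the sample form `statistics.stdev`.

```python
import math
import statistics

# Armstrong's ordered delivery times (weeks).
x = [2, 3, 3, 4, 4, 4, 6, 6]

mean = sum(x) / len(x)                        # 4.0
deviations = [value - mean for value in x]    # X - mean
squared = [d ** 2 for d in deviations]        # (X - mean)^2

# Divide the sum of squared deviations by n, then take the square root.
sd = math.sqrt(sum(squared) / len(x))

print("sum of squared deviations:", sum(squared))
print(f"standard deviation: {sd:.2f}")
print(f"statistics.pstdev:  {statistics.pstdev(x):.2f}")
```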

  • Coefficient of variation

    Coefficient of variation = standard deviation / mean x 100%. The higher the ratio, the greater the spread around the mean.

    Lengths: mean = 55, standard deviation = 28.7, coefficient of variation = 52%
    Weights: mean = 5.5, standard deviation = 2.87, coefficient of variation = 52%
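A brief sketch shows why the two data sets have the same coefficient of variation even though their units and magnitudes differ. It reads the standard deviation of the weights as 2.87, which is an assumption about the intended value, and the function name is illustrative.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

# Weights' standard deviation read as 2.87 (an assumption about the slide).
cv_lengths = coefficient_of_variation(mean=55, sd=28.7)
cv_weights = coefficient_of_variation(mean=5.5, sd=2.87)

print(f"lengths: {cv_lengths:.1f}%")
print(f"weights: {cv_weights:.1f}%")
```

Because the ratio is unit-free, it allows the spread of lengths and weights to be compared directly.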

  • Mean and Standard Deviation for Armstrong

    X        F      FX       FX^2
    0.5      14     7        3.5
    1.5      15     22.5     33.75
    2.5      18     45       112.5
    3.5      16     56       196
    4.5      15     67.5     303.75
    5.5      11     60.5     332.75
    6.5      11     71.5     464.75
    Totals          330      1447

  • mean

  • Standard deviation
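The formulas on these two slides did not survive extraction, so the sketch below assumes the usual frequency-table forms: mean = sum(FX) / sum(F) and standard deviation = sqrt(sum(FX^2) / sum(F) - mean^2). The FX^2 column in the table suggests this form, but it is an assumption rather than the lecturer's confirmed working.

```python
import math

# (midpoint X, frequency F) pairs from the frequency table for Armstrong.
table = [(0.5, 14), (1.5, 15), (2.5, 18), (3.5, 16),
         (4.5, 15), (5.5, 11), (6.5, 11)]

sum_f = sum(f for _, f in table)              # 100
sum_fx = sum(f * x for x, f in table)         # 330
sum_fx2 = sum(f * x ** 2 for x, f in table)   # 1447

mean = sum_fx / sum_f
# Assumed population form: mean of the squares minus the square of the mean.
variance = sum_fx2 / sum_f - mean ** 2
sd = math.sqrt(variance)

print(f"sum F = {sum_f}, sum FX = {sum_fx}, sum FX^2 = {sum_fx2}")
print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")
```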

  • Probability examples

    Throw a coin. The probability of a head = 0.5

    There are three counters in a bag, one red, one green and one blue. One counter is pulled out. The probability that the counter is red = 1/3

    The counter is then replaced and a second pulled out. List all the outcomes: RR, RG, RB, GR, GG, GB, BR, BG, BB

    The probability that both the first and the second were red = 1/9
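Listing the outcomes of two draws with replacement can be sketched with `itertools.product`; each of the nine outcomes is equally likely, so a probability is a count of favourable outcomes divided by nine.

```python
from fractions import Fraction
from itertools import product

colours = "RGB"  # red, green, blue counters

# All ordered pairs (first draw, second draw), drawing with replacement.
outcomes = ["".join(pair) for pair in product(colours, repeat=2)]
print(outcomes)

# Probability = favourable outcomes / total outcomes.
p_first_red = Fraction(sum(o[0] == "R" for o in outcomes), len(outcomes))
p_both_red = Fraction(outcomes.count("RR"), len(outcomes))

print("P(first is red) =", p_first_red)
print("P(both red)     =", p_both_red)
```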

  • Example

    Over the last month (November) a machine has broken down on three days. What is the probability the machine breaks down?

  • Example

    In a sample of adults these probabilities were found:
    P(male) = 0.5, P(married) = 0.6, P(full-time job) = 0.9

    A person is selected at random. What is the probability that the person is

    i) married and male

    ii) male and in a full-time job

    iii) female: 1 - 0.5 = 0.5

  • Example

    P(male) = 0.7, P(aged 40 to 59) = 0.4, P(aged 60 to 69) = 0.15, P(aged 70 or more) = 0.1

    Female: 1 - 0.7 = 0.3

    (40 to 59) or (60 to 69) years old: 0.4 + 0.15 = 0.55

    Female and (40 to 59) years old: 0.3 x 0.4 = 0.12

    Male or aged 40 to 59? This cannot be found by simple addition: 0.7 + 0.4 = 1.1 is greater than 1, because the two events are not mutually exclusive.

  • Example

    The probability that firm A makes a profit has been assessed to be 0.6. The probability that the firm breaks even is 0.3. What is the probability that the firm makes a loss?

    1 - 0.6 - 0.3 = 0.1

  • example

    Firm    Profit    Break even    Loss
    A       0.6       0.3           0.1
    B       0.7       0.1           0.2

  • Example

    Both firms make a profit: 0.6 x 0.7 = 0.42
    Firm A does not make a profit: 1 - 0.6 = 0.4
    Firm A makes a profit or breaks even: 0.6 + 0.3 = 0.9
    Firm B does not make a profit: 1 - 0.7 = 0.3
    Neither firm makes a profit: 0.4 x 0.3 = 0.12
    At least one firm makes a profit: 1 - 0.12 = 0.88
    Only one firm makes a profit: 1 - 0.42 - 0.12 = 0.46
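These answers can be verified with a few lines of Python. The sketch assumes, as the multiplications in the example do, that the two firms' results are independent; the variable names are illustrative.

```python
# Probability that each firm makes a profit (from the table).
p_a = 0.6
p_b = 0.7

# Assuming the firms' results are independent, "and" probabilities multiply.
both = p_a * p_b
neither = (1 - p_a) * (1 - p_b)
at_least_one = 1 - neither

# Exactly one firm profits: A only, or B only.
only_one = p_a * (1 - p_b) + (1 - p_a) * p_b

print(f"both         {both:.2f}")
print(f"neither      {neither:.2f}")
print(f"at least one {at_least_one:.2f}")
print(f"only one     {only_one:.2f}")
```

The direct calculation of "only one" agrees with the complement route used in the example, 1 - 0.42 - 0.12.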

  • Throw a die

    Expected score is (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5

  • Expectation

    Minutes late    0      3      5      10     15
    Probability     0.4    0.30   0.15   0.1    0.05

    Expected number of minutes late = 0 x 0.4 + 3 x 0.3 + 5 x 0.15 + 10 x 0.1 + 15 x 0.05 = 3.4 minutes
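An expected value is the sum of each outcome multiplied by its probability. A minimal sketch for the lateness distribution follows, with illustrative variable names.

```python
# Minutes late and their probabilities (from the slide).
minutes = [0, 3, 5, 10, 15]
probabilities = [0.4, 0.30, 0.15, 0.1, 0.05]

# A valid probability distribution must sum to 1.
total_probability = sum(probabilities)

# Expectation = sum of (value x probability).
expected = sum(m * p for m, p in zip(minutes, probabilities))

print(f"total probability = {total_probability:.2f}")
print(f"expected minutes late = {expected:.1f}")
```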

  • Spread of values

    With mean = 3000 hrs and standard deviation = 200 hrs:

    approximately 68% of the bulbs will last between 2800 hours and 3200 hours (within 1 standard deviation of the mean),

    approximately 95% of the bulbs will last between 2600 hours and 3400 hours (within 2 standard deviations),

    approximately 99.7% of the bulbs will last between 2400 hours and 3600 hours (within 3 standard deviations).
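The three percentages can be checked with `statistics.NormalDist` from the Python standard library, assuming bulb lifetimes are normally distributed with the stated mean and standard deviation.

```python
from statistics import NormalDist

# Bulb lifetimes, assumed normal with mean 3000 hrs and sd 200 hrs.
mean, sd = 3000, 200
lifetimes = NormalDist(mu=mean, sigma=sd)

def within(k):
    """Probability of lying within k standard deviations of the mean."""
    return lifetimes.cdf(mean + k * sd) - lifetimes.cdf(mean - k * sd)

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"between {low} and {high} hours: {within(k):.4f}")
```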

  • Using normal tables

    i   P(Z < -1.3): read directly from the table = 0.0968

    iiiP(Z

  • Find k

    Find k such that P(Z > k) = 0.15

    also means P(Z
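As a cross-check on the table lookup, the value of k can be computed with the inverse cumulative distribution function of the standard normal. This swaps the printed tables for `statistics.NormalDist.inv_cdf`, so the answer carries more decimal places than the tables give.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(Z > k) = 0.15 is the same as P(Z < k) = 0.85.
upper_tail = 0.15
k = z.inv_cdf(1 - upper_tail)

print(f"k = {k:.4f}")
print(f"check: P(Z > k) = {1 - z.cdf(k):.4f}")
```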

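  • Solution to example

    Mean = 2000 hours and standard deviation = 250 hours

    (a) Less than 1750 hours: z = (1750 - 2000) / 250 = -1

    From tables, area to the left of -1 = 0.1587

    (b) More than 2350 hours: z = (2350 - 2000) / 250 = 1.4

    By symmetry this equals the area to the left of -1.4, which from tables = 0.0808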

  • Example (c)

    Between 1800 hours and 2400 hours? z = (1800 - 2000) / 250 = -0.8 and z = (2400 - 2000) / 250 = 1.6

    area to the left of -0.8= 0.2119

    area to left of 1.6 = 1- 0.0548 = 0.9452

    area in between = 0.9452-0.2119 = 0.7333
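Parts (a) to (c) can be reproduced without tables using `statistics.NormalDist`, assuming lifetimes are normal with mean 2000 hours and standard deviation 250 hours as in the example.

```python
from statistics import NormalDist

# Lifetimes, assumed normal with mean 2000 hours and sd 250 hours.
life = NormalDist(mu=2000, sigma=250)

# (a) less than 1750 hours: area to the left of z = -1
p_a = life.cdf(1750)

# (b) more than 2350 hours: area to the right of z = 1.4
p_b = 1 - life.cdf(2350)

# (c) between 1800 and 2400 hours: area between z = -0.8 and z = 1.6
p_c = life.cdf(2400) - life.cdf(1800)

print(f"(a) P(X < 1750)        = {p_a:.4f}")
print(f"(b) P(X > 2350)        = {p_b:.4f}")
print(f"(c) P(1800 < X < 2400) = {p_c:.4f}")
```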

  • Example (d): 4% fail

    P(Z

  • Example (e): best 6%

    P(Z > k) = 0.06 also means that P(Z