Quantitative Analysis


  • 7/27/2019 Quantitative Analysis


    Quantitative Analysis

    Dr. Basheer Ahmad Samim


    Recommended Readings (Books)

    Introduction to Statistics, Walpole, R. E., 3rd Edition (2000)

    Statistical Methods for Practice and Research, Gaur, A. S. and Gaur, S. S.

    Statistical Inference, Casella, G. and Berger, R. L., 2nd Edition (2002)


    Attendance Policy

    8 weeks of teaching, 16 lectures (32 attendances)

    Roll call is taken twice: once before the break and once after the break

    At least 80% (24) attendance is compulsory to be eligible for the Final Examination

    No roll call after the first five (5) minutes


    Mode of Teaching

    Lecture

    SPSS Workshop

    Discussion Session


    Mode of Assessment

    Quizzes (15%)

    Assignments (15%)

    Class Performance (5%)

    Mid Term Test (25%)

    Final Examination (40%)


    Questionnaire


    Introduction to Statistics


    Constant

    A characteristic or property that does not change from individual to individual.


    Variable

    A characteristic or property that varies from individual to individual.


    Types of Variables

    Qualitative: Nominal, Ordinal

    Quantitative: Discrete, Continuous


    Nominal Scale

    Variable categories are mutually exclusive and exhaustive.

    Variable categories have no logical order.

    Examples: eye color, hair color, gender.


    Ordinal Scale

    Data categories are mutually exclusive and exhaustive.

    Data classifications are ranked or ordered according to the particular trait they possess.

    Example: level of knowledge about SPSS.


    Interval Scale

    Data categories are mutually exclusive and exhaustive.

    Data classifications are ranked or ordered according to the particular trait they possess.

    Equal differences in the characteristic are represented by equal differences in the measurements, but a true zero point does not exist.

    Examples: temperature, shoe size, IQ scores.


    Ratio Scale

    Data categories are mutually exclusive and exhaustive.

    Data classifications are ranked or ordered according to the particular trait they possess.

    Equal differences in the characteristic are represented by equal differences in the measurements.

    The zero point is meaningful: it represents the absence of the characteristic.

    Examples: height, weight, distance.


    Measurement Scales

    Scale       Property                                          Examples
    Nominal     Data may only be classified                       Eye color, hair color, gender
    Ordinal     Data are ranked                                   Level of knowledge about SPSS
    Interval    True zero point does not exist                    Temperature, shoe size, IQ scores
    Ratio       Meaningful zero point and ratio between values    Height, weight, distance


    Data

    The information collected for any kind of investigation. Usually numerical, but can be qualitative.


    Primary Data

    The initial material collected during the research process.

    The information collected directly from the respondent.

    Sources: personal investigation, through an investigator, through a questionnaire, through local sources, through telephone.


    Secondary Data

    The information collected and processed by people other than the researcher.

    Sources: government organizations, semi-government organizations.


    Data Collection

    Any of the following methods may be adopted:

    (a) Personal interview

    (b) Direct observation

    (c) Mail interview (internet interview)

    (d) Telephone interview

    What are the pros and cons of each?


    Data management

    Office Editing,

    Post Coding,

    Data entry and Verification.


    Data organization and Analysis

    Preparing data for analysis,

    Extracting descriptive measures from the data,

    Using advanced statistical techniques to analyze the data and draw inferences from it.


    Summation Notation

    Frequency Distribution

    Cross Tabulation


    Measures of Central Tendency

    Arithmetic Mean

    Quantiles (Median, Quartiles, Deciles, Percentiles)

    Mode


    Arithmetic Mean

    A value obtained by dividing the sum of all the observations by their number.

    $$\text{Arithmetic Mean} = \frac{\text{Sum of all the observations}}{\text{Number of the observations}}$$

    If X1, X2, ..., Xn are n observations of a variable X, then

    $$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$


    Arithmetic Mean

    The marks obtained by 8 students are:

    67 72 68 70 65 68 75 63

    $$\bar{X} = \frac{67 + 72 + \cdots + 63}{8} = \frac{548}{8} = 68.5 \text{ marks}$$
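    The same calculation can be sketched in a few lines of Python. The variable names are illustrative only; the marks are those listed on the slide.

```python
# Marks obtained by the 8 students, as listed on the slide.
marks = [67, 72, 68, 70, 65, 68, 75, 63]

# Arithmetic mean = sum of all observations / number of observations.
total = sum(marks)
mean_marks = total / len(marks)

print(total, mean_marks)
```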


    Quantiles

    For individual observations or a discrete frequency distribution, the ith quartile, jth decile and kth percentile are located in the array (or discrete frequency distribution) by the following relations:

    $$Q_i = \frac{i(n+1)}{4}\text{th observation in the distribution}, \quad i = 1, 2, 3$$

    $$D_j = \frac{j(n+1)}{10}\text{th observation in the distribution}, \quad j = 1, 2, \ldots, 9$$

    $$P_k = \frac{k(n+1)}{100}\text{th observation in the distribution}, \quad k = 1, 2, \ldots, 99$$


    Quartiles

    The weekly TV watching times (hours):

    25 41 27 32 43 66 35 31 15 5

    34 26 32 38 16 30 38 30 20 21

    The array of the above data is given below:

    5 15 16 20 21 25 26 27 30 30

    31 32 32 34 35 37 38 41 43 66


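    Quartiles

    $$Q_1 = \frac{1(20+1)}{4}\text{th observation in the distribution} = 5.25\text{th observation in the distribution}$$

    $$Q_1 = 5\text{th obs.} + 0.25\,(6\text{th obs.} - 5\text{th obs.}) = 21 + 0.25\,(25 - 21) = 22.0 \text{ hours}$$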


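    Quartiles

    $$Q_2 = \frac{2(20+1)}{4}\text{th observation in the distribution} = 10.50\text{th observation in the distribution}$$

    $$Q_2 = 10\text{th obs.} + 0.50\,(11\text{th obs.} - 10\text{th obs.}) = 30 + 0.50\,(31 - 30) = 30.5 \text{ hours}$$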
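    The location rule i(n+1)/4 with linear interpolation can be sketched as a small Python function. The function name `quantile_at` and its parameters are hypothetical, and the data are the ordered array exactly as printed on the slide.

```python
def quantile_at(sorted_data, i, parts):
    """Value at the i(n+1)/parts-th position, interpolating linearly
    between the two neighbouring observations (1-based positions)."""
    n = len(sorted_data)
    position = i * (n + 1) / parts
    whole = int(position)          # observation at or just below the position
    frac = position - whole        # fractional part used for interpolation
    if whole < 1:
        return sorted_data[0]
    if whole >= n:
        return sorted_data[-1]
    lower = sorted_data[whole - 1]
    upper = sorted_data[whole]
    return lower + frac * (upper - lower)


# Ordered array of weekly TV watching times (hours) from the slide.
hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

q1 = quantile_at(hours, 1, 4)   # 5.25th observation
q2 = quantile_at(hours, 2, 4)   # 10.50th observation
print(q1, q2)
```

    Passing `parts=10` or `parts=100` gives deciles and percentiles by the same rule.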


    Quantiles


    Mode

    The mode is the value that occurs most frequently in a set of data, that is, the value that occurs the maximum number of times in a sequence of observations.


    Mode

    The total automobile sales (in millions) in the United States for the last 14 years:

    9.0 8.2 8.0 9.1 10.3 11.0 11.5

    10.3 10.5 9.8 9.3 8.2 8.2 8.5

    Mode = 8.2 million
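    As a quick check, the mode can be found by counting how often each value occurs. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from collections import Counter

# Automobile sales (millions) for the 14 years listed on the slide.
sales = [9.0, 8.2, 8.0, 9.1, 10.3, 11.0, 11.5,
         10.3, 10.5, 9.8, 9.3, 8.2, 8.2, 8.5]

# Count occurrences and take the most frequent value.
counts = Counter(sales)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
```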


    Measures of variation describe how much the values of a data set differ from one another; they are measures of the spread of the values in the data.


    Absolute Measures of Dispersion

    Range

    Quartile Deviation

    Mean (Average) Deviation

    Variance and Standard Deviation


    Relative Measures of Dispersion

    Coefficient of Range

    Coefficient of Quartile Deviation

    Coefficient of Mean Deviation

    Coefficient of Variation (CV)


    Range

    Difference between the largest and the smallest observations:

    $$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$$


    Disadvantages of the Range

    Ignores the way in which data are distributed. Two data sets spread differently over the values 7 to 12 have the same range:

    7 8 9 10 11 12        Range = 12 - 7 = 5

    7 8 9 10 11 12        Range = 12 - 7 = 5

    Sensitive to outliers:

    1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5        Range = 5 - 1 = 4

    1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120      Range = 120 - 1 = 119
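    The outlier sensitivity is easy to reproduce. The sketch below rebuilds the two data sets from the slide (the helper name `value_range` is hypothetical) and shows that changing a single value moves the range from 4 to 119.

```python
def value_range(data):
    """Range = largest observation - smallest observation."""
    return max(data) - min(data)


# The two data sets from the slide differ only in their last value.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(without_outlier), value_range(with_outlier))
```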


    Inter-quartile Range (IQR)

    Inter-quartile range = 3rd quartile - 1st quartile = Q3 - Q1

    The IQR is not affected by outliers, because it depends only on the middle 50% of the data.


    Inter-quartile Range

    X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70. Each of the four intervals between these values holds 25% of the observations.

    Inter-quartile Range (IQR) = 57 - 30 = 27


    The Mean (Absolute) Deviation

    Mean deviation is the average of the absolute deviations taken from the mean value.

    X        X - Mean    |X - Mean|
    8        3           3
    5        0           0
    2        -3          3
    Total    0           6

    $$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{6}{3} = 2$$
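    A minimal Python sketch of the same table, assuming the three observations 8, 5 and 2 shown on the slide (variable names are illustrative):

```python
# Observations from the slide.
x = [8, 5, 2]

mean_x = sum(x) / len(x)                       # 5.0
deviations = [v - mean_x for v in x]           # 3, 0, -3 (these always sum to 0)
abs_deviations = [abs(d) for d in deviations]  # 3, 0, 3

# Mean deviation = average of the absolute deviations from the mean.
mean_deviation = sum(abs_deviations) / len(x)

print(sum(deviations), sum(abs_deviations), mean_deviation)
```

    The plain deviations cancel out, which is why the absolute values are averaged instead.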


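    Variance

    Variance is the average of the squared deviations taken from the mean value.

    X (cm)      (X - Mean)^2    X^2
    4           36              16
    6           16              36
    9           1               81
    12          4               144
    13          9               169
    16          36              256
    Total 60    102             702

    $$\text{(i)}\quad S^2 = \frac{\sum (X_i - \bar{X})^2}{n} = \frac{102}{6} = 17 \text{ cm}^2$$

    $$\text{(ii)}\quad S^2 = \frac{\sum X_i^2}{n} - \left(\frac{\sum X_i}{n}\right)^2 = \frac{702}{6} - \left(\frac{60}{6}\right)^2 = 17 \text{ cm}^2$$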
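    Both forms of the variance formula can be checked against each other in Python. This sketch assumes the divisor n used on the slide (a population variance); the variable names are illustrative.

```python
import statistics

# Observations (cm) from the slide.
x_cm = [4, 6, 9, 12, 13, 16]
n = len(x_cm)
mean_x = sum(x_cm) / n                               # 60 / 6 = 10.0

# (i) Definition: average of squared deviations from the mean.
var_definition = sum((x - mean_x) ** 2 for x in x_cm) / n

# (ii) Shortcut: mean of squares minus square of the mean.
var_shortcut = sum(x * x for x in x_cm) / n - mean_x ** 2

std_dev = var_definition ** 0.5                      # standard deviation in cm

print(var_definition, var_shortcut, round(std_dev, 3))
print(statistics.pvariance(x_cm))                    # library check, divisor n
```

    Note that `statistics.variance` (divisor n - 1) would give a different value; `statistics.pvariance` matches the slide's formula.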


    Comparing Standard Deviations

    [Figure: three dot plots, Data A, Data B and Data C, on a common axis from 11 to 21]

    Data A: Mean = 15.5, S = 3.338

    Data B: Mean = 15.5, S = 0.926

    Data C: Mean = 15.5, S = 4.567

    The smaller the standard deviation, the more tightly the scores are clustered around the mean.

    The larger the standard deviation, the more spread out the scores are from the mean.


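    Relative Measures of Variation

    $$\text{Coefficient of Range} = \frac{X_{\text{Largest}} - X_{\text{Smallest}}}{X_{\text{Largest}} + X_{\text{Smallest}}}$$

    $$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

    $$\text{Coefficient of Mean Deviation} = \frac{MD}{\text{Mean}}$$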


    Coefficient of Variation (CV)

    Can be used to compare two or more sets of data measured in different units, or in the same units but with different average size.

    $$CV = \frac{S}{\bar{X}} \times 100\%$$


    Use of Coefficient of Variation

    Stock A: average price last year = $50, standard deviation = $5

    $$CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$$

    Stock B: average price last year = $100, standard deviation = $5

    $$CV_B = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$$

    Both stocks have the same standard deviation, but stock B is less variable relative to its price.
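    The stock comparison can be sketched as follows. The function name `coefficient_of_variation` is hypothetical; the figures are those given for stocks A and B.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / mean) * 100, expressed as a percentage."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A: mean $50, SD $5
cv_b = coefficient_of_variation(5, 100)   # Stock B: mean $100, SD $5

print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```

    Because the CV divides out the units, the same function applies when the two data sets are measured in different units.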


    Appropriate Choice of Measure of Variability

    If data are symmetric, with no serious outliers, use the range and standard deviation.

    If data are skewed and/or have serious outliers, use the IQR.

    If comparing variation across two data sets, use the coefficient of variation (CV).


    Five Number Summary

    The five number summary of a data set consists of the minimum value, the first quartile, the second quartile, the third quartile and the maximum value, written in that order: Min, Q1, Q2, Q3, Max.

    From the three quartiles we can obtain a measure of central tendency (the median, Q2) and measures of variation of the two middle quarters of the distribution: Q2 - Q1 for the second quarter and Q3 - Q2 for the third quarter.


    Five Number Summary

    The weekly TV viewing times (in hours):

    25 41 27 32 43 66 35 31 15 5

    34 26 32 38 16 30 38 30 20 21

    The array of the above data is given below:

    5 15 16 20 21 25 26 27 30 30

    31 32 32 34 35 37 38 41 43 66


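    Five Number Summary

    Minimum value = 5.0, Maximum value = 66.0

    $$\text{Location of } Q_1 = \frac{1(20+1)}{4}\text{th obs. in the data} = 5.25\text{th obs.}$$

    $$\text{Value of } Q_1 = 5\text{th obs.} + 0.25\,(6\text{th obs.} - 5\text{th obs.}) = 21 + 0.25\,(25 - 21) = 22.0 \text{ hrs}$$

    $$\text{Location of } Q_2 = \frac{2(20+1)}{4}\text{th obs. in the data} = 10.50\text{th obs.}$$

    $$\text{Value of } Q_2 = 10\text{th obs.} + 0.50\,(11\text{th obs.} - 10\text{th obs.}) = 30 + 0.50\,(31 - 30) = 30.5 \text{ hrs}$$

    $$\text{Location of } Q_3 = \frac{3(20+1)}{4}\text{th obs. in the data} = 15.75\text{th obs.}$$

    $$\text{Value of } Q_3 = 15\text{th obs.} + 0.75\,(16\text{th obs.} - 15\text{th obs.}) = 35 + 0.75\,(37 - 35) = 36.5 \text{ hrs}$$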
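    The five number summary can also be obtained from Python's standard library. In this sketch, `statistics.quantiles` with `method="exclusive"` places the quartiles at the i(n+1)/4-th positions, which is the same location rule used above; the data are the ordered array as printed on the slide.

```python
import statistics

# Ordered array of weekly TV viewing times (hours) from the slide.
hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

# n=4 cuts the data into quarters and returns the three quartiles.
q1, q2, q3 = statistics.quantiles(hours, n=4, method="exclusive")

five_number_summary = (min(hours), q1, q2, q3, max(hours))
print(five_number_summary)
```

    Other packages may use different interpolation rules by default, so their quartiles can differ slightly from the hand calculation.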


    Box and Whisker Diagram

    A box and whisker diagram, or box-plot, is a graphical means of displaying the five number summary of a set of data. In a box-plot the first quartile is placed at the lower hinge and the third quartile is placed at the upper hinge. The median is placed in between these two hinges. The two lines emanating from the box are called whiskers. The box and whisker diagram was introduced by Professor John W. Tukey.


    Construction of Box-Plot

    1. Start the box at Q1 and end it at Q3.

    2. Within the box, draw a line to represent Q2.

    3. Draw the lower whisker from the minimum value up to Q1.

    4. Draw the upper whisker from Q3 up to the maximum value.

    [Figure: box plot labelled with Min Value, Q1, Q2, Q3 and Max Value]


    Construction of Box-Plot

    1. Q1 = 22.0, Q3 = 36.5

    2. Q2 = 30.5

    3. Minimum value = 5.0

    4. Maximum value = 66.0

    [Figure: box plot drawn on a vertical axis from 0 to 70]


    Interpretation of Box-Plot

    [Figure: box plot drawn on a vertical axis from 0 to 70]

    A box-whisker plot is useful to identify:

    Maximum and minimum values in the data

    Median of the data

    IQR = Q3 - Q1; a lengthy box indicates more variability in the data

    Shape of the data, from the position of the line within the box:

    Line at the center of the box: symmetrical

    Line above the center of the box: negatively skewed

    Line below the center of the box: positively skewed

    Detection of outliers in the data


    Outliers

    An outlier is a value that falls well outside the overall pattern of the data. It might be

    the result of a measurement or recording error,

    a member of a different population,

    simply an unusual extreme value.

    An extreme value need not be an outlier; it might instead be an indication of skewness.


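    Inner and Outer Fences

    If Q1 = 22.0, Q2 = 30.5 and Q3 = 36.5, then IQR = Q3 - Q1 = 14.5.

    Inner fences:

    $$\text{Lower Inner Fence} = Q_1 - 1.5\,IQR = 0.25$$

    $$\text{Upper Inner Fence} = Q_3 + 1.5\,IQR = 58.25$$

    Outer fences:

    $$\text{Lower Outer Fence} = Q_1 - 3\,IQR = -21.5$$

    $$\text{Upper Outer Fence} = Q_3 + 3\,IQR = 80.0$$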


    Identification of the Outliers

    1. The values that lie within the inner fences are normal values.

    2. The values that lie outside the inner fences but inside the outer fences are possible/suspected/mild outliers.

    3. The values that lie outside the outer fences are sure outliers.

    Plot each suspected outlier with an asterisk and each sure outlier with a hollow dot.

    [Figure: box plot on a vertical axis from 0 to 80, with one value marked by an asterisk]

    Only 66 is a mild outlier.
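    The fence rules can be sketched in Python as below. The function name `classify` and the label strings are hypothetical; the quartiles and the data are the ones used on the slides.

```python
# Quartiles of the weekly TV viewing times, as computed on the slides.
q1, q3 = 22.0, 36.5
iqr = q3 - q1                                   # 14.5

inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # (0.25, 58.25)
outer_fences = (q1 - 3 * iqr, q3 + 3 * iqr)      # (-21.5, 80.0)


def classify(value):
    """Label a value using the inner/outer fence rules."""
    if inner_fences[0] <= value <= inner_fences[1]:
        return "normal"
    if outer_fences[0] <= value <= outer_fences[1]:
        return "mild outlier"
    return "sure outlier"


hours = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
         31, 32, 32, 34, 35, 37, 38, 41, 43, 66]

# Keep only the values that are not classified as normal.
flagged = {v: classify(v) for v in hours if classify(v) != "normal"}
print(inner_fences, outer_fences, flagged)
```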


    Uses of Box and Whisker Diagram

    Box plots are especially suitable for comparing two or more data sets. In such a situation the box plots are constructed on the same scale.

    [Figure: side-by-side box plots for Male and Female]


    Standardized Variable

    A variable that has mean 0 and variance 1 is called a standardized variable.

    Values of a standardized variable are called standard scores.

    Values of a standardized variable, i.e. standard scores, are unit-less.

    Construction:

    $$Z = \frac{\text{Variable} - \text{Mean of Variable}}{\text{Standard Deviation of Variable}}$$


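    Standardized Variable

    X           (X - Mean)^2    Z          (Z - Mean)^2
    3           25              -1.3624    1.8561
    6           4               -0.5450    0.2970
    11          9               0.8174     0.6682
    12          16              1.0899     1.1879
    Total 32    54              0          4.009

    $$\bar{X} = \frac{\sum X}{n} = \frac{32}{4} = 8, \qquad S_x^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{54}{4} = 13.5, \qquad S_x = 3.67$$

    $$Z = \frac{X - \bar{X}}{S_x}$$

    $$\bar{Z} = \frac{\sum Z}{n} = \frac{0}{4} = 0, \qquad S_z^2 = \frac{\sum (Z - \bar{Z})^2}{n} = \frac{4.009}{4} \approx 1$$

    Variable Z has mean 0 and variance 1, so Z is a standardized variable.

    The standard score at X = 11 is

    $$Z = \frac{X - \bar{X}}{S_x} = \frac{11 - 8}{3.67} = 0.8174$$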
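    The standardization can be reproduced with a short Python sketch, assuming the population standard deviation (divisor n) used in the table. Small differences from the tabulated Z values are expected, because the slide rounds the standard deviation to 3.67 before dividing.

```python
import statistics

# Observations from the table.
x = [3, 6, 11, 12]

mean_x = statistics.fmean(x)      # 8.0
sd_x = statistics.pstdev(x)       # square root of 13.5, about 3.674

# Standard scores: subtract the mean, divide by the standard deviation.
z = [(v - mean_x) / sd_x for v in x]

mean_z = statistics.fmean(z)      # 0, up to floating-point rounding
var_z = statistics.pvariance(z)   # 1, up to floating-point rounding

print([round(score, 4) for score in z], round(mean_z, 10), round(var_z, 10))
```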


    Performance evaluation by z-scores

    The industry in which sales rep Mr. Atif works has mean annual sales = $2,500 and standard deviation = $500.

    The industry in which sales rep Mr. Asad works has mean annual sales = $4,800 and standard deviation = $600.

    Last year Mr. Atif's sales were $4,000 and Mr. Asad's sales were $6,000.

    Which of the representatives would you hire if you have one sales position to fill?


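    Performance evaluation by z-scores

    Sales rep. Atif: industry mean = $2,500, S_B = $500, X_B = $4,000

    $$Z_B = \frac{X_B - \bar{X}_B}{S_B} = \frac{4{,}000 - 2{,}500}{500} = 3$$

    Sales rep. Asad: industry mean = $4,800, S_P = $600, X_P = $6,000

    $$Z_P = \frac{X_P - \bar{X}_P}{S_P} = \frac{6{,}000 - 4{,}800}{600} = 2$$

    Mr. Atif is the best choice: his sales are 3 standard deviations above his industry mean, against 2 for Mr. Asad.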
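    The comparison can be sketched in Python as follows; the function name `z_score` is hypothetical, and the figures are those given for the two sales representatives.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / std_dev


z_atif = z_score(4000, 2500, 500)   # Mr. Atif against his industry
z_asad = z_score(6000, 4800, 600)   # Mr. Asad against his industry

best = "Atif" if z_atif > z_asad else "Asad"
print(z_atif, z_asad, best)
```

    Although Mr. Asad sold more in absolute terms, the z-scores compare each representative with his own industry.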


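    The Empirical Rule

    For a bell-shaped (approximately normal) distribution:

    $$\bar{X} \pm 1S \text{ contains about } 68.26\% \text{ of values}$$

    $$\bar{X} \pm 2S \text{ contains about } 95.45\% \text{ of values}$$

    $$\bar{X} \pm 3S \text{ contains about } 99.73\% \text{ of values}$$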
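    These percentages come from the normal distribution and can be checked with Python's `statistics.NormalDist`. This is only a sketch for an exactly normal variable; the exact one-sigma figure rounds to 68.27%, while 68.26% is the value obtained from four-decimal normal tables (2 x 0.3413).

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Proportion of values within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in coverage.items():
    print(f"within {k} SD: {proportion:.2%}")
```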


    Measures of Skewness

    A distribution in which the values equidistant from the centre have equal frequencies is defined to be symmetrical, and any departure from symmetry is called skewness.

    For a symmetrical distribution:

    1. Length of Right Tail = Length of Left Tail

    2. Mean = Median = Mode

    3. Sk = 0, where

       a) Sk = (Mean - Mode) / SD

       b) Sk = (Q3 - 2Q2 + Q1) / (Q3 - Q1)


    Measures of Skewness

    A distribution is positively skewed if the observations tend to concentrate more at the lower end of the possible values of the variable than at the upper end. A positively skewed frequency curve has a longer tail on the right-hand side.

    1. Length of Right Tail > Length of Left Tail

    2. Mean > Median > Mode

    3. Sk > 0


    Measures of Skewness

    A distribution is negatively skewed if the observations tend to concentrate more at the upper end of the possible values of the variable than at the lower end. A negatively skewed frequency curve has a longer tail on the left side.

    1. Length of Right Tail < Length of Left Tail

    2. Mean < Median < Mode

    3. Sk < 0
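    The two skewness coefficients, Sk = (Mean - Mode)/SD and Sk = (Q3 - 2Q2 + Q1)/(Q3 - Q1), can be sketched in Python as below. The function names are hypothetical, the standard deviation is assumed to use divisor n, and the inputs are the automobile sales data and the TV viewing quartiles from the earlier slides.

```python
import statistics


def mode_skewness(data):
    """Sk = (Mean - Mode) / SD, using the population SD (divisor n)."""
    mean = statistics.fmean(data)
    mode = statistics.mode(data)
    return (mean - mode) / statistics.pstdev(data)


def quartile_skewness(q1, q2, q3):
    """Sk = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


sales = [9.0, 8.2, 8.0, 9.1, 10.3, 11.0, 11.5,
         10.3, 10.5, 9.8, 9.3, 8.2, 8.2, 8.5]

sk_sales = mode_skewness(sales)                  # positive: mean exceeds mode
sk_hours = quartile_skewness(22.0, 30.5, 36.5)   # -2.5 / 14.5, slightly negative

print(round(sk_sales, 3), round(sk_hours, 3))
```

    The sign of each coefficient is read exactly as in the lists above: positive for a longer right tail, negative for a longer left tail.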


    Measures of Kurtosis

    Kurtosis is the degree of peakedness or flatness of a unimodal (single-humped) distribution.

    When the values of a variable are highly concentrated around the mode, the peak of the curve becomes relatively high; the curve is leptokurtic.

    When the values of a variable have low concentration around the mode, the peak of the curve becomes relatively flat; the curve is platykurtic.

    A curve that is neither very peaked nor very flat-topped is taken as the basis for comparison and is called mesokurtic (normal).


    Measures of Kurtosis

    1. If Coefficient of Kurtosis > 3, the curve is leptokurtic.

    2. If Coefficient of Kurtosis = 3, the curve is mesokurtic.

    3. If Coefficient of Kurtosis < 3, the curve is platykurtic.

    $$\text{Coefficient of Kurtosis} = \frac{n \sum (X - \bar{X})^4}{\left[\sum (X - \bar{X})^2\right]^2}$$
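    A minimal Python sketch of the coefficient n * sum((X - mean)^4) / (sum((X - mean)^2))^2, applied for illustration to the six observations from the variance example. The function names are hypothetical.

```python
def coefficient_of_kurtosis(data):
    """n * sum((x - mean)^4) / (sum((x - mean)^2))^2."""
    n = len(data)
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)      # sum of squared deviations
    sum_fourth = sum((x - mean) ** 4 for x in data)  # sum of fourth powers
    return n * sum_fourth / sum_sq ** 2


def kurtosis_type(coefficient):
    """Classify a curve by comparing the coefficient with 3."""
    if coefficient > 3:
        return "leptokurtic"
    if coefficient < 3:
        return "platykurtic"
    return "mesokurtic"


x_cm = [4, 6, 9, 12, 13, 16]
k = coefficient_of_kurtosis(x_cm)    # 6 * 2946 / 102**2, about 1.70
print(round(k, 2), kurtosis_type(k))
```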


    SPSS

    Statistical Package for the Social Sciences