Measures of Cental Tendency

Embed Size (px)

Citation preview

  • 8/2/2019 Measures of Cental Tendency

    1/40

    MEASURES OFCENTRAL TENDENCY

    (AVERAGES)

    Topic #6

  • 8/2/2019 Measures of Cental Tendency

    2/40

    Measures

    of Central Tendency

    A measure of central tendencyis a univariate statistic

    that indicates, in one manner or another,

    the average ortypicalobserved value of a variable ina data set, or

    put otherwise, the centerof the frequency distributionof the data.

  • 8/2/2019 Measures of Cental Tendency

    3/40

    Central Tendency (cont.)

    The central tendency of age among Berkeley faculty

    member clearly increased from 1969-70 to 1988-89. On the other hand, the central tendency of Democratic

    House vote by CD did not change much from 1948 to1970 (though other characteristics of its frequencydistribution certainly did change).

  • 8/2/2019 Measures of Cental Tendency

    4/40

    How to Calculate Averages

    We consider three measures of central tendency that areappropriate for three different levels of measurement(nominal, ordinal, and interva).

    There are others, e.g., the geometricand harmonic

    means, appropriate only for ratio variables.

    For each measure of central tendency, we will considerhow it may be calculated from three different startingpoints: a list of observed values (i.e., raw data, e.g., one column in a

    data spread sheet);

    a frequency table (or bar graph);

    a histogram and, in particular, a

    continuous density curve (like income in PS #5C).

  • 8/2/2019 Measures of Cental Tendency

    5/40

    The Mode

    The mode (ormodal value) of a variable in a set of datais the value of the variable that is observed mostfrequentlyin that data (or, given a continuous frequencycurve, is at the point ofgreatest density).

    Note: the mode is the value that is observed most

    frequently, not the frequencyitself. (I see this error toofrequently on tests.)

    The mode is defined foreverytype of variable [i.e.,

    nominal, ordinal, interval, orratio]. However, the mode is used as a measure of central

    tendency primarily for nominal variables only.

  • 8/2/2019 Measures of Cental Tendency

    6/40

    The Mode (cont.) The mode is defined for ordinal and interval variables as well.

    But since it takes account of only the nominal characteristics ofvariables (i.e., whether two cases have the same or different values), itis usually very unsatisfactory when applied to ordinal or higher levelvariables.

    The mode may be ill-defined if we have either:

    a small number of cases; or a precisely measured continuous variable and a finite number of

    cases;

    because in either event it is likely that no value will be observedmore than once in the data.

    In any event, the mode is unstable in that

    small changes in the data can result in large and erratic changes inthe modal value; and

    changes in coding the variable (for example, RELIGION), orchanges in class intervals, can change the modal value.

  • 8/2/2019 Measures of Cental Tendency

    7/40

    How To Calculate the Mode

    Given a list of observed values (raw data):

    Construct a frequency table (see next slide).

  • 8/2/2019 Measures of Cental Tendency

    8/40

    How To Calculate the Mode (cont.)

    Given a frequency table or bar graph (or having constructed one):

    observe which value in the table (or graph) has the greatest(absolute orrelative) frequency;

    the most frequent value (notthe frequencyitself) is the mode.

    Notice that the modal number of problem sets turned in is 5, eventhough most students turned in fewer than 5,

    So if we recoded the variable to create just two dichotomous categories:

    (a) turned in all 5

    (b) did not turn in all 5

    the latter category becomes the modal category

  • 8/2/2019 Measures of Cental Tendency

    9/40

    How To Calculate the Mode (cont.) Given a continuous frequency curve:

    the mode is the value of the variable under thehighest pointof the frequency curve (the point withthe greatest density of observed values).

  • 8/2/2019 Measures of Cental Tendency

    10/40

    The Median

    The median (ormedian value) of a variable in a data setis the value in the middle of the observations, in the sense that no

    more than halfof the cases have lowervalues and no more thanhalf of the cases have highervalues or,

    more generally, such that no more than half of the cases havevalues that lie on either side of the median value.

    Given a quite precisely measured continuous variableand a very large number of cases, we can in practice say

    that half the cases have lower values and half have higher values

    (e.g., LEVEL OF INCOME, SAT scores).

    Equivalently, the median value is the value of the case at the50th percentile of the distribution.

  • 8/2/2019 Measures of Cental Tendency

    11/40

  • 8/2/2019 Measures of Cental Tendency

    12/40

    The Median (cont.)

    G

    iven a list of observed values (raw data): rank order the cases in terms of their observed values(e.g., from lowest to highest);

    identify the value of the case right at the middle of thisrank-ordered list, and

    the value of this case is the median value; or construct a frequency table and find where the

    cumulative frequency crosses the 50% mark.

  • 8/2/2019 Measures of Cental Tendency

    13/40

    The Median (cont.)

  • 8/2/2019 Measures of Cental Tendency

    14/40

    The Median (cont.)If the number of cases is even, there is no observed value at the exact middle

    of the list. Look at the pair of observations closest to the middle of the list. Ifthey have the same value, that value is the median. If they have differentvalues, every value in the intervalbounded by these two values meets thedefinition of a median; but conventionally the median in this event is definedas the midpointof the interval.

  • 8/2/2019 Measures of Cental Tendency

    15/40

    TABLE 1 PERCENT OF POPULATION AGED 65 OR HIGHER IN THE 50 STATES

    (UNIVARIATE DATA ARRAY)

    Alabama 12.4 Montana 12.5

    Alaska 3.6 Nebraska 13.8Arizona 12.7 Nevada 10.6

    Arkansas 14.6 New Hampshire 11.5

    California 10.6 New Jersey 13.0

    Colorado 9.2 New Mexico 10.0

    Connecticut 13.4 New York 13.0

    Delaware 11.6 North Carolina 11.8

    Florida 17.8 North Dakota 13.3

    Georgia 10.0 Ohio 12.5Hawaii 10.1 Oklahoma 12.8

    Idaho 11.5 Oregon 13.7

    Illinois 12.1 Pennsylvania 14.8

    Indiana 12.1 Rhode Island 14.7

    Iowa 14.8 South Carolina 10.7

    Kansas 13.6 South Dakota 14.0

    Kentucky 12.3 Tennessee 12.4

    Louisiana 10.8 Texas 9.7Maine 13.4 Utah 8.2

    Maryland 10.7 Vermont 11.9

    Massachusetts 13.7 Virginia 10.6

    Michigan 11.5 Washington 11.8

    Minnesota 12.6 West Virginia 13.9

    Mississippi 12.1 Wisconsin 13.2

    Missouri 13.8 Wyoming 8.9

  • 8/2/2019 Measures of Cental Tendency

    16/40

    (1) Rank of state

    (2) Name of state

    (3) % over 65 instate (value ofvariable)

    (4) percentile rankof state

  • 8/2/2019 Measures of Cental Tendency

    17/40

    The Median (cont.)Is median income higher or lower than modal

    income?

  • 8/2/2019 Measures of Cental Tendency

    18/40

    The Median (cont.)Draw a vertical line such that half of the area under the

    frequency curve lies on one side of the line and half onthe other side. The median value of the variable lies onthis line.

  • 8/2/2019 Measures of Cental Tendency

    19/40

    Center of Gravity of Data The median (and the mode) may fail to be the center of

    gravity or balance point of the data:Data (ranked): 5 6 7 8 14

  • 8/2/2019 Measures of Cental Tendency

    20/40

    The Mean

    The mean (ormean value) of a variable in a set of datais the result of adding up all the observed values of thevariable and dividing by the number of cases (i.e., theaverage as the term is most commonly used).

    The mean is defined if and only if the variable is at least

    intervalin nature [i.e., intervalorratio].

    Suppose we have a variableXand a set of casesnumbered 1,2,...,n. Let the observed value of thevariable in each case be designated x1,x2, etc. Thus:

  • 8/2/2019 Measures of Cental Tendency

    21/40

    How to Calculate the Mean

    G

    iven a list of observed values (raw data): Following the formula above, add up the observedvalues of the variable in all cases and divide by thenumber of cases.

    Given a frequency table (or bar graph): Take each value and multiply it by its absolute

    frequency, add up these products over all values, anddivide this sum by the number of cases; or(and this isusually easier)

    take each value and multiply it by the decimal fraction(between zero and one) that represents its relativefrequency, and add up these products over all values.

    Do not divide by the number of cases you have alreadydone this as a result of multiplying each value by a decimalfraction.

  • 8/2/2019 Measures of Cental Tendency

    22/40

  • 8/2/2019 Measures of Cental Tendency

    23/40

    The Mean (cont.) Is mean income lower or higher than median (or modal)

    income?

  • 8/2/2019 Measures of Cental Tendency

    24/40

    The Mean (cont.) The mean is the center of gravity of the distribution.

    Determine (by eyeball approximation) the value of thevariable such that the density balances at that point;this value is the mean.

  • 8/2/2019 Measures of Cental Tendency

    25/40

    Average Values For a quantitative variable, the range of observed values

    of a variable in a set of data is the interval extendingfrom the minimum observed value to the maximumobserved value.

    For everymeasure of central tendency, the averagevalue of the observations lies somewhere in this range,

    often (but certainly not always) somewhere near themiddle of this range. The modal or median observed value can equal the minimum or

    (like the modal problem sets turned in) the maximum value, but

    the mean observation can do so only in the very special case in

    which all cases have identical observed values, so there is nodispersion [next topic] in the data and min value = max value.

    Once again, remember that the median is not (necessarily) themidpoint of the range.

  • 8/2/2019 Measures of Cental Tendency

    26/40

    Average Values (cont.)

    If a quantitative variable is discrete, so all of its valuesare whole numbers,

    the modal or median observed value is always awhole number also (with the possible exception notedfor the median when the number of cases is even),

    but the mean observation is almost never a whole

    number.

    The average family may have 2.374 children, i.e., the mean number of children per family may be 2.374,

    even though no individual family can have that [or anyfractional] number of children.

    Likewise, the mean number of problem sets turned inwas 3.85.

  • 8/2/2019 Measures of Cental Tendency

    27/40

    Deviations From the Average

    Unless all cases have the same observed value [i.e., nodispersion], some (in the case of mean, probably all)observed values will differ from the average value.

    If the variable is interval (or ratio) in nature, we cancalculate the deviation from the average in each case, i.e., the distance or interval from the observed value to the

    average value.

    The deviation in each case is positive if the observed value is greater than the average value, negative if the observed value is less than the average value, or

    zero if the observed value is equal to the average value.

  • 8/2/2019 Measures of Cental Tendency

    28/40

    Deviations From the Average (cont.)

    Since we have three different types of averages, we alsohave three different types of deviations from theaverage. The deviations from each type of averagehave different properties.

    There are fewernon-zero deviations from the modal

    value than from any other value of the variable. No more than halfof the deviations from the medianvalue arepositive and no more than halfare negative.

    Unless several cases have the median value (andperhaps even then), the number of cases with

    positive deviations equals the number with negativedeviations; that is,

    the cases with positive and negative deviations balance outwith respect to theirnumber.

  • 8/2/2019 Measures of Cental Tendency

    29/40

    Deviations From the Average (cont.)

    The sum (or mean) of the absolute deviations (i.e.,ignoring whether the deviations are positive or negative)from the median is less than the sum (or mean) of theabsolute deviations from any other value of the variable.

    The sum (or mean) of the (algebraic, i.e., takingaccount of whether deviations are positive or negative)deviations from the mean is zero; that is, the positive and negative deviations balance out with respect to

    their total (positive and negative) magnitudes.

    The sum (or mean) of the squared deviations from themean is less than the sum of the squared deviationsfrom any other value of the variable.

  • 8/2/2019 Measures of Cental Tendency

    30/40

  • 8/2/2019 Measures of Cental Tendency

    31/40

    Median vs. Mean Values (cont.)

    If the distribution of the data issymmetric, the median and meanvalues are the same.

    If the distribution is almostsymmetric, the median and themean are almost the same.

    For example, test scores aretypically distributedapproximately symmetrically,so median and mean testscores are typically

    approximately the same.Median Mean

    MC 15.5 15.71

    BB1 15.75 15.865

    Score 32.75 31.577

  • 8/2/2019 Measures of Cental Tendency

    32/40

    Median vs. Mean Values (cont.) If the distribution of the data is skewed, the mean is

    pulled (relative to the median) in the direction of the longthin tail. For example, income is distributed in a highly skewed fashion,

    with a long thin tail in the direction of higher income. Thus meanincome is typically considerably higher than median income.

  • 8/2/2019 Measures of Cental Tendency

    33/40

    Median vs. Mean Values (cont.)

    Question: which arrow (red or green) points tothe median value and which to the median?

    On the course webpage, see Statistical Applets:

    Mean and Median

  • 8/2/2019 Measures of Cental Tendency

    34/40

    Median vs. Mean Values (cont.)

    The median (unlike the mean) is resistant to outliers,where an outlier(in univariate data) is a case with anextreme (very high or very low) value. That is, addingsome outliers to the data (or removing them) may have abig impact on the mean value but usually has little impacton the median value.

  • 8/2/2019 Measures of Cental Tendency

    35/40

    Median vs. Mean Values (cont.)If the observed values in some distribution of data changes, the

    median value changes only ifthe value (or identity) of themedian case changes, whereas the mean value most likely isaffected by anychange in the data. For example, if the richget richer while everybody else stays about the same, meanincome increases while median income stays the same.

  • 8/2/2019 Measures of Cental Tendency

    36/40

    Median vs. Mean Values (cont.)

    If changes in the data do not change the sum of allobserved values, i.e., if the sum of all values remainsconstant, the mean remains constant, while the medianmay change.

    For example, if Congress has decided that a tax cutwill total $100 billion but has not decided how thisfixed sum should be divided up among the nations100 million households,

    the median benefit will depend on the specifics of

    the legislation but the mean tax benefit per household will be

    $1000 in any event.

  • 8/2/2019 Measures of Cental Tendency

    37/40

    Median vs. Mean Values (cont.) If observed values arepolarizedwith approximately half

    the cases having high values, half having low, and nonehaving medium values, the median is unstable; that is, the median value will be high or low depending on whether

    slightly more than half the cases have high values or slightlymore than half the cases have low values, whereas

    the mean value will barely change as a result of such small shiftsin the data.

  • 8/2/2019 Measures of Cental Tendency

    38/40

    TEST 1: FREQUENCY DISTRIBUTION OF LETTER GRADES

  • 8/2/2019 Measures of Cental Tendency

    39/40

    TEST 1: UNIVARIATE STATISTICS:

    FALL 09 FALL 10

  • 8/2/2019 Measures of Cental Tendency

    40/40

    TEST 1: BIVARIATE CHART AND SUMMARY STATISTIC