
Lecture 2 - Mean, median


[Slide diagram: numerical data, alphanumeric data, ordinal data, nominal data; population, individual, sample group]


Lecture 2 - Statistical indicators

Minimum, maximum
Mean or average
Standard deviation
Median
Quartiles
Percentiles, deciles


Data table

No.  Year  Name        Surname    Sex  Area   Age  Decade  Occupation  Stage
1    2008  CALOTA      LUCIA      F    RURAL  62   60-69   UNEMPLOYED  III
2    2009  CONSTANTIN  MARIN      M    URBAN  55   50-59   UNEMPLOYED  III
3    2007  FLOREA      ELENA      F    RURAL  83   80-89   RETIRED     II
4    2009  HOLT        MARIANA    F    URBAN  65   60-69   RETIRED     I
5    2010  IVANESCU    VIRGIL     M    RURAL  64   60-69   RETIRED     II
6    2012  LEPADAT     MARIN      M    URBAN  68   60-69   RETIRED     III
7    2011  MANOLACHE   EUGENIA    F    RURAL  39   30-39   EMPLOYED    IV
8    2010  MARINESCU   DAN        M    RURAL  57   50-59   RETIRED     IV
9    2008  STAN        SANDU      M    URBAN  53   50-59   RETIRED     V
10   2007  NEAGU       MARIA      M    URBAN  53   50-59   RETIRED     III
11   2008  NEDELEA     GHEORGHE   F    RURAL  70   70-79   RETIRED     II
12   2009  ORZESCU     ION        M    URBAN  71   70-79   RETIRED     V
13   2011  PALIU       MARIN      F    RURAL  76   70-79   RETIRED     IV
14   2013  PISICA      MIHAIL     F    RURAL  72   70-79   RETIRED     III
15   2010  POPESCU     PETRE      M    URBAN  58   50-59   RETIRED     IV
16   2012  PREDA       ION        M    RURAL  45   40-49   EMPLOYED    V
17   2009  ALBU        NICOLAE    M    RURAL  45   40-49   EMPLOYED    V
18   2008  RADUCAN     ELISABETA  M    URBAN  62   60-69   UNEMPLOYED  IV
19   2010  RADUCEANU   ION        M    URBAN  39   30-39   UNEMPLOYED  III
20   2012  IONESCU     MARIA      M    URBAN  39   30-39   UNEMPLOYED  IV


Statistical indicators for data series

If we record the numerical values of a parameter for several individuals, we create a DATA SERIES.

Hb: 12.5; 13.5; 15.3; 16.4; 11.7; ...
Age: 36; 54; 73; 46; 31; 46; ...

We use the following mathematical notation:
$X: x_1, x_2, x_3, \ldots, x_n$
$Y: y_1, y_2, y_3, \ldots, y_m$


Minimum and maximum

X: 58, 74, 70, 71, 56, 68, 70, 82, 62, 62 (ages)
Minimum = 56, Maximum = 82

Y: 58, 74, 70, 71, 56, 68, 70, 82, 62, 62, 59, 46, 57, 71 (4 more values added to the first series)
Minimum = 46, Maximum = 82

Range of values for Y: R = Max - Min = 82 - 46 = 36
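A minimal Python sketch of these indicators on the two age series above. The helper name `value_range` is our own, hypothetical choice; the sketch also prints the range of X, which the slide does not state.

```python
# Age series from the slide; Y is X with four more values appended
X = [58, 74, 70, 71, 56, 68, 70, 82, 62, 62]
Y = X + [59, 46, 57, 71]


def value_range(series):
    """Range of values R = Max - Min for a non-empty numerical series."""
    return max(series) - min(series)


for label, series in (("X", X), ("Y", Y)):
    print(label, min(series), max(series), value_range(series))
# X 56 82 26
# Y 46 82 36
```

Adding values can only lower the minimum or raise the maximum, so the range of Y can never be smaller than that of X.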


Arithmetic mean

The mean or average value of a statistical data series is the sum of all the values divided by the number of values in that series:

$\bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

The mean is an indicator of the central tendency of the series and usually shows where the data tend to gather.

Although it is computed from real, measured values, the average is an abstract number.
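The definition translates directly into code. This is a minimal sketch with a hand-written `mean` helper (a hypothetical name; Python's standard `statistics.mean` does the same job). Applied to the age series X, it also illustrates the "abstract number" remark: none of the ten ages equals the average.

```python
def mean(series):
    """Arithmetic mean: the sum of all values divided by the number of values."""
    if not series:
        raise ValueError("the mean of an empty series is undefined")
    return sum(series) / len(series)


X = [58, 74, 70, 71, 56, 68, 70, 82, 62, 62]  # ages from the minimum/maximum example
print(mean(X))       # 67.3
print(mean(X) in X)  # False: no individual actually has the average age
```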


Arithmetic mean - the influence of outliers (extreme values)

Y is made of the values of X plus 4 more values.

The value 46, much smaller than the others, pulls the average down.
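To put a number on this effect, the sketch below compares the averages using Python's standard `statistics.mean`. Removing 46 from Y is our own illustration, not part of the slide.

```python
from statistics import mean

X = [58, 74, 70, 71, 56, 68, 70, 82, 62, 62]
Y = X + [59, 46, 57, 71]
Y_without_46 = [v for v in Y if v != 46]  # illustration only: Y without the extreme value

print(round(mean(X), 2))             # 67.3
print(round(mean(Y), 2))             # 64.71
print(round(mean(Y_without_46), 2))  # 66.15
```

The single value 46 lowers the average of Y by about 1.4 years (66.15 versus 64.71).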


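Weighted arithmetic mean / weighted average

$\bar{x}_w = \frac{F_1 x_1 + F_2 x_2 + \ldots + F_m x_m}{F_1 + F_2 + \ldots + F_m} = \frac{\sum_{i=1}^{m} F_i x_i}{\sum_{i=1}^{m} F_i}$

where $F_1, F_2, \ldots, F_m$ are the frequencies of occurrence in the series (weights) of the values $x_1, x_2, \ldots, x_m$.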

Other means, such as the geometric mean and the harmonic mean, are less widely used.

Children (x)  No. of families (F)  No. of children (F × x)
0             102                  0
1             406                  406
2             388                  776
3             143                  429
4             54                   216
5             16                   80
6             5                    30
7             1                    7
Total         1115                 1944

Average no. of children per family = 1944 / 1115 ≈ 1.743498
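The table's computation can be reproduced with a small weighted-mean function. This is a sketch; `weighted_mean` and `families_by_children` are our own illustrative names.

```python
# Table above: number of children x_i -> number of families F_i (the weight)
families_by_children = {0: 102, 1: 406, 2: 388, 3: 143, 4: 54, 5: 16, 6: 5, 7: 1}


def weighted_mean(values, weights):
    """Weighted average: sum(F_i * x_i) / sum(F_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(f * x for x, f in zip(values, weights)) / total_weight


xs = list(families_by_children.keys())
fs = list(families_by_children.values())
print(sum(fs))                             # 1115 families
print(sum(f * x for x, f in zip(xs, fs)))  # 1944 children
print(round(weighted_mean(xs, fs), 6))     # 1.743498
```

Using the frequency table avoids writing out all 1115 individual values; the result is the same as the plain mean of the expanded series.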


Variance

The variance is a measure of the spread of values in a data series:

$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$

Observations:
1. For series with almost equal averages, the series with the higher variance is more spread.
2. For series with almost equal variances, the series with the lower average is relatively more spread (its spread is larger in proportion to its average).
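A minimal sketch of the variance computation. It assumes the n - 1 (sample) denominator, which is the choice that reproduces the standard deviations 10.5 and 13.3 quoted in the blood-pressure example of this lecture; the two series below are taken from that example.

```python
def sample_variance(series):
    """Variance of a series, assuming the n - 1 (sample) denominator."""
    n = len(series)
    if n < 2:
        raise ValueError("at least two values are needed")
    m = sum(series) / n
    return sum((x - m) ** 2 for x in series) / (n - 1)


# Systolic blood pressure (mmHg), 10 daily readings per patient, from the practical example
bp_x = [170, 180, 160, 180, 190, 190, 180, 190, 170, 190]
bp_y = [160, 170, 190, 160, 190, 190, 200, 180, 180, 180]

# Both averages are 180, so by observation 1 the larger variance marks the more spread series
print(round(sample_variance(bp_x), 1))  # 111.1
print(round(sample_variance(bp_y), 1))  # 177.8
```

Note the units: these values are in mmHg², which is why the standard deviation is preferred in practice.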


Standard Deviation

Variance has the following disadvantages:
1. It is measured in the squared units of the original variable (e.g. mmHg² for blood pressure), which have no practical meaning.
2. Its values are not on the same scale as the average, so they are hard to compare with it.

Therefore, another indicator, called the standard deviation, is commonly used; it is the square root of the variance.

We denote the standard deviation by σ; it has the following formula:

$\sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$


Practical example

Let us suppose we measured the blood pressure of two patients daily for 10 days, obtaining the following values for systolic (maximum) blood pressure, in mmHg:

X (first patient): 170, 180, 160, 180, 190, 190, 180, 190, 170, 190

Y (second patient): 160, 170, 190, 160, 190, 190, 200, 180, 180, 180


Computations

Both series have the same average value, 180, so we can conclude that series Y is more spread, having the higher standard deviation (13.3, compared with 10.5 for series X).
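The computations can be checked with a short sketch. The hand-written `sample_std` (a hypothetical helper) divides by n - 1, and the sketch confirms that it agrees with the standard library's `statistics.stdev`.

```python
import math
import statistics

# Systolic blood pressure (mmHg) for the two patients
bp_x = [170, 180, 160, 180, 190, 190, 180, 190, 170, 190]
bp_y = [160, 170, 190, 160, 190, 190, 200, 180, 180, 180]


def sample_std(series):
    """Standard deviation: square root of the variance with the n - 1 denominator."""
    n = len(series)
    m = sum(series) / n
    return math.sqrt(sum((x - m) ** 2 for x in series) / (n - 1))


for label, series in (("X", bp_x), ("Y", bp_y)):
    print(label, sum(series) / len(series), round(sample_std(series), 1))
# X 180.0 10.5
# Y 180.0 13.3

# The standard library agrees: statistics.stdev also divides by n - 1
assert math.isclose(sample_std(bp_x), statistics.stdev(bp_x))
```

By contrast, `statistics.pstdev` divides by n and would give 10.0 and about 12.6 for these two series.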


Coefficient of variation

It is the ratio between the standard deviation and the average (defined only if the average differs from 0), expressed as a percentage:

$C.V. = \frac{\sigma}{\bar{x}} \times 100\%$

For the previous series, which have equal averages, the coefficient of variation is greater for the series that is more spread, that is, for the one with the higher standard deviation:

1. C.V.x = 10.5 / 180 ≈ 0.058 = 5.8%
2. C.V.y = 13.3 / 180 ≈ 0.074 = 7.4%
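A sketch of the coefficient of variation using the standard library's `statistics.mean` and `statistics.stdev` (the latter divides by n - 1). Working from the unrounded standard deviations gives about 5.9% and 7.4%; the hand calculation divides the already rounded values 10.5 and 13.3, hence any small difference.

```python
import statistics

bp_x = [170, 180, 160, 180, 190, 190, 180, 190, 170, 190]
bp_y = [160, 170, 190, 160, 190, 190, 200, 180, 180, 180]


def coefficient_of_variation(series):
    """C.V. = standard deviation / mean * 100 (in %); undefined when the mean is 0."""
    m = statistics.mean(series)
    if m == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return statistics.stdev(series) / m * 100


print(round(coefficient_of_variation(bp_x), 1))  # 5.9
print(round(coefficient_of_variation(bp_y), 1))  # 7.4
```

Because it is a ratio, the C.V. has no units, which makes it usable for comparing series with different averages.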


Asymmetry indicators

The median is the middle number in a sorted list of values: half of the values are lower than the median and half are higher.

For a measured parameter (e.g. weight, height, age), it represents the value located exactly in the middle of the data series when the individuals are ordered in ascending or descending order.

The Q1 quartile of a measured parameter is the value below which a quarter of the individuals lie, the other three quarters having higher values.

The Q3 quartile is the value below which three quarters of the individuals lie, the remaining quarter having higher values.
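Quartiles can be computed with `statistics.quantiles` from the Python standard library. Several interpolation conventions exist for small samples, and the sketch shows that the two methods offered by this function give slightly different Q1 and Q3 for the age series X used earlier in the lecture; the median is the same in both.

```python
from statistics import quantiles

X = [58, 74, 70, 71, 56, 68, 70, 82, 62, 62]  # ages from the minimum/maximum example

# quantiles(..., n=4) returns the three cut points [Q1, median, Q3]; the data need not be sorted
print(quantiles(X, n=4))                      # [61.0, 69.0, 71.75]  (default "exclusive" method)
print(quantiles(X, n=4, method="inclusive"))  # [62.0, 69.0, 70.75]
```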


Median - example

Systolic blood pressure, measured on 10 days:

150, 160, 160, 170, 160, 170, 150, 160, 170, 160

First we sort the values in increasing order:

150, 150, 160, 160, 160, 160, 160, 170, 170, 170

In this case, the median lies between the fifth and sixth values of the ordered sequence, i.e. 160. If these two middle values are different, we take the median to be their arithmetic mean. If the number of measurements is odd, the median is the value found in the middle of the ordered series.
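The procedure above (sort, then take the middle value or the mean of the two middle values) can be sketched as follows. `median` is written by hand here for illustration; the standard library's `statistics.median` follows the same rule.

```python
def median(series):
    """Middle value of the sorted series; mean of the two middle values if n is even."""
    ordered = sorted(series)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty series is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                     # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the 5th and 6th of 10, etc.


sbp = [150, 160, 160, 170, 160, 170, 150, 160, 170, 160]
print(sorted(sbp))   # [150, 150, 160, 160, 160, 160, 160, 170, 170, 170]
print(median(sbp))   # 160.0
```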


Other statistical indicators

Deciles. Used for larger samples, including hundreds of individuals. There are nine deciles, corresponding to 10%, 20%, ..., 90% of the sample, in a similar way to quartiles. The 5th decile is the median.

Percentiles. Used in studies of thousands of cases, usually of wider (national or international) interest. The percentiles are the values greater than 1%, 2%, ..., 99% of the values found in the sample group. The 50th percentile is the median.
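Deciles and percentiles come from the same `statistics.quantiles` function, with n=10 and n=100 cut groups. The 20 ages from the data table at the start of the lecture are far fewer than the hundreds or thousands of cases mentioned above, so this is only an illustration that the 5th decile and the 50th percentile coincide with the median.

```python
from statistics import median, quantiles

# Ages of the 20 patients in the data table
ages = [62, 55, 83, 65, 64, 68, 39, 57, 53, 53,
        70, 71, 76, 72, 58, 45, 45, 62, 39, 39]

deciles = quantiles(ages, n=10)       # 9 cut points: D1 ... D9
percentiles = quantiles(ages, n=100)  # 99 cut points: P1 ... P99

print(len(deciles), len(percentiles))             # 9 99
print(deciles[4], percentiles[49], median(ages))  # 60.0 60.0 60.0
```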

Mode is the most frequently occurring number in a series of values. E.g. 16 is the mode of the set 3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48, since it appears more times than any other number.
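A sketch of the mode using `collections.Counter` to count occurrences. `modes` is a hypothetical helper that returns every most frequent value, since a series can have more than one mode; the standard library also offers `statistics.mode` and `statistics.multimode`.

```python
from collections import Counter


def modes(series):
    """All values that occur most often (a series can have more than one mode)."""
    counts = Counter(series)
    if not counts:
        raise ValueError("an empty series has no mode")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


values = [3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48]
print(modes(values))           # [16]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```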