PUAF 610 TA

Page 1: PUAF 610  TA

PUAF 610 TA

Session 2

Page 2: PUAF 610  TA

Today

• Class Review- summary statistics

• STATA Introduction

• Reminder: HW this week

Page 3: PUAF 610  TA

Review: Two types of Statistics

• Descriptive statistics summarize numerical information.

• Inferential statistics use a sample to draw conclusions about the population.

Page 4: PUAF 610  TA

Summary statistic

• In descriptive statistics, summary statistics are used to summarize a set of observations.

• Typically, they answer questions such as:

  – What is the central value?

  – How widely are values spread from the center?

  – Are there data that are very atypical?

  – ….

Page 5: PUAF 610  TA

Summary statistic

• a measure of location, or central tendency

• a measure of statistical dispersion

• a measure of the shape of the distribution

Page 6: PUAF 610  TA

Central tendency

• Central tendency relates to the way in which quantitative data tend to cluster around some value.

• A measure of central tendency is any of a number of ways of specifying the “central value”.

Page 7: PUAF 610  TA

Basic measures of central tendency

• Mean

• Median

• Mode

Page 8: PUAF 610  TA

Mean

• the sum of all measurements divided by the number of observations in the data set

• population mean (μ) vs. sample mean (x̄, “x-bar”)

Page 9: PUAF 610  TA

Example

• Assume 4 people take PUAF 610, and their final exam scores are 95, 87, 93, 83. What’s the mean for exam score?

Page 10: PUAF 610  TA

Example

• Mean = (95 + 87 + 93 + 83)/4 = 358/4 = 89.5
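
The same calculation can be checked in a few lines of Python. This is only a sketch for illustration; the variable names are made up here and are not part of the course material.

```python
from statistics import mean

# Final exam scores from the example above.
scores = [95, 87, 93, 83]

# Sum of all measurements divided by the number of observations.
manual_mean = sum(scores) / len(scores)

print(manual_mean)   # 89.5
print(mean(scores))  # 89.5, the library function gives the same value
```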

Page 11: PUAF 610  TA

Median

• the middle observation, when data are ordered from smallest to largest

• the point of a distribution that divides the bottom 50% from the top 50% of the data. The median is the 50th percentile.

Page 12: PUAF 610  TA

Median

• If there is an odd number of observations, the median is the middle observation

• If there is an even number of observations, the median is the average of the two middle observations

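• If the dataset is arranged in increasing order, the median is located at position (n+1)/2. When n is even, this position falls between two observations, which is why the two middle values are averaged.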

Page 13: PUAF 610  TA

Example

• Calculate the sample median for the following observations: 1, 5, 2, 8, 7.

• Start by sorting the values: 1, 2, 5, 7, 8.

• The median is located at position (n+1)/2 = 3; thus, it is 5.

• An odd number of values.

Page 14: PUAF 610  TA

Example

• Calculate the sample median for the following observations: 1, 5, 2, 8, 7, 2.

• Start by sorting the values: 1, 2, 2, 5, 7, 8.

• The median is located at position (n+1)/2 = 3.5; thus, it is the average of the two middle values, (2 + 5)/2 = 3.5.

• An even number of values
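
Both median examples can be reproduced with a short Python sketch. The helper name `median_by_position` is hypothetical; it simply follows the odd/even rule described above, and the standard library's `statistics.median` is shown for comparison.

```python
from statistics import median


def median_by_position(values):
    """Median via the odd/even rule: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2  # 0-based index of position (n+1)/2 when n is odd
    if n % 2 == 1:
        return ordered[middle]
    # Even n: average the two middle observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_sample = [1, 5, 2, 8, 7]
even_sample = [1, 5, 2, 8, 7, 2]

print(median_by_position(odd_sample))   # 5
print(median_by_position(even_sample))  # 3.5
print(median(odd_sample), median(even_sample))  # 5 3.5
```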

Page 15: PUAF 610  TA

Mode

• the most frequent value in the data set

• It is possible for a distribution to have more than one mode or not to have a mode at all.

Page 16: PUAF 610  TA

Example

• Find the mode of each of the following data sets:

• (1) 1, 2, 2, 3, 4, 7, 9

• (2) 12, 26, 26, 53, 84, 71, 71, 79

• (3) 32, 46, 53, 94, 37, 29
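
One way to check the three data sets is to count how often each value occurs. The sketch below uses a hypothetical helper, `modes`, and assumes the convention that a data set in which every value occurs only once has no mode.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: no repeated value means no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


set_1 = [1, 2, 2, 3, 4, 7, 9]
set_2 = [12, 26, 26, 53, 84, 71, 71, 79]
set_3 = [32, 46, 53, 94, 37, 29]

print(modes(set_1))  # [2]
print(modes(set_2))  # [26, 71]  (two modes)
print(modes(set_3))  # []        (no mode)
```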

Page 17: PUAF 610  TA

Comparing Mode, Median and Mean

• Pros and Cons

• For descriptive purposes we might use the measure that suits the data.

• If we would like to infer from samples to populations, the mean is the measure of choice because it can be manipulated mathematically.

Page 18: PUAF 610  TA

Summary statistic

• a measure of location, or central tendency

• a measure of statistical dispersion, or variation

• a measure of the shape of the distribution

Page 19: PUAF 610  TA

Measures of Variation

• Variation is variability or spread in a variable

• Measures of variation are lengths of intervals on the measurement scale that indicate the spread of values in a distribution.

Page 20: PUAF 610  TA

Measures of Variation

• Range

• Quartiles

• Interquartile range

• Variance

• Standard Deviation

Page 21: PUAF 610  TA

Range

• the length of the smallest interval which contains all the data

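• (highest value – lowest value) + 1, the inclusive range for data recorded in whole units; without the + 1, the formula gives the plain difference between the two extremes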
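
As a small illustration, the sketch below applies the range formula to the exam scores from the earlier mean example. Reusing those scores is a choice made here, not something the slide does.

```python
# Exam scores reused from the earlier mean example.
scores = [95, 87, 93, 83]

# Plain difference between the two extremes.
exclusive_range = max(scores) - min(scores)

# The formula on this slide adds 1 to that difference.
inclusive_range = exclusive_range + 1

print(exclusive_range)  # 12
print(inclusive_range)  # 13
```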

Page 22: PUAF 610  TA

Quartiles

• any of the three values which divide the sorted data set into four equal parts, so that each part represents one fourth of the sampled population.

Page 23: PUAF 610  TA

Quartiles

• first quartile (Q1) = lower quartile = cuts off lowest 25% of data = 25th percentile

• second quartile (Q2) = median = cuts data set in half = 50th percentile

• third quartile (Q3) = upper quartile = cuts off highest 25% of data, or lowest 75% = 75th percentile

• The difference between the upper and lower quartiles (Q3 – Q1) is called the interquartile range.
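
The quartiles and the interquartile range can be sketched with Python's `statistics.quantiles` (available from Python 3.8). The data set is invented for illustration. Note that software packages use slightly different interpolation rules for quartiles, so other tools may give somewhat different values on small samples; `method="inclusive"` is just one of those rules.

```python
from statistics import quantiles

# Hypothetical data set, already sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # interquartile range

print(q1, q2, q3)  # 3.0 5.0 7.0
print(iqr)         # 4.0
```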

Page 24: PUAF 610  TA

Variance

• Describes how far values lie from the mean.

• Use the absolute values of the deviation scores, or square them, to get rid of the minus signs.

• Averaging absolute values cannot be used in more advanced analyses.

  – By averaging the sum of squared deviations (sum of squares) we can get a measure that is susceptible to further algebraic manipulations that are difficult or impossible with absolute values.

Page 25: PUAF 610  TA

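Variance

• Less intuitive and more difficult to interpret, because it is measured in squared units rather than original units.

• As a result, the variance is not used much on its own.

• $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ (in population) and $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (in sample)

where μ is the population mean, N is the population size, x̄ is the sample mean, and n is the sample size.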
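
To make the two versions of the variance concrete, the sketch below computes both for the exam scores from the earlier mean example, first from the sum of squared deviations and then with the standard library for comparison. Treating the four scores as either a population or a sample is purely illustrative, and the sample version assumes the usual n − 1 denominator.

```python
from statistics import mean, pvariance, variance

# Exam scores reused from the earlier mean example.
scores = [95, 87, 93, 83]

center = mean(scores)  # 89.5

# Sum of squared deviations from the mean ("sum of squares").
sum_of_squares = sum((x - center) ** 2 for x in scores)  # 91.0

# Population version divides by N; sample version divides by n - 1.
population_variance = sum_of_squares / len(scores)
sample_variance = sum_of_squares / (len(scores) - 1)

print(population_variance)  # 22.75
print(round(sample_variance, 2))  # 30.33

# The library functions implement the same two definitions.
print(pvariance(scores), round(variance(scores), 2))
```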

Page 26: PUAF 610  TA

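Standard deviation

• A widely used measure of variability or dispersion.

• It shows how much variation there is from the “average”.

• The standard deviation is obtained by taking the square root of the variance, i.e.

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ (population) and $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ (sample)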
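
A short sketch of the square-root relationship, again using the exam scores from the earlier mean example as illustrative data. It takes the square root of each variance and compares the result with the standard library's `pstdev` and `stdev`.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

# Exam scores reused from the earlier mean example.
scores = [95, 87, 93, 83]

# Standard deviation = square root of the variance.
population_sd = sqrt(pvariance(scores))
sample_sd = sqrt(variance(scores))

print(round(population_sd, 2))  # 4.77
print(round(sample_sd, 2))      # 5.51

# Unlike the variance, both values are in the original units (exam points).
print(round(pstdev(scores), 2), round(stdev(scores), 2))
```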

Page 27: PUAF 610  TA

Standard deviation

• A low standard deviation indicates that the data points tend to be very close to the mean.

• A high standard deviation indicates that the data are spread out over a large range of values.

Page 28: PUAF 610  TA

Summary statistic

• a measure of location, or central tendency

• a measure of statistical dispersion, or variation

• a measure of the shape of the distribution

Page 29: PUAF 610  TA

Shape of the distribution

• Skewness

• Kurtosis

Page 30: PUAF 610  TA

Skewness

• a measure of the asymmetry of the distribution

• The skewness value can be positive or negative, or even undefined.

Page 31: PUAF 610  TA

Skewness

• negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. It has relatively few low values.

Page 32: PUAF 610  TA

Skewness

• positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. It has relatively few high values.

Page 33: PUAF 610  TA

Skewness

• A zero value indicates that the values are relatively evenly distributed on both sides of the mean.
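
The sign of the skewness can be illustrated with a small Python sketch. The slides do not give a formula, so the function below assumes one common definition, the moment-based coefficient (third central moment divided by the 1.5 power of the second central moment); statistical software may apply small-sample corrections that change the value slightly. The data sets are invented.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (an assumed definition)."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - center) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 10]            # one high value, long right tail
left_skewed = [-x for x in right_skewed]   # mirror image, long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_skewed) > 0)  # True: positive skew
print(skewness(left_skewed) < 0)   # True: negative skew
print(skewness(symmetric))         # 0.0
```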

Page 34: PUAF 610  TA

Kurtosis

• a measure of the "peakedness" of the distribution

• Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations
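
A rough sketch of the idea, assuming the moment-based definition of kurtosis (fourth central moment divided by the squared second central moment), under which a normal distribution has a value of 3. The slides do not state a formula, and some software reports "excess" kurtosis (this value minus 3) instead. Both data sets are invented.

```python
def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (an assumed definition)."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((x - center) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - center) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


# Every observation deviates from the mean by the same modest amount.
modest_deviations = [-1, 1, -1, 1]

# Most observations sit at the mean; two rare extremes carry all the variance.
rare_extremes = [-3, 0, 0, 0, 0, 0, 0, 0, 3]

print(kurtosis(modest_deviations))  # 1.0
print(kurtosis(rare_extremes))      # 4.5
```

In the second data set all of the variance comes from two infrequent extreme values, which is what pushes the kurtosis up.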

Page 35: PUAF 610  TA

That’s all for class review. So far so good?

Let’s go to STATA!