39
1. Are Women Paid Less than Men? Intro and revision of basic statistics

Are Women Paid Less Than Men 10-09-2012 v2.02

Embed Size (px)

Citation preview

Page 1: Are Women Paid Less Than Men 10-09-2012 v2.02

1.Are Women Paid Less than Men?

Intro and revision of basic statistics

Page 2: Are Women Paid Less Than Men 10-09-2012 v2.02

Learning Objectives

1. Review of basic statistics2. Some basic stata commands3. Nature of randomness4. Distributions5. The use and abuse of the Normal Distribution6. Basic Hypothesis testing7. The role of prediction & causation

Page 3: Are Women Paid Less Than Men 10-09-2012 v2.02

Wages and Gender

• Some Stata commands• We can use data in wages.dta to examine this issue

– .dta is a stata format data file• Start stata via ucd connect• Review the dataset using basic stata commands

– Editor– summ– Describe– sort

Page 4: Are Women Paid Less Than Men 10-09-2012 v2.02

Wages and gender

• Basic answer to the question is to compare average wages.

• Stata: – summ wage if gender==1– Summ wage if gender==2

• So we can say wages different in this sample• Can we say this difference would be true generally?

– Note the implicit “out of sample” prediction in this question

Page 5: Are Women Paid Less Than Men 10-09-2012 v2.02

Statistical Inference

• Key point is that would be no problem if we observed wages of all men and women.

• Sampling is the source of the problem– Result could it be dumb luck– Choice of sample could be dumb luck– Isolated example?

• What is chance that different sample lead to radically different result?

Page 6: Are Women Paid Less Than Men 10-09-2012 v2.02

Statistical Inference

• Key Point: Statistical inference is process of deciding when we can use a sample result to make statements about the world

• Also referred to as Hypothesis testing– Using this data can we reject the hypothesis that both

groups are paid the same?• Rough ans: we can if difference between the

average wage is large (How large?)• To answer precisely we need to review the nature

of randomness – the role of “dumb luck”

Page 7: Are Women Paid Less Than Men 10-09-2012 v2.02

Random Variables• Outcome of an “experiment” value unknown until it is

observed – Dice, coin, roulette wheel– Getting a job, the wage– In our example the observed differences between wages

may be random and have nothing to do with gender i.e. dumb luck

• Continuous vs Discrete Random Variables– Discrete: Finite no of possible outcomes

• Dice, coin, lottery• Useful for intuition

– Continuous: Outcome can be any value over a range• Most real world data is continuous

Page 8: Are Women Paid Less Than Men 10-09-2012 v2.02

Distribution of Random Variables• Discrete random variable

– Establish the frequency of outcomes in repeated experiments

• Continuous Random Variable– Outcome can be any value over a range– Can establish probability of r.v. being with a

particular interval,– not of taking a particular value (zero by definition) – e.g. Household Income, interest rates, price of

bread

Page 9: Are Women Paid Less Than Men 10-09-2012 v2.02

Probability Density Function f(x)

• Mathematical representation of the distribution of a random variable

• f(x) = Pr(X=x)– prob of dice to get 3 i.e. f(3) = Pr(X=3) = 1/6.

Page 10: Are Women Paid Less Than Men 10-09-2012 v2.02

• f(x) = 1; Sum of probabilities = 1, 0 f(x) 1

• For a continuous rv– can take on any value within an interval: infinite

no. of values.– NB: the probability of one exact value occurring =

0: • Pr(a X b) = ? • Probability is not measured by height now –

measured by area

Page 11: Are Women Paid Less Than Men 10-09-2012 v2.02

An example of a continuous RV

• An example of the bell curve

Pr(Y 1200) = Shaded Area = 1200

0( )f y

Integral defines the probability:

f(y)

1200

Page 12: Are Women Paid Less Than Men 10-09-2012 v2.02

Empirical vs Theoretical Distribution

• The examples so far are theoretical distributions• Real world data wont necessarily match these

exactly or even approximately• We can show the empirical distribution of data

on a histogram• File dice.dta contains 10 dice each rolled 3449

times– hist dice1

Page 13: Are Women Paid Less Than Men 10-09-2012 v2.02

Empirical Dist of Dice

05

1015

20P

erce

nt

1 2 3 4 5 6dice1

Page 14: Are Women Paid Less Than Men 10-09-2012 v2.02

Example using wage data

• Histogram will show the distribution– Stata: hist wages, bin(50)

• Can also do it separately for the two genders– Hist wages if gender==1, bin(50) norm– Hist wages if gender==2, bin(50) norm

• This shows that the distribution of wages looks a little different for both groups– Not just the average

• Note that bell curve is bad approximation

Page 15: Are Women Paid Less Than Men 10-09-2012 v2.02

0.0

5.1

.15

.2

0 5 10 15 20 0 5 10 15 20

Male Female

Densitynormal hwage

Den

sity

hwage

Graphs by ==1 for a man, 2 for a woman

Page 16: Are Women Paid Less Than Men 10-09-2012 v2.02

Characteristics of a Distribution

• We can characterise the difference between distributions in many ways– The two main are the Expected value and the

Variance• Expected Value is the average

– Weighted by probability

Page 17: Are Women Paid Less Than Men 10-09-2012 v2.02

Sample and Theoretical Mean

• Sample and theoretical mean will be different– dice: E(X)=1.(1/6)+2.(1/6)+3.(1/6)+4.(1/6)+5.

(1/6)+6.(1/6)=3.5– For dice1: summ dice1

• Continuous:– We already did for gender using summ

Page 18: Are Women Paid Less Than Men 10-09-2012 v2.02

Rules of Expectations

1. E(X+Y) = E(X)+E(Y) 2. E(X-Y) = E(X)-E(Y)3. E(aX) =a E(X)4. E(X+a) = E(X)+a5. E(aX+bY+cZ) = aE(X)+bE(Y)+cE(X)

See dice.dta for examples of this

Page 19: Are Women Paid Less Than Men 10-09-2012 v2.02

Variance of a random variable

• Distribution has an average but it also varies around that average

• Need a concept to measure that dispersion

In stata part of output of “summ”

Var(X) = 2

E X E X

2

2 2 .E X X E X E X

2

2 2E X E X E X E X

2

2E X E X

Page 20: Are Women Paid Less Than Men 10-09-2012 v2.02

Dice Example

X (X-E(X))2

1 1 (1-3.5)2 = 6.25

2 4 (2-3.5)2=2.25

3 9 (3-3.5)2=0.25

4 15 ….0.25

5 25 ….2.25

6 36 ….6.25

=21 =91 =17.5

2X

Page 21: Are Women Paid Less Than Men 10-09-2012 v2.02

Dice Example cont.

E (X) = X = 21/6 = 3.5 E(X2)= 91/6 = 15.166 Var (X) = E(X2) – (E(X))2 = 15.166 – (3.5)2 = 2.916

Var(X) = 21 17.5 2.9166X E Xn

Note That the theoretical variance may differ from the variance in the sampleStata: summ dice1

Page 22: Are Women Paid Less Than Men 10-09-2012 v2.02

Rules of Variance

1. Var (aX) = a2Var(X): 2. Var(X+a) = Var(X): 3. Var(a+bX) = b2Var(X): 4. Var (aX+bY) = a2Var(X) + b2Var(Y) if X and Y

are independent. (deal with dependence later).

• Standard Deviation = Square root of Variance: = 2

Page 23: Are Women Paid Less Than Men 10-09-2012 v2.02

Normal Distribution

• A special continuous distribution that can be very useful

• AKA “Gaussian Distribution”, “Bell Curve”, “Law of Errors”

• Mean and variance completely define it– X~N(m,

Page 24: Are Women Paid Less Than Men 10-09-2012 v2.02

Bell CurvePr(Y 1200) = Shaded Area =

1200

0( )f y

Integral defines the probability:

f(y)

12001000

Page 25: Are Women Paid Less Than Men 10-09-2012 v2.02

Calculating Probabilities from Bell Curve

• Integral solved for you by computer• in stata the “normal” function gives area under the

curve of standard normal rv– Mean=0, variance=1– display normal(0.5)

• For other normal make use of trick– If y~N(m, then z=(y-m/ is N(0,1)– Mean 1000, stn dev 100– Prob(x<1200)=Prob(z<2)– di normal(2)

Page 26: Are Women Paid Less Than Men 10-09-2012 v2.02

Properties of the Normal

• Symmetric around the mean– Positive and negative deviations are equally likely

• The probability of a deviation declines with the size of a deviation

• approx. 68% of the area under the curve lies between [m,m+]

• and 95% is in the interval [m2,m+2]

Page 27: Are Women Paid Less Than Men 10-09-2012 v2.02

Using the Bell Curve• We often assume that data has normal distribution

– Easy to use– Often intuitive

• But remember nothing is actually normal • We choose to model things as normal

– it is only a convenient approximation– Could be very wrong

• Always have to ask yourself if it is reasonable to treat the data as normal

• Check histogram

Page 28: Are Women Paid Less Than Men 10-09-2012 v2.02

Using the Bell Curve

• Nassim Taleb made career out of complaining that we assume data is normal when it is not– See Fooled by Randomness

• Already seen that bad approx to wage data• Bad approximation to stock market data as

grossly underside tails– Low prob of large changes– See sandp.dta

Page 29: Are Women Paid Less Than Men 10-09-2012 v2.02

Empirical dist of % Stock Returns

0.2

.4.6

Den

sity

-4 -2 0 2 4% change in daily closing price

Page 30: Are Women Paid Less Than Men 10-09-2012 v2.02

Answer the question!

• We now have some tools which we can use to answer the question of gender bias in wages

• Recall that women are paid less on average in the sample

• Recall that the issue is whether we can use this fact about the sample to make statements about the world (“population”)

Page 31: Are Women Paid Less Than Men 10-09-2012 v2.02

An Answer

• We will develop a precise method of answering these kind of questions as we go thorough the course

• For the moment we adopt a very simple heuristic approach

• Most people would suggest that the difference in the means is the key– i.e. 6.701875 -5.451302= 1.250573

• So men are paid more than women• We know the problem with this is that it doesn’t take

account of possibility that sample may be unusual– The statistical inference problem

Page 32: Are Women Paid Less Than Men 10-09-2012 v2.02

An Answer• Suppose we think that by dumb luck all our

women have come from the low end of the distribution of wages and all our men have come from the upper end– We would find “discrimination” but it is just to our

weird sample• So we need to account for the dispersion of

the data• Recall we measure dispersion by the variance

Page 33: Are Women Paid Less Than Men 10-09-2012 v2.02

An Answer• Think of how this works

– If the variance is large then by dumb luck we could have chosen a sample that where men came from one end of the distribution and women from the other

– If the variance of wages is small then the distance between the two ends of the distribution isn't that big

– So any difference in the wage is less likely due to chance• So what matters is the difference in the (mean)

wage relative to the dispersion or variance of wages• Sample size will also matter

Page 34: Are Women Paid Less Than Men 10-09-2012 v2.02

Answer

• So we calculate

– This is known as the “test statistic”• If this is large then we will say there is a

statistically significant difference between the two

• How large? Typically around 2– We will see why later

nt

/21

mm

Page 35: Are Women Paid Less Than Men 10-09-2012 v2.02

Here it is!

• Stata:– di (6.701875 -5.451302)/(3.157414/345^0.5)

• Gives 7.35• This is more than 2 so we reject the idea of equal

wages• Note we are not saying discrimination is certain

– “statistically significant” difference in wages– The observed difference is large enough that it is unlikely to

have ocurred by chance– We can reject the null hypothesis of equal wages– Note the phrasing “can reject’’

Page 36: Are Women Paid Less Than Men 10-09-2012 v2.02

A Comment on Prediction• We have taken a fact about the sample and

inferred a result about the world– Statistical inference– May be wrong

• Implicit prediction: women (outside the sample) will be paid less than men

• Over the next few weeks will learn what is missing from this technique

Page 37: Are Women Paid Less Than Men 10-09-2012 v2.02

A Comment on Causation• Implicit causation:

– gender “causes” the pay difference• Only really shown correlation• Need theory to justify causation• Think of effect of schooling vs effect of

intelligence• This is problem even if solve the sampling

issue

Page 38: Are Women Paid Less Than Men 10-09-2012 v2.02

What Have We Learned?1. Most fundamental issue: sampling causes

randomness2. Theoretical distributions are only approx to the

true distribution3. Distributions can be characterised by means

and variances4. Sampling “error”

– makes statistical inference non-trivial– Prediction is implicit in everything

5. Causation is not the same as correlation

Page 39: Are Women Paid Less Than Men 10-09-2012 v2.02

What is missing?• Only one of the variables was continuous• We were very lax and vague in the discussion

of statistical inference• Both of these to be corrected in the next

section