ambadar
1
2
We are making a big assumption here: that the relationship is a straight line.
Wouldn’t life be so much easier if all relationships were straight lines?
3
The Pearson correlation r is a numeric index of the relationship between two continuous (interval/ratio) variables. Caution: if a variable is categorical (e.g., gender: male vs. female; ethnicity: white, black, Asian), you cannot correlate it with another variable using Pearson r. Pearson r can only be calculated between two numeric variables (e.g., age, salary, height, weight).
r tells us how closely the relationship follows a straight line.
These graphs show possible ways two variables can relate to one another.
The more the graph looks like a straight line, the stronger the r value is.
The graphs that resemble a circle indicate very low or even no correlation between the two variables.
The direction of the line indicates whether the correlation is positive or negative. If the line goes up to the right, it’s a positive relationship (meaning, when X goes up, Y goes up too). If the line goes down to the right, it’s a negative relationship (meaning, when X goes up, Y goes down, and vice versa).
For example, “when we get older, we also get wiser”. If this is true, there should be a strong positive Pearson correlation r between the age variable and the wisdom variable.
If we are less happy when we have more money, there should be a negative Pearson correlation r between the happiness variable and the money variable.
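To make the sign of r concrete, here is a minimal sketch that computes Pearson r from its textbook definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The `pearson_r` helper and the age, wisdom, money, and happiness scores are all invented for illustration; they are not data from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores, made up for this example only.
age = [20, 30, 40, 50, 60]
wisdom = [2, 4, 5, 7, 9]        # rises with age
money = [1, 2, 3, 4, 5]
happiness = [9, 7, 6, 4, 2]     # falls as money rises

r_age_wisdom = pearson_r(age, wisdom)
r_money_happiness = pearson_r(money, happiness)

print("age vs. wisdom:", round(r_age_wisdom, 2))            # positive, close to +1
print("money vs. happiness:", round(r_money_happiness, 2))  # negative, close to -1
```

The only thing that differs between the two pairs is the direction of the trend, and that is exactly what the sign of r reports.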
4
As you can see from these charts, Pearson correlation r becomes stronger as the data points cluster more tightly around a straight line.
When the data points are distributed like a round circle, that means the X and Y variables have little relationship to each other.
Note that most of these (except for the first graph) have positive correlations, although some of them are weaker (rounder clouds) than others (closer to a straight line).
5
The same principle applies to negative correlations. The trend goes down to the right when the correlation is negative.
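The link between "how tightly the points hug a line" and the size of r can be sketched with a small experiment. This is an illustration under stated assumptions, not the data behind the charts: `x` is just 0 to 9, and `wiggle` is a fixed up/down pattern standing in for scatter, scaled by a hypothetical `spread` factor.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = list(range(10))
wiggle = [1, -1] * 5  # fixed scatter pattern (an assumption, not real noise)

r_by_spread = {}
for spread in (0, 2, 8):
    # spread = 0 puts every point exactly on the line y = x;
    # larger values push the points further away from it.
    y = [xi + spread * w for xi, w in zip(x, wiggle)]
    r_by_spread[spread] = pearson_r(x, y)
    print("spread", spread, "-> r =", round(r_by_spread[spread], 2))

# Flipping the trend (down to the right) flips the sign, not the strength.
y_down = [-(xi + 2 * w) for xi, w in zip(x, wiggle)]
r_down = pearson_r(x, y_down)
print("downward trend, spread 2 -> r =", round(r_down, 2))
```

As the scatter grows, r shrinks toward zero; mirroring the trend keeps the same magnitude with a negative sign.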
6
Again, to summarize, there are two components to the correlation value:
1. Its direction, 2. its strength
What kind of correlation are you predicting for your group project?
7
Caution: Correlation measures the linear relationship between two variables. When the data violate the assumptions behind it (for example, the relationship is curved or a single outlier dominates), weird things happen. This slide illustrates 4 different datasets, all with the same correlation. The moral of the story is that we should always inspect the scatterplot when running correlations. Numbers should be interpreted sensibly.
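The slide's own four datasets are not reproduced in these notes. A well-known set with exactly this property is Anscombe's quartet, used below on the assumption that it is the same or a similar example. The `pearson_r` helper is a hand-written sketch rather than any particular package's function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet (assumed stand-in for the slide's four datasets).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I (roughly linear)": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                                  7.24, 4.26, 10.84, 4.82, 5.68]),
    "II (curved)": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                           6.13, 3.10, 9.13, 7.26, 4.74]),
    "III (line plus one outlier)": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                                           6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV (one point sets the trend)": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                                           5.25, 12.50, 5.56, 7.91, 6.89]),
}

r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in r_values.items():
    print(name, "-> r =", round(r, 3))
```

All four r values come out at about 0.82, yet the four scatterplots look nothing alike, which is why the plot has to be inspected alongside the number.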
8
We can never stress enough that correlation is NOT the same as causation.
One of my favorite examples by a student is about shoe size and intelligence. A positive correlation was found between shoe size and intelligence levels, leading people to think that bigger feet = smarter people. Then they realized that bigger shoe size also generally means older people, and in fact it wasn’t the size of people’s feet that was causing increased intelligence; it was simply the fact that they were older and therefore scored higher on tests!
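The shoe-size story can be mimicked with a tiny made-up dataset in which age drives both variables. Everything below is hypothetical: the age range, the shoe sizes, the test scores, and the `pearson_r` helper are invented to show the pattern, not taken from the student's example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical children aged 6 to 12, three per age.
# Shoe size and test score both grow with age; within one age group,
# the individual differences are deliberately unrelated to each other.
ages = list(range(6, 13))
shoe_offsets = [-1, 0, 1]
score_offsets = [1, -2, 1]
children = [
    (age, age + d, 10 * age + e)          # (age, shoe size, test score)
    for age in ages
    for d, e in zip(shoe_offsets, score_offsets)
]

shoe = [s for _, s, _ in children]
score = [t for _, _, t in children]
r_overall = pearson_r(shoe, score)

# Hold age constant: correlate shoe size and score within each age group.
r_within = {
    age: pearson_r([s for a, s, _ in children if a == age],
                   [t for a, _, t in children if a == age])
    for age in ages
}

print("all ages pooled: r =", round(r_overall, 2))
print("within each age:", {age: round(r, 2) for age, r in r_within.items()})
```

Pooled across ages the correlation is strong, but once age is held constant it disappears, which is what we would expect if age, not foot size, is doing the work.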
9
We all want to have a positive relationship with our family, friends, coworkers, etc. Who wants a negative relationship, right?
In that spirit, why would anyone want a negative correlation? And we should celebrate every time we have a positive correlation, right?
How about a positive correlation between GDP and obesity level? How about a positive correlation between smoking and cancer? How about a positive correlation between the CEO’s compensation and corruption level?
Now let’s look at some negative correlations that are supposed to be “depressing”: more exercise associated with lower levels of obesity, more education associated with lower crime rates, fewer meetings associated with increased productivity, and how about more relaxing weekends associated with lower stress levels?
What’s the moral of the story? Correlation is what it is: a number that indicates the strength and direction of a relationship between two numerical (continuous) variables. Whether the relationship is good for mankind or not is beyond the scope of the humble little number’s responsibility!
10
Assigning numbers to categorical variables does not make them interval/ratio variables.
This is because we can only do math with interval/ratio variables. Basic math principles don’t apply to categorical variables, even if they have numbers associated with them. The numbers assigned to categorical variables are just for identification, like SSNs or zip codes.
For example, 1 + 1 = 2. If females are coded 1 and males 2, this would mean that adding a female and another female together equals a male. Another math principle is that 2 is twice as big as 1. In the gender case, that would mean that a male is twice as big as a female.
All this madness would happen if we tried to treat categorical variables in numeric ways.
Keep in mind that the Pearson correlation r value is calculated based on a math formula. If you try to feed the gender variable into SPSS as numbers, SPSS CAN and WILL calculate a Pearson correlation value for you, but using that number requires you to make the kinds of crazy assumptions illustrated above.
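One way to see that the code numbers carry no real quantity is to relabel the categories and watch r change. The sketch below uses Python rather than SPSS, and the three groups "A", "B", "C", the salary figures (in thousands), and both coding schemes are hypothetical, invented only to show the effect.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a three-category variable and a salary in thousands.
group = ["A", "A", "B", "B", "C", "C"]
salary = [30, 32, 50, 52, 40, 42]

# Two equally arbitrary ways to turn the labels into numbers.
coding_1 = {"A": 1, "B": 2, "C": 3}
coding_2 = {"A": 1, "C": 2, "B": 3}

r_coding_1 = pearson_r([coding_1[g] for g in group], salary)
r_coding_2 = pearson_r([coding_2[g] for g in group], salary)

print("coding A=1, B=2, C=3 -> r =", round(r_coding_1, 2))
print("coding A=1, C=2, B=3 -> r =", round(r_coding_2, 2))
```

The people and salaries are identical in both runs; only the ID numbers changed, yet r jumps from moderate to nearly perfect. A number that depends on how the labels happened to be numbered is not telling us anything about the relationship itself.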
11