Fa2013 mba724-session 5 week 2 correlation-za edit

Page 3: Fa2013 mba724-session 5 week 2 correlation-za edit

We are making a big assumption here: that the relationship is a straight line.

Wouldn’t life be so much easier if all relationships were straight lines?


Page 4: Fa2013 mba724-session 5 week 2 correlation-za edit

The Pearson correlation r is a numeric index of the relationship between two continuous (interval/ratio) variables. Caution: if a variable is categorical (e.g., gender: male vs. female; ethnicity: white, black, asian), you cannot meaningfully correlate it with another variable. Pearson r can only be calculated between two numeric variables (e.g., age, salary, height, weight).

r tells us how closely the relationship follows a straight line.

These graphs show possible ways two variables can relate to one another.

The more the graph looks like a straight line, the stronger the correlation (the closer r is to +1 or -1).

Graphs that resemble a circle indicate very low or even no correlation between the two variables.

The direction of the line indicates whether the correlation is positive or negative. If the line goes up to the right, it’s a positive relationship (when X goes up, Y goes up too). If the line goes down to the right, it’s a negative relationship (when X goes up, Y goes down, and vice versa).

For example, “when we get older, we also get wiser.” If this is true, there should be a strong positive Pearson correlation r between the age variable and the wisdom variable.

If we are less happy when we have more money, there should be a negative Pearson correlation r between the happiness variable and the money variable.
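As a minimal sketch of how r is computed, the Python snippet below implements the standard Pearson formula (the sum of cross-products of deviations, divided by the square root of the product of the two sums of squared deviations) and applies it to the two examples above. The age, wisdom, money, and happiness numbers are made up purely for illustration; they do not come from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: wisdom rises with age, happiness falls with money
age = [25, 35, 45, 55, 65]
wisdom = [3, 4, 6, 7, 9]
money = [20, 40, 60, 80, 100]
happiness = [9, 8, 6, 5, 3]

r_age_wisdom = pearson_r(age, wisdom)          # positive, close to +1
r_money_happiness = pearson_r(money, happiness)  # negative, close to -1

print(round(r_age_wisdom, 3), round(r_money_happiness, 3))
```

The sign of the result gives the direction of the relationship, and its distance from zero gives the strength.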


Page 5: Fa2013 mba724-session 5 week 2 correlation-za edit

As you can see from these charts, the Pearson correlation r becomes stronger as the data points cluster more tightly around a straight line.

When the data points are scattered in a round, circular cloud, the X and Y variables have little relationship to each other.

Note that most of these graphs (all except the first) show positive correlations, although some are weaker (rounder clouds) than others (closer to a straight line).
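The effect of scatter on r can be sketched with a few lines of Python. The example below starts from a perfect straight line and adds a fixed up-and-down "wiggle" of increasing size; the wiggle pattern and the scale values are assumptions chosen only to keep the example deterministic, not anything taken from the charts.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = list(range(10))
# Fixed alternating pattern standing in for random scatter (an assumption)
wiggle = [1, -1] * 5

def r_with_scatter(scale):
    """r between x and a line y = x with scatter of the given size added."""
    ys = [x + scale * w for x, w in zip(xs, wiggle)]
    return pearson_r(xs, ys)

r_tight = r_with_scatter(1)    # points hug the line
r_loose = r_with_scatter(3)    # points spread out
r_cloud = r_with_scatter(10)   # points look more like a cloud

print(round(r_tight, 2), round(r_loose, 2), round(r_cloud, 2))
```

All three correlations are positive, but r shrinks toward zero as the points move away from the line.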


Page 6: Fa2013 mba724-session 5 week 2 correlation-za edit

The same principle applies to negative correlations: the trend goes down to the right when the correlation is negative.


Page 7: Fa2013 mba724-session 5 week 2 correlation-za edit

Again, to summarize, there are two components to the correlation value:

1. Its direction
2. Its strength
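To make the two components concrete, here is a small hypothetical helper that splits an r value into a direction (from its sign) and a strength label (from its absolute value). The function name and the cut-offs of 0.1, 0.3, and 0.5 are assumptions based on a common rule of thumb; the slides do not define any thresholds, and different fields use different ones.

```python
def describe_r(r):
    """Return (direction, strength) for a Pearson r value.

    The strength cut-offs are a rule-of-thumb assumption, not a standard
    fixed by the slides.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")

    # Direction comes from the sign of r
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    # Strength comes from the distance from zero, ignoring the sign
    size = abs(r)
    if size >= 0.5:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        strength = "negligible"

    return direction, strength

print(describe_r(0.8))
print(describe_r(-0.2))
```

Note that r = -0.8 and r = +0.8 get the same strength label: the sign only tells us which way the line slopes.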

What kind of correlation are you predicting for your group project?


Page 8: Fa2013 mba724-session 5 week 2 correlation-za edit

Caution: correlation measures the linear relationship between two variables. When its assumptions (such as linearity and the absence of extreme outliers) are violated, weird things happen. This slide illustrates four different datasets, all with the same correlation. The moral of the story is that we should always inspect the scatterplot when running correlations. Numbers should be interpreted sensibly.
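The notes do not say which four datasets the slide shows. A classic example with exactly this property is Anscombe's quartet, and the sketch below assumes that is the intended illustration. The values are the published Anscombe data; the helper function is written out here only so the example is self-contained.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet: datasets 1 to 3 share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

rs = [
    pearson_r(x123, y1),  # roughly linear scatter
    pearson_r(x123, y2),  # smooth curve, not a line
    pearson_r(x123, y3),  # tight line plus one outlier
    pearson_r(x4, y4),    # one extreme point creates the whole trend
]

for i, r in enumerate(rs, start=1):
    print(f"dataset {i}: r = {r:.3f}")
```

All four r values come out nearly identical even though the scatterplots look nothing alike, which is why the number alone is not enough.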


Page 9: Fa2013 mba724-session 5 week 2 correlation-za edit

We can never stress enough that correlation is NOT the same as causation.

One of my favorite examples from a student is about shoe size and intelligence. A positive correlation was found between shoe size and intelligence levels, leading people to think that bigger feet = smarter people. Then they realized that a bigger shoe size also generally means an older person. It wasn’t the size of people’s feet that was causing increased intelligence; the people with bigger feet were simply older and therefore scored higher on tests!
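The shoe-size story can be sketched with made-up numbers. In the hypothetical data below, age drives both shoe size and test score, while within any single age group shoe size and score are unrelated. The ages, offsets, and units are all assumptions invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

ages = list(range(6, 13))        # children aged 6 to 12
shoe_offsets = [-1, 0, 1]        # small, average, and large feet for the age
score_offsets = [1, -2, 1]       # score variation unrelated to foot size

# Three hypothetical children per age: (age, shoe size, test score)
children = [
    (age, age + s, 10 * age + t)
    for age in ages
    for s, t in zip(shoe_offsets, score_offsets)
]

# Correlation across all children, ignoring age
r_overall = pearson_r([c[1] for c in children], [c[2] for c in children])

# Correlation inside each age group, where age is held constant
r_within = {
    age: pearson_r(
        [c[1] for c in children if c[0] == age],
        [c[2] for c in children if c[0] == age],
    )
    for age in ages
}

print("all ages together:", round(r_overall, 2))
print("within each age:", {age: round(r, 2) for age, r in r_within.items()})
```

Across all ages the correlation is strongly positive, but once age is held constant it vanishes: age, not foot size, is doing the work.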


Page 10: Fa2013 mba724-session 5 week 2 correlation-za edit

We all want to have a positive relationship with our family, friends, coworkers, etc. Who wants a negative relationship, right?

In that spirit, why would anyone want a negative correlation? And we should celebrate every time we have a positive correlation, right?

How about a positive correlation between GDP and obesity level? How about a positive correlation between smoking and cancer? How about a positive correlation between a CEO’s compensation and corruption level?

Now let’s look at some negative correlations that are supposed to be “depressing”: more exercise associated with lower levels of obesity, more education associated with a lower crime rate, fewer meetings associated with increased productivity, and, how about more relaxing weekends associated with lower stress levels?

What’s the moral of the story? Correlation is what it is: a number that indicates the strength and direction of a relationship between two numerical (continuous) variables. Whether the relationship is good for mankind or not is beyond the scope of the humble little number’s responsibility!


Page 11: Fa2013 mba724-session 5 week 2 correlation-za edit

Assigning numbers to categorical variables does not make them interval/ratio variables.

This is because we can only do math with interval/ratio variables. Basic math principles don’t apply to categorical variables, even if they have numbers associated with them. The numbers assigned to categorical variables are just for identification, like SSNs or zip codes.

For example, 1 + 1 = 2. In the gender case (say, female coded as 1 and male coded as 2), this would mean that adding a female and another female together equals a male. Another math principle is that 2 is twice as big as 1. In the gender case, that would mean that a male is twice as big as a female.

All this madness would happen if we tried to treat categorical variables in numeric ways.

Keep in mind that the Pearson correlation r value is calculated from a math formula. If you feed the gender variable into SPSS as numbers, SPSS CAN and WILL calculate a Pearson correlation value for you, but using that number requires you to make the kinds of crazy assumptions illustrated above.
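One way to see that such a number is not meaningful: because the codes are only labels, any numbering of the categories is as valid as any other, yet each numbering gives a different r. The Python sketch below illustrates this with a three-category variable; the six people, their salaries, and the three coding schemes are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: a categorical variable and a numeric one
ethnicity = ["white", "black", "asian", "white", "black", "asian"]
salary = [40, 50, 60, 42, 52, 62]

# Three equally arbitrary ways to turn the labels into numbers
coding_a = {"white": 1, "black": 2, "asian": 3}
coding_b = {"white": 2, "black": 1, "asian": 3}
coding_c = {"white": 3, "black": 2, "asian": 1}

def r_for(coding):
    """r between salary and the category codes under a given coding."""
    codes = [coding[label] for label in ethnicity]
    return pearson_r(codes, salary)

r_a, r_b, r_c = r_for(coding_a), r_for(coding_b), r_for(coding_c)
print(round(r_a, 2), round(r_b, 2), round(r_c, 2))
```

The data never change, but the "correlation" swings from strongly positive to strongly negative depending on which label happens to get which number. The formula runs without complaint; the result just has no interpretation.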
