Sweeney L. Discrimination in Online Ad Delivery

   

Discrimination in Online Ad Delivery

Latanya Sweeney
Harvard University
[email protected]

January 28, 2013¹

Abstract

A Google search for a person's name, such as “Trevon Jones”, may yield a personalized ad for public records about Trevon that may be neutral, such as “Looking for Trevon Jones? …”, or may be suggestive of an arrest record, such as “Trevon Jones, Arrested?...”. This writing investigates the delivery of these kinds of ads by Google AdSense using a sample of racially associated names and finds statistically significant discrimination in ad delivery based on searches of 2184 racially associated personal names across two websites. First names, previously identified by others as being assigned at birth to more black or white babies, are found predictive of race (88% black, 96% white), and those assigned primarily to black babies, such as DeShawn, Darnell and Jermaine, generated ads suggestive of an arrest in 81 to 86 percent of name searches on one website and 92 to 95 percent on the other, while those assigned at birth primarily to whites, such as Geoffrey, Jill and Emma, generated more neutral copy: the word "arrest" appeared in 23 to 29 percent of name searches on one site and 0 to 60 percent on the other. On the more ad trafficked website, a black-identifying name was 25% more likely to get an ad suggestive of an arrest record. A few names did not follow these patterns: Dustin, a name predominantly given to white babies, generated an ad suggestive of arrest 81 and 100 percent of the time. All ads return results for actual individuals and ads appear regardless of whether the name has an arrest record in the company’s database. Notwithstanding these findings, the company maintains Google received the same ad text for groups of last names (not first names), raising questions as to whether Google's advertising technology exposes racial bias in society and how ad and search technology can develop to assure racial fairness.

Keywords: online advertising, public records, racial discrimination, data privacy, information retrieval, computers and society, search engine marketing

¹ v0.14. Preprint available at http://dataprivacylab.org/projects/onlineads/1071-1.pdf


Introduction

Have you ever been arrested? Imagine the question not appearing in the solitude of your thoughts as you read this paper, but appearing explicitly whenever someone queries your name in a search engine. Perhaps you are in competition for an award, an appointment, a promotion, or a new job, or maybe you are in a position of trust, such as a professor, a physician, a banker, a judge, a manager, or a volunteer, or perhaps you are completing a rental application, selling goods, applying for a loan, joining a social club, making new friends, dating, or engaged in any one of hundreds of circumstances for which an online searcher seeks to learn more about you. Appearing alongside your list of accomplishments is an advertisement implying you may have a criminal record, whether you actually have one or not. Worse, the ads don’t appear for your competitors.

A person’s criminal record begins when he is arrested for a crime. Job applications frequently include questions such as:

• "Have you ever been arrested?"
• "Have you ever been charged with a crime?"
• "Other than a traffic ticket, have you been convicted of a crime?"

Advantages of knowing such information when hiring or engaging with a person relate to trustworthiness. Because others often equate a criminal record with not being reliable or honest, protections exist for those having criminal records.

If someone is falsely accused of a crime, pleads not guilty, and charges are dismissed, in the U.S., he may file suit against the person who brought the charges. For example, if a private citizen files a false criminal charge against you, or falsely makes a complaint to a police officer that results in your arrest, and if no conviction results, you may be able to sue the accuser for malicious prosecution.

If an employer disqualifies a job applicant based solely upon information indicating an arrest record, the company may face legal consequences. The U.S. Equal Employment Opportunity Commission ("EEOC") is the federal agency charged with enforcing Title VII of the Civil Rights Act of 1964, a law in the United States which applies to most employers, prohibiting employment discrimination based on race, color, religion, sex, or national origin, and, through guidance issued in 1973, extended to persons having criminal records [1,2]. Title VII does not prohibit employers from obtaining criminal background information. However, certain uses of criminal information, such as a blanket policy or practice of excluding applicants or disqualifying employees based solely upon information indicating an arrest record, can result in a charge of discrimination. To make a determination, the EEOC uses an “adverse impact test,” which measures whether practices, intentional or not, have a disproportionate effect. If the ratio of the effect on groups is less than 80%, the employer may be held responsible for discrimination [3].
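The 80% threshold of the adverse impact test can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the function names and the sample counts are hypothetical, and the ratio is taken here as the lower group's rate divided by the higher group's rate, which is one common reading of the test rather than a statement of the EEOC's exact procedure.

```python
def adverse_impact_ratio(selected_a, total_a, selected_b, total_b):
    """Return the lower selection rate divided by the higher one.

    Totals must be positive. The counts are hypothetical inputs, for
    example applicants hired out of applicants considered in each group.
    """
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    high = max(rate_a, rate_b)
    if high == 0:
        # Nobody was selected in either group, so there is no disparity.
        return 1.0
    return min(rate_a, rate_b) / high


def fails_four_fifths(ratio, threshold=0.8):
    """True when the ratio falls below the 80% threshold."""
    return ratio < threshold


# Hypothetical example: 30 of 100 selected in one group, 50 of 100 in another.
ratio = adverse_impact_ratio(30, 100, 50, 100)
print(round(ratio, 2), fails_four_fifths(ratio))
```

In this made-up example the ratio is 30% / 50% = 0.6, which is below 0.8, so the practice would warrant scrutiny under the test as described above.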


So what about online ads suggesting someone with your name has an arrest record, even when no one with your name has ever been arrested? The malicious prosecution approach does not apply. Title VII does not apply either, unless you have an arrest record and can prove the potential employer used the ad or information from the company sponsoring the ad.

Further, the advertiser may argue that the ads are commercial free speech, a constitutional right to display the ad associated with your name. The First Amendment of the U.S. Constitution protects advertising, as granted under the landmark U.S. Supreme Court decision, Central Hudson Gas & Electric Corp. v. Public Service Commission of New York, 447 U.S. 557 (1980). In Central Hudson, the Supreme Court sets out a four-part test for assessing government restrictions on commercial speech, which begins by determining whether the speech is misleading. Are online ads suggesting the existence of an arrest record misleading if no one having the name has an arrest record?

Assume the ads are free speech: what happens when these ads appear more often for one racial group than another? Not everyone is being equally affected by the free speech. Is that free speech or is it racial discrimination?

Racism is “any attitude, action or institutional structure which subordinates a person or group because of their color . . . Racism is not just a matter of attitudes; actions and institutional structures can also be a form of racism” [4]. Racial discrimination results when a person or group of people is treated differently based on their racial origins [5]. Power is a necessary precondition, for it depends on the ability to give or withhold benefits, facilities, services, opportunities, etc., from someone who should be entitled to them and is denied them on the basis of race. Institutional or structural racism is a system of procedures or patterns whose effect is to foster discriminatory outcomes or give preferences to members of one group over another [6].

Notice that racism can result even if it is not intentional, and that online activity may be so ubiquitous and intimately entwined with technology design that technologists may now have to think about societal consequences like structural racism in the technology they design. Such considerations are beyond this paper, but they frame the relevant legal, societal and technical landscape in which this work resides.

The investigation chronicled in this writing reports on an observed phenomenon: some online ads suggestive of arrest records appear more often for one racial group than another among a sample of racially associated names. Because online ad delivery is a socio-technical construct, its study requires blending sociology and computer science, and so this writing presents such a blend.


Problem Statement

Given online searches of racially identifying names, show that associated personalized ads suggestive of an arrest record do not differ by race.

Our hypothesis: no difference exists in the delivery of ads suggestive of an arrest record responding to online searches of racially associated names. Then, when presented with evidence of a pattern to the contrary, examine the pattern’s credibility, likelihood and circumstances of occurring.

What is the suspected pattern of ad delivery? Below are three groups of ad hoc real-world examples that jointly describe concerns.

Earlier this year, a Google search for “Latanya Farrell” yielded the two ads appearing in Figure 1a. The first ad implies she may have been arrested. Was she? After clicking on the link and paying the requisite subscription fee, we learn that the company has no arrest record for her (Figure 1b). Google searches for “Latanya Sweeney” and “Latanya Lockett” also yield ads suggestive of arrests. We find no arrest record for “Latanya Sweeney” but we do for “Latanya Lockett” (Figure 1). The ads appeared on google.com and on a newspaper website to which Google supplies ads, reuters.com (Figure 1c). All the ads in question link to instantcheckmate.com.

In comparison, searches for “Kristen Haring”, “Kristen Sparrow” and “Kristen Lindquist” did not yield any instantcheckmate.com ads, only competitor ads (Figure 2a, 2c, and 2e), even though the company’s database reports having records for all three names and arrest records for “Kristen Sparrow” and “Kristen Lindquist” (Figure 2d and 2f).

Searches for “Jill Foley”, “Jill Schneider” and “Jill James” displayed instantcheckmate.com ads with neutral copy; the word “arrest” did not appear in the ads even though arrest records for all three names appear in the company’s database (Figure 3).

Lastly, we consider a proxy for race associated with these names. Figure 4 shows Google images appearing for image searches of “Latanya”, “Latisha”, “Kristen” and “Jill”, respectively. There appears to be a racial distinction. The faces associated with “Latanya” and “Latisha” (Figure 4a and 4b) tend to be black, while white faces dominate the images of “Kristen” and “Jill” (Figure 4c and 4d).

Together, these handpicked examples (Figures 2, 3 and 4) describe the suspected pattern: ads suggesting arrest tend to appear with names associated with blacks, and neutral ads or no ads tend to appear with names associated with whites, regardless of whether the company has an arrest record associated with the name. The remainder of this paper describes a journey to establish an instance of the pattern worthy of scholarly consideration and statistical assessment.

Figure 1. Sample ads and criminal reports for “latanya farrell” (a,b), “latanya sweeney” (c,d), and “latanya lockett” (e,f) appearing on google.com (a,b,c) and reuters.com (c bottom). Criminal reports from instantcheckmate.com (b,d,f).

Figure 2. Sample ads and criminal reports for “kristen haring” (a,b), “kristen sparrow” (c,d), and “kristen lindquist” (e,f), appearing on reuters.com (a,c,e). Criminal reports from instantcheckmate.com (b,d,f).

Figure 3. Sample ads and criminal reports for “jill foley” (a,b), “jill schneider” (c,d), and “jill james” (e,f) appearing on google.com (c,e) and reuters.com (a). Criminal reports from instantcheckmate.com (b,d,f).

Figure 4. Sample face images on google.com retrieved for searches “latanya” (a), “latisha” (b), “kristen” (c), and “jill” (d).


Google AdSense

Who generates the ad’s text? Who decides when and where an ad will appear? What is the relationship between Google, Reuters and Instant Checkmate in the previous examples? An overview of Google AdSense, the program that delivered the ads in Figures 1, 2, and 3, explains the entities and relationships.

In printed newspapers and magazines, ad space and ad content are fixed. Everyone who purchases the publication sees the same ad in the same space. But websites are different. Online ad space, not bound by the same physical limitations, can be dynamic, with ads tailored to the reader’s search criteria, content interests, geographical location, and so on. Any two readers (or the same reader returning to the same website) might view different ads.

Google AdSense is the largest provider of dynamic online advertisements, placing ads for millions of sponsors on millions of websites [7]. In the first quarter of 2011, Google earned US $2.43 billion ($9.71 billion annualized), or 28% of total revenue, through Google AdSense [8]. AdSense has operational variations, but for simplicity, this writing only describes those features of Google AdSense specific to the Instant Checkmate ads in question.

When a reader enters search criteria in an enrolled website, Google AdSense embeds ads believed to be relevant to his search in the web page of results. Figures 1, 2, and 3 show ads delivered by Google AdSense in response to various “firstname lastname” searches.

To place an online ad, a “sponsor” provides Google with search criteria, copies of possible ads to deliver once a match occurs, and a financial bid (an amount the sponsor is willing to pay) if a reader clicks the delivered ad.² Google operates a real-time auction across bids for the same search criteria, usually displaying the ad having the highest bid first, the second highest second, and so on, and may elect not to show any ad if it considers the bid too low or if showing the ad exceeds a threshold (e.g., a maximum account total for the sponsor). In Figures 1, 2, and 3, Instant Checkmate sponsors the ads, which in most cases appear first among ads, implying Instant Checkmate had the highest bid.

A website owner wanting to “host” online ads enrolls in AdSense and changes his website to include special software that sends information about the current reader (e.g., search criteria) to Google and, in exchange, receives corresponding ads from Google. The displayed ads have the banner “Ads by Google” when appearing on sites other than google.com. For example, reuters.com is an AdSense host, and entering “Latanya Sweeney” in the search bar at reuters.com generated a new web page having ads delivered by Google, bearing the banner “Ads by Google” (Figure 1c).

² This writing conflates two interacting Google programs: Google AdWords allows advertisers to specify search criteria, ad text and bids, and Google AdSense delivers the ads to host sites.
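The bid ordering described in this section can be sketched in a few lines. This is a simplified illustration, not Google's actual auction: the `Bid` record, the `min_bid` floor, the `remaining_budget` field and the sponsor names are all hypothetical, and the text itself says only that the highest bid is "usually" shown first.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    sponsor: str             # hypothetical sponsor identifier
    ad_text: str             # ad copy to deliver on a match
    amount: float            # what the sponsor pays if the ad is clicked
    remaining_budget: float  # hypothetical stand-in for an account threshold


def rank_ads(bids, min_bid=0.05):
    """Order matching bids from highest to lowest amount.

    Bids below the (assumed) minimum, or whose amount exceeds the
    sponsor's remaining budget, are not shown at all.
    """
    eligible = [
        b for b in bids
        if b.amount >= min_bid and b.remaining_budget >= b.amount
    ]
    return sorted(eligible, key=lambda b: b.amount, reverse=True)


bids = [
    Bid("sponsor_a", "Looking for Trevon Jones?", 0.50, 10.00),
    Bid("sponsor_b", "Trevon Jones, Arrested?", 1.25, 10.00),
    Bid("sponsor_c", "Find Trevon Jones", 0.01, 10.00),  # bid too low
    Bid("sponsor_d", "Records for Trevon Jones", 2.00, 1.00),  # over budget
]
for position, bid in enumerate(rank_ads(bids), start=1):
    print(position, bid.sponsor, bid.amount)
```

Under these assumptions only `sponsor_b` and `sponsor_a` are shown, in that order, which mirrors the inference in the text that the first ad displayed likely carried the highest bid.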

Figure 5. Google explanation for delivering the ad “Latanya Sweeney, Arrested?”: it matches the exact first and last name searched.

There is no cost associated with displaying an ad, but if the reader actually clicks the ad, the sponsor pays the promised bid, which is split between Google and the host. Clicking the “Latanya Sweeney” ad on reuters.com (Figure 1c) would cause Instant Checkmate to pay its bid to Google, which splits it with Reuters.

Search Criteria

What search criteria did Instant Checkmate specify? Are ads randomly delivered? Do ads rely only on the first name? Will ads be delivered for made-up names too? Google AdSense provides answers to these questions as well.

Ads displayed on google.com allow readers to learn why a specific ad appeared. Clicking the circled “i” in the ad banner (e.g., Figure 1c) provides a web page explaining the ads (e.g., Figure 5). Doing so for ads in Figures 1, 2, and 3 reveals that the ads appeared because the search criteria associated with the bid matched the exact first and last name combination searched.

Because bids presumably relate to records the company sells, the names would likely be the first and last names of real people, and because searches are online, ads may be more effective for people having online identities.

In summary, search criteria associated with ads:

• must include both first and last names;
• should be names of real people; and
• may favor names of people with an online identity.

The next sections describe systematic construction of a list of racially associated first and last names for real people. It is not presumed that Instant Checkmate placed bids or Google delivered ads using any such list. Instead, the list allows us to have a qualified sample of racially associated names for testing ad delivery.


Black and White Identifying Names

“Black-identifying” and “white-identifying” first names are those for which a significant number of people have the name and the frequency is sufficiently higher in one race than another. Heavily cited prior academic work provides exemplars.

In 2003, Bertrand and Mullainathan conducted a field experiment in which they responded to job postings with resumes that were virtually identical, except that some of the resumes had black-identifying names and others had white-identifying names [9]. Their “Job Discrimination Study” showed significant discrimination against black names: white names received 50% more callbacks for interviews even though the resumes otherwise had identical qualifications.

The Job Discrimination Study used a correlation of names given to black and white babies in Massachusetts between 1974 and 1979, defining black-identifying and white-identifying names as those that have the highest ratio of frequency in one racial group to frequency in the other racial group.

(a) Job Discrimination Study [9]
    White Female: Allison, Anne, Carrie, Emily, Jill, Laurie, Kristen, Meredith
    Black Female: Aisha, Ebony, Keisha, Kenya, Latonya, Lakisha, Latoya, Tamika
    White Male:   Brad, Brendan, Geoffrey, Greg, Brett, Jay, Matthew, Neil
    Black Male:   Darnell, Hakim, Jermaine, Kareem, Jamal, Leroy, Rasheed, Tremayne

(b) Fryer and Levitt [11]
    White Female: Molly, Amy, Claire, Emily*, Katie, Madeline, Katelyn, Emma
    Black Female: Imani, Ebony*, Shanice, Aaliyah, Precious, Nia, Deja, Diamond
    White Male:   Jake, Connor, Tanner, Wyatt, Cody, Dustin, Luke, Jack
    Black Male:   DeShawn, DeAndre, Marquis, Darnell*, Terrell, Malik, Trevon, Tyrone

(c) Observation in Figure 4
    Black Female: Latanya, Latisha

Figure 6. Black-identifying and white-identifying first names from (a) the Job Discrimination Study [9], (b) Fryer and Levitt [11], and (c) observation in Figure 4. Emily, a white female name, Ebony, a black female name, and Darnell, a black male name, appear in both (a) and (b) and are marked with an asterisk in (b), giving a total of 63 distinct first names.
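The count of 63 distinct first names can be checked directly from the lists in Figure 6. The snippet below is a small verification sketch; the dictionary layout and variable names are illustrative choices, while the names themselves are copied from the figure.

```python
from collections import Counter

# Names transcribed from Figure 6, keyed by panel and category.
FIGURE_6 = {
    "a": {
        "white_female": ["Allison", "Anne", "Carrie", "Emily", "Jill",
                         "Laurie", "Kristen", "Meredith"],
        "black_female": ["Aisha", "Ebony", "Keisha", "Kenya", "Latonya",
                         "Lakisha", "Latoya", "Tamika"],
        "white_male": ["Brad", "Brendan", "Geoffrey", "Greg", "Brett",
                       "Jay", "Matthew", "Neil"],
        "black_male": ["Darnell", "Hakim", "Jermaine", "Kareem", "Jamal",
                       "Leroy", "Rasheed", "Tremayne"],
    },
    "b": {
        "white_female": ["Molly", "Amy", "Claire", "Emily", "Katie",
                         "Madeline", "Katelyn", "Emma"],
        "black_female": ["Imani", "Ebony", "Shanice", "Aaliyah", "Precious",
                         "Nia", "Deja", "Diamond"],
        "white_male": ["Jake", "Connor", "Tanner", "Wyatt", "Cody",
                       "Dustin", "Luke", "Jack"],
        "black_male": ["DeShawn", "DeAndre", "Marquis", "Darnell", "Terrell",
                       "Malik", "Trevon", "Tyrone"],
    },
    "c": {
        "black_female": ["Latanya", "Latisha"],
    },
}

# Flatten every panel and category into one list of names.
all_names = [
    name
    for panel in FIGURE_6.values()
    for names in panel.values()
    for name in names
]
counts = Counter(all_names)
distinct = set(all_names)
repeated = {name for name, n in counts.items() if n > 1}

print(len(all_names), len(distinct), sorted(repeated))
```

The 66 listed entries contain three names that appear twice (Emily, Ebony and Darnell), which leaves the 63 distinct first names stated in the caption.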

     


In the popular book "Freakonomics," Levitt and Dubner report the top 20 whitest- and blackest-identifying girl and boy names [10]. The list comes from earlier work by Fryer and Levitt, which shows a pattern change in the way Blacks named their children starting in the 1970’s, which they correlate with the Black Power Movement [11]. They postulate that the movement influenced how Blacks perceived their identities, and they give as evidence that before the movement, names given to black and white children were not distinctly different, but after the movement, distinctly black names emerged. Similar to the Job Discrimination Study, the list used by Fryer and Levitt comes from names given to black and white children recorded in California birth records from 1961-2000 (over 16 million births).

We need a list of racially associated names in order to test ad delivery, so we use the union of lists from these prior studies augmented with two black female names, “Latanya” and “Latisha”, from earlier observations. Figure 6 enumerates our list, having eight names for each of the categories (white female, black female, white male, and black male) from the Job Discrimination Study (Figure 6a), and the first eight names for each category from the Fryer and Levitt work (Figure 6b). Removing duplicates gives a total of 63 distinct first names.

Full Names of Real People

Having a list of racially associated first names (Figure 6) is a start, but testing ad delivery requires a real person’s first and last name (“full name”). How do we get full names? Web searches provide a means to locate and harvest full names by: (1) sampling names of professionals appearing on the Web; and (2) sampling names of people active on social media sites and blogs (“netizens”). The subsections below describe the steps.

Harvesting Full Names of Professionals

Professionals often have their own websites or have biographical information appearing on institutional websites, listing titles and positions, and describing prior accomplishments and current activities. Several professions, such as research, medicine, law, and business, often have degree designations, such as PhD, MD, JD or MBA, associated with people in that profession. A Google search for a first name and a degree designation typically yields lists of people having that first name and degree. We use these kinds of searches to harvest a sample of full names of professionals having racially associated first names; Figure 8a itemizes the steps.

Here is a walk through the method of Figure 8a. The goal is to acquire a list of at least 10 full names for each racially associated first name. For each first name in the list of racially associated first names (Figure 6): perform a Google search with that first name and a degree designation (Step 1.1); harvest full names from the search results, up to 3 pages of results, avoiding duplicate names; and, for each full name recorded, visit its associated web page, and if an image is discernible, record whether the person appears black, white, or other. Archive each web page visited, preserving images and content.

Here are two examples. Figure 9a shows results for a Google search of “Ebony PhD”. The results immediately reveal links for real people having “Ebony” as a first name, specifically “Ebony Bookman”, “Ebony Glover” (highlighted), “Ebony Baylor” and “Ebony Utley”. We harvest the full names appearing on the first three pages of search results, using searches with other professional endings, such as JD, MD, or MBA, as needed to find additional names in order to get at least 10 full names for “Ebony”. Clicking on the link associated with “Ebony Glover” provides more information about her (Figure 9b), including an image. We record that the Ebony Glover in the study appears black.

Similarly, Figure 9c shows search results for “Jill PhD”, a list of professionals whose first name is Jill. Visiting links yields web pages with more information about each person. For example, Figure 9d shows an extract of Jill Schneider’s web page, and from the associated image, we record that the Jill Schneider in this study is white.

Step 1. For each name_i in the list of racially associated first names in Figure 6, do:
    1.1 Perform a Google search for “name_i degree_j”, where degree_j is one of {PhD, MD, JD, MBA}.
    1.2 For each result page, up to 3 pages, do:
        Preserve a copy of the page.
        Record first and last names of people, avoiding duplicates.
        For each full name recorded, do:
            Click on the associated link.
            Preserve a copy of the resulting page.
            If a personal image appears, record whether the person appears black, white, or other.
    Repeat Steps 1.1 and 1.2 with another degree_j if the number of full names for name_i is less than 10.

(a)

Step 1. For each name_i in the list of racially associated first names in Figure 6, do:
    1.1 Perform a PeekYou search for “name_i”.
    1.2 For each result page, up to 2 pages and 10 recorded full names for name_i, do:
        Preserve a copy of the page.
        Record first and last names of people, avoiding duplicates.
        For each full name recorded, note whether the associated image appears black, white, or other.

(b)

Figure 8. Method for harvesting racially associated first and last names of (a) professionals using Google search and (b) netizens using PeekYou.
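The control flow of Figure 8a can be expressed as a short loop. The sketch below is a hypothetical rendering, not the procedure as actually carried out: `search` is an assumed callable supplied by the caller that returns the full names found on one result page, and the archiving of pages and the recording of race from images are left out.

```python
def harvest_full_names(first_names, search,
                       degrees=("PhD", "MD", "JD", "MBA"),
                       max_pages=3, target=10):
    """Collect full names per first name, following Figure 8a.

    `search(query, page)` is a hypothetical function returning a list of
    full names found on the given result page for the query.
    """
    harvested = {}
    for first in first_names:
        names = []
        for degree in degrees:
            # Try another degree only while fewer than `target` names exist.
            if len(names) >= target:
                break
            for page in range(1, max_pages + 1):
                for full_name in search(f"{first} {degree}", page):
                    if full_name not in names:  # avoid duplicates
                        names.append(full_name)
        harvested[first] = names
    return harvested
```

A caller would pass the first names of Figure 6 together with their own search function. The netizen variant of Figure 8b follows the same shape with a single query per first name and at most 2 result pages.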

Figure 9. Extracts of search and web pages for first names and degree designations. (a) Search “Ebony PhD”. (b) “Ebony Glover” page. (c) Search “Jill PhD”. (d) “Jill Schneider” page.

Figure 10. Extracts of search pages for netizens using PeekYou.com for first names (a) “Ebony” and (b) “Jill”. Highlighted records are (a) “Ebony Springer” and (b) “Jill Christopher”.


Harvesting Full Names of Netizens

The website peekyou.com (“PeekYou”) compiles and disambiguates online and offline information on individuals, thereby connecting residential information with Facebook and Twitter users, bloggers, and others, and assigns its own rating of size for each person’s online footprint. Search results from peekyou.com (“PeekYou search”) list people having the highest score first, second highest second, and so on, and include an image of the person. Celebrities and public figures tend to list first, having the highest PeekYou scores, followed by bloggers, tweeters and the rest. We use PeekYou searches to harvest a sample of full names of netizens having racially associated first names; Figure 8b itemizes the steps.

Harvesting names of netizens (Figure 8b) is similar to, but simpler than, harvesting names of professionals (Figure 8a). For each name in the list of racially associated first names (Figure 6), perform a PeekYou search with that first name (Step 1.1); harvest full names from the search results, up to 2 pages of results, avoiding duplicate names; and, for each full name recorded, note whether the person in the associated image appears black, white, or other. Archive each web page, preserving images and content.

Here are two examples. Figure 10a shows some results from a PeekYou search of “Ebony” as a first name, listing “Ebony Small”, “Ebony Cams”, “Ebony King”, “Ebony Springer” (highlighted), and “Ebony Tan”. Similarly, Figure 10b shows some PeekYou search results for “Jill” as a first name, listing “Jill Christopher” (highlighted), “Jill Spivack”, “Jill English”, “Jill Pantozzi”, and “Jill Dobson”. We harvest these and other full names appearing on the first two pages of results and, for each recorded image, report the race of the person if discernible. We record “Ebony Glover” in this study appears black and “Jill Christopher” white.

Results from Harvesting Full Names

Armed with the approach just described, from September 24 through October 22, 2012, I harvested 2184 racially associated full names of people with an online presence and, using the images associated with those names, was able to confirm that the racially associated first names in Figure 6 are predictive of race (88% black and 96% white). Figures 11 and 12 summarize results. Below is a discussion.

Google searches of first names and degree designations were not as productive as first name lookups on PeekYou, yielding 1002 and 1182 harvested names, respectively. In the Google searches, the white male names “Cody”, “Connor”, “Tanner” and “Wyatt” retrieved results with those as last names rather than first names; the black female name “Kenya” was confused with the country; and the black names “Aaliyah”, “Deja”, “Diamond”, “Hakim”, “Malik”, “Marquis”, “Nia”, “Precious” and “Rasheed” retrieved fewer than 10 full names. Only “Diamond” posed a problem with PeekYou searches, seemingly confused with other online entities. All searches other than “Diamond” contributed full names, and so unless noted otherwise, we exclude “Diamond” from further consideration.


 

Figure 11. Summary of harvesting 2184 full names of professionals and netizens from the Web (middle group) using racially associated first names (leftmost group), and race observations of online images (rightmost group). A total of 1428 images: 508 black, 881 white and 39 other.
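The race observations summarized in Figure 11 feed the analysis of first names as classifiers of race (Figures 12b and 12c). As a rough check of that arithmetic, the sketch below computes how often a racially associated first name agrees with the race observed in the image, using the four cell counts reported in this section for black names (490, 68, 18, 852) and for white names (831, 39, 50, 508). The class and field names are illustrative assumptions, not part of the study's materials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameClassifierCounts:
    """2x2 counts for 'first name predicts the race observed in the image'.

    Field names are hypothetical labels chosen for this sketch.
    """
    name_and_image: int   # name associated with race X, image shows X
    name_not_image: int   # name associated with race X, image shows not-X
    image_not_name: int   # image shows X, name not associated with X
    neither: int          # neither name nor image indicates X

    def total(self) -> int:
        return (self.name_and_image + self.name_not_image
                + self.image_not_name + self.neither)

    def precision(self) -> float:
        """Share of X-associated names whose image was observed as X."""
        return self.name_and_image / (self.name_and_image + self.name_not_image)


# Cell counts as reported in the text for images with a discernible race.
black_names = NameClassifierCounts(490, 68, 18, 852)
white_names = NameClassifierCounts(831, 39, 50, 508)

if __name__ == "__main__":
    for label, counts in (("black", black_names), ("white", white_names)):
        print(f"{label}: n={counts.total()}, precision={counts.precision():.1%}")
```

Under these counts both tables sum to the 1428 images with a discernible race, and the two precision values round to the 88% and 96% quoted in the text.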

Figure 12. Descriptive statistics of harvested full names (a) and analysis of first names as a classifier for blacks (b) and for whites (c).

Figure 11 shows the number of full names harvested for each first name. Names contributing the most full names have white first names, e.g. “Katelyn” (80), “Molly” (70), “Amy” (67), “Dustin” (66) and “Madeline” (66), purposefully oversampled to test whether comparable PeekYou scores have any effect on ad delivery. Names contributing the fewest full names have black first names: “Hakim” (17), “Rasheed” (17), “Precious” (12), “Nia” (11) and “Kenya” (4).

The average number of full names for each first name is 35, with a median of 30 and standard deviation of 16. For black first names, the average number of full names for each of the 31 first names is 27, with median 27 and standard deviation 11, and for the 31 white first names, the average is 44, median 35, and standard deviation 16. Of the 2184 full names harvested, 835 (38%) are associated with black first names and 1349 (62%) with white first names, and 1075 (49%) with male first names and 1109 (51%) with female names; see Figure 12a.

Most images associated with black-identifying names were of black people (88%) and an even greater percentage of images associated with white-identifying names were of white people (96%). A total of 1428 names had discernible black (508), white (881) or other (39) images (Figure 11). We examine black and white names separately as predictors of race (Figures 12b and 12c). Of those having black associated first names, 490 images were of blacks, 68 images were not, 18 images


having white first names were of blacks, and 852 names had neither black first names nor images of blacks. Similarly, 831 images of whites had white first names, 50 images of whites did not have white first names, 39 had white first names but non-white images, and 508 had neither white first names nor images of whites.

Some first names associated as black had perfect predictions (100%): “Aaliyah”, “DeAndre”, “Imani”, “Jermaine”, “Lakisha”, “Latoya”, “Malik”, “Tamika”, and “Trevon”. The worst predictors of blacks were “Jamal” (48%) and “Leroy” (50%). Figure 11 has details. Even more first names associated with whites, 12 of 31 names or 39%, made perfect predictions: “Brad”, “Brett”, “Cody”, “Dustin”, “Greg”, “Jill”, “Katelyn”, “Katie”, “Kristen”, “Matthew”, “Tanner” and “Wyatt”. The worst predictors of whites, “Jay” (78%) and “Brendan” (83%), were not bad. These findings strongly support the use of these names as racial indicators in this study.

Sixty-two full names (62/2184 = 3%) appeared in the list twice even though the people were not necessarily the same. No name appeared more than twice, so overall, Google and PeekYou searches tended to yield different names.

Ad Delivery

We now have a set of first and last names suggestive of race. What ads appear when these names are searched? To answer this question, we examine ads delivered on two sites, Google.com and Reuters.com, in response to searches of each full name, once at each site.

The method is straightforward. For each full name, visit Google.com, search for the name and record which ads display. Repeat the process at Reuters.com, clearing the browser’s cache and cookies before each search and preserving copies of web pages received. Figure 13 enumerates these steps. As examples, Figure 14 shows ads delivered in response to searches of “Lakisha Simmons”, “Laurie Ryan”, “Darnell Bacon”, and “Brendan Watson” on google.com and reuters.com. We preserve the capture of all ads, not just those of Instant Checkmate.

Step 1. For each fullname_i in the list of racially associated full names, do:
  1.1 Clear the browser cache and cookies.
  1.2 Search Google.com for “fullname_i”.
  1.3 Preserve copies of any of up to the first 3 pages of results having ads.
  1.4 Record which ads display.
  1.5 Clear the browser cache and cookies.
  1.6 Search Reuters.com for “fullname_i”.
  1.7 Preserve a copy of the resulting page.
  1.8 Record which ads display.

Figure  13.  Method  for  harvesting  ads  appearing  in  responses  to  searches  of  full  names  on  google.com  and  reuters.com.        
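The loop in Figure 13 can be sketched in code as follows. This is only an outline of the control flow under stated assumptions: `harvest_ads`, `AdCapture`, `clear_browser_state` and `search_and_capture` are hypothetical names introduced here, and the actual browser automation (the study used manual searches and “Webshot”) is left to the caller-supplied functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Sequence


@dataclass
class AdCapture:
    """One search of one full name on one site (hypothetical record type)."""
    full_name: str
    site: str
    ads: List[str] = field(default_factory=list)


def harvest_ads(
    full_names: Iterable[str],
    clear_browser_state: Callable[[], None],
    search_and_capture: Callable[[str, str], Sequence[str]],
    sites: Sequence[str] = ("google.com", "reuters.com"),
) -> List[AdCapture]:
    """Follow the order of Figure 13: clear state, search, archive, record.

    clear_browser_state: assumed to clear cache and cookies (Steps 1.1, 1.5).
    search_and_capture:  assumed to search `site` for the name, archive the
                         result page(s), and return the ad texts shown
                         (Steps 1.2-1.4 and 1.6-1.8).
    """
    captures: List[AdCapture] = []
    for name in full_names:
        for site in sites:
            clear_browser_state()
            ads = search_and_capture(site, name)
            captures.append(AdCapture(name, site, list(ads)))
    return captures
```

Passing the browser operations in as functions keeps the sketch independent of any particular automation tool and lets the loop be exercised with stand-in functions before any live search is made.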

Figure 14. Ads in response to full name searches on google.com (a, c, e, g) and reuters.com (b, d, f, h) for “Lakisha Simmons”, “Laurie Ryan”, “Darnell Bacon”, and “Brendan Watson”.


Results from Ad Delivery

From September 24 through October 23, 2012, I searched 2184 full names on google.com and reuters.com, as described above. Execution took place at different times of day and on different days of the week, with different IP and machine addresses operating in different parts of the United States, using different browsers. I manually searched 1373 of the names and used automated means (“Webshot” [12]) for the remainder (812 names). Here are 15 findings about ads and names, followed by four supplemental observations.

1. No more than three ads ever appeared for a search, whether manual or automated, regardless of website, Google or Reuters. No company’s ad was listed more than once on a page.

2. Far fewer ads appeared on google.com than on reuters.com. A total of 5337 ads appeared, 4473 (84%) on reuters.com and only 864 (16%) on Google, even when examining up to three pages of search results on google.com, and Google showed fewer ads per page, typically 1 (median) compared to 3 (median) on Reuters. In terms of the 2184 full names, ads appeared exclusively on Reuters (1221), exclusively on Google (17), and on both (604), for a total of 1842 (84%) names having ads; 342 names had no ads at all. Reuters displayed ads for 1826 (84%) names and Google for 622 (28%). Figures 15a and 15c have summary statistics.

3. Most ads were for government-collected information (“public records”) about the person. Public records in the United States often include a person’s address, phone number, criminal history, and professional and business licenses, though specifics vary among states. Of the 5337 total ads captured, all but 1161 were for public records; that is, 4176 ads (78% of all ads) were for public records. Figure 15a has a distribution.

4. Ads for public records appeared for most names. Of the 2184 names, 1705 (78%) had at least one ad for public records about the person being searched. Reuters showed ads for 1598 names and Google for 544 names. Figure 16 has details.

5. More Instant Checkmate ads appeared than for any other company. Four companies accounted for more than half of all ads: instantcheckmate.com (1557 of 5337 or 29%), publicrecords.com (861 or 16%), peoplesmart.com (589, 11%), and peoplefinders.com (542, 10%). All ads for these companies sold public records. Ad distribution was different on Google’s site; Instant Checkmate still had the most ads (431 of 864 or 50%), but Intelius, another seller of public records, while not in the top four overall, had the second most ads (127 or 15%) on google.com. Figure 15a lists details.

 


6. Instant Checkmate ads dominated the topmost ad position. On reuters.com, ads for Instant Checkmate listed first in 892 (49%) of the 1826 searches having ads on Reuters. The next closest, publicrecords.com, was well behind, holding the topmost spot only 142 times and most frequently appearing in the second and third positions. Figure 15b summarizes ad positions.

7. Ads for public records appeared more often for black names than white. Regardless of company, proportionately more ads appeared for names having a black-identifying first name. PeopleSmart ads appeared for 270 white and 280 black names, being disproportionately higher for blacks, 41% (280 of 679) to 29%. PublicRecords ads appeared 10 percentage points more often for black (54%) than white (44%) names, and Instant Checkmate ads 2.45 percentage points more often for blacks (72% to 69%). Figure 15d lists findings.

8. Instant Checkmate ads accounted for the largest percentage of ads in most first name categories, except for “Kristen”, “Connor”, and “Tremayne”, which have uncharacteristically fewer ads. Instant Checkmate ads appeared for an average of 70% of all full names in a first name group receiving ads on Reuters (median 76%, standard deviation 0.21, 63 first name groups). For example, Instant Checkmate ads appeared on Reuters for 90-100% of all full names having ads whose name began “Kenya”, “Latoya”, “DeShawn”, “Emily”, “Jay”, “Greg”, “Brendan”, “Brad”, “Leroy”, “Dustin”, “Neil” or “Jill”. In three cases, Instant Checkmate ads fell under 25% despite competition: “Tremayne” (91% PublicRecords, 23% Instant Checkmate), “Connor” (80% PublicRecords, 20% Instant Checkmate), and “Kristen” (58% PublicRecords, 16% Instant Checkmate). Figure 16 shows results by first name group.

9. Instant Checkmate had the most variability in ad copy. Almost all ads for public records included the name of the person in the ad itself, making each ad virtually unique, but beyond personalization, there was little variability in ad templates. Of the 534 PeopleFinders ads appearing on Reuters, all but 11 used the same personalized template, “We found fullname. Current Address, Phone and Age. Find fullname, Anywhere”, where the person’s first and last name replaces fullname. PublicRecords used 5 templates and PeopleSmart 7, but Instant Checkmate used more than all others combined, 18 templates in 1126 ads. Figure 17 displays ad texts and frequencies for all four companies.

10. Only Instant Checkmate ads included the word “arrest”. While Instant Checkmate’s competitors, PeopleSmart, PublicRecords, and PeopleFinders, also sell criminal history information, none of their ads included the word “arrest”. In the 18 templates of Instant Checkmate ads found on Reuters, 8 of them include the word “arrest”; see Figure 17 for details.

 


11. On Reuters, Instant Checkmate ads having “arrest” in their text appeared less often than ads not including the word. Of the 1126 Instant Checkmate ads appearing on Reuters, 544 (48%) include the word “arrest” and 582 (52%) do not. Figure 19 provides details.

12. A greater percentage of Instant Checkmate ads having “arrest” in ad text appeared for black identifying first names than for white first names. Of the 1126 Instant Checkmate ads on Reuters, 488 displayed with black-identifying first names, 291 (60%) of which had “arrest” in ad text. Of the 638 ads displayed with white-identifying names, 308 (48%) had “arrest”. These results are statistically significant, χ²(1) = 14.32, p < 0.001; there is less than a 0.1% probability that these data can be explained by chance. The results also have an adverse impact ratio (40%/52%) of 77%, satisfying the EEOC’s and U.S. Department of Labor’s 80% adverse impact test if this were employment. Figure 15e shows analysis.

13. More white identifying first names top the list of neutral Instant Checkmate ads than do black names. On reuters.com, the highest percentage of neutral ads, where the word “arrest” does not appear in ad text, were ads for “Jill” (77%) and “Emma” (75%), both white-identifying names. Names receiving the highest percentage of ads with “arrest” in the text are “Darnell” (84%), “Jermaine” (81%) and “DeShawn” (86%), all black-identifying first names. Some names appear opposite this pattern. “Dustin”, a white-identifying name, generated “arrest” ads in 81% of searches with that first name, and “Imani”, a black-identifying name, received neutral copy in 75% of “Imani” searches. Figure 19 provides results by first name groups.

14. Instant Checkmate ads appearing on google.com often used different ad text than on Reuters. While the same neutral and arrest ads having dominant appearances on Reuters also appeared frequently on Google, ads on google.com included an additional 10 templates, all using the word “criminal”, a word also suggestive of arrest, or the word “arrest”. These new templates appeared in 89 of the 432 ads (21%). Figure 20 lists the Instant Checkmate ad templates found on google.com.

15. On google.com, a greater percentage of Instant Checkmate ads suggestive of arrest displayed for black associated first names than white. Of the 432 Instant Checkmate ads appearing on google.com, 90% (388) were suggestive of arrest regardless of race. Of the 366 ads that appeared for black-identifying names, 335 (92%) were suggestive of arrest. Far fewer ads displayed for white-identifying names (66 total), and 53 (80%) were suggestive of arrest. These results are statistically significant, χ²(1) = 7.71, p < 0.01; there is less than a 1% probability that these data can be explained by chance. The adverse impact ratio (8%/20%) is 40%, which would satisfy the EEOC adverse impact test if this were employment. Figure 15f shows analysis and Figures 21 and 22 show distributions.
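The Chi-Square statistics and adverse impact ratios in findings 12 and 15 can be approximated directly from the reported counts. The sketch below assumes a Pearson chi-square test on a 2x2 table without continuity correction (the paper does not state which variant was used) and computes the ratio from raw counts rather than from rounded percentages, so the ratios differ slightly from the quoted 77% and 40%. Function names are illustrative.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def adverse_impact_ratio(neutral_a: int, total_a: int,
                         neutral_b: int, total_b: int) -> float:
    """Rate of the favorable outcome (a neutral ad) for group A over group B."""
    return (neutral_a / total_a) / (neutral_b / total_b)


# Rows: black-identifying, white-identifying names; columns: arrest, neutral ads.
reuters = chi_square_2x2(291, 488 - 291, 308, 638 - 308)
google = chi_square_2x2(335, 366 - 335, 53, 66 - 53)

reuters_ratio = adverse_impact_ratio(488 - 291, 488, 638 - 308, 638)
google_ratio = adverse_impact_ratio(366 - 335, 366, 66 - 53, 66)

if __name__ == "__main__":
    print(f"Reuters: chi2={reuters[0]:.2f}, p={reuters[1]:.5f}, ratio={reuters_ratio:.2f}")
    print(f"Google:  chi2={google[0]:.2f}, p={google[1]:.5f}, ratio={google_ratio:.2f}")
```

Both ratios fall below the 0.80 threshold of the adverse impact test, in line with the findings above.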


Here  are  four  supplemental  observations.    

16. A greater percentage of Instant Checkmate ads having the word “arrest” in ad text appeared for black identifying first names than for white identifying first names within professional and netizen subsets. Of the 2184 names in the study, 599, harvested using professional designations, had Instant Checkmate ads on Reuters. Of these, 217 had black associated names, 136 (63%) of which received ads with the word “arrest” in ad text, compared to only 178 (47%) of 382 white associated names, a statistically significant difference (χ²(1) = 14.34, p < 0.001). Netizens also had a higher percentage of black names having ads with the word “arrest”: 155 (57%) of 271 ads for black identifying names compared to 130 (51%) of 256 ads for white identifying names.

17. People  behind  the  names  used  in  this  study  are  diverse.    Examining  source  webpages  for  the  names  reveals  all  kinds  of  people.    Political  figures  include  State  Representatives  Aisha  Braveboy  (“arrest”  ad)  and  Jay  Jacobs  (neutral  ad)  of  Maryland,  Jill  Biden  (neutral  ad),  wife  of  U.S.  Vice  President  Joe  Biden,  and  Claire  McCaskill,  whose  campaign  advertisement  for  the  U.S.  Senate  is  alongside  an  Instant  Checkmate  ad  having  the  word  “arrest”  (Figure  23).    Names  mined  from  academic  websites  include  graduate  students,  researchers,  administrators,  staff,  and  accomplished  academics,  such  as  Amy  Gutmann,  President  of  the  University  of  Pennsylvania  and  Chair  of  the  U.S.  Presidential  Commission  for  the  Study  of  Bioethical  Issues.  Dustin  Hoffman  (“arrest”  ad)  is  among  names  of  celebrities.  A  smorgasbord  of  athletes  appears,  from  local  to  national  fame,  including  numerous  high  school  stars  (assorted  neutral  and  “arrest”  ads).  The  youngest  person  associated  with  the  study  was  a  missing  11-­‐year-­‐old  black  girl.      

18. PeekYou, the primary source of names for Netizens in this study, assigns a score to each name estimating the name’s overall presence on the Web. As expected, celebrities get the highest scores, 10’s and 9’s. Of the 2184 names in the study, 1143 were harvested from PeekYou with scores; only 4 of these had a PeekYou score of 10 and 12 had a score of 9. Dustin Hoffman is a 9. Only 2 ads appeared for these high scoring names. Other than that, an abundance of ads appeared across the remaining spectrum of PeekYou scores. Figure 25 shows distributions of PeekYou scores.

19. Different Instant Checkmate ads appear for the same person. Of the 2184 names, 228 names had Instant Checkmate ads on both Reuters and Google, but only 42 of these names received the same ad. The other 186 (82%) names received different ads across the two sites. Search results on Reuters for the 62 duplicate names that appeared in the study show different ads for 37 (60%) names, the same ad for 7 names, and no ad for 18. At most, three distinct ads appeared across Reuters and Google for the same name; Figure 24 has examples.


 Figure  15.  Summary  statistics  for  (a)  ads  appearing  on  Reuters  and  Google;  (b)  ad  positions  on  Reuters;  (c)  results  by  names;  (d)  ads  for  public  record  appearing  on  Reuters  by  racially  associated  first  name;  (e)  Chi-­‐Square  test  for  Instant  Checkmate  ads  on  Reuters;  and,  (f)  Chi-­‐Square  test  for  Instant  Checkmate  ads  on  Google.        

 

   

 

   


   

Figure  16.  Counts  of  ads  for  public  records  by  first  name.  


instantcheckmate

ID    Count  Ad text
C     382    Located: fullname Information found on fullname fullname found in database.
AC    2      Located: The Person Information found on them Person found in database.
G*    96     We found fullname Search Arrests, Address, Phone, etc. Search records for fullname.
S*    4      We found Them Search Arrests, Address, Phone, etc. Search records for fullname.
I     40     Background of fullname Search Instant Checkmate for the Records of fullname
U     9      Background of Anyone Search Instant Checkmate for the Records of fullname
J     17     fullname's Records 1) Enter Name and State. 2) Access Full Background Checks Instantly.
X     3      Anyone’s Records 1) Enter Name and State. 2) Access Full Background Checks Instantly.
K*    195    fullname: Truth Arrests and Much More. Everything About fullname
O*    67     fullname Truth Looking for fullname? Check fullname's Arrests
L*    176    fullname, Arrested? 1) Enter Name and State. 2) Access Full Background Checks Instantly.
V*    2      Uh Oh, Arrested? 1) Enter Name and State. 2) Access Full Background Checks Instantly.
AD*   1      Found: fullname We have the story on fullname fullname’s arrests, relatives, etc.
AF*   3      Fullname - Found Learn the truth about fullname Check fullname’s arrests & more.
AE    4      Research fullname We have details on fullname. fullname’s full background & info.
M*    55     fullname Located Background Check, Arrest Records, Phone, & Address. Instant, Accurate
N     62     Looking for fullname? Comprehensive Background Report and More on fullname
AI    8      Looking for People in the US? Comprehensive Background Report and More on fullname

peoplesmart

ID    Count  Ad text
A     7      We found: fullname 1) Get firstname's Background Report 2) Contact info & More -try Free!
T     87     We found: fullname 1) Get Aisha's Background Report 2) Current Contact Info - Try Free!
D     105    We found: fullname 1) Contact fullname -Free Info! 2) Current Address, Phone & More.
Q     348    We found: fullname 1) Contact fullname -Free Info! 2) Current Phone, Address & More.
AG    1      We found firstname Get firstname in CA’s Email, Address, Phone, Public Records & More Easy!
AH    1      We found firstname In lastname 1) Get firstname’s Info – Try Now! 2) Current Phone, Address & More.
R     1      Looking For fullname? Get fullname’s Phone, Email Address, Public Records & More Now!

publicrecords

ID    Count  Ad text
B     570    fullname Public Records Found For: fullname. View now.
P     128    fullname Public Records Found For: fullname. Search now.
F     13     Records: fullname Database of all lastname's in the Country. Search now.
Z     2      Fullname Info View Contact Information For Free Quick & Easy Search Results!
H     56     fullname We have Public Records For: fullname. Search Now.

peoplefinders

ID    Count  Ad text
E     523    We found fullname Current Address, Phone and Age. Find fullname, Anywhere.
Y     8      We found fullname 1) Get Phone/ Address/ Age Instantly! 2) Find Anyone, Anywhere for Free.
AA    2      Find fullname Get current and past addresses and phone numbers. Instant results!
AB    1      We Found Them for Free Current Address, Phone and Age. Find fullname Anywhere.

Figure  17.  Templates  for  ads  for  public  records  on  Reuters,  replace  fullname  with  person’s  first  and  last  name.    Letter  identifies  text.  Number  is  number  of  occurrences  of  text.  *arrest  ad.    


   Figure  18.  Distribution  of  ad  templates  in  Figure  17  by  first  name  as  they  appeared  on  Reuters.com.    


 

 Figure  19.  Distributions  of  Instant  Checkmate  ads  having  the  word  “arrest”  or  not  (“neutral”)  appearing  on  Reuters.com.  

 


Ad Templates on Reuters and Google

ID    Count  Ad text
C     33     Located: fullname Information found on fullname fullname found in database.
G*    24     We found fullname Search Arrests, Address, Phone, etc. Search records for fullname.
I     2      Background of fullname Search Instant Checkmate for the Records of fullname
U     1      Background of Anyone Search Instant Checkmate for the Records of fullname
J     6      fullname's Records 1) Enter Name and State. 2) Access Full Background Checks Instantly.
K*    52     fullname: Truth Arrests and Much More. Everything About fullname
O*    7      fullname Truth Looking for fullname? Check fullname's Arrests
L*    200    fullname, Arrested? 1) Enter Name and State. 2) Access Full Background Checks Instantly.
V*    10     Uh Oh, Arrested? 1) Enter Name and State. 2) Access Full Background Checks Instantly.
M*    6      fullname Located Background Check, Arrest Records, Phone, & Address. Instant, Accurate
N     2      Looking for fullname? Comprehensive Background Report and More on fullname

Ad Templates on Google Only

ID    Count  Ad text
AJ*   30     fullname’s Records Did you know fullname’s criminal history is searchable?
AP*   2      fullname’s Records Online? Did you know fullname’s criminal history is searchable?
AM*   9      Anyone’s Records Online? Did you know fullname’s criminal history is searchable?
AK*   2      Records For Anyone in US View Anyone’s Criminal History. Check Criminal Records in Seconds!
AN*   9      Records For fullname View Anyone’s Criminal History. Check Criminal Records in Seconds!
AL*   26     Records For fullname? Find the Truth About fullname View Criminal Records in Seconds.
AQ*   3      Records For People in the US? Find the Truth About fullname View Criminal Records in Seconds.
AO*   6      Find fullname Criminal records, phone, address, & more on fullname
AS*   1      We Found fullname | InstantCheckmate.com Search Arrests, Address, Phone, etc Search records for fullname.
AT*   1      fullname’s Records | InstantCheckmate.com Did you know fullname’s criminal history is searchable?

 

Figure  20.  Templates  for  ads  for  public  records  on  Google.com,  replace  fullname  with  person’s  first  and  last  name.    Letter  identifies  text.  Number  is  number  of  occurrences  of  text.  Asterisk  (*)  denotes  an  ad  suggestive  of  an  arrest  record.        
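As an illustration of how a template becomes a personalized ad and how an ad can be flagged as suggestive of arrest, the sketch below substitutes a name into a template and checks the resulting text for the words “arrest” (in any inflection) or “criminal”, the two cues named in the findings. The function names and the regular expression are assumptions made for this example; the study's own coding of ads may have been done differently.

```python
import re

# Matches "arrest", "arrests", "arrested", ... and "criminal", case-insensitively.
ARREST_PATTERN = re.compile(r"\b(arrest\w*|criminal)\b", re.IGNORECASE)


def personalize(template: str, first: str, last: str) -> str:
    """Replace the placeholders used in Figures 17 and 20 with a person's name."""
    return (template.replace("fullname", f"{first} {last}")
                    .replace("firstname", first)
                    .replace("lastname", last))


def is_suggestive_of_arrest(ad_text: str) -> bool:
    """True if the ad text contains 'arrest*' or 'criminal'."""
    return ARREST_PATTERN.search(ad_text) is not None


if __name__ == "__main__":
    templates = {
        "L": "fullname, Arrested? 1) Enter Name and State. "
             "2) Access Full Background Checks Instantly.",
        "C": "Located: fullname Information found on fullname "
             "fullname found in database.",
    }
    for letter, template in templates.items():
        ad = personalize(template, "Trevon", "Jones")
        print(letter, is_suggestive_of_arrest(ad), ad)
```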


 

   Figure  21.  Distribution  of  ad  templates  in  Figure  20  by  first  name  as  they  appeared  on  Google.com.  


 

 Figure  22.  Distributions  of  Instant  Checkmate  ads  having  the  word  “arrest”  or  not  (“neutral”)  appearing  on  Google.com.  

   


 

 

 

   

Figure  23.  Example  ads  displayed  in  response  to  search  of  “Claire  McCaskill”  on  Reuters.com  (right),  Claire  McCaskill,  U.S.  Senator  from  Missouri  (left).    An  ad  having  the  word  “arrest”  appears  below  an  ad  for  her  U.S.  Senate  campaign.          

   

 

   

 

   

 

   

 

   

 

   

     

 

   

Figure  24.  Examples  of  different  ad  copy  appearing  for  searches  of  “Latonya  Evans”  (left)  and  “Latisha  Smith”  (right).      


 

 Figure  25.  Distributions  of  Netizen  names  and  ad  delivery  by  PeekYou  scores  for  those  names  having  PeekYou  scores,  which  are  values  PeekYou  assigns  to  names  as  an  estimate  of  the  person’s  presence  on  the  Web.  

   


Conclusion and Future Work

This study raises more questions than it answers. Here is the one answer provided. Our hypothesis states that no difference exists in the delivery of ads suggestive of an arrest record based on searches of racially associated names. Our findings reject this hypothesis. A greater percentage of ads having “arrest” in ad text appeared for black identifying first names than for white identifying first names in searches on Reuters.com, on Google.com, and in subsets of the sample. Results of Chi-Square tests on these patterns were statistically significant. On Reuters.com, a host of Google AdSense ads, a black-identifying name was 25% more likely to get an ad suggestive of an arrest record, χ²(1) = 14.32, p < 0.001; there is less than a 0.1% probability that these data can be explained by chance.

Why is this discrimination occurring? Is this Instant Checkmate’s, Google’s, or society’s fault? Answering these questions is beyond the scope of this writing, but navigating the terrain requires further information about the inner workings of Google AdSense. Google understands that an advertiser may not know which ad copy will work best, so an advertiser may give multiple templates for the same search string and the “Google algorithm” learns over time which ad text gets the most clicks from viewers of the ad. It does this by assigning weights (or probabilities) based on the click history of each ad copy. At first all possible ad copies are weighted the same; they are all equally likely to produce a click. Over time, as people tend to click one version of ad text over others, the weights change, so the ad text getting the most clicks eventually displays more frequently. This approach aligns the financial interests of Google, as the ad deliverer, with the advertiser.

Figure 24 provides examples in which Instant Checkmate provided multiple ad templates for searches of “Latonya Evans” and “Latisha Smith”. Did Instant Checkmate provide ad templates suggestive of arrest disproportionately to black-identifying names?³ Or, did Instant Checkmate provide roughly the same templates evenly across racially associated names but society clicked ads suggestive of arrest more often for black identifying names? Google uses cloud-caching strategies to deliver ads quickly. Might these strategies bias ad delivery towards ad templates previously loaded in the cloud cache? Is there a combinatorial effect?

This paper is a start and more research is needed; however, online advertising is dynamic and easy to change. In order to preserve research opportunities, prior to any announcement of this work, I captured additional results for 50 hits on 2184 names across 30 websites serving Google Ads to learn the underlying distributions of ad occurrences per name. While analyzing these data may prove illuminating, in the end, the basic message presented in this writing does not change. There is discrimination in delivery of these ads.
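The click-weighting behavior described above, in which all ad copies start with equal weight and the copy that attracts more clicks comes to display more often, can be illustrated with a toy simulation. This is not Google's algorithm, whose details are not public in this paper; the smoothing rule, the function names and the click rates below are all assumptions chosen only to show the feedback effect.

```python
import random
from typing import Dict, Tuple

Stats = Dict[str, Tuple[int, int]]  # template -> (clicks, impressions)


def serving_weights(stats: Stats,
                    prior_clicks: float = 1.0,
                    prior_impressions: float = 2.0) -> Dict[str, float]:
    """Turn click history into display probabilities.

    Each template's smoothed click-through rate is normalized so the weights
    sum to 1. With no history, every template gets the same weight.
    """
    ctr = {t: (clicks + prior_clicks) / (shown + prior_impressions)
           for t, (clicks, shown) in stats.items()}
    total = sum(ctr.values())
    return {t: value / total for t, value in ctr.items()}


def simulate(true_click_rates: Dict[str, float],
             impressions: int, seed: int = 0) -> Stats:
    """Serve ads by current weight and update history with simulated clicks."""
    rng = random.Random(seed)
    stats: Stats = {t: (0, 0) for t in true_click_rates}
    for _ in range(impressions):
        weights = serving_weights(stats)
        templates = list(weights)
        shown = rng.choices(templates, weights=[weights[t] for t in templates])[0]
        clicks, count = stats[shown]
        clicked = rng.random() < true_click_rates[shown]
        stats[shown] = (clicks + int(clicked), count + 1)
    return stats


if __name__ == "__main__":
    # Hypothetical viewer behavior: the "arrest" copy is clicked twice as often.
    history = simulate({"arrest": 0.10, "neutral": 0.05}, impressions=20000)
    print(history)
    print(serving_weights(history))
```

In this toy setting, a difference in how often viewers click each copy is enough to shift delivery toward one template even though the advertiser supplied both templates equally, which is one of the possibilities raised in the questions that follow the description of the mechanism.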

3 During a conference call with the founders of Instant Checkmate and their lawyer on December 21, 2012, the company’s representatives asserted that Instant Checkmate gave the same ad text to Google for groups of last names (not first names).

In the broader picture, technology can do more to thwart discriminatory effects and harmonize with societal norms. Ads responding to name searches appear in a specific information context and technology controls that context. A reader enters a name then views web content and news stories specific to that name. Dynamic ads are a part of that context. Alongside news stories about high school athletes and children can be ads bearing the child’s name and suggesting arrest. This seems concerning on many levels. For example, even if the child has an arrest record, juvenile records are typically exempt from public record disclosure. The juxtaposition of ads also provides context. Claire McCaskill provides an example where an ad suggestive of arrest appears alongside an ad for her U.S. Senate campaign.

Search and ad technology already reason extensively about context and appropriateness when deciding the best content to deliver to the reader [13]. Many factors are often known about the reader at the time of ad delivery, e.g., browsing history, geographical location, and shopping behavior [14]. With some expansion, technology could additionally reason about the social and legal implications of content and context. For example, well-known computer scientist Cynthia Dwork and her colleagues have already been working on algorithms that assure racial fairness [15]. This area seems ripe for further research and development.
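
As a check on the significance level reported in this conclusion for Reuters.com (a chi-square statistic of 14.32 with 1 degree of freedom, p < 0.001), the tail probability can be recomputed from the statistic alone. The sketch below is an illustration, not the analysis code used in the study; the function name `chi2_sf_1dof` is an assumption. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi2_sf_1dof(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # With 1 degree of freedom, X = Z**2 for a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistic reported in the text for Reuters.com.
p_value = chi2_sf_1dof(14.32)
print(p_value)
print(p_value < 0.001)
```

The computed probability falls below the 0.001 threshold, consistent with the reported result.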
Acknowledgements

The author thanks Ben Edelman, Claudine Gay, Gary King, Annie Lewis, and weekly Topics in Privacy participants (David Abrams, Micah Altman, Merce Crosas, Bob Gelman, Harry Lewis, Joe Pato, and Salil Vadhan) for discussions, the Institute for Quantitative Social Science, the Department of Government, Dean Smith, Jim Waldo and my students at Harvard for an awesome work environment, Adam Tanner for first suspecting a pattern, and Diane Lopez and Matthew Fox in Harvard’s Office of the General Counsel for making publication possible in the face of legal threats. Data from this study are available at foreverdata.org and the IQSS Dataverse Network. Supported in part by NSF grant CNS-1237235 and a gift from Google, Inc.

References

1  Harris P and Keller K. Ex-offenders need not apply: the criminal background check in hiring decisions. Journal of Contemporary Criminal Justice. February 2005 vol. 21 no. 1 6-30.

2  Consideration of Arrest and Conviction Records in Employment Decisions Under Title VII of the Civil Rights Act of 1964. U.S. Equal Employment Opportunity Commission. Washington, DC. 915.002. 4/25/2012. http://www.eeoc.gov/laws/guidance/arrest_conviction.cfm (as of January 9, 2013).

3  Uniform Guidelines on Employee Selection Procedures. U.S. Equal Employment Opportunity Commission. Washington, DC.

4  Racism in America and how to combat it. U.S. Commission on Civil Rights. Washington, DC. 1970.

5  Panel on Methods for Assessing Discrimination, National Research Council. Measuring Racial Discrimination. National Academy Press. Washington, DC. 2004.

6  Barker R. The social work dictionary (5th ed.). Washington, DC: NASW Press. 2003.

7  Google AdSense. Google. http://google.com/adsense (as of January 9, 2013).

8  Google Announces First Quarter 2011 Financial Results. Google. http://investor.google.com/earnings/2011/Q1_google_earnings.html (as of January 9, 2013).

9  Bertrand M and Mullainathan S. Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. NBER Working Paper No. 9873. July 2003. http://www.nber.org/papers/w9873 (as of January 9, 2013).

10  Levitt S and Dubner S. Freakonomics: A rogue economist explores the hidden side of everything. New York: William Morrow, 2005.

11  Fryer R and Levitt S. The Causes and Consequences of Distinctively Black Names. The Quarterly Journal of Economics. Vol 59 (3) August 2004. http://pricetheory.uchicago.edu/levitt/Papers/FryerLevitt2004.pdf (as of January 9, 2013).

12  WebShot Command Line Server Edition. Version 1.9.1.1. http://www.websitescreenshots.com/ (as of January 9, 2013).

13  Yuan S, Zainal Abidin A, et al. Internet Advertising: An Interplay among Advertisers, Online Publishers, Ad Exchanges and Web Users. arXiv:1206.1754 [cs.IR]. http://arxiv.org/abs/1206.1754 (as of January 9, 2013).

14  Steel E and Angwin J. On the web’s cutting edge, anonymity in name only. The Wall Street Journal, 2010.

15  Dwork C, Hardt M, et al. Fairness through Awareness. arXiv:1104.3913 [cs.CC]. http://arxiv.org/abs/1104.3913 (as of January 9, 2013).