
Spencer: Privacy and Predictive Analytics in E-commerce


Privacy and Predictive Analytics in E-Commerce

SHAUN B. SPENCER*

INTRODUCTION

This Article discusses the implications of predictive analytics for consumer privacy in e-commerce and surveys potential regulatory responses. Part I introduces predictive analytics and illustrates its potential uses in e-commerce. Predictive analytics helps merchants operate efficiently and maximize profits, but also risks denying consumers important commercial benefits. Part II examines how predictive analytics harms consumer privacy. The prevailing theoretical accounts define privacy as the ability to control what others know about you and recognize privacy’s role in promoting personal autonomy and dignity. Predictive analytics harms privacy as control because individuals cannot know what the data they share will ultimately predict. In addition, predictive analytics harms consumer autonomy and dignity because it deprives them of significant commercial benefits based on secret formulas, and risks automating societal discrimination. Finally, Part III examines potential regulatory responses to the harms of predictive surveillance. Any regulatory response must be measured to avoid diminishing real commercial benefits and stifling innovation. In addition, regulation must be tailored to the type and degree of privacy harm posed by the varying uses of predictive analytics.

*Assistant Professor and Director of Legal Skills, University of Massachusetts School of Law-Dartmouth. I am grateful to the panelists and participants at the New England Law Review’s Spring 2015 Symposium, What Stays in Vegas, and at the University of Michigan Technology and Communications Law Review’s Spring 2015 Symposium, Privacy, Technology, and the Law, where I presented earlier versions of this article. Parts I and II of this article will appear in Predictive Analytics, Consumer Privacy, and E-Commerce, in RESEARCH HANDBOOK ON ELECTRONIC COMMERCE LAW (John A. Rothchild ed., Elgar Publishing, forthcoming 2015).


630 New England Law Review v. 49 | 629

I. Predictive Analytics and E-Commerce

A. Overview of Predictive Analytics

Predictive analytics predicts future behavior based on the patterns of past behavior.1 Although predictive analytics uses statistical techniques, it departs from traditional statistical analysis in several important ways. First, predictive analytics usually analyzes vast quantities of data rather than carefully drawn samples. In contrast, traditional statistical analysis has always relied on sophisticated techniques for drawing representative samples and inferring population characteristics from those samples.2 With the data explosion over the last few decades, however, researchers can use predictive analytics to observe the entire population and find subtle patterns that help predict future behavior.3

Second, predictive analytics is less concerned about causation than traditional statistical methods. By using predictive analytics to study large datasets with many variables, analysts can build extremely accurate predictive models based on strong correlations in the data, regardless of why those correlations exist.4 This technique can reveal correlations one might not have imagined if one were looking for causation.5

For example, predictive analytics can generate models that predict when a given mechanical device, like a motor or a bridge, will fail. The models are based on vast amounts of data from sensors monitoring patterns in the data that the devices emit, such as heat, vibration, stress, and sound. It is far less important to know why the device may fail than it is to know that it will probably fail soon.6 Eric Siegel’s Predictive Analytics7 gives us many examples of what predictive analytics can show us, including: “[s]uicide bombers do not buy life insurance”;8 crime rises after upset losses in college football;9 and phone card sales predict massacres in the Congo.10 In each of these cases, researchers use past correlations to predict future behavior.

1 ERIC SIEGEL, PREDICTIVE ANALYTICS: THE POWER TO PREDICT WHO WILL CLICK, BUY, LIE, OR DIE 80 (2013).

2 See VIKTOR MAYER-SCHÖNBERGER & KENNETH CUKIER, BIG DATA: A REVOLUTION THAT WILL TRANSFORM HOW WE LIVE, WORK, AND THINK 24–25 (2013).

3 Id. at 6.

4 See id. at 13.

5 SIEGEL, supra note 1, at 88.

6 See MAYER-SCHÖNBERGER & CUKIER, supra note 2, at 58–59.

7 SIEGEL, supra note 1.

8 Id. at 85.

9 Id. at 86.

10 Id.

B. Using Predictive Analytics in E-Commerce

Merchants use predictive analytics to identify consumers who share a condition of interest to the merchant, also known as the “target variable.”11 That condition may be the likelihood of a commercial behavior, like clicking on an online ad, purchasing a product, defecting to a competitor, or defaulting on a loan.12 Or the condition may be the likelihood of non-commercial behaviors like dying early or getting into a car accident.13 Or the condition may not involve future behavior. It may instead be a specific characteristic that is of interest to the merchant, such as whether the consumer is pregnant14 or has a particular medical condition.15

Predictive analytics relies mainly on secondary evidence of these conditions of interest, rather than primary evidence. Primary evidence, for example, might take the form of a consumer’s answer to survey questions about the consumer’s preferences or characteristics. Secondary evidence, in contrast, appears in many small bits of data about the consumer’s past behavior.16 This behavioral evidence more accurately reflects consumers’ attitudes and preferences than self-reports.17

11 VIJAY KOTU & BALA DESHPANDE, PREDICTIVE ANALYTICS AND DATA MINING: CONCEPTS & PRACTICE WITH RAPIDMINER 13 (2015); Testimony of Solon Barocas, FTC Workshop: Big Data: A Tool for Inclusion or Exclusion, at 19 (FTC Sept. 15, 2014), available at https://www.ftc.gov/system/files/documents/public_events/313371/bigdata-transcript-9_15_14.pdf.

12 See KOTU & DESHPANDE, supra note 11, at xi (discussing prediction of customer defection to a competitor); SIEGEL, supra note 1, at 83 (discussing prediction of loan repayment risk); FEDERAL TRADE COMMISSION, OFFICE OF THE SECRETARY, DIRECT MARKETING ASSOCIATION PUBLIC COMMENT, SPRING PRIVACY SERIES: ALTERNATIVE SCORING PRODUCTS 1, 4–5 (Apr. 17, 2014), available at https://www.ftc.gov/policy/public-comments/2014/04/17/comment-00011 (“predictive analytics are used to predict a consumer’s likelihood of being interested in a product or service” and to “tailor[] marketing materials to meet the preferences of consumers”).

13 SIEGEL, supra note 1, at 83 (discussing automobile insurer’s prediction of bodily injury based on vehicle characteristics); id. at 64–65 (discussing health insurance companies’ predictions of policyholder mortality).

14 See Charles Duhigg, How Companies Learn Your Secrets, N.Y. TIMES, Feb. 16, 2012, available at http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html (discussing Target’s pregnancy prediction score).

15 See Ryen W. White et al., Web-scale Pharmacovigilance: Listening to Signals from the Crowd, 20 J. AM. MED. INFORM. ASSOC. 404–08 (2012), available at http://jamia.oxfordjournals.org/content/jaminfo/20/3/404.full.pdf (describing how web searches provide evidence of an adverse interaction between two drugs, the antidepressant Paroxetine and the cholesterol drug Pravastatin).

16 See KOTU & DESHPANDE, supra note 11, at 13.

The familiar story of Target’s “pregnancy prediction score” illustrates the predictive analytics process.18 First, the merchant identifies a group of consumers who possess the condition of interest to create a training set.19 Target wanted to know which customers were pregnant because the changes in habit formation associated with pregnancy created a significant opportunity to secure future purchases.20 Target already had a set of customers with a known condition of interest — customers who had signed up for Target’s online baby shower registry and shared their due date.21 So Target would use a subset of these as its training set.22

Next, the merchant uses the many variables in its training set to develop a predictive model.23 Target’s training set would be a subset of its baby shower registrants. Target would use these customers’ detailed purchase histories to develop a model that weighs many different types of purchases in order to generate a “pregnancy score” to predict whether a given customer is pregnant.24 Eventually, Target settled on a model that included and weighed about twenty-five different products to produce a “pregnancy prediction” score.25

Third, the merchant tests and refines the model using a different subset of the customers with known conditions of interest.26 For example, Target would use another subset of its baby shower registrants to refine and perfect its pregnancy score model, along with other customers who showed no evidence of pregnancy.27

17 Lior Jacob Strahilevitz, Toward a Positive Theory of Privacy Law, 126 HARV. L. REV. 2010, 2023 (2013).

18 Duhigg, supra note 14.

19 See generally KOTU & DESHPANDE, supra note 11, at 17–19, 27–28 (discussing implementation and usage of data mining processes); see Testimony of Solon Barocas, supra note 11, at 20–21.

20 Duhigg, supra note 14. When consumers change their routines they are susceptible to forming new shopping habits. Merchants, therefore, see new parents as a valuable customer segment because landing them while their routines are in flux may produce substantial sales over the long term. As a Target statistician explained, if Target could identify pregnant consumers in their second trimester, “there’s a good chance we could capture them for years.” Id.

21 Id.

22 Id.

23 See KOTU & DESHPANDE, supra note 11, at 27–28.

24 Duhigg, supra note 14.

25 Id.

26 KOTU & DESHPANDE, supra note 11, at 27–28.

27 See Duhigg, supra note 14. Although Duhigg does not mention testing the algorithm on

Finally, once the model is optimized, the merchant applies the final model to current prospects or customers.28 A Target employee illustrated how Target might use this prediction with regard to a hypothetical customer. Based on the customer’s purchase history, Target’s algorithm might assign an 87% chance that she is pregnant and due in August. Based on other data about her shopping habits, Target may also know the most likely marketing approaches to draw her to a Target store or website. For example, email coupons may trigger her to purchase online, whereas direct mail that arrives on a Friday may be likely to get her to a store over the weekend. By applying those techniques to the tens of thousands of consumers with high pregnancy prediction scores, Target hoped to reshape their shopping habits to generate purchases at Target for years to come.29
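The four steps described above can be sketched in a few lines of code. The following is a minimal illustration only, not Target’s actual model: the product names, the purchase probabilities, and the choice of logistic regression as the modeling technique are all assumptions made for the example, and the customer data are synthetic.

```python
import math
import random

# Hypothetical product categories; the source does not list the roughly
# twenty-five products in Target's actual model.
PRODUCTS = ["unscented_lotion", "vitamin_supplements", "cotton_balls", "snack_chips"]


def make_customer(rng, pregnant):
    """Synthetic purchase history: 1 if the product was bought, else 0."""
    # Assumed purchase probabilities, invented purely for illustration.
    probs = [0.7, 0.6, 0.5, 0.4] if pregnant else [0.2, 0.2, 0.2, 0.4]
    features = [1 if rng.random() < p else 0 for p in probs]
    return features, int(pregnant)


def predict(weights, bias, features):
    """Weigh each purchase and squash the total into a 0-to-1 score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=200, lr=0.1):
    """Fit the weights by plain stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(PRODUCTS), 0.0
    for _ in range(epochs):
        for features, label in rows:
            error = predict(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def accuracy(weights, bias, rows):
    """Share of rows where a score of 0.5 or more matches the known label."""
    hits = sum((predict(weights, bias, f) >= 0.5) == bool(y) for f, y in rows)
    return hits / len(rows)


rng = random.Random(0)

# Step 1: customers whose condition is known (registrants and non-registrants).
labeled = [make_customer(rng, pregnant=(i % 2 == 0)) for i in range(400)]
rng.shuffle(labeled)
training_set, test_set = labeled[:300], labeled[300:]

# Step 2: develop the model from the training set.
weights, bias = train(training_set)

# Step 3: test the model on a different subset with known conditions.
test_accuracy = accuracy(weights, bias, test_set)

# Step 4: apply the model to a customer whose condition is unknown.
new_customer = [1, 1, 1, 0]
score = predict(weights, bias, new_customer)
print(f"test accuracy: {test_accuracy:.2f}, prediction score: {score:.2f}")
```

The model never asks the customer anything. It infers the condition entirely from purchases, which is the sense in which predictive analytics relies on secondary rather than primary evidence.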

Target’s implementation involved sending ads for maternity and baby products to consumers with high pregnancy scores.30 Target’s model proved to be too accurate for its own good. One Minnesota father stormed into his local Target complaining that his teenage daughter was receiving maternity ads.31 The puzzled store manager could only apologize. A week later, however, the father called to apologize, saying that there had “been some activities in my house I haven’t been completely aware of. She’s due in August.”32 Target had learned that the daughter was pregnant before her father.

Predictive analytics has myriad uses in e-commerce, but they can be grouped into four common categories: (1) targeted advertising; (2) price discrimination; (3) customer segmentation; and (4) eligibility determinations for particular financial and insurance products.33 These categories are not mutually exclusive; they represent points along a continuum. For example, price discrimination that quotes some consumers impossibly high auto insurance rates can effectively render those consumers ineligible for auto insurance.

customers who had not signed up for the baby registry, Target had to include non-pregnant customers in the test set to determine whether the model could predict the likelihood of pregnancy. Id.

28 See KOTU & DESHPANDE, supra note 11, at 32.

29 Id.

30 Duhigg, supra note 14.

31 Id.

32 Id.

33 See CLAUDIA PERLICH, FED. TRADE COMM’N, SPRING PRIVACY SERIES: ALTERNATIVE SCORING PRODUCTS 11–14 (Mar. 19, 2014), available at https://www.ftc.gov/system/files/documents/public_events/182261/alternative-scoring-products_final-transcript.pdf (discussing targeted advertising); FED. TRADE COMM’N, COMMENTS OF THE SOFTWARE & INFORMATION INDUSTRY ASSOCIATION ON THE FTC WORKSHOP ON ALTERNATIVE SCORING PRODUCTS 9–10 (Apr. 17, 2014), available at https://www.ftc.gov/policy/public-comments/2014/04/17/comment-00010 (discussing price discrimination); CATALYSIS, BUILDING BEST PRACTICE CUSTOMER SEGMENTATION USING PREDICTIVE ANALYTICS (Feb. 10, 2012), available at http://media.catalysis.com/prod/resources/files/articles/pdfs/Building%20Best%20practice%20customer%20segmentation.pdf (discussing use of predictive analytics in customer segmentation); PAM DIXON & ROBERT GELLMAN, THE SCORING OF AMERICA: HOW SECRET CONSUMER SCORES THREATEN YOUR PRIVACY AND YOUR FUTURE 8–9 (2014), available at http://www.worldprivacyforum.org/wp-content/uploads/2014/04/WPF_Scoring_of_America_April2014_fs.pdf (discussing eligibility determinations based on “consumer scores”).

1. Online Behavioral Advertising

Online behavioral advertising means “the tracking of a consumer’s online activities over time — including the searches the consumer has conducted, the web pages visited, and the content viewed — in order to deliver advertising targeted to the individual consumer’s interests.”34 For example, a consumer might search a travel website for flights to New York City, but not buy tickets. The consumer might then visit a local newspaper to read about the Washington Nationals baseball team. On the newspaper’s website, the consumer would see a display ad for flights from Washington, D.C. to New York City.35

The behind-the-scenes process that led to the display ad involved the relationships between the travel website, the newspaper website, and an intermediary called a network advertiser. The travel website had an arrangement with a network advertiser (Doubleclick, for example), so when the consumer visited the travel website, the network advertiser placed a cookie on the consumer’s computer. This cookie tracks aspects of the user’s online behavior such as websites visited and includes a unique identifier assigned by the network advertiser. The newspaper website also had an arrangement with the network advertiser to place an ad on its website. So when the consumer visited the newspaper website, the network advertiser’s cookie identified the user as someone potentially interested in flying to New York, and displayed an ad consistent with that interest.36

34 FEDERAL TRADE COMMISSION STAFF REPORT: SELF-REGULATORY PRINCIPLES FOR ONLINE BEHAVIORAL ADVERTISING 46 (2009), available at https://www.ftc.gov/sites/default/files/documents/reports/federal-trade-commission-staff-report-self-regulatory-principles-online-behavioral-advertising/p085400behavadreport.pdf.

35 Id. at 3.

36 Id. Contrast targeted advertising with “contextual advertising,” in which advertisers place ads based on the content of the page, and therefore on inferences about the types of consumers who will be reading that page. Jonathan R. Mayer & John C. Mitchell, Third-Party Web Tracking: Policy & Technology, IEEE SYMP. ON SECURITY & PRIVACY (2012), available at

This process became more complex with the emergence of ad exchanges. Ad exchanges emerged in the mid-2000s as a way for websites to sell the “remnant” ad spaces they could not sell through advertising networks.37 For each ad in its inventory, an ad exchange takes bids in real time from many different advertising networks.38 Ad exchanges, however, did not change the tailoring of ad placement to consumers’ online behavior.

Predictive analytics can make online behavioral advertising more efficient by showing ads to consumers who are more likely to click on them. For example, an education “information portal” targeted at high school seniors used predictive analytics to increase the click-through rate for its ads.39 The portal hired a predictive analytics firm to analyze millions of instances where consumers clicked or did not click different ads. The firm then generated many different models to decide which consumers’ behavioral profiles make them more likely to click which ads. Using these models increased the portal’s response rate by 25% over its existing online advertising.40
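The idea of one click model per ad can be illustrated with a short sketch. Everything in it is hypothetical: the ad names, the behavioral traits, the weights, and the simple additive scoring rule are assumptions made for illustration, not details of the portal’s actual models.

```python
# Hypothetical per-ad models: each maps a behavioral trait to the assumed
# increase in click probability when that trait appears in a profile.
AD_MODELS = {
    "scholarship search": {"viewed_financial_aid": 0.04, "viewed_sat_prep": 0.01},
    "campus tours": {"viewed_college_rankings": 0.03},
}


def click_probability(profile, ad_weights, baseline=0.01):
    """Assumed baseline click rate plus a bump for each matching trait."""
    bump = sum(weight for trait, weight in ad_weights.items() if trait in profile)
    return min(1.0, baseline + bump)


def best_ad(profile):
    """Show the ad whose model predicts the highest chance of a click."""
    return max(AD_MODELS, key=lambda ad: click_probability(profile, AD_MODELS[ad]))


profile = {"viewed_financial_aid", "viewed_sat_prep"}
chosen = best_ad(profile)
print(chosen)
```

In practice the weights would be estimated from the recorded clicks and non-clicks rather than set by hand.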

2. Price Discrimination

Price discrimination involves merchants selling “the same or similar products at different prices in different markets, where such price differentials are not based on differences in marginal cost.”41 Familiar examples of price discrimination include airlines selling seats on the same flight to different passengers at different rates and theaters offering senior discounts.42

https://jonathanmayer.org/papers_data/trackingsurvey12.pdf; Blase Ur et al., Smart, Useful, Scary, Creepy: Perceptions of Online Behavioral Advertising, SYMP. ON USABLE PRIVACY AND SECURITY (SOUPS), July 11–13, 2012, available at https://www.andrew.cmu.edu/user/pgl/soups2012.pdf.

37 Mayer & Mitchell, supra note 36, at 419.

38 Id.

39 Prediction Impact, Case Study: How Predictive Analytics Generates $1 Million Increased Revenue, PREDICTIVE ANALYTICS WORLD, http://www.predictiveanalyticsworld.com/casestudy.php (last visited Sept. 3, 2015).

40 Id.

41 NICK WILKINSON, MANAGERIAL ECONOMICS: A PROBLEM-SOLVING APPROACH 396 (2005), available at http://www.railassociation.ir/Download/Article?books?Managerial%20Economics-%20A%20Problem%20Solving%20Approach.pdf.

42 See id.

Predictive analytics, however, allows merchants to make dynamic, real-time use of price discrimination in e-commerce. The Wall Street Journal reported on companies:

consistently adjusting prices and displaying different product offers based on a range of characteristics that could be discovered about the user. Office Depot, for example, told the Journal that it uses “customers’ browsing history and geolocation” to vary the offers and products it displays to a visitor to its site.43

Similarly, Capital One Financial used “personalization technology to decide which credit cards to show first-time visitors to its website.”44 The Journal’s follow-up testing showed that users deemed to have “excellent credit” saw different cards than those with “average credit.”45

Discrimination need not be limited to price. A major cable company worked with data broker eBureau to “determine the appropriate equipment and service packages to sell to each new customer.”46 The company developed a predictive model that “identified and segmented the risk for every online lead, ultimately scoring and rank ordering each customer for appropriate level of service and equipment.”47

3. Customer Segmentation

Customer segmentation groups people or organizations with similar characteristics such as demographics, purchase histories, or preferences.48 Segmenting customers improves merchants’ marketing and customer retention efforts by helping them understand their customers better.49 Predictive analytics augments the segmentation process to reveal more subtle and granular segments than traditional approaches.50

43 Jennifer Valentino-Devries et al., Websites Vary Prices, Deals Based on Users’ Information, WALL STREET J., Dec. 24, 2012, available at http://www.wsj.com/articles/SB10001424127887323777204578189391813881534 (referencing Staples, Discover Financial Services, Rosetta Stone Inc. and Home Depot Inc.).

44 Id.

45 Id.

46 U.S. PIRG & CENTER FOR DIGITAL DEMOCRACY, PROTECTING CONSUMER PRIVACY AND WELFARE IN THE ERA OF “E-SCORES,” REAL-TIME BIG-DATA “LEAD GENERATION” PRACTICES AND OTHER SCORING/PROFILE APPLICATIONS 11, COMMENTS SUBMITTED TO FTC WORKSHOP: ALTERNATIVE SCORING PRODUCTS (2014), available at https://www.ftc.gov/policy/public-comments/comment-00006-75 (citing eBureau, Fortune 500 and Top 5 Cable Operator, http://www.ebureau.com/sites/all/files/file/ebureau_successstory_top5cable_operator.pdf).

47 Id.

48 David Vergara, Database: Get a Little Closer: Use Effective Segmentation with Predictive Analytics to Personalize Customer Relationships (May 2009), http://www.targetmarketingmag.com/article/use-effective-segmentation-predictive-analytics-personalize-customer-relationships-406169/1.

49 Id.

Companies often use customer segmentation to develop “churn scores” identifying the risk that a customer will defect to a competitor.51 For example, a cellular phone carrier may use predictive analytics to identify the customers who are most likely to switch carriers within a few months. The same company may then use predictive analytics to identify which potential defectors offer sufficient long-term value to merit spending resources to retain them. Finally, predictive analytics may help that company determine what offers are most likely to persuade the valuable customers to stay.52
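The first two steps of this churn analysis can be sketched as follows. The sketch assumes that a churn model and a customer-value model have already produced a churn probability and a projected long-term value for each customer; the threshold, the cost of a retention offer, and the customer figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    churn_probability: float  # output of a churn model (assumed given)
    long_term_value: float    # projected future revenue in dollars (assumed given)


def retention_targets(customers, churn_threshold=0.5, offer_cost=50.0):
    """Return likely defectors whose expected loss exceeds the offer's cost."""
    targets = []
    for c in customers:
        likely_to_defect = c.churn_probability >= churn_threshold
        expected_loss = c.churn_probability * c.long_term_value
        if likely_to_defect and expected_loss > offer_cost:
            targets.append(c.name)
    return targets


customers = [
    Customer("A", 0.8, 1200.0),  # likely defector, high value
    Customer("B", 0.9, 40.0),    # likely defector, low value
    Customer("C", 0.1, 5000.0),  # valuable but unlikely to defect
]
targets = retention_targets(customers)
print(targets)
```

Only customer A is selected: B is likely to leave but is not worth the cost of an offer, and C is valuable but not at risk.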

Predictive analytics can also be used to decide what level of service to deliver to each customer. For example, a merchant’s call center can connect high-value customers to the best customer service agents, while routing lower-value customers to an “outsourced overflow call center.”53

4. Eligibility Determinations

Predictive analytics can also help merchants decide whether to do business at all with certain consumers.54 Many consumers are familiar with credit scores like the FICO Score widely used to determine loan eligibility.55 However, the proliferation of data in e-commerce allows merchants to create and use consumer scores in many other contexts. Merchants may refuse to do business with some consumers because of a risk of fraud or default.56 Many of these consumer scores are not subject to the Fair Credit Reporting Act. For example, Experian offers a “Consumer View Profitability Score” designed to “predict, identify, and target prospect in households likely to be profitable and pay debt.”57 The database includes information on “235 million consumers and 117 million households from hundreds of data sources.” Scores like this can serve as proxies for credit risks. However, because they assess households rather than individuals, they are not governed by the Fair Credit Reporting Act.58

50 Id.

51 DIXON & GELLMAN, supra note 33, at 51–52.

52 See IBM SOFTWARE, REAL WORLD PREDICTIVE ANALYTICS: PUTTING ANALYSIS INTO ACTION FOR VISIBLE RESULTS 6–8 (2010), available at http://www.revelwood.com/uploads/whitepapers/PA/WP_Real-World-Predictive-Analytics_IBM_SPSS.pdf.

53 Natasha Singer, Secret E-Scores Chart Consumers’ Buying Power, N.Y. TIMES (Aug. 12, 2012), http://www.nytimes.com/2012/08/19/business/electronic-scores-rank-consumers-by-potential-value.html.

54 DIXON & GELLMAN, supra note 33, at 19–21.

55 See FICO Score, Critical in Billions of Lending Decisions, FICO, http://www.fico.com/en/products/fico-score (last visited Sept. 3, 2015).

56 DIXON & GELLMAN, supra note 33, at 53–55; IBM SOFTWARE, REAL WORLD PREDICTIVE ANALYTICS: PUTTING ANALYSIS INTO ACTION FOR VISIBLE RESULTS 2 (2010), available at http://www.revelwood.com/uploads/whitepapers/PA/WP_Real-World-Predictive-Analytics_IBM_SPSS.pdf.

57 DIXON & GELLMAN, supra note 33, at 46.

Merchants may also make de facto eligibility determinations by not targeting prospects who may be credit risks. Their risk assessment may include traditional credit scores or include other variables including “the history of which customers proved to be good or bad risks in this business.”59 Consumers with scores above a certain risk threshold will be excluded from marketing outreach, while some risky prospects may be targeted if their potential value is high enough.60
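The exclusion rule described above amounts to a threshold with a value-based exception. The sketch below is a hypothetical illustration: the function name, the risk threshold, and the value cutoff are assumptions, not figures drawn from any merchant’s practice.

```python
def include_in_outreach(risk_score, potential_value,
                        risk_threshold=0.6, value_override=10_000.0):
    """Decide whether a prospect receives marketing at all.

    risk_score: model output between 0 and 1, higher meaning riskier (assumed).
    potential_value: projected revenue in dollars (assumed).
    """
    if risk_score <= risk_threshold:
        return True  # acceptable risk: market to the prospect
    # Risky prospects are excluded unless their potential value is high enough.
    return potential_value >= value_override


print(include_in_outreach(0.3, 500.0))     # low risk
print(include_in_outreach(0.8, 500.0))     # high risk, low value
print(include_in_outreach(0.8, 25_000.0))  # high risk, high value
```

A prospect excluded by this rule never sees the offer and so has no occasion to learn that a score was applied.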

II. Predictive Analytics and Consumer Privacy

A. Prevailing Theories of Privacy

The prevailing theoretical accounts of privacy describe what privacy is and what privacy does. Many theorists define privacy as the individual’s ability to control what others know about him or her.61 This notion of privacy as control reaches back to Warren and Brandeis’ famous account of the “right to be let alone” in their seminal 1890 law review article, The Right to Privacy.62 Privacy as control also animated Alan Westin’s work in the 1960s defining privacy as control in four different states: solitude, anonymity, intimacy, and reserve.63

Leading theorists have identified privacy’s instrumental value for promoting personal dignity and autonomy in ways that are important for individual personality, healthy civic discourse, and democratic governance.64 For Warren and Brandeis, privacy promotes the “inviolate personality.”65 Edward Bloustein observed that privacy defines one’s essence as a human being by promoting individual dignity, integrity, personal autonomy, and independence.66 Similarly, Ruth Gavison

58 Id. 59 IBM SOFTWARE, supra note 56, at 8. 60 Id. at 9. 61 Daniel J. Solove, Conceptualizing Privacy, 90 CALIF. L. REV. 1087, 1092 (2002) (identifying

varying accounts of privacy as the right to be let alone, limited access to the self, secrecy,

control over personal information, personhood, and intimacy). 62 Samuel D. Warren & Louis D. Brandeis, The Right to Privacy, 4 HARV. L. REV. 193 (1890). 63 ALAN F. WESTIN, PRIVACY AND FREEDOM 31–32 (1967). 64 Solove, Conceptualizing Privacy, supra note 61, at 1093 (noting accounts of privacy’s

importance for “freedom, democracy, social welfare, [and] individual well-being”). 65 Warren & Brandeis, supra note 62, at 205. 66 Edward J. Bloustein, Privacy as an Aspect of Human Dignity: An Answer to Dean Prosser, 39

Page 11: Spencer: Privacy and Predictive Analytics in E-commerce

2015 Privacy and Pred ic t iv e Analyt ics 639

described privacy’s role as promoting “liberty, autonomy, selfhood, . . . human relations, and . . . the existence of a free society.”67

B. Predictive Analytics and Privacy as Control

One can assume that the daughter in Target’s pregnancy prediction score story did not want Target to know that she was pregnant. After all, she apparently did not sign up for Target’s baby shower registry. If Target had asked her in a survey whether she was pregnant, she surely would have said no. But when she shared all of her shopping habits with Target, she could not possibly know that she was also sharing secondary evidence that Target would use to generate a pregnancy prediction score. Had the daughter known what Target could learn from her purchases, she might have exercised control over what Target could learn about her by paying in cash or shopping elsewhere. But that was not an option. Moreover, for the many consumers who shared their purchases with Target before Target developed its pregnancy prediction model, even Target did not know that customer purchases could predict pregnancy.

The control problem gets even more challenging when companies combine their internal data with third party data to build predictive models. For example, a merchant might provide data on its existing “high value” customers to a predictive analytics company. The predictive analytics company then combines the merchant’s data with information about those same customers obtained from third parties. Finally, the predictive analytics company uses the combined data to develop a model to help identify future high value prospects.68 If the consumers could not anticipate future predictive uses of the data they shared with the merchant, they certainly could not know about the future predictive uses of the data shared with third parties.

Merchants themselves have difficulty valuing data’s future uses. For example, at the time of Facebook’s initial public offering in 2012, the issuing banks had valued Facebook at $104 billion.69 However, Facebook’s audited financial statements for 2011 reported assets of only $6.3 billion, which included cash and physical assets but excluded the vast stores of personal data that are Facebook’s lifeblood. The data valuation challenge arises because most of data’s value lies in unknown future secondary uses, rather than the original purpose of collection.70 If merchants cannot value data easily and consistently, consumers can hardly be asked to value unknown future uses of their own data.

N.Y.U. L. REV. 962, 965–66, 1002–03 (1964).

67 Ruth Gavison, Privacy and the Limits of Law, 89 YALE L.J. 421, 423 (1980).

68 See, e.g., TruSignal, Leading Life Insurance Broker Case Study: Firm Reaches New High Value Customers Through Targeted Display Advertising, TRU-SIGNAL.COM, http://www.tru-signal.com/wp-content/uploads/2014/11/TruSignal-Leading_Life_Insurance_Broker.pdf (last visited Sept. 13, 2015) (describing a model using more than 100 predictive factors to identify a lookalike audience of over 8 million high value prospects).

69 MAYER-SCHÖNBERGER & CUKIER, supra note 2, at 118.

C. Predictive Analytics and Personal Autonomy and Dignity

Predictive analytics also harms personal autonomy and dignity in several ways. One type of harm arises from the nature of predictive algorithms and their secret e-commerce status. Another arises from the risk that predictive algorithms can institutionalize latent societal discrimination.

1. Algorithms in E-Commerce: Secret, Predictive, and Imperfect

As described above, predictive models can determine the prices consumers pay, the level of service merchants provide, and even consumers’ eligibility to make purchases or obtain such essentials as credit, housing, and insurance. To most consumers, however, the very existence of the models that dole out these important commercial benefits is a secret. To the extent that some consumers know they exist, merchants will not reveal how they work, and consumers are powerless to reverse engineer them. A recent White House report observed that “big data analytics may . . . create such an opaque decision-making environment that individual autonomy is lost in an impenetrable set of algorithms.”71 Thus, the real-world effects of these secret algorithms diminish consumers’ sense of autonomy and dignity.72

Next, predictive models do not judge individuals based on their own actions. Instead, they judge individuals based on things they have not yet done, and even worse, on things that other people did.73 Research suggests that people have an aversion to algorithms because of notions that “algorithms are dehumanizing” or “cannot properly consider individual” subjects.74 For these reasons, predictive models challenge individuals’ sense of autonomy.

70 See id. at 118–20.

71 EXECUTIVE OFFICE OF THE PRESIDENT, BIG DATA: SEIZING OPPORTUNITIES, PRESERVING VALUES 10 (2014), available at https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf.

72 See Danielle Keats Citron & Frank Pasquale, The Scored Society: Due Process for Automated Predictions, 89 WASH. L. REV. 1, 27 (2014) (discussing how secret scoring systems affecting credit, housing, employment, and other opportunities threaten human dignity); Strahilevitz, supra note 17, at 2028 (discussing dignitary harm from “service discrimination”).

73 See Testimony of Solon Barocas, supra note 11, at 20–21; KOTU & DESHPANDE, supra note 11, at 17–19, 27–29.

Finally, predictive models are always wrong for a subset of consumers. Merchants do not need them to be perfect as applied to every consumer. They merely need them to be better than the previous approaches to pricing, marketing, and eligibility determinations.75 So, predictive models optimize profits for the merchants, but inevitably misclassify some consumers. Misclassified consumers who pay higher prices, have fewer options, and cannot secure credit, housing, or insurance are simply the “collateral damage” of a predictive algorithm. Treating some consumers as “collateral damage” from an algorithm they can neither see nor comprehend offends their sense of dignity and autonomy.
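The “good enough” point can be illustrated with a small numerical sketch. Every figure below is hypothetical and invented for illustration; the ten applicants, the “prior rule,” and the “model” are assumptions, not data from any source cited in this Article. The sketch simply counts how often each approach misclassifies the same applicants.

```python
# Hypothetical illustration: every figure below is invented for this sketch.

def error_count(predicted, actual):
    """Count consumers whose predicted label differs from their true label."""
    return sum(p != a for p, a in zip(predicted, actual))

# True status of ten hypothetical applicants (True = creditworthy).
actual = [True, True, True, True, True, True, False, False, False, False]

# A crude prior rule and a predictive model, applied to the same applicants.
prior_rule = [True, False, False, True, False, True, True, False, True, False]
model = [True, True, True, True, False, True, False, False, False, False]

prior_errors = error_count(prior_rule, actual)
model_errors = error_count(model, actual)

# Creditworthy applicants the model nonetheless rejects.
wrongly_denied = sum(a and not p for p, a in zip(model, actual))

print(prior_errors, model_errors, wrongly_denied)
```

In this invented example the model makes fewer errors than the prior rule, which is all the merchant needs, yet one creditworthy applicant is still turned away.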

2. The Institutionalization of Societal Discrimination

Predictive analytics can institutionalize existing societal prejudices.76 Although the mathematics underlying algorithms may be free from prejudice, the choices that data scientists must make are not. First, someone must decide how to define the target variable, which carries a risk of intentional or unintentional discrimination.77 Target, for example, saw pregnancy as a valuable trait in its customers, but some insurers might view pregnancy differently. Next, someone must decide what training data to use. If that training data resulted in part from societal discrimination, then the existing discriminatory effects will be baked into the predictive model.78 For example, if there already exists a discriminatory pattern of lenders targeting poor consumers for unfavorable credit terms, then a model trained on those data will reproduce that pattern of discrimination. Poor consumers will be saddled with higher debt service, and will therefore have even fewer resources available to engage in the kinds of behaviors that would convince the predictive model that they should receive more favorable credit terms.79 And because predictive models operate invisibly to consumers, they render patterns of discrimination nearly undetectable. Each consumer sees only the offers that merchants make to them, not the offers merchants make to others.80 As Michael Fertik has observed, the rich see a different internet than the poor.81

74 Berkeley J. Dietvorst et al., Algorithm Aversion: People Erroneously Avoid Algorithms After Seeing Them Err, 144 J. EXPERIMENTAL PSYCHOL.: GEN. 114–26 (2015), available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2466040.

75 MAYER-SCHÖNBERGER & CUKIER, supra note 2, at 45–49 (describing predictive modeling as aiming for “good enough” results).

76 Testimony of Solon Barocas, supra note 11, at 19–22; see FRANK PASQUALE, THE BLACK BOX SOCIETY: THE SECRET ALGORITHMS THAT CONTROL MONEY AND INFORMATION 41 (2015); Citron & Pasquale, supra note 72, at 4–5, 13.

77 Testimony of Solon Barocas, supra note 11, at 19–20.

78 Id. at 20–22; accord Michael Aleo & Pablo Svirsky, Foreclosure Fallout: The Banking Industry’s Attack on Disparate Impact Race Discrimination Claims Under the Fair Housing Act and the Equal Credit Opportunity Act, 18 B.U. PUB. INT. L.J. 1, 5 (2008) (discussing the irony of charging higher rates to riskier debtors, thus increasing the risk of default); see also Citron & Pasquale, supra note 72, at 18 n.106.
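The way a discriminatory pattern in training data carries over into a model can be sketched in a few lines. The lending records below are hypothetical, and the majority-vote “model” is a deliberately simple stand-in chosen for illustration; it is an assumption, not a description of any lender’s or vendor’s actual method.

```python
from collections import Counter, defaultdict

# Hypothetical historical lending records: (income_band, terms_offered).
# The skew against low-income consumers is invented for this sketch.
history = (
    [("low", "unfavorable")] * 80
    + [("low", "favorable")] * 20
    + [("high", "unfavorable")] * 10
    + [("high", "favorable")] * 90
)

def train(records):
    """Learn, for each group, the terms most often offered in the past."""
    counts = defaultdict(Counter)
    for group, terms in records:
        counts[group][terms] += 1
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}

# The trained "model" simply replays the historical pattern.
model = train(history)
print(model)
```

Nothing in the code mentions discrimination, yet the model assigns unfavorable terms to the low-income band because that is what the historical records contain.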

Predictive analytics company TruSignal offers an “Ideal Audiences” service to let merchants market only to online consumers who “look like” current high-value customers.82 TruSignal generates ideal audiences by drawing on predictive analytics tools and its “big data warehouse of offline consumer profile information.”83 TruSignal draws its audience data from the BlueKai Exchange, which purports to hold “actionable audience data” on 80% of the U.S. Internet population.84 The risk, of course, is that the way that current high-value customers “look” may be a product of societal discrimination. If so, building a model to replicate those customers institutionalizes that discrimination.

Researcher Nathan Newman found evidence that online display advertising reinforced racial stereotypes. He created test Gmail accounts and assigned some white-sounding names, others African-American-sounding names, and still others Latino-sounding names. He then sent emails about several different topics to and from the test accounts. Because Google scans all Gmail and delivers ads based on their content, he wanted to see if the types of ads delivered would vary if he held the content constant but varied the names. When the test accounts sent emails about car purchases, the white-sounding names all saw ads from car dealers or car buying sites. In contrast, the African-American-sounding names all saw at least one ad related to “bad credit card loans” or used car purchases. The Latino-sounding names saw a mix. When the test accounts sent emails with the term “education” in the subject line, the white-sounding names saw more ads for graduate education, while the non-white names saw more ads for undergraduate and non-college education.85

79 Cf. PASQUALE, supra note 76, at 41.

80 Citron & Pasquale, supra note 72, at 10–11; Michael Fertik, The Rich See a Different Internet Than the Poor, SCIENTIFIC AMERICAN (Jan. 15, 2013), http://www.scientificamerican.com/article/rich-see-different-internet-than-the-poor/.

81 Fertik, supra note 80; accord EXECUTIVE OFFICE OF THE PRESIDENT, supra note 71, at 10 (discussing the risk of disparate treatment of disadvantaged groups); Singer, supra note 53 (discussing how financial sector consumer scores risk creating “a new subprime class”); Joseph W. Jerome, Buying and Selling Privacy: Big Data’s Different Burdens and Benefits, 66 STAN. L. REV. ONLINE 47, 51 (2013) (discussing how big data harms self-determination and autonomy, especially for poor consumers).

82 TruSignal Unveils High Value Consumer Audience Targeting Segments on the BlueKai Exchange, TRUSIGNAL (Feb. 16, 2012), http://www.tru-signal.com/press-releases/trusignal-unveils-high-value-consumer-audience-targeting-segments-on-the-bluekai-exchange.

83 Id.

84 Data Activation: The Audience Data Marketplace, ORACLE | BLUEKAI, http://www.bluekai.com/audience-data-marketplace.php (last visited Sept. 13, 2015).

Predictive analytics may also reinforce class-based discrimination. A Wall Street Journal investigation showed that geographically-based price discrimination can “reinforce patterns that e-commerce had promised to erase: prices that are higher in areas with less competition, including rural or poor areas. It diminishes the Internet’s role as an equalizer.”86 The Journal found that Staples offered discount prices to ZIP codes with weighted average income of about $59,900, but offered higher prices to ZIP codes with weighted average incomes of about $48,700.87

III. Implications for Regulating Predictive Analytics

Any regulatory response must balance potential harm against the commercial benefits of predictive analytics.88 Predictive analytics can maximize revenues by improving marketing efficiency, improving customer retention, attracting high-value customers, preventing fraud, and avoiding credit risks. It can also improve the consumer experience by providing consumers with more relevant ads, offers, and products.89

In addition, regulation must avoid stifling innovation. Legislating technology can have unintended consequences, and legislators may have difficulty keeping pace with developing technologies. For example, there is near-universal agreement that the Electronic Communications Privacy Act’s (“ECPA”) approach to e-mail privacy is based on a long-outdated conception of how e-mail works.90 Yet Congress has been unable to amend ECPA despite repeated attempts.91 Similarly, the Computer Fraud and Abuse Act (“CFAA”) has been decried as an abuse of justice and the “worst law in technology.”92

85 Nathan Newman, Racial and Economic Profiling in Google Ads: A Preliminary Investigation (Updated), HUFFINGTON POST (Nov. 20, 2011, 5:12 AM), http://www.huffingtonpost.com/nathan-newman/racial-and-economic-profi_b_970451.html, noted in PASQUALE, supra note 76, at 40. For a similar study finding statistically significant discrimination in ad delivery based on searches of 2,184 racially associated personal names, see Latanya Sweeney, Discrimination in Online Ad Delivery, ACM QUEUE (Apr. 2, 2013), https://queue.acm.org/detail.cfm?id=2460278 (cited in PASQUALE, supra note 76, at 236).

86 Valentino-Devries et al., supra note 43.

87 Id.

88 For a discussion of how existing law may apply to predictive analytics in e-commerce, see Shaun B. Spencer, Predictive Analytics, Consumer Privacy, and E-Commerce, in RESEARCH HANDBOOK ON ELECTRONIC COMMERCE LAW (Elgar Publishing, John A. Rothchild ed., forthcoming 2015) (discussing application of the FTC Act, Children’s Online Privacy Protection Act, Equal Credit Opportunity Act, Fair Housing Act, and state laws prohibiting discrimination in insurance and public accommodations).

89 See MAYER-SCHÖNBERGER & CUKIER, supra note 2, at 58; SIEGEL, supra note 1, at 23.

In the context of predictive analytics, avoiding these pitfalls means that regulators should not dictate what types of data merchants can use to train predictive models, aside from overtly discriminatory factors such as race. It also means that regulators should avoid regulating how the models are constructed. Instead, regulators should focus on the harmful outputs or effects of predictive models. As discussed above, the privacy harms caused by predictive analytics fall into three categories: (1) the loss of control over how one’s information is used; (2) autonomy and dignity harms from secret and even flawed uses of models to dole out commercial benefits; and (3) discriminatory allocation of commercial benefits. Figure 1 below proposes a classification for the degree of harm posed by the various uses of predictive analytics in e-commerce.

Figure 1: Degrees of Harm Posed by Predictive Analytics in E-commerce

Harm \ Use | Online Behavioral Advertising | Price Discrimination | Customer Segmentation | Eligibility: General Commercial Transactions | Eligibility: Credit, Housing, Insurance
Error Rate | Minimal | Minimal | Minimal | Minimal | Moderate
Secrecy | Minimal | Minimal | Minimal | Minimal | Moderate
Poverty Discrimination | Moderate | Moderate | Moderate | High | High
Discrimination Against Traditionally Protected Classes | High | High | High | High | High

90 See, e.g., Charles H. Kennedy, An ECPA for the 21st Century: The Present Reform Efforts and Beyond, 20 COMMLAW CONSPECTUS 129, 129, 145–53 (2011) (discussing the challenges in applying ECPA’s outdated framework to unanticipated technologies).

91 Id. at 153–61 (discussing reform efforts). For recent attempts to amend ECPA, see Electronic Communications Privacy Act Amendments Act of 2015, S. 356, 114th Cong. (2015); Electronic Communications Privacy Act Amendments Act of 2013, S. 607, 113th Cong. (2013); Electronic Communications Privacy Act Amendments Act of 2011, S. 1011, 112th Cong. (2011).

92 Lothar Determann, Internet Freedom and Computer Abuse, 35 HASTINGS COMM. & ENT. L.J. 429, 429–30 (2013) (quoting Tim Wu, Fixing the Worst Law in Technology, THE NEW YORKER (Mar. 18, 2013), available at http://www.newyorker.com/news/news-desk/fixing-the-worst-law-in-technology).

Areas of minimal harm should be left largely to the market with minimal regulatory intervention. The error rate, in particular, should be left to competition between merchants to develop the most accurate models. To mitigate the privacy harm from the secret use of predictive analytics, the FTC could treat merchants’ privacy policies as deceptive acts under the FTC Act, unless they disclose predictive uses of data in at least general terms.93 This approach is by far the most achievable because it would not require any new legislation or regulations.

Areas of moderate and high harm, in contrast, require more direct intervention targeted at the harmful outcomes. For the moderate harms caused by flawed data or model error in critical eligibility determinations, regulatory approaches should require that merchants use statistically sound methodology94 and should afford consumers the opportunity to review and correct data.95

To address the high harms caused by discrimination against the poor and against traditionally protected classes, new legislation should authorize disparate-impact claims concerning eligibility for all commercial transactions. Disparate-impact claims are currently authorized to varying degrees in the credit, housing, insurance, and employment sectors.96 We ought not tolerate practices that exclude traditionally protected classes from commercial transactions, regardless of whether that exclusion is unintended.

Applying the disparate-impact test to predictive analytics, however, will be quite challenging.97 Under the traditional burden-shifting framework of employment cases, business necessity is a defense to proof of disparate impact.98 Merchants may argue that predictive models by their very nature must survive disparate-impact scrutiny because they produce the most efficient results. Accordingly, they would argue, there is no alternative model that would have a less discriminatory impact while maintaining an equally efficient outcome. Consumers and regulators may respond that a modest increase in marginal profits should not excuse disadvantaging protected classes. Consumers could draw analogies to employers’ failed attempts to justify discriminatory hiring practices based on the biases of their customers.99 As a practical matter, of course, passing broad disparate-impact legislation seems unlikely. But this debate over how to apply disparate-impact claims to predictive analytics will likely play out under the ECOA and FHA.100

93 See Daniel J. Solove & Woodrow Hartzog, The FTC and the New Common Law of Privacy, 114 COLUM. L. REV. 583, 585 (2014) (discussing FTC privacy enforcement actions pursuant to its jurisdiction over unfair and deceptive trade practices).

94 For example, the Equal Credit Opportunity Act requires that creditors using “empirically derived” scoring systems use statistically sound methodology. 15 U.S.C. § 1691(b)(3) (2012); 12 C.F.R. § 202.2(p) (2014).

95 For example, the Fair Credit Reporting Act provides consumers with the right to access information in their credit file, 15 U.S.C. § 1681g (2012), and provides a procedure for consumers to dispute the information in their file, 15 U.S.C. § 1681i (2012).

96 See Spencer, supra note 88 (discussing how disparate-impact claims under ECOA and FHA regulations may apply to predictive analytics). In 2015, the U.S. Supreme Court confirmed that disparate-impact claims are cognizable under the FHA, and offered guidance on the plaintiff’s prima facie case, the burden-shifting framework, and the business necessity defense in FHA cases. Tex. Dep’t of Hous. & Cmty. Affairs v. Inclusive Cmtys. Project, Inc., 135 S. Ct. 2507, 2514–15, 2525 (2015). The Court emphasized several limitations on disparate-impact liability to ensure that regulated entities can make “practical business choices and profit-related decisions” essential to free enterprise. Id. at 2518. First, the plaintiff cannot establish a prima facie case based solely on a statistical disparity. Instead, the plaintiff must also identify the “defendant’s policy or policies causing that disparity.” Id. at 2523. Second, the Court emphasized that defendants facing FHA disparate-impact claims have a defense analogous to the business necessity defense in Title VII cases, and that the defense allows defendants to “state and explain the valid interest served by their policies.” Id. at 2522. Third, the Court emphasized that, to refute the defendant’s stated business need or government interest, the plaintiff must show an available alternative practice with less disparate impact that still serves the defendant’s legitimate needs. Id. at 2518. Finally, the Court stressed that the defendant’s “policies are not contrary to the disparate-impact requirement unless they are ‘artificial, arbitrary, and unnecessary barriers.’” Id. at 2524 (quoting Griggs v. Duke Power Co., 401 U.S. 424, 431 (1971)).

97 In fact, discovering disparate impact in the first place may be challenging, because consumers will not know why the merchant refused to do business with them or whether other consumers in the protected class were excluded.

98 See 42 U.S.C. § 2000e-2(k)(1)(A)(i) (2012) (requiring that, after complainant demonstrates disparate impact on protected class, employer must demonstrate that “the challenged practice is job related for the position in question and consistent with business necessity”).

99 See, e.g., Rucker v. Higher Educ. Aids Bd., 669 F.2d 1179, 1181 (7th Cir. 1982) (holding that Title VII forbids “refus[ing] on racial grounds to hire someone because your customers or clientele do not like his race”); Diaz v. Pan Am. World Airways, Inc., 442 F.2d 385, 389 (5th Cir. 1971) (rejecting customer preference for female flight attendants as justification for sex discrimination, where discriminatory employment policy was not founded on “business necessity”).
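The statistical starting point of a disparate-impact inquiry can be sketched as a comparison of approval rates across groups. The decisions below are hypothetical outputs of an imagined eligibility model, and the function names are illustrative assumptions. As the Supreme Court stressed in Inclusive Communities, a statistical disparity of this kind does not by itself establish a prima facie case; the plaintiff must also identify the policy causing it.

```python
def approval_rate(decisions):
    """Share of applicants approved; decisions is a list of booleans."""
    return sum(decisions) / len(decisions)

def impact_ratio(protected, comparison):
    """Approval rate of the protected group relative to the comparison group."""
    return approval_rate(protected) / approval_rate(comparison)

# Hypothetical model outputs (True = approved); invented for this sketch.
protected_group = [True] * 30 + [False] * 70
comparison_group = [True] * 60 + [False] * 40

ratio = impact_ratio(protected_group, comparison_group)
print(f"protected-group approval rate is {ratio:.0%} of the comparison rate")
```

A ratio well below one signals a disparity worth investigating, but it says nothing about business necessity or the availability of a less discriminatory alternative, which are the questions on which the legal debate would turn.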

CONCLUSION

Predictive analytics promises substantial benefits for merchants and consumers alike, but can harm consumer privacy. Balancing these benefits and harms requires careful attention to the nature and degree of the harm, as well as the risk that regulatory intervention may stifle innovation. This article proposes that regulators allow the market to police areas of minimal harm flowing from the error rates and secrecy inherent in predictive analytics. For more significant harms involving discrimination against the poor and against traditionally protected classes, regulators and legislators should intervene, either using existing legal tools or through new legislation targeted at potential discriminatory impact from predictive analytics.

100 See Spencer, supra note 88 (discussing how disparate impact claims under ECOA and FHA regulations may apply to predictive analytics); Solon Barocas & Andrew D. Selbst, Big Data’s Disparate Impact, 104 CALIF. L. REV. (forthcoming 2016) (discussing how disparate impact claims may be applied to predictive analytics).