Causal data mining: Identifying causal effects at scale


1

Causal data mining: Identifying causal effects at scale

Amit Sharma
Postdoctoral Researcher, Microsoft Research New York
http://www.amitsharma.in | @amt_shrma

2

A tale of two questions

Q1: How much activity comes from the recommendation system?

Q2: How much activity comes because of the recommendation system?

3

How much activity comes because of the recommendation system?

A causal question.

With recommender (Real world)

Without recommender (Counterfactual world)

1. Modeling user behavior

2. Evaluating and improving systems

Understanding causal relationships from data

Distinguishing between personal preference and homophily in online activity feeds. Sharma and Cosley (2016).

Studying and modeling the effect of social explanations in recommender systems. Sharma and Cosley (2013).

Amit and Dan like this.

SOME MUSICAL ARTIST


Understanding causal relationships from data

Averaging Gone Wrong: Using Time-Aware Analyses to Better Understand Behavior. Barbosa, Cosley, Sharma, Cesar (2016).

Auditing search engines for differential satisfaction across demographics. Mehrotra, Anderson, Diaz, Sharma, Wallach (2016).

7

A core problem across the sciences

Understanding causal relationships from data

Code profiling, static analysis [Berger et al.]

Debugging machine learning [Chakarov et al.]

Decision-making in robotics

8

Why is it hard?

Without: recommender system algorithm / any code change / social policy, medical treatment

We observe data from the real world, but no data from the counterfactual world.

Without a randomized experiment, the effect is hard to estimate.

9

Difference between prediction and causation

Cause (X) → Outcome (Y), with unobserved confounders (U):

y = f(x, u)

Hofman, Sharma, and Watts (2017). Science, 355.6324

10

Prediction: y = g(x) + ε. From ⟨X, Y⟩, estimate ĝ.

Causation: y = f(x, u). From ⟨X, Y⟩, estimate f̂.

11
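The gap between ĝ and f̂ can be made concrete with a toy simulation (all numbers here are illustrative, not from the talk): when an unobserved confounder u drives both x and y, the best-fitting regression coefficient differs from the causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

beta = 1.0                       # true causal effect of x on y
u = rng.normal(size=n)           # unobserved confounder
x = 0.8 * u + rng.normal(size=n)         # cause, partly driven by u
y = beta * x + 1.5 * u + rng.normal(size=n)  # outcome y = f(x, u)

# Prediction: the regression slope fits <X, Y> well...
beta_hat = np.cov(x, y)[0, 1] / np.var(x)

# ...but it is biased away from the causal effect, because u is omitted.
print(f"true causal effect: {beta:.2f}, regression estimate: {beta_hat:.2f}")
```

Here the regression estimate lands well above the causal effect: it answers the prediction question, not the causal one.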

Research goal

How can we use large-scale data to infer causal estimates?

Use algorithms to find experiment-like data: Quasi (”Natural”) experiments

12

Prediction: from ⟨X, Y⟩, fit y = βx + ε and obtain β̂.
Causation: from ⟨X, Y⟩, obtain a causal β̂.

Natural experiment

Combine Pearl’s causal graphical model framework with natural experiments

14

Inverting the natural experiment paradigm

Old: Hypothesize about a natural variation → argue why it resembles a randomized experiment.
New: Observational data → develop tests for validity of natural variation → mine for data subsets with such valid variations.

15

From a dataset ⟨X, Y⟩, mine many natural experiments — rather than the single hand-found natural experiment, the practice since the 1850s.

16

Data mining for causal inference:

1. Split-door criterion → causal effect of recommender systems
2. Bayesian natural experiment test → validate past economics studies

17

Part 0: Traditional causal inference using a natural experiment

18

1854: London was having a devastating cholera outbreak

19

Causal question: What is causing cholera?

Air-borne: spreads through air (“miasma”)
Water-borne: spreads through contaminated water

Polluted air → cholera diagnosis?
Contaminated water → cholera diagnosis?
Neighborhood confounds both.

21

Enter John Snow. He found higher cholera deaths near a water pump, but this could be just correlational.

22

New idea: Two major water companies served London — the S&V Water Company and the Lambeth Water Company — one drawing water upstream and one downstream.

23

No difference in neighborhood, yet an 8-fold increase in cholera with the downstream company (comparing S&V and Lambeth customers).

24

Led to a change in belief about cholera’s cause.

25

Why was Snow’s study so convincing?

• Choice of water company cannot itself cause cholera.
• Choice of water company was not related to people’s neighborhood or its air quality: people receiving water from the two companies were interspersed within neighborhoods.

Probably the first application of cause-effect principles.

26

Exclusion and As-if-random:

Water Company → Contaminated Water → Cholera Diagnosis, with Neighborhood as a confounder.

27

Contaminated Water (X) → Cholera Diagnosis (Y), with other factors [e.g. neighborhood] (U) confounding both.
Water Company (Z) → Contaminated Water (X).

As-if-random: Z is independent of U. Exclusion: Z affects Y only through X.

Two assumptions central to causal inference: Exclusion and As-if-random.

28

Exclusion: (Z ⫫ Y ∣ X, U)

Cause (X) → Outcome (Y), unobserved confounders (U), new variable (Z).

As-if-random: (Z ⫫ U). Exclusion: (Z ⫫ Y ∣ X, U).

1930s: Fisher introduces randomized experiment

Since then, these assumptions have formed the core of causal inference

29

Cause (X) → Outcome (Y), unobserved confounders (U), randomized assignment (Z).

Exclusion: randomized assignment should not directly affect the outcome.
As-if-random: randomized assignment should be independent of unobserved confounders.

Z is now a special observed variable, called an instrumental variable.

All studies using observational data also need to satisfy these two assumptions.

30
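A sketch of how an instrumental variable is used once found (a hypothetical linear setup, not the talk's data): when Z satisfies exclusion and as-if-random, the Wald ratio Cov(z, y) / Cov(z, x) recovers the causal effect even though naive regression is confounded.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

beta = 1.0                       # true causal effect
u = rng.normal(size=n)           # unobserved confounder
z = rng.normal(size=n)           # instrument: independent of u, affects y only via x
x = 0.5 * z + 0.8 * u + rng.normal(size=n)
y = beta * x + 1.5 * u + rng.normal(size=n)

# Naive regression of y on x is confounded by u.
beta_ols = np.cov(x, y)[0, 1] / np.var(x)

# Instrumental-variable (Wald) estimate: Cov(z, y) / Cov(z, x).
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS: {beta_ols:.2f}, IV: {beta_iv:.2f}  (truth: {beta:.2f})")
```

The IV estimate works because z moves x without touching u or y directly, so the z-induced variation in x is effectively randomized.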

But Exclusion and As-if-random are hard to establish, because of unobserved confounders.

32

More formally…

Full dataset ⟨X, Y⟩ → subsets of the data, each an experiment-like subset,

such that As-if-random and Exclusion hold — both hard to verify from observed data.

33

Current methods haven’t changed much from those used by John Snow in the 1850s: use rhetorical arguments to justify an instrumental variable.

1. Manually finding an instrumental variable restricts researchers to single-source events (e.g. weather or a lottery).
2. There is still no guarantee that either Exclusion or As-if-random is satisfied.

34

Causal data mining: Inverting the natural experiment paradigm

Old: Hypothesize about a natural variation → argue why it resembles a randomized experiment.
New: Observational data → develop tests for validity of natural variation → mine for data subsets with such valid variations.

35

Part I: Split-door criterion for causal identification

36

Intuition: What if we can observe an auxiliary outcome that is unaffected by the causal variable?

Cause → Outcome, with unobserved confounders that also drive an Auxiliary Outcome.

The outcome can be separated into two observable parts:
i) Primary outcome: (possibly) affected by the cause
ii) Auxiliary outcome: unaffected by the cause


40

Simplest case: Outcome can be separated into two observable parts

i) Primary outcome: (possibly) affected by cause

ii) Auxiliary outcome: unaffected by cause

41

Such outcome data is commonly available in digital systems: recommender systems, ad systems, app notifications, any content website (such as news).

Let’s take a concrete example: recommender systems

42

Can we find such an auxiliary outcome?

43

Example: Estimating the causal impact of a recommender system (novel recommendations)

44

How much activity comes from the recommendation system?

30% of product page visits.

30% of groups joined.

80% of movies watched.

Sharma and Yan (2013), Sharma, Hofman and Watts (2015), Gomez-Uribe and Hunt (2015)

Confounding: Observed click-throughs may be due to correlated demand

45

Demand for The Road → Visits to The Road → Rec. visits to No Country for Old Men ← Demand for No Country for Old Men, with both demands driven by correlated demand for Cormac McCarthy.

46

Observed activity is almost surely an overestimate of the causal effect

Of all page visits, the observed activity from the recommender splits into a causal part and a convenience part; the activity without the recommender is unknown.

47

Counterfactual thought experiment: What would have happened without recommendations?

48

Hypothetical experiment: Randomized A/B test

But such experiments can be costly. Can we develop an offline metric?

Treatment (A) Control (B)

49
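Randomization is what makes the A/B test’s estimate trustworthy: with assignment independent of everything else, a simple difference in means is unbiased. A toy sketch (the click rates and lift are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical A/B test: treatment (A) shows recommendations, control (B) hides them.
treated = rng.binomial(1, 0.5, size=n)   # randomized assignment
base_rate = 0.10                         # illustrative click rate without recs
lift = 0.04                              # illustrative causal lift from recs
clicks = rng.binomial(1, base_rate + lift * treated)

# Randomization makes the difference in means an unbiased causal estimate.
effect = clicks[treated == 1].mean() - clicks[treated == 0].mean()
print(f"estimated causal lift: {effect:.3f}")
```

The split-door criterion, introduced next, aims to recover this kind of estimate offline, without paying for the experiment.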

Past work: traditional instrumental variable

Instrument → Demand for Cormac McCarthy → Visits to The Road → Rec. visits to No Country for Old Men

Carmi et al. (2012)

Data mining approach (Shock-IV): Finding valid shocks across product categories

50

Shock to demand of a product (e.g. due to Oprah) → argue why it resembles a randomized experiment → develop tests for the validity of a shock → mine for shocks in observational data.

Finding auxiliary outcome: Split outcome into recommender (primary) and direct visits (auxiliary)

51

All visits to a recommended product = recommender visits + direct visits (search visits, direct browsing).

Auxiliary outcome: Proxy for unobserved demand

52

Causal graphical model for the effect of a recommendation system:

Demand for focal product (UX) → Visits to focal product (X) → Rec. visits (YR)
Demand for rec. product (UY) → Rec. visits (YR), Direct visits (YD)
Unknown (?) dependence between UX and UY.

1a. Search for any product with a shock to page visits

53

1b. Filtering out invalid natural experiments

54

55

The “split-door” criterion: Test whether the auxiliary outcome is independent of the cause. Criterion: (X ⫫ YD).

Exclusion: no edge from X to the auxiliary outcome YD in the graph.

56

More formally, why does it work?

Theorem 1: Barring incidental equality of parameters, statistical independence of X and YD guarantees unconfoundedness between X and YR. Proof: follows from properties of causal graphical models and Pearl’s do-calculus [Pearl 2009].

Unobserved variables (UX) → Cause (X) → Outcome (YR); unobserved variables (UY) → Outcome (YR), Auxiliary Outcome (YD).

57

Example: Assuming a linear model

Theorem 1a: Whenever the data follow a linear model and X ⫫ YD holds, an unbiased causal estimate can be obtained by regressing the outcome YR on the treatment X.

(Treatment X; outcome YR; unobserved confounders UX, UY; causal-effect and nuisance parameters as above.)

58
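A simulation of this linear setting (parameter values illustrative) shows both halves of the claim: when X is independent of Y_D, plain regression of Y_R on X recovers the causal effect, and when the two demands are correlated, the independence check fails and the estimate is biased.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
rho = 0.5   # true causal effect of focal-product visits on rec. visits

def simulate(corr):
    # Demands for the focal (u_x) and recommended (u_y) products;
    # `corr` controls how strongly the two demands are correlated.
    u_x = rng.normal(size=n)
    u_y = corr * u_x + np.sqrt(1 - corr**2) * rng.normal(size=n)
    x = u_x + 0.3 * rng.normal(size=n)               # visits to focal product (X)
    y_r = rho * x + u_y + 0.3 * rng.normal(size=n)   # recommender visits (Y_R)
    y_d = u_y + 0.3 * rng.normal(size=n)             # direct visits (Y_D, auxiliary)
    return x, y_r, y_d

results = {}
for corr, label in [(0.0, "valid"), (0.7, "invalid")]:
    x, y_r, y_d = simulate(corr)
    passes = abs(np.corrcoef(x, y_d)[0, 1]) < 0.01   # split-door independence check
    rho_hat = np.cov(x, y_r)[0, 1] / np.var(x)       # regression estimate of rho
    results[label] = (passes, rho_hat)
    print(f"{label}: check passes={passes}, estimate={rho_hat:.2f}")
```

In the "valid" run the check passes and the estimate sits near the true effect; in the "invalid" run the check fails, correctly flagging that regression would overestimate.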

Relationship to the instrumental variable technique: both utilize naturally occurring variation in data.

Instrumental variable — Assumptions: Exclusion and As-if-random.
Split-door criterion — An independence test is used to find natural experiments. Only assumption: the auxiliary outcome is affected by the causes of the primary outcome.

By testing whether the treatment is independent of the auxiliary outcome, split-door requires a weaker dependence assumption for validity.

59

By testing whether the treatment is independent of the auxiliary outcome, split-door requires a weaker dependence assumption for validity.

Instrumental variable: Treatment → Outcome with unobserved confounders; Exclusion must be assumed.
Split-door criterion: Treatment → Outcome with unobserved confounders and an observed auxiliary outcome; the key assumption is testable.

Data from Amazon.com, via the Bing toolbar. Anonymized browsing logs (Sept 2013–May 2014):
• 23 M pageviews
• 2 M Bing Toolbar users
• 1.3 M Amazon products, of which 20 K products have at least 10 visits on any one day

61

Constructed sequence of visits for each user

Search page → Focal product page → Recommended product page

62

Recreating sequence of visits: Log data

2014-01-20 09:04:10  http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=Cormac%20McCarthy
→ User searches for Cormac McCarthy

2014-01-20 09:04:15  http://www.amazon.com/dp/0812984250/ref=sr_1_2
→ User clicks on the second search result

2014-01-20 09:05:01  http://www.amazon.com/dp/1573225797/ref=pd_sim_b_1
→ User clicks on the first recommendation
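The visit type can be read off the `ref` tag in each URL, as in the log excerpt above. A minimal sketch (the pattern list is illustrative and far from complete — real logs carry many more ref codes):

```python
# Classify a pageview by URL patterns seen in the log excerpt above.
def classify_visit(url: str) -> str:
    if "/s/" in url or "field-keywords" in url:
        return "search page"
    if "ref=sr_" in url:
        return "search result click"     # a direct (non-recommender) visit
    if "ref=pd_sim" in url:
        return "recommendation click"    # a recommender visit
    return "other"

log = [
    "http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=Cormac%20McCarthy",
    "http://www.amazon.com/dp/0812984250/ref=sr_1_2",
    "http://www.amazon.com/dp/1573225797/ref=pd_sim_b_1",
]
for url in log:
    print(classify_visit(url))
```

Applied over a user's timestamped log, this labeling is what separates recommender visits (Y_R) from direct visits (Y_D).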

I. Weekly and seasonal patterns in traffic, nearly tripling during the holidays

65

II. 30% of pageviews come from recommendations

III. Books and eBooks are the most popular categories by far

67

Implementing the split-door criterion

From ⟨X, YD⟩, divide the time series into periods (x(1), yD(1)), (x(2), yD(2)), …, (x(n), yD(n)) over days; each period that passes the independence test becomes a natural experiment, and the causal effect is estimated from these valid periods.

68

Implementing the split-door criterion

1. Divide the data into t=15 day periods.
2. For each time period:
   a) Using Fisher’s test, find product pairs (X and Y) such that visits to the focal product are independent of direct visits to the recommended product.
   b) Compute the causal estimate on the pairs that pass.

69

Using the split-door criterion, we obtain 23,000 natural experiments covering over 12,000 products.

1) The traditional IV method using Oprah Winfrey [Carmi et al.] yields 133 natural experiments.
2) Split-door covers more than half of all ~20 K eligible products.

70

Examples: valid vs. invalid natural experiments.

71

Observational click-through rate overestimates causal effect

Over half of the recommendation click-throughs would have happened anyway.

72

Can vary the confidence in validity of obtained natural experiments

73

74

Similar, more precise causal estimates than simply using shocks

75

Generalization? The distribution of products with a natural experiment is nearly identical to the overall product distribution.

Causal estimates are consistent with experimental findings (e.g., Belluf et al. [2012], Lee and Hosanagar [2014]).

Caveats: shocks may be due to discounts or sales; lower CTR may be due to the holiday season.

Generalizable to all products on amazon.com?

Generalization to all of Amazon.com?

• Split-door products are not a representative sample of all products, nor are the users who participate in them.
• But the split-door criterion covers more than half of all products with at least 10 visits on any single day.
• Causal estimates are consistent with experimental findings (e.g., Belluf et al. [2012], Lee and Hosanagar [2014]).

77

78

Potential applications: whenever an auxiliary outcome is available.

Digital systems: recommender systems, ad systems, app notifications; any media website or app (such as newspapers).
Offline contexts: discount mailers sent by stores; any two marketing channels.
In the future: effect of medical treatments, teaching interventions, etc.

79

Summary: Mining natural experiments at scale

Unlike traditional natural experiments, the split-door criterion relies on fine-grained data to:
• Verify the exclusion assumption [Robustness]
• Cover a broad range of data [Generalizability]

It provides an offline metric for computing causal effects in digital systems (e.g., ad systems, media websites, app notifications). Code is available for use.

Oprah [Carmi et al.]: 133 shocks, restricted to books.
Split-door criterion: 12,000 natural experiments, representative of the overall product distribution.

80

The spectrum: split-door, regression and a natural experiment

As the cutoff for the likelihood of independence rises from 0 through .80 and .95 to 1, the approach moves from regression through the split-door criterion to a single natural experiment, and the amount of usable data shrinks.

81

Part 2: A general Bayesian test for natural experiments in any dataset

Cause (X)

Outcome (Y)

Unobserved Confounders

(U)

Instrumental Variable

(Z)

As-if-random? Exclusion?

Given observed data, can we determine whether it was generated from
a) the above model class (Invalid-IV), or
b) a model class without the red edges (Valid-IV)?

83

Observational data may be generated by a Valid-IV model — I.V. (Z) → Cause (X) → Outcome (Y), confounded by U, with y = f(x, u) — or by an Invalid-IV model in which Z also enters the outcome directly: y = f(x, z, u).

84

Necessary test: by the properties of the causal graph, a testable condition must hold in any Valid-IV model. Pearl (1993).

I.V. (Z) → Cause (X) → Outcome (Y), with unobserved confounders (U).

85

But we would like a sufficient test for instrumental variables.

86

A first try: Compare model classes by maximum likelihood.

ML_InvalidIV = max over m′ ∈ InvalidIV of P(Data ∣ m′)

Every data distribution that can be generated by a Valid-IV model can also be generated by an Invalid-IV model. (The diamond represents all observable probability distributions P(X, Y ∣ Z).)

Sufficiency is almost “impossible”.

87

Passes Necessary test

Both Valid and Invalid IV models can generate this data distribution.

Can attain a weaker notion: probable sufficiency

88

ValidityRatio = P(ValidIV ∣ Data) / P(InvalidIV ∣ Data)

A “probably sufficient” criterion

89

Intuition: develop a generative meta-model of the data. Both the Valid-IV and the Invalid-IV class contain many concrete models (f₁, f₂, …, g₁, g₂, …) that could have generated the observational data; compare the marginal likelihoods of the Valid versus Invalid-IV classes. This can be formalized as a Bayesian model comparison:

ValidityRatio = P(ValidIV ∣ Data) / P(InvalidIV ∣ Data)

90

Data is likely to be generated from a Valid-IV model if ValidityRatio ≫ 1

91
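The machinery behind such a ratio is ordinary Bayesian model comparison: integrate out each model class's parameters to get marginal likelihoods, then take their ratio. This is not the NPS test itself (which integrates over response-variable causal models, partly via Annealed Importance Sampling); the toy below uses closed-form Beta-Bernoulli marginals only to illustrate the validity-ratio-style Bayes factor, with made-up data and priors.

```python
from math import exp, lgamma

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n, a, b):
    # Beta-Bernoulli marginal likelihood of k successes in n trials under
    # a Beta(a, b) prior (the binomial coefficient cancels in the ratio).
    return log_beta(a + k, b + n - k) - log_beta(a, b)

k, n = 70, 100                   # observed data: 70 successes in 100 trials
m1 = log_marginal(k, n, 8, 2)    # model class favoring high success rates
m2 = log_marginal(k, n, 2, 8)    # model class favoring low success rates
ratio = exp(m1 - m2)             # analogue of the validity ratio
print(f"model-comparison ratio: {ratio:.1f}")
```

A ratio much greater than 1 favors the first model class, just as ValidityRatio ≫ 1 favors the Valid-IV class.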

Computing the Validity Ratio

Two problems:
1. Each causal model contains an unobserved variable U.
2. There are infinitely many causal models in each sub-class.

92

I. Use a response-variable framework to model y = f(x, u) with unobserved u. (Assumes discrete variables.)

93

II. A non-standard integral over infinitely many models.

Denominator (Invalid-IV): derived a closed-form solution, using properties of Dirichlet and hyper-Dirichlet distributions and the Laplace transform.
Numerator (Valid-IV): no closed-form solution exists; used Monte Carlo methods (Annealed Importance Sampling) for approximation.

94

Use the NPS test to validate IV studies from the American Economic Review: collected studies from the AER with “instrumental variable” in the title or abstract.

95

Many recent studies from the American Economic Review do not pass the test:

Study (AER)                                                          Validity Ratio
Effect of Mexican immigration on crime in the United States (2015)        0.07
Effect of subsidy manipulation on Medicare premiums (2015)                1.02
Effect of credit supply on housing prices (2015)                          0.01
Effect of Chinese import competition on local labor markets (2013)        0.3
Effect of rural electrification on employment in South Africa (2011)      3.6
Expt: National Job Training Partnership Act (JTPA) Study (2002)           3.4

Challenges decades-long belief that causal assumptions cannot be tested from data

Can use data mining for causal effects in large-scale data.

Two recipes:
• Create new graphical structures that identify the causal effect: Split-door criterion
• Use Bayesian modeling to test instrumental variables: NPS test

Conclusion: Causal data mining enables causal inference from large-scale data

96

97

More generally, a viable methodology for causal inference in large datasets

⟨X, Y⟩ → develop tests for validity of natural variation → mine for such valid variations in observational data.

98

Hard-to-find variations: lottery, weather, shocks, discontinuities; a change in access to digital services; a change in medicines at a hospital; a change in train stops in a city.

More generally, a viable methodology for causal inference in large datasets

99

Future Work

A spectrum by ability to experiment (x-axis) and amount of data (y-axis, 10² to 10¹⁰): controlled experiments and A/B tests where experimentation is possible; contextual bandits, warm start (choosing expts.), and online+offline causal algorithms in between; the split-door criterion and the IV test where only observational data is available.

100

Future work: Causal inference and machine learning

Causal inference as robust prediction:

Causal inference: predicted value under the counterfactual distribution P′(X, y).
(Supervised) ML: predicted value under the training distribution P(X, y).

101

Thank you!

Amit Sharma
http://www.amitsharma.in

1. Hofman, Sharma, and Watts (2017). Prediction and explanation in social systems. Science, 355.6324.

2. Sharma (2016). Necessary and probably sufficient test for finding instrumental variables. Working paper.

3. Sharma, Hofman, and Watts (2016). Split-door criterion for causal identification: An algorithm for finding natural experiments. Under review at Annals of Applied Statistics (AOAS).

4. Sharma, Hofman, and Watts (2015). Estimating the causal impact of recommendation systems from observational data. In Proceedings of the 16th ACM Conference on Economics and Computation.

102

References

1. Angrist and Pischke (2008). Mostly harmless econometrics: An empiricist’s companion. Princeton Univ. Press.
2. Belluf, Xavier and Giglio (2012). Case study on the business value impact of personalized recommendations on a large online retailer. In Proc. ACM Conf. on Recommender Systems.
3. Carmi, Oestreicher-Singer and Sundararajan (2012). Is Oprah contagious? Identifying demand spillovers in online networks. SSRN 1694308.
4. Dunning (2012). Natural experiments in the social sciences: A design-based approach. Cambridge University Press.
5. Gomez-Uribe and Hunt (2015). The Netflix recommender system: Algorithms, business value and innovation. ACM Transactions on Management Information Systems.
6. Lee and Hosanagar (2014). When do recommender systems work the best? The moderating effects of product attributes and consumer reviews on recommender performance. In Proc. ACM World Wide Web Conference.
7. Lin, Goh and Heng (2013). The demand effects of product recommendation networks: An empirical analysis of network diversity and stability. SSRN 2389339.
8. Linden, Smith and York (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing.
9. Mulpuru (2006). What you need to know about third-party recommendation engines. Forrester Research.
10. Oestreicher-Singer and Sundararajan (2012). The visible hand? Demand effects of recommendation networks in electronic markets. Management Science.
11. Pearl (2009). Causality: Models, reasoning and inference. Cambridge Univ. Press.
12. Sharma and Yan (2013). Pairwise learning in recommendation: Experiments with community recommendation on LinkedIn. In ACM Conf. on Recommender Systems.
