UTS: Engineering and Information Technology
UTS CRICOS PROVIDER CODE: 00099F feit.uts.edu.au
Causal Analysis for Recommender Systems
Qian Li, School of Software & Advanced Analytics Institute
2 Causal Analysis for Recommender Systems 14/4/2019
Agenda
Causality notations
Causality inference with potential outcome
Causal effect learning
Causal recommendation
Open questions
Causality notations
Causation ≠ Statistical Association
[Figure: high temperature causes both high ice cream sales and high electric bills (causation). Ice cream sales and electric bills are strongly associated, but this is association, not causation.]
Causality notations
Causal analysis
• Goal: assess the causal effect of some potential cause (e.g., an institution, intervention, policy, or event) on some outcome
• Two kinds of causal questions
  • Learn causal effect (causal inference)
    • What is the effect of the treatment on the outcome?
    • If the opposite treatment had been received, how would the outcome differ?
  • Discover causal relationship (causal discovery)
    • Does a causal relationship exist between two variables?
    • Which variable's value could we modify to change the value of another variable?
• Applications
  • Medical science
  • Education
  • Economics…
Causality notations
Causal models [Pearl 2009]
• Structural causal models (SCM)
  • Causal graph G = (V, E) is a directed acyclic graph (DAG)
  • Structural equations specify the causal graph
  • Used for causal discovery and causal inference
• Potential outcome framework
  • Treatment-outcome pair (D, Y) and the SUTVA assumption
  • Used for causal inference
[Figure: a causal graph and its structural equations]
Causality notations
• Notation
  • Treatment: the variable to be manipulated
  • Outcome: the variable observed in response
  • Confounder: a variable that influences both treatment and outcome
• Example: How do Yelp ratings influence potential restaurant customers?
[Figure: the confounder (restaurant type) influences both the treatment and the outcome (e.g., the number of customers); the arrow from treatment to outcome is the causality of interest]
Causality inference with potential outcome
• Treatment
  • Active treatment and control treatment (not getting the treatment)
  • $D_u \in \{0, 1\}$: treatment variable for instance $u$
    $D_u = \begin{cases} 1 & \text{active treatment} \\ 0 & \text{control treatment} \end{cases}$
• Potential outcomes
  • $Y_u^D$: outcome variable that would be realized if the treatment for $u$ had been set to $D$
    $Y_u^D = \begin{cases} Y_u^1 & \text{active treatment} \\ Y_u^0 & \text{control treatment} \end{cases}$
• Observed/factual outcomes
  • $Y_u$: observed outcome variable of interest for instance $u$
[Figure: causal graph over X, D, and Y]
Causality inference with potential outcome
• Stable Unit Treatment Value Assumption (SUTVA)
  • Observed outcomes are realized as
    $Y_u = D_u \cdot Y_u^1 + (1 - D_u) \cdot Y_u^0$, i.e., $Y_u = \begin{cases} Y_u^1 & \text{if } D_u = 1 \\ Y_u^0 & \text{if } D_u = 0 \end{cases}$
  • Potential outcomes for $u$ are unaffected by the treatment assignment for any other instance $v$
• Causal effect
  • The individual causal effect (individual treatment effect, ITE) of the treatment on the outcome for instance $u$ is the difference between its two potential outcomes
    $\tau_u = Y_u^1 - Y_u^0$
Causality inference with potential outcome
• Example of ITE
• Average Treatment Effect (ATE)
  • Averages the causal effect over the whole population
    $\tau_{ATE} = \mathbb{E}_{u \in U}[\tau_u] = \mathbb{E}_{u \in U}[Y_u^1 - Y_u^0]$
  • ATE only requires querying interventional distributions, not counterfactuals
  • $\tau_{ATE}$ is still unidentified
Causality inference with potential outcome
• Average Treatment Effect on the Treated (ATT)
  $\tau_{ATT} = \mathbb{E}[\tau_u \mid D_u = 1] = \mathbb{E}[Y_u^1 - Y_u^0 \mid D_u = 1]$
• Imagine a study population with 5 units:

  u    D_u    Y_u^0    Y_u^1    Y_u^1 - Y_u^0
  a    0      3        5         2
  b    1      2        5         3
  c    1      5        4        -1
  d    0      2        7         5
  e    1      1        2         1

• From the table, ATE = (2 + 3 - 1 + 5 + 1) / 5 = 2 and ATT = (3 - 1 + 1) / 3 = 1.
• In practice, not all potential outcomes are observed: each unit reveals only the outcome under the treatment it actually received.
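The ATE and ATT of the five-unit example can be checked with a few lines of Python. This is only a sketch: the two potential-outcome columns are read so that the difference column equals $Y_u^1 - Y_u^0$ (the first value is $Y_u^0$, the second $Y_u^1$), and the names `units`, `ite`, `ate`, and `att` are illustrative.

```python
# Five-unit toy population: unit -> (D, Y0, Y1).
units = {
    "a": (0, 3, 5),
    "b": (1, 2, 5),
    "c": (1, 5, 4),
    "d": (0, 2, 7),
    "e": (1, 1, 2),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Individual treatment effects tau_u = Y_u^1 - Y_u^0.
ite = {u: y1 - y0 for u, (d, y0, y1) in units.items()}

# ATE averages over everyone; ATT averages over treated units only.
ate = mean(ite.values())
att = mean(ite[u] for u, (d, _, _) in units.items() if d == 1)

print(ate, att)  # 2.0 1.0
```

The computation needs both potential outcomes for every unit, which is exactly what real data never provides.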
Causality inference with potential outcome
• How to approximate ATT with observed data?
  $\mathbb{E}[Y_u \mid D_u = 1] - \mathbb{E}[Y_u \mid D_u = 0]$ (computed from observed data)
  $= \mathbb{E}[Y_u^1 \mid D_u = 1] - \mathbb{E}[Y_u^0 \mid D_u = 0]$
  $= \underbrace{\mathbb{E}[Y_u^1 \mid D_u = 1] - \mathbb{E}[Y_u^0 \mid D_u = 1]}_{\tau_{ATT}} + \underbrace{\mathbb{E}[Y_u^0 \mid D_u = 1] - \mathbb{E}[Y_u^0 \mid D_u = 0]}_{\text{Bias}}$
• If Bias = 0, ATT can be computed directly from the observed data:
  $\mathbb{E}[Y_u^1 \mid D_u = 1] - \mathbb{E}[Y_u^0 \mid D_u = 1] = \mathbb{E}[Y_u \mid D_u = 1] - \mathbb{E}[Y_u \mid D_u = 0]$
• Bias ≠ 0 if selection into treatment is associated with the potential outcomes
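The decomposition "observed difference = ATT + Bias" can be verified numerically. The sketch below reuses the five-unit study population as hypothetical data, stored as (D, Y0, Y1) triples; the variable names are illustrative and both potential outcomes are assumed known only for the purpose of the check.

```python
# Hypothetical population: (D, Y0, Y1) per unit.
units = [(0, 3, 5), (1, 2, 5), (1, 5, 4), (0, 2, 7), (1, 1, 2)]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


treated = [(y0, y1) for d, y0, y1 in units if d == 1]
control = [(y0, y1) for d, y0, y1 in units if d == 0]

# Observed outcomes: Y = Y1 for treated units, Y = Y0 for control units.
naive = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in control)

# ATT uses the (normally unobserved) Y0 of the treated units.
att = mean(y1 - y0 for y0, y1 in treated)

# Selection bias: difference in Y0 between treated and control units.
bias = mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in control)

print(round(naive, 4), round(att, 4), round(bias, 4))
```

Here the naive difference is about 1.17 while the ATT is 1, so the gap of about 0.17 is the bias term.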
Selection Bias
• Example: Job training program for the disadvantaged
  • Participants are self-selected from a subpopulation of individuals in difficult labor situations
  • Post-training earnings for participants would be lower than those for nonparticipants in the absence of the program, i.e.,
    $\mathbb{E}[Y_u^0 \mid D_u = 1] - \mathbb{E}[Y_u^0 \mid D_u = 0] < 0$
  • A simple comparison of earnings between people who did and did not receive job training may give the wrong answer.
Causal effect learning
Selection Bias
• Homogeneity is one solution
  $\tau_u = \tau$ for all instances $u$
  i.e., $Y_u^1$ (or $Y_u^0$) is the same for every instance, even for those that do not receive the corresponding treatment.
  • This is sometimes plausible in the physical sciences
  • Unfortunately, it is rarely true in the social sciences
• A treatment assignment mechanism can reduce selection bias and simulate homogeneity so as to estimate average causal effects.
Causal effect learning
Treatment Assignment Mechanism
• Definition: the treatment assignment mechanism is the procedure that determines which instances are selected for treatment.
• Three mechanisms
  • No variable influences treatment
    • Random assignment
      • Whether instance $u$ receives treatment $D_u$ is determined by a coin flip or a random number, thus $(Y_u^1, Y_u^0) \perp D_u$
      • Bias = 0, so ATT $= \mathbb{E}[Y_u \mid D_u = 1] - \mathbb{E}[Y_u \mid D_u = 0]$
      • $\mathbb{E}[Y_u^1 - Y_u^0 \mid D_u = 1] = \mathbb{E}[Y_u^1 - Y_u^0]$, so ATE = ATT
      • Disadvantages: noncompliance, expensive, suboptimal randomization
[Figure: causal graph with D pointing to Y]
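A small simulation can contrast random assignment with self-selection. This is a sketch under stated assumptions, not a real study: baseline outcomes are drawn from a hypothetical Gaussian, the treatment effect is a constant 2, and in the self-selected case units with a low baseline outcome opt into treatment (mimicking the job training example).

```python
import random

rng = random.Random(0)
n = 20000
tau = 2.0  # hypothetical constant treatment effect

# Potential outcomes for every unit (never jointly observed in practice).
y0 = [rng.gauss(5.0, 1.0) for _ in range(n)]
y1 = [v + tau for v in y0]


def naive_difference(assign):
    """E[Y | D = 1] - E[Y | D = 0] computed from observed outcomes only."""
    treated = [y1[u] for u in range(n) if assign[u] == 1]
    control = [y0[u] for u in range(n) if assign[u] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Random assignment: a coin flip, independent of the potential outcomes.
coin = [rng.randint(0, 1) for _ in range(n)]

# Self-selection: units with a low baseline outcome choose treatment.
self_selected = [1 if v < 5.0 else 0 for v in y0]

random_estimate = naive_difference(coin)
selected_estimate = naive_difference(self_selected)
print(round(random_estimate, 2), round(selected_estimate, 2))
```

Under the coin flip the naive difference lands close to the true effect of 2, while under self-selection it is pulled well below 2 by the negative bias term.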
Causal effect learning
Treatment Assignment Mechanism
• Three mechanisms
  • Variables influence treatment
    • Selection for observed confounder
    • Selection for unobserved confounder
[Figure: confounder X (restaurant type) influences both treatment D (Yelp ratings) and outcome Y (number of customers)]
Causal effect learning
Assignment Mechanism
• Three mechanisms
  • Selection for observed confounder
    • Observed variables (covariates) influence treatment
    • Observed variables relate to the instance, e.g., age, sex, or earnings.
    • For "Job training program for the disadvantaged", earnings before training is one observed variable that must be considered
    • Treatment is "as-if" random after statistical control
    • Propensity score matching
      • Simulate "homogeneity" by matching the propensity scores of treatment
      • Assume the outcomes of two instances under treatment D are the same if they have similar estimated propensity scores.
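Propensity score matching can be sketched on a toy data set with one discrete observed covariate. Everything below is hypothetical: the data, the frequency-based propensity estimate (real applications usually fit a model such as logistic regression), and the function names. Each treated unit is matched to the control units with the closest propensity score, and their mean outcome stands in for the missing $Y_u^0$.

```python
from collections import defaultdict

# Hypothetical observations: (covariate x, treatment d, observed outcome y).
data = [
    (0, 1, 3.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 4.0),
]


def estimate_propensity(rows):
    """Estimate e(x) = P(D = 1 | X = x) by the treated fraction per stratum."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, total]
    for x, d, _ in rows:
        counts[x][0] += d
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


def naive_difference(rows):
    """Unadjusted difference in mean observed outcomes."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def matched_att(rows, propensity):
    """ATT estimate from nearest-propensity matching (ties are averaged)."""
    controls = [(propensity[x], y) for x, d, y in rows if d == 0]
    effects = []
    for x, d, y in rows:
        if d != 1:
            continue
        e = propensity[x]
        best = min(abs(e - ec) for ec, _ in controls)
        matches = [yc for ec, yc in controls if abs(e - ec) == best]
        effects.append(y - sum(matches) / len(matches))
    return sum(effects) / len(effects)


propensity = estimate_propensity(data)
naive = naive_difference(data)
att = matched_att(data, propensity)
print(propensity, naive, att)
```

In this toy data the effect is 2 within each stratum; the naive difference is 3.5 because treated units are concentrated in the high-outcome stratum, and matching recovers 2.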
Causal effect learning
Assignment Mechanism
• Three mechanisms
  • Selection for unobserved confounder
    • Unobserved variables influence treatment
    • Prior knowledge may imply that some confounding variables are not measurable
    • Insufficient evidence to confirm the causal relationships between all observed variables.
    • Methodologies
      • Instrumental variable method
      • Mediator variable method
      • Regression discontinuity design
[Figure: causal graph over X, D, and Y with special causal variables]
Causal recommendation
• Causal inference
  • Expose a patient to a treatment
  • Biased data from observational studies
  • What would happen if a patient received a treatment?
• Recommendation
  • Expose a user to an item
  • Biased data from logged user behavior
  • What would happen if a user was recommended an item?
Estimation of causal effects often starts with studying the treatment assignment mechanism
Causal recommendation
Assignment mechanism in recommendation
• The assignment mechanism determines which items the users interact with (e.g., watching movies or clicking)
• Matrix factorization recommendation predicts unseen ratings by
  $y_{ui}(a) = \theta_u^\top \beta_i \cdot a + \epsilon_{ui}, \quad \epsilon_{ui} \sim \mathcal{N}(0, \sigma^2), \quad a = \begin{cases} 1 & u \text{ interacts with } i \\ 0 & \text{otherwise} \end{cases}$
• Underlying assumptions
  • The outcome (a user's rating of a movie, observed or not) is independent of the treatment assignment (whether the user has watched the movie)
  • Users are exposed to all items
• Problems
  • Users do not interact with items at random (the Missing Not At Random problem)
  • Overestimate the effect of the unclicked items
[Figure: treatment D (whether the user interacts) pointing to outcome Y (ratings)]
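The Missing Not At Random problem can be illustrated with a short simulation. The sketch assumes a hypothetical mechanism in which a user interacts with an item with probability r/5, where r is the true rating, so well-liked items are observed more often; the names `true_ratings` and `observed` are illustrative only.

```python
import random

rng = random.Random(0)

# Hypothetical population of user-item pairs with true ratings 1..5.
true_ratings = [r for r in (1, 2, 3, 4, 5) for _ in range(4000)]

# Assumed MNAR mechanism: interaction probability grows with the rating.
observed = [r for r in true_ratings if rng.random() < r / 5]

true_mean = sum(true_ratings) / len(true_ratings)
observed_mean = sum(observed) / len(observed)
print(true_mean, round(observed_mean, 2))
```

The mean of the observed ratings sits clearly above the true mean of 3, so a model fit only to observed ratings inherits this optimism about items the user never interacted with.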
Causal recommendation
Confounders learning
• Confounders that influence treatment should be modeled to debias the ratings
  • Variables that influence users' interactions and ratings, e.g., the exposure model
• Two assignment mechanisms to learn confounders
  • Selection for observed confounders
  • Selection for unobserved confounders
[Figure: confounder X (exposure model) influences both treatment D (which movies to watch) and outcome Y (ratings)]
Causal recommendation
Notations
• Problem formulation
  • Treatment:
    $D_{ui} = \begin{cases} 1 & \text{user } u \text{ interacts with } i \\ 0 & \text{otherwise} \end{cases}$
  • Exposure confounder:
    $a_{ui} = \begin{cases} 1 & i \text{ is exposed to } u \\ 0 & \text{otherwise} \end{cases}$
  • Rating outcome:
    $y_{ui}(D_{ui}) = \begin{cases} \text{rating} & D_{ui} = 1 \\ 0 & D_{ui} = 0 \end{cases}$
• Causal recommendation first computes confounders and then uses them to guide rating prediction
[Figure: exposure $a_{ui}$ influences both $D_{ui}$ and $y_{ui}$; the intervention is applied to $D_{ui}$]
Causal recommendation
Selection for observed confounder [Liang et al. 2016]
• A simple task: debias the click outcome with an exposure model
• In what follows, $y_{ui}$ denotes click data, which is in fact $D_{ui}$
• How to model exposure and predict an unbiased interaction outcome?
[Figure: the general graph (exposure confounder $a_{ui}$, interaction treatment $D_{ui}$, rating outcome $y_{ui}$) reduces to unknown exposure $a_{ui}$ pointing to click data $D_{ui}$]
Causal recommendation
Selection for observed confounders [Liang et al. 2016]
[Figure: user Alice, who has a limited budget, and three movies. Observed data: a movie she sees and acts on (e.g., view, rating). Unobserved data: a movie she sees but dislikes (no actions), and a movie she does not see (no actions), which she may like or dislike.]
Causal recommendation
Selection for observed confounders [Liang et al. 2016]
• Task: debias click outcome with exposure model
• Partially unobserved exposure data $a_{ui}$ (exposure indicator), with $\mu_{ui}$ the prior probability of exposure
  • Popularity-based exposure: $a_{ui} \sim \mathrm{Bernoulli}(\mu_{ui})$
  • Location-based exposure: $a_{ui} \sim \mathrm{Bernoulli}(\sigma(x_u^\top l_i))$
  • Others: social-network exposure…
• Define the outcome (click data) $y_{ui}$ by treatment assignment
  $y_{ui} \mid a_{ui} = 1 \sim \text{exp-fam}(\theta_u^\top \beta_i)$
  $y_{ui} \mid a_{ui} = 0 \sim \delta_0$
  • A missing click ($y_{ui} = 0$) has two possible explanations: the user didn't see the item, or the user chose not to click on it.
• Infer hidden variables: user factors $\theta_u$, item factors $\beta_i$, and $\mu_{ui}$
Causal recommendation
Selection for observed confounders [Liang et al. 2016]
• Task: debias click outcome with exposure
• Random initialization: $\theta_u$, $\beta_i$, $\mu_{ui}$ (or exposure variates $\psi$)
• E-step: compute the expectation of exposure (the propensity score)
  $\mathbb{E}[a_{ui} \mid y_{ui} = 0, \theta, \beta, \mu] = \frac{p(y_{ui} = 0 \mid a_{ui} = 1)\, p(a_{ui} = 1)}{\sum_{a_{ui}} p(y_{ui} = 0 \mid a_{ui})\, p(a_{ui})} = p(a_{ui} = 1 \mid y_{ui} = 0, \theta, \beta, \mu)$
  $p_{ui} = p(a_{ui} = 1 \mid y_{ui} = 0, \theta_u, \beta_i, \mu_{ui})$
• Construct the function
  $\mathcal{L} = \sum_{(u,i) \in \mathcal{O}} \frac{1}{p_{ui}} (y_{ui} - \theta_u^\top \beta_i)^2 + \lambda_\theta \sum_u \|\theta_u\|_2^2 + \lambda_\beta \sum_i \|\beta_i\|_2^2$
• M-step: minimize $\mathcal{L}$, i.e., $(\theta, \beta, \mu) = \arg\min \mathcal{L}$, and update the parameters
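The E-step posterior $p(a_{ui} = 1 \mid y_{ui} = 0)$ follows from Bayes' rule and can be sketched as below. The sketch assumes a Gaussian member of the exponential family for $y_{ui} \mid a_{ui} = 1$ with a hypothetical `precision` parameter, and a point mass at zero for $y_{ui} \mid a_{ui} = 0$; the function names are illustrative, not taken from any library.

```python
import math


def gaussian_pdf(x, mean, precision):
    """Density of N(mean, 1 / precision) evaluated at x."""
    return math.sqrt(precision / (2 * math.pi)) * math.exp(
        -0.5 * precision * (x - mean) ** 2
    )


def exposure_posterior(theta_u, beta_i, mu_ui, precision=1.0):
    """p(a_ui = 1 | y_ui = 0) for an unclicked user-item pair."""
    score = sum(t * b for t, b in zip(theta_u, beta_i))
    like_exposed = gaussian_pdf(0.0, score, precision)  # p(y = 0 | a = 1)
    like_unexposed = 1.0  # p(y = 0 | a = 0): point mass at zero
    numerator = like_exposed * mu_ui
    return numerator / (numerator + like_unexposed * (1.0 - mu_ui))


# An item the factors say the user would like: no click suggests no exposure.
p_liked = exposure_posterior([1.0, 1.0], [1.5, 1.5], mu_ui=0.5)

# An item with a zero preference score: no click is consistent with exposure.
p_neutral = exposure_posterior([1.0, 1.0], [0.0, 0.0], mu_ui=0.5)

print(round(p_liked, 4), round(p_neutral, 4))
```

The posterior is small when the predicted preference is high, which matches the intuition that an unclicked but well-matched item was probably never seen.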
Causal recommendation
Selection for observed confounders [Liang et al. 2016]
• Task: debias click outcome with location (topic) exposure
• Update the exposure prior $\mu_i$
  • For popularity-based exposure, $\mu_i \sim \mathrm{Beta}(\alpha_1, \alpha_2)$
    $\mu_i^{new} \leftarrow \frac{\alpha_1 + \sum_u p_{ui} - 1}{\alpha_1 + \alpha_2 + U - 2}$
  • For location (topic)-based exposure, $\mu_{ui} = \sigma(\psi_u^\top x_i)$, where $x_i$ is inferred from the data
    $\psi_u^{new} \leftarrow \psi_u + \eta \nabla_{\psi_u} \mathcal{L}$
• Prediction of click data
  $\mathbb{E}_y[y_{ui} \mid \theta_u, \beta_i] = \mathbb{E}_a[\mathbb{E}_y[y_{ui} \mid \theta_u, \beta_i, a_{ui}]] = \mu_{ui} \cdot \theta_u^\top \beta_i$
• Summary
  • Apply the assignment mechanism in causal inference
  • Propose an extendable framework with various forms of exposure models
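The popularity-based prior update and the click prediction can be written as two small functions. This is a sketch with hypothetical inputs: `p_column` holds the posteriors $p_{ui}$ of one item over all $U$ users, and the factor values are made up for illustration.

```python
def update_popularity_prior(p_column, alpha1, alpha2):
    """mu_i <- (alpha1 + sum_u p_ui - 1) / (alpha1 + alpha2 + U - 2)."""
    num_users = len(p_column)
    return (alpha1 + sum(p_column) - 1.0) / (alpha1 + alpha2 + num_users - 2.0)


def predict_click(mu_ui, theta_u, beta_i):
    """E[y_ui | theta_u, beta_i] = mu_ui * theta_u^T beta_i."""
    return mu_ui * sum(t * b for t, b in zip(theta_u, beta_i))


# Hypothetical posteriors for one item over four users.
p_column = [1.0, 1.0, 0.2, 0.3]
mu_i = update_popularity_prior(p_column, alpha1=1.0, alpha2=1.0)

# Hypothetical user and item factors with inner product 1.0.
prediction = predict_click(mu_i, theta_u=[0.5, 1.0], beta_i=[1.0, 0.5])
print(round(mu_i, 3), round(prediction, 3))
```

With a uniform Beta(1, 1) prior the update reduces to the average posterior exposure, here 2.5 / 4 = 0.625, and the predicted click scales the preference score by that exposure probability.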
Causal recommendation
Selection for observed confounder
• Task: debias the click model with social exposure [Wang et al. 2018]
  • Most existing social recommender systems assume that connected users are more likely to have similar preferences.
  • Relax the "similar preference" assumption for social recommendation
  • Use social information on the exposure rather than the preference
[Figure: unknown exposure $a_{ui}$ pointing to click data $D_{ui}$]
Causal recommendation
Selection for observed confounders [Wang et al. 2018]
[Figure: user Alice and three movies. Observed data: a movie she sees and acts on (e.g., view, rating). Unobserved data: movies with no actions, either seen and disliked or not seen. She may know an unseen film from Bob, and her action may then be dislike; the outcome might be changed by online social data, i.e., friends may not share similarities.]
Causal recommendation
Selection for observed confounders [Wang et al. 2018]
• Task: debias the click model with social exposure
• Rating component
  • Update $\theta_u$, $\beta_i$ by the EM algorithm
• Social exposure component
  • Social regularization
    $\mu_{ui} = X_u^\top T_i + \gamma_i$
  • Update $\mu_{ui}$ with social regularization in one EM iteration
Causal recommendation
Selection for observed confounders [Wang et al. 2018]
• Social boosting
  • Maximizing the log likelihood with respect to $\mu_{ui}$ is equivalent to finding the mode of the complete conditional
• Prediction of click data
  $\mathbb{E}_y[y_{ui} \mid \theta_u, \beta_i] = \mathbb{E}_a[\mathbb{E}_y[y_{ui} \mid \theta_u, \beta_i, a_{ui}]] = \mu_{ui} \cdot \theta_u^\top \beta_i$
Causal recommendation
Selection for unobserved confounder [Wang and Liang et al. 2018]
• Task: debias counterfactual rating with unobserved confounders
  • Confounders might be difficult (or impossible) to measure and observe
  • Propensity models rely on either observed ratings of a missing-completely-at-random sample or externally observed user and item covariates
[Figure: left, unobserved exposure $a_{ui}$ influences $D_{ui}$ and $y_{ui}$, with an intervention on $D_{ui}$; right, the unobserved exposure is replaced by an estimate $\hat{a}_{ui}$, and the counterfactual rating is obtained under the intervention $D_{ui} = 1$]
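Causal recommendation
Selection for unobserved confounders [Wang and Liang et al. 2018]
• The deconfounded recommender
  • Task: debias counterfactual rating
  • Exposure model: $\hat{a}_{ui}$ is constructed from Poisson factorization (PF)
    $\hat{a}_{ui} = \mathbb{E}_{PF}[U_u^\top V_i \mid \boldsymbol{D} = \boldsymbol{d}]$
  • The outcome model is conditional on the substitute confounders $\hat{a}$
    $y_{ui}(D_{ui}) = \theta_u^\top \beta_i \cdot D_{ui} + \gamma_u \cdot \hat{a}_{ui} + \epsilon_{ui}, \quad \epsilon_{ui} \sim \mathcal{N}(0, \sigma^2)$
[Figure: Poisson factorization with user factors $U_u$ (hyperparameters $c_1, c_2$) and item factors $V_i$ (hyperparameters $c_3, c_4$) yields $\hat{a}_{ui}$ for the graph over $a_{ui}$, $D_{ui}$, and $y_{ui}$]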
Causal recommendation
Selection for unobserved confounders [Wang and Liang et al. 2018]
• The deconfounded recommender
  • Model ratings $y_{ui}(D_{ui})$ by
    $y_{ui}(D_{ui}) \sim p(\cdot \mid m(\theta_u^\top \beta_i, D_{ui}) + \gamma_u \cdot \hat{a}_{ui} + \beta_0,\ v(\theta_u^\top \beta_i, D_{ui}))$
    where $m(\cdot)$ and $v(\cdot)$ are the mean and variance of $y_{ui}(D_{ui})$
  • Fit the outcome model $y_{ui}(D_{ui})$ by maximum a posteriori estimation
[Figure: graph over $a_{ui}$, $\hat{a}_{ui}$, $D_{ui}$, and $y_{ui}$]
Causal recommendation
Selection for unobserved confounders [Wang and Liang et al. 2018]
• The deconfounded recommender
  • Predict all potential ratings $y_{ui}(1)$ under the intervention $D_{ui} = 1$
    • Existing users for unseen items
    • New users for unseen items
  • This relies solely on the observed ratings, without a gold-standard randomized exposure or external covariates.
[Figure: graph over $a_{ui}$, $\hat{a}_{ui}$, $D_{ui}$, and $y_{ui}$ with the intervention $D_{ui} = 1$]
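Predicting a potential rating under the intervention $D_{ui} = 1$ amounts to evaluating the mean of the linear outcome model $y_{ui}(D_{ui}) = \theta_u^\top \beta_i \cdot D_{ui} + \gamma_u \cdot \hat{a}_{ui} + \epsilon_{ui}$. The sketch below assumes the factors, the coefficient $\gamma_u$, and the substitute confounder $\hat{a}_{ui}$ have already been fitted; all numeric values and the function name `potential_rating` are hypothetical.

```python
def potential_rating(theta_u, beta_i, gamma_u, a_hat_ui, d_ui):
    """Mean of y_ui(D_ui) = theta_u^T beta_i * D_ui + gamma_u * a_hat_ui."""
    preference = sum(t * b for t, b in zip(theta_u, beta_i))
    return preference * d_ui + gamma_u * a_hat_ui


# Hypothetical fitted quantities for one user-item pair.
theta_u = [1.0, 2.0]
beta_i = [0.5, 1.0]
gamma_u = 0.4
a_hat_ui = 0.5  # substitute confounder from the exposure model

# Potential rating if the user were made to interact with the item.
y_if_treated = potential_rating(theta_u, beta_i, gamma_u, a_hat_ui, d_ui=1)

# Potential outcome without interaction: only the confounder term remains.
y_if_untreated = potential_rating(theta_u, beta_i, gamma_u, a_hat_ui, d_ui=0)

print(round(y_if_treated, 2), round(y_if_untreated, 2))
```

The difference between the two calls is the preference term $\theta_u^\top \beta_i$, i.e. the part of the rating attributed to the interaction itself rather than to exposure.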
Causal recommendation
Selection for unobserved confounders [Schnabel et al. 2016]
• Task: debiasing learning and recommendation quality evaluation
  • Empirical Risk Minimization (ERM) framework
  • Propensity estimation for observational data
    a) Propensity estimation via Naive Bayes (similar to [Liang et al. 2016])
       $p(D_{ui} = 1 \mid y_{ui} = r) = \frac{p(y_{ui} = r \mid D_{ui} = 1)\, p(D_{ui} = 1)}{p(y_{ui} = r)}$
    b) Propensity estimation via logistic regression (similar to [Wang et al. 2018])
       $p_{ui} = \sigma(w^\top x_{ui} + \beta_i + \gamma_u)$
  • Inverse-Propensity-Scoring (IPS) for evaluating recommendation quality
    $\hat{R}_{IPS}(\hat{y} \mid p) = \frac{1}{U \cdot I} \sum_{(u,i) \in \mathcal{O}} \frac{\delta_{ui}(y_{ui}, \hat{y}_{ui})}{p_{ui}}$
    where $\delta_{ui}$ is a loss such as MAE or MSE
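The IPS estimator can be sketched as a weighted sum over the observed user-item pairs. The example assumes squared error as the loss $\delta_{ui}$ and uses a hypothetical 2 x 2 rating matrix with made-up propensities; `ips_risk` and `naive_risk` are illustrative names.

```python
def ips_risk(observed, predictions, propensities, num_users, num_items):
    """(1 / (U * I)) * sum over observed pairs of loss / propensity."""
    total = 0.0
    for (u, i), y in observed.items():
        loss = (y - predictions[(u, i)]) ** 2  # squared error; MAE would use abs()
        total += loss / propensities[(u, i)]
    return total / (num_users * num_items)


def naive_risk(observed, predictions):
    """Plain average loss over observed pairs, ignoring how they were selected."""
    losses = [(y - predictions[pair]) ** 2 for pair, y in observed.items()]
    return sum(losses) / len(losses)


# Hypothetical data: two users, two items, two observed ratings.
observed = {(0, 0): 5.0, (1, 1): 1.0}
predictions = {(0, 0): 4.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 2.0}
propensities = {(0, 0): 0.5, (1, 1): 0.25}

ips = ips_risk(observed, predictions, propensities, num_users=2, num_items=2)
naive = naive_risk(observed, predictions)
print(ips, naive)  # 1.5 1.0
```

The pair with propensity 0.25 counts four times, standing in for the similar pairs that were never observed, which is why the IPS estimate differs from the naive average.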
Causal recommendation
Selection for unobserved confounders [Schnabel et al. 2016]
• Debiasing learning and evaluation
  • Empirical Risk Minimization (ERM) framework
    • Use the estimated propensity $p_{ui}$ to optimize the recommendation quality estimator
      $\hat{y}^{ERM} = \arg\min_{\hat{y}} \hat{R}_{IPS}(\hat{y} \mid p) = \arg\min_{\hat{y}} \frac{1}{U \cdot I} \sum_{(u,i) \in \mathcal{O}} \frac{\delta_{ui}(y_{ui}, \hat{y}_{ui})}{p_{ui}}$
    • $L_2$-regularized matrix factorization, assuming $\hat{y}^{ERM} = V^\top W + A$
      $\arg\min_{V, W, A} \left[ \sum_{O_{u,i} = 1} \frac{\delta_{u,i}(Y, V^\top W + A)}{p_{u,i}} + \lambda \left( \|V\|_F^2 + \|W\|_F^2 \right) \right]$
    • Conventional incomplete matrix factorization is the special case $p_{u,i} = 1$
    • Reweighting the samples by the inverse of the probability of receiving the treatment can alleviate the confounding bias
Causal recommendation
• Other treatment assignments
  • Observed confounders
    • Regression adjustment
      • Fit a single function to estimate $P(Y \mid S, D)$
      • Infer the counterfactual outcome from the fitted function, evaluated at the opposite treatment
  • Unobserved confounders
    • Mediator variable method
Causal recommendation
• Other treatment assignments
  • Unobserved confounders
    • Instrumental variable method
      • Two-stage regressions
        • Fit $\hat{D}_{\cdot j} = f_{D,j}(I_{\cdot j}, X_{\cdot j})$ for each treatment variable $\hat{D}_{\cdot j}$
        • Fit $Y = f_Y(\hat{D}_{\cdot j}, X_{\cdot j})$
    • Regression discontinuity design
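The two-stage regression behind the instrumental variable method can be sketched with simple least squares. Everything here is an assumption made for illustration: a single binary instrument, a single treatment, no covariates X, linear stage functions, and simulated data in which an unobserved confounder `c` drives both treatment and outcome while the true effect of the treatment is 2.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(instrument, treatment, outcome):
    """Stage 1: predict treatment from the instrument. Stage 2: regress outcome on it."""
    intercept, slope = ols(instrument, treatment)
    d_hat = [intercept + slope * z for z in instrument]
    _, effect = ols(d_hat, outcome)
    return effect


rng = random.Random(0)
n = 20000
z = [rng.randint(0, 1) for _ in range(n)]  # instrument, independent of c
c = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved confounder
d = [z[k] + c[k] + rng.gauss(0.0, 0.5) for k in range(n)]
y = [2.0 * d[k] + 3.0 * c[k] + rng.gauss(0.0, 0.5) for k in range(n)]

_, naive_effect = ols(d, y)  # confounded regression of Y on D
iv_effect = two_stage_least_squares(z, d, y)
print(round(naive_effect, 2), round(iv_effect, 2))
```

The direct regression absorbs the confounder's influence and overstates the effect, while the two-stage estimate stays near 2 because the instrument affects the outcome only through the treatment.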
Causal recommendation
Conclusion and open questions
• Conclusion
  • Counterfactual prediction
    • Debias ratings or click outcomes with different exposure models [Nathan et al. 2018]
  • Recommendation policy optimization
    • Unbiased evaluation of a recommendation policy using biased data [Schnabel et al. 2016]
    • How much user activity is caused by the RS itself [Amit et al. 2015, Bonner et al. 2018]
• Open questions
  • Multi-causal inference
    • Current causal recommendation considers a univariate cause rather than multiple causes
  • Learn unobserved confounders in RS
    • How to identify other confounders (not exposure models) using causal prior knowledge
  • Causal relationship discovery for explainable RS
    • Current causal discovery methods may not work well on recommendation data, which are extremely sparse and MNAR
References
• [Pearl 2009] Pearl, Judea. Causality: Models, Reasoning, and Inference. 2nd ed. New York: Cambridge University Press, 2009.
• [Schnabel et al. 2016] Schnabel, Tobias, et al. "Recommendations as treatments: Debiasing learning and evaluation." arXiv preprint arXiv:1602.05352 (2016).
• [Wang and Liang et al. 2018] Wang, Yixin, et al. "The Deconfounded Recommender: A Causal Inference Approach to Recommendation." arXiv preprint arXiv:1808.06581 (2018).
• [Bonner et al. 2018] Bonner, Stephen, and Flavian Vasile. "Causal embeddings for recommendation." Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 2018.
• [Nathan et al. 2018] Kallus, Nathan, Xiaojie Mao, and Madeleine Udell. "Causal inference with noisy and missing covariates via matrix factorization." Advances in Neural Information Processing Systems. 2018.
• [Liang et al. 2016] Liang, Dawen, et al. "Modeling user exposure in recommendation." Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.
• [Wang et al. 2018] Wang, Menghan, et al. "Collaborative Filtering with Social Exposure: A Modular Approach to Social Recommendation." The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18).