Laplacian-regularized graph bandits: Algorithms and theoretical analysis
Kaige Yang1, Xiaowen Dong2 and Laura Toni1
[email protected];xdong.robots.ox.ac.uk;[email protected]
1Department of Electrical and Electronic Engineering, University College London
2Department of Engineering Science, University of Oxford
Abstract
We study contextual multi-armed bandit problems in the case of multiple users, where we exploit the structure in the user domain to reduce the cumulative regret.
• We model user relation as a graph, and assume that the parameters (preferences) of users form smooth signals on the graph.
• Based on a graph Laplacian-regularized estimator, we propose a novel bandit algorithm whose performance depends on a notion of local smoothness on the graph.
• We provide a closed-form solution to the estimator and provide theoretical analysis on the estimation error, single-user upper confidence bound (UCB) and cumulative regret.
• The single-user estimation and UCB also allow us to further propose a low complexity algorithm, whose computational complexity scales linearly with the number of users.
• Our theoretical claims and algorithms are validated and tested empirically upon both synthetic and real-world datasets.
Problem Setting
• Linear contextual bandit (n users and m arms)
• Graph-based bandit
Theorems
• Single-user estimation
• Local smoothness measure
• Estimation error bound
• Single-user UCB
Conclusion
• We have proposed G-UCB and G-UCB SIM, two Laplacian-regularized graph bandit algorithms that exploit the relation between users for more efficient learning.
• As a future direction, we may consider negative edge weights in the adjacency matrix, which relaxes the condition that all user parameters are encouraged to be similar to each other.
• We may also consider a directed graph, which may help take into account the different levels of influence a pair of users may have on each other.
Theorems Validation
Bandit Experiments
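Payoff model and cumulative regret:
$$y = x^T\theta + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2)$$
$$R_T = \sum_{t=1}^{T}\left(x_{i,*}^T\theta_{t,i} - x_t^T\theta_{t,i}\right)$$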
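The regret definition above can be illustrated with a short NumPy sketch. It is not the authors' code: the function name `cumulative_regret`, the array layout, and the random and oracle policies are all assumptions made for illustration. Each round contributes the gap between the best expected payoff and the expected payoff of the arm actually played, evaluated with the true parameter of the user served in that round.

```python
import numpy as np

def cumulative_regret(arm_sets, chosen, theta_true, users):
    """Running sum of (best expected payoff - expected payoff of the played arm).

    arm_sets   : (T, m, d) candidate arm features in each round (hypothetical layout)
    chosen     : (T,) index of the arm played in each round
    theta_true : (n, d) true user parameters, unknown to the learner
    users      : (T,) index of the user served in each round
    """
    T = arm_sets.shape[0]
    per_round = np.zeros(T)
    for t in range(T):
        expected = arm_sets[t] @ theta_true[users[t]]  # x^T theta for every arm
        per_round[t] = expected.max() - expected[chosen[t]]
    return np.cumsum(per_round)

rng = np.random.default_rng(0)
T, m, d, n = 50, 5, 3, 4
arm_sets = rng.normal(size=(T, m, d))
theta_true = rng.normal(size=(n, d))
users = rng.integers(n, size=T)

# A uniformly random policy versus an oracle that knows theta_true.
random_policy = rng.integers(m, size=T)
oracle_policy = np.array(
    [np.argmax(arm_sets[t] @ theta_true[users[t]]) for t in range(T)]
)

regret_random = cumulative_regret(arm_sets, random_policy, theta_true, users)
regret_oracle = cumulative_regret(arm_sets, oracle_policy, theta_true, users)
```

The oracle accumulates no regret by construction, while any other policy produces a non-decreasing curve; a good bandit algorithm should make that curve grow sublinearly in T.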
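Smoothness of Θ on the graph (Laplacian quadratic form):
$$\mathrm{tr}(\Theta^T L \Theta) = \frac{1}{2}\sum_{k=1}^{d}\sum_{i \sim j} W_{ij}\left(\frac{\Theta_{ik}}{D_{ii}} - \frac{\Theta_{jk}}{D_{jj}}\right)^2$$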
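To make the role of the quadratic form concrete, the sketch below evaluates tr(Θ^T L Θ) for a smooth and a non-smooth signal on a small path graph. It deliberately assumes the unnormalized (combinatorial) Laplacian L = D − W, for which the pairwise identity holds without any degree scaling; the poster's formula additionally scales entries by the degrees. All function names here are hypothetical.

```python
import numpy as np

def combinatorial_laplacian(W):
    """Unnormalized Laplacian L = D - W (an assumption made for this sketch)."""
    return np.diag(W.sum(axis=1)) - W

def smoothness(Theta, L):
    """Laplacian quadratic form tr(Theta^T L Theta)."""
    return np.trace(Theta.T @ L @ Theta)

def pairwise_form(Theta, W):
    """0.5 * sum over ordered pairs (i, j) of W_ij * ||theta_i - theta_j||^2."""
    diff = Theta[:, None, :] - Theta[None, :, :]
    return 0.5 * np.sum(W[:, :, None] * diff ** 2)

# Path graph on four users: 0 - 1 - 2 - 3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = combinatorial_laplacian(W)

# Row i of Theta is the d-dimensional parameter of user i.
smooth = np.tile(np.linspace(0.0, 1.0, 4)[:, None], (1, 2))  # varies slowly along the path
rough = np.array([[1, 1], [-1, -1], [1, 1], [-1, -1]], dtype=float)  # flips sign at every edge

smooth_value = smoothness(smooth, L)
rough_value = smoothness(rough, L)
```

Connected users with similar parameters yield a small value, which is why the quadratic form works as a regularizer that pulls neighbouring users towards each other.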
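Laplacian-regularized estimator:
$$\Theta_t = \arg\min_{\Theta \in \mathbb{R}^{n \times d}} \sum_{i=1}^{n} \left\| Y_t(i,:) - \Theta(i,:)X_{t,i} \right\|_F^2 + \alpha\,\mathrm{tr}(\Theta^T L \Theta)$$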
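One way to solve this objective numerically is to set its gradient to zero, which for a symmetric L gives one linear system over all users: A_i θ_i + α Σ_j L_ij θ_j = X_i y_i with A_i = X_i X_i^T. The sketch below stacks these equations with a Kronecker product. It is an illustrative reading rather than the authors' implementation; the symmetric Laplacian, the function names, and the synthetic data are assumptions.

```python
import numpy as np

def laplacian_regularized_estimate(X_list, Y_list, L, alpha):
    """Jointly estimate all user parameters.

    X_list[i] : (d, t_i) features observed by user i
    Y_list[i] : (t_i,) payoffs observed by user i
    L         : (n, n) symmetric graph Laplacian (assumption of this sketch)
    """
    n = len(X_list)
    d = X_list[0].shape[0]
    H = alpha * np.kron(L, np.eye(d))   # block (i, j) equals alpha * L_ij * I_d
    b = np.zeros(n * d)
    for i, (X, y) in enumerate(zip(X_list, Y_list)):
        block = slice(i * d, (i + 1) * d)
        H[block, block] += X @ X.T      # data term A_i on the diagonal block
        b[block] = X @ y
    return np.linalg.solve(H, b).reshape(n, d)

def objective(Theta, X_list, Y_list, L, alpha):
    """Squared fitting error plus the Laplacian penalty."""
    fit = sum(np.sum((y - Theta[i] @ X) ** 2)
              for i, (X, y) in enumerate(zip(X_list, Y_list)))
    return fit + alpha * np.trace(Theta.T @ L @ Theta)

rng = np.random.default_rng(0)
n, d, t = 3, 2, 5
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
theta_true = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]])
X_list = [rng.normal(size=(d, t)) for _ in range(n)]
Y_list = [theta_true[i] @ X_list[i] + 0.1 * rng.normal(size=t) for i in range(n)]

Theta_hat = laplacian_regularized_estimate(X_list, Y_list, L, alpha=1.0)
```

The joint system has n·d unknowns, so solving it directly scales poorly with the number of users, which is the motivation for working with single-user expressions.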
Local smoothness measure:
$$\|\Delta_i\|_2 = \left\| \theta_i - \Big( \sum_{j \neq i}^{n} -L_{ij}\theta_j \Big) \right\|_2$$
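The local smoothness measure compares a user's parameter with a Laplacian-weighted combination of the other users' parameters. The sketch below assumes a random-walk normalized Laplacian L = I − D^{-1}W, so that the weights −L_ij sum to one over the neighbours and the combination is a weighted average. That choice of Laplacian, and the helper names, are assumptions for illustration.

```python
import numpy as np

def random_walk_laplacian(W):
    """L = I - D^{-1} W; assumes every user has at least one neighbour."""
    deg = W.sum(axis=1)
    return np.eye(len(W)) - W / deg[:, None]

def local_smoothness(Theta, L, i):
    """|| theta_i - sum_{j != i} (-L_ij) theta_j ||_2 for user i."""
    weights = -L[i].copy()
    weights[i] = 0.0                      # exclude the j = i term
    neighbour_combination = weights @ Theta
    return np.linalg.norm(Theta[i] - neighbour_combination)

# Fully connected triangle of three users.
W = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
L = random_walk_laplacian(W)

identical = np.ones((3, 2))               # all users share one parameter
outlier = np.array([[5.0, 5.0], [1.0, 1.0], [1.0, 1.0]])  # user 0 deviates

delta_identical = [local_smoothness(identical, L, i) for i in range(3)]
delta_outlier = [local_smoothness(outlier, L, i) for i in range(3)]
```

A user whose parameter matches the neighbourhood average has a measure of zero, while the deviating user has the largest value.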
Estimation error bound:
$$\|\theta_i - \theta_{t,i}\|_2 \le \mathrm{tr}(A_{t,i}^{-2})\left(\alpha\|\Delta_{t,i}\|_2 + \|X_{t,i}\eta_{t,i}\|_2\right)$$
Single-user UCB:
$$\beta_{t,i} = \alpha M_{t,ii}^{-1}\|\Delta_{t,i}\|_2 + \|X_{t,i}\eta_{t,i}\|_{M_{t,ii}^{-1}}$$
Single-user estimation:
$$\theta_{t,i} = (A_{t,i} + \alpha L_{ii} I)^{-1} B_{t,i} + \alpha A_{t,i}^{-1} \sum_{j \neq i}^{n} -L_{ij} A_{t,j}^{-1} B_{t,j}$$
[Figure panels: Smoothness, Noise, Sparsity, Netflix, Movielens, Approximation, UCB]
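Algorithm 1: G-UCB Algorithm
Input: $\alpha$, $T$, empty graph Laplacian $L$, $\theta_{0,i} = 0 \in \mathbb{R}^d$, $M_{0,ii} = I \in \mathbb{R}^{d \times d}$, $\beta_{0,i} = 0$
for $t \in [1, T]$ do
• For the appeared user $i$, calculate $\Delta_{t,i} = \theta_{t-1,i} - \Big(\sum_{j \neq i}^{n} -L_{ij}\theta^{ls}_{t-1,j}\Big)$
• Calculate $\beta_{t,i}$
• Select one arm $x_t$ from the set $\mathcal{D}$ by maximizing $\mathrm{UCB}(i,t) = x^T\theta_{t-1,i} + \beta_{t,i}\|x\|_{M^{-1}_{t-1,ii}}$
• Receive payoff $y_t$, update $Y_{t-1} \to Y_t$ and $X_{t-1,i} \to X_{t,i}$ for all $i$
• Update $\Theta_t$ (and $\theta_{t,i}$)
• Update $\theta^{ls}_{t,i}$ and $M^{-1}_{t,ii}$: $\theta^{ls}_{t,i} = (X_{t,i}X_{t,i}^T)^{-1}X_{t,i}Y_t(i,:)^T$, $M_{t,ii} = X_{t,i}X_{t,i}^T + \alpha L_{ii} I$
• Update $W$ (and $L$) via Gaussian RBF: $W_{ij} = \exp\Big(-\frac{\|\theta_{t,i} - \theta_{t,j}\|_2^2}{2\sigma_w^2}\Big)$, where $\sigma_w$ is the kernel width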
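Two steps of Algorithm 1 translate directly into array operations: the arm selection rule and the Gaussian RBF update of the adjacency matrix. The sketch below shows both under stated assumptions. The function names are hypothetical, β is passed in as a given scalar rather than computed, and self-loops are removed from W, which the poster does not specify.

```python
import numpy as np

def select_arm(arms, theta, M_inv, beta):
    """Index of the arm maximizing x^T theta + beta * ||x||_{M^{-1}}.

    arms  : (m, d) candidate arm features
    theta : (d,) current estimate for the served user
    M_inv : (d, d) inverse of M for that user
    beta  : scalar confidence width (assumed to be computed elsewhere)
    """
    mean = arms @ theta
    # ||x||_{M^{-1}} = sqrt(x^T M^{-1} x), evaluated for every arm at once.
    width = np.sqrt(np.einsum("md,de,me->m", arms, M_inv, arms))
    return int(np.argmax(mean + beta * width))

def rbf_adjacency(Theta, sigma_w):
    """W_ij = exp(-||theta_i - theta_j||^2 / (2 sigma_w^2)) with a zero diagonal."""
    sq_dist = np.sum((Theta[:, None, :] - Theta[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma_w ** 2))
    np.fill_diagonal(W, 0.0)  # assumption: no self-loops
    return W

arms = np.array([[1.0, 0.0], [0.0, 1.0]])
theta = np.array([1.0, 0.0])
M_inv = np.diag([1.0, 100.0])   # the second direction is still poorly explored

greedy_choice = select_arm(arms, theta, M_inv, beta=0.0)
optimistic_choice = select_arm(arms, theta, M_inv, beta=1.0)

Theta = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
W = rbf_adjacency(Theta, sigma_w=1.0)
```

With β = 0 the rule is purely greedy, while a positive β favours directions in which the user has few observations. The RBF update assigns the largest weights to pairs of users whose current estimates are closest.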
$$A_{t,i} = X_{t,i}X_{t,i}^T$$
$$M_{t,ii} \approx X_{t,i}X_{t,i}^T + \alpha L_{ii} I$$
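G-UCB SIM (Low Complexity Algorithm)
If user i has at least one neighbour:
$$\theta_{t,i} = \theta^{ridge}_{t,i} + \alpha A_{t,i}^{-1} \sum_{j \neq i}^{n} -L_{ij}\theta^{ls}_{t,j}$$
If user i has no neighbours:
$$\theta_{t,i} = \theta^{ls}_{t,i}$$
where
$$B_{t,i} = X_{t,i}Y_t(i,:)^T$$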
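The G-UCB SIM update can be sketched as follows. This is an illustrative reading, not the authors' code. It assumes that the ridge term is θ^ridge = (A + αL_ii I)^{-1}B, by analogy with the single-user estimation formula, and that every user has enough observations for A = XX^T to be invertible. The function name and the toy graph are hypothetical.

```python
import numpy as np

def gucb_sim_estimate(i, X_list, Y_list, L, alpha):
    """Simplified single-user estimate for user i.

    X_list[j] : (d, t_j) features observed by user j
    Y_list[j] : (t_j,) payoffs observed by user j
    L         : (n, n) graph Laplacian
    """
    d = X_list[i].shape[0]
    # Per-user least-squares estimates; each one uses only that user's own data.
    theta_ls = np.vstack([np.linalg.solve(X @ X.T, X @ y)
                          for X, y in zip(X_list, Y_list)])
    weights = -L[i].copy()
    weights[i] = 0.0
    if not np.any(weights):               # no neighbours: plain least squares
        return theta_ls[i]
    A_i = X_list[i] @ X_list[i].T
    B_i = X_list[i] @ Y_list[i]
    # Assumed form of the ridge term (see lead-in).
    theta_ridge = np.linalg.solve(A_i + alpha * L[i, i] * np.eye(d), B_i)
    neighbour_term = weights @ theta_ls   # sum_{j != i} (-L_ij) theta_ls_j
    return theta_ridge + alpha * np.linalg.solve(A_i, neighbour_term)

rng = np.random.default_rng(0)
d, t = 2, 6
# Users 0 and 1 are connected; user 2 is isolated.
L = np.array([[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]])
theta_true = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 2.0]])
X_list = [rng.normal(size=(d, t)) for _ in range(3)]
Y_list = [theta_true[j] @ X_list[j] + 0.1 * rng.normal(size=t) for j in range(3)]

theta_connected = gucb_sim_estimate(0, X_list, Y_list, L, alpha=1.0)
theta_isolated = gucb_sim_estimate(2, X_list, Y_list, L, alpha=1.0)
```

Each estimate touches only user i's own Gram matrix and a weighted sum over the other users' least-squares estimates, with no joint system over all users. In a running algorithm those least-squares estimates would be cached rather than recomputed on every call.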