View
2
Download
0
Category
Preview:
Citation preview
Structured Output Learning with AbstentionApplication to Accurate Opinion Prediction
Alexandre GarciaChloé Clavel
Slim EssidFlorence d’Alché-Buc
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on Data Science and AI
July 10, 2018
florence.dalche@telecom-paristech.fr
Short Overview of research activities
Outline
1 Short Overview of research activities
2 Focus on Structured Output Prediction with Abstention
3 Conclusion
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 2 / 25
florence.dalche@telecom-paristech.fr
Short Overview of research activities
Research activities
Laboratory: LTCI (permanent staff: 120)Signal, Statistics and Machine Learning group (20 permanent staff members, 40PhD students)
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 3 / 25
florence.dalche@telecom-paristech.fr
Short Overview of research activities
Focus on complex output learning
How to learn a function from X to Y when Y: set of trees,( labeled) graphs, sequences,functions .... ?
Multi-task learning: Multiple quantile regression
Functional-valued Regression: Infinite-task learning
Structured Output regression: Graph prediction in chemoinformatics
Zero-shot learning: Predict a class/complex object never seen in the training data
New challenges: make it fast and efficient, make it robust and reliable !
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 4 / 25
florence.dalche@telecom-paristech.fr
Short Overview of research activities
How do we solve these problems? solve an easier surrogate problem
(1) Transform your outputs and solve an easier problem in a well chosen output featurespace
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 5 / 25
florence.dalche@telecom-paristech.fr
Short Overview of research activities
How do we solve these problems?
(2) Come back to the original output space by solving a pre-image problem
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 6 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Outline
1 Short Overview of research activities
2 Focus on Structured Output Prediction with AbstentionLearning frameworkEmpirical results
3 Conclusion
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 7 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Learning to label a structure with abstention
Setup : we want to predict the labels of a known target graph structure (encoded bya directed graph).
TripAdvisor review⇒ sentence level opinion annotations
The room was ok, nothingspecial, still a perfectchoice to quickly join themain places.
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 8 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Learning to label a structure with abstention
Setup : we want to predict the labels of a known target graph structure (encoded bya directed graph).
TripAdvisor review⇒ sentence level opinion annotations
The room was OK, nothingspecial, still a perfectchoice to quickly join themain places.
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 8 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Problem
Problem: Error at a node penalizes the prediction of descendants.
The room was ok, nothingspecial, still a perfectchoice to quickly join themain places.
Can we build a mechanism allowing to abstain on difficult nodes?
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 9 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Problem
Problem: Error at a node penalizes the prediction of descendants.
The room was ok, nothingspecial, still a perfectchoice to quickly︸ ︷︷ ︸
Service, Checkin
join
the main places︸ ︷︷ ︸Location
.
Can we build a mechanism allowing to abstain on difficult nodes?
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 9 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Problem
Problem: Error at a node penalizes the prediction of descendants.
The room was ok, nothingspecial, still a perfectchoice to quickly︸ ︷︷ ︸
Service, Checkin
join
the main places︸ ︷︷ ︸Location
.
Can we build a mechanism allowing to abstain on difficult nodes?
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 9 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Problem
Problem: Error at a node penalizes the prediction of descendants.
The room was ok, nothingspecial, still a perfectchoice to quickly︸ ︷︷ ︸
Service, Checkin
join
the main places︸ ︷︷ ︸Location
.
Can we build a mechanism allowing to abstain on difficult nodes?
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 9 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Mathematical setup
X an input sample space.Y the subset of {0, 1}d that contains all possible legal labelings of an outputstructure G.
Goal of Structured Output Learning with Abstention: learn a pair of functions (h, r) fromX to YH,R ⊂ {0, 1}d × {0, 1}d where a predictor h predicts the labels of G and anabstention function r chooses on which components of G to abstain from predicting alabel.
Y? ⊂ {0, 1, a}d is the set of legal labelings with abstention where a denotes theabstention label,
Abstention-aware predictive model f h,r : X → Y? defined by :{f h,r (x)T = [f h,r1 (x), . . . , f
h,rd (x)],
f h,ri (x) = 1h(x)i=11r(x)i=1 + a1r(x)i=0.
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 10 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Learning setup
(xi , yi )i=1,...,n ∼ D are n i.i.d. samples from a distribution P over X × Y.Suppose that we have access to an abstention aware loss ∆a : YH,R × Y → R+then the risk of an abstention aware predictor is:
R(h, r) = Ex,y∼D ∆a(h(x), r(x), y).
Where ∆a can be rewritten under the general form :
∆a(h(x), r(x), y) = 〈ψwa(y),Cψa(h(x), r(x))〉,
With C : Rp → Rq a bounded linear operator and ψa : YH,R → Rp, ψwa : Y → Rq outputembeddings.
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 11 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention
Abstention-aware H-loss (Ha-loss)
∆a(h(x), r(x), y) =d∑
i=1
cAi1{f h,ri =a,fh,rp(i)=yp(i)}︸ ︷︷ ︸
abstention cost
+ cAc i1{f h,ri 6=yi ,fh,rp(i)=a}︸ ︷︷ ︸
abstention regret
+ ci1{f h,ri 6=yi ,fh,rp(i)=yp(i),a 6=f
h,ri }︸ ︷︷ ︸
misclassification cost
This loss writes as a inner product 〈ψwa(y),Cψa(h(x), r(x))〉.
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 12 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Learning framework
Square surrogate framework
True risk:
R(h, r) = Ex 〈Ey|xψwa(y),Cψa(h(x), r(x))〉.
Procedure :
Solve a surrogate risk minimization problem :
ming∈H
Ex,y‖ψwa(y)− g(x)‖2︸ ︷︷ ︸surrogate risk
.
Solve a pre-image problem
(ĥ(x), r̂(x)) = arg min(yh,yr )∈YH,R
〈ĝ(x),Cψa(yh, yr )〉,
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 13 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Learning framework
Square surrogate framework: intuition
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 14 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Learning framework
Square surrogate framework: intuition
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 15 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Learning framework
Square surrogate framework: intuition
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 16 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Learning framework
Square surrogate framework: intuition
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 17 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Learning framework
Surrogate risk minimization
Goal :g∗ = min
g∈HEx,y‖ψwa(y)− g(x)‖2︸ ︷︷ ︸
surrogate risk
.
Based on an empirical sample (xi , yi )i∈1,...,n:
ĝ = ming
1n
n∑i=1
‖ψwa(yi )− g(xi )‖2 + λΩ(g),
Multivariate regression problemH: operator valued kernel, vector random forest, kNN, . . .
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 18 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Learning framework
Learning guarantees
Theorem
Based on the previous notations, the optimal predictor (h∗, r∗) is defined as:
(h∗(x), r∗(x)) = arg min(yh,yr )∈YH,R
〈Cψa(yh, yr ),Ey|xψwa(y)〉.
The excess risk of an abstention aware predictor (ĥ, r̂) defined from ĝ:R(ĥ, r̂)−R(h?, r?) is linked to the estimation error of the regression step.
R(ĥ, r̂)−R(h?, r?) ≤ 2cl√L(ĝ)− L(Ey|xψwa(y)),
where L(g) = Ex,y‖ψwa(y)− g(x)‖2, and cl = ‖C‖maxyh,yr∈YH,R ‖ψa(yh, yr )‖Rp .
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 19 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Learning framework
Pre-image for hierarchical structures with abstention
Step 2 : Solve a pre-image problem
(ĥ(x), r̂(x)) = arg min(yh,yr )∈YH,R
〈ĝ(x),Cψa(yh, yr )〉,
Problem: search over the set YH,R which is a subset of {0, 1}d × {0, 1}d under the
constraint A
yhyrc
≤ b.Canonical form :
(ĥ(x), r̂(x)) = arg min(yh,yr )
[yTh yTr c
T ]MTψx
s.t. Acanonical
yhyrc
≤ bcanonical,(yh, yr ) ∈ {0, 1}d × {0, 1}d ,
Where Acanonical, bcanonical encode the constraints of A, b and the one of YH,RIn general→ NP-HardThere exists polynomial time good initialization techniques.
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 20 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Empirical results
Experimental Setting
Dataset :Input: TripAdvisor reviewsannotated at the sentence levelfrom [Marcheggiani et al. 2014]We use the dense InferSent[Conneau et al. 2017] ( featurerepresentation for handling inputdata).
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 21 / 25
florence.dalche@telecom-paristech.fr
Focus on Structured Output Prediction with Abstention Empirical results
Experiments (subset): Joint Aspect and polarity prediction withabstention
Parameterization: ci =cp(i)
|siblings(i)| , cAi = KAci , cAc i = KAc ci .
0 1 2 3 4 5 6
mean number of abstentions on aspect nodes
0.03
0.04
0.05
0.06
0.07
0.08
0.09
Nor
mal
ized
Ham
min
glo
ss
Hamming loss on the polarity nodes located after an abstention label
KAc = 0.25
KAc : 0.5
KAc : 0.75
HStrict KAc = 0.25
HStrict KAc = 0.5
HStrict KAc = 0.75
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 22 / 25
florence.dalche@telecom-paristech.fr
Conclusion
Outline
1 Short Overview of research activities
2 Focus on Structured Output Prediction with Abstention
3 Conclusion
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 23 / 25
florence.dalche@telecom-paristech.fr
Conclusion
Conclusion and future works
SOLA extends two families of approaches: learning with abstention andleast-squares surrogate structured prediction.
Beyond ridge regression, any vector-valued regression is eligible (including deeplearning).
Allows to build a robust representation for star rating in a pipeline framework.
Beyond the target problem: develop general approaches to efficiently providerobust and reliable structured output prediction, whatever the underlying predictorarchitecture.
Other ways to define r(x): Bayesian approaches
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 24 / 25
florence.dalche@telecom-paristech.fr
Conclusion
References
Hierarchical Multi-label Conditional Random Fields for Aspect-Oriented OpinionMining, Marcheggiani, Diego and Täckström, Oscar and Esuli, Andrea andSebastiani, Fabrizio, ECIR 2014.
Supervised Learning of Universal Sentence Representations from NaturalLanguage Inference Data, A. Conneau and D. Kiela and H. Schwenk and L. Barraultand A.Bordes, arxiv, 2017.
Structured Output Learning with Abstention, A. Garcia, C. Clavel, S. Essid, F.d’Alché-Buc, ICML 2018.
Output Fisher Embedding Regression, M. Djerrab, A. Garcia, M. Sangnier, F.d’Alché-Buc, ECML/PKDD 2018 and MLJ, May 2018
(
LTCI, Télécom ParisTech, Paris, FRANCEflorence.dalche@telecom-paristech.fr
DATAIA-JST International Symposium on DataScience and AI
)Structured Output Learning with Abstention July 10, 2018 25 / 25
florence.dalche@telecom-paristech.fr
Short Overview of research activitiesFocus on Structured Output Prediction with AbstentionLearning frameworkEmpirical results
Conclusion
Recommended