Learning under Budget
Generative Adversarial Nets
Nataliya Sokolovska
Sorbonne University, Paris, France

Master 2 in Statistics
March 29, 2019
Outline
Learning under Budget
- Motivation for Learning under Budget
- Cascade Classifiers
- A deep reinforcement learning model
- MDP and POMDP
- Some Applications in Medicine

Generative Adversarial Learning
- Generative Adversarial Networks
- Example: a Real Materials Science Application
- Real Data and Discussion
Example: the DiaRem (Diabetes Prediction) Score
Variable                        Thresholds   Score
Age                             <40          0
                                40–49        1
                                50–59        2
                                >60          3
Glycated hemoglobin (HbA1c, %)  <6.5         0
                                6.5–6.9      2
                                7–8.9        4
                                >9           6
Insulin                         No           0
                                Yes          10
Other drugs                     No           0
                                Yes          3

Classify as Remission if the sum of scores < 7.
Classify as Non-remission if the sum of scores ≥ 7.
C. D. Still et al., Preoperative prediction of type 2 diabetes remission after Roux-en-Y gastric bypass surgery: a retrospective cohort study, 2013
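As a worked example of how the score is computed, here is a minimal Python sketch of the table above (the function names, and the exact boundary handling at the table's gaps, e.g. age 60, are my own assumptions):

```python
def diarem_score(age, hba1c, insulin, other_drugs):
    """DiaRem score from the four variables of the table above."""
    score = 0
    if age > 60:            # age thresholds as given in the table
        score += 3
    elif age >= 50:
        score += 2
    elif age >= 40:
        score += 1
    if hba1c > 9:           # glycated hemoglobin (HbA1c, %)
        score += 6
    elif hba1c >= 7:
        score += 4
    elif hba1c >= 6.5:
        score += 2
    if insulin:             # insulin use: 10 points
        score += 10
    if other_drugs:         # other antidiabetic drugs: 3 points
        score += 3
    return score

def predict(score):
    """Remission if the sum of scores < 7, non-remission otherwise."""
    return "Remission" if score < 7 else "Non-remission"

# A 45-year-old patient, HbA1c 7.5%, no insulin, on other drugs: 1 + 4 + 3 = 8.
print(predict(diarem_score(45, 7.5, insulin=False, other_drugs=True)))  # Non-remission
```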
DiaRem accuracy on NutriOmics data
[Figure: histogram of DiaRem scores (0–21) on the NutriOmics data, with counts of remission vs. non-remission patients at each score.]
Classification under budget constraints
Problem: predict y from x when the prediction is reliable; abstain otherwise.

The cost:

c(ŷ) =
  0, if ŷ = y,
  δ, if rejection,
  1, if ŷ ≠ y.

For δ < 1/2, ŷ is a classifier with abstention; the rejection cost δ is used to avoid misclassification.

Reference: Boosting with Abstention, C. Cortes, G. DeSalvo and M. Mohri, Advances in Neural Information Processing Systems, 2016.

Many thanks to Matthieu Clertant for sharing his slides.
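To make the abstention rule concrete: under this cost, predicting the most probable class has expected cost 1 − max_y p(y | x) (the probability of an error), so the classifier should reject exactly when that quantity exceeds δ. A minimal NumPy sketch (the function name and the reject sentinel are illustrative):

```python
import numpy as np

REJECT = -1  # sentinel value for abstention

def predict_with_abstention(class_probs, delta):
    """Predict the argmax class unless the expected error
    1 - max_y p(y | x) exceeds the rejection cost delta."""
    best = int(np.argmax(class_probs))
    expected_error = 1.0 - class_probs[best]
    return best if expected_error <= delta else REJECT

print(predict_with_abstention(np.array([0.90, 0.10]), delta=0.2))  # 0 (confident)
print(predict_with_abstention(np.array([0.55, 0.45]), delta=0.2))  # -1 (abstain)
```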
Homogeneous cascade with abstention
[Figure: a homogeneous cascade. Tests are acquired in a fixed order (Test A, then B, then C, ...); at each stage the classifier f(A), f(A,B), f(A,B,C), ... either outputs a label or rejects and moves on to the next test.]
Tradeoff between accuracy and cost: the system incurs a penalty of δ_{k+1} at the k-th stage if it rejects in order to seek more measurements. The order of tests and the classifier with abstention are learned purely from the data.
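Read as pseudocode, the cascade is a single loop; a hedged Python sketch (the data structures and the reject sentinel are assumptions, not the paper's implementation):

```python
REJECT = -1

def run_cascade(patient, tests, classifiers, deltas):
    """Homogeneous cascade: acquire tests in a fixed order until some stage
    classifier accepts; rejecting at stage k costs deltas[k]."""
    acquired, total_cost = {}, 0.0
    for k, (test, clf) in enumerate(zip(tests, classifiers)):
        acquired[test] = patient[test]        # acquire measurement k
        label = clf(acquired)                 # classifier with abstention
        if label != REJECT or k == len(tests) - 1:
            return label, total_cost          # accept (or forced final decision)
        total_cost += deltas[k]               # penalty for seeking more data
```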
The Costs
In real-life applications, the budget is always limited.

Possible costs:
- time, money,
- side effects of medication,
- complexity (interpretability, disease signature), ...

The costs δ_k depend on the tests and on the nature of the acquired data.
Sum of costs:

∆ = ∑_{k=1}^{K} δ_k .

∆ is a tradeoff parameter between accuracy and cost.
Heterogeneous Cascade with abstention
[Figure: a heterogeneous cascade. Starting from Test A, each rejection branches to a different next test (B, C, D, E, ...), so each patient follows an individual path through classifiers such as f(A), f(A,B), f(A,C,D), f(A,C,E), f(A,B,C,D), f(A,B,C,E), ..., until one of them outputs a label.]

Tradeoff between accuracy and cost: dynamic diagnostic protocols are individually tailored acquisitions of patient data, with the aim of providing the most accurate diagnosis at the lowest cost.
Goals
The machine learning system has to provide:
- individualized treatment of patients,
- exploration of the K! homogeneous cascades,
- a tree structure, or heterogeneous cascade,
- 2^K classifiers and 2^K rejecters,
- classifiers and rejecters learned jointly,
- interpretable classifiers.

The algorithm has to learn purely from data.
Strategy Games: Human vs Computer
Deep Blue (1997):
- won against Garry Kasparov,
- almost exhaustive exploration,
- human knowledge (evaluation function, memory of real games).

AlphaGo (2015):
- won against Lee Sedol,
- no exhaustive exploration,
- human knowledge (memory of real games).

AlphaZero (2017): no human knowledge.

These board games are Markov Decision Processes.
Markov Decision Process
A Markov Decision Process is a 5-tuple (S, A, P, C, X × Y):
- S: set of states,
- A: set of actions,
- P: set of conditional transition probabilities,
- C: cost function,
- X × Y: set of observations.
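As a data structure, the 5-tuple translates directly; a minimal sketch (the field types are assumptions for illustration):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class MDP:
    states: List                    # S: set of states
    actions: List                   # A: set of actions
    transitions: Dict[Tuple, Dict]  # P: (s, a) -> distribution over next states
    cost: Callable                  # C: cost function c(s)
    observations: List              # X × Y: set of observations
```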
From POMDP to MDP
A Partially Observable Markov Decision Process deals with:
- uncertainty related to the effects of actions (cost),
- uncertainty about the current state.

A purely epistemic MDP deals with the second uncertainty only.

Historical state s_t:
- a[t], the set of tests done at time t,
- x[t], all the features available at time t.

Purely epistemic MDP + historical state = simple MDP.
Deep Q Learning
Goal: determine a policy of actions π.

Global cost C, and cost from step t, noted C_t:

C(s_K) = ∑_{k ∈ [K+1]} c(s_k)   and   C_t(s_K) = ∑_{k ∈ [t:K+1]} c(s_k) .

The function Q (also called the cost-to-go of a_t) is the expected cumulative cost from time t to the end if we undertake action a_t:

Q^π(s_{t−1}, a_t) = E_π[C_t(s_K) | s_{t−1}, a_t] .

Optimal solution: π* = argmin_π E_π[C(s_K)].

The function Q^{π*} is approximated with a deep neural network.
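A minimal deep Q-learning sketch in PyTorch (the architecture, hyperparameters and discount factor are illustrative, not the paper's): the network maps a historical state to one cost-to-go estimate per action and is regressed on the Bellman target; since we minimize costs rather than maximize rewards, the backup uses a min.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Maps a historical state s to a cost-to-go estimate Q(s, a) per action."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, state):
        return self.net(state)

def q_step(qnet, optimizer, s, a, c, s_next, done, gamma=1.0):
    """One Bellman regression step: Q(s, a) should match c + gamma * min_a' Q(s', a')."""
    q_sa = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)   # a: LongTensor of actions
    with torch.no_grad():
        target = c + gamma * (1 - done) * qnet(s_next).min(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```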
Optimal policy
Theorem. Let ŷ and r be the two natural candidates for classification and rejection when the distribution D is known:

ŷ_t = argmax_{a_t ∈ [L]} P_D(a_t = Y | s_{t−1})   and   r_t = −argmin_{a_t ∈ [−K:−1]} Q^π(s_{t−1}, a_t) .

For all t ∈ [K] (or up to the end of the cascade), the optimal solution of the MDP system satisfies:

π*(s_{t−1}) = a*_t =
  ŷ_t, if max_{a_t > 0} P_D(a_t = Y | s_{t−1}) > 1 − min_{a_t < 0} Q^π(s_{t−1}, a_t),
  r_t, otherwise.
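In code, the theorem's rule is a single comparison between the best class posterior and one minus the best cost-to-go among the remaining tests; a hedged sketch (plain lists stand in for the MDP's action sets):

```python
def optimal_action(class_probs, test_costs_to_go):
    """Classify with the best label if its posterior exceeds
    1 - min cost-to-go over the remaining tests; otherwise take that test."""
    best_label = max(range(len(class_probs)), key=lambda a: class_probs[a])
    best_test = min(range(len(test_costs_to_go)), key=lambda a: test_costs_to_go[a])
    if class_probs[best_label] > 1.0 - test_costs_to_go[best_test]:
        return ("classify", best_label)
    return ("test", best_test)
```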
Deep Neural Networks
Environment: a dataset of patient features and final diagnoses.

Agent (Part 1): rejecter/selector.

At time t in the cascade:
- inputs are summarized by s_t = (a[t]; x[t]) (historical state: tests + features),
- outputs are the approximated costs-to-go of each available test (the function Q).
Agent (Part 2): classifier.

At time t in the cascade:
- inputs are summarized by s_t = (a[t]; x[t]) (historical state: tests + features),
- final outputs are the probabilities of each class (the function Q).

Family of logistic regressions: f_w(s_t, a) ∝ exp⟨β_a(M[t]), M[t] ⊙ x⟩.
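A hedged reading of this family in code: the mask M[t] zeroes out features that have not been acquired, and the class weights β_a are generated from the mask itself, so a single network yields one softmax regression per acquisition pattern (the shapes and the weight-generating network are my assumptions):

```python
import torch
import torch.nn as nn

class MaskedSoftmaxFamily(nn.Module):
    """f_w(s_t, a) ∝ exp⟨β_a(M), M ⊙ x⟩: one softmax regression per mask M."""
    def __init__(self, n_features, n_classes, hidden=64):
        super().__init__()
        self.n_classes, self.n_features = n_classes, n_features
        # The mask is mapped to a full weight matrix: one β_a per class a.
        self.weight_gen = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                        nn.Linear(hidden, n_classes * n_features))

    def forward(self, x, mask):
        beta = self.weight_gen(mask).view(-1, self.n_classes, self.n_features)
        logits = torch.einsum('bcf,bf->bc', beta, mask * x)  # ⟨β_a(M), M ⊙ x⟩
        return logits.softmax(dim=1)                         # class probabilities
```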
Experimental setup
Three models:
- ICCA, the Interpretable Cascade Classifier with Abstention; it uses a classifier network generating a family of softmax regressions,
- p-ICCA, the same model after a pretraining run using only the mask for the rejecter network,
- a random forest with feature selection (importance measured by mean decrease in impurity).

Data: Breast Cancer Wisconsin Diagnostic (UCI). We have 30 continuous parameters describing characteristics of the cell nuclei in medical images, for 569 patients. The cost of feature acquisition is the same for all features.
Data of breast cancer diagnostic
[Figure] Accuracy and standard deviation, as a function of the number of features, for ICCA (in blue), pretrained ICCA (in yellow), and the random forest classifier (dotted line). The features share the same cost: δ ∈ [10^{−4}, (1/3)·10^{−2}].
[Figure] Histogram of the number of features explored for ICCA. In blue: all observations (size: 569, mean: 4.8); in orange: the misclassified observations (size: 21, mean: 12.3). Overall accuracy: 96.31%.
Published in the proceedings of AISTATS, 2019.
The State-of-the-art and Open Challenges
More or less achieved:
- methods for learning an (individualized) diagnostic protocol,
- based on theoretical support (MDP, POMDP),
- a deep network able to learn a family of parametric models for missing data.

Still a challenge:
- alternatives for missing data, such as generative adversarial networks and variational autoencoders,
- numerical experiments for special cases (comparison between a cascading system, a multi-stage system, and a classifier with embedded feature selection),
- adaptation to genomic, transcriptomic and metabolomic data,
- features varying in time (POMDP).
Generative Learning
- Discriminative learning tries to classify data (maps features to classes): find a boundary between classes.
- Generative learning tries to provide features given a label: model the data distribution.
- To be precise, generative learning asks: given a class, how likely are these particular features?
Generative Adversarial Networks
Two components:
1. Generator – generates new data
2. Discriminator – estimates the probability that a sample belongs to a class

Steps of a GAN:

1. The generator generates data.
2. The generated and the observed data are fed into the discriminator.
3. The discriminator returns the probabilities of belonging to the classes (fake or real).
GANs
[Figure from https://skymind.ai/wiki/generative-adversarial-network-gan]

- Feedback loop: discriminator and the true classes
- Feedback loop: generator – discriminator
GANs
- p_z – distribution over the noise z (uniform)
- p_g – the generator's distribution over the (observed) data
- p_r – the true data distribution

Play a minimax game:

min_G max_D L(D, G) = E_{x∼p_r}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]
                    = E_{x∼p_r}[log D(x)] + E_{x∼p_g}[log(1 − D(x))]

- D is trained to maximize L(D, G), in particular E_{z∼p_z}[log(1 − D(G(z)))],
- G is trained to fool D, i.e. to minimize E_{z∼p_z}[log(1 − D(G(z)))].
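A minimal PyTorch sketch of this game (the architectures, noise dimension and learning rates are placeholders, not from the slides): the discriminator ascends L(D, G), the generator descends its own term.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))  # z -> fake x
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(x_real):
    batch = x_real.size(0)
    z = torch.rand(batch, 16)                     # z ~ p_z (uniform noise)
    # Discriminator: ascend log D(x) + log(1 - D(G(z)))
    d_loss = bce(D(x_real), torch.ones(batch, 1)) + \
             bce(D(G(z).detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: fool D; the common non-saturating form maximizes log D(G(z))
    g_loss = bce(D(G(z)), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```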
GANs
[Figure from http://www.kdnuggets.com/2017/01/generative-adversarial-networks-hot-topic-machine-learning.html]
GANs: the Learning Algorithm
[Figure: the GAN learning algorithm, from Ian J. Goodfellow et al., 2014]
Autoencoders
[Figure from https://www.jeremyjordan.me/variational-autoencoders/]
Variational Autoencoders
[Figure from https://www.jeremyjordan.me/variational-autoencoders/]
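Since the figure itself is not reproduced here, a minimal sketch of what a variational autoencoder computes (all dimensions are illustrative): the encoder outputs a mean and log-variance, a latent code is sampled via the reparameterization trick, and the loss adds a KL term to the reconstruction error.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=30, z_dim=4, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))  # mu and log-var
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    recon = nn.functional.mse_loss(x_hat, x, reduction='sum')   # reconstruction error
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I)
    return recon + kl
```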
Some Materials Science Examples
- Synthesis of new organic and inorganic compounds
- A combinatorial problem for human experts
- "Clean" energy: hydrogen storage
Problem Formulation
Generate ternary hydride compounds of the form "A (a metal) – H (hydrogen) – B (a metal)".

- The training algorithm observes stable binary compounds: A+H, a composition of some metal A with hydrogen H, and B+H, a mixture of another metal B with hydrogen.
- A machine learning algorithm has access to observations {x_AH^i}_{i=1}^{N_AH} and {y_BH^i}_{i=1}^{N_BH}. Our goal is to generate novel ternary, i.e. more complex, stable data x_AHB (or y_BHA), based on the properties learned from the observed binary structures.
GANs and Cross Domain GANs
- DiscoGAN and CycleGAN: cross-domain learning
  - Zhu, J.-Y.; Park, T.; Isola, P.; and Efros, A. A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV, 2017.
  - Kim, T.; Cha, M.; Kim, H.; Lee, J. K.; and Kim, J. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. ICML, 2017.
- Discover relations between two different domains from unpaired samples, and find a mapping from one domain to another
- How to generate data with augmented complexity?
GANs and Cross Domain GANs

We consider a function G_ABZ that maps elements from domains A and B to a domain Z which includes the co-domains A and B. In an unsupervised learning scenario, G_ABZ can be arbitrarily defined; to apply it to real-world applications, some conditions on the relation of interest have to be well-defined.

In an idealistic setting, the equality

G_ABZ ∘ G_ZAB(x_A, x_B) = (x_A, x_B)    (1)

is satisfied. However, this is a hard constraint: it is not straightforward to optimize, and a relaxed soft constraint is preferred. As a soft constraint, we can consider the distance

d(G_ABZ ∘ G_ZAB(x_A, x_B), (x_A, x_B)),    (2)

and minimize it using a metric such as L1 or L2. The generator is also trained with the usual adversarial term

−E_{x_A,x_B∼P_{A,B}}[log D_Z(G_ABZ(x_A, x_B))].    (3)
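A sketch of the soft constraint (2) as it is typically implemented, with L1 as the metric (the generator callables, their return conventions, and the use of PyTorch tensors are assumptions):

```python
def cycle_loss(G_ABZ, G_ZAB, x_a, x_b):
    """Relaxation of Eq. (1): L1 distance of the round trip back to (x_A, x_B)."""
    z = G_ABZ(x_a, x_b)            # A × B -> Z
    x_a_rec, x_b_rec = G_ZAB(z)    # Z -> A × B
    return (x_a_rec - x_a).abs().mean() + (x_b_rec - x_b).abs().mean()
```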
CrystalGAN
Our goal: generate x_AHB (or y_BHA) from {x_AH^i}_{i=1}^{N_AH} and {y_BH^i}_{i=1}^{N_BH}.

Three steps of CrystalGAN:

1. A first-step GAN, closely related to the cross-domain GANs, generates pseudo-binary samples where the domains are mixed.
2. A feature-transfer procedure constructs data of higher-order complexity from the samples generated at the previous step, with components from all domains well separated.
3. A second-step GAN synthesizes, under geometric constraints, novel stable ternary chemical structures.
CrystalGAN: First Step
CrystalGAN: Feature Transfer and Second Step
Numerical Experiments: POSCAR files
A materials science application: generate stable data for hydrogen storage.

Dataset: POSCAR files (3 matrices):

1. the abc matrix, corresponding to the three lattice vectors defining the unit cell of the system,
2. the atomic positions of the H atoms, and the coordinates of the metallic atoms A (or B).
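For concreteness, a hedged sketch of reading these pieces from a VASP 5-style POSCAR file (it assumes direct coordinates and no 'Selective dynamics' line):

```python
import numpy as np

def read_poscar(path):
    """Extract the lattice (abc) matrix and per-element atomic positions
    from a VASP 5-style POSCAR file."""
    with open(path) as f:
        lines = [l.split() for l in f]
    scale = float(lines[1][0])
    abc = scale * np.array(lines[2:5], dtype=float)    # three lattice vectors
    elements = lines[5]                                 # e.g. ['Pd', 'H']
    counts = [int(c) for c in lines[6]]
    coords = np.array(lines[8:8 + sum(counts)], dtype=float)[:, :3]
    positions, i = {}, 0
    for elem, n in zip(elements, counts):
        positions[elem] = coords[i:i + n]               # fractional coordinates
        i += n
    return abc, positions
```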
Results on Crystals Dataset
Number of ternary compositions of good quality (stable) generated by the tested methods (learning from about 35 observations from each class):

Composition   GAN (standard)   DiscoGAN   CrystalGAN (without constraints)   CrystalGAN (with geometric constraints)
Pd–Ni–H¹      0                0          4                                  9
Mg–Ti–H²      0                0          2                                  8

¹ Palladium – Hydrogen – Nickel
² Magnesium – Hydrogen – Titanium
Published in the proceedings of AAAI-MAKE, 2019.
Many thanks to Matthieu Clertant (learning under budget) and Asma Nouira (GANs for materials science) for sharing their slides and figures.