The Twelfth SIAM International Conference on Data Mining, April 27, 2012
Center for Evolutionary Medicine and Informatics
Multi-Task Learning: Theory, Algorithms, and Applications
Jiayu Zhou1,2, Jianhui Chen3, Jieping Ye1,2
1Computer Science and Engineering, Arizona State University, AZ 2The Biodesign Institute, Arizona State University, AZ
3GE Global Research, NY
Tutorial Goals
• Understand the basic concepts in multi-task learning
• Understand different approaches to model task relatedness
• Get familiar with different types of multi-task learning techniques
• Introduce multi-task learning applications
• Introduce the multi-task learning package: MALSAR
Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
– Incomplete Multi-Source Fusion
– Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
Multiple Tasks: Examination Score Prediction (Argyriou et al., 2008)
School 1 - Alverno High School; School 138 - Jefferson Intermediate School; School 139 - Rosemead High School (data from the Inner London Education Authority, ILEA).

Each record contains student-dependent features (birth year, previous score, …) and school-dependent features (school ranking, …); the goal is to predict each student's exam score:

School | Student id | Birth year | Previous score | … | School ranking | … | Exam score
1      | 72981      | 1985       | 95             | … | 83%            | … | ?
138    | 31256      | 1986       | 87             | … | 72%            | … | ?
139    | 12381      | 1986       | 83             | … | 77%            | … | ?
Learning Multiple Tasks: Learning Each Task Independently
Each school is treated as a separate task (task 1, …, task 138, task 139). Under single-task learning, an independent model is trained on each school's own records to predict its students' exam scores; each model can fit its own data well, but no information is shared across schools.
Learning Multiple Tasks: Learning the Tasks Simultaneously
Under multi-task learning, the prediction tasks for all schools (task 1, …, task 138, task 139) are learned simultaneously, and the relationships among the tasks are modeled explicitly, so that the limited data of each school can be shared.
Performance of MTL: Evaluation on the School Data
• Predict exam scores for 15362 students from 139 schools
• Describe each student by 27 attributes
• Compare single task learning approaches (Ridge Regression, Lasso) and one multi-task learning approach (trace-norm regularized learning)
Performance measure: N-MSE = mean squared error / variance of the target.
[Figure: N-MSE (roughly 0.7 to 1.05) vs. index of training ratio (1-8) for Ridge Regression, Lasso, and Trace Norm.]
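As a sanity check, the measure above can be computed directly (a minimal sketch; the helper name is ours):

```python
import numpy as np

def n_mse(y_true, y_pred):
    """Normalized MSE: mean squared error divided by the variance of the target."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)
```

A constant predictor at the target mean gives N-MSE = 1, so values below 1 indicate that a model beats the mean baseline.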
Multi-Task Learning
• Multi-task Learning is different from single task learning in the training (induction) process.
• Inductions of multiple tasks are performed simultaneously to capture intrinsic relatedness.
Single-task learning: for each of the tasks 1, …, t, a model is trained on that task's own data, and the training processes are independent.
Multi-task learning: the training data of all t tasks enter a single joint training process that produces the t models; generalization is still evaluated per task.
Web Page Categorization (Chen et al., ICML 2009)
• Classify documents into categories
• The classification of each category is a task
• The tasks of predicting different categories may be latently related
[Figure: classifiers for categories such as Health, Travel, World, Politics, and US; the classifiers' parameters (the models of different categories) are latently related.]
MTL for HIV Therapy Screening (Bickel et al., ICML 2008)
• Hundreds of possible combinations of drugs exist, some of which use similar biochemical mechanisms.
• The samples available for each combination are limited.
• For a patient, predicting the response to one drug combination is a task.
• Multi-task learning exploits the similarity information across combinations.
[Figure: patient treatment records over drugs such as ETV, ABC, AZT, D4T, DDI, and FTC; the response to a new combination (e.g., AZT + ETV + FTC) is to be predicted.]
Other Applications
• Portfolio selection [Ghosn and Bengio, NIPS 1997]
• Collaborative ordinal regression [Yu et al., NIPS 2006]
• Web image and video search [Wang et al., CVPR 2009]
• Disease progression modeling [Zhou et al., KDD 2011]
• Disease prediction [Zhang et al., NeuroImage 2012]
Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
– Incomplete Multi-Source Fusion
– Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
How Tasks Are Related
Assumption: all tasks are related.
Methods:
• Mean-regularized MTL
• Joint feature learning
• Trace-norm regularized MTL
• Alternating structure optimization (ASO)
• Shared-parameter Gaussian process
How Tasks Are Related
• Assuming that all tasks are related may be too strong for practical applications.
• There may be some irrelevant (outlier) tasks.
Assumption: there are outlier tasks.
How Tasks Are Related
Assumption: tasks have group structures.
Assumption: tasks have tree structures.
Assumption: tasks have graph/network structures.
Methods: clustered MTL, tree MTL, network MTL.
Multi-Task Learning Methods
• Regularization-based MTL
– All tasks are related: regularized MTL, joint feature learning, low-rank MTL, ASO
– Learning with outlier tasks: robust MTL
– Tasks form groups/graphs/trees: clustered MTL, network MTL, tree MTL
• Other methods
– Shared hidden nodes in neural networks
– Shared-parameter Gaussian process
The Twelfth SIAM Information Conference on Data Mining, April 27, 2012
Center for Evolutionary Medicine and Informatics
Regularization-based Multi-Task Learning
• All tasks are related
– Mean-Regularized MTL
– MTL in high-dimensional feature space
• Embedded Feature Selection
• Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure
Notation
• We focus on linear models: Yᵢ = Xᵢ Wᵢ, where Xᵢ ∈ ℝ^(nᵢ×d) is the feature matrix of task i (nᵢ samples, d features), Yᵢ ∈ ℝ^(nᵢ×1) is its target vector, and W = [W₁, W₂, …, W_m] ∈ ℝ^(d×m) is the model matrix collecting the weight vectors of the m tasks.
Mean-Regularized Multi-Task Learning (Evgeniou & Pontil, KDD 2004)
• Assumption: the parameter vectors of all tasks are close to each other.
– Advantage: simple, intuitive, easy to implement
– Disadvantage: the assumption may not hold in real applications
The regularization penalizes the deviation of each task from the mean:

min_W ½‖XW − Y‖_F² + λ Σᵢ₌₁^m ‖Wᵢ − (1/m) Σₛ₌₁^m Wₛ‖₂²
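The formulation above can be sketched with plain gradient descent (a minimal sketch: we assume for simplicity that all tasks share one design matrix X, so Y is n×m; hyperparameters are illustrative):

```python
import numpy as np

def mean_regularized_mtl(X, Y, lam=1.0, lr=1e-3, n_iter=2000):
    """Gradient descent for min_W 0.5*||XW - Y||_F^2 + lam * sum_i ||W_i - mean(W)||^2.

    Assumes all tasks share the same design matrix X (n x d); Y is n x m,
    one column of targets per task.
    """
    d = X.shape[1]
    m = Y.shape[1]
    W = np.zeros((d, m))
    # Centering matrix: W @ M subtracts the column (task) mean from each column.
    M = np.eye(m) - np.ones((m, m)) / m
    for _ in range(n_iter):
        # M is symmetric and idempotent, so d/dW ||W M||_F^2 = 2 W M.
        grad = X.T @ (X @ W - Y) + 2.0 * lam * (W @ M)
        W -= lr * grad
    return W
```

Larger λ pulls the task weight vectors toward their common mean; λ = 0 recovers independent least squares per task.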
Multi-Task Learning with High Dimensional Data
• In practical applications, we may deal with high dimensional data.
– Gene expression data, biomedical image data
• Curse of Dimensionality
• Dealing with high-dimensional data in multi-task learning:
– Embedded feature selection: ℓ1/ℓq-norm group Lasso
– Low-rank subspace learning (low-rank assumption): ASO, trace-norm regularization
Multi-Task Learning with Joint Feature Learning
• One way to capture the task relatedness from multiple related tasks is to constrain all models to share a common set of features.
• For example, in school data, the scores from different schools may be determined by a similar set of features.
[Figure: tasks 1-5 jointly select a common subset of features 1-10.]
Multi-Task Learning with Joint Feature Learning (Obozinski et al., Stat Comput 2009; Liu et al., 2010, technical report)
• Using group sparsity: ℓ1/ℓq-norm regularization. When q > 1 we obtain group sparsity.

min_W ½‖XW − Y‖_F² + λ‖W‖₁,q, where ‖W‖₁,q = Σᵢ₌₁^d ‖wⁱ‖_q and wⁱ is the i-th row of W.

Here the output Y (n×m) is approximated by the input X (n×d) times the model W (d×m).
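For q = 2, the regularizer's proximal operator is row-wise soft-thresholding, which plugs into a standard proximal-gradient (ISTA) loop. A sketch under that assumption (function names are ours):

```python
import numpy as np

def prox_l21(W, t):
    """Proximal operator of t * ||W||_{1,2} (sum of row 2-norms):
    row-wise soft-thresholding. Rows with small norm are zeroed
    jointly across all tasks."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale

def joint_feature_learning(X, Y, lam=0.1, n_iter=500):
    """Proximal gradient (ISTA) for min_W 0.5*||XW - Y||_F^2 + lam*||W||_{1,2}."""
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the smooth part
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = X.T @ (X @ W - Y)          # gradient of the least-squares term
        W = prox_l21(W - G / L, lam / L)
    return W
```

The row-wise thresholding is what makes entire features drop out of all tasks at once, rather than entry by entry as in the plain Lasso.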
Writer-Specific Character Recognition (Obozinski, Taskar, and Jordan, 2006)
• Each task is a binary classification between two letters for one writer.
Dirty Model for Multi-Task Learning (Jalali et al., NIPS 2010)
• In practical applications, it is too restrictive to constrain all tasks to share a single structure.
• The model is decomposed as W = P + Q, with a group-sparse component P and an element-wise sparse component Q:

min_{P,Q} ‖Y − X(P + Q)‖_F² + λ₁‖P‖₁,q + λ₂‖Q‖₁
Robust Multi-Task Learning
• Most existing MTL approaches assume all tasks are relevant (assumption: all tasks are related).
• Robust MTL approaches allow irrelevant tasks (assumption: there are outlier tasks).
Robust Multi-Task Feature Learning (Gong et al., 2012, submitted)
• Simultaneously captures a common set of features among the relevant tasks and identifies outlier tasks: W = P + Q, with a row-wise group-sparse component P (jointly selected features) and a column-wise group-sparse component Q (outlier tasks):

min_{P,Q} ‖Y − X(P + Q)‖_F² + λ₁‖P‖₁,q + λ₂‖Qᵀ‖₁,q
Trace-Norm Regularized MTL
• Captures task relatedness via a shared low-rank structure: the models of all tasks are trained jointly so that they lie near a shared low-rank subspace, while generalization is still measured per task.
Low-Rank Structure for MTL
• Assume we have a rank-2 model matrix. Then each task's weight vector is a linear combination of the same two basis vectors b₁ and b₂:
w₁ = α₁b₁ + α₂b₂,  w₂ = β₁b₁ + β₂b₂,  w₃ = γ₁b₁ + γ₂b₂,
and for each task, target ≈ training data × weight vector.
Low-Rank Structure for MTL
• In general, a low-rank model matrix factorizes as

W = [b₁, …, b_p] × T,  T = (τ₁₁ ⋯ τ₁ₘ; ⋮ ⋱ ⋮; τ_p1 ⋯ τ_pm),

where the bᵢ are p basis vectors and T holds the coefficients of the m tasks (m > p).
Low-Rank Structure for MTL (Ji et al., ICML 2009)
• Rank minimization formulation:
– min_W Loss(W) + λ·Rank(W)
– Rank minimization is NP-hard for general loss functions.
• Convex relaxation: trace-norm minimization:
– min_W Loss(W) + λ‖W‖₊, where ‖W‖₊ is the trace norm: the sum of the singular values of W.
– The trace norm has been shown to be a good convex approximation of the rank function (Fazel et al., 2001).
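Proximal-gradient solvers for this relaxation rely on singular value thresholding, the proximal operator of the trace norm. A minimal sketch (the function name is ours):

```python
import numpy as np

def prox_trace_norm(W, t):
    """Singular value thresholding: proximal operator of t * ||W||_*.
    Shrinks each singular value by t; small singular values vanish,
    so the rank of the result drops."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.maximum(s - t, 0.0)
    return (U * s) @ Vt
```

Iterating a gradient step on the loss followed by this shrinkage step yields a low-rank model matrix, in the same way ISTA with soft-thresholding yields a sparse one.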
Low-Rank Structure for MTL: Evaluation on the School Data
• Predict exam scores for 15,362 students from 139 schools; each student is described by 27 attributes.
• Compare Ridge Regression, Lasso, and trace-norm regularization (for inducing a low-rank structure), using N-MSE as the performance measure.
• The low-rank structure (induced via the trace norm) leads to the smallest N-MSE.
[Figure: N-MSE (roughly 0.7 to 1.05) vs. index of training ratio (1-8) for Ridge Regression, Lasso, and Trace Norm.]
Alternating Structure Optimization (ASO) (Ando and Zhang, JMLR 2005)
• ASO assumes that the model of each task is the sum of two components: a task-specific component wᵢ and a component Θvᵢ lying in a low-dimensional subspace Θ shared by all m tasks, so that Xᵢ(Θvᵢ + wᵢ) ≈ yᵢ.
Alternating Structure Optimization (ASO) (Ando and Zhang, JMLR 2005)
• Model for the i-th task: uᵢ = Θvᵢ + wᵢ, with empirical loss

ℒᵢ(Xᵢ(Θvᵢ + wᵢ), yᵢ) = ‖Xᵢ(Θvᵢ + wᵢ) − yᵢ‖²

• ASO simultaneously learns the task models and the shared structure:

min_{Θ,{vᵢ,wᵢ}} Σᵢ₌₁^m [ℒᵢ(Xᵢ(Θvᵢ + wᵢ), yᵢ) + α‖wᵢ‖²]  subject to ΘᵀΘ = I
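A sketch of the alternating optimization under the slide's notation uᵢ = Θvᵢ + wᵢ: for fixed Θ, the optimal vᵢ is Θᵀuᵢ, so each task reduces to a regularized least-squares problem in uᵢ; for fixed {uᵢ}, the optimal Θ consists of the top left singular vectors of U = [u₁, …, u_m] (Ando & Zhang, 2005). Initialization and hyperparameters below are illustrative:

```python
import numpy as np

def aso(Xs, ys, h=2, alpha=1.0, n_iter=20):
    """Alternating optimization sketch for ASO with squared loss.
    Xs: list of (n_i x d) design matrices; ys: list of target vectors;
    h: dimension of the shared subspace Theta (d x h, Theta^T Theta = I)."""
    d = Xs[0].shape[1]
    m = len(Xs)
    Theta = np.linalg.qr(np.random.default_rng(0).normal(size=(d, h)))[0]
    U = np.zeros((d, m))
    for _ in range(n_iter):
        # Step 1: for fixed Theta, v_i = Theta^T u_i is optimal, so the penalty
        # alpha*||w_i||^2 becomes alpha*||(I - Theta Theta^T) u_i||^2 and each
        # task is a regularized least-squares problem in u_i.
        P = np.eye(d) - Theta @ Theta.T      # projector onto the complement; idempotent
        for i, (X, y) in enumerate(zip(Xs, ys)):
            U[:, i] = np.linalg.solve(X.T @ X + alpha * P, X.T @ y)
        # Step 2: for fixed {u_i}, the best shared subspace is spanned by the
        # top-h left singular vectors of U.
        Theta = np.linalg.svd(U, full_matrices=False)[0][:, :h]
    return U, Theta
```

When the true task weights do lie near a common subspace, Θ recovers it after a few alternations.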
iASO Formulation (Chen et al., ICML 2009)

min_{Θ,{vᵢ,wᵢ}} Σᵢ₌₁^m [ℒᵢ(Xᵢ(Θvᵢ + wᵢ), yᵢ) + α‖Θvᵢ + wᵢ‖² + β‖wᵢ‖²]  subject to ΘᵀΘ = I

• Controls both model complexity and task relatedness
• Subsumes ASO (Ando et al., 2005) as a special case
• Naturally leads to a convex relaxation (Chen et al., ICML 2009); the convex relaxed ASO is equivalent to iASO under certain mild conditions
Incoherent Low-Rank and Sparse Structures (Chen et al., KDD 2010)
• ASO uses an ℓ2-norm on the task-specific component; an ℓ1-norm can be used instead to learn task-specific features.
• W = P + Q, where the low-rank component P captures task relatedness and the element-wise sparse component Q captures task-specific features:

min_{P,Q} Σᵢ₌₁^m ℒᵢ(Xᵢ(Pᵢ + Qᵢ), yᵢ) + λ‖Q‖₁  subject to ‖P‖₊ ⩽ η  (a convex formulation)
Robust Low-Rank MTL (Chen et al., KDD 2011)
• Simultaneously performs low-rank MTL and identifies outlier tasks: W = P + Q, where the low-rank component P captures task relatedness and the column-wise group-sparse component Q identifies irrelevant (outlier) tasks:

min_{P,Q} Σᵢ₌₁^m ℒᵢ(Xᵢ(Pᵢ + Qᵢ), yᵢ) + α‖P‖₊ + β‖Qᵀ‖₁,q
Summary
• All multi-task learning formulations discussed above fit into the W = P + Q schema.
– Component P: shared structure
– Component Q: information not captured by the shared structure

Embedded feature selection:
• L1/Lq: P = feature selection (ℓ1/ℓq norm); Q = 0
• Dirty: P = feature selection (ℓ1/ℓq norm); Q = ℓ1-norm (element-wise sparse)
• rMTFL: P = feature selection (ℓ1/ℓq norm); Q = outlier (column-wise ℓ1/ℓq norm)
Low-rank subspace learning:
• Trace norm: P = low-rank (trace norm); Q = 0
• ISLR: P = low-rank (trace norm); Q = ℓ1-norm
• ASO: P = low-rank (shared subspace); Q = ℓ2-norm on the task-specific component
• RMTL: P = low-rank (trace norm); Q = outlier (column-wise ℓ1/ℓq norm)
Multi-Task Learning with Clustered Structures
• Most MTL techniques assume all tasks are related, which is not true in many applications.
• Clustered multi-task learning assumes the tasks have a group structure: the models of tasks from the same group are closer to each other than to those from a different group.
• Assumption: tasks have group structures (e.g., one group of tasks predicts heart-related diseases and another predicts brain-related diseases).
Clustered Multi-Task Learning (Jacob et al., NIPS 2008; Zhou et al., NIPS 2011)
• Use regularization to capture the clustered structure: each task's model is fitted to its own training data, while the models are encouraged to fall into clusters 1, …, k.
Clustered Multi-Task Learning (Zhou et al., NIPS 2011)
• Capture the cluster structure by minimizing the sum-of-squared-error (SSE) objective of k-means clustering over the task models:

min Σⱼ₌₁^k Σ_{v∈Iⱼ} ‖w_v − w̄ⱼ‖₂², where Iⱼ is the index set of the j-th cluster and w̄ⱼ its mean.

• Equivalently: min_F tr(WᵀW) − tr(FᵀWᵀWF), where F is the m×k orthogonal cluster indicator matrix with Fᵢⱼ = 1/√nⱼ if i ∈ Iⱼ and 0 otherwise (task number m > cluster number k).
Clustered Multi-Task Learning (Zhou et al., NIPS 2011)
• Directly minimizing the SSE is hard because of the nonlinear constraint on the indicator matrix F, so the constraint is relaxed to orthogonality (Zha et al., NIPS 2001):

min_{F: FᵀF = I_k} tr(WᵀW) − tr(FᵀWᵀWF)
Clustered Multi-Task Learning (Zhou et al., NIPS 2011)
• CMTL formulation: the α-term captures the cluster structure, and the β-term improves generalization performance:

min_{W, F: FᵀF = I_k} Loss(W) + α[tr(WᵀW) − tr(FᵀWᵀWF)] + β tr(WᵀW)

• CMTL has been shown to be equivalent to ASO, given that the dimension of the shared low-rank subspace in ASO and the cluster number in CMTL are the same.
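The identity behind the clustering term can be checked numerically: for a hard cluster assignment, tr(WᵀW) − tr(FᵀWᵀWF) equals the k-means SSE over the task models (a small sketch; function and variable names are ours):

```python
import numpy as np

def cmtl_penalty(W, labels):
    """Return the k-means SSE over the task columns of W computed two ways:
    directly, and as tr(W^T W) - tr(F^T W^T W F) using the normalized
    cluster indicator F (F[i, j] = 1/sqrt(n_j) if task i is in cluster j)."""
    m = W.shape[1]
    ks = np.unique(labels)
    # Direct sum of squared distances to each cluster's mean model.
    sse = sum(
        np.sum((W[:, labels == j] - W[:, labels == j].mean(axis=1, keepdims=True)) ** 2)
        for j in ks
    )
    # Trace form with the normalized indicator matrix.
    F = np.zeros((m, len(ks)))
    for col, j in enumerate(ks):
        idx = labels == j
        F[idx, col] = 1.0 / np.sqrt(idx.sum())
    trace_form = np.trace(W.T @ W) - np.trace(F.T @ W.T @ W @ F)
    return sse, trace_form
```

Relaxing F from this indicator form to any orthonormal matrix is what turns the combinatorial clustering objective into the tractable trace problem above.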
Convex Clustered Multi-Task Learning (Zhou et al., NIPS 2011)
[Figure: recovered model matrices on synthetic data — ground truth, mean-regularized MTL, trace-norm regularized MTL, and convex relaxed CMTL. The relaxation introduces some noise, and low-rank regularization can also capture the cluster structure well.]
Multi-Task Learning with Tree Structures
• In some applications, the tasks may be equipped with a tree structure:
– Tasks belonging to the same node are similar to each other.
– The similarity between two tasks is related to the depth of their common tree node.
• Assumption: tasks have tree structures (e.g., task a is more similar to task b than to task c).
Multi-Task Learning with Tree Structures (Kim and Xing, ICML 2010)
• Tree-guided group Lasso:

min_W Loss(W) + λ Σⱼ₌₁^d Σ_{v∈V} w_v ‖W_{G_v}ʲ‖₂

where each node v of the tree defines a group of tasks G_v with weight w_v. For example, with three output tasks, the leaves give G_{v1} = {W₁}, G_{v2} = {W₂}, G_{v3} = {W₃}, and the internal nodes give G_{v4} = {W₁, W₂} and G_{v5} = {W₁, W₂, W₃}.
Multi-Task Learning with Graph Structures
• In real applications, the tasks involved in MTL may have a graph structure:
– Two tasks are related if they are connected in the graph.
– The similarity of two related tasks is represented by the weight of the connecting edge.
• Assumption: tasks have graph/network structures.
Multi-Task Learning with Graph Structures
• A simple way to encode the graph structure is to penalize the difference between the models of every pair of tasks connected by an edge. Given the edge set E, the penalty is

Σᵢ₌₁^|E| ‖W_{eᵢ(1)} − W_{eᵢ(2)}‖₂² = ‖WRᵀ‖_F², where R ∈ ℝ^(|E|×m) is the edge-incidence matrix.

• The graph regularization term can also be written in Laplacian form:

‖WRᵀ‖_F² = tr((WRᵀ)ᵀWRᵀ) = tr(WRᵀRWᵀ) = tr(WℒWᵀ), with ℒ = RᵀR the graph Laplacian.
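The equivalence between the edge-wise penalty and the Laplacian form can be verified directly (a small sketch; the incidence-matrix sign convention is ours):

```python
import numpy as np

def graph_penalty(W, edges):
    """Graph regularizer for MTL, computed two ways:
    (1) the sum of squared differences between the weight vectors (columns
    of W) of connected tasks, and (2) the Laplacian form tr(W L W^T) with
    L = R^T R for the edge-incidence matrix R."""
    m = W.shape[1]
    R = np.zeros((len(edges), m))
    for i, (a, b) in enumerate(edges):
        R[i, a], R[i, b] = 1.0, -1.0     # each row of R encodes one edge
    direct = sum(np.sum((W[:, a] - W[:, b]) ** 2) for a, b in edges)
    lap = np.trace(W @ (R.T @ R) @ W.T)  # equals ||W R^T||_F^2
    return direct, lap
```

The Laplacian form is convenient in closed-form solvers, since tr(WℒWᵀ) is a quadratic in W.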
Multi-Task Learning with Graph Structures
• How to obtain the graph information:
– External domain knowledge, e.g., protein-protein interaction (PPI) networks for microarray data
– Discovering task relatedness from data: pairwise correlation coefficients, or sparse inverse covariance estimation (Friedman et al., Biostatistics 2008)
Multi-Task Learning with Graph Structures (Chen et al., UAI 2011; Kim et al., Bioinformatics 2009)
• Graph-guided fused Lasso (e.g., the inputs are gene SNPs and the outputs are phenotypes):

min_W Loss(W) + λ‖W‖₁ + Ω(W)

with the graph-guided fusion penalty

Ω(W) = γ Σ_{e=(m,l)∈E} Σⱼ₌₁^J |W_{jm} − sign(r_{ml}) W_{jl}|
Multi-Task Learning with Graph Structures (Kim et al., Bioinformatics 2009)
• In some applications, we know not only which pairs of tasks are related, but also how strongly they are related.
• Graph-weighted fused Lasso adds this weight information to the fusion penalty:

min_W Loss(W) + λ‖W‖₁ + γ Σ_{e=(m,l)∈E} τ(r_{ml}) Σⱼ₌₁^J |W_{jm} − sign(r_{ml}) W_{jl}|
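The weighted fusion penalty is straightforward to evaluate; a sketch (here the weight function τ defaults to the absolute value, which is one common choice and not necessarily the paper's):

```python
import numpy as np

def gw_fusion_penalty(W, edges, weights, tau=abs):
    """Graph-weighted fusion penalty:
    sum over edges e = (a, b) of tau(r_ab) * sum_j |W[j, a] - sign(r_ab) * W[j, b]|.
    `edges` pairs task (column) indices; `weights` holds the correlations r_ab,
    whose signs flip the fusion direction for negatively correlated tasks."""
    total = 0.0
    for (a, b), r in zip(edges, weights):
        total += tau(r) * np.sum(np.abs(W[:, a] - np.sign(r) * W[:, b]))
    return total
```

Note how a negative correlation fuses W[:, a] toward −W[:, b] rather than toward W[:, b].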
Practical Guideline
• MTL versus STL:
– MTL is preferred when dealing with multiple related tasks, each with a small number of training samples.
• Shared features versus shared subspace:
– Identifying shared features is preferred when the data dimensionality is large.
– Identifying a shared subspace is preferred when the number of tasks is large.
Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
– Incomplete Multi-Source Fusion
– Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
Case Study I: Incomplete Multi-Source Data Fusion (Yuan et al., NeuroImage 2012)
• In many applications, multiple data sources contain a considerable amount of missing data.
• In ADNI, over half of the subjects lack CSF measurements, and an independent half of the subjects do not have FDG-PET.
[Figure: block-wise missing pattern over 245 subjects across three sources — PET (features P1-P116), MRI (M1-M305), and CSF (C1-C5); for each subject, some sources are fully observed and others entirely missing.]
Overview of iMSF (Yuan et al., NeuroImage 2012)
• Subjects are partitioned by their pattern of available sources (e.g., Task I through Task IV correspond to different combinations of MRI, PET, and CSF availability), and one model (Model I-IV) is learned per pattern.
• The models are coupled through group-sparsity over the tasks that share each feature:

min_W (1/m) Σᵢ₌₁^m Loss(Xᵢ, Yᵢ, Wᵢ) + λ Σₖ₌₁^d ‖W_{G_k}‖₂
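The task construction can be sketched as grouping subjects by their missingness pattern (here patterns are read off NaN entries; this is our simplification of the iMSF setup):

```python
import numpy as np

def imsf_tasks(X):
    """Sketch of the iMSF task construction: subjects are partitioned by their
    pattern of available measurements (here, which entries are non-NaN), and
    each distinct pattern defines one learning task restricted to the features
    observed under that pattern."""
    observed = ~np.isnan(X)
    patterns = {}
    for i, row in enumerate(observed):
        patterns.setdefault(tuple(row), []).append(i)
    # Each task: (subject indices, indices of the features available to that task).
    return [(idx, np.flatnonzero(np.array(p))) for p, idx in patterns.items()]
```

No imputation is performed; every model only ever sees features that were actually measured for its subjects, which is the point of the formulation.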
iMSF: Performance (Yuan et al., NeuroImage 2012)
[Figure: classification accuracy (roughly 0.79-0.84), sensitivity, and specificity of iMSF versus imputation-based baselines (Zero, EM, KNN, SVD) under three settings (50.0%, 66.7%, 75.0%).]
Case Study II: Drosophila Gene Expression Image Analysis
• Drosophila (the fruit fly) is a favorite model system for geneticists and developmental biologists studying embryogenesis: its small size and short generation time make it ideal for genetic studies.
• In situ hybridization generates images showing when and where individual genes are active; the analysis of such images can potentially reveal gene functions and gene-gene interactions.
Drosophila gene expression pattern images come from the Berkeley Drosophila Genome Project (BDGP, http://www.fruitfly.org/) and Fly-FISH (http://fly-fish.ccbr.utoronto.ca/) [Tomancak et al. (2002) Genome Biology; Lécuyer et al. (2007) Cell].
[Figure: expression images of the genes bgm, en, hb, and runt across developmental stage ranges (stages 1-3 through 13-).]
Comparative image analysis of twist, heartless, and stumps [Tomancak et al. (2002) Genome Biology; Sandmann et al. (2007) Genes & Dev.]
[Figure: stage 4-6 and stage 7-8 images annotated with terms such as anterior endoderm AISN, trunk mesoderm AISN/PR, head mesoderm AISN/PR, subset cellular blastoderm mesoderm AISN, dorsal ectoderm AISN, procephalic ectoderm AISN, anterior endoderm anlage, and yolk nuclei.]
• Such comparative analysis requires the spatial and temporal annotation of expression patterns.
Challenges of Manual Annotation
[Figure: the cumulative number of in situ images grew from near zero in 1992 to roughly 160,000 by 2010, making manual annotation impractical.]
Method Outline (Ji et al., Bioinformatics 2008; Ji et al., BMC Bioinformatics 2009; Ji et al., NIPS 2009)
• Images → feature extraction (sparse coding) → model construction (low-rank multi-task and graph-based multi-task learning)
Low-Rank Multi-Task Learning Model (Ji et al., BMC Bioinformatics 2009)
• A loss term plus a trace-norm penalty (the trace norm is the sum of the singular values) induces a low-rank model matrix W across the annotation tasks:

min_W Σᵢ₌₁^k Σⱼ₌₁^{nᵢ} L(wᵢᵀ xᵢⱼ, Yᵢⱼ) + λ‖W‖₊
Graph-Based Multi-Task Learning Model (Ji et al., SIGKDD 2009)
• A squared (2-norm) loss plus a graph-based fusion penalty over related annotation terms, which admits a closed-form solution:

min_W Σᵢ₌₁^k Σⱼ₌₁^{nᵢ} L(wᵢᵀ xᵢⱼ, Yᵢⱼ) + λ₁ Σ_{(p,q)∈G} g(C_{pq}) ‖w_p − sgn(C_{pq}) w_q‖² + λ₂‖W‖_F²

[Figure: an example weighted term graph over sensory system head, ventral sensory complex, dorsal/lateral sensory complexes, embryonic maxillary sensory complex, embryonic antennal sense organ, and sensory nervous system, with edge weights between 0.31 and 0.79.]
Spatial Annotation Performance
• 50% of the data is used for training and 50% for testing, over 30 random trials.
• Multi-task approaches outperform single-task approaches.
[Figure: AUC (roughly 65-85) across stage ranges 4-6 through 13-16 for SC+LR, SC+Graph, BoW+Graph, and Kernel+SVM.]
Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
– Incomplete Multi-Source Fusion
– Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
MALSAR (Multi-tAsk Learning via StructurAl Regularization)
• A multi-task learning package that encodes task relationships via structural regularization
• www.public.asu.edu/~jye02/Software/MALSAR/
MTL Algorithms in MALSAR 1.0
• Mean-regularized multi-task learning
• MTL with embedded feature selection
– Joint feature learning
– Dirty multi-task learning
– Robust multi-task feature learning
• MTL with low-rank subspace learning
– Trace-norm regularized learning
– Alternating structure optimization
– Incoherent sparse and low-rank learning
– Robust low-rank multi-task learning
• Clustered multi-task learning
• Graph-regularized multi-task learning
Tutorial Road Map
• Part I: Multi-task Learning (MTL) background and motivations
• Part II: MTL formulations
• Part III: Case study of real-world applications
– Incomplete Multi-Source Fusion
– Drosophila Gene Expression Image Analysis
• Part IV: An MTL Package (MALSAR)
• Current and future directions
Trends in Multi-Task Learning
• Developing efficient algorithms for large-scale multi-task learning
• Semi-supervised and unsupervised MTL
• Learning task structures automatically in MTL
• Asymmetric MTL, where the task relationship is not mutual
• Cross-domain MTL, where the features may differ across tasks
Acknowledgement
• National Science Foundation
• National Institutes of Health