
Multi-Task Learning: Theory, Algorithms, and Applications

Jiayu Zhou1,2, Jianhui Chen3, Jieping Ye1,2

1 Computer Science and Engineering, Arizona State University, AZ 2 Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, AZ

3 GE Global Research, NY

SDM 2012 Tutorial


Tutorial Goals

• Understand the basic concepts in multi-task learning

• Understand different approaches to model task relatedness

• Get familiar with different types of multi-task learning techniques

• Introduce the multi-task learning package: MALSAR


Tutorial Road Map

• Part I: Multi-task Learning (MTL) background and motivations

• Part II: MTL formulations

• Part III: Case study of real-world applications

– Disease Progression

– Dealing with Missing Values in Multiple Sources

– Drosophila Image Analysis

• Part IV: MALSAR: Multi-task Learning via Structural Regularization Package

• Future directions


Multiple Tasks

School 1 - Alverno High School:
  Student id 72981, Birth year 1985, Previous score 95, School ranking 83%, … → Exam score: ?

School 138 - Jefferson Intermediate School:
  Student id 31256, Birth year 1986, Previous score 87, School ranking 72%, … → Exam score: ?

School 139 - Rosemead High School:
  Student id 12381, Birth year 1986, Previous score 83, School ranking 77%, … → Exam score: ?


Learning Multiple Tasks

Pooled data from all schools:

  Student id | Birth year | Previous score | School ranking | … | Exam score
  72981      | 1985       | 95             | 83%            | … | ?
  31256      | 1986       | 87             | 72%            | … | ?
  12381      | 1987       | 83             | 77%            | … | ?
  21901      | 1986       | 87             | 72%            | … | ?

Note: students with the same features (e.g., 31256 and 21901) can have different exam scores.


Learning Multiple Tasks

School 1 - Alverno High School:
  Student id 72981, Birth year 1985, Previous score 95, School ranking 83%, … → Exam score: ? (Excellent)

School 138 - Jefferson Intermediate School:
  Student id 31256, Birth year 1986, Previous score 87, School ranking 72%, … → Exam score: ? (Excellent)

School 139 - Rosemead High School:
  Student id 12381, Birth year 1986, Previous score 83, School ranking 77%, … → Exam score: ? (Excellent)


Learning Multiple Tasks

School 1 - Alverno High School:
  Student id 72981, Birth year 1985, Previous score 95, School ranking 83%, … → Exam score: ?

School 138 - Jefferson Intermediate School:
  Student id 31256, Birth year 1986, Previous score 87, School ranking 72%, … → Exam score: ?

School 139 - Rosemead High School:
  Student id 12381, Birth year 1986, Previous score 83, School ranking 77%, … → Exam score: ?


Multi-Task Learning

• Multi-task learning differs from single-task learning in the training (induction) process.

• The models of multiple tasks are induced simultaneously to capture the intrinsic relatedness among the tasks.



Web Pages Categorization

• Classify documents into categories

• The classification of each category is a task

• The tasks of predicting different categories may be latently related [Chen et al. ICML 09]

[Figure: category classifiers (Health, Travel, World, Politics, US, …); the classifiers' parameters for the different categories are latently related.]


Collaborative Ordinal Regression

• The preference prediction of each user can be modeled using ordinal regression

• Some users have similar tastes and their predictions may also have similarities

• Simultaneously perform multiple predictions to exploit such similarity information [Yu et al. NIPS 06]


MTL for HIV Therapy Screening

• Hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms

• The samples available for each combination are limited

• For a patient, the prediction for one drug combination is a task

• Exploit the similarity information by performing inference on multiple tasks simultaneously

[Figure: a patient treatment record over drugs (ETV, ABC, AZT, D4T, DDI, FTC, …); predict the outcome ("?") of a new combination such as AZT + ETV + FTC. Bickel et al. ICML 08]


How to capture shared structures?

• Assumption: all tasks are related
• Assumption: tasks have group structures
• Assumption: there are outlier tasks
• Assumption: the relationship is not symmetric


How to capture shared structures?

Assumption: all tasks are related

Methods:
• Mean-regularized MTL
• Joint feature learning
• Low-rank regularized MTL
• Alternating structural optimization (ASO)
• Shared-parameter Gaussian process


How to capture shared structures?

Assumptions: tasks have group structures; tasks have tree structures (e.g., task a is more similar to task b than to task c); tasks have graph/network structures

Methods:
• Clustered MTL
• Tree MTL
• Network MTL


How to capture shared structures?

Assumption: there are outlier tasks

Methods:
• Robust MTL


How to capture shared structures?

Assumption: the relationship is not symmetric

Methods:
• Asymmetric MTL


All tasks are related

Assumption: all tasks are related

• Shared hidden nodes in neural networks
• Shared-parameter Gaussian process
• Regularization-based MTL:
  – Mean-regularized MTL
  – Joint feature learning
  – Low-rank regularized MTL
  – Alternating structural optimization (ASO)


Sharing Hidden Nodes in Neural Network

[Figure: a neural network in which a set of hidden nodes is shared across the outputs for Task 1–Task 4, all fed by common inputs.]

• Neural networks have been well studied for learning multiple related tasks with improved generalization performance.

• A set of hidden units is shared among multiple tasks for improved generalization (Caruana ML 97).


Mortality Rank

• Future lab results are used as extra outputs to bias learning for the main risk prediction task


Shared-Parameter Gaussian Process

• In (Lawrence and Platt, ICML 04), an efficient method is proposed to learn the parameters of a shared covariance function for the Gaussian process.

• It adopts the multi-task informative vector machine (IVM) to greedily select the most informative examples from the separate tasks, alleviating the computational cost.


Common Latent Representation in Nonparametric Bayesian Models

• Multi-Task Infinite Latent Support Vector Machines (Zhu et al. NIPS 11)


Regularization-based Multi-task Learning

• All tasks are related

– regularized MTL, joint feature learning, low rank MTL, ASO

• Tasks form groups

– clustered MTL, Network/Tree MTL

• Learning with outlier tasks: robust MTL

• Asymmetric MTL


Regularized Multi-Task Learning

• Assume all tasks are related in that the models of all tasks come from a particular distribution (Evgeniou & Pontil, KDD 04)

[Figure: the task models are distributed around a shared mean.]
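The objective appears only as a figure; a sketch of the standard mean-regularized form (following Evgeniou & Pontil), with m tasks and per-task empirical loss ℓ, is:

$\min_{W} \; \sum_{i=1}^{m} \ell(w_i) + \lambda \sum_{i=1}^{m} \Big\| w_i - \frac{1}{m}\sum_{j=1}^{m} w_j \Big\|_2^2$

The second term pulls every task model toward the mean model; the exact constants on the original slide are not recoverable.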


Regularized Multi-Task Learning

• Assumption: task parameter vectors of all tasks are close to each other.

– Advantage: smooth objective, easy to optimize

– Disadvantage: may not hold in real applications.


Multi-Task Learning with Joint Feature Learning

• One way to capture the task relatedness from multiple related tasks is to constrain all models to share a common set of features.

• For example, in the school data, the exam scores from different schools may be determined by a similar set of features.

[Figure: a weight matrix over Features 1–10 and Tasks 1–5; all tasks select the same subset of feature rows.]


Multi-Task Learning with Joint Feature Learning

• Using group sparsity: ℓ1/ℓ2-norm regularization

Model: $Y = XW^* + Z$ (Output = Input × Model + Noise), with $X \in \mathbb{R}^{n \times p}$ (n samples, p features), $W^* \in \mathbb{R}^{p \times k}$ (k tasks), and $Y, Z \in \mathbb{R}^{n \times k}$.

$\min_W \; \frac{1}{2}\|XW - Y\|_F^2 + \lambda \sum_{i=1}^{p} \|\mathbf{w}^i\|_2$

where $\mathbf{w}^i$ denotes the $i$-th row of $W$.
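As an illustration (a minimal numpy sketch, not MALSAR's implementation), one proximal-gradient step for this objective applies a row-wise group soft-threshold, which is what zeroes out entire feature rows across all tasks:

```python
import numpy as np

def prox_l21(W, t):
    """Row-wise group soft-thresholding: shrink each row's l2 norm by t."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)          # p x 1 row norms
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * W                                           # small rows become exactly zero

def prox_grad_step(W, X, Y, lam, step):
    grad = X.T @ (X @ W - Y)                                   # gradient of the smooth loss
    return prox_l21(W - step * grad, step * lam)

# toy usage: n=20 samples, p=10 features, k=4 tasks
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((20, 10)), rng.standard_normal((20, 4))
W = np.zeros((10, 4))
for _ in range(100):
    W = prox_grad_step(W, X, Y, lam=2.0, step=1.0 / np.linalg.norm(X, 2) ** 2)
print("selected feature rows:", np.flatnonzero(np.linalg.norm(W, axis=1) > 0))
```

The step size 1/σ_max(X)² is the standard Lipschitz choice for the least squares loss.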


Joint Feature Selection in Disease Progression

[Figure: X (patient samples n × feature space d; features include baseline MRI volume, baseline MRI area, baseline MRI surface, baseline lab tests, baseline cognitive tests, and other features) × W ≈ Y (tasks = time points: 06, 12, 24, and 36 months). Entire feature rows of W are removed jointly across all time points.]

• The progression of the disease is assumed to involve the same set of features at different time points [Zhou et al. KDD 11].


Joint Feature Selection in Disease Progression

• In predicting different cognitive scores, there may be shared features from different data sources.

• Multi-modal multi-task learning [Zhang, D. et al. NeuroImage 12]


Multi-Task Learning with Joint Feature Learning – L1Lq

• More general ℓ1/ℓ𝑞-norm regularization:

$\min_W \; \frac{1}{2}\|XW - Y\|_F^2 + \lambda \sum_{i=1}^{p} \|\mathbf{w}^i\|_q$

under the same model $Y = XW^* + Z$ (Output = Input × Model + Noise) as before.


Multi-Task Learning with Joint Feature Learning – L1Lq

• The selection of q may depend on the distribution of the model:

$W \sim N(\text{mean}, \text{variance})$


Trace-Norm Regularized MTL

• Capture task relatedness via a shared low-rank structure

[Figure: each task (task 1, task 2, …, task n) has its own training data and trained model; a shared low-rank structure ties the models together and improves generalization.]


Low-Rank Structure for MTL

[Figure: for each task, training data × weight vector ≈ target, and each weight vector is a combination of shared basis vectors: task 1 = α₁b₁ + α₂b₂, task 2 = β₁b₁ + β₂b₂, task 3 = γ₁b₁ + γ₂b₂.]


Low-Rank Structure for MTL

A low-rank structure: the model matrix factors into basis vectors times coefficients,

$W = [b_1, \ldots, b_p] \begin{bmatrix} \tau_{11} & \cdots & \tau_{1m} \\ \vdots & \ddots & \vdots \\ \tau_{p1} & \cdots & \tau_{pm} \end{bmatrix}$

with $m$ tasks and $p$ basis vectors, $m > p$.


Low-Rank Structure for MTL

• Rank minimization formulation
  – $\min_W \; \mathrm{Loss}(W) + \lambda \cdot \mathrm{Rank}(W)$
  – Rank minimization is NP-hard

• Convex relaxation: trace-norm minimization
  – $\min_W \; \mathrm{Loss}(W) + \lambda \|W\|_*$
  – The trace norm is the convex envelope of the rank function (Fazel et al., 2001).
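As an illustration, the proximal operator of the trace norm used inside such solvers is singular value thresholding; a minimal numpy sketch (not tied to any particular package):

```python
import numpy as np

def prox_trace_norm(W, t):
    """Shrink the singular values of W by t (singular value thresholding)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

# Thresholding zeroes out small singular values, so the iterates become low rank.
W = np.random.default_rng(1).standard_normal((27, 15))   # e.g., 27 features x 15 tasks
print(np.linalg.matrix_rank(prox_trace_norm(W, t=3.0)))
```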


Low-Rank Structure for MTL

• Evaluation on the School data¹:

• Predict exam scores for 15362 students from 139 schools

• Describe each student by 27 attributes

• Compare Ridge Regression, Lasso, and Trace Norm (for inducing a low-rank structure)

¹ http://ttic.uchicago.edu/~argyriou/code/

Performance measure: $\text{N-MSE} = \dfrac{\text{mean squared error}}{\text{variance(target)}}$

[Chart: N-MSE (0.7–1.05) versus the index of the training ratio (1–8) for Ridge Regression, Lasso, and Trace Norm.]

The low-rank structure (induced via the trace norm) leads to the smallest N-MSE.


Low-Rank Structure for MTL

• Evaluation on the Face data¹: Trace Norm (low-rank structure)

[Figure: plot of the 1st weight vector, which captures the rough shape of the faces.]


A shared Low-Rank Structure for MTL

• Learning from the i-th task (Ando et al. '05, Chen et al. '09): training data × weight vector ≈ target, with weight vector

$u_i = \Theta v_i + w_i$, subject to $\Theta^T \Theta = I_p$

where the component $\Theta v_i$ (through the shared $\Theta$) is shared among tasks and $w_i$ is specific to each task.


A shared Low-Rank Structure for MTL

• Learning from multiple tasks:

$[u_1, u_2, \ldots, u_m] = \Theta\, [v_1, v_2, \ldots, v_m] + [w_1, w_2, \ldots, w_m]$

where $\Theta$ is the shared transformation matrix: $\Theta[v_1, \ldots, v_m]$ forms a shared low-rank structure, and $[w_1, \ldots, w_m]$ is a task-specific structure.


Empirical Loss

• Learning from the i-th task: $X_i u_i \approx y_i$, with $u_i = \Theta v_i + w_i$.

• Empirical loss on the i-th task, for example,

$\mathcal{L}_i\big(X_i(\Theta v_i + w_i),\, y_i\big) = \big\| X_i(\Theta v_i + w_i) - y_i \big\|^2$


iASO Formulation

• iASO formulation:

$\min_{\Theta, \{v_i, w_i\}} \; \sum_{i=1}^{m} \Big[ \mathcal{L}_i\big(X_i(\Theta v_i + w_i),\, y_i\big) + \alpha \|\Theta v_i + w_i\|^2 + \beta \|w_i\|^2 \Big] \quad \text{subject to } \Theta^T \Theta = I$

• Controls both model complexity and task relatedness
• Subsumes ASO (Ando et al. '05) and SVM as special cases
• Naturally leads to a convex relaxation, cASO (Chen et al. '10)
• iASO and cASO are equivalent under certain conditions


Multi-Task Learning with Clustered Structure

Assumption: tasks have group structures

• Most MTL techniques assume all tasks are related
• This is not true in many applications
• Clustered multi-task learning assumes the tasks have group structures: the models of tasks from the same group are closer to each other than to those from a different group
• E.g., one group of tasks predicts heart-related diseases while another group predicts brain-related diseases


Task Clustering in Neural Network

• Bakker and Heskes JMLR 2003


Clustered Multi-Task Learning

• Use regularization to capture clustered structures.

[Figure: training data X ≈ clustered models; the task models are grouped into Cluster 1, Cluster 2, …, Cluster k-1, Cluster k.]


Clustered Multi-Task Learning

• Capture structures by minimizing the sum-of-squared-error (SSE) objective of K-means clustering:

$\min_{I} \; \sum_{j=1}^{k} \sum_{v \in I_j} \|w_v - \bar{w}_j\|_2^2$

where $I_j$ is the index set of the j-th cluster. Equivalently,

$\min_{F} \; \mathrm{tr}(W^T W) - \mathrm{tr}(F^T W^T W F)$

where $F$ is the m×k orthogonal cluster indicator matrix with $F_{i,j} = 1/\sqrt{n_j}$ if $i \in I_j$ and 0 otherwise (m tasks, k clusters, k < m).
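A small numeric check of this identity (illustration only; the cluster assignment and data below are arbitrary):

```python
import numpy as np

# Verify that the K-means SSE over task models equals
# tr(W^T W) - tr(F^T W^T W F) for the orthogonal indicator F.
rng = np.random.default_rng(0)
d, m, k = 5, 6, 2                       # feature dim, tasks, clusters
W = rng.standard_normal((d, m))         # columns are task models w_v
labels = np.array([0, 0, 1, 1, 1, 0])   # an arbitrary assignment I

F = np.zeros((m, k))
for j in range(k):
    idx = labels == j
    F[idx, j] = 1.0 / np.sqrt(idx.sum())  # F[i,j] = 1/sqrt(n_j) if i in I_j

sse = sum(np.sum((W[:, labels == j]
                  - W[:, labels == j].mean(axis=1, keepdims=True)) ** 2)
          for j in range(k))
trace_form = np.trace(W.T @ W) - np.trace(F.T @ W.T @ W @ F)
print(np.allclose(sse, trace_form))     # True
```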


Clustered Multi-Task Learning

• Directly minimizing SSE is hard because of the non-linear constraint on F:

$\min_{F} \; \mathrm{tr}(W^T W) - \mathrm{tr}(F^T W^T W F)$

with $F$ the m×k orthogonal cluster indicator matrix, $F_{i,j} = 1/\sqrt{n_j}$ if $i \in I_j$ and 0 otherwise (m tasks, k clusters, k < m). Relaxing $F$ to an arbitrary orthonormal matrix (Zha et al. NIPS 2001) gives:

$\min_{F:\, F^T F = I_k} \; \mathrm{tr}(W^T W) - \mathrm{tr}(F^T W^T W F)$


Clustered Multi-Task Learning

• Clustered multi-task learning (CMTL) formulation [Zhou et al. NIPS 2011]:

$\min_{W,\, F:\, F^T F = I_k} \; \mathrm{Loss}(W) + \alpha \big( \mathrm{tr}(W^T W) - \mathrm{tr}(F^T W^T W F) \big) + \beta\, \mathrm{tr}(W^T W)$

The second term captures the cluster structures; the third improves generalization performance.


Convex Clustered Multi-Task Learning

$\min_{W,\, F:\, F^T F = I_k} \; \mathrm{Loss}(W) + \alpha \big( \mathrm{tr}(W^T W) - \mathrm{tr}(F^T W^T W F) \big) + \beta\, \mathrm{tr}(W^T W)$

can be rewritten as

$\min_{W,\, F:\, F^T F = I_k} \; \mathrm{Loss}(W) + \alpha \eta (1 + \eta)\, \mathrm{tr}\big( W (\eta I + F F^T)^{-1} W^T \big)$

and relaxed to the convex problem (Chen et al. KDD 2009; Jacob et al. NIPS 2009; Zhou et al. NIPS 2011):

$\min_{W, M} \; \mathrm{Loss}(W) + \alpha \eta (1 + \eta)\, \mathrm{tr}\big( W (\eta I + M)^{-1} W^T \big)$
subject to $\mathrm{tr}(M) = k, \; M \preceq I, \; M \in S_+^m$


Convex Clustered Multi-Task Learning

• Synthetic study [Zhou et al. NIPS 2011]

[Figure: recovered model matrices for the Ground Truth, Single Task Learning, Mean Regularized MTL, Trace Norm Regularized MTL, and Convex Relaxed CMTL. The convex relaxation introduces some noise; the low-rank approach can also capture the cluster structure well.]


Multi-Task Learning with Tree Structures

• In some scenarios, the tasks may be equipped with tree structures:

– Tasks belonging to the same node are similar to each other

– The similarity between two tasks is structured and relates to the depth of their common tree node

Assumption: tasks have tree structures

[Figure: a tree over task models a, b, and c; task a is more similar to b than to c.]


Multi-Task Learning with Tree Structures

• Tree-Guided Group Lasso (Kim and Xing 2010 ICML)

[Figure: a tree-guided group structure over task coefficients β₁, β₂, β₃ (input features → output tasks): leaf groups G_{v1} = {β₁}, G_{v2} = {β₂}, G_{v3} = {β₃}; internal group G_{v4} = {β₁, β₂}; root group G_{v5} = {β₁, β₂, β₃}.]

$\min_{\beta} \; \mathrm{Loss}(\beta) + \lambda \sum_{j} \sum_{v \in V} w_v \big\| \beta^{j}_{G_v} \big\|_2$
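A minimal sketch of evaluating this penalty for the 3-task tree above; the node weights are illustrative placeholders, not values from the paper:

```python
import numpy as np

groups = {                                  # G_v: the tasks each tree node covers
    "v1": [0], "v2": [1], "v3": [2],        # leaves
    "v4": [0, 1],                           # internal node grouping tasks 1 and 2
    "v5": [0, 1, 2],                        # root grouping all tasks
}
node_weight = {"v1": 0.3, "v2": 0.3, "v3": 0.5, "v4": 0.4, "v5": 0.6}  # assumed values

def tree_penalty(B, lam):
    """B has one row per feature j and one column per task."""
    return lam * sum(node_weight[v] * np.linalg.norm(B[j, idx])
                     for j in range(B.shape[0])
                     for v, idx in groups.items())

B = np.random.default_rng(2).standard_normal((4, 3))   # 4 features x 3 tasks
print(tree_penalty(B, lam=0.1))
```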


Multi-Task Learning with Graph Structures

• In real applications, tasks involved in MTL may have graph structures

– Two tasks are related if they are connected in the graph, i.e., connected tasks are similar

– The similarity of two related tasks can be represented by the weight of the connecting edge.

Assumption: tasks have graph/network structures


Multi-Task Learning with Graph Structures

• Graph-guided Fused Lasso (Chen et al. UAI 11)

[Figure: with Lasso, each gene SNP is mapped to each phenotype independently; with Graph-Guided Fused Lasso, SNPs (input) are mapped to a network of phenotypes (output).]

$\min_W \; \mathrm{Loss}(W) + \lambda \|W\|_1 + \Omega(W)$

with the graph-guided fusion penalty

$\Omega(W) = \gamma \sum_{e = (m,l) \in E} \tau(r_{ml}) \sum_{j=1}^{J} \big| w_{jm} - \mathrm{sign}(r_{ml})\, w_{jl} \big|$
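A minimal sketch of evaluating the fusion penalty; the choice τ(r) = |r| is an assumption (the slide leaves τ abstract):

```python
import numpy as np

def fusion_penalty(W, edges, gamma):
    """W: J features x T tasks; edges: list of (m, l, r_ml) with correlation r_ml."""
    total = 0.0
    for m, l, r in edges:
        tau = abs(r)                                   # assumed choice of tau
        total += tau * np.sum(np.abs(W[:, m] - np.sign(r) * W[:, l]))
    return gamma * total

W = np.random.default_rng(3).standard_normal((5, 3))   # 5 SNP features x 3 traits
edges = [(0, 1, 0.8), (1, 2, -0.6)]                    # a toy trait network
print(fusion_penalty(W, edges, gamma=0.5))
```

Positively correlated traits are pushed toward equal coefficients; negatively correlated traits toward opposite-signed ones.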


Quantitative Trait Network

• Linked Edge: the corresponding two traits are highly correlated.

• Edge thickness: strength of the correlation.

• Identifying SNPs that are associated with a subnetwork of clinical traits (Kim and Xing 2009).


Graph-Weighted Fused Lasso

• Lasso: each phenotype (represented as a circle) is independently mapped to SNPs for association

• Graph-constrained fused Lasso: uses a QTN to search for associations between a SNP and a subnetwork of traits

• Graph-weighted fused Lasso: uses a QTN with edge weights


Robust Multi-Task Learning

[Figure: most existing MTL approaches weight relevant and irrelevant tasks equally; robust MTL approaches separate the irrelevant tasks from the relevant ones.]


Incoherent Low-Rank and Sparse Structures

• Learning from the i-th task: training data × weight vector ≈ target

[Figure: the weight matrix decomposes into a low-rank structure plus an entry-wise sparse structure; the low-rank part captures task relatedness, while the sparse part selects discriminative features for each task.]


Incoherent Low-Rank and Sparse Structures

• Low-rank structure $P = [p_1, \ldots, p_m]$: controlled via the trace norm $\|P\|_*$ (sum of singular values), constrained by $\|P\|_* \leq \eta$

• Entry-wise sparse structure $Q = [q_1, \ldots, q_m]$: controlled via $\lambda \|Q\|_1$ (sum of the absolute values of all entries)


ISLR Formulation

• Empirical loss on the i-th task, e.g.,

$\mathcal{L}_i\big(X_i(p_i + q_i),\, y_i\big) = \|X_i(p_i + q_i) - y_i\|^2$

• Incoherent Sparse and Low-Rank (ISLR) formulation:

$\min_{P, Q} \; \sum_{i=1}^{m} \mathcal{L}_i\big(X_i(p_i + q_i),\, y_i\big) + \lambda \|Q\|_1 \quad \text{subject to } \|P\|_* \leq \eta$

• Convex formulation
• Decomposed sparse and low-rank structures


Low-Rank and Group Sparsity in MTL

• Learning from the i-th task: training data × weight vector ≈ target

[Figure: the weight matrix decomposes into a low-rank structure plus a column-wise (group) sparse structure; the low-rank part captures task relatedness, while the non-zero columns of the sparse part identify irrelevant tasks.]


Low-Rank and Group Sparsity in MTL

• Low-rank structure $L = [l_1, \ldots, l_m]$: controlled via the trace norm $\|L\|_*$ (sum of singular values of L)

• Group sparse structure $S = [s_1, \ldots, s_m]$: controlled via $\|S\|_{1,2} = \sum_i \|s_i\|_2$


Robust MTL Formulation

• Empirical loss on the i-th task, e.g.,

$\mathcal{L}_i\big(X_i(l_i + s_i),\, y_i\big) = \|X_i(l_i + s_i) - y_i\|^2$

• Robust MTL formulation:

$\min_{L, S} \; \sum_{i=1}^{m} \mathcal{L}_i\big(X_i(l_i + s_i),\, y_i\big) + \alpha \|L\|_* + \beta \|S\|_{1,2}$

• Captures task relatedness via the low-rank structure
• Identifies irrelevant tasks via the group-sparse structure
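A minimal proximal-gradient sketch of this decomposition, assuming a single shared design matrix X (the slides use per-task X_i); this is one interpretation, not the authors' code:

```python
import numpy as np

def svt(M, t):                                  # prox of t*||.||_* (singular value thresholding)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def col_soft(M, t):                             # prox of t*||.||_{1,2} over columns
    norms = np.maximum(np.linalg.norm(M, axis=0, keepdims=True), 1e-12)
    return np.maximum(0.0, 1.0 - t / norms) * M

def robust_mtl(X, Y, alpha, beta, iters=200):
    d, m = X.shape[1], Y.shape[1]
    L, S = np.zeros((d, m)), np.zeros((d, m))
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)         # safe step for the joint gradient
    for _ in range(iters):
        G = X.T @ (X @ (L + S) - Y)                      # shared smooth gradient w.r.t. L and S
        L, S = svt(L - step * G, step * alpha), col_soft(S - step * G, step * beta)
    return L, S          # the non-zero columns of S flag the outlier tasks
```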


Performance Bound

• Assumption on the existence of $\kappa_1(s)$ and $\kappa_2(q)$, involving:
  – the training data
  – the geometric structure of the coefficient matrices

• Performance bound: with probability of at least $1 - m e^{-\frac{1}{2}\left(t - d \log\left(1 + \frac{t}{d}\right)\right)}$,

$\frac{1}{T} \sum_{i=1}^{m} \big\| X_i^T (\hat{l}_i + \hat{s}_i) - f_i \big\|^2 \;\leq\; (1+\varepsilon) \inf_{l_i, s_i} \frac{1}{T} \sum_{i=1}^{m} \big\| X_i^T (l_i + s_i) - f_i \big\|^2 + \Phi(\varepsilon) \left( \frac{\alpha^2}{\kappa_1^2}\, 2r + \frac{\beta^2}{\kappa_2^2}\, c \right)$


Robust Multi-Task Feature Learning

• Simultaneously captures a common set of features among relevant tasks and identifies outlier tasks:


Robust Multi-Task Feature Learning

• Formulation:

• Algorithm:

– Accelerated Gradient Method

– Proximal Operator problems:


Robust Multi-Task Feature Learning

• Theoretical Guarantees

– With probability of at least

– With probability of


Optimization Algorithms

Objective: $\min_x f(x) = \mathrm{loss}(x) + \lambda\,\Omega(x)$

• Loss function $\mathrm{loss}(x)$: least squares loss, logistic loss
• Convex and smooth penalty $\Omega(x)$: regularized MTL
• Convex but non-smooth penalty $\Omega(x)$: ℓ2,1-norm, dirty MTL, trace norm
• Non-convex penalty $\Omega(x)$: handled via convex relaxation — CMTL, ASO


Optimization Algorithms

Objective: $\min_x f(x) = \mathrm{loss}(x) + \lambda\,\Omega(x)$

• Gradient descent (GD)
• Accelerated gradient method (AGM)
  – Solving the proximal operator


Gradient Descent

• Gradient descent is an algorithm for solving smooth optimization problems $\min_x f(x)$:

– Repeat $x_{i+1} = x_i - \gamma_i f'(x_i)$ until a convergence criterion is met.

– If $f(x)$ is continuously differentiable with Lipschitz-continuous gradient (constant $L$) and $\gamma_i \leq 1/L$, we obtain a convergence rate of $O(1/N)$.

• Most optimization problems in MTL are non-smooth.

• Can we apply gradient descent to non-smooth problems?


Gradient Descent

Smooth objective: $\min_x f(x)$

Repeat $x_{i+1} = x_i - \gamma_i f'(x_i)$ until convergence — equivalently, repeat $x_{i+1} = \arg\min_x M(x_i, \gamma_i)$ until convergence, where the model $M$ is the regularized 1st-order Taylor expansion:

$M(x_i, \gamma_i) = f(x_i) + \langle f'(x_i),\, x - x_i \rangle + \frac{1}{2\gamma_i} \|x - x_i\|_2^2$


Gradient Descent

• Use gradient descent with a composite model to solve non-smooth optimization problems.
• Convergence rate: O(1/N)

Objective: $\min_x f(x) = \mathrm{loss}(x) + \lambda\,\Omega(x)$

Composite model (1st-order Taylor expansion of the smooth loss, the proximal regularization term, and the non-smooth penalty):

$M(x_i, \gamma_i) = f(x_i) + \langle f'(x_i),\, x - x_i \rangle + \frac{1}{2\gamma_i} \|x - x_i\|_2^2 + \lambda\,\Omega(x)$

Repeat $x_{i+1} = \arg\min_x M(x_i, \gamma_i)$ until convergence.


Gradient Descent

Each composite-model update reduces to a proximal operator (Moreau, 1965):

$x_{i+1} = \arg\min_x M(x_i, \gamma_i) = \arg\min_x \frac{1}{2}\|x - v\|_2^2 + \rho\,\Omega(x)$

where $v = x_i - \gamma_i\, \mathrm{loss}'(x_i)$ and $\rho = \gamma_i \lambda$.
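A generic sketch of this loop (ISTA-style), shown with the ℓ1 penalty, whose proximal operator is soft-thresholding:

```python
import numpy as np

def prox_l1(v, rho):
    return np.sign(v) * np.maximum(np.abs(v) - rho, 0.0)   # soft-thresholding

def ista(grad, prox, x0, gamma, lam, iters=500):
    x = x0
    for _ in range(iters):
        v = x - gamma * grad(x)          # gradient step on the smooth loss
        x = prox(v, gamma * lam)         # proximal step on the penalty
    return x

# toy least-squares example
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
x = ista(lambda x: A.T @ (A @ x - b), prox_l1,
         np.zeros(20), gamma=1.0 / np.linalg.norm(A, 2) ** 2, lam=1.0)
print("non-zeros:", np.count_nonzero(x))
```

Swapping prox_l1 for the group or trace-norm proximal operators sketched earlier gives the corresponding MTL solvers.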


Accelerated Gradient Method (AGM)

• A faster extension of gradient descent (Nesterov, 1983; Nemirovski, 1994; Nesterov, 2004)

Gradient descent: repeat $x_{i+1} = x_i - \gamma_i f'(x_i)$ until convergence.

AGM adds a search point $s_i$ obtained by extrapolating from the previous two iterates:

repeat
  $s_i = x_i + \alpha_i (x_i - x_{i-1})$
  $x_{i+1} = s_i - \gamma_i f'(s_i)$
until convergence.


Accelerated Gradient Method (AGM)

For composite objectives, AGM takes the proximal step at the search point:

repeat
  $s_i = x_i + \alpha_i (x_i - x_{i-1})$
  $x_{i+1} = \arg\min_x M(s_i, \gamma_i)$
until convergence,

where $M(s_i, \gamma_i) = f(s_i) + \langle f'(s_i),\, x - s_i \rangle + \frac{1}{2\gamma_i}\|x - s_i\|_2^2 + \lambda\,\Omega(x)$.
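A minimal accelerated variant of the same loop (FISTA-style; the α_i sequence below is the standard Nesterov choice, which the slide leaves unspecified). It reuses grad and prox from the previous sketch:

```python
import numpy as np

def agm(grad, prox, x0, gamma, lam, iters=500):
    x, s, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = prox(s - gamma * grad(s), gamma * lam)   # proximal step at s_i
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        s = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x
```

The extrapolation improves the convergence rate from O(1/N) to O(1/N²) for convex composite objectives.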


Optimization with Non-Convex Objectives

• In multi-task learning, optimization objectives involved may be non-convex (e.g. clustered multi-task learning).

• Directly applying convex optimization techniques may yield suboptimal solutions.

• Convex relaxation
  – General non-convex problem: find the convex envelope
    • Rank minimization → trace-norm minimization
  – Difference of convex (DC) problem: convex-concave procedure (CCCP) [Yuille and Rangarajan NIPS 2001]
    • ℓ1/ℓ0.5 regularization → reweighted group Lasso


Difference of Convex (DC) Programming

• The objective can be written in the form $\min_x f(x) - g(x)$, where $f(x)$ and $g(x)$ are convex functions.

• We linearize $g(x)$ using the 1st-order Taylor expansion at $x'$; by the convexity of $g$ this yields an upper bound:

$f(x) - g(x) \leq f(x) - g(x') - \langle \nabla g(x'),\, x - x' \rangle$

• In every iteration of CCCP, we minimize the upper bound:

$x_{k+1} = \arg\min_x\, f(x) - \langle \nabla g(x_k),\, x \rangle$

• The objective function is guaranteed to decrease.
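A toy CCCP run (illustration only) on f(x) = (x − 2)² and g(x) = |x|, where each linearized subproblem has a closed-form minimizer:

```python
x = -1.0                                      # arbitrary start
for k in range(5):
    grad_g = 1.0 if x >= 0 else -1.0          # a subgradient of g at x_k
    x = 2.0 + grad_g / 2.0                    # argmin_x (x-2)^2 - grad_g * x
    print(k, x, (x - 2.0) ** 2 - abs(x))      # objective decreases, settles at x = 2.5
```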


Case Study: Disease Progression

• Alzheimer’s Disease (AD) is

– the most common type of dementia;

– severe neurodegenerative disorder;

– definitively diagnosed only through brain biopsy or autopsy;

– clinically diagnosed by clinical/cognitive measures including MMSE and ADAS-Cog.


Modeling Disease Progression

• The prediction of cognitive scores at each time point can be modeled as a regression task.

• Motivation of using multi-task learning: the ability to explore inherent relationships among related tasks and enforce such knowledge using proper regularizations.


Temporal Smoothness

• Prior knowledge: the change of cognitive scores should be small for a patient. The scores should not fluctuate:

• To enforce this prior knowledge, we use a regularization term that penalizes the difference between the models at adjacent time points.
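The penalty itself appears only as an image in the slides; a sketch of the standard temporal smoothness term, with $w_t$ the model at time point $t$, is:

$\sum_{t=1}^{T-1} \|w_t - w_{t+1}\|_2^2 = \|W H\|_F^2$

where $H$ is the $T \times (T-1)$ temporal difference matrix with $H_{t,t} = 1$, $H_{t+1,t} = -1$, and 0 elsewhere.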


Temporal Group Lasso

• Assumption: only a small subset of features is related to disease progression, shared among tasks.

• Achieve this using group sparsity (the ℓ2,1 penalty):


Fused Sparse Group Lasso

• Goal: find temporal patterns of the biomarkers in the disease progression.

• Simultaneous feature selection via sparse group Lasso:

– a common set of biomarkers for multiple time points

– specific sets of biomarkers for different time points

• Incorporate the temporal smoothness via the fused Lasso penalty.


Fused Sparse Group Lasso

• The convex formulation:
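The formulation on this slide was an image; a sketch of the FSGL objective as published by the authors, combining a lasso term, a temporal fused term, and a group lasso term, is:

$\min_{W} \; \|XW - Y\|_F^2 + \lambda_1 \|W\|_1 + \lambda_2 \|R W^T\|_1 + \lambda_3 \|W\|_{2,1}$

where $R$ encodes differences between adjacent time points.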

• Non-convex formulations:

– Reduce shrinkage bias

– Closer to the optimal ℓ0-norm

– Fewer tuning parameters


Performance

• MTL outperforms STL

• Fused sparse group Lasso formulations achieve better performance than temporal group Lasso


Longitudinal Stability Selection on ADAS-Cog

• Using FSGL

• From the distribution of stability scores, we can observe temporal patterns of MRI biomarkers.


Longitudinal Stability Selection on MMSE

• From the distribution of stability scores, we can observe temporal patterns of MRI biomarkers.


Case Study: Missing Data in Multi-Source Learning

• In many applications, multiple data sources may suffer from a considerable amount of missing data.

• In ADNI, over half of the subjects lack CSF measurements; an independent half of the subjects do not have FDG-PET.

[Figure: the ADNI data matrix over subjects (Subject1–Subject245) and sources — PET (P1–P116), MRI (M1–M305), and CSF (C1–C5) — with large contiguous blocks of missing entries.]


Challenges

• Simply removing samples with missing values dramatically reduces the number of samples in the analysis.

• In addition, the resources and time devoted to subjects with incomplete data are wasted.

• Estimating an entire missing block of values is very challenging.


Incomplete Multi-Source Feature Learning (iMSF)

• A “row-wise” strategy

– We first partition the samples into multiple blocks, one for each combination of data sources available

– We then build a separate model for each block of data

– Using multi-task techniques, all models involving a specific source are constrained to select a common set of features for that particular source


Overview of iMSF

[Figure: subjects are partitioned into four blocks by source availability (combinations of MRI, PET, and CSF); each block defines one task (Task I–IV) with its own model (Model I–IV).]
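A minimal sketch of this partitioning step (the data layout below is illustrative, not the ADNI schema):

```python
import numpy as np

def partition_by_availability(data):
    """data: dict source_name -> (n_subjects x p_s) array with NaN rows for
    subjects missing that source. Returns pattern -> subject indices."""
    n = next(iter(data.values())).shape[0]
    blocks = {}
    for i in range(n):
        pattern = tuple(s for s, X in sorted(data.items())
                        if not np.isnan(X[i]).any())      # sources present for subject i
        blocks.setdefault(pattern, []).append(i)
    return blocks

rng = np.random.default_rng(0)
data = {"MRI": rng.standard_normal((8, 4)),
        "PET": rng.standard_normal((8, 3)),
        "CSF": rng.standard_normal((8, 2))}
data["PET"][:4] = np.nan                                  # half the subjects lack PET
data["CSF"][2:6] = np.nan
for pattern, idx in partition_by_availability(data).items():
    print(pattern, idx)   # one iMSF task per block; features are shared per source
```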


iMSF: the Formulation

• Suppose the data set is divided into $m$ tasks: $T_i = \{(x_j^i, y_j^i)\}$, $i = 1, \ldots, m$, $j = 1, \ldots, N_i$, where $N_i$ is the number of subjects in the $i$-th task

• Denote $\beta^i$ as the weight vector for the $i$-th task

• $\beta_{I(s,k)}$ denotes all the model parameters corresponding to the $k$-th feature in the $s$-th data source

• We solve:

$\min_{\beta} \; \frac{1}{m} \sum_{i=1}^{m} \frac{1}{N_i} \sum_{j=1}^{N_i} L\big(x_j^i, y_j^i, \beta^i\big) + \lambda \sum_{s=1}^{S} \sum_{k=1}^{p_s} \big\| \beta_{I(s,k)} \big\|_2$


iMSF: Performance

[Charts: accuracy (≈0.77–0.85), sensitivity (≈0–0.5), and specificity (≈0.94–1.0) of iMSF versus the Zero, EM, KNN, and SVD baselines at the 50.0%, 66.7%, and 75.0% settings.]


Case Study: Drosophila Gene Expression Image Analysis


[Megason and Fraser (2007) Cell]


Life cycle of fruit fly Drosophila melanogaster

[Wolpert et al. (2006) Principles of Development]

"We are much more like flies in our development than you might think." — L. Wolpert

[Figure: the life cycle of Drosophila melanogaster, including embryogenesis.]


Drosophila gene expression pattern images

• Berkeley Drosophila Genome Project (BDGP), http://www.fruitfly.org/ — stages 1-3, 4-6, 7-8, 9-10, 11-12, 13-

• Fly-FISH, http://fly-fish.ccbr.utoronto.ca/ — stages 1-3, 4-5, 6-7, 8-9, 10-

[Figure: expression images for the genes bgm, en, hb, and runt across stages. Tomancak et al. (2002) Genome Biology; Lécuyer et al. (2007) Cell]


Comparative image analysis

[Figure: expression images of Twist, heartless, and stumps at stages 4-6 and 7-8, with annotation terms such as anterior endoderm AISN, trunk mesoderm AISN, subset cellular blastoderm mesoderm AISN, dorsal ectoderm AISN, procephalic ectoderm AISN, head mesoderm AISN, trunk mesoderm PR, head mesoderm PR, anterior endoderm anlage, and yolk nuclei. Tomancak et al. (2002) Genome Biology; Sandmann et al. (2007) Genes & Dev.]

We need the spatial and temporal annotations of expressions.


Challenges of manual annotation

[Chart: cumulative number of in situ images, 1992–2010 (y-axis 0–180,000).]


Spatial keywords annotation

• Prior approaches assume all keywords are associated with all images — Zhou and Peng (2007) Bioinformatics

[Figure: Actn, stage 11-12, annotated with brain primordium, visceral muscle primordium, and nerve cord primordium.]

• Multiple keywords are associated with multiple images

• Exact correspondences among keywords and images are NOT given


What are the challenges?

“We used human annotation, rather than automated approaches based on pattern recognition algorithms, because of the overwhelming complexity of annotation. Variation in morphology and incomplete knowledge of the shape and position of various embryonic structures make computational approaches impracticable at present.” P. Tomancak et al. (2002) Genome Biology

Pipeline: local invariant features → high-level representations → multi-task learning


Method outline

• Feature extraction from images: kernel-based approach, bag-of-words, sparse coding

• Model construction: SVM, low-rank multi-task, graph-based multi-task

[Ji et al. (2008) Bioinformatics; Ji et al. (2009) BMC Bioinformatics; Ji et al. (2009) NIPS]


From bag-of-words to sparse coding

[Figure: a local patch descriptor u is encoded against a codebook A.]

Bag-of-words — hard assignment to a single codeword (e.g., x = [0 0 1 0]):

$\min_x \; \frac{1}{2}\|u - Ax\|^2 \quad \text{s.t. } x_i \in \{0,1\}, \; x^T e = 1$

Sparse coding — soft assignment to a few codewords (e.g., x = [0 0.2 0.6 0.3]):

$\min_x \; \frac{1}{2}\|u - Ax\|_2^2 + \lambda \|x\|_1 \quad \text{s.t. } x \geq 0$

[Ji et al. (2009) SIGKDD]

Both can be improved by incorporating the proximity information of local patches.


Low-rank multi-task learning model

Sparse coding features X, labels Y, model W (one weight vector $w_i$ per keyword task):

$\min_W \; \sum_{i=1}^{k} \sum_{j=1}^{n} L\big(w_i^T x_j,\, y_{ij}\big) + \lambda \|W\|_*$

Loss term plus a low-rank penalty; the trace norm $\|W\|_*$ is the sum of singular values.

[Argyriou et al. (2008) Machine Learning]


Graph-based multi-task learning model

$\min_W \; \sum_{i=1}^{k} \sum_{j=1}^{n} L\big(w_i^T x_j,\, y_{ij}\big) + g \sum_{(p,q) \in G} |C_{pq}|\, \big\| w_p - \mathrm{sgn}(C_{pq})\, w_q \big\|_2^2 + \lambda \|W\|_F^2$

2-norm loss plus a graph penalty that fuses the models of keywords connected in G; the problem admits a closed-form solution.

[Figure: a keyword graph over sensory system head, ventral sensory complex, dorsal/lateral sensory complexes, embryonic maxillary sensory complex, embryonic antennal sense organ, and sensory nervous system, with edge weights between 0.31 and 0.79.]

[Ji et al. (2009) SIGKDD]


Spatial annotation performance

• 50% of the data is used for training and 50% for testing; 30 random trials are generated

• Sparse coding with low rank multi-task learning achieves the best performance

[Chart: AUC (0–90) across Stages 4-6, 7-8, 9-10, 11-12, and 13-16 for SC+LR, SC+Graph, BoW+Graph, and Kernel+SVM.]


MALSAR Package

• Jiayu Zhou, Jianhui Chen, Jieping Ye

• MALSAR: Multi-TAsk Learning via StructurAl Regularization

• http://www.public.asu.edu/~jzhou29/Software/MALSAR/index.html


Functions in MALSAR Package

• Regularized Multi-Task Learning

• Joint Feature Learning

• Trace Norm Minimization

• ASO

• Clustered Multi-Task Learning

• Network Multi-Task Learning

• Robust Multi-Task Learning


Trends in Multi-Task Learning

• Develop efficient algorithms for large-scale multi-task learning. In many areas where multi-task learning is applied, such as bioinformatics, the dimensionality of data can be huge.

• Learn task structures automatically in MTL

• Most multi-task learning techniques deal with supervised learning problems. There is high demand for developing new methods for semi-supervised and unsupervised learning.
