Summer Review 3

Towards Gene Expression Convolutions using Gene Interaction Graphs

School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA

Reviewed by: Arshdeep Sekhon
Department of Computer Science, University of Virginia
https://qdata.github.io/deep2Read/




GEO database

Figure: GEO data

Figure: Median dataset size and number of datasets added each year in the NIH GEO database.


Prior Information: GeneMANIA

GeneMANIA uses a database of organism-specific weighted networks to construct the resulting composite network.

The database includes over 1800 networks, containing over 500 million interactions for 8 organisms.

The networks are organized into groups such as co-expression, where edges are derived from expression profiles, and shared protein domains, where edges represent genes that encode proteins with similar domains.

Figure: Composite Network

Use such graphs to bias a model, so that it learns representations using information already known in biology.




Gene Graph Convolutions

Graph convolution is usually used where the data is in the form of a graph: citation networks, etc.

In gene expression, graphs are complementary to the main task.

Use graphs to bias the model.

With a low number of samples, known relationships between variables can help avoid spurious relationships.


Background: Graph Laplacian

The graph Laplacian regularization is a summation of smoothness terms on the variables. It encourages similar coefficients on the genes or other genomic features that are connected in the network.
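A minimal sketch of this smoothness penalty, assuming the unnormalized Laplacian L = D − A of an undirected graph. For that choice the quadratic form β^T L β equals the sum over edges of A_ij (β_i − β_j)^2, so coefficients that differ across an edge are penalized. The toy 3-gene path graph and the function names are hypothetical, not from the paper.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def smoothness_penalty(beta, adj):
    """Quadratic form beta^T L beta."""
    return float(beta @ laplacian(adj) @ beta)

def edge_sum_penalty(beta, adj):
    """Same penalty written edge by edge: sum over i<j of A_ij * (beta_i - beta_j)^2."""
    n = len(beta)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += adj[i, j] * (beta[i] - beta[j]) ** 2
    return float(total)

# Toy path graph: gene 0 - gene 1 - gene 2 (illustrative only).
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
smooth = np.array([1.0, 1.0, 1.0])   # identical coefficients on connected genes
rough = np.array([1.0, -1.0, 1.0])   # coefficients flip sign across every edge

print(smoothness_penalty(smooth, adj), smoothness_penalty(rough, adj))
```

The smooth coefficient vector incurs no penalty, while the rough one is penalized on both edges.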


Previous Work: Network Regularized Sparse Logistic Regression

Figure: NSLR

L is the normalized Laplacian matrix encoding a prior network (footnote 1).

1 NSLR Paper Link


Graph Convolution

Extract information from the neighbor nodes in a graph

Graph convolutions are a generalisation of convolutions and are easiest to define in the spectral domain.

Euclidean vs. non-Euclidean data


Graph convolution

Laplacian L = D − A

D: degree matrix, A: adjacency matrix

For convolution, the Euclidean shift-invariant definition is not applicable, since the graph structure isn't shift-invariant.

Use the spectral definition instead (convolution is element-wise multiplication in the Fourier domain).

$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$
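A small sketch of the spectral definition, under the assumption that the graph Fourier basis is the eigenvector matrix of the symmetric normalized Laplacian. The signal is transformed, multiplied element-wise by a filter response over the eigenvalues, and transformed back. The 4-node cycle graph, the `low_pass` response, and the function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5   # isolated nodes keep a zero scale
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def spectral_filter(signal, adj, response):
    """Filter a node signal by element-wise multiplication in the graph Fourier domain."""
    eigvals, eigvecs = np.linalg.eigh(normalized_laplacian(adj))
    spectrum = eigvecs.T @ signal                    # graph Fourier transform
    return eigvecs @ (response(eigvals) * spectrum)  # filter, then inverse transform

# Toy 4-node cycle graph (illustrative only).
adj = np.array([[0.0, 1.0, 0.0, 1.0],
                [1.0, 0.0, 1.0, 0.0],
                [0.0, 1.0, 0.0, 1.0],
                [1.0, 0.0, 1.0, 0.0]])
signal = np.array([1.0, -1.0, 1.0, -1.0])

def low_pass(eigvals):
    """Hypothetical response that damps high graph frequencies."""
    return np.exp(-2.0 * eigvals)

filtered = spectral_filter(signal, adj, low_pass)
print(filtered)
```

The full eigendecomposition costs O(n^3), which is one reason practical graph convolutions use localized approximations instead.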


Graph Convolution

Consider $X^{l+1} = \sigma(A X^{l} \theta^{l})$

Issues:

Multiplication with A sums up the feature vectors of all neighboring nodes but not the node itself. Fix: add self-loops, $A' = A + I$.

A is typically not normalized, so multiplication with A will completely change the scale of the feature vectors (footnote 2).

Symmetric normalization: $D^{-1/2} A D^{-1/2}$

2 https://tkipf.github.io/graph-convolutional-networks/

Graph Convolution

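$\theta \star X^{l} \approx {D'}^{-1/2} A' {D'}^{-1/2} X^{l} \theta = \hat{A} X^{l} \theta \quad (1)$

l is the layer, n is the number of nodes, o is the output feature size, c is the input feature size

$A' = A + I_N$, where A is the adjacency matrix

$D'_{ii} = \sum_j A'_{ij}$

where $X^{l} \in \mathbb{R}^{n \times c}$, $A \in \mathbb{R}^{n \times n}$, $\theta \in \mathbb{R}^{c \times o}$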
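The propagation rule in equation (1) can be sketched in NumPy as follows. This is a minimal illustration with the shapes listed above, not the paper's implementation. The ReLU nonlinearity, the random toy inputs, and the function names are assumptions.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D'^{-1/2} A' D'^{-1/2} with A' = A + I_N (self-loops added)."""
    a_prime = adj + np.eye(adj.shape[0])
    deg = a_prime.sum(axis=1)        # D'_ii = sum_j A'_ij, at least 1 for non-negative A
    d_inv_sqrt = deg ** -0.5
    return d_inv_sqrt[:, None] * a_prime * d_inv_sqrt[None, :]

def gcn_layer(x, adj, theta):
    """One graph convolution: sigma(A_hat X theta), with ReLU as sigma (assumption)."""
    a_hat = normalize_adjacency(adj)
    return np.maximum(a_hat @ x @ theta, 0.0)

# Toy sizes: n = 4 nodes, c = 3 input features, o = 2 output features.
rng = np.random.default_rng(0)
adj = np.array([[0.0, 1.0, 0.0, 0.0],
                [1.0, 0.0, 1.0, 0.0],
                [0.0, 1.0, 0.0, 1.0],
                [0.0, 0.0, 1.0, 0.0]])
x = rng.normal(size=(4, 3))      # X^l in R^{n x c}
theta = rng.normal(size=(3, 2))  # theta in R^{c x o}

out = gcn_layer(x, adj, theta)
print(out.shape)
```

Because of the self-loops, every node has degree at least one, so the inverse square root is always defined.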


Graph Convolution for gene networks

Figure: GCN

With this formulation it is not possible to have multiple types of interactions: all nodes are aggregated before any transformation.

It is possible to have different sets of parameters for different types of interactions.

But genes do not have such interactions.


GCN

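Add a skip connection at each convolution layer: add the neighborhood as well as the node itself.

$X^{l+1} = \mathrm{Aggregate}(\sigma(\hat{A} X^{l} \theta_1 + X^{l} \theta_2)) \quad (2)$, where $\hat{A}$ is the normalized adjacency matrix from equation (1).

Aggregate is hierarchical clustering.

Dropout encourages the model to spread information across all nodes and not rely on a single node.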
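A hedged sketch of one such layer. Several details here are assumptions, not taken from the paper: ReLU as the nonlinearity, dropout applied to whole nodes, and an Aggregate step that max-pools nodes sharing a cluster id. The cluster assignment is taken as given, since the hierarchical clustering itself is not shown. The function name and toy inputs are hypothetical.

```python
import numpy as np

def skip_gcn_layer(x, a_hat, theta1, theta2, clusters, drop_prob=0.0, rng=None):
    """sigma(A_hat X theta1 + X theta2), optional node dropout, then cluster pooling.

    clusters: integer array of length n with ids 0..K-1 (every id used at least once).
    """
    # Neighborhood term plus skip connection on the node's own features.
    h = np.maximum(a_hat @ x @ theta1 + x @ theta2, 0.0)
    if drop_prob > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(h.shape[0]) >= drop_prob   # drop entire nodes (assumption)
        h = h * keep[:, None] / (1.0 - drop_prob)
    # Aggregate: max-pool all nodes that share a cluster id (assumption).
    n_clusters = int(clusters.max()) + 1
    return np.stack([h[clusters == k].max(axis=0) for k in range(n_clusters)])

# Toy example: 4 nodes, 2 features, pooled into 2 clusters.
x = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 1.0],
              [0.0, 0.0]])
a_hat = np.eye(4)                      # placeholder normalized adjacency
theta1 = np.zeros((2, 2))              # switch off the neighborhood term
theta2 = np.eye(2)                     # skip connection passes features through
clusters = np.array([0, 0, 1, 1])

pooled = skip_gcn_layer(x, a_hat, theta1, theta2, clusters)
print(pooled)
```

With the neighborhood term switched off, the output is just the per-cluster maximum of the input features, which makes the pooling step easy to inspect.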


Experiment 1: PANCAN

10,459 RNA-Seq samples from TCGA

16,300 genes for each sample

Each sample has a cancer subtype or healthy label (the actual labels).

Option 1: Predict the actual labels. But DNNs don't work in this setting.

Option 2: Use a small subset of genes relevant for cancer subtype detection, a trait, etc.

Option 2 doesn't work because:

There is no assumption about which gene is relevant.

"We cannot guarantee any complex relationship is important to solve the task."

Option 3: Select a specific gene, binarize its expression as the label, and predict it using the rest of the genes.
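Option 3 can be sketched as below. The slides do not say how the label is binarized, so thresholding at the per-gene median is an assumption made only for illustration. The tiny expression matrix and the function name are hypothetical.

```python
import numpy as np

def single_gene_task(expr, target_gene):
    """Build a binary prediction task from an expression matrix.

    expr: array of shape (samples, genes).
    Returns (features, labels): the remaining genes, and +1/-1 labels for the target gene.
    """
    target = expr[:, target_gene]
    # Threshold at the median of the target gene (assumption, not from the slides).
    labels = np.where(target > np.median(target), 1, -1)
    # Remove the target gene so the model only sees the other genes.
    features = np.delete(expr, target_gene, axis=1)
    return features, labels

# Toy matrix: 4 samples, 2 genes (illustrative values only).
expr = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0],
                 [4.0, 40.0]])
features, labels = single_gene_task(expr, target_gene=1)
print(features.shape, labels)
```

Repeating this for every gene gives one binary task per gene, each with the same samples.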


Experiment: PANCAN

Figure: Adding genes based on distance

Predict +1/−1 for the selected gene using the expression values of its closest neighbors in the graph.

Add more genes successively until all are covered.
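One way to order genes by closeness is a breadth-first search from the selected gene, sketched below. Using hop count as the distance is an assumption; the paper may define distance differently. The gene ids `g0` to `g4`, the toy edges, and the function names are hypothetical.

```python
from collections import deque

def genes_by_distance(neighbors, target):
    """Order genes by hop distance from `target` using breadth-first search.

    neighbors: dict mapping a gene id to an iterable of adjacent gene ids.
    Returns the reachable genes, closest first, excluding the target itself.
    """
    seen = {target}
    order = []
    queue = deque([target])
    while queue:
        gene = queue.popleft()
        for nb in neighbors.get(gene, ()):
            if nb not in seen:
                seen.add(nb)
                order.append(nb)
                queue.append(nb)
    return order

def growing_gene_sets(neighbors, target, sizes):
    """Successively larger input sets: the k closest genes for each k in `sizes`."""
    order = genes_by_distance(neighbors, target)
    return [order[:k] for k in sizes]

# Toy interaction graph (illustrative only).
neighbors = {
    "g0": ["g1", "g2"],
    "g1": ["g0", "g3"],
    "g2": ["g0"],
    "g3": ["g1", "g4"],
    "g4": ["g3"],
}
sets = growing_gene_sets(neighbors, "g0", sizes=[2, 4])
print(sets)
```

Genes that are not connected to the selected gene are never reached by the search, so covering all genes would need an extra rule for them.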


Experiment: Add graph information

Use GeneMANIA and RegNetwork: two types of public database graphs.

Figure: Results


Results: Quality of graphs

Genes with a > 20% AUC improvement are a minority.

GeneMANIA outperforms RegNetwork.

RegNetwork has twice as many edges per node, indicating that simply merging does not lead to improvement.


Robustness to Noise

Figure: see the paper (the figure there is low resolution).
