L8: Part 1 Hierarchical trees Representing time

Kirill Bessonov

Nov 10th 2015



Talk Plan

• Trees – Similarity assessment via trees – Phylogenetic trees vocabulary and types

• Practical on phylogenetic trees and sequence alignment – Identifying source viral sequences

• Networks – examples – main definitions – biological examples

• Practical on WGCNA package – main protocol steps – interpretation of network modules – WGCNA demo


Decision Trees (DTs)

• A data structure type used in computer science (CS)

• A data model

– Purpose 1: recursively partition data

• cut the data space into regions with hyperplanes perpendicular to the feature axes (w)

– Purpose 2: classify data

• each leaf node holds a class label

• E.g. a decision tree that estimates whether or not a potential customer will respond to a direct mailing

– predicted binary class: YES or NO

Source: DECISION TREES by Lior Rokach


DT growth and splitting

• In the top-down approach, all data are first assigned to the root node

• Select attribute(s)/feature(s) to split the node

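• Splitting based on
– 1 feature: univariate split
– ≥2 features: multivariate split

• Stop tree growth when
– the maximum depth is reached
– the splitting criterion is not met

[Figure: example decision tree; internal nodes split on the selected feature(s) (X > x vs X < x, then Y > y vs Y < y); leaves (terminal nodes) hold the predicted class]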
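The slides do not name a splitting criterion. As one common choice (an assumption here, not necessarily the one in the cited source), the sketch below scores every candidate univariate split of one numeric feature by the weighted Gini impurity of the two child nodes and keeps the best threshold. `gini()` and `best_split()` are hypothetical helper names and the toy data are invented; packages such as rpart implement complete tree growth.

```r
# Gini impurity of a set of class labels: 1 - sum over classes of p^2
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Best univariate split "x < thr" versus "x >= thr" on one numeric feature x
best_split <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2) return(NULL)          # a constant feature cannot be split
  # candidate thresholds: midpoints between consecutive distinct values
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2
  # weighted impurity of the two child nodes for each candidate threshold
  impurity <- sapply(thresholds, function(thr) {
    left <- y[x < thr]
    right <- y[x >= thr]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  best <- which.min(impurity)
  list(threshold = thresholds[best], impurity = impurity[best])
}

# Invented toy data: one feature and a YES/NO response as in the mailing example
x <- c(1, 2, 3, 10, 11, 12)
y <- c("NO", "NO", "NO", "YES", "YES", "YES")
res <- best_split(x, y)
res$threshold   # 6.5
res$impurity    # 0
```

Growing a full tree would apply `best_split()` recursively to each child node until one of the stopping rules above is met.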


Hierarchical Trees

• Trees can also be used for
– Clustering
– Hierarchy determination, e.g. phylogenetic trees

• Convenient visualization
– an effective visual condensation of the clustering results

• Gene Ontology (GO) – a directed acyclic graph (DAG)
– Example of a functional hierarchy
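As a minimal sketch of using a tree for clustering, the code below builds a hierarchical clustering tree (dendrogram) with base R's `dist()`, `hclust()` and `cutree()`. The expression values are invented, and average linkage is just one illustrative choice among several.

```r
# Invented toy expression profiles: 4 genes measured in 3 conditions
m <- rbind(g1 = c(1.0, 2.0, 3.0),
           g2 = c(1.1, 2.1, 2.9),
           g3 = c(5.0, 1.0, 0.0),
           g4 = c(5.2, 0.9, 0.1))
d <- dist(m)                          # Euclidean distances between genes
hc <- hclust(d, method = "average")   # agglomerative (bottom-up) hierarchical tree
clusters <- cutree(hc, k = 2)         # cut the tree into 2 clusters
clusters                              # g1 and g2 together, g3 and g4 together
# plot(hc) draws the dendrogram
```

Cutting the tree at different heights (or values of `k`) gives coarser or finer clusterings from the same hierarchy.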


GO tree example


Phylogenetic trees

• Show evolutionary relationships

• Taxon (plural: taxa) – a group of organisms

• Clade – a group of organisms consisting of a common ancestor and all of its descendants

• Common ancestor – an ancestor that the given organisms have in common

[Figure: phylogenetic tree with one clade highlighted]

Building a phylo tree using ape

• Ape - Analyses of Phylogenetics and Evolution

– Functions to create and manipulate phylo trees

– Graphical exploration of phylogenetic data

• To build a phylogenetic tree

1. Download protein sequences from a database (DB)

2. Align the sequences

3. Calculate pairwise distances using ape

4. Build the tree (e.g. by neighbour joining with nj()) and visualize it


Building an unrooted phylogenetic tree (1)

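#install required libraries (run once)
install.packages("seqinr")
install.packages("ape")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("muscle") #Bioconductor package; biocLite() is no longer supported
library("seqinr")     #database queries: choosebank(), query(), getSequence()
library("Biostrings") #AAStringSet()
library("muscle")     #multiple sequence alignment
library("ape")        #distances, trees and tree plotting

#align a list of protein sequences with MUSCLE and return an ape "alignment"
multipleSeqAlignment <- function(seqnames, seqs) {
  #collapse each sequence (a vector of single letters) into one string
  seq_strings <- vapply(seqs, paste, character(1), collapse = "")
  fasta_seqs_Object <- AAStringSet(seq_strings)
  names(fasta_seqs_Object) <- seqnames
  #multiple sequence alignment
  alignment <- muscle::muscle(fasta_seqs_Object) #muscle format
  alignment_ape <- ape::as.alignment(as.matrix(alignment))
  return(alignment_ape)
}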


Building an unrooted phylogenetic tree (2)

#main part of the code
choosebank("swissprot") #selects database for query
seqnames <- c("P06747", "P0C569", "O56773", "Q5VKP1")
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
#multipleSeqAlignment() is defined on the previous slide
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)
#nj() performs the neighbor-joining tree estimation of Saitou and Nei
mytree <- nj(mydist)
#label tips by accession: the alignment may not keep the input order
virus <- c(Q5VKP1 = "Western Caucasian bat virus", P06747 = "rabies virus",
           P0C569 = "Mokola virus", O56773 = "Lagos bat virus")
mytree$tip.label <- paste0(mytree$tip.label, "-\n", virus[mytree$tip.label], "\nphosphoprotein")
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 0.8,
           no.margin = TRUE, srt = 50)


Unrooted Phylogenetic Tree

• Phylogenetic tree showing the distances between 4 viral protein sequences

• The genetic distance between O56773 and P0C569 is the smallest

Unrooted phylogenetic tree (1)

• The lengths of the branches
– are proportional to the amount of evolutionary change
• estimated from the number of mutations (substitutions)

• This is an unrooted phylogenetic tree – it does not contain an outgroup sequence
• an outgroup is the sequence of a protein that is known to be more distantly related to the other proteins in the tree than they are to each other
• adding one lets us place the root, i.e. the common ancestor of all the other taxa

Unrooted phylogenetic tree (2)

• We cannot tell in which direction evolutionary time ran along the internal branches of the tree.

• We cannot tell whether the node representing the common ancestor of (O56773, P0C569) was
• an ancestor of the node representing the common ancestor of (Q5VKP1, P06747),
• or the other way around…

Distance matrix

• Inspecting the calculated distance matrix between the aligned sequences confirms the results seen in the phylogenetic tree

• The closest pair is the O56773 and P0C569 proteins

        Q5VKP1  P06747  P0C569
P06747  0.49
P0C569  0.48    0.45
O56773  0.50    0.46    0.41
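To check the reading of this matrix by hand, the sketch below rebuilds it in base R, finds the closest pair, and computes the neighbour-joining Q criterion, $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$, which NJ minimises to choose the first pair to join. This shows only the first NJ step, not ape's full `nj()`. With 4 taxa, Q is equal for complementary pairs (here (Q5VKP1, P06747) and (P0C569, O56773)), so either choice gives the same unrooted split.

```r
# Distance matrix from the slide (lower triangle, filled column by column)
labs <- c("Q5VKP1", "P06747", "P0C569", "O56773")
D <- matrix(0, 4, 4, dimnames = list(labs, labs))
D[lower.tri(D)] <- c(0.49, 0.48, 0.50, 0.45, 0.46, 0.41)
D <- D + t(D)                       # make the matrix symmetric

# Closest pair: smallest off-diagonal distance
off <- D
diag(off) <- Inf
idx <- which(off == min(off), arr.ind = TRUE)[1, ]
closest <- rownames(D)[idx]
closest                             # "O56773" "P0C569"

# Neighbour-joining Q criterion (first joining step only)
n <- nrow(D)
r <- rowSums(D)                     # total distance from each taxon to all others
Q <- (n - 2) * D - outer(r, r, "+")
diag(Q) <- Inf
round(Q, 2)                         # minimum -1.89, shared by the two complementary pairs
```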


Rooted phylogenetic tree

• To convert the unrooted tree into a rooted tree, we need to add an outgroup sequence

• Outgroup
– a taxon outside the group of interest
– branches off at the base of the phylogeny
– represented here by Caenorhabditis elegans (UniProt accession Q10572) and Caenorhabditis remanei (UniProt E3M2K8)

• If we were to build a phylogenetic tree of the Fox-1 homologues in vertebrates, the distantly related sequences from worms would probably be a good choice of outgroup, since they come from a different taxon (worms)

Building a rooted phylogenetic tree (1)

#BUILDING A ROOTED TREE OF PROTEIN SEQUENCES (FOX1)
#Q9NWB1 - Human
#Q17QD3 - Cow
#Q95KI0 - Monkey
#A1A5R1 - Rat
#Q10572 - Worm C.elegans (root)
#E1G4K8 - Eye worm
seqnames <- c("Q9NWB1", "Q17QD3", "Q95KI0", "A1A5R1", "Q10572", "E1G4K8")
choosebank("swissprot") #selects database for query
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)

Building a rooted phylogenetic tree (2)

library("ape")
mytree <- nj(mydist)
#root the tree on the C. elegans sequence (outgroup)
myrootedtree <- root(mytree, outgroup = "Q10572", resolve.root = TRUE)
#relabel tips by accession: the alignment may not keep the input order
species <- c(Q9NWB1 = "Human", Q17QD3 = "Cow", Q95KI0 = "Monkey",
             A1A5R1 = "Rat", Q10572 = "C.elegans(Root)", E1G4K8 = "Eye worm")
myrootedtree$tip.label <- paste0(myrootedtree$tip.label, "-", species[myrootedtree$tip.label])
myrootedtree
#Phylogenetic tree with 6 tips and 5 internal nodes.
#Rooted; includes branch lengths.
plot.phylo(myrootedtree, edge.color = "blue", edge.width = 3, type = "p")

Rooted tree of FOX1 proteins

• The invertebrates are grouped together

• The two worm sequences form a distinct group, although the genetic distance between them is large

• Human FOX1 is closest to the monkey and cow sequences

[Figure: rooted tree of FOX1 proteins, with the worm sequences as the outgroup]

Distance matrix

        E1G4K8  Q10572  A1A5R1  Q9NWB1  Q17QD3
Q10572  0.72
A1A5R1  0.75    0.63
Q9NWB1  0.72    0.62    0.44
Q17QD3  0.73    0.62    0.50    0.28
Q95KI0  0.73    0.61    0.49    0.28    0.14

• As expected, the eye worm is the species most distantly related to the vertebrates

• Cow and monkey have the closest relationship (the lowest genetic distance, 0.14)

Table legend: Q9NWB1 – Human; Q95KI0 – Monkey; Q10572 – Worm C. elegans (root); Q17QD3 – Cow; A1A5R1 – Rat; E1G4K8 – Eye worm

Rooted tree

• Time runs from left to right

• Monkey, cow and human share common ancestor 3

• Ancestor 1 is the common ancestor of ancestors 2 and 3

[Figure: rooted tree with a time axis running from left to right]

Exercises on phylogenetic tree building

• Q1. Calculate the genetic distances between the following NS1 proteins from different Dengue virus strains: Dengue virus 1 NS1 protein (UniProt: Q9YRR4), Dengue virus 2 NS1 protein (UniProt: Q9YP96), Dengue virus 3 NS1 protein (UniProt: B0LSS3), and Dengue virus 4 NS1 protein (UniProt: Q6TFL5). Based on the genetic distances, which viruses are the most closely related, and which are the least closely related? Note: Dengue virus causes Dengue fever, which the WHO classifies as a neglected tropical disease. There are four main types of Dengue virus: Dengue virus 1, Dengue virus 2, Dengue virus 3, and Dengue virus 4.

• Q2. Build an unrooted phylogenetic tree of the NS1 proteins from Dengue virus 1, Dengue virus 2, Dengue virus 3 and Dengue virus 4, using the neighbour-joining algorithm. Which are the most closely related proteins, based on the tree?


• Q3. The Zika virus is related to the Dengue viruses but is not a Dengue virus, and can therefore be used as an outgroup in phylogenetic trees of Dengue virus sequences. UniProt accession Q32ZE1 is a sequence similar to the Dengue NS1 protein, so it appears to be a related protein from Zika virus. Build a rooted phylogenetic tree of the Dengue NS1 proteins based on an alignment, using the Zika virus protein as the outgroup. Which are the most closely related Dengue virus proteins, based on the tree? What extra information does this tree give you, compared to the unrooted tree in Q2?


Answers

Question 1: Summary of viral proteins and UniProt accession numbers:

• Q9YRR4 – Dengue virus 1 NS1 protein
• Q9YP96 – Dengue virus 2 NS1 protein
• B0LSS3 – Dengue virus 3 NS1 protein
• Q6TFL5 – Dengue virus 4 NS1 protein

seqnames <- c("Q9YRR4","Q9YP96","B0LSS3","Q6TFL5")

choosebank("swissprot") #selects database for query

seqs=list()

for(i in 1:length(seqnames)){

query <- query(paste("AC=",seqnames[i],sep=""))

seqs[i]=getSequence(query)

}

alignment_ape <- multipleSeqAlignment(seqnames, seqs);

mydist <- dist.alignment(alignment_ape);

mydist


Answers

• Q1. The distance matrix is as follows:

        Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4  0.306
Q9YP96  0.333   0.254
B0LSS3  0.297   0.230   0.227

The most distant pair is Q9YP96 (V2) and Q6TFL5 (V4), with a genetic distance of 0.333, while the most closely related pair is Q9YP96 (V2) and B0LSS3 (V3), with a genetic distance of 0.227

Answers

Question 2:

library("ape")
mytree <- nj(mydist)
#plotting unrooted tree based on alignment of whole protein sequences
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 1.2,
           no.margin = TRUE, srt = 0)
#trim each sequence to the well-aligned region from motif "DMGY" up to motif "GEDG"
seqs_trim <- seqs
for (i in seq_along(seqs)) {
  seq_string <- paste(seqs_trim[[i]], collapse = "")
  start_pos <- regexpr("DMGY", seq_string)[1]
  end_pos <- regexpr("GEDG", seq_string)[1]
  if (start_pos < 0 || end_pos < 0) stop("motif not found in ", seqnames[i])
  seqs_trim[[i]] <- seqs_trim[[i]][start_pos:end_pos]
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs_trim)
mydist <- dist.alignment(alignment_ape); mydist
mytree <- nj(mydist)
#plotting unrooted tree based on the trimmed (best aligned) region
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 1.2,
           no.margin = TRUE, srt = 0)


Question 2 (continued):

alignment_ape <- multipleSeqAlignment(seqnames, seqs_trim);

mydist <- dist.alignment(alignment_ape);mydist

library("ape")

mytree <- nj(mydist)

#tree based on the best aligned portion

plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=1.2,

no.margin=T, srt=0)

Answers


Answers

• The resulting Q2 unrooted tree agrees with the genetic distance matrix calculated in Q1. The tree suggests that B0LSS3 and Q9YP96 are the most closely related proteins. To improve the quality of the tree, it is best to select a region with a minimal number of gaps between the protein sequences. For how gap cleaning affects phylogenetic trees, see reference [2].

Below you can see that there are regions with lots of gaps. Let's build another tree based on the most conserved region (the block starting at DMGYWIES) to see if it is the same.

Q6TFL5 DMGCVVSWNGKELKC…KDQKAVHADMGYWIESSKNQTWQIEKASLIEVKTCLWPKTHTL…GMEIRPLSEKEENMVKSQVTA

Q9YRR4 ------------------------DMGYWIESEKNETWKLARASFIEVKTCIWPKSHTL…GMEI-----------------

Q9YP96 DSGCVVSWKNKELKC…KDNRAVHADMGYWIESALNDTWKIEKASFIEVKNCHWPKSHTL…GMEIRPLKEKEENLVNSLVTA

B0LSS3 --------------------ASHADMGYWIESQKNGSWKLEKASLIEVKTCTWPKSHTL…------------------------

Alignment of proteins: built using the full lengths of the proteins


Answers

• The resulting tree has the same topology, but the trimmed alignment gives better resolution between the proteins: the closest pair (B0LSS3, Q9YP96: 0.216) is separated more clearly from the next closest pair (B0LSS3, Q9YRR4: 0.233) than with whole sequences (0.227 vs 0.230)

Best aligned portion of protein sequences used (the conserved region):

        Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4  0.317
Q9YP96  0.317   0.264
B0LSS3  0.292   0.233   0.216

Whole protein sequences used:

        Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4  0.306
Q9YP96  0.332   0.254
B0LSS3  0.297   0.230   0.227

Answers

Question 3:

#Q3: building a rooted tree with Q89277 (yellow fever virus) as the outgroup
library("seqinr")
library("muscle")
library("ape")
seqnames <- c("Q9YRR4", "Q9YP96", "B0LSS3", "Q6TFL5", "Q89277")
choosebank("swissprot") #selects database for query
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape); mydist
mytree <- nj(mydist)
myrootedtree <- root(mytree, outgroup = "Q89277", resolve.root = TRUE)
plot.phylo(myrootedtree, type = "p", edge.color = "blue", edge.width = 3,
           cex = 1.2, no.margin = TRUE, srt = 0)

Answers

• Q3 asks for a rooted tree with a related virus as the outgroup; this answer uses the yellow fever virus protein (Q89277) as the outgroup instead of the Zika virus protein (Q32ZE1) named in the question

• Most closely related viruses: B0LSS3 and Q9YP96

• This rooted tree tells you which of the Dengue virus NS1 proteins branched off earliest from the common ancestor. An unrooted tree does not provide this ancestry information (i.e. the time order)

        Q89277  Q6TFL5  Q9YRR4  Q9YP96
Q6TFL5  0.523
Q9YRR4  0.511   0.306
Q9YP96  0.486   0.333   0.254
B0LSS3  0.487   0.297   0.230   0.227

[Figure: rooted tree of the Dengue NS1 proteins with Q89277 as the outgroup]

References

1. Ape library for phylogenetic trees and ancestry with bootstrap methods. http://cran.r-project.org/web/packages/ape/ape.pdf

2. Gerard Talavera and Jose Castresana. Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments. Systematic Biology 56(4): 564-577 (2007).

L8: Part 2 Networks of Biological interactions

Kirill Bessonov

Nov 10th 2015

We are surrounded by networks


Transportation Networks


Computer Networks


Social networks


Internet submarine cable map


From describing to engineering

• In 1950

– Alex Bavelas founds the Networks Laboratory Group at M.I.T. to study the effectiveness of different communication patterns

Social interaction patterns


PPI (Protein Interaction Networks)

• Nodes – protein names

• Links – physical binding events

Network Definitions


Network components

• Networks are also called graphs

– Graph (G) contains

• Nodes (N): genes, SNPs, cities, PCs, etc.

• Edges (E): links connecting two nodes


Some characteristics

• Networks are

– Complex

– Dynamic

– Can be used to reduce data dimensionality

[Figure: the same network at time = t0 and at a later time = t]

Topology

• Refers to connection pattern

– The pattern of links


Modules

• Sub-networks with

– Specific topology

– Function

• Biological context

– Protein complex

– Common function

• E.g. energy production

[Figure: a module forming a clique]

Edge types

[Figure: a graph with N nodes and E edges, drawn as a directed and as an undirected graph]

Network types

• Directed
– Edges have directionality
– Some links are unidirectional
– Direction matters: going A → B is not the same as going B → A
– Analogous to chemical reactions: the forward rate might not be the same as the reverse rate
– E.g. directed gene regulatory networks (TF → gene)

• Undirected
– Edges have no directionality
– Simpler to describe and work with
– E.g. co-expression networks

Neighbours of node(s)

• Neighbours(node, order) = {node1 … nodep}

• Neighbours(3,1) = {2,4}

• Neighbours(2,2) = {1,3,5,4}
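The example graph itself is only in the slide figure. The sketch below assumes the edge list 1-2, 1-5, 2-3, 2-5, 3-4, 4-5, 4-6, reconstructed because it reproduces every example quoted on these slides (the neighbour sets, walks, degrees and shortest path). It finds neighbours of a given order with a breadth-first search; `bfs_dist()` and `neighbours()` are hypothetical helper names.

```r
# Assumed example graph (reconstructed from the slide examples; see lead-in)
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1               # undirected graph: symmetric adjacency matrix

# Breadth-first search: number of hops from `start` to every node (NA = unreachable)
bfs_dist <- function(A, start) {
  d <- rep(NA_integer_, nrow(A))
  d[start] <- 0L
  frontier <- start
  while (length(frontier) > 0) {
    # unvisited nodes adjacent to any node of the current frontier
    nxt <- which(colSums(A[frontier, , drop = FALSE]) > 0 & is.na(d))
    d[nxt] <- d[frontier[1]] + 1L
    frontier <- nxt
  }
  d
}

# Neighbours(node, order): all nodes within `order` hops of `node`, excluding itself
neighbours <- function(A, node, order) {
  d <- bfs_dist(A, node)
  which(!is.na(d) & d >= 1 & d <= order)
}

neighbours(A, 3, 1)   # 2 4
neighbours(A, 2, 2)   # 1 3 4 5
```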


Reachability of two nodes i and j

• Walk – a sequence of nodes visited on the way from node i to node j

– e.g. nodes(1,2) = {5,2,1,2,3,4,5,2}

• Trail – a walk with no repeated edges

– e.g. nodes(1,4)={5,4}

• Path – a walk with no repeated nodes

– e.g. nodes(1,6)={5,4,6}

[Figure: example graph with the visited nodes highlighted]
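The three definitions can be checked mechanically. The sketch below again assumes the reconstructed example graph (edges 1-2, 1-5, 2-3, 2-5, 3-4, 4-5, 4-6, an assumption) and writes each slide example as a full node sequence including the start node. `classify_walk()` is a hypothetical helper that reports the most specific class. Every path is also a trail, so the slide's trail example nodes(1,4) = {5,4} is reported as a path.

```r
# Assumed example graph (reconstructed from the slide examples)
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Classify a node sequence as "path", "trail", "walk" or "not a walk"
classify_walk <- function(A, nodes) {
  steps <- cbind(head(nodes, -1), tail(nodes, -1))   # consecutive node pairs
  if (!all(A[steps] != 0)) return("not a walk")      # every step must follow an edge
  # undirected edge identifiers: "1-2" for both 1 -> 2 and 2 -> 1
  edge_ids <- paste(pmin(steps[, 1], steps[, 2]), pmax(steps[, 1], steps[, 2]), sep = "-")
  if (anyDuplicated(nodes) == 0) return("path")      # no repeated nodes
  if (anyDuplicated(edge_ids) == 0) return("trail")  # repeated nodes, no repeated edges
  "walk"
}

classify_walk(A, c(1, 5, 2, 1, 2, 3, 4, 5, 2))  # "walk"  (slide example nodes(1,2))
classify_walk(A, c(1, 5, 4, 6))                 # "path"  (slide example nodes(1,6))
classify_walk(A, c(1, 2, 5, 4, 3, 2))           # "trail" (node 2 repeated, no edge repeated)
```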

Connectivity

• Line (edge) connectivity (λ)
– Minimum number of lines (edges) that need to be removed to disconnect graph G
• i.e. after their removal, some node can no longer be reached from the others

• Node connectivity (κ)
– Minimum number of nodes that need to be removed to disconnect graph G

[Figure: two example graphs, s and t, with λs = 3, κs = 2 and λt = 3, κt = 2]
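Both definitions can be turned directly into a brute-force search: try removing 1, 2, 3, … nodes (or edges) until the graph falls apart. The sketch below does exactly that in base R. It is only practical for tiny graphs (the number of subsets grows exponentially); graph libraries use max-flow based algorithms instead. The helper names and the test graph (a complete graph on 4 nodes) are illustrative assumptions, since the slide's graphs s and t are not given in the text.

```r
# TRUE if every node can be reached from node 1 (graph given as adjacency matrix)
is_connected <- function(A) {
  n <- nrow(A)
  if (n <= 1) return(TRUE)
  seen <- 1
  frontier <- 1
  while (length(frontier) > 0) {
    nb <- which(colSums(A[frontier, , drop = FALSE]) > 0)
    frontier <- setdiff(nb, seen)
    seen <- union(seen, frontier)
  }
  length(seen) == n
}

# Node connectivity (kappa): smallest number of nodes whose removal disconnects G
node_connectivity <- function(A) {
  n <- nrow(A)
  if (!is_connected(A)) return(0)
  for (k in seq_len(n - 2)) {               # always keep at least 2 nodes
    subsets <- combn(n, k)
    for (j in seq_len(ncol(subsets))) {
      keep <- setdiff(seq_len(n), subsets[, j])
      if (!is_connected(A[keep, keep, drop = FALSE])) return(k)
    }
  }
  n - 1                                      # complete graph: by convention kappa = n - 1
}

# Edge connectivity (lambda): smallest number of edges whose removal disconnects G
edge_connectivity <- function(A) {
  if (!is_connected(A)) return(0)
  edges <- which(upper.tri(A) & A != 0, arr.ind = TRUE)
  m <- nrow(edges)
  for (k in seq_len(m)) {
    subsets <- combn(m, k)
    for (j in seq_len(ncol(subsets))) {
      B <- A
      e <- edges[subsets[, j], , drop = FALSE]
      B[e] <- 0                              # delete the chosen edges ...
      B[e[, 2:1, drop = FALSE]] <- 0         # ... in both directions
      if (!is_connected(B)) return(k)
    }
  }
  m
}

# Complete graph on 4 nodes: kappa = 3, lambda = 3
K4 <- matrix(1, 4, 4); diag(K4) <- 0
c(kappa = node_connectivity(K4), lambda = edge_connectivity(K4))
```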


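Connectivity matrix (also known as adjacency matrix)

• $A = [a_{ij}]$, a square matrix of size N × N (N = number of nodes)

• Entries are binary (edge present or absent) or weighted (edge strength)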
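As a minimal sketch, the code below builds a binary adjacency matrix from an edge list and a weighted one from the same edges. The edge list is the assumed reconstruction of the slides' small example graph (1-2, 1-5, 2-3, 2-5, 3-4, 4-5, 4-6), and the weights are made up for illustration.

```r
# Edge list of the small example graph (assumed; reconstructed from the slide examples)
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))
n <- 6
A <- matrix(0L, n, n, dimnames = list(1:n, 1:n))   # N x N matrix of zeros
A[edges] <- 1L            # binary entry a_ij = 1 for each edge i-j ...
A[edges[, 2:1]] <- 1L     # ... and a_ji = 1, since the graph is undirected
A

# A weighted network stores an edge strength instead of 1 (made-up weights)
W <- matrix(0, n, n)
w <- c(0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.1)
W[edges] <- w
W[edges[, 2:1]] <- w
```

For an undirected graph the matrix is symmetric with a zero diagonal; a directed graph would set only `A[edges]`.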


Node degree (k)

• the number of edges connected to the node

• k(6) = 1

• k(4) = 3


Degree distribution (P(k))

• Determines the statistical properties of uncorrelated networks


source: http://www.network-science.org/powerlaw_scalefree_node_degree_distribution.html
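Node degree and the degree distribution both follow from the adjacency matrix: a node's degree is its row sum, and P(k) is the fraction of nodes with degree k. The sketch below uses the assumed reconstruction of the small example graph (edges 1-2, 1-5, 2-3, 2-5, 3-4, 4-5, 4-6), which matches the slide values k(6) = 1 and k(4) = 3.

```r
# Assumed example graph (reconstructed from the slide examples)
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1

k <- rowSums(A)              # node degree: number of edges attached to each node
k[6]                         # 1
k[4]                         # 3

Pk <- table(k) / length(k)   # degree distribution P(k): fraction of nodes with degree k
Pk                           # P(1) = 1/6, P(2) = 1/3, P(3) = 1/2
```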


Topology: scale-free

• Many real networks have a degree distribution that follows a power law, $P(k) \propto k^{-\gamma}$

• Other quantities that follow power laws include
– the sizes of earthquakes
– the sizes of craters on the moon
– solar flares
– the sizes of activity patterns of neuronal populations
– the frequencies of words in most languages
– the frequencies of family names
– the sizes of power outages
– criminal charges per convict
– and many more

Topology: random

• In a random (Erdős–Rényi) network, each pair of nodes is linked independently with the same probability p, so node degrees follow a binomial (approximately Poisson) distribution rather than a power law
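A minimal simulation of the Erdős–Rényi G(n, p) model in base R: every possible pair is linked with probability p, independently of all other pairs, so the degree of a node is binomial with n − 1 trials and mean (n − 1)p. The values n = 1000 and p = 0.01 are arbitrary choices for illustration.

```r
set.seed(1)
n <- 1000                      # number of nodes
p <- 0.01                      # probability that any given pair of nodes is linked
A <- matrix(0L, n, n)
up <- upper.tri(A)
A[up] <- rbinom(sum(up), size = 1, prob = p)   # each pair linked independently
A <- A + t(A)                  # undirected: mirror the upper triangle
k <- rowSums(A)                # node degrees

mean(k)                        # close to (n - 1) * p = 9.99
# observed P(k) next to the binomial(n - 1, p) prediction for degrees 5..15
obs <- as.numeric(table(factor(k, levels = 5:15))) / n
expected <- dbinom(5:15, size = n - 1, prob = p)
round(cbind(k = 5:15, observed = obs, expected = expected), 3)
```

Unlike a power law, this distribution has a thin tail: nodes with degree far above the mean (hubs) are extremely rare.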

Shortest path (p)

• Indicates the distance between nodes i and j in terms of geodesics (unweighted: the number of edges on the shortest route)

• Paths from node 1 to node 3 include
– {1-5-4-3}
– {1-5-2-3}
– {1-2-5-4-3}
– {1-2-3}

• The shortest is {1-2-3}, so p(1,3) = 2
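In an unweighted graph, breadth-first search finds a shortest path: nodes are discovered in order of increasing hop count, and following the recorded parents back from the target gives the route. The sketch assumes the reconstructed example graph (edges 1-2, 1-5, 2-3, 2-5, 3-4, 4-5, 4-6); `shortest_path()` is a hypothetical helper that returns one shortest path when there are ties.

```r
# Assumed example graph (reconstructed from the slide examples)
edges <- rbind(c(1, 2), c(1, 5), c(2, 3), c(2, 5), c(3, 4), c(4, 5), c(4, 6))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1

# One shortest path from `from` to `to` by breadth-first search (unweighted graph)
shortest_path <- function(A, from, to) {
  n <- nrow(A)
  parent <- rep(NA_integer_, n)       # node through which each node was first reached
  visited <- rep(FALSE, n)
  visited[from] <- TRUE
  queue <- from
  while (length(queue) > 0) {
    v <- queue[1]; queue <- queue[-1]
    if (v == to) break
    for (w in which(A[v, ] != 0)) {
      if (!visited[w]) {
        visited[w] <- TRUE
        parent[w] <- v
        queue <- c(queue, w)
      }
    }
  }
  if (!visited[to]) return(integer(0))  # `to` cannot be reached
  path <- to
  while (path[1] != from) path <- c(parent[path[1]], path)  # follow parents back to `from`
  path
}

p13 <- shortest_path(A, 1, 3)
p13                  # 1 2 3
length(p13) - 1      # p(1,3) = 2 edges
```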

Betweenness centrality

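$$b_i = \sum_{j<k;\; j,k \neq i} \frac{b_{jk}(i)}{b_{jk}}$$

where $b_{jk}$ is the number of shortest paths (SPs) from j to k, and $b_{jk}(i)$ is the number of those SPs that pass through node i

• i.e. for each pair of other nodes, the fraction of their shortest paths that pass through node i, summed over all such pairs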
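Counting shortest paths pair by pair is slow, so the sketch below uses Brandes' algorithm, a standard method swapped in here (the slides do not say how betweenness is computed): one breadth-first search per source node counts shortest paths, and a backward pass accumulates each node's share. It also standardizes by (n-1)(n-2)/2 as on the later slide. The slides' example graphs are not recoverable from the text, so the usage example assumes a 7-node path 1-2-…-7 as a stand-in.

```r
# Betweenness centrality of every node in an unweighted, undirected graph
# (Brandes' algorithm; A is a symmetric 0/1 adjacency matrix)
betweenness <- function(A) {
  n <- nrow(A)
  bc <- numeric(n)
  for (s in seq_len(n)) {
    # 1) breadth-first search from s: count shortest paths and record predecessors
    visit_order <- integer(0)
    preds <- replicate(n, integer(0), simplify = FALSE)
    sigma <- numeric(n); sigma[s] <- 1       # number of shortest paths from s
    d <- rep(-1, n); d[s] <- 0               # hop distance from s (-1 = not reached yet)
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      visit_order <- c(visit_order, v)
      for (w in which(A[v, ] != 0)) {
        if (d[w] < 0) {                      # w reached for the first time
          d[w] <- d[v] + 1
          queue <- c(queue, w)
        }
        if (d[w] == d[v] + 1) {              # edge v-w lies on a shortest path from s
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    # 2) walk back from the farthest nodes, accumulating the dependency of s on each node
    delta <- numeric(n)
    for (w in rev(visit_order)) {
      for (v in preds[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) bc[w] <- bc[w] + delta[w]
    }
  }
  bc / 2   # undirected: each pair (j, k) was counted once from j and once from k
}

# Assumed stand-in graph: a path 1-2-3-4-5-6-7
P7 <- matrix(0, 7, 7)
for (i in 1:6) { P7[i, i + 1] <- 1; P7[i + 1, i] <- 1 }
b <- betweenness(P7)
b                                   # 0 5 8 9 8 5 0
n <- nrow(P7)
b / ((n - 1) * (n - 2) / 2)         # standardized: divide by (n-1)(n-2)/2 = 15
```

When two shortest paths tie (as for the pairs worth 1/2 in the 5-node example), `sigma` splits the credit between them, which is what the ratio in the formula expresses.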


Facebook academic network

[Figure: Facebook academic network; blue = low and red = high betweenness]

Betweenness centrality

• reflects the amount of control that a node has over the interactions of other nodes in the network

$$b_c = \frac{b_{ab}(c)}{b_{ab}} + \frac{b_{ae}(c)}{b_{ae}} + \frac{b_{ad}(c)}{b_{ad}} + \frac{b_{be}(c)}{b_{be}} + \frac{b_{bd}(c)}{b_{bd}} + \frac{b_{de}(c)}{b_{de}} = \frac{0}{1} + \frac{1}{2} + \frac{0}{1} + \frac{1}{2} + \frac{0}{1} + \frac{0}{1} = 1$$

• Possible node pairs: {AB, AC, AD, AE, BC, BD, BE, CD, CE, DE}; only the six pairs that do not include C contribute to $b_c$

Betweenness centrality standardized

• For standardization, divide by (n-1)(n-2)/2 (= 15 for n = 7)
– the maximum possible betweenness of a node, i.e. the number of pairs of nodes that do not include it

Node  b  b standardized
1     0  0
2     0  0
3     9  9/15
4     9  9/15
5     8  8/15
6     0  0
7     0  0

Possible node pairs (21): 12 23 34 45 56 67 13 24 35 46 57 14 25 36 47 15 26 37 16 27 17

Cliques

• A clique of a graph G is a complete subgraph of G
– i.e. every pair of its nodes is connected by an edge

• The highlighted clique is the maximal clique of size 4 (nodes)
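Checking whether a set of nodes forms a clique only needs the corresponding block of the adjacency matrix: every off-diagonal entry must be non-zero. A clique is maximal when no other node can be added to it. The helpers below are hypothetical names, and the 5-node toy graph is invented for illustration.

```r
# TRUE if every pair of the given nodes is joined by an edge (a complete subgraph)
is_clique <- function(A, nodes) {
  sub <- A[nodes, nodes, drop = FALSE]
  all(sub[upper.tri(sub)] != 0)
}

# TRUE if the clique cannot be extended by adding any other node
is_maximal_clique <- function(A, nodes) {
  if (!is_clique(A, nodes)) return(FALSE)
  others <- setdiff(seq_len(nrow(A)), nodes)
  !any(vapply(others, function(v) is_clique(A, c(nodes, v)), logical(1)))
}

# Invented toy graph: nodes 1-4 fully connected, node 5 attached to node 4 only
A <- matrix(0, 5, 5)
A[1:4, 1:4] <- 1
diag(A) <- 0
A[4, 5] <- A[5, 4] <- 1

is_clique(A, 1:4)          # TRUE
is_maximal_clique(A, 1:4)  # TRUE
is_clique(A, c(3, 4, 5))   # FALSE: nodes 3 and 5 are not linked
```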

“The richest people in the world look for and build networks. Everyone else looks for work.” – Robert Kiyosaki

Biological context


Biological Networks


Biological examples

• Co-expression networks – link genes that have similar expression profiles

• Directed gene regulatory networks (GRNs) – show the directionality of gene interactions

• Transcription factor → target gene expression
– Show the direction of information flow
– E.g. a transcription factor activating a target gene

• Protein-Protein Interaction networks (PPI) – show physical interactions between proteins
– Concentrate on binding events

• Others – metabolic, differential, Bayesian, etc.

Biological networks

• Three main classes

Type | Name | Nodes | Edges | Resource
molecular interactions | PPI | proteins | physical bonds | BioGRID
molecular interactions | DTI | drugs/targets | physical bonds | PubChem
functional associations | GI | genes | genetic interactions | BioGRID
functional associations | ON | Gene Ontology | functional relations | GO
functional associations | GDA | genes/diseases | associations | OMIM
functional/structural similarities | Co-Ex | genes | expression profile similarity | GEO, ArrayExpress
functional/structural similarities | PStrS | proteins | structural similarities | PDB

Source: Gligorijević, Vladimir, and Nataša Pržulj. "Methods for biological data integration: perspectives and challenges." Journal of The Royal Society Interface 12.112 (2015): 20150571.


Inferring co-expression networks in R

WGCNA package (Weighted Gene Co-expression Network Analysis)

70


Main features

• Builds correlation networks

• Correlations are

– simple to calculate

– fast on large-scale data

• Captures the sign of an association (positive or negative), but not its direction

• Offers many network metrics (e.g. connectivity)

• Makes modules easy to identify

– modules reduce the dimensionality of the dataset
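The following minimal base-R sketch illustrates the point about sign versus direction. It uses simulated gene profiles, and the names gene_a, gene_b and gene_c are hypothetical. Pearson correlation is symmetric, so it keeps the sign of an association but cannot tell which gene acts on which.

```r
# Hypothetical, simulated expression profiles of two genes over 20 samples
set.seed(2)
gene_a <- rnorm(20)
gene_b <- -2 * gene_a + rnorm(20, sd = 0.3)  # strongly anti-correlated with gene_a

r_ab <- cor(gene_a, gene_b)
r_ba <- cor(gene_b, gene_a)
print(c(r_ab = r_ab, r_ba = r_ba))  # same negative value: sign kept, no direction

# All pairwise gene-gene correlations come from one vectorized call,
# which is part of why correlation networks scale to large data sets
expr <- cbind(gene_a, gene_b, gene_c = rnorm(20))
print(round(cor(expr), 2))
```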

71


• Construct a network – search for genes with similar expression profiles

• Identify modules in the predicted network – reduce data into gene sets/groups

• Relate modules to external information, e.g. clinical data or biological function (gene ontology, pathways) – find biologically interesting modules

• Find the key drivers in interesting modules – experimental validation, therapeutics, biomarkers

• Study module preservation across different data sets – check robustness of module definition

72


Steps for constructing a co-expression network

A) Obtain gene expression data

B) Measure co-expression between genes via a correlation coefficient

C) Build the correlation matrix = network (adjacency matrix)

D) Transform the correlation matrix with the power adjacency function, a_ij = |cor(x_i, x_j)|^β, to obtain a new adjacency matrix = weighted network
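The four steps can be sketched in a few lines of base R. This is a minimal illustration, not the WGCNA implementation. The expression matrix is simulated, with samples in rows and genes in columns as in the demo later in this lecture. The value β = 6 is simply the one the demo also uses.

```r
# Hypothetical simulated data: 30 samples (rows) x 8 genes (columns);
# genes 1-4 share a common signal, genes 5-8 are independent noise
set.seed(42)
n_samples <- 30
n_genes <- 8
signal <- rnorm(n_samples)

# A) Obtain gene expression data
expr <- matrix(rnorm(n_samples * n_genes), nrow = n_samples, ncol = n_genes)
expr[, 1:4] <- expr[, 1:4] + 3 * signal
colnames(expr) <- paste0("Gene", seq_len(n_genes))

# B) + C) Pearson correlation between every pair of genes = correlation matrix
cor_mat <- cor(expr)

# D) Power adjacency function a_ij = |cor_ij|^beta -> weighted network
beta <- 6
adj <- abs(cor_mat)^beta

print(round(adj[1:5, 1:5], 3))
```

Raising |cor| to a power keeps strong correlations relatively large and pushes weak ones towards 0. This is how the weighted network emphasizes the genes 1 to 4 group.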

73


Network = Adjacency Matrix

• The adjacency matrix A = [a_ij] encodes whether, and how strongly, each pair of nodes i, j is connected

– Weighted networks: a_ij is the edge value (weight)

– Unweighted networks: a_ij ∈ {0, 1} indicates the presence or absence of an edge
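This small sketch contrasts the two kinds of adjacency matrix built from the same correlations. The 3 × 3 correlation matrix, the hard-threshold cutoff tau = 0.8 and β = 6 are illustrative assumptions.

```r
# Illustrative correlation matrix for three hypothetical genes
cor_mat <- matrix(c(1.0,  0.9,   0.3,
                    0.9,  1.0,  -0.85,
                    0.3, -0.85,  1.0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("g1", "g2", "g3"), c("g1", "g2", "g3")))

# Unweighted: a_ij = 1 if |cor_ij| >= tau, else 0 (edge present or absent)
tau <- 0.8
adj_unweighted <- (abs(cor_mat) >= tau) * 1

# Weighted: a_ij = |cor_ij|^beta, a continuous edge weight in [0, 1]
beta <- 6
adj_weighted <- abs(cor_mat)^beta

print(adj_unweighted)
print(round(adj_weighted, 3))
```

The hard threshold discards everything below tau. The soft (power) version keeps every edge but shrinks weak ones towards zero.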

74


Scale Free Network Topology

• Scale-free topology means

– presence of hub nodes that are highly connected to other nodes

– metabolic networks exhibit scale-free topology, at least approximately

– node connectivity (degree) k follows a power law, p(k) ∝ k^(-γ)

– p(k) = proportion of nodes that have connectivity k
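The following sketch shows how connectivity k and p(k) can be computed for a weighted network. Here k_i is the sum of node i's edge weights, and p(k) is estimated by binning k. The data are simulated. Two further choices are assumptions of this sketch: 10 equal-width bins, and zeroing the diagonal so that a gene is not counted as its own neighbour.

```r
# Hypothetical data: 40 samples x 200 genes of pure noise, turned into a
# weighted network with beta = 6 (the data carry no real structure)
set.seed(1)
n_genes <- 200
expr <- matrix(rnorm(40 * n_genes), nrow = 40)
adj <- abs(cor(expr))^6
diag(adj) <- 0   # do not count a gene as its own neighbour

# Connectivity (weighted degree): k_i = sum over j of a_ij
k <- rowSums(adj)

# p(k): proportion of nodes falling in each of 10 equal-width connectivity bins
bins <- cut(k, breaks = 10)
p_k <- as.vector(table(bins)) / length(k)

print(summary(k))
print(round(p_k, 3))
```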

[Figure: Frequency Distribution of Connectivity; x-axis: connectivity k, y-axis: frequency]

75


How to check Scale Free Topology?

Only a few nodes display high connectivity.

Check whether the obtained network follows scale-free topology:

• Idea: log-transform p(k) and k and look at the scatter plot of log p(k) versus log k; a power law appears as a straight line

• Answer: the R^2 of a linear fit quantifies goodness of fit; R^2 > 0.6 means that the network (approximately) follows scale-free topology
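The log-log check can be written as a small function. This is a simplified base-R re-implementation of the idea that assumes 10 equal-width bins of k. The function name scale_free_r2 is hypothetical, and WGCNA's own scaleFreePlot() may differ in details.

```r
# Hypothetical helper: R^2 of a straight-line fit of log10 p(k) on log10 k
scale_free_r2 <- function(k, n_bins = 10) {
  bins <- cut(k, breaks = n_bins)            # equal-width bins of connectivity
  mean_k <- tapply(k, bins, mean)            # representative k per bin (NA if empty)
  p_k <- as.vector(table(bins)) / length(k)  # proportion of nodes per bin
  keep <- p_k > 0 & !is.na(mean_k)           # log10(0) is undefined: drop empty bins
  fit <- lm(log10(p_k[keep]) ~ log10(mean_k[keep]))
  summary(fit)$r.squared
}

# Illustrative heavy-tailed connectivities (Pareto-like): few nodes have high k
set.seed(7)
k_heavy <- (1 - runif(2000))^(-1 / 1.5)
r2 <- scale_free_r2(k_heavy)
print(r2)
print(r2 > 0.6)   # the slide's rule of thumb for approximate scale-free topology
```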

76


Power function transformation

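• Idea:

– transform the correlation matrix via a power function, a_ij = |cor_ij|^β

– impose (approximate) scale-free topology

– select the best beta (β)

• Pick the β that corresponds to the largest R^2; in practice, take the smallest β at which R^2 reaches a high plateau, since a larger β also lowers mean connectivity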

[Figure: scale-free fit R^2 versus power function exponent (β)]
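The β selection can be sketched as a scan. For each candidate power, build the weighted network, then compute the scale-free fit R^2 and the mean connectivity. Finally, take the smallest β whose R^2 reaches a cutoff. The helper scale_free_r2, the simulated data and the 0.8 cutoff are assumptions for illustration. In the demo, WGCNA's pickSoftThreshold() produces comparable fit indices for a vector of powers.

```r
# Hypothetical helper (same idea as a log-log fit of p(k) against k)
scale_free_r2 <- function(k, n_bins = 10) {
  bins <- cut(k, breaks = n_bins)
  mean_k <- tapply(k, bins, mean)
  p_k <- as.vector(table(bins)) / length(k)
  keep <- p_k > 0 & !is.na(mean_k)
  summary(lm(log10(p_k[keep]) ~ log10(mean_k[keep])))$r.squared
}

# Simulated data: 40 samples x 300 genes, genes 1-60 share one signal
set.seed(3)
signal <- rnorm(40)
expr <- matrix(rnorm(40 * 300), nrow = 40)
expr[, 1:60] <- expr[, 1:60] + 2 * signal
cor_abs <- abs(cor(expr))

# Scan candidate powers: fit R^2 and mean connectivity for each beta
betas <- 1:12
fit <- as.data.frame(t(sapply(betas, function(b) {
  adj <- cor_abs^b
  diag(adj) <- 0
  k <- rowSums(adj)
  c(beta = b, r2 = scale_free_r2(k), mean_k = mean(k))
})))
print(round(fit, 3))

# Smallest beta whose R^2 reaches the (assumed) cutoff, NA if none does
cutoff <- 0.8
ok <- which(fit$r2 >= cutoff)
chosen_beta <- if (length(ok) > 0) fit$beta[min(ok)] else NA
print(chosen_beta)
```

Choosing the smallest adequate β, rather than the largest, avoids shrinking the mean connectivity more than needed.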

77


Defining modules

• based on a hierarchical cluster tree

– build a tree and cut it

– dynamic tree cutting at optimal height [1]

• Module = branch of a cluster tree
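The following base-R sketch turns a cluster tree into modules. It clusters genes on the dissimilarity 1 - adjacency with average linkage, then cuts the tree at a fixed height so that each remaining branch is a module. This is a static cut with stats::cutree(), not the dynamic tree cut cited above. The simulated data, β = 6 and the cut height 0.95 are assumptions.

```r
# Simulated data: 40 samples x 30 genes with two planted modules
set.seed(11)
n_samples <- 40
sig_a <- rnorm(n_samples)
sig_b <- rnorm(n_samples)
expr <- matrix(rnorm(n_samples * 30), nrow = n_samples)
expr[, 1:10]  <- expr[, 1:10]  + 3 * sig_a   # module A: genes 1-10
expr[, 11:20] <- expr[, 11:20] + 3 * sig_b   # module B: genes 11-20
                                             # genes 21-30: no shared signal

# Weighted adjacency -> dissimilarity -> average-linkage tree
adj <- abs(cor(expr))^6
diss <- 1 - adj
tree <- hclust(as.dist(diss), method = "average")

# Static cut: every branch below height 0.95 becomes one module label
modules <- cutree(tree, h = 0.95)
print(table(modules))
```

Unstructured genes end up as singletons here. WGCNA instead labels such non-module genes grey, as in the demo.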

78


Analysis of modules

• Perform gene ontology (GO) enrichment analysis on the genes of each module (e.g. yellow = “genes 1”)

• Link modules to clinical data (e.g. weight) via the module eigengene, e.g. cor(trait, eigengene)

[Figure: modules labelled genes 1, genes 2, genes 3 and genes 4]

79


Heatmap view of module

[Figure: heat map of a module of co-expressed genes; rows = genes grouped into modules, columns = tissue samples; vertical bands indicate tight co-expression of module genes]

80


Modules as eigengenes

• All genes in a module can be summarized by one eigengene (i.e. a virtual gene), the first principal component of the module's expression

• Eigengenes allow one to relate modules to each other

– e.g. to calculate distances between modules

• Eigengenes also relate modules to clinical traits and SNPs
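The following sketch computes a module eigengene as the first principal component of the module's standardized expression, with samples in rows. The function name module_eigengene and the simulated data are hypothetical. The sign flip follows the common convention of aligning the eigengene with the module's average expression, which WGCNA's moduleEigengenes() also offers, but this is not its code. The last lines compute an eigengene-based distance between two modules, using the (1 - cor)/2 form from the demo.

```r
# Hypothetical helper: first principal component of standardized module genes
module_eigengene <- function(expr_module) {
  x <- scale(expr_module)                # center and scale each gene (column)
  pc1 <- svd(x, nu = 1, nv = 0)$u[, 1]   # first left singular vector: PC1 sample scores up to scale
  # The sign of a singular vector is arbitrary: align it with the module average
  if (cor(pc1, rowMeans(x)) < 0) pc1 <- -pc1
  pc1
}

# Simulated module: 15 genes in 25 samples, all driven by one hidden signal
set.seed(5)
n_samples <- 25
driver <- rnorm(n_samples)
module_genes <- sapply(1:15, function(i) driver + rnorm(n_samples, sd = 0.5))

me <- module_eigengene(module_genes)
print(round(cor(me, driver), 3))   # near 1: the eigengene recovers the shared signal

# Distance between two modules via their eigengenes
other_module <- sapply(1:15, function(i) rnorm(n_samples))
me_other <- module_eigengene(other_module)
d_modules <- (1 - cor(me, me_other)) / 2   # dissimilarity in [0, 1]
print(round(d_modules, 3))
```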

81


The brown module eigengene across samples

[Figure: top, heat map of the brown module (rows = genes, columns = microarray samples); bottom, bar plot of the brown module eigengene across samples]

Module eigengene = measure of the module's over-expression in each sample (roughly the average "redness" of that sample's column in the heat map)

82


Analysis of modules

• Relate modules to traits

• Focus on modules whose eigengene–trait correlation exceeds 0.75 (shown in red)
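The following sketch relates eigengenes to a trait. It tests each module eigengene's correlation with the trait and flags modules above the 0.75 cutoff. The eigengenes and trait are simulated, and the column names only mimic WGCNA's ME<color> naming. Applying the cutoff to |r|, so that strongly negative modules are also kept, is an assumption of this sketch.

```r
# Simulated trait and eigengenes for 50 samples (illustrative only)
set.seed(9)
n_samples <- 50
trait <- rnorm(n_samples)
datME <- data.frame(
  MEbrown  = -0.9 * trait + sqrt(1 - 0.9^2) * rnorm(n_samples),
  MEgreen  =  0.9 * trait + sqrt(1 - 0.9^2) * rnorm(n_samples),
  MEyellow = rnorm(n_samples)
)

# Pearson correlation and its p-value for each module eigengene
results <- do.call(rbind, lapply(names(datME), function(m) {
  test <- cor.test(trait, datME[[m]])
  data.frame(module = m, r = unname(test$estimate), p = test$p.value)
}))
results$interesting <- abs(results$r) > 0.75
print(results)
```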

83


WGCNA Demo Simulated data - 5 modules

84


Simulating expression data (1)

# Note: install Hmisc first, otherwise the WGCNA installation may fail
install.packages("Hmisc")
# Install the Bioconductor dependencies of WGCNA before WGCNA itself
# (the old biocLite.R installer is deprecated; BiocManager replaces it)
install.packages("BiocManager")
BiocManager::install(c("GO.db", "preprocessCore", "impute"))
install.packages("WGCNA")

# Simulate data
# Load WGCNA package
library(WGCNA)
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE)
# Here are the input parameters of the simulation model
# number of samples (microarrays) in the training data
no.obs=50
# now we specify the true measures of eigengene significance
# recall that ESturquoise=cor(y,MEturquoise)
ESturquoise=0; ESbrown= -.6;
ESgreen=.6;ESyellow=0
# Note that we don't specify the eigengene significance of the blue module
# since it is highly correlated with the turquoise module.
ESvector=c(ESturquoise,ESbrown,ESgreen,ESyellow)
# number of genes
nGenes1=3000
# proportion of genes in the turquoise, blue, brown, green, and yellow module, respectively
simulateProportions1=c(0.2,0.15, 0.08, 0.06, 0.04)
# Note that the proportions don't add up to 1. The remaining genes will be colored grey,
# i.e. the grey genes are non-module genes.
# set the seed of the random number generator. As a homework exercise change this seed.
set.seed(1)

85


Simulating expression data (2)

#Step 1: simulate a module eigengene network.

# Training Data Set I

MEgreen=rnorm(no.obs)

scaledy=MEgreen*ESgreen+sqrt(1-ESgreen^2)*rnorm(no.obs)

y=ifelse( scaledy>median(scaledy),2,1)

MEturquoise= ESturquoise*scaledy+sqrt(1-ESturquoise^2)*rnorm(no.obs)

# we simulate a strong dependence between MEblue and MEturquoise

MEblue= 0.6*MEturquoise+ sqrt(1-.6^2) *rnorm(no.obs)

MEbrown= ESbrown*scaledy+sqrt(1-ESbrown^2)*rnorm(no.obs)

MEyellow= ESyellow*scaledy+sqrt(1-ESyellow^2)*rnorm(no.obs)

ModuleEigengeneNetwork1=data.frame(y,MEturquoise,MEblue,MEbrown,MEgreen, MEyellow)

86


Simulating expression data (3)

dat1=simulateDatExpr5Modules(MEturquoise=ModuleEigengeneNetwork1$MEturquoise,

MEblue=ModuleEigengeneNetwork1$MEblue,

MEbrown=ModuleEigengeneNetwork1$MEbrown,

MEyellow=ModuleEigengeneNetwork1$MEyellow,

MEgreen=ModuleEigengeneNetwork1$MEgreen,

nGenes=nGenes1,

simulateProportions=simulateProportions1)

datExpr = dat1$datExpr;

truemodules = dat1$truemodule;

datME = dat1$datME;

attach(ModuleEigengeneNetwork1)

datExpr=data.frame(datExpr)

ArrayName=paste("Sample",1:dim(datExpr)[[1]], sep="" )

# The following code is useful for outputting the simulated data

GeneName=paste("Gene",1:dim(datExpr)[[2]], sep="" )

dimnames(datExpr)[[1]]=ArrayName

dimnames(datExpr)[[2]]=GeneName

rm(dat1); collectGarbage();

# The following command will save all variables defined in the current session.

save.image("Simulated-dataSimulation.RData");

cat("Note: *.RData file written in ",getwd(), "\n")

87


Construction of a weighted gene co-expression network (1)

# Load WGCNA package

library(WGCNA)

# Load additional necessary packages

library(cluster)

# The following setting is important, do not omit.

options(stringsAsFactors = FALSE);

# Load the previously saved data

load("Simulated-dataSimulation.RData");

attach(ModuleEigengeneNetwork1)

sft=pickSoftThreshold(datExpr,powerVector=1:20)

plot(sft$fitIndices[,1],-sign(sft$fitIndices[,3])*sft$fitIndices[,2], xlab="Soft Threshold (power)",ylab="SFT, signed R^2", type="o")

abline(h=0.90,col="red")

88


Construction of a weighted gene co-expression network (2)

# here we define the adjacency matrix using soft thresholding with beta=6

ADJ1=abs(cor(datExpr,use="p"))^6

# When you have relatively few genes (<5000) use the following code

k=as.vector(apply(ADJ1,2,sum, na.rm=TRUE))

# When you have a lot of genes use the following code

#k=softConnectivity(datE=datExpr,power=6)

# Plot a histogram of k and a scale free topology plot

sizeGrWindow(10,5)

par(mfrow=c(1,2))

hist(k)

scaleFreePlot(k, main="Check scale free topology\n")

89


Definition of co-expression modules (1)

#Many clustering procedures require a dissimilarity matrix as input. We define a dissimilarity based on adjacency

# Turn adjacency into a measure of dissimilarity

dissADJ=1-ADJ1

hierADJ=hclust(as.dist(dissADJ), method="average" )

# Plot the resulting clustering tree together with the true color assignment

sizeGrWindow(10,5);

plotDendroAndColors(hierADJ, colors = data.frame(truemodules), dendroLabels = FALSE, hang = 0.03,

main = "Gene hierarchical clustering dendrogram and simulated module colors" )

90


Definition of co-expression modules (2)

#static tree cutting

colorStaticADJ=as.character(cutreeStaticColor(hierADJ, cutHeight=.99, minSize=20))

# Plot the dendrogram with module colors

sizeGrWindow(10,5);

plotDendroAndColors(hierADJ, colors = data.frame(truemodules, colorStaticADJ),

dendroLabels = FALSE, abHeight = 0.99,

main = "Gene dendrogram and module colors")

#dynamic tree cutting

branch.number=cutreeDynamic(hierADJ,method="tree")

# This function transforms the branch numbers into colors

colorDynamicADJ=labels2colors(branch.number)

sizeGrWindow(10,5)

plotDendroAndColors(dendro = hierADJ,

colors=data.frame(truemodules, colorStaticADJ, colorDynamicADJ),

dendroLabels = FALSE, marAll = c(0.2, 8, 2.7, 0.2),

main = "Gene dendrogram and module colors")

91


Calculating module eigengenes

# calculate eigengenes for each module

datME=moduleEigengenes(datExpr,colorStaticADJ)$eigengenes

#correlation between modules based on their eigengenes

signif(cor(datME, use="p"), 2)

#dendrogram

dissimME=(1-t(cor(datME, method="p")))/2

hclustdatME=hclust(as.dist(dissimME), method="average" )

# Plot the eigengene dendrogram

par(mfrow=c(1,1))

plot(hclustdatME, main="Clustering tree based on the module eigengenes")

#see expression profiles - diagnostic plots

#show available modules

levels(as.factor(colorStaticADJ))

sizeGrWindow(8,9)

par(mfrow=c(3,1), mar=c(1, 2, 4, 1))

which.module="blue";

plotMat(t(scale(datExpr[,colorStaticADJ==which.module ]) ),nrgcols=30,rlabels=TRUE,

clabels=TRUE,rcols=which.module,

title=which.module )

ME=datME[, paste("ME",which.module, sep="")]

barplot(ME, col=which.module, main="", cex.main=2,

ylab="eigengene expression",xlab="array sample")

92


Relating modules to trait

#all modules (green and brown modules look interesting)

signif(cor(y,datME, use="p"),2)

# get statistical significance of module association to trait

cor.test(y, datME$MEbrown)

cor.test(y, datME$MEgreen)

93