Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Intrudoction to caOmicsV
Hongen Zhang, Ph.D.Genetics Branch, Center for Cancer Research,
National Cancer Institute, NIH
March 10, 2015
Contents
1 Introduction 1
2 An Quick Demo 2
3 Making Plot Data Set 4
4 Plot bioMatrix Layout Manually 5
5 Plot bioNetCircos Layout Manually 8
6 sessionInfo 11
1 Introduction
Translational genomics research in cancer, e.g., International Cancer GenomeConsortium (ICGC) and The Cancer Genome Atlas (TCGA), has generatedlarge multidimensional datasets from high-throughput technologies. Multidi-mensional data offers great promise to improve clinical applications of genomicinformation in diagnosis, prognosis and therapeutics of cancers. Tools to effec-tively visualize integrated multidimensional data are important for understand-ing and describing the relationship between genomic variations and cancers.The caOmicsV package provides methods to visualize multidimensional cancergenomic data in two layouts: a heatmap-like matrix layout, bioMatrix, and cir-cular layouts superimposed on a biological network or graph, bioNetCircos.
The data that could be plotted with each layout is listed below:
� Clinical (phenotypes) data such as gender, tissue type, and diagnosis,plotted as colored rectangles
� Expression data such as RNASeq and miRNASeq, plotted as heatmap
1
� Category data such as DNA methylation status, plotted as colored boxoutlines on bioMatrix layout and bars on bioNetCircos layout
� Binary data such as mutation status and DNA copy number variations,plotted as colored points
� Text labelling, for gene names, sample names, summary in text format
In addition, link lines can also be plotted on bioNetCircos layout to showthe relationship between two samples.
For bioNetCircos layout, igraph and bc3net packages must be installed first.
2 An Quick Demo
Following code will generate a bioMatrix layout image with the build-in demodata.
> library(caOmicsV)
> data(biomatrixPlotDemoData)
> plotBioMatrix(biomatrixPlotDemoData, summaryType="text")
> bioMatrixLegend(heatmapNames=c("RNASeq", "miRNASeq"),
+ categoryNames=c("Methyl H", "Methyl L"),
+ binaryNames=c("CN LOSS", "CN Gain"),
+ heatmapMin=-3, heatmapMax=3, colorType="BlueWhiteRed")
2
BC
.A21
6.N
orm
alB
D.A
2L6.
Nor
mal
BD
.A3E
P.N
orm
alD
D.A
113.
Nor
mal
DD
.A11
4.N
orm
alD
D.A
118.
Nor
mal
DD
.A11
9.N
orm
alD
D.A
11A
.Nor
mal
DD
.A11
B.N
orm
alD
D.A
11C
.Nor
mal
DD
.A11
D.N
orm
alD
D.A
1EB
.Nor
mal
DD
.A1E
C.N
orm
alD
D.A
1EG
.Nor
mal
DD
.A1E
H.N
orm
alD
D.A
1EI.N
orm
alD
D.A
1EJ.
Nor
mal
DD
.A1E
L.N
orm
alD
D.A
39V.
Nor
mal
DD
.A39
W.N
orm
alD
D.A
39X
.Nor
mal
DD
.A39
Z.N
orm
alD
D.A
3A1.
Nor
mal
DD
.A3A
2.N
orm
alD
D.A
3A3.
Nor
mal
EP
.A12
J.N
orm
alE
P.A
26S
.Nor
mal
ES
.A2H
T.N
orm
alF
V.A
23B
.Nor
mal
FV.
A2Q
R.N
orm
alB
C.A
216.
Tum
orB
D.A
2L6.
Tum
orB
D.A
3EP
.Tum
orD
D.A
113.
Tum
orD
D.A
114.
Tum
orD
D.A
118.
Tum
orD
D.A
119.
Tum
orD
D.A
11A
.Tum
orD
D.A
11B
.Tum
orD
D.A
11C
.Tum
orD
D.A
11D
.Tum
orD
D.A
1EB
.Tum
orD
D.A
1EC
.Tum
orD
D.A
1EG
.Tum
orD
D.A
1EH
.Tum
orD
D.A
1EI.T
umor
DD
.A1E
J.Tu
mor
DD
.A1E
L.Tu
mor
DD
.A39
V.Tu
mor
DD
.A39
W.T
umor
DD
.A39
X.T
umor
DD
.A39
Z.T
umor
DD
.A3A
1.Tu
mor
DD
.A3A
2.Tu
mor
DD
.A3A
3.Tu
mor
EP
.A12
J.Tu
mor
EP
.A26
S.T
umor
ES
.A2H
T.Tu
mor
FV.
A23
B.T
umor
FV.
A2Q
R.T
umor
sample_type
ECM1SLC26A6
ADAMTS13FCN3
CFPCXCL12PTH1R
LIFRPLVAP
TOMM40LLEPREL1AMIGO3CSRNP1
DBHOLFML3RCAN1
COL15A1LYVE1
CDKN3RND3MT1F
SEMA3FLOC222699
LILRB5ESM1BCO2
hsa−mir−10bhsa−mir−139hsa−mir−10bhsa−mir−10bhsa−mir−10bhsa−mir−10bhsa−mir−10bhsa−mir−10bhsa−mir−424hsa−mir−101−2hsa−mir−10bhsa−mir−450bhsa−mir−10bhsa−mir−10bhsa−mir−10bhsa−mir−10bhsa−mir−424hsa−mir−10bhsa−mir−101−1hsa−mir−10bhsa−mir−10bhsa−mir−139hsa−mir−139hsa−mir−10bhsa−mir−450bhsa−mir−10b
Solid Tissue NormalPrimary Tumor
logFC
−4.0 2.6−3.6−5.9−4.4−3.8−4.4−3.6 2.9 1.9−2.5 2.0−2.5−5.0−2.9−2.7 4.3−4.0 4.2−2.9−6.7 1.6 2.5−2.5 5.0−5.1
−3 3Top: RNASeq Bottom: miRNASeq
Methyl HMethyl L
CN LOSSCN Gain
Figure 1. caOmicsV bioMatrix layout plot
Run the code below will get a bioNetCircos layout image with the build-indemo data.
> library(caOmicsV)
> data(bionetPlotDemoData)
> plotBioNetCircos(bionetPlotDemoData)
[1] "plot polygon"
[1] "plot heatmap"
[1] "plot heatmap"
[1] "plot bar"
[1] "plot points"
> dataNames <- c("Tissue Type", "RNASeq", "miRNASeq", "Methylation", "CNV")
> bioNetLegend(dataNames, heatmapMin=-3, heatmapMax=3)
3
1
2
3
456
7
8
9
1011
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
ADAMTS13
FCN3
CXCL12
PTH1R
LEPREL1CSRNP1
OLFML3
RND3
LIFR
TOMM40L
RCAN1
CDKN3
SEMA3F
BCO2
CFP
ESM1
MT1F
LILRB5
LOC222699
DBH
COL15A1
AMIGO3
LYVE1
ECM1
SLC26A6
PLVAP
−3 3From center to outer:
Tissue TypeRNASeqmiRNASeqMethylationCNV
Figure 2. caOmicsV bioNetCircos layout plot
3 Making Plot Data Set
To use the default plot function shown as above, the first step is making an plotdata set to hold all datasets in a list object. Two demo data sets are includedin the package installation and could be explored with:
> library(caOmicsV)
> data(biomatrixPlotDemoData)
> names(biomatrixPlotDemoData)
[1] "sampleNames" "geneNames" "secondGeneNames" "sampleInfo"
[5] "heatmapData" "categoryData" "binaryData" "summaryInfo"
> data(bionetPlotDemoData)
> names(bionetPlotDemoData)
[1] "sampleNames" "geneNames" "secondGeneNames" "sampleInfo"
[5] "heatmapData" "categoryData" "binaryData" "summaryInfo"
caOmicsV package has a function getPlotDataSet() to make the plot dataset as above. The input data to pass to the function are as below:
4
� sampleNames: required, character vector with names of samples to plot
� geneNames: required, character vector with names of genes to plot
� sampleData: required, data frame with rows for samples and columns forfeatures. The first column must be sample names same as the sample-Names above in same order
� heatmapData: list of data frames (maximum 2), continue numeric datasuch as gene expression data.
� categoryData: list of data frames(maximum 2), categorical data such asmethylation High, Low, NO.
� binaryData: list of data frames(maximum 3), binary data such as 0/1.
� summaryData: list of data frames (maximum 2), summarization for genesor for samples.
� secondGeneNames: names of second set of genes, e.g. miRNA names, tolabel genes on right side of matrix layout.
All genomic data must be held with data frame in the format of rows forgenes and columns for samples. The first column of each data frame must begene names same as the geneNames as above in same order. The column namesof each data frame must be sample names same as sampleNames as above insame order. caOmicsV package contains functions to sort data frame for givenorder and several supplement functions are also included in the package to helpextract required data set from big datasets by supplying required gene namesand sample names in a given order.
To make caOmicsV plot, the plot data set must contains at least one genomicdata (one of heatmap data, category data, or binary data)
4 Plot bioMatrix Layout Manually
The default plot method, plotBiomatrix(), is a convenient and efficient way tomake bioMatirx plot. In case of necessary, users can follow procedures below togenerate bioMatrix layout plot manually.
1. Demo data
> library(caOmicsV)
> data(biomatrixPlotDemoData)
> dataSet <- biomatrixPlotDemoData
> names(dataSet)
[1] "sampleNames" "geneNames" "secondGeneNames" "sampleInfo"
[5] "heatmapData" "categoryData" "binaryData" "summaryInfo"
5
2. Initialize bioMatrix Layout
> numOfGenes <- length(dataSet$geneNames);
> numOfSamples <- length(dataSet$sampleNames);
> numOfPhenotypes <- nrow(dataSet$sampleInfo)-1;
> numOfHeatmap <- length(dataSet$heatmapData);
> numOfSummary <- length(dataSet$summaryData);
> phenotypes <- rownames(dataSet$sampleInfo)[-1];
> sampleHeight <- 0.4;
> sampleWidth <- 0.1;
> samplePadding <- 0.025;
> geneNameWidth <- 1;
> sampleNameHeight <- 2.5;
> remarkWidth <- 2;
> summaryWidth <- 1;
> rowPadding <- 0.1;
> initializeBioMatrixPlot(numOfGenes, numOfSamples, numOfPhenotypes,
+ sampleHeight, sampleWidth, samplePadding, rowPadding,
+ geneNameWidth, remarkWidth, summaryWidth, sampleNameHeight)
> caOmicsVColors <- getCaOmicsVColors()
> png("caOmicsVbioMatrixLayoutDemo.png", height=8, width=12,
+ unit="in", res=300, type="cairo")
> par(cex=0.75)
> showBioMatrixPlotLayout(dataSet$geneNames,dataSet$sampleNames, phenotypes)
3. Plot tissue types on phenotype area
> head(dataSet$sampleInfo)[,1:3]
TCGA.BC.A216.Normal TCGA.BD.A2L6.Normal TCGA.BD.A3EP.Normal
sampleID "TCGA.BC.A216.Normal" "TCGA.BD.A2L6.Normal" "TCGA.BD.A3EP.Normal"
sample_type "Solid Tissue Normal" "Solid Tissue Normal" "Solid Tissue Normal"
> rowIndex <- 2;
> sampleGroup <- as.character(dataSet$sampleInfo[rowIndex,])
> sampleTypes <- unique(sampleGroup)
> sampleColors <- rep("blue", length(sampleGroup));
> sampleColors[grep("Tumor", sampleGroup)] <- "red"
> rowNumber <- 1
> areaName <- "phenotype"
> plotBioMatrixSampleData(rowNumber, areaName, sampleColors);
> geneLabelX <- getBioMatrixGeneLabelWidth()
> maxAreaX <- getBioMatrixDataAreaWidth()
> legendH <- getBioMatrixLegendHeight()
> plotAreaH <- getBioMatrixPlotAreaHeigth()
> sampleH<- getBioMatrixSampleHeight()
> sampleLegendX <- geneLabelX + maxAreaX
6
> sampleLegendY <- plotAreaH + legendH - length(sampleTypes)*sampleH
> colors <- c("blue", "red")
> legend(sampleLegendX, sampleLegendY, legend=sampleTypes,
+ fill=colors, bty="n", xjust=0)
4. Heatmap plot
> heatmapData <- as.matrix(dataSet$heatmapData[[1]][,]);
> plotBioMatrixHeatmap(heatmapData, maxValue=3, minValue=-3)
> heatmapData <- as.matrix(dataSet$heatmapData[[2]][,])
> plotBioMatrixHeatmap(heatmapData, topAdjust=sampleH/2,
+ maxValue=3, minValue=-3);
> secondNames <- as.character(dataSet$secondGeneNames)
> textColors <- rep(caOmicsVColors[3], length(secondNames));
> plotBioMatrixRowNames(secondNames, "omicsData", textColors,
+ side="right", skipPlotColumns=0);
5. Draw outline for each samples to show methylation status.
> categoryData <- dataSet$categoryData[[1]]
> totalCategory <- length(unique(as.numeric(dataSet$categoryData[[1]])))
> plotColors <- rev(getCaOmicsVColors())
> plotBioMatrixCategoryData(categoryData, areaName="omicsData",
+ sampleColors=plotColors[1:totalCategory])
6. Binary data plot
> binaryData <- dataSet$binaryData[[1]];
> plotBioMatrixBinaryData(binaryData, sampleColor=caOmicsVColors[4]);
> binaryData <- dataSet$binaryData[[2]];
> plotBioMatrixBinaryData(binaryData, sampleColor=caOmicsVColors[3])
7. Plot summary data on right side of plot area
> summaryData <- dataSet$summaryInfo[[1]][, 2];
> summaryTitle <- colnames(dataSet$summaryInfo[[1]])[2];
> remarkWidth <- getBioMatrixRemarkWidth();
> sampleWidth <- getBioMatrixSampleWidth();
> col2skip <- remarkWidth/2/sampleWidth + 2;
> plotBioMatrixRowNames(summaryTitle, areaName="phenotype",
+ colors="black", side="right", skipPlotColumns=col2skip);
> plotBioMatrixRowNames(summaryData, "omicsData",
+ colors=caOmicsVColors[3], side="right",
+ skipPlotColumns=col2skip)
8. Add legend
7
> bioMatrixLegend(heatmapNames=c("RNASeq", "miRNASeq"),
+ categoryNames=c("Methyl H", "Methyl L"),
+ binaryNames=c("CN LOSS", "CN Gain"),
+ heatmapMin=-3, heatmapMax=3,
+ colorType="BlueWhiteRed")
> dev.off()
null device
1
Run code above should generate an image same as Figure 1.
5 Plot bioNetCircos Layout Manually
With default bioNetCircos layout plot method, the node layout and labellingrely on the igraph package and sometimes the node layout and labelling maynot be in desired location. In that case, it is recommended to manually checkout the layout first then plot each item.
Following are basic procedures to make a bioNetCircos plot:
1. Demo data
> library(caOmicsV)
> data(bionetPlotDemoData)
> dataSet <- bionetPlotDemoData
> sampleNames <- dataSet$sampleNames
> geneNames <- dataSet$geneNames
> numOfSamples <- length(sampleNames)
> numOfSampleInfo <- nrow(dataSet$sampleInfo) - 1
> numOfSummary <- ifelse(dataSet$summaryByRow, 0, col(dataSet$summaryInfo)-1)
> numOfHeatmap <- length(dataSet$heatmapData)
> numOfCategory <- length(dataSet$categoryData)
> numOfBinary <- length(dataSet$binaryData)
> expr <- dataSet$heatmapData[[1]]
> bioNet <- bc3net(expr)
2. Initialize bioNetCircos layout
> widthOfSample <- 100
> widthBetweenNode <- 3
> lengthOfRadius <- 10
> dataNum <- sum(numOfSampleInfo, numOfSummary, numOfHeatmap,
+ numOfCategory, numOfBinary)
> trackheight <- 1.5
> widthOfPlotArea <- dataNum*2*trackheight
8
> initializeBioNetCircos(bioNet, numOfSamples, widthOfSample,
+ lengthOfRadius, widthBetweenNode, widthOfPlotArea)
> caOmicsVColors <- getCaOmicsVColors()
> supportedType <- getCaOmicsVPlotTypes()
> par(cex=0.75)
> showBioNetNodesLayout()
3. Manually label each node
At this point, each node has its index labelled. Manually check out the de-sired location for node name (gene) labelling then label each node (the nodeindex here may be different from your graph).
> par(cex=0.6)
> onTop <- c(14, 15, 16, 9, 7, 20, 8, 24, 10, 25)
> labelBioNetNodeNames(nodeList=onTop,labelColor="blue",
+ labelLocation="top", labelOffset = 0.7)
> onBottom <- c(26, 22, 23, 18, 19, 3, 5)
> labelBioNetNodeNames(nodeList=onBottom,labelColor="black",
+ labelLocation="bottom", labelOffset = 0.7)
> onLeft <- c(2, 11, 21, 17)
> labelBioNetNodeNames(nodeList=onLeft,labelColor="red",
+ labelLocation="left", labelOffset = 0.7)
> onRight <- c(13, 12, 4, 1, 6)
> labelBioNetNodeNames(nodeList=onRight,labelColor="brown",
+ labelLocation="right", labelOffset = 0.7)
Once all node names are labelled correctly, plot area of each node could beerased for plotting.
> eraseBioNetNode()
4. Plot each data set
> inner <- lengthOfRadius/2
> outer <- inner + trackheight
Plot tissue type for each node. Repeat this step if there are more than oneclinical features.
> groupInfo <- as.character(dataSet$sampleInfo[2, ])
> sampleColors <- rep("blue", numOfSamples);
> sampleColors[grep("Tumor", groupInfo)] <- "red"
> plotType=supportedType[1]
> groupInfo <- matrix(groupInfo, nrow=1)
> bioNetCircosPlot(dataValues=groupInfo,
+ plotType, outer, inner, sampleColors)
> inner <- outer + 0.5
> outer <- inner + trackheight
9
Heatmap plot for each node. Repeat this step if there are more heatmapdata.
> exprData <- dataSet$heatmapData[[1]]
> plotType <- supportedType[4]
> bioNetCircosPlot(exprData, plotType, outer, inner,
+ plotColors="BlueWhiteRed", maxValue=3, minValue=-3)
> inner <- outer + 0.5
> outer <- inner + trackheight
Category data plot for each node. Repeat this step if there are more categorydatasets.
> categoryData <- dataSet$categoryData[[1]]
> plotType <- supportedType[2];
> bioNetCircosPlot(categoryData, plotType,
+ outer, inner, plotColors="red")
length of data values and colors are different.
First color will be used for all samples.
> inner <- outer + 0.5
> outer <- inner + trackheight
Binary data plot for each node. Repeat this step if there are more binarydatasets
> binaryData <- dataSet$binaryData[[1]]
> plotType <- supportedType[3]
> plotColors <- rep(caOmicsVColors[1], ncol(binaryData))
> bioNetCircosPlot(binaryData, plotType,
+ outer, inner, plotColors)
> inner <- outer + 0.5
> outer <- inner + trackheight
Link samples on a node. Repeat this step for each node when needed.
> outer <- 2.5
> bioNetGraph <- getBioNetGraph()
> nodeIndex <- which(V(bioNetGraph)$name=="PLVAP")
> fromSample <- 10
> toSample <- 50
> plotColors <- "red"
> linkBioNetSamples(nodeIndex, fromSample,
+ toSample, outer, plotColors)
> fromSample <- 40
> toSample <- 20
> plotColors <- "blue"
> linkBioNetSamples(nodeIndex, fromSample,
+ toSample, outer, plotColors)
10
5. Add legend
> dataNames <- c("Tissue Type", "RNASeq", "Methylation", "CNV")
> bioNetLegend(dataNames, heatmapMin=-3, heatmapMax=3)
The output from the code above should be same as Figure 2.
6 sessionInfo
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] caOmicsV_1.20.0 bc3net_1.0.4 lattice_0.20-41 Matrix_1.2-18
[5] infotheo_1.2.0 c3net_1.1.1 igraph_1.2.6
loaded via a namespace (and not attached):
[1] compiler_4.0.3 magrittr_1.5 tools_4.0.3 grid_4.0.3
[5] pkgconfig_2.0.3
11