Applied Bioinformatics

Introduction to R, continued

Bing Zhang

Department of Biomedical Informatics

Vanderbilt University

bing.zhang@vanderbilt.edu

Matrix subsetting and combining

| Task | R code |
| --- | --- |
| Import data from a tabular file | `data<-read.table("GSE8671_exp.txt",header=TRUE,sep="\t")` |
| Convert data frame to matrix | `data0<-as.matrix(data)` |
| Get dimensions of the matrix | `dim(data0)` |
| Select discrete rows by index | `data0[c(1,3,5,7,9),]` |
| Select continuous rows by index | `data0[5:10,]` |
| Select discrete columns by index | `data0[,c(1,3,5,7,9)]` |
| Select continuous columns by index | `data0[,5:10]` |
| Select both rows and columns by index | `data0[1:10,1:5]` |
| Select one row by name | `data0["1438_at",]` |
| Select both rows and columns by name | `data0[c("1438_at","117_at"),c("GSM215052","GSM215079")]` |
| Calculate variances for all rows | `gene_variances<-apply(data0,1,var)` |
| Calculate means for all rows | `gene_means<-apply(data0,1,mean)` |
| Combine columns (same number of rows) | `combined<-cbind(data0,gene_means,gene_variances)` |
| Select rows by output of a comparison | `combined[gene_means>60000,]` |
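The operations in the table can be tried without the GSE8671 file. The sketch below is a minimal illustration that assumes a small made-up matrix in place of the real expression data: the values, the probe IDs "1007_s_at" and "1053_at", and the sample ID "GSM215080" are placeholders chosen for the example, not taken from the dataset.

```r
# Toy stand-in for the expression matrix: 4 probes (rows) x 3 samples (columns).
# All values are invented for illustration.
data0 <- matrix(
  c(10, 12, 14,
    200, 180, 220,
    5, 5, 5,
    70000, 65000, 72000),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("1438_at", "117_at", "1007_s_at", "1053_at"),
    c("GSM215052", "GSM215079", "GSM215080")
  )
)

dim(data0)                  # number of rows, then number of columns
data0[c(1, 3), ]            # discrete rows by index
data0["1438_at", ]          # one row by name (returned as a named vector)
data0[c("1438_at", "117_at"), c("GSM215052", "GSM215079")]

# Row-wise summaries: MARGIN = 1 applies the function to each row.
gene_means <- apply(data0, 1, mean)
gene_variances <- apply(data0, 1, var)

# cbind() appends the two summaries as extra columns.
combined <- cbind(data0, gene_means, gene_variances)

# A logical vector used as the row index keeps the rows where it is TRUE.
# drop = FALSE keeps the result a matrix even if only one row matches.
high <- combined[gene_means > 60000, , drop = FALSE]
```

Without `drop = FALSE`, a selection that matches a single row is simplified to a plain vector, which can surprise code that expects a matrix.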

Save your work

The R environment is controlled by hidden files in the startup directory:

.RData (the saved workspace)

.Rhistory (the command history)

Save before quitting: > q()

Save workspace image? [y/n/c]:

Save during a session: > save.image()

Save your code to a file (e.g. diff.r), which can be executed in batch mode: $ R CMD BATCH diff.r &

&: runs the program in the background

Screen output is written to diff.r.Rout
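To see what saving and restoring a workspace does, the following minimal sketch writes the image to a temporary file instead of the default .RData, so nothing in the working directory is overwritten. The object name `gene_means` and its values are placeholders, and the code assumes it is run at the top level of an R session.

```r
# Use a temporary file so the real .RData is left untouched.
workspace_file <- tempfile(fileext = ".RData")

gene_means <- c(a = 1.5, b = 2.5)   # toy object standing in for real results
save.image(file = workspace_file)   # same call as above, with an explicit file

rm(gene_means)                      # simulate starting a fresh session
load(workspace_file)                # restores every object stored in the image
restored <- exists("gene_means")
```

Calling `save.image()` with no arguments writes to .RData in the current working directory, which is the file R looks for at startup.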

Install and load packages

CRAN packages: http://cran.r-project.org/web/packages/

>6000 packages

Bioconductor packages: http://www.bioconductor.org/

~1000 packages for the analysis of high-throughput genomics data

| Task | R code |
| --- | --- |
| Install a CRAN package | `install.packages("package name")` |
| Install a Bioconductor package | `source("http://www.bioconductor.org/biocLite.R"); biocLite("package name")` |
| Load a package/library | `library("package name")` |
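The install and load steps can be combined so that a package is installed only when it is missing. The sketch below is one possible way to do that, not part of the original slides: `load_or_install` is a hypothetical helper name, and the Bioconductor branch assumes the `BiocManager` package, which recent Bioconductor releases document in place of the `biocLite` script.

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
# bioc = TRUE installs through BiocManager instead of install.packages().
load_or_install <- function(pkg, bioc = FALSE) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (bioc) {
      if (!requireNamespace("BiocManager", quietly = TRUE)) {
        install.packages("BiocManager")
      }
      BiocManager::install(pkg)
    } else {
      install.packages(pkg)
    }
  }
  # character.only = TRUE lets library() take the name from a variable.
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call attaches it without installing anything.
load_or_install("stats")
```

For a Bioconductor package, the call would look like `load_or_install("package name", bioc = TRUE)`, which needs network access the first time it runs.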

Graphics in R

R has very strong graphics capabilities

High quality, high reproducibility, lots of packages

On-screen graphics:

Works in the R GUI (both Windows and Mac)

On Linux, requires X11 (windowing system for bitmap displays)

Output to a file:

Vector formats: postscript, pdf, svg

Bitmap formats: jpeg, png, tiff, …

| Task | R code |
| --- | --- |
| Start a pdf file | `pdf("gse4183_clustering.pdf", width=10, height=15)` |
| Generate a heatmap | `heatmap.plus(data3, Rowv=as.dendrogram(rhc), Colv=as.dendrogram(hc), ColSideColors=ann, cexRow=0.5, cexCol=0.5, col=greenred(256))` |
| Close the file | `dev.off()` |
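The open, plot, close pattern works with base R alone. The sketch below is an illustration under stated assumptions rather than the original analysis: it swaps in `heatmap()` from the stats package for `heatmap.plus()`, replaces `greenred(256)` with a `colorRampPalette()` ramp, and uses random numbers for `data3`. The gene and sample labels, the colours in `ann`, and the output file name are invented for the example.

```r
set.seed(1)  # make the random toy data reproducible

# Toy matrix standing in for data3: 10 genes x 6 samples of random values.
data3 <- matrix(rnorm(60), nrow = 10,
                dimnames = list(paste0("gene", 1:10), paste0("sample", 1:6)))

rhc <- hclust(dist(data3))       # hierarchical clustering of rows (genes)
hc <- hclust(dist(t(data3)))     # hierarchical clustering of columns (samples)

# One annotation colour per column; base heatmap() expects a character vector.
ann <- rep(c("steelblue", "orange"), each = 3)

# Write to the session's temporary directory instead of the working directory.
out_file <- file.path(tempdir(), "toy_clustering.pdf")

pdf(out_file, width = 10, height = 15)   # open the file device (size in inches)
heatmap(data3, Rowv = as.dendrogram(rhc), Colv = as.dendrogram(hc),
        ColSideColors = ann, cexRow = 0.5, cexCol = 0.5,
        col = colorRampPalette(c("green", "black", "red"))(256))
dev.off()                                # close the device to finish the file
```

Nothing is written to disk in a usable form until `dev.off()` is called, so a missing or unreadable PDF usually means the device was never closed.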