Overview of Bioconductor
Aedín Culhane
http://bcb.dfci.harvard.edu/~aedin
http://www.hsph.harvard.edu/research/aedin-culhane
Bioconductor
• Biannual release (normally April and October) to coincide with the R release.
• Current: Bioconductor 2.9 (released to coincide with R 2.14)
• To install, use the script on the Bioconductor website:
source("http://www.bioconductor.org/biocLite.R")
biocLite()
Packages Overview
BioConductor web site
• Bioconductor BiocViews Task view
Software
Annotation Data
Experimental Data
What Packages do I need?
This is specific to your data and analysis pipeline, but for examples see:
• Bioconductor Workshops
• Bioconductor Workflows
Main types of Annotation Packages
• Gene centric AnnotationDbi packages:
– Organism: org.Mm.eg.db.
– Technology/Platform: hgu133plus2.db.
– GeneSets and Pathway (biology level): GO.db or KEGG.db
– .db packages can be queried with SQL or accessed using the annotation package functions (toTable, get, mget)
• Genome centric GenomicFeatures packages:
– Transcriptome level: TxDb.Hsapiens.UCSC.hg19.knownGene
– Generic features: Can generate via GenomicFeatures
• biomaRt:
– Query the web-based `biomart' resource for genes, sequence, SNPs, etc.
• See http://www.bioconductor.org/help/course-materials/2011/BioC2011/LabStuff/AnnotationSlidesBioc2011.pdf
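The Bimap-style access listed above (mget, toTable) can be sketched as follows. This is a minimal illustration, assuming the hgu95av2.db platform package is installed; the two probe set IDs are picked only as examples, and the lookup is skipped if the package is missing.

```r
# Sketch: querying a platform annotation package (assumes hgu95av2.db is installed).
probes <- c("1000_at", "1001_at")  # example probe set IDs, chosen for illustration

if (requireNamespace("hgu95av2.db", quietly = TRUE)) {
  library(hgu95av2.db)

  # mget() looks up several keys at once and returns a named list
  symbols <- mget(probes, hgu95av2SYMBOL, ifnotfound = NA)
  print(symbols)

  # toTable() flattens the whole probe-to-symbol map into a data frame
  print(head(toTable(hgu95av2SYMBOL)))
} else {
  message("hgu95av2.db is not installed; skipping the lookup")
}
```

The same pattern should apply to the other maps in a .db package (for example the ENTREZID map), since they are the same kind of object.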
Bioconductor resources
• Mailing List (sign up for daily digest)
• Documentation, workshop/course material online– Slides from talks, pdf of tutorials, R code
• Help available for each software package– Each package MUST contain vignette (howto)
• Other resources: www.Rseek.org, www.r-bloggers.com
Vignette
• Tutorials that provide a worked example of the package
• Required in Bioconductor packages
• Written in Sweave (Leisch, 2002).
– LaTeX dynamic reports in which R code is embedded and executable
– All R code in vignette is checked (and executed) by R CMD check
– http://www.bioconductor.org/docs/vignettes.html
library("Biobase")
library("GOstats") # Load package of interest
openVignette()
S4 classes and ExpressionSet
• Within Bioconductor, you will encounter packages that are structured around S4 object-oriented programming, proposed by John Chambers (developer of S)
• A class provides a software abstraction of a real world object.
• A method performs an action on a class
(Think of a class as a noun, and a method as a verb)
Object (S4)
• An object is an instance of a class.
• Descriptions are stored in slots
• slotNames(ob1) lists all slots in object, or use str().
• To access slots
– ob1@slotname
– slotname(ob1), or
– slot(ob1, "slotname")
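The class/method/slot ideas above can be tried without any Bioconductor package. The following is a minimal sketch using a toy class; the names Sample, intensity and meanIntensity are hypothetical and exist only for this illustration.

```r
library(methods)

# A toy class (hypothetical): the "noun", describing one sample
setClass("Sample", representation(name = "character", intensity = "numeric"))

# A generic plus a method: the "verb" that acts on the class
setGeneric("meanIntensity", function(object) standardGeneric("meanIntensity"))
setMethod("meanIntensity", "Sample", function(object) mean(object@intensity))

# An accessor, so that users need not touch the slot directly
setGeneric("intensity", function(object) standardGeneric("intensity"))
setMethod("intensity", "Sample", function(object) object@intensity)

# An object is an instance of the class
ob1 <- new("Sample", name = "s1", intensity = c(2, 4, 6))

slotNames(ob1)          # lists all slots in the object
ob1@intensity           # direct slot access
intensity(ob1)          # access through the accessor method
slot(ob1, "intensity")  # access with the slot name given as a string
meanIntensity(ob1)      # dispatches to the method defined for "Sample"
```

Accessor functions such as phenoData() on an ExpressionSet follow the same pattern, and are usually preferable to @ because they keep working if the internal slot layout changes.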
Example: ExpressionSet
library(ALL)
data(ALL)
slotNames(ALL)
ALL@phenoData
phenoData(ALL)
class(ALL)
?ExpressionSet
> ALL
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12625 features, 128 samples
element names: exprs
protocolData: none
phenoData
sampleNames: 01005 01010 ... LAL4 (128 total)
varLabels: cod diagnosis ... date last seen (21 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
pubMedIds: 14684422 16243790
Annotation: hgu95av2
Methods which act on an S4 class
showMethods(classes = "ExpressionSet")
getMethod("write.exprs", "ExpressionSet")
Or, if you wish to see how the package really works, download and look at the source code
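The same inspection functions work on any S4 class. As a small sketch, assuming a toy class Probe and generic describe (both hypothetical, defined only for this example):

```r
library(methods)

# Toy class and method (hypothetical names)
setClass("Probe", representation(id = "character"))
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "Probe", function(object) paste("Probe", object@id))

showMethods(classes = "Probe")        # generics that have a method for this class
existsMethod("describe", "Probe")     # is a method defined for this signature?
m <- getMethod("describe", "Probe")   # retrieve the method definition
m                                     # printing it shows the function body
```

Printing the result of getMethod() is often the quickest way to read what a method does before turning to the package source.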
Getting Data into R & Bioconductor
Aedín Culhane
http://www.hsph.harvard.edu/research/aedin-culhane/
Simple Excel SpreadSheet data
• Simple table
– read.table()
– read.csv()
– scan()
• However, most data types have more specialized readers. See Technologies on BiocViews.
– http://www.bioconductor.org/packages/release/BiocViews.html
• Large data files. Also see http://www.revolutionanalytics.com
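The simple table readers listed above can be sketched as follows. The file contents are made-up values written to temporary files, so the example runs without any external data.

```r
# Write a small tab-delimited table to a temporary file (made-up values)
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\ts1\ts2", "g1\t1.5\t2.5", "g2\t3.0\t4.0"), tmp)

# read.table(): general reader; the first line is a header and
# the first column holds the row names
tab <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

# read.csv(): the same reader with defaults suited to comma-separated files
csv <- tempfile(fileext = ".csv")
write.csv(tab, csv)
tab2 <- read.csv(csv, row.names = 1)

# scan(): lower level; reads every field into a single vector
vals <- scan(tmp, what = character(), sep = "\t", quiet = TRUE)

# Expression data are usually needed as a numeric matrix
m <- as.matrix(tab)
print(m)
```

Spreadsheet data saved from Excel as tab-delimited text or CSV can be read the same way by replacing the temporary file with the path to the exported file.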
Reading Affymetrix Data
library(affy)
require(affy) # Alternative
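# Read all CEL files in a directory into an AffyBatch object
affybatch <- ReadAffy(celfile.path="[Location of your data]")
# Or read and RMA-normalize CEL files in one step (by default from the working directory)
eSet<-justRMA()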
May 2011 16
Public Microarray Data
• ArrayExpress: 21,997 studies (622,617 profiles)
• GEO: 22,735 studies (558,074 profiles)
Statistics May 2011
More on GEOquery
require(GEOquery)
Let's try to load the GDS810 dataset which contains data on Alzheimer's disease at various stages of severity.
GDS810<-getGEO("GDS810")
The getGEO function returns an object of class GEOData. You can get a description of this class like this: help("GEOData-class")
Meta(GDS810)
Columns(GDS810)
head(Table(GDS810))
Other Arrays
• Illumina
– lumi package
• 2 color spotted arrays
– limma package
• Other arrays
– http://www.bioconductor.org/help/workflows/oligo-arrays/
Exercise
• Install the library GEOquery
• Download the dataset GSE1297 using getGEO
• The data will be downloaded as an eSet, so to see the expression data and phenoData, use exprs and pData
• Use arrayQualityMetrics to assess the quality of these data
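One possible outline for the exercise is sketched below. It assumes the GEOquery, Biobase and arrayQualityMetrics packages are installed and that a network connection is available; the helper name fetch_gse and the output directory name are hypothetical. The download itself is left commented out, so the block only defines the helper.

```r
# Sketch only: needs GEOquery and Biobase, plus a network connection when called.
# fetch_gse is a hypothetical helper written for this exercise.
fetch_gse <- function(id = "GSE1297") {
  gse <- GEOquery::getGEO(id)   # for a GSE accession: a list of ExpressionSet objects
  eset <- gse[[1]]              # take the first platform in the list
  list(expr = Biobase::exprs(eset),    # expression matrix
       pheno = Biobase::pData(eset),   # sample annotation (phenoData)
       eset = eset)
}

# Not run here (requires a download):
# res <- fetch_gse()
# dim(res$expr)
# head(res$pheno)
# library(arrayQualityMetrics)
# arrayQualityMetrics(expressionset = res$eset, outdir = "GSE1297_QC", force = TRUE)
```

Note that for a GSE accession getGEO() returns a list, so the ExpressionSet has to be extracted with [[1]] before exprs and pData are applied.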
R basics: Getting help
• To get help
– ?mean
– help(mean)
• help.search("mean")
• apropos("mean")
• example(mean)
• http://www.bioconductor.org/help/
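The search functions above can be tried directly at the prompt. A minimal sketch; the functions that open help pages or run examples are left as comments because they are intended for interactive use.

```r
# apropos() returns the names of visible objects that match a pattern
hits <- apropos("mean")
print(hits)
"mean" %in% hits

# The pattern is a regular expression, e.g. everything starting with "read."
readers <- apropos("^read\\.")
print(readers)

# Interactive use:
# ?mean                  # open the help page
# help.search("mean")    # search titles and keywords of installed help pages
# example(mean)          # run the examples from the help page
```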