21
Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi Annotation Marc Carlson Fred Hutchinson Cancer Research Center January 29, 2010 1 / 21 Annotation

Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

Annotation

Marc Carlson

Fred Hutchinson Cancer Research Center

January 29, 2010

1 / 21Annotation

Page 2: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

1 Bioconductor Annotations for Sequencing Technologies

2 rtracklayer

3 biomaRt

4 AnnotationDbi

2 / 21Annotation

Page 3: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

Outline

1 Bioconductor Annotations for Sequencing Technologies

2 rtracklayer

3 biomaRt

4 AnnotationDbi

3 / 21Annotation

Page 4: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

Annotations for Sequencing Technologies

Annotations for Sequencing projectsOther packages:

rtracklayer – export to UCSC web browsers.

GenomicFeatures – coming soon for transcript annotations (willrelease in spring)

biomaRt:

Query web-based ‘biomart’ resource for genes, sequence, and SNPsetc.

AnnotationDbi packages:

Organism and chip packages – contain chromosome start and stopsites for most genes.

4 / 21Annotation

Page 5: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

Outline

1 Bioconductor Annotations for Sequencing Technologies

2 rtracklayer

3 biomaRt

4 AnnotationDbi

5 / 21Annotation

Page 6: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

rtracklayer basics

What rtraklayer offers: rtracklayer

Web accessible annotations

Source: The data is from UCSC Genome tracks

6 / 21Annotation

Page 7: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

finding resources with rtracklayer

How to find data from the UCSC Genome browser in R

creates a browserSession: browserSession.

list available genomes from UCSC: ucscGenomes.

set up a genome object: genome.

list available tracks: trackNames.

> library(rtracklayer)

> session <- browserSession()

> head(ucscGenomes())

> genome(session) <- "hg18"

> head(trackNames(session))

7 / 21Annotation

Page 8: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

obtaining resources with rtracklayer

Downloading the UCSC Genome browser data into R

generate a query for UCSC: ucscTableQuery.

retrieves a UCSC track: getTable.

> ##can generate a query

> query <- ucscTableQuery(session, "refGene")

> ##which in turn can be used to get the data

> track <- getTable(query)

> head(track)

> colnames(track)

8 / 21Annotation

Page 9: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

packaging chromosome data into aRangedData object

Next we can package this data into a RangedData object

> library(IRanges)

> library(BSgenome)

> rdAnn <- RangedData(IRanges(start = track[,"txStart"],

+ end = track[,"txEnd"]),

+ space = track[,"chrom"],

+ strand = track[,"strand"],

+ id = track[,"name"])

> rdAnn

9 / 21Annotation

Page 10: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

Outline

1 Bioconductor Annotations for Sequencing Technologies

2 rtracklayer

3 biomaRt

4 AnnotationDbi

10 / 21Annotation

Page 11: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

BiomaRt basics

What biomaRt offers: biomaRt

Web accessible annotations

The data is from ensembl

11 / 21Annotation

Page 12: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

finding resources at biomaRt

BiomaRt has several methods for discovery or resources.

list available databases: listMarts.

list available datasets: listDatasets.

sets up a DB to be used: useMart.

> library(biomaRt)

> head(listMarts())

> mart <- useMart("ensembl")

> head(listDatasets(mart))

> ens <- useMart("ensembl", dataset="scerevisiae_gene_ensembl")

> ens

12 / 21Annotation

Page 13: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

extracting data from biomaRt

To call getBM you need to to apply appropriate filters andattributes to a list of values that you supply. Attributes are whatyou want from the query, and filters describe the values you supply.

list filters from the DB/Dataset: listFilters.

list attributes from that DB/Dataset: listAttributes.

get selected data: getBM.

> head(listFilters(ens))

> head(listAttributes(ens))

> ## example query

> getBM(attributes=c("ensembl_gene_id","chromosome_name",

+ "strand","start_position","end_position"),

+ filters="entrezgene",

+ values=c(1466398,1466399,1466400), mart=ens)

13 / 21Annotation

Page 14: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

extracting data from biomaRt

Lets now call getBM to get ALL of the data on these fields.

> BMres <- getBM(attributes=c("ensembl_gene_id",

+ "chromosome_name","strand",

+ "start_position","end_position"), mart=ens)

14 / 21Annotation

Page 15: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

biomaRt exercise

Using what you just learned about biomaRt, try to construct aRangedData Annotation object similar to what we did withrtracklayer.

15 / 21Annotation

Page 16: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

packaging biomaRt data into a RangedDataobject

> library(IRanges)

> library(BSgenome)

> strand <- strand(ifelse(BMres[,"strand"] > 0, "+", "-"))

> rdAnno <- RangedData(IRanges(

+ start = abs(BMres[,"start_position"]),

+ end = abs(BMres[,"end_position"])),

+ space = BMres[,"chromosome_name"],

+ strand = strand,

+ gene_id = BMres[,"ensembl_gene_id"] )

> rdAnno

16 / 21Annotation

Page 17: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

Outline

1 Bioconductor Annotations for Sequencing Technologies

2 rtracklayer

3 biomaRt

4 AnnotationDbi

17 / 21Annotation

Page 18: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

Using Annotation packages

What Annotation packages offer:

Pre-built and versioned annotation packages

The data is from NCBI

18 / 21Annotation

Page 19: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

extracting chromosome data from Annotpackages

First let’t just get the data from the package.

> library(org.Sc.sgd.db)

> start <- toTable(org.Sc.sgdCHRLOC)

> end <- toTable(org.Sc.sgdCHRLOCEND)

> ##must check that these are the SAME!

> table(start[,1]==end[,1])

> ##If that checks out ok, then we can cbind() them together:

> end <- end[,"stop"]

> res <- cbind(start,end)

> ##filter out autonomously replicating sequences...

> res <- res[abs(res[,"start"]) < abs(res[,"end"]),]

> head(res)

19 / 21Annotation

Page 20: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

Annotation package exercise

Using what you just learned about the annotation packages, try toconstruct a RangedData Annotation object similar to what we didwith biomaRt and rtracklayer.

20 / 21Annotation

Page 21: Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other packages: rtracklayer{ export to UCSC web browsers. GenomicFeatures{ coming soon

Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi

packaging annotation package data into aRangedData object

> library(IRanges)

> library(BSgenome)

> chroms <- paste("chr", res[,"Chromosome"], sep="")

> strand <- strand(ifelse(res[,"start"] > 0, "+", "-"))

> rdAnnot <- RangedData(IRanges(start = abs(res[,"start"]),

+ end = abs(res[,"end"])),

+ space = chroms,

+ strand = strand,

+ id = res[,"systematic_name"])

> rdAnnot

This is the same as the contents ofextractYeastGenesAsRangedData.

21 / 21Annotation