Source: bioinfo.cs.technion.ac.il/cobi/RworkshopPDF/R-packages.pdf


R-packages:

from ideas to software

Dr. Michael Shmoish

Computer Science Department

Technion – IIT

Workshop on R, Technion

March 28, 2005

Outline

• R-resources

• R-extension packages

a) What are they? How many of them? What kinds?

b) Loading packages

c) Getting help on packages

d) Using packages

• Home-made examples of transforming ideas into R-software:

a) Gene expression plots in the GeneCards http://genecards.weizmann.ac.il/

b) Gel-simulation

c) GC visualization

d) Genomic rearrangements

• R is a de facto language of Bioinformatics

• Bioconductor packages

R-Resources

• R homepage

http://www.r-project.org/

contains information on the R project and (almost) everything related to it.

• CRAN (Comprehensive R Archive Network) page

http://cran.r-project.org/

is the download area, with the base R software itself, extension packages, and PDF manuals.

R-Resources (cont.)

R Reference Cards

• Rpad and R reference card, by Tom Short (a long one: 4 pages)

www.rpad.org/Rpad/Rpad-refcard.pdf

• R reference card, by Jonathan Baron (very short: 1 page only)

www.psych.upenn.edu/~baron/refcard.pdf

Publications (PNAS)

Publications (Nature)

What are R-packages?

• Packages are self-contained units of code with documentation.

• Automatic testing features are built in (R CMD check).

• All functions must have examples, and the examples must run.

• Interesting commands:

– update.packages(), example(), e.g. > example(hclust)
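A minimal sketch of the "self-contained unit" idea, using only functions that ship with R: utils::package.skeleton() turns a function into a package source tree (DESCRIPTION, R/ code, man/ Rd documentation stubs), and example() with give.lines = TRUE returns the example code that example(hclust) would run. The function gc_fraction() and the package name ideaPkg are hypothetical, and the generated Rd stubs would still have to be filled in before R CMD check passes.

```r
# Hypothetical function to be packaged: fraction of G/C letters in a string.
gc_fraction <- function(s) {
  bases <- strsplit(toupper(s), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# Write a package source tree for it into a temporary directory.
# package.skeleton() creates DESCRIPTION, NAMESPACE, R/ and man/ (Rd stubs).
utils::package.skeleton(name = "ideaPkg", list = "gc_fraction",
                        environment = environment(), path = tempdir(),
                        force = TRUE)
pkg_dir <- file.path(tempdir(), "ideaPkg")
print(list.files(pkg_dir, recursive = TRUE))

# example(hclust) would run the examples section of ?hclust;
# give.lines = TRUE only returns that code as a character vector.
hclust_examples <- example(hclust, package = "stats", give.lines = TRUE)
print(head(hclust_examples))

# update.packages(ask = FALSE)  # needs network access: updates installed packages
```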

How many R packages?

Quite a lot!

How many R packages?

• Base packages (CRAN: base, graphics, methods, stats).

• Contributed packages (CRAN: 485 packages as of March 2005).

• Bioconductor project packages (www.bioconductor.org: annotate, affy, marray, multtest, hgu95av2, ALL, EMBO03).

• Others (Rgeo for analysis of spatial data; Rmetrics for financial market analysis; packages not yet submitted to CRAN, which are not always updated and checked: dna, DNAcopy, DEDS).
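To see which packages are available on a given machine, installed.packages() can be queried; priority = "base" selects the base packages listed above. This is only a sketch: the counts depend on the local installation, and the CRAN count (which needs network access) appears only as a comment.

```r
# Base packages: installed with priority "base" in every R installation.
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Number of packages installed locally (base, recommended and contributed).
n_installed <- nrow(installed.packages())
cat("Installed packages:", n_installed, "\n")

# Counting the packages currently on CRAN needs network access, e.g.
# nrow(available.packages(repos = "https://cloud.r-project.org"))
```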

What kind of R packages?

• Analysis packages: implementations of statistical and graphical methods (cluster, lattice, nnet, rpart).

• Data packages: datasets for tutorials and books (UsingR, datasets); biological metadata packages consisting of environment objects that map between different gene identifiers (e.g., Affymetrix ID, LocusLink ID, PubMed ID), and CDF and probe sequence information for Affymetrix chips (GO, hgu95av2, humanLLMappings, KEGG).

• Specialized/custom packages: code, data, documentation, and exercises for a particular project, article, or course:

– DNAcopy: detecting chromosomal regions with abnormal DNA copy number;

– EMBO03: Bioconductor course package;

– GeneTS: analysing multiple gene expression time series;

– golubEsets: Golub et al. (2000) ALL/AML dataset;

– yeastCC: Spellman et al. (1998) yeast cell cycle dataset;

– qtl: analysing QTL data.
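Data packages are used through data(). The sketch below relies only on the datasets package that ships with R: it lists the available datasets and loads one of them. The Bioconductor metadata packages mentioned above (e.g. hgu95av2) are not used here because they must be installed separately.

```r
# List the datasets provided by the 'datasets' package.
ds <- data(package = "datasets")
items <- ds$results[, "Item"]
print(head(items))

# Load one dataset and look at its structure.
data("USArrests", package = "datasets")
str(USArrests)
```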

R packages (cluster analysis)

• cclust: convex clustering methods.

• class: self-organizing maps (SOM).

• cluster:

– AGglomerative NESting (agnes),

– Clustering LARge Applications (clara),

– DIvisive ANAlysis (diana),

– Fuzzy Analysis (fanny),

– MONothetic Analysis (mona),

– Partitioning Around Medoids (pam).

• e1071:

– fuzzy C-means clustering (cmeans),

– bagged clustering (bclust).

• mva (now part of stats):

– hierarchical clustering (hclust),

– k-means (kmeans).

• GeneSOM: self-organizing maps.

• flexmix: flexible mixture modelling; fpc: fixed point clusters, clusterwise regression and discriminant plots.
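As a small illustration of the stats functions listed under mva, the sketch below clusters the USArrests data (from the datasets package, chosen here only as a convenient example) with hclust() and kmeans() and cross-tabulates the two partitions. Average linkage, k = 3 and nstart = 20 are arbitrary choices, not recommendations.

```r
# Standardize the variables so that no single one dominates the distances.
x <- scale(USArrests)

# Agglomerative hierarchical clustering (stats::hclust, formerly in mva).
hc <- hclust(dist(x), method = "average")
groups_hc <- cutree(hc, k = 3)
plot(hc, cex = 0.6, main = "USArrests, average linkage")
rect.hclust(hc, k = 3)

# k-means with several random starts (stats::kmeans).
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 20)

# Compare the two partitions; cluster numbers are arbitrary labels.
print(table(hierarchical = groups_hc, kmeans = km$cluster))

# With the 'cluster' package installed, cluster::pam(x, k = 3)
# would give the Partitioning Around Medoids solution.
```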

R packages

Installing/loading R-packages

• From the RGui console menu: Packages

• Loading from the command line:

> library(RpackageName)

• Loading from inside other functions:

require(RpackageName)

Both load the R-package named ‘RpackageName’ and make its functions available in the R environment. They differ when the package is not installed: library() stops with an error, whereas require() issues a warning and returns FALSE, which the calling function can test.
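A common idiom builds on that difference. The helper use_package() below is hypothetical (it is not part of any package) and assumes the CRAN mirror https://cloud.r-project.org; the rest of the sketch shows how a missing package (the made-up name noSuchPkg123) is reported by require() and by library().

```r
# Hypothetical helper: attach 'pkg', installing it from CRAN first if needed.
use_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

use_package("stats")   # already installed, so it is just attached

# require() returns FALSE (with a warning) when the package is missing ...
ok <- suppressWarnings(require("noSuchPkg123", character.only = TRUE,
                               quietly = TRUE))
print(ok)

# ... whereas library() stops with an error.
res <- tryCatch(library("noSuchPkg123", character.only = TRUE),
                error = function(e) "error")
print(res)
```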

Getting help on R-packages

The list of a package's functions (names with a brief description) can be displayed with the commands

> library(help = RpackageName)   # or

> help(package = RpackageName)

To get help on a specific function FOO, use:

> help(FOO, package = RpackageName)   # or

> ?FOO   # if the relevant package is loaded

> library()   # lists all packages available for loading

> search()   # lists the attached packages
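The same look-ups can also be done programmatically, which is handy inside scripts. A sketch using the stats package (any installed package would do):

```r
# Help page of a specific function in a given package
# (a non-empty result means the topic was found).
h <- help("hclust", package = "stats")
print(length(h) > 0)

# Package overview: the object that library(help = ...) prints.
info <- library(help = "stats")
print(class(info))

# Exported objects of an attached package, and the search path.
print(head(ls("package:stats")))
print(search())
```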

Getting help on R-packages (example)

> help(package = limma)

> help(ebayes, package = limma)

> library(limma)

> ?ebayes

Warning:

> ?RpackageName   # could fail! ? looks up a help topic, and a package need not have a topic with its own name

R packages

• sound: provides basic functions for dealing with wav files and sound samples

• rimage: provides functions for image processing, including Sobel filter, rank filters, FFT, histogram equalization, and reading JPEG files (could be used for the automatic processing of gels).

Using R packages: rimage

> library(rimage)

> a = read.jpeg("sb.jpg")

> plot(a)

R packages: rimage

R packages: rimage

> a = read.jpeg("sb.jpg")

> plot(a)

> plot(rgb2grey(a))

R packages: rimage

R packages: rimage

> a = read.jpeg("sb.jpg")

> plot(a)

> plot(rgb2grey(a))

> print(a)

size: 179 x 600

type: rgb

> plot(imagematrix(matrix(rnorm(179 * 600), 179, 600)))

Warning message:

Pixel values were automatically clipped because of range over. in: imagematrix(matrix(rnorm(179 * 600), 179, 600))
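rimage has since been archived on CRAN, so the sketch below does not call it. Instead it reimplements, in base R, the two operations used above: a weighted RGB-to-grey conversion (the weights 0.30, 0.59, 0.11 are an assumption; rimage's rgb2grey() may use different coefficients) and clipping pixel values into [0, 1], the range the warning message suggests imagematrix() expects. The helper names rgb_to_grey(), clip01() and show_grey() are hypothetical.

```r
# Weighted RGB -> grey conversion. The weights are an assumption
# (common luminance weights); rimage's rgb2grey() may differ.
rgb_to_grey <- function(img, w = c(0.30, 0.59, 0.11)) {
  # img: height x width x 3 array with channel values in [0, 1]
  img[, , 1] * w[1] + img[, , 2] * w[2] + img[, , 3] * w[3]
}

# Clip values into [0, 1]; pmin/pmax keep the matrix dimensions.
clip01 <- function(m) pmin(pmax(m, 0), 1)

# image(z) draws z[i, j] at x = i, y = j, so transpose and flip the rows
# to show row 1 of the matrix at the top, as in a photograph.
show_grey <- function(m) {
  image(t(m[nrow(m):1, ]), col = grey(seq(0, 1, length.out = 256)),
        zlim = c(0, 1), axes = FALSE)
}

# Counterpart of the imagematrix(matrix(rnorm(...))) call above.
set.seed(1)
noise <- clip01(matrix(rnorm(179 * 600), 179, 600))
show_grey(noise)
```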

R packages: rimage

Probability distributions

• cumulative distribution function P(X ≤ x): ‘p’ for the CDF

• probability density function f(x) = dP(X ≤ x)/dx (for discrete distributions, the probability mass P(X = x)): ‘d’ for the density

• quantile function (given a probability p, the smallest x such that P(X ≤ x) ≥ p): ‘q’ for the quantile

• simulate random numbers from the distribution: ‘r’ for random

Distribution      R name   Additional arguments
beta              beta     shape1, shape2, ncp
binomial          binom    size, prob
chi-squared       chisq    df, ncp
exponential       exp      rate
F                 f        df1, df2, ncp
gamma             gamma    shape, scale
geometric         geom     prob
hypergeometric    hyper    m, n, k

Probability distributions (cont.)

Distribution        R name
log-normal          lnorm
logistic            logis
negative binomial   nbinom
normal              norm
Poisson             pois
Student’s t         t
uniform             unif
Wilcoxon            wilcox
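The d/p/q/r naming scheme can be checked on the normal, binomial and exponential distributions. The binomial case with size = 5 and prob = 0.5 shows why the quantile definition uses ≥: P(X ≤ 2) = 16/32 = 0.5 exactly, so the 0.5-quantile is 2.

```r
# p: cumulative distribution function P(X <= x)
pnorm(1.96)                         # about 0.975

# d: density for continuous distributions ...
dnorm(0)                            # 1 / sqrt(2 * pi), about 0.399
# ... and probability mass P(X = x) for discrete ones
dbinom(2, size = 5, prob = 0.5)     # choose(5, 2) / 2^5 = 10/32 = 0.3125

# q: quantile, the smallest x such that P(X <= x) >= p
qnorm(0.975)                        # about 1.96
pbinom(2, size = 5, prob = 0.5)     # (1 + 5 + 10) / 32 = 0.5
qbinom(0.5, size = 5, prob = 0.5)   # 2, since P(X <= 2) already equals 0.5

# r: random numbers
set.seed(123)
x <- rexp(10000, rate = 2)
mean(x)                             # close to 1 / rate = 0.5
```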

R packages

Home-made examples of transforming ideas into R-software

Gene expression in the GeneCards

• non-standard ‘root’ scale

• colors

• easy to add text and plots to existing plots

• correlation/length computations for variation plots

• mining Unigene/CGAP

Dr. Shmoish and Prof. Lancet, Dr. Chalifa-Caspi, Dr. Shmueli, Mrs. Safran of the Weizmann Institute of Science

Nucleic Acids Research 31(1):142-146 (2003)
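The slide does not spell out the ‘root’ scale. One plausible reading, assumed here, is to plot the r-th root of each value while labelling the axis in original units; r = 2, the tissue names and the expression values are all made up for illustration, and this is not the GeneCards code. The last two plotting calls show how text and lines are added to an existing plot.

```r
# Made-up expression values for five tissues (illustration only).
expr <- c(liver = 1200, brain = 35, heart = 400, kidney = 0, muscle = 90)

root <- 2                        # assumed root; other roots work the same way
to_root <- function(v) v^(1 / root)

ticks <- c(0, 10, 100, 500, 1000, 1500)
bp <- barplot(to_root(expr), col = rainbow(length(expr)),
              ylim = c(0, to_root(max(ticks))), axes = FALSE,
              ylab = "expression (root scale)")
# Label the axis in original units, placed at root-transformed positions.
axis(2, at = to_root(ticks), labels = ticks, las = 1)

# Annotations added to the existing plot afterwards.
text(bp, to_root(expr), labels = expr, pos = 3, cex = 0.8)
abline(h = to_root(100), lty = 2)
```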

Gene expression in the GeneCards

GC visualization

>Prochlorococcus_MIT9302 gi|51235135|gb|AY599029.1| Prochlorococcus marinus str. MIT 9302 PsbA (psbA) gene, partial cds

----GTTCCTTCATCTAACGCTATTGGTCTACACTTCTACCCAATTTGGGAAGCAGCTACTGTAGATGAGTGGT

TATACAACGGTGGTCCTTACCAGCTTGTTATTTTCCACTTCCTAATTGGTATCTCAGCATACATGGGAAG

ACAGTGGGAGCTTTCATACCGTTTAGGTATGCGTCCTTGGATCTGTGTTGCATACTCTGCACCAGTTTCA

GCAGCTTTCGCAGTATTTCTTGTATACCCATTCGGTCAAGGTTCATTCTCTGACGGAATGCCTCTAGGTA

TCTCTGGAACATTCAACTTCATGTTTGTTTTCCAGGCAGAGCACAACATTCTTATGCACCCATTCCATAT

GGCTGGTGTTGCTGGTATGTTCGGAGGATCTTTATTCTCAGCTATGCATGGTTCACTTGTTACTTCGTCT

CTAATCAGAGAAACAACTGAGACAGAGTCTCAGAACTATGGTTACAAGTTCGGACAAGAAGAAGAAACAT

>Prochlorococcus_MIT9515 gi|51235137|gb|AY599030.1| Prochlorococcus marinus str. MIT 9515 PsbA (psbA) gene, partial cds

----GTTCCTTCTTCAAATGCTATTGGTCTACACTTCTACCCAATTTGGGAAGCAGCTACTGTAGATGAGTGGT

TATACAACGGTGGTCCTTACCAGCTAGTAATTTTCCACTTCCTTATTGGTATCTCAGCTTACATGGGACG

TCAGTGGGAGCTTTCATACCGTTTAGGTATGCGTCCTTGGATCTGTGTTGCATACTCTGCACCAGTTTCA

GCAGCTTTCGCAGTATTCCTTGTATATCCATTTGGTCAAGGTTCATTCTCTGACGGAATGCCTTTAGGTA

TCTCTGGAACATTCAACTTCATGTTTGTTTTCCAGGCAGAGCACAACATTCTTATGCACCCATTCCATAT

GGCTGGTGTTGCAGGTATGTTCGGAGGATCATTATTCTCAGCAATGCATGGTTCACTTGTTACTTCATCT

CTAATCAGAGAAACAACTGAGACAGAGTCTCAGAACTATGGTTACAAGTTCGGACAAGAAGAAGAAACAT

>Prochlorococcus_MIT9312_HL

------gtggacatagacgaaataagcgagccagttgctggttcattcctatatggaaacaacatcatctcaggtgcagttgttccttcatccaacgctattggtc

Tacacttctacccaatttgggaagcagctactgtagatgagtggttatac
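One simple way to visualize GC content, sketched here as an assumption rather than the method behind the published figure, is a sliding-window GC fraction along each sequence. Gap characters ('-') and lowercase letters are handled explicitly; the window width and step are arbitrary, and gc_content() and gc_window() are hypothetical helper names. The input is the first two lines of the MIT9302 record above.

```r
# Fraction of G and C among the A/C/G/T letters of a sequence;
# gaps ('-') and other symbols are ignored, case does not matter.
gc_content <- function(s) {
  s <- toupper(gsub("[^ACGTacgt]", "", s))
  bases <- strsplit(s, "")[[1]]
  mean(bases %in% c("G", "C"))
}

# GC content in sliding windows of 'width' bases, moved by 'step' bases.
gc_window <- function(s, width = 50, step = 10) {
  s <- toupper(gsub("[^ACGTacgt]", "", s))
  stopifnot(nchar(s) >= width)
  starts <- seq(1, nchar(s) - width + 1, by = step)
  gc <- vapply(starts, function(i) gc_content(substr(s, i, i + width - 1)),
               numeric(1))
  data.frame(mid = starts + (width - 1) / 2, gc = gc)
}

mit9302 <- paste0(
  "----GTTCCTTCATCTAACGCTATTGGTCTACACTTCTACCCAATTTGGGAAGCAGCTACTGTAGATGAGTGGT",
  "TATACAACGGTGGTCCTTACCAGCTTGTTATTTTCCACTTCCTAATTGGTATCTCAGCATACATGGGAAG")
print(gc_content(mit9302))

w <- gc_window(mit9302, width = 30, step = 5)
plot(w$mid, w$gc, type = "l", ylim = c(0, 1),
     xlab = "position (bp)", ylab = "GC fraction")
```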

GC visualization

Dr. Shmoish and Dr. Beja, Mrs. Limor, Mr. Zeidner of Biology, Technion.

To appear in Environ. Microbiol. (2005)

Gel-simulation

Dr. Shmoish and Prof. Manor, Mr. Romi of Biology, Technion.

Genomic rearrangements

Dr. Shmoish and Prof. Pinter, Mr. Swidan of Computer Science Dept., Technion.

http://www.zbi.uni-saarland.de/cbi/stud/perspekt.shtml

R is a (de facto) language of Bioinformatics

Bioinformatics is an interdisciplinary field

Definitions of Bioinformatics on the Web:

• The application of computer technology to the management of biological information. Specifically, it is the science of developing computer databases and algorithms to facilitate and expedite biological research, particularly in genomics. www.informatics.jax.org/mgihome/other/glossary.shtml

• The study of the application of computer and statistical techniques to the management of biological information. In genome projects, bioinformatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data. home.san.rr.com/dna/darryl/glossary.html

• The analysis of biological information using computers and statistical techniques; the science of developing and utilizing computer databases and algorithms to accelerate and enhance biological research. www.niehs.nih.gov/nct/glossary.htm

• The science of informatics as applied to biological research. Informatics is the management and analysis of data using advanced computing techniques. www.genencor.com/wt/gcor/glossary

Definitions of Bioinformatics on the Web:

• (Computational biology). This word has not a clear definition. It involves the analysis and interpretation of data and the development of algorithms and statistics. The term was coined to encompass computer applications in biological sciences but is now used to mean rather different things, from artificial intelligence and robotics to genome analysis. The term was originally applied to the computational manipulation and analysis of biological sequence data (DNA and/or protein), but now tends also to be used to embrace the manipulation and analysis of 3D structural data. www.biol.lu.se/mibiol/research/wachen/glossary.htm

• The use of computers to handle biological information. The term is often used to describe computational molecular biology – the use of computers to store, search and characterize the genetic code of genes, the proteins linked to each gene and their associated functions. www.syngenta.com/en/about_syngenta/research_tech_gloss.asp

• The application of computational techniques to the management and analysis of biological information. bioinf.uta.fi/xml/courses/glossary/glossary-items.xml

Definitions of Bioinformatics on the Web:

• The science that uses advanced computing techniques for management and analysis of biological data. Bioinformatics is particularly important as an adjunct to genomic research, which generates a large amount of complex data, involving billions of individual DNA building-blocks, and tens of thousands of genes. (SNP consortium) www.variagenics.com/glossary.html

• The science of managing and analyzing biological data using advanced computing techniques. Especially important in analyzing genomic research data. See also: informatics doegenomestolife.org/glossary/glossary_b.html

• the use of computers in solving information problems in the life sciences. It mainly involves the creation of extensive electronic databases on genomes, protein sequences etc. Also involves techniques such as three-dimensional modelling of biomolecules and biological systems. www.universityscience.ie/pages/glossary.htm

• Computational or algorithmic approaches to the analysis and integration of genomic, proteomic, or chemical data residing in databases. Bioinformatics includes applications for the analysis of DNA and protein sequence patterns and similarities, tools for t www.dddmag.com/scripts/glossary.asp

R packages (proteomics)

Algorithms used for analysis of MALDI-MS spectra

1) LDA (Linear Discriminant Analysis), QDA (Quadratic Discriminant Analysis)

R package: MASS; functions: lda, qda

2) KNN (k-nearest neighbor)

R package: class; function: knn

3) Bagging, boosting classification trees

R packages: rpart, tree; functions: rpart, tree

4) SVM (Support Vector Machine)

R package: e1071; function: svm

Wu, B., Abbott, T., Fishman, D., McMurray, W., Mor, G., Stone, K., Ward, D., Williams, K., and Zhao, H. (2003) Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data. Bioinformatics, 19: 1636-1643
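To show what the k-nearest-neighbour step does, the sketch below implements it in base R (Euclidean distance, majority vote) instead of calling class::knn(train, test, cl, k). The iris data stand in for MALDI-MS spectra, which are not included here, and knn_predict() is a hypothetical name; k = 5 and the 100/50 split are arbitrary.

```r
# Minimal k-nearest-neighbour classifier (Euclidean distance, majority vote).
knn_predict <- function(train, labels, test, k = 5) {
  apply(test, 1, function(x) {
    d <- sqrt(colSums((t(train) - x)^2))      # distance to every training row
    votes <- table(labels[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]            # ties go to the first level
  })
}

set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- as.matrix(iris[idx, 1:4])
test  <- as.matrix(iris[-idx, 1:4])
pred  <- knn_predict(train, iris$Species[idx], test, k = 5)

accuracy <- mean(pred == as.character(iris$Species[-idx]))
print(table(predicted = pred, true = iris$Species[-idx]))
print(accuracy)
```

With the class package installed, class::knn(train, test, iris$Species[idx], k = 5) performs the same kind of classification.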

R/qtl

Authors: Karl Broman, Hao Wu, Gary Churchill, Saunak Sen, & Brian Yandell

R/qtl

R

Bioinformatics challenges

• Large data: tens of thousands of genes across a few hundred samples

• Much of the data is non-numeric (e.g., gene annotations, mutations, genomic rearrangements)

• The role of the gene in a particular pathway

• Integration of the “omic” data (data sources are varied, with different formats)

Bioconductor

• Bioconductor is a (relatively) new software initiative

– www.bioconductor.org

• Among the goals of this project is the deployment of high-quality software for the analysis of “omic” data

• The challenges are varied and exciting

Bioconductor Packages

• General infrastructure: Biobase, Biostrings, DynDoc, reposTools, rhdf5, ruuid, tkWidgets, widgetTools.

• Annotation: annotate, AnnBuilder + metadata packages.

• Graphics: geneplotter, hexbin.

• Pre-processing microarray data: affy, affycomp, affydata, affylmGUI, affyPLM, annaffy, gcrma, makecdfenv, limma, limmaGUI, marray, vsn.

• Other assays: aCGH, DNAcopy, prada, PROcess, RSNPer, SAGElyzer.

• Differential gene expression: EBarrays, edd, factDesign, genefilter, limma, limmaGUI, multtest, ROC.

• Graphs and networks: graph, RBGL, Rgraphviz.

• Gene Ontology: GOstats, goTools.

R packages (microarray data analysis)

• Pre-processing:

– CEL, CDF files: affy, vsn

– .gpr, .Spot, MAGEML files: marray, limma, vsn

• Central data object: exprSet

• Differential expression: edd, genefilter, limma, multtest, ROC + CRAN

• Graphs & networks: graph, RBGL, Rgraphviz

• Cluster analysis: CRAN class, cluster, MASS, mva

• Prediction: CRAN class, e1071, ipred, LogitBoost, MASS, nnet, randomForest, rpart

• Annotation: annotate, annaffy + metadata packages

• Graphics: geneplotter, hexbin + CRAN

Messages to take home

• R is a good thing to be aware of

• you can (and have to) tRy it at home

• for Life Science researchers/students, R is a comprehensible high-level language with many useful packages and friendly features that extend their standard Excel abilities for data analysis

• for Exact Science researchers/students, R is a good entry point into the exciting world of Bioinformatics and arguably one of the best tools for transforming bioinformatics ideas into working, well-documented software organized in packages.


Thank you!