25
© 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks, Inc. 1 MATLAB for Bioinformatics Kristen Amuzzini Biotech, Pharmaceutical, & Medical Industry The MathWorks, Inc.

© 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

  • View
    220

  • Download
    0

Embed Size (px)

Citation preview

Page 1: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

MATLAB Applications in BioinformaticsDeveloping and Deploying Bioinformatics

Applications with MATLAB

© 2

00

3 T

he M

ath

Work

s, Inc.

1

MATLAB for Bioinformatics

Kristen AmuzziniBiotech, Pharmaceutical, & Medical Industry

The MathWorks, Inc.

Page 2: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Presentation Layout

MATLAB applications in Bioinformatics Customer success stories MATLAB & The Bioinformatics Toolbox

Sequence analysis Microarray analysis

Integrating MATLAB with other tools MATLAB as computational engine for Excel

Questions/Answers & Wrap-up

Page 3: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Bioinformatics Applications

• Sequence analysis• Base calling algorithm design, sequence alignment,

sequence building algorithms

• Microarray analysis• Image processing, QA/QC, data normalization, data analysis

• Proteomics• Mass Spectrometry signal processing, protein marker

identification and classification, peptide sequence identification, 2D-Gel image analysis

• Systems Biology• Interaction network identification, simulation of metabolic

pathways, flux analysis

Page 4: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Bioinformatics teams supporting multiple constituencies with multiple tools.

• C/C++, Java, Perl• VB, Excel Macros• SQL• GUI Based tools• Freeware• SPLUS, R, SAS, Mathematica• Web based tools

Research Biologists•Prefer UI/Web based tools

•Want custom analyses

Bioinformatics Team•Algorithm development•Custom one-off analyses •Programs for biologists

Software Engineers•C++, Java•Work off MATLAB prototypes

Page 5: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Using MATLAB, bioinformatics teams can support multiple constituencies.

MATLAB GUI’s, analyses

Research Biologists•Prefer UI/Web based tools

•Want custom analyses

Bioinformatics Team•Algorithm development•Custom one-off analyses •Programs for biologists

Software Engineers•C++, Java•Work off MATLAB prototypes

MATLAB prototypes/Applications

Page 6: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Complete draft of the human genome, accelerated by Applied Biosystems — using MATLAB algorithms.

“Having one integrated packageis a big advantage. Using MATLAB and theMATLAB Compiler reduced my development time by a factor of 4 or 5.”

“MATLAB has always been ideal as an algorithm prototyping tool,” Labrenz concludes, “but the MATLAB Compiler and C/C++ Math and Graphics Libraries add a whole new dimension, allowing rapid delivery of sophisticated solutions.”

Jim Labrenz, Applied Biosystems

User example: Genetic Sequence Base Calling

Page 7: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

User example: Breast Cancer Prognosis

Rosetta Inpharmatics recently developed a tool that enables clinicians to determine a breast cancer patient’s prognosis based on the gene expression profile of the primary tumor.

“Since MATLAB and the Image Processing Toolbox are fully integrated and the MATLAB platform is very good for matrix calculation, we did not have to spend time writing the low level image processing and the basic data analysis routines like vector and matrix calculations”

“Our research scientists are happy with the quick feedback,” Dr. Dai says. “Using MathWorks tools, we can respond to their requests very fast, and it’s easy for the scientists to use these tools. Using the GUIs that we develop in MATLAB, they can access functions without having to remember the underlying code.”

Dr. Hongyue Dai,Rosetta Inpharmatics/Merck & Company

Page 8: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Academic users

• Bioinformatics Teaching• MIT, Stanford, Cornell, Carnegie Mellon, …

• Research• Sequencing

• Base calling algorithm design• Sequence analysis

• Computational biolinguistics • Microarray analysis

• Statistical modeling of microarrays• Proteomics

• Statistical modeling of protein-protein interaction• Systems Biology

• Flux Analysis

Page 9: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

More than 600 textbooks for education and professional use, in 19 languages

– Biosciences

– Controls

– Signal Processing

– Image Processing

– Mechanical Engineering– Mathematics– Natural Sciences– Environmental Sciences

Thousands of universities teach students using MathWorks products.

Page 10: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Industry Issues & Solutions

•Integrating tools from various programming languages is difficult, closed source tools are not customizable, and freeware is often not supported.

•There is no standard biological data format.

•Applications must be easily deployable within organizations.

•MATLAB is a supported, open architecture, user-friendly environment for data analysis across applications, algorithm development, and deployment.

•MATLAB and the Bioinformatics Toolbox provides file format support for common data sources (web-based, sequences, microarray, etc.).

•MATLAB’s deployment tools and user-interface design environment allow easy deployment of MATLAB based applications.

Page 11: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

The Bioinformatics Toolbox

Robert Henson

The MathWorks, Inc.

Developing and Deploying Bioinformatics Applications with MATLAB

© 2

00

3 T

he M

ath

Work

s, Inc.

11

MATLAB & The Bioinformatics Toolbox

Page 12: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

The MathWorks Product Family

Code Generation

Blocksets

Integrated for: technical computing, data analysis and visualization system modeling and simulation implementation of real-time embedded software

PC-based real-time systems

StateflowStateflowStateflowToolboxes

DAQ cardsInstruments

Databases and filesFinancial Datafeeds

Desktop ApplicationsAutomated Reports

Page 13: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

• File I/O

• FASTA, PDB, SCF, GPR, GAL

• Web Connectivity

• GenBank, EMBL, PIR, PDB

• Sequence Analysis & Alignment

• Needleman-Wunsch, Smith-Waterman

• DNA/RNA/AA conversions, pattern searching

• Microarray Normalization & Visualization

• Lowess, global mean, MAD (median absolute deviation)

• Protein Visualization

• Atomic composition, molecular weight, hydrophobicity profile

Bioinformatics Toolbox 1.0

212  PYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEYARLRGIR      | |  |   : | | |   : | : | :   : :  | :  | | | : | |  | : | : :  | : :

321  PYISRYYPELAVHGAYSE -SETYSEQDVREVAEFAKIYGVQ

Page 14: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

CommandHistory

MATLAB Desktop Tools

Launchpad:Start other tools and

demos

WorkspaceBrowser:

See your data

Command Window

Page 15: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Sequence Alignment Tutorial Example

• Get human and mouse genes from GenBank• Look for open reading frames (ORFs)• Convert DNA sequences to amino acid sequences• Create a dotplot of the two sequences• Perform global alignment• Perform local alignment

Page 16: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Microarray Data Analysis Tutorial Example

• Plot expression profiles for genes• Filter genes based on information content of profile• Perform hierarchical clustering• Perform K-means clustering• Perform Principal Component Analysis

Reference:

DeRisi, JL, Iyer, VR, Brown, PO. "Exploring the metabolic and genetic control of gene expression on a genomic scale." Science. 1997 Oct 24;278(5338):680-6.

Page 17: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Integrating and Deploying Bioinformatics Tools with MATLAB

Robert Henson

The MathWorks, Inc.

Developing and Deploying Bioinformatics Applications with MATLAB

© 2

00

3 T

he M

ath

Work

s, Inc.

17

Integrating and Deploying Bioinformatics Tools with MATLAB

Page 18: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Connecting to MATLAB

WebWeb

Instrument ControlInstrument ControlData AcquisitionData AcquisitionImage AcquisitionImage Acquisition

Excel / COM

File I/O

Database Database ToolboxToolbox

C/C++Java Perl

Page 19: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

C/C++ C/C++

WebWebStand-aloneStand-alone

ExcelCOM

Deploying with MATLAB

Page 20: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Push Data into MATLAB

Data I/O• Import Excel ranges

into MATLAB• Export MATLAB data into

Excel ranges• Evaluate MATLAB Statements in

Excel

Page 21: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Computational Engine for Excel

Spread Sheet Applications

• MATLAB Excel Link can be the computational engine behind your Excel applications

• Fast scalable solutionMLPutMatrix("data",B2:H43)

MLPutMatrix("Genes",A2:A43)

MLPutMatrix("TimeSteps",B1:H1)

MLEvalString("clustergram(data,'RowLabels',…

Genes,'ColLabels',TimeSteps)")

Page 22: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Image ProcessingSignal Processing

Neural Networks OptimizationStatistics

What else could you do?

Bioinformatics

Page 23: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Integrating and Deploying Bioinformatics Tools with MATLAB

Robert Henson

The MathWorks, Inc.

Developing and Deploying Bioinformatics Applications with MATLAB

© 2

00

3 T

he M

ath

Work

s, Inc.

23

Summary

Page 24: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Industry Issues & Solutions

•Integrating tools from various programming languages is difficult, closed source tools are not customizable, and freeware is often not supported.

•There is no standard biological data format.

•Applications must be easily deployable within organizations.

•MATLAB is a supported, open architecture, user-friendly environment for data analysis across applications, algorithm development, and deployment.

•MATLAB and the Bioinformatics Toolbox provides file format support for common data sources (web-based, sequences, microarray, etc.).

•MATLAB’s deployment tools and user-interface design environment allow easy deployment of MATLAB based applications.

Page 25: © 2003 The MathWorks, Inc. MATLAB Applications in Bioinformatics Developing and Deploying Bioinformatics Applications with MATLAB © 2003 The MathWorks,

© 2003 The MathWorks, Inc.

Further Information

• Bioinformatics Toolbox Product page–Demos, technical literature, trial information–www.mathworks.com/products/bioinfo

• MATLAB Central– File exchange and newsgroup access for MATLAB and

Simulink users– www.mathworks.com/matlabcentral– Access to comp.soft-sys.matlab

                                                                                                 

file exchange and newsgroup access forthe MATLAB & Simulink user community