Bioinformatics: a Multidisciplinary Challenge Ron Y. Pinter Dept. of Computer Science Technion March 12, 2003


Bioinformatics: a Multidisciplinary Challenge

Ron Y. Pinter

Dept. of Computer Science

Technion

March 12, 2003

What is Bioinformatics?

• The application of information technology to life sciences research
  – modeling (abstraction)
  – analysis and collection
  – data integration and information retrieval

• Enables the discovery and analysis of biomolecules and their properties (structure, function, interactions), e.g. for
  – pharmaceutical research
  – medical diagnosis
  – agriculture

• AKA computational or dry or in silico biology

Computational Sciences: Analytic and Predictive

• Physics
  – Universal: mechanics, electricity, particle physics
  – Started in the 17th Century

• Chemistry
  – Specific materials
  – 19th Century

• Biology
  – The study of living organisms
  – Metamorphosis coincides with the huge increase in data acquisition capabilities and computational power

Biological Revolution Necessitates Bioinformatics

• New bio-technologies (automatic sequencing, DNA chips, protein identification, mass spec, etc.) produce large quantities of biological data.

• It is impossible to analyze this volume of data by manual inspection.

• Bioinformatics: development of algorithms that enable the analysis of the data (from experiments or from databases).

[Figure: Data produced by biologists and stored in databases → Bioinformatics (algorithms and tools) → New information for biological and medical use]

Central Dogma of Molecular Biology

[Figure: Gene (DNA) → Transcription → mRNA → Translation → Protein]

• Cells express different subsets of the genes in different tissues and under different conditions

The Genetic Code
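
As a rough illustration of the two steps named on the previous slide (transcription, then translation via the genetic code), here is a minimal Python sketch. It assumes the standard genetic code, a coding-strand DNA input, and reading frame 0; the function names `transcribe` and `translate` are illustrative choices, not taken from the slides.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Real genes add complications this sketch ignores, such as introns, alternative reading frames, and organism-specific code variants.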

Central Paradigm of Bioinformatics

[Figure: Genetic Information → Molecular Structure → Biochemical Function → Symptoms]

• Exponential growth of biological information: growth of sequences, structures, and literature.

• Efficient storage and management tools are most important.

Activities

• Development of new models, algorithms and statistical methods to assess and predict the relationships among members of very large data sets

• Development and implementation of tools to efficiently access and manage different types of information.

• Application of these methods and tools to real problems in biology by conducting bioinformatic experiments

Primary Areas

• Genomics

• Proteomics

• Metabolomics

• “Systems biology”

Genomics

• Sequence analysis
  – Homology searches
  – Assembly of ESTs
  – Domain and profile identification

• Gene hunting
  – Promoter identification
  – Genomic maps

• Comparative genomics
  – SNP detection (point mutations): among individuals
  – Genomic rearrangement: among species
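
To make the "point mutations among individuals" item concrete, the following Python sketch lists the positions at which two sequences differ. It assumes the two sequences are already aligned and of equal length; actual SNP detection also involves alignment, sequencing-quality filtering, and population-level evidence, none of which is modeled here. The function name `point_differences` and the example sequences are hypothetical.

```python
def point_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_in_a, base_in_b) for every mismatch.

    Positions are 0-based. Both sequences must already be aligned
    and have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical sequences from two individuals, differing at one position.
individual_1 = "ACGTACGT"
individual_2 = "ACGTTCGT"
print(point_differences(individual_1, individual_2))
```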

Towards large scale genomic comparisons…

Human vs. Mouse

Proteomics

• Functional prediction

• Localization

• Expression analysis

• Structure prediction

• Docking information for biomolecules

• …

Metabolomics and Systems Biology

• Metabolic and regulatory pathways

• Cell simulation

• Toxicological and pharmacological parameters

Data Types

• Strings (over nucleotides and amino acids)

• 2D and 3D geometric structures

• Images

• Numeric data (expression data, mass spec, …)

• Graphs (pathways, networks, …)

• Text articles

• …
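
A few of these data types can be sketched with plain Python structures. Everything below is a hypothetical placeholder (the gene names, the expression values, and the pathway edges are invented for illustration), and the `downstream` helper is just one simple way to query a graph-shaped record.

```python
from collections import deque

# All values are made-up placeholders, not real measurements.
sequence = "ATGGCC"                        # string over nucleotides
expression = {"geneA": [1.0, 2.5, 0.3]}    # numeric data: one value per condition
pathway = {                                # directed graph as adjacency lists
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneD"],
    "geneC": ["geneD"],
    "geneD": [],
}


def downstream(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from start.

    The start node itself is included only if it lies on a cycle.
    """
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


print(sorted(downstream(pathway, "geneA")))
```

The point of the sketch is that each data type calls for a different representation and a different kind of query, which is part of why integration across sources is difficult.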

Some Queries

• What genes are connected to a disease?

• What proteins are encoded by them?

• Under what conditions are they expressed?

• What pathways do they participate in?

• Which are targets for new therapeutics?

• What will happen if we introduce a virus into a certain environment?

• …

Data Sources

• Mostly public
  – NCBI, EMBL, KEGG, Swissprot, …

• Also some commercial
  – Celera, Compugen, …

• Ever changing …

Disciplines

• Life sciences
  – Biology
  – Biochemistry
  – Medicine
  – …

• Computing
  – Mathematics
  – Computer science
  – Information management
  – Information theory

The Gap

• Life sciences
  – Descriptive
  – Based on observations, lots of exceptions
  – Constant evolution and change of paradigms based on new discoveries

• Computer science
  – Analytic
  – Exact and predictive
  – “Linear”, synthetic evolution

Bridging the Gap

• Study both disciplines
  – Start as early as possible

• Work in joint teams
  – At all levels

• Learn from each other’s methods
  – Increase [web] sophistication of life scientists
  – Teach computer scientists to model the real world

Example: Intro to Bioinformatics

• Grand tour of tools and methods
  – Extensive web presence
  – Many highly specialized tools
  – Diversity in each category
  – Require high skill in specific usage
  – Loose integration

• Initial encounter with topic
  – Prereqs: Biology 1 and Intro to CS

• Must bridge gaps among disciplines

Method

• All work in pairs of LS and CS students
  – Strict enforcement
  – Develop dialogue
  – Complementary skills

• “Dry” labs, homework (reports) and final project (including presentation)

• Topical presentations coupled with labs
  – Delivered by Esti Yeger-Lotem, a CS/Biology expert (speaks both languages)
  – Labs run by TAs