1

Algorithms in Computational Biology (236522) Fall 2005-6

Lecture #1

Lecturer: Shlomo Moran, Taub 639, tel 4363
Office hours: Wednesday 12:30-13:30 (or 15:30-16:30)
TA: Ilan Gronau, Taub 700, tel 4894
Office hours: Monday 15:30-16:30

Lecture: Wednesday 10:30-12:30, Taub 6
Tutorial: Monday 14:30-15:30, Taub 6

This class has been initially edited from Nir Friedman’s lecture at the Hebrew University. Changes made by Dan Geiger, then by Shlomo Moran.



2.11.05: With many questions from the class, I got through the material under some time pressure. At the break I was at slide 25.

2

Course Information

Requirements & Grades:

• 15-25% homework, in five assignments. [Submit within two weeks.] Homework is obligatory.

• 75-85% test. The test grade must be above 55 for the homework grade to count.

• Exam date: 1.3.06 (to be coordinated).

3

Bibliography

• Biological Sequence Analysis, R. Durbin et al., Cambridge University Press, 1998

• Introduction to Computational Molecular Biology, J. Setubal, J. Meidanis, PWS Publishing Company, 1997

• Phylogenetics, C. Semple, M. Steel, Oxford University Press, 2003

• url: webcourse.cs.technion.ac.il/~cs236522

4

Course Prerequisites

Computer Science and Probability Background
• Data structure 1 (cs234218)
• Algorithms 1 (cs234247)
• Probability (any course)

Some Biology Background
• Formally: None, to allow CS students to take this course.
• Recommended: Molecular Biology 1 (especially for those in the Bioinformatics track), or a similar Biology course, and/or a serious desire to complement your knowledge in Biology by reading the appropriate material (see the course web site).

Following a comment from a student in the course, I replaced "Molecular Biology 1" (which, according to her, goes far beyond what is required) with "Biology 1".

6

Biological Background

First homework assignment: Read the first chapter (pages 1-30) of Setubal et al., 1997 (copies are available in the Taub building library and in the central library). Answer the questions of the first assignment on the course site.

Due: Tutorial class of 21.11.05 (<3 weeks from today).

7

Computational Biology

Computational biology is the application of computational tools and techniques to (primarily) molecular biology. It enables new ways of study in life sciences, allowing analytic and predictive methodologies that support and enhance laboratory work. It is a multidisciplinary area of study that combines Biology, Computer Science, and Statistics.

Computational biology is also called Bioinformatics, although many practitioners define Bioinformatics somewhat more narrowly, restricting the field to molecular Biology only.

8

Examples of Areas of Interest

• Building evolutionary trees from molecular (and other) data
• Efficiently constructing genomes of various organisms
• Understanding the structure of genomes (SNP, SSR, Genes)
• Understanding function of genes in the cell cycle and disease
• Deciphering structure and function of proteins

_____________________
SNP: Single Nucleotide Polymorphism. SNPs are common DNA sequence variations among individuals; they help in understanding human disease.
SSR: Simple Sequence Repeat. SSRs are regions of DNA where one to a few bases are tandemly repeated a few to hundreds of times.

9

Exponential growth of biological information: growth of sequences, structures, and literature.

12

Course Goals

• Learn about computational tools for (primarily) molecular biology.

• Cover computational tasks that are posed by modern molecular biology.

• Discuss the biological motivation and setup for these tasks

• Understand the kinds of solutions that exist and what principles justify them

13

Topics I

Dealing with DNA/Protein sequences:
• Informal biological background (1 week)
• Finding similar sequences (~3 weeks)
• Models of sequences: Hidden Markov Models (~2 weeks)
• Parameter estimation: ML methods and the EM algorithm (~4 weeks)

14

Topics II

Reconstructing evolutionary trees:
• Background: Darwin’s theory of evolution
• Distance based methods (~2 weeks)
• Character based methods (~2 weeks)

The presentations are similar to those given in the fall semester 04-05, and can be found on the site of that semester.

Updated presentations will be uploaded to the course site before the lectures.

16

Human Genome

Most human cells contain 46 chromosomes:

• 2 sex chromosomes (X,Y):
  XY – in males.
  XX – in females.

• 22 pairs of chromosomes named autosomes.

Autosome: any chromosome that is not a sex chromosome.

17

DNA Organization

Source: Alberts et al.

USER
What are the circles in the second slide from the left?

18

The Double Helix

Source: Alberts et al.

19

DNA Components

Four nucleotide types:
• Adenine
• Guanine
• Cytosine
• Thymine

Hydrogen bonds (electrostatic connection):
• A-T
• C-G
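The A-T and C-G pairing rule above determines one strand of the double helix from the other. The following is a minimal sketch of that rule, not part of the original slides; the function names `complement` and `reverse_complement` are illustrative, and reading the paired strand in reverse reflects the standard fact that the two strands run in opposite directions.

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the paired strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Taking the reverse complement twice returns the original strand, which is a convenient sanity check for any implementation of the pairing rule.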

20

Genome Sizes

• E. coli (bacteria) 4.6 x 10^6 bases
• Yeast (simple fungi) 15 x 10^6 bases
• Smallest human chromosome 50 x 10^6 bases
• Entire human genome 3 x 10^9 bases

USER
Are those the human chromosomes at the bottom? If not, what are they?

21

Genetic Information

• Genome – the collection of genetic information.

• Chromosomes – storage units of genes.

• Gene – basic unit of genetic information. Genes determine the inherited characters.

22

Genes

The DNA strings include:

• Coding regions (“genes”)
  – E. coli has ~4,000 genes
  – Yeast has ~6,000 genes
  – C. elegans has ~13,000 genes
  – Humans have ~32,000 genes

• Control regions
  – These typically are adjacent to the genes
  – They determine when a gene should be “expressed”

• “Junk” DNA (unknown function; ~90% of the DNA in human chromosomes)

Note from a 05-6 student: the control regions determine protein expression based on interaction with their environment in the cell, but the nature of this interaction is complicated and in many cases unknown.

23

The Cell

All cells of an organism contain the same DNA content (and the same genes) yet there is a variety of cell types.

24

Example: Tissues in Stomach

How is this variety encoded and expressed?

25

Central Dogma

Gene → (Transcription) → mRNA → (Translation) → Protein

Cells express different subsets of the genes in different tissues and under different conditions.

26

Transcription

• Coding sequences can be transcribed to RNA

• RNA – similar to DNA, with slightly different nucleotides:
  – different backbone
  – Uracil (U) instead of Thymine (T)

Source: Mathews & van Holde

USER
Explain the small "pins".
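At the sequence level, the Uracil-for-Thymine substitution on this slide can be sketched as a simple string rewrite. This is an illustration under a simplifying assumption that is not stated on the slide: the input is the coding strand, whose sequence the mRNA copies (the cell actually reads the complementary template strand). The function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand: same bases, with T replaced by U.

    Assumption: `coding_strand` is the coding (sense) strand, so no
    complementing is needed at the string level.
    """
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGCTTAA"))  # AUGGCUUAA
```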

27

Transcription: RNA Editing

1. Transcribe to RNA
2. Eliminate introns
3. Splice (connect) exons
* Alternative splicing exists

Exons hold information; they are more stable during evolution. This process takes place in the nucleus. The mRNA molecules then diffuse through the nuclear membrane into the cytoplasm.

Information from a 04-5 student: introns, too, are part of a gene (that is, they are not "junk"). In addition, it recently became clear (within about 3 years, as of 2004) that some mRNA is never translated into protein but has a different biological role.
Information from a 05-6 student: the division into introns and exons is not clear-cut; the boundary regions may vary.
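Steps 2 and 3 above (eliminate introns, connect exons) amount to keeping selected intervals of the transcript and concatenating them. The sketch below assumes the exon boundaries are already known and given as half-open index pairs; the function name `splice` and the toy sequence are hypothetical, and real splice-site recognition is not modeled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate the exon segments of a transcript, in order of position.

    Each exon is a half-open interval (start, end) into `pre_mrna`;
    everything outside the listed intervals is treated as intron and dropped.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


if __name__ == "__main__":
    # Toy transcript: exon (6 bases) + intron (6 bases) + exon (6 bases).
    pre_mrna = "AUGGCC" + "GUAAGU" + "UUUUAA"
    print(splice(pre_mrna, [(0, 6), (12, 18)]))  # AUGGCCUUUUAA
```

Alternative splicing corresponds to calling the same function with a different subset of exon intervals for the same transcript.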

28

RNA roles

• Messenger RNA (mRNA)
  – Encodes protein sequences. Each triplet of nucleotides translates to an amino acid (the protein building block).

• Transfer RNA (tRNA)
  – Decodes the mRNA molecules to amino acids. It connects to the mRNA on one side and holds the appropriate amino acid on its other side.

• Ribosomal RNA (rRNA)
  – Part of the ribosome, a machine for translating mRNA to proteins. It catalyzes (like enzymes) the reaction that attaches the hanging amino acid from the tRNA to the amino acid chain being created.

• ...

29

Translation

• Translation is mediated by the ribosome
• The ribosome is a complex of protein & rRNA molecules
• The ribosome attaches to the mRNA at a translation initiation site
• The ribosome then moves along the mRNA sequence and in the process constructs a sequence of amino acids (polypeptide), which is released and folds into a protein.

30

Genetic Code

There are 20 amino acids from which proteins are built.
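The genetic code maps each mRNA triplet (codon) to one of the 20 amino acids or to a stop signal. The sketch below builds the standard code table from its usual compact 64-letter encoding (bases ordered U, C, A, G; `*` marks a stop codon) and translates an mRNA string. The names `CODON_TABLE` and `translate` are illustrative, and starting at the first AUG is a simplifying assumption about translation initiation.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid symbols, listed for codons
# UUU, UUC, UUA, UUG, UCU, ... (third base varies fastest); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGGCUUUUUAAGG"))  # MAF
```

The table has 64 codons but only 20 distinct amino acids, so most amino acids are encoded by more than one codon.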

31

Protein Structure

• Proteins are poly-peptides of 70-3000 amino-acids

• The structure of a protein is (mostly) determined by the sequence of amino acids that make up the protein

USER
Find some more information about this picture.
Notes 2.11.05: the bonds in the tertiary structure are between groups of amino acids.

32

Protein Structure

33

Evolution

• Related organisms have similar DNA
  – Similarity in sequences of proteins
  – Similarity in organization of genes along the chromosomes

• Evolution plays a major role in biology
  – Many mechanisms are shared across a wide range of organisms
  – During the course of evolution existing components are adapted for new functions

34

Evolution

Evolution of new organisms is driven by

• Diversity
  – Different individuals carry different variants of the same basic blueprint

• Mutations
  – The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.

• Selection bias

35

The Tree of Life

Source: Alberts et al.

36

Example of a graph theoretic problem related to evolutionary trees: the perfect phylogeny problem

37

Characters in Species

• A (discrete) character is a property which distinguishes between species (e.g. dental structure, a certain gene)

• A character state is a value of the character (e.g. human dental structure).

• Problem: Given a set of species, specified by their characters, reconstruct their evolutionary tree.

38

Species ≡ Vertices
Characters ≡ Colorings
States ≡ Colors

Each species is identified by its states.

Evolutionary tree ≡ A tree with many colorings, containing the given vertices

[Figure: a tree on the species A, B, C, D, colored by the character states "teeth" and "no teeth"]

39

Another tree

Which tree is more reasonable?

[Figure: a second tree on the species A, B, C, D, colored by the character states "teeth" and "no teeth"]

40

Evolutionary trees should avoid reversal transitions

• A species regains a state its direct ancestor has lost.

• Famous (and rare) examples:
  – Teeth in birds.
  – Legs in snakes.

Experiment reported in Science 80: producing teeth in chickens.

41

Evolutionary trees should avoid convergence transitions

• Two species possess the same state while their least common ancestor possesses a different state.

• Famous example: The marsupials.

42

The mammals on the right are marsupials. First all the marsupials split off, and afterwards they converged to a variety of traits similar to those of "regular" mammals.

43

Common Assumption: Characters with reversal or convergence transitions are highly unlikely in the evolutionary tree.

A character that exhibits neither reversals nor convergence is called homoplasy free.

44

A character is Homoplasy Free

↕ The corresponding coloring is convex

(each color induces a connected subtree)
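The convexity condition above can be checked directly: for each color, the vertices carrying that color must form a connected piece of the tree. The following is a minimal sketch assuming the tree is given as an adjacency dictionary and the coloring is total (every vertex has a color); the name `is_convex` and the example tree are illustrative.

```python
from collections import defaultdict, deque


def is_convex(tree: dict[str, list[str]], coloring: dict[str, str]) -> bool:
    """Return True if every color class induces a connected subtree of `tree`."""
    classes = defaultdict(set)
    for vertex, color in coloring.items():
        classes[color].add(vertex)

    for members in classes.values():
        # Breadth-first search that never leaves the current color class.
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in tree[v]:
                if w in members and w not in seen:
                    seen.add(w)
                    queue.append(w)
        if seen != members:
            return False  # this color is split into several pieces
    return True


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(is_convex(path, {"a": "R", "b": "R", "c": "B", "d": "B"}))  # True
    print(is_convex(path, {"a": "R", "b": "B", "c": "R", "d": "B"}))  # False
```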

45

A partial coloring is convex if it can be completed to a (total) convex coloring
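One way to test a partial coloring, sketched here as an illustration rather than as the course's algorithm, relies on a standard observation: a partial coloring of a tree can be completed to a convex coloring exactly when the minimal subtrees spanning each color (called "carriers" in this sketch) are pairwise vertex-disjoint. The names `carrier` and `is_partial_convex` are hypothetical.

```python
from collections import defaultdict


def carrier(tree: dict[str, list[str]], terminals: set[str]) -> set[str]:
    """Vertices of the minimal subtree of `tree` that spans `terminals`.

    Computed by repeatedly pruning leaves that are not terminals.
    """
    degree = {v: len(neighbors) for v, neighbors in tree.items()}
    alive = set(tree)
    stack = [v for v in tree if degree[v] <= 1 and v not in terminals]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        for w in tree[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] <= 1 and w not in terminals:
                    stack.append(w)
    return alive


def is_partial_convex(tree: dict[str, list[str]], partial: dict[str, str]) -> bool:
    """True if the partial coloring extends to a convex total coloring."""
    by_color = defaultdict(set)
    for vertex, color in partial.items():
        by_color[color].add(vertex)

    claimed: set[str] = set()
    for members in by_color.values():
        subtree = carrier(tree, members)
        if claimed & subtree:
            return False  # two carriers share a vertex
        claimed |= subtree
    return True


if __name__ == "__main__":
    star = {"x": ["a", "b", "c", "d"], "a": ["x"], "b": ["x"], "c": ["x"], "d": ["x"]}
    print(is_partial_convex(star, {"a": "R", "b": "R", "c": "B"}))            # True
    print(is_partial_convex(star, {"a": "R", "b": "R", "c": "B", "d": "B"}))  # False
```

In the second call both colors need the center vertex `x` to connect their leaves, so no convex completion exists.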

46

The Perfect Phylogeny Problem

• Input: a set of species, and many characters, each assigning states (colors) to the species.

• Question: is there a tree T containing the species as vertices, in which all the characters (colorings) are convex?

47

The Perfect Phylogeny Problem (combinatorial setting)

Input: Some colorings (C1,…,Ck) of a set of vertices (in the example: 3 colorings: left, center, right, each by (the same) two colors).

Problem: Is there a tree T which includes these vertices, s.t. (T,Ci) is convex for i=1,…,k?

[Figure: three two-color (R/B) colorings of the same vertex set; vertex labels as extracted: RBRRRRBBRRRB]

NP-hard in general; in P for some special cases.
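One well-known polynomial special case, given here as an illustration and not necessarily the case the lecture has in mind, is when every character has at most two states. A tree in which all such characters are convex (extra internal vertices allowed) exists exactly when every pair of characters is compatible, meaning that the four state combinations do not all occur among the species. The sketch below applies this pairwise test; the function names and the example data are hypothetical.

```python
from itertools import combinations


def pairwise_compatible(char_a: str, char_b: str) -> bool:
    """Two two-state characters are compatible unless all four state pairs occur.

    Each character is a string with one state symbol per species,
    in a fixed species order.
    """
    return len(set(zip(char_a, char_b))) < 4


def has_perfect_phylogeny_binary(characters: list[str]) -> bool:
    """Decide perfect phylogeny for two-state characters by checking all pairs."""
    return all(
        pairwise_compatible(a, b) for a, b in combinations(characters, 2)
    )


if __name__ == "__main__":
    # Columns are species A, B, C, D; each string is one coloring by R and B.
    print(has_perfect_phylogeny_binary(["RRBB", "RBBB", "RRRB"]))  # True
    print(has_perfect_phylogeny_binary(["RRBB", "RBRB"]))          # False
```

The check runs in time quadratic in the number of characters and linear in the number of species, in contrast to the NP-hardness of the general multi-state problem.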