cbio course, spring 2005, Hebrew University
Computational Methods In Molecular Biology
CS-67693, Spring 2005
School of Computer Science & Engineering
Hebrew University, Jerusalem
Class 1: Introduction
Introduction
What is Comp. Bio.? Why is it great?
What are the aims and basic concepts of this course?
High-level biological review: give basic bio background and motivation for tasks handled in the course
Administration…
The Cell
Example: Tissues in Stomach
DNA Components
Four nucleotide types:
• Adenine
• Guanine
• Cytosine
• Thymine
Hydrogen bonds:
• A-T
• C-G
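The pairing rule above (A with T, C with G) is enough to derive one strand of the double helix from the other. The following is a minimal sketch, not part of the original slides; the function names are hypothetical, and it relies on the standard fact that the two strands run in opposite directions, so the partner strand is read in reverse.

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.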
The Double Helix
Source: Alberts et al.
DNA Organization
Source: Alberts et al.
Genome Sizes
E. coli (bacteria): 4.6 × 10^6 bases
Yeast (simple fungi): 15 × 10^6 bases
Smallest human chromosome: 50 × 10^6 bases
Entire human genome: 3 × 10^9 bases
Related Computational Tasks
Need a way to reconstruct a DNA sequence from fragments: a major contribution of comp. bio.!
Related: sequence comparison, sequence alignment
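To make the reconstruction task concrete, here is a toy sketch of one simple heuristic: repeatedly merge the two fragments with the longest suffix/prefix overlap. This greedy approach is an illustrative assumption, not the method taught in the course; the function names and the `min_len` parameter are hypothetical, and real assemblers must also cope with sequencing errors, repeats, and fragments read from either strand.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left; return the separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_assemble(["ATGGAC", "GACAAA", "AAATTAG"]))  # ['ATGGACAAATTAG']
```

Comparing every pair of fragments is quadratic in the number of fragments, which is acceptable for a toy example only.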
DNA Duplication
Source: Mathews & van Holde
Genes
The DNA strings include:
Coding regions (“genes”)
• E. coli has ~4,000 genes
• Yeast has ~6,000 genes
• C. elegans has ~13,000 genes
• Humans have ~32,000 genes
Control regions
• These are typically adjacent to the genes
• They determine when a gene should be expressed
“Junk” DNA (unknown function)
The Tree of Life
Source: Alberts et al.
Evolution
Related organisms have similar DNA
• Similarity in sequences of proteins
• Similarity in organization of genes along the chromosomes
Evolution plays a major role in biology
• Many mechanisms are shared across a wide range of organisms (e.g. orthologs)
• During the course of evolution, existing components are adapted for new functions (e.g. paralogs)
Evolution
Evolution of new organisms is driven by
Diversity
• Different individuals carry different variants of the same basic blueprint
Mutations
• The DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
Selection bias
Related Computational Tasks
Phylogeny (not just theory!):
• Rebuild the tree of life…
• Infer relations between genes/pathways etc. across species
• Learn models for changes and development
• Major benefit: exploit the information we do have/observe to make inferences about systems on which we have very little knowledge and few observations
How Do Genes Code for Proteins?
DNA → Transcription → RNA → Translation → Protein
Transcription
Coding sequences can be transcribed to RNA
RNA nucleotides:
• Similar to DNA, slightly different backbone
• Uracil (U) instead of Thymine (T)
Source: Mathews & van Holde
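At the sequence level, the transcription slide reduces to a one-letter substitution. The sketch below is an illustration rather than course material; the function name is hypothetical, and it assumes the input is the coding (sense) strand written 5' to 3', so the RNA has the same sequence with U in place of T.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA matching a DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGACAAATTAG"))  # AUGGACAAAUUAG
```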
RNA Editing
Translation
Translation
Translation is mediated by the ribosome
The ribosome is a complex of protein & rRNA molecules
The ribosome attaches to the mRNA at a translation initiation site
The ribosome then moves along the mRNA sequence and in the process constructs a poly-peptide
When the ribosome encounters a stop signal, it releases the mRNA. The constructed poly-peptide is released and folds into a protein.
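The steps above (attach at an initiation site, read along the mRNA, stop at a stop signal) can be sketched in a few lines. This is a simplified illustration, not the course's code: it assumes the standard genetic code, treats the first AUG as the initiation site (real initiation is more involved), and expects an upper-case string over A, U, C, G. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Start at the first AUG and read codons until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation site found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop signal: release the poly-peptide
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGGACAAAUUAUAAGC"))  # MDKL
```

The result uses one-letter amino-acid codes; folding the poly-peptide into a protein is a separate problem that this sketch does not touch.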
Translation
Source: Alberts et al.
Translation
Source: Alberts et al.
Translation
Source: Alberts et al.
Translation
Source: Alberts et al.
Translation
Source: Alberts et al.
Genetic Code
The Central Dogma
DNA → Transcription → RNA → Translation → Protein
[Figure labels: Genes, Experiments]
Eukaryotic Transcription Regulation
“Classical Model”
Composition of promoter region determines rate of transcription initiation
Combinations of TFs control the transcription of gene sets under specific conditions
[Figure labels: TFs, basal promoter, transcription start site, RNA polymerase II, gene (5’ to 3’), mRNA]
From Data to Model
>YKL112W Chr 11 ATGGACAAATTAGTCGTGAATTATTATGAATACAAGCACCCTATAATTAATAAAGACCTGGCCATTGGAGCCCATGGAGGCAAAAAATTTCCCACCTTGGGTGCTTGGTATGATGTAATTAATGAGTACGAATTTCAGACGCGTTGCCCTATTATTTTAAAGAATTCGCATAGGAACAAACATTTTACATTTGCCTGTCATTTGAAAAACTGTCCATTTAAAGTCTTGCTAAGCTATGCTGGCAATGCTGCATCCTCAGAAACCTCATCTCCTTCTGCAAATAATAATACCAACCCTCCGGGTACTCCTGATCATATTCATCATCATAGCAACAACATGAACAACGAGGACAATGATAATAACAATGGCAGTAATAATAAGGTTAGCAATGACAGTAAACTTGACTTCGTTACTGATGATCTTGAATACCATCTGGCGAACACTCATCCGGACGACACCAATGACAAAGTGGAGTCGAGAAGCAATGAGGTGAATGGGAACAATGACGATGATGCTGATGCCAACAACATTTTTAAACAGCAAGGTGTTACTATCAAGAACGACACTGAAGATGATTCGATAAATAAGGCCTCTAT
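The record on the "From Data to Model" slide is in FASTA style: a `>` header followed by sequence. As a hedged first step from raw data towards a model, the sketch below reads such records into a dictionary. The function name is hypothetical, the sketch assumes the usual layout with the header on its own line, and the example uses only the first 30 bases of the slide's sequence.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each record header (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Truncated prefix of the slide's record, split over two lines for illustration.
example = ">YKL112W Chr 11\nATGGACAAATTAGTC\nGTGAATTATTATGAA\n"
for name, seq in parse_fasta(example).items():
    print(name, len(seq))  # YKL112W Chr 11 30
```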
Many Related Computational Tasks…
Information is in the code book:
• How is alternative splicing determined, and where?
• Build models for regulation of genes at different levels of complexity
• Relate genotype and phenotype: What are the expression patterns of some disease? How do they relate to sequence? What model can explain the observations? Can we predict phenomena based on our models?
Who came first?
Chicken or egg?
• Egg
DNA or Protein?
• RNA…
Thomas Cech & Sidney Altman (1980s!):
• RNA as an “independent” molecule
• Probably closer to the ancient “source”
RNA roles
Messenger RNA (mRNA)
• Encodes protein sequences
Transfer RNA (tRNA)
• Adaptor between mRNA molecules and amino-acids (protein building blocks)
Ribosomal RNA (rRNA)
• Part of the ribosome, a machine for translating mRNA to proteins
...
Transfer RNA
Anticodon: matches a codon (triplet of mRNA nucleotides)
Attachment site: matches a specific amino-acid
Related Computational Tasks
RNA secondary structure prediction:
• Based on context-free grammars (CFG) and covariance models (CM)
RNA coding area prediction…
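As a taste of secondary structure prediction, here is a sketch of a much simpler technique than the CFG and CM approaches named above: Nussinov-style dynamic programming, which maximizes the number of nested base pairs. It is swapped in purely for illustration. The set of allowed pairs (including G-U wobble pairs), the `min_loop` parameter, and the function name are assumptions of this sketch.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, keeping at least
    `min_loop` unpaired bases between the two ends of any pair."""
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the substring rna[i..j]; short spans stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: base i pairs with base k
                if (rna[i], rna[k]) in PAIRS:
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k < j else 0
                    score = max(score, 1 + inside + after)
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))  # 3
```

The table is filled in order of increasing span, so every subproblem is solved before it is needed; the running time is cubic in the sequence length.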
RNA Editing
Source: Mathews & van Holde
Translation
How do Proteins Perform their Roles?
Proteins interact in various ways
Change conformations; conformation → function
Major issues:
• Their “active”/functional areas, which interact
• Their 3D structure
Protein Structure
Proteins are poly-peptides of 70-3000 amino-acids
This structure is (mostly) determined by the sequence of amino-acids that make up the protein
Protein Structure
Related Computational Tasks
Protein 2D, 3D structure prediction
Identify sequence motifs/domains in proteins
• Sequence similarity vs. functional similarity
Course Goals
Review current tasks posed by modern molecular biology
Review and experiment with some of the tools/solutions currently found (e.g. BLAST, clustalw)
Gain some tools to handle such problems:
• Dynamic programming
• Probabilistic graphical models:
  ♦ MM, HMM, CM, Trees
  ♦ Representation, what principles justify them, Learning, Inference
• Statistical tools: how to measure our confidence in our results?
Course Goals
Computational tools in molecular biology:
We will cover computational tasks that are posed by modern molecular biology
We will discuss the biological motivation and setup for these tasks
We will understand the kinds of solutions that exist and what principles justify them
Course’s Main Point
Course’s Main Point
Learn to do: Define the problem → Find comp. solution
Four Aspects:
Biological
• What is the task?
Algorithmic
• How to perform the task at hand efficiently?
Learning
• How to adapt parameters of the task from examples
Statistics
• How to differentiate true phenomena from artifacts
Example: Sequence Comparison
Biological
• Evolution preserves sequences, thus similar genes might have similar function
Algorithmic
• Consider all ways to “align” one sequence against another
Learning
• How do we define “similar” sequences? Use examples to define similarity
Statistics
• When we compare to ~10^6 sequences, what is a random match and what is a true one?
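The algorithmic aspect, considering all ways to align one sequence against another, is typically handled with dynamic programming rather than explicit enumeration. The sketch below computes a global alignment score in the Needleman-Wunsch style. The scoring values (`match`, `mismatch`, `gap`) are arbitrary placeholders and the function name is hypothetical; in the terms of the slide, choosing those values from examples is the learning aspect.

```python
def global_alignment_score(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2
) -> int:
    """Best global alignment score of `a` and `b` with a linear gap penalty."""
    # prev[j] is the best score for aligning the processed prefix of `a` with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "AGT"))         # 1
```

Keeping only two rows of the table gives the score in memory linear in the length of `b`; recovering the alignment itself would require the full table or a traceback structure.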
Topics I
Dealing with DNA/Protein sequences:
• Genome projects and how sequences are found
• Finding similar sequences
• Models of sequences: Hidden Markov Models
• Transcription regulation
• Protein Families
• Gene finding
Topics II
Gene Expression:
• Genome-wide expression patterns
• Data organization: clustering
• Reconstructing transcription regulation
• Recognizing and classifying cancers
Topics III
Models of genetic change:
• Long term: evolutionary changes among species
• Reconstructing evolutionary trees from current day sequences
• Short term: genetic variations in a population
• Finding genes by linkage and association
Topics IV
Protein World:
• How proteins fold: secondary & tertiary structure
• How to predict protein folds from sequence data alone
• How to analyze protein changes from raw experimental measurements (MassSpec)
• 2D gels
Class Structure
2 weekly meetings
• Mondays 16-18 (Levin 8), Wednesdays 10-12 (Kaplan)
Grade:
• Homework assignments: ~50% of the final grade. There will be up to seven homework assignments. These assignments will include theoretical problems, using bioinformatics tools, and programming.
• Final home assignment: ~20% of the final grade.
• Final test: ~30% of the grade.
• Class participation: a 5% bonus grade for students who actively participate in discussions during classes.
• Possible: oral presentation of any exercise to define grade!
Exercises & Handouts
Check regularly
http://www.cs.huji.ac.il/~cbio