Comparative Genomics
2/30
Overview of the Talk
• Comparing Genomes
• Homologies & Families
• Sequence Alignments
3/30
Evolution at the DNA Level
…ACTGACATGTACCA…
…AC----CATGCACCA…
Sequence edits: Mutation, Deletion
Rearrangements: Inversion, Translocation, Duplication
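As a rough illustration of sequence edits, the sketch below labels each column of a pairwise alignment as a match, a substitution (point mutation), or an indel. The function name and the toy sequences are hypothetical and chosen for this example; rearrangements such as inversions cannot be detected column by column and are not covered.

```python
def classify_columns(ref: str, other: str) -> list:
    """Label each aligned column as 'match', 'substitution', or 'indel'."""
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for a, b in zip(ref, other):
        if a == "-" or b == "-":
            labels.append("indel")          # gap in either sequence
        elif a == b:
            labels.append("match")
        else:
            labels.append("substitution")   # point mutation
    return labels


# Hypothetical toy alignment (equal-length, gapped).
ref = "ACTGACATGTACCA"
other = "AC----ATGCACCA"
labels = classify_columns(ref, other)
summary = {kind: labels.count(kind) for kind in ("match", "substitution", "indel")}
print(summary)
```

A gap run is counted per column here; a real tool would usually merge adjacent gap columns into one deletion or insertion event.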
4/30
Why Compare Genomes?
• We can better understand evolution and speciation
• We can find important, functional regions of the sequence (codons, promoters, regulatory regions)
• It can help us locate genes in other species that are missing or not well-defined (also through comparison and alignments)
5/30
Comparing Genomes
Mammals have roughly 3 billion base pairs in their genomes.
Over 98% of human genes are shared with other primates, with 95-98% similarity between genes.
Even the fruit fly shares 60% of its genes with humans! (March 2000)
Differences: gene structure, sequence
Remember… a single nucleotide change can cause diseases such as sickle cell anemia and cancer.
6/30
How Does Ensembl Predict Homology?
• Uses all the species
• Uses a representative protein (the longest) for every gene
• Builds a gene tree
• EnsemblCompara GeneTrees: Analysis of complete, duplication-aware phylogenetic trees in vertebrates. Vilella AJ, Severin J, Ureta-Vidal A, Durbin R, Heng L, Birney E. Genome Res. 2008 Nov 24.
7/30
Steps in Homology Prediction
1. Load the longest protein for every gene from all species
2. WU Blastp + SmithWaterman: longest translation of every gene against every other (Blast Reciprocal Hit / Blast Score Ratio)
3. Protein clustering, build multiple alignments (MCoffee)
4. From each alignment, build a gene tree (TreeBest)
5. Reconcile each gene tree with the species tree to determine internal nodes (TreeBest): orthologues, paralogues…
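The two criteria named in the all-against-all comparison step can be sketched as follows. This is a minimal illustration, not the Ensembl pipeline: the score table, the protein names, and the choice to normalise the Blast Score Ratio by the larger of the two self-scores are assumptions made for this example.

```python
def best_hits(scores: dict) -> dict:
    """Map each query to its highest-scoring non-self target."""
    best = {}
    for (query, target), score in scores.items():
        if query == target:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: dict) -> list:
    """Pairs in which each protein is the other's best hit."""
    best = best_hits(scores)
    pairs = {tuple(sorted((q, t))) for q, t in best.items() if best.get(t) == q}
    return sorted(pairs)


def blast_score_ratio(scores: dict, a: str, b: str) -> float:
    """Score of a vs b, normalised by the larger self-score (assumed definition)."""
    return scores[(a, b)] / max(scores[(a, a)], scores[(b, b)])


# Hypothetical alignment scores, keyed by (query, target).
scores = {
    ("humA", "humA"): 200, ("musA", "musA"): 190, ("musB", "musB"): 180,
    ("humA", "musA"): 150, ("musA", "humA"): 150,
    ("humA", "musB"): 60, ("musB", "humA"): 60,
    ("musA", "musB"): 50, ("musB", "musA"): 50,
}
rbh = reciprocal_best_hits(scores)
bsr = blast_score_ratio(scores, "humA", "musA")
print(rbh, bsr)
```

In this toy table musB hits humA best, but humA prefers musA, so only the humA/musA pair is reciprocal.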
8/30
Viewing Trees in Ensembl
9/30
Types of Homologues
• Orthologues: any pairwise gene relation where the common ancestor node is a speciation event
• Paralogues: any pairwise gene relation where the common ancestor node is a duplication event
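The two definitions can be turned into a small lookup on a gene tree whose internal nodes are already labelled. The sketch below assumes a hypothetical `Node` class and invented gene names; it finds the deepest node that contains both genes and reads off its event type.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    event: Optional[str] = None   # "speciation" or "duplication" on internal nodes
    gene: Optional[str] = None    # set on leaves only
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> set:
    """All gene names below a node."""
    if node.gene is not None:
        return {node.gene}
    found = set()
    for child in node.children:
        found |= leaves(child)
    return found


def relation(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify a gene pair by the event at its last common ancestor."""
    if gene_a == gene_b or not {gene_a, gene_b} <= leaves(root):
        raise ValueError("need two distinct genes from the tree")
    node = root
    while True:
        # Descend while one child still contains both genes.
        deeper = [c for c in node.children if {gene_a, gene_b} <= leaves(c)]
        if not deeper:
            break
        node = deeper[0]
    return "orthologue" if node.event == "speciation" else "paralogue"


def leaf(name: str) -> Node:
    return Node(gene=name)


# Hypothetical tree: one ancient duplication followed by a speciation in each copy.
tree = Node("duplication", children=[
    Node("speciation", children=[leaf("human_X1"), leaf("mouse_X1")]),
    Node("speciation", children=[leaf("human_X2"), leaf("mouse_X2")]),
])
print(relation(tree, "human_X1", "mouse_X1"))
print(relation(tree, "human_X1", "human_X2"))
```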
10/30
The Gene Tree for INS (insulin precursor)
A red square is a duplication event (Paralogues)
A blue square is a speciation event (Orthologues)
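11/30
Reconciliation
[Figure: an unrooted gene tree with leaves M, R and H is reconciled with the species tree of M, R and H. Internal nodes are marked as duplication nodes or speciation nodes, and gene losses are shown on the reconciled tree (leaves M’, R’, H’).]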
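One simple way to label internal nodes is the species-overlap rule: a node is called a duplication if the species found under its two children overlap, and a speciation otherwise. The sketch below uses that rule as a stand-in; it is not the TreeBest reconciliation, it assumes the gene tree is already rooted and binary, and it does not infer gene losses. The tree encoding and gene names are hypothetical.

```python
def label_events(tree):
    """Return (species_set, labelled_tree) for a rooted binary gene tree.

    Leaves are (species, gene_name) tuples; internal nodes are 2-element lists.
    """
    if isinstance(tree, tuple):
        species, _gene = tree
        return {species}, tree
    left_species, left = label_events(tree[0])
    right_species, right = label_events(tree[1])
    # Shared species on both sides implies the split predates a speciation.
    event = "duplication" if left_species & right_species else "speciation"
    labelled = {"event": event, "children": [left, right]}
    return left_species | right_species, labelled


# Hypothetical gene tree over species H, M and R.
gene_tree = [
    [("H", "a1"), ("M", "a1")],
    [("H", "a2"), [("M", "a2"), ("R", "a2")]],
]
species, labelled = label_events(gene_tree)
print(labelled["event"])
```

Here the root is labelled a duplication because H and M appear on both sides, while every node below it is a speciation.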
12/30
Orthologue Types
What is ‘1 to 1’?
What is ‘1 to many’?
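One reading of these terms, sketched below, counts how many orthologues each gene in a pair has in the other species. The function name, the gene names, and the assumption that gene names are unique across the two species are all illustrative.

```python
from collections import defaultdict


def orthologue_types(pairs: list) -> dict:
    """Classify each orthologue pair by how many partners each gene has."""
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    types = {}
    for a, b in pairs:
        n_a, n_b = len(partners[a]), len(partners[b])
        if n_a == 1 and n_b == 1:
            types[(a, b)] = "1 to 1"
        elif n_a > 1 and n_b > 1:
            types[(a, b)] = "many to many"
        else:
            types[(a, b)] = "1 to many"
    return types


# Hypothetical human (h*) and mouse (m*) orthologue pairs.
pairs = [("hA", "mA"), ("hB", "mB1"), ("hB", "mB2")]
types = orthologue_types(pairs)
print(types)
```

In this example hB has two mouse orthologues, as would happen after a duplication in the mouse lineage, so both of its pairs are '1 to many'.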
13/30
Protein Families
• How: Cluster proteins for every isoform in every species + UniProt proteins.
• BLASTP comparison of:
  – all Ensembl ENSP…
  – all metazoan (animal) proteins in UniProt
14/30
Homologues Exercise
1. Find the human MYL6 gene: go to its gene summary.
2. How many paralogues does it have? Find them in the gene tree.
3. Which paralogue is closest to the human MYL6 gene? In what taxon is the common ancestor?
15/30
Pan-taxonomic compara
Anopheles gambiae, Caenorhabditis elegans, Drosophila melanogaster
Aspergillus nidulans, Neurospora crassa, Saccharomyces cerevisiae, Schizosaccharomyces pombe
B_aphidicola_Tokyo_1998, B_burgdorferi_DSM_4680, B_subtilis, E_coli_K12, M_tuberculosis_H37Rv, N_meningitidis_A, P_horikoshii, S_aureus_N315, S_pneumoniae_TIGR4, S_pyogenes_SF370, W_pipientis_wMel
Anolis carolinensis, Ciona savignyi, Danio rerio, Equus caballus, Gallus gallus, Homo sapiens, Macaca mulatta, Monodelphis domestica, Mus musculus, Ornithorhynchus anatinus, Pan troglodytes, Pongo pygmaeus, Xenopus tropicalis
Dictyostelium discoideum, Plasmodium falciparum, Plasmodium vivax
17/30
Families
18/30
Ensembl Proteins in the Family
19/30
Overview of the Talk
• Comparing Genomes
• Homologies and Families
• Sequence Alignments
20/30
Aligning Whole Genomes: Why?
• To identify homologous regions
• To spot problematic gene predictions
• Conserved regions could be functional
• To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved)
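The idea of a syntenic region, conserved order and orientation, can be sketched as a search for runs of collinear anchors. The code below is a toy under stated assumptions: each anchor is a hypothetical `(rank_in_A, rank_in_B, strand)` tuple for a pair of matching genes, anchors are sorted by consecutive rank in genome A, and no gaps or small local shuffles are tolerated.

```python
def collinear_runs(anchors: list, min_len: int = 2) -> list:
    """Group consecutive anchors whose order and orientation are preserved."""
    runs = []
    current = anchors[:1]
    for prev, cur in zip(anchors, anchors[1:]):
        # On the forward strand ranks in B rise by one; on the reverse they fall.
        expected_step = 1 if cur[2] == "+" else -1
        if cur[2] == prev[2] and cur[1] - prev[1] == expected_step:
            current.append(cur)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [cur]
    if len(current) >= min_len:
        runs.append(current)
    return runs


# Hypothetical anchors: a forward block, an inverted block, then a lone gene.
anchors = [
    (0, 10, "+"), (1, 11, "+"), (2, 12, "+"),
    (3, 40, "-"), (4, 39, "-"),
    (5, 7, "+"),
]
runs = collinear_runs(anchors)
print([len(run) for run in runs])
```

Real synteny detection has to allow for missing genes and small rearrangements inside a block, which this strict version does not.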
21/30
Aligning large genomic sequences
Difficulties:
• Requires significant computing resources
• Scalability, as more and more genomes are sequenced
• Time constraints
• As the «true» alignment is not known, it is difficult to measure alignment accuracy and to choose the right method
22/30
Whole Genome Alignments
• BLASTZ-net (nucleotide level): closer species, e.g. human – mouse
• Translated BLAT (amino acid level): more distant species, e.g. human – zebrafish
• EPO/PECAN: multispecies alignments
• ORTHEUS: used to determine ancestral alleles
23/30
Which Multispecies Alignments?
Mercator-Pecan
• 16 amniota vertebrates + constrained elements
Enredo-Pecan-Ortheus (EPO)
• For 6 primates
• For 5 teleost fish + constrained elements
• For 12 eutherian mammals
• For 34 eutherian mammals + constrained elements
24/30
Non-Coding Regions
• “Phylogenetic Footprinting”: conserved noncoding regions can be functional
• Regulatory regions discovered in this way for genes: Hoxb-1, Hoxb4, PAX6, SOX9
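The intuition behind phylogenetic footprinting can be sketched by scoring each column of a multiple alignment and then looking for windows that stay highly conserved. This is a toy measure (fraction of sequences sharing the most common base), not the method behind Ensembl's constrained elements; the function names, window width, threshold, and sequences are assumptions for the example.

```python
def column_identity(alignment: list) -> list:
    """Fraction of sequences that share the most common base in each column."""
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        bases = [c for c in column if c != "-"]   # gaps never count as conserved
        top = max((bases.count(b) for b in set(bases)), default=0)
        scores.append(top / n)
    return scores


def conserved_windows(scores: list, width: int, threshold: float) -> list:
    """Start positions of windows whose mean score reaches the threshold."""
    return [
        i for i in range(len(scores) - width + 1)
        if sum(scores[i:i + width]) / width >= threshold
    ]


# Hypothetical three-species alignment of a non-coding region.
alignment = [
    "ACGTACGT",
    "ACGTTCAT",
    "ACGTGGCT",
]
scores = column_identity(alignment)
hits = conserved_windows(scores, width=4, threshold=0.9)
print(hits)
```

Only the first four columns are identical in all three sequences, so the single conserved window starts at position 0.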
25/30
More Examples
• Highly conserved transcription factor binding sites discovered,
e.g. a 401 bp non-coding sequence involved in transcriptional regulation of interleukins.
• New genes (human-mouse comparison),
e.g. APOA5, identified as a paralogue of APOA4 in human and mouse.
26/30
Going Beyond Mammals
Where human-mouse is too conserved, go to other species:
Chicken (mammals and birds diverged about 300 MYA)
e.g. a cardiac-specific enhancer of Nkx2-5
Human and fish (400-450 MYA)
In 2002, comparison of human to Fugu rubripes led to the identification of 1000 genes.
27/30
Regulatory Features of the PDX1 gene
Region in Detail shows conservation of sequence in regions involved in PDX1 transcriptional regulation (1.6-2.8 kb upstream of the gene).
28/30
Alignments Exercise
1. Have a look at Region in Detail for the ACN9 gene.
2. Turn on the BLASTZ alignment against macaque. What parts of the macaque genome align to this region in human?
3. Turn on the constrained elements for the 33 eutherian mammals. How does this track differ from the BLASTZ alignment?
29/30
Alignments Continued
1. Zoom out one box in the zoom slider. Are there constrained elements upstream of the ACN9 transcript that overlap a regulatory feature?
2. View the ‘6 primates alignment’ using the Alignments links at the left.
30/30
Compara Team at EBI
• Javier Herrero
• Kathryn Beal
• Stephen Fitzgerald
• Leo Gordon