Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Extending Gene Families Extending Gene Families via Predicted Ancestral via Predicted Ancestral
SequencesSequences
Loretta GoldbergLoretta GoldbergComputational BiosciencesComputational Biosciences
Arizona State UniversityArizona State UniversityApril 28, 2006April 28, 2006
This internship project is presented in This internship project is presented in partial fulfillment of the requirement of partial fulfillment of the requirement of
the Professional Science Master's in the Professional Science Master's in Computational Biosciences, Computational Biosciences,
Arizona State UniversityArizona State University
Research conducted fromResearch conducted fromSeptember 2005 September 2005 –– April 2006April 2006
3
AcknowledgementsAcknowledgements
Thanks to my Advisor, Dr. Michael Rosenberg, Thanks to my Advisor, Dr. Michael Rosenberg, who proposed this internship project, and has who proposed this internship project, and has provided guidance throughout.provided guidance throughout.
Thanks to my committee: Dr. Rosenberg, Dr. Thanks to my committee: Dr. Rosenberg, Dr. Renaut and Dr. Touchman, for the knowledge they Renaut and Dr. Touchman, for the knowledge they have shared throughout my coursework and for have shared throughout my coursework and for review of this final project.review of this final project.
4
Personal BackgroundPersonal Background
B.A. ChemistryB.A. ChemistryRussell Sage College, Troy NYRussell Sage College, Troy NY
4 years as a Process Development Chemist in the 4 years as a Process Development Chemist in the Pharmaceutical IndustryPharmaceutical Industry
M.S. Computer SciencesM.S. Computer SciencesArizona State UniversityArizona State University
14 years as a Software Engineer in the 14 years as a Software Engineer in the Telecommunications IndustryTelecommunications Industry
P.S.M. Computational BiosciencesP.S.M. Computational BiosciencesTBDTBD
5
Extending Gene Families via Extending Gene Families via Predicted Ancestral SequencesPredicted Ancestral Sequences
Gene familiesGene familiesWhat are they?What are they?How do we determine them?How do we determine them?
Ancestral SequencesAncestral SequencesWhat are they?What are they?How do we determine them?How do we determine them?
How does this project propose to utilize How does this project propose to utilize ancestral sequences in the context of gene ancestral sequences in the context of gene families?families?
6
Gene FamiliesGene Families
A A gene familygene family is the grouping together of is the grouping together of genes based on the similarity of the products genes based on the similarity of the products or proteins that they produce.or proteins that they produce.
A A gene familygene family is a set of related genes is a set of related genes occupying various loci in the DNA, almost occupying various loci in the DNA, almost certainly formed by duplication of ancestral certainly formed by duplication of ancestral genes, and having a genes, and having a recognizablyrecognizably similar similar sequence.sequence.
7
Ancestral sequencesAncestral sequences
A A phylogenetic treephylogenetic tree is a is a tool used to show the tool used to show the evolutionary relationship evolutionary relationship between biological objectsbetween biological objects
The The tipstips represent the represent the observed genes observed genes (sequences) in living (sequences) in living organismsorganisms
The The nodesnodes or branching or branching points represent the points represent the extinct ancestorsextinct ancestors
Ancestralsequence
ObservedSequence
8
Project GoalProject Goal
To determine if the inclusion of ancestral To determine if the inclusion of ancestral sequences in an allsequences in an all--vsvs--all comparison of all comparison of protein sequences will extend the number of protein sequences will extend the number of homologous sequences that can be extracted homologous sequences that can be extracted from this comparison.from this comparison.
9
Proposed Work FlowProposed Work Flow
Identifying protein sequences of interestIdentifying protein sequences of interest
AllAll--vsvs--all sequence comparisonall sequence comparison
Alignment of sequencesAlignment of sequences
Generation of Phylogenetic treesGeneration of Phylogenetic trees
Prediction of ancestral sequencesPrediction of ancestral sequences
AllAll--vsvs--all sequence comparison (including all sequence comparison (including ancestral sequences)ancestral sequences)
10
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Proposed Work FlowProposed Work Flow
11
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Obtaining Protein SequencesObtaining Protein Sequences
Sequences for the desired species are extracted from NCBI’s FASTA formatted nonredundant protein database (nr).
formatdb
12
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Finding Gene FamiliesFinding Gene Families
blastclust performs the all-vs-all sequence comparison, returning a cluster list (each cluster represents a gene family or group of homologous sequences).
formatdb
blastclust
13
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Sequence AlignmentSequence Alignment
clustalw performs a progressive multiple sequence alignment for each cluster of sequences. clustalw is the command line version of web-based CLUSTAL W.
formatdb
blastclust
clustalw
14
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Phylogenetic Tree ConstructionPhylogenetic Tree Construction
clustalw utilizes the sequence alignment to build a phylogenetic tree via neighbor joining.
formatdb
blastclust
clustalw
clustalw
15
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Ancestral Sequences PredictionAncestral Sequences Prediction
codeml utilizes both the sequence alignment and phylogenetic tree as input. The maximum likelihood method is used in the reconstruction of the ancestral sequences.
formatdb
blastclust
clustalw
clustalw
codeml
16
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Extending Sequence DatabaseExtending Sequence Database
The database of protein sequence now includes the predicted ancestral sequences.
formatdb
blastclust
clustalw
clustalw
codeml
17
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Finding Gene Families (again)Finding Gene Families (again)
The database of protein sequence now includes the predicted ancestral sequences.
formatdb
blastclust
clustalw
clustalw
codeml
18
Protein Sequences
All-vs-AllComparison
SequenceAlignment
Phylogenetic Tree Construction
Ancestral SequencePrediction
DetermineEffect ofAncestral
Sequences
Gene Family AnalysisGene Family Analysis
The cluster lists from each all-vs-all comparison are the basis of this analysis.
formatdb
blastclust
clustalw
clustalw
codeml
19
Programming EnvironmentProgramming Environment
All of the software developed for this project included:All of the software developed for this project included:PerlPerlCCBatch files executed from the DOS promptBatch files executed from the DOS prompt
All of the development was done under Windows XP. The All of the development was done under Windows XP. The development environment included:development environment included:
Active State PerlActive State PerlMinGW CMinGW CSource EditSource Edit
All processing was performed locallyAll processing was performed locallyThe external programs utilized (formatdb, blastclust, The external programs utilized (formatdb, blastclust, clustalw, and codeml) were command line versions, clustalw, and codeml) were command line versions, compiled for Windows, and executed from the DOS compiled for Windows, and executed from the DOS prompt.prompt.
20
Sequences/Species Sequences/Species
FASTA sequences from the nr protein database Sequences Rel Size
All species as of 01/11/06 3203752
Mus musculus and Rattus norvegicus – rodents 136403 110.35%
Homo sapiens – human 123609 100.00%Mus musculus – mouse 106205 85.92%Arabidopsis thaliana - thale cress 52159 42.20%Bos taurus – cattle 36855 29.82%Rattus norvegicus – rat 30881 24.98%Pan troglodytes – chimp 22873 18.50%Escherichia coli 12225 9.89%Saccharomyces cerevisiae and Schizosaccharomyces pombe – yeast 15484 12.53%
Saccharomyces cerevisiae – baker's yeast 9158 7.41%Zea mays – corn 3410 2.76%
21
Processing Time RequiredProcessing Time Required
SpeciesDatabase
(nr 1/11/06)
FASTArecs formatdb blastclust clustalw codeml
AncestralSeq blastclust
rodents 136403 4 d 7375 4d 15h 1m
human 123609 46 s 3d 1h 22m 3h 45m >3 d 5822 3d 10h 36m
mouse 106205 41 s 2d 7h 55m 1h 27m >3d 4569 2d 14h 19m
thale cress 52159 22 s 10h 6m
22
Initial ClusteringInitial Clustering
SpeciesDatabase
FASTArecs
Clusters size ≥ 6
Largestcluster
LargestCluster
processed
Sequencesprocessed
Avg. # Seq. /cluster
rodents 136403 1815 4051 404 23314 13
human 123609 1050 3210 406 28103 27
mouse 106205 951 4049 402 15958 17
thale cress 52159 183 38 38 1513 8
rat 30881 112 195 195 1346 12chimp 22873 17 92 NA NA NAyeast 15484 34 34 34 319 9
byeast 9158 33 34 34 313 9corn 3410 53 48 48 672 13
23
Second ClusteringSecond Clustering
SpeciesDatabase
AncestralSequences
added
Clusterssize ≥ 6
Largestcluster
Sequencesprocessed
Avg. #Seq. /
cluster
rodents 7375 1815 4051 30805 17
human 5822 1051 3210 33974 32
mouse 4569 956 4049 20592 22
thale cress 520 183 75 2040 11
rat 784 110 403 2147 20
yeast 101 34 60 420 12
corn 157 53 52 829 16
24
Comparative AnalysisComparative Analysis
DEBUG: Collecting clusters from "corn_db\clust.out"DEBUG: Collecting clusters from "corn_db2\aClust.out"------------------------------------------------------------Initial clustering (size 6+) included: 672 Sequences# maintained in reclustering: 672 # lost during reclustering: 0 ------------------------------------------------------------Reclustering (size 6+) included: 829 sequences# maintained during reclustering: 672 # ancestral added for reclustering: 157 # ancestral maintained during reclustering: 157 # other seq added during reclustering: 0
672+157
829
For this relatively small sequence database (corn), the presence of the ancestral sequences did not change the sequences that were clustered together
25
Comparative AnalysisComparative Analysis
DEBUG: Collecting clusters from "thale_db\clust.out"DEBUG: Collecting clusters from "thale_db2\aclust.out"------------------------------------------------------------Initial clustering (size 6+) included: 1513 Sequences# maintained in reclustering: 1513 # lost during reclustering: 0 ------------------------------------------------------------Reclustering (size 6+) included: 2040 sequences# maintained during reclustering: 1513 # ancestral added for reclustering: 520 # ancestral maintained during reclustering: 517 # other seq added during reclustering: 10
Sequence gained during reclustering: cluster 00019_0001 id 15231449 Sequence gained during reclustering: cluster 00027_0001 id 15809938 Sequence gained during reclustering: cluster 00027_0001 id 16648811 Sequence gained during reclustering: cluster 00046_0001 id 2832540 Sequence gained during reclustering: cluster 00046_0001 id 2832572 Sequence gained during reclustering: cluster 00027_0001 id 30690085 Sequence gained during reclustering: cluster 00075_0001 id 32364494 Sequence gained during reclustering: cluster 00075_0001 id 32364496 Sequence gained during reclustering: cluster 00075_0001 id 32364523 Sequence gained during reclustering: cluster 00010_0018 id 7671458
For this sequence database, the presence of the ancestral sequences caused 10 additional sequences to be pulled into gene families. The number of clusters size 6+ was unchanged.
26
Comparative AnalysisComparative Analysis
DEBUG: Collecting clusters from "rat_db\clust.out"DEBUG: Collecting clusters from "rat_db2\aclust.out"------------------------------------------------------------Found composite cluster 00403_0001 in "rat_db2\aclust.out"
Sequences from 00195_0001and 00037_0001 in "rat_db\clust.out"
------------------------------------------------------------Found composite cluster 00031_0001 in "rat_db2\aclust.out"
Sequences from 00010_0003and 00006_0018 in "rat_db\clust.out"
------------------------------------------------------------Initial clustering (size 6+) included: 1346 Sequences# maintained in reclustering: 1346 # lost during reclustering: 0 ------------------------------------------------------------Reclustering (size 6+) included: 2147 sequences# maintained during reclustering: 1346 # ancestral added for reclustering: 784 # ancestral maintained during reclustering: 784 # other seq added during reclustering: 17
Sequence gained during reclustering: cluster 00032_0001 id 27692470 Sequence gained during reclustering: cluster 00022_0001 id 27695749 . . .
For this sequence database, the presence of the ancestral sequences caused 17 additional sequences to be pulled into gene families. In addition, 2 pairs of clusters were combined to form larger clusters.
27
2
13
4
6
8
57
10
9
2
1
3
4
5
6
Olfactory Receptor ProteinsInitial Clustering
1
1
1
313 – 320amino acids
312 – 313amino acids
309 – 313amino acids
28
Olfactory Receptor Proteins +Predicted Ancestral Sequences
1
6
2
5
78
4
3 1
2 4
3
2
1
3
4
5
6
2
13
4
6
8
57
10
9
(n – 2) unique ancestral sequences
generated for each cluster
291
1
1
Olfactory Receptor ProteinsClustering with Ancestral Sequences
1
6
2
5
7 8
4
31
2
4
3
2
1
34
5
6
2
13
4
6
8
57
10
9
Total of31 sequences233 linkages
(not all shown!!)
30
Comparative AnalysisComparative Analysis
SpeciesDatabase
Clusterssize ≥ 6
ObservedSequences
AncestralSequences
added
Clusterssize ≥ 6
Observed+Ancestral
Sequences
NumberOf
CompositeClusters
AdditionalSequencesClustered
1815 133
59
71
10
17
0
0
1050
951
183
112
34
53
Sequencesprocessed
rodents 7375 1815 11 30805
human 5822 1051 7 33974
mouse 4569 956 7 20592
thale cress 520 183 0 2040
rat 784 110 2 2147
yeast 101 34 0 420
corn 157 53 0 829
31
ConclusionsConclusions
It appears that using ancestral sequences to It appears that using ancestral sequences to extend gene families is a viable approach.extend gene families is a viable approach.
The work presented here is just a starting The work presented here is just a starting point, a demonstration of point, a demonstration of ““proof of proof of conceptconcept””..
There is much more work that can be done There is much more work that can be done to determine how effective this approach is.to determine how effective this approach is.
32
Future WorkFuture Work
Default parameters were used for Default parameters were used for blastclustblastclustresulting in fairly stringent similarity criteria, thus resulting in fairly stringent similarity criteria, thus all of the sequences clustered together are all of the sequences clustered together are annotated similarly in annotated similarly in GenbankGenbank ((egeg. Olfactory . Olfactory receptors, immunoglobulin chains, etc.)receptors, immunoglobulin chains, etc.)
A true test of this approach would be to determine A true test of this approach would be to determine the criteria that clusters all of sequences perceived the criteria that clusters all of sequences perceived as homologous in the first pass, then cluster again as homologous in the first pass, then cluster again with ancestral sequences.with ancestral sequences.
33
Future WorkFuture Work
For each of the steps described: alignment, For each of the steps described: alignment, tree building, and prediction of ancestral tree building, and prediction of ancestral sequences we have tried one algorithm.sequences we have tried one algorithm.
How does the accuracy of each of these step How does the accuracy of each of these step effect the result?effect the result?Would another method/algorithm be better?Would another method/algorithm be better?egeg. Bayesian methods for ancestral sequences. Bayesian methods for ancestral sequences
Parsimony or Likelihood methods for treeParsimony or Likelihood methods for treeconstructionconstruction
34
ReferencesReferences
[Atls 90][Atls 90] Altschul S F, Gish W, Miller W, Myers E W, Lipman D J; Basic LocAltschul S F, Gish W, Miller W, Myers E W, Lipman D J; Basic Local al Alignment Search Tool; Alignment Search Tool; Journal of Molecular BiologyJournal of Molecular Biology 1990; 215:4031990; 215:403--410.410.
[Atls 97][Atls 97] Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, MillerAltschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J; W, Lipman D J; Gapped BLAST and PSIGapped BLAST and PSI--BLAST: a new generation of protein database search BLAST: a new generation of protein database search programs; programs; Nucleic Acids ResNucleic Acids Res 1997; 25:33891997; 25:3389--3402. 3402.
[Brid 06][Brid 06] BridghamBridgham J T, Carroll S M, Thornton J W; Evolution of HormoneJ T, Carroll S M, Thornton J W; Evolution of Hormone--Receptor Receptor Complexity by Molecular Evolution; Science 2006; 312:97Complexity by Molecular Evolution; Science 2006; 312:97--101.101.
[Geer 02][Geer 02] Geer L Y, Geer L Y, DomrachevDomrachev M, Lipman D J, Bryant S H; CDART: Protein Homology M, Lipman D J, Bryant S H; CDART: Protein Homology by by Domain Architecture; Domain Architecture; Genome ResearchGenome Research 2002; 12(10):16192002; 12(10):1619--1623.1623.
[Heni 92][Heni 92] HeinkoffHeinkoff S, S, HeinkoffHeinkoff J G; Amino Acid Substitution Matrices for Block Proteins; J G; Amino Acid Substitution Matrices for Block Proteins; Proc. Nat. Proc. Nat. Acad. Acad. SciSci. USA. USA 1992; 89(22):109151992; 89(22):10915--9.9.
[Jean 98][Jean 98] JeanmouginJeanmougin F, Thompson J D, F, Thompson J D, GouyGouy M, Higgins D G, and Gibson, T J; Multiple M, Higgins D G, and Gibson, T J; Multiple Sequence Alignments with Clustal X; Trends Sequence Alignments with Clustal X; Trends BiochemBiochem SciSci 1998; 23:4031998; 23:403--5.5.
[Jone 92][Jone 92] Jones D T, Taylor W R, Thornton J M; The Rapid Generation of MutJones D T, Taylor W R, Thornton J M; The Rapid Generation of Mutation Data Matrices ation Data Matrices from Protein Sequences; from Protein Sequences; CABIOSCABIOS[1][1] 1992; 8:2751992; 8:275--282.282.
[Kosi 05][Kosi 05] KosiolKosiol C, Goldman N; Different Versions of the C, Goldman N; Different Versions of the DayhoffDayhoff Rate Matrix; Rate Matrix; Molecular Biology and Molecular Biology and EvolutionEvolution 2005; 22:1932005; 22:193--199.199.
[Lewi 04][Lewi 04] Lewin B; Lewin B; Genes VIII; Genes VIII; Pearson Prentice Hall, NJ 2004.Pearson Prentice Hall, NJ 2004.
[1][1] CABIOS is the former name of the Bioinformatic Journal, Oxford CABIOS is the former name of the Bioinformatic Journal, Oxford University Press.University Press.
35
ReferencesReferences
[Li 01][Li 01] Li W, Jaroszewski L, Godzik A; Clustering of Highly Homologous SLi W, Jaroszewski L, Godzik A; Clustering of Highly Homologous Sequences to Reduce equences to Reduce the Size of Large Protein Databases; the Size of Large Protein Databases; BioinformaticsBioinformatics 2001; 17(3):2822001; 17(3):282--283.283.
[Pevs 03][Pevs 03] Pevsner J; BioinformaticsPevsner J; Bioinformatics and Functional Genomicsand Functional Genomics; John Wiley & Sons, Inc., NJ 2003.; John Wiley & Sons, Inc., NJ 2003.[Remm 00][Remm 00] Remm M, Remm M, SonnhammerSonnhammer E; Classification of Transmembrane Protein Families in the E; Classification of Transmembrane Protein Families in the
CaenorhabditisCaenorhabditis eleganselegans Genome and Identification of Human Orthologs; Genome and Identification of Human Orthologs; Genome ResearchGenome Research2000; 10(11):16792000; 10(11):1679--1689.1689.
[Tatu 97][Tatu 97] TatusovTatusov R L, R L, KooninKoonin E V, Lipman D J; A Genomic Perspective on Protein Families; E V, Lipman D J; A Genomic Perspective on Protein Families; ScienceScience 1997; 278(5338):6311997; 278(5338):631--637.637.
[Tigr 05][Tigr 05] TIGR; Domain Based Paralogous Protein Families; TIGR; Domain Based Paralogous Protein Families; www.tigr.orgwww.tigr.org ; Annotation Workshop ; Annotation Workshop July 13, 2005.July 13, 2005.
[Thom 94][Thom 94] Thompson J D, Higgins D G, Gibson T J; Clustal W: Improving the Thompson J D, Higgins D G, Gibson T J; Clustal W: Improving the Sensitivity of Sensitivity of Progressive Multiple Sequence Alignment through Sequence WeightiProgressive Multiple Sequence Alignment through Sequence Weighting, Positionng, Position--Specific Specific Gap Penalties and Weight Matrix Choice; Gap Penalties and Weight Matrix Choice; Nucleic Acids ResearchNucleic Acids Research 1994; 22(22):46731994; 22(22):4673--4680.4680.
[Yang 94][Yang 94] Yang Z, Goldman N, Friday A; Comparison of Models for NucleotideYang Z, Goldman N, Friday A; Comparison of Models for Nucleotide substitution used substitution used in Maximumin Maximum--Likelihood Phylogenetic Estimation; Likelihood Phylogenetic Estimation; Molecular Biology and EvolutionMolecular Biology and Evolution 1994; 1994; 11:31611:316--324.324.
[Yang 95][Yang 95] Yang Z, Kumar S, Yang Z, Kumar S, NeiNei M; A New Method of Inference of Ancestral Nucleotide and M; A New Method of Inference of Ancestral Nucleotide and Amino Acid Sequences; Amino Acid Sequences; GeneticsGenetics 1995; 141:16411995; 141:1641--1650.1650.
[Yang 97] [Yang 97] Yang Z; PAML: a Program Package for Phylogenetic Analysis by MaxYang Z; PAML: a Program Package for Phylogenetic Analysis by Maximum Likelihood; imum Likelihood; CABIOSCABIOS 1997; 12(7):5551997; 12(7):555--556.556.
36
Questions?Questions?
Thank you for attending!Thank you for attending!
37
Input Software OutputNCBI’s nr database spFilter.pl speciesDB(FASTA)
speciesDB(FASTA) dbFormatter.pl(formatdb) speciesDB(BLAST)
speciesDB(BLAST) bClust.pl(blastclust) cluster listneighbor/hit-list
cluster list countClust.pl summary of cluster sizes
cluster list collectClust.pl cluster files (FASTA)
cluster files (FASTA) buildClustalBat.pl bat file(clustalw)
cluster files (FASTA) bat file (clustalw) alignment files (PHYLIP-C)tree files (PHYLIP-C)
alignment files (PHYLIP-C) modifyAlignFiles.pl alignment files (PHYLIP-P)
tree files (PHYLIP-C) modyifyTreeFiles.pl tree files (PHYLIP-P)
alignment files (PHYLIP-P)tree files (PHYLIP-P) buildCtlFiles.pl
control filesbat file(codeml)
alignment files (PHYLIP-P)tree files (PHYLIP-P)control files
bat file(codeml) ancestral sequences files
ancestral sequence files extractAnSeq.pl ancestral sequences (FASTA)
speciesDB(FASTA)ancestral sequences (FASTA) concatenate files together speciesDB + AnSeq(FASTA)
speciesDB + AnSeq(FASTA) dbFormatter.pl(formatdb) speciesDB + AnSeq (BLAST)
speciesDB + AnSeq (BLAST) bClust.pl(blastclust) + AnSeq cluster list+ AnSeq neighbor/hit-list
cluster list+ AnSeq cluster list compareClust.pl
summary of clusters combined and sequences maintained and added
neighbor/hit-list xnbr.c neighbor/hit-list(text)
38
or 1 a
or 6 3
or 1 b
or 1 c
or 6 6or 6 1or 6 5
or 6 4or 6 2
or 10 1
or 10 2or 10 3
or 10 4
or 10 5 or 10 6 or 10 7 or 10 8 or 10 9
or 10 10
Unrooted tree, constructed by Clustal W and displayed via Tree ViewColors show positioning of sequences from each initial cluster.
Olfactory Receptor Proteins
Extending Gene Families via Predicted Ancestral SequencesAcknowledgementsPersonal BackgroundExtending Gene Families via Predicted Ancestral SequencesGene FamiliesAncestral sequencesProject GoalProposed Work FlowProgramming EnvironmentSequences/Species Processing Time RequiredInitial ClusteringSecond ClusteringComparative AnalysisComparative AnalysisComparative AnalysisOlfactory Receptor Proteins�Initial ClusteringOlfactory Receptor Proteins +�Predicted Ancestral SequencesOlfactory Receptor Proteins�Clustering with Ancestral SequencesComparative AnalysisConclusionsFuture WorkFuture WorkReferencesReferencesQuestions?Olfactory Receptor Proteins�