AutoGRAPH: a genome comparison server
Application to the identification of new genes in the dog
Thomas Derrien & Christophe Hitte
Laboratory: CNRS - Institute of Genetics and Development of Rennes (France)
Team: Dog Genetics
Rennes - 23 Oct 2007
Context
Dog Radiation Hybrid (RH) map
- 2003 : >3200 markers (Guyon et al.)
- 2004 : >4200 markers FISH/RH (Breen et al.)
- 2005 : 10,000 genes (Hitte et al.)
Canis familiaris : 38 autosomes + XY
[Figure: dog chromosomes chr1, chr2, chr3, chr4, ...]
Outline: Introduction | Multiresources map | Multispecies map | Gene prediction | Comparison and perspectives
Context
Dog sequence
- 2005: Optimization of the low-coverage sequence of the dog genome. (Hitte C. et al.)
- 2005: Framework for the high-coverage of the dog sequence assembly. (Lindblad-Toh K. et al.)
Comparative genomics x2
Multi-resource and multi-species comparative genomics analyses.
=> Sequence vs RH map vs cytogenetic map...
=> Dog vs mammal sequences.
Multi-resources comparative maps:
Aims:
- Compare the gene order between the RH map and the sequence assembly.
- Estimate the colinearity between the 2 resources.
Data set:
RH marker localizations and sequence alignments from the dog sequence assembly (CanFam 1.0).
[Figure: CFA 9, dog RH map vs dog sequence: RH markers/genes, sequence alignments, and the relation between sequence and RH markers]
AutoGRAPH and Multi-resources datasets:
(Derrien T. et al. 2006)
http://autograph.genouest.org/
User dataset -> Insertion -> MySQL temporary table -> Ordering -> Graphic map construction -> Visualization
Implementation: PHP, Perl, GD Graphic Library
Identifier  Chromosome  Localization  Chromosome  Localization  Chromosome  Localization
GENE        CFA7        6.3           CFA_0       30393         HSA_        20043
GENE        CFA7        14.9          CFA_0       31907         HSA_        20028
GENE        CFA7        28.3          CFA_0       32588         HSA_        20021
GENE        CFA7        51.6          CFA_0       36582         HSA_        19971
GENE        CFA7        64.9          CFA_0       38730         HSA_        19942
GENE        CFA7        72.3          CFA_0       39878         HSA_        19922
GENE        CFA7        79.6          CFA_0       40725         HSA_        19915
GENE        CFA7        84.4          CFA_0       41214         HSA_        19911
GENE        CFA7        101.1         CFA_0       41925         HSA_        19906
GENE        CFA7        112.0         CFA_0       42469         HSA_        19898
GENE        CFA7        124.7         CFA_0       44408         HSA_        19874
GENE        CFA7        135.2         CFA_0       45004         HSA_        19864
GENE        CFA7        146.0         CFA_0       46163         HSA_        19855
GENE        CFA7        168.2         CFA_0       47265         HSA_        19840
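
One simple way to estimate the colinearity between two resources is to count how many marker pairs keep the same relative order on both maps (a Kendall-style rank statistic). This is only a sketch: the slides do not detail how AutoGRAPH scores colinearity, the function name `colinearity` is hypothetical, and the values are the first rows of the dataset above, assuming the first localization column is the RH position and the next two are sequence positions.

```python
from itertools import combinations

def colinearity(positions_a, positions_b):
    """Kendall-style score in [-1, 1]: +1 same order, -1 fully inverted."""
    pairs = list(combinations(range(len(positions_a)), 2))
    score = 0
    for i, j in pairs:
        da = positions_a[i] - positions_a[j]
        db = positions_b[i] - positions_b[j]
        if da * db > 0:
            score += 1   # concordant pair: same relative order on both maps
        elif da * db < 0:
            score -= 1   # discordant pair: order inverted between the maps
    return score / len(pairs)

# First five rows of the user dataset (column meaning is an assumption).
rh_map = [6.3, 14.9, 28.3, 51.6, 64.9]
dog_seq = [30393, 31907, 32588, 36582, 38730]
human_seq = [20043, 20028, 20021, 19971, 19942]

print(colinearity(rh_map, dog_seq))    # 1.0
print(colinearity(rh_map, human_seq))  # -1.0
```

A score of -1.0 still indicates a perfectly conserved order, only read in the opposite orientation.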
AutoGRAPH and Multi-resources datasets:
Comparative analysis of the cytogenetic map, the RH map and the sequence (CanFam1.0) for dog chromosome 11 (CFA 11):
- Strong colinearity between the RH map and the cytogenetic map.
- The inversion might be due to a problem in the sequence assembly.
[Figure: CFA 11, cytogenetic map vs RH map vs sequence]
Results:
- 8 discrepancies between the sequence assembly and the RH map.
- Cytogenetic experiments.
- 4 have been solved in favor of the RH map.
Led to CanFam2.0 (Dec. 2005) (Lindblad-Toh K.)
Multispecies Comparative Maps
Multispecies map: why?
- Identification of conserved sequences between species => functional sequences
- Compare chromosomal organization between species => chromosome rearrangements and evolution
Multispecies map: how?
- "Comparative anchors": conserved sequences between species.
- Ortholog genes (1:1 orthology relationships).
Example: Carnivore ancestor, with Canis familiaris and Felis catus as Genome 1 and Genome 2.
SPECIATION
- Ancestral genome: gene A, gene B
- Genome 1: gene A.1, gene B.1
- Genome 2: gene A.2, gene B.2
Orthologs: gene A.1 and A.2 = homologous genes separated by a speciation event
DUPLICATION
- Genome 1: gene A.1, gene B.1', gene B.1''
- Genome 2: gene A.2, gene B.2
Paralogs: gene B.1' and B.1'' = homologous genes separated by a duplication event
Multi-species comparative maps:
Aim:
- Compare genomes and construct multispecies comparative maps (synteny maps)
- Identify Conserved Segments (CS, synteny blocks), Conserved Segments Ordered (CSO, synteny segments) and breakpoint regions.
- Facilitate gene identification
Data sets:
- Collect ortholog data sets from Ensembl v.42 (BioMart/MartView)
=> Orthologue features for 5 species of interest: Dog - Human - Chimp - Rat - Mouse.
[Figure: dog chromosome 1 (CFA 1, reference) vs human (tested genome): dog genes linked to human genes by 1:1 orthology relationships, human synteny on HSA 2 and HSA 3, CSOs, a breakpoint between 2 CS and a breakpoint between 2 CSO]
AutoGRAPH: CFA 34 vs human genome
Results:
- Automatic identification of CS/CSO and breakpoint regions between 2 species
Results (map output):
- CFA 34 (reference) has 2 Conserved Segments: => HSA 5 => HSA 3
- Within CFA 34 - HSA 3: 2 Conserved Segments Ordered (CSO).
- 2 breakpoint regions.
-> High colinearity within CSO <-
[Figure: CFA 34 vs human (HSA 3, HSA 5)]
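
The CS/CSO identification can be sketched as two passes over the orthologs sorted by reference position: a run of consecutive genes on the same tested chromosome forms a CS, and within a CS a run whose tested positions keep one direction forms a CSO. This is a simplified reading of the definitions on the slides, not AutoGRAPH's actual code; the function names and the gene coordinates below are hypothetical.

```python
def conserved_segments(orthologs):
    """Split orthologs (sorted by reference position) into runs on one tested chromosome."""
    segments = []
    for gene in orthologs:
        if segments and segments[-1][-1][1] == gene[1]:
            segments[-1].append(gene)
        else:
            segments.append([gene])
    return segments

def ordered_segments(segment):
    """Split one CS into runs whose tested positions keep a single direction."""
    runs = [[segment[0]]]
    direction = 0  # 0 = not yet known, +1 ascending, -1 descending
    for prev, gene in zip(segment, segment[1:]):
        step = 1 if gene[2] > prev[2] else -1
        if direction in (0, step):
            runs[-1].append(gene)
            direction = step
        else:
            runs.append([gene])  # order changes: a new CSO starts here
            direction = 0
    return runs

# Hypothetical orthologs: (reference position, tested chromosome, tested position)
orthologs = [
    (1.0, "HSA5", 10.0), (2.0, "HSA5", 11.0),
    (3.0, "HSA3", 50.0), (4.0, "HSA3", 51.0), (5.0, "HSA3", 52.0),
    (6.0, "HSA3", 30.0), (7.0, "HSA3", 29.0),
]
cs_list = conserved_segments(orthologs)
cso_counts = [len(ordered_segments(cs)) for cs in cs_list]
print(len(cs_list), cso_counts)  # 2 [1, 2]
```

Each boundary between two consecutive CS or CSO is then a candidate breakpoint region (2 in this toy example).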
AutoGRAPH: CFA 34 vs human and mouse genomes
Results:
- Identification of CS/CSO and breakpoint regions between 3 species
Results (map output):
- 5 CS with the mouse genome: => MMU 15 => MMU 13 => MMU 16 => 2x MMU 3 (=> 4 CSO)
- Identification of reused / species-specific breakpoints.
-> High colinearity within CSO <-
[Figure: CFA 34 vs human and mouse]
Results (array output):
2 types of result:
- displayed on the comparative map.
- listed in an array (and in a flat file format that can be downloaded).
Array columns:
- CS - CSO Id (No. of genes)
- CS - CSO sizes (bp)
- Localizations on reference chromosome
- Localizations on tested chromosomes
- Density
- Density around breakpoint regions
Results (orthology relationships):
1:1 orthology relationship
1:0 orthology relationship
Tested syntenic interval
Applications: Dog genome annotation
The Dog Orphan genes
Data set:
- The Ensembl Automatic Gene Annotation System (Curwen V., 2004)
- All Ensembl gene predictions are based on experimental evidence (UniProt - SwissProt - RefSeq)
                            Human   Chimp   Mouse   Rat     Dog
Total protein coding genes  23,068  20,982  23,616  21,367  17,507
Extract X
- Analysis of a subset of genes 1:1:1:1:0 (Human:Chimp:Mouse:Rat:Dog)
- 412 unannotated genes in the dog genome (orphan genes) (protein coding genes)
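
Selecting the 1:1:1:1:0 subset amounts to keeping gene families with exactly one copy in each reference species and none in the dog. A minimal sketch, assuming the orthology data have already been reduced to per-species copy counts; the family names, the counts and the helper `is_dog_orphan` are hypothetical.

```python
SPECIES = ("human", "chimp", "mouse", "rat", "dog")

def is_dog_orphan(counts):
    """True for the 1:1:1:1:0 pattern: one copy per reference species, none in dog."""
    return tuple(counts.get(s, 0) for s in SPECIES) == (1, 1, 1, 1, 0)

# Hypothetical copy counts per gene family.
families = {
    "family_A": {"human": 1, "chimp": 1, "mouse": 1, "rat": 1, "dog": 1},
    "family_B": {"human": 1, "chimp": 1, "mouse": 1, "rat": 1},
    "family_C": {"human": 1, "chimp": 1, "mouse": 2, "rat": 1, "dog": 0},
}
orphans = [name for name, counts in families.items() if is_dog_orphan(counts)]
print(orphans)  # ['family_B']
```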
Hypothesis:
- Annotation problems in the dog genome?
- Are these genes specific to the primate and rodent lineages?
Aim: Analyse these genes in the dog genome
Characterization of the 412 [1:1:1:1:0] orthologs
Structural characterization:
[Figure: histogram of the number of genes (y-axis) by exon number (x-axis), Dog_OrphanGenes vs Random_Genes]
                           Mean exon No.  Mean protein size (aa)  Mean cDNA GC content
Human tested set (n=412)   6.3            398                     52.55
Human random set (n=1000)  10.7           557                     51.34
- Higher rate of monoexonic genes
- Smaller protein sizes
- Higher transcript GC content
Characterization of the human orthologs
Functional characterization:
- Analyzed with the program GO Tree Machine (Zhang B. et al., 2004)
- Significant enrichment in specific GO terms:
p-value              GO category (BP)
0.008970025556679    potassium ion transport
0.008902320120899    response to wounding
0.0088375773202095   detection of chemical stimulus
0.0070783033944581   physiological response to wounding
0.0069479680074864   plasma membrane fusion
0.0069479680074864   microtubule nucleation
0.0052673840107958   fusion of sperm to egg plasma membrane
0.004307225374419    response to external stimulus
0.00060351043892835  nucleobase, nucleoside, nucleotide and nucleic acid metabolism
0.00052973081439737  regulation of biological process
0.00049971977537453  regulation of cellular process
0.00045451746485333  fertilization
0.00040299007694943  fertilization (sensu Metazoa)
6.2941602638017E-05  regulation of cellular physiological process
1.4569323904298E-05  regulation of physiological process
1.7202210352273E-06  regulation of metabolism
1.2026321510082E-06  regulation of cellular metabolism
6.8760533901414E-07  regulation of transcription
5.7435783302335E-07  regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolism
5.0349916873545E-07  transcription
4.3662427217568E-07  regulation of transcription, DNA-dependent
2.0446677466923E-07  transcription, DNA-dependent
Main categories:
- chemosensation, olfaction
- immunity and host defense
- reproduction
Methods: 4 steps
1. Multispecies synteny maps construction: Reference (hsa/ptr/mmu/rno) : Tested (cfa) & dog syntenic interval identification
2. Testing the dog syntenic interval by sequence alignments
3. Dog gene predictions
4. Comparison with Ensembl dog annotated genes
1st step: Multispecies map construction
Ensembl annotation: comparative anchors
                                            Human   Chimp   Mouse   Rat
No. of 1:1 orthologues with the dog genome  14,997  14,798  14,667  14,065
AutoGRAPH: for 1 tested dog gene <=> 4 dog syntenic intervals defined
Example: Gene H2BFS = Histone H2B type F-S
- Human: ENSG00000197597 (HSA 21: ~43.8 Mb)
- Mouse: ENSMUSG00000050936 (MMU 3: ~96.3 Mb)
- Chimp: ENSPTRG00000017808 (PTR 6: ~26.6 Mb)
- Rat: ENSRNOG00000029696 (RNO 2: ~19.1 Mb)
1st step: Multispecies map construction
Reference genome: Human (HSA 21: ~43.8 Mb) (ENSG00000197597 = H2BFS = Histone H2B type F-S)
Tested: Dog
Conserved segment: x1
Localization: CFA 31: 14.13 - 40.90 (Mb)
High colinearity in the CS
One predicted dog interval with human
Interval defined: CFA_31: 39.87 - 39.98 (Mb)
1st step: Multispecies map construction
Reference: Mouse (MMU 3: ~96.3 Mb) (ENSMUSG00000050936 = H2BFS = Histone H2B type F-S)
Tested: Dog
No. of Conserved Segments: x16
Coordinates: XXXs
Interval defined: CFA_31: 39.87 - 40.16 (Mb)
... same analyses with chimp and rat orthologues
Consensus Orthologous IntervaL: COIL
Principle:
- Hu/Dog: interval defined: CFA_31: 39.87 - 39.98 (Mb)
- Mmu/Dog: interval defined: CFA_31: 39.87 - 40.16 (Mb)
- Chimp/Dog: interval defined: CFA_31: 39.87 - 39.98 (Mb)
- Rat/Dog: interval defined: CFA_31: 39.80 - 40.16 (Mb)
[Figure: the 4 intervals on CFA 31, with boundaries at 39.80, 39.87, 39.98 and 40.16 Mb]
If at least 2 of the 4 intervals overlap: Consensus Orthologous IntervaL (COIL)
Results:
- 389 (94%) COILs (dog predicted intervals)
  - Mean interval size: 347 kb (vs 2.4 Gb)
  - Distributed evenly on all the dog chromosomes (nb: 1 => chr 14 --- nb: 47 => chr 1)
- 23 without COIL:
  - 17: in breakpoint regions reused between primates and rodents
  - 6: no consensus in the predicted interval (orthology prediction?)
Reduce the search space to only 347 kb!
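
The "at least 2 of 4 overlapping intervals" rule can be sketched with a sweep over interval boundaries. The slides do not say how the COIL boundaries are chosen, so this sketch assumes the COIL is the span from the first to the last position covered by at least 2 intervals; `consensus_interval` is a hypothetical helper, and coordinates are the H2BFS intervals on CFA 31 expressed in kb.

```python
def consensus_interval(intervals, min_support=2):
    """Span from the first to the last position covered by >= min_support intervals, or None."""
    events = []
    for start, end in intervals:
        events.append((start, 1))    # interval opens
        events.append((end, -1))     # interval closes
    # At equal coordinates, openings come first so touching intervals count as overlapping.
    events.sort(key=lambda e: (e[0], -e[1]))
    depth = 0
    lo = hi = None
    for pos, delta in events:
        if delta == 1:
            depth += 1
            if depth >= min_support and lo is None:
                lo = pos             # first position with enough support
        else:
            if depth >= min_support:
                hi = pos             # last position seen with enough support
            depth -= 1
    return (lo, hi) if lo is not None else None

intervals_kb = [
    (39870, 39980),  # Hu/Dog
    (39870, 40160),  # Mmu/Dog
    (39870, 39980),  # Chimp/Dog
    (39800, 40160),  # Rat/Dog
]
print(consensus_interval(intervals_kb))  # (39870, 40160)
```

Under this assumption the H2BFS COIL would span 39.87 - 40.16 Mb; a stricter rule (for example the intersection of all four intervals) would give a shorter interval.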
Methods (4 steps), after step 1: 412 orphan genes -> 389 with a dog syntenic interval (COIL)
Step 2/4: Alignment of reference sequences on the tested genome
Protein Alignment: cDNA reference sequence vs canine genome sequence:
Exonerate (Slater G. et al., 2005): model cdna2genome.
[Figure: human, chimp, mouse and rat H2BFS cDNAs aligned on the canine genome sequence]
                                           4 reference species  2-3 reference species  0-1 reference species
Canine COIL match with Protein Alignments  271                  77                     41
Proportion                                 69.7%                19.8%                  10.5%
348 COILs correlated with sequence alignments of cDNA reference-species sequences
Methods (4 steps), after step 2: 412 -> 389 -> 348 COILs supported by reference cDNA alignments
Gene prediction
Software: GeneWise (Birney E. et al., 2004):
- Alignment of reference protein sequences with the dog nucleotide interval.
=> 1 gene: 4 analyses corresponding to the 4 reference proteins
=> Best score
=> Threshold: > 40% protein identity with the reference-species ortholog
[Figure: dog syntenic interval (nucleotide sequence) aligned with the mouse protein (H2BFS), giving the dog gene structure prediction]
285 dog genes
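
The selection rule (4 analyses per gene, best score, then the > 40% identity threshold) can be sketched as below. This only illustrates the rule as stated on the slide and does not call GeneWise; the dictionaries, their field names and all score and identity values are hypothetical.

```python
def best_prediction(predictions, min_identity=40.0):
    """Take the best-scoring prediction, then keep it only if identity exceeds the threshold."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best if best["identity"] > min_identity else None

# Hypothetical results, one analysis per reference protein.
predictions = [
    {"reference": "human", "score": 310.5, "identity": 72.0},
    {"reference": "chimp", "score": 305.1, "identity": 71.0},
    {"reference": "mouse", "score": 298.7, "identity": 68.0},
    {"reference": "rat", "score": 120.2, "identity": 35.0},
]
best = best_prediction(predictions)
print(best["reference"])  # human
```

Applying the threshold before picking the best score is an equally plausible reading; the two orders differ only when the best-scoring analysis falls at or below 40% identity.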
Methods (4 steps), after step 3: 412 -> 389 -> 348 -> 285 dog gene predictions
Is there any dog gene already annotated by Ensembl in the interval?
Example: TFEC: Transcription factor EC isoform b
- CFA_14: 57,078,322 - 58,156,905
- 347 AA - 7 exons
- Number of Ensembl annotated genes in the interval: 1
=> New orthology relationship (1:1:1:1:1)
[Figure tracks: dog syntenic interval, reference cDNA alignments, GeneWise gene prediction]
Is there any dog gene already annotated by Ensembl in the interval?
Example: GPX3: Glutathione peroxidase 3 precursor
- CFA_4: 61,375,065 - 61,498,458
- 226 AA - 5 exons
- Number of Ensembl annotated genes in the interval: 0
=> New gene AND new orthology relationship (1:1:1:1:1)
[Figure: genome browser view of chr4 (61,380,000 - 61,440,000) with tracks: gap locations, your sequence from BLAT search, RefSeq genes, non-dog RefSeq genes, human proteins mapped by chained tBLASTn, CpG islands, Dog/Human/Mouse/Rat Multiz alignments & phastCons scores, human (Mar. 2006/hg18) alignment net, repeating elements by RepeatMasker, simple tandem repeats by TRF, microsatellites, chained self-alignments; with the dog syntenic interval, reference cDNA alignments and GeneWise gene prediction]
Testing for the presence of a gene in the defined syntenic environment
Example: MYO1D: Myosin Id
- CFA_9: 43,454,106 - 43,829,963
- 1006 AA - 22 exons
- Number of Ensembl annotated genes in the interval: 3 (?)
=> New gene AND new orthology relationship (1:1:1:1:1)
[Figure tracks: dog syntenic interval, reference cDNA alignments, GeneWise gene prediction]
Methods (4 steps), after step 4 (comparison with Ensembl dog annotated genes): 412 -> 389 -> 348 -> 285 = 185 new dog genes + 100 new orthology relationships
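
The counts reported at each step can be cross-checked with a few lines of arithmetic. The numbers below are taken from the slides; the step labels are paraphrased and the variable names are illustrative only.

```python
steps = [
    ("orphan genes (1:1:1:1:0)", 412),
    ("COILs defined", 389),
    ("COILs matched by reference cDNA alignments", 348),
    ("dog genes predicted", 285),
]
start = steps[0][1]

# Report each step as a fraction of the initial set and the loss since the previous step.
for (label, count), (_, previous) in zip(steps[1:], steps):
    print(f"{label}: {count} ({count / start:.1%} of {start}, {previous - count} lost)")

new_genes, new_orthology = 185, 100
no_prediction = steps[1][1] - steps[3][1]   # COILs without a dog gene prediction

assert new_genes + new_orthology == steps[3][1]
print(no_prediction)  # 104
```

The 104 COILs without a prediction are the 41 lost at step 2 plus the 63 lost at step 3.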
Results validation:
- > 40% protein identity with a reference ortholog.
- 80.5% have a peptide motif with InterProScan (InterPro Database)
- 48.6% (90/185) match canine ESTs (DB_EST)
104 with no gene prediction: reasons?
Step 1: Dog consensual intervals definition
Step 2: Overlap of reference transcripts vs. dog consensual interval
Step 3: Gene prediction in dog consensual interval
Step        No. of genes  Mean size (bp)  GAP content (%)  Repeat content (%)  GC content (%)  Genes in telomeric region (%)
-           412           no dog interval predicted
Step 1      23            no dog interval predicted
Step 1      389           347,401         2.23             37.33               46.23           44.4
Step 2      41            231,900         5.96             35.24               48.86           53.6
Step 2      348           361,008         1.79             37.57               45.92           43.4
Step 3      63            287,821         3.35             35.36               48.09           46.0
Step 3      285           377,187         1.44             38.06               45.43           42.8
Random set  1000          375,929         1.3233           35.8865             45.63           31.0
104 without dog prediction (41 + 63)
104 with no gene prediction?
Structural problems (sequence quality):
- Higher GAP content (> 10% for 12 genes)
92 genes:
- Protein identity < 20% [3/4]
- Smaller sizes of the dog intervals
- Higher GC content + telomeric localization
- No EST validation
- Biological function prone to "gain and loss" (immunity, olfaction = adaptation to environment; GOTM analysis)
Evolutionary scenario: loss of dog genes
92 with no gene prediction? The example of the PNMA family: RAP (Dufayard et al.)
[Figure: gene tree and reconciled tree]
Conclusions - Directions:
COILs approach:
- Multispecies, multiple sets of 1:1 orthologs: complementary contributions of different genomes
- Short interval = short search space (350 kb):
  - reduces the cost of detecting false positives
  - facilitates matches with divergent sequences
  - background noise is significantly reduced
Directions:
- Analysis of the evolution rate of these dog sequences compared to the reference sequences
- Other orphan-gene sets & other species sets (using cat, elephant...)
- Using gene adjacency + in-depth gene prediction to refine gene family orthology: 1:0 orthology + n:m orthology
Acknowledgements:
Rennes Dog Genetics-Genomics Team: Francis Galibert, Catherine André, Christophe Hitte
(Sophie Roucan, Hugues Leroy, Anthony Assi, Olivier Filangi...)