Integration of data to uncover evolutionary trends and infer protein function: The tale of Rcs1
M. Madan Babu
MRC Laboratory of Molecular Biology, Cambridge
Overview of research
Evolution of biological systems
Evolution of transcriptional networks
Evolution of networks within and across genomes
Nature Genetics (2004), J Mol Biol (2006a)
Evolution of transcription factors
Nuc. Acids. Res (2003)
Structure and dynamics of transcriptional networks
Structure and function of biological systems
Uncovering a distributed architecture in networks
Methods to study network dynamics
J Mol Biol (2006b), J Mol Biol (2006c)
Discovery of novel DNA binding proteins
Data integration, function prediction and classification
Nature (2004)
Nuc. Acids. Res (2005), Cell Cycle (2006)
Discovery of transcription factors in Plasmodium
Evolution of global regulatory hubs
Rcs1 – regulator of cell size 1
Micrographs: S. cerevisiae wild type and S. cerevisiae Rcs1 mutant (micrographs and data from SCMD)
The following parameters, used to define cell size, were at least 2 standard deviations (2σ) from the wild-type mean in the Rcs1 mutant:
Roundness of mother cell: 1.29, 1.20
Mother cell size: 874, 760
Contour length of mother cell: 108, 100
Long axis length of mother cell: 36, 33
Short axis length of mother cell: 30, 27
Mutant cells are twice the size of the parental strain.
The critical size for budding in the mutant is similarly increased.
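The 2 standard deviation criterion used on this slide can be sketched in a few lines of Python. This is only an illustration of the test, not the SCMD pipeline: the function name `deviates_by_2sd` is hypothetical, and the numbers in the example are made up, because the slide does not give wild-type means or standard deviations.

```python
def deviates_by_2sd(mutant_value, wt_mean, wt_sd, threshold=2.0):
    """Return True if the mutant value lies at least `threshold`
    wild-type standard deviations away from the wild-type mean."""
    if wt_sd <= 0:
        raise ValueError("wild-type standard deviation must be positive")
    z = (mutant_value - wt_mean) / wt_sd
    return abs(z) >= threshold


# Hypothetical numbers for illustration only (not taken from SCMD).
flagged = deviates_by_2sd(mutant_value=130.0, wt_mean=100.0, wt_sd=10.0)
print(flagged)
```

A parameter would be reported for the mutant only when this check passes, which is the filter the slide applies to all five cell-size parameters.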
Rcs1 binds specific DNA sequences
Figure axis labels (DNA-binding domain families): C6-Fungal, C2H2-Zn, bZip, Homeo, Gata, bHLH, Fkh, Hsf, Apses, Myb, Mads, HMG1, LisH, Gcr1, Rcs1, Ace1, AT-Hook, Tig, Abf1, Tea, Ime1, Dal82, Tigger, P53
Rcs1 is a global regulatory hub – Network analysis I
Transcriptional regulatory network in yeast
Number of target genes regulated (Aft2p only, shared, Rcs1p only): 123, 41, 314
Sub-network of Rcs1 and Aft2
No. of members
Distribution of DNA binding domains in yeast transcription factors
Rcs1p and Aft2p are global regulatory hubs with an as yet uncharacterized DNA binding domain
How did the paralogous hubs that regulate distinct sets of genes evolve?
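Counting target genes per transcription factor, and splitting them into shared and unique sets, is the kind of bookkeeping behind the hub comparison on this slide. The sketch below assumes the network is available as (transcription factor, target gene) pairs; the function names and the toy edges are hypothetical and do not come from the yeast network itself.

```python
from collections import defaultdict


def targets_by_tf(edges):
    """Map each transcription factor to the set of genes it regulates."""
    targets = defaultdict(set)
    for tf, gene in edges:
        targets[tf].add(gene)
    return dict(targets)


def unique_and_shared(targets, tf_a, tf_b):
    """Return (only tf_a, shared, only tf_b) target-gene counts."""
    a = targets.get(tf_a, set())
    b = targets.get(tf_b, set())
    return len(a - b), len(a & b), len(b - a)


# Toy edges with made-up names, for illustration only.
edges = [
    ("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"),
    ("TF_B", "g3"), ("TF_B", "g4"),
]
targets = targets_by_tf(edges)
counts = unique_and_shared(targets, "TF_A", "TF_B")
print(counts)
```

A factor whose target set is much larger than that of most other factors would be treated as a hub; the cutoff is a modelling choice that this sketch leaves open.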
Relationship to WRKY DNA binding domain – Sequence analysis I
Search of the non-redundant database
Lineage-specific expansion in several fungi; also seen in lower eukaryotes:
Candida albicans (ascomycete), Yarrowia lipolytica (ascomycete), Ustilago maydis (basidiomycete), Cryptococcus sp. (basidiomycete), E. cuniculi (microsporidia), Giardia lamblia (diplomonad), Dictyostelium discoideum, Entamoeba histolytica
Profiles + HMM of this region, searched against the non-redundant database
Shown: WRKY domain (Arabidopsis), FAR-1 type transposase (Medicago truncatula)
Globular region maps to WRKY DNA-binding domain
WRKY DNA-binding domain from Arabidopsis, searched against the non-redundant database
Shown: WRKY4, Rcs1 (S. cerevisiae), Gcm1 (Drosophila)
WRKY DNA-binding domain maps to the same globular region
Confirmation of relationship to WRKY DBD – Sequence analysis II
Multiple sequence alignment of all globular domains
Secondary structure prediction: JPRED/PHD
The order of secondary structure elements is similar to that of the WRKY DNA-binding domain and of the GCM1 protein from mouse
Homologs of the conserved globular domain constitute a novel family of the WRKY DNA-binding domain
Strands: S1 S2 S3 S4
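Comparing the order of secondary structure elements, as done on this slide, amounts to collapsing each per-residue prediction into its sequence of helices and strands. The sketch below assumes predictions are given as strings with H for helix, E for strand and any other character for coil; the function names and the example strings are hypothetical and are not JPRED or PHD output.

```python
from itertools import groupby


def element_order(ss_string, min_len=2):
    """Collapse a per-residue secondary-structure string into the ordered
    string of its elements (H = helix, E = strand), ignoring coil and
    runs shorter than `min_len` residues."""
    elements = []
    for state, run in groupby(ss_string):
        if state in ("H", "E") and len(list(run)) >= min_len:
            elements.append(state)
    return "".join(elements)


def same_topology(ss_a, ss_b):
    """True if both strings contain the same elements in the same order."""
    return element_order(ss_a) == element_order(ss_b)


# Made-up strings for illustration only.
query = "--EEEE---EEEEE--EEEE----EEE--"
template = "-EEEEE--EEEE---EEEEE--EEEE-"
print(element_order(query), same_topology(query, template))
```

Both toy strings reduce to four consecutive strands, the S1 to S4 arrangement named on the slide. Element lengths and loop sizes are deliberately ignored here.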
Characterization of the globular domain – structural analysis I
A. thaliana transcription factor (WRKY4: 1wj2, NMR structure)
Predicted SS of Rcs1 DBD compared with SS of WRKY4: strands S1, S2, S3, S4
Mus musculus Glial Cell Missing-1 (GCM-1: 1odh, X-ray structure)
Predicted SS of Rcs1 DBD compared with SS of GCM1: strands S1, S2, S3, S4
Both WRKY and GCM1 have a similar network of stabilizing interactions
Template structure: strands S1, S2, S3, S4
The 4 residues involved in metal coordination and the 10 residues involved in key stabilizing hydrophobic interactions, which determine the path of the backbone in the four strands of the GCM1-WRKY domain, show a strong pattern of conservation.
Characterization of the globular domain – structural analysis II
The core fold of the Rcs1 DBD is predicted to be similar to that of the WRKY-GCM1 domain, and it may bind DNA in a similar way
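The conservation pattern reported for the metal-coordinating and hydrophobic core residues can be quantified per alignment column. The sketch below is a minimal version of such a check, assuming an equal-length multiple alignment with '-' as the gap character; the function names, the 0.9 cutoff and the toy alignment are hypothetical and are not the alignment used in the talk.

```python
from collections import Counter


def column_conservation(alignment):
    """For each column, return (most common residue, fraction of all
    sequences carrying it). Gaps ('-') never count as matches."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        if not counts:
            result.append(("-", 0.0))
            continue
        residue, n = counts.most_common(1)[0]
        result.append((residue, n / len(alignment)))
    return result


def conserved_positions(alignment, min_fraction=0.9):
    """List (column index, residue) for columns at or above the cutoff."""
    return [(i, res)
            for i, (res, frac) in enumerate(column_conservation(alignment))
            if frac >= min_fraction]


# Toy alignment for illustration only.
toy = ["CAKCH-H", "CSRCHWH", "CTKCHLH", "CA-CHIH"]
print(conserved_positions(toy))
```

In the toy alignment only the C and H columns pass the cutoff, mimicking how metal-chelating residues stand out against variable loop positions.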
Classification of WRKY-GCM1 superfamily – Cladistic analysis I
Template structure: strands S1, S2, S3, S4; Zn2+ ligands C, C, H, H
Families and their sequence features:
Classical WRKY (C): WRKY motif in S1; short loop between S2 & S3
Insert-containing version (I): N-terminal helix; conserved W in S4; large insert between S2 & S3
HxC-containing version (HxC): HxC instead of HxH; N-terminal helix; short insert between S2 & S3
FLYWCH domain (F): conserved W in S2
GCM domain (G): insertion of Zn ribbon between S2 and S3
Family labels: G, C, HxC, I, F. Examples: WRKY4, Rcs1, Far1, Mdg, Gcm1
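The family-defining features listed on this slide can be held in a small lookup table and matched against the features observed in a new sequence. The sketch below is a simple feature-overlap lookup, not the cladistic analysis used in the talk; the dictionary keys, feature strings and function name are hypothetical labels chosen for illustration.

```python
# Defining features per family, as listed on the slide.
FAMILY_FEATURES = {
    "Classical WRKY (C)": {"WRKY motif in S1", "short loop between S2 and S3"},
    "Insert-containing (I)": {"N-terminal helix", "conserved W in S4",
                              "large insert between S2 and S3"},
    "HxC-containing (HxC)": {"HxC instead of HxH", "N-terminal helix",
                             "short insert between S2 and S3"},
    "FLYWCH (F)": {"conserved W in S2"},
    "GCM (G)": {"Zn ribbon between S2 and S3"},
}


def best_family(observed):
    """Return the family whose defining features are best covered by the
    observed feature set, or None if no feature matches at all."""
    best_name, best_score = None, 0.0
    for name, features in FAMILY_FEATURES.items():
        score = len(features & observed) / len(features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


observed = {"N-terminal helix", "HxC instead of HxH",
            "short insert between S2 and S3"}
print(best_family(observed))
```

Because the N-terminal helix is shared by two families, a sequence showing only that feature scores equally for both and the first entry wins; a real classification would need the derived features to break such ties.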
Domain context for the different families – network analysis I
Families: Classical WRKY (C), Insert-containing version (I), HxC-containing version (HxC), FLYWCH domain (F), GCM domain (G)
Examples: WRKY4, Rcs1, Far1, Mod (mdg), 101.t00020, At2g23500, Gcm1
Domain contexts shown (figure labels): Tandem, Standalone, Zn cluster, HxC, MULE Tpase, OTU protease, Mobile element, BED finger, POZ, SMBD, Zn knuckle
Phyletic table labels: Human, Fly, Worm, Fungi, Plants, Entamoeba, Slime mould; families G, C, HxC, I, F
Phyletic distribution – Comparative genome analysis I
Figure labels: TF only, TF + TP (TF = transcription factor, TP = transposase)
Lineages: Plants, Lower eukaryotes, Fungi, Higher eukaryotes
GCM1 and FLYWCH versions evolved from an insert-containing version that is a transposase
Classical version of the WRKY evolved from an insert-containing version that is a transposase
HxC and insert-containing versions are seen both as transcription factors and as transposases
-explain that there have been multiple transitions from transposase to TFs in the fungal genomes
-explain how this could have happened by showing the snapshot of the breakup of selfish elements into two distinct products
-explain that the transposase can itself regulate its own gene expression
Outline of the presentation
Sensitive sequence search reveals that Rcs1 and Aft2 have a distinct version of the WRKY-type DNA binding domain
Oryza sativa (monocot), Arabidopsis thaliana (dicot), Medicago truncatula (dicot), Nicotiana tabacum (dicot)
Structural equivalences of WRKY-GCM1 domain proteins with Bed and Zn finger
Structures compared: WRKY (1wj2), GCM-type WRKY (1odh), Bed-finger (2ct5), classical Zn-finger (1m36)
Figure labels: strands S1 to S4, helix H1, Zn2+ ligands C, C, H, H; a second Zn site with four C ligands is also drawn
Why Rcs1? While systematically analyzing the genes whose mutants give rise to abnormal cell size, we and others noted that mutants of Rcs1 give abnormal cell shape.
It was known to be an important transcription factor involved in cell size regulation.
-explain showing graphs and images
Independently, during the analysis of the TNET in yeast, we looked at the hubs and the DNA binding domains that were present in them. Interestingly, there were two hubs that did not have any known DNA binding domain identified in them, but the region which mediates DNA binding was known.
-explain showing the family relationship of the hubs
-only two members, and both are hubs
-how and when did they evolve?
Standard search procedures using Pfam and other databases did not provide any clue about the domain. So we set out to characterize the DNA binding region from Rcs1p and its paralog Aft2p using sensitive sequence search and other computational methods.
-show output from Pfam hits
Structural aspects of the DNA binding domain
Explain the residues involved in metal chelation
-DNA contacting surface
-Inserts in the loops
-Stabilizing contacts involved
WRKY DNA binding domain – Structure analysis I
WRKY DNA binding domain – Structure analysis II
Structure comparisons identify several other known transcription factors, including the GCM protein in eukaryotes
-Explain the insert of a zinc ribbon in the loop
In fact, sequence comparison without the insert can pick up these WRKY proteins
Multiple starting points identified all homologs in the different species. This allowed us to classify the sequences into different families, each with a specific feature suggesting a common evolutionary relationship, based on shared and derived features of the domains.
-List the 5 families and point to the features involved using a structure template
Classification of WRKY domains – Cladistic analysis I
Phylogenetic distribution and domain architecture for the different families - I
Phyletic profiles of the different domains point to the possibility that these transcription factors could have evolved from transposases, with at least two distinct recruitments into transcription factors:
-in plants in one case
-at the base of the fungal lineage in the other case
Phylogenetic distribution and domain architecture for the different families - II
Comparative genomics using the fungal genomes provides the clue for the evolution of these TFs
-extensive recruitment of the transposase in the different fungal lineages
-multiple jumps within the fungal lineage
-very recent duplication event in the order Saccharomycetales suggests hubs could evolve rapidly
-Candida Rbf1 and other TFs independently duplicated and evolved as global regulators
Since it happened in fungal genomes, we ask how this behaves in the plants.
-show the gene expression patterns for the different subfamilies
We see two trends: one where divergence has primarily occurred in expression rather than in the protein sequence, and the other in which proteins with the same expression pattern have different binding site residues.
-spatio-temporal changes in gene expression
-It is experimentally well known that the FLYWCH and the GCM proteins are developmentally important regulatory proteins.
So in three lineages there has been recruitment of the transposase into becoming a developmentally important global regulator.
Analysis of the gene expression data in plants
Interesting gene expression patterns emerge for the different WRKY-containing proteins. TPases are expressed in the root and in the pollen, which enhances their chance of expanding rapidly during evolution.
Acknowledgements
S Balaji
Lakshminarayan Iyer
Aravind group
L Aravind
Figure: phyletic distribution and domain architectures of the WRKY-GCM1 families
Families: Classical WRKY (C), HxC-type WRKY (HxC), Insert-containing WRKY (I), FLYWCH-type WRKY (F), GCM-type WRKY (G)
Lineages: Encephalitozoon cuniculi, Dictyostelium discoideum, Plants, Giardia lamblia, Ciliates, Apicomplexa, Fungi, Animals (Caenorhabditis elegans, Homo sapiens, Drosophila melanogaster), Entamoeba histolytica
Associated domains: MULE transposase, plant-specific Zn-cluster, SWIM domain, POZ, zinc knuckle, BED finger, plant-specific N-all-beta, TIR domain, LRR, STAND ATPase, isochorismatase, AT-hook, OTU, PHD finger, C2H2 finger, plant-specific mobile domain
Proteins labelled: GLP_79_64671_67418_Glam_71077115; GLP_9_36401_35940_Glam_71071693; 101.t00020_Ehis_67474280; dd_03024_Ddis_28829829; ECU05_0180_Ecun_19173554; mutA_Ylip_49523824; TTR1_Atha_30694675; WRKY41_Osat_46394336; WRKY58_Atha_22330782; At2g34830_Atha_27754312; NtEIG-D48_Ntab_10798760; FAR1_Atha_18414374; AT4g19990_Atha_7268794; LOC_Os11g31760_Osat_77551147; At2g23500_Atha_3242713; C26E6.2_Cele_32565510; T24C4.2_Cele_17555262; C20orf164_Hsap_13929452; KIAA1552_Hsap_10047169; hGCMa_Hsap_1769820; mod(mdg4)_Dmel_24648712; LOC411361_Amel_66547010; CG13845_Dmel_24649011; gcm_Dmel_17137116; CHGG_08318_CGLO_88179597; AN6124.2_ANID_67539908; MtrDRAFT_AC146590g49v2_Mtru_92891293; AFT2_Scer_6325054; Afu2g08220_Afum_71000950; YALI0C00781g_Ylip_50547661; CHGG_00311_Cglo_88184608; YALI0A02266g_Ylip_50543034; MtrDRAFT_AC126008g21v1_Mtru_92876827; UM03656.1_Umay_71019145; Ci-ZF-1_Cint_93003122; F54C4.3_Cele_3790719; T24C4.7_Cele_17555272
60 WRKY domain containing proteins; 15 Far1-type proteins; 40 HxC type WRKY domain proteins; 5 WRKY domain proteins with TIR/LRR
Expression profiles of WRKY-GCM1 domain proteins in Arabidopsis
a. Gene expression profiles for the developmental stages in Arabidopsis thaliana (Root, Stem, Leaf, Apex, Flower, Floral organs, Seeds): WRKY proteins show tissue-specific expression
b. Gene expression profiles for the light exposure conditions in Arabidopsis thaliana (Darkness, Continuous light, Pulse light): WRKY proteins show light-specific expression
Transcriptional network involving Aft2p and Rcs1p
Relationship between Rcs1p and Aft2p homologs
Tree leaves:
UM03656.1 Umay 71019145
CAGL0H03487G CGLA 49526254
CAGL0G09042G CGLA 49526062
CaO19.2272 Calb 68482460
DEHA0F25124g Dhan 50425555
KLLA0D03256g Klac 50306475
AFL087C AGOS 44984319
ORFP Sklu Contig1830.2 kluyveri
Kwal 24045 waltii
ORFP Skud Contig2057.12 kudriavzeii
ORFP Scas Contig720.21 castelli
RCS1 SCER 51830313
ORFP 7853 mikatae
ORFP 8601 paradoxus
ORFP 21513 mikatae
ORFP Scas Contig690.14 castelli
ORFP 22109 paradoxus
AFT2 SCER 6325054
ORFP Skud Contig1659.3 kudriavzeii
AAL026Wp Agos 44980144; UM03656.1 Umay 71019145; CHGG 06963 CGLO 88178242; CHGG 06785 CGLO 88182698; CHGG 09478 CGLO 88177996; CHGG 00175 CGLO 88184472; CHGG 10902 CGLO 88175616; FG05699.1 Gzea 46122643; NCU06551.1 Ncra 85106835; NCU05145.1 Ncra 85081010; YALI0F07128g Ylip 50555399; MG05295.4 Mgri 39939890; FG04147.1 Gzea 46116610; NCU07855.1 Ncra 85109845; MG06795.4 Mgri 39977821; NCU08168.1 Ncra 85093270; CHGG 09951 CGLO 88176079; CHGG 08318 CGLO 88179597; NCU04492.1 Ncra 32406464; FG09606.1 Gzea 46136181; NCU06975.1 Ncra 85108658; CHGG 05063 CGLO 88180976; HOP78 FOXY 30421204; CHGG 00311 CGLO 88184608; CIMG 00825 CIMM 90305840; AN6124.2 Anid 67539908; ISOCHOR AFUM 71001046; CNC00740 CNEO 57225606; CNBH2400 Cneo 50256416; AN0859.2 ANID 67517161; YALI0A16269g Ylip 50545173; CaO19 12424 Calb 68467239; DEHA0E17127g Dhan 50422877; RBF1P CALB 2498834
DEHA0A05258g Dhan 50405817; CaO19.2272 Calb 68482460; DEHA0F25124g Dhan 50425555; CAGL0H03487G CGLA 49526254; AFL087C AGOS 44984319; KLLA0D03256g Klac 50306475; CAGL0G09042G CGLA 49526062; RCS1 SCER 51830313; AFT2 SCER 6325054; YALI0A05313g Ylip 50543230; YALI0A02266g Ylip 50543034; Mutyl Ylip 50545163; YALI0C17193g.c Ylip 50548927; Mutyl.c Ylip 50545161; YALI0C00781g.d Ylip 50547661; YALI0C00781g.a Ylip 50547661; YALI0C00781g.b Ylip 50547661; YALI0C00781g.c Ylip 50547661; YALI0C17193g.a Ylip 50548927; Mutyl.a Ylip 50545161; YALI0D22506g Ylip 50551361; Mutyl.b Ylip 50545161; YALI0C17193g.b Ylip 50548927; MG07557.4 Mgri 39972511; MG09992.4 Mgri 39965911; 101.T00020 EHIS 67474280; 4.T00052 EHIS 67483840; FAR1 ATHA 18414374; AT2G27110 ATHA 18401324; AT2G43280 ATHA 30689328; AT4G38180 ATHA 15233732; AT3G59470 ATHA 18411179; AT5G28530 ATHA 22327146; AT1G52520 ATHA 15219020; AT1G80010 ATHA 15220043; C20ORF164 HSAP 13929452; LOC428161 GGAL 50759053; T24C4.2 CELE 17555262; SJCHGC04823 SJAP 56758936; 6330408A02RIK MMUS 50053999; LOC374920 HSAP 27694337
Multiple independent evolution of TFs from Transposons
Lineages: Animals, Plants, Entamoeba, Fungi (Rcs1/Aft2p cluster, Rbf1 cluster)
Sequence Structure Expression Interaction
Conclusion
Integration of different types of experimental data allowed us to identify the DNA binding domain in Rcs1.