PowerPoint presentation of a talk I gave at the 15th European Immunology Congress (EFIS 2003), June 2003, Rhodes, Greece

ANALYSIS OF GENE REGULATION IN T-LYMPHOCYTES USING MICROARRAYS AND GENE EXPRESSION DATABASES
Dr A. Mouzaki, Dr C. Argyropoulos

Aims & Objectives
Hypothesis-driven application of high-throughput gene quantification technologies
Bioinformatics as an experimental tool
Application of formal Bayesian statistical inference techniques in molecular biology experimental design
Elucidation of gene regulation interactions in lymphocyte biology

Starting point …
Experimental work from the early 90s suggesting that the IL-2 gene is actively repressed in resting naïve T-lymphocytes
The same repressor(s) seems to be involved in HIV-1 regulation and in autoimmune diseases (childhood ITP)
Protein purification has been a daunting task

The distal NF-AT site in the IL-2 gene & HIV-1 LTR are co-regulated
Experimental system: X. laevis oocytes
1. Transfections using various promoters to drive expression of CAT plasmids
2. Microinjection of nuclear and cytoplasmic extracts from T-lymphocytes
Experimental findings:
Cells | HIV-LTR | HIV-LTR ΔPRRE | IL-2 promoter | IL-2 promoter ΔPRRE
Resting naive T cells | 0 | + | 0 | +
Activated T cells | ++ | ++ | ++ | ++
Interpretation: a repressor in resting naive T cells?
Similar size complexes in EMSA experiments
Hot-cold competition EMSA experiments

Background data
Repressor(s) is selectively expressed in resting naive T-lymphocytes
Disappears (and never comes back) after T-cell activation
Binds to:
HIV-1 PRRE   -279 AGGCCAATGAAGGAGAGAACAACAGCTTGT -250
IL-2 PRRE    -292 AAGAAAGGAGGAAAAACT-GT -273
HIV-1 NF-AT  -252 TGTTACACCCTATFAGCCTGCATGGGATGGAGGACGC -216
HIV-1 NF-κB  -108 ACAAGGGACTTTCCGCTGGGGACTTTCCA -80

From gene (expression) to protein
Problem specification: looking for DNA-binding proteins that
1. Are down-regulated upon T cell activation
2. Can bind to the common motif: AAGGAG
In theory: test candidate TFs in EMSA experiments
Q: How do you find candidate factors?
A: Bayesian statistics + TF databases + gene expression databases
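
The common motif can be located in the two PRRE sequences listed under the background data. The following is a minimal Python sketch, assuming a plain exact-match search on the forward strand only; the function name `find_motif` is illustrative and not part of the talk, and the alignment gap ("-") in the IL-2 sequence is removed before searching.

```python
def find_motif(sequence, motif="AAGGAG"):
    """Return 0-based offsets of exact motif matches (forward strand only)."""
    seq = sequence.replace("-", "").upper()  # drop alignment gaps
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


# Sequences copied from the background-data slide.
sites = {
    "HIV-1 PRRE": "AGGCCAATGAAGGAGAGAACAACAGCTTGT",
    "IL-2 PRRE": "AAGAAAGGAGGAAAAACT-GT",
}

hits = {name: find_motif(seq) for name, seq in sites.items()}
for name, offsets in hits.items():
    print(name, offsets)
```

Offsets are counted from the first base of each listed fragment, not from the transcription start site.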

Crash Course on BAYESIAN Statistics
Bayesian: named after the Rev. Bayes, who described Bayes' theorem in the 18th century.
Probability: a real-number-valued measure of the plausibility of a proposition when incomplete knowledge does not allow us to establish its truth or falsehood with certainty
Probability theory is just common sense reduced to numbers, and probability represents the observer's belief that a certain event is true

How Does it Work?
A three-step procedure:
1. Clearly state what the hypotheses or models are, along with all the background information and data
2. Use the language of probability to assign prior probabilities to the hypotheses investigated
3. Use probability calculus in order to arrive at numerical values for the hypotheses in light of the available data
$$P(H \mid E, I) = \frac{P(H \mid I)\, P(E \mid H, I)}{P(E \mid I)}$$
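
The three steps can be illustrated with a small numerical case. This is a minimal Python sketch for a single hypothesis H and its negation; the prior and the two likelihoods are made-up numbers chosen only for illustration, and the function name `posterior` is not from the talk.

```python
def posterior(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Return P(H | E, I) for a binary hypothesis using Bayes' theorem."""
    # P(E | I) expands over H and not-H (law of total probability).
    evidence = prior_h * lik_e_given_h + (1.0 - prior_h) * lik_e_given_not_h
    return prior_h * lik_e_given_h / evidence


# Hypothetical inputs: equal prior odds, evidence four times likelier under H.
p_h_given_e = posterior(prior_h=0.5, lik_e_given_h=0.8, lik_e_given_not_h=0.2)
print(round(p_h_given_e, 3))
```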

Experimental Design
Bayesian methods are ideally suited to contribute to experimental design because:
1. Information is usually available prior to experimentation
2. Uncertainties can be combined with numerical measures of utility of consequences
The optimal experimental design is the one maximizing the expected utility of an experiment

Decision structure of EMSA experiments
A set of available actions (α_i)
A set of uncertain events (E_i,j): E_DN and E_NDN
A set of consequences
Preferences about the uncertain scenarios depend on the attitudes towards the consequences involved and are codified in a utility function
[Figure: decision tree over the actions α_i-1, α_i, α_i+1; action α_i branches into the events E_DN and E_NDN with consequences c_i and c*, where U(c*) = 0 and 0 ≤ U(c_i) ≤ 1]
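
The decision structure above can be sketched in a few lines of Python: each action (testing one candidate factor) has two uncertain events, and its expected utility is the probability-weighted sum of the utilities of the two consequences. The candidate names, probabilities, and utilities below are hypothetical placeholders, not values from the talk.

```python
def expected_utility(p_dn, u_ci, u_cstar=0.0):
    """Expected utility of one action: P(E_DN) * U(c_i) + P(E_NDN) * U(c*)."""
    return p_dn * u_ci + (1.0 - p_dn) * u_cstar


# Hypothetical candidates: name -> (P(E_DN), U(c_i)), with 0 <= U(c_i) <= 1.
candidates = {
    "TF_A": (0.95, 0.5),
    "TF_B": (0.80, 1.0),
    "TF_C": (0.99, 0.2),
}

# Rank actions by expected utility, highest first.
ranked = sorted(candidates, key=lambda name: expected_utility(*candidates[name]), reverse=True)
best = ranked[0]
print(ranked)
```

With U(c*) = 0 the second term drops out, so a factor with a moderately high probability of down-regulation but a perfect binding-site match can outrank a factor that is almost certainly down-regulated but matches poorly.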

The role of Databases
Microarrays help to expand the action horizon by providing quantitative data about uncertain events
Specialized databases (e.g. TRANSFAC) complement microarrays by providing a wealth of data to aid numerical codification of utilities in a specific situation.
Common sense implies a simple utility function:
1. The utility u(c*) of the event E_NDN is set equal to zero
2. The utility of any factor in the case of down-regulation is a semi-quantitative binding site "similarity" score, with 1 being a perfect match and lesser degrees of similarity coded accordingly
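
The slides do not specify how the similarity score was computed. As one hypothetical way to code such a score, the Python sketch below slides a factor's consensus site along a target sequence without gaps and reports the best fraction of matching positions, treating IUPAC ambiguity codes as sets of allowed bases. The function name `similarity` and the scoring rule itself are assumptions made for illustration.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def similarity(consensus, target):
    """Best fraction of consensus positions matched by any ungapped window
    of the target (1.0 means a perfect match somewhere in the target)."""
    width = len(consensus)
    if width == 0 or width > len(target):
        return 0.0
    best = 0
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        # Unknown consensus symbols match nothing.
        matched = sum(base in IUPAC.get(code, "") for code, base in zip(consensus, window))
        best = max(best, matched)
    return best / width


IL2_PRRE = "AAGAAAGGAGGAAAAACTGT"  # IL-2 PRRE from the slides, alignment gap removed

score_ets2 = similarity("GGAGGAA", IL2_PRRE)  # one of the two ETS2 sites in the results table
score_maz = similarity("GGGAGGG", IL2_PRRE)   # MAZ site in the results table
print(score_ets2, round(score_maz, 2))
```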

The role of mathematics
$$\max_i \{ P_{DN}(TF_i)\, u(c_i) + P_{NDN}(TF_i)\, u(c^*) \} = \max_i \{ P_{DN}(TF_i)\, u(c_i) \}$$
P_DN is given by the Behrens-Fisher distribution in Bayesian statistics (bypasses sample size limitations and heteroscedasticity of measurements)
Probabilities are more easily conceptualized as odds
Calculated using Bayes' theorem
Computer algebra software takes care of the integrals
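
The talk evaluates the Behrens-Fisher integrals with computer algebra software. As a swapped-in alternative, the Python sketch below estimates the same posterior probability by Monte Carlo sampling. It assumes the standard noninformative prior for each group, under which the posterior of each group mean is a scaled and shifted Student-t distribution with n - 1 degrees of freedom, and it assumes δ is the activated mean minus the resting mean. The expression values and all function names are hypothetical.

```python
import math
import random
import statistics


def student_t(df, rng):
    """Draw one Student-t variate with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def prob_down_regulated(resting, activated, draws=20000, seed=0):
    """Monte Carlo estimate of P(delta < 0 | data),
    where delta = mean(activated) - mean(resting)."""
    rng = random.Random(seed)

    def summary(sample):
        n = len(sample)
        return statistics.mean(sample), statistics.stdev(sample) / math.sqrt(n), n - 1

    mean_r, se_r, df_r = summary(resting)
    mean_a, se_a, df_a = summary(activated)
    below = 0
    for _ in range(draws):
        mu_r = mean_r + se_r * student_t(df_r, rng)  # posterior draw, resting mean
        mu_a = mean_a + se_a * student_t(df_a, rng)  # posterior draw, activated mean
        if mu_a - mu_r < 0:
            below += 1
    return below / draws


resting = [2.0, 2.2, 1.9, 2.1]         # hypothetical log-expression values
activated = [1.1, 0.9, 1.3, 1.0, 1.2]  # hypothetical log-expression values
p_dn = prob_down_regulated(resting, activated)
print(p_dn)
```

The two groups may have different sizes and different variances, which is the situation the Behrens-Fisher distribution is meant to handle.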

Significance Testing …
Two possible outcomes for every gene: either δ < 0 (gene is down-regulated upon T cell activation), or δ ≥ 0 (gene is not down-regulated).
"Bayesian" significance testing:
H0: δ < 0 is the null hypothesis and H1: δ ≥ 0 the alternative one
Before we collect any gene expression data, we could assume that each outcome is equiprobable (I stands for available background knowledge):
P(H0|I) = P(H1|I) = ½
After observing data, we calculate the posterior odds ratio (POR):
$$\mathrm{POR} = \frac{P(\delta < 0)}{1 - P(\delta < 0)} = \frac{P_{DN}}{1 - P_{DN}}$$
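
Converting between the posterior probability of down-regulation and the posterior odds ratio is a one-line calculation in each direction. A minimal Python sketch follows; the function names and the example probability are illustrative and not taken from the talk.

```python
def posterior_odds(p_dn):
    """POR = P_DN / (1 - P_DN), the odds in favour of down-regulation."""
    if not 0.0 <= p_dn < 1.0:
        raise ValueError("p_dn must lie in [0, 1)")
    return p_dn / (1.0 - p_dn)


def prob_from_odds(por):
    """Inverse mapping: P_DN = POR / (1 + POR)."""
    if por < 0.0:
        raise ValueError("odds must be non-negative")
    return por / (1.0 + por)


por = posterior_odds(0.95)   # hypothetical posterior probability
p_back = prob_from_odds(por)
print(round(por, 2), round(p_back, 2))
```

With the equal prior probabilities assumed above, a POR of 1 means the data leave the two hypotheses equally plausible, and larger values favour down-regulation.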

The role of public data sources
The Lymphochip dataset: Alizadeh AA, Eisen MB, Davis RE et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000; 403: 503-511 provided the relevant array experiment
The TRANSFAC database: Wingender E, Chen X, Hehl R et al. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 2000; 28: 316-319 helped to calculate utilities for factors

[Figure: Bayesian analysis pinpoints differentially regulated genes]

RESULTS
DOWN-REGULATED TRANSCRIPTION FACTORS
TRANSCRIPTION FACTORS | Δ Fold (Log) | Δ Fold | Baseline | POR
Nuclear Factor Of Activated T Cells 4 | 0.2758 | 1.32 | 13.540 | 191,90
ID2 Inhibitor Of DNA Binding 2 | 0.3728 | 1.45 | 21.930 | 85,35
MAZ - Myc Associated Zinc Finger Protein | 0.6108 | 1.84 | 0.7148 | 85,08
I52969 Programmed Cell Death 2 | 0.8764 | 2.40 | -0.4210 | 72,23
Myocyte-Specific Enhancer Factor Human Mef2 | 0.6634 | 1.94 | 0.8250 | 70,82
Glucocorticoid Receptor, Alpha Splice Form | 0.4030 | 1.50 | 22.130 | 61,59
Evi Zinc Finger Protein | 0.3018 | 1.35 | 15.430 | 57,91
Interferon-Stimulated Transcription Factor 3 | 0.5290 | 1.70 | 14.160 | 48,21
ETS2 | 0.8222 | 2.28 | 10.290 | 47,50
ZFP161 Zinc Finger Protein 161 | 0.4574 | 1.58 | 0.9405 | 47,48
MAF 96% Similar To Mouse MAF2 | 0.6966 | 2.01 | 0.7073 | 47,15
Zinc Finger Protein, Subfamily 1A, 1 (Ikaros) | 0.3630 | 1.44 | 23.090 | 36,93
Far Upstream Element Binding Protein 3 | 0.6177 | 1.85 | 0.1068 | 35,90
Host Cell Factor C1 (HCF). | 0.6081 | 1.84 | 0.6696 | 35,50
Max-Interacting Transcriptional Repressor Mad4 | 0.5164 | 1.68 | 11.560 | 29,00
Mad1=MAD=MAX-Binding Protein | 0.4276 | 1.53 | 11.340 | 25,99
Signal Transducer Activator Of Transcription 1 | 0.4084 | 1.50 | 19.080 | 25,86
BHLHB2 (Dec-1) | 0.4872 | 1.63 | 0.9080 | 25,22
Zinc Finger Protein ZNF131 With POZ Domain | 0.4330 | 1.54 | 11.170 | 23,49
Histone Acetyltransferase Associated With MOZ | 0.5693 | 1.77 | 0.1876 | 23,27
BCL-6 Zinc Finger Protein (ZFP51) | 0.6552 | 1.93 | 0.5631 | 21,32
Madh1 Mad | 0.6431 | 1.90 | 0.9082 | 21,19
Egrα, TGF – Β Early Inducible Protein | 0.5897 | 1.80 | 13.210 | 21,08
Cyclin D Binding Myb-Like Transcription Factor 1 | 0.5976 | 1.82 | 0.2317 | 21,05
Mybl1. 100% Similarity F Human A-Myb | 0.7820 | 2.19 | -0.3559 | 21,02
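
The slides do not say which logarithm the "Δ Fold (Log)" column uses. The numbers are consistent with natural logarithms, since exponentiating the log column reproduces the "Δ Fold" column to two decimals; this is an inference from the table, not a statement from the talk. A short Python check on three rows copied from the table:

```python
import math

# (Δ Fold (Log), Δ Fold) pairs copied from three rows of the results table.
rows = {
    "Nuclear Factor Of Activated T Cells 4": (0.2758, 1.32),
    "I52969 Programmed Cell Death 2": (0.8764, 2.40),
    "ETS2": (0.8222, 2.28),
}

# Exponentiate the log change and compare with the tabulated fold change.
recomputed = {name: math.exp(log_change) for name, (log_change, _) in rows.items()}
consistent = all(abs(recomputed[name] - fold) < 0.005 for name, (_, fold) in rows.items())
print(consistent)
```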

COMBINING UTILITIES AND PROBABILITIES
TRANSCRIPTION FACTORS | BINDING SITES
Nuclear Factor Of Activated T Cells 4 | IL-2 PRRE
ID2 Inhibitor Of DNA Binding 2 | Lacks a DNA binding domain; inhibits other BHLH TFs
MAZ - Myc Associated Zinc Finger Protein | GGGAGGG
I52969 Programmed Cell Death 2 | Unknown Binding Site
Myocyte-Specific Enhancer Factor Human Mef2 | KCTAWAAATAGM
Glucocorticoid Receptor, Alpha Splice Form | GRE ELEMENT
Evi Zinc Finger Protein | ACAAGATAA
Interferon-Stimulated Transcription Factor 3 | GGGAAACCGAAAC
ETS2 | GGGAAG, GGAGGAA
ZFP161 Zinc Finger Protein 161 | RNRNRCGCGCW
MAF 96% Similar To Mouse MAF2 | CTCATTTTCCCTTGGTTTCAGCAACTTTA
Zinc Finger Protein, Subfamily 1A, 1 (Ikaros) | NNNTGGGAATRCC
Far Upstream Element Binding Protein 3 | TTGTTTTTCATGCCGTGGAATAACACAAAATAAAAAATCCCGAGGGAATATAC
Host Cell Factor C1 (HCF). | ATGCAAAT
Max-Interacting Transcriptional Repressor Mad4 | MYC SITE REPRESSOR
Mad1=MAD=MAX-Binding Protein | MYC SITE REPRESSOR
Signal Transducer Activator Of Transcription 1 | ANTTCCGGGAANTGNSN
BHLHB2 (Dec-1) | Unknown Binding Site
Zinc Finger Protein ZNF131 With POZ Domain | Unknown Binding Site
Histone Acetyltransferase Associated With MOZ | TGT/CGGT
BCL-6 Zinc Finger Protein (ZFP51) | GAAAATTCCTAGAAAGCATA
Madh1 Mad | MYC SITE REPRESSOR
Egrα, TGF – Β Early Inducible Protein | Unknown Binding Site
Cyclin D Binding Myb-Like Transcription Factor 1 | CCCG(G/T)ATGT
Mybl1. 100% Similarity F Human A-Myb | YAACNGHH

The role of (depletion) EMSAs
Nuclear extracts of resting naive T cells
Enrichment x 10
Remove TF using specific antibody and Protein A Sepharose
Band signal reduction
Super shift in typical EMSA experiments

HENCE ...
An ets-2 like factor most likely represses the IL-2 gene
Most likely another DNA-BP also plays a role (work in progress using affinity columns to isolate the complex)
Statistics can be extremely useful, provided computer algebra systems take care of "double" integrals