ANALYSIS OF GENE REGULATION IN T-LYMPHOCYTES USING MICROARRAYS AND GENE EXPRESSION DATABASES

Dr A. Mouzaki, Dr C. Argyropoulos

DESCRIPTION

PowerPoint presentation of a talk I gave at the 15th European Immunology Congress (EFIS 2003), June 2003, Rhodes, Greece

Page 1: ANALYSIS OF GENE REGULATION IN T-LYMPHOCYTES USING MICROARRAYS AND GENE EXPRESSION DATABASES

Dr A. Mouzaki, Dr C. Argyropoulos

Page 2: Aims & Objectives

Hypothesis-driven application of high-throughput gene quantification technologies

Bioinformatics as an experimental tool

Application of formal Bayesian statistical inference techniques in molecular biology experimental design

Elucidation of gene regulation interactions in lymphocyte biology

Page 3: Starting point …

Experimental work from the early 1990s suggested that the IL-2 gene is actively repressed in resting naïve T-lymphocytes

The same repressor(s) seem to be involved in HIV-1 regulation and in autoimmune diseases (childhood ITP)

Protein purification has been a daunting task

Page 4: The distal NF-AT site in the IL-2 gene & HIV-1 LTR are co-regulated

Experimental system: X. laevis oocytes

1. Transfections using various promoters to drive expression of CAT plasmids

2. Microinjection of nuclear and cytoplasmic extracts from T-lymphocytes

Experimental findings:

| | HIV-LTR | HIV-LTR ΔPRRE | IL-2 promoter | IL-2 promoter ΔPRRE |
|---|---|---|---|---|
| Resting naive T cells | 0 | + | 0 | + |
| Activated T cells | ++ | ++ | ++ | ++ |

Interpretation: a repressor in resting naive cells?

Similar size complexes in EMSA experiments

Hot – cold competition EMSA experiments

Page 5: Background data

Repressor(s) is selectively expressed in resting naive T-lymphocytes

Disappears (and never comes back) after T-cell activation

Binds to:

HIV-1 PRRE -279 AGGCCAATGAAGGAGAGAACAACAGCTTGT -250

IL-2 PRRE -292 AAGAAAGGAGGAAAAACT-GT -273

HIV-1 NF-AT -252 TGTTACACCCTATFAGCCTGCATGGGATGGAGGACGC -216

HIV-1 NF-κB -108 ACAAGGGACTTTCCGCTGGGGACTTTCCA -80

Page 6: From gene (expression) to protein

Problem specification:

1. Looking for DNA-binding proteins that are down-regulated upon T cell activation

2. Can bind to the common motif: AAGGAG

In theory: test candidate TFs in EMSA experiments

Q: How do you find candidate factors?

A: Bayesian statistics + TF databases + gene expression databases
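
The second criterion in the problem specification, binding to the common motif AAGGAG, can be checked mechanically against the four sequences listed on the Background data slide. The Python sketch below is only an illustration: the name `find_motif` and the exact, forward-strand matching rule are assumptions made here, not the authors' procedure (the slides describe a semi-quantitative similarity score rather than exact matching).

```python
# Illustrative sketch only: exact, forward-strand matching is an assumption.
MOTIF = "AAGGAG"

# Sequences copied verbatim from the "Background data" slide.
SITES = {
    "HIV-1 PRRE": "AGGCCAATGAAGGAGAGAACAACAGCTTGT",
    "IL-2 PRRE": "AAGAAAGGAGGAAAAACT-GT",
    "HIV-1 NF-AT": "TGTTACACCCTATFAGCCTGCATGGGATGGAGGACGC",
    "HIV-1 NF-kB": "ACAAGGGACTTTCCGCTGGGGACTTTCCA",
}


def find_motif(sequence, motif=MOTIF):
    """Return the 0-based start index of every exact match (overlaps allowed)."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        start = sequence.find(motif, start + 1)
    return hits


hits_by_site = {name: find_motif(seq) for name, seq in SITES.items()}
for name, hits in hits_by_site.items():
    print(f"{name}: {hits if hits else 'no exact match'}")
```

Under this strict rule the motif turns up in the two PRRE elements only; ranking partial matches would need a similarity measure instead of exact string search.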

Page 7: [slide contains no extractable text]

Page 8: Crash Course on Bayesian Statistics

Bayesian: named after the Rev. Bayes, who described Bayes' theorem in the 18th century.

Probability: a real-number-valued measure of the plausibility of a proposition when incomplete knowledge does not allow us to establish its truth or falsehood with certainty

Probability theory is just common sense reduced to numbers, and probability represents the observer's belief that a certain event is true

Page 9: How Does it Work?

A three-step procedure:

1. Clearly state what the hypotheses or models are, along with all the background information and data

2. Use the language of probability to assign prior probabilities to the hypotheses investigated

3. Use probability calculus to arrive at numerical values for the hypotheses in light of the available data

$$P(H \mid E, I) = \frac{P(H \mid I)\, P(E \mid H, I)}{P(E \mid I)}$$
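
The three steps can be traced in a few lines of Python. This is a minimal sketch with made-up numbers; the function name `posterior` and the two-hypothesis setup are assumptions chosen for illustration, and P(E | I) is obtained from the law of total probability over the listed hypotheses.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem to mutually exclusive, exhaustive hypotheses.

    priors[h]      stands for P(H = h | I)
    likelihoods[h] stands for P(E | H = h, I)
    """
    # P(E | I) by the law of total probability.
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    if evidence == 0:
        raise ValueError("the data have zero probability under every hypothesis")
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


# Step 1: state the hypotheses. Step 2: assign priors (made-up values).
priors = {"H0": 0.5, "H1": 0.5}
# Made-up probabilities of the observed data under each hypothesis.
likelihoods = {"H0": 0.8, "H1": 0.2}
# Step 3: probability calculus gives the posterior.
post = posterior(priors, likelihoods)
print(post)
```

With equal priors the posterior is simply the normalised likelihood, which is why the prior choice matters most when the data are weak.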

Page 10: Experimental Design

Bayesian methods are ideally suited to contribute to experimental design because:

1. information is usually available prior to experimentation

2. uncertainties can be combined with numerical measures of the utility of consequences

The optimal experimental design is the one maximizing the expected utility of an experiment

Page 11: Decision structure of EMSA experiments

a set of available actions (α_i)

a set of uncertain events (E_i,j): E_DN and E_NDN

a set of consequences

Preferences about the uncertain scenarios depend on the attitudes towards the consequences involved and are codified in a utility function

[Figure: decision tree. Each action α_i (shown with its neighbours α_i-1 and α_i+1) branches into the events E_DN and E_NDN, which lead to the consequences c_i and c*, with U(c*) = 0 and 0 ≤ U(c_i) ≤ 1]

Page 12: The role of Databases

Microarrays help to expand the action horizon by providing quantitative data about uncertain events

Specialized databases (e.g. TRANSFAC) complement microarrays by providing a wealth of data to aid numerical codification of utilities in a specific situation.

Common sense implies a simple utility function:

1. The utility of c*, the consequence of the event E_NDN, is set equal to zero

2. The utility of any factor in the case of down-regulation is a semi-quantitative binding site "similarity" score, with 1 being a perfect match and lesser degrees of similarity coded accordingly

Page 13: The role of mathematics

$$\max_i \{ P_{DN}(TF_i)\, U(c_i) + P_{NDN}(TF_i)\, U(c^*) \} = \max_i \{ P_{DN}(TF_i)\, U(c_i) \}$$

The second term drops out because U(c*) = 0.

P_DN is given by the Behrens-Fisher distribution in Bayesian statistics (bypasses sample size limitations and heteroscedasticity of measurements)

Probabilities are more easily conceptualized as odds

Calculated using Bayes' theorem

Computer algebra software takes care of the integrals
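
The maximisation over candidate factors amounts to ranking them by expected utility. A minimal Python sketch follows; the candidate names `TF_A`, `TF_B`, `TF_C` and all numbers are hypothetical, and P_NDN is taken to be 1 - P_DN on the assumption that the two events are exhaustive.

```python
def expected_utility(p_dn, u_hit, u_miss=0.0):
    """P_DN * U(c_i) + P_NDN * U(c*), with P_NDN = 1 - P_DN and U(c*) = 0 by default."""
    return p_dn * u_hit + (1.0 - p_dn) * u_miss


def rank_candidates(candidates):
    """Sort candidates by expected utility, best first.

    candidates maps a factor name to (P_DN, similarity-score utility in [0, 1]).
    """
    scored = [(name, expected_utility(p, u)) for name, (p, u) in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical candidates: (probability of down-regulation, binding-site similarity).
candidates = {
    "TF_A": (0.99, 0.2),  # almost certainly down-regulated, poor motif match
    "TF_B": (0.98, 1.0),  # down-regulated and a perfect motif match
    "TF_C": (0.60, 1.0),  # perfect match, weaker evidence of down-regulation
}
ranking = rank_candidates(candidates)
for name, score in ranking:
    print(f"{name}: expected utility {score:.3f}")
```

The example shows why the product matters: the factor with the highest probability of down-regulation is not the best EMSA candidate if its binding site barely resembles the motif.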

Page 14: Significance Testing …

Two possible outcomes for every gene: either δ < 0 (gene is down-regulated upon T cell activation), or δ ≥ 0 (gene is not down-regulated).

"Bayesian" significance testing: H0: δ < 0 (null hypothesis) and H1: δ ≥ 0 (alternative hypothesis)

Before we collect any gene expression data, we could assume that each outcome is equiprobable (I stands for available background knowledge):

P(H0|I) = P(H1|I) = ½

After observing the data, we calculate the posterior odds ratio (POR):

$$\mathrm{POR} = \frac{P(\delta < 0)}{1 - P(\delta < 0)} = \frac{P_{DN}}{1 - P_{DN}}$$
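
The slides obtain P_DN analytically, with computer algebra handling the integrals. The Python sketch below swaps in plain Monte Carlo sampling instead, so it is an approximation of the same quantity rather than the authors' method. It assumes normally distributed measurements and the standard noninformative prior, under which each group mean has a scaled Student-t posterior; δ is taken as the activated mean minus the resting mean. All function names and data values are hypothetical.

```python
import math
import random
import statistics


def sample_posterior_mean(values, rng):
    """Draw one value of a group mean from its posterior.

    Assumption: mean = xbar + (s / sqrt(n)) * t, with t Student-t on n - 1
    degrees of freedom (normal data, noninformative prior).
    """
    n = len(values)
    df = n - 1
    xbar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Student-t draw built from a standard normal and a chi-square variate.
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)
    return xbar + (s / math.sqrt(n)) * t


def prob_down_regulated(resting, activated, draws=20000, seed=0):
    """Monte Carlo estimate of P_DN = P(delta < 0 | data)."""
    rng = random.Random(seed)
    below = 0
    for _ in range(draws):
        delta = sample_posterior_mean(activated, rng) - sample_posterior_mean(resting, rng)
        if delta < 0:
            below += 1
    return below / draws


def posterior_odds(p_dn):
    """POR = P_DN / (1 - P_DN); infinite if every draw fell below zero."""
    return math.inf if p_dn >= 1.0 else p_dn / (1.0 - p_dn)


# Made-up log-expression values with unequal group sizes and spreads.
resting = [2.0, 2.2, 1.9]
activated = [1.0, 1.1, 0.9, 1.3]
p_dn = prob_down_regulated(resting, activated)
print(f"P_DN ~ {p_dn:.3f}, POR ~ {posterior_odds(p_dn):.1f}")
```

Because each group keeps its own variance and degrees of freedom, the sketch needs neither equal sample sizes nor equal variances, which is the property the slides attribute to the Behrens-Fisher approach.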

Page 15: The role of public data sources

The Lymphochip dataset: Alizadeh AA, Eisen MB, Davis RE et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000; 403: 503-511 provided the relevant array experiment

The TRANSFAC database: Wingender E, Chen X, Hehl R et al. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 2000; 28: 316-319 helped to calculate utilities for factors

Page 16: RESULTS

Bayesian analysis pinpoints differentially regulated genes

Page 17: DOWN-REGULATED TRANSCRIPTION FACTORS

| Transcription factor | Δ Fold (Log) | Δ Fold | Baseline | POR |
|---|---|---|---|---|
| Nuclear Factor Of Activated T Cells 4 | 0.2758 | 1.32 | 13.540 | 191.90 |
| ID2 Inhibitor Of DNA Binding 2 | 0.3728 | 1.45 | 21.930 | 85.35 |
| MAZ - Myc Associated Zinc Finger Protein | 0.6108 | 1.84 | 0.7148 | 85.08 |
| I52969 Programmed Cell Death 2 | 0.8764 | 2.40 | -0.4210 | 72.23 |
| Myocyte-Specific Enhancer Factor Human Mef2 | 0.6634 | 1.94 | 0.8250 | 70.82 |
| Glucocorticoid Receptor, Alpha Splice Form | 0.4030 | 1.50 | 22.130 | 61.59 |
| Evi Zinc Finger Protein | 0.3018 | 1.35 | 15.430 | 57.91 |
| Interferon-Stimulated Transcription Factor 3 | 0.5290 | 1.70 | 14.160 | 48.21 |
| ETS2 | 0.8222 | 2.28 | 10.290 | 47.50 |
| ZFP161 Zinc Finger Protein 161 | 0.4574 | 1.58 | 0.9405 | 47.48 |
| MAF 96% Similar To Mouse MAF2 | 0.6966 | 2.01 | 0.7073 | 47.15 |
| Zinc Finger Protein, Subfamily 1A, 1 (Ikaros) | 0.3630 | 1.44 | 23.090 | 36.93 |
| Far Upstream Element Binding Protein 3 | 0.6177 | 1.85 | 0.1068 | 35.90 |
| Host Cell Factor C1 (HCF) | 0.6081 | 1.84 | 0.6696 | 35.50 |
| Max-Interacting Transcriptional Repressor Mad4 | 0.5164 | 1.68 | 11.560 | 29.00 |
| Mad1=MAD=MAX-Binding Protein | 0.4276 | 1.53 | 11.340 | 25.99 |
| Signal Transducer Activator Of Transcription 1 | 0.4084 | 1.50 | 19.080 | 25.86 |
| BHLHB2 (Dec-1) | 0.4872 | 1.63 | 0.9080 | 25.22 |
| Zinc Finger Protein ZNF131 With POZ Domain | 0.4330 | 1.54 | 11.170 | 23.49 |
| Histone Acetyltransferase Associated With MOZ | 0.5693 | 1.77 | 0.1876 | 23.27 |
| BCL-6 Zinc Finger Protein (ZFP51) | 0.6552 | 1.93 | 0.5631 | 21.32 |
| Madh1 Mad | 0.6431 | 1.90 | 0.9082 | 21.19 |
| Egrα, TGF-β Early Inducible Protein | 0.5897 | 1.80 | 13.210 | 21.08 |
| Cyclin D Binding Myb-Like Transcription Factor 1 | 0.5976 | 1.82 | 0.2317 | 21.05 |
| Mybl1. 100% Similarity F Human A-Myb | 0.7820 | 2.19 | -0.3559 | 21.02 |
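
Two of the table's columns are related by simple arithmetic, which the Python sketch below spells out. It assumes the "Δ Fold (Log)" column holds natural-log differences (the tabulated pairs are consistent with this, e.g. exp(0.2758) rounds to 1.32) and that the decimal commas in the POR column are decimal points. The function names are illustrative.

```python
import math


def fold_from_log(delta_log):
    """Convert a natural-log expression difference into a fold change."""
    return math.exp(delta_log)


def probability_from_odds(por):
    """Invert POR = P / (1 - P) to recover the probability P."""
    return por / (1.0 + por)


# Values taken from the first and ninth rows of the table.
rows = {
    "Nuclear Factor Of Activated T Cells 4": (0.2758, 191.90),
    "ETS2": (0.8222, 47.50),
}
for name, (delta_log, por) in rows.items():
    fold = fold_from_log(delta_log)
    p_dn = probability_from_odds(por)
    print(f"{name}: fold change {fold:.2f}, P_DN {p_dn:.4f}")
```

Read this way, every factor in the table has a probability of down-regulation above 0.95, since the smallest POR listed is about 21.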

Page 18: COMBINING UTILITIES AND PROBABILITIES

| Transcription factor | Binding sites |
|---|---|
| Nuclear Factor Of Activated T Cells 4 | IL-2 PRRE |
| ID2 Inhibitor Of DNA Binding 2 | Lacks a DNA binding domain; inhibits other BHLH TFs |
| MAZ - Myc Associated Zinc Finger Protein | GGGAGGG |
| I52969 Programmed Cell Death 2 | Unknown binding site |
| Myocyte-Specific Enhancer Factor Human Mef2 | KCTAWAAATAGM |
| Glucocorticoid Receptor, Alpha Splice Form | GRE element |
| Evi Zinc Finger Protein | ACAAGATAA |
| Interferon-Stimulated Transcription Factor 3 | GGGAAACCGAAAC |
| ETS2 | GGGAAG, GGAGGAA |
| ZFP161 Zinc Finger Protein 161 | RNRNRCGCGCW |
| MAF 96% Similar To Mouse MAF2 | CTCATTTTCCCTTGGTTTCAGCAACTTTA |
| Zinc Finger Protein, Subfamily 1A, 1 (Ikaros) | NNNTGGGAATRCC |
| Far Upstream Element Binding Protein 3 | TTGTTTTTCATGCCGTGGAATAACACAAAATAAAAAATCCCGAGGGAATATAC |
| Host Cell Factor C1 (HCF) | ATGCAAAT |
| Max-Interacting Transcriptional Repressor Mad4 | Myc site repressor |
| Mad1=MAD=MAX-Binding Protein | Myc site repressor |
| Signal Transducer Activator Of Transcription 1 | ANTTCCGGGAANTGNSN |
| BHLHB2 (Dec-1) | Unknown binding site |
| Zinc Finger Protein ZNF131 With POZ Domain | Unknown binding site |
| Histone Acetyltransferase Associated With MOZ | TGT/CGGT |
| BCL-6 Zinc Finger Protein (ZFP51) | GAAAATTCCTAGAAAGCATA |
| Madh1 Mad | Myc site repressor |
| Egrα, TGF-β Early Inducible Protein | Unknown binding site |
| Cyclin D Binding Myb-Like Transcription Factor 1 | CCCG(G/T)ATGT |
| Mybl1. 100% Similarity F Human A-Myb | YAACNGHH |

Page 19: The role of (depletion) EMSAs

Nuclear extracts of resting naive T cells

Enrichment x 10

Remove TF using specific antibody and Protein A Sepharose

Band signal reduction

Super shift in typical EMSA experiments

Page 20: HENCE ...

An ets-2-like factor most likely represses the IL-2 gene

Most likely another DNA-BP also plays a role (work in progress using affinity columns to isolate the complex)

Statistics can be extremely useful, provided computer algebra systems take care of "double" integrals