Integrated transcriptional profiling and linkage analysis for mapping
disease genes and regulatory gene networks analysis
Enrico PetrettoResearch Fellow in Genomic Medicine
Imperial College Faculty of [email protected]
Outline• Introduction: the biological framework
– Expression QTL mapping using animal models
– eQTL analysis in multiple tissues
• Integrating genome-wide eQTL data to identify gene association networks
– Data mining of eQTLs
– Graphical Gaussian models (GGMs)
– Example of identification of disregulated pathway
– Master transcriptional regulator
quantitative variation of mRNA levelsin a segregating population
Genetical Genomics
Genetic mapping
Expression QTLsgenetic determinants
of gene expression
model organisms
The rat is among the leading model species for researchin physiology, pharmacology, toxicology
and for the study of genetically complex human diseases
Spontaneously Hypertensive Rat (SHR):A model of the metabolic syndrome
• Spontaneous hypertension• Decreased insulin action• Hyperinsulinaemia• Central obesity• Defective fatty acid metabolism• Hypertriglyceridaemia
Specialized tools for genetic mapping: Rat Recombinant Inbred (RI) strains
HXB1 HXB2 HXB3 HXB4 HXB5 HXB6 HXB7 …
F1
F2
Pravenec et al. J Hypertension, 1989
Mate two inbred strains
F1 offspring are identical
F2 offspring are different(due to recombination)
Brother sister mating over >20 generations to achieve
homozygosity at all genetic loci RI strains
Spontaneously
Hypertensive Rat
Normotensive
Rat (BN)
SHR BN
F1
F2
H B BH B H HStrain Distribution Patternfor Gene X
Gene X
GenotypeB
Genotype H
Cumulative, renewable resource for phenotypes and genetic mapping
RI strains
Gene X
B H BB B H HSDP for Gene X
Mapping of QTLscompare strain distribution pattern of
markers and traits
RI strains
obesity
mR
NA
Linkage
Linkage
Gene expression analysis in the Rat
4 animals per strain (no pooling)
30 RI strains + 2 parental strains
Affymetrix RAE230A
Affymetrix RAE230_2
640 microarray data sets~ 16,000 probe sets per array (fat, kidney, adrenal) ~ 30,000 probe sets per array (heart, skeletal muscle)
Fat
Heart
Expression profiling
Skeletal muscle
eQTL Linkage Analysis
Multiple testing issues
1,011 genetic markers 15,923 probe sets
Evaluate the linkage statistics for each genetic marker and use permutation testing to provide
genome-wide corrected P-values
* Storey 2000
Expected proportion of false positives among the probe sets called significant in the linkage
analysis (False Discovery Rate*)
For each probe set on the microarray, expression profiles were regressed against all 1,011 genetic markers
cis-acting trans-actingeQTLgene
eQTLgene
cis- and trans-acting eQTLs
Candidate genes for physiological traits
Regulatory gene networks
Fat
Genomewide significance of the eQTL
eQTL datasets in the rat model system
In collaboration with Dr SA Cook (Molecular Cardiology, MRC Clinical Sciences Centre), Dr M Pravenec (Czech Academy of Sciences, Prague) and Prof N Hubner (MDC, Berlin)
Skeletal muscle
brain
Heart
Tissue
Rat genome
Cis-acting eQTL
Trans-acting eQTL
+ cis-eQTL
+ trans-eQTL
Heart
Genetic architecture of genetic variation in gene expression
Petretto et al. 2006 PLoS Genet
trans-eQTLs: small genetic effect
cis-eQTLs: big genetic effect highly heritable
FatHeart
FDR for cis- and trans-eQTLs
heart fat kidney adrenal
homogeneous tissues heterogeneous tissues
FDRFDR
Petretto et al. 2006 PLoS Genet
trans-eQTLs hot-spotsRat chromosome 8
not tissue-specific cluster
heart
fat
kidney
adrenal
PGW<0.05tissue-specific clusters
Trans-eQTLs
Master transcriptional
regulator ?
Strategy to identify master transcriptional regulators
eQTLs
cis
Genetic markers
Data mining
Association networks
Model for master transcriptional regulator
genetic variant
cis-linked gene
Transcription Factor (TF) activity profile
Expression of trans-linked genes
Multi-tissues
TF binding data
Downstream functional validation in the lab
(Dr Cook / Prof Aitman)
Gene expression
trans
GGMs
FunctionalAnalysis
(GSEA, etc.)
GGMs• Partial correlation matrix = (ij)
• Inverse of variance covariance matrix P = (ij) = P-1
• small n, large p• Regularized covariance matrix estimator by
shrinkage (Ledoit-Wolf approach)
• Guarantees positive definiteness
ij = - ij / (ii jj )½
Schafer and Strimmer 2004, Rainer and Strimmer 2007
Partial correlation graphs• Multiple testing on all partial correlations
– Fitting a mixture distribution to the observed partial correlations (p)
f (p) = 0 f0 (p;) + A fA (p)
uniform [-1, 1]
0 ,
Prob (non-zero edge|p) = 1 - 0 f0 (p;)
f (p)
0 +A =1, 0 >> A
Schafer and Strimmer 2004, Rainer and Strimmer 2007
GGMs Infer partial ordering of the node
• Standardized partial variances (SPVi)
• Proportion of the variance that remains unexplained after regressing against all other variables
• Log-ratios of standardized partial variances B = (SPVi / SPVj)½
Log(B) |rest = 0 Log(B) |rest ≠ 0
undirected directed
ij ij
endogenous variable
smaller SPV
exogenous variable
bigger SPV
Inclusion of a directed edge into the network is conditional on a non-zero partial correlation coefficient
Schafer and Strimmer 2004, Rainer and Strimmer 2007
Hypothesis driven analysis
Graphical Gaussian models
• Detect conditionally dependent trans-eQTL genes
• Infer partial ordering of the nodes (directed edges)
1. Gene expression levels under genetic control (i.e., ‘structural’ genetic perturbation)
2. Co-expression of trans-eQTLs point to common regulation by a single gene
0
20
40
60
80
100
120
140
160c1
7.6
c17
.38
c15
.10
8c1
5.1
1c1
6.0
c11
.31
c15
.75
c6.1
36
c4.9
3c1
5.7
8c1
.87
c10
.25
c11
.32
c4.1
48
c8.4
5c8
.87
c8.5
3c4
.91
c4.1
61
c10
.21
c4.1
51
c16
.46
c15
.80
c17
.40
c8.9
c16
.50
c3.4
1c2
0.4
4c3
.11
2c8
.49
c13
.9c1
7.8
7c3
.13
0c5
.15
1c7
.14
2c8
.32
c15
.58
c1.2
48
c8.3
8c1
.90
c12
.7c3
.12
9c6
.13
1
kidney
heart
fat
adrenal
Locus (chromosome.Mb)
Chromosome 15, 108 Mb, D15Rat29
trans-eQTLs hot spots
posterior probability for non-zero edge 0.8
Heart tissue, trans-eQTLs hot-spot (chromosome 15)
posterior probability for non-zero edge 0.8 posterior probability for directed edge 0.8
Heart tissue, trans-eQTLs hot-spot (chromosome 15)
Interferon Regulatory Factor 8
IFN-gamma-inducible
Implicated in immune and inflammatory responses
Overexpression of IRF8 greatly enhances IFN-gamma
Enrichment for NF-kappa-B transcription factor binding sites
posterior probability for non-zero edge 0.70.7
posterior probability for directed edge 0.8
Relaxing the threshold…
Signal transducer / activator of transcriptionIFN gamma activated, drive expression of the target genes, inducing a cellular antiviral state
Involved in the transport of antigens from the cytoplasm to the endoplasmic reticulum forassociation with MHC class I molecules
MHC class I antigen antigen processing and
presentation
degradation of cytoplasmic antigens for MHC class I antigen presentation pathways
Is this association graph tissue specific?
C15.108
C15.108 C15.108C15.108
C15.108
C15.108
C15.108
C15.108
C15.108
kidney, all trans-eQTLs, posterior probability 0.95
Adrenal, all trans-eQTLs, posterior probability 0.95
C15.108
C15.108C15.108
C15.108
C15.108
C15.108
C15.108C15.108
C15.108
Trans-eQTL genes detected
in multiple tissues
IRF - transcription factorinflammatory response
adrenalheartkidney
interferon-stimulated transcription factor
FC
in R
I st
rain
sF
C in
the
par
enta
l str
ains
type I interferon (IFN)inducible gene
Microarray data:dysregulated genes
Trans cluster
Cis eQTLs
Transcripts representing Dock9 gene
Pearson Correlation 100,000 permutations Bonferroni corrected
cis-acting eQTLs within the cluster region
Model for master transcriptional regulator
genetic variant
cis-linked gene
Transcription Factor (TF) activity profile
Expression of trans-linked genes
Gene Set Enrichment Analysis
Correlation between Dock9 and all trans-eQTLs (heart)
LCP2IRF8TAP1PSMB9PSMB8PSMB10IRGMIFIT3STAT1USP18IFI35IRF7LGALS3BP
Transcript 1370905_atEnrichment Score -0.73Normalized Enrichment Score -0.93p-value 0.004FDR q-value 3%
Transcript 1385378_atEnrichment Score -0.69Normalized Enrichment Score -1.85Nominal p-value 0.015FDR q-value 7%
IRF8PSMB8PSMB10TAP1PSMB9IRGMSTAT1USP18IFIT3IFI35IRF7LGALS3BP
Functionalgene-sets correlated with Dock9
Genes whose expression is altered greater than twofold in mouse livers experiencing graft-versus-host disease (GVHD)
as a result of allogenic bone marrow transplantation…
Other examples
ATP binding and ion transporter activityCalcium signaling pathway
Heart tissue, trans-eQTLs hot-spot (chromosome 15, 78Mb)
posterior probability for non-zero edge 0.8
posterior probability for directed edge 0.8
Fat tissue specific, trans-eQTLs hot-spot (chromosome 17)
posterior probability for non-zero edge 0.8
posterior probability for directed edge 0.8
Summary
• Genome-wide eQTL data provide new insights into gene regulatory networks
• GGMs applied to trans-eQTL hotspots identified dysregulated pathway related to inflammation
• Hypothesis-driven inference can be a powerful approach to dissect regulatory networks
Acknowledgments
Sylvia Richardson Stuart CookTim Aitman Jonathan Mangion
Rizwan Sarwar
collaborators:
Norbert Hubner (MDC, Berlin)
Michael Pravenec (Institute of Physiology, Prague)
Extra slides
Chr 15 qRT-PCR validation in RI strains
1
2
3
4
Rarresin1_pred Irf7_pred Stat1
Fo
ld c
han
ge
Array qRT-PCR
Gene Rarresin1_pred Irf7_pred Stat1
Array 2.28 3.06 1.63P 4.0E-05 8.6E-05 1.4E-04
qRT-PCR 1.36 1.91 1.90P 0.039 0.004 0.036
Rpt4 mRNA
Control Alpha Beta Gamma
+256
+64
+16
+4
±1
Interferon
Fo
ld c
han
ge
Irf7 mRNA
Control Alpha Beta Gamma
+256
+64
+16
+4
±1
Interferon
Fo
ld c
han
ge
Rpt4 and Irf7 mRNA levels increase in response to interferon
• H9c2 cells (rat cardiac embryonic myoblast)• Stimulated with recombinant rat interferon for 3 hours• RNA extracted, assayed by qRT-PCR (SYBR Green I)• 3 independent expts, 3 biological replicates