Genetical Genomics in the Mouse
Finding Genes with Microarray Expression Data
Genetical Genomics
Jansen, R.C. and J.P. Nap (2001). Genetical genomics: the added value from segregation. Trends Genet 17(7): 388-91.
Mouse Genetical Genomics
• BXD recombinant inbred lines
• 21 strains + parents and F1
  – genotypes
    • 508 markers
  – traits
    • forebrain RNA assayed by Affymetrix U74Av2
      – PM probe sequences
      – MM probe sequences
• 1 to 4 microarrays per RI line (average 2.5)
QTL mapping by regression
• Trait vs genotype association
  – Genetically determined difference
    • in expressed RNA level
    • in hybridization of probe sequence
    • in competing hybridization
  – Measured by LRS (likelihood ratio statistic)
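The slide does not give the formula for the LRS. As a minimal sketch, assuming a simple linear regression of the trait on a numerically coded marker genotype with Gaussian errors, the likelihood ratio statistic is n * ln(RSS_null / RSS_full). The function name `lrs_single_marker` and the -1/+1 coding of the B and D alleles are illustrative assumptions, not taken from the slides.

```python
import math

def lrs_single_marker(trait, genotype):
    """Likelihood ratio statistic for regressing a trait on one marker.

    trait:    one mean expression value per strain
    genotype: numeric allele code per strain (assumed -1 = B, +1 = D)
    """
    n = len(trait)
    mean_y = sum(trait) / n
    mean_x = sum(genotype) / n
    sxx = sum((x - mean_x) ** 2 for x in genotype)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotype, trait))
    # Residual sum of squares under the null model (intercept only)
    rss_null = sum((y - mean_y) ** 2 for y in trait)
    if sxx == 0 or rss_null == 0:
        return 0.0  # marker or trait does not vary: no association to test
    # Residual sum of squares after fitting the marker effect
    rss_full = rss_null - sxy ** 2 / sxx
    rss_full = max(rss_full, 1e-12 * rss_null)  # guard against log of zero
    return n * math.log(rss_null / rss_full)

# Four strains, trait rises with the D allele
print(lrs_single_marker([1.0, 2.0, 3.0, 4.0], [-1, -1, 1, 1]))
```

A larger LRS means the marker genotype explains more of the between-strain variation in the trait.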
BXD Marker Distribution
[Figure: marker location (y-axis, 0.0 to 1.2) vs. marker number (x-axis, 0 to 500), with chromosome labels 1 through 19 (odd-numbered chromosomes labeled)]
Trait Data Preparation
• 12,422 probesets (traits)
  – 16 PM & 16 MM probes (oligonucleotides)
  – average PM-MM difference
    • log2-transform average difference
    • normalize data of each microarray to common mean and standard deviation
    • average replicate microarrays
• 400,000 PM & MM probes (cells)
  – log2-transform cell intensity
  – normalize and average replicate arrays
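The preparation steps above can be sketched as follows. This is an illustration under stated assumptions: the target mean and standard deviation (8 and 2 here) are placeholder values, since the slide does not give the common values used, and the input is assumed to be strictly positive so that the log2 transform is defined. The function names are hypothetical.

```python
import math
import statistics

def normalize_array(values, target_mean=8.0, target_sd=2.0):
    """log2-transform one microarray, then rescale to a common mean and SD.

    target_mean and target_sd are assumed placeholder values.
    """
    logged = [math.log2(v) for v in values]  # requires v > 0
    mu = statistics.fmean(logged)
    sd = statistics.stdev(logged)
    return [target_mean + target_sd * (x - mu) / sd for x in logged]

def average_replicates(arrays):
    """Average normalized replicate arrays element-wise (one value per trait)."""
    return [statistics.fmean(column) for column in zip(*arrays)]

# Two replicate arrays of the same RI line, differing only in overall brightness
rep1 = normalize_array([1.0, 2.0, 4.0, 8.0])
rep2 = normalize_array([10.0, 20.0, 40.0, 80.0])
print(average_replicates([rep1, rep2]))
```

Because each array is rescaled after the log transform, a uniform brightness difference between replicates drops out before averaging.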
Multiple testing problem
• Two levels of multiple testing
  – Each trait or probe vs 508 loci
  – 12,422 traits or 400,000 probes
• Strategy
  – Empirical p-value for multiple loci
    • measures significance of single best association
  – Benjamini-Hochberg procedure for multiple traits or probes
    • may declare many significant associations
    • assumes at least one significant association
Empirical p-value
• Measures genome-wide significance
  – converts multiple test into single test
  – significance of best association among all loci
• Permutation test for distribution under null
  – up to 10^6 scans with permuted trait values
  – record largest LRS for each permutation
• Find p-value of original regression from its rank in the null distribution
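The permutation scheme above can be sketched as below. This is a minimal illustration, not the authors' implementation: the LRS is assumed to come from a single-marker linear regression with Gaussian errors, genotypes are assumed to be coded -1/+1, and the p-value uses the common (exceedances + 1) / (permutations + 1) convention, which the slides do not specify. All function names are hypothetical.

```python
import math
import random

def lrs(trait, genotype):
    """n * ln(RSS_null / RSS_full) for a one-marker linear regression."""
    n = len(trait)
    my, mx = sum(trait) / n, sum(genotype) / n
    sxx = sum((x - mx) ** 2 for x in genotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotype, trait))
    rss_null = sum((y - my) ** 2 for y in trait)
    if sxx == 0 or rss_null == 0:
        return 0.0
    rss_full = max(rss_null - sxy ** 2 / sxx, 1e-12 * rss_null)
    return n * math.log(rss_null / rss_full)

def max_lrs(trait, markers):
    """Largest LRS over all marker loci (one genome scan)."""
    return max(lrs(trait, g) for g in markers)

def empirical_p_value(trait, markers, n_perm=1000, seed=0):
    """Genome-wide p-value of the best association, by permuting trait values."""
    rng = random.Random(seed)
    observed = max_lrs(trait, markers)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any trait-genotype link
        if max_lrs(shuffled, markers) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

markers = [[-1] * 6 + [1] * 6, [-1, 1] * 6]
trait = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 5.0, 5.1, 5.2, 5.1, 5.0, 5.2]
print(empirical_p_value(trait, markers, n_perm=500))
```

Recording only the maximum LRS per permuted scan is what turns the 508 per-locus tests into a single genome-wide test.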
Outliers
[Figure: Distribution of significance thresholds; significant threshold (y-axis, 10 to 22) vs. suggestive threshold (x-axis, 7 to 11)]
• Examine permutation test distribution for bimodality
  – Compare 37th and 95th percentile values
• Find outlier and assign next most extreme value
• Redo permutation test and regression
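The two mechanical steps on this slide can be sketched as follows. The slides do not say how an outlier is identified beyond inspecting the permutation distribution, so this sketch simply assumes the strain value farthest from the median has already been judged an outlier and replaces it with the next most extreme value on the same side. The function names are hypothetical.

```python
import statistics

def permutation_thresholds(null_max_lrs):
    """37th and 95th percentile of the permutation (null) maximum-LRS values."""
    cuts = statistics.quantiles(null_max_lrs, n=100)  # 99 cut points
    return cuts[36], cuts[94]  # 37th and 95th percentiles

def assign_next_most_extreme(values):
    """Replace the value farthest from the median with its nearest neighbor
    in rank on the same side. Returns the new list and the replaced index."""
    med = statistics.median(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - med))
    others = values[:idx] + values[idx + 1:]
    result = list(values)
    result[idx] = max(others) if values[idx] > med else min(others)
    return result, idx

cleaned, replaced = assign_next_most_extreme([1.0, 1.2, 0.9, 1.1, 9.0])
print(cleaned, replaced)
```

After the replacement, the permutation test and the regression are rerun on the cleaned trait values.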
Benjamini-Hochberg test
• Test of 100 uniformly distributed p-values (p-values from non-significant results)
• P-values as blue dots
• Significance threshold for FDR = 0.2 as red line
[Figure: ranked p-values (y-axis, 0 to 0.2) vs. rank (x-axis, 0 to 20)]
Benjamini-Hochberg test
• Test of 10 low p-values (significant results) mixed with 90 p-values from non-significant results
• P-values as blue dots
• Significance threshold for FDR = 0.2 as red line
• Eleven cases declared significant
[Figure: ranked p-values vs. rank; the cases below the red line are labeled "Declare significant"]
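The red line in these plots corresponds to the Benjamini-Hochberg step-up rule: with m tests, the p-value of rank r is compared with r * FDR / m, and every test up to the largest rank that falls on or below the line is declared significant. A minimal sketch follows; the function name and the example p-values are illustrative, not data from the slides.

```python
def benjamini_hochberg(p_values, fdr=0.2):
    """Return the indices of tests declared significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by rising p
    k = 0  # largest rank whose p-value is on or below the threshold line
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            k = rank
    return sorted(order[:k])

# 10 low p-values mixed with 90 p-values from non-significant results
low = [0.0001 * (i + 1) for i in range(10)]
high = [0.3 + 0.007 * i for i in range(90)]
print(benjamini_hochberg(low + high, fdr=0.2))
```

Note that the rule is step-up: a p-value above the line at a low rank is still declared significant if some higher-ranked p-value falls below the line.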
Empirical P-value Calculation
[Flowchart: marker regression mapping → maximum genome-wide LRS → permutation tests of increasing size (500x, 5,000x, 50,000x, 1,000,000x), each yielding a p-value, with a decision point ("?") between successive stages]
Trait-locus associations
• Ranked P-values as blue dots (90 smallest from 12,422)
• Significance threshold as red line
• Cases below red line are significant for FDR = 0.2
• 75 significant trait-locus associations
[Figure: ranked p-values (y-axis, 0 to 0.0025) vs. rank (x-axis, 0 to 80)]
Probe-locus associations
• Ranked P-values as blue dots (600 smallest from ~400,000)
• Significance threshold as red line
• Cases below red line are significant for FDR = 0.2
• 576 significant probe-locus associations
[Figure: ranked p-values (y-axis, 0 to 4.0E-04) vs. rank (x-axis, 0 to 600)]
QTLs from MM probes
• 576 QTLs defined by single microarray probes
  – 454 (79%) by PM probes
  – 122 (21%) by MM probes
• Proportion of PM-probe QTLs declines as p-value increases
[Figure: Fraction of PM probes among QTLs; average for window of 50 (y-axis, 0.00 to 1.00) vs. rank (x-axis, 0 to 600)]
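The "average for window of 50" in the plot can be sketched as a moving fraction over QTLs ranked by p-value. Whether the original used overlapping or non-overlapping windows is not stated; this sketch assumes a sliding window that advances one rank at a time, and the function name is hypothetical.

```python
def windowed_pm_fraction(is_pm_by_rank, window=50):
    """Fraction of PM probes in each sliding window of ranked QTLs.

    is_pm_by_rank: booleans ordered by increasing p-value
                   (True = PM probe, False = MM probe)
    """
    fractions = []
    for start in range(len(is_pm_by_rank) - window + 1):
        chunk = is_pm_by_rank[start:start + window]
        fractions.append(sum(chunk) / window)
    return fractions

# Toy ranking: PM probes dominate the smallest p-values
ranked = [True] * 50 + [False] * 50
fractions = windowed_pm_fraction(ranked, window=50)
print(fractions[0], fractions[-1])
```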
QTLs from cell-level mapping
• 576 cell-marker associations (QTLs)
  – 339 traits (probesets) represented
  – most probesets represented by a single probe
  – rarely, two or more significant probes from same probeset
  – all probes from one probeset identify same locus
  – 79% of probes are PM
QTLs from PM cells only
• 454 PM cells defining QTLs
  – 288 traits (probesets) represented
    • 184 controlled by location on the same chr
    • 88 controlled by location on different chr
    • 16 unknown location for probeset
  – 147 locations (marker loci) with nearby QTLs, distributed on all chromosomes
Probe-locus associations among traits
• 339 traits (probesets) with probes identifying significant QTLs
• 186 traits represented by a single probe
• 2 traits represented by 10 probes
[Figure: Distribution among probesets of PM probes detecting a QTL; frequency (y-axis, 0 to 225) vs. probes per probeset (x-axis, 1 to 10)]
QTL distribution among marker loci
• 147 loci identified by at least one significant probe-locus association
• multiple associations to one locus
  – multiple probes from one probeset
  – multiple QTL near locus
[Figure: Distribution among marker loci of QTLs detected by PM probes; frequency (y-axis, 0 to 70) vs. probes per marker (x-axis, 1 to 17)]
Profiles of probe sensitivity
Li, C. and W.H. Wong (2001). Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. PNAS 98: 31-36.
Probe profiles (best)
• LRS vs probe number
• Probesets with highest significance in probeset-level mapping
[Figure: four panels (93269_at, 96156_at, 94426_at, 94244_at) plotting LRS against probe number (1 to 16), with separate series for PM and MM probes]
Probe profiles (worst)
• LRS vs probe number
• Probesets with lowest significant association in probeset-level mapping
[Figure: four panels (93583_s_at, 102321_at, 102776_at, 93730_at) plotting LRS against probe number (1 to 16), with separate series for PM and MM probes]
Distribution of controlled loci
[Figure: Controlled traits by chromosome; relative frequency (y-axis, 0 to 5) vs. probeset chromosome (x-axis, 1 to 19), with separate series for syntenic (Syn) and nonsyntenic (Nonsyn) associations]
Distribution of controlling loci
[Figure: Controlling loci by chromosome; relative frequency (y-axis, 0 to 5) vs. QTL chromosome (x-axis, 1 to 19), with separate series for syntenic (Syn) and nonsyntenic (Nonsyn) associations]
Chr 9 QTLs
• Unusual number of chr 9 QTLs (22) controlling sequences on other chrs
• Normalized frequency 3-fold greater than average chr
• Many of these QTLs cluster near 2 loci on chr 9
[Figure: Chr 9 QTLs controlling nonsyntenic sequences; number of sequences controlled (y-axis, 0 to 8) vs. position of QTLs (x-axis, 0.00 to 0.60), with markers D9Mit253 and D9Mit18 labeled]
Acknowledgments
• Robert W Williams
  – Lu Lu
  – S Shou
  – Yanhua Qu
  – Elissa Chesler
• John D Mountz
  – Hui Chen Hsu
• David Threadgill
• Gene Hwang
• Dan Nettleton
• Jintao Wang
• Ram Varma
• Jianxin Wang
• Mark Brady
• Gene Sobel
U Tennessee, Memphis
U Alabama, Birmingham
U North Carolina
Iowa State U
Cornell U
GOG
Gene Expression Core
Bioinformatics