P. falciparum: Examination of Correlation Between Spatial Location and Temporal Expression of Genes
CAMDA Conference, 11 November 2004
JB Christian, C Shaw, J Noyola-Martinez, MC Gustin, DW Scott and R Guerra
Motivations:
• Evidence for correlation in literature
– Printing artifact
– Biological
• Improving on Bozdech threshold
• Develop a visualization and statistical testing methodology
Biological Motivations
[Figure, three panels: Operon control (bacteria); Upstream Activating Sequences (yeast); Locus Control Region (mammalian globin cluster). Diagram labels: promoter, UAS1, UAS2, LCR1, ORF1, ORF2, mRNA, mRNAs]
Hypothesis and Statistic
• Statistical: Is there correlation between chromosomal location and gene expression?
• Biological: Is gene order random?
• H0: no correlation between location on chromosome and expression
• Consider correlations in partitions
Approach
• Covariogram: General Tool
• Partition Chromosome, Develop Statistic
• Permutation Testing Framework
• Check for Confounding Factors
• Biological Significance
Issues
• Confounding (printing) or other artifacts
• Account for inter-gene distances (as opposed to adjacent pairwise correlation)
• Significance of correlation
operon
Methods: Data
• Need gene information (plasmodb.org has annotated FASTA files):
TCAAGCAATTGTTAGATGAGAACAATAGGAAGAATTTAAATTTTAATGAT
CTGGTTATACACCCTTGGTGGTCTTATAAGAATTAA
>Pfa3D7|pfal_chr1|PFA0135w|Annotation|Sanger(protein coding) hypothetical protein Location=join(124752..124823,124961..125719)
ATGATATTTCATAAATGCTTTAAAATTTGTTCGCTCTCTTGTACTGTTTT
ATGGGTTACCGCCATATCATCGATCATTCAACCAGACAAACAACAAGAAA
• Normalized gpr files (2-D loess, centered and scaled)
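The header line shown above carries the gene name and its exon coordinates. A minimal sketch of pulling those out, assuming the pipe-separated layout of that one sample header holds for the whole file; the function name `parse_header` and the choice to take the midpoint between the outermost coordinates are illustrative assumptions, not something the slides specify:

```python
import re

def parse_header(header):
    """Return (gene_id, start, end, midpoint) from one annotated FASTA header.

    Assumes the third pipe-separated field is the gene name and that all
    coordinates appear inside a single Location=... token (hypothetical
    helper, modeled only on the sample header in the slides).
    """
    fields = header.lstrip(">").split("|")
    gene_id = fields[2]
    # Exons may be joined, so collect every number inside Location=...
    location = re.search(r"Location=(\S+)", header).group(1)
    coords = [int(c) for c in re.findall(r"\d+", location)]
    start, end = min(coords), max(coords)
    return gene_id, start, end, (start + end) / 2

header = (">Pfa3D7|pfal_chr1|PFA0135w|Annotation|Sanger(protein coding) "
          "hypothetical protein Location=join(124752..124823,124961..125719)")
print(parse_header(header))  # ('PFA0135w', 124752, 125719, 125235.5)
```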
Methods: Data
• FASTA sequence: 5400 predicted genes
• QC microarray: 3800 genes, 5100 probes
• Intersection: 3500 genes with common gene name
• Example: PFA0135w, 124752:125719 bp, probe a16122_1, time points t1, t2, …, t48
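The intersection step amounts to a join on gene name between the FASTA annotation and the probes that passed QC. A small sketch of that join, assuming both sides are already in dictionaries; every name except PFA0135w and a16122_1 is a made-up placeholder:

```python
# Gene name -> (start, end) in bp, as read from the FASTA headers.
# "GENE_ONLY_IN_FASTA" is a hypothetical entry for illustration.
fasta_genes = {"PFA0135w": (124752, 125719), "GENE_ONLY_IN_FASTA": (1, 100)}

# Probe name -> gene name for probes that passed QC; several probes may map
# to one gene. "probe_x", "probe_y" and "GENE_ONLY_ON_ARRAY" are hypothetical.
qc_probes = {
    "a16122_1": "PFA0135w",
    "probe_x": "PFA0135w",
    "probe_y": "GENE_ONLY_ON_ARRAY",
}

def intersect_by_gene_name(fasta_genes, qc_probes):
    """Keep only genes present in both sources, grouping their probes."""
    merged = {}
    for probe, gene in qc_probes.items():
        if gene in fasta_genes:
            entry = merged.setdefault(
                gene, {"location": fasta_genes[gene], "probes": []}
            )
            entry["probes"].append(probe)
    return merged

merged = intersect_by_gene_name(fasta_genes, qc_probes)
print(merged)
```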
Methods: Covariograms
$\gamma(x, y; d_a, d_b) = \mathrm{Ave}[\rho(x, y) \mid d_a \le \mathrm{dist}(x, y) < d_b]$
• Covariogram 1: distance is chromosomal location:
$d(g_i, g_j) = g_{i,\mathrm{midpt\ chr\ loc}} - g_{j,\mathrm{midpt\ chr\ loc}}$
• Covariogram 2: distance is printed microarray location:
$d(g_i, g_j) = \sqrt{(g_{i,x} - g_{j,x})^2 + (g_{i,y} - g_{j,y})^2}$
[Figures: Covariogram 1 and Covariogram 2 for Chr 10 and for Chr 6]
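The covariogram definition can be illustrated with a short sketch: average the pairwise Pearson correlation of expression profiles over all gene pairs whose distance falls in a bin [d_a, d_b). The function names, the toy data, and the use of an absolute difference for the chromosomal distance are assumptions made for the example; passing a different `distance` function (for instance a Euclidean distance on array coordinates) would give the second covariogram.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def covariogram(positions, profiles, bin_edges,
                distance=lambda a, b: abs(a - b)):
    """Average pairwise correlation for each distance bin [d_a, d_b).

    Returns one value per bin, or None for a bin with no gene pairs.
    """
    n_bins = len(bin_edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i, j in combinations(range(len(positions)), 2):
        d = distance(positions[i], positions[j])
        rho = pearson(profiles[i], profiles[j])
        for k in range(n_bins):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                sums[k] += rho
                counts[k] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]

# Toy data: two neighboring co-expressed genes and one distant, anti-correlated gene.
positions = [0, 10, 50]
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
gamma = covariogram(positions, profiles, bin_edges=[0, 20, 100])
print(gamma)
```

In the toy data the near bin averages one pair with correlation 1 and the far bin averages two pairs with correlation -1.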
Methods: Partitioning
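• Partition
• Avg of all pairwise Pearson correlations
[Figure: chromosome axis marked at 0 kb, 60 kb, 120 kb, with two partitions]
• 7 genes, $\binom{7}{2} = 21$ pairwise correlations: $\bar{r}_1 = \frac{1}{21} \sum_{i=1}^{21} r_i$
• 3 genes, $\binom{3}{2} = 3$ pairwise correlations: $\bar{r}_2 = \frac{1}{3} \sum_{i=1}^{3} r_i$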
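The partition statistic can be sketched as follows: assign each gene to a fixed-width interval by its midpoint, then average all pairwise Pearson correlations within each interval. The function names, the `offset_bp` parameter (a guess at how shifted partitions might be generated), and the toy profiles are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_pairwise_r(profiles):
    """Average of all C(n, 2) pairwise correlations among the profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def partition_stats(genes, size_bp, offset_bp=0):
    """Map interval start (bp) -> (n_genes, r_bar) for intervals with >= 2 genes.

    genes is a list of (midpoint_bp, expression_profile) tuples.
    """
    bins = {}
    for midpoint, profile in genes:
        if midpoint < offset_bp:
            continue  # gene lies before the first interval
        start = offset_bp + ((midpoint - offset_bp) // size_bp) * size_bp
        bins.setdefault(start, []).append(profile)
    return {s: (len(p), mean_pairwise_r(p))
            for s, p in sorted(bins.items()) if len(p) >= 2}

# Toy data: three genes in 0-60 kb, two genes in 60-120 kb.
genes = [
    (5_000, [1, 2, 3, 4]),
    (20_000, [2, 4, 6, 8]),
    (55_000, [4, 3, 2, 1]),
    (70_000, [1, 2, 3, 4]),
    (100_000, [1, 3, 5, 7]),
]
stats = partition_stats(genes, size_bp=60_000)
print(stats)
```

In the first interval the three pairwise correlations are 1, -1 and -1, so the average is -1/3; the second interval has a single pair with correlation 1.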
Methods: Partitioning
• Chr 6, 40 kb partition
• Significant?
Methods: Permutation Test
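• $\bar{r}$ in a 40 kb interval on chr 6
• Permutation test
• Null distribution
• Estimated p-values
[Figure: the observed data assign expression profiles e1, e2, e3, e4 to genes g1, g2, g3, g4; Perm(1), Perm(2), …, Perm(n) each reassign the profiles among the genes; label: $\bar{r} = .50$]
[Figure: null distribution of $\bar{r}$; $\bar{r}_{obs} = 0.57$, $n_{genes} = 2$, p-val = 0.22]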
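A minimal sketch of a permutation test for one interval. The slides do not state the exact scheme, so the following are assumptions: profiles are shuffled across all genes on the chromosome, the statistic is recomputed for the gene slots inside the interval, and the p-value is the plain fraction of permutations whose statistic is at least the observed one (one-sided). The function names and the synthetic sine-wave profiles are illustrative only.

```python
import random
from itertools import combinations
from math import pi, sin, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_pairwise_r(profiles):
    """Average of all pairwise correlations among the profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def permutation_p_value(profiles, interval_idx, n_perm=1000, seed=0):
    """Return (observed r_bar, estimated one-sided p-value) for one interval."""
    rng = random.Random(seed)
    observed = mean_pairwise_r([profiles[i] for i in interval_idx])
    shuffled = list(profiles)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign profiles to gene positions
        permuted = mean_pairwise_r([shuffled[i] for i in interval_idx])
        if permuted >= observed:
            hits += 1
    return observed, hits / n_perm

# Synthetic chromosome: 30 genes, 48 time points each. Genes 10-12 share a
# phase (plus small noise); the rest have random phases.
data_rng = random.Random(1)
def profile(phase, noise):
    return [sin(2 * pi * t / 48 + phase) + data_rng.uniform(-noise, noise)
            for t in range(48)]
profiles = [profile(data_rng.uniform(0, 2 * pi), 0.0) for _ in range(30)]
for i in (10, 11, 12):
    profiles[i] = profile(0.0, 0.05)

r_obs, p_val = permutation_p_value(profiles, interval_idx=[10, 11, 12])
print(r_obs, p_val)
```

With this estimator a p-value of exactly 0 is possible when no permutation reaches the observed value; adding 1 to both numerator and denominator is a common alternative that avoids it.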
Methods: Permutation Test
• Distribution of $\bar{r}$ in 40 kb interval
[Figure: null distribution of $\bar{r}$; $\bar{r}_{obs} = 0.72$, $n_{genes} = 6$, p-val ≤ 0.001]
Methods: Permutation Test
• Distribution of $\bar{r}$ in 40 kb interval
[Figure: null distribution of $\bar{r}$; $\bar{r}_{obs} = 0.49$, $n_{genes} = 9$, p-val = 0.002]
Methods: Permutation Test
• Distribution of $\bar{r}$ in 40 kb interval
[Figure: null distribution of $\bar{r}$; $\bar{r}_{obs} = 0.018$, $n_{genes} = 12$, p-val = 0.475]
Methods: Permutation Test
• Distribution of $\bar{r}$ in 40 kb interval
Significant Intervals (Chr 7)
[Figure: significant intervals on Chr 7 for partition sizes 10 kb, 20 kb, 40 kb, 60 kb, 80 kb, 100 kb]
[Figure: intervals for partition sizes 10 kb, 20 kb, 40 kb, 60 kb, 80 kb, 100 kb, annotated with the following genes]
MAL6P1.273: hypothetical protein
MAL6P1.272: ribonuclease
MAL6P1.271: cdc2-like protein kinase
MAL6P1.268: hypothetical protein
MAL6P1.267: hypothetical protein
MAL6P1.266: hypothetical protein
MAL6P1.265: pyridoxine kinase
MAL6P1.263: hypothetical protein
MAL6P1.260: hypothetical protein
MAL6P1.259: hypothetical protein
MAL6P1.258: malate:quinone oxidoreductase
MAL6P1.257: hypothetical protein
Intervals (Chr 6)
p-val   Avg Cor   ngenes   Start kb   End kb   Size kb   Start Loc
0.003   0.86      3        550        560      10        0
0.004   0.86      3        550        570      20        10000
0.002   0.86      3        552.5      562.5    10        2500
0.003   0.27      14       675        775      100       75000
0.003   0.44      10       690        750      60        30000
0.004   0.39      9        710        750      40        30000
0.001   0.76      5        930        970      40        10000
0.001   0.57      8        930        990      60        30000
0       0.96      2        935        955      20        15000
0.002   0.64      5        940        980      40        20000
0.003   0.39      11       940        1020     80        60000
0.002   0.76      4        945        965      20        5000
0       0.51      9        945        1005     60        45000
0.002   0.76      4        950        970      20        10000
0.004   0.87      3        955        965      10        5000
0.002   0.87      3        957.5      967.5    10        7500
Results: Summary Table
         10kb     60kb    100kb   10kb in 60kb
Chr 3    3/400    0/68    0/40    0
Chr 4    10/476   5/80    2/48    4
Chr 5    6/528    1/88    3/56    0
Chr 14   4/1304   2/220   1/132   0
Conclusions
• Statistical: Significance for both small regions of strong correlation and large regions of weak correlation
• Biological: Evidence for regulation at multiple levels