P. falciparum: Examination of Correlation Between Spatial Location and Temporal Expression of Genes

CAMDA Conference, 11 November 2004

JB Christian, C Shaw, J Noyola-Martinez, MC Gustin, DW Scott and R Guerra

2 - Guerra Rudy christian camda04 draft5camda2009.bioinformatics.northwestern.edu/camda04/papers/...Gustin, DW Scott and R Guerra Motivations: • Evidence for correlation in literature




Motivations:

• Evidence for correlation in the literature:
  – Printing artifact
  – Biological
• Improving on the Bozdech threshold
• Developing a visualization and statistical testing methodology


Biological Motivations

[Figure: examples of regulation shared by neighboring genes]
• Operon control (bacteria): promoter, ORF1, ORF2; one mRNA
• Upstream Activating Sequences (yeast): UAS1, UAS2, ORF1, ORF2; separate mRNAs
• Locus Control Region (mammalian globin cluster): LCR1; separate mRNAs


Hypothesis and Statistic

• Statistical: is there correlation between chromosomal location and gene expression?
• Biological: is gene order random?
• H0: no correlation between location on the chromosome and expression
• Consider correlations within partitions of the chromosome


Approach

• Covariogram: general tool
• Partition chromosome, develop statistic
• Permutation testing framework
• Check for confounding factors
• Biological significance


Issues

• Confounding (printing) or other artifacts
• Account for inter-gene distances (as opposed to adjacent pairwise correlation)
• Significance of correlation

[Figure: operon]


Methods: Data

• Need gene information (plasmodb.org has annotated FASTA files):

TCAAGCAATTGTTAGATGAGAACAATAGGAAGAATTTAAATTTTAATGAT

CTGGTTATACACCCTTGGTGGTCTTATAAGAATTAA

>Pfa3D7|pfal_chr1|PFA0135w|Annotation|Sanger(protein coding) hypothetical protein Location=join(124752..124823,124961..125719)

ATGATATTTCATAAATGCTTTAAAATTTGTTCGCTCTCTTGTACTGTTTT

ATGGGTTACCGCCATATCATCGATCATTCAACCAGACAAACAACAAGAAA

• Normalized gpr files (2-D loess, centered and scaled)
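A minimal sketch of pulling a gene's chromosome and coordinates out of a header like the PFA0135w example above. It assumes the `|`-separated layout and the `Location=` / `join(...)` notation seen in that one header; `GeneRecord` and `parse_header` are hypothetical names, and strand notation such as `complement(...)` is not handled. The gene span is taken from the outermost coordinates, and its midpoint is the chromosomal position used for distances between genes.

```python
import re
from typing import NamedTuple

# Matches "Location=join(a..b,c..d)" or "Location=a..b" (layout assumed from the example header).
_LOCATION = re.compile(r"Location=(?:join\()?([\d.,]+)\)?")


class GeneRecord(NamedTuple):
    gene_id: str
    chromosome: str
    start: int  # smallest coordinate in the Location field
    end: int    # largest coordinate in the Location field

    @property
    def midpoint(self) -> float:
        # Chromosomal midpoint of the annotated span.
        return (self.start + self.end) / 2


def parse_header(header: str) -> GeneRecord:
    """Parse a '>Org|chromosome|gene_id|...Location=...' FASTA header (hypothetical helper)."""
    fields = header.lstrip(">").split("|")
    chromosome, gene_id = fields[1], fields[2]
    match = _LOCATION.search(header)
    if match is None:
        raise ValueError(f"no parsable Location= field in {header!r}")
    # "124752..124823,124961..125719" -> [124752, 124823, 124961, 125719]
    coords = [int(c) for segment in match.group(1).split(",") for c in segment.split("..")]
    return GeneRecord(gene_id, chromosome, min(coords), max(coords))


header = (">Pfa3D7|pfal_chr1|PFA0135w|Annotation|Sanger(protein coding) "
          "hypothetical protein Location=join(124752..124823,124961..125719)")
record = parse_header(header)
print(record.gene_id, record.start, record.end, record.midpoint)
# PFA0135w 124752 125719 125235.5
```

The resulting span agrees with the 124752:125719 bp quoted for PFA0135w in the data summary.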


Methods: Data

• FASTA sequence: 5400 predicted genes
• QC microarray: 3800 genes, 5100 probes
• Intersection: 3500 genes with a common gene name

[Figure: joining the two sources for one gene]
PFA0135w: 124752:125719 bp
PFA0135w: probe a16122_1, t1, t2, …, t48
PFA0135w: 124752:125719 bp, probe a16122_1, t1, t2, …, t48
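The intersection step can be sketched as a join on gene name. This assumes gene coordinates are already in a dictionary and that each QC-passed probe carries the name of the gene it targets. Because there are more probes (5100) than genes (3800), the sketch averages several probes for one gene time point by time point; that rule is an assumption, not the authors' stated procedure, and `intersect` is a hypothetical name.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

Span = Tuple[str, int, int]               # (chromosome, start bp, end bp)
Probe = Tuple[str, str, Sequence[float]]  # (probe id, gene name, time course t1..t48)


def intersect(genes: Dict[str, Span],
              probes: List[Probe]) -> Dict[str, Tuple[Span, List[float]]]:
    """Keep genes present in both sources, pairing coordinates with expression."""
    profiles_by_gene: Dict[str, List[Sequence[float]]] = {}
    for _probe_id, gene, profile in probes:
        if gene in genes:  # the common gene name is the join key
            profiles_by_gene.setdefault(gene, []).append(profile)

    merged: Dict[str, Tuple[Span, List[float]]] = {}
    for gene, profiles in profiles_by_gene.items():
        # ASSUMPTION: several probes for one gene are averaged time point by time point.
        mean_profile = [fmean(values) for values in zip(*profiles)]
        merged[gene] = (genes[gene], mean_profile)
    return merged


genes = {"PFA0135w": ("pfal_chr1", 124752, 125719)}
probes = [("a16122_1", "PFA0135w", [0.1, -0.3, 0.4])]  # toy 3-point profile, not real data
print(intersect(genes, probes))
```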


Methods: Covariograms

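$$\gamma(x, y; d_a, d_b) = \operatorname{Ave}\left[\rho(x, y) \mid d_a \le \operatorname{dist}(x, y) < d_b\right]$$

• Covariogram 1: distance is chromosomal location:

$$d(g_i, g_j) = \left| g_{i,\mathrm{midpt(chr\,loc)}} - g_{j,\mathrm{midpt(chr\,loc)}} \right|$$

• Covariogram 2: distance is printed microarray location:

$$d(g_i, g_j) = \sqrt{(g_{i,x} - g_{j,x})^2 + (g_{i,y} - g_{j,y})^2}$$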
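A minimal sketch of an empirical covariogram in the sense of the definition above: Pearson correlations between pairs of expression profiles are averaged within distance bins [d_a, d_b). The distance is passed in as a function, so the same code serves Covariogram 1 (absolute difference of chromosomal midpoints) and Covariogram 2 (Euclidean distance between printed spot coordinates). The bin edges, the function names, and the requirement that no profile be constant are assumptions of this sketch.

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import Callable, List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def covariogram(profiles: Sequence[Sequence[float]],
                dist: Callable[[int, int], float],
                edges: Sequence[float]) -> List[Tuple[float, float, Optional[float], int]]:
    """Return (d_a, d_b, average correlation or None, number of pairs) per bin."""
    n_bins = len(edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i, j in combinations(range(len(profiles)), 2):
        k = bisect_right(edges, dist(i, j)) - 1  # bin k satisfies edges[k] <= d < edges[k+1]
        if 0 <= k < n_bins:
            sums[k] += pearson(profiles[i], profiles[j])
            counts[k] += 1
    return [(edges[k], edges[k + 1], sums[k] / counts[k] if counts[k] else None, counts[k])
            for k in range(n_bins)]


# Covariogram 1: distance between chromosomal midpoints (bp); toy values.
midpoints = [0, 1_000, 5_000]


def chrom_dist(i: int, j: int) -> float:
    return abs(midpoints[i] - midpoints[j])

# Covariogram 2 would use spot coordinates instead, e.g.
# math.hypot(x[i] - x[j], y[i] - y[j]) with hypothetical lists x, y.

profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]  # toy profiles
print(covariogram(profiles, chrom_dist, [0, 2_000, 10_000]))
```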


[Figure: Covariogram 1 and Covariogram 2 for Chr 6 and for Chr 10]


Methods: Partitioning

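• Partition the chromosome into intervals
• Statistic for each interval: average of all pairwise Pearson correlations

[Figure: a 0–120 kb stretch of chromosome divided into two 60 kb intervals]
• 3 genes, $\binom{3}{2} = 3$ pairwise correlations: $\bar r = \frac{1}{3}\sum_{i=1}^{3} r_i$
• 7 genes, $\binom{7}{2} = 21$ pairwise correlations: $\bar r = \frac{1}{21}\sum_{i=1}^{21} r_i$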
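The partition statistic can be sketched directly: collect the genes that fall in an interval and average all $\binom{n}{2}$ pairwise correlations, which gives the 3 and 21 pairs in the two examples. Assigning a gene to an interval by its midpoint, and returning no value when fewer than two genes fall in the interval, are assumptions; `interval_mean_correlation` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interval_mean_correlation(midpoints: Sequence[float],
                              profiles: Sequence[Sequence[float]],
                              start: float, end: float) -> Tuple[Optional[float], int]:
    """Return (r_bar, n_genes) for genes with start <= midpoint < end."""
    members = [p for m, p in zip(midpoints, profiles) if start <= m < end]
    pairs = list(combinations(members, 2))  # n choose 2 pairs
    if not pairs:
        return None, len(members)
    return sum(pearson(a, b) for a, b in pairs) / len(pairs), len(members)


# Pair counts quoted on the slide.
print(math.comb(3, 2), math.comb(7, 2))  # 3 21
```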


Methods: Partitioning

• Chr 6, 40 kb partition
• Significant?


Methods: Permutation Test

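• $\bar r$ in a 40 kb interval on chr 6
• Permutation test
• Null distribution
• Estimated p-values

[Figure: genes $g_1, \dots, g_4$ paired with observed expression profiles $e_1, \dots, e_4$ (obs), then with the profiles reassigned in Perm(1), Perm(2), …, Perm(n); example $\bar r = 0.50$]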
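A sketch of the permutation test under one stated assumption: gene positions stay fixed while expression profiles are shuffled across all genes on the chromosome, as in the Perm(1), …, Perm(n) rows. Shuffling only within the interval would leave $\bar r$ unchanged, because an average over all pairs does not depend on which gene holds which profile. The estimated p-value here is the plain fraction of permutations whose $\bar r$ is at least the observed one. Function names, the number of permutations and the seeds are illustrative choices, not values from the study.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise(profiles: Sequence[Sequence[float]]) -> float:
    """Average Pearson correlation over all pairs (needs at least two profiles)."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def permutation_p_value(all_profiles: List[Sequence[float]],
                        interval_idx: Sequence[int],
                        n_perm: int = 1000,
                        seed: int = 0) -> Tuple[float, float]:
    """Return (observed r_bar, estimated p-value) for the genes at interval_idx."""
    rng = random.Random(seed)
    observed = mean_pairwise([all_profiles[i] for i in interval_idx])
    hits = 0
    shuffled = list(all_profiles)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign profiles to the fixed gene positions
        if mean_pairwise([shuffled[i] for i in interval_idx]) >= observed:
            hits += 1
    return observed, hits / n_perm


rng = random.Random(1)
trend = [[t + 0.01 * k for t in range(8)] for k in range(3)]          # three co-expressed toy genes
noise = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(27)]  # unrelated toy genes
toy_profiles = trend + noise
observed, p_value = permutation_p_value(toy_profiles, [0, 1, 2], n_perm=500)
print(f"r_bar = {observed:.3f}, estimated p = {p_value:.3f}")
```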


Methods: Permutation Test

• Distribution of $\bar r$ in a 40 kb interval

$\bar r_{obs} = 0.57$, $n_{genes} = 2$, p-value = 0.22


Methods: Permutation Test

• Distribution of $\bar r$ in a 40 kb interval

$\bar r_{obs} = 0.72$, $n_{genes} = 6$, p-value ≤ 0.001


Methods: Permutation Test

• Distribution of $\bar r$ in a 40 kb interval

$\bar r_{obs} = 0.49$, $n_{genes} = 9$, p-value = 0.002


Methods: Permutation Test

• Distribution of $\bar r$ in a 40 kb interval

$\bar r_{obs} = 0.018$, $n_{genes} = 12$, p-value = 0.475
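Because these p-values are estimated from a finite number of permutations, each carries Monte Carlo error. A common rough guide, assuming n independent permutations (the slides do not state n), is the binomial standard error $\sqrt{\hat p(1-\hat p)/n}$; with n = 1000, for instance, the smallest nonzero estimate would be 0.001. The helper name and the value n = 1000 below are assumptions for illustration.

```python
import math


def p_value_standard_error(p_hat: float, n_perm: int) -> float:
    """Binomial standard error of a p-value estimated from n_perm permutations."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_perm)


# Point estimates reported for three of the 40 kb examples; n_perm = 1000 is assumed.
for p_hat in (0.22, 0.002, 0.475):
    print(f"p = {p_hat:<6} SE ~ {p_value_standard_error(p_hat, 1000):.4f}")
```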


Significant Intervals (Chr 7)

[Figure: significant intervals at window sizes 10 kb, 20 kb, 40 kb, 60 kb, 80 kb and 100 kb]




[Figure: intervals at window sizes 10 kb, 20 kb, 40 kb, 60 kb, 80 kb and 100 kb]


MAL6P1.273: hypothetical protein

MAL6P1.272: ribonuclease

MAL6P1.271: cdc2-like protein kinase

MAL6P1.268: hypothetical protein

MAL6P1.267: hypothetical protein

MAL6P1.266: hypothetical protein

MAL6P1.265: pyridoxine kinase

MAL6P1.263: hypothetical protein

MAL6P1.260: hypothetical protein

MAL6P1.259: hypothetical protein

MAL6P1.258: malate:quinone oxidoreductase

MAL6P1.257: hypothetical protein


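Intervals (Chr 6)

| Start Loc (bp) | Size (kb) | Start (kb) | End (kb) | n genes | Avg Cor | p-value |
|---|---|---|---|---|---|---|
| 0 | 10 | 550 | 560 | 3 | 0.86 | 0.003 |
| 10000 | 20 | 550 | 570 | 3 | 0.86 | 0.004 |
| 2500 | 10 | 552.5 | 562.5 | 3 | 0.86 | 0.002 |
| 75000 | 100 | 675 | 775 | 14 | 0.27 | 0.003 |
| 30000 | 60 | 690 | 750 | 10 | 0.44 | 0.003 |
| 30000 | 40 | 710 | 750 | 9 | 0.39 | 0.004 |
| 10000 | 40 | 930 | 970 | 5 | 0.76 | 0.001 |
| 30000 | 60 | 930 | 990 | 8 | 0.57 | 0.001 |
| 15000 | 20 | 935 | 955 | 2 | 0.96 | 0 |
| 20000 | 40 | 940 | 980 | 5 | 0.64 | 0.002 |
| 60000 | 80 | 940 | 1020 | 11 | 0.39 | 0.003 |
| 5000 | 20 | 945 | 965 | 4 | 0.76 | 0.002 |
| 45000 | 60 | 945 | 1005 | 9 | 0.51 | 0 |
| 10000 | 20 | 950 | 970 | 4 | 0.76 | 0.002 |
| 5000 | 10 | 955 | 965 | 3 | 0.87 | 0.004 |
| 7500 | 10 | 957.5 | 967.5 | 3 | 0.87 | 0.002 |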
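The Start Loc values in the Chr 6 interval table appear consistent with partition grids shifted by a quarter of the window size: the 10 kb intervals use offsets of 0, 2500, 5000 and 7500 bp (for example, 957.5 kb = 7.5 kb + 95 × 10 kb), and the 60 kb intervals use 30000 and 45000 bp. A sketch of generating such grids follows. The quarter-window rule is inferred from the table rather than stated, `quarter_offsets` and `grid_intervals` are hypothetical names, and the stretch before the first offset is simply skipped.

```python
from typing import Iterator, List, Tuple


def quarter_offsets(window_bp: int) -> List[int]:
    """Offsets 0, w/4, w/2, 3w/4 (ASSUMPTION inferred from the Start Loc column)."""
    return [k * window_bp // 4 for k in range(4)]


def grid_intervals(chrom_len_bp: int, window_bp: int,
                   offset_bp: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open intervals [start, start + window) beginning at offset_bp.

    The last interval may run past the chromosome end; bases before offset_bp are skipped.
    """
    start = offset_bp
    while start < chrom_len_bp:
        yield start, start + window_bp
        start += window_bp


for w in (10_000, 20_000, 40_000, 60_000, 80_000, 100_000):
    print(w, quarter_offsets(w))
```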


Results: Summary Table

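| Chromosome | 10 kb | 60 kb | 100 kb | 10 kb in 60 kb |
|---|---|---|---|---|
| Chr 3 | 3/400 | 0/68 | 0/40 | 0 |
| Chr 4 | 10/476 | 5/80 | 2/48 | 4 |
| Chr 5 | 6/528 | 1/88 | 3/56 | 0 |
| Chr 14 | 4/1304 | 2/220 | 1/132 | 0 |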
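One plausible reading of the "10 kb in 60 kb" column is the number of significant 10 kb intervals lying entirely inside a significant 60 kb interval. Under that assumption the count is a containment check; applied to the significant Chr 6 intervals listed earlier (Chr 6 is not in this summary table), it finds two nested 10 kb intervals. `count_nested` is a hypothetical name.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start kb, end kb)


def count_nested(small: Sequence[Interval], large: Sequence[Interval]) -> int:
    """Count small intervals fully contained in at least one large interval."""
    return sum(1 for s0, s1 in small
               if any(l0 <= s0 and s1 <= l1 for l0, l1 in large))


# Significant Chr 6 intervals from the interval table (kb).
sig_10kb = [(550, 560), (552.5, 562.5), (955, 965), (957.5, 967.5)]
sig_60kb = [(690, 750), (930, 990), (945, 1005)]
print(count_nested(sig_10kb, sig_60kb))  # 2
```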


Conclusions

• Statistical: Significance for both small regions of strong correlation and large regions of weak correlation

• Biological: Evidence for regulation at multiple levels