

Analysis of 14 Coccidioides fungal genome sequences highlights incomplete speciation and natural selection

Daniel E. Neafsey¹, Bridget Barker², Garry T. Cole³, John Galgiani², Matthew R. Henn¹, ChiungYu Hung³, Theo Kirkland⁴, Scott Kroken², Cody McMahan³, Marc Orbach², Daniel Park¹, Steve Rounsley², Thomas J. Sharpton⁵, Jason E. Stajich⁵, John W. Taylor⁵, Emily Whiston⁵, Bruce W. Birren¹

¹ Broad Institute of MIT and Harvard, Cambridge, MA, USA ([email protected]).
² University of Arizona, Tucson, AZ, USA.
³ University of Texas, San Antonio, TX, USA.
⁴ University of California, San Diego, CA, USA.
⁵ University of California at Berkeley, Berkeley, CA, USA.

ABSTRACT

We have fully sequenced the genomes of 4 C. immitis isolates and 9 C. posadasii isolates, allowing us to explore regional variation in patterns of intraspecific diversity and interspecific divergence. These genome sequences, in combination with the C. posadasii C735 genome previously sequenced by JCVI, offer an excellent resource for improving our understanding of these dimorphic fungal pathogens, which are the etiological agents of coccidioidomycosis (Valley Fever). Through analysis of all 14 genomes, we find that although C. immitis and C. posadasii nominally diverged at least 5 million years ago, extensive regions of their genomes show evidence of recent gene flow, even though the majority of the genome exhibits complete genetic isolation. We explore the signal of natural selection across all genes in both species and identify signals of positive selection in membrane- or cell-wall-associated proteins. These selection signals may indicate that immune-mediated selection pressure from mammalian hosts is an important driver of Coccidioides evolution, and they help to clarify the relative importance of the saprophytic vs. parasitic phases of the Coccidioides life cycle.

CONCLUSIONS

• C. immitis and C. posadasii, the causative agents of Desert Valley Fever, are nominally distinct species that selectively exchange 7-10% of genes.

• Cell-surface and exported proteins exhibit signs of positive selection, suggesting immune pressure.

• Knowledge of introgression and selection patterns will inform vaccine design.

Desert Valley Fever Facts

• Coccidioides is on the CDC list of select agents

• Infection is caused by inhalation of arthroconidia

• >100,000 infections annually in the US

• 35% of infections are symptomatic; illness can last months

• 5% require medical care

• 30% of all pneumonia cases in Arizona

• Infections that spread past the lungs (to bones, joints, skin, brain) can persist for life

C. immitis

C. posadasii

Evidence of Immune-Mediated Selection

PFAM ID    P value        Description
PF00904    7.26 × 10⁻⁴    Involucrin repeat (membrane-associated)
PF02370    2.40 × 10⁻³    M protein repeat (virulence; IgA binding; resistance to opsonophagocytosis)
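The poster does not say which test produced the PFAM P values above. As a hedged sketch, assuming a one-sided hypergeometric test (a common choice for domain enrichment), the probability of observing at least `hits` domain-bearing genes among `n_selected` positively selected genes could be computed as below. The function and parameter names are hypothetical, and no correction for testing many domains is applied.

```python
from math import comb


def enrichment_p(hits: int, n_selected: int, n_with_domain: int, n_total: int) -> float:
    """One-sided hypergeometric P(X >= hits).

    Assumed setup (hypothetical): n_total annotated genes, n_with_domain of
    which carry a given PFAM domain, n_selected flagged as positively selected.
    """
    upper = min(n_with_domain, n_selected)
    tail = sum(
        comb(n_with_domain, i) * comb(n_total - n_with_domain, n_selected - i)
        for i in range(hits, upper + 1)
    )
    # math.comb returns 0 when choosing more items than exist, so impossible
    # configurations contribute nothing to the tail sum.
    return tail / comb(n_total, n_selected)
```

With genome-scale inputs (for example `n_total` near the 10,355 annotated genes), Python's arbitrary-precision integers keep the binomial coefficients exact until the final division.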

Heterogeneity in Ci vs Cp Divergence

Divergence between C. immitis (Ci) and C. posadasii (Cp), as measured by Fst, is nearly complete along most of each chromosome (Fst ≈ 1), but there are regions of much lower differentiation.

[Figure: Fst (divergence) between Ci and Cp plotted against position (Mb) along Chromosome 1]

Locally reduced genetic differentiation could be caused by introgression or by incomplete lineage sorting.

[Figure: schematic contrasting incomplete lineage sorting and introgression]
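The poster does not state which Fst estimator was used. As a minimal sketch, assuming Hudson's estimator (Fst = 1 - Hw/Hb, where Hw is the mean pairwise difference within species and Hb the mean pairwise difference between species) applied to aligned haploid sequences in non-overlapping windows, a divergence profile along a chromosome could be computed as follows. The function names and window size are hypothetical; note that this estimator can dip below zero when within-species diversity exceeds between-species divergence.

```python
from itertools import combinations, product
from statistics import mean


def mean_pairwise_diff(pairs):
    """Average number of differing sites over (seq_a, seq_b) pairs."""
    return mean(sum(x != y for x, y in zip(a, b)) for a, b in pairs)


def hudson_fst(pop1, pop2):
    """Hudson-style Fst for aligned haploid sequences (>= 2 per population)."""
    hw = (mean_pairwise_diff(combinations(pop1, 2))
          + mean_pairwise_diff(combinations(pop2, 2))) / 2
    hb = mean_pairwise_diff(product(pop1, pop2))
    # Undefined when the two populations are identical at every site.
    return float("nan") if hb == 0 else 1 - hw / hb


def windowed_fst(pop1, pop2, positions, window):
    """Fst in non-overlapping windows of `window` bp.

    `positions[i]` is the chromosome coordinate of alignment column i
    (a hypothetical input layout, e.g. SNP-only alignments).
    """
    bins = {}
    for i, pos in enumerate(positions):
        bins.setdefault(pos // window, []).append(i)
    profile = []
    for b in sorted(bins):
        cols = bins[b]
        sub1 = ["".join(seq[i] for i in cols) for seq in pop1]
        sub2 = ["".join(seq[i] for i in cols) for seq in pop2]
        profile.append((b * window, hudson_fst(sub1, sub2)))
    return profile
```

A window where the two species share all alleles returns NaN rather than a misleading 0 or 1, which keeps such windows visible when plotting.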

Genome Sequencing

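[Figure: maximum-likelihood phylogeny (ML, HKY + C) of C. posadasii (CP) and C. immitis (CI) isolates based on 73 kb of coding sequence (cds); scale bar 0.01. Tips: CP RMSCC 2133, RMSCC 3700, RMSCC 1037, RMSCC 3488, RMSCC 1038, CPA 0001, CPA 0020, CPA 0066, CP1 and CPS2; CI RMSCC 2394, RMSCC 3703, CIH1 and CI2. Node support values range from 95 to 100.]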

Isolate                     Coverage     Size (Mb)
C. immitis RS               14.41X       28.9
C. immitis H538.4           3.41X        27.7
C. immitis RMSCC_2394       8.22X        28.8
C. immitis RMSCC_3703       3.17X        27.6
C. posadasii Silveira       5.23X        27.5
C. posadasii RMSCC_3488     8.52X        28.1
C. posadasii RMSCC_2133     6.69X        27.8
C. posadasii RMSCC_3700     3.58X        25.5
C. posadasii CPA_0001       3.09X        28.6
C. posadasii CPA_0020       3.42X        27.3
C. posadasii CPA_0066       3.34X        27.7
C. posadasii RMSCC_1037     3.41X        26.6
C. posadasii RMSCC_1038     3.00X        26.1
C. posadasii C735           8X (JCVI)    26.7

Reference Genome: C. immitis RS

Total Annotated Genes: 10,355

Total SNPs: 670,880

Coalescent Analysis: Introgression Occurring

[Figure: histogram of per-gene chi-square values (No. Genes vs. Chi Sq. Value, binned from 1 to >30), with the threshold for significance after correction for multiple testing marked]

Coalescent method to test H0 (incomplete lineage sorting)

Wakeley & Hey 1997:

1. Use data from multiple loci to infer population parameters.

2. Analytically derive coalescent expectations for relative incidence of shared polymorphisms between species, fixed differences, and exclusive polymorphisms within species.

3. For individual loci, compare Obs vs. Exp mutation counts assuming no introgression.

(Chi-square test results are shown in the histogram; significant results indicate evidence of introgression.)
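Steps 2 and 3 rest on sorting each variable site into the four classes Wakeley & Hey use: polymorphisms exclusive to species 1 (S1) or to species 2 (S2), shared polymorphisms (Ss), and fixed differences (Sf). The sketch below, assuming aligned haploid sequences and considering only biallelic sites, counts those classes for one locus and computes a Pearson chi-square statistic against expected counts. The expected counts would come from the multilocus coalescent fit in steps 1 and 2, which is not implemented here; the function names and example values are hypothetical.

```python
def classify_sites(sp1, sp2):
    """Count Wakeley & Hey site classes for one locus.

    sp1, sp2: lists of aligned haploid sequences (strings) from each species.
    """
    counts = {"S1": 0, "S2": 0, "Ss": 0, "Sf": 0}
    for col1, col2 in zip(zip(*sp1), zip(*sp2)):
        a1, a2 = set(col1), set(col2)
        if len(a1 | a2) != 2:
            continue  # skip invariant and multi-allelic sites
        poly1, poly2 = len(a1) == 2, len(a2) == 2
        if poly1 and poly2:
            counts["Ss"] += 1  # both alleles segregate in both species
        elif poly1:
            counts["S1"] += 1  # polymorphic only in species 1
        elif poly2:
            counts["S2"] += 1  # polymorphic only in species 2
        else:
            counts["Sf"] += 1  # each species fixed for a different allele
    return counts


def chi_square(observed, expected):
    """Pearson chi-square over classes with a positive expected count."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k]
               for k in expected if expected[k] > 0)
```

A locus would be flagged only if its statistic exceeded the multiple-testing-corrected threshold; computing that critical value (for example with a chi-square quantile function) is left out to keep the sketch dependency-free.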

CONCLUSION: Over 700 genes show evidence of recent introgression.

Natural Selection

Most genes exhibit purifying selection.

The Neutrality Index (NI) was measured for each gene using counts of divergent (D) vs. polymorphic (P) synonymous (S) and replacement (N) SNPs.

Functional Enrichment in Positively Selected Genes

[Figure: histogram of No. genes vs. -LOG(Neutrality Index), with regions labeled Purifying Selection and Positive Selection]

$$\mathrm{NI} = \frac{P_N / P_S}{D_N / D_S}$$
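As a minimal sketch of the per-gene calculation, assuming raw SNP counts with no pseudocounts (the poster does not say how zero counts were handled) and base-10 logarithms for the plotted -LOG(NI) axis, the Neutrality Index and the sign convention of the histogram can be expressed as follows. NI > 1 (negative -LOG) reflects an excess of replacement polymorphism, consistent with purifying selection; NI < 1 (positive -LOG) reflects an excess of replacement divergence, consistent with positive selection. The function names are hypothetical, and the direction label is descriptive only, not a significance test.

```python
import math


def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn/Ps) / (Dn/Ds); None when Ps, Dn, or Ds is zero."""
    if ps == 0 or dn == 0 or ds == 0:
        return None
    return (pn / ps) / (dn / ds)


def neg_log_ni(pn, ps, dn, ds):
    """-log10(NI); None when NI is undefined or zero (Pn == 0)."""
    ni = neutrality_index(pn, ps, dn, ds)
    if ni is None or ni == 0:
        return None
    return -math.log10(ni)


def selection_direction(neg_log):
    """Map -log10(NI) to the histogram's labels (no significance test)."""
    if neg_log > 0:
        return "positive"   # NI < 1: excess replacement divergence
    if neg_log < 0:
        return "purifying"  # NI > 1: excess replacement polymorphism
    return "neutral"        # NI == 1
```

For example, a gene with Pn = 1, Ps = 4, Dn = 8 and Ds = 4 has NI = 0.125 and falls on the positive-selection side of the histogram.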