The tangled genome

Gil McVean


Page 2: The tangled genome
Page 3: The tangled genome

The real heroes

Page 4: The tangled genome

PanMap – Genome sequencing of 10 Western Chimpanzees

• Patterns of small insertion and deletion are quite different and reveal details of DNA repair pathways

• Patterns of recombination in humans and chimpanzees are highly diverged at the fine-scale, but largely conserved at broad scales

• There are a surprising number (6+ now 'confirmed') of trans-specific polymorphisms, probably maintained through host-pathogen interactions

Page 5: The tangled genome

A tangle of sequence

Page 6: The tangled genome
Page 7: The tangled genome

Difficulties of working with an incomplete reference

Page 8: The tangled genome

Using de novo assembly to find variants

Page 9: The tangled genome

Entire population

Page 10: The tangled genome

Sample 1

Page 11: The tangled genome

Sample 2

Page 12: The tangled genome

Chromosome 1

Page 13: The tangled genome

Using Cortex leads to a high quality set of variants

Page 14: The tangled genome

Diversity in Western Chimpanzees

• Diversity similar to that of humans of European origin (0.06%-0.08%)

• Excess of common variants

• 1% of variants shared with humans
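Diversity figures such as 0.06%-0.08% are conventionally read as the average number of pairwise differences per site. The following is a minimal sketch of that calculation, assuming phased, aligned haplotype strings; the function name and the toy sequences are made up for illustration and are not taken from the PanMap analysis.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average pairwise differences per site across aligned haplotypes."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatching sites for every unordered pair of haplotypes.
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy data (hypothetical): four haplotypes, ten sites each.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy)
print(f"pi = {pi:.3f} differences per site")
```

On real data the denominator would be the number of callable sites rather than the raw alignment length, which this sketch does not model.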

Page 15: The tangled genome

Non-slippage indels are strongly biased to deletions

13:1 bias toward deletions.

Unexpected peak at 4bp

Page 16: The tangled genome

Indels as indicators of DNA repair processes

[Figure: two panels, insertions and deletions. Axes: indel size and longest word agreement, each ticked from 5 to 25.]

Page 17: The tangled genome

TGACGAACTTATACTGCTTGAATA

TGACGAAC

ATTGAATA

TGAC--ATACTGAATATGACTTAT

Losing GAAC

Page 18: The tangled genome

A tangle of trees

Page 19: The tangled genome

Myers et al. 2005

Page 20: The tangled genome

The zinc-finger protein PRDM9 determines hotspot location

Myers et al. 2010

Page 21: The tangled genome

PRDM9 Zinc fingers are radically different between humans and chimps

Perhaps the most diverged gene between humans and chimpanzees

Repeatedly hit by adaptive evolution across mammals

Only known ‘speciation gene’ in mammals

Polymorphic in humans – leads to variation in hotspots and genome instability

Page 22: The tangled genome

Questions

• We know from previous work in a few regions that hotspot locations tend not to be shared between humans and chimpanzees

• Calculations suggested that only 40% of human hotspots were driven by PRDM9 binding

• But...
– Is there any hotspot sharing?
– Do we see conservation of recombination rates at any scale?
– What features determine hotspot location in chimpanzees?

Page 23: The tangled genome

The first genome-wide fine-scale map of recombination for a non-reference organism

Auton et al. 2012

Page 24: The tangled genome
Page 25: The tangled genome

Chimpanzee recombination is dominated by hotspots in a manner similar to humans

Page 26: The tangled genome

But the hotspots are not in the same locations

Page 27: The tangled genome

Fine-scale profiles around genes are similar

Page 28: The tangled genome

As is rate variation around CpG islands

Page 29: The tangled genome

Substantial PRDM9 diversity, but overlap in predicted binding sequences

Page 30: The tangled genome

No signal for predicted binding sequences

Page 31: The tangled genome

Similarities at 1Mb scale

Page 32: The tangled genome

Human and chimp recombination rates are correlated at the chromosomal scale

Page 33: The tangled genome

Human and chimp recombination rates are only correlated at broad scales
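One way to make the scale dependence concrete is to average two rate maps into progressively larger windows and correlate them at each window size. The sketch below assumes two equal-length lists of per-window rates on orthologous coordinates; the function names and the toy numbers are hypothetical and are not the method or data of Auton et al. 2012.

```python
from statistics import mean


def bin_means(values, bin_size):
    """Average consecutive fine-scale values into coarser bins.

    Any ragged tail shorter than bin_size is dropped.
    """
    n_bins = len(values) // bin_size
    return [
        mean(values[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def correlation_by_scale(rates_a, rates_b, bin_sizes):
    """Correlate two rate maps after averaging at each bin size."""
    return {
        size: pearson(bin_means(rates_a, size), bin_means(rates_b, size))
        for size in bin_sizes
    }


# Toy maps (hypothetical): the same broad-scale levels in both species,
# but the hotspots sit in different fine-scale windows.
human = [8, 0, 0, 0, 0, 1, 0, 1, 8, 0, 8, 0, 1, 0, 1, 0]
chimp = [0, 0, 8, 0, 1, 0, 1, 0, 0, 8, 0, 8, 0, 1, 0, 1]
by_scale = correlation_by_scale(human, chimp, bin_sizes=[1, 4])
print(by_scale)
```

In this toy case the fine-scale correlation is negative while the four-window averages agree, which is the qualitative pattern the slide describes.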

Page 34: The tangled genome

Lower correlation in structural rearrangements

• All, bar one, of the inverted regions are pericentric, so change in position with respect to the centromere does not contribute

• Change in proximity to telomere is important

Page 35: The tangled genome

A natural experiment: chromosomal fusion

[Figure: chromosomes 2a and 2b in the common ancestor (C.A.) and in chimp, fused into chromosome 2 in human; time axis t.]

Page 36: The tangled genome

Fusion region shows 3-fold decrease in recombination rate

Page 37: The tangled genome

Fusion region shows 3-fold decrease in recombination rate

Page 38: The tangled genome

A tangle of histories

Page 39: The tangled genome

Distribution of the sickle allele

Distribution of malaria

Page 40: The tangled genome
Page 41: The tangled genome

How many variants are shared through descent?

Page 42: The tangled genome

SNPs shared by humans and chimpanzees (33,906 autosomal and 527 X chromosome)

• Human polymorphism: 9.4 million autosomal and 261,000 X chromosome SNPs from 1000 Genomes Pilot 1 YRI (59 individuals)

• Chimpanzee polymorphism: 3.8 million autosomal and 102,000 X chromosome SNPs from PanMap Pan troglodytes verus (10 individuals)

• Human-chimpanzee shared haplotypes: at least two shared SNPs in 4kb with the same LD, to reduce recurrent mutation

• Human-chimpanzee shared coding SNPs: identify potentially functional coding variants

• Reduce artifactual sharing due to known or cryptic paralogs by filtering out SNPs with low 50 bp mappability, with high read depth, or not found in 1000 Genomes Phase 1

• 130 regions with shared haplotypes outside the MHC

• 135 shared non-synonymous SNPs, 1 shared premature stop SNP and 200 shared synonymous SNPs outside the MHC

• 7 resequenced using Sanger sequencing

• 8 with more than two pairs in LD
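The positional part of the shared-haplotype criterion (at least two shared SNPs within 4kb) can be sketched as a single pass over sorted positions. This is an illustrative sketch only: the function name, the greedy non-overlapping windowing and the toy coordinates are assumptions, and the requirement that the SNPs show the same LD in both species is not implemented here.

```python
def candidate_regions(positions, max_span=4000, min_snps=2):
    """Group shared-SNP positions on one chromosome into candidate windows.

    A window starts at a SNP and absorbs later SNPs that lie within
    max_span bases of that first SNP. Windows holding fewer than
    min_snps SNPs are discarded. Returns (start, end, count) tuples.
    """
    regions = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[0] > max_span:
            # The open window is complete; keep it only if dense enough.
            if len(current) >= min_snps:
                regions.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_snps:
        regions.append((current[0], current[-1], len(current)))
    return regions


# Toy coordinates (hypothetical) for shared SNPs on one chromosome.
toy_positions = [1_000, 2_500, 4_800, 50_000, 120_000, 123_900]
regions = candidate_regions(toy_positions)
print(regions)
```

Each candidate window would then need the LD check and the mappability and read-depth filters listed above before it could count as a shared haplotype.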

Page 43: The tangled genome

Outside of the MHC, six clear-cut cases of trans-species polymorphisms

All non-coding and putatively regulatory

FREM3/GYPE MTRR IGFBP7

Page 44: The tangled genome

In intron of IGFBP7

[Figure: genome browser view of the IGFBP7 intron, shown at 4kb and 20kb scales. Tracks: IGFBP7 gene structure; human-chimpanzee shared SNPs; average pairwise differences; primate phastCons score; TFBS conserved in human/mouse/rat; TFBS identified by ChIP-seq; chromatin state segmentation by HMM; DNaseI hypersensitive sites; open chromatin by FAIRE. Labelled factors: RelA, CUTL1, SRF, Bach1, STAT3, GATA-2, ISGF-3. Weak and strong enhancer states mark a regulatory region in HUVEC and a regulatory region in NHEK and HMEC.]

Page 45: The tangled genome

• In total, 130 regions with shared human-chimpanzee haplotypes. Six clear-cut cases of ancient balanced polymorphisms.

• None are protein-coding. Eleven occur in non-coding genes (e.g., 7 in lincRNAs). Eleven compelling cases of regulatory regions.

• What do these regions have in common?

Page 46: The tangled genome

SNPs shared by humans and chimpanzees

• Shared haplotypes: closest gene within 20 kb of a human-chimp shared haplotype (n=26, p=2x10^-5, FDR=0.03)

• Shared coding SNPs: genes with a human-chimp shared coding SNP (n=99, p=0.017, FDR=0.20)

Enrichment of membrane glycoproteins in both sets -> host-pathogen interactions

Page 47: The tangled genome

Project Participants

• University of Oxford: Adam Auton, Rory Bowden, Peter Humburg, Zam Iqbal, Gerton Lunter, Julian Maller, Simon Myers, Susanne Pfeifer, Isaac Turner, Oliver Venn, Peter Donnelly (PI), Gil McVean (PI)

• Biomedical Primate Research Centre: Ronald Bontrop

• University of Chicago: Adi Fledel-Alon, Ryan Hernandez (UCSF), Ellen Leffler, Cord Melton, Laure Segurel, Molly Przeworski (PI)

• Funders: Howard Hughes Medical Institute, National Institutes of Health, Royal Society, Wellcome Trust

Page 48: The tangled genome

Where next?

Page 49: The tangled genome

Remarkable structural and sequence diversity in chimp PRDM9

Page 50: The tangled genome

Variation greater than in human populations

Page 51: The tangled genome

Little correlation in fine-scale structure around DNA repeat elements

Page 52: The tangled genome

No activating motif discovered in chimp

CCTCCCT