GG16CH04-Xie ARI 21 July 2015 11:16
Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications
Lei Huang,1 Fei Ma,1 Alec Chapman,2,3 Sijia Lu,4 and Xiaoliang Sunney Xie1,2
1Biodynamic Optical Imaging Center (BIOPIC), School of Life Sciences, Peking University, Beijing 100871, China
2Department of Chemistry and Chemical Biology and 3Graduate Program in Biophysics, Harvard University, Cambridge, Massachusetts 02138; email: [email protected] Genomics Co. Ltd., Taizhou, Jiangsu 225300, China
Annu. Rev. Genomics Hum. Genet. 2015.16:79–102
First published online as a Review in Advance on June 11, 2015
The Annual Review of Genomics and Human Genetics is online at genom.annualreviews.org
This article’s doi: 10.1146/annurev-genom-090413-025352
Copyright © 2015 by Annual Reviews. All rights reserved
Keywords
single-cell genomics, cancer genomics, WGA, DOP-PCR, MDA, MALBAC, CTC, PGD, PGS, in vitro fertilization
Abstract
We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide–primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping–based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro–fertilized embryos.
Annu. Rev. Genom. Human Genet. 2015.16:79-102. Downloaded from www.annualreviews.org. Access provided by Harvard University on 09/01/15. For personal use only.
INTRODUCTION
Individual cells are the fundamental units of life. DNA that carries genetic information exists as single molecules in individual cells. In biology and medicine, there is often a need to characterize genomes of individual cells for several reasons: (a) Some cells are precious and exist in low numbers [for example, human oocytes and circulating tumor cells (CTCs)]; (b) every cell is unique in its genome (for example, every sperm cell of an individual is different because of recombination); (c) the genomes of individual cells undergo stochastic changes with time, and hence single cells’ genomes at particular times can reveal their temporal evolution; and (d) the genomes of individual cells in the same sample are heterogeneous (as in the case of primary cancer tissues), and consequently one often needs to determine the distribution rather than the average of a large ensemble of cells.
Changes in the genome of a single cell include single-nucleotide variations (SNVs) and structural variations, which result in copy-number variations (CNVs). SNVs are single-base insertions, deletions, or mutations, either transitions (e.g., C→T, A→G) or transversions (e.g., A→T, C→G). Structural variations are genomic rearrangements, including insertions or deletions (indels), duplications, inversions, and translocations. Structural variations at large genome scales are linked to CNVs. Because of technical difficulties (chimeras; see below), structural variations are difficult to observe at the single-cell level, whereas CNVs ranging in size from hundreds of kilobases to megabases can be detected relatively easily.
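The distinction between transitions and transversions can be made concrete with a short sketch. It assumes the standard grouping of bases into purines (A, G) and pyrimidines (C, T); the function name is illustrative and not taken from any particular variant-calling tool.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion.

    A transition stays within one chemical class (purine to purine or
    pyrimidine to pyrimidine); a transversion switches class.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # The four examples given in the text.
    for ref, alt in [("C", "T"), ("A", "G"), ("A", "T"), ("C", "G")]:
        print(f"{ref}->{alt}: {classify_substitution(ref, alt)}")
```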
The copy number of a particular gene in a human somatic cell is normally two because of the two alleles (one from each parent). Although it had long been known that gene copy numbers could deviate from two in humans, genome-wide measurements of CNVs (27, 56) prompted extensive investigations of the biology and clinical consequences of CNVs, which often result from double-strand breaks that cannot be repaired perfectly (21).
We note that epigenetics (e.g., the methylation of DNA) represents another type of genomic change. The single-cell methylome has been determined (19), but this topic is beyond the scope of our review.
Advances in next-generation DNA sequencing technologies have enabled individual human genomes to be sequenced at affordable costs (4, 43, 69, 72). With the development of single-cell whole-genome amplification (WGA) techniques (i.e., amplification of all DNA molecules after the cell has been lysed), single-cell genomics has emerged as an exciting field of its own. In this article, we review the principles and compare the performance of the existing methodologies for single-cell WGA. We note that, in parallel with single-cell genomics, single-cell transcriptomics has seen rapid developments as well; these techniques are beyond the scope of this article, and we refer readers to other recent reviews for more information (55, 57, 60). We provide examples to illustrate the utility of single-cell WGA in biomedical applications, including fundamental research on genome stability and the biology of meiosis, as well as clinical perspectives for cancer diagnosis and reproductive medicine.
SINGLE-CELL WHOLE-GENOME AMPLIFICATION METHODS
Although sequencing individual DNA molecules with lengths of thousands of bases has become possible (20), currently there is not a method that can collect and sequence all DNA fragments from a single cell. Therefore, sequencing an entire genome of a cell requires WGA. Given the trace amount of DNA from a single human cell (a few picograms), extreme care should be taken to avoid contamination. Indeed, a major difficulty in single-cell genomics is substantial contamination from the environment and operators. Our experience is that WGA should be conducted in a dedicated clean room with controlled air pressure and quality. As such, the bacterial contamination can be controlled below 0.1% of the amount of DNA in a human cell (78). Alternatively, microfluidic devices can be used to minimize contamination (14, 31, 68), which is particularly important for WGA of a bacterial cell.
Regardless of whether the WGA is done in a polymerase chain reaction (PCR) tube or a microfluidic device, the chemistry is the most important aspect. Several different chemistries are available for single-cell WGA, and here we review three major ones: the degenerate oligonucleotide–primed polymerase chain reaction (DOP-PCR) (61), multiple displacement amplification (MDA) (12), and multiple annealing and looping–based amplification cycles (MALBAC) (78).
Degenerate Oligonucleotide–Primed Polymerase Chain Reaction (DOP-PCR)
PCR has had a major impact on biology and medicine in the past 30 years (54). It offers single-copy sensitivity; that is, a single copy of DNA can generate a signal detectable by PCR. To detect the existence of one mutation in a particular gene, one can design a PCR primer to amplify the gene locus. Attempts have been made to use PCR to amplify the entire genome using a set of primers. Primer extension preamplification PCR (PEP-PCR) was among the first single-cell WGA methods used (3, 45, 77), followed by the more widely adopted DOP-PCR (61) (Figure 1).
Figure 1
Schematic of the degenerate oligonucleotide–primed polymerase chain reaction (DOP-PCR), which uses degenerate oligonucleotide primers for whole-genome amplification (45). Panel labels: ssDNA template, primer binding, primer extension. Additional abbreviation: ssDNA, single-stranded DNA.
The principle of DOP-PCR is to use degenerate primers containing a random six-base sequence at the 3′ end and a fixed sequence at the 5′ end. For the initial amplification, the primers bind to the DNA template at a low annealing temperature. Strand extension is then achieved at a raised temperature. During the second stage of PCR amplification, the previous products are amplified with a primer targeting the 5′ fixed sequence at a higher annealing temperature. The concentrations of the primers and polymerase directly affect the result of DOP-PCR. DOP-PCR has been used to amplify picogram quantities of human genomic DNA for genotypic analyses (7).
DOP-PCR often yields low genome coverage, a consequence of the exponential amplification of PCR. Any small differences in the amplification factors among different sequences are exponentially enlarged, causing overamplified regions and underamplified regions in the genome, and hence low coverage. Although lacking completeness in accessing the whole genome, DOP-PCR can be well suited for measuring CNVs on a large genomic scale with large bin sizes (1 million bases) (47).
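A toy calculation may help show how small differences in amplification factor are exponentially enlarged. The sketch below assumes an idealized model in which each cycle multiplies the copy number of a locus by (1 + efficiency); the efficiencies and cycle counts are hypothetical and are not measured values for any kit.

```python
def fold_amplification(efficiency: float, cycles: int) -> float:
    """Copies per starting molecule after `cycles` rounds of idealized PCR.

    Each round multiplies the copy number by (1 + efficiency), where
    efficiency is the per-cycle fraction of templates that are copied.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1.0 + efficiency) ** cycles


def bias_ratio(eff_a: float, eff_b: float, cycles: int) -> float:
    """Relative representation of locus A versus locus B after amplification."""
    return fold_amplification(eff_a, cycles) / fold_amplification(eff_b, cycles)


if __name__ == "__main__":
    # Hypothetical loci whose per-cycle efficiencies differ by 0.10.
    for cycles in (5, 15, 25):
        print(cycles, round(bias_ratio(0.95, 0.85, cycles), 2))
```

In this model the ratio between the two loci is itself a power of the cycle number, so a modest per-cycle difference compounds into a large difference in read density.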
Multiple Displacement Amplification (MDA)
MDA was developed in 2001 by Lasken and coworkers (12) using a random hexamer as a primer and φ29 DNA polymerase, a highly processive DNA polymerase (5) with strong strand displacement activity. φ29 has a high replication fidelity because of its 3′→5′ exonuclease activity and proofreading activity (17, 51). Under isothermal conditions, MDA extends the random primers and produces branched structures, which are extended by other primers and eventually form multibranched structures (Figure 2). The DNA fragments are 50–100 kb long.
MDA offers much higher genome coverage than DOP-PCR. However, like DOP-PCR, MDA is an exponential amplification process. This results in sequence-dependent bias, causing overamplification in certain genomic regions and underamplification in other regions. Moreover, the sequence-dependent bias of MDA is not reproducible along the genome from cell to cell, rendering CNV measurements noisy and normalization ineffective. Nevertheless, MDA has been widely applied since its invention (37).
Figure 2
Schematic of multiple displacement amplification (MDA). Random primers are used for isothermal amplification with φ29 DNA polymerase, which has strong strand displacement activity (35). Panel labels: genomic DNA, random primers, φ29 DNA polymerase, branched DNA.
Figure 3
Schematic of multiple annealing and looping–based amplification cycles (MALBAC). Random primers with a fixed sequence are used in a temperature cycle in which only the original genomic DNA and semiamplicons are linearly amplified, and full amplicons are protected from further amplification by the formation of DNA loops owing to the complementarity of the fixed sequences at the 3′ and 5′ ends. The DNA loops are polymerase chain reaction (PCR) amplified at the final stage (78). Here, m is the number of temperature cycles (m = 0–10), and n is the number of primers bound; (m + 1) × n is the number of semiamplicons present at the mth cycle, and m × n² is the number of full amplicons generated in the mth cycle. Panel labels: genomic DNA; MALBAC primer; annealing of primers (quenching at 0°C); synthesis (extension at 65°C) with displaced strand; denaturing (melting at 94°C); semiamplicon; full amplicon; looping at 58°C; PCR amplification; MALBAC product to be sequenced.
Multiple Annealing and Looping–Based Amplification Cycles (MALBAC)
MALBAC was first reported in 2012 by Zong et al. (78) for single-cell WGA; this method has the unique feature of quasi-linear amplification, which reduces the sequence-dependent bias exacerbated by exponential amplification. It has recently been applied to single-cell transcriptome measurement as well (6). The key to MALBAC is to not make copies of copies, and instead make copies only of the original genomic DNA by protecting the amplification products (Figure 3). The specially designed MALBAC primers have a common 27-nucleotide sequence at the 5′ end and 8 random nucleotides at the 3′ end, which can evenly hybridize to the template when the temperature is lowered (to 15–20°C). At the beginning, semiamplicons with variable lengths (0.5–1.5 kb) are made when the temperature is elevated (to 70–75°C). The semiamplicons are melted off from the templates (at 95°C), from which full amplicons are made with complementary ends, causing the formation of hairpins when the temperature is lowered (to 58°C), preventing their further amplification. This cycle is repeated 8–12 times. The quasi-linear amplification at these first few cycles is critical for avoiding the sequence-dependent bias exacerbated by exponential amplification. MALBAC uses a thermally stable DNA polymerase with strand displacement activity. This important preamplification stage is then followed by exponential amplification of the full amplicons by PCR, generating the amount of DNA required for next-generation sequencing.
MALBAC is not a mere combination of DOP-PCR and MDA, but is fundamentally different because of its quasi-linear, as opposed to exponential, amplification. This results in two major advantages: accuracy for CNV detection and a low false negative rate for SNV detection.
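To illustrate what quasi-linear means in practice, the sketch below simply tabulates the two expressions quoted in the Figure 3 caption, (m + 1) × n semiamplicons present at cycle m and m × n² full amplicons generated in cycle m, next to ideal PCR doubling. The value of n is hypothetical, the function names are illustrative, and the doubling column is an idealized comparison rather than a model of any kit.

```python
def semiamplicons_present(m: int, n: int) -> int:
    """Semiamplicons present at cycle m, per the Figure 3 caption: (m + 1) * n."""
    return (m + 1) * n


def full_amplicons_generated(m: int, n: int) -> int:
    """Full amplicons generated in cycle m, per the Figure 3 caption: m * n**2."""
    return m * n ** 2


def ideal_pcr_copies(m: int) -> int:
    """Copies per template after m cycles of perfect doubling, for contrast."""
    return 2 ** m


if __name__ == "__main__":
    n = 3  # hypothetical number of primers bound per template
    print("cycle  semiamplicons  full_amplicons  ideal_pcr")
    for m in range(0, 11):
        print(f"{m:5d}  {semiamplicons_present(m, n):13d}  "
              f"{full_amplicons_generated(m, n):14d}  {ideal_pcr_copies(m):9d}")
```

For a fixed n, both MALBAC quantities grow in proportion to the cycle number m, whereas the doubling column grows as a power of m.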
MALBAC is not free from sequence-dependent bias. Unlike MDA, however, MALBAC’s sequence-dependent bias is reproducible along the genome from cell to cell. Therefore, signal normalization for CNV noise reduction can be carried out. As shown below, after signal normalization with a reference cell, MALBAC offers the best CNV accuracy.
MALBAC exhibits the lowest false negative rates for SNV calling. However, MALBAC has a higher false positive rate for SNV detection than MDA because the DNA polymerase currently used has a lower fidelity than the φ29 polymerase.
Characterization and Comparison of Whole-Genome Amplification Methods
It is necessary to provide a critical comparison of different WGA methods. Here we choose key features to compare, such as coverage, uniformity, reproducibility, allele dropout rate, false positive rate, chimera rate, and unmappable rate, as defined below. We compared the performance of five commercial single-cell WGA kits that use one of the three methods discussed above: the Sigma-Aldrich GenomePlex Single Cell Whole Genome Amplification Kit (DOP-PCR), the Qiagen REPLI-g Single Cell Kit (MDA), the General Electric (GE) illustra Single Cell GenomiPhi DNA Amplification Kit (MDA), the Yikon Genomics Single Cell Whole Genome Amplification Kit (MALBAC), and the Rubicon Genomics PicoPLEX WGA Kit (MALBAC-like).
De Bourcy et al. (11) recently compared several WGA methods for single-cell analyses of Escherichia coli. For mammalian cells, issues related to CNVs and heterozygosity of alleles are germane to genomic analyses, which prompted us to perform a comparison of human cells. We note that human cancer cell lines often exhibit aneuploidy, which seriously complicates analyses of genome coverage and allele dropout rate. Here, we chose a diploid human cell line, BJ primary human foreskin fibroblast. The lack of aneuploidy of this cell line makes it an ideal system to characterize amplification performance.
To perform this comparison, we amplified several single cells and sequenced them to exclude cells in metaphase (in which DNA is replicated). The amplified DNA products from each cell were fragmented by sonication to ~250 base pairs. The sequencing libraries were prepared by using the NEBNext Ultra DNA Library Prep Kit before sequencing on an Illumina HiSeq X Ten platform with 150-base paired-end reads.
Table 1 summarizes the key parameters for the five tested DOP-PCR, MDA, and MALBAC kits. Below, we define each parameter used in the comparison and present and discuss the results.
Coverage. At this point there has not yet been a complete human genome sequence; even the reference genome has gaps. In comparing the genome coverage of single cells with different WGA kits, we used the bulk sequencing without amplification at 30× depth as the reference and assumed it has 100% coverage.
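One way to read this definition is as the fraction of bulk-covered positions that are also covered in the single cell. The sketch below assumes per-position read depths are already available as equal-length lists and that a position counts as covered at a depth of at least one read; both the data layout and the `min_depth` threshold are assumptions made for illustration.

```python
def relative_coverage(single_cell_depth, bulk_depth, min_depth=1):
    """Fraction of positions covered in bulk that are also covered in the cell.

    The bulk sample is treated as the 100% reference, so positions that the
    bulk data do not cover are ignored.
    """
    if len(single_cell_depth) != len(bulk_depth):
        raise ValueError("depth vectors must describe the same positions")
    bulk_covered = [i for i, d in enumerate(bulk_depth) if d >= min_depth]
    if not bulk_covered:
        raise ValueError("bulk sample covers no positions")
    shared = sum(1 for i in bulk_covered if single_cell_depth[i] >= min_depth)
    return shared / len(bulk_covered)


if __name__ == "__main__":
    bulk = [10, 12, 0, 9, 11]   # toy depths; position 2 is a reference gap
    cell = [0, 3, 0, 5, 1]      # the single cell misses position 0
    print(relative_coverage(cell, bulk))
```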
The comparison was done using the total raw data of 80–100 Gb (30×) for each cell (Table 1). The single-cell sequencing data from certain kits have a large fraction of unmappable reads, which is of course undesirable. Comparison at identical sequencing depths would be impractical because of the high cost. Considering that the coverage is dependent on the actual sequencing depth, Figure 4 gives the genome coverage of the single cells referenced to the bulk as a function of the effective sequencing depth.

Table 1 Comparison of key parameters of five single-cell whole-genome amplification kits

Kit            WGA method    Raw data  Coverage  CV    Reproducibility  ADO  FPR         CR   UMR
Sigma-Aldrich  DOP-PCR       79 Gb     39%       0.14  0.93             76%  9.6 × 10⁻⁴  15%  64%
Qiagen         MDA           78 Gb     84%       0.21  0.68             33%  1.3 × 10⁻⁴  2%   18%
GE             MDA           105 Gb    82%       0.17  0.31             38%  8.2 × 10⁻⁵  3%   44%
Yikon          MALBAC        96 Gb     72%       0.10  0.87             21%  3.8 × 10⁻⁴  5%   22%
Rubicon        MALBAC-like   80 Gb     52%       0.13  0.98             28%  2.4 × 10⁻⁴  13%  66%

Kits: Sigma-Aldrich, GenomePlex Single Cell Whole Genome Amplification Kit; Qiagen, REPLI-g Single Cell Kit; GE, illustra Single Cell GenomiPhi DNA Amplification Kit; Yikon, Single Cell Whole Genome Amplification Kit; Rubicon, PicoPLEX WGA Kit. Abbreviations: ADO, allele dropout; CR, chimera rate; CV, coefficient of variation; DOP-PCR, degenerate oligonucleotide–primed polymerase chain reaction; FPR, false positive rate; MALBAC, multiple annealing and looping–based amplification cycles; MDA, multiple displacement amplification; UMR, unmappable rate; WGA, whole-genome amplification.
The low effective sequencing depth for the Sigma-Aldrich DOP-PCR kit was mainly due to the high number of universal adapter reads. For Rubicon’s MALBAC-like kit, in addition to the primer and adapter reads, the high number of unmappable reads includes large contributions from the small DNA fragments (<30 base pairs) inserted into the sequencing adapters, which were too short to be mapped to the human reference genome. The coverage of Qiagen’s MDA kit is higher than that of Yikon’s MALBAC kit, which may result from MDA’s lack of reproducibility of the sequence-dependent bias for the two different alleles in the diploid cell (36). Suffice it to say that MDA and MALBAC kits have comparable levels of coverage, both of which are significantly higher than the coverage of DOP-PCR.

Figure 4
Genome coverage versus effective sequencing depth for the five commercial single-cell whole-genome amplification kits. Axes: effective sequencing depth (0–40) versus coverage (0–100%). Series: Sigma-Aldrich, Qiagen, GE, Yikon, Rubicon, and bulk.
Uniformity. Uniform amplification is important for accurate measurements of CNV. Figure 5a shows the raw read density of 23 chromosomes, clearly illustrating the sequence-dependent bias along the genome. The variation is at the 1,000-kb scale.
If the variation along the genome is the same from cell to cell, then a normalization factor can be calculated by averaging the read density of several cells in each bin and using this to normalize the single-cell data. If the purpose of the investigation is to measure CNV, then reference cells without aneuploidy should be chosen. Figure 5b shows the data normalized by the average of three cells with no aneuploidy.
We characterize the uniformity by coefficients of variation of the read density (the genome-wide standard deviation divided by the mean). The normalization is done by the average of a few reference cells (as in Figure 5b) in a certain bin size (1,000 kb). Figure 6a shows the uniformity comparison of the five kits.
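The normalization and the coefficient of variation described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes read counts per bin are already computed, rescales each cell to unit mean so that cells of different sequencing depth are comparable, drops bins in which the reference profile is zero, and uses the population standard deviation. All of these choices, and the function names, are assumptions.

```python
from statistics import mean, pstdev


def scale_to_unit_mean(counts):
    """Divide bin counts by their mean so cells of different depth are comparable."""
    m = mean(counts)
    if m == 0:
        raise ValueError("cell has no reads")
    return [c / m for c in counts]


def normalize_by_reference(cell_counts, reference_cells):
    """Divide each bin of a cell by the mean of the reference cells in that bin.

    Bins in which the reference profile is zero are dropped (an assumption).
    """
    cell = scale_to_unit_mean(cell_counts)
    refs = [scale_to_unit_mean(r) for r in reference_cells]
    if any(len(r) != len(cell) for r in refs):
        raise ValueError("all cells must use the same bins")
    profile = [mean(bin_values) for bin_values in zip(*refs)]
    return [c / p for c, p in zip(cell, profile) if p > 0]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form assumed)."""
    return pstdev(values) / mean(values)


if __name__ == "__main__":
    cell = [10, 20, 30, 40]                  # toy bin counts with a shared bias
    refs = [[1, 2, 3, 4], [2, 4, 6, 8]]      # reference cells with the same bias
    print(coefficient_of_variation(cell))
    print(coefficient_of_variation(normalize_by_reference(cell, refs)))
```

In the toy data the bias is identical in every cell, so normalization removes it entirely; a bias that differs from cell to cell would survive the division.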
It is evident from Figure 5a that, of the five methods, DOP-PCR gives the flattest CNV raw data without normalization. MDA creates variations along the genome that are not reproducible from cell to cell and cannot be smoothed via normalization. MALBAC’s sequence-dependent bias is reproducible from cell to cell, giving the flattest CNV after normalization.
Reproducibility. Reproducibility is important in single-cell measurements, and measurement noise needs to be minimized. We use Pearson’s cross-correlation coefficient of the read densities along the genome between two identical cells to characterize the reproducibility.
Figure 6b shows the cell-to-cell reproducibility of the different WGA methods, measured by these cross-correlation coefficients. It is apparent that the reproducibility of the DOP-PCR and MALBAC kits is better than that of the MDA kits, which exhibit stochasticity in WGA.
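As a rough illustration of this reproducibility measure, the sketch below computes Pearson's correlation coefficient between the binned read densities of two cells. It assumes both cells use the same bins and that the densities are supplied as plain lists; the function name is illustrative.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    cell_a = [0.8, 1.1, 1.3, 0.7, 1.1]   # toy read densities per bin
    cell_b = [0.9, 1.0, 1.4, 0.6, 1.1]
    print(round(pearson_correlation(cell_a, cell_b), 3))
```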
Allele dropout rate. The allele dropout rate is one of the most important characteristics of WGA, particularly for medical applications. Allele dropout arises from uneven WGA, which needs to be improved. If a diploid cell has a heterozygous mutation, the lack of amplification in one of the two alleles causes allele dropout (Figure 7a). Allele dropout is the primary cause of false negatives of SNV calling. The allele dropout rate is measured as the ratio of undetected to actual heterozygous SNVs in a single cell. The latter is often approximated by the bulk measurement of identical cells.
Figure 6c shows our measurements of the allele dropout rates of the five kits. It is evident that the MALBAC methods have lower allele dropout rates than the other WGA methods.
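The allele dropout rate defined above can be sketched as follows, using bulk genotypes as the approximation of the true heterozygous sites. The dictionary layout (site mapped to a pair of alleles) is hypothetical, and sites without a single-cell call are skipped, which is an assumption rather than something the text specifies.

```python
def allele_dropout_rate(bulk_genotypes, cell_genotypes):
    """Fraction of bulk-heterozygous sites that appear homozygous in one cell.

    Both arguments map a site identifier to a pair of alleles, for example
    {("chr1", 12345): ("A", "C")}. Sites with no single-cell call are skipped.
    """
    het_sites = [
        site for site, (a, b) in bulk_genotypes.items()
        if a != b and site in cell_genotypes
    ]
    if not het_sites:
        raise ValueError("no heterozygous bulk sites were called in the cell")
    dropped = sum(
        1 for site in het_sites
        if cell_genotypes[site][0] == cell_genotypes[site][1]
    )
    return dropped / len(het_sites)


if __name__ == "__main__":
    bulk = {1: ("A", "C"), 2: ("G", "T"), 3: ("A", "A"), 4: ("C", "T")}
    cell = {1: ("A", "A"), 2: ("G", "T"), 3: ("A", "A"), 4: ("C", "T")}
    print(allele_dropout_rate(bulk, cell))  # site 1 lost its C allele
```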
False positive rate. A false positive of base calling can arise from either a sequencing error or an amplification error. Whereas random sequencing errors can be avoided with high sequencing depth, WGA errors often dominate, occur during the first or second cycle of exponential amplification, and are dependent on the error rate of the DNA polymerase (Figure 7b). Figure 6d compares the false positive rates of the WGA kits. The Qiagen and GE kits perform better than the MALBAC approaches, mainly because of the higher accuracy of the φ29 polymerase in the former.

Figure 5
Read density of reads from single diploid human cells with no aneuploidy from the BJ primary human foreskin fibroblast cell line, using the five commercial single-cell whole-genome amplification kits. (a) Raw data of single cells at a sequencing depth of 2× with a bin size of 1,000 kb, mapped to the human reference genome. The chromosomes are shown in alternating red and blue colors. (b) The same single-cell data normalized by the mean of three identical cells in each bin, with corresponding coefficients of variation (CVs) reflecting uniformity: Sigma-Aldrich, CV = 0.14; Qiagen, CV = 0.21; GE, CV = 0.17; Yikon, CV = 0.10; Rubicon, CV = 0.13. Horizontal axis: chromosome (1–22, X, Y).

Figure 6
Comparison of key parameters of the five commercial single-cell whole-genome amplification kits, using data generated from three single cells from the BJ primary human foreskin fibroblast cell line. (a) Coefficient of variation, calculated as the ratio of the standard deviation of read density to its average in three individual cells. (b) Reproducibility, measured by Pearson’s cross-correlation coefficient of the read densities between two identical cells. (c) Allele dropout rate, defined as the percentage of homozygous sites in the single-cell samples where the bulk sample is heterozygous at the same nucleotide site. (d) False positive rate, defined as the number of heterozygous site calls in the single cell divided by the number of sites in the bulk sample that are homozygous or have a different allele at the same nucleotide site. (e) Chimera rate, defined as the number of reads that are improperly connected (including abnormal fragment size and interchromosomal connection) divided by the total number of mappable reads. (f) Unmappable rate, the fraction of reads that are unmappable to the reference genome.
Figure 7
Schematics for three types of errors that can arise in whole-genome amplification. (a) Allele dropout. The lack of amplification of only one of the two alleles makes a heterozygous point mutation appear to be a homozygous one, causing false negative errors. (b) False positive errors arising from point mutations generated by the whole-genome amplification. (c) Chimera formation, in which different parts of the original genome are artificially connected by whole-genome amplification.
Chimera rate. Chimera formation is an artifact of WGA that generates reads that can be mapped to different parts of the genome that are not physically linked (Figure 7c). It prevents the identification of true structural variations, and single-cell structural variations are still challenging to detect owing to this technical problem. The chimera rate is defined as the percentage of reads that exhibit improper connections in the reference genome, i.e., those that show an abnormal fragment size or indicate interchromosomal connections. Figure 6e shows the chimera rates of the WGA kits. The chimera rates of the Sigma-Aldrich and Rubicon kits are much higher than those of the other kits.
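The chimera-rate definition can be sketched on mapped read pairs as follows. The `ReadPair` record and the 1,000-bp cutoff for an "abnormal fragment size" are hypothetical choices for illustration; the text does not state the threshold that was used.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Mapped positions of the two mates of one paired-end read (hypothetical record)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_chimeric(pair: ReadPair, max_fragment: int = 1000) -> bool:
    """Flag interchromosomal pairs or pairs implying an abnormal fragment size.

    `max_fragment` is an assumed cutoff, not a value taken from the text.
    """
    if pair.chrom1 != pair.chrom2:
        return True
    return abs(pair.pos2 - pair.pos1) > max_fragment


def chimera_rate(pairs, max_fragment: int = 1000) -> float:
    """Improperly connected pairs divided by all mappable pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no mapped read pairs supplied")
    return sum(is_chimeric(p, max_fragment) for p in pairs) / len(pairs)


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 100, "chr1", 350),      # proper pair
        ReadPair("chr1", 100, "chr2", 300),      # interchromosomal
        ReadPair("chr3", 100, "chr3", 50_000),   # abnormal fragment size
        ReadPair("chrX", 10, "chrX", 260),       # proper pair
    ]
    print(chimera_rate(pairs))
```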
Unmappable rate. The unmappable rate is the probability that junk sequences are generated in the WGA process, which can arise from the formation of primer dimers, short DNA fragments, and/or nonspecific incorporations. A large fraction of unmappable reads reduces the cost effectiveness of genome sequencing and the completeness of the genome coverage. Figure 6f shows the unmappable rates of the WGA kits. The Qiagen and Yikon kits showed the lowest unmappable rates.
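The cost impact of unmappable reads can be estimated with simple arithmetic on the raw data volumes and unmappable rates listed in Table 1. The sketch below assumes a haploid human genome size of about 3.1 Gb and treats mappable bases divided by genome size as an approximate depth; this may not match the exact definition of effective sequencing depth used for Figure 4.

```python
GENOME_SIZE_BP = 3.1e9  # assumed approximate human genome size

# (raw data in Gb, unmappable rate) as listed in Table 1
TABLE1 = {
    "Sigma-Aldrich": (79, 0.64),
    "Qiagen": (78, 0.18),
    "GE": (105, 0.44),
    "Yikon": (96, 0.22),
    "Rubicon": (80, 0.66),
}


def mappable_gb(raw_gb: float, unmappable_rate: float) -> float:
    """Gigabases left after discarding the unmappable fraction."""
    return raw_gb * (1.0 - unmappable_rate)


def approximate_depth(raw_gb: float, unmappable_rate: float,
                      genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Mappable bases divided by genome size (a rough depth estimate)."""
    return mappable_gb(raw_gb, unmappable_rate) * 1e9 / genome_size_bp


if __name__ == "__main__":
    for kit, (raw, umr) in TABLE1.items():
        print(f"{kit:14s} mappable = {mappable_gb(raw, umr):6.2f} Gb, "
              f"depth ~ {approximate_depth(raw, umr):5.1f}x")
```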
We would like to point out that all of the comparisons described above were done using the existing kits available to us and that each would improve with time. As important as the specific results is how we performed the comparison. It is fair to say that each method or kit has its own advantages and disadvantages, and that selection should be based on specific applications.
For accurate CNV determination, DOP-PCR and MALBAC kits offer the highest reproducibility and lowest coefficients of variation. MALBAC kits and Qiagen’s MDA kit are well suited for simultaneous measurements of SNVs and CNVs. For calling de novo SNV, the false negative and false positive rates need to be considered. Yikon’s MALBAC kit has the lowest false negatives (the allele dropout rate), while Qiagen’s MDA kit has the lowest false positive rate. When sequencing cost effectiveness is considered, a low unmappable rate is desirable for high effective coverage.
We performed our comparison using small sample tubes instead of microfluidic devices; the performance parameters might change under microfluidic conditions and might be different for bacterial (small) and human (large) genomes. Although expensive, a similar comprehensive comparison of the chemistries for human genomes would be worth doing under microfluidic conditions.
STUDY OF GERM CELLS
Phasing of Human Genomes with Single-Cell Genomics
One central goal of human genetics is to characterize the genetic variations of individual hu-man beings and study their correlations with function-related traits (1, 44). Human genomes arediploid in nature, and human genes often contain multiple coding and regulatory elements withvariations among different populations. Different phenotypic consequences can often arise de-pending on whether the genetic variations are associated on the same chromosome (in cis) or onopposite homolog chromosomes (in trans) (2, 62). One example is the phenomenon of compoundheterozygosity (28, 66), in which the two heterozygous nonsynonymous mutations in a certaingene could result in either one normal version (as in cis) or no normal version (as in trans) of thegene. A complete understanding of genetic variations and their consequences in human diseasescannot be obtained without resolving the combination of genetic variations at different loci onthe same chromosome—that is, the haplotypes (16, 29). Without the haplotype information, thedescription of individual human genomes is incomplete, and the functional interpretation can beerror prone.
However, owing to the limited read lengths of the currently available sequencing techniques,it is often challenging to separate the two homolog chromosomes in genetic analyses to obtain theassociation information between the genetic variations. Molecular cloning technologies have beendeveloped for phasing individual genomes (33, 38, 59, 76). However, the cloning procedures areoften lengthy and labor intensive and therefore may not be scalable with the increasing demandfor individual genome sequencing.
Single-cell genome sequencing provides an alternative way to obtain phasing information on human individuals. Individual metaphase chromosomes have been isolated from actively dividing cells by techniques such as laser microdissection (41), microfluidics (14), or flow cytometry (74) before being whole-genome amplified and genotyped to obtain the association information for genetic variations on individual chromosomes. Peters et al. (50) further developed a low-cost DNA haplotyping process by diluting genomic DNA into 384 wells, each containing subcellular-level genomic DNA. DNA molecules from each well were then separately whole-genome amplified and sequenced to obtain the phasing information. Hou et al. (24), Kirkness et al. (32), and Lu et al. (40) used whole-genome-amplified individual germ cells for haplotype analyses, further increasing the haplotype block size to the chromosome-spanning level (Figure 8). These latter two categories of haplotyping approaches (dilution based and germ cell based) do not require molecular cloning, cell culturing, or sophisticated instrumentation for chromosome isolation, potentially enabling haplotype-resolved comprehensive genetic studies and clinical applications in the near future.
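The germ-cell approach can be illustrated with a minimal sketch. It assumes that each sperm reports one allele per heterozygous site and that linkage between adjacent sites is decided by a simple majority vote across sperm, so that occasional crossovers, dropouts, and errors are outvoted. The function name `phase_from_sperm`, the `"-"` dropout marker, and the sperm reads are hypothetical; the voting rule is an illustration and is not the procedure used in the cited studies.

```python
from collections import Counter


def phase_from_sperm(het_sites, sperm_calls):
    """Phase heterozygous sites by majority vote over adjacent-site linkage.

    het_sites   -- list of (allele_1, allele_2) tuples, one per heterozygous SNP
    sperm_calls -- list of strings, one per sperm; one character per site,
                   with "-" marking a site that dropped out
    Returns the two inferred haplotypes as strings.
    """
    hap1 = [het_sites[0][0]]  # arbitrarily seed haplotype 1 with the first allele
    for i in range(1, len(het_sites)):
        a, b = het_sites[i]
        votes = Counter()
        for calls in sperm_calls:
            left, right = calls[i - 1], calls[i]
            # Skip dropouts and bases that match neither known allele.
            if left not in het_sites[i - 1] or right not in het_sites[i]:
                continue
            other = b if right == a else a
            # A sperm carrying haplotype 1's allele at the previous site votes
            # for its own allele here; a sperm carrying the other homolog
            # votes for the opposite allele.
            votes[right if left == hap1[-1] else other] += 1
        if not votes:
            raise ValueError(f"no sperm links sites {i - 1} and {i}")
        hap1.append(votes.most_common(1)[0][0])
    hap2 = [b if h == a else a for h, (a, b) in zip(hap1, het_sites)]
    return "".join(hap1), "".join(hap2)


# The five heterozygous sites of Figure 8; the sperm reads are hypothetical.
sites = [("A", "T"), ("C", "G"), ("A", "T"), ("C", "G"), ("A", "C")]
sperm = ["ACAGC", "TGTCA", "ACAGC", "TGTCA",
         "ACTCA",   # crossover between the second and third sites
         "TGTCA",
         "AC-GC",   # allele dropout at the third site
         "TGTGA"]   # amplification or sequencing error at the fourth site
print(phase_from_sperm(sites, sperm))
```

With these hypothetical reads the vote recovers the two haplotypes ACAGC and TGTCA despite one crossover, one dropout, and one erroneous base. Ties are broken arbitrarily in this sketch, which is one reason real data call for many sperm per individual.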
Study of Meiotic Recombination in Human Germ Cells
Meiotic recombination, which is essential to the proper segregation of homologous chromosomes, exchanges genetic information through crossover events and creates diversity for evolution (8). Abnormality in generating crossovers of homologous chromosomes is the leading cause of miscarriage and birth defects (13). Population analyses such as linkage disequilibrium and pedigree studies are widely used in studying the highly uneven pattern of meiotic recombination across the human genome. However, population analyses yield results that are averaged among individuals and affected by evolutionary pressures, therefore often masking the quickly evolving or individual-specific features of the recombination-active regions (23, 30, 34).

[Figure 8 graphic: an unphased personal genome with five heterozygous sites (A/T, C/G, A/T, C/G, A/C), the alleles observed in single sperm 1–8, and the resulting phased genome (A C A G C and T G T C A).]

Figure 8
Phasing the genome of an individual using the single-nucleotide variation (SNV) linkage information from his single sperm cells. The diploid genome was sequenced and found to contain five heterozygous single-nucleotide polymorphisms (SNPs), with unknown linkage information shown in purple. Individual sperm cells were then sequenced via whole-genome amplification using the multiple annealing and looping–based amplification cycles (MALBAC) method, from which SNP linkage information in each sperm was used to phase the individual’s diploid genome. The light blue “T” is a whole-genome amplification or sequencing error; the black “x” represents a crossover in recombination, the switching point between the paternal and maternal DNA. Figure adapted from Reference 40.
Single-germ-cell genome sequencing offers a novel approach for studying meiotic recombination at the level of individual human beings. Each germ cell is unique in genome construction because of the differences in recombination, and sequencing multiple single germ cells from an individual allows investigators to map individual recombination events and study germline genome instability. Single-cell studies have been published using donor sperm (40, 68) as well as oocytes (24) for mapping recombination in individual human beings. Wang et al. (68) utilized a microfluidic device to separate and sequence single sperm, thereby mapping the recombination distributions and studying the gene conversion and de novo mutation events of an individual. Lu et al. (40) sequenced ∼100 single sperm cells using the MALBAC method and mapped the recombination events of an individual at high resolution (Figure 9). This study revealed a correlation between a decreased crossover frequency and an increased autosomal aneuploidy rate in human sperm, reinforcing the importance of meiotic crossover in maintaining genome stability in germ cells.
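Once an individual's genome has been phased, mapping crossovers in a single sperm amounts to finding where the sperm's alleles switch from one parental haplotype to the other. The sketch below is one simple way to do this, under the assumptions that both haplotypes are known at every SNP and that an isolated mismatching allele is more likely an amplification error than a double crossover. The function name `call_crossovers`, the `min_run` threshold, and the toy sequences are hypothetical and are not taken from the cited studies.

```python
def call_crossovers(paternal, maternal, sperm, min_run=2):
    """Return (left_index, right_index) SNP pairs that bracket each crossover."""
    # Label every informative SNP by the parental haplotype the sperm matches.
    labels = []
    for pos, (p, m, s) in enumerate(zip(paternal, maternal, sperm)):
        if p == m:
            continue  # homozygous site: carries no phase information
        if s == p:
            labels.append((pos, "P"))
        elif s == m:
            labels.append((pos, "M"))
        # anything else (dropout, unexpected base) is ignored

    # Collapse consecutive labels into runs: [origin, first, last, count].
    runs = []
    for pos, origin in labels:
        if runs and runs[-1][0] == origin:
            runs[-1][2] = pos
            runs[-1][3] += 1
        else:
            runs.append([origin, pos, pos, 1])

    # Treat runs shorter than min_run as errors, then merge what remains.
    kept = []
    for origin, first, last, count in runs:
        if count < min_run:
            continue
        if kept and kept[-1][0] == origin:
            kept[-1][2] = last
            kept[-1][3] += count
        else:
            kept.append([origin, first, last, count])

    return [(left[2], right[1]) for left, right in zip(kept, kept[1:])]


paternal = "AAAAAAAAAA"   # hypothetical phased haplotypes at ten SNPs
maternal = "CCCCCCCCCC"
sperm = "AAACAACCCC"      # the isolated "C" at index 3 mimics an amplification error
print(call_crossovers(paternal, maternal, sperm))
```

Each returned pair gives the last SNP assigned to one haplotype and the first SNP assigned to the other, so the resolution of a crossover call is set by the spacing of informative SNPs that were successfully covered in that cell.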
[Figure 9 graphic: for each of chromosomes 1–22, X, and Y, the percentage of reads (0–100) covering paternal SNPs and maternal SNPs along the chromosome. Legend: paternal haplotype, maternal haplotype, unresolved haplotype, unassembled gap, centromere region, crossover.]

Figure 9
Crossovers between paternal (green) and maternal (red) DNA in the 23 chromosomes of a sperm cell. Figure adapted from Reference 40.
Hou et al. (24) utilized the first and the second polar bodies of individual oocytes to determine the crossover maps of female individuals and to study the chromatid interference effect in oocytes. This study also utilized the information of the two polar bodies to deduce the genomes of the oocyte pronuclei, enabling noninvasive preimplantation genetic diagnosis (PGD) and preimplantation genomic screening (PGS) for in vitro–fertilized embryos (Figure 10).
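The copy-number part of this deduction follows from the conservation rule stated in the Figure 10 caption: for each chromosome, the chromatids in the first polar body, the second polar body, and the female pronucleus total four. A minimal sketch of that bookkeeping follows, assuming copy numbers are counted in chromatids and measured without error; the function name `deduce_pronucleus` and the example values are hypothetical.

```python
def deduce_pronucleus(pb1, pb2, total_chromatids=4):
    """Infer the per-chromosome copy number of the female pronucleus.

    pb1, pb2 -- dicts mapping chromosome name to the chromatid copy number
                measured in the first and second polar body
    """
    deduced = {}
    for chrom in pb1:
        copies = total_chromatids - pb1[chrom] - pb2[chrom]
        if copies < 0:
            raise ValueError(
                f"{chrom}: polar bodies exceed {total_chromatids} chromatids"
            )
        deduced[chrom] = copies
    return deduced


# Hypothetical measurements for three chromosomes.
pb1 = {"chr1": 2, "chr16": 2, "chr21": 1}
pb2 = {"chr1": 1, "chr16": 0, "chr21": 1}
for chrom, copies in deduce_pronucleus(pb1, pb2).items():
    status = "normal" if copies == 1 else "abnormal"
    print(chrom, copies, status)
```

A normal segregation leaves two chromatids in the first polar body and one each in the second polar body and the pronucleus, so any deduced value other than one flags a chromosome that would be aneuploid after fertilization.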
Preimplantation Genetic Diagnosis and Genomic Screening in In Vitro Fertilization
In vitro fertilization is an assisted reproductive technology by which an egg is fertilized by sperm outside the human body. The fertilized egg is cultured in vitro for 2–6 days before being implanted into the uterus with the intention of creating a successful pregnancy. Nucleic acid materials obtained from the in vitro–fertilized embryos can be used to perform PGD and PGS to avoid the inheritance of pathogenic mutations and chromosome abnormalities, respectively.

[Figure 10 graphic: (a) a human oocyte (chromosome 1 shown) with paternal and maternal haplotypes, fertilization by a sperm, polar bodies 1 and 2, and the female and male pronuclei, with three outcomes: healthy baby, birth difficulty or baby with genetic disorder, and baby with dominant disease allele; (b) chromatid copy number (0–4) across chromosomes 1–22 and X for the first polar body, the second polar body, the total conserved in the whole oocyte, the predicted female pronucleus, and the confirmed female pronucleus.]

Figure 10
Deduction of viability for preimplantation genomic screening for in vitro fertilization. (a) A human oocyte with homologous recombination between the paternal (blue) and maternal (red) DNA (only one chromosome is shown). Upon fertilization with a sperm cell, the first and second polar bodies, which are dispensable for embryo development, can be safely biopsied by a micropipette and subjected to whole-genome amplification with multiple annealing and looping–based amplification cycles (MALBAC). By sequencing the two polar bodies or a few cells in the blastocyst stage of the embryo, one can avoid chromosome abnormality (middle case) as well as undesirable point mutations from a disease-carrying parent (right case) and increase in vitro fertilization’s success rate for healthy babies (left case). (b) Deduction of the copy number of chromosomes from a female pronucleus based on its two polar bodies. The deduction is based on the fact that the total number of four chromatids is conserved, which was verified by sequencing the female pronucleus with the donor’s consent. Panel adapted from Reference 24.
PGD/PGS can be performed on the two polar bodies of the oocytes, one of the blastomeric cells from day-3 embryos, or several trophectoderm cells from day-5 blastocyst embryos; in each case, only one or a few cells can be obtained for PGD/PGS purposes. Performing WGA on these materials enables comprehensive chromosome analyses on various genome analytical platforms, such as a comparative genomic hybridization array (52), single-nucleotide polymorphism (SNP) array (63), or multiplex quantitative PCR (65). The rapid development of high-throughput sequencing techniques further reduced the cost and increased the precision and resolution of chromosome-level PGD/PGS (24, 26, 71). Treff et al. (64) utilized targeted high-throughput sequencing to perform PGD of monogenic disease, and efforts were made to perform both point mutation PGD and chromosomal PGS in the same embryo (10, 58). By using MALBAC next-generation sequencing, we were able to report a fully integrated pipeline for combined monogenic PGD and chromosomal PGS and a live birth free of a specific pathogenic mutation carried by the child’s parents (22). We envision that the advancement of WGA techniques will enable clinical trials with fully integrated pipelines for combined monogenic PGD and chromosomal PGS in the near future.
STUDY OF GENOME EVOLUTION IN CANCER
Cancer is a genomic disease (67). Recent advances in single-cell WGA methods have allowed determination of CNVs at a large scale (48, 70) and at the single-gene level (15) as well as SNVs of single cancer cells (18, 70, 75), whether in primary tissue or among CTCs in blood (9, 39, 49).
A pair of companion studies used single-cell WGA and exome sequencing to examine tumor heterogeneity (25, 73). These studies found hundreds of SNVs that differ from cell to cell (revealing the genetic complexity of these tumors) and applied population genetic analysis to study their clonal composition. However, our independent analysis of these data indicates that the vast majority of these SNVs are due to sequencing artifacts or possibly contamination, and only a handful, if any, may be attributed to true heterogeneity of the tumor samples (A. Chapman, C. Zong & X.S. Xie, unpublished results). Of the 711 tumor-specific SNVs reported in one of the studies (25), we found that 42% were actually present at a lower level in the normal tissue as well. These are likely to be SNPs that were mistakenly not identified in the normal tissue (false negatives) rather than tumor-specific SNVs. Furthermore, 58% of the remaining SNVs reported were present in dbSNP, a database of germline mutations known to occur in the human population. Because new mutations in a tumor arise randomly, it would be highly unlikely that a significant number of them happen to coincide with known mutations present in the general population. Instead, an alternative explanation for these mutations could be contamination from the operators or other human sources. Indeed, we observed that of those remaining SNVs not found in dbSNP, 84% were found in unrelated samples, either normal control cells from the same study or cells from the companion study. In summary, more than 96% of the SNVs identified as being tumor specific in that report (25) appear likely to be contaminants or sequencing artifacts.
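The figure of more than 96% follows from applying the three percentages in sequence, each to the SNVs left unexplained by the previous step. The short calculation below makes that arithmetic explicit; it uses only the numbers quoted above, and the function and variable names are illustrative.

```python
def surviving_fraction(filter_fractions):
    """Fraction of calls left after successive filters.

    Each entry is the fraction of the calls remaining at that stage that the
    filter explains away.
    """
    remaining = 1.0
    for fraction in filter_fractions:
        remaining *= 1.0 - fraction
    return remaining


reported_snvs = 711
filters = [
    0.42,  # also present at a lower level in the normal tissue
    0.58,  # of the remainder, present in dbSNP
    0.84,  # of the remainder, found in unrelated samples
]
remaining = surviving_fraction(filters)
print(f"explained: {1 - remaining:.1%}")
print(f"unexplained SNVs: about {reported_snvs * remaining:.0f} of {reported_snvs}")
```

The product 0.58 × 0.42 × 0.16 is about 0.039, so roughly 96.1% of the 711 reported SNVs are accounted for by one of the three explanations, leaving on the order of 28 calls as candidates for true tumor heterogeneity.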
We now give a few examples of new information available from single-cell genomics.
Measurement of Spontaneous Whole-Genome Mutation Rate
In calling SNVs from a single cell by WGA (24, 68, 73), one challenge is high false negative rates caused by allele dropout, and another is false positives associated with amplification and sequencing errors (either random or systematic) (42). Figure 11 shows 35 unique SNVs in a human cancer cell line (SW480) newly acquired during 20 cell divisions. Adjusting for a 70% detection efficiency for heterozygous SNVs, Zong et al. (78) estimated that ∼49 mutations occurred in the 20 generations, yielding a mutation rate of ∼2.5 nucleotides per cell generation, which is consistent with the estimate based on the bulk data.

[Figure 11 graphic: (a) workflow from a single ancestor cell through ∼20 generations to bulk DNA and kindred single cells C1–C6, followed by MALBAC and whole-genome sequencing; (b) SNV locations on chromosomes 1–22, X, and Y; (c) aligned reads at chromosome 2: 57,736,349 for C1, C2, C3, and bulk.]

Figure 11
Detecting newly acquired single-nucleotide variations (SNVs) with no false positives and estimation of the mutation rate of a human cancer cell line (SW480). (a) Experimental design. A single ancestor cell was chosen and cultured for ∼20 generations. The vast majority of cells were used to extract DNA for bulk sequencing to represent the ancestral cell’s genome. A single cell from this culture was chosen for another expansion of four generations. The kindred cells were isolated for single-cell whole-genome amplification. Single-cell samples C1, C2, and C3 were used for high-throughput sequencing. (b) Locations of the 35 newly acquired SNVs on the chromosomes of a single cell. (c) Next-generation sequencing data of a newly acquired SNV. The SNV (C→G) existed in the high-throughput data of all three kindred cells but not in the bulk data. Additional abbreviation: MALBAC, multiple annealing and looping–based amplification cycles. Figure adapted from Reference 78.
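The correction behind the SW480 estimate is a division by the detection efficiency followed by a division by the number of generations. The sketch below reproduces it with the rounded values quoted in the text; the function name is illustrative. With exactly 70% efficiency the corrected count comes out at 50 rather than the ∼49 reported, a small gap that presumably reflects rounding of the efficiency (an assumption on our part, not a statement from Reference 78).

```python
def mutation_rate(observed_snvs, detection_efficiency, generations):
    """Correct an observed SNV count for missed calls and divide by generations."""
    estimated_total = observed_snvs / detection_efficiency
    return estimated_total, estimated_total / generations


total, per_generation = mutation_rate(
    observed_snvs=35,           # unique SNVs shown in Figure 11
    detection_efficiency=0.70,  # rounded value quoted in the text
    generations=20,
)
print(f"estimated mutations: {total:.0f}")
print(f"per cell generation: {per_generation:.1f}")
```

Either way, the result rounds to the same ∼2.5 mutations per cell generation.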
At this rather slow spontaneous mutation rate, cancer development would take a long time. What happens at the very beginning of cancer development has been a long-standing question, and single-cell WGA is well suited to help answer it.
Does Copy-Number Variation Precede Single-Nucleotide Variation?
In a bulk tissue sample at an early stage of cancer development, detection of abnormal CNVs is often difficult, especially when the number of cells with abnormal CNVs is small. Single-cell genomic analysis is essential in evaluating the relationship, if any, between CNVs and SNVs.
[Figure 12 graphic: (a) stages of progression: abnormal cell growth, hyperproliferation, adenomatous polyps, high-grade dysplasia, adenocarcinoma, invasive cancer; (b) copy number (0–4) across chromosomes 1–22, X, and Y for cells 1–8; (c) copy number (0–4) along chromosome 5 for cells 1–8; (d) reads from cells 1–8 aligned to the RefSeq KCNB1 gene near 47,900,760 base pairs.]
We have carried out single-cell genomic analyses of colonoscopy biopsies at different adenoma stages (L. Huang, S. Ding & X.S. Xie, unpublished results). Some single cells in stage II adenoma have CNVs in the tumor suppressor gene APC (a reduction in copy number from two to one) as well as SNVs in numerous cancer-related genes. Interestingly, we found that although some single cells exhibited CNV reduction in APC without SNVs, all single cells with SNVs that have been reported as somatic mutations of colon and other cancers in the Catalogue of Somatic Mutations in Cancer (COSMIC) database showed the CNV reduction in APC. Moreover, we did not see any single cell that has colon cancer–related SNVs but no CNVs in an adenoma. Figure 12 shows such data for the COSMIC gene KCNB1, which encodes an ion channel. These data indicate that the CNV in APC precedes the SNV in colon cancer development, at least in the particular colony examined.
Furthermore, we found that the single cells from the same adenoma exhibited CNV patterns reproducible among all the cells, indicating that these cells might be derived from a single stem or progenitor cell in which the CNVs first arose. Thus, we have established a correlation between the SNVs and CNVs and propose that the SNVs are generated as a consequence of abnormal CNVs in the genome, or arise after the abnormal CNVs are acquired in the original populating cell. If confirmed to be general for driving SNVs, this result could have significant implications for the genesis of cancer. Suffice it to say that this experiment underscores the importance of single-cell genomics in understanding cancer.
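The inference that CNVs precede SNVs rests on a nesting pattern across single cells: cells carrying the SNV form a strict subset of cells carrying the CNV, with no cell showing the SNV alone. The sketch below expresses that check, which bulk data cannot support because it requires per-cell co-occurrence. The cell-by-cell assignments are hypothetical; only the totals (five of eight cells with the CNV, four with the SNV, none with the SNV alone) follow the Figure 12 caption. The pattern is consistent with a CNV-first ordering but does not by itself prove it.

```python
def summarize_ordering(cells):
    """cells maps a cell name to a (has_cnv, has_snv) pair of booleans."""
    with_cnv = {name for name, (cnv, _) in cells.items() if cnv}
    with_snv = {name for name, (_, snv) in cells.items() if snv}
    return {
        "snv_without_cnv": sorted(with_snv - with_cnv),
        "cnv_without_snv": sorted(with_cnv - with_snv),
        # A CNV-first ordering predicts SNV cells strictly nested in CNV cells.
        "consistent_with_cnv_first": bool(with_snv) and with_snv < with_cnv,
    }


# Hypothetical assignment of the eight cells; only the totals follow Figure 12.
cells = {
    "cell1": (True, True),
    "cell2": (True, True),
    "cell3": (True, False),
    "cell4": (False, False),
    "cell5": (True, True),
    "cell6": (False, False),
    "cell7": (True, True),
    "cell8": (False, False),
}
print(summarize_ordering(cells))
```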
Circulating Tumor Cells
Originating from primary tumors, CTCs enter peripheral blood and seed metastases, which account for 90% of cancer-related deaths. The genome sequencing of CTCs could offer noninvasive prognosis or even diagnosis but has been hampered by low single-cell genome coverage of scarce CTCs.
Ni et al. (49) applied MALBAC for WGA of single CTCs from lung cancer patients and observed characteristic cancer-associated SNVs and indels in exomes of CTCs. These mutations provided information needed for individualized therapy (53), such as drug resistance and phenotypic transition, but were heterogeneous from cell to cell.
Ni et al. (49) also discovered that, unlike the highly heterogeneous point mutations, the CNV patterns of CTCs are reproducible from CTC to CTC within a patient and even among different patients of the same cancer type, but are distinctly different among different cancer types (Figure 13). Furthermore, the reproducible CNV patterns of CTCs are similar to those of the metastatic tumor. This result raised intriguing questions about the genesis of metastasis. It is evident that gains and losses in copy numbers of certain chromosome regions are selected for metastases. The finding that the CNV patterns of CTCs are cancer or tissue dependent offers the potential for noninvasive cancer diagnosis based on the CNV patterns.

Figure 12
Temporal sequence of copy-number variations (CNVs) and single-nucleotide variations (SNVs) of colon polyps. (a) Normal intestinal epithelium and adenoma. (b) CNVs detected in eight single cells (3,000-kb bin size). Some cells exhibit changes in chromosome 5 copy number from two to one. This is an ∼8-Mb region. (c) CNVs seen when zooming into chromosome 5 (1,000-kb bin size). A reduction of copy number from two to one is apparent in the 5q region containing the APC-coding gene (chromosomal region 5q21.3–23.1). The same region is seen for five different single cells, indicating that they came from the same stem cell. (d) Four of eight single cells exhibit point mutations in the KCNB1 gene, which has been reported as a somatic mutation of colon cancer in the Catalogue of Somatic Mutations in Cancer (COSMIC) database. Serine (S), valine (V), and isoleucine (I) are the amino acids translated by the triplet codon. The KCNB1 gene is a protein-coding gene that encodes an ion channel protein. Interestingly, SNVs occur only in cells with abnormal CNVs.

[Figure 13 graphic: (a) copy number (0–6) by chromosome for the primary tumor, the metastasis, and CTCs 1–8 of P1; (b) copy number (0–6) by chromosome for patients P1–P7; (c) dendrogram of individual CTCs (labeled by patient and cell, such as P1_1 and P7_2) against cluster distance.]

Figure 13
Reproducible copy-number variation (CNV) patterns of seven lung cancer patients. Patient 1 (P1) experienced a phenotypic transition from adenocarcinoma (ADC) in the lung to small-cell lung cancer (SCLC) in the metastatic liver, patients 2–6 (P2–6) have ADC, and patient 7 (P7) has a mix of ADC and SCLC. (a) Reproducible CNV patterns of eight circulating tumor cells (CTCs) in P1. The patterns are different from that of the primary tissue (first row) but the same as that of the metastatic tissue (second row). (b) CNV patterns of CTCs from the seven patients. (c) Clustering analyses of CTCs based on the CNVs. CTCs from P1 and P7 (SCLC patients) were well separated from CTCs from P2–P6 (ADC patients). The y axis is the cluster distance constructed using Ward’s method based on Euclidean distances between the patients’ CNVs. Figure adapted from Reference 49.
DISCLOSURE STATEMENT
S.L. and X.S.X. are coauthors on a patent applied for by Harvard University for MALBAC technology and are cofounders of Yikon Genomics.
ACKNOWLEDGMENTS
We thank Chenghang Zong for his contribution to the invention of MALBAC. We thank Yaqiong Tang, Dandan Cao, and Hongshan Guo for technical assistance in data collection and Guangyu Zhou for help with data analysis. We are indebted to our collaborators Fuchou Tang, Jie Qiao, Xiaohui Ni, Fan Bai, Wei Fan, Liya Xu, Jie Wang, Ning Zhang, Youyong Lu, Liying Yan, Yu Hou, Zhe Su, Yan Gao, Shigang Ding, Hejun Zhang, Ruiqiang Li, Mingyu Yang, Jinsen Li, Jessica Sang, and Yanyi Huang for their contributions to the work reviewed in this article. We are grateful for funding from the National Natural Science Foundation of China (21327808), US National Institutes of Health (NIH) National Human Genome Research Institute grants (HG005097-1 and HG005613-01), and an NIH Director’s Pioneer Award (5DP1CA186693) to X.S.X. as well as a National Cancer Institute grant (5R33CA174560). The work at the Biodynamic Optical Imaging Center (BIOPIC) has been supported by Peking University 985 special funding for collaboration with hospitals and funding from Guangxi Wuzhou Zhongheng Group Co., Ltd.
LITERATURE CITED
1. 1000 Genomes Proj. Consort. 2010. A map of human genome variation from population-scale sequencing. Nature 467:1061–73
2. Bansal V, Tewhey R, Topol EJ, Schork NJ. 2011. The next phase in human genetics. Nat. Biotechnol. 29:38–39
3. Barrett MT, Reid BJ, Joslyn G. 1995. Genotypic analysis of multiple loci in somatic cells by whole genome amplification. Nucleic Acids Res. 23:3488–92
4. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456:53–59
5. Blanco L, Bernad A, Lazaro JM, Martin G, Garmendia C, Salas M. 1989. Highly efficient DNA synthesis by the phage φ29 DNA polymerase: symmetrical mode of DNA replication. J. Biol. Chem. 264:8935–40
6. Chapman AR, He Z, Lu S, Yong J, Tan L, et al. 2015. Single cell transcriptome amplification with MALBAC. PLOS ONE 10:e0120889
7. Cheung VG, Nelson SF. 1996. Whole genome amplification using a degenerate oligonucleotide primer allows hundreds of genotypes to be performed on less than one nanogram of genomic DNA. PNAS 93:14676–79
8. Coop G, Przeworski M. 2007. An evolutionary view of human recombination. Nat. Rev. Genet. 8:23–34
9. Dago AE, Stepansky A, Carlsson A, Luttgen M, Kendall J, et al. 2014. Rapid phenotypic and genomic change in response to therapeutic pressure in prostate cancer inferred by high content analysis of single circulating tumor cells. PLOS ONE 9:e101777
10. Daina G, Ramos L, Obradors A, Rius M, Martinez-Pasarell O, et al. 2013. First successful double-factor PGD for Lynch syndrome: monogenic analysis and comprehensive aneuploidy screening. Clin. Genet. 84:70–73
11. de Bourcy CF, De Vlaminck I, Kanbar JN, Wang J, Gawad C, Quake SR. 2014. A quantitative comparison of single-cell whole genome amplification methods. PLOS ONE 9:e105585
12. Dean FB, Nelson JR, Giesler TL, Lasken RS. 2001. Rapid amplification of plasmid and phage DNA using phi29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res. 11:1095–99
13. Epstein CJ. 2007. The Consequences of Chromosome Imbalance: Principles, Mechanisms, and Models. Cambridge, UK: Cambridge Univ. Press
14. Fan HC, Wang J, Potanina A, Quake SR. 2011. Whole-genome molecular haplotyping of single cells. Nat. Biotechnol. 29:51–57
15. Francis JM, Zhang CZ, Maire CL, Jung J, Manzo VE, et al. 2014. EGFR variant heterogeneity in glioblastoma resolved through single-nucleus sequencing. Cancer Discov. 4:956–71
16. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–61
17. Garmendia C, Bernad A, Esteban JA, Blanco L, Salas M. 1992. The bacteriophage φ29 DNA polymerase, a proofreading enzyme. J. Biol. Chem. 267:2594–99
18. Gawad C, Koh W, Quake SR. 2014. Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. PNAS 111:17947–52
19. Guo H, Zhu P, Wu X, Li X, Wen L, Tang F. 2013. Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome Res. 23:2126–35
20. Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, et al. 2008. Single-molecule DNA sequencing of a viral genome. Science 320:106–9
21. Hastings PJ, Lupski JR, Rosenberg SM, Ira G. 2009. Mechanisms of change in gene copy number. Nat. Rev. Genet. 10:551–64
22. Heger M. 2014. Peking University reports first success in trial of single-cell sequencing for PGD. GenomeWeb, Oct. 1. http://www.genomeweb.com/sequencing/peking-university-reports-first-success-trial-single-cell-sequencing-pgd
23. Hinch AG, Tandon A, Patterson N, Song Y, Rohland N, et al. 2011. The landscape of recombination in African Americans. Nature 476:170–75
24. Hou Y, Fan W, Yan L, Li R, Lian Y, et al. 2013. Genome analyses of single human oocytes. Cell 155:1492–506
25. Hou Y, Song L, Zhu P, Zhang B, Tao Y, et al. 2012. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 148:873–85
26. Huang J, Yan L, Fan W, Zhao N, Zhang Y, et al. 2014. Validation of multiple annealing and looping-based amplification cycle sequencing for 24-chromosome aneuploidy screening of cleavage-stage embryos. Fertil. Steril. 102:1685–91
27. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, et al. 2004. Detection of large-scale variation in the human genome. Nat. Genet. 36:949–51
28. Ingles J, Doolan A, Chiu C, Seidman J, Seidman C, Semsarian C. 2005. Compound and double mutations in patients with hypertrophic cardiomyopathy: implications for genetic testing and counselling. J. Med. Genet. 42:e59
29. Int. HapMap Consort. 2005. A haplotype map of the human genome. Nature 437:1299–320
30. Jeffreys AJ, Neumann R. 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat. Genet. 31:267–71
31. Kalisky T, Quake SR. 2011. Single-cell genomics. Nat. Methods 8:311–14
32. Kirkness EF, Grindberg RV, Yee-Greenbaum J, Marshall CR, Scherer SW, et al. 2013. Sequencing of isolated sperm cells for direct haplotyping of a human genome. Genome Res. 23:826–32
33. Kitzman JO, MacKenzie AP, Adey A, Hiatt JB, Patwardhan RP, et al. 2011. Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat. Biotechnol. 29:59–63
34. Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, et al. 2010. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467:1099–103
35. Lasken RS. 2012. Genomic sequencing of uncultured microorganisms from single cells. Nat. Rev. Microbiol. 10:631–40
36. Lasken RS. 2013. Single-cell sequencing in its prime. Nat. Biotechnol. 31:211–12
37. Lasken RS, Egholm M. 2003. Whole genome amplification: abundant supplies of DNA from precious samples or clinical specimens. Trends Biotechnol. 21:531–35
38. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, et al. 2007. The diploid genome sequence of an individual human. PLOS Biol. 5:e254
39. Lohr JG, Adalsteinsson VA, Cibulskis K, Choudhury AD, Rosenberg M, et al. 2014. Whole-exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer. Nat. Biotechnol. 32:479–84
40. Lu S, Zong C, Fan W, Yang M, Li J, et al. 2012. Probing meiotic recombination and aneuploidy of single sperm cells by whole-genome sequencing. Science 338:1627–30
41. Ma L, Xiao Y, Huang H, Wang Q, Rao W, et al. 2010. Direct determination of molecular haplotypes by chromosome microdissection. Nat. Methods 7:299
42. MacArthur D. 2012. Methods: face up to false positives. Nature 487:427–28
43. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–80
44. Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, et al. 2004. Genetic analysis of genome-wide variation in human gene expression. Nature 430:743–47
45. Mueller E, Brueck C. 2015. Whole genome amplification for single cell biology. Tech. Doc., Sigma-Aldrich, St. Louis, MO. http://www.sigmaaldrich.com/technical-documents/articles/life-science-innovations/whole-genome-amplification.html
46. Nat. Methods Eds. 2014. Method of the Year 2013. Nat. Methods 11:1
47. Navin NE. 2014. Cancer genomics: one cell at a time. Genome Biol. 15:452
48. Navin NE, Kendall J, Troge J, Andrews P, Rodgers L, et al. 2011. Tumour evolution inferred by single-cell sequencing. Nature 472:90–94
49. Ni X, Zhuo M, Su Z, Duan J, Gao Y, et al. 2013. Reproducible copy number variation patterns among single circulating tumor cells of lung cancer patients. PNAS 110:21083–88
50. Peters BA, Kermani BG, Sparks AB, Alferov O, Hong P, et al. 2012. Accurate whole-genome sequencing and haplotyping from 10 to 20 human cells. Nature 487:190–95
51. Pugh TJ, Delaney AD, Farnoud N, Flibotte S, Griffith M, et al. 2008. Impact of whole genome amplification on analysis of copy number variants. Nucleic Acids Res. 36:e80
52. Rubio C, Rodrigo L, Mir P, Mateu E, Peinado V, et al. 2013. Use of array comparative genomic hybridization (array-CGH) for embryo assessment: clinical results. Fertil. Steril. 99:1044–48
53. Ruiz C, Li J, Luttgen MS, Kolatkar A, Kendall JT, et al. 2015. Limited genomic heterogeneity of circulating melanoma cells in advanced stage patients. Phys. Biol. 12:016008
54. Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, et al. 1988. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487–91
55. Sandberg R. 2014. Entering the era of single-cell transcriptomics in biology and medicine. Nat. Methods 11:22–24
56. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, et al. 2004. Large-scale copy number polymorphism in the human genome. Science 305:525–28
57. Shapiro E, Biezuner T, Linnarsson S. 2013. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat. Rev. Genet. 14:619–30
58. Shen J, Cram DS, Wu W, Cai L, Yang X, et al. 2013. Successful PGD for late infantile neuronal ceroid lipofuscinosis achieved by combined chromosome and TPP1 gene analysis. Reprod. BioMed. Online 27:176–83
59. Suk EK, McEwen GK, Duitama J, Nowick K, Schulz S, et al. 2011. A comprehensively molecular haplotype-resolved genome of a European individual. Genome Res. 21:1672–85
60. Tang F, Lao K, Surani MA. 2011. Development and applications of single-cell transcriptome analysis. Nat. Methods 8(Suppl.):S6–11
61. Telenius H, Carter NP, Bebb CE, Nordenskjöld M, Ponder BA, Tunnacliffe A. 1992. Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13:718–25
62. Tewhey R, Bansal V, Torkamani A, Topol EJ, Schork NJ. 2011. The importance of phase information for human genomics. Nat. Rev. Genet. 12:215–23
63. Tobler KJ, Brezina PR, Benner AT, Du L, Xu X, Kearns WG. 2014. Two different microarray technologies for preimplantation genetic diagnosis and screening, due to reciprocal translocation imbalances, demonstrate equivalent euploidy and clinical pregnancy rates. J. Assist. Reprod. Genet. 31:843–50
64. Treff NR, Fedick A, Tao X, Devkota B, Taylor D, Scott RT. 2013. Evaluation of targeted next-generation sequencing–based preimplantation genetic diagnosis of monogenic disease. Fertil. Steril. 99:1377–84
65. Treff NR, Tao X, Ferry KM, Su J, Taylor D, Scott RT. 2012. Development and validation of an accurate quantitative real-time polymerase chain reaction–based assay for human blastocyst comprehensive chromosomal aneuploidy screening. Fertil. Steril. 97:819–24
66. Van Driest SL, Vasile VC, Ommen SR, Will ML, Tajik AJ, et al. 2004. Myosin binding protein C mutations and compound heterozygosity in hypertrophic cardiomyopathy. J. Am. Coll. Cardiol. 44:1903–10
67. Vogelstein B, Kinzler KW. 2004. Cancer genes and the pathways they control. Nat. Med. 10:789–99
68. Wang J, Fan HC, Behr B, Quake SR. 2012. Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell 150:402–12
69. Wang J, Wang W, Li R, Li Y, Tian G, et al. 2008. The diploid genome sequence of an Asian individual. Nature 456:60–65
70. Wang Y, Waters J, Leung ML, Unruh A, Roh W, et al. 2014. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature 512:155–60
71. Wells D, Kaur K, Grifo J, Glassner M, Taylor JC, et al. 2014. Clinical utilisation of a rapid low-pass whole genome sequencing technique for the diagnosis of aneuploidy in human embryos prior to implantation. J. Med. Genet. 51:553–62
72. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, et al. 2008. The complete genome of an individual by massively parallel DNA sequencing. Nature 452:872–76
73. Xu X, Hou Y, Yin X, Bao L, Tang A, et al. 2012. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell 148:886–95
74. Yang H, Chen X, Wong WH. 2011. Completely phased genome sequencing through chromosome sorting. PNAS 108:12–17
75. Yu C, Yu J, Yao X, Wu WK, Lu Y, et al. 2014. Discovery of biclonal origin and a novel oncogene SLC12A5 in colon cancer by single-cell sequencing. Cell Res. 24:701–12
76. Zhang K, Zhu J, Shendure J, Porreca GJ, Aach JD, et al. 2006. Long-range polony haplotyping of individual human chromosome molecules. Nat. Genet. 38:382–87
77. Zhang L, Cui X, Schmitt K, Hubert R, Navidi W, Arnheim N. 1992. Whole genome amplification from a single cell: implications for genetic analysis. PNAS 89:5847–51
78. Zong C, Lu S, Chapman AR, Xie XS. 2012. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science 338:1622–26
Errata
An online log of corrections to Annual Review of Genomics and Human Genetics articles may be found at http://www.annualreviews.org/errata/genom