18
GENETICS | INVESTIGATION A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo *,†,1 , Stephen M. Mount and Graham Coop * Population Biology Graduate Group, Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, CA 95616, Department of Cell Biology and Molecular Genetics, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742 ABSTRACT Close relatives can share large segments of their genome identical by descent (IBD) that can be identified in genome-wide polymorphism datasets. There are a range of methods to use these IBD segments to identify relatives and estimate their relationship. These methods have focused on sharing on the autosomes, as they provide a rich source of information about genealogical relationships. We can hope to learn additional information about recent ancestry through shared IBD segments on the X chromosome, but currently lack the theoretical framework to use this information fully. Here, we fill this gap by developing probability distributions for the number and length of X chromosome segments shared IBD between an individual and an ancestor k generations back, as well as between half- and full-cousin relationships. Due to the inheritance pattern of the X and the fact that X homologous recombination only occurs in females (outside of the pseudo- autosomal regions), the number of females along a genealogical lineage is a key quantity for understanding the number and length of the IBD segments shared amongst relatives. When inferring relationships among individuals, the number of female ancestors along a genealogical lineage will often be unknown. Therefore, our IBD segment length and number distributions marginalize over this unknown number of recombinational meioses through a distribution of recombinational meioses we derive. By using Bayes theorem to invert these distributions, we can estimate the number of female ancestors between two relatives, giving us details about the genealogical relations between individuals not possible with autosomal data alone. KEYWORDS X chromosome, genetic genealogy, statistical genetics, identity by descent, recent ancestry C lose relatives are expected to share large contiguous seg- ments of their genome due to the limited number of crossovers per chromosome each generation (Fisher et al. 1949, 1954; Donnelly 1983). These large identical by descent (IBD) segments shared among close relatives leave a conspicuous footprint in population genomic data, and identifying and un- derstanding this sharing is key to many applications in biology (Thompson 2013). For example, in human genetics, evidence of recent shared ancestry is an integral part of detecting cryp- tic relatedness in genome-wide association studies (Gusev et al. 2009), discovering mis-specified relationships in pedigrees (Sun et al. 2002), inferring pairwise relationships (Epstein et al. 2000; Glaubitz et al. 2003; Huff et al. 2011), and localizing disease traits in pedigrees (Thomas et al. 2008). In forensics, recent ances- try is crucial for both accounting for population-level related- Copyright © 2016 by the Genetics Society of America doi: 10.1534/genetics.XXX.XXXXXX Manuscript compiled: Tuesday 28 th June, 2016% 1 Email for correspondence: [email protected] ness (Balding and Nichols 1994) and in familial DNA database searches (Belin et al. 1997; Sjerps and Kloosterman 1999). Ad- ditionally, recent ancestry detection methods have a range of applications in anthropology and ancient DNA to understand the familial relationships among sets of individuals (Fu et al. 2015; Keyser-Tracqui et al. 2003; Baca et al. 2012; Haak et al. 2008). In population genomics, recent ancestry has been used to learn about recent migrations and other demographic events (Ralph and Coop 2013; Palamara et al. 2012). An understanding of re- cent ancestry also plays a large role in understanding recently admixed populations, where individuals draw ancestry from multiple distinct populations (Pool and Nielsen 2009; Gravel 2012; Liang and Nielsen 2014). Finally, relative finding through recent genetic ancestry is increasingly a key feature of direct-to- consumer personal genomics products and an important source of information for genealogists (Durand et al. 2014; Royal et al. 2010). Approaches to infer recent ancestry among humans have of- ten used only the autosomes, as the recombining autosomes of- Genetics, Vol. XXX, XXXXXXXX June 2016 1 Genetics: Early Online, published on June 29, 2016 as 10.1534/genetics.116.190041 Copyright 2016.

A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

  • Upload
    ngokien

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

GENETICS | INVESTIGATION

A Genealogical Look at Shared Ancestry on the XChromosome

Vince Buffalo∗,†,1, Stephen M. Mount‡ and Graham Coop†

∗Population Biology Graduate Group, †Center for Population Biology, Department of Evolution and Ecology, University of California, Davis, CA 95616, ‡Department of Cell Biologyand Molecular Genetics, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742

ABSTRACT Close relatives can share large segments of their genome identical by descent (IBD) that can be identified ingenome-wide polymorphism datasets. There are a range of methods to use these IBD segments to identify relatives andestimate their relationship. These methods have focused on sharing on the autosomes, as they provide a rich source ofinformation about genealogical relationships. We can hope to learn additional information about recent ancestry throughshared IBD segments on the X chromosome, but currently lack the theoretical framework to use this information fully. Here,we fill this gap by developing probability distributions for the number and length of X chromosome segments shared IBDbetween an individual and an ancestor k generations back, as well as between half- and full-cousin relationships. Due tothe inheritance pattern of the X and the fact that X homologous recombination only occurs in females (outside of the pseudo-autosomal regions), the number of females along a genealogical lineage is a key quantity for understanding the number andlength of the IBD segments shared amongst relatives. When inferring relationships among individuals, the number of femaleancestors along a genealogical lineage will often be unknown. Therefore, our IBD segment length and number distributionsmarginalize over this unknown number of recombinational meioses through a distribution of recombinational meioses wederive. By using Bayes theorem to invert these distributions, we can estimate the number of female ancestors between tworelatives, giving us details about the genealogical relations between individuals not possible with autosomal data alone.

KEYWORDS X chromosome, genetic genealogy, statistical genetics, identity by descent, recent ancestry

Close relatives are expected to share large contiguous seg-ments of their genome due to the limited number of

crossovers per chromosome each generation (Fisher et al. 1949,1954; Donnelly 1983). These large identical by descent (IBD)segments shared among close relatives leave a conspicuousfootprint in population genomic data, and identifying and un-derstanding this sharing is key to many applications in biology(Thompson 2013). For example, in human genetics, evidenceof recent shared ancestry is an integral part of detecting cryp-tic relatedness in genome-wide association studies (Gusev et al.2009), discovering mis-specified relationships in pedigrees (Sunet al. 2002), inferring pairwise relationships (Epstein et al. 2000;Glaubitz et al. 2003; Huff et al. 2011), and localizing disease traitsin pedigrees (Thomas et al. 2008). In forensics, recent ances-try is crucial for both accounting for population-level related-

Copyright © 2016 by the Genetics Society of Americadoi: 10.1534/genetics.XXX.XXXXXXManuscript compiled: Tuesday 28th June, 2016%1Email for correspondence: [email protected]

ness (Balding and Nichols 1994) and in familial DNA databasesearches (Belin et al. 1997; Sjerps and Kloosterman 1999). Ad-ditionally, recent ancestry detection methods have a range ofapplications in anthropology and ancient DNA to understandthe familial relationships among sets of individuals (Fu et al.2015; Keyser-Tracqui et al. 2003; Baca et al. 2012; Haak et al. 2008).In population genomics, recent ancestry has been used to learnabout recent migrations and other demographic events (Ralphand Coop 2013; Palamara et al. 2012). An understanding of re-cent ancestry also plays a large role in understanding recentlyadmixed populations, where individuals draw ancestry frommultiple distinct populations (Pool and Nielsen 2009; Gravel2012; Liang and Nielsen 2014). Finally, relative finding throughrecent genetic ancestry is increasingly a key feature of direct-to-consumer personal genomics products and an important sourceof information for genealogists (Durand et al. 2014; Royal et al.2010).

Approaches to infer recent ancestry among humans have of-ten used only the autosomes, as the recombining autosomes of-

Genetics, Vol. XXX, XXXXXXXX June 2016 1

Genetics: Early Online, published on June 29, 2016 as 10.1534/genetics.116.190041

Copyright 2016.

Page 2: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

fer more opportunity to detect a range of relationships than theY chromosome, mitochondria, or X chromosome. However, thenature of X chromosome inheritance means that it can clarifydetails of the relationships among individuals and be informa-tive about sex-specific demography and admixture histories inways that autosomes cannot (Goldberg and Rosenberg 2015; Ra-machandran et al. 2004, 2008; Bryc et al. 2010; Bustamante andRamachandran 2009; Shringarpure et al. 2016; Pool and Nielsen2007; Rosenberg 2016).

In this paper, we look at the inheritance of chromosomal seg-ments on the X chromosome among closely related individuals.Our genetic ancestry models are structured around biparentalgenealogies back in time, an approach used by many previousauthors (e.g., Donnelly 1983; Chang 1999; Barton and Etheridge2011; Rohde et al. 2004). If we ignore pedigree collapse, the ge-nealogy of a present-day individual encodes all biparental rela-tionships back in time; e.g. the two parents, four grandparents,eight great-grandparents, 2k greatk−2 grandparents, and in gen-eral the 2k ancestors k generations back; we refer to these indi-viduals as one’s genealogical ancestors. Note that throughout thispaper, kth generation ancestors refers to the ancestors within gen-eration k, not the total number of ancestors from generations 1to k. A genealogical ancestor of a present-day individual is saidto also be a genetic ancestor if the present-day individual sharesgenetic material by descent from this ancestor. We refer to thesesegments of shared genetic material as being identical by de-scent, and in doing so we ignore the possibility of mutation inthe limited number of generations separating our individuals.Throughout this paper, we will ignore the pseudo-autosomal(PAR) region(s) of the X chromosome, which undergoes cross-ing over with the Y chromosome in males (Koller and Darling-ton 1934) to ensure proper disjunction in meiosis I (Hassoldet al. 1991). We also ignore gene conversion which is knownto occur on the X (Rosser et al. 2009).

Here, we are concerned with inheritance through the X ge-nealogy embedded inside an individual’s genealogy, which in-cludes only the subset of one’s genealogical ancestors whocould have possibly contributed to one’s non-PAR X chromo-some. We refer to the individuals in this X genealogy as Xancestors. Since males receive an X only from their mothers, amale’s father cannot be an X ancestor. Consequently, a male’sfather and all of his ancestors are excluded from the X geneal-ogy (Figure 1). Therefore, females are overrepresented in the Xgenealogy, and as we go back in one’s genealogy, the fractionof individuals who are possible X ancestors shrinks. This prop-erty means that genetic relationships differ on the X comparedto the autosomes, a fact that changes the calculation of kinshipcoefficients on the X (Pinto et al. 2012, 2011) and also has in-teresting implications for kin-selection models involving the Xchromosome (Fox et al. 2009; Rice et al. 2008).

In Section (and in Appendix ) we review models of auto-somal identity by descent among relatives, on which we baseour models of X genetic ancestry. Then, in Section we look atX genealogies, as their properties affect the transmission of Xgenetic material from X ancestors to a present-day individual.We develop simple approximations to the probability distribu-tions of the number and length of X chromosome segments thatwill be shared IBD between a present-day female and one ofher X ancestors a known number of generations back. Thesemodels provide a set of results for the X chromosome equiva-lent to those already known for the autosomes (Donnelly 1983;Thomas et al. 1994). Then, in Section , we look at shared X

ancestry—when two present-day cousins share an X ancestor aknown number of generations back. We calculate the probabil-ities that genealogical half- and full-cousins are also connectedthrough their X genealogy, and thus can potentially share ge-netic material on their X. We then extend our models of IBDsegment length and number to segments shared between half-and full-cousins. Finally, in Section we show that shared Xgenetic ancestry contains additional information (compared togenetic autosomal ancestry) for inferring relationships amongindividuals, and explore the limits of this information.

Autosomal Ancestry

To facilitate comparison with our X chromosome results, wefirst briefly review analogous autosomal segment number andsegment length distributions (Donnelly 1983; Thomas et al. 1994;Huff et al. 2011). Throughout this paper, we assume that one’sgenealogical ancestors k generations back are distinct (e.g. thereis no inbreeding), i.e. there is no pedigree collapse due toinbreeding (see Appendix for a model of how this assump-tion breaks down with increasing k). Thus, an individual has2k distinct genealogical ancestors. Assuming no selection andfair meiosis, a present-day individual’s autosomal genetic ma-terial is spread across these 2k ancestors with equal probability,having been transmitted to the present-day individual solelythrough recombination and segregation.

We model the process of crossing over during meiosis as acontinuous time Markov process along the chromosome, as inThomas et al. (1994) and Huff et al. (2011), and described byDonnelly (1983). In doing so we assume no crossover inter-ference, such that in each generation b recombinational break-points occur as a Poisson process running with a uniform rateequal to the total length of the genetic map (in Morgans), ν.Within a single chromosome, b breaks create a mosaic of b + 1alternating maternal and paternal segments. This alternationbetween maternal and paternal haplotypes creates long-run de-pendency between segments (Liang and Nielsen 2014). We ig-nore these dependencies in our analytic models by assumingthat each chromosomal segment survives segregation indepen-dently with probability 1/2 per generation. For d independentmeioses separating two individuals, we imagine the Poissonrecombination process running at rate νd, and for a segmentto be shared IBD between the two ancestors it must survive1/2d segregations. Consequently, the expected number of seg-ments shared IBD between two individuals d meioses apart in agenome with c chromosomes is approximated as (Thomas et al.1994):

E[N] =12d (νd + c) (1)

Intuitively, we can understand the 1/2d factor as the coeffi-cient of kinship (or path coefficient; Wright 1922, 1934) of twoindividuals d meioses apart, which gives the probability thattwo alleles are shared IBD between these two individuals. Then,the expected number of IBD segments E[N] can be thought of asthe average number of alleles shared between two individualsin a genome with νd+ c loci total. Under this approximation, re-combination increases the number of independent loci linearlyeach generation (by a factor of the total genetic length). A frac-tion 1/2d of parental alleles at these loci survive the d meioses tobe IBD with the present-day individual.

2 Vince Buffalo et al.

Page 3: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

� �

� � �

� � � � �

� � � � � � � �

� � � � � � � � � � � � �

Figure 1 A genealogy back five generations with the embedded X genealogy. Males are depicted as squares and females as circles.Individuals in the X genealogy are shaded gray while unshaded individuals are ancestors that are not X ancestors. Each X ancestoris labeled with the number of recombinational meioses to the present-day female.

By convention, we count the number of contiguous IBD seg-ments N in the present-day individual, not the number of con-tiguous segments in the ancestor. For example, an individualwill share exactly one block per chromosome with each par-ent if we count the contiguous segments in the offspring, eventhough these segments may be spread across the parent’s twohomologues. This convention, which we use throughout thepaper, is identical to counting the number of IBD segments thatoccur in d − 1 meioses rather than d meioses. This conventiononly impacts models of segments shared IBD between an indi-vidual and one of their ancestors; neither the distribution of seg-ment lengths nor the distributions for segment number sharedIBD between cousins are affected by this convention.

The distribution of IBD segments between a present-day individualand an ancestor Given that a present-day individual and anancestor in the kth generation are separated by k meioses, thenumber of IBD segments can be modeled with what we call thePoisson-Binomial approximation. Over d = k meioses, B = b ∼Pois(νk) recombinational breakpoints fall on c independentlyassorting chromosomes, creating b+ c segments. Ignoring long-range dependencies, we assume all of these b + c segmentshave an independent chance of surviving the k segregations tothe present-day individual, and thus the probability that n seg-ments survive given b + c trials is Binomially distributed withprobability 1/2k. Marginalizing over the unobserved number ofrecombinational breakpoints b, and replacing k with k − 1 tofollowing the convention described above:

P(N = n|k) =∞

∑b=0

Bin(N = n | l = b + c, p = 1/2k−1)

× Pois(B = b | λ = ν(k − 1)) (2)

The expected value of the Poisson-Binomial model is givenby equation (1) with d = k − 1 and this model is similar to thoseof Thomas et al. (1994); Donnelly (1983). We can further approx-imate this by assuming that we have a Poisson total numberof segments with mean (c + ν(k − 1)) and these segments areshared with probability 1/2k−1 as in Huff et al. (2011). This givesus a thinned Poisson distribution of shared segments:

P(N = n|k, ν, c) = Pois(N = n|λ = (c + ν(k − 1))/2k−1)

=((c + ν(k − 1))/2k−1)ne−(c+ν(k−1))/2k−1

n!(3)

This thinned Poisson model also has an expectation givenby equation (1) but compared to the Poisson-Binomial modelhas a larger variance than the true process. This overdisper-sion occurs because modeling the number of segments createdafter b breakpoints involves incorporating the initial numberof chromosomes into the Poisson rate. However, this initialnumber of chromosomes is actually fixed, which the Poisson-Binomial model captures but the Poisson thinning model doesnot (i.e. one generation back such that k = 1, the thinningmodel treats the number of segments shared IBD with one’s par-ents is N ∼ Pois(c) rather than c). See Appendix for a furthercomparison of these two models. A more formal descriptionof this approximation as a continuous-time Markov process isgiven in Thomas et al. (1994). In Appendix , we describe similarresults for the number of autosomal segments shared betweencousins and the length distributions of autosomal segments.

We will use similar models as these in modeling the lengthand number of X chromosome segments shared been relatives.However, the nature of X genealogies (which we cover in thenext section) requires we adjust these models. Specifically,while one always has k recombinational meioses between an au-tosomal ancestor in the kth generation, the number of recombi-national meioses varies across the lineages to an X ancestor withthe number of females in a lineage, since X homologous recom-bination only occurs in females (Figure 1). This varying numberof recombinational meioses across lineages leads to a varying-rate Poisson recombination process, with the rate dependingon the specific lineage to the X ancestor. After we take a closerlook at X genealogies in the next section, we adapt the mod-els above to handle the varying-rate Poisson process needed tomodel IBD segments in X genealogies.

X Ancestry

Number of Genealogical X Ancestors While a present-day indi-vidual can potentially inherit autosomal segments from any ofits 2k genealogical ancestors k generations back, only a fractionof these individuals can possibly share segments on the X chro-mosome. In contrast to biparental genealogies, males have onlyone genealogical X ancestor—their mothers—if we ignore thePAR. This constraint (which we refer to throughout as the notwo adjacent males condition) shapes both the number of X ances-tors and the number of females along an X lineage. For exam-ple, consider a present-day female’s X ancestors one generationback: both her father and mother contribute X chromosome ma-terial. Two generations back, she has three X genealogical an-

Models of Recent Ancestry on the X Chromosome 3

Page 4: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

2 4 6 8 10 12 14

100

101

102

103

104

105

generation

expecte

d n

um

ber

of ancesto

rs

A

autosome ancestor

X ancestor

genetic autosomes

genetic autosome length of X

genetic X

2 4 6 8 10 12 14

0.0

0.2

0.4

0.6

0.8

1.0

generation

pro

babili

ty

B

P(Nauto > 0)P(Nx > 0 | X ancestor)

P(Nx > 0 | ancestor)

P(X ancestor | ancestor)

Figure 2 How the number of genetic and genealogical ancestors and probabilities of sharing genetic material vary back through thegenerations for different cases. A: Each line represents a present-day female’s expected number of ancestors (y-axis) in the kth gener-ation (x-axis; where k = 1 is parental generation), for a variety of cases. The present day female’s number of genealogical ancestorsin the kth generation is in red, and the expected number of these ancestors that contribute any autosome genetic material is in yel-low. Likewise, the present-day female’s number genealogical X ancestors is in green, and the expected number of these ancestorsthat contribute any X genetic material is in blue. For comparison, the number of genetic ancestors of an autosome of length equal tothe X is included (orange). B: The probability of genealogical and genetic ancestry (y-axis) from an arbitrary ancestor in the kth gen-eration (x-axis). P(Nauto > 0) is derived from equation (3), P(NX > 0 | X ancestor) from equation (8), P(NX > 0 | ancestor) fromequations (8) and (4), and P(X ancestor | ancestor) from equation (4). Points show simulated results.

cestors: her father only inherits an X from her paternal grand-mother, while her mother can inherit X material from either ofparents. Continuing this process, this individual has five X an-cestors three generations back and eight ancestors four genera-tions back (Figure 1).

In general, a present-day female’s X genealogical ancestors isgrowing as a Fibonacci series (Laughlin 1920; Basin 1963), suchthat k generations back she has Fk+2 X genealogical ancestors,where Fk is the kth Fibonacci number (where k is 0-indexed andthe series begins F0 = 0, F1 = 1, . . .; Online Encyclopedia of Inte-ger Sequences reference A000045; Sloane, 2010). We can demon-strate that one’s number of X genealogical ancestors (nk) growsas a Fibonacci series by encoding the X inheritance rules for thenumber of males and females (mk and fk, respectively) in thekth generation as a set of recurrence relations:

fk = nk−1 every individual receives anX chromosome from his/hermother

mk = fk−1 every female receives an Xchromosome from her father

nk = fk + mk

Rearranging these recurrence equations gives us nk = nk−1 +nk−2, which is the Fibonacci recurrence. Starting with a femalein the k = 0 generation, we have initial values n0 = 1 andn1 = 2, which gives us the Fibonacci numbers offset by two,Fk+2. For a present-day male, his number of X ancestors isFk+1, i.e. offset by one to count the number of X ancestorshis mother has. To simplify our expressions, we will assumethroughout the paper that all-present day individuals are fe-male since a simple offset can be made to handle males.

In Figure 2A we show the increase in the number of X ge-

nealogical and genetic ancestors (green and light blue) andcompare these to the growth of all of one’s genealogical an-cestors and autosomal genetic ancestors. The closed-form so-lution for the kth Fibonacci number is given by Binet’s formula(Fn = ((1+

√5)n − (1−

√5)n)/(2n

√5)), which shows that the

Fibonacci sequence grows at an exponential rate slower than 2k.Consequently, the fraction of ancestors who can contribute

to the X chromosome declines with k. Given that a female hasFk+2 X ancestors and 2k genealogical distinct ancestors, her pro-portion of X ancestors is:

P(X ancestor | ancestor) =Fk+2

2k (4)

This fraction can also be interpreted as the probability that a ran-domly chosen genealogical ancestor k generations ago is also anX genealogical ancestor. We show this probability as a functionof generations into the past in Figure 2B (yellow line).

From our recurrence equations we can see that a present-dayfemale’s Fk+2 ancestors in the kth generation are composed ofFk+1 females and Fk males. Likewise for a present-day male,his Fk+1 ancestors in the kth generation are composed of Fkfemales and Fk−1 males. We will use these results when calcu-lating the probability of a shared X ancestor.

Ancestry Simulations In the next sections, we use stochas-tic simulations to verify the analytic approximations we de-rive; here we briefly describe the simulation methods. Wehave written a C and Python X genealogy simulation proce-dure (source code available in File S1 and at https://github.com/vsbuffalo/x-ancestry/). We simulate a female’s X chromosomegenetic ancestry back through her X genealogy. Figure 3 visual-izes the X genetic ancestors of one simulated example X geneal-ogy back nine generations to illustrate this process. Each sim-ulation begins with two present-day female X chromosomes,one of which is passed to her mother and one to her father.

4 Vince Buffalo et al.

Page 5: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

Figure 3 Graphical representations of an example X chromosome genealogy. A: Simulated X genealogy of a present-day female,back nine generations. Each arc is an ancestor, with female ancestors colored red, and male ancestors colored blue. The trans-parency of each arc reflects the genetic contribution of this ancestor to the present-day female. White arcs correspond X genealogicalancestors that share no genetic material with the present-day female, and gray arcs are genealogical ancestors that are not X ances-tors. B: The X segments of the simulation in (A), back five generations. The maternal X lineage’s segments are colored red, and thepaternal X segments are colored blue. A male ancestor’s sex chromosomes are colored dark gray (and include the Y) and a femaleancestor’s sex chromosomes are colored light gray.

Segments transmitted to a male ancestor are simply passed di-rectly back to his mother (without recombination). For seg-ments passed to a female ancestor, we place a Poisson num-ber of recombination breakpoints (with mean ν) on the X chro-mosome and the segment is broken where it overlaps theserecombination events. The first segment along the chromo-some is randomly drawn to have been inherited from either hermother/father, and we alternate this choice for subsequent seg-ments. This procedure repeats until the target generation back kis reached. The segments in the k-generation ancestors are thensummarized as either counts (number of IBD segments per in-dividual) or lengths. These simulations are necessarily approx-imate as they ignore crossover interference. However, unlikeour analytic approximations, our simulation procedure main-tains long-run dependencies created during recombination, al-lowing us to see the extent to which assuming independent seg-ment survival adversely impacts our analytic results.

The number of recombinational meioses along an unknown X lineageIf we pick an ancestor at random k generations ago, the proba-bility that they are an X genealogical ancestor is given by equa-tion (4). We can now extend this logic and ask: having ran-domly sampled an X genealogical ancestor, how many recom-binational meioses (i.e. females) lie in the lineage between apresent day individual and this ancestor? Since IBD segmentnumber and length distributions are parameterized by a rateproportional to the number of recombination events, this quan-tity is essential to our further derivations. Specifically, if there’suncertainty about the particular lineage between a present-dayfemale and one of her X ancestors k generations back (such thatall of the Fk+2 lineages to an X ancestor are equally probable),the number of females (thus, recombinational meioses) that oc-cur is a random variable R. By the no two adjacent males con-dition, the possible number of females R is constrained; R hasa lower bound of ⌊k/2⌋, which corresponds to male-female al-ternation each generation to an ancestor in the kth generation.Similarly, the upper bound of R is k, since it is possible everyindividual along one X lineage is a female. Noting that an Xgenealogy extending back k generations enumerates every pos-sible way to arrange r females such that none of the k − r malesare adjacent, we find that the number of ways of arranging r

such females this way is

(r + 1k − r

). (5)

For some readers, it may be useful to visualize the relation-ship between the numbers of recombinational meioses acrossthe generations using Pascal’s triangle (Figure 4). The sequenceof recombinational meioses is related to a known integer se-quence; see Online Encyclopedia of Integer Sequences referenceA030528 (Sloane 2010) for a description of this sequence and itsother applications.

If we pick an X genealogical ancestor at random k genera-tions ago the probability that there are r female meioses alongthe lineage leading to this ancestor is

PR(R = r|k) =(r+1

k−r)

Fk+2. (6)

In Appendix , we derive a generating function for the num-ber of recombinational meioses. We can use this generatingfunction to obtain properties of this distribution such as the ex-pected number of recombinational meioses. We can show thatthe expected number of recombinational meioses convergesrapidly to E[R] ≈ (φ/

√5) k with increasing k, where φ is the

Golden Ratio, 1+√

52 .

The Distribution of Number of Segments Shared with an X AncestorUsing the distribution of recombinational meioses derived inthe last section, we now derive a distribution for the number ofIBD segments shared between a present-day individual and anX ancestor in the kth generation. For clarity, we first derive thenumber of IBD segments counted in the parents (i.e. not follow-ing the convention described in Section ), but we can adjust thissimply by replacing k with k − 1.

First, we calculate the probability of a present-day individ-ual sharing N = n IBD segments with an X genealogical an-cestor k generations in the past, where it is known that thereare R = r females (and thus recombinational meioses) alongthe lineage to this ancestor. This probability uses the Poisson-

Models of Recent Ancestry on the X Chromosome 5

Page 6: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

1

1 1

0 2 1

0 1 3 1

0 0 3 4 1

0 0 1 6 5 1

0 0 0 4 10 6 1

0 0 0 1 10 15 7 1

1

1 1

0 2 1

0 1 3 1

0 0 3 4 1

0 0 1 6 5 1

0 0 0 4 10 6 1

1

1 1

0 2 1

0 1 3 1

0 0 3 4 1

0 0 1 6 5 1

1

k = 1

k = 2

k = 3

k = 4

k = 5

k = 6

k = 7

r = 7r = 6r = 5r = 4r = 3r = 2r = 1r = 0

Figure 4 The number of individuals (black numbers) with rrecombinational meioses (each diagonal, labeled at base of tri-angle) for a generation k (each row). This encodes the numberof recombinational meioses as the binomial coefficient (r+1

k−r).Each value is further decomposed into the number of recom-binational meioses from the female (red value, upper left) andmale (blue value, upper right) lineages. Each black value iscalculated by adding the black number to the left in the rowabove (the number of recombinational meioses from the mater-nal side) and the black number two rows directly above (thenumber of recombinational meioses from the paternal side).The sum of each row (fixed k) is a Fibonacci number and thevalues in the diagonal corresponding to a fixed value of r arebinomial coefficients. Reading from the top left side to bottomright corner, Pascal’s triangle is contained in the red, blue, andblack numbers.

Binomial model described in Section :

P(N = n|r, k, ν) =∞

∑b=0

Bin(N = n | l = b + 1, p = 1/2r)

× Pois(B = b | λ = νr) (7)

Note that once we have conditioned on the number of re-combinational meioses r, the lineages to an X ancestor are in-terchangeable; the specific X lineage affects recombination (andthus the IBD number and length distributions) only through thenumber of recombinational meioses along the lineage.

If we consider an X genealogical ancestor k generations back,this individual could be any of the present-day female’s Fk+2X ancestors. Since the particular lineage to this ancestor is un-known, we marginalize over all possible numbers of recombi-national meioses that could occur:

P(N = n|k, ν)

=k

∑r=⌊k/2⌋

∑b=0

Bin(N = n|l = b + 1, p = 1/2r)

× Pois(B = b | λ = νr)(r+1

k−r)

Fk+2

=k

∑r=⌊k/2⌋

∑b=0

(b + 1

n

)1/2rn(1 − 1/2r)b−n+1

× (νr)be−νr

b!(r+1

k−r)

Fk+2

For the distribution of number of IBD segments counted inthe offspring, we substitute k − 1 for k

P(N = n | k, ν) =k−1

∑r=⌊(k−1)/2⌋

∑b=0

Bin(n | l = b + 1, p = 1/2r)

× Pois(B = b | λ = νr)

×( r+1

k−r−1)

Fk+1. (8)

In this formulation, if k = 1, r = 0. In this case, the lack ofrecombinational meioses implies b = 0, such that a present-day female shares n = 1 X chromosomes with each of her twoparents in the k = 1 generation with certainty. These segmentnumber distributions are visualized in Figure 5 (light blue lines)alongside simulated results (gray points).

We can use our equation (8) to obtain P(N > 0), the probabil-ity that a genealogical X ancestor k generations ago is a geneticancestor. This probability over k ∈ {1, 2, . . . , 14} generations isshown in Figure 2B. For comparison, Figure 2B also includes theprobability of a genealogical ancestor in the kth generation be-ing an autosomal genetic ancestor and the probability of beinga genetic X ancestor unconditional on being an X genealogicalancestor.

We have also assessed the Poisson thinning approach tomodeling X IBD segment number. As with the Poisson-Binomial model, we marginalize over R:

P(N = n | k, ν) =k−1

∑r=rM

Pois(B = b | λ = (1 + νr)/2r)

×( r+1

k−r−1)

Fk+1(9)

where rM = ⌊(k − 1)/2⌋.In Figure 5 we have compared the Poisson-Binomial and

Poisson-thinning approximations for the number of IBD seg-ments (counted in the offspring) shared between an X-ancestorin the kth generation and a present-day female. Overall, the ana-lytic approximations are close to the simulation results, with thePoisson-Binomial model a closer approximation for small k andboth models’ accuracy improving quickly with increasing k. Fora single chromosome (like the X), the Poisson-thinning modeloffers a noticeable worse fit than it does for the autosomes dueto overdispersion discussed in Section (see Appendix for de-tails). Throughout the paper, we use the more accurate Poisson-Binomial model rather than this Poisson thinning model. If onlyX ancestry more than 3 generations back is of interest, the Pois-son thinning approach may be used without much loss of accu-racy.

The Distribution of IBD Segment Lengths with an X Ancestor Thedistribution of IBD segment lengths between a present-day fe-male and an unknown X genealogical ancestor in the kth genera-tion is similar to the autosomal length distribution described inAppendix (equation 29). However, with uncertainty about theparticular lineage to the X ancestor, the number of recombina-tional meioses can vary between ⌊k/2⌋ ≤ r ≤ k; we marginalizeover the unknown number of recombinational meioses usingthe distribution equation (6). Our length density function is:

p(U = u|k) =k

∑r=⌊k/2⌋

re−ru (r+1k−r)

Fk+2(10)

6 Vince Buffalo et al.

Page 7: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

0.0

0.2

0.4

0.6

0.8 k = 2 k = 3

poisson−binomial

poisson thinning

k = 4

0 1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8 k = 5

0 1 2 3 4 5 6 7

k = 6

0 1 2 3 4 5 6 7

k = 7

number of IBD segments

pro

ba

bili

ty

Figure 5 The Poisson thinning (yellow lines) and Poisson-Binomial (blue lines) analytic distributions of IBD segment number be-tween an X ancestor in the kth generation (each panel) and a present-day female. Simulation results averaged over 5,000 simulationsare the gray points.

In Figure 6, we compare our analytic length density to anempirical density of X segment lengths calculated from 5,000simulations. As with our IBD segment number distributions,our analytic model is close to the simulated data’s empiricaldensity, and converges rapidly with increasing k.

Note that both the IBD segment length and number distri-butions marginalize over an unobserved number of recombina-tional meioses (R) that occur along the lineage between individ-uals. As the IBD segments shared between two individuals isa function of the number breakpoints B, and thus recombina-tional meioses, the length and number distributions P(N = n)and p(U = u) (which separately marginalize over both R andB) are not independent of one another.

Shared X Ancestry

Because only a fraction of one’s genealogical ancestors are X an-cestors (and this fraction rapidly decreases with k; see equation(4)), two individuals sharing X segments IBD from a recent an-cestor considerably narrows the possible ancestors they couldshare. In this section, we describe the probability that a ge-nealogical ancestor is an X ancestor, and the distributions forIBD segment number and length across full- and half-cousin re-lationships. For simplicity we concentrate on the case wherethe cousins share a genealogical ancestor k generations ago inboth of their pedigrees, i.e. the individuals are k − 1 degreecousins. The formulae could be generalized to ancestors of un-equal generational-depths (e.g. second cousins once removed)

but we do not pursue this here.

Probability of a Shared X Ancestor Two individuals share theirfirst common genealogical ancestor in the kth generation if oneof an individual’s 2k ancestors is also one of the other individ-ual’s ancestors k generations back. Given this shared ancestor,we can calculate the probability that this single ancestor is alsoan X genealogical ancestor. Since this shared ancestor must beof the same sex in each of the two present-day individuals’ ge-nealogies, we condition on the ancestor’s sex (with probability1/2 each) and then calculate the probability that this individualis also an X ancestor (with the same sex). Let us define N♀ andN♂ as the number of genealogical female and male ancestors,and NX♀ and NX♂ as the number of X female and male ancestors

of a present-day individual in the kth generation. Then:

P(shared X ancestor | shared ancestor k generations)

=N♀2k

(NX♀N♀)2

+N♂2k

(NX♂N♂

)2

=12

(Fk+1

2k−1

)2+

12

(Fk

2k−1

)2(11)

Thus, the probability that a shared genealogical ancestor isalso a shared X ancestor is decreasing at an exponential rate.By the 8th generation, a shared genealogical ancestor has less

Models of Recent Ancestry on the X Chromosome 7

Page 8: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

0

1

2

3

4

5k = 2 k = 3 k = 4

0.0 0.5 1.0 1.5 2.0

0

1

2

3

4

5k = 5

0.0 0.5 1.0 1.5 2.0

k = 6

0.0 0.5 1.0 1.5 2.0

k = 7

length of IBD segments (Morgans)

de

nsity

Figure 6 The analytic distributions of IBD segment length between an ancestor in the kth generation (for k ∈ {2, . . . , 7}) and apresent-day female (blue lines), and the binned average over 5,000 simulations (gray points).

than a five percent chance of being a shared X ancestor of bothpresent-day individuals.

The Sex of Shared Ancestor Unlike genealogical ancestors—which are equally composed of males and females—recent X ge-nealogical ancestors are predominantly female. Since a present-day female has Fk+1 female ancestors and Fk male ancestors kgenerations ago, the ratio of female to male X genealogical an-

cestors converges to the Golden Ratio φ = 1+√

52 (Simson 1753).

limk→∞

Fk+1Fk

= φ (12)

In modeling the IBD segment number and length distribu-tions between present day individuals, the sex of the sharedancestor k generations ago affects the genetic ancestry processin two ways. First, a female shared ancestor allows the twopresent-day individuals to share segments on either of her twoX chromosomes while descendents of a male shared ancestorshare IBD segments only through his single X chromosome.Second, the no two adjacent males condition implies a maleshared X genealogical ancestor constrains the X genealogy suchthat the present-day X descendents are related through his twodaughters. Given that the ratio of female to male X ancestorsis skewed, our later distributions require an expression for theprobability that a shared X ancestor in the kth generation is fe-male, which we work through in this section.

As in equation (11), an ancestor shared in the kth gener-ation of two present-day individuals’ genealogies must have

the same sex in each genealogy. Assuming both present-daycousins are females, in each genealogy there are Fk possiblemale ancestors and Fk+1 female ancestors that could be shared.Across each present-day females’ genealogies there are (Fk)

2

possible male ancestor combinations and (Fk+1)2 possible fe-

male ancestor combinations. Thus, if we let ♀X and ♂X denotethat the sex of the shared is female and male respectively, theprobability of a female shared ancestor is:

P(♀X) =(Fk+1)

2

(Fk)2 + (Fk+1)2 (13)

The probability that the shared ancestor is male is simply1 − P(♀X). One curiosity is that as k → ∞, P(♀X) → φ√

5=

5+√

510 ≈ 0.7236, where φ is the Golden Ratio.

Partnered Shared Ancestors Thus far, we have only looked attwo present-day individuals sharing a single X ancestor k gen-erations back. In monogamous populations, most shared an-cestry is likely to descend from two ancestors; we call such re-lationships partnered shared ancestors. In this section, we lookat full-cousins descending from two shared genealogical ances-tors that may also be X ancestors. Two full-cousins could ei-ther (1) both descend from two X ancestors such that they are Xfull-cousins, (2) share only one X ancestor, such that they are Xhalf-cousins, or (3) share no X ancestry. We calculate the proba-bilities associated with each of these events here.

Two individuals are full-cousins if the greatk−2 grandfatherand the greatk−2 grandmother in one individual’s genealogy

8 Vince Buffalo et al.

Page 9: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

are in the other individual’s genealogy. For these two full-cousins to be X full-cousins, this couple must also be a couple inboth individuals’ X genealogies. In every X genealogy, the num-ber of couples in generation k is the number of females in gen-eration k − 1, as every female has two X ancestors in the priorgeneration (while males only have one). Thus, the probabilitytwo female k − 1 degree full-cousins are also X full-cousins is:

P(X full-cousins | full-cousins) =(

Fk2k−1

)2(14)

Now, we consider the event that two genealogical full-cousins are X half-cousins. Being X half-cousins implies thatthe partnered couple these full-cousins descend from includesa single ancestor that is in the X genealogies of both full-cousins.This single X ancestor must be a female, as a male X ancestor’sfemale partner must also be an X ancestor (since mothers mustpass an X). For a female to be an X ancestor but not her part-ner, one or both of her offspring must be male. Either of theseevents occurs with probability:

P(X half-cousins | full-cousins) =F2

k−1 + 2Fk−1Fk

22(k−1)(15)

The Distribution of Recombinational Meioses between Two X Half–Cousins To find distributions for the number and lengths ofIBD segments shared between two half-cousins on the X chro-mosome, we first need to find the distribution for the number offemales between two half-cousins with a shared ancestor in thekth generation. We refer to the individuals connecting the twocousins as a genealogical chain. As we’ll see in the next section,the number of IBD X segments shared between half-cousins de-pends on the sex of the shared ancestor; thus, we also derivedistributions in this section for the number of recombinationalmeioses along a genealogical chain, conditioning on the sex ofthe shared ancestor. As earlier, our models assume two present-day female cousins but are easily extended to male cousins.

First, there are 2k − 1 ancestral individuals separating twopresent-day female (k − 1)th degree cousins. These X ances-tors in the genealogical chain connecting the two present-dayfemale cousins follow the no two adjacent male condition; thusthe distribution of females follows the approach used in equa-tion (6) with k replaced with 2k − 1:

PH(R = r|k) =( r+1

2k−r−1)

F2k+1(16)

where the H (for half-cousin) subscript differentiates this equa-tion from equation (6), k is the generation of the shared ancestor.Similarly to equation (6), r is bounded such that rH,M ≤ r ≤2k − 1, where rH,M = ⌊(2k − 1)/2⌋.

Now, we derive the probability of R = r females conditionalon the shared ancestor being female, ♀X . This conditional distri-bution differs from equation (16) since it eliminates all genealog-ical chains with a male shared ancestor. We find the distributionof recombinational meioses conditional on a female shared an-cestor by placing the other R′ = r′ females (the prime denoteswe do not count the shared female ancestor here) along the twolineages of k − 1 individuals from the shared female ancestordown to the present-day female cousins. These R′ = r′ femalescan be placed in both lineages by positioning s females in thefirst lineage and r′ − s females in the second lineage, where sfollows the constraint ⌊(k − 1)/2⌋ ≤ s ≤ k − 1. Our equation

(6) models the probability of an X genealogical chain having r fe-males in k generations; here, we use this distribution to find theprobabilities of s females in k− 1 generations in one lineage andr′ − s females in k − 1 generations in the other lineage. As thenumber of females in each lineage is independent, we take theproduct of these probabilities and sum over all possible s; thisis the discrete convolution of the number of females in two lin-eages k − 1 generations long. Finally, we account for the sharedfemale ancestor, by the transform R = R′ + 1 = r:

PH(R = r|♀X , k) =k−1

∑s=⌊(k−1)/2⌋

( s+1k−s−1)(

r−sk+s−r)

(Fk+1)2 (17)

In general, this convolution approach allows us to find thedistribution of females in a genealogical chain under variousconstraints, and can easily be extended to the case of a sharedmale X ancestor (with necessarily two daughters).

Finally, note that we have modeled the number of females ina genealogical chain of 2k − 1 individuals. Thus far in our mod-els, the number of females has equaled the number of recombi-national meioses. However, when considering the number ofrecombinational meioses between half-cousins, two recombina-tional meioses occur if the shared ancestor is a female (as sheproduced two independent gametes she transmits to her twooffspring). Thus, for a single shared X ancestor, the number ofrecombinational meioses ρ is

ρ =

{r + 1 if ♀X

r if ♂X(18)

which we use when parameterizing the rate of recombinationin our IBD segment number distributions. Furthermore, since ashared female ancestor has two X haplotypes that present-daycousins could share segments IBD through, the binomial proba-bility 1/2ρ is doubled. Further constraints are needed to handlefull-cousins; we will discuss these below.

Half-Cousins In this section we calculate the distribution ofIBD X segments shared between two present-day female X half-cousins with a shared ancestor in the kth generation. We imag-ine we do not know any details about the lineages to this sharedancestor nor the sex of the shared ancestor, so we marginalizeover both. Thus, the probability of two (k − 1)th degree X half-cousins sharing N = n segments is:

P(N = n|k) =2k−1

∑r=rH,M

PH(R = r|k)

× [P(N = n|♀X , R = r)P(♀X |R = r) +× P(N = n|♂X , R = r)P(♂X |R = r)]

(19)

As discussed in the previous section, the total number of re-combinational meioses along the genealogical chain betweenhalf-cousins depends on the unobserved sex of the shared an-cestor (i.e. equation (18)). Likewise, the binomial probabil-ity also depends on the shared ancestor’s sex. Accounting forthese adjustments, the probabilities P(N = n|♀X , R = r) and

Models of Recent Ancestry on the X Chromosome 9

Page 10: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

P(N = n|♂X , R = r) are:

P(N = n|♀X , R = r) =∞

∑b=0

Pois(B = b|λ = (r + 1)ν)

× Bin(N = n|l = b + 1, p = 1/2r)(20a)

P(N = n|♂X , R = r) =∞

∑b=0

Pois(B = b|λ = rν)

× Bin(N = n|l = b + 1, p = 1/2r)(20b)

Since the sex of the shared ancestor depends on the numberof females in the genealogical chain between the two cousins(e.g. if r = 2k− 1, the shared ancestor is a female with certainty),we require an expression for the probability of the shared ances-tor being male or female given R = r. Using Bayes’ theorem, wecan invert the conditional probability P(R = r|♀X) to find thatthe probability that a shared X ancestor is female conditionedon R females in the genealogical chain is

PH(♀X |R = r, k) =F2k−1

( r+12k−r−1) ((Fk+1)2 + (Fk)2)

×k−1

∑s=⌊(k−1)/2⌋

(s + 1

k − s + 1

)(r − s

k + s − r

)(21)

and P(♂X |R = r) can be found as the complement of this prob-ability.

Inserting equations (20a), (20b), and (21) into (19) gives usan expression for the distribution of IBD segment numbers be-tween two half-cousins with a shared ancestor k generationsago. Figure 7 compares the analytic model in equation (19) withthe IBD segments shared between half-cousins over 5,000 simu-lated pairs of X genealogies.

The density function for IBD segment lengths between Xcousins (either half- or full-cousins; length distributions areonly affected by the number of recombinations in the genealogi-cal chain) is equation (10) but marginalized over the number ofrecombinational meioses between two cousins (equation (16))rather than the number of recombinational meioses betweena present-day individual and a shared ancestor. Simulationsshow the length density closely matches simulation results (seeFigure 11 in the appendix).

Full-Cousins Full-cousin relationships allow descendents toshare IBD autosomal segments from either their shared mater-nal ancestor, shared paternal ancestral, or both. In contrast,since males only pass an X chromosome to daughters, onlyfull-sibling relationships in which both offspring are female(due to the no to adjacent males condition) are capable of leav-ing X genealogical descendents. We derive a distribution forthe number of IBD segments shared between (k − 1)th degreefull X cousins by conditioning on this familial relationship andmarginalizing over the unobserved number of females fromthe two full-sibling daughters to the present-day female full-cousins.

First, we find the number of females (including the two full-sibling daughters in the (k − 1)th generation) in the genealogi-cal chain between the two X full-cousins (omitting the sharedmale and female ancestors, which we account for separately).Like equation (17), this is a discrete convolution:

PF(R = r, k) =k−2

∑s=⌊(k−2)/2⌋

( s+1k−s−2)(

r−s−1k−r+s)

(Fk)2 (22)

where the F subscript indicates this equation is for full-cousins.This probability is valid for rF,M + 2 ≤ r ≤ 2k − 2 and is 0 else-where, where rF,M = 2⌊(k − 2)/2⌋+ 2. For N = n segments tobe shared between two X full-cousins, z segments can be sharedvia the maternal shared X ancestor (where 0 ≤ z ≤ n) and n − zsegments can be shared through the paternal shared X ancestor.We marginalize over all possible values of z, giving us anotherdiscrete convolution:

P(N = n|R = r) =2k−2

∑r=rF,M

n

∑z=0

P(N = z|♀X)

× P(N = n − z|♂X)PF(R = r)(23)

where

P(N = n|♀X , R = r) =∞

∑b=0

Bin(N = n|l = b + 1, p = 1/2r+1)

× Pois(B = b|λ = ν(r + 2)) (24)

P(N = n|♂X , R = r) =∞

∑b=0

Bin(N = n|l = b + 1, p = 1/2r)

× Pois(B = b|λ = νr) (25)

are the probabilities of sharing n segments through the sharedfemale and male X ancestors respectively. For the femaleshared ancestor, we account for two additional recombinationalmeioses (one for each of the two gametes she passes to her twodaughters), and the fact she can share segments through eitherof her X chromosomes (hence, why the binomial probabilityis 1/2r+1). We compare our analytic X full-cousin IBD segmentnumber results to 5,000 genealogical simulations in Figure 7.

Inference

With our IBD X segment distributions, we now turn to howthese can be used to infer details about recent X ancestry. Inpractice, inferring the number of generations back to a commonancestor (k) is best accomplished through the signature of re-cent ancestry from the 22 autosomes, rather than through theshort X chromosome. A number of methods are available forthe task of estimating k through autosomal IBD segments (Huffet al. 2011; Henn et al. 2012; Durand et al. 2014). Therefore, weconcentrate on questions about the extra information that the Xprovides conditional on k being known with certainty.

Here, we focus on two separate questions: (1) what is theprobability of being an X genealogical ancestor given that noIBD segments are observed, and (2) can we infer details aboutthe X genealogical chain between two half-cousins? These ques-tions address how informative the number of segments sharedbetween cousins is about the precise relationship of cousins. Weassume that segments of X chromosome IBD come only fromthe kth generation, and not from deeper relationships or fromfalse positives. In practice, inference from the X IBD segmentswould have to incorporate both of these complications, and assuch our results represent best case scenarios.

It’s possible that k generations back, an individual is a ge-nealogical X ancestor but shares no X genetic material with a

10 Vince Buffalo et al.

Page 11: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

0.0

0.2

0.4

0.6

0.8

1.0

k = 2 k = 3

half−cousin

full−cousin

k = 4

0 1 2 3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0

k = 5

0 1 2 3 4 5 6 7

k = 6

0 1 2 3 4 5 6 7

k = 7

number of IBD segments

pro

ba

bili

ty

Figure 7 Distributions of X IBD segment number for X half- (blue) and X full-cousins (yellow). Lines show the analytic approxima-tions (equations (19) and (23)) and blue and yellow points show the probabilities for X half- and X full-cousins averaged over 5,000simulations.

1 5 10 15

0.0

0.2

0.4

0.6

0.8

1.0

generation

P(X

an

ce

sto

r | N

x =

0) X ancestor

X ancestor prior

half−cousins

half−cousins prior

Figure 8 The probability of X ancestry given no shared X ge-netic material. Yellow solid line: the probability an individualin the kth generation (x-axis) is an X ancestor to a present-dayfemale, given they share no X genetic material with her. Bluesolid line: the probability that two half-cousins share an X an-cestor in the kth generation, given they share no X genetic ma-terial between them. Dashed lines indicate the prior probabili-ties.

present-day descendent. To what extent is the lack of shar-ing on the X chromosome with an ancestor informative aboutour relationship to them? Similarly, how does the lack of shar-

ing of the X chromosome between (k − 1)th cousins changeour views as to their relationship? To get at these issues, wecan use our analytic approximations to calculate the probabilitythat one is an X ancestor given that no segments are observed,P(X ancestor | N = 0):

P(X ancestor | N = 0) =

P(X ancestorP(N = 0 | X ancestor)P(N = 0 | X ancestor)P(X ancestor) + P(not X ancestor)

(26)

Here, P(N = 0 | X ancestor) is given by equation (8) andP(X ancestor) is given by equation (4). This function is shownin Figure 8 (yellow lines). We can derive an analogous ex-pression for the probability of two female half-cousins sharingan X ancestor but not having any X segments IBD by replac-ing P(X ancestor | N = 0) with equation (19), and replac-ing P(X ancestor) with P(shared X ancestor) which is given byequation (11), and plotted in Figure 8 (blue lines). We also plotthe prior distributions to show the answer if no informationabout the X chromosome was observed. In both cases, observ-ing zero shared segments on the X chromosome makes it morelikely that a shared ancestor was not a shared X ancestor. Thisadditional information is strongest—as compared to the prior—for close relationships (k < 5), where segments on the X arelikely to be shared if the ancestor was an X genealogical ances-tor.

Models of Recent Ancestry on the X Chromosome 11

Page 12: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

Additionally, X IBD segments carry information about ge-nealogical details that are not possible considering autosomeIBD segments alone. While IBD autosome segments leave asignature of recent ancestry between two individuals, the uni-formity of recombinational meioses across every lineage to theshared ancestor leaves no signal of which genealogical chainconnects two present-day cousins. In contrast, since the num-ber of females varies along X lineages and effects the number ofrecombination events, the number and length of X segmentscarries information about which genealogical chain connectstwo cousins. Information about the genealogical chain betweencousins is summarized by the number of female ancestors be-tween two cousins, R, and constrains the possible X genealog-ical chains between these two cousins by varying amounts de-pendent on R and k.

Our approach to inference is through the posterior distribu-tion of R given an observed number of IBD segments N andconditioning on k. We calculate this posterior conditional onthe cousins sharing an X ancestor; we do this to separate it fromthe question of whether a pair share an X ancestor (as derivedin equation (26)). Our posterior probability is given by Bayes’theorem

P(R|N = n, k) =P(N = n|R)P(R)

P(N = n)(27)

where the prior P(R) is readily calculable through equation (16)and P(N = n) given by equation (19). The data likelihoodP(N = n|R) is given by equation (7).

In Figure 9, we show the posterior distributions over thenumber of recombinational meioses, given an observed num-ber of IBD segments between two females known to be Xhalf-cousins. Again, these posterior distributions condition onknowing how many generations have occurred since the sharedancestor, k. With an increasing number of generations to theshared ancestor, fewer segments survive to be IBD between thepresent-day cousins. Consequently, observing IBD segments in-creases the likelihood of fewer females (and thus fewer recombi-national meioses) between the cousins. For example, for k = 6,observing (the admittedly unlikely) six or more IBD segmentsleads to a posterior mode over the fewest possible number offemales in the genealogical chain (⌊(2k − 1)/2⌋ = 5; Figure 9).Similarly, observing between three and five segments places theposterior mode over six females in the genealogical chain. Fork > 4, seeing zero segments provides little information overthe prior about the relationship between the cousins, as shar-ing zero segments is the norm. In each case, a posterior distri-bution over the number of females in a genealogical chain cangreatly reduce the number of likely genealogical configurations.For example, observing n = 3 shared X segments between half-cousins k = 4 generations back first restricts their shared ances-tor to be one of the 34 possible shared X ancestors (out of thetotal 128 possible shared ancestors). Furthermore, these threeshared X segments combined with our posterior distributionover recombinational meioses, leads to a maximum a posterioriestimate of R = 4 females along the genealogical chain connect-ing the half-cousins. Only 10 genealogical chains connectingthese cousins contain four females, thus the likely relationshipof these cousins is considerably narrowed from the original 128possible relationships. Therefore, sharing genetic segments onthe X can provide considerable information about genealogicalrelationships.

Discussion

Detecting and inferring the nature of recent ancestry is impor-tant to a range of applications and the nature of such relation-ships are often of inherent interest. As the sample sizes of popu-lation genomic data sets increase, so will the probability of sam-pling individuals that share recent ancestry. In particular, thevery large data sets being developed in human genetics will ne-cessitate taking a genealogical view of recent relatedness. Ourmethods extend existing methods for the autosomes by account-ing for the special inheritance pattern of the X. Specifically, re-cent ancestry on the X differs from the autosomes since malesonly inherit an X from their mothers, and fathers pass an un-recombined (ignoring the PAR) X to their daughters. Conse-quently, the number of recombinational meioses, which deter-mine the length and number of IBD segments, varies acrossthe X genealogy. Since in most cases the number of femalesbetween two individuals in a genealogical chain is often un-known, we derive a distribution for recombinational meioses(equation (6)).

We also derive distributions for the length and number ofIBD X segments by marginalizing over the unknown numberof recombinational meioses that can occur between two indi-viduals connected through a genealogical chain. In both cases,we condition on knowing k (the generations back to a sharedancestor) which can be inferred from the autosomes (Huff et al.2011). Our models for IBD segment number and length use aPoisson-Binomial approximation to the recombination process,which match simulation results closely.

The genomic information about the genealogical relation-ship between pairs of individuals is inherently limited (due tothe small number of segments shared and the stochasticity ofthe process); thus making full use of all shared segments onall chromosomes will be key to better inference. Our resultshere not only allow X IBD segments to be used to model recentancestry, but in fact provide qualitatively different informativeabout genealogical ancestry than autosomal data alone. This ad-ditional information occurs through two avenues. First, sharingIBD segments on the X immediately reduces the potential ge-nealogical ancestors two individuals share, since one’s X ances-tors are only a fraction of their possible genealogical ancestors(i.e. Fk+2/2k in the case of a present-day female). Second, thevarying number of females in an X genealogy across lineagescombined with the fact that recombinational meioses only oc-cur in females to some extent leave a lineage-specific signatureof ancestry.

Unfortunately, the X chromosome is short, such that thechance of any signal of recent ancestry on the X decays ratherquickly. However, growing sample sizes will increase both thedetection of the pairwise relatedness and cases of relatednessbetween multiple individuals. In these large data sets, overlap-ping pairwise relationships (e.g. a present-day individual thatshares X segments with two distinct other individuals) could bequite informative about the particular ancestors that individu-als share.

Our results should also be of use in understanding patternsof admixture on the X chromosome. In particular our resultsabout the posterior information from the number and lengthof X segments shared with a genealogical ancestor can help usunderstand what can be learned from the presence (or absence)of segments of particular ancestry on the X chromosome. Forexample, if you observed long segments of a particular ances-try on your X chromosome our results could be used to aid

12 Vince Buffalo et al.

Page 13: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

1 2 3

0.0

0.2

0.4

0.6

0.8

1.0k = 2

2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0k = 3

3 4 5 6 7

0.0

0.2

0.4

0.6

0.8

1.0number of IBD

segments

0

1

2

3

4

5

6

7

8

k = 4

4 5 6 7 8 9

0.0

0.2

0.4

0.6

0.8

1.0k = 5

5 6 7 8 9 10 11

0.0

0.2

0.4

0.6

0.8

1.0k = 6

6 7 8 9 10 11 12 13

0.0

0.2

0.4

0.6

0.8

1.0k = 7

number of recombinational meioses

poste

rior

Figure 9 Posterior probability distribution P(R = r|N = n, k) for different generations (each panel), and the observed number ofIBD segments (each colored line). The prior distribution of recombinational meioses given k is indicated by a black dashed line.

the identification of which parts of your family tree this ances-try has been inherited from. These genetic genealogical infer-ences can provide informative details in genealogy reconstruc-tion where historical genealogical information is missing or un-certain. While this information for an individual decays some-what quickly after a small number of generations, models of Xchromosome segment ancestry will be useful at a population-level for understanding sex-biased admixture (Bryc et al. 2010;Goldberg and Rosenberg 2015; Shringarpure et al. 2016).

Acknowledgements

We wish to thank Jeremy Berg, Nancy Chen, Kristin Lee, andthe rest of the Coop lab for helpful discussions and feedbackon earlier drafts. We also thank Amy Williams and the Statisti-cal and Computational Genetics Reading Group at Cornell forvery helpful feedback on the BioRxiv preprint version. Finally,we thank Noah Rosenberg and two anonymous reviewers fortheir feedback. This research was supported by an NSF Grad-uate Research Fellowship grant awarded to VB (1148897), andNational Institute of General Medical Sciences of the NationalInstitutes of Health under award numbers NIH RO1GM83098and RO1GM107374 to GC.

Literature Cited

Baca, M., K. Doan, M. Sobczyk, A. Stankovic, and P. Weglenski,2012 Ancient dna reveals kinship burial patterns of a pre-columbian andean community. BMC Genetics 13: 1.

Balding, D. J. and R. A. Nichols, 1994 Dna profile match proba-bility calculation: how to allow for population stratification,relatedness, database selection and single bands. Forensic Sci-ence International 64: 125–140.

Barton, N. H. and A. M. Etheridge, 2011 The Relation BetweenReproductive Value and Genetic Contribution. Genetics 188:953–973.

Basin, S. L., 1963 The Fibonacci sequence as it appears in nature.Fibonacci Quarterly 1: 53–56.

Belin, T. R., D. W. Gjertson, and M.-Y. Hu, 1997 Summarizingdna evidence when relatives are possible suspects. Journal ofthe American Statistical Association 92: 706–716.

Bryc, K., A. Auton, M. R. Nelson, J. R. Oksenberg, S. L. Hauser,S. Williams, A. Froment, J.-M. Bodo, C. Wambebe, S. A.Tishkoff, et al., 2010 Genome-wide patterns of populationstructure and admixture in west africans and african ameri-cans. Proceedings of the National Academy of Sciences 107:786–791.

Bustamante, C. D. and S. Ramachandran, 2009 Evaluating sig-natures of sex-specific processes in the human genome. Na-ture Genetics 41: 8–10.

Chang, J. T., 1999 Recent common ancestors of all present-dayindividuals. Advances in Applied Probability pp. 1002–1026.

Donnelly, K. P., 1983 The probability that related individualsshare some section of genome identical by descent. Theoreti-cal Population Biology 23: 34–63.

Durand, E. Y., C. B. Do, J. L. Mountain, and J. M. Macpherson,2014 Ancestry composition: A novel, efficient pipeline for an-

Models of Recent Ancestry on the X Chromosome 13

Page 14: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

cestry deconvolution. bioRxiv .Epstein, M. P., W. L. Duren, and M. Boehnke, 2000 Improved in-

ference of relationship for pairs of individuals. The AmericanJournal of Human Genetics 67: 1219–1231.

Feller, W., 1950 An Introduction to Probability Theory and its Appli-cations, volume 1. John Wiley & Sons.

Fisher, R. A. et al., 1949 The theory of inbreeding. Edinburgh:Oliver & Boyd. .

Fisher, R. A. et al., 1954 A fuller theory of ‘junctions’ in inbreed-ing. Heredity 8: 187–197.

Fox, M., R. Sear, and J. Beise, 2009 Grandma plays favourites:X-chromosome relatedness and sex-specific childhood mor-tality. Proceedings of the Royal Society B .

Fu, Q., M. Hajdinjak, O. T. Moldovan, S. Constantin, S. Mallick,P. Skoglund, N. Patterson, N. Rohland, I. Lazaridis, B. Nickel,B. Viola, K. Prüfer, M. Meyer, J. Kelso, D. Reich, and S. Pääbo,2015 An early modern human from Romania with a recentNeanderthal ancestor. Nature 524: 216–219.

Glaubitz, J. C., O. E. Rhodes, and J. A. DeWoody, 2003 Prospectsfor inferring pairwise relationships with single nucleotidepolymorphisms. Molecular Ecology 12: 1039–1047.

Goldberg, A. and N. A. Rosenberg, 2015 Beyond 2/3 and 1/3:The Complex Signatures of Sex-Biased Admixture on the XChromosome. Genetics 201: 263–279.

Gravel, S., 2012 Population genetics models of local ancestry.Genetics 191: 607–619.

Gusev, A., J. K. Lowe, M. Stoffel, M. J. Daly, D. Altshuler, J. L.Breslow, J. M. Friedman, and I. Pe’er, 2009 Whole population,genome-wide mapping of hidden relatedness. Genome Re-search 19: 318–326.

Haak, W., G. Brandt, H. N. de Jong, C. Meyer, R. Ganslmeier,V. Heyd, C. Hawkesworth, A. W. Pike, H. Meller, and K. W.Alt, 2008 Ancient dna, strontium isotopes, and osteologicalanalyses shed light on social and kinship organization of thelater stone age. Proceedings of the National Academy of Sci-ences 105: 18226–18231.

Hassold, T., S. Sherman, D. Pettay, D. Page, and P. Jacobs, 1991Xy chromosome nondisjunction in man is associated withdiminished recombination in the pseudoautosomal region.American journal of human genetics 49: 253.

Henn, B. M., L. Hon, J. M. Macpherson, N. Eriksson, S. Sax-onov, I. Pe’er, and J. L. Mountain, 2012 Cryptic Distant Rel-atives Are Common in Both Isolated and Cosmopolitan Ge-netic Samples. PLoS One 7: e34267–13.

Huff, C. D., D. J. Witherspoon, T. S. Simonson, J. Xing, W. S.Watkins, Y. Zhang, T. M. Tuohy, D. W. Neklason, R. W. Burt,S. L. Guthery, et al., 2011 Maximum-likelihood estimation ofrecent shared ancestry (ersa). Genome Research 21: 768–774.

Ihaka, R., P. Murrell, K. Hornik, and A. Zeileis, 2008 Colorspace:color space manipulation. R package version pp. 1–0.

Keyser-Tracqui, C., E. Crubezy, and B. Ludes, 2003 Nuclear andmitochondrial dna analysis of a 2,000-year-old necropolis inthe egyin gol valley of mongolia. The American Journal ofHuman Genetics 73: 247–260.

Koller, P. C. and C. Darlington, 1934 The genetical and mechan-ical properties of the sex-chromosomes. Journal of Genetics29: 159–173.

Laughlin, H. H., 1920 Calculating ancestral influence in man: Amathematical measure of the facts of bisexual heredity. Ge-netics 5: 435.

Liang, M. and R. Nielsen, 2014 The lengths of admixture tracts.Genetics 197: 953–967.

Palamara, P. F., T. Lencz, A. Darvasi, and I. Peer, 2012 Lengthdistributions of identity by descent reveal fine-scale demo-graphic history. The American Journal of Human Genetics91: 809–822.

Pinto, N., L. Gusmão, and A. Amorim, 2011 X-chromosomemarkers in kinship testing: a generalisation of the ibd ap-proach identifying situations where their contribution is cru-cial. Forensic Science International: Genetics 5: 27–32.

Pinto, N., P. V. Silva, and A. Amorim, 2012 A general methodto assess the utility of the x-chromosomal markers in kinshiptesting. Forensic Science International: Genetics 6: 198–207.

Pool, J. E. and R. Nielsen, 2007 Population size changes reshapegenomic patterns of diversity. Evolution 61: 3001–3006.

Pool, J. E. and R. Nielsen, 2009 Inference of historical changesin migration rate from the lengths of migrant tracts. Genetics181: 711–719.

R Core Team, 2015 R: A Language and Environment for StatisticalComputing. R Foundation for Statistical Computing, Vienna,Austria.

Ralph, P. and G. Coop, 2013 The geography of recent geneticancestry across europe. PLoS Biology 11: e1001555.

Ram, K. and H. Wickham, 2015 wesanderson: A Wes AndersonPalette Generator. R package version 0.3.2.

Ramachandran, S., N. A. Rosenberg, M. W. Feldman, andJ. Wakeley, 2008 Population differentiation and migration: co-alescence times in a two-sex island model for autosomal andx-linked loci. Theoretical Population Biology 74: 291–301.

Ramachandran, S., N. A. Rosenberg, L. A. Zhivotovsky, andM. W. Feldman, 2004 Robustness of the inference of humanpopulation structure: a comparison of x-chromosomal andautosomal microsatellites. Human genomics 1: 1.

Rice, W. R., S. Gavrilets, and U. Friberg, 2008 Sexually antago-nistic zygotic drive of the sex chromosomes. PLoS Genetics 4:e1000313.

Rohde, D., S. Olson, and J. T. Chang, 2004 Modelling the recentcommon ancestry of all living humans. Nature .

Rosenberg, N. A., 2016 Admixture Models and the BreedingSystems of H. S. Jennings: A GENETICS Connection. Genet-ics 202: 9–13.

Rosser, Z. H., P. Balaresque, and M. A. Jobling, 2009 Gene con-version between the X chromosome and the male-specific re-gion of the Y chromosome at a translocation hotspot. TheAmerican Journal of Human Genetics .

Rossum, G., 1995 Python reference manual. Technical report,Amsterdam, The Netherlands, The Netherlands.

Royal, C. D., J. Novembre, S. M. Fullerton, D. B. Goldstein,J. C. Long, M. J. Bamshad, and A. G. Clark, 2010 Inferringgenetic ancestry: opportunities, challenges, and implications.The American Journal of Human Genetics 86: 661–673.

Shringarpure, S. S., C. D. Bustamante, K. L. Lange, and D. H.Alexander, 2016 Efficient analysis of large datasets and sexbias with admixture. bioRxiv .

Simson, R., 1753 An Explication of an Obscure Passage in Al-bert Girard’s Commentary upon Simon Stevin’s Works (VideLes Oeuvres Mathem. de Simon Stevin, à Leyde, 1634, p. 169,170); By Mr. Simson, Professor of Mathematics at the Univer-sity of Glasgow: Communicated by the Right HonourablePhilip Earl Stanhope. Philosophical Transactions (1683-1775)pp. 368–377.

Sjerps, M. and A. D. Kloosterman, 1999 On the consequences ofdna profile mismatches for close relatives of an excluded sus-pect. International Journal of Legal Medicine 112: 176–180.

14 Vince Buffalo et al.

Page 15: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

Sloane, N., 2010 Online encyclopaedia of integer sequences.Published electronically at http://www.oeis.org .

Sun, L., K. Wilder, and M. S. McPeek, 2002 Enhanced pedigreeerror detection. Human Heredity 54: 99–110.

Thomas, A., N. J. Camp, J. M. Farnham, K. Allen-Brady, andL. A. Cannon-Albright, 2008 Shared genomic segment analy-sis. mapping disease predisposition genes in extended pedi-grees using snp genotype assays. Annals of Human Genetics72: 279–287.

Thomas, A., M. H. Skolnick, and C. M. Lewis, 1994 Genomicmismatch scanning in pedigrees. Mathematical Medicine andBiology 11: 1–16.

Thompson, E. A., 2013 Identity by descent: variation in meiosis,across genomes, and in populations. Genetics 194: 301–326.

Wachter, K. W., E. A. Hammel, and P. Laslett, 1979 StatisticalStudies of Historical Social Structure. Academic Press Inc.

Wickham, H., 2009 ggplot2: Elegant Graphics for Data Analysis.Springer-Verlag New York.

Wickham, H., 2016a purrr: Functional Programming Tools. R pack-age version 0.2.0.

Wickham, H., 2016b tidyr: Easily Tidy Data with ‘spread()’ and‘gather()’ Functions. R package version 0.4.0.

Wickham, H. and R. Francois, 2015 dplyr: A Grammar of DataManipulation. R package version 0.4.3.

Wilf, H. S., 2013 generatingfunctionology. Elsevier.Wright, S., 1922 Coefficients of inbreeding and relationship. The

American Naturalist 56: 330–338.Wright, S., 1934 The method of path coefficients. The Annals of

Mathematical Statistics 5: 161–215.

Convergence of the Thinned Poisson Process toPoisson-Binomial Model

We compared the Poisson thinning approximation and thePoisson-Binomial models. One can show using the law of totalexpectation that the Poisson-Binomial and Poisson model havethe same expected value:

E[N] =∞

∑b=0

E[N|B = b]P(B = b)

E[N] =12d

(∞

∑b=0

bPois(B = b) + c∞

∑b=0

Pois(B = b)

)

E[N] =12d (νd + c)

This is the same expected value as the thinned Poisson pro-cess with rate (νd + c)/2d. However, the Poisson thinning andPoisson-Binomial models differ in their variance. Using Eve’slaw, we can show the Poisson-Binomial model has variance

V[N] = EB[V[N|B]] + VB[E[N|B]]

V[N] =dv + 1

2d − 122d

This differs from the thinned Poisson process variance by theterm 1/22d, which grows smaller with increasing d. Finally, wenumerically show these two distributions (here, we label thetwo distributions for k generations µk(x) and νk(x), where x isthe number of segments) converge quickly in total variationaldistance (dTV(µk, νk) = 1/2 ∑∞

n=0 |µk(n)− νk(n)|) as k increases,in Figure 10.

1 5 10

0.0

0.2

0.4

0.6

0.8

1.0

generation

tota

l va

ria

tio

n d

ista

nce

X chromosome

autosomes

Figure 10 The total variation distance between the Poissonthinning and the Poisson-Binomial model for IBD segmentnumber for X segments (yellow) and the autosomal segments(blue).

Additional Autosomal Segment Distributions

The distribution of IBD segments between cousins Similar to thedistribution of autosomal segments between a present-day in-dividual and an ancestor (Section ), we can derive the distribu-tion for the number of IBD segments shared between two half-cousins with an ancestor in the kth generation. Two half-cousinsare separated by 2k meioses, thus the distribution for number ofsegments is:

Models of Recent Ancestry on the X Chromosome 15

Page 16: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

P(N = n | k, ν, c) = Pois(

N = n | λ = (c + 2kν)/22k−1)

(28)

Since either of the shared ancestor’s haplotypes can be sharedIBD between the two cousins, the Poisson process rate is dou-bled. Full-cousins can share segments via either of their twoshared ancestors, leading the distribution to be:

P(N = n | k, ν, c) = Pois(

N = n | λ = (c + 2kν)/22k−2)

The distribution of autosome segment lengths In addition to thenumber of IBD segments, the length of segments is also infor-mative about ancestry (e.g. Palamara et al. 2012). As we modelcrossing over as a Poisson process, a one Morgan region willexperience on average d recombination events over d meioses.Therefore, the probability density of segment lengths sharedIBD between two individuals d meioses apart is exponentialwith rate d:

p(U = u|d) = de−du (29)

Equations (29) and (28) specify a model of the number andlengths of segments shared between various degree relatives.Various authors have used these types of results to derivelikelihood-based models for classifying the genealogical rela-tionship between pairs of individuals using autosome IBD data(Huff et al. 2011; Henn et al. 2012; Durand et al. 2014).

Generating Function for Recombinational Meioses

We also develop a generating function g(x, k) that encodes thenumber of recombinational meioses in the kth generation as thecoefficient for the term xk. This generating function can also beused in approximations and finding moments of the distribu-tion pk(r).

An expansion of the generating function below encodes thenumber of lineages with r females in a genealogical chain k gen-erations long (nk,r) as the coefficient of the term xk

g(x, k) =1

2k+1√x√

x + 4

[x((

R+)k −(

R−)k)

+√

x√

x + 4((

R−)k+(

R+)k)

+ −2(

R−)k+ 2

(R+)k

]where

R− = x −√

x√

x + 4

R+ = x +√

x√

x + 4

Proof. We begin by stating some recurrences that occur from theinheritance pattern of X ancestry:

nk,r = mk,r + fk,r (30a)

mk,r = fk−1,r (30b)

fk,r = fk−1,r−1 + mk−1,r−1 (30c)

(30d)

Starting from equation (30a):

nk,r = mk,r + fk,r

nk,r = fk−1,r + fk,r

nk,r = fk−1,r + fk−1,r−1 + mk−1,r−1

nk,r = fk−1,r + nk−1,r−1

nk,r = fk−2,r−1 + mk−2,r−1 + nk−1,r−1

finally, substituting equation (30a) again gives us the desiredrecurrence relation for nk,r:

nk,r = nk−2,r−1 + nk−1,r−1 (31)

We can now use generating functions (Wilf 2013) to tacklethis recurrence. Define:

Ak(x) = ∑r≥0

nk,rxr

then, multiply both sides of (31) by xr and sum over r. Onthe right hand side:

= ∑r≥0

nk−2,r−1xr + ∑r≥0

nk−1,r−1xr

Note that nk,r = 0 if r < 0. Multiplying and dividing thesecond term by x yields:

x(nk−1,0x + nk−1,1x2 + nk−1,2x3 + . . .)/x = xAk−1(x)

An identical derivation works for the first term. We find:

Ak(x) = xAk−1(x) + xAk−2(x)

This generating function is in the form of another recurrence.We can solve this recurrence (i.e. with Mathematica) with theinitial conditions below (which can be derived from (30a) andits initial conditions)

1. A0(x) = 1

2. A1(x) = 1 + x

to find a solution with these initial conditions, giving us ourdesired generating function g(x, k).

We can see that our generating function works via an expan-sion and verify the coefficients match known numbers of recom-binational meioses for some k. For example, let’s expand g(x, k)at k = 5:

x2 + 6x3 + 5x4 + x5

which matches the nk,r values found via computational calcula-tion.

Half-Cousins IBD Length Distribution Simulation Re-sults

Figure 11 show the concordance between our cousin IBD seg-ment length analytic distributions and the binned average(1.98cM bin intervals) of 5,000 simulations.

16 Vince Buffalo et al.

Page 17: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

0

5

10

15k = 2 k = 3 k = 4

0.0 0.5 1.0 1.5 2.0

0

5

10

15k = 5

0.0 0.5 1.0 1.5 2.0

k = 6

0.0 0.5 1.0 1.5 2.0

k = 7

length of IBD segments (Morgans)

de

nsity

Figure 11 The analytic distributions of IBD segment length (blue lines) between two-present-day female half-cousins with a sharedancestor in the kth generation (each panel), and the binned average 5,000 simulations (gray points).

An Approximation of X Pedigree Collapse

Since our models of recent X ancestry omit the possibility ofpedigree collapse, it is worthwhile to see when this assumptionbreaks down. To see how pedigree collapse becomes an increas-ing problem further generations back, we look at the probabil-ity that all of a single individual’s Fk+2 X ancestors, when sam-pled from a population of N individuals with replacement, aredistinct. We treat generations as discrete and non-overlapping,and look at the probability that all Fk+2 are distinct individu-als as a function of how many generations we go back. Thisproblem is similar to the celebrated birthday problem, but withtwo rooms of participants: one room of females and anotherof males. Assuming random mating, each generation, one’s Xancestors must be randomly selected with replacement from apopulation of N individuals. For all ancestors to be distinct, allFk male ancestors selected from a pool of N/2 and Fk+1 femaleancestors selected from a pool of N/2 must be unique:

P(X ancestors all distinct) = P(male X ancestors unique)

× P(female X ancestors unique)

=Fk

∏i=1

(1 − 2i

N

) Fk+1

∏j=1

(1 − 2j

N

)

This probability as a function of k is plotted in Figure 12. ForX ancestors, the probability that at least two individuals arenon-distinct only becomes a significant problem after around

12 generations. Note that this is a very conservative accountof how pedigree collapse could affect our calculations; even iftwo ancestors were to be non-distinct, this is unlikely to affectour calculations greatly. For pedigree collapse to affect our IBDsegment models, an individual has to both be a genealogicalancestor and a genetic ancestor of the present-day individual;pedigree collapse has no genetic affect if non-distinct individu-als are not genetic ancestors.

0 5 10 15 20

0.0

0.2

0.4

0.6

0.8

1.0

generation

pro

ba

bili

ty a

ll a

nce

sto

rs d

istin

ct

genealogical ancestors

X ancestors

Figure 12 The probability that all genealogical and X ancestorsare distinct in a population of N = 100, 000.

For other pedigree collapse related quantities (e.g., what’s

Models of Recent Ancestry on the X Chromosome 17

Page 18: A Genealogical Look at Shared Ancestry on the X Chromosome · A Genealogical Look at Shared Ancestry on the X Chromosome Vince Buffalo,†,1, Stephen M. Mount‡ and Graham Coop†

the average number of distinct ancestors k generations back),see Wachter et al. (1979) approximations which use Feller’s(1950) occupancy models.

18 Vince Buffalo et al.