Research Article
Path-Counting Formulas for Generalized Kinship Coefficients and Condensed Identity Coefficients
En Cheng¹ and Z. Meral Ozsoyoglu²
¹ Computer Science Department, The University of Akron, Akron, OH 44325, USA
² Electrical Engineering and Computer Science Department, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106, USA
Correspondence should be addressed to En Cheng; echeng@uakron.edu
Received 14 January 2014; Accepted 8 May 2014; Published 21 July 2014
Academic Editor: Zhenyu Jia
Copyright © 2014 E. Cheng and Z. M. Ozsoyoglu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
An important computation on pedigree data is the calculation of condensed identity coefficients, which provide a complete description of the degree of relatedness of two individuals. The applications of condensed identity coefficients range from genetic counseling to disease tracking. Condensed identity coefficients can be computed using linear combinations of generalized kinship coefficients for two, three, and four individuals and for two pairs of individuals, and there are recursive formulas for computing those generalized kinship coefficients (Karigl, 1981). Path-counting formulas have been proposed for the (generalized) kinship coefficients for two (three) individuals, but there have been no path-counting formulas for the other generalized kinship coefficients. It has also been shown that the computation of the (generalized) kinship coefficients for two (three) individuals using path-counting formulas is efficient for large pedigrees when combined with path encoding schemes tailored for pedigree graphs. In this paper, we propose a framework for deriving path-counting formulas for generalized kinship coefficients. Then we present the path-counting formulas for all generalized kinship coefficients for which there are recursive formulas and which are sufficient for computing condensed identity coefficients. We also perform experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees.
1. Introduction
With the rapidly expanding field of medical genetics and genetic counseling, genealogy information is becoming increasingly abundant. In January 2009, the US Department of Health and Human Services released an updated and improved version of the Surgeon General's Web-based family health history tool [1]. This Web-based tool makes it easy for users to record their family health history. Large extended human pedigrees are very informative for linkage analysis. Pedigrees including thousands of members in 10–20 generations are available from genetically isolated populations [2, 3]. In human genetics, a pedigree is defined as "a simplified diagram of a family's genealogy that shows family members' relationships to each other and how a specific trait, abnormality, or disease has been inherited" [4]. Pedigrees are utilized to trace the inheritance of a specific disease, calculate genetic risk ratios, identify individuals at risk, and facilitate genetic counseling. To calculate genetic risk ratios or identify individuals at risk, we need to assess the degree of relatedness of two individuals. All measures of relatedness are based on the concept of identity by descent (IBD). Two alleles are identical by descent if one is an ancestral copy of the other or if they are both copies of the same ancestral allele. The IBD concept is primarily due to Cotterman [5] and Malecot [6] and has been successfully applied to many problems in population genetics.
Hindawi Publishing Corporation
Computational and Mathematical Methods in Medicine
Volume 2014, Article ID 898424, 20 pages
http://dx.doi.org/10.1155/2014/898424
The simplest measure of relationship between two individuals is their kinship coefficient. The kinship coefficient between two individuals $i$ and $j$ is the probability that an allele selected randomly from $i$ and an allele selected randomly from the same autosomal locus of $j$ are identical by descent. To better discriminate between different types of pairs of relatives, identity coefficients were introduced by Gillois [7] and Harris [8] and promulgated by Jacquard [9]. Considering the four alleles of two individuals at a fixed autosomal locus, there are 15 possible identity states. Disregarding the distinction between maternally and paternally derived alleles, we obtain 9 condensed identity states. The probabilities associated with each condensed identity state are called condensed identity coefficients, which are useful in a diverse range of fields. These include the calculation of risk ratios for qualitative disease, the analysis of quantitative traits, and genetic counseling in medicine.
A recursive algorithm for calculating condensed identity coefficients, proposed by Karigl [10], has been known for some time. This method requires that one calculate a set of generalized kinship coefficients, from which one obtains condensed identity coefficients via a linear transformation. One limitation is that this recursive approach is not scalable when applied to very large pedigrees. It has been previously shown that the kinship coefficients for two individuals [11–13] and the generalized kinship coefficients for three individuals [14, 15] can be efficiently calculated using path-counting formulas together with path encoding schemes tailored for pedigree graphs.
Motivated by the efficiency of path-counting formulas for computing the kinship coefficient for two individuals and the generalized kinship coefficient for three individuals, we first introduce a framework for developing path-counting formulas to compute generalized kinship coefficients concerning three individuals, four individuals, and two pairs of individuals. Then we present path-counting formulas for all generalized kinship coefficients which have recursive formulas proposed by Karigl [10] and are sufficient to compute condensed identity coefficients. In summary, our ultimate goal is to use path-counting formulas for computing generalized kinship coefficients so that the efficiency and scalability of condensed identity coefficient calculation can be improved.
The main contributions of our work are as follows:
(i) a framework to develop path-counting formulas for generalized kinship coefficients;
(ii) a set of path-counting formulas for all generalized kinship coefficients having recursive formulas [10];
(iii) experimental results demonstrating significant performance gains for calculating condensed identity coefficients based on our proposed path-counting formulas as compared to using recursive formulas [10].
2. Materials and Methods
This section describes kinship coefficients, generalized kinship coefficients, identity coefficients, and condensed identity coefficients in more detail. Conceptual terms for the path-counting formulas for three and four individuals are introduced in Section 2.3. In addition, an overview of path-counting formula derivation is presented.
2.1. Kinship Coefficients and Generalized Kinship Coefficients. The kinship coefficient between two individuals $a$ and $b$ is the probability that a randomly chosen allele at the same locus from each is identical by descent (IBD). There are two approaches to computing the kinship coefficient $\Phi_{ab}$: the recursive approach [10] and the path-counting approach [16]. The recursive formulas [10] for $\Phi_{ab}$ and $\Phi_{aa}$ are

$$
\begin{aligned}
\Phi_{ab} &= \tfrac{1}{2}\left(\Phi_{fb} + \Phi_{mb}\right) \quad \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa} &= \tfrac{1}{2}\left(1 + \Phi_{fm}\right) = \tfrac{1}{2}\left(1 + F_{a}\right),
\end{aligned}
\tag{1}
$$

where $f$ and $m$ denote the father and the mother of $a$, respectively, and $F_{a}$ is the inbreeding coefficient of $a$.
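As an illustration of the recursion in (1), the following minimal Python sketch evaluates the kinship coefficient on a small toy pedigree. The pedigree, the integer identifiers, the use of 0 for an unknown parent, and the function name `kinship` are assumptions made for this example only. In particular, the sketch assumes that every parent has a smaller identifier than its children, so that recursing on the individual with the larger identifier always respects the condition that $a$ is not an ancestor of $b$.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy pedigree: child -> (father, mother); 0 marks an unknown parent.
# Assumption: every parent has a smaller id than its children.
PEDIGREE = {
    1: (0, 0), 2: (0, 0),   # founders
    3: (1, 2), 4: (1, 2),   # full siblings
    5: (0, 0), 6: (0, 0),   # unrelated founders
    7: (3, 5), 8: (4, 6),   # first cousins
    9: (7, 8),              # inbred child of the two cousins
}


@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient Phi_ab computed with the recursion in (1)."""
    if a == 0 or b == 0:
        return Fraction(0)          # an unknown parent shares no alleles
    if a == b:
        f, m = PEDIGREE[a]
        return Fraction(1, 2) * (1 + kinship(f, m))   # Phi_aa = (1 + F_a) / 2
    if a < b:
        a, b = b, a                 # recurse on the later-born individual
    f, m = PEDIGREE[a]
    return Fraction(1, 2) * (kinship(f, b) + kinship(m, b))
```

On this toy pedigree the sketch gives 1/4 for the full siblings 3 and 4 and 1/16 for the first cousins 7 and 8, the textbook values for these relationships.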
Wright's path-counting formula [16] for $\Phi_{ab}$ is

$$
\Phi_{ab} = \sum_{A} \; \sum_{\langle P_{Aa}, P_{Ab} \rangle \in PP} \left(\frac{1}{2}\right)^{r+s+1} \left(1 + F_{A}\right),
\tag{2}
$$

where $A$ is a common ancestor of $a$ and $b$, $PP$ is the set of nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ from $A$ to $a$ and $b$, $r$ is the length of the path $P_{Aa}$, $s$ is the length of the path $P_{Ab}$, and $F_{A}$ is the inbreeding coefficient of $A$. The path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping if and only if the two paths share no common individuals except $A$.
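To make formula (2) concrete, here is a minimal Python sketch that enumerates path-pairs explicitly and sums their contributions. The toy pedigree and all function names are assumptions made for illustration. The sketch measures path length as the number of parent-to-child edges and obtains $F_A$ by applying the same formula to the parents of $A$. It enumerates paths by brute force and does not use the path encoding schemes discussed in the paper, so it is only suitable for very small pedigrees.

```python
from fractions import Fraction

# Hypothetical toy pedigree: child -> (father, mother); 0 marks an unknown parent.
PEDIGREE = {
    1: (0, 0), 2: (0, 0), 3: (1, 2), 4: (1, 2),
    5: (0, 0), 6: (0, 0), 7: (3, 5), 8: (4, 6), 9: (7, 8),
}


def parents(x):
    return [p for p in PEDIGREE[x] if p != 0]


def ancestors_or_self(x):
    found = {x}
    for p in parents(x):
        found |= ancestors_or_self(p)
    return found


def paths(A, x):
    """All parent-to-child paths from A down to x, each as a list of individuals."""
    if x == A:
        return [[A]]
    return [path + [x] for p in parents(x) for path in paths(A, p)]


def inbreeding(x):
    """F_x is the kinship coefficient of the parents of x (0 if a parent is unknown)."""
    f, m = PEDIGREE[x]
    return kinship_by_paths(f, m) if f and m else Fraction(0)


def kinship_by_paths(a, b):
    """Phi_ab as the sum in (2) over common ancestors and nonoverlapping path-pairs."""
    total = Fraction(0)
    for A in ancestors_or_self(a) & ancestors_or_self(b):
        for p_a in paths(A, a):
            for p_b in paths(A, b):
                if set(p_a) & set(p_b) == {A}:          # nonoverlapping path-pair
                    r, s = len(p_a) - 1, len(p_b) - 1   # path lengths in edges
                    total += Fraction(1, 2) ** (r + s + 1) * (1 + inbreeding(A))
    return total
```

For the first cousins 7 and 8, the only acceptable path-pairs descend from the founders 1 and 2, each contributing $(1/2)^5$, so the sketch returns 1/16.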
Recursive formulas proposed by Karigl [10] for generalized kinship coefficients concerning three individuals, four individuals, and two pairs of individuals are listed in (3), (4), and (5), respectively:

$$
\begin{aligned}
\Phi_{abc} &= \tfrac{1}{2}\left(\Phi_{fbc} + \Phi_{mbc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aab} &= \tfrac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aaa} &= \tfrac{1}{4}\left(1 + 3\Phi_{fm}\right) = \tfrac{1}{4}\left(1 + 3F_{a}\right),
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
\Phi_{abcd} &= \tfrac{1}{2}\left(\Phi_{fbcd} + \Phi_{mbcd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aabc} &= \tfrac{1}{2}\left(\Phi_{abc} + \Phi_{fmbc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aaab} &= \tfrac{1}{4}\left(\Phi_{ab} + 3\Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aaaa} &= \tfrac{1}{8}\left(1 + 7\Phi_{fm}\right) = \tfrac{1}{8}\left(1 + 7F_{a}\right),
\end{aligned}
\tag{4}
$$

$$
\begin{aligned}
\Phi_{ab,cd} &= \tfrac{1}{2}\left(\Phi_{fb,cd} + \Phi_{mb,cd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aa,bc} &= \tfrac{1}{2}\left(\Phi_{bc} + \Phi_{fm,bc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{ab,ac} &= \tfrac{1}{4}\left(2\Phi_{abc} + \Phi_{fb,mc} + \Phi_{mb,fc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aa,ab} &= \tfrac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa,aa} &= \tfrac{1}{4}\left(1 + 3\Phi_{fm}\right) = \tfrac{1}{4}\left(1 + 3F_{a}\right).
\end{aligned}
\tag{5}
$$
$\Phi_{abc}$ is the probability that randomly chosen alleles at the same locus from each of the three individuals (i.e., $a$, $b$, and $c$) are identical by descent (IBD). Similarly, $\Phi_{abcd}$ is the probability that randomly chosen alleles at the same locus from each of the four individuals (i.e., $a$, $b$, $c$, and $d$) are IBD. $\Phi_{ab,cd}$ is the probability that a random allele from $a$ is IBD with a random allele from $b$ and that a random allele from $c$ is IBD with a random allele from $d$ at the same locus. Note that $\Phi_{abc} = 0$ if there is no common ancestor of $a$, $b$, and $c$; $\Phi_{abcd} = 0$ if there is no common ancestor of $a$, $b$, $c$, and $d$; and $\Phi_{ab,cd} = 0$ in the absence of a common ancestor either for $a$ and $b$ or for $c$ and $d$.
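The three-individual recursion in (3) can be sketched in the same style as the two-individual case. The Python code below is a minimal illustration, not the authors' implementation: the toy pedigree (two founders with three children), the identifier convention (0 for an unknown parent, parents numbered before their children), and the function names `phi2` and `phi3` are all assumptions of this example. Sorting the arguments in decreasing order places the latest-born individual first, so the branch that is chosen always satisfies the ancestor conditions attached to (3).

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy pedigree: child -> (father, mother); 0 marks an unknown parent.
# Assumption: every parent has a smaller id than its children.
PEDIGREE = {1: (0, 0), 2: (0, 0), 3: (1, 2), 4: (1, 2), 5: (1, 2)}


@lru_cache(maxsize=None)
def phi2(a, b):
    """Kinship coefficient Phi_ab, recursion (1)."""
    if a == 0 or b == 0:
        return Fraction(0)
    if a < b:
        a, b = b, a
    f, m = PEDIGREE[a]
    if a == b:
        return Fraction(1, 2) * (1 + phi2(f, m))
    return Fraction(1, 2) * (phi2(f, b) + phi2(m, b))


@lru_cache(maxsize=None)
def phi3(a, b, c):
    """Generalized kinship coefficient Phi_abc, recursion (3)."""
    if 0 in (a, b, c):
        return Fraction(0)
    a, b, c = sorted((a, b, c), reverse=True)   # a is now the latest-born
    f, m = PEDIGREE[a]
    if a == b == c:                             # Phi_aaa
        return Fraction(1, 4) * (1 + 3 * phi2(f, m))
    if a == b:                                  # Phi_aac with c different from a
        return Fraction(1, 2) * (phi2(a, c) + phi3(f, m, c))
    return Fraction(1, 2) * (phi3(f, b, c) + phi3(m, b, c))   # a differs from b and c
```

For three full siblings the sketch returns 1/16, which agrees with a direct count: each of the four founder alleles is drawn from all three siblings with probability $(1/4)^3$.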
2.2. Identity Coefficients and Condensed Identity Coefficients. Given two individuals $a$ and $b$ with maternally and paternally derived alleles at a fixed autosomal locus, there are 15 possible identity states, and the probabilities associated with each identity state are called identity coefficients. Ignoring the distinction between maternally and paternally derived alleles, we group the 15 possible states into 9 condensed identity states, as shown in Figure 1. The states range from state 1, in which all four alleles are IBD, to state 9, in which none of the four alleles are IBD. The probabilities associated with each condensed identity state are called condensed identity coefficients, denoted by $\{\Delta_{i} \mid 1 \le i \le 9\}$. The condensed identity coefficients can be computed from generalized kinship coefficients using the linear transformation

$$
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 & 1 \\
4 & 0 & 2 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 4 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 2 & 0 & 4 & 0 & 2 & 1 & 0 \\
16 & 0 & 4 & 0 & 4 & 0 & 2 & 1 & 0 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 & 1 \\
16 & 0 & 4 & 0 & 4 & 0 & 4 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
\Delta_{1} \\ \Delta_{2} \\ \Delta_{3} \\ \Delta_{4} \\ \Delta_{5} \\ \Delta_{6} \\ \Delta_{7} \\ \Delta_{8} \\ \Delta_{9}
\end{bmatrix}
=
\begin{bmatrix}
1 \\ 2\Phi_{aa} \\ 2\Phi_{bb} \\ 4\Phi_{ab} \\ 8\Phi_{aab} \\ 8\Phi_{abb} \\ 16\Phi_{aabb} \\ 4\Phi_{aa,bb} \\ 16\Phi_{ab,ab}
\end{bmatrix}.
\tag{6}
$$
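The linear system (6) can be solved directly once the eight generalized kinship coefficients on its right-hand side are known. The following Python sketch does so with exact rational arithmetic and a small Gauss-Jordan elimination; the function names, the argument order, and the choice of a hand-written solver (rather than a numerical library) are assumptions of this example.

```python
from fractions import Fraction

# Coefficient matrix of the linear system (6).
KARIGL_MATRIX = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 1, 1, 1, 1, 1],
    [2, 2, 1, 1, 2, 2, 1, 1, 1],
    [4, 0, 2, 0, 2, 0, 2, 1, 0],
    [8, 0, 4, 0, 2, 0, 2, 1, 0],
    [8, 0, 2, 0, 4, 0, 2, 1, 0],
    [16, 0, 4, 0, 4, 0, 2, 1, 0],
    [4, 4, 2, 2, 2, 2, 1, 1, 1],
    [16, 0, 4, 0, 4, 0, 4, 1, 0],
]


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def condensed_identity(phi_aa, phi_bb, phi_ab, phi_aab, phi_abb,
                       phi_aabb, phi_aa_bb, phi_ab_ab):
    """Return [Delta_1, ..., Delta_9] from the right-hand side of (6)."""
    rhs = [1, 2 * phi_aa, 2 * phi_bb, 4 * phi_ab, 8 * phi_aab,
           8 * phi_abb, 16 * phi_aabb, 4 * phi_aa_bb, 16 * phi_ab_ab]
    return solve(KARIGL_MATRIX, rhs)
```

As a usage example, the coefficient values for non-inbred full siblings (1/2, 1/2, 1/4, 1/8, 1/8, 1/16, 1/4, 3/32 in the argument order above) yield $\Delta_7 = 1/4$, $\Delta_8 = 1/2$, $\Delta_9 = 1/4$ and zero for the remaining states.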
In our work, we focus on deriving the path-counting formulas for the generalized kinship coefficients $\Phi_{abc}$, $\Phi_{abcd}$, and $\Phi_{ab,cd}$.
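The counts of 15 identity states and 9 condensed identity states quoted in Section 2.2 can be checked by brute force. The sketch below treats an identity state as a partition of the four alleles into IBD classes and merges states that differ only by exchanging the two alleles within an individual. The allele labels (`a1`, `a2`, `b1`, `b2`) and all function names are assumptions made for this illustration.

```python
# Two alleles for each of the individuals a and b (labels are hypothetical).
ALLELES = ("a1", "a2", "b1", "b2")


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (IBD classes)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition                    # `first` in a block of its own
        for i, block in enumerate(partition):          # or joined to an existing block
            yield partition[:i] + [[first] + block] + partition[i + 1:]


def freeze(partition, relabel=None):
    """Order-independent form of a partition, optionally after relabelling alleles."""
    relabel = relabel or {}
    return frozenset(frozenset(relabel.get(x, x) for x in block) for block in partition)


# Ignoring maternal/paternal origin means the two alleles of an individual can be swapped.
SWAP_A = {"a1": "a2", "a2": "a1"}
SWAP_B = {"b1": "b2", "b2": "b1"}
SYMMETRIES = [{}, SWAP_A, SWAP_B, {**SWAP_A, **SWAP_B}]

identity_states = {freeze(p) for p in set_partitions(ALLELES)}
condensed_states = {
    frozenset(freeze(p, sym) for sym in SYMMETRIES) for p in set_partitions(ALLELES)
}
```

Here `identity_states` contains one entry per identity state and `condensed_states` one entry per group of states that are equivalent under the allele swaps.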
2.3. Terms Defined for Path-Counting Formulas for Three and Four Individuals
(1) Triple-Common Ancestor. Given three individuals $a$, $b$, and $c$, if $A$ is a common ancestor of the three individuals, then we call $A$ a triple-common ancestor of $a$, $b$, and $c$.
(2) Quad-Common Ancestor. Given four individuals $a$, $b$, $c$, and $d$, if $A$ is a common ancestor of the four individuals, then we call $A$ a quad-common ancestor of $a$, $b$, $c$, and $d$.
(3) $P(A, a)$. It denotes the set of all possible paths from $A$ to $a$, where the paths can only traverse edges in the direction of parent to child, such that $P(A, a) \neq \mathrm{NULL}$ if and only if $A$ is an ancestor of $a$. $P_{Aa}$ denotes a particular path from $A$ to $a$, where $P_{Aa} \in P(A, a)$.
(4) Path-Pair. It consists of two paths, denoted as $\langle P_{Aa}, P_{Ab} \rangle$, where $P_{Aa} \in P(A, a)$ and $P_{Ab} \in P(A, b)$.
(5) Nonoverlapping Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$, it is nonoverlapping if and only if the two paths share no common individuals except $A$.
(6) Path-Triple. It consists of three paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, and $P_{Ac} \in P(A, c)$.
(7) Path-Quad. It consists of four paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, $P_{Ac} \in P(A, c)$, and $P_{Ad} \in P(A, d)$.
(8) $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$. It denotes all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$.
(9) $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, except $A$.
(10) $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$, except $A$.
(11) Crossover and 2-Overlap Individual. If $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$, we call $s$ a crossover individual with respect to $P_{Aa}$ and $P_{Ab}$ if the two paths pass through different parents of $s$. On the other hand, if $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, then we call $s$ a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$.
(12) 3-Overlap Individual. If $s \in \mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ and the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$ pass through the same parent of $s$, then we call $s$ a 3-overlap individual with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$.
(13) 2-Overlap Path. If $s$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, then both $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, denoted by $p$, and the edge from $p$ to $s$ is called an overlap edge. All consecutive overlap edges constitute a path, and this path is called a 2-overlap path. If the 2-overlap path extends all the way to the ancestor $A$, we call it a root 2-overlap path.
(14) 3-Overlap Path. It consists of all 3-overlap individuals in consecutive order. If the 3-overlap path extends all the way to the root $A$, we call it a root 3-overlap path.
Figure 1: The 15 possible identity states for individuals $a$ and $b$, grouped by their 9 condensed states ($\Delta_1$ to $\Delta_9$). Lines indicate alleles that are IBD.
Figure 2: Examples of path-pairs and path-triples.
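Definitions (11) and (13) translate almost directly into code. The following Python sketch classifies the individuals shared by two paths as crossover or 2-overlap individuals and extracts the root 2-overlap path, if any. Representing a path as a list of individuals that starts at the common ancestor, and the function names used here, are assumptions of this example.

```python
def classify_shared(path_a, path_b):
    """Label each individual shared by two paths (except the ancestor A).

    Each path is a list of individuals that starts at the common ancestor A.
    A shared individual is a 2-overlap individual if both paths reach it
    through the same parent, and a crossover individual otherwise.
    """
    parent_in_a = {child: parent for parent, child in zip(path_a, path_a[1:])}
    parent_in_b = {child: parent for parent, child in zip(path_b, path_b[1:])}
    labels = {}
    for s in path_a[1:]:
        if s in parent_in_b:
            same_parent = parent_in_a[s] == parent_in_b[s]
            labels[s] = "2-overlap" if same_parent else "crossover"
    return labels


def root_overlap_path(path_a, path_b):
    """Return the root 2-overlap path (starting at A), or [] if there is none."""
    prefix = []
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        prefix.append(x)
    return prefix if len(prefix) > 1 else []
```

For instance, with the paths `["A", "s", "e", "t", "a"]` and `["A", "s", "f", "t", "b"]`, `s` is labelled a 2-overlap individual, `t` a crossover individual, and the root 2-overlap path is `["A", "s"]`.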
Example 1. Consider the path-pairs from $A$ to $a$ and $b$ in Figure 2, where $A$ is a common ancestor of $a$ and $b$. For path-pair1, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{s, e, t\}$ and $A \to s \to e \to t$ is a root 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. For path-pair4, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{e, t\}$, where $e$ is a crossover individual, $t$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, and $e \to t$ is a 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$; it is not a root 2-overlap path because it does not extend to $A$.
Example 2. There are four path-quads listed in Figure 3 from $A$ to four individuals $a$, $b$, $c$, and $d$, where $A$ is a quad-common ancestor of the four individuals. For path-quad2, considering the paths $P_{Aa}$ and $P_{Ab}$, the path $A \to t \to f \to s$ is a root 2-overlap path, and $t$, $f$, and $s$ are 2-overlap individuals with respect to $P_{Aa}$ and $P_{Ab}$. For path-quad3, $t$, $f$, and $s$ are 3-overlap individuals with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, and the path $A \to t \to f \to s$ is a root 3-overlap path.
Table 1 summarizes the conceptual terms used in the path-counting formulas for two, three, and four individuals. In terms of terminology, it gives a first view of how our framework generalizes Wright's formula to three and four individuals.
Figure 3: Examples of path-quads.
Path-quad1: $A \to t \to f \to s \to a$; $A \to m \to s \to b$; $A \to c$; $A \to d$.
Path-quad2: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to c$; $A \to d$.
Path-quad3: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to t \to f \to s \to c$; $A \to d$.
Path-quad4: $A \to t \to f \to s \to a$; $A \to t \to m \to s \to b$; $A \to t \to m \to s \to c$; $A \to d$.

Table 1: The conceptual terms used for two, three, and four individuals.

| Two individuals | Three individuals | Four individuals |
| --- | --- | --- |
| Common ancestor | Triple-common ancestor | Quad-common ancestor |
| Path-pair | Path-triple | Path-quad |
| $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ | $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ | $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$ |
| NA | 2-Overlap individual | 3-Overlap individual |
| NA | 2-Overlap path | 3-Overlap path |
| NA | Root 2-overlap path | Root 3-overlap path |
| NA | Crossover individual | Crossover individual |

2.4. An Overview of Path-Counting Formula Derivation. According to Wright's path-counting formula [16] (see (2)) for two individuals $a$ and $b$, the path-counting approach requires identifying the common ancestors of $a$ and $b$ and calculating the contribution of each common ancestor to $\Phi_{ab}$. More specifically, for each common ancestor, denoted as $A$, we obtain all path-pairs from $A$ to $a$ and $b$ and identify the acceptable path-pairs. For $\Phi_{ab}$, an acceptable path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair, where the two paths share no common individuals except $A$. In Figure 2, path-pair2 is an acceptable path-pair, while path-pair1, path-pair3, and path-pair4 are not. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.
To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up their contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.
For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three path-pairs, $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.
To identify acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering the different types of common individuals shared between the two paths. Then we introduce building blocks, which are connected graphs with conditions on every edge that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the path-pairs concerned. A given path-triple can be decomposed into one or multiple building blocks. When two building blocks share a path-pair, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, taking the acceptable cases for the building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
The main procedures used for deriving the path-counting formula for $\Phi_{abc}$ are summarized in the flowchart shown in Figure 4. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.
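The role of the natural join can be illustrated with a small Python sketch. The relations below are toy inputs, not the paper's full sets of acceptable cases: each row maps a path-pair (here given the hypothetical names `e1`, `e2`, `e3`) to one of the case labels $T$, $X$, or $TX$ introduced in Section 3.1, and the only rule encoded is that a two-edge block may contain at most one root-overlap edge. The function names and the dictionary representation of rows are assumptions of this example.

```python
from itertools import product

ROOT_OVERLAP = {"T", "TX"}        # case labels that begin with a root 2-overlap
EDGE_OPTIONS = ("T", "X", "TX")   # Case 1, Case 2, Case 3


def block_cases(edge_1, edge_2):
    """Toy relation for a two-edge block: at most one edge is a root overlap."""
    return [
        {edge_1: x, edge_2: y}
        for x, y in product(EDGE_OPTIONS, repeat=2)
        if not (x in ROOT_OVERLAP and y in ROOT_OVERLAP)
    ]


def natural_join(left, right):
    """Relational natural join: merge rows that agree on all shared attributes."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined


R1 = block_cases("e1", "e2")      # cases for a block over path-pairs e1 and e2
R2 = block_cases("e2", "e3")      # cases for a block sharing e2 with the first
R12 = natural_join(R1, R2)        # combined cases, consistent on the shared e2
```

Each toy block has 5 admissible rows, and joining the two blocks on the shared path-pair `e2` leaves 11 combined rows. The point of the sketch is only the matching on the shared path-pair; the actual acceptable cases depend on the rules developed in Section 3.1.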
Figure 4: A flowchart for path-counting formula derivation.
3. Results and Discussion
3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.
3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$ with $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$ except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$:
(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;
(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;
(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.
Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab} \rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab} \rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$. The completeness of the eight cases can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab} \rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).

$$
\begin{aligned}
&\text{Case 1: } T \\
&\text{Case 2: } X^{+} \\
&\text{Case 3: } T X^{+} \\
&\text{Case 4: } T (X^{+} Y)^{+} \\
&\text{Case 5: } T (X^{+} Y)^{+} X^{+} \\
&\text{Case 6: } X^{+} Y \\
&\text{Case 7: } X^{+} (Y X^{+})^{+} \\
&\text{Case 8: } X^{+} (Y X^{+})^{+} Y
\end{aligned}
\tag{7}
$$

$$
\left.
\begin{aligned}
A \to s \to e \to t \to a \\
A \to s \to e \to t \to b
\end{aligned}
\right\} \in T,
\tag{8}
$$

where $s$, $e$, and $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

$$
\left.
\begin{aligned}
A \to s \to e \to t \to a \\
A \to s \to f \to t \to b
\end{aligned}
\right\} \in TX,
\tag{9}
$$

where $s$ is a 2-overlap individual, the overlap path is a root 2-overlap path, and $t$ is a crossover individual.

$$
\left.
\begin{aligned}
A \to s \to e \to t \to a \\
A \to d \to f \to t \to b
\end{aligned}
\right\} \in X,
\tag{10}
$$

where $t$ is a crossover individual.

$$
\left.
\begin{aligned}
A \to c \to e \to t \to a \\
A \to s \to e \to t \to b
\end{aligned}
\right\} \in XY,
\tag{11}
$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, showing the scenarios $S_0$–$S_3$.
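Since the eight cases in (7) are regular expressions over the symbols $T$, $X$, and $Y$, they can be checked mechanically. The Python sketch below assumes, purely for illustration, that the interaction between two paths has already been encoded as a string over these three symbols, read from the ancestor $A$ downward; the encoding step, the table name, and the function name are not taken from the paper.

```python
import re

# The eight cases of (7), written as Python regular expressions.
CASE_PATTERNS = [
    (1, r"T"),
    (2, r"X+"),
    (3, r"TX+"),
    (4, r"T(X+Y)+"),
    (5, r"T(X+Y)+X+"),
    (6, r"X+Y"),
    (7, r"X+(YX+)+"),
    (8, r"X+(YX+)+Y"),
]


def classify_case(pattern_string):
    """Return the case number from (7) matching the string, or None if none matches."""
    for number, pattern in CASE_PATTERNS:
        if re.fullmatch(pattern, pattern_string):
            return number
    return None
```

With this encoding, the path-pairs in (8), (9), (10), and (11) correspond to the strings `"T"`, `"TX"`, `"X"`, and `"XY"`, which fall under Cases 1, 3, 2, and 6, respectively. A string such as `"TY"` matches none of the eight patterns.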
3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.
In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple1 is an example of $S_0$. Next, we introduce a lemma which helps identify the options for the edges in the scenarios $S_1$–$S_3$.
Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.
Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at the allele level, starting from a triple-common ancestor $A$ and ending at the three individuals $a$, $b$, and $c$.
Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree. (b) Inheritance paths.
For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$.
For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is a 2-overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).
For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At the allele level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are the parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, path-triples of this kind have no contribution to $\Phi_{abc}$.
Figure 6(b) shows one example of eligible inheritancepaths corresponding to a pedigree graph Each individual isrepresented by two allele nodesThe eligible inheritance pathsin Figure 6(b) consist of red edges only
Only Case 1 Case 2 and Case 3 do not have 119884 in theregular expression representation of a path-pair (see (7))considering the scenarios 119878
1ndash1198783shown in Figure 5 an edge
can have three options Case 1 119879Case 2 119883Case 3 119879119883
3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$ along with some rules in Figure 7 to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual other than $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3$, where $T_3 = R_1 \bowtie R_2 \bowtie R_3$ over the units $u_1$, $u_2$, $u_3$ of $S_3$ (edges $e_1$, $e_2$, $e_3$). Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Considering the dependency among $\{R_1, R_2, R_3\}$, we use the natural join operator, denoted as $\bowtie$, operating on $\{R_1, R_2, R_3\}$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.
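The join over the three units can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each edge of $S_3$ carries one of the labels T, X, or TX, that a label counts as root overlap when it is T or TX (as in the Figure 7 rule), and that the units pair the edges as (e1, e2), (e2, e3), (e3, e1); the actual edge-to-unit assignment in Figure 8 may differ, and all function names are hypothetical.

```python
from itertools import product

OPTIONS = ("T", "X", "TX")   # Case 1, Case 2, Case 3 for a single edge
ROOT_OVERLAP = {"T", "TX"}   # labels treated as root overlap (rule for B2)


def acceptable_pairs():
    """R_i for one B2 unit: label pairs with at most one root-overlap edge."""
    return {
        (x, y)
        for x, y in product(OPTIONS, repeat=2)
        if not (x in ROOT_OVERLAP and y in ROOT_OVERLAP)
    }


def natural_join_s3():
    """T_3 = R_1 join R_2 join R_3, joining on the shared edges.

    Assumed unit layout: u1 = (e1, e2), u2 = (e2, e3), u3 = (e3, e1).
    """
    r = acceptable_pairs()
    return {
        (e1, e2, e3)
        for e1, e2, e3 in product(OPTIONS, repeat=3)
        if (e1, e2) in r and (e2, e3) in r and (e3, e1) in r
    }


t3 = natural_join_s3()
```

Under these assumptions, the join keeps the all-crossover assignment plus the assignments with exactly one root-overlap edge, which agrees with the rule that an acceptable path-triple has at most one root 2-overlap path.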
For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, and it shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossover in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals to a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
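The three steps can be sketched in Python on a graph stored as a set of (parent, child) edges. This is only an illustration under stated assumptions: the function name and edge-set representation are hypothetical, the new nodes are named by appending "1" and "2" to the name of $s$, and the incoming edges are reattached as $x \to s_1$ and $w \to s_2$, following the paths drawn for $G_k$ in Figure 9. Any other edges incident to $s$ are not handled.

```python
def split_crossover(edges, s, x, w, a_child, b_child):
    """Apply the splitting operator to crossover individual s.

    edges   : set of (parent, child) tuples
    x, w    : parents of s on P_Aa and P_Ab (assumption taken from Figure 9)
    a_child : child a' of s on P_Aa
    b_child : child b' of s on P_Ab
    """
    s1, s2 = s + "1", s + "2"            # step (1): split s into s1 and s2
    result = set(edges)
    result -= {(x, s), (w, s), (s, a_child), (s, b_child)}
    result |= {(x, s1), (w, s2)}         # assumed reattachment of incoming edges
    result |= {(s1, a_child), (s2, b_child)}   # step (2)
    result |= {(s2, a_child), (s1, b_child)}   # step (3)
    return result


# Small graph in the spirit of G_{k+1} in Figure 9 (intermediate nodes omitted).
g_k1 = {("A", "x"), ("A", "w"), ("x", "s"), ("w", "s"),
        ("s", "a'"), ("s", "b'"), ("A", "c")}
g_k = split_crossover(g_k1, "s", "x", "w", "a'", "b'")
```

After the call, the paths $A \to x \to s_1 \to a'$ and $A \to w \to s_2 \to b'$ no longer share the node $s$.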
Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume that $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ through two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.

Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$,
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

then we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.
Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. For $G_{k+1}$, the path-triple is $P_{Aa}$: $A \to \cdots \to x \to s \to a' \to \cdots \to a$; $P_{Ab}$: $A \to \cdots \to w \to s \to b' \to \cdots \to b$; $P_{Ac}$: $A \to c$. For $G_k$, the path-triple is $P_{Aa}$: $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$; $P_{Ab}$: $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$; $P_{Ac}$: $A \to c$. (Legend: ancestor-descendant relationship; parent-child relationship.)

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4. By the induction hypothesis, $G_k$ has a canonical graph; since the splitting operator does not change the contribution of a path-triple to $\Phi_{abc}$, this graph is also a canonical graph for $G_{k+1}$.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AA} \right), \tag{12}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \tag{13}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
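As a worked illustration of (12) and (13), the following Python sketch evaluates the contribution of a single path-triple. It assumes the path lengths and the inbreeding coefficient $F_A$ are already known; enumerating and classifying the path-triples is not shown, and the function and parameter names are hypothetical.

```python
from fractions import Fraction


def phi_AA(f_a):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + f_a)


def phi_AAA(f_a):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * f_a)


def triple_term(len_a, len_b, len_c, f_a=0, len_root_overlap=None):
    """Contribution of one acceptable path-triple to Phi_abc.

    len_root_overlap is None for Type 1 (no root 2-overlap) and
    L_{P_As} for Type 2 (one root 2-overlap path ending at s).
    """
    total = len_a + len_b + len_c
    if len_root_overlap is None:               # Type 1
        return Fraction(1, 2) ** total * phi_AAA(f_a)
    l_triple = total - len_root_overlap        # Type 2
    return Fraction(1, 2) ** (l_triple + 1) * phi_AA(f_a)
```

For example, `triple_term(1, 1, 1)` gives 1/32 for three children of a noninbred ancestor $A$ (three independent paths of length 1), and `triple_term(1, 2, 2, len_root_overlap=1)` gives 1/64 for the paths $A \to a$, $A \to s \to b$, and $A \to s \to c$.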
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ and $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$ because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
Figure 11: Building blocks $\{B_1, B_2, B_3\}$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, where $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$, all three edges belong to root overlap (i.e., having root 3-overlap).

Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$    $S_4$    $S_5$    $S_7$    $S_8$    $S_9$    $S_{10}$
$S_j$    $S_3$    $S_3$    $S_6$    $S_5$    $S_7$    $S_9$
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply the same idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases} \tag{15}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
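The case analysis in (14) and (15) can be sketched in Python for a single path-quad. The sketch assumes the path-quad has already been classified and that the lengths of its root overlap paths are supplied by the caller; the function names and the way the cases are encoded (`root2`, `root3`) are hypothetical.

```python
from fractions import Fraction


def phi(n_copies, f_a):
    """Phi_AA, Phi_AAA, Phi_AAAA for n_copies = 2, 3, 4, as given after (14)."""
    denom, mult = {2: (2, 1), 3: (4, 3), 4: (8, 7)}[n_copies]
    return Fraction(1, denom) * (1 + mult * f_a)


def quad_term(path_lengths, f_a=0, root2=(), root3=None):
    """Contribution of one acceptable path-quad to Phi_abcd.

    path_lengths : lengths of P_Aa, P_Ab, P_Ac, P_Ad
    root2        : lengths of the root 2-overlap paths (zero, one, or two)
    root3        : length of the root 3-overlap path, or None
    """
    total = sum(path_lengths)
    if not root2 and root3 is None:                    # Type 1
        return Fraction(1, 2) ** total * phi(4, f_a)
    if len(root2) == 1 and root3 is None:              # Type 2
        return Fraction(1, 2) ** (total - root2[0] + 1) * phi(3, f_a)
    if len(root2) == 2 and root3 is None:              # Type 3, Case 1
        l_quad = total - root2[0] - root2[1]
    elif not root2:                                    # Type 3, Case 2
        l_quad = total - 2 * root3
    elif len(root2) == 1:                              # Type 3, Case 3
        l_quad = total - root3 - root2[0]
    else:
        raise ValueError("not an acceptable case")
    return Fraction(1, 2) ** (l_quad + 2) * phi(2, f_a)
```

For instance, `quad_term([1, 1, 1, 1])` evaluates to 1/128 for four independent paths of length 1 from a noninbred $A$, and `quad_term([2, 2, 2, 1], root3=1)` evaluates to 1/256.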
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $P_{Aa}$: $A \to s \to a$; $P_{Ab}$: $A \to s \to b$; $P_{Ac}$: $A \to t \to c$; $P_{Ad}$: $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $P_{Aa}$: $A \to x \to a$; $P_{Ad}$: $A \to x \to d$; $P_{Ab}$: $A \to y \to b$; $P_{Ac}$: $A \to y \to c$.
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
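The homo/heter distinction in Example 8 can be reproduced with a small Python sketch. It assumes each path is given as a list of individuals from $A$ down to the target, and it treats a shared individual as an overlap individual when both paths reach it from the same parent. The function name and data layout are hypothetical.

```python
from itertools import combinations

PAIR_OF = {"a": 0, "b": 0, "c": 1, "d": 1}   # <a, b> is pair 0, <c, d> is pair 1


def classify_overlaps(paths):
    """Return (individual, kind, i, j) for every overlap individual.

    paths maps each of 'a', 'b', 'c', 'd' to its path from A, e.g. ['A', 's', 'a'].
    kind is 'homo' when both paths belong to the same pair, 'heter' otherwise.
    """
    found = []
    for i, j in combinations(sorted(paths), 2):
        p, q = paths[i], paths[j]
        kind = "homo" if PAIR_OF[i] == PAIR_OF[j] else "heter"
        for k in range(1, len(p)):
            node = p[k]
            if node in q[1:]:
                m = q.index(node)
                if p[k - 1] == q[m - 1]:     # same parent of the shared individual
                    found.append((node, kind, i, j))
    return found


# Paths of Figure 12(a) and Figure 12(b).
fig12a = {"a": ["A", "s", "a"], "b": ["A", "s", "b"],
          "c": ["A", "t", "c"], "d": ["A", "t", "d"]}
fig12b = {"a": ["A", "x", "a"], "b": ["A", "y", "b"],
          "c": ["A", "y", "c"], "d": ["A", "x", "d"]}
```

On these inputs the sketch reports $s$ and $t$ as homo-overlap individuals for (a), and $x$ and $y$ as heter-overlap individuals for (b), matching Example 8.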
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$
 | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \max\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
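The generation rule can be written as a short recursive function. The following Python sketch assumes the pedigree is available as a mapping from each individual to a tuple of its known parents (empty for a progenitor); the name `make_gen` and this data layout are hypothetical.

```python
from functools import lru_cache


def make_gen(parents):
    """Build gen() for a pedigree given as {individual: (parent, ...)}.

    Progenitors map to an empty tuple (or are absent) and get generation 0.
    """
    @lru_cache(maxsize=None)
    def gen(individual):
        known = parents.get(individual, ())
        if not known:                 # progenitor
            return 0
        # one parent: gen(p) + 1; two parents: max(gen(f), gen(m)) + 1
        return max(gen(p) for p in known) + 1

    return gen


pedigree = {"g": (), "m": (), "f": ("g",), "a": ("f", "m")}
gen = make_gen(pedigree)
```

Here `gen("f")` is 1 and `gen("a")` is 2, because the deeper of the two parents of `a` determines its generation.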
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$–$S_{16}$).
Type 1: zero root homo-overlap and zero root heter-overlap.
Type 2: zero root homo-overlap and one root heter-overlap path $P_{Ar}$ ending at $r$.

$$\text{Type 3:} \begin{cases} \text{zero root homo-overlap and two root heter-overlap paths } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ respectively}; \\ \text{one root homo-overlap path } P_{Ah} \text{ ending at } h \text{ and two root heter-overlap paths } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2, \text{ and } r_1 \neq r_2. \end{cases} \tag{17}$$
Type 4: one root homo-overlap path $P_{Ah}$ ending at $h$ and two root heter-overlap paths ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$B = \begin{cases} S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\ S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\ T & \text{otherwise}, \end{cases}$$

$$L_{2\text{-pair}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4}, \end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5},$$

$$L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}$$
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} * \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
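To make the arithmetic of (16), (18), and (19) concrete, the following Python sketch evaluates the contribution of a single 2-pair-path-pair. It assumes the type of the 2-pair-path-pair and the lengths of its root overlap paths are supplied by the caller; the function names and the `kind`/`overlaps` encoding are hypothetical, and Type 5 is omitted for brevity.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_AA(f):
    """Phi_AA = (1/2)(1 + F_A)."""
    return HALF * (1 + f)


def phi_AAA(f):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * f)


def two_pair_term(path_lengths, kind, f_a=0, overlaps=()):
    """Types 1-4 of (16): quad-common ancestor A (S = T = A).

    overlaps: () for Type 1, (L_r,) for Type 2,
              (L_r1, L_r2) for Type 3, (L_h,) for Type 4.
    """
    total = sum(path_lengths)
    if kind == 1:
        return HALF ** total * phi_AAA(f_a)
    if kind == 2:
        return HALF ** (total - overlaps[0] + 1) * phi_AAA(f_a)
    if kind == 3:
        return HALF ** (total - overlaps[0] - overlaps[1] + 2) * phi_AA(f_a)
    if kind == 4:
        return HALF ** (total - 2 * overlaps[0] + 1) * phi_AA(f_a)
    raise ValueError("kind must be 1, 2, 3, or 4")


def two_pair_term_type6(len_sa, len_sb, len_tc, len_td, f_s=0, f_t=0):
    """Type 6 of (19): no quad-common ancestor, separate ancestors S and T."""
    exponent = len_sa + len_sb + len_tc + len_td
    return HALF ** exponent * phi_AA(f_s) * phi_AA(f_t)
```

For example, `two_pair_term_type6(1, 1, 1, 1)` gives 1/64, which equals the product of the two kinship coefficients 1/8 of two unrelated pairs of half-siblings with noninbred common parents $S$ and $T$.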
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $NC(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $NC(r)$ contains only one element: the empty string.
(2) Otherwise, let $u$ be a node with $NC(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $NC(u)$, a code $x i *$ is added to $NC(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
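The two assignment rules can be sketched in Python as follows. This is an illustration under assumptions, not the implementation from [18]: the pedigree is a dictionary mapping each node to its children in sibling order, genders are encoded as the letters "M" and "F", codes are plain string concatenations of sibling index and gender, and nodes are processed in a topological (Kahn-style) order so that a node with two parents has received all of its codes before passing them on to its children.

```python
from collections import deque


def assign_nodecodes(children, gender, root):
    """Return {node: set of NodeCodes} for a pedigree DAG with a single root.

    children : {node: [child, ...]} in sibling order
    gender   : {node: 'M' or 'F'} (assumed encoding of the gender marker)
    """
    indegree = {}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: set() for u in indegree}
    codes[root].add("")                      # rule (1): NC(r) = {empty string}

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in codes[u]:               # rule (2): append index and gender
                codes[v].add(f"{x}{i}{gender[v]}")
            indegree[v] -= 1
            if indegree[v] == 0:             # all parents of v have been processed
                queue.append(v)
    return codes


pedigree = {"r": ["f", "m"], "f": ["c"], "m": ["c"], "c": []}
sexes = {"r": "M", "f": "M", "m": "F", "c": "F"}
nc = assign_nodecodes(pedigree, sexes, "r")
```

In this toy pedigree, node `c` receives two codes, one per path from `r`, which reflects that each NodeCode represents one path to the node.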
The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. The improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree; y-axis: average time (ms); series: Recursive, NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth; y-axis: average time (ms); series: Recursive, NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$–$S_3$; if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the two paths are drawn as a single path $P_{Aa}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. This means that $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:
Φ119886119886119887
= sum
119860
( sum
Type 1(1
2)
119871 tripleminus1
Φ119860119860119860
+ sum
Type 2(1
2)
119871 triple
Φ119860119860
+ sum
Type 3(1
2)
119871⟨119875119860119886119875119860119887⟩+1
Φ119860119860)
(A1)
where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap;
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.2}$$
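The three summands of (A.1) can be sketched as a small function over path lengths. This is an illustration under stated assumptions, not the authors' implementation: the function name `phi_aab_term` is hypothetical, and the values $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ are illustrative inputs (those of a non-inbred ancestor under Karigl's definitions, which are assumed here).

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_aab_term(kind, lengths, phi_AAA, phi_AA, overlap_len=0):
    """Contribution of one path tuple to Phi_aab, following (A.1) and (A.2).

    kind 1 or 2: lengths = (L_Aa1, L_Aa2, L_Ab); kind 3: lengths = (L_Aa, L_Ab).
    overlap_len is the length of the root 2-overlap path P_As (kind 2 only).
    """
    if kind == 1:
        return HALF ** (sum(lengths) - 1) * phi_AAA
    if kind == 2:
        return HALF ** (sum(lengths) - overlap_len) * phi_AA
    if kind == 3:
        return HALF ** (sum(lengths) + 1) * phi_AA
    raise ValueError("kind must be 1, 2, or 3")

# Illustrative ancestor coefficients (assumption: A is not inbred).
PHI_AA, PHI_AAA = Fraction(1, 2), Fraction(1, 4)

# Type 1: A->f->a, A->m->a, A->b, no root overlap.
t1 = phi_aab_term(1, (2, 2, 1), PHI_AAA, PHI_AA)
# Type 2: A->s->f->a, A->s->m->a, A->b, root 2-overlap A->s of length 1.
t2 = phi_aab_term(2, (3, 3, 1), PHI_AAA, PHI_AA, overlap_len=1)
# Type 3: merged path A->f->a together with A->b.
t3 = phi_aab_term(3, (2, 1), PHI_AAA, PHI_AA)
```

Exact rational arithmetic (`Fraction`) is used so that the terms can be compared against hand-derived powers of $1/2$ without rounding.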
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:
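$$\begin{aligned}\Phi_{aabc} ={}& \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\right)\\ &+ \sum_{A}\left(\sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA} + \sum_{\text{Type 5}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\right),\end{aligned} \tag{A.3}$$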
where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path;
Type 2: one root 2-overlap path $P_{As}$ ending at $s$;
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap and one root 3-overlap paths $P_{As}$ and $P_{At}$ ending at $s$ and $t$, respectively.
(A.4)

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path;
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$\begin{aligned} L_{\text{quad}} &= \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases}\\ L_{\text{triple}} &= \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4},\\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \end{aligned} \tag{A.5}$$
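Formula (A.3) together with the length rules in (A.5) lends itself to a table-driven evaluation: each type fixes an exponent offset and the order of the ancestor coefficient. The following is a sketch only; `RULES`, `phi_aabc_term`, and the sample coefficient values are hypothetical names and illustrative inputs, and the length rule is implemented exactly as stated in (A.5).

```python
from fractions import Fraction

# type -> (offset added to the length, order of the ancestor coefficient),
# read off the exponents and coefficients in (A.3)
RULES = {1: (-1, 4), 2: (0, 3), 3: (1, 2), 4: (1, 3), 5: (2, 2)}

def phi_aabc_term(kind, path_lengths, overlap_lengths, phi_of_A):
    """One summand of (A.3).

    path_lengths: four path lengths for Types 1-3, three for Types 4-5.
    overlap_lengths: lengths of the root overlap paths subtracted in (A.5).
    phi_of_A: {2: Phi_AA, 3: Phi_AAA, 4: Phi_AAAA} for the ancestor A.
    """
    if kind not in RULES:
        raise ValueError("kind must be between 1 and 5")
    expected = 4 if kind <= 3 else 3
    if len(path_lengths) != expected:
        raise ValueError(f"type {kind} needs {expected} path lengths")
    offset, order = RULES[kind]
    length = sum(path_lengths) - sum(overlap_lengths)  # L_quad or L_triple
    return Fraction(1, 2) ** (length + offset) * phi_of_A[order]

# Illustrative coefficients (assumption: a non-inbred ancestor A).
PHI_A = {2: Fraction(1, 2), 3: Fraction(1, 4), 4: Fraction(1, 8)}

type1 = phi_aabc_term(1, (2, 2, 1, 1), (), PHI_A)   # no root overlap
type4 = phi_aabc_term(4, (2, 1, 1), (), PHI_A)      # merged pair, no overlap
type5 = phi_aabc_term(5, (2, 2, 1), (1,), PHI_A)    # merged pair, one 2-overlap
```

Keeping the offsets in one table makes it easy to check each entry against (A.3) by eye.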
Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.
Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.
Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$, and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Therefore, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right), \tag{A.6}$$
where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path;
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1},\\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}$$
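To see how the factor $3/2$ in (A.6) acts on the unmerged types only, the inner sum for one common ancestor can be sketched as follows. The function `phi_aaab_for_ancestor` is a hypothetical name. The example pedigree ($A$ is the single known parent of $f$, $m$, and $b$, and $f$, $m$ are the parents of $a$) and its classification into one Type 1 triple and two Type 3 pairs are assumptions made for illustration, and the value $7/128$ agrees with Karigl's recursion $\Phi_{aaab} = \frac{1}{4}(\Phi_{ab} + 3\Phi_{fmb})$ evaluated by hand for that pedigree.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_aaab_for_ancestor(terms, phi_AAA, phi_AA):
    """Inner sum of (A.6) for one common ancestor A.

    terms: iterable of (kind, L), where L is L_triple for kinds 1 and 2
    (already reduced by L_{P_As} for kind 2) and L_pair for kind 3.
    """
    unmerged = Fraction(0)  # Types 1 and 2, scaled by 3/2 at the end
    merged = Fraction(0)    # Type 3
    for kind, length in terms:
        if kind == 1:
            unmerged += HALF ** (length - 1) * phi_AAA
        elif kind == 2:
            unmerged += HALF ** length * phi_AA
        elif kind == 3:
            merged += HALF ** (length + 2) * phi_AA
        else:
            raise ValueError("kind must be 1, 2, or 3")
    return Fraction(3, 2) * unmerged + merged

# Assumed example: paths A->f->a and A->m->a (length 2 each), A->b (length 1).
terms = [
    (1, 5),  # <A->f->a, A->m->a, A->b>, no root overlap, L_triple = 5
    (3, 3),  # merged path through f, L_pair = 3
    (3, 3),  # merged path through m, L_pair = 3
]
result = phi_aaab_for_ancestor(terms, Fraction(1, 4), Fraction(1, 2))
```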
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$. (Nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$.)
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap;
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap;
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap;
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B.1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;
(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;
(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = \frac{1}{2}\Phi_{cbc} = \left(\frac{1}{2}\right)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = \frac{1}{2}\Phi_{Abc} = \left(\frac{1}{2}\right)^{2}\Phi_{AAc} = \left(\frac{1}{2}\right)^{3}\Phi_{AAA}$.

Using the path-counting formula (12): if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = \left(\frac{1}{2}\right)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \left(\frac{1}{2}\right)^{3}\Phi_{AAA}$.
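The two base cases can be checked numerically with a small recursive evaluator. This is a sketch under assumptions rather than the paper's code: the recursions used are the standard ones recalled from Karigl [10] (the formulas (3) to (5) themselves are stated earlier in the paper, not in this section), the pedigrees mirror the relationships described for Figure 18 with single known parents, and `make_phi` is a hypothetical helper name.

```python
from fractions import Fraction
from functools import lru_cache

def make_phi(parents):
    """Build a generalized kinship evaluator for up to three individuals.

    parents: {individual: (father, mother)}, with None for an unknown parent.
    """
    def depth(x):
        known = [p for p in parents[x] if p is not None]
        return 1 + max(map(depth, known)) if known else 0

    @lru_cache(maxsize=None)
    def phi(ids):
        if len(ids) <= 1:
            return Fraction(1)
        # The deepest individual can never be an ancestor of the others.
        x = max(ids, key=lambda v: (depth(v), v))
        k = ids.count(x)
        rest = tuple(v for v in ids if v != x)
        f, m = parents[x]

        def sub(*front):
            if None in front:  # a missing parent contributes nothing
                return Fraction(0)
            return phi(tuple(sorted(front + rest)))

        if k == 1:   # Phi_abc = (Phi_fbc + Phi_mbc) / 2
            return (sub(f) + sub(m)) / 2
        if k == 2:   # Phi_aab = (Phi_ab + Phi_fmb) / 2
            return (sub(x) + sub(f, m)) / 2
        if k == 3:   # Phi_aaa = (1 + 3 Phi_fm) / 4
            return (sub(x) + 3 * sub(f, m)) / 4
        raise ValueError("at most three copies of one individual are supported")

    return lambda *ids: phi(tuple(sorted(ids)))

# Figure 18(a): c is a parent of a and b.
phi_a = make_phi({"c": (None, None), "a": ("c", None), "b": ("c", None)})
# Figure 18(b): A is the only known parent of a, b, and c.
phi_b = make_phi({"A": (None, None), "a": ("A", None),
                  "b": ("A", None), "c": ("A", None)})

recursive_a = phi_a("a", "b", "c")
path_count_a = Fraction(1, 2) ** 2 * phi_a("c", "c", "c")
recursive_b = phi_b("a", "b", "c")
path_count_b = Fraction(1, 2) ** 3 * phi_b("A", "A", "A")
```

In both pedigrees the recursive value and the Type 1 path-counting value coincide, which is the claim of the basis step.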
Induction Step. Let $n$ denote the number of edges in $G$. Assume that the claim is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis.

For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$. Based on the recursive formula (3), $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},\\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}. \end{aligned} \tag{B.2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.
Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = \frac{1}{2}\Phi_{bbc} = \left(\frac{1}{2}\right)^{2}\Phi_{bc} = \left(\frac{1}{2}\right)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = \frac{1}{2}\Phi_{bbc} = \left(\frac{1}{2}\right)^{2}\Phi_{bc} = \left(\frac{1}{2}\right)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = \left(\frac{1}{2}\right)^{2}\Phi_{ssc} = \left(\frac{1}{2}\right)^{3}\Phi_{sc} = \left(\frac{1}{2}\right)^{5}\Phi_{AA}$.

Using the path-counting formula (12): if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = \left(\frac{1}{2}\right)^{2+1}\Phi_{cc} = \left(\frac{1}{2}\right)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{4}\Phi_{AA}$ and $\Phi_{abc} = \left(\frac{1}{2}\right)^{5}\Phi_{AA}$, respectively.
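The exponents 3, 4, and 5 quoted for Figure 20 can be reproduced from path lengths alone. In the sketch below, the length tuples are inferred from the recursion steps above (the figure itself is not reproduced here), so they are assumptions; `type2_exponent` is a hypothetical helper name.

```python
def type2_exponent(l_a, l_b, l_c, l_overlap):
    """Exponent of 1/2 for a Type 2 path-triple: L + 1, with
    L = L_a + L_b + L_c - L_overlap as defined in the text."""
    return l_a + l_b + l_c - l_overlap + 1

# (L_a, L_b, L_c, L_overlap) per panel, inferred from the recursion steps:
# (a) c->b->a, c->b, empty path for c itself; overlap c->b
# (b) A->b->a, A->b, A->c; overlap A->b
# (c) A->s->a, A->s->b, A->c; overlap A->s
FIG20 = {"a": (2, 1, 0, 1), "b": (2, 1, 1, 1), "c": (2, 2, 1, 1)}

exponents = {panel: type2_exponent(*lengths) for panel, lengths in FIG20.items()}
```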
Induction Step. Let $n$ denote the number of edges in $G$. Assume that the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = \frac{1}{2}\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA},\\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}. \end{aligned} \tag{B.3}$$

For Figures 21(b) and 21(c), we take the same steps as we do to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.
Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

$$\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb}) = \frac{1}{2}(\Phi_{ab} + 0) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = \frac{1}{2}\Phi_{ab} = \frac{1}{2}\sum_{P}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A.1) is true for Case 2.3.
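The reduction of Case 2.3 to Wright's formula can be illustrated by enumerating nonoverlapping path-pairs in a small pedigree. The following is a minimal sketch, assuming a child-list encoding of the pedigree DAG and caller-supplied values of $\Phi_{AA}$ for each candidate ancestor; the function names and the sample pedigree ($A \to f \to a$, $A \to b$, with $f$ the only known parent of $a$) are hypothetical.

```python
from fractions import Fraction
from itertools import product

def paths(children, src, dst):
    """All directed paths from src to dst as node tuples (a DAG is assumed)."""
    if src == dst:
        return [(src,)]
    return [(src,) + tail
            for child in children.get(src, ())
            for tail in paths(children, child, dst)]

def wright_kinship(children, a, b, phi_self):
    """Phi_ab as a sum over nonoverlapping path-pairs.

    phi_self maps each candidate common ancestor A to its Phi_AA.
    """
    total = Fraction(0)
    for anc, phi_AA in phi_self.items():
        for pa, pb in product(paths(children, anc, a), paths(children, anc, b)):
            if set(pa) & set(pb) == {anc}:  # the paths meet only at the root
                n_edges = (len(pa) - 1) + (len(pb) - 1)
                total += Fraction(1, 2) ** n_edges * phi_AA
    return total

# Hypothetical pedigree: A -> f -> a and A -> b; Phi_AA = 1/2 is illustrative.
children = {"A": ["f", "b"], "f": ["a"]}
phi_ab = wright_kinship(children, "a", "b", {"A": Fraction(1, 2)})
phi_aab = phi_ab / 2  # (B.4): a has a single known parent

# Full siblings with two founder parents, as a sanity check on the enumeration.
sibs = {"F": ["a", "b"], "M": ["a", "b"]}
phi_sibs = wright_kinship(sibs, "a", "b",
                          {"F": Fraction(1, 2), "M": Fraction(1, 2)})
```

For the first pedigree, the Type 3 term of (A.1) gives $\left(\frac{1}{2}\right)^{3+1}\Phi_{AA} = 1/32$, which matches `phi_aab`.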
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).
Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. This means that $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. This also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\ t_b\colon A_1 \to \cdots \to b,\ t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
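The graph operations used in this induction (collecting triple-common ancestors, picking a topmost one, and removing it to obtain $G'$) can be sketched as follows. The parent-tuple encoding, the function names, and the sample pedigree are assumptions for illustration; an individual is counted as its own ancestor, which is consistent with $c$ being the triple-common ancestor in Figure 18(a).

```python
def ancestors(parents, x):
    """Set of all ancestors of x, including x itself."""
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        if v is not None and v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def triple_common_ancestors(parents, a, b, c):
    """Individuals that can reach all of a, b, and c."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)

def topmost(parents, common):
    """Members of `common` with no other member among their ancestors."""
    return {A for A in common if ancestors(parents, A) & common == {A}}

def remove_individual(parents, gone):
    """Pedigree G' obtained by deleting `gone` and its outgoing edges."""
    return {child: tuple(None if p == gone else p for p in ps)
            for child, ps in parents.items() if child != gone}

# Hypothetical pedigree: g -> p; p and q are the parents of a, b, and c.
parents = {
    "g": (None, None), "q": (None, None), "p": ("g", None),
    "a": ("p", "q"), "b": ("p", "q"), "c": ("p", "q"),
}
common = triple_common_ancestors(parents, "a", "b", "c")
tops = topmost(parents, common)
common_after = triple_common_ancestors(remove_individual(parents, "g"),
                                       "a", "b", "c")
```

Here `g` falls under case (2) of the proof, since the lower triple-common ancestor `p` is its descendant, while `q` falls under case (1).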
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap;
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap;
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap;
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap;
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap;
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap;
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap;
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap;
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap;
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap;
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap;
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap;
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formula for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.
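The six cases that concern 2-pair-path-pairs are distinguished only by the numbers of root homo-overlaps and root heter-overlaps, so they can be recorded as a lookup table. This is a sketch for bookkeeping only; the table name and the function `classify_two_pair` are hypothetical, and Cases 4.6 to 4.8 (path-triples and path-pairs) are deliberately left out because they are not keyed by these two counts.

```python
# (root homo-overlaps, root heter-overlaps) -> case label, as listed above
TWO_PAIR_CASES = {
    (0, 0): "4.1",
    (0, 1): "4.2",
    (0, 2): "4.3.1",
    (1, 2): "4.3.2",
    (1, 0): "4.4",
    (2, 0): "4.5",
}

def classify_two_pair(homo, heter):
    """Return the case label for a 2-pair-path-pair with the given counts."""
    try:
        return TWO_PAIR_CASES[(homo, heter)]
    except KeyError:
        raise ValueError(
            f"no listed case for {homo} homo-overlap(s) and {heter} heter-overlap(s)"
        ) from None

label = classify_two_pair(0, 2)
```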
Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formula for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8, accordingly.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals. (Nodes: Case 4.1, Case 4.2, Case 4.3.1, Case 4.3.2, Case 4.4, Case 4.6, Case 4.7, Case 4.8, $\Phi_{AAAA}$, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{TT}$.)
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine", httpcompmedcomcate-gorypeople-helping-peoplepage7
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, httpwwwgenomegovglossaryid=148
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
2 Computational and Mathematical Methods in Medicine
Harris [8] and promulgated by Jacquard [9]. Considering the four alleles of two individuals at a fixed autosomal locus, there are 15 possible identity states. Disregarding the distinction between maternally and paternally derived alleles, we obtain 9 condensed identity states. The probabilities associated with each condensed identity state are called condensed identity coefficients, which are useful in a diverse range of fields. These include the calculation of risk ratios for qualitative disease, the analysis of quantitative traits, and genetic counseling in medicine.
A recursive algorithm for calculating condensed identity coefficients, proposed by Karigl [10], has been known for some time. This method requires that one calculates a set of generalized kinship coefficients, from which one obtains condensed identity coefficients via a linear transformation. One limitation is that this recursive approach is not scalable when applied to very large pedigrees. It has been previously shown that the kinship coefficients for two individuals [11–13] and the generalized kinship coefficients for three individuals [14, 15] can be efficiently calculated using path-counting formulas together with path encoding schemes tailored for pedigree graphs.
Motivated by the efficiency of path-counting formulas for computing the kinship coefficient for two individuals and the generalized kinship coefficient for three individuals, we first introduce a framework for developing path-counting formulas to compute generalized kinship coefficients concerning three individuals, four individuals, and two pairs of individuals. Then we present path-counting formulas for all generalized kinship coefficients which have recursive formulas proposed by Karigl [10] and are sufficient to compute condensed identity coefficients. In summary, our ultimate goal is to use path-counting formulas for computing generalized kinship coefficients, so that the efficiency and scalability of condensed identity coefficient calculation can be improved.
The main contributions of our work are as follows:
(i) a framework to develop path-counting formulas for generalized kinship coefficients;
(ii) a set of path-counting formulas for all generalized kinship coefficients having recursive formulas [10];
(iii) experimental results demonstrating significant performance gains for calculating condensed identity coefficients based on our proposed path-counting formulas, as compared to using recursive formulas [10].
2. Materials and Methods
This section describes kinship coefficients and generalized kinship coefficients, identity coefficients, and condensed identity coefficients in more detail. Conceptual terms for the path-counting formulas for three and four individuals are introduced in Section 2.3. In addition, an overview of path-counting formula derivation is presented.
2.1. Kinship Coefficients and Generalized Kinship Coefficients. The kinship coefficient between two individuals $a$ and $b$ is the probability that a randomly chosen allele at the same locus from each is identical by descent (IBD). There are two approaches to computing the kinship coefficient $\Phi_{ab}$: the recursive approach [10] and the path-counting approach [16]. The recursive formulas [10] for $\Phi_{ab}$ and $\Phi_{aa}$ are
\[
\begin{aligned}
\Phi_{ab} &= \tfrac{1}{2}\left(\Phi_{fb} + \Phi_{mb}\right) \quad \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa} &= \tfrac{1}{2}\left(1 + \Phi_{fm}\right) = \tfrac{1}{2}\left(1 + F_{a}\right),
\end{aligned}
\tag{1}
\]
where $f$ and $m$ denote the father and the mother of $a$, respectively, and $F_{a}$ is the inbreeding coefficient of $a$.
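The recursion in (1) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the toy pedigree `PEDIGREE`, the function name `kinship`, and the convention that individuals are numbered so that parents always have smaller identifiers than their children are ours, not part of the method in [10]. A missing parent is treated as contributing a kinship of 0.

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (father, mother).
# Founders have (None, None). Ids are assumed to be assigned so that
# every parent has a smaller id than each of its children.
PEDIGREE = {
    1: (None, None),
    2: (None, None),
    3: (1, 2),
    4: (1, 2),
    5: (3, 4),  # child of the full sibs 3 and 4
}


@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient Phi_ab following the recursion in (1)."""
    if a is None or b is None:
        return 0.0  # an unknown parent is unrelated to everyone
    # Always recurse on the younger individual: with the id convention
    # above it cannot be an ancestor of the other one.
    if a < b:
        a, b = b, a
    father, mother = PEDIGREE[a]
    if a == b:
        return 0.5 * (1.0 + kinship(father, mother))
    return 0.5 * (kinship(father, b) + kinship(mother, b))


print(kinship(3, 4))  # full sibs of unrelated founders: 0.25
print(kinship(5, 5))  # 0.5 * (1 + F_5) with F_5 = 0.25: 0.625
```

Memoization via `lru_cache` keeps each pair from being recomputed, but the number of distinct pairs can still grow quickly on large pedigrees, which is the scalability concern raised for the recursive approach.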
Wright's path-counting formula [16] for $\Phi_{ab}$ is
\[
\Phi_{ab} = \sum_{A} \; \sum_{\langle P_{Aa}, P_{Ab} \rangle \in PP} \left(\frac{1}{2}\right)^{r+s+1} \left(1 + F_{A}\right),
\tag{2}
\]
where $A$ is a common ancestor of $a$ and $b$, $PP$ is a set of nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ from $A$ to $a$ and $b$, $r$ is the length of the path $P_{Aa}$, $s$ is the length of the path $P_{Ab}$, and $F_{A}$ is the inbreeding coefficient of $A$. The path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping if and only if the two paths share no common individuals, except $A$.
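As an illustration of (2), the following Python sketch enumerates path-pairs by brute force and keeps only the nonoverlapping ones. The pedigree `PARENTS` and all function names are hypothetical, and the sketch assumes that an individual counts as its own ancestor through a path of length 0, so that the formula also covers the case where one of the two individuals is an ancestor of the other. It makes no attempt at the path encoding schemes of [11–13].

```python
from itertools import product

# Hypothetical pedigree: individual -> (father, mother); founders have None.
PARENTS = {
    "g1": (None, None),
    "g2": (None, None),
    "s1": ("g1", "g2"),
    "s2": ("g1", "g2"),
    "c": ("s1", "s2"),
}


def paths_down(anc, target):
    """All parent-to-child paths from anc to target, as tuples of individuals."""
    if anc == target:
        return [(target,)]
    found = []
    for parent in PARENTS[target]:
        if parent is not None:
            found.extend(path + (target,) for path in paths_down(anc, parent))
    return found


def ancestors(x):
    """x together with all of its ancestors (zero-length path convention)."""
    seen = {x}
    for parent in PARENTS[x]:
        if parent is not None:
            seen |= ancestors(parent)
    return seen


def inbreeding(x):
    """F_x, the kinship coefficient of the parents of x."""
    father, mother = PARENTS[x]
    if father is None or mother is None:
        return 0.0
    return kinship_by_paths(father, mother)


def kinship_by_paths(a, b):
    """Phi_ab for two distinct individuals via Wright's formula (2)."""
    total = 0.0
    for anc in ancestors(a) & ancestors(b):
        for p_a, p_b in product(paths_down(anc, a), paths_down(anc, b)):
            if set(p_a) & set(p_b) == {anc}:  # nonoverlapping path-pair
                r, s = len(p_a) - 1, len(p_b) - 1
                total += 0.5 ** (r + s + 1) * (1.0 + inbreeding(anc))
    return total


print(kinship_by_paths("s1", "s2"))  # 0.25
print(kinship_by_paths("c", "s1"))   # 0.375
```

For the parent and child pair (`s1`, `c`), the path-pair through `g1` that passes through `s1` on both sides is rejected as overlapping, which is exactly the filtering that (2) requires.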
Recursive formulas proposed by Karigl [10] for generalized kinship coefficients concerning three individuals, four individuals, and two pairs of individuals are listed in (3), (4), and (5), respectively:
\[
\begin{aligned}
\Phi_{abc} &= \tfrac{1}{2}\left(\Phi_{fbc} + \Phi_{mbc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aab} &= \tfrac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aaa} &= \tfrac{1}{4}\left(1 + 3\Phi_{fm}\right) = \tfrac{1}{4}\left(1 + 3F_{a}\right),
\end{aligned}
\tag{3}
\]
\[
\begin{aligned}
\Phi_{abcd} &= \tfrac{1}{2}\left(\Phi_{fbcd} + \Phi_{mbcd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aabc} &= \tfrac{1}{2}\left(\Phi_{abc} + \Phi_{fmbc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aaab} &= \tfrac{1}{4}\left(\Phi_{ab} + 3\Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aaaa} &= \tfrac{1}{8}\left(1 + 7\Phi_{fm}\right) = \tfrac{1}{8}\left(1 + 7F_{a}\right),
\end{aligned}
\tag{4}
\]
\[
\begin{aligned}
\Phi_{ab,cd} &= \tfrac{1}{2}\left(\Phi_{fb,cd} + \Phi_{mb,cd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aa,bc} &= \tfrac{1}{2}\left(\Phi_{bc} + \Phi_{fm,bc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{ab,ac} &= \tfrac{1}{4}\left(2\Phi_{abc} + \Phi_{fb,mc} + \Phi_{mb,fc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aa,ab} &= \tfrac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa,aa} &= \tfrac{1}{4}\left(1 + 3\Phi_{fm}\right) = \tfrac{1}{4}\left(1 + 3F_{a}\right).
\end{aligned}
\tag{5}
\]
$\Phi_{abc}$ is the probability that randomly chosen alleles at the same locus from each of the three individuals (i.e., $a$, $b$, and $c$) are identical by descent (IBD). Similarly, $\Phi_{abcd}$ is the probability that randomly chosen alleles at the same locus from each of the four individuals (i.e., $a$, $b$, $c$, and $d$) are IBD. $\Phi_{ab,cd}$ is the probability that a random allele from $a$ is IBD with a random allele from $b$ and that a random allele from $c$ is IBD with a random allele from $d$ at the same locus. Note that $\Phi_{abc} = 0$ if there is no common ancestor of $a$, $b$, and $c$; $\Phi_{abcd} = 0$ if there is no common ancestor of $a$, $b$, $c$, and $d$; and $\Phi_{ab,cd} = 0$ in the absence of a common ancestor either for $a$ and $b$ or for $c$ and $d$.
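The three-individual recursion in (3) can be sketched in the same style as the two-individual case. The code below is an illustrative sketch, not the implementation evaluated in this paper: the toy pedigree, the names `phi2` and `phi3`, and the assumption that parents always carry smaller identifiers than their children are ours. Sorting the arguments in descending order puts the youngest individual first, which guarantees the "not an ancestor" side conditions of (1) and (3).

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (father, mother).
# Parents are assumed to have smaller ids than their children.
PEDIGREE = {1: (None, None), 2: (None, None), 3: (1, 2), 4: (1, 2), 5: (3, 4)}


@lru_cache(maxsize=None)
def phi2(a, b):
    """Phi_ab via (1); a missing parent contributes 0."""
    if a is None or b is None:
        return 0.0
    if a < b:
        a, b = b, a
    father, mother = PEDIGREE[a]
    if a == b:
        return 0.5 * (1.0 + phi2(father, mother))
    return 0.5 * (phi2(father, b) + phi2(mother, b))


@lru_cache(maxsize=None)
def phi3(a, b, c):
    """Phi_abc via (3); a missing parent contributes 0."""
    if None in (a, b, c):
        return 0.0
    a, b, c = sorted((a, b, c), reverse=True)  # a is now the youngest
    father, mother = PEDIGREE[a]
    if a == b == c:
        return 0.25 * (1.0 + 3.0 * phi2(father, mother))
    if a == b:
        return 0.5 * (phi2(a, c) + phi3(father, mother, c))
    return 0.5 * (phi3(father, b, c) + phi3(mother, b, c))


print(phi3(1, 3, 3))  # founder 1 and two draws from its child 3: 0.125
print(phi3(5, 5, 5))  # 0.25 * (1 + 3 * 0.25) = 0.4375
```

The value `phi3(1, 3, 3) = 0.125` can be checked by hand: the two draws from individual 3 pick the same allele with probability 1/2, that allele is the paternal one with probability 1/2, and it matches the allele drawn from founder 1 with probability 1/2.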
2.2. Identity Coefficients and Condensed Identity Coefficients. Given two individuals $a$ and $b$ with maternally and paternally derived alleles at a fixed autosomal locus, there are 15 possible identity states, and the probabilities associated with each identity state are called identity coefficients. Ignoring the distinction between maternally and paternally derived alleles, we categorize the 15 possible states into 9 condensed identity states, as shown in Figure 1. The states range from state 1, in which all four alleles are IBD, to state 9, in which none of the four alleles are IBD. The probabilities associated with each condensed identity state are called condensed identity coefficients, denoted by $\{\Delta_{i} \mid 1 \le i \le 9\}$. The condensed identity coefficients can be computed based on generalized kinship coefficients using the linear transformation shown in (6):
\[
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 & 1 \\
4 & 0 & 2 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 4 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 2 & 0 & 4 & 0 & 2 & 1 & 0 \\
16 & 0 & 4 & 0 & 4 & 0 & 2 & 1 & 0 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 & 1 \\
16 & 0 & 4 & 0 & 4 & 0 & 4 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
\Delta_{1} \\ \Delta_{2} \\ \Delta_{3} \\ \Delta_{4} \\ \Delta_{5} \\ \Delta_{6} \\ \Delta_{7} \\ \Delta_{8} \\ \Delta_{9}
\end{bmatrix}
=
\begin{bmatrix}
1 \\ 2\Phi_{aa} \\ 2\Phi_{bb} \\ 4\Phi_{ab} \\ 8\Phi_{aab} \\ 8\Phi_{abb} \\ 16\Phi_{aabb} \\ 4\Phi_{aa,bb} \\ 16\Phi_{ab,ab}
\end{bmatrix}.
\tag{6}
\]
In our work, we focus on deriving the path-counting formulas for the generalized kinship coefficients, including $\Phi_{abc}$, $\Phi_{abcd}$, and $\Phi_{ab,cd}$.
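Once the nine right-hand-side values of (6) are available, the condensed identity coefficients follow from solving a 9 by 9 linear system. The sketch below does this with exact rational arithmetic from the Python standard library; the helper name `solve` and the choice of Gauss-Jordan elimination are ours and are not prescribed by [10]. The example right-hand side corresponds to full siblings of unrelated, non-inbred parents, for which $\Phi_{aa} = \Phi_{bb} = 1/2$, $\Phi_{ab} = 1/4$, $\Phi_{aab} = \Phi_{abb} = 1/8$, $\Phi_{aabb} = 1/16$, $\Phi_{aa,bb} = 1/4$, and $\Phi_{ab,ab} = 3/32$.

```python
from fractions import Fraction

# Coefficient matrix of the linear transformation in (6).
M = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 1, 1, 1, 1, 1],
    [2, 2, 1, 1, 2, 2, 1, 1, 1],
    [4, 0, 2, 0, 2, 0, 2, 1, 0],
    [8, 0, 4, 0, 2, 0, 2, 1, 0],
    [8, 0, 2, 0, 4, 0, 2, 1, 0],
    [16, 0, 4, 0, 4, 0, 2, 1, 0],
    [4, 4, 2, 2, 2, 2, 1, 1, 1],
    [16, 0, 4, 0, 4, 0, 4, 1, 0],
]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination over the rationals."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Right-hand side of (6) for full sibs of unrelated, non-inbred parents:
# [1, 2*Phi_aa, 2*Phi_bb, 4*Phi_ab, 8*Phi_aab, 8*Phi_abb,
#  16*Phi_aabb, 4*Phi_aa,bb, 16*Phi_ab,ab]
rhs_full_sibs = [1, 1, 1, 1, 1, 1, 1, 1, Fraction(3, 2)]
delta = solve(M, rhs_full_sibs)
print(delta[6], delta[7], delta[8])  # Delta_7, Delta_8, Delta_9: 1/4 1/2 1/4
```

Because the matrix is fixed, its inverse could also be computed once and reused for every pair of individuals; the elimination above is kept only for transparency.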
2.3. Terms Defined for Path-Counting Formulas for Three and Four Individuals
(1) Triple-Common Ancestor. Given three individuals $a$, $b$, and $c$, if $A$ is a common ancestor of the three individuals, then we call $A$ a triple-common ancestor of $a$, $b$, and $c$.
(2) Quad-Common Ancestor. Given four individuals $a$, $b$, $c$, and $d$, if $A$ is a common ancestor of the four individuals, then we call $A$ a quad-common ancestor of $a$, $b$, $c$, and $d$.
(3) $P(A, a)$. It denotes the set of all possible paths from $A$ to $a$, where the paths can only traverse edges in the direction of parent to child, such that $P(A, a) \neq \mathrm{NULL}$ if and only if $A$ is an ancestor of $a$. $P_{Aa}$ denotes a particular path from $A$ to $a$, where $P_{Aa} \in P(A, a)$.
(4) Path-Pair. It consists of two paths, denoted as $\langle P_{Aa}, P_{Ab} \rangle$, where $P_{Aa} \in P(A, a)$ and $P_{Ab} \in P(A, b)$.
(5) Nonoverlapping Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$, it is nonoverlapping if and only if the two paths share no common individuals, except $A$.
(6) Path-Triple. It consists of three paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, and $P_{Ac} \in P(A, c)$.
(7) Path-Quad. It consists of four paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, $P_{Ac} \in P(A, c)$, and $P_{Ad} \in P(A, d)$.
(8) $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$. It denotes all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$.
(9) $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, except $A$.
(10) $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$, except $A$.
(11) Crossover and 2-Overlap Individual. If $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$, we call $s$ a crossover individual with respect to $P_{Aa}$ and $P_{Ab}$ if the two paths pass through different parents of $s$. On the other hand, if $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, then we call $s$ a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$.
(12) 3-Overlap Individual. If $s \in \mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ and the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$ pass through the same parent of $s$, then we call $s$ a 3-overlap individual with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$.
(13) 2-Overlap Path. If $s$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, then both $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, denoted by $p$, and the edge from $p$ to $s$ is called an overlap edge. All consecutive overlap edges constitute a path, and this path is called a 2-overlap path. If the 2-overlap path extends all the way to the ancestor $A$, we call it a root 2-overlap path.
(14) 3-Overlap Path. It consists of all 3-overlap individuals in a consecutive order. If the 3-overlap path extends all the way to the root $A$, we call it a root 3-overlap path.
Figure 1: The 15 possible identity states for individuals $a$ and $b$, grouped by their 9 condensed states ($\Delta_{1}$ to $\Delta_{9}$). Lines indicate alleles that are IBD. [Diagram not reproduced: maternal and paternal alleles of $a$ and $b$ for each state.]
Figure 2: Examples of path-pairs and path-triples. [Diagram not reproduced: a pedigree on the individuals $A$, $c$, $s$, $d$, $e$, $f$, $t$, $a$, $b$, with path-pair1 to path-pair4 and path-triple1 to path-triple6, each annotated with its crossover and overlap individuals.]
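Definitions (8), (11), and (13) translate directly into a small classification routine. The Python sketch below is a hedged illustration only: the function names and the representation of a path as a list of individuals starting at the common ancestor are our assumptions. A shared individual is labeled a 2-overlap individual when both paths reach it from the same parent and a crossover individual otherwise.

```python
def classify_shared(path_a, path_b):
    """Label every individual in Bi_C(path_a, path_b).

    Each path is a list of individuals that starts at the common ancestor A
    and follows parent-to-child edges. Returns a dict mapping each shared
    individual (A excluded) to "2-overlap" or "crossover".
    """
    assert path_a[0] == path_b[0], "both paths must start at the same ancestor"
    # For every non-root individual on a path, record the parent used to reach it.
    parent_on_a = {child: parent for parent, child in zip(path_a, path_a[1:])}
    parent_on_b = {child: parent for parent, child in zip(path_b, path_b[1:])}
    labels = {}
    for s in path_a[1:]:
        if s in parent_on_b:
            same_parent = parent_on_a[s] == parent_on_b[s]
            labels[s] = "2-overlap" if same_parent else "crossover"
    return labels


def has_root_2_overlap(path_a, path_b):
    """True if the two paths leave A along the same first edge."""
    return len(path_a) > 1 and len(path_b) > 1 and path_a[1] == path_b[1]


# Two paths that overlap all the way from A down to t.
print(classify_shared(list("Aseta"), list("Asetb")))
# Two paths that meet only at t, coming from different parents.
print(classify_shared(list("Aseta"), list("Adftb")))
```

An empty result means the path-pair is nonoverlapping in the sense of definition (5).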
Example 1. Consider the path-pairs from $A$ to $a$ and $b$ in Figure 2, where $A$ is a common ancestor of $a$ and $b$. For path-pair1, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{s, e, t\}$, and $A \to s \to e \to t$ is a root 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. For path-pair4, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{e, t\}$, where $e$ is a crossover individual, $t$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, and $e \to t$ is a 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$; it is not a root 2-overlap path, because it does not extend to $A$.
Example 2. There are four path-quads listed in Figure 3 from $A$ to four individuals $a$, $b$, $c$, and $d$, where $A$ is a quad-common ancestor of the four individuals. For path-quad2, considering the paths $P_{Aa}$ and $P_{Ab}$, the path $A \to t \to f \to s$ is a root 2-overlap path; $t$, $f$, $s$ are 2-overlap individuals with respect to $P_{Aa}$ and $P_{Ab}$. For path-quad3, $t$, $f$, $s$ are 3-overlap individuals with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, and the path $A \to t \to f \to s$ is a root 3-overlap path.
We summarize all the conceptual terms used in the path-counting formulas for two, three, and four individuals in Table 1, which gives a glimpse, at the level of terminology, of our framework for generalizing Wright's formula to three and four individuals.
Table 1: The conceptual terms used for two, three, and four individuals.
| Two individuals | Three individuals | Four individuals |
|---|---|---|
| Common ancestor | Triple-common ancestor | Quad-common ancestor |
| Path-pair | Path-triple | Path-quad |
| $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ | $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ | $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$ |
| N/A | 2-Overlap individual | 3-Overlap individual |
| N/A | 2-Overlap path | 3-Overlap path |
| N/A | Root 2-overlap path | Root 3-overlap path |
| N/A | Crossover individual | Crossover individual |
Figure 3: Examples of path-quads. [Diagram not reproduced: a pedigree on the individuals $A$, $c$, $s$, $d$, $t$, $f$, $b$, $a$, $m$. Path-quad1: $A \to t \to f \to s \to a$, $A \to m \to s \to b$, $A \to c$, $A \to d$. Path-quad2: $A \to t \to f \to s \to a$, $A \to t \to f \to s \to b$, $A \to c$, $A \to d$. Path-quad3: $A \to t \to f \to s \to a$, $A \to t \to f \to s \to b$, $A \to t \to f \to s \to c$, $A \to d$. Path-quad4: $A \to t \to f \to s \to a$, $A \to t \to m \to s \to b$, $A \to t \to m \to s \to c$, $A \to d$.]
2.4. An Overview of Path-Counting Formula Derivation. According to Wright's path-counting formula [16] (see (2)) for two individuals $a$ and $b$, the path-counting approach requires identifying common ancestors of $a$ and $b$ and calculating the contribution of each common ancestor to $\Phi_{ab}$. More specifically, for each common ancestor, denoted as $A$, we obtain all path-pairs from $A$ to $a$ and $b$ and identify acceptable path-pairs. For $\Phi_{ab}$, an acceptable path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair, where the two paths share no common individuals, except $A$. In Figure 2, path-pair2 is an acceptable path-pair, while path-pair1, path-pair3, and path-pair4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.
To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.
For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.
To identify acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering the different types of common individuals shared between the two paths. Then we introduce building blocks, which are connected graphs with conditions on every edge that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the path-pairs concerned. A path-triple can be decomposed into one or multiple building blocks. When two building blocks share a path-pair, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, taking the acceptable cases for building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
We summarize the main procedures used for deriving the path-counting formula for $\Phi_{abc}$ in the flowchart shown in Figure 4. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.
3. Results and Discussion
3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.
Figure 4: A flowchart for path-counting formula derivation. [Diagram not reproduced. Path-pair branch: generate all cases for $\langle P_{Aa}, P_{Ab} \rangle$, identify nonoverlapping path-pairs, and compute the contribution to $\Phi_{ab}$. Path-triple branch: build the path-pair level representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, decompose it into a set of building blocks, obtain the sets of acceptable cases for each building block, combine them by natural join into the acceptable cases for the path-triple, apply the split operator if a path-pair has a crossover, assign the path-triple to Type 1 or Type 2 depending on whether a path-pair has a root overlap, and compute the contribution to $\Phi_{abc}$.]
3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$ with $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$:
(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;
(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;
(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping, but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.
Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab} \rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab} \rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$. The completeness of the eight cases shown in (7) can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab} \rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).
\[
\begin{aligned}
&\text{Case 1: } T, &\quad &\text{Case 2: } X^{+}, \\
&\text{Case 3: } TX^{+}, &\quad &\text{Case 4: } T(X^{+}Y)^{+}, \\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+}, &\quad &\text{Case 6: } X^{+}Y, \\
&\text{Case 7: } X^{+}(YX^{+})^{+}, &\quad &\text{Case 8: } X^{+}(YX^{+})^{+}Y.
\end{aligned}
\tag{7}
\]
\[
\langle A \to s \to e \to t \to a,\; A \to s \to e \to t \to b \rangle \in T,
\tag{8}
\]
where $s$, $e$, $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path;
\[
\langle A \to s \to e \to t \to a,\; A \to s \to f \to t \to b \rangle \in TX,
\tag{9}
\]
where $s$ is a 2-overlap individual, the overlap path is a root 2-overlap path, and $t$ is a crossover individual;
\[
\langle A \to s \to e \to t \to a,\; A \to d \to f \to t \to b \rangle \in X,
\tag{10}
\]
where $t$ is a crossover individual;
\[
\langle A \to c \to e \to t \to a,\; A \to s \to e \to t \to b \rangle \in XY,
\tag{11}
\]
where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. [Diagram not reproduced: the four scenarios $S_{0}$ to $S_{3}$ on the nodes $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$.]
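Since the eight cases in (7) are regular expressions over the alphabet $\{T, X, Y\}$, a pattern string can be assigned to its case mechanically. The sketch below uses Python's standard `re` module; the dictionary `CASES` and the function `classify_case` are hypothetical names, and the sketch assumes that a path-pair has already been summarized as a string such as `"TX"` or `"XY"`, read from the ancestor downward.

```python
import re

# The eight cases of (7), written as Python regular expressions.
CASES = {
    1: r"T",
    2: r"X+",
    3: r"TX+",
    4: r"T(X+Y)+",
    5: r"T(X+Y)+X+",
    6: r"X+Y",
    7: r"X+(YX+)+",
    8: r"X+(YX+)+Y",
}


def classify_case(pattern):
    """Return the case number in (7) that matches the whole pattern, or None."""
    for number, regex in CASES.items():
        if re.fullmatch(regex, pattern):
            return number
    return None


for example in ("T", "TX", "X", "XY"):  # the patterns of (8), (9), (10), (11)
    print(example, "-> Case", classify_case(example))
```

A string such as `"TY"` matches none of the cases, which reflects that a non-root overlap must be preceded by a crossover.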
3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual), except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios, $S_{0}$–$S_{3}$, shown in Figure 5.
In Figure 5, the scenario $S_{0}$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple1 is an example of $S_{0}$. Next, we introduce a lemma which helps identify the options for the edges in the scenarios $S_{1}$–$S_{3}$.
Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.
Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele-level, starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.
For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_{3} \to p_{6} \to p_{7} \to b$; $P_{Ac}$: $A \to p_{4} \to p_{6} \to p_{7} \to c$.
For $\langle P_{Ab}, P_{Ac} \rangle$, $p_{6}$ is a crossover individual, $p_{7}$ is an overlap individual, and $p_{6} \to p_{7}$ is a 2-overlap edge represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).
For the individual $p_{6}$, let us denote the two alleles at one fixed autosomal locus as $g_{1}$ and $g_{2}$. At allele-level, only one allele can be passed down from $p_{6}$ to $p_{7}$. Since $p_{3}$ and $p_{4}$ are parents of $p_{6}$, $g_{1}$ is passed down from one parent and $g_{2}$ is passed down from the other parent. It is infeasible to pass down both $g_{1}$ and $g_{2}$ from $p_{6}$ to $p_{7}$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6, $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.
Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.
Figure 6: Examples of pedigree and inheritance paths: (a) pedigree; (b) inheritance paths. [Diagram not reproduced: the individuals $A$, $a$, $b$, $c$, and $p_{1}$ to $p_{8}$.]
Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_{1}$–$S_{3}$ shown in Figure 5, an edge can therefore have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.
3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_{1}$–$S_{3}$ in Figure 5, we define two building blocks $\{B_{1}, B_{2}\}$, along with some rules, in Figure 7 to generate acceptable cases. For $B_{1}$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_{2}$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual, except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.
Figure 7: Building blocks $\{B_{1}, B_{2}\}$ and basic rules. [Diagram not reproduced: $B_{1}$ on the nodes $P_{Aa}$ and $P_{Ab}$; $B_{2}$ on the nodes $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. For $B_{1}$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_{2}$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).]
Next, we focus on generating all acceptable cases for the scenarios $S_{1}$–$S_{3}$ in Figure 5, where only $S_{3}$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_{3}$ into $S_{3} = \{u_{1} = B_{2}, u_{2} = B_{2}, u_{3} = B_{2}\}$, shown in Figure 8. For each $u_{i}$, we have a set of acceptable path-triples, denoted as $R_{i}$.
Figure 8: A graphical illustration for obtaining $T_{3}$. [Diagram not reproduced: $S_{3}$ with edges $e_{1}$, $e_{2}$, $e_{3}$ is split into $u_{1}$, $u_{2}$, $u_{3}$, and $T_{3} = R_{1} \bowtie R_{2} \bowtie R_{3}$, where $R_{i}$ denotes all acceptable path-triples for $u_{i}$.]
Considering the dependency among $R_1$, $R_2$, and $R_3$, we use the natural join operator, denoted as $\bowtie$, operating on $R_1$, $R_2$, and $R_3$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.
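The join $T_3 = R_1 \bowtie R_2 \bowtie R_3$ can be illustrated with a small Python sketch. It assumes that each $u_i$ is a $B_2$ block over two of the three edges of $S_3$ (here $u_1 = (e_1, e_2)$, $u_2 = (e_2, e_3)$, $u_3 = (e_1, e_3)$, which is a guess at the pairing in Figure 8) and that a case is a mapping from edge names to one of the options $T$, $X$, $TX$. The helper names are hypothetical.

```python
from itertools import product

OPTIONS = ("T", "X", "TX")  # options for a single edge (building block B1)
ROOT = {"T", "TX"}          # options that involve a root overlap


def b2_cases(edge_a, edge_b):
    """Acceptable cases for a B2 block: at most one edge is a root overlap."""
    return [
        {edge_a: x, edge_b: y}
        for x, y in product(OPTIONS, repeat=2)
        if not (x in ROOT and y in ROOT)
    ]


def natural_join(left, right):
    """Relational natural join on two lists of rows (dicts keyed by edge name)."""
    joined = []
    for l_row, r_row in product(left, right):
        shared = l_row.keys() & r_row.keys()
        if all(l_row[k] == r_row[k] for k in shared):
            joined.append({**l_row, **r_row})
    return joined


# Assumed pairing of edges into the three B2 blocks of S3.
R1 = b2_cases("e1", "e2")
R2 = b2_cases("e2", "e3")
R3 = b2_cases("e1", "e3")
T3 = natural_join(natural_join(R1, R2), R3)
print(len(T3))
```

Under these assumptions every row of `T3` has at most one root-overlap edge, with all remaining edges being crossover ($X$).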
For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but no 2-overlap path.
3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution of this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
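The three steps above can be sketched on an edge-set representation of the pedigree graph. This is a minimal illustration, not the authors' implementation: the function name, the `_1`/`_2` naming of the split nodes, and the redirection of the incoming edges ($x \to s_1$ and $w \to s_2$, read off Figure 9) are assumptions, and any other edges touching $s$ are simply dropped.

```python
def split_crossover(edges, s, a_parent, b_parent, a_child, b_child):
    """Apply the splitting operator to the crossover individual s.

    edges:    set of (parent, child) tuples describing the pedigree graph.
    a_parent: predecessor of s on P_Aa (x in Figure 9).
    b_parent: predecessor of s on P_Ab (w in Figure 9).
    a_child:  successor of s on P_Aa (a' in the text).
    b_child:  successor of s on P_Ab (b' in the text).
    """
    s1, s2 = f"{s}_1", f"{s}_2"                      # step (1): split s
    new_edges = {e for e in edges if s not in e}     # drop every edge touching s
    new_edges |= {(a_parent, s1), (b_parent, s2)}    # incoming edges (assumed from Figure 9)
    new_edges |= {(s1, a_child), (s2, b_child)}      # step (2): s->a', s->b' become s1->a', s2->b'
    new_edges |= {(s2, a_child), (s1, b_child)}      # step (3): two new edges
    return new_edges


# A small graph shaped like G_{k+1} in Figure 9 (node names are illustrative).
g = {
    ("A", "x"), ("A", "w"), ("x", "s"), ("w", "s"),
    ("s", "a1"), ("s", "b1"), ("a1", "a"), ("b1", "b"), ("A", "c"),
}
g_split = split_crossover(g, "s", "x", "w", "a1", "b1")
print(sorted(g_split))
```

After the call, the path to $a$ runs through `s_1` and the path to $b$ runs through `s_2`, so the two paths no longer share the individual $s$.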
Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ via two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.
Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has no crossover individuals.
Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$, and
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

then we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.
Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.
Figure 9: Transforming pedigree graph $G_{k+1}$, having $k + 1$ crossovers, into $G_k$, having $k$ crossovers. For $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ = $A \to \cdots \to x \to s \to a' \to \cdots \to a$; $A \to \cdots \to w \to s \to b' \to \cdots \to b$; $A \to c$. For $G_k$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ = $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$; $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$; $A \to c$. (Legend: ancestor-descendant relationship; parent-child relationship.)

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AA} \right), \tag{12}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \tag{13}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
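To make the arithmetic in (12) and (13) concrete, the following sketch evaluates the contribution of a single path-triple. It assumes the path lengths and the type of the triple are already known (enumerating the acceptable path-triples is not shown); the function names and the use of exact fractions are illustrative choices rather than part of the original method.

```python
from fractions import Fraction


def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)


def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)


def triple_term(len_a, len_b, len_c, F_A=Fraction(0), root_overlap_len=None):
    """Contribution of one acceptable path-triple to Phi_abc, following (12)-(13).

    root_overlap_len is None for Type 1 (no root 2-overlap path); otherwise it
    is the length of the root 2-overlap path P_As (Type 2).
    """
    total = len_a + len_b + len_c
    if root_overlap_len is None:                       # Type 1
        return Fraction(1, 2) ** total * phi_AAA(F_A)
    l_triple = total - root_overlap_len                # Type 2, from (13)
    return Fraction(1, 2) ** (l_triple + 1) * phi_AA(F_A)


type1 = triple_term(1, 1, 1)                           # three length-1 paths, F_A = 0
type2 = triple_term(1, 2, 2, root_overlap_len=1)       # one root 2-overlap of length 1
print(type1, type2)
```

Summing such terms over all acceptable path-triples and all triple-common ancestors $A$ gives $\Phi_{abc}$.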
3.2. Path-Counting Formulas for Four Individuals
3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \ne \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 11, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
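As a small illustration of $T_2 = R_1 \times R_2$, the sketch below enumerates the cases for two disconnected $B_1$ blocks. The edge names `e1` and `e2` and the dictionary encoding of a case are assumptions made for the example.

```python
from itertools import product

B1_OPTIONS = ("T", "X", "TX")  # the three options for the edge of a B1 block

# Acceptable cases for each disconnected block (edge names are illustrative).
R1 = [{"e1": option} for option in B1_OPTIONS]
R2 = [{"e2": option} for option in B1_OPTIONS]

# Cartesian product: every case of u1 combines with every case of u2.
T2 = [{**r1, **r2} for r1, r2 in product(R1, R2)]
print(len(T2))
```

Because the two blocks share no edge, no combination is filtered out, in contrast to the natural join used for connected blocks.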
Figure 11: Building blocks $\{B_1, B_2, B_3\}$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, all three edges belong to root overlap (i.e., having root 3-overlap), with $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \ne \emptyset$.
Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$).

$S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$
$S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \ne 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.
For a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \ast L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases} \tag{15}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
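The case analysis in (14) and (15) can be sketched as a single function that evaluates the contribution of one path-quad. This is an illustrative reading of the formulas, assuming the caller already knows the four path lengths and the lengths of any root 2-overlap or root 3-overlap paths; the function and parameter names are hypothetical.

```python
from fractions import Fraction

# Phi_AA, Phi_AAA, Phi_AAAA indexed by the number of copies of A.
PHI = {
    2: lambda F: Fraction(1, 2) * (1 + F),
    3: lambda F: Fraction(1, 4) * (1 + 3 * F),
    4: lambda F: Fraction(1, 8) * (1 + 7 * F),
}


def quad_term(path_lengths, F_A=0, two_overlaps=(), three_overlap=None):
    """Contribution of one acceptable path-quad to Phi_abcd, following (14)-(15).

    path_lengths:  lengths of P_Aa, P_Ab, P_Ac, P_Ad.
    two_overlaps:  lengths of the root 2-overlap paths (zero, one, or two entries).
    three_overlap: length of the root 3-overlap path P_At, or None.
    """
    total = sum(path_lengths)
    if three_overlap is not None:
        if len(two_overlaps) > 1:
            raise ValueError("with a root 3-overlap, at most one root 2-overlap")
        if two_overlaps:                                   # Type 3, Case 3
            l_quad = total - three_overlap - two_overlaps[0]
        else:                                              # Type 3, Case 2
            l_quad = total - 2 * three_overlap
        extra, copies = 2, 2
    else:
        if len(two_overlaps) > 2:
            raise ValueError("at most two root 2-overlap paths")
        l_quad = total - sum(two_overlaps)                 # Type 1, Type 2, or Type 3 Case 1
        extra = len(two_overlaps)
        copies = 4 - extra
    return Fraction(1, 2) ** (l_quad + extra) * PHI[copies](F_A)


t1 = quad_term((1, 1, 1, 1))                       # Type 1
t2 = quad_term((2, 2, 1, 1), two_overlaps=(1,))    # Type 2
t3 = quad_term((2, 2, 2, 1), three_overlap=1)      # Type 3, Case 2
print(t1, t2, t3)
```

The two `ValueError` checks mirror the properties of acceptable cases listed at the end of Section 3.2.2.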
Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ = $A \to s \to a$; $A \to s \to b$; $A \to t \to c$; $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ = $A \to x \to a$; $A \to y \to b$; $A \to y \to c$; $A \to x \to d$.
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions
(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$. (Refer to Section 3.1.1 for definitions of $T$, $X$, and $TX$.) Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 = h$
 | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
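The generation function defined above translates directly into a short recursive computation. The sketch below assumes the pedigree is given as a dictionary from each individual to a tuple of its known parents (empty for progenitors); the name `make_gen` and this encoding are illustrative.

```python
from functools import lru_cache


def make_gen(parents):
    """Build gen() for a pedigree given as {individual: (parent, ...)}.

    Progenitors map to an empty tuple (or are absent from the dictionary).
    """
    @lru_cache(maxsize=None)
    def gen(individual):
        known = parents.get(individual, ())
        if not known:                                # progenitor: generation 0
            return 0
        return max(gen(p) for p in known) + 1        # one or two parents
    return gen


pedigree = {
    "p1": (), "p2": (),
    "c": ("p1", "p2"),   # two parents, both progenitors
    "g": ("c",),         # only one known parent
    "h": ("g", "p1"),    # parents from different generations
}
gen = make_gen(pedigree)
print(gen("c"), gen("g"), gen("h"))
```

Memoization keeps the computation linear in the number of parent-child edges; for very deep pedigrees an iterative topological traversal would avoid Python's recursion limit.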
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$.

For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).

Type 1: zero root homo-overlap and zero root heter-overlap.
Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.
Type 3: either of the following two cases (17):
zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively;
one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 = r_2$.
Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \ne T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \ne S$ and $x \ne T$.

$$B = \begin{cases} S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\ S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\ T & \text{otherwise}, \end{cases}$$

$$L_{\text{2-pair}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \ast L_{P_{Ah}} & \text{for Type 4}, \end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \qquad L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}$$

Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$–$S_{16}$).
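The rule for choosing $B$ in the Type 5 term of (16) can be written as a small helper. The sketch assumes the generations and the number of known parents of $S$ and $T$ have already been computed and are passed in as dictionaries; the function name `choose_B` is hypothetical.

```python
def choose_B(S, T, gen, num_parents):
    """Select B for the Type 5 term of (16).

    gen:         dict mapping an individual to its generation number.
    num_parents: dict mapping an individual to its number of known parents.
    """
    if gen[S] < gen[T]:
        return S
    if gen[S] == gen[T] and num_parents[T] == 2:
        return S
    return T


generations = {"S": 1, "T": 2, "U": 1, "V": 1}
parent_counts = {"S": 2, "T": 2, "U": 1, "V": 2}
print(choose_B("S", "T", generations, parent_counts))  # S is in the earlier generation
```

Only the three-branch rule from the text is encoded here; how $\Phi_{BB}$ is then obtained from the inbreeding coefficient of $B$ follows $\Phi_{AA} = (1/2)(1 + F_A)$.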
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \ast \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows:

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element: the empty string.
(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in $\mathrm{NC}(u)$, a code $xi{\ast}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $\ast$ indicates the gender of the individual represented by node $v_i$.
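The labeling rules above can be sketched as a breadth-first propagation of codes. This is a simplified illustration rather than the NodeCodes implementation of [18]: the concrete string encoding (sibling index followed by a one-letter gender tag), the function name, and the dictionary inputs are assumptions.

```python
from collections import defaultdict, deque


def assign_nodecodes(root, children, gender):
    """Assign NodeCodes by propagating codes breadth-first from the progenitor.

    children: dict mapping a node to the list of its children in sibling order.
    gender:   dict mapping a node to a one-letter gender tag (e.g. 'M' or 'F').
    Returns a dict mapping each reachable node to its set of codes.
    """
    codes = defaultdict(set)
    codes[root].add("")                       # rule (1): NC(r) = {empty string}
    queue = deque([(root, "")])
    while queue:
        u, x = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            code = f"{x}{i}{gender[v]}"       # rule (2): append sibling index and gender
            codes[v].add(code)
            queue.append((v, code))           # each code is one path from the root
    return dict(codes)


# A virtual root r with two progenitors f and m, who have one child c.
kids = {"r": ["f", "m"], "f": ["c"], "m": ["c"]}
sex = {"f": "M", "m": "F", "c": "M"}
nc = assign_nodecodes("r", kids, sex)
print(nc["c"])
```

Each code in $\mathrm{NC}(c)$ corresponds to one path from the root, so a node with several ancestral paths, such as `c` here, receives several codes.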
Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.
In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew as the pedigree size increased, from 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. The improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement. (Axes: average time (ms) versus individuals in pedigree; series: Recursive and NodeCodes.)

Figure 15: The effect of depth on computation efficiency improvement. (Axes: average time (ms) versus depth; series: Recursive and NodeCodes.)

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients which are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficients computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$–$S_3$; if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the pair is drawn as a single path $P_{Aa}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16 only 1198781can
have a mergeable path-pair ⟨1198751198601198861 1198751198601198862⟩ by Lemma A2 Now
we present our path-counting formula forΦ119886119886119887
where 119886 is notan ancestor of 119887
Φ119886119886119887
= sum
119860
( sum
Type 1(1
2)
119871 tripleminus1
Φ119860119860119860
+ sum
Type 2(1
2)
119871 triple
Φ119860119860
+ sum
Type 3(1
2)
119871⟨119875119860119886119875119860119887⟩+1
Φ119860119860)
(A1)
where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

$$
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}
\end{cases} \\
L_{\langle P_{Aa}, P_{Ab}\rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}
\end{aligned}
\tag{A2}
$$
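The three terms of (A1) can be evaluated per path-triple or path-pair as in the following minimal sketch. The function name `contribution_aab`, the toy pedigree, and the default coefficients $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ (a non-inbred founder $A$) are assumptions made here for illustration; the sketch also assumes that each unordered pair of distinct paths to $a$ is counted once.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def contribution_aab(kind, path_lengths, overlap_length=0,
                     phi_AA=HALF, phi_AAA=Fraction(1, 4)):
    """Contribution of one path-triple (Types 1, 2) or path-pair (Type 3) to Phi_aab.

    path_lengths: edge counts of the paths involved.
    overlap_length: length of the root 2-overlap path P_As (Type 2 only).
    phi_AA, phi_AAA: coefficients of the ancestor A; the defaults assume a
    non-inbred founder (an assumption of this sketch).
    """
    if kind == 1:
        return HALF ** (sum(path_lengths) - 1) * phi_AAA
    if kind == 2:
        return HALF ** (sum(path_lengths) - overlap_length) * phi_AA
    if kind == 3:
        return HALF ** (sum(path_lengths) + 1) * phi_AA
    raise ValueError("kind must be 1, 2, or 3")


# Hypothetical pedigree: A is the only known parent of f, m, and b;
# a is the child of f and m.
total = (
    contribution_aab(1, [2, 2, 1])   # <A-f-a, A-m-a, A-b>, no root overlap
    + contribution_aab(3, [2, 1])    # merged pair <A-f-a, A-b>
    + contribution_aab(3, [2, 1])    # merged pair <A-m-a, A-b>
)
print(total)  # 5/64
```

For this toy pedigree the result agrees with the recursive value $\frac{1}{2}(\Phi_{ab} + \Phi_{fmb}) = \frac{1}{2}(1/8 + 1/32) = 5/64$.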
For the sake of completeness: if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1a_2b}$.

A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:
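$$
\begin{aligned}
\Phi_{aabc} = {} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \right) \\
& + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \right)
\end{aligned}
\tag{A3}
$$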
where $A$ is a common ancestor of $a$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$, ending at $s$ and $t$, respectively. (A4)

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$
\begin{aligned}
L_{\text{quad}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}
\end{cases} \\
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4} \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}
\end{cases}
\end{aligned}
\tag{A5}
$$
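Because every case in (A5) has the same shape (total path length minus the lengths of the root overlap paths), the five types of (A3) can be handled by one table-driven routine. The following is a sketch only: the names `RULES`, `effective_length`, and `contribution_aabc` are hypothetical, and the founder coefficients used in the example ($\Phi_{AA} = 1/2$, $\Phi_{AAA} = 1/4$, $\Phi_{AAAA} = 1/8$) assume a non-inbred founder $A$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# (exponent offset, coefficient of A) for each type in (A3).
# Types 1-3 use L_quad; Types 4-5 use L_triple of the condensed path-triple.
RULES = {
    1: (-1, "AAAA"),
    2: (0, "AAA"),
    3: (+1, "AA"),
    4: (+1, "AAA"),
    5: (+2, "AA"),
}


def effective_length(path_lengths, overlap_lengths=()):
    """L_quad or L_triple: total path length minus the root overlap path lengths."""
    return sum(path_lengths) - sum(overlap_lengths)


def contribution_aabc(kind, path_lengths, overlap_lengths, phi_A):
    """Contribution of one path-quad (Types 1-3) or path-triple (Types 4-5) to Phi_aabc."""
    offset, key = RULES[kind]
    exponent = effective_length(path_lengths, overlap_lengths) + offset
    return HALF ** exponent * phi_A[key]


# Assumed coefficients of a non-inbred founder A.
founder = {"AA": Fraction(1, 2), "AAA": Fraction(1, 4), "AAAA": Fraction(1, 8)}

# Type 3, Case 1: four paths with two root 2-overlap paths of length 1 each.
print(contribution_aabc(3, [3, 3, 2, 2], [1, 1], founder))  # 1/1024
```

Keeping the offsets in a table makes it easy to compare the code against (A3) term by term.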
Note that if $a$ is an ancestor of either $b$ or $c$, or of both, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1a_2bc}$.

A.3. Path-Counting Formula for $\Phi_{aaab}$. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, a special case arises when the path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. When a mergeable path-triple exists, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Hence, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
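$$
\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \right) \tag{A6}
$$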
where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1} \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}
\end{cases} \\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}
\end{aligned}
\tag{A7}
$$
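As a sanity check on (A6), consider a hypothetical pedigree (not taken from the paper) in which a non-inbred founder $A$ is the only known parent of $f$, $m$, and $b$, and $a$ is the child of $f$ and $m$. Assuming $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ for such a founder, and counting the unordered pair of distinct paths to $a$ once, the terms work out as follows:

```latex
% Type 1: <A-f-a, A-m-a, A-b>, L_triple = 2 + 2 + 1 = 5
% Type 3: <A-f-a, A-b> and <A-m-a, A-b>, each with L_pair = 2 + 1 = 3
\begin{align*}
\Phi_{aaab}
  &= \frac{3}{2}\left(\frac{1}{2}\right)^{5-1}\Phi_{AAA}
     + 2\left(\frac{1}{2}\right)^{3+2}\Phi_{AA} \\
  &= \frac{3}{2}\cdot\frac{1}{16}\cdot\frac{1}{4}
     + 2\cdot\frac{1}{32}\cdot\frac{1}{2}
   = \frac{3}{128} + \frac{4}{128} = \frac{7}{128}.
\end{align*}
```

The same value follows from the recursion $\Phi_{aaab} = \frac{1}{4}(\Phi_{ab} + 3\Phi_{fmb})$, assumed here in Karigl's form, with $\Phi_{ab} = 1/8$ and $\Phi_{fmb} = 1/32$ for this pedigree.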
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1a_2a_3b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1a_2a_3b}$.

Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$.
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;
(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;
(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2\Phi_{AAc} = (1/2)^3\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc} = (1/2)^2\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA} = (1/2)^3\Phi_{AAA}$.
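The agreement in the basis step for Figure 18(b) can be reproduced with a small recursive evaluator. This is a sketch under stated assumptions, not the implementation evaluated in the paper: the pedigree dictionary `PARENTS`, the integer identifiers, and the function `phi` are hypothetical, a missing parent contributes 0, and the recursion always expands the youngest individual (largest identifier), using the forms $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$, $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$, $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$, and their two-individual analogues.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy pedigree for Figure 18(b): child -> (father, mother).
# None marks an unknown parent; every parent has a smaller ID than its children.
PARENTS = {
    0: (None, None),  # A, a founder
    1: (0, None),     # a
    2: (0, None),     # b
    3: (0, None),     # c
}


@lru_cache(maxsize=None)
def phi(*ids):
    """Generalized kinship coefficient for two or three individuals (recursive form)."""
    if None in ids:
        return Fraction(0)          # a missing parent contributes nothing
    ids = tuple(sorted(ids, reverse=True))  # youngest individual first
    a = ids[0]
    f, m = PARENTS[a]
    k = ids.count(a)
    rest = ids[k:]
    if k == 1:                      # Phi_ab or Phi_abc
        return (phi(f, *rest) + phi(m, *rest)) / 2
    if k == 2 and len(ids) == 2:    # Phi_aa
        return (1 + phi(f, m)) / 2
    if k == 2:                      # Phi_aab
        return (phi(a, *rest) + phi(f, m, *rest)) / 2
    return (1 + 3 * phi(f, m)) / 4  # Phi_aaa


recursive = phi(1, 2, 3)
path_counting = Fraction(1, 2) ** 3 * phi(0, 0, 0)  # (1/2)^3 * Phi_AAA
print(recursive, path_counting)  # 1/32 1/32
```

Sorting the identifiers in descending order is one simple way to make sure the expanded individual is never an ancestor of the others.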
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^*$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^*$ satisfies the condition for the induction hypothesis.

For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$. Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AAA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}.
\end{aligned}
\tag{B2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.
Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^3\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^4\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2\Phi_{ssc} = (1/2)^3\Phi_{sc} = (1/2)^5\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle} + 1} \Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^3\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4\Phi_{AA}$ and $\Phi_{abc} = (1/2)^5\Phi_{AA}$, respectively.
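The Type 2 exponents for the three scenarios of Figure 20 can be recomputed mechanically from the paths. In the sketch below, paths are node tuples starting at the triple-common ancestor; the concrete paths are read off the recursive derivations above (an assumption about the figure topology), and the helper names `common_prefix_edges` and `type2_exponent` are hypothetical.

```python
from itertools import combinations


def common_prefix_edges(p, q):
    """Number of leading edges shared by two paths given as node tuples."""
    n = 0
    for e1, e2 in zip(zip(p, p[1:]), zip(q, q[1:])):
        if e1 != e2:
            break
        n += 1
    return n


def type2_exponent(pa, pb, pc):
    """L_triple + 1 for a path-triple with a single root 2-overlap path."""
    total = sum(len(p) - 1 for p in (pa, pb, pc))
    overlap = max(common_prefix_edges(p, q)
                  for p, q in combinations((pa, pb, pc), 2))
    return total - overlap + 1


# Paths assumed for Figure 20(a), (b), and (c), respectively.
print(type2_exponent(("c", "b", "a"), ("c", "b"), ("c",)))          # 3
print(type2_exponent(("A", "b", "a"), ("A", "b"), ("A", "c")))      # 4
print(type2_exponent(("A", "s", "a"), ("A", "s", "b"), ("A", "c")))  # 5
```

The printed exponents match the powers of $1/2$ obtained for Figures 20(a), 20(b), and 20(c).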
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^*$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^*$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1 + 1} \Phi_{AA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} + 1} \Phi_{AA}.
\end{aligned}
\tag{B3}
$$

For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Since $a$ only has one parent $f$,

$$
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B4}
$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle} + 1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, it shows that the path-counting formula (A1) is true for Case 2.3.
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1a_2b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven above. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$, which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we can obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a: A_1 \to \cdots \to a,\ t_b: A_1 \to \cdots \to b,\ t_c: A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
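The exclusion rule used in case (2) can be sketched as a simple filter. The code assumes paths are node tuples starting at $A_1$; the function names `passes_through` and `eligible_triples` and the example node labels are hypothetical.

```python
def passes_through(path, node):
    """True if `node` lies on `path` strictly below the starting ancestor."""
    return node in path[1:]


def eligible_triples(triples, lower_ancestors):
    """Keep the path-triples from A_1 whose three paths do not all pass
    through one and the same lower triple-common ancestor."""
    return [
        t for t in triples
        if not any(all(passes_through(p, anc) for p in t)
                   for anc in lower_ancestors)
    ]


# All three paths of t1 go through the lower ancestor Aj, so t1 is excluded;
# in t2 the path to a avoids Aj, so t2 is kept.
t1 = (("A1", "Aj", "a"), ("A1", "Aj", "b"), ("A1", "Aj", "c"))
t2 = (("A1", "x", "a"), ("A1", "Aj", "b"), ("A1", "Aj", "c"))
print(eligible_triples([t1, t2], ["Aj"]) == [t2])  # True
```

Triples such as `t1` are already accounted for in the contribution of $A_j$, which is why removing them keeps the contributions of different ancestors from being counted twice.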
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
Figure 23: Dependency graph for different cases for four individuals.
Cases for $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individuals," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Computational and Mathematical Methods in Medicine 3
$$
\begin{aligned}
\Phi_{ab,cd} &= \tfrac{1}{2}\left(\Phi_{fb,cd} + \Phi_{mb,cd}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c \text{ or } d, \\
\Phi_{aa,bc} &= \tfrac{1}{2}\left(\Phi_{bc} + \Phi_{fm,bc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{ab,ac} &= \tfrac{1}{4}\left(2\Phi_{abc} + \Phi_{fb,mc} + \Phi_{mb,fc}\right) && \text{if } a \text{ is not an ancestor of } b \text{ or } c, \\
\Phi_{aa,ab} &= \tfrac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) && \text{if } a \text{ is not an ancestor of } b, \\
\Phi_{aa,aa} &= \tfrac{1}{4}\left(1 + 3\Phi_{fm}\right) = \tfrac{1}{4}\left(1 + 3F_{a}\right).
\end{aligned}
\tag{5}
$$
$\Phi_{abc}$ is the probability that randomly chosen alleles at the same locus from each of the three individuals (i.e., $a$, $b$, and $c$) are identical by descent (IBD). Similarly, $\Phi_{abcd}$ is the probability that randomly chosen alleles at the same locus from each of the four individuals (i.e., $a$, $b$, $c$, and $d$) are IBD. $\Phi_{ab,cd}$ is the probability that a random allele from $a$ is IBD with a random allele from $b$ and that a random allele from $c$ is IBD with a random allele from $d$ at the same locus. Note that $\Phi_{abc} = 0$ if there is no common ancestor of $a$, $b$, and $c$; $\Phi_{abcd} = 0$ if there is no common ancestor of $a$, $b$, $c$, and $d$; and $\Phi_{ab,cd} = 0$ in the absence of a common ancestor either for $a$ and $b$ or for $c$ and $d$.
2.2. Identity Coefficients and Condensed Identity Coefficients. Given two individuals $a$ and $b$ with maternally and paternally derived alleles at a fixed autosomal locus, there are 15 possible identity states, and the probabilities associated with each identity state are called identity coefficients. Ignoring the distinction between maternally and paternally derived alleles, we group the 15 possible states into 9 condensed identity states, as shown in Figure 1. The states range from state 1, in which all four alleles are IBD, to state 9, in which none of the four alleles are IBD. The probabilities associated with each condensed identity state are called condensed identity coefficients, denoted by $\{\Delta_i \mid 1 \le i \le 9\}$. The condensed identity coefficients can be computed from generalized kinship coefficients using the linear transformation shown in (6):
$$
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 2 & 2 & 1 & 1 & 1 & 1 & 1 \\
2 & 2 & 1 & 1 & 2 & 2 & 1 & 1 & 1 \\
4 & 0 & 2 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 4 & 0 & 2 & 0 & 2 & 1 & 0 \\
8 & 0 & 2 & 0 & 4 & 0 & 2 & 1 & 0 \\
16 & 0 & 4 & 0 & 4 & 0 & 2 & 1 & 0 \\
4 & 4 & 2 & 2 & 2 & 2 & 1 & 1 & 1 \\
16 & 0 & 4 & 0 & 4 & 0 & 4 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
\Delta_1 \\ \Delta_2 \\ \Delta_3 \\ \Delta_4 \\ \Delta_5 \\ \Delta_6 \\ \Delta_7 \\ \Delta_8 \\ \Delta_9
\end{bmatrix}
=
\begin{bmatrix}
1 \\ 2\Phi_{aa} \\ 2\Phi_{bb} \\ 4\Phi_{ab} \\ 8\Phi_{aab} \\ 8\Phi_{abb} \\ 16\Phi_{aabb} \\ 4\Phi_{aa,bb} \\ 16\Phi_{ab,ab}
\end{bmatrix}
\tag{6}
$$
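As a minimal sketch of how (6) can be used in practice, the linear system can be solved exactly with rational arithmetic once the generalized kinship coefficients are known. The function names `solve` and `condensed_identity`, and the dictionary keys used for the coefficients, are our own illustrative choices and do not come from the paper.

```python
from fractions import Fraction

# Coefficient matrix of (6); row i multiplies (Delta_1, ..., Delta_9).
M = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 2, 2, 1, 1, 1, 1, 1],
    [2, 2, 1, 1, 2, 2, 1, 1, 1],
    [4, 0, 2, 0, 2, 0, 2, 1, 0],
    [8, 0, 4, 0, 2, 0, 2, 1, 0],
    [8, 0, 2, 0, 4, 0, 2, 1, 0],
    [16, 0, 4, 0, 4, 0, 2, 1, 0],
    [4, 4, 2, 2, 2, 2, 1, 1, 1],
    [16, 0, 4, 0, 4, 0, 4, 1, 0],
]


def solve(matrix, rhs):
    """Solve matrix * x = rhs exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Pick any row at or below the diagonal with a nonzero entry.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def condensed_identity(phi):
    """Return [Delta_1, ..., Delta_9] from generalized kinship coefficients.

    phi is a dict with the (hypothetical) keys
    'aa', 'bb', 'ab', 'aab', 'abb', 'aabb', 'aa,bb', 'ab,ab'.
    """
    rhs = [1, 2 * phi["aa"], 2 * phi["bb"], 4 * phi["ab"],
           8 * phi["aab"], 8 * phi["abb"], 16 * phi["aabb"],
           4 * phi["aa,bb"], 16 * phi["ab,ab"]]
    return solve(M, rhs)
```

Exact fractions are used instead of floating point so that coefficients such as 1/4 or 3/32 are recovered without rounding error.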
In our work, we focus on deriving the path-counting formulas for the generalized kinship coefficients, including $\Phi_{abc}$, $\Phi_{abcd}$, and $\Phi_{ab,cd}$.
2.3. Terms Defined for Path-Counting Formulas for Three and Four Individuals

(1) Triple-Common Ancestor. Given three individuals $a$, $b$, and $c$, if $A$ is a common ancestor of the three individuals, then we call $A$ a triple-common ancestor of $a$, $b$, and $c$.

(2) Quad-Common Ancestor. Given four individuals $a$, $b$, $c$, and $d$, if $A$ is a common ancestor of the four individuals, then we call $A$ a quad-common ancestor of $a$, $b$, $c$, and $d$.

(3) $P(A, a)$. It denotes the set of all possible paths from $A$ to $a$, where the paths can only traverse edges in the direction of parent to child, such that $P(A, a) \neq \mathrm{NULL}$ if and only if $A$ is an ancestor of $a$. $P_{Aa}$ denotes a particular path from $A$ to $a$, where $P_{Aa} \in P(A, a)$.

(4) Path-Pair. It consists of two paths, denoted as $\langle P_{Aa}, P_{Ab} \rangle$, where $P_{Aa} \in P(A, a)$ and $P_{Ab} \in P(A, b)$.

(5) Nonoverlapping Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$, it is nonoverlapping if and only if the two paths share no common individuals, except $A$.

(6) Path-Triple. It consists of three paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, and $P_{Ac} \in P(A, c)$.

(7) Path-Quad. It consists of four paths, denoted as $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, where $P_{Aa} \in P(A, a)$, $P_{Ab} \in P(A, b)$, $P_{Ac} \in P(A, c)$, and $P_{Ad} \in P(A, d)$.

(8) $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$. It denotes all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$.

(9) $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, except $A$.

(10) $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$. It denotes all common individuals shared among $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$, except $A$.
(11) Crossover and 2-Overlap Individual. If $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$, we call $s$ a crossover individual with respect to $P_{Aa}$ and $P_{Ab}$ if the two paths pass through different parents of $s$. On the other hand, if $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, then we call $s$ a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$.

(12) 3-Overlap Individual. If $s \in \mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ and the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$ pass through the same parent of $s$, then we call $s$ a 3-overlap individual with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$.
(13) 2-Overlap Path. If $s$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, then both $P_{Aa}$ and $P_{Ab}$ pass through the same parent of $s$, denoted by $p$, and the edge from $p$ to $s$ is called an overlap edge. All consecutive overlap edges constitute a path, and this path is called a 2-overlap path. If the 2-overlap path extends all the way to the ancestor $A$, we call it a root 2-overlap path.

(14) 3-Overlap Path. It consists of all 3-overlap individuals in a consecutive order. If the 3-overlap path extends all the way to the root $A$, we call it a root 3-overlap path.

Figure 1: The 15 possible identity states for individuals $a$ and $b$, grouped by their 9 condensed states ($\Delta_1$ to $\Delta_9$). Lines indicate alleles that are IBD.

Figure 2: Examples of path-pairs and path-triples (a pedigree over the individuals $A$, $c$, $s$, $d$, $e$, $f$, $t$, $a$, $b$, with path-pair1 to path-pair4 and path-triple1 to path-triple6).
Example 1. Consider the path-pairs from $A$ to $a$ and $b$ in Figure 2, where $A$ is a common ancestor of $a$ and $b$. For path-pair1, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{s, e, t\}$ and $A \to s \to e \to t$ is a root 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. For path-pair4, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{e, t\}$, where $e$ is a crossover individual, $t$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, and $e \to t$ is a 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$ (it is not a root 2-overlap path, because it does not extend to $A$).
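The terms defined above can be made concrete with a short sketch. The pedigree below is hypothetical: its edges are read off the paths that appear in (8)-(11), and it should not be taken as a faithful copy of Figure 2. The function names (`paths`, `bi_c`, `classify`, `root_overlap`) are ours.

```python
# Hypothetical pedigree: child -> tuple of known parents (founders omitted).
PARENTS = {
    "s": ("A",), "d": ("A",), "c": ("A",),
    "e": ("s", "c"), "f": ("s", "d"),
    "t": ("e", "f"),
    "a": ("t",), "b": ("t", "d"),
}


def paths(parents, anc, x):
    """P(anc, x): every parent-to-child path from anc down to x, as tuples."""
    if x == anc:
        return [(anc,)]
    return [p + (x,)
            for par in parents.get(x, ())
            for p in paths(parents, anc, par)]


def bi_c(p, q):
    """Bi_C(p, q): individuals shared by both paths, excluding the ancestor."""
    return [v for v in p[1:] if v in q[1:]]


def classify(p, q):
    """Label each shared individual as 'crossover' or '2-overlap'."""
    pred_p = dict(zip(p[1:], p))  # child -> parent used by path p
    pred_q = dict(zip(q[1:], q))  # child -> parent used by path q
    return {v: "2-overlap" if pred_p[v] == pred_q[v] else "crossover"
            for v in bi_c(p, q)}


def root_overlap(p, q):
    """Common prefix of p and q; it is a root 2-overlap path if it has an edge."""
    prefix = [p[0]]
    for u, v in zip(p[1:], q[1:]):
        if u != v:
            break
        prefix.append(u)
    return tuple(prefix) if len(prefix) > 1 else None
```

For instance, `classify(("A", "c", "e", "t", "a"), ("A", "s", "e", "t", "b"))` labels `e` as a crossover individual and `t` as a 2-overlap individual, and `root_overlap` returns `None` for that pair because the overlap does not start at `A`.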
Example 2. There are four path-quads listed in Figure 3 from $A$ to four individuals $a$, $b$, $c$, and $d$, where $A$ is a quad-common ancestor of the four individuals. For path-quad2, considering the paths $P_{Aa}$ and $P_{Ab}$, the path $A \to t \to f \to s$ is a root 2-overlap path; $t$, $f$, $s$ are 2-overlap individuals with respect to $P_{Aa}$ and $P_{Ab}$. For path-quad3, $t$, $f$, $s$ are 3-overlap individuals with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, and the path $A \to t \to f \to s$ is a root 3-overlap path.
Table 1 summarizes the conceptual terms used in the path-counting formulas for two, three, and four individuals; from the terminology side, it gives a glimpse of our framework for generalizing Wright's formula to three and four individuals.
Figure 3: Examples of path-quads (a pedigree over the individuals $A$, $c$, $s$, $d$, $t$, $f$, $m$, $a$, $b$).
Path-quad1: $A \to t \to f \to s \to a$; $A \to m \to s \to b$; $A \to c$; $A \to d$.
Path-quad2: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to c$; $A \to d$.
Path-quad3: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to t \to f \to s \to c$; $A \to d$.
Path-quad4: $A \to t \to f \to s \to a$; $A \to t \to m \to s \to b$; $A \to t \to m \to s \to c$; $A \to d$.

Table 1: The conceptual terms used for two, three, and four individuals.

Two individuals | Three individuals | Four individuals
----------------|-------------------|-----------------
Common ancestor | Triple-common ancestor | Quad-common ancestor
Path-pair | Path-triple | Path-quad
$\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ | $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ | $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$
NA | 2-Overlap individual | 3-Overlap individual
NA | 2-Overlap path | 3-Overlap path
NA | Root 2-overlap path | Root 3-overlap path
NA | Crossover individual | Crossover individual

2.4. An Overview of Path-Counting Formula Derivation. According to Wright's path-counting formula [16] (see (2)) for two individuals $a$ and $b$, the path-counting approach requires identifying common ancestors of $a$ and $b$ and calculating the contribution of each common ancestor to $\Phi_{ab}$. More specifically, for each common ancestor, denoted as $A$, we obtain all path-pairs from $A$ to $a$ and $b$ and identify acceptable path-pairs. For $\Phi_{ab}$, an acceptable path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair, where the two paths share no common individuals except $A$. In Figure 2, path-pair2 is an acceptable path-pair, while path-pair1, path-pair3, and path-pair4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.
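The two-individual procedure just described can be sketched as follows. The sketch assumes that (2) has the standard form of Wright's formula, $\Phi_{ab} = \sum_{A} \sum (1/2)^{L_{P_{Aa}} + L_{P_{Ab}} + 1}(1 + F_A)$ over nonoverlapping path-pairs, and that $a \neq b$. The pedigree encoding (child mapped to a tuple of known parents) and all function names are illustrative assumptions, and the exhaustive path enumeration is meant for small examples only.

```python
from fractions import Fraction


def paths(parents, anc, x):
    """All parent-to-child paths from anc to x, as tuples of individuals."""
    if x == anc:
        return [(anc,)]
    return [p + (x,)
            for par in parents.get(x, ())
            for p in paths(parents, anc, par)]


def ancestors(parents, x):
    """Ancestors of x, including x itself (reached by a path of length 0)."""
    seen, stack = {x}, [x]
    while stack:
        for par in parents.get(stack.pop(), ()):
            if par not in seen:
                seen.add(par)
                stack.append(par)
    return seen


def inbreeding(parents, x):
    """F_x: kinship of the two parents of x, or 0 if a parent is unknown."""
    pars = parents.get(x, ())
    return kinship(parents, *pars) if len(pars) == 2 else Fraction(0)


def kinship(parents, a, b):
    """Phi_ab for distinct a and b, summed over nonoverlapping path-pairs."""
    total = Fraction(0)
    for anc in ancestors(parents, a) & ancestors(parents, b):
        weight = 1 + inbreeding(parents, anc)
        for pa in paths(parents, anc, a):
            for pb in paths(parents, anc, b):
                if set(pa) & set(pb) == {anc}:  # the paths share only anc
                    edges = (len(pa) - 1) + (len(pb) - 1)
                    total += Fraction(1, 2) ** (edges + 1) * weight
    return total
```

With `parents = {"a": ("f", "m"), "b": ("f", "m")}`, each parent contributes one nonoverlapping path-pair of total length 2, so `kinship(parents, "a", "b")` sums two terms of 1/8.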
To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.
For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three path-pairs: $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.
To identify acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering the different types of common individuals shared between the two paths. Then we introduce building blocks, which are connected graphs with conditions on every edge, each encapsulating a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the path-pairs concerned. A path-triple can be decomposed into one or multiple building blocks. When two building blocks share a path-pair, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, taking the acceptable cases for building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
The flowchart in Figure 4 summarizes the main procedures used for deriving the path-counting formula for $\Phi_{abc}$. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.
3. Results and Discussion
3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.

Figure 4: A flowchart for path-counting formula derivation. (Path-pair branch: generate all cases for $\langle P_{Aa}, P_{Ab} \rangle$; identify nonoverlapping path-pairs and compute their contribution to $\Phi_{ab}$; identify acceptable cases for $\langle P_{Aa}, P_{Ab} \rangle$ in the context of a path-triple. Path-triple branch: path-pair level representation, decomposition into a set of building blocks, sets of acceptable cases for each building block, natural join, acceptable cases for the path-triple; if a path-pair has a crossover, apply the split operator; if a path-pair has a root overlap, the path-triple belongs to Type 2, otherwise to Type 1; compute its contribution to $\Phi_{abc}$.)
3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$ with $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$, except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$:
(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;

(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;

(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.
Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab} \rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab} \rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$. The completeness of the eight cases shown in (7) can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab} \rangle$. Using the pedigree in Figure 2, Case 1, Case 3, Case 2, and Case 6 are illustrated in (8), (9), (10), and (11), respectively.
$$
\begin{aligned}
&\text{Case 1: } T \\
&\text{Case 2: } X^{+} \\
&\text{Case 3: } TX^{+} \\
&\text{Case 4: } T(X^{+}Y)^{+} \\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+} \\
&\text{Case 6: } X^{+}Y \\
&\text{Case 7: } X^{+}(YX^{+})^{+} \\
&\text{Case 8: } X^{+}(YX^{+})^{+}Y
\end{aligned}
\tag{7}
$$
$$
\begin{matrix} A \to s \to e \to t \to a \\ A \to s \to e \to t \to b \end{matrix} \;\in\; T, \tag{8}
$$

where $s$, $e$, $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

$$
\begin{matrix} A \to s \to e \to t \to a \\ A \to s \to f \to t \to b \end{matrix} \;\in\; TX, \tag{9}
$$

where $s$ is a 2-overlap individual and the overlap path is a root 2-overlap path; $t$ is a crossover individual.

$$
\begin{matrix} A \to s \to e \to t \to a \\ A \to d \to f \to t \to b \end{matrix} \;\in\; X, \tag{10}
$$

where $t$ is a crossover individual.

$$
\begin{matrix} A \to c \to e \to t \to a \\ A \to s \to e \to t \to b \end{matrix} \;\in\; XY, \tag{11}
$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.

Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ (scenarios $S_0$ to $S_3$ over the nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$).
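The case analysis in (7) can be sketched by encoding a path-pair as a string over $T$, $X$, $Y$ and matching it against the eight regular expressions. The encoding below is our own reading of the patterns and is an assumption: one `T` for the whole root 2-overlap path, one `X` per crossover individual, and one `Y` per maximal non-root 2-overlap path. The names `CASES`, `pattern`, and `case_of` are hypothetical.

```python
import re

# Regular expressions of (7), keyed by case number.
CASES = {
    1: r"T", 2: r"X+", 3: r"TX+", 4: r"T(X+Y)+", 5: r"T(X+Y)+X+",
    6: r"X+Y", 7: r"X+(YX+)+", 8: r"X+(YX+)+Y",
}


def pattern(p, q):
    """Encode the interaction of paths p and q (both starting at A)."""
    pred_p = dict(zip(p[1:], p))  # child -> parent used by path p
    pred_q = dict(zip(q[1:], q))  # child -> parent used by path q
    symbols = []
    in_root = {p[0]}  # individuals lying on the root 2-overlap path
    for v in p[1:]:
        if v not in pred_q:
            continue  # v is not shared by the two paths
        if pred_p[v] != pred_q[v]:
            symbols.append("X")  # crossover individual
        elif pred_p[v] in in_root:
            in_root.add(v)
            if not symbols:
                symbols.append("T")  # root 2-overlap path, emitted once
        elif symbols[-1] != "Y":
            symbols.append("Y")  # non-root 2-overlap path, once per run
    return "".join(symbols)


def case_of(p, q):
    """Return the case number of (7), or None for a nonoverlapping pair."""
    encoded = pattern(p, q)
    for number, regex in CASES.items():
        if re.fullmatch(regex, encoded):
            return number
    return None
```

Applied to the two paths of (11), `pattern` yields `"XY"`, which `case_of` maps to Case 6.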
3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual), except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios, $S_0$–$S_3$, shown in Figure 5.
In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple1 is an example of $S_0$. Next, we introduce a lemma which assists with identifying the options for the edges in the scenarios $S_1$–$S_3$.
Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.
Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele-level starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.
Figure 6: Examples of pedigree and inheritance paths over the individuals $A$, $a$, $b$, $c$, and $p_1$ to $p_8$. (a) Pedigree. (b) Inheritance paths.
For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$.

For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele-level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.
Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.
Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Therefore, considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.
3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks, $\{B_1, B_2\}$, along with some rules in Figure 7 to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual, except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$ (nodes $P_{Aa}$, $P_{Ab}$), the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$ (nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$), there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3$: the scenario $S_3$ with edges $e_1$, $e_2$, $e_3$ is decomposed into $u_1$, $u_2$, $u_3$, and $T_3 = R_1 \bowtie R_2 \bowtie R_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.
Considering the dependency among $\{R_1, R_2, R_3\}$, we use the natural join operator, denoted as $\bowtie$, operating on $\{R_1, R_2, R_3\}$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.
For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap, and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but no 2-overlap path other than a root 2-overlap path.
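The natural-join construction for the scenario $S_3$ can be sketched with small in-memory relations. The sketch makes two assumptions that are ours rather than the paper's: each $u_i$ is taken to cover two of the three edges $e_1$, $e_2$, $e_3$, and an edge "belongs to root 2-overlap" when its pattern is $T$ or $TX$ (the rule stated for $B_2$). The names `building_block_b2` and `natural_join` are hypothetical.

```python
from itertools import product

OPTIONS = ("T", "X", "TX")  # Cases 1-3: the only patterns without Y
ROOT = {"T", "TX"}          # patterns that start with a root 2-overlap


def building_block_b2(edge1, edge2):
    """Relation of acceptable assignments for a B2 block over two edges."""
    return [{edge1: x, edge2: y}
            for x, y in product(OPTIONS, repeat=2)
            if not (x in ROOT and y in ROOT)]  # at most one root overlap


def natural_join(r, s):
    """Natural join: merge rows that agree on every shared attribute."""
    joined = []
    for row_r in r:
        for row_s in s:
            shared = row_r.keys() & row_s.keys()
            if all(row_r[k] == row_s[k] for k in shared):
                joined.append({**row_r, **row_s})
    return joined


r1 = building_block_b2("e1", "e2")
r2 = building_block_b2("e2", "e3")
r3 = building_block_b2("e1", "e3")
t3 = natural_join(natural_join(r1, r2), r3)  # T3 = R1 join R2 join R3
```

Every row of `t3` assigns a pattern to each of the three edges, and the join leaves only assignments with at most one root-overlap edge, in line with options (1) and (2) above.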
3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:
(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.
Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.
Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has zero crossover individuals.
Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, suppose there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$;
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$.

Then we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.
Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.
Computational and Mathematical Methods in Medicine 9
Figure 9: Transforming pedigree graph $G_{k+1}$ having $k+1$ crossovers into $G_k$ having $k$ crossovers. Edges denote either an ancestor-descendant relationship or a parent-child relationship. In $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of $A \to \cdots \to x \to s \to a' \to \cdots \to a$, $A \to \cdots \to w \to s \to b' \to \cdots \to b$, and $A \to c$. In $G_k$, $s$ is split into $s_1$ and $s_2$, so that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$, $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$, and $A \to c$.
Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$ to $S_{10}$ over the paths $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k+1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4. By the induction hypothesis, $G_k$ has a canonical graph, which then serves as a canonical graph for $G_{k+1}$.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AA} \right), \tag{12}$$

where $\Phi_{AA} = \frac{1}{2}(1 + F_A)$ and $\Phi_{AAA} = \frac{1}{4}(1 + 3F_A)$; $F_A$ is the inbreeding coefficient of $A$; and $A$ is a triple-common ancestor of $a$, $b$, and $c$. Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap. Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2,} \end{cases} \tag{13}$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
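As an illustration of (12) and (13), the contribution of a single acceptable path-triple can be sketched as follows. This is a minimal sketch rather than the authors' implementation: the function names (`phi_AA`, `phi_AAA`, `triple_contribution`) and the convention that `overlap_len=None` marks a Type 1 path-triple are assumptions made for this example, and the enumeration of acceptable path-triples is not shown.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A), with F_A the inbreeding coefficient of A."""
    return HALF * (1 + Fraction(F_A))

def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * Fraction(F_A))

def triple_contribution(len_a, len_b, len_c, F_A=0, overlap_len=None):
    """Contribution of one path-triple to Phi_abc, following (12) and (13).

    overlap_len is the length of the root 2-overlap path P_As (Type 2),
    or None when there is no root 2-overlap (Type 1). Hypothetical helper.
    """
    total = len_a + len_b + len_c
    if overlap_len is None:                      # Type 1
        return HALF ** total * phi_AAA(F_A)
    L_triple = total - overlap_len               # Type 2
    return HALF ** (L_triple + 1) * phi_AA(F_A)

print(triple_contribution(1, 1, 1))                 # 1/32
print(triple_contribution(2, 2, 1, overlap_len=1))  # 1/64
```

Summing such terms over all acceptable path-triples and all triple-common ancestors $A$ gives $\Phi_{abc}$; exact rational arithmetic is used here only to keep the example easy to check by hand.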
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios, $S_0$ to $S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks, $B_1$, $B_2$, and $B_3$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that for $B_3$, if $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
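The construction $T_2 = R_1 \times R_2$ can be sketched in a few lines. The case labels below are hypothetical placeholders, since the actual acceptable cases of $B_1$ come from the rules in Figure 7; only the Cartesian-product step is illustrated.

```python
from itertools import product

# Hypothetical placeholders for the acceptable cases of the two
# disconnected building blocks u1 and u2 of scenario S2.
R1 = ["u1_case_1", "u1_case_2"]
R2 = ["u2_case_1", "u2_case_2", "u2_case_3"]

# Because u1 and u2 share no edges, every case of u1 can be combined
# with every case of u2 without any conflict check.
T2 = list(product(R1, R2))

print(len(T2))  # 6
```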
Figure 11: Building blocks $B_1$, $B_2$, and $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. $B_3$ joins $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$ with $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$; for $B_3$, all three edges belong to root overlap (i.e., having root 3-overlap).
Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

| $S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$ |
|---|---|---|---|---|---|---|
| $S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$ |
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no $S_k$ exists such that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.
For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply the same idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = \frac{1}{2}(1 + F_A)$, $\Phi_{AAA} = \frac{1}{4}(1 + 3F_A)$, and $\Phi_{AAAA} = \frac{1}{8}(1 + 7F_A)$; $F_A$ is the inbreeding coefficient of $A$; and $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{At}} & \text{for Case 2} \in \text{Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3,} \end{cases} \tag{15}$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
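The case analysis in (14) and (15) can be sketched as a single function that returns the contribution of one acceptable path-quad. This is only an illustrative sketch: the function name `quad_contribution` and its parameters (`two_overlaps` for the lengths of root 2-overlap paths, `three_overlap` for the length of a root 3-overlap path) are assumptions introduced here, and classifying a path-quad into its type is assumed to have been done beforehand.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def quad_contribution(lengths, F_A=0, two_overlaps=(), three_overlap=None):
    """Contribution of one path-quad to Phi_abcd, following (14) and (15).

    lengths: lengths of P_Aa, P_Ab, P_Ac, P_Ad.
    two_overlaps: lengths of the root 2-overlap paths (at most two).
    three_overlap: length of the root 3-overlap path, or None.
    """
    F = Fraction(F_A)
    total = sum(lengths)
    if not two_overlaps and three_overlap is None:          # Type 1
        return HALF ** total * (1 + 7 * F) / 8              # Phi_AAAA
    if len(two_overlaps) == 1 and three_overlap is None:    # Type 2
        L_quad = total - two_overlaps[0]
        return HALF ** (L_quad + 1) * (1 + 3 * F) / 4       # Phi_AAA
    # Type 3, Cases 1 to 3
    if len(two_overlaps) == 2 and three_overlap is None:    # Case 1
        L_quad = total - sum(two_overlaps)
    elif not two_overlaps:                                  # Case 2
        L_quad = total - 2 * three_overlap
    elif len(two_overlaps) == 1:                            # Case 3
        L_quad = total - three_overlap - two_overlaps[0]
    else:
        raise ValueError("not an acceptable case")
    return HALF ** (L_quad + 2) * (1 + F) / 2               # Phi_AA

print(quad_contribution([1, 1, 1, 1]))                     # 1/128
print(quad_contribution([2, 2, 1, 1], two_overlaps=(1,)))  # 1/256
print(quad_contribution([2, 2, 2, 1], three_overlap=1))    # 1/256
```

The `ValueError` branch reflects the acceptability properties stated in Section 3.2.2: a root 3-overlap path may be accompanied by at most one root 2-overlap path.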
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with $A \to x \to a$, $A \to x \to d$, $A \to y \to b$, and $A \to y \to c$.
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathrm{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathrm{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge are those defined in Section 3.1.1. Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

| $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ |
|---|---|
| Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap |
| One root 2-overlap path | One root homo-overlap and zero root heter-overlap |
| | Zero root homo-overlap and one root heter-overlap |
| Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap |
| | Zero root homo-overlap and two root heter-overlaps |
| One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$ |
| One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$ |
| | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$ |

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \max\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
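The generation function defined above can be sketched as a memoized recursion over the parent relation. The input format (a dictionary mapping each individual to a tuple of its known parents) and the function name `generations` are assumptions made for this example.

```python
def generations(parents):
    """Compute gen(x) for every individual.

    parents: dict mapping an individual to a tuple of its known parents
    (empty for a progenitor, one or two entries otherwise).
    """
    memo = {}

    def gen(x):
        if x not in memo:
            ps = parents.get(x, ())
            # A progenitor has generation 0; otherwise take the deepest
            # parent and add one (this also covers the one-parent case).
            memo[x] = 0 if not ps else max(gen(p) for p in ps) + 1
        return memo[x]

    for individual in parents:
        gen(individual)
    return memo

# Hypothetical toy pedigree: p1 and p2 are progenitors.
toy = {"p1": (), "p2": (), "f": ("p1",), "a": ("f", "p2")}
print(generations(toy))  # {'p1': 0, 'p2': 0, 'f': 1, 'a': 2}
```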
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$ to $S_{16}$ over the paths $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$).
Type 1: zero root homo-overlap and zero root heter-overlap.
Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.
Type 3: either (i) zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively; or (ii) one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, with $r_1 \neq r_2$. (17)
Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, with $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$B = \begin{cases} S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\ S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents,} \\ T & \text{otherwise.} \end{cases}$$
$$L_{\text{2-pair}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{Ah}} & \text{for Type 4,} \end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5,}$$

$$L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5.} \tag{18}$$
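For the Type 5 term of (16), the choice of $B$ and the resulting contribution can be sketched as below. This is a minimal sketch under stated assumptions: `choose_B` and `type5_contribution` are hypothetical helper names, `gen` and `num_parents` are assumed to be precomputed dictionaries, and $\Phi_{BB}$ is taken to be $\frac{1}{2}(1 + F_B)$ by analogy with $\Phi_{AA}$.

```python
from fractions import Fraction

def choose_B(S, T, gen, num_parents):
    """Select B from S and T following the case definition of B above."""
    if gen[S] < gen[T]:
        return S
    if gen[S] == gen[T] and num_parents[T] == 2:
        return S
    return T

def type5_contribution(len_Sa, len_Sb, len_Tc, len_Td, F_B=0):
    """One Type 5 term of (16): (1/2)^(L_S + L_T + 1) * Phi_BB."""
    exponent = (len_Sa + len_Sb) + (len_Tc + len_Td) + 1
    phi_BB = Fraction(1, 2) * (1 + Fraction(F_B))  # assumed form of Phi_BB
    return Fraction(1, 2) ** exponent * phi_BB

print(choose_B("S", "T", {"S": 1, "T": 2}, {"S": 2, "T": 2}))  # S
print(type5_contribution(1, 1, 1, 1))                          # 1/64
```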
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with in-degree 0). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph, and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $NC(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows:

(1) If $u$ is $r$, then $NC(r)$ contains only one element, the empty string.
(2) Otherwise, let $u$ be a node with $NC(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in $NC(u)$, a code $x i {*}$ is added to $NC(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
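The two assignment rules above can be sketched as follows. This is a hedged illustration, not the encoding used in [18]: the input dictionaries `children` and `gender`, the function name `assign_nodecodes`, and the `.` separator appended after each step are assumptions made for readability. The traversal expands a node only after all of its parents have been expanded (tracked with in-degree counts), so that $NC(u)$ is complete before its codes are propagated to the children.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from root.

    children: dict node -> list of children in sibling order.
    gender:   dict node -> 'M' or 'F' (not needed for the root).
    """
    indegree = {u: 0 for u in children}
    for kids in children.values():
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    nc = {u: set() for u in indegree}
    nc[root].add("")                      # rule (1): the empty string
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in nc[u]:               # rule (2): extend each code of u
                nc[v].add(f"{x}{i}{gender[v]}.")
            indegree[v] -= 1
            if indegree[v] == 0:          # all parents of v are done
                queue.append(v)
    return nc

# Hypothetical toy pedigree: r has children f and m, whose child is c.
children = {"r": ["f", "m"], "f": ["c"], "m": ["c"], "c": []}
gender = {"f": "M", "m": "F", "c": "M"}
codes = assign_nodecodes(children, gender, "r")
print(sorted(codes["c"]))  # ['0M.0M.', '1F.0M.']
```

In this toy example, node `c` receives two codes, one for each path from `r` to `c`, which matches the description of a NodeCode as a label for a path from an ancestor.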
The computations of kinship coefficients for two individuals and of generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.
In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see that the improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement. (Average time in ms, from 0 to 300, versus the number of individuals in the pedigree: 77, 181, 383, 769, 1558, 3105, 6174, 12351, 24667, 49761, 98328, and 195197; series: Recursive and NodeCodes.)

Figure 15: The effect of depth on computation efficiency improvement. (Average time in ms, from 0 to 1800, versus depth, from 4 to 18; series: Recursive and NodeCodes.)

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficients computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices
A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$ over the paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Ab}$; if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the two paths are merged into $P_{Aa}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means that $s \in \mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA} \right), \tag{A.1}$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2,} \end{cases}$$

$$L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3.} \tag{A.2}$$
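The three types in (A.1) and (A.2) differ only in the exponent offset and in which coefficient of $A$ is used, which the following sketch makes explicit. The function name `aab_contribution` and the string labels for the types are assumptions introduced for this example; deciding which type a given path-triple falls under is assumed to have been done already.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def aab_contribution(kind, lengths, F_A=0, overlap_len=0):
    """Contribution of one term of (A.1) to Phi_aab.

    kind: 'type1' or 'type2' (lengths of P_Aa1, P_Aa2, P_Ab), or
          'type3' (lengths of the merged P_Aa and of P_Ab).
    overlap_len: length of the root 2-overlap path P_As (Type 2 only).
    """
    F = Fraction(F_A)
    phi_AA = (1 + F) / 2
    phi_AAA = (1 + 3 * F) / 4
    if kind == "type1":
        return HALF ** (sum(lengths) - 1) * phi_AAA
    if kind == "type2":
        return HALF ** (sum(lengths) - overlap_len) * phi_AA
    if kind == "type3":
        return HALF ** (sum(lengths) + 1) * phi_AA
    raise ValueError("unknown type")

print(aab_contribution("type1", [2, 2, 1]))                 # 1/64
print(aab_contribution("type2", [2, 2, 1], overlap_len=1))  # 1/32
print(aab_contribution("type3", [1, 1]))                    # 1/16
```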
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$\Phi_{aabc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}-1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AA} \right) + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+2} \Phi_{AA} \right), \tag{A.3}$$

where $A$ is a triple-common ancestor of $a$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively. (A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3,} \end{cases}$$

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5.} \end{cases} \tag{A.5}$$
Note that if $a$ is an ancestor of either $b$ or $c$, or of both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$, and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Therefore, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.
For simplicity we treat ⟨1198751198601198861 1198751198601198862⟩ as a default mergeable
path-pairNow we present the path-counting formula for Φ
119886119886119886119887
where 119886 is not an ancestor of 119887 as follows
Φ119886119886119886119887
= sum
119860
(3
2( sum
Type 1(1
2)
119871 tripleminus1
Φ119860119860119860
+ sum
Type 2(1
2)
119871 triple
Φ119860119860)
+ sum
Type 3(1
2)
119871pair+2
Φ119860119860)
(A6)
where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1},\\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A7}$$
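As an illustration of how the summands of (A6) and the lengths of (A7) fit together, the following minimal Python sketch evaluates the contribution of a single path-triple or path-pair. The function names `aaab_term` and `l_triple` are hypothetical, the values of $\Phi_{AAA}$ and $\Phi_{AA}$ are assumed to be supplied by the caller, and the example assumes $\Phi_{AA} = 1/2$ for a non-inbred ancestor.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def l_triple(l_a1, l_a3, l_b, l_root_overlap=0):
    """L_triple from (A7); pass the length of P_As for Type 2, 0 for Type 1."""
    return l_a1 + l_a3 + l_b - l_root_overlap

def aaab_term(kind, length, phi):
    """One summand of (A6) for a given common ancestor A.

    kind   -- 1, 2, or 3 (the type of the path-triple or path-pair)
    length -- L_triple for Types 1 and 2, L_pair for Type 3
    phi    -- Phi_AAA for Type 1, Phi_AA for Types 2 and 3 (caller-supplied)
    """
    if kind == 1:
        return Fraction(3, 2) * HALF ** (length - 1) * phi
    if kind == 2:
        return Fraction(3, 2) * HALF ** length * phi
    if kind == 3:
        return HALF ** (length + 2) * phi
    raise ValueError("kind must be 1, 2, or 3")

# Hypothetical example: a and b are both children of A, and A is the only
# known parent of a. The only contribution is Type 3 with L_pair = 1 + 1.
example = aaab_term(3, 2, Fraction(1, 2))  # assumes Phi_AA = 1/2
```

In this example the three paths to $a$ are necessarily identical, so only the Type 3 term applies.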
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab}$ as $\Phi_{a_1a_2a_3b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1a_2a_3b}$.
[Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$. Nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$.]
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using the recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ that contribute to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

For $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

For $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.
Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):
(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.
(ii) For Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.
[Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.]
[Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. Edge legend: parent-child relationship, ancestor-descendant relationship.]
(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof (Basis). There are two basic scenarios among $a$, $b$, and $c$: (i) one individual is a parent of another; (ii) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.
For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$. Both values agree with those obtained from the recursive formula.
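The two basis values can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper `type1_coefficient` is a hypothetical name, it returns the factor $(1/2)^{L}$ that multiplies $\Phi_{AAA}$ (or $\Phi_{ccc}$) for one Type 1 path-triple, and the path lengths are read off Figure 18 as described in the text.

```python
from fractions import Fraction

def type1_coefficient(path_lengths):
    """Factor (1/2)^L for one Type 1 path-triple, with L the sum of the three path lengths."""
    return Fraction(1, 2) ** sum(path_lengths)

# Figure 18(a): c is a parent of a and b, so the lengths of P_ca, P_cb, P_cc are 1, 1, 0.
fig18a = type1_coefficient((1, 1, 0))

# Figure 18(b): the lengths of P_Aa, P_Ab, P_Ac are 1, 1, 1.
fig18b = type1_coefficient((1, 1, 1))

# Extending one path by a single edge halves the coefficient,
# which is the effect used in the induction step.
extended = type1_coefficient((2, 1, 1))
```

The last line mirrors the relation $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1$ when $f$ is the parent of $a$.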
Induction Step. Let $n$ denote the number of edges in $G$. Assume that the formula is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.
For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k+1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition of the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$.
Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ has only one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain
$$\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},\\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}.
\end{aligned} \tag{B2}$$
Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc}$.
Thus, it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios among $a$, $b$, and $c$: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another.
[Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.]
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.
For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1} \Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
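The three basis exponents can be reproduced from the paths themselves. The following Python sketch is a hedged illustration, not the authors' implementation: `shared_prefix_edges` and `type2_coefficient` are hypothetical helpers, each path is written as a tuple of individuals starting at the triple-common ancestor, and the root overlap path is assumed to be the longest prefix shared by two of the three paths. The node sequences are read off the description of Figure 20 in the text.

```python
from fractions import Fraction
from itertools import combinations

def shared_prefix_edges(p, q):
    """Number of edges in the common prefix of two paths that start at the same ancestor."""
    n = 0
    for u, v in zip(p[1:], q[1:]):
        if u != v:
            break
        n += 1
    return n

def type2_coefficient(paths):
    """Factor (1/2)^(L+1) multiplying Phi_AA for one Type 2 path-triple.

    L is the sum of the three path lengths minus the length of the
    root overlap path (assumed here to be the longest shared prefix).
    """
    total = sum(len(p) - 1 for p in paths)
    overlap = max(shared_prefix_edges(p, q) for p, q in combinations(paths, 2))
    return Fraction(1, 2) ** (total - overlap + 1)

# Figure 20(a): c -> b -> a, with c as the triple-common ancestor.
fig20a = type2_coefficient([("c", "b", "a"), ("c", "b"), ("c",)])
# Figure 20(b): A -> b -> a and A -> c.
fig20b = type2_coefficient([("A", "b", "a"), ("A", "b"), ("A", "c")])
# Figure 20(c): A -> s -> a, A -> s -> b, and A -> c.
fig20c = type2_coefficient([("A", "s", "a"), ("A", "s", "b"), ("A", "c")])
```

The sketch does not check that a path-triple is actually of Type 2; it only evaluates the exponent for triples already known to have a root overlap.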
Induction Step. Let $n$ denote the number of edges in $G$. Assume that the formula is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.
For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), where the parent of $a$ is $f$, we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1} \Phi_{AA}$.
In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain
$$\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA},\\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}.
\end{aligned} \tag{B3}$$
For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).
In summary, it is true for $n = k + 1$.
[Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.]
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.
Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.
Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$ (as $m$ is missing),

$$\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab}. \tag{B4}$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.
Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1} \Phi_{AA}$.
For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.
In summary, the path-counting formula (A1) is true for Case 2.3.
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1a_2b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has been proven above. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A1).
[Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. Edge legend: parent-child relationship, ancestor-descendant relationship.]
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A1).
Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k+1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k+1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k+1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k+1\}$.
For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.
For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:
(1) there is no individual among $\{A_i \mid 2 \le i \le k+1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k+1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k+1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k+1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.
Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a: A_1 \to \cdots \to a,\ t_b: A_1 \to \cdots \to b,\ t_c: A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.
If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
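The exclusion rule used in case (2) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: `is_eligible` is a hypothetical helper, each path is a tuple of individuals starting at $A_1$, and the node names `A1`, `Aj`, and `x` are invented for the example. The sketch checks only the rule stated above and does not test the root-overlap conditions of the path-counting formulas.

```python
def is_eligible(path_triple, lower_ancestors):
    """Return False when all three paths pass through the same lower triple-common ancestor."""
    for anc in lower_ancestors:
        # path[1:] skips the starting ancestor A_1 itself.
        if all(anc in path[1:] for path in path_triple):
            return False
    return True

# Hypothetical pedigree fragment: Aj is a lower triple-common ancestor below A1.
all_through_aj = [("A1", "Aj", "a"), ("A1", "Aj", "b"), ("A1", "Aj", "c")]
one_bypasses_aj = [("A1", "x", "a"), ("A1", "Aj", "b"), ("A1", "Aj", "c")]

excluded = is_eligible(all_through_aj, {"Aj"})   # counted under Aj instead
kept = is_eligible(one_bypasses_aj, {"Aj"})      # still attributed to A1
```

Filtering in this way keeps the contributions of $A_1$ and of the lower ancestors disjoint, which is what allows them to be summed.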
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

For $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
[Figure 23: Dependency graph for different cases for four individuals.]
For $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

For $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.
According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlap.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlap.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlap and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.
According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8, accordingly.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
[Figure 24: Dependency graph for different cases for two pairs of individuals.]
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine", http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015-1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735-1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1-94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319-1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individus," Population, vol. 21, pp. 751-776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299-305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469-492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235-244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27-36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429-453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347-359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133-145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57-64, 1988.
[Figure 1: The 15 possible identity states for individuals $a$ and $b$, grouped by their 9 condensed states ($\Delta_1$ to $\Delta_9$). Each state shows the maternal and paternal alleles of $a$ and of $b$; lines indicate alleles that are IBD.]
[Figure 2: Examples of path-pairs and path-triples. The figure shows a pedigree with individuals $A$, $c$, $s$, $d$, $e$, $f$, $t$, $a$, and $b$, together with path-pairs 1 to 4 and path-triples 1 to 6, each annotated with its crossover individuals, 2-overlap and 3-overlap individuals, and whether its overlap path is a root 2-overlap path.]
extends all the way to the ancestor $A$, we call it a root 2-overlap path.
(14) 3-Overlap Path. It consists of all 3-overlap individuals in a consecutive order. If the 3-overlap path extends all the way to the root $A$, we call it a root 3-overlap path.
Example 1. Consider the path-pairs from $A$ to $a$ and $b$ in Figure 2, where $A$ is a common ancestor of $a$ and $b$. For path-pair1, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{s, e, t\}$, and $A \to s \to e \to t$ is a root 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$. For path-pair4, $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) = \{e, t\}$, where $e$ is a crossover individual, $t$ is a 2-overlap individual with respect to $P_{Aa}$ and $P_{Ab}$, and $e \to t$ is a 2-overlap path with respect to $P_{Aa}$ and $P_{Ab}$; because it does not extend to $A$, it is not a root 2-overlap path.
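The distinction drawn in Example 1 can be sketched in code. The following Python fragment is an illustration under an explicit assumption: a shared individual is treated as a 2-overlap individual when both paths enter it from the same parent, and as a crossover individual when they enter it from different parents. The helper name `classify_shared` is hypothetical, and the two path-pairs use node sequences that appear in Figure 2.

```python
def classify_shared(path_a, path_b):
    """Split the individuals shared by two paths from the same ancestor.

    Returns (two_overlap, crossover), assuming a shared individual is a
    2-overlap individual if both paths reach it through the same parent
    and a crossover individual otherwise.
    """
    parent_a = {child: parent for parent, child in zip(path_a, path_a[1:])}
    parent_b = {child: parent for parent, child in zip(path_b, path_b[1:])}
    shared = [x for x in path_a[1:] if x in parent_b]
    two_overlap = [x for x in shared if parent_a[x] == parent_b[x]]
    crossover = [x for x in shared if parent_a[x] != parent_b[x]]
    return two_overlap, crossover

# Both paths run A -> s -> e -> t before splitting: s, e, t are 2-overlap individuals.
same_route = classify_shared(("A", "s", "e", "t", "a"), ("A", "s", "e", "t", "b"))

# The paths meet only at t, entering it from e and from f: t is a crossover individual.
meet_at_t = classify_shared(("A", "s", "e", "t", "a"), ("A", "d", "f", "t", "b"))
```

Under this assumption, a path-pair is nonoverlapping exactly when both returned lists are empty.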
Example 2. There are four path-quads listed in Figure 3 from $A$ to four individuals $a$, $b$, $c$, and $d$, where $A$ is a quad-common ancestor of the four individuals. For path-quad2, considering the paths $P_{Aa}$ and $P_{Ab}$, the path $A \to t \to f \to s$ is a root 2-overlap path, and $t$, $f$, $s$ are 2-overlap individuals with respect to $P_{Aa}$ and $P_{Ab}$. For path-quad3, $t$, $f$, $s$ are 3-overlap individuals with respect to $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$, and the path $A \to t \to f \to s$ is a root 3-overlap path.
Then we summarize all the conceptual terms used in the path-counting formulas for two, three, and four individuals in Table 1, which gives a glimpse, from the terminology perspective, of our framework for generalizing Wright's formula to three and four individuals.
2.4. An Overview of Path-Counting Formula Derivation
According to Wright's path-counting formula [16] (see (2)) for two individuals $a$ and $b$, the path-counting approach requires identifying common ancestors of $a$ and $b$ and calculating the contribution of each common ancestor to $\Phi_{ab}$. More specifically, for each common ancestor, denoted as $A$, we obtain all path-pairs from $A$ to $a$ and $b$ and identify acceptable path-pairs. For $\Phi_{ab}$, an acceptable path-pair $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair where
Figure 3: Examples of path-quads.
Path-quad1: $A \to t \to f \to s \to a$; $A \to m \to s \to b$; $A \to c$; $A \to d$.
Path-quad2: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to c$; $A \to d$.
Path-quad3: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to t \to f \to s \to c$; $A \to d$.
Path-quad4: $A \to t \to f \to s \to a$; $A \to t \to m \to s \to b$; $A \to t \to m \to s \to c$; $A \to d$.
Table 1: The conceptual terms used for two, three, and four individuals.
| Two individuals | Three individuals | Four individuals |
| --- | --- | --- |
| Common ancestor | Triple-common ancestor | Quad-common ancestor |
| Path-pair | Path-triple | Path-quad |
| $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ | $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ | $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$ |
| NA | 2-Overlap individual | 3-Overlap individual |
| NA | 2-Overlap path | 3-Overlap path |
| NA | Root 2-overlap path | Root 3-overlap path |
| NA | Crossover individual | Crossover individual |
the two paths share no common individuals except $A$. In Figure 2, path-pair2 is an acceptable path-pair, while path-pair1, path-pair3, and path-pair4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.
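As a rough illustration of the two-individual case, the following Python sketch enumerates path-pairs in a small pedigree and sums the contributions of the nonoverlapping ones. It assumes Wright's formula written as $\Phi_{ab} = \sum_A \sum (1/2)^{L_{P_{Aa}} + L_{P_{Ab}}} \Phi_{AA}$ with $\Phi_{AA} = (1/2)(1 + F_A)$, and it assumes $a \neq b$. The pedigree, the `children` dictionary, and the function names are hypothetical and are not taken from the authors' implementation.

```python
from fractions import Fraction


def all_paths(children, src, dst):
    """Return every directed path src -> ... -> dst as a list of node lists."""
    if src == dst:
        return [[src]]
    found = []
    for child in children.get(src, ()):
        for tail in all_paths(children, child, dst):
            found.append([src] + tail)
    return found


def kinship(children, a, b, inbreeding=None):
    """Sum (1/2)^(L_Aa + L_Ab) * Phi_AA over all nonoverlapping path-pairs."""
    inbreeding = inbreeding or {}  # hypothetical input: F_A per ancestor, default 0
    individuals = set(children) | {c for cs in children.values() for c in cs}
    total = Fraction(0)
    for anc in individuals:
        phi_aa = Fraction(1, 2) * (1 + Fraction(inbreeding.get(anc, 0)))
        for p_a in all_paths(children, anc, a):
            for p_b in all_paths(children, anc, b):
                # Acceptable path-pair: no shared individual except the ancestor.
                if set(p_a[1:]) & set(p_b[1:]):
                    continue
                length = (len(p_a) - 1) + (len(p_b) - 1)
                total += Fraction(1, 2) ** length * phi_aa
    return total


# Hypothetical pedigree: f and m are unrelated founders with children a and b.
siblings = {"f": ["a", "b"], "m": ["a", "b"]}
print(kinship(siblings, "a", "b"))  # 1/4 for full siblings
```

Exhaustive path enumeration is used here only for readability; it is exponential in the worst case and is not meant to reflect the path encoding schemes discussed elsewhere in the paper.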
To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.
For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.
Aiming at solving the problem of identifying acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering different types of common individuals shared between the two paths. Then we introduce building blocks, which are connected graphs with conditions on every edge in the graph that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the concerned path-pairs. A path-triple can be decomposed to one or multiple building blocks. Considering a shared path-pair between two building blocks, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between the two building blocks. In other words, taking the acceptable cases for building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
Figure 4 summarizes, as a flowchart, the main procedures used for deriving the path-counting formula for $\Phi_{abc}$. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.
3. Results and Discussion
3.1. Path-Counting Formulas for Three Individuals
We first introduce a systematic method to generate all possible cases
Figure 4: A flowchart for path-counting formula derivation.
for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.
3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$ with $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$ except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$:
(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;
(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;
(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.
Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab} \rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab} \rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \neq \mathrm{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$. The completeness of the eight cases shown in (7) for $\langle P_{Aa}, P_{Ab} \rangle$ can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab} \rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).
$$
\begin{aligned}
&\text{Case 1: } T \\
&\text{Case 2: } X^{+} \\
&\text{Case 3: } TX^{+} \\
&\text{Case 4: } T(X^{+}Y)^{+} \\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+} \\
&\text{Case 6: } X^{+}Y \\
&\text{Case 7: } X^{+}(YX^{+})^{+} \\
&\text{Case 8: } X^{+}(YX^{+})^{+}Y
\end{aligned}
\tag{7}
$$
$$
\left.
\begin{aligned}
A \to s \to e \to t \to a \\
A \to s \to e \to t \to b
\end{aligned}
\right\} \in T
\tag{8}
$$
Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ (scenarios $S_0$–$S_3$).
where $s$, $e$, $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path;
$$
\left.
\begin{aligned}
A \to s \to e \to t \to a \\
A \to s \to f \to t \to b
\end{aligned}
\right\} \in TX
\tag{9}
$$
where $s$ is a 2-overlap individual, the overlap path is a root 2-overlap path, and $t$ is a crossover individual;
$$
\left.
\begin{aligned}
A \to s \to e \to t \to a \\
A \to d \to f \to t \to b
\end{aligned}
\right\} \in X
\tag{10}
$$
where $t$ is a crossover individual;
$$
\left.
\begin{aligned}
A \to c \to e \to t \to a \\
A \to s \to e \to t \to b
\end{aligned}
\right\} \in XY
\tag{11}
$$
where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
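The eight cases in (7) can be checked mechanically. The sketch below assumes that the interaction between two paths has already been encoded, from the ancestor downward, as a string over the letters `T`, `X`, and `Y`; that encoding step and the names `CASES` and `classify` are illustrative assumptions rather than part of the paper.

```python
import re

# The eight cases of (7), written as regular expressions over T, X, and Y.
CASES = {
    1: r"T",
    2: r"X+",
    3: r"TX+",
    4: r"T(X+Y)+",
    5: r"T(X+Y)+X+",
    6: r"X+Y",
    7: r"X+(YX+)+",
    8: r"X+(YX+)+Y",
}


def classify(pattern):
    """Return the case number whose expression matches the whole string, else None."""
    for number, expression in CASES.items():
        if re.fullmatch(expression, pattern):
            return number
    return None


for word in ["T", "TXX", "XY", "TXYX", "XYXY", "TY"]:
    print(word, classify(word))
```

Because the eight expressions are mutually exclusive, the order in which they are tried does not affect the result; a string such as `TY`, in which a non-root overlap is not preceded by a crossover, matches none of them.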
3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed to three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.
In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple1 is an example of $S_0$. Next, we introduce a lemma which can assist with identifying the options for the edges in the scenarios $S_1$–$S_3$.
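The path-pair level representation can be sketched in a few lines of Python: each path becomes a node, and an edge is added whenever two paths share an individual other than $A$. The function name `scenario` and the example paths are hypothetical; the scenario index is taken to be the number of edges, in line with the use of $S_k$ for a scenario with $k$ edges.

```python
from itertools import combinations


def scenario(paths):
    """paths maps a descendant label to its path (list of nodes) from ancestor A.

    Returns (k, edges): the index k of scenario S_k and the set of path-pairs
    that share at least one individual other than A.
    """
    edges = set()
    for (u, p_u), (v, p_v) in combinations(paths.items(), 2):
        # Drop the first node of each path, the shared ancestor A, before comparing.
        if set(p_u[1:]) & set(p_v[1:]):
            edges.add((u, v))
    return len(edges), edges


# Hypothetical path-triple: P_Aa and P_Ab both pass through s, P_Ac is independent.
triple = {"a": ["A", "s", "a"], "b": ["A", "s", "b"], "c": ["A", "c"]}
print(scenario(triple))
```

The sketch only detects whether an edge exists; deciding whether that edge is of type $T$, $X$, or $TX$ requires looking at the parents through which the paths enter each shared individual.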
Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.
Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele level, starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.
Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree. (b) Inheritance paths.
For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$. For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).
For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $P_{Ab}$ and $P_{Ac}$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.
Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.
Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can therefore have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.
3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then
Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).
Figure 8: A graphical illustration for obtaining $T_3 = R_1 \bowtie R_2 \bowtie R_3$ from $u_1$, $u_2$, and $u_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.
$P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.
Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ to $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.
Considering the dependency among $R_1$, $R_2$, $R_3$, we use the natural join operator, denoted as $\bowtie$, operating on $R_1$, $R_2$, $R_3$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.
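The join $T_3 = R_1 \bowtie R_2 \bowtie R_3$ can be sketched as follows. The sketch assumes that each $u_i$ is a $B_2$ block over two of the three edges $e_1$, $e_2$, $e_3$ of $S_3$ (the particular assignment of edges to $u_1$, $u_2$, $u_3$ below is a guess) and that a relation is stored as a list of dictionaries; all function names are hypothetical.

```python
from itertools import product

OPTIONS = ("T", "X", "TX")      # edge options without Y (Cases 1-3)
ROOT_OVERLAP = {"T", "TX"}      # options that contain a root 2-overlap


def building_block_b2(edge_1, edge_2):
    """Acceptable cases of a B2 block: at most one of its two edges is root overlap."""
    return [
        {edge_1: x, edge_2: y}
        for x, y in product(OPTIONS, repeat=2)
        if not (x in ROOT_OVERLAP and y in ROOT_OVERLAP)
    ]


def natural_join(left, right):
    """Relational natural join on lists of dicts: merge rows that agree on shared keys."""
    joined = []
    for row_l, row_r in product(left, right):
        shared = row_l.keys() & row_r.keys()
        if all(row_l[key] == row_r[key] for key in shared):
            joined.append({**row_l, **row_r})
    return joined


r1 = building_block_b2("e1", "e2")   # assumed edges of u1
r2 = building_block_b2("e2", "e3")   # assumed edges of u2
r3 = building_block_b2("e1", "e3")   # assumed edges of u3
t3 = natural_join(natural_join(r1, r2), r3)
for case in t3:
    print(case)
```

Under these assumptions the join keeps exactly the assignments in which at most one of the three edges is a root overlap, which agrees with the two options stated for $S_k$ in the text.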
For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:
(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap, and the remaining $(k - 1)$ edges belong to crossover.
In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals to a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:
(1) split the node $s$ to two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
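The three steps above can be sketched on a parent-to-children adjacency map. The sketch additionally assumes, following the paths drawn in Figure 9, that the incoming edges $x \to s$ and $w \to s$ are redirected to $s_1$ and $s_2$, respectively, and that $s$ has no other parents; the function name, argument names, and node labels are hypothetical.

```python
def split_crossover(children, s, parent_a, parent_b, child_a, child_b):
    """Return a copy of the pedigree in which crossover node s is split into s1 and s2.

    parent_a -> s -> child_a lies on P_Aa and parent_b -> s -> child_b lies on P_Ab.
    """
    s1, s2 = s + "1", s + "2"
    graph = {node: set(kids) for node, kids in children.items() if node != s}
    # Assumption taken from Figure 9: incoming edges become x -> s1 and w -> s2.
    graph[parent_a] = (graph[parent_a] - {s}) | {s1}
    graph[parent_b] = (graph[parent_b] - {s}) | {s2}
    # Steps (2) and (3): after adding the new edges, both copies point to a' and b'.
    graph[s1] = {child_a, child_b}
    graph[s2] = {child_a, child_b}
    return graph


# Hypothetical G_{k+1}: s is a crossover between P_Aa (via x) and P_Ab (via w).
g_k1 = {
    "A": {"x", "w", "c"},
    "x": {"s"},
    "w": {"s"},
    "s": {"a'", "b'"},
    "a'": {"a"},
    "b'": {"b"},
}
g_k = split_crossover(g_k1, "s", "x", "w", "a'", "b'")
print(sorted(g_k["s1"]), sorted(g_k["s2"]))
```

In the resulting graph, the paths $A \to x \to s_1 \to a' \to a$ and $A \to w \to s_2 \to b' \to b$ no longer meet at a common node, so the crossover at $s$ has been removed for this path-pair.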
Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.
Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If there is a new crossover node appearing, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; it means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.
Next, we introduce a canonical graph, which results from applying the splitting operator for all crossover individuals. The canonical graph has zero crossover individuals.
Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that
(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$,
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,
we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.
Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.
Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. For $G_{k+1}$: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle = \{A \to \cdots \to x \to s \to a' \to \cdots \to a;\ A \to \cdots \to w \to s \to b' \to \cdots \to b;\ A \to c\}$. For $G_k$: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle = \{A \to \cdots \to x \to s_1 \to a' \to \cdots \to a;\ A \to \cdots \to w \to s_2 \to b' \to \cdots \to b;\ A \to c\}$.
Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.
Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.
In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:
$$
\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AA} \right),
\tag{12}
$$
where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;
$$
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\tag{13}
$$
and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
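To make (12) and (13) concrete, the sketch below evaluates the contribution of a single acceptable path-triple from its path lengths. It assumes the path-triple has already been classified as Type 1 or Type 2; the function names and the two example configurations are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_aa(f_a=0):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + Fraction(f_a))


def phi_aaa(f_a=0):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * Fraction(f_a))


def triple_contribution(path_lengths, root_overlap_length=None, f_a=0):
    """Contribution of one acceptable path-triple to Phi_abc.

    path_lengths: lengths of P_Aa, P_Ab, P_Ac.
    root_overlap_length: length of the root 2-overlap path P_As (Type 2),
    or None when there is no root 2-overlap (Type 1).
    """
    if root_overlap_length is None:
        return HALF ** sum(path_lengths) * phi_aaa(f_a)
    l_triple = sum(path_lengths) - root_overlap_length
    return HALF ** (l_triple + 1) * phi_aa(f_a)


# Type 1: a, b, c are children of a non-inbred A (paths A->a, A->b, A->c).
print(triple_contribution((1, 1, 1)))
# Type 2: A->s->a and A->s->b share the root 2-overlap path A->s; A->c is direct.
print(triple_contribution((2, 2, 1), root_overlap_length=1))
```

Both values can be confirmed by direct allele tracing in these small pedigrees: the first is $(1/2)^3 \cdot (1/4) = 1/32$ and the second is $(1/2)^5 \cdot (1/2) = 1/64$.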
3.2. Path-Counting Formulas for Four Individuals
3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.
In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ to one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
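For the disconnected scenario $S_2$, the Cartesian product can be written directly. The sketch assumes that each $B_1$ edge takes one of the three options $T$, $X$, and $TX$ from Figure 7; the variable names are illustrative.

```python
from itertools import product

# Acceptable options for the single edge of a B1 block (rules of Figure 7).
R1 = ["T", "X", "TX"]   # edge of u1
R2 = ["T", "X", "TX"]   # edge of u2

# u1 and u2 share no path, so every combination is kept: T2 = R1 x R2.
T2 = list(product(R1, R2))
print(len(T2))
for case in T2:
    print(case)
```

Unlike the natural join used for $S_3$ in the three-individual case, no rows are filtered out here, because the two building blocks have no path-pair in common.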
Figure 11: Building blocks $\{B_1, B_2, B_3\}$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, all three edges belong to root overlap (i.e., having a root 3-overlap), with $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.
Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).
| $S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$ |
| --- | --- | --- | --- | --- | --- | --- |
| $S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$ |
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.
Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:
(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.
For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.
For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea for constructing all acceptable cases for each $S_i$ in Table 2.
Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:
(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:
$$
\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 2} \Phi_{AA} \right),
\tag{14}
$$
where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; Type 1: zero root 2-overlap and zero root 3-overlap path; Type 2: one root 2-overlap path $P_{As}$ ending at $s$; Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;
$$
L_{\text{quad}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
\tag{15}
$$
and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
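The case analysis in (14) and (15) can be condensed into one small function. The sketch below assumes that the root overlaps of an acceptable path-quad have already been identified and are passed in as lengths; the function names and the example configurations are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_self(order, f_a=0):
    """Phi_AA, Phi_AAA, Phi_AAAA for order 2, 3, 4, as defined below (14)."""
    weight = {2: 1, 3: 3, 4: 7}[order]
    return Fraction(1, 2 ** (order - 1)) * (1 + weight * Fraction(f_a))


def quad_contribution(path_lengths, root2=(), root3=None, f_a=0):
    """Contribution of one acceptable path-quad to Phi_abcd under (14) and (15).

    path_lengths: lengths of P_Aa, P_Ab, P_Ac, P_Ad.
    root2: lengths of the root 2-overlap paths (zero, one, or two entries).
    root3: length of the root 3-overlap path P_At, or None if there is none.
    """
    l_quad = sum(path_lengths) - sum(root2)
    if root3 is not None:
        # Type 3, Case 2 subtracts 2 * L(P_At); Case 3 subtracts L(P_At) + L(P_As).
        l_quad -= root3 if root2 else 2 * root3
        return HALF ** (l_quad + 2) * phi_self(2, f_a)
    if len(root2) == 2:                      # Type 3, Case 1
        return HALF ** (l_quad + 2) * phi_self(2, f_a)
    if len(root2) == 1:                      # Type 2
        return HALF ** (l_quad + 1) * phi_self(3, f_a)
    return HALF ** l_quad * phi_self(4, f_a)  # Type 1


print(quad_contribution((1, 1, 1, 1)))                      # Type 1
print(quad_contribution((2, 2, 1, 1), root2=(1,)))          # Type 2
print(quad_contribution((2, 2, 2, 1), root3=1))             # Type 3, Case 2
print(quad_contribution((3, 3, 2, 1), root2=(2,), root3=1)) # Type 3, Case 3
```

The function does not validate acceptability; for instance, it does not reject an input with a root 3-overlap path and two root 2-overlap paths, which the properties listed in Section 3.2.2 rule out.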
Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle = \{A \to s \to a;\ A \to s \to b;\ A \to t \to c;\ A \to t \to d\}$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle = \{A \to x \to a;\ A \to y \to b;\ A \to y \to c;\ A \to x \to d\}$.
3.3. Path-Counting Formulas for Two Pairs of Individuals
3.3.1. Terminology and Definitions
(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.
(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.
(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
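The distinction in Example 8 can be reproduced with a short sketch that looks for common prefixes of the four paths. It assumes that a root overlap between two paths corresponds to their shared prefix below $A$ and uses the paths listed for Figure 12; the function names are hypothetical.

```python
from itertools import combinations


def root_overlap_end(p, q):
    """Last individual of the common prefix of p and q below the ancestor, or None."""
    end = None
    for u, v in zip(p[1:], q[1:]):
        if u != v:
            break
        end = u
    return end


def classify_root_overlaps(paths, pairs=(("a", "b"), ("c", "d"))):
    """Label each root overlap as 'homo' (within a pair) or 'heter' (across pairs)."""
    same_pair = {frozenset(pair) for pair in pairs}
    result = {}
    for u, v in combinations(sorted(paths), 2):
        end = root_overlap_end(paths[u], paths[v])
        if end is not None:
            kind = "homo" if frozenset((u, v)) in same_pair else "heter"
            result[(u, v)] = (kind, end)
    return result


figure_12a = {"a": ["A", "s", "a"], "b": ["A", "s", "b"],
              "c": ["A", "t", "c"], "d": ["A", "t", "d"]}
figure_12b = {"a": ["A", "x", "a"], "b": ["A", "y", "b"],
              "c": ["A", "y", "c"], "d": ["A", "x", "d"]}
print(classify_root_overlaps(figure_12a))
print(classify_root_overlaps(figure_12b))
```

For Figure 12(a) the sketch reports two root homo-overlaps ending at $s$ and $t$, and for Figure 12(b) two root heter-overlaps ending at $x$ and $y$, matching the example.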
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of the root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.
Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define that the generation of a progenitor
Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.
| $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ |
| --- | --- |
| Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap |
| One root 2-overlap path | One root homo-overlap and zero root heter-overlap |
| | Zero root homo-overlap and one root heter-overlap |
| Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap |
| | Zero root homo-overlap and two root heter-overlaps |
| One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$ |
| One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$ |
| | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$ |
119901119894is 0 denoted as gen(119901
119894) = 0 If an individual 119886 has only
one parent 119901 then we define gen(119886) = gen(119901) + 1 If anindividual 119886 has two parents 119891 and 119898 we define gen(119886) =MAXgen(119891) gen(119898) + 1
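The generation rule can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the `parents` mapping (individual to a tuple of its known parents) and the helper name `make_gen` are assumptions made for the example.

```python
from functools import lru_cache


def make_gen(parents):
    """Build gen(x) for a pedigree.

    `parents` (hypothetical input format) maps an individual to a tuple of
    its known parents; progenitors are absent or map to an empty tuple.
    """

    @lru_cache(maxsize=None)
    def gen(x):
        known = parents.get(x, ())
        if not known:
            # A progenitor has generation 0.
            return 0
        # One parent: gen(p) + 1. Two parents: MAX{gen(f), gen(m)} + 1.
        return max(gen(p) for p in known) + 1

    return gen


# Toy pedigree: p -> f, and (f, m) -> a.
gen = make_gen({"a": ("f", "m"), "f": ("p",)})
```

Memoising `gen` keeps the computation linear in the number of parent-child edges, which matters because the same ancestors are reached along many paths in a large pedigree.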
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

\[
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1} \Phi_{AAA} \\
& \quad + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1} \Phi_{AA} \Biggr) \\
& + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB},
\end{aligned}
\tag{16}
\]
where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (panels $S_0$ to $S_{16}$ over the paths $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$).
Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either
(i) zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively; or
(ii) one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 = r_2$. (17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individual, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.
\[
B =
\begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T & \text{otherwise},
\end{cases}
\]

\[
\begin{aligned}
L_{2\text{-pair}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{Ah}} & \text{for Type 4},
\end{cases} \\
L_{\langle P_{Sa}, P_{Sb} \rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \\
L_{\langle P_{Tc}, P_{Td} \rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}.
\end{aligned}
\tag{18}
\]
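As a hedged illustration of how (16) and (18) combine for a single path-quad of Types 1 to 4, the sketch below evaluates one summand. The function names, the argument layout, and the use of exact fractions are assumptions of this example; enumerating and classifying the path-quads themselves is not shown.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def l_two_pair(kind, l_a, l_b, l_c, l_d, overlap_lengths=()):
    """L_2-pair from (18) for Types 1 to 4.

    overlap_lengths (hypothetical layout) holds the root-overlap path
    lengths that are subtracted: () for Type 1, (L_PAr,) for Type 2,
    (L_PAr1, L_PAr2) for Type 3, and (L_PAh,) for Type 4.
    """
    total = l_a + l_b + l_c + l_d
    if kind == 1:
        return total
    if kind == 2:
        (l_r,) = overlap_lengths
        return total - l_r
    if kind == 3:
        l_r1, l_r2 = overlap_lengths
        return total - l_r1 - l_r2
    if kind == 4:
        (l_h,) = overlap_lengths
        return total - 2 * l_h
    raise ValueError("kind must be 1, 2, 3, or 4")


# Per type in (16): extra exponent and which coefficient of A is used.
_TERM = {1: (0, "AAA"), 2: (1, "AAA"), 3: (2, "AA"), 4: (1, "AA")}


def two_pair_term(kind, lengths, overlap_lengths, phi_aaa, phi_aa):
    """One summand of (16) for a path-quad of the given type."""
    extra, coeff = _TERM[kind]
    exponent = l_two_pair(kind, *lengths, overlap_lengths) + extra
    return HALF ** exponent * (phi_aaa if coeff == "AAA" else phi_aa)


# Example: A is a non-inbred founder, so Phi_AAA = 1/4 and Phi_AA = 1/2.
type1 = two_pair_term(1, (1, 1, 1, 1), (), Fraction(1, 4), Fraction(1, 2))
type4 = two_pair_term(4, (2, 2, 2, 1), (1,), Fraction(1, 4), Fraction(1, 2))
```

For a founder $A$ with four children $a$, $b$, $c$, $d$, the Type 1 term is the only contribution, giving $(1/2)^4 \cdot 1/4 = 1/64$.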
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

\[
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}.
\tag{19}
\]

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph, and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows:

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element, the empty string.

(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $\mathrm{NC}(u)$, a code $xi{*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
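The two labelling rules can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation from [18]: the `children` and `gender` dictionaries, the `M`/`F` gender markers, and the `.` separator between code components are hypothetical choices, and the queue expands a node only after all of its parents have been expanded (a Kahn-style ordering) so that $\mathrm{NC}(u)$ is complete before it is propagated.

```python
from collections import defaultdict, deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from `root`.

    children: node -> list of children in sibling order (assumed format).
    gender:   node -> 'M' or 'F' (assumed markers for the '*' suffix).
    """
    indegree = defaultdict(int)
    for kids in children.values():
        for v in kids:
            indegree[v] += 1

    nc = defaultdict(set)
    nc[root].add("")  # Rule (1): NC(r) holds only the empty string.
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            # Rule (2): extend every code x of u by the sibling index i
            # and the gender marker of the child v.
            for x in nc[u]:
                nc[v].add(f"{x}{i}{gender[v]}.")
            indegree[v] -= 1
            if indegree[v] == 0:  # all parents of v have been expanded
                queue.append(v)
    return dict(nc)


# Virtual root r with two progenitors f and m, who have one child a.
nc = assign_nodecodes(
    {"r": ["f", "m"], "f": ["a"], "m": ["a"]},
    {"f": "M", "m": "F", "a": "M"},
    "r",
)
```

Each code in `nc["a"]` corresponds to one path from `r` to `a`, so the number of codes equals the number of such paths.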
The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree, from 77 to 195,197; y-axis: average time (ms); series: Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth, from 4 to 18; y-axis: average time (ms); series: Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on the computation of condensed identity coefficients by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices
A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (panels $S_0$ to $S_3$; one panel is annotated "if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable" and shows the merged path $P_{Aa}$ together with $P_{Ab}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:
\[
\Phi_{aab} = \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA} \Biggr),
\tag{A.1}
\]

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \\
L_{\langle P_{Aa}, P_{Ab} \rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.2}
\]
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:
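\[
\begin{aligned}
\Phi_{aabc} = {} & \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}-1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AA} \Biggr) \\
& + \sum_{A} \Biggl( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+2} \Phi_{AA} \Biggr),
\end{aligned}
\tag{A.3}
\]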
where $A$ is a quad-common ancestor of $a_1$, $a_2$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap and one root 3-overlap paths, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively. (A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
\[
\begin{aligned}
L_{\text{quad}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases} \\
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases}
\end{aligned}
\tag{A.5}
\]
Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 bc}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$, and the maternal allele is transmitted from $m$. At allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as a default mergeable path-pair. Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
\[
\Phi_{aaab} = \sum_{A} \Biggl( \frac{3}{2} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \Biggr) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}}+2} \Phi_{AA} \Biggr),
\tag{A.6}
\]

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

\[
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.7}
\]

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases:
Cases for $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

Cases for $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
(B.1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2 \Phi_{AAc} = (1/2)^3 \Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^2 \Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^3 \Phi_{AAA}$.
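The basis values for Figures 18(a) and 18(b) can be checked numerically. The sketch below is not the implementation evaluated in the paper: it is a compact recursive evaluator that expands the individual with the largest generation number, and for a single occurrence or a double occurrence of that individual it reduces to the recursions $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$ used in this appendix. It assumes that a missing parent is an unrelated, non-inbred founder, so $\Phi_{ccc} = 1/4$ for a founder $c$. The `parents` format and the name `make_phi` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def make_phi(parents):
    """Return phi(*individuals): the probability that one random gene drawn
    from each listed individual (repeats allowed) are all identical by descent.

    parents (assumed format) maps an individual to a tuple of known parents.
    """

    @lru_cache(maxsize=None)
    def gen(x):
        known = parents.get(x, ())
        return 0 if not known else 1 + max(gen(p) for p in known)

    @lru_cache(maxsize=None)
    def phi(individuals):
        # Expand the individual of largest generation: it cannot be an
        # ancestor of any other individual in the list.
        a = max(individuals, key=gen)
        n = individuals.count(a)
        rest = tuple(x for x in individuals if x != a)
        known = parents.get(a, ())
        half_n = Fraction(1, 2) ** n
        total = Fraction(0)
        # All n sampled genes of a come from the same parent.
        for slot in range(2):
            if slot < len(known):
                total += half_n * phi(tuple(sorted(rest + (known[slot],))))
            elif not rest:
                # Missing parent: its gene is identical only to itself.
                total += half_n
        # The n genes are split between both parents (needs n >= 2).
        if n >= 2 and len(known) == 2:
            total += (1 - 2 * half_n) * phi(tuple(sorted(rest + known)))
        return total

    return lambda *xs: phi(tuple(sorted(xs)))


# Figure 18(a): c is the only parent of a and of b.
fig18a = make_phi({"a": ("c",), "b": ("c",)})
# Figure 18(b): A is the only parent of a, b, and c.
fig18b = make_phi({"a": ("A",), "b": ("A",), "c": ("A",)})
```

With $\Phi_{ccc} = \Phi_{AAA} = 1/4$, the path-counting values are $(1/2)^2 \cdot 1/4 = 1/16$ and $(1/2)^3 \cdot 1/4 = 1/32$, which is what `fig18a("a", "b", "c")` and `fig18b("a", "b", "c")` return.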
Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus, it indicates $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \cdot \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AAA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned}
\tag{B.2}
\]

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle}+1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.
Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^3 \Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2 \Phi_{bc} = (1/2)^4 \Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2 \Phi_{ssc} = (1/2)^3 \Phi_{sc} = (1/2)^5 \Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}+1} \Phi_{cc} = (1/2)^{2+1} \Phi_{cc} = (1/2)^3 \Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4 \Phi_{AA}$ and $\Phi_{abc} = (1/2)^5 \Phi_{AA}$, respectively.
Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2} \Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1+1} \Phi_{AA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1+1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA}.
\end{aligned}
\tag{B.3}
\]

For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,

\[
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}).
\tag{B.4}
\]

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2) \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
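To make the reduction of Case 2.3 to Wright's formula concrete, the following sketch enumerates nonoverlapping path-pairs on a small pedigree and evaluates $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$. It is an illustrative brute-force enumeration, not the NodeCodes-based method: the `children` adjacency format, the `phi_self` dictionary of $\Phi_{AA}$ values for candidate ancestors, and both function names are assumptions of this example.

```python
from fractions import Fraction
from itertools import product


def paths(children, src, dst):
    """All directed paths src -> ... -> dst, each as a tuple of nodes."""
    if src == dst:
        return [(src,)]
    found = []
    for child in children.get(src, ()):
        for tail in paths(children, child, dst):
            found.append((src,) + tail)
    return found


def wright_kinship(children, a, b, phi_self):
    """Phi_ab by path counting over nonoverlapping path-pairs.

    phi_self (assumed input) maps each candidate common ancestor A to
    Phi_AA, e.g. 1/2 for a non-inbred founder.
    """
    total = Fraction(0)
    for anc, phi_aa in phi_self.items():
        pairs = product(paths(children, anc, a), paths(children, anc, b))
        for pa, pb in pairs:
            # Nonoverlapping: the two paths share only the ancestor itself.
            if set(pa) & set(pb) == {anc}:
                length = (len(pa) - 1) + (len(pb) - 1)
                total += Fraction(1, 2) ** length * phi_aa
    return total


# A pedigree shaped like scenario (i): A -> f -> a and A -> b,
# where a has the single parent f and A is a non-inbred founder.
children = {"A": ["f", "b"], "f": ["a"]}
phi_ab = wright_kinship(children, "a", "b", {"A": Fraction(1, 2)})
phi_aab = Fraction(1, 2) * phi_ab  # (B.4): Phi_aab = (1/2) Phi_ab
```

Here the only path-pair has length $2 + 1 = 3$, so $\Phi_{ab} = (1/2)^3 \cdot 1/2 = 1/16$ and the Type 3 term $(1/2)^{3+1} \cdot 1/2 = 1/32$ agrees with $(1/2)\Phi_{ab}$.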
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.

B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Legend: parent-child relationship; ancestor-descendant relationship.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathrm{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathrm{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathrm{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\; t_b\colon A_1 \to \cdots \to b,\; t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
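The graph manipulation used in the induction step (find the triple-common ancestors, pick a topmost one, remove it together with its outgoing edges) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the toy pedigree and the helper names `ancestors`, `triple_common`, `topmost`, and `remove_individual` are hypothetical, and an individual is counted among its own ancestors for simplicity.

```python
# Hypothetical toy pedigree: child -> list of known parents.
PARENTS = {
    "A1": [], "Q": [],
    "A2": ["A1", "Q"],
    "a": ["A2"], "b": ["A2"], "c": ["A2"],
}


def ancestors(parents, x):
    """x together with every individual that can reach x (assumed convention)."""
    seen, stack = {x}, [x]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def triple_common(parents, a, b, c):
    """Individuals that can reach all of a, b, and c."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)


def topmost(parents, tri):
    """Triple-common ancestors that have no other triple-common ancestor above them."""
    return sorted(
        A for A in tri
        if not any(B != A and B in ancestors(parents, A) for B in tri)
    )


def remove_individual(parents, A):
    """Drop A and all of its outgoing (parent-to-child) edges, giving G'."""
    return {x: [p for p in ps if p != A] for x, ps in parents.items() if x != A}


tri = triple_common(PARENTS, "a", "b", "c")
top = topmost(PARENTS, tri)
g_prime = remove_individual(PARENTS, top[0])
print(sorted(tri), top, sorted(triple_common(g_prime, "a", "b", "c")))
```

Removing one topmost ancestor reduces the number of triple-common ancestors by one, which is the step that lets the induction hypothesis apply to $G'$. A topmost ancestor need not be unique; the sketch simply takes the first one.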
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two-pair individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.
(Cases 2.1 to 2.3 $\Longleftarrow \Phi_{aaab}$)
Figure 23: Dependency graph for different cases for four individuals. [Nodes: Cases 2.1, 2.2, 2.3, 3.1, 3.2, 3.3.1, 3.3.2, 3.3.3, 3.4, 3.5, 4.1, 4.2, 4.3.1, 4.3.2, 4.3.3, together with $\Phi_{AAA}$ and $\Phi_{AA}$.]
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
(Cases 3.1 to 3.5 $\Longleftarrow \Phi_{aabc}$)

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.
(Cases 4.1 to 4.3.3 $\Longleftarrow \Phi_{abcd}$)
(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formula for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formula for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals. [Nodes: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, 4.8, together with $\Phi_{AAAA}$, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{TT}$.]
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine", http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individuals," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 3: Examples of path-quads. [Pedigree diagram over the individuals $A$, $t$, $m$, $f$, $s$, $a$, $b$, $c$, $d$, with the following path-quads listed.]
Path-quad1: $A \to t \to f \to s \to a$; $A \to m \to s \to b$; $A \to c$; $A \to d$.
Path-quad2: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to c$; $A \to d$.
Path-quad3: $A \to t \to f \to s \to a$; $A \to t \to f \to s \to b$; $A \to t \to f \to s \to c$; $A \to d$.
Path-quad4: $A \to t \to f \to s \to a$; $A \to t \to m \to s \to b$; $A \to t \to m \to s \to c$; $A \to d$.
Table 1: The conceptual terms used for two, three, and four individuals.

Two individuals | Three individuals | Four individuals
Common ancestor | Triple-common ancestor | Quad-common ancestor
Path-pair | Path-triple | Path-quad
$\mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ | $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac})$ | $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad})$
N/A | 2-Overlap individual | 3-Overlap individual
N/A | 2-Overlap path | 3-Overlap path
N/A | Root 2-overlap path | Root 3-overlap path
N/A | Crossover individual | Crossover individual
the two paths share no common individuals except $A$. In Figure 2, path-pair2 is an acceptable path-pair, while path-pair1, path-pair3, and path-pair4 are not acceptable path-pairs. The contribution of each common ancestor $A$ to $\Phi_{ab}$ is computed based on the inbreeding coefficient of $A$, modified by the length of each acceptable path-pair.
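As an illustration of this path-counting computation for two individuals, the sketch below enumerates path-pairs from every common ancestor, keeps only those that share no individual except the ancestor, and sums $(1/2)^{L}\Phi_{AA}$ over them, following the form of Wright's formula used in this paper. It is a naive enumeration under stated assumptions, not the encoding-based method the authors evaluate: the pedigree (two founders with two full siblings) and the names `PARENTS`, `paths_from`, `ancestors`, and `kinship` are hypothetical, and $\Phi_{AA}$ is taken as $(1/2)(1 + \Phi_{fm})$ with a missing parent contributing zero.

```python
from fractions import Fraction

# Hypothetical toy pedigree: child -> list of known parents.
# s and t are full siblings with founder parents A and B.
PARENTS = {"A": [], "B": [], "s": ["A", "B"], "t": ["A", "B"]}


def paths_from(anc, x):
    """All downward paths anc -> ... -> x, each as a tuple of individuals."""
    if x == anc:
        return [(anc,)]
    return [p + (x,) for par in PARENTS[x] for p in paths_from(anc, par)]


def ancestors(x):
    """x together with all of its ancestors."""
    out = {x}
    for p in PARENTS[x]:
        out |= ancestors(p)
    return out


def kinship(a, b):
    """Phi_ab by summing contributions of nonoverlapping path-pairs."""
    if a == b:
        ps = PARENTS[a]
        parental = kinship(ps[0], ps[1]) if len(ps) == 2 else Fraction(0)
        return Fraction(1, 2) * (1 + parental)
    total = Fraction(0)
    for anc in ancestors(a) & ancestors(b):
        phi_aa = kinship(anc, anc)
        for pa in paths_from(anc, a):
            for pb in paths_from(anc, b):
                # Acceptable path-pair: no shared individual except the ancestor.
                if set(pa) & set(pb) == {anc}:
                    length = (len(pa) - 1) + (len(pb) - 1)
                    total += Fraction(1, 2) ** length * phi_aa
    return total


print(kinship("s", "t"), kinship("A", "s"))
```

Exhaustive path enumeration grows quickly with pedigree depth, so this form is only meant to make the definition of an acceptable path-pair concrete.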
To compute $\Phi_{abc}$, the path-counting approach requires identifying all triple-common ancestors of $a$, $b$, and $c$ and summing up all triple-common ancestors' contributions to $\Phi_{abc}$. For each triple-common ancestor, denoted as $A$, we first identify all path-triples, each of which consists of three paths from $A$ to $a$, $b$, and $c$, respectively. Some examples of path-triples are presented in Figure 2.
For $\Phi_{ab}$, only nonoverlapping path-pairs are acceptable. A path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. For $\Phi_{abc}$, a path-triple might be acceptable even though either 2-overlap individuals or crossover individuals exist between a path-pair. The main challenge we need to address is finding necessary and sufficient conditions for acceptable path-triples.
Aiming at solving the problem of identifying acceptable path-triples, we first use a systematic method to generate all possible cases for a path-pair by considering different types of common individuals shared between the two paths. Then we introduce building blocks, which are connected graphs with conditions on every edge in the graph that encapsulate a set of acceptable cases of path-pairs. In each building block, we represent paths as nodes and interactions (i.e., shared common individuals between two paths) as edges. There are at least two paths in a building block. For each building block, we obtain all acceptable cases for the concerned path-pairs. A given path-triple can be decomposed into one or multiple building blocks. Considering a shared path-pair between two building blocks, we use the natural join operator from relational algebra to match the acceptable cases for the shared path-pair between two building blocks. In other words, considering the acceptable cases for building blocks as inputs, we use the natural join operator to construct all acceptable cases for a path-triple. Acceptable cases for a path-triple are identified and then used in deriving the path-counting formula for $\Phi_{abc}$.
Then we summarize all the main procedures used for deriving the path-counting formula for $\Phi_{abc}$ in a flowchart shown in Figure 4. The main procedures are also applicable for deriving the path-counting formulas for $\Phi_{abcd}$ and $\Phi_{ab,cd}$.
3. Results and Discussion

3.1. Path-Counting Formulas for Three Individuals. We first introduce a systematic method to generate all possible cases for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.

Figure 4: A flowchart for path-counting formula derivation. [Path-pair branch: generate all cases for $\langle P_{Aa}, P_{Ab} \rangle$; identify nonoverlapping path-pairs for $\langle P_{Aa}, P_{Ab} \rangle$ and compute the contribution to $\Phi_{ab}$; identify acceptable cases for $\langle P_{Aa}, P_{Ab} \rangle$ in the context of a path-triple. Path-triple branch for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$: path-pair level representation; decomposition into a set of building blocks; sets of acceptable cases for each building block; natural join; acceptable cases for the path-triple; split operator if a path-pair has a crossover; classification as Type 1 or Type 2 depending on whether a path-pair has a root overlap; compute the contribution to $\Phi_{abc}$.]
3.1.1. Cases for a Path-Pair. Given a path-pair $\langle P_{Aa}, P_{Ab} \rangle$ with $\mathrm{Bi\_C}(P_{Aa}, P_{Ab}) \ne \mathrm{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$ except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$:
(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals;
(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals;
(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.
Based on the three patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$, we use regular expressions to generate all possible cases for the path-pair $\langle P_{Aa}, P_{Ab} \rangle$. For convenience, we drop $\langle P_{Aa}, P_{Ab} \rangle$ and use $X$, $T$, and $Y$ instead of the patterns $X(P_{Aa}, P_{Ab})$, $T(P_{Aa}, P_{Ab})$, and $Y(P_{Aa}, P_{Ab})$ whenever there is no confusion. When $\mathrm{Bi\_C}(P_{Aa}, P_{Ab}) \ne \mathrm{NULL}$, the eight cases shown in (7) cover all possible cases for $\langle P_{Aa}, P_{Ab} \rangle$. The completeness of the eight cases shown in (7) for $\langle P_{Aa}, P_{Ab} \rangle$ can be proved by induction on the total number of $T$, $X$, and $Y$ appearing in $\langle P_{Aa}, P_{Ab} \rangle$. Using the pedigree in Figure 2, Cases 1–3 and Case 6 are illustrated in (8), (9), (10), and (11).
Case 1: $T$
Case 2: $X^{+}$
Case 3: $TX^{+}$
Case 4: $T(X^{+}Y)^{+}$
Case 5: $T(X^{+}Y)^{+}X^{+}$
Case 6: $X^{+}Y$
Case 7: $X^{+}(YX^{+})^{+}$
Case 8: $X^{+}(YX^{+})^{+}Y$
(7)
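The eight cases in (7) are ordinary regular expressions over the alphabet $\{T, X, Y\}$, so a pattern string can be classified mechanically. The sketch below assumes a path-pair has already been summarized as such a string, with one letter per occurrence of a pattern; the names `CASES`, `classify`, and `has_two_overlap_edge` are illustrative and not from the paper.

```python
import re

# The eight cases of (7), written as Python regular expressions.
CASES = [
    (1, r"T"),
    (2, r"X+"),
    (3, r"TX+"),
    (4, r"T(X+Y)+"),
    (5, r"T(X+Y)+X+"),
    (6, r"X+Y"),
    (7, r"X+(YX+)+"),
    (8, r"X+(YX+)+Y"),
]


def classify(pattern):
    """Return the case number of (7) that the whole string matches, else None."""
    for number, regex in CASES:
        if re.fullmatch(regex, pattern):
            return number
    return None


def has_two_overlap_edge(pattern):
    """True when the pattern contains Y, i.e. a 2-overlap path not starting at A."""
    return "Y" in pattern


for example in ["T", "XX", "TX", "TXY", "TXYX", "XY", "XYX", "XYXY"]:
    print(example, classify(example), has_two_overlap_edge(example))
```

The eight expressions are mutually exclusive, so the order in which they are tried does not affect the result; strings such as `Y` or `TT` match none of them.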
$$\left.\begin{aligned} &A \to s \to e \to t \to a \\ &A \to s \to e \to t \to b \end{aligned}\right\} \in T, \tag{8}$$

where $s$, $e$, $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

$$\left.\begin{aligned} &A \to s \to e \to t \to a \\ &A \to s \to f \to t \to b \end{aligned}\right\} \in TX, \tag{9}$$

where $s$ is a 2-overlap individual and the overlap path is a root 2-overlap path; $t$ is a crossover individual.

$$\left.\begin{aligned} &A \to s \to e \to t \to a \\ &A \to d \to f \to t \to b \end{aligned}\right\} \in X, \tag{10}$$

where $t$ is a crossover individual.

$$\left.\begin{aligned} &A \to c \to e \to t \to a \\ &A \to s \to e \to t \to b \end{aligned}\right\} \in XY, \tag{11}$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.

Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. [Scenarios $S_0$, $S_1$, $S_2$, $S_3$ over the nodes $P_{Aa}$, $P_{Ab}$, $P_{Ac}$.]
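The examples (8)–(11) suggest a simple way to read the pattern string off two concrete paths. The sketch below is one possible reading, inferred from those four examples rather than from a formal definition in this section: a shared individual reached through different incoming edges is treated as a crossover ($X$), a shared edge leaving $A$ starts a root 2-overlap path ($T$), and a shared edge below a crossover starts a 2-overlap path ($Y$). The function name `pair_pattern` and this edge-based rule are assumptions.

```python
def pair_pattern(p1, p2):
    """Return the T/X/Y string for two paths given as lists that start at A."""
    root = p1[0]
    assert p2[0] == root, "both paths must start at the same common ancestor"
    position_in_p2 = {v: i for i, v in enumerate(p2)}
    symbols = []
    for i in range(1, len(p1)):
        v = p1[i]
        if v not in position_in_p2:
            continue  # v is not shared between the two paths
        pred1 = p1[i - 1]
        pred2 = p2[position_in_p2[v] - 1]
        if pred1 != pred2:
            symbols.append("X")  # entered through different edges: crossover
        elif pred1 == root:
            symbols.append("T")  # shared edge leaving A: root 2-overlap starts
        elif symbols[-1] == "X":
            symbols.append("Y")  # shared edge below a crossover: 2-overlap starts
        # otherwise v only extends the overlap path already recorded
    return "".join(symbols)


examples = {
    8: (["A", "s", "e", "t", "a"], ["A", "s", "e", "t", "b"]),
    9: (["A", "s", "e", "t", "a"], ["A", "s", "f", "t", "b"]),
    10: (["A", "s", "e", "t", "a"], ["A", "d", "f", "t", "b"]),
    11: (["A", "c", "e", "t", "a"], ["A", "s", "e", "t", "b"]),
}
for number, (pa, pb) in examples.items():
    print(number, pair_pattern(pa, pb))
```

Under this reading the four path-pairs yield the same strings as stated in (8)–(11); whether the rule agrees with the paper's definitions on every pedigree is not established here.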
3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.

In Figure 5, the scenario $S_0$ has no edges, so it means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple1 is an example of $S_0$. Next, we introduce a lemma which can assist with identifying the options for the edges in the scenarios $S_1$–$S_3$.
Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.
Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele-level starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.
Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree; (b) inheritance paths. [Both panels are drawn over the individuals $A$, $a$, $b$, $c$, and $p_1$–$p_8$.]
For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}\colon A \to a$; $P_{Ab}\colon A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}\colon A \to p_4 \to p_6 \to p_7 \to c$.

For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).
For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele-level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.
Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.
Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.
3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. [$B_1$ has the nodes $P_{Aa}$ and $P_{Ab}$; its edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. $B_2$ has the nodes $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$; there can be at most one edge belonging to root overlap (either $T$ or $TX$).]

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ to $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Considering the dependency among $\{R_1, R_2, R_3\}$, we use the natural join operator, denoted as $\bowtie$, operating on $\{R_1, R_2, R_3\}$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.

Figure 8: A graphical illustration for obtaining $T_3$. [$S_3$, with edges $e_1$, $e_2$, $e_3$, is decomposed into $u_1$, $u_2$, $u_3$, and $T_3 = R_1 \bowtie R_2 \bowtie R_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.]
For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
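The construction of $T_3 = R_1 \bowtie R_2 \bowtie R_3$ can be sketched with a small hand-written natural join. The sketch assumes that each $u_i$ is a copy of $B_2$ covering two of the three edges of $S_3$ (taken here as $(e_1, e_2)$, $(e_2, e_3)$, and $(e_1, e_3)$, which is an assumption about the decomposition in Figure 8), that every edge takes one of the options $T$, $X$, $TX$, and that $B_2$ allows at most one root-overlap edge. The names `OPTIONS`, `ROOT`, `building_block_b2`, and `natural_join` are illustrative.

```python
from itertools import product

OPTIONS = ("T", "X", "TX")  # Case 1, Case 2, Case 3 for a single edge
ROOT = {"T", "TX"}          # options that involve a root 2-overlap


def building_block_b2(edge_i, edge_j):
    """Acceptable cases for B2: at most one of its two edges is a root overlap."""
    return [
        {edge_i: x, edge_j: y}
        for x, y in product(OPTIONS, repeat=2)
        if not (x in ROOT and y in ROOT)
    ]


def natural_join(r, s):
    """Relational natural join of two lists of attribute -> value rows."""
    joined = []
    for row_r in r:
        for row_s in s:
            shared = row_r.keys() & row_s.keys()
            if all(row_r[k] == row_s[k] for k in shared):
                joined.append({**row_r, **row_s})
    return joined


# Assumed decomposition of S3 into three copies of B2.
R1 = building_block_b2("e1", "e2")
R2 = building_block_b2("e2", "e3")
R3 = building_block_b2("e1", "e3")
T3 = natural_join(natural_join(R1, R2), R3)

for case in T3:
    print(case)
```

Under these assumptions the join leaves one all-crossover case plus the cases with exactly one root-overlap edge, in line with options (1) and (2) above for $k = 3$.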
3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals to a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ to $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
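The three steps above can be written as an edge-set transformation. In the sketch below the pedigree is a set of directed (parent, child) edges, and the function name `split_crossover` and its arguments are hypothetical. Steps (2) and (3) follow the list above; the treatment of the incoming edges of $s$ (the predecessor on $P_{Aa}$ is attached to $s_1$ and the predecessor on $P_{Ab}$ to $s_2$) is not stated in the steps and is an assumption read off the paths drawn in Figure 9.

```python
def split_crossover(edges, s, a_side, b_side):
    """Split crossover individual s.

    a_side = (predecessor of s on P_Aa, successor of s on P_Aa)
    b_side = (predecessor of s on P_Ab, successor of s on P_Ab)
    """
    (x, a_child), (w, b_child) = a_side, b_side
    s1, s2 = s + "1", s + "2"                  # step (1): two copies of s
    out = set(edges) - {(x, s), (w, s), (s, a_child), (s, b_child)}
    out |= {(x, s1), (w, s2)}                  # assumption taken from Figure 9
    out |= {(s1, a_child), (s2, b_child)}      # step (2)
    out |= {(s2, a_child), (s1, b_child)}      # step (3)
    return out


# Hypothetical graph shaped like G_{k+1} in Figure 9.
G = {
    ("A", "x"), ("A", "w"), ("A", "c"),
    ("x", "s"), ("w", "s"),
    ("s", "a'"), ("s", "b'"),
    ("a'", "a"), ("b'", "b"),
}
G_split = split_crossover(G, "s", a_side=("x", "a'"), b_side=("w", "b'"))

path_a = ["A", "x", "s1", "a'", "a"]
path_b = ["A", "w", "s2", "b'", "b"]
print(sorted(G_split))
print(set(path_a) & set(path_b))
```

After the split, the two paths of the example no longer share $s$, so the path-pair has one crossover individual fewer, which is the property Lemma 4 below states.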
Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.
Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; it means that $b'$ is able to reach $a$ and $b$ from two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.
Next, we introduce the canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has no crossover individuals.
Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that
(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$,
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,
then we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.
Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, there exists a canonical graph $G'$ for $G$.
Figure 9: Transforming pedigree graph $G_{k+1}$, having $k + 1$ crossover individuals, into $G_k$, having $k$ crossover individuals. In $G_{k+1}$, the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of $A \to \cdots \to x \to s \to a' \to \cdots \to a$, $A \to \cdots \to w \to s \to b' \to \cdots \to b$, and $A \to c$. In $G_k$, it consists of $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$, $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$, and $A \to c$.
Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ (scenarios $S_0$ to $S_{10}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.
Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.
In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AA} \right), \tag{12}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \tag{13}$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable to $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
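To make the arithmetic in (12) and (13) concrete, here is a minimal Python sketch that evaluates the contribution of a single acceptable path-triple. The function names and the convention of passing the root 2-overlap length as an optional argument are assumptions of this example; enumerating and classifying the path-triples of a real pedigree is not covered.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return HALF * (1 + F_A)


def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)


def triple_contribution(len_a, len_b, len_c, F_A=Fraction(0), len_root_overlap=None):
    """Contribution of one acceptable path-triple rooted at A, per (12)-(13).

    len_a, len_b, len_c : lengths of P_Aa, P_Ab, P_Ac
    len_root_overlap    : length of the root 2-overlap path P_As (Type 2),
                          or None when there is no root 2-overlap (Type 1)
    """
    total = len_a + len_b + len_c
    if len_root_overlap is None:
        # Type 1: L_triple is the plain sum of the three path lengths.
        return HALF ** total * phi_AAA(F_A)
    # Type 2: the shared root segment is counted once instead of twice.
    l_triple = total - len_root_overlap
    return HALF ** (l_triple + 1) * phi_AA(F_A)


# Toy pedigree 1 (assumed): a, b, c are children of a noninbred A.
type1 = triple_contribution(1, 1, 1)
# Toy pedigree 2 (assumed): A -> s -> a, A -> s -> b, A -> c.
type2 = triple_contribution(2, 2, 1, len_root_overlap=1)
```

Exact fractions are used so that the powers of 1/2 are not rounded; a floating-point version would work the same way.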
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios, $S_0$ to $S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks: $B_1$, $B_2$, and $B_3$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable to Figure 11. For $B_3$, we only consider root overlap because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$ because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
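The construction $T_2 = R_1 \times R_2$ can be pictured with a short Python sketch. The case labels below are placeholders invented for the example, not the actual acceptable cases of $B_1$; only the use of the Cartesian product mirrors the text.

```python
from itertools import product

# Hypothetical acceptable cases for the two disconnected building blocks of S_2.
R1 = ["u1_case_1", "u1_case_2"]
R2 = ["u2_case_1", "u2_case_2", "u2_case_3"]


def combine_cases(r1, r2):
    """Return all (case for u1, case for u2) pairs, i.e. R1 x R2."""
    return list(product(r1, r2))


T2 = combine_cases(R1, R2)
```

Because $u_1$ and $u_2$ share no edges, every pairing is kept, so the size of `T2` is the product of the sizes of `R1` and `R2`.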
Figure 11: Building blocks $B_1$, $B_2$, and $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For $B_3$, $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$ and all three edges belong to root overlap (i.e., having root 3-overlap).
Table 2: Largest subgraph $S_j$ of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$:   $S_4$   $S_5$   $S_7$   $S_8$   $S_9$   $S_{10}$
$S_j$:   $S_3$   $S_3$   $S_6$   $S_5$   $S_7$   $S_9$
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.
Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:
(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.
For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.
For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply the same idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$, an acceptable case has the following properties:
(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$;
Type 1: zero root 2-overlap and zero root 3-overlap path;
Type 2: one root 2-overlap path $P_{As}$ ending at $s$;
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases} \tag{15}$$

and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable to $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
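The case analysis in (14) and (15) can be written down as a small Python sketch that evaluates one acceptable path-quad. The function name `quad_contribution` and its argument conventions (a tuple of root 2-overlap lengths and an optional root 3-overlap length) are assumptions of this example; it rejects combinations outside the acceptable-case properties of Section 3.2.2.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def quad_contribution(path_lengths, F_A=Fraction(0), two_overlaps=(), three_overlap=None):
    """Contribution of one acceptable path-quad rooted at A, per (14)-(15).

    path_lengths  : lengths of P_Aa, P_Ab, P_Ac, P_Ad
    two_overlaps  : lengths of the root 2-overlap paths (zero, one, or two)
    three_overlap : length of the root 3-overlap path P_At, or None
    """
    total = sum(path_lengths)
    phi_AA = HALF * (1 + F_A)
    phi_AAA = Fraction(1, 4) * (1 + 3 * F_A)
    phi_AAAA = Fraction(1, 8) * (1 + 7 * F_A)
    n2 = len(two_overlaps)

    if three_overlap is None:
        if n2 == 0:                                   # Type 1
            return HALF ** total * phi_AAAA
        if n2 == 1:                                   # Type 2
            l_quad = total - two_overlaps[0]
            return HALF ** (l_quad + 1) * phi_AAA
        if n2 == 2:                                   # Type 3, Case 1
            l_quad = total - sum(two_overlaps)
            return HALF ** (l_quad + 2) * phi_AA
    else:
        if n2 == 0:                                   # Type 3, Case 2
            l_quad = total - 2 * three_overlap
            return HALF ** (l_quad + 2) * phi_AA
        if n2 == 1:                                   # Type 3, Case 3
            l_quad = total - three_overlap - two_overlaps[0]
            return HALF ** (l_quad + 2) * phi_AA
    raise ValueError("not an acceptable case for a path-quad")


# Toy pedigree (assumed): a, b, c, d are children of a noninbred A.
type1 = quad_contribution([1, 1, 1, 1])
# Toy pedigree (assumed): A -> t -> a, A -> t -> b, A -> t -> c, A -> d.
case2 = quad_contribution([2, 2, 2, 1], three_overlap=1)
```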
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
Figure 12: Examples of 2-pair-path-quads $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ for $\Phi_{ab,cd}$. (a) $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. (b) $A \to x \to a$, $A \to x \to d$, $A \to y \to b$, and $A \to y \to c$.
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b\rangle$ and $\langle c, d\rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, shown in Figure 13. The options for an edge are those built from the symbols $T$ and $X$ (refer to Section 3.1.1 for their definitions). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$) and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps and $r_1 = r_2 \neq h$
 | One root homo-overlap and two root heter-overlaps and $h = r_1 \neq r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
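The definition of gen can be turned into a short recursive computation. The sketch below assumes the pedigree is given as a dictionary from each individual to a tuple of its known parents (empty for progenitors); this representation and the name `generations` are illustrative choices, not part of the original method.

```python
def generations(parents):
    """Return a dict mapping every individual to its generation number.

    parents : dict mapping individual -> tuple of its known parents
              (length 0 for a progenitor, otherwise 1 or 2)
    """
    gen = {}

    def visit(ind):
        if ind not in gen:
            ps = parents.get(ind, ())
            # Progenitors have generation 0; otherwise take the larger
            # parental generation (or the single one) and add 1.
            gen[ind] = 0 if not ps else max(visit(p) for p in ps) + 1
        return gen[ind]

    for ind in parents:
        visit(ind)
    return gen


# Toy pedigree (assumed): p1 and p2 are progenitors.
pedigree = {
    "p1": (), "p2": (),
    "f": ("p1",),            # one known parent
    "m": ("p1", "p2"),
    "a": ("f", "m"),
    "b": ("a", "p2"),        # parents from different generations
}
gen = generations(pedigree)
```

For very deep pedigrees an iterative traversal in topological order would avoid Python's recursion limit.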
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios $S_0$ to $S_{16}$ of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ at path-pair level.
Type 1: zero root homo-overlap and zero root heter-overlap.
Type 2: zero root homo-overlap and one root heter-overlap path $P_{Ar}$ ending at $r$.
Type 3: either of the following two cases (17):
(i) zero root homo-overlap and two root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively;
(ii) one root homo-overlap path $P_{Ah}$ ending at $h$ and two root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, with $r_1 \neq r_2$.
Type 4: one root homo-overlap path $P_{Ah}$ ending at $h$ and two root heter-overlap paths ending at $r_1$ and $r_2$, with $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$ ($S \neq T$), there is one type (i.e., Type 5).
Type 5: $\langle P_{Sa}, P_{Sb}\rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td}\rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb}\rangle$ or $\langle P_{Tc}, P_{Td}\rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$B = \begin{cases} S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\ S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\ T & \text{otherwise}, \end{cases}$$

$$\begin{aligned} L_{\text{2-pair}} &= \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{Ah}} & \text{for Type 4}, \end{cases} \\ L_{\langle P_{Sa}, P_{Sb}\rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \\ L_{\langle P_{Tc}, P_{Td}\rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \end{aligned} \tag{18}$$
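As a hedged illustration of how (16) and (18) combine for a single 2-pair-path-pair with $S = T = A$, the Python sketch below evaluates one term of Type 1 to Type 4 and also encodes the rule for choosing $B$ in Type 5. The function names, the integer `pair_type` argument, and the `overlaps` tuple are conventions invented for this example; classifying a path configuration into a type is assumed to be done elsewhere.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def two_pair_term(path_lengths, pair_type, F_A=Fraction(0), overlaps=()):
    """One Type 1-4 term of (16), with L_2-pair taken from (18).

    path_lengths : lengths of P_Aa, P_Ab, P_Ac, P_Ad
    pair_type    : 1, 2, 3, or 4
    overlaps     : ()                   for Type 1
                   (L_PAr,)             for Type 2
                   (L_PAr1, L_PAr2)     for Type 3
                   (L_PAh,)             for Type 4
    """
    total = sum(path_lengths)
    phi_AA = HALF * (1 + F_A)
    phi_AAA = Fraction(1, 4) * (1 + 3 * F_A)
    if pair_type == 1:
        return HALF ** total * phi_AAA
    if pair_type == 2:
        return HALF ** (total - overlaps[0] + 1) * phi_AAA
    if pair_type == 3:
        return HALF ** (total - overlaps[0] - overlaps[1] + 2) * phi_AA
    if pair_type == 4:
        return HALF ** (total - 2 * overlaps[0] + 1) * phi_AA
    raise ValueError("pair_type must be 1, 2, 3, or 4")


def choose_B(S, T, gen, num_parents):
    """Pick B for a Type 5 pair (S, T) following the case definition of B."""
    if gen[S] < gen[T]:
        return S
    if gen[S] == gen[T] and num_parents[T] == 2:
        return S
    return T


# Figure 12(b): four paths of length 2 with root heter-overlaps A -> x and A -> y.
fig12b = two_pair_term([2, 2, 2, 2], 3, overlaps=(1, 1))
```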
Note that if $\langle a, b\rangle$ and $\langle c, d\rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb}\rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td}\rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb}\rangle}$ and $L_{\langle P_{Tc}, P_{Td}\rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph, and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.
(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element: the empty string.
(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in $\mathrm{NC}(u)$, a code $xi{*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
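The two assignment rules can be sketched in Python as follows. This is an illustrative reading rather than the implementation used in the experiments: the dictionary-based graph, the function name `node_codes`, and the gender letters "M" and "F" are assumptions. To make sure $\mathrm{NC}(u)$ is complete before it is propagated to $u$'s children, the sketch visits a node only after all of its parents have been visited (a queue-based topological order), which is one way to realize the traversal described above.

```python
from collections import deque


def node_codes(children, gender, root):
    """Assign NodeCodes to every node reachable from `root`.

    children : dict mapping node -> list of its children in sibling order
    gender   : dict mapping node -> gender marker appended to each code
    root     : the single progenitor (possibly a virtual node)
    """
    # Count the parents of each node so a node is expanded only once all
    # of its parents have contributed their codes.
    pending = {root: 0}
    for u, kids in children.items():
        pending.setdefault(u, 0)
        for v in kids:
            pending[v] = pending.get(v, 0) + 1

    nc = {u: set() for u in pending}
    nc[root].add("")                      # rule (1): the empty string
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in nc[u]:               # rule (2): extend each code of u
                nc[v].add(f"{x}{i}{gender[v]}")
            pending[v] -= 1
            if pending[v] == 0:
                queue.append(v)
    return nc


# Toy pedigree (assumed): r is a virtual root above progenitors f and m.
kids = {"r": ["f", "m"], "f": ["a"], "m": ["a", "b"]}
sex = {"f": "M", "m": "F", "a": "F", "b": "M"}
codes = node_codes(kids, sex, "r")
```

In this toy pedigree, individual `a` receives two codes, one for the path through each parent, which is what lets paths be compared by prefix matching.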
Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus the number of individuals in the pedigree, for Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, for Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficients computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ (scenarios $S_0$ to $S_3$, including the case in which $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means that $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:
$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1} \Phi_{AA} \right), \tag{A.1}$$

where $A$ is a common ancestor of $a$ and $b$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap;
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

$$\begin{aligned} L_{\text{triple}} &= \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \\ L_{\langle P_{Aa}, P_{Ab}\rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \end{aligned} \tag{A.2}$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
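As a small worked instance of (A.1), consider an assumed toy pedigree in which $a$ and $b$ are half siblings whose only common ancestor is a noninbred parent $A$ (so $F_A = 0$). The only path from $A$ to $a$ is $A \to a$, so $P_{Aa_1}$ and $P_{Aa_2}$ are identical and the path-pair is mergeable; the condensed pair $\langle P_{Aa}, P_{Ab}\rangle = \langle A \to a, A \to b\rangle$ is nonoverlapping, which is Type 3. The pedigree and the numbers below are illustrative and are not taken from the experiments.

```latex
\begin{align*}
L_{\langle P_{Aa}, P_{Ab}\rangle} &= L_{P_{Aa}} + L_{P_{Ab}} = 1 + 1 = 2, \\
\Phi_{AA} &= \tfrac{1}{2}\,(1 + F_A) = \tfrac{1}{2}, \\
\Phi_{aab} &= \left(\tfrac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle} + 1} \Phi_{AA}
            = \left(\tfrac{1}{2}\right)^{3} \cdot \tfrac{1}{2}
            = \tfrac{1}{16}.
\end{align*}
```

No Type 1 or Type 2 term arises here because there is no second, distinct path from $A$ to $a$.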
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$\Phi_{aabc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}-1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AA} \right) + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+2} \Phi_{AA} \right), \tag{A.3}$$

where $A$ is a common ancestor of $a$, $b$, and $c$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path;
Type 2: one root 2-overlap path $P_{As}$ ending at $s$;
Type 3 (A.4):
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path;
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$\begin{aligned} L_{\text{quad}} &= \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases} \\ L_{\text{triple}} &= \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \end{aligned} \tag{A.5}$$
Note that if $a$ is an ancestor of either $b$ or $c$ or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A3 Path-Counting Formula for Φ119886119886119886119887
A special case of⟨1198751198601198861 1198751198601198862 1198751198601198863⟩ for ⟨119875
1198601198861 1198751198601198862 1198751198601198863 119875119860119887⟩ is introduced
when ⟨1198751198601198861 1198751198601198862 1198751198601198863⟩ is mergeable With the existence of
a mergeable path-triple ⟨1198751198601198861 1198751198601198862 1198751198601198863 119875119860119887⟩ can be con-
densed to ⟨119875119860119886 119875119860119887⟩
Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.
Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Hence, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
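$$
\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right)
\tag{A6}
$$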
where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.
$$
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}\\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}
\end{cases}
\qquad
L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}
\tag{A7}
$$
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
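As a quick sanity check of (A6), consider a hypothetical toy pedigree that is not taken from the paper: a founder $A$ with two children $a$ and $b$, each having $A$ as its only known parent. The only path from $A$ to $a$ is the single edge $A \to a$, so $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable and only Type 3 contributes. The sketch below assumes $\Phi_{AA} = 1/2$ for a non-inbred founder and, for the cross-check, Karigl's recursion $\Phi_{aaab} = \frac{1}{4}(\Phi_{ab} + 3\Phi_{fmb})$ with the term involving the missing parent $m$ set to zero.

```latex
\begin{align*}
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} = 1 + 1 = 2,\\
\text{(A6):}\quad \Phi_{aaab} &= \left(\tfrac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}
  = \left(\tfrac{1}{2}\right)^{4}\cdot\tfrac{1}{2} = \tfrac{1}{32},\\
\text{recursion:}\quad \Phi_{aaab} &= \tfrac{1}{4}\,\Phi_{ab}
  = \tfrac{1}{4}\left(\tfrac{1}{2}\right)^{2}\Phi_{AA} = \tfrac{1}{32}.
\end{align*}
```

Both routes agree on this toy case, which is the behavior the Type 3 term is meant to capture.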
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$.
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor

Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct the dependency graph shown in Figure 17, which is consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):

(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.
(ii) Case 2.3 has no cycle and depends only on $\Phi_{ab}$. Thus we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.
(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (The legend distinguishes parent-child from ancestor-descendant relationships.)
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are two basic scenarios among $a$, $b$, and $c$: (i) one individual is a parent of another; (ii) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as

$$
\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA},
\qquad\text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}.
$$

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.
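The two basis values above can be reproduced numerically. The following is a minimal sketch, not the authors' implementation: it assumes the recursions $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$ quoted in this appendix, plus Karigl's boundary rules $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$ and $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$, and it treats any coefficient involving a missing parent as zero. The function name `make_phi`, the dictionary encoding of the pedigree, and the two toy pedigrees (my reading of Figure 18) are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

HALF = Fraction(1, 2)

def make_phi(parents):
    """Return (phi2, phi3) for a pedigree {child: (father, mother)}.

    None marks a missing parent; individuals absent from the dict are founders.
    Hypothetical helper for illustration only.
    """
    def pa(x):
        return parents.get(x, (None, None))

    @lru_cache(maxsize=None)
    def depth(x):
        # Longest distance to a founder; an ancestor always has a smaller depth.
        return 1 + max((depth(p) for p in pa(x) if p is not None), default=0)

    def key(x):
        # Ties are broken by name so that equal individuals stay adjacent when sorted.
        return (depth(x), x)

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)
        if a == b:
            f, m = pa(a)
            return HALF * (1 + phi2(f, m))
        if key(a) < key(b):
            a, b = b, a  # recurse on the deeper individual, which is never an ancestor
        f, m = pa(a)
        return HALF * (phi2(f, b) + phi2(m, b))

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        if a is None or b is None or c is None:
            return Fraction(0)
        a, b, c = sorted((a, b, c), key=key, reverse=True)  # deepest first
        f, m = pa(a)
        if a == b == c:
            return Fraction(1, 4) * (1 + 3 * phi2(f, m))
        if a == b:
            return HALF * (phi2(a, c) + phi3(f, m, c))
        return HALF * (phi3(f, b, c) + phi3(m, b, c))

    return phi2, phi3

# Figure 18(a), as read here: c is the only known parent of both a and b.
_, phi3_a = make_phi({"a": ("c", None), "b": ("c", None)})
# Figure 18(b), as read here: A is the only known parent of a, b, and c.
_, phi3_b = make_phi({"a": ("A", None), "b": ("A", None), "c": ("A", None)})

print(phi3_a("a", "b", "c"))  # 1/16 = (1/2)^2 * Phi_ccc with Phi_ccc = 1/4
print(phi3_b("a", "b", "c"))  # 1/32 = (1/2)^3 * Phi_AAA with Phi_AAA = 1/4
```

Exact `Fraction` arithmetic is used so that the comparison with the path-counting values is not blurred by floating-point rounding.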
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k+1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition of the induction hypothesis.

For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}}(1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$. Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ has only one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain
$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc}
 = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}
 = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},\\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}.
\end{aligned}
\tag{B2}
$$
Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}}(1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}}(1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.

Thus the claim is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios among $a$, $b$, and $c$: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another.
Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as

$$
\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA},
\qquad\text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}
$$

and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
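The exponent bookkeeping for the three basis pedigrees can be sketched in a few lines. This is an illustrative helper under stated assumptions, not code from the paper: `triple_contribution` is a hypothetical name, the path lengths are my reading of Figure 20 (edge counts of $P_{Aa}$, $P_{Ab}$, $P_{Ac}$ and of the root overlap path $P_{As}$), and $\Phi_{AA} = 1/2$ assumes a non-inbred founder.

```python
from fractions import Fraction

def triple_contribution(path_lengths, root_overlap_length, phi_root):
    """Contribution of one path-triple to Phi_abc (hypothetical helper).

    path_lengths: edge counts of (P_Aa, P_Ab, P_Ac).
    root_overlap_length: edge count of the root overlap path P_As,
        or None when the path-triple has no root overlap (Type 1).
    phi_root: Phi_AAA for Type 1, Phi_AA for Type 2.
    """
    total = sum(path_lengths)
    if root_overlap_length is None:
        exponent = total                            # Type 1: (1/2)^L * Phi_AAA
    else:
        exponent = total - root_overlap_length + 1  # Type 2: (1/2)^(L+1) * Phi_AA
    return Fraction(1, 2) ** exponent * phi_root

PHI_AA_FOUNDER = Fraction(1, 2)  # assumption: the root is a non-inbred founder

# Figure 20(a): c -> b -> a, root c, overlap path c -> b.
fig20a = triple_contribution((2, 1, 0), 1, PHI_AA_FOUNDER)
# Figure 20(b): A -> b -> a and A -> c, overlap path A -> b.
fig20b = triple_contribution((2, 1, 1), 1, PHI_AA_FOUNDER)
# Figure 20(c): A -> s -> a, A -> s -> b, A -> c, overlap path A -> s.
fig20c = triple_contribution((2, 2, 1), 1, PHI_AA_FOUNDER)

print(fig20a, fig20b, fig20c)  # 1/16 1/32 1/64
```

The exponents 3, 4, and 5 match the powers of $1/2$ obtained from the recursive formula above.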
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}}(1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain
$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc}
 = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}
 = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA},\\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}
 = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}.
\end{aligned}
\tag{B3}
$$
For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, the claim is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}}(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$ (that is, $m$ is missing),

$$
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab}.
\tag{B4}
$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P}(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P}(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P}(1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A1) is true for Case 2.3.
B.1.4. Correctness Proof for Cases 2.1 and 2.2

For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has been proven above. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors

Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (The legend distinguishes parent-child from ancestor-descendant relationships.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A1).

Proof. The proof is by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A1) are correct for $G$.

Induction Step. Now we show that the claim is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. That is, $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\; t_b\colon A_1 \to \cdots \to b,\; t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
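The decomposition $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$ can be illustrated on a hypothetical pedigree that is not one of the paper's figures: three full siblings $a$, $b$, $c$ whose parents $A_1$ and $A_2$ are unrelated founders. Neither founder is a descendant of the other, so this falls under case (1). The sketch assumes $\Phi_{A_iA_iA_i} = 1/4$ for a non-inbred founder (Karigl's boundary value); each founder contributes exactly one Type 1 path-triple made of three single-edge paths.

```latex
\begin{align*}
S(A_1) &= \left(\tfrac{1}{2}\right)^{1+1+1}\Phi_{A_1A_1A_1}
        = \tfrac{1}{8}\cdot\tfrac{1}{4} = \tfrac{1}{32},\\
\Phi_{abc}(G') &= \left(\tfrac{1}{2}\right)^{1+1+1}\Phi_{A_2A_2A_2}
        = \tfrac{1}{32}
        \quad\text{($G'$: $A_1$ and its outgoing edges removed)},\\
\Phi_{abc}(G) &= \Phi_{abc}(G') + S(A_1) = \tfrac{1}{32} + \tfrac{1}{32} = \tfrac{1}{16}.
\end{align*}
```

Expanding the recursive formulas (3) and (4) on the same pedigree, under the same founder assumptions, also gives $1/16$.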
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals

Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
Figure 23: Dependency graph for different cases for four individuals.
Cases for $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C1)

Then we construct the dependency graph shown in Figure 23 for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals

Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] "Surgeon General's New Family Health History Tool Is Released, Ready for '21st Century Medicine'," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 4: A flowchart for path-counting formula derivation.
for a path-pair. Then we discuss building blocks for path-triples and identify all acceptable cases, which are used in deriving the path-counting formula for $\Phi_{abc}$.
3.1.1. Cases for a Path-Pair

Given a path-pair $\langle P_{Aa}, P_{Ab}\rangle$ with $\mathit{Bi\_C}(P_{Aa}, P_{Ab}) \ne \mathit{NULL}$, where $A$ is a common ancestor of $a$ and $b$ and $\mathit{Bi\_C}(P_{Aa}, P_{Ab})$ consists of all common individuals shared between $P_{Aa}$ and $P_{Ab}$ except $A$, we introduce three patterns (i.e., crossover, 2-overlap, and root 2-overlap) to generate all possible cases for $\langle P_{Aa}, P_{Ab}\rangle$:
(1) $X(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ share one or multiple crossover individuals.
(2) $T(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are root 2-overlapping from $A$, and the root 2-overlap path can have one or multiple 2-overlap individuals.
(3) $Y(P_{Aa}, P_{Ab})$: $P_{Aa}$ and $P_{Ab}$ are overlapping but not from $A$, and the 2-overlap path can have one or multiple 2-overlap individuals.
Based on the three patterns 119883(119875119860119886 119875119860119887) 119879(119875
119860119886 119875119860119887)
and 119884(119875119860119886 119875119860119887) we use regular expressions to generate all
possible cases for the path-pair ⟨119875119860119886 119875119860119887⟩ For convenience
we drop ⟨119875119860119886 119875119860119887⟩ and use 119883119879 and 119884 instead of patterns
119883(119875119860119886 119875119860119887) 119879(119875
119860119886 119875119860119887) and 119884(119875
119860119886 119875119860119887) whenever there is
no confusion When 119861119894 119862(119875119860119886 119875119860119887) = 119873119880119871119871 the eight cases
shown in (7) cover all possible cases for ⟨119875119860119886 119875119860119887⟩ The com-
pleteness of eight cases shown in (7) for ⟨119875119860119886 119875119860119887⟩ can be
proved by induction on the total number of 119879 119883 and 119884appearing in ⟨119875
119860119886 119875119860119887⟩ Using the pedigree in Figure 2 Cases
1ndash3 and Case 6 are illustrated in (8) (9) (10) and (11)
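$$
\begin{aligned}
&\text{Case 1: } T, &\quad &\text{Case 2: } X^{+},\\
&\text{Case 3: } TX^{+}, &\quad &\text{Case 4: } T(X^{+}Y)^{+},\\
&\text{Case 5: } T(X^{+}Y)^{+}X^{+}, &\quad &\text{Case 6: } X^{+}Y,\\
&\text{Case 7: } X^{+}(YX^{+})^{+}, &\quad &\text{Case 8: } X^{+}(YX^{+})^{+}Y.
\end{aligned}
\tag{7}
$$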
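The eight cases in (7) are ordinary regular expressions over the alphabet $\{T, X, Y\}$, so membership can be checked mechanically. The following is a minimal sketch, assuming a path-pair has already been encoded as a string of pattern symbols; the names `CASES` and `classify` are hypothetical and not part of the original method.

```python
import re

# Regular expressions for the eight cases in (7); "+" means "one or more".
# CASES and classify are illustrative names, not taken from the paper.
CASES = {
    1: r"T",
    2: r"X+",
    3: r"TX+",
    4: r"T(X+Y)+",
    5: r"T(X+Y)+X+",
    6: r"X+Y",
    7: r"X+(YX+)+",
    8: r"X+(YX+)+Y",
}


def classify(pattern):
    """Return the case number of (7) that a T/X/Y string matches, or None."""
    for case, regex in CASES.items():
        if re.fullmatch(regex, pattern):
            return case
    return None


if __name__ == "__main__":
    for p in ["T", "TX", "X", "XY", "TXYX"]:
        print(p, "-> Case", classify(p))
```

The eight expressions are mutually exclusive, so the order in which they are tried does not affect the result.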
$$
\begin{aligned}
&A \rightarrow s \rightarrow e \rightarrow t \rightarrow a\\
&A \rightarrow s \rightarrow e \rightarrow t \rightarrow b
\end{aligned}
\;\in\; T,
\tag{8}
$$
Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ (scenarios $S_0$–$S_3$).
where $s$, $e$, and $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

$$
\begin{aligned}
&A \rightarrow s \rightarrow e \rightarrow t \rightarrow a\\
&A \rightarrow s \rightarrow f \rightarrow t \rightarrow b
\end{aligned}
\;\in\; TX,
\tag{9}
$$

where $s$ is a 2-overlap individual and the overlap path is a root 2-overlap path; $t$ is a crossover individual.

$$
\begin{aligned}
&A \rightarrow s \rightarrow e \rightarrow t \rightarrow a\\
&A \rightarrow d \rightarrow f \rightarrow t \rightarrow b
\end{aligned}
\;\in\; X,
\tag{10}
$$

where $t$ is a crossover individual.

$$
\begin{aligned}
&A \rightarrow c \rightarrow e \rightarrow t \rightarrow a\\
&A \rightarrow s \rightarrow e \rightarrow t \rightarrow b
\end{aligned}
\;\in\; XY,
\tag{11}
$$

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
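The examples in (8)–(11) suggest how a pattern string can be read off two concrete paths. The sketch below is illustrative only: it assumes each path is a list of individuals starting at the common ancestor $A$, that $a \neq b$, and that a shared individual is a 2-overlap individual when both paths reach it from the same predecessor and a crossover individual otherwise. The function name `pattern` and the one-symbol-per-run encoding are assumptions made here.

```python
def pattern(path_a, path_b):
    """Encode a path-pair as a string over T, X, Y (illustrative sketch).

    One "X" is emitted per crossover individual, one "T" per root
    2-overlap path, and one "Y" per 2-overlap path not starting at A.
    """
    root = path_a[0]
    pred_a = dict(zip(path_a[1:], path_a))  # individual -> predecessor on path_a
    pred_b = dict(zip(path_b[1:], path_b))  # individual -> predecessor on path_b
    tokens = []
    for v in path_a[1:]:
        if v not in pred_b:
            continue  # v is not shared by the two paths
        if pred_a[v] != pred_b[v]:
            tokens.append("X")  # reached via different parents: crossover
        elif pred_a[v] == root:
            tokens.append("T")  # overlap starting at A: root 2-overlap
        elif tokens[-1] == "X":
            tokens.append("Y")  # overlap starting at a crossover individual
        # otherwise v extends the current T or Y run; no new symbol
    return "".join(tokens)


if __name__ == "__main__":
    print(pattern(list("Aseta"), list("Asetb")))  # example (8)
    print(pattern(list("Aceta"), list("Asetb")))  # example (11)
```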
3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) except $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.
In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three independent paths. In Figure 2, path-triple 1 is an example of $S_0$. Next, we introduce a lemma that assists with identifying the options for the edges in the scenarios $S_1$–$S_3$.
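As an illustration of the path-pair level representation, the following sketch counts the edges among the three path nodes. It assumes each path is a list of individuals beginning at the common ancestor $A$ and that scenario $S_k$ is the one with $k$ edges; the function name `scenario` is hypothetical.

```python
from itertools import combinations


def scenario(paths):
    """Return k such that the path-triple falls under scenario S_k.

    An edge joins two path nodes when the two paths share at least one
    individual other than the common ancestor (the first element).
    """
    root = paths[0][0]
    edges = 0
    for p, q in combinations(paths, 2):
        if (set(p) & set(q)) - {root}:
            edges += 1
    return edges


if __name__ == "__main__":
    triple = [
        ["A", "a"],
        ["A", "p3", "p6", "p7", "b"],
        ["A", "p4", "p6", "p7", "c"],
    ]
    print("S%d" % scenario(triple))
```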
Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab}\rangle$, $\langle P_{Aa}, P_{Ac}\rangle$, and $\langle P_{Ab}, P_{Ac}\rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation of any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no contribution to $\Phi_{abc}$.
Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at allele level, starting from a triple-common ancestor $A$ to the three individuals $a$, $b$, and $c$.
Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree. (b) Inheritance paths.
For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ listed as follows: $P_{Aa}$: $A \rightarrow a$; $P_{Ab}$: $A \rightarrow p_3 \rightarrow p_6 \rightarrow p_7 \rightarrow b$; $P_{Ac}$: $A \rightarrow p_4 \rightarrow p_6 \rightarrow p_7 \rightarrow c$.

For $\langle P_{Ab}, P_{Ac}\rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \rightarrow p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At allele level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with a 2-overlap edge between $\langle P_{Ab}, P_{Ac}\rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.
Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.
Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$.
3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$, the edge can have three options: Case 1: $T$; Case 2: $X$; Case 3: $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Figure 8: A graphical illustration for obtaining $T_3$. The scenario $S_3$ with edges $e_1$, $e_2$, $e_3$ is decomposed into $u_1$, $u_2$, $u_3$, and $T_3 = R_1 \bowtie R_2 \bowtie R_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples denoted as $R_i$.

Considering the dependency among $\{R_1, R_2, R_3\}$, we use the natural join operator, denoted as $\bowtie$, operating on $\{R_1, R_2, R_3\}$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ in the scenario $S_3$.
For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap, and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but no 2-overlap path.
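The natural-join construction for $S_3$ can be sketched as follows. This is only an illustration under stated assumptions: each edge carries one of the labels $T$, $X$, $TX$; each $B_2$ block allows at most one root-overlap label ($T$ or $TX$) on its two edges; and the assignment of the edge pairs $(e_1, e_2)$, $(e_2, e_3)$, $(e_1, e_3)$ to $u_1$, $u_2$, $u_3$ is assumed here, since the figure is not reproduced. The helper names are hypothetical.

```python
from itertools import product

OPTIONS = ("T", "X", "TX")  # Case 1, Case 2, Case 3
ROOT = {"T", "TX"}          # labels that contain a root overlap


def block_b2(e, f):
    """Acceptable labelings of a B2 block on edges e and f."""
    return [
        {e: x, f: y}
        for x, y in product(OPTIONS, repeat=2)
        if not (x in ROOT and y in ROOT)  # at most one root overlap
    ]


def natural_join(r, s):
    """Join two lists of labelings on the edges they have in common."""
    joined = []
    for row_r in r:
        for row_s in s:
            common = row_r.keys() & row_s.keys()
            if all(row_r[k] == row_s[k] for k in common):
                joined.append({**row_r, **row_s})
    return joined


# Assumed decomposition of S3 into three B2 blocks.
r1 = block_b2("e1", "e2")
r2 = block_b2("e2", "e3")
r3 = block_b2("e1", "e3")
t3 = natural_join(natural_join(r1, r2), r3)

if __name__ == "__main__":
    for case in t3:
        print(case)
```

Under these assumptions, every labeling in `t3` has at most one root-overlap edge, which agrees with the two options listed above.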
3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \rightarrow a'$ and $s \rightarrow b'$ into $s_1 \rightarrow a'$ and $s_2 \rightarrow b'$, respectively;
(3) add two new edges $s_2 \rightarrow a'$ and $s_1 \rightarrow b'$.
Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.
Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, it can only be $a'$ or $b'$. Assume that $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ via two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.
Next, we introduce a canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has no crossover individuals.
Definition 5 (Canonical Graph). Let $G$ be a pedigree graph having one or more crossover individuals regarding $\Phi_{abc}$. Suppose there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has a corresponding acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$;
(ii) any acceptable path-triple in $G'$ has a corresponding acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$.

Then we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.
Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, there exists a canonical graph $G'$ for $G$.
Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. For $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ is $A \rightarrow \cdots \rightarrow x \rightarrow s \rightarrow a' \rightarrow \cdots \rightarrow a$; $A \rightarrow \cdots \rightarrow w \rightarrow s \rightarrow b' \rightarrow \cdots \rightarrow b$; $A \rightarrow c$. For $G_k$, $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ is $A \rightarrow \cdots \rightarrow x \rightarrow s_1 \rightarrow a' \rightarrow \cdots \rightarrow a$; $A \rightarrow \cdots \rightarrow w \rightarrow s_2 \rightarrow b' \rightarrow \cdots \rightarrow b$; $A \rightarrow c$. (The figure legend distinguishes ancestor-descendant relationships from parent-child relationships.)
Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ (scenarios $S_0$–$S_{10}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$
\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AA} \right),
\tag{12}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\tag{13}
$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
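To make (12) and (13) concrete, the sketch below evaluates the contribution of a single acceptable path-triple with exact rational arithmetic. It assumes the path lengths and the type of the path-triple are already known; the function names are hypothetical, and summing over all common ancestors and path-triples is left to the caller.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)


def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)


def triple_contribution(L_a, L_b, L_c, F_A=Fraction(0), L_s=None):
    """Contribution of one path-triple to Phi_abc following (12) and (13).

    L_s is None for Type 1 (no root 2-overlap); otherwise it is the
    length of the root 2-overlap path P_As (Type 2).
    """
    if L_s is None:
        L_triple = L_a + L_b + L_c
        return HALF ** L_triple * phi_AAA(F_A)
    L_triple = L_a + L_b + L_c - L_s
    return HALF ** (L_triple + 1) * phi_AA(F_A)


if __name__ == "__main__":
    # Three children of a non-inbred ancestor A: paths of length 1 each.
    print(triple_contribution(1, 1, 1))
    # P_Aa = A -> s -> a, P_Ab = A -> s -> b, P_Ac = A -> c (root overlap A -> s).
    print(triple_contribution(2, 2, 1, L_s=1))
```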
3.2. Path-Counting Formulas for Four Individuals
3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 also apply in Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that for $B_3$, if $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
Figure 11: Building blocks $B_1$, $B_2$, and $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For $B_3$, with $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$, all three edges belong to root overlap (i.e., having root 3-overlap).
Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$ | $S_4$ | $S_5$ | $S_7$ | $S_8$ | $S_9$ | $S_{10}$
$S_j$ | $S_3$ | $S_3$ | $S_6$ | $S_5$ | $S_7$ | $S_9$
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.
Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no $S_k$ exists such that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.
For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$
\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 2} \Phi_{AA} \right),
\tag{14}
$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; Type 1: zero root 2-overlap and zero root 3-overlap path; Type 2: one root 2-overlap path $P_{As}$ ending at $s$; Type 3 comprises three cases:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

$$
L_{\text{quad}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
\tag{15}
$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
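Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with $P_{Aa}$: $A \rightarrow s \rightarrow a$; $P_{Ab}$: $A \rightarrow s \rightarrow b$; $P_{Ac}$: $A \rightarrow t \rightarrow c$; $P_{Ad}$: $A \rightarrow t \rightarrow d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with $P_{Aa}$: $A \rightarrow x \rightarrow a$; $P_{Ab}$: $A \rightarrow y \rightarrow b$; $P_{Ac}$: $A \rightarrow y \rightarrow c$; $P_{Ad}$: $A \rightarrow x \rightarrow d$.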
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions
(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.
(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b\rangle$ and $\langle c, d\rangle$, if $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathrm{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathrm{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.
(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \rightarrow s$ and $A \rightarrow t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \rightarrow x$ and $A \rightarrow y$ are root heter-overlap paths.
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
 | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
 | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$
 | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
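The definition of $\mathrm{gen}$ translates directly into a memoized recursion. The following is a small sketch, assuming the pedigree is given as a dictionary mapping each individual to a tuple of its recorded parents (empty for a progenitor); the function name `generations` is hypothetical.

```python
def generations(parents):
    """Compute gen(x) for every individual in a pedigree.

    parents maps an individual to a tuple of zero, one, or two parents.
    Individuals that appear only as parents are treated as progenitors.
    """
    gen = {}

    def visit(x):
        if x not in gen:
            ps = parents.get(x, ())
            # A progenitor has generation 0; otherwise take the
            # largest parental generation plus 1.
            gen[x] = 0 if not ps else max(visit(p) for p in ps) + 1
        return gen[x]

    for individual in parents:
        visit(individual)
    return gen


if __name__ == "__main__":
    pedigree = {
        "p1": (), "p2": (),
        "f": ("p1",),
        "m": ("p1", "p2"),
        "a": ("f", "m"),
        "b": ("a", "p2"),
    }
    print(generations(pedigree))
```

For very deep pedigrees an iterative topological traversal would avoid Python's recursion limit; the recursive form is used here only because it mirrors the definition.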
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 1} \Phi_{AAA} \\
& \qquad + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}} + 1} \Phi_{AA} \Biggr) \\
& + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle} + 1} \Phi_{BB},
\end{aligned}
\tag{16}
$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ at path-pair level (scenarios $S_0$–$S_{16}$).
Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either of the following two cases:

(i) zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively;
(ii) one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, with $r_1 \neq r_2$. (17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, with $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb}\rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td}\rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb}\rangle$ or $\langle P_{Tc}, P_{Td}\rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$
B =
\begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T),\\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents},\\
T & \text{otherwise},
\end{cases}
$$

$$
\begin{aligned}
L_{2\text{-pair}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4},
\end{cases}\\
L_{\langle P_{Sa}, P_{Sb}\rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5},\\
L_{\langle P_{Tc}, P_{Td}\rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}.
\end{aligned}
\tag{18}
$$
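The quantities in (16) and (18) for the case $S = T = A$, together with the choice of $B$, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the type of each 2-pair-path-pair is assumed to be known, the generation and parent-count lookups are plain dictionaries, and all function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def choose_B(S, T, gen, n_parents):
    """Select B for the Type 5 term according to the definition of B."""
    if gen[S] < gen[T]:
        return S
    if gen[S] == gen[T] and n_parents[T] == 2:
        return S
    return T


def l_two_pair(lengths, kind, L_r=0, L_r2=0, L_h=0):
    """L_2-pair from (18) for Types 1 to 4; lengths holds four path lengths."""
    total = sum(lengths)
    if kind == 1:
        return total
    if kind == 2:
        return total - L_r
    if kind == 3:
        return total - L_r - L_r2
    if kind == 4:
        return total - 2 * L_h
    raise ValueError("kind must be 1, 2, 3, or 4")


def two_pair_contribution(lengths, kind, F_A=Fraction(0), **overlaps):
    """Contribution of one Type 1-4 case to Phi_{ab,cd} following (16)."""
    L = l_two_pair(lengths, kind, **overlaps)
    phi_AA = Fraction(1, 2) * (1 + F_A)
    phi_AAA = Fraction(1, 4) * (1 + 3 * F_A)
    if kind == 1:
        return HALF ** L * phi_AAA
    if kind == 2:
        return HALF ** (L + 1) * phi_AAA
    if kind == 3:
        return HALF ** (L + 2) * phi_AA
    return HALF ** (L + 1) * phi_AA  # Type 4


if __name__ == "__main__":
    # Four children of a non-inbred ancestor A, no root overlaps (Type 1).
    print(two_pair_contribution([1, 1, 1, 1], 1))
    print(choose_B("S", "T", {"S": 1, "T": 2}, {"S": 2, "T": 2}))
```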
Note that if $\langle a, b\rangle$ and $\langle c, d\rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle}} \Phi_{SS} \cdot \Phi_{TT}.
\tag{19}
$$
Type 6: $\langle P_{Sa}, P_{Sb}\rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td}\rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb}\rangle}$ and $L_{\langle P_{Tc}, P_{Td}\rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of the recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with in-degree 0). (For simplicity, we assume there is one progenitor $r$ that is the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted NC($u$), is assigned using a breadth-first-search traversal starting from $r$ as follows:
(1) If $u$ is $r$, then NC($r$) contains only one element, the empty string.
(2) Otherwise, let $u$ be a node with NC($u$) and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in NC($u$), a code $xi^{*}$ is added to NC($v_i$), where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
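The two assignment rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `assign_nodecodes`, the dictionary-based graph, the tuple representation of a code, and the one-letter gender labels are all assumptions made for the example. Because an individual has two parents, the sketch expands a node only after all of its parents have been expanded (a queue-based topological order), which keeps every NC($u$) complete before it is extended to the children of $u$.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Return {node: set of codes}; each code is a tuple of 'iG' steps.

    children: dict mapping a node to its children in sibling order.
    gender:   dict mapping a node to a one-letter label (hypothetical encoding).
    root:     the single progenitor r (a virtual root if there are several founders).
    """
    # Count incoming edges so a node is expanded only after all of its parents.
    indegree = {root: 0}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: set() for u in indegree}
    codes[root].add(())              # rule (1): NC(r) holds only the empty code
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in codes[u]:       # rule (2): extend every code of u by "i*"
                codes[v].add(x + (f"{i}{gender[v]}",))
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return codes

# Toy pedigree: r has children f and m, who are both parents of a.
pedigree = {"r": ["f", "m"], "f": ["a"], "m": ["a"]}
sex = {"r": "M", "f": "M", "m": "F", "a": "F"}
nodecodes = assign_nodecodes(pedigree, sex, "r")
```

In this toy pedigree, `a` receives two codes, one per path from `r`, which is the property that lets path-based queries be answered from the codes alone.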
Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.
In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.
We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.
In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.
In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.
Figure 15 shows the effect of depth on the computation efficiency improvement. The improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus number of individuals in the pedigree, for Recursive and NodeCodes).
Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, for depths 4 to 18, for Recursive and NodeCodes).
4. Conclusion
We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients as a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices
A. Path-Counting Formulas of Special Cases
A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ (panels $S_0$, $S_1$, $S_2$, and $S_3$; if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the pair is drawn as a single path $P_{Aa}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.
Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.
Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.
Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:
$$\Phi_{aab} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}\right), \tag{A.1}$$
where $A$ is a common ancestor of $a$ and $b$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.2}$$
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
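As a quick sanity check of (A.1), consider a hypothetical three-node pedigree (not taken from the paper) in which a non-inbred founder $A$ is the only known parent of both $a$ and $b$. Only one path leads from $A$ to $a$, so $P_{Aa_1}$ and $P_{Aa_2}$ coincide and the single path-triple is of Type 3. Assuming $\Phi_{AA} = 1/2$ for a non-inbred founder, the Type 3 term agrees with the recursive formula (4), $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$, evaluated with a missing second parent:

```latex
\begin{aligned}
L_{\langle P_{Aa}, P_{Ab}\rangle} &= 1 + 1 = 2, \\
\text{(A.1), Type 3:}\quad \Phi_{aab} &= \left(\tfrac{1}{2}\right)^{2+1}\Phi_{AA}
  = \tfrac{1}{8}\cdot\tfrac{1}{2} = \tfrac{1}{16}, \\
\text{recursion (4):}\quad \Phi_{aab} &= \tfrac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right)
  = \tfrac{1}{2}\left(\left(\tfrac{1}{2}\right)^{2}\Phi_{AA} + 0\right) = \tfrac{1}{16}.
\end{aligned}
```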
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.
Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:
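$$\Phi_{aabc} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\right) + \sum_{A}\left(\sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA} + \sum_{\text{Type 5}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\right), \tag{A.3}$$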
where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively.
(A.4)
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases}$$
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \tag{A.5}$$
Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.
Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.
Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the default mergeable path-pair.
Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
$$\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right), \tag{A.6}$$
where $A$ is a common ancestor of $a$ and $b$.
When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.7}$$
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
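A small worked instance may help in reading (A.6). Take a hypothetical pedigree (not from the paper) in which a non-inbred founder $A$ is the only known parent of both $a$ and $b$, so that all three paths from $A$ to $a$ coincide and only Type 3 contributes. The cross-check below assumes $\Phi_{AA} = 1/2$ and uses the recursion $\Phi_{aaab} = \frac{1}{4}(\Phi_{ab} + 3\Phi_{fmb})$ from [10], which is not restated in this appendix and is quoted here as an assumption:

```latex
\begin{aligned}
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} = 1 + 1 = 2, \\
\text{(A.6), Type 3:}\quad \Phi_{aaab} &= \left(\tfrac{1}{2}\right)^{2+2}\Phi_{AA}
  = \tfrac{1}{16}\cdot\tfrac{1}{2} = \tfrac{1}{32}, \\
\text{recursion:}\quad \Phi_{aaab} &= \tfrac{1}{4}\left(\Phi_{ab} + 3\,\Phi_{fmb}\right)
  = \tfrac{1}{4}\left(\tfrac{1}{8} + 0\right) = \tfrac{1}{32}.
\end{aligned}
```

The factor $3/2$ in front of the Type 1 and Type 2 sums plays the same role for the $3\Phi_{fmb}$ term when both parents of $a$ are known.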
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).
B. Proof for Path-Counting Formulas of Three Individuals
We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.
For $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.
For $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B.1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.
Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):
(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.
(ii) For Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.
(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.
Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.
Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edge legend: parent-child relationship and ancestor-descendant relationship.)
B.1.1. Correctness Proof for Case 3.1
Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.
Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.
For Figure 18(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.
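The recursive side of this basis check can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the implementation used in the experiments: the helper name `make_phi` and the `parents` dictionary are hypothetical, a missing parent is encoded as `None` and contributes 0, and the boundary recursions $\Phi_{aa} = (1 + \Phi_{fm})/2$ and $\Phi_{aaa} = (1 + 3\Phi_{fm})/4$ are the standard forms from [10], which this appendix does not restate. The sketch always expands the deepest individual, which guarantees that it is not an ancestor of the others.

```python
from fractions import Fraction
from functools import lru_cache

def make_phi(parents):
    """parents: dict child -> (father, mother); None marks a missing parent."""

    @lru_cache(maxsize=None)
    def depth(x):
        # Longest distance to a founder; an ancestor is always shallower.
        known = [p for p in parents.get(x, (None, None)) if p is not None]
        return 1 + max(map(depth, known)) if known else 0

    @lru_cache(maxsize=None)
    def phi(*ids):
        if None in ids:                      # a missing parent contributes 0
            return Fraction(0)
        # Deepest individual first: it cannot be an ancestor of the rest.
        ids = sorted(ids, key=lambda x: (depth(x), x), reverse=True)
        a, rest = ids[0], ids[1:]
        f, m = parents.get(a, (None, None))
        copies = ids.count(a)
        if len(ids) == 2:
            if copies == 2:                  # Phi_aa = (1 + Phi_fm) / 2
                return (1 + phi(f, m)) / 2
            return (phi(f, rest[0]) + phi(m, rest[0])) / 2
        if copies == 3:                      # Phi_aaa = (1 + 3 Phi_fm) / 4
            return (1 + 3 * phi(f, m)) / 4
        if copies == 2:                      # formula (4): Phi_aab
            b = rest[1]
            return (phi(a, b) + phi(f, m, b)) / 2
        return (phi(f, *rest) + phi(m, *rest)) / 2   # formula (3): Phi_abc

    return phi

# Figure 18(a): c is the only known parent of a and b.
fig18a = make_phi({"a": ("c", None), "b": ("c", None)})("a", "b", "c")
# Figure 18(b): A is the only known parent of a, b, and c.
fig18b = make_phi({"a": ("A", None), "b": ("A", None), "c": ("A", None)})("a", "b", "c")
```

With a non-inbred founder, $\Phi_{ccc} = \Phi_{AAA} = 1/4$, so the two values are $(1/2)^{2}\cdot\frac{1}{4} = \frac{1}{16}$ and $(1/2)^{3}\cdot\frac{1}{4} = \frac{1}{32}$, matching both derivations in the basis.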
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.
For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k+1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$.
Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain
$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}. \end{aligned} \tag{B.2}$$
Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.
Thus it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2
Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.
Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.
For Figure 20(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
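The Type 1 and Type 2 terms used in the two basis arguments are simple enough to evaluate directly. The sketch below is only an arithmetic aid: the function name `triple_term` is hypothetical, the path lengths for Figure 20 are inferred here from the recursive expansions quoted above rather than read off the figure, and $\Phi_{AA} = 1/2$ assumes a non-inbred founder.

```python
from fractions import Fraction

def triple_term(path_lengths, phi_root, overlap_length=None):
    """Contribution of one path-triple <P_Aa, P_Ab, P_Ac> to Phi_abc.

    path_lengths:   (L_PAa, L_PAb, L_PAc), the edge counts of the three paths.
    phi_root:       Phi_AAA for Type 1, Phi_AA for Type 2.
    overlap_length: None for Type 1; L_PAs of the root overlap path for Type 2.
    """
    total = sum(path_lengths)
    if overlap_length is None:                        # Type 1: (1/2)^L * Phi_AAA
        return Fraction(1, 2) ** total * phi_root
    # Type 2: (1/2)^(L + 1) * Phi_AA with L = sum of lengths - L_PAs
    return Fraction(1, 2) ** (total - overlap_length + 1) * phi_root

PHI_FOUNDER_PAIR = Fraction(1, 2)   # Phi_AA of a non-inbred founder (assumption)

# Figure 20(a): c -> b -> a; the root is c and the root overlap path is c -> b.
fig20a = triple_term((2, 1, 0), PHI_FOUNDER_PAIR, overlap_length=1)
# Figure 20(b): A -> b -> a and A -> c; the root overlap path is A -> b.
fig20b = triple_term((2, 1, 1), PHI_FOUNDER_PAIR, overlap_length=1)
# Figure 20(c): A -> s -> a, A -> s -> b, and A -> c; the root overlap path is A -> s.
fig20c = triple_term((2, 2, 1), PHI_FOUNDER_PAIR, overlap_length=1)
```

The three values correspond to $(1/2)^{3}\Phi_{cc}$, $(1/2)^{4}\Phi_{AA}$, and $(1/2)^{5}\Phi_{AA}$ from the basis, evaluated with $\Phi_{cc} = \Phi_{AA} = 1/2$.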
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.
For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.
In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain
$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}. \end{aligned} \tag{B.3}$$
For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).
In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3
Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.
Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.
Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,
$$\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.
Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.
For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.
In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
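Since the proof of Case 2.3 reduces $\Phi_{aab}$ to Wright's formula for $\Phi_{ab}$, a brute-force version of that formula is a convenient reference point. The following Python sketch is an assumption-laden illustration, not the NodeCodes-based method: `wright_kinship` is a hypothetical name, paths are enumerated explicitly (exponential in general), a path-pair is treated as nonoverlapping when the two paths share no individual other than $A$ (the classical reading of Wright's formula; the paper's own definition appears earlier and is not restated here), and $\Phi_{AA} = (1 + \Phi_{fm})/2$ is evaluated recursively.

```python
from fractions import Fraction

def wright_kinship(parents, a, b):
    """Phi_ab by explicit enumeration of nonoverlapping path-pairs.

    parents: dict child -> (father, mother); None marks a missing parent.
    """
    children = {}
    for child, ps in parents.items():
        for p in ps:
            if p is not None:
                children.setdefault(p, []).append(child)
    individuals = set(parents) | set(children)

    def paths(src, dst):
        # All directed paths src -> ... -> dst, as tuples of individuals.
        if src == dst:
            yield (src,)
        for v in children.get(src, ()):
            for tail in paths(v, dst):
                yield (src,) + tail

    def phi_self(x):
        f, m = parents.get(x, (None, None))
        if f is None or m is None:
            return Fraction(1, 2)            # Phi_fm = 0 when a parent is missing
        return (1 + phi(f, m)) / 2

    def phi(x, y):
        if x == y:
            return phi_self(x)
        total = Fraction(0)
        for A in individuals:
            for pa in paths(A, x):
                for pb in paths(A, y):
                    if set(pa) & set(pb) == {A}:      # share only the root A
                        edges = len(pa) + len(pb) - 2
                        total += Fraction(1, 2) ** edges * phi_self(A)
        return total

    return phi(a, b)

full_sibs = {"a": ("f", "m"), "b": ("f", "m")}
phi_sibs = wright_kinship(full_sibs, "a", "b")
phi_parent_child = wright_kinship(full_sibs, "a", "f")
```

For a pedigree such as Figure 22(a), where $a$ has a single known parent, (B.4) then gives $\Phi_{aab}$ as one half of the value returned for $\Phi_{ab}$.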
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Edge legend: parent-child relationship and ancestor-descendant relationship.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).
Proof. The proof is by induction on the number of triple-common ancestors.
Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.
Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.
Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
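The selection of $A_1$ can be made concrete with a few set operations. The Python sketch below is illustrative only: the function names and the `parents` dictionary are hypothetical, and an individual is counted among its own ancestors, an assumption consistent with Figure 18(a), where $c$ itself is the triple-common ancestor of $a$, $b$, and $c$.

```python
def ancestors_or_self(parents, x):
    """All ancestors of x, including x itself (reflexive by assumption)."""
    seen, stack = set(), [x]
    while stack:
        u = stack.pop()
        if u is not None and u not in seen:
            seen.add(u)
            stack.extend(parents.get(u, ()))
    return seen

def triple_common_ancestors(parents, a, b, c):
    """Individuals that are ancestors (or self) of all of a, b, and c."""
    return (ancestors_or_self(parents, a)
            & ancestors_or_self(parents, b)
            & ancestors_or_self(parents, c))

def topmost(parents, common):
    """Members of `common` with no other member of `common` as an ancestor."""
    return {A for A in common
            if not (ancestors_or_self(parents, A) - {A}) & common}

# Toy pedigree: A1 -> A2; A2 -> a, b, c; A1 -> c as well.
toy = {"A2": ("A1", None), "a": ("A2", None), "b": ("A2", None), "c": ("A1", "A2")}
common = triple_common_ancestors(toy, "a", "b", "c")
top = topmost(toy, common)
```

In this toy pedigree the set of triple-common ancestors is $\{A_1, A_2\}$ and the only topmost member is $A_1$, which is the individual removed from $G$ in the induction step.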
Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$, which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.
For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:
(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.
Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\ t_b\colon A_1 \to \cdots \to b,\ t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.
If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
C. Proof for Four Individuals and Two Pairs of Individuals
Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.
For $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
For $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
For $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C.1)
Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.
According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
Figure 23: Dependency graph for different cases for four individuals.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformations from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 5: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ (scenarios $S_0$–$S_3$ over the nodes $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$).
where $s$, $e$, and $t$ are 2-overlap individuals and the overlap path is a root 2-overlap path.

\[
\begin{aligned}
&A \to s \to e \to t \to a \\
&A \to s \to f \to t \to b
\end{aligned}
\;\in\; TX, \tag{9}
\]

where $s$ is a 2-overlap individual and the overlap path is a root 2-overlap path; $t$ is a crossover individual.

\[
\begin{aligned}
&A \to s \to e \to t \to a \\
&A \to d \to f \to t \to b
\end{aligned}
\;\in\; X, \tag{10}
\]

where $t$ is a crossover individual.

\[
\begin{aligned}
&A \to c \to e \to t \to a \\
&A \to s \to e \to t \to b
\end{aligned}
\;\in\; XY, \tag{11}
\]

where $e$ is a crossover individual, $t$ is a 2-overlap individual, and the overlap path is a 2-overlap path.
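As an illustration of the labels used in (9)-(11), the following Python sketch classifies a path-pair by inspecting the individuals shared by its two paths. It is a minimal sketch under stated assumptions rather than the authors' implementation: each path is a list of individual names starting at the common ancestor, the function name `classify_path_pair` is hypothetical, and the rule "same predecessor means overlap, different predecessors means crossover" is our reading of the definitions in Section 3.1.1.

```python
def classify_path_pair(path_1, path_2):
    """Return a label over {T, X, Y} for two paths starting at the same ancestor.

    T: a shared individual lies on a root 2-overlap path (common prefix from A).
    X: a shared individual is reached through different parents (crossover).
    Y: a shared individual is reached through the same parent, but the
       overlap path does not extend back to A (non-root 2-overlap).
    """
    position_in_2 = {node: j for j, node in enumerate(path_2)}
    labels = set()
    for i, node in enumerate(path_1):
        j = position_in_2.get(node)
        if i == 0 or j is None or j == 0:
            continue  # skip the common ancestor and unshared individuals
        if path_1[i - 1] != path_2[j - 1]:
            labels.add("X")  # different parents: crossover individual
        elif path_1[: i + 1] == path_2[: j + 1]:
            labels.add("T")  # same parent, overlap reaches the ancestor
        else:
            labels.add("Y")  # same parent, overlap does not reach the ancestor
    return "".join(label for label in "TXY" if label in labels)


# The three path-pairs of (9), (10), and (11).
pair_9 = classify_path_pair(["A", "s", "e", "t", "a"], ["A", "s", "f", "t", "b"])
pair_10 = classify_path_pair(["A", "s", "e", "t", "a"], ["A", "d", "f", "t", "b"])
pair_11 = classify_path_pair(["A", "c", "e", "t", "a"], ["A", "s", "e", "t", "b"])
```

Under these assumptions the three calls reproduce the labels $TX$, $X$, and $XY$ stated in (9)-(11).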
3.1.2. Path-Pair Level Graphical Representation of a Path-Triple. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, we represent each path as a node. The path-triple can be decomposed into three path-pairs (i.e., $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$). For each path-pair, if the two paths share at least one common individual (i.e., either a 2-overlap individual or a crossover individual) other than $A$, then there is an edge between the two nodes representing the two paths. Therefore, we obtain four different scenarios $S_0$–$S_3$, shown in Figure 5.

In Figure 5, the scenario $S_0$ has no edges, which means that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths. In Figure 2, path-triple 1 is an example of $S_0$. Next, we introduce a lemma that assists in identifying the options for the edges in the scenarios $S_1$–$S_3$.
Lemma 3. Given a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, consider the three path-pairs $\langle P_{Aa}, P_{Ab} \rangle$, $\langle P_{Aa}, P_{Ac} \rangle$, and $\langle P_{Ab}, P_{Ac} \rangle$. If there is a 2-overlap edge, which is represented by $Y$ in the regular expression representation, in any of the three path-pairs, then the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no contribution to $\Phi_{abc}$.

Proof. In [17], Nadot and Vaysseix proposed, from a genetic and biological point of view, that $\Phi_{abc}$ can be evaluated by enumerating all eligible inheritance paths at the allele level, starting from a triple-common ancestor $A$ and ending at the three individuals $a$, $b$, and $c$.
Figure 6: Examples of pedigree and inheritance paths. (a) Pedigree (individuals $A$, $p_1$–$p_8$, $a$, $b$, and $c$). (b) Inheritance paths.
For the pedigree in Figure 6, let us consider the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ listed as follows: $P_{Aa}$: $A \to a$; $P_{Ab}$: $A \to p_3 \to p_6 \to p_7 \to b$; $P_{Ac}$: $A \to p_4 \to p_6 \to p_7 \to c$.

For $\langle P_{Ab}, P_{Ac} \rangle$, $p_6$ is a crossover individual, $p_7$ is an overlap individual, and $p_6 \to p_7$ is a 2-overlap edge, represented by $Y$ in the regular expression representation (see the definition of $Y$ in Section 3.1.1).

For the individual $p_6$, let us denote the two alleles at one fixed autosomal locus as $g_1$ and $g_2$. At the allele level, only one allele can be passed down from $p_6$ to $p_7$. Since $p_3$ and $p_4$ are the parents of $p_6$, $g_1$ is passed down from one parent and $g_2$ is passed down from the other parent. It is infeasible to pass down both $g_1$ and $g_2$ from $p_6$ to $p_7$. In other words, there are no corresponding inheritance paths for a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with a 2-overlap edge in $\langle P_{Ab}, P_{Ac} \rangle$ (i.e., Case 6: $XY$). Therefore, such path-triples have no contribution to $\Phi_{abc}$.
Figure 6(b) shows one example of eligible inheritance paths corresponding to a pedigree graph. Each individual is represented by two allele nodes. The eligible inheritance paths in Figure 6(b) consist of red edges only.

Only Case 1, Case 2, and Case 3 do not have $Y$ in the regular expression representation of a path-pair (see (7)). Considering the scenarios $S_1$–$S_3$ shown in Figure 5, an edge can have three options: Case 1, $T$; Case 2, $X$; Case 3, $TX$.
3.1.3. Constructing Cases for a Path-Triple. For the scenarios $S_1$–$S_3$ in Figure 5, we define two building blocks $\{B_1, B_2\}$, along with some rules in Figure 7, to generate acceptable cases. For $B_1$, the edge can have three options: Case 1, $T$; Case 2, $X$; Case 3, $TX$. For $B_2$, we cannot allow both edges to be root overlap, because if two edges are root overlap, then $P_{Aa}$ and $P_{Ac}$ must share at least one common individual other than $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Figure 7: Building blocks $\{B_1, B_2\}$ and basic rules. For $B_1$, the edge can have three options: Case 1, $T$; Case 2, $X$; Case 3, $TX$. For $B_2$, there can be at most one edge belonging to root overlap (either $T$ or $TX$).

Next, we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ into $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples, denoted as $R_i$.

Figure 8: A graphical illustration for obtaining $T_3$, with $T_3 = R_1 \bowtie R_2 \bowtie R_3$ over the edges $e_1$, $e_2$, and $e_3$ of $S_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.

Considering the dependency among $R_1$, $R_2$, and $R_3$, we use the natural join operator, denoted as $\bowtie$, operating on $R_1$, $R_2$, and $R_3$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in the scenario $S_3$.
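The natural join used to obtain $T_3$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are represented as lists of dictionaries keyed by edge name, the helper names `acceptable_b2` and `natural_join` are hypothetical, and the assignment of edge pairs to $u_1$, $u_2$, and $u_3$ (here $(e_1, e_2)$, $(e_2, e_3)$, and $(e_1, e_3)$) is our assumption, since it cannot be read off the text alone.

```python
from itertools import product

EDGE_OPTIONS = ("T", "X", "TX")  # Case 1, Case 2, Case 3
ROOT_OVERLAP = {"T", "TX"}       # options that involve a root overlap


def acceptable_b2(edge_names):
    """All labelings of a two-edge block B2 with at most one root-overlap edge."""
    rows = []
    for labels in product(EDGE_OPTIONS, repeat=2):
        if sum(label in ROOT_OVERLAP for label in labels) <= 1:
            rows.append(dict(zip(edge_names, labels)))
    return rows


def natural_join(left, right):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            # Rows combine only if they agree on every shared edge.
            if all(left_row[key] == right_row[key] for key in shared):
                joined.append({**left_row, **right_row})
    return joined


R1 = acceptable_b2(("e1", "e2"))  # assumed edges of u1
R2 = acceptable_b2(("e2", "e3"))  # assumed edges of u2
R3 = acceptable_b2(("e1", "e3"))  # assumed edges of u3
T3 = natural_join(natural_join(R1, R2), R3)
```

With these assumptions every row of `T3` has at most one root-overlap edge, which agrees with the rule for $B_2$ applied to every pair of edges in $S_3$.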
For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$. The scenario $S_0$ has no edges, which shows that $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$), the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution of this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the derivation of the path-counting formula. We first use the example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ into $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
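The three steps of the splitting operator can be sketched on an edge-set representation of the pedigree graph. This is a hedged illustration, not the authors' code: the function name `split_crossover` and the `(parent, child)` edge encoding are hypothetical, and the rerouting of the incoming edges $x \to s$ and $w \to s$ to $x \to s_1$ and $w \to s_2$ is an assumption taken from the paths listed in Figure 9. The sketch also assumes that $s$ has exactly the parents $x$ and $w$ and the children $a'$ and $b'$.

```python
def split_crossover(edges, s, a_child, b_child, x_parent, w_parent):
    """Apply the splitting operator to the crossover individual s.

    edges is a set of (parent, child) pairs; the result is a new set.
    """
    s1, s2 = s + "1", s + "2"  # step (1): split s into s1 and s2
    # Keep every edge that does not touch s.
    new_edges = {(p, c) for (p, c) in edges if s not in (p, c)}
    # Assumption from Figure 9: x now leads to s1 and w leads to s2.
    new_edges |= {(x_parent, s1), (w_parent, s2)}
    # Step (2): s -> a' becomes s1 -> a', and s -> b' becomes s2 -> b'.
    new_edges |= {(s1, a_child), (s2, b_child)}
    # Step (3): add the two new edges s2 -> a' and s1 -> b'.
    new_edges |= {(s2, a_child), (s1, b_child)}
    return new_edges


example_edges = {
    ("A", "x"), ("A", "w"), ("x", "s"), ("w", "s"),
    ("s", "a'"), ("s", "b'"), ("A", "c"),
}
split_edges = split_crossover(example_edges, "s", "a'", "b'", "x", "w")
```

After the call, the paths $A \to x \to s_1 \to a'$ and $A \to w \to s_2 \to b'$ no longer share an individual below $A$ in this small example.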
Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If a new crossover node appears, the only possible node is either $a'$ or $b'$. Assume that $b'$ becomes a crossover individual; this means that $b'$ is able to reach $a$ and $b$ via two separate paths. This contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.
Next, we introduce the canonical graph, which results from applying the splitting operator to all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (canonical graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$, and
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

then we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$, there exists a canonical graph $G'$ for $G$.
Figure 9: Transforming a pedigree graph $G_{k+1}$ having $k + 1$ crossovers into $G_k$ having $k$ crossovers. For $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ = $\{A \to \cdots \to x \to s \to a' \to \cdots \to a;\ A \to \cdots \to w \to s \to b' \to \cdots \to b;\ A \to c\}$. For $G_k$, $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ = $\{A \to \cdots \to x \to s_1 \to a' \to \cdots \to a;\ A \to \cdots \to w \to s_2 \to b' \to \cdots \to b;\ A \to c\}$. The figure distinguishes ancestor-descendant relationships from parent-child relationships.

Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ (scenarios $S_0$–$S_{10}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between the paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers, by Lemma 4. By the induction hypothesis, $G_k$ has a canonical graph $G'$; since the splitting operator does not change the contribution of a path-triple to $\Phi_{abc}$, $G'$ is also a canonical graph for $G_{k+1}$.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

\[
\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left( \frac{1}{2} \right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left( \frac{1}{2} \right)^{L_{\text{triple}} + 1} \Phi_{AA} \right), \tag{12}
\]

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

\[
L_{\text{triple}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \tag{13}
\]

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
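To make (12) and (13) concrete, the following Python sketch evaluates the contribution of a single path-triple to $\Phi_{abc}$ with exact fractions. It is a sketch under stated assumptions: the function names are hypothetical, path lengths are counted in edges, and the caller is assumed to have already decided whether the path-triple is of Type 1 or Type 2.

```python
from fractions import Fraction


def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return Fraction(1, 2) * (1 + F_A)


def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)


def triple_contribution(len_a, len_b, len_c, F_A=Fraction(0), len_root_overlap=None):
    """Contribution of one path-triple to Phi_abc following (12) and (13).

    len_root_overlap is None for Type 1; for Type 2 it is the length of the
    root 2-overlap path P_As.
    """
    if len_root_overlap is None:  # Type 1
        L_triple = len_a + len_b + len_c
        return Fraction(1, 2) ** L_triple * phi_AAA(F_A)
    L_triple = len_a + len_b + len_c - len_root_overlap  # Type 2
    return Fraction(1, 2) ** (L_triple + 1) * phi_AA(F_A)


# Three children a, b, c of a non-inbred ancestor A (paths of length 1 each).
type_1_value = triple_contribution(1, 1, 1)
# Paths A -> s -> a, A -> s -> b, A -> c with root 2-overlap path A -> s.
type_2_value = triple_contribution(2, 2, 1, len_root_overlap=1)
```

Summing such contributions over all acceptable path-triples and all triple-common ancestors gives $\Phi_{abc}$ as in (12); the enumeration of path-triples is outside the scope of this sketch.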
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
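The Cartesian product for $S_2$ can be illustrated with a short Python sketch. The representation is an assumption made for illustration only: each relation is a list of dictionaries keyed by the building block name, and each $B_1$ edge takes the three options $T$, $X$, and $TX$ stated in the rules of Figure 7.

```python
from itertools import product

EDGE_OPTIONS = ("T", "X", "TX")  # the three options for a B1 edge

# R1 and R2: acceptable cases for the two disconnected B1 blocks u1 and u2.
R1 = [{"u1": option} for option in EDGE_OPTIONS]
R2 = [{"u2": option} for option in EDGE_OPTIONS]

# T2 = R1 x R2: every combination is acceptable because u1 and u2 share no edge.
T2 = [{**row_1, **row_2} for row_1, row_2 in product(R1, R2)]
```

Because the two blocks are disconnected, no combination has to be filtered out, in contrast to the natural join needed when blocks share an edge.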
Figure 11: Building blocks $\{B_1, B_2, B_3\}$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \neq \emptyset$ and all three edges belong to root overlap (i.e., having root 3-overlap).
Table 2: Largest subgraph $S_j$ of a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$).

$S_i$:  $S_4$  $S_5$  $S_7$  $S_8$  $S_9$  $S_{10}$
$S_j$:  $S_3$  $S_3$  $S_6$  $S_5$  $S_7$  $S_9$
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \neq 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (largest subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.
For a scenario $S_i$ ($4 \le i \le 10$ and $i \neq 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
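The two properties above, together with the types used in formula (14), can be written as a small Python predicate. This is a sketch for illustration: the function name `quad_type` is hypothetical, and treating more than one root 3-overlap path as not acceptable is our assumption, based on the cases that appear in (14).

```python
def quad_type(n_root_2_overlap, n_root_3_overlap):
    """Return "Type 1", "Type 2", or "Type 3" for an acceptable path-quad.

    Returns None when the counts violate properties (1) and (2).
    """
    # Assumption: an acceptable path-quad has zero or one root 3-overlap path.
    if n_root_3_overlap not in (0, 1) or n_root_2_overlap < 0:
        return None
    # Property (1): with a root 3-overlap, at most one root 2-overlap.
    # Property (2): otherwise, at most two root 2-overlaps.
    max_root_2 = 1 if n_root_3_overlap == 1 else 2
    if n_root_2_overlap > max_root_2:
        return None
    if n_root_3_overlap == 0 and n_root_2_overlap == 0:
        return "Type 1"
    if n_root_3_overlap == 0 and n_root_2_overlap == 1:
        return "Type 2"
    return "Type 3"  # two 2-overlaps, one 3-overlap, or one of each
```

For example, `quad_type(1, 1)` falls under Type 3, while `quad_type(2, 1)` is rejected by property (1).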
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

\[
\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left( \frac{1}{2} \right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left( \frac{1}{2} \right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left( \frac{1}{2} \right)^{L_{\text{quad}} + 2} \Phi_{AA} \right), \tag{14}
\]

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; Type 1: zero root 2-overlap and zero root 3-overlap path; Type 2: one root 2-overlap path $P_{As}$ ending at $s$; Type 3:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

\[
L_{\text{quad}} =
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases} \tag{15}
\]

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
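Formulas (14) and (15) can be read as a per-path-quad contribution, which the following Python sketch evaluates with exact fractions. The function name `quad_contribution` and its parameters are hypothetical; path lengths are counted in edges, and the caller is assumed to supply the lengths of the root overlap paths that have already been identified.

```python
from fractions import Fraction


def quad_contribution(path_lengths, F_A=Fraction(0), root_2_overlaps=(), root_3_overlap=None):
    """Contribution of one path-quad to Phi_abcd following (14) and (15).

    path_lengths: lengths of P_Aa, P_Ab, P_Ac, P_Ad.
    root_2_overlaps: lengths of the root 2-overlap paths (zero, one, or two).
    root_3_overlap: length of the root 3-overlap path, or None if absent.
    """
    total = sum(path_lengths)
    phi_AA = Fraction(1, 2) * (1 + F_A)
    phi_AAA = Fraction(1, 4) * (1 + 3 * F_A)
    phi_AAAA = Fraction(1, 8) * (1 + 7 * F_A)
    n2 = len(root_2_overlaps)
    if root_3_overlap is None and n2 == 0:  # Type 1
        return Fraction(1, 2) ** total * phi_AAAA
    if root_3_overlap is None and n2 == 1:  # Type 2
        L_quad = total - root_2_overlaps[0]
        return Fraction(1, 2) ** (L_quad + 1) * phi_AAA
    if root_3_overlap is None and n2 == 2:  # Type 3, Case 1
        L_quad = total - sum(root_2_overlaps)
    elif n2 == 0:                           # Type 3, Case 2
        L_quad = total - 2 * root_3_overlap
    elif n2 == 1:                           # Type 3, Case 3
        L_quad = total - root_3_overlap - root_2_overlaps[0]
    else:
        raise ValueError("not an acceptable path-quad")
    return Fraction(1, 2) ** (L_quad + 2) * phi_AA


# Four children of a non-inbred ancestor A: Type 1.
type_1_value = quad_contribution([1, 1, 1, 1])
# A -> t -> a, A -> t -> b, A -> t -> c, A -> d: Type 3, Case 2.
case_2_value = quad_contribution([2, 2, 2, 1], root_3_overlap=1)
```

As with the three-individual case, the full coefficient is obtained by summing such contributions over all acceptable path-quads and all quad-common ancestors.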
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
Figure 12: Examples of 2-pair-path-pairs for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ = $\{A \to s \to a;\ A \to s \to b;\ A \to t \to c;\ A \to t \to d\}$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ = $\{A \to x \to a;\ A \to y \to b;\ A \to y \to c;\ A \to x \to d\}$.
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
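Example 8 can be checked mechanically with a short Python sketch that finds root homo-overlap and root heter-overlap paths as common prefixes. The sketch rests on assumptions of our own: paths are lists of individual names starting at the quad-common ancestor, the function names are hypothetical, and a root overlap path is taken to be the longest common prefix of two paths when it extends beyond $A$.

```python
def root_overlap(path_1, path_2):
    """Longest common prefix of two paths from the same ancestor.

    Returns an empty list when the paths share only the ancestor itself.
    """
    prefix = []
    for node_1, node_2 in zip(path_1, path_2):
        if node_1 != node_2:
            break
        prefix.append(node_1)
    return prefix if len(prefix) > 1 else []


def root_overlaps(p_a, p_b, p_c, p_d):
    """Root homo-overlap and root heter-overlap paths of a 2-pair-path-pair."""
    homo_candidates = (root_overlap(p_a, p_b), root_overlap(p_c, p_d))
    heter_candidates = (root_overlap(i, j) for i in (p_a, p_b) for j in (p_c, p_d))
    homo = [path for path in homo_candidates if path]
    heter = [path for path in heter_candidates if path]
    return homo, heter


# Figure 12(a): A -> s -> a, A -> s -> b, A -> t -> c, A -> t -> d.
homo_a, heter_a = root_overlaps(
    ["A", "s", "a"], ["A", "s", "b"], ["A", "t", "c"], ["A", "t", "d"]
)
# Figure 12(b): A -> x -> a, A -> y -> b, A -> y -> c, A -> x -> d.
homo_b, heter_b = root_overlaps(
    ["A", "x", "a"], ["A", "y", "b"], ["A", "y", "c"], ["A", "x", "d"]
)
```

For (a) the sketch reports the two root homo-overlap paths $A \to s$ and $A \to t$, and for (b) the two root heter-overlap paths $A \to x$ and $A \to y$, in line with Example 8.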
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $T_X$, $T_X$ (refer to Section 3.1.1 for definitions of $T_X$ and $T_X$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

| $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ |
|---|---|
| Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap |
| One root 2-overlap path | One root homo-overlap and zero root heter-overlap; or zero root homo-overlap and one root heter-overlap |
| Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap; or zero root homo-overlap and two root heter-overlaps |
| One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$ |
| One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$; or one root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$ |

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define that the generation of a progenitor $p_i$ is 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
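The generation definition translates directly into a short recursive function. The sketch below is only an illustration under stated assumptions: the pedigree is a hypothetical dictionary `parents` mapping each individual to a tuple of its known parents (with `None` or an absent entry for a missing parent), and `make_gen` is a made-up helper name.

```python
from functools import lru_cache


def make_gen(parents):
    """Build gen() for a pedigree given as {individual: (father, mother)}.

    Missing parents are None; progenitors have no entry or only Nones.
    """
    @lru_cache(maxsize=None)
    def gen(x):
        known = [p for p in parents.get(x, ()) if p is not None]
        if not known:
            return 0  # progenitor
        # One parent: gen(p) + 1; two parents: MAX{gen(f), gen(m)} + 1.
        return max(gen(p) for p in known) + 1

    return gen


# Hypothetical pedigree: p1 -> f -> a, and m (a progenitor) -> a.
gen = make_gen({"f": ("p1", None), "a": ("f", "m")})
print(gen("p1"), gen("m"), gen("f"), gen("a"))
```

Memoizing with `lru_cache` keeps the computation linear in the number of parent-child edges, which matters on pedigrees of the size used in the experiments.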
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1} \Phi_{AAA} \\
& \qquad + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1} \Phi_{AA} \Biggr) \\
& + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB},
\end{aligned}
\tag{16}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$ to $S_{16}$ over the four paths $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$).
Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either of the following two cases (17):
(i) zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively;
(ii) one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 \neq r_2$.

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.
$$
B = \begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T & \text{otherwise},
\end{cases}
$$

$$
L_{2\text{-pair}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{Ah}} & \text{for Type 4},
\end{cases}
$$

$$
L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \qquad
L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}.
\tag{18}
$$
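The bookkeeping in (16) and (18) can be made concrete with a small sketch. It only evaluates the power of $1/2$ attached to one 2-pair-path-pair and the choice of $B$; it does not enumerate paths or compute the $\Phi$ factors. The function names (`choose_b`, `l_two_pair`, `weight`) and the convention of passing overlap lengths as a tuple are assumptions made for this illustration.

```python
from fractions import Fraction

# Extra halvings added to L_2-pair in (16), indexed by type.
EXTRA_HALVINGS = {1: 0, 2: 1, 3: 2, 4: 1}


def choose_b(s, t, gen_s, gen_t, t_has_two_parents):
    """Select B for a Type 5 term from the generations of S and T."""
    if gen_s < gen_t:
        return s
    if gen_s == gen_t and t_has_two_parents:
        return s
    return t


def l_two_pair(path_type, lengths, overlap_lengths=()):
    """Evaluate L_2-pair as in (18).

    lengths is (L_PAa, L_PAb, L_PAc, L_PAd).  overlap_lengths is
    (L_PAr,) for Type 2, (L_PAr1, L_PAr2) for Type 3, and (L_PAh,)
    for Type 4; it is ignored for Type 1.
    """
    total = sum(lengths)
    if path_type == 1:
        return total
    if path_type == 2:
        (l_r,) = overlap_lengths
        return total - l_r
    if path_type == 3:
        l_r1, l_r2 = overlap_lengths
        return total - l_r1 - l_r2
    if path_type == 4:
        (l_h,) = overlap_lengths
        return total - 2 * l_h
    raise ValueError("path_type must be 1, 2, 3, or 4")


def weight(path_type, lengths, overlap_lengths=()):
    """Power of 1/2 that multiplies the Phi factor for Types 1 to 4."""
    exponent = l_two_pair(path_type, lengths, overlap_lengths)
    return Fraction(1, 2) ** (exponent + EXTRA_HALVINGS[path_type])


print(weight(2, (2, 2, 2, 2), (1,)))
print(choose_b("S", "T", 2, 2, True))
```

Exact rational arithmetic via `Fraction` avoids rounding when many small terms are summed, which is convenient when checking a path-counting result against a recursive one.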
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}.
\tag{19}
$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of the recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with in-degree 0). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph, and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $NC(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows:

(1) If $u$ is $r$, then $NC(r)$ contains only one element, the empty string.

(2) Otherwise, let $u$ be a node with $NC(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $NC(u)$, a code $x\,i\,{*}$ is added to $NC(v_i)$, where $0 \le i \le k$ and ${*}$ indicates the gender of the individual represented by node $v_i$.
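The two assignment rules can be sketched as follows. This is an illustrative reading of the rules, not the NodeCodes implementation of [18]: the function name `assign_nodecodes`, the dictionary inputs, and the trailing `.` that terminates each step of a code (added so that multi-digit sibling indices stay unambiguous) are all assumptions. The traversal is breadth-first, but a child is enqueued only after every one of its parents has been processed, so that all of its codes are complete before they are extended.

```python
from collections import deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from root.

    children maps a node to its children in sibling order; gender maps
    a node to a one-letter marker such as 'M' or 'F'.
    """
    # Number of parents of each node that are still to be processed.
    pending = {}
    for kids in children.values():
        for v in kids:
            pending[v] = pending.get(v, 0) + 1

    codes = {root: [""]}  # rule (1): the progenitor gets the empty string
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            # Rule (2): extend every code x of u by "i" and the gender of v.
            new_codes = [f"{x}{i}{gender[v]}." for x in codes[u]]
            codes.setdefault(v, []).extend(new_codes)
            pending[v] -= 1
            if pending[v] == 0:
                queue.append(v)
    return codes


# Hypothetical pedigree: virtual root r with children f and m;
# a is a child of both f and m, b is a child of m only.
children = {"r": ["f", "m"], "f": ["a"], "m": ["a", "b"]}
gender = {"f": "M", "m": "F", "a": "F", "b": "M"}
nodecodes = assign_nodecodes(children, gender, "r")
print(nodecodes)
```

Each code of a node spells out one path from the progenitor, so a node with two parents carries at least two codes; prefix tests on these codes are what make ancestor and path queries cheap.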
The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.
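The text does not spell out how the improvement figures are derived from the measured times. One common reading, which is an assumption here and not something the paper states, is the relative reduction in average query time. The helper `improvement_percent` and the sample timings below are hypothetical.

```python
def improvement_percent(t_recursive_ms, t_nodecodes_ms):
    """Relative time saved by NodeCodes over Recursive, in percent.

    Assumed definition: 100 * (t_recursive - t_nodecodes) / t_recursive.
    """
    return 100.0 * (t_recursive_ms - t_nodecodes_ms) / t_recursive_ms


# Made-up timings, only to show the arithmetic.
print(improvement_percent(200.0, 50.0))
```

Under this reading, an improvement close to 100% means the NodeCodes time is a small fraction of the Recursive time.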
Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus the number of individuals in the pedigree, for Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, from 4 to 18, for Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on the computation of condensed identity coefficients by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$; if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the two paths are merged into $P_{Aa}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.

Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:
$$
\Phi_{aab} = \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA} \Biggr),
\tag{A.1}
$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.

Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\qquad
L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\tag{A.2}
$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to the scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$
\begin{aligned}
\Phi_{aabc} = {} & \sum_{A} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \Biggr) \\
& + \sum_{A} \Biggl( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \Biggr),
\end{aligned}
\tag{A.3}
$$

where $A$ is a common ancestor of $a$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.

Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

Type 3: one of the following three cases (A.4):
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.

Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$
L_{\text{quad}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
$$

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases}
\tag{A.5}
$$
Note that if $a$ is an ancestor of either $b$ or $c$, or of both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$, and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as a default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$
\Phi_{aaab} = \sum_{A} \Biggl( \frac{3}{2} \Biggl( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \Biggr) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \Biggr),
\tag{A.6}
$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\qquad
L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\tag{A.7}
$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{ab}$, and $\Phi_{AA}$).
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases (B.1).

For $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

For $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.
Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.

(ii) For Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.

(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Solid and dashed edges distinguish parent-child relationships from ancestor-descendant relationships.)
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\,\Phi_{cbc} = (1/2)^2\,\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\,\Phi_{Abc} = (1/2)^2\,\Phi_{AAc} = (1/2)^3\,\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows:

$$
\sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}.
$$

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}}\,\Phi_{ccc} = (1/2)^2\,\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}}\,\Phi_{AAA} = (1/2)^3\,\Phi_{AAA}$.
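The recursive side of this basis step can be checked numerically with a short sketch. It uses the recursions as they appear in this appendix, $\Phi_{abc} = \tfrac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = \tfrac{1}{2}(\Phi_{ab} + \Phi_{fmb})$, together with two standard identities from Karigl (1981) that are restated here as assumptions because formulas (3) to (5) are given outside this appendix: $\Phi_{aaa} = \tfrac{1}{4}(1 + 3\Phi_{fm})$ and $\Phi_{aa} = \tfrac{1}{2}(1 + \Phi_{fm})$. The names `make_phi`, `phi2`, and `phi3`, the dictionary pedigree format, and the rule of always recursing on the deepest individual are choices made for this illustration.

```python
from fractions import Fraction
from functools import lru_cache


def make_phi(parents):
    """Return (phi2, phi3) for a pedigree {individual: (father, mother)}.

    A missing parent is None; any coefficient involving None is 0.
    """
    def par(x):
        return parents.get(x, (None, None))

    @lru_cache(maxsize=None)
    def gen(x):
        known = [p for p in par(x) if p is not None]
        return 0 if not known else 1 + max(gen(p) for p in known)

    def by_depth(*xs):
        # Deepest first; ties broken by name so equal individuals are adjacent.
        return sorted(xs, key=lambda x: (gen(x), x), reverse=True)

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)
        a, b = by_depth(a, b)
        f, m = par(a)
        if a == b:
            return (1 + phi2(f, m)) / 2
        return (phi2(f, b) + phi2(m, b)) / 2

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        if a is None or b is None or c is None:
            return Fraction(0)
        a, b, c = by_depth(a, b, c)
        f, m = par(a)
        if a == b == c:
            return (1 + 3 * phi2(f, m)) / 4
        if a == b:
            return (phi2(a, c) + phi3(f, m, c)) / 2
        return (phi3(f, b, c) + phi3(m, b, c)) / 2

    return phi2, phi3


# Figure 18(a): c is the only parent of a and of b.
_, phi3_a = make_phi({"a": ("c", None), "b": ("c", None)})
# Figure 18(b): A is the only parent of a, b, and c.
_, phi3_b = make_phi({"a": ("A", None), "b": ("A", None), "c": ("A", None)})
print(phi3_a("a", "b", "c"), phi3_b("a", "b", "c"))
```

With a non-inbred founder, $\Phi_{ccc} = \Phi_{AAA} = 1/4$, so the two printed values can be compared directly with $(1/2)^2\,\Phi_{ccc}$ and $(1/2)^3\,\Phi_{AAA}$ from the basis step.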
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \rightarrow a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition of the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}}\,\Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ has only one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\,\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned}
\tag{B.2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1}\,\Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}}\,\Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.
Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\,\Phi_{bbc} = (1/2)^2\,\Phi_{bc} = (1/2)^3\,\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\,\Phi_{bbc} = (1/2)^2\,\Phi_{bc} = (1/2)^4\,\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2\,\Phi_{ssc} = (1/2)^3\,\Phi_{sc} = (1/2)^5\,\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as follows:

$$
\sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}, \quad \text{where } L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}
$$

and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1}\,\Phi_{cc} = (1/2)^{2+1}\,\Phi_{cc} = (1/2)^3\,\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4\,\Phi_{AA}$ and $\Phi_{abc} = (1/2)^5\,\Phi_{AA}$, respectively.
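The exponents in this basis step follow mechanically from the path lengths, as the following sketch shows. It evaluates a single Type 1 or Type 2 term of formula (12) as quoted in this appendix; the function name `triple_term` is hypothetical, the path lengths are read off the relationships stated in the caption of Figure 20, and the value $\Phi_{AA} = 1/2$ assumes a non-inbred common ancestor.

```python
from fractions import Fraction


def triple_term(l_a, l_b, l_c, phi_ancestor, l_s=None):
    """Contribution of one path-triple to Phi_abc.

    Type 1 (l_s is None): (1/2)^(L_a + L_b + L_c) * Phi_AAA.
    Type 2 (root overlap path of length l_s):
        (1/2)^(L_a + L_b + L_c - l_s + 1) * Phi_AA.
    phi_ancestor is Phi_AAA for Type 1 and Phi_AA for Type 2.
    """
    exponent = l_a + l_b + l_c
    if l_s is not None:
        exponent = exponent - l_s + 1
    return Fraction(1, 2) ** exponent * phi_ancestor


phi_aa = Fraction(1, 2)  # non-inbred ancestor (assumption)

# Figure 20(a): c -> b -> a, root overlap path c -> b of length 1.
fig_20a = triple_term(2, 1, 0, phi_aa, l_s=1)
# Figure 20(b): A -> b -> a and A -> c, root overlap path A -> b.
fig_20b = triple_term(2, 1, 1, phi_aa, l_s=1)
# Figure 20(c): A -> s -> a, A -> s -> b, and A -> c, root overlap A -> s.
fig_20c = triple_term(2, 2, 1, phi_aa, l_s=1)
print(fig_20a, fig_20b, fig_20c)
```

The three results correspond to $(1/2)^3\,\Phi_{cc}$, $(1/2)^4\,\Phi_{AA}$, and $(1/2)^5\,\Phi_{AA}$ with $\Phi_{cc} = \Phi_{AA} = 1/2$.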
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \rightarrow a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1}\,\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\,\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\,\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}.
\end{aligned}
\tag{B.3}
$$

For Figures 21(b) and 21(c), we take the same steps as when calculating $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.
B13 Correctness Proof for Case 23
Case 23 For Φ119886119886119887
the path-triples in the pedigree graph 119866have mergeable path-pair
Proof Considering the relationship between 119886 and 119887 119866has two scenarios (i) 119887 is not an ancestor of 119886 (ii) 119887 isan ancestor of 119886 Using the path-counting formula (A1)if a path-triple ⟨119875
1198601198861 1198751198601198862 119875119860119887⟩ isin Type 3 which means
that it has a mergeable path-pair then the contributionof ⟨1198751198601198861 1198751198601198862 119875119860119887⟩ to Φ
119886119886119887can be computed as follows
sumType 3(12)119871⟨119875119860119886119875119860119887
⟩+1Φ119860119860
where 119871⟨119875119860119886 119875119860119887⟩
= 119871119875119860119886+ 119871119875119860119887
Using the recursive formula (4) we obtain Φ
119886119886119887=
(12)(Φ119886119887+ Φ119891119898119887)
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$.

$\because$ $a$ has only one parent $f$ (as $m$ is missing),

$$\therefore\; \Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab}. \tag{B.4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L\langle P_{Aa}, P_{Ab}\rangle}\, \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L\langle P_{Aa}, P_{Ab}\rangle}\, \Phi_{AA} = \sum_{P} (1/2)^{L\langle P_{Aa}, P_{Ab}\rangle + 1}\, \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A.1) is true for Case 2.3.
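As a numerical sanity check of (B.4), the following minimal sketch evaluates both sides on a hypothetical pedigree in which $A$ is a non-inbred founder, $f$ is a child of $A$, $a$ is a child of $f$ (its other parent is missing), and $b$ is a child of $A$. The function names and the representation of a path-pair as a tuple of path lengths are assumptions made for illustration only; they do not come from the paper.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A=Fraction(0)):
    """Kinship of A with itself: (1/2)(1 + F_A)."""
    return HALF * (1 + F_A)

def phi_ab(path_pairs, F_A=Fraction(0)):
    """Wright's formula for a single common ancestor A.

    path_pairs: list of (L_PAa, L_PAb) for nonoverlapping path-pairs.
    """
    return sum(HALF ** (la + lb) * phi_AA(F_A) for la, lb in path_pairs)

def phi_aab_mergeable(path_pairs, F_A=Fraction(0)):
    """Type 3 term of the formula for Phi_aab: exponent is L<P_Aa, P_Ab> + 1."""
    return sum(HALF ** (la + lb + 1) * phi_AA(F_A) for la, lb in path_pairs)

# Hypothetical pedigree: A -> f -> a (length 2) and A -> b (length 1).
pairs = [(2, 1)]
lhs = phi_aab_mergeable(pairs)   # path-counting value of Phi_aab
rhs = HALF * phi_ab(pairs)       # (1/2) * Phi_ab, as in (B.4)
print(lhs, rhs)
```

Both values agree on this example, which is what (B.4) predicts when one parent of $a$ is missing.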
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$. (b) $b$ is an ancestor of $a$. (Edges in the figure denote parent-child and ancestor-descendant relationships.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathrm{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathrm{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathrm{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a: A_1 \to \cdots \to a,\; t_b: A_1 \to \cdots \to b,\; t_c: A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.

C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
Figure 23: Dependency graph for different cases for four individuals.
Cases for $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General’s New Family Health History Tool Is Released, Ready for “21st Century Medicine,” httpcompmedcomcate-gorypeople-helping-peoplepage7
[2] M. Falchi, P. Forabosco, E. Mocci et al., “A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia,” The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., “New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate,” Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, httpwwwgenomegovglossaryid=148
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l’hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, “La relation d’identité en génétique,” Annales de l’Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, “Genotypic covariances between inbred relatives,” Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, “Logique du calcul des coefficients d’identité entre deux individus,” Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, “A recursive algorithm for the calculation of identity coefficients,” Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, “Efficient evaluation of inbreeding queries on pedigree data,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, “Efficiently calculating inbreeding on large pedigrees databases,” Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, “Using compact encodings for path-based computations on pedigree graphs,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB ’11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Scalable computation of kinship and identity coefficients on large pedigrees,” in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB ’08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Efficient computation of kinship and identity coefficients on large pedigrees,” Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, “Coefficients of inbreeding and relationship,” The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, “Kinship and identity algorithm of coefficients of identity,” Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., “Data mining applied to linkage disequilibrium mapping,” The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, “Calculation of the inbreeding coefficient,” Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 7: Building blocks $B_1$, $B_2$ and basic rules. For $B_1$ (one edge, between $P_{Aa}$ and $P_{Ab}$), the edge can have three options: case 1, T; case 2, X; case 3, TX. For $B_2$ (two edges, among $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$), there can be at most one edge belonging to root overlap (either T or TX).
Figure 8: A graphical illustration for obtaining $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $S_3$ (with edges $e_1$, $e_2$, $e_3$) is decomposed into $u_1$, $u_2$, and $u_3$. Note: $R_i$ denotes all acceptable path-triples for $u_i$.
$P_{Aa}$ and $P_{Ac}$ must share at least one common individual except $A$, which contradicts the fact that $P_{Aa}$ and $P_{Ac}$ have no edge.

Next we focus on generating all acceptable cases for the scenarios $S_1$–$S_3$ in Figure 5, where only $S_3$ contains more than one building block. In order to leverage the dependency among building blocks, we decompose $S_3$ to $S_3 = \{u_1 = B_2, u_2 = B_2, u_3 = B_2\}$, shown in Figure 8. For each $u_i$, we have a set of acceptable path-triples denoted as $R_i$.

Considering the dependency among $R_1$, $R_2$, $R_3$, we use the natural join operator, denoted as $\bowtie$, operating on $R_1$, $R_2$, $R_3$ to generate all acceptable cases for $S_3$. As a result, we obtain $T_3 = R_1 \bowtie R_2 \bowtie R_3$, where $T_3$ denotes the acceptable cases of the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ in the scenario $S_3$.
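The join $T_3 = R_1 \bowtie R_2 \bowtie R_3$ can be illustrated with a small sketch. It assumes, for illustration only, that the three $B_2$ blocks cover the edge pairs $(e_1, e_2)$, $(e_2, e_3)$, and $(e_1, e_3)$, that every edge takes one of the labels T, X, or TX from Figure 7, and that a $B_2$ block is acceptable when at most one of its two edges is a root overlap (T or TX). The helper names `block_b2` and `natural_join` are hypothetical.

```python
from itertools import product

LABELS = ("T", "X", "TX")        # edge options from Figure 7
ROOT_OVERLAP = {"T", "TX"}       # labels that count as root overlap

def block_b2(edge_a, edge_b):
    """Acceptable labelings of a B2 block: at most one root-overlap edge."""
    return [
        {edge_a: la, edge_b: lb}
        for la, lb in product(LABELS, repeat=2)
        if not (la in ROOT_OVERLAP and lb in ROOT_OVERLAP)
    ]

def natural_join(left, right):
    """Relational natural join on the edge names shared by both relations."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                joined.append({**l_row, **r_row})
    return joined

# Assumed decomposition of S3 into three B2 blocks.
R1 = block_b2("e1", "e2")
R2 = block_b2("e2", "e3")
R3 = block_b2("e1", "e3")
T3 = natural_join(natural_join(R1, R2), R3)

for row in T3:
    print(row)
```

Under these assumptions, every row of `T3` has at most one root-overlap edge, because each pair of edges is constrained by one of the blocks.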
For each scenario in Figure 5, we generate all acceptable cases for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$. The scenario $S_0$ has no edges, and it shows that $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of three independent paths, while for the other scenarios $S_k$ ($k = 1, 2, 3$) the $k$ edges can have two options:

(1) all $k$ edges belong to crossover, or
(2) one edge belongs to root 2-overlap, and the remaining $(k - 1)$ edges belong to crossover.

In summary, acceptable path-triples can have at most one root 2-overlap path and any number of crossover individuals, but zero 2-overlap paths.
3.1.4. Splitting Operator. Considering the existence of root 2-overlap paths and crossovers in acceptable path-triples, we propose a splitting operator to transform a path-triple with crossover individuals into a noncrossover path-triple without changing the contribution from this path-triple to $\Phi_{abc}$. The main purpose of using the splitting operator is to simplify the path-counting formula derivation process. We first use an example in Figure 9 to illustrate how the splitting operator works. In Figure 9, there is a crossover individual $s$ between $P_{Aa}$ and $P_{Ab}$ in the path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ in $G_{k+1}$. The splitting operator proceeds as follows:

(1) split the node $s$ into two nodes $s_1$ and $s_2$;
(2) transform the edges $s \to a'$ and $s \to b'$ into $s_1 \to a'$ and $s_2 \to b'$, respectively;
(3) add two new edges $s_2 \to a'$ and $s_1 \to b'$.
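The three steps above can be sketched on an edge-set representation of the pedigree graph. This is only an illustrative sketch: the function names, the edge-set representation, and the tiny example graph are assumptions, and the redirection of the incoming edges $x \to s$ and $w \to s$ to $s_1$ and $s_2$ is read off the paths drawn in Figure 9 rather than stated in the step list.

```python
def split_crossover(edges, s, parents, children):
    """Apply the splitting operator to crossover node s.

    parents  = (x, w): the parents of s used by P_Aa and P_Ab
    children = (a1, b1): the children a' and b' of s on P_Aa and P_Ab
    """
    x, w = parents
    a1, b1 = children
    s1, s2 = s + "1", s + "2"                      # step (1): split s
    out = set(edges) - {(x, s), (w, s), (s, a1), (s, b1)}
    out |= {(x, s1), (w, s2)}                      # incoming edges, as in Figure 9 (assumption)
    out |= {(s1, a1), (s2, b1)}                    # step (2): move outgoing edges
    out |= {(s2, a1), (s1, b1)}                    # step (3): add the two new edges
    return out

def count_paths(edges, src, dst):
    """Number of directed paths from src to dst in an acyclic edge set."""
    if src == dst:
        return 1
    return sum(count_paths(edges, v, dst) for (u, v) in edges if u == src)

# Hypothetical G_{k+1}: A -> x -> s -> a' -> a, A -> w -> s -> b' -> b, A -> c.
g_before = {("A", "x"), ("A", "w"), ("x", "s"), ("w", "s"),
            ("s", "a'"), ("s", "b'"), ("a'", "a"), ("b'", "b"), ("A", "c")}
g_after = split_crossover(g_before, "s", ("x", "w"), ("a'", "b'"))

print(count_paths(g_before, "A", "a"), count_paths(g_after, "A", "a"))
```

On this example the number of paths from $A$ to $a$ and from $A$ to $b$ is unchanged by the split, while the node $s$ shared by the two paths no longer exists.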
Lemma 4. Given a pedigree graph $G_{k+1}$ having $(k + 1)$ crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, shown in Figure 9, let $s$ denote the lowest crossover individual, where no descendant of $s$ can be a crossover individual among the three paths $P_{Aa}$, $P_{Ab}$, and $P_{Ac}$. After using the splitting operator for the lowest crossover individual $s$ in $G_{k+1}$, the number of crossover individuals in $G_{k+1}$ is decreased by 1.

Proof. The splitting operator only affects the edges from $s$ to $a'$ and $b'$. If there is a new crossover node appearing, the only possible node is either $a'$ or $b'$. Assume $b'$ becomes a crossover individual; it means that $b'$ is able to reach $a$ and $b$ from two separate paths. It contradicts the fact that $s$ is the lowest crossover individual between $P_{Aa}$ and $P_{Ab}$.
Next we introduce a canonical graph, which results from applying the splitting operator for all crossover individuals. The canonical graph has zero crossover individuals.

Definition 5 (Canonical Graph). Given a pedigree graph $G$ having one or more crossover individuals regarding $\Phi_{abc}$, if there exists a graph $G'$ which has no crossover individuals with regard to $\Phi_{abc}$ such that

(i) any acceptable path-triple in $G$ has an acceptable path-triple in $G'$ which has the same contribution to $\Phi_{abc}$ as the one in $G$, and
(ii) any acceptable path-triple in $G'$ has an acceptable path-triple in $G$ which has the same contribution to $\Phi_{abc}$ as the one in $G'$,

we call $G'$ a canonical graph of $G$ regarding $\Phi_{abc}$.

Lemma 6. For a pedigree graph $G$ having one or more crossover individuals regarding $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$, there exists a canonical graph $G'$ for $G$.
Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. For $G_{k+1}$: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle = \langle A \to \cdots \to x \to s \to a' \to \cdots \to a,\; A \to \cdots \to w \to s \to b' \to \cdots \to b,\; A \to c\rangle$. For $G_k$: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle = \langle A \to \cdots \to x \to s_1 \to a' \to \cdots \to a,\; A \to \cdots \to w \to s_2 \to b' \to \cdots \to b,\; A \to c\rangle$. (Edges in the figure denote parent-child and ancestor-descendant relationships.)
Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ (scenarios $S_0$–$S_{10}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.

Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.

In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4. By the induction hypothesis, $G_k$ has a canonical graph; since the splitting operator does not change the contribution of a path-triple to $\Phi_{abc}$, this graph is also a canonical graph for $G_{k+1}$.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:

$$\Phi_{abc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AA} \right), \tag{12}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \tag{13}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
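To make (12) and (13) concrete, the following minimal sketch computes the contribution of a single path-triple to $\Phi_{abc}$ with exact fractions. The function name `triple_contribution`, its parameters, and the two small pedigrees in the usage lines are assumptions introduced for illustration; they are not part of the paper.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return HALF * (1 + F_A)

def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return Fraction(1, 4) * (1 + 3 * F_A)

def triple_contribution(len_a, len_b, len_c, overlap_len=None, F_A=Fraction(0)):
    """Contribution of one path-triple <P_Aa, P_Ab, P_Ac> following (12) and (13).

    overlap_len is None for Type 1 (no root 2-overlap); otherwise it is the
    length L_PAs of the single root 2-overlap path (Type 2).
    """
    total = len_a + len_b + len_c
    if overlap_len is None:                        # Type 1
        return HALF ** total * phi_AAA(F_A)
    l_triple = total - overlap_len                 # Type 2, equation (13)
    return HALF ** (l_triple + 1) * phi_AA(F_A)

# Hypothetical example 1: a, b, c are all children of a non-inbred founder A.
type1 = triple_contribution(1, 1, 1)
# Hypothetical example 2: A -> s -> a, A -> s -> b, A -> c (root 2-overlap A -> s).
type2 = triple_contribution(2, 2, 1, overlap_len=1)
print(type1, type2)
```

For the full coefficient, such contributions would be summed over all acceptable path-triples and all triple-common ancestors $A$, as in (12).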
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with $\mathrm{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$–$S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.

In Figure 11, we introduce three building blocks $B_1$, $B_2$, $B_3$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap, because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that for $B_3$, if $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \ne \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$, because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
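Because the two $B_1$ blocks of $S_2$ are disconnected, $T_2$ is a plain Cartesian product. A minimal sketch, assuming each $B_1$ edge takes one of the three labels T, X, or TX listed in Figure 7 (the list names are chosen here for illustration):

```python
from itertools import product

# Acceptable labelings of a single B1 edge, per the rules in Figure 7.
R1 = ["T", "X", "TX"]
R2 = ["T", "X", "TX"]

# T2 = R1 x R2: every labeling of u1 combines with every labeling of u2.
T2 = list(product(R1, R2))

for case in T2:
    print(case)
```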
Figure 11: Building blocks $B_1$, $B_2$, $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For $B_3$, where $\mathrm{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \ne \emptyset$, all three edges belong to root overlap (i.e., having root 3-overlap).

Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$).

$S_i$    $S_4$    $S_5$    $S_7$    $S_8$    $S_9$    $S_{10}$
$S_j$    $S_3$    $S_3$    $S_6$    $S_5$    $S_7$    $S_9$
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \ne 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.
For a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We then apply this idea for constructing all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
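The incremental construction via largest subgraphs can be pictured as following the $S_i \to S_j$ links of Table 2 until a directly solvable scenario is reached. The sketch below only encodes Table 2 and walks that chain; the dictionary and function names are hypothetical, and the sketch does not construct the sets $T_i$ themselves.

```python
# Largest subgraph S_j of each scenario S_i, transcribed from Table 2.
LARGEST_SUBGRAPH = {
    "S4": "S3", "S5": "S3", "S7": "S6",
    "S8": "S5", "S9": "S7", "S10": "S9",
}

def construction_chain(scenario):
    """Scenarios visited when reducing `scenario` to a directly solvable one."""
    chain = [scenario]
    while chain[-1] in LARGEST_SUBGRAPH:
        chain.append(LARGEST_SUBGRAPH[chain[-1]])
    return chain

print(construction_chain("S10"))
print(construction_chain("S8"))
```

Each step of a chain adds exactly one edge, which matches the observation that $|E_i| - |E_j| = 1$.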
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = (1/2)(1 + F_A)$; $\Phi_{AAA} = (1/4)(1 + 3F_A)$; $\Phi_{AAAA} = (1/8)(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; Type 1: zero root 2-overlap and zero root 3-overlap path; Type 2: one root 2-overlap path $P_{As}$ ending at $s$; Type 3 comprises three cases:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases} \tag{15}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
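Formulas (14) and (15) can be exercised with a short sketch that computes the contribution of one path-quad. The function name `quad_contribution`, the way overlaps are passed in (a tuple of root 2-overlap lengths and an optional root 3-overlap length), and the small example pedigrees are assumptions made for illustration; all founders are taken to be non-inbred unless `F_A` is given.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A):
    return HALF * (1 + F_A)

def phi_AAA(F_A):
    return Fraction(1, 4) * (1 + 3 * F_A)

def phi_AAAA(F_A):
    return Fraction(1, 8) * (1 + 7 * F_A)

def quad_contribution(lengths, two_overlaps=(), three_overlap=None, F_A=Fraction(0)):
    """Contribution of one path-quad following (14) and (15).

    lengths       : the four path lengths (L_PAa, L_PAb, L_PAc, L_PAd)
    two_overlaps  : lengths of the root 2-overlap paths (0, 1, or 2 entries)
    three_overlap : length of the root 3-overlap path, or None
    """
    total = sum(lengths)
    n2 = len(two_overlaps)
    if three_overlap is None:
        l_quad = total - sum(two_overlaps)
        if n2 == 0:                                # Type 1
            return HALF ** l_quad * phi_AAAA(F_A)
        if n2 == 1:                                # Type 2
            return HALF ** (l_quad + 1) * phi_AAA(F_A)
        if n2 == 2:                                # Type 3, Case 1
            return HALF ** (l_quad + 2) * phi_AA(F_A)
    elif n2 == 0:                                  # Type 3, Case 2
        l_quad = total - 2 * three_overlap
        return HALF ** (l_quad + 2) * phi_AA(F_A)
    elif n2 == 1:                                  # Type 3, Case 3
        l_quad = total - three_overlap - two_overlaps[0]
        return HALF ** (l_quad + 2) * phi_AA(F_A)
    raise ValueError("not an acceptable path-quad")

# Hypothetical examples:
t1 = quad_contribution((1, 1, 1, 1))                      # four children of A
t2 = quad_contribution((2, 2, 1, 1), two_overlaps=(1,))   # A->s->a, A->s->b, A->c, A->d
c1 = quad_contribution((2, 2, 2, 2), two_overlaps=(1, 1)) # A->s1->{a,b}, A->s2->{c,d}
c2 = quad_contribution((2, 2, 2, 1), three_overlap=1)     # A->t->{a,b,c}, A->d
c3 = quad_contribution((3, 3, 2, 1), two_overlaps=(2,), three_overlap=1)
print(t1, t2, c1, c2, c3)
```

The `ValueError` branch reflects the acceptability properties: a root 3-overlap path may coexist with at most one root 2-overlap path, and otherwise at most two root 2-overlap paths are allowed.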
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle = \langle (A \to s \to a,\; A \to s \to b), (A \to t \to c,\; A \to t \to d)\rangle$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle = \langle (A \to x \to a,\; A \to y \to b), (A \to y \to c,\; A \to x \to d)\rangle$.
33 Path-Counting Formulas for Two Pairs of Individuals
331 Terminology and Definitions
(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathrm{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathrm{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathrm{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor for $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$; $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; and $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$; $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; and $A \to x$ and $A \to y$ are root heter-overlap paths.
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $TX$, $TX$ (refer to Section 3.1.1 for the definitions of $TX$ and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

| $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ |
|---|---|
| Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap |
| One root 2-overlap path | One root homo-overlap and zero root heter-overlap |
| | Zero root homo-overlap and one root heter-overlap |
| Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap |
| | Zero root homo-overlap and two root heter-overlaps |
| One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$ |
| One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 = h$ |
| | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$ |

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
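The generation rule above can be sketched in a few lines. This is a minimal illustration under assumptions: the `parents` mapping (individual to a tuple of parents, with `None` for an unknown parent) and the factory name `make_gen` are hypothetical and are not part of the original method.

```python
from functools import lru_cache

def make_gen(parents):
    """Build gen(x) for a pedigree given as {individual: (father, mother)}.

    Individuals missing from `parents`, or listed with no known parent,
    are treated as progenitors with generation 0.
    """
    @lru_cache(maxsize=None)
    def gen(x):
        known = [p for p in parents.get(x, ()) if p is not None]
        if not known:                       # progenitor
            return 0
        # one parent: gen(p) + 1; two parents: MAX{gen(f), gen(m)} + 1
        return max(gen(p) for p in known) + 1
    return gen
```

With `parents = {"f": ("p1", "p2"), "g": ("f", None), "a": ("g", "p2")}`, the individual `a` gets generation 3, since its deeper parent `g` has generation 2.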
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\begin{aligned}
\Phi_{ab,cd} = {} & \sum_{A} \Bigg( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA}
+ \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} \\
& \quad + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA}
+ \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \Bigg) \\
& + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}
\end{aligned}
\tag{16}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$ to $S_{16}$ over the paths $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, and $P_{Ad}$).
Type 1: zero root homo-overlap and zero root heter-overlap.
Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

$$
\text{Type 3: }
\begin{cases}
\text{zero root homo-overlap and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2 \text{, respectively} \\
\text{one root homo-overlap } P_{Ah} \text{ ending at } h \text{ and two root heter-overlaps } P_{Ar_1} \text{ and } P_{Ar_2} \text{ ending at } r_1 \text{ and } r_2 \text{, and } r_1 = r_2
\end{cases}
\tag{17}
$$

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$
B =
\begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T) \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents} \\
T & \text{otherwise}
\end{cases}
$$

$$
\begin{aligned}
L_{\text{2-pair}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1} \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2} \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3} \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4}
\end{cases} \\
L_{\langle P_{Sa}, P_{Sb} \rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5} \\
L_{\langle P_{Tc}, P_{Td} \rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}
\end{aligned}
\tag{18}
$$
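The bookkeeping in (18) and the choice of $B$ can be sketched as follows. This is an illustration only: the function names, the integer type labels, and the `gen` and `num_parents` dictionaries are hypothetical, and the classification of a 2-pair-path-pair into Types 1 to 5 is assumed to have been done elsewhere.

```python
from fractions import Fraction

def l_two_pair(path_lengths, type_, overlap_lengths=()):
    """Return L_2-pair as in (18) for Types 1 to 4."""
    total = sum(path_lengths)          # L_PAa + L_PAb + L_PAc + L_PAd
    if type_ == 1:
        return total
    if type_ == 2:                     # one root heter-overlap P_Ar
        (l_r,) = overlap_lengths
        return total - l_r
    if type_ == 3:                     # two root heter-overlaps P_Ar1, P_Ar2
        l_r1, l_r2 = overlap_lengths
        return total - l_r1 - l_r2
    if type_ == 4:                     # h = r1 = r2, root path P_Ah
        (l_h,) = overlap_lengths
        return total - 2 * l_h
    raise ValueError(f"unknown type: {type_}")

def choose_b(s, t, gen, num_parents):
    """Pick B for a Type 5 term following the three-way rule for B."""
    if gen[s] < gen[t]:
        return s
    if gen[s] == gen[t] and num_parents[t] == 2:
        return s
    return t

def type5_weight(l_sa, l_sb, l_tc, l_td):
    """Power of 1/2 that multiplies Phi_BB in the Type 5 sum of (16)."""
    return Fraction(1, 2) ** (l_sa + l_sb + l_tc + l_td + 1)
```

Exact fractions are used so that the powers of one half are not rounded.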
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}
\tag{19}
$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows:

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$ and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $\mathrm{NC}(u)$, a code $x i {*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
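The two labeling rules above can be sketched as follows. This is a minimal illustration under several assumptions: the function name `assign_nodecodes`, the `children` and `gender` mappings, and the encoding of a code as a tuple of `(sibling_index, gender)` steps (with the empty tuple standing in for the empty string) are all hypothetical. The sketch also visits nodes in topological order rather than plain breadth-first order, so that every code of a node is known before it is extended to that node's children.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Return {node: set of codes}, one code per path from `root`.

    children -- {node: [child, ...]} in sibling order
    gender   -- {node: label}, e.g. "M" or "F"
    root     -- the single progenitor (possibly a virtual node)
    """
    # Count parents so a node is expanded only after all of its parents.
    indegree = {root: 0}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: set() for u in indegree}
    codes[root].add(())                      # rule (1): the empty code
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            for x in codes[u]:               # rule (2): extend x by (i, gender)
                codes[v].add(x + ((i, gender[v]),))
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return codes
```

In this encoding the number of codes of a node equals the number of distinct paths from the progenitor to that node, so a child of two siblings of the root receives two codes.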
The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see that the improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus number of individuals in the pedigree, from 77 to 195197, for Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, from 4 to 18, for Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices
A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$; panel label: "If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable", with the merged path $P_{Aa}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathrm{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$
\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA}
+ \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA}
+ \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA} \right)
\tag{A.1}
$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}
\end{cases} \\
L_{\langle P_{Aa}, P_{Ab} \rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}
\end{aligned}
\tag{A.2}
$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
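As a small sanity check of (A.1), consider a hypothetical pedigree (an assumption made here for illustration, not one of the paper's figures) in which a non-inbred founder $A$ is the only known parent of both $a$ and $b$, so that $\Phi_{AA} = 1/2$. There is a single path $A \to a$, so $P_{Aa_1}$ and $P_{Aa_2}$ are necessarily identical, and the only contributing term is of Type 3 with $L_{\langle P_{Aa}, P_{Ab} \rangle} = 1 + 1 = 2$:

$$
\Phi_{aab} = \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA}
= \left(\frac{1}{2}\right)^{3} \cdot \frac{1}{2} = \frac{1}{16}
$$

The recursive formula (4) gives the same value under the same assumptions. Since $a$ has only one known parent, $\Phi_{fmb} = 0$, and Wright's formula gives $\Phi_{ab} = (1/2)^{2} \Phi_{AA} = 1/8$:

$$
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16}
$$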
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to the scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$
\begin{aligned}
\Phi_{aabc} = {} & \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}-1} \Phi_{AAAA}
+ \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA}
+ \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AA} \right) \\
& + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AAA}
+ \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+2} \Phi_{AA} \right)
\end{aligned}
\tag{A.3}
$$

where $A$ is a common ancestor of $a$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.

$$
\text{Type 3: }
\begin{cases}
\text{Case 1: two root 2-overlap paths } P_{As_1} \text{ and } P_{As_2} \text{ ending at } s_1 \text{ and } s_2 \text{, respectively} \\
\text{Case 2: one root 3-overlap path } P_{At} \text{ ending at } t \\
\text{Case 3: one root 2-overlap path } P_{As} \text{ and one root 3-overlap path } P_{At} \text{ ending at } s \text{ and } t \text{, respectively}
\end{cases}
\tag{A.4}
$$

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$
\begin{aligned}
L_{\text{quad}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3} \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}
\end{cases} \\
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4} \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}
\end{cases}
\end{aligned}
\tag{A.5}
$$
Note that if $a$ is an ancestor of either $b$ or $c$, or of both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$
\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA}
+ \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right)
+ \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}}+2} \Phi_{AA} \right)
\tag{A.6}
$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1} \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}
\end{cases} \\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}
\end{aligned}
\tag{A.7}
$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.

B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

Cases for $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
(B.1)

Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.
Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):

(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.

(ii) For Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.

(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edges denote parent-child or ancestor-descendant relationships.)
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc}
= \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}
= \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AAA} \\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 \\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}
\end{aligned}
\tag{B.2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle}+1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus the claim is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.

Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}+1} \Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
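The recursive side of these basis cases can be checked numerically with a short sketch. It is an illustration under assumptions rather than the implementation used in the experiments: the `parents` encoding and the name `make_phi` are hypothetical, formulas (3) and (4) are taken in the forms quoted in this appendix, and the $\Phi_{aaa}$ step uses Karigl's recursion $(1 + 3\Phi_{fm})/4$, which is assumed here to correspond to formula (5). Founders are assumed non-inbred, so $\Phi_{AA} = 1/2$.

```python
from fractions import Fraction
from functools import lru_cache

def make_phi(parents):
    """parents: {individual: (father, mother)}; None marks an unknown parent."""
    def pa(x):
        return parents.get(x, (None, None))

    @lru_cache(maxsize=None)
    def gen(x):
        known = [p for p in pa(x) if p is not None]
        return 1 + max(map(gen, known)) if known else 0

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)
        if a == b:                           # Phi_aa = (1 + Phi_fm) / 2
            return (1 + phi2(*pa(a))) / 2
        if gen(a) < gen(b):                  # expand the deeper individual
            a, b = b, a
        f, m = pa(a)
        return (phi2(f, b) + phi2(m, b)) / 2

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        trio = (a, b, c)
        if None in trio:
            return Fraction(0)
        x = max(trio, key=gen)               # deepest: not an ancestor of the rest
        rest = [y for y in trio if y != x]
        f, m = pa(x)
        if not rest:                         # Phi_xxx (assumed formula (5))
            return (1 + 3 * phi2(f, m)) / 4
        if len(rest) == 1:                   # Phi_xxb, formula (4)
            return (phi2(x, rest[0]) + phi3(f, m, rest[0])) / 2
        # Phi_xbc, formula (3)
        return (phi3(f, rest[0], rest[1]) + phi3(m, rest[0], rest[1])) / 2

    return phi2, phi3

# Figure 20(c), read as A -> s -> a, A -> s -> b and A -> c (single known parents).
_, phi3 = make_phi({"s": ("A", None), "a": ("s", None),
                    "b": ("s", None), "c": ("A", None)})
print(phi3("a", "b", "c"))   # (1/2)^5 * Phi_AA
```

Under these assumptions the recursion returns $1/64$ for Figure 20(c), which matches $(1/2)^{5}\Phi_{AA}$ from the path-counting side; the edge list used for Figure 20(c) is inferred from the derivation above.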
Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Show that it is truefor = 119896 + 1
For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (\tfrac{1}{2})^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.
In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = \tfrac{1}{2}\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain
$$\Phi_{abc} = \tfrac{1}{2}\Phi_{fbc} = \tfrac{1}{2}\sum_{\text{Type 2}} \left(\tfrac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}} \left(\tfrac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}.$$
Because $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1$, it follows that
$$\Phi_{abc} = \sum_{\text{Type 2}} \left(\tfrac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}} \left(\tfrac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}. \tag{B.3}$$
For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).
In summary, the formula holds for $n = k + 1$.
Figure 21: (a) No individual is a parent of another. (b) $b$ is a parent of $a$. (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of such path-triples to $\Phi_{aab}$ can be computed as follows:
$$\sum_{\text{Type 3}} \left(\tfrac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA},$$
where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.
Using the recursive formula (4), we obtain $\Phi_{aab} = \tfrac{1}{2}(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$ (i.e., $m$ is missing),
$$\Phi_{aab} = \tfrac{1}{2}(\Phi_{ab} + \Phi_{fmb}) = \tfrac{1}{2}(\Phi_{ab} + 0) = \tfrac{1}{2}\Phi_{ab}. \tag{B.4}$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.
Then we have $\Phi_{aab} = \tfrac{1}{2}\Phi_{ab} = \tfrac{1}{2}\sum_{P} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P} (\tfrac{1}{2})^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.
For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.
In summary, the path-counting formula (A.1) is true for Case 2.3.
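As a small numeric illustration of this step, the sketch below evaluates Wright's formula for $\Phi_{ab}$ and the corresponding Type 3 term for $\Phi_{aab}$ from a list of nonoverlapping path-pairs. The function names and the input encoding (one tuple `(L_Aa, L_Ab, F_A)` per path-pair) are assumptions made for this example; enumerating the path-pairs from a pedigree is not shown.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def wright_kinship(path_pairs):
    """Phi_ab = sum over nonoverlapping path-pairs of (1/2)^(L_Aa + L_Ab) * Phi_AA,
    with Phi_AA = (1/2)(1 + F_A). Each entry is (L_Aa, L_Ab, F_A)."""
    return sum((HALF ** (la + lb) * (1 + Fraction(fa)) / 2
                for la, lb, fa in path_pairs), Fraction(0))

def type3_contribution(path_pairs):
    """Type 3 term for Phi_aab: the same sum with the exponent increased by one."""
    return sum((HALF ** (la + lb + 1) * (1 + Fraction(fa)) / 2
                for la, lb, fa in path_pairs), Fraction(0))

# Hypothetical configuration in the spirit of Figure 22(a):
# A -> f -> a (length 2) and A -> b (length 1), with A not inbred (F_A = 0).
pairs = [(2, 1, 0)]
phi_ab = wright_kinship(pairs)
phi_aab = type3_contribution(pairs)
```

For this configuration the Type 3 term is exactly half of $\Phi_{ab}$, matching $\Phi_{aab} = \tfrac{1}{2}\Phi_{ab}$ in (B.4).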
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has already been proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$. (b) $b$ is an ancestor of $a$. (Edges denote either a parent-child relationship or an ancestor-descendant relationship.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.
Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.
For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.
Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\; t_b\colon A_1 \to \cdots \to b,\; t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.
If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
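The induction peels off one triple-common ancestor at a time, starting from the topmost one. The sketch below illustrates how $\mathit{Tri\_C}(a, b, c, G)$ and a topmost element $A_1$ could be found. It assumes that an individual counts as its own ancestor (consistent with $c$ being a triple-common ancestor in Figure 20(a)) and that the pedigree is given as a hypothetical `parents` dictionary; both are assumptions of this example rather than definitions taken from the paper.

```python
def ancestors(parents, x):
    """All ancestors of x in the pedigree, including x itself (an assumption here)."""
    seen, stack = set(), [x]
    while stack:
        y = stack.pop()
        if y not in seen:
            seen.add(y)
            stack.extend(parents.get(y, ()))
    return seen

def triple_common_ancestors(parents, a, b, c):
    """Individuals that are ancestors of a, b, and c simultaneously."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)

def topmost(parents, candidates):
    """Return one candidate A_1 such that no other candidate is an ancestor of A_1."""
    for x in sorted(candidates):
        if not (ancestors(parents, x) - {x}) & candidates:
            return x
    return None

# Hypothetical pedigree: A1 -> Aj, Aj -> a, Aj -> b, and c has parents Aj and A1.
parents = {"A1": (), "Aj": ("A1",), "a": ("Aj",), "b": ("Aj",), "c": ("Aj", "A1")}
tri_c = triple_common_ancestors(parents, "a", "b", "c")
a1 = topmost(parents, tri_c)
```

In this toy pedigree $A_j$ is a descendant of $A_1$, so it falls under case (2): path-triples from $A_1$ whose three paths all pass through $A_j$ would be excluded when computing $S(A_1)$.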
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
$\Longleftarrow \Phi_{aaab}$
Figure 23: Dependency graph for different cases for four individuals.
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
$\Longleftarrow \Phi_{aabc}$

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
$\Longleftarrow \Phi_{abcd}$
(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.
According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.
According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing them to the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] "Surgeon General's New Family Health History Tool Is Released, Ready for '21st Century Medicine'," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malecot, Les mathematique de l'heredite, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identite en genetique," Annales de l'Institut Henri Poincare B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identite entre deux individuals," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 9: Transforming pedigree graph $G_{k+1}$ having $k + 1$ crossovers to $G_k$ having $k$ crossovers. For $G_{k+1}$, $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of $A \to \cdots \to x \to s \to a' \to \cdots \to a$, $A \to \cdots \to w \to s \to b' \to \cdots \to b$, and $A \to c$. For $G_k$, $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ consists of $A \to \cdots \to x \to s_1 \to a' \to \cdots \to a$, $A \to \cdots \to w \to s_2 \to b' \to \cdots \to b$, and $A \to c$. (Edges denote either a parent-child relationship or an ancestor-descendant relationship.)
Figure 10: A path-pair level graphical representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ (scenarios $S_0$ to $S_{10}$).
Proof (Sketch). The proof is by induction on the number of crossover individuals.
Induction hypothesis: assume that if $G$ has $k$ or fewer crossovers, there is a canonical graph $G'$ for $G$.
In the induction step, let $G_{k+1}$ be a graph with $k + 1$ crossovers, and let $s$ be the lowest crossover between paths $P_{Aa}$ and $P_{Ab}$ in $G_{k+1}$. We apply the splitting operator on $s$ in $G_{k+1}$ and obtain $G_k$ having $k$ crossovers by Lemma 4.
3.1.5. Path-Counting Formula for $\Phi_{abc}$. Now we present the path-counting formula for $\Phi_{abc}$:
$$\Phi_{abc} = \sum_{A}\left(\sum_{\text{Type 1}} \left(\tfrac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AAA} + \sum_{\text{Type 2}} \left(\tfrac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AA}\right), \tag{12}$$
where $\Phi_{AA} = \tfrac{1}{2}(1 + F_A)$; $\Phi_{AAA} = \tfrac{1}{4}(1 + 3F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a triple-common ancestor of $a$, $b$, and $c$; Type 1: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap; Type 2: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$;
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2,} \end{cases} \tag{13}$$
and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, and $P_{As}$).
For completeness, the path-counting formula for $\Phi_{aab}$ is given in Appendix A, and the correctness proof of the path-counting formula is given in Appendix B.
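To make the two terms of formula (12) concrete, the following sketch evaluates the contribution of a single path-triple from its path lengths. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the path lengths (and the overlap length $L_{P_{As}}$, if any) are assumed to be known already.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_AA(F_A):
    """Phi_AA = (1/2)(1 + F_A)."""
    return (1 + Fraction(F_A)) / 2

def phi_AAA(F_A):
    """Phi_AAA = (1/4)(1 + 3 F_A)."""
    return (1 + 3 * Fraction(F_A)) / 4

def triple_contribution(L_Aa, L_Ab, L_Ac, F_A=0, L_As=None):
    """Contribution of one path-triple to Phi_abc under formulas (12) and (13).

    L_As is None for Type 1 (zero root 2-overlap); otherwise it is the length of
    the root 2-overlap path P_As (Type 2)."""
    if L_As is None:
        L_triple = L_Aa + L_Ab + L_Ac
        return HALF ** L_triple * phi_AAA(F_A)
    L_triple = L_Aa + L_Ab + L_Ac - L_As
    return HALF ** (L_triple + 1) * phi_AA(F_A)

# Type 1: three children of a non-inbred ancestor A (each path has length 1).
type1 = triple_contribution(1, 1, 1)
# Type 2: paths of length 2, 1, 0 with a root overlap of length 1
# (the configuration used for Figure 20(a) in Appendix B).
type2 = triple_contribution(2, 1, 0, L_As=1)
```

Summing such contributions over all eligible path-triples and all triple-common ancestors $A$ would give $\Phi_{abc}$; enumerating the path-triples is outside the scope of this sketch.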
3.2. Path-Counting Formulas for Four Individuals

3.2.1. Path-Pair Level Graphical Representation of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with $\mathit{Quad\_C}(P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}) = \emptyset$, the path-quad can have 11 scenarios $S_0$ to $S_{10}$, shown in Figure 10, where all four paths are considered symmetrically.
In Figure 11, we introduce three building blocks $\{B_1, B_2, B_3\}$. For $B_1$ and $B_2$, the rules presented in Figure 7 are also applicable for Figure 11. For $B_3$, we only consider root overlap because the crossover individuals can be eliminated by using the splitting operator introduced in Section 3.1.4. Note that, for $B_3$, if $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) = \emptyset$, then it is equivalent to the scenario $S_3$ in Figure 8. Therefore, we only need to consider $B_3$ when $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \ne \emptyset$.
3.2.2. Building Block-Based Cases Construction for $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For a scenario $S_i$ ($0 \le i \le 10$) in Figure 10, we first decompose $S_i$ into one or multiple building blocks. A scenario $S_i \in \{S_1, S_3\}$ has only one building block, and all acceptable cases can be obtained directly. For $S_2 = \{u_1 = B_1, u_2 = B_1\}$, there is no need to consider the conflict between the edges in $u_1$ and $u_2$ because $u_1$ and $u_2$ are disconnected. Let $R_i$ denote all acceptable cases of the path-pairs in $u_i$, and let $T_i$ denote all acceptable cases for $S_i$. Therefore, we obtain $T_2 = R_1 \times R_2$, where $\times$ denotes the Cartesian product operator from relational algebra.
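The construction $T_2 = R_1 \times R_2$ can be sketched with a plain Cartesian product. The case labels below are placeholders invented for this example (the actual acceptable cases of a building block follow from the rules in Figure 7, which are not reproduced here), and `combine_cases` is a hypothetical helper name.

```python
from itertools import product

def combine_cases(*block_cases):
    """Acceptable cases of a scenario whose building blocks are disconnected:
    the Cartesian product of the acceptable cases of each block."""
    return list(product(*block_cases))

# Placeholder labels for the acceptable cases of u1 = B1 and u2 = B1 (assumed, not from the paper).
R1 = ["u1_case_1", "u1_case_2"]
R2 = ["u2_case_1", "u2_case_2"]
T2 = combine_cases(R1, R2)
```

Because $u_1$ and $u_2$ share no edge, every pairing of a case from $R_1$ with a case from $R_2$ is kept, so $|T_2| = |R_1| \cdot |R_2|$.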
Figure 11: Building blocks $B_1$, $B_2$, and $B_3$ for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. For $B_3$, all three edges belong to root overlap (i.e., having root 3-overlap), with $\mathit{Tri\_C}(P_{Aa}, P_{Ab}, P_{Ac}) \ne \emptyset$.
Table 2: Largest subgraph of a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$).

$S_i$:   $S_4$   $S_5$   $S_7$   $S_8$   $S_9$   $S_{10}$
$S_j$:   $S_3$   $S_3$   $S_6$   $S_5$   $S_7$   $S_9$
For $S_6 = \{u_1 = B_3\}$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \ne 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.
Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) no such $S_k$ exists that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.
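Table 2 induces an order in which the acceptable cases can be built up, each scenario extending its largest subgraph. A minimal sketch of that lookup follows; the dictionary simply transcribes Table 2, and `construction_chain` is a hypothetical helper name used only here.

```python
# Table 2 as a mapping i -> j, meaning S_j is the largest subgraph of S_i.
LARGEST_SUBGRAPH = {4: 3, 5: 3, 7: 6, 8: 5, 9: 7, 10: 9}

def construction_chain(i):
    """Scenario indices in the order they would be constructed to reach S_i,
    starting from a scenario that is handled directly (not listed in Table 2)."""
    chain = [i]
    while chain[-1] in LARGEST_SUBGRAPH:
        chain.append(LARGEST_SUBGRAPH[chain[-1]])
    return chain[::-1]

chain_10 = construction_chain(10)
chain_8 = construction_chain(8)
```

For example, under this reading $S_{10}$ is reached from $S_6$ through $S_7$ and $S_9$, and $S_8$ is reached from $S_3$ through $S_5$.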
For a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply this idea to construct all acceptable cases for each $S_i$ in Table 2.
Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
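The two properties can be written as a small predicate. This is only a sketch: `is_acceptable` is a hypothetical name, and treating more than one root 3-overlap path as unacceptable is an assumption of this example (the text only discusses zero or one such path).

```python
def is_acceptable(n_root2, n_root3):
    """Check the two properties of an acceptable case for a path-quad.

    n_root2: number of root 2-overlap paths; n_root3: number of root 3-overlap paths."""
    if n_root3 == 1:
        return n_root2 <= 1      # property (1)
    if n_root3 == 0:
        return n_root2 <= 2      # property (2)
    return False                 # assumption: more than one root 3-overlap path is rejected

checks = [is_acceptable(0, 0), is_acceptable(2, 0), is_acceptable(1, 1),
          is_acceptable(3, 0), is_acceptable(2, 1)]
```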
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:
$$\Phi_{abcd} = \sum_{A}\left(\sum_{\text{Type 1}} \left(\tfrac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAAA} + \sum_{\text{Type 2}} \left(\tfrac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AAA} + \sum_{\text{Type 3}} \left(\tfrac{1}{2}\right)^{L_{\text{quad}}+2}\Phi_{AA}\right), \tag{14}$$
where $\Phi_{AA} = \tfrac{1}{2}(1 + F_A)$; $\Phi_{AAA} = \tfrac{1}{4}(1 + 3F_A)$; $\Phi_{AAAA} = \tfrac{1}{8}(1 + 7F_A)$; $F_A$ is the inbreeding coefficient of $A$; $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; Type 1: zero root 2-overlap and zero root 3-overlap path; Type 2: one root 2-overlap path $P_{As}$ ending at $s$; Type 3 comprises three cases:

Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively;

$$L_{\text{quad}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3,} \end{cases} \tag{15}$$
and $L_{P_{Aa}}$ is the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).
For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
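The case analysis in (14) and (15) can be illustrated by computing the contribution of a single path-quad. The sketch below is a hedged illustration rather than the authors' code: the function name and the encoding of overlaps as `(kind, length)` pairs are assumptions, and the path lengths and overlap lengths are taken as given.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def quad_contribution(lengths, F_A=0, overlaps=()):
    """Contribution of one path-quad to Phi_abcd under formulas (14) and (15).

    lengths:  the four path lengths (L_PAa, L_PAb, L_PAc, L_PAd).
    overlaps: tuple of (kind, length) pairs, kind 2 for a root 2-overlap path
              and kind 3 for a root 3-overlap path (hypothetical encoding)."""
    F = Fraction(F_A)
    total = sum(lengths)
    kinds = tuple(sorted(k for k, _ in overlaps))
    lens = {k: [l for kk, l in overlaps if kk == k] for k in (2, 3)}
    if kinds == ():                                   # Type 1
        return HALF ** total * (1 + 7 * F) / 8
    if kinds == (2,):                                 # Type 2
        return HALF ** (total - lens[2][0] + 1) * (1 + 3 * F) / 4
    if kinds == (2, 2):                               # Type 3, Case 1
        L = total - sum(lens[2])
    elif kinds == (3,):                               # Type 3, Case 2
        L = total - 2 * lens[3][0]
    elif kinds == (2, 3):                             # Type 3, Case 3
        L = total - lens[3][0] - lens[2][0]
    else:
        raise ValueError("not an acceptable case for a path-quad")
    return HALF ** (L + 2) * (1 + F) / 2

# Type 1: four children of a non-inbred ancestor A.
type1 = quad_contribution((1, 1, 1, 1))
# Type 3, Case 2: A -> t, then t -> a, t -> b, t -> c, and A -> d directly.
case2 = quad_contribution((2, 2, 2, 1), overlaps=((3, 1),))
```

Both example values agree with what the recursive formulas give for these two small configurations when each individual has a single known parent.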
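Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with $P_{Aa}\colon A \to s \to a$, $P_{Ab}\colon A \to s \to b$, $P_{Ac}\colon A \to t \to c$, and $P_{Ad}\colon A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with $P_{Aa}\colon A \to x \to a$, $P_{Ab}\colon A \to y \to b$, $P_{Ac}\colon A \to y \to c$, and $P_{Ad}\colon A \to x \to d$.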
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions
(1) 2-Pair-Path-Pair. It consists of two pairs of path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b\rangle$ and $\langle c, d\rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
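The classification in Example 8 can be reproduced with a short sketch that finds root overlaps as shared path prefixes and labels them homo or heter depending on whether the two paths belong to the same pair. The function names and the list encoding of paths are assumptions of this example, and only root overlaps are detected (overlaps that start below $A$ are ignored).

```python
from itertools import combinations

def root_overlap(p, q):
    """Longest shared prefix of two paths that both start at the common ancestor A."""
    shared = []
    for u, v in zip(p, q):
        if u != v:
            break
        shared.append(u)
    return shared

def classify_root_overlaps(paths):
    """paths: dict with keys 'a', 'b', 'c', 'd', each a list of individuals from A
    down to that individual. Returns {(i, j): (kind, overlap_path)} for every pair
    of paths sharing at least one edge from the root."""
    homo_pairs = {("a", "b"), ("c", "d")}       # pairs inside <a, b> or inside <c, d>
    result = {}
    for i, j in combinations("abcd", 2):
        overlap = root_overlap(paths[i], paths[j])
        if len(overlap) > 1:                    # more than A alone: a shared edge exists
            kind = "homo" if (i, j) in homo_pairs else "heter"
            result[(i, j)] = (kind, overlap)
    return result

# The two 2-pair-path-pairs of Figure 12.
fig12a = {"a": ["A", "s", "a"], "b": ["A", "s", "b"],
          "c": ["A", "t", "c"], "d": ["A", "t", "d"]}
fig12b = {"a": ["A", "x", "a"], "b": ["A", "y", "b"],
          "c": ["A", "y", "c"], "d": ["A", "x", "d"]}
res_a = classify_root_overlaps(fig12a)
res_b = classify_root_overlaps(fig12b)
```

For Figure 12(a) this yields two root homo-overlap paths ($A \to s$ and $A \to t$), and for Figure 12(b) two root heter-overlap paths ($A \to x$ and $A \to y$), as stated in Example 8.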
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $TX$, $TX$ (refer to Section 3.1.1 for the definitions of $TX$ and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap; zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap; zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 = h$; one root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
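The generation rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the `parents` mapping, the `make_gen` helper, and the toy pedigree are assumptions introduced here.

```python
from functools import lru_cache

def make_gen(parents):
    """Build gen() for a pedigree.

    `parents` (hypothetical layout) maps an individual to a tuple of its
    known parents: empty for a progenitor, one or two entries otherwise.
    """
    @lru_cache(maxsize=None)
    def gen(x):
        known = parents.get(x, ())
        if not known:                      # progenitor: gen(p_i) = 0
            return 0
        # one parent: gen(p) + 1; two parents: MAX{gen(f), gen(m)} + 1
        return max(gen(p) for p in known) + 1
    return gen

# Toy pedigree: p1 and p2 are progenitors, m is a child of f and p2.
pedigree = {"f": ("p1",), "m": ("f", "p2"), "a": ("f", "m")}
gen = make_gen(pedigree)
print(gen("p1"), gen("f"), gen("m"), gen("a"))
```

Memoizing `gen` keeps the computation linear in the number of parent-child edges, which matters once the same ancestors are reached along many paths.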
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}+1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$) there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$ to $S_{16}$).
Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either
(i) zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively; or
(ii) one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 = r_2$. (17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$) there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$ there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$
B = \begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T & \text{otherwise},
\end{cases}
$$

$$
L_{\text{2-pair}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 * L_{P_{Ah}} & \text{for Type 4},
\end{cases}
$$

$$
L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \qquad
L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}
$$
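The case analysis in (16) and (18) is mostly bookkeeping on path lengths, which the following sketch makes explicit. It assumes that the type of a 2-pair-path-pair and the lengths of its paths and root overlap paths have already been determined elsewhere; the function names and argument layout are hypothetical and not part of the paper.

```python
def l_two_pair(path_type, l_aa, l_ab, l_ac, l_ad, overlaps=()):
    """L_2-pair from (18) for Types 1-4.

    l_aa..l_ad are the edge counts of P_Aa, P_Ab, P_Ac, P_Ad; `overlaps`
    holds the lengths of the root overlap paths the type refers to.
    """
    total = l_aa + l_ab + l_ac + l_ad
    if path_type == 1:
        return total
    if path_type == 2:                 # one root heter-overlap P_Ar
        (l_r,) = overlaps
        return total - l_r
    if path_type == 3:                 # root heter-overlaps P_Ar1, P_Ar2
        l_r1, l_r2 = overlaps
        return total - l_r1 - l_r2
    if path_type == 4:                 # root homo-overlap P_Ah, counted twice
        (l_h,) = overlaps
        return total - 2 * l_h
    raise ValueError("only Types 1-4 use L_2-pair")

# Offset added to L_2-pair in the exponent of each term of (16).
EXPONENT_OFFSET = {1: 0, 2: 1, 3: 2, 4: 1}

def exponent(path_type, *lengths, overlaps=()):
    """Exponent of 1/2 for a Type 1-4 term in (16)."""
    return l_two_pair(path_type, *lengths, overlaps=overlaps) + EXPONENT_OFFSET[path_type]

def choose_b(s, t, gen, num_parents):
    """Select B for a Type 5 term, following the case definition of B."""
    if gen[s] < gen[t]:
        return s
    if gen[s] == gen[t] and num_parents[t] == 2:
        return s
    return t

print(exponent(2, 2, 2, 1, 1, overlaps=(1,)))
```

Keeping the per-type offsets in a small table makes it easy to compare the code against the exponents written in (16).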
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} * \Phi_{TT}. \tag{19}
$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$ there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
The NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor, $r$, as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as NC($u$), is assigned using a breadth-first-search traversal starting from $r$ as follows:

(1) If $u$ is $r$, then NC($r$) contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with NC($u$) and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in NC($u$), a code $x i *$ is added to NC($v_i$), where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
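The two assignment rules can be sketched as follows. This is an illustrative reading of the definition, not the implementation from [18]. Two choices are assumptions made here: each code is stored as a tuple of (sibling index, gender) steps rather than a concatenated string, and nodes are visited in topological order (Kahn's algorithm) instead of plain breadth-first order, so that every parent's code set is complete before it is passed on to its children.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Return {node: set of codes}; each code encodes one path from root.

    `children` maps a node to its children in sibling order and `gender`
    maps a node to a gender marker (both are hypothetical input layouts).
    """
    indegree = {root: 0}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: set() for u in indegree}
    codes[root].add(())                      # rule (1): the empty code
    ready = deque(u for u, d in indegree.items() if d == 0)
    while ready:
        u = ready.popleft()
        for i, v in enumerate(children.get(u, ())):
            for x in codes[u]:               # rule (2): extend x by i and gender
                codes[v].add(x + ((i, gender[v]),))
            indegree[v] -= 1
            if indegree[v] == 0:             # all parents of v are done
                ready.append(v)
    return codes

# Toy pedigree: r has children f and m, who are the parents of a.
children = {"r": ["f", "m"], "f": ["a"], "m": ["a"]}
gender = {"f": "M", "m": "F", "a": "F"}
nc = assign_nodecodes(children, gender, "r")
print(sorted(nc["a"]))
```

In the toy pedigree, `a` receives two codes, one for each path from `r`. A shared prefix between two codes corresponds to a shared initial segment of the two paths, which is the kind of information the path-counting formulas rely on when classifying overlaps.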
Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. The improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus individuals in pedigree, for Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, for Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices
A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.
Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. This means that $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$
\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA} \right), \tag{A1}
$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\qquad
L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A2}
$$
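As a small illustrative check of (A1), not taken from the paper's figures, assume a pedigree in which a noninbred founder $A$ (so that $\Phi_{AA} = 1/2$) is the only known parent of both $a$ and $b$. There is then a single path from $A$ to $a$, so $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable and only Type 3 contributes:

$$
L_{\langle P_{Aa}, P_{Ab} \rangle} = 1 + 1 = 2, \qquad
\Phi_{aab} = \left(\frac{1}{2}\right)^{2+1} \Phi_{AA} = \frac{1}{8} \cdot \frac{1}{2} = \frac{1}{16}.
$$

Under the same assumptions, this agrees with the recursive route $\Phi_{aab} = \frac{1}{2}\Phi_{ab}$, with $\Phi_{ab} = (1/2)^{2}\,\Phi_{AA} = 1/8$ from Wright's formula, which is the argument used for Case 2.3 in Appendix B.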
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$
\Phi_{aabc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}-1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}}+1} \Phi_{AA} \right) + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}}+2} \Phi_{AA} \right), \tag{A3}
$$
where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively. (A4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$
L_{\text{quad}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}
$$

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases} \tag{A5}
$$
Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.

Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.

Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.

For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
$$
\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}}-1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}}+2} \Phi_{AA} \right), \tag{A6}
$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}
\qquad
L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A7}
$$
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

For $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.

For $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap. (B1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;
Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.

Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edges denote parent-child or ancestor-descendant relationships.)
(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.
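The basis case can also be checked numerically. The sketch below is a minimal implementation, under stated assumptions, of the recursive relations the proof relies on: formulas (3) and (4) as quoted in this appendix, plus the boundary values $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$ and $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$, which are assumed here from Karigl's recursive scheme and are not restated in this section. The `make_phi` helper, the pedigree encoding, and the two toy pedigrees (modeled on the captions of Figure 18) are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def make_phi(parents):
    """parents maps an individual to (father, mother); None = unknown."""
    def gen(x):
        known = [p for p in parents.get(x, (None, None)) if p is not None]
        return 0 if not known else 1 + max(gen(p) for p in known)

    @lru_cache(maxsize=None)
    def _phi(ids):
        if None in ids:                    # a missing parent contributes 0
            return Fraction(0)
        if len(ids) == 1:
            return Fraction(1)
        assert len(ids) <= 3, "sketch covers two or three individuals only"
        # Expand the deepest individual: it cannot be an ancestor of the rest.
        a = max(ids, key=lambda x: (gen(x), x))
        k = ids.count(a)
        rest = tuple(x for x in ids if x != a)
        f, m = parents.get(a, (None, None))
        if k == 1:                         # formula (3) and its pairwise analogue
            return (phi(f, *rest) + phi(m, *rest)) / 2
        if k == 2 and not rest:            # Phi_aa (assumed boundary value)
            return (1 + phi(f, m)) / 2
        if k == 2:                         # formula (4): Phi_aab
            return (phi(a, *rest) + phi(f, m, *rest)) / 2
        return (1 + 3 * phi(f, m)) / 4     # Phi_aaa (assumed boundary value)

    def phi(*ids):
        return _phi(tuple(sorted(ids, key=lambda x: (x is None, str(x)))))
    return phi

# Figure 18(a)-style pedigree: c is the only known parent of a and b.
phi_a = make_phi({"a": ("c", None), "b": ("c", None)})
# Figure 18(b)-style pedigree: A is the only known parent of a, b, and c.
phi_b = make_phi({"a": ("A", None), "b": ("A", None), "c": ("A", None)})

half = Fraction(1, 2)
print(phi_a("a", "b", "c") == half**2 * phi_a("c", "c", "c"))
print(phi_b("a", "b", "c") == half**3 * phi_b("A", "A", "A"))
```

Using `Fraction` keeps the comparison exact, so the recursive value and the Type 1 path-counting term can be compared with `==` rather than with a floating-point tolerance.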
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k+1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ has only one parent, $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} * \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AAA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}.
\end{aligned} \tag{B2}
$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle}+1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.

Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.
Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}+1} \Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
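To give a sense of scale, the three basis results can be evaluated numerically under one added assumption that the paper leaves symbolic: the common ancestor in each panel is taken to be noninbred, so that $\Phi_{cc} = \Phi_{AA} = 1/2$. Then

$$
\text{Figure 20(a): } \Phi_{abc} = \left(\frac{1}{2}\right)^{3} \cdot \frac{1}{2} = \frac{1}{16}, \qquad
\text{Figure 20(b): } \Phi_{abc} = \left(\frac{1}{2}\right)^{4} \cdot \frac{1}{2} = \frac{1}{32}, \qquad
\text{Figure 20(c): } \Phi_{abc} = \left(\frac{1}{2}\right)^{5} \cdot \frac{1}{2} = \frac{1}{64}.
$$

For Figure 20(a), the exponent can be traced term by term: $L_{P_{ca}} = 2$, $L_{P_{cb}} = 1$, $L_{P_{cc}} = 0$, and the root overlap path $P_{cb}$ has length 1, so $L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} = 2 + 1 + 0 - 1 = 2$ and the Type 2 exponent is $2 + 1 = 3$.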
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1+1} \Phi_{AA}, \\
&\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\
&\therefore\ \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}+1+1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}+1} \Phi_{AA}.
\end{aligned} \tag{B3}
$$

For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent, $f$,

$$
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B4}
$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}+1} \Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A1) is true for Case 2.3.
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has already been proven. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$. (b) $b$ is an ancestor of $a$. (Edges in the figure denote either a parent-child relationship or an ancestor-descendant relationship.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.

Induction Step. Now we show that the claim is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. That is, $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Consider $pt_i = \{t_a : A_1 \to \cdots \to a,\ t_b : A_1 \to \cdots \to b,\ t_c : A_1 \to \cdots \to c\}$, a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$, grouped below by the coefficient they belong to.

Cases for $\Phi_{aaab}$:

Case 2.1. $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2. $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3. $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:

Case 3.1. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlaps.
Case 3.3.2. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3. $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4. $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5. $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlaps.
Case 4.3.2. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3. $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.
(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5. $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6. $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 4.7. $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
Case 4.8. $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] "Surgeon General's New Family Health History Tool Is Released, Ready for '21st Century Medicine'," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 11: Building blocks ($B_1$, $B_2$, and $B_3$) for all scenarios of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. For $B_3$, all three edges belong to root overlap (i.e., having a root 3-overlap).
Table 2: Largest subgraph $S_j$ of a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$).

$S_i$:   $S_4$   $S_5$   $S_7$   $S_8$   $S_9$   $S_{10}$
$S_j$:   $S_3$   $S_3$   $S_6$   $S_5$   $S_7$   $S_9$
For $S_6 = u_1 = B_3$, we obtain $T_6 = R_1$. For $S_i \in \{S_i \mid 4 \le i \le 10 \text{ and } i \ne 6\}$, we define the largest subgraph of $S_i$, based on which we construct $T_i$.

Definition 7 (Largest Subgraph). Given a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), the largest subgraph of $S_i$, denoted as $S_j$, is defined as follows:

(1) $S_j$ is a proper subgraph of $S_i$;
(2) if $S_i$ contains $B_3$, then $S_j$ must also contain $B_3$;
(3) there is no $S_k$ such that $S_j$ is a proper subgraph of $S_k$ while $S_k$ is also a proper subgraph of $S_i$.

For each scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), we list the largest subgraph of $S_i$, denoted as $S_j$, in Table 2.

For a scenario $S_i$ ($4 \le i \le 10$ and $i \ne 6$), let $\mathrm{Diff}(S_i, S_j)$ denote the set of building blocks in $S_i$ but not in $S_j$, where $S_j$ is the largest subgraph of $S_i$. Let $|E_i|$ and $|E_j|$ denote the number of edges in $S_i$ and $S_j$, respectively. According to Table 2, we can conclude that $|E_i| - |E_j| = 1$. In order to leverage the dependency among building blocks, we consider only $B_2$ in $\mathrm{Diff}(S_i, S_j)$. For example, $\mathrm{Diff}(S_5, S_3) = \{B_2\}$. Let $T_3$ denote all acceptable cases for $S_3$, and let $R_1$ denote the set of acceptable cases for $\mathrm{Diff}(S_5, S_3)$. Then we can use $S_3$ and $\mathrm{Diff}(S_5, S_3)$ to construct all acceptable cases for $S_5$. We apply the same idea to construct all acceptable cases for each $S_i$ in Table 2.

Given a path-quad $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$, an acceptable case has the following properties:

(1) if there is one root 3-overlap path, there can be at most one root 2-overlap path;
(2) otherwise, there can be at most two root 2-overlap paths.
3.2.3. Path-Counting Formula for $\Phi_{abcd}$. Now we present the path-counting formula for $\Phi_{abcd}$ as follows:

$$\Phi_{abcd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 2} \Phi_{AA} \right), \tag{14}$$

where $\Phi_{AA} = \frac{1}{2}(1 + F_A)$, $\Phi_{AAA} = \frac{1}{4}(1 + 3F_A)$, $\Phi_{AAAA} = \frac{1}{8}(1 + 7F_A)$; $F_A$: the inbreeding coefficient of $A$; $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$.

Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively.

$$L_{\text{quad}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{At}} & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases} \tag{15}$$

and $L_{P_{Aa}}$: the length of the path $P_{Aa}$ (also applicable for $P_{Ab}$, $P_{Ac}$, $P_{Ad}$, etc.).

For completeness, the path-counting formulas for $\Phi_{aabc}$ and $\Phi_{aaab}$ are presented in Appendix A. The correctness of the path-counting formula for four individuals is proven in Appendix C.
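To make the bookkeeping in (14) and (15) concrete, the following minimal sketch evaluates the contribution of a single path-quad from one quad-common ancestor $A$. The function names and the way overlaps are passed in (as lengths of the root 2-overlap paths and of the root 3-overlap path) are assumptions made for illustration; enumerating the path-quads of a pedigree is not shown.

```python
from fractions import Fraction

def phi_self(order, F_A=Fraction(0)):
    """Phi_AA, Phi_AAA, Phi_AAAA (order 2, 3, 4) from the closed forms after (14)."""
    denom, mult = {2: (2, 1), 3: (4, 3), 4: (8, 7)}[order]
    return Fraction(1, denom) * (1 + mult * F_A)

def quad_term(lengths, two_overlaps=(), three_overlap=None, F_A=Fraction(0)):
    """Contribution of one path-quad <P_Aa, P_Ab, P_Ac, P_Ad> to Phi_abcd.

    lengths       -- the four path lengths L_PAa, L_PAb, L_PAc, L_PAd
    two_overlaps  -- lengths of the root 2-overlap paths (hypothetical encoding)
    three_overlap -- length of the root 3-overlap path, or None if there is none
    """
    total = sum(lengths)
    n2 = len(two_overlaps)
    if three_overlap is None:
        if n2 > 2:
            raise ValueError("not an acceptable case: more than two root 2-overlaps")
        l_quad = total - sum(two_overlaps)
        # Type 1, Type 2, or Case 1 of Type 3, depending on the number of 2-overlaps.
        extra, order = [(0, 4), (1, 3), (2, 2)][n2]
    else:
        if n2 > 1:
            raise ValueError("not an acceptable case: 3-overlap with two 2-overlaps")
        # Case 2 subtracts 2 * L_PAt; Case 3 subtracts L_PAt + L_PAs.
        l_quad = total - three_overlap - (two_overlaps[0] if n2 else three_overlap)
        extra, order = 2, 2
    return Fraction(1, 2) ** (l_quad + extra) * phi_self(order, F_A)

# Four children of a non-inbred founder A (Type 1): (1/2)^4 * 1/8.
print(quad_term((1, 1, 1, 1)))
# a and b descend from A through a shared child s (Type 2): (1/2)^6 * 1/4.
print(quad_term((2, 2, 1, 1), two_overlaps=(1,)))
```

Summing `quad_term` over all acceptable path-quads of all quad-common ancestors would give $\Phi_{abcd}$ as in (14); the two printed examples can be confirmed by tracing allele transmissions by hand.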
Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. (a) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with paths $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. (b) $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with paths $A \to x \to a$, $A \to y \to b$, $A \to y \to c$, and $A \to x \to d$.
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions

(1) 2-Pair-Path-Pair. It consists of two pairs of path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.

Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge can be $\{T, X, TX\}$ (refer to Section 3.1.1 for the definitions of $T$, $X$, and $TX$). Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ending at $h$), and $r_1$ and $r_2$ are the last individuals of root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

$\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$
Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap
One root 2-overlap path | One root homo-overlap and zero root heter-overlap
One root 2-overlap path | Zero root homo-overlap and one root heter-overlap
Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap
Two root 2-overlap paths | Zero root homo-overlap and two root heter-overlaps
One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \ne h$
One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $h = r_1 \ne r_2$

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \mathrm{MAX}\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
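The generation definition above can be sketched as a short recursive function. The pedigree below and the name `PARENTS` are hypothetical and serve only as an illustration: progenitors have no recorded parents, and every other individual lists one or two parents.

```python
from functools import lru_cache

# Hypothetical pedigree: individual -> tuple of known parents (empty for progenitors).
PARENTS = {
    "p1": (),
    "p2": (),
    "x": ("p1",),        # one parent
    "y": ("p1", "p2"),   # two parents
    "z": ("x", "y"),
    "w": ("p2", "z"),    # parents from different generations
}

@lru_cache(maxsize=None)
def gen(individual):
    """Generation: 0 for a progenitor, otherwise 1 + the largest parent generation."""
    parents = PARENTS[individual]
    if not parents:
        return 0
    return max(gen(p) for p in parents) + 1

for name in PARENTS:
    print(name, gen(name))
```

The single-parent and two-parent rules collapse into one `max` over the known parents, and memoization keeps the computation linear in the number of parent links.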
The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$\Phi_{ab,cd} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{2-pair}} + 1} \Phi_{AA} \right) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}$$

where $A$: a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$: a common ancestor of $a$ and $b$; and $T$: a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4).
Figure 13: Scenarios ($S_0$ to $S_{16}$) of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level.
Type 1: zero root homo-overlap and zero root heter-overlap.

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.

Type 3: either of the following:
(i) zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively;
(ii) one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 \ne r_2$.
(17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \ne T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals, and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \ne S$ and $x \ne T$.

$$B = \begin{cases}
S & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T & \text{otherwise},
\end{cases}$$

$$L_{\text{2-pair}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}} & \text{for Type 4},
\end{cases}$$

$$L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5}, \qquad L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}$$
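The rule for choosing $B$ and the Type 5 term of (16) can be sketched as follows. This is only an illustration under stated assumptions: the generations and parent counts are passed in as plain dictionaries, the function names are hypothetical, and $\Phi_{BB}$ is assumed to take the same form $\frac{1}{2}(1 + F_B)$ that the paper gives for $\Phi_{AA}$.

```python
from fractions import Fraction

def choose_b(S, T, gen, num_parents):
    """Pick B from the ancestor pair (S, T) following the three-way rule for B."""
    if gen[S] < gen[T]:
        return S
    if gen[S] == gen[T] and num_parents[T] == 2:
        return S
    return T

def type5_term(len_Sa, len_Sb, len_Tc, len_Td, F_B=Fraction(0)):
    """(1/2)^(L<PSa,PSb> + L<PTc,PTd> + 1) * Phi_BB for one Type 5 pair (S, T)."""
    phi_BB = Fraction(1, 2) * (1 + F_B)  # assumed to mirror Phi_AA = (1 + F_A) / 2
    exponent = (len_Sa + len_Sb) + (len_Tc + len_Td) + 1
    return Fraction(1, 2) ** exponent * phi_BB

# Hypothetical generations and parent counts for two ancestors S and T.
gen = {"S": 1, "T": 1}
num_parents = {"S": 2, "T": 1}
print(choose_b("S", "T", gen, num_parents))   # T has one parent, so B = T
print(type5_term(1, 1, 1, 1))                 # (1/2)^5 * 1/2
```

Identifying which ancestor pairs $(S, T)$ and which path-pairs qualify as Type 5 requires the overlap and crossover checks described in the text, which this sketch leaves out.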
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair, and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals.

$L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
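As a small worked instance of formula (19), consider a hypothetical pedigree with two unrelated, non-inbred founders $S$ and $T$, where $a$ and $b$ are children of $S$ and $c$ and $d$ are children of $T$ (other parents unknown). There is no quad-common ancestor and exactly one Type 6 pair $(S, T)$. The sketch below, with illustrative function names, evaluates that single term and compares it with the product of the two ordinary kinship coefficients obtained from Wright's formula.

```python
from fractions import Fraction

def phi_self(F):
    """Phi_XX = (1 + F_X) / 2 for an ancestor X with inbreeding coefficient F_X."""
    return Fraction(1, 2) * (1 + F)

def type6_term(len_Sa, len_Sb, len_Tc, len_Td, F_S=Fraction(0), F_T=Fraction(0)):
    """One summand of (19): (1/2)^(L<PSa,PSb> + L<PTc,PTd>) * Phi_SS * Phi_TT."""
    exponent = (len_Sa + len_Sb) + (len_Tc + len_Td)
    return Fraction(1, 2) ** exponent * phi_self(F_S) * phi_self(F_T)

def wright_term(len_1, len_2, F=Fraction(0)):
    """One summand of Wright's formula: (1/2)^(L1 + L2) * Phi_AA."""
    return Fraction(1, 2) ** (len_1 + len_2) * phi_self(F)

phi_ab_cd = type6_term(1, 1, 1, 1)
phi_ab = wright_term(1, 1)   # kinship of a and b through S
phi_cd = wright_term(1, 1)   # kinship of c and d through T
print(phi_ab_cd, phi_ab * phi_cd)
```

In this example the two families are unrelated, so the two-pair coefficient factors into $\Phi_{ab} \cdot \Phi_{cd}$, which is what the single Type 6 term reproduces.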
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of a recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with 0 in-degree). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph, and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $NC(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $NC(r)$ contains only one element: the empty string.

(2) Otherwise, let $u$ be a node with $NC(u)$, and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in $NC(u)$, a code $x i{*}$ is added to $NC(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
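The assignment rules above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the traversal processes a node only after all of its parents have been processed (a topological variant of breadth-first search, used here so that $NC(u)$ is complete before it is propagated), and the `.` separator after each step is an assumption added to keep multi-digit sibling indices unambiguous. The pedigree, the `children` and `gender` dictionaries, and the function name are hypothetical.

```python
from collections import deque

def assign_nodecodes(children, gender, root):
    """Return a dict mapping each node to its set of NodeCodes.

    children -- node -> list of children in sibling order
    gender   -- node -> gender marker appended to each step (e.g. "M" or "F")
    root     -- the single progenitor r (possibly a virtual node)
    """
    # Count parents so that a node is expanded only after all parents are done.
    indegree = {u: 0 for u in children}
    for kids in children.values():
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: set() for u in indegree}
    codes[root].add("")  # rule (1): NC(r) contains only the empty string

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            # rule (2): extend every code of u by the sibling index and gender of v
            for x in codes[u]:
                codes[v].add(f"{x}{i}{gender[v]}.")
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return codes

# Hypothetical pedigree: virtual root r with children f and m; a has parents f and m.
children = {"r": ["f", "m"], "f": ["a", "b"], "m": ["a"]}
gender = {"f": "M", "m": "F", "a": "F", "b": "M"}
nodecodes = assign_nodecodes(children, gender, "r")
for node in sorted(nodecodes):
    print(node, sorted(nodecodes[node]))
```

Each node ends up with one code per path from $r$, so `a` receives two codes (one through each parent), and one code being a prefix of another indicates that the corresponding paths share their initial edges.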
Computations of kinship coefficients for two individualsand generalized kinship coefficients for three individualspresented in [11 12 14 15] are using NodeCodes TheNodeCodes-based computation schemes can also be appliedfor the generalized kinship coefficients for four individualsand two pairs of individuals For completeness please referto [18] for the details using NodeCodes to compute thegeneralized kinship coefficients for four individuals and twopairs of individuals based on our proposed path-countingformulas in Sections 32 and 33
In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.
We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and of the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.
In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.
In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.
Figure 15 shows the effect of depth on the computation efficiency improvement. We can see that the improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement. (Average time (ms) versus individuals in pedigree, for Recursive and NodeCodes.)
Figure 15: The effect of depth on computation efficiency improvement. (Average time (ms) versus depth, from 4 to 18, for Recursive and NodeCodes.)
4. Conclusion
We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices
A Path-Counting Formulas of Special Cases
A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$. (Scenarios $S_0$, $S_1$, $S_2$, and $S_3$; if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the two paths are drawn as a single path $P_{Aa}$.)
Definition A1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.
Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.
Lemma A2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.
Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$, by Lemma A2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:
$$\Phi_{aab} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA} \right) \tag{A1}$$
where $A$ is a common ancestor of $a$ and $b$.
When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.
When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2} \end{cases}$$
$$L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3} \tag{A2}$$
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
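As a sanity check on (A1), the contribution of a single common ancestor $A$ can be evaluated once its path-triples and path-pairs have been classified. The sketch below assumes that classification has already been done and takes only the resulting lengths as input. The function name `phi_aab_contribution` and its arguments are hypothetical, and the values $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ used in the example assume a non-inbred founder.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_aab_contribution(type1, type2, type3, phi_AAA, phi_AA):
    """Contribution of one common ancestor A to Phi_aab, following (A1).

    type1, type2: lists of L_triple values for Type 1 / Type 2 path-triples.
    type3:        list of L_<P_Aa, P_Ab> values for Type 3 path-pairs.
    """
    total = Fraction(0)
    total += sum(HALF ** (L - 1) * phi_AAA for L in type1)   # Type 1 term
    total += sum(HALF ** L * phi_AA for L in type2)          # Type 2 term
    total += sum(HALF ** (L + 1) * phi_AA for L in type3)    # Type 3 term
    return total


# Hypothetical example: a and b are both children of A, and A is the only
# known parent of a, so there is a single Type 3 path-pair of length 1 + 1 = 2.
example = phi_aab_contribution([], [], [2], Fraction(1, 4), Fraction(1, 2))
```

Under these assumptions `example` evaluates to $(1/2)^{3} \cdot (1/2) = 1/16$.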
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.
Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:
$$\Phi_{aabc} = \sum_{A} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \right) + \sum_{A} \left( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \right) \tag{A3}$$
where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.
When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively.
(A4)
When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3} \end{cases}$$
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5} \end{cases} \tag{A5}$$
Note that if $a$ is an ancestor of either $b$ or $c$, or of both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$ can be condensed to $\langle P_{Aa}, P_{Ab} \rangle$.
Definition A3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.
Lemma A4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.
Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab} \rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2} \rangle$, $\langle P_{Aa_1}, P_{Aa_3} \rangle$, and $\langle P_{Aa_2}, P_{Aa_3} \rangle$.
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the default mergeable path-pair.
Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
$$\Phi_{aaab} = \sum_{A} \left( \frac{3}{2} \left( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} \right) + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA} \right) \tag{A6}$$
where $A$ is a common ancestor of $a$ and $b$.
When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2} \rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3} \rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is nonoverlapping.
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1} \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2} \end{cases}$$
$$L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3} \tag{A7}$$
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
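As an illustration of (A6), consider a hypothetical pedigree (not taken from the paper) in which $a$ and $b$ are both children of a non-inbred founder $A$, and $A$ is the only known parent of $a$. The only contributing configuration is then a single Type 3 path-pair with $L_{\text{pair}} = 1 + 1 = 2$. Assuming $\Phi_{AA} = 1/2$ for a non-inbred founder, the formula gives:

```latex
\Phi_{aaab}
  = \left(\tfrac{1}{2}\right)^{L_{\text{pair}} + 2} \Phi_{AA}
  = \left(\tfrac{1}{2}\right)^{4} \cdot \tfrac{1}{2}
  = \tfrac{1}{32}.
```

In the same setting Wright's formula gives $\Phi_{ab} = (1/2)^{2} \cdot (1/2) = 1/8$, so the value above is one quarter of $\Phi_{ab}$.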
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$. (Nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$.)
B. Proof for Path-Counting Formulas of Three Individuals
We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.
For $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$.
For $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
(B1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.
Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):
(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.
(ii) For Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.
(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.
Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.
Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edges denote parent-child and ancestor-descendant relationships.)
B.1.1. Correctness Proof for Case 3.1
Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.
Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.
For Figure 18(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show that it is true for $n = k + 1$.
For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}$.
Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain
$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AAA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle}} \Phi_{AAA}. \end{aligned} \tag{B2}$$
Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle}} \Phi_{ccc}$.
Thus it is true for $n = k + 1$.
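The base cases of Figure 18 can be cross-checked numerically with a small recursive implementation. The sketch below is illustrative only: the function `make_phi`, the integer numbering of individuals (every parent has a smaller number than its child), and the use of `None` for a missing parent are assumptions made here. The three-individual recursions follow formulas (3) and (4) as quoted in this appendix, while the boundary value $\Phi_{aaa} = (1/4)(1 + 3\Phi_{fm})$ and the two-individual recursion are the standard Karigl-style forms and are assumed rather than taken from this section.

```python
from fractions import Fraction
from functools import lru_cache


def make_phi(parents):
    """Build recursive kinship functions for a pedigree.

    parents: dict mapping an integer id to a (father, mother) tuple, with
    None for an unknown parent; every parent id is smaller than its child's.
    """
    half, quarter = Fraction(1, 2), Fraction(1, 4)

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)
        if a < b:
            a, b = b, a                      # recurse on the younger individual
        f, m = parents[a]
        if a == b:
            return half * (1 + phi2(f, m))
        return half * (phi2(f, b) + phi2(m, b))

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        if a is None or b is None or c is None:
            return Fraction(0)
        # After sorting, a has the largest id, so it is not an ancestor of b or c.
        a, b, c = sorted((a, b, c), reverse=True)
        f, m = parents[a]
        if a == b == c:
            return quarter * (1 + 3 * phi2(f, m))
        if a == b:
            return half * (phi2(a, c) + phi3(f, m, c))   # formula (4)
        return half * (phi3(f, b, c) + phi3(m, b, c))    # formula (3)

    return phi2, phi3


# Figure 18(b)-style pedigree: founder A = 0 with children a = 1, b = 2, c = 3.
phi2_b, phi3_b = make_phi({0: (None, None), 1: (0, None), 2: (0, None), 3: (0, None)})
result_18b = phi3_b(1, 2, 3)

# Figure 18(a)-style pedigree: c = 0 is a parent of a = 1 and b = 2.
phi2_a, phi3_a = make_phi({0: (None, None), 1: (0, None), 2: (0, None)})
result_18a = phi3_a(1, 2, 0)
```

With $\Phi_{AAA} = 1/4$ for a non-inbred founder, `result_18b` equals $(1/2)^{3}\Phi_{AAA} = 1/32$ and `result_18a` equals $(1/2)^{2}\Phi_{ccc} = 1/16$, matching both the recursive and the path-counting expressions in the basis step.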
B.1.2. Correctness Proof for Case 3.2
Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with root overlap.
Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.
Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.
For Figure 20(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc} \rangle} + 1} \Phi_{cc} = (1/2)^{2 + 1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.
For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}$.
In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain
$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac} \rangle} + 1 + 1} \Phi_{AA} = \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle} + 1} \Phi_{AA}. \end{aligned} \tag{B3}$$
For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).
In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3
Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.
Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.
Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,
$$\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B4}$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.
Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2) \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}$.
For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.
In summary, it shows that the path-counting formula (A1) is true for Case 2.3.
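The reduction of Case 2.3 to $\Phi_{ab}$ can be illustrated with a direct implementation of Wright's formula. The sketch below is an illustration under stated assumptions rather than the method used in the experiments: `all_paths` and `wright_kinship` are hypothetical helper names, the pedigree is a plain dictionary of child lists, the self-kinship $\Phi_{AA}$ of every candidate ancestor is supplied by the caller, and a path-pair is treated as nonoverlapping when its two paths share no individual other than $A$.

```python
from fractions import Fraction


def all_paths(children, src, dst):
    """Return every directed path from src to dst as a list of nodes."""
    if src == dst:
        return [[src]]
    return [[src] + rest
            for child in children.get(src, [])
            for rest in all_paths(children, child, dst)]


def wright_kinship(children, phi_self, a, b):
    """Kinship of two distinct individuals a and b by path counting.

    Sums (1/2)^(L_PAa + L_PAb) * Phi_AA over all nonoverlapping path-pairs
    from every candidate ancestor A listed in phi_self.
    """
    total = Fraction(0)
    for A, phi_AA in phi_self.items():
        for pa in all_paths(children, A, a):
            for pb in all_paths(children, A, b):
                if set(pa[1:]) & set(pb[1:]):
                    continue      # the two paths share an individual besides A
                length = (len(pa) - 1) + (len(pb) - 1)
                total += Fraction(1, 2) ** length * phi_AA
    return total


# Hypothetical half-sib pedigree: A is the only known parent of a and of b.
children = {"A": ["a", "b"]}
phi_self = {"A": Fraction(1, 2), "a": Fraction(1, 2), "b": Fraction(1, 2)}
phi_ab = wright_kinship(children, phi_self, "a", "b")
phi_aab = Fraction(1, 2) * phi_ab   # single known parent, as in (B4)
```

For this toy pedigree the single path-pair has length 2, so `phi_ab` is $(1/2)^{2} \cdot (1/2) = 1/8$ and `phi_aab` is $1/16$, which is the Type 3 value $(1/2)^{2 + 1}\Phi_{AA}$ from (A1).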
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven above. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Edges denote parent-child and ancestor-descendant relationships.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A1).
Proof. The proof is by induction on the number of triple-common ancestors.
Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.
Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A1) are correct for $G$.
Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.
For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:
(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.
Summing up all contributions, we can obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a : A_1 \to \cdots \to a,\ t_b : A_1 \to \cdots \to b,\ t_c : A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.
If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
C. Proof for Four Individuals and Two Pairs of Individuals
Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.
For $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.
For $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
For $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.
(C1)
Figure 23: Dependency graph for different cases for four individuals.
Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.
According to the dependency graph in Figure 23, the intermediate steps including Cases 3.4 and 3.5 are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] "Surgeon General's New Family Health History Tool Is Released, Ready for '21st Century Medicine'," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] "Glossary of Genetic Terms," National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 12: Examples of 2-pair-path-quads for $\Phi_{ab,cd}$. Panel (a): $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with paths $A \to s \to a$, $A \to s \to b$, $A \to t \to c$, and $A \to t \to d$. Panel (b): $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with paths $A \to x \to a$, $A \to x \to d$, $A \to y \to b$, and $A \to y \to c$.
3.3. Path-Counting Formulas for Two Pairs of Individuals

3.3.1. Terminology and Definitions
(1) 2-Pair-Path-Pair. It consists of two path-pairs, denoted as $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$, where $P_{Sa} \in P(S, a)$, $P_{Sb} \in P(S, b)$, $P_{Tc} \in P(T, c)$, $P_{Td} \in P(T, d)$, $S$ is a common ancestor of $a$ and $b$, and $T$ is a common ancestor of $c$ and $d$. If $A = S = T$, then $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

(2) Homo-Overlap and Heter-Overlap Individual. Given two pairs of individuals $\langle a, b \rangle$ and $\langle c, d \rangle$, if $s \in \mathit{Bi\_C}(P_{Aa}, P_{Ab})$ (or $s \in \mathit{Bi\_C}(P_{Ac}, P_{Ad})$), we call $s$ a homo-overlap individual when $P_{Aa}$ and $P_{Ab}$ (or $P_{Ac}$ and $P_{Ad}$) pass through the same parent of $s$. If $r \in \mathit{Bi\_C}(P_{Ai}, P_{Aj})$, where $i \in \{a, b\}$ and $j \in \{c, d\}$, we call $r$ a heter-overlap individual when $P_{Ai}$ and $P_{Aj}$ pass through the same parent of $r$.

(3) Root Homo-Overlap and Heter-Overlap Path. Given a 2-pair-path-pair $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, if $s$ is a homo-overlap individual and the homo-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root homo-overlap path. If $r$ is a heter-overlap individual and the heter-overlap path extends all the way to the quad-common ancestor $A$, then we call it a root heter-overlap path.
Example 8. $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$ in Figure 12. For (a), $s$ is a homo-overlap individual between $P_{Aa}$ and $P_{Ab}$, and $t$ is a homo-overlap individual between $P_{Ac}$ and $P_{Ad}$; $A \to s$ and $A \to t$ are root homo-overlap paths. For (b), $x$ is a heter-overlap individual between $P_{Aa}$ and $P_{Ad}$, and $y$ is a heter-overlap individual between $P_{Ab}$ and $P_{Ac}$; $A \to x$ and $A \to y$ are root heter-overlap paths.
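The classification in Example 8 can be illustrated with a small sketch. It assumes that each path is given as a list of individuals starting at the quad-common ancestor, and it treats a root overlap as a shared prefix containing more than just that ancestor. The helper names `root_overlap` and `classify_root_overlaps` are hypothetical and are not part of the paper or of the NodeCodes implementation.

```python
from itertools import combinations


def root_overlap(p, q):
    """Return the longest common prefix of two paths that start at the same ancestor."""
    prefix = []
    for u, v in zip(p, q):
        if u != v:
            break
        prefix.append(u)
    return prefix


def classify_root_overlaps(paths, pairs):
    """Split root overlaps into homo (within a pair) and heter (across pairs).

    paths: dict mapping each individual to its path from the common ancestor.
    pairs: the two pairs of individuals, e.g. (("a", "b"), ("c", "d")).
    """
    same_pair = {frozenset(p) for p in pairs}
    result = {"homo": [], "heter": []}
    for i, j in combinations(sorted(paths), 2):
        shared = root_overlap(paths[i], paths[j])
        # A prefix of length 1 is only the ancestor itself: no shared edge.
        if len(shared) > 1:
            kind = "homo" if frozenset((i, j)) in same_pair else "heter"
            result[kind].append((i, j, shared))
    return result


pairs = (("a", "b"), ("c", "d"))
# Figure 12(a): A->s->a, A->s->b, A->t->c, A->t->d
fig_a = {"a": ["A", "s", "a"], "b": ["A", "s", "b"],
         "c": ["A", "t", "c"], "d": ["A", "t", "d"]}
# Figure 12(b): A->x->a, A->x->d, A->y->b, A->y->c
fig_b = {"a": ["A", "x", "a"], "d": ["A", "x", "d"],
         "b": ["A", "y", "b"], "c": ["A", "y", "c"]}

overlaps_a = classify_root_overlaps(fig_a, pairs)
overlaps_b = classify_root_overlaps(fig_b, pairs)
```

For panel (a) the sketch reports two root homo-overlap paths ($A \to s$ and $A \to t$), and for panel (b) two root heter-overlap paths ($A \to x$ and $A \to y$), matching the example.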
3.3.2. Path-Counting Formula for $\Phi_{ab,cd}$. Now we present a path-pair level graphical representation for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$, shown in Figure 13. The options for an edge are the edge types defined in Section 3.1.1. Based on the different types of $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ presented in (14), all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ are summarized in Table 3, where $h$ is the last individual of a root homo-overlap path $P_{Ah}$ (i.e., the path $P_{Ah}$ ends at $h$), and $r_1$ and $r_2$ are the last individuals of the root heter-overlap paths $P_{Ar_1}$ and $P_{Ar_2}$, respectively.

Table 3: A summary of all cases for $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$.

| $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ | $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ |
| --- | --- |
| Zero root 2-overlap and zero root 3-overlap | Zero root homo-overlap and zero root heter-overlap |
| One root 2-overlap path | One root homo-overlap and zero root heter-overlap |
| One root 2-overlap path | Zero root homo-overlap and one root heter-overlap |
| Two root 2-overlap paths | Two root homo-overlaps and zero root heter-overlap |
| Two root 2-overlap paths | Zero root homo-overlap and two root heter-overlaps |
| One root 3-overlap path | One root homo-overlap and two root heter-overlaps, and $h = r_1 = r_2$ |
| One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $r_1 = r_2 \neq h$ |
| One root 2-overlap and one root 3-overlap | One root homo-overlap and two root heter-overlaps, and $h = r_1 \neq r_2$ |

Given a pedigree graph having one or multiple progenitors $\{p_i \mid i > 0\}$, we define the generation of a progenitor $p_i$ to be 0, denoted as $\mathrm{gen}(p_i) = 0$. If an individual $a$ has only one parent $p$, then we define $\mathrm{gen}(a) = \mathrm{gen}(p) + 1$. If an individual $a$ has two parents $f$ and $m$, we define $\mathrm{gen}(a) = \max\{\mathrm{gen}(f), \mathrm{gen}(m)\} + 1$.
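The generation definition above translates directly into a memoized recursion. The following is a minimal sketch, assuming the pedigree is supplied as a dictionary from each individual to a tuple of its known parents (empty for a progenitor); the function name `make_gen` and the sample pedigree are illustrative only.

```python
from functools import lru_cache


def make_gen(parents):
    """Build gen(x) for a pedigree given as {individual: tuple of known parents}."""

    @lru_cache(maxsize=None)
    def gen(x):
        known = parents.get(x, ())
        if not known:
            # Progenitor: generation 0 by definition.
            return 0
        # One parent: gen(p) + 1. Two parents: max(gen(f), gen(m)) + 1.
        return max(gen(p) for p in known) + 1

    return gen


# Hypothetical pedigree: p1 and p2 are progenitors.
parents = {
    "p1": (),
    "p2": (),
    "f": ("p1",),        # one known parent
    "m": ("p1", "p2"),   # two parents, both progenitors
    "a": ("f", "m"),
    "x": ("a", "p2"),    # parents from different generations
}
gen = make_gen(parents)
```

Memoization keeps the computation linear in the number of parent links, which matters because the same ancestor is reached along many paths in a large pedigree.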
Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ at path-pair level (scenarios $S_0$ to $S_{16}$).

The path-counting formula for $\Phi_{ab,cd}$ is as follows:

$$
\Phi_{ab,cd} = \sum_{A} \Bigg( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+2} \Phi_{AA} + \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{2\text{-pair}}+1} \Phi_{AA} \Bigg) + \sum_{(S,T) \in \text{Type 5}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle} + 1} \Phi_{BB}, \tag{16}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$; $S$ is a common ancestor of $a$ and $b$; and $T$ is a common ancestor of $c$ and $d$. For $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ ($S = T = A$), there are four types (i.e., Type 1 to Type 4):

Type 1: zero root homo-overlap and zero root heter-overlap.
Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$.
Type 3: either (i) zero root homo-overlap and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively, or (ii) one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, with $r_1 \neq r_2$.

(17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlaps ending at $r_1$ and $r_2$, with $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td}) \rangle$ ($S \neq T$), there is one type (i.e., Type 5):

Type 5: $\langle P_{Sa}, P_{Sb} \rangle$ has zero overlap individuals and $\langle P_{Tc}, P_{Td} \rangle$ has zero overlap individuals. At most one path-pair (either $\langle P_{Sa}, P_{Sb} \rangle$ or $\langle P_{Tc}, P_{Td} \rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \neq S$ and $x \neq T$.

$$
B = \begin{cases}
S, & \text{when } \mathrm{gen}(S) < \mathrm{gen}(T), \\
S, & \text{when } \mathrm{gen}(S) = \mathrm{gen}(T) \text{ and } T \text{ has two parents}, \\
T, & \text{otherwise},
\end{cases}
$$

$$
L_{2\text{-pair}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}}, & \text{for Type 1}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}}, & \text{for Type 2}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}}, & \text{for Type 3}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 L_{P_{Ah}}, & \text{for Type 4},
\end{cases}
$$

$$
L_{\langle P_{Sa}, P_{Sb} \rangle} = L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5},
$$

$$
L_{\langle P_{Tc}, P_{Td} \rangle} = L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \tag{18}
$$
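As a small illustration of how (16) and (18) combine for Types 1 to 4, the sketch below computes $L_{2\text{-pair}}$ and the power-of-one-half weight that multiplies the corresponding $\Phi$ term. It assumes that the length $L_P$ of a path is its number of edges; the names `l_2pair`, `path_weight`, and `EXTRA_EXPONENT` are hypothetical, and the $\Phi$ factor itself is not computed here.

```python
from fractions import Fraction

# Offsets that (16) adds to L_2-pair in the exponent, for Types 1 to 4.
EXTRA_EXPONENT = {1: 0, 2: 1, 3: 2, 4: 1}


def l_2pair(path_type, path_lengths, overlap_lengths=()):
    """L_2-pair from (18); path_lengths = (L_PAa, L_PAb, L_PAc, L_PAd)."""
    total = sum(path_lengths)
    if path_type == 1:
        return total
    if path_type == 2:
        (l_r,) = overlap_lengths          # length of the root heter-overlap P_Ar
        return total - l_r
    if path_type == 3:
        l_r1, l_r2 = overlap_lengths      # lengths of P_Ar1 and P_Ar2
        return total - l_r1 - l_r2
    if path_type == 4:
        (l_h,) = overlap_lengths          # length of the root homo-overlap P_Ah
        return total - 2 * l_h
    raise ValueError("only Types 1 to 4 are covered by L_2-pair")


def path_weight(path_type, path_lengths, overlap_lengths=()):
    """Power of 1/2 that multiplies the Phi term of the given type in (16)."""
    exponent = l_2pair(path_type, path_lengths, overlap_lengths)
    return Fraction(1, 2) ** (exponent + EXTRA_EXPONENT[path_type])


# Figure 12(b): four paths of two edges each and two root heter-overlaps
# (A -> x and A -> y) of one edge each, i.e. Type 3.
fig_b_length = l_2pair(3, (2, 2, 2, 2), (1, 1))
fig_b_weight = path_weight(3, (2, 2, 2, 2), (1, 1))
```

Under these assumptions the 2-pair-path-pair of Figure 12(b) has $L_{2\text{-pair}} = 8 - 1 - 1 = 6$, so its Type 3 term is $(1/2)^{8} \Phi_{AA}$.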
Note that if $\langle a, b \rangle$ and $\langle c, d \rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$
\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb} \rangle} + L_{\langle P_{Tc}, P_{Td} \rangle}} \Phi_{SS} \ast \Phi_{TT}. \tag{19}
$$

Type 6: $\langle P_{Sa}, P_{Sb} \rangle$ is a nonoverlapping path-pair and $\langle P_{Tc}, P_{Td} \rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb} \rangle$ and a path from $\langle P_{Tc}, P_{Td} \rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb} \rangle}$ and $L_{\langle P_{Tc}, P_{Td} \rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by comparing it with the performance of the recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient and (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.

The NodeCodes of a node are a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with in-degree 0). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as $\mathrm{NC}(u)$, is assigned using a breadth-first-search traversal starting from $r$ as follows.

(1) If $u$ is $r$, then $\mathrm{NC}(r)$ contains only one element: the empty string.
(2) Otherwise, let $u$ be a node with $\mathrm{NC}(u)$ and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then, for each $x$ in $\mathrm{NC}(u)$, a code $x i {*}$ is added to $\mathrm{NC}(v_i)$, where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
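The two assignment rules can be sketched as follows. This is an illustrative sketch rather than the implementation of [11, 12, 18]: it represents a code as a tuple of (sibling index, gender) steps instead of a concatenated string, and it visits nodes in topological order (Kahn's algorithm, a queue-based traversal from $r$) so that $\mathrm{NC}(u)$ is complete before it is propagated to the children of $u$. The function name `node_codes` and the sample pedigree are assumptions.

```python
from collections import deque


def node_codes(children, gender, root):
    """Assign NodeCodes: one code per path from the root to each node.

    children: dict mapping a node to the list of its children in sibling order.
    gender:   dict mapping every non-root node to a gender marker.
    """
    # Count incoming edges so a node is expanded only after all its parents.
    indegree = {}
    for u, kids in children.items():
        indegree.setdefault(u, 0)
        for v in kids:
            indegree[v] = indegree.get(v, 0) + 1

    codes = {u: set() for u in indegree}
    codes[root].add(())  # rule (1): the root holds only the empty code

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            # Rule (2): extend every code of u by (sibling index, gender of v).
            for code in codes[u]:
                codes[v].add(code + ((i, gender[v]),))
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return codes


# Hypothetical pedigree: r has children f and m, who are both parents of c.
children = {"r": ["f", "m"], "f": ["c"], "m": ["c"], "c": []}
gender = {"f": "M", "m": "F", "c": "F"}
nc = node_codes(children, gender, "r")
```

In this toy pedigree, `nc["c"]` holds two codes, one for the path through `f` and one for the path through `m`, which is the property the path-counting method relies on: each code identifies one path from the progenitor.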
The computations of kinship coefficients for two individuals and of generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.

In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20] to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.

We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.

In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.

In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.

Figure 15 shows the effect of depth on the computation efficiency improvement. We can see that the improvement of NodeCodes over Recursive ranges from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement (x-axis: individuals in pedigree, from 77 to 195197; y-axis: average time in ms; series: Recursive and NodeCodes).

Figure 15: The effect of depth on computation efficiency improvement (x-axis: depth, from 4 to 18; y-axis: average time in ms; series: Recursive and NodeCodes).

4. Conclusion

We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices

A. Path-Counting Formulas of Special Cases

A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2} \rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.

Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.

Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ in Figure 16.

Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ (scenarios $S_0$ to $S_3$; when $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the two paths are drawn as a single path $P_{Aa}$).

Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2} \rangle$ cannot be a mergeable path-pair.

Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. This means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2} \rangle$, by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$
\Phi_{aab} = \sum_{A} \Bigg( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{triple}} - 1} \Phi_{AAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{triple}}} \Phi_{AA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA} \Bigg), \tag{A.1}
$$

where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab} \rangle$ is a nonoverlapping path-pair.

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}}, & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}}, & \text{for Type 2},
\end{cases}
$$

$$
L_{\langle P_{Aa}, P_{Ab} \rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A.2}
$$

For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$, if $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$. If $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$.

Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$
\Phi_{aabc} = \sum_{A} \Bigg( \sum_{\text{Type 1}} \left(\frac{1}{2}\right)^{L_{\text{quad}} - 1} \Phi_{AAAA} + \sum_{\text{Type 2}} \left(\frac{1}{2}\right)^{L_{\text{quad}}} \Phi_{AAA} + \sum_{\text{Type 3}} \left(\frac{1}{2}\right)^{L_{\text{quad}} + 1} \Phi_{AA} \Bigg) + \sum_{A} \Bigg( \sum_{\text{Type 4}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 1} \Phi_{AAA} + \sum_{\text{Type 5}} \left(\frac{1}{2}\right)^{L_{\text{triple}} + 2} \Phi_{AA} \Bigg), \tag{A.3}
$$

where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is not mergeable:

Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively.

(A.4)

When $\langle P_{Aa_1}, P_{Aa_2} \rangle$ is mergeable:

Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$
L_{\text{quad}} = \begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}}, & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}, & \text{for Type 2}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}}, & \text{for Case 1} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}}, & \text{for Case 2} \in \text{Type 3}, \\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}}, & \text{for Case 3} \in \text{Type 3},
\end{cases}
$$

$$
L_{\text{triple}} = \begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}, & \text{for Type 4}, \\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}, & \text{for Type 5}.
\end{cases} \tag{A.5}
$$

Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A3 Path-Counting Formula for Φ119886119886119886119887
A special case of⟨1198751198601198861 1198751198601198862 1198751198601198863⟩ for ⟨119875
1198601198861 1198751198601198862 1198751198601198863 119875119860119887⟩ is introduced
when ⟨1198751198601198861 1198751198601198862 1198751198601198863⟩ is mergeable With the existence of
a mergeable path-triple ⟨1198751198601198861 1198751198601198862 1198751198601198863 119875119860119887⟩ can be con-
densed to ⟨119875119860119886 119875119860119887⟩
Definition A3 (Mergeable Path-Triple) Given three paths1198751198601198861
1198751198601198862
and 1198751198601198863
they are mergeable if and only if theyare completely identical
Lemma A4 Given a path-quad ⟨1198751198601198861 1198751198601198862 1198751198601198863 119875119860119887⟩ there
must be at least one mergeable path-pair among ⟨1198751198601198861 1198751198601198862⟩
⟨1198751198601198861 1198751198601198863⟩ ⟨1198751198601198862 1198751198601198863⟩
Proof For an individual 119886 with two parents 119891 and 119898 thepaternal allele of the individual 119886 is transmitted from 119891 andthe maternal allele is transmitted from119898 At allele level onlytwo descent paths starting from an ancestor are allowed Fora path-quad ⟨119875
1198601198861 1198751198601198862 1198751198601198863 119875119860119887⟩ there must be at least one
mergeable path-pair among ⟨1198751198601198861 1198751198601198862⟩ ⟨1198751198601198861 1198751198601198863⟩ and
⟨1198751198601198862 1198751198601198863⟩
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right), \tag{A6}$$

where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):

Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path;

Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:

Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1},\\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A7}$$

Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1a_2a_3b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1a_2a_3b}$.
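As a hedged illustration of how (A6) is evaluated once the path-quads rooted at one common ancestor $A$ have been enumerated and classified, the Python sketch below sums the three type contributions for a single ancestor. The function name `phi_aaab_contribution` and its list-of-lengths inputs are assumptions made for this example and are not taken from the paper; exact rational arithmetic is used so the result can be checked by hand.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def phi_aaab_contribution(type1, type2, type3, phi_AAA, phi_AA):
    """Contribution of one common ancestor A to Phi_aaab, following (A6).

    type1, type2: iterables of L_triple values (Type 1 and Type 2 terms).
    type3: iterable of L_pair values (Type 3 terms).
    phi_AAA, phi_AA: generalized kinship coefficients of A with itself.
    """
    s1 = sum(HALF ** (L - 1) for L in type1) * phi_AAA
    s2 = sum(HALF ** L for L in type2) * phi_AA
    s3 = sum(HALF ** (L + 2) for L in type3) * phi_AA
    return Fraction(3, 2) * (s1 + s2) + s3


# Hypothetical toy pedigree: a non-inbred founder A is the only known parent
# of both a and b, so P_Aa and P_Ab each have length 1, all three paths to a
# coincide, and the single term is of Type 3 with L_pair = 2.
example = phi_aaab_contribution(
    type1=[], type2=[], type3=[2],
    phi_AAA=Fraction(1, 4), phi_AA=Fraction(1, 2),
)
print(example)  # 1/32
```

The full coefficient is the sum of such contributions over all common ancestors $A$ of $a$ and $b$; enumerating and classifying the path-quads is the expensive part and is left out of this sketch.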
[Figure 17: Dependency graph for the different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Case 2.1, Case 2.2, Case 2.3, Case 3.1, Case 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, $\Phi_{ab}$).]
B. Proof for Path-Counting Formulas of Three Individuals

We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ and contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.

Cases for $\Phi_{aab}$:

Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap;

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap;

Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:

Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap;

Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

(B1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):

(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;

(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;

(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

[Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.]

[Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. Legend: parent-child relationship; ancestor-descendant relationship.]
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^{2}\Phi_{AAc} = (1/2)^{3}\Phi_{AAA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.

For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.
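The two basis scenarios can be checked numerically with a minimal sketch of the recursion. The code below assumes the recursive formulas in their usual form from [10]: $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$ and $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$ as quoted in this appendix, together with $\Phi_{aaa} = \frac{1}{4}(1 + 3\Phi_{fm})$ and $\Phi_{aa} = \frac{1}{2}(1 + \Phi_{fm})$, which are assumptions here because they are not restated in this section. The helper name `make_kinship`, the `parents` and `order` dictionaries, and the two toy pedigrees (taken to match the captions of Figure 18, with missing parents encoded as `None`) are likewise hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def make_kinship(parents, order):
    """Build recursive evaluators for Phi_ab and Phi_abc.

    parents: dict id -> (father, mother), with None for a missing parent.
    order:   dict id -> rank such that every parent ranks below its children.
    """
    half = Fraction(1, 2)

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a is None or b is None:
            return Fraction(0)
        if a == b:
            f, m = parents[a]
            return half * (1 + phi2(f, m))
        if order[a] < order[b]:
            a, b = b, a  # expand the later individual, never an ancestor of the other
        f, m = parents[a]
        return half * (phi2(f, b) + phi2(m, b))

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        if a is None or b is None or c is None:
            return Fraction(0)
        # Put the latest individual first so it cannot be an ancestor of the others.
        a, b, c = sorted((a, b, c), key=lambda x: order[x], reverse=True)
        f, m = parents[a]
        if a == b == c:
            return Fraction(1, 4) * (1 + 3 * phi2(f, m))
        if a == b:
            return half * (phi2(a, c) + phi3(f, m, c))
        return half * (phi3(f, b, c) + phi3(m, b, c))

    return phi2, phi3


# Figure 18(a), as captioned: c is the (only known) parent of a and b.
_, phi3_a = make_kinship(
    {"c": (None, None), "a": ("c", None), "b": ("c", None)},
    {"c": 0, "a": 1, "b": 2},
)
rec_18a = phi3_a("a", "b", "c")
path_18a = Fraction(1, 2) ** 2 * phi3_a("c", "c", "c")

# Figure 18(b), read as: A is the (only known) parent of a, b, and c.
_, phi3_b = make_kinship(
    {"A": (None, None), "a": ("A", None), "b": ("A", None), "c": ("A", None)},
    {"A": 0, "a": 1, "b": 2, "c": 3},
)
rec_18b = phi3_b("a", "b", "c")
path_18b = Fraction(1, 2) ** 3 * phi3_b("A", "A", "A")

print(rec_18a, path_18a)  # 1/16 1/16
print(rec_18b, path_18b)  # 1/32 1/32
```

In both scenarios the recursive value and the Type 1 path-counting value coincide, which is what the basis step asserts.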
Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.

For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$.

Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ has only one parent, $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},\\ \because\ & L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\ \therefore\ \Phi_{abc} &= \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}. \end{aligned} \tag{B2}$$

Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.

Thus, it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2

Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.

Proof (Basis). There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.

[Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.]
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.

Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.

For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
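The Type 2 arithmetic for the three basis scenarios can be reproduced with a short sketch. The path lengths below are assumptions read off the recursive expansions above (for example, in Figure 20(b) the root $A$ is taken to be a parent of both $b$ and $c$), and the root is assumed to be a non-inbred founder so that its kinship with itself is $1/2$; the function name `type2_contribution` is hypothetical.

```python
from fractions import Fraction


def type2_contribution(path_lengths, overlap_length, phi_root):
    """One Type 2 term of formula (12): (1/2)^(L + 1) * Phi_AA,
    with L = L_PAa + L_PAb + L_PAc - L_PAs."""
    L = sum(path_lengths) - overlap_length
    return Fraction(1, 2) ** (L + 1) * phi_root


PHI_FOUNDER = Fraction(1, 2)  # assumed: the root ancestor is a non-inbred founder

# (L_PAa, L_PAb, L_PAc) and the length of the root overlap path P_As.
fig20a = type2_contribution((2, 1, 0), 1, PHI_FOUNDER)  # c -> b -> a, root is c
fig20b = type2_contribution((2, 1, 1), 1, PHI_FOUNDER)  # A -> b -> a, A -> c
fig20c = type2_contribution((2, 2, 1), 1, PHI_FOUNDER)  # A -> s -> {a, b}, A -> c

print(fig20a, fig20b, fig20c)  # 1/16 1/32 1/64
```

These values equal $(1/2)^{3}\Phi_{cc}$, $(1/2)^{4}\Phi_{AA}$, and $(1/2)^{5}\Phi_{AA}$ with $\Phi_{cc} = \Phi_{AA} = 1/2$, matching the recursive results term by term.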
Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Show that it is true for $n = k + 1$.

For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA},\\ \because\ & L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\ \therefore\ \Phi_{abc} &= \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}. \end{aligned} \tag{B3}$$

For Figures 21(b) and 21(c), we take the same steps as for the calculation of $\Phi_{abc}$ in Figure 21(a).

In summary, it is true for $n = k + 1$.

[Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.]
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.

Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.

Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$.

$$\because\ a \text{ only has one parent } f, \qquad \therefore\ \Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B4}$$

For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.

For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A1) is true for Case 2.3.
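The reduction of Case 2.3 to Wright's formula can be made concrete with a small path-enumeration sketch. It is only an illustration under stated assumptions: the pedigree is given as a hypothetical `children` adjacency dictionary, the self-kinship $\Phi_{AA}$ of every individual is supplied in `phi_self` rather than computed, and paths are enumerated naively, which is workable only for tiny graphs (the paper relies on NodeCodes for large pedigrees).

```python
from fractions import Fraction
from itertools import product


def all_paths(children, src, dst):
    """All directed paths from src to dst, each as a tuple of nodes."""
    if src == dst:
        return [(dst,)]
    paths = []
    for v in children.get(src, ()):
        for tail in all_paths(children, v, dst):
            paths.append((src,) + tail)
    return paths


def wright_kinship(children, phi_self, a, b):
    """Phi_ab as a sum over nonoverlapping path-pairs <P_Aa, P_Ab>."""
    total = Fraction(0)
    for A in phi_self:
        pairs = product(all_paths(children, A, a), all_paths(children, A, b))
        for pa, pb in pairs:
            if set(pa) & set(pb) == {A}:  # the paths share only the root A
                length = (len(pa) - 1) + (len(pb) - 1)
                total += Fraction(1, 2) ** length * phi_self[A]
    return total


# Hypothetical pedigree in the spirit of Figure 22(a): A -> f -> a and A -> b,
# where every individual has a single known parent and nobody is inbred.
children = {"A": ["f", "b"], "f": ["a"]}
phi_self = {x: Fraction(1, 2) for x in ("A", "f", "a", "b")}

phi_ab = wright_kinship(children, phi_self, "a", "b")
phi_aab = Fraction(1, 2) * phi_ab  # (B4): Phi_aab = (1/2) Phi_ab when m is missing

print(phi_ab, phi_aab)  # 1/16 1/32
```

Multiplying each Wright term by $1/2$ is the same as raising the exponent to $L_{\langle P_{Aa}, P_{Ab}\rangle} + 1$, which is the Type 3 contribution in formula (A1).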
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1a_2b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has been proven above. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A1).

[Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. Legend: parent-child relationship; ancestor-descendant relationship.]
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A1).

Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.

Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;

(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.

For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.

For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\ t_b\colon A_1 \to \cdots \to b,\ t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
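The set $\mathit{Tri\_C}(a, b, c, G)$ and the choice of the topmost ancestor $A_1$ used in the induction step can be sketched as follows. This is an illustrative helper under assumptions, not the paper's code: the pedigree is a hypothetical `parents` dictionary mapping each individual to a (father, mother) tuple with `None` for a missing parent, and an individual is counted as its own ancestor (a path of length 0), in line with Figure 18(a), where $c$ is a triple-common ancestor of $a$, $b$, and $c$.

```python
def ancestors(parents, x):
    """All ancestors of x, including x itself (path of length 0)."""
    seen, stack = set(), [x]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(p for p in parents.get(u, ()) if p is not None)
    return seen


def triple_common_ancestors(parents, a, b, c):
    """Tri_C(a, b, c, G): individuals that can reach a, b, and c."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)


def topmost(parents, common):
    """Common ancestors that have no other common ancestor above them."""
    return {A for A in common if not ((ancestors(parents, A) - {A}) & common)}


# Hypothetical pedigree: A1 -> A2, and A2 is the parent of a, b, and c.
parents = {
    "A2": ("A1", None),
    "a": ("A2", None),
    "b": ("A2", None),
    "c": ("A2", None),
}
common = triple_common_ancestors(parents, "a", "b", "c")
print(sorted(common))            # ['A1', 'A2']
print(topmost(parents, common))  # {'A1'}
```

In this toy pedigree every path-triple from `A1` passes through the lower triple-common ancestor `A2`, so by the eligibility rule in case (2) the contribution $S(A_1)$ is zero and all of $\Phi_{abc}$ comes from `A2`.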
C. Proof for Four Individuals and Two Pairs of Individuals

Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.

Cases for $\Phi_{aaab}$:

Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap;

Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap;

Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:

Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap;

Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap;

Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap;

Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap;

Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap;

Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap;

Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:

Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap;

Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap;

Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap;

Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap;

Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.

(C1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

[Figure 23: Dependency graph for different cases for four individuals.]

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.

Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.

Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlap.

Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlap.

Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.

Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlap and zero root heter-overlap.

Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.

Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.

Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.

[Figure 24: Dependency graph for different cases for two pairs of individuals.]
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] "Surgeon General's New Family Health History Tool Is Released, Ready for '21st Century Medicine'," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
[Figure 13: Scenarios of $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ at path-pair level (states $S_0$ to $S_{16}$; paths $P_{Aa}$, $P_{Ab}$, $P_{Ac}$, $P_{Ad}$).]
Type 1: zero root homo-overlap and zero root heter-overlap;

Type 2: zero root homo-overlap and one root heter-overlap $P_{Ar}$ ending at $r$;

Type 3: either zero root homo-overlap and two root heter-overlap $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, respectively; or one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlap $P_{Ar_1}$ and $P_{Ar_2}$ ending at $r_1$ and $r_2$, and $r_1 = r_2$; (17)

Type 4: one root homo-overlap $P_{Ah}$ ending at $h$ and two root heter-overlap ending at $r_1$ and $r_2$, and $h = r_1 = r_2$.

For $\langle (P_{Sa}, P_{Sb}), (P_{Tc}, P_{Td})\rangle$ ($S \ne T$), there is one type (i.e., Type 5).

Type 5: $\langle P_{Sa}, P_{Sb}\rangle$ has zero overlap individual; $\langle P_{Tc}, P_{Td}\rangle$ has zero overlap individual. At most one path-pair (either $\langle P_{Sa}, P_{Sb}\rangle$ or $\langle P_{Tc}, P_{Td}\rangle$) can have crossover individuals. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals $x$, where $x \ne S$ and $x \ne T$.

$$B = \begin{cases} S & \text{when } \operatorname{gen}(S) < \operatorname{gen}(T),\\ S & \text{when } \operatorname{gen}(S) = \operatorname{gen}(T) \text{ and } T \text{ has two parents},\\ T & \text{otherwise}, \end{cases}$$

$$\begin{aligned} L_{\text{2-pair}} &= \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} & \text{for Type 1},\\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar}} & \text{for Type 2},\\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - L_{P_{Ar_1}} - L_{P_{Ar_2}} & \text{for Type 3},\\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} + L_{P_{Ad}} - 2 \cdot L_{P_{Ah}} & \text{for Type 4}, \end{cases}\\ L_{\langle P_{Sa}, P_{Sb}\rangle} &= L_{P_{Sa}} + L_{P_{Sb}} \quad \text{for Type 5},\\ L_{\langle P_{Tc}, P_{Td}\rangle} &= L_{P_{Tc}} + L_{P_{Td}} \quad \text{for Type 5}. \end{aligned} \tag{18}$$
Note that if $\langle a, b\rangle$ and $\langle c, d\rangle$ have zero quad-common ancestors, we have the following formula for $\Phi_{ab,cd}$:

$$\Phi_{ab,cd} = \sum_{(S,T) \in \text{Type 6}} \left(\frac{1}{2}\right)^{L_{\langle P_{Sa}, P_{Sb}\rangle} + L_{\langle P_{Tc}, P_{Td}\rangle}} \Phi_{SS} \cdot \Phi_{TT}. \tag{19}$$
Type 6: $\langle P_{Sa}, P_{Sb}\rangle$ is a nonoverlapping path-pair and $\langle P_{Tc}, P_{Td}\rangle$ is a nonoverlapping path-pair. Between a path from $\langle P_{Sa}, P_{Sb}\rangle$ and a path from $\langle P_{Tc}, P_{Td}\rangle$, there are no overlap individuals, but there can be crossover individuals. $L_{\langle P_{Sa}, P_{Sb}\rangle}$ and $L_{\langle P_{Tc}, P_{Td}\rangle}$ are defined as in Type 5.

The correctness of the path-counting formula for $\Phi_{ab,cd}$ is proven in Appendix C. For completeness, please refer to [18] for the path-counting formulas for $\Phi_{aa,bc}$, $\Phi_{ab,ac}$, $\Phi_{ab,ab}$, and $\Phi_{aa,ab}$.
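Formula (19) can be evaluated directly once the Type 6 combinations have been enumerated. The sketch below is a minimal illustration under assumptions: each Type 6 term is passed in as a hypothetical tuple of the four path lengths and the two ancestor coefficients $\Phi_{SS}$ and $\Phi_{TT}$, and the enumeration and the overlap and crossover checks that define Type 6 are taken as already done.

```python
from fractions import Fraction


def phi_two_pairs_no_quad(type6_terms):
    """Evaluate formula (19) for pairs <a, b> and <c, d> without a
    quad-common ancestor.

    type6_terms: iterable of (L_PSa, L_PSb, L_PTc, L_PTd, phi_SS, phi_TT),
    one tuple per Type 6 combination of path-pairs rooted at S and T.
    """
    total = Fraction(0)
    for L_Sa, L_Sb, L_Tc, L_Td, phi_SS, phi_TT in type6_terms:
        exponent = (L_Sa + L_Sb) + (L_Tc + L_Td)
        total += Fraction(1, 2) ** exponent * phi_SS * phi_TT
    return total


# Hypothetical example: a and b are half sibs through a non-inbred founder S,
# c and d are half sibs through an unrelated non-inbred founder T.
half = Fraction(1, 2)
value = phi_two_pairs_no_quad([(1, 1, 1, 1, half, half)])
print(value)  # 1/64
```

In this example the two families are unrelated, so the result equals the product $\Phi_{ab}\,\Phi_{cd} = (1/8)(1/8)$, which is a quick sanity check on the exponent bookkeeping.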
3.4. Experimental Results. In this section, we show the efficiency of our path-counting method using NodeCodes for condensed identity coefficients by making comparisons with the performance of the recursive method used in [10]. We implemented two methods: (1) using recursive formulas to compute each required kinship coefficient and generalized kinship coefficient; (2) using the path-counting method coupled with NodeCodes to compute each required kinship coefficient and generalized kinship coefficient independently. We refer to the first method as Recursive and to the second method as NodeCodes. For completeness, please refer to [18] for the details of the NodeCodes-based method.
NodeCodes of a node is a set of labels, each representing a path to the node from its ancestors. Given a pedigree graph, let $r$ be the progenitor (i.e., the node with in-degree 0). (For simplicity, we assume there is one progenitor $r$ as the ancestor of all individuals in the pedigree. Otherwise, a virtual node $r$ can be added to the pedigree graph and all progenitors can be made children of $r$.) For each node $u$ in the graph, the set of NodeCodes of $u$, denoted as NC($u$), is assigned using a breadth-first-search traversal starting from $r$ as follows:
(1) If $u$ is $r$, then NC($r$) contains only one element, the empty string.
(2) Otherwise, let $u$ be a node with NC($u$) and let $v_0, v_1, \ldots, v_k$ be $u$'s children in sibling order; then for each $x$ in NC($u$), a code $x\,i{*}$ is added to NC($v_i$), where $0 \le i \le k$ and $*$ indicates the gender of the individual represented by node $v_i$.
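The two assignment rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `assign_nodecodes`, the dictionary-based graph representation, and the trailing `.` that separates code segments are all hypothetical choices made here. Nodes are visited breadth-first, and a child is enqueued only after all of its parents have been processed, so that each parent's code set is complete before it is extended.

```python
from collections import deque


def assign_nodecodes(children, gender, root):
    """Assign NodeCodes to every node reachable from `root`.

    children: dict mapping a node to the list of its children in sibling order
    gender:   dict mapping each non-root node to a gender symbol, e.g. "M" or "F"
    root:     the single progenitor (assumed to be the only node with in-degree 0)
    """
    nodes = set(children) | {v for kids in children.values() for v in kids}
    indegree = {u: 0 for u in nodes}
    for kids in children.values():
        for v in kids:
            indegree[v] += 1

    codes = {u: set() for u in nodes}
    codes[root].add("")  # rule (1): the progenitor holds only the empty string

    queue = deque([root])
    while queue:
        u = queue.popleft()
        for i, v in enumerate(children.get(u, [])):
            # rule (2): extend every code x of u by the sibling index i
            # and the gender symbol of the child v
            for x in codes[u]:
                codes[v].add(f"{x}{i}{gender[v]}.")
            indegree[v] -= 1
            if indegree[v] == 0:  # all parents of v have been processed
                queue.append(v)
    return codes


# Hypothetical pedigree: virtual progenitor r, founders f and m, full siblings a and b.
children = {"r": ["f", "m"], "f": ["a", "b"], "m": ["a", "b"]}
gender = {"f": "M", "m": "F", "a": "M", "b": "F"}
codes = assign_nodecodes(children, gender, "r")
```

In this sketch each code of a node corresponds to exactly one path from the progenitor, so `a` receives two codes, one through each parent.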
Computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.
In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.
We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.
In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes scales very well on large pedigrees in terms of computing condensed identity coefficients.
In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.
Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement (average time in ms versus number of individuals in the pedigree, for Recursive and NodeCodes).
Figure 15: The effect of depth on computation efficiency improvement (average time in ms versus depth, for Recursive and NodeCodes).
4. Conclusion
We have introduced a framework for generalizing Wright's path-counting formula for more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on condensed identity coefficient computation by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices
A. Path-Counting Formulas of Special Cases
A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ (panels $S_0$, $S_1$, $S_2$, and $S_3$; when $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the two paths are drawn as a single path $P_{Aa}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.
Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.
Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.
Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:
$$\Phi_{aab} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}\right), \tag{A.1}$$
where $A$ is a common ancestor of $a$ and $b$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2,} \end{cases}$$
$$L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3.} \tag{A.2}$$
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.
Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:
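$$\begin{aligned}\Phi_{aabc} ={}& \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\right) \\ &+ \sum_{A}\left(\sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA} + \sum_{\text{Type 5}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\right),\end{aligned} \tag{A.3}$$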
where $A$ is a quad-common ancestor of $a$, $b$, $c$, and $d$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path $P_{As}$ and one root 3-overlap path $P_{At}$ ending at $s$ and $t$, respectively. (A.4)
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3,} \\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3,} \end{cases}$$
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4,} \\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5.} \end{cases} \tag{A.5}$$
Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.
Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.
Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair.
Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
$$\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right), \tag{A.6}$$
where $A$ is a common ancestor of $a$ and $b$.
When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1,} \\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2,} \end{cases}$$
$$L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3.} \tag{A.7}$$
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$ (nodes: Cases 2.1, 2.2, 2.3, 3.1, and 3.2, $\Phi_{AAA}$, $\Phi_{AA}$, and $\Phi_{ab}$).
B. Proof for Path-Counting Formulas of Three Individuals
We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.
For $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.
For $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B.1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.
Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):
(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;
(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;
(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.
Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.
Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Edges denote parent-child or ancestor-descendant relationships.)
B.1.1. Correctness Proof for Case 3.1
Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof. (Basis) There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.
Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2\Phi_{AAc} = (1/2)^3\Phi_{AAA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.
For Figure 18(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = (1/2)^2\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = (1/2)^3\Phi_{AAA}$.
Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.
For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^*$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^*$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$.
Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain
$$\begin{aligned}\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2} * \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}.\end{aligned} \tag{B.2}$$
Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.
Thus it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2
Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof. (Basis) There are three basic scenarios: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another among $a$, $b$, and $c$.
Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^3\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^4\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2\Phi_{ssc} = (1/2)^3\Phi_{sc} = (1/2)^5\Phi_{AA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.
For Figure 20(a), $c$ is the only triple-common ancestor and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^3\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4\Phi_{AA}$ and $\Phi_{abc} = (1/2)^5\Phi_{AA}$, respectively.
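As a numerical check of the basis cases, the recursion can be evaluated directly on the pedigrees of Figures 18(a) and 20(a). The Python sketch below is illustrative only: the function names, the `parents` map (each individual mapped to a `(father, mother)` tuple with `None` for a missing parent), and the use of node depth to decide which individual to expand are assumptions made here. The rule for three identical individuals, $\Phi_{aaa} = (1 + 3\Phi_{fm})/4$, is taken from Karigl's recursion and is not restated in this section. Founders are assumed to be unrelated and non-inbred.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def get_parents(parents, x):
    # A missing parent is represented by None; founders have (None, None).
    return parents.get(x, (None, None))


def depth(parents, x):
    """Length of the longest path from a founder down to x."""
    known = [p for p in get_parents(parents, x) if p is not None]
    return 0 if not known else 1 + max(depth(parents, p) for p in known)


def phi2(parents, a, b):
    """Kinship coefficient of a and b (zero if either individual is missing)."""
    if a is None or b is None:
        return Fraction(0)
    f, m = get_parents(parents, a)
    if a == b:
        return HALF * (1 + phi2(parents, f, m))
    if depth(parents, a) < depth(parents, b):
        return phi2(parents, b, a)  # always expand the deeper individual
    return HALF * (phi2(parents, f, b) + phi2(parents, m, b))


def phi3(parents, a, b, c):
    """Generalized kinship coefficient for three individuals."""
    if a is None or b is None or c is None:
        return Fraction(0)
    # The deepest individual cannot be an ancestor of the other two.
    x = max((a, b, c), key=lambda v: depth(parents, v))
    rest = [v for v in (a, b, c) if v != x]
    f, m = get_parents(parents, x)
    if len(rest) == 2:  # x occurs once: Phi_xbc = (Phi_fbc + Phi_mbc) / 2
        return HALF * (phi3(parents, f, *rest) + phi3(parents, m, *rest))
    if len(rest) == 1:  # x occurs twice: Phi_xxb = (Phi_xb + Phi_fmb) / 2
        return HALF * (phi2(parents, x, rest[0]) + phi3(parents, f, m, rest[0]))
    return Fraction(1, 4) * (1 + 3 * phi2(parents, f, m))  # x occurs three times


# Figure 18(a): c is the single known parent of both a and b.
fig18a = {"a": ("c", None), "b": ("c", None)}
# Figure 20(a): c is the single known parent of b, and b of a.
fig20a = {"b": ("c", None), "a": ("b", None)}

recursive_18a = phi3(fig18a, "a", "b", "c")
path_count_18a = HALF ** 2 * phi3(fig18a, "c", "c", "c")  # Type 1, L = 1 + 1 + 0
recursive_20a = phi3(fig20a, "a", "b", "c")
path_count_20a = HALF ** (2 + 1) * phi2(fig20a, "c", "c")  # Type 2, L = 2 + 1 + 0 - 1
```

Under these assumptions the recursive value and the path-counting value agree in both pedigrees, which is what the basis step asserts.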
Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Show that it is true for $n = k + 1$.
For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^*$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^*$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.
In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain
$$\begin{aligned}\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}, \\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}.\end{aligned} \tag{B.3}$$
For Figures 21(b) and 21(c), we take the same steps as we calculate $\Phi_{abc}$ for Figure 21(a).
In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3
Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.
Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.
Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$,
$$\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}). \tag{B.4}$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.
Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.
For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.
In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
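Wright's formula as used in this proof, $\Phi_{ab} = \sum_{A}\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$ over nonoverlapping path-pairs, can be sketched in Python by enumerating descent paths explicitly. This is a minimal sketch under stated assumptions, not the NodeCodes-based implementation: the names `paths_to` and `wright_kinship` and the `parents` map (each individual mapped to a tuple of its known parents) are hypothetical, founders are assumed unrelated, and a path-pair is treated as nonoverlapping when its two paths share only their root $A$.

```python
from fractions import Fraction
from itertools import product


def paths_to(parents, x):
    """Return {ancestor: [paths]}, each path a tuple of nodes from the ancestor down to x.

    x itself is included as an ancestor with the zero-length path (x,).
    """
    result = {x: [(x,)]}
    for p in parents.get(x, ()):
        for anc, paths in paths_to(parents, p).items():
            result.setdefault(anc, []).extend(path + (x,) for path in paths)
    return result


def wright_kinship(parents, a, b):
    """Kinship coefficient of a and b by path counting."""
    if a == b:
        known = parents.get(a, ())
        if len(known) < 2:  # founder or single known parent: no inbreeding
            return Fraction(1, 2)
        return Fraction(1, 2) * (1 + wright_kinship(parents, known[0], known[1]))
    paths_a, paths_b = paths_to(parents, a), paths_to(parents, b)
    total = Fraction(0)
    for anc in paths_a.keys() & paths_b.keys():  # common ancestors A
        phi_AA = wright_kinship(parents, anc, anc)
        for p, q in product(paths_a[anc], paths_b[anc]):
            if set(p) & set(q) == {anc}:  # the pair shares only its root
                length = (len(p) - 1) + (len(q) - 1)  # number of edges in both paths
                total += Fraction(1, 2) ** length * phi_AA
    return total


# Hypothetical pedigree: a and b are full siblings; c is a child of a and b.
pedigree = {"a": ("f", "m"), "b": ("f", "m"), "c": ("a", "b")}
phi_ab = wright_kinship(pedigree, "a", "b")
phi_ca = wright_kinship(pedigree, "c", "a")
phi_cc = wright_kinship(pedigree, "c", "c")
```

Explicit path enumeration grows quickly with pedigree depth, which is the cost that path encodings such as NodeCodes are meant to reduce; the sketch only illustrates what is being counted.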
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Edges denote parent-child or ancestor-descendant relationships.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).
Proof. Proof by induction on the number of triple-common ancestors.
Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.
Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.
Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.
For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:
(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.
Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a: A_1 \to \cdots \to a,\ t_b: A_1 \to \cdots \to b,\ t_c: A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.
If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
C. Proof for Four Individuals and Two Pairs of Individuals
Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two-pair individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.
For $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
For $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
For $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C.1)
Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.
Figure 23: Dependency graph for different cases for four individuals.
According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.
Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.
Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.
According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
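The reductions named in this subsection can be written down as a small dependency map and checked for a valid proof order. The sketch below is only an illustration: it encodes the four transformations stated in the text (Figure 24 may contain further edges that are not listed here), and the variable and function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each case maps to the set of cases it is reduced to.
# Only the transformations explicitly named in the text are encoded.
depends_on = {
    "Case 4.2": {"Case 4.6"},
    "Case 4.3.1": {"Case 4.7"},
    "Case 4.3.2": {"Case 4.7"},
    "Case 4.4": {"Case 4.8"},
}

def proof_order(graph):
    """Return the cases ordered so that each one follows the cases it relies on.

    TopologicalSorter raises graphlib.CycleError if the reductions are circular.
    """
    return list(TopologicalSorter(graph).static_order())

order = proof_order(depends_on)
print(order)
```

A cycle among the reductions would make the proof circular, which is why an acyclicity check is a natural sanity test for a dependency graph of this kind.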
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals. (Graph nodes: Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, and 4.8, together with the base coefficients $\Phi_{AA}$, $\Phi_{TT}$, $\Phi_{AAA}$, and $\Phi_{AAAA}$.)
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institutes of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
The computations of kinship coefficients for two individuals and generalized kinship coefficients for three individuals presented in [11, 12, 14, 15] use NodeCodes. The NodeCodes-based computation schemes can also be applied to the generalized kinship coefficients for four individuals and two pairs of individuals. For completeness, please refer to [18] for the details of using NodeCodes to compute the generalized kinship coefficients for four individuals and two pairs of individuals based on our proposed path-counting formulas in Sections 3.2 and 3.3.
In order to test the scalability of our approach for calculating condensed identity coefficients on large pedigrees, we used a population simulator implemented in [11] to generate arbitrarily large pedigrees. The population simulator is based on the algorithm for generating populations with overlapping generations in Chapter 4 of [19], along with the parameters given in Appendix B of [20], to model the relatively isolated Finnish Kainuu subpopulation and its growth during the years 1500–2000. An overview of the generation algorithm was presented in [11, 12, 14]. The parameters include starting/ending year, initial population size, initial age distribution, marriage probability, maximum age at pregnancy, expected number of children by time period, immigration rate, and probability of death by time period and age group.
We examine the performance of computing condensed identity coefficients using twelve synthetic pedigrees, which range from 75 individuals to 195,197 individuals. The smallest pedigree spans 3 generations, and the largest pedigree spans 19 generations. We analyzed the effects of pedigree size and the depth of individuals in the pedigree (the longest path between the individual and a progenitor) on the computation efficiency improvement.
In the first experiment, 300 random pairs were selected from each of our 12 synthetic pedigrees. Figure 14 shows the computation efficiency improvement for each pedigree. As can be seen, the improvement of NodeCodes over Recursive grew increasingly larger as the pedigree size increased, from a comparable amount of 26.83% on the smallest pedigree to 94.75% on the largest pedigree. It also shows that the path-counting method coupled with NodeCodes can scale very well on large pedigrees in terms of computing condensed identity coefficients.
In our next experiment, we examined the effect of the depth of the individual in the pedigree on the query time. For each depth, we generated 300 random pairs from the largest synthetic pedigree.
Figure 15 shows the effect of depth on the computation efficiency improvement. We can see the improvement of NodeCodes over Recursive ranging from 86.48% to 91.30%.
Figure 14: The effect of pedigree size on computation efficiency improvement. (Plot of average time in ms against the number of individuals in the pedigree, for Recursive and NodeCodes.)
Figure 15: The effect of depth on computation efficiency improvement. (Plot of average time in ms against depth, from 4 to 18, for Recursive and NodeCodes.)
4. Conclusion
We have introduced a framework for generalizing Wright's path-counting formula to more than two individuals. Aiming at efficiently computing condensed identity coefficients, we proposed path-counting formulas (PCF) for all generalized kinship coefficients that are sufficient for expressing condensed identity coefficients by a linear combination. We also performed experiments to compare the efficiency of our method with the recursive method for computing condensed identity coefficients on large pedigrees. Our future work includes (i) further improvements on the computation of condensed identity coefficients by collectively calculating the set of generalized kinship coefficients to avoid redundant computations and (ii) experimental results for using PCF in conjunction with encoding schemes (e.g., compact path-encoding schemes [13]) for computing condensed identity coefficients on very large pedigrees.
Appendices
A. Path-Counting Formulas of Special Cases
A.1. Path-Counting Formula for $\Phi_{aab}$. For $\langle P_{Aa_1}, P_{Aa_2}\rangle$, we introduce a special case where $P_{Aa_1}$ and $P_{Aa_2}$ are mergeable.
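Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$. (Panels labeled $S_0$ to $S_3$ over the paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Ab}$, with the annotation "if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable" leading to the merged path $P_{Aa}$.)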
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.
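As a small illustration of this definition, the check below treats a path as the ordered sequence of individuals from the ancestor down to the target. That representation and the function name are assumptions made for this sketch, not part of the paper's implementation.

```python
def is_mergeable(*paths):
    """Return True when all given paths are identical node sequences.

    Each path is assumed to be a sequence of individual identifiers,
    listed from the common ancestor A down to the target individual.
    """
    first = tuple(paths[0])
    return all(tuple(p) == first for p in paths[1:])

# Two descent paths from A to a through the same parent f: mergeable.
same = is_mergeable(["A", "f", "a"], ["A", "f", "a"])
# One path through f and one through m: not mergeable.
different = is_mergeable(["A", "f", "a"], ["A", "m", "a"])
print(same, different)
```

Because the function accepts any number of paths, the same check also covers the case where more than two paths to the same individual are compared.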
Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.
Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.
Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:
$$\Phi_{aab} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}\right), \tag{A1}$$
where $A$ is a common ancestor of $a$ and $b$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A2}$$
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
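To make the Type 3 term of (A1) concrete, the following sketch enumerates paths in a toy pedigree and sums the contribution of nonoverlapping path-pairs from one ancestor. Several things here are assumptions made for illustration: the pedigree is a plain child-list dictionary, path length is counted in edges, "nonoverlapping" is taken to mean that the two paths share only the root $A$, and $\Phi_{AA} = 1/2$ corresponds to a non-inbred ancestor.

```python
from fractions import Fraction

def paths(children, src, dst):
    """Enumerate all directed paths src -> dst as tuples of node names."""
    if src == dst:
        return [(src,)]
    found = []
    for child in children.get(src, ()):
        for tail in paths(children, child, dst):
            found.append((src,) + tail)
    return found

def type3_term(children, A, a, b, phi_AA):
    """Sum (1/2)^(L+1) * Phi_AA over nonoverlapping path-pairs <P_Aa, P_Ab>."""
    total = Fraction(0)
    for pa in paths(children, A, a):
        for pb in paths(children, A, b):
            if set(pa) & set(pb) == {A}:          # the paths share only the root
                length = (len(pa) - 1) + (len(pb) - 1)
                total += Fraction(1, 2) ** (length + 1) * phi_AA
    return total

# Toy pedigree: A has two children a and b (other parents unknown).
half_sibs = {"A": ["a", "b"]}
term = type3_term(half_sibs, "A", "a", "b", Fraction(1, 2))
print(term)  # 1/16
```

In this toy pedigree there is only one path from $A$ to $a$, so Types 1 and 2 are empty and the Type 3 term is the whole value of $\Phi_{aab}$. The same value follows from the recursive formula $\Phi_{aab} = \frac{1}{2}(\Phi_{ab} + \Phi_{fmb})$ with $\Phi_{ab} = 1/8$ and $\Phi_{fmb} = 0$.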
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.
Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:
$$\Phi_{aabc} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA} + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\right) + \sum_{A}\left(\sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA} + \sum_{\text{Type 5}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\right), \tag{A3}$$
where $A$ is a quad-common ancestor of $a$, $b$, and $c$.
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively.
Case 2: one root 3-overlap path $P_{At}$ ending at $t$.
Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively.
(A4)
When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
$$L_{\text{quad}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3},\\ L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3}, \end{cases}$$
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4},\\ L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}. \end{cases} \tag{A5}$$
Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
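The bookkeeping in (A3) to (A5) reduces to "sum the path lengths, subtract the lengths of the root overlap paths, then add a type-dependent offset to the exponent". A minimal sketch of that arithmetic follows; the function names, the `OFFSET` table layout, and the example lengths are hypothetical and only mirror the exponents stated in (A3).

```python
from fractions import Fraction

# Exponent offsets read off (A3): Types 1-3 use L_quad, Types 4-5 use L_triple.
OFFSET = {1: -1, 2: 0, 3: 1, 4: 1, 5: 2}

def reduced_length(path_lengths, overlap_lengths=()):
    """Total path length minus the lengths of the root overlap paths, as in (A5)."""
    return sum(path_lengths) - sum(overlap_lengths)

def aabc_weight(kind, path_lengths, overlap_lengths=()):
    """Return (1/2)^(L + offset), the factor that multiplies the base coefficient."""
    expected = 4 if kind in (1, 2, 3) else 3      # path-quad or condensed path-triple
    if len(path_lengths) != expected:
        raise ValueError("Type %d expects %d paths" % (kind, expected))
    length = reduced_length(path_lengths, overlap_lengths)
    return Fraction(1, 2) ** (length + OFFSET[kind])

# Type 1: four paths of lengths 2, 2, 1, 1 and no overlap -> exponent 6 - 1 = 5.
w1 = aabc_weight(1, [2, 2, 1, 1])
# Type 3, Case 1: two root 2-overlap paths of length 1 each -> exponent 8 - 2 + 1 = 7.
w3 = aabc_weight(3, [2, 2, 2, 2], [1, 1])
# Type 4: condensed path-triple of lengths 1, 1, 1 -> exponent 3 + 1 = 4.
w4 = aabc_weight(4, [1, 1, 1])
print(w1, w3, w4)
```

Each weight still has to be multiplied by the matching base coefficient ($\Phi_{AAAA}$, $\Phi_{AAA}$, or $\Phi_{AA}$) and summed over all eligible paths and ancestors, which this sketch leaves out.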
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.
Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.
Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. Therefore, for a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the default mergeable path-pair.
Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
$$\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right), \tag{A6}$$
where $A$ is a common ancestor of $a$ and $b$.
When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.
$$L_{\text{triple}} = \begin{cases} L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1},\\ L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2}, \end{cases} \qquad L_{\text{pair}} = L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}. \tag{A7}$$
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
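A quick hand check of (A6) may help. The example below is an illustration under stated assumptions, not taken from the paper: $A$ is a non-inbred founder (so $\Phi_{AA} = \frac{1}{2}$), $a$ and $b$ are both children of $A$, and their other parents are unknown. There is a single path from $A$ to $a$, so all three paths to $a$ merge and only Type 3 contributes, with $L_{\text{pair}} = 1 + 1 = 2$:

```latex
\Phi_{aaab}
  = \left(\tfrac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}
  = \left(\tfrac{1}{2}\right)^{4}\cdot\tfrac{1}{2}
  = \tfrac{1}{32}.
% Cross-check with the recursion of [10] for a repeated individual,
% recalled here from the literature rather than from this appendix:
%   \Phi_{aaab} = \tfrac{1}{4}\left(\Phi_{ab} + 3\,\Phi_{fmb}\right),
% with \Phi_{fmb} = 0 (one parent unknown) and
% \Phi_{ab} = \left(\tfrac{1}{2}\right)^{2}\Phi_{AA} = \tfrac{1}{8}:
\Phi_{aaab} = \tfrac{1}{4}\cdot\tfrac{1}{8} = \tfrac{1}{32}.
```

The two values agree, and the same comparison explains the factor $\frac{3}{2}$ in (A6): the Type 1 and Type 2 sums equal the triple part of $\Phi_{aab}$, which is $\frac{1}{2}\Phi_{fmb}$, and $\frac{3}{2}\cdot\frac{1}{2} = \frac{3}{4}$ is the weight of $\Phi_{fmb}$ in the recursion.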
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$. (Graph nodes: Cases 2.1, 2.2, 2.3, 3.1, and 3.2, together with $\Phi_{ab}$, $\Phi_{AA}$, and $\Phi_{AAA}$.)
B. Proof for Path-Counting Formulas of Three Individuals
We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ and contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.
Cases for $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.
Cases for $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.
Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):
(i) for $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2;
(ii) for Case 2.3, it has no cycle but only depends on $\Phi_{ab}$. Thus, we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$;
(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.
Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.
Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. (Legend: parent-child relationship; ancestor-descendant relationship.)
B.1.1. Correctness Proof for Case 3.1
Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof. (Basis) There are two basic scenarios among $a$, $b$, and $c$: (i) one individual is a parent of another; (ii) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = \frac{1}{2}\Phi_{cbc} = \left(\frac{1}{2}\right)^{2}\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = \frac{1}{2}\Phi_{Abc} = \left(\frac{1}{2}\right)^{2}\Phi_{AAc} = \left(\frac{1}{2}\right)^{3}\Phi_{AAA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has no root overlap (i.e., Type 1), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}}$.
For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc} = \left(\frac{1}{2}\right)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \left(\frac{1}{2}\right)^{3}\Phi_{AAA}$.
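The two basis values can also be reproduced numerically. The sketch below implements the recursion for two and three individuals on a toy pedigree and evaluates the two configurations of Figure 18. It rests on assumptions that should be stated plainly: individuals are integer IDs assigned so that parents have smaller IDs than their children, 0 stands for an unknown parent, the rules for $\Phi_{aa}$ and $\Phi_{aaa}$ are the standard ones from [10] (they are not restated in this appendix), and all function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def make_phi(parents):
    """Build recursive kinship functions for a pedigree.

    parents maps id -> (father, mother); 0 means unknown.
    Assumption: every parent has a smaller id than its children, so the
    individual with the largest id is never an ancestor of the others.
    """
    def par(x):
        return parents.get(x, (0, 0))

    @lru_cache(maxsize=None)
    def phi2(a, b):
        if a == 0 or b == 0:
            return Fraction(0)
        if a == b:                                   # Phi_aa = 1/2 (1 + Phi_fm)
            f, m = par(a)
            return Fraction(1, 2) * (1 + phi2(f, m))
        if a < b:
            a, b = b, a                              # expand the larger id
        f, m = par(a)
        return Fraction(1, 2) * (phi2(f, b) + phi2(m, b))

    @lru_cache(maxsize=None)
    def phi3(a, b, c):
        if 0 in (a, b, c):
            return Fraction(0)
        a, b, c = sorted((a, b, c), reverse=True)    # a now has the largest id
        f, m = par(a)
        if a == b == c:                              # Phi_aaa = 1/4 (1 + 3 Phi_fm)
            return Fraction(1, 4) * (1 + 3 * phi2(f, m))
        if a == b:                                   # formula (4) for Phi_aab
            return Fraction(1, 2) * (phi2(a, c) + phi3(f, m, c))
        return Fraction(1, 2) * (phi3(f, b, c) + phi3(m, b, c))   # formula (3)

    return phi2, phi3

# Figure 18(a): c (id 1) is a parent of a (id 2) and b (id 3).
_, phi3_a = make_phi({2: (1, 0), 3: (1, 0)})
fig18a = phi3_a(2, 3, 1)

# Figure 18(b): founder A (id 1) with children a, b, c (ids 2, 3, 4).
_, phi3_b = make_phi({2: (1, 0), 3: (1, 0), 4: (1, 0)})
fig18b = phi3_b(2, 3, 4)

print(fig18a, fig18b)  # 1/16 1/32
```

With a non-inbred founder, $\Phi_{ccc} = \Phi_{AAA} = \frac{1}{4}$, so the printed values match $\left(\frac{1}{2}\right)^{2}\Phi_{ccc} = \frac{1}{16}$ and $\left(\frac{1}{2}\right)^{3}\Phi_{AAA} = \frac{1}{32}$ from the basis step.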
Induction Step. Let $n$ denote the number of edges in $G$. Assume the claim is true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.
For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \rightarrow a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$.
Based on the recursive formula (3), $\Phi_{abc} = \frac{1}{2}(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain
$$\begin{aligned} \Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA},\\ &\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\ &\therefore\ \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}. \end{aligned} \tag{B2}$$
Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.
Thus, it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2
Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof. (Basis) There are three basic scenarios among $a$, $b$, and $c$: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another.
Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = \frac{1}{2}\Phi_{bbc} = \left(\frac{1}{2}\right)^{2}\Phi_{bc} = \left(\frac{1}{2}\right)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = \frac{1}{2}\Phi_{bbc} = \left(\frac{1}{2}\right)^{2}\Phi_{bc} = \left(\frac{1}{2}\right)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = \left(\frac{1}{2}\right)^{2}\Phi_{ssc} = \left(\frac{1}{2}\right)^{3}\Phi_{sc} = \left(\frac{1}{2}\right)^{5}\Phi_{AA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as $\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.
For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = \left(\frac{1}{2}\right)^{2+1}\Phi_{cc} = \left(\frac{1}{2}\right)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = \left(\frac{1}{2}\right)^{4}\Phi_{AA}$ and $\Phi_{abc} = \left(\frac{1}{2}\right)^{5}\Phi_{AA}$, respectively.
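The Type 2 exponents above follow directly from the path lengths. The sketch below recomputes them for the three panels of Figure 20. The edge counts passed in are this editor's reading of the panel descriptions (for example, panel (c) is taken to be $A \rightarrow s$, $s \rightarrow a$, $s \rightarrow b$, $A \rightarrow c$), the default $\Phi_{AA} = \frac{1}{2}$ assumes a non-inbred common ancestor, and the function name is hypothetical.

```python
from fractions import Fraction

def type2_contribution(l_a, l_b, l_c, l_overlap, phi_root=Fraction(1, 2)):
    """(1/2)^(L + 1) * Phi_AA for one Type 2 path-triple.

    L is the sum of the three path lengths (in edges) minus the length
    of the shared root overlap path P_As.
    """
    length = l_a + l_b + l_c - l_overlap
    return Fraction(1, 2) ** (length + 1) * phi_root

fig20 = {
    # (a) chain c -> b -> a, root c: P_ca = 2, P_cb = 1, P_cc = 0, overlap P_cb = 1
    "a": type2_contribution(2, 1, 0, 1),
    # (b) A -> b -> a and A -> c: P_Aa = 2, P_Ab = 1, P_Ac = 1, overlap P_Ab = 1
    "b": type2_contribution(2, 1, 1, 1),
    # (c) A -> s -> a, A -> s -> b, A -> c: P_Aa = P_Ab = 2, P_Ac = 1, overlap P_As = 1
    "c": type2_contribution(2, 2, 1, 1),
}
print(fig20)
```

The three values are $\left(\frac{1}{2}\right)^{3}$, $\left(\frac{1}{2}\right)^{4}$, and $\left(\frac{1}{2}\right)^{5}$ times $\Phi_{AA} = \frac{1}{2}$, in agreement with the recursive computation in the basis step.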
Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Show that it is truefor = 119896 + 1
For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$, which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\,\Phi_{AA}$.

In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain

$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc}
 = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}
 = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA},\\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1,\\
&\therefore\; \Phi_{abc}
 = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}
 = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}.
\end{aligned}
\tag{B3}
$$
For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).

In summary, the formula is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another. (b) $b$ is a parent of $a$. (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3

Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.
Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\,\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.
Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.

For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$ (that is, $m$ is missing),

$$
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab}.
\tag{B4}
$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\,\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.

Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\,\Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\,\Phi_{AA}$.
For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.

In summary, the path-counting formula (A1) is true for Case 2.3.
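The reduction used for Case 2.3 (a Type 3 contribution is exactly half of the corresponding Wright term) can be illustrated with a small sketch. The function names and the toy input are hypothetical: $a$ and $b$ are assumed to sit one edge below a non-inbred founder $A$, so that $\Phi_{AA} = 1/2$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def wright_kinship(path_pairs):
    """Sum (1/2)^(L_Aa + L_Ab) * Phi_AA over nonoverlapping path-pairs.
    Each entry is a tuple (L_Aa, L_Ab, Phi_AA)."""
    return sum((HALF ** (la + lb) * phi_aa for la, lb, phi_aa in path_pairs),
               Fraction(0))

def type3_contribution(path_pairs):
    """Sum (1/2)^(L_Aa + L_Ab + 1) * Phi_AA over the same path-pairs."""
    return sum((HALF ** (la + lb + 1) * phi_aa for la, lb, phi_aa in path_pairs),
               Fraction(0))

# Hypothetical input: one path-pair A -> a, A -> b with Phi_AA = 1/2.
pairs = [(1, 1, Fraction(1, 2))]
phi_ab = wright_kinship(pairs)
phi_aab = type3_contribution(pairs)
print(phi_ab, phi_aab)
```

Under these assumptions the second value is half of the first, matching $\Phi_{aab} = (1/2)\Phi_{ab}$ in (B4).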
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A1).
Figure 22: (a) $b$ is not an ancestor of $a$. (b) $b$ is an ancestor of $a$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A1).
Proof. Proof by induction on the number of triple-common ancestors.

Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.

Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A1) are correct for $G$.

Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.

Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.

For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.

For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:

(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.

Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\; t_b\colon A_1 \to \cdots \to b,\; t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.

If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
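The eligibility rule used in case (2) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' implementation: paths are represented as plain lists of node labels starting at the root, and the labels `A1`, `Aj`, `x`, and `y` are hypothetical.

```python
def is_eligible(path_triple, lower_ancestors):
    """Return False when all three paths of the triple pass through the same
    lower triple-common ancestor; such triples are excluded for the root.
    path[1:] skips the root itself, which every path trivially contains."""
    return not any(
        all(ancestor in path[1:] for path in path_triple)
        for ancestor in lower_ancestors
    )

# Hypothetical labels: A1 is the topmost ancestor, Aj a lower one.
through_aj = (["A1", "Aj", "a"], ["A1", "Aj", "b"], ["A1", "Aj", "c"])
mixed = (["A1", "Aj", "a"], ["A1", "x", "b"], ["A1", "y", "c"])
print(is_eligible(through_aj, {"Aj"}), is_eligible(mixed, {"Aj"}))
```

Only the first triple is rejected, since its three paths all pass through `Aj`; its contribution would already be counted among the path-triples rooted at `Aj`.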
C. Proof for Four Individuals and Two Pairs of Individuals
Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:

Cases for $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C1)

Figure 23: Dependency graph for different cases for four individuals.

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.

Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.
Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.

According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for “21st Century Medicine,” http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., “A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia,” The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., “New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate,” Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, “La relation d'identité en génétique,” Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, “Genotypic covariances between inbred relatives,” Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, “Logique du calcul des coefficients d'identité entre deux individus,” Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, “A recursive algorithm for the calculation of identity coefficients,” Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, “Efficient evaluation of inbreeding queries on pedigree data,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, “Efficiently calculating inbreeding on large pedigrees databases,” Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, “Using compact encodings for path-based computations on pedigree graphs,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Scalable computation of kinship and identity coefficients on large pedigrees,” in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Efficient computation of kinship and identity coefficients on large pedigrees,” Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, “Coefficients of inbreeding and relationship,” The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, “Kinship and identity algorithm of coefficients of identity,” Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., “Data mining applied to linkage disequilibrium mapping,” The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, “Calculation of the inbreeding coefficient,” Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 16: A path-pair level graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ (scenarios $S_0$, $S_1$, $S_2$, and $S_3$; if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the pair is condensed to $P_{Aa}$).
Definition A.1 (Mergeable Path-Pair). A path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable if and only if the two paths $P_{Aa_1}$ and $P_{Aa_2}$ are completely identical.
Next, we present a graphical representation of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ in Figure 16.
Lemma A.2. For $S_2$ and $S_3$ in Figure 16, $\langle P_{Aa_1}, P_{Aa_2}\rangle$ cannot be a mergeable path-pair.
Proof. For $S_2$ and $S_3$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, then any common individual $s$ between $P_{Aa_1}$ and $P_{Ab}$ is also a shared individual between $P_{Aa_2}$ and $P_{Ab}$. It means $s \in \mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab})$, which contradicts the fact that $\mathit{Tri\_C}(P_{Aa_1}, P_{Aa_2}, P_{Ab}) = \emptyset$.
Considering all three scenarios in Figure 16, only $S_1$ can have a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$ by Lemma A.2. Now we present our path-counting formula for $\Phi_{aab}$, where $a$ is not an ancestor of $b$:

$$
\Phi_{aab} = \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA}
 + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}
 + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}\right),
\tag{A1}
$$
where $A$ is a common ancestor of $a$ and $b$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has no root 2-overlap.
Type 2: $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at the individual $s$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is a nonoverlapping path-pair.

$$
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}\\
L_{\langle P_{Aa}, P_{Ab}\rangle} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A2}
$$
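Formulas (A1) and (A2) can be read as a per-type rule for the exponent and for the root coefficient. The following sketch evaluates a single term; the function name, argument layout, and toy path lengths are hypothetical, and the root values $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$ assume a non-inbred founder $A$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_aab_term(path_type, path_lengths, overlap_length, phi_AAA, phi_AA):
    """One term of (A1). path_lengths holds three lengths for Types 1 and 2
    and two lengths (the condensed pair) for Type 3; overlap_length is the
    length of the root 2-overlap path and is only used for Type 2."""
    if path_type == 1:   # no root 2-overlap
        return HALF ** (sum(path_lengths) - 1) * phi_AAA
    if path_type == 2:   # one root 2-overlap path P_As
        return HALF ** (sum(path_lengths) - overlap_length) * phi_AA
    if path_type == 3:   # mergeable pair condensed to <P_Aa, P_Ab>
        return HALF ** (sum(path_lengths) + 1) * phi_AA
    raise ValueError("path_type must be 1, 2, or 3")

# Assumed root values for a non-inbred founder A.
PHI_AA, PHI_AAA = Fraction(1, 2), Fraction(1, 4)
type3 = phi_aab_term(3, (1, 1), 0, PHI_AAA, PHI_AA)      # A -> a, A -> b
type1 = phi_aab_term(1, (2, 2, 1), 0, PHI_AAA, PHI_AA)   # two distinct 2-edge paths to a
print(type3, type1)
```

The Type 3 value equals half of Wright's term $(1/2)^{2}\,\Phi_{AA}$ for the same path-pair, which is the relationship used in the Case 2.3 proof.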
For the sake of completeness, if $a$ is an ancestor of $b$, there is no recursive formula for $\Phi_{aab}$ in [10], but we can use either the recursive formula for $\Phi_{abc}$ or the path-counting formula for $\Phi_{abc}$ to compute $\Phi_{a_1 a_2 b}$.
A.2. Path-Counting Formula for $\Phi_{aabc}$. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$, if $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable, then we process the path-quad as equivalent to $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$. If $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable, the path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ can be condensed to scenarios for $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$.
Now we present a path-counting formula for $\Phi_{aabc}$, where $a$ is not an ancestor of $b$ and $c$, as follows:

$$
\begin{aligned}
\Phi_{aabc} ={}& \sum_{A}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{quad}}-1}\Phi_{AAAA}
 + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{quad}}}\Phi_{AAA}
 + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{quad}}+1}\Phi_{AA}\right)\\
&+ \sum_{A}\left(\sum_{\text{Type 4}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+1}\Phi_{AAA}
 + \sum_{\text{Type 5}}\left(\frac{1}{2}\right)^{L_{\text{triple}}+2}\Phi_{AA}\right),
\end{aligned}
\tag{A3}
$$
where $A$ is a common ancestor of $a$, $b$, and $c$.

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is not mergeable:
Type 1: zero root 2-overlap and zero root 3-overlap path.
Type 2: one root 2-overlap path $P_{As}$ ending at $s$.
Type 3:
  Case 1: two root 2-overlap paths $P_{As_1}$ and $P_{As_2}$ ending at $s_1$ and $s_2$, respectively;
  Case 2: one root 3-overlap path $P_{At}$ ending at $t$;
  Case 3: one root 2-overlap path and one root 3-overlap path, $P_{As}$ and $P_{At}$, ending at $s$ and $t$, respectively.
(A4)

When $\langle P_{Aa_1}, P_{Aa_2}\rangle$ is mergeable:
Type 4: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has zero root 2-overlap path.
Type 5: $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

$$
\begin{aligned}
L_{\text{quad}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 2},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As_1}} - L_{P_{As_2}} & \text{for Case 1} \in \text{Type 3},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} & \text{for Case 2} \in \text{Type 3},\\
L_{P_{Aa_1}} + L_{P_{Aa_2}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{At}} - L_{P_{As}} & \text{for Case 3} \in \text{Type 3},
\end{cases}\\
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} & \text{for Type 4},\\
L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}} & \text{for Type 5}.
\end{cases}
\end{aligned}
\tag{A5}
$$
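The five types in (A3) differ only in the exponent offset and in which root coefficient multiplies the term, so one term can be evaluated with a small lookup table. This is a sketch under stated assumptions: the names `RULES` and `phi_aabc_term` are hypothetical, and the root values used in the example ($\Phi_{AA} = 1/2$, $\Phi_{AAA} = 1/4$, $\Phi_{AAAA} = 1/8$) assume a non-inbred founder $A$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Per (A3): type -> (offset added to the combined length, root coefficient key).
RULES = {1: (-1, "AAAA"), 2: (0, "AAA"), 3: (1, "AA"), 4: (1, "AAA"), 5: (2, "AA")}

def phi_aabc_term(path_type, path_lengths, overlap_lengths, phi_root):
    """One term of (A3). path_lengths has four entries for Types 1-3 and three
    entries for Types 4-5; overlap_lengths lists the root overlap path lengths
    that (A5) subtracts (empty when there is no overlap)."""
    offset, key = RULES[path_type]
    combined = sum(path_lengths) - sum(overlap_lengths)
    return HALF ** (combined + offset) * phi_root[key]

# Assumed root values for a non-inbred founder A.
phi_root = {"AA": Fraction(1, 2), "AAA": Fraction(1, 4), "AAAA": Fraction(1, 8)}
type4 = phi_aabc_term(4, (1, 1, 1), (), phi_root)   # A -> a, A -> b, A -> c
print(type4)
```

In this toy setting the Type 4 term is half of the Type 1 term of formula (12) for the same path-triple, which reflects how a mergeable path-pair contributes to $\Phi_{aabc}$.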
Note that if $a$ is an ancestor of either $b$ or $c$, or both of them, then the path-counting formula for $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.
Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.
Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair.

Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:

$$
\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA}
 + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right)
 + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right),
\tag{A6}
$$
where $A$ is a common ancestor of $a$ and $b$.

When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path.
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.

When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.

$$
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1},\\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases}\\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A7}
$$
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
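A single term of (A6) can be evaluated in the same spirit, which makes the role of the factor $3/2$ explicit. The function name and the toy path lengths below are hypothetical, and $\Phi_{AA} = 1/2$, $\Phi_{AAA} = 1/4$ again assume a non-inbred founder $A$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
THREE_HALVES = Fraction(3, 2)

def phi_aaab_term(path_type, path_lengths, overlap_length, phi_AAA, phi_AA):
    """One term of (A6). Types 1 and 2 take the three lengths of
    <P_Aa1, P_Aa3, P_Ab>; Type 3 takes the two lengths of <P_Aa, P_Ab>."""
    if path_type == 1:   # zero root 2-overlap
        return THREE_HALVES * HALF ** (sum(path_lengths) - 1) * phi_AAA
    if path_type == 2:   # one root 2-overlap path P_As
        return THREE_HALVES * HALF ** (sum(path_lengths) - overlap_length) * phi_AA
    if path_type == 3:   # fully mergeable triple condensed to a path-pair
        return HALF ** (sum(path_lengths) + 2) * phi_AA
    raise ValueError("path_type must be 1, 2, or 3")

# Assumed root values for a non-inbred founder A.
PHI_AA, PHI_AAA = Fraction(1, 2), Fraction(1, 4)
pair_term = phi_aaab_term(3, (1, 1), 0, PHI_AAA, PHI_AA)
triple_term = phi_aaab_term(1, (2, 2, 1), 0, PHI_AAA, PHI_AA)
print(pair_term, triple_term)
```

For the path-pair example, the Type 3 term is one quarter of Wright's term $(1/2)^{2}\,\Phi_{AA}$, in line with the exponent $L_{\text{pair}} + 2$ in (A6).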
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$.
B. Proof for Path-Counting Formulas of Three Individuals
We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases:

Cases for $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having a mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.

Cases for $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.

Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A1):
(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.
(ii) Case 2.3 has no cycle and only depends on $\Phi_{ab}$. Thus we prove the correctness of Case 2.3 by transforming the case to $\Phi_{ab}$.
(iii) For Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.

Figure 18: (a) $c$ is a parent of $a$ and $b$. (b) No individual is a parent of another.

Figure 19: (a) No individual is a parent of another. (b) $c$ is an ancestor of $a$ and $b$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)
B.1.1. Correctness Proof for Case 3.1

Case 3.1. For $\Phi_{abc}$, $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof (Basis). There are two basic scenarios: (i) one individual is a parent of another; (ii) no individual is a parent of another among $a$, $b$, and $c$.

Using the recursive formula (3) to compute $\Phi_{abc}$: for Figure 18(a), $\Phi_{abc} = (1/2)\Phi_{cbc} = (1/2)^2\,\Phi_{ccc}$; for Figure 18(b), $\Phi_{abc} = (1/2)\Phi_{Abc} = (1/2)^2\,\Phi_{AAc} = (1/2)^3\,\Phi_{AAA}$.
Using the path-counting formula (12) if a path-triple
⟨119875119860119886 119875119860119887 119875119860119888⟩ has no root overlap (ie Type 1) then the
contribution of ⟨119875119860119886 119875119860119887 119875119860119888⟩ to Φ
119886119887119888can be computed as
follows sumType 1(12)119871⟨119875119860119886119875119860119887
119875119860119888⟩Φ119860119860119860
where 119871⟨119875119860119886119875119860119887 119875119860119888⟩
=
119871119875119860119886+ 119871119875119860119887+ 119871119875119860119888
For Figure 18(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc} = (1/2)^{2}\Phi_{ccc}$; for Figure 18(b), we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA} = (1/2)^{3}\Phi_{AAA}$.
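The basis values above can be cross-checked numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes Karigl's recursions [10] for up to three individuals (taken here to correspond to the recursive formulas (3), (4), and (5)), the conventions that a missing parent contributes 0 and that $\Phi_{a} = 1$ for a single individual, and a hypothetical dictionary encoding of the two pedigrees of Figure 18 in which each child has one recorded parent. The names `make_phi`, `fig18a`, and `fig18b` are illustrative only.

```python
from fractions import Fraction

def make_phi(pedigree):
    """pedigree: child -> (father, mother), listed parents-first; None = missing parent."""
    order = {name: i for i, name in enumerate(pedigree)}  # topological position

    def phi(*inds):
        """Generalized kinship coefficient for one, two, or three individuals."""
        if None in inds:                  # assumption: a missing parent contributes 0
            return Fraction(0)
        if len(inds) == 1:                # assumption: Phi_a = 1
            return Fraction(1)
        a = max(inds, key=order.__getitem__)   # youngest, so not an ancestor of the rest
        k = inds.count(a)
        rest = tuple(x for x in inds if x != a)
        f, m = pedigree[a]
        if k == 1:    # Phi_{a,rest} = 1/2 (Phi_{f,rest} + Phi_{m,rest})
            return Fraction(1, 2) * (phi(f, *rest) + phi(m, *rest))
        if k == 2:    # Phi_{aa,rest} = 1/2 (Phi_{a,rest} + Phi_{fm,rest})
            return Fraction(1, 2) * (phi(a, *rest) + phi(f, m, *rest))
        # k == 3: Phi_{aaa} = 1/4 (1 + 3 Phi_{fm})
        return Fraction(1, 4) * (1 + 3 * phi(f, m))

    return phi

# Figure 18(a): c is the (only recorded) parent of a and b.
fig18a = {"c": (None, None), "a": ("c", None), "b": ("c", None)}
# Figure 18(b): A is the (only recorded) parent of a, b, and c.
fig18b = {"A": (None, None), "a": ("A", None), "b": ("A", None), "c": ("A", None)}

phi_a, phi_b = make_phi(fig18a), make_phi(fig18b)
print(phi_a("a", "b", "c"), phi_b("a", "b", "c"))  # prints: 1/16 1/32
```

With a noninbred founder, $\Phi_{ccc} = \Phi_{AAA} = 1/4$, so the printed values agree with $(1/2)^{2}\Phi_{ccc}$ and $(1/2)^{3}\Phi_{AAA}$ from the basis.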
Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Then we show it is true for $n = k + 1$.
For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^{*}$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^{*}$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}} \Phi_{AAA}$.
Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain
$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA}, \\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}.
\end{aligned}
\tag{B.2}
$$
Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1} \Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}} \Phi_{ccc}$.
Thus it is true for $n = k + 1$.
B.1.2. Correctness Proof for Case 3.2
Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof (Basis). There are three basic scenarios among $a$, $b$, and $c$: (i) there are two individuals who are parents of another; (ii) there is only one individual who is a parent of another; (iii) there is no individual who is a parent of another.
Figure 20: (a) $b$ is a parent of $a$ and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{3}\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^{2}\Phi_{bc} = (1/2)^{4}\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^{2}\Phi_{ssc} = (1/2)^{3}\Phi_{sc} = (1/2)^{5}\Phi_{AA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.
For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1} \Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^{3}\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^{4}\Phi_{AA}$ and $\Phi_{abc} = (1/2)^{5}\Phi_{AA}$, respectively.
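The Type 1 and Type 2 contributions can be evaluated mechanically for a single path-triple. The sketch below is illustrative only and rests on several assumptions that are not stated in the text: each path is given as a list of node names starting at the common ancestor, the triple has already been checked to be eligible, at most one pair of paths shares a root prefix (whose edge count plays the role of $L_{P_{As}}$), and the ancestor is a noninbred founder so that $\Phi_{AA} = 1/2$ and $\Phi_{AAA} = 1/4$. The edge lists for Figure 20 are read off the recursion above, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

HALF = Fraction(1, 2)

def common_prefix_edges(p, q):
    """Number of edges two paths share, counted from the common root onward."""
    shared_nodes = 0
    for x, y in zip(p, q):
        if x != y:
            break
        shared_nodes += 1
    return max(shared_nodes - 1, 0)

def triple_contribution(paths, phi_AA=Fraction(1, 2), phi_AAA=Fraction(1, 4)):
    """Contribution of one eligible path-triple to Phi_abc."""
    total = sum(len(p) - 1 for p in paths)          # L_PAa + L_PAb + L_PAc
    overlap = max(common_prefix_edges(p, q) for p, q in combinations(paths, 2))
    if overlap == 0:                                # Type 1: no root overlap
        return HALF ** total * phi_AAA
    return HALF ** (total - overlap + 1) * phi_AA   # Type 2: L = total - L_PAs

fig20 = {
    "a": [["c", "b", "a"], ["c", "b"], ["c"]],            # c -> b -> a
    "b": [["A", "b", "a"], ["A", "b"], ["A", "c"]],       # A -> b -> a, A -> c
    "c": [["A", "s", "a"], ["A", "s", "b"], ["A", "c"]],  # A -> s -> {a, b}, A -> c
}
for label, triple in fig20.items():
    print(label, triple_contribution(triple))
# prints:
# a 1/16
# b 1/32
# c 1/64
```

These values equal $(1/2)^{3}\Phi_{cc}$, $(1/2)^{4}\Phi_{AA}$, and $(1/2)^{5}\Phi_{AA}$ with $\Phi_{cc} = \Phi_{AA} = 1/2$, matching the recursive results for Figure 20.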
Induction Step. Let $n$ denote the number of edges in $G$. Assume true for $n \le k$, where $k \ge 2$. Show that it is true for $n = k + 1$.
For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^{*}$ which satisfies the condition of the induction hypothesis. For Figure 21(a), we use the path-counting formula for $\Phi_{fbc}$ in $G^{*}$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1} \Phi_{AA}$.
In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain
$$
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}, \\
&\because\; L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\
&\therefore\; \Phi_{abc} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA} = \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}.
\end{aligned}
\tag{B.3}
$$
For Figures 21(b) and 21(c), we take the same steps as we did to calculate $\Phi_{abc}$ for Figure 21(a).
In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$ and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3
Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.
Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1} \Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.
Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ only has one parent $f$ (as $m$ is missing),
$$
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab}.
\tag{B.4}
$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.
Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}} \Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1} \Phi_{AA}$.
For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.
In summary, it shows that the path-counting formula (A.1) is true for Case 2.3.
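The reduction of Case 2.3 to Wright's formula can be illustrated on a small example. The sketch below is a hedged illustration rather than the authors' code: it enumerates path-pairs by brute force, treats a pair as nonoverlapping when the two paths share only their root, assumes noninbred founders ($\Phi_{AA} = 1/2$), and uses two hypothetical pedigrees (a single-parent chain $A \to f \to a$, $A \to b$, and a full-sib family) encoded as parent-to-children maps. The list of common ancestors is supplied by the caller.

```python
from fractions import Fraction

def paths_from(children, src, dst):
    """All directed paths src -> dst in a pedigree DAG given as parent -> [children]."""
    if src == dst:
        return [[src]]
    return [[src] + tail
            for child in children.get(src, [])
            for tail in paths_from(children, child, dst)]

def wright_kinship(children, common_ancestors, a, b, phi_AA=Fraction(1, 2)):
    """Sum of (1/2)^(L_PAa + L_PAb) * Phi_AA over nonoverlapping path-pairs."""
    total = Fraction(0)
    for A in common_ancestors:
        for pa in paths_from(children, A, a):
            for pb in paths_from(children, A, b):
                if set(pa) & set(pb) == {A}:    # the two paths share only the root
                    total += Fraction(1, 2) ** (len(pa) - 1 + len(pb) - 1) * phi_AA
    return total

# Hypothetical single-parent pedigree: A -> f -> a and A -> b.
chain = {"A": ["f", "b"], "f": ["a"]}
phi_ab = wright_kinship(chain, ["A"], "a", "b")
phi_aab = Fraction(1, 2) * phi_ab    # Case 2.3: a has one parent, so Phi_aab = (1/2) Phi_ab
print(phi_ab, phi_aab)               # prints: 1/16 1/32

# Sanity check on full sibs (parents f and m are unrelated founders).
sibs = {"f": ["a", "b"], "m": ["a", "b"]}
print(wright_kinship(sibs, ["f", "m"], "a", "b"))  # prints: 1/4
```

The full-sib value 1/4 is the familiar kinship coefficient of siblings, which gives some confidence that the enumeration matches Wright's formula on simple inputs.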
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. (Edge legend: parent-child relationship; ancestor-descendant relationship.)
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).
Proof. Proof by induction on the number of triple-common ancestors.
Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.
Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, (12) and (A.1) are correct for $G$.
Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
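The set of triple-common ancestors and its topmost members can be computed by intersecting ancestor sets. The following is a small Python sketch under stated assumptions: the pedigree is a hypothetical child-to-parents dictionary with `None` for a missing parent, and an individual is counted as its own ancestor (consistent with Figure 18(a), where $c$ is a triple-common ancestor of $a$, $b$, and $c$). The function names are illustrative and do not come from the paper.

```python
def ancestors_of(parents, x):
    """x together with all of its ancestors."""
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node is not None and node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

def triple_common_ancestors(parents, a, b, c):
    """Individuals from which a, b, and c can all be reached."""
    return ancestors_of(parents, a) & ancestors_of(parents, b) & ancestors_of(parents, c)

def topmost(parents, candidates):
    """Candidates that have no other candidate as an ancestor."""
    return {A for A in candidates
            if not (ancestors_of(parents, A) - {A}) & candidates}

# Hypothetical pedigree: A1 -> A2, A2 -> a, A2 -> b, and c has parents A1 and A2.
parents = {
    "A1": (None, None),
    "A2": ("A1", None),
    "a": ("A2", None),
    "b": ("A2", None),
    "c": ("A1", "A2"),
}
tri_c = triple_common_ancestors(parents, "a", "b", "c")
print(sorted(tri_c), sorted(topmost(parents, tri_c)))  # prints: ['A1', 'A2'] ['A1']
```

In this example $A_2$ is a descendant of $A_1$, so the pedigree falls under the second of the two cases treated in the induction step.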
Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.
For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:
(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.
Summing up all contributions, we can obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Consider $pt_i = \{t_a\colon A_1 \to \cdots \to a,\; t_b\colon A_1 \to \cdots \to b,\; t_c\colon A_1 \to \cdots \to c\}$, a path-triple from $A_1$ to $a$, $b$, and $c$.
If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, we know that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
C. Proof for Four Individuals and Two Pairs of Individuals
Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$.
For $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.
For $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
For $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C.1)
Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.
According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formula for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
Figure 23: Dependency graph for different cases for four individuals.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.
Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlap.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlap.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlap and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.
Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.
According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formula for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7 as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine", http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individuals," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Note that if $a$ is an ancestor of either $b$ or $c$ or both of them, then the path-counting formula of $\Phi_{abcd}$ is applicable to compute $\Phi_{a_1 a_2 b c}$.
A.3. Path-Counting Formula for $\Phi_{aaab}$. A special case of $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ for $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ is introduced when $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable. With the existence of a mergeable path-triple, $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$ can be condensed to $\langle P_{Aa}, P_{Ab}\rangle$.
Definition A.3 (Mergeable Path-Triple). Given three paths $P_{Aa_1}$, $P_{Aa_2}$, and $P_{Aa_3}$, they are mergeable if and only if they are completely identical.
Lemma A.4. Given a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
Proof. For an individual $a$ with two parents $f$ and $m$, the paternal allele of the individual $a$ is transmitted from $f$ and the maternal allele is transmitted from $m$. At the allele level, only two descent paths starting from an ancestor are allowed. For a path-quad $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}, P_{Ab}\rangle$, there must therefore be at least one mergeable path-pair among $\langle P_{Aa_1}, P_{Aa_2}\rangle$, $\langle P_{Aa_1}, P_{Aa_3}\rangle$, and $\langle P_{Aa_2}, P_{Aa_3}\rangle$.
For simplicity, we treat $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as a default mergeable path-pair.
Now we present the path-counting formula for $\Phi_{aaab}$, where $a$ is not an ancestor of $b$, as follows:
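$$
\Phi_{aaab} = \sum_{A}\left(\frac{3}{2}\left(\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\text{triple}}-1}\Phi_{AAA} + \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\text{triple}}}\Phi_{AA}\right) + \sum_{\text{Type 3}}\left(\frac{1}{2}\right)^{L_{\text{pair}}+2}\Phi_{AA}\right),
\tag{A.6}
$$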
where $A$ is a common ancestor of $a$ and $b$.
When there is only one mergeable path-pair (let us consider $\langle P_{Aa_1}, P_{Aa_2}\rangle$ as the mergeable path-pair):
Type 1: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has zero root 2-overlap path;
Type 2: $\langle P_{Aa_1}, P_{Aa_3}, P_{Ab}\rangle$ has one root 2-overlap path $P_{As}$ ending at $s$.
When $\langle P_{Aa_1}, P_{Aa_2}, P_{Aa_3}\rangle$ is mergeable:
Type 3: $\langle P_{Aa}, P_{Ab}\rangle$ is nonoverlapping.
$$
\begin{aligned}
L_{\text{triple}} &=
\begin{cases}
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} & \text{for Type 1}, \\
L_{P_{Aa_1}} + L_{P_{Aa_3}} + L_{P_{Ab}} - L_{P_{As}} & \text{for Type 2},
\end{cases} \\
L_{\text{pair}} &= L_{P_{Aa}} + L_{P_{Ab}} \quad \text{for Type 3}.
\end{aligned}
\tag{A.7}
$$
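Once the path-triples and path-pairs have been enumerated and classified, formula (A.6) is a weighted sum over their lengths. The sketch below evaluates it under assumptions that are not part of the text: the caller supplies, for each common ancestor $A$, the values $\Phi_{AA}$ and $\Phi_{AAA}$ together with lists of $L_{\text{triple}}$ (Types 1 and 2) and $L_{\text{pair}}$ (Type 3) as defined in (A.7). The dictionary layout, the function name, and the example pedigree ($A \to a$ and $A \to b$, with $A$ a noninbred founder) are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi_aaab(per_ancestor):
    """Evaluate (A.6) from pre-classified path lengths.

    per_ancestor: one dict per common ancestor A with keys
      'phi_AA', 'phi_AAA' : kinship values of A (inputs),
      'type1', 'type2'    : lists of L_triple values,
      'type3'             : list of L_pair values.
    """
    total = Fraction(0)
    for c in per_ancestor:
        triples = (sum(HALF ** (L - 1) for L in c["type1"]) * c["phi_AAA"]
                   + sum(HALF ** L for L in c["type2"]) * c["phi_AA"])
        pairs = sum(HALF ** (L + 2) for L in c["type3"]) * c["phi_AA"]
        total += Fraction(3, 2) * triples + pairs
    return total

# Hypothetical pedigree A -> a, A -> b: the three paths to a coincide, so the only
# contribution is a Type 3 path-pair with L_pair = 1 + 1 = 2.
single_parent = [{"phi_AA": Fraction(1, 2), "phi_AAA": Fraction(1, 4),
                  "type1": [], "type2": [], "type3": [2]}]
print(phi_aaab(single_parent))  # prints: 1/32
```

Separating enumeration from evaluation keeps the sketch short; classifying the paths into Types 1 to 3 is the harder step and is not attempted here.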
Note that if $a$ is an ancestor of $b$, we treat $\Phi_{aaab} = \Phi_{a_1 a_2 a_3 b}$. Then we apply the path-counting formula for $\Phi_{abcd}$ to compute $\Phi_{a_1 a_2 a_3 b}$.
Figure 17: Dependency graph for different cases regarding $\Phi_{abc}$ and $\Phi_{aab}$.
B. Proof for Path-Counting Formulas of Three Individuals
We first demonstrate that, for one triple-common ancestor $A$, the path-counting computation of $\Phi_{abc}$ is equivalent to the computation using recursive formulas. Then we prove the correctness of the path-counting computation for multiple triple-common ancestors.
B.1. One Triple-Common Ancestor. Considering the different types of path-triples starting from a triple-common ancestor $A$ in a pedigree graph $G$ contributing to $\Phi_{abc}$ and $\Phi_{aab}$, $G$ can have 5 different cases.
For $\Phi_{aab}$:
Case 2.1: $G$ does not have any path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with root overlap.
Case 2.3: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ having mergeable path-pair $\langle P_{Aa_1}, P_{Aa_2}\rangle$.
For $\Phi_{abc}$:
Case 3.1: $G$ does not have any path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Case 3.2: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
(B.1)
Based on the 5 cases from Case 2.1 to Case 3.2, we first construct a dependency graph, shown in Figure 17, consistent with the recursive formulas (3), (4), and (5) for the generalized kinship coefficients for three individuals.
Then we take the following steps to prove the correctness of the path-counting formulas (12) and (A.1):
(i) For $\Phi_{ab}$, the correctness of the path-counting formula (i.e., Wright's formula) is proven in [21]. For Case 2.1 and Case 2.2, the correctness is proven based on the correctness of Cases 3.1 and 3.2.
(ii) for Case 23 it has no cycle but only depends on Φ119886119887
Thus we prove the correctness of Case 23 by trans-forming the case toΦ
119886119887
16 Computational and Mathematical Methods in Medicine
a b
c
(a)
A
a b c
(b)
Figure 18 (a) 119888 is a parent of 119886 and 119887 (b) no individual is a parent of another
Parent-child relationshipAncestor-descendant relationship
A
a
s v p
f b c
(a)
Parent-child relationshipAncestor-descendant relationship
c
a
s v
f b
(b)
Figure 19 (a) No individual is a parent of another (b) 119888 is an ancestor of 119886 and 119887
(iii) for Cases 31 and 32 the correctness is proven byinduction on the number of edges 119899 in the pedigreegraph 119866
B11 Correctness Proof for Case 31
Case 31 ForΦ119886119887119888
119866 does not have any path triples ⟨119875119860119886 119875119860119887
119875119860119888⟩ with root overlap
Proof (Basis) There are two basic scenarios (i) one individ-ual is a parent of another (ii) no individual is a parent ofanother among 119886 119887 and 119888
Using the recursive formula (3) to compute Φ119886119887119888
forFigure 18(a) Φ
119886119887119888= (12)Φ
119888119887119888= (12)
2
Φ119888119888119888 for Figure 18(b)
Φ119886119887119888= (12)Φ
119860119887119888= (12)
2
Φ119860119860119888
= (12)3
Φ119860119860119860
Using the path-counting formula (12) if a path-triple
⟨119875119860119886 119875119860119887 119875119860119888⟩ has no root overlap (ie Type 1) then the
contribution of ⟨119875119860119886 119875119860119887 119875119860119888⟩ to Φ
119886119887119888can be computed as
follows sumType 1(12)119871⟨119875119860119886119875119860119887
119875119860119888⟩Φ119860119860119860
where 119871⟨119875119860119886119875119860119887 119875119860119888⟩
=
119871119875119860119886+ 119871119875119860119887+ 119871119875119860119888
For Figure 18(a) 119888 is the only triple-common ancestor
and we obtain Φ119886119887119888
= (12)119871⟨119875119888119886119875119888119887
119875119888119888⟩Φ119888119888119888
= (12)2
Φ119888119888119888 for
Figure 18(b) we obtain Φ119886119887119888
= (12)119871⟨119875119860119886119875119860119887
119875119860119888⟩Φ119860119860119860
=
(12)3
Φ119860119860119860
Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Then we show it istrue for 119899 = 119896 + 1
For Figures 19(a) and 19(b), among $a$, $b$, and $c$, let $a$ be the individual having the longest path starting from their triple-common ancestor in the pedigree graph $G$ with $(k + 1)$ edges. If we remove the node $a$ and cut the edge $f \to a$ from $G$, then the new graph $G^*$ has $k$ edges. In terms of computing $\Phi_{fbc}$, $G^*$ satisfies the condition for the induction hypothesis. For Figure 19(a), $\Phi_{fbc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}$. Based on the recursive formula (3), $\Phi_{abc} = (1/2)(\Phi_{fbc} + \Phi_{mbc})$, where $f$ and $m$ are the parents of $a$. In $G$, $a$ only has one parent $f$; thus $\Phi_{mbc} = 0$. Then we can plug in the path-counting formula for $\Phi_{fbc}$ to obtain
\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA} \\
&= \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AAA}, \\
\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} &= L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\
\therefore\ \Phi_{abc} &= \sum_{\text{Type 1}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}}\Phi_{AAA}.
\end{aligned}
\tag{B.2}
\]
Similarly, for Figure 19(b), we obtain $\Phi_{abc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{cf}, P_{cb}, P_{cc}\rangle}+1}\Phi_{ccc} = \sum_{\text{Type 1}} (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}}\Phi_{ccc}$.
Thus it is true for $n = k + 1$.
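As a concrete instance of the induction step (an illustrative example that is not taken from the paper), let $G^*$ be the pedigree of Figure 18(b) with its three children named $f$, $b$, and $c$, and let $G$ extend $G^*$ by a single child $a$ of $f$. The worked values below assume that $a$ has no second parent.

```latex
% G^*: A is a parent of f, b, and c (Figure 18(b) with a renamed to f).
% G:   G^* plus the edge f -> a, so a has the single parent f.
\begin{align*}
\Phi_{fbc} &= \left(\tfrac{1}{2}\right)^{1+1+1}\Phi_{AAA}
            = \left(\tfrac{1}{2}\right)^{3}\Phi_{AAA}
  && \text{(path-counting formula in } G^*\text{)} \\
\Phi_{abc} &= \tfrac{1}{2}\,\Phi_{fbc}
            = \left(\tfrac{1}{2}\right)^{4}\Phi_{AAA}
  && \text{(recursive formula (3) with } \Phi_{mbc} = 0\text{)} \\
L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} &= 2 + 1 + 1 = 4
  && \text{(so the path-counting formula in } G \text{ gives the same value)}
\end{align*}
```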
B.1.2. Correctness Proof for Case 3.2
Case 3.2. For $\Phi_{abc}$, $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with root overlap.
Proof (Basis). There are three basic scenarios among $a$, $b$, and $c$: (i) two individuals are each a parent of another; (ii) only one individual is a parent of another; (iii) no individual is a parent of another.
Figure 20: (a) $b$ is a parent of $a$, and $c$ is a parent of $b$; (b) $b$ is a parent of $a$; (c) no individual is a parent of another.
Using the recursive formula (3) to compute $\Phi_{abc}$ in Figure 20: for Figure 20(a), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^3\Phi_{cc}$; for Figure 20(b), $\Phi_{abc} = (1/2)\Phi_{bbc} = (1/2)^2\Phi_{bc} = (1/2)^4\Phi_{AA}$; for Figure 20(c), $\Phi_{abc} = (1/2)^2\Phi_{ssc} = (1/2)^3\Phi_{sc} = (1/2)^5\Phi_{AA}$.
Using the path-counting formula (12), if a path-triple $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ has root overlap (i.e., Type 2), then the contribution of $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ to $\Phi_{abc}$ can be computed as follows: $\sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} = L_{P_{Aa}} + L_{P_{Ab}} + L_{P_{Ac}} - L_{P_{As}}$ and $s$ is the last individual of the root overlap path $P_{As}$.
For Figure 20(a), $c$ is the only triple-common ancestor, and we obtain $\Phi_{abc} = (1/2)^{L_{\langle P_{ca}, P_{cb}, P_{cc}\rangle}+1}\Phi_{cc} = (1/2)^{2+1}\Phi_{cc} = (1/2)^3\Phi_{cc}$. Similarly, for Figures 20(b) and 20(c), we obtain $\Phi_{abc} = (1/2)^4\Phi_{AA}$ and $\Phi_{abc} = (1/2)^5\Phi_{AA}$, respectively.
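The Type 1 and Type 2 contributions can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: paths are given as tuples of node names that all start at the same triple-common ancestor, the caller supplies only eligible path-triples (at most two of the three paths share a root prefix), and the helper names `shared_prefix_edges` and `triple_contribution` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def shared_prefix_edges(p, q):
    """Number of leading edges that paths p and q (tuples of nodes) share."""
    n = 0
    while n + 1 < min(len(p), len(q)) and p[n + 1] == q[n + 1]:
        n += 1
    return n

def triple_contribution(paths, phi_AA, phi_AAA):
    """Contribution of one eligible path-triple rooted at a common ancestor A."""
    total_edges = sum(len(p) - 1 for p in paths)
    # Length of the root overlap path P_As (0 when no two paths share a first edge).
    overlap = max(shared_prefix_edges(p, q) for p, q in combinations(paths, 2))
    if overlap == 0:
        # Type 1: (1/2)^(L_a + L_b + L_c) * Phi_AAA
        return Fraction(1, 2) ** total_edges * phi_AAA
    # Type 2: (1/2)^(L_a + L_b + L_c - L_As + 1) * Phi_AA
    return Fraction(1, 2) ** (total_edges - overlap + 1) * phi_AA

# Assumption for the examples: the root is a non-inbred founder,
# so Phi_AA = 1/2 and Phi_AAA = 1/4.
PHI_PAIR, PHI_TRIPLE = Fraction(1, 2), Fraction(1, 4)

fig20a = triple_contribution([("c", "b", "a"), ("c", "b"), ("c",)], PHI_PAIR, PHI_TRIPLE)
fig20b = triple_contribution([("A", "b", "a"), ("A", "b"), ("A", "c")], PHI_PAIR, PHI_TRIPLE)
fig20c = triple_contribution([("A", "s", "a"), ("A", "s", "b"), ("A", "c")], PHI_PAIR, PHI_TRIPLE)
print(fig20a, fig20b, fig20c)
```

With these inputs the exponents are 3, 4, and 5, matching the three basis scenarios of Figure 20.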
Induction Step. Let $n$ denote the number of edges in $G$. Assume it is true for $n \le k$, where $k \ge 2$. We show that it is true for $n = k + 1$.
For Figures 21(a), 21(b), and 21(c), among $a$, $b$, and $c$, let $a$ be the individual who has the longest path, and let $p$ be a parent of $a$. Then we cut the edge $p \to a$ from $G$ and obtain a new graph $G^*$ which satisfies the condition of the induction hypothesis. For Figure 21(a), where $p = f$, we use the path-counting formula for $\Phi_{fbc}$ in $G^*$: $\Phi_{fbc} = \sum_{\text{Type 2}} (1/2)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}$.
In $G$, $f$ is the only parent of $a$; according to the recursive formula (3), we have $\Phi_{abc} = (1/2)\Phi_{fbc}$. Then we can plug in $\Phi_{fbc}$ and obtain
\[
\begin{aligned}
\Phi_{abc} &= \frac{1}{2}\Phi_{fbc} = \frac{1}{2}\sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA} \\
&= \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}, \\
\because\ L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle} &= L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle} + 1, \\
\therefore\ \Phi_{abc} &= \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Af}, P_{Ab}, P_{Ac}\rangle}+1+1}\Phi_{AA}
= \sum_{\text{Type 2}}\left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle}+1}\Phi_{AA}.
\end{aligned}
\tag{B.3}
\]
For Figures 21(b) and 21(c), we take the same steps as in the calculation of $\Phi_{abc}$ for Figure 21(a).
In summary, it is true for $n = k + 1$.
Figure 21: (a) No individual is a parent of another; (b) $b$ is a parent of $a$; (c) $b$ is a parent of $a$, and $c$ is an ancestor of $b$.
B.1.3. Correctness Proof for Case 2.3
Case 2.3. For $\Phi_{aab}$, the path-triples in the pedigree graph $G$ have a mergeable path-pair.
Proof. Considering the relationship between $a$ and $b$, $G$ has two scenarios: (i) $b$ is not an ancestor of $a$; (ii) $b$ is an ancestor of $a$. Using the path-counting formula (A.1), if a path-triple $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle \in$ Type 3, which means that it has a mergeable path-pair, then the contribution of $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ to $\Phi_{aab}$ can be computed as follows: $\sum_{\text{Type 3}} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$, where $L_{\langle P_{Aa}, P_{Ab}\rangle} = L_{P_{Aa}} + L_{P_{Ab}}$.
Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$.
\[
\because\ a \text{ only has one parent } f, \qquad
\therefore\ \Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab} \quad (\text{as } m \text{ is missing}).
\tag{B.4}
\]
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab}\rangle$.
Then we have $\Phi_{aab} = (1/2)\Phi_{ab} = (1/2)\sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}}\Phi_{AA} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab}\rangle}+1}\Phi_{AA}$.
For Figure 22(b), we can also transform the computation of $\Phi_{aab}$ to $\Phi_{ab}$.
In summary, the path-counting formula (A.1) is true for Case 2.3.
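The reduction of $\Phi_{aab}$ to $\Phi_{ab}$ can be illustrated numerically. The sketch below enumerates path-pairs for Wright's formula on a small hypothetical pedigree (it is not the pedigree of Figure 22), assuming that $a$ has a single known parent and that every ancestor is non-inbred, so that $\Phi_{AA} = 1/2$. The function names and the `children` / `phi_self` data layout are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def paths(children, src, dst):
    """All directed paths src -> ... -> dst as tuples of nodes (the graph is acyclic)."""
    if src == dst:
        return [(src,)]
    found = []
    for child in children.get(src, ()):
        found += [(src,) + rest for rest in paths(children, child, dst)]
    return found

def wright_kinship(children, phi_self, a, b):
    """Sum (1/2)^(L_Aa + L_Ab) * Phi_AA over path-pairs that share only their root A."""
    total = Fraction(0)
    for anc, phi_aa in phi_self.items():
        for pa, pb in product(paths(children, anc, a), paths(children, anc, b)):
            if set(pa) & set(pb) == {anc}:          # nonoverlapping path-pair
                edges = (len(pa) - 1) + (len(pb) - 1)
                total += Fraction(1, 2) ** edges * phi_aa
    return total

# Hypothetical pedigree: A -> s -> f -> a and A -> t -> b.
children = {"A": ["s", "t"], "s": ["f"], "f": ["a"], "t": ["b"]}
# Assumption: nobody is inbred, so Phi_XX = 1/2 for every individual X.
phi_self = {x: Fraction(1, 2) for x in ["A", "s", "t", "f", "a", "b"]}

phi_ab = wright_kinship(children, phi_self, "a", "b")
phi_aab = Fraction(1, 2) * phi_ab   # Case 2.3 with a single parent f, as in (B.4)
print(phi_ab, phi_aab)
```

The only contributing path-pair has $L_{P_{Aa}} = 3$ and $L_{P_{Ab}} = 2$, so the exponent for $\Phi_{aab}$ is $3 + 2 + 1 = 6$.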
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when there is no path-triple having a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 is proven. Thus we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. The legend distinguishes parent-child edges from ancestor-descendant edges.
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).
Proof. By induction on the number of triple-common ancestors.
Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.
Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.
Induction Step. Now we show that it is true for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, such that there is no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ who is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ which passes through $A_1$. Then we can remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ which has $k$ triple-common ancestors of $a$, $b$, and $c$. It means $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
For the new graph $G'$, we can apply our induction hypothesis and obtain $\Phi_{abc}(G')$.
For the topmost triple-common ancestor $A_1$, there are two different cases considering its relationship with the other triple-common ancestors:
(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. It also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.
Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Consider $pt_i = \{t_a: A_1 \to \cdots \to a,\ t_b: A_1 \to \cdots \to b,\ t_c: A_1 \to \cdots \to c\}$, a path-triple from $A_1$ to $a$, $b$, and $c$.
If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples where $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have three paths all passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
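The graph operations used in this induction (collecting the triple-common ancestors, picking a topmost one, and forming $G'$) can be sketched as follows. This is a minimal illustration on a hypothetical pedigree; it assumes that an individual counts as its own ancestor, as in Figure 18(a) where $c$ is a triple-common ancestor of $a$, $b$, and $c$. All function names are hypothetical.

```python
def ancestors(parents, x):
    """x together with all of its ancestors (x counts as its own ancestor here)."""
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node is not None and node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, (None, None)))
    return seen

def triple_common_ancestors(parents, a, b, c):
    """Tri_C(a, b, c, G): individuals that are ancestors of all of a, b, and c."""
    return ancestors(parents, a) & ancestors(parents, b) & ancestors(parents, c)

def topmost(parents, common):
    """Members of `common` that have no other member of `common` as an ancestor."""
    return {A for A in common if ancestors(parents, A) & common == {A}}

def remove_individual(parents, gone):
    """G': drop `gone` and every edge leaving it (its children lose that parent)."""
    return {x: tuple(None if p == gone else p for p in fm)
            for x, fm in parents.items() if x != gone}

# Hypothetical pedigree with two triple-common ancestors, A1 above A2.
G = {"A1": (None, None), "A2": ("A1", None),
     "a": ("A2", None), "b": ("A2", None), "c": ("A1", "A2")}

common = triple_common_ancestors(G, "a", "b", "c")
top = topmost(G, common)
G_prime = remove_individual(G, "A1")
print(sorted(common), sorted(top), sorted(triple_common_ancestors(G_prime, "a", "b", "c")))
```

In this example $G$ has two triple-common ancestors and $G'$ has one, which is the step from $k + 1$ to $k$ used in the proof.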
C. Proof for Four Individuals and Two Pairs of Individuals
Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First of all, for four individuals in a pedigree graph $G$, we present all different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:
Cases for $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}\rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab}\rangle$ with zero root overlap.

Cases for $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with two root 2-overlaps.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac}\rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.

Cases for $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with two root 2-overlaps.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad}\rangle$ with one root 2-overlap and one root 3-overlap.
(C.1)

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.

Figure 23: Dependency graph for different cases for four individuals.

According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pairs regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.
Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with zero root homo-overlap and two root heter-overlaps.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and two root heter-overlaps.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad})\rangle$ with two root homo-overlaps and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac}\rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td}\rangle$ with zero root overlap.
Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.
According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," httpcompmedcomcate-gorypeople-helping-peoplepage7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, httpwwwgenomegovglossaryid=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Figure 18: (a) $c$ is a parent of $a$ and $b$; (b) no individual is a parent of another.
Figure 19: (a) No individual is a parent of another; (b) $c$ is an ancestor of $a$ and $b$. The legend distinguishes parent-child edges from ancestor-descendant edges.
(iii) for Cases 3.1 and 3.2, the correctness is proven by induction on the number of edges $n$ in the pedigree graph $G$.
B11 Correctness Proof for Case 31
Case 31 ForΦ119886119887119888
119866 does not have any path triples ⟨119875119860119886 119875119860119887
119875119860119888⟩ with root overlap
Proof (Basis) There are two basic scenarios (i) one individ-ual is a parent of another (ii) no individual is a parent ofanother among 119886 119887 and 119888
Using the recursive formula (3) to compute Φ119886119887119888
forFigure 18(a) Φ
119886119887119888= (12)Φ
119888119887119888= (12)
2
Φ119888119888119888 for Figure 18(b)
Φ119886119887119888= (12)Φ
119860119887119888= (12)
2
Φ119860119860119888
= (12)3
Φ119860119860119860
Using the path-counting formula (12) if a path-triple
⟨119875119860119886 119875119860119887 119875119860119888⟩ has no root overlap (ie Type 1) then the
contribution of ⟨119875119860119886 119875119860119887 119875119860119888⟩ to Φ
119886119887119888can be computed as
follows sumType 1(12)119871⟨119875119860119886119875119860119887
119875119860119888⟩Φ119860119860119860
where 119871⟨119875119860119886119875119860119887 119875119860119888⟩
=
119871119875119860119886+ 119871119875119860119887+ 119871119875119860119888
For Figure 18(a) 119888 is the only triple-common ancestor
and we obtain Φ119886119887119888
= (12)119871⟨119875119888119886119875119888119887
119875119888119888⟩Φ119888119888119888
= (12)2
Φ119888119888119888 for
Figure 18(b) we obtain Φ119886119887119888
= (12)119871⟨119875119860119886119875119860119887
119875119860119888⟩Φ119860119860119860
=
(12)3
Φ119860119860119860
Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Then we show it istrue for 119899 = 119896 + 1
For Figures 19(a) and 19(b) among 119886 119887 and 119888 let 119886 be theindividual having the longest path starting from their triple-common ancestor in the pedigree graph119866with (119896+1) edgesIf we remove the node 119886 and cut the edge 119891 rarr 119886 from 119866
then the new graph 119866lowast has 119896 edges In terms of computingΦ119891119887119888
119866lowast satisfies the condition for induction hypothesisFor Figure 19(a) Φ
119891119887119888= sumType 1(12)
119871⟨119875119860119891119875119860119887119875119860119888⟩Φ119860119860119860
Based on the recursive formula (3)Φ
119886119887119888= (12)(Φ
119891119887119888+Φ119898119887119888)
where 119891 and 119898 are parents of 119886 In 119866 119886 only has one parent119891 thus it indicatesΦ
119898119887119888= 0 Then we can plug-in the path-
counting formula forΦ119891119887119888
to obtain
Φ119886119887119888=1
2Φ119891119887119888
=1
2lowast sum
Type 1(1
2)
119871⟨119875119860119891119875119860119887119875119860119888⟩
Φ119860119860119860
= sum
Type 1(1
2)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1
Φ119860119860119860
∵ 119871⟨119875119860119886119875119860119887 119875119860119888⟩
= 119871⟨119875119860119891119875119860119887 119875119860119888⟩
+ 1
there4 Φ119886119887119888= sum
Type 1(1
2)
119871⟨119875119860119886119875119860119887119875119860119888⟩
Φ119860119860119860
(B2)
Similarly for Figure 19(b) we obtain Φ119886119887119888
=
sumType 1(12)119871⟨119875119888119891119875119888119887119875119888119888⟩+1
Φ119888119888119888= sumType 1(12)
119871⟨119875119888119886119875119888119887119875119888119888⟩Φ119888119888119888
Thus it is true for 119899 = 119896 + 1
B12 Correctness Proof for Case 32
Case 32 ForΦ119886119887119888
119866 has path triples ⟨119875119860119886 119875119860119887 119875119860119888⟩with root
overlap
Proof (Basis) There are three basic scenarios (i) there are twoindividuals who are parents of another (ii) there is only oneindividual who is parent of another (iii) there is no individualwho is a parent of another among 119886 119887 and 119888
Computational and Mathematical Methods in Medicine 17
a
b
c
(a)
A
a
b c
(b)
A
a
s
b
c
(c)
Figure 20 (a) 119887 is a parent of 119886 and 119888 is a parent of 119887 (b) 119887 is a parentof 119886 (c) no individual who is a parent of another
Using the recursive formula (3) to compute Φ119886119887119888
inFigure 20 for Figure 20(a) Φ
119886119887119888= (12)Φ
119887119887119888= (12)
2
Φ119887119888=
(12)3
Φ119888119888 for Figure 20(b)Φ
119886119887119888= (12)Φ
119887119887119888= (12)
2
Φ119887119888=
(12)4
Φ119860119860
for Figure 20(c)Φ119886119887119888= (12)
2
Φ119904119904119888= (12)
3
Φ119904119888=
(12)5
Φ119860119860
Using the path-counting formula (12) if a path-triple
⟨119875119860119886 119875119860119887 119875119860119888⟩ has root overlap (ie Type 2) then the con-
tribution of ⟨119875119860119886 119875119860119887 119875119860119888⟩ to Φ
119886119887119888can be computed as
followssumType 2(12)119871⟨119875119860119886119875119860119887
119875119860119888⟩+1
Φ119860119860
where 119871⟨119875119860119886 119875119860119887 119875119860119888⟩
=
119871119875119860119886
+ 119871119875119860119887
+ 119871119875119860119888minus 119871119875119860119904
and 119904 is the last individual of theroot overlap path 119875
119860119904
For Figure 20(a) 119888 is the only triple-common ancestorand we obtain Φ
119886119887119888= (12)
119871⟨119875119888119886119875119888119887119875119888119888⟩+1
Φ119888119888= (12)
2+1
Φ119888119888=
(12)3
Φ119888119888 Similarly for Figures 20(b) and 20(c) we obtain
Φ119886119887119888= (12)
4
Φ119860119860
and Φ119886119887119888= (12)
5
Φ119860119860
respectively
Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Show that it is truefor = 119896 + 1
For Figures 21(a) 21(b) and 21(c) among 119886 119887 and 119888 let119886 be the individual who has the longest path and let 119901 be aparent of 119886 Then we cut the edge 119901 rarr 119886 from 119866 and obtaina new graph 119866lowast which satisfies the condition of inductionhypothesis For Figure 21(a) we use the path-counting for-mula forΦ
119891119887119888in 119866lowast Φ
119891119887119888= sumType 2(12)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1
Φ119860119860
In 119866 119891 is the only parent of 119886 according to the recursive
formula (3) we have Φ119886119887119888= (12)Φ
119891119887119888 Then we can plug-in
the Φ119891119887119888
and obtain
Φ119886119887119888=1
2Φ119891119887119888
=1
2sum
Type 2(1
2)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1
Φ119860119860
= sum
Type 2(1
2)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1+1
Φ119860119860
∵ 119871⟨119875119860119886 119875119860119887 119875119860119888⟩
= 119871⟨119875119860119891119875119860119887 119875119860119888⟩
+ 1
there4 Φ119886119887119888= sum
Type 2(1
2)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1+1
Φ119860119860
= sum
Type 2(1
2)
119871⟨119875119860119886119875119860119887119875119860119888⟩+1
Φ119860119860
(B3)
For Figures 21(b) and 21(c) we take the same steps as we cal-culate Φ
119886119887119888for Figure 21(a)
In summary it is true for 119899 = 119896 + 1
A
a
s
t
f
b
c
(a)
a
t
b
A
s c
(b)
a
s
t
b
c
(c)Figure 21 (a) No individual who is a parent of another (b) 119887 is aparent of 119886 (c) 119887 is a parent of 119886 and 119888 is an ancestor of 119887
B13 Correctness Proof for Case 23
Case 23 For Φ119886119886119887
the path-triples in the pedigree graph 119866have mergeable path-pair
Proof Considering the relationship between 119886 and 119887 119866has two scenarios (i) 119887 is not an ancestor of 119886 (ii) 119887 isan ancestor of 119886 Using the path-counting formula (A1)if a path-triple ⟨119875
1198601198861 1198751198601198862 119875119860119887⟩ isin Type 3 which means
that it has a mergeable path-pair then the contributionof ⟨1198751198601198861 1198751198601198862 119875119860119887⟩ to Φ
119886119886119887can be computed as follows
sumType 3(12)119871⟨119875119860119886119875119860119887
⟩+1Φ119860119860
where 119871⟨119875119860119886 119875119860119887⟩
= 119871119875119860119886+ 119871119875119860119887
Using the recursive formula (4), we obtain $\Phi_{aab} = (1/2)(\Phi_{ab} + \Phi_{fmb})$.
For Figure 22(a), $A$ is a common ancestor of $a$ and $b$. Because $a$ has only one parent $f$ (that is, $m$ is missing),
$$
\Phi_{aab} = \frac{1}{2}\left(\Phi_{ab} + \Phi_{fmb}\right) = \frac{1}{2}\left(\Phi_{ab} + 0\right) = \frac{1}{2}\Phi_{ab}.
\tag{B.4}
$$
For $\Phi_{ab}$, we use Wright's formula and obtain $\Phi_{ab} = \sum_{P} (1/2)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA}$, where $P$ denotes all nonoverlapping path-pairs $\langle P_{Aa}, P_{Ab} \rangle$.
Then we have
$$
\Phi_{aab} = \frac{1}{2}\Phi_{ab} = \frac{1}{2}\sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle}} \Phi_{AA} = \sum_{P} \left(\frac{1}{2}\right)^{L_{\langle P_{Aa}, P_{Ab} \rangle} + 1} \Phi_{AA}.
$$
For Figure 22(b), we can likewise transform the computation of $\Phi_{aab}$ into that of $\Phi_{ab}$.
In summary, the path-counting formula (A.1) is correct for Case 2.3.
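As a worked instance of scenario (i), consider a hypothetical pedigree (not one of the paper's figures) in which a non-inbred founder $A$ has two children $f$ and $b$, and $a$ is the child of $f$ with the other parent missing. The only path-pair is $\langle A \to f \to a,\; A \to b \rangle$, which is nonoverlapping, and both copies of the path to $a$ coincide, so the pair is assumed here to be mergeable in the sense of Type 3. Taking $\Phi_{AA} = 1/2$ for the non-inbred founder:

$$
\begin{aligned}
L_{\langle P_{Aa}, P_{Ab} \rangle} &= L_{P_{Aa}} + L_{P_{Ab}} = 2 + 1 = 3, \\
\Phi_{ab} &= \left(\frac{1}{2}\right)^{3} \Phi_{AA} = \frac{1}{16}, \\
\Phi_{aab} &= \frac{1}{2}\Phi_{ab} = \left(\frac{1}{2}\right)^{3 + 1} \Phi_{AA} = \frac{1}{32}.
\end{aligned}
$$

The same value of $\Phi_{ab}$ follows from the recursive method, since $\Phi_{ab} = \frac{1}{2}\Phi_{fb} = \frac{1}{4}\Phi_{Ab} = \frac{1}{8}\Phi_{AA}$.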
B.1.4. Correctness Proof for Cases 2.1 and 2.2. For $\Phi_{aab}$, when no path-triple has a mergeable path-pair (i.e., the path-triple belongs to either Case 2.1 or Case 2.2), $\Phi_{aab}$ can be transformed to $\Phi_{a_1 a_2 b}$, which is equivalent to the computation of $\Phi_{abc}$ for Cases 3.1 and 3.2. The correctness of our path-counting formula for Cases 3.1 and 3.2 has already been proven. Thus, we obtain the correctness for $\Phi_{aab}$ when the path-triple belongs to either Case 2.1 or Case 2.2.
B.2. Multiple Triple-Common Ancestors. Now we provide the correctness proof for multiple triple-common ancestors regarding the path-counting formulas (12) and (A.1).
Figure 22: (a) $b$ is not an ancestor of $a$; (b) $b$ is an ancestor of $a$. The legend distinguishes parent-child relationships from ancestor-descendant relationships.
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).
Proof. By induction on the number of triple-common ancestors.
Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.
Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.
Induction Step. We now show that the formulas are correct for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be the topmost triple-common ancestor, that is, no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
Because $A_1$ is the topmost triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ that passes through $A_1$. We can therefore remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ that has $k$ triple-common ancestors of $a$, $b$, and $c$; that is, $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
For the new graph $G'$, we can apply the induction hypothesis and obtain $\Phi_{abc}(G')$.
For the topmost triple-common ancestor $A_1$, there are two different cases, depending on its relationship with the other triple-common ancestors:
(1) no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$;
(2) at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. Consequently, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors.
Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be a descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a: A_1 \to \cdots \to a,\; t_b: A_1 \to \cdots \to b,\; t_c: A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$.
If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all path-triples in which $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have all three paths passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
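The graph surgery used in the induction step (pick the topmost triple-common ancestor, remove it together with its outgoing edges, and obtain a graph with one fewer triple-common ancestor) can be sketched as follows. This is a minimal illustration on a hypothetical five-person pedigree, not the authors' code; the function names are invented for the example. Following the basis case, where $c$ itself is a triple-common ancestor of $a$, $b$, and $c$, an individual is treated as its own ancestor.

```python
# Hypothetical toy pedigree: child -> (parent1, parent2); None marks a missing parent.
G = {
    "A1": (None, None),
    "A2": ("A1", None),
    "a": ("A2", None),
    "b": ("A2", None),
    "c": ("A2", None),
}

def ancestors(ped, x):
    """Ancestors of x, with x itself included (zero-length path)."""
    found = {x}
    for p in ped[x]:
        if p is not None:
            found |= ancestors(ped, p)
    return found

def triple_common_ancestors(ped, a, b, c):
    """Individuals from which a, b, and c are all reachable."""
    return ancestors(ped, a) & ancestors(ped, b) & ancestors(ped, c)

def topmost(ped, candidates):
    """Return a candidate that has no other candidate among its ancestors."""
    for x in sorted(candidates):
        if not (ancestors(ped, x) - {x}) & candidates:
            return x
    raise ValueError("no candidates")

def remove_individual(ped, x):
    """Drop x and every outgoing edge from x (its children lose x as a parent)."""
    return {
        child: tuple(None if p == x else p for p in parents)
        for child, parents in ped.items()
        if child != x
    }

tri = triple_common_ancestors(G, "a", "b", "c")
A1 = topmost(G, tri)
G_prime = remove_individual(G, A1)
tri_prime = triple_common_ancestors(G_prime, "a", "b", "c")
```

This pedigree falls under case (2): `A2` is a descendant of `A1`, and every path from `A1` to `a`, `b`, or `c` passes through `A2`, so no path-triple rooted at `A1` is eligible.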
C. Proof for Four Individuals and Two Pairs of Individuals
Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:

Cases arising from $\Phi_{aaab}$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.

Cases arising from $\Phi_{aabc}$:
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.

Cases arising from $\Phi_{abcd}$:
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.
(C.1)

Figure 23: Dependency graph for different cases for four individuals.

Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals.
According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows:
Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlap.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlap.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlap and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.
Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24.
According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are intermediate steps that are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] “Surgeon General’s New Family Health History Tool Is Released, Ready for ‘21st Century Medicine’,” http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., “A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia,” The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., “New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate,” Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Ohio State University, Columbus, Ohio, USA, 1940. Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l’hérédité, Masson, Paris, France, 1948. Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, “La relation d’identité en génétique,” Annales de l’Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, “Genotypic covariances between inbred relatives,” Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, “Logique du calcul des coefficients d’identité entre deux individuals,” Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, “A recursive algorithm for the calculation of identity coefficients,” Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, “Efficient evaluation of inbreeding queries on pedigree data,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, “Efficiently calculating inbreeding on large pedigrees databases,” Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, “Using compact encodings for path-based computations on pedigree graphs,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB ’11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Scalable computation of kinship and identity coefficients on large pedigrees,” in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB ’08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Efficient computation of kinship and identity coefficients on large pedigrees,” Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, “Coefficients of inbreeding and relationship,” The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, “Kinship and identity algorithm of coefficients of identity,” Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., “Data mining applied to linkage disequilibrium mapping,” The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, “Calculation of the inbreeding coefficient,” Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Computational and Mathematical Methods in Medicine 17
a
b
c
(a)
A
a
b c
(b)
A
a
s
b
c
(c)
Figure 20 (a) 119887 is a parent of 119886 and 119888 is a parent of 119887 (b) 119887 is a parentof 119886 (c) no individual who is a parent of another
Using the recursive formula (3) to compute Φ119886119887119888
inFigure 20 for Figure 20(a) Φ
119886119887119888= (12)Φ
119887119887119888= (12)
2
Φ119887119888=
(12)3
Φ119888119888 for Figure 20(b)Φ
119886119887119888= (12)Φ
119887119887119888= (12)
2
Φ119887119888=
(12)4
Φ119860119860
for Figure 20(c)Φ119886119887119888= (12)
2
Φ119904119904119888= (12)
3
Φ119904119888=
(12)5
Φ119860119860
Using the path-counting formula (12) if a path-triple
⟨119875119860119886 119875119860119887 119875119860119888⟩ has root overlap (ie Type 2) then the con-
tribution of ⟨119875119860119886 119875119860119887 119875119860119888⟩ to Φ
119886119887119888can be computed as
followssumType 2(12)119871⟨119875119860119886119875119860119887
119875119860119888⟩+1
Φ119860119860
where 119871⟨119875119860119886 119875119860119887 119875119860119888⟩
=
119871119875119860119886
+ 119871119875119860119887
+ 119871119875119860119888minus 119871119875119860119904
and 119904 is the last individual of theroot overlap path 119875
119860119904
For Figure 20(a) 119888 is the only triple-common ancestorand we obtain Φ
119886119887119888= (12)
119871⟨119875119888119886119875119888119887119875119888119888⟩+1
Φ119888119888= (12)
2+1
Φ119888119888=
(12)3
Φ119888119888 Similarly for Figures 20(b) and 20(c) we obtain
Φ119886119887119888= (12)
4
Φ119860119860
and Φ119886119887119888= (12)
5
Φ119860119860
respectively
Induction Step Let 119899 denote the number of edges in 119866Assume true for 119899 le 119896 where 119896 ge 2 Show that it is truefor = 119896 + 1
For Figures 21(a) 21(b) and 21(c) among 119886 119887 and 119888 let119886 be the individual who has the longest path and let 119901 be aparent of 119886 Then we cut the edge 119901 rarr 119886 from 119866 and obtaina new graph 119866lowast which satisfies the condition of inductionhypothesis For Figure 21(a) we use the path-counting for-mula forΦ
119891119887119888in 119866lowast Φ
119891119887119888= sumType 2(12)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1
Φ119860119860
In 119866 119891 is the only parent of 119886 according to the recursive
formula (3) we have Φ119886119887119888= (12)Φ
119891119887119888 Then we can plug-in
the Φ119891119887119888
and obtain
Φ119886119887119888=1
2Φ119891119887119888
=1
2sum
Type 2(1
2)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1
Φ119860119860
= sum
Type 2(1
2)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1+1
Φ119860119860
∵ 119871⟨119875119860119886 119875119860119887 119875119860119888⟩
= 119871⟨119875119860119891119875119860119887 119875119860119888⟩
+ 1
there4 Φ119886119887119888= sum
Type 2(1
2)
119871⟨119875119860119891119875119860119887119875119860119888⟩+1+1
Φ119860119860
= sum
Type 2(1
2)
119871⟨119875119860119886119875119860119887119875119860119888⟩+1
Φ119860119860
(B3)
For Figures 21(b) and 21(c) we take the same steps as we cal-culate Φ
119886119887119888for Figure 21(a)
In summary it is true for 119899 = 119896 + 1
A
a
s
t
f
b
c
(a)
a
t
b
A
s c
(b)
a
s
t
b
c
(c)Figure 21 (a) No individual who is a parent of another (b) 119887 is aparent of 119886 (c) 119887 is a parent of 119886 and 119888 is an ancestor of 119887
B13 Correctness Proof for Case 23
Case 23 For Φ119886119886119887
the path-triples in the pedigree graph 119866have mergeable path-pair
Proof Considering the relationship between 119886 and 119887 119866has two scenarios (i) 119887 is not an ancestor of 119886 (ii) 119887 isan ancestor of 119886 Using the path-counting formula (A1)if a path-triple ⟨119875
1198601198861 1198751198601198862 119875119860119887⟩ isin Type 3 which means
that it has a mergeable path-pair then the contributionof ⟨1198751198601198861 1198751198601198862 119875119860119887⟩ to Φ
119886119886119887can be computed as follows
sumType 3(12)119871⟨119875119860119886119875119860119887
⟩+1Φ119860119860
where 119871⟨119875119860119886 119875119860119887⟩
= 119871119875119860119886+ 119871119875119860119887
Using the recursive formula (4) we obtain Φ
119886119886119887=
(12)(Φ119886119887+ Φ119891119898119887)
For Figure 22(a) 119860 is a common ancestor of 119886 and 119887∵ 119886 only has one parent 119891
there4 Φ119886119886119887
=1
2(Φ119886119887+ Φ119891119898119887)
=1
2(Φ119886119887+ 0) =
1
2Φ119886119887
(as 119898 is missing) (B4)
For Φ119886119887 we use Wrightrsquos formula and obtain Φ
119886119887=
sum119875(12)119871⟨119875119860119886119875119860119887
⟩Φ119860119860
where 119875 denotes all nonoverlappingpath-pairs ⟨119875
119860119886 119875119860119887⟩
Then we have Φ119886119886119887
= (12)Φ119886119887
=
(12)sum119875(12)119871⟨119875119860119886119875119860119887
⟩Φ119860119860= sum119875(12)119871⟨119875119860119886119875119860119887
⟩+1Φ119860119860
For Figure 22(b) we can also transform the computation
of Φ119886119886119887
to Φ119886119887
In summary it shows that the path-counting formula(A1) is true for Case 23
B14 Correctness Proof for Cases 21 and 22 For Φ119886119886119887
whenthere is no path-triple having mergeable path-pair (ie thepath-triple belongs to either Case 21 or Case 23)Φ
119886119886119887can be
transformed toΦ11988611198862119887
which is equivalent to the computationof Φ119886119887119888
for Cases 31 and 32 The correctness of our path-counting formula for Cases 31 and 32 is proven Thus weobtain the correctness for Φ
119886119886119887when the path-triple belongs
to either Case 21 or Case 22
B2 Multiple Triple-Common Ancestors Now we providethe correctness proof for multiple triple-common ancestorsregarding the path-counting formulas (12) and (A1)
18 Computational and Mathematical Methods in Medicine
A
a
s
w
t
f
b
Parent-child relationshipAncestor-descendant relationship
(a)
a
s
f
b
Parent-child relationshipAncestor-descendant relationship
(b)
Figure 22 (a) 119887 is not an ancestor of 119886 (b) 119887 is an ancestor of 119886
Lemma A Given a pedigree graph 119866 and three individuals 119886119887 119888 having at least one trip-common ancestorΦ
119886119887119888is correctly
computed using the path counting formulas (12) and (A1)
Proof Proof by induction on the number of triple-commonancestorsBasis 119866 has only one triple-common ancestor of 119886 119887 and 119888
The correctness of (12) and (A1) for 119866 with only one tri-ple-common ancestor of 119886 119887 and 119888 is proven in the previoussection
Induction Hypothesis Assume that if 119866 has 119896 or less triple-common ancestors of 119886 119887 and 119888 (12) and (A1) are correct for119866
Induction Step Now we show that it is true for 119866 with 119896 + 1triple-common ancestors of 119886 119887 and 119888
Let 119879119903119894 119862(119886 119887 119888 119866) denote all triple-common ancestorsof 119886 119887 and 119888 in 119866 where 119879119903119894 119862(119886 119887 119888 119866) = 119860
119894| 1 le 119894 le 119896 +
1 Let 1198601be the most top triple-common ancestor such that
there is no individual among the remaining ancestors 119860119894|
2 le 119894 le 119896 + 1 who is an ancestor of 1198601 Let 119878(119860
1) denote the
contribution from 1198601to Φ119886119887119888
Because119860
1is themost top triple-common ancestor there
is no path-triple from 119860119894| 2 le 119894 le 119896 + 1 to 119886 119887 and
119888 which passes through 1198601 Then we can remove 119860
1from
119866 and delete all out-going edges from 1198601and obtain a new
graph 1198661015840 which has 119896 triple-common ancestors of 119886 119887 and 119888It means 119879119903119894 119862(119886 119887 119888 1198661015840) = 119860
119894| 2 le 119894 le 119896 + 1
For the new graph 1198661015840 we can apply our induction
hypothesis and obtainΦ119886119887119888(1198661015840
)For the most top triple-common ancestor 119860
1 there are
two different cases considering its relationship with the othertriple-common ancestors
(1) there is no individual among 119860119894| 2 le 119894 le 119896 + 1 who
is a descendant of 1198601
(2) there is at least one individual among 119860119894| 2 le 119894 le
119896 + 1 who is a descendant of 1198601
For (1) since no individual among 119860119894| 2 le 119894 le 119896 + 1 is a
descendant of 1198601 the set of path-triples from 119860
1to 119886 119887 and
119888 is independent of the set of path-triples from 119860119894| 2 le 119894 le
119896 + 1 to 119886 119887 and 119888 It also means that the contribution from
1198601toΦ119886119887119888
is independent of the contribution from the othertriple-common ancestors
Summing up all contributions we can obtainΦ119886119887119888(119866) =
Φ119886119887119888(1198661015840
) + 119878(1198601)
For (2) let119860119895be one descendant of119860
1 Now both119860
1and
119860119895can reach 119886 119887 and 119888119901119905119894= 119905119886 1198601rarr sdot sdot sdot rarr 119886 119905
119887 1198601rarr sdot sdot sdot rarr 119887 119905
119888 1198601rarr
sdot sdot sdot rarr 119888 a path-triple from 1198601to 119886 119887 and 119888
If 119905119886 119905119887 and 119905
119888all pass through119860
119895 then the path-triple119901119905
119894
is not an eligible path-triple for Φ119886119887119888
When we compute thecontribution from119860
1toΦ119886119887119888
we exclude all such path-tripleswhere 119905
119886 119905119887 and 119905
119888all pass through a lower triple-common
ancestor In other words an eligible path-triple from 1198601
regarding Φ119886119887119888
cannot have three paths all passing through alower triple-common ancestor Therefore we know that thatthe contribution from119860
1toΦ119886119887119888
is independent of the contri-bution from the other triple-common ancestors Summing upall contributions we obtainΦ
119886119887119888(119866) = Φ
119886119887119888(1198661015840
) + 119878(1198601)
C Proof for Four Individuals and TwoPairs of Individuals
Here we give a proof sketch for the correctness of pathcounting formulas for four individuals First of all for fourindividuals in a pedigree graph 119866 we present all differentcases based on which we construct a dependency graphThe correctness of the path-counting formulas for two-pairindividuals can be proved similarly
C1 Proof for Four Individuals Consider the existence ofdifferent types of path-quads regarding Φ
119886119887119888119889 Φ119886119886119887119888
andΦ119886119886119886119887
there are 15 cases for a pedigree graph 119866
Case 21 119866 has path-triples⟨1198751198601198861 1198751198601198862 119875119860119887⟩
with zero root overlapCase 22 119866 has path-triples
⟨1198751198601198861 1198751198601198862 119875119860119887⟩
with one root overlapCase 23 119866 has path-pairs
⟨119875119860119886 119875119860119887⟩
with zero root overlap
lArr997904 Φ119886119886119886119887
Computational and Mathematical Methods in Medicine 19
Case21
Case31 ΦAAA
ΦAAA
Case41
Case42
Case34ΦAA
Case32
Case331
Case22
Case23
Case431
Case35
Case432
Case4 33
Case332
Case333
Figure 23 Dependency graph for different cases for four individuals
Case 31 119866 has path-quads⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with zero root overlapCase 32 119866 has path-quads
⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with one root 2-overlapCase 331 119866 has path-quads
⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with two root 2-overlapCase 332 119866 has path-quads
⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with one root 3-overlapCase 333 119866 has path-quads
⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with one root 2-overlapand one root 3-overlap
Case 34 119866 has path-triples⟨119875119860119886 119875119860119887 119875119860119888⟩
with zero root overlapCase 35 119866 has path-triples
⟨119875119860119886 119875119860119887 119875119860119888⟩
with one root overlap
lArr997904 Φ119886119886119887119888
Case 41 119866 has path-quads⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with zero root overlapCase 42 119866 has path-quads
⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with one root 2-overlapCase 431 119866 has path-quads
⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with two root 2-overlapCase 432 119866 has path-quads
⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with one root 3-overlapCase 433 119866 has path-quads
⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with one root 2-overlapand one root 3-overlap
lArr997904 Φ119886119887119888119889
(C1)Then we construct a dependency graph shown in
Figure 23 for all cases for four individualsAccording to the dependency graph in Figure 23 the
intermediate steps including Cases 34 and 35 are already
proved for the computation of Φ119886119887119888
The correctness of thetransformation fromCase 42 to Case 34 can be proved basedon the recursive formula forΦ
119886119887119888119889andΦ
119886119886119887119888 Similarly we can
obtain the transformation from Case 431 to Case 35
18 Computational and Mathematical Methods in Medicine
[Figure 22 (image not reproduced): two pedigree diagrams, (a) over individuals $A$, $a$, $s$, $w$, $t$, $f$, $b$ and (b) over individuals $a$, $s$, $f$, $b$; the legend distinguishes parent-child relationships from ancestor-descendant relationships.]
Figure 22: (a) $b$ is not an ancestor of $a$. (b) $b$ is an ancestor of $a$.
Lemma A. Given a pedigree graph $G$ and three individuals $a$, $b$, and $c$ having at least one triple-common ancestor, $\Phi_{abc}$ is correctly computed using the path-counting formulas (12) and (A.1).
Proof. The proof is by induction on the number of triple-common ancestors.
Basis. $G$ has only one triple-common ancestor of $a$, $b$, and $c$. The correctness of (12) and (A.1) for $G$ with only one triple-common ancestor of $a$, $b$, and $c$ is proven in the previous section.
Induction Hypothesis. Assume that if $G$ has $k$ or fewer triple-common ancestors of $a$, $b$, and $c$, then (12) and (A.1) are correct for $G$.
Induction Step. We now show that the claim holds for $G$ with $k + 1$ triple-common ancestors of $a$, $b$, and $c$.
Let $\mathit{Tri\_C}(a, b, c, G)$ denote the set of all triple-common ancestors of $a$, $b$, and $c$ in $G$, where $\mathit{Tri\_C}(a, b, c, G) = \{A_i \mid 1 \le i \le k + 1\}$. Let $A_1$ be a top-most triple-common ancestor, that is, one such that no individual among the remaining ancestors $\{A_i \mid 2 \le i \le k + 1\}$ is an ancestor of $A_1$. Let $S(A_1)$ denote the contribution from $A_1$ to $\Phi_{abc}$.
Because $A_1$ is a top-most triple-common ancestor, there is no path-triple from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$ that passes through $A_1$. We can therefore remove $A_1$ from $G$, delete all outgoing edges from $A_1$, and obtain a new graph $G'$ that has $k$ triple-common ancestors of $a$, $b$, and $c$; that is, $\mathit{Tri\_C}(a, b, c, G') = \{A_i \mid 2 \le i \le k + 1\}$.
For the new graph $G'$, we can apply the induction hypothesis and obtain $\Phi_{abc}(G')$.
For the top-most triple-common ancestor $A_1$, there are two cases, depending on its relationship with the other triple-common ancestors:
(1) there is no individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$;
(2) there is at least one individual among $\{A_i \mid 2 \le i \le k + 1\}$ who is a descendant of $A_1$.
For (1), since no individual among $\{A_i \mid 2 \le i \le k + 1\}$ is a descendant of $A_1$, the set of path-triples from $A_1$ to $a$, $b$, and $c$ is independent of the set of path-triples from $\{A_i \mid 2 \le i \le k + 1\}$ to $a$, $b$, and $c$. This also means that the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
For (2), let $A_j$ be one descendant of $A_1$. Now both $A_1$ and $A_j$ can reach $a$, $b$, and $c$. Let $pt_i = \{t_a\colon A_1 \to \cdots \to a,\ t_b\colon A_1 \to \cdots \to b,\ t_c\colon A_1 \to \cdots \to c\}$ be a path-triple from $A_1$ to $a$, $b$, and $c$. If $t_a$, $t_b$, and $t_c$ all pass through $A_j$, then the path-triple $pt_i$ is not an eligible path-triple for $\Phi_{abc}$. When we compute the contribution from $A_1$ to $\Phi_{abc}$, we exclude all such path-triples in which $t_a$, $t_b$, and $t_c$ all pass through a lower triple-common ancestor. In other words, an eligible path-triple from $A_1$ regarding $\Phi_{abc}$ cannot have all three paths passing through a lower triple-common ancestor. Therefore, the contribution from $A_1$ to $\Phi_{abc}$ is independent of the contribution from the other triple-common ancestors. Summing up all contributions, we obtain $\Phi_{abc}(G) = \Phi_{abc}(G') + S(A_1)$.
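As an illustration of the step that selects $A_1$ in the proof above, the following sketch (not taken from the paper) computes the triple-common ancestors of three individuals and the top-most ones among them. The pedigree encoding (`ped` maps each non-founder to the list of its parents), the function names, and the convention that an individual counts as its own ancestor are all assumptions made for this example.

```python
from typing import Dict, List, Set

# Hypothetical encoding: each non-founder maps to the list of its parents.
Pedigree = Dict[str, List[str]]


def ancestors(ped: Pedigree, x: str) -> Set[str]:
    """Return x together with every ancestor of x (assumed convention)."""
    seen: Set[str] = set()
    stack = [x]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(ped.get(node, []))
    return seen


def triple_common_ancestors(ped: Pedigree, a: str, b: str, c: str) -> Set[str]:
    """Individuals that are ancestors of a, b, and c simultaneously."""
    return ancestors(ped, a) & ancestors(ped, b) & ancestors(ped, c)


def top_most(ped: Pedigree, common: Set[str]) -> Set[str]:
    """Members of `common` that have no other member as a proper ancestor."""
    return {x for x in common if not ((ancestors(ped, x) - {x}) & common)}


# Toy pedigree: g1 and g2 are the parents of p; p and q are the parents of a, b, c.
ped = {"p": ["g1", "g2"], "a": ["p", "q"], "b": ["p", "q"], "c": ["p", "q"]}
common = triple_common_ancestors(ped, "a", "b", "c")
print(sorted(common))                 # ['g1', 'g2', 'p', 'q']
print(sorted(top_most(ped, common)))  # ['g1', 'g2', 'q']
```

In this toy pedigree any of `g1`, `g2`, or `q` could play the role of $A_1$; `p` cannot, because `g1` and `g2` are triple-common ancestors above it.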
C. Proof for Four Individuals and Two Pairs of Individuals
Here we give a proof sketch for the correctness of the path-counting formulas for four individuals. First, for four individuals in a pedigree graph $G$, we present all the different cases, based on which we construct a dependency graph. The correctness of the path-counting formulas for two pairs of individuals can be proved similarly.
C.1. Proof for Four Individuals. Considering the existence of different types of path-quads regarding $\Phi_{abcd}$, $\Phi_{aabc}$, and $\Phi_{aaab}$, there are 15 cases for a pedigree graph $G$:
Case 2.1: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with zero root overlap.
Case 2.2: $G$ has path-triples $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab} \rangle$ with one root overlap.
Case 2.3: $G$ has path-pairs $\langle P_{Aa}, P_{Ab} \rangle$ with zero root overlap.
(Cases 2.1 to 2.3 $\Longleftarrow \Phi_{aaab}$)
Case 3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap.
Case 3.3.1: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with two root 2-overlap.
Case 3.3.2: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 3-overlap.
Case 3.3.3: $G$ has path-quads $\langle P_{Aa_1}, P_{Aa_2}, P_{Ab}, P_{Ac} \rangle$ with one root 2-overlap and one root 3-overlap.
Case 3.4: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 3.5: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
(Cases 3.1 to 3.5 $\Longleftarrow \Phi_{aabc}$)
Case 4.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with zero root overlap.
Case 4.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap.
Case 4.3.1: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with two root 2-overlap.
Case 4.3.2: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 3-overlap.
Case 4.3.3: $G$ has path-quads $\langle P_{Aa}, P_{Ab}, P_{Ac}, P_{Ad} \rangle$ with one root 2-overlap and one root 3-overlap.
(Cases 4.1 to 4.3.3 $\Longleftarrow \Phi_{abcd}$) (C.1)
[Figure 23 (image not reproduced): the nodes are Cases 2.1, 2.2, 2.3, 3.1, 3.2, 3.3.1, 3.3.2, 3.3.3, 3.4, 3.5, 4.1, 4.2, 4.3.1, 4.3.2, and 4.3.3, together with the terms $\Phi_{AA}$ and $\Phi_{AAA}$.]
Figure 23: Dependency graph for different cases for four individuals.
Then we construct a dependency graph, shown in Figure 23, for all cases for four individuals. According to the dependency graph in Figure 23, the intermediate steps, including Cases 3.4 and 3.5, are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 3.4 can be proved based on the recursive formulas for $\Phi_{abcd}$ and $\Phi_{aabc}$. Similarly, we can obtain the transformation from Case 4.3.1 to Case 3.5.
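The recursive formulas referred to above are those of Karigl [10]. As a minimal sketch, assuming the commonly stated forms $\Phi_{abcd} = \frac{1}{2}(\Phi_{fbcd} + \Phi_{mbcd})$ and $\Phi_{aabc} = \frac{1}{2}(\Phi_{abc} + \Phi_{fmbc})$, where $f$ and $m$ are the parents of an individual $a$ that is not an ancestor of the others, the code below evaluates single-group generalized kinship coefficients recursively. The pedigree encoding, the helper names `make_phi` and `depth`, and the folding of repeated individuals into one weight `w` are choices made for this example, not the paper's implementation.

```python
from fractions import Fraction
from functools import lru_cache


def make_phi(ped):
    """Build phi(*individuals) for a pedigree {child: (father, mother)}.

    Founders are simply absent from `ped` (hypothetical encoding) and are
    assumed to be unrelated and non-inbred.
    """

    @lru_cache(maxsize=None)
    def depth(x):
        # Longest chain of parents above x; founders have depth 0.
        parents = ped.get(x)
        return 0 if parents is None else 1 + max(depth(p) for p in parents)

    @lru_cache(maxsize=None)
    def _phi(group):
        if len(group) <= 1:
            return Fraction(1)  # a single gene is identical to itself
        a = max(group, key=depth)  # deepest member: not an ancestor of the rest
        n = group.count(a)         # number of genes drawn from a
        rest = tuple(x for x in group if x != a)
        parents = ped.get(a)
        if parents is None:        # depth 0, so every member is a founder
            return Fraction(1, 2 ** (n - 1)) if not rest else Fraction(0)
        f, m = parents
        # One gene from a: it comes from the father or the mother.
        once = (_phi(tuple(sorted((f,) + rest)))
                + _phi(tuple(sorted((m,) + rest)))) / 2
        if n == 1:
            return once
        w = Fraction(1, 2 ** (n - 1))  # all n draws hit the same gene of a
        return w * once + (1 - w) * _phi(tuple(sorted((f, m) + rest)))

    def phi(*individuals):
        return _phi(tuple(sorted(individuals)))

    return phi


# Three full siblings x, y, z with unrelated founder parents f and m.
sibs = {"x": ("f", "m"), "y": ("f", "m"), "z": ("f", "m")}
phi = make_phi(sibs)
print(phi("x", "y"), phi("x", "x", "y"), phi("x", "y", "z"))  # 1/4 1/8 1/16
```

Exact fractions are used so that the values can be compared directly with hand calculations; the recursion is memoized but is not meant to scale to large pedigrees.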
C.2. Proof for Two Pairs of Individuals. Considering the existence of different types of 2-pair-path-pair regarding $\Phi_{ab,cd}$, there are 9 cases, which are listed as follows.
Case 4.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and zero root heter-overlap.
Case 4.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and one root heter-overlap.
Case 4.3.1: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with zero root homo-overlap and two root heter-overlap.
Case 4.3.2: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and two root heter-overlap.
Case 4.4: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with one root homo-overlap and zero root heter-overlap.
Case 4.5: $G$ has $\langle (P_{Aa}, P_{Ab}), (P_{Ac}, P_{Ad}) \rangle$ with two root homo-overlap and zero root heter-overlap.
Case 4.6: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with zero root overlap.
Case 4.7: $G$ has path-triples $\langle P_{Aa}, P_{Ab}, P_{Ac} \rangle$ with one root overlap.
Case 4.8: $G$ has path-pairs $\langle P_{Tc}, P_{Td} \rangle$ with zero root overlap.
Then we construct a dependency graph for the cases relating to $\Phi_{ab,cd}$ in Figure 24. According to the dependency graph in Figure 24, Cases 4.6, 4.7, and 4.8 are the intermediate steps, which are already proved for the computation of $\Phi_{abc}$. The correctness of the transformation from Case 4.2 to Case 4.6 can be proved based on the recursive formulas for $\Phi_{ab,cd}$ and $\Phi_{ab,ac}$. Similarly, we can obtain the transformation from Cases 4.3.1 and 4.3.2 to Case 4.7, as well as from Case 4.4 to Case 4.8.
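For reference, the two recursive formulas invoked above are usually stated as follows, following Karigl [10]. The symbols $f$ and $m$ (the father and mother of $a$) are introduced here for illustration, and $a$ is assumed not to be an ancestor of $b$, $c$, or $d$.

```latex
\[
\Phi_{ab,cd} = \tfrac{1}{2}\left(\Phi_{fb,cd} + \Phi_{mb,cd}\right),
\qquad
\Phi_{ab,ac} = \tfrac{1}{4}\left(2\,\Phi_{abc} + \Phi_{fb,mc} + \Phi_{mb,fc}\right).
\]
```

The $\Phi_{abc}$ term in the second formula arises when both genes drawn from $a$ are the same parental gene. One way to read the dependency on Cases 4.6 and 4.7 is that this is the point at which a three-individual coefficient, and hence a path-triple, enters a two-pair computation.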
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
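[Figure 24 (image not reproduced): the nodes include Cases 4.1, 4.2, 4.3.1, 4.3.2, 4.4, 4.6, 4.7, and 4.8, together with the terms $\Phi_{AA}$, $\Phi_{TT}$, $\Phi_{AAA}$, and $\Phi_{AAAA}$.]
Figure 24: Dependency graph for different cases for two pairs of individuals.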
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing to them the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institute of Health Grant GM088823.
References
[1] Surgeon General's New Family Health History Tool Is Released, Ready for "21st Century Medicine," http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., "A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia," The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., "New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate," Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l'hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, "La relation d'identité en génétique," Annales de l'Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, "Genotypic covariances between inbred relatives," Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, "Logique du calcul des coefficients d'identité entre deux individus," Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, "A recursive algorithm for the calculation of identity coefficients," Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, "Efficient evaluation of inbreeding queries on pedigree data," in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM '07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, "Efficiently calculating inbreeding on large pedigrees databases," Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, "Using compact encodings for path-based computations on pedigree graphs," in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB '11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Scalable computation of kinship and identity coefficients on large pedigrees," in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB '08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, "Efficient computation of kinship and identity coefficients on large pedigrees," Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, "Coefficients of inbreeding and relationship," The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, "Kinship and identity algorithm of coefficients of identity," Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., "Data mining applied to linkage disequilibrium mapping," The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, "Calculation of the inbreeding coefficient," Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.
Computational and Mathematical Methods in Medicine 19
Case21
Case31 ΦAAA
ΦAAA
Case41
Case42
Case34ΦAA
Case32
Case331
Case22
Case23
Case431
Case35
Case432
Case4 33
Case332
Case333
Figure 23 Dependency graph for different cases for four individuals
Case 31 119866 has path-quads⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with zero root overlapCase 32 119866 has path-quads
⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with one root 2-overlapCase 331 119866 has path-quads
⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with two root 2-overlapCase 332 119866 has path-quads
⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with one root 3-overlapCase 333 119866 has path-quads
⟨1198751198601198861 1198751198601198862 119875119860119887 119875119860119888⟩
with one root 2-overlapand one root 3-overlap
Case 34 119866 has path-triples⟨119875119860119886 119875119860119887 119875119860119888⟩
with zero root overlapCase 35 119866 has path-triples
⟨119875119860119886 119875119860119887 119875119860119888⟩
with one root overlap
lArr997904 Φ119886119886119887119888
Case 41 119866 has path-quads⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with zero root overlapCase 42 119866 has path-quads
⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with one root 2-overlapCase 431 119866 has path-quads
⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with two root 2-overlapCase 432 119866 has path-quads
⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with one root 3-overlapCase 433 119866 has path-quads
⟨119875119860119886 119875119860119887 119875119860119888 119875119860119889⟩
with one root 2-overlapand one root 3-overlap
lArr997904 Φ119886119887119888119889
(C1)Then we construct a dependency graph shown in
Figure 23 for all cases for four individualsAccording to the dependency graph in Figure 23 the
intermediate steps including Cases 34 and 35 are already
proved for the computation of Φ119886119887119888
The correctness of thetransformation fromCase 42 to Case 34 can be proved basedon the recursive formula forΦ
119886119887119888119889andΦ
119886119886119887119888 Similarly we can
obtain the transformation from Case 431 to Case 35
C2 Proof for TwoPairs of Individuals Consider the existenceof different types of 2-pair-path-pair regarding Φ
119886119887119888119889 there
are 9 cases which are listed as follows
Case 41 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with zero root homo-
overlap and zero root heter-overlap
Case 42 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with zero root homo-
overlap and one root heter-overlap
Case 431 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with zero root
homo-overlap and two root heter-overlap
Case 432 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with one root
homo-overlap and two root heter-overlap
Case 44119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with one root homo-
overlap and zero root heter-overlap
Case 45 119866 has ⟨(119875119860119886 119875119860119887) (119875119860119888 119875119860119889)⟩ with two root homo-
overlap and zero root heter-overlap
Case 46 119866 has path-triples ⟨119875119860119886 119875119860119887 119875119860119888⟩ with zero root
overlapCase 47 119866 has path-triples ⟨119875
119860119886 119875119860119887 119875119860119888⟩ with one root
overlap
Case 48 119866 has path-pairs ⟨119875119879119888 119875119879119889⟩ with zero root overlap
Then we construct a dependency graph for the casesrelating to Φ
119886119887119888119889in Figure 24
According to the dependency graph in Figure 24Cases 46 47 and 48 are the intermediate steps whichalready are proved for the computation of Φ
119886119887119888 The
correctness of the transformation from Case 42 to Case 46can be proved based on the recursive formula for Φ
119886119887119888119889and
Φ119886119887119886119888
Similarly we can obtain the transformation fromCases 431 and 432 to Case 47 as well as from Case 44 toCase 48 accordingly
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
20 Computational and Mathematical Methods in Medicine
Case41
Case44
ΦAAA
Case42 Case46
Case48
ΦAA
ΦTT
Case431 Case47
Case432
ΦAAAA
Figure 24 Dependency graph for different cases for two pairs of individuals
Acknowledgments
The authors thank Professor Robert C Elston Case Schoolof Medicine for introducing to them the identity coefficientsand referring them to the related literature [7 10 17] Thiswork is partially supported by the National Science Founda-tionGrants DBI 0743705 DBI 0849956 andCRI 0551603 andby the National Institute of Health Grant GM088823
References
[1] Surgeon Generalrsquos New Family Health History Tool Is ReleasedReady for ldquo21st Century Medicinerdquo httpcompmedcomcate-gorypeople-helping-peoplepage7
[2] M Falchi P Forabosco E Mocci et al ldquoA genomewidesearch using an original pairwise sampling approach for largegenealogies identifies a new locus for total and low-density lipo-protein cholesterol in two genetically differentiated isolates ofSardiniardquoThe American Journal of Human Genetics vol 75 no6 pp 1015ndash1031 2004
[3] M Ciullo C Bellenguez V Colonna et al ldquoNew susceptibilitylocus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolaterdquo Human Molecular Genetics vol15 no 10 pp 1735ndash1743 2006
[4] Glossary of Genetic Terms National Human Genome ResearchInstitute httpwwwgenomegovglossaryid=148
[5] CW CottermanA calculus for statistico-genetics [PhD thesis]Columbus Ohio USA Ohio State University 1940 Reprintedin P Ballonoff Ed Genetics and Social Structure DowdenHutchinson amp Ross Stroudsburg Pa USA 1974
[6] G Malecot Les mathematique de lrsquoheredite Masson ParisFrance 1948 Translated edition The Mathematics of HeredityFreeman San Francisco Calif USA 1969
[7] M Gillois ldquoLa relation drsquoidentite en genetiquerdquo Annales delrsquoInstitut Henri Poincare B vol 2 pp 1ndash94 1964
[8] D L Harris ldquoGenotypic covariances between inbred relativesrdquoGenetics vol 50 pp 1319ndash1348 1964
[9] A Jacquard ldquoLogique du calcul des coefficients drsquoidentite entredeux individualsrdquo Population vol 21 pp 751ndash776 1966
[10] G Karigl ldquoA recursive algorithm for the calculation of identitycoefficientsrdquo Annals of Human Genetics vol 45 no 3 pp 299ndash305 1981
[11] B Elliott S F Akgul S Mayes and Z M Ozsoyoglu ldquoEfficientevaluation of inbreeding queries on pedigree datardquo in Proceed-ings of the 19th International Conference on Scientific and Statis-tical Database Management (SSDBM rsquo07) July 2007
[12] B Elliott E Cheng S Mayes and Z M Ozsoyoglu ldquoEfficientlycalculating inbreeding on large pedigrees databasesrdquo Informa-tion Systems vol 34 no 6 pp 469ndash492 2009
[13] L Yang E Cheng and Z M Ozsoyoglu ldquoUsing compactencodings for path-based computations on pedigree graphsrdquo inProceedings of the ACM Conference on Bioinformatics Compu-tational Biology and Biomedicine (ACM-BCB rsquo11) pp 235ndash244August 2011
[14] E Cheng B Elliott and Z M Ozsoyoglu ldquoScalable compu-tation of kinship and identity coefficients on large pedigreesrdquoin Proceedings of the 7th Annual International Conference onComputational Systems Bioinformatics (CSB rsquo08) pp 27ndash362008
[15] E Cheng B Elliott and Z M Ozsoyoglu ldquoEfficient compu-tation of kinship and identity coefficients on large pedigreesrdquoJournal of Bioinformatics and Computational Biology (JBCB)vol 7 no 3 pp 429ndash453 2009
[16] S Wright ldquoCoefficients of inbreeding and relationshiprdquo TheAmerican Naturalist vol 56 no 645 1922
[17] R Nadot and G Vaysseix ldquoKinship and identity algorithm ofcoefficients of identityrdquo Biometrics vol 29 no 2 pp 347ndash3591973
[18] E Cheng Scalable path-based computations on pedigree data[PhD thesis] Case Western Reserve University ClevelandOhio USA 2012
[19] V Ollikainen Simulation Techniques for Disease Gene Localiza-tion in Isolated Populations [PhD thesis] University ofHelsinkiHelsinki Finland 2002
[20] H T T Toivonen P Onkamo K Vasko et al ldquoData miningapplied to linkage diseqilibrium mappingrdquoThe American Jour-nal of Human Genetics vol 67 no 1 pp 133ndash145 2000
[21] W Boucher ldquoCalculation of the inbreeding coefficientrdquo Journalof Mathematical Biology vol 26 no 1 pp 57ndash64 1988
[Figure 24 graph nodes: Case 4.1, Case 4.2, Case 4.3.1, Case 4.3.2, Case 4.4, Case 4.6, Case 4.7, Case 4.8, Φ_AA, Φ_AAA, Φ_AAAA, Φ_TT]
Figure 24: Dependency graph for different cases for two pairs of individuals.
Acknowledgments
The authors thank Professor Robert C. Elston, Case School of Medicine, for introducing them to the identity coefficients and referring them to the related literature [7, 10, 17]. This work is partially supported by the National Science Foundation Grants DBI 0743705, DBI 0849956, and CRI 0551603 and by the National Institutes of Health Grant GM088823.
References
[1] Surgeon General’s New Family Health History Tool Is Released, Ready for “21st Century Medicine,” http://compmed.com/category/people-helping-people/page/7.
[2] M. Falchi, P. Forabosco, E. Mocci et al., “A genomewide search using an original pairwise sampling approach for large genealogies identifies a new locus for total and low-density lipoprotein cholesterol in two genetically differentiated isolates of Sardinia,” The American Journal of Human Genetics, vol. 75, no. 6, pp. 1015–1031, 2004.
[3] M. Ciullo, C. Bellenguez, V. Colonna et al., “New susceptibility locus for hypertension on chromosome 8q by efficient pedigree-breaking in an Italian isolate,” Human Molecular Genetics, vol. 15, no. 10, pp. 1735–1743, 2006.
[4] Glossary of Genetic Terms, National Human Genome Research Institute, http://www.genome.gov/glossary/?id=148.
[5] C. W. Cotterman, A calculus for statistico-genetics [Ph.D. thesis], Columbus, Ohio, USA, Ohio State University, 1940, Reprinted in P. Ballonoff, Ed., Genetics and Social Structure, Dowden, Hutchinson & Ross, Stroudsburg, Pa, USA, 1974.
[6] G. Malécot, Les mathématiques de l’hérédité, Masson, Paris, France, 1948, Translated edition: The Mathematics of Heredity, Freeman, San Francisco, Calif, USA, 1969.
[7] M. Gillois, “La relation d’identité en génétique,” Annales de l’Institut Henri Poincaré B, vol. 2, pp. 1–94, 1964.
[8] D. L. Harris, “Genotypic covariances between inbred relatives,” Genetics, vol. 50, pp. 1319–1348, 1964.
[9] A. Jacquard, “Logique du calcul des coefficients d’identité entre deux individus,” Population, vol. 21, pp. 751–776, 1966.
[10] G. Karigl, “A recursive algorithm for the calculation of identity coefficients,” Annals of Human Genetics, vol. 45, no. 3, pp. 299–305, 1981.
[11] B. Elliott, S. F. Akgul, S. Mayes, and Z. M. Ozsoyoglu, “Efficient evaluation of inbreeding queries on pedigree data,” in Proceedings of the 19th International Conference on Scientific and Statistical Database Management (SSDBM ’07), July 2007.
[12] B. Elliott, E. Cheng, S. Mayes, and Z. M. Ozsoyoglu, “Efficiently calculating inbreeding on large pedigrees databases,” Information Systems, vol. 34, no. 6, pp. 469–492, 2009.
[13] L. Yang, E. Cheng, and Z. M. Ozsoyoglu, “Using compact encodings for path-based computations on pedigree graphs,” in Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB ’11), pp. 235–244, August 2011.
[14] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Scalable computation of kinship and identity coefficients on large pedigrees,” in Proceedings of the 7th Annual International Conference on Computational Systems Bioinformatics (CSB ’08), pp. 27–36, 2008.
[15] E. Cheng, B. Elliott, and Z. M. Ozsoyoglu, “Efficient computation of kinship and identity coefficients on large pedigrees,” Journal of Bioinformatics and Computational Biology (JBCB), vol. 7, no. 3, pp. 429–453, 2009.
[16] S. Wright, “Coefficients of inbreeding and relationship,” The American Naturalist, vol. 56, no. 645, 1922.
[17] R. Nadot and G. Vaysseix, “Kinship and identity algorithm of coefficients of identity,” Biometrics, vol. 29, no. 2, pp. 347–359, 1973.
[18] E. Cheng, Scalable path-based computations on pedigree data [Ph.D. thesis], Case Western Reserve University, Cleveland, Ohio, USA, 2012.
[19] V. Ollikainen, Simulation Techniques for Disease Gene Localization in Isolated Populations [Ph.D. thesis], University of Helsinki, Helsinki, Finland, 2002.
[20] H. T. T. Toivonen, P. Onkamo, K. Vasko et al., “Data mining applied to linkage disequilibrium mapping,” The American Journal of Human Genetics, vol. 67, no. 1, pp. 133–145, 2000.
[21] W. Boucher, “Calculation of the inbreeding coefficient,” Journal of Mathematical Biology, vol. 26, no. 1, pp. 57–64, 1988.