Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Relatedness and Differentiation in Arbitrary Population StructuresAlejandro Ochoa1,2 and John D. Storey1,2
1Lewis-Sigler Institute for Integrative Genomics, and 2Center for Statistics and Machine Learning, Princeton University, Princeton, NJ 08544, USA
AbstractSeveral important biomedical applications, including genome-wide associationstudies and heritability estimation for complex traits, require accurate model-ing of the covariance structure of genetic variants. This dependence structurebetween individuals is parametrized by kinship coefficients, which are definedas the probability that random alleles are “identical by descent” (IBD). The fix-ation index FST is also an IBD probability that measures the overall populationstructure. My work is focused on extending current models and estimation ap-proaches for kinship and FST to arbitrary population structures, where individu-als are not assumed to belong to subpopulations that are disjoint, homogeneous,and statistically independent, such as African-Americans and Hispanics. HereI show how my approach improves upon previous approaches in real humandatasets containing world-wide samples and in simulations where the true pa-rameters are known. My work led to a novel kinship and FST estimation frame-work with greatly improved accuracy, implemented in the R package popkinavailable on CRAN. These results have direct implications in the estimation ofheritability, association studies, and other analyses where population structureis a confounder. Tesearch supported in part by NIH grant R01 HG006448.
Model
xij ∈ {0, 1, 2}: genotype of ind. j, biallelic SNP i, counting ref alelles.pi: ancestral allele frequency. ϕjk: kinship coefficient. fj = 2ϕjj− 1:inbreeding coefficient. Genotype moments:
E[xij] = 2pi, Cov(xij, xik) = 4pi(1− pi)ϕjk.
Generalized FST for locally-outbred individuals [1]:
FST =n∑
j=1wjfj.
Standard kinship estimator
Biased when there is population structure [2, 3].
ϕ̂stdjk = 1
m
m∑i=1
(xij − 2p̂i) (xik − 2p̂i)4p̂i (1− p̂i)
, p̂i = 12n
n∑j=1
xij.
New kinship estimator: popkin
Derived for arbitrary population structures [2].
Ajk = 1m
m∑i=1
(xij − 1)(xik − 1)− 1,
Amin = minu 6=v
1|Su||Sv|
∑j∈Su
∑k∈Sv
Ajk, ϕ̂jk = 1− Ajk
Amin.
7 DrAlexOchoa� viiia.org/research/R [email protected]
Index
NA Simulations
Index
NA Real Human Data
Ind. Subpops.
Trut
h
A AdmixtureB Human OriginsC TGP HispanicsD
New
Est
imat
orE
xist
ing
Est
imat
or
−0.1
00.
10.
20.
30.
4
Kin
ship
WC New0.00
0.05
0.10
FS
T
WC New0.00
0.05
0.10
WC New
0.0
0.1
0.2
WC New0.00
0.10
0.20
FST estimator
Individuals
Figure 1: New and existing kinship and FST estimates in simula-tions and real human data. Rows: (1) true kinship matrix, (2) new kinshipestimates, (3) standard kinship estimates, and (4) comparison of Weir-Cockerhamto our new FST estimates and true FST (red dashed lines). Kinship matrix colorsare ϕjk values between individual pairs j, k, and fj along diagonal. Columns aredatasets: A. Independent subpopulations simulation (assumed by existing FST esti-mators). B. Spatial admixture simulation (demonstrates biases in existing methodsand superior performance of our new approach). C. Human Origins dataset (globalnative populations). D. Hispanic subset of 1000 Genomes Project.
Results and Conclusions
Our theory and simulations [1, 2] and real data analysis show thatexisting kinship and FST estimation approaches are biased in arbi-trary population structures, and our new approach overcomes theselimitations (Fig. 1). We estimate a complex population structure innative Human samples (Fig. 2) and a pattern of differentiation consis-tent with the Out-of-Africa model (Fig. 3). Our approach estimatesa higher FST ≈ 26% than existing approaches (≈ 10% or less) whichassume independent subpopulations (Fig. 4).
References
[1] Alejandro Ochoa and John D. Storey. “FST and kinship for arbitrary population structures I: Generalized defi-nitions”. Submitted, preprint at http://biorxiv.org/content/early/2016/10/27/083915. 2016.
[2] Alejandro Ochoa and John D. Storey. “FST and kinship for arbitrary population structures II: Method of momentsestimators”. Submitted, preprint at http://biorxiv.org/content/early/2016/10/27/083923. 2016.
[3] Bruce S. Weir and Jérôme Goudet. “A Unified Characterization of Population Structure and Relatedness”.Genetics (2017), genetics.116.198424.
Ju_h
oan_
Sou
thJu
_hoa
n_N
orth
Taa_
Wes
tTa
a_E
ast
Taa_
Nor
thN
aro
Gui
Hoa
nX
uun
Gan
aT
shw
aK
hom
ani
Nam
aH
aiom
Kga
laga
diS
hua
Khw
eM
buti
Bia
kaB
antu
SA
Tsw
ana
Dam
ara
Him
baW
ambo
Yoru
baE
san
Men
deB
antu
Ken
yaLu
hya
Man
denk
aG
ambi
an Luo
Din
kaK
ikuy
uH
adza
San
daw
eM
asai
Dat
ogS
omal
iO
rom
oJe
w_E
thio
pian
Sah
araw
iM
oroc
can
Moz
abite
Alg
eria
nTu
nisi
anLi
byan
Egy
ptia
nYe
men
iJe
w_L
ibya
nJe
w_T
unis
ian
Jew
_Mor
occa
nJe
w_Y
emen
iteS
audi
Bed
ouin
BB
edou
inA
Pal
estin
ian
Jord
ania
nS
yria
nLe
bane
se_M
uslim
Leba
nese
_Chr
istia
nJe
w_T
urki
shA
ssyr
ian
Dru
zeLe
bane
seJe
w_i
raqi
Iran
ian_
Ban
dari
Iran
ian
Jew
_Ira
nian
Turk
ish
Jew
_Ash
kena
ziC
yprio
tM
alte
seC
anar
y_Is
land
erIta
lian_
Sou
thS
icili
anR
oman
ian
Fre
nchN
Spa
nish
SW
Gre
ekIta
lian_
Nor
thS
pani
shN
EF
renc
hSS
ardi
nian
Bas
que
Orc
adia
nE
nglis
hIc
elan
dic
Nor
weg
ian
Iris
hIr
ish_
Uls
ter
Sco
ttish
She
tland
icG
erm
anS
orb
Pol
ish
Cze
chC
roat
ian
Hun
garia
nA
lban
ian
Bul
garia
nM
ordo
vian
Fin
nish
Rus
sian
Est
onia
nU
krai
nian
Bel
arus
ian
Lith
uani
anA
rmen
ian
Jew
_Geo
rgia
nG
eorg
ian
Abk
hasi
anA
dyge
iN
orth
_Oss
etia
nC
hech
enLe
zgin
Kum
ykB
alka
rN
ogai
Mak
rani
Bra
hui
Bal
ochi
Jew
_Coc
hin
Tajik
Kal
ash
Pat
han
Sin
dhi
Bur
usho
Bra
hmin
_Tiw
ari
Guj
arat
iP
unja
biV
ishw
abra
hmin
Lodh
iM
ala
Ben
gali
Kha
riaO
nge
Chu
vash
Turk
men
Uzb
ekH
azar
aU
ygur
Tuba
lar
Man
siS
elku
pK
yrgy
zA
ltaia
nTu
vini
anN
gana
san
Dol
gan
Yaku
tX
ibo
Hez
hen
Oro
qen
Ulc
hiE
ven
Yuka
gir
Itelm
enK
orya
kC
hukc
hiE
skim
oK
usun
daK
alm
ykB
urm
ese
Cam
bodi
anT
hai
TuM
ongo
laD
aur
Kin
hV
ietn
ames
eD
aiLa
huN
axi
Yi
Tujia
Han
Mia
oS
heJa
pane
seK
orea
nA
mi
Ata
yal
Kan
kana
eyM
urut
Dus
unIlo
cano
Taga
log
Vis
ayan
Baj
oM
alay
Lebb
oC
hipe
wya
nP
ima
Mix
eM
ixte
cZ
apot
ecM
ayan
Kaq
chik
elC
abec
arP
iapo
coK
ariti
ana
Sur
uiB
oliv
ian
Aym
ara
Que
chua
Gua
rani
Mus
sau
Sap
osa
Buk
aTe
opT
igak
Aus
tral
ian
Kov
eM
elam
ela
Nak
anai
_Bile
kiM
angs
eng
Man
usN
asoi
Nai
likN
otsi
SW
_Bou
gain
ville
Kuo
t_K
abil
Kuo
t_La
mal
aua
Lavo
ngai
Mad
akTo
lai
Men
gen
Mam
usi
Mam
usi_
Pal
eabu
Nak
anai
_Los
oA
taS
ulka
Kol
_New
_Brit
ain
Bai
ning
_Mal
asai
tB
aini
ng_M
arab
uP
apua
n
Ju_hoan_SouthJu_hoan_North
Taa_WestTaa_East
Taa_NorthNaro
GuiHoanXuunGana
TshwaKhomani
NamaHaiom
KgalagadiShuaKhweMbutiBiaka
BantuSATswanaDamara
HimbaWamboYoruba
EsanMende
BantuKenyaLuhya
MandenkaGambian
LuoDinka
KikuyuHadza
SandaweMasaiDatog
SomaliOromo
Jew_EthiopianSaharawi
MoroccanMozabiteAlgerianTunisian
LibyanEgyptian
YemeniJew_Libyan
Jew_TunisianJew_MoroccanJew_Yemenite
SaudiBedouinBBedouinA
PalestinianJordanian
SyrianLebanese_Muslim
Lebanese_ChristianJew_Turkish
AssyrianDruze
LebaneseJew_iraqi
Iranian_BandariIranian
Jew_IranianTurkish
Jew_AshkenaziCypriotMaltese
Canary_IslanderItalian_South
SicilianRomanian
FrenchNSpanishSW
GreekItalian_North
SpanishNEFrenchS
SardinianBasque
OrcadianEnglish
IcelandicNorwegian
IrishIrish_Ulster
ScottishShetlandic
GermanSorb
PolishCzech
CroatianHungarian
AlbanianBulgarian
MordovianFinnish
RussianEstonian
UkrainianBelarusianLithuanianArmenian
Jew_GeorgianGeorgian
AbkhasianAdygei
North_OssetianChechen
LezginKumykBalkarNogai
MakraniBrahui
BalochiJew_Cochin
TajikKalashPathanSindhi
BurushoBrahmin_Tiwari
GujaratiPunjabi
VishwabrahminLodhiMala
BengaliKhariaOnge
ChuvashTurkmen
UzbekHazaraUygur
TubalarMansi
SelkupKyrgyzAltaian
TuvinianNganasan
DolganYakutXibo
HezhenOroqen
UlchiEven
YukagirItelmenKoryak
ChukchiEskimo
KusundaKalmyk
BurmeseCambodian
ThaiTu
MongolaDaurKinh
VietnameseDai
LahuNaxi
YiTujiaHan
MiaoShe
JapaneseKorean
AmiAtayal
KankanaeyMurut
DusunIlocanoTagalogVisayan
BajoMalayLebbo
ChipewyanPimaMixe
MixtecZapotec
MayanKaqchikel
CabecarPiapoco
KaritianaSurui
BolivianAymara
QuechuaGuaraniMussauSaposa
BukaTeop
TigakAustralian
KoveMelamela
Nakanai_BilekiMangseng
ManusNasoiNailikNotsi
SW_BougainvilleKuot_Kabil
Kuot_LamalauaLavongai
MadakTolai
MengenMamusi
Mamusi_PaleabuNakanai_Loso
AtaSulka
Kol_New_BritainBaining_MalasaitBaining_Marabu
Papuan
SAfrica MAfrica NAfrica MiddleEast Europe Caucasus SAsia NAsia EAsia Americas Oceania
SA
fric
aM
Afr
ica
NA
fric
aM
iddl
eEas
tE
urop
eC
auca
sus
SA
sia
NA
sia
EA
sia
Am
eric
asO
cean
ia
00.
10.
20.
3
Kin
ship
Indi
vidu
als
Figure 2: Population-wide kinship estimates in Human Origins.Our new estimates reveal substantial kinship between subpopulations and hetero-geneity within subpopulations.
●●
●●
●
●
●●
●
●
●●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●●
●●
●
●
●
●●
●●
●●● ●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●● ●
●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●●
●
●
● ●
● ● ●
●●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●●
●
●
●●
●●
●● ●
●●
● ●
●
●
●
●
●
●
●
●
●
●
0.2
0.3
0.4
Inbr
eedi
ng c
oeff.
Figure 3: Geography of population-level inbreeding. Mean fj in-creases with distance from Africa, as expected under the Out-of-Africa serial foundermodel.
0.0 0.1 0.2 0.3 0.4 0.5
020
40
Inbreeding Coefficient
Den
sity
SAfricaMAfricaNAfricaMiddleEastEuropeCaucasusSAsiaNAsiaEAsiaAmericasOceania
Subpopulations
SAfricaMAfricaNAfricaMiddleEastEuropeCaucasusSAsiaNAsiaEAsiaAmericasOceania
FST estimates
Weir−CockerhamHudsonKNew
Figure 4: Inbreeding and FST estimates in Human Origins. Weir-Cockerham and HudsonK assumed the K = 11 subpopulations are independent,which causes downward bias.