The update of the phylogenetic structure of Q1b haplogroup based on full Y-chromosome sequencing

    The Russian Journal of Genetic Genealogy, Volume 6, 1, 2014. ISSN: 1920-2997. http://ru.rjgg.org

    ___________________________________________________________

    Received: May 10 2014; accepted: May 12 2014; published: May 20 2014. Correspondence: [email protected] [email protected]

    The update of the phylogenetic structure of Q1b haplogroup based on full Y-chromosome sequencing

    Vladimir Gurianov, Roman Sychyev, Vladimir Tagankin, Vadim Urasin

    1 Independent Researcher, Russia; 2 YFull Research Group, Russia.

    Abstract

    New full Y-chromosome sequencing data made it possible to update the structure of the Q1b (Q-L275) haplogroup and to identify new subclades: Q-Y2990 (downstream of Q-Y2250), Q-Y2225 (downstream of Q-Y2220) and Q-Y3030 (downstream of Q-Y2200). This provides a basis for further research into the inner structure of these subclades and for comparing their present ethnic and population composition with the migrations of the Indo-European tribes.

    Introduction

    In the short period since the publication of the article by V. Gurianov et al. (2013) [1], several full Y-chromosome sequencing samples belonging to the Q1b (Q-L275) haplogroup and its downstream subclades have become publicly available.

    The analysis of the new data made it possible to update the phylogenetic structure of the Q1b haplogroup, to identify new subclades, to perform a more in-depth typing of a number of scientific samples, and to propose a plausible hypothesis on the prehistoric migration routes of the Indo-European tribes.

    [1] V. Gurianov et al. (2013) Phylogenetic Structure of Q-M378 Subclade Based on Full Y-Chromosome Sequencing. The Russian Journal of Genetic Genealogy, Volume 5, 1, 56-74.

    Source Data and Methodology

    Data Sets for Comparison

    The data on the examined samples are summarised in the table below:

    Table 1. Information on the researched samples of full Y-chromosome sequencing.

    Sample code   Population   Verified origin    Source of the information
    HGDP00100     Hazara       Pakistan           Lippold et al. (2014) [2]
    HGDP00129     Hazara       Pakistan           Lippold et al. (2014)
    HGDP00165     Sindhi       Pakistan           Lippold et al. (2014)
    PGP193        N/A          N/A [3]            The Personal Genome Project [4]
    Eu1           Italians     Sicilia, Italy     Provided by a volunteer [5]
    Eu2           Portugal     Azores, Portugal   Provided by a volunteer [5]

    [2] Sebastian Lippold et al. (2014) Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences, doi: 10.1101/001792
    [3] Current location California, USA.
    [4] http://www.personalgenomes.org/
    [5] The test was performed by Full Genomes Corporation (FGC) at the Beijing Genomics Institute on an Illumina HiSeq 2000 sequencer, and is characterized by the following parameters: coverage 50 at a read length of 100 base pairs.

    Genotyping

    Data sets in BAM format (BAM/SAM Specification [6]) and, in the case of PGP193, in TSV [7] format were used for the research. The parameters of the next-generation sequencing (NGS) of the Eu1 and Eu2 samples, performed by Full Genomes Corporation at the Beijing Genomics Institute, are the same as those described in the article by V. Gurianov et al. (2013).

    Data Processing and Analysis

    Processing and analysis of the full Y-chromosome sequencing data were carried out using the software developed by the YFull research group [8] and VCFtools [9].

    Each sample was analysed both for SNPs discovered during the research and for SNPs included in the ISOGG list under the Q1b haplogroup and its downstream subclades.

    A SNP was considered newly discovered if the mutation was present in more than two samples from unrelated males and if the new SNP was consistent with the previously known phylogenetic structure of the respective subclade.
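
    The acceptance criterion for a new SNP can be sketched in code. The following is a minimal illustration only, not the YFull software: the function names, the `family_of` mapping, the threshold default and the sample-to-clade assignments are assumptions made for the example, and "consistency with the known phylogeny" is interpreted here as the usual requirement that the set of carriers be nested in, contain, or be disjoint from every known clade.

```python
def unrelated_count(carriers, family_of):
    """Count carriers, treating known relatives as a single observation.

    family_of maps a sample to a (hypothetical) family label; samples that
    are not listed are assumed to be unrelated to everyone else.
    """
    return len({family_of.get(sample, sample) for sample in carriers})


def is_tree_compatible(carriers, known_clades):
    """True if the carrier set does not conflict with any known clade."""
    carriers = set(carriers)
    for members in known_clades.values():
        members = set(members)
        nested = carriers <= members or members <= carriers
        if not (nested or carriers.isdisjoint(members)):
            return False
    return True


def accept_new_snp(carriers, family_of, known_clades, min_unrelated=3):
    """Apply both conditions; min_unrelated=3 reads 'more than two' literally."""
    return (unrelated_count(carriers, family_of) >= min_unrelated
            and is_tree_compatible(carriers, known_clades))


# Illustrative clade memberships only (not taken from the YFull database).
known_clades = {
    "Q-L245": {"AJ1", "AJ2", "Ar1", "PGP193", "Eu1"},
    "Q-Y2250": {"Kz1", "Eu2"},
}
family_of = {}  # no known relatives in this toy example

print(accept_new_snp({"AJ1", "PGP193", "Eu1"}, family_of, known_clades))  # True
print(accept_new_snp({"AJ1", "Kz1", "Eu2"}, family_of, known_clades))     # False
```

    The second call is rejected because its carriers straddle two known branches, which is the kind of conflict with the existing tree that the criterion is meant to exclude.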

    [6] The specification in force is located here: https://github.com/samtools/hts-specs
    [7] TSV (Tab Separated Values): a text format for presenting table values.
    [8] http://www.yfull.com/
    [9] http://sourceforge.net/projects/vcftools/

    The research also refined the phylogenetic position of SNPs previously described in the article by V. Gurianov et al. (2013).

    Results

    Eu1 Data Analysis

    The findings for the Eu1 sample were promptly submitted to ISOGG and, as of the date of this article, have already been included in the current version of the ISOGG SNP Tree. Nevertheless, we consider it necessary to describe in detail the SNPs revealed and the resulting changes in the structure of the subclades downstream of Q-Y2220.

    The Q-Y2225 level, with SNPs shared by AJ1, PGP193 and Eu1, was defined by comparing the Eu1 sample with the samples in the YFull database.

    The SNPs typical of this level are listed in the table below.

    Table 2. SNPs of the Q-Y2225 level.

    Position (hg19)   Ancestral value   Value positive for SNP   SNP name
    23646920          C                 T                        Y2196
    22471554          A                 T                        Y2201
    19425984          G                 A                        Y2206
    19053060          C                 T                        Y2207
    18207170          A                 G                        Y2208
    18046486          T                 C                        Y2210
    18043999          G                 A                        Y2211
    15834557          G                 A                        Y2213
    15658212          C                 T                        Y2214
    14385853          T                 G                        Y2215
    9892635           C                 T                        Y2219
    8662585           C                 A                        Y2224
    6949449           C                 T                        Y2225

    Consequently, the Q-Y2200 subclade, which can now be said to correspond more accurately to the Jewish cluster of the Q-L245 branch, is currently defined by the following SNPs of a single level:

    Table 3. SNPs of the Q-Y2200 level.

    Position (hg19)   Ancestral value   Value positive for SNP   SNP name (Y)
    22953894          A                 G                        Y2197
    22825080          A                 G                        Y2198
    22588598          C                 T                        Y2200
    21277083          G                 A                        Y2203
    16994660          T                 A                        Y2212
    14353022          A                 C                        Y2216
    14184253          C                 A                        Y2118
    9401947           C                 A                        Y2221
    4606181           C                 T                        Y2231
    3995524           G                 A                        Y2232
    3148720           A                 G                        Y2233

    Since, according to the phylogenetic structure built on STR markers, the Eu1 sample is located in the centre of the European cluster of Q-L245 (and shows the typical value DYF395S1=15-17, which is ancestral), we may reasonably assume that many of its private SNPs will form branches of the tree once close samples become available for comparison. For this reason the private SNPs of the Eu1 sample are listed separately in Schedule 1.

    Eu2 Data Analysis

    The Eu2 sample was originally known to be positive for the private SNP L327. Comparison of this sample with the others stored in the YFull database defined a new branch, Q-Y2990, downstream of Q-Y2250 and parallel to the Iranian branch Q-L301.

    Table 4. SNPs of the Q-Y2990 branch.

    Position (hg19)   Ancestral value   Value positive for SNP   SNP name (Y)
    7929100           A                 C                        Y2986
    5398133           A                 T                        Y2987
    15540398          G                 A                        Y2988
    15656595          A                 C                        Y2989
    17455705          C                 G                        Y2990
    18205189          C                 A                        Y2991
    18427622          C                 T                        Y2992
    21794826          T                 C                        Y2993
    21824228          C                 T                        Y2994
    22779292          G                 A                        Y2995
    23574588          G                 T                        Y2996
    6675390           A                 G                        Y2997

    This branch currently includes two samples: Eu2 and Kz1.

    It is worth noting that on the phylogenetic tree built from the values of 67 STR markers the Eu2 haplotype is located near the root of the tree, which made it possible to calculate the time when Q-M378 began to split actively into two subclades: Q-L245 and Q-Y2250. Two calculations, made with the MURKA software [10] and with the method of random pairs of STR haplotypes [11], put this time at 5000 years ago.
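
    The general idea of dating a split from pairs of STR haplotypes can be illustrated with a generic average squared distance (ASD) estimate. This is a hedged sketch under a symmetric stepwise mutation model, not the MURKA software and not necessarily the exact procedure of Adamov et al. (2011): for two haplotypes separated by t generations from their common ancestor, the expected squared difference per marker is 2 * mu * t. The function name, the mutation rate, the generation length and the haplotype values below are hypothetical.

```python
from itertools import product


def split_age_years(branch_a, branch_b, mu, years_per_generation):
    """Estimate the age of the split between two branches from STR data.

    branch_a, branch_b: lists of haplotypes (equal-length tuples of repeat
    counts) taken from either side of the split.
    mu: assumed mutation rate per marker per generation.
    Every cross-branch pair coalesces at the split, so the mean squared
    difference per marker estimates 2 * mu * t.
    """
    total = 0.0
    count = 0
    for hap_a, hap_b in product(branch_a, branch_b):
        for x, y in zip(hap_a, hap_b):
            total += (x - y) ** 2
            count += 1
    if count == 0:
        raise ValueError("both branches must contain at least one haplotype")
    asd = total / count
    generations = asd / (2.0 * mu)
    return generations * years_per_generation


# Hypothetical 4-marker haplotypes, one from each side of the split.
branch_a = [(13, 24, 14, 10)]
branch_b = [(13, 25, 14, 10)]

# mu = 0.0025 and 30 years per generation are illustrative assumptions.
print(round(split_age_years(branch_a, branch_b, 0.0025, 30)))  # 1500
```

    In practice the result depends strongly on the assumed mutation rate and on the choice of markers, which is one reason to cross-check it with an independent method.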

    [10] MURKA: http://sourceforge.net/projects/phylomurka/
    [11] Adamov et al. (2011) TMRCA assessment through the method of random pairs of STR haplotypes: http://rjgg.org/index.php/RJGGRE/article/view/83/102, http://www.semargl.me/ru/dna/ydna/tools/asd-pairs/

    Data Analysis of Samples under Human Genome Diversity Project (HGDP) from Pakistan

    The above-cited paper by Lippold et al. (2014) describes three samples from two Pakistani ethnic groups: Hazaras and Sindhis.

    Sample HGDP00129, from a Hazara from Northern Pakistan, can be assigned to the Q-L245 level on the basis of the following SNPs:

    Table 5. SNPs showing that HGDP00129 sample belongs to Q-L245 level.

    Position (hg19)   Ancestral value   Value positive for SNP   SNP name (Y)   SNP name (FGC)
    9382621           G                 T                        Y2222          FGC1902
    17860015          G                 T                        Y2139          FGC1849
    23733052          A                 G                        Y2148          FGC1879

    ______________________________________

    For the avoidance of doubt, we note that no connection between the Hazaras and the population of the Khazar Khaganate is supported by any source known to us. The Hazaras (from Persian hezār, "thousand") are Shia Muslims of Mongol or Iranian origin who speak an Iranian language and live in central Afghanistan (8-10% of the total population of the country). They speak the Hazara dialect or a dialect of the Dari language; some of them speak Mongolian. The historical area of Hazara settlement in Afghanistan is the Hazarajat region, which is divided among several provinces of present-day Afghanistan. Sengupta et al. (2006) [12] report the following Y-chromosome haplogroup structure for the Hazaras: C-M217 40% (10/25), R1b-M73 32% (8/25), O-M122 8%. Q1b-M378 is reported in that study in only one person. A detailed analysis of the currently known Hazara haplotypes was presented by Sabitov in his article on the origin of the Hazaras from the point of view of DNA genealogy [13].

    [12] Sanghamitra Sengupta et al., Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists, Am J Hum Genet. 2006 February; 78(2): 202-221. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1380230/
    [13] Sabitov Zhaksalyk, Origin of the Hazara from the point of DNA genealogy, The Russian Journal of Genetic Genealogy, Volume 2, 1, 2010, page 38. rjgg.org/index.php/RJGGRE/article/download/42/53

    SNPs Y2139 and Y2148 were found to be positive in samples PGP130 and PGP193, as well as in the tested samples of the Q-L245 level (AJ1, AJ2, Ar1). However, they are negative in all Q-L275 (xL245) samples, namely Ir1, Kz1, Eu2, HG03914, HG03652 and HG03864. The situation is similar for SNP Y2222 (except that it could not be called for PGP130). Therefore, the tested sample HGDP00129 very likely belongs to the para-subclade Q-L245*. Unfortunately, the quality of sequencing does not allow the position of the sample on the phylogenetic tree to be defined more accurately. The same applies to the two other HGDP samples.

    Sample HGDP00165, from a Sindhi from Southern Pakistan, can be assigned to the Q-Y2250 level on the basis of the following SNPs:

    Table 6. SNPs showing that HGDP00165 sample belongs to Q-Y2250.

    Position (hg19)   Ancestral value   Value positive for SNP   SNP name (Y)   SNP name (FGC)
    6894323           C                 T                        Y2245          PR683
    24452225          G                 C                        Y2270          FGC4676

    SNPs Y2056, Y2091 and F1349 are shared by HGDP00129 and HGDP00165 but are negative in HGDP00100, which shows that both samples belong to the subclade Q-M378. All SNPs of Q-Y2990 turned out to be negative in sample HGDP00165; the latter therefore belongs to the para-subclade Q-Y2250 (xQ-Y2990).

    Sample HGDP00100, from a Hazara from Northern Pakistan, can be clearly assigned to the Q-L275 branch, both from the above information on its positive calls for previously identified SNPs of the same level as L275 and on the basis of the following:

    1) All three samples are positive for the SNPs L314, F1169, F1337 and F1528, which are of the same level as L275.

    2) SNP F753 (hg19 3714320) is positive in all three tested samples and in all Q-L275 samples included in the YFull database; the same holds for F1205 (hg19 8440399).

    Similarly, the SNPs Y1189, Y1209, Y1218, Y1232, Y1263, L68/S329/PF3781 (hg19 18700150) and YP505 (hg19 6388256) are positive in all Q-L275 Y1150+ samples and in HGDP00100.

    The above-cited paper by Lippold et al. (2014) includes a phylogenetic scheme of the Q haplogroup in which the relative positions of samples HGDP00100, HGDP00129 and HGDP00165 confirm our conclusions. However, no in-depth SNP analysis of specific branches was made there; samples HGDP00129 and HGDP00165 were shown on the scheme at a single level.

    We were thus able to refine the phylogenetic position of the three HGDP samples and to assign them to the following branches:

    HGDP00100   Q-L275 -> Q-Y1150 (presumably)
    HGDP00129   Q-L275 -> Q-M378 -> Q-L245 (presumably)
    HGDP00165   Q-L275 -> Q-M378 -> Q-Y2250 (presumably)
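
    The placement logic used for the HGDP samples can be sketched as a walk down the tree. This is an illustrative simplification, not the YFull pipeline: the tree fragment and SNP names come from this article, but the subsets of defining SNPs, the call dictionary and the rule "at least one positive call and no negative call" are assumptions made for the example.

```python
# Fragment of the Q1b tree discussed in the text (parent -> children).
CHILDREN = {
    "Q-L275": ["Q-M378", "Q-Y1150"],
    "Q-M378": ["Q-L245", "Q-Y2250"],
    "Q-Y2250": ["Q-Y2990"],
}

# Subsets of branch-defining SNPs, chosen for illustration.
DEFINING_SNPS = {
    "Q-M378": {"Y2056", "Y2091", "F1349"},
    "Q-Y1150": {"Y1189", "Y1209", "Y1218"},
    "Q-L245": {"Y2222", "Y2139", "Y2148"},
    "Q-Y2250": {"Y2245", "Y2270"},
    "Q-Y2990": {"Y2986", "Y2990", "Y2997"},
}


def branch_supported(calls, branch):
    """A branch is supported by >= 1 positive call and no negative call.

    calls maps SNP name -> "+" or "-"; SNPs that were not read are absent.
    """
    states = [calls[snp] for snp in DEFINING_SNPS[branch] if snp in calls]
    return "+" in states and "-" not in states


def place_sample(calls, root="Q-L275"):
    """Follow supported branches from the root; stop when ambiguous."""
    path = [root]
    node = root
    while True:
        supported = [c for c in CHILDREN.get(node, [])
                     if branch_supported(calls, c)]
        if len(supported) != 1:
            return path
        node = supported[0]
        path.append(node)


# Hypothetical calls loosely modelled on the description of HGDP00165.
calls = {"Y2056": "+", "Y2245": "+", "Y2270": "+",
         "Y2139": "-", "Y1189": "-", "Y2990": "-"}
print(" -> ".join(place_sample(calls)))  # Q-L275 -> Q-M378 -> Q-Y2250
```

    With low-coverage data many defining SNPs are simply not read, so the walk stops early; this is the situation the word "presumably" reflects in the assignments above.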

    The high level of genetic diversity within a single population and geographic region indicates that the territory of present-day Pakistan and Afghanistan played a key role in the spread of the Q-L275 haplogroup in the past.

    We note here that an original presence in Central Asia of a population belonging to this haplogroup (a pre-Indo-European substrate) looks more probable than its arrival in the region together with the Indo-Europeans. However, the mixing of the indigenous population with one originating from the north may have resulted in a new community in which the L275 haplogroup was a minor one; its further spread was connected with the migrations of the Indo-Europeans to India and Western Asia.

    Since the presence of people belonging to the Q-L275 haplogroup in Central Asia by the close of the 1st millennium B.C. is confirmed by paleoDNA research, the territory of present-day Pakistan and Afghanistan is considered to be a transit zone containing the main migration routes of the Indo-European tribes (which also included representatives of the Q-L275 haplogroup) to Hindustan through the Hindu Kush (Q-Y1150), as well as towards Western Asia (Q-Y2250 and Q-L245). PaleoDNA research performed by Chinese scientists on finds from archaeological excavations in Central Asia demonstrates the presence of representatives of the Q haplogroup in these lands: 6 Q1a and 4 Q1b were found in the Black Gouliang barrow to the east of the Barkol Basin, at the ruins of Hami (Kumul) [14]. From the location of the bodies in the barrow it may be concluded that the representatives of the Q1b haplogroup were of a higher social status.

    The Hami oasis was located on the Great Silk Road, near Turfan and Khotan (Yarkend). The barrow is dated to the Early (Western) Han (2nd-1st centuries B.C.).

    Part of the contemporary Uyghur population may be direct descendants of Q1b carriers who settled in ancient Central Asia. The studies of Hua Zhong et al. (2010) [15] and Wenjuan Shan et al. (2014) [16] show the presence of the Q1b haplogroup only among the people of Xinjiang. Unfortunately, only 17-marker haplotypes are at our disposal, which prevents us from drawing any definite conclusions.

    PGP193 Data Analysis

    We assigned sample PGP193 to the Jewish cluster of the Q-L245 branch (Y2225+ Y2200+). All SNPs of these branches (except for several [17]) were found to be positive for PGP193.

    [14] Li Hongjie, Y chromosome genetic diversity of ancient population in the Northern China, Jilin University, 2012. http://cdmd.cnki.com.cn/Article/CDMD-10183-1012365432.htm
    [15] Zhong et al., Extended Y-chromosome investigation suggests post-Glacial migrations of modern humans into East Asia via the northern route // Molecular Biology and Evolution, First published online: September 13, 2010, doi: 10.1093/molbev/msq247 (among four populations of Uighurs from Xinjiang one such person was found in each of two populations: 1 out of 71, 1 out of 18).
    [16] Wenjuan Shan et al. (2014) Genetic polymorphism of 17 Y chromosomal STRs in Kazakh and Uighur populations from Xinjiang, China. http://link.springer.com/article/10.1007/s00414-013-0948-y

    PGP193 -> Q-L275 -> Q-M378 -> Q-L245 -> Q-Y2220 -> Q-Y2225 -> Q-Y2200

    [17] Y2114 not read (level Q-Y2225); Y2232, Y2233 and Y2212 (level Q-Y2200).

    Unfortunately, this sample is anonymous, and it is impossible to confirm that the tested person is of Jewish origin.

    Nevertheless, the sample was compared against the private SNPs of the AJ1 and AJ2 samples described in the article by V. Gurianov et al. (2013). The results are summarized in the table below:

    Table 7. SNPs of the Q-Y3030 branch.

    Position (hg19)   Ancestral value   Value positive for SNP   SNP name (Y)          SNP name (FGC)
    6985833           G                 C                        Y2746 aka YFS028180   FGC4836
    7116693           C                 G                        Y3026 aka YFS028187   FGC4837
    14683323          G                 A                        Y3027 aka YFS028303
    17842405          G                 A                        Y3028 aka YFS028379   FGC4845
    18697269          A                 G                        Y2750 aka YFS028399   FGC4846
    22545510          G                 T                        Y3029 aka YFS028485   FGC4850
    22989959          T                 C                        Y3030 aka YFS028498   FGC4853
    23338485          T                 C                        Y2751 aka YFS028509   FGC4854

    PGP193 -> Q-L275 -> Q-M378 -> Q-L245 -> Q-Y2220 -> Q-Y2225 -> Q-Y2200 -> Q-Y3030

    It is currently difficult to say how the new SNP structure of the Q-L245 subclade overlaps with the previously published phylogenetic structures based on 67 Y-chromosome STR markers, not least because STR data for PGP193 are not available. Moreover, sample AJ1 has DYF395S1=15-19, which is typical of the majority of Q1b Ashkenazi Jews, whereas AJ2 has a unique DYF395S1=15-15 (apparently a consequence of RecLOH).

    Final Conclusions

    This research resulted in an update of the structure of the Q1b (Q-L275) haplogroup and in the identification of new subclades: Q-Y2990 (downstream of Q-Y2250), Q-Y2225 (downstream of Q-Y2220) and Q-Y3030 (downstream of Q-Y2200).

    It provides a basis for further research into the inner structure of these subclades and for comparing their present ethnic and population composition with the migrations of the Indo-European tribes that contributed to the formation of these ethnic groups.

    The updated phylogenetic structure of the Q1b haplogroup is shown in the following scheme.

    SNP Phylogenetic Tree of Q1b Haplogroup.

    The changes made to the SNP scheme of the Q1b haplogroup compared to the one published by V. Gurianov et al. (2013) are listed in Schedule 2.

    Acknowledgements

    The authors wish to thank the following people, who assisted in conducting the research and in preparing the article:

    Alessandro Biondo (Italy)
    Leon Kull (Israel)
    Justin Allen Loe (USA)
    Linda Magellan (USA)
    Olga Vasilyeva (United Kingdom)

    Schedule 1. Private SNPs for Sample Eu1.

    Position (hg19)   Ancestral value   Value positive for SNP   SNP name (YFull internal notation)
    3131205           T                 C                        YFS068595
    3232026           C                 T                        YFS068596
    3232027           A                 G                        YFS068597
    3403647           C                 A                        YFS068599
    3704060           A                 G                        YFS068605
    6702576           C                 G                        YFS068611
    6881382           C                 T                        YFS068612
    7139179           A                 T                        YFS068614
    7222827           C                 T                        YFS068615
    8356720           T                 A                        YFS068618
    8467849           G                 A                        YFS068619
    8592711           T                 C                        YFS068620
    8990561           G                 C                        YFS068621
    9415377           T                 G                        YFS068622
    13828699          C                 T                        YFS068627
    14545910          T                 C                        YFS068629
    15269498          T                 C                        YFS068630
    15455814          T                 C                        YFS068631
    16255444          C                 T                        YFS068634
    17402893          A                 T                        YFS068639
    17722084          G                 T                        YFS068640
    18148788          G                 A                        YFS068641
    18158679          T                 C                        YFS068642
    18394566          C                 G                        YFS068643
    19060348          C                 T                        YFS068644
    19130251          A                 G                        YFS068645
    19130253          T                 C                        YFS068646
    19166462          C                 T                        YFS068647
    21329851          C                 G                        YFS068654
    21555930          T                 C                        YFS068655
    22519498          G                 A                        YFS068659
    23064750          C                 T                        YFS068660
    24365889          A                 G                        YFS068663

    Schedule 2. Changes made to SNP scheme of the Q1b haplogroup. SNPs under research.

    SNP       Belonging to subclade   Notes
    CTS4507   Q-Y2250                 Reverse SNP of P paragroup level (under research). Updated in terms of specifying the reverse character of the mutation
    L68       Q-Y1150                 Added (Y-DNA Haplotree, FTDNA 2014)
    F753      Q-L275                  Added
    F1205     Q-L275                  Added
    Y1193                             Excluded from Q-Y1150 level (under research)
    F2250     Q-L275                  Added (Y-DNA Haplotree, FTDNA 2014)
    Y1200     Q-L275                  Revised: transfer from Q-Y1150 level
    Y1220     Q-Y1150                 Added (under research)
    Y1228                             Excluded from Q-Y1150 level (under research)
    Y2118     Q-L245                  Misprint correction: the position confirmed (see Y2218)
    Y2218     Q-Y2200                 Misprint correction: added in lieu of Y2118
    YP505     Q-Y1150                 hg19: 6388256 (->T)
    Z5901                             Excluded from Q-Y1150 level (under research)
    _______________
    Note: the SNPs under research are not included in the SNP scheme of the Q1b haplogroup until their positions are clearly identified.