A Posteriori Tests of Phylogeographic Hypotheses Jeet Sukumaran and Allen Rodrigo Duke University

A Priori and A Posteriori Phylogeography Hypotheses

Previous discussions of a priori and a posteriori phylogeographic hypotheses include:
- The dichotomy between vicariant and dispersalist hypotheses of biogeography.
- The use of simulations of gene trees under explicit models (e.g., the single-species or multi-species coalescent) that incorporate a priori phylogeographic hypotheses.
- Approaches that use the observed data to drive the discovery of phylogeographic hypotheses (e.g., inference keys in NCPA).

We focus on a very particular class of a posteriori inference: in those instances where we observe an apparently meaningful biogeographic pattern, we perform a test of that pattern.

A Priori and A Posteriori Phylogeography Hypotheses

Great Basin Collared Lizard (McGuire et al., Evolution 61:2879)

The Randomization Test

1. Permute the localities of the samples.
2. Calculate a test statistic (e.g., Maddison and Slatkin's test, which counts the number of migration events).
3. Calculate the α = 0.05 critical value from the distribution of test statistics.
4. Compare it with the observed value of the test statistic.

Distribution of Test Statistic

An alternative

An Example

Imagine a sample of individuals drawn from a population. We can imagine several different hypothetical spatial partitions of the phylogeny of individuals from this population.

Problems with A Posteriori Hypothesis Testing

We focus on a hypothesis that was obtained after seeing the results. However, we do not take account of alternative apparently meaningful patterns that may be obtained by chance. Each of these patterns would cause us to perform a hypothesis test of that particular pattern.

Correct Procedure

We define post-hoc testing of phylogeographic hypotheses as a test carried out after observing an "interesting" tree, i.e., a genealogy in which a clade-based partition corresponds (loosely) to a spatial partition.
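The four-step randomization test for a single, prespecified pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree is a hypothetical nested tuple of tip names, the tip names and the two localities are invented, and the statistic is a Fitch parsimony count of locality changes, used here as a stand-in for the migration-count statistic. Rejecting when the permutation p-value is at most α is equivalent to comparing the observed value with the α critical value of the null distribution.

```python
import random

def fitch_changes(tree, locality):
    """Return (state set, number of locality changes) for a binary tree.

    `tree` is a nested tuple of tip names (an assumed toy representation);
    `locality` maps each tip name to its locality label.
    """
    if not isinstance(tree, tuple):              # a tip
        return {locality[tree]}, 0
    left_states, left_changes = fitch_changes(tree[0], locality)
    right_states, right_changes = fitch_changes(tree[1], locality)
    changes = left_changes + right_changes
    shared = left_states & right_states
    if shared:                                   # children agree: no new change
        return shared, changes
    return left_states | right_states, changes + 1

def randomization_test(tree, locality, n_perm=999, seed=0):
    """Permute localities over the tips of a fixed tree; lower-tail p-value."""
    rng = random.Random(seed)
    tips = list(locality)
    observed = fitch_changes(tree, locality)[1]
    hits = 0
    for _ in range(n_perm):
        labels = [locality[t] for t in tips]
        rng.shuffle(labels)                      # step 1: permute the localities
        permuted = dict(zip(tips, labels))
        # steps 2-4: few changes means strong structure, so count replicates
        # that are at least as structured as the observed tree
        if fitch_changes(tree, permuted)[1] <= observed:
            hits += 1
    return observed, (1 + hits) / (n_perm + 1)

# Hypothetical example: two clades, each sampled from a single locality.
tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))
tip_names = ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]
locality = {t: ("east" if t.startswith("a") else "west") for t in tip_names}

observed, p_value = randomization_test(tree, locality)
print(observed, p_value)
```

In this toy tree only 2 of the 70 possible four-versus-four labelings produce a single change, so the estimated p-value should fall near 0.03. That figure is valid only if the east/west pattern was specified before looking at the tree.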
A classical hypothesis test sets the probability of rejecting a true null hypothesis at α. The null hypotheses of a priori and a posteriori tests differ:
- A priori H0: The particular interesting pattern does not exist.
- A posteriori H0: No interesting pattern exists (i.e., there is no geographic structure).

We modify the randomization procedure to allow for any possible interesting pattern.

Correct Procedure

There are two ways to perform the randomization test:
- Assume a fixed phylogeny, and permute the spatial labels.
- Assume fixed spatial labels, and simulate phylogenies.

For either of these, we can construct a test statistic, e.g., the Fisher's exact test statistic or Maddison and Slatkin's s (monophyletic partition discordance).

1. Identify all possible interesting patterns.
2. For each randomized sample, select the smallest value of the statistic amongst all interesting patterns, and construct the null distribution using this statistic.

Calculating the True Null Distribution

Replicate          f(H_1)               f(H_2)               ...  f(H_k)               f(H_1 ... H_k)
...                ...                  ...                  ...  ...                  ...
Uncorrected crit.  CDF_H1(u) = 0.05     CDF_H2(u) = 0.05     ...  CDF_Hk(u) = 0.05
Corrected crit.    CDF_H1(v) = 0.05/k   CDF_H2(v) = 0.05/k   ...  CDF_Hk(v) = 0.05/k

Results

- Uncorrected critical values have a Type I error rate close to 0.05 when only one partition is tested. As the number of partitions increases, though, the probability of finding an interesting pattern by chance increases.
- Similar results were obtained for both test statistics used (Fisher's exact test and Maddison and Slatkin's s).
- Similar results were obtained when the taxon localities were allowed to vary randomly as well as the tree.

Applying the Bonferroni Correction

Thus for any a posteriori test, where the hypothesis or hypotheses being tested have not been declared a priori, we can use a corrected value of α/k, where α is the desired true error rate and k is the number of possible partition patterns in the data.
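The modified procedure and the Bonferroni alternative can be compared side by side in a short sketch. Everything specific here is an assumption made for illustration: the fixed-phylogeny variant is used, the tree is a hypothetical nested tuple, the four sampling sites A to D and the three candidate ways of grouping them into two regions are invented, and a Fitch parsimony count of region changes stands in for the test statistic. For each permuted replicate the statistic is computed for all k candidate patterns and the smallest value is kept, which gives the null distribution for the a posteriori test.

```python
import random

def fitch_changes(tree, state):
    """Return (state set, parsimony change count) for a nested-tuple binary tree."""
    if not isinstance(tree, tuple):              # a tip
        return {state[tree]}, 0
    (ls, ln), (rs, rn) = fitch_changes(tree[0], state), fitch_changes(tree[1], state)
    shared = ls & rs
    return (shared, ln + rn) if shared else (ls | rs, ln + rn + 1)

def pattern_stats(tree, site_of, partitions):
    """Statistic f(H_j) for every candidate partition of sites into regions."""
    return [
        fitch_changes(tree, {tip: part[site] for tip, site in site_of.items()})[1]
        for part in partitions
    ]

def a_posteriori_test(tree, site_of, partitions, n_perm=999, alpha=0.05, seed=0):
    rng = random.Random(seed)
    tips = list(site_of)
    k = len(partitions)
    observed = pattern_stats(tree, site_of, partitions)
    null = []
    for _ in range(n_perm):
        sites = [site_of[t] for t in tips]
        rng.shuffle(sites)                       # fixed tree, permuted spatial labels
        null.append(pattern_stats(tree, dict(zip(tips, sites)), partitions))
    # Min-statistic test: compare the best observed pattern with the best
    # pattern found in each randomized replicate.
    obs_min = min(observed)
    p_min = (1 + sum(min(rep) <= obs_min for rep in null)) / (n_perm + 1)
    # Bonferroni alternative: per-pattern p-values judged against alpha / k.
    p_each = [
        (1 + sum(rep[j] <= observed[j] for rep in null)) / (n_perm + 1)
        for j in range(k)
    ]
    bonferroni_reject = [p <= alpha / k for p in p_each]
    return obs_min, p_min, p_each, bonferroni_reject

# Hypothetical data: eight tips, two per site, and k = 3 candidate patterns.
tree = ((("t1", "t2"), ("t3", "t4")), (("t5", "t6"), ("t7", "t8")))
site_of = {"t1": "A", "t2": "A", "t3": "B", "t4": "B",
           "t5": "C", "t6": "C", "t7": "D", "t8": "D"}
partitions = [
    {"A": 0, "B": 0, "C": 1, "D": 1},            # AB | CD
    {"A": 0, "C": 0, "B": 1, "D": 1},            # AC | BD
    {"A": 0, "D": 0, "B": 1, "C": 1},            # AD | BC
]

obs_min, p_min, p_each, bonferroni_reject = a_posteriori_test(tree, site_of, partitions)
print(obs_min, p_min, p_each, bonferroni_reject)
```

In this toy setup the AB | CD pattern looks striking on its own (its uncorrected p-value should be near 2/70, about 0.03), but any of the three patterns could have caught the eye, so the min-statistic p-value should be roughly three times larger (near 6/70, about 0.09). The Bonferroni threshold α/k = 0.05/3 is a simpler approximation to the same adjustment.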
Conclusions

- A posteriori phylogeographic hypothesis tests are carried out after the inferred phylogeny hints at a possible spatial partitioning of the data corresponding to genetic structure.
- These tests are prone to inflated Type I error rates because there are potentially many random patterns, any one of which would cause us to conduct a test of the observed pattern only.
- This can be corrected by applying a Bonferroni-style correction that adjusts for the number of possible spatial partitions in the data under the null hypothesis.

How do we know how many possible spatial patterns there are?
