FORDISC AND THE DETERMINATION OF ANCESTRY FROM CRANIOMETRIC DATA

By

Marina Elliott

B.A., The University of British Columbia, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF ARTS

In

THE DEPARTMENT OF ARCHAEOLOGY

© Marina Elliott, 2008

SIMON FRASER UNIVERSITY

Summer 2008

All rights reserved. This work may not be reproduced in whole or in part, by photocopy

or other means, without permission of the author.

APPROVAL

Name: Marina Elliott

Degree: Master of Arts

Title of Thesis: FORDISC and the determination of ancestry from craniometric data

Examining Committee:

Chair: Catherine D'Andrea, Graduate Program Chair

Mark Collard, Senior Supervisor, Associate Professor, Archaeology

Mark Skinner, Supervisor, Professor, Archaeology

Brian Chisholm, Internal Examiner, Senior Instructor, University of British Columbia

Date Defended/Approved:

SIMON FRASER UNIVERSITY LIBRARY

Declaration of Partial Copyright Licence

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection (currently available to the public at the "Institutional Repository" link of the SFU Library website <www.lib.sfu.ca> at: <http://ir.lib.sfu.ca/handle/1892/112>) and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.

It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.

Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

While licensing SFU to permit the above uses, the author retains copyright in the thesis, project or extended essays, including the right to change the work for subsequent purposes, including editing and publishing the work in whole or in part, and licensing other parties, as the author may desire.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

Simon Fraser University Library
Burnaby, BC, Canada

Revised: Fall 2007

Abstract

FORDISC is a computer program designed to determine ancestry from human

skeletal remains. It is widely used, yet its accuracy has been challenged. In this

study, 200 specimens from one of FORDISC's reference samples are used to

investigate four issues that are central to the debate: (1) the inclusion of the source

population in the reference sample, (2) the influence of sex, (3) the impact of

variable number, and (4) the effect of different anatomical regions.

The results indicate that the source population must be present and the sex of

the specimen known before FORDISC can provide an accurate determination of

ancestry. Additionally, a determination will be successful only if more than 10

measurements pertaining to multiple anatomical regions are used. Even when

these conditions are met, few determinations may be considered unambiguously

correct. Overall, FORDISC performed below expectations and the results

suggest that the program should be used cautiously.

Keywords: FORDISC; ancestry determination; cranial morphology; forensic identification; discriminant function analysis

Subject Terms: biological anthropology; forensics; craniometry; skull; human variation

Acknowledgements

This research could not have happened without the encouragement and

assistance of many people. In particular, I would like to thank my supervisor, Dr.

Mark Collard for his generous advice, support and patience throughout this

process. In addition to all of his other duties and responsibilities, he always

seemed to have time for my questions and concerns. I would also like to thank

my committee members, Dr. Mark Skinner and Dr. Brian Chisholm, both of whom

took precious time out of their summer schedules to read and provide feedback

on this research.

In addition, I am extremely fortunate to have an excellent group of colleagues,

friends and family members. I am especially grateful to Alan Cross, Mana

Dembo, Kevan Edinborough, Luseadra McKerracher and the other members of

the Laboratory of Biological Anthropology whose intelligence, curiosity and

enthusiasm for their research inspired my own efforts. Many thanks also go to my

friends and family for providing valuable comments, welcome distractions and

incalculable kindnesses along the way. Although no words can truly express how

lucky I am to have them, thanks also go to my parents - their example gives me

something to strive for.

Finally, I would like to thank my husband, Robin Elliott. His writing and editing

contributions were invaluable, as were his computer skills when things went

awry. More importantly, his love, support, encouragement and apparently

endless tolerance of my interests (academic and otherwise) are a constant

source of wonder and admiration to me. I hope I have made him proud.

Table of Contents

Approval
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures

1. Introduction
1.1. Aims and objectives
1.2. FORDISC and its applications
1.3. The FORDISC debate
1.4. Issues investigated
1.5. Outline of analyses

2. Materials and Methods
2.1. Data
2.2. Analyses

3. Results
3.1. Impact of including source population and specifying sex
3.1.1. Number of correct assignments accepting all posterior and typicality probabilities
3.1.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability
3.1.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability
3.1.4. Summary
3.2. Impact of variable number
3.2.1. Number of correct assignments accepting all posterior and typicality probabilities
3.2.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability
3.2.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability
3.2.4. Variable number and population differences
3.2.4.1. Number of correct assignments accepting all posterior and typicality probabilities
3.2.4.2. Number of correct classifications using >0.5 posterior probability and >0.01 typicality probability
3.2.4.3. Number of correct classifications using >0.8 posterior probability and >0.01 typicality probability
3.2.5. Summary
3.3. Impact of cranial region
3.3.1. Number of correct assignments accepting all posterior and typicality probabilities
3.3.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability
3.3.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability
3.3.4. Cranial region and population differences
3.3.4.1. Number of correct assignments accepting all posterior and typicality probabilities
3.3.4.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability
3.3.5. Summary

4. Discussion
4.1. Main findings
4.2. Implications for use of FORDISC
4.3. Future considerations

5. Conclusions

References
Appendix I
Appendix II
Appendix III

List of Tables

Table 1. Total number of test specimens correctly classified (n=200)

Table 2. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200)

Table 4. Total number of test specimens correctly classified by variable number (n=200)

Table 6. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Table 7. Results by population accepting all posterior and typicality probabilities (n=40)

Table 8. Results by population using >0.5 posterior probability and >0.01 typicality probability (n=40)

Table 9. Results by population using >0.8 posterior probability and >0.01 typicality probability (n=40)

Table 10. Total number of test specimens correctly classified (n=200)

Table 11. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality criteria (n=200)

Table 13. Total results for each cranial region by population (n=40)

Table 14. Results for each cranial region using >0.5 posterior probability and >0.01 typicality probability (n=40)

Table 15: Range of posterior and typicality probabilities for correct and incorrect assignments by population

List of Figures

Figure 1: Genetic tree for 26 European populations

Figure 2: Genetic tree for 33 African populations

Figure 3: Genetic tree for 21 Asian populations

Figure 4: Genetic tree for 23 American populations

1. Introduction

1.1. Aims and objectives

Determining ancestry from skeletonized human remains is an important task for

bioarchaeologists and forensic anthropologists. As part of a biological profile, this

information is used in a wide range of contexts, including the study of the

movements and interactions of past populations, ancestral land claims,

repatriation requests and the investigation of unlawful deaths and human rights

violations (Cox et al. 2006).

Despite attempts to use other skeletal elements (e.g. Marino, 1997; Ballard,

1999; Holliday and Falsetti, 1999; Patriquin et al. 2002), the skull continues to be

regarded as the most reliable area for determining ancestry (Bass 1995). As a

result, both non-metric and metric techniques have been developed to effect

ancestry determinations from the skull. The use of discrete traits, such as the

presence or absence of shoveled incisors, is common. However, non-metric

characteristics are not exhaustive or always consistently defined and few

standards exist for their collection (Buikstra and Ubelaker 1994). Additionally,

non-metric methods have been challenged for being more susceptible to

inter-observer error (Corruccini 1974).

Due to their perceived objectivity and accuracy, metric assessments of the skull

have achieved wide acceptance for assessing ancestry from skeletal remains

(e.g. Giles and Elliot 1962). Furthermore, the development of statistical methods

and computer technologies to manipulate large datasets has contributed to the

widespread use of craniometric methods. In particular, user-friendly computer

software applications designed to make ancestry determinations quickly and

easily have become popular.

Currently, FORDISC (Jantz and Ousley 2005) is the leading computer program

for ancestry determination. Although it is widely used, its application to questions

of ancestry is not unproblematic and its accuracy and reliability have been

questioned (Fukuzawa and Maish 1997; Kosiba 2000; Belcher et al. 2002;

Leathers et al. 2002; Ubelaker et al. 2002; Williams et al. 2005; Hubbe and

Neves 2007). In response, FORDISC's developers argue that the program's

apparent failures are due to inappropriate use of the program and/or

interpretation of results (Freid et al. 2005). In particular, they warn against testing

individuals whose populations are not represented in the database. They also

claim that using too many variables reduces success.

Given the importance of determining ancestry from skeletonized remains and the

confidence placed in FORDISC, there was a pressing need to resolve the issues

that have been raised regarding its accuracy. Accordingly, this study focused on

several key areas of debate. In particular, it evaluated the effect of including or

excluding the source population from the reference sample on FORDISC's

accuracy. It also examined how the number of variables affects FORDISC's success

rate. To test the impact of constraining sex, test specimens were compared to

reference groups of both sexes and to same-sex groups alone. The effect of

using specific cranial regions on FORDISC's ability to determine ancestry was

also tested using datasets that isolated the basicranium, neurocranium and face.

1.2. FORDISC and its applications

Developed by Richard Jantz and Steve Ousley in association with the University

of Tennessee, FORDISC (short for Forensic Discriminants) was designed to

provide rapid and accurate ancestry determinations for crania of unknown origin

through Discriminant Function Analysis (DFA) of skull measurements. It also

offers ancestry and stature estimations from postcranial measurements. The

program was first commercially released in 1992. A second version followed in

1996. The current version, FORDISC 3.0, was released in 2005.

Before the program became publicly available, Jantz and Ousley provided

custom discriminants for individual specimens by request (Jantz and Ousley

2005). These ancestry determinations were made through comparisons with the

Forensic Anthropology Data Bank (FDB), a repository of U.S. forensic cases from

the 19th and 20th centuries. By the time FORDISC 1.0 was released the program

incorporated a much larger database of craniometric measurements collected by

W.W. Howells (Howells 1973; 1989). Howells' dataset includes values for 70

measurements recorded on more than 2500 crania from 29 populations. The

populations come from Africa, Europe, Asia, Australia/Pacific Islands and the

Americas, and range in time from 600 B.C. to the mid-20th century. Incorporating

Howells' dataset significantly broadened the geographic and temporal range of

FORDISC's comparative sample. Since the release of the first version of the

program, Jantz and Ousley have augmented the Forensic Data Bank with data

from new U.S. cases and added a sample of males taken from modern forensic

cases in Guatemala.

The first two versions of FORDISC offered ancestry determinations through DFA

of up to 21 cranial measurements. In the current version, users may select up to

82 measurements when using Howells' data or 42 when using the Forensic Data

Bank. However, Jantz and Ousley (2005) note that because some

measurements were not taken on some individuals, sample sizes are limited by

the measurements selected.

From its inception, FORDISC has been a popular tool among bioarchaeologists.

Shortly after the release of version 1.0, Mangold et al. (1993) used FORDISC to

perform a two group DFA of 21 cranial measurements to corroborate a qualitative

trait-based analysis of the sex of a set of pre-contact Native American skeletal

remains. In this study, Mangold et al. concluded that the results "strongly aligned

the specimen with Amerindian females rather than males" (p. 2). In 2001,

Williams et al. (2001) used FORDISC 2.0 to explore the ancestry of several

individuals buried in a German settler's graveyard in Halifax, Nova Scotia. The

FORDISC results led Williams et al. to conclude that the remains were

"non-European" and to involve the local Mi'kmaq chief in the investigation. FORDISC

2.0 was also used to assess 80 historical crania in the collections of the Institute

of Forensic Medicine in Copenhagen (Sejrsen et al. 2005). Although many of the

crania were marked only with "a general geographic or racial descriptor" (p. 40),

the authors of the study claimed to confirm ethnicity in 70% of the cases.

FORDISC has also been used to analyze more ancient human remains. Lovvorn

et al. (1999) used FORDISC 2.0 to compare a male burial specimen from Sidney,

Nebraska with males from Howells' database. Using only six measurements

because the midface and orbits were missing, FORDISC 2.0 selected "Eskimo"

as the most likely population, followed by "Ainu" (Japan). Based on these results,

the authors concluded that the specimen possessed a "blend of Amerindian and

earlier protomongoloid traits" (p. 527) and that this was consistent with the

"hypothesis that Plains Amerindians descended from the earliest wave of

Paleoindians who crossed the Bering Straits" (p. 527). In another study,

Pleistocene remains from the site of Zhoukoudian (UC101, UC102 and UC103)

were compared to a reference sample that combined Howells' data with data

from additional Amerindian groups (Cunningham and Wescott 2002). The

authors concluded that their results supported the contention that the remains

"do not represent a family but are relatively contemporaneous" (p. 636).

In addition to being used in historical research, FORDISC is used regularly to

assist with identifications in forensic cases. For example, in 2000, an Arlington

Cemetery press report described FORDISC as "a key piece of software" used by

the U.S. Army Central Identification Laboratory in Hawaii for "automating the

process of matching skeletal remains" (ANC 2007).

FORDISC has become sufficiently popular that Jantz and Ousley now run

workshops focusing on the program during the American Academy of Forensic

Sciences annual meetings (Anthropolog 2005). Designed to help anthropologists,

archaeologists and forensic professionals carry out and interpret the results of

FORDISC analyses, these workshops cover a variety of topics such as statistical

parameters, the estimation of ancestry from postcranial material, "problem"

crania, and secular change (Jantz and Ousley 2007).

1.3. The FORDISC debate

Despite its popularity, the utility of FORDISC for ancestry determination has been

challenged. In 1997, Fukuzawa and Maish (1997) tested FORDISC 2.0 with 59

crania from two known Ontario Iroquoian sites. Using both complete and partial

crania, the authors compared Iroquoian individuals with seven populations from

Howells' dataset and found FORDISC to be an unreliable identifier of ancestry.

Similarly, Kosiba (2000) tested a series of East Indian crania and found that

FORDISC 2.0 was unable to consistently classify the sample.

In 2002, two studies used ancient Nubian crania to test FORDISC 2.0. In the first,

Belcher et al. (2002) analyzed 47 Meroitic Nubians and found little consistency in

either biological affinity or sex attribution. The authors concluded that the

program was flawed and challenged "the utility of any forensic application that

attempts to constrain worldwide human cranial variability" (p. 42). In the second

study, Leathers et al. (2002) tested FORDISC 2.0 with a collection of

post-Meroitic Nubian crania using 12 cranial measurements. Only 57% of the 89

specimens were classified as African and the research team concluded that

FORDISC 2.0's classifications were "not morphologically or biologically accurate"

(p. 99).

A third evaluation of FORDISC 2.0 was published in 2002 (Ubelaker et al. 2002).

This study tested the program with a medieval Spanish sample. The authors

reasoned that, if the program was accurate, the test specimens should be

classified as one of the European populations in the reference sample. The study

achieved "a variety of results" (p. 3). Using the Forensic Data Bank, FORDISC

2.0 classified 44% of the test sample as white, 35% as black, 9% as Hispanic,

4% as American Indian and 3% each as Chinese or Vietnamese. Using Howells'

database, the 95 test individuals were classified into 21 different groups. 25%

were classified as Egyptian, followed by Austrian with 11%. The remaining

specimens were scattered across 19 different populations ranging from Andaman

Islanders (7%) to Zulu (2%). Despite these diverse results, the authors concluded

that FORDISC was still a "useful forensic tool" (p. 4).

In 2005, another study using Nubian crania was published by the same authors

as the 2002 Meroitic paper (Williams et al. 2005). This study used 42 test

specimens instead of the 47 used previously, and 12 variables based on their

availability and diagnostic value. The authors reasoned that, if FORDISC was

accurate, it would group the Nubians together and classify them as Howells'

26th-30th dynasty Egyptians since the two groups were geographic neighbours.

According to Williams et al. (2005), FORDISC "failed both tests" (p. 345).

FORDISC's developers did not respond to the various criticisms of the program

until the Williams et al. (2005) study was published. At that point, they suggested

that the "disputed results" were due to the use of "inappropriate reference

samples" (Freid et al. 2005: 103). Citing the limitations of Discriminant Function

Analysis, Jantz and Ousley (2005: 16) noted that any function "will classify an

unknown ... regardless of its actual ethnic group" and cautioned against testing

"an individual whose race or ethnic group is not represented in the reference

samples".

Jantz and Ousley (2005) also suggested that the critics had failed to properly

evaluate the posterior and typicality probabilities provided by FORDISC. These

probabilities are mathematical calculations used to evaluate the likelihood of

group membership (Pietrusewsky 2000). Posterior probabilities are a relative

measure of membership and sum to 1, while typicality probabilities assess

"whether the unknown individual could belong to any of the groups" based on the

absolute distances to each group (Albanese and Saunders 2006: 287). Jantz and

Ousley (2005) recommended that a population attribution be accepted only if the

posterior probabilities were 0.5 or more and the typicality probabilities higher than 0.01.¹
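
To make the distinction between the two probabilities concrete, the following Python sketch computes both from squared Mahalanobis distances and applies the 0.5 posterior and 0.01 typicality thresholds discussed in this section. It is an illustration under stated assumptions and not FORDISC's own implementation: prior probabilities are assumed equal, typicality is assumed to be a chi-square tail probability (computed here with a closed form that holds only for an even number of variables), and the function names, group labels and distances are hypothetical.

```python
import math


def posterior_probabilities(d2_by_group):
    """Relative membership probabilities, assuming equal priors.

    d2_by_group maps a group label to the squared Mahalanobis distance
    between the specimen and that group's centroid.
    """
    # Shift by the smallest distance so the exponentials stay well scaled.
    d2_min = min(d2_by_group.values())
    weights = {g: math.exp(-0.5 * (d2 - d2_min)) for g, d2 in d2_by_group.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def chi2_typicality(d2, n_vars):
    """Upper-tail chi-square probability of d2 with n_vars degrees of freedom.

    Assumption: typicality is approximated by a chi-square tail. The closed
    form below is exact only when n_vars is even.
    """
    if n_vars <= 0 or n_vars % 2 != 0:
        raise ValueError("this sketch handles only an even, positive number of variables")
    half = d2 / 2.0
    term, series = 1.0, 1.0
    for k in range(1, n_vars // 2):
        term *= half / k
        series += term
    return math.exp(-half) * series


def assess(d2_by_group, n_vars, min_posterior=0.5, min_typicality=0.01):
    """Return (best group, its posterior, its typicality, accepted?)."""
    posteriors = posterior_probabilities(d2_by_group)
    best = max(posteriors, key=posteriors.get)
    typicality = chi2_typicality(d2_by_group[best], n_vars)
    accepted = posteriors[best] >= min_posterior and typicality > min_typicality
    return best, posteriors[best], typicality, accepted


if __name__ == "__main__":
    # Hypothetical distances: a specimen close to group A ...
    print(assess({"A": 8.0, "B": 12.0, "C": 30.0}, n_vars=10))
    # ... and a specimen far from every group in the analysis.
    print(assess({"A": 40.0, "B": 60.0}, n_vars=10))
```

In the second call the posterior probability for group A is close to 1 because posteriors are relative and must sum to 1, yet the typicality probability falls far below 0.01. This is the situation the developers describe when a specimen's source population is absent from the reference sample.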

The debate was not settled at this point, however. In 2007 Current Anthropology

published a discussion of FORDISC 2.0. In a reassessment of the Williams et al.

(2005) paper, Hubbe and Neves (2007) suggested that the study was flawed

because it had used only 12 variables, a number they considered to be "far from

enough to classify a skull on the basis of discriminant functions" (p. 285). In

response, Williams and Armelagos (2007) pointed out that FORDISC tutorials

frequently use 12 or fewer variables and that "there is no stipulated number of

variables ... simply because forensic evidence is often fragmentary" (p. 286).

They also pointed out that more variables would not necessarily improve success

if the measurements are collinear or non-diagnostic.

Williams and Armelagos (2007) also criticized Hubbe and Neves (2007) for using

Howells' data as both test and control. They suggested that an 'independent'

sample - one whose individuals were not specifically included in the database -

should have been used and cited Naar et al. (2006) as an example of such a

¹ In the Freid et al. (2005) paper, Jantz and Ousley recommend accepting a determination only if the typicality probability is 0.1 or more. However, this appears to be a typographical error. The FORDISC 3.0 manual suggests that typicality probabilities are "interpretively similar to the univariate p value based on the normal distribution" (Jantz and Ousley, 2005: 48) and that "TPs below 0.05 (5%), or certainly 0.01 (1%) for a group ... indicate questionable membership ... or measurement error" (Jantz and Ousley, 2005: 46). These comments lead me to believe that 0.01 is the acceptable typicality probability value rather than 0.1.

study. However, Naar et al. (2006) also used Howells' data, specifically the 111

crania that make up the entire Egyptian sample. FORDISC only placed 55

(49.5%) of the sample back into the Egyptian group at the appropriate statistical

level. While the use of individuals from within the database may not provide an

independent test of the program, doing so should result in high levels of success

because the individuals already exist in the reference sample. A failure on

FORDISC's part to attribute members of its core sample appropriately would

suggest a significant problem with the program.

In the next issue of Current Anthropology, Williams, Belcher and Armelagos

(2007) replied to another critique of the 2005 study. In this discussion, Keita

(2007) suggested that Williams et al. (2005) had overemphasized the role of

non-genetic factors in cranial development and noted that a "demonstration of

similarity using multivariate analyses does not always mean identity, close recent

common origin, or even origin in an adjacent region" (p. 425). Williams and

Armelagos (2007) responded by saying they had been criticized "for a paper that

we did not write" (p. 426) and that Keita had misunderstood their intent in

highlighting the conditions of growth. They stressed that their previous study had

been undertaken to demonstrate the "lack of fit between conceptual models...

and actual patterns of human biological variation" (p. 426) and maintained the

position that FORDISC is both functionally and conceptually flawed due to the

complexity of this variation.

Most recently, Campbell and Armelagos (2007) used a new individual scores

option in FORDISC to test samples taken from within both the W.W. Howells and

Forensic Databank reference groups. In this study, FORDISC was able to

correctly classify 73.1% of Howells' individuals and 72.0% of the FDB individuals

using the Freid et al. (2005) probability levels when the sex was unspecified.

When sex was constrained, the results improved to 80.7% and 78.6%

respectively. Although Armelagos had previously contributed to almost every

study that challenged FORDISC and been a vocal opponent of the program

(Belcher et al. 2002; Leathers et al. 2002; Williams et al. 2005; Naar et al. 2006),

Campbell and Armelagos (2007) did not suggest that the program was flawed.

Instead, they concluded that the results achieved by FORDISC were

"approaching the limit of craniometric analysis to assign group membership" (p.

84).

Last, Jantz and Ousley have suggested that secular change may be responsible

for FORDISC's inconsistent performance in some cases. In particular, they

suggest that Americans (both "White" and "Black") have changed significantly

over the past 150 years in "response to unparalleled environmental change"

(Wescott and Jantz 2005: 242). As a result, they recommend that the Forensic

Data Bank should only be used "on individuals born in the 20th century" while

Howells' data "may be more appropriate for older specimens" (Jantz and Ousley

2005: 17). Certainly, secular changes have been well documented (Boas 1911;

Angel 1976; Smith et al. 1986; Jantz and Meadows-Jantz 2000). However, the

extent to which they complicate ancestry determination is not well understood.

In sum, there are a number of unresolved issues with respect to FORDISC. In

particular, the significance of testing individuals whose populations are not

represented in the reference sample has still not been determined. There are

also inconsistencies with respect to how specifying sex affects FORDISC's

accuracy. The guidelines for determining which variables are the most effective

and how many to use are also unclear. Lastly, the recommendations for

accepting an attribution based on the posterior and typicality probabilities differ in

the FORDISC literature. While the manual still recommends using a posterior

probability of 0.5, the FORDISC 3.0 workshops run by Jantz and Ousley now

suggest that "posterior probabilities <0.8 have a higher probability of being

incorrect than correct" (Jantz and Ousley 2007: 33). Since FORDISC continues

to be used regularly in biological anthropology and forensic settings, the study

reported here was undertaken to contribute to the resolution of these important

questions.

1.4. Issues investigated

The first issue addressed in this study is the impact of the presence or absence

of a specimen's source population in FORDISC's reference sample. As

mentioned above, a number of researchers have sought to test FORDISC by

analyzing specimens of known origin. FORDISC's developers have rejected most

of these tests on the grounds that the test specimens' source populations were

not included in FORDISC's reference sample. It is true that DFA "require[s] in

principle that unknowns belong to one of the groups in the analysis from which

the functions were derived" (Keita 2007: 425). However, biodistance research is

based on a close relationship between morphology and ancestry. As Roseman

(2004: 12824) notes, biodistance studies assume that "populations that share

recent common ancestry and or exchange a large number of migrants should

resemble one another more than geographically isolated and distantly related

populations". Thus, if there were no relationship between craniometrics and

ancestry, Jantz and Ousley could not continue to claim that FORDISC will

classify individual crania "into the group with which they have the closest affinity"

(Spradley et al. 2008). Furthermore, in our increasingly mobile society "a

representative of almost any population in the world could end up being a

forensic case in almost any place in the world" (Ubelaker et al. 2002: 2).

Consequently, a program that requires an investigator to determine which

populations are represented before running an analysis may have very limited

application for real-world investigations.

The second issue investigated in the study is the effect of specifying the sex of a

target specimen versus leaving its sex unspecified. Several studies found

differences in affinity attribution when the sex was altered (Belcher et al. 2002;

Williams et al. 2005; Campbell and Armelagos 2007). By comparing a test

specimen to both males and females, these studies expected FORDISC to

correctly identify both population and sex on the basis that male and female

skulls of a given population are more similar to each other than either is to

another population (Williams et al. 2005). With this in mind, this study tested

whether the population attribution changed when the sex was unspecified versus

when it was restricted to the sex provided by Howells (sex specified).

The third issue addressed in this study is the impact of the number of variables on

FORDISC's accuracy. While Jantz and Ousley (2005: 44) admit that "good

separation and classification of many groups requires many variables" they also

argue that "using too many variables produces overfitting and unreliable apparent

accuracy". Similarly, Williams and Armelagos (2007: 286) suggest that using

"additional variables that are collinear or that are not diagnostic may reduce the

efficacy of classification." In contrast, Hubbe and Neves (2007: 285) found that

"the number of variables used rather than the anatomical region measured" was

the most critical factor affecting FORDISC's discriminant ability. Although there is

little consensus as to what constitutes a "sufficiently" large number of variables in

a multivariate analysis (Pietrusewsky 2000), Jantz and Ousley (2005: 49)

suggest that a "reasonable recommended maximum number of variables seems

to be the minimum sample size among all groups divided by three". This is based

on Huberty's (1994) results. Although they suggest fewer variables may be

effective, as a minimum, Jantz and Ousley recommend "10 variables for reliable

comparisons" (2005: 49).
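
The two recommendations quoted above can be combined into a simple check. The sketch below is illustrative only: the function names, the rounding down to a whole number of variables, and the reference group sizes are assumptions made for this example and are not taken from FORDISC.

```python
MIN_RELIABLE_VARIABLES = 10  # minimum quoted from Jantz and Ousley (2005: 49)


def recommended_max_variables(group_sizes):
    """Smallest reference-group sample size divided by three.

    Rounding down to a whole number is an assumption of this sketch.
    """
    if not group_sizes:
        raise ValueError("at least one reference group is required")
    return min(group_sizes.values()) // 3


def variable_count_ok(n_vars, group_sizes):
    """True if n_vars meets the quoted minimum and does not exceed the quoted maximum."""
    return MIN_RELIABLE_VARIABLES <= n_vars <= recommended_max_variables(group_sizes)


if __name__ == "__main__":
    # Hypothetical reference group sizes.
    sizes = {"Group A": 90, "Group B": 60, "Group C": 27}
    print(recommended_max_variables(sizes))
    print(variable_count_ok(12, {"Group A": 90, "Group B": 60}))
    print(variable_count_ok(10, sizes))
```

With a reference group as small as 27 individuals, the recommended maximum falls below the recommended minimum of 10, so no number of variables satisfies both guidelines at once. Small reference groups therefore constrain the analysis more than the headline figure of 10 variables suggests.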

A fourth factor that may be contributing to FORDISC's inconsistent performance

relates to the anatomical region analyzed. While most researchers recognize that

all morphology is the result of combined genetic, developmental, and

environmental factors, cranial morphology has been considered a reasonable

proxy for geographic origin. This is particularly true of the facial region, with the

midface and nose considered the most diagnostic of ancestry (Brues 1990; Gill

and Gilbert 1990). However, many studies have shown the face to be particularly

susceptible to external stresses related to diet, conditions of growth, cultural

practices and/or climatic adaptations (Coon et al. 1950; Hiernaux 1963; Hughes

1968; Hylander 1977; Carey and Steegmann 1981; Franciscus and Long 1991;

Skelton and McHenry 1992; Lieberman et al. 2004; Roseman 2004; Roseman

and Weaver 2004; Nicholson and Harvati 2006). As a result, facial anatomy may

not preserve population history adequately. Instead, the basicranium has been

put forward as a better indicator of ancestry because it is more phylogenetically

stable (Olson 1981; Wood and Lieberman 2001; Harvati and Weaver 2006b).

And although it may be subject to climatic influences as well (Beals et al. 1983),

the neurocranium has also been suggested as a reasonable proxy for population

history (Roseman 2004). In light of these issues, and the fact that more than 50%

of FORDISC's measurements relate to the face, it was deemed important to

determine how anatomical region affected the program's success rate.

The final issue investigated in the study is the effect of specifying the sex of a

target specimen versus leaving its sex unspecified. Several studies have found

differences in affinity attribution when the sex was altered (Belcher et al. 2002;

Williams et al. 2005; Campbell and Armelagos 2007). By comparing a test

specimen to both males and females, these studies expected FORDISC to

correctly identify both population and sex on the basis that male and female

skulls of a given population are more similar to each other than either is to

another population (Williams et al. 2005). With this in mind, this study tested

whether the population attribution changed when the sex was unspecified versus

when it was restricted to the sex provided by Howells (sex specified).

1.5. Outline of analyses

All analyses were conducted on individuals taken directly from the Howells

reference sample employed by FORDISC. These individuals were only analyzed

against the Howells reference groups and not against the Forensic Databank

samples. This was done to address the question of using members whose

populations are not represented in the database and to give FORDISC the

greatest opportunity for success. As mentioned in the Introduction, there is

disagreement as to whether or not this is an appropriate test of the program's

accuracy in attributing affinity to unknown remains (Hubbe and Neves 2007;

Williams and Armelagos 2007). However, because the test individuals are part of

the reference sample, if the program functions correctly, it should successfully

place the majority with their source population.

To determine the effect of using an individual whose population was not

represented in the database, all analyses were run once with the source

population included and once with it excluded. To test the effect of using different

numbers of variables, analyses used variable sets containing the maximum

number of variables common to all groups (56) and the minimum number

recommended by FORDISC (10). To assess the effect of anatomical region on

FORDISC's ability to identify ancestry, the

measurements were divided into sets of basicranial, neurocranial and facial

variables. Lastly, to test the effect of sex selection, analyses used both sexes as

well as the appropriate sex for the test individual. For all analyses, the results

were calculated three times: once with no probability or typicality limitations, once

with 0.5 posterior probability and 0.01 typicality probability values, and once with

a more strict 0.8 posterior probability criterion.

Given the above, the following results were expected. Using individuals whose

populations were represented in the database would result in high numbers of

correct returns for all analyses. FORDISC was expected to be able to classify

individuals using either 56 or 10 variables. With the source population excluded,

FORDISC was expected to place test individuals into a closely related group as

determined by genetic and linguistic data.

In general, if sex is not a confounding factor, the sex-unspecified (SU) and

sex-specified (SS) analyses should return similar results, but practically, the results

for SS could be expected to be better as the number of groups in the comparison

is reduced.

With respect to variable number, FORDISC was expected to classify the greatest

number of test specimens correctly using the 56-variable dataset. Following

Hubbe and Neves (2007), more variables should provide better discrimination

power. At worst, adding more variables would simply fail to improve

discrimination and result in a plateau effect.

For the anatomical regions, if cranial morphology tracks population history then

the basicranium should produce the best results (Olson 1981; Wood and

Lieberman 2001; Harvati and Weaver 2006). Although it is still not clear whether

the neurocranium relates more closely to climate or to population history (Beals

et al. 1983; Roseman 2004), on the basis of Harvati and Weaver's later work

(2006b), FORDISC was expected to return fewer correct assignments using the

neurocranial variable set than the basicranium. Because studies have shown the

face to be the most susceptible to external stresses, the facial variables were

expected to be the least accurate. If, however, cranial morphology correlates with

a factor other than genetic history, then these predictions would not be

supported.

2. Materials and Methods

2.1. Data

The craniometric data used in this study were collected by William Howells

between 1965 and 1980 (Howells 1996). Howells published the data in a series

of monographs (1973; 1989; 1995) and also made them available upon request

and via the internet. Although the dataset does not cover certain areas (e.g.,

Indian subcontinent), and the sample sizes for some groups are small (e.g., 29

males and 18 females for Taiwanese Atayal), it is the most comprehensive and

accessible collection of human craniometric data available. As noted earlier, it

also forms the bulk of the reference sample for FORDISC.

The version of Howells' dataset used in this study consists of values for 74 linear

measurements and angles recorded on 2504 crania from 28 populations

representing five geographic regions: Europe, Africa, East Asia, Australia-Pacific,

and the Americas. In an effort to maintain equal sample sizes, Howells tried to

select 50-55 males and females for each of his 28 populations. Although some

groups were deficient in this number, most were reasonably close. Details of the

measurements and angles are given in Appendix I. The names, geographic

locations and sample sizes of the populations are presented in Appendix II.

Although some of the names Howells and FORDISC use for the groups in the

reference sample are no longer considered appropriate, the designations were

maintained to avoid confusion.

The test sample consisted of 200 individuals taken directly from Howells' dataset:

20 males and 20 females from one population in each of the major geographic

areas. The five populations from which the test sample was drawn are the Berg

(Europe), Zulu (Africa), Hokkaido Japanese (East Asia), Tasmanians

(Australia-Pacific) and Santa Cruz (Americas). These groups were chosen because their

sample sizes were relatively large (32-56, mean 48) and related populations

were available within the FORDISC reference sample. Test individuals were not

compared to the Forensic Databank groups as they are not included in that

reference sample.

To evaluate the impact of variable number and cranial region on the accuracy of

ancestry determination in FORDISC, four datasets were created for each test

individual. Hereinafter, these will be referred to as the whole cranium dataset, the

basicranium dataset, the neurocranium dataset and the face dataset. Appendix

III lists the variables used to create the four datasets.

The whole cranium dataset was based on the 56 variables that are common to all

groups represented in Howells' dataset. The complete set of 74 variables was

not employed because Jantz and Ousley (2005: 7) suggest that using

measurements that are not common to all groups "will limit sample sizes

somewhat". The 56 variables used in the whole cranium dataset were selected

with the aid of FORDISC 3.0's select all variables function.

The basicranium, neurocranium and face datasets were each based on 10

variables. Landmarks employed by Roseman (2004), Harvati and Weaver

(2006), and Hubbe and Neves (2007) were used to divide Howells' variables into

cranial region-specific groups. Of all of the measurements available for

conducting an analysis in FORDISC, 10 were associated with the basicranium,

14 related solely to the neurocranium, and 42 were face-specific. However, to

ensure consistency, each set needed to include the same number of variables.

Since the basicranium was represented by only 10 measurements, all of the

available basicranial measurements were used while 10 measurements for each

of the neurocranium and face datasets were randomly selected from their

respective totals.
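The random selection step described above can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used: the variable labels are placeholders (the measurements themselves are listed in Appendix III), and the function name and the fixed seed are assumptions added so that the draw is repeatable.

```python
import random


def draw_region_subset(variables, k=10, seed=None):
    """Randomly choose k variables, without replacement, from a region's pool."""
    pool = list(variables)
    if len(pool) < k:
        raise ValueError("pool is smaller than the requested subset")
    rng = random.Random(seed)  # a private generator keeps the draw reproducible
    return sorted(rng.sample(pool, k))


# Placeholder labels standing in for the real measurement names.
neurocranium_pool = [f"neuro_{i:02d}" for i in range(1, 15)]  # 14 variables
face_pool = [f"face_{i:02d}" for i in range(1, 43)]           # 42 variables

neuro_set = draw_region_subset(neurocranium_pool, seed=1)
face_set = draw_region_subset(face_pool, seed=1)
print(len(neuro_set), len(face_set))  # 10 10
```

Recording the seed, or the resulting variable list, is what allows a random draw of this kind to be repeated by another analyst.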

2.2. Analyses

Each dataset was subjected to four analyses. In the first, the source population

was included in the reference sample and the test individual was compared to

both males and females of all available populations (population included/both

sexes). The source population was also included in the reference sample in the

second analysis but the test individual was only compared to specimens of the

relevant sex (population included/same sex). In the third analysis, the source

population was excluded from the reference sample and the test individual was

compared to both males and females (population excluded/both sexes). In the

fourth, the source population was excluded from the reference sample and the

test individual was only compared to specimens of the relevant sex (population

excluded/same sex).
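The four analyses form a two-by-two design that is crossed with the four datasets. The Python sketch below simply enumerates the combinations; the labels and variable names are illustrative assumptions, and the final figure is an inference from the design as described rather than a number reported in the text.

```python
from itertools import product

DATASETS = ["whole cranium", "basicranium", "neurocranium", "face"]
SOURCE_POPULATION = ["included", "excluded"]
REFERENCE_SEXES = ["both sexes", "same sex"]

# One entry per dataset-by-analysis combination.
conditions = [
    {"dataset": d, "source_population": p, "reference_sexes": s}
    for d, p, s in product(DATASETS, SOURCE_POPULATION, REFERENCE_SEXES)
]

N_TEST_INDIVIDUALS = 200  # size of the test sample

print(len(conditions))                       # 16 combinations
print(len(conditions) * N_TEST_INDIVIDUALS)  # 3200 individual classifications
```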

Analyses were conducted with and without the source population included

because of the disagreement regarding how FORDISC should be applied. As

mentioned in the introduction, several studies have tested FORDISC's accuracy

using specimens whose source populations were not present in the reference

sample (e.g., Williams et al. 2005). The researchers responsible for these studies

argue that FORDISC should assign a test specimen to a closely related

population in the reference sample in the absence of the source population.

However, Jantz and Ousley (2005) contend that FORDISC should only be used

on a specimen if its population is represented in the reference sample.

'Both sexes' and 'same sex' analyses were carried out to control for the

potentially confounding effects of sexual dimorphism. When the test specimen

was compared only to reference specimens of the same sex, the select all males

or select all females function was used in FORDISC 3.0. The sex of the test

specimen was taken from the "sex" column in the Howells dataset.

With the exception of the source population excluded analyses, test specimens

were compared with all available groups. This was done because of confusion

regarding how many groups to use in an analysis. While Jantz and Ousley (2005)

acknowledge that "discriminant analyses should be run initially using all possible

groups that an unknown may classify into" (p. 44), they also suggest that using

two to five groups will be "more accurate than those involving many more groups"

(ibid). To achieve this improved accuracy, they suggest identifying the groups

with the lowest typicality probabilities and removing them after repeated runs.

However, they admit that typicality probabilities "are by no means foolproof"

(Jantz and Ousley 2005: 16) and do not clarify how many groups or runs are

sufficient. Furthermore, the presence or absence of a group in a particular region

cannot be assumed a priori. Nor, as Keita (2007) points out, can one "ever

really know if an individual's origin population is actually represented" (p. 425).

Overall, the arguments for limiting the number of groups were judged to be

insufficient to justify reducing the number of comparative sample groups in this

study.

To identify the closest relative of a test population, published genetic and

linguistic data were consulted. The best match was then chosen from the populations

available in Howells' dataset. These were the Norse (Europe) for the Berg, the

Teita (Africa) for the Zulu, the Kyushu (East Asia) for the Hokkaido Japanese, the

Yauyos (Americas) for the Santa Cruz, and the mainland Australian Aborigines

for the Tasmanians (Australia-Pacific). These groups were selected for the

following reasons:

1. Norse and Berg. As the ancestors of present day Nordic populations, the

Norse are most closely related to Norwegians and Swedes and are the

nearest genetic match in the database for FORDISC's Berg (Austria)

group over the more distantly related Zalavar (Hungary) group (Figure 1)

(Cavalli-Sforza et al. 1994).

2. Teita and Zulu. 'Teita' is a disused name for a North-Eastern

Bantu-speaking people of Kenya (Kitson 1931). They share genetic and

linguistic ties with the Zulu, a South-Central Bantu-speaking group

(Bendor-Samuel and Hartell 1989). Although the Bushmen (San) tribes

are geographically closer to the Zulu, research shows them to be both

genetically and linguistically more distant from the Zulu than are the Teita

(Figure 2) (Cavalli-Sforza et al. 1994; Knight et al. 2003).

3. Kyushu and Hokkaido Japanese. Cavalli-Sforza et al. (1994) consider

the Kyushu to be an outlier among the Japanese groups (Figure 3).

However, they are genetically closer to the Hokkaido Japanese than the

other East Asian groups in the FORDISC sample, the Ainu and the

Anyang (Omoto and Saitou 1997).

4. Yauyos and Santa Cruz. The indigenous groups of the Yauyos District in

Peru speak Quechua, a dialect in the Andean language group (Kaestle

and Smith 2001). Figure 4 shows Andean speakers as closest to those

who speak Penutian, the language of the Santa Cruz Amerindians

(Cavalli-Sforza et al. 1994). Although the Arikara are geographically

closer to the Santa Cruz Amerindians than the Peruvians, they are

Caddoan speakers in a more distantly related Keresiouan language group

(Campbell 1997).

5. Mainland Australian Aborigines and Tasmanians. The exact timing of

the first migration of humans into Sahul - the Pleistocene landmass that

once connected New Guinea, Australia and Tasmania - is still being

debated (Hudjashov et al. 2007; Redd and Stoneking 1999; Webb and

Rindos 1997). However, the current consensus is that humans colonized

Sahul between 50,000 and 40,000 years ago (Walsh and Eckhoff 2007).

Radiocarbon dates of multiple sites suggest that Tasmania may have

been settled as early as 35,000 years ago (O'Connell and Allen, 1998),

which implies a prolonged period of genetic exchange with other Sahul

migrants until ~12,000 years ago, when rising sea levels cut Tasmania off

from mainland Australia (Redd and Stoneking 1999). As such, the

mainland Australian Aborigines were considered to be the closest match

for the Tasmanian group in the FORDISC sample.

To score the results for the analyses that included the source population, an

assignment was considered 'correct' if FORDISC chose the test individual's own

population as the most likely population. For the analyses that excluded the

source population, an assignment was 'correct' when FORDISC selected the

population most closely related to the test individual's source population.

As noted earlier, several combinations of acceptable posterior and typicality

probabilities have been proposed. To reiterate, Jantz and Ousley (2005)

recommended that determinations should be accepted only if the posterior

probability exceeds 0.5 and the typicality probability exceeds 0.01. Later they

suggested that determinations with posterior probabilities less than 0.8 are more

likely to be incorrect than correct (Jantz and Ousley 2007). With this lack of

consensus in mind, the number of correctly classified specimens was calculated

three times: once by accepting all posterior probabilities and typicality

probabilities, once by accepting a determination if the posterior probability was

>0.5 and typicality probability >0.01, and once using a posterior probability >0.8

and typicality probability >0.01. FORDISC 3.0 provides three typicality values:

'ranked', 'F' and 'Chi'. The FORDISC 3.0 manual suggests that ranked or 'R'

typicalities are the most reliable since they do not require multivariate normality.

In contrast, the 'F' ratio typicality can be artificially inflated "as the number of

variables approaches a group's sample size" and the Chi-square typicality

probabilities "tend to call more individuals atypical than F typicality probabilities"

(Jantz and Ousley 2005: 54). Accordingly, the R typicality values were used.
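The scoring rules set out above can be expressed as a short Python sketch. It is offered as an illustration under stated assumptions, not as the procedure used to tabulate the results: the function names, the group labels (which may not match the names used inside FORDISC), and the example posterior and typicality values are all hypothetical. The typicality value is assumed to be the ranked (R) typicality.

```python
# Closest-relative pairings described in the text (labels are illustrative).
CLOSEST_RELATIVE = {
    "Berg": "Norse",
    "Zulu": "Teita",
    "Hokkaido Japanese": "Kyushu",
    "Santa Cruz": "Yauyos",
    "Tasmanians": "Australian Aborigines",
}

# criterion -> (posterior must exceed, typicality must exceed); None = no limit.
CRITERIA = {
    "all": (None, None),
    "pp>0.5": (0.5, 0.01),
    "pp>0.8": (0.8, 0.01),
}


def expected_group(source, source_included):
    """Group that counts as 'correct' for a test individual."""
    return source if source_included else CLOSEST_RELATIVE[source]


def is_accepted_correct(source, assigned, posterior, typicality,
                        source_included, criterion="all"):
    """True if the assignment is correct and passes the chosen criterion."""
    if assigned != expected_group(source, source_included):
        return False
    min_posterior, min_typicality = CRITERIA[criterion]
    if min_posterior is not None and posterior <= min_posterior:
        return False
    if min_typicality is not None and typicality <= min_typicality:
        return False
    return True


# Invented output for one Zulu test individual, source population excluded.
example = dict(source="Zulu", assigned="Teita", posterior=0.62,
               typicality=0.20, source_included=False)
for name in CRITERIA:
    print(name, is_accepted_correct(criterion=name, **example))
```

In this invented case the assignment counts as correct when all probabilities are accepted and under the 0.5 criterion, but not under the stricter 0.8 criterion.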

Figure 1: Genetic tree for 26 European populations (Cavalli-Sforza et al. 1994: 268). Horizontal axis: genetic distance.

Figure 2: Genetic tree for 33 African populations (Cavalli-Sforza et al. 1994: 169). Horizontal axis: genetic distance.

Figure 3: Genetic tree for 21 Asian populations (Cavalli-Sforza et al. 1994: 231). Horizontal axis: genetic distance.

Figure 4: Genetic tree for 23 American populations (Cavalli-Sforza et al. 1994: 323). Horizontal axis: genetic distance.

3. Results

3.1. Impact of including source population and specifying sex

This section addresses the impact of population inclusion and sex selection on

FORDISC's success rate. As discussed, debate surrounds whether or not a

specimen should be tested with FORDISC if its source population is not

represented in the reference database. Similarly, although some studies have

found inconsistencies in classification between using both sexes and using only

the relevant sex, the effect of sex selection on FORDISC's success rate has not

been fully tested.

3.1.1. Number of correct assignments accepting all posterior and typicality

probabilities. The totals for the number of correct assignments for each set of

analyses are given in Table 1. In the analyses of the whole cranium dataset, the

best test results were obtained when the source population was included and the

sex specified (i.e. only relevant sex included in reference sample). In this

analysis, 88.5% of the test specimens were assigned correctly. The next-best

results were obtained when the source population was included and the sex

unspecified (i.e. both sexes included in reference sample). In this case, 82.5% of

the test specimens were assigned correctly. The third-best results were obtained

when the source population was excluded and the sex specified. Here,

FORDISC correctly classified 39.5% of the test specimens. The worst results

were obtained when the source population was excluded and sex unspecified.

This analysis returned 36.5% correct classifications.

The results of the analyses using the basicranium dataset followed the same

pattern as those for the whole cranium dataset. Again, the best results were

achieved when the source population was included and the sex specified. In this

analysis, FORDISC correctly assigned 33.5% of the test specimens. With the

source population included and the sex unspecified, FORDISC assigned 22.5%

of the test sample correctly. With the source population removed, FORDISC

assigned more test specimens correctly with the sex specified than with it

unspecified. In this case, 10.5% of the test sample was correctly classified with

the sex specified compared to 8.5% with the sex unspecified.

The results of the analyses using the neurocranium dataset also followed the

same pattern as those for the whole cranium dataset. The best result (48.5%)

was achieved with the source population included and the sex specified. The

next-best result occurred when the source population was included and the sex

unspecified. In this case, FORDISC classified 41.5% of the test sample correctly.

When the source population was excluded, FORDISC assigned 23.0% of the test

sample correctly with the sex specified and 16.0% with the sex unspecified.

The results of the analyses using the face dataset followed the same pattern as

the results for the previous three sets of analyses. As before, the best result was

achieved when the source population was included and the sex specified. In this

analysis, FORDISC assigned 41.5% of the test sample correctly. When the sex

was unspecified, it assigned 34.0% correctly. With the source population

excluded, FORDISC assigned 23.0% of the test sample correctly with the sex

specified and 15.0% with the sex unspecified.

Table 1. Total number of test specimens correctly classified² (n=200)

Dataset          ISU            ISS            ESU           ESS
Whole cranium    165 (82.5%)    177 (88.5%)    73 (36.5%)    79 (39.5%)
Basicranium      45 (22.5%)     67 (33.5%)     17 (8.5%)     21 (10.5%)
Neurocranium     83 (41.5%)     97 (48.5%)     32 (16.0%)    46 (23.0%)
Face             68 (34.0%)     83 (41.5%)     30 (15.0%)    46 (23.0%)

² All tables use the following format: ISU = source population included, sex unspecified. ISS = source population included, sex specified. ESU = source population excluded, sex unspecified. ESS = source population excluded, sex specified. Bold cell indicates the variable set with the highest success rate for each population. The first value in each cell is the number of test specimens correctly classified. The value in parentheses is the percentage of test specimens correctly classified.

In all four sets of analyses, then, markedly more individuals were correctly

classified when the source population was included than when it was excluded.

More individuals were also correctly classified when comparisons were limited to

specimens of the same sex rather than when the test specimens were compared

to both males and females.
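The two patterns summarised above can be checked directly against the counts in Table 1. The Python sketch below is a reader's aid only; the data structure and variable names are assumptions, while the counts are those reported in the table.

```python
N = 200  # number of test specimens

# Counts of correct classifications from Table 1.
TABLE_1 = {
    "Whole cranium": {"ISU": 165, "ISS": 177, "ESU": 73, "ESS": 79},
    "Basicranium":   {"ISU": 45,  "ISS": 67,  "ESU": 17, "ESS": 21},
    "Neurocranium":  {"ISU": 83,  "ISS": 97,  "ESU": 32, "ESS": 46},
    "Face":          {"ISU": 68,  "ISS": 83,  "ESU": 30, "ESS": 46},
}


def percent(count, n=N):
    """Percentage of test specimens correctly classified."""
    return round(100 * count / n, 1)


for dataset, c in TABLE_1.items():
    # Every population-included count exceeds every population-excluded count.
    included_better = min(c["ISU"], c["ISS"]) > max(c["ESU"], c["ESS"])
    # Same-sex comparisons beat both-sex comparisons in each pairing.
    same_sex_better = c["ISS"] > c["ISU"] and c["ESS"] > c["ESU"]
    print(dataset, {k: percent(v) for k, v in c.items()},
          included_better, same_sex_better)
```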

3.1.2. Number of correct assignments using >0.5 posterior probability and

>0.01 typicality probability. Table 2 shows the scores recalculated based on

these criteria. As before, FORDISC achieved the best results using the whole

cranium dataset with the source population included and the sex specified. This

was followed by the source population included and sex unspecified results.

Third-best results for the whole cranium were achieved when the source

population was excluded and the sex specified. FORDISC consistently returned

the fewest correct assignments using the whole cranium when the

source population was excluded and the sex unspecified. As with the total

number of correct assignments, the results for the three other datasets followed

the same pattern.

Table 2. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200)

Dataset          ISU            ISS            ESU           ESS
Whole cranium    160 (80.0%)    171 (85.5%)    49 (24.5%)    54 (27.0%)
Basicranium      4 (2.0%)       10 (5.0%)      2 (1.0%)      4 (2.0%)
Neurocranium     34 (17.0%)     54 (27.0%)     5 (2.5%)      16 (8.0%)
Face             29 (14.5%)     44 (22.0%)     6 (3.0%)      14 (7.0%)

3.1.3. Number of correct assignments using >0.8 posterior

probability and >0.01 typicality probability. Once again, for each dataset,

FORDISC classified the highest number of test specimens correctly when the

source population was included and the sex specified (Table 3). The next-best

results were achieved with the source population included and the sex

unspecified. In the case of the basicranium dataset, the sex-unspecified and

sex-specified analyses returned the same proportion (0.5%) of correct classifications.

The population-excluded results followed a similar pattern. FORDISC achieved

better success when the sex was specified than when it was unspecified.

However, it is worth noting that, with the source population excluded, FORDISC

failed to classify a single individual out of 200 in three cases.

Overall, the numbers of correct assignments were low at the 0.8 posterior

probability level. With one exception, the best results were obtained when the

source population was included and the sex specified. The worst results occurred

when the source population was excluded and the sex unspecified.


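Table 3. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Dataset          ISU            ISS            ESU           ESS
Whole cranium    139 (69.5%)    156 (78.0%)    24 (12.0%)    38 (19.0%)
Basicranium      1 (0.5%)       1 (0.5%)       0 (0.0%)      0 (0.0%)
Neurocranium     9 (4.5%)       26 (13.0%)     1 (0.5%)      4 (2.0%)
Face             5 (2.5%)       13 (6.5%)      0 (0.0%)      2 (1.0%)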

3.1.4. Summary. Regardless of the criteria used to assess the results,

many more test specimens were correctly classified when the source population

was included than when it was excluded. Better results were also obtained when

a test specimen was compared only to reference specimens of the same sex,

rather than to both sexes. It is important to note here however, that the numbers

of correct classifications were extremely low in most analyses. Even when all

posterior and typicality probabilities were accepted, FORDISC achieved no better

than a 48.5% success rate in the majority of analyses. Only in the whole

cranium, population-included analyses were more than half of the test specimens

correctly classified.
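To illustrate how sharply the counts fall as the acceptance criteria tighten, the Python sketch below compares the population-included, sex-specified (ISS) counts reported above under the three criteria. The data structure and function name are assumptions made for the example; the counts are those given in the three sets of results in this section.

```python
# ISS counts under: all probabilities accepted, >0.5 posterior, >0.8 posterior.
ISS_COUNTS = {
    "Whole cranium": (177, 171, 156),
    "Basicranium":   (67, 10, 1),
    "Neurocranium":  (97, 54, 26),
    "Face":          (83, 44, 13),
}


def retention(counts):
    """Share of the unrestricted correct assignments that survive
    the strictest criterion."""
    baseline, *_, strictest = counts
    return strictest / baseline


retained = {dataset: retention(c) for dataset, c in ISS_COUNTS.items()}
for dataset, share in retained.items():
    print(f"{dataset}: {share:.0%} of correct assignments retained")
```

On these figures only the whole cranium dataset keeps most of its correct assignments under the 0.8 posterior probability criterion.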

3.2. Impact of variable number

This section considers the effect of variable number on FORDISC's classification

rate. To reiterate, although studies have shown more variables to be more

effective in discriminating among groups, Jantz and Ousley (2005) maintain that

using large numbers of variables with FORDISC produces poor results due to a

phenomenon they refer to as 'overfitting'.

3.2.1 Number of correct assignments accepting all posterior and typicality

probabilities. The total number of correct assignments for each set of variables is

given in Table 4. For the population-included, sex-unspecified analyses, the

56-variable whole cranium dataset returned the greatest number of correct

classifications. Here, FORDISC correctly assigned 82.5% of the test individuals

to the appropriate population. This was followed by the 10-variable neurocranium

dataset, which classified 41.5% of the test specimens correctly. The face and

basicranium datasets returned 34.0% and 22.5% correct assignments,

respectively.

The results were similar for the population-included, sex-specified analyses.

Again, FORDISC was the most successful with the 56-variable dataset, correctly

assigning 88.5% of the test individuals to the appropriate population. The

10-variable neurocranium dataset achieved the next-best result with 48.5%, followed

by the face with 41.5% and the basicranium dataset with 33.5%.

The results for the analyses that excluded the source population were similar to

the population-included results. With the sex unspecified, FORDISC classified

36.5% of the test specimens with the most closely related population using the

56-variable dataset, compared to 16.0%, 15.0% and 8.5% for the 10-variable

datasets (neurocranium, face and basicranium, respectively).

As with the other analyses, when the source population was excluded and only

the same-sex reference groups used, the best result was achieved using the

56-variable dataset. In this case, FORDISC assigned 39.5% of the test sample to

the most closely related population. Unlike the previous results, however, the

next-best result was shared by the neurocranium and face. Using each of these

datasets, FORDISC classified 23.0% of the test specimens correctly. In keeping

with the other analyses, the basicranium dataset returned the fewest

correct classifications, assigning 10.5% of the test specimens to the most closely

related population.

Thus, regardless of other factors, FORDISC returned significantly more correct

classifications using the 56-variable whole-cranium dataset than with any of the

10-variable datasets.

Table 4. Total number of test specimens correctly classified by variable number (n=200)

Analysis    56 variables    10 variables     10 variables      10 variables
            (cranium)       (basicranium)    (neurocranium)    (face)
ISU         165 (82.5%)     45 (22.5%)       83 (41.5%)        68 (34.0%)
ISS         177 (88.5%)     67 (33.5%)       97 (48.5%)        83 (41.5%)
ESU         73 (36.5%)      17 (8.5%)        32 (16.0%)        30 (15.0%)
ESS         79 (39.5%)      21 (10.5%)       46 (23.0%)        46 (23.0%)

3.2.2 Number of correct assignments using >0.5 posterior probability and

>0.01 typicality probability. Table 5 shows the recalculated scores based on 0.5

posterior probability and 0.01 typicality values for the four sets of analyses. The

impact of number of variables on the number of correct classifications was

heightened when the recommended probability and typicality values were taken

into account.

For the population-included, sex-unspecified analyses, the best results were

achieved when FORDISC used 56 variables. In this analysis, 80.0% of the test

specimens were correctly classified. Of the 10-variable datasets, the

neurocranium returned the next-best result (17.0%), followed by the face dataset

(14.5%). The basicranium achieved the poorest result, with only four test

individuals (2.0%) correctly assigned.


In the population-included, sex-specified analyses, FORDISC correctly classified 85.5% of the test sample using 56 variables as

opposed to 27.0%, 22.0% and 5.0% for the 10-variable neurocranium, face and

basicranium datasets respectively.

As in the population-included analyses, the best results for the

population-excluded, sex-unspecified analyses were obtained using the whole cranium

dataset. Here, FORDISC assigned 24.5% of the test individuals to the most

closely related group. Surprisingly, the next-best results used the face dataset

rather than the neurocranium. In this case, 3.0% of the test specimens were

correctly classified using the face variables versus 2.5% using the neurocranium.

Once again, however, the basicranium dataset achieved the poorest results with

FORDISC placing only two individuals (1.0%) with their related population.

In the population-excluded, sex-specified analyses, the best results were

obtained using the 56-variable whole-cranium dataset. Here, FORDISC correctly

classified 27.0% of the test sample to the most closely related group. The pattern

for the 10-variable datasets followed the first two sets of analyses: the

neurocranium returned the next-best result with 8.0%, followed by the face

(7.0%) and basicranium (2.0%).

Although the number of correct assignments fell significantly when the

recommended posterior probability and typicality values were used, the 56-

variable dataset continued to achieve considerably better results than the three

10-variable datasets.


Analysis   56 variables   10 variables    10 variables     10 variables
           (cranium)      (basicranium)   (neurocranium)   (face)
ISU        160 (80.0%)    4 (2.0%)        34 (17.0%)       29 (14.5%)
ISS        171 (85.5%)    10 (5.0%)       54 (27.0%)       44 (22.0%)
ESU        49 (24.5%)     2 (1.0%)        5 (2.5%)         6 (3.0%)
ESS        54 (27.0%)     4 (2.0%)        16 (8.0%)        14 (7.0%)


3.2.3 Number of correct assignments using >0.8 posterior probability and

>0.01 typicality probability. Using the 0.8 posterior probability criterion, the

number of correct assignments fell again. Table 6 summarizes these results.

Once again, for all analyses, the best results were achieved using the 56-variable

whole-cranium dataset. This was followed by the neurocranium, face and

basicranium datasets. However, the 10-variable datasets performed poorly in

general, especially when the source population was excluded.


Analysis   56 variables   10 variables    10 variables     10 variables
           (cranium)      (basicranium)   (neurocranium)   (face)
ISU        139 (69.5%)    1 (0.5%)        9 (4.5%)         5 (2.5%)
ISS        156 (78.0%)    1 (0.5%)        26 (13.0%)       13 (6.5%)
ESU        24 (12.0%)     0 (0.0%)        1 (0.5%)         0 (0.0%)
ESS        38 (19.0%)     0 (0.0%)        4 (2.0%)         2 (1.0%)

3.2.4. Variable number and population differences. The previous sections

combined the results for all 200 test individuals. To determine whether the results

are consistent among the test populations, this section re-examines the results

for each geographic group in relation to variable number.

3.2.4.1. Number of correct assignments accepting all posterior and

typicality probabilities. Table 7 shows the total number of correct classifications

for each population. For the population-included, sex-unspecified analyses,

FORDISC achieved the best results using the 56-variable dataset. This was

consistent across all five populations. For the Berg, 85.0% of the test specimens

were classified correctly using the 56-variable dataset as opposed to 47.5% for

the next-best 10-variable dataset (neurocranium). For the Santa Cruz

Amerindians, FORDISC correctly classified 92.5% of the test specimens using 56

variables in contrast to 60.0% for the next-best result (face dataset). 70.0% of the

Northern Japanese were correctly classified using 56 variables, with the next-

best result returning only 25.0% (face dataset). The Tasmanian group was

correctly classified in 80.0% of the cases using the 56-variable dataset with the

neurocranium dataset returning the next-best result with 57.5%. For the Zulu,

FORDISC also achieved the best results using 56-variables and the next-best

result using the 10-variable neurocranium dataset (50.0%).
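
As a cross-check on the comparisons above, the best-performing 10-variable dataset for each population can be read off the ISU counts in Table 7. The sketch below is illustrative only; the variable and function names are assumptions introduced here, and the counts are copied from the ISU rows of Table 7.

```python
# Counts of correctly classified specimens (n=40 per population),
# ISU analyses, 10-variable datasets, as listed in Table 7.
populations = ["Berg", "Santa Cruz", "N. Japan", "Tasmanian", "Zulu"]
isu_counts = {
    "basicranium":  [11, 8, 7, 7, 12],
    "neurocranium": [19, 15, 6, 23, 20],
    "face":         [13, 24, 10, 15, 6],
}
N_PER_POPULATION = 40


def best_region(counts, index):
    """Return the dataset with the highest count for one population."""
    return max(counts, key=lambda region: counts[region][index])


def percent(count, n=N_PER_POPULATION):
    """Convert a count into a percentage of the population sample."""
    return round(100 * count / n, 1)


best_10_variable = {
    pop: best_region(isu_counts, i) for i, pop in enumerate(populations)
}
berg_neurocranium_pct = percent(isu_counts["neurocranium"][0])
```

Under these counts the neurocranium is the best 10-variable dataset for the Berg, Tasmanian and Zulu samples, and the face for the Santa Cruz and Northern Japanese samples, in line with the text.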

The population-included, sex-specified analyses followed a similar pattern to the

sex-unspecified results. For all five populations, FORDISC was the most

successful using the 56-variable dataset. For the Berg, FORDISC classified

85.0% of the test specimens correctly using 56 variables. The neurocranium

dataset returned the next-best result with 57.5%. All 40 Santa Cruz individuals

(100%) were correctly classified using the 56 variable dataset with the face

returning the next-best result with 62.5%. 77.5% of the Northern Japanese were

correctly classified using 56 variables, in contrast to 30.0% for the best 10-

variable dataset (basicranium). The Tasmanian group was correctly classified

82.5% of the time using 56 variables with the next-best result coming from the

neurocranium dataset (75.0%). The Zulu group also achieved the best result

using the 56-variable dataset (97.5%) and the next-best using the neurocranium

dataset (47.5%).

The pattern of correct assignments for the population-excluded, sex-unspecified

analyses followed the population-included results for four out of the five

populations. FORDISC correctly classified 15.0% of the Berg specimens using

56 variables as opposed to 10.0% using the next-best 10-variable dataset (face).

For the Santa Cruz population, 45.0% were correctly classified using the 56-

variable dataset followed by 20.0% using the facial dataset. 60.0% of the

Northern Japanese were correctly classified using 56 variables with the 10-

variable face dataset returning the next-best result (15.0%). The Tasmanian

group also achieved better results using 56 variables, classifying 40.0% with this

dataset versus 30.0% using the neurocranium dataset. In contrast, FORDISC

achieved the best results for the Zulu group using the neurocranium dataset.

Here, 27.5% of the test sample was correctly assigned in comparison to 22.5%

when 56 variables were used.

For the population-excluded, sex-specified analyses, three out of the five groups

achieved the best results using the 56-variable dataset. For the Santa Cruz

population, FORDISC classified more specimens correctly using 56 variables

(45.0%) than 10 variables (35.0% using the face dataset). This was also the case

for the Northern Japanese. For this group, FORDISC correctly classified 65.0%

of the specimens using the 56-variable dataset versus 20.0% using the face.

FORDISC also classified more Tasmanians correctly using 56 variables (50.0%)

than with 10 variables (40.0% using the neurocranium dataset). In contrast, both

the Berg and the Zulu groups deviated from the general pattern. For the Berg,

FORDISC classified the same number of specimens using the 10-variable face

dataset as the 56-variable whole cranium. In each case, 17.5% of the test

specimens were correctly classified. For the Zulu, 47.5% of the test specimens

were correctly classified using the neurocranium dataset and only 20.0% using

the 56-variable whole-cranium dataset.

Table 7. Results by population accepting all posterior and typicality probabilities (n=40)

Analysis  Dataset                        Berg        Santa Cruz  N. Japan    Tasmanian   Zulu
ISU       56 variables (whole cranium)   34 (85.0%)  37 (92.5%)  28 (70.0%)  32 (80.0%)  34 (85.0%)
ISU       10 variables (basicranium)     11 (27.5%)  8 (20.0%)   7 (17.5%)   7 (17.5%)   12 (30.0%)
ISU       10 variables (neurocranium)    19 (47.5%)  15 (37.5%)  6 (15.0%)   23 (57.5%)  20 (50.0%)
ISU       10 variables (face)            13 (32.5%)  24 (60.0%)  10 (25.0%)  15 (37.5%)  6 (15.0%)
ISS       56 variables (whole cranium)   34 (85.0%)  40 (100%)   31 (77.5%)  33 (82.5%)  39 (97.5%)
ISS       10 variables (face)            18 (45.0%)  25 (62.5%)  11 (27.5%)  20 (50.0%)  9 (22.5%)
ESU       10 variables (face)            4 (10.0%)   8 (20.0%)   6 (15.0%)   10 (25.0%)  2 (5.0%)
ESS       10 variables (face)            7 (17.5%)   14 (35.0%)  8 (20.0%)   13 (32.5%)  4 (10.0%)

3.2.4.2 Number of correct classifications using >0.5 posterior probability

and >0.01 typicality probability. Table 8 shows the number of correct

classifications for each population when the recommended posterior and

typicality probabilities are considered. The number of correctly classified

specimens fell when the posterior probability and typicality criteria were used, but

the results followed the same pattern as those obtained in the analyses in which

all posterior and typicality probabilities were employed. Thus, in the ISU and ISS

analyses the 56-variable dataset outperformed all the 10-variable datasets, while

in the ESU and ESS analyses the 56-variable dataset outperformed the 10-

variable datasets in the case of the Berg, Santa Cruz, Northern Japanese and

Tasmanians, but not the Zulu. In the ESU analyses, the Zulu neurocranium 10-

variable dataset performed as well as the 56-variable dataset (10% of specimens

correctly classified in both cases). In the ESS analyses, the Zulu neurocranium

10-variable dataset performed better than the 56-variable dataset (25% versus

17.5%).

Table 8. Results by population using >0.5 posterior probability and >0.01 typicality probability (n=40)

Analysis  Dataset                        Berg        Santa Cruz  N. Japan    Tasmanian   Zulu
ISU       10 variables (basicranium)     4 (10.0%)   0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)
ISU       10 variables (face)            7 (17.5%)   16 (40.0%)  1 (2.5%)    5 (12.5%)   0 (0.0%)
ISS       56 variables (whole cranium)   33 (82.5%)  40 (100%)   30 (75.0%)  33 (82.5%)  35 (87.5%)
ISS       10 variables (basicranium)     4 (10.0%)   3 (7.5%)    0 (0.0%)    0 (0.0%)    3 (7.5%)
ISS       10 variables (face)            11 (27.5%)  18 (45.0%)  1 (2.5%)    14 (35.0%)  0 (0.0%)
ESU       10 variables (face)            0 (0.0%)    1 (2.5%)    1 (2.5%)    3 (7.5%)    1 (2.5%)
ESS       10 variables (face)            2 (5.0%)    3 (7.5%)    3 (7.5%)    4 (10.0%)   2 (5.0%)

3.2.4.3 Number of correct classifications using >0.8 posterior probability

and >0.01 typicality probability. Table 9 shows the number of correct

classifications for each population when the stricter posterior and typicality

probabilities are considered. Again, the number of correctly classified specimens

fell and the results were very low in general. For the ISU and ISS analyses the 56-

variable dataset outperformed all the 10-variable datasets. In the ESU analyses,

the 56-variable dataset outperformed the 10-variable datasets in the case of the

Berg, Santa Cruz, Northern Japanese and Tasmanians, but not the Zulu. Here,

the Zulu neurocranium 10-variable dataset and the 56-variable dataset achieved

the same result (2.5% of specimens correctly classified in both cases). In the

ESS analyses, the 56-variable dataset outperformed all of the 10-variable

datasets.

Table 9. Results by population using >0.8 posterior probability and >0.01 typicality probability (n=40)

Analysis  Dataset               Berg       Santa Cruz  N. Japan   Tasmanian  Zulu
ISU       10 variables (face)   2 (5.0%)   3 (7.5%)    0 (0.0%)   0 (0.0%)   0 (0.0%)
ISS       10 variables (face)   3 (7.5%)   8 (20.0%)   0 (0.0%)   2 (5.0%)   0 (0.0%)
ESU       10 variables (face)   0 (0.0%)   0 (0.0%)    0 (0.0%)   0 (0.0%)   0 (0.0%)
ESS       10 variables (face)   0 (0.0%)   0 (0.0%)    0 (0.0%)   1 (2.5%)   1 (2.5%)

3.2.5. Summary. Overall, these results indicate that the number of

variables used has a significant impact on FORDISC's ability to identify ancestral

group, regardless of other factors. FORDISC correctly classified more test

specimens using the 56-variable whole-cranium dataset than with any of the 10-

variable datasets. The 10-variable datasets never achieved higher than 48.5%

correct classifications (total result for neurocranium dataset, ISS). This compares

unfavourably to 88.5% for the 56-variable dataset under the same conditions.

However, this finding did not hold for all the test populations or at different

probability levels.

3.3 Impact of cranial region

The effect of cranial region on FORDISC's ability to identify ancestry has not

previously been addressed in the literature. Accordingly, this section compares

the results of the four sets of analyses using equal numbers of variables selected

to isolate the neurocranium, basicranium and face.

3.3.1. Number of correct assignments accepting all posterior and typicality

probabilities. Table 10 provides the total number of correct assignments for each

cranial region accepting all posterior and typicality probabilities. For the

population-included, sex-unspecified analyses, the neurocranium dataset

returned the greatest number of correct classifications. Here, FORDISC correctly

assigned 41.5% of the test individuals to the appropriate population. The face

and basicranium datasets returned 34.0% and 22.5% correct assignments,

respectively.


In the population-included, sex-specified analyses, FORDISC was again the most successful using the neurocranium dataset. In

this analysis, 48.5% of the test specimens were correctly classified versus 41.5%

using the face and 33.5% using the basicranium dataset.

When the sex was left unspecified, the results for the analyses that excluded the

source population followed a similar pattern to the population-included results. In

this analysis, FORDISC classified more test specimens with the most closely

related population using the neurocranium dataset (16.0%). This was followed by

the face dataset (15%) and the basicranium dataset (8.5%).

In contrast, when the source population was excluded and only the same-sex

reference groups were used, the neurocranium and face datasets returned the same

results. In both cases, FORDISC classified 23.0% of the test specimens

correctly. In keeping with the other analyses, the basicranium dataset returned

the fewest number of correct classifications, assigning 10.5% of the test

specimens to the most closely related population. Thus, with only one exception,

FORDISC achieved the best results using the neurocranium dataset, followed by

the face and basicranium datasets.
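
The ranking described above, including the tie in the ESS analyses, can be reproduced from the counts in Table 10. The following is a small sketch for illustration; the dictionary layout and the function names are assumptions introduced here, not part of the original analysis.

```python
# Correctly classified specimens out of 200, accepting all posterior and
# typicality probabilities (counts as listed in Table 10).
table10 = {
    "ISU": {"basicranium": 45, "neurocranium": 83, "face": 68},
    "ISS": {"basicranium": 67, "neurocranium": 97, "face": 83},
    "ESU": {"basicranium": 17, "neurocranium": 32, "face": 30},
    "ESS": {"basicranium": 21, "neurocranium": 46, "face": 46},
}
N_SPECIMENS = 200


def percent(count, n=N_SPECIMENS):
    """Convert a count into a percentage of the test sample."""
    return round(100 * count / n, 1)


def top_regions(row):
    """Return every region that shares the highest count (handles ties)."""
    best = max(row.values())
    return sorted(region for region, count in row.items() if count == best)


leaders = {analysis: top_regions(row) for analysis, row in table10.items()}
```

Returning a list rather than a single name makes the ESS tie between the neurocranium and face datasets explicit instead of silently picking one of them.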

Table 10. Total number of test specimens correctly classified (n=200)

Analysis   Basicranium   Neurocranium   Face
ISU        45 (22.5%)    83 (41.5%)     68 (34.0%)
ISS        67 (33.5%)    97 (48.5%)     83 (41.5%)
ESU        17 (8.5%)     32 (16.0%)     30 (15.0%)
ESS        21 (10.5%)    46 (23.0%)     46 (23.0%)


3.3.2. Number of correct assignments using >0.5 posterior probability and

>0.01 typicality probability. Table 11 shows the recalculated scores based on 0.5

posterior probability and 0.01 typicality values for the three sets of cranial-region

analyses. For the population-included, sex-unspecified analyses, the best results

were achieved when FORDISC used the neurocranium dataset. Here, 17.0% of

the test specimens were classified correctly. This was followed by the face

dataset (14.5%). The basicranium achieved the poorest result, with only four test

individuals (2.0%) correctly assigned.

The pattern was similar for the population-included, sex-specified analyses.

FORDISC correctly classified 27.0% of the test population using the

neurocranium dataset as opposed to 22.0% and 5.0% for the face and

basicranium datasets, respectively.

Surprisingly, the best results for the population-excluded, sex-unspecified

analyses were obtained using the face dataset. Here, FORDISC assigned 3.0%

of the test individuals to the most closely related group compared to 2.5% using

the neurocranial variables. Once again, the basicranium dataset achieved the

poorest results with FORDISC placing only two individuals (1.0%) with the most

closely related population.

In the population-excluded, sex-specified analyses, the best result was once

again obtained using the neurocranium dataset. Here, FORDISC correctly

classified 8.0% of the test sample to the most closely related group. The face

dataset achieved the next-best result with 7.0%, followed by the basicranium with

2.0%.

Although the number of correct assignments fell significantly when the

recommended posterior probability and typicality values were used, the

neurocranium dataset returned the highest number of correct classifications in all

but one case. With this one exception, the face dataset obtained the next-best

results. The basicranium consistently returned the lowest number of correct

assignments.

Table 11. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality criteria (n=200)

Analysis   Basicranium   Neurocranium   Face
ISU        4 (2.0%)      34 (17.0%)     29 (14.5%)
ISS        10 (5.0%)     54 (27.0%)     44 (22.0%)
ESU        2 (1.0%)      5 (2.5%)       6 (3.0%)
ESS        4 (2.0%)      16 (8.0%)      14 (7.0%)


3.3.3. Number of correct assignments using >0.8 posterior probability and

>0.01 typicality probability. Using the 0.8 posterior probability criterion, the

number of correct assignments fell again. Table 12 shows these results. As with

the previous results, FORDISC classified more test specimens for the population-

included, sex-unspecified analyses using the neurocranium dataset. Here, 4.5%

of the specimens were correctly classified using the neurocranium dataset, in

contrast to 2.5% using the face dataset and 0.5% using the basicranium dataset.

The pattern was similar for the population-included, sex-specified analyses.

Using the neurocranium dataset, FORDISC correctly classified 13.0% of the test

specimens, followed by 6.5% using the facial variables and 0.5% using the

basicranial variables. For the population-excluded, sex-unspecified analyses, the

neurocranium again achieved the best results. However, since only one

individual (0.5%) was classified correctly and none were classified correctly using

either the face or basicranium datasets, the term 'best' is used loosely.

FORDISC fared little better with the population-excluded, sex-specified analyses.

Here, 2.0% of the test specimens were correctly classified using the

neurocranium dataset. Only 1.0% was classified using the face dataset and no

individuals were classified correctly using the basicranium dataset.

Once again, the neurocranium dataset achieved the highest number of correct

determinations, followed by the face and the basicranium. However, at the 0.8

posterior probability level, the results are generally poor and the population-

excluded results are extremely low.

Table 12. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Analysis   Basicranium   Neurocranium   Face
ISU        1 (0.5%)      9 (4.5%)       5 (2.5%)
ISS        1 (0.5%)      26 (13.0%)     13 (6.5%)
ESU        0 (0.0%)      1 (0.5%)       0 (0.0%)
ESS        0 (0.0%)      4 (2.0%)       2 (1.0%)

3.3.4. Cranial region and population differences. As with the variable

number analyses, the previous sections combined the results for all 200 test

individuals. To establish whether or not FORDISC is consistent between the test

populations, this section considers the results for each cranial region according

to geographic group.

3.3.4.1. Number of correct assignments accepting all posterior and

typicality probabilities. Table 13 breaks down the results for each cranial region

by population accepting all posterior and typicality probabilities. For the Berg,

when the population was included, FORDISC was the most successful with the

neurocranial variables, placing 47.5% (sex unspecified) and 57.5% (sex

specified) of the individuals into the Berg group. The facial region was the next

most successful followed by the basicranium. However, when the Berg group

was excluded, the facial dataset achieved the best results. Here, the program

assigned 10.0% (sex unspecified) and 17.5% (sex specified) of the specimens to

the Norse group compared to 7.5%/12.5% for the neurocranium and 2.5%/2.5%

for the basicranium.

The situation was different for the Santa Cruz group. FORDISC was the most

successful in attributing ancestry using the facial variables in all analyses. With

the population included and the sex unselected, 60.0% of the sample was

correctly classified using the face, versus 37.5% for the neurocranium and

20.0% for the basicranium. With the population included and sex selected, 62.5%

of the sample was classified using the facial variables, in comparison to 45.0%

using the neurocranium and 30.0% using the basicranium dataset. When the

Santa Cruz group was excluded from the analysis, with the sex unselected,

FORDISC assigned 20.0% of the sample to the Peruvian group using the face

dataset, 10.0% using the neurocranium and 12.5% using the basicranial variable

set.

For the Northern Japanese, FORDISC returned more correct assignments using

the facial region in three out of four analyses. However, when the population was

included and the sex specified, FORDISC obtained the best results using the

basicranium variable set, placing 30.0% of the individuals back into the Northern

Japanese group versus 27.5% for the face and 17.5% for the neurocranium.

For the Tasmanians, FORDISC achieved the highest success rate when using

the neurocranial variables in all analyses. For the population-included analyses,

57.5% (sex unspecified) and 75.0% (sex specified) of the specimens were assigned

correctly, versus 37.5% and 50.0% using facial variables and 17.5% and 32.5%

using the basicranium.

The Zulu group also showed the best results when FORDISC used the

neurocranial variables. When the population was included, 50% (sex unspecified)

and 47.5% (sex specified) of the individuals were correctly identified, versus

30.0% and 37.5% for the basicranium and 15.0% and 22.5% for the facial region.

Table 13. Total results for each cranial region by population (n=40)

Analysis  Region         Berg        Santa Cruz  N. Japan    Tasmanian   Zulu
ISU       Basicranium    11 (27.5%)  8 (20.0%)   7 (17.5%)   7 (17.5%)   12 (30.0%)
ISU       Neurocranium   19 (47.5%)  15 (37.5%)  6 (15.0%)   23 (57.5%)  20 (50.0%)
ISU       Face           13 (32.5%)  24 (60.0%)  10 (25.0%)  15 (37.5%)  6 (15.0%)
ISS       Basicranium    15 (37.5%)  12 (30.0%)  12 (30.0%)  13 (32.5%)  15 (37.5%)
ISS       Neurocranium   23 (57.5%)  18 (45.0%)  7 (17.5%)   30 (75.0%)  19 (47.5%)
ISS       Face           18 (45.0%)  25 (62.5%)  11 (27.5%)  20 (50.0%)  9 (22.5%)
ESU       Basicranium    1 (2.5%)    5 (12.5%)   3 (7.5%)    5 (12.5%)   3 (7.5%)
ESU       Neurocranium   3 (7.5%)    4 (10.0%)   2 (5.0%)    12 (30.0%)  11 (27.5%)
ESU       Face           4 (10.0%)   8 (20.0%)   6 (15.0%)   10 (25.0%)  2 (5.0%)
ESS       Basicranium    1 (2.5%)    8 (20.0%)   4 (10.0%)   5 (12.5%)   3 (7.5%)
ESS       Neurocranium   5 (12.5%)   4 (10.0%)   2 (5.0%)    16 (40.0%)  19 (47.5%)
ESS       Face           7 (17.5%)   14 (35.0%)  8 (20.0%)   13 (32.5%)  4 (10.0%)

3.3.4.2. Number of correct assignments using >0.5 posterior probability

and >0.01 typicality probability. Table 14 lists the results for each cranial region

by population using 0.5 posterior probability and 0.01 typicality probability. Using

the recommended probability criteria with the source population included,

FORDISC achieved the best results for the Berg using the neurocranial

variables. This was followed by the face and basicranium datasets. However,

when the Berg population was excluded and the sex specified, FORDISC was

only able to place two specimens into the target group (the Norse). FORDISC

could not place any individuals correctly when the sex was unspecified or by

using the other cranial regions.

In the analysis of the Santa Cruz specimens, FORDISC achieved the highest

rate of success when the population was included using the facial variables

(40.0% - sex unspecified, 45% - sex specified). The facial variables were also the

most successful when the source population was excluded. However, like the

Berg, the classification rates were very low and no test specimens were correctly

classified using the basicranium dataset.

With the Northern Japanese group FORDISC assigned specimens correctly at

the recommended probability levels only when using the facial variables. No

individuals were correctly assigned using the other two variable sets.

Surprisingly, FORDISC placed more specimens correctly when the source

population was excluded. Three individuals (7.5%) were placed with the

Southern Japanese group when the sex was specified, while only one each was

correctly assigned when the source population was included and the sex

unspecified or specified.

For the Tasmanian sample, FORDISC returned more correct assignments using

the neurocranium variables when the population was included. When the

Tasmanian group was excluded and the sex unspecified, the best result was

obtained using the face dataset (7.5% compared to 2.5% for neurocranium or

basicranium). However, when the sex was specified, FORDISC placed more

specimens into the Australian group using the neurocranial dataset (12.5%

versus 10.0% using the face and 5.0% using the basicranium).

Lastly, for the Zulu group, at the recommended probability levels, FORDISC

achieved the best results using the neurocranium dataset in all analyses.

However, the population-included results were very poor for the other cranial

regions and more individuals were correctly placed when the source population

was excluded.

Table 14. Results for each cranial region using >0.5 posterior probability and >0.01 typicality probability (n=40)

Analysis  Region         Berg        Santa Cruz  N. Japan   Tasmanian   Zulu
ISU       Basicranium    4 (10.0%)   0 (0.0%)    0 (0.0%)   0 (0.0%)    0 (0.0%)
ISU       Neurocranium   11 (27.5%)  5 (12.5%)   0 (0.0%)   13 (32.5%)  5 (12.5%)
ISU       Face           7 (17.5%)   16 (40.0%)  1 (2.5%)   5 (12.5%)   0 (0.0%)
ISS       Basicranium    4 (10.0%)   3 (7.5%)    0 (0.0%)   0 (0.0%)    3 (7.5%)
ISS       Neurocranium   14 (35.0%)  6 (15.0%)   0 (0.0%)   20 (50.0%)  14 (35.0%)
ISS       Face           11 (27.5%)  18 (45.0%)  1 (2.5%)   14 (35.0%)  0 (0.0%)
ESU       Basicranium    0 (0.0%)    0 (0.0%)    0 (0.0%)   1 (2.5%)    1 (2.5%)
ESU       Neurocranium   0 (0.0%)    0 (0.0%)    0 (0.0%)   1 (2.5%)    4 (10.0%)
ESU       Face           0 (0.0%)    1 (2.5%)    1 (2.5%)   3 (7.5%)    1 (2.5%)
ESS       Basicranium    0 (0.0%)    0 (0.0%)    0 (0.0%)   2 (5.0%)    2 (5.0%)
ESS       Neurocranium   0 (0.0%)    1 (2.5%)    0 (0.0%)   5 (12.5%)   10 (25.0%)
ESS       Face           2 (5.0%)    3 (7.5%)    3 (7.5%)   4 (10.0%)   2 (5.0%)

3.3.5. Summary. When all the test populations were pooled, the

neurocranium produced the best results of the three cranial regions. With two

exceptions, the face dataset obtained the next-best results. The basicranium

consistently returned the lowest number of correct assignments. Indeed, in two

cases, FORDISC was unable to classify a single individual out of 200 using this

dataset. However, when the populations were considered individually, the results

were inconsistent and the rates of correct classification were very low in general.

In sum, FORDISC varied in its ability to classify individuals correctly with respect

to cranial region.

4. Discussion

4.1. Main findings

The difference between FORDISC's success rate in the source population-

included analyses and its success rate in the source population-excluded

analyses was substantial. When the whole cranium dataset was analyzed with

the source population included, more than two thirds of the test specimens were

classified correctly (70-89%) whereas when the whole cranium dataset was

analyzed with the source population excluded less than half of the test

specimens were classified correctly (12-40%). Far fewer specimens were

classified correctly in the analyses that focused on an individual anatomical

region, but the number classified correctly in the source population-included

analyses was always at least twice the number classified correctly in the source

population-excluded analyses. Thus, the presence or absence of the source

population in the reference sample greatly impacts the accuracy of FORDISC.

Specifically, the analyses suggest that if a test specimen's source population is

represented in FORDISC's reference sample, there is a reasonable chance that

the ancestry will be accurately determined, whereas if the specimen's source

population is not represented in FORDISC's reference sample, there is little

chance that its ancestry will be accurately determined.

The finding that a test specimen's source population has to be represented in the

reference sample in order for there to be a reasonable chance for its ancestry to

be accurately determined is consistent with Jantz and Ousley's (2005) cautions

regarding the use of FORDISC. However, as a result it challenges the utility of

the program in any but the most restricted circumstances. As noted in the

Introduction, it is entirely possible for a set of remains to be from any place in the

world, particularly if they are recent. Consequently, the likelihood of being able to

determine in advance if an unknown specimen's population is represented in the

FORDISC reference group sample is extremely low. If FORDISC is only effective

when an individual's source population is represented in the reference sample

and a researcher must establish this in order to be confident about the program's

determinations, there is no point in actually undertaking a FORDISC analysis. At

best it will only confirm a determination made by some other means. Furthermore,

if the test specimen's source population is not represented in the program's

reference sample and a specimen is analyzed anyway, an investigator cannot be

confident that the resulting determination actually corresponds to a closely

related population. In the end, the analysis has not assisted in narrowing an

individual's ancestry.

As discussed earlier, Jantz and Ousley have also suggested that secular change

may be responsible for some of FORDISC's poor performance in previous tests.

This means that in addition to needing to have the source population

represented, FORDISC also requires a specimen to be contemporaneous with

the specimens in the reference sample to be reliable. The Forensic Databank

includes modern forensic cases as well as "mid to late 19th century Amerindian

remains" (Jantz and Ousley 2005:35) while Howells' reference populations range

from 26-30th Dynasty Egyptians (600-200 B.C.) to mid-20th century dissection

room cadavers (Howells 1973). As a result, even if an investigator knew that an

unknown specimen came from Egypt, for example, if they could not also say it

came from the same time period as Howells' group, FORDISC's attribution would

have to be considered suspect. Furthermore, if modern Americans have changed

so significantly in the last 150 years, the point at which secular changes override

population differences needs to be clearly established.

This study also determined that restricting the program to the relevant sex

improved FORDISC's ability to correctly attribute ancestry. When the source

population was included and 56 variables were used, selecting the sex resulted

in a six percent improvement over not doing so. For the 10-variable datasets,

selecting the sex achieved between seven and 11 percent better results than

when it was left unselected. The results were similar for the source population-

excluded results, with between three and eight percent improvement when the

sex was selected. This suggests that accurately sexing an unknown specimen

through morphological examination is advisable before using FORDISC to

determine a specimen's ancestry.
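
The improvements quoted above for the source population-included analyses are differences in percentage points, and they can be recomputed from the reported counts. In the sketch below, the 10-variable counts come from Table 10; the 56-variable totals (165 and 177) are an assumption derived here by summing the ISU and ISS whole-cranium rows of Table 7. The names used are illustrative only.

```python
N_SPECIMENS = 200

# (sex unspecified, sex specified) counts of correct assignments,
# source population included, all probabilities accepted.
included = {
    # Derived by summing the Table 7 rows: 34+37+28+32+34 and 34+40+31+33+39.
    "56 variables (whole cranium)": (165, 177),
    # Counts as listed in Table 10.
    "10 variables (basicranium)": (45, 67),
    "10 variables (neurocranium)": (83, 97),
    "10 variables (face)": (68, 83),
}


def gain_in_points(unspecified, specified, n=N_SPECIMENS):
    """Percentage-point gain from specifying the sex of the specimen."""
    return round(100 * (specified - unspecified) / n, 1)


gains = {name: gain_in_points(u, s) for name, (u, s) in included.items()}
```

Under these counts the gain is 6.0 points for the whole-cranium dataset and between 7.0 and 11.0 points for the three 10-variable datasets.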

The results of the analyses also suggest that the number of variables greatly

affects FORDISC's ability to determine ancestry. When the 200 test specimens

were considered together, using 56 variables consistently returned the highest

rate of correct assignments, regardless of other criteria. Even in the "best-case

scenario" where the source population was represented in the reference sample,

the sex of the test specimen was specified, and all posterior and typicality

probabilities were accepted, the best 10-variable dataset achieved only 48.5%

correct assignments compared with 88.5% for 56 variables.

The 56-variable dataset also outperformed the three 10-variable datasets when

the test specimens were broken down by population. The only exceptions were

the analyses in which the Zulu test specimens were analyzed without the Zulu

population being represented in the reference sample. In these analyses, the 10-

variable neurocranium dataset returned more correct assignments than the 56-

variable dataset.

In general, these results contradict the claims by Jantz and Ousley (2005) that

"as more variables are added, there is a tendency for the classification accuracy

to plateau and then decrease" (p. 50) and support the findings of other

researchers that better discrimination is achieved by maximizing the number of

variables (Hubbe and Neves 2005). Furthermore, they suggest that, contrary to

claims regarding FORDISC (Ubelaker et al. 2002), the program cannot be used

with confidence on incomplete remains from which only a few measurements can

be obtained.

The effect of anatomical region on FORDISC's ability to identify ancestry was not

resolved by this study. Although FORDISC achieved the best results on average

for the five groups using the neurocranium, it did not do so consistently across

populations and the returns were very low in general. When the results were

considered as a whole, the neurocranium was the most effective for determining

ancestry, followed by the face and basicranium.

These results conflict with the prediction that the basicranium would be the most

successful because it is the most phylogenetically and ontogenetically stable of

the three regions, while the face would be the least successful due to non-

genetic influences on its shape. In fact, while the neurocranium and facial regions

vied for the highest success rate, the basicranial variable set consistently

returned the fewest correct assignments. However, because all three regional

datasets performed so poorly, this question was not fully resolved by the current

study.

4.2 Implications for use of FORDISC

The results of this study suggest that the utility of FORDISC is limited. In order

for the program to yield an accurate determination of ancestry, the target

specimen's source population must be present in FORDISC's reference sample

and its sex must be known. In addition, the target specimen must be complete

enough for more than 10 measurements to be recorded on it and for those

measurements to relate to more than one region of the cranium.

The utility of FORDISC may in fact be more limited than the analyses reported

here suggest. During the course of the study, it became apparent that the

evaluation criteria that have been recommended are ineffective. The following

figures relate to the set of analyses that yielded the highest number of correctly

classified specimens, that is, the analyses in which the source population was

included in the reference sample, sex was specified and 56 variables were

employed. Using the 0.5 posterior probability/0.01 typicality probability

combination, five "correct" test individuals (2.5%) would be falsely rejected. Using

the same criteria, 16 (8%) incorrect determinations would be falsely accepted.

Using the 0.8 posterior probability/0.01 typicality probability combination, 17

(8.5%) of the test individuals would be rejected even though they were correct,

and five (2.5%) "incorrect" determinations would be considered correct. Thus,

neither of the recommended combinations of posterior probability and typicality

probability enables us to be confident that the ancestry of a specimen has been

correctly determined.
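
The screening logic behind these counts can be made explicit with a short sketch. It assumes, for illustration, that a determination is accepted when both its posterior and typicality probabilities are at or above the chosen cut-offs. The `Determination` record, the `screen` function and the four example records are hypothetical and are not the study's data; only the counts passed to `percent_of_sample` (out of 200 test individuals) come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Determination:
    correct: bool       # did the assigned group match the known population?
    posterior: float    # posterior probability of the assigned group
    typicality: float   # typicality probability for the assigned group


def screen(determinations, pp_min, tp_min):
    """Return (falsely_rejected, falsely_accepted) for one pair of cut-offs.

    Assumption: a result is accepted when both probabilities are >= the cut-offs.
    """
    falsely_rejected = 0
    falsely_accepted = 0
    for d in determinations:
        accepted = d.posterior >= pp_min and d.typicality >= tp_min
        if d.correct and not accepted:
            falsely_rejected += 1
        elif not d.correct and accepted:
            falsely_accepted += 1
    return falsely_rejected, falsely_accepted


def percent_of_sample(count, sample_size=200):
    """Express a count as a percentage of the test sample."""
    return 100 * count / sample_size


# Hypothetical records, for illustration only (not the study's data)
example = [
    Determination(True, 0.95, 0.40),
    Determination(True, 0.62, 0.30),
    Determination(False, 0.85, 0.20),
    Determination(False, 0.45, 0.005),
]

print(screen(example, 0.5, 0.01))   # cut-offs 0.5 / 0.01
print(screen(example, 0.8, 0.01))   # cut-offs 0.8 / 0.01

# Counts reported in the text, converted to percentages of 200 individuals
for count in (5, 16, 17):
    print(count, percent_of_sample(count))
```

Raising the posterior cut-off in this sketch trades one kind of error for the other, which is the pattern the reported counts show.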

With the foregoing in mind, a sectioning point for the posterior probability and

typicality probabilities was calculated from the results of the analyses that yielded

the highest number of correctly classified specimens. The posterior probabilities

associated with incorrect assignments ranged from 0.389 to 0.991, while the

typicality probabilities ranged from 0.000 to 0.952 (Table 15). This indicates that,

for an ancestry determination to be considered correct without ambiguity, the

posterior probability must be greater than 0.991 and the typicality probability

must be higher than 0.952. Using these criteria, only two determinations out of

200 (1.0%) would be considered unambiguously correct and the rest would have

to be considered unclassifiable. Clearly, if in the best case scenario only 1.0% of

FORDISC's attributions can be accepted with confidence, this has serious

implications for the program's utility.

Table 15: Range of posterior and typicality probabilities for correct and incorrect assignments by population

              CORRECT                             INCORRECT
Population    PP MIN   PP MAX   TP MIN   TP MAX   PP MIN   PP MAX   TP MIN   TP MAX

Berg          .646     1.0      0.0      .947     .521     .876     0.0      .643
Santa Cruz    .752     1.0      0.077    .942     -        -        -        -
N. Japan      .593     1.0      .196     .964     .447     .850     0.0      .952
Tasmania      .873     1.0      .043     .935     .436     .991     .327     .690
Zulu          .546     1.0      0.0      .964     .389     .939     .440     .482
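
The sectioning point described above can be recovered directly from the INCORRECT columns of Table 15. The following is a minimal sketch: the values are transcribed from the table, while the variable and function names are illustrative and the strict "greater than" comparison follows the wording of the text.

```python
# Maximum posterior (PP) and typicality (TP) probabilities among INCORRECT
# assignments, transcribed from Table 15. Santa Cruz is omitted because the
# table reports no incorrect assignments for it.
incorrect_max = {
    "Berg": (0.876, 0.643),
    "N. Japan": (0.850, 0.952),
    "Tasmania": (0.991, 0.690),
    "Zulu": (0.939, 0.482),
}

# The sectioning point is the highest value reached by any incorrect result.
pp_cut = max(pp for pp, _ in incorrect_max.values())
tp_cut = max(tp for _, tp in incorrect_max.values())


def unambiguous(posterior, typicality):
    """True only if both probabilities exceed every incorrect assignment."""
    return posterior > pp_cut and typicality > tp_cut


print(pp_cut, tp_cut)

# Share of the 200 test individuals that met both criteria, per the text
print(100 * 2 / 200)
```

Because the two maxima come from different populations (Tasmania for the posterior probability, N. Japan for the typicality probability), a determination has to clear both at once, which is why so few results qualify.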

Even this may overestimate FORDISC's accuracy. As noted in the Materials and

Methods, Howells selected 50-55 crania of each sex to represent each group.

For a number of groups, this meant that only a small percentage of the available

individuals were measured. For example, the 26th-30th Dynasty Egyptian crania

were selected from a sample of nearly 1800. Significantly, the individuals were

not chosen at random. Rather, Howells "carefully selected" specimens that he

considered to be typical of the group (Howells 1995: 3). Crania that were

"morphologically unusual for the population as a whole" (Howells 1989: 89) were

not included, even if there were no obvious pathological changes to account for

the differences. Thus, Howells' data collection strategy was such that the degree

of overlap among the reference populations is likely to be artificially low. Given

that the accuracy of classification in DFA is inversely correlated with the degree

of overlap among groups, it is likely that the analyses reported here overestimate

the accuracy of FORDISC.
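
The inverse relationship between overlap and accuracy can be illustrated with a deliberately simplified case: two groups, a single normally distributed measurement, equal variances and equal prior probabilities. Under those assumptions, which are far simpler than FORDISC's multivariate setting, the expected proportion correctly classified is Φ(d/2), where d is the distance between the group means in standard deviation units. The function names below are illustrative only.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def expected_accuracy(d):
    """Expected proportion correct for two equal-variance, equal-prior groups
    whose means are d standard deviations apart (cut-off at the midpoint)."""
    return normal_cdf(d / 2.0)


# Smaller d means more overlap between the groups and lower accuracy.
for d in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(d, round(expected_accuracy(d), 3))
```

In this toy setting, removing atypical individuals from each group shrinks the within-group spread, which increases d and therefore the apparent accuracy.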

There is a further reason for suspecting that the study reported here may have

overestimated the utility of FORDISC. A number of the collections Howells

analyzed did not include "mandibles or skeletal parts to aid in the diagnosis" of

sex (Howells 1989: 91). Consequently, sex was frequently assessed on cranial

morphology alone. Although Howells attempted to corroborate his estimates with

those of other researchers who had examined the remains, he admitted that

some of the skulls of known sex "would certainly have been assigned to the

wrong sex if it had been done by inspection" (Howells 1989: 94). This suggests

that the sexes of Howells' populations may be more different than they should

be. The corollary of this is that the success rate of FORDISC in the analyses in

which sex was specified may have been artificially high.

4.3 Future considerations

FORDISC's utility may be limited because the nature of human variation is such

that ancestry cannot be determined from skeletal remains, as Williams et al.

(2005) have suggested. However, the importance of determining ancestry is

great enough that it would seem sensible to investigate other possibilities before

concluding that ancestry is an aspect of the biological profile that cannot be

accessed from the human skeleton.

One potential cause of FORDISC's poor performance is its reliance on

two-dimensional measurement data. Three-dimensional landmark data may capture

more of the morphological differences among populations and therefore provide

a better basis for determining the ancestry of unknown specimens. Although

studies are beginning to use three-dimensional geometric morphometric methods

to explore population history and climate signals in modern human cranial

morphology (e.g. Harvati and Weaver 2006b), none has attempted to apply these

methods to estimate ancestry of unknown remains.

A second potential cause of FORDISC's poor performance is its reliance on

cranial data. Work on the utility of the cranium for reconstructing primate

phylogeny raises the possibility that the cranium is either an inadequate source

of information regarding ancestry or perhaps even a misleading one (cf. Collard

and Wood 2000). Although earlier studies met with limited success using

postcranial data for ancestry determination (Marino 1997; Ballard 1999; Holliday

and Falsetti 1999; Patriquin et al. 2002), it may be worthwhile investigating

whether supplementing cranial data with postcranial data and/or data from the

teeth and lower jaw provides more accurate determinations of ancestry.

A third potential cause of FORDISC's poor performance is its reliance on

Discriminant Function Analysis. It is possible that FORDISC's success rate is so

limited because DFA does not distinguish the form of similarity that is informative

with respect to ancestry (shared derived similarity) from forms of similarity that

are not informative regarding ancestry, such as shared primitive similarity and

convergent similarity. Accordingly, it would be worthwhile trying to adapt

phylogenetic methods that focus on shared derived similarity, such as cladistics

(Hennig 1966), to the problem of determining the ancestry of unknown skeletal

specimens.

While these possibilities are being explored, FORDISC will almost certainly

continue to be used to assist with ancestry determinations. With this in mind,

there would seem to be a pressing need to expand FORDISC's reference

samples. Ideally this would involve maximizing both the numbers of individuals

and populations represented, and ensuring that as many temporal periods are

covered as possible. Although Jantz and Ousley have supplemented the

Forensic Databank with new material, they have not similarly augmented the

Howells samples in FORDISC. While some remains have already been

repatriated, it would seem advisable to take advantage of the large number of

skeletal collections available in institutions around the world to fill in the temporal,

geographic or representational gaps in FORDISC's reference sample.

There is also a pressing need to investigate the relationship between number of

variables and success rate in greater detail. In the current study the maximum

number of variables common to all groups was compared to the recommended

minimum according to the FORDISC manual to determine how variable number

affected FORDISC's success rate. Although this provided a clear indication that

10 variables are insufficient to achieve good results, it did not establish what a

reasonable minimum might be. Given the fragmentary nature of many

bioarchaeological and forensic specimens, it would be useful to repeat the

analyses with 20, 30 and 40 variables to determine whether the classification rate

improves consistently as the number of variables increases or whether it levels

off.

Lastly, during the 2007 FORDISC 3.0 workshop, Jantz and Ousley outlined a

new option in the program that allows a specimen to be analyzed on the basis of

shape alone. The option was developed to "neutralize" the confounding effects

of sex (Jantz and Ousley 2007: 40). Given the marked impact that controlling for

sex had on FORDISC's success rate in the current study, it would be sensible to

examine whether employing the shape-only option results in more specimens

being correctly classified than when ancestry is determined on the basis of shape

and size. If the former proves to be the case, then the shape-only option may

improve the success rate of FORDISC when dealing with specimens that cannot

be sexed with confidence.

While this new transformation option might ensure that an unknown is assessed

on the basis of shape alone and is not significantly smaller than the reference

samples, it is not uncomplicated. Other evidence suggests that males and

females of a given population are not simply different sized variants of the same

basic form (Wood and Lynch 1996). As non-metric assessments attest, there are

clear shape differences between males and females irrespective of ancestry. If

males tend to have similar proportions regardless of size or population, removing

size would not necessarily help FORDISC achieve the correct ancestry. If this is

the case, then using the new shape transformation function in FORDISC 3.0

would result in males clustering with males and prove only that a skull has a male

shape and not that the shape necessarily relates to ancestry.
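
Why a size correction would not remove sex-related shape differences can be seen in a small sketch. It uses one common size adjustment from morphometrics, dividing each measurement by the geometric mean of all measurements. Whether FORDISC 3.0's shape option uses this particular transformation is not stated here, so it should be read as an assumption, and the measurement values are invented for illustration.

```python
from math import prod


def shape_variables(measurements):
    """Divide each measurement by the geometric mean of the set.

    Assumption: the geometric mean stands in for overall size; this is not
    necessarily the transformation implemented in FORDISC 3.0.
    """
    size = prod(measurements) ** (1.0 / len(measurements))
    return [m / size for m in measurements]


def max_difference(a, b):
    """Largest absolute difference between two specimens' shape variables."""
    return max(abs(x - y) for x, y in zip(shape_variables(a), shape_variables(b)))


# Hypothetical crania (three measurements in mm), not taken from any dataset
large = [190.0, 140.0, 135.0]
small_same_shape = [0.9 * m for m in large]   # uniformly scaled copy
small_other_shape = [171.0, 132.0, 121.5]     # similar size, different proportions

print(max_difference(large, small_same_shape))    # effectively zero
print(max_difference(large, small_other_shape))   # clearly non-zero
```

A uniformly scaled specimen becomes indistinguishable from the original after the adjustment, but a specimen with different proportions does not. If those proportions track sex rather than ancestry, the adjusted data would still group specimens by sex.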

5. Conclusions

This study explored several issues related to the computer program FORDISC.

In particular, it addressed problems related to population representation in the

database, the number of variables to use in an analysis, the effect of constraining

sex, the effect of anatomical region, and the challenge of interpreting the results.

This research was undertaken in part because these issues are fundamental to

the appropriate use of the program. As FORDISC becomes more popular, a

danger lies in investigators using the program without fully understanding its

limitations. Additionally, the ongoing FORDISC debate has done little to resolve

the questions that have arisen around the program's performance. In fact, it

seems that each time a criticism of the program is raised, FORDISC's developers

add a new caveat to its use. Given the popularity of FORDISC and the

confidence placed in it, it was deemed important to determine not only how

effective the program is, but whether or not the criticisms of it are valid.

In total, this study carried out four sets of analyses on four separate datasets for

200 individuals from within FORDISC's reference sample. The test datasets were

selected to include the range of possibilities in terms of both variable number and

anatomical region, while the test individuals were chosen from five populations

representing separate geographic regions. The first set of analyses tested each

dataset using all populations (including the one from which the test individual was

drawn) and both sexes. The second set of analyses also included all populations,

but restricted FORDISC's comparison to members of the same sex. The third set

of analyses excluded the test individual's source population but used both males

and females of the remaining groups. The fourth set of analyses excluded the

test individual's source population and compared it only to members of the same

sex.
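
The design summarized above is a crossing of two conditions (source population included or excluded, both sexes or same sex only) applied to each of the four variable sets. The sketch below simply enumerates those combinations. The labels are taken from the text and Appendix III, while the variable names and the assumption that every one of the 200 test individuals was classified once per combination are illustrative.

```python
from itertools import product

datasets = ["56 whole cranium", "10 basicranium", "10 neurocranium", "10 face"]
source_population = ["included", "excluded"]
sex_comparison = ["both sexes", "same sex only"]

# The four sets of analyses, in the order described in the text
analysis_sets = list(product(source_population, sex_comparison))

# Each set of analyses is applied to each of the four datasets
runs = [(dataset, source, sex)
        for (source, sex) in analysis_sets
        for dataset in datasets]

for number, (source, sex) in enumerate(analysis_sets, start=1):
    print(f"Set {number}: source population {source}, {sex}")

test_individuals = 200
print(len(runs), "dataset/condition combinations")
print(len(runs) * test_individuals, "individual classifications (assumed)")
```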

The results of this study support FORDISC's developers' caution against using

the program if a representative population is not available. However, if a

population is not represented in the database, FORDISC cannot be expected to

find a closely related population - either geographically or genetically. This

suggests that while FORDISC may be useful in very restricted contexts, its

widespread use on geographically or temporally remote populations is not

acceptable.

With respect to variable number, the results contradict FORDISC's developers'

contention that using too many variables reduces performance. Instead, this

study found that FORDISC only achieved reasonable rates of success when the

number of variables was maximized. Reducing the number of variables to the

level recommended by the developers for the size of the reference sample

resulted in exceedingly low success rates. The results were also not consistent

between test populations. Consequently, these results suggest that the program

should not be used on incomplete remains if sufficient numbers of measurements

cannot be obtained.

This study also determined that FORDISC was more accurate in assigning

ancestry when comparing a specimen only to members of its own sex. When

both sexes of each population were included in the comparison, FORDISC did

not consistently select the appropriate ancestry. Unfortunately, it did not

necessarily select the same sex either. While these results suggest that size may

be confounding FORDISC's determination of ancestry, the problem requires

further investigation.

The issue of how anatomical region affects FORDISC's ability to determine

ancestry was not fully resolved by this research. Although the neurocranial

region achieved the best results overall, all three regions performed very poorly.

Furthermore, the results varied across the five test populations. However, it was

not possible to settle this question through the current FORDISC program as the

number of variables associated with each anatomical region is limited.

Lastly, the issue of how best to interpret the results in terms of the recommended

posterior and typicality probabilities was also not fully resolved. At the levels

recommended by the FORDISC 3.0 manual, more incorrect determinations

would erroneously be considered correct. However, at the levels recommended

by the FORDISC 3.0 workshops, more correct determinations would be rejected

as incorrect. Neither of these recommendations appeared to correspond with a

natural sectioning point between correct and incorrect attributions. However,

when a sectioning point was calculated directly from the data, almost every

ancestry determination had to be considered either inconclusive or incorrect.

As it stands, FORDISC requires the population, the time period, the sex and as

many measurements as possible for a set of remains before it can be expected

to return a reasonable estimation of ancestry. Furthermore, unless FORDISC returns

a posterior probability higher than 0.991 and a typicality probability higher than

0.952, the resulting ancestry determination must be considered ambiguous.

Given this situation, the only conclusion that can be drawn is that if FORDISC is

used at all, it should only be under extremely restricted circumstances or to

provide limited confirmation of information gathered through other means.

References

Albanese, J. and S.R. Saunders
2006 Is it possible to escape racial typology in Forensic Identification? In Forensic Anthropology and Medicine: Complementary Sciences from Recovery to Cause of Death. Schmitt, A., Cunha, E. and J. Pinheiro eds. Totowa: Humana Press Inc.

Angel, J.L.
1976 Colonial to modern skeletal change in the U.S.A. American Journal of Physical Anthropology. 45:723-736.

Anthropolog
2005 Newsletter of the Department of Anthropology. National Museum of Natural History. Accessed 02/22/08 via http://www.google.com/search?q=american+academy+of+forensic+science+fordisc+workshop&sourceid=navclient-ff&ie=UTF-8&rlz=1B2GGFB_enCA218&aq=t

Arlington National Cemetery Website
2005 Richard Vandergeer, Second Lieutenant, USAF memorial page (http://www.arlingtoncemetery.net/rvandergeer.html). Accessed: 01/25/2007.

Ballard, M.E.
1999 Anterior femoral curvature revisited: race assessment from the femur. Journal of Forensic Sciences. Vol. 44:4.

Bass, W.M.
1995 Human Osteology: A Laboratory and Field Manual. Columbia, MO: Missouri Archaeological Society.

Beals, K., Smith, C.L. and S.M. Dodd
1983 Climate and the evolution of brachycephalization. American Journal of Physical Anthropology. Vol. 62:4.

Belcher, R., Williams, F. and G.J. Armelagos
2002 Misidentification of Meroitic Nubians using Fordisc 2.0. (Abstract) American Journal of Physical Anthropology. Vol. 117, Supplement 34:42.

Bendor-Samuel, J. and R.L. Hartell (editors)
1989 The Niger-Congo Languages - A classification and description of Africa's largest language family. Lanham, Maryland: University Press of America.

Boas, F.
1911 Changes in bodily form of descendants of immigrants. In Reports of the Immigration Commission (1907-1910), Vol. 38. Washington: Government Printing Office.

Brues, A.M.
1991 The Once and Future Diagnosis of Race. In Skeletal Attribution of Race. Gill, G.W. and S. Rhine eds. Albuquerque, NM: Maxwell Museum of Anthropology.

Buikstra, J.E. and D.H. Ubelaker
1994 Standards for Data Collection from Human Skeletal Remains. Fayetteville, AR: Arkansas Archaeological Society.

Campbell, L.
1997 American Indian Languages: The Historical Linguistics of Native America. New York: Oxford University Press.

Campbell, A.R. and G.J. Armelagos
2007 Assessment of FORDISC 3.0's accuracy in classifying individuals from WW Howell's populations and the forensic data bank. (Abstract) American Journal of Physical Anthropology. Vol. 132, Suppl. 44, pp. 83-84.

Carey, J.W. and A.T. Steegmann Jr.
1981 Human Nasal Protrusion, Latitude, and Climate. American Journal of Physical Anthropology. Vol. 56:3.

Cavalli-Sforza, L.L., Menozzi, P. and A. Piazza
1994 The history and geography of human genes. Princeton: University Press.

Collard, M. and B. Wood
2000 How reliable are human phylogenetic hypotheses? Proceedings of the National Academy of Sciences (PNAS). Vol. 97:9.

Coon, C.S., Garn, S.M. and J.B. Birdsell
1950 Races: a study of the problems of race formation in man. Springfield, IL: Charles C. Thomas.

Corruccini, R.S.
1974 An examination of the meaning of cranial discrete traits for human skeletal biological studies. American Journal of Physical Anthropology. Vol. 40:3.

Cox, Katharine, N.G. Tayles and H.R. Buckley
2006 Forensic Identification of 'Race': The Issues in New Zealand. Current Anthropology. Vol. 47:5.

Cunningham, D.L. and D.J. Wescott
2002 Within-group human variation in the Asian Pleistocene: the three Upper Cave crania. Journal of Human Evolution. Vol. 42:627-638.

Franciscus, R.G. and J.C. Long
1991 Variation in human nasal height and breadth. American Journal of Physical Anthropology. Vol. 85:4.

Freid, D., Spradley, M.K., Jantz, R.L. and S.D. Ousley
2005 The truth is out there: how NOT to use FORDISC. (Abstract) American Journal of Physical Anthropology. Vol. 126, Supplement 40.

Fukuzawa, S. and A. Maish
1997 Racial Identification of Ontario Iroquoian Crania Using FORDISC 2.0. (Abstract) from the 44th annual meeting of the Canadian Society of Forensic Science. Accessed via http://www.csfs.ca/journal/reginabstr.htm

Giles, E. and O. Elliot
1962 Race identification from cranial measurements. Journal of Forensic Sciences. Vol. 7:147-157.

Gill, G.W. and M. Gilbert
1990 Race identification from the midfacial skeleton: American blacks and whites. In Skeletal Attribution of Race. Gill, G.W. and S. Rhine eds. Albuquerque, NM: Maxwell Museum of Anthropology.

Harvati, K. and T.D. Weaver
2006a Reliability of cranial morphology in reconstructing Neandertal phylogeny. In Neandertals revisited: new approaches and perspectives. Harvati, K. and T. Harrison, eds. Dordrecht: Springer. 239-254.
2006b Human Cranial Anatomy and the Differential Preservation of Population History and Climate Signatures. The Anatomical Record, Part A, 288A:1225-1233.

Hennig, W.
1966 Phylogenetic systematics. Urbana: University of Illinois Press.

Hiernaux, J.
1963 Heredity and environment: their influence on human morphology; a comparison of two independent lines of study. American Journal of Physical Anthropology. Vol. 21:575-589.

Holliday, T.W. and A.B. Falsetti
1999 A new method for discriminating African-American from European-American skeletons using postcranial osteometrics reflective of body shape. Journal of Forensic Sciences. Vol. 44:5, 926-30.

Howells, W.W.
1973 Cranial Variation in Man: A Study by Multivariate Analysis of Patterns of Difference Among Recent Human Populations. Papers of the Peabody Museum of Archaeology and Ethnology, Volume 67.
1989 Skull Shapes and the Map. Cambridge, MA: Papers of the Peabody Museum of Archaeology and Ethnology, Volume 78.
1995 Who's who in skulls: ethnic identification of crania from measurements. Cambridge, MA: Peabody Museum of Archaeology and Ethnology, Volume 82.
1996 Howells' craniometric data on the internet. American Journal of Physical Anthropology. Vol. 101:3.

Hubbe, M. and W.A. Neves
2007 On the Misclassification of Human Crania. Discussion. Current Anthropology. Vol. 48, pp. 285-288.

Huberty, C.J.
1994 Applied Discriminant Analysis. In Wiley series in probability and mathematical statistics. Applied probability and statistics. New York, NY: Wiley.

Hudjashov, G., Kivisild, T., Underhill, P.A., Endicott, P., Sanchez, J.J., Lin, A.A., Shen, P., Oefner, P., Renfrew, C. and R. Villems
2007 Revealing the prehistoric settlement of Australia by Y chromosome and mtDNA analysis. Proceedings of the National Academy of Sciences (PNAS). 104(21):8726-8730.

Hughes, D.R.
1968 Skeletal plasticity and its relevance in the Study of Earlier Populations. In The Skeletal Biology of Earlier Human Populations. D.R. Brothwell editor, pp. 31-55. London: Thames and Hudson.

Hylander, W.L.
1977 The adaptive significance of Eskimo craniofacial morphology. In Orofacial Growth and Development. Dahlberg, A.A. and T.M. Graber eds. Chicago, IL: Mouton. 129-170.

Jantz, R.L. and L. Meadows Jantz
2000 Secular change in craniofacial morphology. American Journal of Human Biology. Vol. 12:327-338.

Jantz, R.L. and S.D. Ousley
1992 FORDISC 1.0: Computerized Forensic Discriminant Functions. The University of Tennessee, Knoxville.
1996 FORDISC 2.0: Computerized Forensic Discriminant Functions. The University of Tennessee, Knoxville.
2005 FORDISC 3: Computerized Forensic Discriminant Functions. Version 3.0. The University of Tennessee, Knoxville.
2007 FORDISC 3.0: Theory, Methods and Applications. Workshop held in San Antonio, TX. February 20, 2007.

Kaestle, F.A. and D.G. Smith
2001 Ancient mitochondrial DNA evidence for prehistoric population movement: The numic expansion. American Journal of Physical Anthropology 115(1):1-12.

Keita, S.O.Y.
2007 On Meroitic Nubian Crania, Fordisc 2.0 and Human Biological History. Discussion. Current Anthropology 48:425-427.

Kitson, E.
1931 A Study of the Negro Skull with Special Reference to the Crania from Kenya Colony. Biometrika 23(3/4):271-314.

Knight, A., Underhill, P.A., Mortensen, H.M., Zhivotovsky, L.A., Lin, A.A., Henn, B.M., Louis, D., Ruhlen, M. and J.L. Mountain
2003 African Y Chromosome and mtDNA Divergence Provides Insight into the History of Click Languages. Current Biology 13(6):464-473.

Kosiba, S.
2000 Assessing the Efficacy and Pragmatism of "Race" Designation in Human Skeletal Identification: A Test of Fordisc 2.0 Program (Abstract). American Journal of Physical Anthropology. Vol. 111, Supplement 30:200.

Leathers, A., Edwards, J. and G.J. Armelagos
2002 Assessment of Classification of Crania Using Fordisc 2.0: Nubian X-Group Test (Abstract). American Journal of Physical Anthropology. Vol. 117, S34:99-100.

Lieberman, D., Krovitz, G.E., Yates, F.W., Devlin, M. and M. St. Claire
2004 Effects of food processing on masticatory strain and craniofacial growth in a retrognathic face. Journal of Human Evolution. Vol. 46:6.

Lovvorn, M.B., Gill, G.W., Carlson, G.F., Bozell, J.R. and T.L. Steinacher
1999 Microevolution and the Skeletal Traits of a Middle Archaic Burial: Metric and Multivariate Comparison to Paleoindians and Modern Amerindians. American Antiquity. Vol. 64, No. 3, pp. 527-545.

Mangold, W.L., Nawrocki, S.P. and J. Scherbauer
1993 The Shaffer Site (12 GR 109): Additional information on an Albee Phase Site in the White River Valley. Indiana University. Accessed 01/06/07 via www.gbl.indiana.edu/abstracts/93/mangold_93.html

Naar, N.A., D. Hilgenberg and G.J. Armelagos
2006 Fordisc 2.0 the ultimate test: What is the truth? (Abstract) American Journal of Physical Anthropology. Vol. 129, Supplement 42:136.

Nicholson, E. and K. Harvati
2006 Quantitative analysis of human mandibular shape using three-dimensional geometric morphometrics. American Journal of Physical Anthropology. Vol. 131:3, 368-383.

O'Connell, J.F. and J. Allen
1998 When Did Humans First Arrive in Greater Australia and Why Is It Important to Know? Evolutionary Anthropology. Vol. 6:132-146.

Omoto, K. and N. Saitou
1997 Genetic origins of the Japanese: A partial support for the dual structure hypothesis. American Journal of Physical Anthropology 102(4):437-446.

Patriquin, M.L., Steyn, M. and S.R. Loth
2002 Metric assessment of race from the pelvis in South Africans. Forensic Science International. Vol. 127:1-2, pp. 104-113.

Pietrusewsky, M.
2000 Metric Analysis of Skeletal Remains: Methods and Applications. In Biological Anthropology of the Human Skeleton. M.A. Katzenberg and S. Saunders eds. New York, NY: Wiley-Liss. 375-416.

Redd, A.J. and M. Stoneking
1999 Peopling of Sahul: mtDNA Variation in Aboriginal Australian and Papua New Guinean Populations. American Journal of Human Genetics 65(3).

Roseman, C.C.
2004 Detecting interregionally diversifying natural selection on modern human cranial form by using matched molecular and morphometric data. Proceedings of the National Academy of Sciences (PNAS). Vol. 101:35, 12824-12829.

Roseman, C.C. and T.D. Weaver
2004 Multivariate apportionment of global human craniometric diversity. American Journal of Physical Anthropology. Vol. 125:257-263.

Sejrsen, B., Lynnerup, N. and M. Hejmadi
2005 An historical skull collection and its use in forensic odontology and anthropology. Journal of Forensic Odontostomatology. 2005 Dec. 23(2):40-4.

Skelton, R.R.
1996 A Suggested Method for Using Means Data in Discriminant Functions Using Anthropometric Data. Journal of World Anthropology. Vol. 1(4).

Skelton, R. and H. McHenry
1992 Evolutionary relationships among early hominids. Journal of Human Evolution. Vol. 23:309-349.

Smith, B.H., Garn, S.M. and W.S. Hunter
1986 Secular trend in face size. Angle Orthodontist. Vol. 56:196-204.

Spradley, M.K., Ousley, S.D. and R.L. Jantz
2008 Evaluating Cranial Morphometric Relationships using Discriminant Function Analysis. (Abstract) American Journal of Physical Anthropology. Vol. 135: S46, 199.

Steadman, D.W., Adams, B.J. and L.W. Konigsberg
2006 Statistical basis for positive identification in forensic anthropology. American Journal of Physical Anthropology. Vol. 131(1), pp. 15-26.

Ubelaker, D.H., Ross, A.H. and S.M. Graver
2002 Application of Forensic Discriminant Functions to a Spanish Cranial Sample. Forensic Science Communications 4(3).

Walsh, S.J. and C. Eckhoff
2007 Australian Aboriginal population genetics at the D1S80 VNTR locus. Annals of Human Biology. Vol. 34:5, 557-565.

Webb, R.E. and D.J. Rindos
1997 The Mode and Tempo of the Initial Human Colonization of Empty Landmasses: Sahul and the Americas Compared. p. 233-250.

Wescott, D.J. and R.L. Jantz
2005 Assessing Craniofacial Secular Change in American Blacks and Whites Using Geometric Morphometry. In Modern Morphometrics in Physical Anthropology. New York: Kluwer Academic/Plenum Publishers. p. 231-45.

Williams, F. L'Engle, Belcher, R.L. and G.J. Armelagos
2005 Forensic Misclassification of Ancient Nubian Crania: Implications for Assumptions about Human Variation. Current Anthropology 46(2):340-346.

Williams, Paul B., Erickson, P. and L. Niven
2001 Retrieving History: The 18th Century Mortuary History of the Little Dutch Church, Halifax. Paper Presented At The 33rd Annual Meeting of The Canadian Archaeological Association.

Wood, B. and D. Lieberman
2001 Craniodental variation in Paranthropus boisei: a developmental and functional perspective. American Journal of Physical Anthropology. 116:13-25.

Wood, C. and J.M. Lynch
1996 Sexual dimorphism in the craniofacial skeleton of modern humans. In Advances in Morphometrics. F.L. Marcus, M. Corti, A. Loy, G.J.P. Naylor and D.E. Slice, editors. NATO ASI Series A: Life Sciences Vol. 284.

Wright, R.V.S.
1992 Correlation between cranial form and geography in Homo sapiens: CRANID - A computer program for forensic and other applications. Archaeology in Oceania (27):128-34.
2005 Guide to using the CRANID program CR5Ind.exe. Accessed via http://box.net/public/richwrig/dfiles/CR5Ind.ZIP

Appendix I

Howells' measurements used in FORDISC

Measurement   Description

GOL           glabello-occipital (maximum cranial) length
NOL           nasio-occipital length
BNL           basion nasion (cranial base) length
BBH           basion bregma height
XCB           maximum cranial width
XFB           maximum frontal breadth
STB           bistephanic breadth
ZYB           bizygomatic breadth
AUB           biauricular breadth
WCB           minimum cranial breadth
ASB           biasterionic breadth
BPL           basion prosthion length
NPH           nasion prosthion height
NLH           nasal height
OBH           orbital height
OBB           orbital breadth
JUB           bijugal breadth
NLB           nasal breadth
MAB           palate breadth
MDH           mastoid height
MDB           mastoid width
ZMB           bimaxillary breadth
SSS           zygomaxillary subtense
FMB           bifrontal breadth
NAS           nasio-frontal subtense
EKB           biorbital breadth
DKS           dacryon subtense
DKB           interorbital breadth
NDS           naso-dacryal subtense
WNB           simotic chord
SIS           simotic subtense
IML           malar length, inferior
XML           malar length, maximum
MLS           malar subtense
WMH           cheek height
SOS           supraorbital projection
GLS           glabella projection
FOL           foramen magnum length
FRC           nasion-bregma chord
FRS           nasion-bregma subtense
FRF           nasion-subtense fraction
PAC           bregma-lambda chord
PAS           bregma-lambda subtense
PAF           bregma-subtense fraction
OCC           lambda-opisthion chord
OCS           lambda-opisthion subtense
OCF           lambda-subtense fraction
VRR           vertex radius
NAR           nasion radius
SSR           subspinale radius
PRR           prosthion radius
DKR           dacryon radius
ZOR           zygoorbitale radius
FMR           frontomalare radius
EKR           ectoconchion radius
ZMR           zygomaxillare radius
AVR           M1 alveolus radius
NAA           nasion angle ba-pr
PRA           prosthion angle na-ba
BAA           basion angle na-pr
NBA           nasion angle ba-br
BBA           basion angle na-br
SSA           zygomaxillare angle
NFA           nasio-frontal angle
DKA           dacryal angle
NDA           naso-dacryal angle
SIA           simotic angle
FRA           frontal angle
PAA           parietal angle
OCA           occipital angle
BRR           bregma radius
LAR           lambda radius
OSR           opisthion radius
BAR           basion radius

Appendix II

Howells populations used in FORDISC and their sample sizes (test samples marked with an asterisk).

Abbreviation³   Population                   Location            Males/Females

NOR             Medieval Norse               Norway              55/55
ZAL             Medieval Zalavar             Hungary             53/45
BER*            Berg                         Austria             56/53
EGY             Egyptian (26-30 Dynasty)     Egypt               58/53
TEI             Teita                        Kenya               33/50
DOG             Dogon                        Mali                47/52
ZUL*            Zulu                         South Africa        55/46
BUS             Bushman                      South Africa        41/49
AND             Andaman Islanders            Indian Ocean        35/35
AUS             Lake Alexandrina Tribes      South Australia     52/49
TAS*            Tasmanian                    Tasmania            45/42
TOL             Tolai                        Papua New Guinea    56/54
MOK             Mokapu                       Hawaii              51/49
BUR             Buriat                       Siberia             55/54
ESK             Inugsuk Eskimo               Greenland           53/55
ARI             Arikara                      South Dakota        42/27
PER             Yauyos                       Peru                55/55
EAS             Easter Islanders             South Pacific       49/37
AIN             Ainu                         Japan               48/38
NJA*            Hokkaido                     North Japan         55/32
SJA             Kyushu                       South Japan         50/41
HAI             Hainan                       South China Sea     45/38
ANY             Anyang                       Northeast China     42/0
ATA             Atayal                       Taiwan              29/18
PHI             Philippino                   Philippines         50/0
GUA             Indigenous Guam              South Pacific       30/27
MOR             Moriori                      Chatham Islands     57/51
SAN*            Santa Cruz                   California          51/51

³ Used by FORDISC 3.0 when displaying the results of an analysis.

Appendix III

Variable sets used in the current study

Variable Sets        Variables Used

56 whole cranium     ASB, AUB, AVR, BBH, BNL, BPL, DKB, DKR, DKS, EKB, EKR, FMB, FMR, FOL,
                     FRC, FRF, FRS, GLS, GOL, IML, JUB, MAB, MDH, MLS, NAR, NAS, NDS, NLB,
                     NLH, NOL, NPH, OBB, OBH, OCC, OCF, OCS, PAC, PAF, PAS, PRR, SIS, SOS,
                     SSR, SSS, STB, VRR, WCB, WMH, WNB, XCB, XFB, XML, ZMB, ZMR, ZOR and ZYB

10 basicranium       AUB, WCB, ASB, MDH, MDB, OCC, OCS, OCF, FOL and OCA

10 neurocranium      GOL, NOL, XCB, XFB, FMB, FRS, PAC, PAF, FRA and PAA

10 face              BNL, NLB, MAB, DKB, NDS, WNB, NAR, DKR, PRA and DKA
