FORDISC AND THE DETERMINATION OF ANCESTRY FROM CRANIOMETRIC DATA
By
Marina Elliott
B.A., The University of British Columbia, 2005
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF ARTS
In
THE DEPARTMENT OF ARCHAEOLOGY
© Marina Elliott, 2008
SIMON FRASER UNIVERSITY
Summer 2008
All rights reserved. This work may not be reproduced in whole or in part, by photocopy
or other means, without permission of the author.
APPROVAL
Name: Marina Elliott
Degree: Master of Arts
Title of Thesis: FORDISC and the determination of ancestry from craniometric data
Examining Committee:
Chair: Catherine D'Andrea, Graduate Program Chair
Mark Collard, Senior Supervisor, Associate Professor, Archaeology
Mark Skinner, Supervisor, Professor, Archaeology
Brian Chisholm, Internal Examiner, Senior Instructor, University of British Columbia
Date Defended/Approved:
SIMON FRASER UNIVERSITY LIBRARY
Declaration of Partial Copyright Licence
The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.
The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection (currently available to the public at the "Institutional Repository" link of the SFU Library website <www.lib.sfu.ca> at: <http://ir.lib.sfu.ca/handle/1892/112>) and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.
The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.
It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.
Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.
While licensing SFU to permit the above uses, the author retains copyright in the thesis, project or extended essays, including the right to change the work for subsequent purposes, including editing and publishing the work in whole or in part, and licensing other parties, as the author may desire.
The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.
Simon Fraser University Library
Burnaby, BC, Canada
Revised: Fall 2007
Abstract
FORDISC is a computer program designed to determine ancestry from human
skeletal remains. It is widely used, yet its accuracy has been challenged. In this
study, 200 specimens from one of FORDISC's reference samples are used to
investigate four issues that are central to debate: (1) the inclusion of the source
population in the reference sample, (2) the influence of sex, (3) the impact of
variable number, and (4) the effect of different anatomical regions.
The results indicate that the source population must be present and the sex of
the specimen known before FORDISC can provide an accurate determination of
ancestry. Additionally, a determination will be successful only if more than 10
measurements pertaining to multiple anatomical regions are used. Even when
these conditions are met, few determinations may be considered unambiguously
correct. Overall, FORDISC performed below expectations and the results
suggest that the program should be used cautiously.
Keywords: FORDISC; ancestry determination; cranial morphology; forensic identification; discriminant function analysis
Subject Terms: biological anthropology; forensics; craniometry; skull; human variation
Acknowledgements
This research could not have happened without the encouragement and
assistance of many people. In particular, I would like to thank my supervisor, Dr.
Mark Collard for his generous advice, support and patience throughout this
process. In addition to all of his other duties and responsibilities, he always
seemed to have time for my questions and concerns. I would also like to thank
my committee members, Dr. Mark Skinner and Dr. Brian Chisholm, both of whom
took precious time out of their summer schedules to read and provide feedback
on this research.
In addition, I am extremely fortunate to have an excellent group of colleagues,
friends and family members. I am especially grateful to Alan Cross, Mana
Dembo, Kevan Edinborough, Luseadra McKerracher and the other members of
the Laboratory of Biological Anthropology whose intelligence, curiosity and
enthusiasm for their research inspired my own efforts. Many thanks also go to my
friends and family for providing valuable comments, welcome distractions and
incalculable kindnesses along the way. Although no words can truly express how
lucky I am to have them, thanks also go to my parents - their example gives me
something to strive for.
Finally, I would like to thank my husband, Robin Elliott. His writing and editing
contributions were invaluable, as were his computer skills when things went
awry. More importantly, his love, support, encouragement and apparently
endless tolerance of my interests (academic and otherwise) are a constant
source of wonder and admiration to me. I hope I have made him proud.
Table of Contents
Approval ii
Abstract iii
Acknowledgements iv
Table of Contents v
List of Tables vii
List of Figures viii
1. Introduction 1
1.1. Aims and objectives 1
1.2. FORDISC and its applications 3
1.3. The FORDISC debate 6
1.4. Issues investigated 12
1.5. Outline of analyses 16
2. Materials and Methods 18
2.1. Data 18
2.2. Analyses 20
3. Results 28
3.1. Impact of including source population and specifying sex 28
3.1.1. Number of correct assignments accepting all posterior and typicality probabilities 28
3.1.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability 31
3.1.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability 32
3.1.4. Summary 33
3.2. Impact of variable number 34
3.2.1. Number of correct assignments accepting all posterior and typicality probabilities 34
3.2.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability 36
3.2.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability 38
3.2.4. Variable number and population differences 39
3.2.4.1. Number of correct assignments accepting all posterior and typicality probabilities 39
3.2.4.2. Number of correct classifications using >0.5 posterior probability and >0.01 typicality probability 44
3.2.4.3. Number of correct classifications using >0.8 posterior probability and >0.01 typicality probability 46
3.2.5. Summary 48
3.3. Impact of cranial region 49
3.3.1. Number of correct assignments accepting all posterior and typicality probabilities 49
3.3.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability 51
3.3.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability 53
3.3.4. Cranial region and population differences 54
3.3.4.1. Number of correct assignments accepting all posterior and typicality probabilities 54
3.3.4.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability 58
3.3.5. Summary 61
4. Discussion 62
4.1. Main findings 62
4.2. Implications for use of FORDISC 66
4.3. Future considerations 70
5. Conclusions 74
References 78
Appendix I 86
Appendix II 89
Appendix III 91
List of Tables
Table 1. Total number of test specimens correctly classified (n=200) 30
Table 2. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200) 32
Table 3. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200) 33
Table 4. Total number of test specimens correctly classified by variable number (n=200) 36
Table 5. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200) 38
Table 6. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200) 39
Table 7. Results by population accepting all posterior and typicality probabilities (n=40) 42
Table 8. Results by population using >0.5 posterior probability and >0.01 typicality probability (n=40) 45
Table 9. Results by population using >0.8 posterior probability and >0.01 typicality probability (n=40) 47
Table 10. Total number of test specimens correctly classified (n=200) 50
Table 11. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality criteria (n=200) 52
Table 12. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200) 54
Table 13. Total results for each cranial region by population (n=40) 57
Table 14. Results for each cranial region using >0.5 posterior probability and >0.01 typicality probability (n=40) 60
Table 15: Range of posterior and typicality probabilities for correct and incorrect assignments by population 68
List of Figures
Figure 1: Genetic tree for 26 European populations 26
Figure 2: Genetic tree for 33 African populations 26
Figure 3: Genetic tree for 21 Asian populations 27
Figure 4: Genetic tree for 23 American populations 27
1. Introduction
1.1. Aims and objectives
Determining ancestry from skeletonized human remains is an important task for
bioarchaeologists and forensic anthropologists. As part of a biological profile, this
information is used in a wide range of contexts, including the study of the
movements and interactions of past populations, ancestral land claims,
repatriation requests and the investigation of unlawful deaths and human rights
violations (Cox et al. 2006).
Despite attempts to use other skeletal elements (e.g. Marino, 1997; Ballard,
1999; Holliday and Falsetti, 1999; Patriquin et al. 2002), the skull continues to be
regarded as the most reliable area for determining ancestry (Bass 1995). As a
result, both non-metric and metric techniques have been developed to effect
ancestry determinations from the skull. The use of discrete traits, such as the
presence or absence of shoveled incisors, is common. However, non-metric
characteristics are not exhaustive or always consistently defined and few
standards exist for their collection (Buikstra and Ubelaker 1994). Additionally,
non-metric methods have been challenged for being more susceptible to
inter-observer error (Corruccini 1974).
Due to their perceived objectivity and accuracy, metric assessments of the skull
have achieved wide acceptance for assessing ancestry from skeletal remains
(e.g. Giles and Elliot 1962). Furthermore, the development of statistical methods
and computer technologies to manipulate large datasets has contributed to the
widespread use of craniometric methods. In particular, user-friendly computer
software applications designed to make ancestry determinations quickly and
easily have become popular.
Currently, FORDISC (Jantz and Ousley 2005) is the leading computer program
for ancestry determination. Although it is widely used, its application to questions
of ancestry is not unproblematic and its accuracy and reliability have been
questioned (Fukuzawa and Maish 1997; Kosiba 2000; Belcher et al. 2002;
Leathers et al. 2002; Ubelaker et al. 2002; Williams et al. 2005; Hubbe and
Neves 2007). In response, FORDISC's developers argue that the program's
apparent failures are due to inappropriate use of the program and/or
interpretation of results (Freid et al. 2005). In particular, they warn against testing
individuals whose populations are not represented in the database. They also
claim that using too many variables reduces success.
Given the importance of determining ancestry from skeletonized remains and the
confidence placed in FORDISC, there was a pressing need to resolve the issues
that have been raised regarding its accuracy. Accordingly, this study focused on
several key areas of debate. In particular, it evaluated the effect of including or
excluding the source population from the reference sample on FORDISC's
accuracy. It also examined how number of variables affects FORDISC's success
rate. To test the impact of constraining sex, test specimens were compared to
reference groups of both sexes and to same-sex groups alone. The effect of
using specific cranial regions on FORDISC's ability to determine ancestry was
also tested using datasets that isolated the basicranium, neurocranium and face.
1.2. FORDISC and its applications
Developed by Richard Jantz and Steve Ousley in association with the University
of Tennessee, FORDISC (short for Forensic Discriminants) was designed to
provide rapid and accurate ancestry determinations for crania of unknown origin
through Discriminant Function Analysis (DFA) of skull measurements. It also
offers ancestry and stature estimations from postcranial measurements. The
program was first commercially released in 1992. A second version followed in
1996. The current version, FORDISC 3.0, was released in 2005.
Before the program became publicly available, Jantz and Ousley provided
custom discriminants for individual specimens by request (Jantz and Ousley
2005). These ancestry determinations were made through comparisons with the
Forensic Anthropology Data Bank (FDB), a repository of U.S. forensic cases from
the 19th and 20th centuries. By the time FORDISC 1.0 was released the program
incorporated a much larger database of craniometric measurements collected by
W.W. Howells (Howells 1973; 1989). Howells' dataset includes values for 70
measurements recorded on more than 2500 crania from 29 populations. The
populations come from Africa, Europe, Asia, Australia/Pacific Islands and the
Americas, and range in time from 600 B.C. to the mid-20th century. Incorporating
Howells' dataset significantly broadened the geographic and temporal range of
FORDISC's comparative sample. Since the release of the first version of the
program, Jantz and Ousley have augmented the Forensic Data Bank with data
from new U.S. cases and added a sample of males taken from modern forensic
cases in Guatemala.
The first two versions of FORDISC offered ancestry determinations through DFA
of up to 21 cranial measurements. In the current version, users may select up to
82 measurements when using Howells' data or 42 when using the Forensic Data
Bank. However, Jantz and Ousley (2005) note that because some
measurements were not taken on some individuals, sample sizes are limited by
the measurements selected.
From its inception, FORDISC has been a popular tool among bioarchaeologists.
Shortly after the release of version 1.0, Mangold et al. (1993) used FORDISC to
perform a two group DFA of 21 cranial measurements to corroborate a qualitative
trait-based analysis of the sex of a set of pre-contact Native American skeletal
remains. In this study, Mangold et al. concluded that the results "strongly aligned
the specimen with Amerindian females rather than males" (p. 2). In 2001,
Williams et al. (2001) used FORDISC 2.0 to explore the ancestry of several
individuals buried in a German settler's graveyard in Halifax, Nova Scotia. The
FORDISC results led Williams et al. to conclude that the remains were
"non-European" and to involve the local Mi'kmaq chief in the investigation. FORDISC
2.0 was also used to assess 80 historical crania in the collections of the Institute
of Forensic Medicine in Copenhagen (Sejrsen et al. 2005). Although many of the
crania were marked only with "a general geographic or racial descriptor" (p. 40),
the authors of the study claimed to confirm ethnicity in 70% of the cases.
FORDISC has also been used to analyze more ancient human remains. Lovvorn
et al. (1999) used FORDISC 2.0 to compare a male burial specimen from Sidney,
Nebraska with males from Howells' database. Using only six measurements
because the midface and orbits were missing, FORDISC 2.0 selected "Eskimo"
as the most likely population, followed by "Ainu" (Japan). Based on these results,
the authors concluded that the specimen possessed a "blend of Amerindian and
earlier protomongoloid traits" (p. 527) and that this was consistent with the
"hypothesis that Plains Amerindians descended from the earliest wave of
Paleoindians who crossed the Bering Straits" (p. 527). In another study,
Pleistocene remains from the site of Zhoukoudian (UC101, UC102 and UC103)
were compared to a reference sample that combined Howells' data with data
from additional Amerindian groups (Cunningham and Westcott 2002). The
authors concluded that their results supported the contention that the remains
"do not represent a family but are relatively contemporaneous" (p. 636).
In addition to being used in historical research, FORDISC is used regularly to
assist with identifications in forensic cases. For example, in 2000, an Arlington
Cemetery press report described FORDISC as "a key piece of software" used by
the U.S. Army Central Identification Laboratory in Hawaii for "automating the
process of matching skeletal remains" (ANC 2007).
FORDISC has become sufficiently popular that Jantz and Ousley now run
workshops focusing on the program during the American Academy of Forensic
Sciences annual meetings (Anthropolog 2005). Designed to help anthropologists,
archaeologists and forensic professionals carry out and interpret the results of
FORDISC analyses, these workshops cover a variety of topics such as statistical
parameters, the estimation of ancestry from postcranial material, "problem"
crania, and secular change (Jantz and Ousley 2007).
1.3. The FORDISC debate
Despite its popularity, the utility of FORDISC for ancestry determination has been
challenged. In 1997, Fukuzawa and Maish (1997) tested FORDISC 2.0 with 59
crania from two known Ontario Iroquoian sites. Using both complete and partial
crania, the authors compared Iroquoian individuals with seven populations from
Howells' dataset and found FORDISC to be an unreliable identifier of ancestry.
Similarly, Kosiba (2000) tested a series of East Indian crania and found that
FORDISC 2.0 was unable to consistently classify the sample.
In 2002, two studies used ancient Nubian crania to test FORDISC 2.0. In the first,
Belcher et al. (2002) analyzed 47 Meroitic Nubians and found little consistency in
either biological affinity or sex attribution. The authors concluded that the
program was flawed and challenged "the utility of any forensic application that
attempts to constrain worldwide human cranial variability" (p. 42). In the second
study, Leathers et al. (2002) tested FORDISC 2.0 with a collection of
post-Meroitic Nubian crania using 12 cranial measurements. Only 57% of the 89
specimens were classified as African and the research team concluded that
FORDISC 2.0's classifications were "not morphologically or biologically accurate"
(p. 99).
A third evaluation of FORDISC 2.0 was published in 2002 (Ubelaker et al. 2002).
This study tested the program with a medieval Spanish sample. The authors
reasoned that, if the program was accurate, the test specimens should be
classified as one of the European populations in the reference sample. The study
achieved "a variety of results" (p. 3). Using the Forensic Data Bank, FORDISC
2.0 classified 44% of the test sample as white, 35% as black, 9% as Hispanic,
4% as American Indian and 3% each as Chinese or Vietnamese. Using Howells'
database, the 95 test individuals were classified into 21 different groups: 25%
were classified as Egyptian, followed by Austrian with 11%. The remaining
specimens were scattered across 19 different populations ranging from Andaman
Islanders (7%) to Zulu (2%). Despite these diverse results, the authors concluded
that FORDISC was still a "useful forensic tool" (p. 4).
In 2005, another study using Nubian crania was published by the same authors
as the 2002 Meroitic paper (Williams et al. 2005). This study used 42 test
specimens instead of the 47 used previously, and 12 variables based on their
availability and diagnostic value. The authors reasoned that, if FORDISC was
accurate, it would group the Nubians together and classify them as Howells'
26th-30th dynasty Egyptians since the two groups were geographic neighbours.
According to Williams et al. (2005), FORDISC "failed both tests" (p. 345).
FORDISC's developers did not respond to the various criticisms of the program
until the Williams et al. (2005) study was published. At that point, they suggested
that the "disputed results" were due to the use of "inappropriate reference
samples" (Freid et al. 2005: 103). Citing the limitations of Discriminant Function
Analysis, Jantz and Ousley (2005: 16) noted that any function "will classify an
unknown ... regardless of its actual ethnic group" and cautioned against testing
"an individual whose race or ethnic group is not represented in the reference
samples".
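The point that a discriminant function will always return some group can be illustrated with a minimal Python sketch. It is not FORDISC's implementation. The group names, the two-measurement centroids and the identity pooled covariance matrix (which reduces squared Mahalanobis distance to squared Euclidean distance) are all hypothetical values chosen for illustration.

```python
# Hypothetical group centroids for two standardized measurements.
# Assumption: the pooled covariance matrix is the identity, so squared
# Mahalanobis distance equals squared Euclidean distance.
CENTROIDS = {
    "Group A": (0.0, 0.0),
    "Group B": (3.0, 0.0),
    "Group C": (0.0, 3.0),
}


def squared_distance(x, centroid):
    """Squared distance between a specimen and a group centroid."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))


def classify(x, centroids=CENTROIDS):
    """Return (nearest group, squared distance to that group).

    The function has no option to answer "none of the above": whatever
    the specimen looks like, the closest reference group is returned.
    """
    distances = {g: squared_distance(x, c) for g, c in centroids.items()}
    nearest = min(distances, key=distances.get)
    return nearest, distances[nearest]


# A specimen close to Group A and one far from every reference group.
print(classify((0.2, -0.1)))
print(classify((40.0, 35.0)))
```

Both calls return a group label. Only the size of the returned distance hints that the second specimen resembles none of the reference groups, which is why the developers stress that the source population should be represented in the reference sample.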
Jantz and Ousley (2005) also suggested that the critics had failed to properly
evaluate the posterior and typicality probabilities provided by FORDISC. These
probabilities are mathematical calculations used to evaluate the likelihood of
group membership (Pietrusewsky 2000). Posterior probabilities are a relative
measure of membership and sum to 1, while typicality probabilities assess
"whether the unknown individual could belong to any of the groups" based on the
absolute distances to each group (Albanese and Saunders 2006: 287). Jantz and
Ousley (2005) recommended that a population attribution be accepted only if the
posterior probabilities were 0.5 or more and the typicality probabilities higher than
0.01.¹
The debate was not settled at this point, however. In 2007 Current Anthropology
published a discussion of FORDISC 2.0. In a reassessment of the Williams et al.
(2005) paper, Hubbe and Neves (2007) suggested that the study was flawed
because it had used only 12 variables, a number they considered to be "far from
enough to classify a skull on the basis of discriminant functions" (p. 285). In
response, Williams and Armelagos (2007) pointed out that FORDISC tutorials
frequently use 12 or fewer variables and that "there is no stipulated number of
variables ... simply because forensic evidence is often fragmentary" (p. 286).
They also pointed out that more variables would not necessarily improve success
if the measurements are collinear or non-diagnostic.
Williams and Armelagos (2007) also criticized Hubbe and Neves (2007) for using
Howells' data as both test and control. They suggested that an 'independent'
sample - one whose individuals were not specifically included in the database -
should have been used and cited Naar et al. (2006) as an example of such a
study. However, Naar et al. (2006) also used Howells' data, specifically the 111
crania that make up the entire Egyptian sample. FORDISC only placed 55
(49.5%) of the sample back into the Egyptian group at the appropriate statistical
level. While the use of individuals from within the database may not provide an
independent test of the program, doing so should result in high levels of success
because the individuals already exist in the reference sample. A failure on
FORDISC's part to attribute members of its core sample appropriately would
suggest a significant problem with the program.
¹ In the Freid et al. (2005) paper, Jantz and Ousley recommend accepting a determination only if the typicality probability is 0.1 or more. However, this appears to be a typographical error. The FORDISC 3.0 manual suggests that typicality probabilities are "interpretively similar to the univariate p value based on the normal distribution" (Jantz and Ousley, 2005: 48) and that "TPs below 0.05 (5%), or certainly 0.01 (1%) for a group... indicate questionable membership... or measurement error" (Jantz and Ousley, 2005: 46). These comments lead me to believe that 0.01 is the acceptable typicality probability value rather than 0.1.
In the next issue of Current Anthropology, Williams, Belcher and Armelagos
(2007) replied to another critique of the 2005 study. In this discussion, Keita
(2007) suggested that Williams et al. (2005) had overemphasized the role of
non-genetic factors in cranial development and noted that a "demonstration of
similarity using multivariate analyses does not always mean identity, close recent
common origin, or even origin in an adjacent region" (p. 425). Williams and
Armelagos (2007) responded by saying they had been criticized "for a paper that
we did not write" (p. 426) and that Keita had misunderstood their intent in
highlighting the conditions of growth. They stressed that their previous study had
been undertaken to demonstrate the "lack of fit between conceptual models...
and actual patterns of human biological variation" (p. 426) and maintained the
position that FORDISC is both functionally and conceptually flawed due to the
complexity of this variation.
Most recently, Campbell and Armelagos (2007) used a new individual scores
option in FORDISC to test samples taken from within both the W.W. Howells and
Forensic Data Bank reference groups. In this study, FORDISC was able to
correctly classify 73.1% of Howells' individuals and 72.0% of the FDB individuals
using the Freid et al. (2005) probability levels when the sex was unspecified.
When sex was constrained, the results improved to 80.7% and 78.6%
respectively. Although Armelagos had previously contributed to almost every
study that challenged FORDISC and been a vocal opponent of the program
(Belcher et al. 2002; Leathers et al. 2002; Williams et al. 2005; Naar et al. 2006),
Campbell and Armelagos (2007) did not suggest that the program was flawed.
Instead, they concluded that the results achieved by FORDISC were
"approaching the limit of craniometric analysis to assign group membership" (p.
84).
Last, Jantz and Ousley have suggested that secular change may be responsible
for FORDISC's inconsistent performance in some cases. In particular, they
suggest that Americans (both "White" and "Black") have changed significantly
over the past 150 years in "response to unparalleled environmental change"
(Wescott and Jantz 2005: 242). As a result, they recommend that the Forensic
Data Bank should only be used "on individuals born in the 20th century" while
Howells' data "may be more appropriate for older specimens" (Jantz and Ousley
2005: 17). Certainly, secular changes have been well documented (Boas 1911;
Angel 1976; Smith et al. 1986; Jantz and Meadows-Jantz 2000). However, the
extent to which they complicate ancestry determination is not well understood.
In sum, there are a number of unresolved issues with respect to FORDISC. In
particular, the significance of testing individuals whose populations are not
represented in the reference sample has still not been determined. There are
also inconsistencies with respect to how specifying sex affects FORDISC's
accuracy. The guidelines for determining which variables are the most effective
and how many to use are also unclear. Lastly, the recommendations for
accepting an attribution based on the posterior and typicality probabilities differ in
the FORDISC literature. While the manual still recommends using a posterior
probability of 0.5, the FORDISC 3.0 workshops run by Jantz and Ousley now
suggest that "posterior probabilities <0.8 have a higher probability of being
incorrect than correct" (Jantz and Ousley 2007: 33). Since FORDISC continues
to be used regularly in biological anthropology and forensic settings, the study
reported here was undertaken to contribute to the resolution of these important
questions.
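The posterior and typicality probabilities discussed in this section, and the competing acceptance thresholds, can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than a description of FORDISC's internal calculations: posterior probabilities are derived from squared Mahalanobis distances assuming equal priors and a pooled covariance matrix, typicality is taken as the upper-tail chi-square probability of the squared distance (one common choice), and the group names and distances are hypothetical.

```python
import math


def posterior_probabilities(d2_by_group):
    """Posterior probabilities from squared Mahalanobis distances.

    Assumes equal prior probabilities and a pooled covariance matrix,
    so each group's weight is proportional to exp(-D^2 / 2).
    """
    d2_min = min(d2_by_group.values())
    # Subtract the minimum before exponentiating for numerical stability.
    weights = {g: math.exp(-0.5 * (d2 - d2_min)) for g, d2 in d2_by_group.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def chi_square_typicality(d2, n_variables):
    """Upper-tail chi-square probability of a squared distance.

    Uses the closed form that holds only for an even number of
    variables (degrees of freedom); a simplification for this sketch.
    """
    if n_variables <= 0 or n_variables % 2 != 0:
        raise ValueError("this sketch handles a positive, even number of variables")
    half = d2 / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(n_variables // 2))
    return math.exp(-half) * series


def evaluate(d2_by_group, n_variables, min_posterior=0.5, min_typicality=0.01):
    """Return (best group, posterior, typicality, accepted?).

    Accepts when posterior >= min_posterior and typicality > min_typicality.
    """
    posteriors = posterior_probabilities(d2_by_group)
    best = max(posteriors, key=posteriors.get)
    typicality = chi_square_typicality(d2_by_group[best], n_variables)
    accepted = posteriors[best] >= min_posterior and typicality > min_typicality
    return best, posteriors[best], typicality, accepted


# Hypothetical squared distances for four measurements.
near = {"Group A": 4.0, "Group B": 6.0, "Group C": 12.0}
far = {"Group A": 30.0, "Group B": 40.0, "Group C": 45.0}

print(evaluate(near, 4))                     # accepted at the 0.5 threshold
print(evaluate(near, 4, min_posterior=0.8))  # rejected at the 0.8 threshold
print(evaluate(far, 4))                      # high posterior, very low typicality
```

In the first hypothetical case the attribution passes the 0.5 posterior threshold but fails the 0.8 threshold, so the choice between the two recommendations changes the outcome. In the last case the posterior probability is high because posteriors are relative and must sum to 1, yet the typicality probability shows that the specimen is far from every group.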
1.4. Issues investigated
The first issue addressed in this study is the impact of the presence or absence
of a specimen's source population in FORDISC's reference sample. As
mentioned above, a number of researchers have sought to test FORDISC by
analyzing specimens of known origin. FORDISC's developers have rejected most
of these tests on the grounds that the test specimens' source populations were
not included in FORDISC's reference sample. It is true that DFA "require[s] in
principle that unknowns belong to one of the groups in the analysis from which
the functions were derived" (Keita 2007: 425). However, biodistance research is
based on a close relationship between morphology and ancestry. As Roseman
(2004: 12824) notes, biodistance studies assume that "populations that share
recent common ancestry and or exchange a large number of migrants should
resemble one another more than geographically isolated and distantly related
populations". Thus, if there were no relationship between craniometrics and
ancestry, Jantz and Ousley could not continue to claim that FORDISC will
classify individual crania "into the group with which they have the closest affinity"
(Spradley et al. 2008). Furthermore, in our increasingly mobile society "a
representative of almost any population in the world could end up being a
forensic case in almost any place in the world" (Ubelaker et al. 2002: 2).
Consequently, a program that requires an investigator to determine which
populations are represented before running an analysis may have very limited
application for real-world investigations.
The second issue investigated in the study is the effect of specifying the sex of a
target specimen versus leaving its sex unspecified. Several studies found
differences in affinity attribution when the sex was altered (Belcher et al. 2002;
Williams et al. 2005; Campbell and Armelagos 2007). By comparing a test
specimen to both males and females, these studies expected FORDISC to
correctly identify both population and sex on the basis that male and female
skulls of a given population are more similar to each other than either is to
another population (Williams et al. 2005). With this in mind, this study tested
whether the population attribution changed when the sex was unspecified versus
when it was restricted to the sex provided by Howells (sex specified).
The third issue addressed in this study is the impact of number of variables on
FORDISC's accuracy. While Jantz and Ousley (2005: 44) admit that "good
separation and classification of many groups requires many variables" they also
argue that "using too many variables produces overfitting and unreliable apparent
accuracy". Similarly, Williams and Armelagos (2007: 286) suggest that using
"additional variables that are collinear or that are not diagnostic may reduce the
efficacy of classification." In contrast, Hubbe and Neves (2007: 285) found that
"the number of variables used rather than the anatomical region measured" was
the most critical factor affecting FORDISC's discriminant ability. Although there is
little consensus as to what constitutes a "sufficiently" large number of variables in
a multivariate analysis (Pietrusewsky 2000), Jantz and Ousley (2005: 49)
suggest that a "reasonable recommended maximum number of variables seems
to be the minimum sample size among all groups divided by three". This is based
on Huberty's (1994) results. Although they suggest fewer variables may be
effective, as a minimum, Jantz and Ousley recommend "10 variables for reliable
comparisons" (2005: 49).
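The two guidelines quoted above (a maximum equal to the smallest group sample size divided by three, and a minimum of 10 variables) can be expressed as a small Python sketch. The function name, the choice to round the maximum down, and the group sample sizes are assumptions made for illustration only.

```python
def recommended_variable_range(group_sample_sizes, minimum=10):
    """Return (minimum, maximum) variable counts under the quoted guidelines.

    maximum: smallest group sample size divided by three
             (rounded down here, which is an assumption)
    minimum: 10 variables for reliable comparisons
    """
    maximum = min(group_sample_sizes.values()) // 3
    return minimum, maximum


# Hypothetical reference groups and sample sizes.
sizes = {"Group A": 55, "Group B": 111, "Group C": 42}
low, high = recommended_variable_range(sizes)
print(low, high)

# With a small reference group the two guidelines cannot both be met.
small_low, small_high = recommended_variable_range({"Group D": 24})
print(small_low, small_high, small_high < small_low)
```

The second call shows that when the smallest reference group is small, the recommended maximum can fall below the recommended minimum, so an analyst cannot satisfy both guidelines at once.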
A fourth factor that may be contributing to FORDISC's inconsistent performance
relates to the anatomical region analyzed. While most researchers recognize that
all morphology is the result of combined genetic, developmental, and
environmental factors, cranial morphology has been considered a reasonable
proxy for geographic origin. This is particularly true of the facial region, with the
midface and nose considered the most diagnostic of ancestry (Brues 1990; Gill
and Gilbert 1990). However, many studies have shown the face to be particularly
susceptible to external stresses related to diet, conditions of growth, cultural
practices and/or climatic adaptations (Coon et al. 1950; Hiernaux 1963; Hughes
1968; Hylander 1977; Carey and Steegmann 1981; Franciscus and Long 1991;
Skelton and McHenry 1992; Lieberman et al. 2004; Roseman 2004; Roseman
and Weaver 2004; Nicholson and Harvati 2006). As a result, facial anatomy may
not preserve population history adequately. Instead, the basicranium has been
put forward as a better indicator of ancestry because it is more phylogenetically
stable (Olson 1981; Wood and Lieberman 2001; Harvati and Weaver 2006b).
And although it may be subject to climatic influences as well (Beals et al. 1983),
the neurocranium has also been suggested as a reasonable proxy for population
history (Roseman 2004). In light of these issues, and the fact that more than 50%
of FORDISC's measurements relate to the face, it was deemed important to
determine how anatomical region affected the program's success rate.
1.5. Outline of analyses
All analyses were conducted on individuals taken directly from the Howells
reference sample employed by FORDISC. These individuals were only analyzed
against the Howells reference groups and not against the Forensic Databank
samples. This was done to address the question of using members whose
populations are not represented in the database and to give FORDISC the
greatest opportunity for success. As mentioned in the Introduction, there is
disagreement as to whether or not this is an appropriate test of the program's
accuracy in attributing affinity to unknown remains (Hubbe and Neves 2007;
Williams and Armelagos 2007). However, because the test individuals are part of
the reference sample, if the program functions correctly, it should successfully
place the majority with their source population.
To determine the effect of using an individual whose population was not
represented in the database, all analyses were run once with the source
population included and once with it excluded. To test the effect of using different
numbers of variables, analyses used variable sets containing the maximum
number of variables common to all groups (56) and the minimum number
recommended by FORDISC (10). To assess how the choice of anatomical
region affects FORDISC's ability to identify ancestry, the
measurements were divided into sets of basicranial, neurocranial and facial
variables. Lastly, to test the effect of sex selection, analyses used both sexes as
well as the appropriate sex for the test individual. For all analyses, the results
were calculated three times: once with no probability or typicality limitations, once
with 0.5 posterior probability and 0.01 typicality probability values, and once with
a more strict 0.8 posterior probability criterion.
Given the above, the following results were expected. Using individuals whose
populations were represented in the database would result in high numbers of
correct returns for all analyses. FORDISC was expected to be able to classify
individuals using either 56 or 10 variables. With the source population excluded,
FORDISC was expected to place test individuals into a closely related group as
determined by genetic and linguistic data.
In general, if sex is not a confounding factor, the sex-unspecified (SU) and
sex-specified (SS) analyses should return similar results. In practice, however, the
results for SS could be expected to be better because the number of groups in the
comparison is reduced.
With respect to variable number, FORDISC was expected to classify the greatest
number of test specimens correctly using the 56-variable dataset. Following
Hubbe and Neves (2007), more variables should provide greater discriminatory
power. At worst, adding more variables would simply fail to improve
discrimination and result in a plateau effect.
For the anatomical regions, if cranial morphology tracks population history then
the basicranium should produce the best results (Olson 1981; Wood and
Lieberman 2001; Harvati and Weaver 2006). Although it is still not clear whether
the neurocranium relates more closely to climate or to population history (Beals
et al. 1983; Roseman 2004), on the basis of Harvati and Weaver's later work
(2006b), FORDISC was expected to return fewer correct assignments using the
neurocranial variable set than the basicranium. Because studies have shown the
face to be the most susceptible to external stresses, the facial variables were
expected to be the least accurate. If, however, cranial morphology correlates with
a factor other than genetic history, then these predictions would not be
supported.
2. Materials and Methods
2.1. Data
The craniometric data used in this study were collected by William Howells
between 1965 and 1980 (Howells 1996). Howells published the data in a series
of monographs (1973; 1989; 1995) and also made them available upon request
and via the internet. Although the dataset does not cover certain areas (e.g.,
the Indian subcontinent), and the sample sizes for some groups are small (e.g., 29
males and 18 females for Taiwanese Atayal), it is the most comprehensive and
accessible collection of human craniometric data available. As noted earlier, it
also forms the bulk of the reference sample for FORDISC.
The version of Howells' dataset used in this study consists of values for 74 linear
measurements and angles recorded on 2504 crania from 28 populations
representing five geographic regions: Europe, Africa, East Asia, Australia-Pacific,
and the Americas. In an effort to maintain equal sample sizes, Howells tried to
select 50-55 males and females for each of his 28 populations. Although some
groups were deficient in this number, most were reasonably close. Details of the
measurements and angles are given in Appendix I. The names, geographic
locations and sample sizes of the populations are presented in Appendix II.
Although some of the names Howells and FORDISC use for the groups in the
reference sample are no longer considered appropriate, the designations were
maintained to avoid confusion.
The test sample consisted of 200 individuals taken directly from Howells' dataset:
20 males and 20 females from one population in each of the major geographic
areas. The five populations from which the test sample was drawn are the Berg
(Europe), Zulu (Africa), Hokkaido Japanese (East Asia), Tasmanians
(Australia-Pacific) and Santa Cruz (Americas). These groups were chosen because their
sample sizes were relatively large (32-56, mean 48) and related populations
were available within the FORDISC reference sample. Test individuals were not
compared to the Forensic Databank groups as they are not included in that
reference sample.
To evaluate the impact of variable number and cranial region on the accuracy of
ancestry determination in FORDISC, four datasets were created for each test
individual. Hereinafter, these will be referred to as the whole cranium dataset, the
basicranium dataset, the neurocranium dataset and the face dataset. Appendix
III lists the variables used to create the four datasets.
The whole cranium dataset was based on the 56 variables that are common to all
groups represented in Howells' dataset. The complete set of 74 variables was
not employed because Jantz and Ousley (2005: 7) suggest that using
measurements that are not common to all groups "will limit sample sizes
somewhat". The 56 variables used in the whole cranium dataset were selected
with the aid of FORDISC 3.0's select all variables function.
The basicranium, neurocranium and face datasets were each based on 10
variables. Landmarks employed by Roseman (2004), Harvati and Weaver
(2006), and Hubbe and Neves (2007) were used to divide Howells' variables into
cranial region-specific groups. Of all of the measurements available for
conducting an analysis in FORDISC, 10 were associated with the basicranium,
14 related solely to the neurocranium, and 42 were face-specific. However, to
ensure consistency, each set needed to include the same number of variables.
Since the basicranium was represented by only 10 measurements, all of the
available basicranial measurements were used while 10 measurements for each
of the neurocranium and face datasets were randomly selected from their
respective totals.
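The random selection described above can be sketched as a draw without replacement. The thesis does not state how the draw was made, so the generator, the seed, and the placeholder variable labels below are all assumptions. Only the pool sizes (14 neurocranial and 42 facial measurements) and the subset size (10) come from the text.

```python
import random

# Placeholder labels only: the real measurement names are listed in Appendix III.
NEUROCRANIAL = [f"neuro_{i:02d}" for i in range(1, 15)]  # 14 candidates
FACIAL = [f"face_{i:02d}" for i in range(1, 43)]         # 42 candidates


def draw_subset(candidates, k=10, seed=None):
    """Draw k distinct variables at random, without replacement."""
    if len(candidates) < k:
        raise ValueError(f"need at least {k} candidates, got {len(candidates)}")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(candidates, k))


neuro_set = draw_subset(NEUROCRANIAL, seed=2008)
face_set = draw_subset(FACIAL, seed=2008)
print(len(neuro_set), len(face_set))  # 10 10
```

Fixing the seed is a design choice for reproducibility. A different seed would give a different, equally valid 10-variable subset.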
2.2. Analyses
Each dataset was subjected to four analyses. In the first, the source population
was included in the reference sample and the test individual was compared to
both males and females of all available populations (population included/both
sexes). The source population was also included in the reference sample in the
second analysis but the test individual was only compared to specimens of the
relevant sex (population included/same sex). In the third analysis, the source
population was excluded from the reference sample and the test individual was
compared to both males and females (population excluded/both sexes). In the
fourth, the source population was excluded from the reference sample and the
test individual was only compared to specimens of the relevant sex (population
excluded/same sex).
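The four analyses form a two-by-two design (source population included or excluded, crossed with both sexes or same sex) applied to each of the four datasets. A minimal sketch of that layout follows. The variable names are illustrative, and the short codes are those used in the table notes of the Results (ISU, ISS, ESU, ESS).

```python
from itertools import product

DATASETS = ("whole cranium", "basicranium", "neurocranium", "face")
POPULATION = (("I", "population included"), ("E", "population excluded"))
SEX = (("SU", "both sexes"), ("SS", "same sex"))
N_TEST_INDIVIDUALS = 200


def analysis_conditions():
    """Return (code, description) pairs in the order the analyses are described."""
    return [
        (pop_code + sex_code, f"{pop_label}/{sex_label}")
        for (pop_code, pop_label), (sex_code, sex_label) in product(POPULATION, SEX)
    ]


conditions = analysis_conditions()
runs = [(dataset, code) for dataset, (code, _) in product(DATASETS, conditions)]

for code, description in conditions:
    print(code, "=", description)
print(len(runs), "dataset-by-analysis combinations")
print(len(runs) * N_TEST_INDIVIDUALS, "individual classifications in total")
```

The totals printed at the end are simple products of the counts given in the text (4 datasets, 4 analyses, 200 test individuals). They are not figures reported in the thesis.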
Analyses were conducted with and without the source population included
because of the disagreement regarding how FORDISC should be applied. As
mentioned in the introduction, several studies have tested FORDISC's accuracy
using specimens whose source populations were not present in the reference
sample (e.g., Williams et al. 2005). The researchers responsible for these studies
argue that FORDISC should assign a test specimen to a closely related
population in the reference sample in the absence of the source population.
However, Jantz and Ousley (2005) contend that FORDISC should only be used
on a specimen if its population is represented in the reference sample.
'Both sexes' and 'same sex' analyses were carried out to control for the
potentially confounding effects of sexual dimorphism. When the test specimen
was compared only to reference specimens of the same sex, the select all males
or select all females function was used in FORDISC 3.0. The sex of the test
specimen was taken from the "sex" column in the Howells dataset.
With the exception of the source population excluded analyses, test specimens
were compared with all available groups. This was done because of confusion
regarding how many groups to use in an analysis. While Jantz and Ousley (2005)
acknowledge that "discriminant analyses should be run initially using all possible
groups that an unknown may classify into" (p. 44), they also suggest that using
two to five groups will be "more accurate than those involving many more groups"
(ibid). To achieve this improved accuracy, they suggest identifying the groups
with the lowest typicality probabilities and removing them after repeated runs.
However, they admit that typicality probabilities "are by no means foolproof"
(Jantz and Ousley 2005: 16) and do not clarify how many groups or runs are
sufficient. Furthermore, the presence or absence of a group in a particular region
cannot be assumed a priori. Nor, as Keita (2007) points out, can one "ever
really know if an individual's origin population is actually represented" (p. 425).
Overall, the arguments for limiting the number of groups were judged to be
insufficient to justify reducing the number of comparative sample groups in this
study.
To identify the closest relative of a test population, published genetic and
linguistic data were consulted. The best match was then chosen from the populations
available in Howells' dataset. These were the Norse (Europe) for the Berg, the
Teita (Africa) for the Zulu, the Kyushu (East Asia) for the Hokkaido Japanese, the
Yauyos (Americas) for the Santa Cruz, and the mainland Australian Aborigines
for the Tasmanians (Australia-Pacific). These groups were selected for the
following reasons:
1. Norse and Berg. As the ancestors of present day Nordic populations, the
Norse are most closely related to Norwegians and Swedish and are the
nearest genetic match in the database for FORDISC's Berg (Austria)
group over the more distantly related Zalavar (Hungary) group (Figure 1)
(Cavalli-Sforza et al. 1994).
2. Teita and Zulu. 'Teita' is a disused name for a North-Eastern
Bantu-speaking people of Kenya (Kitson 1931). They share genetic and
linguistic ties with the Zulu, a South-Central Bantu-speaking group
(Bendor-Samuel and Hartell 1989). Although the Bushmen (San) tribes
are geographically closer to the Zulu, research shows them to be both
genetically and linguistically more distant from the Zulu than are the Teita
(Figure 2) (Cavalli-Sforza et al. 1994; Knight et al. 2003).
3. Kyushu and Hokkaido Japanese. Cavalli-Sforza et al. (1994) consider
the Kyushu to be an outlier among the Japanese groups (Figure 3).
However, they are genetically closer to the Hokkaido Japanese than the
other East Asian groups in the FORDISC sample, the Ainu and the
Anyang (Omoto and Saitou 1997).
4. Yauyos and Santa Cruz. The indigenous groups of the Yauyos District in
Peru speak Quechua, a dialect in the Andean language group (Kaestle
and Smith 2001). Figure 4 shows Andean speakers as closest to those
who speak Penutian, the language of the Santa Cruz Amerindians
(Cavalli-Sforza et al. 1994). Although the Arikara are geographically
closer to the Santa Cruz Amerindians than the Peruvians, they are
Caddoan speakers in a more distantly related Keresiouan language group
(Campbell 1997).
5. Mainland Australian Aborigines and Tasmanians. The exact timing of
the first migration of humans into Sahul - the Pleistocene landmass that
once connected New Guinea, Australia and Tasmania - is still being
debated (Hudjashov et al. 2007; Redd and Stoneking 1999; Webb and
Rindos 1997). However, the current consensus is that humans colonized
Sahul between 50,000 and 40,000 years ago (Walsh and Eckhoff 2007).
Radiocarbon dates of multiple sites suggest that Tasmania may have
been settled as early as 35,000 years ago (O'Connell and Allen 1998),
which implies a prolonged period of genetic exchange with other Sahul
migrants until ~12,000 years ago, when rising sea levels cut Tasmania off
from mainland Australia (Redd and Stoneking 1999). As such, the
mainland Australian Aborigines were considered to be the closest match
for the Tasmanian group in the FORDISC sample.
To score the results for the analyses that included the source population, an
assignment was considered 'correct' if FORDISC chose the test individual's own
population as the most likely population. For the analyses that excluded the
source population, an assignment was 'correct' when FORDISC selected the
population most closely related to the test individual's source population.
As noted earlier, several combinations of acceptable posterior and typicality
probabilities have been proposed. To reiterate, Jantz and Ousley (2005)
recommended that determinations should be accepted only if the posterior
probability exceeds 0.5 and the typicality probability exceeds 0.01. Later they
suggested that determinations with posterior probabilities less than 0.8 are more
likely to be incorrect than correct (Jantz and Ousley 2007). With this lack of
consensus in mind, the number of correctly classified specimens was calculated
three times: once by accepting all posterior probabilities and typicality
probabilities, once by accepting a determination if the posterior probability was
>0.5 and typicality probability >0.01, and once using a posterior probability >0.8
and typicality probability >0.01. FORDISC 3.0 provides three typicality values:
'ranked', 'F' and 'Chi'. The FORDISC 3.0 manual suggests that ranked or 'R'
typicalities are the most reliable since they do not require multivariate normality.
In contrast, the 'F' ratio typicality can be artificially inflated "as the number of
variables approaches a group's sample size" and the Chi-square typicality
probabilities "tend to call more individuals atypical than F typicality probabilities"
(Jantz and Ousley 2005: 54). Accordingly, the R typicality values were used.
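The scoring procedure described above (own population when it is in the reference sample, closest relative when it is not, counted under three acceptance criteria) can be sketched as follows. This is a hedged illustration rather than the procedure actually used. The class and function names are hypothetical, the population names are those used in the text rather than FORDISC's internal labels, the cut-offs are applied as strict inequalities, and the example results are made up.

```python
from dataclasses import dataclass

# Closest relatives as chosen in the text (names as written in this chapter).
CLOSEST_RELATIVE = {
    "Berg": "Norse",
    "Zulu": "Teita",
    "Hokkaido Japanese": "Kyushu",
    "Santa Cruz": "Yauyos",
    "Tasmanians": "mainland Australian Aborigines",
}

# (minimum posterior, minimum ranked typicality); None means no filter.
CRITERIA = {
    "all accepted": None,
    "posterior > 0.5": (0.5, 0.01),
    "posterior > 0.8": (0.8, 0.01),
}


@dataclass
class Classification:
    source: str        # population the test individual came from
    assigned: str      # population ranked as most likely
    posterior: float   # posterior probability of the assigned group
    typicality: float  # ranked (R) typicality for the assigned group


def is_correct(result, population_included):
    """Own population if it was in the reference sample, else its closest relative."""
    if population_included:
        target = result.source
    else:
        target = CLOSEST_RELATIVE[result.source]
    return result.assigned == target


def passes(result, criterion):
    """True if the result exceeds both probability cut-offs (or no filter is set)."""
    if criterion is None:
        return True
    min_posterior, min_typicality = criterion
    return result.posterior > min_posterior and result.typicality > min_typicality


def count_correct(results, population_included):
    """Count correct assignments under each of the three acceptance criteria."""
    return {
        name: sum(
            1
            for r in results
            if is_correct(r, population_included) and passes(r, criterion)
        )
        for name, criterion in CRITERIA.items()
    }


# Made-up results, for illustration only.
example = [
    Classification("Berg", "Berg", 0.92, 0.40),
    Classification("Berg", "Berg", 0.60, 0.20),
    Classification("Zulu", "Zulu", 0.45, 0.30),
    Classification("Zulu", "Teita", 0.85, 0.50),
    Classification("Berg", "Berg", 0.90, 0.005),
]
print(count_correct(example, population_included=True))
```

Because each stricter criterion only removes assignments, the three counts for any set of results can never increase from the first criterion to the last.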
Figure 1: Genetic tree for 26 European populations (Cavalli-Sforza et al. 1994: 268)
[Dendrogram. Horizontal axis: genetic distance (0 to 0.04). Tip labels: Dutch, Danish, English, Swiss, German, Belgian, Austrian, French, Swedish, Norwegian, Czechoslovakian, Portuguese, Italian, Spanish, Hungarian, Polish, Russian, Scottish, Irish, Finnish, Icelandic, Basque, Yugoslavian, Greek, Sardinian, Lapp.]
Figure 2: Genetic tree for 33 African populations (Cavalli-Sforza et al. 1994: 169)
[Dendrogram. Horizontal axis: genetic distance. Tip labels include: Pygmoid, Bantu N.E., Bantu C.E., Bantu S.W., Bantu C.W., Nilotic, Kunama, Bantu S.E., Bantu N.W., Ubangian, Volta, Ewe, Gur, Mande, Kru, Yoruba, Fulani, Hausa, Bane, Bedik, Funji, Serer, Wolof, Peul, Sandawe, Hadza, San, Somali, Khoi, Mbuti.]
Figure 3: Genetic tree for 21 Asian populations (Cavalli-Sforza et al. 1994: 231)
[Dendrogram. Horizontal axis: genetic distance (0.08 to 0). Tip labels: Turkoman, Uzbek, Turkish, Altaic, North Chinese, Nepalese, Yakut, Sherpa, Twa, Korean, Japanese, Ryukyu, Southwest Honshu, Honshu Kanto, Honshu Chubu, Kyushu, Honshu Kinki, Bhutanese, Tibetan, Ainu, South Chinese.]
Figure 4: Genetic tree for 23 American populations (Cavalli-Sforza et al. 1994: 323)
[Dendrogram. Horizontal axis: genetic distance (0.12 to 0). Tip labels: USSR Eskimo, Chukchi, Koryak, Reindeer Chukchi, North Na-Dene, Canadian Na-Dene, Inupik Eskimo, Greenland Eskimo, Alaskan Eskimo, Canadian Eskimo, Macro-Panoan, South Macro-Chibchan, Andean, Penutian, Keresiouan, North Central Amerind, Macro-Carib, Equatorial, Central Macro-Chibchan, Almosan, South Na-Dene, Macro-Ge, Macro-Tucanoan.]
3. Results
3.1. Impact of including source population and specifying sex
This section addresses the impact of population inclusion and sex selection on
FORDISC's success rate. As discussed, debate surrounds whether or not a
specimen should be tested with FORDISC if its source population is not
represented in the reference database. Similarly, although some studies have
found inconsistencies in classification between using both sexes and using only
the relevant sex, the effect of sex selection on FORDISC's success rate has not
been fully tested.
3.1.1. Number of correct assignments accepting all posterior and typicality
probabilities. The totals for the number of correct assignments for each set of
analyses are given in Table 1. In the analyses of the whole cranium dataset, the
best test results were obtained when the source population was included and the
sex specified (i.e. only relevant sex included in reference sample). In this
analysis, 88.5% of the test specimens were assigned correctly. The next-best
results were obtained when the source population was included and the sex
unspecified (i.e. both sexes included in reference sample). In this case, 82.5% of
the test specimens were assigned correctly. The third-best results were obtained
when the source population was excluded and the sex specified. Here,
FORDISC correctly classified 39.5% of the test specimens. The worst results
were obtained when the source population was excluded and sex unspecified.
This analysis returned 36.5% correct classifications.
The results of the analyses using the basicranium dataset followed the same
pattern as those for the whole cranium dataset. Again, the best results were
achieved when the source population was included and the sex specified. In this
analysis, FORDISC correctly assigned 33.5% of the test specimens. With the
source population included and the sex unspecified, FORDISC assigned 22.5%
of the test sample correctly. With the source population removed, FORDISC
assigned more test specimens correctly with the sex specified than with it
unspecified. In this case, 10.5% of the test sample was correctly classified with
the sex specified compared to 8.5% with the sex unspecified.
The results of the analyses using the neurocranium dataset also followed the
same pattern as those for the whole cranium dataset. The best result (48.5%)
was achieved with the source population included and the sex specified. The
next-best result occurred when the source population was included and the sex
unspecified. In this case, FORDISC classified 41.5% of the test sample correctly.
When the source population was excluded, FORDISC assigned 23.0% of the test
sample correctly with the sex specified and 16.0% with the sex unspecified.
The results of the analyses using the face dataset followed the same pattern as
the results for the previous three sets of analyses. As before, the best result was
achieved when the source population was included and the sex specified. In this
analysis, FORDISC assigned 41.5% of the test sample correctly. When the sex
was unspecified, it assigned 34.0% correctly. With the source population
excluded, FORDISC assigned 23.0% of the test sample correctly with the sex
specified and 15.0% with the sex unspecified.
Table 1. Total number of test specimens correctly classified² (n=200)
Dataset          ISU           ISS           ESU          ESS
Whole cranium    165 (82.5%)   177 (88.5%)   73 (36.5%)   79 (39.5%)
Basicranium      45 (22.5%)    67 (33.5%)    17 (8.5%)    21 (10.5%)
Neurocranium     83 (41.5%)    97 (48.5%)    32 (16.0%)   46 (23.0%)
Face             68 (34.0%)    83 (41.5%)    30 (15.0%)   46 (23.0%)
² All tables use the following format: ISU = source population included, sex unspecified. ISS = source population included, sex specified. ESU = source population excluded, sex unspecified. ESS = source population excluded, sex specified. Bold cell indicates the variable set with the highest success rate for each population. The first value in each cell is the number of test specimens correctly classified. The value in parentheses is the percentage of test specimens correctly classified.
In all four sets of analyses, then, markedly more individuals were correctly
classified when the source population was included than when it was excluded.
More individuals were also correctly classified when comparisons were limited to
specimens of the same sex rather than when the test specimens were compared
to both males and females.
3.1.2. Number of correct assignments using >0.5 posterior probability and
>0.01 typicality probability. Table 2 shows the scores recalculated based on
these criteria. As before, FORDISC achieved the best results using the whole
cranium dataset with the source population included and the sex specified. This
was followed by the source population included and sex unspecified results.
Third-best results for the whole cranium were achieved when the source
population was excluded and the sex specified. FORDISC consistently returned
the fewest correct assignments using the whole cranium when the
source population was excluded and the sex unspecified. As with the total
number of correct assignments, the results for the three other datasets followed
the same pattern.
Table 2. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200)
Dataset          ISU           ISS           ESU          ESS
Whole cranium    160 (80.0%)   171 (85.5%)   49 (24.5%)   54 (27.0%)
Basicranium      4 (2.0%)      10 (5.0%)     2 (1.0%)     4 (2.0%)
Neurocranium     34 (17.0%)    54 (27.0%)    5 (2.5%)     16 (8.0%)
Face             29 (14.5%)    44 (22.0%)    6 (3.0%)     14 (7.0%)
3.1.3. Number of correct assignments using >0.8 posterior
probability and >0.01 typicality probability. Once again, for each dataset,
FORDISC classified the highest number of test specimens correctly when the
source population was included and the sex specified (Table 3). The next-best
results were achieved with the source population included and the sex
unspecified. In the case of the basicranium dataset, the sex-unspecified and
sex-specified analyses each returned one correct classification (0.5%).
The population-excluded results followed a similar pattern. FORDISC achieved
better success when the sex was specified than when it was unspecified.
However, it is worth noting that, with the source population excluded, FORDISC
failed to classify a single individual out of 200 correctly in three cases.
Overall, the numbers of correct assignments were low at the 0.8 posterior
probability level. With one exception, the best results were obtained when the
source population was included and the sex specified. The worst results occurred
when the source population was excluded and the sex unspecified.
Table 3. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)
Dataset          ISU           ISS           ESU          ESS
Whole cranium    139 (69.5%)   156 (78.0%)   24 (12.0%)   38 (19.0%)
Basicranium      1 (0.5%)      1 (0.5%)      0 (0.0%)     0 (0.0%)
Neurocranium     9 (4.5%)      26 (13.0%)    1 (0.5%)     4 (2.0%)
Face             5 (2.5%)      13 (6.5%)     0 (0.0%)     2 (1.0%)
3.1.4. Summary. Regardless of the criteria used to assess the results,
many more test specimens were correctly classified when the source population
was included than when it was excluded. Better results were also obtained when
a test specimen was compared only to reference specimens of the same sex,
rather than to both sexes. It is important to note, however, that the numbers
of correct classifications were extremely low in most analyses. Even when all
posterior and typicality probabilities were accepted, FORDISC achieved no better
than a 48.5% success rate in the majority of analyses. Only in the whole
cranium, population-included analyses were more than half of the test specimens
correctly classified.
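The summary figures can be checked directly against the counts in Table 1. The short sketch below recomputes each percentage from its count (n=200) and lists the analyses in which more than half of the test specimens were classified correctly. The counts are transcribed from Table 1, and the variable and function names are illustrative.

```python
N_TEST = 200

# Correct assignments with all posterior and typicality probabilities accepted.
TABLE_1 = {
    "whole cranium": {"ISU": 165, "ISS": 177, "ESU": 73, "ESS": 79},
    "basicranium": {"ISU": 45, "ISS": 67, "ESU": 17, "ESS": 21},
    "neurocranium": {"ISU": 83, "ISS": 97, "ESU": 32, "ESS": 46},
    "face": {"ISU": 68, "ISS": 83, "ESU": 30, "ESS": 46},
}


def percent(count, n=N_TEST):
    """Success rate as a percentage, rounded to one decimal place."""
    return round(100 * count / n, 1)


rates = {
    (dataset, analysis): percent(count)
    for dataset, row in TABLE_1.items()
    for analysis, count in row.items()
}

above_half = sorted(key for key, rate in rates.items() if rate > 50)
best_of_the_rest = max(rate for key, rate in rates.items() if key not in above_half)

print(above_half)        # [('whole cranium', 'ISS'), ('whole cranium', 'ISU')]
print(best_of_the_rest)  # 48.5
```

Only two of the sixteen analyses exceed 50%, and the best of the remaining fourteen is the 48.5% obtained for the neurocranium with the source population included and the sex specified.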
3.2. Impact of variable number
This section considers the effect of variable number on FORDISC's classification
rate. To reiterate, although studies have shown more variables to be more
effective in discriminating among groups, Jantz and Ousley (2005) maintain that
using large numbers of variables with FORDISC produces poor results due to a
phenomenon they refer to as 'overfitting'.
3.2.1. Number of correct assignments accepting all posterior and typicality
probabilities. The total number of correct assignments for each set of variables is
given in Table 4. For the population-included, sex-unspecified analyses, the
56-variable whole cranium dataset returned the greatest number of correct
classifications. Here, FORDISC correctly assigned 82.5% of the test individuals
to the appropriate population. This was followed by the 10-variable neurocranium
dataset, which classified 41.5% of the test specimens correctly. The face and
basicranium datasets returned 34.0% and 22.5% correct assignments,
respectively.
The results were similar for the population-included, sex-specified analyses.
Again, FORDISC was the most successful with the 56-variable dataset, correctly
assigning 88.5% of the test individuals to the appropriate population. The
10-variable neurocranium dataset achieved the next-best result with 48.5%, followed
by the face with 41.5% and the basicranium dataset with 33.5%.
The results for the analyses that excluded the source population were similar to
the population-included results. With the sex unspecified, FORDISC classified
36.5% of the test specimens with the most closely related population using the
56-variable dataset, compared to 16.0%, 15.0% and 8.5% for the 10-variable
datasets (neurocranium, face and basicranium, respectively).
As with the other analyses, when the source population was excluded and only
the same-sex reference groups used, the best result was achieved using the
56-variable dataset. In this case, FORDISC assigned 39.5% of the test sample to
the most closely related population. Unlike the previous results, however, the
next-best result was shared by the neurocranium and face. Using each of these
datasets, FORDISC classified 23.0% of the test specimens correctly. In keeping
with the other analyses, the basicranium dataset returned the fewest
correct classifications, assigning 10.5% of the test specimens to the most closely
related population.
Thus, regardless of other factors, FORDISC returned significantly more correct
classifications using the 56-variable whole-cranium dataset than with any of the
10-variable datasets.
Table 4. Total number of test specimens correctly classified by variable number (n=200)
Analysis   56 variables   10 variables    10 variables     10 variables
           (cranium)      (basicranium)   (neurocranium)   (face)
ISU        165 (82.5%)    45 (22.5%)      83 (41.5%)       68 (34.0%)
ISS        177 (88.5%)    67 (33.5%)      97 (48.5%)       83 (41.5%)
ESU        73 (36.5%)     17 (8.5%)       32 (16.0%)       30 (15.0%)
ESS        79 (39.5%)     21 (10.5%)      46 (23.0%)       46 (23.0%)
3.2.2. Number of correct assignments using >0.5 posterior probability and
>0.01 typicality probability. Table 5 shows the recalculated scores based on 0.5
posterior probability and 0.01 typicality values for the four sets of analyses. The
impact of the number of variables on the number of correct classifications was
heightened when the recommended probability and typicality values were taken
into account.
For the population-included, sex-unspecified analyses, the best results were
achieved when FORDISC used 56 variables. In this analysis, 80.0% of the test
specimens were correctly classified. Of the 10-variable datasets, the
neurocranium returned the next-best result (17.0%), followed by the face dataset
(14.5%). The basicranium achieved the poorest result, with only four test
individuals (2.0%) correctly assigned.
The results were similar for the population-included, sex-specified analyses.
FORDISC correctly classified 85.5% of the test population using 56 variables as
opposed to 27.0%, 22.0% and 5.0% for the 10-variable neurocranium, face and
basicranium datasets respectively.
Like the population-included analyses, the best results for the
population-excluded, sex-unspecified analyses were obtained using the whole cranium
dataset. Here, FORDISC assigned 24.5% of the test individuals to the most
closely related group. Surprisingly, the next-best results used the face dataset
rather than the neurocranium. In this case, 3.0% of the test specimens were
correctly classified using the face variables versus 2.5% using the neurocranium.
Once again, however, the basicranium dataset achieved the poorest results with
FORDISC placing only two individuals (1.0%) with their related population.
In the population-excluded, sex-specified analyses, the best results were
obtained using the 56-variable whole-cranium dataset. Here, FORDISC correctly
classified 27.0% of the test sample to the most closely related group. The pattern
for the 10-variable datasets followed the first two sets of analyses: the
neurocranium returned the next-best result with 8.0%, followed by the face
(7.0%) and basicranium (2.0%).
Although the number of correct assignments fell significantly when the
recommended posterior probability and typicality values were used, the
56-variable dataset continued to achieve considerably better results than the three
10-variable datasets.
Table 5. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200)
Analysis   56 variables (cranium)   10 variables (basicranium)   10 variables (neurocranium)   10 variables (face)
ISU        160 (80.0%)              4 (2.0%)                     34 (17.0%)                    29 (14.5%)
ISS        171 (85.5%)              10 (5.0%)                    54 (27.0%)                    44 (22.0%)
ESU        49 (24.5%)               2 (1.0%)                     5 (2.5%)                      6 (3.0%)
ESS        54 (27.0%)               4 (2.0%)                     16 (8.0%)                     14 (7.0%)
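The counting rule behind these totals can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure FORDISC itself runs: the record fields (`assigned`, `target`, `posterior`, `typicality`) and the sample values are hypothetical, and only the cut-offs (>0.5 or >0.8 posterior probability, >0.01 typicality probability) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One test specimen's classification (hypothetical record layout)."""
    assigned: str      # group the specimen was assigned to
    target: str        # source group, or most closely related group
    posterior: float   # posterior probability of the assigned group
    typicality: float  # typicality probability for the assigned group


def count_correct(results, min_posterior=float("-inf"), min_typicality=float("-inf")):
    """Count specimens assigned to the target group whose probabilities
    both exceed the cut-offs (strict inequality, as in '>0.5')."""
    return sum(
        1
        for r in results
        if r.assigned == r.target
        and r.posterior > min_posterior
        and r.typicality > min_typicality
    )


# Illustrative values only; these are not specimens from the study.
sample = [
    Result("Zulu", "Zulu", 0.91, 0.40),
    Result("Zulu", "Zulu", 0.45, 0.30),   # correct, but posterior too low
    Result("Berg", "Berg", 0.62, 0.005),  # correct, but atypical
    Result("Norse", "Berg", 0.88, 0.50),  # confident, but wrong group
]

all_accepted = count_correct(sample)            # accept every probability
recommended = count_correct(sample, 0.5, 0.01)  # recommended criteria
strict = count_correct(sample, 0.8, 0.01)       # stricter posterior criterion
```

Because each cut-off only removes specimens from the count, tightening the criteria can never raise the number of correct assignments.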
3.2.3. Number of correct assignments using >0.8 posterior probability and
>0.01 typicality probability. Using the 0.8 posterior probability criterion, the
number of correct assignments fell again. Table 6 summarizes these results.
Once again, for all analyses, the best results were achieved using the 56-variable
whole-cranium dataset. This was followed by the neurocranium, face and
basicranium datasets. However, the 10-variable datasets performed poorly in
general, especially when the source population was excluded.
Table 6. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)
Analysis   56 variables (cranium)   10 variables (basicranium)   10 variables (neurocranium)   10 variables (face)
ISU        139 (69.5%)              1 (0.5%)                     9 (4.5%)                      5 (2.5%)
ISS        156 (78.0%)              1 (0.5%)                     26 (13.0%)                    13 (6.5%)
ESU        24 (12.0%)               0 (0.0%)                     1 (0.5%)                      0 (0.0%)
ESS        38 (19.0%)               0 (0.0%)                     4 (2.0%)                      2 (1.0%)
3.2.4. Variable number and population differences. The previous sections
combined the results for all 200 test individuals. To determine whether the results
are consistent among the test populations, this section re-examines the results
for each geographic group in relation to variable number.
3.2.4.1. Number of correct assignments accepting all posterior and
typicality probabilities. Table 7 shows the total number of correct classifications
for each population. For the population-included, sex-unspecified analyses,
FORDISC achieved the best results using the 56-variable dataset. This was
consistent across all five populations. For the Berg, 85.0% of the test specimens
were classified correctly using the 56-variable dataset as opposed to 47.5% for
the next-best 10-variable dataset (neurocranium). For the Santa Cruz
Amerindians, FORDISC correctly classified 92.5% of the test specimens using 56
variables in contrast to 60.0% for the next-best result (face dataset). Of the
Northern Japanese, 70.0% were correctly classified using 56 variables, with the
next-best result returning only 25.0% (face dataset). The Tasmanian group was
correctly classified in 80.0% of the cases using the 56-variable dataset with the
neurocranium dataset returning the next-best result with 57.5%. For the Zulu,
FORDISC also achieved the best results using 56 variables (85.0%) and the next-best
result using the 10-variable neurocranium dataset (50.0%).
The population-included, sex-specified analyses followed a similar pattern to the
sex-unspecified results. For all five populations, FORDISC was the most
successful using the 56-variable dataset. For the Berg, FORDISC classified
85.0% of the test specimens correctly using 56 variables. The neurocranium
dataset returned the next-best result with 57.5%. All 40 Santa Cruz individuals
(100%) were correctly classified using the 56-variable dataset with the face
returning the next-best result with 62.5%. Of the Northern Japanese, 77.5% were
correctly classified using 56 variables, in contrast to 30.0% for the best
10-variable dataset (basicranium). The Tasmanian group was correctly classified
82.5% of the time using 56 variables with the next-best result coming from the
neurocranium dataset (75.0%). The Zulu group also achieved the best result
using the 56-variable dataset (97.5%) and the next-best using the neurocranium
dataset (47.5%).
The pattern of correct assignments for the population-excluded, sex-unspecified
analyses followed the population-included results for four out of the five
populations. FORDISC correctly classified 15.0% of the Berg specimens using
56 variables as opposed to 10.0% using the next-best 10-variable dataset (face).
For the Santa Cruz population, 45.0% were correctly classified using the
56-variable dataset followed by 20.0% using the facial dataset. Of the
Northern Japanese, 60.0% were correctly classified using 56 variables with the
10-variable face dataset returning the next-best result (15.0%). The Tasmanian
group also achieved better results using 56 variables, classifying 40.0% with this
dataset versus 30.0% using the neurocranium dataset. In contrast, FORDISC
achieved the best results for the Zulu group using the neurocranium dataset.
Here, 27.5% of the test sample was correctly assigned in comparison to 22.5%
when 56 variables were used.
For the population-excluded, sex-specified analyses, three out of the five groups
achieved the best results using the 56-variable dataset. For the Santa Cruz
population, FORDISC classified more specimens correctly using 56 variables
(45.0%) than 10 variables (35.0% using the face dataset). This was also the case
for the Northern Japanese. For this group, FORDISC correctly classified 65.0%
of the specimens using the 56-variable dataset versus 20.0% using the face.
FORDISC also classified more Tasmanians correctly using 56 variables (50.0%)
than with 10 variables (40.0% using the neurocranium dataset). In contrast, both
the Berg and the Zulu groups deviated from the general pattern. For the Berg,
FORDISC classified the same number of specimens using the 10-variable face
dataset as the 56-variable whole cranium. In each case, 17.5% of the test
specimens were correctly classified. For the Zulu, 47.5% of the test specimens
were correctly classified using the neurocranium dataset and only 20.0% using
the 56-variable whole-cranium dataset.
Table 7. Results by population accepting all posterior and typicality probabilities (n=40)
Dataset                         Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
56 variables (whole cranium)    34 (85.0%)   37 (92.5%)   28 (70.0%)   32 (80.0%)   34 (85.0%)
10 variables (basicranium)      11 (27.5%)   8 (20.0%)    7 (17.5%)    7 (17.5%)    12 (30.0%)
10 variables (neurocranium)     19 (47.5%)   15 (37.5%)   6 (15.0%)    23 (57.5%)   20 (50.0%)
10 variables (face)             13 (32.5%)   24 (60.0%)   10 (25.0%)   15 (37.5%)   6 (15.0%)
ISS
56 variables (whole cranium)    34 (85.0%)   40 (100%)    31 (77.5%)   33 (82.5%)   39 (97.5%)
10 variables (basicranium)      15 (37.5%)   12 (30.0%)   12 (30.0%)   13 (32.5%)   15 (37.5%)
10 variables (neurocranium)     23 (57.5%)   18 (45.0%)   7 (17.5%)    30 (75.0%)   19 (47.5%)
10 variables (face)             18 (45.0%)   25 (62.5%)   11 (27.5%)   20 (50.0%)   9 (22.5%)
ESU
56 variables (whole cranium)    6 (15.0%)    18 (45.0%)   24 (60.0%)   16 (40.0%)   9 (22.5%)
10 variables (basicranium)      1 (2.5%)     5 (12.5%)    3 (7.5%)     5 (12.5%)    3 (7.5%)
10 variables (neurocranium)     3 (7.5%)     4 (10.0%)    2 (5.0%)     12 (30.0%)   11 (27.5%)
10 variables (face)             4 (10.0%)    8 (20.0%)    6 (15.0%)    10 (25.0%)   2 (5.0%)
ESS
56 variables (whole cranium)    7 (17.5%)    18 (45.0%)   26 (65.0%)   20 (50.0%)   8 (20.0%)
10 variables (basicranium)      1 (2.5%)     8 (20.0%)    4 (10.0%)    5 (12.5%)    3 (7.5%)
10 variables (neurocranium)     5 (12.5%)    4 (10.0%)    2 (5.0%)     16 (40.0%)   19 (47.5%)
10 variables (face)             7 (17.5%)    14 (35.0%)   8 (20.0%)    13 (32.5%)   4 (10.0%)
3.2.4.2. Number of correct classifications using >0.5 posterior probability
and >0.01 typicality probability. Table 8 shows the number of correct
classifications for each population when the recommended posterior and
typicality probabilities are considered. The number of correctly classified
specimens fell when the posterior probability and typicality criteria were used, but
the results followed the same pattern as those obtained in the analyses in which
all posterior and typicality probabilities were employed. Thus, in the ISU and ISS
analyses the 56-variable dataset outperformed all the 10-variable datasets, while
in the ESU and ESS analyses the 56-variable dataset outperformed the
10-variable datasets in the case of the Berg, Santa Cruz, Northern Japanese and
Tasmanians, but not the Zulu. In the ESU analyses, the Zulu neurocranium
10-variable dataset performed as well as the 56-variable dataset (10.0% of specimens
correctly classified in both cases). In the ESS analyses, the Zulu neurocranium
10-variable dataset performed better than the 56-variable dataset (25.0% versus
17.5%).
Table 8. Results by population using >0.5 posterior probability and >0.01 typicality probability (n=40)
Dataset                         Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
56 variables (whole cranium)    33 (82.5%)   37 (92.5%)   27 (67.5%)   32 (80.0%)   31 (77.5%)
10 variables (basicranium)      4 (10.0%)    0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)     11 (27.5%)   5 (12.5%)    0 (0.0%)     13 (32.5%)   5 (12.5%)
10 variables (face)             7 (17.5%)    16 (40.0%)   1 (2.5%)     5 (12.5%)    0 (0.0%)
ISS
56 variables (whole cranium)    33 (82.5%)   40 (100%)    30 (75.0%)   33 (82.5%)   35 (87.5%)
10 variables (basicranium)      4 (10.0%)    3 (7.5%)     0 (0.0%)     0 (0.0%)     3 (7.5%)
10 variables (neurocranium)     14 (35.0%)   6 (15.0%)    0 (0.0%)     20 (50.0%)   14 (35.0%)
10 variables (face)             11 (27.5%)   18 (45.0%)   1 (2.5%)     14 (35.0%)   0 (0.0%)
ESU
56 variables (whole cranium)    6 (15.0%)    15 (37.5%)   17 (42.5%)   7 (17.5%)    4 (10.0%)
10 variables (basicranium)      0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     1 (2.5%)
10 variables (neurocranium)     0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     4 (10.0%)
10 variables (face)             0 (0.0%)     1 (2.5%)     1 (2.5%)     3 (7.5%)     1 (2.5%)
ESS
56 variables (whole cranium)    5 (12.5%)    17 (42.5%)   18 (45.0%)   7 (17.5%)    7 (17.5%)
10 variables (basicranium)      0 (0.0%)     0 (0.0%)     0 (0.0%)     2 (5.0%)     2 (5.0%)
10 variables (neurocranium)     0 (0.0%)     1 (2.5%)     0 (0.0%)     5 (12.5%)    10 (25.0%)
10 variables (face)             2 (5.0%)     3 (7.5%)     3 (7.5%)     4 (10.0%)    2 (5.0%)
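The per-population counts in Table 8 and the pooled counts in Table 5 describe the same specimens, so they can be cross-checked arithmetically. The following Python sketch does this for the 56-variable dataset in three of the analyses, using counts transcribed from the two tables; the variable and function names are illustrative only.

```python
N_PER_POPULATION = 40
N_TOTAL = 200

# Counts per population (Berg, Santa Cruz, N. Japan, Tasmanian, Zulu) for the
# 56-variable dataset at >0.5 posterior and >0.01 typicality (Table 8).
table8_56var = {
    "ISS": [33, 40, 30, 33, 35],
    "ESU": [6, 15, 17, 7, 4],
    "ESS": [5, 17, 18, 7, 7],
}

# Pooled 56-variable counts for the same analyses (Table 5).
table5_56var = {"ISS": 171, "ESU": 49, "ESS": 54}


def percent(count, n):
    """Express a count as a percentage of n, rounded to one decimal place."""
    return round(100.0 * count / n, 1)


for analysis, counts in table8_56var.items():
    total = sum(counts)
    # The five population counts should add up to the pooled figure.
    assert total == table5_56var[analysis], analysis
    print(analysis,
          [percent(c, N_PER_POPULATION) for c in counts],
          "pooled:", percent(total, N_TOTAL))
```

Each population contributes 40 specimens, so one specimen is worth 2.5 percentage points within a population but only 0.5 in the pooled totals.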
3.2.4.3. Number of correct classifications using >0.8 posterior probability
and >0.01 typicality probability. Table 9 shows the number of correct
classifications for each population when the stricter 0.8 posterior probability
criterion is applied with the 0.01 typicality criterion. Again, the number of
correctly classified specimens fell and the results were very low in general. For
the ISU and ISS analyses the 56-variable dataset outperformed all the 10-variable
datasets. In the ESU analyses, the 56-variable dataset outperformed the
10-variable datasets in the case of the Berg, Santa Cruz, Northern Japanese and
Tasmanians, but not the Zulu. Here, the Zulu neurocranium 10-variable dataset
and the 56-variable dataset achieved the same result (2.5% of specimens
correctly classified in both cases). In the ESS analyses, the 56-variable
dataset outperformed all of the 10-variable datasets.
Table 9. Results by population using >0.8 posterior probability and >0.01 typicality probability (n=40)
Dataset                         Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
56 variables (whole cranium)    27 (67.5%)   36 (90.0%)   21 (52.5%)   29 (72.5%)   26 (65.0%)
10 variables (basicranium)      1 (2.5%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)     1 (2.5%)     0 (0.0%)     0 (0.0%)     8 (20.0%)    0 (0.0%)
10 variables (face)             2 (5.0%)     3 (7.5%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
ISS
56 variables (whole cranium)    30 (75.0%)   37 (92.5%)   24 (60.0%)   34 (85.0%)   33 (82.5%)
10 variables (basicranium)      1 (2.5%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)     6 (15.0%)    2 (5.0%)     0 (0.0%)     14 (35.0%)   4 (10.0%)
10 variables (face)             3 (7.5%)     8 (20.0%)    0 (0.0%)     2 (5.0%)     0 (0.0%)
ESU
56 variables (whole cranium)    2 (5.0%)     6 (15.0%)    9 (22.5%)    6 (15.0%)    1 (2.5%)
10 variables (basicranium)      0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)
10 variables (face)             0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
ESS
56 variables (whole cranium)    5 (12.5%)    12 (30.0%)   10 (25.0%)   7 (17.5%)    6 (15.0%)
10 variables (basicranium)      0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     4 (10.0%)
10 variables (face)             0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     1 (2.5%)
3.2.5. Summary. Overall, these results indicate that the number of
variables used has a significant impact on FORDISC's ability to identify ancestral
group, regardless of other factors. FORDISC correctly classified more test
specimens using the 56-variable whole-cranium dataset than with any of the
10-variable datasets. The 10-variable datasets never achieved higher than 48.5%
correct classifications (total result for neurocranium dataset, ISS). This compares
unfavourably to 88.5% for the 56-variable dataset under the same conditions.
However, this finding did not hold for all the test populations or at different
probability levels.
3.3 Impact of cranial region
The effect of cranial region on FORDISC's ability to identify ancestry has not
previously been addressed in the literature. Accordingly, this section compares
the results of the four sets of analyses using equal numbers of variables selected
to isolate the neurocranium, basicranium and face.
3.3.1. Number of correct assignments accepting all posterior and typicality
probabilities. Table 10 provides the total number of correct assignments for each
cranial region accepting all posterior and typicality probabilities. For the
population-included, sex-unspecified analyses, the neurocranium dataset
returned the greatest number of correct classifications. Here, FORDISC correctly
assigned 41.5% of the test individuals to the appropriate population. The face
and basicranium datasets returned 34.0% and 22.5% correct assignments,
respectively.
The results were similar for the population-included, sex-specified analyses.
Again, FORDISC was the most successful using the neurocranium dataset. In
this analysis, 48.5% of the test specimens were correctly classified versus 41.5%
using the face and 33.5% using the basicranium dataset.
When the sex was left unspecified, the results for the analyses that excluded the
source population followed a similar pattern to the population-included results. In
this analysis, FORDISC classified more test specimens with the most closely
related population using the neurocranium dataset (16.0%). This was followed by
the face dataset (15.0%) and the basicranium dataset (8.5%).
In contrast, when the source population was excluded and only the same sex
reference groups used, the neurocranium and face datasets returned the same
results. In both cases, FORDISC classified 23.0% of the test specimens
correctly. In keeping with the other analyses, the basicranium dataset returned
the fewest correct classifications, assigning 10.5% of the test
specimens to the most closely related population. Thus, with only one exception,
FORDISC achieved the best results using the neurocranium dataset, followed by
the face and basicranium datasets.
Table 10. Total number of test specimens correctly classified (n=200)
Analysis   Basicranium   Neurocranium   Face
ISU        45 (22.5%)    83 (41.5%)     68 (34.0%)
ISS        67 (33.5%)    97 (48.5%)     83 (41.5%)
ESU        17 (8.5%)     32 (16.0%)     30 (15.0%)
ESS        21 (10.5%)    46 (23.0%)     46 (23.0%)
3.3.2. Number of correct assignments using >0.5 posterior probability and
>0.01 typicality probability. Table 11 shows the recalculated scores based on 0.5
posterior probability and 0.01 typicality values for the three sets of cranial-region
analyses. For the population-included, sex-unspecified analyses, the best results
were achieved when FORDISC used the neurocranium dataset. Here, 17.0% of
the test specimens were classified correctly. This was followed by the face
dataset (14.5%). The basicranium achieved the poorest result, with only four test
individuals (2.0%) correctly assigned.
The pattern was similar for the population-included, sex-specified analyses.
FORDISC correctly classified 27.0% of the test population using the
neurocranium dataset as opposed to 22.0% and 5.0% for the face and
basicranium datasets, respectively.
Surprisingly, the best results for the population-excluded, sex-unspecified
analyses were obtained using the face dataset. Here, FORDISC assigned 3.0%
of the test individuals to the most closely related group compared to 2.5% using
the neurocranial variables. Once again, the basicranium dataset achieved the
poorest results with FORDISC placing only two individuals (1.0%) with the most
closely related population.
In the population-excluded, sex-specified analyses, the best result was once
again obtained using the neurocranium dataset. Here, FORDISC correctly
classified 8.0% of the test sample to the most closely related group. The face
dataset achieved the next-best result with 7.0%, followed by the basicranium with
2.0%.
Although the number of correct assignments fell significantly when the
recommended posterior probability and typicality values were used, the
neurocranium dataset returned the highest number of correct classifications in all
but one case. With this one exception, the face dataset obtained the next-best
results. The basicranium consistently returned the lowest number of correct
assignments.
Table 11. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality criteria (n=200)
Analysis   Basicranium   Neurocranium   Face
ISU        4 (2.0%)      34 (17.0%)     29 (14.5%)
ISS        10 (5.0%)     54 (27.0%)     44 (22.0%)
ESU        2 (1.0%)      5 (2.5%)       6 (3.0%)
ESS        4 (2.0%)      16 (8.0%)      14 (7.0%)
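The ordering of the three cranial regions in each analysis can be read off Table 11 mechanically. The Python sketch below ranks the regions by number of correct assignments, using counts transcribed from the table; the helper name `rank_regions` is illustrative only.

```python
# Correct classifications out of 200 at >0.5 posterior and >0.01 typicality
# (counts transcribed from Table 11).
table11 = {
    "ISU": {"basicranium": 4, "neurocranium": 34, "face": 29},
    "ISS": {"basicranium": 10, "neurocranium": 54, "face": 44},
    "ESU": {"basicranium": 2, "neurocranium": 5, "face": 6},
    "ESS": {"basicranium": 4, "neurocranium": 16, "face": 14},
}


def rank_regions(counts):
    """Return region names ordered from most to fewest correct assignments."""
    return sorted(counts, key=counts.get, reverse=True)


rankings = {analysis: rank_regions(c) for analysis, c in table11.items()}
best = {analysis: order[0] for analysis, order in rankings.items()}
worst = {analysis: order[-1] for analysis, order in rankings.items()}

for analysis, order in rankings.items():
    print(analysis, " > ".join(order))
```

The neurocranium ranks first in every analysis except ESU, where the face leads by a single specimen, and the basicranium ranks last throughout.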
3.3.3. Number of correct assignments using >0.8 posterior probability and
>0.01 typicality probability. Using the 0.8 posterior probability criterion, the
number of correct assignments fell again. Table 12 shows these results. As with
the previous results, FORDISC classified more test specimens for the
population-included, sex-unspecified analyses using the neurocranium dataset. Here, 4.5%
of the specimens were correctly classified using the neurocranium dataset, in
contrast to 2.5% using the face dataset and 0.5% using the basicranium dataset.
The pattern was similar for the population-included, sex-specified analyses.
Using the neurocranium dataset, FORDISC correctly classified 13.0% of the test
specimens, followed by 6.5% using the facial variables and 0.5% using the
basicranial variables. For the population-excluded, sex-unspecified analyses, the
neurocranium again achieved the best results. However, since only one
individual (0.5%) was classified correctly and none were classified correctly using
either the face or basicranium datasets, the term 'best' is used loosely.
FORDISC fared little better with the population-excluded, sex-specified analyses.
Here, 2.0% of the test specimens were correctly classified using the
neurocranium dataset. Only 1.0% was classified using the face dataset and no
individuals were classified correctly using the basicranium dataset.
Once again, the neurocranium dataset achieved the highest number of correct
determinations, followed by the face and the basicranium. However, at the 0.8
posterior probability level, the results are generally poor and the
population-excluded results are extremely low.
Table 12. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)
Analysis   Basicranium   Neurocranium   Face
ISU        1 (0.5%)      9 (4.5%)       5 (2.5%)
ISS        1 (0.5%)      26 (13.0%)     13 (6.5%)
ESU        0 (0.0%)      1 (0.5%)       0 (0.0%)
ESS        0 (0.0%)      4 (2.0%)       2 (1.0%)
3.3.4. Cranial region and population differences. As with the variable
number analyses, the previous sections combined the results for all 200 test
individuals. To establish whether or not FORDISC is consistent between the test
populations, this section considers the results for each cranial region according
to geographic group.
3.3.4.1. Number of correct assignments accepting all posterior and
typicality probabilities. Table 13 breaks down the results for each cranial region
by population accepting all posterior and typicality probabilities. For the Berg,
when the population was included, FORDISC was the most successful with the
neurocranial variables, placing 47.5% (sex unspecified) and 57.5% (sex
specified) of the individuals into the Berg group. The facial region was the next
most successful followed by the basicranium. However, when the Berg group
was excluded, the facial dataset achieved the best results. Here, the program
assigned 10.0% (sex unspecified) and 17.5% (sex specified) of the specimens to
the Norse group compared to 7.5%/12.5% for the neurocranium and 2.5%/2.5%
for the basicranium.
The situation was different for the Santa Cruz group. FORDISC was the most
successful in attributing ancestry using the facial variables in all analyses. With
the population included and the sex unselected, 60.0% of the sample was
correctly classified using the face, versus 37.5% for the neurocranium and
20.0% for the basicranium. With the population included and sex selected, 62.5%
of the sample was classified using the facial variables, in comparison to 45.0%
using the neurocranium and 30.0% using the basicranium dataset. When the
Santa Cruz group was excluded from the analysis, with the sex unselected,
FORDISC assigned 20.0% of the sample to the Peruvian group using the face
dataset, 10.0% using the neurocranium and 12.5% using the basicranial variable
set.
For the Northern Japanese, FORDISC returned more correct assignments using
the facial region in three out of four analyses. However, when the population was
included and the sex specified, FORDISC obtained the best results using the
basicranium variable set, placing 30.0% of the individuals back into the Northern
Japanese group versus 27.5% for the face and 17.5% for the neurocranium.
For the Tasmanians, FORDISC achieved the highest success rate when using
the neurocranial variables in all analyses. For the population-included analyses,
57.5% (sex unspecified) and 75.0% (sex specified) of the specimens were assigned
correctly, compared with 37.5% and 50.0% using facial variables and 17.5% and 32.5%
using the basicranium.
The Zulu group also showed the best results when FORDISC used the
neurocranial variables. When the population was included, 50.0% (sex unspecified)
and 47.5% (sex specified) of the individuals were correctly identified, versus
30.0% and 37.5% for the basicranium and 15.0% and 22.5% for the facial region.
Table 13. Total results for each cranial region by population (n=40)
Dataset        Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
Basicranium    11 (27.5%)   8 (20.0%)    7 (17.5%)    7 (17.5%)    12 (30.0%)
Neurocranium   19 (47.5%)   15 (37.5%)   6 (15.0%)    23 (57.5%)   20 (50.0%)
Face           13 (32.5%)   24 (60.0%)   10 (25.0%)   15 (37.5%)   6 (15.0%)
ISS
Basicranium    15 (37.5%)   12 (30.0%)   12 (30.0%)   13 (32.5%)   15 (37.5%)
Neurocranium   23 (57.5%)   18 (45.0%)   7 (17.5%)    30 (75.0%)   19 (47.5%)
Face           18 (45.0%)   25 (62.5%)   11 (27.5%)   20 (50.0%)   9 (22.5%)
ESU
Basicranium    1 (2.5%)     5 (12.5%)    3 (7.5%)     5 (12.5%)    3 (7.5%)
Neurocranium   3 (7.5%)     4 (10.0%)    2 (5.0%)     12 (30.0%)   11 (27.5%)
Face           4 (10.0%)    8 (20.0%)    6 (15.0%)    10 (25.0%)   2 (5.0%)
ESS
Basicranium    1 (2.5%)     8 (20.0%)    4 (10.0%)    5 (12.5%)    3 (7.5%)
Neurocranium   5 (12.5%)    4 (10.0%)    2 (5.0%)     16 (40.0%)   19 (47.5%)
Face           7 (17.5%)    14 (35.0%)   8 (20.0%)    13 (32.5%)   4 (10.0%)
3.3.4.2. Number of correct assignments using >0.5 posterior probability
and >0.01 typicality probability. Table 14 lists the results for each cranial region
by population using 0.5 posterior probability and 0.01 typicality probability. Using
the recommended probability criteria with the source population included,
FORDISC achieved the best results for the Berg using the neurocranial
variables. This was followed by the face and basicranium datasets. However,
when the Berg population was excluded and the sex specified, FORDISC was
only able to place two specimens into the target group (the Norse). FORDISC
could not place any individuals correctly when the sex was unspecified or by
using the other cranial regions.
In the analysis of the Santa Cruz specimens, FORDISC achieved the highest
rate of success when the population was included using the facial variables
(40.0% - sex unspecified, 45% - sex specified). The facial variables were also the
most successful when the source population was excluded. However, like the
Berg, the classification rates were very low and no test specimens were correctly
classified using the basicranium dataset.
With the Northern Japanese group FORDISC assigned specimens correctly at
the recommended probability levels only when using the facial variables. No
individuals were correctly assigned using the other two variable sets.
Surprisingly, FORDISC placed more specimens correctly when the source
population was excluded. Three individuals (7.5%) were placed with the
Southern Japanese group when the sex was specified, while only one each was
correctly assigned when the source population was included and the sex
unspecified or specified.
For the Tasmanian sample, FORDISC returned more correct assignments using
the neurocranium variables when the population was included. When the
Tasmanian group was excluded and the sex unspecified, the best result was
obtained using the face dataset (7.5% compared to 2.5% for neurocranium or
basicranium). However, when the sex was specified, FORDISC placed more
specimens into the Australian group using the neurocranial dataset (12.5%
versus 10.0% using the face and 5.0% using the basicranium).
Lastly, for the Zulu group, at the recommended probability levels, FORDISC
achieved the best results using the neurocranium dataset in all analyses.
However, the population-included results were very poor for the other cranial
regions and more individuals were correctly placed when the source population
was excluded.
Table 14. Results for each cranial region using >0.5 posterior probability and >0.01 typicality probability (n=40)
Dataset        Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
Basicranium    4 (10.0%)    0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
Neurocranium   11 (27.5%)   5 (12.5%)    0 (0.0%)     13 (32.5%)   5 (12.5%)
Face           7 (17.5%)    16 (40.0%)   1 (2.5%)     5 (12.5%)    0 (0.0%)
ISS
Basicranium    4 (10.0%)    3 (7.5%)     0 (0.0%)     0 (0.0%)     3 (7.5%)
Neurocranium   14 (35.0%)   6 (15.0%)    0 (0.0%)     20 (50.0%)   14 (35.0%)
Face           11 (27.5%)   18 (45.0%)   1 (2.5%)     14 (35.0%)   0 (0.0%)
ESU
Basicranium    0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     1 (2.5%)
Neurocranium   0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     4 (10.0%)
Face           0 (0.0%)     1 (2.5%)     1 (2.5%)     3 (7.5%)     1 (2.5%)
ESS
Basicranium    0 (0.0%)     0 (0.0%)     0 (0.0%)     2 (5.0%)     2 (5.0%)
Neurocranium   0 (0.0%)     1 (2.5%)     0 (0.0%)     5 (12.5%)    10 (25.0%)
Face           2 (5.0%)     3 (7.5%)     3 (7.5%)     4 (10.0%)    2 (5.0%)
3.3.5. Summary. When all the test populations were pooled, the
neurocranium produced the best results of the three cranial regions. With two
exceptions, the face dataset obtained the next-best results. The basicranium
consistently returned the lowest number of correct assignments. Indeed, in two
cases, FORDISC was unable to classify a single individual out of 200 using this
dataset. However, when the populations were considered individually, the results
were inconsistent and the rates of correct classification were very low in general.
In sum, FORDISC varied in its ability to classify individuals correctly with respect
to cranial region.
4. Discussion
4.1. Main findings
The difference between FORDISC's success rate in the source population-included
analyses and its success rate in the source population-excluded
analyses was substantial. When the whole cranium dataset was analyzed with
the source population included, more than two thirds of the test specimens were
classified correctly (70-89%) whereas when the whole cranium dataset was
analyzed with the source population excluded less than half of the test
specimens were classified correctly (12-40%). Far fewer specimens were
classified correctly in the analyses that focused on an individual anatomical
region, but the number classified correctly in the source population-included
analyses was always at least twice the number classified correctly in the source
population-excluded analyses. Thus, the presence or absence of the source
population in the reference sample greatly impacts the accuracy of FORDISC.
Specifically, the analyses suggest that if a test specimen's source population is
represented in FORDISC's reference sample, there is a reasonable chance that
the ancestry will be accurately determined, whereas if the specimen's source
population is not represented in FORDISC's reference sample, there is little
chance that its ancestry will be accurately determined.
The finding that a test specimen's source population has to be represented in the
reference sample in order for there to be a reasonable chance for its ancestry to
be accurately determined is consistent with Jantz and Ousley's (2005) cautions
regarding the use of FORDISC. However, it also calls into question the utility of
the program in all but the most restricted circumstances. As noted in the
Introduction, it is entirely possible for a set of remains to be from any place in the
world, particularly if they are recent. Consequently, the likelihood of being able to
determine in advance if an unknown specimen's population is represented in the
FORDISC reference group sample is extremely low. If FORDISC is only effective
when an individual's source population is represented in the reference sample
and a researcher must establish this in order to be confident about the program's
determinations, there is no point in actually undertaking a FORDISC analysis. At
best it will only confirm a determination made by some other means. Furthermore,
if the test specimen's source population is not represented in the program's
reference sample and a specimen is analyzed anyway, an investigator cannot be
confident that the resulting determination actually corresponds to a closely
related population. In the end, the analysis has not assisted in narrowing an
individual's ancestry.
As discussed earlier, Jantz and Ousley have also suggested that secular change
may be responsible for some of FORDISC's poor performance in previous tests.
This means that in addition to needing to have the source population
represented, FORDISC also requires a specimen to be contemporaneous with
the specimens in the reference sample to be reliable. The Forensic Databank
includes modern forensic cases as well as "mid to late 19th century Amerindian
remains" (Jantz and Ousley 2005:35) while Howells' reference populations range
from 26-30th Dynasty Egyptians (600-200 B.C.) to mid-20th century dissection
room cadavers (Howells 1973). As a result, even if an investigator knew that an
unknown specimen came from Egypt, for example, if they could not also say it
came from the same time period as Howells' group, FORDISC's attribution would
have to be considered suspect. Furthermore, if modern Americans have changed
so significantly in the last 150 years, the point at which secular changes override
population differences needs to be clearly established.
This study also determined that restricting the program to the relevant sex
improved FORDISC's ability to correctly attribute ancestry. When the source
population was included and 56 variables were used, selecting the sex resulted
in a six percentage point improvement over not doing so. For the 10-variable datasets,
selecting the sex achieved results between seven and 11 percentage points better than
when it was left unselected. The results were similar for the source
population-excluded analyses, with between three and eight percentage points of
improvement when the sex was selected. This suggests that accurately sexing an
unknown specimen through morphological examination is advisable before using
FORDISC to determine a specimen's ancestry.
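The gains from specifying sex in the source population-included analyses are differences between two rates, so they are best read as percentage points. The Python sketch below reproduces that arithmetic when all posterior and typicality probabilities are accepted. The regional totals are transcribed from Table 10, and the 56-variable totals are obtained here by summing the population counts in Table 7; the names used are illustrative only.

```python
N_TOTAL = 200

# Correct classifications accepting all probabilities, source population included.
# 56-variable totals are row sums of Table 7; regional totals are from Table 10.
included = {
    "56 variables": {"ISU": 165, "ISS": 177},
    "basicranium": {"ISU": 45, "ISS": 67},
    "neurocranium": {"ISU": 83, "ISS": 97},
    "face": {"ISU": 68, "ISS": 83},
}


def gain_in_points(counts, n=N_TOTAL):
    """Percentage-point gain from specifying sex (ISS rate minus ISU rate)."""
    return round(100.0 * (counts["ISS"] - counts["ISU"]) / n, 1)


gains = {dataset: gain_in_points(c) for dataset, c in included.items()}

for dataset, gain in gains.items():
    print(f"{dataset}: +{gain} percentage points")
```

Working from counts rather than rounded percentages avoids compounding rounding error when two rates are subtracted.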
The results of the analyses also suggest that the number of variables greatly
affects FORDISC's ability to determine ancestry. When the 200 test specimens
were considered together, using 56 variables consistently returned the highest
rate of correct assignments, regardless of other criteria. Even in the "best-case
scenario" where the source population was represented in the reference sample,
the sex of the test specimen was specified, and all posterior and typicality
probabilities were accepted, 10 variables achieved less than half the success
rate that 56 variables obtained.
The 56-variable dataset also outperformed the three 10-variable datasets when
the test specimens were broken down by population. The only exceptions were
the analyses in which the Zulu test specimens were analyzed without the Zulu
population being represented in the reference sample. In these analyses, the
10-variable neurocranium dataset returned more correct assignments than the
56-variable dataset.
In general, these results contradict the claims by Jantz and Ousley (2005) that
"as more variables are added, there is a tendency for the classification accuracy
to plateau and then decrease" (p. 50) and support the findings of other
researchers that better discrimination is achieved by maximizing the number of
variables (Hubbe and Neves 2007). Furthermore, they suggest that, contrary to
claims regarding FORDISC (Ubelaker et al. 2002), the program cannot be used
with confidence on incomplete remains from which only a few measurements can
be obtained.
The effect of anatomical region on FORDISC's ability to identify ancestry was not
resolved by this study. Although FORDISC achieved the best results on average
for the five groups using the neurocranium, it did not do so consistently across
populations and the returns were very low in general. When the results were
considered as a whole, the neurocranium was the most effective for determining
ancestry, followed by the face and basicranium.
These results conflict with the prediction that the basicranium would be the most
successful because it is the most phylogenetically and ontogenetically stable of
the three regions, while the face would be the least successful due to
non-genetic influences on its shape. In fact, while the neurocranium and facial regions
vied for the highest success rate, the basicranial variable set consistently
returned the fewest correct assignments. However, because all three regional
datasets performed so poorly, this question was not fully resolved by the current
study.
4.2 Implications for use of FORDISC
The results of this study suggest that the utility of FORDISC is limited. In order
for the program to yield an accurate determination of ancestry, the target
specimen's source population must be present in FORDISC's reference sample
and its sex must be known. In addition, the target specimen must be complete
enough for more than 10 measurements to be recorded on it and for those
measurements to relate to more than one region of the cranium.
The utility of FORDISC may in fact be more limited than the analyses reported
here suggest. During the course of the study, it became apparent that the
evaluation criteria that have been recommended are ineffective. The following
figures relate to the set of analyses that yielded the highest number of correctly
classified specimens, that is, the analyses in which the source population was
included in the reference sample, sex was specified and 56 variables were
employed. Using the 0.5 posterior probability/0.01 typicality probability
combination, five "correct" test individuals (2.5%) would be falsely rejected. Using
the same criteria, 16 (8%) incorrect determinations would be falsely accepted.
Using the 0.8 posterior probability/0.01 typicality probability combination, 17
(8.5%) of the test individuals would be rejected even though they were correct,
and five (2.5%) "incorrect" determinations would be considered correct. Thus,
neither of the recommended combinations of posterior probability and typicality
probability enables us to be confident that the ancestry of a specimen has been
correctly determined.
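The way the recommended cut-offs operate can be illustrated with a minimal sketch. It assumes that a determination is accepted only when both the posterior probability and the typicality probability reach their cut-offs; the function names and the use of `>=` for the comparison are assumptions made for illustration and are not taken from FORDISC. The counts passed to `error_rates` are the ones reported above for the 200 test specimens.

```python
def is_accepted(posterior, typicality, pp_cutoff, tp_cutoff=0.01):
    """Return True when both probabilities reach their cut-offs (assumed rule)."""
    return posterior >= pp_cutoff and typicality >= tp_cutoff


def error_rates(false_rejects, false_accepts, n_specimens=200):
    """Express false rejections and false acceptances as percentages of the test sample."""
    return (100.0 * false_rejects / n_specimens,
            100.0 * false_accepts / n_specimens)


# Counts reported in the text for the best-performing set of analyses.
print(error_rates(5, 16))   # 0.5 posterior / 0.01 typicality combination
print(error_rates(17, 5))   # 0.8 posterior / 0.01 typicality combination
```

Raising the posterior cut-off from 0.5 to 0.8 trades false acceptances for false rejections rather than removing either kind of error.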
With the foregoing in mind, a sectioning point for the posterior probability and
typicality probabilities was calculated from the results of the analyses that yielded
the highest number of correctly classified specimens. The posterior probabilities
associated with incorrect assignments ranged from 0.389 to 0.991, while the
typicality probabilities ranged from 0.000 to 0.952 (Table 15). This indicates that,
for an ancestry determination to be considered correct without ambiguity, the
posterior probability must be greater than 0.991 and the typicality probability
must be higher than 0.952. Using these criteria, only two determinations out of
200 (1.0%) would be considered unambiguously correct and the rest would have
to be considered unclassifiable. Clearly, if in the best case scenario only 1.0% of
FORDISC's attributions can be accepted with confidence, this has serious
implications for the program's utility.
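The sectioning point described in this paragraph can be reproduced from the incorrect-assignment maxima in Table 15. The sketch below is illustrative only: the variable and function names are hypothetical, and it assumes that the sectioning point is simply the largest posterior and typicality probability observed for any incorrect assignment.

```python
# Maximum posterior (PP) and typicality (TP) probabilities recorded for
# incorrect assignments, by population (Table 15). Santa Cruz is omitted
# because the table lists no incorrect-assignment values for it.
incorrect_max = {
    "Berg": (0.876, 0.643),
    "N. Japan": (0.850, 0.952),
    "Tasmania": (0.991, 0.690),
    "Zulu": (0.939, 0.482),
}

pp_section = max(pp for pp, _ in incorrect_max.values())
tp_section = max(tp for _, tp in incorrect_max.values())


def unambiguous(posterior, typicality):
    """True only if both values exceed every value seen for an incorrect assignment."""
    return posterior > pp_section and typicality > tp_section


print(pp_section, tp_section)  # sectioning points for PP and TP
print(100.0 * 2 / 200)         # percentage of the 200 specimens reported as passing both
```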
Table 15: Range of posterior and typicality probabilities for correct and incorrect assignments by population
(PP = posterior probability; TP = typicality probability)

                        CORRECT                             INCORRECT
            PP MIN  PP MAX  TP MIN  TP MAX      PP MIN  PP MAX  TP MIN  TP MAX
Berg        .646    1.0     0.0     .947        .521    .876    0.0     .643
Snt. Cruz   .752    1.0     0.077   .942        -       -       -       -
N. Japan    .593    1.0     .196    .964        .447    .850    0.0     .952
Tasmania    .873    1.0     .043    .935        .436    .991    .327    .690
Zulu        .546    1.0     0.0     .964        .389    .939    .440    .482
Even this may overestimate FORDISC's accuracy. As noted in the Materials and
Methods, Howells selected 50-55 crania of each sex to represent each group.
For a number of groups, this meant that only a small percentage of the available
individuals were measured. For example, the 26th-30th Dynasty Egyptian crania
were selected from a sample of nearly 1800. Significantly, the individuals were
not chosen at random. Rather, Howells "carefully selected" specimens that he
considered to be typical of the group (Howells 1995: 3). Crania that were
"morphologically unusual for the population as a whole" (Howells 1989: 89) were
not included, even if there were no obvious pathological changes to account for
the differences. Thus, Howells' data collection strategy was such that the degree
of overlap among the reference populations is likely to be artificially low. Given
that the accuracy of classification in DFA is inversely correlated with the degree
of overlap among groups, it is likely that the analyses reported here overestimate
the accuracy of FORDISC.
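The link between group overlap and classification accuracy can be illustrated with the simplest textbook case. The sketch assumes two groups with multivariate normal distributions, a common covariance matrix, equal prior probabilities and known parameters; under those assumptions the expected proportion of correct assignments is the standard normal cumulative probability of half the Mahalanobis distance between the group means. The function name is hypothetical and the distances are arbitrary illustrative values, not figures from this study or from FORDISC, which compares many groups at once.

```python
import math


def expected_accuracy(mahalanobis_distance):
    """Expected proportion correctly classified for two equal-prior,
    equal-covariance normal groups: Phi(D / 2)."""
    return 0.5 * (1.0 + math.erf(mahalanobis_distance / (2.0 * math.sqrt(2.0))))


# Smaller distances mean more overlap between the groups.
for distance in (0.5, 1.0, 2.0, 4.0):
    print(distance, round(expected_accuracy(distance), 3))
```

As the distance shrinks toward zero the groups overlap completely and the expected accuracy falls to 0.5, which is chance for two groups. A reference sample from which atypical crania have been excluded would therefore tend to inflate the apparent accuracy.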
There is a further reason for suspecting that the study reported here may have
overestimated the utility of FORDISC. A number of the collections Howells
analyzed did not include "mandibles or skeletal parts to aid in the diagnosis" of
sex (Howells 1989: 91). Consequently, sex was frequently assessed on cranial
morphology alone. Although Howells attempted to corroborate his estimates with
those of other researchers who had examined the remains, he admitted that
some of the skulls of known sex "would certainly have been assigned to the
wrong sex if it had been done by inspection" (Howells 1989: 94). This suggests
that the sexes in Howells' samples may be more distinct from one another than
they are in the source populations. The corollary of this is that the success
rate of FORDISC in the analyses in which sex was specified may have been
artificially high.
4.3 Future considerations
FORDISC's utility may be limited because the nature of human variation is such
that ancestry cannot be determined from skeletal remains, as Williams et al.
(2005) have suggested. However, the importance of determining ancestry is
great enough that it would seem sensible to investigate other possibilities before
concluding that ancestry is an aspect of the biological profile that cannot be
accessed from the human skeleton.
One potential cause of FORDISC's poor performance is its reliance on
two-dimensional measurement data. Three-dimensional landmark data may capture
more of the morphological differences among populations and therefore provide
a better basis for determining the ancestry of unknown specimens. Although
studies are beginning to use three-dimensional geometric morphometric methods
to explore population history and climate signals in modern human cranial
morphology (e.g. Harvati and Weaver 2006b), none has attempted to apply these
methods to estimate ancestry of unknown remains.
A second potential cause of FORDISC's poor performance is its reliance on
cranial data. Work on the utility of the cranium for reconstructing primate
phylogeny raises the possibility that the cranium is either an inadequate source
of information regarding ancestry or perhaps even a misleading one (cf. Collard
and Wood 2000). Although earlier studies met with limited success using
postcranial data for ancestry determination (Marino 1997; Ballard 1999; Holliday
and Falsetti 1999; Patriquin et al. 2002), it may be worthwhile investigating
whether supplementing cranial data with postcranial data and/or data from the
teeth and lower jaw provides more accurate determinations of ancestry.
A third potential cause of FORDISC's poor performance is its reliance on
Discriminant Function Analysis. It is possible that FORDISC's success rate is so
limited because DFA does not distinguish the form of similarity that is informative
with respect to ancestry (shared derived similarity) from forms of similarity that
are not informative regarding ancestry, such as shared primitive similarity and
convergent similarity. Accordingly, it would be worthwhile trying to adapt
phylogenetic methods that focus on shared derived similarity, such as cladistics
(Hennig 1966), to the problem of determining the ancestry of unknown skeletal
specimens.
While these possibilities are being explored, FORDISC will almost certainly
continue to be used to assist with ancestry determinations. With this in mind,
there would seem to be a pressing need to expand FORDISC's reference
samples. Ideally this would involve maximizing both the numbers of individuals
and populations represented, and ensuring that as many temporal periods are
covered as possible. Although Jantz and Ousley have supplemented the
Forensic Databank with new material, they have not similarly augmented the
Howells samples in FORDISC. While some remains have already been
repatriated, it would seem advisable to take advantage of the large number of
skeletal collections available in institutions around the world to fill in the temporal,
geographic or representational gaps in FORDISC's reference sample.
There is also a pressing need to investigate the relationship between number of
variables and success rate in greater detail. In the current study the maximum
number of variables common to all groups was compared to the recommended
minimum according to the FORDISC manual to determine how variable number
affected FORDISC's success rate. Although this provided a clear indication that
10 variables are insufficient to achieve good results, it did not establish what a
reasonable minimum might be. Given the fragmentary nature of many
bioarchaeological and forensic specimens, it would be useful to repeat the
analyses with 20, 30 and 40 variables to determine whether the classification
rate improves consistently as the number of variables increases or whether it
levels off.
Lastly, during the 2007 FORDISC 3.0 workshop, Jantz and Ousley outlined a
new option in the program that allows a specimen to be analyzed on the basis of
shape alone. The option was developed to "neutralize" the confounding effects
of sex (Jantz and Ousley 2007: 40). Given the marked impact that controlling for
sex had on FORDISC's success rate in the current study, it would be sensible to
examine whether employing the shape-only option results in more specimens
being correctly classified than when ancestry is determined on the basis of shape
and size. If the former proves to be the case, then the shape-only option may
improve the success rate of FORDISC when dealing with specimens that cannot
be sexed with confidence.
While this new transformation option might ensure that an unknown is assessed
on the basis of shape alone and is not significantly smaller than the reference
samples, it is not uncomplicated. Other evidence suggests that males and
females of a given population are not simply different sized variants of the same
basic form (Wood and Lynch 1996). As non-metric assessments attest, there are
clear shape differences between males and females irrespective of ancestry. If
males tend to have similar proportions regardless of size or population, removing
size would not necessarily help FORDISC achieve the correct ancestry. If this is
the case, then using the new shape transformation function in FORDISC 3.0
would result in males clustering with males and prove only that a skull has a male
shape and not that the shape necessarily relates to ancestry.
5. Conclusions
This study explored several issues related to the computer program FORDISC.
In particular, it addressed problems related to population representation in the
database, the number of variables to use in an analysis, the effect of constraining
sex, the effect of anatomical region, and the challenge of interpreting the results.
This research was undertaken in part because these issues are fundamental to
the appropriate use of the program. As FORDISC becomes more popular, a
danger lies in investigators using the program without fully understanding its
limitations. Additionally, the ongoing FORDISC debate has done little to resolve
the questions that have arisen around the program's performance. In fact, it
seems that each time a criticism of the program is raised, FORDISC's developers
add a new caveat to its use. Given the popularity of FORDISC and the
confidence placed in it, it was deemed important to determine not only how
effective the program is, but whether or not the criticisms of it are valid.
In total, this study carried out four sets of analyses on four separate datasets for
200 individuals from within FORDISC's reference sample. The test datasets were
selected to include the range of possibilities in terms of both variable number and
anatomical region, while the test individuals were chosen from five populations
representing separate geographic regions. The first set of analyses tested each
dataset using all populations (including the one from which the test individual was
drawn) and both sexes. The second set of analyses also included all populations,
but restricted FORDISC's comparison to members of the same sex. The third set
of analyses excluded the test individual's source population but used both males
and females of the remaining groups. The fourth set of analyses excluded the
test individual's source population and compared it only to members of the same
sex.
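The design described above can be laid out as a simple enumeration. The sketch below is illustrative: the labels are shorthand chosen for this example rather than FORDISC option names, and the final figure assumes that every test individual is run once under every configuration.

```python
from itertools import product

# Labels are shorthand for this sketch; they are not FORDISC option names.
DATASETS = ["56-variable whole cranium", "10-variable neurocranium",
            "10-variable face", "10-variable basicranium"]
SOURCE_POPULATION = ["included", "excluded"]
SEXES_COMPARED = ["both sexes", "same sex only"]

# product() varies the last factor fastest, so the four analysis sets come
# out in the order described in the text (first to fourth).
ANALYSIS_SETS = list(product(SOURCE_POPULATION, SEXES_COMPARED))
CONFIGURATIONS = list(product(ANALYSIS_SETS, DATASETS))

N_TEST_INDIVIDUALS = 200

for number, (source, sexes) in enumerate(ANALYSIS_SETS, start=1):
    print(f"Set {number}: source population {source}, {sexes}")

# Configurations per individual, then total runs under the stated assumption.
print(len(CONFIGURATIONS))
print(len(CONFIGURATIONS) * N_TEST_INDIVIDUALS)
```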
The results of this study support FORDISC's developers' caution against using
the program if a representative population is not available. However, if a
population is not represented in the database, FORDISC cannot be expected to
find a closely related population - either geographically or genetically. This
suggests that while FORDISC may be useful in very restricted contexts, its
widespread use on geographically or temporally remote populations is not
acceptable.
With respect to variable number, the results contradict FORDISC's developers'
contention that using too many variables reduces performance. Instead, this
study found that FORDISC only achieved reasonable rates of success when the
number of variables was maximized. Reducing the number of variables to the
level recommended by the developers for the size of the reference sample
resulted in exceedingly low success rates. The results were also not consistent
between test populations. Consequently, these results suggest that the program
should not be used on incomplete remains if sufficient numbers of measurements
cannot be obtained.
This study also determined that FORDISC was more accurate in assigning
ancestry when comparing a specimen only to members of its own sex. When
both sexes of each population were included in the comparison, FORDISC did
not consistently select the appropriate ancestry. Unfortunately, it did not
necessarily select the same sex either. While these results suggest that size may
be confounding FORDISC's determination of ancestry, the problem requires
further investigation.
The issue of how anatomical region affects FORDISC's ability to determine
ancestry was not fully resolved by this research. Although the neurocranial
region achieved the best results overall, all three regions performed very poorly.
Furthermore, the results varied across the five test populations. However, it was
not possible to settle this question through the current FORDISC program as the
number of variables associated with each anatomical region is limited.
Lastly, the issue of how best to interpret the results in terms of the recommended
posterior and typicality probabilities was also not fully resolved. At the levels
recommended by the FORDISC 3.0 manual, more incorrect determinations
would erroneously be considered correct. However, at the levels recommended
by the FORDISC 3.0 workshops, more correct determinations would be rejected
as incorrect. Neither of these recommendations appeared to correspond with a
natural sectioning point between correct and incorrect attributions. However,
when a sectioning point was calculated directly from the data, almost every
ancestry determination had to be considered either inconclusive or incorrect.
As it stands, FORDISC requires the population, the time period, the sex and as
many measurements as possible for a set of remains before it can be expected
to return a reasonable estimation of ancestry. Furthermore, if FORDISC does not
return a posterior probability higher than 0.991 together with a typicality
probability higher than 0.952, the resulting ancestry determination must be
considered ambiguous.
Given this situation, the only conclusion that can be drawn is that if FORDISC is
used at all, it should only be under extremely restricted circumstances or to
provide limited confirmation of information gathered through other means.
References
Albanese, J. and S.R. Saunders
2006 Is it possible to escape racial typology in Forensic Identification? In Forensic Anthropology and Medicine: Complementary Sciences from Recovery to Cause of Death. Schmitt, A, Cunha, E and J. Pinheiro eds. Totowa: Humana Press Inc.
Angel, J. L.
1976 Colonial to modern skeletal change in the U.S.A. American Journal of Physical Anthropology. 45:723-736.
Anthropolog
2005 Newsletter of the Department of Anthropology. National Museum of Natural History. Accessed 02/22/08 via http://www.google.com/search?q=american+academy+of+forensic+science+fordisc+workshop&sourceid=navclient-ff&ie=UTF-8&rlz=1B2GGFB_enCA218&aq=t
Arlington National Cemetery Website
2005 Richard Vandergeer, Second Lieutenant, USAF memorial page (http://www.arlingtoncemetery.net/rvandergeer.html) Accessed: 01/25/2007.
Ballard, M.E.
1999 Anterior femoral curvature revisited: race assessment from the femur. Journal of Forensic Sciences. Vol. 44:4.
Bass, W.M.
1995 Human Osteology: A Laboratory and Field Manual. Columbia, MO: Missouri Archaeological Society.
Beals, K., Smith, C.L. and S.M. Dodd
1983 Climate and the evolution of brachycephalization. American Journal of Physical Anthropology. Vol. 62:4.
Belcher, R, Williams, F. & GJ Armelagos
2002 Misidentification of Meroitic Nubians using Fordisc 2.0. (Abstract) American Journal of Physical Anthropology. Vol 117, Supplement 34:42.
Bendor-Samuel J, and RL. Hartell (editors)
1989 The Niger-Congo Languages - A classification and description of Africa's largest language family. Lanham, Maryland: University Press of America.
Boas, F.
1911 Changes in bodily form of descendants of immigrants. In Reports of the Immigration Commission. (1907-1910), Vol 38. Washington: Government Printing Office.
Brues, A.M.
1991 The Once and Future Diagnosis of Race. In Skeletal Attribution of Race. Gill, G.W. and S. Rhine eds. Albuquerque, NM: Maxwell Museum of Anthropology.
Buikstra, J.E. and D.H. Ubelaker.
1994 Standards for Data Collection from Human Skeletal Remains. Fayetteville, AR: Arkansas Archaeological Society.
Campbell, L.
1997 American Indian Languages: The Historical Linguistics of Native America. New York: Oxford University Press.
Campbell A.R, and G.J. Armelagos
2007 Assessment of FORDISC 3.0's accuracy in classifying individuals from WW Howell's populations and the forensic data bank. (Abstract) American Journal of Physical Anthropology Vol. 132 Suppl. 44, p. 83-84.
Carey, J.W. and A.T. Steegmann Jr.
1981 Human Nasal Protrusion, Latitude, and Climate. American Journal of Physical Anthropology. Vol. 56: 3.
Cavalli-Sforza, LL, Menozzi, P and A. Piazza
1994 The history and geography of human genes. Princeton: University Press.
Collard, M. and B. Wood
2000 How reliable are human phylogenetic hypotheses? Proceedings of the National Academy of Sciences (PNAS). Vol. 97:9.
Coon, C.S., Garn, S.M. and J.B. Birdsell
1950 Races: a study of the problems of race formation in man. Springfield, IL: Charles C. Thomas.
Corruccini, R.S.
1974 An examination of the meaning of cranial discrete traits for human skeletal biological studies. American Journal of Physical Anthropology. Vol. 40: 3.
Cox, Katharine, N.G Tayles, & H.R Buckley
2006 Forensic Identification of 'Race': The Issues in New Zealand. Current Anthropology. Vol. 47: 5.
Cunningham, D.L. & D.J. Wescott
2002 Within-group human variation in the Asian Pleistocene: the three Upper Cave crania. Journal of Human Evolution. Vol. 42: 627-638.
Franciscus, R.G and J.C. Long
1991 Variation in human nasal height and breadth. American Journal of Physical Anthropology. Vol. 85: 4.
Freid, D., Spradley, M.K., Jantz, R.L. and S.D. Ousley
2005 The truth is out there: how NOT to use FORDISC. (Abstract) American Journal of Physical Anthropology, Vol 126, Supplement 40.
Fukuzawa, S. and A. Maish
1997 Racial Identification of Ontario Iroquoian Crania Using FORDISC 2.0 (Abstract) from the 44th annual meeting of the Canadian Society of Forensic Science. Accessed via http://www.csfs.ca/journal/reginabstr.htm
Giles E. and O. Elliot
1962 Race identification from cranial measurements. Journal of Forensic Sciences. Vol. 7: 147-157.
Gill, G.W. and M. Gilbert
1990 Race identification from the midfacial skeleton: American blacks and whites. In Skeletal Attribution of Race. Gill, G.W. and S. Rhine eds. Albuquerque, NM: Maxwell Museum of Anthropology.
Harvati, K and T.D. Weaver
2006a Reliability of cranial morphology in reconstructing Neandertal phylogeny. In Neandertals revisited: new approaches and perspectives. Harvati, K and T. Harrison, eds. Dordrecht: Springer 239-254.
-- 2006b Human Cranial Anatomy and the Differential Preservation of Population History and Climate Signatures. The Anatomical Record, Part A, 288A:1225-1233.
Hennig, W.
1966 Phylogenetic systematics. Urbana: University of Illinois Press.
Hiernaux, J.
1963 Heredity and environment: their influence on human morphology; a comparison of two independent lines of study. American Journal of Physical Anthropology. Vol 21: 575-589.
Holliday, T.W. and A.B. Falsetti
1999 A new method for discriminating African-American from European-American skeletons using postcranial osteometrics reflective of body shape. Journal of Forensic Sciences. Vol. 44: 5. 926-30.
Howells, W.W.
1973 Cranial Variation in Man: A Study by Multivariate Analysis of Patterns of Difference Among Recent Human Populations. Papers of the Peabody Museum of Archaeology and Ethnology, Volume 67.
-- 1989 Skull Shapes and the Map. Cambridge, MA: Papers of the Peabody Museum of Archaeology and Ethnology, Volume 78.
-- 1995 Who's who in skulls: ethnic identification of crania from measurements. Cambridge, MA, Peabody Museum of Archaeology and Ethnology, Volume 82.
-- 1996 Howells' craniometric data on the internet. American Journal of Physical Anthropology. Vol. 101: 3.
Hubbe, M & WA Neves
2007 On the Misclassification of Human Crania. Discussion. Current Anthropology, volume 48, pp. 285-288.
Huberty, C.J.
1994 Applied Discriminant Analysis. In Wiley series in probability and mathematical statistics. Applied probability and statistics. New York, NY: Wiley.
Hudjashov G, Kivisild T, Underhill PA, Endicott P, Sanchez JJ, Lin AA, Shen P, Oefner P, Renfrew C, and R. Villems
2007 Revealing the prehistoric settlement of Australia by Y chromosome and mtDNA analysis. Proceedings of the National Academy of Sciences (PNAS). 104(21):8726-8730.
Hughes, D.R.
1968 Skeletal plasticity and its relevance in the Study of Earlier Populations. In The Skeletal Biology of Earlier Human Populations. D. R. Brothwell editor, pp. 31-55. London: Thames and Hudson.
Hylander, W. L.
1977 The adaptive significance of Eskimo craniofacial morphology. In Orofacial Growth and Development. Dahlberg, A.A and T.M. Graber eds. Chicago, IL: Mouton 129-170.
Jantz, R.L. and L. Meadows Jantz
2000 Secular change in craniofacial morphology. American Journal of Human Biology. Vol. 12:327-338.
Jantz, RL & SD. Ousley
1992 FORDISC 1.0: Computerized Forensic Discriminant Functions. The University of Tennessee, Knoxville.
-- 1996 FORDISC 2.0: Computerized Forensic Discriminant Functions. The University of Tennessee, Knoxville.
-- 2005 FORDISC 3: Computerized Forensic Discriminant Functions. Version 3.0. The University of Tennessee, Knoxville.
-- 2007 FORDISC 3.0: Theory, Methods and Applications. Workshop held in San Antonio, TX. February 20, 2007.
Kaestle, F.A, and D.G. Smith
2001 Ancient mitochondrial DNA evidence for prehistoric population movement: The numic expansion. American Journal of Physical Anthropology 115(1): 1-12.
Keita, S. O. Y.
2007 On Meroitic Nubian Crania, Fordisc 2.0 and Human Biological History. Discussion. Current Anthropology 48: 425-427.
Kitson, E.
1931 A Study of the Negro Skull with Special Reference to the Crania from Kenya Colony. Biometrika 23(3/4): 271-314.
Knight, A, Underhill, PA, Mortensen, HM, Zhivotovsky, LA, Lin, AA, Henn, BM, Louis, D, Ruhlen, M, and J.L. Mountain
2003 African Y Chromosome and mtDNA Divergence Provides Insight into the History of Click Languages. Current Biology 13(6):464-473.
Kosiba, S.
2000 Assessing the Efficacy and Pragmatism of "Race" Designation in Human Skeletal Identification: A Test of Fordisc 2.0 Program (Abstract). American Journal of Physical Anthropology, Vol 111, Supplement 30:200.
Leathers, A, Edwards, J, & GJ Armelagos
2002 Assessment of Classification of Crania Using Fordisc 2.0: Nubian X-Group Test (Abstract). American Journal of Physical Anthropology Vol. 117, S34:99-100.
Lieberman, D., Krovitz, G.E., Yates, F.W., Devlin, M. and M. St.Claire
2004 Effects of food processing on masticatory strain and craniofacial growth in a retrognathic face. Journal of Human Evolution. Vol. 46: 6.
Lovvorn, MB, Gill, GW, Carlson, GF, Bozell, JR, & TL. Steinacher
1999 Microevolution and the Skeletal Traits of a Middle Archaic Burial: Metric and Multivariate Comparison to Paleoindians and Modern Amerindians. American Antiquity, Vol. 64, No. 3. pp. 527-545.
Mangold, WL, Nawrocki, SP, & J. Scherbauer
1993 The Shaffer Site (12 GR 109): Additional information on an Albee Phase Site in the White River Valley. Indiana University. Accessed 01/06/07 via www.gbl.indiana.edu/abstracts/93/mangold_93.html
Naar, N. A., D. Hilgenberg, and G.J Armelagos
2006 Fordisc 2.0 the ultimate test: What is the truth? (Abstract) American Journal of Physical Anthropology, Vol 129, Supplement 42:136.
Nicholson, E. and K. Harvati
2006 Quantitative analysis of human mandibular shape using three-dimensional geometric morphometrics. American Journal of Physical Anthropology. Vol. 131: 3, 368-383.
O'Connell, JF and J. Allen
1998 When Did Humans First Arrive in Greater Australia and Why Is It Important to Know? Evolutionary Anthropology. Vol 6:132-146.
Omoto K, and Saitou N.
1997 Genetic origins of the Japanese: A partial support for the dual structure hypothesis. American Journal of Physical Anthropology 102(4):437-446.
Patriquin, M.L., Steyn, M. and S.R. Loth.
2002 Metric assessment of race from the pelvis in South Africans. Forensic Science International. Vol. 127:1-2, pp. 104-113.
Pietrusewsky, M.
2000 Metric Analysis of Skeletal Remains: Methods and Applications. In Biological Anthropology of the Human Skeleton. M.A. Katzenberg and S. Saunders eds. New York, NY: Wiley-Liss. 375-416.
Redd, A.J, and M. Stoneking
1999 Peopling of Sahul: mtDNA Variation in Aboriginal Australian and Papua New Guinean Populations. American Journal of Human Genetics 65(3).
Roseman, Charles C.
2004 Detecting interregionally diversifying natural selection on modern human cranial form by using matched molecular and morphometric data. Proceedings of the National Academy of Sciences (PNAS). Vol 101:35, 12824-12829.
Roseman, C.C. and T.D. Weaver
2004 Multivariate apportionment of global human craniometric diversity. American Journal of Physical Anthropology. Vol. 125: 257-263.
Sejrsen B, Lynnerup N & Hejmadi M.
2005 An historical skull collection and its use in forensic odontology and anthropology. Journal of Forensic Odontostomatology. 2005 Dec. 23(2):40-4.
Skelton, R.R.
1996 A Suggested Method for Using Means Data in Discriminant Functions Using Anthropometric Data. Journal of World Anthropology. Vol 1(4).
Skelton, R. and H. McHenry
1992 Evolutionary relationships among early hominids. Journal of Human Evolution. Vol 23: 309-349.
Smith, BH, Garn, SM and WS Hunter
1986 Secular trend in face size. Angle Orthodontist. Vol. 56: 196-204.
Spradley, M.K, Ousley, SD and RL Jantz
2008 Evaluating Cranial Morphometric Relationships using Discriminant Function Analysis. (Abstract) American Journal of Physical Anthropology. Vol. 135: S46, 199.
Steadman, DW, Adams, BJ, & LW. Konigsberg
2006 Statistical basis for positive identification in forensic anthropology. American Journal of Physical Anthropology. Vol 131(1), pp. 15-26.
Ubelaker, DH., Ross, AH and SM Graver
2002 Application of Forensic Discriminant Functions to a Spanish Cranial Sample. Forensic Science Communications 4(3).
Walsh, SJ and C. Eckhoff
2007 Australian Aboriginal population genetics at the D1S80 VNTR locus. Annals of Human Biology. Vol. 34: 5, 557-565.
Webb RE, and Rindos DJ.
1997 The Mode and Tempo of the Initial Human Colonization of Empty Landmasses: Sahul and the Americas Compared. p 233-250.
Wescott, D.J and R.L. Jantz
2005 Assessing Craniofacial Secular Change in American Blacks and Whites Using Geometric Morphometry. In Modern Morphometrics in Physical Anthropology. New York: Kluwer Academic/Plenum Publishers. p. 231-45.
Williams, F L'Engle, Belcher, RL. & GJ. Armelagos
2005 Forensic Misclassification of Ancient Nubian Crania: Implications for Assumptions about Human Variation. Current Anthropology 46(2): 340-346.
Williams, Paul B., Erickson, P and L. Niven
2001 Retrieving History: The 18th Century Mortuary History of the Little Dutch Church, Halifax. Paper Presented At The 33rd Annual Meeting of The Canadian Archaeological Association.
Wood, B. and D. Lieberman
2001 Craniodental variation in Paranthropus boisei: a developmental and functional perspective. American Journal of Physical Anthropology. 116:13-25.
Wood, C. and J.M. Lynch
1996 Sexual dimorphism in the craniofacial skeleton of modern humans. In Advances in Morphometrics. F.L. Marcus, M. Corti, A. Loy, G.J.P Naylor and D.E. Slice, editors. NATO ASI Series A: Life Sciences Vol. 284.
Wright, R. V. S.
1992 Correlation between cranial form and geography in Homo sapiens: CRANID - A computer program for forensic and other applications. Archaeology in Oceania (27): 128-34.
-- 2005 Guide to using the CRANID program CR5Ind.exe. Accessed via http://box.net/public/richwrig/dfiles/CR5Ind.ZIP
Appendix I
Howells' measurements used in FORDISC
Measurement Description
GOL glabello-occipital (maximum cranial) length
NOL nasio-occipital length
BNL basion nasion (cranial base) length
BBH basion bregma height
XCB maximum cranial width
XFB max frontal breadth
STB bistephanic breadth
ZYB bizygomatic breadth
AUB biauricular breadth
WCB minimum cranial breadth
ASB biasterionic breadth
BPL basion prosthion length
NPH nasion prosthion height
NLH nasal height
OBH orbital height
OBB orbital breadth
JUB bijugal breadth
NLB nasal breadth
MAB palate breadth
MDH mastoid height
MDB mastoid width
ZMB bimaxillary breadth
SSS zygomaxillary subtense
FMB bifrontal breadth
NAS nasio-frontal subtense
EKB biorbital breadth
DKS dacryon subtense
DKB interorbital breadth
NDS naso-dacryal subtense
WNB simotic chord
SIS simotic subtense
IML malar length, inferior
XML malar length maximum
MLS malar subtense
WMH cheek height
SOS supraorbital projection
GLS glabella projection
FOL foramen magnum length
FRC nasion-bregma chord
FRS nasion-bregma subtense
FRF nasion-subtense fraction
PAC bregma-lambda chord
PAS bregma-lambda subtense
PAF bregma-subtense fraction
OCC lambda-opisthion chord
OCS lambda-opisthion subtense
OCF lambda-subtense fraction
VRR vertex radius
NAR nasion radius
SSR subspinale radius
PRR prosthion radius
DKR dacryon radius
ZOR zygoorbitale radius
FMR frontomalare radius
EKR ectoconchion radius
ZMR zygomaxillare radius
AVR M1 alveolus radius
NAA nasion angle ba-pr
PRA prosthion angle na-ba
BAA basion angle na-pr
NBA nasion angle ba-br
BBA basion angle na-br
SSA zygomaxillare angle
NFA nasio-frontal angle
DKA dacryal angle
NDA naso-dacryal angle
SIA simotic angle
FRA frontal angle
PAA parietal angle
OCA occipital angle
BRR bregma radius
LAR lambda radius
OSR opisthion radius
BAR basion radius
Appendix II
Howells populations used in FORDISC and their sample sizes (test samples in bold).
Abbreviation³   Population                  Location            Males/Females
NOR             Medieval Norse              Norway              55/55
ZAL             Medieval Zalavar            Hungary             53/45
BER             Berg                        Austria             56/53
EGY             Egyptian (26-30 Dynasty)    Egypt               58/53
TEI             Teita                       Kenya               33/50
DOG             Dogon                       Mali                47/52
ZUL             Zulu                        South Africa        55/46
BUS             Bushman                     South Africa        41/49
AND             Andaman Islanders           Indian Ocean        35/35
AUS             Lake Alexandrina Tribes     South Australia     52/49
TAS             Tasmanian                   Tasmania            45/42
TOL             Tolai                       Papua New Guinea    56/54
MOK             Mokapu                      Hawaii              51/49
BUR             Buriat                      Siberia             55/54
ESK             Inugsuk Eskimo              Greenland           53/55
ARI             Arikara                     South Dakota        42/27
PER             Yauyos                      Peru                55/55
EAS             Easter Islanders            South Pacific       49/37
AIN             Ainu                        Japan               48/38
NJA             Hokkaido                    North Japan         55/32
SJA             Kyushu                      South Japan         50/41
HAI             Hainan                      South China Sea     45/38
ANY             Anyang                      Northeast China     42/0
ATA             Atayal                      Taiwan              29/18
PHI             Philippino                  Philippines         50/0
GUA             Indigenous Guam             South Pacific       30/27
MOR             Moriori                     Chatham Islands     57/51
SAN             Santa Cruz                  California          51/51
³ Used by FORDISC 3.0 when displaying the results of an analysis.
Appendix III
Variable sets used in the current study
Variable Sets       Variables Used
56 whole cranium    ASB, AUB, AVR, BBH, BNL, BPL, DKB, DKR, DKS, EKB, EKR, FMB, FMR, FOL,
                    FRC, FRF, FRS, GLS, GOL, IML, JUB, MAB, MDH, MLS, NAR, NAS, NDS, NLB,
                    NLH, NOL, NPH, OBB, OBH, OCC, OCF, OCS, PAC, PAF, PAS, PRR, SIS, SOS,
                    SSR, SSS, STB, VRR, WCB, WMH, WNB, XCB, XFB, XML, ZMB, ZMR, ZOR and ZYB
10 basicranium      AUB, WCB, ASB, MDH, MDB, OCC, OCS, OCF, FOL and OCA
10 neurocranium     GOL, NOL, XCB, XFB, FMB, FRS, PAC, PAF, FRA and PAA
10 face             BNL, NLB, MAB, DKB, NDS, WNB, NAR, DKR, PRA and DKA
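For readers who want to reuse these variable sets on their own data, the following is a minimal Python sketch rather than anything taken from FORDISC itself. It assumes the standard Howells abbreviations listed in Appendix I (for example DKB, MDH and ZMB), and it reads the last neurocranium variable as PAA (parietal angle), which is an assumption. The names VARIABLE_SETS and select_variables are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Variable sets from Appendix III, keyed by hypothetical short names.
# Abbreviations are assumed to follow Howells' codes as given in Appendix I.
VARIABLE_SETS: Dict[str, List[str]] = {
    "whole_cranium": (
        "ASB AUB AVR BBH BNL BPL DKB DKR DKS EKB EKR FMB FMR FOL "
        "FRC FRF FRS GLS GOL IML JUB MAB MDH MLS NAR NAS NDS NLB "
        "NLH NOL NPH OBB OBH OCC OCF OCS PAC PAF PAS PRR SIS SOS "
        "SSR SSS STB VRR WCB WMH WNB XCB XFB XML ZMB ZMR ZOR ZYB"
    ).split(),
    "basicranium": "AUB WCB ASB MDH MDB OCC OCS OCF FOL OCA".split(),
    # The final entry is assumed to be PAA (parietal angle).
    "neurocranium": "GOL NOL XCB XFB FMB FRS PAC PAF FRA PAA".split(),
    "face": "BNL NLB MAB DKB NDS WNB NAR DKR PRA DKA".split(),
}


def select_variables(measurements: Dict[str, float], set_name: str) -> Dict[str, float]:
    """Return only the measurements belonging to the named variable set.

    Raises KeyError if the set name is unknown or if any variable in the
    set is absent from the supplied measurements.
    """
    wanted = VARIABLE_SETS[set_name]
    missing = [name for name in wanted if name not in measurements]
    if missing:
        raise KeyError(f"missing measurements for {set_name}: {missing}")
    # Preserve the order in which the variables are listed in the set.
    return {name: measurements[name] for name in wanted}
```

Given a dictionary of measurements for one cranium, select_variables(cranium, "face") would return the ten facial variables in the order listed, and would fail loudly if any of them had not been recorded.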