www.elsevier.com/locate/jad
Journal of Affective Disorders 80 (2004) 79–85
Brief report
An illustration of how a self-report diagnostic screening scale could
improve the internal validity of antidepressant efficacy trials
Mark Zimmerman*, Iwona Chelminski, Michael Posternak
Department of Psychiatry and Human Behavior, Brown University School of Medicine, Rhode Island Hospital, Providence, RI, USA
Received 9 September 2002; accepted 22 January 2003
Abstract
Background: During the past 20 years semi-structured diagnostic interviews have been the standard for diagnostic
evaluations in research relying on reliable and valid psychiatric assessment and diagnosis; however, only a minority of
antidepressant efficacy trials (AETs) employ these interviews. This might be important insofar as several studies have found that
clinicians conducting unstructured clinical interviews underrecognize diagnostic comorbidity. Because of the financial
incentives to recruit patients into AETs quickly, there is little incentive to vigorously determine the presence of comorbid
conditions that should result in exclusion from the trial. In the present report we demonstrate how a self-report diagnostic
screening scale could be used to identify systematic differences in diagnostic practice across settings, and how such a scale
could be used to compare samples of patients who pass screening evaluations and are accepted into an AET. Methods:
Depressed patients completed the Psychiatric Diagnostic Screening Questionnaire (PDSQ), and were evaluated with either an
unstructured clinical interview or with the Structured Clinical Interview for DSM-IV (SCID). Results: The two samples were
clinically comparable based on their scores on the self-administered PDSQ. Consistent with the greater thoroughness of the
SCID, compared to unstructured diagnostic evaluations, more patients administered the SCID were diagnosed with comorbid
conditions. After excluding patients with disorders that might be the basis for exclusion from an AET, the two samples then
differed in their scores on the PDSQ. That is, more patients in the sample evaluated by an unstructured interview had ‘occult’
pathology than patients evaluated with the SCID. Conclusion: These findings demonstrate how systematic differences in
diagnostic practice might be detected across sites when conducting AETs. Limitations: The study was conducted with patients
in a single outpatient clinical practice rather than participants of a multi-site trial.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Major depressive disorder (MDD); Antidepressant efficacy trials (AETs); Self-report scale; Semi-structured diagnostic interview
* Corresponding author. Present address: Bayside Medical
Center, 235 Plain Street, Providence, RI 02905, USA.
E-mail address: [email protected] (M. Zimmerman).
0165-0327/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/S0165-0327(03)00050-8
1. Introduction
Antidepressant efficacy trials (AETs) rely on accurate
diagnostic determinations to select patients with
the diagnosis of interest (usually major depressive
disorder, MDD) and exclude patients with comorbid
conditions. In discussing the reasons for failed AETs,
Robinson and Rickels (2000) questioned pharmaceu-
tical companies’ current practice of conducting multi-
site studies involving many treatment centers because
of difficulties maintaining control over the quality of
the diagnostic and outcome evaluations. Variable
competence and quality of diagnostic raters in out-
come studies introduces error variance in the data
collected, and this error variance may be one factor
which contributes to the failure to detect differences in
outcome between active medication and placebo.
Methods to identify and reduce this error variance
should improve the internal validity of AETs.
During the past 20 years semi-structured diagnostic
interviews have been the standard for diagnostic
evaluations in research relying on reliable and valid
psychiatric assessment and diagnosis. However, this
standard has not extended to AETs. We recently
reviewed 39 AETs to determine how many patients
in a routine clinical practice would have been exclud-
ed had the exclusion criteria from the trials been
applied (Zimmerman et al., submitted for publication).
Only eight (20.5%) of the 39 studies reported using
standardized diagnostic interviews to determine
whether patients had MDD, the comorbid diagnoses
requiring exclusion, or a bipolar or psychotic subtype
of depression requiring exclusion. The other 31 stud-
ies apparently relied on unstructured clinical evalua-
tions to diagnose patients.
During the past 3 years four independent reports
have questioned the accuracy of psychiatric diagnoses
made by clinicians using unstructured clinical inter-
views (Basco et al., 2000; Miller et al., 2001; Shear et
al., 2000; Zimmerman and Mattia, 1999). All four
research groups reported that clinicians conducting
unstructured diagnostic interviews underrecognize di-
agnostic comorbidity. Whether or not these findings
extend to clinical evaluators in AETs is unknown.
However, because of the aforementioned financial
incentives to recruit patients into AETs quickly, there
is little incentive to vigorously determine the presence
of comorbid conditions that should result in exclusion
from the trial.
It can be difficult to demonstrate differences in
diagnostic practice between clinicians, or between
clinical/research sites. One method would be to video
or audiotape diagnostic interviews and have them
reviewed by an independent ‘expert’ clinician. A
disadvantage of such an approach is that it is time
consuming and expensive. Another, less costly, meth-
od to determine whether diagnosticians systematically
differ in their diagnostic practice is with the use of
self-administered questionnaires as a ‘paper-standard’.
Zimmerman and colleagues suggested that diagnostic
criteria might be differentially applied across diagnos-
tic centers, and illustrated how the differential appli-
cation of criteria might be responsible for the
difficulty in independently replicating research find-
ings (Zimmerman et al., 1990). In a separate report
they suggested that self-report questionnaires offered
an inexpensive, non-laborious, empirical, and easily
standardized method that can detect systematic differ-
ences between research groups in diagnostic practices
(Zimmerman et al., 1993). The results of a self-report
scale can be used as a benchmark to which interview-
er-derived diagnoses can be compared, and this would
provide a method of detecting systematic differences
in diagnostic practice.
As part of the Rhode Island Methods to Improve
Diagnostic Assessment and Services (MIDAS) proj-
ect, our research group has developed a broad-based
self-report scale that screens for several DSM-IV Axis
I disorders—the Psychiatric Diagnostic Screening
Questionnaire (PDSQ; Zimmerman and Mattia,
2001a,b). In the present report we illustrate how a
self-report scale such as the PDSQ can be used to
identify systematic differences in diagnostic practice
and how this might influence who gets included in
AETs.
During the 7 years of the MIDAS project, some
patients have been evaluated with the Structured
Clinical Interview for DSM-IV (SCID), whereas other
patients have been evaluated by clinicians using
unstructured interviews (non-SCID sample). We con-
ducted a series of three analyses to test the hypothesis
that systematic differences between the different di-
agnostic methods could be detected by a self-report
scale. First, we determined if the depressed patients in
the SCID and non-SCID samples are clinically similar
by comparing the two groups on their scores on the
PDSQ. We did this because when searching for differ-
ences in diagnostic practice it is important to establish
the clinical equivalence of the comparison groups, or
to control for true sample differences. Second, we
compared the SCID and non-SCID samples in diag-
nostic frequencies. Based on our prior work we
predicted that more depressed patients in the SCID
than the non-SCID sample would be diagnosed with
comorbid disorders (Zimmerman and Mattia, 1999). If
this were true, and the two groups were similar on the
self-report PDSQ, this would reflect a systematic
diagnostic bias that is due to diagnostic method rather
than true clinical differences between the samples.
And third, we compared patients in the SCID and
non-SCID samples on the PDSQ after excluding
patients based on the results of the diagnostic evalu-
ation. In other words, we excluded patients with
disorders that are often the basis for exclusion in
AETs. We predicted that patients in the non-SCID
group would now score higher on the PDSQ than
patients in the SCID group because there would be more
occult disorder in the patients who were evaluated less
thoroughly.
2. Method
2.1. Patients
More than 2000 patients have been evaluated in the
Rhode Island Hospital Department of Psychiatry out-
patient practice. This private practice group predom-
inantly treats individuals with medical insurance
(including Medicare but not Medicaid) on a fee-for-
service basis, and is distinct from the hospital’s
outpatient residency training clinic that predominantly
serves lower income, uninsured, and medical assis-
tance patients.
We examined psychiatric diagnoses in two non-
overlapping cohorts of patients who completed the
final version of the PDSQ—patients interviewed by
clinicians with an unstructured clinical interview
(non-SCID sample, n = 1352) and patients interviewed
with the SCID (n = 993). Not all patients were inter-
viewed with the SCID because of the lack of avail-
ability of diagnostic raters and patients’ preference for
the briefer clinical evaluation. Thus, assignment to
receiving a SCID or unstructured clinical diagnostic
evaluation was not random.
2.2. Assessment
Before the initial evaluation all patients completed
the PDSQ as part of their initial paperwork. The
PDSQ is a broad-based screening questionnaire
assessing the symptoms of mood, eating, anxiety,
substance use, and somatoform disorders (Zimmer-
man and Mattia, 2001a,b). Because the validity of the
PDSQ was under investigation, the clinicians were
kept blind to the patients’ responses on the question-
naire. The institutional review board approved the
evaluation protocol, and participants provided written
informed consent.
In the non-SCID sample, diagnostic evaluations
were conducted by attending psychiatrists. Diagnoses
were based on DSM-IV criteria. Clinicians complet-
ed a standardized intake form modeled on the Intake
Evaluation Form of Mezzich and colleagues (Mez-
zich et al., 1981). The intake form included space for
a narrative description of the chief complaint, history
of present illness, and past psychiatric history. In
addition, there was a checklist to record the presence
or absence of substance use problems, a history of
sexual or physical abuse, psychotic symptoms, panic
attacks, phobias, obsessions, compulsive behavior,
and all of the symptoms of major depression. On
the last page of the five-page form clinicians
recorded patients’ DSM-IV multiaxial diagnoses.
Research assistants recorded the results of the clini-
cian’s diagnostic evaluation written on the last page
of the intake form, and collected demographic infor-
mation from the narrative. To avoid underestimating
comorbidity detection by clinicians, we included as
cases patients whom the clinicians diagnosed with a
‘rule-out’ disorder.
When patients called to schedule their initial
appointment they were offered the opportunity to
receive a more comprehensive evaluation than the
usual clinical evaluation. The patients were told that
they would be interviewed by two people—first by
a diagnostic rater who would conduct a compre-
hensive evaluation, and then by a psychiatrist. After
the SCID, the rater presented the case to a psychi-
atrist who reviewed the findings of the evaluation
with the patient. The extensive training program of
the diagnostic raters has been described in prior
reports (Zimmerman and Mattia, 1999). During the
course of the study, joint-interview diagnostic reli-
ability information has been collected on 47
patients. For anxiety and substance use disorders
the kappa coefficients were: panic disorder (κ = 1.0),
social phobia (κ = 0.84), obsessive-compulsive disorder
(κ = 1.0), generalized anxiety disorder
(κ = 0.93), posttraumatic stress disorder (κ = 0.91),
alcohol abuse/dependence (κ = 0.64), and drug
abuse/dependence (κ = 0.73).
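For readers unfamiliar with the statistic, the kappa coefficients above correct the raw agreement between two raters for the agreement expected by chance. The following is a minimal sketch of that calculation for a present/absent diagnosis. The function name and the counts are hypothetical and chosen only for illustration; the joint-interview tables for the 47 patients are not reported here.

```python
def cohens_kappa(both_yes, rater1_only, rater2_only, both_no):
    """Cohen's kappa for two raters making a present/absent diagnosis."""
    n = both_yes + rater1_only + rater2_only + both_no
    if n == 0:
        raise ValueError("no rated patients")
    # Observed agreement: both say present or both say absent.
    observed = (both_yes + both_no) / n
    # Chance agreement from each rater's marginal rate of "present".
    r1_yes = (both_yes + rater1_only) / n
    r2_yes = (both_yes + rater2_only) / n
    expected = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 47 jointly interviewed patients (not from the study).
kappa = cohens_kappa(both_yes=8, rater1_only=1, rater2_only=1, both_no=37)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1.0, as reported for panic disorder and obsessive-compulsive disorder, corresponds to no disagreements at all between the two raters.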
The PDSQ has undergone several rounds of
study involving more than 3000 primary care and
psychiatric outpatients. After each large validation
study, the scale was revised based on a psychomet-
ric analysis of the subscales and items. The final
version of the PDSQ consists of 126 questions
assessing the symptoms of 13 DSM-IV disorders
in five areas: eating disorders (bulimia/binge eating
disorder), mood disorders (MDD), anxiety disorders
(panic disorder, agoraphobia, PTSD, OCD, GAD
and social phobia), substance use disorders (alcohol
abuse/dependence, drug abuse/dependence), and
somatoform disorders (somatization disorder, hypo-
chondriasis). In addition, there is a six-item psy-
chosis screen.
The reliability and validity of the PDSQ have been
described in detail elsewhere (Zimmerman and Mat-
tia, 2001a,b). Briefly, in the validity study of the final
version of the PDSQ, the 13 PDSQ subscales dem-
onstrated good to excellent levels of internal consis-
tency (Zimmerman and Mattia, 2001a). Cronbach’s
alpha was greater than 0.80 for 12 of the 13 subscales,
and the mean of the alpha coefficients was 0.86. Test–
retest reliability coefficients were greater than 0.80 for
nine subscales (mean 0.83). The convergent and
discriminant validity of the PDSQ subscales was
examined in 361 patients who completed a package
of questionnaires at home less than a week after
completing the PDSQ. The booklet included measures
of symptoms related to each of the PDSQ symptom
domains. Every PDSQ subscale was more highly
correlated with the concordant validity scale assessing
the same symptom domain than with measures of other symptom
domains. Across all subscales, the mean correlation
between the PDSQ subscales and their respective
validity scale was 0.66, while the mean correlation
between PDSQ subscales and measures of other
symptom domains was 0.25. Finally, the diagnostic
performance of the PDSQ subscales was examined in
630 patients interviewed with the SCID. Sensitivity
and specificity varied according to the cut-off used
(Zimmerman and Mattia, 2001b). In the present report
we used the PDSQ cut-off scores associated with a
specificity of 90%.
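One way to picture how a cut-off tied to a target specificity could be chosen is sketched below. This is an illustration only: the scores are invented, the function names are hypothetical, and the rule shown (take the lowest cut-off whose specificity reaches the target) is an assumption, not the procedure documented for the PDSQ.

```python
def specificity_at_cutoff(scores_without_disorder, cutoff):
    """Share of non-cases who screen negative (score below the cut-off)."""
    negatives = sum(1 for s in scores_without_disorder if s < cutoff)
    return negatives / len(scores_without_disorder)


def sensitivity_at_cutoff(scores_with_disorder, cutoff):
    """Share of true cases who screen positive (score at or above the cut-off)."""
    positives = sum(1 for s in scores_with_disorder if s >= cutoff)
    return positives / len(scores_with_disorder)


def lowest_cutoff_for_specificity(scores_without_disorder, max_score, target=0.90):
    """Smallest cut-off whose specificity meets the target."""
    for cutoff in range(0, max_score + 2):
        if specificity_at_cutoff(scores_without_disorder, cutoff) >= target:
            return cutoff
    return None


# Invented subscale scores, grouped by interview diagnosis (not study data).
non_cases = [0, 0, 1, 1, 1, 2, 2, 3, 4, 6]
cases = [2, 4, 5, 5, 6, 7, 8, 8]

cutoff = lowest_cutoff_for_specificity(non_cases, max_score=8)
sensitivity = sensitivity_at_cutoff(cases, cutoff)
print(cutoff, specificity_at_cutoff(non_cases, cutoff), sensitivity)
```

Fixing specificity at a high level keeps false positives rare, so a patient who screens positive after being cleared at interview is more likely to have a disorder that the interview missed.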
2.3. Statistical analysis
In the present report we focused on anxiety and
substance use disorders because they are the most
frequent comorbidities used as exclusion criteria in
AETs (Zimmerman et al., submitted for publication).
We did not examine specific phobia or nicotine
dependence because these disorders are rarely used
as the basis for exclusion in an AET. We examined the
impact of different diagnostic methods on the appli-
cation of the anxiety and substance use disorder
exclusion criteria in the patients with non-psychotic,
unipolar MDD in the SCID and non-SCID samples.
First, we determined the similarity of the two samples
by comparing their demographic characteristics and
scores on the PDSQ. Next, we compared the two
samples on the percentage of patients that might be
excluded from an AET because the disorder is pres-
ent. Last, for each of the anxiety and substance
disorders we compared patients in the SCID and
non-SCID groups on the PDSQ after we excluded
patients with the diagnosis. Categorical variables were
compared by the chi-square statistic and continuous
variables were compared with t-tests.
3. Results
More than 900 patients received a principal diagnosis
of current non-bipolar MDD: 579 patients in the
non-SCID sample and 339 patients in the SCID
sample. The data in Table 1 indicate that there were
modest, albeit statistically significant, differences be-
tween the non-SCID and SCID samples in age,
marital status, education, and race.
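The age comparison in Table 1 can be checked from the reported means and standard deviations alone. The sketch below computes a two-sample t statistic in two common forms. The text states only that t-tests were used, so the choice of form is an assumption, as is the premise that age was recorded for all 579 and 339 patients.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic without assuming equal variances in the two groups."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Classic Student t statistic with a pooled variance estimate."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Age (years) from Table 1: non-SCID sample first, SCID sample second.
# Assumption: age was available for every patient in both samples.
t_welch = welch_t(40.72, 14.3, 579, 38.67, 12.0, 339)
t_pooled = pooled_t(40.72, 14.3, 579, 38.67, 12.0, 339)
print(f"Welch t = {t_welch:.2f}, pooled t = {t_pooled:.2f}")
```

With these inputs the unequal-variance form lands close to the t of 2.32 given in Table 1, while the pooled form is somewhat lower, which suggests (but does not establish) that the unequal-variance form was used.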
Patients in the non-SCID and SCID samples were
compared on the PDSQ subscale scores, controlling
for demographic differences. There were no signifi-
cant differences between the groups on any of the
PDSQ subscale scores. Thus, despite the significant
differences in demographic characteristics, the SCID
and non-SCID patient samples were clinically similar
as assessed by a reliable and valid self-report measure
of DSM-IV symptoms.
Each anxiety disorder except PTSD was significantly
more frequently diagnosed in the SCID than the non-SCID
sample (panic disorder: 16.2 vs. 8.8%,
χ² = 11.51, P < 0.01; obsessive-compulsive disorder:
7.7 vs. 3.3%, χ² = 8.83, P < 0.01; social phobia: 32.4
vs. 2.4%, χ² = 165.0, P < 0.01; posttraumatic stress
disorder: 10.6 vs. 9.0%, χ² = 0.66, NS; generalized
anxiety disorder: 20.1 vs. 7.3%, χ² = 33.24, P < 0.01).
There was no difference between SCID and non-SCID
groups in rates of current drug abuse/dependence (6.2
vs. 3.5%, χ² = 3.76, NS). Alcohol abuse/dependence
was significantly more frequently diagnosed in the
SCID patients (9.4 vs. 5.5%, χ² = 5.05, P < 0.05).
Table 1
Demographic characteristics of depressed patients in the non-SCID and SCID samples
                                      non-SCID (n = 579)   SCID (n = 339)     Two-group test
                                      n       %            n       %          χ²       P
Gender                                                                        0.22     NS
  Females                             399     69.0         229     67.6
  Males                               179     31.0         110     32.4
Race                                                                          11.13    < 0.01
  White                               465     80.3         301     88.8
  Non-white                           114     19.7         38      11.2
Education                                                                     15.76    < 0.05
  Less than high school               84      14.5         30      8.8
  High school graduate or GED         341     59.6         223     65.8
  College graduate                    154     26.9         86      25.4
Marital status                                                                13.54    < 0.05
  Married                             245     43.4         140     41.3
  Living with someone as if married   33      5.8          20      5.9
  Widowed                             24      4.2          5       1.5
  Separated                           46      8.1          21      6.2
  Divorced                            101     17.9         54      15.9
  Single                              116     20.5         99      29.2
                                      Mean    S.D.         Mean    S.D.       t        P
Age (years)                           40.72   14.3         38.67   12.0       2.32     < 0.05
Table 2
Prevalence of PDSQ cases in the non-SCID and SCID patients after excluding patients with the index disorder
PDSQ casesᵃ                     non-SCID sample (n = 579)          SCID sample (n = 339)              Two-group test
                                nᵇ     No. of PDSQ cases   %       nᵇ     No. of PDSQ cases   %       χ²      P
Panic disorder                  492    86                  17.5    274    24                  8.8     10.88   0.01
Social phobia                   517    95                  18.4    223    28                  12.6    3.81    0.05
Obsessive compulsive disorder   522    70                  13.4    300    21                  7.0     7.95    0.01
Posttraumatic stress disorder   504    69                  13.7    300    38                  12.7    0.17    NS
Generalized anxiety disorder    486    103                 21.2    267    55                  20.6    0.04    NS
Any alcohol use disorder        492    43                  8.7     301    20                  6.6     1.12    NS
Any drug use disorder           504    22                  4.4     312    23                  7.4     3.34    NS
ᵃ In the non-SCID sample, the number of patients with missing data on the PDSQ was 36 on the panic disorder subscale, 48 on the social phobia subscale, 38 on the obsessive compulsive disorder subscale, 23 on the posttraumatic stress disorder subscale, 51 on the generalized anxiety disorder subscale, 55 on the alcohol use disorder subscale, and 55 on the drug use disorder subscale. In the SCID sample, the number of patients with missing data on the PDSQ was ten on the panic disorder subscale, six on the social phobia subscale, 13 on the obsessive compulsive disorder subscale, three on the posttraumatic stress disorder subscale, four on the generalized anxiety disorder subscale, six on the alcohol use disorder subscale, and six on the drug use disorder subscale.
ᵇ n indicates sample size after patients diagnosed with the index disorder were excluded. For example, 51 patients in the non-SCID group were diagnosed with panic disorder. Data were missing on the PDSQ panic disorder subscale in 36 patients. Thus, n = 492 (579 − 51 − 36).
We next compared the SCID and non-SCID samples
on individual PDSQ subscales after excluding
patients with the index disorder. For example, the
samples were compared on the PDSQ panic disorder
subscale after patients diagnosed with panic disorder
were excluded. This is analogous to comparing dif-
ferent samples included in an AET at different sites (or
in different studies) using different diagnostic meth-
odologies. After this exclusion, we compared the
percentage of patients in each group who were pos-
itive on the PDSQ for each of the anxiety or substance
use disorders used as the basis for exclusion (Table 2).
As expected from the above results, after excluding
the diagnosed cases, significantly more patients
screened positive on the PDSQ for panic disorder,
social phobia, and obsessive-compulsive disorder in
the non-SCID group than the SCID group (because
more of these cases were undetected by the unstruc-
tured interview in the non-SCID group).
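The panic disorder row of Table 2 can be worked through as an example of the post-exclusion comparison. The sketch below first derives the non-SCID denominator as described in the table note, then computes a Pearson chi-square for the resulting 2 × 2 table. The function name is hypothetical, and the use of the uncorrected (no continuity correction) statistic is an assumption; the text says only that the chi-square statistic was used.

```python
import math


def pearson_chi2_2x2(pos1, n1, pos2, n2):
    """Uncorrected Pearson chi-square (1 df) comparing two proportions."""
    a, b = pos1, n1 - pos1          # group 1: screen positive, screen negative
    c, d = pos2, n2 - pos2          # group 2: screen positive, screen negative
    total = n1 + n2
    chi2 = total * (a * d - b * c) ** 2 / (n1 * n2 * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability reduces to erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Non-SCID denominator: 579 depressed patients, minus 51 diagnosed with
# panic disorder, minus 36 with missing PDSQ panic subscale data.
n_non_scid = 579 - 51 - 36
n_scid = 274                        # post-exclusion n as listed in Table 2

chi2, p_value = pearson_chi2_2x2(86, n_non_scid, 24, n_scid)
print(f"n = {n_non_scid}, chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

Under these assumptions the statistic agrees with the 10.88 listed in Table 2. The same function can be applied to the other rows by substituting their counts.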
4. Discussion
In this report we illustrated how a self-report
questionnaire could be used to detect systematic
differences in diagnostic practices in research studies
such as AETs. Depressed patients drawn from the
same clinical practice were evaluated with an unstruc-
tured clinical interview or with the SCID. The non-
SCID and SCID samples were clinically comparable
(according to the PDSQ), though significantly more
patients were diagnosed with anxiety and alcohol
use disorders in the SCID sample. Without the PDSQ
data we would not have been able to determine that
the differences in diagnostic frequencies between the
samples were due to different diagnostic practices
rather than true sample differences.
Differences in diagnostic practices can influence
recruitment into an AET. Anecdotal conversations
with researchers and clinicians who have worked
on clinical trials at different sites suggest that
different levels of rigor are used in adhering to
the specified inclusion and exclusion criteria. To
illustrate how this might occur, we applied the
anxiety and substance use disorder exclusion crite-
ria frequently used in AETs to two samples eval-
uated with different degrees of diagnostic rigor. As
expected, a greater percentage of patients evaluated
more thoroughly with the SCID would have been
excluded from a clinical trial.
Of course, use of a semi-structured diagnostic inter-
view for recruitment into a study does not ensure that
recruitment and the application of exclusion criteria in
AETs will be done similarly across sites. Variability in
the interpretation of subjects’ responses to questions of
a semi-structured interview remains, and different
investigators may have different thresholds for diag-
nosing comorbid conditions. Consequently, it would be
helpful to be able to compare samples that are accepted
into an AET on a measure that is free of interviewer bias
in the application of diagnostic criteria. A self-report
questionnaire such as the PDSQ is one such approach.
We demonstrated that the patients who might have
passed through the diagnostic evaluation process as
part of an AET based on different methods of diagnosis
were not comparable. That is, when diagnoses were
based on an unstructured clinical evaluation signifi-
cantly more patients who might have been accepted
into the AET scored positive on the PDSQ than patients
who might have been accepted into the AET based on
the SCID interview. If these were the findings from an
actual multi-center AET we would interpret them as
indicating that there was a systematic difference be-
tween sites in applying the exclusion criteria. Theoret-
ically, when this happens it is not possible to know
which site is more appropriately applying the AET
exclusion criteria. A self-report paper-standard does
not indicate which site is more or less accurate in
evaluating patients. Rather, a self-report paper-standard
simply identifies a systematic difference in how
patients were evaluated. However, in light of the
financial incentives to overlook exclusion criteria and
recruit patients into an AET as quickly as possible, one
could infer which sites are less rigorously applying the
exclusion criteria.
We do not know what is actually done when subjects
are evaluated for participation in an AET. It is probable
that use of a semi-structured diagnostic interview
results in closer adherence to the stated exclusion
criteria; thus, this methodology should probably be
routinely used in recruiting patients for an AET. It is
surprising that semi-structured interviews, which are
the diagnostic standard in most areas of psychiatric
research, are so infrequently used in AETs. Even when
semi-structured interviews are used there is still room
for interpretation. Thus, we would also recommend that
self-report questionnaires be routinely used in AETs,
and that results on these measures be routinely reported
in the same way the samples’ demographic character-
istics are described.
Acknowledgements
This research was supported, in part, by grants
MH48732 and MH56404 from the National Institute
of Mental Health.
References
Basco, M.R., Bostic, J.Q., Davies, D., Rush, A.J., Witte, B., Hen-
drickse, W., Barnett, V., 2000. Methods to improve diagnostic
accuracy in a community mental health setting. Am. J. Psychia-
try 157, 1599–1605.
Mezzich, J.E., Dow, J.T., Rich, C.L., Costello, A.J., Himmelhoch,
J.M., 1981. Developing an efficient clinical information system
for comprehensive psychiatric institute. II: Initial evaluation
form. Behav. Res. Methods Instrum. Comput. 13, 464–478.
Miller, P.R., Dasher, R., Collins, R., Griffiths, P., Brown, F., 2001.
Inpatient diagnostic assessments: 1. Accuracy of structured ver-
sus unstructured interviews. Psychiatry Res. 105, 255–264.
Robinson, D., Rickels, K., 2000. Concerns about clinical drug tri-
als. J. Clin. Psychopharmacol. 20, 593–596, editorial.
Shear, M.K., Greeno, C., Kang, J., Ludewig, D., Frank, E.,
Swartz, H.A., Hanekamp, M., 2000. Diagnosis of non-psy-
chotic patients in community clinics. Am. J. Psychiatry 157,
581–587.
Zimmerman, M., Coryell, W., Black, D.W., 1990. Variability in the
application of contemporary diagnostic criteria: endogenous de-
pression as an example. Am. J. Psychiatry 147, 1173–1179.
Zimmerman, M., Coryell, W., Black, D.W., 1993. A method to
detect intercenter differences in the application of contemporary
diagnosis criteria. J. Nerv. Ment. Dis. 181, 130–134.
Zimmerman, M., Mattia, J.I., 1999. Psychiatric diagnosis in clinical
practice: is comorbidity being missed? Compr. Psychiatry 40,
182–191.
Zimmerman, M., Mattia, J.I., 2001a. The Psychiatric Diagnostic
Screening Questionnaire: development, reliability and validity.
Compr. Psychiatry 42, 175–189.
Zimmerman, M., Mattia, J.I., 2001b. A self-report scale to help
make psychiatric diagnoses: the Psychiatric Diagnostic
Screening Questionnaire (PDSQ). Arch. Gen. Psychiatry 58,
787–794.
Zimmerman, M., Chelminski, I., Posternak, M.A. Exclusion crite-
ria used in antidepressant efficacy trials: Consistency across
studies and representativeness of samples included. Submitted
for publication.