The Test for Significant Toxicity (TST) – A “New” Hypothesis Testing Approach for Aquatic Bioassay Testing
Philip Markle
Environmental [email protected]
History of the TST
June 2010 – EPA released WET TST guidance (EPA 833-R-10-003)
Also referred to as:
– Bioequivalence Testing
– Alternative Null Hypothesis Testing
Accepted for FDA drug trials and evaluations
Originally proposed for use in toxicity testing in 1995 (Erickson and McDonald)
Recently proposed for CA’s WET Policy
Limitations of the TST
It is still a statistical hypothesis test
– Not very useful for comparing results spatially or temporally
– Pass/fail test; provides no information on magnitude
Requires knowledge/use of a “threshold” response – “b”, or bioequivalence factor
Probably (and debatably) best suited for regulatory purposes
Statistical Hypothesis Testing 101
Statistically speaking:
– You can’t “prove” anything with a hypothesis test; we can only “disprove”
The “White Swan” Parable:
You can’t prove that “all swans are white”
If we see 10,000 white swans and no non-white swans, we fail to reject our hypothesis
In the absence of evidence to the contrary, we then assume the hypothesis is true
“Proving” with Statistics
However, after observing just one non-white swan, we can confidently reject, or disprove, our hypothesis that all swans are white
Statistical Hypothesis Testing – Background
Null or “Initial” Hypothesis (Ho)
– Mean(sample) ≥ Mean(control)
Conduct statistical analyses to try to reject this hypothesis
If unable to reject, we assume the null or “Initial” hypothesis is correct
Type I and Type II error
Type I and Type II Errors
Type I Error
– Probability of rejecting the null or “Initial” hypothesis when it is “true”
– Controlled directly by setting alpha (α)
Type II Error
– Probability of accepting (failing to reject) the null or “Initial” hypothesis when it is “false”
– Denoted beta (β); “power” is 1 − β
– Controlled indirectly
Standard Hypothesis Testing (NOEC)
With the NOEC:
– The initial hypothesis is mean(sample) ≥ mean(control). In other words, the sample is non-toxic!
– If we don’t/can’t “prove” this to be incorrect statistically, we assume it is true
– Type I error = identifying a non-toxic sample as toxic
TST Hypothesis
With the TST:
– The hypothesis is mean(effluent) ≤ 0.75 × mean(control). In other words, the sample is toxic!
– If we don’t/can’t “prove” this to be incorrect statistically, we assume it is true; we assume the sample is toxic
– Type I error = identifying a toxic sample as non-toxic
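Side by side, and assuming the one-sided form of the standard test, the two null hypotheses can be written with μ for the true mean response and b for the threshold fraction of the control (0.75 in the TST hypothesis above):

```latex
\begin{aligned}
\text{NOEC:}\quad & H_0:\ \mu_{\text{sample}} \ge \mu_{\text{control}}
  && H_1:\ \mu_{\text{sample}} < \mu_{\text{control}} \\
\text{TST:}\quad  & H_0:\ \mu_{\text{sample}} \le b\,\mu_{\text{control}}
  && H_1:\ \mu_{\text{sample}} > b\,\mu_{\text{control}}
\end{aligned}
```

Since rejecting H0 is the only outcome a hypothesis test can “prove”, the NOEC approach has to demonstrate toxicity, while the TST has to demonstrate non-toxicity.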
Bioequivalence Factor (b)
In the EPA guidance:
– Set as an unacceptable or “toxic” threshold
For chronic tests:
– b = 0.75 = 25% effect
For acute tests:
– b = 0.80 = 20% effect
Regulatory Management Decisions (RMDs)
Setting the Type I error rate: alpha (α)
– How frequently will you reject Ho when it is true?
EPA desires that no more than 25% of the tests with a 25% effect or more are identified as “non-toxic”
Alpha (α) is then set at 0.05 to 0.25, depending on the test
Test/Species-Specific Alpha
Why the Different Alphas?
EPA’s second Regulatory Management Decision:
– No more than 5% of tests with effects less than 10% should be identified as toxic
– This is a Type II error rate – not really a “false positive”
Alpha adjusted so that no more than 5% of tests with effects less than 10% were identified as “toxic”
– Monte Carlo simulations
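EPA tuned its alphas with Monte Carlo simulations. The sketch below is not EPA’s simulation; it is a minimal illustration of the idea, assuming normally distributed replicate responses, the same standard deviation in control and effluent (set as a CV of the control mean), equal replication, Welch–Satterthwaite degrees of freedom, and a Student’s t CDF computed numerically with Simpson’s rule so that only the Python standard library is needed. The names `t_cdf`, `tst_t`, and `fraction_toxic` are hypothetical. The statistic is the TST form of Welch’s t: (mean(sample) − b·mean(control)) / sqrt(var(sample)/n + b²·var(control)/n).

```python
import math
import random
from statistics import mean, variance


def t_cdf(x, df, steps=400):
    """Student's t CDF via Simpson's rule on [0, |x|] plus symmetry (sketch-grade accuracy)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    area = s * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def tst_t(control, sample, b):
    """TST form of Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v_s = variance(sample) / len(sample)
    v_c = b * b * variance(control) / len(control)
    t = (mean(sample) - b * mean(control)) / math.sqrt(v_s + v_c)
    df = (v_s + v_c) ** 2 / (v_s ** 2 / (len(sample) - 1) + v_c ** 2 / (len(control) - 1))
    return t, df


def fraction_toxic(effect, cv, n, alpha, b=0.75, control_mean=25.0, trials=2000, seed=1):
    """Share of simulated tests declared 'toxic' when the true effect is `effect` (0.10 = 10%)."""
    rng = random.Random(seed)
    sd = cv * control_mean  # same absolute SD in both groups (assumption)
    toxic = 0
    for _ in range(trials):
        control = [rng.gauss(control_mean, sd) for _ in range(n)]
        sample = [rng.gauss(control_mean * (1 - effect), sd) for _ in range(n)]
        t, df = tst_t(control, sample, b)
        p_value = 1 - t_cdf(t, df)  # one-sided: H1 is mean(sample) > b * mean(control)
        if p_value >= alpha:  # fail to reject Ho, so the sample stays "toxic"
            toxic += 1
    return toxic / trials


# Hypothetical scenario: 10% true effect, 15% CV, 10 replicates, alpha = 0.20.
print(fraction_toxic(effect=0.10, cv=0.15, n=10, alpha=0.20))
```

At exactly a 25% true effect the share declared toxic should land close to 1 − α (about 0.80 at α = 0.20), which is the first RMD; re-running at a 10% effect for a range of alphas and replicate counts is the kind of search the second RMD describes.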
TST Equation (Welch’s t-test)
$$t = \frac{\text{Mean}_{\text{sample}} - 0.75\,\text{Mean}_{\text{control}}}{\sqrt{\dfrac{\text{Variance}_{\text{sample}}}{n_{\text{sample}}} + \dfrac{0.5625\,\text{Variance}_{\text{control}}}{n_{\text{control}}}}}$$
t (calculated) < t (table/critical) = toxic
t (calculated) > t (table/critical) = non-toxic
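The TST statistic on this slide can be sketched in Python. This is a minimal illustration, assuming raw replicate responses for one effluent concentration and its control; the function names are hypothetical, `b` generalizes the 0.75 (so the 0.5625 term becomes b²), and the degrees of freedom use the Welch–Satterthwaite approximation with b² applied to the control term. The critical t still has to come from a t table at the test-specific alpha and these degrees of freedom; with SciPy installed, `scipy.stats.t.ppf(1 - alpha, df)` returns that value.

```python
import math
from statistics import mean, variance


def tst_welch_t(control, sample, b=0.75):
    """Return (t, df) for the TST form of Welch's t-test.

    t  = (mean(sample) - b * mean(control)) /
         sqrt(var(sample)/n_sample + b**2 * var(control)/n_control)
    df = Welch-Satterthwaite approximation using the same two variance terms.
    """
    v_s = variance(sample) / len(sample)
    v_c = b * b * variance(control) / len(control)  # 0.5625 * var/n when b = 0.75
    t = (mean(sample) - b * mean(control)) / math.sqrt(v_s + v_c)
    df = (v_s + v_c) ** 2 / (v_s ** 2 / (len(sample) - 1) + v_c ** 2 / (len(control) - 1))
    return t, df


def tst_decision(t, t_critical):
    """One-sided rule from the slide: t above the critical value means non-toxic."""
    return "non-toxic" if t > t_critical else "toxic"


# Hypothetical replicate data (young per female), not from the presentation.
control = [30, 28, 32, 31, 29, 30, 27, 33, 30, 30]
effluent = [27, 26, 28, 25, 27, 29, 26, 28, 27, 27]
t, df = tst_welch_t(control, effluent)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Keeping the decision separate from the statistic makes it easy to plug in whichever alpha applies to the test method.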
Factors That Impact Ability to Statistically Reject the Hypothesis
Magnitude of effect
Number of replicates
Within-test variability
TST Equation (Welch’s t-test)
Magnitude of effect (numerator): all tests (100%) with an observed effect of 25% or more will be identified as “toxic”
Within-test variability (variances in the denominator): the greater the within-test variability, the harder or less likely it will be to identify a sample as being statistically different (non-toxic)
Replication (n in the denominator): the more replication, the more likely it will be to identify a sample as being statistically different (non-toxic)
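To see these factors at work, the sketch below evaluates the same TST statistic from summary statistics for a hypothetical effluent with a 12% effect (control mean 25 young, effluent mean 22), assuming equal standard deviations and replicate counts in both groups. The function name and all numbers are illustrative, not from the presentation.

```python
import math


def tst_t_from_summary(mean_c, sd_c, n_c, mean_s, sd_s, n_s, b=0.75):
    """TST form of Welch's t, computed from group means, SDs, and replicate counts."""
    se = math.sqrt(sd_s ** 2 / n_s + b ** 2 * sd_c ** 2 / n_c)
    return (mean_s - b * mean_c) / se


# Hypothetical 12% effect: control mean 25, effluent mean 22.
for sd, n in [(3, 4), (6, 4), (3, 10)]:
    t = tst_t_from_summary(25, sd, n, 22, sd, n)
    print(f"SD = {sd}, n = {n}: t = {t:.2f}")
# SD = 3, n = 4: t = 1.73
# SD = 6, n = 4: t = 0.87
# SD = 3, n = 10: t = 2.74
```

Doubling the SD halves t, making a “non-toxic” call harder, while going from 4 to 10 replicates raises t. An effluent mean of 18.75 (a 25% effect) gives a numerator of zero, so t = 0 regardless of SD or n.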
Effect of Variability: Standard t-test
[Figure: mean young produced (0–50), control vs. effluent, with a 25% reduction; left pair: NOEC = significant/toxic; right pair: NOEC = not significant/non-toxic]
Example: TST test
[Figure: mean young produced (15–40), control vs. effluent, with the 25% effect (75% of control) level marked; left pair: TST = non-toxic; right pair: TST = toxic]
Controllable Factors That Impact Ability to Statistically Reject the Hypothesis
Variability
– The greater the within-test variability, the harder or less likely it will be to identify a sample as being statistically different.
– For the “regular” hypothesis test
• Less frequent identification of “toxicity”
– For the TST
• Less frequent identification of “no toxicity”
Replication
Procedures That May Reduce Variability
Maximize mean response
• CV = S.D. / Mean
From EPA Test of Significant Toxicity (TST) Document, EPA 833-R-10-003
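Because the CV divides the standard deviation by the mean, the same spread of replicates gives a smaller CV when the mean response is higher. A minimal sketch with hypothetical replicate values (the function name is illustrative):

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation: sample standard deviation / mean, as a percentage."""
    return 100 * stdev(replicates) / mean(replicates)


# Same spread (SD = 2) around two different hypothetical means.
print(round(cv_percent([18, 20, 22]), 1))  # 10.0
print(round(cv_percent([28, 30, 32]), 1))  # 6.7
```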
Impact of Control Mean
At the 10th percentile (17.7), a 25% effect is a reduction of 4.4 neonates
At the 50th percentile (25.5), a 25% effect is a reduction of 6.4 neonates
At the 95th percentile (35.6), a 25% effect is a reduction of 8.9 neonates
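The arithmetic behind these figures is (1 − b) × control mean with b = 0.75. A short sketch (the function name is hypothetical) reproduces the three values:

```python
def effect_reduction(control_mean, b=0.75):
    """Reduction in mean response that equals a (1 - b) effect relative to the control."""
    return (1 - b) * control_mean


for label, control_mean in [("10th", 17.7), ("50th", 25.5), ("95th", 35.6)]:
    print(f"{label} percentile ({control_mean}): {effect_reduction(control_mean):.1f} neonates")
# 10th percentile (17.7): 4.4 neonates
# 50th percentile (25.5): 6.4 neonates
# 95th percentile (35.6): 8.9 neonates
```

A higher control mean widens the absolute gap between the control and the 75% threshold, which is the point of maximizing the mean response.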
Procedures That May Increase Mean Response
Dilution water selection
– Match sample condition as much as possible
Food supplements, combinations
– Specifically allowed (13.6.16.9.2)
Feeding rates
– Twice or three times per day
– Amount of food
Fathead Minnow Feeding Rate Example
[Figure: Fathead Minnow Growth – Control Mean (mg)]
1000 Artemia/test chamber: n = 267, mean = 0.616 mg
– 0.4% exceed 95th percentile; 1.1% exceed 90th percentile; 2.2% exceed 85th percentile; 7.1% exceed 75th percentile; 50.9% exceed 50th percentile
1500 Artemia/test chamber: n = 317, mean = 0.801 mg
– 9.5% exceed 95th percentile; 20.5% exceed 90th percentile; 30.3% exceed 85th percentile; 53.3% exceed 75th percentile; 97.2% exceed 50th percentile
Impact of Growth on CV
[Figure: Fathead Minnow Growth – Control CV (%)]
1000 Artemia/test chamber (n = 267)
– 3% exceed 95th percentile; 9.4% exceed 90th percentile; 9.7% exceed 85th percentile; 15.7% exceed 75th percentile; 40.4% exceed 50th percentile
1500 Artemia/test chamber (n = 317)
– 1.9% exceed 95th percentile; 4.7% exceed 90th percentile; 5.4% exceed 85th percentile; 15.2% exceed 75th percentile; 41.5% exceed 50th percentile
Procedures That May Decrease Variability
Set internal control CV criteria
[Figure: Ceriodaphnia dubia control CV (%) by date of test initiation, 2010 through February 2011; tests marked as meeting or failing TAC, with national 50th, 75th, and 95th percentiles shown]
Procedures That May Decrease Variability
Set internal control mean criteria
[Figure: Ceriodaphnia dubia mean reproduction in control by date of testing, 2010 through February 2011; tests marked as meeting or failing TAC, with the minimum TAC and national 50th, 75th, and 95th percentiles shown]
Statistical and Non-statistical Error
False Determinations of Toxicity
[Figure: USEPA non-toxic “blank” samples¹, Ceriodaphnia dubia reproduction results; effect relative to control (%), −20 to 80; TST non-toxic vs. TST toxic (14.8%)]
¹ Data Source: USEPA's WET Interlaboratory Validation Study (EPA 821-B-01-004), Table 9.7.
Dose Response Evaluation
Eliminating multiple concentrations may limit the ability to evaluate spurious results.
[Figure: number of neonates, non-toxic vs. toxic, for a single-concentration test (control, 100% effluent) and a multiple-concentration test (20%, 40%, 60%, 80%, and 100% effluent)]
Conclusions
Same limitations as any hypothesis test
– Implications associated with variability and “power” shifted
Not a magical “black box”
– You need to be aware of the impact that variability, QA/QC, and test design may have
May be useful for regulation
– NPDES permits
– Possible use for remediation goals?
Questions?
Contact info: [email protected]