The Test for Significant Toxicity (TST) – A “New” Hypothesis Testing Approach for Aquatic Bioassay Testing Philip Markle Environmental Scientist [email protected]


History of the TST

June 2010 – EPA released WET TST guidance (EPA 833-R-10-003)

Also referred to as:
– Bioequivalence Testing
– Alternative Null Hypothesis Testing

Accepted for FDA drug trials and evaluations

Originally proposed for use in toxicity testing in 1995 (Erickson and McDonald)

Recently proposed for CA’s WET Policy

Limitations of the TST

It is still a statistical hypothesis test

– Not very useful for comparing results spatially or temporally

– Pass/Fail test, provides no information on magnitude

Requires knowledge/use of a “threshold” response – “b” or bioequivalence factor

Probably (and debatably) best suited for regulatory purposes

Statistical Hypothesis Testing 101

Statistically speaking:
– You can’t “prove” anything with a hypothesis test – we only “disprove”

The “White Swan” Parable:

You can’t prove that “all swans are white”

If we see 10,000 white swans and no non-white swans, we fail to reject our hypothesis

In the absence of evidence to the contrary, we then assume the hypothesis is true

“Proving” with Statistics

However, after observing just one non-white swan, we can then confidently reject or disprove our hypothesis that all swans are white

Statistical Hypothesis Testing - Background

Null or “Initial” Hypothesis (Ho)

– Mean(sample) ≥ Mean(control)

Conduct statistical analyses to try to reject this hypothesis

If unable to reject, we assume the null or “Initial” hypothesis is correct

Type I and Type II error

Type I and Type II Errors

Type I Error
– Probability of rejecting the null or “Initial” hypothesis when it is “true”
– Controlled directly by setting alpha (α)

Type II Error
– Probability of accepting the null or “Initial” hypothesis when it is “false” (beta, β)
– Related to “power”: power = 1 − β
– Controlled indirectly

Standard Hypothesis Testing (NOEC)

With the NOEC:
– The initial hypothesis is mean(sample) ≥ mean(control). In other words, the sample is non-toxic!
– If we don’t/can’t “prove” this to be incorrect statistically, we assume it is true
– Type I error = Identifying a non-toxic sample as toxic

TST Hypothesis

With the TST:
– The hypothesis is mean(effluent) ≤ 0.75 * mean(control). In other words, the sample is toxic!

– If we don’t/can’t “prove” this to be incorrect statistically, we assume it is true – we assume the sample is toxic

– Type I error = Identifying a toxic sample as non-toxic
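Written side by side, the two null hypotheses differ in direction and in the factor b. The notation here is assumed for illustration (μ_s for the sample or effluent mean, μ_c for the control mean), and the NOEC line uses the usual one-sided form, which is also an assumption:

```latex
\begin{aligned}
\text{NOEC:}\quad & H_0:\ \mu_s \ge \mu_c, && H_a:\ \mu_s < \mu_c \\
\text{TST:}\quad  & H_0:\ \mu_s \le b\,\mu_c, && H_a:\ \mu_s > b\,\mu_c, \qquad b = 0.75
\end{aligned}
```

Under the NOEC form the burden of proof is to show toxicity; under the TST form it is to show the absence of toxicity at the threshold b.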

Bioequivalence Factor (b)

In the EPA Guidance
– Set as an unacceptable or “toxic” threshold

For Chronic:
– b = 0.75 = 25% Effect

For Acute:
– b = 0.80 = 20% Effect

Regulatory Management Decisions (RMDs)

Setting the Type I Error Rate – alpha (α)
– How frequently will you reject the Ho when it is true?

EPA desires that no more than 25% of the tests with a 25% effect or more are identified as “non-toxic”

Alpha (α) is then set at 0.05 to 0.25, depending on the test

Test/Species-Specific Alpha

Why the Different Alphas?

EPA’s Second Regulatory Management Decision
– No more than 5% of tests with effects less than 10% should be identified as toxic
– Type II Error Rate – not really a “false positive”

Alpha adjusted down until no more than 5% of tests with effects less than 10% were identified as “toxic”
– Monte Carlo simulations

TST Equation (Welch’s t-test)

$$
t = \frac{\text{Mean}_{(sample)} - 0.75 \times \text{Mean}_{(control)}}
         {\sqrt{\dfrac{\text{Variance}_{(sample)}}{n_{(sample)}} + \dfrac{0.5625 \times \text{Variance}_{(control)}}{n_{(control)}}}}
$$

t (calculated) < t (table/critical) = toxic

t (calculated) > t (table/critical) = non-toxic
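The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not EPA's reference implementation: the function names, the replicate counts, and the critical value are hypothetical, and the degrees of freedom use the standard Welch-Satterthwaite formula applied to the b-scaled control variance, which is an assumption here. The critical t still comes from a table for the chosen alpha.

```python
from statistics import mean, variance

def tst_welch_t(sample, control, b=0.75):
    """Return (t, df) for the TST form of Welch's t-test (illustrative sketch)."""
    n_s, n_c = len(sample), len(control)
    # Variance-of-the-mean terms; the control term is scaled by b**2
    # (0.75**2 = 0.5625 for the chronic threshold).
    v_s = variance(sample) / n_s
    v_c = (b ** 2) * variance(control) / n_c
    t = (mean(sample) - b * mean(control)) / (v_s + v_c) ** 0.5
    # Welch-Satterthwaite degrees of freedom (assumed, see lead-in).
    df = (v_s + v_c) ** 2 / (v_s ** 2 / (n_s - 1) + v_c ** 2 / (n_c - 1))
    return t, df

def tst_decision(t, t_critical):
    """Apply the slide's rule: t above the critical value means non-toxic."""
    return "non-toxic" if t > t_critical else "toxic"

# Hypothetical replicate counts (young produced per female), not real test data.
control = [30, 28, 32, 29, 31, 30, 27, 33, 30, 30]
sample = [28, 27, 30, 26, 29, 28, 25, 31, 28, 28]

t, df = tst_welch_t(sample, control)
t_critical = 0.865  # hypothetical table value; look up the one for your alpha and df
print(round(t, 2), round(df, 1), tst_decision(t, t_critical))
```

Keeping the critical value as an input mirrors the "t (table/critical)" wording on the slide and avoids depending on a statistics library for the t distribution.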

Factors That Impact Ability to Statistically Reject the Hypothesis

Magnitude of Effect

Number of Replicates

Within Test Variability

TST Equation (Welch’s t-test)

$$
t = \frac{\text{Mean}_{(sample)} - 0.75 \times \text{Mean}_{(control)}}
         {\sqrt{\dfrac{\text{Variance}_{(sample)}}{n_{(sample)}} + \dfrac{0.5625 \times \text{Variance}_{(control)}}{n_{(control)}}}}
$$

All tests (100%) with an observed effect of 25% or more will be identified as “toxic”: the numerator is then zero or negative, so t cannot exceed the positive critical value.

The greater the within test variability, the harder or less likely it will be to identify a sample as being statistically different (non-toxic).

The more replication, the more likely it will be to identify a sample as being statistically different (non-toxic).
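A rough simulation can make the variability and replication points concrete. The sketch below is illustrative only and is not the Monte Carlo procedure EPA used: it assumes normally distributed responses, a hypothetical control mean of 25, equal CV in both groups, and a single fixed critical t supplied by the caller (a simplification, since the Welch degrees of freedom vary from test to test).

```python
import random
from statistics import mean, variance

def tst_t(sample, control, b=0.75):
    """TST Welch t statistic with the control mean scaled by b."""
    v_s = variance(sample) / len(sample)
    v_c = (b ** 2) * variance(control) / len(control)
    return (mean(sample) - b * mean(control)) / (v_s + v_c) ** 0.5

def fraction_non_toxic(true_effect, cv, n, t_critical, trials=2000, seed=1):
    """Fraction of simulated tests declared non-toxic (t > t_critical)."""
    rng = random.Random(seed)
    control_mean = 25.0                       # hypothetical control mean
    sample_mean = control_mean * (1 - true_effect)
    passes = 0
    for _ in range(trials):
        control = [rng.gauss(control_mean, cv * control_mean) for _ in range(n)]
        sample = [rng.gauss(sample_mean, cv * sample_mean) for _ in range(n)]
        if tst_t(sample, control) > t_critical:
            passes += 1
    return passes / trials

T_CRIT = 0.87  # hypothetical fixed critical value, for illustration only

# Same true effect (10%), different within-test variability.
print(fraction_non_toxic(0.10, cv=0.10, n=10, t_critical=T_CRIT))
print(fraction_non_toxic(0.10, cv=0.40, n=10, t_critical=T_CRIT))
# Same true effect and variability, different replication.
print(fraction_non_toxic(0.10, cv=0.40, n=4, t_critical=T_CRIT))
print(fraction_non_toxic(0.10, cv=0.40, n=20, t_critical=T_CRIT))
```

Under these assumptions, the non-toxic fraction should fall as the CV rises and rise as replication increases, which is the pattern the slide describes.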

Effect of Variability: Standard t-test

[Figure: mean young produced (axis 0 to 50) for control and effluent in two panels, each showing a 25% reduction. First panel: NOEC = significant/toxic. Second panel: NOEC = not significant/non-toxic.]

Example: TST test

[Figure: mean young produced (axis 15 to 40) for control and effluent in two panels, with a reference line at the 25% effect, or 75% of control. First panel: TST = non-toxic. Second panel: TST = toxic.]

Controllable Factors That Impact Ability to Statistically Reject the Hypothesis

Variability
– The greater the within test variability, the harder or less likely it will be to identify a sample as being statistically different.
– For the “regular” hypothesis test
  • Less frequent identification of “toxicity”
– For the TST
  • Less frequent identification of “no toxicity”

Replication

Procedures That May Reduce Variability

Maximize Mean Response
• CV = S.D. / Mean

[Figure from EPA Test of Significant Toxicity (TST) Document, EPA 833-R-10-003]
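The CV formula explains why raising the mean response helps: with the same standard deviation, a higher mean gives a proportionally lower CV. A small sketch with made-up replicate values (the data and the function name are hypothetical):

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = S.D. / Mean, using the sample standard deviation."""
    return stdev(values) / mean(values)

low_mean = [13, 17, 15, 19, 11]          # hypothetical replicates, mean 15
high_mean = [v + 15 for v in low_mean]   # same spread, mean 30

print(coefficient_of_variation(low_mean))
print(coefficient_of_variation(high_mean))
```

Because both lists have the same spread, doubling the mean halves the CV in this example.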

Impact of Control Mean

At the 10th Percentile (17.7), a 25% effect is a reduction of 4.4 neonates
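The arithmetic behind that figure is simply the control mean times (1 − b). A minimal sketch, where the function name is hypothetical and the second control mean of 30 is an arbitrary comparison value rather than a number from the EPA document:

```python
def reduction_at_threshold(control_mean, b=0.75):
    """Absolute reduction in response that corresponds to the (1 - b) effect."""
    return control_mean * (1 - b)

# Control mean at the 10th percentile, from the slide.
print(round(reduction_at_threshold(17.7), 1))
# Arbitrary higher control mean, for comparison only.
print(reduction_at_threshold(30.0))
```

A higher control mean widens the absolute gap between the control and the toxic threshold, which is one reason the deck goes on to discuss ways of increasing mean response.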



Procedures That May Increase Mean Response

Dilution Water Selection
– Match sample condition as much as possible

Food Supplements, Combinations
– Specifically allowed (13.6.16.9.2)

Feeding Rates
– Twice or three times per day
– Amount of food

Fathead Minnow Feeding Rate Example

[Figure: Fathead Minnow Growth - Control Mean. Y axis: control mean (mg), 0.25 to 1.65.]

1000 Artemia/Test Chamber (n = 267, Mean = 0.616 mg)
– 0.4% Exceed 95th Percentile
– 1.1% Exceed 90th Percentile
– 2.2% Exceed 85th Percentile
– 7.1% Exceed 75th Percentile
– 50.9% Exceed 50th Percentile

1500 Artemia/Test Chamber (n = 317, Mean = 0.801 mg)
– 9.5% Exceed 95th Percentile
– 20.5% Exceed 90th Percentile
– 30.3% Exceed 85th Percentile
– 53.3% Exceed 75th Percentile
– 97.2% Exceed 50th Percentile

Impact of Growth on CV

[Figure: Fathead Minnow Growth - Control CV. Y axis: control CV (%), 0 to 40.]

1000 Artemia/Test Chamber (n = 267)
– 3% Exceed 95th Percentile
– 9.4% Exceed 90th Percentile
– 9.7% Exceed 85th Percentile
– 15.7% Exceed 75th Percentile
– 40.4% Exceed 50th Percentile

1500 Artemia/Test Chamber (n = 317)
– 1.9% Exceed 95th Percentile
– 4.7% Exceed 90th Percentile
– 5.4% Exceed 85th Percentile
– 15.2% Exceed 75th Percentile
– 41.5% Exceed 50th Percentile

Procedures That May Decrease Variability

Set Internal Control CV Criteria

[Figure: Ceriodaphnia dubia Control CV, 2010 through February 2011. X axis: date of test initiation (1/1/2010 to 3/1/2011). Y axis: control CV (%), 0 to 140. Legend: Meets TAC; Failed TAC; National 50th Percentile; National 75th Percentile; National 95th Percentile.]

Procedures That May Decrease Variability

Set Internal Control Mean Criteria

[Figure: Ceriodaphnia dubia Control Reproduction Means, 2010 through February 2011. X axis: date of testing (1/1/2010 to 3/1/2011). Y axis: mean reproduction in control, 0 to 50. Legend: Meets TAC; Failed TAC; Minimum TAC; National 50th Percentile; National 75th Percentile; National 95th Percentile.]

Statistical and Non-statistical Error

False Determinations of Toxicity

[Figure: USEPA Non-Toxic "Blank" Samples (1), Ceriodaphnia dubia Reproduction Results. Y axis: effect relative to control (%), -20 to 80. Legend: TST Non-Toxic; TST Toxic (14.8%).]

1 Data Source: USEPA's WET Interlaboratory Validation Study (EPA 821-B-01-004), Table 9.7.

Dose Response Evaluation

Eliminating multiple concentrations may limit ability to evaluate spurious results.

[Figure: number of neonates (axis 0 to 35), with results marked non-toxic or toxic. Single concentration test: control and 100% effluent. Multiple concentration test: 20%, 40%, 60%, 80%, and 100% effluent.]

Conclusions

Same limitations as any hypothesis test
– Implications associated with variability and “power” shifted

Not a magical “black box”
– You need to be aware of the impact variability, QA/QC, and test design may have

May be useful for regulation
– NPDES Permits
– Possible use for remediation goals?

Questions?

Contact info: [email protected]
