Topics • Reference interval studies • The importance of seeing • Parametric, Non-parametric and other • Estimating the error of a reference interval study • Sample size for a reference interval study • Outlier exclusion • Partitioning for age, sex or other • Data mining techniques • Requirements for reference interval sharing


Page 1: Topics Reference interval studies The importance of seeing Parametric, Non-parametric and other Estimating the error of a reference interval study Sample

Topics

• Reference interval studies
• The importance of seeing
• Parametric, Non-parametric and other
• Estimating the error of a reference interval study
• Sample size for a reference interval study
• Outlier exclusion
• Partitioning for age, sex or other
• Data mining techniques
• Requirements for reference interval sharing


The importance of seeing

• This is a workshop on statistical techniques

• The human brain is a very powerful mathematical engine

• The best inputs are graphical not numerical

• ALWAYS graph your data

• ALWAYS think about your data


The importance of numbers

• Which distribution is Gaussian?

[Figure: two histograms over bins 1 to 15, labelled N=50 and N=2000]

• Both!


Parametric statistics

• Can be used on parametric distributions

• Parametric distributions are those which can be described by parameters

• Gaussian distribution defined by 2 parameters:
– Mean (average): indication of the center
– Standard deviation: indication of scatter

• Properties of the Gaussian distribution
– Symmetrical distribution (not skewed)
– 68.3% within +/- 1SD
– 95.4% within +/- 2SD
– 99.7% within +/- 3SD
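The coverage figures quoted for the Gaussian distribution can be checked numerically. The following is a minimal sketch using the error function from the Python standard library; the function name `gaussian_coverage` is only an illustrative choice.

```python
import math


def gaussian_coverage(k: float) -> float:
    """Fraction of a Gaussian distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints 68.3%, 95.4% and 99.7% for k = 1, 2 and 3.
    print(f"+/- {k} SD: {100 * gaussian_coverage(k):.1f}%")
```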


Non-parametric statistics

• No assumptions about distribution

• Percentiles determined by ranking

• Measure of centre is median (50th percentile)

• Measure of scatter is percentiles (eg 2.5th and 97.5th)

[Figure: histogram, bins 1 to 12, counts 0 to 35]


Non-parametric statistics

• The xth percentile is the (x/100) × (n+1)th lowest sample

• Example: 75th centile, n=138
– 75th = 0.75 × 139 = 104.25, i.e. the 104th lowest sample
– The 104th lowest sample has the value 9
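The ranking rule can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and truncating a fractional rank to a whole rank (104.25 becomes 104) follows the slide's example rather than any particular interpolation convention.

```python
def percentile_rank(percentile: float, n: int) -> float:
    """Rank (1 = lowest) of the requested percentile among n sorted samples."""
    return percentile / 100 * (n + 1)


def percentile_value(samples, percentile: float):
    """Value at that rank, truncating a fractional rank to a whole rank."""
    ordered = sorted(samples)
    rank = int(percentile_rank(percentile, len(ordered)))
    rank = min(max(rank, 1), len(ordered))  # keep the rank inside the data
    return ordered[rank - 1]


print(percentile_rank(75, 138))  # 104.25
```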

[Figure: histogram, bins 1 to 12, counts 0 to 35]


Non-Parametric - numbers

• To determine two percentiles P% apart

• Need at least (100/P)-1 observations

• Examples
– 95th centile (separate from 90th): need (100/5)-1 = 19 observations
– 97.5th centile (separate from 95th): need (100/2.5)-1 = 39 observations
– 99th centile (separate from 98th): need (100/1)-1 = 99 observations
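The rule of thumb above is simple enough to express directly. A small sketch, with `min_observations` as a hypothetical helper name:

```python
import math


def min_observations(gap_percent: float) -> int:
    """Minimum n needed to separate two percentiles gap_percent apart: (100/P) - 1."""
    return math.ceil(100 / gap_percent - 1)


for gap in (5, 2.5, 1):
    # Prints 19, 39 and 99 observations respectively.
    print(f"P = {gap}%: need at least {min_observations(gap)} observations")
```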


Robust Techniques

• Methods giving more weight to the more common (central) values than to the peripheral results

• Described by Amadeo Pesce
– Estimating reference intervals with n=20!
– Horn PS, Pesce AJ, Copeland BE. Clin Chem 1998;44:622-631.

• Techniques not readily available

• Data-mining techniques may be considered “robust”


Confidence Intervals

• Reference interval studies are experiments

• There is “Experimental error”

• This is revealed when more than one reference interval study is performed.

• Even if every other factor is the same, a different sampling of a population will produce a different result

• The confidence intervals of the upper and lower reference limits describe this error
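The effect of resampling can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical Gaussian with mean 20 and SD 10, each "study" draws n subjects from it, and the upper limit is taken as the 0.975 × (n+1)th lowest value. The numbers it prints are simulated, not study data.

```python
import random
import statistics


def upper_limit(sample):
    """Non-parametric 97.5th percentile: the 0.975 * (n + 1)th lowest value."""
    ordered = sorted(sample)
    rank = min(int(0.975 * (len(ordered) + 1)), len(ordered))
    return ordered[rank - 1]


def spread_of_upper_limits(n, studies=200, seed=1):
    """SD of the upper limits obtained from repeated simulated studies of size n."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    limits = [
        upper_limit([rng.gauss(20, 10) for _ in range(n)])
        for _ in range(studies)
    ]
    return statistics.stdev(limits)


for n in (40, 120, 1000):
    print(n, round(spread_of_upper_limits(n), 2))
```

Each simulated study gives a slightly different upper limit, and the scatter between studies shrinks as n grows.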


Statistical Imprecision of RI study

• Estimates of reference limits have limitations

• These are expressed as the confidence interval of the reference limits, e.g. the 90% CI of the upper and lower reference limits

• Confidence intervals narrow as the number of people in the study increases.

[Figure: two panels labelled “Large n” and “Small n”]


CI - Parametric

• $\text{Mean} + \{\, z_1 s \pm z_2 \sqrt{s^2/n + (z_1^2 s^2)/(2n)} \,\}$

• $s$ = SD

• $n$ = sample size

• $z_1$ = probit value related to percentile (= 1.96 for 97.5th percentile)

• $z_2$ = covering factor for confidence level (= 1.64 for 90%)
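The two terms under the square root can be read as the variance of the estimated limit. The derivation below is a sketch that assumes Gaussian data, for which the sample mean and SD are independent, and uses the large-sample approximation $\operatorname{Var}(s) \approx s^2/(2n)$.

```latex
\begin{aligned}
\operatorname{Var}(\bar{x} + z_1 s)
  &\approx \operatorname{Var}(\bar{x}) + z_1^2 \operatorname{Var}(s) \\
  &\approx \frac{s^2}{n} + \frac{z_1^2 s^2}{2n}
\end{aligned}
```

Multiplying the square root of this variance by $z_2$ gives the half-width of the confidence interval around the limit.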


CI - Parametric

• $\text{Mean} \pm 2\,\text{SD} \pm 1.64 \sqrt{s^2/n + (1.96^2 s^2)/(2n)}$

• Mean = 20, SD = 10

[Figure: reference limits (0 to 60) and confidence interval error (0.0% to 35.0%) plotted against number of patients (0 to 200)]
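The parametric confidence interval for this example (Mean = 20, SD = 10) can be tabulated for a few study sizes. A minimal sketch: the function name and the chosen values of n are illustrative, and z1 = 1.96 is used for the limit itself where the slide rounds to 2SD.

```python
import math


def upper_limit_ci(mean, sd, n, z1=1.96, z2=1.64):
    """Upper reference limit and its confidence interval for Gaussian data."""
    limit = mean + z1 * sd
    half_width = z2 * math.sqrt(sd**2 / n + (z1**2 * sd**2) / (2 * n))
    return limit - half_width, limit, limit + half_width


for n in (20, 50, 100, 200):
    low, limit, high = upper_limit_ci(20, 10, n)
    print(f"n={n:3d}: upper limit {limit:.1f}, 90% CI {low:.1f} to {high:.1f}")
```

The interval narrows in proportion to 1/sqrt(n), so quadrupling the number of subjects halves its width.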


EXAMPLE

• 2.5th centile, n=250

• 2.5th = 0.025 × (n+1) = 0.025 × 251 = 6.3, i.e. the 6th lowest sample

• 90% confidence interval is the 3rd to 12th lowest samples
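The slide does not say how the confidence-interval ranks were obtained. One common way is a binomial argument: the number of samples falling below the true percentile follows a Binomial(n, p) distribution, and each tail is allowed at most half of the excluded probability. The sketch below assumes that approach, with hypothetical function names; under this assumption it gives ranks 3 and 12 for n=250 and the 2.5th centile.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def percentile_rank_ci(n, p, confidence=0.90):
    """Ranks (1 = lowest) of the samples bounding the CI of the p-th quantile."""
    tail = (1 - confidence) / 2
    # Largest lower rank r with P(fewer than r samples below the quantile) <= tail.
    # If even rank 1 fails this test (very small n), rank 1 is returned anyway.
    lower = 1
    while binomial_cdf(lower, n, p) <= tail:
        lower += 1
    # Smallest upper rank s with P(s or more samples below the quantile) <= tail.
    upper = lower
    while 1 - binomial_cdf(upper - 1, n, p) > tail:
        upper += 1
    return lower, upper


print(percentile_rank_ci(250, 0.025))  # (3, 12)
```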


Outlier exclusion

• “some observation whose discordancy from the majority of the sample is excessive in relation to the assumed distribution model for the sample, thereby leading to the suspicion that it is not generated by this model.”

• A vital part of a reference interval study using parametric or non-parametric statistics

• Particularly difficult with “logarithmic” data

– (BNP data)


Outlier exclusion

• Dixon's criterion

• If D (distance of outlier from next sample) is > 1/3 x R (range of entire data set): exclude

• For groups of outliers treat each individually

– NCCLS, Horn and Pesce

• Other: remove any data outside +/- 4SD

• “Reliable statistical detection of outliers in reference interval data remains a challenge”
– Solberg and Lahti, Clin Chem 2005;51:2326-2332
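The D > R/3 rule can be sketched as follows. This is an illustration, not the NCCLS procedure itself: the function name is hypothetical, and re-testing the remaining data after each exclusion is one reading of "treat each individually".

```python
def dixon_exclude(values):
    """Drop an extreme value while its gap D exceeds one third of the range R."""
    kept = sorted(values)
    excluded = []
    while len(kept) >= 3:
        data_range = kept[-1] - kept[0]  # R: range of the data still included
        if data_range == 0:
            break
        if kept[-1] - kept[-2] > data_range / 3:  # D for the highest value
            excluded.append(kept.pop())
        elif kept[1] - kept[0] > data_range / 3:  # D for the lowest value
            excluded.append(kept.pop(0))
        else:
            break
    return kept, excluded


kept, excluded = dixon_exclude([10, 11, 12, 13, 14, 40])
print(kept, excluded)  # [10, 11, 12, 13, 14] [40]
```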


Distributions

• Commonly “assumed” distributions
– Gaussian
– Square root
– Logarithmic
– More skewed

[Figure: distribution curves, x-axis 0 to 80, y-axis 0 to 1]


Box - Cox Transformations

• A family of transformations
– $y = (x^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$
– $y = \ln(x + c)$ for $\lambda = 0$

• Covers many forms of transformation
– λ = 1: linear transformation (unchanged)
– λ = 0.5: square root transformation
– λ = 0.2: skewed right (less skewed than log)
– λ = 0 (or close to zero): logarithmic transformation
– λ = -0.2: heavily skewed right (more than log)
– λ < 0: “over-log” transformation

• Normalises data more skewed than log distribution
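The Box-Cox family is short to write out. A minimal sketch in plain Python; applying the shift c in both branches is an assumption here (the slide shows it only in the logarithmic form), and it defaults to zero.

```python
import math


def box_cox(x: float, lam: float, c: float = 0.0) -> float:
    """Box-Cox transform of a single positive value x (after adding shift c)."""
    shifted = x + c
    if shifted <= 0:
        raise ValueError("x + c must be positive")
    if abs(lam) < 1e-12:  # lambda = 0: logarithmic transformation
        return math.log(shifted)
    return (shifted**lam - 1) / lam


print(box_cox(4, 0.5))   # 2.0  (square-root type)
print(box_cox(4, -0.5))  # 1.0  ("over-log" type)
```

In a reference interval study the transform would be applied to every result, limits estimated on the transformed scale, and the limits then transformed back.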


Transformations of ALT

NHANES III: ALT, male, age 20 to 80, n=6423

[Figure: three histograms of the ALT data: raw data (lambda = 1), logarithmic (lambda = 0) and “over-log” (lambda = -0.5)]


NORIP study: female ALT (n=1220)

[Figure: percent histogram of female ALT (U/L), axis ticks from 11 to 57]

• Female URL: 45.6 (90% CI 42.5 – 49.3, n=1220)

• Male URL: 68 (90% CI 63.4 – 73.6, n=1080)


Partitioning

• Provision of separate reference intervals for subgroups

• Sex and age (paediatric & geriatric) most common

• Others may include race, menopausal status, stage of gestation or menstrual cycle.

• Historically, the Harris and Boyd method has been recommended.

• New theories
– Lahti A et al. Clin Chem 2002;48:338-352


Lahti et al

• The criterion depends on the asterisk rate of each subgroup (the proportion of its results flagged as outside the interval) when common intervals are applied.

• <3.2% asterisk rate of either subgroup: NO

• >4.1% asterisk rate of either subgroup: YES

• In-between: consider other factors

• Note: non-parametric approach also described
– Very complex
– Clin. Chem., May 2004; 50: 891 - 900.
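The thresholds above can be turned into a small decision helper. This is a sketch of one reading of the slide, not of the published method: it assumes "NO" applies only when every subgroup is below 3.2%, and the function name and return strings are illustrative.

```python
def partition_decision(asterisk_rates_percent):
    """Suggest whether to partition, given each subgroup's asterisk rate (%)."""
    worst = max(asterisk_rates_percent)
    if worst > 4.1:
        return "partition"
    if worst < 3.2:
        return "do not partition"
    return "consider other factors"


print(partition_decision([2.5, 3.0]))  # do not partition
print(partition_decision([2.0, 4.5]))  # partition
print(partition_decision([2.5, 3.6]))  # consider other factors
```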


Data Mining

• Bhattacharya CG. Journal of the Biometric Society (Biometrics). 1967;23:115-135.

• Example data: Frequency Distribution of the forkal length of the Porgy caught by pair-trawl fishery in the East China Sea.


Bhattacharya

• Assumptions
– Gaussian or log-Gaussian distributions
– Most results unaffected by reason for testing blood
– Ideal for “profiles”
– No systematic effect of source on results, e.g.
  • Inpatients with low sodium and albumin
  • Outpatients with delayed separation

• Beware
– No confidence limits for results
– User influence on results
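The core idea of the Bhattacharya approach can be sketched briefly. For a Gaussian component binned at equal width h, the log of the ratio of adjacent bin counts falls on a straight line against the bin midpoint, with slope -h/SD² and a zero crossing at mean - h/2. The code below is a simplified illustration under several assumptions: the data are synthetic and noise-free, every bin is included in the fit (in practice the user selects the linear section), and any correction for grouping into bins is left out.

```python
import math


def bhattacharya_fit(midpoints, counts):
    """Estimate mean and SD of one Gaussian component from equal-width bin counts."""
    h = midpoints[1] - midpoints[0]  # bin width
    xs = midpoints[:-1]
    # Log ratio of each bin count to the count in the bin before it.
    ys = [math.log(counts[i + 1] / counts[i]) for i in range(len(counts) - 1)]
    # Ordinary least-squares line through the (midpoint, log-ratio) points.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sd = math.sqrt(-h / slope)
    mean = -intercept / slope + h / 2
    return mean, sd


# Synthetic, noise-free counts from a Gaussian with mean 140 and SD 3.
midpoints = [float(m) for m in range(132, 149)]
counts = [1000 * math.exp(-((m - 140) ** 2) / (2 * 3**2)) for m in midpoints]
mean, sd = bhattacharya_fit(midpoints, counts)
print(round(mean, 3), round(sd, 3))
```

With real laboratory data only the bins dominated by the healthy population would follow the line, which is where the user's choice of included points influences the result.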


[Figure: GJ - Excel Bhattacharya plot showing input data, Bhattacharya fit, Bhattacharya function, included data, zero line and a linear fit to the included data]