Management of Neonatal Hyperbilirubinemia Methods of the AHRQ Evidence Report FDA Advisory Committee Meeting June 11, 2003 Joseph Lau, MD Tufts-New England

Management of Neonatal Hyperbilirubinemia

Methods of the AHRQ Evidence Report

FDA Advisory Committee Meeting June 11, 2003

Joseph Lau, MD

Tufts-New England Medical Center EPC

INVESTIGATORS

Stanley Ip, MD

Mei Chung, MPH

Stephan Glicken, MD

John Kulig, MD

Rebecca O’Brien, MD

Robert Sege, MD, PhD

Joseph Lau, MD

Evidence report process

• Rigorous, comprehensive syntheses and analyses of relevant scientific literature

• Explicit and detailed documentation of methods, rationale, and assumptions

• Scientific syntheses may include meta-analyses and cost analyses

• Broad range of experts is included in the development process

• Reports do NOT make clinical recommendations

Systematic review process

• Formulate well-focused study questions

• Establish evidence review protocol (inclusion and exclusion criteria)

• Perform comprehensive literature search

• Screen abstracts and full articles

• Abstract data and perform critical appraisal

• Perform analyses, summarize and interpret results

Key questions: Association of neonatal hyperbilirubinemia with neurodevelopmental outcomes

1. What is the relationship between peak bilirubin levels and/or duration of hyperbilirubinemia and developmental outcome?

2. What is the evidence for effect modification of the results in question 1 by gestational age, hemolysis, serum albumin, and other factors?

Key questions (cont.): Treatments for neonatal hyperbilirubinemia

3. What are the quantitative estimates of efficacy of treatment for:

a. reducing peak bilirubin levels (e.g., number-needed-to-treat (NNT) at 20 mg/dl to keep total serum bilirubin (TSB) from rising);

b. reducing the duration of hyperbilirubinemia (e.g., the average number of hours by which treatment may shorten the time that TSB remains greater than 20 mg/dl); and

c. improving neurodevelopmental outcomes?

Key questions (cont.): Diagnosis of neonatal hyperbilirubinemia

4. What is the efficacy of various strategies for predicting hyperbilirubinemia, including hour-specific bilirubin percentiles?

5. What is the accuracy of transcutaneous bilirubin measurements?

Literature search

• Medline and Premedline databases searched September 2001, yielding 4,325 citations

• Consulted domain experts and reviewed the bibliographies of relevant review articles for potential additional studies

• Supplemental search for case reports of kernicterus was also performed

General inclusion criteria

• English-language human studies

• Newborns between birth and one month of age

• Healthy, full-term infants (≥34 weeks EGA or ≥2,500 grams)

• ≥10 subjects per arm (≥5 for Q1 and Q2)

• Additional criteria were applied to specific questions

Literature search results

• Total citations screened = 4,325

• Full articles retrieved = 663

• Studies included in report = 138*

– Q1/Q2 = 37 studies + 28 kernicterus case reports

– Q3 = 21

– Q4 = 10

– Q5 = 46

* The per-question counts sum to more than 138 because some studies address more than one question

Summarizing and grading of evidence

Important parameters to summarize

• Methodological quality (internal validity, design, conduct, and reporting of the study)

• Applicability (generalizability, external validity, population, setting)

• Study size (weight, precision)

• Effect (results, associations, test performance)

Methodological quality

Refers to the design, conduct, and reporting of the clinical study. Because the included studies use a variety of designs, the following three-level classification of study quality can be applied to each design type:

– Least potential bias (Grade A)

– Susceptible to some bias, but not sufficient to invalidate the results (Grade B)

– Significant bias that may invalidate the results (Grade C)

Applicability

Category 1: Sample is representative of the target population, or results are definitely applicable to the general population irrespective of the study sample.

Category 2: Sample is representative of a relevant sub-group of the target population.

Category 3: Sample is representative of a narrow subgroup of patients only, and not well generalizable to other subgroups.

Quantitative methods used in the evidence report

Question 3: NNT

What are the quantitative estimates of efficacy of treatment for: reducing peak bilirubin levels (e.g., number-needed-to-treat (NNT) at 20 mg/dl to keep total serum bilirubin (TSB) from rising)?

Hypothetical example of treating bilirubin at 15 mg/dl to prevent it from rising

Number of patients    Treat at 15 mg/dl    Not treat
Rise                  10                   20
Not rise              90                   80
Total                 100                  100

Risk Difference = 10/100 - 20/100 = -10/100 = -0.1

NNT = 1 / |Risk Difference| = 1/0.1 = 10
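The arithmetic above can be sketched as a small function. This is an illustrative sketch only: the function names are hypothetical, and the inputs are the hypothetical counts from the table (10 of 100 treated infants and 20 of 100 untreated infants rise).

```python
def risk_difference(events_treated, n_treated, events_control, n_control):
    """Absolute risk difference: risk in the treated group minus risk in the untreated group."""
    return events_treated / n_treated - events_control / n_control


def number_needed_to_treat(rd):
    """NNT is the reciprocal of the absolute value of the risk difference."""
    if rd == 0:
        raise ValueError("NNT is undefined when the risk difference is zero")
    return 1 / abs(rd)


# Hypothetical counts from the slide: 10/100 treated vs 20/100 untreated rise
rd = risk_difference(10, 100, 20, 100)
nnt = number_needed_to_treat(rd)
print(f"RD = {rd:.2f}, NNT = {nnt:.0f}")
```

Taking the absolute value keeps NNT positive even though the risk difference is negative when treatment lowers risk.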

Methods to assess agreement between two testing methods reported in studies

• Correlation (r value)

– Meta-analyses performed in the evidence report when data were available

• Bland and Altman method (difference of results of two testing methods plotted against their mean value)

– Preferred method
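The slides do not say how correlation coefficients were combined across studies. One common approach, assumed here and not necessarily the report's method, is fixed-effect pooling on Fisher's z scale: transform each r with atanh, weight by n - 3 (the inverse of the approximate variance of z), average, and back-transform with tanh. The (r, n) values below are hypothetical.

```python
import math


def pool_correlations(studies):
    """Fixed-effect pooled correlation via Fisher's z transformation.

    studies: list of (r, n) pairs. Each r is transformed to z = atanh(r),
    whose approximate variance is 1 / (n - 3), so each study gets weight n - 3.
    Returns the pooled r and its 95% confidence interval on the r scale.
    """
    num = 0.0
    den = 0.0
    for r, n in studies:
        w = n - 3
        num += w * math.atanh(r)
        den += w
    z_bar = num / den
    se = 1 / math.sqrt(den)
    # Back-transform the pooled z and its 95% CI to the correlation scale
    lo, hi = z_bar - 1.96 * se, z_bar + 1.96 * se
    return math.tanh(z_bar), (math.tanh(lo), math.tanh(hi))


# Hypothetical (r, n) pairs, not data from the report
studies = [(0.91, 120), (0.88, 80), (0.93, 200)]
r_pooled, ci = pool_correlations(studies)
print(f"pooled r = {r_pooled:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

As the following slides stress, even a well-estimated pooled r says nothing about bias between the two methods.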

Accuracy of BiliChek™

Bhutani et al., Pediatrics 2000

Limitations of correlation coefficient to assess agreement

[Figure: hypothetical data sets, all with a correlation coefficient of 1; new measuring device (y-axis) plotted against HPLC bilirubin, the reference standard (x-axis), both axes 0 to 45]

Limitations of correlation coefficients in assessing agreement between two testing methods

• Correlation coefficient provides a measure of the strength and directionality of the association, but NOT agreement

• Correlation measures ignore bias

• Correlation coefficient does not provide information about the clinical utility of a diagnostic test

• Correlation coefficient (r) depends on the distribution of serum bilirubin values in the sample

• Measures relative rather than absolute agreement

• High correlation coefficient is a necessary but not sufficient condition for agreement
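The point that correlation ignores bias can be checked directly. In this sketch the reference values and the three "devices" are hypothetical: one agrees exactly, one reads a constant 5 units high, and one reads twice the reference. All three give r = 1, yet only the first agrees with the reference.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_difference(x, y):
    """Average absolute disagreement between paired measurements."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical reference values (e.g., HPLC bilirubin) and three new devices
reference = [5, 10, 15, 20]
devices = {
    "identical": [5, 10, 15, 20],           # perfect agreement
    "constant bias": [10, 15, 20, 25],      # always reads 5 higher
    "proportional bias": [10, 20, 30, 40],  # reads twice the reference
}
for name, values in devices.items():
    print(f"{name}: r = {pearson_r(reference, values):.3f}, "
          f"mean |diff| = {mean_abs_difference(reference, values):.1f}")
```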

Bland and Altman method

• True value is unknown

• Takes the average of each pair of measurements as the best estimate of the true value

• For each pair of measurements, plots the difference in results between devices against their average

• Avoids the statistical artifact (built-in correlation) of plotting the difference against either one of the measurements

• The magnitude of the bias (mean difference) can be estimated, as well as the standard deviation of the differences
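A minimal sketch of the Bland and Altman computations listed above, using only the Python standard library. It also reports the 95% limits of agreement (bias ± 1.96 SD of the differences), which are a standard part of the Bland and Altman method although not listed on this slide. The paired TcB/TSB values are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Bland-Altman agreement statistics for paired measurements.

    Returns the per-pair means (x-axis of the plot), the differences
    (y-axis), the bias (mean difference), the SD of the differences,
    and the 95% limits of agreement (bias +/- 1.96 SD).
    """
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, sd, limits


# Hypothetical paired values: transcutaneous (TcB) vs serum (TSB), mg/dl
tcb = [6.1, 9.8, 12.5, 14.9, 17.2]
tsb = [5.5, 10.0, 11.8, 15.6, 16.5]
means, diffs, bias, sd, (lower, upper) = bland_altman(tcb, tsb)
print(f"bias = {bias:.2f} mg/dl, SD = {sd:.2f}, "
      f"limits of agreement {lower:.2f} to {upper:.2f}")
```

Plotting `diffs` against `means` gives the Bland and Altman plot; the bias and limits are usually drawn as horizontal lines.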

Error distribution of paired HPLC TSB and TcB

Bhutani et al., Pediatrics 2000

Common methods to summarize diagnostic test performance

• Combining sensitivity and specificity independently

• Combining diagnostic odds ratios across studies

• Summary ROC curve
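The first two approaches can be sketched as follows. Assumptions, not taken from the report: sensitivity and specificity are combined independently by simply pooling cell counts, and diagnostic odds ratios are combined with fixed-effect inverse-variance weighting on the log scale, adding 0.5 to every cell of a table that contains a zero. Function names and the 2x2 tables are hypothetical.

```python
import math


def log_dor_and_var(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its variance for one 2x2 table.

    Adds 0.5 to every cell when any cell is zero (a common continuity correction).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    log_dor = math.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, var


def pooled_dor(tables):
    """Fixed-effect inverse-variance pooled DOR across studies."""
    num = den = 0.0
    for table in tables:
        log_dor, var = log_dor_and_var(*table)
        num += log_dor / var
        den += 1 / var
    return math.exp(num / den)


def pooled_sens_spec(tables):
    """Sensitivity and specificity pooled independently by summing cells.

    This naive approach ignores the correlation between sensitivity and
    specificity that arises when studies use different thresholds.
    """
    tp = sum(t[0] for t in tables)
    fp = sum(t[1] for t in tables)
    fn = sum(t[2] for t in tables)
    tn = sum(t[3] for t in tables)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical (TP, FP, FN, TN) tables, not data from the report
tables = [(18, 10, 2, 70), (25, 20, 5, 150), (9, 4, 0, 37)]
print(f"pooled DOR = {pooled_dor(tables):.1f}")
sens, spec = pooled_sens_spec(tables)
print(f"pooled sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```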

Summary ROC method

Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: Data-analytic approaches and some additional considerations. Stat Med 1993; 12:1293-1316.

• Assumption: study results differ because of different thresholds

• Solution: fit a curve in the ROC space that best describes the data

• Problem: sensitivity and specificity are correlated

• Solution: regress the difference of the logits onto the sum of logits and transform back to ROC space
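The regression step above can be sketched as below. With D = logit(sensitivity) - logit(1 - specificity) and S = logit(sensitivity) + logit(1 - specificity), the fit D = a + b·S is solved for logit(sensitivity) = a/(1 - b) + logit(1 - specificity)·(1 + b)/(1 - b) and mapped back to ROC space. This sketch uses unweighted least squares for simplicity (weighted variants also exist) and assumes all proportions are strictly between 0 and 1; the study points are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def moses_sroc(points):
    """Unweighted Moses-Littenberg summary ROC fit.

    points: (sensitivity, 1 - specificity) pairs, one per study, all strictly
    between 0 and 1 (apply a continuity correction to the 2x2 cells first).
    Returns (a, b, sensitivity_at), where sensitivity_at maps a
    false-positive rate to the sensitivity on the fitted SROC curve.
    """
    d = [logit(tpr) - logit(fpr) for tpr, fpr in points]  # D = log diagnostic odds ratio
    s = [logit(tpr) + logit(fpr) for tpr, fpr in points]  # S = proxy for the threshold
    n = len(points)
    s_mean, d_mean = sum(s) / n, sum(d) / n
    # Ordinary least squares fit of D = a + b * S
    b = sum((si - s_mean) * (di - d_mean) for si, di in zip(s, d)) / sum(
        (si - s_mean) ** 2 for si in s
    )
    a = d_mean - b * s_mean
    if b == 1:
        raise ValueError("slope b = 1: the SROC back-transformation is undefined")

    def sensitivity_at(fpr):
        # Solve logit(TPR) - logit(FPR) = a + b * (logit(TPR) + logit(FPR)) for TPR
        logit_tpr = a / (1 - b) + logit(fpr) * (1 + b) / (1 - b)
        return 1 / (1 + math.exp(-logit_tpr))

    return a, b, sensitivity_at


# Hypothetical (sensitivity, 1 - specificity) pairs, not data from the report
studies = [(0.90, 0.20), (0.80, 0.10), (0.95, 0.35), (0.70, 0.05)]
a, b, sroc = moses_sroc(studies)
for fpr in (0.05, 0.10, 0.20, 0.30):
    print(f"1 - specificity = {fpr:.2f}: SROC sensitivity = {sroc(fpr):.3f}")
```

When b is close to 0, every study has about the same diagnostic odds ratio and the SROC curve is symmetric.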

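ROC curve constructed from multiple test thresholds

[Figure: overlapping test-value distributions for diseased and not-diseased groups, with multiple thresholds (a, b, c, d) evaluated in the test; each threshold gives one point on the ROC curve of sensitivity against 1 - specificity]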
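To make the figure concrete, here is a sketch of how one test yields several ROC points, one per threshold. The test values for the two groups and the four cut-offs are hypothetical, and a value at or above the cut-off is assumed to count as a positive result.

```python
def roc_points(diseased, not_diseased, thresholds):
    """Return (1 - specificity, sensitivity) for each cut-off.

    A value at or above the cut-off counts as a positive test result.
    """
    points = []
    for cut in thresholds:
        sensitivity = sum(v >= cut for v in diseased) / len(diseased)
        false_positive_rate = sum(v >= cut for v in not_diseased) / len(not_diseased)
        points.append((false_positive_rate, sensitivity))
    return points


# Hypothetical test values for the two groups, not data from the report
diseased = [14, 16, 17, 18, 19, 21, 22, 24]
not_diseased = [8, 10, 11, 12, 13, 15, 16, 18]
thresholds = [12, 15, 18, 21]
for cut, (fpr, sens) in zip(thresholds, roc_points(diseased, not_diseased, thresholds)):
    print(f"cut-off {cut}: 1 - specificity = {fpr:.3f}, sensitivity = {sens:.3f}")
```

Raising the cut-off moves the point down and to the left along the curve: fewer false positives, but lower sensitivity.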

Examples of SROC curves and pooled sensitivity and specificity
