1
Utilizing computational tools to visually explore multivariable datasets: a Lassa fever case study in Nigeria Sara Asad, Andres Colubri, Pardis Sabeti FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138 Abstract Background select plot type select range indication of significance based on selected P-value add a variable to column for comparison / use variable as a covariate / sort columns based on mutual information (MI) adjust composite coefficient add to composite adjust composite range Results MathWorks: Neural Network Toolbox %% This script assumes these variables are defined: % % clinical_data_inputs - input data. %% clinical_data_targets - target data. for k=1:10 k if k == 1 inputs = day_inputs1; targets = day_targets1; elseif k == 2 inputs = day_inputs2; targets = day_targets2; 8 66.7% 1 8.3% 2 16.7% 1 8.3% 83.3% 16.7% 66.7% 33.3% 88.9% 11.1% 88.9% 11.1% 66.7% 33.3% 8 66.7% 1 8.3% 2 16.7% 1 8.3% 83.3% 16.7% 66.7% 33.3% 88.9% 11.1% 88.9% 11.1% 66.7% 33.3% Predicted Class Recovered Died Actual Class 1, 2, 3…50 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Percentage (%) Recovered Died 80 82 84 86 88 90 92 94 96 98 100 1 2 3 4 5 6 7 8 9 10 Accuracy (%) Days of Treatment Single Recurring Figure 7. The accuracy of outcome based on daily vital signs recorded during the course of treatment. Points labeled single used data on that specific day of treatment whereas points labeled recurring incorporate the patients’ vital signs on previous days of treatment. Figure 1. Case-fatality rate for the top nine variables from Zye correlation ranking. The threshold for blood urea nitrogen is above 31.0 (mg/dl), creatine above 6.00 (mg/dl), granulocytes above 9.50 (%) and white blood cell concentration above 18.0 (10 3 /mm 3 ). 0% 20% 40% 60% 80% 100% 21 and under 22 - 34 35 - 44 45 - 54 55 - 64 65 and above Percentage (%) Age of Patient 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 1 - 3 4 - 6 7 - 9 10 - 12 13 - 15 16 - 18 19 - 21 22-24 25-28 Percentage (%) Days of Fever Before Presentation Figures 2, 3. The effect of a patients’ age and the days of fever before presentation at the hospital on CFR. 133 80.1% 6 3.6% 25 15.1% 2 1.2% 95.2% 4.8% 92.6% 7.4% 95.7% 4.3% 98.5% 1.5% 80.6% 19.4% Predicted Class Recovered Died Actual Class Figure 4. The outcome confusion matrix using SCNS, ARF, ENC, and HYPO followed by its corresponding sensitivity and specificity. Recovered Died Sensitivity 98.5% 80.6% Specificity 95.7% 92.6% 1.00E-12 1.00E-11 1.00E-10 1.00E-09 1.00E-08 1.00E-07 1.00E-06 1.00E-05 1.00E-04 1.00E-03 1.00E-02 1.00E-01 1.00E+00 P-Value (log-scale) Figure 6. P-values for the degree of significance between discrete clinical variables and patient outcome using Fisher’s Exact Test. Variables above the line indicate a P- value less than 0.01. First and foremost, I would like to thank my advisor, Dr. Andres Colubri. Without his guidance and support this project would not have been possible. I am grateful to Dr. Pardis Sabeti, all members of the Sabeti lab, and everyone at Fathom for this opportunity and valuable experience. This work was generously supported by the FAS Center for Systems Biology and HHMI Interdisciplinary Undergraduate Fellows Program. SCNS, ENC, ARF, and HYPO are the four discrete variables with highest correlation with patient outcome. These four variables have high predictive powers with a sensitivity of 98.5% and 80.6% and a specificity of 95.7% and 92.6%, for recovered and deceased patients respectively. Older patients have higher case fatality rate. For instance, patients 21 and under have case fatality rate (CFR) of 8.3%, patients between the age of 45 and 54 have CFR of 26.9% and patients 65 and over have CFR of 36.4% Although it was thought to have a significant impact on CFR, there is no clear correlation between the days of fever prior to hospital presentation and a patient’s outcome. A physician can more precisely determine outcome among patients who share similar clinical diagnosis using creatinine and lactate dehydrogenase lab results as shown in Figure 5. The difference in predicting outcome between single and recurring treatment data of patients vital signs is overall not significant, however there is a distinct peak on Day 9 for recurring data. In both cases there is a definite trend for increasing accuracy as treatment progresses. Zye is a powerful and precise tool for visually exploring and assessing multivariable data sets for correlations. For this Lassa fever case-study, Zye allows us to elegantly and intelligently generate hypotheses based on epidemiological data of Lassa fever patients. The Irrua Specialist Teaching Hospital (ISTH) in Nigeria documents extensive and comprehensive epidemiological data of patients in the Lassa fever ward. Information from this epidemiological data is used in determining the case-fatality rate associated with Lassa fever patients, and most importantly, finding which parameters strongly predict patient outcome. This project presents a visualization software, Zye, which allows users to efficiently scan large datasets through visual analysis. In our results, clinical diagnosis of severe central nervous system disease (SCNS), encephalopathy (ENC), acute renal failure (ARF), and hypotension (HYPO) most closely prognosticate outcome. Zye provides users an intuitive understanding of associations in multivariable datasets using mutual information ranking and an exploratory visual interface. The ability to extricate meaningful associations and generate novel hypotheses from large datasets has important implications for future epidemiological studies and more broadly, health sciences research. Lassa fever is a viral hemorrhagic fever endemic in West African countries including Nigeria, Sierra Leone, and Guinea. The Lassa virus is a negative-strand RNA virus carried in small African rodents, the multimammate rat, which serve as the virus’ main reservoir. An estimated 100,000 to 300,000 people are infected by the Lassa virus every year in West Africa. 1 The Irrua Specialist Teaching Hospital (ISTH) in Nigeria has on-site diagnostics and treatment for Lassa fever patients. Clinical data of Lassa patients from January 2011 until June 2013 is utilized in this case study. 1. Asogun DA, Adomeh DI, Ehimuan J, Odia I, Hass M, et al. (2012) Molecular Diagnostics for Lassa Fever at Irrua Specialist Teaching Hospital, Nigeria: Lessons Learnt from Two Years of Laboratory Operation. PLoS Negl Trop Dis 6(9): e1839. 2. Center for Disease Control. 2013. Lassa Fever. [Online]. Available: http://www.cdc.gov/ncidod/dvrd/spb/mnpages/dispages/lassaf.htm.Accessed August 8, 2013. 3. Khan SH, Goba A, Chu M, Roth C, et al. (2008) New opportunities for field research on the pathogenesis and treatment of Lassa fever. Antiviral Res 78: 103–115. 4. McCormick JB, King IJ, Webb PA, Johnson KM, et al. (1987) A case-control study of the clinical diagnosis and course of Lassa fever. J Infect Dis 155: 445–455. 5. McCormick JB, Webb PA, Krebs JW, Johnson KM, Smith ES (1987) A prospective study of the epidemiology and ecology of Lassa fever. J Infect Dis 155: 437–444. Training Data Set Zye Interface assign P-value Recovered Died Sensitivity and Specificity Figure 5. Creatinine and lactate dehydrogenase lab results of patients diagnosed with three or more of the following : SCNS, ENC, ARF, and HYPO. (FR – first lab result of the patient, LR – last available lab result of patient) 0.30 4.00 7.70 11.40 Died Recovered 152.0 1,444.0 2,736.0 4,028.0 Died Recovered LR - Creatinine (mg/dl) 133.0 1,429.8 2,726.0 4,023.3 Died Recovered FR - Lactate Dehydrogenase (IU/L) LR - Lactate Dehydrogenase (IU/L) Conclusions Acknowledgements References Recovered Died Sensitivity 88.9% 66.7% Specificity 88.9% 66.7%

Utilizing computational tools to visually explore ...andrescolubri.net/assets/teaching/sabeti/sara_asad_poster3.pdf · Lassa fever case study in Nigeria Sara Asad, Andres Colubri,

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Utilizing computational tools to visually explore ...andrescolubri.net/assets/teaching/sabeti/sara_asad_poster3.pdf · Lassa fever case study in Nigeria Sara Asad, Andres Colubri,

Utilizing computational tools to visually explore multivariable datasets: a

Lassa fever case study in Nigeria Sara Asad, Andres Colubri, Pardis Sabeti

FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138

Abstract Background

select plot type select range

indication of

significance based on

selected P-value

add a variable to column for

comparison / use variable as a

covariate / sort columns based on

mutual information (MI)

adjust

composite

coefficient

add to composite

adjust composite range

Results

MathWorks: Neural

Network Toolbox

%% This script assumes these

variables are defined:

% % clinical_data_inputs - input data.

%% clinical_data_targets - target data.

for k=1:10

k

if k == 1

inputs = day_inputs1;

targets = day_targets1;

elseif k == 2

inputs = day_inputs2;

targets = day_targets2;

8 66.7%

1 8.3%

2 16.7%

1 8.3%

83.3% 16.7%

66.7% 33.3%

88.9% 11.1%

88.9% 11.1%

66.7% 33.3%

8 66.7%

1 8.3%

2 16.7%

1 8.3%

83.3% 16.7%

66.7% 33.3%

88.9% 11.1%

88.9% 11.1%

66.7% 33.3%

8 66.7%

1 8.3%

2 16.7%

1 8.3%

83.3% 16.7%

66.7% 33.3%

88.9% 11.1%

88.9% 11.1%

66.7% 33.3%

8 66.7%

1 8.3%

2 16.7%

1 8.3%

83.3% 16.7%

66.7% 33.3%

88.9% 11.1%

88.9% 11.1%

66.7% 33.3%

8 66.7%

1 8.3%

2 16.7%

1 8.3%

83.3% 16.7%

66.7% 33.3%

88.9% 11.1%

88.9% 11.1%

66.7% 33.3% P

red

icte

d C

lass

Re

co

ve

red

D

ied

Actual Class

1, 2, 3…50

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Pe

rce

nta

ge

(%

) Recovered

Died

80

82

84

86

88

90

92

94

96

98

100

1 2 3 4 5 6 7 8 9 10

Ac

cu

rac

y (

%)

Days of Treatment

Single

Recurring

Figure 7. The accuracy of outcome based on daily vital signs recorded during the course of treatment. Points labeled single used data on that specific day of treatment whereas points labeled recurring incorporate the patients’ vital signs on previous days of treatment.

Figure 1. Case-fatality rate for the top nine variables from Zye correlation ranking. The threshold for blood urea nitrogen is above 31.0 (mg/dl), creatine above 6.00 (mg/dl), granulocytes above 9.50 (%) and white blood cell concentration above 18.0 (103/mm3).

0%

20%

40%

60%

80%

100%

21 and under 22 - 34 35 - 44 45 - 54 55 - 64 65 and above

Pe

rce

nta

ge

(%

)

Age of Patient

0%10%20%30%40%50%60%70%80%90%

100%

1 - 3 4 - 6 7 - 9 10 - 12 13 - 15 16 - 18 19 - 21 22-24 25-28

Pe

rce

nta

ge

(%

)

Days of Fever Before Presentation

Figures 2, 3. The effect of a patients’ age and the days of fever before presentation at the hospital on CFR.

133

80.1%

6

3.6%

25

15.1%

2

1.2%

95.2%

4.8%

92.6%

7.4%

95.7%

4.3%

98.5%

1.5%

80.6%

19.4%

Pre

dic

ted

Cla

ss

Re

co

ve

red

D

ied

Actual Class

Figure 4. The outcome confusion matrix using SCNS, ARF, ENC, and HYPO followed by its corresponding sensitivity and specificity.

Recovered Died

Sensitivity 98.5% 80.6%

Specificity 95.7% 92.6%

1.00E-12

1.00E-11

1.00E-10

1.00E-09

1.00E-08

1.00E-07

1.00E-06

1.00E-05

1.00E-04

1.00E-03

1.00E-02

1.00E-01

1.00E+00

P-V

alu

e (l

og

-sc

ale

)

Figure 6. P-values for the degree of significance between discrete clinical variables and patient outcome using Fisher’s Exact Test. Variables above the line indicate a P-value less than 0.01.

First and foremost, I would like to thank my advisor, Dr. Andres Colubri. Without his guidance and support this project would not have been possible. I am grateful to Dr. Pardis Sabeti, all members of the Sabeti lab, and everyone at Fathom for this opportunity and valuable experience. This work was generously supported by the FAS Center for Systems Biology and HHMI Interdisciplinary Undergraduate Fellows Program.

• SCNS, ENC, ARF, and HYPO are the four discrete variables with highest correlation with patient outcome.

• These four variables have high predictive powers with a sensitivity of 98.5% and 80.6% and a specificity of 95.7% and 92.6%, for recovered and deceased patients respectively.

• Older patients have higher case fatality rate. For instance, patients 21 and under have case fatality rate (CFR) of 8.3%, patients between the age of 45 and 54 have CFR of 26.9% and patients 65 and over have CFR of 36.4%

• Although it was thought to have a significant impact on CFR, there is no clear correlation between the days of fever prior to hospital presentation and a patient’s outcome.

• A physician can more precisely determine outcome among patients who share similar clinical diagnosis using creatinine and lactate dehydrogenase lab results as shown in Figure 5.

• The difference in predicting outcome between single and recurring treatment data of patients vital signs is overall not significant, however there is a distinct peak on Day 9 for recurring data. In both cases there is a definite trend for increasing accuracy as treatment progresses.

• Zye is a powerful and precise tool for visually exploring and assessing multivariable data sets for correlations. For this Lassa fever case-study, Zye allows us to elegantly and intelligently generate hypotheses based on epidemiological data of Lassa fever patients.

The Irrua Specialist Teaching Hospital (ISTH) in Nigeria documents extensive and comprehensive epidemiological data of patients in the Lassa fever ward. Information from this epidemiological data is used in determining the case-fatality rate associated with Lassa fever patients, and most importantly, finding which parameters strongly predict patient outcome. This project presents a visualization software, Zye, which allows users to efficiently scan large datasets through visual analysis. In our results, clinical diagnosis of severe central nervous system disease

(SCNS), encephalopathy (ENC), acute renal failure (ARF), and hypotension (HYPO) most closely prognosticate outcome. Zye provides users an intuitive understanding of associations in multivariable datasets using mutual information ranking and an exploratory visual interface. The ability to extricate meaningful associations and generate novel hypotheses from large datasets has important implications for future epidemiological studies and more broadly, health sciences research.

Lassa fever is a viral hemorrhagic fever endemic in West African countries including Nigeria, Sierra Leone, and Guinea. The Lassa virus is a negative-strand RNA virus carried in small African rodents, the multimammate rat, which serve as the virus’ main reservoir. An estimated 100,000 to 300,000 people are infected by the Lassa virus every year in West Africa.1 The Irrua Specialist Teaching Hospital (ISTH) in Nigeria has on-site diagnostics and treatment for Lassa fever patients. Clinical data of Lassa patients from January 2011 until June 2013 is utilized in this case study.

1. Asogun DA, Adomeh DI, Ehimuan J, Odia I, Hass M, et al. (2012) Molecular Diagnostics for Lassa Fever at Irrua Specialist Teaching Hospital, Nigeria: Lessons Learnt from Two Years of Laboratory Operation. PLoS Negl Trop Dis 6(9): e1839.

2. Center for Disease Control. 2013. Lassa Fever. [Online]. Available: http://www.cdc.gov/ncidod/dvrd/spb/mnpages/dispages/lassaf.htm.Accessed August 8, 2013.

3. Khan SH, Goba A, Chu M, Roth C, et al. (2008) New opportunities for field research on the pathogenesis and treatment of Lassa fever. Antiviral Res 78: 103–115.

4. McCormick JB, King IJ, Webb PA, Johnson KM, et al. (1987) A case-control study of the clinical diagnosis and course of Lassa fever. J Infect Dis 155: 445–455.

5. McCormick JB, Webb PA, Krebs JW, Johnson KM, Smith ES (1987) A prospective study of the epidemiology and ecology of Lassa fever. J Infect Dis 155: 437–444.

Training Data Set

Zye Interface

assign P-value

Recovered Died

Sensitivity and Specificity Figure 5. Creatinine and lactate dehydrogenase lab results of patients diagnosed with three or more of the following : SCNS, ENC, ARF, and HYPO. (FR – first lab result of the patient, LR – last available lab result of patient)

0.30 4.00 7.70 11.40

Died

Recovered

152.0 1,444.0 2,736.0 4,028.0

Died

Recovered

LR - Creatinine

(mg/dl)

133.0 1,429.8 2,726.0 4,023.3 Died

Recovered

FR - Lactate

Dehydrogenase

(IU/L)

LR - Lactate

Dehydrogenase

(IU/L)

Conclusions

Acknowledgements

References

Recovered Died

Sensitivity

88.9% 66.7%

Specificity

88.9% 66.7%