RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF APPENDICITIS
CHUMPON WILASRUSMEE
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR
THE DEGREE OF DOCTOR OF PHILOSOPHY (CLINICAL EPIDEMIOLOGY)
FACULTY OF GRADUATE STUDIES MAHIDOL UNIVERSITY
2016
COPYRIGHT OF MAHIDOL UNIVERSITY
Thesis entitled
RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF APPENDICITIS
Mr. Chumpon Wilasrusmee, Candidate
Assoc. Prof. Ammarin Thakkinstian, Ph.D. (Clinical Epidemiology & Community Medicine), Major advisor
Assoc. Prof. Patarawan Woratanarat, M.D., Ph.D. (Clinical Epidemiology), Co-advisor
Assoc. Prof. Panuwat Lertsithichai, M.D., M.Sc. (Medical Statistics), Co-advisor
Assoc. Prof. Varaporn Akkarapatumwong, Ph.D. (Science), Acting Dean, Faculty of Graduate Studies, Mahidol University
Assoc. Prof. Ammarin Thakkinstian, Ph.D. (Clinical Epidemiology & Community Medicine), Program Director, Doctor of Philosophy Program in Clinical Epidemiology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University
Thesis entitled
RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF APPENDICITIS
was submitted to the Faculty of Graduate Studies, Mahidol University
for the degree of Doctor of Philosophy (Clinical Epidemiology) on
October 27, 2016
Mr. Chumpon Wilasrusmee, Candidate
Lect. Vijj Kasemsap, M.D., Ph.D. (Social & Administrative Pharmacy), Chair
Prof. Prakitpunthu Tomtitchong, M.D., M.Sc., Ph.D., FRSCT, Member
Assoc. Prof. Ammarin Thakkinstian, Ph.D. (Clinical Epidemiology & Community Medicine), Member
Assoc. Prof. Panuwat Lertsithichai, M.D., M.Sc. (Medical Statistics), Member
Assoc. Prof. Patarawan Woratanarat, M.D., Ph.D. (Clinical Epidemiology), Member
Assoc. Prof. Varaporn Akkarapatumwong, Ph.D. (Science), Acting Dean, Faculty of Graduate Studies, Mahidol University
Prof. Piyamitr Sritara, M.D., FRCPT, FACP, FRCP (T), Dean, Faculty of Medicine Ramathibodi Hospital, Mahidol University
ACKNOWLEDGEMENTS
I would like to express my deepest appreciation to my major advisor
Assoc. Prof. Dr. Ammarin Thakkinstian, who has the attitude and the substance of a
genius: she continually and convincingly conveyed a spirit of adventure in regard to
research and scholarship, and an excitement in regard to teaching. Without her
guidance and persistent help this dissertation would not have been possible.
I would like to show my gratitude to Dr. Sasivimol Rattanasiri, Dittapol
Muntham, Prawduen Saravej, and all personnel of the Section of Clinical
Epidemiology and Biostatistics, Ramathibodi Hospital, Mahidol University, for their
gracious help and support, especially with the data management and administrative
management of my project.
I am also thankful to Dr. Boonying Siribumrungwong (Department of
Surgery, Thammasat Hospital, Thammasat University), Dr. Samart Phuwaprisirisarn
(Department of Surgery, Chaiyaphum Hospital), and Napaphat Poprom for their
permission and support in collecting the patient data. Without their support, I could
not have developed and validated the prediction model.
Chumpon Wilasrusmee
Fac. of Grad. Studies, Mahidol Univ. Thesis / iv
RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF
APPENDICITIS
CHUMPON WILASRUSMEE 5336192 RACE/D
Ph.D. (CLINICAL EPIDEMIOLOGY)
THESIS ADVISORY COMMITTEE: AMMARIN THAKKINSTIAN, PH.D., PATARAWAN
WORATANARAT, M.D., PH.D., PANUWAT LERTSITHICHAI, M.D., M.SC.
ABSTRACT
Diagnosis of appendicitis is still clinically challenging. Risk stratification should be developed to aid in the management of appendicitis. The purpose of this study was to develop and externally validate the Ramathibodi Appendicitis Score (RAMA-AS) as an aid to the diagnosis of appendicitis. This cross-sectional study consisted of two phases, derivation and validation, and was conducted at Ramathibodi Hospital, Thammasat University Hospital, and Chaiyaphum Hospital during January 2013-May 2015. Patients with abdominal pain who were suspected of having appendicitis and visited these hospitals were enrolled. Multiple logistic regression was applied to develop a parsimonious model. Calibration and discrimination performances were assessed. In addition, the performance of RAMA-AS was compared with that of the Alvarado score using ROC curve analysis. All analyses were performed using Stata version 14 (Stata Corp, College Station, Texas, USA). A P-value of less than 0.05 was taken as the threshold for statistical significance.
The RAMA-AS consisted of 3 domains with 7 predictors: symptoms (i.e. progression of pain, aggravation of pain, and migration of pain), signs (i.e. fever and rebound tenderness), and laboratory results (white blood cell count (WBC) and neutrophils). The model fitted the data well, with a Somers' D of 0.686 (95% CI: 0.608, 0.763). The model discriminated well, with a C-statistic of 0.842 (95% CI: 0.804, 0.881) and a bootstrap C-statistic of 0.848 (95% CI: 0.846, 0.849). For external validation, the RAMA-AS worked well after model revisions, including calibration of the intercept and overall coefficients plus stepwise selection of significant predictors, with O/E ratios of 1.005 and 0.996 (95% CI: 0.784, 1.225 and 0.695, 1.333; Hosmer-Lemeshow = 8.219 and 6.640, df = 4 and 4, p = 0.838 and 0.156) for the 2 external validations. The C-statistics of the 2 external validations were 0.853 (95% CI: 0.791, 0.915) and 0.813 (95% CI: 0.736, 0.892). The C-statistic of the Alvarado score was 0.760 (95% CI: 0.710, 0.810), indicating lower discriminative ability.
RAMA-AS should be a useful tool for the diagnosis of appendicitis, with good calibration and discrimination performances. Practitioners should be encouraged to use the score in clinical practice to confirm the diagnosis and to select patients who should undergo imaging or surgical management.
KEY WORDS: APPENDICITIS SCORE / DERIVE PHASE / VALIDATION PHASE / CALIBRATION
/ DISCRIMINATION
134 pages
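The discrimination and calibration statistics named in the abstract can be illustrated with a small sketch. It is not the thesis's Stata analysis: the function names and the eight-patient data set are hypothetical and exist only to show how a C-statistic, Somers' D, and an observed/expected (O/E) ratio are computed for a binary outcome. For a binary outcome, Somers' D equals 2C - 1, so a C-statistic of 0.842 corresponds to about 0.684, close to the reported 0.686.

```python
from itertools import product


def c_statistic(probs, outcomes):
    """Share of (case, non-case) pairs where the case has the higher
    predicted probability; tied pairs count as one half."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    non_cases = [p for p, y in zip(probs, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case, p_non in product(cases, non_cases):
        if p_case > p_non:
            concordant += 1.0
        elif p_case == p_non:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def somers_d(c):
    """Somers' D for a binary outcome, derived from the C-statistic."""
    return 2.0 * c - 1.0


def oe_ratio(probs, outcomes):
    """Observed number of cases divided by the sum of predicted risks."""
    return sum(outcomes) / sum(probs)


# Hypothetical predicted risks and outcomes (1 = appendicitis), not thesis data.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]

c = c_statistic(probs, outcomes)
print(f"C-statistic: {c:.4f}")
print(f"Somers' D:   {somers_d(c):.4f}")
print(f"O/E ratio:   {oe_ratio(probs, outcomes):.3f}")
```

An O/E ratio near 1 means the predicted number of cases matches the observed number on average; it says nothing about how well individual patients are ranked, which is what the C-statistic measures.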
"Ramathibodi Appendicitis Score": a scoring system that is a useful tool for the diagnosis of appendicitis
RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF APPENDICITIS
Chumpon Wilasrusmee 5336192 RACE/D
Ph.D. (Clinical Epidemiology)
Thesis advisory committee: Ammarin Thakkinstian, Ph.D., Patarawan Woratanarat, M.D., Ph.D., Panuwat Lertsithichai, M.D., M.Sc.
ABSTRACT (THAI)
The diagnosis of appendicitis remains a challenging problem. Risk stratification for diagnosis should be developed to help guide patient care. The purpose of this thesis was to construct the Ramathibodi score as an aid to the diagnosis of appendicitis and to validate it externally. A cross-sectional study consisting of 2 phases, development and validation of the score, was conducted in patients with abdominal pain who were suspected of having appendicitis at 3 hospitals: Faculty of Medicine Ramathibodi Hospital, Faculty of Medicine Thammasat Hospital, and Chaiyaphum Provincial Hospital, from January 2013 (B.E. 2556) to 2015 (B.E. 2558). Logistic regression analysis was used to build a parsimonious model, followed by assessment of calibration and discrimination. In addition, the Ramathibodi score was compared with the Alvarado score using ROC curves. Data were analyzed using Stata version 14. A P-value of less than 0.05 was considered statistically significant.
The Ramathibodi score consists of 3 domains with 7 predictors. Three variables come from history taking (symptoms): progression of abdominal pain; aggravation of abdominal pain by coughing, sneezing, or movement; and migration of abdominal pain from the periumbilical area to the right lower quadrant. Two variables come from physical examination (signs): fever, with a body temperature above 37 degrees Celsius, and rebound tenderness, elicited by slowly pressing on the abdomen until the abdominal wall feels tense under the fingers or the patient begins to feel pain, and then lifting the hand (releasing the pressure) immediately. Two variables come from laboratory testing: a white blood cell count above 10,000 cells per cubic millimeter and a neutrophil proportion above 75 percent on the complete blood count. The model fitted the data well, with a Somers' D of 0.686 (95% confidence interval 0.608 to 0.763). Internal validation showed that the score discriminated well between patients with and without appendicitis, with a C-statistic of 0.842 (95% confidence interval 0.804 to 0.881), which was confirmed by internal validation with a C-statistic of 0.848 (95% confidence interval 0.846 to 0.849). External validation showed that the Ramathibodi score worked well after revision of the score by calibration of the intercept and coefficients together with stepwise selection of predictors, with O/E ratios of 1.005 and 0.996 (95% confidence intervals 0.784 to 1.225 and 0.965 to 1.333), Hosmer-Lemeshow statistics of 8.219 and 6.640, df of 4 and 4, and p of 0.838 and 0.156. The C-statistics from the two external validation sites were 0.840 (95% confidence interval 0.780 to 0.910) and 0.810 (95% confidence interval 0.730 to 0.890). Compared with the widely used Alvarado score, the Ramathibodi score discriminated patients better than the Alvarado score, which had a C-statistic of 0.760 (95% confidence interval 0.710 to 0.810).
In conclusion, the Ramathibodi score, constructed to aid the diagnosis of appendicitis, discriminates and predicts patients with appendicitis well, and is likely to be applicable in clinical practice as a guide for patient care, for differential diagnosis of appendicitis, observation, radiological investigation, and surgical treatment.
134 pages
Fac. of Grad. Studies, Mahidol Univ. Ph.D.(Clinical Epidemiology) / 1
CHAPTER I
BACKGROUND AND RATIONALE
1.1 Anatomy
The appendix is a blind-ended tube at the end of the cecum, located
approximately 2 cm from the ileocecal valve, which separates the small intestine from
the large intestine (Figure 1.1a). Its surface landmark is known as McBurney’s point
(Figure 1.1b) and corresponds with the position of the appendix in the abdomen. The
average length of the human appendix is 9 cm (range: 2-20 cm), with a diameter of 7 to
8 mm. The appendix might serve as a microbial reservoir for repopulation of the
gastrointestinal tract in times of necessity(1). Its gut-associated lymphoid tissue may
provide immune defence against invading pathogens. However, because appendectomy
has no apparent adverse effects, the appendix is judged to be vestigial, with no
specific function.
1.2 Pathophysiology and pathology
Appendicitis is defined as an inflammation of the appendix that usually
begins at the inner lining and spreads to its other parts. Direct luminal obstruction, often
by a fecalith, lymphoid hyperplasia, impacted stool, a foreign body, or a tumor, can cause
appendicitis. Several infectious agents, genetic factors, and environmental influences
have been reported to be associated with appendicitis. The pathology of appendicitis
includes congestion/obstruction, colour changes, increased diameter, exudate, pus,
perforation, transmural inflammation, ulceration, thrombosis, and necrosis, as seen on
macroscopic appearance(2).
Obstruction of the appendiceal lumen has been reported from a variety of
causes, e.g., fecalith, lymphoid hyperplasia, vegetable matter and fruit seeds, barium
from radiography, intestinal worms (especially ascarids), primary tumors (carcinoid,
adenocarcinoma, Kaposi sarcoma, and lymphoma), and metastatic tumors (colon and
breast) (3, 4). Fecalith is the most common cause, with a reported frequency as high as 67-
CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT (ENGLISH)
ABSTRACT (THAI)
LIST OF TABLES
LIST OF FIGURES
CHAPTER I BACKGROUND AND RATIONALE
1.1 Anatomy
1.2 Pathophysiology and pathology
1.3 Diagnosis
1.4 Rationale
1.5 Research Questions
1.6 Research Objectives
CHAPTER II LITERATURE REVIEW
2.1 History of previous scores’ developments
2.2 Systematic review of scoring systems for diagnosis of appendicitis(60)
2.3 Definition
2.4 Research methods in risk prediction scores
2.5 Conceptual framework
CHAPTER III METHOD
3.1 Study design and setting
3.2 Study subjects
3.3 Data Collection
3.4 Sample size estimation
3.5 Data management
3.6 Statistical analysis
3.7 Ethics considerations
CHAPTER IV RESULTS
4.1 Characteristic of patients
4.2 Imputation
4.3 Model development
4.4 External validation
4.5 Comparison of RAMA-AS and previous scores
CHAPTER V DISCUSSION
5.1 Comparison of RAMA-AS and previous score and radiological investigation
5.2 External validation and model updating
5.3 Using the RAMA-AS in practice (interpretation and implication)
5.4 Strengths and limitations
CHAPTER VI CONCLUSION
REFERENCES
APPENDICES
BIOGRAPHY
LIST OF TABLES
Table
2.1 Alvarado scoring system
2.2 The scoring parameters of RIPASA score based on probability and extra weight
2.3 The Appendicitis Inflammatory Response Score (AIRS)
2.4 Fenyö-Lindberg scoring system
2.5 Ohmann score
2.6 Eskelinen score
2.7 Simple scoring system
2.8 Practical score of Ramirez and Deus
2.9 Scoring system developed by Teicher
2.10 Performance of Random Forests (RF), Support Vector Machines (SVM), Artificial Neural Networks (ANN), Logistic Regression (LR), and Alvarado score on diagnosis of acute appendicitis
2.11 Describe methodological assessments
2.12 Characteristics of studies that had developed prediction scores for appendicitis
3.1 Estimate sample size by PS program
3.2 Re-calibration and revision of models for external validations
4.1 Baseline data of 396 patients from Ramathibodi Hospital, 152 patients from Thammasat Hospital, and 178 patients from Chaiyaphum Hospital
4.2 Report on number of missing data
4.3 Description of patients’ characteristics in appendicitis and non-appendicitis groups
4.4 Factors associated with appendicitis: Multiple logistic regression analysis
4.5 Risk stratification and predictive values of a RAMA-AS prediction score
4.6 Key study characteristics of patients from derivation and external validation
4.7 Description of patients’ characteristics in appendicitis and non-appendicitis groups of Thammasat Hospital
4.8 Description of patients’ characteristics in appendicitis and non-appendicitis groups of Chaiyaphum Hospital
LIST OF FIGURES
Figure
1.1 The Anatomy of appendix
1.2 A Normal appendix at McBurney’s point
1.2 B Inflamed appendicitis
1.2 C Gangrene/ ruptured appendicitis
2.1 Identification of studies for inclusion.
3.1 Rebound tenderness
4.1 Diagnostic plot between missing and observed values
4.2 Receiver Operating Characteristic (ROC) curves of RAMA-AS for diagnosis of appendicitis
4.3 Fagan nomogram plot for RAMA-AS risk stratification
4.4 Calibration plots for external validations at Thammasat Hospital using different update methods
4.5 Calibration plots for external validations at Chaiyaphum Hospital using different update methods
4.6 Comparisons of C-statistics between RAMA-AS, Alvarado, Eskelinen and Fenyö scores
5.1 Comparisons of C-statistics between RAMA-AS, ultrasound and CT scan
5.2 Guide to choose predictive score for appendicitis according to prevalence
6.1 Website-based calculation of appendicitis
6.1 Admission record of ongoing impact analysis
Chumpon Wilasrusmee Background and Rationale / 2
89% (5). The prevalence of appendicitis is higher in teenagers and young adults than
in older adults, which could be explained by the pathophysiological role of lymphoid
hyperplasia, since lymphoid tissue is abundant in the appendix (6), and by a lumen that
is small relative to wall thickness and therefore susceptible to obstruction.
The obstruction theory is based on the observed frequency of such
obstructions, on the increased intraluminal pressures found in inflamed appendixes,
and on the experimental production of appendicitis by obstruction(7). The distribution
of inflammatory changes in acute appendicitis was analyzed by histological
examination of multiple longitudinal sections, which found that inflammation was
either sharply confined to the distal part of the appendix or involved the whole organ,
and was less often limited to the proximal part or the central portion of the
organ(5).
Although the obstruction theory is reasonable, it has yet to be proven. It
also fails to explain cases attributable to infection. Many pathologists have suspected
that the fecalith forms after infection of the appendix, consistent with recent evidence
that does not support obstruction as the underlying pathology (8, 9). In addition, about
90% of patients with phlegmonous appendicitis had no evidence of raised intraluminal
pressure or signs of luminal obstruction (10). As a result, obstruction may not play an
important role in causing acute appendicitis, although it may develop as a result of the
inflammatory process.
Another possible cause is a labile factor in the appendix, which
responds to a variety of external and internal stimuli and causes injury to the
appendix that permits bacterial invasion by the flora normally present(7). It is likely
that there are several aetiologies of appendicitis, which lead to the final common
pathway of bacterial invasion of the appendiceal wall(6). The labile factor varies among
investigators, and a change in vascular tone can be one basis for an
attack of appendicitis(7). Vascular or muscle spasm, altered permeability of capillaries,
and changes in local acidity may be causes, but the assumption of a labile element,
capable of rapid response to stimuli, is the common factor in all.
Either obstruction or bacterial invasion leads to inflammation, rising
intraluminal pressures, and ultimately ischemia. Subsequently, the appendix enlarges
and incites inflammatory changes in surrounding tissues, e.g., pericecal fat and
peritoneum. Rapid distension of the appendix is likely because of its small luminal
capacity, and intraluminal pressures can reach 50 to 65 mm Hg(8). As luminal
pressure increases, venous and lymphatic obstruction and mucosal ischemia develop.
Mucosa becomes hypoxic and begins to ulcerate, resulting in compromise of the
mucosal barrier, and leading to invasion of the appendiceal wall by intraluminal
bacteria.
As a result of the inflammatory process, visceral afferent nerve fibers that
enter the spinal cord at T8 - T10 are stimulated, causing referred epigastric and
periumbilical pain. Once inflammation and infection extend to the serosa, parietal
peritoneum, and adjacent organs, somatic pain supersedes the early referred pain.
Patients usually have migratory pain, tenderness, guarding, and rebound tenderness at
the site of maximal pain in the right lower quadrant of the abdomen(11). Progression
of the disease compromises arterial blood flow and causes infarction, resulting in gangrene
and perforation, which usually occurs between 24 and 36 hours. Fever, anorexia,
nausea, and vomiting usually follow as the pathophysiology worsens(8, 12).
Appendicitis with and without perforation may differ in
pathogenesis(6). Patients with a short duration of symptoms had a predominantly
neutrophilic infiltrate, whereas those with a long duration might have a lymphocytic
infiltrate with granulation tissue. These findings support the argument that a mixed
infiltrate of lymphocytes and eosinophils represents a regression phase of acute
appendicitis. Fibrous adhesion formation and scarring of the appendiceal wall have been
demonstrated and are consistent with resolution of a previous attack of appendicitis.
Appendiceal perforation with little or no inflammation has been found, and in some cases
an ischemic appendix perforates differently from those in which perforation is due
to the evolution of inflammation with severe infection.
Recently, the concept of neuroimmune appendicitis has evolved(13). After
a previous minor bout of intestinal inflammation, subtle alterations in enteric
neurotransmitters are detected, which may result in altered visceral perception from
the gut. This process has been implicated in a wide range of gastrointestinal conditions
including appendicitis. The local immune response in the appendiceal tissue is
mirrored in the blood. The immunological response pattern in peripheral blood
suggests Th1/Th17-induced inflammation in advanced appendicitis, which is present
at an early clinical presentation. Patients with a history of advanced appendicitis have
stronger Th1 responses than individuals with a history of phlegmonous appendicitis.
This may reflect constitutional differences between patients with different outcomes of
appendicitis. The increased inflammatory response observed early in complicated
appendicitis (gangrene or perforation) suggests a more violent inflammation and
supports the hypothesis of different immune pathogeneses, where excessive induction
of Th1/Th17 immunity and/or deficiencies in down-regulatory feedback mechanisms
may explain the excessive inflammation in advanced appendicitis(14).
A normal appendix at McBurney’s point (the lateral third of the line between
the umbilicus and the anterior superior iliac spine) is shown in Figure 1.2A, compared
with an inflamed appendix in Figure 1.2B and gangrenous/ruptured appendicitis in
Figure 1.2C.
1.3 Diagnosis
The diagnosis of suspected appendicitis is usually based on patient history,
physical examination, laboratory tests, and imaging (e.g., computed tomography (CT) scan
or ultrasound (U/S)), with pathological confirmation. Although it is a common problem,
appendicitis remains difficult to diagnose, especially in patients with an atypical
clinical presentation. The classic signs and symptoms are present in only 60% to 70% of
patients, which makes a correct diagnosis difficult to ascertain. This can delay the
diagnosis or lead to unnecessary operation, and it contributes to the persistent rates
of morbidity and mortality. With routine clinical methods, the correct diagnosis is
obtained in 71% to 97% of patients, but the rate of negative appendectomy remains high,
varying between 14% and 75%(15-17), or even as high as 85%(18). The incidence of
perforated appendicitis varies between 4% and 45%, and the mortality rate ranges
from 0.17% to 7.5%(15). Therefore, clinicians should try to improve diagnosis not only
by carefully assessing these signs and symptoms but also by finding additional tools
that help discriminate high-risk patients, in whom surgical intervention is necessary,
from low-risk patients, who need no further investigation or can be observed
safely.
There are 3 possible scenarios in which misdiagnosis occurs. First,
appendicitis is diagnosed, the patient undergoes an operation, and non-appendiceal
disease is discovered, which may or may not benefit from surgical intervention (e.g.,
gynecologic lesions, colitis, or inflammatory bowel disease of the terminal ileum). In
this scenario, the appendix may or may not be removed. Second, appendicitis is
diagnosed, the patient undergoes an operation, and no abnormality is found. Again, the
appendix may or may not be removed. Third, appendicitis is not diagnosed but the
patient does have an inflamed appendix. The first 2 scenarios of misdiagnosis, or
negative appendectomy (NA), are important not only for quality improvement but also
for patient safety, cost, morbidity, and mortality. In the last scenario, most
patients may return with persistent/recurrent appendicitis, with or without perforation
or other complications, e.g., abscess. Misdiagnosis is undoubtedly serious, but its
clinical importance is complicated. Some cases may resolve without surgical
treatment. Some early cases may progress over a period of observation, allowing for
detection on repeat evaluation. Worse, in some patients the disease may progress to
perforation and complicated appendicitis. Concern about the third scenario has driven
clinical practice toward the first 2 forms of misdiagnosis, so a high number of negative
explorations for suspected appendicitis has been tolerated as surgeons endeavored to
miss no cases, thereby averting perforation. By this reasoning, it is much better to
subject a moderate number of patients to unnecessary operation than to let one patient
suffer perforation.
Increasing use of sensitive imaging (e.g., U/S, CT) can improve diagnostic
accuracy and detect mild inflammation of the appendix, which may resolve without
operation(19). Several studies over the past years have shown that the use of imaging
was associated with a reduction in NA(20-22). However, this performance was not
consistent, and many studies could not replicate the finding(23-26). The evidence
even showed that clinical examination and CT scan did not differ much in
sensitivity (83% vs 83.8%, respectively) or positive predictive value (PPV) (86.7%
vs 83.8%), whereas U/S performed worse than both (sensitivity 35.5% and PPV
81.3%)(25).
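Sensitivity and PPV, as compared above, both come from the same 2x2 table of test result against final pathology. The sketch below uses hypothetical counts (not data from the cited studies) and a hypothetical helper name, purely to show how the four standard measures are derived.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 table counts.

    tp/fn: patients with appendicitis who test positive/negative.
    fp/tn: patients without appendicitis who test positive/negative.
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")

    def ratio(numerator, denominator):
        # An empty row or column leaves the measure undefined.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical counts for illustration only.
example = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Under the added assumption that every test-positive patient is operated on, 1 - PPV would correspond to the negative appendectomy rate, which is one reason PPV is reported alongside sensitivity.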
Routine use of CT scanning in the diagnosis of acute appendicitis has been
increasing in recent years. This practice is highly controversial because of concerns
about the hazards of ionizing radiation and about its overutilization in clear-cut
clinical presentations. Patients are exposed to high doses of radiation, equivalent to
400 times that of a standard chest film, which increases the risk of
developing cancer or leukemia(27). One study suggested that a large proportion of
patients who undergo abdominal and pelvic CT scanning receive medically
unnecessary multiphase examinations, in which unindicated additional phases add
substantial excess radiation dose with no associated clinical benefit(20). Approximately
3 million scans were performed annually in the United States in 1980, and by 2008 that
number had grown to 67 million(20). One study estimated that universal imaging would
avoid 12 unnecessary appendectomies at the cost of one additional
cancer death(28). In addition, a randomized controlled study that compared clinical
assessment with CT for the diagnosis of acute appendicitis indicated that clinical
assessment, unaided by CT scan, reliably identified patients who required operation.
Therefore, routine use of abdominal/pelvic CT was not recommended(21). CT scan
is not considered a standard of care for the diagnosis of acute appendicitis. A study
of 1,630 patients with suspected appendicitis showed that the overall negative
appendectomy rate in patients with a CT scan was 6%, which was similar to that in
those without a CT scan(29). Neither CT scan nor U/S improves the diagnostic accuracy
or the negative appendectomy rate and, worse, either may delay surgical consultation and
treatment. The Alvarado score was developed and has been used to help decide
whether to order a CT scan in an emergency setting(30). The score considers
abdominal pain that migrates to the right lower quadrant, anorexia,
nausea or vomiting, right lower quadrant tenderness to palpation, rebound abdominal
tenderness, increased temperature (37.3°C or 99.1°F), leukocytosis (white blood cell
(WBC) count >10,000 cells/mm³), and neutrophilia (cell count with left shift), with a
total score of 10. If the score is 4 to 6, an adjunctive CT is recommended to
confirm the diagnosis. Combining appropriate imaging with history, physical examination,
and laboratory tests in a clinical prediction rule is crucial for the management of
patients.
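The Alvarado items listed above can be sketched as a simple additive score. The item weights are not given in this section; the values below are the commonly published Alvarado (MANTRELS) weights, which sum to the stated total of 10, and they should be checked against Table 2.1. The dictionary keys and function names are hypothetical, and the 4 to 6 band for adjunctive CT is taken from the text above.

```python
# Commonly published Alvarado weights (assumed here, not stated in this section).
ALVARADO_WEIGHTS = {
    "migration_of_pain": 1,      # pain migrating to the right lower quadrant
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "rlq_tenderness": 2,         # right lower quadrant tenderness to palpation
    "rebound_tenderness": 1,
    "elevated_temperature": 1,   # increased temperature (37.3 C or 99.1 F)
    "leukocytosis": 2,           # WBC count > 10,000 cells/mm3
    "left_shift": 1,             # neutrophilia with left shift
}


def alvarado_score(findings):
    """Sum the weights of the items marked True in `findings`."""
    unknown = set(findings) - set(ALVARADO_WEIGHTS)
    if unknown:
        raise KeyError(f"unknown Alvarado items: {sorted(unknown)}")
    return sum(
        weight
        for item, weight in ALVARADO_WEIGHTS.items()
        if findings.get(item, False)
    )


def adjunctive_ct_suggested(score):
    """True for the 4 to 6 band in which the text recommends adjunctive CT."""
    return 4 <= score <= 6


# Hypothetical patient, for illustration only.
patient = {"migration_of_pain": True, "rlq_tenderness": True, "leukocytosis": True}
score = alvarado_score(patient)
print(score, adjunctive_ct_suggested(score))
```

Keeping the weights in a single table makes it easy to compare them with another score's item list; the sketch is not a clinical decision tool.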
Over 20,000 studies have been published, but few randomized controlled
trials, especially of imaging, have been undertaken, and the evidence remains controversial(50).
In addition, the use of CT imaging varied widely: it was as low as 12% in the
UK, 25% in Australia, and as high as 95% in the USA(27). The best approach to management is
to consider three possibilities: hospital discharge, admission for observation, and
surgical treatment(19). Estimating the pre-imaging likelihood of appendicitis is important in
tailoring management. Low-risk patients could be discharged with appropriate safety
netting, whereas high-risk patients are likely to require early focus on timely surgical
intervention rather than diagnostic imaging. Using scoring systems to guide imaging
can be helpful(27, 31).
Appendicitis is one of the most common causes of surgical abdominal
emergency. Early diagnosis is a primary goal to prevent morbidity and mortality from
appendicitis. Failure of early diagnosis can lead to complications of disease such as
perforation and sepsis, increasing morbidity and occasionally mortality. The negative
appendectomy (NA) rate has been reported to be between 15% and 30%, which is considered the surgical security zone.
Conversely, unnecessary appendectomy imposes a burden of time and cost.
Negative appendectomy is associated with incorrect diagnosis and the surgeon’s
experience. Patients who are overweight, female, or elderly have a higher chance of
misdiagnosis of appendicitis. The severity and burden of NA
range from economic loss of time and money to death from complications of surgery
or anesthesia, such as pulmonary embolism in high-risk patients. The mortality rate of
negative appendectomy has been reported as 0.14% to 1%(32).
The total cost of negative appendectomy in a pilot study at the Faculty of
Medicine Ramathibodi Hospital ranged from 10,000 to 20,000 baht, with a mean
hospital stay of 3 days (range: 2-15) and a mean absence from work of 5 days (range: 5-10).
Lowering the negative appendectomy rate would yield considerable savings in direct
costs and reduce disability to patients. Improvement in diagnostic accuracy has been reported
to lower the perforation rate and to coincide with a decrease in negative laparotomy(19).
1.4 Rationale
Disparities in access to surgical diagnosis and management can result in
major discrepancies in the outcomes of patients. Omission of surgical care is a serious
oversight while omission of proper diagnosis before surgery may be more
harmful(33). Acute appendicitis in rural areas has a very different disease profile and
outcome when compared with that seen in well health-serviced cities. There is a causal
relationship between delay in management and poor outcome, which calls for urgent
strategies to reduce these delays. One of the suggested strategies aimed at facilitating
the diagnosis of acute appendicitis is the introduction of clinical decision rules (CDR)
to assist with clinical decision making(34, 35). The Alvarado score, originally designed
more than two decades ago, is the most widely used CDR, although its
performance and appropriateness for routine clinical use are still unclear. A systematic
review showed that the Alvarado score at a cut point of 5 performs well as a “rule
out” CDR in all patient groups with suspected appendicitis(36). In terms of “ruling in”
appendicitis, however, pooled diagnostic accuracy at a cut point of 7 is not sufficiently
specific in any patient group to proceed directly to surgery. Some loss of diagnostic
information may have occurred through dichotomisation when the score was originally
constructed in the derivation study. Its construction was based on a review of patients
who had been operated on with suspicion of appendicitis, whereas the score is used in
all patients with suspicion of appendicitis(10). Applying the Alvarado score to the general
population may therefore be problematic because of how the score was derived (see detail in chapter
II). Other CDRs for appendicitis have been developed, such as the Lindberg(37), Eskelinen(38), and Fenyö(39)
scores, which assign different numerical values to symptoms. The Van
Way, Teicher, and Arnbjörnsson scores include gender as one of their
components(40). Some authors(41) reported that the Alvarado score outperformed
each of these other scores.
The clinical prediction score may reduce the negative appendectomy rate
as well as decrease complications of appendicitis, including rupture, perforation, and
appendiceal abscess. This would reduce the risk of unnecessary operation, the risk of
anesthesia, the cost of hospitalization, and unnecessary loss of work and time. Gregory et
al showed the cost effectiveness of integrating a CDR in the diagnostic protocol for
appendicitis(42). A CDR followed by staged imaging was found to be the most cost
effective approach. The implementation of the Alvarado and Lintula scores for the
decision of hospital admission and appendectomy has been shown to reduce overall
treatment charges for acute right lower quadrant abdominal pain(43), and the total
charge for 114 patients was reduced from $39,655 to $34,087 and $25,772 using the
Alvarado and Lintula scores, respectively.
A clinical scoring system estimates the probability of appendicitis
occurrence and should aid in the decision-making process for management. There are
a number of reasons to use scoring systems in managing cases of appendicitis. A
clinical score may be suitable as an instrument for selecting patients for immediate
surgery, further evaluation with imaging techniques, or observation as out/inpatients.
The score can be repeated during active observation and influence the decision to
operate. It must be emphasized that the intent of the scoring system is not to establish
a primary diagnosis of appendicitis, but simply to discriminate objectively when there
is uncertainty. Routine use of an Alvarado-like scoring system was evaluated in a large
German study comparing patients who were and were not assessed with the scoring
system(10). No difference in the rates of perforated appendix, negative
appendectomy, or complications was found between groups. However, the study showed
a significantly lower delayed appendectomy rate and a lower delayed discharge rate in
the group that routinely used the scoring system.
Several scoring systems have been developed for diagnosis of appendicitis
with interesting results; nevertheless, these systems have not been routinely applied in
general practice. We have systematically reviewed how those scores were developed
and validated, and how well they performed. The review suggested that the
research methods used for appendicitis scoring systems were inconsistent. Although
several diagnostic scoring systems are available, applying them to the general
population might be questionable because of improper methods used for creating the scores.
More appropriate scores with internal and external validation are still
required(44).
The goal of this study was to create a good CDR for diagnosis of
appendicitis which has characteristics as follows:
• Consistently applicable to all adult patients
• Criteria explicit and credible
• Reproducible
• Sufficient and comprehensive
• User friendly, good compliance
• Generalizable
• Cost-effective
In this thesis, the new score is expected to be developed and
validated using proper research methods, with good performance in internal and
external settings. It should aid clinical decisions, influence
clinical practice, and improve outcomes in the diagnosis and
management of patients with appendicitis.
1.5 Research Questions
- What are significant predictors for diagnosis of appendicitis in patients
who are suspected of appendicitis?
- What is the performance of RAMA-AS in patients who are suspected of
appendicitis?
- Does the RAMA-AS perform better than the previously developed
scores?
- Does the RAMA-AS perform well in internal validation and external
validation?
1.6 Research Objectives
1.6.1 Primary Objectives
1.6.1.1 To develop a RAMA-AS for diagnosis of appendicitis
in patients who are suspected of appendicitis.
1.6.1.2 To externally validate RAMA-AS using data from
settings different from those used for score development.
1.6.2 Secondary objectives
To compare the performance of RAMA-AS with that of the most popular
scoring system, i.e., the Alvarado score, and other previously developed scoring systems.
Figure 1.1 Anatomy of appendix
Figure 1.2 Normal and abnormal appendix: A) Normal appendix at McBurney’s
point, B) Inflamed appendicitis, C) Gangrene/ ruptured appendicitis
CHAPTER II
LITERATURE REVIEW
This chapter mainly reviews previous scoring systems in terms of
how they were developed, what predictors were included, how the scores are calculated, how
well they performed, and whether they had been internally and externally
validated. A recent publication in World Journal of Emergency 2016(18) addressed
“how to improve the clinical diagnosis of acute appendicitis in resource
limited settings”, stating that “diagnosis of acute appendicitis can be improved if
the clinician uses a careful history and physical examination, and simple laboratory
tests. However, under certain circumstance, additional tests could be needed”. This
approach has given good results in various studies, showing that clinical prediction
rules that combine related information are a simple, practical, economical, and
reliable method for the diagnosis of acute appendicitis(18). This chapter consists of
the following information:
2.1 History of previous scores’ developments
The Alvarado score was developed and reported in 1986 by Alfredo
Alvarado(45). The score was developed from retrospective data of 227 patients in
Philadelphia, Pennsylvania, USA. A 2x2 table was constructed for each diagnostic
predictor, including migration of pain, anorexia-acetone in urine, nausea-vomiting,
tenderness, rebound pain, elevation of temperature, leucocytosis, shift to the left of
WBC, and rectal tenderness. The chi-square statistic was applied along with estimation of
probabilities, sensitivity, specificity, and predictive values. A diagnostic weight, which
considered only the true positive and true negative results, was assigned to each clinical
and laboratory finding. A value of 2 was assigned to the important elements
(tenderness at the right lower quadrant of the abdomen and leucocytosis) and 1 to the remaining
elements (abdominal pain that migrated to the right lower abdomen, anorexia, nausea or
vomiting, rebound tenderness, elevated body temperature, or neutrophilia) (Table 2.1).
The total score was 10: a score of 5 or 6 was compatible with a diagnosis of appendicitis
(patients can be observed), a score of 7 to 8 indicated probable appendicitis, and a score of
9 to 10 indicated very probable appendicitis. The modified Alvarado score was reported
by Kalan, et al, in 1994(46), using extra sign(s) from physical examination, including the
cough test, Rovsing’s sign, and rectal tenderness, instead of the laboratory value of left shift.
Another modification used a total score of 9 after removing the laboratory value of
left shift from the original score. Khan, et al(47) reported low sensitivity (59%) and
specificity (23%) of the Alvarado scoring system, with a negative appendectomy rate of
15.6%, when applied to an Asian population. Al-Hashemy, et al(48) reported similarly low
sensitivity (53%), with a specificity of 80%, when the modified Alvarado score was applied to
a Middle Eastern population. In my opinion, the Alvarado score lacks some parameters that
have an important impact on the diagnosis of appendicitis, so there is room for
improvement to generate a better scoring system for the Thai population.
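The weighting and interpretation bands described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the dictionary keys and function names are hypothetical, the clinical thresholds for each finding (for example, the temperature cut-off) are assumed to have been applied beforehand, and the text gives no interpretation for scores below 5, so none is assumed here.

```python
# Illustrative sketch of the Alvarado weighting described in the text.
# Key names are hypothetical; weights (2 for RLQ tenderness and
# leucocytosis, 1 for the others) follow the paragraph above.
ALVARADO_WEIGHTS = {
    "migratory_rlq_pain": 1,
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "rlq_tenderness": 2,
    "rebound_tenderness": 1,
    "elevated_temperature": 1,
    "leucocytosis": 2,
    "neutrophilia_left_shift": 1,
}


def alvarado_score(findings):
    """Sum the weights of the findings marked True (total range 0 to 10)."""
    return sum(
        weight
        for name, weight in ALVARADO_WEIGHTS.items()
        if findings.get(name, False)
    )


def alvarado_band(score):
    """Map a total score to the interpretation bands quoted in the text."""
    if score >= 9:
        return "very probable appendicitis"
    if score >= 7:
        return "probable appendicitis"
    if score >= 5:
        return "compatible with appendicitis (observe)"
    return "below the bands reported in the text"


if __name__ == "__main__":
    patient = {"rlq_tenderness": True, "leucocytosis": True, "anorexia": True}
    total = alvarado_score(patient)
    print(total, alvarado_band(total))
```

Keeping the weights in one table makes the modifications mentioned above (for example, dropping the left-shift item for a total of 9) a one-line change.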
The Raja Isteri Pengiran Anak Saleha (RIPASA) score was developed and
reported in 2010 by Chong C, et al(49). This score was developed using the retrospectively
collected database of RIPAS Hospital, Brunei Darussalam, between October 2006 and
May 2008. A total of 312 patients who had presented with right iliac fossa pain suspected
to be appendicitis and who underwent emergency appendectomy as the primary procedure
were included in this study. The mean age of patients was 26±13.5 years, with a male to
female ratio of 1.4:1. The negative appendectomy rate was 16.3%. Final diagnosis of
appendicitis was obtained from the resected appendix. The panel of surgeons at RIPAS
hospital agreed to use 15 parameters for score development, i.e., age, gender, right iliac
fossa (RIF) pain, nausea and vomiting, anorexia, duration of symptoms, RIF tenderness,
guarding, rebound tenderness, Rovsing’s sign, fever, elevated WBC count, negative
urinalysis, and foreign national registration identity card (NRIC). The probability of
appendicitis was estimated by logistic regression analysis, and used to generate scores
as shown in Table 2.2. The optimal cut-off threshold score generated from the ROC
analysis was 7.5. The sensitivity, specificity, positive predictive value, negative
predictive value, and accuracy were 88.46% (95% confidence interval (CI) 83.94-
92.08), 66.67% (95%CI 52.08-79.24), 93.00%, 53.00%, and 80.50% (95%CI 73.35-
87.65), respectively. The predicted negative appendectomy rate at cut off score of 7.5
was 6.9%, reduced from 16.3% (9.3% reduction, p=0.0007). The RIPASA
score had good discrimination, with an area under the Receiver Operating Characteristic (ROC)
curve of 0.89. In my opinion, the RIPASA score had too many parameters and was generated from
retrospective data with substantial missing data (84% Rovsing’s sign, 36% rebound
tenderness, 54% anorexia, 18% migration of pain, 13% negative urinalysis, 7% right
lower quadrant guarding). It should be possible to develop a new appendicitis score that is easier
to use and suitable for the Thai population.
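The accuracy measures quoted for each score (sensitivity, specificity, positive and negative predictive values, accuracy) all derive from a 2x2 table of score result against histological diagnosis. The sketch below shows that arithmetic. The counts are hypothetical and are not taken from any study cited here, and the negative appendectomy rate is computed under the stated assumption that every score-positive patient is operated on.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table.

    tp: score positive, appendicitis confirmed
    fp: score positive, no appendicitis
    fn: score negative, appendicitis confirmed
    tn: score negative, no appendicitis
    """
    total = tp + fp + fn + tn
    ppv = tp / (tp + fp)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        # Assumption: all score-positive patients undergo appendectomy,
        # so the negative appendectomy rate equals 1 - PPV.
        "negative_appendectomy_rate": 1.0 - ppv,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in diagnostic_metrics(tp=80, fp=10, fn=20, tn=90).items():
        print(f"{name}: {value:.3f}")
```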
The Appendicitis Inflammatory Response Score (AIRS) was reported by
Andersson and Andersson in 2008(50). This score was generated in Sweden from
prospectively collected data of 545 patients admitted with suspected appendicitis
between October 1992 and December 1993. The score was developed from 316 randomly
selected patients. The simplified score was constructed based on ordered logistic
regression. Eight variables with independent diagnostic value, including right lower
quadrant (RLQ) pain, rebound tenderness or muscle defense, white blood cell count,
proportion of neutrophils, C-reactive protein (CRP), body temperature ≥ 38.5 degrees
Celsius, and vomiting, remained in the final model, with the score ranging from 0 to 12 (Table
2.3). A score of 0-4 was classified as low probability, and outpatient follow-up can be
arranged if the general condition is unaltered. A score of 5-8 was indeterminate risk, and
in-hospital observation, re-evaluation, and further investigation were recommended. A
score of 9-12 was high probability, and surgical exploration was proposed. The score was
internally validated using 229 patients. The discrimination capacity of the score was
better than that of the Alvarado score in all appendicitis and advanced appendicitis samples (Table
2.3). The ROC area of the new score was 0.97 for advanced appendicitis and 0.93 for
all appendicitis compared with 0.92 (p = 0.0027) and 0.88 (p = 0.0007), respectively,
for the Alvarado score. The sensitivity, specificity, positive predictive value, and
negative predictive value of the score at a cutoff point of more than 4 were 0.96, 0.73,
0.64, and 0.97, respectively. The score has not been externally validated in an Asian
population; it needs external validation in a Thai or Asian population and further
evaluation in a prospective interventional study.
The Fenyö-Lindberg scoring system (Table 2.4) was reported in 1987 by
Fenyö from Sweden(39). The score was developed from prospective data collection of
259 patients who were suspected of having appendicitis. The score was developed
separately for men and women. The sensitivity and specificity were analysed according
to the presence and absence of 19 parameters. The weight of evidence, equal to 10 × log_e
[sensitivity/(1 − specificity)], was expressed as a positive/negative score. The score was
externally revalidated twice, as reported by Fenyö, et al in 1997(37). The first validation
encompassed 19 indicators from 830 consecutive patients. The second validation was
based on 10 parameters including sex, white cell count, duration of pain, progression of
pain, relocation of pain, vomiting, aggravation by coughing, rebound tenderness,
rigidity, and tenderness outside right lower quadrant in 1167 patients with suspected
appendicitis. A score of -2 or more corresponded to a probability of appendicitis ≥0.45 and was used
as an indication that the patient had appendicitis, supporting a decision to perform
appendectomy. A score of -17 or less corresponded to a probability of appendicitis ≤ 0.16 and was
considered non-specific abdominal pain, guiding non-operative management
by observation or discharge. A score between -3 and -16 corresponded to a probability of appendicitis
between 0.17 and 0.44 and was interpreted as indeterminate, guiding in-hospital
observation with repeated examination. The sensitivity, specificity, PPV, NPV, and
accuracy of this scoring system at the cut-off level of -2 or more were 0.73, 0.87, 0.75,
0.87, and 0.83, respectively. The negative appendectomy rate after using the score was
17.5%. However, the Fenyö-Lindberg scoring system had some limitations, such as the
complexity of the score (each parameter had both negative and positive scores) and a high
negative appendectomy rate.
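The weight-of-evidence formula and the three decision zones described above can be sketched as follows. The function names are hypothetical, and the sensitivity and specificity values in the usage example are illustrative rather than taken from the Fenyö-Lindberg data. The text reports zones for whole-number totals only; treating any total strictly between -17 and -2 as indeterminate is an assumption of this sketch.

```python
import math


def weight_of_evidence(sensitivity, specificity):
    """10 * ln(sensitivity / (1 - specificity)), as quoted in the text.

    Positive when a finding is more frequent in appendicitis than in
    non-appendicitis, negative when it is less frequent.
    """
    return 10.0 * math.log(sensitivity / (1.0 - specificity))


def fenyo_lindberg_zone(total_score):
    """Three-zone interpretation of the total score reported in the text."""
    if total_score >= -2:
        return "appendicitis likely: supports appendectomy"
    if total_score <= -17:
        return "non-specific abdominal pain: observe or discharge"
    return "indeterminate: in-hospital observation with repeated examination"


if __name__ == "__main__":
    # Illustrative values only: a finding present in 80% of cases
    # and in 40% of non-cases (specificity 0.60).
    print(round(weight_of_evidence(0.80, 0.60), 1))
    print(fenyo_lindberg_zone(-10))
```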
The Ohmann scoring system was reported in 1995(51), considering 8 parameters
including tenderness in RLQ, rebound tenderness, dysuria, constant pain, WBC count,
patient age > 50 years, shifting pain, and local guarding (Table 2.5). The original
publication was in German. A score < 6.5 should exclude appendicitis,
whereas a score above 12 was highly suggestive of appendicitis. A score
between 6.5 and 12 suggested that the finding was unclear and that patients needed observation.
Tepel, et al(52) performed prospective evaluation of the score and found that the
sensitivity, specificity, PPV, NPV, and accuracy were 61%, 85%, 61%, 85%, and 78%,
respectively.
The Eskelinen scoring system was reported in 1992(38), including 6 variables,
i.e., tenderness, rigidity, leucocyte count, rebound tenderness, pain at presentation, and
duration of pain. A logistic stepwise multivariate regression analysis was used to
develop the diagnostic score; 3 tests were evaluated to find the best combination of
independent predictors of acute appendicitis for males and females. Each parameter had
criterion points that need to be multiplied by the respective factors and summed to give a
final score. The cut-off point for diagnosis of appendicitis was 55 (Table 2.6). Sitter et
al(53) externally validated this score using prospective data from 2,359 consecutive
patients in Germany and found sensitivity, specificity, PPV, NPV, and accuracy of
79%, 85%, 68%, 91%, and 84%, respectively. They re-calibrated the score’s cut-off
value to 57, which yielded better results and decreased the rate of negative appendectomy
from 26.6% to 15.4%.
A simple scoring system was reported by Christian F and Christian GP in
1992(54). The score was developed by a non-statistical method. There were 5 parameters,
including abdominal pain, vomiting, RLQ tenderness, low grade fever (body
temperature ≥ 37.8°C by oral route), and polymorphonuclear (PMN) leukocytosis
(Table 2.7). A simple rule was applied: if a patient met four or more of the 5
criteria, appendectomy was performed. If the patient met 3 criteria on admission,
active inpatient observation was necessary until either a 4th criterion developed and
appendectomy was carried out, or the patient recovered with no progression beyond
the third criterion. The study was done in 58 patients, who were compared with a
control group of 59 patients from another surgical unit. The negative appendectomy rate was
significantly lower in the group of patients managed with the scoring system (6.5%, 3/46) than
in the control group (17%, 10/59). This score has yet to be externally
validated.
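Because this rule is a simple count of criteria that can be re-applied during observation, it can be expressed very compactly. The sketch below assumes each criterion has already been judged present or absent at the bedside; the key names are hypothetical, and the text does not state what was done with fewer than three criteria, so that case is returned as unspecified.

```python
# Illustrative sketch of the "four or more of five criteria" rule.
# Key names are hypothetical labels for the five criteria in the text.
CRITERIA = (
    "abdominal_pain",
    "vomiting",
    "rlq_tenderness",
    "low_grade_fever",
    "pmn_leukocytosis",
)


def christian_decision(findings):
    """Return (number of criteria met, suggested action)."""
    met = sum(1 for name in CRITERIA if findings.get(name, False))
    if met >= 4:
        return met, "appendectomy"
    if met == 3:
        return met, "active inpatient observation"
    return met, "not specified in the text"


if __name__ == "__main__":
    on_admission = {"abdominal_pain": True, "vomiting": True, "rlq_tenderness": True}
    print(christian_decision(on_admission))
    # Re-assessment during observation: a 4th criterion develops.
    on_review = dict(on_admission, pmn_leukocytosis=True)
    print(christian_decision(on_review))
```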
A practical score of Ramirez and Deus was reported in 1994(55). The score
was developed using univariate analysis. Positive and negative weights were given to
each significant predictive parameter using Bayesian probability. There were 7
parameters, including sex, initial pain (epigastric or other locations), diarrhea, white cell
count, differential white count, guarding in RLQ, and rebound tenderness (Table 2.8).
The Bayesian methodology was used to generate the scoring system. No appendicitis was
found with a score less than -15. The mean score in patients with proven appendicitis was 18
(-15 to 37). In prospective evaluation, the sensitivity and specificity of this scoring
system were 80% and 81%, respectively. This score proposed a dynamic system in which a
patient’s score can increase or decrease on reassessment. This system confirmed the
effectiveness of a scoring system that is generated from a local database, remains open, and
can incorporate new parameters to produce a better scoring system.
Neither internal nor external validation has been reported for this system.
A scoring system developed by Teicher, et al was published in 1983(56).
The score was developed using univariate analysis: the rate of occurrence of each
predictive parameter was determined, and the ratio was assigned a positive value when the
rate was greater in the appendicitis group and a negative value when it was greater in the
non-appendicitis group. There were 7 parameters, including sex, age, duration of symptoms, genitourinary
tract symptoms, muscle spasm at RLQ, rectal mass at right side, and WBC count (Table
2.9). The total score ranged from -11 to 11, and a cut-off point of -3 was recommended.
Single parameters have also been reported as predictors of appendicitis; for example,
hyperbilirubinemia was associated with perforated appendicitis(57). Imaging
technologies such as ultrasonography, computed tomography, and magnetic resonance
imaging have been used with clinical data or scoring systems to improve the accuracy of
diagnosis of appendicitis. I have worked with the Redmond group and published a new
perspective on appendicitis: calculation of the half time (T1/2) for perforation(58). Random
forest (RF), support vector machines (SVM), and artificial neural networks (ANN) have been
used to improve the accuracy of diagnosis of appendicitis(59). Hsieh, et al found that
RF was significantly more accurate than ANN, logistic regression, and the Alvarado score in the
diagnosis of appendicitis(21). SVM performed better than logistic regression and
the Alvarado score. No significant difference was found among ANN, logistic regression, and
the Alvarado score (Table 2.10).
2.2 Systematic review of scoring systems for diagnosis of
appendicitis(60)
Appendicitis is one of the most important causes of acute
abdominal pain, with an incidence of 110/100,000(52). Although many attempts have
been made to improve diagnostic accuracy, false positive and false negative results
remain common, with rates of negative appendectomy of 15% to 26%(61, 62) and
of perforated appendicitis of 10% to 30%(63). Several scoring systems, including
computer-based models and algorithms, have been developed with good
performance at the initial evaluation but only fair performance when applied to general populations.
These scoring systems have therefore been only occasionally applied in general routine
practice, because of a lack of accuracy in validation studies(34). A
negative appendectomy (i.e., a false positive) is less life threatening than a false
negative, which can be as serious as death from appendiceal perforation and
peritonitis. As a result, an aggressive surgical approach
was frequently applied when the situation was in doubt, which resulted in removal of
normal appendices. To reduce such aggressive management, diagnostic tests for
appendicitis need better performance in discriminating patients who
require prompt surgical intervention from patients who need only observation
without a risk of complications of appendicitis.
Imaging modalities have been used to improve diagnostic accuracy.
However, they have some disadvantages, including cost, limited accessibility, particularly in
developing countries, lack of radiologists, examiner-dependent efficacy (e.g.,
ultrasound), potentially harmful ionizing radiation (e.g., computerized tomography, CT), and low
performance where disease prevalence is low or high. Clinical scoring systems that
synthesize clinical information have been developed and should be useful for
countries where imaging is less accessible. The scores are derived by incorporating
physical examination, clinical signs, and symptoms in a mathematical equation.
Currently, there are a number of diagnostic scores constructed by many groups using
various statistical methods(21, 30, 37, 39, 41, 45, 46, 49-55, 64-85). Some scores have
been validated either internally(50, 84) or externally(39, 45, 46, 50, 51, 54, 64, 68, 76,
82-84, 86), whereas some scores have been applied without validation(55, 69).
The performance of those scores varied from fair to good in validation phases, but some
scores were still questionable. We therefore conducted a systematic review that aimed
to explore score performance in both development and validation phases. Strengths
and limitations of previous diagnostic scores were critically appraised. Lessons from
this review will help to identify the most valid model/s or lead to the creation of a new model
if required. The model can later be applied in general settings in developing countries
where resources are limited.
Methods
Search strategy
We searched Medline from 1949 and EMBASE from 1974 to March 2012
to identify relevant articles published in English. The search terms were as follows:
appendicitis, gangrenous appendicitis, phlegmon, perforated appendicitis, abdominal
pain, score, scoring system, prediction score, prediction model, diagnostic score,
assessment tool, ultrasonogram, ultrasonography, computer tomography, accuracy,
negative appendectomy, sensitivity, specificity, likelihood ratio, false positive, false
negative, true positive, true negative, ROC, AUC. The search strategies are described in
the appendix.
Study selection
Studies were reviewed based on titles and abstracts. If a decision could not
be made, full articles were retrieved. Observational studies (cohort, case-control, or
cross-sectional) published in English were selected if they met the following
criteria: included adults with suspected appendicitis, considered more than one risk factor in the
prediction score, had the outcome as appendicitis versus non-appendicitis, applied any
method (e.g., logistic regression, Bayesian method, or non-mathematical investigator
opinion) to build the prediction model, and reported each model’s performance
(i.e., calibration and discrimination parameters).
Data extraction
The general characteristics of studies (i.e., author, journal, publication year,
type of participants, ethnicity, study design, number of subjects, rate of negative
appendectomy, percent of complicated appendicitis, and specific objective/s (i.e.,
develop or validate score, or both)) were extracted. If the diagnostic model was newly
developed, specific information about model building (i.e., type of statistical model,
predictive factors, creating scores using coefficients or exponential of coefficients) was
extracted. Calibration (a ratio of expected versus observed value (E/O ratio)), and
discrimination parameters (i.e., the concordance statistic (C-statistic)) along with 95%
confidence interval (CI) were also extracted. These parameters were calculated if the
study did not report them directly but provided summary data that allowed their
calculation. For studies that aimed to validate a model, the type of validation
(internal, external, or both) and results were also recorded. If authors had modified the
previous prediction models, the following aspects were recorded: whether any of the
original included variables were removed or modified; and whether new predictive
factors were added.
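Where a study reports individual scores or predicted probabilities, the two extracted performance measures can be computed directly. The following is a minimal sketch under stated assumptions: the function names and the toy data are hypothetical, the C-statistic is computed by simple pairwise comparison (ties counted as one half), and calibration is expressed as observed over expected events. The E/O ratio mentioned in this section is simply the reciprocal.

```python
def c_statistic(scores, outcomes):
    """Concordance statistic: the proportion of (case, non-case) pairs in
    which the case has the higher score; tied pairs count as 0.5."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0
            elif case_score == other_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def observed_expected_ratio(predicted_probabilities, outcomes):
    """Observed number of events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(predicted_probabilities)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = appendicitis, 0 = no appendicitis.
    outcomes = [1, 1, 1, 0, 0, 0]
    scores = [8, 6, 7, 3, 5, 6]
    risks = [0.9, 0.6, 0.8, 0.1, 0.3, 0.3]
    print(round(c_statistic(scores, outcomes), 3))
    print(round(observed_expected_ratio(risks, outcomes), 3))
```

A C-statistic of 0.5 indicates no discrimination and 1.0 perfect discrimination; an O/E ratio near 1 indicates that predicted and observed event counts agree.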
Risk of bias assessment
The risk of bias assessment tool was developed based on a user’s guide for
clinical prediction rule(62), which considered both derivation and validation phases.
Four domains were considered for the derivation phase, i.e., selection bias
(representative of spectrum), information bias (ascertainment of outcome
measurements, blinding outcome assessment, number of predictors, assessment of
predictors without knowledge of outcome, proportion of important predictors),
confounding bias (used multivariate regression analysis, created score properly), and
other issues (sample size, clinically sensible). For the validation phase, only 3 domains
were considered, i.e., selection bias (representative of spectrum), information bias
(ascertainment of outcome measurement, blinded assessment of outcome, accurate
interpretation), and other issues (i.e., follow up). Each item was classified as yes (low
risk of bias), no (high risk of bias), or unclear if there was insufficient information to
judge. Two reviewers (CW and TA) independently extracted data and assessed risk
of bias for all included studies. Any disagreement was resolved by discussion with a third party
(AT).
Statistical analysis
Model performance was described separately for the derivation and validation
phases. Calibration (O/E ratio) and discrimination (C-statistic) coefficients along with
their 95% CIs were estimated for each study. A meta-analysis was applied to pool O/E
and C-statistic using the equations described in the appendix. Heterogeneity was
assessed using the Q statistic, and the degree of heterogeneity (I²) was estimated. If heterogeneity was
present (p value <0.10 or I² > 25%), a random-effect model was used to pool data;
otherwise a fixed-effect model was applied. All analyses were performed using Stata
version 12.0.
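The pooling rule described above can be illustrated with a short Python sketch. This is not the Stata code used in the review, and the pooling equations in the appendix are not reproduced here. The sketch assumes inverse-variance weighting for the fixed-effect model and the DerSimonian-Laird estimator for the random-effect model, both of which are common choices but are assumptions in this context. Function names and input values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1
    (closed-form series, so that no external library is needed)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = sum(
        half ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def pool_estimates(estimates, std_errors, p_cut=0.10, i2_cut=25.0):
    """Pool study estimates; switch to a random-effect model when the
    Q test gives p < p_cut or I-squared exceeds i2_cut (in percent)."""
    if len(estimates) < 2 or len(estimates) != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    p_value = chi2_sf(q, df)
    pooled, model = fixed, "fixed"
    if p_value < p_cut or i2 > i2_cut:
        # Assumed estimator: DerSimonian-Laird between-study variance.
        c = sum_w - sum(w ** 2 for w in weights) / sum_w
        tau2 = max(0.0, (q - df) / c)
        weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
        sum_w = sum(weights)
        pooled = sum(w * y for w, y in zip(weights, estimates)) / sum_w
        model = "random"
    return {
        "model": model,
        "pooled": pooled,
        "se": math.sqrt(1.0 / sum_w),
        "q": q,
        "p_value": p_value,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical C-statistics and standard errors from three studies.
    print(pool_estimates([0.70, 0.90, 0.80], [0.02, 0.02, 0.02]))
```

In practice, C-statistics and O/E ratios are often transformed (for example, to the logit or log scale) before pooling; the sketch leaves the choice of scale to the caller.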
Results
Description of studies
We identified 440 studies, of which 37 met our inclusion criteria and
thus were eligible for the review (see Figure 2.1). Among the 37, 10 studies(38, 39, 45, 46,
54, 56, 69, 76, 77) aimed only to derive prediction scores or modify previous
prediction models (hereafter called derived studies), 4 studies(50, 55, 83, 84)
derived and internally and externally validated scores in the same study, whereas 23 studies
aimed only at internal(51, 87) or external(10, 21, 30, 37, 41, 49, 52, 53, 66, 67, 74,
75, 78, 80, 88-93) validation.
Among the 14 derived studies(38, 39, 45, 46, 50, 54-56, 69, 76, 77, 83, 84, 94),
all focused on adult patients, and most included patients with suspected
appendicitis who either underwent operation or were observed, whereas 3
studies(55, 56, 84) included only patients who underwent operation. Ten models(38, 39,
45, 50, 54-56, 69, 76, 83, 84, 94) were developed in Caucasian populations, while three
models(46, 54, 77) were developed in Asian populations. The models were mostly constructed
from cohorts, either retrospective(45, 55, 84) or prospective(38, 39, 46, 50,
69, 76, 77, 83, 94).
Among the 23 studies that aimed only at validation, 20 validated
models in patients with suspected appendicitis, whereas 3 focused on
operated patients. Most study designs were prospective cohorts. Fifteen studies were
done in Caucasian populations, while 8 were done in Asian populations.
Risk of bias assessment
Risk of bias assessment was performed (Table 2.11). The methodological
assessment of derivation studies was based on the following questions(95):
were important predictors included and present in a significant proportion, were the
outcome events and predictors clearly defined, were those assessing the outcome event
blinded, was the sample size adequate, and did the clinical rule make clinical sense?
Among the 14 derivative studies, 8 (57.1%) recruited
consecutive patients with a chief complaint of abdominal pain, or randomly sampled
patients from a well-defined population frame of abdominal pain, whereas the remaining
studies recruited a specific group of patients who had at least a few clinical signs
and symptoms. Most studies (92.9%) confirmed the diagnosis of appendicitis by
histology but did not mention whether the histological assessment was blinded to clinical
information. The number of predictors used in the prediction models was judged
appropriate (i.e., low risk of bias) if the authors considered and used predictors from all
categories, namely demographic, clinical sign, symptom, laboratory, and imaging data;
otherwise this item was graded as high risk of bias. Ten of the 14 (71.4%) studies
clearly listed all categories of predictors, whereas the remaining studies considered only a
few categories. Only 5/14 (35.7%) studies stated clearly that predictors were measured or
collected by assessors blinded to the final diagnosis of appendicitis and to the
laboratory and imaging findings, whereas 57.1% of studies used
predictors that were not blinded or were assessed with knowledge of the possible diagnosis
of appendicitis.
Eleven of the 14 studies (78.6%) performed statistical estimation
or tests for all predictors, whereas 3/14 (21.4%) did not apply any statistical
method. However, only 5/14 (35.7%) applied multivariate regression by
simultaneously including significant predictors in the model and used coefficients or
relative risks from the regression model to create scores; the remaining
studies created prediction scores based on univariate results or non-statistical methods.
Twelve (85.7%) studies had sufficient numbers of subjects, in terms of either
appendicitis patients or total patients, based on a rule of thumb (10 appendicitis
cases, or 20-30 total subjects, per predictor). In 71.4% of studies the
predictors seemed clinically sensible, and the scores were easy to apply and
suggested a course of clinical action.
The methodological assessment of validation studies was based
on the following questions(95): were patients chosen in an unbiased fashion and
did they represent a wide spectrum of disease severity, was there a blinded assessment of
the criterion standard, was there an explicit and accurate interpretation of the predictor
variables and the actual rule without knowledge of the outcome, and was there 100%
follow-up?
For validation studies, 21/25 (84%) studies were at low risk of selection
bias. Ascertainment of the diagnosis of appendicitis was clearly defined in 24/25 (96%)
studies. None of the studies mentioned whether the diagnosis of appendicitis was masked
from clinical data. Thirteen of the 25 (52%) studies clearly described that
interpretation of the rule was not influenced by information on the final diagnosis of
appendicitis, while in 24% it was influenced by the diagnosis and 24% did not
mention it. Only 6 (24%) studies followed up all included patients.
Score development
Among the 14 derivative studies, 5 categories of predictive variables were
considered in the models: demographic data, clinical signs, clinical symptoms,
laboratory results, and imaging (Table 2.12). Of the 2 demographic variables, gender
was more commonly included in the models than age (42.9% vs 14.3%).
Ten symptom variables were considered, of which nausea (9/14, 64.3%) was the most
commonly included, followed by migration of pain, pain at presentation,
and duration of pain (46.2% each). Nine clinical signs were considered; the most
commonly used were rebound tenderness (76.9%), followed by right lower
quadrant (RLQ) tenderness (61.5%), and RLQ guarding and elevated
temperature (53.9% each). Among 10 clinical symptoms, nausea/vomiting (53.9%) followed
by migration and duration of pain (46.4%) were most commonly included in the
predictive models. Most studies (84.6%) considered at least one laboratory variable. Among
these, rising white blood cell count (76.9%) was the most commonly used, followed by
left shift of PMN cells (46.2%). Only a few studies used radiological data (e.g.,
ultrasonography and abdominal radiograph) in creating scoring systems.
These prediction scores were developed using statistical modeling in 5
studies(38, 50, 76, 83, 84) whereas 9 studies(39, 45, 46, 54-56, 69, 77, 94) did not apply
statistical modeling. Among 5 studies with statistical modelling, 4 studies(38, 50, 76,
83) applied multivariate logistic regression and 1 study(84) used discriminant analysis.
Scoring schemes of these models were created based on regression coefficients of the
logit or discriminant models. Among the 9 studies that did not apply statistical
models, univariate analysis (e.g., Chi-square test, relative risk) and estimated
diagnostic parameters (e.g., likelihood ratio, sensitivity, specificity) were used to
assess associations in 6 studies, whereas 3 studies did not apply any statistical
test.
Model performances
Model performance, in terms of C-statistics and O/E calibration
coefficients, was extracted from individual studies if reported; otherwise it was
estimated using summary data reported in the articles (Table 2.13). Among the 10 studies
in which the O/E calibration coefficient was available, the O/Es were very similar across
studies, with an overall pooled O/E of 1 (95% CI: 0.97, 1.03). In contrast, the
C-statistics varied from poor (0.54) to excellent (0.97)
discrimination, with a pooled C-statistic of 0.79 (95% CI: 0.67, 0.90). The C-statistics
varied widely (from 0.59 to 0.97) between the 2 studies(38, 83) that used
appropriate statistical methods to derive prediction scores.
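To make the two performance measures concrete, the following is a minimal sketch of how an O/E ratio and a C-statistic could be computed from individual patient data. It assumes binary outcomes (1 = appendicitis, 0 = no appendicitis) and a predicted probability per patient; the function names are hypothetical, and this is not the procedure used to re-estimate coefficients from the summary data of the reviewed articles.

```python
def observed_expected_ratio(outcomes, predicted):
    """Calibration: observed number of events divided by the expected number.

    The expected number is the sum of the predicted probabilities.
    A value close to 1 indicates good overall calibration.
    """
    return sum(outcomes) / sum(predicted)


def c_statistic(outcomes, predicted):
    """Discrimination: proportion of case/non-case pairs in which the case
    received the higher predicted probability (ties count as one half)."""
    cases = [p for y, p in zip(outcomes, predicted) if y == 1]
    non_cases = [p for y, p in zip(outcomes, predicted) if y == 0]
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))
```

With four hypothetical patients whose outcomes are 1, 1, 0, 0 and whose predicted probabilities are 0.9, 0.4, 0.5, 0.2, three of the four case/non-case pairs are ordered correctly, so the C-statistic is 0.75, while the O/E ratio is 2/2.0 = 1.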
Six of the 14 prediction models were internally validated,
but only 5 had data available. The C-statistics ranged from
0.61 to 0.92, with a pooled C-statistic of 0.84 (95% CI: 0.77, 0.92). Pooling within
subgroups according to the appropriateness of score derivation gave similar results, with
C-statistics of 0.81 (95% CI: 0.65, 0.97) and 0.88 (95% CI: 0.85, 0.91) for
appropriately and inappropriately derived predictive scores, respectively.
Twenty-three studies aimed at external
validation of 14 prediction models. The Alvarado score(45) was the most frequently
validated, in 14 studies(10, 21, 30, 41, 49, 50, 66, 75, 78, 80, 83, 88, 90, 96), followed
by the Fenyo model in 3 studies(37, 67, 83). The study by Tzanakis et al(83) externally
validated 8 previous models and was thus a major contributor of data to the pooled
estimates. Most studies created diagnostic scores using the predictive factors of the
original scores. The validation data came from 15 studies in Caucasian(10, 30, 37, 51-53,
66, 67, 74, 78, 88-90, 93, 94) and 8 studies in Asian(21, 41, 49, 75, 80, 91, 92, 96)
populations.
Fourteen studies externally validated the Alvarado score. All eight
variables (i.e., migration of pain, anorexia, nausea/vomiting, elevated temperature,
rebound tenderness, RLQ tenderness, increased WBC, and PMN left shift) were
included in the externally validated models, with a pooled O/E and pooled C-statistic of
0.99 (95% CI: 0.91, 1.09) and 0.74 (95% CI: 0.69, 0.79), respectively. The Alvarado
score was also modified by two subsequent studies, which excluded the left shift of
PMN because this measurement was unavailable in a routine laboratory(46, 77), or replaced
it with a few other variables (i.e., cough test, Rovsing's sign, rectal tenderness). The
C-statistic changed from 0.80 (95% CI: 0.73, 0.86) to 0.76 (95% CI: 0.60,
0.92) with PMN excluded, and was even worse, at 0.54 (95% CI: 0.45, 0.63), when PMN was
replaced with a few other variables. External validation of other scoring
systems was performed for 9 models, with a pooled C-statistic of 0.81 (95% CI: 0.77, 0.84),
but this was mainly contributed by Tzanakis et al(83), who validated 7 models.
Pooling externally validated studies according to appropriate and inappropriate original
model construction resulted in C-statistics of 0.80 (95% CI: 0.65, 0.94) and 0.77
(95% CI: 0.74, 0.81), respectively.
Discussion
We have reviewed performances of diagnostic models for appendicitis.
Most models yielded relatively fair to good performances in discrimination with the
pooled C-statistic of 0.84 (95%CI 0.77, 0.91) in settings where the models were
developed and 0.78 (95%CI 0.74, 0.82) in settings where the models were applied.
However, only one third of scores were appropriately derived based on regression
models.
For the models with good to excellent external performance (C-statistic
≥0.8), 10 variables were commonly included: migration of
pain, anorexia, nausea/vomiting, duration of pain, elevated temperature, rebound
tenderness, right lower quadrant tenderness, guarding, increased white blood cell count,
and left shift of PMN. These models were originally developed using proper statistical
modeling (i.e., logistic regression) in only 2/23 studies, whereas the rest used results
of diagnostic parameters or univariate analysis (i.e., Chi-square test) without a proper
rationale for weighting in prediction scores.
The Alvarado score, developed by Alvarado et al(45) in 1986, was the
most popular prediction model for diagnosis of appendicitis; it aimed to identify
patients who should undergo operation or observation. The model was originally developed
using data from 277 Caucasians to assess the association between 8 predictive factors
and appendicitis. These predictive factors were localized tenderness in the right
lower quadrant (RLQ) of the abdomen, migration of pain, elevation of temperature,
nausea/vomiting, anorexia/acetone in urine, rebound tenderness in the RLQ of the abdomen,
leukocytosis, and left shift of PMN cells. The diagnostic parameters (i.e.,
sensitivity, specificity and accuracy) were estimated for each individual predictive
factor and used for creating the prediction score. The C-statistic
was 0.80 (95% CI: 0.73, 0.86) in the derivation phase and dropped to 0.74 (95% CI:
0.69, 0.79) on external validation.
Since the PMN is unavailable in a routine laboratory(46, 77), it was
excluded from the model, which yielded slightly lower performance in the derivation phase
(C-statistic 0.80 vs 0.76); performance was far worse when it was replaced with a few
clinical variables (i.e., cough test, Rovsing's sign, rectal tenderness; C-statistic 0.80
vs 0.54).
The O/E ratio is commonly used to measure the closeness of the predicted
and observed values. The C-statistic is usually applied to measure how well the
model assigns a higher probability of the event to the appendicitis group and
a lower probability to the non-appendicitis group(97). The association between diagnostic
factors and appendicitis found in the derivation data may occur by chance. This
problem is prominent when the sample size is small relative to
the number of diagnostic factors included in the model. With a small sample size,
unimportant variables are more likely to be selected and some important variables omitted
from the model(53). Conversely, a very large sample size is more likely to include
statistically significant variables without clinical importance. The number of subjects
with events should be at least 10, and more safely 20 or more, per risk factor to derive a
valid model, as suggested by simulation studies(80, 92). In our review, the number
of variables included in a model varied from 4 to 18, so the required number
of appendicitis cases would be at least 40 to 180 subjects, and 80 to 360
subjects to be safer. Six of the 14 derived studies(46, 49, 54, 56, 69, 76) had fewer
appendicitis cases than required; they included 5 to 14
variables with a total of 43 to 261 appendicitis cases.
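The events-per-variable arithmetic in this paragraph can be written out as a small sketch. The function names are hypothetical; the defaults of 10 (minimum) and 20 (safer) events per variable are the rule-of-thumb values quoted above.

```python
def required_cases(n_predictors, events_per_variable=10):
    """Minimum number of appendicitis cases needed for a model with
    n_predictors variables under an events-per-variable rule of thumb."""
    return n_predictors * events_per_variable


def has_adequate_cases(n_cases, n_predictors, events_per_variable=10):
    """True if the number of appendicitis cases meets the rule of thumb."""
    return n_cases >= required_cases(n_predictors, events_per_variable)


# Models in the review included between 4 and 18 variables.
minimum_range = (required_cases(4), required_cases(18))        # (40, 180)
safer_range = (required_cases(4, 20), required_cases(18, 20))  # (80, 360)
```

For instance, under the minimum rule a hypothetical model with 14 variables would need at least 140 appendicitis cases, so a derivation sample with 43 cases would not be adequate.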
Although the performance of predictive scores in the derivation,
internal validation, and external validation phases was good (C-statistic = 0.79, 0.84,
and 0.78, respectively), these scores cannot be applied to a general population with
confidence because most were created inappropriately. Most studies (64.3%) derived
predictive scores using univariate analyses or estimated diagnostic parameters and used
the results of these analyses to create scores. The rationale for choosing the method of
creating the predictive score was not clearly described. In addition, because these scores
were based on univariate analyses, confounding bias might be present.
Differences in the distribution of risk factors across populations may also
affect the generalizability of the model to different populations. C-statistics derived
from external validation usually tend to be lower than those derived from
internal validation. Our results suggested that the pooled C-statistic from external
validation was slightly lower than that from internal validation. However,
pooling of external validations was dominated by the study of Tzanakis et al(83), which
used the same subjects to validate 9 scoring systems. Excluding this study from
pooling resulted in a C-statistic of 0.75 (95%CI 0.74, 0.82).
Our review suggested that research methods and the reporting of research
findings for diagnostic scoring systems of appendicitis were inconsistent. Some
research groups have advocated and developed research methods and reporting
recommendations for conducting research in this area.(98, 99) In addition, a user's
guide(100) for reading and using evidence in this area has also been developed to
improve research methods, reporting, and use of evidence on prediction scores. We
modified and used this tool for our study. The type of studied subjects, study design,
validity of measurements for outcome and diagnostic factors, and use of statistical
methods were reported in most of the model development studies. However, only 50%
(7/14) of the derived studies developed scores and internally tested them in the same
study. Most studies (64.3%) created scores without applying statistical modelling.
None of the studies reported calibration parameters, and only 9 (39.1%) of the
external validation studies performed discrimination analysis and reported the
C-statistic. The models seemed clinically sensible in 71.4% (10/14) of studies, being
simple and easy to interpret.
In conclusion, although there are several diagnostic scoring systems for
appendicitis, applying them to a general population might be questionable due to
improper methods used for creating scores and a lack of external validation. More
appropriately derived scores with internal and external validation are still required.
2.3 Definition
2.3.1 Appendicitis is an inflammation of the appendix, which is the small,
finger-shaped pouch attached to the beginning of the large intestine on the lower-right
side of the abdomen. Appendicitis is a medical emergency, and if left untreated, the
appendix may rupture and cause a potentially fatal infection.
2.3.2 Migration of pain to the right lower quadrant = pain starting either in
epigastric area, centrally, or in the whole abdomen then eventually migrating down to
the right lower abdomen.
2.3.3 Pain aggravated by coughing = patient was asked to cough and any
worsening of pain was recorded.
2.3.4 Rebound tenderness = elicited in the right lower quadrant when a
hand pressing the abdomen for 10-15 seconds was suddenly withdrawn.
2.3.5 Rigidity = involuntary contraction of the abdominal muscles in the
absence of diagnostic evidence from an attribute.
2.3.6 Abdominal pain = abdominal pain (not only right lower quadrant)
2.3.7 Vomiting = one or more episodes.
2.3.8 Polymorphonuclear leukocytosis = defined in this study as a total count >
10,000/mm3 with PMN > 75%.
2.3.9 Rovsing sign, named after the Danish surgeon Niels Thorkild
Rovsing, is a sign of suspected appendicitis. If palpation of the left lower abdominal
quadrant increases pain in the right lower quadrant, the patient has a positive
Rovsing sign and may have appendicitis.
2.3.10 Appendectomy = a surgical procedure to remove the appendix.
2.3.11 Negative appendectomy = a histologically normal appendix
found after an appendectomy that was performed for the purpose of treatment following a
diagnosis of appendicitis.
2.4 Research methods in risk prediction scores
Flow of study is displayed below. Methodological issues involved in
development and validation of risk prediction scores were as follows:
2.4.1 Clinical decision rule
Establishing the diagnosis of appendicitis is a routine part of a
physician's practice. Clinical experience gives a sense of which findings
from the patient history, physical examination, and investigations are critical in making
an accurate diagnosis. Information from a single predictor is usually insufficient to give
a reliable estimate of the diagnosis. A multivariable diagnostic model is developed,
validated, updated, and implemented to assist in estimating
probabilities, improving diagnostic accuracy, and making management decisions
(observation, imaging, and surgery). A scoring system (clinical decision rule) for the
diagnosis of appendicitis is a clinical tool that quantifies the individual contributions
that various parameters of the patient history, physical examination, and basic
investigation results make toward the diagnosis of appendicitis. This would pave the way
to using a formal test, and would simplify and increase the accuracy of the clinician's
assessment of appendicitis.
Why use both internal and external validation? Many appropriately and
rigorously derived clinical decision rules are not ready for use in clinical practice, and
many rules fail to perform well when tested in a new population. A rule may reflect
associations between variables and outcomes that are mainly due to chance.
This poor performance reflects statistical overfitting or instability in the originally
derived model: a different set of predictors may emerge in a different group of patients,
even when the patients come from the same setting. Resampling techniques such as the
bootstrap can deal with this problem; in the related leave-one-out approach, 1 patient is
removed from the sample, the rule is generated using the remainder, and it is tested
on the removed patient(95). Poor performance can also be
related to differences in the prevalence of disease or in how the decision rule is
applied. The clinical decision rule must therefore be validated prospectively in a
completely new patient population (external validation). Under ideal circumstances this
would be performed in a new clinical setting by clinicians different from those involved
in the derivation study(45).
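The remove-one-patient procedure described above (removing 1 patient, generating the rule on the remainder, and testing it on the removed patient) can be sketched as follows. This is an illustrative assumption, not the validation procedure of any reviewed study: the "rule" here is a deliberately simple cut-off placed midway between the mean scores of cases and non-cases, higher scores are assumed to indicate appendicitis, and every training subset is assumed to contain at least one case and one non-case. All function names are hypothetical.

```python
from statistics import mean

def fit_cutoff(scores, outcomes):
    """Derive a toy decision rule: the midpoint between the mean score of
    cases (outcome 1) and the mean score of non-cases (outcome 0)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    return (mean(cases) + mean(non_cases)) / 2.0


def leave_one_out_accuracy(scores, outcomes):
    """Hold out each patient in turn, derive the cut-off from the others,
    and test it on the held-out patient; return the proportion correct."""
    correct = 0
    for i in range(len(scores)):
        train_scores = scores[:i] + scores[i + 1:]
        train_outcomes = outcomes[:i] + outcomes[i + 1:]
        cutoff = fit_cutoff(train_scores, train_outcomes)
        predicted = 1 if scores[i] >= cutoff else 0
        correct += int(predicted == outcomes[i])
    return correct / len(scores)
```

Because each patient is classified by a rule that was derived without that patient, the resulting accuracy is less optimistic than the accuracy obtained by testing the rule on the same data used to derive it.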
2.4.1.1 Derivation
Creating a scoring system involves identifying
variables with predictive power. All predictive variables are included and clearly
identified. Assessment of appendicitis is performed by a pathologist who is blinded to
the predictor assessments, and the assessors of the predictors are blinded to the
pathological diagnosis of appendicitis. Many systematic reviews have found that
prediction model studies provide neither a rationale for the sample size nor any mention
of model overfitting(101).
2.4.1.2 Validation
This phase tested the reproducibility of the accuracy of RAMA-AS
and was divided into 2 studies. Internal validation was a narrow validation that
applied RAMA-AS in a clinical setting and population similar to those of the derivation
phase. External validation was a broad validation that applied RAMA-AS in
multiple clinical settings with varying prevalence of appendicitis. Many studies validate
a newly developed model using a separate data set, recruited later in time or from
other hospitals. The methodological standards for validation have been reported as
follows: patients are chosen in an unbiased fashion and represent a wide spectrum of
disease severity, the criterion standard is assessed blindly, the predictor variables and
the actual rule are interpreted explicitly and accurately without knowledge of the
outcome, and follow-up of enrolled patients is 100%(95).
2.4.1.3 Impact analysis
This phase aims to find evidence that RAMA-AS
changes clinician behaviour and improves outcomes related to the diagnosis of
appendicitis.
2.4.1.4 Variables and appendicitis
Important variables in the diagnosis of appendicitis were
divided into 4 domains: demographic data, symptoms, clinical signs, and laboratory
results. The two demographic variables commonly used in previously developed
appendicitis scoring systems were age and gender. Ten symptom variables were
considered, of which nausea/vomiting, followed by migration and duration of pain, were
the most commonly included in the predictive models. Nine clinical signs were
considered, of which the most commonly included was rebound tenderness,
followed by right lower quadrant (RLQ) tenderness, RLQ guarding, and elevated body
temperature. Most of the scoring systems considered at least one laboratory variable.
Among these, rising white blood cell count was the most commonly used, followed by left
shift of PMN cells. All important variables from our previous systematic review were
included in the derivation(60). Predictive variables included in RAMA-AS must be
clinically sensible, easy to use, and suggest a course of action.
2.5 Conceptual framework
The aggressive attitude towards diagnosis and management of patients
suspected of having appendicitis has recently been questioned. Under the maxim “When in
doubt, cut it out”, patients with inconclusive findings pay the price of frequent
removal of a normal appendix. At present, advances
in peri-operative care have reduced the dangers of peritonitis and complicated
appendicitis. This has led to a more conservative approach that pays attention to the
morbidity associated with negative appendectomy. In order to decrease negative
appendectomy without increasing the burden of complicated appendicitis, the development of
RAMA-AS is one approach that may improve surgeons' competency in diagnosing
appendicitis. This will reduce unnecessary complications that may be caused by the surgeon,
improve patient safety, and increase the cost-effectiveness of treatment for patients with
acute abdomen. A study with appropriate statistical methods will pave the way to reaching
the objective of this study.
Figure 2.1 Identification of studies for inclusion
Table 2.1 Alvarado scoring system
Domains      Variables              Points
Symptoms     Migration of pain      1
             Anorexia               1
             Nausea/vomiting        1
Signs        RLQ tenderness         2
             Rebound tenderness     1
             Elevated temperature   1
Laboratory   Leukocytosis           2
             Left shift             1
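As an illustration of how the point weights in Table 2.1 combine, the following minimal sketch sums the points for the findings that are present. The dictionary keys and the function name are hypothetical labels chosen for this example; only the point values come from the table, and no diagnostic cut-off is implied.

```python
# Point weights taken from Table 2.1; the key names are illustrative labels.
ALVARADO_POINTS = {
    "migration_of_pain": 1,
    "anorexia": 1,
    "nausea_vomiting": 1,
    "rlq_tenderness": 2,
    "rebound_tenderness": 1,
    "elevated_temperature": 1,
    "leukocytosis": 2,
    "left_shift": 1,
}


def alvarado_score(findings):
    """Sum the Table 2.1 points for every finding marked as present.

    findings maps variable names to True/False; missing variables are
    treated as absent. Unknown variable names raise a ValueError.
    """
    unknown = set(findings) - set(ALVARADO_POINTS)
    if unknown:
        raise ValueError(f"Unknown variables: {sorted(unknown)}")
    return sum(
        points
        for name, points in ALVARADO_POINTS.items()
        if findings.get(name, False)
    )
```

A patient with all eight findings present receives the maximum of 10 points, whereas a patient with only RLQ tenderness and leukocytosis receives 4 points.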
Table 2.2 The scoring parameters of RIPASA score based on probability and extra weight
Variables                                           Points
Age
  Less than 40                                      1.0
  Greater than 40                                   0.5
Gender
  Male                                              1.0
  Female                                            0.5
Right iliac fossa (RIF) pain                        0.5
Migration of pain to RIF                            0.5
Nausea and vomiting                                 1.0
Anorexia                                            1.0
Duration of symptoms (hrs)
  Less than 48                                      1.0
  More than 48                                      0.5
RIF tenderness                                      1.0
Guarding                                            2.0
Rebound tenderness                                  1.0
Rovsing sign                                        2.0
Fever (body temperature ≥ 37.8°C by oral route)     1.0
Raised white cell count                             1.0
Negative urinalysis                                 1.0
Foreign national registration identity card         1.0
Table 2.3 The Appendicitis inflammatory response score (AIRS)
Variables                                   Points
Vomiting                                    1
Pain in right inferior fossa                1
Rebound tenderness or muscular defence
  Light                                     1
  Medium                                    2
  Strong                                    3
Body temperature ≥ 38.5°C                   1
Polymorphonuclear leukocytosis (%)
  70-84                                     1
  ≥ 85                                      2
White blood cell count (cell/mm3)
  10.0-14.9 x 10^3                          1
  ≥ 15.0 x 10^3                             2
C-reactive protein concentration (g/L)
  10-49                                     1
  ≥ 50                                      2
Table 2.4 Fenyö-Lindberg scoring system
Variables                             Points
Constant                              -10
Sex
  Male                                8
  Female                              -8
WBC (x10^3 cell/mm3)
  < 8.9                               -15
  9.0-13.9                            2
  > 14.0                              10
Pain duration (hrs)
  < 24                                3
  24-48                               0
  > 48                                -12
Progression of pain
  Yes                                 3
  No                                  -4
Relocation of pain
  Yes                                 7
  No                                  -9
Vomiting
  Yes                                 7
  No                                  -5
Aggravation with cough
  Yes                                 4
  No                                  -11
Rebound tenderness
  Yes                                 5
  No                                  -10
Rigidity
  Yes                                 15
  No                                  -4
Pain outside right lower quadrant
  Yes                                 -6
  No                                  4
Table 2.5 Ohmann score
Variables Points
Tenderness in right lower quadrant 4.5
Rebound tenderness, contralateral 2.5
Dysuria 2.0
Constant pain 2.0
WBC > 10 x103 cell/mm3 1.5
Aged > 50 years 1.5
Shifting pain 1.0
Local guarding 1.0
Table 2.6 Eskelinen score
Variables              Points                                              Factors
Tenderness             2 = RLQ, 1 = other location                         11.41
Rigidity               2 = yes, 1 = no                                     6.62
Leukocyte count        2 = ≥ 10,000 cell/mm3, 1 = < 10,000 cell/mm3        5.88
Rebound tenderness     2 = yes, 1 = no                                     4.25
Pain at presentation   2 = RLQ, 1 = other locations                        3.51
Duration of pain       2 = < 48 hours, 1 = ≥ 48 hours                      2.13
Table 2.7 Simple scoring system
Variables Points
Abdominal pain 1
Vomiting 1
Right lower quadrant tenderness 1
Low grade fever (≤ 37.8°C) 1
Polymorphonuclear leukocytosis (total count ≥10,000, PMN ≥ 75%) 1
Table 2.8 Practical score of Ramirez and Deus
Variables Points
Sex
Male 6
Female -5
Initial pain
Epigastric 5
Other -6
Diarrhea
No 1
Yes -9
White cell count (cell/mm3)
≥10,500 6
<10,500 -14
Differential white cell count (%)
≥75 6
<75 -19
Guarding in right lower quadrant
Yes 8
No -7
Rebound tenderness
Yes 5
No -21
Table 2.9 Scoring system developed by Teicher
Variables Points
Sex
Male +2
Female -1
Age (years)
20-39 -1
>50 +3
Duration of pain (days)
1.5 +2
2 +1
3 -3
Genitourinary symptoms -3
Muscle spasm-Right lower quadrant
Involuntary +3
None -3
Rectal Mass at right side -3
WBC (cell/mm3)
<10,000 -3
>13,000 +2
Table 2.10 Performance of random forests (RF), support vector machines (SVM),
artificial neural networks (ANN), logistic regression (LR), and Alvarado score on
diagnosis of acute appendicitis
AUC AC SN SP PPV NPV
RF 0.98(0.017) 0.96 0.94 1.00 1.00 0.87
SVM 0.96(0.027) 0.93 0.91 1.00 0.85 0.73
ANN 0.91(0.047) 0.91 0.94 0.85 0.94 0.85
LR 0.87(0.052) 0.82 0.91 0.62 0.85 0.73
Alvarado 0.77(0.057) 0.80 0.84 0.69 0.87 0.64
AUC, Area under ROC curve; AC, accuracy; SN, sensitivity; SP, specificity; PPV,
positive predictive value; NPV, negative predictive value
Table 2.11 Describe methodological assessments
Bias domains: Selection bias; Information bias; Confounding bias; Other issue
Study; Phase; Score; Representative spectrum; Ascertainment of outcome measurement; Blinded assess outcome; No. predictors; Predictors blinded outcome; Significant predictors; Accurate interpretation; Multivariate regression analysis; Created score properly; Sample size; Clinical sensible; Other issue
Van way,1982(84); D; Van way; N; NA; NA; N; N; Y; -; Y; Y; Y; Y; -
Teicher,1983(56); D; Teicher; Y; Y; NA; Y; NA; Y; -; N; N; Y; Y; -
Alvarado,1986(45); D; Alvarado; Y; Y; NA; Y; NA; Y; -; N; N; Y; Y; -
Fenyo,1987(39); D; Fenyo; Y; Y; NA; Y; NA; Y; -; N; N; Y; N; -
Christian,1992(54); D; Christian; N; Y; NA; N; Y; N; -; N; N; N; Y; -
Eskelinen,1992(38); D; Eskelinen; Y; Y; NA; Y; Y; Y; -; Y; Y; Y; Y; -
Kalan,1994(46); D; Modified Alvarado; N; Y; NA; N; NA; Y; -; N; N; N; Y; -
Ramirez,1994(55); D; Practical score of Ramirez and Deus; N; Y; NA; Y; NA; Y; -; N; N; Y; N; -
Gallego,1998(69); D; Gallego; Y; Y; NA; Y; NA; Y; -; N; N; Y; N; -
Tzanakis,2005(83); D; Tzanakis; Y; Y; NA; Y; Y; Y; -; Y; Y; Y; Y; -
Lintula,2005(76); D; Lintula; Y; Y; NA; Y; Y; Y; -; Y; Y; Y; Y; -
Malik,2007(77); D; Modified Alvarado; N; Y; NA; N; NA; Y; -; N; N; Y; Y; -
Andersson,2008(50); D; Appendicitis inflammatory response score; Y; Y; NA; Y; Y; Y; -; Y; Y; Y; Y; -
Chong,2010(87); D; RIPASA; N; Y; NA; Y; NA; Y; -; N; N; Y; N; -
Van way,1982(84); I; Van way; N; NA; NA; -; -; -; N; -; -; -; -; N
Fenyo,1987(39); I; Fenyo; Y; Y; NA; -; -; -; Y; -; -; -; -; N
Ramirez,1994(55); I; Practical score of Ramirez and Deus; N; Y; NA; -; -; -; N; -; -; -; -; N
Tzanakis,2005(83); I; Tzanakis; Y; Y; NA; -; -; -; Y; -; -; -; -; Y
Andersson,2008(50); I; Appendicitis inflammatory response score; Y; Y; NA; -; -; -; Y; -; -; -; -; Y
Lintula,2010(94); I; Lintula,2005; Y; Y; NA; -; -; -; Y; -; -; -; -; Y
Fenyo,1997(37); E; Fenyo,1987; Y; Y; NA; -; -; -; N; -; -; -; -; N
Table 2.12 Characteristics of studies that had developed prediction scores for appendicitis
Author; Phase; Year; Model; Study design; Type of participant; Male (%); Ethnicity; Appendicitis; Non Appendicitis; Statistical method; % Negative appendectomy; % Complicated appendicitis
Van way(84); D/I; 1982; Van way score; Retrospective cohort; Operated patients; NA; Caucasian; 360; 116; Discrimination analysis; 29.83; 25.30
Teicher(56); D; 1983; Teicher; Case control; Operated patients; 45.50; Caucasian; 100; 100; Diagnostic analysis; 40
Alvarado(45); D; 1986; Alvarado score; Retrospective cohort; Hospitalized patients; NA; Caucasian; 227; 50; Diagnostic analysis; 7; 18.77
Fenyo(39); D; 1987; Fenyo score; Prospective cohort; Patients suspected appendicitis; NA; Caucasian; 365; 833; Diagnostic analysis; 18; 14.00
Christian(54); D; 1992; Christian score; Quasi-experimental design; Patients suspected appendicitis; 77.59; Asian; 43; 15; Non-statistical base; 6.5; 6.50
Eskelinen(38); D; 1992; Eskelinen; Prospective cohort; Hospitalized patients; NA; Caucasian; 270; 1333; Multiple logistic regression; 21.6; 6.74
Kalan(46); D; 1994; Modified Alvarado; Prospective cohort; Hospitalized patients; 55.26; Asian; 40; 9; Non-statistical base; 23.68; NA
Ramirez(55); D/I; 1994; Practical score of Ramirez and Deus; Retrospective cohort; Operated patients; 63.00; Caucasian; 293; 67; Bayesian, likelihood ratio weight; 18.61; NA
Table 2.11 Describe methodological assessments (cont.)
Bias domains: Selection bias; Information bias; Confounding bias; Other issue
Study; Phase; Score; Representative spectrum; Ascertainment of outcome measurement; Blinded assess outcome; No. predictors; Predictors blinded outcome; Significant predictors; Accurate interpretation; Multivariate regression analysis; Created score properly; Sample size; Clinical sensible; Other issue
Denizbasi,2003(66); E; Alvarado,1986; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
AlQahtani,2004(41); E; Alvarado,1986; Y; Y; NA; -; -; -; Y; -; -; -; -; Y
Pruekprasert,2004(96); E; Alvarado,1986; Y; Y; NA; -; -; -; Y; -; -; -; -; Y
Enochsson,2004(67); E; Fenyo,1987; UN; Y; NA; -; -; -; Y; -; -; -; -; NA
Sitter,2004(53); E; Eskelinen,1992; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Tzanakis,2005(83); E; Van way,1982; Y; Y; NA; -; -; -; NA; -; -; -; -; Y
Tzanakis,2005(83); E; Teicher,1983; Y; Y; NA; -; -; -; NA; -; -; -; -; Y
Tzanakis,2005(83); E; Alvarado,1986; Y; Y; NA; -; -; -; NA; -; -; -; -; Y
Tzanakis,2005(83); E; Fenyo,1987; Y; Y; NA; -; -; -; NA; -; -; -; -; Y
Tzanakis,2005(83); E; Christian,1992; Y; Y; NA; -; -; -; NA; -; -; -; -; Y
Tzanakis,2005(83); E; Eskelinen,1992; Y; Y; NA; -; -; -; NA; -; -; -; -; Y
Mckay,2007(30); E; Alvarado,1986; Y; Y; NA; -; -; -; N; -; -; -; -; NA
Andersson,2008(50); E; Alvarado,1986; Y; Y; NA; -; -; -; NA; -; -; -; -; Y
Kurane,2008(91); E; Kalan,1994; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Sun,2008(80); E; Alvarado,1986; Y; Y; NA; -; -; -; N; -; -; -; -; NA
Talukder,2009(92); E; Malik,2007; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Hsieh,2010(21); E; Alvarado,1986; Y; Y; NA; -; -; -; N; -; -; -; -; NA
Pouret-Baudry,2010(78); E; Alvarado,1986; Y; Y; NA; -; -; -; Y; -; -; -; -; Y
Chong,2011(49); E; Alvarado,1986; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Chong,2011(49); E; Chong,2010; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Inci,2011(88); E; Alvarado,1986; Y; Y; NA; -; -; -; Y; -; -; -; -; Y
Limpawattan,2011(75); E; Alvarado,1986; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Konan,2011(90); E; Alvarado,1986; N; Y; NA; -; -; -; N; -; -; -; -; NA
Kanumba,2011(89); E; Kalan,1994; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Yoldas,2011(93); E; Lintula,2005; N; Y; NA; -; -; -; N; -; -; -; -; NA
Castro,2012(10); E; Alvarado,1986; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Castro,2012(10); E; Andersson,2008; Y; Y; NA; -; -; -; Y; -; -; -; -; NA
Table 2.12 Characteristics of studies that had developed prediction scores for appendicitis (cont.)
Author (ref.) | Phase | Year | Model | Study design | Type of participant | Male (%) | Ethnicity | Appendicitis | Non-appendicitis | Statistical method | % Negative appendectomy | % Complicated appendicitis
Gallego (69) | D | 1998 | Gallego score | Prospective cohort | Patients suspected appendicitis | NA | Caucasian | 101 | 91 | Bayesian, likelihood ratio weight | 8.85 | 18.23
Tzanakis (83) | D/I/E | 2005 | Tzanakis score | Prospective cohort | Hospitalized patients | 56.10 | Caucasian | 217 | 504 | Logistic regression | 19.20 | 10.23
Lintula (76) | D | 2005 | Lintula score | Prospective cohort | Patients suspected appendicitis | 100 | Caucasian | 43 | 84 | Logistic regression | 13 | NA
Malik (77) | D | 2007 | Modified Alvarado | Prospective cohort | Hospitalized patients | 55.12 | Asian | 174 | 80 | Non-statistical base | 11.49 | 12.07
Andersson (50) | D/I/E | 2008 | Appendicitis inflammatory response score | Prospective cohort | Hospitalized patients | 46.00 | Caucasian | 191 | 254 | Logistic regression | 11.00 | 14.00
Chong (87) | I | 2010 | RIPASA score | Retrospective cohort | Emergency appendectomy | 57.69 | Asian | 261 | 51 | Univariate analysis | 16.30 | NA
Table 2.12 Characteristics of studies that had developed prediction scores for appendicitis (cont.)
Author (ref.) | Phase | Year | Model | Study design | Type of participant | Male (%) | Ethnicity | Appendicitis | Non-appendicitis | Statistical method | % Negative appendectomy | % Complicated appendicitis
External validation phase
Fenyo (37) | E | 1997 | Fenyo score | Prospective cohort | Patients suspected appendicitis | 11.2 | Caucasian | 392 | 775 | Diagnostic analysis | 17.5 | NA
Lamparelli (74) | E | 2000 | Modified Alvarado by Kalan | Prospective cohort | Patients suspected appendicitis | NA | Caucasian | 56 | 26 | Diagnostic analysis | NA | NA
Denizbasi (66) | E | 2003 | Alvarado score | Prospective cohort | Patients suspected appendicitis | 53.6 | Caucasian | 175 | 46 | Diagnostic analysis | NA | NA
AlQahtani (41) | E | 2004 | Alvarado score | Prospective cohort | Patients suspected appendicitis | 59 | Asian | 121 | 30 | Diagnostic analysis | 12.5 | NA
Pruekprasert (96) | E | 2004 | Alvarado score | Prospective cohort | Patients suspected appendicitis | 46 | Asian | 186 | 45 | Diagnostic analysis, discrimination analysis | NA | NA
Enochsson (67) | E | 2004 | Fenyo score | Prospective cohort | Patients suspected appendicitis | 12 | Caucasian | 330 | 96 | Diagnostic analysis | NA | NA
Sitter (53) | E | 2004 | Eskelinen score | Prospective cohort | Hospitalized patients | NA | Caucasian | 662 | 1697 | Diagnostic analysis, discrimination analysis | 21.6 | NA
Tepel (52) | E | 2004 | Ohmann score | Retrospective cohort | Patients suspected appendicitis | 37 | Caucasian | 113 | 287 | Diagnostic analysis | 22 | NA
McKay (30) | E | 2007 | Alvarado score | Retrospective cohort | Suspected appendicitis | NA | Caucasian | 48 | 96 | Diagnostic analysis | NA | NA
Table 2.12 Characteristics of studies that had developed prediction scores for appendicitis (cont.)
Author (ref.) | Phase | Year | Model | Study design | Type of participant | Male (%) | Ethnicity | Appendicitis | Non-appendicitis | Statistical method | % Negative appendectomy | % Complicated appendicitis
Sun (80) | E | 2008 | Alvarado score | Retrospective cohort | Hospitalized patients | 66.2 | Asian | 213 | 159 | Diagnostic analysis, discrimination analysis | NA | NA
Kurane (91) | E | 2008 | Modified Alvarado by Kalan | Prospective cohort | Patients suspected appendicitis | 48.33 | Asian | 23 | 37 | Diagnostic analysis | 61.67 | NA
Ohmann (51) | I | 2008 | Ohmann | Prospective cohort | Patients suspected appendicitis | 43.8 | Caucasian | 235 | 1019 | Diagnostic analysis | 10.2 | 13.4
Talukder (92) | E | 2009 | Modified Alvarado by Malik | Prospective cohort | Hospitalized patients | 58 | Asian | 84 | 16 | Diagnostic analysis | 16 | NA
Hsieh (21) | E | 2010 | Alvarado score | Retrospective cohort | Patients suspected appendicitis | 47 | Asian | 115 | 65 | Diagnostic analysis, discrimination analysis | NA | NA
Pouget-Baudry (78) | E | 2010 | Alvarado score | Prospective cohort | Emergency appendectomy | 48.07 | Caucasian | 171 | 62 | Diagnostic analysis | NA | NA
Lintula (94) | I | 2010 | Lintula | RCT | Patients suspected appendicitis | 50 | Caucasian | 52 | 44 | Diagnostic analysis | 13 | NA
Chong (49) | E | 2011 | Alvarado score; Chong (2010) | Prospective cohort | Patients suspected appendicitis | 47.9 | Asian | 101 | 91 | Diagnostic analysis, discrimination analysis | 22.9 | 6.1
Table 2.12 Characteristics of studies that had developed prediction scores for appendicitis (cont.)
Author (ref.) | Phase | Year | Model | Study design | Type of participant | Male (%) | Ethnicity | Appendicitis | Non-appendicitis | Statistical method | % Negative appendectomy | % Complicated appendicitis
Inci (88) | E | 2011 | Alvarado score | Prospective cohort | Patients suspected appendicitis | NA | Caucasian | 57 | 9 | Diagnostic analysis | 13.6 | NA
Limpawattanasiri (75) | E | 2011 | Alvarado score | Prospective cohort | Patients suspected appendicitis | 40.7 | Asian | 715 | 285 | Diagnostic analysis, discrimination analysis | 14.7 | NA
Konan (90) | E | 2011 | Alvarado score | Retrospective cohort | Operated patients | NA | Caucasian | 41 | 41 | Diagnostic analysis, discrimination analysis | NA | NA
Kanumba (89) | E | 2011 | Modified Alvarado by Kalan | Cross-sectional study | Patients suspected appendicitis | 29.1 | Caucasian | 85 | 42 | Diagnostic analysis | 33.1 | 62.9
Yoldas (93) | E | 2011 | Lintula | Retrospective cohort | Operated patients | 50.7 | Caucasian | 132 | 24 | Diagnostic analysis, discrimination analysis | NA | NA
Castro (10) | E | 2012 | Alvarado score; Andersson | Retrospective cohort | Patients suspected appendicitis | 44 | Caucasian | 340 | 595 | Diagnostic analysis, discrimination analysis | 21 | NA
NA: not applicable, D=Derivative, I=Internal validation, E=External validation, Diagnostic=100%, Discrimination analysis=39.13%
CHAPTER III
METHODOLOGY
Building on the background, rationale, and literature review in the previous
chapters, this chapter describes the research methods applied in our study. The
methodological issues involved in conducting a study of risk prediction scores were
considered and are covered as follows:
3.1 Study design and setting
This study consisted of 2 parts: part I, derivation and internal validation of the CDR;
and part II, external validation of the CDR.
3.1.1 Part I
The study was a cross-sectional study that enrolled patients who were
suspected of having appendicitis and who visited the out-patient clinic, the Emergency
and Surgery Department, Faculty of Medicine Ramathibodi Hospital, Mahidol
University, from January 2013 to May 2015.
3.1.2 Part II
The cross-sectional study was conducted at Thammasat University
Hospital and Chaiyaphum Hospital, which are tertiary and secondary care hospitals,
respectively. Thammasat University Hospital is located in the central region, whereas
Chaiyaphum Hospital is located in Chaiyaphum province, in the north-eastern region of
Thailand. Data were collected from June 2015 to May 2016 and used for external
validation.
3.2 Study subjects
3.2.1 Inclusion criteria
3.2.1.1 Aged 15 to 80 years old
3.2.1.2 Patients who had abdominal pain within 7 days and
were suspected of having appendicitis
3.2.1.3 Patients who had at least one of the following
symptoms and signs:
Symptoms: right lower abdominal pain, migration of
abdominal pain, anorexia, nausea, or vomiting
Signs: body temperature ≥ 37.8°C by oral route, right lower
quadrant tenderness, guarding, rebound tenderness, or decreased bowel sounds
3.2.1.4 Agreed to participate and provided consent to the study
3.2.2 Exclusion criteria
3.2.2.1 Patients who could not give the history of illness by
themselves
3.2.2.2 Patients who had severe underlying disease or were
moribund, such as those with severe myocardial infarction or terminal illness
3.2.2.3 Patients who had a palpable abdominal mass
3.2.2.4 Patients who were diagnosed with a tumor or malignancy
of the appendix
3.2.2.5 Patients who had metastatic tumor to the appendix
3.2.3 Variables & Measurements
3.2.3.1 The outcome of interest
The outcome of interest was appendicitis, which was definitively
diagnosed by histopathology using the following criteria:
- Macroscopic finding: intravascular injection of serosa;
fibrinous, purulent film, edematous, hemorrhagic, necrotic changes of the wall; and
blood or pus on opening of the appendix.
- Microscopic finding: focal or expanded erosion, ulceration,
abscess.
The histological criterion for acute appendicitis was an inflammatory reaction
with PMN leucocytes in the mucosal layer of the appendix and edema.
Complicated appendicitis was defined as appendicitis judged to be complicated
by the surgeon, and/or a peritoneal swab or fluid culture that grew at least one
definite bowel organism, and/or a perforation identified by the histopathologist in
association with gangrene or full-thickness necrosis.
Negative appendectomy was defined as a histologically normal appendix
found at appendectomy, where the surgery had been performed as treatment
after a diagnosis of appendicitis. Non-surgical patients were classified as having
negative clinical findings on the basis of telephone follow-up at 1 month.
3.2.3.2 Predictive Variables
There were 4 domains of predictive variables as follows:
- 3.2.3.2.1 Demographic variables
Age: age was recorded in years at diagnosis.
Sex: sex was recorded as male or female.
- 3.2.3.2.2 Clinical symptoms
Onset of pain was recorded as insidious or sudden. Onset described the
speed and manner in which the pain began. Sudden onset meant that the pain began
abruptly and severely, whereas insidious onset meant that the pain began mildly and
gradually.
Duration of pain was recorded as the time from the occurrence of pain
until the patient came to the hospital.
Right lower quadrant (RLQ) abdominal pain was recorded
after asking the patient about the presence or absence of RLQ pain.
Migration of pain was recorded after asking the patient about its
presence or absence. Pain could start in the epigastric or central area, or in the
whole abdomen, and then eventually migrate down to the right lower abdomen.
Anorexia was recorded after asking the patient about loss of
appetite.
Aggravation of pain with cough was recorded as presence or
absence. The patient was asked to cough and then asked whether the pain worsened;
worsening was classified as presence of aggravation of pain.
Nausea was recorded as presence or absence. Nausea was a
sensation of unease and discomfort in the upper stomach with an involuntary urge to
vomit.
Vomiting was recorded as presence or absence. Vomiting was
a forceful expulsion of the contents of one's stomach through the mouth and
sometimes the nose.
Dysuria was recorded as presence if the patient experienced pain
or difficulty in urination.
Diarrhea was recorded as presence if the patient had three or more
loose or liquid bowel movements per day.
- 3.2.3.2.3 Clinical signs
Fever: body temperature > 37.8°C by oral route
Tenderness at RLQ: the patient was classified as having RLQ
tenderness if s/he had pain when a hand was pressed on the RLQ of the abdomen,
especially at McBurney's point (the point on the right side of the abdomen that is
one-third of the distance from the anterior superior iliac spine to the umbilicus). This
point roughly corresponds to the most common location of the base of the appendix,
where it is attached to the cecum.
Rebound tenderness: pain elicited in the right lower quadrant
when a hand pressing the abdomen for 10-15 seconds is suddenly withdrawn, see
Figure 3.1.
Abdominal guarding: the tensing of the abdominal wall
muscles to guard inflamed organs within the abdomen from the pain of pressure upon
them. The tensing is detected when the abdominal wall is pressed. Abdominal
guarding is also known as 'défense musculaire'.
Rovsing sign: palpation on the left lower abdominal quadrant
results in more pain in the right lower quadrant.
Per rectal examination tenderness: pain at the suprapubic area or
within the rectum elicited when pressure is exerted on the peritoneum of the
cul-de-sac of Douglas during rectal examination.
- 3.2.3.2.4 Laboratory results
Increased white blood cell count: a total white blood cell count
> 10,000 cells/mm3.
PMN leukocytosis: a PMN cell count > 75%.
Urine analysis was recorded as normal if red blood cells and white blood cells were
each fewer than 5 per high-power field.
3.3 Data Collection
3.3.1 Case record forms
Case record forms (CRF) were developed (see Appendix A) and prepared
at the Section for Clinical Epidemiology and Biostatistics, Faculty of Medicine,
Ramathibodi Hospital and distributed to all research sites.
3.3.2 Training
Surgical residents were informed about and trained in this research project,
the data collection flow, informed consent, and the assessment of all variables used
in the diagnosis of appendicitis at the beginning of the project, and were retrained at
every rotation in the general surgical unit.
In addition, research assistants were trained in data collection, in raising
queries when data were missing, and in the assessment of all variables used in the
diagnosis of appendicitis.
To validate the measurements, the quality control process began with
training the research assistants and second-year surgical residents in data
collection. The data collection manual was explicit and clearly explained the
definitions of signs, symptoms, and laboratory tests, the methods of examination,
and follow-up at appointment or by telephone. Double data entry from the
case record forms was used.
3.3.3 Data flow, queries, quality control, and project monitoring
Consecutive cases of suspected appendicitis (as described in the inclusion
criteria) were included at the emergency and out-patient surgical units. First- or
second-year surgical residents and research assistants who had passed the protocol
training were responsible for data collection. The resident or research assistant
approached patients and informed them about the aim of this study. Informed consent
was obtained once patients agreed to participate in the study.
The outcome of having appendicitis and the following variables were collected
prospectively in consecutive cases for the development (derivation) and validation of
the RAMA-AS: age, sex, onset of pain, duration of pain, progression of pain, right
lower quadrant (RLQ) abdominal pain, migration of pain, anorexia, aggravation of pain
with cough, nausea or vomiting, dysuria, diarrhea, fever (temperature > 37.8°C by oral
route), tenderness at RLQ, rebound tenderness, guarding, Rovsing sign, tenderness at
RLQ during per rectal examination, increased white blood cell count, PMN
leukocytosis, normal urinalysis, and CRP.
3.4 Sample size estimation
In our pilot study, the overall prevalence of appendicitis at the Surgical
Department, Faculty of Medicine Ramathibodi Hospital, Mahidol University was
approximately 62%, and its distribution by predictor is shown in Table 3.1. The
sample size based on assessing association, i.e., testing two proportions, for each
parameter ranged from 40 to 8,084.
For instance, the prevalence of appendicitis in patients who had a duration of
pain of less than 48 hours was 0.75. The null and alternative hypotheses were as follows.
H0: p1-p2=0
Ha: p1-p2≠ 0
Type I and type II errors were set at 0.05 and 0.2 (Zβ = 0.84), respectively,
with a ratio of exposed vs non-exposed of 1:1. The size of the difference between P1 and
P2 was set at 0.13. The sample size could be estimated as shown in the equation below
or using the PS program version 3.1.2.
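\[
\text{sample size } (n) = \frac{\left[ Z_{\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + Z_{\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_1 - P_2)^2}
\]
\[
\bar{P} = \frac{P_1 + P_2}{2} = \frac{0.62 + 0.75}{2} = 0.685
\]
\[
n = \frac{\left[ 1.96\sqrt{2 \times 0.685\,(1-0.685)} + 0.84\sqrt{0.62\,(1-0.62) + 0.75\,(1-0.75)} \right]^2}{(0.62 - 0.75)^2}
\]
\[
n = 199
\]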
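The worked example above can be checked numerically. The following is a minimal sketch, assuming a two-sided test and the Zβ of about 0.84 used in the equation (80% power); the function name and its arguments are illustrative and are not taken from the PS program.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.80):
    """Sample size n from the two-proportion formula shown above."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # pooled proportion
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p1 - p2) ** 2


# Overall prevalence 0.62 versus 0.75 in patients with pain for less than 48 hours.
n = two_proportion_sample_size(0.62, 0.75)
print(round(n))
```

With unrounded normal quantiles the result differs from the hand calculation only in the first decimal place, so it rounds to the same value.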
From our pilot study, the proportions of appendicitis in the exposed and non-
exposed groups for variables that were associated with appendicitis are described in
Table 3.1. Type I and type II errors were set at 5% and 20%, with a ratio of exposed
vs non-exposed of 1:1. The detectable difference was set at 5% and 10%. The sample
size was estimated for each variable. Using data from the most significant
variable (migration of pain), 356 (220+136) patients were required to
develop the RAMA-AS, taking into account a 10% loss to follow-up rate.
A total of 8-10 parameters were expected to be included in the final score.
Using the rule of thumb of 20 subjects with appendicitis per variable,
approximately 160 and 200 subjects with appendicitis were needed for 8 and 10
parameters, respectively. As a result, a total of 239 to 299 patients needed to be
enrolled.
Of the two approaches, the one giving the larger estimated sample size was
used. Allowing for a loss to follow-up of 5%, the total sample
size was 356 + [356 × 0.05] = 374 subjects. Therefore, 374 patients were required to
develop the RAMA-AS. A sample equal to thirty percent of the derivation-phase sample
was required for external validation; therefore, 112 subjects from different medical
centers were needed for the external validation phase.
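The arithmetic in the preceding paragraphs can be sketched as follows. This is only an illustration of the stated rules (20 subjects with appendicitis per parameter, inflation for loss to follow-up, and 30% of the derivation sample for external validation); the helper names are hypothetical, and rounding up after inflation is an assumption chosen because it reproduces the figure of 374.

```python
from math import ceil

EVENTS_PER_VARIABLE = 20  # rule of thumb quoted in the text


def events_needed(n_parameters, events_per_variable=EVENTS_PER_VARIABLE):
    """Subjects with appendicitis needed for a given number of score parameters."""
    return n_parameters * events_per_variable


def inflate_for_loss(n, loss_rate):
    """Inflate a sample size for expected loss to follow-up (rounded up)."""
    return ceil(n * (1 + loss_rate))


derivation_n = inflate_for_loss(356, 0.05)  # 356 + 356 x 0.05, rounded up
external_n = int(0.30 * derivation_n)       # 30% of the derivation sample

print(events_needed(8), events_needed(10), derivation_n, external_n)
```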
3.5 Data management
Databases were constructed separately for each setting, based on the CRFs,
using the EPIDATA program version 3. The principal investigator checked for the
completeness of CRFs before data entry. Double data entry was done independently
by staff, and the two data sets were compared to reduce typing errors. Data checking
and cleaning were done monthly, with clarification of missing, unclear, or implausible
data. CRFs and electronic data were kept in secure areas; only the principal
investigator, supervisors, and the data manager were able to access the data. Data
were backed up automatically every week on the server of the data management unit,
Section for Clinical Epidemiology and Biostatistics, to prevent data loss. Data were
divided into 2 sets: 1) the derivation and internal validation set from the Faculty of
Medicine Ramathibodi Hospital, and 2) the external validation set from the Faculty of
Medicine Thammasat University Hospital and Chaiyaphum Hospital.
3.6 Statistical analysis
3.6.1 Imputation
Imputation should be considered only when attempts to recover lost
information have failed. At our monthly data management meeting, CRFs were
retrieved and checked; if missing data were detected, queries were sent to the
research assistants at each collection site. The principal investigator and research
assistants reviewed and retrieved data from medical records, physicians' notes, and
laboratory results, and telephoned patients or referral hospitals directly if needed.
Missing data could shrink the sample size, decrease the precision of confidence
intervals, weaken statistical power, and bias the estimates. Instead of omitting
records with missing data, we applied an imputation technique to deal
with missing data that were related to observed variables.
Multiple imputation (MI) is advocated as the preferred imputation method
and can lead to more accurate standard errors and P values (102). It involves generating
multiple copies of the data set, in which the missing values are replaced by imputed
values drawn from their predicted distribution given the observed data. Variability in
MI comes from two sources, i.e., within and between imputations. Thus, the precision
of an MI estimate depends not only on the number of subjects but also on the number
of imputations. Performance of imputation can be assessed using relative variance
increase (RVI) and fraction of missing information (FMI). The RVI refers to the average
relative increase in the variances of the estimates because of missing data (i.e.,
averaged over all coefficients); the closer this value is to 0, the less the missing
data affect the estimates. The FMI refers to the largest fraction of missing
information among the coefficient estimates due to missing data. It is used to gauge
the required number of imputations based on a rule of thumb, i.e., FMI × 100. For
instance, if FMI = 0.15, the number of imputations = 0.15 × 100, i.e., at least 15
imputations are required.
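The quantities described above can be illustrated for a single coefficient. The sketch below assumes the usual form of Rubin's rules and uses a large-sample approximation to the FMI (it omits the degrees-of-freedom correction that statistical packages may apply); the function names and the input values are hypothetical.

```python
from math import ceil
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed data sets."""
    m = len(estimates)
    pooled = mean(estimates)              # pooled point estimate
    within = mean(variances)              # within-imputation variance
    between = variance(estimates)         # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    rvi = (1 + 1 / m) * between / within  # relative variance increase
    fmi = (1 + 1 / m) * between / total   # large-sample approximation to the FMI
    return {"estimate": pooled, "total_variance": total, "rvi": rvi, "fmi": fmi}


def imputations_needed(largest_fmi):
    """Rule of thumb from the text: at least 100 x FMI imputations."""
    return ceil(round(100 * largest_fmi, 6))  # round first to absorb floating-point noise


# Hypothetical estimates and squared standard errors from 5 imputed data sets.
print(pool_rubin([0.52, 0.49, 0.55, 0.50, 0.53], [0.010, 0.011, 0.010, 0.012, 0.010]))
print(imputations_needed(0.15))
```

An RVI near 0 means the between-imputation variance is small relative to the within-imputation variance, which is the situation the text describes as missing data having little effect on the estimates.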
Multiple imputation was performed using simulation-based sequential
multivariate regression with chained equations (103, 104). Linear regression,
logistic regression, and multinomial logit regression were used to predict missing data
for continuous, dichotomous, and categorical (> 2 groups) variables, respectively.
Distributions and patterns of missing data were explored to ensure that data were
missing at random (MAR). The MAR assumption is satisfied if the probability of
missing data on Y is unrelated to the value of Y after controlling for the other variables
(X) in the analysis, i.e., P(Y missing | Y, X) = P(Y missing | X). Complete data on 14
variables (i.e., outcome of appendicitis, age, sex, location of pain, migration of pain,
progression of pain, aggravation of pain, nausea or vomiting, anorexia, body
temperature, tenderness, rebound tenderness, guarding, and bowel sound) were used as
independent variables to predict the values of the missing data. Each variable with
missing values was modelled conditionally on the remaining variables in the data set
until no variable with missing values remained, in other words, until all missing data
were filled in. The final step therefore yielded data sets that contained both observed
and imputed data. Since the frequency of missing data and the largest FMI (i.e., a
measure of the uncertainty of the values estimated from MI) were very low in this study
(i.e., less than 0.05), twenty imputations were performed to allow for the uncertainty
of the imputed data. Bias from multiple imputation was examined by comparing the
distributions of imputed and observed values using the “midiagplots” command in the
STATA program.
Among the 396 eligible patients, only 2 variables, i.e., WBC and neutrophil,
had missing values, in 10.86% (43/396) and 10.10% (40/396) of patients, respectively.
Missing data were imputed using multivariate chained truncated linear regression
models. WBC and neutrophil were regressed on the complete data as mentioned previously,
see Appendix B. The possible ranges of WBC and neutrophil were set at 2,036 to 70,400
cells/mm3 and 32% to 96%, respectively. Interactions were not considered in the
imputation model. Twenty imputations were performed to allow for the uncertainty of
the imputed data, and the summarized values were then used (105). Further statistical
analysis was applied to each imputed data set, and the estimates were then combined
using Rubin’s rules to produce an overall estimate of each regression coefficient, taking
into account the uncertainty in the imputed values.
3.6.2 Derivative phase
The data collected at Ramathibodi Hospital were used to derive the RAMA-
AS. Mean and standard deviation (SD), or median (range) where appropriate, were used
to describe continuous data, while frequency (percentage) was used for categorical
data. The baseline characteristics of patients were compared between the appendicitis
and non-appendicitis groups using the independent t test (or quantile regression where
appropriate) for continuous data and the Chi-square test (or exact test where
appropriate) for categorical data.
Simple logistic regression analysis was used to screen variables that
might be associated with appendicitis. Individual variables from the 4 domains (i.e.,
demographic data, clinical symptoms, clinical signs, and laboratory results) were fitted
in the logit model, and a likelihood ratio (LR) test was used to test the association.
Variables with p values less than 0.20 were simultaneously considered in the
multivariate logistic regression model. The LR test was applied for model selection
with backward elimination. Only significant variables were kept in the final
parsimonious model. Goodness of fit was assessed using the Hosmer-Lemeshow
Chi-square test to determine whether the model fitted the data well (i.e., predicted and
observed values were close). In addition, a calibration coefficient (observed
values/predicted values) along with its 95% confidence interval (CI) was estimated.
Furthermore, a calibration plot was constructed, i.e., predicted risk was plotted
against observed risk. The coefficients of the final parsimonious model were used to
create the RAMA-AS by “summing up” all coefficients in the final model. Receiver
operating characteristic (ROC) curve analysis was used to calibrate the score cutoff.
Diagnostic parameters (i.e., sensitivity, specificity,
likelihood ratio positive (LR+) and negative (LR-)) were estimated for each distinct
value of the scores.
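The two steps described above, summing coefficients into a score and estimating diagnostic parameters at each distinct score value, can be sketched as follows. This is an illustrative outline only: the predictor names and weights are hypothetical and are not the RAMA-AS coefficients, and a score at or above the cutoff is assumed to count as a positive test.

```python
def risk_score(coefficients, patient):
    """Sum the coefficients of the predictors present (coded 1) in a patient."""
    return sum(weight * patient[name] for name, weight in coefficients.items())


def diagnostic_table(scores, diseased):
    """Sensitivity, specificity, LR+ and LR- at each distinct score value.

    Requires at least one diseased and one non-diseased subject.
    """
    rows = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, diseased) if d and s >= cutoff)
        fn = sum(1 for s, d in zip(scores, diseased) if d and s < cutoff)
        fp = sum(1 for s, d in zip(scores, diseased) if not d and s >= cutoff)
        tn = sum(1 for s, d in zip(scores, diseased) if not d and s < cutoff)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        lr_pos = sens / (1 - spec) if spec < 1 else None  # undefined when specificity = 1
        lr_neg = (1 - sens) / spec if spec > 0 else None  # undefined when specificity = 0
        rows.append({"cutoff": cutoff, "sensitivity": sens, "specificity": spec,
                     "lr_pos": lr_pos, "lr_neg": lr_neg})
    return rows


# Hypothetical weights and patients, for illustration only.
weights = {"migration_of_pain": 1.0, "rebound_tenderness": 1.5, "fever": 0.5}
patients = [
    {"migration_of_pain": 1, "rebound_tenderness": 1, "fever": 0},
    {"migration_of_pain": 0, "rebound_tenderness": 0, "fever": 1},
    {"migration_of_pain": 1, "rebound_tenderness": 1, "fever": 1},
    {"migration_of_pain": 0, "rebound_tenderness": 0, "fever": 0},
]
scores = [risk_score(weights, p) for p in patients]
for row in diagnostic_table(scores, [1, 0, 1, 0]):
    print(row)
```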
Receiver operating characteristic (ROC) curves are a well-established method
for comparing diagnostic tests. They were initially developed in World War II by the
British army (106). The plot of sensitivity versus 1 - specificity is called the ROC
curve, and the area under the curve (AUC) is an effective summary measure of
accuracy with a meaningful interpretation. This curve plays an important role in
evaluating the ability of diagnostic tests to discriminate the true state of patients,
finding the optimal cutoff values, and comparing two alternative diagnostic tasks when
each task is performed on the same subject (107). As a rough guide in the traditional
classification of diagnostic accuracy, an AUC of 0.90-1.00 indicates excellent,
0.80-0.90 good, 0.70-0.80 fair, 0.60-0.70 poor, and 0.50-0.60 failed accuracy.
3.6.3 Validation phases
3.6.3.1 Internal validation
After derivation of a model, several factors, such as a large number of
predictors relative to a small effective sample size, categorization of continuous
variables, and the strategy used for predictor selection, may lead to an overfitted
model and optimistic apparent performance. It is important to assess model
performance more honestly by performing internal validation, preferably using
resampling techniques such as bootstrapping or cross-validation. The classical
split-sample internal validation technique divides the derivation data into 2 data
sets: one for derivation of the model and the other for validation. The 2 data sets are
typically created by randomly splitting the original data (e.g., 70:30 or 50:50). This
method does not work well when the total sample size and/or the number of diseased
patients is small. It requires a large sample size; the data may also be split by time
(temporal validation) or location (geographic validation) (102).
Cross-validation is an extension of the split-sample technique.
This technique is used to reduce the bias and variability of the performance estimates.
The 10-fold cross-validation technique is done by randomly splitting the data into 10
equally sized groups. The model is derived in 9 of the 10 groups and evaluation of its
performance is done in the remaining group. This entire process is then repeated 10
times so that each of the 10 groups is used to test the model. The performance of the
model is calculated as the average over the 10 iterations.
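The 10-fold procedure described above can be outlined in a few lines. The sketch below is generic: the `fit` and `evaluate` arguments stand for whatever model-building and performance functions are used, and the threshold classifier and simulated data are toy stand-ins, not the thesis model.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly split indices 0..n-1 into k groups whose sizes differ by at most 1."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, fit, evaluate, k=10, seed=0):
    """Derive the model on k-1 groups, test on the held-out group, and average."""
    results = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        training = [row for i, row in enumerate(rows) if i not in held]
        testing = [rows[i] for i in held_out]
        results.append(evaluate(fit(training), testing))
    return sum(results) / len(results)


def fit_threshold(rows):
    """Toy model: cutoff midway between the mean predictor values of the two groups."""
    x0 = [x for x, y in rows if y == 0]
    x1 = [x for x, y in rows if y == 1]
    return (sum(x0) / len(x0) + sum(x1) / len(x1)) / 2


def accuracy(threshold, rows):
    return sum((x > threshold) == (y == 1) for x, y in rows) / len(rows)


# Hypothetical data: one predictor, outcome positive for the upper half of its values.
rows = [(i, int(i >= 50)) for i in range(100)]
cv_accuracy = cross_validate(rows, fit_threshold, accuracy)
print(cv_accuracy)
```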
The bootstrapping technique begins by developing the prediction
model using the entire original sample (size n) and determining the apparent
performance. A bootstrap sample is then drawn by sampling n individuals with
replacement from the original sample. For each bootstrap sample, a model is
developed by applying all the same modelling and predictor selection methods, and
its apparent performance (e.g., c-index) in the bootstrap sample, called the bootstrap
performance, is determined. This is compared with the performance of the same
bootstrap model in the original sample (original performance). The optimism is
calculated as the difference between the bootstrap performance and the original
performance. The previous steps are repeated at least 100 times, and the estimates of
optimism are averaged and then subtracted from the apparent performance obtained
in the first step to obtain an optimism-corrected estimate of performance (108).
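The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the "model" is a toy that selects the single best predictor by c-index (so that a selection step is repeated in every bootstrap sample), the data are simulated, and none of the names correspond to the Stata commands in Appendix C.

```python
import random


def c_index(scores, outcomes):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined without both outcomes; treated as uninformative
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit(rows):
    """Toy model: index of the predictor column with the highest c-index."""
    ys = [r[-1] for r in rows]
    return max(range(len(rows[0]) - 1), key=lambda j: c_index([r[j] for r in rows], ys))


def performance(model, rows):
    return c_index([r[model] for r in rows], [r[-1] for r in rows])


def optimism_corrected(rows, n_boot=200, seed=1):
    rng = random.Random(seed)
    apparent = performance(fit(rows), rows)
    optimism = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # n draws with replacement
        model = fit(sample)                        # repeat all modelling steps
        optimism.append(performance(model, sample) - performance(model, rows))
    mean_optimism = sum(optimism) / n_boot
    return apparent, mean_optimism, apparent - mean_optimism


# Simulated data: two informative predictors, one noise predictor, binary outcome.
data_rng = random.Random(0)
rows = []
for _ in range(80):
    y = data_rng.randint(0, 1)
    rows.append((data_rng.gauss(y, 1.5), data_rng.gauss(y, 1.5), data_rng.gauss(0, 1), y))

apparent, mean_optimism, corrected = optimism_corrected(rows)
print(f"apparent={apparent:.3f} optimism={mean_optimism:.3f} corrected={corrected:.3f}")
```

Fixing the random seed makes the resampling reproducible, which is convenient when the corrected estimate has to be reported.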
Compared with cross-validation, this technique better captures the
effects of predictor selection strategies on model building; the extent of
model overfitting and optimism can thus be quantified by repeating the predictor
selection process in each bootstrap sample. This technique also provides an estimate
of the adjustment or correction factor by which the model’s regression coefficients and
its performance measures can be shrunk and thus adjusted for overfitting. It is very
important that all aspects of model fitting are incorporated into each random or
bootstrap derivation sample, including selection of predictors, decisions on
transformations, and tests of interaction with other variables or time. Omitting these
steps is common in clinical research but can lead to biased assessments of fit, even in
the validation sample.
A strength of the bootstrap is that it uses all the data used
to develop the prediction model and provides a mechanism to account for model
overfitting or uncertainty in the entire model derivation process. It quantifies any
optimism in the final prediction model and provides an estimate of the shrinkage factor
that can be used to adjust the regression coefficients and apparent performance for
optimism, so that better performance is obtained in subsequent model validation
studies and applications.
In this study, the bootstrap technique with 450 replications was
applied for internal validation of the RAMA-AS (99). For each bootstrap sample, the
RAMA-AS suggested from the derivation phase was calculated for each patient
and then fitted in a logistic model. The score performances (i.e., discrimination and
calibration) were estimated. For calibration, the correlation between the observed and
expected values of appendicitis was assessed using the Somers' D correlation
coefficient for all bootstrap data (called Dboot) and for the derivation data (called Dorg).
Calibration of the model was then assessed by subtracting Dorg from the mean
Dboot. A lower value reflected less bias and thus better calibration. Discrimination of
the model was also assessed by comparing the original C statistic with the average C
statistic from the bootstraps. The command used for the bootstrap is provided in
Appendix C.
3.6.3.2 External validation
The two most important measures of model performance are
discrimination and calibration. In the derivation phase, the primary interest is
discrimination, because the model is likely to be well calibrated (on average) by
definition. In validation studies, assessment of both discrimination and calibration is
crucial. Discrimination is the ability of a prediction model to differentiate between
subjects with a negative and a positive outcome event. A model has perfect
discrimination if the predicted risks for all individuals with a positive outcome
are higher than those for all individuals with a negative outcome. Discrimination is
estimated by the concordance index (c-index), which is the probability that, for any
randomly selected pair of individuals, one with a positive and one with a negative
outcome, the model assigns a higher probability to the individual with the positive
outcome. The c-index is identical to the area under the receiver-operating
characteristic (ROC) curve for models with binary endpoints.
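The pairwise definition above can be written compactly. In the expression below, $\hat{p}_i$ denotes the predicted probability for individual $i$ and $y_i$ the observed outcome (1 = positive, 0 = negative). These symbols, and the convention of counting tied predictions as one half, are assumptions added for illustration and are not notation taken from the thesis.

$$
c = \Pr\left(\hat{p}_i > \hat{p}_j \mid y_i = 1,\ y_j = 0\right) + \tfrac{1}{2}\,\Pr\left(\hat{p}_i = \hat{p}_j \mid y_i = 1,\ y_j = 0\right)
$$

A value of 0.5 corresponds to no discrimination and a value of 1 to perfect discrimination.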
Calibration reflects the agreement between predicted
and observed outcomes. Calibration should be reported graphically, with
predicted outcome probabilities on the X-axis plotted against observed outcome
frequencies on the Y-axis. The plot is commonly drawn by dividing the predicted
risk into 6 to 10 groups (sixths to tenths) and should be augmented by a
smoothed line over the entire predicted probability range. This plot displays the
Fac. of Grad. Studies, Mahidol Univ. Ph.D.(Clinical Epidemiology) / 63
direction and magnitude of model miscalibration across the probability range, which
can be combined with estimates of the calibration slope and intercept. Whether smoothed or drawn by
subgroups, a well-calibrated model shows predictions lying on or around the 45° line
of the calibration plot, and perfect calibration shows a slope of 1 and an intercept of 0.
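The grouped calibration plot described above can be prepared along the following lines. This is a minimal Python sketch under stated assumptions: the function names, the equal-sized grouping of sorted predictions, and the toy data are hypothetical and do not reproduce the thesis's Stata code. The second function gives an overall observed/expected (O/E) ratio, a simple summary in which 1 means that the total number of predicted events equals the number observed.

```python
def calibration_groups(predicted, observed, groups=10):
    """Return (mean predicted risk, observed frequency, n) for each group of sorted predictions."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if chunk:  # groups can be empty when n < groups
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            obs_freq = sum(y for _, y in chunk) / len(chunk)
            table.append((mean_pred, obs_freq, len(chunk)))
    return table


def observed_expected_ratio(predicted, observed):
    """Observed number of events divided by the sum of predicted probabilities."""
    return sum(observed) / sum(predicted)


# Hypothetical predicted risks and outcomes (1 = appendicitis).
toy_pred = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.90, 0.95, 0.97]
toy_obs = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
for mean_pred, obs_freq, size in calibration_groups(toy_pred, toy_obs, groups=5):
    # Points with obs_freq close to mean_pred lie near the 45-degree line.
    print(f"predicted={mean_pred:.2f}  observed={obs_freq:.2f}  n={size}")
print(f"O/E = {observed_expected_ratio(toy_pred, toy_obs):.2f}")
```

Plotting the first column against the second gives the points of the calibration plot.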
Data from the two external hospitals mentioned previously were used to
validate the performance of RAMA-AS. The RAMA-AS was calculated for each
patient according to the derived model and fitted in the logit model.
Model performances (i.e., calibration and discrimination) were then assessed as
described previously. If the model was not well calibrated, it was re-calibrated by
updating the intercept (called M1) and the overall coefficient (called M2) (109, 110),
as shown in Table 3.2. M1 was constructed by fitting RAMA-AS on the appendicitis variable in the two
external data sets separately. The estimated intercept of this model was then used for
re-calibration by adding it to the original intercept. The estimated coefficient from
this model was then used to calibrate the coefficients by multiplying it with the overall
coefficients (M2). In addition, four model revisions were performed from
M2, as shown in Table 3.2 (109-112). M3 was constructed by fitting M2 with the
seven individual predictors as additional terms. A likelihood ratio test was applied to compare
M2 with and without each additional predictor, and only significant predictors were kept in
M3. M4 was constructed by fitting M2 plus the significant predictors from
stepwise selection among the seven predictors. M5 was constructed by re-estimating
all coefficients of the seven predictors. Finally, M6 was constructed by re-selecting
only the significant predictors out of the seven predictors.
In summary, the rationale for models M0 to M6 is
summarized as follows (110).
M0: no adjustment; the original model had good calibration.
M1: adjustment of the intercept (i.e., baseline risk) because of
difference in occurrence of appendicitis between derivation and validation samples.
M2: M1 plus adjustment of all predictor regression coefficients
by one overall factor because the regression coefficient of the original model was
overfitted or underfitted.
M3: M2 plus extra adjustment of regression coefficients for
predictors with different strengths in the validation sample as compared with the
derivation sample. The rationale is as in M2; in addition, the strength (regression
coefficient) of one or more predictors may be different in the validation sample.
M4: M2 plus stepwise selection of additional predictors. The
rationale is as in M2; in addition, one or more potential predictors were not included in the
original model, or a newly discovered variable may need to be added.
M5: Re-estimation of all regression coefficients by using the
data of the validation sample only, because the strength of all predictors may be
different in the validation sample or the validation sample is much larger than the
development sample.
M6: M5 plus additional predictors by stepwise selection. The
rationale is the same as in M5 and one or more potential predictors are not included in
the original model.
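The first two updating steps (M1 and M2) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's Stata code: the function names and the toy data are hypothetical, the calibration model is fitted with a hand-written Newton-Raphson routine, and the update follows the wording of the text (the estimated intercept is added to the original intercept, and the estimated slope multiplies the original coefficients). The numerical inputs to the update are the Thammasat values reported in Table 4.7.

```python
import math


def fit_calibration_model(lp, y, iterations=25):
    """Fit logit P(y=1) = a + b*lp by Newton-Raphson and return (a, b)."""
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(lp, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += t - p            # score for the intercept
            g1 += (t - p) * x      # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break                  # information matrix is singular; stop updating
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def update_m1_m2(alpha0, coefficients, a, b):
    """M1 shifts the intercept by a; M2 also multiplies every coefficient by b."""
    m1 = {"intercept": alpha0 + a, "coefficients": dict(coefficients)}
    m2 = {"intercept": alpha0 + a,
          "coefficients": {name: b * beta for name, beta in coefficients.items()}}
    return m1, m2


# Hypothetical validation data: linear predictor (RAMA-AS) and outcome.
toy_lp = [-2, -1, -1, 0, 0, 1, 1, 2]
toy_y = [0, 0, 1, 0, 1, 0, 1, 1]
a_hat, b_hat = fit_calibration_model(toy_lp, toy_y)
print(f"calibration intercept={a_hat:.3f}, slope={b_hat:.3f}")

# Update using the Thammasat estimates reported in Table 4.7 (a = -0.376, b = 0.929).
original = {"rebound tenderness": 1.53, "body temperature": 1.64}  # two of the seven predictors
m1, m2 = update_m1_m2(-3.374, original, a=-0.376, b=0.929)
print(m1["intercept"], m2["coefficients"])
```

A calibration slope below 1 shrinks all coefficients, which is the usual correction when the original model was overfitted.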
When validating a derivation model in other individuals, the
performance of the model is usually poorer than the performance estimated in the
individuals from whom the model was derived. This is likely caused by differences in
study design, measurement of predictors, or prevalence of the outcome of interest. When
predictive accuracy is lower, we may reject the model, refit it, or derive a new model.
However, a newly derived model may encounter the same problem, i.e.,
overfitting, and may be even less generalizable than the original model. In addition, prior
knowledge captured in the original studies would not be used optimally, which is at odds with
evidence-based medicine, which should be based on as much data as possible.
Therefore, before deriving a new model from the validation data, we should first
try to adjust the original model to determine to what extent the loss in predictive accuracy
can be overcome. An adjusted model combines the information captured in the
original model with information from individuals in the validation set, and is thus more
likely to show improved transportability to other individuals. Several methods have been
recommended for updating prediction models (2, 20, 31, 102, 290, 372, 373). The
derivation and validation data sets commonly differ in the proportion of outcome events,
yielding poor calibration of the original model in the new data. By adjusting or
updating the intercept or baseline hazard of the original model to the validation
sample, calibration is often improved. More advanced updating methods range from
overall adjustment of all predictor weights by a single recalibration factor, through adjustment
of a particular predictor weight or addition of a new predictor, to re-estimation of all
individual regression coefficients.
3.6.3.3 Comparison of RAMA-AS and previous scores
Data from our previous systematic review (60) suggested that the
Eskelinen (95), Alvarado (45), and Fenyo (39) scores provided good discrimination.
The Alvarado, Fenyo, and Eskelinen scores were therefore calculated for individual
patients. Performances of these scores were then estimated, and their C-statistics were
compared with that of RAMA-AS using ROC curve analysis.
All analyses were performed using Stata version 14 (StataCorp, College
Station, Texas, USA) with the mi estimate command. A P-value of less than 0.05 was
taken as the threshold for statistical significance.
3.7 Ethics considerations
3.7.1 Informed consent if needed
The informed consent form is shown in the appendix section. This research
study was submitted to the ethics committee for approval. It was conducted
according to the principles of the Declaration of Helsinki and in accordance with the
Medical Research Involving Human Subjects Act. The protocol was submitted for
approval to the ethics committee of the Faculty of Medicine Ramathibodi Hospital,
Mahidol University. The principles of respect for persons, beneficence,
non-maleficence, and justice were applied in this research.
3.7.1.1 Respect for Persons
This principle incorporates two ethical convictions. First, all
participants were treated as autonomous agents; second, participants with diminished
autonomy were protected. They were treated with dignity and respect. Medical
and surgical care was not disturbed by this study. This study did not use participants
as a means to an end or in a manner inconsistent with their interests or wishes.
All treatments were given to participants at the standard quality of care.
Decisions on sending participants for investigations, giving medical treatment, and
performing surgery were not disturbed. The data were collected in a case record form by history
taking, physical examination, and review of medical records. Telephone follow-up
may have disturbed participants, although the convenient time of each participant was
recorded and used for contact.
3.7.1.2 Beneficence
This study prevented and removed harm and promoted
the good of the patient by minimizing possible harms or risks and maximizing
potential benefits. The principle of non-maleficence, which prohibits the infliction of harm, injury, or
death upon others, was applied (the maxim Primum non nocere: “Above all do no
harm”). This study did not disturb the process of care. No new therapy or medicine
was used in this study.
3.7.1.3 Justice
Each participant was treated fairly and equitably, and was
given his or her due. The study used available resources fairly and distributed them
fairly and equitably. All participants were treated equally. They had the right to leave
the study at any time without affecting the quality of care.
3.7.1.4 Protection
This study assured that participation was voluntary and that patients were
treated with equality and fairness. The participants were informed of all
procedures related to them and their families.
Table 3.1 Estimation of sample size
Variables | Categories | Prevalence of appendicitis | Expected difference (%) | Sample size (by PS program)
Age (years) | < 40 / ≥ 40 | 0.72 / 0.70 | none | -
Sex | male / female | 0.83 / 0.61 | 20 | 76×2=152
Onset of pain | insidious / sudden | 0.93 / 0.62 | 20 | 54×2=108
Duration of pain (hours) | < 48 / ≥ 48 | 0.75 / 0.52 | 20 | 88×2=176
Migration of pain | presence / absence | 0.80 / 0.63 | 17 | 110×2=220
Aggravation of pain | presence / absence | 0.65 / 0.35 | 20 | 96×2=192
Nausea | presence / absence | 0.74 / 0.69 | 20 | 89×2=178
Vomiting | presence / absence | 0.73 / 0.70 | 20 | 90×2=180
Dysuria | presence / absence | 0.10 / 0.71 | 20 | 43×2=86
Diarrhea | presence / absence | 0.75 / 0.70 | 20 | 25×2=50
Fever (T>37.8°C) | presence / absence | 0.83 / 0.61 | 20 | 87×2=174
Tender at RLQ | presence / absence | 0.71 / 0.70 | 20 | 80×2=160
Abdominal guarding | presence / absence | 0.79 / 0.64 | 15 | 141×2=282
Rovsing sign | presence / absence | 0.75 / 0.69 | none | -
PR tenderness | presence / absence | 0.74 / 0.69 | none | -
Increase WBC | presence / absence | 0.77 / 0.20 | 20 | 86×2=172
Total | | 0.62 | |
PR=per rectal examination, RLQ=right lower quadrant, WBC=white blood cell
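Table 3.2 Re-calibration and revision of models for external validations
Type of update model | Term | Thammasat | Chaiyaphum
M0: Original model | $\hat{\alpha}_0$ | $\hat{\alpha}_1 + \hat{\beta}_1(\text{RAMA-AS})$ | $\hat{\alpha}_2 + \hat{\beta}_2(\text{RAMA-AS})$
M1: Re-calibrate intercept | $\alpha$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2$
M2: Re-calibration | $\alpha$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2$
 | $\beta_{overall}$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1 + \hat{\beta}_1(\text{RAMA-AS})$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2 + \hat{\beta}_2(\text{RAMA-AS})$
M3: Revision M2+$\gamma_i X_i$ | $\alpha$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2$
 | $\beta_{overall}$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1 + \hat{\beta}_1(\text{RAMA-AS})$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2 + \hat{\beta}_2(\text{RAMA-AS})$
 | Likelihood ratio test | $+ \hat{\gamma}_1 x_1 + \hat{\gamma}_2 x_2 + \dots$ | $+ \hat{\gamma}_1 x_1 + \hat{\gamma}_2 x_2 + \dots$
M4: Revision M2+$\gamma_i X_i$ | $\alpha$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2$
 | $\beta_{overall}$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1 + \hat{\beta}_1(\text{RAMA-AS})$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2 + \hat{\beta}_2(\text{RAMA-AS})$
 | Stepwise selection | $+ \hat{\gamma}_1 x_1 + \hat{\gamma}_2 x_2 + \dots$ | $+ \hat{\gamma}_1 x_1 + \hat{\gamma}_2 x_2 + \dots$
M5: Enter all predictors | | $\hat{\alpha}_1 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_p x_p$ | $\hat{\alpha}_2 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_p x_p$
M6: Stepwise selection | | $\hat{\alpha}_1 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots$ | $\hat{\alpha}_2 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots$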
Figure 3.1 Rebound tenderness
CHAPTER IV
RESULTS
4.1 Characteristics of patients
A total of 396 patients with suspected acute appendicitis were included in
the part I derivation study. The baseline data are shown in Table 4.1. One hundred
and thirty-two patients (33.3%) were male, and the mean age and BMI were 36.3±14.6 years and
22.8±4.5 kg/m2, respectively. A total of 245/396 (61.8%, 95% CI: 56.9%, 66.7%) patients
had appendicitis, with a negative appendectomy rate of 4%.
4.2 Imputation
Two variables (i.e., WBC count and neutrophils) contained missing data of
10.9% and 10.1%, respectively (Table 4.2). Exploring the distribution of these missing
values suggested that they were not a subset of each other; thus the
missing-data pattern was assumed to be arbitrary. Therefore, data imputation
based on the assumption of missing at random (MAR) could be applied. Imputed data were filled in for 43 and
40 subjects for WBC and neutrophils, respectively. The observed and imputed values
for all variables were very similar. Performance of the imputation was assessed by
estimating the RVI and FMI (see Table 4.2). The average RVI was <0.0001 and the median
estimated FMI was <0.0001 (range: <0.0001, 0.0535); the maximum of 0.0535
required at least about 6 imputations. Therefore, 20 imputations were sufficient.
Diagnostic plots were produced for the WBC and neutrophil variables by comparing
the distributions of imputed and observed values, which suggested no difference
between the two (see Figure 4.1A and B).
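The figure of about 6 imputations is consistent with one common rule of thumb, namely that the number of imputations should be at least 100 times the largest FMI. The thesis does not state which rule was applied, so the rule itself is an assumption here, and the function name in this short Python sketch is hypothetical.

```python
import math


def min_imputations(largest_fmi):
    """Rule of thumb (assumed, not stated in the thesis): m >= 100 * largest FMI."""
    if not 0.0 <= largest_fmi <= 1.0:
        raise ValueError("FMI must lie between 0 and 1")
    return math.ceil(100 * largest_fmi)


# Largest FMI reported in the text is 0.0535.
print(min_imputations(0.0535))
```

Under this rule, the 20 imputations used in the study exceed the minimum by a wide margin.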
4.3 Model development
4.3.1 Derivation
Twenty variables were analyzed in the univariate analysis, as shown in
Table 4.3. This step suggested 15 predictive variables that
might be associated with appendicitis. These included symptoms (i.e., first location of
pain, migration of pain, onset, progression of pain, right lower quadrant pain at
presentation, nausea or vomiting, aggravation of pain by cough or movement, fever),
signs (i.e., bowel sound, body temperature, tenderness at right lower quadrant of
abdomen, rebound tenderness, guarding), and laboratory results (i.e., WBC >10,000
cells/mm3 and neutrophils >75%).
These parameters were simultaneously included in a multiple logistic
regression model. The LR test with backward elimination was applied and suggested that 7
variables should remain in the final model. These were 3 symptoms (i.e., migration of
pain, progression of pain, and aggravation of pain by cough or movement), 2 signs
(i.e., body temperature ≥ 37.8°C and rebound tenderness), and 2 laboratory results (i.e.,
WBC >10,000 cells/mm3 and neutrophils >75%). The coefficients, odds ratios, and 95%
CIs were estimated and reported (see Table 4.4). The predictive equation was
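$$
\begin{aligned}
\ln\left(\frac{P}{1-P}\right) = {} & -3.37 + 0.80\,(\text{migration of pain}) + 1.04\,(\text{progression of pain}) \\
& + 0.78\,(\text{aggravation of pain by cough or movement}) \\
& + 1.64\,(\text{body temperature}) + 1.53\,(\text{rebound tenderness}) \\
& + 0.91\,(\text{white blood cell}) + 0.69\,(\text{neutrophils})
\end{aligned}
$$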
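As an illustration of how the equation is used, the following minimal Python sketch computes the score (the log odds) and converts it to a predicted probability. The function and dictionary names are hypothetical, and each predictor is assumed to be coded 1 when present (or above its threshold) and 0 otherwise.

```python
import math

INTERCEPT = -3.37
# Coefficients as written in the predictive equation. Migration of pain is 0.80 in the
# equation, while the scoring column of Table 4.4 lists 0.77.
COEFFICIENTS = {
    "migration of pain": 0.80,
    "progression of pain": 1.04,
    "aggravation of pain by cough or movement": 0.78,
    "body temperature >= 37.8 C": 1.64,
    "rebound tenderness": 1.53,
    "WBC > 10,000 cells/mm3": 0.91,
    "neutrophils > 75%": 0.69,
}


def rama_as_score(findings):
    """Sum the intercept and the coefficients of all predictors recorded as present."""
    unknown = set(findings) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    return INTERCEPT + sum(COEFFICIENTS[name] for name, present in findings.items() if present)


def predicted_probability(score):
    """Invert the logit: P = 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical patient with rebound tenderness and a raised WBC count only.
patient = {"rebound tenderness": True, "WBC > 10,000 cells/mm3": True}
score = rama_as_score(patient)
print(f"score={score:.2f}, predicted probability={predicted_probability(score):.3f}")
```

A patient with none of the seven findings receives the intercept alone as the score.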
Among the 5 predictors in the sign domain, only body temperature ≥ 37.8°C and
rebound tenderness were significant, with odds ratios (ORs) of 5.1 (95% CI: 2.1,
12.1) and 4.6 (95% CI: 2.7, 7.7), respectively. Among the 12 predictors in the symptom
domain, progression of pain, aggravation of pain, and migration of pain were
significantly associated with appendicitis, with ORs of 2.8 (95% CI: 1.3, 5.9),
2.2 (95% CI: 1.2, 3.8), and 2.6 (95% CI: 1.3, 3.7), respectively. Both
predictors in the laboratory domain were significantly associated with appendicitis:
patients with WBC >10,000 cells/mm3 and neutrophils >75% had a significantly
higher risk of appendicitis, with ORs of 2.6 (95% CI: 1.3, 5.0) and 2.3 (95% CI:
1.2, 4.1), respectively (Table 4.4).
4.3.2 Model performance
The estimated C-statistic of this model was 0.842 (95% CI: 0.804, 0.881)
(see Figure 4.2), indicating that the model performed well in discriminating appendicitis
from non-appendicitis. The Hosmer-Lemeshow goodness-of-fit test indicated that the model
fitted the data well (Chi-square = 5.64, df = 8, P-value = 0.69), with an O/E
ratio of 0.95 (95% CI: 0.83, 1.08).
The scoring scheme was constructed using the estimated coefficients of the 7
significant predictors described in Table 4.4. The scores ranged from -3.37 to 3.99
with a median of 0.86. ROC curve analysis was applied to calibrate the score
cutoffs, which were chosen arbitrarily on the basis of LR+ performance and ease of
application in clinical practice. As a result, the score was stratified into 4 categories, i.e., very
low risk (score < -0.64), low risk (score -0.64 to 0.84), moderate risk (score 0.85 to
1.74), and high risk (score >1.74) groups (Table 4.5). The estimated
LR+ for the latter 3 groups were 1.98 (95% CI: 1.65, 2.37), 5.25 (95% CI: 3.39, 8.13),
and 8.36 (95% CI: 3.96, 18.00), respectively, when compared with the lowest risk group. The
post-test probability was estimated from the pretest probability (i.e., prevalence) of
appendicitis of 61.8%, which yielded post-test probabilities of 76.0%, 89.0%, and
93.0% for the low, moderate, and high risk groups, respectively (Figure 4.3).
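The stratification and the conversion from pretest to post-test probability can be checked with a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, and scores falling between the published cutoffs (for example between 0.84 and 0.85) are assigned to the higher category as an assumption. The conversion multiplies the pretest odds by LR+ and transforms the result back to a probability.

```python
def risk_group(score):
    """Map a RAMA-AS value to the four risk categories described in the text."""
    if score < -0.64:
        return "very low"
    if score <= 0.84:
        return "low"
    if score <= 1.74:
        return "moderate"
    return "high"


def post_test_probability(pretest_probability, likelihood_ratio):
    """Post-test odds = pretest odds x LR; then convert the odds back to a probability."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRETEST = 0.618  # prevalence of appendicitis in the derivation data
for group, lr_plus in [("low", 1.98), ("moderate", 5.25), ("high", 8.36)]:
    print(f"{group}: {100 * post_test_probability(PRETEST, lr_plus):.1f}%")
```

With the reported LR+ values, the results round to the 76%, 89%, and 93% quoted in the text.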
4.3.3 Internal validation
The 450-replication bootstrap yielded estimated Dorg and Dboot
coefficients of 0.717 and 0.725 (95% CI: 0.721, 0.728) for the derivation and bootstrap
models, respectively. The bias was only -0.0073 (95% CI: -0.0109, -0.0036), which
suggested good calibration. The bootstrap C-statistic was 0.8623 (95% CI: 0.8605,
0.8641), with a bias of -0.004 (95% CI: -0.006, -0.0023).
4.4 External validation
A total of 330 patients with suspected acute appendicitis (152 and 178
from Thammasat University Hospital and Chaiyaphum Hospital, respectively) were
used to externally validate the RAMA-AS. Characteristics of patients in both data
sets are described in Table 4.6.
Compared with Ramathibodi Hospital, the prevalence of appendicitis in Thammasat University
Hospital was much lower (48.7% vs 61.8%), mean age was similar (35.6 vs 36.3 years),
and the percentage of males was much lower (26.4% vs 35.8%) (see Table 4.6). Among the seven
predictors, the frequencies of rebound tenderness (42.8% vs 48.5%), progression of pain
(64.5% vs 84.8%), and aggravation of pain (51.4% vs 72.5%) were slightly to much
lower, whereas those of migration of pain (48.0% vs 44.7%), body temperature ≥37.8 °C (19.7% vs 18.7%),
WBC >10,000 cells/mm3 (82.2% vs 79.6%), and neutrophils >75% (75.7% vs
66.2%) were slightly to much higher.
The estimated RAMA-AS ranged from -3.4 to 4.0 with a median of 0.1.
The derivation model seemed to work well in Thammasat University Hospital, with
an estimated O/E ratio of 1.005 (95% CI: 0.784, 1.225; Hosmer-Lemeshow = 8.219,
df = 4, p = 0.838). However, the calibration plot showed that the predicted risk deviated
from the reference line (see Figure 4.4-A), i.e., risk was over-estimated for lower scores and
under-estimated for higher scores. The intercept and overall coefficients were then
re-calibrated (see Table 4.7), and calibration plots were constructed (see Figure 4.4-B-C),
which suggested no improvement in calibration.
Model revisions were then performed (i.e., M3-M6). Migration
of pain, progression of pain, body temperature, WBC, and neutrophils were significant
predictors by the likelihood ratio test in M3 (see Table 4.7). Comparing the
coefficients of M3 with those of M0, the coefficients of body temperature, WBC, and neutrophils,
whose frequencies were higher than in Ramathibodi Hospital,
changed from positive to negative, whereas the coefficients of the remaining
predictors, whose frequencies were lower, increased. Only migration of pain,
progression of pain, and rebound tenderness were significant positive predictors
after stepwise selection for M4. Of these, the frequencies of progression of pain and
rebound tenderness were much lower, but that of migration of pain was higher, than
in Ramathibodi Hospital (see Table 4.7).
Calibrations of these models are plotted in Figure 4.4-D-G. The
O/E ratios for the revised M3 model (re-calibrated intercept and overall coefficient
plus predictors selected by the likelihood ratio test) and the revised M4 model
(re-calibrated intercept and overall coefficient plus
stepwise selection of significant predictors) were 0.940 (95% CI: 0.729, 1.150;
Hosmer-Lemeshow = 2.683, df = 4, p = 0.612) and 1.006 (95% CI: 0.743, 1.269;
Hosmer-Lemeshow = 5.00, df = 7, p = 0.660), respectively, which were much improved when
compared with M0. C-statistics were estimated for all models (see Table 4.8). These
suggested that the RAMA-AS could discriminate appendicitis from
non-appendicitis well, with C-statistics of 0.853 (95% CI: 0.790, 0.915), 0.877 (95% CI:
0.823, 0.932), and 0.881 (95% CI: 0.828, 0.935) for M0, M3, and M4, respectively.
Compared with Ramathibodi Hospital (see
Table 4.6), the prevalence of appendicitis in Chaiyaphum Hospital was much higher
(76.9% vs 61.8%), mean age was older (42.9 vs 36.3 years), and the percentage of males
was higher (39.9% vs 35.8%). Among the seven predictors, three, namely
migration of pain (70.2% vs 44.7%), body temperature ≥37.8 °C (37.6% vs 18.7%), and rebound
tenderness (71.3% vs 48.5%), were more frequent; aggravation of pain was
much less frequent (58.4% vs 72.5%); and the rest, i.e., progression of pain (82.6%
vs 84.8%), WBC >10,000 cells/mm3 (76.9% vs 79.6%), and neutrophils >75% (63.5% vs
66.2%), were slightly less frequent than in Ramathibodi Hospital.
The median RAMA-AS was 1.6 (range: -3.4, 4.0), with an O/E ratio of 0.996 (95% CI:
0.659, 1.333; Hosmer-Lemeshow = 6.640, df = 4, p = 0.156). Re-calibrations of the intercept
and overall coefficient were constructed (see Table 4.7), and calibration plots were
graphed for the original model M0 (Figure 4.5-A), M1-M2 (Figure 4.5 B-C), and the
revised models M3-M6 (Figure 4.5 D-G). These suggested that M0 still
deviated from the reference line, particularly for low and high scores. M1 and M2
did not improve calibration when compared with the original M0. Among the revised
M3-M6 models, M3-M5 improved calibration, but M6 was the best, with an
O/E ratio of 1.021 (95% CI: 0.947, 1.094). The estimated C-statistic for M6 was
0.860 (95% CI: 0.790, 0.930) (see Table 4.8).
4.5 Comparison of RAMA-AS and previous scores
Alvarado, Fenyo, and Eskelinen scores were calculated, and ranged from
2 to 10 (mean = 7.04), 0 to 56 (mean = 25.4), and 2 to 15 (mean = 9.9), respectively.
These scores were then compared with RAMA-AS using ROC curve analysis (see
Figure 4.6). This yielded C-statistics of 0.752
(95% CI: 0.710, 0.800), 0.764 (95% CI: 0.716, 0.813), and 0.622 (95% CI: 0.567,
0.676), respectively, indicating statistically lower discriminative ability than our RAMA-AS
(P-value < 0.001, see Figure 4.6).
Table 4.1 Baseline data of 396 patients from Ramathibodi Hospital, 152 patients from Thammasat Hospital, and 178 patients from Chaiyaphum Hospital
Characteristics | Ramathibodi: Appendicitis, n(%) | Ramathibodi: Non-appendicitis, n(%) | Thammasat: Appendicitis, n(%) | Thammasat: Non-appendicitis, n(%) | Chaiyaphum: Appendicitis, n(%) | Chaiyaphum: Non-appendicitis, n(%)
Progression of pain: Yes | 223(92.5) | 113(72.9) | 67(90.5) | 31(39.7) | 123(87.9) | 11(61.1)
Progression of pain: No | 18(7.5) | 42(27.1) | 7(9.5) | 47(60.3) | 17(12.1) | 7(38.9)
Aggravation of pain: Yes | 199(82.6) | 88(56.8) | 55(74.3) | 23(29.5) | 96(68.6) | 9(50.0)
Aggravation of pain: No | 42(17.4) | 67(43.2) | 19(25.7) | 55(70.5) | 44(31.4) | 9(50.0)
Migration of pain: Yes | 130(53.9) | 47(30.3) | 55(74.3) | 18(23.1) | 105(75.0) | 9(50.0)
Migration of pain: No | 111(46.1) | 108(69.7) | 19(25.68) | 60(76.9) | 35(25.0) | 9(50.0)
Body temperature ≥37.8 °C: Yes | 65(26.9) | 9(5.8) | 21(28.4) | 9(11.5) | 52(37.1) | 1(5.6)
Body temperature ≥37.8 °C: No | 176(73.0) | 146(94.2) | 53(71.6) | 69(88.5) | 88(62.9) | 17(94.4)
Rebound tenderness: Yes | 155(64.3) | 37(23.9) | 48(64.9) | 17(21.8) | 119(85.0) | 6(33.3)
Rebound tenderness: No | 86(35.7) | 118(76.1) | 26(35.1) | 61(78.2) | 21(15.0) | 12(66.7)
WBC (cells/mm3): >10,000 | 215(89.2) | 100(64.5) | 58(78.4) | 67(85.9) | 122(87.1) | 13(72.2)
WBC (cells/mm3): ≤10,000 | 26(10.8) | 55(35.5) | 16(21.6) | 11(14.1) | 18(12.9) | 5(27.8)
Neutrophils (%): >75 | 187(77.6) | 75(48.4) | 54(72.9) | 61(78.2) | 102(72.9) | 13(72.2)
Neutrophils (%): ≤75 | 54(22.4) | 80(51.6) | 20(27.0) | 17(21.8) | 38(27.1) | 5(27.8)
Table 4.2 Report of number of missing data
Variables | Missing (%) | Observed | Imputed | FMI | RVI
WBC | 10.86 | 353 | 43 | <0.0001 | <0.0001
Neutrophils | 10.10 | 356 | 40 | <0.0001 | <0.0001
Lymphocytes | 10.10 | 356 | 40 | <0.0001 | <0.0001
FMI = largest fraction of missing information of coefficient, RVI = average relative
increase in variances of estimates
Table 4.3 Description of patients’ characteristics in appendicitis and non-appendicitis
groups
Characteristics | Non-appendicitis n=155 | Appendicitis n=241 | OR (95%CI) | P-Value
Demographic
Age (years), mean (SD) | 33.8(11.9) | 37.9(15.9) | | <0.001
Age group: ≥40 years | 56(36.1) | 101(41.9) | 1.3(0.8-1.9) | 0.251
Age group: <40 years | 99(63.9) | 140(58.1) | 1 |
Sex, number (%): Male | 39(25.2) | 93(38.6) | 1.9(1.2-2.9) | <0.001
Sex, number (%): Female | 116(74.8) | 148(61.4) | 1 |
BMI (kg/m2), mean (SD) | 22.4(3.9) | 22.95(4.7) | | 0.23
Symptoms
First location of pain: Epigastrium | 40(25.8) | 102(42.3) | 2.2(1.4-3.4) | <0.001
First location of pain: Periumbilical | 24(15.5) | 31(12.9) | 1.1(0.6-1.9) |
First location of pain: Other | 91(58.7) | 108(44.8) | 1 |
Type of pain: Dull aching, constant | 49(31.6) | 82(34.0) | 1.1(0.7-1.7) | 0.620
Type of pain: Colicky | 106(68.4) | 159(65.9) | 1 |
Migration of pain: Presence | 47(30.3) | 130(53.9) | 2.7(1.8-4.1) | <0.001
Migration of pain: Absence | 108(69.7) | 111(46.1) | 1 |
Onset: Sudden | 35(22.6) | 95(39.4) | 2.2(1.4-3.5) | <0.001
Onset: Insidious | 120(77.4) | 146(60.6) | 1 |
Progression of pain: Yes | 113(72.9) | 223(92.5) | 4.6(2.5-8.4) | <0.001
Progression of pain: No | 42(27.1) | 18(7.5) | 1 |
RLQ pain at presentation: Yes | 140(90.3) | 239(99.2) | 12.8(2.9-56.8) | <0.001
RLQ pain at presentation: No | 15(9.7) | 2(0.8) | 1 |
Time of pain before presentation (hours): ≤ 48 | 126(81.3) | 204(84.7) | 1.3(0.7-2.2) | 0.382
Time of pain before presentation (hours): > 48 | 29(18.7) | 37(15.4) | 1 |
Time of RLQ pain before presentation (hours): ≤ 12 | 67(43.2) | 107(44.4) | 1.1(0.7-1.6) | 0.820
Time of RLQ pain before presentation (hours): > 12 | 88(56.8) | 134(55.6) | 1 |
Nausea or vomiting: Yes | 64(41.3) | 141(58.5) | 2.0(1.3-3.0) | <0.001
Nausea or vomiting: No | 91(58.7) | 100(41.5) | 1 |
RLQ, right lower quadrant
Table 4.3 Description of patients’ characteristics in appendicitis and non-appendicitis
groups (cont.)
Characteristics | Non-appendicitis n=151 | Appendicitis n=245 | OR (95%CI) | P-Value
Aggravation of pain: Yes | 88(56.8) | 199(82.6) | 3.6(2.3-5.7) | <0.001
Aggravation of pain: No | 67(43.2) | 42(17.4) | 1 |
Anorexia: Yes | 118(76.1) | 164(68.1) | 0.7(0.4-1.1) | 0.083
Anorexia: No | 37(23.9) | 77(31.9) | 1 |
Fever: Yes | 135(87.1) | 154(63.9) | 0.3(0.2-0.4) | <0.001
Fever: No | 20(12.9) | 87(36.1) | 1 |
Signs
Bowel sound: Increase | 20(12.9) | 37(15.4) | 1.4(0.8-2.5) | 0.044
Bowel sound: Decrease | 16(10.3) | 45(18.7) | 2.1(1.1-3.9) |
Bowel sound: Normal | 119(76.8) | 159(65.9) | 1 |
Body temperature (°C): ≥ 37.8 | 9(5.8) | 65(26.9) | 5.9(2.8-12.4) | <0.001
Body temperature (°C): < 37.8 | 146(94.2) | 176(73.0) | 1 |
Tenderness at RLQ: Yes | 137(88.4) | 240(99.6) | 31.5(4.2-238.8) | <0.001
Tenderness at RLQ: No | 18(11.6) | 1(0.4) | 1 |
Rebound tenderness: Yes | 37(23.9) | 155(64.3) | 5.8(3.7-9.1) | <0.001
Rebound tenderness: No | 118(76.1) | 86(35.7) | 1 |
Guarding: Yes | 26(16.8) | 82(34.0) | 2.6(1.6-4.2) | <0.001
Guarding: No | 129(83.2) | 159(65.9) | 1 |
Laboratory results
WBC (cell/mm3): >10,000 | 100(64.5) | 215(89.2) | 4.6(2.7-7.7) | <0.001
WBC (cell/mm3): ≤ 10,000 | 55(35.5) | 26(10.8) | 1 |
Neutrophil (%): >75 | 75(48.4) | 187(77.6) | 3.7(2.4-5.8) | <0.001
Neutrophil (%): ≤ 75 | 80(51.6) | 54(22.4) | 1 |
Table 4.4 Factors associated with appendicitis: Multiple logistic regression analysis
Domain | Parameters | Coefficient | SE | P value | OR (95%CI) | Scoring
Symptoms | Progression of pain | 1.04 | 0.4 | 0.007 | 2.8 (1.3-5.9) | 1.04
Symptoms | Aggravation of pain by cough or movement | 0.78 | 0.3 | 0.009 | 2.2 (1.2-3.8) | 0.78
Symptoms | Migration of pain | 0.80 | 0.3 | 0.004 | 2.6 (1.3-3.7) | 0.77
Signs | Body temperature ≥37.8 °C | 1.64 | 0.5 | <0.001 | 5.1 (2.1-12.1) | 1.64
Signs | Rebound tenderness | 1.53 | 0.3 | <0.001 | 4.6 (2.7-7.7) | 1.53
Lab results | WBC >10,000 cells/mm3 | 0.91 | 0.3 | 0.005 | 2.6 (1.3-5.0) | 0.91
Lab results | Neutrophils >75% | 0.69 | 0.3 | 0.010 | 2.3 (1.2-4.1) | 0.69
Constant | | -3.37 | | | |
Total | | | | | | 3.99
WBC= white blood cell count
Table 4.5 Risk stratification and predictive values of a RAMA-AS prediction score (score development for derivative phase)
Score | Risk groups | AP | Non-AP | %Sensitivity (95%CI) | %Specificity (95%CI) | LR+ (95%CI) | LR- (95%CI) | Post-positive test odds (%) (95%CI)
<-0.64 | Very low risk | 25 | 85 | 100.00 | 0 | 1.00 | 0 | 61.80
-0.64 to 0.84 | Low risk | 61 | 51 | 89.75 (85.25-93.26) | 54.97 (46.67-63.06) | 1.98 (1.65-2.37) | 0.19 (0.13-0.28) | 76.00 (73.00-79.00)
0.85 to 1.74 | Intermediate risk | 64 | 12 | 64.08 (57.73-70.09) | 88.08 (81.82-92.78) | 5.25 (3.39-8.13) | 0.41 (0.34-0.49) | 89.00 (85.00-93.00)
>1.74 | High risk | 91 | 7 | 37.96 (31.86-44.36) | 95.36 (90.68-98.12) | 8.36 (3.96-18.00) | 0.65 (0.59-0.72) | 93.00 (86.00-97.00)
AP = appendicitis; Non-AP = non-appendicitis
Table 4.6 Key study characteristics of patients from derivative and external validation
Characteristics | RA (N=396) | TS (N=152) | CP (N=178)
Mean age (SD), years | 36.3(14.6) | 35.6(16.9) | 42.9(16.8)
Men | 132 (35.8%) | 40 (26.4%) | 71 (39.9%)
Symptoms
Progression of pain | 336 (84.8%) | 98 (64.5%) | 147 (82.6%)
Aggravation of pain | 287 (72.5%) | 78 (51.4%) | 104 (58.4%)
Migration of pain | 177 (44.7%) | 73 (48.0%) | 125 (70.2%)
Signs
Body temperature ≥37.8 °C | 74 (18.7%) | 30 (19.7%) | 67 (37.6%)
Rebound tenderness | 192 (48.5%) | 65 (42.8%) | 127 (71.3%)
Laboratory
WBC (>10,000 cells/mm3) | 315 (79.6%) | 125 (82.2%) | 137 (76.9%)
Neutrophils (>75%) | 262 (66.2%) | 115 (75.7%) | 113 (63.5%)
Prevalence of appendicitis | 245/396 (61.8%) | 74/152 (48.7%) | 137/178 (76.9%)
CP = Chaiyaphum Hospital; RA = Ramathibodi Hospital; TS = Thammasat Hospital
Table 4.7 Estimation of intercept and coefficients for external validations using
different update models
Type of update model | Thammasat (p value) | Chaiyaphum (p value)
M0: Original model
α = -3.374 | 0.376 | 0.501
M1: Re-calibration
α | -3.374-0.376 | -3.374+(0.501)
M2: Re-calibration
α | -3.374-0.376 | -3.374+(0.501)
βoverall | 0.929 | 0.798
M3: Revision M2+γiXi
α | -3.374-0.376 | -3.374+(0.501)
βoverall | 0.929 | 0.798
Migration of pain | 1.284 (0.004) | 0.0391 (0.930)
Progression of pain | 1.318 (0.046) | -0.490 (0.391)
Aggravation of pain | 0.426 (0.378) | 0.629 (0.184)
Body temperature | -1.332 (0.033) | -1.339 (0.024)
Rebound tenderness | 0.454 (0.332) | 1.937 (<0.001)
WBC | -1.353 (0.017) | -0.333 (0.519)
Neutrophils | -1.236 (0.017) | -1.275 (0.018)
M4:
α | -3.374-0.376 | -3.374+(0.501)
βoverall | 0.929 | 0.798
Migration of pain | 1.836 (<0.001) | 0.957 (0.043)
Progression of pain | 1.768 (0.001) |
Aggravation of pain | | 1.128 (0.016)
Rebound tenderness | 1.817 (<0.001) | 2.619 (<0.001)
M5:
α | -3.189 | -1.965
Migration of pain | 1.816 (<0.001) | 1.011 (0.041)
Aggravation of pain | 1.539 (0.006) | 1.971 (0.716)
Progression of pain | 0.512 (0.286) | 0.988 (0.043)
Body temperature | 0.485 (0.378) | 0.405 (0.431)
Rebound tenderness | 1.720 (<0.001) | 2.566 (<0.001)
WBC | 0.345 (0.615) | 1.129 (0.061)
Neutrophils | -0.282 (0.658) | -0.757 (0.221)
M6:
α | -2.959 | -1.430
Migration of pain | 1.836 (<0.001) | 0.957 (0.043)
Progression of pain | 1.768 (0.001) |
Aggravation of pain | | 1.128 (0.016)
Rebound tenderness | 1.817 (<0.001) | 2.619 (<0.001)
Table 4.8 Estimations of calibration coefficients and C-statistics for external
validations using different re-calibration and revision methods
Model | Thammasat: GoF test | Thammasat: O/E (95% CI) | Thammasat: C stat (95% CI) | Chaiyaphum: GoF test | Chaiyaphum: O/E (95% CI) | Chaiyaphum: C stat (95% CI)
M0 | 0.084 | 1.01 (0.78, 1.23) | 0.853 (0.791, 0.915) | 0.156 | 0.996 (0.659, 1.333) | 0.813 (0.736, 0.892)
M1 | 0.084 | 1.00 (0.78, 1.23) | 0.853 (0.791, 0.915) | 0.156 | 0.996 (0.659, 1.333) | 0.813 (0.736, 0.892)
M2 | 0.084 | 1.00 (0.78, 1.23) | 0.853 (0.791, 0.915) | 0.156 | 0.996 (0.659, 1.333) | 0.813 (0.736, 0.892)
M3 | 0.612 | 0.94 (0.73, 1.15) | 0.877 (0.823, 0.932) | 0.261 | 1.083 (0.734, 1.434) | 0.854 (0.777, 0.931)
M4 | 0.660 | 1.01 (0.74, 1.27) | 0.881 (0.828, 0.935) | 0.239 | 1.035 (0.651, 1.419) | 0.857 (0.788, 0.926)
M5 | 0.270 | 0.87 (0.58, 1.61) | 0.884 (0.832, 0.936) | 0.279 | 0.905 (0.622, 1.186) | 0.873 (0.809, 0.938)
M6 | 0.354 | 0.95 (0.68, 1.21) | 0.872 (0.817, 0.926) | 0.967 | 1.021 (0.947, 1.094) | 0.860 (0.790, 0.930)
GoF, Goodness of Fit
Figure 4.1 Diagnostic plots between missing and observed values
A: WBC
B: Neutrophils
[Figure not reproduced. Each panel shows kernel density plots (Y-axis label: kdensity labneu; X-axis: 20 to 100) for Imputation 1 to Imputation 20, with the legend Observed, Imputed, Completed.]
Figure 4.2 Receiver operating characteristic (ROC) curves of RAMA-AS for
diagnosis of appendicitis
[ROC curve: sensitivity plotted against 1 - specificity. Area under ROC curve = 0.8422]
Figure 4.3 Fagan's nomogram plot for RAMA-AS risk stratification
[Fagan's nomogram with axes for prior probability (%), likelihood ratio, and posterior probability (%).]
PreProb: 62%
1(+) LRP: 1.00, PostProb: 61.8%
2(+) LRP: 1.98, PostProb: 76.2%
3(+) LRP: 5.25, PostProb: 89.5%
4(+) LRP: 8.36, PostProb: 93.1%
1(-) LRN: 0.00, PostProb: .%
2(-) LRN: 0.19, PostProb: 23.5%
3(-) LRN: 0.41, PostProb: 39.9%
4(-) LRN: 0.65, PostProb: 51.3%
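The post-test probabilities printed on the nomogram follow from converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back to a probability. The following Python sketch of that standard calculation assumes the unrounded pre-test probability of 61.8% reported in the Discussion (the nomogram label shows it rounded to 62%). The function and variable names are hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' rule on the odds scale, as read off a Fagan nomogram."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRE_TEST = 0.618  # prevalence of appendicitis in the development data

# Likelihood ratios as printed in Figure 4.3 (groups 2 to 4).
positive_lrs = {2: 1.98, 3: 5.25, 4: 8.36}
negative_lrs = {2: 0.19, 3: 0.41, 4: 0.65}

# Post-test probabilities in percent, rounded to one decimal place.
positive_post = {group: round(100 * post_test_probability(PRE_TEST, lr), 1)
                 for group, lr in positive_lrs.items()}
negative_post = {group: round(100 * post_test_probability(PRE_TEST, lr), 1)
                 for group, lr in negative_lrs.items()}
```

With these inputs the sketch gives 76.2%, 89.5%, and 93.1% for the positive likelihood ratios and 23.5%, 39.9%, and 51.3% for the negative ones, matching the values shown in the figure.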
Figure 4.4 Calibration plots for external validations at Thammasat Hospital using
different update methods.
A) Original model M0
B) Re-calibration intercept M1 (calibrate intercept)
C) Re-calibration intercept M2 (calibrate intercept & overall coefficient)
D) Revision model M3
E) Revision model M4 (M2 + stepwise selection of significant predictors)
F) Revision model M5
G) Revision model M6 (stepwise selection of significant predictors)
[Each panel plots fitted values against predicted risk, both on a scale of 0 to 1, with a reference line.]
Figure 4.5 Calibration plots for external validations at Chaiyaphum Hospital using
different update methods.
A) Original model M0
B) Re-calibration intercept M1 (calibrate intercept)
C) Re-calibration intercept M2 (calibrate intercept & overall coefficient)
D) Revision model M3 (M2 + additional significant predictors from M2)
E) Revision model M4 (M2 + stepwise selection of significant predictors)
F) Revision model M5 (reestimation of all coefficients)
G) Revision model M6 (stepwise selection of significant predictors)
[Each panel plots fitted values against predicted risk, both on a scale of 0 to 1, with a reference line.]
Figure 4.6 Comparisons of C-statistics between RAMA-AS, Alvarado, Eskelinen and Fenyo scores
CHAPTER V
DISCUSSION
We have developed and validated a clinical prediction score, called
RAMA-AS, for classifying patients as having a low, intermediate, intermediate-high, or high risk of
appendicitis using 3 symptoms (i.e., migration of pain, progression of pain, and
aggravation of pain by cough/movement), 2 signs (i.e., body temperature ≥ 37.8°C
and rebound tenderness), and 2 laboratory results (i.e., WBC >10,000 cells/mm3 and
neutrophils >75%). These variables are routinely assessed and available in clinical
practice in referral and even general hospitals. Internal validation showed that RAMA-AS
performed well for both calibration and discrimination. Given a pretest probability
(i.e., prevalence) of appendicitis of 61.8%, the post-test probabilities for the moderate and
high risk groups were 89.0% (85.0 to 93.0%) and 93.0% (86.0 to 97.0%), respectively.
In addition, external validation showed good calibration and discrimination, with
C-statistics of 0.840 (95% CI: 0.800, 0.880) and 0.810 (95% CI: 0.730, 0.890) for
Thammasat University Hospital and Chaiyaphum Hospital, respectively.
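The seven predictors and their cut-offs can be coded directly from raw clinical data. The following Python sketch only dichotomizes the items and counts positives per domain, using the thresholds stated above (body temperature ≥ 37.8°C, WBC >10,000 cells/mm3, neutrophils >75%). It does not compute the RAMA-AS equation itself, whose coefficients are not reproduced here, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Raw findings for one patient (field names are illustrative assumptions)."""
    migration_of_pain: bool
    progression_of_pain: bool
    aggravation_of_pain: bool  # aggravated by cough or movement
    body_temperature_c: float
    rebound_tenderness: bool
    wbc_cells_per_mm3: float
    neutrophils_percent: float


def positive_items_by_domain(p: Presentation) -> dict:
    """Count positive items in the symptom, sign, and laboratory domains."""
    symptoms = sum([p.migration_of_pain, p.progression_of_pain, p.aggravation_of_pain])
    signs = sum([p.body_temperature_c >= 37.8, p.rebound_tenderness])
    labs = sum([p.wbc_cells_per_mm3 > 10_000, p.neutrophils_percent > 75])
    return {"symptoms": symptoms, "signs": signs, "labs": labs}


# Hypothetical patient, not taken from the study data.
example = Presentation(True, True, False, 38.1, True, 12_500, 70.0)
counts = positive_items_by_domain(example)
```

Keeping the cut-offs in one function makes the inclusive boundary for temperature and the exclusive boundaries for the two laboratory items explicit.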
5.1 Comparison of RAMA-AS and previous score and radiological investigation
The Eskelinen score was the closest clinical decision rule that was developed
using rigorous statistical approaches and exhibited good discrimination and calibration,
as found by our systematic review (38). The Alvarado (45) and Fenyo (39) scores provided
good discrimination and were tested in large samples with sufficient power to
accommodate the number of predictors being tested. These 3 systems were chosen for
comparison with RAMA-AS. The Alvarado score, the most commonly known prediction
scoring system, was first reported in 1986 and was derived retrospectively from 305
hospitalized patients with abdominal pain and suspected appendicitis (45). Although
the score has been used in many prospective studies (10, 37, 49, 74, 83, 91, 113), it has
some limitations: it was derived from univariate analysis and had only fair to
good discrimination performance. RAMA-AS was derived from prospectively collected
data using proper methods of model construction as recommended by TRIPOD
(111). RAMA-AS outperformed the Alvarado, Eskelinen, and Fenyo scores (i.e., C-statistics
of 0.84 for RAMA-AS vs 0.75 for Alvarado, 0.62 for Eskelinen, and 0.76 for Fenyo),
and it can better classify patients into those who can safely be observed as out-patients,
those who need further investigation, and those who need to be admitted for appendectomy. The
numbers of predictors used in the RAMA-AS, Alvarado, Eskelinen, and Fenyo scores were 7, 8, 6,
and 10, respectively. In the laboratory domain, both the Eskelinen and Fenyo scores used only
WBC, while the RAMA-AS and Alvarado scores used WBC and neutrophils. This might
explain the better discrimination performance of the RAMA-AS and Alvarado scores.
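The C-statistic used for these comparisons is the probability that a randomly chosen patient with appendicitis receives a higher score than a randomly chosen patient without it. The following Python sketch computes it by direct pairwise comparison, counting ties as one half. It illustrates the definition only and is not the software used in the thesis; the function name and toy data are hypothetical.

```python
def c_statistic(outcomes, scores):
    """Area under the ROC curve by pairwise comparison of cases and non-cases."""
    case_scores = [s for y, s in zip(outcomes, scores) if y == 1]
    control_scores = [s for y, s in zip(outcomes, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0   # case ranked above non-case
            elif case == control:
                concordant += 0.5   # ties count as one half
    return concordant / (len(case_scores) * len(control_scores))


# Made-up illustrative data (assumption): three cases and three non-cases.
toy_outcomes = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.4]
toy_c = c_statistic(toy_outcomes, toy_scores)
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect separation, which is the scale on which the four scores are compared above.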
Scoring systems may be less likely to replace imaging studies
in tertiary care hospitals, such as Ramathibodi or Thammasat University Hospitals.
However, they will be useful where ultrasound is not available or cannot provide
a definitive answer, or where CT scan is not available. Data from Ramathibodi Hospital showed
that RAMA-AS had better discrimination performance than ultrasound, but lower than
CT scan (ROC area 0.526 for ultrasound and 0.921 for CT scan), as shown in Figure 5.1.
This finding might be useful and should be validated further in Thai primary and
secondary care settings.
5.2 External validation and model updating
Calibration performance of RAMA-AS was not good in either external
data set. This could be explained by differences in prevalence: the prevalence of appendicitis in the
derivation data (Ramathibodi Hospital) and the validation data (Thammasat University and
Chaiyaphum Hospitals) was 61.8% vs 48.7% vs 76.9%, respectively.
Therefore, the original model over-estimated the risk of appendicitis in Thammasat
University Hospital, but under-estimated the risk in Chaiyaphum Hospital. We then
re-calibrated the intercept in the M1 models by subtracting the estimated intercept from, and
adding it to, the original intercept (i.e., baseline risk) for the Thammasat and Chaiyaphum data,
respectively. These models were still not well calibrated, so we went further and
recalibrated the overall coefficient (M2), but this did not improve calibration much.
Revised models were then constructed, and calibration improved considerably with
M4 for Thammasat University Hospital and with M3 to M6, except M5, for Chaiyaphum
Hospital. We propose choosing the predictive score according to the prevalence of
appendicitis, as shown in Figure 5.2. Although RAMA-AS did not calibrate as well
in the external data as in the derivation data, it could still discriminate well between
appendicitis and non-appendicitis in a primary care setting (Chaiyaphum Hospital) and
a school of medicine setting (Thammasat University Hospital).
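Re-calibration of the intercept can be illustrated as follows. The Python sketch assumes that the update keeps the original linear predictor fixed as an offset and estimates a single intercept shift by maximum likelihood, which makes the mean predicted risk equal to the observed prevalence in the new setting. The exact procedure used for M1 is not shown in this chapter, so this is an assumption; the function names and toy data are hypothetical.

```python
import math


def _sigmoid(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate_intercept(outcomes, linear_predictors, iterations=50):
    """Newton iterations for the intercept shift, linear predictor held as offset."""
    if len(set(outcomes)) < 2:
        raise ValueError("outcomes must contain both cases and non-cases")
    shift = 0.0
    for _ in range(iterations):
        risks = [_sigmoid(lp + shift) for lp in linear_predictors]
        gradient = sum(y - p for y, p in zip(outcomes, risks))  # score function
        curvature = sum(p * (1.0 - p) for p in risks)           # information
        step = gradient / curvature
        shift += step
        if abs(step) < 1e-12:
            break
    return shift


# Made-up illustrative data (assumption): the original model over-predicts here.
toy_lp = [2.0, 1.0, 0.5, 0.0, -0.5, 1.5, -1.0, 0.8]
toy_outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
shift = recalibrate_intercept(toy_outcomes, toy_lp)
recalibrated_mean = sum(_sigmoid(lp + shift) for lp in toy_lp) / len(toy_lp)
```

In this toy example the estimated shift is negative, corresponding to a setting with lower prevalence than the derivation data, and the recalibrated mean risk equals the observed proportion of cases. Because only the intercept changes, the ranking of patients and therefore the C-statistic are unaffected.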
Our team took care to retrieve any missing data on the 7 predictive
parameters of RAMA-AS for external validation. No data were missing for these
7 parameters at Thammasat University Hospital or Chaiyaphum Hospital. However, there were some
missing data on other variables that we could not recover.
RAMA-AS seemed to perform well in terms of discrimination in both
external data sets. However, the distributions of the predictive parameters differed
between the three settings. For instance, progression of pain had similar
distributions in Ramathibodi Hospital and Chaiyaphum Hospital, but was much lower
in Thammasat University Hospital (84.8%, 82.6%, and 64.5%, respectively). Aggravation of pain was much higher
in Ramathibodi Hospital, but was similar between the two external data sets, i.e., 72.5% vs 58.4%
vs 51.4%. The distribution of migration of pain in Ramathibodi Hospital was similar to that in
Thammasat University Hospital (44.7% vs 48.0%), but it was much higher in
Chaiyaphum Hospital (70.2%). For the sign domain, the distribution of body temperature
≥37.8 °C in Ramathibodi Hospital was similar to that in Thammasat University Hospital
(18.7% vs 19.7%), but was much lower than in Chaiyaphum Hospital (37.6%). A similar
trend was found for rebound tenderness, i.e., 48.5% vs 42.8% vs 71.3%. Only the
laboratory domain showed similar distributions across the three settings, i.e.,
79.6% vs 76.9% vs 82.2% for WBC (>10,000 cells/mm3)
and 66.2% vs 63.5% vs 75.7% for neutrophils.
Differences in the distributions of predictors could be explained by severity of
disease, physical examination techniques, residents’ versus staff’s experience, time
from onset of symptoms to hospital presentation, and the communication skills of physicians
and patients. These can distort the performance of RAMA-AS when it is applied in general
settings.
5.3 Using the RAMA-AS in practice
RAMA-AS should be very useful in acute care settings, particularly
in general hospitals where resources are limited. It requires information on only seven
variables, which can be collected from the interview, the physical examination, and a
CBC test. Applying RAMA-AS is easy and straightforward, as follows. First,
data on these seven variables can be entered into the equation to estimate the RAMA-AS.
The probability of appendicitis is then estimated for each risk stratum using
Fagan's nomogram.
Second, counting the number of positive signs, symptoms, and lab results can be used to
estimate the risk stratum. For instance, patients are classified as low risk if they
have positive items from only signs, symptoms, or lab; only 1 positive item in
each of the 3 domains; 2 positive items among the 3 domains (i.e., 1 symptom and 1 sign, 1
symptom and 1 lab, or 1 sign and 1 lab); 3 symptoms with 1 lab and no sign; or 3
symptoms plus one sign of rebound tenderness and no lab. The post-test probability
would be 76.0%; thus observation as an out-patient is recommended. The
moderate risk group requires 3 symptoms plus one sign of body temperature ≥ 37.8°C, or 3
symptoms plus two labs without any sign. The post-test probability is about 85.0% to
93.0% for moderate risk, so other investigations such as ultrasound or CT scan may
need to be ordered for these patients.
The high risk group requires 3 symptoms plus 2 signs; 3 symptoms plus
one sign and one lab; 3 symptoms plus 2 signs plus any lab; or 3 symptoms plus 2
labs plus any sign. The post-test probability is about 93.0%; thus surgical
treatment should be performed for high risk patients.
At the very least, even if RAMA-AS ends up not being used, an
important finding is that only 7 variables need attention when clinically diagnosing
appendicitis. Thus, for example, there is no need to focus too much on Rovsing’s
sign if rebound tenderness is positive.
5.4 Strength and limitation
Our study has some strengths. We followed the recommendations of
Altman et al (98) and TRIPOD (111) on how to conduct a study of a risk prediction
score. The study consisted of two phases, i.e., development and validation
phases, in which data were prospectively collected to minimize missing data as much as
possible. However, we could not avoid missing data entirely; they occurred in a few
variables at small percentages, 10.10% for neutrophils and 10.86% for WBC. We
therefore applied multi-chain imputation with 20 imputations to fill in those missing
data. The predictive variables considered in RAMA-AS were collected based on
the suggestions of our previous systematic review of diagnostic scores for appendicitis
(44). Thus, important variables are less likely to have been missed. RAMA-AS was
built using appropriate statistical modelling rather than
expert opinion. RAMA-AS has good performance for both calibration and
discrimination in the derivation setting, although one external setting has lower
discrimination performance. RAMA-AS is easily calculated manually, risk
stratification is provided, and recommendations for patient management are
suggested. However, a handheld calculator, application, or web-based RAMA-AS
should be further developed to encourage general practitioners and surgeons to apply it in
clinical practice.
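When an analysis is repeated on each of the 20 imputed data sets, the results are commonly combined using Rubin's rules. This section does not state how the estimates were pooled, so the following Python sketch is an assumption about a typical workflow rather than a description of the analysis performed. The function name and the three example estimates are hypothetical and are not study results.

```python
from statistics import mean, variance


def pool_rubin(estimates, squared_standard_errors):
    """Combine per-imputation estimates with Rubin's rules.

    Returns the pooled estimate and its total variance, where
    total = within-imputation variance + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(squared_standard_errors):
        raise ValueError("need matching inputs from at least two imputations")
    pooled = mean(estimates)
    within = mean(squared_standard_errors)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Made-up coefficients and squared standard errors from three imputations.
toy_estimates = [1.0, 1.2, 0.8]
toy_variances = [0.04, 0.05, 0.03]
pooled_estimate, total_variance = pool_rubin(toy_estimates, toy_variances)
```

The between-imputation term widens the confidence interval to reflect the uncertainty introduced by the missing WBC and neutrophil values.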
However, some limitations could not be avoided. The study was conducted
in a tertiary care setting for both the development and validation phases, where the prevalence
of appendicitis was quite high. RAMA-AS should be further validated in a general
hospital population, and a modified model may be needed if calibration,
discrimination, or both are not good. The clinical impact of RAMA-AS
should also be further assessed.
Because of this anomaly, there were only 9 patients in the non-appendicitis
group with fever > 37.8 °C. When compared with the appendicitis group,
the odds ratio was quite high, similar to that obtained for “tenderness”, where only 1
patient in the appendicitis group had a non-tender abdomen. These OR estimates may be unreliable
(wide 95% CIs) and are likely spuriously high associations. This was likely a chance
phenomenon, and the development data set may be of questionable generalizability.
There was some evidence to question the validity of the development data set:
data from Thammasat University and Chaiyaphum Hospitals did not support body
temperature as an important predictive variable (see models M4 & M6). Other
considerations similarly point toward body temperature being less important:
unlike leukocytosis and abdominal tenderness, body temperature is not constant,
and a patient with fever might have a low measured body temperature because, for example, the
fever fell during measurement in the ER. Also, the measurement method may or may not detect a high
body temperature if it was not standardized or was wrongly applied in
some way. There was some evidence to support this possibility as well: in Table 4.1,
there was less fever in the appendicitis group, while there was more high body
temperature in the same group. How can this discrepancy be accounted for? Similarly,
no laboratory predictors (WBC or proportion of neutrophils) were in the
M4 & M6 models. Either all the predictive information of the lab values was already
contained in the RAMA-AS, or lab predictors were not important for Thammasat University and
Chaiyaphum Hospitals. Which of these possibilities is more likely? Each possibility entails a
different conclusion.
Figure 5.1 Comparisons of C-statistics between RAMA-AS, ultrasound and CT scan
[ROC curves: sensitivity plotted against 1 - specificity, with a reference line. ROC areas: RAMA-AS 0.8422, ultrasound 0.5263, computed tomography 0.9211]
Figure 5.2 Guide to choose predictive score for appendicitis according to prevalence
CHAPTER VI
CONCLUSIONS
The new clinical decision rule for diagnosis of appendicitis described in
this thesis, the Ramathibodi Appendicitis Score, or RAMA-AS for short, is promising and
has good calibration and discrimination performance. It is simple and easy to
remember and calculate, and calculation is facilitated by computer- and web-based
tools, as shown in Figure 6.1.
Figure 6.1 Web-based calculation of appendicitis
The application software is a program written using database technology and
knowledge-based technology. A database is a structured collection of data;
in this study, the database holds the data analyzed by statistical methods.
Statistical models for risk prediction and prognostication are common in the medical field, with several
uses, including classification and risk assessment. Using statistical model
knowledge, predictions from the data analysis are applied to decide whether to perform a
diagnostic test. C-statistic, sensitivity, specificity, accuracy and cost-effectiveness
analysis are determined in code and unit test. These criteria are collected in a central
unit used in calculation by the scoring system.
The objective of this software is to support the developed scoring system for
diagnosis of appendicitis with a user-friendly interface. A graphical user interface (GUI)
helps users interact with the software using images and allows them to feel more
comfortable with the system. Furthermore, the software simplifies the calculation process and
allows the calculation to be repeated at any time.
RAMA-AS can be considered as level 2 clinical decision rule which can
be used in various settings with confidence in its accuracy (95). RAMA-AS has been
validated in 2 external sites which differ from one another. Systematic review of
previously developed diagnostic scoring systems for appendicitis (44) was performed
to make sure that all important predictors were included in the derivation process.
Important predictors were presented in a significant proportion of the study
population. Outcome events and all predictors were clearly defined. Outcomes were
assessed blinded to the presence of event. Outcome of appendicitis was known after
the pathological report in the surgical cases and telephone follow up at 1 month.
Sample size was calculated and adequate for the number of outcome events. Finally,
RAMA-AS makes clinical sense as recommended in methodological standards for
derivation of a clinical decision rule (95).
In the external validation, patients were chosen in an unbiased fashion and
represented a wide spectrum of disease. There were blinded assessments of the
criterion standard for all patients. Explicit and accurate interpretation of the predictors
and the rule, without knowledge of the outcome, was applied at the 2 external
validation sites. Finally, 100% follow up was achieved at these sites. The above
methodological standards for validation of a clinical decision rule were applied in
this thesis (95), and RAMA-AS did not have good calibration performance at the 2
external sites. Model updating was performed to improve the calibration performance,
which resulted in good calibration and discrimination performance of RAMA-AS.
An impact analysis is ongoing at Phukaew Hospital, Chaiyaphum.
Admission forms for patients suspected of having appendicitis were designed
to compare the outcomes of using clinical evaluation, RAMA-AS, and the Alvarado score,
as shown in Figure 6.2.
Figure 6.2 Admission record of ongoing impact analysis
Future research
Bayesian model averaging, which could attenuate the effect of body
temperature, may improve RAMA-AS performance and may be used for model
updating or derivation of a new model.
External validation and impact analysis should be done in primary and secondary care
hospitals, where scoring systems will have the most impact. The impact
analysis should use a cluster randomized controlled trial and a cost analysis considering
direct costs (e.g., operation, investigations, imaging (ultrasound, CT, and MRI),
laparoscopy, and hospital stay) as well as indirect costs, such as cost savings from not
having to perform investigations (C-reactive protein and erythrocyte sedimentation
rate), imaging (ultrasound, CT), or operation. The cost-effectiveness and cost-utility
analyses should focus on the change in medical expenses after using RAMA-AS.
The cost savings from reducing the negative appendectomy rate with the score
should be analyzed, as should the costs of imaging (US, CT, and MRI) and diagnostic
laparoscopy.
We hope that RAMA-AS will become a level 1 clinical decision rule, which
will aid clinical judgment, change clinical behavior, and reduce unnecessary cost
while maintaining quality of care and patient-doctor satisfaction. It may legally
strengthen decision making in the emergency room and help avoid malpractice liability.
REFERENCES
1. Bhangu A, Soreide K, Di Saverio S, Assarsson JH, Drake FT. Acute appendicitis:
modern understanding of pathogenesis, diagnosis, and management. Lancet.
2015;386(10000):1278-87.
2. Carr NJ. The pathology of acute appendicitis. Ann Diag Pathol. 2000;4(1):46-58.
3. Petroianu A, Alberti LR, Zac RI. Fecal loading in the cecum as a new radiological
sign of acute appendicitis. World J Gastroenterol. 2005;11(27):4230-2.
4. Petroianu A, Alberti LR, Zac RI. Assessment of the persistence of fecal loading in the
cecum in presence of acute appendicitis. Int J Surg. 2007;5(1):11-6.
5. Horton LW. Pathogenesis of acute appendicitis. Brit Med J. 1977;2(6103):1672-3.
6. Kong VY, Sartorius B, Clarke DL. Acute appendicitis in the developing world is a
morbid disease. Ann R Coll Surg Engl. 2015;97(5):390-5.
7. Bohrod MG. The pathogenesis of acute appendicitis. Amer J Clin Pathol.
1946;16(12):752-60.
8. D'Souza N, Nugent K. Appendicitis. Amer Fam Physician. 2016;93(2):142-3.
9. Teixeira PG, Demetriades D. Appendicitis: changing perspectives. Adv Surg.
2013;47:119-40.
10. de Castro SM, Unlu C, Steller EP, van Wagensveld BA, Vrouenraets BC. Evaluation
of the appendicitis inflammatory response score for patients with acute
appendicitis. World J Surg. 2012;36(7):1540-5.
11. Abbas PI, Zamora IJ, Elder SC, Brandt ML, Lopez ME, Orth RC, et al. How Long
Does it Take to Diagnose Appendicitis? Time Point Process Mapping in the
Emergency Department. Pediatr Emerg Care. 2016.
12. Petroianu A. Diagnosis of acute appendicitis. Int J Surg. 2012;10(3):115-9.
13. Di Sebastiano P, Fink T, di Mola FF, Weihe E, Innocenti P, Friess H, et al.
Neuroimmune appendicitis. Lancet. 1999;354(9177):461-6.
14. Rubèr M, Universitetet i L. Immunopathogenic aspects of resolving and progressing
appendicitis. Linköping: Linköping University; 2012.
15. Lee M, Paavana T, Mazari F, Wilson TR M. The Morbidity of Negative
Appendectomy. Ann R Coll Surg Engl. 2014;96(7):517-20.
16. Langenscheidt P, Lang C, Puschel W, Feifel G. High rates of appendicectomy in a
developing country: an attempt to contribute to a more rational use of
surgical resources. Eur J Surg. 1999;165(3):248-52.
17. Lewis FR, Holcroft JW, Boey J, Dunphy E. Appendicitis. A critical review of
diagnosis and treatment in 1,000 cases. Arch Surg. 1975;110(5):677-84.
18. Alvarado A. How to improve the clinical diagnosis of acute appendicitis in resource
limited settings. World J Emerg Surg. 2016;11:16.
19. Shogilev DJ, Duus N, Odom SR, Shapiro NI. Diagnosing appendicitis: evidence-
based review of the diagnostic approach in 2014. West J Emerg Med.
2014;15(7):859-71.
20. Guite KM, Hinshaw JL, Ranallo FN, Lindstrom MJ, Lee FT, Jr. Ionizing radiation
in abdominal CT: unindicated multiphase scans are an important source of
medically unnecessary exposure. J Amer Coll Radiol. 2011;8(11):756-61.
21. Hsieh CH, Lu RH, Lee NH, Chiu WT, Hsu MH, Li YC. Novel solutions for an old
disease: diagnosis of acute appendicitis with random forest, support vector
machines, and artificial neural networks. Surg. 2011;149(1):87-93.
22. Rao PM, Rhea JT, Rattner DW, Venus LG, Novelline RA. Introduction of
appendiceal CT: impact on negative appendectomy and appendiceal
perforation rates. Ann Surg. 1999;229(3):344-9.
23. Flum DR, Morris A, Koepsell T, Dellinger EP. Has misdiagnosis of appendicitis
decreased over time? A population-based analysis. J Amer Med Assoc.
2001;286(14):1748-53.
24. Huynh V, Lalezarzadeh F, Lawandy S, Wong DT, Joe VC. Abdominal computed
tomography in the evaluation of acute and perforated appendicitis in the
community setting. Amer Surg. 2007;73(10):1002-5.
25. Lee SL, Walsh AJ, Ho HS. Computed tomography and ultrasonography do not
improve and may delay the diagnosis and treatment of acute appendicitis.
Arch Surg. 2001;136(5):556-62.
26. Vadeboncoeur TF, Heister RR, Behling CA, Guss DA. Impact of helical computed
tomography on the rate of negative appendicitis. Amer J Emerg Med.
2006;24(1):43-7.
27. Ebell MH. Diagnosis of appendicitis: part II. Laboratory and imaging tests. Amer
Fam Physician. 2008;77(8):1153-5.
28. Bachur RG, Hennelly K, Callahan MJ, Monuteaux MC. Advanced radiologic
imaging for pediatric appendicitis, 2005-2009: trends and outcomes. J
Pediatr. 2012;160(6):1034-8.
29. Petrosyan M, Estrada J, Chan S, Somers S, Yacoub WN, Kelso RL, et al. CT scan in
patients with suspected appendicitis: clinical implications for the acute care
surgeon. Eur Surg Res. 2008;40(2):211-9.
30. McKay R, Shepherd J. The use of the clinical scoring system by Alvarado in the
decision to perform computed tomography for acute appendicitis in the ED.
Amer J Emerg Med. 2007;25(5):489-93.
31. Andersson RE, Hugander A, Ravn H, Offenbartl K, Ghazi SH, Nystrom PO, et al.
Repeated clinical and laboratory examinations in patients with an equivocal
diagnosis of appendicitis. World J Surg. 2000;24(4):479-85.
32. Seetahal SA, Bolorunduro OB, Sookdeo TC, Oyetunji TA, Greene WR, Frederick
W, et al. Negative appendectomy: a 10-year review of a nationally
representative sample. Amer J Surg. 2011;201(4):433-7.
33. Spiegel DA, Gosselin RA. Surgical services in low-income and middle-income
countries. Lancet. 2007;370(9592):1013-5.
34. Andersson RE. Meta-analysis of the clinical and laboratory diagnosis of appendicitis.
Br J Surg. 2004;91(1):28-37.
35. Kong V, Aldous C, Handley J, Clarke D. The cost effectiveness of early management
of acute appendicitis underlies the importance of curative surgical services
to a primary healthcare programme. Ann Royal Coll Surg Engl.
2013;95(4):280-4.
36. Ohle R, O'Reilly F, O'Brien KK, Fahey T, Dimitrov BD. The Alvarado score for
predicting acute appendicitis: a systematic review. BMC med. 2011;9:139.
37. Fenyo G, Lindberg G, Blind P, Enochsson L, Oberg A. Diagnostic decision support
in suspected acute appendicitis: validation of a simplified scoring system.
Eur J Surg. 1997;163(11):831-8.
38. Eskelinen M., Ikonen J., P L. A computer-based diagnostic score to aid in diagnosis
of acute appendicitis. A prospective study of 1333 patients with acute
abdominal pain. Theor Surg. 1992;7(2):86-90.
39. Fenyo G. Routine use of a scoring system for decision-making in suspected acute
appendicitis in adults. Acta Chir Scand. 1987;153(9):545-51.
40. Ohmann C, Yang Q, Franke C. Diagnostic scores for acute appendicitis. Abdominal
Pain Study Group. Eur J Surg. 1995;161(4):273-81.
41. Al Qahtani HH, Muhammad AA. Alvarado score as an admission criterion for
suspected appendicitis in adults. Saudi J Gastroenterol. 2004;10(2):86-91.
42. Gregory S, Kuntz K, Sainfort F, Kharbanda A. Cost-Effectiveness of Integrating a
Clinical Decision Rule and Staged Imaging Protocol for Diagnosis of
Appendicitis. Value health. 2016;19(1):28-35.
43. Kirkil C, Karabulut K, Aygen E, Ilhan YS, Yur M, Binnetoglu K, et al. Appendicitis
scores may be useful in reducing the costs of treatment for right lower
quadrant pain. Ulus Travma Acil Cerrahi Derg. 2013;19(1):13-9.
44. Wilasrusmee C, Anothaisintawee T, Poprom N, McEvoy M, Attia J, Thakkinstian
A. Diagnostic Scores for Appendicitis: A Systematic Review of Scores’
Performance. Br J Med Med Res. 2014;4(2):11-20.
45. Alvarado A. A practical score for the early diagnosis of acute appendicitis. Ann
Emerg Med. 1986;15(5):557-64.
46. Kalan M, Talbot D, Cunliffe WJ, Rich AJ. Evaluation of the modified Alvarado score
in the diagnosis of acute appendicitis: a prospective study. Ann Royal Coll
Surg Engl. 1994;76(6):418-9.
47. Khan I, ur Rehman A. Application of alvarado scoring system in diagnosis of acute
appendicitis. J Ayub Med Coll, Abbottabad : JAMC. 2005;17(3):41-4.
48. Al-Hashemy AM, Seleem MI. Appraisal of the modified Alvarado Score for acute
appendicits in adults. Saudi Med J. 2004;25(9):1229-31.
49. Chong CF, Thien A, Mackie AJ, Tin AS, Tripathi S, Ahmad MA, et al. Comparison
of RIPASA and Alvarado scores for the diagnosis of acute appendicitis.
Singapore Med J. 2011;52(5):340-5.
50. Andersson M, Andersson RE. The appendicitis inflammatory response score: a tool
for the diagnosis of acute appendicitis that outperforms the Alvarado score.
World J Surg. 2008;32(8):1843-9.
51. Ohmann C, Franke C, Yang Q. Clinical benefit of a diagnostic score for appendicitis:
results of a prospective interventional study. German Study Group of Acute
Abdominal Pain. Arch Surg. 1999;134(9):993-6.
52. Tepel J, Sommerfeld A, Klomp HJ, Kapischke M, Eggert A, Kremer B. Prospective
evaluation of diagnostic modalities in suspected acute appendicitis.
Langenbeck's Arch Surg. 2004;389(3):219-24.
53. Sitter H, Hoffmann S, Hassan I, Zielke A. Diagnostic score in appendicitis.
Validation of a diagnostic score (Eskelinen score) in patients in whom acute
appendicitis is suspected. Langenbeck's Arch Surg. 2004;389(3):213-8.
54. Christian F, Christian GP. A simple scoring system to reduce the negative
appendicectomy rate. Ann Royal Coll Surg Engl. 1992;74(4):281-5.
55. Ramirez JM, Deus J. Practical score to aid decision making in doubtful cases of
appendicitis. Br J Surg. 1994;81(5):680-3.
56. Teicher I, Landa B, Cohen M, Kabnick LS, Wise L. Scoring system to aid in
diagnoses of appendicitis. Ann Surg. 1983;198(6):753-9.
57. Burcharth J, Pommergaard HC, Rosenberg J, Gogenur I. Hyperbilirubinemia as a
predictor for appendiceal perforation: a systematic review. Scand J Surg.
2013;102(2):55-60.
58. Redmond JM, Smith GW, Wilasrusmee C, Kittur DS. A new perspective in
appendicitis: calculation of half time (T(1/2)) for perforation. Amer Surg.
2002;68(7):593-7.
59. Pesonen E, Eskelinen M, Juhola M. Comparison of different neural network
algorithms in the diagnosis of acute appendicitis. Int J Biomed Comput.
1996;40(3):227-33.
60. P K, urengan. Appendicectomy: to do or not. Int J Res Med Sci. 2015;3(3):670-4.
61.Addiss DG, Shaffer N, Fowler BS, Tauxe RV. The epidemiology of appendicitis and
appendectomy in the United States. Amer J Epidemiol. 1990;132(5):910-25.
62.Horntrich J, Schneider W. [Appendicitis from an epidemiological viewpoint].
Zentralblatt fur Chirurgie. 1990;115(23):1521-9.
63.Temple CL, Huchcroft SA, Temple WJ. The natural history of appendicitis in adults.
A prospective study. Ann Surg. 1995;221(3):278-81.
64.Chong CF, Adi MIW, Thien A, Suyoi A, Mackie AJ, Tin AS, et al. Development of
the RIPASA score: A new appendicitis scoring system for the diagnosis of
acute appendicitis. Singapore Med J. 2010;51(3):220-5.
65.de Castro SM, Unlu C, Steller EP, van Wagensveld BA, Vrouenraets BC. Evaluation
of the Appendicitis Inflammatory Response Score for Patients with Acute
Appendicitis. World J Surg. 2012.
66.Denizbasi A, Unluer EE. The role of the emergency medicine resident using the
Alvarado score in the diagnosis of acute appendicitis compared with the
general surgery resident. Eur J Trauma Emerg Surg. 2003;10(4):296-301.
67.Enochsson L, Gudbjartsson T, Hellberg A, Rudberg C, Wenner J, Ringqvist I, et al.
The Fenyo-Lindberg scoring system for appendicitis increases positive
predictive value in fertile women--a prospective study in 455 patients
randomized to either laparoscopic or open appendectomy. Surg Endosc.
2004;18(10):1509-13.
68.Eskelinen M, Ikonen J, Lipponen P. A computer-based diagnostic score to aid in
diagnosis of acute appendicitis. A prospective study of 1333 patients with
acute abdominal pain. Theor Surg. 1992;7(2):86-90.
69.Galindo Gallego M, Fadrique B, Nieto MA, Calleja S, Fernandez-Acenero MJ, Ais
G, et al. Evaluation of ultrasonography and clinical diagnostic scoring in
suspected appendicitis. Br J Surg. 1998;85(1):37-40.
70.Inci E, Hocaoglu E, Aydin S, Palabiyik F, Cimilli T, Turhan AN, et al. Efficiency of
unenhanced MRI in the diagnosis of acute appendicitis: Comparison with
Alvarado scoring system and histopathological results. Eur J Radiol.
2011;80(2):253-8.
71.Kanumba ES, Mabula JB, Rambau P, Chalya PL. Modified Alvarado Scoring
System as a diagnostic tool for Acute Appendicitis at Bugando Medical
Centre, Mwanza, Tanzania. BMC Surg. 2011;11:4.
72.Konan A, Hayran M, Kilic YA, Karakoc D, Kaynaroglu V. Scoring systems in the
diagnosis of acute appendicitis in the elderly. Ulus Travma Acil Cerrahi
Derg. 2011;17(5):396-400.
73.Kurane SB, Sangolli MS, Gogate AS. A one year prospective study to compare and
evaluate diagnostic accuracy of modified Alvarado score and
ultrasonography in acute appendicitis, in adults. Indian J Surg.
2008;70(3):125-9.
74.Lamparelli MJ, Hoque HM, Pogson CJ, Ball AB. A prospective evaluation of the
combined use of the modified Alvarado score with selective laparoscopy in
adult females in the management of suspected appendicitis. Ann R Coll Surg
Engl. 2000;82(3):192-5.
75.Limpawattanasiri C. Alvarado score for the acute appendicitis in a provincial
hospital. J Med Assoc Thai. 2011;94(4):441-9.
76.Lintula H, Pesonen E, Kokki H, Vanamo K, Eskelinen M. A diagnostic score for
children with suspected appendicitis. Langenbecks Arch Surg.
2005;390(2):164-70.
77.Malik AH, Wani RA, Saima BD, Wani MY. Small lateral access--an alternative
approach to appendicitis in paediatric patients: a randomised controlled trial.
Int J Surg. 2007;5(4):234-8.
78.Pouget-Baudry Y, Mucci S, Eyssartier E, Guesdon-Portes A, Lada P, Casa C, et al.
The use of the Alvarado score in the management of right lower quadrant
abdominal pain in the adult. J Visc Surg. 2010;147(2):e40-4.
79.Pruekprasert P, Geater A, Ksuntigij P, Maipang T, Apakupakul N. Accuracy in
diagnosis of acute appendicitis by comparing serum C-reactive protein
measurements, Alvarado score and clinical impression of surgeons. J Med
Assoc Thai. 2004;87(3):296-303.
80.Sun JS, Noh HW, Min YG, Lee JH, Kim JK, Park KJ, et al. Receiver operating
characteristic analysis of the diagnostic performance of a computed
tomographic examination and the Alvarado score for diagnosing acute
appendicitis: emphasis on age and sex of the patients. J Comput Tomogr.
2008;32(3):386-91.
81.Talukder DB, Siddiq AZ. Modified Alvarado Scoring System in the Diagnosis of Acute
Appendicitis. JAFMC Bangladesh. 2009;5(1):18-20.
82.Teicher I, Landa B, Cohen M. Scoring system to aid in diagnoses of appendicitis.
Ann Surg. 1983;198(6):753-9.
83.Tzanakis NE, Efstathiou SP, Danulidis K, Rallis GE, Tsioulos DI, Chatzivasiliou A,
et al. A new approach to accurate diagnosis of acute appendicitis. World J
Surg. 2005;29(9):1151-6, discussion 7.
84.Van Way CW, 3rd, Murphy JR, Dunn EL, Elerding SC. A feasibility study of
computer aided diagnosis in appendicitis. Surg Gynecol Obstet.
1982;155(5):685-8.
85.Yoldas O, Karaca T, Tez M. External validation of Lintula score in Turkish acute
appendicitis patients. Int J Surg. 2012;10(1):25-7.
86.Malik AH, Wani RA, Saima BD, Wani MY. Small lateral access-an alternative
approach to appendicitis in paediatric patients: A randomised controlled
trial. Int J Surg. 2007;5(4):234-8.
87.Kong VY, van der Linde S, Aldous C, Handley JJ, Clarke DL. The accuracy of the
Alvarado score in predicting acute appendicitis in the black South African
population needs to be validated. Can J Surg. 2014;57(4):E121-5.
88.Inci E, Hocaoglu E, Aydin S, Palabiyik F, Cimilli T, Turhan AN, et al. Efficiency of
unenhanced MRI in the diagnosis of acute appendicitis: comparison with
Alvarado scoring system and histopathological results. Eur J Radiol.
2011;80(2):253-8.
89.Kanumba ES, Mabula JB, Rambau P, Chalya PL. Modified Alvarado Scoring
System as a diagnostic tool for acute appendicitis at Bugando Medical
Centre, Mwanza, Tanzania. BMC Surg. 2011;11:4.
90.Konan A, Hayran M, Kilic YA, Karakoc D, Kaynaroglu V. Scoring systems in the
diagnosis of acute appendicitis in the elderly. Ulus Travma Acil Cerrahi
Derg. 2011;17(5):396-400.
91.Kurane SB, Sangolli MS, Gogate AS. A one year prospective study to compare and
evaluate diagnostic accuracy of modified Alvarado score and
ultrasonography in acute appendicitis, in adults. Indian J Surg.
2008;70(3):125-9.
92.Talukder DB, Siddiq AZ. Modified Alvarado Scoring System in the Diagnosis of
Acute Appendicitis. JAFMC Bangladesh. 2009;5(1):3.
93.Yoldas O, Karaca T, Tez M. External validation of Lintula score in Turkish acute
appendicitis patients. Int J Surg. 2012;10(1):25-7.
94.Lintula H, Kokki H, Pulkkinen J, Kettunen R, Grohn O, Eskelinen M. Diagnostic
score in acute appendicitis. Validation of a diagnostic score (Lintula score)
for adults with suspected appendicitis. Langenbecks Arch Surg.
2010;395(5):495-500.
95.McGinn TG, Guyatt GH, Wyer PC, Naylor CD, Stiell IG, Richardson WS. Users'
guides to the medical literature: XXII: how to use articles about clinical
decision rules. Evidence-Based Medicine Working Group. J Amer Med
Assoc. 2000;284(1):79-84.
96.Pruekprasert P, Maipang T, Geater A, Apakupakul N, Ksuntigij P. Accuracy in
diagnosis of acute appendicitis by comparing serum C-reactive protein
measurements, Alvarado score and clinical impression of surgeons. J Med
Assoc Thai. 2004;87(3):296-303.
97.Normand SL. Meta-analysis: formulating, evaluating, combining, and reporting. Stat
Med. 1999;18(3):321-59.
98.Altman DG, Royston P. What do we mean by validating a prognostic model? Stat
Med. 2000;19(4):453-73.
99.Harrell FE, Jr., Lee KL, Mark DB. Multivariable prognostic models: issues in
developing models, evaluating assumptions and adequacy, and measuring
and reducing errors. Stat Med. 1996;15(4):361-87.
100.Guyatt G, Rennie D, Meade MO, Cook DJ. Users' Guides to the medical literature:
A Manual for evidence-based clinical practice. 2nd ed. Chicago: McGraw
Hill; 2008.
101.Mallett S, Royston P, Dutton S, Waters R, Altman DG. Reporting methods in
studies developing prognostic models in cancer: a review. BMC Med.
2010;8:20.
102.Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, et
al. Transparent Reporting of a multivariable prediction model for Individual
Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Inter
Med. 2015;162(1):W1-73.
103.Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview
and some applications. Stat Med. 1991;10(4):585-98.
104.White IR, Royston P, Wood AM. Multiple imputation using chained equations:
Issues and guidance for practice. Stat Med. 2011;30(4):377-99.
105.van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood
pressure covariates in survival analysis. Stat Med. 1999;18(6):681-94.
106.McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating
characteristic (ROC) curves. Med Decis Making. 1984;4(2):137-50.
107.Hajian-Tilaki K. Receiver Operating Characteristic (ROC) Curve Analysis for
Medical Diagnostic Test Evaluation. Caspian J Int Med. 2013;4(2):627-35.
108.Ehsanullah J, Ahmad U, Solanki K, Healy J, Kadoglou N. The surgical admissions
proforma: Does it make a difference? Ann Med Surg. 2015;4(1):53-7.
109.Janssen KJ, Vergouwe Y, Kalkman CJ, Grobbee DE, Moons KG. A simple method
to adjust clinical prediction models to local circumstances. Can J Anaesth.
2009;56(3):194-201.
110.Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of
clinical prediction rules: a review. J Clin Epidemiol. 2008;61(11):1085-94.
111.Moons KG, Altman DG, Reitsma JB, Collins GS. New Guideline for the Reporting
of Studies Developing, Validating, or Updating a Multivariable Clinical
Prediction Model: The TRIPOD Statement. Adv Anat Pathol.
2015;22(5):303-5.
112.Steyerberg EW, Moons KG, van der Windt DA, Hayden JA, Perel P, Schroter S, et
al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research.
PLoS Med. 2013;10(2):e1001381.
113.Watters JM. The appendicitis inflammatory response score: a tool for the diagnosis
of appendicitis that outperforms the Alvarado score. World J Surg.
2008;32(8):1850.
APPENDICES
APPENDIX A
CASE RECORD FORMS
APPENDIX B
IMPUTATION
Table Report 1 Number of missing data
Variables     Missing (%)   Observed   Imputed   FMI       RVI
WBC           10.86         353        43        <0.0001   <0.0001
Neutrophil    10.10         356        40        <0.0001   <0.0001
Lymphocyte    10.10         356        40        <0.0001   <0.0001
FMI = largest fraction of missing information of coefficient, RVI = average relative increase in variances of estimates
Figure 1 Diagnostic plots comparing imputed and observed values (A: WBC; B: Neutrophils)
APPENDIX C
STATA COMMANDS
Imputations
***Pattern
misstable sum outcome_2 age sex symptom1 symptom3 symptom4 symptom5 symptom6 nv ///
    movecoug bowel fever gr_bt tender rebten guarding labwbc labneu lablym
misstable pattern labwbc labneu lablym
****Pattern: monotone vs arbitrary
mi misstable nested labwbc labneu lablym
1. labneu(40) <-> lablym(40)
2. labwbc(43)
Whenever labneu is missing, lablym is also missing, but the missingness of these two
variables does not overlap with that of labwbc.
Thus, the missing-data pattern is monotone for these two variables and arbitrary for labwbc.
mi set mlong
mi register imputed labwbc labneu lablym
mi register regular outcome_2 age sex symptom1 symptom3 symptom4 symptom5 ///
    symptom6 nv movecoug bowel fever gr_bt tender rebten guarding
foreach var of varlist symptom3 symptom5 symptom6 rebten movecoug fever tender ///
    guarding {
    recode `var' 2=0
    tab gr_bt
}
mi impute chained (truncreg, ll(50) ul(80000)) labwbc ///
    (truncreg, ll(1) ul(100)) labneu lablym = outcome_2 age ib(2).sex gr_bt tender rebten ///
    ib(3).symptom1 symptom3 symptom5 symptom6 movecoug guarding nv i.bowel, add(20) force ///
    rseed(15690)
***mi estimate
mi estimate, saving(m1, replace) dots: logit outcome_2 i.symptom3 i.symptom5 ///
    i.movecoug i.rebten i.gr_bt i.labwbc_2 i.labneu_2
mi predict xb using m1, xb
mi passive: gen prob = exp(xb)/(1 + exp(xb))
sum xb prob
***Exploring MI performance
a) Estimate Relative Variance Increase (RVI) & Fraction of Missing Information (FMI)
The variability of MI estimates comes from two sources, i.e., within and between imputations.
Thus, the precision of MI estimates depends not only on the number of subjects, but also on the
number of imputations.
The RVI refers to the average relative increase in the variances of the estimates because of
missing WBC & Neu.
Here the RVI, averaged over all coefficients, was 0.0000. This value is
close to 0, so the missing data had little effect on the estimates.
The FMI refers to the largest fraction of missing information of the coefficient estimates due
to missing data. The FMI was used to get an idea about the number of imputations (M) based
on a rule of thumb, M >= FMI x 100; e.g., for FMI = .212853, FMI x 100 = 21.3, i.e., at least 21 Ms are
required.
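The two variance sources described above can be made concrete with Rubin's rules. The following is a minimal Python sketch, not part of the thesis's Stata code; the function name `pool_rubin` and the input numbers are hypothetical, and it pools a single coefficient, whereas the RVI reported by Stata is averaged over all coefficients.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across M imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)                       # pooled point estimate
    within = mean(variances)                       # within-imputation variance (W)
    between = variance(estimates) if m > 1 else 0.0  # between-imputation variance (B)
    total = within + (1 + 1 / m) * between         # total variance T = W + (1 + 1/M) B
    rvi = (1 + 1 / m) * between / within           # relative variance increase
    return {"estimate": pooled, "within": within, "between": between,
            "total": total, "rvi": rvi}


# Hypothetical coefficient estimates and squared standard errors from M = 5 imputations
result = pool_rubin([0.90, 0.92, 0.91, 0.89, 0.93], [0.04, 0.04, 0.04, 0.04, 0.04])
print(result)
```

When the imputed data sets give nearly identical estimates, the between-imputation variance and hence the RVI approach 0, which is the situation reported for WBC and neutrophils.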
mi estimate, vartable nocitable
mi estimate, dftable
b) Diagnostic plots: compare the distribution of the imputed values with that of the observed values
midiagplots labwbc, sample(all) combine ksmirnov
Score development
tab variables appendic_status,col chi2 exp exact
gen score = -3.374991 + .7978508*symptom3 + 1.042774*symptom5 + .7787298*movecoug + ///
    1.529419*rebten + 1.636311*fever2 + ///
    .9095179*labwbc2 + .6898424*labneu2
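As an illustration of how the score equation above is applied to one patient, here is a small Python sketch (not part of the thesis's Stata code). The coefficients are copied from the `gen score` command; the function names and the assumption that every predictor is coded 0/1 are mine.

```python
import math

# Intercept and coefficients copied from the "gen score" command above
INTERCEPT = -3.374991
COEF = {
    "symptom3": 0.7978508,
    "symptom5": 1.042774,
    "movecoug": 0.7787298,
    "rebten": 1.529419,
    "fever2": 1.636311,
    "labwbc2": 0.9095179,
    "labneu2": 0.6898424,
}


def rama_score(patient):
    """Linear predictor (log odds); missing keys are treated as 0 (assumption)."""
    return INTERCEPT + sum(COEF[name] * patient.get(name, 0) for name in COEF)


def predicted_probability(score):
    """Inverse logit of the score."""
    return 1 / (1 + math.exp(-score))


# Hypothetical patient with rebound tenderness and a positive WBC indicator only
patient = {"rebten": 1, "labwbc2": 1}
s = rama_score(patient)
print(round(s, 4), round(predicted_probability(s), 4))
```

With all seven predictors positive the score is about 4.009, and with none positive it is -3.375, which matches the range of cut-points in the ROC output.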
Cutpoint Sensitivity Specificity Correctly classified LR+ LR-
( >= -3.37.. ) 100.00% 0.00% 60.86% 1.0000
( >= -2.68.. ) 100.00% 3.87% 62.37% 1.0403 0.0000
( >= -2.57.. ) 100.00% 5.16% 62.88% 1.0544 0.0000
( >= -2.46.. ) 100.00% 6.45% 63.38% 1.0690 0.0000
( >= -2.33.. ) 99.59% 8.39% 63.89% 1.0870 0.0495
( >= -1.90.. ) 99.59% 12.90% 65.66% 1.1434 0.0322
( >= -1.88.. ) 99.59% 13.55% 65.91% 1.1519 0.0306
( >= -1.84.. ) 99.59% 14.19% 66.16% 1.1606 0.0292
( >= -1.77.. ) 99.59% 14.84% 66.41% 1.1694 0.0280
( >= -1.68.. ) 99.59% 18.06% 67.68% 1.2154 0.0230
( >= -1.66.. ) 99.17% 18.71% 67.68% 1.2199 0.0444
( >= -1.64.. ) 99.17% 19.35% 67.93% 1.2297 0.0429
( >= -1.55.. ) 99.17% 20.00% 68.18% 1.2396 0.0415
( >= -1.53.. ) 97.51% 27.74% 70.20% 1.3495 0.0897
( >= -1.42.. ) 97.51% 28.39% 70.45% 1.3616 0.0877
( >= -1.15.. ) 97.10% 34.19% 72.47% 1.4755 0.0849
( >= -1.06.. ) 97.10% 34.84% 72.73% 1.4901 0.0834
( >= -1.04.. ) 97.10% 35.48% 72.98% 1.5050 0.0819
( >= -.996.. ) 96.68% 35.48% 72.73% 1.4985 0.0935
( >= -.97778 ) 95.85% 38.06% 73.23% 1.5476 0.1090
( >= -.959.. ) 95.44% 39.35% 73.48% 1.5737 0.1160
( >= -.936.. ) 95.44% 40.00% 73.74% 1.5906 0.1141
( >= -.863.. ) 95.44% 40.65% 73.99% 1.6079 0.1123
( >= -.844.. ) 94.61% 40.65% 73.48% 1.5939 0.1327
( >= -.829.. ) 94.19% 40.65% 73.23% 1.5869 0.1429
( >= -.802.. ) 93.78% 40.65% 72.98% 1.5799 0.1531
( >= -.755.. ) 93.36% 41.94% 73.23% 1.6079 0.1583
( >= -.732.. ) 92.12% 46.45% 74.24% 1.7202 0.1697
( >= -.643.. ) 90.87% 50.32% 75.00% 1.8292 0.1814
( >= -.624.. ) 89.63% 54.84% 76.01% 1.9846 0.1892
( >= -.377.. ) 88.80% 55.48% 75.76% 1.9947 0.2019
( >= -.246.. ) 88.38% 55.48% 75.51% 1.9854 0.2094
( >= -.209.. ) 87.55% 56.13% 75.25% 1.9957 0.2218
( >= -.199.. ) 87.55% 56.77% 75.51% 2.0255 0.2193
( >= -.157.. ) 87.14% 57.42% 75.51% 2.0464 0.2240
( >= -.139.. ) 87.14% 58.06% 75.76% 2.0779 0.2215
( >= -.065.. ) 86.72% 59.35% 76.01% 2.1336 0.2237
( >= -.024.. ) 86.72% 60.65% 76.52% 2.2036 0.2189
( >= .0458.. ) 86.72% 61.29% 76.77% 2.2403 0.2166
( >= .0649.. ) 80.08% 69.03% 75.76% 2.5860 0.2885
( >= .1067.. ) 78.84% 71.61% 76.01% 2.7773 0.2955
( >= .1538.. ) 78.42% 72.90% 76.26% 2.8942 0.2960
( >= .5325.. ) 77.18% 75.48% 76.52% 3.1481 0.3023
( >= .5516.. ) 76.76% 76.13% 76.52% 3.2158 0.3052
( >= .6394.. ) 76.35% 76.13% 76.26% 3.1984 0.3107
( >= .6657.. ) 76.35% 76.77% 76.52% 3.2872 0.3081
( >= .6848.. ) 75.93% 78.06% 76.77% 3.4617 0.3083
( >= .7737.. ) 74.69% 78.71% 76.26% 3.5081 0.3216
( >= .7965.. ) 73.03% 80.00% 75.76% 3.6515 0.3371
( >= .8437.. ) 71.37% 81.29% 75.25% 3.8146 0.3522
( >= .8854.. ) 64.32% 87.74% 73.48% 5.2468 0.4067
( >= .9045.. ) 60.17% 90.32% 71.97% 6.2171 0.4410
( >= .9923.. ) 59.34% 90.97% 71.72% 6.5693 0.4470
( >= 1.011.. ) 58.92% 90.97% 71.46% 6.5234 0.4516
( >= 1.330.. ) 58.51% 90.97% 71.21% 6.4775 0.4561
( >= 1.390.. ) 57.68% 90.97% 70.71% 6.3856 0.4653
( >= 1.463.. ) 57.26% 90.97% 70.45% 6.3397 0.4698
( >= 1.478.. ) 56.43% 90.97% 69.95% 6.2478 0.4789
( >= 1.570.. ) 56.02% 90.97% 69.70% 6.2018 0.4835
( >= 1.575.. ) 55.60% 90.97% 69.44% 6.1559 0.4881
( >= 1.594.. ) 45.64% 93.55% 64.39% 7.0747 0.5811
( >= 1.682.. ) 44.40% 93.55% 63.64% 6.8817 0.5944
( >= 1.6833 ) 42.32% 94.19% 62.63% 7.2891 0.6123
( >= 1.701.. ) 38.59% 94.84% 60.61% 7.4767 0.6475
( >= 1.74303 ) 37.76% 95.48% 60.35% 8.3610 0.6518
( >= 1.790.. ) 37.34% 95.48% 60.10% 8.2691 0.6562
( >= 2.168.. ) 36.93% 95.48% 59.85% 8.1772 0.6605
( >= 2.302.. ) 36.93% 96.13% 60.10% 9.5401 0.6561
( >= 2.373.. ) 36.10% 96.13% 59.60% 9.3257 0.6647
( >= 2.432.. ) 19.50% 99.35% 50.76% 30.2284 0.8102
( >= 2.480.. ) 19.09% 99.35% 50.51% 29.5853 0.8144
( >= 2.540.. ) 15.35% 99.35% 48.23% 23.7968 0.8520
( >= 3.211.. ) 14.94% 99.35% 47.98% 23.1537 0.8561
( >= 3.230.. ) 7.05% 100.00% 43.43% 0.9295
( >= 3.319.. ) 5.81% 100.00% 42.68% 0.9419
( >= 4.009.. ) 5.39% 100.00% 42.42% 0.9461
( > 4.009.. ) 0.00% 100.00% 39.14% 1.0000
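The last two columns of the output above are the positive and negative likelihood ratios, which follow directly from sensitivity and specificity. A minimal Python sketch (not part of the thesis's Stata code; the function name is hypothetical) that reproduces one row up to the rounding of the printed percentages:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    # LR+ = sensitivity / (1 - specificity); undefined (infinite) when specificity is 1
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # LR- = (1 - sensitivity) / specificity; undefined (infinite) when specificity is 0
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return lr_pos, lr_neg


# Row with cut-point >= 1.790: sensitivity 37.34%, specificity 95.48%
lr_pos, lr_neg = likelihood_ratios(0.3734, 0.9548)
print(lr_pos, lr_neg)
```

The small difference from the tabulated LR+ of 8.2691 arises because the sketch starts from percentages rounded to two decimals.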
Score performance
mi convert wide
roctab outcome_2 xb
fagani 0.618 8.2691 0.6562
use fagan.dta, clear
fagan lrp lrn, grpvar(test)
fagan lrp lrn, grpvar(test) pr(0.5)
fagan lrp lrn, grpvar(test) pr(0.5) scheme(s2color)
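The Fagan plot converts a pre-test probability into a post-test probability through the likelihood ratio. The arithmetic behind it can be sketched in Python as follows (not part of the thesis's Stata code; the function name is hypothetical, and the inputs are the three numbers passed to `fagani` above):

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


pre = 0.618                                         # pre-test probability from "fagani"
after_positive = post_test_probability(pre, 8.2691)  # using LR+
after_negative = post_test_probability(pre, 0.6562)  # using LR-
print(after_positive, after_negative)
```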
Internal validation
Bootstrap
set more off
cd c:\data\
mi estimate, saving(ramaap, replace) dots: logit outcome_2 i.symptom3 i.symptom5 ///
    i.movecoug i.rebten i.gr_bt i.labwbc_2 i.labneu_2
mi predictnl pr2_mi = predict(pr) using ramaap
mi xeq 0: summarize pr2_mi
mi predict xb_mi using ramaap, xb
***estimate original D
somersd outcome_2 pr2_mi
***estimate original roc
roctab outcome_2 xb_mi
***note temp3=400
This was not successful! Did not know why! I tried to use 450
cd c:\data
set seed 123456
local nSim = 1000
set more off
cap postclose pf
postfile pf outcome_2 score roc nSim pr2 D using "temp1000.dta", replace
forvalues s = 1/`nSim' {
    preserve
    bsample
    tempvar pr2 D score outcome_2
    mi estimate, saving(m1, replace) dots: logit outcome_2 i.symptom3 i.symptom5 ///
        i.movecoug i.rebten i.gr_bt i.labwbc_2 i.labneu_2
    mi predict `score' using m1, xb
    **gen `score' = xb
    capture noisily mi estimate, saving(m2,replace) dots: logit outcome_2 `score'
    mi predict xb_b using m2, xb
    mi predictnl pr2 = predict(pr) using m2
    gen `pr2'=pr2
    gen `outcome_2'=outcome_2
    **predict `p' , pr
    qui somersd outcome_2 `pr2'
    gen `D' =_b[`pr2']
    capture noisily qui roctab outcome_2 `score'
    local roc = r(area)
    sum `score'
    replace `score'=r(mean)
    sum `outcome_2'
    replace `outcome_2'=round(r(mean))
    post pf (`outcome_2') (`score') (`roc') (`nSim') (`pr2') (`D')
    *di "." _continue
    restore
}
postclose pf
***Calculation of bias (D and C) & bootstrap correction coefficients
cd c:\data
use temp1000 /*450 replications from Tong's do file*/
A) calibration bias
gen D_org = .6856378 /*original D*/
sum D
gen meanD_boot = r(mean) if pr~=.
ci D
gen D_bias = D_org - D
gen meanD_bias = (D_org-meanD_boot) /*D optimism*/
sum D_bias meanD_bias
ci D_bias
***Calculate a bootstrap corrected calibration coefficient: D_org-meanD_bias
gen bs_correctedD = D_org - (meanD_bias) /* bias = -, i.e., D bootstrap is higher than D_org */
list D_org meanD_bias bs_correctedD in 1
lab var bs_correctedD "A bs-correction of D"
sum D D_org meanD_bias bs_corr
gen percent_Derror = (D_org - bs_correctedD)/D_org*100
sum meanD_bias percent_Derror
B) Discrimination bias
gen roc_org= 0.8428 /*from the original model in the derive phase*/
sum roc
gen mean_rocboot = r(mean) if pr2 ~=.
ci roc
gen roc_bias = roc_org-roc /*individual bias*/
*gen bias_roc = roc_org-rocboot
gen meanroc_bias = roc_org-mean_rocboot
sum roc_bias meanroc_bias roc_org
ci roc_bias
C) bootstrap corrected discrimination coefficient by roc_org- bias
gen corrected_roc = roc_org-(meanroc_bias)
lab var corrected_roc " A bs-corrected ROC"
sum roc_org roc corrected_*
gen percent_rocerror = (roc_org-corrected_roc)/roc_org*100
sum roc_org roc corrected_* percent_rocerror
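The bias and correction steps in A) to C) reduce to a few lines of arithmetic. The Python sketch below (not part of the thesis's Stata code) mirrors the `gen` commands for the ROC area; `roc_org` is taken from the do-file, while the three bootstrap values and the function name are hypothetical.

```python
from statistics import mean


def bootstrap_corrected(original, bootstrap_values):
    """Mirror of the do-file arithmetic: mean bias, corrected value, percent error."""
    mean_boot = mean(bootstrap_values)        # mean_rocboot
    mean_bias = original - mean_boot          # meanroc_bias = roc_org - mean_rocboot
    corrected = original - mean_bias          # corrected_roc = roc_org - meanroc_bias
    percent_error = (original - corrected) / original * 100
    return mean_bias, corrected, percent_error


roc_org = 0.8428                              # original ROC area from the do-file
boot_rocs = [0.85, 0.84, 0.86]                # hypothetical bootstrap ROC areas
print(bootstrap_corrected(roc_org, boot_rocs))
```

Written this way, the corrected value is algebraically equal to the mean of the bootstrap estimates, so the percent error is simply the relative distance between the original estimate and that mean.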
External validation
gen score = -3.374991 + .7978508*symptom3 + 1.042774*symptom5 + .7787298*movecoug + ///
    1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + .6898424*labneu2
sum score
logit appendicitis score
lroc
roctab appendicitis score
predict p, pr
estat gof, gr(6)
A) Recalibrate intercept
Recalibrate the constant term because the incidence/prevalence of appendicitis differs
between the derivation and validation data
Correction factor (cf) = ln[{prev/(1-prev)}/{MPV/(1-MPV)}]
MPV = mean predicted value. Because the predicted values are the estimated P from the predict
command, their mean is exactly the same as the prevalence of disease.
Therefore, I used the median instead
sum p,det
50% .4218531 Mean .4868421
disp ln((.4868421/(1-.4868421))/(.4218531/(1-.4218531)))
.26252708
gen cf = ln((.4868421/(1-.4868421))/(.4218531/(1-.4218531)))
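The correction factor is the difference between two log odds. As a cross-check outside Stata, this short Python sketch (function names are hypothetical) reproduces the value displayed above from the mean (.4868421) and median (.4218531) of the predicted probabilities:

```python
import math


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))


def correction_factor(prevalence, predicted):
    """cf = ln[{prev/(1-prev)} / {pred/(1-pred)}] = logit(prev) - logit(pred)."""
    return logit(prevalence) - logit(predicted)


cf = correction_factor(0.4868421, 0.4218531)
print(cf)
```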
***Prevalence of appendicitis in the derived data
M1 Calibration of intercept (baseline risk)
If prev(orig) was higher than prev(val), the baseline risk had to be decreased by
subtracting the cf (cf > 0) from b0
calibrate b0 = b0(orig) - cf
If prev(orig) was lower than prev(val), the baseline risk had to be increased by
adding the cf to b0
calibrate b0 = b0(orig) + cf
For the appendicitis data, prev(orig) > prev(val) (60.86% in the derivation data vs 48.68% in the validation data),
so,
calb0 = b0(orig) - cf
***M1
logit app score
gen score1 = (-3.374991-cf) + .7978508*symptom3 + 1.042774*symptom5 + .7787298*movecoug ///
    + 1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + .6898424*labneu2
***Refit the equation using cal_b0 as below. This model also yields the calibrated
coefficient (cal_b), which is used in the next step (calibration of the coefficients)
ln [risk of app/(1 - risk of app)] = cal_b0 + cal_b*xb
logit append score1
lroc
roctab appen score1
***Calibration for m1
logit append score1
estat gof, gr(6) tab
disp chiprob(4,8.219118) /*df = 6-1-1*/
***Calibration plot
twoway (line obs_p1_ref p1_ref, lpattern(dash)) (scatter p1 obs_p1, sort ///
    msymbol(triangle_hollow) ylabel(0(.1)1) xlab(0(.1)1)) (lfit p1 obs_p1)
ci mean o_e
B) M2: Calibration by correcting all coefficients with one overall correction factor,
i.e., cal_b as defined below. This is needed because the original coefficients are overfitted or
underfitted.
As for ln [risk of app/(1 - risk of app)] = cal_b0 + cal_b*xb
logit app score
gen cal_b = _b[score]
sum cal_b
Use cal_b to calibrate the coefficients in the original model
gen score2 = (-3.374991-cf) + cal_b*(.7978508*symptom3 + 1.042774*symptom5 + ///
    .7787298*movecoug + 1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + ///
    .6898424*labneu2)
logit app score2
roctab appen score2
***Calibration for M2
ci mean o_e
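The M1 and M2 scores differ from the original score only in the intercept shift and in one overall slope. A compact Python sketch of that relationship (not part of the thesis's Stata code; the function name and the example values of `xb` and `cal_b` are hypothetical):

```python
def recalibrate(xb, b0, cf, cal_b=1.0):
    """Recalibrated linear predictor.

    xb    : original linear predictor, including the original intercept b0
    cf    : correction factor subtracted from the intercept (M1)
    cal_b : overall calibration slope applied to the predictor part (M2);
            cal_b = 1 reproduces M1
    """
    return (b0 - cf) + cal_b * (xb - b0)


b0 = -3.374991            # original intercept from the do-file
cf = 0.26252708           # correction factor computed in A)
xb = b0 + 2.0             # hypothetical patient whose predictors sum to 2.0

score1 = recalibrate(xb, b0, cf)               # M1: intercept only
score2 = recalibrate(xb, b0, cf, cal_b=0.8)    # M2: hypothetical slope of 0.8
print(score1, score2)
```

A slope below 1 shrinks the predictor effects toward zero, which is the expected direction when the original coefficients were overfitted.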
C) M3: Update model or revision method
This is M2 + additional adjustment of the regression coefficients for the few predictors whose
effects were under/overfitted in the validation data compared with the original data
ln [risk of app/(1 - risk of app)] = cal_b0 + cal_b*xb + gamma*predictor
The calibration-coefficient equation was refitted with each predictor added to the model
one by one.
If the gamma coefficient was significant, that predictor needed to be recalibrated
Compare [cal_b0+cal_b*xb + gammaX] vs [cal_b0+cal_b*xb]
logit app score2 /*cal_b0 + cal_b*xb*/
***keep LL for M2
estimates store cal_b0b
foreach var of varlist symptom3 symptom5 move fever2 rebten labwbc2 labneu2 {
    logit append score2 `var'
    estimates store m`var'
    lrtest cal_b0b m`var'
}
As a result, the comparisons between the two models, with vs without each predictor,
indicated that the effects of symptom3 and symptom5 were underestimated and those of fever2,
labwbc2, and labneu2 were overestimated, whereas rebten and movecoug were not
statistically significant.
Thus, the coefficients of these 5 variables were used to calibrate the specific variables
***Calibrate specific predictors: symptom3 symptom5, fever2, labwbc2, labneu2
gen score3 = (-3.374991+cf) + cal_b*(.7978508*symptom3 + 1.042774*symptom5 + ///
    .7787298*movecoug + 1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + ///
    .6898424*labneu2) + (1.284446*symptom3) + (1.138048*symptom5) + (-1.331718*fever2) + ///
    (-1.353248*labwbc2) + (-1.236354*labneu2)
sum score*
estat gof, gr(6)
roctab appen score3
***Calibration for m3
***Prepare p data
disp chiprob(4,2.683341) /*df = 6-1-1*/
.61213286
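The p-value returned by `chiprob` is the upper-tail probability of a chi-square distribution. For an even number of degrees of freedom it has a simple closed form, which the Python sketch below uses to cross-check the value above (not part of the thesis's Stata code; the function name is hypothetical, and the closed form does not cover odd degrees of freedom).

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability for even df.

    Uses Q(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Hosmer-Lemeshow statistic for M3 with df = 6 - 1 - 1 = 4
print(chi2_upper_tail_even_df(2.683341, 4))
```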
***Calibration plot
twoway (line obs_p3_ref p3_ref, lpattern(dash)) (scatter p3 obs_p3, sort ///
    msymbol(triangle_hollow) ylabel(0(.1)1) xlab(0(.1)1)) (lfit p3 obs_p3)
ci mean o_e
D) M4: M2 + stepwise selection of additional predictors.
This step applies when one or more predictors were not included in the original model, or
when a new marker needs to be included in the model.
I started with
cal_b0 + cal_b*xb
but needed to re-select which of the 7 variables should be included in xb
sw, pr(.05): logit append symptom3 symptom5 move fever2 rebten labwbc2 labneu2
began with full model
only 3, i.e., symptom3, symptom5, and rebten, remained in the model!
gen score4 = (-3.374991-cf) + cal_b*(.7978508*symptom3 + 1.042774*symptom5 + ///
    .7787298*movecoug + 1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + ///
    .6898424*labneu2) + (1.83606)*symptom3 + (1.768313)*symptom5 + (1.816649)*rebten
logit append score4
estat gof, gr(6) tab
***Calibration for M4
***Prepare p data
disp chiprob(7, 5.000413)
.65991283
***Calibration plot
twoway (line obs_p4_ref p4_ref, sort lpattern(dash)) (scatter p4 obs_p4, sort ///
    msymbol(triangle_hollow) ylabel(0(.1)1) xlab(0(.1)1)) (lfit p4 obs_p4)
ci mean o_e
E) Model 5: Re-estimate all coefficients using validation data only
logit append symptom3 symptom5 move fever2 rebten labwbc2 labneu2
roctab append score5
estat gof, gr(6) tab
*** Prepare p data
disp chiprob(4, 5.177072)
***Calibration plot
lab var p5 "Predicted risk"
twoway (line obs_p5_ref p5_ref, sort lpattern(dash)) (scatter p5 obs_p5, sort ///
    msymbol(triangle_hollow) ylabel(0(.1)1) xlab(0(.1)1)) (lfit p5 obs_p5)
ci mean o_e
F) Model 6: M5 + stepwise selection; one or more predictors were not included
sw, pr(.05): logit append symptom3 symptom5 move fever2 rebten labwbc2 labneu2
begin with full model
estat gof, gr(6) tab
***Calibration for m6
***Prepare p data
disp chiprob(3,3.257358)
***Calibration plot
twoway (line obs_p6_ref p6_ref, sort lpattern(dash)) (scatter p6 obs_p6, sort ///
    msymbol(triangle_hollow) ylabel(0(.1)1) xlab(0(.1)1)) (lfit p6 obs_p6)
ci mean o_e
BIOGRAPHY
NAME Chumpon Wilasrusmee
DATE OF BIRTH 16 June 1968
PLACE OF BIRTH Bangkok, Thailand
INSTITUTIONS ATTENDED Faculty of Medicine, Ramathibodi Hospital,
Mahidol University, (1986 – 1992)
Doctor of Medicine (First Class Honors)
Faculty of Medicine, Ramathibodi Hospital,
Mahidol University, 1997-2000
Master of Science (Clinical Medicine)
Faculty of Medicine, Ramathibodi Hospital,
Mahidol University, 2010-2016
Doctor of Philosophy (Clinical Epidemiology)
SCHOLARSHIP RECEIVED Faculty of Medicine, Ramathibodi Hospital,
Mahidol University
RESEARCH GRANTS -
HOME ADDRESS 88/350 M.17, Thaweewattana,
Salathammasob, Bangkok 10170,
THAILAND
EMPLOYMENT ADDRESS Department of Surgery, Faculty of
Medicine, Ramathibodi Hospital, 270,
Rama VI Road, Ratchathevi, Bangkok
10400, THAILAND
PHONE Mobile: (6689) 7986337
E-MAIL [email protected]