
RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF APPENDICITIS

CHUMPON WILASRUSMEE

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR

THE DEGREE OF DOCTOR OF PHILOSOPHY (CLINICAL EPIDEMIOLOGY)

FACULTY OF GRADUATE STUDIES MAHIDOL UNIVERSITY

2016

COPYRIGHT OF MAHIDOL UNIVERSITY

Thesis entitled

RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF APPENDICITIS

Mr. Chumpon Wilasrusmee
Candidate

Assoc. Prof. Ammarin Thakkinstian, Ph.D. (Clinical Epidemiology & Community Medicine)
Major advisor

Assoc. Prof. Patarawan Woratanarat, M.D., Ph.D. (Clinical Epidemiology)
Co-advisor

Assoc. Prof. Panuwat Lertsithichai, M.D., M.Sc. (Medical Statistics)
Co-advisor

Assoc. Prof. Varaporn Akkarapatumwong, Ph.D. (Science)
Acting Dean, Faculty of Graduate Studies, Mahidol University

Assoc. Prof. Ammarin Thakkinstian, Ph.D. (Clinical Epidemiology & Community Medicine)
Program Director, Doctor of Philosophy Program in Clinical Epidemiology, Faculty of Medicine, Ramathibodi Hospital, Mahidol University

Thesis entitled

RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF APPENDICITIS

was submitted to the Faculty of Graduate Studies, Mahidol University for the degree of Doctor of Philosophy (Clinical Epidemiology) on October 27, 2016

Mr. Chumpon Wilasrusmee
Candidate

Lect. Vijj Kasemsap, M.D., Ph.D. (Social & Administrative Pharmacy)
Chair

Prof. Prakitpunthu Tomtitchong, M.D., M.Sc., Ph.D., FRSCT
Member

Assoc. Prof. Ammarin Thakkinstian, Ph.D. (Clinical Epidemiology & Community Medicine)
Member

Assoc. Prof. Panuwat Lertsithichai, M.D., M.Sc. (Medical Statistics)
Member

Assoc. Prof. Patarawan Woratanarat, M.D., Ph.D. (Clinical Epidemiology)
Member

Assoc. Prof. Varaporn Akkarapatumwong, Ph.D. (Science)
Acting Dean, Faculty of Graduate Studies, Mahidol University

Prof. Piyamitr Sritara, M.D., FRCPT, FACP, FRCP (T)
Dean, Faculty of Medicine Ramathibodi Hospital, Mahidol University


ACKNOWLEDGEMENTS

I would like to express my deepest appreciation to my major advisor, Assoc. Prof. Dr. Ammarin Thakkinstian, who has the attitude and the substance of a genius: she continually and convincingly conveyed a spirit of adventure in regard to research and scholarship, and an excitement in regard to teaching. Without her guidance and persistent help this dissertation would not have been possible.

I would like to show my gratitude to Dr. Sasivimol Rattanasiri, Dittapol Muntham, Prawduen Saravej, and all personnel of the Section of Clinical Epidemiology and Biostatistics, Ramathibodi Hospital, Mahidol University, for their generous help and support, especially with the data management and administration of my project.

I am also thankful to Dr. Boonying Siribumrungwong (Department of Surgery, Thammasat Hospital, Thammasat University), Dr. Samart Phuwaprisirisarn (Department of Surgery, Chaiyaphum Hospital), and Napaphat Poprom for their permission and support in collecting the patient data. Without their support, I could not have developed and validated the prediction model.

Chumpon Wilasrusmee

Fac. of Grad. Studies, Mahidol Univ. Thesis / iv

RAMATHIBODI APPENDICITIS SCORE (RAMA-AS): A USEFUL TOOL FOR DIAGNOSIS OF APPENDICITIS

CHUMPON WILASRUSMEE 5336192 RACE/D

Ph.D. (CLINICAL EPIDEMIOLOGY)

THESIS ADVISORY COMMITTEE: AMMARIN THAKKINSTIAN, PH.D., PATARAWAN WORATANARAT, M.D., PH.D., PANUWAT LERTSITHICHAI, M.D., M.SC.

ABSTRACT

Diagnosis of appendicitis is still clinically challenging. Risk stratification for diagnosis should be developed to aid in the management of appendicitis. The purpose of this study was to develop and externally validate the Ramathibodi Appendicitis Score (RAMA-AS) as an aid to the diagnosis of appendicitis. This cross-sectional study consisted of two phases, derivation and validation, and was conducted at Ramathibodi Hospital, Thammasat University Hospital, and Chaiyaphum Hospital from January 2013 to May 2015. Patients with abdominal pain and suspected appendicitis who visited these hospitals were enrolled. Multiple logistic regression was applied to develop a parsimonious model. Calibration and discrimination performances were assessed. In addition, the performance of RAMA-AS was compared with that of the Alvarado score using ROC curve analysis. All analyses were performed using Stata version 14 (Stata Corp, College Station, Texas, USA). A P-value of less than 0.05 was taken as the threshold for statistical significance.

The RAMA-AS consisted of 3 domains with 7 predictors: symptoms (i.e. progression of pain, aggravation of pain, and migration of pain), signs (i.e. fever and rebound tenderness), and laboratory findings (i.e. white blood cell (WBC) count and neutrophils). The model fitted the data well, with a Somers' D of 0.686 (95% CI: 0.608, 0.763). The model discriminated well, with a C-statistic of 0.842 (95% CI: 0.804, 0.881) and a bootstrap C-statistic of 0.848 (95% CI: 0.846, 0.849). For external validation, the RAMA-AS worked well after model revisions, including calibration of the intercept and overall coefficients plus stepwise selection of significant predictors, with O/E ratios of 1.005 and 0.996 (95% CI: 0.784, 1.225 and 0.695, 1.333; Hosmer-Lemeshow = 8.219 and 6.640, df = 4 and 4, p = 0.838 and 0.156) for the 2 external validations. The C-statistics of the 2 external validations were 0.853 (95% CI: 0.791, 0.915) and 0.813 (95% CI: 0.736, 0.892). The C-statistic of the Alvarado score was 0.760 (95% CI: 0.710, 0.810), indicating lower discriminative ability.

RAMA-AS should be a useful tool for the diagnosis of appendicitis, with good calibration and discrimination performances. Practitioners should be encouraged to use the score in clinical practice to confirm the diagnosis and to select patients who should undergo imaging or surgical management.

KEY WORDS: APPENDICITIS SCORE / DERIVE PHASE / VALIDATION PHASE / CALIBRATION / DISCRIMINATION

134 pages

"RAMATHIBODI APPENDICITIS SCORE": A SCORING SYSTEM THAT IS A USEFUL TOOL FOR THE DIAGNOSIS OF APPENDICITIS

CHUMPON WILASRUSMEE 5336192 RACE/D

Ph.D. (CLINICAL EPIDEMIOLOGY)

THESIS ADVISORY COMMITTEE: AMMARIN THAKKINSTIAN, Ph.D., PATARAWAN WORATANARAT, M.D., Ph.D., PANUWAT LERTSITHICHAI, M.D., M.Sc.

ABSTRACT (THAI)

The diagnosis of appendicitis remains a challenging problem. Risk stratification for diagnosis should be developed to help guide patient care. The purpose of this thesis was to build the Ramathibodi score to aid the diagnosis of appendicitis and to validate it externally. A cross-sectional study consisting of 2 phases, development and validation of the score, was conducted in patients with abdominal pain and suspected appendicitis from 3 hospitals: Faculty of Medicine Ramathibodi Hospital, Faculty of Medicine Thammasat Hospital, and Chaiyaphum Provincial Hospital, between January 2013 and 2015 (B.E. 2556 to 2558). Logistic regression analysis was used to build a model of the simplest form (parsimonious model), followed by assessment of calibration and discrimination. In addition, the Ramathibodi score was compared with the Alvarado score using the ROC curve. Data were analysed using Stata version 14. A P-value of less than 0.05 was considered statistically significant.

The Ramathibodi score consists of 3 domains and 7 predictors. Three variables come from history taking (symptoms): progressive abdominal pain; abdominal pain aggravated by coughing, sneezing, or movement; and migration of abdominal pain from the periumbilical area to the right lower quadrant. Two variables come from physical examination (signs): fever, with a body temperature above 37 degrees Celsius, and "rebound tenderness", elicited by pressing the abdomen down slowly and gently until resistance is felt against the fingers or the patient begins to feel pain, and then lifting the hand (releasing the pressure) immediately. Two variables come from laboratory tests: a white blood cell count above 10,000 cells per cubic millimetre and a neutrophil percentage above 75 on the complete blood count. The model fitted the data well, with a Somers' D of 0.686 (95% confidence interval 0.608 to 0.763). Internal validation showed that the score had a good ability to discriminate between patients with and without appendicitis, with a C-statistic of 0.842 (95% confidence interval 0.804 to 0.881), confirmed by internal validation with a C-statistic of 0.848 (95% confidence interval 0.846 to 0.849). External validation showed that the Ramathibodi score worked well after revision of the score by calibration of the intercept and the coefficients together with stepwise selection of predictors, with O/E ratios of 1.005 and 0.996 (95% confidence intervals 0.784 to 1.225 and 0.965 to 1.333; Hosmer-Lemeshow = 8.219 and 6.640, df = 4 and 4, p = 0.838 and 0.156). The external validations at the two sites had C-statistics of 0.840 (95% confidence interval 0.780 to 0.910) and 0.810 (95% confidence interval 0.730 to 0.890). Compared with the widely used Alvarado score, the Ramathibodi score discriminated patients better; the Alvarado score had a C-statistic of 0.760 (95% confidence interval 0.710 to 0.810).

In conclusion, the Ramathibodi score built to aid the diagnosis of appendicitis has a good ability to discriminate and predict patients with appendicitis. It is likely to be usable in clinical practice as a guide to patient care, including differential diagnosis of appendicitis, observation, radiological investigation, and surgical treatment.

134 pages

CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT (ENGLISH)

ABSTRACT (THAI)

LIST OF TABLES

LIST OF FIGURES

CHAPTER I BACKGROUND AND RATIONALE

1.1 Anatomy

1.2 Pathophysiology and pathology

1.3 Diagnosis

1.4 Rationale

1.5 Research Questions

1.6 Research Objectives

CHAPTER II LITERATURE REVIEW

2.1 History of previous scores' developments

2.2 Systematic review of scoring systems for diagnosis of appendicitis (60)

2.3 Definition

2.4 Research methods in risk prediction scores

2.5 Conceptual framework

CHAPTER III METHOD

3.1 Study design and setting

3.2 Study subjects

3.3 Data Collection

3.4 Sample size estimation

3.5 Data management

3.6 Statistical analysis

3.7 Ethics considerations

CHAPTER IV RESULTS

4.1 Characteristic of patients

4.2 Imputation

4.3 Model development

4.4 External validation

4.5 Comparison of RAMA-AS and previous scores

CHAPTER V DISCUSSION

5.1 Comparison of RAMA-AS and previous score and radiological investigation

5.2 External validation and model updating

5.3 Using the RAMA-AS in practice (interpretation and implication)

5.4 Strengths and limitations

CHAPTER VI CONCLUSION

REFERENCES

APPENDICES

BIOGRAPHY

LIST OF TABLES

2.1 Alvarado scoring system

2.2 The scoring parameters of RIPASA score based on probability and extra weight

2.3 The Appendicitis Inflammatory Response Score (AIRS)

2.4 Fenyö-Lindberg scoring system

2.5 Ohmann score

2.6 Eskelinen score

2.7 Simple scoring system

2.8 Practical score of Ramirez and Deus

2.9 Scoring system developed by Teicher

2.10 Performance of Random Forests (RF), Support Vector Machines (SVM), Artificial Neural Networks (ANN), Logistic Regression (LR), and Alvarado score on diagnosis of acute appendicitis

2.11 Describe methodological assessments

2.12 Characteristics of studies that had developed prediction scores for appendicitis

3.1 Estimate sample size by PS program

3.2 Re-calibration and revision of models for external validations

4.1 Baseline data of 396 patients from Ramathibodi Hospital, 152 patients from Thammasat Hospital, and 178 patients from Chaiyaphum Hospital

4.2 Report on number of missing data

4.3 Description of patients' characteristics in appendicitis and non-appendicitis groups

4.4 Factors associated with appendicitis: Multiple logistic regression analysis

4.5 Risk stratification and predictive values of a RAMA-AS prediction score

4.6 Key study characteristics of patients from derivation and external validation

4.7 Description of patients' characteristics in appendicitis and non-appendicitis groups of Thammasat Hospital

4.8 Description of patients' characteristics in appendicitis and non-appendicitis groups of Chaiyaphum Hospital

LIST OF FIGURES

1.1 The Anatomy of appendix

1.2 A Normal appendix at McBurney's point

1.2 B Inflamed appendix

1.2 C Gangrenous/ruptured appendicitis

2.1 Identification of studies for inclusion

3.1 Rebound tenderness

4.1 Diagnostic plot between missing and observed values

4.2 Receiver Operating Characteristic (ROC) curves of RAMA-AS for diagnosis of appendicitis

4.3 Fagan nomogram plot for RAMA-AS risk stratification

4.4 Calibration plots for external validations at Thammasat Hospital using different update methods

4.5 Calibration plots for external validations at Chaiyaphum Hospital using different update methods

4.6 Comparisons of C-statistics between RAMA-AS, Alvarado, Eskelinen and Fenyö scores

5.1 Comparisons of C-statistics between RAMA-AS, ultrasound and CT scan

5.2 Guide to choose predictive score for appendicitis according to prevalence

6.1 Website-based calculation of appendicitis

6.1 Admission record of ongoing impact analysis

CHAPTER I

BACKGROUND AND RATIONALE

1.1 Anatomy

The appendix is a blind-ended tube at the end of the cecum, located approximately 2 cm from the ileocecal valve, which separates the small intestine from the large intestine (Figure 1.1a). Its surface landmark is known as McBurney's point (Figure 1.1b) and corresponds to the position of the appendix in the abdomen. The average length of the human appendix is 9 cm (range: 2-20 cm), with a diameter of 7 to 8 mm. The appendix might serve as a microbial reservoir for repopulation of the gastrointestinal tract in times of necessity (1). Its gut-associated lymphoid tissue may provide immune defence against invading pathogens. However, because appendectomy has no apparent adverse effects, the appendix is generally judged to be vestigial and to lack a specific function.

1.2 Pathophysiology and pathology

Appendicitis is defined as an inflammation of the appendix that usually begins at the inner lining and spreads to its other parts. Direct luminal obstruction, often by a fecalith, lymphoid hyperplasia, impacted stool, a foreign body, or a tumor, can cause appendicitis. Several infectious agents, genetic factors, and environmental influences have been reported to be associated with appendicitis. The macroscopic pathology of appendicitis includes congestion/obstruction, colour changes, increased diameter, exudate, pus, perforation, transmural inflammation, ulceration, thrombosis, and necrosis (2).

Obstruction of the appendiceal lumen has been reported from a variety of causes, e.g., fecaliths, lymphoid hyperplasia, vegetable matter and fruit seeds, barium from radiography, intestinal worms (especially ascarids), primary tumors (carcinoid, adenocarcinoma, Kaposi sarcoma, and lymphoma), and metastatic tumors (colon and breast) (3, 4). Fecalith is the most common cause, reported in as many as 67-89% of cases (5). The prevalence of appendicitis is higher in teenagers and young adults than in the general adult population, which could be explained by the pathophysiological role of lymphoid hyperplasia, since lymphoid tissue exists in abundance in the appendix (6), and by a lumen that is small relative to wall thickness and therefore susceptible to obstruction.

The obstruction theory is based on the observed frequency of such obstructions, on the increased intraluminal pressures found in inflamed appendixes, and on the experimental production of appendicitis by obstruction (7). The distribution of inflammatory changes in acute appendicitis has been analyzed by histological examination of multiple longitudinal sections. Inflammation was either sharply confined to the distal part of the appendix or involved the whole organ; it was less often confined to the proximal part of the organ or to its central portion (5).

Although the obstruction theory is reasonable, it has yet to be proven. It also fails to explain cases caused by infection. Many pathologists have suspected that the fecalith forms after infection of the appendix, consistent with recent evidence that does not support obstruction as the underlying pathology (8, 9). In addition, about 90% of patients with phlegmonous appendicitis had no evidence of raised intraluminal pressure or signs of luminal obstruction (10). Obstruction may therefore not play an important role in causing acute appendicitis, although it may develop as a result of the inflammatory process.

Another possible cause is a labile factor in the appendix that responds to a variety of external and internal stimuli and causes injury to the appendix, permitting bacterial invasion by the flora normally present (7). It is likely that several aetiologies of appendicitis lead to the final common pathway of bacterial invasion of the appendiceal wall (6). The proposed labile factor varies among investigators, and a change in vascular tone can be one basis for an attack of appendicitis (7). Vascular or muscle spasm, altered capillary permeability, and changes in local acidity may be causes, but the assumption of a labile element capable of rapid response to stimuli is common to all.

Either obstruction or bacterial invasion leads to inflammation, rising intraluminal pressure, and ultimately ischemia. Subsequently, the appendix enlarges and incites inflammatory changes in surrounding tissues, e.g., pericecal fat and peritoneum. Rapid distension of the appendix is likely because of its small luminal capacity, and intraluminal pressures can reach 50 to 65 mm Hg (8). As luminal pressure increases, venous and lymphatic obstruction and mucosal ischemia develop. The mucosa becomes hypoxic and begins to ulcerate, compromising the mucosal barrier and leading to invasion of the appendiceal wall by intraluminal bacteria.

As a result of the inflammatory process, visceral afferent nerve fibers that enter the spinal cord at T8-T10 are stimulated, causing referred epigastric and periumbilical pain. Once inflammation and infection extend to the serosa, parietal peritoneum, and adjacent organs, somatic pain supersedes the early referred pain. Patients usually have migratory pain, tenderness, guarding, and rebound tenderness at the site of maximal pain in the right lower quadrant of the abdomen (11). Progression of the disease compromises arterial blood flow and causes infarction, resulting in gangrene and perforation, which usually occur between 24 and 36 hours. Fever, anorexia, nausea, and vomiting usually follow as the pathophysiology worsens (8, 12).

Appendicitis with and without perforation may differ in pathogenesis (6). Patients with a short duration of symptoms had a predominantly neutrophilic infiltrate, whereas those with a long duration might have a lymphocytic infiltrate with granulation tissue. These findings support the argument that a mixed infiltrate of lymphocytes and eosinophils represents a regression phase of acute appendicitis. Fibrous adhesion formation and scarring of the appendiceal wall have been demonstrated and are consistent with resolution of a previous attack of appendicitis. Appendiceal perforation with little or no inflammation has also been found, and in some cases an ischemic appendix perforates differently from one in which perforation is due to the evolution of inflammation with severe infection.

Recently, the concept of neuroimmune appendicitis has evolved (13). After a previous minor bout of intestinal inflammation, subtle alterations in enteric neurotransmitters are detected, which may result in altered visceral perception from the gut. This process has been implicated in a wide range of gastrointestinal conditions, including appendicitis. The local immune response in the appendiceal tissue is mirrored in the blood. The immunological response pattern in peripheral blood suggests Th1/Th17-induced inflammation in advanced appendicitis, which is present at an early clinical presentation. Patients with a history of advanced appendicitis have stronger Th1 responses than individuals with a history of phlegmonous appendicitis. This may reflect constitutional differences between patients with different outcomes of appendicitis. The increased inflammatory response observed early in complicated appendicitis (gangrene or perforation) suggests a more violent inflammation and supports the hypothesis of different immune pathogeneses, where excessive induction of Th1/Th17 immunity and/or deficiencies in down-regulatory feedback mechanisms may explain the excessive inflammation in advanced appendicitis (14).

A normal appendix at McBurney's point (the lateral third of the line between the umbilicus and the anterior superior iliac spine) is shown in Figure 1.2A, compared with an inflamed appendix in Figure 1.2B and gangrenous/ruptured appendicitis in Figure 1.2C.

1.3 Diagnosis

The diagnosis of suspected appendicitis is usually based on patient history, physical examination, laboratory tests, and imaging (e.g., computed tomography (CT) scan or ultrasound (U/S)), with pathological confirmation. Although it is a common problem, appendicitis remains complicated and perplexing to diagnose, especially in patients with an atypical clinical presentation. The classic signs and symptoms are present in only 60% to 70% of patients, which makes a correct diagnosis difficult to ascertain. This can delay the diagnosis or lead to unnecessary operation, and it contributes to the persistent rates of morbidity and mortality. With routine clinical methods, the correct diagnosis can be obtained in 71% to 97% of patients, but the rate of negative appendectomy remains high, varying between 14% and 75% (15-17), or even as high as 85% (18). The incidence of perforated appendicitis varies between 4% and 45%, and the mortality rate ranges from 0.17% to 7.5% (15). Therefore, clinicians should try to improve diagnosis not only by carefully assessing these signs and symptoms but also by finding additional tools that help to discriminate high-risk patients, for whom surgical intervention is necessary, from low-risk patients, who need no further investigation or can be observed safely.


There are 3 possible scenarios in which misdiagnosis occurs. First, appendicitis is diagnosed, the patient undergoes an operation, and non-appendiceal disease is discovered, which may or may not benefit from surgical intervention (e.g., gynecologic lesions, colitis, or inflammatory bowel disease of the terminal ileum); in this scenario, the appendix may or may not be removed. Second, appendicitis is diagnosed, the patient undergoes an operation, and no abnormality is found. Again, the appendix may or may not be removed. Third, appendicitis is not diagnosed but the patient does have an inflamed appendix. The first 2 scenarios, misdiagnosis resulting in negative appendectomy (NA), are important not only for quality improvement but also for patient safety, cost, morbidity, and mortality. In the last scenario, most patients may return with persistent or recurrent appendicitis, with or without perforation or other complications, e.g., abscess. Misdiagnosis is undoubtedly serious, but its clinical importance is complicated. In some patients the condition may resolve without surgical treatment. Some early cases may progress over a period of observation, allowing detection on repeat evaluation. Worse, in some patients it may progress to perforation and complicated appendicitis. Concern about the third scenario has driven clinical practice toward the first 2 forms of misdiagnosis, so a high number of negative explorations for suspected appendicitis has been tolerated as surgeons endeavored to miss no cases, thereby averting perforation. On this view, it is much better to subject a moderate number of patients to unnecessary operation than to let one patient suffer perforation.

Increasing use of sensitive imaging (e.g., U/S, CT) can improve diagnostic accuracy and detect mild inflammation of the appendix, which may resolve without operation (19). Several studies in recent years have shown that the use of imaging was associated with a reduction in NA (20-22). However, this effect was not consistent, and many studies could not replicate the finding (23-26). The evidence even showed that clinical examination and CT scan differed little in sensitivity (83% vs 83.8%, respectively) and positive predictive value (PPV) (86.7% vs 83.8%), whereas U/S performed worse than both (sensitivity 35.5% and PPV 81.3%) (25).
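The two accuracy measures quoted above can be made concrete with a minimal Python sketch. The counts used here are hypothetical and chosen only for illustration; they are not data from the cited study (25). The function names are likewise illustrative. The third function rests on a stated assumption: if every test-positive patient is operated on, the negative appendectomy rate equals 1 - PPV.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of patients with appendicitis whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Proportion of test-positive patients who really have appendicitis."""
    return true_pos / (true_pos + false_pos)


def negative_appendectomy_rate(true_pos: int, false_pos: int) -> float:
    """Share of operations that remove a normal appendix.

    Assumption: every test-positive patient is operated on,
    so this rate is the complement of the PPV.
    """
    return false_pos / (true_pos + false_pos)


# Hypothetical 2x2 counts (illustration only, not study data).
tp, fn, fp = 83, 17, 13
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"PPV         = {positive_predictive_value(tp, fp):.3f}")
print(f"NA rate     = {negative_appendectomy_rate(tp, fp):.3f}")
```

Under that assumption, a diagnostic method with a higher PPV translates directly into fewer negative appendectomies, whereas sensitivity governs how many true cases are missed.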

Routine use of CT scan in the diagnosis of acute appendicitis has been increasing in recent years. This practice is highly controversial because of concerns about the hazards of ionizing radiation and about overutilization in clear-cut clinical presentations. Patients are exposed to high doses of radiation, equivalent to 400 chest films, which increases the risk of developing cancer or leukemia (27). One study suggested that a large proportion of patients who undergo abdominal and pelvic CT scanning receive medically unnecessary multiphase examinations, and that these unindicated additional phases add substantial excess radiation dose with no associated clinical benefit (20). Approximately 3 million scans were performed annually in the United States in 1980, and by 2008 that number had grown to 67 million (20). One study estimated that the benefit of universal imaging in avoiding 12 unnecessary appendectomies could come at the cost of one additional cancer death (28). In addition, a randomized controlled study comparing clinical assessment with CT for the diagnosis of acute appendicitis indicated that clinical assessment, unaided by CT scan, reliably identified patients who required operation. Therefore, routine use of abdominal/pelvic CT was not recommended (21). CT scan is not considered a standard of care for the diagnosis of acute appendicitis. A study of 1,630 patients with suspected appendicitis showed that the overall negative appendectomy rate in patients with a CT scan was 6%, similar to that in those without a CT scan (29). Neither CT scan nor U/S improves diagnostic accuracy or the negative appendectomy rate and, worse, either may delay surgical consultation and treatment.

The Alvarado score has been used to help decide whether to order a CT scan in an emergency setting (30). The score considers abdominal pain that migrates to the right lower quadrant, anorexia, nausea or vomiting, right lower quadrant tenderness to palpation, rebound abdominal tenderness, increased temperature (≥37.3°C or 99.1°F), leukocytosis (white blood cell (WBC) count >10,000 cells/mm3), and neutrophilia (cell count with left shift), for a total score of 10. If the score was 4 to 6, an adjunctive CT was recommended to confirm the diagnosis. Combining appropriate imaging with history, physical examination, and laboratory tests in a clinical prediction rule is crucial for the management of patients.

Over 20,000 studies have been published, but few randomized controlled

trials, especially of imaging, have been undertaken, and the evidence remains controversial(50).

In addition, the use of CT imaging varied widely, from as low as 12% in the

UK and 25% in Australia to 95% in the USA(27). The best management approach is

to consider three possibilities: hospital discharge, admission for observation, and

surgical treatment(19). Estimating the pre-imaging likelihood of appendicitis is important in

tailoring management. Low-risk patients could be discharged with appropriate safety

netting, whereas high-risk patients are likely to require early focus on timely surgical

intervention rather than diagnostic imaging. Using scoring systems to guide imaging

can be helpful(27, 31).

Appendicitis is one of the most common causes of surgical abdominal

emergency. Early diagnosis is a primary goal to prevent morbidity and mortality from

appendicitis. Failure of early diagnosis can lead to complications of disease such as

perforation and sepsis, increased morbidity, and occasionally mortality. The negative

appendectomy (NA) rate has been reported at between 15% and 30%, which is considered a surgical security zone.

Conversely, unnecessary appendectomy imposes a burden of time and cost. Negative

appendectomy is associated with incorrect diagnosis and with the surgeon’s

experience. Patients who are overweight, female, or elderly have a higher chance of

misdiagnosis of appendicitis. The severity and burden of negative appendectomy

range from economic loss of time and money to death from complications of surgery

or anesthesia, such as pulmonary embolism in high-risk patients. The mortality rate of

negative appendectomy has been reported as 0.14% to 1% (32).

The total cost of negative appendectomy in a pilot study at the Faculty of

Medicine Ramathibodi Hospital ranged from 10,000 to 20,000 baht, with a mean

hospital stay of 3 days (range: 2-15) and a mean absence from work of 5 days (range: 5-10).

Lowering the negative appendectomy rate would result in considerable savings in direct

cost and disability to patients. Improvement in diagnostic accuracy has been reported

to lower the perforation rate and to coincide with a decrease in negative laparotomy(19).

1.4 Rationale

Disparities in access to surgical diagnosis and management can result in

major discrepancies in the outcomes of patients. Omission of surgical care is a serious

oversight while omission of proper diagnosis before surgery may be more


harmful(33). Acute appendicitis in rural areas has a very different disease profile and

outcome when compared with that seen in cities well served by health services. There is a causal

relationship between delay in management and poor outcome, so strategies to reduce

these delays are urgently needed. One of the suggested strategies aimed at facilitating

the diagnosis of acute appendicitis is the introduction of clinical decision rules (CDR)

to assist with clinical decision making(34, 35). The Alvarado score, the most widely

used CDR, was originally designed more than two decades ago, yet its

performance and appropriateness for routine clinical use are still unclear. A systematic

review showed that the Alvarado score at the cut point of 5 performs well as a “rule

out” CDR in all patient groups with suspected appendicitis(36). Pooled diagnostic

accuracy in terms of “ruling in” appendicitis at a cut-point of 7 is not sufficiently

specific in any patient group to proceed directly to surgery. Certain loss of diagnostic

information may occur due to dichotomisation when the score was originally

constructed in the derivation study. Its construction was based on a review of patients

who had been operated on with suspicion of appendicitis, whereas the score is used in

all patients with suspicion of appendicitis(10). Applying the Alvarado score to the general

population may therefore be problematic because of how the score was derived (see details in Chapter

II). Other CDRs were developed, such as the Lindberg(37), Eskelinen(38) and Fenyö(39)

scores for appendicitis, which assign different numerical values to symptoms. The Van

Way, Teicher and Arnbjörnsson scores include gender as one of their

components(40). Some authors(41) reported that the Alvarado score outperformed

each of these other scores.

A clinical prediction score may reduce the negative appendectomy rate

as well as complications of appendicitis, including rupture, perforation, and

appendiceal abscess. This would reduce the risk of unnecessary operation, the risk of

anesthesia, cost of hospitalization, and unnecessary loss of work and time. Gregory et

al showed the cost effectiveness of integrating a CDR in the diagnostic protocol for

appendicitis(42). A CDR followed by staged imaging was found to be the most

cost-effective approach. Implementation of the Alvarado and Lintula scores for

decisions on hospital admission and appendectomy has been shown to reduce overall

treatment charges for acute right lower quadrant abdominal pain(43): the total

charge for 114 patients was reduced from $39,655 to $34,087 and $25,772 using the

Alvarado and Lintula scores, respectively.

A clinical scoring system estimates the probability of appendicitis

occurrence and should aid in the decision-making process for management. There are

a number of reasons to use scoring systems in managing cases of appendicitis. A

clinical score may be suitable as an instrument for selecting patients for immediate

surgery, further evaluation with imaging techniques, or observation as out/inpatients.

The score can be repeated during active observation and influence the decision to

operate. It must be emphasized that the intent of the scoring system is not to establish

a primary diagnosis of appendicitis, but simply to discriminate objectively when there

is uncertainty. Routine use of an Alvarado-like scoring system was evaluated in a large

German study comparing patients who were and were not assessed with the scoring

system(10). No difference in the rates of perforated appendix, negative

appendectomy, or complications was found between groups. However, the study showed

a significantly lower delayed appendectomy rate and a lower delayed discharge rate in

the group that routinely used the scoring system.

Several scoring systems have been developed for diagnosis of appendicitis

with interesting results; nevertheless, these systems are not routinely applied in

general practice. We have systematically reviewed how those scores were developed

and validated, and how well they performed. The review suggested that the

research methods used to develop appendicitis scoring systems were inconsistent. Although

several diagnostic scoring systems are available, applying them to the general

population might be questionable because improper methods were used to create the scores.

More appropriate scores with internal and external validation are still

required(44).

The goal of this study was to create a good CDR for diagnosis of

appendicitis with the following characteristics:

• Consistently applicable to all adult patients

• Explicit and credible criteria

• Reproducible

• Sufficient and comprehensive

• User-friendly, with good compliance

• Generalizable

• Cost-effective

In this thesis, the new score is expected to be developed and

validated using proper research methods, with good performance in internal and

external settings. It should aid clinical decision making, change

clinical practice behaviour, and improve outcomes in the diagnosis and

management of patients with appendicitis.

1.5 Research Questions

- What are significant predictors for diagnosis of appendicitis in patients

who are suspected of appendicitis?

- What is the performance of RAMA-AS in patients who are suspected of

appendicitis?

- Does the RAMA-AS perform better than the previously developed

scores?

- Can the RAMA-AS work well in internal validation and external

validation?

1.6 Research Objectives

1.6.1 Primary Objectives

1.6.1.1 To develop a RAMA-AS for diagnosis of appendicitis

in patients who are suspected of appendicitis.

1.6.1.2 To externally validate RAMA-AS using data from

settings different from those used for score development.

1.6.2 Secondary objectives

To compare the performance of RAMA-AS with that of the most widely used

scoring system, i.e., the Alvarado score, and other previously developed scoring systems.


Figure 1.1 Anatomy of appendix


Figure 1.2 Normal and abnormal appendix: A) Normal appendix at McBurney’s

point, B) Inflamed appendix, C) Gangrenous/ruptured appendix


CHAPTER II

LITERATURE REVIEW

This chapter mainly reviews previous scoring systems in terms of

how they were developed, which predictors were included, how scores are calculated, how

well they performed, and whether the scores had been internally and externally

validated. A recent publication in the World Journal of Emergency Surgery in 2016(18) addressed

“how to improve the clinical diagnosis of acute appendicitis in resource

limited settings” and stated that “diagnosis of acute appendicitis can be improved if

the clinician uses a careful history and physical examination, and simple laboratory

tests. However, under certain circumstance, additional tests could be needed”. This

approach has given good results in various studies, showing that clinical prediction

rules that combine related information are a simple, practical, economical, and

reliable method for the diagnosis of acute appendicitis(18). This chapter consists of

information as follows:

2.1 History of previous scores’ developments

The Alvarado score was developed and reported in 1986 by Alfredo

Alvarado(45). The score was developed from retrospective data of 227 patients in

Philadelphia, Pennsylvania, USA. A 2x2 table was constructed for each diagnostic

predictor, including migration of pain, anorexia-acetone in urine, nausea-vomiting,

tenderness, rebound pain, elevation of temperature, leucocytosis, shift to the left of

WBC, and rectal tenderness. The chi-square statistic was applied along with estimation of

probabilities, sensitivity, specificity, and predictive values. A diagnostic weight was

assigned to each clinical and laboratory result, considering only the true positive

and true negative results. A value of 2 was assigned to the important elements

(tenderness at right lower quadrant of abdomen and leucocytosis) and 1 to the remaining

elements (abdominal pain that migrated to right lower abdomen, anorexia, nausea or

vomiting, rebound tenderness, elevated body temperature, or neutrophilia) (Table 2.1).

The total score was 10: a score of 5 or 6 was compatible with a diagnosis of appendicitis

(patients can be observed), a score of 7 to 8 indicated probable appendicitis, and a score of

9 to 10 indicated very probable appendicitis. The modified Alvarado score was reported

by Kalan, et al, in 1994(46), using extra sign(s) from physical examination, including the

cough test, Rovsing’s sign, and rectal tenderness, instead of the laboratory value of left shift.

Another modification used a total score of 9 after removing the laboratory value of

left shift from the original score. Khan, et al(47) reported low sensitivity (59%) and

specificity (23%) of the Alvarado scoring system, with a negative appendectomy rate of

15.6%, when applied to an Asian population. Al-Hashemy, et al(48) reported similarly low

sensitivity (53%) and a specificity of 80% when the modified Alvarado score was applied to

a Middle Eastern population. In my opinion, the Alvarado score lacks some parameters that

have an important impact on the diagnosis of appendicitis, so there is room for

improvement to generate a better scoring system for the Thai population.
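The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each component is recorded as present or absent; the class, field, and function names are hypothetical, and the interpretation bands are only those stated in the text (scores below 5 are not described there).

```python
from dataclasses import dataclass, fields


@dataclass
class AlvaradoFindings:
    """Presence (True) or absence (False) of each Alvarado component."""
    migration_of_pain: bool = False
    anorexia: bool = False
    nausea_or_vomiting: bool = False
    rlq_tenderness: bool = False
    rebound_tenderness: bool = False
    elevated_temperature: bool = False
    leucocytosis: bool = False
    neutrophilia: bool = False


# Weights as described in the text: 2 points for RLQ tenderness and
# leucocytosis, 1 point for every other component (maximum 10).
TWO_POINT_ITEMS = {"rlq_tenderness", "leucocytosis"}


def alvarado_score(findings: AlvaradoFindings) -> int:
    """Sum the weights of all components that are present."""
    total = 0
    for f in fields(findings):
        if getattr(findings, f.name):
            total += 2 if f.name in TWO_POINT_ITEMS else 1
    return total


def interpret_alvarado(score: int) -> str:
    """Map a total score to the bands given in the text."""
    if score >= 9:
        return "very probable appendicitis"
    if score >= 7:
        return "probable appendicitis"
    if score >= 5:
        return "compatible with appendicitis (observe)"
    return "below the ranges described in the text"
```

For example, a patient with only RLQ tenderness and leucocytosis would score 4, which falls below the bands described above.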

The Raja Isteri Pengiran Anak Saleha Appendicitis (RIPASA) score was developed and

reported in 2010 by Chong C, et al(49). This score was developed using a retrospectively

collected database of RIPAS hospital, Brunei Darussalam, between October 2006 and

May 2008. A total of 312 patients who had presented with right iliac fossa pain suspected

to be appendicitis and who underwent emergency appendectomy as the primary procedure

were included in this study. The mean age of patients was 26±13.5 years, with a male to

female ratio of 1.4:1. The negative appendectomy rate was 16.3%. Final diagnosis of

appendicitis was obtained from the resected appendix. The panel of surgeons at RIPAS

hospital agreed to use 15 parameters for score development, i.e., age, gender, right iliac

fossa (RIF) pain, nausea and vomiting, anorexia, duration of symptoms, RIF tenderness,

guarding, rebound tenderness, Rovsing’s sign, fever, elevated WBC count, negative

urinalysis, and foreign national registration identity card (NRIC). The probability of

appendicitis was estimated by logistic regression analysis, and used to generate scores

as shown in Table 2.2. The optimal cut-off threshold score generated from the ROC

analysis was 7.5. The sensitivity, specificity, positive predictive value, negative

predictive value, and accuracy were 88.46% (95% confidence interval (CI) 83.94-

92.08), 66.67% (95%CI 52.08-79.24), 93.00%, 53.00%, and 80.50% (95%CI 73.35-

87.65), respectively. The predicted negative appendectomy rate at cut off score of 7.5


was 6.9%, reduced from 16.3% (9.3% reduction, p=0.0007). The RIPASA

score had good discrimination, with an area under the Receiver Operating Characteristic (ROC)

curve of 0.89. In my opinion, the RIPASA score had too many parameters and was generated from

retrospective data with some missing values (84% for Rovsing’s sign, 36% for rebound

tenderness, 54% for anorexia, 18% for migration of pain, 13% for negative urinalysis, 7% for right

lower quadrant guarding). It is possible to have a new appendicitis score that is easier

to use and suitable for the Thai population.
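The performance measures quoted for each score (sensitivity, specificity, positive and negative predictive values, and accuracy) all follow from a 2x2 table of score result against final diagnosis. The sketch below uses the standard definitions; the function name and the counts in the usage note are hypothetical and are not taken from any study cited here.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table.

    tp: score positive, appendicitis confirmed
    fp: score positive, no appendicitis
    fn: score negative, appendicitis confirmed
    tn: score negative, no appendicitis
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }
```

With hypothetical counts of 80 true positives, 10 false positives, 20 false negatives, and 90 true negatives, `diagnostic_metrics(80, 10, 20, 90)` gives a sensitivity of 0.80, a specificity of 0.90, and an accuracy of 0.85.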

The Appendicitis Inflammatory Response Score (AIRS) was reported by

Andersson and Andersson in 2008(50). This score was generated in Sweden by

prospective data collection of 545 patients admitted with suspected appendicitis

between October 1992 and December 1993. The score was developed from 316 randomly

selected patients. The simplified score was constructed using ordered logistic

regression. Eight variables with independent diagnostic value, including right lower

quadrant (RLQ) pain, rebound tenderness or muscle defense, white blood cell count,

proportion of neutrophils, C-reactive protein (CRP), body temperature ≥ 38.5 degrees

Celsius, and vomiting, remained in the final model, with scores ranging from 0 to 12 (Table

2.3). A score of 0-4 was classified as low probability, and outpatient follow-up can be

arranged if the general condition is unaltered. A score of 5-8 indicated indeterminate risk, and

in-hospital observation, re-evaluation, and further investigation were recommended. A

score of 9-12 indicated high probability, and surgical exploration was proposed. The score was

internally validated using 229 patients. The discrimination capacity of the score was

better than Alvarado score in all appendicitis and advanced appendicitis samples (Table

2.3). The ROC area of the new score was 0.97 for advanced appendicitis and 0.93 for

all appendicitis compared with 0.92 (p = 0.0027) and 0.88 (p = 0.0007), respectively,

for the Alvarado score. The sensitivity, specificity, positive predictive value, and

negative predictive value of the score with the cutoff point more than 4 were 0.96, 0.73,

0.64, and 0.97, respectively. The score has not been externally validated in an Asian

population. It needs external validation in a Thai or Asian population and further

evaluation in a prospective interventional study.
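The three AIRS risk bands and their suggested management can be written as a simple lookup. This is a sketch only: the function name is hypothetical, and it takes an already computed total score, because the per-item weights are given in Table 2.3 rather than in the text.

```python
def airs_risk_group(score: int) -> tuple:
    """Return (risk group, suggested management) for an AIRS total of 0-12."""
    if not 0 <= score <= 12:
        raise ValueError("AIRS total score must be between 0 and 12")
    if score <= 4:
        return ("low probability",
                "outpatient follow-up if general condition is unaltered")
    if score <= 8:
        return ("indeterminate",
                "in-hospital observation, re-evaluation, further investigation")
    return ("high probability", "surgical exploration")
```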

The Fenyö-Lindberg scoring system (Table 2.4) was reported in 1987 by

Fenyö from Sweden(39). The score was developed from prospective data collection of

259 patients who were suspected of having appendicitis. The score was developed


separately for men and women. The sensitivity and specificity were analysed according

to the presence and absence of 19 parameters. The weight of evidence, equal to

$10 \log_e\left(\text{sensitivity}/(1-\text{specificity})\right)$, was expressed as a positive/negative score. The score was

externally validated twice, as reported by Fenyö, et al in 1997(37). The first validation

encompassed 19 indicators from 830 consecutive patients. The second validation was

based on 10 parameters including sex, white cell count, duration of pain, progression of

pain, relocation of pain, vomiting, aggravation by coughing, rebound tenderness,

rigidity, and tenderness outside right lower quadrant in 1167 patients with suspected

appendicitis. A score of -2 or more had a probability of appendicitis ≥0.45 and was used

as an indication that the patient had appendicitis, supporting a decision to perform

appendectomy. A score of -17 or less had a probability of appendicitis ≤ 0.16 and was

considered non-specific abdominal pain, guiding non-operative management

by observation or discharge. A score between -3 and -16 had a probability of appendicitis

between 0.17 and 0.44 and was interpreted as indeterminate, guiding in-hospital

observation with repeated examination. The sensitivity, specificity, PPV, NPV, and

accuracy of this scoring system at the cut-off level of -2 or more were 0.73, 0.87, 0.75,

0.87, and 0.83, respectively. The negative appendectomy rate after using the score was

17.5%. However, the Fenyö-Lindberg scoring system had some limitations such as the

complexity of score (each parameter had both negative and positive score) and high

negative appendectomy rate.
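The weight-of-evidence formula and the three decision bands described above can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, only the weight for a finding that is present is computed (the text does not spell out the formula used when a finding is absent), and the sensitivity and specificity in the usage note are made-up values.

```python
import math


def weight_of_evidence(sensitivity: float, specificity: float) -> float:
    """10 * ln(sensitivity / (1 - specificity)) for a finding that is present."""
    return 10.0 * math.log(sensitivity / (1.0 - specificity))


def fenyo_lindberg_category(total_score: float) -> str:
    """Classify a total score using the cut-offs quoted in the text."""
    if total_score >= -2:
        return "appendicitis (supports appendectomy)"
    if total_score <= -17:
        return "non-specific abdominal pain (observe or discharge)"
    return "indeterminate (in-hospital observation, repeated examination)"
```

A finding with a hypothetical sensitivity of 0.80 and specificity of 0.60 would carry a weight of 10 ln 2, about 6.9, whereas a finding that is equally common in both groups (sensitivity equal to 1 minus specificity) carries a weight of 0.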

The Ohmann scoring system was reported in 1995(51) and considers 8 parameters,

including tenderness in RLQ, rebound tenderness, dysuria, constant pain, WBC count,

patient age > 50 years, shifting pain, and local guarding (Table 2.5). The original

publication was in German. A score < 6.5 should exclude appendicitis,

whereas a score above 12 is highly suggestive of appendicitis. A score

between 6.5 and 12 suggests that the finding is unclear and the patient needs observation.

Tepel, et al(52) performed prospective evaluation of the score and found that the

sensitivity, specificity, PPV, NPV, and accuracy were 61%, 85%, 61%, 85%, and 78%,

respectively.

The Eskelinen scoring system was reported in 1992(38) and includes 6 variables,

i.e., tenderness, rigidity, leucocyte count, rebound tenderness, pain at presentation, and

duration of pain. Stepwise multivariate logistic regression analysis was used to

develop the diagnostic score, and 3 tests were evaluated to find the best combination of

independent predictors of acute appendicitis for males and females. Each parameter has

criterion points that are multiplied by the respective factors and summed to give the

final score. The cut-off point for diagnosis of appendicitis was 55 (Table 2.6). Sitter et

al(53) externally validated this score using prospective data from 2,359 consecutive

patients in Germany and found the sensitivity, specificity, PPV, NPV, and accuracy of

79%, 85%, 68%, 91%, and 84%, respectively. They re-calibrated the score’s cut-off

value to 57, which yielded better results and decreased the rate of negative appendectomy

from 26.6% to 15.4%.

A simple scoring system was reported by Christian F and Christian GP in

1992(54). The score was developed by a non-statistical method. There were 5 parameters:

abdominal pain, vomiting, RLQ tenderness, low grade fever (body

temperature ≥ 37.8°C by oral route), and polymorphonuclear (PMN) leukocytosis

(Table 2.7). A simple rule was applied: if four or more of the 5

parameters were present, appendectomy was performed. If the patient had 3 criteria on admission,

active inpatient observation was necessary until either a 4th criterion developed and

appendectomy was carried out, or the patient recovered with no progression beyond

the third criterion. The study was done in 58 patients, who were compared with a

control group of 59 patients from another surgical unit. The negative appendectomy rate was

significantly lower in the group of patients that used scoring system (6.5%, 3/46) when

compared to the control group (17%, 10/59). This score has yet to be externally

validated.
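The counting rule of this simple score can be sketched as below. The function and argument names are hypothetical, and because the text does not say how patients with fewer than 3 criteria were managed, that branch is reported as unspecified rather than guessed.

```python
def christian_rule(abdominal_pain: bool, vomiting: bool, rlq_tenderness: bool,
                   low_grade_fever: bool, pmn_leukocytosis: bool) -> str:
    """Apply the '4 of 5 criteria' rule described in the text."""
    n_criteria = sum([abdominal_pain, vomiting, rlq_tenderness,
                      low_grade_fever, pmn_leukocytosis])
    if n_criteria >= 4:
        return "appendectomy"
    if n_criteria == 3:
        # Re-apply the rule on reassessment: a 4th criterion leads to surgery.
        return "active inpatient observation"
    return "not specified in the text"
```

Because the rule is meant to be repeated during observation, the same function could be called again with updated findings at each reassessment.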

A practical score of Ramirez and Deus was reported in 1994(55). The score

was developed using univariate analysis. Positive and negative weights were given to

each significant predictive parameter using Bayesian probability. There were 7

parameters including sex, initial pain (epigastric or other locations), diarrhea, white cell

count, differential white count, guarding in RLQ, and rebound tenderness (Table 2.8).

Bayesian methodology was used to generate the scoring system. No appendicitis was

found with a score less than -15. The mean score in patients with proven appendicitis was 18

(-15 to 37). In prospective evaluation, the sensitivity and specificity of this scoring

system were 80% and 81%, respectively. This score proposed a dynamic system in which a

patient’s score can increase or decrease on reassessment. It supported the

effectiveness of a score generated from a local database as an open system that can

incorporate new parameters to produce a better scoring system.

Neither internal nor external validation has been reported on this system.

A scoring system developed by Teicher, et al was published in 1983(56).

The score was developed using univariate analysis: the rate of occurrence of each

predictive parameter was determined, and a ratio was assigned a positive value when the

rate was greater in appendicitis group and negative when greater in non-appendicitis

group. There were 7 parameters including sex, age, duration of symptoms, genitourinary

tract symptoms, muscle spasm at RLQ, rectal mass at right side, and WBC count (Table

2.9). The total score ranged from -11 to 11, and a cut-off point of -3 was recommended.

Single parameters have also been reported as predictors of appendicitis; for example,

hyperbilirubinemia was associated with perforated appendicitis(57). Imaging

technologies such as ultrasonography, computed tomography, and magnetic resonance

imaging have been used with clinical data or scoring systems to improve the accuracy of

diagnosis of appendicitis. I have worked with the Redmond group and published a new

perspective on appendicitis: calculation of the half time (T1/2) for perforation(58). Random

forest (RF), support vector machines (SVM), and artificial neural networks (ANN) were

used to improve the accuracy in diagnosis of appendicitis(59). Hsieh, et al found that

RF was significantly more accurate than ANN, logistic regression, and Alvarado in

diagnosis of appendicitis(21). SVM performed better than logistic regression and

the Alvarado score. No significant difference was found between ANN, logistic regression and

Alvarado (Table 2.10).


2.2 Systematic review of scoring systems for diagnosis of appendicitis(60)

Appendicitis is one of the most important clinical causes of acute

abdominal pain, with an incidence of 110/100,000(52). Although many attempts have

been made to improve diagnostic accuracy, false positives and false negatives

remain common, with rates of negative appendectomy of 15% to 26%(61, 62) and

of perforated appendicitis of 10% to 30%(63). Several scoring systems, including

computer-based models and algorithms, have been developed with good

performance at initial evaluation but only fair performance when applied to general populations.

These scoring systems have therefore been applied only occasionally in routine

practice, because of a lack of accuracy in validation studies(34). A

negative appendectomy (i.e., a false positive) is less life threatening than a false

negative, which can lead to death from appendiceal perforation and

peritonitis. As a result, an aggressive surgical approach

was frequently applied when the situation was in doubt, which resulted in removal of

normal appendices. To reduce such aggressive management, diagnostic tests for

appendicitis must better discriminate patients who

require prompt surgical intervention from patients who need only observation

without a risk of complication of appendicitis.

Imaging modalities have been used to improve diagnostic accuracy.

However, there are some disadvantages, including cost, limited accessibility (particularly in

developing countries), lack of radiologists, examiner-dependent efficacy (e.g.,

ultrasound), potentially harmful ionizing radiation (e.g., computerized tomography, CT), and low

performance where disease prevalence is low or high. Clinical scoring systems that

synthesize clinical information have been developed and should be useful for

countries where imaging is less accessible. The scores are derived by incorporating

physical examination, clinical signs and symptoms into a mathematical equation.

Currently, there are a number of diagnostic scores constructed by many groups using

various statistical methods(21, 30, 37, 39, 41, 45, 46, 49-55, 64-85). Some scores have

been validated either internally(50, 84) or externally(39, 45, 46, 50, 51, 54, 64, 68, 76,

82-84, 86) whereas some scores have been applied without validation(55, 69).

Performances of those scores varied from fair to good in validation phases, but some


scores were still questionable. We therefore conducted a systematic review which aimed

at exploring score performances in both development and validation phases. Strengths

and limitations of previous diagnostic scores were critically appraised. Lessons from

this review will help to identify the most valid model/s or lead to the creation of a new model

if required. The model can be later applied in general settings in developing countries

where resources are limited.

Methods

Search strategy

We searched Medline from 1949 and EMBASE from 1974 to March 2012

to identify relevant articles published in English. The search terms were as follows:

appendicitis, gangrenous appendicitis, phlegmon, perforated appendicitis, abdominal

pain, score, scoring system, prediction score, prediction model, diagnostic score,

assessment tool, ultrasonogram, ultrasonography, computer tomography, accuracy,

negative appendectomy, sensitivity, specificity, likelihood ratio, false positive, false

negative, true positive, true negative, ROC, AUC. The search strategies are described in

the appendix.

Study selection

Studies were reviewed based on titles and abstracts. If a decision could not

be made, full articles were retrieved. Observational studies (cohort, case-control, or

cross-sectional) published in English were selected if they met the following

criteria: included adults with suspected appendicitis, considered more than one risk factor in the

prediction score, had appendicitis versus non-appendicitis as the outcome, applied any

method (e.g., logistic regression, Bayesian method, or a non-mathematical approach based on investigator

opinion) to build the prediction model, and reported each model’s performance

(i.e., calibration and discrimination parameters).

Data extraction

The general characteristics of studies (i.e., author, journal, publication year,

type of participants, ethnicity, study design, number of subjects, rate of negative

appendectomy, percent of complicated appendicitis, and specific objective/s (i.e.,

develop or validate score, or both)) were extracted. If the diagnostic model was newly

developed, specific information about model building (i.e., type of statistical model,

predictive factors, creating scores using coefficients or exponential of coefficients) was

extracted. Calibration (a ratio of expected versus observed value (E/O ratio)) and

discrimination parameters (i.e., the concordance statistic (C-statistic)), along with 95%

confidence intervals (CIs), were also extracted. These parameters were calculated if the

study did not report them directly but did provide summary data that allowed

calculation. For studies that aimed to validate a model, the type of validation

(internal, external, or both) and results were also recorded. If authors had modified the

previous prediction models, the following aspects were recorded: whether any of the

original included variables were removed or modified; and whether new predictive

factors were added.
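To make the two extracted performance measures concrete, the sketch below computes a calibration ratio and a C-statistic from patient-level data. It is illustrative only: the function names are hypothetical, the review itself worked from published summary data rather than individual records, and the calibration function returns observed/expected (the expected/observed form is simply its reciprocal).

```python
def observed_expected_ratio(outcomes, predicted_probs):
    """Observed number of appendicitis cases divided by the number expected.

    outcomes: 1 for appendicitis, 0 otherwise
    predicted_probs: model-predicted probability for each patient
    """
    return sum(outcomes) / sum(predicted_probs)


def c_statistic(outcomes, scores):
    """Concordance statistic: the proportion of (case, non-case) pairs in
    which the case has the higher score; ties count as one half."""
    cases = [s for y, s in zip(outcomes, scores) if y == 1]
    non_cases = [s for y, s in zip(outcomes, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))
```

A ratio of 1 indicates that the predicted and observed numbers of cases agree, and a C-statistic of 0.5 corresponds to discrimination no better than chance.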

Risk of bias assessment

The risk of bias assessment tool was developed based on a user’s guide for

clinical prediction rule(62), which considered both derivation and validation phases.

Four domains were considered for the derivation phase, i.e., selection bias

(representative of spectrum), information bias (ascertainment of outcome

measurements, blinding outcome assessment, number of predictors, assessment

predictors without knowledge of outcome, proportion of important predictors),

confounding bias (used multi-variate regression analysis, created score properly), and

other issues (sample size, clinically sensible). For the validation phase, only 3 domains

were considered, i.e., selection bias (representative of spectrum), information bias

(ascertainment of outcome measurement, blinded assessment of outcome, accurate

interpretation), and other issue (i.e., follow up). Each item was classified as yes (low

risk of bias), no (high risk of bias), and unclear if there was insufficient information to

judge. Two reviewers (CW and TA) independently extracted data and assessed risk

of bias for all included studies. Any disagreement was resolved by discussion with a third party

(AT).

Statistical analysis

Model performance was described separately for the derivation and validation

phases. Calibration (O/E ratio) and discrimination (C-statistic) coefficients along with

their 95% CIs were estimated for each study. A meta-analysis was applied to pool O/E

and C-statistic using the equations as described in the appendix. Heterogeneity was

assessed using the Q statistic, and the degree of heterogeneity (I²) was estimated. If heterogeneity was

present (p value <0.10 or I² > 25%), a random-effects model was used to pool data;

otherwise, a fixed-effect model was applied. All analyses were performed using Stata

version 12.0.
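The pooling logic can be illustrated with a generic inverse-variance sketch. This is not the set of equations given in the thesis appendix or the Stata routine that was actually used: it assumes each study contributes an estimate and its standard error, uses the DerSimonian-Laird estimator for the between-study variance (an assumption), and chooses the model from the I² threshold alone, omitting the p-value check on Q. The function name is hypothetical.

```python
def pool_estimates(estimates, std_errors, i2_threshold=25.0):
    """Inverse-variance pooling with Cochran's Q and I-squared.

    Uses a fixed-effect model unless I-squared exceeds the threshold, in
    which case DerSimonian-Laird random-effects weights are applied.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q and the percentage of variation due to heterogeneity.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    if i2 <= i2_threshold:
        return {"model": "fixed", "pooled": fixed, "Q": q, "I2": i2}

    # DerSimonian-Laird between-study variance.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = (sum(w * y for w, y in zip(re_weights, estimates))
              / sum(re_weights))
    return {"model": "random", "pooled": pooled, "Q": q, "I2": i2}
```

Ratio measures such as O/E are commonly pooled on the log scale, so in practice the inputs would often be log-transformed estimates and their standard errors.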

Results

Description of studies

We identified 440 studies, of which 37 met our inclusion criteria and

thus were eligible for the review (Figure 2.1). Among these 37, 10 studies(38, 39, 45, 46,

54, 56, 69, 76, 77) aimed only to derive prediction scores or modify previous

prediction models (hereafter called derived studies), 4 studies(50, 55, 83, 84)

derived their scores and validated them internally and externally in the same study, whereas 23 studies

aimed only at internal(51, 87) or external(10, 21, 30, 37, 41, 49, 52, 53, 66, 67, 74,

75, 78, 80, 88-93) validations.

Among 14 derived studies(38, 39, 45, 46, 50, 54-56, 69, 76, 77, 83, 84, 94),

all studies focused on adult patients, and most studies included patients with suspected

appendicitis who underwent operation or were under observation, whereas 3

studies(55, 56, 84) included only patients who underwent operation. Ten models(38, 39,

45, 50, 54-56, 69, 76, 83, 84, 94) were developed in Caucasian populations while three

models(46, 54, 77) were developed in Asian populations. The models were mostly constructed

from cohorts, either retrospective(45, 55, 84) or prospective(38, 39, 46, 50,

69, 76, 77, 83, 94).

Among 23 studies that aimed only for validation, 20 studies had validated

models on patients with suspected appendicitis whereas 3 studies had focused on

operated patients. Most study designs were prospective cohorts. Fifteen studies were

done in Caucasian populations while 8 studies were done in Asian populations.

Risk of bias assessment

Risk of bias assessments were performed (Table 2.11). The methodological

assessment of derivation studies was based on the following questions(95):

were important predictors included and present in a significant proportion, were the

outcome events and predictors clearly defined, was assessment of the outcome event

blinded, was the sample size adequate, and did the clinical rule make clinical sense?

Among the 14 derived studies, 8 (57.1%) had recruited

consecutive patients with a chief complaint of abdominal pain, or randomly sampled

patients from a well-defined population frame of abdominal pain; whereas the remaining


studies had recruited a specific group of patients who had at least a few clinical signs

and symptoms. Most studies (92.9%) had confirmed the diagnosis of appendicitis by

histology, without mentioning whether the histological assessment was blinded to clinical

information. The number of predictors used in the prediction models was considered

appropriate (i.e., low risk of bias) if authors considered and used predictors from all

categories, namely demographic data, clinical signs, symptoms, lab, and imaging data;

otherwise this item was graded as high risk of bias. Ten out of fourteen (71.4%) studies

clearly listed all categories of predictors, whereas the remaining studies considered only a

few categories. Only 5/14 (35.71%) studies stated clearly how they measured or

collected predictors in the way that assessors were blinded from knowledge of the final

diagnosis of appendicitis, lab, and imaging findings, whereas 57.14% of studies used

predictors which were not blinded or assessed with knowledge of possible diagnosis of

appendicitis.

Eleven out of fourteen studies (78.6%) had performed statistical estimations

or tests for all predictors, whereas 3/14 (21.4%) studies did not apply any statistical

method. However, only 5/14 (35.7%) studies had applied multivariate regressions by

simultaneously including significant predictors in the models, and used coefficients or

relative risks suggested from regression models to create scores, whereas the remaining

studies created prediction scores based on univariate results or non-statistical models.

Twelve (85.7%) studies had sufficient numbers of subjects for either

appendicitis patients or total patients considered based on a rule of thumb (1 predictor

per 10 appendicitis cases or per 20-30 total subjects). Most studies (71.4%) included

predictors that seemed to be clinically sensible; their scores were easy to apply and also

suggested a course of clinical action.

The methodological assessment of validation studies was based

on the following questions(95): were patients chosen in an unbiased fashion and

did they represent a wide spectrum of disease severity, was there a blinded assessment of

the criterion standard, was there an explicit and accurate interpretation of the predictor

variables and the actual rule without knowledge of the outcome, and was there 100% follow

up?

Among validation studies, 21/25 (84%) were at low risk of selection

bias. Ascertainment of the diagnosis of appendicitis was clearly defined in 24/25 (96%)

studies. None of the studies mentioned whether the diagnosis of appendicitis was masked from

clinical data. Thirteen out of twenty-five (52%) studies clearly described that

interpretation of the rule was not influenced by information of final diagnosis of

appendicitis, while in 24% it was influenced by the diagnosis of appendicitis and 24% did not

mention it. Only 6 (24%) studies had followed up all included patients.

Score development

Among 14 derivative studies, 5 categories of predictive variables were

considered in the models including demographic data, clinical signs, clinical symptoms,

laboratory results, and imaging (Table 2.12). Among 2 demographic variables, gender

was more commonly included in the models than age (42.9% vs 14.3%).

Ten symptom variables were considered in which nausea (9/14, 64.3%) was the most

commonly included in the models, followed by migration of pain, pain at presentation,

or duration of pain (all were 46.2%). Nine clinical signs were considered and the most

common variables used were rebound tenderness (76.9%), followed by right lower

quadrant (RLQ) tenderness (61.5%), and RLQ guarding (53.9%) or elevated

temperature (53.9%). Among 10 clinical symptoms, nausea/vomiting (53.9%) followed

by migration and duration of pain (46.4%) were most commonly included in the

predictive models. Most studies (84.6%) considered at least one lab variable. Among

these, rising white blood cell count (76.9%) was the most commonly used, followed by

left shift of PMN cell (46.2%). Only a few studies used radiological data (e.g.

ultrasonography and abdominal radiograph) in creating scoring systems.

These prediction scores were developed using statistical modeling in 5

studies(38, 50, 76, 83, 84) whereas 9 studies(39, 45, 46, 54-56, 69, 77, 94) did not apply

statistical modeling. Among 5 studies with statistical modelling, 4 studies(38, 50, 76,

83) applied multivariate logistic regression and 1 study(84) used discriminant analysis.

Scoring schemes of these models were created based on regression coefficients of the

logit or discriminant regression models. Among 9 studies that did not apply statistical

models, a univariate analysis (e.g., Chi-square test, relative risk) and estimated

diagnostic parameters (e.g., likelihood ratio, sensitivity, specificity) were used for

assessing associations in 6 studies, whereas 3 studies did not apply any statistical

analysis tests.


Model performances

The models’ performances using C-statistics and O/E calibration

coefficients were extracted from individual studies, if reported, otherwise they were

estimated using summary data reported in the articles, see Table 2.13. Among 10 studies

where the calibration coefficient O/Es were available, the O/Es were very similar across

studies with the overall pooled O/E of 1 (95% CI: 0.97, 1.03). In contrast, the

discrimination coefficient C statistics varied from poor (0.54) to excellent (0.97)

discrimination with the pooled C statistic of 0.79 (95% CI: 0.67, 0.90). The C statistics

also varied widely (from 0.59 to 0.97) among the 2 studies(38, 83) that used

appropriate statistical methods to derive prediction scores.
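
For readers who want to see what the two performance measures compute, the following is a minimal sketch, assuming each patient has a predicted probability from a model and a binary outcome (1 = appendicitis). The function names and the example data in the note below are hypothetical and do not come from any reviewed study.

```python
def c_statistic(probabilities, outcomes):
    """Proportion of (case, non-case) pairs in which the case has the higher
    predicted probability; ties count as half a concordant pair."""
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    controls = [p for p, y in zip(probabilities, outcomes) if y == 0]
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def observed_expected_ratio(probabilities, outcomes):
    """Observed number of events divided by the sum of predicted probabilities."""
    return sum(outcomes) / sum(probabilities)
```

For example, with predicted probabilities 0.9, 0.4, 0.6, 0.2 and outcomes 1, 1, 0, 0, three of the four case/non-case pairs are ranked correctly, giving a C-statistic of 0.75; an O/E ratio close to 1 indicates that the predicted and observed numbers of events agree.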

Six of 14 prediction models had internally validated their prediction scores,

but only 5 had data available. The discrimination coefficient C statistics ranged from

0.61 to 0.92 with the pooled C statistic of 0.84 (95% CI: 0.77, 0.92). Pooling within subgroups

according to appropriateness of derived predictive scores suggested similar results with

the C statistics of 0.81 (95%CI = 0.65, 0.97) and 0.88 (95%CI = 0.85, 0.91) for

appropriate and inappropriate derived predictive scores, respectively.

Twenty-three studies had been conducted which aimed at external

validation of 14 prediction models. The Alvarado score(45) was frequently validated in

14 studies(10, 21, 30, 41, 49, 50, 66, 75, 78, 80, 83, 88, 90, 96) followed by Fenyo model

in 3 studies(37, 67, 83). The study by Tzanakis et al(83) had externally validated 8

previous models, and thus was a major contributor of data in poolings. Most studies

created diagnostic scores using predictive factors according to the original scores. Data

used for validations were 15 Caucasian(10, 30, 37, 51-53, 66, 67, 74, 78, 88-90, 93, 94)

and 8 Asian(21, 41, 49, 75, 80, 91, 92, 96) populations studies.

Fourteen studies had externally validated Alvarado scores. All eight

variables (i.e., migration of pain, anorexia, nausea/vomiting, elevated temperature,

rebound tenderness, RLQ tenderness, increased WBC, and PMN left shift) were

included in the externally validated models, with the pooled O/E and pooled C-statistic of

0.99 (95%CI, 0.91 to 1.09) and 0.74 (95%CI, 0.69, 0.79), respectively. The Alvarado

score was also modified by two subsequent studies which excluded the shift to left of

PMN because these data were unavailable in a routine laboratory(46, 77), or replaced it

with a few other variables (i.e., cough test, Rovsing’s sign, rectal tenderness). The

C-statistic changed from 0.80 (95%CI, 0.73, 0.86) to 0.76 (95%CI, 0.60,

0.92) with PMN excluded, and was even worse when PMN was replaced with these variables,

with a C-statistic of 0.54 (95%CI, 0.45, 0.63). External validation of other scoring

systems was performed in 9 models with the pooled statistic of 0.81 (95%CI: 0.77, 0.84),

but this was mainly contributed to by Tzanakis, et al(83) who had validated 7 models.

Pooling external validated studies according to appropriate and inappropriate original

model construction resulted in the C statistics of 0.80 (95% CI: 0.65, 0.94) and 0.77

(95%CI: 0.74, 0.81), respectively.

Discussion

We have reviewed performances of diagnostic models for appendicitis.

Most models yielded relatively fair to good performances in discrimination with the

pooled C-statistic of 0.84 (95%CI 0.77, 0.91) in settings where the models were

developed and 0.78 (95%CI 0.74, 0.82) in settings where the models were applied.

However, only one third of scores were appropriately derived based on regression

models.

For those models with good to excellent external performances (C-statistic

≥0.8), 10 variables were commonly included in the models, which were migration of

pain, anorexia, nausea/vomiting, duration of pain, elevated temperature, rebound

tenderness, right lower quadrant tenderness, guarding, increased white blood cell, and

left shift of PMN. These models were originally developed using proper statistical

modeling (i.e., logistic regression) in only 2/23 studies whereas the rest had used results

of diagnostic parameters or univariate analysis (i.e., Chi-square test) without proper

rationale for weighting in prediction scores.

The Alvarado score, developed by Alvarado et al(45) in 1986, was the

most popular prediction model for diagnosis of appendicitis, and aimed to identify

patients who should undergo operation or observation. The model was originally developed

using data from 277 Caucasians to assess the association between 8 predictive factors

and appendicitis. These predictive factors included localized tenderness at the right

lower quadrant (RLQ) of abdomen, migration of pain, elevation of temperature,

nausea/vomiting, anorexia/acetone in urine, rebound tenderness at RLQ of abdomen,

leukocytosis, and shift to the left of PMN cell. The diagnostic parameters (i.e.,

sensitivity, specificity and accuracy) were estimated for each individual predictive


factor and used for creating the prediction score. The discrimination ability (C-statistic)

was 0.80 (95%CI, 0.73, 0.86) in the derivative phase which dropped to 0.74 (95%CI,

0.69, 0.79) after external validation.

Since the PMN is unavailable in a routine laboratory(46, 77), it was

excluded from the model, which yielded slightly lower performance in the derived phases

(C-statistics 0.80 vs 0.76), but performance was far worse if it was replaced with a few clinical

variables (i.e., cough test, Rovsing’s sign, rectal tenderness; C-statistics 0.80 vs 0.54).

The O/E ratio is commonly used to measure the closeness of the predicted

and the observed values. The C-statistic is usually applied to measure how well the

model will assign a higher probability of having an event to an appendicitis group and

a lower probability to a non-appendicitis group(97). The association between diagnostic

factors and appendicitis found in the derivation data may have occurred by chance. This

problem is prominent in situations in which there is a relatively small sample size

compared with the diagnostic factors included in the model. With a small sample size,

a model is more likely to select unimportant variables but omit some important variables from

the model(53). Conversely, a very large sample size is more likely to include statistically

significant variables without clinical importance. The number of subjects with events

should be at least 10, and more safely 20 or more, per risk factor to derive a

valid model, as suggested by simulation studies(80, 92). For the results of our review, the number

of variables included in the model varied from 4 to 18 variables, so the required number

of appendicitis cases should therefore be at least 40 to 180 subjects, and 80 to 360

subjects to be safer. Six out of 14 derived studies(46, 49, 54, 56, 69, 76) had

fewer appendicitis cases than required, including 5 to 14

variables with total appendicitis cases of 43 to 261 patients.
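
The events-per-variable arithmetic in this paragraph can be written out as a short sketch. The function names are hypothetical; the thresholds of 10 and 20 events per predictor are the ones quoted above.

```python
def required_events(n_predictors, events_per_variable=10):
    """Minimum number of appendicitis cases under the events-per-variable rule."""
    return n_predictors * events_per_variable


def has_enough_events(n_events, n_predictors, events_per_variable=10):
    """True when the number of cases meets the events-per-variable requirement."""
    return n_events >= required_events(n_predictors, events_per_variable)


# Range reported in the review: models with 4 to 18 variables.
minimum = (required_events(4), required_events(18))          # (40, 180)
safer = (required_events(4, 20), required_events(18, 20))    # (80, 360)
```

As a purely illustrative case (the pairing of case counts and variable counts is not given in the text), a model with 5 variables derived from 43 cases would fall short of the minimum of 50 cases.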

Although the performances of predictive scores in the derived phase,

internal, and external phases were good (C-statistic = 0.79, 0.84, and 0.78, respectively),

these scores can be applied to a general population with less confidence because most scores

were created inappropriately. Most studies (64.3%) derived predictive scores using

univariate analyses or estimated diagnostic parameters and used results from these

analyses to create scores. The rationale of choosing a method for creating the predictive

score was not clearly described. In addition, these scores were based on univariate

analyses and thus confounding bias might be present.


Differences in the distribution of risk factors across populations may also

affect the generalizability of the model to different populations. The C-statistics derived

from external validation usually tend to be lower than the C-statistics derived from

internal validation. Our results suggested that the pooled C-statistic from external

validations was slightly lower than that from internal validation. However,

pooling of external validations was dominated by the Tzanakis, et al(83) study which

had used the same subjects to validate 9 scoring systems. Excluding this study from

pooling resulted in the C-statistic of 0.75 (95%CI 0.74, 0.82).

Our review suggested that the research methods and reporting of research

findings in diagnostic scoring systems of appendicitis were inconsistent. Some

research groups have advocated and developed research methods and reporting

recommendations for conducting research in this area.(98, 99) In addition, a user’s

guide(100) for reading and using evidence in this area has also been developed to

improve research methods, reporting, and use of evidence of prediction scores. We have

modified and used this tool for our study. The type of studied subjects, study design,

validity of measurements for outcome and diagnostic factors, and use of statistical

methods were mainly reported in most of the model developments. However, only 50%

(7/14) of the derived studies had developed scores and internally tested them in the same

studies. Most studies (64.3%) had created scores without applying statistical modelling.

None of the studies reported calibration parameters and only 9 (39.1%) studies in

external validation performed discrimination analysis and reported the C-statistic. The

models seemed to be clinically sensible in 71.4% (10/14 studies), being simple and

easy to interpret.

In conclusion, although there are several diagnostic scoring systems for

appendicitis, applying them to a general population might be questionable due to

improper methods used for creating scores and lack of external validations. More

appropriately derived scores with internal and external validation are still required.

2.3 Definition

2.3.1 Appendicitis is an inflammation of the appendix, which is the small,

finger-shaped pouch attached to the beginning of the large intestine on the lower-right


side of the abdomen. Appendicitis is a medical emergency, and if left untreated, the

appendix may rupture and cause a potentially fatal infection.

2.3.2 Migration of pain to the right lower quadrant = pain starting either in

epigastric area, centrally, or in the whole abdomen then eventually migrating down to

the right lower abdomen.

2.3.3 Pain aggravated by coughing = patient was asked to cough and any

worsening of pain was recorded.

2.3.4 Rebound tenderness = elicited in the right lower quadrant when a

hand pressing the abdomen for 10-15 seconds was suddenly withdrawn.

2.3.5 Rigidity = involuntary contraction of the abdominal muscles in the

absence of diagnostic evidence from an attribute.

2.3.6 Abdominal pain = abdominal pain (not only right lower quadrant)

2.3.7 Vomiting = one or more episodes.

2.3.8 Polymorphonuclear leukocytosis = defined in this study as a total count >

10,000/mm3 with PMN > 75%.

2.3.9 Rovsing sign = a sign of suspected appendicitis, named after the Danish surgeon Niels Thorkild

Rovsing. If palpation of the left lower abdominal

quadrant results in more pain in the right lower quadrant, the patient has a positive

Rovsing sign and may have appendicitis.

2.3.10 Appendectomy = a surgical procedure to remove the appendix.

2.3.11 Negative appendectomy = a histologically normal appendix

found at an appendectomy that was done for the purpose of treatment after the

diagnosis of appendicitis.

2.4 Research methods in risk prediction scores

Flow of study is displayed below. Methodological issues involved in

development and validation of risk prediction scores were as follows:


2.4.1 Clinical decision rule

Establishing the diagnosis of appendicitis is an integral part of routine

physician practice. Clinical experience provides a sense of which findings

from patient history, physical examination, and investigation are critical in making an

accurate diagnosis. Information from a single predictor is usually insufficient to give a

reliable estimate of the diagnosis. A multivariable diagnostic model is developed,

validated, updated, and implemented to assist clinicians in estimating

probabilities, improving diagnostic accuracy, and making management decisions

(observation, imaging, and surgery). A scoring system (clinical decision rule) for the diagnosis

of appendicitis is a clinical tool which quantifies the individual contributions that

various parameters of patient history, physical examination, and basic investigation

results make toward the diagnosis of appendicitis. This would pave the way to use a

formal test, simplify the assessment, and increase the accuracy of clinicians in assessing

appendicitis.

Why use both internal and external validation? Many appropriately and

rigorously derived clinical decision rules are not ready to use in clinical practice. Many

rules fail to perform well when tested in a new population. The rule may reflect

associations between variables and outcomes that are mainly due to chance.

This poor performance reflects statistical overfitting or instability in the original derived

model. A different set of predictors may emerge in a different group of patients, even when

the patients come from the same setting. A resampling technique, referred to as a bootstrap

technique(95) and closely related to leave-one-out cross-validation, can deal with this problem by

removing 1 patient from the sample, generating the rule using the remainder, testing

it on the patient who was removed, and repeating this for each patient in turn. The poor performance can also be

related to differences in prevalence of disease or differences in how the decision rule is

applied. The clinical decision rule must be validated prospectively in a completely new

patient population (external validation). Under ideal circumstances this would be

performed in a new clinical setting by different clinicians from those involved in the

derivation study(45).
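
The remove-one-patient procedure described above can be sketched as follows. This is an illustration only: the "rule" here is a single cut-off on an existing score, chosen to maximise accuracy in the training patients, which is an assumption made to keep the example short and is not how any of the reviewed scores were derived. The names `derive_cutoff` and `leave_one_out_accuracy` and the data are hypothetical.

```python
def derive_cutoff(data):
    """Pick the cut-off (score >= cut-off means 'appendicitis') that classifies
    the most training patients correctly. `data` is a list of (score, outcome)."""
    candidates = sorted({score for score, _ in data})
    best_cutoff, best_correct = candidates[0], -1
    for cutoff in candidates:
        correct = sum((score >= cutoff) == bool(outcome) for score, outcome in data)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def leave_one_out_accuracy(data):
    """Remove each patient in turn, derive the rule on the remainder,
    and test it on the patient who was removed."""
    hits = 0
    for i, (score, outcome) in enumerate(data):
        training = data[:i] + data[i + 1:]
        cutoff = derive_cutoff(training)
        hits += (score >= cutoff) == bool(outcome)
    return hits / len(data)


# Hypothetical patients: (score, 1 = appendicitis / 0 = no appendicitis).
patients = [(2, 0), (3, 0), (4, 0), (7, 1), (8, 1), (9, 1)]
apparent_cutoff = derive_cutoff(patients)
validated_accuracy = leave_one_out_accuracy(patients)
```

In this toy data set the rule derived from all six patients classifies every one of them correctly, whereas the leave-one-out estimate is 5/6, which illustrates the optimism of apparent performance that internal validation is meant to expose.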

2.4.1.1 Derivation

Creating a scoring system involves identification of

variables with predictive power. All predictive variables are included and clearly

defined. Assessment of appendicitis is performed by a pathologist who is blinded to

the predictor variables, and the assessors of the predictor variables are blinded to the

pathological diagnosis of appendicitis. Many systematic reviews have found that

prediction model studies do not provide the rationale for the sample size or any mention

of model overfitting(101).

2.4.1.2 Validation

This phase tested the reproducibility of the accuracy of RAMA-AS and

was divided into 2 studies. Internal validation was a narrow validation which

would apply the RAMA-AS in a clinical setting and population similar to those of the derivative

phase. External validation was a broad validation which would apply the RAMA-AS in

multiple clinical settings with varying prevalence of appendicitis. Many studies validate

a newly developed model by using a separate data set, recruited later in time or from

other hospitals. The methodological standards for validation have been reported as

follows: choosing patients in an unbiased fashion who represent a wide spectrum of

disease severity, blinded assessment of the criterion standard, explicit and accurate

interpretation of the predictor variables and the actual rule without knowledge of the

outcome, and 100% follow up of enrolled patients(95).


2.4.1.3 Impact analysis

This phase aims to find evidence that RAMA-AS

changes clinician behaviour and improves outcomes with regard to the diagnosis of

appendicitis.

2.4.1.4 Variables and appendicitis

Important variables in the diagnosis of appendicitis were

divided into 4 domains: demographic data, symptoms, clinical signs, and laboratory

results. Two demographic variables that were commonly used in previously developed

appendicitis scoring systems were age and gender. Ten symptom variables were

considered in which nausea/vomiting, followed by migration and duration of pain were

the most commonly included in the predictive models. Nine clinical signs were

considered in which the most commonly included variable was rebound tenderness,

followed by right lower quadrant (RLQ) tenderness, RLQ guarding, and elevated body

temperature. Most of the scoring systems considered at least one lab variable. Among

these, rising white blood cell count was the most commonly used, followed by left shift

of PMN cells. All important variables from our previous systematic review were

included in the derivation(60). Predictive variables included in RAMA-AS must be

clinically sensible, easy to use, and suggest a course of action.


2.5 Conceptual framework

The aggressive attitude regarding diagnosis and management of patients

who are suspected of having appendicitis has recently been questioned. Under the “When in doubt,

cut it out” approach, patients with inconclusive findings related to appendicitis

pay the price of frequent removal of a normal appendix. At present, advances

in peri-operative care reduce the dangers of peritonitis and complicated appendicitis.

This has resulted in a more conservative approach which pays attention to the morbidity

associated with negative appendectomy. In order to decrease negative appendectomy

without increasing the burden of complicated appendicitis, the generation of RAMA-AS

is one of the broad approaches which may improve surgeon competency in the diagnosis

of appendicitis. This may reduce unnecessary complications caused by surgery,

improve patient safety, and increase the cost effectiveness of treatment for patients with

acute abdomen. A study with appropriate statistical methods will pave the way to reach

the objective of this study.


Figure 2.1 Identification of studies for inclusion


Table 2.1 Alvarado scoring system

Domains Variables Points

Symptoms

Migration of pain 1

Anorexia 1

Nausea/vomiting 1

Signs

RLQ tenderness 2

Rebound tenderness 1

Elevated temperature 1

Laboratory Leukocytosis 2

Left shift 1
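
As an illustration of how an additive score such as Table 2.1 is applied, the following minimal sketch sums the points for the findings that are present. The point values are taken from Table 2.1; the dictionary keys and the function name are hypothetical, and no decision cut-off is implied.

```python
# Points per finding, as listed in Table 2.1.
ALVARADO_POINTS = {
    "migration_of_pain": 1,
    "anorexia": 1,
    "nausea_vomiting": 1,
    "rlq_tenderness": 2,
    "rebound_tenderness": 1,
    "elevated_temperature": 1,
    "leukocytosis": 2,
    "left_shift": 1,
}


def alvarado_score(findings):
    """Sum the points of the findings present; `findings` is an iterable of keys."""
    present = set(findings)
    unknown = present - ALVARADO_POINTS.keys()
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return sum(ALVARADO_POINTS[name] for name in present)
```

A patient with RLQ tenderness and leukocytosis only would score 4, and a patient with all eight findings would score the maximum of 10.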


Table 2.2 The scoring parameters of RIPASA score based on probability and extra

weight

Variables                                           Points
Age
  Less than 40                                      1.0
  Greater than 40                                   0.5
Gender
  Male                                              1.0
  Female                                            0.5
Right iliac fossa (RIF) pain                        0.5
Migration of pain to RIF                            0.5
Nausea and vomiting                                 1.0
Anorexia                                            1.0
Duration of symptoms (hrs)
  Less than 48                                      1.0
  More than 48                                      0.5
RIF tenderness                                      1.0
Guarding                                            2.0
Rebound tenderness                                  1.0
Rovsing sign                                        2.0
Fever (body temperature ≥ 37.8°C by oral route)     1.0
Raised white cell count                             1.0
Negative urinalysis                                 1.0
Foreign national registration identity card         1.0


Table 2.3 The Appendicitis inflammatory response score (AIRS)

Variables                                           Points
Vomiting                                            1
Pain in right inferior fossa                        1
Rebound tenderness or muscular defence
  Light                                             1
  Medium                                            2
  Strong                                            3
Body temperature ≥ 38.5°C                           1
Polymorphonuclear leukocytes (%)
  70-84                                             1
  ≥ 85                                              2
White blood cell count (cell/mm³)
  10.0-14.9 × 10³                                   1
  ≥ 15.0 × 10³                                      2
C-reactive protein concentration (mg/L)
  10-49                                             1
  ≥ 50                                              2


Table 2.4 Fenyö-Lindberg scoring system

Variables                            Category      Points
Constant                                           -10
Sex                                  Male          8
                                     Female        -8
WBC (×10³ cell/mm³)                  < 8.9         -15
                                     9.0-13.9      2
                                     > 14.0        10
Pain duration (hrs)                  < 24          3
                                     24-48         0
                                     > 48          -12
Progression of pain                  Yes           3
                                     No            -4
Relocation of pain                   Yes           7
                                     No            -9
Vomiting                             Yes           7
                                     No            -5
Aggravation with cough               Yes           4
                                     No            -11
Rebound tenderness                   Yes           5
                                     No            -10
Rigidity                             Yes           15
                                     No            -4
Pain outside right lower quadrant    Yes           -6
                                     No            4
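
Unlike the Alvarado score, the Fenyö-Lindberg system combines a constant with positive and negative weights. A minimal sketch of the calculation is given below, using the values as they appear in Table 2.4. The function name and argument names are hypothetical, and the handling of values that fall between the printed WBC categories (for example 8.95) is an assumption.

```python
def fenyo_lindberg_score(male, wbc, pain_hours, progression, relocation,
                         vomiting, cough_aggravation, rebound, rigidity,
                         pain_outside_rlq):
    """Sum the constant and the signed weights of Table 2.4.

    `wbc` is in units of 10^3 cells/mm^3, `pain_hours` in hours; the remaining
    arguments are booleans.
    """
    score = -10                                  # constant
    score += 8 if male else -8
    # Assumption: boundaries between the printed categories are cut at 9.0 and 14.0.
    if wbc < 9.0:
        score += -15
    elif wbc < 14.0:
        score += 2
    else:
        score += 10
    if pain_hours < 24:
        score += 3
    elif pain_hours <= 48:
        score += 0
    else:
        score += -12
    score += 3 if progression else -4
    score += 7 if relocation else -9
    score += 7 if vomiting else -5
    score += 4 if cough_aggravation else -11
    score += 5 if rebound else -10
    score += 15 if rigidity else -4
    score += -6 if pain_outside_rlq else 4
    return score
```

With every finding pointing toward appendicitis (male, WBC 15, pain for 10 hours, all signs present, no pain outside the right lower quadrant) the sum is 56; with every finding pointing away from it the sum is -94.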


Table 2.5 Ohmann score

Variables Points

Tenderness in right lower quadrant 4.5

Rebound tenderness, contralateral 2.5

Dysuria 2.0

Constant pain 2.0

WBC > 10 x103 cell/mm3 1.5

Aged > 50 years 1.5

Shifting pain 1.0

Local guarding 1.0

Table 2.6 Eskelinen score

Variables              Points                                              Factors
Tenderness             2 = RLQ, 1 = Other location                         11.41
Rigidity               2 = Yes, 1 = No                                     6.62
Leukocyte count        2 = ≥ 10,000 cell/mm³, 1 = < 10,000 cell/mm³        5.88
Rebound tenderness     2 = Yes, 1 = No                                     4.25
Pain at presentation   2 = RLQ, 1 = Other locations                        3.51
Duration of pain       2 = < 48 hours, 1 = ≥ 48 hours                      2.13

Table 2.7 Simple scoring system

Variables Points

Abdominal pain 1

Vomiting 1

Right lower quadrant tenderness 1

Low grade fever (≤ 37.8°C) 1

Polymorphonuclear leukocytosis (total count ≥10,000, PMN ≥ 75%) 1


Table 2.8 Practical score of Ramirez and Deus

Variables Points

Sex

Male 6

Female -5

Initial pain

Epigastric 5

Other -6

Diarrhea

No 1

Yes -9

White cell count (cell/mm3)

≥10,500 6

<10,500 -14

Differential white cell count (%)

≥75 6

<75 -19

Guarding in right lower quadrant

Yes 8

No -7

Rebound tenderness

Yes 5

No -21


Table 2.9 Scoring system developed by Teicher

Variables Points

Sex

Male +2

Female -1

Age (years)

20-39 -1

>50 +3

Duration of pain (days)

1.5 +2

2 +1

3 -3

Genitourinary symptoms -3

Muscle spasm-Right lower quadrant

Involuntary +3

None -3

Rectal Mass at right side -3

WBC (cell/mm3)

<10,000 -3

>13,000 +2


Table 2.10 Performance of random forests (RF), support vector machines (SVM),

artificial neural networks (ANN), logistic regression (LR), and Alvarado score on

diagnosis of acute appendicitis

AUC AC SN SP PPV NPV

RF 0.98(0.017) 0.96 0.94 1.00 1.00 0.87

SVM 0.96(0.027) 0.93 0.91 1.00 0.85 0.73

ANN 0.91(0.047) 0.91 0.94 0.85 0.94 0.85

LR 0.87(0.052) 0.82 0.91 0.62 0.85 0.73

Alvarado 0.77(0.057) 0.80 0.84 0.69 0.87 0.64

AUC, Area under ROC curve; AC, accuracy; SN, sensitivity; SP, specificity; PPV,

positive predictive value; NPV, negative predictive value
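
The accuracy measures abbreviated in Table 2.10 can all be computed from a 2x2 table of test result against final diagnosis. The sketch below shows the standard definitions; the function name and the counts used in the note are hypothetical and are not the data behind Table 2.10.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table.

    tp/fn: appendicitis patients with a positive/negative test;
    fp/tn: non-appendicitis patients with a positive/negative test.
    """
    return {
        "AC": (tp + tn) / (tp + fp + fn + tn),   # accuracy
        "SN": tp / (tp + fn),                    # sensitivity
        "SP": tn / (tn + fp),                    # specificity
        "PPV": tp / (tp + fp),                   # positive predictive value
        "NPV": tn / (tn + fn),                   # negative predictive value
    }
```

For instance, with 30 true positives, 0 false positives, 2 false negatives and 13 true negatives, sensitivity is 30/32, specificity and PPV are 1.0, and NPV is 13/15.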


Table 2.11 Describe methodological assessments

Column groups (left to right): Selection bias; Information bias; Confounding bias; Other issue
Columns: (1) Representative spectrum; (2) Ascertainment of outcome measurement; (3) Blinded assessment of outcome; (4) No. of predictors; (5) Predictors blinded to outcome; (6) Significant predictors; (7) Accurate interpretation; (8) Multivariate regression analysis; (9) Created score properly; (10) Sample size; (11) Clinically sensible; (12) Other issue

Study                   Phase  Score                                     1   2   3   4   5   6   7   8   9   10  11  12
Van way,1982(84)        D      Van way                                   N   NA  NA  N   N   Y   -   Y   Y   Y   Y   -
Teicher,1983(56)        D      Teicher                                   Y   Y   NA  Y   NA  Y   -   N   N   Y   Y   -
Alvarado,1986(45)       D      Alvarado                                  Y   Y   NA  Y   NA  Y   -   N   N   Y   Y   -
Fenyo,1987(39)          D      Fenyo                                     Y   Y   NA  Y   NA  Y   -   N   N   Y   N   -
Christian,1992(54)      D      Christian                                 N   Y   NA  N   Y   N   -   N   N   N   Y   -
Eskelinen,1992(38)      D      Eskelinen                                 Y   Y   NA  Y   Y   Y   -   Y   Y   Y   Y   -
Kalan,1994(46)          D      Modified Alvarado                         N   Y   NA  N   NA  Y   -   N   N   N   Y   -
Ramirez,1994(55)        D      Practical score of Ramirez and Deus       N   Y   NA  Y   NA  Y   -   N   N   Y   N   -
Gallego,1998(69)        D      Gallego                                   Y   Y   NA  Y   NA  Y   -   N   N   Y   N   -
Tzanakis,2005(83)       D      Tzanakis                                  Y   Y   NA  Y   Y   Y   -   Y   Y   Y   Y   -
Lintula,2005(76)        D      Lintula                                   Y   Y   NA  Y   Y   Y   -   Y   Y   Y   Y   -
Malik,2007(77)          D      Modified Alvarado                         N   Y   NA  N   NA  Y   -   N   N   Y   Y   -
Andersson,2008(50)      D      Appendicitis inflammatory response score  Y   Y   NA  Y   Y   Y   -   Y   Y   Y   Y   -
Chong,2010(87)          D      RIPASA                                    N   Y   NA  Y   NA  Y   -   N   N   Y   N   -
Van way,1982(84)        I      Van way                                   N   NA  NA  -   -   -   N   -   -   -   -   N
Fenyo,1987(39)          I      Fenyo                                     Y   Y   NA  -   -   -   Y   -   -   -   -   N
Ramirez,1994(55)        I      Practical score of Ramirez and Deus       N   Y   NA  -   -   -   N   -   -   -   -   N
Tzanakis,2005(83)       I      Tzanakis                                  Y   Y   NA  -   -   -   Y   -   -   -   -   Y
Andersson,2008(50)      I      Appendicitis inflammatory response score  Y   Y   NA  -   -   -   Y   -   -   -   -   Y
Lintula,2010(94)        I      Lintula,2005                              Y   Y   NA  -   -   -   Y   -   -   -   -   Y
Fenyo,1997(37)          E      Fenyo,1987                                Y   Y   NA  -   -   -   N   -   -   -   -   N


Table 2.12 Characteristics of studies that had developed prediction scores for appendicitis

Author Phase Year Model Study design

Type of participant

Male (%)

Ethnicity Appendicitis Non Appendicitis

Statistical method

% Negative appendectomy

% Complicated appendicitis

Van way(84) D/I 1982 Van way score

Retrospective Cohort

Operated patients

NA Caucasian 360 116 Discrimination

analysis

29.83 25.30

Teicher(56) D 1983 Teicher Case control

Operated patients

45.50 Caucasian 100 100 Diagnostic analysis,

40

Alvarado(45) D 1986 Alvarado score


Hospitalized patients

NA Caucasian 227 50 Diagnostic analysis

7 18.77

Fenyo(39) D

1987 Fenyo score

Prospective

Cohort

Patients suspected

appendicitis


18 14.00

Christian(54) D 1992 Christian score

Quasi-experim

ental design

Patients suspected

appendicitis

77.59 Asian 43 15

non-statistical

base

6.5 6.50

Eskelinen(38)

D 1992 Eskelinen

Prospective

cohort


NA Caucasian 270 1333 Multiple logistic

regression

21.6 6.74

Kalan(46) D 1994 Modified

Alvarado

Prospective

cohort

Hospitalized

patients

55.26 Asian 40 9 non-statistical

base

23.68 NA

Ramirez(55) D/I 1994 Practical score

of Ramirez

and Deus


Operated patients

63.00 Caucasian 293 67 Bayesian, Likelihoo

d ratio weight

18.61 NA


Table 2.11 Describe methodological assessments (cont.)

Column groups (left to right): Selection bias; Information bias; Confounding bias; Other issue
Columns: (1) Representative spectrum; (2) Ascertainment of outcome measurement; (3) Blinded assessment of outcome; (4) No. of predictors; (5) Predictors blinded to outcome; (6) Significant predictors; (7) Accurate interpretation; (8) Multivariate regression analysis; (9) Created score properly; (10) Sample size; (11) Clinically sensible; (12) Other issue

Study                   Phase  Score                                     1   2   3   4   5   6   7   8   9   10  11  12
Denizbasi,2003(66)      E      Alvarado,1986                             Y   Y   NA  -   -   -   Y   -   -   -   -   NA
AlQahtani,2004(41)      E      Alvarado,1986                             Y   Y   NA  -   -   -   Y   -   -   -   -   Y
Pruekprasert,2004(96)   E      Alvarado,1986                             Y   Y   NA  -   -   -   Y   -   -   -   -   Y
Enochsson,2004(67)      E      Fenyo,1987                                UN  Y   NA  -   -   -   Y   -   -   -   -   NA
Sitter,2004(53)         E      Eskelinen,1992                            Y   Y   NA  -   -   -   Y   -   -   -   -   NA
Tzanakis,2005(83)       E      Van way,1982                              Y   Y   NA  -   -   -   NA  -   -   -   -   Y
                        E      Teicher,1983                              Y   Y   NA  -   -   -   NA  -   -   -   -   Y
                        E      Alvarado,1986                             Y   Y   NA  -   -   -   NA  -   -   -   -   Y
                        E      Fenyo,1987                                Y   Y   NA  -   -   -   NA  -   -   -   -   Y
                        E      Christian,1992                            Y   Y   NA  -   -   -   NA  -   -   -   -   Y
                        E      Eskelinen,1992                            Y   Y   NA  -   -   -   NA  -   -   -   -   Y
Mckay,2007(30)          E      Alvarado,1986                             Y   Y   NA  -   -   -   N   -   -   -   -   NA
Andersson,2008(50)      E      Alvarado,1986                             Y   Y   NA  -   -   -   NA  -   -   -   -   Y
Kurane,2008(91)         E      Kalan,1994                                Y   Y   NA  -   -   -   Y   -   -   -   -   NA
Sun,2008(80)            E      Alvarado,1986                             Y   Y   NA  -   -   -   N   -   -   -   -   NA
Talukder,2009(92)       E      Malik,2007                                Y   Y   NA  -   -   -   Y   -   -   -   -   NA
Hsieh,2010(21)          E      Alvarado,1986                             Y   Y   NA  -   -   -   N   -   -   -   -   NA
Pouret-Baudry,2010(78)  E      Alvarado,1986                             Y   Y   NA  -   -   -   Y   -   -   -   -   Y
Chong,2011(49)          E      Alvarado,1986                             Y   Y   NA  -   -   -   Y   -   -   -   -   NA
                        E      Chong,2010                                Y   Y   NA  -   -   -   Y   -   -   -   -   NA
Inci,2011(88)           E      Alvarado,1986                             Y   Y   NA  -   -   -   Y   -   -   -   -   Y
Limpawattan,2011(75)    E      Alvarado,1986                             Y   Y   NA  -   -   -   Y   -   -   -   -   NA
Konan,2011(90)          E      Alvarado,1986                             N   Y   NA  -   -   -   N   -   -   -   -   NA
Kanumba,2011(89)        E      Kalan,1994                                Y   Y   NA  -   -   -   Y   -   -   -   -   NA
Yoldas,2011(93)         E      Lintula,2005                              N   Y   NA  -   -   -   N   -   -   -   -   NA
Castro,2012(10)         E      Alvarado,1986                             Y   Y   NA  -   -   -   Y   -   -   -   -   NA
                        E      Andersson,2008                            Y   Y   NA  -   -   -   Y   -   -   -   -   NA


Table 2.12 Characteristics of studies that had developed prediction scores for appendicitis (cont.)


Study | Phase | Year | Score | Study design | Type of participant | Male (%) | Ethnicity | Appendicitis | Non-appendicitis | Statistical method | | 
Gallego(69) | D | 1998 | Gallego score | Prospective cohort | Patients suspected appendicitis | NA | Caucasian | 101 | 91 | Bayesian, likelihood ratio weight | 8.85 | 18.23
Tzanakis(83) | D/I/E | 2005 | Tzanakis score | Prospective cohort | Hospitalized patients | 56.10 | Caucasian | 217 | 504 | Logistic regression | 19.20 | 10.23
Lintula(76) | D | 2005 | Lintula score | Prospective cohort | Patients suspected appendicitis | 100 | Caucasian | 43 | 84 | Logistic regression | 13 | NA
Malik(77) | D | 2007 | Modified Alvarado | Prospective cohort | Hospitalized patients | 55.12 | Asian | 174 | 80 | Non-statistical base | 11.49 | 12.07
Andersson(50) | D/I/E | 2008 | Appendicitis inflammatory response score | Prospective cohort | Hospitalized patients | 46.00 | Caucasian | 191 | 254 | Logistic regression | 11.00 | 14.00
Chong(87) | I | 2010 | RIPASA score | Retrospective cohort | Emergency appendectomy | 57.69 | Asian | 261 | 51 | Univariate analysis | 16.30 | NA

External validation phase

Fenyo(37) | E | 1997 | Fenyo score | Prospective cohort | Patients suspected appendicitis | 11.2 | Caucasian | 392 | 775 | Diagnostic analysis | 17.5 | NA
Lamparelli(74) | E | 2000 | Mod. Alvarado by Kalan | Prospective cohort | Patients suspected appendicitis | | | | | | NA | NA
Denizbasi(66) | E | 2003 | Alvarado score | Prospective cohort | Patients suspected appendicitis | | | | | | NA | NA
AlQahtani(41) | E | 2004 | Alvarado score | Prospective cohort | Patients suspected appendicitis | 59 | Asian | 121 | 30 | Diagnostic analysis | 12.5 | NA
Pruekprasert(96) | E | 2004 | Alvarado score | Prospective cohort | Patients suspected appendicitis | 46 | Asian | 186 | 45 | Diagnostic analysis, discrimination analysis | NA | NA
Enochsson(67) | E | 2004 | Fenyo score | Prospective cohort | Patients suspected appendicitis | 12 | Caucasian | 330 | 96 | Diagnostic analysis | NA | NA
Sitter(53) | E | 2004 | Eskelinen score | Prospective cohort | | NA | Caucasian | 662 | 1697 | Diagnostic analysis, | 21.6 | NA
Tepel(52) | E | 2004 | Ohmann score | | Patients suspected appendicitis | | | | | | 22 | NA
McKay(30) | E | 2007 | Alvarado score | | Suspected appendicitis | | | | | | NA | NA
Sun(80) | E | 2008 | Alvarado score | | | 66.2 | Asian | 213 | 159 | Diagnostic analysis, | NA | NA
Kurane(91) | E | 2008 | Mod. Alvarado by Kalan | Prospective cohort | Patients suspected appendicitis | 48.33 | Asian | 23 | 37 | Diagnostic analysis | 61.67 | NA
Ohmann(51) | I | 2008 | Ohmann | Prospective cohort | Patients suspected appendicitis | | | | | | 10.2 | 13.4
Talukder(92) | E | 2009 | Mod. Alvarado Malik | Prospective cohort | | 58 | Asian | 84 | 16 | Diagnostic analysis | 16 | NA
Hsieh(21) | E | 2010 | Alvarado score | | Patients suspected appendicitis | 47 | Asian | 115 | 65 | Diagnostic analysis, | NA | NA
Pouget-Baudry(78) | E | 2010 | Alvarado score | Prospective cohort | Emergency appendectomy | | | | | | NA | NA
Lintula(94) | I | 2010 | Lintula | RCT | Patients suspected appendicitis | | | | | | 13 | NA
Chong(49) | E | 2011 | Alvarado score; Chong (2010) | Prospective cohort | Patients suspected appendicitis | | | | | | 22.9 | 6.1
Inci(88) | E | 2011 | Alvarado score | Prospective cohort | Patients suspected appendicitis | | | | | | 13.6 | NA
Limpawattanasiri(75) | E | 2011 | Alvarado score | Prospective cohort | ... appendicitis | | | | | | 14.7 | NA
Konan(90) | E | 2011 | Alvarado score | | Operated patients | NA | Caucasian | 41 | 41 | Diagnostic analysis, | NA | NA
Kanumba(89) | E | 2011 | Mod. Alvarado Kalan | Cross-sectional study | ... appendicitis | | | | | | 33.1 | 62.9
Yoldas(93) | E | 2011 | Lintula | Retrospective cohort | Operated patients | 50.7 | Caucasian | 132 | 24 | Diagnostic analysis, | NA | NA
Castro(10) | E | 2012 | Alvarado score; Andersson | | ... appendicitis | 44 | Caucasian | 340 | 595 | Diagnostic analysis, | 21 | NA

NA: not applicable, D=Derivative, I=Internal validation, E=External validation, Diagnostic=100%, Discrimination analysis=39.13%


CHAPTER III

METHODOLOGY

Building on the background, rationale, and literature review in the previous chapters, this chapter describes the research methods applied in our study. All methodological issues in conducting a study of risk prediction scores were considered and are covered as follows:

3.1 Study design and setting

This study consisted of 2 parts: part I, derivation and internal validation of the CDR; and part II, external validation of the CDR.

3.1.1 Part I

The study was a cross-sectional study that enrolled patients who were suspected of having appendicitis and visited the out-patient clinic, the Emergency and Surgery Department, Faculty of Medicine Ramathibodi Hospital, Mahidol University, from January 2013 to May 2015.

3.1.2 Part II

The cross-sectional study was conducted at Thammasat University Hospital and Chaiyaphum Hospital, which are tertiary and secondary care hospitals, respectively. Thammasat University Hospital is located in the central region, whereas Chaiyaphum Hospital is located in Chaiyaphum province, in the northeastern region of Thailand. Data were collected from June 2015 to May 2016 and used for external validation.


3.2 Study subjects

3.2.1 Inclusion criteria

3.2.1.1 Aged 15 to 80 years old

3.2.1.2 Patients who had abdominal pain within the previous 7 days and were suspected of having appendicitis

3.2.1.3 Patients who had at least one of the following

symptoms and signs:

Symptoms: right lower abdominal pain, migration of

abdominal pain, anorexia, nausea, or vomiting

Signs: body temperature ≥ 37.8°C by oral route, right lower quadrant tenderness, guarding, rebound tenderness, or decreased bowel sounds

3.2.1.4 Agreed to participate and provided consent to the study

3.2.2 Exclusion criteria

3.2.2.1 Patients who could not give the history of illness by

themselves

3.2.2.2 Patients who had severe underlying diseases or were moribund, such as those with severe myocardial infarction or terminal illness

3.2.2.3 Patients who had palpable abdominal mass

3.2.2.4 Patients who were diagnosed with a tumor or malignancy of the appendix

3.2.2.5 Patients who had metastatic tumor to the appendix

3.2.3 Variables & Measurements

3.2.3.1 The outcome of interest

The outcome of interest was appendicitis, which was definitively diagnosed by histopathology using the following criteria:

- Macroscopic findings: intravascular injection of the serosa; a fibrinous, purulent film; edematous, hemorrhagic, or necrotic changes of the wall; and blood or pus on opening of the appendix.

- Microscopic findings: focal or expanded erosion, ulceration, or abscess.

The histological criterion for acute appendicitis was an inflammatory reaction with PMN leucocytes in the mucosal layer of the appendix and edema.

Complicated appendicitis was defined as appendicitis judged to be complicated by the surgeon, and/or a peritoneal swab or fluid culture that grew at least one definite bowel organism, and/or a perforation identified by the histopathologist in association with gangrene or full-thickness necrosis.

Negative appendectomy was defined as a histologically normal appendix found at appendectomy. Surgery was performed for the purpose of treatment after the diagnosis of appendicitis. Patients who did not undergo surgery were classified as not having appendicitis on the basis of negative clinical findings at telephone follow-up at 1 month.

3.2.3.2 Predictive Variables

There were 4 domains of predictive variables as follows:

- 3.2.3.2.1 Demographic variables

Age: age was recorded in years at diagnosis.

Sex: sex was recorded as male or female.

- 3.2.3.2.2 Clinical symptoms

Onset of pain was recorded as insidious or sudden. Onset described the speed and manner in which the pain began: sudden onset was abrupt and severe, whereas insidious onset was mild and gradual.

Duration of pain was recorded as the time from the occurrence of pain until the patient came to the hospital.

Right lower quadrant (RLQ) abdominal pain was recorded after asking the patient about the presence or absence of RLQ pain.

Migration of pain was recorded after asking the patient about

presence or absence. Pain could start either in epigastric or central area, or in the

whole abdomen, then eventually migrate down to the right lower abdomen.

Anorexia was recorded after asking the patient about loss of appetite.

Aggravation of pain with cough was recorded as presence or absence. The patient was asked to cough and then asked whether the pain worsened; worsening was classified as presence of aggravation of pain.

Nausea was recorded as presence or absence. Nausea was

sensation of unease and discomfort in the upper stomach with an involuntary urge to

vomit.

Vomiting was recorded as presence or absence. Vomiting was

a forceful expulsion of the contents of one's stomach through the mouth and

sometimes the nose.

Dysuria was recorded as presence if patient experienced pain

or difficulty in urination.

Diarrhea was recorded as presence if patient had three or more

loose or liquid bowel movements per day.

- 3.2.3.2.3 Clinical signs

Fever: body temperature >37.8°C by oral route

Tenderness at RLQ: The patient was classified as having RLQ tenderness if s/he had pain when a hand was pressed on the RLQ of the abdomen, especially at McBurney's point (the point on the right side of the abdomen that is one-third of the distance from the anterior superior iliac spine to the umbilicus). This point roughly corresponds to the most common location of the base of the appendix, where it is attached to the cecum.

Rebound tenderness: pain elicited in the right lower quadrant

when a hand pressing the abdomen for 10-15 seconds is suddenly withdrawn, see

Figure 3.1.

Abdominal guarding: the tensing of the abdominal wall muscles to guard inflamed organs within the abdomen from the pain of pressure upon them. The tensing is detected when the abdominal wall is pressed. Abdominal guarding is also known as 'défense musculaire'.

Rovsing sign: palpation on the left lower abdominal quadrant

results in more pain in the right lower quadrant.


Per rectal examination tenderness: pain at the suprapubic area or within the rectum when pressure was exerted on the peritoneum of the cul-de-sac of Douglas during rectal examination.

- 3.2.3.2.4 Laboratory results

Increased white blood cell count: a total white blood cell count > 10,000 cells/mm³.

PMN leukocytosis: a PMN cell count > 75%.

Urine analysis was recorded as normal if red blood cells and white blood cells were each fewer than 5 per high-power field.

3.3 Data Collection

3.3.1 Case record forms

Case record forms (CRF) were developed (see Appendix A) and prepared

at the Section for Clinical Epidemiology and Biostatistics, Faculty of Medicine,

Ramathibodi Hospital and distributed to all research sites.

3.3.2 Training

Surgical residents were informed and trained about this research project,

data collection flow, informed consent, and assessment for all variables that were used

in diagnosis of appendicitis at the beginning of project and retrained every rotation at

general surgical unit.

In addition, research assistants were trained for data collection, queries

when data were missing, and assessment for all variables that were used in diagnosis

of appendicitis.

To ensure valid measurement, the quality control process began with training the research assistants and second-year surgical residents in data collection. The data collection manual was explicit and clearly explained the definitions of signs, symptoms, and laboratory tests, the methods of examination, and follow-up at appointment or by telephone. Double data entry from the case record forms was used.


3.3.3 Data flow, queries, quality control, and project monitoring

Consecutive patients with suspected appendicitis (as described in the inclusion criteria) were enrolled at the emergency and out-patient surgical units. First- or second-year surgical residents and research assistants who had passed the protocol training were responsible for data collection. The resident or research assistant approached patients and informed them about the aim of this study. Informed consent was obtained once patients agreed to participate in the study.

The outcome of appendicitis and the following variables were collected prospectively in consecutive cases for the development (derivation) and validation of the RAMA-AS: age, sex, onset of pain, duration of pain, progression of pain, right lower quadrant (RLQ) abdominal pain, migration of pain, anorexia, aggravation of pain with cough, nausea or vomiting, dysuria, diarrhea, fever (temperature >37.8°C by oral route), tenderness at RLQ, rebound tenderness, guarding, Rovsing sign, tenderness at RLQ during per rectal examination, increased white blood cell count, PMN leukocytosis, normal urinalysis, and CRP.

3.4 Sample size estimation

In our pilot study, the overall prevalence of appendicitis at the Surgical Department, Faculty of Medicine Ramathibodi Hospital, Mahidol University was approximately 62%, and its distribution by predictor is shown in Table 3.1. The sample size based on assessing association, i.e., testing two proportions, for each parameter ranged from 40 to 8,084.

For instance, the prevalence of appendicitis in patients who had a duration of pain of less than 48 hours was 0.75. The null and alternative hypotheses were as follows.

H0: p1 − p2 = 0

Ha: p1 − p2 ≠ 0

Type I and type II errors of 0.05 and 0.20, respectively, were assigned, with a ratio of exposed to non-exposed of 1:1. The size of the difference between P1 and P2 was set at 0.13. The sample size could be estimated as shown in the equation below or by using the PS program version 3.1.2.


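\[ \text{sample size } (n) = \frac{\left[ Z_{\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + Z_{\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_1 - P_2)^2} \]

\[ \bar{P} = \frac{P_1 + P_2}{2} = \frac{0.62 + 0.75}{2} = 0.685 \]

\[ n = \frac{\left[ 1.96\sqrt{2 \times 0.685(1-0.685)} + 0.84\sqrt{0.62(1-0.62) + 0.75(1-0.75)} \right]^2}{(0.62 - 0.75)^2} \]

\[ n = 199 \]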

From our pilot study, the proportions of appendicitis in the exposed and non-exposed groups for variables that were associated with appendicitis are described in Table 3.1. Type I and type II errors were set at 5% and 20%, with a ratio of exposed to non-exposed of 1:1. The detectable difference was set at 5% and 10%. The sample size was calculated for each variable. Using data from the most significant variable (migration of pain), 356 (220+136) patients were required to develop the RAMA-AS, taking into account a 10% loss to follow-up rate.

A total of 8-10 parameters were expected to be included in the final score. Using the rule of thumb that 20 subjects with appendicitis are required per variable, approximately 160 and 200 subjects with appendicitis were needed for 8 and 10 parameters, respectively. As a result, a total of 239 to 299 patients needed to be enrolled.

Comparing the estimated sample sizes from the two approaches, the larger sample size was used. Given a loss to follow-up of 5%, the total sample size was 356 + [356 × 0.05] = 374 subjects. Therefore, 374 patients were required to develop the RAMA-AS. Thirty percent of the sample size of the derivation phase was required for external validation; therefore, 112 subjects from different medical centers were needed for the external validation phase.
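The arithmetic in this section can be checked with a short script. The following is a minimal sketch, not the PS program used in the study: the function names `two_proportion_n` and `inflate_for_loss` are hypothetical, and exact normal quantiles are used in place of the rounded values 1.96 and 0.84 (0.84 corresponds to 80% power, i.e., a type II error of 0.20).

```python
import math
from statistics import NormalDist


def two_proportion_n(p1, p2, alpha=0.05, power=0.80):
    """Sample size from the two-proportion formula used in Section 3.4."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # pooled proportion
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p1 - p2) ** 2


def inflate_for_loss(n, loss_rate):
    """Add the expected loss to follow-up and round up to whole patients."""
    return math.ceil(n * (1 + loss_rate))


n_duration = two_proportion_n(0.62, 0.75)   # duration-of-pain example
n_derivation = inflate_for_loss(356, 0.05)  # 356 patients plus 5% loss
n_external = round(0.30 * n_derivation)     # 30% of the derivation sample
print(round(n_duration), n_derivation, n_external)
```

With the values quoted in the text, the script gives approximately 199 for the duration-of-pain example, 374 for the derivation phase, and 112 for the external validation phase.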

3.5 Data management

Databases were constructed separately for each setting, based on the CRFs, using the EPIDATA program version 3. The principal investigator checked the CRFs for completeness before data entry. Double data entry was performed independently by staff, and the two data sets were compared to reduce typing errors. Data checking and cleaning were done monthly, with clarification of missing, unclear, or implausible data. CRFs and electronic data were kept in secure areas; only the principal investigator, supervisors, and the data manager were able to access the data. Data were backed up automatically every week on the server of the data management unit, Section for Clinical Epidemiology and Biostatistics, to prevent data loss. Data were divided into 2 sets: 1) the derivation and internal validation set from the Faculty of Medicine Ramathibodi Hospital, and 2) the external validation set from the Faculty of Medicine Thammasat University Hospital and Chaiyaphum Hospital.

3.6 Statistical analysis

3.6.1 Imputation

Imputation should be considered only when attempts to recover lost information have failed. At our monthly data management meetings, CRFs were retrieved and checked; if missing data were detected, queries were sent to the research assistants at each collection site. The principal investigator and research assistants reviewed and retrieved data from medical records, physicians' notes, and laboratory results, and made direct phone calls to patients or referral hospitals if needed. Missing data can shrink the sample size, decrease the precision of confidence intervals, weaken statistical power, and bias estimates. Instead of omitting records with missing data, we applied an imputation technique to deal with missing data that were related to observed variables.

Multiple imputation (MI) is advocated as the preferred imputation method and can lead to more correct standard errors and P values (102). It involves generating multiple copies of the data set, in which the missing values are replaced by imputed values drawn from their predicted distribution given the observed data. The variability of MI has two sources, i.e., within and between imputations. Thus, the precision of an MI estimate depends not only on the number of subjects but also on the number of imputations. The performance of imputation can be assessed using the relative variance increase (RVI) and the fraction of missing information (FMI). The RVI is the average relative increase in the variances of the estimates because of missing data (i.e., averaged over all coefficients); the closer this value is to 0, the less the missing data affect the estimates. The FMI is the largest fraction of missing information among the coefficient estimates due to missing data. It is used to gauge the required number of imputations based on a rule of thumb, i.e., FMI × 100. For instance, if FMI = 0.15, the number of imputations = 0.15 × 100, i.e., at least 15 imputations are required.

Multiple imputations were performed using simulation-based sequential multivariate regression with chained equations (103, 104). Linear regression, logistic regression, and multinomial logistic regression were used to predict missing data for continuous, dichotomous, and categorical (> 2 groups) variables, respectively. Distributions and patterns of missing data were explored to ensure that data were missing at random (MAR). The MAR assumption is satisfied if the probability of missing data on Y is unrelated to the value of Y after controlling for the other variables (X) in the analysis, i.e., P(Y missing | Y, X) = P(Y missing | X). Complete data for 14 variables (i.e., outcome of appendicitis, age, sex, location of pain, migration of pain, progression of pain, aggravation of pain, nausea or vomiting, anorexia, body temperature, tenderness, rebound tenderness, guarding, and bowel sound) were used as independent variables to predict the values of the missing data. Each variable with missing values was modelled conditionally on the remaining variables in the data set until no missing values remained, in other words, until all missing data were filled in. The final step therefore produced data sets that contained both observed and imputed data. Since the frequency of missing data and the largest FMI (i.e., a measure of the uncertainty of the values estimated from MI) were very low in this study (i.e., less than 0.05), twenty imputations were performed to allow for the uncertainty of the imputed data. Bias from multiple imputation was examined by comparing the distributions of imputed and observed values using the “midiagplots” command in Stata.

Among 396 eligible patients, only 2 variables, i.e., WBC and neutrophil count, had missing values, with percentages of 10.86 (43/396) and 10.10 (40/396), respectively. Missing data were imputed using chained truncated linear regression models. WBC and neutrophil were regressed on the complete variables mentioned previously; see Appendix B. Possible ranges of WBC and neutrophil were set at 2,036 to 70,400 cells/mm³ and 32% to 96%, respectively. Interactions were not considered in the imputation model. Twenty imputations were performed to allow for the uncertainty of the imputed data, and the summarized values were then used (105). Further statistical analysis was applied to each imputed data set, and the estimates were then combined using Rubin's rules to produce an overall estimate of each regression coefficient, taking into account the uncertainty in the imputed values.
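To make the pooling step concrete, the sketch below combines the estimates of a single regression coefficient across imputed data sets using Rubin's rules and reports the RVI and FMI for that coefficient. It is an illustration under stated assumptions, not the Stata procedure used in the study: the function names are hypothetical, the input numbers are made up, and the FMI uses the large-sample approximation without a degrees-of-freedom correction.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient over m imputed data sets with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)        # overall estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    rvi = (1 + 1 / m) * between / within   # relative variance increase
    fmi = (1 + 1 / m) * between / total    # large-sample fraction of missing information
    return {"estimate": pooled, "se": math.sqrt(total), "rvi": rvi, "fmi": fmi}


def imputations_needed(fmi):
    """Rule of thumb from the text: at least FMI x 100 imputations."""
    return math.ceil(round(fmi * 100, 6))  # rounding guards against float noise


# Made-up estimates of one coefficient from three imputed data sets.
pooled = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
print(pooled, imputations_needed(0.15))
```

In the text, the RVI is averaged over all coefficients and the FMI is the largest value among them, so a full analysis would apply this function to every coefficient and then summarize.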

3.6.2 Derivative phase

The data collected at Ramathibodi Hospital was used to derive the RAMA-

AS. Mean and standard deviation (SD) or median (range) where appropriate was used

for describing continuous data while frequency (percentage) was used for categorical

data. The baseline characteristics of patients were compared between appendicitis and

non-appendicitis groups using independent t test (or quantile regression where

appropriate) and Chi-square (or Exact test where appropriate) for continuous and

categorical data, respectively.

A simple logistic regression analysis was used to screen variables that might be associated with appendicitis. Individual variables in the 4 domains (i.e., demographic data, clinical symptoms, clinical signs, and laboratory results) were fitted in the logit model, and a likelihood ratio (LR) test was used to test each association. Variables with p values less than 0.20 were simultaneously considered in the multivariate logistic regression model. The LR test was applied for model selection with backward elimination. Only significant variables were kept in the final parsimonious model. Goodness of fit was assessed using the Hosmer-Lemeshow chi-square test to determine whether the model fitted the data well (i.e., predicted and observed values were close). In addition, a calibration coefficient (observed values/predicted values) along with its 95% confidence interval (CI) was estimated. Furthermore, a calibration plot was constructed, i.e., predicted risk was plotted against observed risk. The coefficients of the final parsimonious model were used to create the RAMA-AS by “summing up” all coefficients in the final model. Receiver operating characteristic (ROC) curve analysis was used to select the score cutoff. Diagnostic parameters (i.e., sensitivity, specificity, and positive (LR+) and negative (LR-) likelihood ratios) were estimated for each distinct value of the score.
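The diagnostic parameters at each distinct score value can be tabulated as in the following sketch. It assumes that a score at or above the cutoff counts as test-positive and that the data contain at least one patient with and one without appendicitis; the function name `diagnostic_table` and the toy data are hypothetical and are not study results.

```python
def diagnostic_table(scores, has_disease):
    """Sensitivity, specificity, LR+ and LR- at every distinct score cutoff."""
    rows = []
    for cutoff in sorted(set(scores)):
        pairs = list(zip(scores, has_disease))
        tp = sum(1 for s, d in pairs if s >= cutoff and d)      # true positives
        fn = sum(1 for s, d in pairs if s < cutoff and d)       # false negatives
        fp = sum(1 for s, d in pairs if s >= cutoff and not d)  # false positives
        tn = sum(1 for s, d in pairs if s < cutoff and not d)   # true negatives
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
        lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
        rows.append({"cutoff": cutoff, "sensitivity": sens,
                     "specificity": spec, "lr_pos": lr_pos, "lr_neg": lr_neg})
    return rows


# Toy scores and outcomes (1 = appendicitis), for illustration only.
toy_scores = [2, 3, 5, 6, 7, 8]
toy_outcome = [0, 0, 1, 0, 1, 1]
table = diagnostic_table(toy_scores, toy_outcome)
for row in table:
    print(row)
```

Each row corresponds to one point on the ROC curve (sensitivity against 1 - specificity), so the same table can be used to choose a cutoff.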

Receiver operating characteristic (ROC) curves are an excellent method for comparing diagnostic tests. They were initially developed in World War II by the British army (106). The plot of sensitivity versus 1 - specificity is called the ROC curve, and the area under the curve (AUC) is an effective measure of accuracy with a meaningful interpretation. This curve plays an important role in evaluating the ability of tests to discriminate the true state of patients, finding the optimal cutoff values, and comparing two alternative diagnostic tasks when each task is performed on the same subject (107). A rough guide for classifying the accuracy of a diagnostic test in the traditional system is that an AUC of 0.90-1.00 indicates excellent, 0.80-0.90 good, 0.70-0.80 fair, 0.60-0.70 poor, and 0.50-0.60 failed accuracy.

3.6.3 Validation phases

3.6.3.1 Internal validation

After derivation of a model, several factors, such as a large number of predictors relative to a small effective sample size, categorization of continuous variables, and the strategy used for predictor selection, may lead to an overfitted model and an optimistic apparent performance. It is important to assess model performance more honestly by performing internal validation, preferably using resampling techniques such as bootstrapping or cross-validation. The classical split-sample internal validation technique divides the derivation data into 2 data sets: one to derive the model and the other to validate it. The 2 data sets are typically created by randomly splitting the original data (e.g., 70:30 or 50:50). This method does not work well when the total sample size and/or the number of patients with the outcome is small. The technique requires a large sample size, and the data may instead be split by time (temporal validation) or location (geographic validation) (102).

Cross-validation is an extension of the split-sample technique.

This technique is used to reduce the bias and variability of the performance estimates.

The 10-fold cross-validation technique is done by randomly splitting the data into 10

equally sized groups. The model is derived in 9 of the 10 groups and evaluation of its


performance is done in the remaining group. This entire process is then repeated 10

times so that each of the 10 groups is used to test the model. The performance of the

model is calculated as the average over the 10 iterations.
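The 10-fold procedure described above can be sketched as follows. This is a generic illustration, not code from the study: `fit` and `evaluate` are placeholder callables supplied by the user (for example, a model-fitting routine and a c-index function), and the function names are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly split the indices 0..n-1 into k groups of nearly equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(data, fit, evaluate, k=10, seed=0):
    """Derive the model on k-1 groups, test on the remaining one, and average."""
    results = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [row for i, row in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        model = fit(train)                     # derivation in 9 of the 10 groups
        results.append(evaluate(model, test))  # performance in the remaining group
    return sum(results) / len(results)


folds = k_fold_indices(25, k=10)
print([len(fold) for fold in folds])
```

Every patient appears in exactly one test group, so each observation contributes once to the averaged performance.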

The bootstrapping technique starts by developing the prediction model using the entire original sample (size n) and determining its apparent performance. A bootstrap sample is then drawn by sampling n individuals with replacement from the original sample. In each bootstrap sample, a model is developed by applying exactly the same modeling and predictor selection methods, and its apparent performance (e.g., c-index) in that bootstrap sample is determined (bootstrap performance). This is compared with the performance of the same bootstrap model in the original sample (original performance). The optimism is calculated as the difference between the bootstrap performance and the original performance. These steps are repeated at least 100 times, and the estimates of optimism are averaged; the average is then subtracted from the apparent performance obtained in the first step to obtain an optimism-corrected estimate of performance (108).
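The optimism-correction steps can be written compactly as below. This is a minimal sketch under stated assumptions rather than the command given in Appendix C: `fit` and `performance` are user-supplied placeholders, and the toy cutoff model, accuracy measure, and data are invented purely to make the example runnable.

```python
import random


def optimism_corrected(data, fit, performance, n_boot=100, seed=0):
    """Apparent performance minus the average bootstrap optimism."""
    rng = random.Random(seed)
    n = len(data)
    apparent = performance(fit(data), data)
    optimism = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # n draws with replacement
        model = fit(sample)                                  # repeat all modeling steps
        optimism.append(performance(model, sample) - performance(model, data))
    return apparent - sum(optimism) / n_boot


def fit_cutoff(rows):
    """Toy model: cutoff halfway between the mean scores of the two groups."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:  # a bootstrap sample may contain only one group
        return sum(x for x, _ in rows) / len(rows)
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(cutoff, rows):
    """Proportion of rows classified correctly by score >= cutoff."""
    return sum((x >= cutoff) == (y == 1) for x, y in rows) / len(rows)


toy = [(2, 0), (3, 0), (4, 0), (5, 1), (5, 0), (6, 1), (6, 0), (7, 1), (8, 1), (9, 1)]
corrected = optimism_corrected(toy, fit_cutoff, accuracy, n_boot=200)
print(corrected)
```

Any predictor selection or transformation step would need to live inside `fit` so that it is repeated in every bootstrap sample.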

Compared with cross-validation, this technique better captures the effects of predictor selection strategies on model building, and thus the extent of model overfitting and optimism can be quantified by repeating the predictor selection process in each bootstrap sample. The technique also provides an estimate of the adjustment or correction factor by which the model's regression coefficients and its performance measures can be shrunk and thus adjusted for overfitting. It is very important that all aspects of model fitting are incorporated into each random or bootstrap derivation sample, including selection of predictors, decisions on transformations, and tests of interaction with other variables or time. Omitting these steps is common in clinical research but can lead to biased assessments of fit, even in the validation sample.

A strength of the bootstrap is that it uses all of the data used to develop the prediction model and provides a mechanism to account for model overfitting or uncertainty in the entire model derivation process. It quantifies any optimism in the final prediction model and provides an estimate of the shrinkage factor that can be used to adjust the regression coefficients and apparent performance for optimism, so that better performance is obtained in subsequent model validation studies and applications.


In this study, the bootstrap technique with 450 replications was applied for internal validation of the RAMA-AS (99). For each bootstrap sample, the RAMA-AS derived in the derivation phase was calculated for each patient and then fitted in a logistic model. The score performances (i.e., discrimination and calibration) were estimated. For calibration, the correlation between the observed and expected values of appendicitis was assessed using the Somers' D correlation coefficient for all bootstrap data (called Dboot) and the derivation data (called Dorg). Calibration of the model was then assessed by subtracting Dorg from the mean Dboot; a lower value reflected less bias and thus better calibration. Discrimination of the model was also assessed by comparing the original C statistic with the average C statistic from the bootstraps. The command used for the bootstrap is provided in Appendix C.

3.6.3.2 External validation

The two most important aspects of model performance are discrimination and calibration. In the derivation phase, the primary interest is discrimination because the model is, by definition, likely to be well calibrated (on average). In validation studies, assessment of both discrimination and calibration is crucial. Discrimination is the ability of a prediction model to differentiate between subjects who have a negative and a positive outcome event. A model has perfect discrimination if the predicted risks for all individuals who have a positive outcome are higher than those for all individuals who have a negative outcome. Discrimination is estimated by the concordance index (c-index), which is the probability that, for any randomly selected pair of individuals, one with a positive and one with a negative outcome, the model assigns a higher probability to the individual with the positive outcome. The c-index is identical to the area under the receiver operating characteristic (ROC) curve for models with binary endpoints.
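The pairwise definition of the c-index translates directly into code. The sketch below is illustrative only: ties in predicted risk are counted as one half (a common convention that the text does not specify), the function names are hypothetical, and the relation D = 2(c - 0.5) between Somers' D and the c-index is the standard one for a binary outcome.

```python
def c_index(predicted, outcome):
    """Probability that a positive case receives a higher prediction than a negative case."""
    positives = [p for p, y in zip(predicted, outcome) if y == 1]
    negatives = [p for p, y in zip(predicted, outcome) if y == 0]
    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5  # tie convention (assumption)
    return concordant / (len(positives) * len(negatives))


def somers_d(c):
    """Somers' D rescales the c-index from the range 0.5-1 to 0-1."""
    return 2 * (c - 0.5)


c = c_index([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(c, somers_d(c))
```

In this toy example, three of the four positive-negative pairs are ordered correctly, so the c-index is 0.75 and Somers' D is 0.5.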

Calibration reflects the agreement between predicted and observed outcomes. Calibration should be reported graphically, with predicted outcome probabilities on the X-axis plotted against observed outcome frequencies on the Y-axis. The predicted probability is commonly divided into 6-10 groups (sixths to tenths of the predicted risk), and the plot should be augmented by a smoothed line over the entire predicted probability range. This plot displays the direction and magnitude of model miscalibration across the probability range, which can be combined with estimates of the calibration slope and intercept. Whether smoothed or by subgroups, a well-calibrated model shows predictions lying on or around the 45° line of the calibration plot, and perfect calibration shows a slope of 1 and an intercept of 0.
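The grouped points of a calibration plot, together with the observed/expected ratio, can be computed as in this sketch. It is an illustration with hypothetical function names and made-up data; the smoothed line and the calibration slope and intercept mentioned above are not computed here.

```python
def calibration_table(predicted, observed, n_groups=10):
    """Mean predicted risk and observed frequency within groups of predicted risk."""
    pairs = sorted(zip(predicted, observed))  # order patients by predicted risk
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)  # X-axis value
        observed_freq = sum(y for _, y in chunk) / len(chunk)   # Y-axis value
        table.append((mean_predicted, observed_freq, len(chunk)))
    return table


def observed_expected_ratio(predicted, observed):
    """Overall calibration coefficient: observed events / predicted events."""
    return sum(observed) / sum(predicted)


# Made-up predictions and outcomes (1 = appendicitis), for illustration only.
toy_pred = [0.1] * 10 + [0.9] * 10
toy_obs = [0] * 9 + [1] + [1] * 9 + [0]
groups = calibration_table(toy_pred, toy_obs, n_groups=2)
print(groups, observed_expected_ratio(toy_pred, toy_obs))
```

Points that fall on the 45° line, as in this toy example, correspond to groups in which the mean predicted risk equals the observed frequency.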

Data from the two external hospitals mentioned previously were used to validate the performance of the RAMA-AS. The RAMA-AS was calculated for each patient as specified by the derived model and fitted in the logit model. Model performance (i.e., calibration and discrimination) was then assessed as described above. If the model was not well calibrated, it was re-calibrated by updating the intercept (called M1) and the overall coefficient (called M2) (109, 110), as shown in Table 3.2. The M1 was constructed by regressing appendicitis on the RAMA-AS in each of the two external data sets separately. The estimated intercept of this model was then used for re-calibration by adding it to the original intercept. The estimated coefficient from this model was then used to calibrate the coefficients by multiplying it with the overall coefficients (M2). In addition, four model revisions were performed from the M2, as shown in Table 3.2 (109-112). The M3 was constructed by fitting M2 with the seven additional individual predictors. A likelihood ratio test was applied to compare the M2 with and without each additional predictor. Only significant predictors were kept in model M3. The M4 was constructed by fitting M2 plus the significant predictors from stepwise selection among the seven predictors. The M5 was constructed by re-estimating all coefficients of the seven predictors. Finally, the M6 was constructed by re-selecting only the significant predictors out of the seven predictors.

In summary, the rationale for models M0 to M6 are described

and summarized as follows (110).

M0: no adjustment, the original model had good calibration.

M1: adjustment of the intercept (i.e., baseline risk) because of

difference in occurrence of appendicitis between derivation and validation samples.

M2: M1 plus adjustment of all predictor regression coefficients by one overall factor, because the regression coefficients of the original model were overfitted or underfitted.


M3: M2 plus extra adjustment of regression coefficients for predictors with different strengths in the validation sample as compared with the derivation sample. The rationale is as in M2, and in addition the strength (regression coefficient) of one or more predictors may be different in the validation sample.

M4: M2 plus stepwise selection of additional predictors. The rationale is as in M2, and in addition one or more potential predictors were not included in the original model, or a newly discovered variable may need to be added.

M5: Re-estimation of all regression coefficients by using the

data of the validation sample only, because the strength of all predictors may be

different in the validation sample or the validation sample is much larger than the

development sample.

M6: M5 plus additional predictors by stepwise selection. The

rationale is the same as in M5 and one or more potential predictors are not included in

the original model.
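The two simplest updates in the list above, M1 and M2, can be illustrated with a minimal sketch. It assumes the common formulation in which the outcome is regressed on the linear predictor of the original model (here the RAMA-AS value), with the slope fixed at 1 for M1 and estimated freely for M2. The function name `recalibrate`, the Newton-Raphson fitting routine, and the toy data are illustrative assumptions; they are not taken from the analysis in this thesis, which was run in Stata.

```python
import math


def expit(z):
    """Inverse logit: converts a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate(lp, y, update_slope, n_iter=50, tol=1e-10):
    """Fit logit P(y=1) = a + b * lp by Newton-Raphson.

    lp : linear predictor of the original model for each validation patient
    y  : observed outcome (1 = appendicitis, 0 = no appendicitis)
    update_slope=False keeps b fixed at 1 (intercept update only, as in M1);
    update_slope=True estimates a and b together (as in M2).
    """
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        p = [expit(a + b * x) for x in lp]
        w = [pi * (1.0 - pi) for pi in p]
        # Score (gradient) of the log-likelihood
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum(x * (yi - pi) for x, yi, pi in zip(lp, y, p))
        # Information matrix entries
        h_aa = sum(w)
        h_ab = sum(wi * x for wi, x in zip(w, lp))
        h_bb = sum(wi * x * x for wi, x in zip(w, lp))
        if update_slope:
            det = h_aa * h_bb - h_ab ** 2
            da = (h_bb * g_a - h_ab * g_b) / det
            db = (h_aa * g_b - h_ab * g_a) / det
        else:
            da, db = g_a / h_aa, 0.0
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b


# Hypothetical validation data: original-model linear predictors and outcomes
lp_demo = [-3.37, -2.0, -1.2, -0.5, 0.1, 0.4, 0.9, 1.3, 1.8, 2.5, 3.1, 4.0]
y_demo = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

a_m1, b_m1 = recalibrate(lp_demo, y_demo, update_slope=False)  # M1
a_m2, b_m2 = recalibrate(lp_demo, y_demo, update_slope=True)   # M2
print("M1 intercept correction:", round(a_m1, 3))
print("M2 intercept correction and overall slope:", round(a_m2, 3), round(b_m2, 3))
```

In this formulation the returned `a` is the correction added to the original intercept, and `b` is the single overall factor applied to all original coefficients. After either update, the sum of the predicted probabilities equals the observed number of events in the validation data, which is why an intercept update alone can repair calibration-in-the-large.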

When validating a derivation model in other individuals, the

performance of the model is usually poorer than the performance estimated in the

individuals on whom the model was derived. This is likely caused by differences in

study design, measurements of predictors, or prevalence of outcome of interest. When

predictive accuracy is lower, we may reject the model, refit it, or derive a new model. However, a newly derived model may encounter the same problems, i.e., overfitting, and may be even less generalizable than the original model. In addition, prior knowledge captured in the original studies is then not used optimally, which is contrary to the principle that evidence-based medicine should be based on as much data as possible. Therefore, before deriving a new model from the validation data, we should first try to adjust the original model to determine to what extent the loss in predictive accuracy

may be overcome. An adjusted model should combine the information presented in the

original model with information from individuals in the validation set. This is more

likely to improve transportability to other individuals. Several methods have been

recommended for updating prediction models (2, 20, 31, 102, 290, 372, 373). The

derivation and validation data sets commonly differ in the proportion of outcome events, yielding poor calibration of the original model in the new data. By adjusting or updating the intercept or baseline hazard of the original model to the validation sample, calibration is often improved. More advanced updating methods range from overall adjustment of all predictor weights by a single recalibration factor, through adjustment of a particular predictor weight or addition of a new predictor, to re-estimation of all individual regression coefficients.

3.6.3.3 Comparison of RAMA-AS and previous scores

Data from our previous systematic review (60) suggested that the Eskelinen (95), Alvarado (45), and Fenyo (39) scores provided good discrimination. The Alvarado, Fenyo, and Eskelinen scores were then calculated for individual patients. The performance of these scores was then estimated, and their C-statistics were compared with that of RAMA-AS using ROC curve analysis.
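As an illustration of the quantity being compared, the C-statistic of a score can be computed directly as the proportion of case/non-case pairs in which the case has the higher score. The sketch below is a plain-Python illustration with hypothetical function and variable names; it only produces the point estimate and does not reproduce the statistical comparison of ROC curves performed in Stata.

```python
def c_statistic(scores, outcomes):
    """Concordance (C) statistic, equal to the area under the ROC curve.

    scores   : score or predicted risk for each patient
    outcomes : 1 for appendicitis, 0 for non-appendicitis
    Tied pairs count as one half, following the usual convention.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("both outcome groups are needed to form pairs")
    concordant = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1.0
            elif case_score == non_case_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical example: four patients, two with appendicitis
print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

A value of 0.5 corresponds to no discrimination and a value of 1.0 to perfect discrimination, so the same function can be applied to each scoring system on the same patients to obtain comparable estimates.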

All analyses were performed using Stata version 14 (StataCorp, College Station, Texas, USA) with the mi estimate command. A P-value of less than 0.05 was taken as the threshold for statistical significance.

3.7 Ethics considerations

3.7.1 Informed consent if needed

The informed consent form is shown in the appendix section. This research study was submitted to the ethics committee for approval and was conducted according to the principles of the Helsinki Declaration and in accordance with the Medical Research Involving Human Subjects Act. The protocol was submitted for approval to the ethics committee of the Faculty of Medicine Ramathibodi Hospital, Mahidol University. The principles of respect for persons, beneficence, non-maleficence, and justice were applied in this research.

3.7.1.1 Respect for Persons

This principle incorporates two ethical convictions: first, all participants were treated as autonomous agents, and second, participants with diminished autonomy were protected. They were treated with dignity and respect. Medical and surgical care was not disturbed by this study. This study did not use participants as a means to an end or in a manner inconsistent with their interests or wishes. All treatments were given to participants at the standard quality of care. Decisions about sending participants for investigations and giving medical treatment and surgery were not affected. The data were collected in case record forms by history

taking, physical examination, and reviewing medical records. Telephone follow up

may have disturbed participants, although the convenient time of each participant was

recorded and used for contact.

3.7.1.2 Beneficence

This study prevented and removed harm and promoted the good of the patient by minimizing possible harms or risks and maximizing potential benefits. The principle of non-maleficence, which prohibits the infliction of harm, injury, or death upon others, was applied (the maxim Primum non nocere: “Above all do no harm”). This study did not disturb the process of care. No new therapy or medicine

was used in this study.

3.7.1.3 Justice

Each participant was treated fairly and equitably, and was

given his or her due. The study used available resources fairly and distributed them

fairly and equitably. All participants were treated equally. They had the right to leave

the study at any time without affecting the quality of care.

3.7.1.4 Protection

This study assured that participation was voluntary and patients were

treated with equality and fairness. The participants were informed of all procedures related to them and their families.


Table 3.1 Estimation of sample size

| Variable | Category | Prevalence of appendicitis | Expected difference (%) | Sample size (by PS program) |
|---|---|---|---|---|
| Age (years) | < 40 | 0.72 | none | - |
| | ≥ 40 | 0.70 | | |
| Sex | male | 0.83 | 20 | 76×2=152 |
| | female | 0.61 | | |
| Onset of pain | insidious | 0.93 | 20 | 54×2=108 |
| | sudden | 0.62 | | |
| Duration of pain (hours) | < 48 | 0.75 | 20 | 88×2=176 |
| | ≥ 48 | 0.52 | | |
| Migration of pain | presence | 0.80 | 17 | 110×2=220 |
| | absence | 0.63 | | |
| Aggravation of pain | presence | 0.65 | 20 | 96×2=192 |
| | absence | 0.35 | | |
| Nausea | presence | 0.74 | 20 | 89×2=178 |
| | absence | 0.69 | | |
| Vomiting | presence | 0.73 | 20 | 90×2=180 |
| | absence | 0.70 | | |
| Dysuria | presence | 0.10 | 20 | 43×2=86 |
| | absence | 0.71 | | |
| Diarrhea | presence | 0.75 | 20 | 25×2=50 |
| | absence | 0.70 | | |
| Fever (T > 37.8 °C) | presence | 0.83 | 20 | 87×2=174 |
| | absence | 0.61 | | |
| Tender at RLQ | presence | 0.71 | 20 | 80×2=160 |
| | absence | 0.70 | | |
| Abdominal guarding | presence | 0.79 | 15 | 141×2=282 |
| | absence | 0.64 | | |
| Rovsing sign | presence | 0.75 | none | - |
| | absence | 0.69 | | |
| PR tenderness | presence | 0.74 | none | - |
| | absence | 0.69 | | |
| Increase WBC | presence | 0.77 | 20 | 86×2=172 |
| | absence | 0.20 | | |
| Total | | 0.62 | | |

PR = per rectal examination, RLQ = right lower quadrant, WBC = white blood cell


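Table 3.2 Re-calibration and revision of models for external validations

| Type of update model | Component | Thammasat | Chaiyaphum |
|---|---|---|---|
| M0: Original model ($\alpha_0$) | | $\hat{\alpha}_1 + \hat{\beta}_1 \times (\text{RAMA-AS})$ | $\hat{\alpha}_2 + \hat{\beta}_2 \times (\text{RAMA-AS})$ |
| M1: Re-calibrate intercept | $\alpha$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2$ |
| M2: Re-calibration | $\alpha$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2$ |
| | $\beta_{overall}$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1 + \hat{\beta}_1 (\text{RAMA-AS})$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2 + \hat{\beta}_2 (\text{RAMA-AS})$ |
| M3: Revision M2 + $\gamma_i X_i$ | $\alpha$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2$ |
| | $\beta_{overall}$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1 + \hat{\beta}_1 (\text{RAMA-AS})$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2 + \hat{\beta}_2 (\text{RAMA-AS})$ |
| | Likelihood ratio test | $\hat{\gamma}_1 x_1 + \hat{\gamma}_2 x_2 + \ldots$ | $\hat{\gamma}_1 x_1 + \hat{\gamma}_2 x_2 + \ldots$ |
| M4: Revision M2 + $\gamma_i X_i$ | $\alpha$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2$ |
| | $\beta_{overall}$ | $\hat{\alpha}_0 \pm \hat{\alpha}_1 + \hat{\beta}_1 (\text{RAMA-AS})$ | $\hat{\alpha}_0 \pm \hat{\alpha}_2 + \hat{\beta}_2 (\text{RAMA-AS})$ |
| | Stepwise selection | $\hat{\gamma}_1 x_1 + \hat{\gamma}_2 x_2 + \ldots$ | $\hat{\gamma}_1 x_1 + \hat{\gamma}_2 x_2 + \ldots$ |
| M5: Enter all predictors | | $\hat{\alpha}_1 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \ldots + \hat{\beta}_p x_p$ | $\hat{\alpha}_2 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \ldots + \hat{\beta}_p x_p$ |
| M6: Stepwise selection | | $\hat{\alpha}_1 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \ldots$ | $\hat{\alpha}_2 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \ldots$ |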


Figure 3.1 Rebound tenderness

Chumpon Wilasrumee Results / 70

CHAPTER IV

RESULTS

4.1 Characteristics of patients

A total of 396 patients with suspected acute appendicitis were included in the part I derivation study. The baseline data are shown in Table 4.1. One hundred and thirty-two patients (33.3%) were male, and the mean age and BMI were 36.3±14.6 years and 22.8±4.5 kg/m2, respectively. A total of 245/396 (61.8%, 95% CI: 56.9%, 66.7%) patients had appendicitis, with a negative appendectomy rate of 4%.

4.2 Imputation

Two variables (i.e., WBC count and neutrophils) contained missing data of 10.9% and 10.1%, respectively (Table 4.2). Exploring the distribution of these missing values suggested that they were not a subset of each other, thus their missing patterns were assumed to be arbitrary. Therefore, data imputation based on the assumption of MAR could be applied. Imputed data were filled in for 43 and 40 subjects for WBC and neutrophils, respectively. The observed and imputed values for all variables were very similar. The performance of imputation was assessed by estimating RVI and FMI (see Table 4.2). The average RVI was <0.0001 and the median estimated FMI was <0.0001 (range: <0.0001, 0.0535); the maximum of 0.0535 required at least about 6 imputations. Therefore, 20 imputations were sufficient. Diagnostic plots were constructed for the WBC and neutrophil variables by comparing the distributions of imputed and observed values, which suggested no difference between the two (see Figure 4.1A and B).
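The figure of about 6 imputations is consistent with the widely used rule of thumb that the number of imputations should be at least 100 times the largest FMI. The short sketch below works through that arithmetic; that this particular rule is the one applied here is an assumption, and the function name is hypothetical.

```python
import math


def minimum_imputations(largest_fmi):
    """Rule of thumb: number of imputations >= 100 x largest FMI, rounded up."""
    return math.ceil(100 * largest_fmi)


required = minimum_imputations(0.0535)  # largest FMI reported above
print(required)        # minimum number of imputations suggested by the rule
print(required <= 20)  # whether the 20 imputations used meet this minimum
```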


4.3 Model development

4.3.1 Derivation

Twenty variables were analyzed in the univariate analysis as shown in

Table 4.3. A total of 15 predictive variables were suggested from this step which

might be associated with appendicitis. These included symptoms (i.e., first location of

pain, migration of pain, onset, progression of pain, right lower quadrant pain at

presentation, nausea or vomiting, aggravation of pain by cough or movement, fever),

signs (i.e., bowel sound, body temperature, tenderness at right lower quadrant of

abdomen, rebound tenderness, guarding), and laboratory results (i.e., WBC >10,000

cell/mm3 and neutrophils >75%).

These parameters were simultaneously included in a multiple logistic regression model. The LR test with backward elimination was applied and suggested that 7 variables should remain in the final model. These were 3 symptoms (i.e., migration of pain, progression of pain, and aggravation of pain by cough or movement), 2 signs (i.e., body temperature ≥ 37.8°C and rebound tenderness), and 2 laboratory results (i.e., WBC >10,000 cell/mm3 and neutrophils >75%). The coefficients, odds ratios, and 95% CIs were estimated and are reported in Table 4.4. The predictive equation was

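$$
\begin{aligned}
\ln\left(\frac{P}{1-P}\right) = {} & -3.37 + 0.80\,(\text{migration of pain}) + 1.04\,(\text{progression of pain}) \\
& + 0.78\,(\text{aggravation of pain by cough or movement}) \\
& + 1.64\,(\text{body temperature}) + 1.53\,(\text{rebound tenderness}) \\
& + 0.91\,(\text{white blood cell}) + 0.69\,(\text{neutrophils})
\end{aligned}
$$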
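The predictive equation can be evaluated for an individual patient as in the following minimal sketch. The coefficients are copied from the equation; coding each predictor as 1 when present and 0 when absent, and all function and key names, are assumptions made for illustration.

```python
import math

# Intercept and coefficients copied from the predictive equation above
INTERCEPT = -3.37
COEFFICIENTS = {
    "migration_of_pain": 0.80,
    "progression_of_pain": 1.04,
    "aggravation_of_pain": 0.78,        # aggravated by cough or movement
    "body_temperature_ge_37_8": 1.64,   # body temperature >= 37.8 C
    "rebound_tenderness": 1.53,
    "wbc_gt_10000": 0.91,               # WBC > 10,000 cells/mm3
    "neutrophils_gt_75": 0.69,          # neutrophils > 75%
}


def rama_as_logit(findings):
    """Log-odds of appendicitis; findings maps predictor name -> True/False.

    Predictors missing from the mapping are treated as absent (an assumption).
    """
    return INTERCEPT + sum(
        coef for name, coef in COEFFICIENTS.items() if findings.get(name, False)
    )


def predicted_probability(logit):
    """Convert log-odds into a predicted probability of appendicitis."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient with fever and rebound tenderness only
patient = {"body_temperature_ge_37_8": True, "rebound_tenderness": True}
logit = rama_as_logit(patient)
print(round(logit, 2), round(predicted_probability(logit), 3))
```

A patient with none of the seven findings receives the intercept alone, and each positive finding adds its coefficient to the log-odds.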

Among the 5 predictors in the sign domain, only body temperature ≥ 37.8°C and rebound tenderness were significant, with odds ratios (ORs) of 5.1 (95% CI: 2.1, 12.1) and 4.6 (95% CI: 2.7, 7.7), respectively. Among the 12 predictors in the symptom domain, progression of pain, aggravation of pain, and migration of pain were significantly associated with appendicitis, with ORs of 2.8 (95% CI: 1.3, 5.9), 2.2 (95% CI: 1.2, 3.8), and 2.6 (95% CI: 1.3, 3.7), respectively. Both predictors in the laboratory domain were significantly associated with appendicitis; patients with WBC >10,000 cell/mm3 and neutrophils >75% had a significantly higher risk of appendicitis, with ORs of 2.6 (95% CI: 1.3, 5.0) and 2.3 (95% CI: 1.2, 4.1), respectively (Table 4.4).

4.3.2 Model performance

The estimated C-statistic of this model was 0.842 (95% CI: 0.804, 0.881) (see Figure 4.2), indicating that the model performed well in discriminating appendicitis from non-appendicitis. The Hosmer-Lemeshow goodness-of-fit test indicated that the model fitted the data well (Chi-square = 5.64, df = 8, P-value = 0.69), with an O/E ratio of 0.95 (95% CI: 0.83, 1.08).
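The O/E ratio reported here is the observed number of appendicitis cases divided by the number expected from the model, i.e., the sum of the predicted probabilities. The sketch below shows that calculation together with the grouped observed-versus-predicted comparison that underlies a calibration plot; the function names, the equal-size grouping, and the toy data are illustrative assumptions.

```python
def observed_expected_ratio(outcomes, predicted_probabilities):
    """O/E ratio: observed events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(predicted_probabilities)


def calibration_groups(outcomes, predicted_probabilities, n_groups=10):
    """Mean predicted risk and observed proportion per risk-ordered group."""
    pairs = sorted(zip(predicted_probabilities, outcomes))
    n = len(pairs)
    groups = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if chunk:  # skip empty groups when there are few patients
            mean_predicted = sum(p for p, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            groups.append((mean_predicted, observed))
    return groups


# Hypothetical data: four patients, each with a predicted risk of 0.5
print(observed_expected_ratio([1, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]))
```

An O/E ratio of 1 indicates that the model predicts the right number of cases overall; values above 1 indicate under-prediction and values below 1 over-prediction. Plotting the grouped pairs against the 45° line shows where along the risk range any miscalibration occurs.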

The scoring scheme was constructed using estimated coefficients of the 7

significant predictors described in Table 4.4. These scores ranged from -3.37 to 3.99

with a median of 0.86. ROC curve analysis was applied to determine the score cutoffs, which were chosen arbitrarily based on their LR+ performance and ease of application in clinical practice. As a result, the score was stratified into 4 categories, i.e., very low risk (score < -0.64), low risk (score -0.64 to 0.84), intermediate risk (score 0.85 to 1.74), and high risk (score > 1.74) groups (Table 4.5). The estimated LR+ for the latter 3 groups were 1.98 (95% CI: 1.65, 2.37), 5.25 (95% CI: 3.39, 8.13), and 8.36 (95% CI: 3.96, 18.00), respectively, when compared to the lowest risk group. The post-test probability was estimated based on the pretest probability (i.e., prevalence) of appendicitis of 61.8%, which yielded post-test probabilities of 76.0%, 89.0%, and 93.0% for the low, intermediate, and high risk groups, respectively (Figure 4.3).
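The step from likelihood ratio to post-test probability follows the usual odds form of Bayes' theorem: pretest odds are multiplied by LR+ and converted back to a probability. The sketch below applies this to the reported values and also encodes the four score categories; the function names are hypothetical, and treating scores between 0.84 and 0.85 as low risk is an assumption about how the boundary is handled.

```python
def risk_group(score):
    """Assign a RAMA-AS value to one of the four reported risk categories."""
    if score < -0.64:
        return "very low"
    if score < 0.85:   # reported range: -0.64 to 0.84
        return "low"
    if score <= 1.74:  # reported range: 0.85 to 1.74
        return "intermediate"
    return "high"


def post_test_probability(pretest_probability, likelihood_ratio):
    """Post-test probability from pretest probability and likelihood ratio."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


PREVALENCE = 0.618  # pretest probability reported for the derivation data
for group, lr_positive in [("low", 1.98), ("intermediate", 5.25), ("high", 8.36)]:
    probability = post_test_probability(PREVALENCE, lr_positive)
    print(group, round(100 * probability, 1))
```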

4.3.3 Internal validation

A 450-replication bootstrap yielded estimated Dorg and Dboot coefficients of 0.717 and 0.725 (95% CI: 0.721, 0.728) for the derivation and bootstrap models, respectively. The bias was only 0.0073 (95% CI: -0.0109, -0.0036), which suggested good calibration. The bootstrap C-statistic was 0.8623 (95% CI: 0.8605, 0.8641), with a bias of -0.004 (95% CI: -0.006, -0.0023).


4.4 External validation

A total of 330 patients with suspected acute appendicitis (152 and 178 from Thammasat University Hospital and Chaiyaphum Hospital, respectively) were used to externally validate the RAMA-AS. Characteristics of patients in both data sets are described in Table 4.6.

For Thammasat University Hospital, when compared to Ramathibodi Hospital, the prevalence of appendicitis was much lower (48.7% vs 61.8%), mean age was quite similar (35.6 vs 36.3 years), and the percentage of males was much lower (26.4% vs 35.8%), see Table 4.6. Among the seven predictors, the distributions of rebound tenderness (42.8% vs 48.5%), progression of pain (64.5% vs 84.8%), and aggravation of pain (51.4% vs 72.5%) were slightly to much lower, whereas those of migration of pain (48.0% vs 44.7%), body temperature (19.7% vs 18.7%), WBC > 10,000 cells/mm3 (82.2% vs 79.6%), and neutrophils >75% (75.7% vs 66.2%) were slightly to much higher.

The estimated RAMA-AS ranged from -3.4 to 4.0 with a median of 0.1. The derivation model seemed to work well in Thammasat University Hospital, with an estimated O/E ratio of 1.005 (95% CI: 0.784, 1.225; Hosmer-Lemeshow = 8.219, df = 4, p = 0.084). However, the calibration plot showed that the predicted risk deviated from the reference line (see Figure 4.4-A), i.e., risk was over-estimated for lower scores and under-estimated for higher scores. The intercept and overall coefficient were then re-calibrated (see Table 4.7), and calibration plots were constructed (see Figure 4.4-B-C), which suggested no improvement in calibration.

Revision methods (i.e., M3-M6) were then applied. Migration of pain, progression of pain, body temperature, WBC, and neutrophils were significant predictors according to the likelihood ratio test in M3 (see Table 4.7). Comparing the coefficients of M3 with those of M0, the coefficients of body temperature, WBC, and neutrophils, whose distributions were higher than those in Ramathibodi Hospital, changed from positive to negative, whereas the coefficients of the remaining predictors, whose distributions were lower, increased. Only migration of pain, progression of pain, and rebound tenderness were significant positive predictors after stepwise selection for M4. Of these, the distributions of progression of pain and rebound tenderness were much lower, whereas that of migration of pain was higher, than the distributions in Ramathibodi Hospital (see Table 4.7).

Calibrations of these models are plotted in Figure 4.4-D-G. The O/E ratios for the revision models M3 (re-calibrated intercept and overall coefficient plus predictors selected by the likelihood ratio test) and M4 (re-calibrated intercept and overall coefficient plus stepwise selection of significant predictors) were 0.940 (95% CI: 0.729, 1.150; Hosmer-Lemeshow = 2.683, df = 4, p = 0.612) and 1.006 (95% CI: 0.743, 1.269; Hosmer-Lemeshow = 5.00, df = 7, p = 0.660), respectively, which were much improved when compared to M0. C-statistics were estimated for all models (see Table 4.8). These suggested that the RAMA-AS could discriminate appendicitis from non-appendicitis well, with C-statistics of 0.853 (95% CI: 0.790, 0.915), 0.877 (95% CI: 0.823, 0.932), and 0.881 (95% CI: 0.828, 0.935) for M0, M3, and M4, respectively.

For Chaiyaphum Hospital, when compared to Ramathibodi Hospital (see Table 4.6), the prevalence of appendicitis was much higher (76.9% vs 61.8%), mean age was older (42.9 vs 36.3 years), and the percentage of males was higher (39.9% vs 35.8%). Among the seven predictors, three, namely migration of pain (70.2% vs 44.7%), body temperature (37.6% vs 18.7%), and rebound tenderness (71.3% vs 48.5%), were present more often; aggravation of pain was present much less often (58.4% vs 72.5%); and the rest, namely progression of pain (82.6% vs 84.8%), WBC >10,000 cells/mm3 (76.9% vs 79.6%), and neutrophils >75% (63.5% vs 66.2%), were slightly lower than in Ramathibodi Hospital.

The median RAMA-AS was 1.6 (range: -3.4, 4.0), with an O/E ratio of 0.996 (95% CI: 0.659, 1.333; Hosmer-Lemeshow = 6.640, df = 4, p = 0.156). Models re-calibrating the intercept and overall coefficient were constructed (see Table 4.8), and calibration plots were graphed for the original model M0 (Figure 4.5-A), M1-M2 (Figure 4.5-B-C), and the revision models M3-M6 (Figure 4.5-D-G). These suggested that the M0 still deviated from the reference line, particularly for low and high scores. M1 and M2 also did not improve calibration when compared to the original M0. Among the revision models M3-M6, M3-M5 improved calibration, but the M6 was the best, with an O/E ratio of 1.021 (95% CI: 0.947, 1.094). The estimated C-statistic for M6 was 0.860 (95% CI: 0.790, 0.930) (see Table 4.8).


4.5 Comparison of RAMA-AS and previous scores

The Alvarado, Fenyo, and Eskelinen scores were calculated; they ranged from 2 to 10 (mean = 7.04), 0 to 56 (mean = 25.4), and 2 to 15 (mean = 9.9), respectively. These scores were then compared with RAMA-AS using ROC curve analysis (see Figure 4.6). The corresponding C-statistics were 0.752 (95% CI: 0.710, 0.800), 0.764 (95% CI: 0.716, 0.813), and 0.622 (95% CI: 0.567, 0.676), which indicated significantly lower discriminative ability than that of RAMA-AS (P-value < 0.001, see Figure 4.6).


Table 4.1 Baseline data of 396 patients from Ramathibodi Hospital, 152 patients from Thammasat Hospital, and 178 patients from Chaiyaphum Hospital

| Characteristics | Category | Ramathibodi: Appendicitis, n (%) | Ramathibodi: Non-appendicitis, n (%) | Thammasat: Appendicitis, n (%) | Thammasat: Non-appendicitis, n (%) | Chaiyaphum: Appendicitis, n (%) | Chaiyaphum: Non-appendicitis, n (%) |
|---|---|---|---|---|---|---|---|
| Progression of pain | Yes | 223 (92.5) | 113 (72.9) | 67 (90.5) | 31 (39.7) | 123 (87.9) | 11 (61.1) |
| | No | 18 (7.5) | 42 (27.1) | 7 (9.5) | 47 (60.3) | 17 (12.1) | 7 (38.9) |
| Aggravation of pain | Yes | 199 (82.6) | 88 (56.8) | 55 (74.3) | 23 (29.5) | 96 (68.6) | 9 (50.0) |
| | No | 42 (17.4) | 67 (43.2) | 19 (25.7) | 55 (70.5) | 44 (31.4) | 9 (50.0) |
| Migration of pain | Yes | 130 (53.9) | 47 (30.3) | 55 (74.3) | 18 (23.1) | 105 (75.0) | 9 (50.0) |
| | No | 111 (46.1) | 108 (69.7) | 19 (25.68) | 60 (76.9) | 35 (25.0) | 9 (50.0) |
| Body temperature ≥37.8 °C | Yes | 176 (73.0) | 146 (94.2) | 21 (28.4) | 9 (11.5) | 52 (37.1) | 1 (5.6) |
| | No | 65 (26.9) | 9 (5.8) | 53 (71.6) | 69 (88.5) | 88 (62.9) | 17 (94.4) |
| Rebound tenderness | Yes | 155 (64.3) | 37 (23.9) | 48 (64.9) | 17 (21.8) | 119 (85.0) | 6 (33.3) |
| | No | 86 (35.7) | 118 (76.1) | 26 (35.1) | 61 (78.2) | 21 (15.0) | 12 (66.7) |
| WBC (cells/mm3) | >10,000 | 215 (89.2) | 100 (64.5) | 58 (78.4) | 67 (85.9) | 122 (87.1) | 13 (72.2) |
| | ≤10,000 | 26 (10.8) | 55 (35.5) | 16 (21.6) | 11 (14.1) | 18 (12.9) | 5 (27.8) |
| Neutrophils (%) | >75 | 187 (77.6) | 75 (48.4) | 54 (72.9) | 61 (78.2) | 102 (72.9) | 13 (72.2) |
| | ≤75 | 54 (22.4) | 80 (51.6) | 20 (27.0) | 17 (21.8) | 38 (27.1) | 5 (27.8) |


Table 4.2 Report of number of missing data

| Variable | Missing (%) | Observed | Imputed | FMI | RVI |
|---|---|---|---|---|---|
| WBC | 10.86 | 353 | 43 | <0.0001 | <0.0001 |
| Neutrophils | 10.10 | 356 | 40 | <0.0001 | <0.0001 |
| Lymphocytes | 10.10 | 356 | 40 | <0.0001 | <0.0001 |

FMI = largest fraction of missing information of coefficient, RVI = average relative increase in variances of estimates


Table 4.3 Description of patients’ characteristics in appendicitis and non-appendicitis groups

| Characteristics | Category | Non-appendicitis n=155 | Appendicitis n=241 | OR (95%CI) | P-Value |
|---|---|---|---|---|---|
| Demographic | | | | | |
| Age (years), mean (SD) | | 33.8 (11.9) | 37.9 (15.9) | | <0.001 |
| Age group | ≥40 years | 56 (36.1) | 101 (41.9) | 1.3 (0.8-1.9) | 0.251 |
| | <40 years | 99 (63.9) | 140 (58.1) | 1 | |
| Sex, number (%) | Male | 39 (25.2) | 93 (38.6) | 1.9 (1.2-2.9) | <0.001 |
| | Female | 116 (74.8) | 148 (61.4) | 1 | |
| BMI (kg/m2), mean (SD) | | 22.4 (3.9) | 22.95 (4.7) | | 0.23 |
| Symptoms | | | | | |
| First location of pain | Epigastrium | 40 (25.8) | 102 (42.3) | 2.2 (1.4-3.4) | <0.001 |
| | Periumbilical | 24 (15.5) | 31 (12.9) | 1.1 (0.6-1.9) | |
| | Other | 91 (58.7) | 108 (44.8) | 1 | |
| Type of pain | Dull aching, constant | 49 (31.6) | 82 (34.0) | 1.1 (0.7-1.7) | 0.620 |
| | Colicky | 106 (68.4) | 159 (65.9) | 1 | |
| Migration of pain | Presence | 47 (30.3) | 130 (53.9) | 2.7 (1.8-4.1) | <0.001 |
| | Absence | 108 (69.7) | 111 (46.1) | 1 | |
| Onset | Sudden | 35 (22.6) | 95 (39.4) | 2.2 (1.4-3.5) | <0.001 |
| | Insidious | 120 (77.4) | 146 (60.6) | 1 | |
| Progression of pain | Yes | 113 (72.9) | 223 (92.5) | 4.6 (2.5-8.4) | <0.001 |
| | No | 42 (27.1) | 18 (7.5) | 1 | |
| RLQ pain at presentation | Yes | 140 (90.3) | 239 (99.2) | 12.8 (2.9-56.8) | <0.001 |
| | No | 15 (9.7) | 2 (0.8) | 1 | |
| Time of pain before presentation (hours) | ≤ 48 | 126 (81.3) | 204 (84.7) | 1.3 (0.7-2.2) | 0.382 |
| | > 48 | 29 (18.7) | 37 (15.4) | 1 | |
| Time of RLQ pain before presentation (hours) | ≤ 12 | 67 (43.2) | 107 (44.4) | 1.1 (0.7-1.6) | 0.820 |
| | > 12 | 88 (56.8) | 134 (55.6) | 1 | |
| Nausea or vomiting | Yes | 64 (41.3) | 141 (58.5) | 2.0 (1.3-3.0) | <0.001 |
| | No | 91 (58.7) | 100 (41.5) | 1 | |

RLQ, right lower quadrant


Table 4.3 Description of patients’ characteristics in appendicitis and non-appendicitis groups (cont.)

| Characteristics | Category | Non-appendicitis n=151 | Appendicitis n=245 | OR (95%CI) | P-Value |
|---|---|---|---|---|---|
| Aggravation of pain | Yes | 88 (56.8) | 199 (82.6) | 3.6 (2.3-5.7) | <0.001 |
| | No | 67 (43.2) | 42 (17.4) | 1 | |
| Anorexia | Yes | 118 (76.1) | 164 (68.1) | 0.7 (0.4-1.1) | 0.083 |
| | No | 37 (23.9) | 77 (31.9) | 1 | |
| Fever | Yes | 135 (87.1) | 154 (63.9) | 0.3 (0.2-0.4) | <0.001 |
| | No | 20 (12.9) | 87 (36.1) | 1 | |
| Signs | | | | | |
| Bowel sound | Increase | 20 (12.9) | 37 (15.4) | 1.4 (0.8-2.5) | 0.044 |
| | Decrease | 16 (10.3) | 45 (18.7) | 2.1 (1.1-3.9) | |
| | Normal | 119 (76.8) | 159 (65.9) | 1 | |
| Body temperature (°C) | ≥ 37.8 | 9 (5.8) | 65 (26.9) | 5.9 (2.8-12.4) | <0.001 |
| | < 37.8 | 146 (94.2) | 176 (73.0) | 1 | |
| Tenderness at RLQ | Yes | 137 (88.4) | 240 (99.6) | 31.5 (4.2-238.8) | <0.001 |
| | No | 18 (11.6) | 1 (0.4) | 1 | |
| Rebound tenderness | Yes | 37 (23.9) | 155 (64.3) | 5.8 (3.7-9.1) | <0.001 |
| | No | 118 (76.1) | 86 (35.7) | 1 | |
| Guarding | Yes | 26 (16.8) | 82 (34.0) | 2.6 (1.6-4.2) | <0.001 |
| | No | 129 (83.2) | 159 (65.9) | 1 | |
| Laboratory results | | | | | |
| WBC (cell/mm3) | >10,000 | 100 (64.5) | 215 (89.2) | 4.6 (2.7-7.7) | <0.001 |
| | ≤ 10,000 | 55 (35.5) | 26 (10.8) | 1 | |
| Neutrophil (%) | >75 | 75 (48.4) | 187 (77.6) | 3.7 (2.4-5.8) | <0.001 |
| | ≤ 75 | 80 (51.6) | 54 (22.4) | 1 | |


Table 4.4 Factors associated with appendicitis: Multiple logistic regression analysis

| Domain | Parameters | Coefficient | SE | P value | OR (95%CI) | Scoring |
|---|---|---|---|---|---|---|
| Symptoms | Progression of pain | 1.04 | 0.4 | 0.007 | 2.8 (1.3-5.9) | 1.04 |
| | Aggravation of pain by cough or movement | 0.78 | 0.3 | 0.009 | 2.2 (1.2-3.8) | 0.78 |
| | Migration of pain | 0.80 | 0.3 | 0.004 | 2.6 (1.3-3.7) | 0.77 |
| Signs | Body temperature ≥37.8 °C | 1.64 | 0.5 | <0.001 | 5.1 (2.1-12.1) | 1.64 |
| | Rebound tenderness | 1.53 | 0.3 | <0.001 | 4.6 (2.7-7.7) | 1.53 |
| Lab results | WBC >10,000 cells/mm3 | 0.91 | 0.3 | 0.005 | 2.6 (1.3-5.0) | 0.91 |
| | Neutrophils >75% | 0.69 | 0.3 | 0.010 | 2.3 (1.2-4.1) | 0.69 |
| | Constant | -3.37 | | | | |
| | Total | | | | | 3.99 |

WBC = white blood cell count


Table 4.5 Risk stratification and predictive values of a RAMA-AS prediction score (score development for derivative phase)

| Score (Variables) | Risk groups | AP | Non-AP | %Sensitivity (95%CI) | %Specificity (95%CI) | LR+ (95%CI) | LR- (95%CI) | Post-test probability (%) |
|---|---|---|---|---|---|---|---|---|
| <-0.64 | Very low risk | 25 | 85 | 100.00 | 0 | 1.00 | 0 | 61.80 |
| -0.64 to 0.84 | Low risk | 61 | 51 | 89.75 (85.25-93.26) | 54.97 (46.67-63.06) | 1.98 (1.65-2.37) | 0.19 (0.13-0.28) | 76.00 (73.00-79.00) |
| 0.85 to 1.74 | Intermediate risk | 64 | 12 | 64.08 (57.73-70.09) | 88.08 (81.82-92.78) | 5.25 (3.39-8.13) | 0.41 (0.34-0.49) | 89.00 (85.00-93.00) |
| >1.74 | High risk | 91 | 7 | 37.96 (31.86-44.36) | 95.36 (90.68-98.12) | 8.36 (3.96-18.00) | 0.65 (0.59-0.72) | 93.00 (86.00-97.00) |

AP = appendicitis; Non-AP = non-appendicitis


Table 4.6 Key study characteristics of patients from derivative and external validation

| Characteristics | RA (N=396) | TS (N=152) | CP (N=178) |
|---|---|---|---|
| Mean age (SD), years | 36.3 (14.6) | 35.6 (16.9) | 42.9 (16.8) |
| Men | 132 (35.8%) | 40 (26.4%) | 71 (39.9%) |
| Symptoms | | | |
| Progression of pain | 336 (84.8%) | 98 (64.5%) | 147 (82.6%) |
| Aggravation of pain | 287 (72.5%) | 78 (51.4%) | 104 (58.4%) |
| Migration of pain | 177 (44.7%) | 73 (48.0%) | 125 (70.2%) |
| Signs | | | |
| Body temperature ≥37.8 °C | 74 (18.7%) | 30 (19.7%) | 67 (37.6%) |
| Rebound tenderness | 192 (48.5%) | 65 (42.8%) | 127 (71.3%) |
| Laboratory | | | |
| WBC (>10,000 cells/mm3) | 315 (79.6%) | 125 (82.2%) | 137 (76.9%) |
| Neutrophils (>75%) | 262 (66.2%) | 115 (75.7%) | 113 (63.5%) |
| Prevalence of appendicitis | 245/396 (61.8%) | 74/152 (48.7%) | 137/178 (76.9%) |

CP = Chaiyaphum Hospital; RA = Ramathibodi Hospital; TS = Thammasat Hospital


Table 4.7 Estimation of intercept and coefficients for external validations using different update models

Type of update model Thammasat (p value) Chaiyaphum

M0: Original model

α=-3.374 0.376 0.501

M1: Re-calibration

α -3.374-0.376 -3.374+(0.501)

M2: Re-calibration

α -3.374-0.376 -3.374+(0.501)

βoverall 0.929 0.798

M3: Revision M2+γiXi

α -3.374-0.376 -3.374+(0.501)


Migration of pain 1.284 (0.004) 0.0391 (0.930)

Progression of pain

Aggravation of pain

1.318 (0.046)

0.426 (0.378)

-0.490 (0.391)

0.629 (0.184)

Body temperature -1.332 (0.033) -1.339 (0.024)

Rebound tenderness 0.454 (0.332) 1.937 (<0.001)

WBC -1.353 (0.017) -0.333 (0.519)

Neutrophils -1.236 (0.017) -1.275 (0.018)

M4:

Α -3.374-0.376 -3.374+(0.501)


Migration of pain 1.836 (<0.001) 0.957 (0.043)

Progression of pain

Aggravation of pain

1.768 (0.001)

1.128 (0.016)

Rebound tenderness 1.817 (<0.001) 2.619 (<0.001)

M5:

Α -3.189 -1.965


Aggravation of pain 1.539 (0.006) 1.971 (0.716)

Progression of pain 0.512 (0.286) 0.988 (0.043)

Body temperature 0.485 (0.378) 0.405 (0.431)


WBC 0.345 (0.615) 1.129 (0.061)

Neutrophils -0.282 (0.658) -0.757 (0.221)

M6:

Α -2.959 -1.430


Progression of pain 1.768 (0.001)

Aggravation of pain 1.128 (0.016)



Table 4.8 Estimations of calibration coefficients and C-statistics for external validations using different re-calibration and revision methods

| Model | Thammasat: GoF test | Thammasat: O/E (95% CI) | Thammasat: C stat (95% CI) | Chaiyaphum: GoF test | Chaiyaphum: O/E (95% CI) | Chaiyaphum: C stat (95% CI) |
|---|---|---|---|---|---|---|
| M0 | 0.084 | 1.01 (0.78, 1.23) | 0.853 (0.791, 0.915) | 0.156 | 0.996 (0.659, 1.333) | 0.813 (0.736, 0.892) |
| M1 | 0.084 | 1.00 (0.78, 1.23) | 0.853 (0.791, 0.915) | 0.156 | 0.996 (0.659, 1.333) | 0.813 (0.736, 0.892) |
| M2 | 0.084 | 1.00 (0.78, 1.23) | 0.853 (0.791, 0.915) | 0.156 | 0.996 (0.659, 1.333) | 0.813 (0.736, 0.892) |
| M3 | 0.612 | 0.94 (0.73, 1.15) | 0.877 (0.823, 0.932) | 0.261 | 1.083 (0.734, 1.434) | 0.854 (0.777, 0.931) |
| M4 | 0.660 | 1.01 (0.74, 1.27) | 0.881 (0.828, 0.935) | 0.239 | 1.035 (0.651, 1.419) | 0.857 (0.788, 0.926) |
| M5 | 0.270 | 0.87 (0.58, 1.61) | 0.884 (0.832, 0.936) | 0.279 | 0.905 (0.622, 1.186) | 0.873 (0.809, 0.938) |
| M6 | 0.354 | 0.95 (0.68, 1.21) | 0.872 (0.817, 0.926) | 0.967 | 1.021 (0.947, 1.094) | 0.860 (0.790, 0.930) |

GoF, Goodness of Fit


Figure 4.1 Diagnostic plots of observed, imputed, and completed values (kernel density estimates) for imputations 1 to 20. A: WBC; B: Neutrophils


Figure 4.2 Receiver operating characteristic (ROC) curve of RAMA-AS for diagnosis of appendicitis (vertical axis: sensitivity; horizontal axis: 1 - specificity; area under ROC curve = 0.8422)


Figure 4.3 Fagan's nomogram plot for RAMA-AS risk stratification (axes: prior probability (%), likelihood ratio, posterior probability (%)). Prior probability: 62%. Positive likelihood ratios and posterior probabilities for risk groups 1 to 4: LR+ 1.00, 61.8%; LR+ 1.98, 76.2%; LR+ 5.25, 89.5%; LR+ 8.36, 93.1%. Negative likelihood ratios and posterior probabilities for risk groups 2 to 4: LR- 0.19, 23.5%; LR- 0.41, 39.9%; LR- 0.65, 51.3%.


Figure 4.4 Calibration plots for external validations at Thammasat Hospital using different update methods.

A) Original model M0
B) Re-calibration M1: calibrate intercept
C) Re-calibration M2: calibrate intercept and overall coefficient
D) Revision model M3: M2 + additional significant predictors from M2
E) Revision model M4: M2 + stepwise selection of significant predictors
F) Revision model M5: re-estimation of all coefficients
G) Revision model M6: stepwise selection of significant predictors

[Calibration plot panels, both axes from 0 to 1; legend: Reference line, Predicted risk, Fitted values]


Figure 4.5 Calibration plots for external validations at Chaiyaphum Hospital using different update methods.

A) Original model M0
B) Re-calibration M1: calibrate intercept
C) Re-calibration M2: calibrate intercept and overall coefficient
D) Revision model M3: M2 + additional significant predictors from M2
E) Revision model M4: M2 + stepwise selection of significant predictors
F) Revision model M5: re-estimation of all coefficients
G) Revision model M6: stepwise selection of significant predictors

[Calibration plot panels, both axes from 0 to 1]


Figure 4.6 Comparisons of C-statistics between RAMA-AS, Alvarado, Eskelinen and Fenyo scores


CHAPTER V

DISCUSSION

We have developed and validated a clinical prediction score, called RAMA-AS, for classifying patients as low, intermediate, intermediate-high, or high risk of having appendicitis using 3 symptoms (i.e., migration of pain, progression of pain, and aggravation of pain by cough/movement), 2 signs (i.e., body temperature ≥ 37.8°C and rebound tenderness), and 2 laboratory results (i.e., WBC >10,000 cells/mm3 and neutrophils >75%). These variables are routinely assessed and available in clinical practice in referral and even general hospitals. Internal validation showed that the RAMA-AS performed well for both calibration and discrimination. Given a pretest probability (i.e., prevalence) of appendicitis of 61.8%, the post-test probabilities for the moderate and high risk groups were 89.0% (85.0 to 93.0%) and 93.0% (86.0 to 97.0%), respectively. In addition, external validation showed good calibration and discrimination, with C-statistics of 0.840 (95% CI: 0.800, 0.880) and 0.810 (95% CI: 0.730, 0.890) for Thammasat University Hospital and Chaiyaphum Hospital, respectively.

5.1 Comparison of RAMA-AS with previous scores and radiological investigation

The Eskelinen score was the closest clinical decision rule that was developed using rigorous statistical approaches and exhibited good discrimination and calibration, as found by our systematic review (38). The Alvarado (45) and Fenyo (39) scores provided good discrimination and were tested in large samples with sufficient power to accommodate the number of predictors being tested. These 3 systems were therefore chosen for comparison with RAMA-AS. The Alvarado score, the most commonly known prediction scoring system, was first reported in 1986 and was derived retrospectively from 305 hospitalized patients with abdominal pain and suspected appendicitis (45). Although the score has been used in many prospective studies (10, 37, 49, 74, 83, 91, 113), it has some limitations: it was derived from univariate analysis and had only fair to good discrimination performance. RAMA-AS was derived from prospectively collected data using a proper method of model construction, as recommended by TRIPOD (111). RAMA-AS outperformed the Alvarado, Eskelinen, and Fenyo scores (i.e., C-statistics of 0.84 for RAMA-AS versus 0.75 for Alvarado, 0.62 for Eskelinen, and 0.76 for Fenyo), and it can better classify patients into those who can safely be observed as out-patients, those who need further investigation, and those who need to be admitted for appendectomy. The numbers of predictors used in the RAMA-AS, Alvarado, Eskelinen, and Fenyo scores were 7, 8, 6, and 10, respectively. In the laboratory domain, both the Eskelinen and Fenyo scores used only WBC, while the RAMA-AS and Alvarado scores used both WBC and neutrophils. This might explain the better discrimination performance of the RAMA-AS and Alvarado scores.
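The C-statistics compared above can be read as the probability that a randomly chosen patient with appendicitis receives a higher score than a randomly chosen patient without it. The following minimal Python sketch illustrates that definition only; the function name `c_statistic` and the toy scores are assumptions made for illustration and are not taken from the thesis data or analysis code.

```python
def c_statistic(scores, labels):
    """Concordance (C) statistic: share of case/non-case pairs in which
    the case has the higher score; tied pairs count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical scores for six patients (1 = appendicitis, 0 = no appendicitis).
toy_scores = [2, 5, 3, 7, 4, 6]
toy_labels = [0, 1, 0, 1, 1, 0]
print(c_statistic(toy_scores, toy_labels))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which the four scores are compared.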

Scoring systems are less likely to replace imaging studies in tertiary care hospitals, such as Ramathibodi or Thammasat University Hospitals. However, they will be useful where ultrasound is not available or cannot provide a definitive answer, or where CT scan is not available. Data from Ramathibodi Hospital showed that RAMA-AS had better discrimination performance than ultrasound but lower than CT scan (area under the ROC curve of 0.842 for RAMA-AS versus 0.526 for ultrasound and 0.921 for CT scan), as shown in Figure 5.1. This information might be useful and should be validated further in Thai primary and secondary care settings.

5.2 External validation and model updating

The calibration performance of RAMA-AS was not good in either external data set. This could be explained by differences in prevalence: the prevalence of appendicitis in the derivation data (Ramathibodi Hospital) and the validation data (Thammasat University and Chaiyaphum Hospitals) was 61.8% vs 48.7% vs 76.9%, respectively. Therefore, the original model over-estimated the risk of appendicitis at Thammasat University Hospital but under-estimated the risk at Chaiyaphum Hospital. We then re-calibrated the intercept in the M1 models by subtracting from and adding to the original intercept (i.e., baseline risk) the estimated intercepts for the Thammasat and Chaiyaphum data, respectively. These models were still not well calibrated, so we went on to recalibrate the overall coefficient (M2), but this did not improve calibration much. Revised models were constructed next, and calibration was much improved with M4 for Thammasat University Hospital and with M3, M4, and M6 (but not M5) for Chaiyaphum Hospital. We propose choosing the predictive score according to the prevalence of appendicitis, as shown in Figure 5.2. Although the RAMA-AS did not calibrate as well in the external data as in the derivation data, it could still discriminate well between appendicitis and non-appendicitis in a primary care setting (Chaiyaphum Hospital) and a School of Medicine setting (Thammasat University Hospital).
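The two re-calibration steps described above (M1 and M2) can be sketched as follows. This is a minimal illustration that assumes the standard logistic recalibration approach, in which the original linear predictor is kept fixed and only an intercept shift (M1-style) or an intercept and an overall slope (M2-style) are re-estimated by maximum likelihood. The thesis does not show its code, so the function names, the Newton-Raphson fitting, and the toy data are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate_intercept(lp, y, iters=50):
    """M1-style update (assumed form): logit(p) = a + lp, slope fixed at 1.
    Returns the intercept correction a, fitted by Newton-Raphson."""
    a = 0.0
    for _ in range(iters):
        p = [sigmoid(a + v) for v in lp]
        grad = sum(yi - pi for yi, pi in zip(y, p))
        hess = sum(pi * (1.0 - pi) for pi in p)
        a += grad / hess
    return a


def recalibrate_intercept_slope(lp, y, iters=50):
    """M2-style update (assumed form): logit(p) = a + b * lp.
    Returns (a, b), fitted by a two-parameter Newton-Raphson iteration."""
    a, b = 0.0, 1.0
    for _ in range(iters):
        p = [sigmoid(a + b * v) for v in lp]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * v for yi, pi, v in zip(y, p, lp))
        h00 = sum(w)
        h01 = sum(wi * v for wi, v in zip(w, lp))
        h11 = sum(wi * v * v for wi, v in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical linear predictors from an original model and observed outcomes.
toy_lp = [-1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
toy_y = [0, 1, 1, 0, 0, 1]
print(recalibrate_intercept(toy_lp, toy_y))
print(recalibrate_intercept_slope(toy_lp, toy_y))
```

An intercept correction below zero corresponds to a model that over-estimates risk in the new setting, and a slope below 1 indicates predictions that are too extreme. The revised models M3 to M6 go further by adding or re-estimating individual predictors, which this sketch does not cover.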

Our team took care to recover missing data in the 7 predictive parameters of RAMA-AS for external validation. No data were missing for these 7 predictive parameters in the Thammasat University Hospital and Chaiyaphum Hospital data sets. However, there were some missing data on other variables that we could not recover.

The RAMA-AS seemed to perform well in terms of discrimination for all external data sets. However, the distributions of the predictive parameters differed between the three settings. For instance, progression of pain had similar distributions at Ramathibodi Hospital and Chaiyaphum Hospital but was much lower at Thammasat University Hospital (84.8%, 82.6%, and 64.5%, respectively). Aggravation of pain was much higher at Ramathibodi Hospital but was similar between the two external data sets, i.e., 72.5% vs 58.4% vs 51.4%. The distribution of migration of pain at Ramathibodi Hospital was similar to that at Thammasat University Hospital (44.7% vs 48.0%), but it was much higher at Chaiyaphum Hospital (70.2%). For the sign domain, the distribution of body temperature ≥37.8 °C at Ramathibodi Hospital was similar to that at Thammasat University Hospital (18.7% vs 19.7%) but much lower than at Chaiyaphum Hospital (37.6%). A similar trend was found for rebound tenderness, i.e., 48.5% vs 42.8% vs 71.3%. Only the laboratory domain showed broadly similar distributions across the three settings, i.e., 79.6% vs 76.9% vs 82.2% for WBC >10,000 cells/mm3 and 66.2% vs 63.5% vs 75.7% for neutrophils.

Differences in the distributions of predictors could be explained by severity of disease, physical examination techniques, residents' versus staff's experience, time from onset of symptoms to hospital presentation, and the communication skills of physicians and patients. These can distort the performance of RAMA-AS when it is applied in general settings.


5.3 Using the RAMA-AS in practice

Our RAMA-AS should be very useful in acute care settings, particularly in general hospitals where resources are limited. It requires information on only seven variables, which can be collected from the interview, the physical examination, and a CBC test. Applying the RAMA-AS is easy and straightforward. First, data on these seven variables can be entered into the equation to estimate the RAMA-AS. The probability of appendicitis is then estimated for each risk stratum using the Fagan nomogram.
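The Fagan nomogram is a graphical form of Bayes' theorem on the odds scale: post-test odds equal pretest odds multiplied by the likelihood ratio. As a minimal sketch, the Python function below applies that formula to the pretest probability of 61.8% and the positive likelihood ratios reported with Figure 4.3 (1.98, 5.25, and 8.36). The function name is an assumption for illustration; the inputs are the values printed with the figure.

```python
def post_test_probability(pretest, likelihood_ratio):
    """Convert a pretest probability into a post-test probability via odds."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


pretest = 0.618  # prevalence of appendicitis in the derivation data
# (stratum, positive likelihood ratio) pairs as printed with Figure 4.3
for stratum, lr in [(2, 1.98), (3, 5.25), (4, 8.36)]:
    print(stratum, round(100 * post_test_probability(pretest, lr), 1))
```

With these inputs the formula gives approximately 76.2%, 89.5%, and 93.1%, which agrees with the post-test probabilities shown on the nomogram. In a setting with a different prevalence, the same likelihood ratios would give different post-test probabilities.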

Second, counting the number of positive signs, symptoms, and laboratory results leads directly to a risk stratum. For instance, patients are classified as low risk if they have positive items in only one of the three domains (signs, symptoms, or lab); only 1 positive item in each of the 3 domains; 2 positive items among the 3 domains (i.e., 1 symptom and 1 sign, 1 symptom and 1 lab, or 1 sign and 1 lab); 3 symptoms with 1 lab and no sign; or 3 symptoms plus the sign of rebound tenderness and no lab. The post-test probability would be 76.0%; thus, observation as an out-patient is recommended. Moderate risk requires 3 symptoms plus the sign of body temperature ≥ 37.8°C, or 3 symptoms plus two labs without any sign. The post-test probability is about 85.0% to 93.0% for moderate risk; other investigations such as ultrasound or CT scan may need to be ordered for these patients.

High risk requires 3 symptoms plus 2 signs; 3 symptoms plus one sign and one lab; 3 symptoms plus 2 signs plus any lab; or 3 symptoms plus 2 labs plus any sign. The post-test probability is about 93.0%, and thus surgical treatment should be performed for high-risk patients.

At the very least, even if the RAMA-AS ends up not being used, an important finding was that only 7 variables need attention when clinically diagnosing appendicitis. Thus, for example, there is no need to focus too much on Rovsing's sign if rebound tenderness is positive.


5.4 Strengths and limitations

Our study has some strengths. We followed the recommendations of Altman et al (98) and TRIPOD (111) on how to conduct a study of a risk prediction score. The study consisted of two phases, i.e., development and validation, in which data were prospectively collected to minimize missing data as much as possible. However, we still could not avoid missing data, which occurred in a few variables at small percentages: 10.10% for neutrophils and 10.86% for WBC. We therefore applied multi-chain imputation with 20 imputations to fill in those missing data. The predictive variables considered in the RAMA-AS were collected based on the suggestions of our previous systematic review of diagnostic scores for appendicitis (44). Thus, omission of important variables should be unlikely. The RAMA-AS was built using an appropriate statistical model and the suggestions thereof, rather than being based on expert opinion. The RAMA-AS has good performance for both calibration and discrimination in the derivation setting, although one external setting has lower discrimination performance. The RAMA-AS is easily calculated manually, risk stratification is provided, and a recommendation for patient management is suggested. However, a handheld calculator, application, or web-based RAMA-AS should be developed further to encourage general practitioners and surgeons to apply it in clinical practice.
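For readers unfamiliar with how results from the 20 imputed data sets mentioned above can be combined, the sketch below pools one coefficient using Rubin's rules. This is an illustration under an assumption: this chapter does not state the pooling formula that was used, and the function name and the numbers are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    estimates: the m point estimates (e.g. a log odds ratio per imputation)
    variances: the m squared standard errors of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average sampling variance
    between = statistics.variance(estimates)   # spread between imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical estimates from three imputations, for illustration only.
print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

The between-imputation term is what widens the confidence interval to reflect uncertainty about the missing values.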

However, some limitations could not be avoided. The study was conducted in a tertiary care setting for both the development and validation phases, where the prevalence of appendicitis was quite high. The RAMA-AS should be further validated in general hospital populations, and a modified model may be needed if calibration, discrimination, or both are not good. The clinical impact of the RAMA-AS should also be assessed further.

Because of this anomaly, there were only 9 patients in the non-appendicitis group with fever > 37.8 degrees Celsius. When compared with the appendicitis group, the odds ratio was quite high, similar to that obtained for “tenderness”, with only 1 non-tender abdomen in the appendicitis group. These OR estimates may be unreliable (wide 95% CIs) and are likely spuriously high associations. This was likely a chance phenomenon, and the development data set may be of questionable generalizability. There was actually some evidence to question the validity of the development data set: data from Thammasat University and Chaiyaphum Hospitals did not support body temperature as an important predictive variable (see models M4 & M6). Other considerations similarly point toward less importance of body temperature: unlike leukocytosis and abdominal tenderness, body temperature is not constant, and a patient with fever might have a low measured body temperature because, for example, the fever fell by the time of measurement in the ER. Also, the measurement method may fail to detect a high body temperature if it has not been standardized or is wrongly applied in some way. There was some evidence to support this possibility as well: in Table 4.1, there was less fever in the appendicitis group, while there was more high body temperature in the same group. How can this discrepancy be accounted for? Similarly, notice that no laboratory predictors (WBC or proportion of neutrophils) were in the M4 & M6 models. Either all the predictive information of the lab values was already contained in the RAMA-AS, or lab predictors were not important for Thammasat University and Chaiyaphum Hospitals. Which of these possibilities was likely? Each possibility entails a different conclusion.


Figure 5.1 Comparisons of C-statistics between RAMA-AS, ultrasound and CT scan

[ROC plot: sensitivity versus 1 - specificity. RAMA-AS ROC area: 0.8422; ultrasound ROC area: 0.5263; computed tomography ROC area: 0.9211; reference line]


Figure 5.2 Guide to choose predictive score for appendicitis according to prevalence


CHAPTER VI

CONCLUSIONS

The new clinical decision rule for diagnosis of appendicitis described in this thesis, the Ramathibodi Appendicitis Score, or RAMA-AS for short, is promising and has good calibration and discrimination performance. It is simple and easy to remember and calculate, and calculation is facilitated by a computer and web-based calculator, as shown in Figure 6.1.

Figure 6.1 Web-based calculation of appendicitis

The application software is a program written using database technology and knowledge-based technology. A database is a structured collection of data; in this study, the database holds the data analyzed with statistical methods. Statistical models for risk prediction and prognostication are common in the medical field and have several uses, including classification and risk assessment. Using knowledge from the statistical model, predictions from the data analysis are applied to decide whether to perform a diagnostic test. The C-statistic, sensitivity, specificity, accuracy, and cost-effectiveness analysis are determined in the code and checked in unit tests. These criteria are collected in a central unit used in calculation by the scoring system.
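As an illustration of the kind of calculation such software performs, the sketch below derives sensitivity, specificity, accuracy, and likelihood ratios from a 2x2 table of score result against the reference standard. It is not the code of the RAMA-AS web calculator, which is not reproduced in this thesis; the function name and the counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Summarise a 2x2 table of test result against the reference standard.

    tp, fp, fn, tn: counts of true positives, false positives,
    false negatives, and true negatives. Assumes specificity is below 1
    and above 0 so that both likelihood ratios are defined.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_positive": sensitivity / (1.0 - specificity),
        "lr_negative": (1.0 - sensitivity) / specificity,
    }


# Hypothetical counts, for illustration only.
for name, value in diagnostic_metrics(tp=80, fp=10, fn=20, tn=90).items():
    print(name, round(value, 3))
```

A unit test for such a function would typically compare each returned value with a hand-calculated result from a small table like the one above.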

The objective of this software is to support the developed scoring system for diagnosis of appendicitis with a user-friendly interface. A graphical user interface (GUI) lets users interact with the software through images, which makes them more comfortable with the system. Furthermore, the software simplifies the calculation process and allows it to be repeated at any time.

RAMA-AS can be considered a level 2 clinical decision rule, which can be used in various settings with confidence in its accuracy (95). RAMA-AS has been validated at 2 external sites that differ from one another. A systematic review of previously developed diagnostic scoring systems for appendicitis (44) was performed to make sure that all important predictors were included in the derivation process. Important predictors were present in a significant proportion of the study population. Outcome events and all predictors were clearly defined. Outcomes were assessed blinded to the presence of event. The outcome of appendicitis was determined from the pathological report in surgical cases and by telephone follow-up at 1 month. The sample size was calculated and was adequate for the number of outcome events. Finally, RAMA-AS makes clinical sense, as recommended in the methodological standards for derivation of a clinical decision rule (95).

In the external validation, patients were chosen in an unbiased fashion and represented a wide spectrum of disease. There were blinded assessments of the criterion standard for all patients. Explicit and accurate interpretation of the predictors and the rule, without knowledge of the outcome, was applied at the 2 external validation sites. Finally, 100% follow-up was achieved for these sites. The above methodological standards for validation of a clinical decision rule were applied in this thesis (95); nevertheless, RAMA-AS did not have good calibration performance at the 2 external sites. Model updating was performed to improve calibration, which resulted in good calibration and discrimination performance of the RAMA-AS.

Impact analysis is ongoing at Phukaew Hospital, Chaiyaphum. Admission forms for patients suspected of having appendicitis were designed to compare the outcomes of using clinical evaluation, RAMA-AS, and the Alvarado score, as shown in Figure 6.2.


Figure 6.2 Admission record of ongoing impact analysis

Future research

Bayesian model averaging, which could attenuate the effect of body temperature, may improve the RAMA-AS performance and may be used for model updating or derivation of a new model.

External validation and impact analysis should be done in primary and secondary care hospitals, where scoring systems will have the most impact. Impact analysis should use a cluster randomized controlled trial and a cost analysis considering direct costs (e.g., operation, investigations, imaging (ultrasound, CT, and MRI), laparoscopy, and hospital stay) as well as indirect costs, such as the savings from not having to perform investigations (C-reactive protein and erythrocyte sedimentation rate), imaging (ultrasound, CT), or operation. The cost-effectiveness and cost-utility analyses should focus on the change in medical expenses after using the RAMA-AS. The cost savings from reducing the negative appendectomy rate with the score should be analyzed, as should the costs of imaging (US, CT, and MRI) and diagnostic laparoscopy.


We hope that RAMA-AS will become a level 1 clinical decision rule, which will aid clinical judgment, change clinical behavior, and reduce unnecessary cost while maintaining quality of care and patient-doctor satisfaction. It may legally strengthen decision making in the emergency room and help avoid malpractice liability.


REFERENCES

1. Bhangu A, Soreide K, Di Saverio S, Assarsson JH, Drake FT. Acute appendicitis: modern understanding of pathogenesis, diagnosis, and management. Lancet. 2015;386(10000):1278-87.

2. Carr NJ. The pathology of acute appendicitis. Ann Diag Pathol. 2000;4(1):46-58.

3. Petroianu A, Alberti LR, Zac RI. Fecal loading in the cecum as a new radiological sign of acute appendicitis. World J Gastroenterol. 2005;11(27):4230-2.

4. Petroianu A, Alberti LR, Zac RI. Assessment of the persistence of fecal loading in the cecum in presence of acute appendicitis. Int J Surg. 2007;5(1):11-6.

5. Horton LW. Pathogenesis of acute appendicitis. Brit Med J. 1977;2(6103):1672-3.

6. Kong VY, Sartorius B, Clarke DL. Acute appendicitis in the developing world is a morbid disease. Ann R Coll Surg Engl. 2015;97(5):390-5.

7. Bohrod MG. The pathogenesis of acute appendicitis. Amer J Clin Pathol. 1946;16(12):752-60.

8. D'Souza N, Nugent K. Appendicitis. Amer Fam Physician. 2016;93(2):142-3.

9. Teixeira PG, Demetriades D. Appendicitis: changing perspectives. Adv Surg. 2013;47:119-40.

10. de Castro SM, Unlu C, Steller EP, van Wagensveld BA, Vrouenraets BC. Evaluation of the appendicitis inflammatory response score for patients with acute appendicitis. World J Surg. 2012;36(7):1540-5.

11. Abbas PI, Zamora IJ, Elder SC, Brandt ML, Lopez ME, Orth RC, et al. How Long Does it Take to Diagnose Appendicitis? Time Point Process Mapping in the Emergency Department. Pediatr Emerg Care. 2016.

12. Petroianu A. Diagnosis of acute appendicitis. Int J Surg. 2012;10(3):115-9.

13. Di Sebastiano P, Fink T, di Mola FF, Weihe E, Innocenti P, Friess H, et al. Neuroimmune appendicitis. Lancet. 1999;354(9177):461-6.

14. Rubèr M, Universitetet i L. Immunopathogenic aspects of resolving and progressing appendicitis. Linköping: Linköping University; 2012.


15. Lee M, Paavana T, Mazari F, Wilson TR M. The Morbidity of Negative Appendectomy. Ann R Coll Surg Engl. 2014;96(7):517-20.

16. Langenscheidt P, Lang C, Puschel W, Feifel G. High rates of appendicectomy in a developing country: an attempt to contribute to a more rational use of surgical resources. Eur J Surg. 1999;165(3):248-52.

17. Lewis FR, Holcroft JW, Boey J, Dunphy E. Appendicitis. A critical review of diagnosis and treatment in 1,000 cases. Arch Surg. 1975;110(5):677-84.

18. Alvarado A. How to improve the clinical diagnosis of acute appendicitis in resource limited settings. World J Emerg Surg. 2016;11:16.

19. Shogilev DJ, Duus N, Odom SR, Shapiro NI. Diagnosing appendicitis: evidence-based review of the diagnostic approach in 2014. West J Emerg Med. 2014;15(7):859-71.

20. Guite KM, Hinshaw JL, Ranallo FN, Lindstrom MJ, Lee FT, Jr. Ionizing radiation in abdominal CT: unindicated multiphase scans are an important source of medically unnecessary exposure. J Amer Coll Radiol. 2011;8(11):756-61.

21. Hsieh CH, Lu RH, Lee NH, Chiu WT, Hsu MH, Li YC. Novel solutions for an old disease: diagnosis of acute appendicitis with random forest, support vector machines, and artificial neural networks. Surg. 2011;149(1):87-93.

22. Rao PM, Rhea JT, Rattner DW, Venus LG, Novelline RA. Introduction of appendiceal CT: impact on negative appendectomy and appendiceal perforation rates. Ann Surg. 1999;229(3):344-9.

23. Flum DR, Morris A, Koepsell T, Dellinger EP. Has misdiagnosis of appendicitis decreased over time? A population-based analysis. J Amer Med Assoc. 2001;286(14):1748-53.

24. Huynh V, Lalezarzadeh F, Lawandy S, Wong DT, Joe VC. Abdominal computed tomography in the evaluation of acute and perforated appendicitis in the community setting. Amer Surg. 2007;73(10):1002-5.

25. Lee SL, Walsh AJ, Ho HS. Computed tomography and ultrasonography do not improve and may delay the diagnosis and treatment of acute appendicitis. Arch Surg. 2001;136(5):556-62.


26. Vadeboncoeur TF, Heister RR, Behling CA, Guss DA. Impact of helical computed tomography on the rate of negative appendicitis. Amer J Emerg Med. 2006;24(1):43-7.

27. Ebell MH. Diagnosis of appendicitis: part II. Laboratory and imaging tests. Amer Fam Physician. 2008;77(8):1153-5.

28. Bachur RG, Hennelly K, Callahan MJ, Monuteaux MC. Advanced radiologic imaging for pediatric appendicitis, 2005-2009: trends and outcomes. J Pediatr. 2012;160(6):1034-8.

29. Petrosyan M, Estrada J, Chan S, Somers S, Yacoub WN, Kelso RL, et al. CT scan in patients with suspected appendicitis: clinical implications for the acute care surgeon. Eur Surg Res. 2008;40(2):211-9.

30. McKay R, Shepherd J. The use of the clinical scoring system by Alvarado in the decision to perform computed tomography for acute appendicitis in the ED. Amer J Emerg Med. 2007;25(5):489-93.

31. Andersson RE, Hugander A, Ravn H, Offenbartl K, Ghazi SH, Nystrom PO, et al. Repeated clinical and laboratory examinations in patients with an equivocal diagnosis of appendicitis. World J Surg. 2000;24(4):479-85.

32. Seetahal SA, Bolorunduro OB, Sookdeo TC, Oyetunji TA, Greene WR, Frederick W, et al. Negative appendectomy: a 10-year review of a nationally representative sample. Amer J Surg. 2011;201(4):433-7.

33. Spiegel DA, Gosselin RA. Surgical services in low-income and middle-income countries. Lancet. 2007;370(9592):1013-5.

34. Andersson RE. Meta-analysis of the clinical and laboratory diagnosis of appendicitis. Br J Surg. 2004;91(1):28-37.

35. Kong V, Aldous C, Handley J, Clarke D. The cost effectiveness of early management of acute appendicitis underlies the importance of curative surgical services to a primary healthcare programme. Ann R Coll Surg Engl. 2013;95(4):280-4.

36. Ohle R, O'Reilly F, O'Brien KK, Fahey T, Dimitrov BD. The Alvarado score for predicting acute appendicitis: a systematic review. BMC Med. 2011;9:139.


37. Fenyo G, Lindberg G, Blind P, Enochsson L, Oberg A. Diagnostic decision support in suspected acute appendicitis: validation of a simplified scoring system. Eur J Surg. 1997;163(11):831-8.

38. Eskelinen M, Ikonen J, Lipponen P. A computer-based diagnostic score to aid in diagnosis of acute appendicitis. A prospective study of 1333 patients with acute abdominal pain. Theor Surg. 1992;7(2):86-90.

39. Fenyo G. Routine use of a scoring system for decision-making in suspected acute appendicitis in adults. Acta Chir Scand. 1987;153(9):545-51.

40. Ohmann C, Yang Q, Franke C. Diagnostic scores for acute appendicitis. Abdominal Pain Study Group. Eur J Surg. 1995;161(4):273-81.

41. Al Qahtani HH, Muhammad AA. Alvarado score as an admission criterion for suspected appendicitis in adults. Saudi J Gastroenterol. 2004;10(2):86-91.

42. Gregory S, Kuntz K, Sainfort F, Kharbanda A. Cost-Effectiveness of Integrating a Clinical Decision Rule and Staged Imaging Protocol for Diagnosis of Appendicitis. Value Health. 2016;19(1):28-35.

43. Kirkil C, Karabulut K, Aygen E, Ilhan YS, Yur M, Binnetoglu K, et al. Appendicitis scores may be useful in reducing the costs of treatment for right lower quadrant pain. Ulus Travma Acil Cerrahi Derg. 2013;19(1):13-9.

44. Wilasrusmee C, Anothaisintawee T, Poprom N, McEvoy M, Attia J, Thakkinstian A. Diagnostic Scores for Appendicitis: A Systematic Review of Scores’ Performance. Br J Med Med Res. 2014;4(2):11-20.

45. Alvarado A. A practical score for the early diagnosis of acute appendicitis. Ann Emerg Med. 1986;15(5):557-64.

46. Kalan M, Talbot D, Cunliffe WJ, Rich AJ. Evaluation of the modified Alvarado score in the diagnosis of acute appendicitis: a prospective study. Ann R Coll Surg Engl. 1994;76(6):418-9.

47. Khan I, ur Rehman A. Application of alvarado scoring system in diagnosis of acute appendicitis. J Ayub Med Coll Abbottabad. 2005;17(3):41-4.

48. Al-Hashemy AM, Seleem MI. Appraisal of the modified Alvarado Score for acute appendicits in adults. Saudi Med J. 2004;25(9):1229-31.


49. Chong CF, Thien A, Mackie AJ, Tin AS, Tripathi S, Ahmad MA, et al. Comparison of RIPASA and Alvarado scores for the diagnosis of acute appendicitis. Singapore Med J. 2011;52(5):340-5.

50. Andersson M, Andersson RE. The appendicitis inflammatory response score: a tool for the diagnosis of acute appendicitis that outperforms the Alvarado score. World J Surg. 2008;32(8):1843-9.

51. Ohmann C, Franke C, Yang Q. Clinical benefit of a diagnostic score for appendicitis: results of a prospective interventional study. German Study Group of Acute Abdominal Pain. Arch Surg. 1999;134(9):993-6.

52. Tepel J, Sommerfeld A, Klomp HJ, Kapischke M, Eggert A, Kremer B. Prospective evaluation of diagnostic modalities in suspected acute appendicitis. Langenbeck's Arch Surg. 2004;389(3):219-24.

53. Sitter H, Hoffmann S, Hassan I, Zielke A. Diagnostic score in appendicitis. Validation of a diagnostic score (Eskelinen score) in patients in whom acute appendicitis is suspected. Langenbeck's Arch Surg. 2004;389(3):213-8.

54. Christian F, Christian GP. A simple scoring system to reduce the negative appendicectomy rate. Ann R Coll Surg Engl. 1992;74(4):281-5.

55. Ramirez JM, Deus J. Practical score to aid decision making in doubtful cases of appendicitis. Br J Surg. 1994;81(5):680-3.

56. Teicher I, Landa B, Cohen M, Kabnick LS, Wise L. Scoring system to aid in diagnoses of appendicitis. Ann Surg. 1983;198(6):753-9.

57. Burcharth J, Pommergaard HC, Rosenberg J, Gogenur I. Hyperbilirubinemia as a predictor for appendiceal perforation: a systematic review. Scand J Surg. 2013;102(2):55-60.

58. Redmond JM, Smith GW, Wilasrusmee C, Kittur DS. A new perspective in appendicitis: calculation of half time (T(1/2)) for perforation. Amer Surg. 2002;68(7):593-7.

59. Pesonen E, Eskelinen M, Juhola M. Comparison of different neural network algorithms in the diagnosis of acute appendicitis. Int J Biomed Comput. 1996;40(3):227-33.

60. P K, urengan. Appendicectomy: to do or not. Int J Res Med Sci. 2015;3(3):670-4.


61. Addiss DG, Shaffer N, Fowler BS, Tauxe RV. The epidemiology of appendicitis and appendectomy in the United States. Amer J Epidemiol. 1990;132(5):910-25.

62. Horntrich J, Schneider W. [Appendicitis from an epidemiological viewpoint]. Zentralblatt fur Chirurgie. 1990;115(23):1521-9.

63. Temple CL, Huchcroft SA, Temple WJ. The natural history of appendicitis in adults. A prospective study. Ann Surg. 1995;221(3):278-81.

64. Chong CF, Adi MIW, Thien A, Suyoi A, Mackie AJ, Tin AS, et al. Development of the RIPASA score: A new appendicitis scoring system for the diagnosis of acute appendicitis. Singapore Med J. 2010;51(3):220-5.

65. de Castro SM, Unlu C, Steller EP, van Wagensveld BA, Vrouenraets BC. Evaluation of the Appendicitis Inflammatory Response Score for Patients with Acute Appendicitis. World J Surg. 2012.

66. Denizbasi A, Unluer EE. The role of the emergency medicine resident using the Alvarado score in the diagnosis of acute appendicitis compared with the general surgery resident. Eur J Trauma Emerg Surg. 2003;10(4):296-301.

67. Enochsson L, Gudbjartsson T, Hellberg A, Rudberg C, Wenner J, Ringqvist I, et al. The Fenyo-Lindberg scoring system for appendicitis increases positive predictive value in fertile women--a prospective study in 455 patients randomized to either laparoscopic or open appendectomy. Surg Endosc. 2004;18(10):1509-13.

68. Eskelinen M, Ikonen J, Lipponen P. A computer-based diagnostic score to aid in diagnosis of acute appendicitis. A prospective study of 1333 patients with acute abdominal pain. Theor Surg. 1992;7(2):86-90.

69. Galindo Gallego M, Fadrique B, Nieto MA, Calleja S, Fernandez-Acenero MJ, Ais G, et al. Evaluation of ultrasonography and clinical diagnostic scoring in suspected appendicitis. Br J Surg. 1998;85(1):37-40.

70. Inci E, Hocaoglu E, Aydin S, Palabiyik F, Cimilli T, Turhan AN, et al. Efficiency of unenhanced MRI in the diagnosis of acute appendicitis: Comparison with Alvarado scoring system and histopathological results. Eur J Radiol. 2011;80(2):253-8.


71. Kanumba ES, Mabula JB, Rambau P, Chalya PL. Modified Alvarado Scoring System as a diagnostic tool for Acute Appendicitis at Bugando Medical Centre, Mwanza, Tanzania. BMC Surg. 2011;11.

72. Konan A, Hayran M, Kilic YA, Karakoc D, Kaynaroglu V. Scoring systems in the diagnosis of acute appendicitis in the elderly. Ulus Travma Acil Cerrahi Derg. 2011;17(5):396-400.

73. Kurane SB, Sangolli MS, Gogate AS. A one year prospective study to compare and evaluate diagnostic accuracy of modified Alvarado score and ultrasonography in acute appendicitis, in adults. Indian J Surg. 2008;70(3):125-9.

74. Lamparelli MJ, Hoque HM, Pogson CJ, Ball AB. A prospective evaluation of the combined use of the modified Alvarado score with selective laparoscopy in adult females in the management of suspected appendicitis. Ann R Coll Surg Engl. 2000;82(3):192-5.

75. Limpawattanasiri C. Alvarado score for the acute appendicitis in a provincial hospital. J Med Assoc Thai. 2011;94(4):441-9.

76. Lintula H, Pesonen E, Kokki H, Vanamo K, Eskelinen M. A diagnostic score for children with suspected appendicitis. Langenbeck's Arch Surg. 2005;390(2):164-70.

77. Malik AH, Wani RA, Saima BD, Wani MY. Small lateral access--an alternative approach to appendicitis in paediatric patients: a randomised controlled trial. Int J Surg. 2007;5(4):234-8.

78. Pouget-Baudry Y, Mucci S, Eyssartier E, Guesdon-Portes A, Lada P, Casa C, et al. The use of the Alvarado score in the management of right lower quadrant abdominal pain in the adult. J Visc Surg. 2010;147(2):e40-4.

79. Pruekprasert P, Geater A, Ksuntigij P, Maipang T, Apakupakul N. Accuracy in diagnosis of acute appendicitis by comparing serum C-reactive protein measurements, Alvarado score and clinical impression of surgeons. J Med Assoc Thai. 2004;87(3):296-303.

80. Sun JS, Noh HW, Min YG, Lee JH, Kim JK, Park KJ, et al. Receiver operating characteristic analysis of the diagnostic performance of a computed tomographic examination and the Alvarado score for diagnosing acute appendicitis: emphasis on age and sex of the patients. J Comput Tomogr. 2008;32(3):386-91.

81.Talukder D.B. SA. Modified Alvarado Scoring System in the Diagnosis of Acute

Appendicitis. JAFMC Bangladesh. 2009;5(1):18-20.

82.Teicher I, Landa B, Cohen M. Scoring system to aid in diagnoses of appendicitis.

Ann Surg. 1983;198(6):753-9.

83.Tzanakis NE, Efstathiou SP, Danulidis K, Rallis GE, Tsioulos DI, Chatzivasiliou A,

et al. A new approach to accurate diagnosis of acute appendicitis. World J

Surg. 2005;29(9):1151-6, discussion 7.

84.Van Way CW, 3rd, Murphy JR, Dunn EL, Elerding SC. A feasibility study of

computer aided diagnosis in appendicitis. Surgery, gynecology & obstetrics.

1982;155(5):685-8.

85.Yoldas O, Karaca T, Tez M. External validation of Lintula score in Turkish acute

appendicitis patients. International journal of surgery. 2012;10(1):25-7.

86.Malik AH, Wani RA, Saima BD, Wani MY. Small lateral access-an alternative

approach to appendicitis in paediatric patients: A randomised controlled

trial. Int J Surg. 2007;5(4):234-8.

87. Kong VY, van der Linde S, Aldous C, Handley JJ, Clarke DL. The accuracy of the Alvarado score in predicting acute appendicitis in the black South African

population needs to be validated. Can J Surg. 2014;57(4):E121-5. 88.Inci E, Hocaoglu E, Aydin S, Palabiyik F, Cimilli T, Turhan AN, et al. Efficiency of

unenhanced MRI in the diagnosis of acute appendicitis: comparison with

Alvarado scoring system and histopathological results. Eur J Radiol.

2011;80(2):253-8.

89.Kanumba ES, Mabula JB, Rambau P, Chalya PL. Modified Alvarado Scoring

System as a diagnostic tool for acute appendicitis at Bugando Medical

Centre, Mwanza, Tanzania. BMC Surg. 2011;11:4.

90.Konan A, Hayran M, Kilic YA, Karakoc D, Kaynaroglu V. Scoring systems in the

diagnosis of acute appendicitis in the elderly. Ulus Travma Acil Cerrahi

Derg. 2011;17(5):396-400.

91.Kurane SB, Sangolli MS, Gogate AS. A one year prospective study to compare and

evaluate diagnostic accuracy of modified Alvarado score and


ultrasonography in acute appendicitis, in adults. Indian J Surg.

2008;70(3):125-9.

92.Talukder DB, Siddiq AZ. Modified Alvarado Scoring System in the Diagnosis of

Acute Appendicitis. JAFMC Bangladesh. 2009;5(1):3.

93.Yoldas O, Karaca T, Tez M. External validation of Lintula score in Turkish acute

appendicitis patients. Inter J Surg. 2012;10(1):25-7.

94.Lintula H, Kokki H, Pulkkinen J, Kettunen R, Grohn O, Eskelinen M. Diagnostic

score in acute appendicitis. Validation of a diagnostic score (Lintula score)

for adults with suspected appendicitis. Langenbecks Arch Surg.

2010;395(5):495-500.

95.McGinn TG, Guyatt GH, Wyer PC, Naylor CD, Stiell IG, Richardson WS. Users'

guides to the medical literature: XXII: how to use articles about clinical

decision rules. Evidence-Based Medicine Working Group. J Amer Med

Assoc. 2000;284(1):79-84.

96.Pruekprasert P, Maipang T, Geater A, Apakupakul N, Ksuntigij P. Accuracy in

diagnosis of acute appendicitis by comparing serum C-reactive protein

measurements, Alvarado score and clinical impression of surgeons. J Med

Assoc Thai. 2004;87(3):296-303.

97.Normand SL. Meta-analysis: formulating, evaluating, combining, and reporting. Stat

Med. 1999;18(3):321-59.

98.Altman DG, Royston P. What do we mean by validating a prognostic model? Stat

Med. 2000;19(4):453-73.

99.Harrell FE, Jr., Lee KL, Mark DB. Multivariable prognostic models: issues in

developing models, evaluating assumptions and adequacy, and measuring

and reducing errors. Stat Med. 1996;15(4):361-87.

100. Gordon G, Rennie D, Meade MO, Cook DJ. Users' Guides to the medical literature:

A Menual for evidence-based clinical practice. 2, editor. Chicago: McGraw

Hill; 2008.

101.Mallett S, Royston P, Dutton S, Waters R, Altman DG. Reporting methods in

studies developing prognostic models in cancer: a review. BMC Med.

2010;8:20.


102.Moons KG, Altman DG, Reitsma JB, Ioannidis JP, Macaskill P, Steyerberg EW, et

al. Transparent Reporting of a multivariable prediction model for Individual

Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Inter

Med. 2015;162(1):W1-73.

103.Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview

and some applications. Stat Med. 1991;10(4):585-98.

104.White IR, Royston P, Wood AM. Multiple imputation using chained equations:

Issues and guidance for practice. Stat Med. 2011;30(4):377-99.

105.van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood

pressure covariates in survival analysis. Stat Med. 1999;18(6):681-94.

106.McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating

characteristic (ROC) curves. Med Decis Making. 1984;4(2):137-50.

107.Hajian-Tilaki K. Receiver Operating Characteristic (ROC) Curve Analysis for

Medical Diagnostic Test Evaluation. Caspian J Int Med. 2013;4(2):627-35.

108.Ehsanullah J, Ahmad U, Solanki K, Healy J, Kadoglou N. The surgical admissions

proforma: Does it make a difference? Ann Med Surg. 2015;4(1):53-7.

109.Janssen KJ, Vergouwe Y, Kalkman CJ, Grobbee DE, Moons KG. A simple method

to adjust clinical prediction models to local circumstances. Can J Anaesth.

2009;56(3):194-201.

110.Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of

clinical prediction rules: a review. J Clin Epidemiol. 2008;61(11):1085-94.

111.Moons KG, Altman DG, Reitsma JB, Collins GS. New Guideline for the Reporting

of Studies Developing, Validating, or Updating a Multivariable Clinical

Prediction Model: The TRIPOD Statement. Adv Anat Pathol.

2015;22(5):303-5.

112.Steyerberg EW, Moons KG, van der Windt DA, Hayden JA, Perel P, Schroter S, et

al. Prognosis Research Strategy (PROGRESS) 3: prognostic model research.

PLoS Med. 2013;10(2):e1001381.

113.Watters JM. The appendicitis inflammatory response score: a tool for the diagnosis

of appendicitis that outperforms the Alvarado score. World J Surg.

2008;32(8):1850.


APPENDICES


APPENDIX A

CASE RECORD FORMS





APPENDIX B

IMPUTATION

Table 1 Report of the number of missing data

Variables     Missing (%)   Observed   Imputed   FMI       RVI
WBC           10.86         353        43        <0.0001   <0.0001
Neutrophil    10.10         356        40        <0.0001   <0.0001
Lymphocyte    10.10         356        40        <0.0001   <0.0001

FMI = largest fraction of missing information of the coefficients; RVI = average relative increase in the variances of the estimates


Figure 1 Diagnostic plots comparing imputed and observed values (A: WBC; B: Neutrophils)


APPENDIX C

STATA COMMANDS

Imputations

***Pattern

misstable sum outcome_2 age sex symptom1 symptom3 symptom4 symptom5 symptom6 nv ///
movecoug bowel fever gr_bt tender rebten guarding labwbc labneu lablym

misstable pattern labwbc labneu lablym

***Pattern: monotone vs arbitrary

mi misstable nested labwbc labneu lablym

***Output:
***1. labneu(40) <-> lablym(40)
***2. labwbc(43)

***Whenever labneu is missing, lablym is also missing, but these missing values do not
***overlap with those of labwbc.

***Thus the missing pattern is monotone for labneu and lablym and arbitrary for labwbc.

mi set mlong

mi register imputed labwbc labneu lablym

mi register regular outcome_2 age sex symptom1 symptom3 symptom4 symptom5 ///
symptom6 nv movecoug bowel fever gr_bt tender rebten guarding

foreach var of varlist symptom3 symptom5 symptom6 rebten movecoug fever tender guarding {

recode `var' 2=0

tab gr_bt

}

mi impute chained (truncreg, ll(50) ul(80000)) labwbc ///
(truncreg, ll(1) ul(100)) labneu lablym = outcome_2 age ib(2).sex gr_bt tender rebten ///
ib(3).symptom1 symptom3 symptom5 symptom6 movecoug guarding nv i.bowel, add(20) force ///
rseed(15690)

***mi estimate

mi estimate, saving(m1, replace) dots: logit outcome_2 i.symptom3 i.symptom5 ///
i.movecoug i.rebten i.gr_bt i.labwbc_2 i.labneu_2

mi predict xb using m1, xb

mi passive: gen prob=(exp(xb))/(1+(exp(xb)))

sum xb prob

***Exploring mi performances

a) Estimate the relative variance increase (RVI) and the fraction of missing information (FMI)

The variability of MI estimates comes from two sources, i.e., within and between imputations. Thus, the precision of MI estimates depends not only on the number of subjects but also on the number of imputations.

The RVI refers to the average relative increase in the variances of the estimates because of missing WBC and neutrophil values. Here the RVI, averaged over all coefficients, was 0.0000. This value is close to 0, so the missing data have little effect on the estimates.

The FMI refers to the largest fraction of missing information among the coefficient estimates due to missing data. The FMI was used to get an idea of the number of imputations (M) required, based on the rule of thumb M >= FMI x 100. For example, an FMI of .212853 gives .212853 x 100 = 21.3, so M should be at least about 21.
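The two quantities described above can be illustrated with a minimal Python sketch (Python is used here only for illustration; it is not part of the Stata analysis). It pools one coefficient with Rubin's rules and applies the M >= FMI x 100 rule of thumb. The five estimates and variances are hypothetical, and the function returns the simple large-sample fraction of missing information, which is an assumption and may differ slightly from the degrees-of-freedom-adjusted FMI reported by Stata.

```python
import math
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool one coefficient across M imputed data sets with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    rvi = (1 + 1 / m) * b / w      # relative variance increase
    lam = (1 + 1 / m) * b / t      # large-sample fraction of missing information
    return q_bar, t, rvi, lam

def imputations_needed(fmi):
    """Rule of thumb M >= FMI x 100, rounded up to a whole number of imputations."""
    return math.ceil(fmi * 100)

# Hypothetical coefficient estimates and squared standard errors from M = 5 imputations
est = [0.80, 0.82, 0.79, 0.81, 0.78]
var = [0.040, 0.041, 0.039, 0.040, 0.040]
q_bar, t, rvi, lam = rubin_pool(est, var)
print(round(q_bar, 3), round(rvi, 4), round(lam, 4))
print(imputations_needed(0.212853))
```

With strict rounding up, an FMI of .212853 corresponds to 22 imputations, which is in line with the "about 21" obtained from the rule of thumb.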

mi estimate, vartable nocitable

mi estimate, dftable

b) Diagnostic plots, comparing the distribution of the imputed values with that of the observed values

midiagplots labwbc, sample(all) combine ksmirnov

Score development

tab variables appendic_status,col chi2 exp exact

gen score = -3.374991 + .7978508*symptom3 + 1.042774*symptom5 + .7787298*movecoug + ///
1.529419*rebten + 1.636311*fever2 + ///
.9095179*labwbc2 + .6898424*labneu2
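As an illustration of the score equation above, the following Python sketch computes the linear score and the corresponding predicted probability for one patient. The coefficients are copied from the gen score command; the 0/1 coding of each predictor and the function names are assumptions made for this example.

```python
import math

INTERCEPT = -3.374991
COEFS = {
    "symptom3": 0.7978508,
    "symptom5": 1.042774,
    "movecoug": 0.7787298,
    "rebten": 1.529419,
    "fever2": 1.636311,
    "labwbc2": 0.9095179,
    "labneu2": 0.6898424,
}

def rama_score(patient):
    """Linear predictor; predictors absent from the dict are treated as 0."""
    return INTERCEPT + sum(beta * patient.get(name, 0) for name, beta in COEFS.items())

def probability(score):
    """Logistic transformation, equivalent to exp(xb) / (1 + exp(xb))."""
    return 1 / (1 + math.exp(-score))

lowest = rama_score({})                            # no predictor present
highest = rama_score({name: 1 for name in COEFS})  # all seven predictors present
print(round(lowest, 4), round(highest, 4))
print(round(probability(lowest), 3), round(probability(highest), 3))
```

The two extremes agree with the smallest and largest cutpoints of the score listing (about -3.37 and 4.009).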

Cutpoint Sensitivity Specificity Correctly classified LR+ LR-

( >= -3.37.. ) 100.00% 0.00% 60.86% 1.0000

( >= -2.68.. ) 100.00% 3.87% 62.37% 1.0403 0.0000

( >= -2.57.. ) 100.00% 5.16% 62.88% 1.0544 0.0000

( >= -2.46.. ) 100.00% 6.45% 63.38% 1.0690 0.0000

( >= -2.33.. ) 99.59% 8.39% 63.89% 1.0870 0.0495

( >= -1.90.. ) 99.59% 12.90% 65.66% 1.1434 0.0322

( >= -1.88.. ) 99.59% 13.55% 65.91% 1.1519 0.0306

( >= -1.84.. ) 99.59% 14.19% 66.16% 1.1606 0.0292

( >= -1.77.. ) 99.59% 14.84% 66.41% 1.1694 0.0280

( >= -1.68.. ) 99.59% 18.06% 67.68% 1.2154 0.0230

( >= -1.66.. ) 99.17% 18.71% 67.68% 1.2199 0.0444

( >= -1.64.. ) 99.17% 19.35% 67.93% 1.2297 0.0429

( >= -1.55.. ) 99.17% 20.00% 68.18% 1.2396 0.0415

( >= -1.53.. ) 97.51% 27.74% 70.20% 1.3495 0.0897

( >= -1.42.. ) 97.51% 28.39% 70.45% 1.3616 0.0877

( >= -1.15.. ) 97.10% 34.19% 72.47% 1.4755 0.0849

( >= -1.06.. ) 97.10% 34.84% 72.73% 1.4901 0.0834

( >= -1.04.. ) 97.10% 35.48% 72.98% 1.5050 0.0819

( >= -.996.. ) 96.68% 35.48% 72.73% 1.4985 0.0935

( >= -.97778 ) 95.85% 38.06% 73.23% 1.5476 0.1090

( >= -.959.. ) 95.44% 39.35% 73.48% 1.5737 0.1160

( >= -.936.. ) 95.44% 40.00% 73.74% 1.5906 0.1141

( >= -.863.. ) 95.44% 40.65% 73.99% 1.6079 0.1123

( >= -.844.. ) 94.61% 40.65% 73.48% 1.5939 0.1327

( >= -.829.. ) 94.19% 40.65% 73.23% 1.5869 0.1429

( >= -.802.. ) 93.78% 40.65% 72.98% 1.5799 0.1531

( >= -.755.. ) 93.36% 41.94% 73.23% 1.6079 0.1583

( >= -.732.. ) 92.12% 46.45% 74.24% 1.7202 0.1697

( >= -.643.. ) 90.87% 50.32% 75.00% 1.8292 0.1814

( >= -.624.. ) 89.63% 54.84% 76.01% 1.9846 0.1892

( >= -.377.. ) 88.80% 55.48% 75.76% 1.9947 0.2019

( >= -.246.. ) 88.38% 55.48% 75.51% 1.9854 0.2094

( >= -.209.. ) 87.55% 56.13% 75.25% 1.9957 0.2218

( >= -.199.. ) 87.55% 56.77% 75.51% 2.0255 0.2193


( >= -.157.. ) 87.14% 57.42% 75.51% 2.0464 0.2240

( >= -.139.. ) 87.14% 58.06% 75.76% 2.0779 0.2215

( >= -.065.. ) 86.72% 59.35% 76.01% 2.1336 0.2237

( >= -.024.. ) 86.72% 60.65% 76.52% 2.2036 0.2189

( >= .0458.. ) 86.72% 61.29% 76.77% 2.2403 0.2166

( >= .0649.. ) 80.08% 69.03% 75.76% 2.5860 0.2885

( >= .1067.. ) 78.84% 71.61% 76.01% 2.7773 0.2955

( >= .1538.. ) 78.42% 72.90% 76.26% 2.8942 0.2960

( >= .5325.. ) 77.18% 75.48% 76.52% 3.1481 0.3023

( >= .5516.. ) 76.76% 76.13% 76.52% 3.2158 0.3052

( >= .6394.. ) 76.35% 76.13% 76.26% 3.1984 0.3107

( >= .6657.. ) 76.35% 76.77% 76.52% 3.2872 0.3081

( >= .6848.. ) 75.93% 78.06% 76.77% 3.4617 0.3083

( >= .7737.. ) 74.69% 78.71% 76.26% 3.5081 0.3216

( >= .7965.. ) 73.03% 80.00% 75.76% 3.6515 0.3371

( >= .8437.. ) 71.37% 81.29% 75.25% 3.8146 0.3522

( >= .8854.. ) 64.32% 87.74% 73.48% 5.2468 0.4067

( >= .9045.. ) 60.17% 90.32% 71.97% 6.2171 0.4410

( >= .9923.. ) 59.34% 90.97% 71.72% 6.5693 0.4470

( >= 1.011.. ) 58.92% 90.97% 71.46% 6.5234 0.4516

( >= 1.330.. ) 58.51% 90.97% 71.21% 6.4775 0.4561

( >= 1.390.. ) 57.68% 90.97% 70.71% 6.3856 0.4653

( >= 1.463.. ) 57.26% 90.97% 70.45% 6.3397 0.4698

( >= 1.478.. ) 56.43% 90.97% 69.95% 6.2478 0.4789

( >= 1.570.. ) 56.02% 90.97% 69.70% 6.2018 0.4835

( >= 1.575.. ) 55.60% 90.97% 69.44% 6.1559 0.4881

( >= 1.594.. ) 45.64% 93.55% 64.39% 7.0747 0.5811

( >= 1.682.. ) 44.40% 93.55% 63.64% 6.8817 0.5944

( >= 1.6833 ) 42.32% 94.19% 62.63% 7.2891 0.6123

( >= 1.701.. ) 38.59% 94.84% 60.61% 7.4767 0.6475

( >= 1.74303 ) 37.76% 95.48% 60.35% 8.3610 0.6518

( >= 1.790.. ) 37.34% 95.48% 60.10% 8.2691 0.6562

( >= 2.168.. ) 36.93% 95.48% 59.85% 8.1772 0.6605

( >= 2.302.. ) 36.93% 96.13% 60.10% 9.5401 0.6561

( >= 2.373.. ) 36.10% 96.13% 59.60% 9.3257 0.6647

( >= 2.432.. ) 19.50% 99.35% 50.76% 30.2284 0.8102

( >= 2.480.. ) 19.09% 99.35% 50.51% 29.5853 0.8144

( >= 2.540.. ) 15.35% 99.35% 48.23% 23.7968 0.8520

( >= 3.211.. ) 14.94% 99.35% 47.98% 23.1537 0.8561

( >= 3.230.. ) 7.05% 100.00% 43.43% 0.9295

( >= 3.319.. ) 5.81% 100.00% 42.68% 0.9419

( >= 4.009.. ) 5.39% 100.00% 42.42% 0.9461

( > 4.009.. ) 0.00% 100.00% 39.14% 1.0000
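Each row of the listing above follows from a 2 x 2 table at that cutpoint. The Python sketch below shows the arithmetic for one row. The cell counts are an assumption: they were back-calculated from the percentages printed for the cutpoint >= 1.790 (396 patients, of whom 60.86% had appendicitis) and do not appear in the source.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Sensitivity, specificity, proportion correctly classified, LR+ and LR-."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    correct = (tp + tn) / (tp + fn + fp + tn)
    lr_pos = sens / (1 - spec)      # positive likelihood ratio
    lr_neg = (1 - sens) / spec      # negative likelihood ratio
    return sens, spec, correct, lr_pos, lr_neg

# Back-calculated (assumed) counts at the cutpoint >= 1.790
sens, spec, correct, lr_pos, lr_neg = diagnostic_summary(tp=90, fn=151, fp=7, tn=148)
print(f"{sens:.2%} {spec:.2%} {correct:.2%} {lr_pos:.4f} {lr_neg:.4f}")
```

Under these assumed counts the result agrees with the printed row (37.34%, 95.48%, 60.10%, 8.2691, 0.6562), the same likelihood ratios that are passed to the Fagan nomogram.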

Score performance

mi convert wide

roctab outcome_2 xb

fagani 0.618 8.2691 0.6562


use fagan.dta, clear

fagan lrp lrn, grpvar(test)

fagan lrp lrn, grpvar(test) pr(0.5)

fagan lrp lrn, grpvar(test) pr(0.5) scheme(s2color)
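The Fagan nomogram converts a pre-test probability into a post-test probability through the odds. A minimal Python sketch of that calculation follows, using the three numbers passed to fagani above (pre-test probability 0.618, LR+ 8.2691, LR- 0.6562); the function name is hypothetical.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds x likelihood ratio."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

post_pos = post_test_probability(0.618, 8.2691)   # after a positive result
post_neg = post_test_probability(0.618, 0.6562)   # after a negative result
print(round(post_pos, 3), round(post_neg, 3))
```

With these inputs the post-test probability is roughly 0.93 after a positive result and roughly 0.51 after a negative result.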

Internal validation

Bootstrap

set more off

cd c:\data\

mi estimate, saving(ramaap, replace) dots: logit outcome_2 i.symptom3 i.symptom5 ///
i.movecoug i.rebten i.gr_bt i.labwbc_2 i.labneu_2

mi predictnl pr2_mi = predict(pr) using ramaap

mi xeq 0: summarize pr2_mi

mi predict xb_mi using ramaap, xb

***estimate original D

somersd outcome_2 pr2_mi

***estimate original roc

roctab outcome_2 xb_mi
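For a binary outcome, Somers' D and the area under the ROC curve are linked by D = 2 x AUC - 1. The Python sketch below computes the AUC by comparing every case with every non-case and then converts it to D. The five observations are hypothetical; the last line checks the relation against the original values used later in this appendix (ROC area 0.8428, D .6856378).

```python
def auc(outcome, score):
    """Proportion of case/non-case pairs in which the case scores higher (ties count 0.5)."""
    cases = [s for y, s in zip(outcome, score) if y == 1]
    controls = [s for y, s in zip(outcome, score) if y == 0]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))

def somers_d(auc_value):
    """Somers' D for a binary outcome, derived from the ROC area."""
    return 2 * auc_value - 1

# Hypothetical outcomes and predicted scores
y = [1, 1, 1, 0, 0]
s = [0.9, 0.7, 0.4, 0.5, 0.2]
area = auc(y, s)
print(round(area, 4), round(somers_d(area), 4))
print(round(somers_d(0.8428), 4))
```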

***Note: temp3 = 400

***This run was not successful, for unknown reasons, so 450 was tried instead

cd c:\data

set seed 123456

local nSim = 1000

set more off

cap postclose pf

postfile pf outcome_2 score roc nSim pr2 D using "temp1000.dta", replace

forvalues s = 1/`nSim' {

preserve

bsample

tempvar pr2 D score outcome_2

mi estimate, saving(m1, replace) dots: logit outcome_2 i.symptom3 i.symptom5 ///
i.movecoug i.rebten i.gr_bt i.labwbc_2 i.labneu_2

mi predict `score' using m1, xb

**gen `score' = xb

capture noisily mi estimate, saving(m2,replace) dots: logit outcome_2 `score'

mi predict xb_b using m2, xb

mi predictnl pr2 = predict(pr) using m2

gen `pr2'=pr2

gen `outcome_2'=outcome_2

**predict `p' , pr

qui somersd outcome_2 `pr2'

gen `D' =_b[`pr2']

capture noisily qui roctab outcome_2 `score'

local roc = r(area)

sum `score'

replace `score'=r(mean)

sum `outcome_2'

replace `outcome_2'=round(r(mean))

post pf (`outcome_2') (`score') (`roc') (`nSim') (`pr2') (`D')


*di "." _continue

restore

}

postclose pf

***Calculation of bias (D and C) & bootstrap correction coefficients

cd c:\data

use temp1000 /*450 replications from Tong's do file*/

A) calibration bias

gen D_org = .6856378 /*original D*/

sum D

gen meanD_boot = r(mean) if pr~=.

ci D

gen D_bias = D_org - D

gen meanD_bias = (D_org-meanD_boot) /*D optimism*/

sum D_bias meanD_bias

ci D_bias

***Calculate a bootstrap-corrected calibration coefficient: D_org - meanD_bias

gen bs_correctedD = D_org - (meanD_bias) /* bias is negative, i.e., the bootstrap D is higher than D_org */

list D_org meanD_bias bs_correctedD in 1

lab var bs_correctedD "A bs-correction of D"

sum D D_org meanD_bias bs_corr

gen percent_Derror = (D_org - bs_correctedD)/D_org*100

sum meanD_bias percent_Derror

B) Discrimination bias

gen roc_org= 0.8428 /*from the original model in the derivation phase*/

sum roc

gen mean_rocboot = r(mean) if pr2 ~=.

ci roc

gen roc_bias = roc_org-roc /*individual bias*/

*gen bias_roc = roc_org-rocboot

gen meanroc_bias = roc_org-mean_rocboot

sum roc_bias meanroc_bias roc_org

ci roc_bias

C) bootstrap corrected discrimination coefficient by roc_org- bias

gen corrected_roc = roc_org-(meanroc_bias)

lab var corrected_roc " A bs-corrected ROC"

sum roc_org roc corrected_*

gen percent_rocerror = (roc_org-corrected_roc)/roc_org*100

sum roc_org roc corrected_* percent_rocerror
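The bias and correction arithmetic used for D and for the ROC area above can be summarised in a few lines of Python. This is only a sketch of the calculation as written in the commands; the five bootstrap ROC areas are hypothetical, and only roc_org = 0.8428 comes from the source.

```python
from statistics import mean

def bootstrap_correction(original, boot_values):
    """Mean bias, corrected estimate and percent error, as in the commands above."""
    boot_mean = mean(boot_values)
    bias = original - boot_mean                        # e.g. meanroc_bias
    corrected = original - bias                        # e.g. corrected_roc
    pct_error = (original - corrected) / original * 100
    return boot_mean, bias, corrected, pct_error

roc_org = 0.8428                                       # original ROC area from the source
boot_rocs = [0.851, 0.838, 0.846, 0.855, 0.840]        # hypothetical bootstrap ROC areas
boot_mean, bias, corrected, pct_error = bootstrap_correction(roc_org, boot_rocs)
print(round(bias, 4), round(corrected, 4), round(pct_error, 2))
```

Note that, with the bias defined as the original value minus the bootstrap mean, the corrected value is algebraically equal to the bootstrap mean.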

External validation

gen score = -3.374991 + .7978508*symptom3 + 1.042774*symptom5 + .7787298*movecoug + ///
1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + .6898424*labneu2

sum score

logit appendicitis score

lroc

roctab appendicitis score


predict p, pr

estat gof, gr(6)

A) Recalibrate intercept

The constant term is recalibrated because the incidence/prevalence of appendicitis differs between the derivation and validation data.

Correction factor = ln[{prev/(1-prev)}/{MPV/(1-MPV)}]

MPV = mean predicted value. Because the predicted values were the probabilities estimated by the predict command, their mean is exactly the same as the prevalence of disease. Therefore, the median predicted value was used instead.

sum p,det

50% .4218531 Mean .4868421

disp ln((.4868421/(1-.4868421))/(.4218531/(1-.4218531)))

.26252708

gen cf = ln((.4868421/(1-.4868421))/(.4218531/(1-.4218531)))
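The correction factor is the difference between two log odds. The short Python sketch below reproduces the disp calculation above with the mean (.4868421) and median (.4218531) of the predicted probabilities, and then applies it to the original intercept in the way the M1 step does; the function names are hypothetical.

```python
import math

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))

def correction_factor(prevalence, reference_prediction):
    """ln[{prev/(1-prev)} / {ref/(1-ref)}], written as a difference of log odds."""
    return logit(prevalence) - logit(reference_prediction)

cf = correction_factor(0.4868421, 0.4218531)
recalibrated_b0 = -3.374991 - cf     # intercept used for score1 (b0 - cf)
print(round(cf, 5), round(recalibrated_b0, 4))
```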

***Prevalence of appendicitis in the derived data

M1 Calibration of intercept (baseline risk)

If prev(orig) is higher than prev(val), the baseline risk must be decreased by subtracting the cf (a positive value) from b0:

calibrated b0 = b0(orig) - cf

If prev(orig) is lower than prev(val), the baseline risk must be increased by adding the cf to b0:

calibrated b0 = b0(orig) + cf

For the appendicitis data, prev(orig) > prev(val), so

cal_b0 = b0(orig) - cf

***M1

logit app score

gen score1 = (-3.374991-cf) + .7978508*symptom3 + 1.042774*symptom5 + .7787298*movecoug ///
+ 1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + .6898424*labneu2

***Refit the equation using cal_b0 as below. This model also yields the calibrated
***coefficient (cal_b), which is used in the next step (calibration of the coefficients):
***ln [risk of app/(1 - risk of app)] = cal_b0 + cal_b*xb

logit append score1

lroc

roctab appen score1

***Calibration for m1

logit append score1

estat gof, gr(6) tab

disp chiprob(4,8.219118) /*df = 6-1-1*/

***Calibration plot

twoway (line obs_p1_ref p1_ref, lpattern(dash)) (scatter p1 obs_p1 , sort ///
msymbol(triangle_hollow) ylabel(0(.1)1) xlab(0(.1)1)) (lfit p1 obs_p1)

ci mean o_e

B) M2: Calibration by correcting all coefficients with one overall correction factor, i.e., cal_b as defined below. This is needed because the original coefficients may be overfitted or underfitted.

As before, ln [risk of app/(1 - risk of app)] = cal_b0 + cal_b*xb

logit app score

gen cal_b = _b[score]


sum cal_b

***Use cal_b to calibrate the coefficients in the original model

gen score2 = (-3.374991-cf) + cal_b*(.7978508*symptom3 + 1.042774*symptom5 + ///
.7787298*movecoug + 1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + ///
.6898424*labneu2)

logit app score2

roctab appen score2

***Calibration for M2

ci mean o_e

C) M3: Model update (revision method)

This is M2 plus an additional adjustment of the regression coefficients for the few predictors that were under- or overfitted in the validation data compared with the original data:

ln [risk of app/(1 - risk of app)] = cal_b0 + cal_b*xb + gamma*predictor

The calibration equation was refitted as in M2, but with each predictor added to the model one at a time.

A significant gamma coefficient meant that the predictor needed to be recalibrated.

Compare [cal_b0 + cal_b*xb + gamma*X] vs [cal_b0 + cal_b*xb]

logit app score2 /*cal_b0 + cal_b*xb*/

***keep LL for M2

estimates store cal_b0b

foreach var of varlist symptom3 symptom5 move fever2 rebten labwbc2 labneu2 {

logit append score2 `var'

estimates store m`var'

lrtest cal_b0b m`var'

}

These comparisons between the two models, with vs without each predictor, indicated that the effects of symptom3 and symptom5 were underestimated and the effects of fever2, labwbc2, and labneu2 were overestimated, whereas the results for rebten and movecoug were not statistically significant.

Thus, the coefficients of these 5 variables were used to calibrate the specific predictors.

***Calibrate specific predictors: symptom3, symptom5, fever2, labwbc2, labneu2

gen score3 = (-3.374991+cf) + cal_b*(.7978508*symptom3 + 1.042774*symptom5 + ///
.7787298*movecoug + 1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + ///
.6898424*labneu2) + (1.284446*symptom3) + (1.138048*symptom5) + (-1.331718*fever2) + ///
(-1.353248*labwbc2) + (-1.236354*labneu2)

sum score*

estat gof, gr(6)

roctab appen score3


***Prepare p data

disp chiprob(4,2.683341) /*df = 6-1-1*/

.61213286
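The chiprob(df, x) calls in this appendix return the upper-tail probability of a chi-square distribution, which gives the p-value of the goodness-of-fit statistic. A self-contained Python sketch of that probability is given below; it uses a standard recurrence on the degrees of freedom and is not Stata's own implementation. The two checks use values printed in this appendix (.61213286 for df = 4 and .65991283 for df = 7).

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for x >= 0 and integer df >= 1."""
    half = x / 2
    if df % 2 == 0:
        q, n = math.exp(-half), 2                 # closed form for df = 2
    else:
        q, n = math.erfc(math.sqrt(half)), 1      # closed form for df = 1
    # Step up two degrees of freedom at a time until df is reached
    while n < df:
        q += half ** (n / 2) * math.exp(-half) / math.gamma(n / 2 + 1)
        n += 2
    return q

p_df4 = chi2_upper_tail(2.683341, 4)
p_df7 = chi2_upper_tail(5.000413, 7)
print(round(p_df4, 4), round(p_df7, 4))
```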

***Calibration plot

twoway (line obs_p3_ref p3_ref, lpattern(dash)) (scatter p3 obs_p3 , sort


ci mean o_e

D) M4: M2+ Stepwise selection of additional predictors.


For this step, one or more predictors had not been included in the original model, or a new marker needed to be included in the model.

The starting point was

cal_b0 + cal_b*xb

but the variables to be included in xb were re-selected among the 7 variables:

sw, pr(.05): logit append symptom3 symptom5 move fever2 rebten labwbc2 labneu2

***Selection began with the full model

***Only 3 variables, i.e., symptom3, symptom5, and rebten, remained in the model

gen score4 = (-3.374991-cf) + cal_b*(.7978508*symptom3 + 1.042774*symptom5 + ///
.7787298*movecoug + 1.529419*rebten + 1.636311*fever2 + .9095179*labwbc2 + ///
.6898424*labneu2) + (1.83606)*symptom3 + (1.768313)*symptom5 + (1.816649)*rebten

logit append score4


***Calibration for M4

***Prepare p data

disp chiprob(7, 5.000413)

.65991283

***Calibration plot

twoway (line obs_p4_ref p4_ref, sort lpattern(dash)) (scatter p4 obs_p4, sort


ci mean o_e

E) Model 5: Re-estimate all coefficients using validation data only

logit append symptom3 symptom5 move fever2 rebten labwbc2 labneu2

roctab append score5


*** Prepare p data

disp chiprob(4, 5.177072)

***Calibration plot

lab var p5 "Predicted risk"



ci mean o_e

F) Model 6: M5 + stepwise selection; one or more predictors were not included

sw, pr(.05): logit append symptom3 symptom5 move fever2 rebten labwbc2 labneu2

***Selection began with the full model



***Prepare p data

disp chiprob(3,3.257358)

***Calibration plot



ci mean o_e


BIOGRAPHY

NAME Chumpon Wilasrusmee

DATE OF BIRTH 16 June 1968

PLACE OF BIRTH Bangkok, Thailand

INSTITUTIONS ATTENDED Faculty of Medicine, Ramathibodi Hospital,

Mahidol University, (1986 – 1992)

Doctor of Medicine (First Class Honors)

Faculty of Medicine, Ramathibodi Hospital,

Mahidol University, 1997-2000

Master of Science (Clinical Medicine)

Faculty of Medicine, Ramathibodi Hospital,

Mahidol University, 2010-2016

Doctor of Philosophy (Clinical Epidemiology)

SCHOLARSHIP RECEIVED Faculty of Medicine, Ramathibodi Hospital,

Mahidol University

RESEARCH GRANTS -

HOME ADDRESS 88/350 M.17, Thaweewattana,

Salathammasob, Bangkok 10170,

THAILAND

EMPLOYMENT ADDRESS Department of Surgery, Faculty of

Medicine, Ramathibodi Hospital, 270,

Rama VI Road, Ratchathevi, Bangkok

10400, THAILAND

PHONE Mobile: (6689) 7986337

