
[IEEE 2005 IEEE International Joint Conference on Neural Networks, 2005. - Montreal, QC, Canada (July 31-Aug. 4, 2005)] Proceedings. 2005 IEEE International Joint Conference on Neural


Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, July 31 - August 4, 2005

Detection of Disease Outbreaks in Pharmaceutical Sales: Neural Networks and Threshold Algorithms

Glenn Guthrie
Computing and Information Science
University of Guelph
Guelph, ON, Canada

Deborah A. Stacey
Computing and Information Science
University of Guelph
Guelph, ON, Canada
Email: [email protected]

David Calvert
Computing and Information Science
University of Guelph
Guelph, ON, Canada
Email: [email protected]

Victoria Edge
Foodborne, Waterborne and Zoonotic Infections Division
Public Health Agency of Canada
Guelph, ON, Canada

Abstract - Syndromic surveillance involves monitoring data that could indicate disease trends in a population, such as gastrointestinal illness and respiratory illness. Different types of data can be used to detect potential outbreaks of disease or biological contaminants based on deviations from historical norms. The system discussed in this paper is intended to detect aberrations by identifying changes in sequence data that do not match the norms for a given time and location. Artificial neural networks (ANNs) were used to detect changes in the sales trends of over-the-counter (OTC) pharmaceuticals. Early detection of an outbreak will allow public health officials to respond faster to potential outbreak situations.

Our research examines the application of a multilayer perceptron using back-propagation learning, with a moving window of the daily OTC sales values as inputs. The network is trained to identify changes in the sales trends, which can indicate a change in the population's health. The sales data exhibit a large amount of variability, and the ANN must be trained to process this variability without prematurely signalling that a change has occurred. The network is trained on hundreds of years of simulated sales data containing simulated outbreaks. The success of the ANN is determined by its accuracy and by the amount of time (number of days into the outbreak) that the system takes to correctly signal that an anomalous trend is occurring.

I. INTRODUCTION

The term syndromic surveillance applies to surveillance using health-related data that precede diagnosis and signal a sufficient probability of a case or an outbreak to warrant further public health response [1], [15], [18], [20]. Syndromic surveillance involves monitoring data that could indicate trends in disease syndromes, such as gastrointestinal illness and respiratory illness, in communities. These types of data (emergency room visits, over-the-counter drug sales, teletriage calls) might be used to detect and predict potential outbreaks of disease or biological contaminants in the general population based on deviations from historical norms. Syndromic surveillance attempts to find aberrations in the data that need further examination and explanation. An aberration can be described as a change in the distribution or frequency of important health-related data when compared with historical (more than 3 years in the past) or recent (less than 9 days in the past) data.

When an aberration of interest is detected, a health official or epidemiologist is brought in to review the data and determine its significance. The majority of work in this field has focused on statistical and rule-based anomaly pattern detection [6], [8], [9], [11], [14], [16], [17], [19]. More recently, computer modeling, spatio-temporal modeling and clustering techniques have been proposed, but results in this area are sparse.

Given the fears of terrorism and the potential for waterborne contamination [2], [3], there is a strong desire to intensify research efforts in this area. In the past few years, there have been waterborne outbreaks of E. coli O157 in Walkerton, Ontario, Canada and of Cryptosporidium in North Battleford, Saskatchewan, Canada. In each of these outbreaks, when the pharmaceutical sales of local pharmacies were examined, increased sales of anti-diarrheal drugs were seen during the outbreak periods. Figure 1 shows the resulting epidemic curve that was gathered after the outbreak of Cryptosporidium in North Battleford.


Fig. 1. Cryptosporidium Outbreak in North Battleford, Saskatchewan, Canada

The level of non-outbreak illness matches the intensity of the confirmed cases for the first five weeks of the outbreak. It is only in the sixth week that the number of confirmed cases rises above the normal level of illness. This makes the outbreak difficult for doctors and health officials to detect, since the number of people seeking medical treatment for illness remained stable during this period and laboratory test results take time to come back. As an alternative to monitoring trends in the number of diagnosed cases, this paper focuses on changes in over-the-counter (OTC) drug sales.

0-7803-9048-2/05/$20.00 ©2005 IEEE

A number of current systems collect and analyze OTC drug sales along with other health-related data [7], [10], [12], [13]. The theory that OTC data can provide timely and meaningful indicators of public health conditions is supported by the fact that the products are widely used and a record of these sales is readily available in electronic format. The sales are categorized by type of medication; these categories include anti-diarrheals, respiratory medication, and anti-influenza medication. It is currently not clear whether product sales can be related to certain types of illness, or whether there will be sufficient time to identify sales trends and use them to detect changes in public health.

The Alternative Surveillance Alert Program (ASAP) is a Public Health Agency of Canada (PHAC) initiative to monitor pharmaceutical sales [5]. The goal is to monitor for specific disease profiles and large aberrations in OTC sales data in order to warn local health officials when an outbreak of disease may be occurring in their area. Our research within this project currently focuses on ANN techniques to improve detection time and accuracy. Three areas of interest are 1) the use of temporally based ANNs for detection, 2) the creation of simulations for validating the detection algorithms, and 3) the resulting effectiveness and expected performance of the detection algorithms.

The diseases examined in our current research are Cryptosporidium and E. coli O157, each of which has its own unique properties. The onset time, duration, potency and transmissibility differ for each disease. These diseases lend themselves to surveillance with pharmaceutical sales because the symptoms are initially mild enough that people often do not immediately seek medical assistance and instead self-medicate.

II. METHODOLOGY

A. Performance Measures

The performance measures are used to compare the aberration detection algorithms and to determine which are most successful for this type of surveillance. The measures used to indicate the success of this work include the number of false positives and false negatives, the time required to detect an outbreak, and the ability of the system to identify the different outbreak trends caused by the different diseases.
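The measures above can be made concrete with a small scoring routine. The following is a minimal Python sketch under stated assumptions, not the authors' evaluation code: `score_alarms` is a hypothetical helper, outbreaks are given as inclusive (start, end) day ranges, an outbreak counts as detected if any alarm falls inside it, and each run of consecutive alarm days that touches no outbreak is counted as one false positive. The paper does not say how consecutive false alarms were counted, so that convention is an assumption.

```python
from typing import List, Tuple


def score_alarms(alarms: List[bool], outbreaks: List[Tuple[int, int]]):
    """Return (false_positives, false_negatives, detection_delays).

    Hypothetical helper. Assumptions: outbreaks are inclusive (start, end)
    day ranges; an outbreak is detected if any alarm falls inside it; each
    run of consecutive alarm days touching no outbreak is one false positive.
    """
    in_outbreak = [False] * len(alarms)
    for start, end in outbreaks:
        for day in range(start, end + 1):
            in_outbreak[day] = True

    delays, false_negatives = [], 0
    for start, end in outbreaks:
        hits = [d for d in range(start, end + 1) if alarms[d]]
        if hits:
            delays.append(hits[0] - start)  # days into the outbreak
        else:
            false_negatives += 1  # outbreak missed entirely

    false_positives, day = 0, 0
    while day < len(alarms):
        if alarms[day]:
            run_start = day
            while day < len(alarms) and alarms[day]:
                day += 1
            if not any(in_outbreak[run_start:day]):
                false_positives += 1
        else:
            day += 1
    return false_positives, false_negatives, delays


alarms = [False] * 20
for d in (2, 7, 8, 19):
    alarms[d] = True
print(score_alarms(alarms, [(5, 9), (14, 17)]))  # (2, 1, [2])
```

A run of alarms that begins just before an outbreak and continues into it is treated here as a detection on the outbreak's first day rather than as a false positive; other conventions are equally defensible and would change the counts.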

B. Data Analysis and Simulation

Relatively little data was available to train the system. Three years of data from five pharmacies were available, but none of the stores had a known disease outbreak during that time. This provided a limited baseline of data for training the detection algorithm. The available outbreak data came from Walkerton and North Battleford, but the only baseline data for those stores was from shortly before and after the outbreaks. This did not provide a sufficient baseline to model the characteristics of the pharmacies before the outbreaks occurred.

To generate the quantities of data needed to train the ANN, a methodology for simulating pharmaceutical sales was derived in consultation with an epidemiologist from PHAC [21], [22]. A system for creating simulations was constructed to implement this methodology. This system creates a statistical profile for a pharmacy based on its past sales. A baseline of normal sales is generated from this profile, and simulated outbreaks are then added. The profiles of the two types of outbreak (slow and rapid), based on Walkerton and North Battleford respectively, are shown in Figure 2.
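As an illustration of what a store's statistical profile might contain, the sketch below summarizes past sales as an overall daily mean plus month-of-year and day-of-week factors. The multiplicative decomposition, the use of calendar months as the seasonal unit, and the helper name `build_profile` are all assumptions; the paper states only that overall, seasonal and daily store parameters are used.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def build_profile(sales):
    """Summarize one store's history as multiplicative factors (hypothetical helper).

    sales: list of (datetime.date, units_sold) pairs.
    Returns the overall daily mean plus month-of-year and day-of-week factors,
    each expressed relative to the overall mean.
    """
    overall = mean(units for _, units in sales)
    by_month, by_weekday = defaultdict(list), defaultdict(list)
    for day, units in sales:
        by_month[day.month].append(units)
        by_weekday[day.weekday()].append(units)  # Monday == 0 ... Sunday == 6
    return {
        "overall": overall,
        "month": {m: mean(v) / overall for m, v in by_month.items()},
        "weekday": {w: mean(v) / overall for w, v in by_weekday.items()},
    }


# Two weeks of made-up sales: 4 units on weekdays, 2 at weekends.
start = date(2024, 1, 1)
history = [(start + timedelta(days=d),
            4 if (start + timedelta(days=d)).weekday() < 5 else 2)
           for d in range(14)]
profile = build_profile(history)
print(round(profile["overall"], 3), round(profile["weekday"][0], 3))  # 3.429 1.167
```

A factor above 1 marks a busier-than-average month or weekday; multiplying the overall mean by the relevant factors gives an expected value for any simulated day.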

[Figure 2: two panels, each plotting actual data points and a fitted curve against outbreak day (0 to 60).]

Fig. 2. OTC Sales Profiles for Outbreaks based on Walkerton (Slow) and North Battleford (Rapid)

The sales simulator is designed to maintain the variability seen in day-to-day pharmaceutical sales and to include outbreaks that are equally variable. The simulator has a number of configurable parameters. A database of known store parameters (overall, seasonal and daily) can be used to create the baseline sales data for a store. Simulated outbreaks can be altered in duration, magnitude, frequency and daily randomness. An example of our simulated data is provided in Figure 3.
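A minimal sketch of the simulator step, assuming a Poisson model for daily unit sales (an assumption; the paper does not name a distribution). The hypothetical `simulate_sales` scales the overall mean by month and weekday factors, multiplies expected sales by (1 + increase) on outbreak days, and jitters each day's outbreak increase by a Gaussian factor whose standard deviation stands in for the randomness parameter. Duration and magnitude are controlled by the outbreak shape that is passed in.

```python
import math
import random
from datetime import date, timedelta


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small daily counts of one store."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_sales(profile, start, days, outbreaks, randomness=0.0, seed=0):
    """Simulate daily unit sales for one store (hypothetical helper).

    profile:    {"overall": mean, "month": {1..12: factor}, "weekday": {0..6: factor}}
    outbreaks:  list of (first_day, shape); shape[i] is the fractional increase in
                expected sales on day first_day + i (1.0 doubles expected sales)
    randomness: standard deviation of the Gaussian jitter on each outbreak day
    """
    rng = random.Random(seed)
    excess = [0.0] * days
    for first, shape in outbreaks:
        for i, bump in enumerate(shape):
            if 0 <= first + i < days:
                jittered = bump * (1.0 + rng.gauss(0.0, randomness))
                excess[first + i] += max(0.0, jittered)
    sales = []
    for d in range(days):
        day = start + timedelta(days=d)
        lam = (profile["overall"]
               * profile["month"].get(day.month, 1.0)       # missing factor -> 1.0
               * profile["weekday"].get(day.weekday(), 1.0)
               * (1.0 + excess[d]))
        sales.append(poisson(lam, rng))
    return sales, excess


flat = {"overall": 5.0, "month": {}, "weekday": {}}
sales, excess = simulate_sales(flat, date(2024, 1, 1), 365, [(100, [1.0] * 30)])
print(sum(sales[:100]) / 100, sum(sales[100:130]) / 30)
```

Outbreak frequency is controlled by how many (start, shape) pairs are passed. Knuth's sampler is used only to keep the example dependency-free; it becomes slow for large means.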

III. TRAINING AND TESTING

A multilayer perceptron, using back-propagation (BP) learning and a moving window of the daily sales values as inputs, was employed to look for aberrations or patterns in the sales data of individual stores.

The inputs to the network were a moving window of the sales values for the previous days. Since each store had only a single sales value for each day, the number of inputs matched the number of days in the window. Additional inputs were later added for the season of the sales window and the day on which each sale occurred.

The network is first taught to recognize outbreaks by training it with one hundred years of simulated sales data containing randomized outbreaks. The desired output of the network is based on how far into an outbreak period the data presented to the ANN extends. When only baseline sales are present at the input to the ANN, the desired output is zero. The output value rises linearly from zero to one between the start and the peak of the outbreak. The higher the value of the output, the further the network is into a potential outbreak.

Fig. 3. Plots of one year of simulated anti-diarrheal sales data with slow and rapid outbreaks inserted and labeled, shown first with no change in the outbreak profiles and then with a randomness level of 1.0

The network has a single output, which is a measure of how far it is into a potential outbreak. It was necessary to find the outbreaks before the peak values, so only the outbreak days leading up to and shortly after the peak were assigned values greater than the baseline value of zero. Maintaining outputs past the peak was necessary because the daily sales values are small, and the difference between an outbreak day with a daily multiplier of 1.0 and one with 0.95 is not large. As a result, the peak of an outbreak can appear flat for several days before it begins to subside. For the small sales values seen in these stores, a labeling cut-off point of 90% of the largest multiplier, past the peak of the outbreak, worked well. For larger sales values this labeling cut-off would need to be raised, or perhaps even removed, since the peak would not appear as flat. This is important because it ensured that the network never associated both decreasing and increasing sales trends with the same output value. Figure 4 illustrates this concept with the idealized profiles of our two disease types.
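The input windows and desired outputs described above can be sketched as follows. Both helpers are hypothetical. `outbreak_targets` rises linearly from 0 at the outbreak start to 1 at the peak and, as an assumption about the label values past the peak, holds 1 while the outbreak shape stays at or above 90% of its peak, then drops to 0 for the rest of the outbreak. `make_windows` pairs each n-day window with the target of one day inside it; its `view` parameter corresponds to what the paper calls the viewing point and defaults to the centre of the window. The additional season and day inputs are omitted.

```python
def outbreak_targets(shape, cutoff=0.9):
    """Desired network output for each day of one outbreak (hypothetical helper).

    Rises linearly from 0 on the first day to 1 at the peak, stays at 1 while
    the shape remains >= cutoff * peak, and is 0 from the first day below that.
    """
    peak = max(range(len(shape)), key=lambda i: shape[i])
    targets, dropped = [], False
    for i, value in enumerate(shape):
        if i <= peak:
            targets.append(i / peak if peak > 0 else 1.0)
        elif not dropped and value >= cutoff * shape[peak]:
            targets.append(1.0)
        else:
            dropped = True  # never label a later, still-flat day as rising again
            targets.append(0.0)
    return targets


def make_windows(sales, targets, n, view=None):
    """Pair every n-day input window with the target of the day at index `view`
    inside it (the viewing point); the default is the centre of the window."""
    view = n // 2 if view is None else view
    inputs, outputs = [], []
    for start in range(len(sales) - n + 1):
        inputs.append(sales[start:start + n])
        outputs.append(targets[start + view])
    return inputs, outputs


labels = [0.0] * 3 + outbreak_targets([0.1, 0.5, 1.0, 0.95, 0.5, 0.1]) + [0.0]
x, y = make_windows(list(range(10)), labels, n=3)
print(x[0], y[3])  # [0, 1, 2] 0.5
```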

After training, a threshold is applied to the network output to determine when an outbreak has occurred.

A moving average (MA) detector was also tested and used for comparison. The formula for calculating the moving average for a given day is:

$$\mathrm{MovingAverage}(x) = \frac{1}{n} \sum_{i=x-n+1}^{x} A(i)$$

where

• x is the most recent day in the window
• A(i) is the sales value for day i
• n is the size of the moving window

Fig. 4. Plots of the desired neural network outputs for the rapid and slow outbreak profiles
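A minimal Python sketch of an n-day trailing moving average and its threshold test. It averages days x-n+1 through x so that exactly n values contribute, and keeps a running sum so that each day costs constant time. The function names are hypothetical, and the first n-1 days, which lack a full window, are reported as `None` and never raise an alarm.

```python
def moving_average(sales, n):
    """Trailing n-day mean of days x-n+1 .. x; None until a full window exists."""
    out, running = [], 0.0
    for x, value in enumerate(sales):
        running += value
        if x >= n:
            running -= sales[x - n]  # drop the day that just left the window
        out.append(running / n if x >= n - 1 else None)
    return out


def ma_alarms(sales, n, threshold):
    """Alarm on every day whose moving average reaches the threshold."""
    return [ma is not None and ma >= threshold for ma in moving_average(sales, n)]


print(moving_average([3, 6, 9, 12], 3))  # [None, None, 6.0, 9.0]
print(ma_alarms([3, 6, 9, 12], 3, 8.0))  # [False, False, False, True]
```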

Both the ANN output and the moving average have a fitted threshold value that attempts to balance the number of false positives and false negatives. A false positive is a situation in which the detector indicates that an outbreak is occurring when it is not. A false negative is a situation in which the detector does not indicate an outbreak when one is in fact in progress. Due to the variability in the baseline data, setting the threshold too low causes too many false positives, while setting it too high means that some outbreaks are missed (false negatives). To best fit the threshold to the baseline data, a program iteratively tests a range of threshold values for each detection method. Once the threshold is determined for each detection method, a testing data set is generated using the same parameters used to generate the training set. The ANN and the MA are then run on the new data set, and the number of false positives and negatives and the time required to detect each outbreak are determined.

The viewing point is the day in the ANN's input window to which the output corresponds. Any day in the moving window could be made the viewing point into a potential outbreak. If the viewing point is the most recent sales value in the window, then at the start of an outbreak the network has only a single day of outbreak data to respond to when changing its output; because the data is highly variable, this gives the network little information to base its output on and reduces accuracy. If the viewing point is the oldest day in the window, the network can use the entire length of the window that follows it. However, the further back in the window the viewing point is placed, the more the network's response lags, since it takes longer for the network to respond to the outbreak.
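The iterative threshold search can be sketched as a grid search. In this hypothetical version, `evaluate` counts false positives (alarm runs touching no outbreak), false negatives (outbreaks with no alarm) and the mean detection delay, and `fit_threshold` picks the candidate with the fewest total errors, breaking ties by the shorter mean delay. That ordering is an assumption; the paper says only that the threshold balances false positives, false negatives and time to detection.

```python
from itertools import groupby
from statistics import mean


def evaluate(outputs, outbreaks, threshold):
    """Return (false_positives, false_negatives, mean_delay) at one threshold.

    outbreaks: inclusive (start, end) day ranges. Each run of consecutive alarm
    days that touches no outbreak counts as one false positive (an assumption).
    """
    alarm = [value >= threshold for value in outputs]
    inside = {d for s, e in outbreaks for d in range(s, e + 1)}
    delays, misses = [], 0
    for s, e in outbreaks:
        hit = next((d for d in range(s, e + 1) if alarm[d]), None)
        if hit is None:
            misses += 1
        else:
            delays.append(hit - s)
    false_pos, day = 0, 0
    for on, run in groupby(alarm):
        length = len(list(run))
        if on and not any(d in inside for d in range(day, day + length)):
            false_pos += 1
        day += length
    return false_pos, misses, (mean(delays) if delays else float("inf"))


def fit_threshold(outputs, outbreaks, candidates):
    """Fewest total errors first, then the shortest mean detection delay."""
    def cost(t):
        fp, fn, delay = evaluate(outputs, outbreaks, t)
        return (fp + fn, delay)
    return min(candidates, key=cost)


outputs = [0.1] * 20
outputs[3] = 0.4                                   # a noise spike
outputs[10:16] = [0.2, 0.5, 0.8, 0.9, 0.9, 0.6]    # an outbreak on days 10-15
print(fit_threshold(outputs, [(10, 15)], [0.15, 0.3, 0.45, 0.7, 0.95]))  # 0.45
```

Lower thresholds here catch the outbreak a day earlier but also fire on the day-3 spike; the search trades that extra false positive for one day of delay.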


A. Determining the Neural Network Parameters

Prior to testing, the ANN parameters needed to be determined. Some initial trials were run to examine the response of the network to varying hidden layer sizes, learning and momentum rates, and numbers of training epochs. The total squared error (TSE) was used as a comparative measure, where the error is the difference between the desired output and the actual network output.

Altering the hidden layer size and the learning and momentum rates did not have a great effect on the ability of the network to train on the data; the network routinely converged to the same training error across different training runs. Conservative learning and momentum rates were found to reduce training error, and hidden layers the same size as the input moving window were large enough for accurate training.
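For readers unfamiliar with the training procedure, the sketch below shows a one-hidden-layer perceptron trained by online back-propagation with momentum, reporting the total squared error (TSE) after each epoch. It is a generic, dependency-free illustration rather than the authors' implementation: the sigmoid activations, the uniform weight initialisation, the learning rate and the toy data are assumptions, and only the hidden-layer size (equal to the window size) follows the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """One hidden layer, one sigmoid output, online back-propagation with momentum.

    A generic sketch; activations and initialisation are assumptions.
    """

    def __init__(self, n_in, n_hidden, lr=0.05, momentum=0.5, seed=0):
        rng = random.Random(seed)
        # Each row holds one unit's weights; the final entry is its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]  # previous updates
        self.v2 = [0.0] * (n_hidden + 1)
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return xb, h, y

    def train_epoch(self, inputs, targets):
        """One pass over the data; returns the total squared error (TSE)."""
        tse = 0.0
        for x, t in zip(inputs, targets):
            xb, h, y = self.forward(x)
            tse += (t - y) ** 2
            d_out = (t - y) * y * (1.0 - y)
            # Hidden deltas use the output weights before this step's update.
            d_hid = [d_out * self.w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
            for j, v in enumerate(h + [1.0]):
                self.v2[j] = self.lr * d_out * v + self.momentum * self.v2[j]
                self.w2[j] += self.v2[j]
            for j, row in enumerate(self.w1):
                for i, v in enumerate(xb):
                    self.v1[j][i] = self.lr * d_hid[j] * v + self.momentum * self.v1[j][i]
                    row[i] += self.v1[j][i]
        return tse


# Toy windows scaled to 0..1: low sales -> 0, high sales -> 1.
X = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.9, 1.0, 0.8], [1.0, 0.9, 1.0]]
T = [0.0, 0.0, 1.0, 1.0]
net = MLP(n_in=3, n_hidden=3, lr=0.3)  # hidden size equals the window size
history = [net.train_epoch(X, T) for _ in range(3000)]
print(round(history[0], 3), round(history[-1], 4))
```

The momentum term adds a fraction of each weight's previous update to its current one, which smooths the online updates without changing the gradient direction.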

Over-training was an initial concern, so validation data sets were used in the trials in conjunction with training sets to examine the effects of a greater number of training epochs. There was a point at which the network's training TSE was still dropping slightly but the TSE of the validation set was beginning to rise. However, the rise in validation set error was small: even with extremely long training times, the validation set error changed only by a small amount (less than 0.1%). The decision was therefore made to choose a reasonable number of training epochs and not to use validation sets during the experiments.

The threshold level is the output value that, when crossed, signals an outbreak. One way to ensure that the best threshold is used is to fit the value to the training data iteratively. By testing a wide range of threshold values for the detection method output, a threshold was determined that minimizes the errors and the time to detection.
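The over-training check described above amounts to comparing training and validation TSE curves. A minimal sketch with a hypothetical helper name and made-up TSE values: it finds the epoch where validation error is lowest, measures how far validation error has risen since then, and reports whether training error is still falling.

```python
def overtraining_report(train_tse, valid_tse):
    """Return (best_epoch, relative_rise, training_still_falling).

    best_epoch is where validation TSE is lowest; relative_rise is how much the
    final validation TSE exceeds that minimum, as a fraction of the minimum.
    """
    best = min(range(len(valid_tse)), key=valid_tse.__getitem__)
    relative_rise = (valid_tse[-1] - valid_tse[best]) / valid_tse[best]
    still_falling = train_tse[-1] < train_tse[best]
    return best, relative_rise, still_falling


train = [10.0, 8.0, 7.0, 6.5, 6.4]       # made-up TSE values per epoch
valid = [12.0, 10.0, 9.5, 9.505, 9.509]
best, rise, falling = overtraining_report(train, valid)
print(best, f"{rise:.3%}", falling)  # 2 0.095% True
```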

IV. TESTING DESCRIPTION AND RESULTS

Three types of tests were performed to examine the performance characteristics of the ANN and to compare it to the MA system. Thirty pairs of training and testing sets, each covering one hundred years of simulated anti-diarrheal sales, were created. Outbreak locations were randomized, and outbreaks were added to both training and testing sets with an average period of three hundred days between outbreaks.

Once the simulated data sets were generated, the ANN was trained and the MA was calculated for the training set sales periods. Threshold values based on the training data were then calculated for the ANN output and for the MA. The ANN was then presented with the testing data and the outputs it generated were recorded, and the MA was calculated for the testing data. The number of false positives and false negatives and the detection time for both techniques were then calculated from these results. Figure 5 shows an outbreak detection example.
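One way to randomize outbreak locations so that their starts are, on average, three hundred days apart is sketched below. The gap model (the outbreak duration plus an exponentially distributed remainder) is an assumption chosen so that outbreaks never overlap while the expected gap is exactly `mean_gap`; the paper states only the average, and `place_outbreaks` is a hypothetical name.

```python
import random


def place_outbreaks(total_days, duration, mean_gap=300, seed=0):
    """Random, non-overlapping outbreak start days spaced mean_gap days apart on average.

    Each gap is the outbreak duration plus an exponentially distributed
    remainder, so the expected gap is mean_gap (requires mean_gap > duration).
    """
    rng = random.Random(seed)
    starts, day = [], 0.0
    while True:
        day += duration + rng.expovariate(1.0 / (mean_gap - duration))
        start = int(day)
        if start + duration > total_days:  # next outbreak would run past the end
            return starts
        starts.append(start)


starts = place_outbreaks(total_days=100 * 365, duration=60)
gaps = [b - a for a, b in zip(starts, starts[1:])]
print(len(starts), min(gaps), sum(gaps) / len(gaps))
```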

A. Base Comparison of the ANN versus the Simple Moving Average

This suite of tests compared the performance of the ANN to the MA for both outbreak profiles. The profiles were left unaltered: they were not randomized or changed in scale or duration. Training sets were created for each outbreak profile separately, so the Walkerton and North Battleford outbreaks did not appear in the same training data set. The window size used in this test was seven days, and the viewing point was at the center of the moving window.

Fig. 5. An example outbreak starting at day 82 with plots of raw data, moving average and neural network output

The difference in performance of the two algorithms on the slow and rapid profiles is substantial, as seen in Table I and Table II.

TABLE I
SUMMARY OF RESULTING ERROR RATES AND TIME TO DETECTION FOR ANN AND MA

For the rapid profile, the ANN greatly reduces the error rate and at the same time shortens the time to detection compared with the moving average. It reduces the error rate more than tenfold, while on average it detects outbreaks more than a day sooner. The slow profile is easier to detect, since it lasts longer and rises higher above the baseline sales, so not surprisingly the moving average performs much better on it than on the rapid profile. Even with nearly identical error rates, however, the neural network detects these outbreaks more than 2 days faster on average, which is a substantial improvement when every day counts in an outbreak situation. It is also interesting to note that the ANN rarely generates a false negative; that is, it rarely misses outbreaks. Combined with a relatively low false positive rate, this makes the ANN a very well-behaved surveillance mechanism compared to the MA.

TABLE II
BREAKDOWN OF ERROR RATES INTO FALSE NEGATIVES AND FALSE POSITIVES (IN 30 TESTS)

| Outbreak Profile | Detection Method | Total Error | Total False Positive | Total False Negative |
|---|---|---|---|---|
| Rapid | ANN | 78 | 74 | 4 |
| Rapid | MA | 852 | 575 | 277 |
| Slow | ANN | 48 | 48 | 0 |
| Slow | MA | 40 | 38 | 2 |
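The error comparison for the rapid profile can be checked directly against the totals in Table II. This short Python snippet recomputes each total from its false positives and false negatives, the average number of errors per test, and the ratio behind the "more than tenfold" claim.

```python
# (false positives, false negatives) per profile and method, from Table II.
table = {
    ("Rapid", "ANN"): (74, 4),
    ("Rapid", "MA"): (575, 277),
    ("Slow", "ANN"): (48, 0),
    ("Slow", "MA"): (38, 2),
}
totals = {key: fp + fn for key, (fp, fn) in table.items()}
ratio = totals[("Rapid", "MA")] / totals[("Rapid", "ANN")]

for (profile, method), total in totals.items():
    print(f"{profile:5} {method:3} total errors {total:4}  per test {total / 30:.1f}")
print(f"Rapid profile: the MA makes {ratio:.1f} times as many errors as the ANN")  # 10.9
```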

B. Tests Examining the Characteristics of the ANN's Moving Window

For the first test of this type, the same testing procedure was used as in the previous set of tests, except that the window size was varied from three days to a month. This necessitated a change in the size of the input and hidden layers of the ANN.

The results show that there is an optimum moving window size for both the rapid and the slow profiles. For the rapid profile, the best moving window size for minimizing errors falls between 5 and 9 days, depending on the desired accuracy and time to detection. For the slow profile, the best moving window size appears to be in the range of 7 to 15 days.

The trade-off between accuracy and time to detection that results from varying the size of the moving window can be seen in Figure 6 for the rapid outbreak profile and in Figure 7 for the slow outbreak profile.
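The shape of this trade-off can be illustrated with the simpler moving-average detector under idealized assumptions: a Poisson baseline, an outbreak modeled as a constant step increase of `delta` units per day, an alarm level set `z` noise standard deviations above the baseline, and noise during outbreak days ignored. This is not the paper's ANN experiment; it only shows why small windows can miss modest outbreaks while larger windows catch them later. `ma_tradeoff` is a hypothetical helper.

```python
import math


def ma_tradeoff(lam, delta, z, windows):
    """Idealized moving-average trade-off (hypothetical helper).

    Baseline sales ~ Poisson(lam), so an n-day average has noise sd sqrt(lam / n).
    The alarm level is lam + z * sd. An outbreak adds delta units/day, so with d
    outbreak days inside the window the average is lam + delta * min(d, n) / n
    (outbreak-day noise ignored). Returns (n, sd, first alarm day or None).
    """
    rows = []
    for n in windows:
        sd = math.sqrt(lam / n)
        needed = z * sd * n / delta      # alarm needs delta * d / n > z * sd
        first_day = math.floor(needed) + 1
        rows.append((n, sd, first_day if first_day <= n else None))
    return rows


for n, sd, day in ma_tradeoff(lam=4, delta=2, z=2, windows=[3, 7, 15, 31]):
    print(f"window {n:2}: noise sd {sd:.2f}, first alarm on outbreak day {day}")
```

With a baseline of 4 units a day, a 2-unit step and z = 2, the 3-day window never crosses the alarm level, while the 7-, 15- and 31-day windows first cross it on outbreak days 6, 8 and 12.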

[Figure 6 plot: x-axis is moving window size (days).]

Fig. 6. Trade-off between accuracy and time to detection as the size of the moving window is varied for the rapid outbreak profile

Additional tests showed that placing the viewing point towards the center of the moving window provided the best results. What is now clear from the tests is that there are two parameters, the threshold value and the moving window size, that can be altered to tune the ANN's detection performance.

[Figure 7 plot: x-axis is moving window size (days).]

Fig. 7. Trade-off between accuracy and time to detection as the size of the moving window is varied for the slow outbreak profile

C. Variability Tests

Tests were performed to examine the sensitivity of the ANN to outbreak duration, outbreak scale and increased variability (randomness) in the outbreaks. A wide range of values for these parameters was tested, as were combinations of all three. In each case, the parameters were first lowered and raised individually. Once the lower and upper limits had been examined, ranges of values were used to test whether the network can generalize the outbreak profiles when each outbreak appears different from those previously presented to the system.

The ANN was found to handle outbreaks that varied widely in duration, scale and variability. As each factor drifted further from the norm, performance decreased, but the ANN was robust enough to outperform the moving average and to show clear trade-offs between accuracy and time to detection. As more is learned about the true nature of the correlation between OTC sales and disease outbreaks, it will be possible to determine more precisely the accuracy of the ANN technique and the bounds of its performance.
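The duration, scale and randomness variations used in these tests could be produced by perturbing a base outbreak shape, as in the sketch below. The hypothetical `vary_profile` stretches or compresses the shape in time by linear interpolation, rescales its height, and applies independent multiplicative Gaussian jitter to each day; these specific transformations are assumptions, since the paper does not describe how the variations were implemented.

```python
import random


def vary_profile(shape, duration_scale=1.0, magnitude_scale=1.0, randomness=0.0, seed=0):
    """Return a perturbed copy of an outbreak shape (hypothetical helper).

    duration_scale stretches (>1) or compresses (<1) the shape in time using
    linear interpolation; magnitude_scale rescales its height; randomness is the
    standard deviation of independent multiplicative Gaussian jitter per day.
    Requires len(shape) >= 2.
    """
    rng = random.Random(seed)
    old_len = len(shape)
    new_len = max(2, round(old_len * duration_scale))
    out = []
    for k in range(new_len):
        pos = k * (old_len - 1) / (new_len - 1)  # matching position in the original
        i = min(int(pos), old_len - 2)
        frac = pos - i
        value = shape[i] * (1.0 - frac) + shape[i + 1] * frac
        jitter = 1.0 + rng.gauss(0.0, randomness)
        out.append(max(0.0, value * magnitude_scale * jitter))  # sales cannot drop below baseline
    return out


base = [0.0, 1.0, 2.0, 3.0, 4.0]
print(vary_profile(base))  # [0.0, 1.0, 2.0, 3.0, 4.0]
longer = vary_profile(base, duration_scale=2.0, magnitude_scale=0.5)
print(len(longer), longer[-1])  # 10 2.0
```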

V. CONCLUSIONS

The goal in creating this system was to use OTC sales to detect the outbreak of a gastrointestinal disease as quickly as possible, while minimizing both false positives and false negatives. There is a trade-off between the time to detection and the resulting errors: the earlier the algorithm tries to detect the outbreak, the more likely it is to mistake the natural variability in the sales data for an outbreak. The longer the system waits as the outbreak rises above the baseline, the more likely it is to make the right determination.


The domain of syndromic surveillance needs more and varied techniques for aberration detection, and there is a need to start classifying the effectiveness of syndromic systems based on these detection methods, allowing direct comparison of results [4]. Given our results, it appears that an ANN using back-propagation learning can be trained to distinguish a limited number and variety of outbreak profiles from pharmacy baseline data. It will be interesting to continue developing the ANN methodology for syndromic surveillance and to compare it with the other techniques now used by the community.

REFERENCES

[1] Henning, Kelly J., "What is Syndromic Surveillance?", MMWR, 53(Supplement), pg. 7-11, Sep, 2004.

[2] Buehler, James W., "Review of the 2003 National Syndromic Surveillance Conference - Lessons Learned and Questions To Be Answered", MMWR, 53(Supplement), pg. 18-22, Sep, 2004.

[3] Sosin, Daniel M., "Evaluation Challenges for Syndromic Surveillance - Making Incremental Progress", MMWR, 53(Supplement), pg. 125-129, Sep, 2004.

[4] Mandl, Kenneth D. and Reis, B. and Cassa, C., "Measuring Outbreak-Detection Performance By Using Controlled Feature Set Simulations", MMWR, 53(Supplement), pg. 130-136, Sep, 2004.

[5] Edge, Victoria L. and Lim, G. H. and Aramini, J. J. and Sockett, P. and Pollari, F. L., "Development of an Alternative Surveillance Alert Program (ASAP): Syndromic Surveillance of Gastrointestinal Illness Using Pharmacy Over-The-Counter Sales", National Syndromic Surveillance Conference, New York Academy of Medicine, New York City Department of Health and Mental Hygiene and the Centers for Disease Control and Prevention, New York City, Sep, 2002.

[6] Heffernan, Richard and Mostashari, F. and Das, D. and Besculides, M. and Rodriguez, C. and Greenko, J. and Steiner-Sichel, L. and Balter, S. and Karpati, A. and Thomas, P. and Phillips, M. and Ackelsberg, J. and Lee, E. and Leng, J. and Hartman, J. and Metzger, K. and Rosselli, R. and Weiss, D., "New York City Syndromic Surveillance Systems", MMWR, 53(Supplement), pg. 23-27, Sep, 2004.

[7] Wagner, Michael M. and Tsui, F-C. and Espino, J. and Hogan, W. and Hutman, J. and Hersh, J. and Neill, D. and Moore, A. and Parks, G. and Lewis, C. and Aller, R., "National Retail Data Monitor for Public Health Surveillance", MMWR, 53(Supplement), pg. 40-42, Sep, 2004.

[8] Espino, Jeremy U. and Wagner, M. and Szczepaniak, C. and Tsui, F-C. and Su, H. and Olszewski, R. and Liu, Z. and Chapman, W. and Zeng, X. and Ma, L. and Lu, Z. and Dara, J., "Removing a Barrier to Computer-Based Outbreak and Disease Surveillance - The RODS Open Source Project", MMWR, 53(Supplement), pg. 32-39, Sep, 2004.

[9] Dafni, Urania G. and Tsiodras, S. and Panagiotakos, D. and Gkolfinopoulou, K. and Kouvatseas, G. and Tsourti, Z. and Saroglou, G., "Algorithm for Statistical Detection of Peaks - Syndromic Surveillance System for the Athens 2004 Olympic Games", MMWR, 53(Supplement), pg. 86-94, Sep, 2004.

[10] Magruder, Steven F. and Happel Lewis, S. and Najmi, A. and Florio, E., "Progress in Understanding and Using Over-the-Counter Pharmaceuticals for Syndromic Surveillance", MMWR, 53(Supplement), pg. 117-122, Sep, 2004.

[11] Moore, Andrew and Cooper, G. and Tsui, R. and Wagner, M., "Summary of Biosurveillance-Relevant Statistical and Data Mining Technologies", Unpublished Internet Report, Feb. 2002.

[12] Goldenberg, A. and Shmueli, G. and Caruana, R. A. and Fienberg, S. E., "Early Statistical Detection of Anthrax Outbreaks by Tracking Over-The-Counter Medication Sales", Proc Natl Acad Sci USA, vol. 99, num. 8, pg. 5237-5240, 2002.

[13] Reis, Ben Y. and Mandl, K. D., "Time Series Modeling for Syndromic Surveillance", BMC Medical Informatics and Decision Making, vol. 3, num. 1, pg. 2, 2003.

[14] Wong, W. K. and Moore, A. M. and Cooper, G. F. and Wagner, M. M., "Bayesian Network Anomaly Pattern Detection for Disease Outbreaks", Proceedings of the 20th International Conference on Machine Learning, 2003.

[15] Buehler, James W. and Berkelman, R. L. and Hartley, D. M. and Peters, C. J., "Syndromic Surveillance and Bioterrorism-Related Epidemics", Emerg Infect Dis, URL: http://www.cdc.gov/ncidod/EID/vol9no10/03-0231.htm, 2003.

[16] Hutwagner, L. and Thompson, W. and Seeman, G. M. and Treadwell, T., "The Bioterrorism Preparedness and Response Early Aberration Reporting System (EARS)", Journal of Urban Health, vol. 80, pg. 186-196, 2003.

[17] Wong, Weng-Keen and Andrew, M. and Cooper, G., "WSARE: What's Strange About Recent Events?", Journal of Urban Health, vol. 80, pg. 166-175, 2003.

[18] Mostashari, Farzad and Hartman, J. J., "Syndromic Surveillance: a Local Perspective", Journal of Urban Health, vol. 80, pg. 11-17, 2003.

[19] Lombardo, Joseph and Burkom, H. and Elbert, E. and Magruder, S. and Lewis, H. and Loschen, W. and Suri, J. and Sniegoski, C. and Wojcik, R. and Pavlin, J., "A Systems Overview of the Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE II)", Journal of Urban Health, vol. 80, pg. 132-142, 2003.

[20] Pavlin, Julie A., "Investigation of Disease Outbreaks Detected by Syndromic Surveillance Systems", Journal of Urban Health, vol. 80, pg. i107-i114, 2003.

[21] Guthrie, G., "Detection of Disease Outbreaks in Pharmaceutical Sales: An Investigation of an Artificial Neural Network Approach", Masters Thesis, Dept. of Computer Science, University of Guelph, Guelph, Ontario, Canada, April 2005.

[22] Guthrie, G. and Calvert, D. and Stacey, D. and Edge, V., "Simulating Pharmaceutical Sales and Disease Outbreaks Based on Actual Baseline Store Sales and Available Outbreak Data: A Model for Improving Detection Algorithms", Oral Presentation, Syndromic Surveillance Conference, Boston, Mass., 2004.
