
A Method for Optimizing and Validating Institution-Specific Flagging Criteria for Automated Cell Counters

Anthony Sireci, MD; Robert Schlaberg, MD, MPH; Alexander Kratz, MD, PhD, MPH

Context.—Automated cell counters use alerts (flags) to indicate which differential white blood cell counts can be released directly from the instrument and which samples require labor-intensive slide reviews. The thresholds at which many of these flags are triggered can be adjusted by individual laboratories. Many users, however, use factory-default settings or adjust the thresholds through a process of trial and error.

Objective.—To develop a systematic method, combining statistical analysis and clinical judgment, to optimize the flagging thresholds on automated cell counters.

Design.—Data from 502 samples flagged by Sysmex XE-2100/5000 (Sysmex, Kobe, Japan) instruments, with at least 1 of 5 user-adjustable, white blood cell count flags, were used to change the flagging thresholds for maximal diagnostic effectiveness by optimizing the Youden index for each flag (the optimization set). The optimized thresholds were then validated with a second set of 378 samples (the validation set).

Results.—Use of the new thresholds reduced the review rate caused by the 5 flags from 6.5% to 2.9% and improved the positive predictive value of the flagging system for any abnormality from 27% to 37%.

Conclusions.—This method can be used to optimize thresholds for flag alerts on automated cell counters of any type and to improve the overall positive predictive value of the flagging system at the expense of a reduction in the negative predictive value. A reduced manual review rate helps to focus resources on differential white blood cell counts that are of clinical significance and may improve turnaround time.

(Arch Pathol Lab Med. 2010;134:1528–1533)

The accurate and timely delivery of differential white blood cell (WBC) count results by the hematology laboratory is crucial in many clinical settings, including acute infections, hematologic malignancies, and the administration of chemotherapy. Automated cell counters were introduced by Wallace Coulter in 1953 and have since replaced the microscope as the instrument of choice for most peripheral, differential WBC counts. The advantages of automated cell counters over the microscope include faster turnaround times, significantly lower labor costs, lack of interobserver variation, and results with greater statistical validity.1

Modern automated cell counters provide a reliable differential WBC count for samples that are within reference range and for those that exhibit only quantitative abnormalities. Qualitatively abnormal sample results, such as those with abnormal or immature cells, still require the preparation of a slide and microscopic analysis.2 The instruments use flags (electronic or printed alerts) to notify the user that the automated differential WBC count may not be correct and requires review. The factors that trigger these flags vary with the underlying technology of the cell counter; in most cases, the flags have some level of specificity for the presence of certain abnormal findings. For example, instruments that use light scatter and fluorescence will flag samples that contain cell populations in certain areas of their scattergrams with a blast flag. The presence of one or more flags does not indicate that a specific abnormality or any abnormality has to be present; it only indicates that there is an increased probability of an abnormality that can only be excluded or proven by slide review.

The presence of a flag on a sample, therefore, usually prompts the laboratory to prepare and review a slide with a microscope or a digital imaging device.3 If no qualitative WBC abnormalities are identified on the slide and the quantitative distribution of the normal cell populations on the slide corresponds to the automated differential, the results from the instrument may be released. Otherwise, a full microscopic differential WBC count may be indicated.

The percentage of differential WBC count samples that are flagged by an automated cell counter and submitted for microscopic review is known as the review rate. Factors that influence the review rate include the patient population served (eg, the review rate will be higher in patients with hematologic or oncologic disease because of the immature or abnormal WBCs often observed in these patients), the type of automated cell counter, and the settings of the flagging thresholds of the automated cell counter. The latter variable is most easily influenced by individual laboratories.

Most automated cell counters are installed with factory-set or factory-recommended settings for the thresholds. It is then up to the individual laboratories to adjust the thresholds to the clinical needs of their patients and clinical staff. These adjustments involve a careful balancing of the potential risk of missing any abnormal cells (which favors thresholds set at very low values) and the increase in turnaround time, manual labor, and cost caused by an increase in the review rate (which favors thresholds set at high values). In many cases, laboratories opt either to use the factory-set defaults or to make adjustments over time by a process of trial and error, without a clear understanding of the exact consequences of the adjustments.

Accepted for publication January 25, 2010.

From the Department of Pathology, Columbia University College of Physicians and Surgeons, New York, New York; and the Clinical Laboratory Service, NewYork–Presbyterian Hospital, New York. Dr Schlaberg is now with the Department of Pathology, University of Utah, Salt Lake City.

The authors have no relevant financial interest in the products or companies described in this article.

Reprints: Alexander Kratz, MD, PhD, MPH, Columbia University Medical Center, Core Laboratory, 622 W 168th St, PH3-363, New York, NY 10032 (e-mail: [email protected]).

On the Sysmex XE-2100 and XE-5000 (hereafter, the XE-2100/5000; Sysmex, Kobe, Japan) analyzers, more than 30 different flags are used to indicate the possible presence of qualitative and quantitative abnormalities of red blood cells, WBCs, or platelets in a sample. Our study investigated 5 of those flags that indicate the possible presence of abnormal WBCs and that have user-adjustable thresholds.4,5 Using those 5 flags as a model system, we developed a systematic method for optimizing the thresholds at which the cell counter's flags are triggered.

MATERIALS AND METHODS

The Sysmex XE-2100/5000 Line of Automated Cell Counters

The Sysmex XE-2100/5000 line of automated cell counters establishes a 5-part differential WBC count using an optical technique that includes forward scatter, side scatter, and side fluorescence as well as a method of electric impedance.6,7 An upgraded software package (XE-IG Master, Sysmex) allows users to obtain an extended differential WBC count with the addition of an immature granulocyte count, consisting of promyelocytes, myelocytes, and metamyelocytes, to the standard 5-part differential test.8,9

The 5 WBC-specific flags with user-definable thresholds that we used as our model system are called the (1) blast, (2) immature granulocyte, (3) left shift, (4) atypical lymphocyte, and (5) abnormal lymphocyte/lymphoblast flags. These flags are generated by patterns in the scattergrams that are typical for certain abnormalities. We did not study WBC count flags whose triggering thresholds were not adjustable by the user or flags that indicate possible red blood cell or platelet abnormalities.

Manual Differential WBC Counts

Blood smears were prepared using the SP-1000i slidemaker-stainer (Sysmex; adult samples) or using the push-pull method with a spreader slide (pediatric samples) and stained with Wright-Giemsa. Manual differential WBC counts using the standard microscopic technique followed the laboratory's standard operating procedure, which is based on the Clinical and Laboratory Standards Institute guidelines.10 One hundred WBCs were counted for each manual differential WBC count test.

In our laboratory, slides prepared from flagged samples are scanned by technologists for the presence of one or more abnormal cell populations, as summarized in Table 1. These criteria are based on the recommendations by the International Consensus Group for Hematology Review and modified based on the clinical needs of our institution.11 The modifications were based on an informal survey of the practices of other laboratories, a review of the literature, consultation with the clinical staff, and the experience of our laboratory leadership. If no abnormal cell populations are found and the instrument's differential results appear representative of the relative distribution of cells on the slide, the automated differential WBC count results are released; if one or more of the conditions listed in Table 1 is present, a full microscopic differential WBC count is performed. We used these criteria to identify cases as true-positives (one or more of the criteria was met, and a microscopic differential was necessary) or false-positives (none of the criteria was met, a microscopic differential was not necessary, and the automated differential WBC count result was released).

Study Samples

The study was performed in the Core Laboratory of the Columbia University Medical Center campus of the NewYork–Presbyterian Hospital (New York), a tertiary care, academic medical center serving a large inpatient and outpatient population, including adult and pediatric hematology-oncology services. The study samples were routine patient specimens that had a differential WBC count ordered by the clinician and a microscopic slide reviewed because one or more of the 5 flags of interest was triggered at the factory-default settings.

Statistical Analysis

Statistical analysis was performed using Excel software (Microsoft, Redmond, Washington). Receiver operating characteristic curves were constructed using Stata 10.0 (StataCorp, College Station, Texas) software. The optimal settings for the flagging thresholds were described using positive predictive values (PPVs) and efficiency, and they were derived using the maximized Youden index (YI). The PPV was defined as

PPV = True Positives / All Positives

The YI is a function of both sensitivity and specificity and is used as a summary of the diagnostic effectiveness of an assay at various cutoffs.12 The threshold for an assay when the YI is maximized, therefore, represents the best performance profile of a test, which is the largest vertical distance from the diagonal to the receiver operating characteristic curve. The YI can be seen as a simplified measure of area under the receiver operating characteristic curve and a method of minimizing regret in medical decision making.13,14 The YI assumes a value between 0 and 1, with 1 representing the most effective cutoff, and is calculated with the equation

YI = Sensitivity + Specificity − 1

The maximized YI was used to select the optimized thresholds of the 5 flags, both for their specific abnormal findings and for the presence of any abnormalities in the WBC counts. Efficiency is defined as

Efficiency = (True Positives + True Negatives) / All Cases

Efficiency can best be understood as the proportion of samples that an assay (or threshold) correctly classifies as disease or nondisease; that is, the probability that a positive result reflects disease and a negative result reflects no disease. As such, efficiency is a useful measure for comparing 2 assays or various thresholds.15
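The three measures above can be expressed as small functions. The study itself used Excel and Stata; the following Python sketch is purely illustrative, with hypothetical function names and counts taken from a 2x2 table of flag result versus slide-review outcome:

```python
# Hypothetical helpers for the three measures defined above. The study used
# Excel and Stata; names and data layout here are illustrative assumptions.
# tp/fp/tn/fn are counts from a 2x2 table of flag result vs slide review.

def ppv(tp, fp):
    """Positive predictive value: true positives / all positives."""
    return tp / (tp + fp)

def youden_index(tp, fp, tn, fn):
    """YI = sensitivity + specificity - 1 (0 = useless cutoff, 1 = perfect)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def efficiency(tp, fp, tn, fn):
    """Proportion of all samples the cutoff classifies correctly."""
    return (tp + tn) / (tp + fp + tn + fn)
```

For example, a cutoff with 40% sensitivity and 68.4% specificity (the factory-default point described in the Figure) gives youden_index(40, 316, 684, 60) ≈ 0.084.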

Study Design.—Optimization Set.—A group of 502 specimens that had been flagged by one or more of the 5 flags of interest when thresholds were set at factory-default levels was used as the optimization set. The numeric value of each of the 5 differential WBC count-specific flag variables was extracted from the automated cell counter and incorporated into an Excel spreadsheet.

Table 1. Abnormal Findings Qualifying as True-Positives on Manual Differential White Blood Cell Counts

    Abnormality                                                    Threshold, %
    Blasts, plasma cells, hypersegmented neutrophils (>6 lobes)    >1
    Metamyelocytes and/or myelocytes                               >3
    Atypical lymphocytes                                           >5
    Band forms                                                     >7
    Nucleated red blood cell counts                                >1

Data are modified from the recommendations of the International Consensus Group for Hematology Review.11

Optimization began by raising the cutoff of each flag from the factory-default setting in increments of 10 units and calculating the YI at each level for the identification of the specific abnormality denoted by the flag (eg, the blast flag for blasts). The threshold yielding the highest YI was chosen for each flag.
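This first pass amounts to a simple grid search over cutoffs. A hypothetical Python sketch follows; the function name, data layout, and step count are assumptions, not the authors' code:

```python
# Hypothetical sketch of the first optimization pass: raise a flag's cutoff from
# the factory default in 10-unit increments and keep the cutoff with the highest
# Youden index (YI) for the flag's own abnormality. Assumed data layout:
# each sample is a (flag_value, has_specific_abnormality) pair.

def best_cutoff(samples, default, step=10, n_steps=30):
    best_cut, best_yi = None, -1.0
    for k in range(n_steps + 1):
        cutoff = default + k * step
        tp = sum(1 for v, abn in samples if v >= cutoff and abn)
        fp = sum(1 for v, abn in samples if v >= cutoff and not abn)
        fn = sum(1 for v, abn in samples if v < cutoff and abn)
        tn = sum(1 for v, abn in samples if v < cutoff and not abn)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        yi = sens + spec - 1
        if yi > best_yi:  # keep the first cutoff reaching the top YI
            best_cut, best_yi = cutoff, yi
    return best_cut, best_yi
```

When several cutoffs tie on YI, the sketch keeps the lowest (most sensitive) one, which matches the conservative bias toward not missing abnormal cells.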

The Figure depicts graphically the process of choosing a cutoff that maximizes the YI, using the receiver operating characteristic curve of the abnormal lymphocyte/lymphoblast flag for its abnormality (atypical lymphocytes or lymphoblasts). Note that although the sensitivity of the flag decreases with optimization, the maximized YI represents the optimal balance between true-positives and false-positives.

We next varied the threshold of each flag from the previously optimized point, in 10-unit increments, to find the threshold corresponding to the highest YI for each flag in the presence of any abnormality. The 5 flags were adjusted in the following order: blasts, abnormal lymphocytes/lymphoblasts, immature granulocytes, atypical lymphocytes, and left shift. In addition to maximizing the YI, we also used clinical judgment in this step. For example, we made the clinical decision that we were willing to accept a significant number of missed bands and some missed myelocytes and metamyelocytes, but we would not tolerate missing a single blast.
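The second pass combines the YI search with a hard clinical constraint. One way to express "never miss a blast" in code is to discard any candidate cutoff that would leave a blast-containing sample unflagged; this hypothetical Python sketch (field names are assumptions) illustrates the idea:

```python
# Hypothetical sketch of the second pass: vary the cutoff in 10-unit increments
# around the YI-optimal point, scoring against ANY abnormality, but reject any
# cutoff that would miss a sample containing blasts (the clinical constraint).
# Assumed layout: dicts with keys 'value', 'any_abnormality', 'has_blasts'.

def tune_with_constraint(samples, start, step=10, n_steps=10):
    def yi(cutoff):
        tp = sum(1 for s in samples if s['value'] >= cutoff and s['any_abnormality'])
        fp = sum(1 for s in samples if s['value'] >= cutoff and not s['any_abnormality'])
        fn = sum(1 for s in samples if s['value'] < cutoff and s['any_abnormality'])
        tn = sum(1 for s in samples if s['value'] < cutoff and not s['any_abnormality'])
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        return sens + spec - 1

    def misses_blast(cutoff):
        return any(s['has_blasts'] and s['value'] < cutoff for s in samples)

    candidates = [start + k * step for k in range(-n_steps, n_steps + 1)
                  if not misses_blast(start + k * step)]
    return max(candidates, key=yi)
```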

Validation Set.—We applied the optimized flagging thresholds derived from the optimization set to a new, separate set of 378 samples that had been flagged by at least one of the 5 flags set at factory-default levels. The PPV, the efficiency, and the resultant review rate were calculated for the validation set using the optimized criteria. Additionally, all cases that would have been missed by our new criteria were enumerated and followed up for evaluation.
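Once the cutoffs are frozen, the validation step reduces to recomputing the PPV and review rate on the new sample set. A hypothetical Python sketch (names and data layout are assumptions) of that calculation:

```python
# Hypothetical sketch of the validation step: apply frozen cutoffs to a new
# sample set and report the PPV and the review rate. A sample is flagged if
# any of its flag values meets or exceeds that flag's optimized cutoff.

def validate(flag_values, truth, cutoffs, total_cbc_count):
    """flag_values: per-sample dicts {flag_name: value}; truth: parallel list
    of bools (slide review confirmed an abnormality); cutoffs: {flag_name:
    optimized cutoff}; total_cbc_count: all CBCs with differentials run."""
    flagged = [any(v[f] >= c for f, c in cutoffs.items()) for v in flag_values]
    tp = sum(1 for f, t in zip(flagged, truth) if f and t)
    fp = sum(1 for f, t in zip(flagged, truth) if f and not t)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    review_rate = sum(flagged) / total_cbc_count
    return ppv, review_rate
```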

Additional Blast Cases.—To increase the number of cases with actual leukemic blasts in the differential WBC count tests, we collected additional samples, flagged at factory-set criteria, which were ultimately confirmed by manual differential tests to harbor more than 1% blasts. A total of 14 cases were recruited, and the values of each of the 5 flags were recorded to assess whether the case would have been detected by optimized criteria.

RESULTS

Performance of the 5 Flags at Factory Default Settings

The abnormality-specific PPV of each of the 5 flags was between 5.4% (PPV of the blast flag for the presence of blasts) and 33% (PPV of the immature granulocyte flag for the presence of myelocytes and/or metamyelocytes) (Table 2). When we considered the PPV of each flag for any abnormality, the PPV ranged from 8.6% (PPV of the abnormal lymphocyte/lymphoblast flag for any WBC abnormal finding on manual differential) to 64% (PPV of the blast flag for any WBC abnormal finding). The combined PPV for any abnormal finding of any one or more of the 5 flags at factory-default thresholds was 23%. Sensitivity and the negative predictive values could not be assessed because of a lack of cases negative at factory-default settings (a consequence of our sampling scheme).

Optimized Thresholds

The thresholds for the 5 flags were optimized for detection of their specific abnormalities, as well as for detection of any abnormality, as described in "Materials and Methods," to the values shown in Table 3.

The results of optimization on the abnormality-specific PPV of each flag and the PPV of each flag for any abnormality are given in Table 2 for comparison to factory-default thresholds. The abnormality-specific PPV of all but the atypical lymphocyte flag was improved, whereas all overall PPVs were improved by optimization. The overall PPV of the 5 flags for any abnormality increased to 31%, and the efficiency of flagging overall was 52%. Additionally, the flagging efficiencies of each flag for its specific abnormality and for any abnormality were calculated, and these data are also summarized in Table 2.

Validation of the Optimized Settings

A second, independent set of 378 samples was used to validate the optimized settings. Application of the optimized thresholds instead of the factory-default settings to these samples would have reduced the number of false-positive samples from 275 to 106 (Table 4). The overall PPV of the 5 flags for any abnormality in the differential white blood cell count increased from 27% at factory-default settings to 37% with the optimized thresholds.

Figure. A representative receiver operating characteristic (ROC) curve demonstrating the process of optimizing the flagging thresholds by maximizing the Youden index (YI). The ROC curve of the abnormal lymphocyte/lymphoblast flag for the detection of the abnormalities for which it is named (atypical lymphocytes and/or lymphoblasts) is shown; each point on the curve represents a different threshold for the flag. The area under the curve is 0.6108. At the factory-default threshold of 99, the sensitivity and specificity of the flag for its abnormality are 40% and 68.4%, respectively, with a resultant YI of 0.084. The highest YI is achieved at a cutoff of 200, with a sensitivity of 30% and a specificity of 95.34%. The point of maximized YI is also equivalent to the point on the curve with the largest perpendicular distance from the diagonal. This ROC curve is derived from the data collected and, therefore, does not include samples that were negative for all flags at factory settings (as at least one flag had to be triggered for the sample to be included in the study).


Effects of the Adjustment of the Flagging Thresholds on the Review Rate

The review rate in the validation set for the 5 flags studied was 6.5% of all complete blood cell counts with differentials when the factory-default thresholds were used. When the optimized thresholds were applied, the review rate for the 5 flags of interest dropped to 2.9%. These data are summarized in Table 4.

Clinical Effect of False-Negative Cases

The breakdown of abnormalities observed in the manual differential WBC counts of the validation set (n = 378) was 4 samples (1.1%) with blasts, 48 samples (12.7%) with more than 3% myelocytes and/or metamyelocytes, 51 samples (13.5%) with more than 5% atypical lymphocytes, and 50 samples (13.2%) with more than 7% bands. Many of these samples had more than one flag; 103 different samples (27.2%) had one or more flags. Use of the optimized thresholds resulted in 41 samples (10.8%) in the validation set that were flagged by the factory-default settings and truly harbored significant pathology but were missed by the optimized thresholds (false-negatives). No cases of blasts were missed. The abnormalities present in these cases are summarized in Table 5.

Additional Blast Cases

Fourteen additional cases flagged at factory-default settings and proven, by manual differential WBC count, to harbor blast cells were analyzed. All 14 cases were detected by our optimized criteria; 12 (86%) by the blast flag and 2 (14%) by a combination of the other 4 user-adjustable flags. Optimization did not result in any cases of missed blasts.

DISCUSSION

In this study, we have described a method for optimizing flagging thresholds on an automated cell counter that allows laboratories to safely reduce the number of differential WBC counts that require preparation of a slide (the review rate). Through a combination of a maximized YI and clinical judgment, we were able to adjust the thresholds of 5 flags on the Sysmex XE-2100/5000 line of hematology analyzers and reduce the review rate from those 5 flags from 6.5% to 2.9%.

Overall, the optimized thresholds resulted in an improved PPV of each flag for either its particular abnormality or for any abnormality in the differential WBC count. The exception to this was the atypical lymphocyte flag, whose PPV decreased slightly in the optimization process. The generally low PPV of the blast flags (blast and abnormal lymphocyte/lymphoblast) for their specific abnormalities in both factory-default and optimized settings is necessary, given the clinical need to detect all cases of blasts, which requires relatively nonspecific flagging. Interestingly, although each flag is generally a poor predictor of its specific abnormality (ie, the abnormality after which it is named), the flags (with the exception of the atypical lymphocyte flag) have good PPVs for detection of any abnormality.

Comparisons of the efficiency rates of flagging reported in the literature on automated cell counters are inherently difficult because of different definitions of clinically significant abnormalities and true-negatives and the use of different types and models of automated cell counters in different patient populations. With these limitations, the efficiency rates of the differential WBC count flags in our study are very similar to those reported by Lacombe and colleagues16 for the Cobas Argos 5 Diff (ABX/Roche Hematology Division, Montpellier, France) and the Technicon H2 (Technicon Instruments, Tarrytown, New York), and by Ruzicka and coworkers17 for Sysmex XE-2100 instruments.

Previous studies18,19 have shown that flagging sensitivity is dependent on the total WBC count, with a lower sensitivity in leukopenic samples and a lower specificity in samples with WBC counts greater than 10 000/µL. However, other studies17 have shown only a mild effect of WBC count on overall efficiency. We did not examine this potential confounder.

Table 3. Flag Thresholds for Factory-Default and Optimized Settings

    Flag                               Factory Default    Optimized
    Blast                              99                 200
    Immature granulocyte               159a               250
    Left shift                         99                 200
    Atypical lymphocyte                99                 150
    Abnormal lymphocyte/lymphoblast    99                 200

a The factory-default setting for the immature granulocyte flag is 99. However, as a result of previous adjustments by our laboratory, the immature granulocyte flag threshold was set at 159 at the time the study was initiated.

Table 2. Number of False-Positive Samples and the Positive Predictive Values (PPVs) of Each Flag for Its Specific Abnormal Finding (Optimization Set, n = 502)

    Category                                 Blast    Immature          Left Shift    Atypical          Abnormal Lymphocyte/
                                             Flag     Granulocyte Flag  Flag          Lymphocyte Flag   Lymphoblast Flag
    Factory-set thresholds
      False-positives, No.                   53       163               84            72                149
      Abnormality-specific PPV, %            5.4      24                33            13                7.4
      Overall PPV, %                         64       37                45            19                8.6
    Optimized thresholds
      False-positives, No.                   35       111               44            34                22
      Abnormality-specific PPV, %            7.9      29                41            11                29
      Abnormality-specific efficiency, %     93       76                85            89                91
      Overall PPV, %                         71       45                52            21                29
      Overall efficiency, %                  80       74                77            72                74

Although the adjusted thresholds afforded a decrease in the number of false-positive flags, there were cases with abnormalities that were missed by our new criteria, which would have been flagged by factory-default settings. Most of the missed cases had either more than 5% atypical lymphocytes (n = 26) or more than 7% bands (n = 9). Studies indicate that "the band count is a nonspecific, inaccurate, and imprecise laboratory test," with a review of the literature providing "little support for clinical utility of the band count in patients over 3 months of age."20(p101) As our laboratory routinely prepares slides for all newborns, we were not concerned that underreporting of band forms because of changes in our flagging thresholds would have an adverse clinical effect.

The difficulty in correctly classifying lymphocyte findings either as within reference range or as atypical was pointed out in 1977 by Koepke.21 Using data based on proficiency-sample glass slides sent to more than 4000 laboratories, he reported a coefficient of variation of 88% for the atypical lymphocyte count.21 More recently, van der Meer and colleagues22,23 sent PowerPoint presentations of WBCs to 671 technologists at 114 hospital laboratories. That study22 also found significant interobserver variability in the classification of lymphocytes as atypical or within reference range. Furthermore, when the same cell was shown twice in the PowerPoint presentation, it was classified by 210 of the 617 observers (34%) as a different subtype.22,23 Because of the limited reproducibility of the atypical lymphocyte count, we felt that missing some cases with increased numbers of atypical lymphocytes was acceptable.

Although no cases of more than 1% blasts were missed by our optimized settings in our validation set, we were concerned that we did not have a sufficient number of cases to adequately test the new criteria. For that reason, an additional 14 cases, flagged by factory-default criteria and confirmed to harbor blasts by manual differential, were analyzed. No cases of blasts would have been missed using the optimized criteria, although the optimized blast flag only detected 12 of the 14 cases. The additional 2 cases were detected by a combination of the 4 remaining flags. We conclude that our optimized thresholds are, at minimum, no worse at the detection of blasts than the factory-default settings.

We were concerned, however, about the 10 cases of increased myeloid progenitors missed by our optimized criteria. The percentage of immature granulocytes is a reproducible parameter and is important in the diagnosis of many disease states.9 Review of patient histories showed that 6 of the missed cases represented acute infectious processes (cryptococcal meningitis, methicillin-susceptible Staphylococcus aureus bacteremia, infectious diarrheal disease, pediatric sepsis, and 2 cases of urinary tract infection in immunocompromised hosts). Follow-up on 3 additional samples revealed 1 patient who was recovering from extensive excision of a facial squamous cell carcinoma, 1 patient with sickle-cell disease and pain crisis, and 1 patient with a new-onset pericarditis of hitherto undefined etiology. The final case was one of previously diagnosed chronic myeloid leukemia. The accurate enumeration of myeloid precursors was clinically important in these cases.

The goal of our protocol was to improve the PPV of our system of flags, thereby reducing the number of manual differential WBC counts performed on false-positive specimens. The number of missed immature myeloid cells is a consequence and limitation of our method of optimization. Use of the maximized YI optimizes the relationship between true-positives and false-positives, thereby improving the PPV. However, that improved PPV came at the expense of a decrease in negative predictive value (NPV), particularly in the area of immature myeloid precursors, such as myelocytes and metamyelocytes. Our analysis was limited to improving the PPV because our data set included only samples flagged at factory-default settings, thereby precluding estimation of a true baseline measure of the NPV.
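The Youden-index optimization referred to above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the flag values and truth labels below are entirely hypothetical stand-ins for an instrument's flag parameter and the slide-review reference result.

```python
# Sketch of threshold selection by maximizing the Youden index
# (J = sensitivity + specificity - 1). All data here are hypothetical.

def youden_index(threshold, values, truth):
    """Compute J for a flagging rule 'value >= threshold'."""
    tp = sum(1 for v, t in zip(values, truth) if v >= threshold and t)
    fn = sum(1 for v, t in zip(values, truth) if v < threshold and t)
    fp = sum(1 for v, t in zip(values, truth) if v >= threshold and not t)
    tn = sum(1 for v, t in zip(values, truth) if v < threshold and not t)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens + spec - 1.0

def best_threshold(candidates, values, truth):
    """Return the candidate threshold with the largest Youden index."""
    return max(candidates, key=lambda c: youden_index(c, values, truth))

# Hypothetical flag readings and slide-review outcomes:
flag_values = [10, 20, 30, 40, 50, 60]
has_abnormality = [False, False, False, True, True, True]
print(best_threshold([10, 20, 30, 40, 50], flag_values, has_abnormality))  # → 40
```

In this toy case a threshold of 40 separates the two groups perfectly (J = 1); on real flagged samples the maximum J is below 1, and the chosen cutoff trades some sensitivity (lower NPV) for specificity (higher PPV), exactly the tradeoff discussed above.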

We considered reducing our immature granulocyte flag back to factory-default settings, thereby reducing the number of missed myeloid progenitors from 10 to 5. Doing so would increase the number of false-positives in our sample from 106 to 113 and the review rate from 2.9% to 3.1%, not a substantial increase. Four of the other 5 missed cases would be detected only by reducing the left shift flag to the factory-default setting. However, doing so would increase our false-positive rate to 148 from 106 and, consequently, increase our review rate to 4% while decreasing the PPV to 34.1% and the efficiency to 54%. The last missed case would only have been detected by decreasing the blast flag to 110.
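The review-rate and PPV arithmetic behind these tradeoffs is straightforward to verify. Note that the total number of screened samples and the true-positive counts are not stated in this excerpt; the figures 5815 (total screened) and 62 (true-positives at optimized thresholds) used below are hypothetical values chosen only to be consistent with the reported 2.9% review rate and 37% PPV.

```python
# Back-of-envelope check of the review-rate/PPV tradeoff.
# 5815 and 62 are ASSUMED values, inferred to match the reported figures,
# not numbers taken from the paper.

def review_rate(true_pos, false_pos, total_screened):
    """Percentage of all screened samples sent for manual slide review."""
    return 100.0 * (true_pos + false_pos) / total_screened

def ppv(true_pos, false_pos):
    """Positive predictive value of the flagging system, in percent."""
    return 100.0 * true_pos / (true_pos + false_pos)

# Optimized thresholds: 106 false-positives.
print(round(review_rate(62, 106, 5815), 1))  # → 2.9
print(round(ppv(62, 106)))                   # → 37

# Reverting the immature granulocyte flag alone: 113 false-positives,
# with 5 additional true-positives recovered.
print(round(review_rate(62 + 5, 113, 5815), 1))  # → 3.1
```

The same two functions reproduce the modest increase to a 3.1% review rate when only the immature granulocyte flag is reverted, illustrating why that option is far cheaper than reverting the left shift flag as well.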

An additional mechanism by which missed myeloid progenitors might be avoided is the concurrent introduction of the XE-IG master software with the new flagging system. That software allows the reporting of a parameter called immature granulocytes (promyelocytes, myelocytes, and metamyelocytes) directly from the analyzer, without a slide review.9,24 Additional studies are required to validate the use of that technology with our optimized flagging settings.

The optimized flagging criteria described here reduce the review rate attributable to the 5 flags studied from 6.5% to 2.9%.

Table 5. False-Negative Samples Resulting From Optimized Criteria in the Validation Set (n = 378)

Abnormality                               Samples, No.   Mean, %   Range, %
>7% bands                                 10             14.75     8–24
>5% atypical lymphocytes                  28             9.52      5–33
>3% metamyelocytes and/or myelocytes      6              4.50      3–6

Table 4. False-Positives, Positive Predictive Values (PPV), Review Rate, and Efficiency of the 5 Flags in the Validation Set (n = 378)

Threshold         False-Positives, No.   Overall PPV, %   Review Rate, %   Overall Efficiency, %
Factory default   275                    27               6.5              NA
Optimized         106                    37               2.9              61

Abbreviation: NA, not available.



Further studies will be necessary to similarly decrease the review rates from other user-adjustable flags on our analyzers.

In conclusion, we have developed a method for optimizing the thresholds for quantitative flags on automated cell counters and for reducing review rates. We improved the overall PPV of each flag for any abnormal finding and achieved flagging efficacies similar to those of studies using other analyzers. Although this study was performed on the Sysmex XE-2100/5000 line of analyzers, the overall approach to optimization can be used on any hematology analyzer that uses quantitative flagging criteria.

The authors thank Barbara J. Connell, MS, MT SH(ASCP), for advice and careful reading of the manuscript. This work is dedicated to the memory of Daniel J. Fink, MD, MPH, who initiated the research that culminated in this article.

References

1. Pierre RV. Peripheral blood film review: the demise of the eyecount leukocyte differential. Clin Lab Med. 2002;22(1):279–297.

2. Hoyer JD. Leukocyte differential. Mayo Clin Proc. 1993;68(10):1027–1028.

3. Kratz A, Bengtsson HI, Casey JE, et al. Performance evaluation of the CellaVision DM96 system: WBC differentials by automated digital image analysis supported by an artificial neural network. Am J Clin Pathol. 2005;124(5):770–781.

4. Briggs C, Harrison P, Grant D, Staves J, Chavada N, Machin SJ. Performance evaluation of the Sysmex XE-2100TM, automated haematology analyser. Sysmex J Int. 1999;9(2):113–119.

5. Gould N, Connell B, Dyer K, Richmond T. Performance evaluation of the Sysmex XE-2100, automated hematology analyzer. Sysmex J Int. 1999;9(2):120–128.

6. Fujimoto K. Principles of measurement in hematology analyzers manufactured by Sysmex Corporation. Sysmex J Int. 1999;9(1):31–40.

7. Hiroyuki I. Overview of automated hematology analyzer XE-2100. Sysmex J Int. 1999;9(1):58–64.

8. Briggs C, Kunka S, Fujimoto H, Hamaguchi Y, Davis BH, Machin SJ. Evaluation of immature granulocyte counts by the XE-IG master: upgraded software for the XE-2100 automated hematology analyzer. Lab Hematol. 2003;9(3):117–124.

9. Ansari-Lari MA, Kickler TS, Borowitz MJ. Immature granulocyte measurement using the Sysmex XE-2100: relationship to infection and sepsis. Am J Clin Pathol. 2003;120(5):795–799.

10. National Committee for Clinical Laboratory Standards. Reference Leukocyte Differential Count (Proportional) and Evaluation of Instrumental Methods. Villanova, PA: NCCLS; 1992. Approved NCCLS document H20-A.

11. Barnes PW, McFadden SL, Machin SJ, Simson E. The international consensus group for hematology review: suggested criteria for action following automated CBC and WBC differential analysis. Lab Hematol. 2005;11(2):83–90.

12. Schisterman EF, Perkins NJ, Liu A, Bondell H. Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology. 2005;16(1):73–81.

13. Hilden J, Glasziou P. Regret graphs, diagnostic uncertainty and Youden's Index. Stat Med. 1996;15(10):969–986.

14. Pekkanen J, Pearce N. Defining asthma in epidemiological studies. Eur Respir J. 1999;14(4):951–957.

15. John R, Lifshitz MR, Jhang J, Fink DJ. Post-analysis: medical decision-making. In: McPherson RA, Pincus MR, eds. Henry's Clinical Diagnosis and Management by Laboratory Methods. 21st ed. Philadelphia, PA: Elsevier; 2007:68–75.

16. Lacombe F, Cazaux N, Briais A, et al. Evaluation of the leukocyte differential flags on an hematologic analyzer: the Cobas Argos 5 Diff. Am J Clin Pathol. 1995;104(5):495–502.

17. Ruzicka K, Veitl M, Thalhammer-Scherrer R, Schwarzinger I. The new hematology analyzer Sysmex XE-2100: performance evaluation of a novel white blood cell differential technology. Arch Pathol Lab Med. 2001;125(3):391–396.

18. Korninger L, Mustafa G, Schwarzinger I. The haematology analyser SF-3000: performance of the automated white blood cell differential count in comparison to the haematology analyser NE-1500. Clin Lab Haematol. 1998;20(2):81–86.

19. Thalhammer-Scherrer R, Knobl P, Korninger L, Schwarzinger I. Automated five-part white blood cell differential counts: efficiency of software-generated white blood cell suspect flags of the hematology analyzers Sysmex SE-9000, Sysmex NE-8000, and Coulter STKS. Arch Pathol Lab Med. 1997;121(6):573–577.

20. Cornbleet PJ. Clinical utility of the band count. Clin Lab Med. 2002;22(1):101–136.

21. Koepke JA. A delineation of performance criteria for the differentiation of leukocytes. Am J Clin Pathol. 1977;68(1)(suppl):202–206.

22. van der Meer W, Scott CS, de Keijzer MH. Automated flagging influences the inconsistency and bias of band cell and atypical lymphocyte morphological differentials. Clin Chem Lab Med. 2004;42(4):371–377.

23. van der Meer W, van Gelder W, de Keijzer R, Willems H. The divergent morphological classification of variant lymphocytes in blood smears. J Clin Pathol. 2007;60(7):838–839.

24. Briggs C, Kunka S, Pennaneach C, Forbes L, Machin SJ. Performance evaluation of a new compact hematology analyzer, the Sysmex pocH-100i. Lab Hematol. 2003;9(4):225–233.
