When is Bigger Better? Criteria for Stimulus Size in Perimetry. Paul H Artes, PhD. IPS Melbourne, January 2012

When is Bigger Better: Criteria for Stimulus Size in Perimetry


DESCRIPTION

Talk by Paul H Artes at the 2012 Imaging & Perimetry Society (IPS) Meeting in Melbourne, Australia.


Page 1: When is Bigger Better: Criteria for Stimulus Size in Perimetry

When is Bigger Better? Criteria for Stimulus Size in Perimetry
Paul H Artes, PhD
IPS Melbourne, January 2012

Page 2: When is Bigger Better: Criteria for Stimulus Size in Perimetry

“Two Men Contemplating the Moon”, Caspar David Friedrich, Dresden, 1820

Page 3: When is Bigger Better: Criteria for Stimulus Size in Perimetry

“Two Men Contemplating the Moon”, Caspar David Friedrich, Dresden, 1820

0.5 degrees

Page 4: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Standard Automated Perimetry (III)

Insensitive with early damage
Highly variable with moderate damage
Limited dynamic range with severe damage

Page 5: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Standard Automated Perimetry (III)

Insensitive with early damage
Highly variable with moderate damage
Limited dynamic range with severe damage

Many alternatives have been developed
None have been unequivocally “better” than SAP III
HRP, Matrix, SWAP, …

Page 6: When is Bigger Better: Criteria for Stimulus Size in Perimetry

CDF, 1830

Page 7: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Theoretical design considerations

Better match with visual system
Size of receptive fields
Structure of stimulus
Spatial summation

testable in small-scale studies

Closer correlation with structure
Lower variance between controls
Lower variability over time
Larger useful dynamic range
Higher signal/noise ratio

real-world evidence

Higher sensitivity & specificity to damage, and to change over time.

Page 8: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Theoretical design considerations

Better match with visual system
Size of receptive fields
Structure of stimulus
Spatial summation

testable in small-scale studies

Closer correlation with structure
Lower variance between controls
Lower variability over time
Larger useful dynamic range
Higher signal/noise ratio

real-world evidence

Higher sensitivity & specificity to damage, and to change over time.

Page 9: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Theoretical considerations

Better match with visual system
Size of receptive fields
Structure of stimulus
Spatial summation

testable in small-scale studies

Closer correlation with structure
Lower variance between controls
Lower variability over time
Larger useful dynamic range
Higher signal/noise ratio

real-world evidence

Higher sensitivity & specificity to damage, and to change over time.

Page 10: When is Bigger Better: Criteria for Stimulus Size in Perimetry

“Wreck in Moonlight”, 1835

Page 11: When is Bigger Better: Criteria for Stimulus Size in Perimetry

What specific properties do we need to demonstrate in a “better” test (and how)?

A case study with Matrix Perimetry

Page 12: When is Bigger Better: Criteria for Stimulus Size in Perimetry

The main objective of this study was to compare threshold estimates and their test–retest variability between FDT2 and SAP. To address the methodological issues that arise when tests are compared to an imperfect gold standard, we reduced the effects of variability by examining each patient six times with both techniques. Principal curve analysis [12] (which accounts for variability in both the dependent and independent variables) was then performed to determine the relationship between threshold values obtained with both techniques. The test–retest variability of both techniques was investigated as a function of visual field sensitivity and established in terms of test–retest intervals and standard deviations (SDs). Total and pattern deviation probability maps obtained with both techniques were compared by calculating ordinal defect scores.

METHODS

Description of Techniques

Frequency-Doubling Technology Perimetry. The Humphrey Matrix (FDT2) perimeter presents FDT stimuli on a cathode ray tube with a nominal background luminance of 100 cd/m², mounted at optical infinity from the observer’s eye. With the exception of the foveal stimulus, the stimuli are 5°-square windows of a vertical cosine wave grating with a spatial frequency of 0.5 cyc/deg, counterphase flickered at 18 Hz (Fig. 1). Stimuli are presented for 500 ms, including ramped onsets and offsets of 100 ms. The principles and psychometric properties of the maximum-likelihood strategy (ZEST) used for threshold estimation have been described in detail [11,13–15]. Briefly, a probability density function (PDF) that describes the likely distribution of thresholds is modified according to the patient’s responses to four presentations at each test location, and the threshold is estimated by the mean of the final PDF. Because 2 of the 16 possible combinations of 4 yes/no responses give a similar mean, the threshold estimates assume 15 discrete levels, ranging from 0 to 38 dB. Thresholds are expressed in terms of contrast attenuation in units of decibels (one-twentieth of a logarithmic unit), using the Michelson definition of contrast [(Lmax − Lmin)/(Lmax + Lmin)], where Lmax and Lmin refer to the maximum and minimum luminance of the stimulus, respectively. The test locations of the 24-2 program of the instrument are shown in Figure 2.
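The ZEST description above (a PDF over candidate thresholds, updated after each of four yes/no responses, with the final mean taken as the estimate) can be sketched in a few lines. This is a minimal illustration and not the instrument's implementation: the flat prior, the cumulative-Gaussian psychometric function, its slope, the false-response rates, and the names `p_seen` and `zest` are all assumptions made for the example.

```python
import itertools
import math

# Candidate thresholds in dB; the 0..38 dB range is taken from the text.
DB_LEVELS = list(range(0, 39))


def p_seen(stim_db, threshold_db, slope=2.0, fp=0.03, fn=0.03):
    """Assumed psychometric function: probability of a 'yes' response.

    Higher dB means more contrast attenuation, i.e. a harder stimulus, so the
    stimulus is likely seen when the threshold lies above the stimulus level.
    """
    z = (threshold_db - stim_db) / slope
    core = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return fp + (1.0 - fp - fn) * core


def zest(responses, prior=None):
    """Return the mean of the PDF after a sequence of yes/no responses."""
    pdf = list(prior) if prior else [1.0 / len(DB_LEVELS)] * len(DB_LEVELS)
    for seen in responses:
        # Present the next stimulus at the current mean of the PDF.
        stim = sum(t * p for t, p in zip(DB_LEVELS, pdf))
        # Likelihood of the observed response for every candidate threshold.
        like = [p_seen(stim, t) if seen else 1.0 - p_seen(stim, t)
                for t in DB_LEVELS]
        pdf = [p * l for p, l in zip(pdf, like)]
        total = sum(pdf)
        pdf = [p / total for p in pdf]  # renormalise to a proper PDF
    return sum(t * p for t, p in zip(DB_LEVELS, pdf))


# With four presentations there are 2**4 = 16 possible response sequences,
# so the estimate can take at most 16 distinct values.
estimates = {seq: zest(seq)
             for seq in itertools.product([True, False], repeat=4)}
```

With only four presentations the procedure can return at most 16 distinct values. The text reports 15 levels for the Matrix because two response sequences give a similar mean there; this sketch, with its assumed parameters, need not reproduce that detail.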

Standard Automated Perimetry. SAP was performed with a Humphrey Field Analyzer (HFA; Carl-Zeiss Meditec), using the SITA-Standard program 24-2 with the standard Goldmann size III stimulus (diameter, 0.43°, see Fig. 1). The principles and properties of the SITA Standard strategy have been described previously [1,16–18]. In summary, stimulus intensities are varied in steps of 4 and 2 dB, and the final threshold estimates are obtained after maximum-likelihood calculations based on the patient’s responses and the prior PDFs. In contrast to the threshold strategy of the Humphrey Matrix, the SITA Standard strategy terminates once the threshold of each location has been estimated with a given confidence, after a variable number of presentations. The definition of the dB scale in SAP relates to the brightest stimulus that the instrument is capable of displaying [dB = 10 log(maximum intensity/threshold stimulus intensity)], which, with the HFA, is 3183 cd/m². The test locations of the HFA 24-2 program are shown in Figure 2, alongside those of the corresponding Humphrey Matrix program.
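The two decibel scales described in this excerpt are defined differently, which a short numerical sketch may make concrete. The SAP formula and the 3183 cd/m² maximum are taken from the text. The FDT formula is an assumption: it reads "one-twentieth of a logarithmic unit" as dB = 20 log10(1/contrast). The function names are illustrative only.

```python
import math

HFA_MAX_CD_M2 = 3183.0  # brightest HFA stimulus, as quoted in the text


def michelson_contrast(l_max, l_min):
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (l_max - l_min) / (l_max + l_min)


def fdt_db(contrast):
    """Contrast attenuation in dB.

    Assumed reading of 'one-twentieth of a logarithmic unit':
    dB = 20 * log10(1 / contrast), so 100% contrast maps to 0 dB.
    """
    return -20.0 * math.log10(contrast)


def sap_db(threshold_intensity_cd_m2):
    """SAP sensitivity in dB: 10 * log10(maximum / threshold intensity)."""
    return 10.0 * math.log10(HFA_MAX_CD_M2 / threshold_intensity_cd_m2)


# A grating swinging between 150 and 50 cd/m2 around a 100 cd/m2 mean
# has a Michelson contrast of 0.5.
example_contrast = michelson_contrast(150.0, 50.0)
```

Under these definitions, a tenfold reduction in stimulus strength corresponds to 20 dB on the FDT scale but 10 dB on the SAP scale, one reason the raw dB values of the two instruments are not directly interchangeable.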

Study Sample and Testing

Fifteen patients with glaucoma (mean age, 66.3 years; range, 56.1–80.6) with early to moderate visual field loss (mean MD [SITA Standard], −4.0 dB; range, +0.2 to −16.1) were recruited from the glaucoma clinics of the QEII Health Sciences Centre (Halifax, Nova Scotia, Canada). Criteria for inclusion in the study were a clinical diagnosis of open-angle glaucoma, refractive error within 5 D equivalent sphere or 3 D astigmatism, best-corrected visual acuity >6/12 (<0.3 logMAR), and prior experience with FDT1 perimetry and SAP. Patients were examined over three sessions within a period of 4 weeks. Within each session, the randomly selected study eye was examined twice with FDT2 (24-2 threshold test) and twice with SAP (SITA Standard; 24-2 test). The order of the tests was randomized, and a mandatory break of 6 minutes was given between examinations. All participants wore the appropriate refractive correction for each test. The study adhered to the tenets of the Declaration of Helsinki. The protocol was approved by the Queen Elizabeth II Health Science Centre Research Ethics Committee, and all participants gave written informed consent.

Analyses

Comparison of Threshold Estimates. To establish the relationship between the threshold estimates of FDT2 and SAP, we compared the mean result of the six tests at those locations at which

FIGURE 2. Stimulus locations (right eye) of the 24-2 programs of the HFA (small circles) and the Humphrey Matrix perimeter (squares and large central circle). For the comparison between the threshold estimates, locations were excluded if the stimulus centers of the two techniques were not within 2° of each other (filled circles, shaded squares). Also excluded were the two locations in the vicinity of the blind spot.

FIGURE 1. Spatial and temporal characteristics of the stimuli of FDT2 (left) and SAP (right). With the exception of the foveal stimulus, the stimuli of FDT2 (A) are 5°-square windows of a vertical cosine wave grating with a spatial frequency of 0.5 cyc/deg (B), counterphase flickered at 18 Hz (C). The foveal stimulus is a circular patch with a diameter of 5°. The presentation time is 500 ms, including ramped onsets and offsets of 100 ms. The background luminance is 100 cd/m². The SAP stimuli are circular luminance increments with a diameter of 0.43°, presented for 200 ms with sudden on- and offsets. The background luminance is 10 cd/m². Grayscale levels (A) and amplitudes (B, C) are not to scale.

2452 Artes et al. IOVS, July 2005, Vol. 46, No. 7


Page 13: When is Bigger Better: Criteria for Stimulus Size in Perimetry

What have we learned from Matrix?

the stimulus centers were within 2° of each other (Fig. 2, closed circles and unshaded squares). In addition to the test locations on either side of the vertical meridian, we excluded the two locations near the blind spot.

A principal curve [12] implementation [19] available in the open-source statistical environment R [20] was used to derive the relationship between the two perimetric techniques. This mathematical method finds a best fit between two variables by minimizing the residuals perpendicular to the fitted curve in both the x and y variables. In contrast to the more familiar least-squares type of regression (which minimizes the residuals in the dependent variable only), the principal curve algorithm is not based on the assumption that the independent variable is measured without error, and it is therefore immaterial which one of the two perimetric techniques is represented on the x-axis. To avoid floor effects that occur when the lower limit of the dynamic range of either technique has been reached, we excluded the data from test locations at which one or more of the threshold estimates were 0 dB with either FDT2 or SAP.
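The point that a perpendicular-residual fit does not depend on which technique is placed on the x-axis can be illustrated with a straight-line analogue. The sketch below uses orthogonal (total least-squares) regression as a linear stand-in; it is not the principal curve algorithm used in the study, and the data values are made up for illustration.

```python
import math


def _moments(x, y):
    """Centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares: minimizes residuals in y only."""
    sxx, _, sxy = _moments(x, y)
    return sxy / sxx


def orthogonal_slope(x, y):
    """Total least squares: minimizes residuals perpendicular to the line."""
    sxx, syy, sxy = _moments(x, y)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


# Hypothetical paired thresholds (dB) from two techniques, both with error.
tech_a = [26.0, 27.5, 29.0, 30.0, 31.5, 33.0]
tech_b = [21.0, 25.5, 26.0, 30.5, 32.0, 36.5]

# Swapping the axes inverts the orthogonal slope exactly ...
tls_product = orthogonal_slope(tech_a, tech_b) * orthogonal_slope(tech_b, tech_a)
# ... whereas the two OLS slopes multiply to r squared, which is below 1 here.
ols_product = ols_slope(tech_a, tech_b) * ols_slope(tech_b, tech_a)
```

The orthogonal fit describes the same line whichever variable is called x, whereas the two ordinary least-squares fits disagree whenever the correlation is imperfect.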

Test–Retest Variability. Test–retest intervals describe the range within which the central 90% of follow-up thresholds are likely to fall, for each level of baseline threshold, if no real change has taken place. We derived such intervals for combinations of two baseline and two follow-up tests with FDT2 and SAP. Since there were no significant learning or fatigue effects within or between the three sessions with either of the two techniques (repeated-measures ANOVA of MD and PSD, P > 0.1), we treated the order of the six examinations as interchangeable, thus using all 90 possible combinations of independent baseline and follow-up pairs of tests. For example, a baseline threshold calculated from tests 1 and 2 was compared to all six possible follow-up thresholds obtained by pair-wise combination of tests 3, 4, 5, and 6. The test–retest intervals were then established by calculating the empiric 5th and 95th percentiles of the distribution of follow-up thresholds, stratified for the baseline value.
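The count of 90 combinations follows from choosing 2 of 6 tests as baseline (15 ways) and then 2 of the remaining 4 as follow-up (6 ways). The sketch below enumerates them and derives empirical 5th and 95th percentiles per baseline level. The rounding of baselines to whole dB, the percentile interpolation rule, and the function names are assumptions for illustration, not details given in the text.

```python
from collections import defaultdict
from itertools import combinations


def baseline_followup_pairs(n_tests=6):
    """All (baseline pair, follow-up pair) combinations of disjoint tests."""
    tests = range(1, n_tests + 1)
    pairs = []
    for base in combinations(tests, 2):
        rest = [t for t in tests if t not in base]
        for follow in combinations(rest, 2):
            pairs.append((base, follow))
    return pairs


def percentile(values, q):
    """Empirical percentile with linear interpolation (one common convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def retest_intervals(fields):
    """fields: one list of thresholds (dB) per test, same location order.

    Returns {baseline level: (5th percentile, 95th percentile)} of follow-up
    thresholds, with baselines rounded to whole dB (an assumed stratification).
    """
    by_baseline = defaultdict(list)
    for base, follow in baseline_followup_pairs(len(fields)):
        for loc in range(len(fields[0])):
            b = sum(fields[t - 1][loc] for t in base) / 2.0
            f = sum(fields[t - 1][loc] for t in follow) / 2.0
            by_baseline[round(b)].append(f)
    return {b: (percentile(v, 5), percentile(v, 95))
            for b, v in sorted(by_baseline.items())}


n_combinations = len(baseline_followup_pairs())
```

Here `n_combinations` evaluates to 90, matching the figure quoted in the text, and the pairs with baseline (1, 2) are exactly the six pair-wise combinations of tests 3 to 6.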

For a quantitative comparison of the relationships between threshold and test–retest variability with FDT2 and SAP, we investigated the SD of the six repeated threshold estimates as a function of their mean value at each test location, using linear regression analyses of log SD versus mean. Because the dB scales of FDT2 and SAP are based on two distinct definitions (provided in Description of Techniques), we first transformed all thresholds into instrument-independent units of log Weber contrast sensitivity to render the data numerically comparable. As the dynamic range of both perimeters is limited, the sensitivity of severely damaged visual field locations may not always be truly measurable. Some thresholds may be estimated at 0 dB, since the maximum stimulus contrast has been reached. Because this floor effect can lead to an artifactual decrease in the test–retest variability, the regression analyses were confined to locations at which all six threshold estimates were >0 dB. Similarly, if all six threshold estimates had the same value, the resultant SDs (0) may unduly bias the estimation of the underlying variability, and data from such locations were also excluded from the regression analyses of both techniques.
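The exclusion rules and the regression of log SD on mean threshold can be sketched as follows. This is an illustration under stated assumptions: it uses the sample SD (n − 1 denominator) and base-10 logarithms, omits the conversion to log Weber contrast sensitivity described above, and runs on made-up threshold values.

```python
import math
import statistics


def log_sd_vs_mean(locations):
    """Regress log10(SD) on mean threshold across test locations.

    locations: one list of repeated threshold estimates (dB) per location.
    Locations with any estimate at 0 dB (floor effect) or with SD == 0
    (all estimates identical) are excluded, as described in the text.
    Returns (slope, intercept, number of locations used).
    """
    points = []
    for estimates in locations:
        if min(estimates) <= 0:
            continue  # floor effect: at least one estimate at 0 dB
        sd = statistics.stdev(estimates)  # sample SD, an assumed choice
        if sd == 0:
            continue  # identical estimates would give log(0)
        points.append((statistics.mean(estimates), math.log10(sd)))

    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx, n


# Hypothetical data: six repeated estimates at five locations.
example = [
    [30, 31, 29, 30, 32, 30],
    [20, 24, 17, 22, 19, 25],
    [10, 16, 5, 12, 8, 18],
    [0, 3, 0, 5, 2, 0],        # excluded: contains 0 dB
    [28, 28, 28, 28, 28, 28],  # excluded: SD is 0
]
slope, intercept, n_used = log_sd_vs_mean(example)
```

A negative slope, as in this made-up example, corresponds to the SAP-like pattern of variability increasing as sensitivity falls; a slope near zero would correspond to the pattern reported for FDT2.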

Comparison of Total and Pattern Deviation Probability Maps. For the comparison between the probability maps of FDT2 and SAP, we derived an ordinal defect score for each visual field test. Each test location was assigned a value ranging from 0 to 4 according to its probability (P > 5%, P < 5%, P < 2%, P < 1%, P < 0.5%, respectively) in the printout. These scores were then summed across the 52 test locations of the entire visual field (excluding the foveal test point and the two locations at and above the blind spot), and the global defect sums were compared between total and pattern deviation probability maps of FDT2 and SAP.
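The ordinal defect score amounts to a lookup followed by a sum. A minimal sketch is given below; the string labels used for the probability levels and the function name are hypothetical, chosen only to mirror the five categories listed in the text.

```python
# Ordinal score for each probability level shown in the printout.
# The label strings are hypothetical; only the 0..4 mapping is from the text.
SCORE_BY_LEVEL = {
    "not significant": 0,
    "P < 5%": 1,
    "P < 2%": 2,
    "P < 1%": 3,
    "P < 0.5%": 4,
}

N_LOCATIONS = 52  # fovea and the two blind-spot locations excluded


def defect_sum(levels):
    """Sum the ordinal scores over the probability levels of one test."""
    if len(levels) != N_LOCATIONS:
        raise ValueError("expected one probability level per test location")
    return sum(SCORE_BY_LEVEL[level] for level in levels)


# A made-up field: mostly normal, with a small cluster of depressed locations.
example_field = (["not significant"] * 48
                 + ["P < 5%", "P < 2%", "P < 1%", "P < 0.5%"])
example_score = defect_sum(example_field)
```

With 52 locations scored from 0 to 4, the global defect sum ranges from 0 (no location flagged) to 208 (every location at P < 0.5%).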

RESULTS

The global indices mean deviation (MD) and pattern standard deviation (PSD) obtained with FDT2 and SAP were closely related (r = 0.86, P < 0.001; r = 0.95, P < 0.001; respectively). The mean test time of FDT2 was 313 seconds (range, 302–321), independent of visual field damage as measured by MD (r = 0.05, P = 0.44). In comparison, the mean test time for SAP was 319 seconds (range, 259–413), increasing significantly with visual field damage (r = 0.81, P < 0.001).

Comparison of Threshold Estimates

The measurement scales of FDT2 and SAP appeared numerically similar; 90% of threshold estimates in our study were between 3 and 32 dB with FDT and between 5 and 32 dB with SAP. At test locations with high sensitivity (mean SAP threshold >25 dB), there was a close and approximately linear relationship between the mean thresholds of FDT2 and SAP (Fig. 3). In this range of sensitivities, the principal curve had a slope of approximately 2.0. At locations with lower sensitivity, however, the spread of the data points increased considerably and the curve became progressively shallower. In comparison to FDT2, SAP estimated a slightly larger proportion of absolute defects (proportion of threshold estimates at 0 dB, 3.0% vs. 2.2%, respectively with SAP and FDT2; P < 0.001).

Test–Retest Variability

Test–Retest Intervals of Threshold Values. With FDT2, the width of the test–retest intervals appeared nearly constant (≈8 dB) across virtually the entire measurement range. In contrast, the intervals of SAP were narrow (≈3 dB) with thresholds near 30 dB, but broadened considerably with lower thresholds, ranging over nearly 15 dB at locations with baseline values near 10 dB (Fig. 4).

Parametric Analyses of Test–Retest Variability. In contrast to SAP, which showed a substantial increase in test–retest variability with decreasing levels of visual field sensitivity, there was essentially no such relationship with FDT2. Whereas the relationship between sensitivity and test–retest variability accounted for approximately 40% of the observed variance with SAP, virtually none of the variance observed in the FDT2 data was explained by sensitivity (Fig. 5).

FIGURE 3. Comparison between thresholds of FDT2 and SAP. Each data point shows the mean threshold of six tests with FDT2 and six tests with SAP. Locations affected by floor effects (where one or more of the six threshold estimates were 0 dB) are shown by filled circles. These effects only occurred at locations with mean thresholds <20 and <23 dB with FDT2 and SAP, respectively. The curve shows the relationship between both techniques established by principal curve analysis for which any locations with truncation effects were excluded. The slope of the linear portion of the curve (SAP sensitivities >25 dB) is approximately 2.0.

IOVS, July 2005, Vol. 46, No. 7 Matrix Perimetry 2453

In physiologic range: predictable relation between thresholds of both techniques

The main objective of this study was to compare thresholdestimates and their test–retest variability between FDT2 andSAP. To address the methodological issues that arise when testsare compared to an imperfect gold standard, we reduced theeffects of variability by examining each patient six times withboth techniques. Principal curve analysis12 (which accountsfor variability in both the dependent and independent vari-ables) was then performed to determine the relationship be-tween threshold values obtained with both techniques. Thetest–retest variability of both techniques was investigated as afunction of visual field sensitivity and established in terms oftest–retest intervals and standard deviations (SDs). Total andpattern deviation probability maps obtained with both tech-niques were compared by calculating ordinal defect scores.

METHODS

Description of Techniques

Frequency-Doubling Technology Perimetry. The Hum-phrey Matrix (FTD2) perimeter presents FDT stimuli on a cathode raytube with a nominal background luminance of 100 cd/m2, mounted atoptical infinity from the observer’s eye. With the exception of thefoveal stimulus, the stimuli are 5°-square windows of a vertical cosinewave grating with a spatial frequency of 0.5 cyc/deg, counterphaseflickered at 18 Hz (Fig. 1). Stimuli are presented for 500 ms, includingramped onsets and offsets of 100 ms. The principles and psychometricproperties of the maximum-likelihood strategy (ZEST) used for thresh-old estimation have been described in detail.11,13–15 Briefly, a proba-bility density function (PDF) that describes the likely distribution ofthresholds is modified according to the patient’s responses to fourpresentations at each test location, and the threshold is estimated bythe mean of the final PDF. Because 2 of the 16 possible combinationsof 4 yes/no responses give a similar mean, the threshold estimatesassume 15 discrete levels, ranging from 0 to 38 dB. Thresholds areexpressed in terms of contrast attenuation in units of decibels (one-twentieth of a logarithmic unit), using the Michelson definition ofcontrast [(Lmax ! Lmin)/(Lmax " Lmin)], where Lmax and Lmin refer to the

maximum and minimum luminance of the stimulus, respectively. Thetest locations of the 24-2 program of the instrument are shown inFigure 2.

Standard Automated Perimetry. SAP was performed with aHumphrey Field Analyzer (HFA; Carl-Zeiss Meditec), using the SITA-Standard program 24-2 with the standard Goldmann size III stimulus(diameter, 0.43°, see Fig. 1). The principles and properties of the SITAStandard strategy have been described previously.1,16–18 In summary,stimulus intensities are varied in steps of 4 and 2 dB, and the finalthreshold estimates are obtained after maximum-likelihood calcula-tions based on the patient’s responses and the prior PDFs. In con-trast to the threshold strategy of the Humphrey Matrix, the SITAStandard strategy terminates once the threshold of each locationhas been estimated with a given confidence, after a variable numberof presentations. The definition of the dB scale in SAP relates tothe brightest stimulus that the instrument is capable of displaying[dB # 10 log(maximum intensity/threshold stimulus intensity)],which, with the HFA, is 3183 cd/m2. The test locations of the HFA 24-2program are shown in Figure 2, alongside those of the correspondingHumphrey Matrix program.

Study Sample and Testing

Fifteen patients with glaucoma (mean age, 66.3 years; range, 56.1–80.6) with early to moderate visual field loss (mean MD [SITA Stan-dard], !4.0 dB; range, "0.2 to !16.1) were recruited from the glau-coma clinics of the QEII Health Sciences Centre (Halifax, Nova Scotia,Canada). Criteria for inclusion in the study were a clinical diagnosis ofopen-angle glaucoma, refractive error within 5 D equivalent sphere or3 D astigmatism, best-corrected visual acuity !6/12 ("0.3 logMAR),and prior experience with FDT1 perimetry and SAP. Patients wereexamined over three sessions within a period of 4 weeks. Within eachsession, the randomly selected study eye was examined twice withFDT2 (24-2 threshold test) and twice with SAP (SITA Standard; 24-2test). The order of the tests was randomized, and a mandatory break of6 minutes was given between examinations. All participants wore theappropriate refractive correction for each test. The study adhered tothe tenets of the Declaration of Helsinki. The protocol was approvedby the Queen Elizabeth II Health Science Centre Research EthicsCommittee, and all participants gave written informed consent.

Analyses

Comparison of Threshold Estimates. To establish the relationship between the threshold estimates of FDT2 and SAP, we compared the mean result of the six tests at those locations at which

FIGURE 2. Stimulus locations (right eye) of the 24-2 programs of the HFA (small circles) and the Humphrey Matrix perimeter (squares and large central circle). For the comparison between the threshold estimates, locations were excluded if the stimulus centers of the two techniques were not within 2° of each other (filled circles, shaded squares). Also excluded were the two locations in the vicinity of the blind spot.

FIGURE 1. Spatial and temporal characteristics of the stimuli of FDT2 (left) and SAP (right). With the exception of the foveal stimulus, the stimuli of FDT2 (A) are 5°-square windows of a vertical cosine wave grating with a spatial frequency of 0.5 cyc/deg (B), counterphase flickered at 18 Hz (C). The foveal stimulus is a circular patch with a diameter of 5°. The presentation time is 500 ms, including ramped onsets and offsets of 100 ms. The background luminance is 100 cd/m². The SAP stimuli are circular luminance increments with a diameter of 0.43°, presented for 200 ms with sudden on- and offsets. The background luminance is 10 cd/m². Grayscale levels (A) and amplitudes (B, C) are not to scale.

2452 Artes et al. IOVS, July 2005, Vol. 46, No. 7

Page 14: When is Bigger Better: Criteria for Stimulus Size in Perimetry

What have we learned from Matrix?

the stimulus centers were within 2° of each other (Fig. 2, closed circles and unshaded squares). In addition to the test locations on either side of the vertical meridian, we excluded the two locations near the blind spot.

A principal curve12 implementation19 available in the open-source statistical environment R20 was used to derive the relationship between the two perimetric techniques. This mathematical method finds a best fit between two variables by minimizing the residuals perpendicular to the fitted curve in both the x and y variables. In contrast to the more familiar least-squares type of regression (which minimizes the residuals in the dependent variable only), the principal curve algorithm is not based on the assumption that the independent variable is measured without error, and it is therefore immaterial which one of the two perimetric techniques is represented on the x-axis. To avoid floor effects that occur when the lower limit of the dynamic range of either technique has been reached, we excluded the data from test locations at which one or more of the threshold estimates were 0 dB with either FDT2 or SAP.
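The point about axis symmetry can be illustrated with a straight-line stand-in. The sketch below is not the principal curve algorithm used in the study; it uses orthogonal (total least-squares) regression, which also minimizes perpendicular residuals but fits only a line. The data values and function names are invented for illustration.

```python
from math import sqrt
from statistics import mean


def _moments(xs, ys):
    """Return the centered sums of squares and cross-products (sxx, syy, sxy)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def ols_slope(xs, ys):
    """Ordinary least squares: minimizes residuals in y only."""
    sxx, _, sxy = _moments(xs, ys)
    return sxy / sxx


def orthogonal_slope(xs, ys):
    """Orthogonal regression: minimizes residuals perpendicular to the line."""
    sxx, syy, sxy = _moments(xs, ys)
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    d = syy - sxx
    return (d + sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


if __name__ == "__main__":
    # Hypothetical paired thresholds (dB); not data from the study.
    x = [26.0, 27.5, 29.0, 30.0, 31.5, 33.0]
    y = [22.0, 26.0, 27.0, 31.0, 32.0, 36.5]
    # Swapping the axes inverts the orthogonal slope exactly...
    print(orthogonal_slope(x, y) * orthogonal_slope(y, x))
    # ...whereas the two least-squares slopes multiply to r^2, which is below 1.
    print(ols_slope(x, y) * ols_slope(y, x))
```

With the orthogonal fit, exchanging the axes describes the same line, which is the property the text relies on when it says that the choice of x-axis is immaterial.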

Test–Retest Variability. Test–retest intervals describe the range within which the central 90% of follow-up thresholds are likely to fall, for each level of baseline threshold, if no real change has taken place. We derived such intervals for combinations of two baseline and two follow-up tests with FDT2 and SAP. Since there were no significant learning or fatigue effects within or between the three sessions with either of the two techniques (repeated-measures ANOVA of MD and PSD, P > 0.1), we treated the order of the six examinations as interchangeable, thus using all 90 possible combinations of independent baseline and follow-up pairs of tests. For example, a baseline threshold calculated from tests 1 and 2 was compared to all six possible follow-up thresholds obtained by pair-wise combination of tests 3, 4, 5, and 6. The test–retest intervals were then established by calculating the empiric 5th and 95th percentiles of the distribution of follow-up thresholds, stratified for the baseline value.
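The count of 90 combinations follows from choosing 2 of 6 tests as the baseline (15 ways) and 2 of the remaining 4 as the follow-up (6 ways). A minimal sketch of this enumeration and of the stratified percentiles is given below. Several details are assumptions rather than the authors' method: the pair threshold is taken as the mean of its two tests, baselines are binned to the nearest whole dB, and the percentiles use the default interpolation of Python's `statistics.quantiles`.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, quantiles


def baseline_followup_pairs(n_tests: int = 6):
    """All (baseline pair, follow-up pair) combinations that share no test."""
    tests = range(1, n_tests + 1)
    pairs = []
    for baseline in combinations(tests, 2):
        remaining = [t for t in tests if t not in baseline]
        for follow_up in combinations(remaining, 2):
            pairs.append((baseline, follow_up))
    return pairs


def retest_intervals(locations):
    """Map each baseline level to the (5th, 95th) percentile of follow-up values.

    `locations` is a list with one entry per test location, each entry holding
    the six repeated thresholds (dB) measured at that location.
    """
    by_baseline = defaultdict(list)
    for thresholds in locations:
        for base, follow in baseline_followup_pairs(len(thresholds)):
            b = mean(thresholds[i - 1] for i in base)
            f = mean(thresholds[i - 1] for i in follow)
            by_baseline[round(b)].append(f)  # assumed 1-dB binning
    intervals = {}
    for level, followups in sorted(by_baseline.items()):
        cuts = quantiles(followups, n=20)  # cut points at 5%, 10%, ..., 95%
        intervals[level] = (cuts[0], cuts[-1])
    return intervals


if __name__ == "__main__":
    print(len(baseline_followup_pairs()))  # number of combinations
    # Hypothetical thresholds at two locations; not data from the study.
    print(retest_intervals([[30, 29, 31, 30, 28, 30], [12, 8, 15, 10, 18, 6]]))
```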

For a quantitative comparison of the relationships between threshold and test–retest variability with FDT2 and SAP, we investigated the SD of the six repeated threshold estimates as a function of their mean value at each test location, using linear regression analyses of log SD versus mean. Because the dB scales of FDT2 and SAP are based on two distinct definitions (provided in Description of Techniques), we first transformed all thresholds into instrument-independent units of log Weber contrast sensitivity to render the data numerically comparable. As the dynamic range of both perimeters is limited, the sensitivity of severely damaged visual field locations may not always be truly measurable. Some thresholds may be estimated at 0 dB, since the maximum stimulus contrast has been reached. Because this floor effect can lead to an artifactual decrease in the test–retest variability, the regression analyses were confined to locations at which all six threshold estimates were >0 dB. Similarly, if all six threshold estimates had the same value, the resultant SDs (0) may unduly bias the estimation of the underlying variability, and data from such locations were also excluded from the regression analyses of both techniques.
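The two exclusion rules and the regression of log SD on mean can be sketched as follows. This is a simplified illustration: it works directly on dB values and omits the conversion to log Weber contrast sensitivity, it assumes the sample (n − 1) standard deviation, and the function names and data are hypothetical.

```python
from math import log10
from statistics import mean, stdev


def usable(thresholds) -> bool:
    """Keep a location only if no estimate is at the 0-dB floor and SD > 0."""
    return all(t > 0 for t in thresholds) and len(set(thresholds)) > 1


def log_sd_vs_mean(locations):
    """Least-squares fit of log10(SD) against mean threshold.

    Returns (slope, intercept, number of locations used).
    """
    points = [(mean(t), log10(stdev(t))) for t in locations if usable(t)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two locations with different means")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx, len(points)


if __name__ == "__main__":
    # Hypothetical six-test series per location; not data from the study.
    data = [
        [0, 2, 4, 6, 8, 10],       # excluded: contains a 0-dB estimate
        [20, 20, 20, 20, 20, 20],  # excluded: all estimates identical
        [28, 30, 29, 31, 30, 28],
        [10, 14, 8, 16, 12, 6],
    ]
    print(log_sd_vs_mean(data))
```

A negative slope in such a fit corresponds to variability that grows as sensitivity falls.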

Comparison of Total and Pattern Deviation Probability Maps. For the comparison between the probability maps of FDT2 and SAP, we derived an ordinal defect score for each visual field test. Each test location was assigned a value ranging from 0 to 4 according to its probability (P ≥ 5%, P < 5%, P < 2%, P < 1%, P < 0.5%, respectively) in the printout. These scores were then summed across the 52 test locations of the entire visual field (excluding the foveal test point and the two locations at and above the blind spot), and the global defect sums were compared between total and pattern deviation probability maps of FDT2 and SAP.
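The scoring rule can be written out as a short sketch. It assumes each location's probability is available as a number in percent; in practice the printout shows only the probability category, so the numeric input and the function names here are illustrative.

```python
def location_score(p_percent: float) -> int:
    """Ordinal score: 0 for P >= 5%, then 1 to 4 for P < 5%, < 2%, < 1%, < 0.5%."""
    for score, cutoff in ((4, 0.5), (3, 1.0), (2, 2.0), (1, 5.0)):
        if p_percent < cutoff:
            return score
    return 0


def defect_score(p_percents) -> int:
    """Sum of the ordinal scores across the test locations of one field."""
    return sum(location_score(p) for p in p_percents)


if __name__ == "__main__":
    # Hypothetical field: 52 locations, four of them flagged at different levels.
    field = [10.0] * 48 + [4.0, 1.5, 0.7, 0.1]
    print(defect_score(field))
```

With 52 locations the score ranges from 0 (no location flagged) to 208 (every location at P < 0.5%).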

RESULTS

The global indices mean deviation (MD) and pattern standard deviation (PSD) obtained with FDT2 and SAP were closely related (r = 0.86, P < 0.001; r = 0.95, P < 0.001; respectively). The mean test time of FDT2 was 313 seconds (range, 302–321), independent of visual field damage as measured by MD (r = 0.05, P = 0.44). In comparison, the mean test time for SAP was 319 seconds (range, 259–413), increasing significantly with visual field damage (r = 0.81, P < 0.001).

Comparison of Threshold Estimates

The measurement scales of FDT2 and SAP appeared numerically similar; 90% of threshold estimates in our study were between 3 and 32 dB with FDT and between 5 and 32 dB with SAP. At test locations with high sensitivity (mean SAP threshold >25 dB), there was a close and approximately linear relationship between the mean thresholds of FDT2 and SAP (Fig. 3). In this range of sensitivities, the principal curve had a slope of approximately 2.0. At locations with lower sensitivity, however, the spread of the data points increased considerably and the curve became progressively shallower. In comparison to FDT2, SAP estimated a slightly larger proportion of absolute defects (proportion of threshold estimates at 0 dB, 3.0% vs. 2.2%, respectively with SAP and FDT2; P < 0.001).

Test–Retest Variability

Test–Retest Intervals of Threshold Values. With FDT2, the width of the test–retest intervals appeared nearly constant (~8 dB) across virtually the entire measurement range. In contrast, the intervals of SAP were narrow (~3 dB) with thresholds near 30 dB, but broadened considerably with lower thresholds, ranging over nearly 15 dB at locations with baseline values near 10 dB (Fig. 4).

Parametric Analyses of Test–Retest Variability. In contrast to SAP, which showed a substantial increase in test–retest variability with decreasing levels of visual field sensitivity, there was essentially no such relationship with FDT2. Whereas the relationship between sensitivity and test–retest variability accounted for approximately 40% of the observed variance with SAP, virtually none of the variance observed in the FDT2 data was explained by sensitivity (Fig. 5).

FIGURE 3. Comparison between thresholds of FDT2 and SAP. Each data point shows the mean threshold of six tests with FDT2 and six tests with SAP. Locations affected by floor effects (where one or more of the six threshold estimates were 0 dB) are shown by filled circles. These effects only occurred at locations with mean thresholds <20 and <23 dB with FDT2 and SAP, respectively. The curve shows the relationship between both techniques established by principal curve analysis for which any locations with truncation effects were excluded. The slope of the linear portion of the curve (SAP sensitivities >25 dB) is approximately 2.0.

IOVS, July 2005, Vol. 46, No. 7 Matrix Perimetry 2453

In pathologic range: no discernible relationship between thresholds

The main objective of this study was to compare threshold estimates and their test–retest variability between FDT2 and SAP. To address the methodological issues that arise when tests are compared to an imperfect gold standard, we reduced the effects of variability by examining each patient six times with both techniques. Principal curve analysis12 (which accounts for variability in both the dependent and independent variables) was then performed to determine the relationship between threshold values obtained with both techniques. The test–retest variability of both techniques was investigated as a function of visual field sensitivity and established in terms of test–retest intervals and standard deviations (SDs). Total and pattern deviation probability maps obtained with both techniques were compared by calculating ordinal defect scores.

METHODS

Description of Techniques

Frequency-Doubling Technology Perimetry. The Humphrey Matrix (FDT2) perimeter presents FDT stimuli on a cathode ray tube with a nominal background luminance of 100 cd/m², mounted at optical infinity from the observer’s eye. With the exception of the foveal stimulus, the stimuli are 5°-square windows of a vertical cosine wave grating with a spatial frequency of 0.5 cyc/deg, counterphase flickered at 18 Hz (Fig. 1). Stimuli are presented for 500 ms, including ramped onsets and offsets of 100 ms. The principles and psychometric properties of the maximum-likelihood strategy (ZEST) used for threshold estimation have been described in detail.11,13–15 Briefly, a probability density function (PDF) that describes the likely distribution of thresholds is modified according to the patient’s responses to four presentations at each test location, and the threshold is estimated by the mean of the final PDF. Because 2 of the 16 possible combinations of 4 yes/no responses give a similar mean, the threshold estimates assume 15 discrete levels, ranging from 0 to 38 dB. Thresholds are expressed in terms of contrast attenuation in units of decibels (one-twentieth of a logarithmic unit), using the Michelson definition of contrast [(Lmax − Lmin)/(Lmax + Lmin)], where Lmax and Lmin refer to the maximum and minimum luminance of the stimulus, respectively. The test locations of the 24-2 program of the instrument are shown in Figure 2.
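The contrast definition and decibel scale described above can be sketched in a few lines. The reading of "one-twentieth of a logarithmic unit" as dB = −20 log10(contrast), with 0 dB at 100% contrast, is an assumption made for this illustration, and the function names are hypothetical.

```python
import math


def michelson_contrast(l_max: float, l_min: float) -> float:
    """Michelson contrast: (Lmax - Lmin) / (Lmax + Lmin)."""
    if l_max + l_min <= 0:
        raise ValueError("luminances must sum to a positive value")
    return (l_max - l_min) / (l_max + l_min)


def fdt_db(contrast: float) -> float:
    """Contrast attenuation in dB, assuming 1 dB = 1/20 log unit of contrast."""
    if not 0 < contrast <= 1:
        raise ValueError("contrast must be in (0, 1]")
    return -20.0 * math.log10(contrast)


if __name__ == "__main__":
    # A grating modulated around the 100 cd/m^2 background quoted in the text.
    c = michelson_contrast(l_max=110.0, l_min=90.0)
    print(c, fdt_db(c))
```

Under this reading, a grating at 10% Michelson contrast corresponds to 20 dB, and the top of the quoted range (38 dB) to a contrast of roughly 1.3%.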

Page 15: When is Bigger Better: Criteria for Stimulus Size in Perimetry

What have we learned from Matrix?

0-dB estimates: similar proportions (3.0 vs 2.2%)

Page 16: When is Bigger Better: Criteria for Stimulus Size in Perimetry

What have we learned from Matrix?

The main objective of this study was to compare thresholdestimates and their test–retest variability between FDT2 andSAP. To address the methodological issues that arise when testsare compared to an imperfect gold standard, we reduced theeffects of variability by examining each patient six times withboth techniques. Principal curve analysis12 (which accountsfor variability in both the dependent and independent vari-ables) was then performed to determine the relationship be-tween threshold values obtained with both techniques. Thetest–retest variability of both techniques was investigated as afunction of visual field sensitivity and established in terms oftest–retest intervals and standard deviations (SDs). Total andpattern deviation probability maps obtained with both tech-niques were compared by calculating ordinal defect scores.

METHODS

Description of Techniques

Frequency-Doubling Technology Perimetry. The Hum-phrey Matrix (FTD2) perimeter presents FDT stimuli on a cathode raytube with a nominal background luminance of 100 cd/m2, mounted atoptical infinity from the observer’s eye. With the exception of thefoveal stimulus, the stimuli are 5°-square windows of a vertical cosinewave grating with a spatial frequency of 0.5 cyc/deg, counterphaseflickered at 18 Hz (Fig. 1). Stimuli are presented for 500 ms, includingramped onsets and offsets of 100 ms. The principles and psychometricproperties of the maximum-likelihood strategy (ZEST) used for thresh-old estimation have been described in detail.11,13–15 Briefly, a proba-bility density function (PDF) that describes the likely distribution ofthresholds is modified according to the patient’s responses to fourpresentations at each test location, and the threshold is estimated bythe mean of the final PDF. Because 2 of the 16 possible combinationsof 4 yes/no responses give a similar mean, the threshold estimatesassume 15 discrete levels, ranging from 0 to 38 dB. Thresholds areexpressed in terms of contrast attenuation in units of decibels (one-twentieth of a logarithmic unit), using the Michelson definition ofcontrast [(Lmax ! Lmin)/(Lmax " Lmin)], where Lmax and Lmin refer to the

maximum and minimum luminance of the stimulus, respectively. Thetest locations of the 24-2 program of the instrument are shown inFigure 2.

Standard Automated Perimetry. SAP was performed with aHumphrey Field Analyzer (HFA; Carl-Zeiss Meditec), using the SITA-Standard program 24-2 with the standard Goldmann size III stimulus(diameter, 0.43°, see Fig. 1). The principles and properties of the SITAStandard strategy have been described previously.1,16–18 In summary,stimulus intensities are varied in steps of 4 and 2 dB, and the finalthreshold estimates are obtained after maximum-likelihood calcula-tions based on the patient’s responses and the prior PDFs. In con-trast to the threshold strategy of the Humphrey Matrix, the SITAStandard strategy terminates once the threshold of each locationhas been estimated with a given confidence, after a variable numberof presentations. The definition of the dB scale in SAP relates tothe brightest stimulus that the instrument is capable of displaying[dB # 10 log(maximum intensity/threshold stimulus intensity)],which, with the HFA, is 3183 cd/m2. The test locations of the HFA 24-2program are shown in Figure 2, alongside those of the correspondingHumphrey Matrix program.

Study Sample and Testing

Fifteen patients with glaucoma (mean age, 66.3 years; range, 56.1–80.6) with early to moderate visual field loss (mean MD [SITA Stan-dard], !4.0 dB; range, "0.2 to !16.1) were recruited from the glau-coma clinics of the QEII Health Sciences Centre (Halifax, Nova Scotia,Canada). Criteria for inclusion in the study were a clinical diagnosis ofopen-angle glaucoma, refractive error within 5 D equivalent sphere or3 D astigmatism, best-corrected visual acuity !6/12 ("0.3 logMAR),and prior experience with FDT1 perimetry and SAP. Patients wereexamined over three sessions within a period of 4 weeks. Within eachsession, the randomly selected study eye was examined twice withFDT2 (24-2 threshold test) and twice with SAP (SITA Standard; 24-2test). The order of the tests was randomized, and a mandatory break of6 minutes was given between examinations. All participants wore theappropriate refractive correction for each test. The study adhered tothe tenets of the Declaration of Helsinki. The protocol was approvedby the Queen Elizabeth II Health Science Centre Research EthicsCommittee, and all participants gave written informed consent.

Analyses

Comparison of Threshold Estimates. To establish the re-lationship between the threshold estimates of FDT2 and SAP, wecompared the mean result of the six tests at those locations at which

FIGURE 2. Stimulus locations (right eye) of the 24-2 programs of the HFA (small circles) and the Humphrey Matrix perimeter (squares and large central circle). For the comparison between the threshold estimates, locations were excluded if the stimulus centers of the two techniques were not within 2° of each other (filled circles, shaded squares). Also excluded were the two locations in the vicinity of the blind spot.

FIGURE 1. Spatial and temporal characteristics of the stimuli of FDT2 (left) and SAP (right). With the exception of the foveal stimulus, the stimuli of FDT2 (A) are 5°-square windows of a vertical cosine wave grating with a spatial frequency of 0.5 cyc/deg (B), counterphase flickered at 18 Hz (C). The foveal stimulus is a circular patch with a diameter of 5°. The presentation time is 500 ms, including ramped onsets and offsets of 100 ms. The background luminance is 100 cd/m2. The SAP stimuli are circular luminance increments with a diameter of 0.43°, presented for 200 ms with sudden on- and offsets. The background luminance is 10 cd/m2. Grayscale levels (A) and amplitudes (B, C) are not to scale.

2452 Artes et al. IOVS, July 2005, Vol. 46, No. 7

Comparison of Total and Pattern-Deviation Probability Maps

A comparison of the defect scores from the total and pattern deviation probability maps of FDT2 and SAP showed good overall agreement between both techniques (Fig. 6). In patients with relatively early SAP visual field loss (SAP total deviation defect scores <30), the total deviation probability maps of FDT2 appeared slightly less abnormal than those of SAP, but this finding did not reach statistical significance (P = 0.06, Wilcoxon test). No such systematic differences were seen with the pattern deviation analyses (P = 0.78, Wilcoxon test).

Case Examples

The examples given in Figures 7 and 8 illustrate some of the findings of our study. For clarity, only single test results are shown, but they are representative of all six examinations with both perimetric techniques. The first example (Fig. 7) illustrates discrepancies between the threshold estimates of FDT2 and SAP where the decibel values of the former were consistently higher than those of the latter. In the second example (Fig. 8), discrepancies in the opposite direction were observed in the superior nasal quadrant. At a location in the inferior temporal quadrant, however, the decibel values of FDT2 were consistently higher than those of SAP. In both examples, the probability maps of both techniques agreed closely with each other, although SAP flagged a slightly larger number of locations as outside normal limits.

DISCUSSION

Previous investigations with FDT perimetry have shown that the technique may possess several potential advantages over SAP.8 The second generation of instruments using this technology has now become available, and the objective of this study was to compare test results from FDT2 and SAP in patients with glaucoma. To gain precise estimates, both of thresholds and of test–retest variability, we examined a small group of patients with a rigorous protocol of six examinations with each technique.

For visual field locations with high sensitivity (>25 dB with SAP), our data showed a close association between the pointwise mean thresholds of FDT2 and SAP. Within this range, the relationship appeared nearly linear, with a gradient of 2, conforming to what would be expected from consideration of the techniques’ distinct definitions of the dB scale. With FDT2, a 1-dB change in threshold refers to a 0.05-log unit change in stimulus contrast, whereas with SAP, a 1-dB change refers to a contrast change of 0.1 log units. This means that, at test locations with early damage, visual field changes over time should be numerically twice as large with FDT2 compared with SAP. At visual field locations with lower sensitivity (<25
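The factor of 2 between the two dB scales can be made concrete with a small Python sketch. It uses only the step sizes quoted in the paragraph above (0.05 log units per dB for FDT2, 0.1 log units per dB for SAP); the function names are hypothetical, and the sketch converts dB differences only, without assuming that equal contrast changes are equally detectable with both techniques.

```python
# Log units of stimulus contrast per 1 dB, as quoted in the text.
LOG_UNITS_PER_DB = {"FDT2": 0.05, "SAP": 0.1}


def db_to_log_units(db_change, technique):
    """Express a dB difference in log units of stimulus contrast."""
    return db_change * LOG_UNITS_PER_DB[technique]


def log_units_to_db(log_change, technique):
    """Express a log-unit contrast difference in the technique's dB units."""
    return log_change / LOG_UNITS_PER_DB[technique]


def convert_db(db_change, source, target):
    """Re-express a dB difference measured with one technique in the other's dB units."""
    return log_units_to_db(db_to_log_units(db_change, source), target)


# A 3 dB change with SAP is the same contrast change as about 6 dB with FDT2.
print(round(convert_db(3.0, "SAP", "FDT2"), 6))
```

This is the expected gradient of 2 between FDT2 and SAP decibel values at locations where both scales behave linearly.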

FIGURE 4. Test–retest estimates of FDT2 and SAP. Data points show the 5th and 95th percentiles of the distribution of retest estimates, stratified by mean threshold at baseline. Smooth curves were fitted through the data points by locally weighted regression to approximate the 90% test–retest intervals (solid lines). Dashed diagonal line: line of equality.

FIGURE 5. Parametric estimates of test–retest variability with FDT2 and SAP. Scaling is in instrument-independent units of log Weber contrast sensitivity (CS) with instrument-dependent dB units provided for comparison. The linear curve shows the fitted function log(e) SD = A × log CS + constant. The values for the slope parameter A and its confidence interval are shown in the graph, along with the proportion of variance explained by the relationship (R2). (F) Test locations with floor effects (threshold values of 0 dB, all mean thresholds <20 and <23 dB with FDT2 and SAP, respectively) were excluded from the regression analyses (solid lines). The italic values above the abscissae give the number of data points with SDs of 0, which were also excluded from the regression. A small amount of noise was added to both plots to improve the visibility of overlapping data points.
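The regression described in this caption can be sketched in a few lines of Python. The sketch assumes the fitted function reads as natural log of SD = A × log CS + constant, uses ordinary least squares, and drops points with an SD of 0 because their logarithm is undefined. The function name and the example values are hypothetical and are not data from the study.

```python
import math


def fit_log_sd(log_cs, sd):
    """Least-squares fit of ln(SD) = A * log CS + constant.

    Points with SD <= 0 are skipped, since ln(SD) is undefined for them.
    Returns the slope A and the constant.
    """
    pairs = [(x, math.log(s)) for x, s in zip(log_cs, sd) if s > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two points with SD > 0")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("log CS values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical example: variability that falls as sensitivity rises.
example_log_cs = [0.5, 1.0, 1.5, 2.0]
example_sd = [3.1, 2.4, 2.1, 1.6]
slope, constant = fit_log_sd(example_log_cs, example_sd)
print(round(slope, 3), round(constant, 3))
```

A slope near zero would correspond to variability that is roughly uniform across the range of sensitivities, whereas a strongly negative slope would indicate variability that increases with damage.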

2454 Artes et al. IOVS, July 2005, Vol. 46, No. 7

Uniform  variability  with  Matrix  

Page 17: When is Bigger Better: Criteria for Stimulus Size in Perimetry
Page 18: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Problem
Which test has lower variability?
Which test detects more visual field loss?
Which test demonstrates change better?

Intensity scales are different.
Thresholding algorithms are different.
Normative datasets are different.
There is no ideal reference for loss, or change.

Page 19: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Signal  /  Noise  Analysis  

of loss, the variability of the measurements, and the dynamic range of the technique. A larger SNR therefore does not necessarily mean that one technique is more sensitive than another, nor does a more sensitive technique necessarily provide a larger SNR (Fig. 10).

The signal/noise methodology proposed in this article has several limitations. To obtain robust estimates of signal and noise, multiple tests have to be performed. Nevertheless, a rigorous protocol with at least five examinations per eye has advantages also for the derivation of test–retest intervals, because a large number of combinations of test–retest examinations can be analyzed.26,48 SNRs depend on the sample of patients and therefore cannot be compared across different studies. They also depend on the somewhat arbitrary choice of where in the visual field the signal and noise distributions are derived from. In this study, we used the superior–inferior sectors of the Glaucoma Hemifield Test, and therefore our finding of larger SNRs with FDT2 may strictly apply only to those analyses that make use of a similar clustering. In principle, however, other pairs of test locations or pairs of clusters could be chosen. Finally, SNRs can be estimated only if focal losses are present in the visual field; diffuse reductions in sensitivity do not contribute a signal. As a consequence, the method is unsuitable for evaluating techniques that predominantly uncover diffuse loss.

The assumption that gradients in space can be used as a first approximation for change over time appears reasonable but is, as yet, untested. Signal/noise estimates from test–retest studies will therefore not replace longitudinal studies for investigating new visual field tests’ ability to monitor patients with glaucoma, but they may provide early insight into properties that cannot be gained solely from analyses of test–retest variability. They may help in hypothesis-building and in planning effective longitudinal studies of new visual field tests.

FIGURE 8. Relationship between SNRs with SAP and FDT2. The points labeled A, B, C correspond to the examples in Figures 4, 5, and 6, respectively. Axes are drawn on a square-root scale to emphasize mid-range values.

FIGURE 9. (a) The SNR measured between locations A_t1 and B_t1 measures how reliably the technique represents the gradient of damage in space (vertical arrow). (b) If B deteriorates over time, such that its deviation at B_t2 becomes equal to that of A_t1, the gradient in time between B_t1 and B_t2 is equal to that between A_t1 and B_t1. The ability to detect the change from B_t1 to B_t2 should be similar to that measured by the SNR between A_t1 and B_t1.

TABLE 2. Comparison between SNRs of FDT2 and SAP

SNR     Ratio of SNR (FDT2/SAP; CI)    SNR FDT2 > SAP/total    P
>0.5    1.20 (0.93–1.56)               66/71 (93%)             0.15
>1.0    1.43 (1.05–1.94)               34/50 (68%)             0.03
>2.0    1.39 (1.04–10.6)               22/30 (73%)             0.02

A ratio >1 in column 2 indicates that FDT2 provided a larger SNR than SAP. Column 3 gives the number of pairs in which the SNR of FDT2 was greater than that of SAP. P values and confidence intervals were established by mixed-effects modeling, because each patient contributed five estimates.

FIGURE 10. Relationship between signals of two techniques, A and B, when A is more sensitive than B. (a) Technique A reveals superior loss (signal at nasal step, vertical arrow). There is no signal with B. (b) Technique A reveals more extensive loss than does technique B but has reached the limit of its dynamic range; its signal is small compared to B. (c) Owing to its larger dynamic range, B continues to provide signal even though both superior and inferior sectors are damaged.

4706 Artes and Chauhan IOVS, October 2009, Vol. 50, No. 10

Page 20: When is Bigger Better: Criteria for Stimulus Size in Perimetry

with the median absolute deviation (MAD)39 of the differences (described in the Appendix, along with Bland-Altman comparisons of parametric and nonparametric estimates).

Signal, noise, and signal/noise ratios (SNRs) were then compared between SAP and FDT2, in each of the five pairs of sectors and each of the 15 patients. All analyses were performed in the freely available open-source environment R.40,41 The nlme library42 was used to estimate statistical significance and confidence intervals. Patients were treated as random factors, to adjust for the nonindependence of the five sector pairs within each patient.

RESULTS

The sMD in the 10 visual field sectors ranged from −27.8 to +1.5 dB (median, −2.3 dB) with SAP, and from −26.2 to +3.3 dB (median, −2.4 dB) with FDT2. The relationship between SAP and FDT2 accounted for 69% of the variance in the data (Spearman rank correlation, P < 0.001, Fig. 3), and, for sectors with SAP sMD better than −10 dB, the relationship between SAP and FDT2 appeared linear (Tukey’s test for additivity,43 P = 0.22) with a slope of 2.1 (RMA regression; 95% confidence interval [CI], 1.9–2.3). For sectors with more advanced damage, however, the relationship between the sMDs of both techniques became progressively weaker and deviated significantly from that observed in less damaged sectors (P < 0.001, Tukey’s test). Despite the averaging of six repeated tests with both techniques, the data exhibited a large degree of scatter (Fig. 3).

Signal and noise estimates were derived from the mean and the SD of the superior–inferior differences (Fig. 2), with both FDT2 and SAP. Three selected examples are shown in Figures 4, 5, and 6.
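The derivation of signal and noise from repeated tests can be illustrated with a short Python sketch. It takes the sector mean deviations of a superior and an inferior sector across repeated examinations, forms their differences, and reports the mean (signal), the SD (noise), and their ratio. The function names, the use of the absolute signal in the ratio, and the example values are assumptions made for illustration. The second function is a nonparametric variant based on the median and the MAD; the scaling constant 1.4826, which makes the MAD comparable to an SD under normality, is likewise an assumption rather than a detail taken from the article.

```python
import statistics


def signal_noise(superior, inferior):
    """Parametric estimates: signal = mean, noise = SD of superior-inferior differences."""
    diffs = [s - i for s, i in zip(superior, inferior)]
    signal = statistics.mean(diffs)
    noise = statistics.stdev(diffs)
    if noise == 0:
        raise ValueError("noise is zero; SNR is undefined")
    return signal, noise, abs(signal) / noise


def robust_signal_noise(superior, inferior, scale=1.4826):
    """Nonparametric variant: signal = median, noise = scaled MAD of the differences."""
    diffs = [s - i for s, i in zip(superior, inferior)]
    signal = statistics.median(diffs)
    mad = statistics.median(abs(d - signal) for d in diffs)
    noise = scale * mad
    if noise == 0:
        raise ValueError("noise is zero; SNR is undefined")
    return signal, noise, abs(signal) / noise


# Hypothetical sector mean deviations (dB) from six repeated tests of one eye.
superior_smd = [-8.1, -7.4, -9.0, -8.6, -7.9, -8.4]
inferior_smd = [-2.0, -1.5, -2.6, -1.8, -2.2, -1.9]
print(signal_noise(superior_smd, inferior_smd))
print(robust_signal_noise(superior_smd, inferior_smd))
```

Because both signal and noise are expressed in the instrument's own dB units, their ratio is unit-free, which is what allows SNRs to be compared between techniques with different dB scales.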

There was a moderately close relationship between the signals of both techniques (Fig. 7a, r2 = 0.52, P < 0.001), but no such relationship between the noise estimates (r2 = 0.01, P = 0.16; Fig. 7b).

FIGURE 5. Example B: In this extensively damaged visual field, both SAP and FDT2 revealed a small difference (2.8 and 4.2 dB, respectively) between the superior and inferior paracentral sectors. With SAP, the large variability of both sectors (SD, 3.7 dB) made it difficult to distinguish this signal (SNR, 0.8). With FDT2, the asymmetry was more clearly apparent (SNR, 3.1), chiefly because of lower noise (1.3 dB). Vertical gray bar (noise): as described in Figure 4.

IOVS, October 2009, Vol. 50, No. 10 Signal/Noise Ratios in Perimetry 4703

Page 21: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Page 22: When is Bigger Better: Criteria for Stimulus Size in Perimetry
Page 23: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Similar dynamic range
Similar separation between controls and patients
More uniform variability
Higher cross-sectional signal/noise ratios

Progression:
Goren et al., IPS 2012: similar rates of change
Redmond et al: similar % with significant change

What  have  we  learned  from  Matrix?  

Page 24: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Several desirable properties, but no compelling evidence of overall superiority.
Evidence on change over time is coming along…

Question: Is it time to abandon the search for the “universally better test”?

What  have  we  learned  from  Matrix?  

Page 25: When is Bigger Better: Criteria for Stimulus Size in Perimetry

[Figure: plot with axes "stimulus area, mm^2" (1/16 to 256) and "asb" (1 to 10000); Goldmann sizes I, II, III, IV, and V are marked.]
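The spacing of the Goldmann sizes along the area axis can be sketched as follows. The nominal areas (0.25, 1, 4, 16, and 64 mm^2 for sizes I to V) are the standard Goldmann values, each step being a fourfold increase in area. The angular diameter of 0.43° for size III is the value quoted earlier in the talk's source material, and the diameters of the other sizes are derived here on the assumption that diameter scales with the square root of area. The names are hypothetical.

```python
import math

# Nominal Goldmann stimulus areas in mm^2 (each size is 4x the area of the previous one).
GOLDMANN_AREA_MM2 = {"I": 0.25, "II": 1.0, "III": 4.0, "IV": 16.0, "V": 64.0}

# Angular diameter of the size III stimulus as quoted in the text (degrees).
SIZE_III_DIAMETER_DEG = 0.43


def goldmann_diameter_deg(size):
    """Approximate angular diameter, scaling from size III by the square root of the area ratio."""
    ratio = GOLDMANN_AREA_MM2[size] / GOLDMANN_AREA_MM2["III"]
    return SIZE_III_DIAMETER_DEG * math.sqrt(ratio)


for size, area in GOLDMANN_AREA_MM2.items():
    print(size, area, round(goldmann_diameter_deg(size), 3))
```

On this scaling, each step in Goldmann size doubles the stimulus diameter, so size V is four times as wide as size III and has 16 times its area.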

Page 26: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Page 27: When is Bigger Better: Criteria for Stimulus Size in Perimetry

G  I  

G  V  

Page 28: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Sometimes even God needs stimulus size V.

* Personal communication, Neil Armstrong, July 21, 1969

Page 29: When is Bigger Better: Criteria for Stimulus Size in Perimetry

Thank  you!