Vol. 122 No. 6, June 2004
  Clinical Sciences

Comparison of the GDx VCC Scanning Laser Polarimeter, HRT II Confocal Scanning Laser Ophthalmoscope, and Stratus OCT Optical Coherence Tomograph for the Detection of Glaucoma

Felipe A. Medeiros, MD; Linda M. Zangwill, PhD; Christopher Bowd, PhD; Robert N. Weinreb, MD

Arch Ophthalmol. 2004;122:827-837.

ABSTRACT

Objective  To compare the abilities of current commercially available versions of 3 optical imaging techniques: scanning laser polarimetry with variable corneal compensation (GDx VCC), confocal scanning laser ophthalmoscopy (HRT II [Heidelberg Retina Tomograph]), and optical coherence tomography (Stratus OCT) to discriminate between healthy eyes and eyes with glaucomatous visual field loss.

Methods  We included 107 patients with glaucomatous visual field loss and 76 healthy subjects of a similar age. All individuals underwent imaging with a GDx VCC, HRT II, and fast retinal nerve fiber layer scan with the Stratus OCT as well as visual field testing within a 6-month period. Receiver operating characteristic curves and sensitivities at fixed specificities (80% and 95%) were calculated for parameters reported as continuous variables. Diagnostic categorization (outside normal limits, borderline, or within normal limits) provided by each instrument after comparison with its respective normative database was also evaluated, and likelihood ratios were reported. Agreement on categorization between methods (weighted κ) was assessed.

Results  After the exclusion of subjects with unacceptable images, the final study sample included 141 eyes of 141 subjects (75 with glaucoma and 66 healthy control subjects). Mean ± SD mean deviation of the visual field test result for patients with glaucoma was –4.87 ± 3.9 dB, and 70% of these patients had early glaucomatous visual field damage. No statistically significant difference was found between the areas under the receiver operating characteristic curves (AUCs) for the best parameters from the GDx VCC (nerve fiber indicator, AUC = 0.91), Stratus OCT (retinal nerve fiber layer inferior thickness, AUC = 0.92), and HRT II (linear discriminant function, AUC = 0.86). Abnormal results for each of the instruments, after comparison with their normative databases, were associated with strong positive likelihood ratios. Chance-corrected agreement (weighted κ) among the 3 instruments ranged from moderate to substantial (0.50-0.72).

Conclusions  The AUCs and the sensitivities at high specificities were similar among the best parameters from each instrument. Abnormal results (as compared with each instrument's normative database) were associated with high likelihood ratios and large effects on posttest probabilities of having glaucomatous visual field loss. Calculation of likelihood ratios may provide additional information to assist the clinician in diagnosing glaucoma with these instruments.



INTRODUCTION

Changes in the structural appearance of the optic nerve head and retinal nerve fiber layer (RNFL) often precede the development of visual field loss in glaucoma.1-3 Thus, detection of optic nerve head and RNFL damage is crucial for the diagnosis of glaucoma in its early stages. Until recently, structural evaluation in glaucoma has been subjective, with primarily qualitative descriptions of change. With the emergence of optical imaging instruments, assessment of the optic nerve head and RNFL is objective, providing quantitative information.

Confocal scanning laser ophthalmoscopy, scanning laser polarimetry, and optical coherence tomography are various technologies that make use of the different properties of light and different characteristics of retinal tissue to obtain their measurements.4-10 Studies have compared the ability of earlier versions of these technologies to differentiate patients with glaucoma from healthy subjects.4, 11-24 However, each of these technologies has recently undergone significant hardware and software improvements, and issues that were once limitations for a given technique may no longer be relevant.

For scanning laser polarimetry, the introduction of variable corneal compensation in the GDx VCC (Laser Diagnostic Technologies, Inc, San Diego, Calif) has resulted in improved diagnostic accuracy compared with the earlier version of this instrument, which used fixed corneal compensation.25-28 For optical coherence tomography, the new Stratus OCT (Carl Zeiss Meditec, Inc, Dublin, Calif) includes several improvements compared with the original OCT, including better resolution, an increased number of A-scans, and a reduced need for pupil dilation.29 Also, the Stratus OCT provides information on the probability of abnormality of patient examination results after comparison with an internal normative database. For confocal scanning laser ophthalmoscopy, the HRT II Heidelberg Retina Tomograph (Heidelberg Engineering, Dossenheim, Germany) is designed specifically for imaging of the optic nerve head, with almost completely automatic image acquisition and improved diagnostic accuracy.11, 30

The purpose of this study was to compare, in 1 study population, the ability of current commercially available versions of these 3 technologies to differentiate between healthy eyes and eyes with glaucomatous visual field loss.


METHODS

This observational cross-sectional study included 183 eyes of 183 patients (107 patients with glaucoma and 76 healthy control subjects). All patients were evaluated at the Hamilton Glaucoma Center, University of California, San Diego, from April 2002 to November 2003. These patients were included in a prospective longitudinal study designed to evaluate optic nerve structure and visual function in glaucoma (Diagnostic Innovations in Glaucoma Study). All patients who met the inclusion criteria were enrolled in the current study. Informed consent was obtained from all participants. The University of California, San Diego, Human Subjects Committee approved all protocols, and the methods described adhered to the tenets of the Declaration of Helsinki.

Each subject underwent a comprehensive ophthalmologic examination including a review of the medical history, best-corrected visual acuity measurement, slitlamp biomicroscopy, intraocular pressure measurement using Goldmann applanation tonometry, gonioscopy, dilated fundoscopic examination using a 78D lens, stereoscopic optic disc photography, and automated perimetry using the 24-2 Swedish Interactive Threshold Algorithm (Carl Zeiss Meditec, Inc). To be included, subjects had to have a best-corrected visual acuity of 20/40 or better in the affected eye, spherical refraction within ±5.0 diopters (D) and cylinder correction within ±3.0 D, and open angles on gonioscopy. Eyes with coexisting retinal disease, uveitis, or nonglaucomatous optic neuropathy were excluded from this investigation. One eye of each patient was randomly selected for inclusion in the study.

Normal control eyes had an intraocular pressure of 21 mm Hg or less (with no history of increased intraocular pressure) and a normal visual field test result. A normal visual field was defined as a mean deviation and pattern standard deviation within 95% confidence limits and a glaucoma hemifield test31 result within normal limits. Normal control eyes also had a healthy appearance of the optic disc and RNFL (no diffuse or focal rim thinning, cupping, optic disc hemorrhage, or RNFL defects), as evaluated by clinical examination. Eyes were classified as glaucomatous if they had repeated (2 consecutive) abnormal visual field test results, defined as a pattern standard deviation outside the 95% normal confidence limits or a glaucoma hemifield test result outside normal limits, regardless of the appearance of the optic disc.

All subjects underwent ocular imaging with the GDx VCC, HRT II, and Stratus OCT. For each subject, all ocular imaging and visual field examinations were completed within 6 months.

INSTRUMENTATION AND DATA ANALYSIS

GDx VCC Scanning Laser Polarimeter

All patients underwent imaging using a commercially available scanning laser polarimeter (GDx VCC; software version 5.0.1; Laser Diagnostic Technologies, Inc). The general principles of scanning laser polarimetry have been described in detail elsewhere.17 The GDx VCC is a modified scanning laser polarimetry system with variable corneal compensation. Images of the ocular fundus are formed by scanning the beam of a near infrared laser (780 nm) in a raster pattern. The scan raster covers an image field 40° horizontally and 20° vertically, including both the parapapillary and macular regions of the eye. With the GDx VCC, the method of variable corneal polarization compensation, as described by Zhou and Weinreb,25 has been automated and has replaced the original fixed corneal compensator. The variable corneal compensator in this system consists of 2 identical linear retarders in rotating mounts so that both the retardation and axis of the unit can be adjusted according to requirements. To measure eye-specific corneal polarization axis and magnitude, scanning laser polarimetry images of the macula are first acquired without compensation (the retardation is set to 0). The uniform radial birefringence of the Henle fiber layer in the macula is used as an "intraocular polarimeter," and both the Henle fiber layer and corneal retardation can be determined from the macular retardation profile. Next, corneal birefringence–compensated scanning laser polarimetry images are obtained using the appropriate eye-specific corneal polarization axis and magnitude values by adjusting the variable corneal compensation retarders. The GDx VCC measures retardation in nanometers. To simplify communications, retardation values are converted into thickness values (micrometers) using a fixed conversion factor of 0.67 nm/µm.32
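The retardation-to-thickness conversion described above can be sketched as follows. This is a minimal illustration, assuming only the fixed factor of 0.67 nm/µm quoted in the text; the function name and the example reading are hypothetical and are not taken from the GDx VCC software.

```python
# Fixed conversion factor quoted in the text: 0.67 nm of retardation per
# micrometer of RNFL thickness.
NM_PER_UM = 0.67


def retardation_to_thickness_um(retardation_nm: float) -> float:
    """Convert a retardation value (nm) to an estimated RNFL thickness (um)."""
    if retardation_nm < 0:
        raise ValueError("retardation must be non-negative")
    return retardation_nm / NM_PER_UM


# Hypothetical retardation reading, for illustration only.
example_thickness = retardation_to_thickness_um(33.5)
print(f"{example_thickness:.1f} um")
```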

In this study, a baseline image was automatically created from 3 images obtained for each subject. Assessment of GDx VCC image quality was performed by an experienced examiner masked to the subject's identity and results from the other tests. The assessment was based on the appearance of the reflectance image, presence of residual anterior segment retardation, and presence of an atypical pattern of retardation. To be classified as good quality, an image required a focused and evenly illuminated reflectance image with a centered optic disc. To be acceptable, the baseline image also had to have a residual anterior segment retardation of 15 nm or less and an atypical scan score less than 25. The atypical scan score is a measure provided by the GDx VCC standard software, which indicates the presence of atypical patterns of retardation that can generate spurious RNFL thickness measurements. Twenty-four (13%) of 183 patients had unacceptable GDx VCC scans and were excluded from further analysis. The Venn diagram in Figure 1 shows the number of unacceptable images from each instrument.



Figure 1. Venn diagram showing the number of images classified as unacceptable from the GDx VCC scanning laser polarimeter (Laser Diagnostic Technologies, Inc, San Diego, Calif), HRT II confocal scanning laser ophthalmoscope (Heidelberg Retina Tomograph; Heidelberg Engineering, Dossenheim, Germany), and Stratus OCT optical coherence tomograph (Carl Zeiss Meditec, Inc, Dublin, Calif).


The GDx VCC software calculates summary parameters based on quadrants that are defined as temporal (335° to 24°), superior (25° to 144°), nasal (145° to 214°), or inferior (215° to 334°). The GDx VCC parameters investigated in this study were superior ratio (superior quadrant thickness/temporal quadrant thickness), inferior ratio, superior/nasal ratio, superior maximum (mean of the 1500 thickest points in the superior quadrant), inferior maximum, superior average, inferior average, normalized superior area (area under the temporal-superior-nasal-inferior-temporal [TSNIT] curve in the superior quadrant), normalized inferior area, maximum modulation ([thickest quadrant–thinnest quadrant]/thinnest quadrant), ellipse modulation, ellipse average (TSNIT average), ellipse standard deviation (TSNIT standard deviation), and nerve fiber indicator (NFI). The NFI is calculated using a support vector machine algorithm based on several RNFL measures (Michael Sinai, PhD, Laser Diagnostic Technologies, written communication, March 2003) and assigns a number from 0 to 100 to each eye. The higher the NFI, the greater the likelihood that the patient has glaucoma. For each of these parameters, receiver operating characteristic (ROC) curves were constructed and sensitivities at fixed specificities (≥80% and ≥95%) were reported.
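As an illustration of the quadrant definitions and the maximum modulation formula given above, the following sketch maps an angle to its GDx VCC quadrant and computes maximum modulation from quadrant thickness values. The function names, the choice to floor fractional angles to whole degrees, and the example thickness values are assumptions made for this example; this is not the instrument's own implementation.

```python
import math


def gdx_quadrant(angle_deg: float) -> str:
    """Return the GDx VCC quadrant for an angle, using the ranges in the text.

    Assumption: fractional angles are floored to whole degrees, because the
    text specifies the quadrant boundaries in whole degrees only.
    """
    a = math.floor(angle_deg) % 360
    if a >= 335 or a <= 24:
        return "temporal"   # 335 to 24 degrees, wrapping through 0
    if a <= 144:
        return "superior"   # 25 to 144 degrees
    if a <= 214:
        return "nasal"      # 145 to 214 degrees
    return "inferior"       # 215 to 334 degrees


def maximum_modulation(quadrant_thickness: dict) -> float:
    """(thickest quadrant - thinnest quadrant) / thinnest quadrant."""
    thickest = max(quadrant_thickness.values())
    thinnest = min(quadrant_thickness.values())
    if thinnest <= 0:
        raise ValueError("quadrant thickness must be positive")
    return (thickest - thinnest) / thinnest


# Hypothetical quadrant thickness values in micrometers.
example = {"temporal": 40.0, "superior": 60.0, "nasal": 50.0, "inferior": 65.0}
print(gdx_quadrant(300), maximum_modulation(example))
```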

For the parameters TSNIT average, superior average, inferior average, and TSNIT standard deviation, the GDx VCC printout also provides probability measures of abnormality based on comparison with an internal normative database containing information on 540 normal eyes. In the GDx VCC printout, each color represents a different probability of the parameter being outside normal limits, with red having the highest probability (P<.005), followed by yellow (P<.01), light blue (P<.02), and dark blue (P<.05). For this study, a parameter was considered outside normal limits if P<.005 (red), borderline if P<.05 (yellow, light blue, or dark blue), and within normal limits if P>.05 (green). We evaluated the diagnostic categorization (outside normal limits, borderline, or within normal limits) provided by the GDx VCC after comparison with its normative database and reported likelihood ratios (LRs) for each parameter. For the parameter of NFI, no probability measure of abnormality is currently provided in the printout. However, the cutoffs suggested by the manufacturer are 0 to 30 for within normal limits, 31 to 50 for borderline, and 51 to 100 for outside normal limits (Michael Sinai, PhD, written communication, December 2003). In our study, we investigated the diagnostic ability of the NFI using the manufacturer's suggested cutoffs as well as other arbitrarily selected cutoffs. Interval LRs were calculated for the NFI.
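The categorization rules used for the GDx VCC parameters can be sketched as two small functions. The thresholds come from the text above; the function names are hypothetical, and assigning a P value of exactly .05 to the within-normal-limits category is an assumption, since the text does not specify that boundary case.

```python
WITHIN = "within normal limits"
BORDERLINE = "borderline"
OUTSIDE = "outside normal limits"


def categorize_by_p_value(p: float) -> str:
    """Categorize a parameter from its printout probability level.

    Assumption: P exactly equal to .05 is treated as within normal limits.
    """
    if p < 0.005:
        return OUTSIDE      # red on the printout
    if p < 0.05:
        return BORDERLINE   # yellow, light blue, or dark blue
    return WITHIN           # green


def categorize_nfi(nfi: float) -> str:
    """Categorize the NFI using the manufacturer's suggested cutoffs."""
    if not 0 <= nfi <= 100:
        raise ValueError("NFI is defined on the range 0 to 100")
    if nfi <= 30:
        return WITHIN
    if nfi <= 50:
        return BORDERLINE
    return OUTSIDE


print(categorize_by_p_value(0.02), "|", categorize_nfi(47))
```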

HRT II Confocal Scanning Laser Ophthalmoscope

The HRT II uses a diode laser (670-nm wavelength) to sequentially scan the retinal surface in the horizontal and vertical directions at multiple focal planes. Using confocal scanning principles, a 3-dimensional topographic image is constructed from a series of optical image sections at consecutive focal planes.33 The topographic image determined from the acquired 3-dimensional image consists of 384 × 384 (147 456 total) pixels, each of which is a measurement of retinal height at its corresponding location. For every patient, 3 topographic images were obtained, combined, and automatically aligned to make a single mean topographic image used for analysis. Magnification errors were corrected using patients' corneal curvature measurements. An experienced examiner outlined the optic disc margin on the mean topographic image while viewing stereoscopic photographs of the optic disc. Good-quality images required focused reflectance with a standard deviation no greater than 50 µm. Fifteen (8%) of the 183 patients had unacceptable topographic images and were excluded from further analysis (Figure 1).

Topographic parameters included with HRT II software and investigated in this study were disc area, cup area, rim area, cup/disc area ratio, rim/disc area ratio, cup volume, rim volume, mean cup depth, maximum cup depth, height variation contour, cup shape measure, mean RNFL thickness, RNFL cross-sectional area, horizontal cup/disc ratio, vertical cup/disc ratio, and 2 linear discriminant functions, from Mikelberg et al34 (Mikelberg function) and Bathija et al4 (Bathija function). These parameters have been described in detail elsewhere. For each of these parameters, ROC curves were constructed and sensitivities at fixed specificities (≥80% and ≥95%) were reported. All of these parameters except for the 2 linear discriminant functions and the horizontal and vertical cup/disc ratios were further examined using sectors categorized as temporal superior (45° to 90°), nasal superior (91° to 135°), nasal (136° to 225°), nasal inferior (226° to 270°), temporal inferior (271° to 315°), and temporal (316° to 44°).

The software for the HRT II also incorporates the Moorfields regression analysis,11 which is a comparison of the subject's rim area with a predicted rim area for a given disc area and age, based on confidence limits of a regression analysis derived from 112 normal eyes of white subjects. Each sector is classified as normal if the measurement falls within the 95% confidence interval (CI), borderline if the measurement falls between the 95% and 99.9% CI, and outside normal limits if the measurement falls lower than the 99.9% CI. The Moorfields regression analysis also provides results for the global rim area as well as a final classification. A normal classification requires the Moorfields regression analysis of all sectors and the global rim area to be within normal limits. A borderline classification occurs when at least 1 of the sectors or the global rim area is borderline, and an outside normal limits result occurs when at least 1 sector or the global rim area is outside normal limits. The LRs were calculated for each possible diagnostic categorization (within normal limits, borderline, and outside normal limits) of the global and sectorial results as well as the final classification of Moorfields regression analysis.
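The classification logic described above can be sketched as follows. This is a simplified illustration, not the HRT II software: the function names and the limit values are hypothetical, only the lower prediction limits are modeled, and an outside-normal-limits result is assumed to take precedence over a borderline result when both occur in the same eye.

```python
WITHIN = "within normal limits"
BORDERLINE = "borderline"
OUTSIDE = "outside normal limits"


def classify_rim_area(measured: float, lower_95: float, lower_999: float) -> str:
    """Classify one sector (or the global rim area) against its lower limits.

    Assumption: lower_95 and lower_999 are the lower 95% and 99.9% limits
    of the predicted rim area for the eye's disc area and the subject's age.
    """
    if lower_999 > lower_95:
        raise ValueError("the 99.9% lower limit cannot exceed the 95% limit")
    if measured >= lower_95:
        return WITHIN
    if measured >= lower_999:
        return BORDERLINE
    return OUTSIDE


def final_classification(sector_results: dict, global_result: str) -> str:
    """Combine the sector and global results into a final classification."""
    results = list(sector_results.values()) + [global_result]
    if OUTSIDE in results:
        return OUTSIDE      # assumed to override any borderline result
    if BORDERLINE in results:
        return BORDERLINE
    return WITHIN           # every sector and the global rim area are normal


# Hypothetical sector results for one eye.
sectors = {"temporal": WITHIN, "temporal superior": BORDERLINE, "nasal": WITHIN}
print(final_classification(sectors, WITHIN))
```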

Stratus OCT

The commercially available optical coherence tomograph, the Stratus OCT, was used to assess parapapillary RNFL thickness measurements. Optical coherence tomography uses the principles of low-coherence interferometry and is analogous to ultrasound B-mode imaging, but it uses light instead of sound to acquire high-resolution images of ocular structures.8 A low-coherence near infrared (840-nm) light beam is directed onto a partially reflective mirror (beam splitter) that creates 2 light beams, a reference and a measurement beam. The measurement beam is directed onto the subject's eye and is reflected from intraocular microstructures and tissues according to their distance, thickness, and reflectivity. The reference beam is reflected from the reference mirror at a known, variable position. Both beams travel back to the partially reflective mirror, recombine, and are transmitted to a photosensitive detector. The pattern of interference is used to provide information regarding distance and thickness of the retinal structures. Bidimensional images are created by successive longitudinal scanning in transverse directions.

The fast RNFL algorithm was used to obtain RNFL thickness measurements with the Stratus OCT. Three images were acquired from each subject, with each image consisting of 256 A-scans along a 3.4-mm-diameter circular ring around the optic disc. A baseline image was automatically created using the Stratus OCT software. Quality assessment of Stratus OCT scans was determined by an experienced examiner masked to the subject's identity and the results of the other tests. Good-quality scans had to have focused images from the ocular fundus, an adequate signal-to-noise ratio, and the presence of a centered circular ring around the optic disc. Nineteen (10%) of 183 patients had unacceptable Stratus OCT scans and were excluded from further analysis (Figure 1).

Parapapillary RNFL thickness parameters automatically calculated by existing Stratus OCT software (version 3.1) and evaluated in this study were average thickness (360° measurement), temporal quadrant thickness (316° to 45°), superior quadrant thickness (46° to 135°), nasal quadrant thickness (136° to 225°), inferior quadrant thickness (226° to 315°), and thickness for each of 12 clock-hour positions with the 3-o'clock position as nasal, 6-o'clock position as inferior, 9-o'clock position as temporal, and 12-o'clock position as superior. Other parameters evaluated included superior maximum (Smax) (thickest point in the superior quadrant), inferior maximum (Imax) (thickest point in the inferior quadrant), and relational parameters such as Imax/Smax, Smax/Imax, Imax/temporal average thickness (Imax/Tavg), Smax/nasal average thickness (Smax/Navg), and difference between the thickest and thinnest points along the measurement circle (Max-Min). For each of these parameters, ROC curves were constructed and sensitivities at fixed specificities (≥80% and ≥95%) were reported.
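The relational parameters listed above follow directly from their definitions. The sketch below assumes that the thickness measurements have already been grouped by quadrant; the function name, the input layout, and the example values are hypothetical and do not reflect the Stratus OCT software's internal data format.

```python
from statistics import mean


def oct_relational_parameters(profile: dict) -> dict:
    """Derive Smax, Imax, and the relational parameters from quadrant data.

    `profile` maps each quadrant name to a list of RNFL thickness values
    (micrometers) measured along the scan circle within that quadrant.
    """
    smax = max(profile["superior"])   # thickest point, superior quadrant
    imax = max(profile["inferior"])   # thickest point, inferior quadrant
    tavg = mean(profile["temporal"])  # temporal average thickness
    navg = mean(profile["nasal"])     # nasal average thickness
    all_points = [t for values in profile.values() for t in values]
    return {
        "Smax": smax,
        "Imax": imax,
        "Imax/Smax": imax / smax,
        "Smax/Imax": smax / imax,
        "Imax/Tavg": imax / tavg,
        "Smax/Navg": smax / navg,
        "Max-Min": max(all_points) - min(all_points),
    }


# Hypothetical thickness values, far fewer than the 256 A-scans of a real scan.
example_profile = {
    "superior": [100, 120, 110],
    "inferior": [130, 125, 115],
    "temporal": [60, 70, 80],
    "nasal": [70, 80, 90],
}
print(oct_relational_parameters(example_profile))
```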

For each parameter, the Stratus OCT software provides a classification (within normal limits, borderline, or outside normal limits) based on comparison with an internal normative database of 328 eyes. A parameter is classified as outside normal limits if its value falls lower than the 99.9% CI of the healthy, age-matched population. A borderline result indicates that the value is between the 95% and 99.9% CI, and a within-normal-limits result indicates that the value is within the 95% CI. The LRs were calculated for each parameter and each possible diagnostic categorization, as provided by the Stratus OCT software.

STATISTICAL ANALYSIS

We used t tests to evaluate optic nerve head and RNFL measurement differences between glaucomatous and normal eyes. Results of statistical significance were also provided after Bonferroni correction based on the number of comparisons within each analysis (GDx VCC, HRT II, and Stratus OCT).
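The Bonferroni-corrected significance thresholds follow from dividing the overall significance level by the number of comparisons. The sketch below uses the comparison counts reported in the Results section (15, 17, and 25) and the overall level of .05 stated in this section; the function name is hypothetical.

```python
def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("at least one comparison is required")
    return family_alpha / n_comparisons


# Number of comparisons per instrument, as reported in the Results section.
for instrument, n in (("GDx VCC", 15), ("HRT II", 17), ("Stratus OCT", 25)):
    print(f"{instrument}: alpha = {bonferroni_alpha(0.05, n):.3f}")
```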

The ROC curves were used to describe the ability of each parameter from each instrument to differentiate glaucomatous from normal eyes. The ROC curve shows the trade-off between sensitivity and 1–specificity. An area under the ROC curve (AUC) of 1.0 represents perfect discrimination, whereas an AUC of 0.5 represents chance discrimination. The method of DeLong et al35 was used to compare AUCs. Sensitivities at fixed specificities were compared using the McNemar test.
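To make the two summary measures concrete, the following sketch computes an empirical AUC and the sensitivity at a fixed minimum specificity. It is only an illustration under stated assumptions: higher scores are taken to indicate disease (as with the NFI), the AUC is computed with the nonparametric pairwise (Mann-Whitney) estimate, the function names and scores are hypothetical, and the DeLong comparison and McNemar test are not implemented here.

```python
def empirical_auc(diseased: list, healthy: list) -> float:
    """Probability that a diseased eye scores higher than a healthy eye.

    Ties count as one half. Assumes higher scores are more abnormal.
    """
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def sensitivity_at_specificity(diseased: list, healthy: list,
                               min_specificity: float) -> float:
    """Highest sensitivity among cutoffs with specificity >= min_specificity.

    A result is called positive when the score is greater than the cutoff.
    """
    cutoffs = [float("-inf")] + sorted(set(diseased) | set(healthy))
    best = 0.0
    for cutoff in cutoffs:
        specificity = sum(h <= cutoff for h in healthy) / len(healthy)
        if specificity >= min_specificity:
            sensitivity = sum(d > cutoff for d in diseased) / len(diseased)
            best = max(best, sensitivity)
    return best


# Hypothetical scores for five glaucomatous and five healthy eyes.
glaucoma_scores = [55, 62, 48, 71, 35]
healthy_scores = [12, 20, 33, 41, 18]
print(empirical_auc(glaucoma_scores, healthy_scores))
print(sensitivity_at_specificity(glaucoma_scores, healthy_scores, 0.95))
```

With samples this small the attainable specificities are coarse, which is one reason the text reports sensitivities at specificities of at least 80% and at least 95% rather than at exact values.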

Diagnostic categorization (outside normal limits, borderline, or within normal limits) provided by each instrument after comparison with its respective normative database was also evaluated, and LRs were reported. An LR is defined as the probability of a given test result in those with disease divided by the probability of that same test result in those without the disease.36-37 Once determined, an LR can be incorporated directly into the calculation of posttest probability of disease by using a formulation of the Bayes theorem.38 The LR for a given test result indicates how much that result will raise or lower the probability of disease. A value of 1 means that the test provides no additional information, and ratios higher or lower than 1 increase or decrease the likelihood of disease. A classification of the effect of LRs of different magnitudes on the posttest probability of disease has been suggested and was used in our study.36 According to this classification, LRs higher than 10 or lower than 0.1 would be associated with large effects on posttest probability, LRs from 5 to 10 or from 0.1 to 0.2 would be associated with moderate effects, LRs from 2 to 5 or from 0.2 to 0.5 would be associated with small effects, and LRs closer to 1 would be insignificant. The 95% CIs for LRs were calculated according to the method proposed by Simel et al.39
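The LR definition, its use in the odds form of the Bayes theorem, and the effect classification described above can be sketched as follows. The function names, the counts, and the pretest probability are hypothetical. The confidence interval uses the standard log-based formula for a ratio of two proportions, which is assumed here to correspond to the cited method, and the boundary values 5, 2, 0.2, and 0.5 are assigned to the stronger category as an assumption, since the text gives overlapping ranges.

```python
import math


def likelihood_ratio(k_diseased: int, n_diseased: int,
                     k_healthy: int, n_healthy: int) -> float:
    """P(result | disease) / P(result | no disease) for one test result."""
    p_diseased = k_diseased / n_diseased
    p_healthy = k_healthy / n_healthy
    if p_healthy == 0:
        # No healthy eye had this result: the ratio is unbounded.
        return math.inf if p_diseased > 0 else math.nan
    return p_diseased / p_healthy


def lr_confidence_interval(k_diseased: int, n_diseased: int,
                           k_healthy: int, n_healthy: int,
                           z: float = 1.96) -> tuple:
    """Approximate 95% CI on the log scale; requires nonzero counts."""
    lr = likelihood_ratio(k_diseased, n_diseased, k_healthy, n_healthy)
    se = math.sqrt(1 / k_diseased - 1 / n_diseased
                   + 1 / k_healthy - 1 / n_healthy)
    return lr * math.exp(-z * se), lr * math.exp(z * se)


def posttest_probability(pretest_probability: float, lr: float) -> float:
    """Convert the pretest probability to odds, apply the LR, convert back."""
    if math.isinf(lr):
        return 1.0
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


def effect_on_posttest_probability(lr: float) -> str:
    """Classify the LR magnitude using the ranges given in the text."""
    if lr > 10 or lr < 0.1:
        return "large"
    if lr >= 5 or lr <= 0.2:
        return "moderate"
    if lr >= 2 or lr <= 0.5:
        return "small"
    return "insignificant"


# Hypothetical pretest probability of 20%, combined with an LR of 19.4.
print(round(posttest_probability(0.20, 19.4), 3))
print(effect_on_posttest_probability(19.4))
```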

Chance-corrected agreement on categorization between different instruments was assessed using a weighted κ approach, with quadratic weighting assignment as proposed by Fleiss.40 This method allows for differences in the importance of disagreement, assuming that disagreement between adjacent categories (eg, between normal and borderline or borderline and outside normal limits) is not as important as that between distant categories (eg, between normal and outside normal limits). Strength of agreement was categorized according to the method proposed by Landis and Koch41: less than 0 was poor, 0 to 0.20 was slight, 0.21 to 0.40 was fair, 0.41 to 0.60 was moderate, 0.61 to 0.80 was substantial, and 0.81 to 1.00 was almost perfect.
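A weighted agreement statistic with quadratic weights can be sketched as below for the 3 ordered categories (0 = within normal limits, 1 = borderline, 2 = outside normal limits). This is a minimal illustration with hypothetical function names and ratings, using the usual disagreement-weight formulation; it is not the statistical package code used in the study.

```python
def quadratic_weighted_kappa(ratings_a: list, ratings_b: list,
                             n_categories: int) -> float:
    """Chance-corrected agreement with quadratic disagreement weights.

    Ratings are integer category indices from 0 to n_categories - 1.
    """
    n = len(ratings_a)
    k = n_categories
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Distant categories are penalized more than adjacent ones.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - weighted_observed / weighted_expected


def strength_of_agreement(kappa: float) -> str:
    """Landis and Koch categories as listed in the text."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical categorizations of six eyes by two instruments.
instrument_a = [0, 0, 1, 1, 2, 2]
instrument_b = [0, 1, 1, 2, 2, 2]
kappa = quadratic_weighted_kappa(instrument_a, instrument_b, 3)
print(kappa, strength_of_agreement(kappa))
```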

P<.05 was considered statistically significant. Statistical analyses were performed using SPSS version 10.0 (SPSS Inc, Chicago, Ill) and S-PLUS 2000 (Mathsoft Inc, Seattle, Wash) statistical software.


RESULTS

After the exclusion of subjects with unacceptable images (Figure 1), the final study sample included 141 eyes of 141 subjects (75 patients with glaucoma and 66 healthy control subjects). There was no statistically significant difference in age between patients with glaucoma and healthy subjects (mean ± SD, 68 ± 10 years vs 65 ± 8 years, respectively; P = .06 using a t test). The mean ± SD mean deviation of the glaucomatous eyes on the visual field test nearest the imaging date was –4.87 ± 3.9 dB. According to the grading scale for severity of visual field defects developed by Hodapp et al,42 53 patients (70%) were classified as having early visual field defects, 11 patients (15%) had moderate defects, and 11 patients (15%) had severe visual field defects.

GDx VCC SCANNING LASER POLARIMETER RESULTS

Table 1 presents the mean values of GDx VCC parameters in glaucomatous and normal eyes. After Bonferroni correction (α = .003; 15 comparisons), statistically significant differences were found for all parameters except symmetry. Table 1 also shows the ROC curve areas and sensitivities at fixed specificities. The 3 GDx VCC parameters with the largest AUCs were NFI (0.91), inferior normalized area (0.86), and TSNIT average (0.85). The AUC for NFI was significantly higher than those for inferior normalized area (P = .04) and TSNIT average (P = .004).


Table 1. Values of GDx VCC Parameters With Areas Under the Receiver Operating Characteristic Curves and Sensitivities at Fixed Specificities


Table 2 presents LRs with their 95% CIs for the GDx VCC parameters after comparison with the instrument's normative database. For all parameters, outside-normal-limits results were associated with large effects on the posttest probability of disease. Borderline results were associated with small to moderate effects, whereas within-normal-limits results were associated with small effects on the posttest probability of disease. The LRs of the overall GDx VCC classification were also evaluated. For this classification, an outside-normal-limits result was considered to be the presence of an NFI greater than 50 or any other parameter outside normal limits. The LR of an outside-normal-limits result in the GDx VCC overall classification was infinity. For a borderline result (NFI between 31 and 50 or any other parameter that was borderline), the LR was 1.60 (95% CI, 1.02-2.50). A within-normal-limits result (NFI ≤30 and all parameters within normal limits) had an LR of 0.24 (95% CI, 0.14-0.40).


Table 2. Likelihood Ratios and 95% Confidence Intervals for GDx VCC Parameters


Interval LRs were also calculated for the parameter of NFI using arbitrarily selected cutoffs (Table 3); results of 0 to 15 and greater than 50 were associated with large effects on the posttest probability of disease, whereas the other test ranges were associated with small effects.


Table 3. Interval Likelihood Ratios and 95% Confidence Intervals for the Parameters With Highest Areas Under the Receiver Operating Characteristic Curves From the GDx VCC, Stratus OCT, and HRT II


HRT II RESULTS

Table 4 presents the mean values of HRT II parameters in glaucomatous and normal eyes. After Bonferroni correction (α = .003; 17 comparisons), statistically significant differences were found for all parameters except disc area and height variation contour. Table 4 also indicates ROC curve areas and sensitivities at fixed specificities. The 3 HRT II parameters with the largest AUCs were the Bathija function (0.86), the Mikelberg function (0.83), and the vertical cup/disc ratio (0.83). There were no statistically significant differences in ROC curve areas for these parameters (P>.05 for all comparisons). The analysis of HRT II parameters by sector did not result in higher ROC curve areas, with the parameter of temporal inferior rim/disc area ratio having the largest ROC curve area (0.81).


Table 4. Values of HRT II Parameters With Areas Under the Receiver Operating Characteristic Curves and Sensitivities at Fixed Specificities


Table 5 presents LRs with their 95% CIs for the HRT II Moorfields regression analysis. Global and sectorial results outside normal limits were generally associated with large effects on the posttest probability of disease. Borderline results were associated with small to moderate effects, whereas within-normal-limits results were associated with small effects. An outside-normal-limits result in the overall HRT II classification (ie, the Moorfields regression analysis classification) was associated with a large effect on the posttest probability of disease (LR = 19.4), whereas borderline (LR = 0.88) and within-normal-limits (LR = 0.35) results were associated with small changes in the probability of disease.


Table 5. Likelihood Ratios and 95% Confidence Intervals for HRT II Moorfields Regression Analysis


We evaluated interval LRs for the HRT II parameter with the largest AUC, the Bathija function. Several cutoffs were arbitrarily created for this parameter, and the interval LRs are indicated in Table 3. Values for the Bathija function greater than 1.0 or smaller than –1.0 were associated with large effects on posttest probabilities of disease, whereas the other test results had small effects on the probability of disease.

STRATUS OCT RESULTS

Table 6 presents the mean values of Stratus OCT parameters in glaucomatous and normal eyes. After Bonferroni correction (α = .002; 25 comparisons), statistically significant differences were found for all parameters except thickness at 8 o'clock, thickness at 9 o'clock, temporal thickness, Imax/Smax, Smax/Tavg, and Smax/Navg. Table 6 also indicates ROC curve areas and sensitivities at fixed specificities. The 3 Stratus OCT parameters with the largest AUCs were inferior thickness (0.92), average thickness (0.91), and Imax (0.91). There were no statistically significant differences in ROC curve areas for these parameters (P>.05 for all comparisons).


Table 6. Values of Stratus OCT Parameters With Areas Under the Receiver Operating Characteristic Curves and Sensitivities at Fixed Specificities


Table 7 presents LRs with their 95% CIs for the Stratus OCT parameters after comparison with the instrument's normative database. For the overall Stratus OCT classification, an outside-normal-limits result was considered to be the presence of any quadrant outside normal limits. The LR of an outside-normal-limits result in the Stratus OCT overall classification was 43.1 (95% CI, 32.2-57.8). For a borderline result (any quadrant with a borderline result), the LR was 0.88 (95% CI, 0.44-1.77). A within-normal-limits result (all quadrants within normal limits) had an LR of 0.28 (95% CI, 0.17-0.44).


Table 7. Likelihood Ratios and 95% Confidence Intervals for Stratus OCT Parameters
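
As a rough illustration of how an LR and its 95% CI for a single diagnostic category can be obtained from counts, the sketch below uses the common log-method approximation. The function name, the counts in the usage lines, and the choice of CI method are assumptions made for illustration; they are not taken from this study, which may have used a different approach.

```python
import math

def likelihood_ratio_with_ci(n_glaucoma_with_result, n_glaucoma,
                             n_healthy_with_result, n_healthy, z=1.96):
    """LR of one test category with an approximate 95% CI (log method).

    Assumes both category counts are greater than 0; a zero count in
    healthy eyes would correspond to an LR of infinity.
    """
    # Probability of this test result in each group.
    p_glaucoma = n_glaucoma_with_result / n_glaucoma
    p_healthy = n_healthy_with_result / n_healthy
    lr = p_glaucoma / p_healthy

    # Standard error of ln(LR) for a ratio of two independent proportions.
    se_log_lr = math.sqrt(1 / n_glaucoma_with_result - 1 / n_glaucoma
                          + 1 / n_healthy_with_result - 1 / n_healthy)
    lower = math.exp(math.log(lr) - z * se_log_lr)
    upper = math.exp(math.log(lr) + z * se_log_lr)
    return lr, lower, upper

# Hypothetical counts: 60 of 100 glaucomatous eyes and 5 of 100 healthy
# eyes fall in the category of interest.
lr, lower, upper = likelihood_ratio_with_ci(60, 100, 5, 100)
print(f"LR = {lr:.1f} (95% CI, {lower:.1f}-{upper:.1f})")
```

Because the interval is built on the log scale, it is asymmetric around the LR, which matches the shape of the intervals reported in the tables.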


We evaluated interval LRs for the Stratus OCT parameter with the largest AUC, inferior thickness. Several cutoffs were arbitrarily created for this parameter, and the interval LRs are indicated in Table 3. Inferior thickness values less than or equal to 70 µm or greater than 130 µm were associated with large effects on the posttest probability of disease. Values ranging from 71 µm to 90 µm were also associated with large effects, whereas the other test results were associated with small or insignificant effects on posttest probabilities of disease.

COMPARISON OF GDx VCC, HRT II, AND STRATUS OCT

No statistically significant difference was found between AUCs for the best parameters from the GDx VCC (NFI, AUC = 0.91), Stratus OCT (inferior thickness, AUC = 0.92), and HRT II (Bathija function, AUC = 0.86) (P>.05 for all comparisons). Figure 2 shows the ROC curves for the best parameters from each instrument.



Figure 2. Receiver operating characteristic curves of the best parameters from the GDx VCC scanning laser polarimeter, HRT II confocal scanning laser ophthalmoscope, and Stratus OCT optical coherence tomograph. Manufacturers are provided in the legend to Figure 1. Bathija function refers to the linear discriminant function used by Bathija et al.4


At specificities of at least 95%, no statistically significant differences were found among parameters with the highest sensitivities from each instrument: GDx VCC NFI (sensitivity, 61%), Stratus OCT average thickness (sensitivity, 71%), and HRT II Bathija function (sensitivity, 59%; P>.05 for all comparisons). At specificities of at least 80%, a statistically significant difference was found between the HRT II parameter with the highest sensitivity (Mikelberg function; sensitivity, 73%) and those from the GDx VCC (NFI; sensitivity, 87%; P = .05) and Stratus OCT (inferior thickness; sensitivity, 89%; P = .01). No statistically significant difference was found between parameters with the highest sensitivities from the GDx VCC and Stratus OCT.
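
The idea of a sensitivity "at a specificity of at least" some target can be sketched as follows. This is a minimal illustration, assuming a parameter for which higher values are more abnormal (as with the NFI); the function name and the scores are hypothetical and are not study data.

```python
def sensitivity_at_specificity(healthy_scores, glaucoma_scores, min_specificity):
    """Return (cutoff, specificity, sensitivity) for the lowest cutoff that
    reaches the target specificity. A result is positive if score > cutoff.
    """
    # Candidate cutoffs in ascending order: specificity can only rise and
    # sensitivity can only fall as the cutoff increases, so the first cutoff
    # that meets the target gives the highest attainable sensitivity.
    for cutoff in sorted(set(healthy_scores) | set(glaucoma_scores)):
        specificity = sum(s <= cutoff for s in healthy_scores) / len(healthy_scores)
        if specificity >= min_specificity:
            sensitivity = sum(s > cutoff for s in glaucoma_scores) / len(glaucoma_scores)
            return cutoff, specificity, sensitivity
    raise ValueError("min_specificity must not exceed 1")

# Hypothetical scores for illustration only.
healthy = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
glaucoma = [30, 45, 50, 60, 65, 70, 80, 90]

for target in (0.80, 0.95):
    cutoff, spec, sens = sensitivity_at_specificity(healthy, glaucoma, target)
    print(f"target {target:.0%}: cutoff {cutoff}, "
          f"specificity {spec:.0%}, sensitivity {sens:.1%}")
```

With a finite sample, the specificity actually attained can be higher than the target, which is why results are phrased as "at least" 80% or 95%.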

Agreement on diagnostic categorization between pairs of instruments was also evaluated. For this analysis, the overall classification of each instrument was used as described previously. The GDx VCC and Stratus OCT overall classifications agreed in 89% of cases, with a substantial chance-corrected agreement (κ = 0.72 [0.08]). The GDx VCC and HRT II overall classifications agreed in 81% of cases, with a moderate chance-corrected agreement (κ = 0.50 [0.08]). The Stratus OCT and HRT II overall classifications agreed in 81% of cases, with a moderate chance-corrected agreement (κ = 0.55 [0.08]).
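
For readers unfamiliar with chance-corrected agreement on ordered categories, the following sketch computes a weighted κ from a 3 × 3 cross-classification. Linear disagreement weights are an assumption here (the weighting scheme is not stated in this section), and the table of counts is hypothetical.

```python
def weighted_kappa(table):
    """Weighted kappa for a square table of counts with ordered categories.

    table[i][j] = number of eyes placed in category i by instrument A and
    category j by instrument B. Uses linear disagreement weights
    |i - j| / (k - 1), which is an assumption of this sketch.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)
            observed_disagreement += weight * table[i][j] / n
            # Expected proportion under independence of the two instruments.
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)
    return 1.0 - observed_disagreement / expected_disagreement

# Hypothetical counts; category order: within normal limits, borderline,
# outside normal limits.
example = [
    [60, 5, 3],
    [4, 6, 5],
    [2, 6, 92],
]
print(f"weighted kappa = {weighted_kappa(example):.2f}")
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance; disagreements between adjacent categories are penalized less than those between the two extreme categories.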


COMMENT

To our knowledge, this is the first study to provide a comparison, using the same population, of the diagnostic accuracies of 3 instruments: the GDx VCC scanning laser polarimeter, HRT II confocal scanning laser ophthalmoscope, and Stratus OCT optical coherence tomograph. Each instrument represents the current commercially available version of a different technology for evaluation of the optic nerve head and RNFL in glaucoma.

Several measures of diagnostic accuracy were provided in our study, including ROC curve areas, sensitivities at fixed specificities, and LRs. For parameters reported as continuous variables, no statistically significant differences in ROC curves were found among the best parameters of the 3 instruments.

Previous studies have compared ROC curve areas derived from measures obtained using older versions of these technologies. Zangwill et al19 and Greaney et al43 reported that ROC curve areas were similar among the best parameters from the GDx Nerve Fiber Analyzer (Laser Diagnostic Technologies, Inc), OCT 2000 (Carl Zeiss Meditec, Inc), and HRT I (Heidelberg Engineering). In the study by Zangwill and colleagues, the best parameters from the OCT 2000 and HRT I had higher sensitivities than the best parameter from the GDx Nerve Fiber Analyzer. At 96% specificity, the best parameter from the GDx Nerve Fiber Analyzer (a linear discriminant function combining several parameters of the instrument) had a sensitivity of only 32%. The introduction of the GDx with variable corneal compensation, the GDx VCC, has reportedly resulted in improved diagnostic accuracy as compared with scanning laser polarimetry using fixed corneal compensation.26 In our study, at a specificity of 97%, the sensitivity of the best parameter from the GDx VCC, NFI, was 61%. This confirms the improvement in diagnostic accuracy as described in other studies.26, 44

For optical coherence tomography, the ROC curve areas obtained in our study for the Stratus OCT were similar to those obtained with the previous versions of this technology. The AUCs for the earlier versions of the optical coherence tomograph reportedly ranged from 0.79 to 0.94 depending on the parameter and characteristics of the population evaluated.14, 18-19,22-23,43 In studies evaluating the diagnostic ability of several optical coherence tomographic parameters, the RNFL thickness in the inferior region often had the best ability to discriminate healthy eyes from eyes with early to moderate glaucoma, with sensitivities between 67% and 79% for specificities of 90% or higher.18-19,22 In our study, the parameter of inferior thickness also had the highest AUC, with a sensitivity of 64% for a specificity set at 95%. Although results for the Stratus OCT in our study were similar to those reported for the previous version of the instrument, the Stratus OCT may still have advantages compared with its predecessor, including increased sampling, a reduced need for pupillary dilation, and easier image acquisition.

For the HRT II, the ROC curve areas for parameters reported as continuous variables were also similar to those reported for previous versions of the instrument.19, 24 The incorporation of the Moorfields regression analysis has been demonstrated to improve the diagnostic accuracy of this instrument.11-12,30 In our study, ROC curve areas for parameters obtained from the HRT II Moorfields regression analysis were not provided because the small number of categories in these parameters may cause an underestimation of the ROC curve area.45 At high specificity (≥95%), the sensitivity of the Moorfields regression analysis overall classification (59%) was not statistically significantly different from those of the best parameters of the Stratus OCT and GDx VCC. Furthermore, the application of sophisticated methods of analysis of HRT data has recently been reported to improve the diagnostic ability of this instrument.46-48 Additional studies are necessary to compare these methods with measures provided by the Stratus OCT and GDx VCC.

Although sensitivity and specificity have commonly been reported as measures of diagnostic accuracy in medical studies, their clinical utility is limited.36, 49-50 They reflect the probability that a particular test result is positive or negative given the presence (sensitivity) or absence (specificity) of disease. However, there is an inversion of clinical logic intrinsic to this definition; knowledge of whether the patient had the disease would clearly obviate the need for a diagnostic test. Similarly, the AUC is important for comparing the diagnostic accuracies of different tests but has little intrinsic clinical meaning. The starting points of any diagnostic process are the patient developing a constellation of symptoms and signs and the clinician integrating this information to assign a pretest probability of disease. The results of diagnostic tests are then used to modify the pretest probability of disease, yielding a new posttest probability. The direction and magnitude of this change from pretest to posttest are determined by the test's properties, particularly the LR. The LR represents the magnitude of change from a physician's initial suspicion of disease (pretest probability) to the likelihood of disease after the test result (posttest probability).36
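
The conversion from pretest to posttest probability described above goes through odds: posttest odds = pretest odds × LR. The sketch below applies it to the two Stratus OCT overall-classification LRs reported in the results (43.1 for outside normal limits and 0.28 for within normal limits). The 20% pretest probability and the function name are assumptions chosen only for illustration.

```python
import math

def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability using an LR."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if math.isinf(likelihood_ratio):
        # An LR of infinity means the result never occurred in healthy eyes.
        return 1.0
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)

# Hypothetical pretest probability of 20%; LRs taken from the Stratus OCT
# overall classification reported in this article.
pretest = 0.20
for label, lr in (("outside normal limits", 43.1), ("within normal limits", 0.28)):
    post = posttest_probability(pretest, lr)
    print(f"{label}: LR {lr} -> posttest probability {post:.1%}")
```

An LR of 1 leaves the probability unchanged, so the further an LR is from 1 in either direction, the larger the shift it produces.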

In our study, LRs for outside-normal-limits results in all instruments were generally associated with large changes from pretest to posttest probability of glaucoma. However, LRs for within-normal-limits results were associated with small changes in probability. For the overall classifications from each instrument, the LRs of within-normal-limits results were 0.24, 0.28, and 0.35, respectively, for the GDx VCC, Stratus OCT, and HRT II. This indicates that normal results for each of these tests would induce only a small change in the pretest probability of disease; that is, they would be of limited value in excluding the presence of disease. For borderline results, the LRs were generally associated with small to moderate changes in the probability of disease. Depending on the pretest probability of disease and the clinical situation in which the test is used, even small changes in probability may be clinically relevant.

Previous studies using the HRT I have reported the results of Moorfields regression analysis in terms of sensitivity and specificity.13, 30 To do so, the borderline test results had to be forced into the within-normal-limits or outside-normal-limits categories.51 This approach results in loss of valuable information and may cause distortions in interpretation of the test results when used in clinical practice. Combining borderline results with the within-normal-limits category reduces the sensitivity of the test and the importance of a within-normal-limits result, whereas combining borderline results with the outside-normal-limits category reduces the specificity of the test and the importance of an outside-normal-limits result. In contrast, LRs can be calculated for each diagnostic categorization, permitting the clinician to assess the diagnostic importance of each category. In this study, we showed that borderline results in the Moorfields regression analysis overall classification did not appreciably change the probability of disease but that an outside-normal-limits or within-normal-limits result produced a larger change in its probability.

The dichotomization of test results with continuous outcomes may also result in loss of information because results that are markedly abnormal are lumped with results that are only mildly abnormal, leading to distortion in their clinical interpretation.36, 52-53 These distortions are especially exaggerated when the patient's test result is close to the established cutoff.52 Interval LRs, however, assign a specific value to each level of abnormality, and this value can be used to calculate the posttest probability of disease for a given level of the test. Interval LRs calculated for parameters from each instrument are likely to provide more clinically relevant information than what is currently available in the printout. For instance, according to the manufacturer's suggested cutoffs for the GDx VCC NFI, values from 0 to 30 should be considered normal. This range of values was associated with an LR of 0.38, inducing only a small change toward reduction in the probability of having glaucomatous visual field loss. Using the manufacturer's suggested cutoffs, a test result with an NFI of 10 would be considered similar to one with an NFI of 27. Our results demonstrate the great difference between these 2 situations. Based on the interval LRs calculated in our study, an NFI of 10 would almost exclude the presence of glaucomatous visual field loss, whereas a result of 27 would have a nearly insignificant effect on changing the pretest probability of disease. A similar comparison could be made for the Stratus OCT and HRT II results.

The usefulness of a diagnostic test is strongly influenced by the proportion of patients suspected of having the target disorder whose test results have very high (>10) or very low (<0.1) LRs and therefore have a large effect on the probability of disease. As indicated in Table 3, this proportion was 41% for the GDx VCC NFI, 51% for Stratus OCT inferior thickness, and 45% for the HRT II Bathija function. For the HRT II Moorfields regression analysis, this proportion ranged from 1% to 33% depending on the specific parameter chosen. Selection of other cutoffs or parameters may produce different results, and studies with larger sample sizes are necessary to provide more robust estimations of LRs using smaller intervals of the range of possible test values. In our study, test results for some parameters had LRs of infinity, which indicates that a particular test result was not found in any of the healthy subjects (ie, the probability of the test result in healthy subjects was 0). When evaluating the clinical importance of such parameters, it is critical to evaluate the probability of the same test result in subjects with disease.
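
The interval LRs and the proportion of results with very high or very low LRs can be sketched as below. The counts per interval are hypothetical, the function names are illustrative, and taking the proportion over all subjects in both groups is an assumption of this sketch rather than a statement of how Table 3 was computed.

```python
import math

def interval_likelihood_ratios(glaucoma_counts, healthy_counts):
    """One LR per interval: P(result in interval | glaucoma) divided by
    P(result in interval | healthy)."""
    n_glaucoma = sum(glaucoma_counts)
    n_healthy = sum(healthy_counts)
    ratios = []
    for g, h in zip(glaucoma_counts, healthy_counts):
        p_glaucoma = g / n_glaucoma
        p_healthy = h / n_healthy
        if p_healthy == 0:
            # Result never seen in healthy subjects: LR of infinity
            # (undefined if it was never seen in either group).
            ratios.append(math.inf if p_glaucoma > 0 else math.nan)
        else:
            ratios.append(p_glaucoma / p_healthy)
    return ratios

def proportion_with_extreme_lr(glaucoma_counts, healthy_counts, high=10.0, low=0.1):
    """Share of all subjects whose result falls in an interval with LR > high
    or LR < low (denominator choice is an assumption of this sketch)."""
    ratios = interval_likelihood_ratios(glaucoma_counts, healthy_counts)
    total = sum(glaucoma_counts) + sum(healthy_counts)
    extreme = sum(g + h
                  for g, h, lr in zip(glaucoma_counts, healthy_counts, ratios)
                  if lr > high or lr < low)
    return extreme / total

# Hypothetical counts for 4 intervals, ordered from most to least abnormal.
glaucoma_counts = [40, 30, 20, 10]
healthy_counts = [0, 5, 20, 25]
print(interval_likelihood_ratios(glaucoma_counts, healthy_counts))
print(proportion_with_extreme_lr(glaucoma_counts, healthy_counts))
```

In this example the first interval has an LR of infinity because no healthy subject falls in it; how much weight such a result deserves depends on how often it occurs among subjects with disease.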

Agreement among the different instruments in our study varied from moderate to substantial. The chance-corrected agreement between the Stratus OCT and HRT II was 0.55, similar to the 0.58 index reported by Greaney et al43 when comparing earlier versions of these instruments. For the GDx VCC compared with the Stratus OCT and the GDx VCC compared with the HRT II, the agreements reported in our study were higher than those using earlier versions of these instruments.19, 43 These findings are similar to those in recent reports showing an improvement in the correlation coefficients for associations between scanning laser polarimetry parameters and other measures of glaucomatous damage, such as RNFL semiquantitative photographic scores54 or OCT RNFL thickness measurements,55 when variable corneal compensation rather than fixed corneal compensation was used. Interestingly, the agreement of the GDx VCC with the Stratus OCT was higher than that between the GDx VCC and HRT II and between the Stratus OCT and HRT II, which most likely reflects the fact that both the GDx VCC and Stratus OCT measure RNFL properties, whereas the HRT II measures optic disc topography and provides only an indirect measure of the RNFL.

In conclusion, the AUCs and the sensitivities at high specificities were similar among the best parameters from each instrument. Abnormal results (as compared with each instrument's normative database) were associated with high LRs and large effects on posttest probabilities of having glaucomatous visual field loss. The calculation of interval LRs may provide additional information to assist the clinician in diagnosing glaucoma.


AUTHOR INFORMATION

Corresponding author: Robert N. Weinreb, MD, Hamilton Glaucoma Center, University of California, San Diego, 9500 Gilman Dr, La Jolla, CA 92093-0946.

Submitted for publication January 26, 2004; final revision received March 1, 2004; accepted March 4, 2004.

This study was supported in part by the Foundation for Eye Research, Rancho Santa Fe, Calif (Dr Medeiros), and by grant EY11008 from the National Institutes of Health, Bethesda, Md (Dr Zangwill).

From the Hamilton Glaucoma Center and Department of Ophthalmology, University of California, San Diego. Dr Weinreb receives research support from Carl Zeiss Meditec, Inc (Dublin, Calif), Heidelberg Engineering (Dossenheim, Germany), and Laser Diagnostic Technologies, Inc (San Diego).


REFERENCES