Asking the Question
The first step toward practicing EGO is directing the reader's inquiry
with a well-phrased question. Knowledge of the literature is important so
that the question can be answered if an answer exists. Without knowing the
contents of the published literature, the reader cannot frame a well-formed question.
The question should be organized into 3 elements: exposure, outcome, and setting.
Exposure is a term from epidemiology, which describes what the patients were
exposed to. The exposure might be a treatment for an existing disease or a
risk factor that might increase or decrease the risk of developing the disease.
The outcome is the precise end point of interest. The more precisely this
is specified, the more specific the answer can be. Outcomes do not have to
be desirable end points such as improvement in vision; they may be undesirable
end points such as adverse effects. The specific setting can be very important
in narrowing the search. For example, let's say the reader is interested in
knowing whether to stop aspirin therapy in patients with diabetic retinopathy.
The first step is focusing the question: Does oral aspirin treatment (exposure)
affect vitreous hemorrhage (outcome) in patients with diabetes mellitus (setting)?
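The 3-element structure can be written down explicitly, which makes it harder to leave an element vague. The following is a minimal sketch only; the class name `ClinicalQuestion` and its methods are hypothetical names introduced for illustration and are not part of any search tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalQuestion:
    """A focused question split into exposure, outcome, and setting."""
    exposure: str
    outcome: str
    setting: str

    def as_sentence(self) -> str:
        # Phrase the question in the same pattern as the aspirin example.
        return (f"Does {self.exposure} (exposure) affect {self.outcome} "
                f"(outcome) in patients with {self.setting} (setting)?")

    def search_terms(self) -> list[str]:
        # Hypothetical helper: the raw phrases one might type into a search form.
        return [self.exposure, self.outcome, self.setting]


question = ClinicalQuestion(
    exposure="oral aspirin treatment",
    outcome="vitreous hemorrhage",
    setting="diabetes mellitus",
)
print(question.as_sentence())
```

Keeping the three elements separate also gives a ready-made list of phrases to try when the search is run.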
Getting the Evidence
The second step is finding the evidence. There are many ways of finding
information electronically. CD-ROMs can be helpful, but the best sources of
information are the databases maintained by the National Library of Medicine,
such as MEDLINE. Prior to June 1997, ophthalmologists had to search MEDLINE
at a library with a direct connection to MEDLINE via password. Now, through
2 web-based interfaces, MEDLINE can be accessed through either PubMed
(http://www.ncbi.nlm.nih.gov/pubmed/) or Internet Grateful Med
(http://igm.nlm.nih.gov/) without charge or password. In the past, searching
with either Index Medicus or MEDLINE required the use of Medical Subject Headings
(MeSH), but use of MeSH is no longer necessary. The "regular" language of
medicine can now be directly entered because the current search interfaces
have built-in thesauruses for interpreting user entries. The PubMed interface
is easier to use, but it has less powerful search options. A useful PubMed
tool is "Clinical Queries" with built-in sensitivity and specificity concepts
related to therapy, diagnosis, etiology, and prognosis, which automatically
filter the literature. The Grateful Med interface is more powerful, and search
limits such as English, human studies, age groups, clinical trials, and gender
can be easily added.
After getting a list of references from the search interfaces, the full
text of the articles is often necessary to learn the results of a study. Depending
on the size of the reader's hospital or clinic, the library may have the relevant
journal. If not, the librarian can usually get a copy of the article via DOCLINE
or arrange for an interlibrary loan of the journal. Loansome Doc is a service
provided by the National Library of Medicine that allows registered users
to have articles mailed or faxed to them. For some journals, many of the articles
are on the World Wide Web. These journals include Archives of Ophthalmology
(http://archopht.ama-assn.org/), the American Journal of Ophthalmology
(http://www.ajo.com), and others.
Using the Internet Grateful Med interface, we enter "vitreous hemorrhage"
and "aspirin" in the query terms, and we apply "clinical trial" to limit the
publication types (Figure 1). Five
articles are found, and after reviewing the abstracts, we choose to read "Effects
of Aspirin Treatment on Diabetic Retinopathy."1
Figure 1. Image of the National Library of Medicine's search screen.
EVALUATING THE EVIDENCE
Although the quality of reports in peer-reviewed journals has been
steadily increasing, the level of evidence varies from one report to the next.
Some articles are published not because they provide good evidence, but because
they are speculative and stimulate additional inquiries. Careful review requires
the separation of conjecture from evidence.
Fortunately, a degree in neither epidemiology nor biostatistics is necessary
to appraise the evidence in the peer-reviewed literature. To illustrate how
judging articles can be easy, this article will focus on evaluating articles
on treatment. As seen in Table 1, the Evidence-Based Medicine Working Group attempts to judge the evidence by
distilling it to the following 3 specific questions2:
(1) Are the results valid? (2) What are the results? (3) Can the results be
applied to my patients? We will address how these 3 value judgment questions
relate to the hypothetical clinical question about the effects of aspirin
treatment on diabetic retinopathy.
Are the Results Valid?
To appropriately interpret the results of a study, we have to assess
how the methods used in the study would affect the results and conclusions.
Flaws in study design can leave the results uninterpretable. Primary and secondary
criteria can be developed to assist in the assessment of the literature. If
an article fails the primary criteria, the information should not be considered
evidence even if the article is the only report on the subject or question.
Secondary criteria are additional requirements that help identify potential
problems in study validity. Problems with secondary criteria may weaken the
apparent validity of the results, but depending on the magnitude of the deficiency,
the article can still be used as evidence.
The 2 primary criteria for assessing a treatment study are: (1) Were
patients assigned to treatment and controls by randomization? Past experience
has demonstrated that investigator enthusiasm can favor enrollment of a potential
study patient into one treatment group vs another in subtle (often subconscious)
but powerful ways. Investigators may subconsciously choose to only enroll
younger or apparently more healthy patients into the treatment arm because
they believe that these patients will be more likely to comply with the treatment
and complete follow-up, or might be less likely to suffer adverse experiences
from the treatment. On the surface, this may not seem to be a problem. However,
if the disease is less severe in these patients, or if such patients are more
likely to spontaneously recover, the results would then be biased. The results
may only reflect the distribution of disease; the treatment group may seem
to have better results because the patients in that group had less severe
disease. Without randomization, there is no way to really know how potential
differences in study groups will affect the results. Randomization tends to
balance risk factors, both known and unknown, in the study groups. Larger
study groups increase the likelihood that the risk factors will be balanced.
Nonrandomized studies can provide evidence, but the evidence should generally
be considered weaker than that from randomized clinical trials. In the hypothetical
question on the effect of aspirin on vitreous hemorrhage, the article that
was selected from our literature search reports the results of a large randomized
clinical trial in which patients with diabetic retinopathy were randomly assigned
to either aspirin at 650 mg per day, or placebo.
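The claim that randomization tends to balance risk factors, and that larger groups balance them better, can be checked with a small simulation. This is an illustrative sketch under stated assumptions: the 30% prevalence of the risk factor, the coin-flip allocation, and the function name `mean_imbalance` are all hypothetical choices, not details from the trial.

```python
import random


def mean_imbalance(n_patients: int, n_trials: int = 500, seed: int = 0) -> float:
    """Average absolute gap between arms in the share of high-risk patients.

    Each simulated patient is high risk with probability 0.3 (an arbitrary,
    illustrative value) and is assigned to an arm by a fair coin flip.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        counts = {"treatment": [0, 0], "control": [0, 0]}  # [high risk, total]
        for _ in range(n_patients):
            high_risk = rng.random() < 0.3
            arm = "treatment" if rng.random() < 0.5 else "control"
            counts[arm][0] += int(high_risk)
            counts[arm][1] += 1
        (t_high, t_n), (c_high, c_n) = counts["treatment"], counts["control"]
        if t_n == 0 or c_n == 0:
            continue  # skip the rare simulated trial with an empty arm
        gaps.append(abs(t_high / t_n - c_high / c_n))
    return sum(gaps) / len(gaps)


small = mean_imbalance(n_patients=20)
large = mean_imbalance(n_patients=500)
print(f"mean imbalance with 20 patients:  {small:.3f}")
print(f"mean imbalance with 500 patients: {large:.3f}")
```

The simulated risk factor stands in for any prognostic factor, known or unknown; the investigator never has to measure it for randomization to balance it on average.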
(2) Were all patients who entered the trial properly accounted for at
the end of the study? Every patient who enters a clinical trial should be
accounted for at its end. The greater the percentage of missing information
at the end of the study, the more suspect the results. This is because patients
who do not finish the study may have developed problems. For example, patients
may have suffered an adverse event related to the treatment and decided to
go elsewhere. Such patients might have an adverse experience that cannot be
assessed because the information is missing. This is a particular problem
because both the adverse experience and the missing information are related
to the treatment. One conservative approach to assessing missing data is to
attribute an adverse experience to all patients who had missing information
at the end of the study. One might also consider how the results would be
affected if the reason that persons in the treated arm did not return for
follow-up was because of poor results, while the reason the control group
had missing patients was because they had good results and decided that returning
for study visits was not necessary. In Table 2 of "Effects of Aspirin Treatment
on Diabetic Retinopathy," the investigators reported that 93% of all patients
were accounted for.1 The largest effect this
missing information could have on a treatment difference is therefore 7%.
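The conservative approach described above amounts to a worst-case calculation. The sketch below shows the arithmetic with hypothetical counts (100 patients per arm, 7 of them lost to follow-up); these numbers are assumptions chosen to mirror the 7% figure and are not taken from the aspirin trial, and `worst_case_risks` is a name introduced here.

```python
def worst_case_risks(events: int, observed: int, missing: int) -> tuple[float, float]:
    """Return (lowest, highest) possible risk in one study arm.

    lowest  assumes no patient with missing follow-up had the event;
    highest assumes every patient with missing follow-up had the event.
    """
    enrolled = observed + missing
    return events / enrolled, (events + missing) / enrolled


# Hypothetical counts, NOT taken from the aspirin trial.
treated_low, treated_high = worst_case_risks(events=30, observed=93, missing=7)
control_low, control_high = worst_case_risks(events=30, observed=93, missing=7)

# Most pessimistic view of the treatment: every missing treated patient had
# the event and no missing control patient did.
print(f"worst case for treatment: {treated_high - control_low:+.2f}")
# Most optimistic view: the reverse assumption.
print(f"best case for treatment:  {treated_low - control_high:+.2f}")
```

If the study's conclusion survives both extremes, the missing data cannot have produced it.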
A truly randomized comparison must include all of the randomized individuals
in the outcome assessment. Omitting subgroups because of missing information
or failure to comply with treatment creates a nonrandomized subgroup analysis.
It is often tempting to eliminate all patients who did not comply with the
study treatment. However, even if the treatment was not taken, or if it was
the opposite of the original assignment, the main analysis should be done
according to the original treatment assignment. This is called an "intention
to treat" analysis.
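The difference between an intention-to-treat analysis and an analysis restricted to compliant patients can be shown with a toy data set. Everything here is hypothetical: the 8 patient records are invented so that the effect is easy to see, and the function names are introduced only for this sketch.

```python
# Hypothetical records: (assigned arm, took assigned treatment, had the event)
patients = [
    ("treatment", True, False),
    ("treatment", True, False),
    ("treatment", False, True),
    ("treatment", False, True),
    ("control", True, False),
    ("control", True, True),
    ("control", False, True),
    ("control", True, False),
]


def event_rate(records):
    """Fraction of the given patients who had the event."""
    return sum(1 for _, _, had_event in records if had_event) / len(records)


def intention_to_treat(records, arm):
    # Keep everyone randomized to the arm, whether or not they complied.
    return [r for r in records if r[0] == arm]


def compliers_only(records, arm):
    # Drop noncompliant patients; the groups are no longer randomized.
    return [r for r in records if r[0] == arm and r[1]]


for label, select in [("intention to treat", intention_to_treat),
                      ("compliers only", compliers_only)]:
    treated = event_rate(select(patients, "treatment"))
    control = event_rate(select(patients, "control"))
    print(f"{label}: treatment {treated:.2f} vs control {control:.2f}")
```

In this invented data set the arms have identical event rates when analyzed as randomized, yet the treatment looks protective once noncompliant patients are dropped.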
An example of the problems with these nonrandomized comparisons can
be seen using data from the Coronary Drug Project.3
This study was designed to assess the safety and efficacy of several lipid-lowering
drugs in patients with coronary heart disease. One of the drugs studied was
clofibrate. The 5-year mortality rates in the 1103 patients assigned to clofibrate
and in the 2789 patients assigned to placebo were 20% and 20.9%, respectively
(P = .55). However, only about two thirds of the
clofibrate group were considered to be good adherers (taking 80% or more of
the study drug) throughout the 5-year study period, and in this group, the
5-year mortality rate was 15%. This was substantially lower than the 24.6%
5-year mortality rate in the group that was not taking study medication (P = .00011). Based on these data, one could conclude that
clofibrate markedly lowered the mortality rate. Interestingly, about two thirds
of the placebo group were also considered to be good adherers to their study
medication. In the placebo group, the good adherers also had a much lower
5-year mortality rate than the poor adherers (15.1% vs 28.3%, respectively
[P = 4.7 × 10⁻¹⁶]).
This demonstrates the danger of assessing the treatment effect in subgroups
of patients. In this case, the lower mortality seen in the group adhering
to the study medication was not a result of the medication, but rather associated
with patient behavior. One can also easily see the problem in comparing outcomes
in those who regularly took clofibrate with outcomes in the entire placebo
group. Even the comparison between adherers in the treated and control groups
is problematic, because there may be different motivations to adherence in
the 2 groups that could bias the results. It is only the overall comparison
that is truly a randomized comparison. This primary analysis is considered
hypothesis testing. Other subgroup analyses may be interesting, but they are
considered hypothesis generation.
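Laying the quoted mortality rates side by side makes the point concrete. The sketch below uses only the percentages cited above from the Coronary Drug Project; the dictionary layout and the helper name `gap` are assumptions made for illustration.

```python
# Five-year mortality (%) as quoted above from the Coronary Drug Project.
mortality = {
    "clofibrate": {"all assigned": 20.0, "good adherers": 15.0, "poor adherers": 24.6},
    "placebo": {"all assigned": 20.9, "good adherers": 15.1, "poor adherers": 28.3},
}


def gap(a: float, b: float) -> float:
    """Difference in percentage points, rounded to remove float noise."""
    return round(a - b, 1)


# The randomized (intention-to-treat) comparison: placebo minus clofibrate.
itt_gap = gap(mortality["placebo"]["all assigned"],
              mortality["clofibrate"]["all assigned"])

# Nonrandomized comparisons: poor adherers minus good adherers within each arm.
adherence_gap = {arm: gap(rates["poor adherers"], rates["good adherers"])
                 for arm, rates in mortality.items()}

# The tempting but misleading comparison: whole placebo arm vs clofibrate adherers.
misleading_gap = gap(mortality["placebo"]["all assigned"],
                     mortality["clofibrate"]["good adherers"])

print(itt_gap)         # 0.9
print(adherence_gap)   # {'clofibrate': 9.6, 'placebo': 13.2}
print(misleading_gap)  # 5.9
```

The adherence gap is larger in the placebo arm than in the clofibrate arm, which is why it cannot be attributed to the drug.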
Secondary criteria include the following: (1) Was masking used? Everyone
who is involved with a study is likely to have an opinion, conscious or unconscious,
as to what the results will show. Patients who know they are in the "treatment
group" want the treatment to work, may complain less about their adverse effects,
and may try harder when reading the eye chart. Study personnel who want the
treatment to work may try harder to measure an improvement in the outcome
variables with patients who got the treatment. One way to avoid this source
of bias is to let neither the patient nor study personnel know which treatment
was given to the patient. The reader should evaluate how well the study investigators
tried to minimize this source of bias. Because a matched placebo tablet was
used in the aspirin study, both the patients and the investigators were likely
to be unaware of who was assigned to take aspirin.
(2) Were the groups similar at the start of the trial? We discussed
earlier that imbalances in the distribution of risk factors that affect prognosis
might bias the results. The reader should look for such imbalances and judge
their size. If an imbalance is large and the risk factor strongly affects the
results, the reader has to be careful in interpreting the results. Although
randomization increases the likelihood that factors will be balanced in the
study groups, imbalances can occur. If imbalances
occur despite randomization, the imbalances can be at least partially accounted
for by performing an analysis that adjusts for the risk factor(s) that are
not balanced. If both the adjusted and unadjusted analyses show the same results,
then the reader can be more certain that the results are valid. The article
under discussion did not report the baseline characteristics, but Table 5
of the accompanying article4 did compare age,
duration of diabetes, type of diabetes, race, blood pressure, levels for serum
lipids and hemoglobin A1c, body weight, visual acuity, and level
of retinopathy. There were no important or clinically significant differences
(P<.01) between the aspirin and the placebo groups.
(3) Were the groups treated equally? Sometimes the control and treatment
groups may be treated differently. If, for example, there was reason to worry
particularly about the control group (because they were not getting treatment)
or about the treated group (because there may be some adverse effects from
the treatment), the investigator may choose to follow that group more carefully.
Although seemingly harmless, a group that is being observed more frequently
may have more adverse events recorded, or they might be receiving better medical
treatment. This ascertainment bias could have an important effect in assessing
the study results. The Early Treatment Diabetic Retinopathy Study article
specified that all patients were treated similarly.4
What Were the Results?
If the methods of the article are valid, then it is appropriate to assess
the results. The results should be examined for the magnitude of the treatment
effect. A treatment effect that is dose dependent would support the view that the effect
is related to the treatment. In addition, if the treatment has biological
plausibility, the reader then can be further assured of the validity of the
treatment effect. If there is no plausible biological mechanism for its actions,
then the treatment effect might be questioned. Finally, confirmation of the
results in other studies provides good evidence that the results are valid.
In the study, "Effects of Aspirin Treatment on Diabetic Retinopathy,"1 the investigators were not able to find an effect.
They reported a relative risk for development of vitreous hemorrhage (aspirin
to placebo) of 1.05 (99% confidence interval, 0.81-1.36). The relative risk
is the ratio of the risk in the intervention group divided by the risk in
the control group. When the relative risk is 1.0, there is no difference between
the risk of reaching the end point for patients assigned to the aspirin, and
that of patients assigned to placebo. A relative risk substantially less than
1.0 indicates a reduced risk (in this case for the aspirin-treated group),
while a relative risk substantially greater than 1.0 indicates an increased
risk. A confidence interval that includes 1.0 indicates that the observed
data are consistent with no difference between the 2 treatment groups. In
this case, the relative risk of developing vitreous hemorrhage compared with
placebo is 1.05, but the confidence interval includes 1.0. This suggests that
aspirin has little or no effect. Clinical trials cannot prove that
2 treatments are identical; there is always some uncertainty in the results.
The width of the confidence interval (0.81-1.36) indicates the magnitude
of this uncertainty. The "true" effect of aspirin on vitreous hemorrhage is
likely to lie between a 19% beneficial effect for aspirin and a 36% harmful
effect on this particular outcome.
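For readers who want to see where such numbers come from, the sketch below computes a relative risk and a confidence interval from event counts. It assumes the common log-scale approximation for the interval, which may not be the method the trial investigators used, and the counts are hypothetical values chosen only to give a relative risk of 1.05. The function names are introduced here for illustration.

```python
import math
from statistics import NormalDist


def relative_risk_ci(events_trt, n_trt, events_ctl, n_ctl, level=0.99):
    """Relative risk with a log-scale approximate confidence interval."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of ln(RR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def as_percent_change(rr):
    """Express a relative risk as a percent change in risk vs the control group."""
    return round((rr - 1) * 100, 1)


# Hypothetical counts chosen only for illustration; NOT the trial's data.
rr, low, high = relative_risk_ci(events_trt=210, n_trt=1000,
                                 events_ctl=200, n_ctl=1000)
print(f"RR = {rr:.2f}, 99% CI {low:.2f} to {high:.2f}")
print("interval includes 1.0:", low <= 1.0 <= high)

# The interval reported in the article, restated as percent changes in risk.
print(as_percent_change(0.81), as_percent_change(1.36))  # -19.0 36.0
```

The last line reproduces the reading given in the text: the lower limit of 0.81 corresponds to a 19% reduction in risk and the upper limit of 1.36 to a 36% increase.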
Can the Results Be Applied to My Patients?
One way to examine this issue is to assess whether patients similar
to the reader's patients were well represented in the study. If they were,
then the results are likely to apply. However, if there are differences, then
clinical judgment is required to determine whether the differences are significant.
If the differences are minor, then the results are also likely to apply. If
substantive differences are present, then the reader should determine how
the differences might affect the results. In "Effects of Aspirin Treatment
on Diabetic Retinopathy,"1 the inclusion criteria
are broad, so the results should be broadly applicable.
After deciding about the types of patients covered by the study, the
reader has to determine whether all clinically important outcomes were studied.
In ophthalmology, visual acuity is an important outcome variable. If the visual
acuity improves in the treatment group vs the control group, it is likely
that the treatment is effective. Surrogate measures such as intraocular pressure
can also be used if the surrogate measure has been previously shown to correlate
with an outcome of interest such as visual field measurements or visual acuity.
A recent example of the risk of using surrogate end points is
a clinical trial on vitamin A and retinitis pigmentosa.5
In that study, the authors conducted a well-designed and well-executed clinical
trial. However, their outcome assessment was based on changes in electroretinograms,
which some clinicians did not accept as a standard clinical measure of effectiveness.
As a result, the study results have remained controversial and have not had
the desired effect on clinical practice, despite the fact that the study investigators
have offered further evidence that their results also applied to visual field
measurements.6
The reader should also look for harmful effects. If a treatment shows
efficacy but has significant adverse effects, the reader may be less likely
to prescribe it. The only way to evaluate the adverse effects is to perform
an analysis of all the adverse experiences in the study. If the study fails
to address adverse events or "quality-of-life" outcomes, the reader should
be cautious about broadly applying the results. The assessment of the effect
of macular hole surgery provides a good example of the need to consider both
the benefits and the potential risks. Clinical trials have shown that the
surgery is effective, but few studies have addressed the effect of the required
postoperative positioning. After macular hole surgery, patients are asked
to remain in the face-down position for at least 1 to 2 weeks. Although a gain
in visual acuity following surgery does occur, it comes at a cost. Face-down
positioning requires a caretaker for meal preparation and household chores.
In addition, socializing, watching television, and other activities have to
be curtailed during this period. This reduction in quality of life is probably
most difficult for older patients (the group most likely to be offered surgery),
but its effect is rarely included in studies. The practitioner should consider
and discuss both benefits and potential adverse effects with each patient.
Assessment
We asked the question, "Does oral aspirin treatment affect vitreous
hemorrhage in patients with diabetes mellitus?" After finding a list of articles,
we decided to evaluate the article "Effects of Aspirin Treatment on Diabetic
Retinopathy."1 We decided that the methods
were valid, that aspirin does not have an appreciable effect on the development
of vitreous hemorrhage, and that the inclusion criteria of the study were
sufficiently broad to make the study results generalizable. Based on this,
we decided that the article provided good evidence that can be applied to
answer our clinical question.
Using an approach similar to that described in this article, the American
Academy of Ophthalmology's Ophthalmic Technology Assessment Committee (OTAC)
reports the evidence on new and emerging procedures. Publications from OTAC
often provide a solid first step toward determining whether one should adopt a
new procedure.
CONCLUSION