The hierarchy of evidence

Michael Bigby

Once appropriate questions have been formulated, what are the sources for the best evidence to answer these questions? Potential sources include personal experience, colleagues or experts, textbooks, articles published in journals and systematic reviews. An important principle of evidence-based medicine is that the quality (strength) of evidence is based on a concept of a hierarchy of evidence. This hierarchy of evidence consists of, in descending order1,2:

• results of systematic reviews of well designed studies

• results of one or more well designed studies

• results of large case series

• expert opinion

• personal experience.

The ordering of the hierarchy of evidence has been widely discussed, actively debated and sometimes hotly contested.3-7

Well-conducted systematic reviews of well-performed clinical studies (especially if the studies have results of similar magnitude and direction, and if there is statistical homogeneity) are most likely to have results that are true and useful. A systematic review is an overview that answers a specific clinical question and contains a thorough and unbiased search of the relevant literature, explicit criteria for assessing studies, and a structured presentation of the results.

A systematic review that uses quantitative methods to summarise results is a meta-analysis.3,8 Meta-analysis is credited with allowing recognition of important treatment effects by combining the results of small trials that individually lacked the power to demonstrate differences between treatments. For example, the benefits of intravenous streptokinase in acute myocardial infarction were recognised from the results of a cumulative meta-analysis of smaller trials at least a decade before the treatment was recommended by experts and before it was demonstrated to be efficacious in large clinical trials.3,4 Meta-analysis has been criticised because of discrepancies between the results of meta-analyses and the results of large clinical trials.3,5-7 For example, results of a meta-analysis of 14 small studies of calcium in the treatment of pre-eclampsia showed benefit of treatment, whereas a large trial failed to show a treatment effect.3 The frequency of discrepancies ranges from 10% to 23%.3 Discrepancies can often be explained by differences in treatment protocols, heterogeneity of study populations or changes that occur over time.3 Not all systematic reviews and meta-analyses are equal. Systematic reviews conducted within the Cochrane Collaboration are rated among the best, but even then up to a third may contain significant problems.9,10 Methods for assessing the quality of each type of analysis are available.2,11
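The fixed-effect (inverse-variance) pooling that underlies the simplest form of meta-analysis can be sketched in a few lines. The trial counts below are hypothetical, invented purely for illustration; they are not taken from any study cited in this chapter.

```python
import math

# Hypothetical 2x2 results from three small trials:
# (events in treated group, n treated, events in control group, n control).
trials = [(8, 50, 15, 50), (5, 40, 11, 42), (10, 60, 18, 58)]

def log_odds_ratio(a, n1, c, n2):
    """Log odds ratio and its variance for one trial."""
    b, d = n1 - a, n2 - c
    log_or = math.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d   # standard large-sample variance
    return log_or, var

# Fixed-effect pooling: weight each trial's estimate by 1/variance.
weights, estimates = [], []
for a, n1, c, n2 in trials:
    log_or, var = log_odds_ratio(a, n1, c, n2)
    weights.append(1 / var)
    estimates.append(log_or)

pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
se = math.sqrt(1 / sum(weights))
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(f"Pooled OR {math.exp(pooled):.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

With these invented counts, no single trial is large enough to exclude "no effect" on its own, yet the pooled confidence interval excludes an odds ratio of 1 — the gain in power from combining small trials that the streptokinase example describes. Real meta-analyses add refinements (random-effects models, heterogeneity statistics) not shown here.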

The type of clinical study that constitutes best evidence is determined by the category of the question being asked (Table 7.1).12 Questions about diagnosis are best addressed by comparisons with a reference standard evaluated in an appropriate spectrum of patients where the test is likely to be used.2,11,13,14 Questions about therapy and prevention are best addressed by randomised controlled trials (RCTs).2,11,15,16 Cohort studies or case-control studies best address questions about prognosis, harm and disease aetiology.2,11,17,18 Methods for assessing the quality of each type of evidence are available.2,9

Table 7.1

Grades of evidence12

Grade A

Level 1a
Therapy/prevention/harm: Systematic review (with homogeneitya) of RCTs
Prognosis/aetiology: Systematic review (with homogeneity) of inception cohort studies, or a CPG validated on a test set
Diagnosis: Systematic review (with homogeneity) of Level 1 diagnostic studies, or a CPG validated on a test set

Level 1b
Therapy/prevention/harm: Individual RCT (with narrow confidence intervals)
Prognosis/aetiology: Individual inception cohort study with at least 80% follow up
Diagnosis: Independent blind comparison of an appropriate spectrum of consecutive patients, all of whom have undergone both the diagnostic test and the reference standard

Level 1c
Therapy/prevention/harm: All or noneb
Prognosis/aetiology: All-or-none case series
Diagnosis: Very high sensitivity or specificity

Grade B

Level 2a
Therapy/prevention/harm: Systematic review (with homogeneity) of cohort studies
Prognosis/aetiology: Systematic review (with homogeneity) of either retrospective cohort studies or untreated control groups in RCTs
Diagnosis: Systematic review (with homogeneity) of at least Level 2 diagnostic studies

Level 2b
Therapy/prevention/harm: Individual cohort study (including low quality RCT, for example <80% follow up)
Prognosis/aetiology: Retrospective cohort study or follow up of untreated control patients in an RCT, or CPG not validated in a test set
Diagnosis: Independent blind comparison but either in non-consecutive patients, or confined to a narrow spectrum of study individuals (or both), all of whom have undergone both the diagnostic test and the reference standard, or a diagnostic CPG not validated in a test set

Level 2c
Therapy/prevention/harm: "Outcomes" researchc

Level 3a
Therapy/prevention/harm: Systematic review (with homogeneity) of case-control studies
Diagnosis: Systematic review (with homogeneity) of Level 3b and better studies

Level 3b
Therapy/prevention/harm: Individual case-control study
Diagnosis: Independent blind comparison of an appropriate spectrum, but the reference standard was not applied to all study patients

Grade C

Level 4
Therapy/prevention/harm: Case series (and poor quality cohort and case-control studies)
Prognosis/aetiology: Case series (and poor quality prognostic cohort studies)
Diagnosis: Reference standard was not applied independently or not applied blindly

Grade D

Level 5
Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"

These levels were generated in a series of iterations among members of the NHS R&D Centre for Evidence-based Medicine (Chris Ball, Dave Sackett, Bob Phillips, Brian Haynes and Sharon Straus). For details see (accessed 17 Sep 1998). Recommendations based on this approach apply to "average" patients and may need to be modified in light of an individual patient's unique biology (risk, responsiveness etc.) and preferences about the care they receive.

RCT, randomised controlled trial; CPG, clinical practice guideline - a systematically developed statement designed to help practitioners and patients make decisions about appropriate health care for specific clinical circumstances.

aHomogeneity: lacking variation in the direction and magnitude of results of individual studies.

bAll or none: interventions that produced dramatic increases in survival or outcome, for example streptomycin for tuberculosis meningitis.

cOutcomes research includes cost-benefit, cost-effectiveness and cost-utility analysis.
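The measures that summarise a diagnostic test against a reference standard — sensitivity, specificity and the likelihood ratios derived from them — come directly from a 2x2 table. A minimal sketch, with counts invented purely for illustration:

```python
# Hypothetical 2x2 table of a diagnostic test against a reference standard.
# Counts are made up for illustration only.
tp, fp = 90, 30    # test positive: disease present / disease absent
fn, tn = 10, 170   # test negative: disease present / disease absent

sensitivity = tp / (tp + fn)               # proportion of diseased correctly detected
specificity = tn / (tn + fp)               # proportion of non-diseased correctly negative
lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio

print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

Note that these estimates are only meaningful when, as the diagnosis column of Table 7.1 requires, the test and the reference standard are applied independently and blindly to an appropriate spectrum of patients; applying the reference standard only to test-positive patients biases both measures.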

The RCT has become the gold standard for determining treatment efficacy, following the publication in 1948 of the trial that demonstrated that streptomycin was effective in the treatment of tuberculosis.19 Over 327 700 RCTs have been recorded in the Cochrane Central Register of Controlled Trials, and thousands more probably exist in unpublished form.20 Large, inclusive, fully blinded RCTs are likely to provide the best possible evidence about effectiveness.21-23 However, this assumption about methods should be tested empirically, just as assumptions about treatment effects need to be substantiated by empirical evidence.4 Studies have demonstrated that failure to use randomisation or adequate concealment of allocation results in larger estimates of treatment effects, caused predominantly by a poorer prognosis in non-randomly selected control groups than in randomly selected control groups.23
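The way non-random allocation inflates apparent treatment effects can be shown with a toy simulation (all numbers invented): the simulated treatment has no true effect, yet it looks beneficial when healthier patients are preferentially allocated to it, leaving a poorer-prognosis control group.

```python
import random

random.seed(0)

# Each simulated patient has a latent severity in [0, 1); sicker patients
# are more likely to have a bad outcome. The treatment itself does nothing.
def bad_outcome(severity):
    return random.random() < 0.2 + 0.5 * severity

patients = [random.random() for _ in range(10000)]

# Randomised allocation: treated and control groups have similar severity.
random.shuffle(patients)
rand_treat, rand_ctrl = patients[:5000], patients[5000:]

# Non-random allocation: the new treatment goes to the healthier half,
# so the control group has a systematically poorer prognosis.
ordered = sorted(patients)
biased_treat, biased_ctrl = ordered[:5000], ordered[5000:]

def bad_rate(group):
    return sum(bad_outcome(s) for s in group) / len(group)

print("randomised:  treated %.2f vs control %.2f" %
      (bad_rate(rand_treat), bad_rate(rand_ctrl)))
print("non-random:  treated %.2f vs control %.2f" %
      (bad_rate(biased_treat), bad_rate(biased_ctrl)))
```

Under randomisation the two groups' outcome rates are nearly identical, as they should be for a null treatment; under biased allocation a large spurious "benefit" appears — the mechanism (poorer prognosis in non-randomly selected controls) described above, in caricature.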

Expert opinion can be valuable, particularly for rare conditions in which the expert has the most experience or when other forms of evidence are not available. However, several studies have demonstrated that expert opinion often lags significantly behind conclusive evidence.1 Experts should be aware of the quality of evidence that exists.

Whereas personal experience is an invaluable part of becoming a competent physician, the pitfalls of relying too heavily on personal experience have been widely documented.1,24,25 Nisbett and Ross extensively reviewed people's ability to draw inferences from personal experience and documented several pitfalls.26 They include:

• overemphasis on vivid anecdotal occurrences and underemphasis on statistically strong evidence;

• bias towards recognising and accepting evidence that supports one's beliefs, and a parallel failure to recognise or accept evidence that contradicts them;

• persistence of beliefs in spite of overwhelming evidence against them.

Although textbooks appear to be a valuable source of evidence, they have several well-documented shortcomings. First, by virtue of how they are written, produced and distributed, most are about 2 years out of date at the time of publication. Most textbook chapters are narrative reviews that do not consider the quality of the evidence reported.1,2 They also tend to reflect the biases and shortcomings of the experts who write them.

More detailed studies of the relationship of study type and the direction and magnitude of purported benefit are needed in dermatology in order to guide dermatologists on the relative merits of different study designs. In the meantime, the hierarchy of evidence should not be conceptualised as a linear phenomenon (i.e. as a scale going from "good" to "bad"). The quality and relevance of evidence should be considered. Thus, a well-conducted large cohort study may be more reliable than a small RCT that has violated most aspects of good RCT design and reporting. Similarly, a small RCT of moderate quality dealing with the exact problem that the patient is complaining about (for example lipodermatosclerosis) is likely to be more useful than a large RCT dealing with a different problem (for example venous stasis ulcer).
