## Assessing the Value of Candidate Tests: Epidemiological Considerations

Diagnostic tests are often judged on the basis of their sensitivity, specificity, and predictive value. Recall that sensitivity is defined as the probability that a test will be positive when the disease is present and specificity reflects the probability that a test will be negative if the disease is not present. Ideally, a test for AD would be both highly sensitive and specific. Most biological tests under consideration have a range of values. Establishing cutoff points for disease usually involves a tradeoff between sensitivity and specificity. The relationship between sensitivity and specificity can be characterized by a receiver operator characteristic (ROC) curve that plots the probability of having a true positive result against that of a false positive one for a range of cutoff scores (see Fig. 3). Generally, sensitivity is emphasized when failure to detect a disease has very deleterious consequences. Specificity is emphasized when a false-positive result leads to potential harm. In terms of tests for AD, the "ideal" set point of this balancing act will evolve over time as a reflection of changes in the risk:benefit ratio of treatments being developed. New tests that allow for improvement of both sensitivity and specificity (shifting the ROC curve) would be considered an advance over current diagnostic probes. Furthermore, tests also will need to be judged by how early they can sensitively detect the underlying AD pathologic process without generating a false-positive rate that is too high.*
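The tradeoff described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the text: the function name `roc_points`, the rule that a subject tests positive when the marker value is at or above the cutoff, and the marker values themselves are all assumptions made for the example.

```python
from typing import List, Tuple


def roc_points(diseased: List[float], healthy: List[float]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, sensitivity, 1 - specificity) for each candidate cutoff.

    Assumption: a subject tests positive when the marker value is >= cutoff.
    """
    cutoffs = sorted(set(diseased) | set(healthy))
    points = []
    for cutoff in cutoffs:
        # Sensitivity: fraction of diseased subjects who test positive.
        sensitivity = sum(x >= cutoff for x in diseased) / len(diseased)
        # 1 - specificity: fraction of healthy subjects who test positive.
        false_positive_rate = sum(x >= cutoff for x in healthy) / len(healthy)
        points.append((cutoff, sensitivity, false_positive_rate))
    return points


# Made-up marker values, for illustration only.
diseased = [4.0, 5.0, 6.0, 7.0, 8.0]
healthy = [1.0, 2.0, 3.0, 4.0, 5.0]

for cutoff, sens, fpr in roc_points(diseased, healthy):
    print(f"cutoff={cutoff:.1f}  sensitivity={sens:.2f}  1-specificity={fpr:.2f}")
```

Raising the cutoff lowers the false-positive rate at the cost of sensitivity; plotting the second column against the third would trace out an ROC curve for these hypothetical data.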

*It is likely that diagnostic tests will have different ROC curves for each of the stages of AD that we have discussed. For example, the sensitivity of marker A for a given specificity (say 80%) may be 95% at CDR stage 2, 60% at CDR stage 0.5, and 30% for the preclinical stage. Conceivably, the most appropriate set point between sensitivity and specificity on a given ROC curve would differ for each stage.

Fig. 3. Example of a receiver operator characteristic (ROC) curve. (Horizontal axis: 1 − specificity.)

Clinically, diagnostic tests are evaluated not by their sensitivity and specificity but by their positive predictive value (PPV), that is, the probability that the disease is present if the test is positive, and their negative predictive value, the probability that there is no disease if the test is negative. Such information is what clinicians and their patients want to know when a test has been ordered. The PPV is defined as the number of true positives divided by the sum of the true positives and the false positives. Central to calculating the PPV is an estimate of the prevalence of the disease (prior probability) in the community being tested. According to Bayes' theorem, PPV can be calculated as follows:

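$$
\mathrm{PPV} = \frac{\text{prevalence} \times \text{sensitivity}}{(\text{prevalence} \times \text{sensitivity}) + \left[(1 - \text{prevalence}) \times (1 - \text{specificity})\right]}
$$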

Prior probability determines how much of an impact the false-positive rate (i.e., 1 − specificity) has on the predictive value of the test. For example, even if a test were 99% sensitive and 99% specific, if the prior probability were only 1%, the PPV would be only 50%. By contrast, if the prior probability were 50%, the PPV would be 99%.
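The arithmetic behind these two figures can be checked with a short Python sketch. The function name `predictive_values` is hypothetical, and the negative predictive value is computed by the analogous Bayes calculation, which the text defines but does not write out; the function also assumes inputs for which neither denominator is zero.

```python
from typing import Tuple


def predictive_values(prevalence: float, sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given prior probability.

    All arguments are probabilities between 0 and 1.
    """
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# The example from the text: a test that is 99% sensitive and 99% specific.
for prior in (0.01, 0.50):
    ppv, npv = predictive_values(prior, sensitivity=0.99, specificity=0.99)
    print(f"prior={prior:.2f}  PPV={ppv:.2f}  NPV={npv:.4f}")
```

At a prior of 1% the true positives (0.01 × 0.99) and the false positives (0.99 × 0.01) are equal in number, which is why the PPV falls to one half despite the test's accuracy.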

Thus, for Alzheimer's disease, if the estimated prevalence is relatively low, even a very sensitive and specific test would have limited predictive value. However, establishing true prevalence rates for AD is not as straightforward as it might first appear. The prevalence reported in the literature reflects an estimate of the number of clinically demented cases in a particular age range that are judged to be due to AD. Some reports have suggested that as many as 10% of Americans over the age of 65 and nearly 50% of elders over 85 suffer from a clinical dementia of the Alzheimer's type (1). There are several limitations to using established prevalence estimates to evaluate the usefulness of newer diagnostic strategies. First, these numbers are based on current methods for diagnosing the disease. They identify individuals whose clinical state has declined to the point of being demented, but do not include individuals who are in the preclinical or presymptomatic stages of the illness. Currently, there is no definitive way to identify such individuals for an accurate estimate of the prevalence of AD pathology in the community. Several lines of evidence suggest that the prevalence is quite high. For example, if 40-50% of individuals over 85 suffer from a clinical dementia of the Alzheimer's type and if the disease process begins 15-20 years before a person is clinically demented, then 40-50% of individuals in their early 70s may have developing AD pathology. While these particular numbers may represent the "worst-case scenario," the logic behind them needs to be taken seriously. Certainly, in evaluating tests for AD in the presymptomatic stages, we will need new ways of estimating the prior probability of underlying pathology in order to assess the potential utility of the assays.
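To see why the choice of prior matters so much here, the sketch below applies the Bayes calculation for PPV to a single hypothetical test under two different prevalence assumptions. The 90% sensitivity and 90% specificity are invented for illustration and do not describe any real assay; the priors are the 10% figure cited above and 45%, taken here as the midpoint of the 40-50% "worst-case" range. The function name `ppv_for_prior` is likewise an assumption.

```python
def ppv_for_prior(prior: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value from Bayes' theorem for a given prior probability."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Two candidate priors: the 10% clinical prevalence cited for those over 65,
# and 45% as the midpoint of the 40-50% "worst-case" range.
for label, prior in (("clinical prevalence", 0.10), ("worst-case pathology", 0.45)):
    value = ppv_for_prior(prior, SENSITIVITY, SPECIFICITY)
    print(f"{label}: prior={prior:.2f}  PPV={value:.2f}")
```

Under these assumptions the same test moves from a PPV of about one half to nearly 0.9, so the judged usefulness of a presymptomatic assay depends heavily on which estimate of underlying pathology is adopted.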

Review of the epidemiological aspects of early diagnosis of AD parallels discussions of screening tests in medicine that address ways of evaluating at-risk populations. However, the current approach to diagnosis in AD distinguishes it from other diseases for which screening tests are common. Most often, a positive result on a screening evaluation leads to "more definitive" tests (e.g., occult blood in the stool on a screening examination leads to colonoscopy and/or radiological studies; an abnormal screening digital prostate examination or a positive PSA test leads to sonography and biopsy of the prostate). Unfortunately, short of brain biopsy, which is very rarely done, there is currently no "gold standard" marker for AD (see below) that could provide the next level of assessment. Thus, in AD the usual distinction between screening and diagnostic tests is blurred. Despite the absence of a definitive noninvasive marker for AD, one can still make use of test data. A positive test can help identify elders at greatest risk for becoming demented, who could then be followed with greater vigilance. Confirmatory evidence could come in the form of a convergence of other diagnostic markers or clinical signs that become positive over time. Depending on the risk:benefit profile of available therapies, the threshold for initiating treatment in such patients might be lowered. Unfortunately, there are also potential negative social consequences in identifying elders at increased risk for becoming demented. Such information could be used by insurance companies or other members of society to deprive them of potential benefits. These important issues are discussed in Chapter 12.