(Radiographics. 2002;22:e4-e4.)
© RSNA, 2002
Novel Internet-based Tool for Correcting Apparent Sensitivity and Specificity of Diagnostic Tests to Adjust for Referral (Verification) Bias¹
Peter G. Danias, MD, PhD and
J. Anthony Parker, MD, PhD
¹ From the Cardiovascular Division, Department of Medicine (P.G.D.), and Nuclear Medicine Division, Department of Radiology (J.A.P.), Beth Israel Deaconess Medical Center and Harvard Medical School, 330 Brookline Ave, Boston Mass 02215. Received April 30, 2001; revision requested November 7; revision received and accepted December 6. Address correspondence to: P.G.D. (e-mail: pdanias{at}caregroup.harvard.edu)
Abstract
Referral (verification) bias, the selective sampling of a population under evaluation for definitive confirmation of disease status, has been recognized as affecting the measured sensitivity and specificity of diagnostic tests. The authors developed an Internet-based Java applet to correct the apparent (measured) sensitivity and specificity of diagnostic tests for referral (verification) bias. The applet was applied to the diagnosis of coronary artery disease with exercise stress testing. Referral rates for coronary arteriography can be adjusted separately for patients with positive and negative test results. The more complicated situation, in which the results are stratified by exercise heart rate, was also investigated.
Index Terms: Data analysis
Introduction
The appropriate use of diagnostic tests to determine the presence or absence of a certain disease depends on the prevalence of the disease in the population under evaluation (ie, pretest likelihood of disease) and the inherent operative characteristics of the test, namely sensitivity and specificity. However, measures of sensitivity and specificity are frequently subject to referral (verification) bias, the selective sampling of the population under evaluation for definitive confirmation of disease status. The impact of referral (verification) bias on the apparent (measured) sensitivity and specificity has been previously recognized and discussed (1-7) and mathematically demonstrated (8). However, the complexity of the formulas needed to correct clinical trial data for this bias results in underuse of such approaches in the medical literature.
The advances in and wide availability of electronic media have facilitated the implementation of methods for quickly processing relevant information and enhancing clinical decision making. We present a practical approach for determining the effects of referral (verification) bias on the measured (apparent) sensitivity and specificity values of any diagnostic test. Using Internet technology, we offer a simple, readily usable interactive educational and research tool (Java applet) that can correct for referral (verification) bias. Coronary artery disease (CAD) is used as the "model" disease under investigation and exercise stress testing as the test for which sensitivity and specificity are measured. Coronary angiography is used as the reference standard for definitively confirming (or rejecting) the presence of CAD.
Methods
Three scenarios were modeled with the Bayes theorem (Flowcharts A1, A2, and A3). Standard equations for calculation of sensitivity and specificity were used for the three scenarios (Appendices 1-5), and equations for calculation of apparent sensitivity and apparent specificity were derived for the last two scenarios (Appendices 2 and 4).

Figure A1. In this flowchart, the CAD status (d) of the population (N) is represented in the second row. The TP, TN, FP, and FN test results can be calculated as a function of the test sensitivity (Se) and specificity (Sp). (+ CAD) = patients with CAD, (- CAD) = patients without CAD.

Figure A2. Flowchart shows that when the study population is subject to referral bias for conclusive determination of disease status (referral rates p+ and p- for a positive and negative test result, respectively), apparent test sensitivity and specificity are measured (see text).

Figure A3. Flowchart for calculation of the operative characteristics of a test with varying sensitivity and specificity depending on the test performance. A proportion k of the patients with CAD and a proportion m of patients without CAD will achieve at least 85% of age-predicted maximum heart rate. The sensitivity and specificity of the test are given by Se1 and Sp1 when at least 85% of maximum heart rate is achieved and by Se2 and Sp2 when less than 85% of maximum heart rate is achieved. p1+ and p1- are the referral rates for positive (+) and negative (-) test results for patients who achieved at least 85% of maximum heart rate, and p2+ and p2- are referral rates for positive and negative test results for patients who achieved less than 85% of maximum heart rate.
With the Java programming language (Sun Microsystems, Palo Alto, Calif), Applets 1 and 2 were written to allow an interactive input-output graphical display of the equations described in Appendices 2 and 4. The Java program was compiled with CodeWarrior Professional (Metrowerks, Austin, Tex). A single Java source file is used for both applets; an HTML parameter determines which of the two programs is displayed. For the applets to work, your browser must be able to run Java 1.1.
Results
Applet 1. Tests with Binary Outcomes
The (true) sensitivity and specificity of a test can be calculated if all study subjects who undergo testing have definitive confirmation of disease state. In our example, sensitivity and specificity for stress testing can be determined if all subjects undergo both exercise stress testing and coronary angiography, regardless of the results of the stress test (Appendix 1). However, it is neither feasible nor ethical to submit all study patients to invasive testing for definitive confirmation of the presence of disease, as coronary angiography carries a measurable (albeit small) risk of potentially serious complications (9,10). Accordingly, patients with a low posttest likelihood of disease are commonly not referred for cardiac catheterization and are assumed to be free of disease. In fact, the parameter "normalcy" has been widely used instead of specificity to describe the performance of nuclear stress testing (11-18), although the validity of such an approach has been criticized (19,20).
In Applet 1, the effects of referral bias can be easily demonstrated. The instructions for use of this applet are in the legend, and the mathematical equations used are described in Appendix 2. Apparent (measured) sensitivity and specificity and the corresponding "true" values are equal when the referral rates for positive and negative stress test results are set to the same value (values can be typed in and updated by entering a carriage return). By modifying the referral rates for positive and negative tests, the impact on apparent (measured) sensitivity and specificity can be explored.
For example, it has been reported that only 5%-40% of patients are referred for cardiac catheterization if the initial noninvasive evaluation suggests the presence of CAD, and a smaller fraction if the stress test results are negative (21-31). If one uses data from clinical studies reporting such referral rates for coronary angiography after positive and negative stress test results, one can see how strongly referral rates affect the apparent test sensitivity and specificity. For example, setting the referral rate to 32% for positive test results and 3.5% for negative test results (22), and starting with a true sensitivity of 75% and a true specificity of 80% as the inherent operative characteristics of stress testing for detection of CAD, one obtains an apparent sensitivity of 96% and an apparent specificity of 30%. Conversely, with the same referral rates, starting from an apparent sensitivity of 75% and an apparent specificity of 80%, the true sensitivity and specificity are 25% and 97%, respectively. Because the apparent sensitivity and specificity depend on both the true values and the referral rates for positive and negative test results, the relationship between apparent and true values is complex. The positive and negative predictive values of the test, however, are not affected by referral bias: true-positive and false-positive results are verified at the same rate (p+), as are true-negative and false-negative results (p-), so the ratios that define the predictive values remain unchanged. Positive and negative predictive values do, however, depend on the prevalence of disease in the population under evaluation. For the same true sensitivity and specificity (75% and 80%, respectively), the positive predictive value increases dramatically from 4% (for a disease prevalence of 1%) to 99% (for a disease prevalence of 95%). The extreme values of 0% and 100% can also be achieved if the disease prevalence is set at 0% or 99%, respectively (the latter after rounding).
By using our applet and modifying the population disease prevalence, one can also obtain positive and negative predictive values that are numerically equal to the positive and negative posttest probabilities, respectively. Appendix 3 provides more examples that demonstrate how Applet 1 can provide a better appreciation of the effect of referral (verification) bias.
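The invariance of predictive values to referral, and their dependence on prevalence, can be checked numerically. The Python sketch below is illustrative only (the function names are hypothetical, not part of the applets); it assumes the Applet 1 defaults of 10,000 patients and 50% prevalence, true sensitivity and specificity of 75% and 80%, and the 32%/3.5% referral rates quoted above.

```python
def cell_counts(n, d, se, sp):
    """Expected TP, FN, FP, TN counts for n patients with disease prevalence d."""
    tp = n * d * se
    fn = n * d * (1 - se)
    fp = n * (1 - d) * (1 - sp)
    tn = n * (1 - d) * sp
    return tp, fn, fp, tn


def predictive_values(tp, fn, fp, tn):
    """Positive and negative predictive values from a 2x2 table."""
    return tp / (tp + fp), tn / (tn + fn)


se, sp = 0.75, 0.80          # true sensitivity and specificity
p_pos, p_neg = 0.32, 0.035   # referral rates after positive and negative results

tp, fn, fp, tn = cell_counts(10000, 0.50, se, sp)
true_pv = predictive_values(tp, fn, fp, tn)
# Only referred (verified) patients enter the apparent table:
# positive results are scaled by p_pos, negative results by p_neg.
apparent_pv = predictive_values(p_pos * tp, p_neg * fn, p_pos * fp, p_neg * tn)
print("PPV/NPV, all patients:      %.3f %.3f" % true_pv)
print("PPV/NPV, referred patients: %.3f %.3f" % apparent_pv)

# Prevalence, unlike referral, changes the positive predictive value.
for d in (0.01, 0.50, 0.95):
    ppv, _ = predictive_values(*cell_counts(10000, d, se, sp))
    print(f"prevalence {d:.0%}: PPV {ppv:.0%}")
```

The two printed PPV/NPV pairs coincide because each referral rate multiplies both cells of a ratio and cancels; the prevalence loop reproduces the 4% and 99% figures given in the text.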
Applet 2. Tests with Multiple Outcomes
Frequently, the problem of assessing sensitivity and specificity becomes more complicated. A single test can have different sensitivities and specificities depending on the performance of the test itself. Furthermore, the test results may not be binary (ie, simply positive or negative). The likelihood of pursuing definitive confirmation of the presence of disease depends on multiple factors beyond just the "positivity" or "negativity" of a test result. In our example, the decision to proceed to cardiac catheterization after an exercise stress test depends on multiple factors, including whether the patient achieved maximal exercise (ie, at least 85% of age-predicted maximum heart rate), developed angina, or demonstrated marked electrocardiographic abnormalities or other "high risk" markers suggestive of an adverse prognosis. Applet 2 explores how adding one more level of complexity (achieving at least 85% of maximum heart rate) affects the impact of referral (verification) bias on the relationship between true and apparent sensitivity and specificity values. The mathematical equations governing the functionality of Applet 2 are presented in Appendix 4.
Apparent sensitivity and specificity values can take all possible values, including 0% and 100% regardless of the "true" test sensitivity and specificity, if referral rates are set to extreme values (combinations of 0% and 100%). Again, positive and negative predictive values are independent of referral bias and can be calculated equally from the apparent or true positive and negative test results. Appendix 5 demonstrates how Applet 2 can provide a better appreciation of the effect of referral (verification) bias.
Discussion
When sensitivity and specificity are measured and reported as "apparent" values, one should also include an estimate (or measure) of the referral (verification) bias in order to calculate the true test sensitivity and specificity. In this report, we used exercise stress testing as the model test, but these applets can be applied to any diagnostic test.
The simple two-test-result, two-disease-state model is useful for understanding how test results affect the posttest likelihood of disease. However, for tests that are not simply positive or negative and can encompass a spectrum of graded results, the calculation of the true test sensitivity and specificity is more complicated and is affected by the proportion of patients within each test-response group and the relative referral (verification) rates for each test outcome. These factors have to be considered and integrated for the critical assessment of the operative characteristics of diagnostic tests.
The use of simple interactive tools, such as the two Java applets included in this report, which are made readily available on the World Wide Web, can serve as educational and research tools for appreciating the effects of and adjusting for referral (verification) bias.
Appendix 1
The sensitivity (Se) of a diagnostic test is calculated from the true-positive (TP) and false-negative (FN) results, and its specificity (Sp) from the true-negative (TN) and false-positive (FP) results, with the following equations:
$$\mathrm{Se} = \frac{TP}{TP + FN} \tag{1}$$
$$\mathrm{Sp} = \frac{TN}{TN + FP} \tag{2}$$
If fraction d of the population has CAD and 1 - d does not have CAD, the number of patients with TP and FN results can be expressed as a function of sensitivity and the number of patients with FP and TN results can be expressed as a function of specificity (Fig A1).
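As a worked check, the cells of Figure A1 can be written out and evaluated with the Applet 1 default values listed in Appendix 3 (N = 10000, d = 50%, Se = 75%, Sp = 80%); Equations (1) and (2) then recover the sensitivity and specificity that were put in:

$$TP = N d\,\mathrm{Se},\quad FN = N d\,(1-\mathrm{Se}),\quad FP = N(1-d)(1-\mathrm{Sp}),\quad TN = N(1-d)\,\mathrm{Sp}$$
$$TP = 3750,\; FN = 1250,\; FP = 1000,\; TN = 4000;\qquad \mathrm{Se} = \frac{3750}{3750 + 1250} = 0.75,\quad \mathrm{Sp} = \frac{4000}{4000 + 1000} = 0.80$$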
Appendix 2
Let us assume that a patient with a positive stress test result proceeds to cardiac catheterization with probability p+, and a patient with a negative test result undergoes conclusive confirmation of disease status with probability p-. In general, p+ is greater than p-, as a definitive invasive (and potentially risky) test is more likely to be pursued if the noninvasive evaluation suggests the presence of disease. The resulting test and disease relationships in the population are presented in Figure A2. The different referral (verification) rates result in "apparent" values of sensitivity and specificity that largely depend on the probabilities p+ and p-. In our example (Fig A2), the apparent sensitivity (ASe) and apparent specificity (ASp) derived from Equations (1) and (2), respectively, are now expressed as follows:
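$$\mathrm{ASe} = \frac{aTP}{aTP + aFN} = \frac{p^{+}\,TP}{p^{+}\,TP + p^{-}\,FN} = \frac{p^{+}\,\mathrm{Se}}{p^{+}\,\mathrm{Se} + p^{-}\,(1 - \mathrm{Se})}$$
$$\mathrm{ASp} = \frac{aTN}{aTN + aFP} = \frac{p^{-}\,TN}{p^{-}\,TN + p^{+}\,FP} = \frac{p^{-}\,\mathrm{Sp}}{p^{-}\,\mathrm{Sp} + p^{+}\,(1 - \mathrm{Sp})}$$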
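where aTP and aTN are the apparent TP and TN test results and aFP and aFN are the apparent FP and FN test results, ie, the results among patients referred for verification (aTP = p+ TP, aFP = p+ FP, aTN = p- TN, aFN = p- FN). Solving these equations for Se and Sp gives the corresponding "true" test sensitivity and specificity: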
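$$\mathrm{Se} = \frac{p^{-}\,\mathrm{ASe}}{p^{-}\,\mathrm{ASe} + p^{+}\,(1 - \mathrm{ASe})}$$
$$\mathrm{Sp} = \frac{p^{+}\,\mathrm{ASp}}{p^{+}\,\mathrm{ASp} + p^{-}\,(1 - \mathrm{ASp})}$$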
Neither sensitivity and specificity nor apparent sensitivity and specificity are affected by the prevalence of the disease (d) in the general population. Conversely, the positive and negative predictive values are greatly dependent on disease prevalence, as predicted by the Bayes theorem.
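Both directions of Appendix 2 fit in a few lines of code. The Python sketch below is an illustrative reimplementation under the same assumption of fixed referral probabilities p+ and p-, not the applet's Java source; the function names are hypothetical. It reproduces the Results example with referral rates of 32% and 3.5%.

```python
def apparent_from_true(se, sp, p_pos, p_neg):
    """Apparent (measured) Se and Sp when positive results are verified with
    probability p_pos and negative results with probability p_neg."""
    a_se = p_pos * se / (p_pos * se + p_neg * (1 - se))
    a_sp = p_neg * sp / (p_neg * sp + p_pos * (1 - sp))
    return a_se, a_sp


def true_from_apparent(a_se, a_sp, p_pos, p_neg):
    """Invert the relation above to correct measured values for referral bias."""
    se = p_neg * a_se / (p_neg * a_se + p_pos * (1 - a_se))
    sp = p_pos * a_sp / (p_pos * a_sp + p_neg * (1 - a_sp))
    return se, sp


# Referral rates of 32% (positive) and 3.5% (negative), as in the Results example.
p_pos, p_neg = 0.32, 0.035
a_se, a_sp = apparent_from_true(0.75, 0.80, p_pos, p_neg)
print(f"apparent Se {a_se:.0%}, apparent Sp {a_sp:.0%}")  # apparent Se 96%, apparent Sp 30%
se, sp = true_from_apparent(0.75, 0.80, p_pos, p_neg)
print(f"true Se {se:.0%}, true Sp {sp:.0%}")              # true Se 25%, true Sp 97%
```

Because p+ and p- enter only through their ratio, scaling both referral rates by the same factor leaves the apparent values unchanged, which is why equal referral rates reproduce the true values.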
Appendix 3
The default values in Applet 1 are as follows:
- Number of patients: 10000
- Disease prevalence: 50%
- True sensitivity: 75%
- True specificity: 80%
- Referral rate for positive test results: 30%
- Referral rate for negative test results: 5%
- Apparent sensitivity: 95%
- Apparent specificity: 40%
- Referral rate arrow pointing to the right
By changing the referral (verification) rate of positive tests to 50%, the apparent sensitivity increases to 97% and the apparent specificity decreases to 29%. By further increasing the referral rate of positive tests to 75%, the apparent sensitivity increases only slightly (up to 98%) and the apparent specificity now decreases to 21%. By again decreasing the referral rate of positive tests to 50% and increasing the corresponding referral rate of negative tests to 20%, the apparent sensitivity becomes 88% and the apparent specificity 62%. A further increase in the referral rate of negative tests to 40% decreases the apparent sensitivity to 79% and increases the apparent specificity to 76%. The apparent sensitivity and specificity values become equal to the corresponding true values when the referral rates for positive and negative test results are set to the same value (eg, 40%).
By resetting the applet (click the "refresh"/"reload" button of your browser) and then changing the direction of the referral rate arrow to point to the left, one can see that, starting from the default apparent sensitivity and specificity (95% and 40%), the true sensitivity and specificity change to 66% and 87%, respectively, when the referral rate for positive tests is set to 50%. The user can experiment further by changing the positive and negative test referral rates to better appreciate their impact on the relationship between true and apparent sensitivity and specificity.
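The walkthrough above can also be reproduced by an equivalent route: build the expected table of verified (referred) patients for the forward direction and, for the reverse direction, divide each verified cell by its referral probability (inverse-probability weighting). This Python sketch is an assumed reformulation, not the applet's own code, and all names in it are hypothetical; with the Applet 1 defaults it yields the same rounded values as Appendix 3.

```python
def verified_table(n, d, se, sp, p_pos, p_neg):
    """Expected (aTP, aFN, aFP, aTN) among patients referred for angiography."""
    return (p_pos * n * d * se,               # positive results verified at rate p+
            p_neg * n * d * (1 - se),         # negative results verified at rate p-
            p_pos * n * (1 - d) * (1 - sp),
            p_neg * n * (1 - d) * sp)


def se_sp(tp, fn, fp, tn):
    """Equations (1) and (2) applied to any 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def reweight(a_tp, a_fn, a_fp, a_tn, p_pos, p_neg):
    """Undo referral by dividing each verified cell by its referral probability."""
    return a_tp / p_pos, a_fn / p_neg, a_fp / p_pos, a_tn / p_neg


# Forward direction (arrow to the right): true Se 75%, Sp 80%, d 50%, N 10000.
scenarios = [(0.30, 0.05), (0.50, 0.05), (0.75, 0.05),
             (0.50, 0.20), (0.50, 0.40), (0.40, 0.40)]
for p_pos, p_neg in scenarios:
    a_se, a_sp = se_sp(*verified_table(10000, 0.5, 0.75, 0.80, p_pos, p_neg))
    print(f"p+ {p_pos:.0%}, p- {p_neg:.0%}: ASe {a_se:.0%}, ASp {a_sp:.0%}")

# Reverse direction (arrow to the left): measured ASe 95%, ASp 40%,
# p+ 50%, p- 5%. Only proportions within each disease row matter,
# so a verified table whose rows each sum to 1 is sufficient.
se, sp = se_sp(*reweight(0.95, 0.05, 0.60, 0.40, 0.50, 0.05))
print(f"corrected Se {se:.0%}, Sp {sp:.0%}")  # corrected Se 66%, Sp 87%
```

The printed forward rows step through 95%/40%, 97%/29%, 98%/21%, 88%/62%, 79%/76%, and finally 75%/80% when the two referral rates are equal.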
Appendix 4
Figure A3 presents the test and disease relationships of the population when the degree of exercise attained is also considered. For this analysis, we have hypothesized that a proportion k of patients with CAD and a proportion m of subjects without CAD achieved at least 85% of age-predicted maximum heart rate. In Figure A3, the differences in sensitivity and specificity between maximal (at least 85% of age-predicted maximum heart rate) and submaximal (less than 85% of maximum heart rate) stress tests (29-31) have been taken into account and expressed as Se1 and Se2 (sensitivity for maximal and submaximal stress tests, respectively) and Sp1 and Sp2 (specificity for maximal and submaximal stress tests, respectively). In this example, apparent sensitivity would be calculated as:
$$\mathrm{ASe} = \frac{aTP_1 + aTP_2}{aTP_1 + aTP_2 + aFN_1 + aFN_2}$$
where aTP1, aTN1, aFP1, and aFN1 refer to the apparent TP, TN, FP, and FN test results of a maximal stress test, and aTP2, aTN2, aFP2, and aFN2 refer to the apparent TP, TN, FP, and FN test results of a submaximal stress test. Similarly, the apparent specificity would be calculated as:
$$\mathrm{ASp} = \frac{aTN_1 + aTN_2}{aTN_1 + aTN_2 + aFP_1 + aFP_2}$$
Thus, if p1+ and p2+ represent the referral rates for positive maximal and submaximal stress tests, respectively, and p1- and p2- represent the referral rates for negative maximal and submaximal stress tests, respectively, the apparent sensitivity and specificity would be calculated as:
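$$\mathrm{ASe} = \frac{p_1^{+}\,k\,\mathrm{Se}_1 + p_2^{+}\,(1-k)\,\mathrm{Se}_2}{p_1^{+}\,k\,\mathrm{Se}_1 + p_2^{+}\,(1-k)\,\mathrm{Se}_2 + p_1^{-}\,k\,(1-\mathrm{Se}_1) + p_2^{-}\,(1-k)\,(1-\mathrm{Se}_2)} \tag{7}$$
$$\mathrm{ASp} = \frac{p_1^{-}\,m\,\mathrm{Sp}_1 + p_2^{-}\,(1-m)\,\mathrm{Sp}_2}{p_1^{-}\,m\,\mathrm{Sp}_1 + p_2^{-}\,(1-m)\,\mathrm{Sp}_2 + p_1^{+}\,m\,(1-\mathrm{Sp}_1) + p_2^{+}\,(1-m)\,(1-\mathrm{Sp}_2)} \tag{8}$$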
This analysis better represents the complex interrelationships that account for differences between apparent (measured) and "true" operative test characteristics. Equations such as (7) and (8) allow better assessment of diagnostic tests, but they have not been widely used. The values calculated from Equations (7) and (8) are presented as output values in Applet 2.
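Equations (7) and (8) pool the verified cells across exercise strata. In the Python sketch below, the names Stratum and overall_apparent are hypothetical, and this is an illustration, not the applet's code. Every diseased cell carries the factor N·d and every nondiseased cell N·(1 - d); these cancel, so population size and prevalence do not appear. The values are the Applet 2 defaults listed in Appendix 5.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One exercise-performance subgroup (eg, >= 85% or < 85% of maximum heart rate)."""
    frac_cad: float     # share of CAD patients in this stratum (k or 1 - k)
    frac_no_cad: float  # share of patients without CAD in this stratum (m or 1 - m)
    se: float           # true sensitivity in the stratum
    sp: float           # true specificity in the stratum
    p_pos: float        # referral rate after a positive result
    p_neg: float        # referral rate after a negative result


def overall_apparent(strata):
    """Pooled apparent Se and Sp as in Equations (7) and (8); N and d cancel."""
    a_tp = sum(s.p_pos * s.frac_cad * s.se for s in strata)
    a_fn = sum(s.p_neg * s.frac_cad * (1 - s.se) for s in strata)
    a_tn = sum(s.p_neg * s.frac_no_cad * s.sp for s in strata)
    a_fp = sum(s.p_pos * s.frac_no_cad * (1 - s.sp) for s in strata)
    return a_tp / (a_tp + a_fn), a_tn / (a_tn + a_fp)


# Applet 2 defaults (Appendix 5): k = 50%, m = 90%.
maximal = Stratum(0.50, 0.90, se=0.75, sp=0.80, p_pos=0.30, p_neg=0.05)
submaximal = Stratum(0.50, 0.10, se=0.50, sp=0.65, p_pos=0.70, p_neg=0.15)
a_se, a_sp = overall_apparent([maximal, submaximal])
print(f"overall apparent Se {a_se:.0%}, Sp {a_sp:.0%}")  # overall apparent Se 87%, Sp 37%
```

Setting the maximal-stratum p_pos to 0.40 gives 88% and 32%, and instead setting the submaximal-stratum p_pos to 0.40 gives 83% and 40%, in line with the Appendix 5 walkthrough. Because the function sums over a list, the same sketch would accept more than two performance strata, although Applet 2 itself models two.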
Appendix 5
The default values in Applet 2 are as follows:
Population Characteristics
- Number of patients: 10000
- CAD prevalence: 85%
- Patients with CAD who exercise to at least 85% of maximum heart rate: 50%
- Patients without CAD who exercise to at least 85% of maximum heart rate: 90%
Patients Who Exercise to at Least 85% of Maximum Heart Rate
- True sensitivity: 75%
- True specificity: 80%
- Referral rate for positive test results: 30%
- Referral rate for negative test results: 5%
- Apparent sensitivity: 95%
- Apparent specificity: 40%
- Referral rate arrow pointing to the right
Patients Who Exercise to Less than 85% of Maximum Heart Rate
- True sensitivity: 50%
- True specificity: 65%
- Referral rate for positive test results: 70%
- Referral rate for negative test results: 15%
- Apparent sensitivity: 82%
- Apparent specificity: 29%
- Referral rate arrow pointing to the right
You can always return to these preset values by clicking on the "refresh"/"reload" button of your browser.
By changing the referral rate for positive test results to 40% for patients who exercise to at least 85% of maximum heart rate, you will change the apparent sensitivity for this subgroup as described for Applet 1 (see Appendix 3). However, if the sensitivity of the test is considered as a single value (regardless of patient performance), the overall apparent sensitivity (shown at the bottom of the applet) will change from 87% to 88% and the overall apparent specificity from 37% to 32%. With the applet reset, if the referral rate for positive test results for patients who exercise to less than 85% of maximum heart rate is changed from 70% to 40%, the apparent sensitivity and specificity for this subgroup will change to 73% and 42%, respectively. Again, if the sensitivity of the test is considered as a single value (regardless of patient performance), the overall apparent sensitivity will now change from 87% to 83% and the overall apparent specificity from 37% to 40%. The user can experiment further by changing the positive and negative test referral (verification) rates to better appreciate their impact on the relationship between true and apparent sensitivity and specificity in a much more complex system than that presented in Applet 1.
Acknowledgments
The authors thank Robert B. Johnson, MD, for insightful comments and discussion.
References
1. Cecil MP, Kosinski AS, Jones MT, et al. The importance of work-up (verification) bias correction in assessing the accuracy of SPECT thallium-201 testing for the diagnosis of coronary artery disease. J Clin Epidemiol 1996; 49:735-742.
2. Diamond GA, Rozanski A, Forrester JS, et al. A model for assessing the sensitivity and specificity of tests subject to selection bias: application to exercise radionuclide ventriculography for diagnosis of coronary artery disease. J Chronic Dis 1986; 39:343-355.
3. Diamond GA. Affirmative actions: can the discriminant accuracy of a test be determined in the face of selection bias? Med Decis Making 1991; 11:48-56.
4. Diamond GA. Off Bayes: effect of verification bias on posterior probabilities calculated using Bayes' theorem. Med Decis Making 1992; 12:22-31.
5. Gray R, Begg CB, Greenes RA. Construction of receiver operating characteristic curves when disease verification is subject to selection bias. Med Decis Making 1984; 4:151-164.
6. Greenes RA, Begg CB. Assessment of diagnostic technologies: methodology for unbiased estimation from samples of selectively verified patients. Invest Radiol 1985; 20:751-756.
7. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978; 299:926-930.
8. Begg CB, Greenes RA. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 1983; 39:207-215.
9. Noto TJ Jr, Johnson LW, Krone R, et al. Cardiac catheterization 1990: a report of the Registry of the Society for Cardiac Angiography and Interventions (SCA&I). Cathet Cardiovasc Diagn 1991; 24:75-83.
10. Scanlon PJ, Faxon DP, Audet AM, et al. ACC/AHA guidelines for coronary angiography. A report of the American College of Cardiology/American Heart Association Task Force on practice guidelines (Committee on Coronary Angiography). Developed in collaboration with the Society for Cardiac Angiography and Interventions. J Am Coll Cardiol 1999; 33:1756-1824.
11. Kang X, Berman DS, Lewin H, et al. Comparative ability of myocardial perfusion single-photon emission computed tomography to detect coronary artery disease in patients with and without diabetes mellitus. Am Heart J 1999; 137:949-957.
12. Ficaro EP, Fessler JA, Shreve PD, Kritzman JN, Rose PA, Corbett JR. Simultaneous transmission/emission myocardial perfusion tomography: diagnostic accuracy of attenuation-corrected 99mTc-sestamibi single-photon emission computed tomography. Circulation 1996; 93:463-473.
13. Kiat H, Iskandrian AS, Villegas BJ, Starling MR, Berman DS. Arbutamine stress thallium-201 single-photon emission computed tomography using a computerized closed-loop delivery system. Multicenter trial for evaluation of safety and diagnostic accuracy. The International Arbutamine Study Group. J Am Coll Cardiol 1995; 26:1159-1167.
14. Zaret BL, Rigo P, Wackers FJ, et al. Myocardial perfusion imaging with 99mTc tetrofosmin. Comparison to 201Tl imaging and coronary angiography in a phase III multicenter trial. Tetrofosmin International Trial Study Group. Circulation 1995; 91:313-319.
15. Matzer L, Kiat H, Wang FP, et al. Pharmacologic stress dual-isotope myocardial perfusion single-photon emission computed tomography. Am Heart J 1994; 128:1067-1076.
16. Berman DS, Kiat H, Friedman JD, et al. Separate acquisition rest thallium-201/stress technetium-99m sestamibi dual-isotope myocardial perfusion single-photon emission computed tomography: a clinical validation study. J Am Coll Cardiol 1993; 22:1455-1464.
17. Van Train KF, Maddahi J, Berman DS, et al. Quantitative analysis of tomographic stress thallium-201 myocardial scintigrams: a multicenter trial. J Nucl Med 1990; 31:1168-1179.
18. Maddahi J, Van Train K, Prigent F, et al. Quantitative single photon emission computed thallium-201 tomography for detection and localization of coronary artery disease: optimization and prospective validation of a new technique. J Am Coll Cardiol 1989; 14:1689-1699.
19. Diamond GA. An improbable criterion of normality (abstr). Circulation 1982; 66:681.
20. Diamond GA. Monkey business. Am J Cardiol 1986; 57:471-475.
21. Amanullah AM, Kiat H, Hachamovitch R, et al. Impact of myocardial perfusion single-photon emission computed tomography on referral to catheterization of the very elderly. Is there evidence of gender-related referral bias? J Am Coll Cardiol 1996; 28:680-686.
22. Bateman TM, O'Keefe JH Jr, Dong VM, Barnhart C, Ligon RW. Coronary angiographic rates after stress single-photon emission computed tomographic scintigraphy. J Nucl Cardiol 1995; 2:217-223.
23. Hachamovitch R, Berman DS, Kiat H, et al. Gender-related differences in clinical management after exercise nuclear testing. J Am Coll Cardiol 1995; 26:1457-1464.
24. Hachamovitch R, Berman DS, Kiat H, et al. Exercise myocardial perfusion SPECT in patients without known coronary artery disease: incremental prognostic value and use in risk stratification. Circulation 1996; 93:905-914.
25. Hachamovitch R, Berman DS, Kiat H, et al. Incremental prognostic value of adenosine stress myocardial perfusion single-photon emission computed tomography and impact on subsequent management in patients with or suspected of having myocardial ischemia. Am J Cardiol 1997; 80:426-433.
26. Hartz A, Deber R, Bartholomew M, Midtling J. Physician characteristics affecting referral decisions following an exercise tolerance test. Arch Fam Med 1993; 2:513-519.
27. Schmoliner R, Dudczak R, Kronik G, et al. Impact of thallium-201 imaging on clinical assessment and management of patients with chest pain. Clin Cardiol 1984; 7:660-666.
28. Wassertheil-Smoller S, Steingart RM, Wexler JP, et al. Nuclear scans: a clinical decision making tool that reduces the need for cardiac catheterization. J Chronic Dis 1987; 40:385-397.
29. Marwick TH, Nemec JJ, Pashkow FJ, Stewart WJ, Salcedo EE. Accuracy and limitations of exercise echocardiography in a routine clinical setting. J Am Coll Cardiol 1992; 19:74-81.
30. Stewart RE, Kander N, Juni JE, et al. Submaximal exercise thallium-201 SPECT for assessment of interventional therapy in patients with acute myocardial infarction. Am Heart J 1991; 121:1033-1041.
31. Stolzenberg J, London R. Reliability of stress thallium-201 scanning in the clinical evaluation of coronary artery disease. Clin Nucl Med 1979; 4:225-228.