Editorials
1 From the Division of Nutritional Sciences, Cornell University, Ithaca, NY.
2 Address reprint requests to J-P Habicht, Savage Hall, Cornell University, Ithaca, NY 14853. E-mail: jh48{at}cornell.edu and bjs25{at}cornell.edu.
See corresponding article on page 799.
The publication of the article by Rice et al (1) in this issue of the Journal is a good occasion to review the use of the receiver operating characteristic (ROC) method almost 30 y after it was introduced for medical diagnoses (reviewed in reference 2) and 20 y after it was introduced to readers of the Journal (3). In medical diagnosis, ROC analysis is used to select indicators that differentiate between diseased and nondiseased states (2). In nutrition, ROC analysis is used to select reflective indicators that differentiate between better and worse nutritional status (3) and to select risk indicators that predict health or survival outcomes based on nutritional variables (4). ROC analysis has also been used to identify response indicators (1, 5), measures that respond to nutritional influences. A response indicator should not be confused with a potential-to-benefit indicator, which predicts a future response (6).
ROC analysis requires a standard with which the indicator is compared. For reflective and risk indicators in nutrition, the nutrition or health standards are directly and logically equivalent to disease and nondisease. However, for the response indicator investigated by Rice et al (1), the standard was whether the subject was in a nutritional intervention group (equivalent to nondisease at the end of the study) or in a placebo group. Although the nondisease label is appropriate for the intervention group, the disease label for the placebo group is more problematic. The indicators measured in many of the subjects in the intervention group did not respond to treatment because the subjects were not vitamin A deficient; in other words, they were always nondiseased. A similar proportion of always nondiseased subjects were in the placebo group. The values of these always nondiseased persons in both groups overlapped and therefore decreased the indicator's discriminatory power at higher sensitivity values and lower specificity values in the placebo and intervention groups, respectively. This misclassification impaired the selection of the best response indicators.
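The dilution described above can be illustrated with a small calculation. The sketch below is not taken from Rice et al (1); it assumes a simplified model in which the indicator is Gaussian with a common SD, the placebo group is a mixture of deficient and nondeficient subjects, and every subject in the intervention group ends the study nondeficient. The function name, the pooled-SD definition, and all numerical values are hypothetical.

```python
from math import sqrt


def diluted_standardized_difference(mean_deficient, mean_sufficient, sd,
                                    frac_deficient):
    """Standardized difference between intervention and placebo groups.

    Assumed model (hypothetical, for illustration only):
      - placebo group: a fraction `frac_deficient` is deficient, the rest
        were never deficient ("always nondiseased");
      - intervention group: everyone is nondeficient at the end of the study;
      - within each state the indicator has the same SD.
    """
    gap = mean_sufficient - mean_deficient
    # Only the deficient fraction of the placebo group differs from the
    # intervention group, so the difference in group means shrinks.
    mean_diff = frac_deficient * gap
    # Variance of a two-component mixture with equal within-component SD.
    var_placebo = sd ** 2 + frac_deficient * (1 - frac_deficient) * gap ** 2
    var_intervention = sd ** 2
    pooled_sd = sqrt((var_placebo + var_intervention) / 2)
    return mean_diff / pooled_sd


# Hypothetical values: the deficient and sufficient means are 2 SD apart.
d_all_deficient = diluted_standardized_difference(0.0, 2.0, 1.0, 1.0)
d_quarter_deficient = diluted_standardized_difference(0.0, 2.0, 1.0, 0.25)
print(round(d_all_deficient, 2), round(d_quarter_deficient, 2))
```

Under these assumptions, the same indicator appears far less discriminating when only one-quarter of the subjects could have responded than when all of them could.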
All ROC parameters can be formulated as a standardized difference or as a t test between the means of the specificity (nondiseased) and the sensitivity (diseased) distributions (4). The indicator of choice is the one that produces the least overlap between the specificity and sensitivity distributions, resulting in the highest t test value and largest ROC parameter (area under the curve). ROC parameters can only be compared if the ratio of the specificity to sensitivity variances is the same between indicators, which results in parallel ROC lines if the sensitivity and specificity values are transformed to z scores (Gaussian probability scale). If the z scored ROC lines are not parallel, the ROC comparison statistic is meaningless. For instance, 2 ROC lines can cross, as is the case in Table 4 of the article by Rice et al (1). In that case, the better indicator before the crossover is the poorer indicator after the crossover.
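The relation between the distributions and the z-scored ROC line can be sketched as follows. The sketch assumes the standard binormal model (Gaussian indicator values in both groups, with higher values indicating disease); it is a minimal illustration rather than the procedure of Brownie et al (4), and the function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def binormal_roc(mean_nondiseased, sd_nondiseased, mean_diseased, sd_diseased):
    """Return (intercept, slope, auc) under a binormal model.

    The z-scored ROC line is
        z(sensitivity) = intercept + slope * z(1 - specificity),
    with slope equal to the ratio of the nondiseased to the diseased SD.
    """
    delta = mean_diseased - mean_nondiseased
    intercept = delta / sd_diseased
    slope = sd_nondiseased / sd_diseased
    # Area under the curve as a function of the standardized difference.
    auc = NormalDist().cdf(delta / sqrt(sd_nondiseased ** 2 + sd_diseased ** 2))
    return intercept, slope, auc


def crossover_z(line_a, line_b):
    """z(1 - specificity) at which two z-scored ROC lines cross.

    Each line is an (intercept, slope) pair. Returns None when the lines
    are parallel, which is the case in which the ROC parameters of the
    two indicators can be compared.
    """
    (a1, b1), (a2, b2) = line_a, line_b
    if b1 == b2:
        return None
    return (a1 - a2) / (b2 - b1)


# Hypothetical indicators: A has equal variances, B has unequal variances.
a_int, a_slope, a_auc = binormal_roc(0.0, 1.0, 1.0, 1.0)
b_int, b_slope, b_auc = binormal_roc(0.0, 1.0, 3.0, 2.0)
print(crossover_z((a_int, a_slope), (b_int, b_slope)))
```

In this hypothetical example the two lines have different slopes and therefore cross, so neither indicator is better over the whole range of cutoffs and a single summary statistic such as the area under the curve would mislead.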
In most publications presenting ROC analyses, computer programs are used to perform the analyses. This is necessary when dealing with indicators that are not continuous variables, such as X-ray diagnoses (2). However, when continuous variables are used, as is usually the case in nutrition, calculations made without a computer program (4) are preferable because they force explicit consideration of the components of the ROC parameters: the variances of the specificity and sensitivity distributions and the difference between the means of the distributions. This method is certainly preferable to the method most cited in the literature for comparing indicators (7). The method described by Erdreich and Lee (7) does not take into account the correlations of the indicator variables among themselves and therefore can result in severely biased ROC statistics. The major limitation to using the method described by Brownie et al (4) is understanding the publication that describes it. The authors present much more information than is necessary. The reader is advised to start with the first 2 sections and then to turn to the example.
When the z scored ROC lines are parallel, ROC analyses are better by far than any other technique for describing and comparing indicators to be used for screening. The next step is choosing an appropriate cutoff (8, 9). When the conditions of parallelism are not met, one must select the cutoff points and then test their discriminatory powers by other less satisfactory means (10).
Screening with a reflective indicator to identify malnourished children or with a risk indicator to identify children at risk of bad outcomes makes sense because one can do something for these children. The population prevalences derived from such screens also make intuitive sense. The rationale for screening children with a response indicator to identify those who did or did not receive a past intervention is less obvious.
A response indicator is, however, the right indicator to use to measure influences on nutrition at the population level. For this use, estimating the effect on nutrition with dichotomous prevalences is usually much less efficient statistically than using the means or ordinary least squares (OLS) regression analyses (9). Selecting the best indicator for OLS analyses is best done by comparing standardized differences directly (1, 5) or indirectly through sample size estimates (11). The standardized difference is directly related to the t and F tests for statistical significance and therefore establishes the sample sizes needed to identify a nutritional determinant (5). The higher the standardized difference and the lower the required sample size, the better the indicator. The ranking of indicator quality is almost identical whether one uses the ROC or the standardized difference method (1), which is not surprising because the ROC is also based on standardized difference considerations (4). Therefore, using the standardized difference technique is most appropriate for selecting the best response indicator.
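The link between the standardized difference and the required sample size can be sketched with the usual normal-approximation formula for comparing 2 group means. The sketch assumes equal group sizes, a 2-sided test, and a pooled SD computed as the root mean of the 2 group variances; these choices, the function names, and the example values are assumptions and are not taken from the cited articles.

```python
from math import ceil, sqrt
from statistics import NormalDist


def standardized_difference(mean_a, mean_b, sd_a, sd_b):
    """Absolute difference in means divided by the pooled SD (equal n assumed)."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect standardized difference d.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2.
    A t-based calculation would give a slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical comparison of 2 candidate response indicators.
d_weak = standardized_difference(10.0, 10.5, 1.0, 1.0)    # d = 0.5
d_strong = standardized_difference(10.0, 11.0, 1.0, 1.0)  # d = 1.0
print(n_per_group(d_weak), n_per_group(d_strong))
```

Because the required sample size varies with the inverse square of the standardized difference, halving the standardized difference roughly quadruples the number of subjects needed.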
ACKNOWLEDGMENTS
I thank Gretel Pelto for helpful discussions and editing.