TY - JOUR
T1 - Methods for categorizing a molecular biomarker in early marker studies: A simulation study
JF - Cancer Research
JO - Cancer Res
SP - 180
LP - 180
VL - 65
IS - 9 Supplement
AU - Yang, Dongyun
AU - Groshen, Susan
AU - Lenz, Heinz-Josef
Y1 - 2005/05/01
UR - http://cancerres.aacrjournals.org/content/65/9_Supplement/180.2.abstract
N2 - Proc Amer Assoc Cancer Res, Volume 46, 2005 763 Background: Identifying new molecular biomarkers that can be used for cancer prevention, screening, diagnosis, prognosis, and/or monitoring clinical outcome of treatment is essential in cancer research. Many biomarkers are measured as continuous variables. Categorizing a biomarker (selecting a cutpoint to define favorable and unfavorable risk groups) is reasonable if a threshold effect/association is suspected and may be desirable in making decisions, such as choosing prevention or treatment options. Since knowledge of the biological functioning of most biomarkers is usually still lacking during early marker research, statistical methods have been used for categorizing continuous biomarkers. In a Monte Carlo study, we compare the type I error and power of two statistical methods commonly used to categorize a continuous variable with those of three univariate procedures used to examine the association between a continuous variable and a binary outcome variable. Methods: Methods used to categorize continuous biomarkers include (1) selection of a cutpoint at the sample median and (2) the maximal χ² approach with an adjusted p-value. Methods used to test the association between a continuous variable and a binary outcome include (1) the Mann-Whitney-Wilcoxon test, (2) Student’s t-test, and (3) logistic regression. A total of 5,000 simulations were performed with the following parameters: (1) sample size: 20, 30, 50, 100, and 200; (2) difference in outcome between favorable and unfavorable groups: 0, 0.2, and 0.4; (3) proportion in the unfavorable group: 0.3, 0.5, and 0.7. Results: Under the null hypothesis in which there are no differences in the outcome variable, the two categorizing methods provided the correct nominal significance level (α=0.05).
In alternative situations in which the proportion of patients having a good outcome in the favorable group is higher than that in the unfavorable group, the power of the maximal χ² approach is usually higher than that of the median approach. The power of the maximal χ² approach is comparable to that of the three methods that treat the biomarker as continuous when the difference in the outcome variable is large (0.4). A substantial loss of statistical power for the maximal χ² approach is observed when the difference in the outcome variable is moderate (0.2). Conclusions: The maximal χ² approach performs better than the median cutpoint approach in the univariate setting when categorizing a molecular biomarker is desirable and is a reasonable method for establishing association when large effects are desired.
ER -