Letters to the Editor
Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee
Department of Biostatistics, Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee
Discovery Systems Laboratory, Departments of Biomedical Informatics, Biostatistics, and Cancer Biology, Vanderbilt University, Nashville, Tennessee
To the Editor:
A recent esophageal cancer genome-wide association study by Hu and colleagues (1) identified 37 statistically significant single nucleotide polymorphisms (SNP) and reported a nearly perfect classification of cancer cases and controls on the basis of only these SNPs. Taken at face value, this implies that esophageal cancer is a solely genetic disease, although literature in the field suggests that environmental factors make a major contribution to susceptibility for many cancer types (2). To shed light on this issue, we reanalyzed the data of Hu and colleagues (1) and identified two data analysis pitfalls that caused overoptimistic conclusions in the original article.
First, the SNP selection method of Hu and colleagues (1) was severely biased toward declaring significant SNPs that are not truly associated with the disease. The P value computed by their generalized linear model (GLM)-based SNP selection method does not reflect the significance of the SNP under consideration. Instead, it reflects the joint significance of three variables: the SNP, family history of esophageal cancer, and alcohol consumption. Because family history and alcohol consumption are strong risk factors for esophageal cancer, this P value will be biased toward zero even when the SNP has nothing to do with esophageal cancer. When an unbiased GLM-based procedure is used instead, no SNP is significant at the Bonferroni-adjusted α = 0.05 level. See Fig. 1 for details and for histograms of the SNP P value distributions produced by the previously published and the unbiased SNP screening procedures.
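The letter does not give code for either screening procedure, so the following is only a minimal sketch of why the two P values behave so differently. It assumes that the published P value amounts to a likelihood-ratio test of the full model (SNP + family history + alcohol) against an intercept-only model (3 degrees of freedom). It also assumes that the unbiased alternative is a likelihood-ratio test of the same full model against a covariates-only model (1 degree of freedom); the letter does not name the exact unbiased test. The data are simulated with SNPs that are independent of disease status. The effect sizes, sample size, and all function names (`simulate`, `screen`, `logistic_loglik`, `chi2_sf`) are hypothetical and have nothing to do with the data of Hu and colleagues.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def logistic_loglik(X, y, max_iter=50):
    """Fit a logistic GLM by Newton-Raphson; return the maximized log-likelihood."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        step = solve(hess, grad)  # Newton step: (X'WX)^-1 X'(y - mu)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        # y*eta - log(1 + e^eta), with log(1 + e^eta) computed stably
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return ll

def chi2_sf(x, df):
    """Chi-square upper-tail probability, closed form for df = 1 and df = 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 3:
        return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 3 supported in this sketch")

def simulate(n, n_snps, seed=1):
    """Hypothetical data: disease depends on family history and alcohol only."""
    rng = random.Random(seed)
    fh = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    alc = [1 if rng.random() < 0.4 else 0 for _ in range(n)]
    y = []
    for f, a in zip(fh, alc):
        eta = -1.5 + 1.5 * f + 1.2 * a
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    # Genotypes (minor-allele counts 0/1/2) drawn independently of disease status
    snps = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
            for _ in range(n_snps)]
    return fh, alc, y, snps

def screen(fh, alc, y, snps):
    """Return (biased, unbiased) likelihood-ratio P values for every SNP."""
    n = len(y)
    ll_null = logistic_loglik([[1.0]] * n, y)                          # intercept only
    ll_cov = logistic_loglik([[1.0, f, a] for f, a in zip(fh, alc)], y)  # covariates only
    biased, unbiased = [], []
    for g in snps:
        ll_full = logistic_loglik([[1.0, s, f, a] for s, f, a in zip(g, fh, alc)], y)
        # Biased: tests SNP, family history, and alcohol jointly (3 df)
        biased.append(chi2_sf(max(0.0, 2.0 * (ll_full - ll_null)), df=3))
        # Unbiased: tests the SNP alone, adjusting for the covariates (1 df)
        unbiased.append(chi2_sf(max(0.0, 2.0 * (ll_full - ll_cov)), df=1))
    return biased, unbiased

fh, alc, y, snps = simulate(n=500, n_snps=30)
biased, unbiased = screen(fh, alc, y, snps)
alpha = 0.05 / len(snps)  # Bonferroni-adjusted threshold
print("significant (biased):", sum(p < alpha for p in biased))
print("significant (unbiased):", sum(p < alpha for p in unbiased))
```

Under these assumptions, every null SNP passes the biased test because the P value is driven by the two covariates. The covariate-adjusted test should flag a null SNP only at roughly the nominal Bonferroni rate.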
These findings suggest that the data analysis of Hu and colleagues (1) identified SNPs that are not statistically significant and produced a severely biased estimate of how well esophageal cancer patients can be distinguished from healthy controls. For a study of the effects of environment and genetics versus data analysis pitfalls, see Statnikov and colleagues (5). The present case study also underscores the importance of sound data analysis in genome-wide association studies.
References