Cancer Research Annual Meeting 2010  Protein Translation and Cancer
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Cancer Research Clinical Cancer Research
Cancer Epidemiology Biomarkers & Prevention Molecular Cancer Therapeutics
Molecular Cancer Research Cancer Prevention Research
Cancer Prevention Journals Portal Cancer Reviews Online
Annual Meeting Education Book Meeting Abstracts Online

Cancer Research 68, 3074, April 15, 2008. doi: 10.1158/0008-5472.CAN-07-2999
© 2008 American Association for Cancer Research

This Article
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Statnikov, A.
Right arrow Articles by Aliferis, C. F.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Statnikov, A.
Right arrow Articles by Aliferis, C. F.

Letters to the Editor

A Statistical Reappraisal of the Findings of an Esophageal Cancer Genome-Wide Association Study

Alexander Statnikov

Discovery Systems Laboratory, Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee

Chun Li

Department of Biostatistics, Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee

Constantin F. Aliferis

Discovery Systems Laboratory, Departments of Biomedical Informatics, Biostatics, and Cancer Biology, Vanderbilt University, Nashville, Tennessee

To the Editor:

A recent esophageal cancer genome-wide association study by Hu and colleagues (1) identified 37 statistically significant single nucleotide polymorphisms (SNP) and reported a nearly perfect classification of cancer cases and controls on the basis of only these SNPs. Taken at face value, this implies that esophageal cancer is a solely genetic disease, although literature in the field suggests that environmental factors make a major contribution to susceptibility for many cancer types (2). To shed light on this issue, we reanalyzed the data of Hu and colleagues (1) and identified two data analysis pitfalls that caused overoptimistic conclusions in the original article.

First, the SNP selection method by Hu and colleagues (1) was severely biased toward claiming significance for SNPs that are not truly associated with the disease. The calculation of P value in the published generalized linear model (GLM)–based SNP selection method does not reflect the significance of the SNP under consideration but the significance of three variables combined (SNP, family history of esophageal cancer, and alcohol consumption). Because family history and alcohol consumption are strong risk factors for esophageal cancer, this P value will be biased toward zero, even when the SNP has nothing to do with esophageal cancer. When an unbiased GLM-based procedure is used instead, no SNPs can be found significant at the Bonferroni adjusted 0.05 {alpha}-level. See Fig. 1 for details and histograms of the distributions of SNP P values produced by both previously published and unbiased procedures for SNP screening.


Figure 1
View larger version (13K):
[in this window]
[in a new window]
[Download PPT slide]
 
Figure 1. Distributions of SNP P values computed by the original GLM-based method of Hu and colleagues (1) and the unbiased procedure. Dotted vertical line, Bonferroni adjusted {alpha}-level (0.05/10k). The original method used the model with no independent variables as the "null" versus that including alcohol consumption, family history, and SNP. In contrast, the unbiased method tests the addition of SNP to a model with alcohol consumption and family history. Although there are SNPs that are significant according to the original method, no SNP is significant by the unbiased method.

 
Second, both SNP selection and building of the principal component analysis–based classifier model were performed by Hu and colleagues (1) on the same 100 subjects as used for estimation of the final classification accuracy. Because neither cross-validation nor independent sample validation was performed, the resulting classification performance estimate is overoptimistic as explained by Simon and colleagues (3). To obtain an unbiased performance estimate for the SNP selection method and the classifier of Hu and colleagues (1), the above methods were applied by repeated 10-fold cross-validation procedure (4). The resulting classification performance estimate was 0.68 area under receiver-operating characteristics curve (AUC), whereas the original procedure in Hu and colleagues (1) led to 0.98 AUC, indicating a 0.30 AUC overestimation.

These findings suggest that the data analysis of Hu and colleagues (1) identified nonstatistically significant SNPs and derived a severely biased estimate of classification performance of esophageal cancer patients and healthy controls. For a study of effects of environment and genetics versus data analysis pitfalls, see Statnikov and colleagues (5). The present case study also underscores the importance of sound data analysis in genome-wide association studies.

References

  1. Hu N, Wang C, Hu Y, et al. Genome-wide association study in esophageal cancer using GeneChip mapping 10K array. Cancer Res 2005;65:2542–6.[Abstract/Free Full Text]
  2. Czene K, Lichtenstein P, Hemminki K. Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish Family-Cancer Database. Int J Cancer 2002;99:260–6.[CrossRef][Medline]
  3. Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst 2003;95:14–8.[Free Full Text]
  4. Braga-Neto UM, Dougherty ER. Is cross-validation valid for small-sample microarray classification? Bioinformatics 2004;20:374–80.[Abstract/Free Full Text]
  5. Statnikov A, Li C, Aliferis CF. Effects of environment, genetics and data analysis pitfalls in an esophageal cancer genome-wide association study. PLoS ONE 2007;2:e958.[CrossRef]




This Article
Right arrow Full Text (PDF)
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Statnikov, A.
Right arrow Articles by Aliferis, C. F.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Statnikov, A.
Right arrow Articles by Aliferis, C. F.


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Cancer Research Clinical Cancer Research
Cancer Epidemiology Biomarkers & Prevention Molecular Cancer Therapeutics
Molecular Cancer Research Cancer Prevention Research
Cancer Prevention Journals Portal Cancer Reviews Online
Annual Meeting Education Book Meeting Abstracts Online