Abstract
Background: Over the coming year, the NIH Cancer Genome Atlas (TCGA) initiative will profile several hundred lung squamous cell carcinomas (LSCC) for gene expression, DNA copy number aberrations, and somatic mutation events, as has already been reported for glioblastoma. In the recent TCGA glioblastoma subtype study, analysis of existing datasets was essential in validating molecular subtypes and copy number associations. In preparation for the forthcoming TCGA LSCC project, this abstract presents an analysis of public LSCC datasets to produce a consensus molecular subtype classification. No consensus classification has been attempted so far. Methods: Raw data were obtained from three of the largest published LSCC microarray datasets: Veridex (N = 129; Raponi et al., 2006), Duke (N = 52; Bild et al., 2006), and Expo (N = 36; International Genomics Consortium, 2008). Microarray probes were mapped to the latest RefSeq gene build (GenBank release 161) by BLAT, and ambiguously mapped probes were discarded. Using this mapping, gene expression values were calculated by the Robust Multichip Average method. Genes were then filtered for reliability using Integrative Correlations and for variability using median absolute deviation. Consensus Clustering with the agglomerative hierarchical algorithm was performed independently on each dataset to determine the number of LSCC groups. To evaluate whether LSCC groups represent the same genetic profiles across datasets, centroids (median expression profiles) of each dataset's groups were clustered by the agglomerative hierarchical algorithm and validated by the SigClust method. Gene Set Enrichment Analysis was used to discern biological pathways, and differential expression of genes grouped by cytoband was used as a proxy for genomic copy number. Survival outcome was modeled by the Kaplan-Meier method. Results: All datasets supported four LSCC groups.
Each dataset's centroids unambiguously clustered with exactly one centroid from each of the other datasets, which demonstrates that the four groups' genetic profiles are consistent across cohorts. Using summaries of subtype-enriched pathways, the subtypes are named: immune-mediation, E2F-regulated, xenobiotic-predominant, and developmental. Subtypes were enriched with distinct cytobands: 3q:26-29 (xenobiotic), 18q:12 (xenobiotic and developmental), and 8q24 (developmental). 3q26 contains the oncogene PIK3CA and is a known common amplification in LSCC. Three-year survival outcomes differed among the subtypes, by as much as 24%. To continue profiling these subtypes, we are assaying a cohort of 77 patients at UNC by SNP arrays. Conclusions: These results demonstrate, for the first time, that LSCC subtypes are reproducible across multiple, independent cohorts and represent distinct biological processes, clinical outcomes, and putative copy number events. These results constitute a molecular subtype classification reference for the TCGA LSCC project.
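Illustrative sketch: The variability filter and the cross-dataset centroid comparison described in the Methods can be sketched in a few lines of Python. This is not the authors' code. All function names are hypothetical, the input layout (a dict mapping gene to per-sample expression values) is an assumption, and a simple Pearson-correlation best match stands in for the agglomerative hierarchical clustering and SigClust validation used in the study.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of a list of numbers."""
    m = median(values)
    return median(abs(v - m) for v in values)


def top_variable_genes(expr, k):
    """Return the k genes with the largest MAD.

    expr: dict mapping gene name -> list of per-sample expression values
    (assumed layout, for illustration only).
    """
    return sorted(expr, key=lambda g: mad(expr[g]), reverse=True)[:k]


def centroid(expr, genes, sample_idx):
    """Median expression profile of one group over the chosen genes."""
    return [median(expr[g][i] for i in sample_idx) for g in genes]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def best_match(query, candidates):
    """Index of the candidate centroid most correlated with query.

    Simplified stand-in for clustering centroids across datasets.
    """
    scores = [pearson(query, c) for c in candidates]
    return scores.index(max(scores))
```

In this simplified setting, a subtype would count as reproduced when each group centroid in one dataset has a distinct best match in every other dataset; the study itself assessed this by clustering the centroids and testing the clusters with SigClust.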
Citation Information: In: Proc Am Assoc Cancer Res; 2009 Apr 18-22; Denver, CO. Philadelphia (PA): AACR; 2009. Abstract nr 1449.
Footnotes
100th AACR Annual Meeting; Apr 18-22, 2009; Denver, CO
- American Association for Cancer Research