Molecular Biology, Pathobiology, and Genetics
1 Laboratory of Epithelial Cancer Biology, Head and Neck Service, 2 Thoracic Oncology Service, 3 Dental Service, Departments of Surgery and 4 Pathology, 5 Computational Biology Center, 6 Genomics Core Laboratory, Memorial Sloan-Kettering Cancer Center, New York, New York; and 7 Department of Head and Neck Surgery, Siriraj Hospital, Bangkok, Thailand
Requests for reprints: Bhuvanesh Singh, Head and Neck Service, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021. Phone: 212-639-2024; Fax: 212-717-3302; E-mail: singhb{at}mskcc.org.
| Abstract |
|---|
| Introduction |
|---|
Numerous risk factors are shared by lung and head and neck cancers. The most important known risk factor for both is exposure to tobacco, which contains numerous mutagens and carcinogens. Tobacco use may increase the risk for development of head and neck squamous cell carcinomas up to 17 times (2) and lung cancers at least 2.6 times that of nonusers (3). Given the similar risk factors and etiologies, it is not surprising that these diseases frequently occur together in the same patient. Two broad explanations may account for the conjoint occurrence of these tumors. First, the concept of "field cancerization" is well known in the literature (4, 5), and can perhaps be more broadly applied to the coexistence of these diseases throughout the body. Squamous mucosa of both the head and neck and lungs is exposed to mutagens that may cause genetic aberrations responsible for the initiation of the carcinogenic process at multiple sites. Alternatively, carcinogens such as tobacco may cause both diseases by different mechanisms in each site, still ultimately causing concurrent cancers. In patients with a previous lung cancer, the incidence ratio of a second primary lung cancer is 1.7 and that of an oral cavity or pharyngeal cancer is 2.7 (6). Conversely, patients with a previous laryngeal cancer have a 4.5-fold increased incidence ratio of a lung cancer when followed for >5 years (7). This perhaps attests to similar genetic mechanisms and biologies accounting for squamous cell carcinomas in general.
Confounding the diagnosis of these patients is the fact that the lungs are the most common site for metastasis of head and neck squamous cell carcinomas. Since both lung primary tumors and metastatic head and neck tumors arise from squamous cells, these carcinomas are often indistinguishable based on traditional histopathologic analysis; and thus, the site of origin of the tumor is indeterminate. This creates a complex conundrum in patients with lung lesions and a history of prior head and neck cancers: is this a second primary lung squamous cell carcinoma or is it a metastasis from the previously treated head and neck squamous cell carcinoma? Oligonucleotide and cDNA microarrays have been hailed as "the newest types of microscopes" (8) since they allow the classification of tumors on the assumption that the genotype directly leads to a phenotype, and can be fingerprinted in a way far more precise than our ability to detect subtle changes under the light microscope.
This concept is eloquently stated by Caldas et al. (9), who explain that the first step in rationally treating a disease is to classify it correctly in order to predict response. This process depends on the quality of the classification method, and molecular techniques seem to be refining this method. Others have shown molecular classification to have 78% accuracy, compared with random classification's 9% accuracy (10). Although this still leaves significant error, the inclusion of more genes, novel techniques, and applications shows considerable promise, with early studies generating significant interest (11). Moreover, as our biological techniques expand, it is often the ingenuity of the in silico modeling that gives the data clinical utility, as we show in this study.
Knowledge of whether a lung lesion constitutes a second lung primary versus a head and neck metastasis contributes significantly to treatment planning and prognosis. First, patients with an isolated primary lung lesion may be effectively treated by resection with an 82% 5-year survival and 74% 10-year survival for T1 non-small cell lung cancer (12). Even if the lesion is in fact a metachronous lung lesion, survival may be similar to that for resection of the initial primary (13). In contrast, those patients with metastatic head and neck cancers to the lung have clinically and biologically systemic disease, in which the cancer cells have acquired the ability to migrate, travel, and seed new metastases. Consequently, surgical therapy offers only a 34% 5-year survival for squamous cell carcinomas metastatic to the lung even in well-selected patients (14). Second, the extent of lung resection (if the patient is a surgical candidate) is potentially more tissue-sparing if it is for a metastasis versus a primary lung tumor, consisting of a wedge or segmental resection rather than a lobe or more extensive resection. Third, if we know that a patient has a metastasis, given an otherwise poor survival with resection alone, we may be inclined to give multimodality therapy including chemotherapy and surgery. Thus, the distinction between whether a lung lesion constitutes a primary or metastatic lesion is not trivial and may be important in the clinical care of the patient (15). Whereas other groups have used similar methodologies to molecularly classify tumors into known subsets (16), our data may provide an answer to this troubling clinical question of the location of the primary disease.
To date, the vast majority of oligonucleotide and cDNA microarray studies have focused on delineating the genetic mechanisms of carcinogenesis (8, 17-19), isolating novel therapeutic and diagnostic targets (20-22), classifying tumors molecularly (23, 24), and prognosticating (11, 25-27). Here we show that validated oligonucleotide microarray technology can be used to clearly distinguish the tissue of origin of a squamous cell carcinoma, using an unsupervised hierarchical clustering analysis with supervised validation. This algorithm can then be applied to a "test set" of tumors of unknown origin, allowing us to determine where these originated, and whether they constitute new lung primary tumors or metastatic head and neck tumors.
| Materials and Methods |
|---|
Tissue collection. Under informed consent and Institutional Review Board-approved protocols, tissue from patients undergoing operative procedures was collected directly following resection in the operating rooms at Memorial Sloan-Kettering Cancer Center. Tumor samples were collected from the advancing tumor edge along with adjacent normal tissue and snap-frozen in liquid nitrogen at the time of operation. These tumors were then banked at -80°C for storage until later use.
Confirmation of pathology. All tumors were examined by routine pathology using H&E staining to determine pathology and tumor grade. To confirm the initial pathology report and grade, we further reviewed every case with a pathologist (R.A. Ghossein and D. Carlson). We also confirmed a minimum of 70% tumor involvement of the final section in order to minimize contamination of gene expression profiles from surrounding normal stromal tissue, in keeping with reports from other investigators (10). Whereas tumor cells may be enriched using laser capture microdissection, this introduces the inherent problems of cDNA amplification and is impractical for clinical use.
Tissue processing. For RNA extraction, 100 mg of frozen tissue was homogenized in 1 mL of TRIzol Reagent (Life Technologies, Gaithersburg, MD) and RNA extracted according to the manufacturer's protocol. The obtained RNA was then further purified using the RNeasy (Qiagen, Valencia, CA) system and protocol. Samples were quantified using standard spectrophotometry, and considered acceptable if the A260/280 reading was >1.7.
Affymetrix oligonucleotide microarray. RNA quality was confirmed using an Agilent 2100 Bioanalyzer. Total RNA (25-50 ng) was run on a RNA 6000 Nano Assay (Agilent, Palo Alto, CA). Samples were accepted for further analysis if clear 28S and 18S RNA bands were present. Total RNA (5-10 µg) was then reverse-transcribed using an oligo dT-T7 primer and cDNA synthesis kit (Life Technologies). cDNA was isolated by phenol/chloroform extraction and resuspended in 12 µL of diethyl pyrocarbonate-treated water. Ten microliters of the resultant solution was then used in an in vitro transcription-amplification reaction in the presence of biotinylated nucleotides (Enzo Diagnostics, Farmingdale, NY). Fifteen micrograms of labeled cRNA was fragmented by incubation at 95°C for 35 minutes in fragmentation buffer [40 mmol/L Tris-acetate (pH 8.1), 100 mmol/L KOAc, 30 mmol/L MgOAc] and hybridized onto a Test 3 array and subsequently onto an Affymetrix HG_U95Av2 oligonucleotide chip for 16 hours at 45°C (Affymetrix, Santa Clara, CA). Posthybridization staining and washing were processed according to the manufacturer's instructions (Affymetrix). Scanning was done using a Hewlett-Packard argon-ion laser confocal scanner and analyzed using Microarray Suite 5.0 (Affymetrix). Images were quantified using MAS 5.1 (MicroArray Suite, Affymetrix) with the default parameters for the statistical algorithm and all probe set scaling with a target intensity of 500.
Validation of microarray data. Results from the Affymetrix oligonucleotide microarrays were validated by real-time reverse transcription-PCR. For the lung squamous cell carcinoma cohort, six up-regulated genes and three down-regulated genes were chosen to confirm the microarray findings. For the oral tongue squamous cell carcinoma cohort, three genes were examined by real-time reverse transcription-PCR. Real-time reverse transcription-PCR methodology has been published previously (28).
Statistical analyses. To filter noise from the analysis, genes that were present in <25% of the samples were excluded from initial analyses. Hierarchical clustering on this set was done as follows: gene values were normalized by subtracting the mean of the signal intensities for each gene; the distance metric used was d = (1 − ρ) / 2, where ρ is the Pearson correlation between samples. The Ward linkage method was used. Bootstrap nonparametric resampling was done using 1,000 iterations. A majority rule consensus tree was constructed from the 1,000 bootstrap trees. Next, genes were selected to maximally discriminate the tongue from lung tumors using a standard t test on the log of the absolute signal intensity, and the data were reclustered using these genes. The number of genes selected was arbitrary. However, we found that approximately 100 genes perfectly discriminated the two groups, whereas choosing more genes improved the robustness of the clustering. Five hundred genes seemed to be an acceptable compromise, allowing a manageable number of genes yet maintaining stable clusters. To confirm and quantify how well tongue versus lung tumors could be discriminated, a supervised learning analysis was done using support vector machines and leave-one-out cross-validation on the original tumors. Thirteen non-head and neck squamous cell carcinoma samples of unknown origin (primary or metastatic) were then added to the analysis after the gene selection.
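The normalization and distance metric described above can be illustrated with a minimal sketch. It assumes a matrix with genes as rows and samples as columns; the function names and the toy values are hypothetical and are not taken from the study.

```python
import math

def center_genes(matrix):
    """Subtract each gene's mean across samples (rows = genes, columns = samples)."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sample_distances(matrix):
    """Pairwise d = (1 - rho) / 2 between samples of a gene-centered matrix."""
    samples = list(zip(*center_genes(matrix)))  # one tuple per sample
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = (1.0 - pearson(samples[i], samples[j])) / 2.0
            dist[i][j] = dist[j][i] = d
    return dist

# Hypothetical toy data: 4 genes x 4 samples; samples 0-1 and 2-3 form two groups.
toy = [
    [10.0, 11.0, 2.0, 1.0],
    [9.0, 10.0, 1.0, 2.0],
    [1.0, 2.0, 9.0, 10.0],
    [2.0, 1.0, 12.0, 11.0],
]
dist = sample_distances(toy)
```

With this metric, perfectly correlated samples have distance 0 and perfectly anticorrelated samples have distance 1, so every value lies between 0 and 1.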
| Results and Discussion |
|---|
In the tongue cancer patient cohort, 79 genes were found to be significantly associated with oral tongue cancer when compared with case-matched, histologically normal mucosa. Three genes (GLUT3, HSAL2, and PACE4) were selected for further analysis and validated as prognostic markers in a larger cohort of 49 patients by quantitative real-time reverse transcription-PCR. Using a criterion of 2-fold up-regulation as a cutoff, 30.6%, 24.5%, and 26.5% of patients expressed high levels of GLUT3, HSAL2, and PACE4, respectively, again confirming the validity of the microarray data (30).
Other groups have used Northern blot and/or immunohistochemistry validation (25). However, the specificity and accurate quantification afforded by real-time reverse transcription-PCR probably make it an ideal method for internal validation of array data.
Although it is clearly impractical to examine all genes responsible for the segregation of the various tumor subsets, several provide confirmatory validation of the segregation and also raise a number of interesting potential causal links. For example, the protein kinase C substrate is found significantly more often in the primary lung tumor subset. Given that this is a regulatory molecule that mediates mucin granule release by bronchial epithelial cells, it is not surprising to find it more commonly expressed in the lung tumor subset (31). The polycomb 2 homologue is a gene also highly expressed in lung tumor tissue. This gene is thought to be a repressor of proto-oncogene function, and interference with it may lead to cellular transformation, consistent with our finding that this gene is highly expressed in this malignant tissue (32). Conversely, EIF-2α is found more commonly in our tongue tumor subset. Notably, this gene may be involved in the hypoxic stress response (33). Hence, it is likely to be more important to cells in the relatively more hypoxic environment of the oral mucosa than in the lung.
Hierarchical clustering of primary tumors. All lung and tongue tumors were combined into an unsupervised hierarchical clustering model. From the 12,625 probe sets on the array, those expressed in <25% of samples were excluded (as is usual practice in microarray analysis) to filter out noise, leaving 6,716 probe sets for analysis. The two groups clearly segregated based on the tissue of origin of the primary tumor (Fig. 1). Unsupervised clustering was done without prior knowledge of how these tumors might cluster, and to determine if groups would form without biasing the data, as suggested by numerous authors (34). Thirty of the 31 tongue tumors grouped in cluster 1. All 21 lung tumors grouped in cluster 2, along with a single tongue tumor. Notably, the three lung adenocarcinoma samples formed a separate subcluster within the lung group, as our group and others have found previously (24, 35, 36). These two clusters and single subcluster were the only truly robust clusters, with bootstrap resampling of >95%. Together, these data show the uniqueness of tumors from each tissue, and indicate the importance of genes intrinsic to the tissue of origin in determining the expression profile of a tumor.
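The noise filter mentioned above (dropping probe sets expressed in <25% of samples) can be sketched as follows. This is an illustration only: it assumes per-sample detection calls coded as "P" (present), "M" (marginal), and "A" (absent), and it assumes marginal calls are not counted as present. The function name and the probe identifiers are hypothetical.

```python
def filter_by_present_calls(calls, min_fraction=0.25):
    """Return probe-set IDs called present ("P") in at least min_fraction of samples.

    calls maps a probe-set ID to a list of per-sample detection calls.
    Marginal ("M") calls are treated as not present (an assumption of this sketch).
    """
    kept = []
    for probe_id, sample_calls in calls.items():
        present = sum(1 for c in sample_calls if c == "P")
        if present / len(sample_calls) >= min_fraction:
            kept.append(probe_id)
    return kept

# Hypothetical detection calls for three probe sets across four samples.
toy_calls = {
    "probe_a": ["P", "P", "A", "P"],  # present in 75% of samples: kept
    "probe_b": ["A", "A", "A", "M"],  # present in 0% of samples: dropped
    "probe_c": ["P", "A", "A", "A"],  # exactly 25%: kept, since only <25% is excluded
}
kept = filter_by_present_calls(toy_calls)
```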
We next aimed to determine both the approximate number and identities of the genes causing this powerful tissue-based segregation. On one hand, ideally a single gene would determine the origin of a tissue and/or its carcinogenic potential. However, we find that no fewer than 50 genes can be used to segregate these tissues. Consequently, some authors propose the use of the maximum number of probe sets (37); however, this can both mask the differences in genotype due to noise and decrease the predictive value when arrays are validated using an unfiltered gene set. Moreover, others have suggested that increasing gene numbers from small to moderate improves the error rate in classification algorithms, but as the number of genes used becomes large, the generalization performance worsens (38). Thus, to determine an optimally small gene set to use, we arbitrarily selected the 1,000, 500, and 100 genes which were most different between the lung and tongue data sets by t test. Table 1 lists the 50 most significantly different genes between the two groups. Even using as few as 100 genes, results were sufficient to completely distinguish the two data sets. This suggests that <100 genes were responsible for the distinction between the tissues of origin. However, as the number of genes decreased, the robustness of the clustering decreased. Thus, for later analyses we chose to use 500 genes, as this maintained both the segregation and robustness of the clustering. Whereas 500 genes may seem to be a large number of genes to consider, it is a relatively small number in comparison with the total number of genes likely to determine a tissue type or the number of mutations thought to contribute to most carcinogenic processes.
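The gene-selection step can be sketched as ranking genes by the absolute value of a two-sample t statistic on log-transformed signal and keeping the top k. The sketch below assumes a pooled-variance Student t test and a base-2 logarithm; the text specifies neither the variance assumption nor the base (the base does not affect the ranking). The gene names and intensities are hypothetical.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))

def top_genes(expr, labels, k):
    """Rank genes by |t| between 'tongue' and 'lung' samples and return the top k names.

    expr maps a gene name to a list of positive signal intensities, one per sample.
    """
    scored = []
    for gene, values in expr.items():
        logs = [math.log2(v) for v in values]
        tongue = [x for x, lab in zip(logs, labels) if lab == "tongue"]
        lung = [x for x, lab in zip(logs, labels) if lab == "lung"]
        scored.append((abs(t_statistic(tongue, lung)), gene))
    scored.sort(reverse=True)  # largest |t| first
    return [gene for _, gene in scored[:k]]

# Hypothetical intensities: gene_x and gene_z differ by tissue, gene_y does not.
labels = ["tongue", "tongue", "tongue", "lung", "lung", "lung"]
expr = {
    "gene_x": [100, 120, 110, 800, 900, 850],
    "gene_y": [500, 450, 520, 480, 510, 470],
    "gene_z": [2000, 1800, 2200, 300, 250, 280],
}
selected = top_genes(expr, labels, k=2)
```

Running the same function with k set to 1,000, 500, or 100 on a full expression table would mirror the nested gene sets compared in the text.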
Clustering using just the subset of "unique genes" resulted in groupings similar to the initial unsupervised clustering (Fig. 2). However, this time every lung and tongue tumor segregated into its respective group, with no outliers. Of note, within the lung tumor subgroup, we also included three lung adenocarcinomas (in addition to the 28 lung squamous cell carcinomas), as before, to delineate the significance of tissue of origin versus disease biology as the key segregating factor. These primary lung neoplasms of markedly different histology tended to segregate into a separate subcluster of the dendrogram within the lung subset, but were more closely correlated than previously. This confirms that, although disease biology contributed to the dendrogram clusters, as mentioned by others (36), the key feature in this analysis of 500 genes was tissue of origin.
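For readers unfamiliar with Ward linkage, the reclustering can be sketched as an agglomerative procedure over a precomputed distance matrix using the Lance-Williams update. This is a minimal illustration, not the software used in the study: it takes unsquared distances and squares them internally (one of several Ward variants, and which one the authors' software used is not stated), it returns a flat cut rather than a dendrogram, and the toy distance matrix is hypothetical.

```python
import math

def ward_clusters(dist, n_clusters):
    """Agglomerative clustering with Ward linkage via the Lance-Williams update.

    dist is a symmetric matrix of pairwise distances between samples.
    Returns a sorted list of clusters, each a sorted list of sample indices.
    """
    n = len(dist)
    members = {i: [i] for i in range(n)}
    # Store each pair once, keyed as (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(members) > n_clusters:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair
        na, nb = len(members[a]), len(members[b])
        new = {}
        for k in members:
            if k in (a, b):
                continue
            nk = len(members[k])
            dka = d[(min(k, a), max(k, a))]
            dkb = d[(min(k, b), max(k, b))]
            sq = ((na + nk) * dka ** 2 + (nb + nk) * dkb ** 2
                  - nk * dab ** 2) / (na + nb + nk)
            new[(k, next_id)] = math.sqrt(max(sq, 0.0))
        # Drop distances involving the merged clusters, then add the new ones.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new)
        members[next_id] = sorted(members.pop(a) + members.pop(b))
        next_id += 1
    return sorted(members.values())

# Hypothetical distances: samples 0-1 are close, samples 2-3 are close.
toy_dist = [
    [0.00, 0.01, 0.98, 0.99],
    [0.01, 0.00, 0.97, 0.98],
    [0.98, 0.97, 0.00, 0.02],
    [0.99, 0.98, 0.02, 0.00],
]
clusters = ward_clusters(toy_dist, n_clusters=2)
```

Because a correlation-based distance is not Euclidean, applying the Ward update to it is a formal convenience rather than a strict variance-minimizing procedure.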
Another group, Leong et al. (39), have previously used microsatellite analyses of loss on chromosomal arms 3p and 9p to examine the origins of lung squamous cell carcinomas. Our work builds on this in that array-based analysis examines many thousands of genes (in comparison to just a small number of chromosomal arms). We are also able to retrospectively and algorithmically choose genes which seem to be important rather than having to predetermine and "best-guess" which areas of the genome are best to examine. Furthermore, computer pattern recognition allows us to compare a single biopsy sample from the lung against a generic normal tissue template of any tissue, alleviating the need for tissue from both the new tumor and previous primaries, as is required for microsatellite analyses.
Our tumor cohorts for each group were not large enough to perform a complete validation by dividing the data into "training" and "test" sets, as would be ideal (16). In these circumstances, one of the optimal techniques is a supervised cross-validation rather than re-testing class assignment by repeated unsupervised clustering (34). Furthermore, supervised methods ensure that criteria relevant to the classification are what determine the groupings (40). We therefore elected to use a leave-one-out cross-validation using support vector machines to learn the classification (11, 25, 41). This analysis removes a sample, re-tests the categorization of remaining samples, and then returns the removed sample. This is repeated for every patient sample. The percentage of samples correctly predicted was 98%, further confirming that our prior clustering very accurately predicted class assignment of tumors to the head and neck or lung subsets.
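The leave-one-out loop can be sketched as below. To keep the example free of external dependencies, a simple nearest-centroid classifier stands in for the support vector machine used in the study, so this illustrates only the cross-validation procedure, not the authors' classifier. All names and values are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean profile is closest (squared Euclidean).

    This is a stand-in for a support vector machine, used only for illustration.
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        group = [v for v, lab in zip(train_x, train_y) if lab == label]
        centroid = [sum(col) / len(group) for col in zip(*group)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def leave_one_out_accuracy(samples, labels, predict=nearest_centroid_predict):
    """Hold out each sample in turn, train on the rest, and score the held-out call."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Hypothetical two-gene profiles for three tongue and three lung tumors.
samples = [[1.0, 9.0], [1.2, 8.5], [0.8, 9.3], [9.0, 1.0], [8.7, 1.4], [9.2, 0.9]]
labels = ["tongue", "tongue", "tongue", "lung", "lung", "lung"]
accuracy = leave_one_out_accuracy(samples, labels)
```

Any classifier with the same `predict(train_x, train_y, x)` shape could be passed in place of the stand-in. The text does not state whether gene selection was repeated inside each fold, so the sketch leaves that step out.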
Gene expression profiling of lung tumors of undetermined origin. Having identified a subset of 500 genes which reliably distinguished lung from tongue squamous cell carcinomas, we then chose to test this clustering algorithm on a clinical set of 12 lung squamous cell carcinomas resected from patients who were previously treated for head and neck squamous cell carcinomas (Table 2). As expected, histologic or clinical criteria could not precisely define the tissue of origin of these cases. However, the study group was biased as the majority of these tumors were thought to be of lung origin by virtue of the fact that they underwent resection. This reflects a clinical bias against resecting head and neck metastases to the lung given the poor patient outcome in this population. Nonetheless, many of the patient records documented the lack of certainty in deciding on the tissue of origin from the histologic biopsy or surgical samples. The tumors from the test set were processed and analyzed exactly as the training set and hybridized to Affymetrix HG_U95Av2 arrays. The cluster algorithm was rerun using the 500 genes selected from the training set (Fig. 3). The addition of the samples from the test cohort did not destabilize the clusters. All of the cases from the test cohort reliably segregated into one of the two cluster groups. This offers strong molecular evidence to link the unknown cases either as lung or head and neck primary origin tumors, although no "gold standard" test exists to confirm this. As expected, the majority (11 samples, labeled U1-11) segregated with the lung tumors, supporting the clinical suspicion that these were lung primary tumors. One sample (labeled U12) robustly clustered with the tongue tumor subset, suggesting that it was likely a metastasis from a prior head and neck cancer. This suspicion was supported clinically by the development of a metachronous metastasis to the pancreas in this patient. 
To further validate the concept that metastatic tissues retain the genetic profile of the tissue of origin, we tested this pancreatic metastasis (labeled U13). This sample not only reliably clustered with the tongue subset, but despite being of pancreatic focus, clustered most closely with the same patient's lung lesion. This suggests a very strong maintenance of the tumor's gene expression profile regardless of the site to which it metastasizes.
Oligonucleotide microarrays have shown considerable promise in determining genetic mechanisms, therapeutic targets, and prognostication. In this study, we use microarray data clinically to allow us to determine the tissue of origin of a tumor of unknown origin. We show that gene expression profiling can reliably inform us of a tumor's tissue of origin when compared with other like tissues of indistinguishable phenotypes. Furthermore, this profiling can be applied to metastases which seem to retain the expression profile of their primary tissue of origin, regardless of their location in the body. In this way, microarray technology may be used to improve clinical decision-making and patient care.
| Acknowledgments |
|---|
We thank Nancy Bennett for her outstanding editorial assistance.
| Footnotes |
|---|
Presented at the American Association for Cancer Research Annual Meeting, 2003. Washington, DC, Poster Session, Abstract #6014.
Received 6/14/04. Revised 1/25/05. Accepted 2/7/05.
| References |
|---|
B signal pathway. Cancer Res 2001;61:4797-808.