Pathologic differentiation of tissue of origin in tumors found in the lung can be challenging, with differentiation of mesothelioma and lung adenocarcinoma emblematic of this problem. Indeed, proper classification is essential for determination of treatment regimen for these diseases, making accurate and early diagnosis critical. Here, we investigate the potential of epigenetic profiles of lung adenocarcinoma, mesothelioma, and nonmalignant pulmonary tissues (n = 285) as differentiation markers in an analysis of DNA methylation at 1413 autosomal CpG loci associated with 773 cancer-related genes. When all samples were clustered with an unsupervised recursively partitioned mixture modeling technique, the derived methylation profile classes were significantly associated with sample type (P < 0.0001). In a similar analysis restricted to tumors, methylation profile classes significantly predicted tumor type (P < 0.0001). Random forests classification of CpG methylation of tumors, which splits the data into training and test sets, accurately differentiated mesothelioma from lung adenocarcinoma over 99% of the time (P < 0.0001). In a locus-by-locus comparison of CpG methylation between tumor types, 1266 CpG loci had significantly different methylation between tumors following correction for multiple comparisons (Q < 0.05); 61% had higher methylation in adenocarcinoma. Pathway analysis of the CpG loci with significant differential methylation revealed significant enrichment of methylated gene-loci in Cell Cycle Regulation, DNA Damage Response, PTEN Signaling, and Apoptosis Signaling pathways in lung adenocarcinoma when compared with mesothelioma. Methylation profile–based differentiation of lung adenocarcinoma and mesothelioma is highly accurate, informs on the distinct etiologies of these diseases, and holds promise for clinical application. [Cancer Res 2009;69(15):6315–21]
Malignant pleural mesothelioma is a rapidly fatal neoplasm with a clinical presentation that can mimic adenocarcinoma of the lung, complicating diagnosis (1, 2). These malignancies likely have distinct cellular origins, although this remains unclear. Shared signs and symptoms of these diseases include malignant pleural effusion, dyspnea, chest pain, and fatigue (3, 4). An enhanced description of the character of the underlying somatic alterations, and thereby a proper diagnosis, is of paramount importance, especially considering the disparate prognoses and treatment regimens for lung adenocarcinoma and mesothelioma (5, 6).
Several techniques have been used or proposed for differential diagnosis. Cytologic approaches to differential diagnosis have historically had a wide margin of variability in sensitivity depending on sample preparation methods and feature sets analyzed ( 7, 8). Currently, the most common method uses an immunohistochemical panel containing both epithelial and mesothelial markers ( 9). Despite recent improvements in antibody panels for differential diagnosis, there is no consensus immunohistochemical panel or evidence-based guidelines for panel selection ( 9, 10). Another method, using mRNA expression gene ratios, has reported differential diagnosis accuracy of 95% and 99% for mesothelioma and adenocarcinoma, respectively ( 11). The instability of mRNA, though, may make wide-scale implementation of this technology challenging, particularly outside of major academic surgical centers.
It is well recognized that promoter DNA hypermethylation is a mechanism of stable control of transcription, and an important contributor to carcinogenesis. When certain cytosines in specific clustered regions primarily located in gene promoters are hypermethylated, aberrant, stable gene silencing can occur. Regulatory CpG clusters are common, often occur in tumor suppressor genes, and are thought to remain largely unmethylated in noncancerous cells. In fact, about half of all human genes contain CpG islands and are potentially subject to aberrant methylation silencing ( 12, 13). Recently, the simultaneous resolution of hundreds of specific, phenotypically defined cancer-related CpG methylation marks has become technologically feasible, allowing for rapid, high-throughput epigenetic profiling of human tissue CpG methylation ( 14). Our previous work has shown hundreds of differentially methylated CpG loci in pleural mesothelioma compared with nondiseased pleura ( 15). Other reports, using a small number of candidate loci, have shown significant differences in gene-promoter methylation prevalences between lung adenocarcinoma and mesothelioma ( 16, 17).
In this study, we exploited the stability of the aberrant cytosine methylation mark and new array-based technology for high throughput measurement of DNA CpG methylation to investigate the methylation status of 1413 autosomal CpG loci associated with 773 cancer-related genes on Illumina's GoldenGate methylation bead-array platform. Using one of the largest case series studies of these diseases and focusing on epigenetic alteration, we show that methylation profiling can differentiate lung adenocarcinoma, mesothelioma, and nonmalignant tissues.
Materials and Methods
Study samples. Mesotheliomas (n = 158) and grossly nontumorigenic parietal pleura (n = 18) were obtained following surgical resection at Brigham and Women's Hospital through the International Mesothelioma Program from a pilot study conducted in 2002 (n = 70) and an incident case series beginning in 2005 (n = 88) with a participation rate of 85%. We used biopsy specimens from patients treated for non–small cell lung cancer at the Massachusetts General Hospital from 1992 to 1996 (18), including lung adenocarcinomas (n = 57) and nonmalignant pulmonary tissues [n = 48, of which 22 (39%) were taken from the adenocarcinoma patients; ref. 18]. Additional normal lung tissues were obtained from the National Disease Research Interchange from donors free of lung malignancy (n = 4). All patients provided informed consent under the approval of the appropriate Institutional Review Boards. Clinical information, including histologic diagnosis, was obtained from pathology reports. The study pathologist confirmed the histologic diagnoses and further assessed the percent tumor from resected specimens (mean, >60% for mesotheliomas; >50% for lung adenocarcinomas).
Methylation analysis. DNA from fresh frozen tissue was isolated with QIAamp DNA mini kit (Qiagen), and sodium bisulfite modified using the EZ DNA Methylation kit (Zymo Research). Illumina GoldenGate methylation bead arrays interrogated 1505 CpG loci associated with 803 cancer-related genes processed at the University of California San Francisco Institute for Human Genetics, Genomics Core Facility as described by Bibikova and colleagues ( 14). Methylation array data are available on the Gene Expression Omnibus archive (accession GSE16559).
Statistical analysis. Illumina BeadStudio Methylation software was used for data set assembly. Fluorescent signals for methylated (Cy5) and unmethylated (Cy3) alleles give methylation level: β = (max(Cy5, 0))/(∣Cy3∣ + ∣Cy5∣ + 100) with ∼30 replicate bead measurements per locus. Detection P values determined poor performing samples (n = 2) and CpG loci (n = 8), which were removed from analysis. X chromosome loci were also removed, leaving 1413 CpG loci associated with 773 genes.
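As an illustration of the methylation level defined above, the formula can be written as a small function. This is a minimal sketch of the stated expression only; the function name, the argument names, and the example signal values are hypothetical, and how BeadStudio aggregates the replicate bead measurements is not modeled here.

```python
def methylation_beta(cy5: float, cy3: float, offset: float = 100.0) -> float:
    """Methylation level for one locus from summarized fluorescent signals.

    cy5 is the methylated-allele signal and cy3 the unmethylated-allele signal.
    Implements beta = max(Cy5, 0) / (|Cy3| + |Cy5| + 100) as stated in the text.
    """
    return max(cy5, 0.0) / (abs(cy3) + abs(cy5) + offset)


# Hypothetical signal values, chosen only to show the range of the statistic.
mostly_methylated = methylation_beta(cy5=900.0, cy3=0.0)    # 900 / 1000
negative_signal = methylation_beta(cy5=-50.0, cy3=100.0)    # numerator floors at 0
print(mostly_methylated, negative_signal)
```

The constant 100 in the denominator keeps β bounded away from 1 and stabilizes the ratio when both signals are low.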
Subsequent analyses were conducted in R (19). Hierarchical clustering was performed with the hclust function, using the Manhattan metric and average linkage on the CpG loci with the highest variance. For inference, data were clustered using a recursively partitioned mixture model (RPMM; ref. 20). Associations between covariates and methylation at individual CpG loci were tested with generalized linear models, accounting for the beta-distribution of average β as in Hsiung and colleagues (21). Q-values for false discovery rate correction were computed with the qvalue package (22).
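The clustering step described above (select the most variable loci, then cluster with the Manhattan metric and average linkage) can be sketched in a few lines. This is an illustrative pure-Python stand-in, not the R hclust call used in the study; the function names and the toy β values are hypothetical, and the naive merge loop is only suitable for small inputs.

```python
from statistics import pvariance


def top_variance_loci(betas, k):
    """Indices of the k loci with the highest variance across samples.

    betas is a list of samples; each sample is a list of beta values in the
    same locus order.
    """
    n_loci = len(betas[0])
    variances = [pvariance([sample[j] for sample in betas]) for j in range(n_loci)]
    return sorted(range(n_loci), key=lambda j: variances[j], reverse=True)[:k]


def manhattan(a, b):
    """Manhattan (L1) distance between two methylation profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Manhattan distance until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair_sum = sum(manhattan(points[i], points[j])
                               for i in clusters[a] for j in clusters[b])
                dist = pair_sum / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data: four samples, three loci; the third locus does not vary.
toy_betas = [
    [0.9, 0.1, 0.5],
    [0.8, 0.2, 0.5],
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.5],
]
selected = top_variance_loci(toy_betas, 2)
reduced = [[sample[j] for j in selected] for sample in toy_betas]
toy_clusters = average_linkage(reduced, 2)
print(selected, toy_clusters)
```

In the toy data the invariant locus is dropped and the two pairs of similar samples end up in separate clusters, which is the behavior the variance filter is meant to encourage.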
Recognizing the importance of using a training set to build a classifier, and a test set upon which to test the validity of the classification scheme, we have used the Random Forests approach (RF), R package version 4.5–25 by Liaw and Wiener. RF builds classifiers by repeatedly sampling with replacement from the original data (i.e., bootstrap sampling), sampling from the predictors, and building a classification tree with the resulting samples ( 23). Upon every iteration, approximately a third of the original data are not sampled; the unsampled, or “out of the bag” observations are used as a test set against which the tree is assessed with respect to classification error. The out of the bag error rate—the average classification error over all iterations—is thus an unbiased estimate of the fraction of times the RF prediction is incorrect.
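The out-of-bag idea can be made concrete with a short simulation. The sketch below only illustrates bootstrap sampling and the resulting held-out fraction; it does not build classification trees or reproduce the R randomForest package, and the function names and the random seed are assumptions made for the example.

```python
import random


def expected_oob_fraction(n):
    """Probability that a given observation is absent from a bootstrap
    sample of size n drawn with replacement from n observations."""
    return (1.0 - 1.0 / n) ** n


def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_samples = 285          # total sample count reported in the text
fractions = []
for _ in range(200):     # 200 bootstrap iterations, an arbitrary choice
    _, oob = bootstrap_split(n_samples, rng)
    fractions.append(len(oob) / n_samples)
mean_oob = sum(fractions) / len(fractions)
print(round(expected_oob_fraction(n_samples), 4), round(mean_oob, 4))
```

The held-out fraction approaches 1/e (about 37%) as n grows, which is the "approximately a third" referred to above; each tree is then scored only on the observations it never saw.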
Canonical pathway analysis was conducted with the use of Ingenuity Pathway Analysis (Ingenuity Systems; ref. 24). CpG gene-loci associated with the Ingenuity Pathway Knowledge Base were considered for analysis and differentially methylated loci from locus-by-locus analysis were compared. The significance of gene-locus enrichment within canonical pathways was measured with a Fisher's exact test (P < 0.05).
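For readers unfamiliar with enrichment testing, the Fisher's exact test mentioned above can be computed directly from the hypergeometric distribution. This is a minimal sketch assuming a one-sided (over-representation) test; the text does not state which tail Ingenuity Pathway Analysis uses, and the function name, argument names, and example counts are hypothetical.

```python
from math import comb


def fisher_exact_enrichment(overlap, n_hits, n_in_pathway, n_total):
    """One-sided Fisher's exact P value for over-representation.

    Returns P(X >= overlap), where X is the number of pathway genes among
    n_hits differentially methylated genes drawn from n_total genes, of
    which n_in_pathway belong to the pathway.
    """
    upper = min(n_hits, n_in_pathway)
    numerator = sum(
        comb(n_in_pathway, k) * comb(n_total - n_in_pathway, n_hits - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(n_total, n_hits)


# Hypothetical counts: 8 genes, 4 in the pathway, 4 hits, 3 hits in the pathway.
p_small_example = fisher_exact_enrichment(overlap=3, n_hits=4, n_in_pathway=4, n_total=8)
print(p_small_example)  # equals 17/70
```

A pathway would be called enriched at the threshold used in the text when this P value falls below 0.05.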
Results
Incident cases of mesothelioma (n = 158) and lung adenocarcinoma (n = 57), together with associated nonmalignant pleural (n = 18) and pulmonary (n = 52) tissues, were assessed for methylation (total n = 285). Demographic and tumor characteristic data for these samples are presented in Table 1. Mean age and gender distributions were similar between tumors and nontumor samples from their tissues of origin. Lung adenocarcinomas and nontumor lung samples were from individuals with similar exposures to smoking, and their asbestos exposure histories did not differ. Mesotheliomas and nontumor pleural samples were from individuals with similar exposures to asbestos.
Unsupervised hierarchical clustering of the 500 most methylation-variable autosomal CpG loci revealed readily apparent differences in the epigenetic profiles among lung adenocarcinoma, mesothelioma, and nonmalignant tissues ( Fig. 1A ). However, nonmalignant pleural and pulmonary tissues did not seem to segregate from each other. Unsupervised hierarchical clustering of tumors only is shown in Fig. 1B. We next applied a modified model-based form of unsupervised clustering known as RPMM ( 20). The RPMM returned 17 methylation classes whose average methylation profiles are shown in Fig. 2 ; 11 of these classes (68%) perfectly captured a single sample type, and methylation profiles were a significant predictor of tissue sample type (P < 0.0001). The 50 CpG loci whose methylation status most effectively discriminates among methylation classes are listed in Supplementary Table S1.
A supervised RF classification of methylation data in all samples returned a confusion matrix showing which samples are correctly classified, those that are misclassified, and the misclassification error (ME) rate for each sample type ( Table 2 ). The overall ME rate of 7.0% was significantly lower than the expected error rate under the null hypothesis (P < 0.0001).
Consistent with the patterns observed from unsupervised clustering, nonmalignant tissues had a higher misclassification error (ME, 24.3%) than tumors (ME, 1.4%). Of 52 nonmalignant pulmonary tissues, 4 were confused as lung adenocarcinoma, and 1 as a mesothelioma (ME, 9.6%). Among 18 nonmalignant pleural tissues, 7 were confused as nontumor lung, and 5 as mesothelioma (ME, 66.7%). On the other hand, only one lung adenocarcinoma was misclassified, as a nontumor lung (ME, 1.8%); and only two mesotheliomas were misclassified, both as lung adenocarcinoma (ME, 1.3%). The 50 most discriminatory CpG loci from this RF analysis are given in Supplementary Table S2.
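The error rates quoted above follow directly from the misclassification counts. The sketch below rebuilds the confusion matrix from the counts given in the text; the correctly classified counts on the diagonal are derived by subtraction from the group sizes rather than quoted, and the variable and function names are hypothetical.

```python
# Rows are true sample types; columns are RF-predicted types, in LABELS order.
LABELS = ["nontumor lung", "nontumor pleura", "lung adenocarcinoma", "mesothelioma"]
CONFUSION = {
    "nontumor lung":       [47, 0, 4, 1],    # 52 samples: 4 called adenocarcinoma, 1 mesothelioma
    "nontumor pleura":     [7, 6, 0, 5],     # 18 samples: 7 called nontumor lung, 5 mesothelioma
    "lung adenocarcinoma": [1, 0, 56, 0],    # 57 samples: 1 called nontumor lung
    "mesothelioma":        [0, 0, 2, 156],   # 158 samples: 2 called adenocarcinoma
}


def misclassification_rate(true_labels):
    """Pooled fraction of samples of the given true types that were misclassified."""
    total = sum(sum(CONFUSION[t]) for t in true_labels)
    correct = sum(CONFUSION[t][LABELS.index(t)] for t in true_labels)
    return (total - correct) / total


overall = misclassification_rate(LABELS)
nonmalignant = misclassification_rate(["nontumor lung", "nontumor pleura"])
tumors = misclassification_rate(["lung adenocarcinoma", "mesothelioma"])
for name, rate in [("overall", overall), ("nonmalignant", nonmalignant), ("tumors", tumors)]:
    print(f"{name}: {100 * rate:.1f}%")
```

Pooling rows this way gives the overall, nonmalignant, and tumor error rates, and applying the same function to a single row gives the per-type rates.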
We next restricted our analysis to lung adenocarcinoma and nonsarcomatoid mesotheliomas (n = 210) and applied the RPMM approach (Fig. 3). In this model, 14 methylation classes resulted, and 12 (86%) perfectly captured a single tumor type. Methylation classes significantly predicted tumor type (P < 0.0001). The 50 most critical loci for differentiating the methylation classes in this model are listed in Supplementary Table S3. Results were again followed up with RF classification, resulting in a confusion matrix with an overall ME of <1% (P < 0.0001; Table 2). The 50 most discriminatory CpG loci for RF classification of tumors are given in Supplementary Table S4.
In a univariate approach, we tested all CpG loci individually for an association between methylation and tumor type with generalized linear models followed by correction for multiple comparisons. In this manner, 1266 CpG loci had methylation levels that differed between lung adenocarcinoma and mesothelioma (Q < 0.05; Supplementary Table S5). Among these 1266 CpG loci, 61% had higher methylation in lung adenocarcinoma compared with mesothelioma. In addition, epithelioid and sarcomatoid mesotheliomas had differential methylation (Q < 0.05) at 87 CpG loci including 15 gene-loci (e.g., SLC22A18, RARA, and SEPT9) with >1 CpG displaying differential methylation (Supplementary Table S6).
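The multiple-comparison step can be illustrated with the Benjamini-Hochberg adjustment. Note that this is a stand-in: the study used the qvalue package, whose q-values additionally estimate the proportion of true null hypotheses, so the values below are a more conservative approximation and not the authors' exact procedure. The function name and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    Each adjusted value is the smallest false discovery rate at which the
    corresponding locus would be declared significant.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-locus P values from a locus-by-locus comparison.
example_p = [0.01, 0.04, 0.03, 0.20]
example_adjusted = benjamini_hochberg(example_p)
n_significant = sum(q < 0.05 for q in example_adjusted)
print(example_adjusted, n_significant)
```

Loci would then be reported as differentially methylated when the adjusted value falls below the chosen threshold, as with Q < 0.05 in the text.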
Lastly, using the locus-by-locus data, we performed a pathway analysis comparing methylation profiles between lung adenocarcinoma and mesothelioma. Among mesotheliomas, the Fc Epsilon RI Signaling and Calcium Signaling pathways were significantly enriched (Fisher's P < 0.05) for methylation versus lung adenocarcinoma (Table 3). Lung adenocarcinomas had six pathways with significant enrichment (Fisher's P < 0.05) of methylated gene-loci versus mesothelioma, including Cell Cycle Regulation, DNA Damage Response, PTEN Signaling, and Apoptosis Signaling.
Discussion
On microscopic assessment, adenocarcinoma of the lung can resemble malignant pleural mesothelioma. There is no standardized approach to the differential diagnosis of these diseases, which can be challenging. As is the case with any disease, proper diagnosis is paramount; a rapid, accurate diagnosis has the potential to improve patient outcome. Using DNA methylation profiling, we successfully differentiated these tumors, suggesting that this approach may be a useful adjunct in diagnosis.
All somatic cells in a given individual are genetically identical (excluding T and B cells). However, different cell types form distinct anatomic structures and carry out a wide range of physiologic functions. This is made possible largely via control of gene expression. One approach for differentiating pleural mesothelioma and lung adenocarcinoma relies on the differential gene expression profiles of these tumors ( 11). Although this approach is sound, and has been reproduced in malignant pleural effusions ( 1), the instability of mRNA transcripts makes methods relying upon RNA measures difficult to standardize and implement. DNA methylation profiles reflect phenotypically important differences in gene transcription and the molecular structure of DNA is inherently more stable than RNA, making assessment of DNA methylation profiles attractive as a highly accurate and reproducible diagnostic test.
Unsupervised clustering achieved excellent segregation of tumor tissues from each other and from nontumor tissues, although there was indistinct clustering of nontumorigenic lung and pleural samples. Similarly, some RPMM methylation classes contained a mixture of both nontumor lung and nontumor pleura samples, and in RF classification, nontumor pleura samples had the highest misclassification error. The most likely reason for pleura being misclassified as lung tissue is the potential contamination of the pleural sample with adjacent lung tissue. In addition, in this and other RF classifications of methylation data from our group, we found a significant correlation between sample size and classification error. Therefore, some of the ME for pleural samples may be attributable to small sample size. In the future, arrays with larger panels of CpG methylation markers may further increase the accuracy with which these tissue types can be differentiated.
In an analysis restricted to tumors, we showed the great extent to which CpG methylation varies between mesothelioma and lung adenocarcinoma. Disparate CpG methylation profiles between these tumor types can be attributed in part to differential methylation profiles in the tissues of origin. Although there has been a general consensus that normal cells maintain CpG islands in an unmethylated state permissive to transcription (13), tissue-specific methylation of CpG islands has been described in nondiseased cells (25). In fact, data from the Human Epigenome Project have shown that there is tissue-specific methylation among 90 genes associated with the human major histocompatibility complex (26), and others have reported tissue-specific promoter-region methylation of monocytes, testis, and brain tissues (27). Consistent with these findings, our data show that, in general, normal lung and pleura have different basal methylation profiles.
The different etiologic factors associated with the induction of these tumors likely contribute to their differential methylation. Although the majority of lung adenocarcinomas are related to smoking, smoking is not a risk factor for mesothelioma; rather, the vast majority of mesotheliomas are linked to asbestos exposure. Although asbestos is also a risk factor for lung adenocarcinoma, in our study population, only one lung adenocarcinoma patient had a known occupational asbestos exposure, and this individual was also a smoker. Significant smoking-related and asbestos-related methylation-induced gene inactivation events have been described in lung adenocarcinoma and mesothelioma, respectively ( 28, 29). It is possible that differences in carcinogen exposure result in differences in methylation profiles within and between tumor types.
In a locus-by-locus analysis of tumor samples, over 1,000 CpG loci were differentially methylated between tumor types. Previously, with a combined sample of over 100 mesotheliomas and lung adenocarcinomas, Toyooka and colleagues (16) reported significantly increased methylation in lung adenocarcinoma at APC, CDH13, CDKN2A, MGMT, and RARB. Consistent with these results, in our study, all 12 CpG loci examined among these five genes had significantly higher methylation in adenocarcinomas after correcting for multiple comparisons. In another study, CDH1, ESR1, PTGS2, and RASSF1 had significantly different methylation among normal lung, mesothelioma, and adenocarcinoma (total n = 24), with all gene-loci exhibiting higher methylation in lung adenocarcinoma versus mesothelioma (17). Similarly, in our results, at least one of the two CpG loci investigated in each of these genes had significantly higher methylation in lung adenocarcinoma, and none of the CpG loci we examined in these genes had higher methylation in mesothelioma.
Pathway analysis of differentially methylated CpG loci suggested that there is significant, tumor-type–specific enrichment for methylation-based silencing of genes in specific pathways. As tumorigenesis requires somatic inactivation of several pathways, our observations suggest that either the differing etiologic factors or the differential response of the target cells to these factors is driving the mode of pathway inactivation (i.e., epigenetic versus genetic). For example, the enrichment for methylation inactivation of differential cytokine signaling pathway genes (IL-6 Signaling in lung adenocarcinoma and Fc Epsilon Signaling in mesothelioma) could represent a differential immune-regulated inflammatory response to the primary carcinogens of tobacco smoke and asbestos for these tumors. Furthermore, our group and others have shown that there is an increasing prevalence of DNA methylation of CDKN2A with greater smoking duration in lung cancers ( 30, 31), whereas this gene is often inactivated through homozygous deletion in malignant mesothelioma ( 32, 33). These results suggest that a preferential mode of inactivation may not be occurring in a gene-specific pattern, but instead represents a broader selection of inactivation by exposure and/or target tissue. Alternatively, but not mutually exclusively, the epigenetic status of the genes in these pathways in the stem cells that give rise to these tissues could differ, contributing to the observed differences between these tumors. More complete detailing of the somatic alterations, including profiles of both genetic and epigenetic alterations would assist in characterizing the relationship between exposures and differential pathway inactivation in these cancers.
Future studies that include treatment and survival data for these patients in their respective diseases may identify specific markers of therapeutic value. Epigenetic alterations associated with overall prognosis could potentially contribute to treatment decisions.
In summary, using CpG methylation profiles, we accurately differentiated mesothelioma from lung adenocarcinoma. This approach is DNA based, inexpensive, and commercially available, and individual samples can be classified by simply comparing them with existing RPMM data using an empirical Bayes estimator. Furthermore, random forest is a prediction-based algorithm and can, in principle, be used as the basis for diagnostic software. In addition to characterizing the methylation profiles of these tumors for potential diagnostic use, these data and those of the pathway analysis could aid in understanding variation in patients' response to treatment, and/or the identification of novel, critical therapeutic targets. Finally, beyond the classification of lung adenocarcinoma and mesothelioma, this method may be useful for a range of other clinical scenarios.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant support: National Cancer Institute (R01CA126939, R01CA105274, R01CA126831, R01CA52689, P50CA097257), National Institute of Environmental Health Sciences (T32ES007155, P42ES05947, R01ES006717, P30ES00002), NIEHS/NCI (ES/CA06409), International Mesothelioma Program at Brigham and Women's Hospital (Research grant), Mesothelioma Applied Research Foundation (Research grant).
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received March 22, 2009.
- Revision received May 12, 2009.
- Accepted May 27, 2009.
- ©2009 American Association for Cancer Research.