Experimental Therapeutics, Molecular Targets, and Chemical Biology
1 SEQUENOM, Inc., San Diego, California and 2 The University of Liverpool Cancer Research Centre, Roy Castle Lung Cancer Research Program, Liverpool, United Kingdom
Requests for reprints: Mathias Ehrich, Molecular Biology, SEQUENOM, Inc., 3595 John Hopkins Court, San Diego, CA 92121. Phone: 858-202-9068; Fax: 858-202-9084; E-mail: mehrich{at}sequenom.com.
| Abstract |
|---|
| Introduction |
|---|
70%, it decreases to 30% for stage IIIa (2). There is a need for improved clinical stratification methods that can identify patients with early-stage disease and identify those with high risk of recurrence (3). Conventional methods, including spiral computed tomography, sputum cytology, histopathology, or tumor-node-metastasis classification, have thus far failed to overcome limitations in early detection and risk assessment. In contrast, a variety of novel molecular methods, such as detection of K-ras (4) and p53 mutation status (5, 6), microsatellite instability (7), protein profiling (8, 9), and especially gene expression profiling (10, 11), have shown very promising results. Other potential molecular markers in lung cancer are epigenetic changes of the DNA (12–14). Alterations in DNA methylation and related chromatin changes have been reported as an early event in carcinogenesis and hence hold the promise of being useful as one of the earliest detection markers available (15). To date, researchers in the field have focused on detection of hypermethylated DNA as a marker for tumor progression using methylation-specific PCR (MSP; ref. 16). MSP is an easy-to-use method with very high sensitivity, but it suffers from limited versatility. The method only allows assessment of the presence or absence of methylation at the CpG sites enclosed in the PCR primer hybridization site. Consequently, tissues (tumors) with different fractions of methylated DNA cannot be differentiated; relative changes in the amount of methylated DNA usually remain invisible. Different methods, such as semiquantitative real-time PCR or bisulfite sequencing, are now being used to obtain more quantitative results. Current methods are limited by restricted CpG coverage per assay, poor quantitative resolution, or a combination of both. Hence, large-scale studies that evaluate quantitative methylation for multiple CpG sites in various gene regions and a large number of samples are rare.
A novel technology has been introduced recently that aims to overcome these shortcomings and allows large-scale cytosine methylation profiling (17). Here, we used this technology to quantify the degree of cytosine methylation at 47 genes in tumor and adjacent normal tissue from 96 lung cancer patients. We evaluate the feasibility of this approach to reveal practical markers for lung cancer research.
We selected 96 patients with NSCLC and a history of smoking. From each patient, we collected one specimen from the primary tumor and one specimen from adjacent normal tissue, resulting in a total of 192 samples. The patient collection consists of 34 females and 62 males, ages 41 to 87 years (median, 67 years). In this collection, 50 patients were diagnosed with squamous cell carcinoma, 43 with adenocarcinoma, 1 with large cell carcinoma, 1 with atypical carcinoma, and 1 with not further classified NSCLC. At the time of diagnosis, 45 patients had stage I disease, 27 patients had stage II disease, 20 patients had stage IIIa disease, 3 had stage IIIb disease, and 1 was undetermined. For analysis purposes, this patient collection was randomly divided into separate training and test sets, matched for age, sex, histology, and disease stage. These groups are summarized in Table 1.
For each sample in our collection, 2 µg of genomic DNA were isolated from frozen tissue specimens using a standard phenol/chloroform protocol. The DNA was prepared for methylation analysis using a commercially available bisulfite conversion kit (see Materials and Methods for details). The bisulfite-treated DNA was then used for PCR amplification (independent of methylation status).
We measured DNA methylation using a novel technique that combines base-specific cleavage of single-stranded nucleic acids with MALDI-TOF mass spectrometry (MS) analysis of the cleavage products (17). In brief, the method starts with PCR amplification of the target region from bisulfite-treated DNA, which is followed by in vitro transcription to generate a single-stranded RNA molecule. The RNA strand is then cleaved base specifically in individual reactions either after U or C, determined by the usage of noncleavable nucleotides (18). The cleavage reaction is driven to completion, and the resulting cleavage products represent a well-defined substring of the analyzed target region, which depends only on the sequence context and not on the reaction conditions. The cleavage products are then analyzed using MALDI-TOF MS. For analysis of DNA methylation, we examine the methylation-dependent C/T sequence changes introduced by bisulfite treatment. Those C/T changes are reflected as G/A changes on the reverse strand and hence result in a mass difference of 16 Da for each CpG site enclosed in the cleavage products generated from the RNA transcript. The mass signals representing nonmethylated DNA and those representing methylated DNA build signal pairs, which are representative for the CpG sites within the analyzed sequence substring. The intensities of the paired signals are compared, and the relative amount of methylated DNA can be calculated from this ratio. The method yields quantitative results for each of these sequence-defined analytic units, which contain either one individual CpG site or an aggregate of subsequent CpG sites. We refer to these analytic units as "CpG units."
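The ratio calculation for one signal pair can be illustrated with a minimal sketch. It assumes that each CpG unit yields exactly one "methylated" and one "nonmethylated" peak intensity and that the relative amount of methylated DNA is the methylated share of the summed intensity; the function name and the input values are hypothetical and are not taken from the analysis software used in the study.

```python
def methylation_fraction(methylated_intensity, unmethylated_intensity):
    """Relative amount of methylated DNA for one CpG unit.

    Assumption: the fraction is the methylated peak intensity divided by
    the summed intensity of the signal pair.
    """
    total = methylated_intensity + unmethylated_intensity
    if total <= 0:
        raise ValueError("no usable signal for this CpG unit")
    return methylated_intensity / total


# Hypothetical peak intensities (arbitrary units) for two CpG units.
example_units = {
    "unit_1": (30.0, 70.0),
    "unit_2": (5.0, 95.0),
}

for name, (meth, unmeth) in example_units.items():
    print(name, round(methylation_fraction(meth, unmeth), 2))
```

A CpG unit that aggregates several CpG sites would produce more than two signals; extending the sketch to that case would require assumptions about how intermediate peaks are weighted, which the text does not specify.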
| Materials and Methods |
|---|
PCR and in vitro transcription. The target regions were amplified using the primer pairs described in Supplementary Table S1. The PCRs were carried out in a total volume of 5 µL using 1 pmol of each primer, 40 µmol/L deoxynucleotide triphosphate (dNTP), 0.1 units HotStar Taq DNA polymerase (Qiagen, Valencia, CA), 1.5 mmol/L MgCl2, and buffer supplied with the enzyme (final concentration, 1x). The reaction mix was preactivated for 15 minutes at 95°C. The reactions were amplified in 45 cycles of 95°C for 20 seconds, 62°C for 30 seconds, and 72°C for 30 seconds followed by 72°C for 3 minutes. Unincorporated dNTPs were dephosphorylated by adding 1.7 µL H2O and 0.3 units shrimp alkaline phosphatase (SAP; SEQUENOM, Inc., San Diego, CA). The reaction was incubated at 37°C for 20 minutes and SAP was then heat inactivated for 10 minutes at 85°C.
Typically, 2 µL of the PCR were directly used as template in a 6.5 µL transcription reaction. Twenty units T7 R&DNA polymerase (Epicentre, Madison, WI) were used to incorporate either dCTP or dTTP in the transcripts. Ribonucleotides were used at 1 mmol/L and the dNTP substrate at 2.5 mmol/L; other components in the reaction were as recommended by the supplier. In the same step as the in vitro transcription, RNase A (SEQUENOM) was added to cleave the in vitro transcript. The mixture was then further diluted with H2O to a final volume of 27 µL. Conditioning of the phosphate backbone before MALDI-TOF MS was achieved by the addition of 6 mg Clean Resin (SEQUENOM). Further experimental details have been described elsewhere (18).
MS measurements. The cleavage reactions (15 nL) were robotically dispensed onto silicon chips preloaded with matrix (SpectroCHIP, SEQUENOM). Mass spectra were collected using a MassARRAY mass spectrometer (SEQUENOM). Spectra were analyzed using proprietary peak picking and spectra interpretation tools.
A description of the regions used for methylation analysis in NSCLC can be found as Supplementary Table S1.
Expression analysis. Gene expression levels were assayed for 48 paired normal/tumor samples, consistent with the samples used for methylation analysis, using real-competitive PCR in conjunction with quantitative primer extension measurements via MassARRAY (20, 21). Exact conditions for this methodology are published online³ in the application note "Data normalization using multiplexed gene panels for quantitative gene expression analysis with MassARRAY," with the exception that cDNA samples unique to this study were diluted 1:10 in DNase-free water. Target genes (genes with clear methylation patterns) and internal controls (genes used for normalization) were designed into separate multiplexed assays using MassARRAY QGE assay design software (SEQUENOM) based on the transcript sequences found at the Ensembl genome browser.⁴ The target gene panel consisted of HUGO IDs: SERPINB5, AQP1, CDH13, CDH5, CDKN2A, DAPK1, and MGP1. The internal control gene panel consisted of HUGO IDs: ACTB, GAPD, RPL13A, SDHA, TBP, UBC, YWHAZ, B2M, HMBS, and HPRT1. Normalization was conducted using the six most stable internal control genes for this sample set (GAPD, RPL13A, SDHA, TBP, UBC, and YWHAZ), identified using geNorm software. We then calculated the average expression of these six internal control genes for each sample. The averages were used to calculate a correction factor for baseline expression. Every expression value was corrected by its sample-specific correction factor. A pair-wise comparison of expression values shows the expected high correlation between internal control genes (Supplementary Fig. S1).
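The normalization step can be sketched as follows. The text does not give the formula for the correction factor, so this sketch assumes one plausible form: the mean of all per-sample control averages divided by the control average of the sample in question, applied multiplicatively. The function names, the data layout, and the example values are hypothetical.

```python
from statistics import mean

# The six internal control genes named in the text.
CONTROL_GENES = ("GAPD", "RPL13A", "SDHA", "TBP", "UBC", "YWHAZ")


def correction_factors(expression):
    """Per-sample correction factor.

    expression: dict mapping sample name -> dict mapping gene -> value.
    Assumption: factor = (mean control average over all samples)
                         / (control average of this sample).
    """
    sample_avg = {
        sample: mean(genes[g] for g in CONTROL_GENES)
        for sample, genes in expression.items()
    }
    baseline = mean(sample_avg.values())
    return {sample: baseline / avg for sample, avg in sample_avg.items()}


def normalize(expression):
    """Multiply every expression value by its sample-specific factor."""
    factors = correction_factors(expression)
    return {
        sample: {gene: value * factors[sample] for gene, value in genes.items()}
        for sample, genes in expression.items()
    }


# Hypothetical data: sample B was measured at twice the input of sample A.
raw = {
    "A": {**{g: 1.0 for g in CONTROL_GENES}, "SERPINB5": 4.0},
    "B": {**{g: 2.0 for g in CONTROL_GENES}, "SERPINB5": 8.0},
}
print(normalize(raw)["A"]["SERPINB5"], normalize(raw)["B"]["SERPINB5"])
```

With this choice of factor, a sample whose control genes all read twice as high has every value halved relative to the other sample, so a target gene with the same underlying expression ends up with the same corrected value in both.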
Statistical methods. We used the Wilcoxon signed-rank test, a nonparametric counterpart of the paired t test, to compare methylation levels between normal and tumor samples and to identify sites with statistically significant differences. The two-way hierarchical cluster analysis clustered the 96 tissue samples and 76 most variable CpG fragments (variance, >0.02) based on pair-wise Euclidean distances and the complete linkage clustering algorithm (22). The method first establishes a measure for the strength of a connection between two samples (called distance). Then, the samples get reorganized according to their relationship to each other. The algorithm "clusters" samples with a high degree of similarity into groups. The resulting dendrogram is used to visualize the results. The method presented in this article clusters CpG units along the x axis and samples along the y axis. The procedure was carried out using the heatmap.2 function of the "gplots" package using the R statistical environment (23). The tree-based classifier for the classification of tumor and normal samples was found using the J48 classification algorithm in the statistical package Weka (24). A complete four-node tree was pruned to the two-node tree that resulted in the lowest 10-fold cross validation error (25).
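The clustering procedure can be illustrated with a small pure-Python sketch. It is not the heatmap.2 call used in the study: it only shows the variance filter (using the sample variance, which is an assumption) and a naive complete-linkage agglomeration on Euclidean distances. Function names and the toy data are hypothetical.

```python
from math import dist
from statistics import variance


def select_variable_units(matrix, threshold=0.02):
    """Indices of columns (CpG units) whose variance exceeds the threshold.

    matrix: list of rows, one row of methylation fractions per sample.
    """
    n_cols = len(matrix[0])
    return [
        j for j in range(n_cols)
        if variance([row[j] for row in matrix]) > threshold
    ]


def complete_linkage(points):
    """Naive agglomerative clustering with complete linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram is drawn from.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is
                # the largest distance between any pair of their members.
                d = max(
                    dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Toy data: four samples, two CpG units; only the first unit varies.
toy = [[0.0, 0.5], [1.0, 0.5], [0.0, 0.5], [1.0, 0.5]]
print(select_variable_units(toy))
print(complete_linkage([[0, 0], [0, 1], [10, 10], [10, 12]]))
```

Running the same routine on the transposed matrix clusters CpG units instead of samples, which gives the second axis of a two-way clustering.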
| Results |
|---|
30% of all genes examined are represented by one or more CpG units. We carried out an unsupervised two-way hierarchical clustering of the CpG unit methylation and the combined tumor and normal tissues in the training set to explore any natural groupings in this data set (Fig. 1A). This reveals three visible clusters of samples, consisting of the following: (a) 9 tumor samples, (b) 1 normal and 34 tumor samples, and (c) 47 normal and 6 tumor samples. The clustering of CpG units reveals two primary clusters, separating the predominantly hypermethylated and hypomethylated units. Five genes had multiple CpG units included in this analysis. For SERPINB5, MGMT, MGP, and TNA, the corresponding CpG units tended to cluster together, showing similar intragenic methylation patterns. However, the six units corresponding to SDK2 were divided evenly between the two clusters.
The patterns we observed in the cluster analyses show that methylation patterns of normal lung tissues are notably different from those observed in tumor tissues. To evaluate the predictive ability of these 30 CpG unit measures, we applied a statistical learning algorithm, using our training set to select a model and the test set to validate the model performance. For a classifier, we chose the decision tree-based method C4.5 (26), implemented as the so-called "J48" algorithm in the Weka data mining package (24). This algorithm identified a pruned three-node tree, including CpG units from MGP and SERPINB5 as the optimal classifier and achieved >95% sensitivity and specificity when applied to the test set (Fig. 1C). We also evaluated several further classification methods (random forest, support vector machines, linear model transformation, naive Bayes, and recursive partitioning) and found that all methods result in a predictive accuracy >90%. Here, we focused on the results from the "J48" method because the decision tree-based method allows clear interpretation of the resulting model.
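To make the evaluation step concrete, the sketch below applies a small decision rule to labeled samples and computes sensitivity and specificity. It is not the J48 tree from the study: the cutoffs are hypothetical placeholders, and the direction of each split is only inferred from the Discussion (MGP hypermethylated and SERPINB5 hypomethylated in tumors). The example samples are invented.

```python
MGP_CUTOFF = 0.30       # hypothetical threshold, not reported in the text
SERPINB5_CUTOFF = 0.50  # hypothetical threshold, not reported in the text


def classify(sample):
    """Toy two-split rule on methylation fractions of two CpG units."""
    if sample["MGP"] > MGP_CUTOFF:
        return "tumor"
    if sample["SERPINB5"] < SERPINB5_CUTOFF:
        return "tumor"
    return "normal"


def sensitivity_specificity(samples, labels):
    """Sensitivity and specificity with 'tumor' as the positive class."""
    tp = fn = tn = fp = 0
    for sample, label in zip(samples, labels):
        predicted = classify(sample)
        if label == "tumor":
            if predicted == "tumor":
                tp += 1
            else:
                fn += 1
        else:
            if predicted == "normal":
                tn += 1
            else:
                fp += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Invented test samples.
test_samples = [
    {"MGP": 0.60, "SERPINB5": 0.80},
    {"MGP": 0.10, "SERPINB5": 0.20},
    {"MGP": 0.10, "SERPINB5": 0.90},
    {"MGP": 0.05, "SERPINB5": 0.90},
    {"MGP": 0.40, "SERPINB5": 0.90},
]
test_labels = ["tumor", "tumor", "tumor", "normal", "normal"]
print(sensitivity_specificity(test_samples, test_labels))
```

Keeping the rule as two explicit comparisons reflects the reason given in the text for preferring a tree: each split can be read directly as a statement about one CpG unit.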
In addition to the selection of a predictive model, we examined which of the genes contained CpG units where methylation differed significantly between tissue types (Table 2). We found that multiple CpG units within MGP, SERPINB5, GAGED2, TNA, RASSF1, and SDK2 showed highly statistically significant associations with tissue type (P < 10⁻⁶). Note that AQP1 showed only one significant CpG unit and hence was excluded from the list.
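The paired comparison of tumor and normal methylation, described under Statistical methods as a Wilcoxon signed-rank test, can be sketched in pure Python. This version drops zero differences, averages ranks over ties, and uses the normal approximation without continuity or tie correction; those choices are assumptions of the sketch and need not match the software used in the study. The example values are invented.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums and p comes from the normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return w, p


# Invented paired methylation percentages for one CpG unit.
tumor = [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
normal = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(wilcoxon_signed_rank(tumor, normal))
```

For the small sample sizes in a toy example an exact null distribution would be preferable to the normal approximation; the approximation is used here only to keep the sketch short.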
We carried out a survival analysis based on the methylation patterns of all tumor samples. In the present sample, survival information was only available for 61 individuals. Furthermore, the data were largely right censored (46 alive and 12 dead). Hence, a robust survival analysis could not be carried out. We analyzed the relationship between patient survival and tumor stage and found that this data set fails to present the established association between survival and tumor stage (P = 0.52; Fig. 3A). Nevertheless, we used a supervised approach to search for a combination of CpG units that improves survival prediction. We evaluated each of the 377 variable CpG units for an association with survival (P < 0.05). The 21 CpG units (derived from 13 genes) that satisfied this criterion were subsequently included in a hierarchical cluster analysis to group patients with similar patterns of methylation (Supplementary Fig. S3). We used the first split in the dendrogram to separate patients into two groups for survival analysis, which displayed a modest association with survival (P = 0.021; Fig. 3B). Notably, only nine stage I tumors can be found in the good prognosis group.
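The effect of right censoring on a survival estimate can be illustrated with a short sketch. The text does not name the estimator behind the survival curves, so the Kaplan-Meier product-limit estimator is assumed here as a common choice; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time of each patient.
    events: True if death was observed at that time, False if the patient
            was still alive at last follow-up (right censored).
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing the estimate.
        at_risk -= leaving
        i += leaving
    return curve


# Invented follow-up times (months) with alternating deaths and censoring.
follow_up = [1, 2, 3, 4, 5]
died = [True, False, True, False, True]
print(kaplan_meier(follow_up, died))
```

Each censored patient shrinks the risk set without contributing an event, which is why a data set dominated by censored observations supports only a coarse estimate.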
= 0.43; P = 10⁻¹⁰), showing a general trend toward an inverse relationship between DNA methylation and gene expression.
| Discussion |
|---|
We identified CpG units from the promoter regions of six genes that exhibited significantly different levels of methylation (P < 10⁻⁶) between normal and tumor samples. Four (SERPINB5, TNA, RASSF1, and GAGED2) of the six genes have been implicated previously in tumor development. Whereas cancer-related changes in DNA methylation have been described extensively for RASSF1, methylation of SERPINB5 (maspin) has been studied less frequently. Our analysis shows that SERPINB5 is highly methylated in normal lung tissue, consistent with previous studies by Yatabe et al. (29). The lung tumor tissue analyzed in this study, however, showed hypomethylation of SERPINB5. The interpretation of this result is unclear. Hypomethylation of SERPINB5 generally correlated with an increase in gene expression, and this agrees with Smith et al. and contradicts, at least in lung (30), its suggested role as a tumor suppressor inhibiting cell motility, invasion, angiogenesis, and metastasis in vitro (31, 32). In addition, Yatabe et al. have shown that SERPINB5 expression is controlled by promoter methylation and varies among the different cell types in the lung. Correspondingly, hypomethylation and therefore expression of SERPINB5 might be indicative of tumor clonality and its cell type-specific origin.
Methylation levels in the promoter regions of MGP and SDK2, not previously implicated in tumor development, were also found to be significantly different between tumor and normal tissues. Neither MGP nor any genes that are likely to be coregulated by these CpG sites have been linked to cancer (genes found within 100 kb upstream and downstream are WBP11, DO, PDE6H, ARHGDIB as well as three hypothetical proteins). Although unlikely, hypermethylation of the MGP region could indicate a new cancer-relevant gene function besides ossification. However, it is more likely to be an effect of unstable DNA methylation maintenance in cancer, which is observed more frequently.
We analyzed the change in expression for a subset of the differentially methylated genes and found that differences in methylation are strongly correlated to expression differences in three of the six examined genes (= 0.52; P = 10⁻⁸). Clearly, the lack of response for the remaining three genes is striking, especially because previous studies have already shown a clear relationship between expression and DNA methylation in NSCLC. The most likely explanation is technical. Methylation levels for all three genes are low across all samples (mean methylation for CDH13, 4%; CDKN2A, 4%; and DAPK1, 2%). The technology used has a detection limit of 5% methylated DNA and therefore is not suitable to reliably detect methylation changes of this scale. Furthermore, gene expression is not exclusively regulated by methylation; multiple factors influence transcription. In addition, many genes have promoter regions that are larger than the analyzed regions; thus, CpGs in important regulatory elements may not have been analyzed in this study.
It is of note that the most significant methylation differences in this study were observed in genomic regions with relatively low CpG density. The University of California Santa Cruz genome browser identifies only two of the seven most significant regions revealed in this study as CpG islands. The remaining five regions are either located in the 5'-UTR or, for MGP, were selected simply because they had the highest CpG content in the entire genomic region. Nevertheless, the relationship between changes in DNA methylation and gene expression is statistically significant. Our findings indicate that DNA methylation regulates gene expression outside of traditional CpG islands and suggest rethinking the common assumption that only regions of high CpG density are involved in gene silencing.
In this study, we have examined tumor specimens that contained 5% to 30% stromal cells. This inevitably results in a mixture of tumor- and nontumor-related cell types in the sample. Hence, small differences in DNA methylation may not be detectable. However, the ability to quantitate methylation may make the requirement for microdissection less critical, at least for discovery of differentially methylated genes.
This study failed to identify robust CpG predictors of survival. This can partly be attributed to the fact that the survival data for the analyzed samples were insufficient to build a good model. The sample set only included 61 patients with survival data and the vast majority of samples were right censored at time of analysis. Unlike the expression profiling studies commonly done on oligonucleotide microarrays, we did not screen DNA methylation on a genome-wide scale. The set of 47 promoter regions used here was selected based on previous expression microarray data (33) in a candidate gene approach and therefore cannot be expected to be a universal clinical predictor.
The results of this study suggest that DNA methylation analysis can be used in combination with gene expression profiling to discover a clinically meaningful molecular marker set. The strength of expression profiles is obviously the number of genes that can be analyzed simultaneously. Genome-wide analysis can be done to identify genes that are differentially expressed. Once these genes are discovered, quantitative methylation analysis can be applied and a subset of methylation-regulated genes can be identified. When methylation and gene expression profiles have similar predictive value, a methylation-based test could be preferable. Although improvements have been made in recent years and gene expression markers are now found in clinical settings, reproducibility of chip array expression profiles remains an issue. RNA is much more fragile and more prone to degradation compared with the covalent addition of methyl groups to cytosine. In our laboratory, we have observed stable methylation ratios independent of the quality of the DNA and were able to accurately analyze DNA methylation from paraffin-embedded tissue samples.⁵
This study is the first to show that DNA methylation can be analyzed on a large scale and quantitative results can be used for predicting tissue pathology. The data also suggest a potential role of DNA methylation in the identification of poor and good survival groups in NSCLC.
Epigenetic events are likely to occur early in tumor progression and identification of tumor-specific methylation changes will likely influence our understanding of the disease, possibly leading to molecular markers for early detection of lung cancer.
| Acknowledgments |
|---|
| Footnotes |
|---|
3 http://www.sequenom.com/customer_support/scientific_applicationnotes.php.
4 http://www.ensembl.org/Homo_sapiens/index.html.
Received 1/31/06. Revised 7/21/06. Accepted 8/22/06.
| References |
|---|