Epidemiology and Prevention |
1 Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas; 2 Department for Respiratory Medicine, Asklepios Specialist Hospitals, Munich-Gauting, Germany; 3 Genefinder Technologies Ltd.; 4 Institute of Molecular Medicine, Munich, Germany; 5 Sequenom, Inc., San Diego, California; 6 Roy Castle Lung Cancer Research Programme, University of Liverpool Cancer Research Centre, Liverpool, United Kingdom; 7 Cancer Research UK Centre for Epidemiology, Mathematics and Statistics Wolfson Institute of Preventive Medicine, London, United Kingdom; 8 Department for Respiratory Medicine, Schillerhöhe Specialist Hospital, Stuttgart-Gerlingen, Germany; and 9 Hamon Center for Therapeutic Oncology Research, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas
Requests for reprints: Christopher I. Amos, Department of Epidemiology, The University of Texas M. D. Anderson Cancer Center, Box 189, 1515 Holcombe Boulevard, Houston, TX 77030. Phone: 713-563-49-51; Fax: 49-8106-23273; E-mail: camos{at}mail.mdanderson.org or Peter Meyer, Genefinder Technologies Ltd., Sperberstrasse 2, 81827 Munich, Germany. Phone: 49-171-2838333; Fax: 49-8106-23273; E-mail: Peter.Meyer{at}onkogenetik.de.
| Abstract |
|---|
2-fold elevated and statistically significant (P = 0.004) level of SEZ6L expression in tumor samples compared with normal lung tissues. In conclusion, the results of these studies representing 906 cases compared with 811 controls indicate a role of the SEZ6L Met430Ile polymorphic variant in increasing lung cancer risk. [Cancer Res 2007;67(17):8406–11]
| Introduction |
|---|
Identification of genetic factors modulating lung cancer risk requires a combination of effective genotyping technologies with an appropriate and efficient study design. Sequenom (San Diego, CA) has developed a DNA analysis platform, capable of high-throughput genotyping with pooled DNA allele frequency analysis. Using this approach, Sequenom implemented a Genetics Discovery platform with dense genome-wide single nucleotide polymorphism (SNP) markers (7, 8). A hypothesis-free approach using allele frequency estimates of many thousands (for lung cancer, 83,715 SNPs) of SNPs was used as a first step (pilot study) in identifying potentially relevant genetic variants. Significant SNPs identified in this first step were then individually genotyped and validated in replication studies using independent samples. The efficiency of this strategy has been shown by the rediscovery of genes shown previously to be involved in several common diseases (8–10). The purpose of the current study was to implement this strategy to identify genetic variation modulating lung cancer risk.
| Materials and Methods |
|---|
The Liverpool Lung Project replication study (United Kingdom). The lung cancer case-control data were derived from the Liverpool Lung Project (LLP), an ongoing molecular epidemiologic study of lung cancer in Liverpool, United Kingdom (12). Histologically or cytologically confirmed lung cancer cases with primary tumors were recruited from participating chest clinics. Population controls were selected from registers of General Practitioners in Liverpool to ensure age and sex distributions similar to those of the cases. In all studies, a standardized questionnaire was used to determine basic demographic characteristics in addition to details on smoking history, lifetime residence and occupation, history of lung diseases, family history of cancer in first-degree relatives, and exposure to environmental tobacco smoke. Smoking status was defined as in the U.S. study. In total, 248 lung cancer cases and 233 controls were included in this analysis. Table 1 provides a description of cases and controls used in the LLP study.
SNP markers and genotyping. Genomic DNA was extracted from peripheral blood leukocytes by using the Qiagen DNA blood mini kit (Qiagen) according to the manufacturer's instructions. DNA pools were formed by combining equimolar amounts of individual samples as described elsewhere (13). For the pilot study, one pool of 369 cases and one pool of 287 controls were constructed. For assays carried out on sample pools, 25 ng of a 5 ng/μL pool were used for PCRs. All PCR and MassEXTEND reactions were conducted using standard conditions. Relative allele frequency estimates were derived from calculations based on the area under the peak of mass spectrometry measurements from four analyte aliquots (14). Tests of association between disease status and each SNP were carried out as previously discussed (15). When three or more replicate measurements of a SNP were available, the corresponding variance component was estimated from the data. Otherwise, the following historical laboratory averages were used to calculate sources of variability: pool formation, 5.0 × 10⁻⁵; PCR/mass extension, 1.7 × 10⁻⁴; and chip measurement, 1.0 × 10⁻⁴. The same procedure was used for individual genotyping except that 2.5 ng of DNA was used and only one mass spectrometry measurement was taken. The following gene-specific primers were used to genotype rs663048: the forward PCR primer was 5'-TGGGCTATGAGCTCCAGGG-3'; the reverse PCR primer was 5'-TGCGGCTTGGAGGCATTGAT-3'; and the extend primer was 5'-GAGCTCCAGGGCGCTAAGAT-3'.
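As a rough illustration of how the variance components listed above can enter a pooled-DNA association test, the following Python sketch adds binomial sampling variance to the three laboratory terms and forms a z statistic for the case-control difference in allele frequency. The test actually used in the study is the one described in ref. 15 and is not reproduced here; the way the components are combined, the function names, and the example allele frequencies are assumptions made only for illustration.

```python
import math

# Historical laboratory variance components quoted in the text.
VAR_POOL_FORMATION = 5.0e-5
VAR_PCR_EXTENSION = 1.7e-4
VAR_CHIP_MEASUREMENT = 1.0e-4


def pool_variance(freq, n_individuals, n_pcr=1, n_chip=4):
    """Approximate variance of a pooled allele-frequency estimate.

    Assumption: sampling variance is binomial over 2N chromosomes, and the
    PCR and chip components shrink with the number of replicates.
    """
    sampling = freq * (1.0 - freq) / (2 * n_individuals)
    measurement = (
        VAR_POOL_FORMATION
        + VAR_PCR_EXTENSION / n_pcr
        + VAR_CHIP_MEASUREMENT / (n_pcr * n_chip)
    )
    return sampling + measurement


def pooled_z_test(freq_cases, n_cases, freq_controls, n_controls, **kw):
    """Return (z, two-sided P) for the case-control frequency difference."""
    var = pool_variance(freq_cases, n_cases, **kw) + pool_variance(
        freq_controls, n_controls, **kw
    )
    z = (freq_cases - freq_controls) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p


# Hypothetical allele frequencies; pool sizes are those of the pilot study.
z, p = pooled_z_test(0.27, 369, 0.22, 287)
print(f"z = {z:.2f}, P = {p:.3g}")
```

With a single PCR per pool, the measurement terms are of the same order as the sampling variance, which is one reason the most promising SNPs were remeasured in triplicate before individual genotyping.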
The Sequenom-Genefinder pilot study included 83,715 SNPs selected based on their location within a gene region (including the coding region plus an additional 10 kb at both ends) and minor allele frequency (MAF) from a total of 125,799 experimentally validated polymorphic variations (7, 8, 10). In the first step, one PCR and primer extension reaction was carried out for 83,715 SNPs on each pool (case and control). In the second step, 4,293 SNPs (~5%) with the most statistically significant associations were remeasured in triplicate on each DNA pool. In the third step, the 301 most significant SNPs (~7%) from step two were individually genotyped in each sample. A total of 160 SNP markers were identified with statistically significant differences between cases and controls (P < 0.05) after individual genotyping in the German pilot study and were then genotyped in the MDACC and LLP replication samples.
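The proportions quoted for each step of this staged design follow directly from the SNP counts. A minimal Python sketch of the arithmetic is shown below; the stage labels and the function name are illustrative, and only the counts come from the text.

```python
# SNP counts at each stage of the pilot study, as given in the text.
stages = [
    ("single measurement on each pool", 83_715),
    ("triplicate measurement on each pool", 4_293),
    ("individual genotyping", 301),
    ("significant after individual genotyping", 160),
]


def retained_fractions(stages):
    """Fraction of SNPs carried forward from each stage to the next."""
    out = []
    for (_, n_prev), (name, n_next) in zip(stages, stages[1:]):
        out.append((name, n_next / n_prev))
    return out


for name, frac in retained_fractions(stages):
    print(f"{name}: {frac:.1%} of the previous stage")
```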
Expression of the seizure 6-like gene. Besides analyzing the effect of the Met430Ile variant on lung cancer risk, we also compared the seizure 6-like (SEZ6L) expression level in normal versus cancer cell lines (our data) and in primary tumors versus normal lung tissues [data from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database]. Affymetrix HG-U133A and HG-U133B high-density oligonucleotide microarrays were used to evaluate expression of the SEZ6L gene in cell lines. Gene expression profiling was done on a panel of 52 non–small cell lung cancer (NSCLC) and 22 small cell lung cancer (SCLC) cell lines. As a control, we used seven normal human bronchial epithelial cell lines immortalized with cyclin-dependent kinase 4 and hTert or with E6/E7 with or without hTert (16), and two unimmortalized lung cell lines (NHBEC and SAEC). The list of the cell lines used in the study can be found in Supplementary Table S1. Four probes were used for SEZ6L. Signals were median normalized and log2 transformed. The signal was averaged across the probes to estimate signal intensity for each cell line as well as for controls. Expression of SEZ6L was detected in all cell lines.
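The preprocessing described above (median normalization, log2 transformation, and averaging over the probes for a gene) can be sketched in Python as follows. This is not the pipeline used in the study: the specific scaling scheme, the function names, the probe identifiers, and the raw intensities are all assumptions chosen to make the steps concrete.

```python
import math
from statistics import mean, median


def normalize_and_log2(arrays):
    """Scale every array to a common median, then log2-transform.

    `arrays` maps a cell-line name to {probe_id: raw_signal}.
    Assumption: "median normalized" means dividing by the array median and
    multiplying by the median of all array medians.
    """
    medians = {name: median(sig.values()) for name, sig in arrays.items()}
    target = median(medians.values())
    return {
        name: {
            probe: math.log2(value * target / medians[name])
            for probe, value in sig.items()
        }
        for name, sig in arrays.items()
    }


def gene_signal(log_signals, probes):
    """Average the log2 signal over the probes assigned to one gene."""
    return mean(log_signals[p] for p in probes)


# Hypothetical raw intensities; probe IDs are placeholders, not real ones.
raw = {
    "normal_1": {"geneA_p1": 60.0, "geneA_p2": 70.0, "other_p1": 100.0},
    "tumor_1": {"geneA_p1": 480.0, "geneA_p2": 560.0, "other_p1": 200.0},
}
logged = normalize_and_log2(raw)
gene_probes = ["geneA_p1", "geneA_p2"]
for name in raw:
    print(name, round(gene_signal(logged[name], gene_probes), 2))
```

Averaging on the log2 scale, as here, corresponds to a geometric mean of the normalized intensities, so a single very bright probe has less influence than it would in an average of raw signals.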
To compare SEZ6L expression in normal tissues and primary lung tumors, we used the GDS619 data set, which contains microarray gene expression data for SCLC (19 samples), non–small cell lung cancer (12 samples), and normal lung tissue (20 samples). Tumor samples were obtained from patients undergoing surgery at the Cancer Institute Hospital (Tokyo, Japan). The control samples were obtained by bronchial brushes from unrelated healthy individuals. Normalized and log-transformed gene expression data were downloaded and the SEZ6L expression in normal and tumor tissues was compared (see description for GDS619 data set from the GEO database).11
Statistical analysis. The distributions of the demographic variables between cases and controls were compared using the χ² test for categorical variables (sex, ethnicity, and smoking status) and the two-sample Student's t test for continuous variables. A goodness-of-fit χ² test was used to determine whether the polymorphisms were in Hardy-Weinberg equilibrium. The adjusted odds ratios (OR) were calculated using multiple logistic regression, controlling for age, sex, and intensity of smoking (pack-years), to estimate the effect of SNPs on lung cancer risk. To estimate the overall ORs from the replication studies, we used the Mantel-Haenszel test (17). All statistical analyses were done using STATISTICA (StatSoft, Inc.).
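Two of the procedures named above are simple enough to sketch directly. The Python example below shows a textbook goodness-of-fit χ² test for Hardy-Weinberg equilibrium (1 degree of freedom) and the standard Mantel-Haenszel pooled odds ratio across 2 × 2 tables. The analyses in the study were run in STATISTICA; the function names and all genotype and table counts here are hypothetical.

```python
import math


def chi2_p_1df(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def hwe_test(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_p_1df(chi2)


def mantel_haenszel_or(tables):
    """Pooled odds ratio over 2x2 tables given as (a, b, c, d) =
    (exposed cases, exposed controls, unexposed cases, unexposed controls)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical genotype counts in a control group.
chi2, p = hwe_test(60, 36, 4)
print(f"HWE chi-square = {chi2:.2f}, P = {p:.2f}")

# Hypothetical 2x2 tables from two independent studies.
studies = [(30, 20, 70, 80), (45, 25, 155, 175)]
print(f"Mantel-Haenszel OR = {mantel_haenszel_or(studies):.2f}")
```

The Mantel-Haenszel estimator weights each study by its size, so a larger replication set contributes more to the pooled OR than a smaller one.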
| Results |
|---|
37 kb. The region contains a cluster of 17 of 25 SNPs significantly associated with lung cancer risk (P < 0.05). Most of the SNPs in the region were in linkage disequilibrium (minimum r² ≥ 0.8). Based on moderate or low (r² ≤ 0.6) levels of linkage disequilibrium, six significant SNPs were chosen for genotyping in two replication sample sets. We found that the position and size of the candidate region coincide well with the position and size of the SEZ6L gene. The candidate region occupied the distal part of the 428-kb region frequently deleted in lung cancer cell lines (18). The deletion contains two genes: SEZ6L and MYO18B (18). No significant SNPs were detected in the MYO18B region (Fig. 1).
χ² = 7.4; degrees of freedom (df) = 1; P = 0.006]. We observed a 3.8-fold increased risk of lung cancer in individuals who carried the rs663048 null genotype T/T compared with those who carried the rs663048 more common (wild-type) genotype G/G (95% CI, 1.40–10.42) after adjusting for age, sex, and smoking (Table 4).
Expression of the SEZ6L in normal and lung tumor cell lines. Supplementary Fig. S1 shows the expression level of SEZ6L in 9 normal lung cell lines, 54 NSCLC, and 22 SCLC cell lines. The average expression signal in controls was 6.2 (n = 9) compared with the average expression signal in the NSCLC cell lines of 7 (n = 54) and in SCLC cell lines of 8.8 (n = 22). A nonparametric Mann-Whitney U test was significant for all pairwise comparisons. The smallest (but still very significant) difference was found between the normal lung cell lines and NSCLC cell lines (Z = –3.3; P = 0.001). We also found significant differences in expression level of GATA2, with the expression being higher in cancer lines compared with controls (Mann-Whitney U test, Z = –3.9; P = 0.0005). Other candidate genes listed in Table 3 did not show differential expression between lung cancer and normal lung cell lines. SNP genotypes were not available for this analysis.
Expression of the SEZ6L in primary tumor and normal lung tissues. We used the NCBI GEO database, which contains microarray data on genome-wide assessment of gene expression. Using the key words "lung AND cancer OR tumor," we identified several entries with data on gene expression. The GDS619 data set (platform GPL962: CHUGAI41K) was most appropriate for our goal of comparing SEZ6L expression in normal and tumor tissues. This data set contains data on gene expression in SCLC, adenocarcinomas, and normal tissues. We found that the average log2-transformed SEZ6L expression value was significantly higher in adenocarcinoma compared with normal lung tissues (0.34 ± 0.13 versus –0.38 ± 0.15; two-sided t test, t = 3.1; df = 29; P = 0.004). The expression of SEZ6L was also higher in SCLC compared with normal tissues, but the difference was not statistically significant (Student's t test, t = 1.4; df = 43; P = 0.15). The variance of expression values in NSCLC sample was significantly higher compared with the normal tissues (Var = 0.07 among controls versus Var = 0.42 among SCLCs; F = 6.2; P = 0.001). This result, together with our analysis of the SEZ6L expression in cell lines, suggests an increased level of SEZ6L expression in lung tumors compared with normal lung tissues.
| Discussion |
|---|
In our study, we have combined elements of both approaches. At the first step, more than 83,000 SNP markers covering the coding region of the whole genome were analyzed. This analysis yielded many candidate SNPs showing significant associations with lung cancer. Significant SNPs identified in the pilot study are a mix of false positives and true associations. For selected SNPs in the SEZ6L candidate region, two independent replication studies were then conducted to identify true associations. The MDACC replication study yielded eight SNPs showing associations with lung cancer risk. We then used additional SNP-related information to identify the most promising genes. As a result of this analysis, the SEZ6L gene emerged as a top candidate gene to be associated with lung cancer. It is interesting that based only on the significance of the χ² test, this gene was number five in the list. The LLP replication study further validated the association of SNP rs663048 in SEZ6L with lung cancer risk.
An analysis of the expression of the SEZ6L gene showed different expression of SEZ6L in normal, NSCLC, and SCLC cell lines. Interestingly, we found that the two histologic types of lung cancer had different levels of expression of SEZ6L. The average expression signal in NSCLC was 7.0 ± 0.1 and in SCLC 8.8 ± 0.2 (Mann-Whitney U test, Z = 6.0; P < 0.001). It needs to be noted that there is an apparent inconsistency between the results of the analysis of the expression of SEZ6L and the results of the association studies. In the MDACC and United Kingdom samples, the variant allele (that was predicted to be protein disturbing based on analysis of the protein structure and evolutionary conservation) was associated with increased risk for lung cancer, suggesting that loss of normal SEZ6L function may be a risk factor for lung cancer. This is consistent with the finding that the SEZ6L region is often deleted in lung cancer cell lines (18). On the other hand, we found that expression of SEZ6L is elevated in lung cancer cell lines. One explanation might be that SEZ6L is both a tumor marker and a variant affecting lung cancer susceptibility. Under this explanation, loss of normal SEZ6L function is a risk factor for development of lung cancer; however, when lung cancer is caused by factors other than loss of SEZ6L function, expression of SEZ6L is adaptively up-regulated to suppress tumorigenesis.
Several lines of evidence support the hypothesis that SEZ6L might modulate lung cancer risk. First, frequent allelic losses on 22q in NSCLCs have been reported, indicating the presence of tumor suppressor gene(s) on that chromosome arm (18). Cloning of the breakpoints revealed a 400-kb deletion containing the SEZ6L and MYO18B genes (18, 27). A study conducted by Suzuki et al. (28) suggests that the SEZ6L gene may also influence development and progression of colorectal cancer. The authors found that SEZ6L was one of the few genes highly hypermethylated in primary colorectal tumors.
In the pilot study, we populated the candidate region with 35 SNPs and found that markers located in the SEZ6L gene region show a strong association with lung cancer; however, no significant associations were found in the neighboring MYO18B gene. Applying a sliding window of five neighboring SNPs revealed a peak of –log10 (P values) that coincides with the position of the SEZ6L gene. We found that the principal contributor to the peak was rs663048. The association of this SNP with lung cancer risk was verified in two independent replication studies. The rs663048 SNP is a Met430Ile amino acid substitution that has been predicted to be functional by both SIFT and PolyPhen, suggesting that this amino acid substitution is protein disturbing.
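The sliding-window scan mentioned above can be illustrated with a short Python sketch that averages −log10(P) over each run of five neighboring SNPs and reports where the profile peaks. The text does not state which summary statistic was applied within each window, so the use of the mean is an assumption, and the P values below are hypothetical.

```python
import math


def sliding_neglog10(p_values, window=5):
    """Mean of -log10(P) over each run of `window` neighboring SNPs.

    `p_values` must be ordered by chromosomal position.
    """
    scores = [-math.log10(p) for p in p_values]
    return [
        sum(scores[i : i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical P values for SNPs ordered along a region.
p_values = [0.8, 0.4, 0.03, 0.01, 0.002, 0.02, 0.04, 0.5, 0.9]
profile = sliding_neglog10(p_values)
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"peak window starts at SNP index {peak}")
```

Smoothing over neighboring markers damps isolated false positives, so a peak that survives the window is supported by several adjacent SNPs rather than by one extreme P value.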
Nishioka et al. (18) found that 95% (43 of 45) of primary tumor samples carry the Met430Ile mutation. The authors did not estimate the frequency of the variant in controls. We found that 38% of controls and 48% of cases carry at least one variant allele. This suggests that ~40% of tumors may carry a somatic Met430Ile mutation. If we consider that, according to the HapMap, the frequency of the Met430Ile polymorphism is lower in Japanese than in Caucasians, the percentage of accumulated somatic Met430Ile mutations may actually be higher.
We found that homozygotes for the variant allele had a 3-fold higher lung cancer risk compared with homozygotes for the common allele. Lung cancer risk was also significantly elevated in heterozygotes. According to our estimates, the frequency of the variant allele for rs663048 is 22%, which is very similar to the 20% reported for Caucasians by the HapMap database. We found that 36% of Caucasian controls are heterozygotes and ~4% are homozygotes for the risk allele, making the proportion of Caucasians having at least one risk allele ~40%. The combined Mantel-Haenszel analysis yielded ORs of 1.15 [95% confidence interval (95% CI), 1.04–1.59] for heterozygotes and 3.32 (95% CI, 1.81–7.21) for homozygotes. The population attributable risk percentage {PAR% = (OR − 1) × P / [(OR − 1) × P + 1] × 100, where P is the risk genotype frequency in the controls} was 7.5 for homozygotes and 8.3 for heterozygotes, suggesting that ~16% of excess risk in lung cancer cases is due to the presence of the variant allele.
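The population attributable risk formula given above translates directly into code. The Python sketch below implements it as written; the function name is illustrative and the inputs are hypothetical values chosen for easy arithmetic, not the study estimates.

```python
def par_percent(odds_ratio, genotype_freq):
    """Population attributable risk (%) from the formula in the text.

    PAR% = (OR - 1) * P / [(OR - 1) * P + 1] * 100,
    where P is the risk-genotype frequency in controls.
    """
    excess = (odds_ratio - 1.0) * genotype_freq
    return excess / (excess + 1.0) * 100.0


# Hypothetical inputs: OR = 2.0 for a genotype carried by 10% of controls.
print(round(par_percent(2.0, 0.10), 1))
```

Because PAR% depends on both the OR and the genotype frequency, a common genotype with a modest OR can account for as much excess risk as a rare genotype with a large OR.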
In conclusion, our data together with published studies suggest that the Met430Ile variant might be a causal variant affecting risk of lung cancer. Although the strongest evidence from our study points to this SNP, it is possible that another closely located SNP plays a dominant role in promoting lung cancer risk and that the Met430Ile variant is a marker in linkage disequilibrium with the underlying causal variant. Further studies, especially those implementing functional assays, are therefore warranted to provide more conclusive evidence of a causal association between the Met430Ile variant and lung cancer risk.
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
| Footnotes |
|---|
I.P. Gorlov, P. Meyer, R. Dierkesmann, J.K. Field, and C.I. Amos contributed equally to this work.
10 http://www.cancerindex.org/geneweb/index.htm
11 http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=gds
Received 12/27/06. Revised 4/27/07. Accepted 6/20/07.