There is a region with a high risk for esophageal squamous cell carcinoma (ESCC) in the northeast of Iran. Previous studies suggest that hereditary factors play a role in the high incidence of cancer in the region. We selected 22 functional variants (and 130 related tagSNPs) from 15 genes that have been associated previously with the risk of ESCC. We genotyped a primary set of samples from 451 Turkmens (197 cases and 254 controls). Seven of 152 variants were associated with ESCC at the P = 0.05 level; these single nucleotide polymorphisms were then studied in a validation set of 549 cases and 1,119 controls, which included both Turkmens and non-Turkmens. The association observed for a functional variant in ADH1B was confirmed in the validation set, and that for a tagSNP in MGMT was borderline significant in the validation set after correcting for multiple testing. The other 5 variants that were associated in the primary set were not significantly associated in the validation set. The histidine allele at codon 48 of the ADH1B gene was associated with a significantly decreased risk of ESCC in the joint data set (primary and validation set) under a recessive model (odds ratio, 0.41; 95% confidence interval, 0.29-0.76; P = 4 × 10−4). The A allele of the rs7087131 variant of the MGMT gene was associated with a decreased risk of ESCC under a dominant model (odds ratio, 0.79; 95% confidence interval, 0.64-0.96; P = 0.02). These results support the hypothesis that genetic predisposition plays a role in the high incidence of ESCC in Iran. [Cancer Res 2009;69(20):7994–8000]
- esophageal squamous cell carcinoma
Worldwide, 462,000 new cases of esophageal cancer are diagnosed annually ( 1). Seventy-five percent of affected individuals die within a year of diagnosis and the 5-year survival rate is only 10% to 16% ( 1). The incidence of esophageal cancer varies greatly between populations and across geographic regions. The east part of Golestan Province in northeastern Iran is a region with a very high risk of esophageal cancer, but the causative factors remain elusive. The annual incidence rate exceeds 100/100,000 ( 2, 3), although the rate has declined over the last three decades ( 4). Turkmens constitute half of the population in this region and Persians, Turks, Sistanies, Balouches, and Kurds constitute the other half (referred to here as non-Turkmens).
Esophageal cancer is of two main histologic types: squamous cell carcinoma and adenocarcinoma. Esophageal squamous cell carcinomas (ESCC) constitute >90% of esophageal cancers among both Turkmens and non-Turkmens in eastern Golestan ( 5). We have recently shown that, among Turkmens, there is a strong familial component to esophageal cancer ( 6). The first gene found to contribute to the burden of esophageal cancer in Iran was BRCA2 ( 7). Protein-truncating mutations in BRCA2 are associated with an increased risk of esophageal cancer among Turkmens. However, because only 8% of the cases in the region carry a deleterious BRCA2 mutation ( 7), it is likely that other genes are involved.
Turkmens are believed to descend from Turkic tribes who migrated from the Altai Mountains (on the border of China and Mongolia) to northern Iran ( 8). A population at high risk for ESCC is also present in China ( 9) and the two high-risk populations might share common susceptibility alleles. Based on this hypothesis, we propose that the genes that have been reported previously to be associated with ESCC in the high-risk Chinese population ( 10, 11) are candidate genes among Turkmens. Initially, we focused on Turkmens because this is the largest ethnic group in the region and they are believed to be genetically homogeneous. However, we also studied a similar number of non-Turkmens from the same region. We studied 15 different genes that are involved in DNA repair, cell cycle control, and the metabolism of alcohol, folate, and carcinogens.
Materials and Methods
This report is a component of the Gastric and Esophageal Malignancies in Northern Iran Project ( 5, 12, 13), which is under way in Gonbad, the second largest city of Golestan Province, in northeastern Iran and in Ardabil, the largest city in Ardabil Province in northwestern Iran. The project is directed by the Digestive Disease Research Center of the Tehran University of Medical Sciences in collaboration with the National Cancer Institute, the IARC, and the University of Toronto. Cases and controls were collected between August 1, 2001 and May 15, 2008. The diagnosis of ESCC was confirmed for all cases by upper gastrointestinal endoscopy and by pathologic evaluation of tumor biopsies. The study protocol was approved by the ethics committees of the Digestive Disease Research Center, the National Cancer Institute, and the IARC.
We employed a two-stage design. In the first stage, we genotyped a total of 152 variants in 197 Turkmen cases and 254 Turkmen controls (the primary sample set). In the second stage, we genotyped the most promising variants identified from the primary sample set in 549 more cases and 1,119 more controls, including both Turkmens and non-Turkmens (the validation sample). The entire set of samples genotyped in this study is referred to as the joint sample set.
Cases. A total of 746 cases were enrolled in this study. There were 281 Turkmen and 220 non-Turkmen ESCC cases from Gonbad City. Non-Turkmen cases included Persians (n = 114), Turks (n = 59), Sistanies (n = 28), Balouches (n = 9), and Kurds (n = 10). In addition, we enrolled 245 Turkish cases from Ardabil, a city in northwestern Iran with intermediate rates of ESCC ( 13). The mean age at diagnosis was 63.6 years (range, 25-89 years); 380 cases (50.9%) were male and 366 cases (49.1%) were female. Epidemiologic data included smoking status, use of opium and alcohol, and a family history of cancer. The primary set of cases consisted of the first 197 Turkmens who were available for testing. The remaining 549 cases composed the validation set (84 Turkmens and 465 non-Turkmens).
Controls. One thousand three hundred seventy-three controls were recruited to the study: 244 were Turks from Ardabil City, 811 were Turkmen from Gonbad City, and 318 were non-Turkmen from Gonbad City [Persians (n = 133), Turks (n = 78), Sistanies (n = 79), Balouches (n = 19), and Kurds (n = 9)]. Eight hundred ninety-eight controls were patients referred to local hospitals for a reason other than cancer and 475 controls were healthy individuals enrolled from the Golestan cohort study ( 12). Controls had no personal history of cancer. The mean age for controls was 55.2 years (range, 24-90 years); 700 (51.0%) were male and 673 (49.0%) were female. The primary set of controls consisted of the first 254 Turkmens who were available for testing. The remaining 1,119 controls composed the validation set (557 Turkmens and 562 non-Turkmens). Characteristics of the total 746 cases and 1,373 controls are presented in Table 1 .
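The two-stage split described above can be reconciled with the enrollment counts by simple arithmetic. The following is a minimal sketch, assuming only the counts stated in the Cases and Controls paragraphs; the dictionary keys and the function name `split_two_stage` are illustrative and do not come from the study.

```python
# Enrollment counts as stated in the Cases and Controls paragraphs.
cases = {"Turkmen (Gonbad)": 281, "non-Turkmen (Gonbad)": 220, "Turk (Ardabil)": 245}
controls = {"Turkmen (Gonbad)": 811, "non-Turkmen (Gonbad)": 318, "Turk (Ardabil)": 244}

# The primary set consisted only of Turkmens.
PRIMARY_CASES, PRIMARY_CONTROLS = 197, 254


def split_two_stage(groups, n_primary):
    """Return (total, validation Turkmens, validation non-Turkmens)."""
    total = sum(groups.values())
    turkmen = groups["Turkmen (Gonbad)"]
    validation_turkmen = turkmen - n_primary
    # Everyone who is not a Gonbad Turkmen counts as non-Turkmen here,
    # including the Turks enrolled in Ardabil.
    validation_non_turkmen = total - turkmen
    return total, validation_turkmen, validation_non_turkmen


case_split = split_two_stage(cases, PRIMARY_CASES)
control_split = split_two_stage(controls, PRIMARY_CONTROLS)
print(case_split)     # (746, 84, 465)
print(control_split)  # (1373, 557, 562)
```

The validation totals follow as 84 + 465 = 549 cases and 557 + 562 = 1,119 controls, which matches the numbers given for the second stage.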
Genes were selected based on a literature review of previously reported ESCC-associated variants, most of which were from Chinese populations. Four genes were involved in DNA repair: ERCC2 (XPD), XRCC1, hOGG1, and MGMT; four genes were involved in cell cycle control: CCND1 (cyclin D1), CDKN2A (p16), TP53 (p53), and TMPRSS11A (ECRG1); four genes coded for carcinogen-metabolizing enzymes: CYP1A1, CYP2E1, GSTP1, and NAT2; two genes coded for alcohol-metabolizing enzymes: ADH1B and ALDH2; and one gene coded for a folate-metabolizing enzyme: MTHFR. The role of these genes and their variants in esophageal cancer has been reviewed ( 10, 11).
For each gene, at least one functional variant has been reported in the literature to be associated with ESCC ( 10, 11, 14). These variants include Asp312Asn (rs1799793) and Lys751Gln (rs13181) from ERCC2 gene; Arg399Gln (rs25487) from XRCC1 gene; Ser326Cys (rs1052133) from hOGG1 gene; Leu74Phe (rs12917) and Lys178Arg (rs2308327) from MGMT gene; c.870A>G (rs9344) from CCND1 gene; Ala148Thr (rs3731249) from CDKN2A gene; Arg72Pro (rs1042522) from TP53 gene; Arg293Gln (rs353163) from TMPRSS11A (ECRG1) gene; Ile462Val (rs1048943) from CYP1A1 gene; c.-1019C>T (rs2031920) from CYP2E1; Ile105Val (rs1695) from GSTP1 gene; Tyr94Tyr (rs1041983), Ile114Thr (rs1801280), Leu161Leu (rs1799929), Arg197Gln (rs1799930), and Gly286Glu (rs1799931) from NAT2 gene; Arg48His (rs1229984) from ADH1B gene; Glu487Lys (rs671) from ALDH2 gene; and Ala222Val (rs1801133) and Glu429Ala (rs1801131) from MTHFR gene.
Three slow-metabolizing alleles (M1, M2, and M3) have been reported for the NAT2 gene. Variants Ile114Thr and Leu161Leu constitute allele M1, variants Arg197Gln and Tyr94Tyr constitute allele M2, and variant Gly286Glu is named allele M3 ( 15).
Within populations, single nucleotide polymorphisms (SNP) are believed to be distributed along chromosomes in haplotype blocks; therefore, genotyping one or more SNPs that are characteristic of a haplotype block will capture the important information for the additional SNPs in that block. The genotyped SNPs, which are in linkage disequilibrium with the other SNPs in the block, are called tagSNPs. Genotype data from the Chinese population of the HapMap project (ref. 16; Rel21/phase II) were used for selecting tagSNPs to cover the genomic region of each gene, including 5 kbp at each of the 5′ and 3′ ends to cover the regulatory regions of the genes. Haploview software ( 17) was used for selecting tagSNPs. The tagging procedure used aggressive tagging with two- and three-marker haplotypes, a LOD score threshold of 3.0 for multimarker testing, and an r² threshold of 0.8. The genomic regions of all 15 candidate genes encompass 705 kbp of the human genome. There are 635 SNPs with a minor allele frequency of >1% in the Chinese population of the HapMap project. One hundred forty-one tagSNPs were selected to capture 594 SNPs (94% of the total common SNPs) with a mean r² of 0.98 in 152 tests ( Table 2 ).
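To make the r² criterion concrete, the sketch below computes pairwise r² between two biallelic SNPs from haplotype and allele frequencies and applies the 0.8 threshold. It illustrates the standard pairwise definition only; it is not the Haploview multimarker tagging algorithm, and the function names and example frequencies are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def is_tagged(r2, threshold=0.8):
    """True if a SNP is captured at the stated r^2 threshold."""
    return r2 >= threshold


# Hypothetical frequencies, for illustration only.
perfect = ld_r2(p_ab=0.30, p_a=0.30, p_b=0.30)      # alleles always co-occur
independent = ld_r2(p_ab=0.09, p_a=0.30, p_b=0.30)  # D = 0, no LD
print(is_tagged(perfect), is_tagged(independent))

# Coverage reported in the text: 594 of 635 common SNPs captured.
coverage_percent = round(594 / 635 * 100)
print(coverage_percent)  # 94
```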
We initially tested the 152 SNPs in a primary set of 197 cases and 254 controls. Of the 141 tagSNPs selected for study, 11 were among the functional variants mentioned above and 130 were additional. All SNPs were genotyped in germ-line DNA of all samples of the primary set of cases and controls. iPLEX chemistry on a matrix-assisted laser desorption/ionization time-of-flight MassARRAY System (Sequenom) was used for genotyping the 152 SNPs in eight reactions. The procedures were done according to the manufacturer's standard protocol. SNPs that showed association with ESCC in the primary set of samples (P < 0.05) were combined, and a new assay was designed to genotype them in a single reaction. These ESCC-associated SNPs were then genotyped in the germ-line DNA of the validation set of samples at the second stage. Ten percent blinded quality-control samples were included in each plate; the concordance rates ranged from 98% to 100%. The average genotyping call rate was 98% (range, 95% to 100%). Two ESCC-associated SNPs failed genotyping in the primary set of samples; they were genotyped using a TaqMan assay on an ABI 7500 fast real-time system (Applied Biosystems). Genotyping of the primary set and the validation set was done at the Analytical Genetics Technology Centre of the University of Toronto and the Genome Quebec Innovation Centre, respectively.
A permutation version of the exact test was used to test for Hardy-Weinberg equilibrium. All genotypes were first studied in the primary sample set. Based on the results of the primary set, variants were selected for genotyping in the validation set. Haplotype analysis using the expectation-maximization algorithm ( 18) was used for comparing SNPs tagged by multiple markers. The significance level of α = 0.05 was used for all comparisons in the primary sample set (the P values of the comparisons from the primary set were not adjusted for multiple testing). Because the direction of association in the validation set was based on the primary set results, a one-sided test was used for the comparisons in the validation set. In the validation set, the P values of the comparisons were corrected for multiple testing using the Bonferroni approach. For those variants that were independently confirmed in the validation set, the genotypes of the primary set and the validation set were combined to form a joint sample set. All case-control comparisons were adjusted for age and ethnicity using multivariate logistic regression. Ethnicity was defined as Turkmen, Turk, Persian, Sistani, Balouch, or Kurd.
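The validation-stage testing logic (one-sided tests followed by Bonferroni correction) can be sketched as follows. This is a minimal illustration, not the authors' code. Two assumptions are made: the number of corrected tests equals the number of SNPs carried into validation, and the one-sided P value is derived from a symmetric two-sided test such as a Wald z test. The P values shown are hypothetical.

```python
def one_sided_from_two_sided(p_two_sided, same_direction):
    """Convert a two-sided P value from a symmetric test to a one-sided one.

    same_direction: True if the validation effect points the same way as
    the effect seen in the primary set (the prespecified direction).
    """
    half = p_two_sided / 2
    return half if same_direction else 1 - half


def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted P values and significance flags."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj <= alpha for p_adj in adjusted]


# Hypothetical two-sided P values for three validation SNPs,
# with a flag for whether the direction matched the primary set.
raw = [(0.002, True), (0.04, True), (0.30, False)]
one_sided = [one_sided_from_two_sided(p, same) for p, same in raw]
adjusted, significant = bonferroni(one_sided)
print(adjusted, significant)
```

With seven SNPs tested, the equivalent per-test threshold would be 0.05/7, or about 0.007.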
To exclude the possibility of technical genotyping differences between the primary and validation sets, we re-genotyped all 7 variants genotyped in the validation set in the primary sample set, using the genotyping assay used for the validation sample set. The concordance rate of the genotyping results was 97%.
The possibility of population stratification among the Turkmens was addressed by principal component analysis using the Eigenstrat method ( 19) in the primary sample set, for which genotypes of 130 tagSNPs plus 22 functional variants were available. A plot of the eigenvectors of the two principal components with the largest eigenvalues for the cases and the controls ( Fig. 1 ) suggested that population stratification was not present. All analyses were done with SNP & Variation Suite version 7 (Golden Helix).
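A principal component analysis of this kind can be sketched with NumPy as below. This is a simplified illustration in the spirit of the Eigenstrat normalization (each SNP is centered and scaled by a smoothed allele-frequency estimate). It is not the Golden Helix implementation, it assumes no missing genotypes, and the genotype matrix is synthetic random data used only to exercise the code.

```python
import numpy as np


def eigenstrat_style_pcs(genotypes, n_components=2):
    """Top principal components of a samples-by-SNPs genotype matrix.

    genotypes: array of shape (n_samples, n_snps) holding 0/1/2 allele counts.
    Returns (sample coordinates on the top components, their eigenvalues).
    """
    g = np.asarray(genotypes, dtype=float)
    n_samples, n_snps = g.shape
    # Smoothed allele frequency per SNP, kept strictly between 0 and 1.
    p = (1.0 + g.sum(axis=0)) / (2.0 + 2.0 * n_samples)
    normalized = (g - g.mean(axis=0)) / np.sqrt(p * (1.0 - p))
    # Left singular vectors are the eigenvectors of the sample covariance
    # matrix X X^T / n_snps; squared singular values give its eigenvalues.
    u, s, _ = np.linalg.svd(normalized, full_matrices=False)
    eigenvalues = s ** 2 / n_snps
    return u[:, :n_components], eigenvalues[:n_components]


# Synthetic data only: 451 samples and 152 SNPs, matching the dimensions
# of the primary set but carrying no real genotypes.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.5, size=152)
synthetic = rng.binomial(2, freqs, size=(451, 152))
pcs, eigvals = eigenstrat_style_pcs(synthetic)
print(pcs.shape, eigvals)
```

Plotting the two columns of `pcs` with cases and controls in different colors would give a figure of the kind described; overlapping clouds would suggest no gross stratification.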
Initially, we genotyped 152 SNPs in a primary set of 197 cases and 254 controls. These were the first samples to be collected in the course of the project and all subjects were Turkmen. These SNPs included 22 functional variants of 15 genes and 130 tagSNPs from these genes. The results of the 17 functional variants and 3 slow-metabolizing alleles of NAT2 gene in the primary set of 451 samples are shown in Supplementary Table S1. The frequencies of the minor alleles of all variants, except one (ALDH2 Glu487Lys), were >1%. All were in Hardy-Weinberg equilibrium.
In addition, we selected and genotyped 130 tagSNPs of 15 different genes in the primary set of samples. These SNPs had not been studied previously in ESCC. Eight tagSNPs had minor allele frequencies of <0.01 and 3 tagSNPs were not in Hardy-Weinberg equilibrium. These 11 tagSNPs were excluded from the analysis. The remaining 119 tagSNPs were analyzed in the primary set of 197 cases and 254 controls.
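The exclusion rule applied here (minor allele frequency below 0.01, or departure from Hardy-Weinberg equilibrium) can be sketched as a per-SNP filter. The study used a permutation version of the exact test; the sketch below substitutes the simpler chi-square goodness-of-fit test with 1 degree of freedom, so it illustrates the filter rather than reproducing the authors' method. Genotype counts are hypothetical.

```python
CHI2_CRIT_1DF_05 = 3.841  # upper 5% point of chi-square with 1 d.f.


def maf_and_hwe_chi2(n_aa, n_ab, n_bb):
    """Minor allele frequency and HWE chi-square from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return min(p, q), chi2


def passes_qc(n_aa, n_ab, n_bb, maf_min=0.01):
    """Keep a SNP if it is common enough and consistent with HWE."""
    maf, chi2 = maf_and_hwe_chi2(n_aa, n_ab, n_bb)
    return maf >= maf_min and chi2 < CHI2_CRIT_1DF_05


# Hypothetical genotype counts, for illustration only.
print(passes_qc(81, 18, 1))   # counts in exact HWE proportions
print(passes_qc(100, 0, 0))   # monomorphic, fails the MAF filter
print(passes_qc(50, 0, 50))   # no heterozygotes, fails HWE
```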
Two functional variants showed associations with ESCC in the primary sample set with P values < 0.05 (the Arg48His variant from ADH1B gene and c.870A>G from CCND1). Five tagSNPs, from five different genes, also showed significant associations in the primary set. These 7 SNPs (2 functional variants and 5 tagSNPs) were then genotyped in the validation set of samples. The results are reported separately for the primary set, the validation set, and the joint set ( Table 3 ). Results are reported in the form of odds ratios (OR) for the three genotype classes, with the most common homozygous genotype being the reference class. The cases and controls differed with regard to age and ethnic group ( Table 1); therefore, all ORs were adjusted for age and ethnic group. Results are also reported assuming dominant and recessive models.
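The dominant and recessive codings can be made explicit with a small sketch that collapses three genotype classes into a 2 × 2 table and computes a crude OR with a Woolf (log-based) 95% confidence interval. The ORs in this study were adjusted for age and ethnic group by logistic regression, so this unadjusted calculation only illustrates the genetic models; the genotype counts are hypothetical.

```python
import math


def collapse(counts, model):
    """Collapse (ref_hom, het, var_hom) counts into (exposed, unexposed)."""
    ref_hom, het, var_hom = counts
    if model == "dominant":   # one or two variant alleles versus none
        return het + var_hom, ref_hom
    if model == "recessive":  # two variant alleles versus none or one
        return var_hom, ref_hom + het
    raise ValueError(f"unknown model: {model}")


def crude_or(case_counts, control_counts, model):
    """Unadjusted odds ratio with a Woolf 95% confidence interval."""
    a, b = collapse(case_counts, model)     # cases: exposed, unexposed
    c, d = collapse(control_counts, model)  # controls: exposed, unexposed
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; the Woolf interval is undefined")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


# Hypothetical genotype counts (ref_hom, het, var_hom), illustration only.
case_counts = (100, 80, 20)
control_counts = (100, 70, 30)
dominant = crude_or(case_counts, control_counts, "dominant")
recessive = crude_or(case_counts, control_counts, "recessive")
print(dominant)
print(recessive)
```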
The strongest association was observed with the His allele of the Arg48His variant of ADH1B gene. This allele was associated with a decreased risk of ESCC under a recessive model ( Table 3). Compared with those who carried no His allele, those who carried two His alleles were at reduced risk (OR, 0.63; P = 0.0002; 95% confidence interval, 0.34-0.93).
The association with the rs7087131 variant of the MGMT gene was borderline significant in the validation set after adjusting for multiple testing ( Table 3). In the combined set, the A allele of the rs7087131 variant of the MGMT gene was associated with a decreased risk of ESCC under a dominant model (OR, 0.79; 95% confidence interval, 0.64-0.96; P = 0.02). For the other 5 variants, the associations were not confirmed in the validation set ( Table 4 ).
We studied 22 different functional variants of 15 genes, which were previously reported to be associated with ESCC. Notably, the ORs reported for these variants in the earlier studies were typically ≥2.0 ( 11). Given that the frequencies of the minor alleles of these variants in our population were almost all above 0.1 (Supplementary Table S1), the power of our study to detect ORs above 2.0 exceeded 90%. However, of the 22 variants tested, only two polymorphisms (ADH1B Arg48His and CCND1 c.870A>G) showed significant associations at the P = 0.05 level in the primary sample set and ADH1B Arg48His was then confirmed in the validation set.
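The order of magnitude of this power statement can be checked with a normal-approximation calculation for an allele-based comparison. The text does not describe how the authors computed power, so the sketch below is only one plausible approach. It assumes a two-sided α of 0.05, an OR expressed per allele, a minor allele frequency of 0.1 in controls, and the primary-set sample sizes (two chromosomes per subject).

```python
import math
from statistics import NormalDist


def allelic_power(maf_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test of two proportions."""
    p0 = maf_controls
    # Allele frequency in cases implied by the per-allele odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    chrom_cases, chrom_controls = 2 * n_cases, 2 * n_controls
    se = math.sqrt(p1 * (1 - p1) / chrom_cases + p0 * (1 - p0) / chrom_controls)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible chance of rejecting in the wrong direction is ignored.
    return NormalDist().cdf(abs(p1 - p0) / se - z_alpha)


power_primary = allelic_power(0.1, 2.0, n_cases=197, n_controls=254)
power_modest = allelic_power(0.1, 1.5, n_cases=197, n_controls=254)
print(round(power_primary, 2), round(power_modest, 2))
```

Under these assumptions the power for an OR of 2.0 comes out above 0.9, while the power for an OR of 1.5 is clearly lower, in line with the limitation on modest effect sizes acknowledged later in the text.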
In the joint sample set, the His allele of the Arg48His variant was protective under a recessive model (OR, 0.41; 95% confidence interval, 0.29-0.76; P = 4 × 10−4; Table 4). ADH1B is one of seven different genes that encode five different classes of alcohol dehydrogenase (ADH) enzymes. These enzymes are involved in the oxidation of alcohol groups of a variety of substrates. ADH class I, which is responsible for the oxidation of ethanol to acetaldehyde, is a homodimer or heterodimer of three related subunits (α, β, and γ) encoded by ADH1A, ADH1B, and ADH1C, respectively ( 20). All ADH enzymes use NAD+/NADH as their oxidizing cofactor. The arginine residue at codon 48 of ADH1B is in close contact with its cofactor and seems to have a crucial role in modulating the ethanol-oxidizing capability of the protein. When the arginine is replaced by histidine at this codon, the enzyme activity increases by 70- to 80-fold ( 20). In a rodent model, ADH1B, which has proline at this codon, is not capable of oxidizing ethanol ( 21).
In previous studies, the Arg allele of ADH1B consistently has been reported to increase the risk of ESCC by 1.6- to 4-fold among alcohol drinkers compared with individuals with two His alleles ( 10, 11, 22). The suggested mechanism is that alcohol and acetaldehyde persist for a longer time in the circulation of individuals with the Arg allele, who are slow metabolizers. These individuals tend to experience less flushing on exposure to alcohol and are prone to alcoholism ( 23). We confirmed that the presence of the Arg allele at codon 48 of the ADH1B protein increases the risk of ESCC by 2.4-fold in comparison with two His alleles (P = 4 × 10−4). Considering that only 4.5% of our study subjects were alcohol drinkers, this finding suggests that in Iran the association is not due to metabolism of alcohol per se. Our observation is contrary to other studies that report no association between the Arg48His variant of the ADH1B gene and the risk of ESCC among nondrinkers ( 22, 24, 25). Most studies that showed the association of Arg48His variant of ADH1B with ESCC in alcohol drinkers also report an association between the Glu487Lys variant of ALDH2 and esophageal cancer ( 11). The ALDH2 protein with lysine residue at codon 487 is catalytically inactive and cannot metabolize acetaldehyde, a known carcinogen ( 23). We did not observe this association in Iran.
Our data suggest that, in Northern Iran, ESCC may be related to an environmental factor other than ethanol that is also metabolized by ADH1B. 1-Methylpyrene (1-MP) is a candidate for such an environmental factor. 1-MP is a carcinogenic polycyclic aromatic hydrocarbon (PAH), which is metabolized to 1-hydroxymethylpyrene (1-HMP) by cytochrome P450 enzymes. 1-HMP can be activated by sulfotransferases to 1-sulfooxymethylpyrene (1-SMP), which is capable of forming DNA adducts, but most 1-HMP is inactivated by ADH1B and excreted in urine and does not form DNA adducts ( 26, 27).
PAHs have been shown to be esophageal carcinogens in both animal and human studies. In a 2-year feeding study in mice, a dose-response relationship was shown between benzo[a]pyrene intake and ESCC incidence ( 28). Occupational exposure to PAHs in Sweden was reported to double the risk of ESCC ( 29). Other lines of evidence that support the contribution of PAHs in the pathogenesis of ESCC come from a high-risk region in China and include high urinary levels of PAH metabolites in residents ( 30), increased concentrations of PAHs in cooked and uncooked foods in the region ( 28), and the presence of PAH-DNA adducts in ESCC tissue samples of patients ( 31).
PAHs require metabolic activation to become carcinogens. Purely aromatic PAHs, such as benzo[a]pyrene, are activated via the diol-epoxide pathway ( 32), but alkylated PAHs, such as 1-MP, are metabolized by an alternative pathway named the aralkylating hydrocarbon pathway ( 33). In this pathway ( Fig. 2 ), 1-MP is hydroxylated to 1-HMP by cytochrome P450 enzymes such as CYP1A1 ( 34). 1-HMP subsequently forms a reactive sulfuric acid ester (1-SMP) by the action of sulfotransferases ( 35, 36). 1-SMP forms DNA adducts and has been shown to be mutagenic in rats ( 33) and in Salmonella typhimurium ( 37). 1-SMP also reacts with glutathione and is excreted in urine as methylpyrenyl mercapturic acid (MPMA). Urinary MPMA accounts for only 2% of the 1-HMP administered to rats; 80% of the administered 1-HMP is excreted as free 1-pyrenylcarboxylic acid (1-PCA) and its glucuronic acid or other conjugates ( 26). 1-PCA is an inactive metabolite of 1-HMP, produced by oxidation by ADH1B ( 27). Therefore, ADH1B competes with sulfotransferases for the metabolism of 1-HMP, and ADH1B activity decreases the formation of DNA adducts. If ADH1B is the major enzyme for the inactivation of 1-HMP, then other substrates of ADH1B are expected to interfere with inactivation of 1-HMP and increase the amount of its activated metabolite. This has been shown in rats, in which concurrent administration of 1-HMP and ethanol enhanced the hepatic level of PAH-DNA adducts by 15-fold ( 26). The competition of ethanol with 1-HMP for oxidation by ADH1B could also explain the synergistic carcinogenic effect of alcohol and cigarette smoke (which is rich in 1-MP, the precursor of 1-HMP; ref. 27). It is possible that individuals with two slow-metabolizing Arg alleles at codon 48 of the ADH1B gene do not inactivate 1-HMP as efficiently as those who carry one or more His alleles at this codon. In this case, more 1-HMP would be available for conversion to 1-SMP by sulfotransferases; consequently, more DNA adducts would be formed ( Fig. 2).
The rs7087131 variant of MGMT was also found to be associated with ESCC under a dominant model. However, this association was of borderline significance. O6-methylguanine is formed by alkylating agents such as nitrosamines, which are associated with an increased risk of ESCC ( 38). O6-methylguanine has been found in the epithelial cells of the esophagus of individuals living in the high-risk region of China ( 39). Unrepaired O6-methylguanine tends to pair with thymine and introduces G:C to A:T transitions during cell replication ( 40). O6-methylguanine-DNA methyltransferase (MGMT) repairs O6-methylguanine by transferring the methyl group to a cysteine residue at codon 145 of the enzyme ( 41). Binding of the methyl group to this cysteine residue of MGMT is an irreversible reaction, and after repairing each single O6-methylguanine, the protein is ubiquitylated and degraded and needs to be replaced by a new protein. Therefore, it is expected that cells with low expression of MGMT cannot repair O6-methylguanine efficiently. Low expression of MGMT in cells can result from either promoter hypermethylation of the gene or cis-acting nucleotide polymorphisms, which change the binding sites of key regulatory factors ( 42). A group of SNPs in the 5′ half of the gene have been described as associated with MGMT activity in WBC ( 42). The rs7087131 variant is also located in the 5′ half of the MGMT gene, in the intronic region between exons 2 and 3, and might directly or indirectly affect the expression level of the gene and contribute to the risk of ESCC in Iran. In our study, the substitution of G by A in this variant decreased the risk of ESCC under a dominant model (OR, 0.79; 95% confidence interval, 0.64-0.96; P = 0.02).
There are several limitations to our study. The sample size was relatively small and we did not have sufficient power to detect associations of a modest size, for example, ORs of <1.5. The primary data set was composed of Turkmens, and if an association were to be present only in non-Turkmen, it would not be discovered using this strategy. Similarly, if an association were present only in Turkmens, it would fail to be confirmed in the validation set. We have collected data on family history for most of the cases, but because of the high fatality rate of esophageal cancer, a DNA sample was available for the proband only.
In conclusion, this study extends our earlier observations that genetic factors play an important role in the high incidence of esophageal cancer in the northeast of Iran. A strong and significant association was seen with the Arg48His variant of the ADH1B gene among both Turkmens and non-Turkmens. In several previous studies, this variant was associated with the risk of ESCC, primarily among alcohol drinkers ( 10, 11, 22). Interestingly, in this study, the variant was related to the risk of ESCC in a population made up largely of nondrinkers, suggesting that other carcinogens might be causative.
It is important that this observation be confirmed in larger series of patients. To date, we have strong evidence that both BRCA2 and ADH1B contribute to the burden of esophageal cancer in Northern Iran, but the known variants in these two genes are not sufficient to explain the high incidence of esophageal cancer in the region, or the high familial relative risk, and further studies are needed.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant support: Canadian Institutes of Health Research and Digestive Disease Research Center of the Tehran University of Medical Sciences.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Dr. Jafar Bashiri [case-control study center (Aras Clinic) in Ardabil City], Safura Kor [case-control study center (Atrak Clinic)], Goharshad Goglani and Mina Bahrami (cohort study center in Gonbad City) for help with study subject recruitment, Cheryl Crozier (Analytical Genetics Technology Centre, University of Toronto) for the kind help, and Alexandre Belisle (Genotyping Platform of the McGill University and Genome Quebec Innovation Centre) for help in performing the genetic tests of this study.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received April 4, 2009.
- Revision received June 16, 2009.
- Accepted July 2, 2009.
- ©2009 American Association for Cancer Research.