Molecular Biology, Pathobiology, and Genetics |
1 National Genotyping Centre, 2 Human Genetics Group, 3 Hereditary Endocrine Cancer Group, Human Cancer Genetics Programme, Spanish National Cancer Centre, Madrid, Spain; 4 Department of Obstetrics and Gynecology, Helsinki University Central Hospital, Helsinki, Finland; 5 Unidad de Genética, Instituto de Medicina Legal, Facultad de Medicina, Universidad de Santiago de Compostela, Galicia, Spain; and 6 Department of Bioinformatics, Centro de Investigación Príncipe Felipe, Valencia, Spain
Requests for reprints: Javier Benítez, Human Cancer Genetics Programme, Spanish National Cancer Centre, Melchor Fernández Almagro, 3 E-28029, Madrid, Spain. Phone: 34-91-224-6965; Fax: 34-91-224-6923; E-mail: jbenitez{at}cnio.es.
| Abstract |
|---|
| Introduction |
|---|
An alternative approach is based on the argument that disease-associated variants with modest effects might be distributed proportionately between coding and noncoding sequences of the genome (8, 9). Recent developments in high-throughput genotyping technology have made genotyping up to hundreds of thousands of marker single-nucleotide polymorphisms (SNPs) throughout the genome a possibility, both logistically and economically (9), and studies are now beginning to emerge applying these to a range of complex diseases using case-control studies (10-16). Marker SNPs are ideally chosen to maximally capture the common variation across the genome. The idea behind this approach is that associations will be detected either directly with causal variants, if genotyped, or indirectly with markers in linkage disequilibrium with causal variants (8, 17).
Association studies that test a large number of SNPs are expensive and can lead to false-positive associations if multiple testing is not adequately accounted for. At the same time, correcting for a large number of tests requires large sample sizes to maintain adequate statistical power and therefore avoid false-negative associations. In addition, confirmation of associations identified in such studies by replication in independent and adequately sized samples is essential (9). These considerations present an economic and logistical challenge to investigators seeking to produce quality research in this field. Two-stage study designs have been proposed as an efficient means of addressing these challenges (9, 18). Under this design, all SNPs are genotyped in a case-control series at stage 1 (discovery) and a reduced set of candidate SNPs is selected based on unadjusted P values. In stage 2 (replication), only candidate SNPs are genotyped and tested for associations in an independent case-control series, thereby reducing both genotyping costs and the number of tests to be corrected for. Both the discovery and replication studies should be of sufficient size to avoid false-positive and false-negative findings (19).
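The two-stage logic described above can be sketched in a few lines of Python. This is an illustration only, not the analysis code used in the study: the SNP names and P values are hypothetical, the stage 1 threshold of 0.01 is the one stated in Materials and Methods, and a plain Bonferroni correction over the number of stage 2 tests stands in for the adjusted correction described later.

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    snp_id: str      # hypothetical identifier, for illustration only
    p_value: float   # unadjusted (nominal) P value


def select_candidates(stage1, threshold=0.01):
    """Stage 1: keep SNPs whose unadjusted P value is below the threshold."""
    return [r.snp_id for r in stage1 if r.p_value < threshold]


def bonferroni_adjust(stage2, n_tests=None):
    """Stage 2: multiply each P value by the number of tests, capped at 1."""
    n = n_tests if n_tests is not None else len(stage2)
    return {r.snp_id: min(1.0, r.p_value * n) for r in stage2}


# Hypothetical stage 1 results (discovery series).
stage1 = [
    SnpResult("snp_a", 0.004),
    SnpResult("snp_b", 0.20),
    SnpResult("snp_c", 0.009),
    SnpResult("snp_d", 0.03),
]
candidates = select_candidates(stage1)

# Hypothetical stage 2 results (replication series), candidates only.
stage2 = [SnpResult("snp_a", 0.002), SnpResult("snp_c", 0.40)]
adjusted = bonferroni_adjust(stage2)
```

Because only the candidates reach stage 2, the correction factor there is the number of candidates rather than the number of SNPs screened in stage 1.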
Breast cancer is a complex disease for which very little of the genetic cause is known. Although several rare, high-penetrance genes have been identified (BRCA1 and BRCA2 being the most common), these explain only a minority (30%) of familial breast cancers and a negligible proportion of sporadic breast cancers (20, 21). A polygenic model, with more common variants having modest effects on breast cancer risk, has therefore been suggested (4, 9).
We have carried out a two-stage case-control study in two European populations to identify low-penetrance genes for breast cancer, focusing on SNPs in 111 preselected candidate cancer-related genes.
| Materials and Methods |
|---|
Controls were 845 Spanish women free of breast cancer at ages ranging from 23 to 86 years (mean, 53 years), recruited via the following sources: 442 (52%) from the Menopause Research Centre at the Instituto Palacios (Madrid, Spain), 239 (28%) from the College of Lawyers (Madrid, Spain), 91 (11%) from the National Blood Transfusion Centre (Madrid, Spain), 57 (7%) from the Catalan Institute of Oncology (ICO; Barcelona, Spain), and 16 (2%) from the Centre for the Investigation of Cancer (CIC; Salamanca, Spain). Controls were recruited between 2000 and 2005, and those from the latter two centers were women aged >60 years recruited as part of prospective epidemiologic studies unrelated to cancer, specifically selected for this study to be comparable with older cases on age.
Informed consent was obtained from all participants, and the study was approved by the institutional review board of Hospital La Paz.
Study population: stage 2 Finnish study. The case series of 884 women includes consecutive newly diagnosed breast cancer patients recruited in 1997 to 1998 (622 patients) and 2000 (262 patients) at the Helsinki University Central Hospital (Helsinki, Finland) and covers 79% of all breast cancer patients treated at the Department of Oncology during the collection period (described in detail in refs. 25, 26). Their mean age at diagnosis was 57 years (range, 22-96 years). This series included 214 familial cases (with no ovarian cancers in the family): 66 with a strong family history (three or more first- or second-degree relatives with breast cancer in the family, including the proband) and 148 with one first-degree relative affected with breast cancer. The 622 unselected cases recruited in 1997 to 1998 were screened for 19 deleterious Finnish BRCA1 and BRCA2 mutations as described previously (26), complemented with a more thorough screening of the genes in familial cases (27, 28), and 12 mutation carriers were identified. Among the 262 cases collected in 2000, 11 familial cases were screened for mutations and 1 was found to carry a deleterious BRCA2 mutation.
Eligible controls were a random sample of 28% of blood donors free of breast cancer, attending blood banks in Helsinki in 2003. DNA was available for 1,104 (82%) of these, and they were between 18 and 65 years of age (mean, 41 years).
The study was carried out with informed consents from the patients and permissions from the Ethics Committees of the Departments of Oncology and Obstetrics and Gynecology as well as from the Ministry of Social Affairs and Health in Finland.
Candidate gene choice and SNP selection. A total of 112 candidate genes were selected for stage 1 according to the following criteria: genes previously reported to be associated with or known to be involved in cancer and genes involved in cell cycle pathways, DNA repair, cell communication, hormone metabolism, apoptosis, carcinogen metabolism, cell adhesion, and/or signal transmission. A full list of these genes is provided in Appendix A. SNP selection across each of these genes was carried out using density as the primary criterion, with lower density in regions of higher linkage disequilibrium and higher density in regions of lower linkage disequilibrium and giving priority to tagSNPs defining common haplotypes. In addition, SNPs with potentially functional effects (causing amino acid changes, potentially causing alternative splicing, in the promoter region or in putative transcription factor binding sites) were chosen wherever possible. In general, SNPs selected had minor allele frequencies (MAFs) of at least 10%, with the exception of putative coding SNPs (where available) with a minimum MAF of 5%.
A list of validated SNPs in each gene, along with their MAFs, was compiled using publicly available information in dbSNP build 120 (footnote 7). TagSNPs were defined using HapMap CEU genotype data (29) and Haploview software (30). Putative functional SNPs were identified using the bioinformatic tool PupaSNP (31; footnote 8), now part of the PupaSuite package. SNPs were also screened for suitability for the Illumina genotyping platform (selecting only those with an assay score >0.6, associated with a high success rate). A final total of 710 SNPs relevant to this study was included in an oligonucleotide pool assay for analysis using the Illumina platform. The average density was 1 SNP every 8.7 kb.
All SNPs with associated nominal P values < 0.01 in stage 1 were selected for genotyping in stage 2.
An additional 28 SNPs across the genome were independently selected to be used as markers to assess population stratification in the Spanish subjects. All 28 were chosen to be at least 100 kb from genes.
In post hoc analyses, putative causal SNPs in the promoter region of ERCC4 were identified using both PupaSNP (31) and additional phylogenetic analyses. For the latter, human DNA sequence was compared with that of other species using ECR Browser (32; footnote 9), which aligns nucleotide sequences using FASTA and screens for evolutionary conserved regions, defined as fragments of at least 200 bp with similarity >75%.
Genotyping. Genomic DNA from Spanish subjects was isolated from peripheral blood lymphocytes using automatic DNA extraction (MagNA Pure, Roche, Mannheim, Germany) according to the manufacturer's recommended protocols. This DNA was quantified using PicoGreen and diluted to a final concentration of 50 ng/µL for genotyping. The DNAs in the Finnish study were isolated by standard procedures using phenol-chloroform extraction and phase-lock gel tubes (Eppendorf AG, Hamburg, Germany), quantified using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE), and diluted to a final concentration of 100 to 200 ng/µL for genotyping.
Genotyping of SNPs in candidate genes in the stage 1 Spanish study was carried out according to the manufacturer's protocols using the Illumina Bead Array System (Illumina, Inc., San Diego, CA; ref. 33). For stage 1 and all post hoc SNP genotyping, at least one duplicate and one negative control were included per 96-well plate and six samples were duplicated across plates. The total number of duplicates across all plates was 35 (15 cases, 17 controls, and a non-study child-parents triad).
Genotyping of the 10 SNPs in stage 2, the Finnish study, was carried out using Amplifluor fluorescent genotyping (KBiosciences, Cambridge, United Kingdom; footnote 10). For quality control, duplicate samples from 92 (10%) cases and 92 (8%) controls were independently reanalyzed in a blinded fashion.
Genotyping of the 28 marker SNPs to assess population stratification among Spanish subjects was carried out using the MassARRAY genotyping system (Sequenom, Inc., San Diego, CA) following the manufacturer's instructions.
In post hoc analysis using the Spanish data, genotyping was carried out using Taqman technology (Applied Biosystems, Foster City, CA) for rs1800067 and rs11649492 and using Amplifluor for rs3136038 (Fig. 1), following the manufacturer's instructions in both cases.
Departure from Hardy-Weinberg equilibrium (HWE) for all SNPs was tested in controls using the genhwi command in STATA version 8. In the stage 1 analyses, a modified Bonferroni-corrected nominal P value threshold of 0.05/N1* was used in assessing departure from HWE, where N1* is the "effective number of independent marker loci" after consideration of linkage disequilibrium between SNPs (marker loci) on the same chromosome. N1* was calculated using the formula of Li and Ji (38) by applying the web-based program SNPSpD (39, 40) to SNPs on individual chromosomes and summing estimates across chromosomes.
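As an illustration of the HWE check and the modified Bonferroni threshold, the following Python sketch applies a textbook 1-df Pearson chi-square test to genotype counts in controls. It is an assumption that this matches what the genhwi command computes; the study itself used STATA, and the genotype counts below are invented. The effective number of loci is taken as an input (422, the stage 1 value reported in Results) rather than re-derived from linkage disequilibrium data.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions.

    Returns (chi2, p_value) for the observed genotype counts.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def hwe_threshold(n_effective, alpha=0.05):
    """Modified Bonferroni threshold: alpha / effective number of loci."""
    return alpha / n_effective


threshold = hwe_threshold(422)                 # about 0.0001
chi2_ok, p_ok = hwe_chi2_p(25, 50, 25)         # invented counts, exactly in HWE
chi2_bad, p_bad = hwe_chi2_p(40, 20, 40)       # invented counts, heterozygote deficit
excluded = p_bad < threshold
```

With these invented counts, the second SNP shows far fewer heterozygotes than expected and would fall below the threshold, whereas the first matches HWE proportions exactly.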
Associations between individual SNPs and breast cancer risk were assessed using unconditional logistic regression, comparing genotype frequencies in cases and controls and estimating odds ratios (OR) using homozygotes for the more frequent allele in controls as the reference group. For each SNP assessed in stage 1, the best-fitting model among dominant, recessive, and multiplicative (single variable) codominant was determined by parsimony, and this was tested against the two-variable codominant model via the likelihood ratio test. Associated two-sided nominal P values were determined using the likelihood ratio test. A nominal P value threshold of 0.01 was used to screen for SNPs potentially associated with breast cancer at stage 1. Age in years was adjusted for as a categorical variable with the following categories: <35, 35 to 39, 40 to 44, 45 to 49, 50 to 54, 55 to 59, 60 to 64, and >64. For SNPs assessed in stage 2, the best-fitting model from stage 1 was tested using one-sided P values, the alternative hypothesis being in the direction indicated by estimated ORs from stage 1 analyses. Analyses of pooled data were adjusted for country as a dichotomous variable. These analyses were carried out using STATA version 8 (41).
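The genetic models compared above differ only in how a genotype is coded as a covariate before it enters the logistic regression. The Python sketch below shows one conventional coding, together with the age bands used for adjustment. The function names are hypothetical, and the sketch assumes that the reference group (homozygotes for the more frequent allele in controls) corresponds to zero copies of the minor allele; the model fitting itself, done in STATA in the study, is not reproduced.

```python
def code_genotype(minor_allele_count, model):
    """Return the covariate(s) for one genotype under a given genetic model.

    minor_allele_count: 0, 1, or 2 copies of the minor allele
    (0 copies is assumed to be the reference group).
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    if model == "dominant":          # carriers versus reference
        return [1 if minor_allele_count >= 1 else 0]
    if model == "recessive":         # minor homozygotes versus the rest
        return [1 if minor_allele_count == 2 else 0]
    if model == "multiplicative":    # per-allele effect on the log-odds scale
        return [minor_allele_count]
    if model == "codominant":        # two indicators, one per non-reference genotype
        return [1 if minor_allele_count == 1 else 0,
                1 if minor_allele_count == 2 else 0]
    raise ValueError(f"unknown model: {model}")


# Upper bounds (exclusive) and labels for the age categories listed in the text.
AGE_BANDS = [(35, "<35"), (40, "35-39"), (45, "40-44"), (50, "45-49"),
             (55, "50-54"), (60, "55-59"), (65, "60-64")]


def age_category(age_years):
    """Map age in whole years to the categorical adjustment variable."""
    for upper, label in AGE_BANDS:
        if age_years < upper:
            return label
    return ">64"
```

The three single-variable models each use one parameter, so testing the best of them against the two-variable codominant coding is a 1-df likelihood ratio comparison.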
Two methods were considered to address the issue of multiple testing at both stage 1 and stage 2. The Bonferroni correction was applied based on the effective number of independent marker loci, estimated at each stage as described above. This approach appropriately accounts for the nonindependence of SNPs on the same chromosome due to linkage disequilibrium and has been shown to closely approximate results from adjustment for multiple testing using permutation methods (39, 40). Adjusted P values from stage 2 were confirmed using a one-sided permutation test based on 10,000 permutations, in which case/control status was randomly allocated and χ² statistics calculated for each SNP tested (ignoring values for SNPs in which the difference in allele frequencies of cases versus controls was not in the same direction as that observed in stage 1). The distribution-free method of controlling the false-discovery rate (FDR) of Benjamini et al. (42), which is robust to the presence of nonindependent explanatory variables, was also applied.
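A one-sided permutation test of the kind described above can be sketched as follows. Several details are assumptions, because the text does not specify them: the statistic is an allelic 2 x 2 chi-square, the adjusted P value for each SNP is obtained by comparing its observed statistic with the maximum statistic across all SNPs in each permutation, and the usual "+1" correction is applied. The data are invented, and only 200 permutations are run in the example to keep it fast (the study used 10,000).

```python
import random


def allelic_chi2(counts, status):
    """Allelic chi-square and (case - control) minor allele frequency difference.

    counts: minor-allele counts (0/1/2) per subject; status: 1 = case, 0 = control.
    """
    case_minor = sum(c for c, s in zip(counts, status) if s == 1)
    ctrl_minor = sum(c for c, s in zip(counts, status) if s == 0)
    n_case = 2 * sum(status)
    n_ctrl = 2 * (len(status) - sum(status))
    a, b = case_minor, n_case - case_minor
    c, d = ctrl_minor, n_ctrl - ctrl_minor
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0
    return chi2, a / n_case - c / n_ctrl


def directional_stat(counts, status, direction):
    """Chi-square, set to 0 when the effect is not in the stage 1 direction."""
    chi2, diff = allelic_chi2(counts, status)
    return chi2 if diff * direction > 0 else 0.0


def permutation_adjusted_p(snps, status, directions, n_perm=10000, seed=0):
    """Adjusted one-sided P values from the permutation max-statistic."""
    rng = random.Random(seed)
    observed = {k: directional_stat(v, status, directions[k]) for k, v in snps.items()}
    exceed = dict.fromkeys(snps, 0)
    shuffled = list(status)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reallocate case/control status
        max_stat = max(directional_stat(v, shuffled, directions[k])
                       for k, v in snps.items())
        for k in snps:
            if max_stat >= observed[k]:
                exceed[k] += 1
    return {k: (exceed[k] + 1) / (n_perm + 1) for k in snps}


# Invented data: 20 cases then 20 controls; direction -1 means the minor
# allele was less frequent in cases at stage 1.
status = [1] * 20 + [0] * 20
snps = {"x": [0] * 20 + [2] * 20, "y": [0, 1] * 20}
directions = {"x": -1, "y": -1}
adjusted = permutation_adjusted_p(snps, status, directions, n_perm=200)
```

Taking the maximum over SNPs in each permutation is what adjusts for multiple testing while preserving the correlation between SNPs, since every SNP is evaluated under the same shuffled labels.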
Post hoc haplotype analysis for ERCC4 was done using the haplo.stats package, implemented in R, which compares haplotype frequencies in cases and controls in an unbiased way by including haplotype uncertainty in unconditional logistic regression.
| Results |
|---|
All duplicates both within and between plates genotyped in stage 1 were concordant for all SNPs. Of the 710 SNP assays, 65 either failed genotyping (no PCR amplification, insufficient intensity for cluster separation, or no or poor cluster definition) or were monomorphic. All three SNPs in MAP2K3 failed genotyping, leaving 111 genes with at least one SNP genotyped. A further 5 of the remaining 645 SNPs were found to violate HWE, with nominal P values well below the modified Bonferroni threshold of 0.0001 (based on N1* = 422), and were therefore excluded from further analyses. The number of SNPs successfully genotyped in each gene in stage 1 is included in Appendix A.
The 640 SNPs in 111 genes successfully genotyped and investigated for associations with breast cancer in stage 1 represented the equivalent of an estimated 417 independent loci. A full list of these SNPs, including estimated MAFs for controls, can be found at the Bioinformatics Web site (footnote 11). Their genomic positions are summarized in Table 1. Allele frequencies observed in controls were highly consistent with those reported for Centre d'Etude du Polymorphisme Humain individuals by HapMap (29), with a high positive correlation of 0.91 (43).
We carried out phylogenetic analyses and found that sequences in the first intron and in the promoter region are highly conserved across species, highlighting that SNPs lying therein (including rs744154) could be of functional importance and therefore cause the observed protection from breast cancer. We screened these regions to identify additional SNPs with allele frequencies >5% located at potential transcription factor binding sites and/or in highly conserved sequences. We selected an insertion/deletion polymorphism (rs11337253) and a SNP (rs11649492) located 0.6 and 4.6 kb upstream of ERCC4, respectively, as candidate causal loci and genotyped them in the Spanish case-control series. PupaSNP predicts that rs11337253 is located at an HNF1 transcription factor binding region, whereas rs11649492 lies in a sequence that is highly conserved in both dog and mouse. The insertion/deletion polymorphism (rs11337253) could not be genotyped due to difficulty designing probes for Taqman or Amplifluor, given the poly(A) flanking sequence (9A). We instead genotyped rs3136038 located just 85 bp upstream of rs11337253.
Table 3 summarizes the comparison of genotype distributions between cases and controls for all SNPs in ERCC4 that were studied. For both rs3136038 and rs11649492, there was no evidence of departure from HWE among controls and all duplicates both within and between plates were concordant. There was no evidence of an association with breast cancer for either SNP.
Five SNPs in ERCC4 were originally genotyped in stage 1 (see Table 3 for results). The haplotypes formed by these SNPs were inferred and compared between Spanish cases and controls. There were only three haplotypes with estimated frequency >1%, the two most common (CTCGT and GGTAC) being yin yang (46) and accounting for 62% and 29% of all haplotypes, respectively, reflecting the high linkage disequilibrium observed across the entire gene. GGTAC includes the minor allele of each SNP and was associated with reduced breast cancer risk compared with CTCGT (per copy OR, 0.84; P = 0.02). Results did not change substantially when these analyses were repeated, including rs3136038, located 85 bp from the insertion-deletion polymorphism in the promoter (yin yang haplotypes accounting for 91% of all haplotypes and TGGTAC having OR of 0.83; P = 0.01). These estimates are practically identical to the per-allele risk estimated for rs744154 alone (OR, 0.83; P = 0.01).
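The term "yin yang" is used here in the sense of two haplotypes that carry different alleles at every SNP. A minimal Python check of that property for the two common five-SNP haplotypes reported above is given below; the function name is hypothetical, and the definition is an assumption based on the usual meaning of the term.

```python
def is_yin_yang(hap_a, hap_b):
    """True if the two haplotypes differ at every SNP position."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must span the same SNPs")
    return all(x != y for x, y in zip(hap_a, hap_b))


# The two most common haplotypes reported for the five stage 1 SNPs in ERCC4.
common_pair_mismatch = is_yin_yang("CTCGT", "GGTAC")
```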
| Discussion |
|---|
Only one study to date has published results from the application of high-throughput genotyping to the study of breast cancer (12, 47, 48), studying 25,494 marker SNPs in 14,000 to 16,000 genes among 254 breast cancer cases and 268 controls. In the first two publications (12, 48), the 52 SNPs with nominal P values < 0.05 were selected and studied in 368 cases and 330 controls from Germany and Australia. One SNP in ICAM5 was considered consistently associated with case-control status (unadjusted P < 0.05) in the original and replication series and another in NuMA in the original and pooled series (but not in the replication series alone). Their third publication (47) applied more relaxed criteria to declare associations and identified DPF3 as an additional candidate gene for further study. None of the SNPs studied were statistically significantly associated with breast cancer after correction for multiple testing.
The present study applies this two-stage candidate SNP selection and replication approach (9) but with much larger samples, both enhanced for family history of disease among cases and each from a distinct European population, and the replicated result was statistically significant after correction for multiple testing.
ERCC4, also known as XPF, is involved in the nucleotide excision repair pathway and is linked to susceptibility to xeroderma pigmentosum, a rare recessive syndrome that includes photosensitivity and malignant tumor development (49). ERCC4 plays an important role in recombination repair, mismatch repair, and possibly immunoglobulin class switching because of its unique function in damage site recognition (50). Several studies have investigated associations between polymorphisms in ERCC4 and breast cancer risk (44, 45, 51). Smith et al. (45) and Mechanic et al. (44) both found that, for the nonsynonymous coding SNP in exon 8, rs1800067 (Fig. 1), the percentage of rare (AA) homozygotes was higher for cases than controls (among Whites/Caucasians), although in neither study was this statistically significant. Lee et al. (51) found no evidence of an association between breast cancer and the synonymous coding SNP in exon 11, rs1799801 (Fig. 1); however, they suggested that carriers of both the variant in this SNP and Asp312Asn in ERCC2 might be at increased breast cancer risk. Our results (for rs1799801 studied at stage 1 and rs1800067 studied post hoc) do not support these findings, as per-allele OR estimates were in the opposite directions (respectively) for both SNPs (Table 3).
The observed association with the intronic SNP, rs744154, in this study would be explained by either rs744154 being directly protective or it being in linkage disequilibrium with another protective variant. HapMap data suggest that the entire ERCC4 gene forms a high linkage disequilibrium block, including the promoter region (Fig. 1). We studied rs1800067 (R415Q), the only nonsynonymous coding SNP identified in ERCC4 with an allele frequency >1%, and found no evidence of an association with breast cancer. Having observed that the four stage 1 SNPs located downstream of rs744154 showed weaker evidence of association with breast cancer, we focused on the promoter region and on potential transcription factor binding sites and highly conserved regions in particular, considering these strong candidates to contain functionally important SNPs (9). Associations were not detected for either of two additional SNPs located in highly conserved sequences in this region. One of those (rs3136038) was just 85 bp upstream of a candidate insertion-deletion polymorphism (rs11337253), which we were unable to genotype directly (Fig. 1); however, the lack of association and high linkage disequilibrium observed between all SNPs studied suggests that the latter is unlikely to be causal.
As mentioned above, rs744154, though intronic, could itself be causal. It is plausible that disease-associated variants with modest effects will be distributed proportionately between coding and noncoding sequences of the genome (8), and conserved noncoding regions in particular are often functionally important (9, 52). Indeed, several studies have found functional intronic variants associated with disease (53, 54). Furthermore, various studies have identified transcriptional regulation elements in the first intron of human genes (55-57). Rs744154 is located in a sequence of intron 1 that is highly (>90%) conserved in Canis familiaris, indicating a potential functional role. Functional studies are being planned to clarify this.
Population stratification is unlikely to have confounded the observed association of rs744154 with breast cancer risk. Although the power to detect population stratification in the subsample analyzed was limited, Structure convincingly identified only one population stratum among cases and controls. Results were also highly consistent when women recruited outside of Madrid were excluded. That the association of rs744154 in ERCC4 with breast cancer was replicated in an independent study of a distinct European population confirms that population stratification is unlikely to have been influential. The inclusion of cases selected for family history of disease in stage 1 also seems not to have influenced the findings of this study because the OR estimates for rs744154 from both samples did not change when selected cases were excluded.
In summary, we have conducted a two-stage case-control study, first screening SNPs in 111 cancer-related genes in a large Spanish case-control series and then validating associations in a large Finnish series. A SNP (rs744154) on intron 1 of ERCC4 was associated with breast cancer risk after adjustment for multiple testing in stage 2, indicating that common variation in ERCC4 is associated with protection from the disease.
| Appendix A. List of the 112 preselected cancer-related genes, in alphabetical order, with the number of successfully genotyped SNPs per gene in parentheses |
|---|
*None of the three SNPs in MAP2K3 were successfully genotyped, leaving 111 genes studied in cases and controls.
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank José Ignacio Arias (Hospital Monte Naranco), Pilar Zamora (Hospital La Paz), Álvaro Ruibal (Fundación Jiménez Díaz), Santiago Palacios (Instituto Palacios), Silvia de Sanjose (ICO), and Rogelio González Sarmiento (CIC) for the use of Spanish samples of cases and controls; Charo Alonso, Christian Torrenteras, Alicia Barroso, Victoria Fernández, Rocío Letón, and Fátima Mercadillo for their technical assistance in Spain; and Drs. Hannaleena Eerola and Carl Blomqvist as well as RN Nina Puolakka for their kind help with the patient contacts and sample collection in Finland.
| Footnotes |
|---|
9 http://www.ecrbrowser.dcode.org.
10 http://www.kbioscience.co.uk.
11 http://bioinfo.cnio.es/cgi-bin/cegen/frequencies.cgi.
Received 4/19/06. Revised 6/26/06. Accepted 7/20/06.
| References |
|---|
-actin transcriptional regulation in activated mesangial cells in vivo. Kidney Int 1999;55:2338-48.