Only two genome-wide association studies (GWAS) have been conducted to date to identify potential markers for total mortality after diagnosis of breast cancer. Here, we report the identification of two single-nucleotide polymorphisms (SNP) associated with total mortality from a two-stage GWAS conducted among 6,110 Shanghai-resident Chinese women with tumor–node–metastasis (TNM) stage I to IV breast cancer. The discovery stage included 1,950 patients and evaluated 613,031 common SNPs. The top 49 associations were evaluated in an independent replication stage of 4,160 Shanghai patients with breast cancer. A consistent and highly significant association with total mortality was documented for SNPs rs3784099 and rs9934948. SNP rs3784099, located in the RAD51L1 gene, was associated with total mortality in both the discovery stage (P = 1.44 × 10−8) and replication stage (P = 0.06; P-combined = 1.17 × 10−7). Adjusted HRs for total mortality were 1.41 [95% confidence interval (CI), 1.18–1.68] for the AG genotype and 2.64 (95% CI, 1.74–4.03) for the AA genotype, when compared with the GG genotype. The variant C allele of rs9934948, located on chromosome 16, was associated with a similarly elevated risk of total mortality (P-combined = 5.75 × 10−6). We also observed this association among 1,145 patients with breast cancer of European ancestry from the Nurses’ Health Study (NHS; P = 0.006); the association was highly significant in a combined analysis of NHS and Chinese data (P = 1.39 × 10−7). Similar associations were observed for these two SNPs with breast cancer–specific mortality. This study provides strong evidence suggesting that the RAD51L1 gene and a chromosome 16 locus influence breast cancer prognosis. Cancer Res; 72(5); 1182–9. ©2012 AACR.
Breast cancer is one of the most common malignancies among women in many countries including China. Despite generally good prognosis for patients with breast cancer, wide variation exists in survival, even after accounting for clinical prognostic factors, suggesting that genetic susceptibility may influence breast cancer outcomes. Over the past 10 years, candidate gene studies, including our own (1–5), have found several genetic variants to be related to breast cancer prognosis. These genetic variants are found primarily in breast cancer susceptibility genes (e.g., BRCA1, BRCA2, TP53; refs. 6–8) or genes involved in drug metabolism (e.g., CYP2D6, NQO1; refs. 9, 10) and tumor microenvironment regulation (e.g., TGFβ1, VEGF, CCND1, PAI1, MMP7; refs. 1–5). However, very few of these associations have been confirmed. Given that almost all previous studies used the candidate gene approach, in which only a limited number of genetic variants are investigated and the choice of variant is based on our limited knowledge of the underlying biology of cancer, more comprehensive genomic investigations of breast cancer prognosis are urgently needed. Recent genome-wide association studies (GWAS) have identified genetic variants related to breast cancer risk that have been robustly replicated across populations (11–18). Many GWAS-identified genetic markers are located in regions that had never been suspected of being related to cancer susceptibility. To our knowledge, only 2 studies have evaluated genetic factors in relation to breast cancer survival using the GWAS approach (19, 20) and both were conducted among women of European ancestry. The first study reported a single-nucleotide polymorphism (SNP) in the OCA2 gene (rs4778137) associated with total mortality among women of European ancestry with estrogen receptor (ER)-negative tumors at P = 5 × 10−4 (19). 
However, the second study, conducted as part of the Cancer Genetic Markers of Susceptibility (CGEMS) study, found no SNPs with a statistically significant association with breast cancer survival (20).
Over the past 15 years, we have conducted multiple, large-scale, population-based studies of breast cancer among Chinese women in Shanghai (14, 21, 22). Using the data collected from these studies, we evaluated lifestyle determinants of breast cancer survival (21, 23, 24). In addition to the candidate gene studies reported previously (1–5), we recently conducted a 2-stage GWAS among 6,110 patients (719 deaths) with stage I to IV breast cancer recruited in the Shanghai studies to identify novel genetic variants associated with breast cancer survival. To evaluate the generalizability of our findings to other ethnic groups, we investigated these GWAS-identified SNPs in CGEMS data from 1,145 patients of European ancestry with breast cancer (229 deaths) who participated in the Nurses’ Health Study (NHS).
Overall study design and study populations
Samples included in this GWAS came from participants of the Shanghai Breast Cancer Study (SBCS) and Shanghai Breast Cancer Survival Study (SBCSS). Details on the methodology of the parent studies have been described previously (14, 21). Briefly, the SBCS is a population-based case–control study that recruited incident patients with breast cancer and controls in urban Shanghai between August 1996 and March 1998 and again between April 2002 and February 2005 (14). A total of 3,448 patients were recruited (participation rate: 86.7%); 90.6% of participants provided a blood or exfoliated buccal cell sample. The SBCSS also was conducted in urban Shanghai and recruited 5,042 patients with breast cancer between March 2002 and April 2006 (participation rate: 80.1%); 98% of patients provided an exfoliated buccal cell sample (14, 21). All participants of both studies provided written informed consent before participating in the study and the Institutional Review Boards of all institutes involved approved the study protocols. Medical charts for patients with breast cancer were reviewed to verify cancer diagnosis and obtain treatment information. Patients with cancer have been followed for survival status and breast cancer recurrence through a combination of record linkages with the Shanghai Vital Statistics Registry and in-person surveys. Because of a time overlap during recruitment, 1,469 women participated in both the SBCS and SBCSS. After taking these overlaps into consideration and excluding patients with stage 0 disease (n = 190), those for whom we had no information on survival status (n = 185), and genotyping failures due to limited DNA (n = 17), a total of 6,110 participants remained in the present study.
The discovery stage of this study included 1,950 participants. Genomic DNA samples were genotyped primarily using the Affymetrix Genome-Wide Human SNP Array 6.0. From the discovery stage, 2 batches of SNPs were selected for replication in an independent set of samples from the Shanghai studies. Criteria used to select SNPs for validation were as follows: (i) P ≤ 0.001 under the additive model for either the total mortality (overall survival) or breast cancer recurrence (disease-free survival) analysis; for SNPs that are on or close to metastasis genes or genes previously indicated in breast cancer prognosis, the P value was relaxed to ≤0.01; (ii) minor allele frequency (MAF) > 10%; (iii) exhibited high-quality genotype cluster plots; and (iv) in regions where multiple SNPs in linkage disequilibrium (LD; r2 ≥ 0.6) met the above criteria, the SNP with the lowest P value was chosen. The first batch (the top 30 SNPs not in LD) was selected when GWAS data were available for 1,436 participants. Twenty-nine of these SNPs were successfully genotyped in an independent set of 3,881 Shanghai study participants. The second batch (the top 20 SNPs not in LD and not overlapping with those included in the first batch) was selected after an additional 514 patients with breast cancer (for a total of 1,950 participants included in the discovery stage) were scanned. These 20 SNPs were genotyped in an independent set of 4,160 Shanghai study participants. Among women who participated in the 2 batches of validation studies, 3,522 overlapped.
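The selection rules above amount to a per-SNP filter followed by LD pruning. The Python sketch below is illustrative only: the `Snp` record, its field names, the `ld_block` label (a stand-in for grouping SNPs with pairwise r2 ≥ 0.6), the placeholder SNP names, and the ranking of the final list by P value are assumptions that are not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    p_additive: float          # smaller additive-model P of the two outcomes
    maf: float                 # minor allele frequency
    good_cluster_plot: bool    # criterion (iii), judged by inspection
    near_prognosis_gene: bool  # on/near a metastasis or prognosis gene
    ld_block: str              # hypothetical label shared by SNPs with r2 >= 0.6


def select_for_replication(snps, top_n):
    """Apply criteria (i)-(iv) and return up to top_n SNPs ranked by P value."""
    passing = []
    for s in snps:
        # Criterion (i): P <= 0.001, relaxed to 0.01 near prognosis genes.
        p_cutoff = 0.01 if s.near_prognosis_gene else 0.001
        # Criteria (ii) and (iii): MAF > 10% and a clean cluster plot.
        if s.p_additive <= p_cutoff and s.maf > 0.10 and s.good_cluster_plot:
            passing.append(s)

    # Criterion (iv): keep only the lowest-P SNP within each LD block.
    best = {}
    for s in passing:
        current = best.get(s.ld_block)
        if current is None or s.p_additive < current.p_additive:
            best[s.ld_block] = s

    return sorted(best.values(), key=lambda s: s.p_additive)[:top_n]


# Made-up SNPs for illustration; none of these values come from the study.
candidates = [
    Snp("snpA", 2e-4, 0.25, True, False, "blk1"),
    Snp("snpB", 5e-5, 0.30, True, False, "blk1"),   # best in blk1
    Snp("snpC", 5e-3, 0.20, True, True, "blk2"),    # passes relaxed cutoff
    Snp("snpD", 5e-3, 0.20, True, False, "blk3"),   # fails P cutoff
    Snp("snpE", 1e-5, 0.08, True, False, "blk4"),   # fails MAF
    Snp("snpF", 1e-4, 0.40, False, False, "blk5"),  # fails cluster plot
]
selected = select_for_replication(candidates, top_n=30)
```

In this toy input only `snpB` and `snpC` survive, which shows how the relaxed threshold and the LD rule interact.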
To explore the generalizability of the study findings to other ethnic groups, we selected 8 SNPs with possible associations with breast cancer survival in the Shanghai studies and evaluated their associations with breast cancer survival in European-ancestry Americans using data from the CGEMS project. These SNPs were chosen on the basis of the significance level of the association found in the discovery stage and/or the consistency of the associations observed in both discovery and replication stages. The CGEMS project included 1,145 postmenopausal breast cancer cases from the NHS whose DNA samples were scanned using the Illumina HumanHap500 Array (12, 16). The NHS is a prospective cohort of 121,700 registered nurses who resided in 11 U.S. states and enrolled in the study in 1976. Follow-up was conducted by personal mailings and searches of the National Death Index (20).
Genotyping and quality control procedures
We included 3 positive quality control (QC) samples purchased from Coriell Cell Repositories (Coriell Institute, Camden, NJ) and a negative QC sample (water) in each of the 96-well test plates. The average concordance among the QC samples was 99.85% (median, 100%). The gender of all scanned samples was confirmed. Genetically identical, unexpected duplicated samples were excluded, as were close relatives with a pairwise proportion of identity-by-descent estimate >0.25. Multidimensional scaling (MDS) analyses of pooled data including 210 unrelated HapMap subjects together with our study data showed that all our study participants clustered closely with HapMap Asians. All samples with a call rate < 95% were excluded. In addition, each SNP met the following inclusion criteria: (i) MAF ≥ 5%; (ii) call rate ≥ 95%; (iii) P for Hardy–Weinberg equilibrium (HWE) ≥ 0.000001; and (iv) concordance ≥ 95% among duplicated QC samples. After exclusions, 607,728 SNPs from batch 1 and 613,031 SNPs from batch 2 were available for the statistical analyses used to select promising SNPs for replication. The genotyping and QC protocols for the CGEMS project are described in detail elsewhere (12, 16).
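The per-SNP inclusion criteria can be expressed as a single predicate over genotype counts. The sketch below is a minimal illustration under stated assumptions: the text does not say which HWE test was used, so a 1-df chi-square goodness-of-fit test is assumed here, and the function names and argument layout are hypothetical.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: the study's actual HWE test is not stated; an exact test
    is a common alternative.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing, qc_concordance):
    """Criteria (i)-(iv): MAF >= 5%, call rate >= 95%, HWE P >= 1e-6,
    and concordance >= 95% among duplicated QC samples."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (
        maf >= 0.05
        and call_rate >= 0.95
        and hwe_chi2_p(n_aa, n_ab, n_bb) >= 1e-6
        and qc_concordance >= 0.95
    )
```

For example, genotype counts of 490/420/90 with no missing calls pass all four checks, whereas 700/0/300 (no heterozygotes) fails the HWE check.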
Genotyping for the replication stage was conducted on the iPLEX Sequenom MassARRAY Platform. PCR and extension primers were designed by using the MassARRAY Assay Design 3.0 Software (Sequenom, Inc). PCR and extension reactions were carried out according to the manufacturer's instructions, and extension product sizes were determined by mass spectrometry using the Sequenom iPLEX System. In each 96-well plate, 2 negative controls (water), 2 blinded duplicates, and 2 samples from the HapMap project were included. The concordance was 100% for all SNPs for both the blinded duplicates and the HapMap samples. We also included 65 participants who were genotyped by using the Affymetrix 6.0 Array on the Sequenom genotyping platform and found 100% consistency for data generated on these 2 platforms. All SNPs showed high call rates (>95%).
A set of 4,305 SNPs (not in LD) with an MAF > 35% and at least 100 kb apart was selected to evaluate the population structure. The inflation factor λ was estimated to be 1.045, suggesting that population substructure, if present, should not have any appreciable effect on the results.
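The text does not describe how λ was computed. A common definition, assumed in the sketch below, is the median of the observed 1-df chi-square association statistics divided by the median of the chi-square distribution with 1 df (about 0.455). The function name and the choice to start from two-sided P values are illustrative assumptions.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df (0.67449 squared).
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(p_values):
    """Genomic inflation factor from two-sided P values in (0, 1]."""
    norm = NormalDist()
    # A two-sided P value p corresponds to chi2 = z**2 with z = Phi^-1(p / 2).
    chi2_stats = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced P values mimic a null distribution, so lambda is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = inflation_factor(null_like)
```

Under this definition, a value such as the reported 1.045 means the median test statistic is about 4.5% larger than expected under the null.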
Outcomes of the study were total mortality (for the overall survival analysis) and breast cancer recurrence (for the disease-free survival analysis). For the overall survival analysis, follow-up time was calculated as the number of days between the date of cancer diagnosis and the date of death or, for survivors, the date of last record linkage. For the disease-free survival analysis, follow-up time was calculated as the number of days between the date of cancer diagnosis and the date of disease recurrence or, for women who neither had a recurrence nor died of breast cancer, the date of last survey. For 62 women who died of breast cancer but were missing information on disease recurrence, we imputed the date for recurrence on the basis of the tumor–node–metastasis (TNM) stage-specific recurrence rate estimated for the current study. Delayed-entry Cox proportional hazards regression models were used to derive HRs for total mortality and breast cancer recurrence in association with each SNP with adjustment for age. Additional adjustments for known clinical predictors for breast cancer prognosis, including breast cancer stage (TNM); estrogen/progesterone receptor (ER/PR) status; and ever use of chemotherapy, radiotherapy, and tamoxifen did not materially change the results. We also examined the influence of population substructure by adjusting for the first 5 principal components derived on the basis of 196,471 SNPs with a pairwise LD of r2 < 0.2 that were selected using PLINK (25, 26). We observed no appreciable changes in study results (data not shown). Thus, the results presented in this article were not adjusted for population substructure.
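To make the delayed-entry Cox model concrete, the sketch below fits a per-allele log HR by Newton-Raphson on the partial likelihood, with a subject counted in the risk set at time t only if entry < t ≤ exit. It is a simplified illustration, not the study's analysis: it uses a single covariate (allele count) without the age adjustment, assumes Breslow handling of ties, treats the entry time as the interval from diagnosis to enrollment, and runs on made-up data. The statistical software used in the study is not named in the text.

```python
import math


def cox_log_hr(entry, exit_, event, x, n_iter=25, tol=1e-10):
    """One-covariate Cox model with delayed entry (left truncation).

    entry, exit_: start and end of observation, in days since diagnosis.
    event: 1 if the subject died at exit_, 0 if censored.
    x: covariate, here the number of risk alleles (0, 1, or 2).
    Requires at least one event and some covariate variation in a risk set.
    Returns (beta, standard error); exp(beta) is the per-allele HR.
    """
    event_idx = [i for i, d in enumerate(event) if d]
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for i in event_idx:
            t = exit_[i]
            s0 = s1 = s2 = 0.0
            for j in range(len(x)):
                # Risk set: already entered and not yet failed or censored.
                if entry[j] < t <= exit_[j]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] ** 2
            mean = s1 / s0
            score += x[i] - mean          # first derivative of log-likelihood
            info += s2 / s0 - mean ** 2   # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1 / math.sqrt(info)


# Made-up toy cohort: carriers of the risk allele tend to die earlier.
entry = [0, 0, 0, 0, 0, 2, 1, 0, 3, 0]
exit_ = [5, 8, 10, 12, 15, 7, 20, 25, 9, 30]
event = [1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
x = [2, 1, 0, 1, 0, 0, 1, 0, 2, 0]

beta, se = cox_log_hr(entry, exit_, event, x)
hr = math.exp(beta)
```

In practice an established survival analysis package would be used; the loop above only shows where the delayed-entry condition enters the risk set.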
Clinical characteristics of study participants are presented in Table 1. Patients included in the discovery stage were younger, more likely to have late-stage disease, and more likely to have received chemotherapy, radiotherapy, or tamoxifen than those included in the replication stage, and they had lower 5-year survival rates. These differences reflect differences in study enrollment criteria (the SBCS contributed the majority of cases to the discovery stage and disproportionately recruited younger women with breast cancer) and, possibly, temporal changes in breast cancer treatment protocols and outcomes.
Of the top 50 SNPs chosen for replication, 49 SNPs were successfully genotyped. Associations for all SNPs with breast cancer outcomes and P values for HWE tests are presented in Supplementary Table S1. A nominally statistically significant (P ≤ 0.05) or marginally significant (P < 0.06) association with total mortality was observed for 4 SNPs: rs3784099, rs9934948, rs729438, and rs1769441, and the directions of the associations were consistent in both the discovery and replication stages (Supplementary Table S1). For 2 SNPs, rs3784099 on chromosome 14 and rs9934948 on chromosome 16, the P value for the combined analysis reached 1.17 × 10−7 and 5.75 × 10−6, respectively (Table 2). SNP rs3784099 was associated with total mortality with a per-allele HR of 1.79 [95% confidence interval (CI), 1.46–2.19; Ptrend = 1.44 × 10−8] in the discovery stage, 1.22 (95% CI, 0.99–1.52; Ptrend = 0.06) in the replication stage, and 1.49 (95% CI, 1.28–1.72; Ptrend = 1.17 × 10−7) for all samples combined. In the recurrence analyses, the Ptrend for rs3784099 was of marginal statistical significance in the replication stage (P = 0.07), and the direction of the association was consistent with the discovery stage. In the combined analysis, the per-allele HR for recurrence for rs3784099 was 1.43 (95% CI, 1.25–1.64; Ptrend = 2.83 × 10−7). SNP rs9934948 showed a statistically significant association with total mortality (P = 0.03) but was not significantly associated with recurrence (P = 0.32) in the replication stage. In combined analyses, per-allele HRs were 1.29 (95% CI, 1.16–1.44; Ptrend = 5.75 × 10−6) for total mortality and 1.19 (95% CI, 1.08–1.31; Ptrend = 7.32 × 10−4) for recurrence. Regional association plots for these 2 SNPs are presented in Figs. 1 and 2. The associations of these 2 SNPs with total mortality and recurrence did not vary by ER or menopausal status (Table 3). The vast majority of deaths among study participants were due to breast cancer (88%). 
In analyses of breast cancer–specific mortality, associations similar to those for total mortality were observed (HR, 1.45; 95% CI, 1.24–1.70; Ptrend = 3.8 × 10−6 for rs3784099 and HR, 1.27; 95% CI, 1.13–1.43; Ptrend = 6.0 × 10−5 for rs9934948), although the P values increased because of the decrease in the number of events (data not shown in tables).
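The reported HRs, confidence intervals, and trend P values can be cross-checked against one another. The sketch below assumes a Wald test on the log HR scale with a symmetric 95% CI, which is the usual convention for Cox models but is not stated explicitly in the text. The function name is illustrative.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def p_from_hr_ci(hr, lower, upper):
    """Approximate Wald z statistic and two-sided P from an HR and 95% CI."""
    log_hr = math.log(hr)
    # The CI spans 2 * 1.96 standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


# Combined per-allele estimate for rs3784099 and total mortality.
z, p = p_from_hr_ci(1.49, 1.28, 1.72)
```

For the combined rs3784099 estimate this gives a P value on the order of 10−7, in line with the reported Ptrend; small differences are expected because the published HR and CI are rounded to two decimals.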
SNP rs9934948 was associated with total mortality in European-ancestry Americans and the direction of association was the same as that observed in the Shanghai studies (Supplementary Table S2). The age-adjusted HRs were 3.27 (95% CI, 0.75–14.19) for the CT genotype and 4.70 (95% CI, 1.11–19.97) for the CC genotype compared with the TT genotype (Ptrend = 0.006). Meta-analyses combining Shanghai samples and CGEMS data showed a combined P value of 1.39 × 10−7. Data on recurrence were not available for CGEMS participants.
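The pooling method used for the meta-analysis is not described in the text. A common choice, assumed in the sketch below, is fixed-effect inverse-variance weighting of log HRs. Because the per-allele CGEMS estimate is not given here, the example instead pools the discovery and replication per-allele estimates reported for rs3784099; the function names are illustrative.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_hr_and_se(hr, lower, upper):
    """Recover the log HR and its standard error from an HR and 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return math.log(hr), se


def fixed_effect_pool(estimates):
    """Inverse-variance weighted mean of (log HR, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Stage-specific per-allele estimates for rs3784099 (total mortality).
stages = [
    log_hr_and_se(1.79, 1.46, 2.19),  # discovery stage
    log_hr_and_se(1.22, 0.99, 1.52),  # replication stage
]
beta, se = fixed_effect_pool(stages)
pooled_hr = math.exp(beta)
ci_low = math.exp(beta - Z_975 * se)
ci_high = math.exp(beta + Z_975 * se)
```

Under these assumptions the pooled HR is roughly 1.49 with a 95% CI of about 1.29 to 1.73, close to the combined estimate reported for all Shanghai samples. This agreement does not establish which method the combined analysis actually used.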
In this 2-stage GWAS of breast cancer survival conducted among Chinese women, we found strong evidence for an association of SNP rs3784099 with total mortality, recurrence, and breast cancer–specific mortality. This SNP is located on chromosome 14 in intron 7 of the RAD51L1 gene, an established cancer susceptibility gene (27). The RAD51L1 gene encodes a protein that is part of the RAD51 family, which is essential for DNA repair by homologous recombination. Overexpression of this gene has been shown to cause cell-cycle delay and apoptosis (27, 28). The RAD51L1 gene is not ubiquitously expressed, but it is significantly expressed in breast cancer–derived MCF7 cells (29). A recent GWAS identified a SNP in this gene, rs999737, to be associated with breast cancer risk (16). SNP rs999737, however, was not related to breast cancer survival in our study (data not presented), nor is it in LD with SNP rs3784099 (r2 = 0 in Asians and r2 = 0.032 in Europeans based on HapMap data).
SNP rs3784099 is also associated with differential expression of 2 other genes involved in cancer, SNCG and CTF1, according to the SCAN database (30), which uses HapMap human lymphoblastoid cell lines to identify expression quantitative trait loci (eQTL; ref. 31). Both genes yielded a P value of 0.0001 in cell lines of European ancestry (CEU), although the specific allele of rs3784099 responsible for increased/decreased expression is not apparent in this resource. The SNCG gene encodes synuclein gamma, which is also known as breast cancer–specific protein 1 (32). Upregulation of the SNCG gene has been shown to enhance cancer cell motility and to contribute to cancer cell survival (32). There are indications that the SNCG gene may be involved in late-stage breast and ovarian cancer metastasis by enhancing cell motility through activation of RHO family small GTPases and extracellular signal–regulated kinases (32, 33). Overexpression of the SNCG gene is a marker for breast cancer progression and a potential target for breast cancer treatment (32, 34). The CTF1 gene encodes a transcription factor that can delimit chromatin boundaries and thereby block the propagation of silent chromatin (35). These data provide additional support for the association between rs3784099 and breast cancer outcomes observed in our study.
SNP rs9934948 resides on chromosome 16, in the middle of a gene desert with its nearest neighboring genes, ZFHX3 and PSMD7, 346 and 891 kb away, respectively. ZFHX3 is one of the homeobox genes that are often located in gene deserts. PSMD7 is a proteasome component (36) and has been previously shown to be one of the genes most impacted by siRNA knockdown of the ER in MCF cells (37). Proteasome activity is increased in tumor cells, resulting in increased turnover rates for signaling molecules that are involved in the regulation of cell growth and apoptosis (38). These biologic links and the strong association of this SNP with total mortality observed among breast cancer survivors of European ancestry in CGEMS data support a possible role for rs9934948 in breast cancer prognosis.
To date, only one GWAS-identified SNP, rs4778137, has been associated with breast cancer survival, although the association for this SNP did not reach the conventional genome-wide significance level of 5 × 10−8 (only 5 × 10−4; ref. 19). We evaluated this SNP using the scanned data from our discovery stage and found that rs4778137 was significantly associated with total mortality (per-allele HR, 1.25; 95% CI, 1.03–1.51; Ptrend = 0.02; data not shown in tables). The association was observed predominantly among premenopausal women (per-allele HR, 1.29; 95% CI, 1.02–1.64) and women with ER-positive breast cancer (per-allele HR, 1.27; 95% CI, 0.96–1.68). Thus, our results provide some support for the association identified by the previous GWAS conducted among women of European ancestry.
Given the difference in genetic architecture across ethnic groups, disease-associated SNPs identified in one population are often not replicated directly in another population. In a recent study conducted among approximately 6,000 female Chinese patients with cancer and controls in Shanghai, only 8 of the 12 breast cancer risk SNPs identified in women of European ancestry could be directly replicated (39). Therefore, it is not surprising that the top SNP identified by our study, rs3784099, was not directly replicated in the CGEMS data. Differences in study eligibility could also have contributed to the lack of replication. For example, CGEMS only included postmenopausal women, whereas the SBCS oversampled younger patients with breast cancer. However, we did not find the association of rs3784099 to be modified by menopausal status. On the other hand, replication of an association in other ethnic groups, as is the case with rs9934948, provides additional evidence for a true association.
In summary, we found that genetic variants in the RAD51L1 gene and on chromosome 16 were associated with survival among patients with breast cancer. Additional research on the genetic regions and genes identified by our study may lead to a better understanding of the biologic mechanisms responsible for breast cancer progression and survival.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
X.O. Shu and W. Zheng conceived the study.
X.O. Shu, W. Lu, Y. Zheng, Y.-T. Gao, W. Zheng, and Q. Cai contributed to the study design.
X.O. Shu, W. Lu, Y. Zheng, Y.-T. Gao, W. Zheng, Q. Cai, J. Cheng, K. Gu, and W.-J. Wang contributed to study implementation.
X.O. Shu, J. Long, H. Cai, C. Li, and W.Y. Chen contributed to data analysis.
X.O. Shu, C. Li, P. Kraft, Q. Cai, W.Y. Chen, and W. Zheng contributed to data interpretation.
X.O. Shu, W. Zheng, and Q. Cai obtained funding for the study.
W.Y. Chen, W. Zheng, and P. Kraft provided critical review of the manuscript.
X.O. Shu, J. Long, and R. Delahanty contributed to the writing of the manuscript.
J. Shi conducted the genotyping.
X.O. Shu confirms that she had full access to all of the study data and bears final responsibility for the decision to submit the manuscript.
This work was supported by the U.S. NIH/National Cancer Institute (grant numbers: R01CA118229 to X.O. Shu and R01CA124558, R01CA064277, and R01CA090899 to W. Zheng), as well as the U.S. Department of Defense (DOD) Breast Cancer Research Program (Idea Awards BC011118 to X.O. Shu and BC050791 to Q. Cai). Sample preparation and discovery stage genotyping were conducted at the Survey and Biospecimen and Microarray Shared Resources, which are supported, in part, by the Vanderbilt-Ingram Cancer Center (P30 CA68485).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The authors thank study participants and research staff for their contributions and commitment to this project, Regina Courtney for DNA preparation, and Bethanie Rammer for editorial support in the preparation of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or Department of Defense.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received July 29, 2011.
- Revision received December 20, 2011.
- Accepted December 27, 2011.
- ©2012 American Association for Cancer Research.