The CYP19A1 gene encodes the enzyme aromatase, which is responsible for the final step in the biosynthesis of estrogens. In this study, we used a systematic two-step approach that included gene resequencing and a haplotype-based analysis to comprehensively survey common genetic variation across the CYP19A1 locus in relation to circulating postmenopausal steroid hormone levels and breast cancer risk. This study was conducted among 5,356 invasive breast cancer cases and 7,129 controls, primarily White women of European descent, drawn from five large prospective cohorts within the National Cancer Institute Breast and Prostate Cancer Cohort Consortium. A high-density single-nucleotide polymorphism (SNP) map of 103 common SNPs (≥5% frequency) was used to identify the linkage disequilibrium and haplotype patterns across the CYP19A1 locus, and 19 haplotype-tagging SNPs were selected to provide high predictability of the common haplotype patterns. We found haplotype-tagging SNPs and common haplotypes spanning the coding and proximal 5′ region of CYP19A1 to be significantly associated with a 10% to 20% increase in endogenous estrogen levels in postmenopausal women [effect per copy of the two-SNP haplotype rs749292-rs727479 (A-A) versus noncarriers; P = 4.4 × 10⁻¹⁵]. No significant associations were observed, however, between these SNPs or common haplotypes and breast cancer risk. Thus, although genetic variation in CYP19A1 produces measurable differences in estrogen levels among postmenopausal women, the magnitude of the change was insufficient to contribute detectably to breast cancer. [Cancer Res 2007;67(5):1893–7]
- breast cancer
A doubling of postmenopausal endogenous estrogen levels is estimated to increase breast cancer risk by ∼30% (1). After menopause, estrogen biosynthesis takes place predominantly in adipose tissue and is catalyzed by the aromatase enzyme (encoded by the gene CYP19A1), which converts androgens to estrogens. CYP19A1 spans ∼123 kb at 15q21.1 and comprises nine coding exons (II–X) and several alternative, untranslated first exons that are expressed under the control of tissue-specific promoters (2). Studies have suggested that variation at this locus may be a marker of circulating estrogen levels in postmenopausal women and contribute to risk of breast cancer (3–8). Results, however, have been inconsistent, and most studies have been underpowered to detect the modest magnitude of effect anticipated for a common low-penetrance allele or have focused on a limited number of markers across the locus. Here we apply a combination of genomic approaches, namely resequencing of the coding exons and a linkage disequilibrium (LD)–based analysis, to systematically search for markers of circulating steroid hormones and breast cancer risk at the CYP19A1 locus.
Materials and Methods
This study includes five breast cancer case-control studies nested within established prospective cohorts: the American Cancer Society Cancer Prevention Study II (9), the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort (10), the Harvard Nurses' Health Study (NHS; ref. 11), the Women's Health Study (12), and the Multiethnic Cohort Study (MEC; ref. 13), which together represent the National Cancer Institute Breast and Prostate Cancer Cohort Consortium (Supplementary Materials). The nested case-control studies comprised 5,356 invasive breast cancer cases and 7,129 controls. Cases were identified in each cohort by self-report with subsequent confirmation of the diagnosis from medical records or tumor registries and/or linkage with population-based tumor registries. Within each cohort, controls were individually or frequency matched to cases by ethnicity, age, and, in EPIC, country of residence. Eighty percent of cases were postmenopausal, and with the exception of the MEC (which also comprises African Americans, Latinos, Japanese Americans, and Native Hawaiians), at least 93% of women in each cohort were of European descent (range, 93–100%). Thus, our study focused on genetic variation at the CYP19A1 locus that is common among Whites of European ancestry.
Levels of estradiol (E2), estrone (E1), testosterone (T), and androstenedione (A) had been measured in prospectively collected samples (E2, N = 3,431; E1, N = 3,146; T, N = 3,435; A, N = 3,224) as part of previous studies among postmenopausal women within the EPIC (14), NHS (15), and MEC (16) cohorts. Because the hormone fractions were measured in different laboratories by either direct or indirect RIA, results are presented as the percentage change in geometric mean levels by genotype and haplotype class.
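As an illustration of how a percentage change in geometric mean levels can be computed, the following minimal Python sketch works on the natural-log scale. The function names are hypothetical and the approach is a generic one; the models actually fitted in this study (including any covariates and study adjustment) are not reproduced here.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive hormone measurements."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def percent_change_in_geometric_mean(carriers, noncarriers):
    """Percentage difference in geometric mean, carriers versus noncarriers."""
    ratio = geometric_mean(carriers) / geometric_mean(noncarriers)
    return 100.0 * (ratio - 1.0)

def percent_change_from_log_beta(beta):
    """Convert a regression coefficient estimated on natural-log hormone
    levels into a percentage change in geometric mean per unit of the
    predictor (for example, per haplotype copy)."""
    return 100.0 * (math.exp(beta) - 1.0)
```

Expressing effects as percentage changes makes results comparable across laboratories whose assays differ in absolute calibration, which is the motivation given in the text.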
We initially resequenced the coding exons of CYP19A1 in a panel of 95 women in the MEC (19 each from African Americans, Latinos, Japanese Americans, Native Hawaiians, and Whites) with invasive, nonlocalized breast cancer. This panel provided ≥85% power to detect a SNP with an ethnic-specific minor allele frequency of ≥5% or an overall minor allele frequency of ≥1%. In the haplotype analysis, LD structure of CYP19A1 was determined by genotyping 103 common SNPs and two rare missense SNPs (R264C and T201M) among a panel of 349 women from the MEC (including 70 Whites). Seventy-four SNPs had been used to characterize LD patterns at this locus in a previous study (6). Markers were distributed over a 181.0-kb region, from 22.5 kb 5′ of exon I.1 through 29.4 kb 3′ of the 3′ untranslated region (UTR; average SNP spacing 1.76 kb; Supplementary Table S1). The CYP19A1 locus has been shown to contain four blocks of LD (Fig. 1; ref. 6). Within blocks, haplotype frequency estimates were constructed from genotype data for Whites using the expectation-maximization algorithm, and 19 tag SNPs were selected to predict (rH² ≥ 0.7) 15 common haplotypes (frequency ≥0.05; Supplementary Table S2; refs. 6, 17, 18).
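The power figures for the resequencing panel can be illustrated with a short calculation. The sketch below assumes, as a simplification not stated in the text, that "detecting" a SNP means observing at least one copy of the minor allele among the 2N chromosomes sampled, with chromosomes treated as independent draws; the function name is hypothetical.

```python
def detection_power(minor_allele_frequency, n_individuals):
    """Probability of observing at least one copy of the minor allele
    among 2 * n_individuals independently sampled chromosomes."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - minor_allele_frequency) ** n_chromosomes

# Ethnic-specific panel: 19 women, minor allele frequency 5%.
power_ethnic_specific = detection_power(0.05, 19)
# Full panel: 95 women, overall minor allele frequency 1%.
power_overall = detection_power(0.01, 95)
```

Under these assumptions both values come out slightly above 0.85, in line with the ≥85% power quoted for the panel.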
Tag SNPs were genotyped using the TaqMan assay (Applied Biosystems, Foster City, CA) in three Breast and Prostate Cancer Cohort Consortium laboratories. Details of each assay can be found online. In each laboratory, assays were run on a designated set of 94 samples; concordance across laboratories was 99.6%. Genotype data quality from each laboratory was assessed by typing 5% to 10% blinded samples; the concordance rates were >99.7%. No deviation from Hardy-Weinberg equilibrium was observed (at the P < 0.01 level) across more than one study for any given SNP. The study protocol was approved by all institutional review boards.
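A check for deviation from Hardy-Weinberg equilibrium can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is a generic textbook version offered for illustration; the text does not say which test (chi-square or exact) was used, and the function name is hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes counts of the three genotypes at a biallelic SNP and returns
    (chi-square statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With the P < 0.01 criterion described above, a SNP would be flagged only if `p_value` fell below 0.01 in more than one study.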
Tag SNPs were genotyped in the HapMap CEU trios (19) to permit an assessment of coverage in relation to the HapMap database (phase II, October 2005). These 19 tag SNPs were estimated to predict (r² ≥ 0.7) 70% of all common SNPs genotyped in the HapMap CEU population across the four LD block regions.
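The coverage estimate can be illustrated with a simplified sketch. It assumes pairwise r² between each HapMap SNP and its best tag SNP, whereas the prediction used in the study may rely on multimarker (haplotype-based) measures; both function names and the example inputs are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Pairwise r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))

def coverage(best_r2_per_snp, threshold=0.7):
    """Fraction of common SNPs whose best r^2 with any tag SNP meets the threshold."""
    captured = sum(1 for r2 in best_r2_per_snp if r2 >= threshold)
    return captured / len(best_r2_per_snp)
```

For example, if 7 of 10 common SNPs in a region had a best r² of at least 0.7 with some tag SNP, `coverage` would return 0.7, which is the form of the 70% figure reported above.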
In the analysis, we estimated haplotype frequencies and expected haplotype counts (0, 1, or 2) for each individual, conditional on that individual's genotype data (20). This was conducted within each study and by race (MEC) and country (EPIC). These expectations were included in conditional logistic regression models to estimate odds ratios and confidence intervals and in linear models to evaluate associations with circulating steroid hormone levels. To control for type I error over all the SNPs and haplotypes considered, we conducted global tests for the influence of any genomic variation in a given block by entering all common haplotypes or SNPs into a multi-degree-of-freedom likelihood ratio test (for risk) or F test (for hormone levels).
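The idea of expected haplotype counts conditional on genotype can be sketched for the simplest case of two biallelic SNPs. This is a toy expectation-maximization example under Hardy-Weinberg assumptions, not the multi-SNP procedure of ref. 20; genotypes are coded as the number of copies (0, 1, or 2) of a reference allele at each SNP, and all names are hypothetical.

```python
from itertools import product

# The four possible two-SNP haplotypes; 1 marks the counted allele.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def compatible_pairs(genotype):
    """All ordered haplotype pairs whose allele counts add up to the genotype."""
    return [(h1, h2) for h1, h2 in product(HAPLOTYPES, repeat=2)
            if (h1[0] + h2[0], h1[1] + h2[1]) == tuple(genotype)]

def expected_dosages(genotype, freqs):
    """Expected number of copies (0 to 2) of each haplotype for one
    individual, given the genotype and current haplotype frequencies."""
    pairs = compatible_pairs(genotype)
    weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
    total = sum(weights)
    dosage = {h: 0.0 for h in HAPLOTYPES}
    for (h1, h2), w in zip(pairs, weights):
        dosage[h1] += w / total
        dosage[h2] += w / total
    return dosage

def em_haplotype_frequencies(genotypes, n_iter=100):
    """Estimate haplotype frequencies by EM, starting from equal frequencies."""
    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g in genotypes:                       # E step: expected counts
            for h, d in expected_dosages(g, freqs).items():
                counts[h] += d
        total = sum(counts.values())
        freqs = {h: c / total for h, c in counts.items()}  # M step
    return freqs
```

Only double heterozygotes, genotype (1, 1), are ambiguous here; for them the dosages are fractional, and it is these fractional expectations that would enter a regression model in place of hard haplotype calls.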
Results and Discussion
We found highly significant relationships (Table 1) between tag SNPs and circulating estrogen levels among postmenopausal women, with the strongest effects observed with SNPs in LD blocks 3 and 4 (P values for global effects by block for E2/E1: block 1, 0.24/0.03; block 2, 0.16/1.4 × 10⁻⁵; block 3, 7.2 × 10⁻⁹/6.2 × 10⁻⁹; block 4, 5.9 × 10⁻¹¹/1.3 × 10⁻¹²). Some of the tagging SNPs in LD blocks 3 and 4 were highly correlated [r² ≥ 0.83 for SNPs rs749292, CV8234971 (rs6493494) and rs2414096, rs10046]. The two most significantly associated SNPs, rs749292 (A allele) and rs727479 (A allele), were only modestly correlated with each other (r² = 0.46), and each remained significantly associated with circulating levels when modeled concurrently (E2, P = 2.1 × 10⁻³ and P = 3.1 × 10⁻⁴; E1, P = 0.047 and P = 6.1 × 10⁻⁸, respectively). A two-SNP haplotype (A-A) composed of these SNPs was found to be a more significant predictor of E2 and E1 levels (P = 4.6 × 10⁻¹³ and P = 4.4 × 10⁻¹⁵, respectively); however, it accounted for ≤2.0% of the variation in estrogen levels. Haplotype G-A (composed of these two SNPs) was also associated with levels when modeled together with haplotype A-A (E2, P = 0.012; E1, P = 4.2 × 10⁻⁵), indicating that neither these multimarker haplotypes nor the tagging SNPs fully capture the effects of the functional allele at this locus. These effects were observed in all three cohorts that measured hormone levels (Table 2). Blocks 3 and 4 span a >67-kb region encompassing the translated region, 3′ UTR, and promoters I.2, I.6, I.3, and PII.
Haplotypes in blocks 3 and 4 were also strongly associated with estrogen levels (P values for global haplotype effects by block for E2/E1: block 1, 0.11/0.01; block 2, 0.07/4.3 × 10⁻⁵; block 3, 1.4 × 10⁻⁹/9.1 × 10⁻¹¹; block 4, 1.1 × 10⁻¹⁰/7.5 × 10⁻¹⁴). The magnitude of the associations was similar to that of the individual tagging SNPs in these blocks (Supplementary Tables S3 and S4). These SNPs and haplotypes were not significantly associated with androgen levels, whereas associations with androgen-to-estrogen conversion ratios (E2/T and E1/A), which were evaluated as a measure of enzyme activity, mirrored those of E2 and E1 (Supplementary Tables S5 and S6). Significant associations were also observed with SNPs/haplotypes in blocks 1 and 2, although the effects were substantially smaller than those observed in blocks 3 and 4 and were most likely due to LD with the tagging SNPs in blocks 3 and 4.
In the case-control analyses, none of the SNPs or haplotypes found to be related to postmenopausal E2 and E1 levels were significantly associated with breast cancer risk (Table 1). The odds ratio for the rs749292-rs727479 A-A haplotype (versus G-C haplotype) was 0.99 [95% confidence interval (95% CI), 0.94–1.05; P = 0.74]. No SNPs or haplotypes in any other LD block were consistently associated with risk (Supplementary Tables S7 and S8). Global likelihood ratio tests to control for type I error over all SNPs or haplotypes in each block were not significant (P ≥ 0.32). Associations were similar when limiting the analysis to cases with more advanced breast cancer (n = 1,648). Two correlated haplotypes (r² = 0.56) in blocks 2 and 3 (and associated tagging SNPs), although showing no overall association, seemed to display heterogeneous effects (P < 0.01) when viewed separately for each cohort (Supplementary Table S9). Aside from the MEC, which did not contribute to the observed heterogeneous effects, these cohorts largely represent populations of European ancestry. Thus, the heterogeneity of the results is unlikely to represent population differences and is more likely to reflect random variation than true biological differences. This highlights the importance of large-scale studies to resolve inconsistent findings from single small studies.
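The relation between an odds ratio, its 95% CI, and a Wald P value can be sketched as follows. This is a generic back-calculation on the log-odds scale, not the conditional logistic regression fitted in the study; because the published estimates are rounded, it can only be expected to agree approximately with the reported P value. The function name is hypothetical.

```python
import math

def wald_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover the standard error of log(OR), the Wald z statistic, and
    the two-sided P value from an odds ratio and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p_value

# Rounded values quoted in the text for the A-A haplotype.
se, z, p_value = wald_from_or_ci(0.99, 0.94, 1.05)
```

With these rounded inputs the back-calculated P value is clearly nonsignificant and of the same order as the reported one.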
No novel nonsynonymous SNPs were identified through resequencing efforts. SNPs R264C (rs700519) and T201M (rs28757184) were not associated with breast cancer risk or circulating hormone levels in any single cohort or in the pooled analysis (Supplementary Tables S3 and S10).
In conclusion, we provide compelling statistical evidence that the CYP19A1 locus harbors genetic variation that is associated with circulating estrogen levels among postmenopausal women. In a study among 1,747 postmenopausal women that examined only two polymorphisms at this locus, Dunning et al. (7) found SNP rs10046 to be a modest predictor of E2 levels (∼6% mean increase per allele, P = 2 × 10⁻⁴). In our more comprehensive analysis of common variation spanning the entire CYP19A1 locus, we not only corroborated the observation with this particular SNP but also identified even stronger markers of circulating postmenopausal hormone levels at this locus. Whereas through resequencing we have ruled out nonsynonymous variants, the identity of the functional variant at this locus remains unknown. Screening for variation in noncoding and regulatory regions spanning LD blocks 3 and 4 among individuals with the multimarker A-A haplotype (rs749292-rs727479) should be done to enumerate all possible functional alleles at this locus.
Based on previous epidemiologic studies (1), one can estimate that a functional variant associated with a 10% to 20% increase in postmenopausal estrogen levels for heterozygous or homozygous carriers would generate, respectively, approximately a 1.07-fold or 1.15-fold increase in risk of breast cancer (assuming log linearity between levels and risk) and, therefore, be potentially detectable in a study of this size. Our combined analysis of five cohort studies, although extremely large by traditional standards, is still not sufficiently powered to consistently detect effects of <1.10 per allele, even for SNPs or haplotypes with very high frequency. If variants in several genes affect estrogen exposure in a similar fashion, then the combination might have a more measurable effect on breast cancer risk. Whether aromatase inhibitors used as hormonal therapy or as chemoprevention may be more effective among women with modestly elevated estrogen profiles as determined by CYP19A1 genotype should be examined.
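The log-linear extrapolation from hormone levels to risk can be sketched as below. The sketch assumes only the ∼30% increase in risk per doubling of levels cited from ref. 1 and log linearity between levels and risk; the function name is hypothetical, and how the fold increases quoted above map onto per-copy or per-genotype changes in levels is not restated in the text, so the fold change is left as an input.

```python
import math

def relative_risk_for_fold_change(fold_change, rr_per_doubling=1.3):
    """Relative risk implied by a multiplicative change in hormone level,
    assuming log(risk) is linear in log2(level)."""
    return rr_per_doubling ** math.log2(fold_change)

# Relative risks implied by a 10% and a 20% higher level.
rr_10_percent = relative_risk_for_fold_change(1.10)
rr_20_percent = relative_risk_for_fold_change(1.20)
```

Under these assumptions a 20% higher level corresponds to a relative risk of about 1.07, which illustrates why effects of this size sit near the limit of what even a pooled study of several thousand cases can detect.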
Grant support: National Cancer Institute cooperative agreements UO1-CA98233, UO1-CA98710, UO1-CA98216, and UO1-CA98758 and Intramural Research Program of NIH/National Cancer Institute, Division of Cancer Epidemiology and Genetics.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank the cohort participants.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received November 8, 2006.
- Revision received January 6, 2007.
- Accepted January 18, 2007.
- ©2007 American Association for Cancer Research.