Rare, highly penetrant germ line mutations in BRCA1 strongly predispose women to a familial form of breast and ovarian cancer. Whether common variants (either coding or noncoding) at this locus contribute to the more common form of the disease is not yet known. We tested common variation across the BRCA1 locus in African American, Native Hawaiian, Japanese, Latino, and White women in the Multiethnic Cohort Study. Specifically, 28 single nucleotide polymorphisms (SNPs) spanning the BRCA1 gene were used to define patterns of common variation in these populations. The majority of SNPs were in strong linkage disequilibrium with one another, indicating that our survey captured most of the common inherited variation across this gene. Nine tagging SNPs, including five missense SNPs, were selected to predict the common BRCA1 variants and haplotypes among the non–African American groups (five additional SNPs were required for African Americans) and genotyped in a breast cancer case-control study nested in the Multiethnic Cohort Study (cases, n = 1,715; controls, n = 2,502). We found no evidence for significant associations between common variation in BRCA1 and risk of breast cancer. Given the large size of our study population and detailed analysis of the locus, this result indicates either that common variants in BRCA1 do not substantially influence sporadic breast cancer risk, or that unmeasured heterogeneity in the breast cancer phenotype or unmeasured interactions with genetic or environmental exposures obscure our ability to detect any influence that may be present.
- sporadic breast cancer
- genetic epidemiology
One of the key discoveries in breast cancer research over the past decade was the identification of mutations in the genes BRCA1 and BRCA2. These loci were identified through linkage analysis in large families segregating multiple cases of breast cancer (1, 2). Disease-causing mutations (resulting in a truncated protein) in BRCA1 (and BRCA2) are highly penetrant, with mutation carriers having a lifetime risk of developing breast cancer of 40% to 80% (3–6). However, the proportion of breast cancer in the general population attributable to these highly penetrant BRCA1 coding mutations is small (<5%; refs. 7, 8). To date, there is little evidence that highly penetrant germ line mutations in BRCA1 occur in sporadic cases, and whether common polymorphisms at this locus influence disease risk has not yet been thoroughly evaluated.
Previous association studies of BRCA1 variants and breast cancer risk have focused primarily on variation in the coding region (9, 10). However, common germ line variation in noncoding (e.g., regulatory) regions of genes may also influence predisposition to cancer (11). The BRCA1 gene lies in a region of extensive linkage disequilibrium (>200 kb), which makes it possible to genotype a subset of variants that captures the bulk of common genetic variation at the locus (12). Dunning et al. (9) typed four missense SNPs in a study of breast and ovarian cancer in White women but found no significant association between cancer risk and any of these variants or the three common haplotypes they formed. More recently, using sequencing data (covering the majority of the locus) generated by the National Institute of Environmental Health Sciences Environmental Genome Project, Cox et al. (13) identified five common haplotypes (≥5%) that could be predicted by four tagging SNPs. Testing of these SNPs among White women in the Nurses' Health Study showed a weakly significant association of one of the haplotypes with breast cancer risk (13).
Evaluation of variation in both coding and noncoding regions across a locus has only recently become possible because of large-scale efforts to identify and catalogue single nucleotide polymorphisms (SNPs) in the genome (14, 15). In the present study, we aimed to survey common genetic variation at the BRCA1 locus in relation to breast cancer risk more comprehensively, by typing a large number of common variants and testing both single sites and the ancestral haplotypes they form in a large multiethnic breast cancer study.
Materials and Methods
The multiethnic cohort. The Multiethnic Cohort Study consists of over 215,000 men and women in Hawaii and Los Angeles (with additional African Americans from elsewhere in California) and has been described in detail elsewhere (16). In brief, the cohort is composed predominantly of a general population sample of Native Hawaiians, Japanese, and Whites in Hawaii, and of African Americans, Japanese, and Latinos in Los Angeles. Between 1993 and 1996, participants entered the Multiethnic Cohort Study by completing a 26-page self-administered mail questionnaire that requested detailed information on dietary habits, demographic factors, personal behaviors, history of prior medical conditions, family history of common cancers, and, for women, reproductive history and exogenous hormone use. Participants were between the ages of 45 and 75 when they entered the cohort.
Incident cancers in the Multiethnic Cohort Study are identified by cohort linkage to population-based cancer Surveillance, Epidemiology, and End Results registries covering Hawaii and Los Angeles County, and to the California State cancer registry covering all of California. Information on stage of disease at the time of diagnosis is also collected from the cancer registries; women were classified as having advanced breast cancer when there was evidence of dissemination beyond the breast at diagnosis (stage II or higher).
Beginning in 1994, blood samples were collected from incident breast cancer cases. At this time, blood collection was also initiated in a random sample of Multiethnic Cohort Study participants to serve as a control pool for genetic analyses in the cohort. The participation rates for providing a blood sample were 74% and 66% for cases and controls, respectively. Eligible cases in this nested breast cancer case-control study consisted of women with incident breast cancer (including second primaries) diagnosed after enrollment in the Multiethnic Cohort Study through May 31, 2003. Controls were women without breast cancer before entry into the cohort and without a diagnosis up to May 31, 2003. The breast cancer case-control study consists of 1,715 invasive breast cancer cases and 2,502 controls and was approved by the Institutional Review Boards at the University of Southern California and at the University of Hawaii.
Characterizing linkage disequilibrium and haplotypes. To characterize variation, we implemented a haplotype-based approach to examine common variation throughout the BRCA1 gene, as described in detail previously (17). Our goal was to capture the common haplotype patterns across the BRCA1 locus. To do this, we surveyed common genetic variation across 108.1 kb of the BRCA1 gene by genotyping 28 SNPs from the public SNP map, selected from the National Center for Biotechnology Information SNP database (dbSNP). This SNP map included six missense SNPs that are relatively common (≥5% in at least one ethnic group in the Multiethnic Cohort Study) and have been previously reported in the Breast Cancer Information Core (BIC) database and/or the literature (Q356R, P871L, E1038G, S1140G, K1183R, and S1613G; ref. 9). SNPs were genotyped in a multiethnic panel of 349 women in the Multiethnic Cohort Study without a history of cancer (n = 69-70 per ethnic group). This sample size ensured that any haplotype with a frequency of >5% would be represented at least once among the 138 to 140 chromosomes sampled per ethnic group with probability >99%.
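The >99% figure follows from treating each sampled chromosome as an independent draw: a haplotype of frequency p is missed on all n chromosomes with probability (1 − p)^n. A minimal Python sketch of that arithmetic, assuming independent chromosomes (no relatedness, Hardy-Weinberg equilibrium); the function name is hypothetical:

```python
def prob_observed_at_least_once(freq: float, n_chromosomes: int) -> float:
    """Probability that a haplotype with population frequency `freq` appears at
    least once among `n_chromosomes` independently sampled chromosomes."""
    # The haplotype is missed on every chromosome with probability (1 - freq) ** n.
    return 1.0 - (1.0 - freq) ** n_chromosomes


if __name__ == "__main__":
    # 69-70 women per ethnic group give 138-140 chromosomes.
    for n in (138, 140):
        print(n, prob_observed_at_least_once(0.05, n))
```

Both printed values exceed 0.999. Because the probability increases with the haplotype frequency, a haplotype at exactly 5% is the worst case for the ">5%" statement.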
The |D′| and r² statistics were used to assess pairwise linkage disequilibrium between SNPs, as described (18, 19). Linkage disequilibrium block structure was examined using the 90% confidence bounds of D′ to define sites of historical recombination between SNPs with minor allele frequencies >10% (20).
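For two biallelic SNPs with alleles A/a and B/b, D = p_AB − p_A·p_B; |D′| rescales |D| by the largest value attainable given the allele frequencies, whereas r² = D² / (p_A·p_a·p_B·p_b). A minimal Python sketch, assuming the four two-locus haplotype frequencies are already known (in the study they were estimated from unphased genotypes); it does not compute the confidence bounds on D′ used to define blocks, and the function name is hypothetical:

```python
def pairwise_ld(p_AB: float, p_Ab: float, p_aB: float, p_ab: float):
    """Return (D, |D'|, r^2) from two-locus haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    p_AB, p_Ab, p_aB, p_ab = (x / total for x in (p_AB, p_Ab, p_aB, p_ab))
    p_A = p_AB + p_Ab  # frequency of allele A at SNP 1
    p_B = p_AB + p_aB  # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B
    # Lewontin's normalization: the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Only three of the four haplotypes observed: |D'| = 1, but r^2 < 1
    # because the two SNPs have different allele frequencies.
    print(pairwise_ld(0.2, 0.1, 0.0, 0.7))
```

The example shows why both statistics are reported: |D′| = 1 indicates no evidence of historical recombination between the sites, while r² (about 0.58 here) measures how well one SNP predicts the other.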
Haplotype construction and tagging single nucleotide polymorphism selection. Haplotype frequencies were estimated from the genotype data in the multiethnic panel (one ethnic group at a time) within linkage disequilibrium blocks using the expectation-maximization algorithm of Excoffier and Slatkin (21). The squared correlation (Rh²) between the true haplotype counts (h) and their estimates was then calculated as described (22). Tagging SNPs for the case-control study were chosen by finding the minimum set of SNPs that yielded Rh² > 0.7 for all haplotypes with an estimated frequency of >5%. All missense SNPs were forced in as tagging SNPs before the number of tagging SNPs required to predict the common haplotypes was minimized. The calculation of Rh² is described in detail by Stram et al. (22), and a computer program (tagSNPs) for the calculation is available at D. Stram's website. Pairwise r² measures were calculated to assess how well the chosen tagging SNPs were correlated with the variants that were not typed in the breast cancer cases and controls.
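The Rh² criterion can be sketched by exact enumeration when haplotype frequencies are treated as known: under Hardy-Weinberg equilibrium each person carries two independently drawn haplotypes, the observed data are the unphased allele counts at the tag SNPs, and Rh² = Var(E[δh | G]) / Var(δh), where δh is the true number of copies of haplotype h. This Python sketch is not the tagSNPs program; the function names, the 0/1 string encoding of haplotypes, and the brute-force subset search are illustrative assumptions that are practical only for a handful of SNPs.

```python
from collections import defaultdict
from itertools import combinations


def rh2(hap_freqs, tags, target):
    """Squared correlation between the true count of haplotype `target` and its
    expectation given unphased genotypes at the SNP indices in `tags`.

    Assumes Hardy-Weinberg equilibrium and known haplotype frequencies;
    haplotypes are strings of '0'/'1' alleles, one character per SNP.
    """
    total = sum(hap_freqs.values())
    freqs = {h: f / total for h, f in hap_freqs.items()}
    p_geno = defaultdict(float)      # P(G) for each tag genotype G
    delta_mass = defaultdict(float)  # E[delta * 1{G}] for each G
    for h1, f1 in freqs.items():
        for h2, f2 in freqs.items():
            prob = f1 * f2  # probability of the ordered haplotype pair
            geno = tuple(int(h1[i]) + int(h2[i]) for i in tags)
            delta = (h1 == target) + (h2 == target)
            p_geno[geno] += prob
            delta_mass[geno] += prob * delta
    p_t = freqs[target]
    var_delta = 2 * p_t * (1 - p_t)  # binomial variance of the true count
    # E[(E[delta | G])^2], summed over observable tag genotypes.
    second_moment = sum(delta_mass[g] ** 2 / p_geno[g] for g in p_geno if p_geno[g] > 0)
    var_expected = second_moment - (2 * p_t) ** 2
    return var_expected / var_delta if var_delta > 0 else 1.0


def min_tag_set(hap_freqs, forced=(), threshold=0.7, min_freq=0.05):
    """Smallest SNP set (containing `forced`) with Rh2 > threshold for every
    haplotype whose frequency exceeds `min_freq`; None if no set qualifies."""
    n_snps = len(next(iter(hap_freqs)))
    total = sum(hap_freqs.values())
    common = [h for h, f in hap_freqs.items() if f / total > min_freq]
    optional = [i for i in range(n_snps) if i not in forced]
    for k in range(len(optional) + 1):  # try smaller sets first
        for extra in combinations(optional, k):
            tags = sorted(set(forced) | set(extra))
            if all(rh2(hap_freqs, tags, h) > threshold for h in common):
                return tags
    return None


if __name__ == "__main__":
    # Toy example: SNPs 0 and 1 are redundant; SNP 2 marks haplotype "111".
    haps = {"000": 0.5, "110": 0.3, "111": 0.2}
    print(min_tag_set(haps))               # [0, 2]
    print(min_tag_set(haps, forced=(1,)))  # [1, 2]
```

The `forced` argument mirrors the step of forcing in the missense SNPs before minimizing: the search only adds SNPs around the required ones.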
Comparison of tagging single nucleotide polymorphism and haplotype frequencies between cases and controls. We evaluated associations with the tagging SNPs and missense SNPs, and with the common haplotypes within each block. Haplotype frequencies among breast cancer cases and controls were estimated using the tagging SNPs selected to distinguish the common haplotypes (>5% frequency) in each ethnic group of the multiethnic panel, as described (17). We first performed global likelihood ratio tests of whether the frequency distributions of the common haplotypes within each block differed between cases and controls. We used the methods described by Zaykin et al. (23) to perform global tests of no association between haplotypes and cancer risk and to estimate haplotype-specific odds ratios (OR). Briefly, the expectation-maximization algorithm was run on the combined data set (cases + controls) to estimate haplotype frequencies for the tagging SNPs, and each individual's estimated haplotype counts (the expected number of copies of each haplotype carried) were output to an external file and merged with case-control status. These estimates were then used as explanatory variables in logistic regression analysis, in place of the true haplotype counts, first to perform a global test of no association between haplotypes and risk and then to estimate haplotype-specific ORs for each common haplotype. ORs and 95% confidence intervals (95% CI) for each haplotype and missense SNP were estimated using unconditional logistic regression. Associations with SNPs and haplotypes were examined in ethnicity-stratified analyses, and summary ORs are presented adjusted for ethnicity and age.
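The haplotype-count step can be sketched with a small expectation-maximization routine in the spirit of Excoffier and Slatkin: each person's unphased genotype is expanded into every compatible ordered pair of haplotypes, the E-step weights those pairs by the product of the current haplotype frequencies, and the M-step re-estimates frequencies from the weighted counts. The same posterior weights give each person's expected number of copies of each haplotype, the quantity merged with case-control status above. This is an illustrative Python sketch (hypothetical function names; brute-force phase enumeration that is practical only for a few heterozygous sites), not the software used in the study.

```python
from itertools import product


def compatible_pairs(genotype):
    """All ordered haplotype pairs whose allele counts match `genotype`
    (a sequence of 0/1/2 allele counts, one per SNP)."""
    options = []
    for g in genotype:
        if g == 0:
            options.append([(0, 0)])
        elif g == 2:
            options.append([(1, 1)])
        else:  # heterozygote: either phase
            options.append([(0, 1), (1, 0)])
    pairs = []
    for choice in product(*options):
        h1 = "".join(str(a) for a, _ in choice)
        h2 = "".join(str(b) for _, b in choice)
        pairs.append((h1, h2))
    return pairs


def expected_counts(pairs, freqs):
    """Posterior expected copies of each haplotype for one person."""
    weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
    z = sum(weights)
    counts = {}
    for (h1, h2), w in zip(pairs, weights):
        for h in (h1, h2):
            counts[h] = counts.get(h, 0.0) + w / z
    return counts


def em_haplotypes(genotypes, max_iter=500, tol=1e-10):
    """Return (haplotype frequencies, per-person expected haplotype counts)."""
    pairs_per_person = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pairs_per_person for pair in pairs for h in pair})
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(max_iter):
        totals = dict.fromkeys(haps, 0.0)
        for pairs in pairs_per_person:  # E-step
            for h, c in expected_counts(pairs, freqs).items():
                totals[h] += c
        new = {h: c / (2 * len(genotypes)) for h, c in totals.items()}  # M-step
        converged = max(abs(new[h] - freqs[h]) for h in haps) < tol
        freqs = new
        if converged:
            break
    dosages = [expected_counts(pairs, freqs) for pairs in pairs_per_person]
    return freqs, dosages


if __name__ == "__main__":
    # Eight unambiguous "00" and "11" chromosomes each, plus two double heterozygotes.
    data = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    freqs, dosages = em_haplotypes(data)
    print({h: round(f, 3) for h, f in freqs.items()})
    print(dosages[-1])
```

In the toy data, the unambiguous homozygotes pull the frequency estimates toward "00" and "11", so EM resolves the double heterozygotes almost entirely as "00"/"11" and their expected counts approach one copy of each.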
Adjustment for established breast cancer risk factors (24), namely first-degree family history of breast cancer, body mass index, parity, age at first birth, age at menarche, menopausal status, type of menopause, age at menopause, use of hormone replacement therapy, and alcohol consumption, did not affect our results. All analyses were performed using the Statistical Analysis System (version 8.2, SAS Institute, Inc., Cary, NC).
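The logistic regression step can be illustrated for a single haplotype: regress case status on the expected haplotype count (dosage) by maximum likelihood, report exp(β) as the per-copy OR with a Wald 95% CI, and compare the fit against an intercept-only model with a 1-df likelihood ratio test. This Python sketch simplifies under stated assumptions: it omits adjustment for ethnicity and age, fits one haplotype at a time rather than the multi-degree-of-freedom global test, uses per-copy (log-additive) coding rather than the heterozygote and homozygote contrasts reported in the Results, and is not the SAS code used in the study. Function names are hypothetical.

```python
import math


def _fit_stats(b0, b1, x, y):
    """Score vector, information matrix entries, and log-likelihood."""
    g0 = g1 = i00 = i01 = i11 = loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
        loglik += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return g0, g1, i00, i01, i11, loglik


def haplotype_or(dosage, case, max_iter=50, tol=1e-10):
    """Per-copy OR, Wald 95% CI, and 1-df likelihood ratio test for one haplotype.

    `dosage` holds expected haplotype counts (0-2); `case` holds 1 for cases and
    0 for controls. No covariates are included in this sketch.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):  # Newton-Raphson
        g0, g1, i00, i01, i11, _ = _fit_stats(b0, b1, dosage, case)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    _, _, i00, i01, i11, ll_full = _fit_stats(b0, b1, dosage, case)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # from the inverse information
    n, n_cases = len(case), sum(case)
    ll_null = n_cases * math.log(n_cases / n) + (n - n_cases) * math.log((n - n_cases) / n)
    lrt = max(0.0, 2.0 * (ll_full - ll_null))
    p_value = math.erfc(math.sqrt(lrt / 2.0))  # chi-square upper tail, 1 df
    ci = (math.exp(b1 - 1.96 * se_b1), math.exp(b1 + 1.96 * se_b1))
    return {"or": math.exp(b1), "ci95": ci, "lrt": lrt, "p": p_value}


if __name__ == "__main__":
    # Toy 2x2 data: carriers (dosage 1) have twice the odds of being a case.
    dosage = [0] * 200 + [1] * 150
    case = [0] * 100 + [1] * 100 + [0] * 50 + [1] * 100
    print(haplotype_or(dosage, case))
```

With a binary dosage the fitted OR equals the cross-product ratio of the 2 × 2 table (2.0 here), which is a convenient check; with EM-derived fractional dosages the same code applies unchanged.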
Genotyping. DNA for the multiethnic panel and for the breast cancer cases and controls was extracted from white blood cell (WBC) fractions using the Qiagen Blood Kit (Qiagen, Chatsworth, CA). Genotyping for linkage disequilibrium and haplotype discovery was done by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry using the Sequenom platform at the Broad Institute. The tagging SNPs and missense SNPs were genotyped with the 5′ nuclease TaqMan allelic discrimination assay on the ABI 7900 (Applied Biosystems, Foster City, CA) in the USC Genotyping Laboratory. The E1038G missense SNP was not genotyped in the case-control study because a nearby SNP (6 bp away) interferes with probe binding and results in genotype misclassification. However, SNP 18 (K1183R; rs16942) is a good proxy for this common missense SNP (r² ≥ 0.94 in all ethnic groups). Blinded replicate quality control samples were included to assess the reproducibility of the genotyping procedure; concordance was >99% in both the multiethnic panel and the case-control study.
Results
Characterization of linkage disequilibrium and common haplotype patterns. To characterize the pattern of linkage disequilibrium in this region, we genotyped 28 SNPs, selected from over 450 variants in the SNP database, spanning a total of 108.1 kb from 21.5 kb upstream through 7.0 kb downstream of the BRCA1 gene (average spacing of one common polymorphism every ∼3.9 kb; Table 1). Consistent with other reports (12), we observed a region of strong linkage disequilibrium across the BRCA1 locus in all ethnic groups (Fig. 1). Our goal was to genotype at least six high-frequency markers (≥10%) in the region of strong linkage disequilibrium to ensure that we thoroughly sampled the common diversity in the region (20).
We observed a total of 13 common haplotypes; for the analyses, we considered all haplotypes with a frequency ≥5% in at least one ethnic population (Table 1). Common BRCA1 haplotypes accounted for at least 79% of all chromosomes in each ethnic group. Haplotype number and distribution differed between African Americans and the other four populations. African Americans were the most diverse population, with eight common haplotypes, followed by Whites with seven, Latinos with six, and Japanese and Native Hawaiians with four each. Notably, five of the common haplotypes among African Americans were rare in the other populations (≤1% in each, except haplotype 11, which had a frequency of 2% among Latinos), yet they accounted for the majority (56%) of the total haplotype diversity in the African American sample.
We selected 14 markers as tagging SNPs (forcing in the five missense SNPs) that strongly predicted the common haplotypes (>5% frequency) in each ethnic group in the multiethnic panel (Table 1). The average Rh² for predicting the common haplotypes across all ethnic groups was 0.87, and haplotype frequencies predicted solely by the tagging SNPs differed only slightly from those defined by all of the SNPs (Table 1). To assess how well the 14 tagging SNPs captured the 14 remaining, unmeasured SNPs, we computed the pairwise r² values between the tagging SNPs and each unmeasured variant. All r² values were above 0.8 (average r² across populations, 0.98), except for one variant among African Americans (rs799907, r² = 0.53). Thus, we believe that the chosen tagging SNPs (and the haplotypes they define) provide good prediction of the SNPs assayed in the multiethnic panel and that common variation at this locus was thoroughly characterized.
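The coverage check can be sketched as follows: for each SNP left untyped in the case-control study, take the largest r² between it and any tagging SNP, and flag those below the 0.8 level used above. As a stated simplification, this Python sketch computes r² as the squared Pearson correlation of 0/1/2 genotype counts (a common proxy for haplotype-based r² that avoids phasing); the study's own calculation may differ, and the function names are hypothetical.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two vectors of 0/1/2 genotype counts."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return cov * cov / (var_a * var_b)


def tag_coverage(genotypes, tag_idx, untyped_idx, threshold=0.8):
    """Best r^2 with any tag for each untyped SNP, plus the SNPs below threshold.

    `genotypes` is a list of per-person lists of 0/1/2 counts (one entry per SNP).
    """
    columns = list(zip(*genotypes))
    best = {j: max(genotype_r2(columns[t], columns[j]) for t in tag_idx) for j in untyped_idx}
    poorly_tagged = [j for j, r2 in best.items() if r2 < threshold]
    return best, poorly_tagged


if __name__ == "__main__":
    # SNP 1 mirrors tag SNP 0 (opposite allele coding); SNP 2 is only loosely related.
    people = [[0, 2, 0], [1, 1, 1], [2, 0, 1], [0, 2, 1], [1, 1, 0], [2, 0, 2]]
    print(tag_coverage(people, tag_idx=[0], untyped_idx=[1, 2]))
```

Because r² is invariant to which allele is coded as 1, SNP 1 is perfectly captured even though its counts run opposite to the tag's.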
Associations of missense and tagging single nucleotide polymorphisms in BRCA1 and breast cancer risk. All SNPs conformed to Hardy-Weinberg equilibrium among controls within each ethnic group (data not shown). We observed no evidence of significant associations between any of the missense SNPs and breast cancer risk (Table 2). In addition, no significant associations were noted with any of the tagging SNPs (data not shown).
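The Hardy-Weinberg check can be sketched with the standard 1-df goodness-of-fit chi-square test, which compares the observed genotype counts among controls with those expected from the estimated allele frequency. The text does not state which test was used, so treat this choice as an assumption (an exact test is often preferred when an allele is rare); the function name is hypothetical.

```python
import math


def hwe_chi2(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """Chi-square statistic and P value (1 df) for departure from Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


if __name__ == "__main__":
    print(hwe_chi2(25, 50, 25))  # exactly at equilibrium
    print(hwe_chi2(50, 0, 50))   # no heterozygotes: strong departure
```

In practice the test would be run separately for each SNP within each ethnic group of controls, as described above.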
Associations of BRCA1 haplotypes and breast cancer risk. For the haplotype analysis, we first performed a global likelihood ratio test to assess whether the overall distribution of haplotypes differed between cases and controls. This test was not statistically significant (P = 0.38) and thus provided little support for a common disease allele at the BRCA1 locus contributing to sporadic breast cancer risk. Because this test focuses on the global distribution of the haplotypes, and thus could miss subtler effects of individual haplotypes, we also evaluated the effects of individual haplotypes on breast cancer risk. Across all ethnic groups combined, we observed no statistically significant haplotype effects (Table 3). Compared with noncarriers, both heterozygous (OR, 1.17; 95% CI, 0.96-1.43) and homozygous (OR, 1.17; 95% CI, 0.50-2.69) carriers of haplotype 3 had a slightly, but not significantly, elevated risk of breast cancer, no more than would be expected from chance fluctuation alone. This haplotype was more common among Whites (frequency, 14%; heterozygotes: OR, 1.30; 95% CI, 0.96-1.77; homozygotes: OR, 1.63; 95% CI, 0.57-4.66). These modest effects were not stronger when the analysis was limited to women with advanced disease (cases, n = 447; data not shown). We did not observe a significant positive association with haplotype 2, which was reported to be associated with risk in the study by Cox et al. (13). None of the haplotypes carried predominantly by African Americans was significantly associated with risk, although our power to detect an effect in this group was lower because of the smaller sample size (Table 3).
Discussion
Studies of familial aggregation (25, 26) and of twins (27) document substantial heritability of breast cancer in the population. Highly penetrant mutations in BRCA1 and BRCA2 have been estimated to account for ∼20% of familial breast cancer but <5% of all breast cancers (28). Thus, the genetic factors contributing to the vast majority of breast cancer cases remain largely unknown. The role of common missense SNPs, as well as of variation in noncoding regions (which may influence risk through expression levels and alternative splicing), as markers of breast cancer susceptibility has yet to be thoroughly explored, both at single loci and throughout the genome. Ongoing efforts to systematically characterize genetic polymorphisms, such as the International HapMap Project (15), provide the foundation for conducting comprehensive association studies of common variation.
In the present study, we thoroughly tested common variation in BRCA1 in a large multiethnic population and found no evidence that common germ line variation in BRCA1 plays a significant role in sporadic breast cancer. Another recent study (13) reported a modest positive association between a BRCA1 haplotype and breast cancer among White women in the Nurses' Health Study (OR, 1.18; 95% CI, 1.02-1.37). Our results do not support this finding; however, among the populations that carried this haplotype at a frequency of at least 5% (haplotype 2: Whites, 16%; Latinos, 9%), we had only 70% power to detect a relative risk of 1.3 for a dominant allele (assuming an Rh² of 0.9). Overall, our study was well powered to detect effects of alleles shared across ethnic populations: we had 94% power to detect a relative risk of 1.35 for a dominant allele with a frequency of 10%, and 70% power to detect a relative risk of 2.2 for a recessive allele with a frequency of 10%.
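Power figures of this kind can be approximated by comparing carrier frequencies between cases and controls with a normal approximation. A rough Python sketch, assuming a single homogeneous population, the odds ratio used in place of the relative risk, and the common approximation that an imperfect proxy with squared correlation Rh² behaves like a perfect marker in a sample Rh² times as large. It ignores ethnic stratification and covariate adjustment, so it should not be expected to reproduce the exact percentages quoted above; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def carrier_power(q, odds_ratio, n_cases, n_controls, model="dominant", rh2=1.0, alpha=0.05):
    """Approximate two-sided power to detect `odds_ratio` for a risk allele of frequency `q`."""
    if model == "dominant":
        p0 = 1.0 - (1.0 - q) ** 2  # controls carrying at least one copy
    elif model == "recessive":
        p0 = q ** 2  # controls carrying two copies
    else:
        raise ValueError("model must be 'dominant' or 'recessive'")
    # Exposure frequency among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    # Imperfect tagging approximated by shrinking the effective sample size.
    n1, n0 = n_cases * rh2, n_controls * rh2
    se = math.sqrt(p1 * (1.0 - p1) / n1 + p0 * (1.0 - p0) / n0)
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(p1 - p0) / se
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Study-wide sample sizes from the Methods; allele frequency and ORs as in the text.
    print(carrier_power(0.10, 1.35, 1715, 2502, model="dominant"))
    print(carrier_power(0.10, 2.2, 1715, 2502, model="recessive"))
```

The comparison also shows why recessive effects need much larger relative risks: with q = 0.10 only about 1% of controls carry two copies, so the difference in exposure frequency between cases and controls is small.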
The vast majority of common variation is shared between populations; however, allele frequencies are known to vary across populations (29), and studies conducted in multiethnic populations may help explain ethnic differences in breast cancer risk (24). In this study, most of the common BRCA1 haplotypes were shared among Native Hawaiians, Japanese, Latinos, and Whites: four of the eight common haplotypes in these groups were found in at least three ethnic populations, and six of the eight were found in at least two. African Americans, however, had five haplotypes that were uncommon in the other groups (most <1%). The SNPs that defined these African American–specific haplotypes (SNPs 1, 2, 4, 8, and 11) were rare or absent in the other populations, which suggests that these alleles either did not pass through the Out of Africa bottleneck that occurred ∼50,000 to 100,000 years ago (30) or passed through it and then drifted to low frequencies.
Although we have thoroughly addressed the role of common variation in BRCA1 in sporadic breast cancer risk, it remains possible that this locus is involved in breast cancer risk. Specifically, rare (<5%) variants, which our study is not adequately powered to test, may contribute to disease; addressing this hypothesis, however, will require large-scale resequencing efforts (to discover the rare variants) and testing of these variants in even larger cohorts, such as the National Cancer Institute Consortium of Cohorts. Another possibility is that what we label as sporadic breast cancer is actually a collection of genetically distinct subclasses of breast cancer. In this scenario, the same set of underlying susceptibility alleles would be unlikely to occur in all breast cancer cases (i.e., the genetic architecture of the disease is heterogeneous). If these subgroups are not recognized and analyzed separately, the power to detect susceptibility alleles specific to them will be diminished. At the histologic and molecular levels, breast tumors have different characteristics; subsets of breast tumors defined by immunohistochemistry [e.g., estrogen receptor (+/−) and HER2/neu (+/−)] often display different biological behaviors, such as time to disease progression and response to therapy, which may reflect different genetic origins. Studies have shown that breast tumors from women with hereditary breast cancer due to BRCA1 mutations and from a subset of women with sporadic disease (∼25%) share similar traits, including basal cell histology, higher tumor grade, expression of cytokeratins 5/6, and estrogen receptor negativity, suggesting that they may have a similar etiology (31–33).
The ability to stratify breast cancer cases by expression profiling, immunohistochemistry, methylation patterns, and/or clinical variables may facilitate the identification of more genetically homogeneous subsets of cancer cases, and therefore may help to identify causal variants underlying specific breast cancer phenotypes.
Grant support: Howard Hughes Physician Postdoctoral Fellowship (M.L. Freedman); California Breast Cancer Research Program, grant no. 9KB-0006; and General Motors Cancer Research Scholars grant (C.A. Haiman).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank the participants of the Multiethnic Cohort Study for their participation and commitment. We also thank Loreall Pooler and David Wong for their laboratory assistance, and Dr. Kristine Monroe and Hank Huang for their support with the study design.
- Received January 14, 2005.
- Revision received May 17, 2005.
- Accepted June 2, 2005.
- ©2005 American Association for Cancer Research.