Molecular Biology, Pathobiology, and Genetics
Departments of 1 Cancer Biology, 2 Medicine, and 3 Biostatistics, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine; 4 Medical Research Service, VA Tennessee Valley Healthcare System, Nashville, Tennessee; and 5 Department of Epidemiology, Shanghai Cancer Institute, Shanghai, China
Requests for reprints: Jeffrey R. Smith, Department of Medicine, Vanderbilt University School of Medicine, 529 Light Hall, 2215 Garland Avenue, Nashville, TN 37232-0275. Phone: 615-936-2171; Fax: 615-936-2296; E-mail: jeffrey.smith{at}vanderbilt.edu and Wei Zheng, Department of Medicine, Vanderbilt University School of Medicine, Nashville, TN. Phone: 615-936-0682; Fax: 615-322-1754; E-mail: wei.zheng{at}vanderbilt.edu.
| Introduction |
|---|
In this study, we sought to comprehensively characterize common genetic variation at CYP11A1 to assess patterns of linkage disequilibrium (LD) and to refine our understanding of the contribution of CYP11A1 genetic variation to breast cancer risk. The initial discovery of the single-allele association at one STR of the CYP11A1 gene led us to hypothesize that it marked an uncharacterized haplotype conferring breast cancer risk. Among alleles of variant sites identifying a cancer-associated haplotype, a subset that directly marks it are candidates that may be functional in the disease. Those altering transcript expression or processing or the encoded enzyme itself remain of great interest in further delineating the role of this gene in common breast cancer. We tested this hypothesis within the Shanghai Breast Cancer Study using haplotype-based analyses. Here, we show that the disease-associated haplotype is designated by multiple variants upstream of the coding region. We further observe that CYP11A1 expression in a lymphoblastoid cell line homozygous for the disease-associated haplotype is 2-fold greater than the expression in lymphoblastoid cell lines harboring alternative haplotypes. We conclude that common cis-acting variants upstream of the coding region may affect transcriptional regulation to influence breast cancer risk.
| Materials and Methods |
|---|
To preserve the limited DNA from study subjects recruited in the Shanghai Breast Cancer Study, allele discovery used DNAs obtained from Chinese cell lines of the Coriell Institute for Medical Research (Camden, NJ). These included NA18524, NA18526, NA18529, NA18532, NA18537, NA18540, NA18542, NA18545, NA18547, NA18550, NA18552, NA18555, NA18558, NA18561, NA18562, NA18563, NA18564, NA18566, NA18570, NA18571, NA18572, NA18573, NA18576, NA18577, NA18579, NA18582, NA18592, NA18593, NA18594, NA18603, NA18605, NA18608, NA18609, NA18611, NA18612, NA18620, NA18621, NA18622, NA18623, NA18624, NA18632, NA18633, NA18636, NA18637, NA00576, NA03433, NA13411, NA14821, NA16654, NA16688, NA16689, NA17013, NA17014, NA17015, NA17016, NA17017, NA17018, NA17019, and NA17020.
Variant discovery and confirmation. To capture the genetic diversity of CYP11A1, single-nucleotide polymorphisms (SNP) annotated in dbSNP were screened for common polymorphism in the study population. Fifty-three annotated SNPs spanning CYP11A1 from 7.8 kb 5' of the 29.9-kb gene to 10 kb 3' were genotyped in quadruplicate among 15 Chinese cell line DNAs to assess polymorphism. The screening set was estimated to provide 95% power to detect a polymorphism with a minor variant frequency of 0.10 and 78% power with a frequency of 0.05.
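The quoted power figures can be approximated with a simple binomial calculation. The sketch below assumes that "power" means the probability of observing at least one copy of the minor allele among the 2n chromosomes of n unrelated diploid samples; the text does not state how the estimate was derived, so this is an illustration rather than the authors' method, and the function name is hypothetical.

```python
def detection_power(minor_allele_freq: float, n_individuals: int) -> float:
    """Probability of seeing the minor allele at least once.

    Assumes independent sampling of 2 * n_individuals chromosomes
    (an illustrative assumption, not stated in the article).
    """
    n_chromosomes = 2 * n_individuals
    # Probability that every sampled chromosome carries the major allele
    p_all_major = (1.0 - minor_allele_freq) ** n_chromosomes
    return 1.0 - p_all_major


if __name__ == "__main__":
    for maf in (0.10, 0.05):
        power = detection_power(maf, n_individuals=15)
        print(f"MAF {maf:.2f}: power = {power:.3f}")
```

For 15 samples this gives roughly 0.96 at a frequency of 0.10 and 0.79 at a frequency of 0.05, in line with the reported 95% and 78%.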
The 15 Chinese cell lines were also used for de novo SNP discovery by dual single-strand conformational polymorphism (SSCP) methods and resequencing. Where either SSCP method identified a variant, conformers were resequenced for allele discovery. Overlapping amplimers across CYP11A1 were used for polymorphism screening: 142 amplimers spanning from 1.9 kb 5' to 98 bp 3' of the gene. Intron 1 of the CYP11A1 gene contains a 13.8-kb nearly contiguous interval of RepBase repeats. Select nonunique regions embedded within that interval, which impede PCR-based assays, were omitted from the survey, as outlined in Fig. 1.
To identify additional genetic variation on the disease-associated haplotype, Chinese cell line GM16654, which is homozygous for the disease haplotype, and comparative cell line GM10859 (CEPH 1347-02) were resequenced from 1.9 kb 5' to 98 bp 3' of the gene (again omitting nonunique regions within the 13.8-kb interval of intron 1) as outlined in Fig. 1. All exons and exon-intron junctions were additionally resequenced in five subjects of the Shanghai Breast Cancer Study, harboring five common haplotypes, including a subject homozygous for the disease-associated haplotype. Sequencing used BigDye terminator chemistry on a 3100 Genetic Analyzer (Applied Biosystems). Discovered variants were then genotyped in the 59 Chinese cell line DNAs to assign alleles to known haplotypes.
Fifteen novel SNPs discovered in the study have been submitted to dbSNP (ss68316999–ss68317010 and ss68362647–ss68362649). Two novel polymorphic STRs have been submitted to GDB and to dbSNP (D15S1547/ss69363921 and D15S1546/ss69363922).
SNP genotyping. We genotyped SNPs by single-nucleotide primer extension and fluorescence polarization in 384-well format (8). Reaction processing entailed three steps: a 4.4-µL PCR reaction, addition of 4 µL of an exonuclease I (New England Biolabs) and calf intestinal alkaline phosphatase (Promega) reagent mix to degrade unincorporated primer and dephosphorylate deoxynucleotide triphosphates (dNTP), and a final addition of 4 µL of an Acyclopol and Acycloterminator reagent mix for the primer extension reaction (AcycloPrime FP SNP Detection System, Perkin-Elmer). Each PCR mixture included 0.1 unit of AmpliTaq Gold DNA polymerase, 1x Buffer II (Applied Biosystems), 2.5 mmol/L MgCl2, 0.25 mmol/L dNTP, 335 nmol/L of each primer, and 2 ng DNA template. We detected incorporation of R110-labeled and TAMRA-labeled terminators by fluorescence polarization on a Molecular Devices LJL Analyst HT. Both forward and reverse strand extension primers were tested to select the most robust assay. Amplimer and extension primer sequences for genotyped SNPs of Fig. 2 are provided in Supplementary Table S1.
Single-stranded conformation polymorphism detection. Amplimers were electrophoresed on 0.5x mutation detection enhancement gels (Cambrex Biosciences) at room temperature at 2 W for 14 h and at 4°C at 4 W for 14 h. PCR conditions were as described above. Amplimers were visualized by silver staining (10). Representative conformers were sequenced using BigDye terminator chemistry on a 3100 Genetic Analyzer (Applied Biosystems) to identify the polymorphic sites.
Statistical analyses. Hardy-Weinberg equilibrium (HWE) for markers was calculated using the Stata package genassoc of Clayton (11). Pairwise LD between SNPs was calculated and visualized using Haploview version 3.2 (12). Pairwise LD for SNPs and multiallelic STRs was calculated and visualized using MIDAS version 1 (13). Tagging SNPs were selected using LDSelect with a minor allele frequency (MAF) threshold of 5% and an r² threshold of 0.7 (14). When multiple SNPs were assigned as tagging SNPs for a particular bin, the SNP with the most robust assay performance was selected for that bin.
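For readers unfamiliar with the two LD measures used here, the following minimal sketch computes D' and r² for one pair of alleles from a haplotype frequency and the two allele frequencies. It uses the standard textbook definitions and is not the Haploview or MIDAS implementation; the function name and inputs are hypothetical.

```python
def pairwise_ld(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D', r^2) for alleles A and B.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical frequencies chosen only to exercise the function
    print(pairwise_ld(p_ab=0.08, p_a=0.10, p_b=0.10))
```

D' reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r² reaches 1 only when the two alleles always occur together, which is why r² is the usual criterion for choosing tagging markers.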
Population haplotype frequencies were estimated by the Bayesian method implemented in PHASE version 2.1 (15–17) and by an expectation-maximization (18) algorithm implemented in custom software that we based upon a parent program written by Fallin and Schork (19, 20). The custom expectation-maximization program enabled use of multiallelic markers, placed no hard-coded limit on the number of subjects or markers, and allowed parallel processing. Diplotypes were predicted using PHASE, and those predicted with a probability of >95% were used for tests of association.
The χ² test statistic was used to evaluate differences in allele or haplotype frequency of case and control groups. Alleles or haplotypes with an overall frequency of <0.05 were grouped for analysis. A sliding window approach tested a haplotype window of N markers, sliding the window along the map in single-marker increments (19, 21). For a given window of N adjacent markers, the profile of multiple common haplotypes and rare haplotypes as a group was evaluated in cases and controls by the χ² test statistic. Each N-marker haplotype, compared against the remaining haplotypes of the window as a group, was also evaluated by the χ² test statistic. Permutation testing was used to assess significance. Subsequent estimation of effect size used logistic regression models adjusted for age (Intercooled Stata 9, Stata Corporation).
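The association testing described above can be sketched as follows. This is an illustrative reconstruction, not the authors' software: it computes a Pearson chi-square statistic on a 2 x k table of allele (or haplotype) counts, estimates significance by shuffling case and control labels, and enumerates sliding windows of N adjacent markers. For simplicity it permutes individual chromosomes rather than subjects, which is an assumption; all function names are hypothetical.

```python
import random
from collections import Counter


def chi_square(case_alleles, control_alleles):
    """Pearson chi-square statistic for a 2 x k table of allele counts."""
    case_counts = Counter(case_alleles)
    control_counts = Counter(control_alleles)
    n_case, n_control = len(case_alleles), len(control_alleles)
    total = n_case + n_control
    stat = 0.0
    for allele in set(case_counts) | set(control_counts):
        column_total = case_counts[allele] + control_counts[allele]
        for observed, row_total in (
            (case_counts[allele], n_case),
            (control_counts[allele], n_control),
        ):
            expected = row_total * column_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(case_alleles, control_alleles, n_permutations=1000, seed=0):
    """Empirical P value: share of label shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = chi_square(case_alleles, control_alleles)
    pooled = list(case_alleles) + list(control_alleles)
    n_case = len(case_alleles)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if chi_square(pooled[:n_case], pooled[n_case:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible P = 0
    return (hits + 1) / (n_permutations + 1)


def sliding_windows(markers, window_size):
    """All runs of window_size adjacent markers, advancing one marker at a time."""
    return [markers[i:i + window_size] for i in range(len(markers) - window_size + 1)]


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only
    cases = ["T"] * 30 + ["C"] * 70
    controls = ["T"] * 10 + ["C"] * 90
    print(chi_square(cases, controls))
    print(permutation_p_value(cases, controls))
    print(sliding_windows(["m1", "m2", "m3", "m4"], 2))
```

Permutation avoids relying on the asymptotic chi-square distribution, which matters when rare alleles or haplotypes produce small expected counts even after grouping.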
Cladistic modeling of haplotypes resolved by PHASE with ≥99% probability was accomplished using DNAPARS and DRAWTREE of the software package Phylip 3.6. The observed haplotype with the least number of state changes to all other observed haplotypes was designated as the outgroup for unrooted parsimony. Each multiallelic marker of N alleles was encoded as a series of N-1 binary allelic sites to allow inclusion in the model.
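One way to encode a multiallelic marker of N alleles as N-1 binary sites is indicator coding against a reference allele, sketched below. The article does not say which coding scheme was used, so the choice of reference (the smallest allele) and the function name are assumptions made purely for illustration.

```python
def encode_multiallelic(observed_alleles, allele):
    """Encode one allele of an N-allele marker as N-1 binary sites.

    The smallest observed allele serves as the reference and is encoded
    as all zeros; every other allele sets exactly one site to 1.
    (Assumed scheme; the source does not specify its encoding.)
    """
    ordered = sorted(set(observed_alleles))
    if allele not in ordered:
        raise ValueError(f"allele {allele!r} was not observed at this marker")
    non_reference = ordered[1:]
    return [1 if allele == other else 0 for other in non_reference]


if __name__ == "__main__":
    # Hypothetical STR with three repeat-count alleles
    repeat_alleles = [6, 8, 10]
    for a in repeat_alleles:
        print(a, encode_multiallelic(repeat_alleles, a))
```

Under this coding, any two distinct alleles differ at one or two binary sites, so a parsimony program that accepts only binary or nucleotide states can still count changes at the STR.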
Expression analyses in lymphoblastoid cell lines. Expression analyses used RNA prepared from the lymphoblastoid cell lines GM16654, GM17020, and GM17014 (Coriell Institute for Medical Research) carrying select CYP11A1 diplotypes. Cells were cultured at 37°C under 5% CO2 in a medium containing RPMI 1640 with 2 mmol/L L-glutamine and 15% fetal bovine serum. Total RNA from each cell line was prepared from cells in the log phase of growth using the RNeasy midi kit with on-column DNase treatment (Qiagen). RNA quality was assessed by reverse transcriptase PCR using two different sets of intron-spanning primers, one for phosphoglycerate kinase and one for p53, with a no-reverse transcriptase control to rule out DNA contamination. Nine 1-µg aliquots of total RNA of each cell line were reverse transcribed into single-stranded cDNA using the High-Capacity cDNA Archive kit (Applied Biosystems). After cDNA synthesis, RNA was degraded by alkaline hydrolysis, the pH was neutralized, and cDNA was purified by adsorption to silica gel (QIAquick PCR Purification kit, Qiagen) and eluted in 60 µL of 10 mmol/L Tris-Cl (pH 8.5). cDNA quantities were measured spectrophotometrically (NanoDrop ND-1000, NanoDrop Technologies).
A fluorescently labeled TaqMan MGB probe was used to quantify CYP11A1 expression in each of the nine reverse-transcribed aliquots by real-time quantitative PCR. Each assay was done in quadruplicate. The probe spanned the exon 1 to exon 2 boundary within the coding region (Chr 15: 72424438–72427449, assay Hs00167984_m1, Applied Biosystems). cDNA (5 ng) was amplified in a 5-µL reaction using the TaqMan system (Assays-On-Demand Gene Expression Products, TaqMan Universal PCR Master Mix, 7900HT Real-Time PCR System; Applied Biosystems). For each CYP11A1 expression assay, results were normalized to the expression of the 18S rRNA housekeeping gene in the same sample (assay Hs99999901_s1, Applied Biosystems). Statistical comparisons were made using a one-way ANOVA and two-tailed Student's t test.
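Normalization of a target gene to a housekeeping gene is commonly done with the comparative threshold-cycle (2^-ΔΔCt) calculation, sketched below. The article states only that results were normalized to 18S rRNA; whether this exact formula was used is an assumption, as is the premise of roughly 100% amplification efficiency. The function name and the Ct values in the usage lines are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene versus a calibrator sample (2^-ddCt).

    Each sample's target Ct is first normalized to its own reference-gene Ct
    (e.g. 18S rRNA), then compared with the calibrator sample.
    Assumes a doubling of product per cycle for both assays.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the sample crosses threshold one cycle earlier
    fold = relative_expression(
        ct_target=29.0, ct_reference=10.0,
        ct_target_calibrator=30.0, ct_reference_calibrator=10.0,
    )
    print(fold)
```

A one-cycle difference after normalization corresponds to a 2-fold difference in starting template under these assumptions.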
| Results |
|---|
2 kb 5'-flanking sequence by SSCP and resequencing. Repetitive sequence was an obstacle to the design of unique assays. A 5.6-kb window of nonunique sequence of intron 1 and several additional small repetitive intronic regions totaling under 2 kb were omitted from SNP discovery efforts (Fig. 1). Collectively, we identified 59 variant sites in the CYP11A1 genomic region, positioned on the map of Fig. 1. Of these, 80% were annotated in dbSNP. We developed assays for three STRs (including D15S1547, D15S520, and D15S1546, but omitting poly(A) tract indels rs3831490 and rs12899703) and 46 SNPs using Chinese cell lines. Among these markers, 3 STRs and 42 SNP assays were further genotyped in a subset of the Shanghai Breast Cancer Study population for the assessment of MAF, HWE, and haplotype diversity and for the selection of tagging markers. This study population subset included 178 cases and 178 controls. This yielded 3 polymorphic STRs and 27 SNPs (Fig. 2) with MAFs ≥0.05 and in HWE (P ≥ 0.05) for inclusion in analyses. These SNPs had MAFs that ranged from 0.49 to 0.06 among controls. STR heterozygosities were 0.79 (D15S1547), 0.52 (D15S520), and 0.70 (D15S1546). Pairwise LD across the CYP11A1 gene was relatively strong in the study population and without clear LD block subdivision. A Haploview plot of SNP allele pairwise D' values is presented in Fig. 1. If an STR were highly mutable, one would anticipate low LD with neighboring SNPs. Instead, specific alleles of the STRs were in strong LD with select SNP alleles and efficiently tagged SNP haplotypes with few assays (Fig. 2). For example, the T allele of SNP rs8039957 (associated with breast cancer risk as shown further below) had pairwise D' values of 0.93 with the 12-repeat allele of D15S1547, 0.86 with the 8-repeat allele of D15S520, and 0.58 with the 8-repeat allele of D15S1546. Throughout the article, we refer to each SNP allele as that on the coding strand of the chromosome.
We sought an efficient set of tagging markers among the 3 STRs and 27 SNPs to capture CYP11A1 gene diversity for tests of association with breast cancer. Eight SNPs and three STRs were selected as robust tagging markers, each with an allele in pairwise LD with an allele of remaining markers with an r² ≥ 0.70 for the control group. This set of markers included rs8039957, D15S1547, D15S520, D15S1546, ss68317008, ss68317006, rs1484215, rs11638442, rs7173655, rs2279357, and rs2277606. Four SNPs at map ends (rs3825944, rs12438594, rs4077585, and rs2930306) were less efficiently tagged by the set, with maximal r² values ranging from 0.57 to 0.66.
Diplotypes of the 356 Shanghai Breast Cancer Study subjects were inferred for frequency estimation. Figure 2 illustrates haplotypes inferred by PHASE with a probability of ≥0.99; these are presented in an order predicted by cladistic modeling. Each haplotype has an identifying number from 1 to 57 (assigned by order of decreasing haplotype frequency). These account for 88% of all CYP11A1 haplotypes in this population. Only five haplotypes were present at a frequency greater than 0.05.
The STR alleles marked predominant SNP haplotypes well, in concordance with the high measured pairwise LD values. Among more closely related SNP haplotypes (proximal in Fig. 2), STR alleles do deviate from the principal one and tend to do so by one- or two-repeat increments. This may reflect a stepwise rather than stochastic mutational mechanism (22, 23). Typical STR polymorphisms have alleles varying in increments of the repeat unit. D15S1547 is a dimer, D15S520 is a pentamer, and D15S1546 is a tetramer. However, D15S520 is distinct because predominant population alleles are in increments of 10 bp rather than 5 bp. We subcloned and resequenced each of the major alleles to confirm this.
We next genotyped the set of 11 tagging markers in 1159 breast cancer cases and 1236 controls of the Shanghai Breast Cancer Study population to explore the contribution of CYP11A1 to breast cancer risk. Data were obtained on 94% of genotypes sought (per marker range, 86–98%). Each of the tagging SNPs and STRs was in HWE (P ≥ 0.05). Table 1 presents single-allele association results, comparing the case and control groups for these markers. The most significant evidence of association is observed at the three most 5' markers, each just upstream of the CYP11A1 coding region. Significance estimates by permutation testing range from P = 2.0e-5 to 4.1e-4 for one allele at each of these markers. Each of the risk alleles observed in single-allele association tests (the T allele of rs8039957, 12-repeat allele of D15S1547, 8-repeat allele of D15S520, and 8-repeat allele of D15S1546) marks closely related haplotypes in the 5' end of the CYP11A1 gene, the most prevalent of which is haplotype 4 of Fig. 2 (frequency, 0.086).
The evidence that these experiments uncovered supports a role for common CYP11A1 promoter variation in breast cancer risk. Although CYP11A1 expression is greatest in steroidogenic tissues, it is also expressed in lymphocytes (24). Because we had identified the CYP11A1 diplotype for each of 59 Chinese lymphoblastoid transformed cell lines, we subsequently evaluated expression in a cell line homozygous for the disease-associated haplotype (4 of Fig. 2) and compared it with expression in two cell lines homozygous for alternative common haplotypes (1 and 3). CYP11A1 expression was measured in total RNA prepared from the cell lines using a 5' fluorogenic nuclease quantitative real-time PCR assay, normalizing to expression of 18S rRNA. Within these cell lines, the expression of the disease-associated haplotype was roughly 2-fold greater than that of either alternative haplotype tested (Fig. 4). Increased relative expression is consistent with increased risk for breast cancer conferred by the promoter haplotype.
| Discussion |
|---|
Our observations are consistent with the important role of the cholesterol side-chain cleavage enzyme in steroid sex hormone biosynthesis and with epidemiologic studies implicating estrogen biosynthesis and metabolism in breast cancer etiology (3). We estimate that population-attributable risk of the 5' regulatory region haplotype of CYP11A1 is 6.9%. This reflects an important contribution to breast cancer in the Chinese population. HapMap data for the CEU study population suggest a higher frequency of this haplotype (defined by rs8039957 allele T, rs4887139 allele C, and rs4278698 allele A) among Caucasians than among Chinese (25).
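For context, population-attributable risk is often computed with Levin's formula, shown below. The text does not state which formula or inputs produced the 6.9% estimate, so this is offered only as the conventional definition, with $p$ denoting the prevalence of the risk factor (here, carriage of the risk haplotype) among controls and $\mathrm{OR}$ the odds ratio used as an approximation of relative risk; both symbols are introduced here for illustration.

```latex
\mathrm{PAR} = \frac{p\,(\mathrm{OR} - 1)}{1 + p\,(\mathrm{OR} - 1)}
```

Under this definition, a modest odds ratio can still yield an appreciable attributable fraction when the risk haplotype is common in the population.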
Because tissue-specific regulatory elements of CYP11A1 are known to function in ovary and adrenal, it is conceivable that promoter haplotypes may be correlated selectively with premenopausal or postmenopausal breast cancer, reflecting the relative tissue origin of steroidogenesis. The cases of the Shanghai Breast Cancer Study that we evaluated are predominantly (68%) premenopausal, and evidence of association is strongest in this group (5). Intriguingly, Setiawan et al. also found evidence of association between a haplotype over the 5' region of the CYP11A1 gene and breast cancer risk in the Multiethnic Cohort Study (26). However, the risk haplotype that they identified (similar to haplotype 3 of Fig. 2) is distinct from the risk haplotype (4) identified in our study. Cases of the Multiethnic Cohort Study are predominantly (69%) postmenopausal. The risk haplotype of the Setiawan et al. study is tagged by rs3803463 at 7,542 bp upstream of the gene, a marker also in LD with rs1484215 between exons 2 and 3 (r² range, 0.75–0.87 in HapMap populations). In light of these collective findings, further epidemiologic evaluation of CYP11A1 haplotypes in premenopausal and postmenopausal breast cancer, and investigation of their effect on tissue-specific expression, is warranted.
The structure of the CYP11A1 gene promoter has been extensively investigated in prior studies (27–39). The proximal promoter is composed of a TATA box, a highly conserved SF1/LRH-1 site, and two SP1 sites. Promoter deletion mapping has also identified a negative regulatory element residing between −300 and −660 bp (27, 33, 37). This region harbors nonconserved repetitive elements flanking D15S520 at −487 bp. The nonconserved CA simple-sequence repeat D15S1547 at −1,361 bp is also adjacent to a repetitive element. The upstream cyclic AMP (cAMP)-response sequence at −1,540 to −1,640 bp harbors two AP1/cAMP-responsive element binding protein binding sites flanking an SF1 site. Further upstream are two adrenal-specific enhancers between −1,840 and −1,900 bp. SNPs marking the disease-associated haplotype, rs4887139 at −2,228 bp, rs8039957 at −4,884 bp, and potentially rs4278698 at −4,984 bp, do not reside within conserved regions. Both the [AAAT]n simple-sequence repeat of D15S1546 and SNP rs12442401 reside within nonconserved regions of the first intron. All identified variants of the disease-associated haplotype thus fall outside of conserved elements defined by vertebrate Multiz alignment (40) in the CYP11A1 region. However, the D15S520 repeat [TAAAA]n potentially resides within a described functional promoter element.
A [TAAAA]n polymorphic repeat has been shown to be a negative regulatory element within the promoter of the plasma sex hormone–binding globulin gene (SHBG; ref. 41); 6 to 11 repeats reside at −726 bp of that promoter. Reporter constructs carrying six repeats showed significantly less transcriptional activity than constructs carrying other repeat lengths. The six-repeat version of the SHBG promoter is also associated with lower SHBG levels (42). The six-repeat allele of D15S520 was the most commonly observed in our study, and two cell lines homozygous for the six-repeat allele each had significantly lower CYP11A1 expression than a cell line homozygous for the disease-associated eight-repeat allele. The two promoter repeats may not be fully analogous, however, because increased risk of breast cancer in our study was associated only with the eight-repeat allele at CYP11A1, not with other non–six-repeat alleles. An even and odd number of repeats could alternatively orient closely flanking transcription factors on the same or opposite DNA helical faces to influence interactions; odd repeat alleles of D15S520 were relatively rare in both the Shanghai Breast Cancer Study and in the Multiethnic Cohort Study (26).
Heritable variation of both cis and trans regulatory elements controlling expression of steroid hormone biosynthesis and metabolism genes could greatly contribute to population breast cancer risk (43, 44). Broader investigation of this large network of genes should reasonably include genetic variation of potential regulatory elements. A genome-wide or a candidate gene association study based upon tagging SNP selection from current HapMap data could have detected association of CYP11A1 with breast cancer risk in our study population. A direct investigation of SHBG promoter variation in breast cancer risk has not yet been conducted, although higher plasma levels of SHBG (with corresponding lower levels of circulating estrogen) have been associated with reduced risk for breast cancer (45). Among other genes of the steroid hormone regulatory network, a SNP within the human progesterone receptor gene promoter, located between its two alternative isoform transcript start sites, has been shown to have a direct effect on expression (46). That promoter variant was further associated with breast cancer risk in the Nurses' Health Study (46). Systematic investigation of steroid hormone biosynthesis and metabolism gene variation may provide a more comprehensive picture of the role of these pathways in breast cancer risk.
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank the study participants and staff of the Shanghai Breast Cancer Study.
| Footnotes |
|---|
Received 2/5/07. Revised 4/6/07. Accepted 4/12/07.