Epidemiology
1 Division of General Internal Medicine and 2 Lung Biology Center, Department of Medicine; 3 Department of Biopharmaceutical Sciences; 4 Helen Diller Family Comprehensive Cancer Center; and 5 Medical Effectiveness Research Center for Diverse Populations, University of California, San Francisco, San Francisco, California; 6 Northern California Cancer Center, Fremont, California; 7 Stanford School of Medicine and Stanford Cancer Center, Stanford, California; and 8 Children's Hospital Research Institute, Oakland, California
Requests for reprints: Laura Fejerman, University of California, San Francisco, 1701 Divisadero Street, 5th Floor, San Francisco, CA 94143-1732. Phone: 415-885-7504; Fax: 415-353-7932; E-mail: laura.fejerman@ucsf.edu.
Introduction
Women of mixed descent, such as U.S. Latinas, present both a challenge and a unique opportunity in genetic association studies (9–11). On one hand, association studies in Latinos may be confounded if cases and controls differ systematically in genetic ancestry (12, 13). On the other hand, populations of mixed ancestry provide an opportunity to examine the roles of genetic and environmental factors in explaining observed differences in incidence between populations and, eventually, to locate alleles that contribute to differences in disease risk. This can be achieved by admixture mapping, an approach based on the idea that if a marker increases the risk of disease and is much more frequent in one ancestral population, then that marker will also be more common among cases and will be strongly associated with other ancestry-specific markers across large stretches of the genome (14). Breast cancer among Latinas is a particularly interesting case because the main ancestral components of the Latino population (European and Indigenous American) have, respectively, the highest and lowest breast cancer incidence (1).
We have previously investigated the association between genetic ancestry and breast cancer risk factors among Latinas in the San Francisco Bay Area using 44 ancestry informative markers (AIM; ref. 7). Here we use DNA samples from our previous study (167 cases and 286 controls) and DNA samples for an additional 273 cases and 311 controls to test the association between breast cancer risk and genetic ancestry among Latinas. We used 106 AIMs to determine the genetic ancestry in all of the women and compared ancestry between cases and controls, adjusting for known breast cancer risk factors in an effort to identify a genetic ancestry component to breast cancer risk. We also investigated the use of genetic ancestry as a covariate in genetic association studies for breast cancer among Latinas.
| Materials and Methods |
|---|
|
|
|---|
The San Francisco Bay Area Breast Cancer Study, described elsewhere (8, 15), is a multiethnic population-based case-control study of breast cancer initiated in 1995; biospecimen collection was added for cases diagnosed between April 1, 1997 and April 30, 2002 and for their matched controls. Depending on the study protocol, participants were invited to provide a blood or buccal sample. Women ages 35 to 79 y who resided in San Francisco, San Mateo, Alameda, Contra Costa, or Santa Clara counties and were newly diagnosed with a first primary invasive breast cancer were identified through the Greater Bay Area Cancer Registry, which ascertains all incident cancers as part of the Surveillance, Epidemiology, and End Results program and the California Cancer Registry. A brief telephone screening interview that assessed study eligibility and self-reported race/ethnicity (89% response among those contacted) identified 873 eligible Latina cases. Of these, 798 (91%) completed an in-person interview and 747 (86%) provided a biospecimen sample. Control women, ages 35 to 79 y and residing in the same five Bay Area counties, were ascertained by random digit dialing. They were frequency matched to cases by race/ethnicity and expected 5-y age group. The telephone screening interview, completed by 93% of women selected as controls, identified 1,126 eligible Latina controls without a personal history of breast cancer. Of these, 999 (89%) completed the in-person interview and 911 (81%) provided a biospecimen sample.
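The interview and biospecimen percentages above are proportions of all eligible women, not of the women who completed the preceding step (747/798 would be 94%, not 86%). The short Python check below, a sketch that uses only the counts stated in the text (the function names are hypothetical), reproduces the reported rates:

```python
# Counts reported for Latina participants in the text above.
ELIGIBLE = {"cases": 873, "controls": 1126}
INTERVIEWED = {"cases": 798, "controls": 999}
BIOSPECIMEN = {"cases": 747, "controls": 911}


def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


def participation_rates() -> dict:
    """(interview %, biospecimen %) per group, both relative to eligible women."""
    return {
        group: (percent(INTERVIEWED[group], ELIGIBLE[group]),
                percent(BIOSPECIMEN[group], ELIGIBLE[group]))
        for group in ELIGIBLE
    }


if __name__ == "__main__":
    for group, (interview, sample) in participation_rates().items():
        print(f"{group}: interview {interview}%, biospecimen {sample}%")
```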
The present analysis includes only cases and controls who donated a blood sample. Sixty-three of the cases in the current case-control study also participated in the Northern California site of the Breast Cancer Family Registry (16) and donated a blood sample as part of that study; those samples were obtained for this analysis.
In total, blood samples were available for 503 cases and 679 controls. Individuals who did not report their country of birth (n = 9), who were born in Europe (n = 6), Hawaii (n = 2), or the Philippines (n = 1), or who were born in a country represented by only one individual (Brazil, Dominican Republic) were excluded from the present analysis (11 cases and 9 controls). The final number of samples submitted for genotyping was 492 cases and 670 controls.
All participants provided written informed consent and the research protocols were approved by the respective Institutional Review Boards at University of California, San Francisco and the Northern California Cancer Center.
Measures
Survey data. Data on age, demographic background (education in years, country of birth, age at migration if not U.S. born, and country of birth of parents and grandparents), and known or suspected breast cancer risk factors (age at menarche, parity, age at first full-term pregnancy, breast-feeding, use of oral contraceptives, use of hormone replacement therapy, daily alcohol intake, family history of breast cancer, and benign breast disease) were collected by in-person interview using a structured questionnaire (7). Dietary intake during the reference year (defined as the year before diagnosis for cases and the year before selection into the study for controls) was assessed using a modified version of the Block Food Frequency Questionnaire. Standing height and weight were measured by the interviewers. Body mass index (BMI) was calculated as measured weight (kg) divided by measured height (m) squared. For participants (13 cases and 21 controls) who declined the measurements, the BMI was based on self-reported height and weight during the reference year.
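The BMI rule, including the fallback to self-reported reference-year values for participants who declined measurement, can be written as a small function. This is only a sketch: the function names (`bmi`, `participant_bmi`) are hypothetical and the example heights and weights are invented.

```python
from typing import Optional


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index = weight (kg) / height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def participant_bmi(measured_weight_kg: Optional[float],
                    measured_height_m: Optional[float],
                    reported_weight_kg: float,
                    reported_height_m: float) -> float:
    """Use interviewer-measured values when both are available; otherwise fall
    back to self-reported reference-year height and weight."""
    if measured_weight_kg is not None and measured_height_m is not None:
        return bmi(measured_weight_kg, measured_height_m)
    return bmi(reported_weight_kg, reported_height_m)


if __name__ == "__main__":
    print(round(participant_bmi(64.0, 1.60, 66.0, 1.62), 1))  # measured values used
    print(round(participant_bmi(None, None, 70.0, 1.75), 1))  # self-reported fallback
```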
Tumor grade, stage, histologic type, and hormone receptor status were obtained from the Surveillance, Epidemiology, and End Results Cancer Registry records. Estrogen and progesterone receptor status were dichotomized (positive, negative) based on categories reported in pathology records. Information on human epidermal growth factor receptor 2 (Her2) status was not routinely obtained by the cancer registry for cases diagnosed before 2002. Therefore, we did not include Her2 status in the present analysis.
Marker Selection and Ancestral Populations
A set of 106 single nucleotide polymorphisms (SNP) that distinguish Indigenous American, African, and European ancestry was used to estimate the proportion of genetic ancestry in the sample of U.S. Latinas. Simulation studies have shown that 100 AIMs with allele frequency differences similar to the ones we used are required to achieve a correlation coefficient of >0.9 with true ancestry (13); we therefore genotyped 112 markers with the goal of successfully typing >100 markers. The AIMs used in this study were biallelic SNPs selected from the Affymetrix 100K SNP chip. AIM selection was based on allele frequency differences between Europeans, West Africans, and Indigenous Americans. The selected SNPs maximize information for more than one ancestral population pairing, with a large difference in allele frequency between ancestral populations (>0.5). The AIMs are widely spaced throughout the genome and well distributed across all 22 autosomes. The average distance between markers is 2.4 × 10⁷ bp (about 24 Mb). The parental population samples genotyped on the Affymetrix 100K SNP chip included 42 Europeans (Coriell's North American Caucasian panel), 37 West Africans (nonadmixed Africans living in London, United Kingdom and South Carolina), and 30 Indigenous Americans (15 Mayans and 15 Nahuas). (More detailed information on the AIMs is available from the authors on request.)
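The selection criterion (an allele frequency difference >0.5 for at least one ancestral pairing, favoring SNPs informative for more than one pairing) can be sketched as a filter and ranking. The exact ranking rule used by the authors is not given in the text; the rule below (number of informative pairings, then the largest difference), the SNP names, and all frequencies are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical allele frequencies (one reference allele per SNP) in the three
# ancestral panels: European (EUR), West African (AFR), Indigenous American (NAM).
FREQS: Dict[str, Dict[str, float]] = {
    "snp_a": {"EUR": 0.90, "AFR": 0.10, "NAM": 0.20},
    "snp_b": {"EUR": 0.15, "AFR": 0.80, "NAM": 0.85},
    "snp_c": {"EUR": 0.40, "AFR": 0.55, "NAM": 0.50},
    "snp_d": {"EUR": 0.05, "AFR": 0.30, "NAM": 0.95},
}
MIN_DELTA = 0.5  # minimum absolute allele frequency difference between two panels


def informative_pairs(freqs: Dict[str, float],
                      min_delta: float = MIN_DELTA) -> List[Tuple[str, str]]:
    """Population pairings whose allele frequency difference exceeds min_delta."""
    return [(a, b) for a, b in combinations(sorted(freqs), 2)
            if abs(freqs[a] - freqs[b]) > min_delta]


def rank_aims(table: Dict[str, Dict[str, float]],
              min_delta: float = MIN_DELTA) -> List[str]:
    """Keep SNPs informative for at least one pairing; rank them by the number
    of informative pairings, then by the largest frequency difference."""
    scored = []
    for snp, freqs in table.items():
        pairs = informative_pairs(freqs, min_delta)
        if pairs:
            largest = max(abs(freqs[a] - freqs[b]) for a, b in pairs)
            scored.append((len(pairs), largest, snp))
    scored.sort(reverse=True)
    return [snp for _, _, snp in scored]


if __name__ == "__main__":
    print(rank_aims(FREQS))
```

Here `snp_c` is dropped because no pair of panels differs by more than 0.5.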
Genotyping
Genotyping of the 106 AIMs was done by Dr. Kenneth Beckman at the Children's Hospital Oakland Research Institute. Quality control was done on all DNA using a two-part procedure. Quantitative quality control (part 1) involved nonallelic quantitative real-time PCR using a single TaqMan probe to ensure amplifiability of DNA samples. Qualitative quality control (part 2) involved genotyping a balanced polymorphism present in most human populations (rs3818) to ensure that cross-contamination of samples had not occurred. Genotyping was done using iPLEX reagents and protocols for multiplex PCR, single-base primer extension, and generation of mass spectra, as per the manufacturer's instructions (for complete details, see iPLEX Application Note, Sequenom). It involved four multiplexed assays containing 29, 29, 28, and 26 SNPs, respectively, for a total of 112 candidate AIMs. Of these 112 markers, 106 robustly achieved call rates of 90% or higher, with typical call rates exceeding 99%; only those 106 markers were used in the study. Multiplexed PCR was done in 5-µL reactions on 384-well plates containing 5 ng of genomic DNA. Reactions contained 0.5 unit HotStarTaq polymerase (Qiagen), 100 nmol/L primers, 1.25× HotStarTaq buffer, 1.625 mmol/L MgCl2, and 500 µmol/L deoxynucleotide triphosphates (dNTP). Following enzyme activation at 94°C for 15 min, DNA was amplified with 45 cycles of 94°C for 20 s, 56°C for 30 s, and 72°C for 1 min, followed by a 3-min extension at 72°C. Unincorporated dNTPs were removed using shrimp alkaline phosphatase (0.3 unit; Sequenom). Single-base extension was carried out by addition of single-base primers at concentrations from 0.625 µmol/L (low molecular weight primers) to 1.25 µmol/L (high molecular weight primers) using iPLEX enzyme and buffers (Sequenom) in 9-µL reactions.
Reactions were desalted and single-base primer products measured using the MassARRAY Compact system, and mass spectra were analyzed using TYPER software (Sequenom) to generate genotype calls and allele frequencies.
There was insufficient DNA available for 574 individuals in the study, so DNA from these samples was amplified using a commercially available whole genome amplification kit (Qiagen REPLI-g Midi Kit). Of the samples that went through amplification, 92 yielded low-quality DNA and were excluded from the genotyping phase. A total of 1,070 samples (462 cases and 608 controls) were genotyped. Quality control metrics were high for both the whole genome amplification samples and the nonamplified samples: the average AIM success rate was 98.5% for whole genome amplification samples compared with 99% for nonamplified samples, and the average sample call rate was 95.6% and 97.4%, respectively. Samples with a call rate below 75% were excluded from the analysis (22 cases and 11 controls).
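The two call-rate rules (keep markers called in at least 90% of samples; drop samples called at fewer than 75% of markers) can be sketched as a filter over a genotype matrix. The text does not say in which order the filters were applied; this sketch assumes markers are filtered first and sample call rates are computed over the retained markers. The data and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[int]  # copies of one allele (0, 1, 2); None = failed call

MARKER_MIN_CALL_RATE = 0.90  # markers called in >= 90% of samples were kept
SAMPLE_MIN_CALL_RATE = 0.75  # samples called at < 75% of markers were dropped


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls."""
    return sum(g is not None for g in calls) / len(calls)


def qc_filter(matrix: Dict[str, List[Genotype]]) -> Tuple[List[int], List[str]]:
    """matrix maps sample id -> genotypes in a fixed marker order.
    Returns (indices of kept markers, ids of kept samples)."""
    samples = list(matrix)
    n_markers = len(matrix[samples[0]])
    kept_markers = [
        j for j in range(n_markers)
        if call_rate([matrix[s][j] for s in samples]) >= MARKER_MIN_CALL_RATE
    ]
    if not kept_markers:
        return [], []
    kept_samples = [
        s for s in samples
        if call_rate([matrix[s][j] for j in kept_markers]) >= SAMPLE_MIN_CALL_RATE
    ]
    return kept_markers, kept_samples


if __name__ == "__main__":
    # Toy data: 10 samples x 4 markers.
    data = {f"s{i}": [1, 0, 2, 1] for i in range(10)}
    data["s0"][3] = data["s1"][3] = None  # marker 3 called in only 8/10 samples
    data["s9"][0] = None                  # s9 then misses 1 of the 3 kept markers
    print(qc_filter(data))
```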
Three of the AIMs deviated significantly from Hardy-Weinberg equilibrium (P < 0.0005), all of them showing excess homozygosity, which is expected in the presence of population substructure (17).
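The text does not state which Hardy-Weinberg test was used. One common choice is the Pearson chi-square goodness-of-fit test with 1 degree of freedom (exact tests are often preferred when genotype counts are small); the sketch below assumes that test and also reports the inbreeding coefficient F, where F > 0 indicates excess homozygosity. The example pools two hypothetical subpopulations that are each in exact equilibrium, showing how population substructure alone produces excess homozygosity (the Wahlund effect).

```python
import math
from typing import Tuple


def hwe_test(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float, float]:
    """Pearson chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi-square statistic, P value with 1 degree of freedom, F), where
    F = 1 - observed heterozygosity / expected heterozygosity. F > 0 means more
    homozygotes than expected. Assumes both alleles are observed."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    f = 1.0 - (n_ab / n) / (2 * p * q)
    return chi2, p_value, f


if __name__ == "__main__":
    # Two hypothetical subpopulations of 400 people, each exactly in HWE
    # (allele A frequency 0.9 and 0.1), pooled into a single sample.
    pooled = (324 + 4, 72 + 72, 4 + 324)
    chi2, p_value, f = hwe_test(*pooled)
    print(f"chi2 = {chi2:.2f}, P = {p_value:.2e}, F = {f:.2f}")
```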
Genotype and phenotype information was available for a total of 1,037 individuals (440 cases and 597 controls).
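The sample flow from available blood samples to the analytic sample can be tallied step by step from the counts in the text. The split of the 92 low-quality amplified samples into 30 cases and 62 controls is not stated directly; the sketch infers it from the drop from 492 to 462 cases and from 670 to 608 controls. The final 440 cases and 597 controls also match the 167 + 273 cases and 286 + 311 controls given in the Introduction.

```python
from typing import List, Tuple

# (step, change in cases, change in controls); counts are taken from the text.
STEPS: List[Tuple[str, int, int]] = [
    ("blood samples available", 503, 679),
    ("excluded: country of birth missing or rarely represented", -11, -9),
    # 92 amplified samples failed in total; this case/control split is inferred
    # from the drop from 492 to 462 cases and from 670 to 608 controls.
    ("excluded: low-quality amplified DNA", -30, -62),
    ("excluded: sample call rate below 75%", -22, -11),
]


def running_totals(steps: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Cumulative (cases, controls) remaining after each step."""
    cases = controls = 0
    totals = []
    for label, d_cases, d_controls in steps:
        cases += d_cases
        controls += d_controls
        totals.append((label, cases, controls))
    return totals


if __name__ == "__main__":
    for label, cases, controls in running_totals(STEPS):
        print(f"{label:<58} cases={cases:>4} controls={controls:>4} "
              f"total={cases + controls:>5}")
```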
Statistical Analysis
Estimates of each individual's genetic ancestry were derived using a maximum likelihood approach (18, 19). The maximum likelihood model infers each individual's ancestry proportions from the probability of the genotypes observed at each locus, given the ancestral allele frequencies (Java code available from the authors on request). We used t tests for continuous variables and Fisher's exact tests on 2 × 2 contingency tables for categorical variables to determine whether characteristics differed significantly between cases and controls. Mean genetic ancestry was estimated as the average of the individual genetic ancestry estimates within a group.
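The text does not specify the optimizer used by the authors' Java program. One standard way to maximize this likelihood, when ancestral allele frequencies are treated as known, is an expectation-maximization (EM) algorithm over the ancestry proportions. The sketch below assumes that approach and assumes the two allele copies at each locus are independent draws given the individual's ancestry proportions; the function names and the example frequencies are hypothetical, and no standard errors are computed.

```python
import math
from typing import List, Optional, Sequence

Genotypes = Sequence[Optional[int]]      # copies of allele A per locus; None = missing
Frequencies = Sequence[Sequence[float]]  # freqs[l][k]: allele A frequency, population k


def log_likelihood(q: Sequence[float], genotypes: Genotypes, freqs: Frequencies) -> float:
    """Log-likelihood of one person's genotypes given ancestry proportions q.
    The binomial coefficient is dropped because it does not depend on q.
    Ancestral frequencies must lie strictly between 0 and 1."""
    ll = 0.0
    for g, p in zip(genotypes, freqs):
        if g is None:
            continue
        p_a = sum(qk * pk for qk, pk in zip(q, p))  # P(a random allele copy is A)
        ll += g * math.log(p_a) + (2 - g) * math.log(1.0 - p_a)
    return ll


def estimate_ancestry(genotypes: Genotypes, freqs: Frequencies,
                      max_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """EM estimate of ancestry proportions with known ancestral frequencies.
    E-step: split each observed allele copy across populations by its posterior
    probability of origin. M-step: q_k = expected share of copies from k."""
    k_pops = len(freqs[0])
    q = [1.0 / k_pops] * k_pops
    for _ in range(max_iter):
        expected = [0.0] * k_pops
        copies = 0
        for g, p in zip(genotypes, freqs):
            if g is None:
                continue
            p_a = sum(qk * pk for qk, pk in zip(q, p))
            for k in range(k_pops):
                expected[k] += (g * q[k] * p[k] / p_a
                                + (2 - g) * q[k] * (1.0 - p[k]) / (1.0 - p_a))
            copies += 2
        if copies == 0:
            raise ValueError("no non-missing genotypes")
        new_q = [e / copies for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


if __name__ == "__main__":
    # Hypothetical allele A frequencies in (European, West African, Indigenous American)
    panel = [(0.9, 0.1, 0.2), (0.1, 0.8, 0.9), (0.8, 0.2, 0.1),
             (0.2, 0.9, 0.8), (0.7, 0.6, 0.1)]
    person = [2, 0, 1, 1, 2]
    print([round(x, 3) for x in estimate_ancestry(person, panel)])
```

Because the log-likelihood is concave in the ancestry proportions, EM converges to the maximum likelihood estimate; with about 100 AIMs the estimate would be far more precise than with the five toy loci shown.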
Associations between breast cancer risk and genetic ancestry were assessed using logistic regression models. Genetic ancestry was modeled as a continuous variable, with each unit change representing a 25% increase in European or African ancestry. The multivariate adjusted models included European ancestry, age (continuous), family history of breast cancer in first-degree relatives (yes, no), place of birth (U.S. born, foreign born), personal history of benign breast disease (yes, no), age at menarche, number of full-term pregnancies, months of breast-feeding per child, use of hormone replacement therapy (yes, no), daily alcohol intake (≤10 versus >10 g), daily calorie intake (log transformed) during the reference year, and education (elementary school, middle school, high school, and college). Individuals with missing data were excluded from the multivariate analysis (32 cases and 25 controls). We also evaluated models that included both European and African ancestry (continuous) and models that used parent/grandparent European origin instead of genetic ancestry. The association with each AIM was evaluated with logistic regression models with and without genetic ancestry as a covariate, to compare the distribution of z statistics before and after correction for population substructure.
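The AIM-by-AIM comparison can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' STATA/R analysis: the Beta(4, 6) ancestry distribution, the effect size, the allele frequencies, and the function names are all assumptions; true ancestry is used in place of an estimate; and the other covariates are omitted. Disease depends only on European ancestry and each AIM depends only on ancestry, so any crude AIM association is pure confounding. Ancestry enters as `eur_25` (proportion divided by 0.25), so `exp` of its coefficient is the odds ratio per 25% increase, matching the scaling described in the text. Requires NumPy.

```python
import numpy as np


def logistic_fit(X: np.ndarray, y: np.ndarray, n_iter: int = 50):
    """Logistic regression by Newton-Raphson. X must contain an intercept column.
    Returns (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def simulate_aim_scan(n: int = 1037, n_markers: int = 106, seed: int = 2008):
    """Hypothetical data in which AIMs have no effect beyond ancestry.
    Returns (crude z, ancestry-adjusted z, odds ratio per 25% ancestry)."""
    rng = np.random.default_rng(seed)
    eur = rng.beta(4, 6, size=n)                 # European ancestry proportion
    eur_25 = eur / 0.25                          # one unit = 25% ancestry
    logit = -2.0 + 1.0 * eur_25                  # hypothetical log-OR 1.0 per 25%
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)
    p_nam = rng.uniform(0.05, 0.25, n_markers)   # Indigenous American frequency
    p_eur = rng.uniform(0.75, 0.95, n_markers)   # European frequency (delta > 0.5)
    freq = eur[:, None] * p_eur + (1.0 - eur[:, None]) * p_nam
    genos = rng.binomial(2, freq).astype(float)  # n x n_markers genotype matrix

    ones = np.ones(n)
    z_crude, z_adj = [], []
    for j in range(n_markers):
        b, se = logistic_fit(np.column_stack([ones, genos[:, j]]), y)
        z_crude.append(b[1] / se[1])
        b, se = logistic_fit(np.column_stack([ones, genos[:, j], eur_25]), y)
        z_adj.append(b[1] / se[1])
    b, _ = logistic_fit(np.column_stack([ones, eur_25]), y)
    return np.array(z_crude), np.array(z_adj), float(np.exp(b[1]))


if __name__ == "__main__":
    z_crude, z_adj, or_25 = simulate_aim_scan()
    print("AIMs with P < 0.05, unadjusted:", int(np.sum(np.abs(z_crude) > 1.96)))
    print("AIMs with P < 0.05, ancestry-adjusted:", int(np.sum(np.abs(z_adj) > 1.96)))
    print(f"odds ratio per 25% European ancestry: {or_25:.2f}")
```

In this setup the unadjusted scan typically flags far more markers than the roughly 5 of 106 expected by chance, whereas the adjusted scan stays near that null expectation; exact counts depend on the seed.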
All statistical tests were done using the programs STATA (20) and R (21), and all tests are two-sided.
Results
Adjustment for place of birth (U.S. born versus foreign born) and number of European-born ancestors was not as effective as genetic ancestry in eliminating the excess number of AIMs associated with risk of breast cancer. In models that included these factors but did not include genetic ancestry, 13 of 106 markers were nominally associated with breast cancer.
We estimated individual ancestry with and without the three AIMs that were not in Hardy-Weinberg equilibrium. Estimates were very similar and the associations remained significant.
Discussion
The association between European genetic ancestry and breast cancer needs to be interpreted with caution. There may be unmeasured or unknown risk factors for breast cancer that underlie the association that we observed. The present and previous studies (6, 8) found that breast cancer risk is higher among U.S. born Latinas, which suggests the influence of important unmeasured confounders. For example, place of birth (U.S. born versus foreign born) is significantly associated with breast cancer risk in our multivariate model and is likely to be a marker of some other more proximate risk factor. Similarly, genetic ancestry may be associated with other unmeasured, nongenetic factors that underlie breast cancer risk. Alternatively, our results suggest that there might be genetic variants with different frequencies in Indigenous American and European populations that influence risk for breast cancer. The only way to directly test this is to identify the genetic factors that underlie breast cancer susceptibility among Latinas. Such work is currently under way in a larger Latina population.
An important caveat in interpreting our results is that Indigenous American populations in the United States are diverse and may have systematic genetic (as well as obvious nongenetic) differences from Indigenous American populations in Mexico, Central America, and South America. Wang and colleagues (22) recently explored the population genetics of Amerindian populations from North, Central, and South America. They found greater genetic differentiation among populations in the Americas than among Asian or European populations, which may be due to repeated founder effects during the settlement of the Americas. Thus, even if the association we found is due to genetic factors, it may not apply to all indigenous populations in the Americas.
We found no evidence that associations with genetic ancestry differed by tumor characteristics such as hormone receptor status, stage, or grade. However, because sample sizes for most of the tumor subtypes were small, further work will be needed to explore the observed trends.
A related question addressed by our study is whether variation in genetic ancestry among Latina women acts as a confounding factor in genetic association studies of breast cancer. Our results show that such studies may be confounded by genetic ancestry. Without adjustment for genetic ancestry, the distribution of test statistics for the association between individual AIMs and breast cancer risk deviated markedly from that expected under the null hypothesis. After adjustment for genetic ancestry there was no such deviation, as expected based on theoretical results (23–29) and previous empirical studies (11–13, 28, 30–32). It is important to note that the AIMs we tested are among the markers most likely to be falsely associated with disease precisely because they are strongly correlated with genetic ancestry. However, bias due to stratification may affect even less informative markers as the sample size increases (27).
We observed a strong association between the number of European-born parents and grandparents and breast cancer risk. This implies that the information provided by Latina women about the place of birth of their parents and grandparents could be an adequate approximation to genetic ancestry for risk assessment purposes. However, when the number of European-born parents and grandparents was used to adjust the association of individual markers with breast cancer risk, 13 of 106 markers remained significant at P < 0.05, compared with 4 of 106 markers when genetic ancestry was adjusted for. Thus, genetic ancestry in recently admixed populations may provide information beyond that of grandparents' origin. The four SNPs with P < 0.05 after adjustment for ancestry are likely to be false positives because none remained significant after correction for multiple testing.
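The counts of nominally significant markers can be put in context with simple arithmetic: with 106 markers tested at P < 0.05, about 5.3 are expected by chance alone. The sketch below computes that expectation, a Bonferroni per-marker threshold (one common correction; the text does not state which correction was used), and binomial tail probabilities for observing at least 4 or at least 13 nominal hits. The binomial model assumes the 106 tests are independent, which is only an approximation because all markers share the same ancestry structure.

```python
from math import comb

N_MARKERS = 106
ALPHA = 0.05


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected_hits = N_MARKERS * ALPHA     # markers expected at P < 0.05 by chance
bonferroni_alpha = ALPHA / N_MARKERS  # per-marker threshold after Bonferroni

if __name__ == "__main__":
    print(f"expected nominal hits under the null: {expected_hits:.1f}")
    print(f"Bonferroni per-marker threshold: {bonferroni_alpha:.2e}")
    for label, hits in [("genetic ancestry", 4),
                        ("European-born parents/grandparents", 13)]:
        tail = binom_upper_tail(hits, N_MARKERS, ALPHA)
        print(f"adjusted for {label}: {hits} hits, P(X >= {hits}) = {tail:.4f}")
```

Under these assumptions, 4 hits is entirely consistent with chance, whereas 13 hits would be unusual if no residual confounding remained.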
In summary, European genetic ancestry in U.S. Latinas residing in the San Francisco Bay Area was associated with increased breast cancer risk after adjustment for known risk factors. Further work is needed to determine whether the observed association is due solely to differences in nongenetic risk factors not included in the model or to genetic differences between populations.
Acknowledgments
The content of this article does not necessarily reflect the views or policies of the NCI or any of the collaborating centers in the Breast Cancer Family Registry, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government or the Breast Cancer Family Registry.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank all the study participants.
Received 5/30/08. Revised 8/11/08. Accepted 9/3/08.