Molecular Biology, Pathobiology, and Genetics
1 Human Cancer Genetics Program and 2 Department of Internal Medicine, Comprehensive Cancer Center and 3 Mathematical Biosciences Institute, The Ohio State University, Columbus, Ohio; 4 Department of Preventive Medicine, Creighton University School of Medicine, Omaha, Nebraska; and 5 Department of Laboratory Medicine and Pathology, Mayo Clinic College of Medicine, Rochester, Minnesota
Requests for reprints: Albert de la Chapelle, Human Cancer Genetics, Room 804, Biomedical Research Tower, 460 West 12th Avenue, Columbus, OH 43210. Phone: 614-688-4781; Fax: 614-688-4772; E-mail: albert.delachapelle{at}osumc.edu.
| Abstract |
|---|
500 years (95% confidence interval, 425–625) for this mutation. Taken together, these data are suggestive of an earlier founding event than was first thought, which likely occurred in a European or a Native American population. The consequences of this finding would be that the AFM is significantly more frequent in the United States than was previously predicted. [Cancer Res 2008;68(7):2145–53]
| Introduction |
|---|
4,500 cases a year in the United States alone (2). It is currently accepted that Lynch syndrome arises as a result of mutations in mismatch repair genes, giving the syndrome a dominant inheritance pattern. Four mismatch repair genes, MLH1, MSH2, PMS2, and MSH6, are typically implicated in the transmission of Lynch syndrome, with the majority of mutations having been identified in either MLH1 or MSH2. A broad spectrum of mutations has been identified throughout these genes, but because the detection of large rearrangements was relatively poor until recently, the majority of mutations identified to date have been point mutations or small insertions/deletions. With the development of improved techniques, such as multiplex ligation-dependent probe amplification (MLPA; ref. 3) and semiquantitative PCR (4), the number of large genomic rearrangements identified has increased considerably. One such mutation is a deletion of exons 1 to 6 of MSH2, which was initially identified in nine families from the United States (5, 6). The MSH2 American Founder Mutation (AFM) was characterized by long-range PCR and sequencing of the breakpoints, which allowed this deletion to be distinguished from other similar, but distinct, exons 1 to 6 deletions. Subsequent genealogic studies linked three of the nine families and traced them through 12 generations to a putative common ancestor who migrated to the United States from Germany in the early 18th century (7). Based on this information, it was calculated that 18,981 [95% confidence interval (95% CI), 6,038–34,466] Americans carry the AFM (8).
In this study, we sought to gain further insight into the origin, occurrence, and spread of the MSH2 AFM in the United States. We developed a robust multiplex PCR which can be performed using standard conditions and thus allows for the accurate identification of AFM carriers. Using this detection method, we identified a further 32 families who carry the AFM. Haplotypes flanking the mutation were characterized in 29 of these new families, as well as four of the original families (throughout this article, we refer to the original families, families A–I, as families 1–9, respectively). The use of an intraallelic coalescent model generated an estimate for the age of the mutation, which may have considerable implications for the predicted origin and prevalence of the MSH2 AFM.
| Materials and Methods |
|---|
Genealogic analysis. Each proband with an exons 1 to 6 deletion of MSH2 who enrolled in this study provided a copy of their pedigree as drawn by their local genetic counselor and named a family member contact person to help with the genealogy work. The family member contact was asked to review the family tree to provide as much of the following information as possible on the oldest individuals in the pedigree (generally grandparents): full names, dates of birth, places of birth, dates of death, places of death, sibling names, and parent names. In addition, where possible, we tried to ascertain from which side of the family the mutation had been inherited to help limit the amount of genealogic work necessary. This information was then used by the study genealogist (M.E.B.) to develop a tentative genealogy based on a general consensus of information (footnote 7) and from other pedigrees that have been posted on various Internet sites (footnote 8). After developing a "consensus" pedigree, the genealogist then attempted to find original documentation or other verifiable information to prove or disprove each claim (especially with respect to individuals born before 1850, when the United States censuses first began to list all household members). This was accomplished using censuses, wills, marriage and death records, deeds, cemetery records, diaries, and other related information that can be accessed via the Internet to determine names, dates, and geographic locations for the generation being analyzed. This information was then verified by cross-correlation with the same information for later and/or earlier generations using standard genealogic techniques.
MLPA. MLPA was carried out according to the manufacturer's instructions (MRC-Holland; footnote 9). Briefly, 125 ng of genomic DNA was denatured and hybridized for 18 h at 60°C with the probe mix. Samples were then treated with ligase-65 at 54°C for 15 min. Subsequent PCR reactions were performed using a 6-carboxyfluorescein (FAM)–labeled primer. PCR products were resolved using an ABI3700 sequencer, and the results were analyzed with Genotyper software (Applied Biosystems). Deletions were suspected when the peak height was 60% or less of the peak height of the normal controls.
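The 60% peak-height rule can be illustrated with a minimal sketch. The function name `mlpa_call`, the use of the mean of the control peaks as the reference, and the example peak heights are assumptions made for illustration; any normalization applied by the analysis software is not modeled here.

```python
def mlpa_call(sample_peak: float, control_peaks: list[float],
              threshold: float = 0.60) -> str:
    """Flag a probe as a suspected deletion when the sample peak height is
    at or below `threshold` times the reference peak height.

    Assumption: the reference is the mean peak height of the normal
    controls for the same probe.
    """
    if not control_peaks:
        raise ValueError("at least one control peak is required")
    control_mean = sum(control_peaks) / len(control_peaks)
    if control_mean <= 0:
        raise ValueError("control peak heights must be positive")
    ratio = sample_peak / control_mean
    return "suspected deletion" if ratio <= threshold else "normal"


# Hypothetical peak heights (arbitrary fluorescence units).
print(mlpa_call(520.0, [1000.0, 1040.0]))  # ratio is about 0.51
print(mlpa_call(980.0, [1000.0, 1040.0]))  # ratio is about 0.96
```

A heterozygous deletion is expected to roughly halve the signal, which is why a cutoff somewhat above 50% is a plausible screening threshold.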
PCR-based screening and breakpoint characterization. Previously published primer sets (6) for this procedure were not ideal due to their tendency to produce false-negative results, primarily because the product was relatively large and spanned three complete Alu repeats and one partial Alu repeat. For this study, we positioned the forward primer (5'-gcctggcgtcaaacgtt-3') in a region between two repeat elements that lacked homology to other human sequences. The reverse primer (5'-tgagtcattttggggatcagtt-3') is very similar to the F3 primer used by Wagner et al. (6); however, in combination with this new forward primer, it produces a breakpoint-specific 562-bp amplicon under standard PCR conditions. To ensure that poor DNA quality was not a reason for a negative result, this amplicon was multiplexed with an 811-bp amplicon (forward 5'-aagcatctcacctcatcctaacaca-3', reverse 5'-ggatcacacctgccttaaattgcat-3') from the BRAF locus. Each 15-µL reaction, containing 7.5 µL of GoTaq master mix (Promega), 25 ng of genomic DNA, and 5 pmol of each of the four primers, was cycled under the following conditions: 95°C for 2 min, 30 cycles of 95°C for 30 s, 60°C for 30 s, 72°C for 45 s, and a final extension at 72°C for 8 min. Samples with a breakpoint-specific product were treated with ExoSAP-IT (USB Corporation) and sequenced with the forward primer to confirm the presence of an AFM-specific breakpoint.
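The logic of reading the multiplex result can be sketched as follows. The function name `interpret_multiplex`, the size tolerance, and the wording of the returned labels are hypothetical; only the 562-bp and 811-bp amplicon sizes and the role of the BRAF product as a DNA-quality control come from the text.

```python
AFM_AMPLICON_BP = 562      # breakpoint-specific product
CONTROL_AMPLICON_BP = 811  # BRAF control product


def interpret_multiplex(band_sizes_bp: list[int], tolerance_bp: int = 20) -> str:
    """Classify one reaction from the band sizes observed on a gel.

    Assumption: a band within `tolerance_bp` of the expected size counts
    as that product; the tolerance is illustrative, not a validated value.
    """
    def has_band(expected_bp: int) -> bool:
        return any(abs(size - expected_bp) <= tolerance_bp
                   for size in band_sizes_bp)

    if has_band(AFM_AMPLICON_BP):
        # A breakpoint-specific product is followed up by sequencing.
        return "AFM product present: confirm breakpoint by sequencing"
    if has_band(CONTROL_AMPLICON_BP):
        # Control amplified, so the negative result is informative.
        return "AFM negative"
    # Neither product: poor DNA quality or a failed reaction.
    return "uninterpretable: repeat reaction"


print(interpret_multiplex([565, 810]))
print(interpret_multiplex([812]))
print(interpret_multiplex([]))
```

Without the control amplicon, a failed reaction and a true negative would look identical, which is the false-negative risk the multiplex design is meant to reduce.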
Diploid-to-haploid conversion. Haploid-converted clones from patient 8 were created using the conversion technology of Yan et al. (10). In brief, human lymphoblastoid cell lines were electrofused with a specifically designed mouse cell line (E2). Unfused mouse parental cells were negatively selected by sodium hypoxanthine, aminopterin, and thymidine (HAT), and unfused human lymphocytes were negatively selected by Geneticin. Hybrid cells were maintained in DMEM, including 10% fetal bovine serum, 0.5 mg/mL Geneticin, 1x HAT, and penicillin-streptomycin. To ensure that each clone contained one of the two homologues of chromosome 2, clones were typed at two polymorphic microsatellite loci (D2S2952 and D2S434).
Genotype analysis. Twelve microsatellite and seven single-nucleotide polymorphism (SNP) markers (Supplementary Table S1 and Supplementary Fig. S1) were used to obtain a haplotype spanning 12.5 Mb across the MSH2 locus. Markers were typed in AFM-positive probands and a set of 118 Caucasian control DNAs (Coriell Institute). SNPs were typed using standard PCR and sequencing protocols. For the microsatellite analysis, the reverse primer had an M13 tail which, in combination with a FAM-labeled M13 oligo, allowed the products to be sized using an ABI7000. Each 25-µL PCR reaction contained 12.5 µL of AmpliTaq Gold master mix (Applied Biosystems), 25 ng of genomic DNA, 10 pmol of forward primer, 2 pmol of tailed reverse primer, and 10 pmol of the FAM-labeled M13 primer. Reactions were multiplexed when possible and cycled using the following profile: 96°C for 10 min, 50 cycles of 96°C for 30 s, 60°C for 30 s, 72°C for 30 s, and a final extension at 72°C for 10 min.
Estimating allele age. To make an estimate for the age of the AFM, we took two approaches, both of which use the haplotype data generated from affected individuals and data obtained from a set of unaffected controls. The first method, described by Risch et al. (11), gives a separate age estimate for each marker using the following calculation: age (in generations) = $\ln(\mathrm{LD}) / \ln(1 - \theta)$, wherein LD is the linkage disequilibrium index (12) and $\theta$ is the recombination fraction between the marker of interest and the mutation. Genetic distances were calculated using a conversion factor of 1.1 Mb/cM, which was obtained from a comparison of centimorgan and megabase values for the 2p22-2p16 region (based on deCODE mapping data; ref. 13). The second method uses an intraallelic coalescent model to assess the linkage disequilibrium across the marker set as a whole. The analysis is performed using the DMLE+2.2 software (14; footnote 10). In addition to the genotype data, marker locations, population growth rates, and an estimate for the proportion of disease-bearing chromosomes being analyzed are used by the software. To estimate the proportion of disease-bearing chromosomes being studied, we used the following data: 6% lifetime risk for CRC (American Cancer Society; footnote 11), 2.8% of CRCs are Lynch syndrome (footnote 6), and 6.8% of Lynch syndrome cases are due to the AFM (footnote 6). For the population growth rate, we took data from the U.S. census, which records population figures back to 1790 (footnote 12). The census records show that the population growth rates have decreased with time; so to account for this, we used a growth rate across the whole time period [1.65-fold per generation (fpg), where a generation is taken as 25 y] along with two extreme growth rates (one from the 19th century, 1.96 fpg, and one from the 20th century, 1.38 fpg) which will be representative of our outer CIs.
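The single-marker calculation can be sketched in a few lines. Two details below are assumptions, not statements from the text: the linkage disequilibrium index is taken as (p_d − p_n) / (1 − p_n), where p_d and p_n are the frequencies of the associated allele on disease and control chromosomes, and the recombination fraction is approximated as the genetic distance in cM divided by 100, which is reasonable only for short distances. The allele frequencies in the example are hypothetical.

```python
import math

MB_PER_CM = 1.1  # conversion factor quoted for the 2p22-2p16 region


def ld_index(p_disease: float, p_normal: float) -> float:
    """Assumed form of the LD index: (p_d - p_n) / (1 - p_n)."""
    return (p_disease - p_normal) / (1.0 - p_normal)


def recombination_fraction(distance_mb: float) -> float:
    """Approximate theta from physical distance (short-distance assumption)."""
    distance_cm = distance_mb / MB_PER_CM
    return distance_cm / 100.0


def age_in_generations(p_disease: float, p_normal: float,
                       distance_mb: float) -> float:
    """Age = ln(LD) / ln(1 - theta) for a single marker."""
    delta = ld_index(p_disease, p_normal)
    theta = recombination_fraction(distance_mb)
    if not 0.0 < delta <= 1.0:
        raise ValueError("LD index must lie in (0, 1]")
    if not 0.0 < theta < 0.5:
        raise ValueError("recombination fraction must lie in (0, 0.5)")
    return math.log(delta) / math.log(1.0 - theta)


# Hypothetical marker 1.0 Mb from the mutation: the associated allele is on
# 90% of disease chromosomes and 20% of control chromosomes.
generations = age_in_generations(0.90, 0.20, 1.0)
print(f"{generations:.1f} generations, {generations * 25:.0f} years")
```

Because each marker yields its own estimate, markers far from the mutation or with common associated alleles can give widely different ages, which is one motivation for the combined-marker analysis.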
| Results |
|---|
For each proband who carried the AFM, a geographic location was recorded and plotted onto a map of the United States (Fig. 2). From these data, we were able to see that present-day families were primarily located within three states: Kentucky, Ohio, and Texas. To address the possibility that ascertainment bias accounted for the apparent prevalence within Ohio and Kentucky, we sought additional data on AFM carriers from two of the nation's most prominent genetics laboratories, the Mayo Clinic (footnote 13) and Myriad Genetics (footnote 14). In keeping with HIPAA guidelines, both laboratories were unable to provide certain patient-specific information; however, they were able to report the incidence of the AFM among Lynch syndrome patients in their cohorts (4.3% and 7%, respectively), as well as the geographic locations of their AFM-positive probands (Fig. 2).
Genotype characterization. To link the probands genetically and rule out the possibility that the mutation was occurring in the affected families de novo, we typed a set of polymorphic microsatellite markers and SNPs which flanked the mutation site. A panel of 12 microsatellites which flanked the mutation site and had an average interval of 956 kb (range, 120 kb–2 Mb) was chosen. We optimized microsatellite selection based on three criteria: (a) the repeat had to be highly polymorphic to reduce the likelihood of similarity by chance, (b) the microsatellite and its flanking sequence had to be unique within the genome, and (c) the repeat could not be located close to or within a more complex repeat, such as an Alu, which can often introduce additional size variation due to poly(A) tracts and can cause problems with primer binding due to its high prevalence in the genome. The use of a haploid cell line from patient 8 enabled us to generate an inferred haplotype for 34 samples (29 from this study and 5 from the original study; Fig. 3) and showed that all patients shared a common disease haplotype. Seven informative SNPs were typed at selected regions to establish whether a specific microsatellite differed from the consensus because of a recombination event rather than a mutation. Twenty-one of the patients have disease-specific haplotypes in excess of 4.99 Mb; however, the core haplotype shared by all 34 of the patients is between 0.59 Mb (Clen32-Clen30) and 2.26 Mb (rs10495934-Clen25). To ensure that allele match was not a chance event, a panel of 118 control DNAs was typed with the 19 markers to determine the degree of variation for each marker (Fig. 3).
Estimate of allele age. It has been shown that the rate of recombination and population frequency of flanking marker alleles can be used to estimate the age of a mutation. In this study, we used two established methods: a single marker approach (11) and a combined marker approach (14). Using a marker-by-marker approach, we generated age estimates at all loci with respect to the AFM (Supplementary Table S2; refs. 18, 19). Each marker estimate is generated independently from all the others, based on its location with respect to the mutation (recombination fraction) and the frequency with which the disease allele is found within the normal population. When all markers are considered, we obtain a broad estimate of 4 to 98 generations (100–2,450 years), with 13 of 15 markers giving a range of 4 to 26 generations (100–650 years). The two markers which define the outer limits of the fully conserved haplotype (rs10495934 and Clen25) give an estimate of 23 generations (575 years) and 19 generations (475 years), respectively.
The second method, which considers the linkage disequilibrium across all markers in a single analysis, is implemented in the DMLE+2.2 software. To use this method, we provided the program with an estimate of 32,150 (see Materials and Methods) for the number of disease chromosomes currently present in the U.S. population (based on data from the 2000 census). Three separate analyses were performed, each using a different estimate for the population growth rate (Fig. 6). The first analysis, which used a fast growth rate estimated from the 19th century censuses, gave an age estimate of 375 years (15 generations; 95% CI, 13–18). The second analysis, which used a slow growth rate estimated from the 20th century censuses, gave an age estimate of 775 years (31 generations; 95% CI, 23–40). The third analysis used a growth rate estimate from across both time periods and gave an age estimate of 500 years (20 generations; 95% CI, 17–25).
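The arithmetic behind these inputs can be checked with a short sketch. The three percentages and the 25-year generation are taken from Materials and Methods; the census totals for 1790 and 2000 are assumed inputs that are not stated in the article, and multiplying the three percentages by the 2000 population is an assumption about how the 32,150 figure was obtained. The function names are hypothetical.

```python
# Inputs quoted in Materials and Methods.
CRC_LIFETIME_RISK = 0.06
LYNCH_FRACTION_OF_CRC = 0.028
AFM_FRACTION_OF_LYNCH = 0.068
YEARS_PER_GENERATION = 25

# Assumed census totals (not stated in the article).
US_POPULATION_1790 = 3_929_214
US_POPULATION_2000 = 281_421_906


def estimated_disease_chromosomes(population: int = US_POPULATION_2000) -> float:
    """Population multiplied by the product of the three proportions."""
    carrier_fraction = (CRC_LIFETIME_RISK
                        * LYNCH_FRACTION_OF_CRC
                        * AFM_FRACTION_OF_LYNCH)
    return population * carrier_fraction


def generations_to_years(generations: float) -> float:
    """Convert generations to years using a 25-year generation."""
    return generations * YEARS_PER_GENERATION


def fold_per_generation(pop_start: float, pop_end: float, years: float) -> float:
    """Geometric-mean population growth per 25-year generation."""
    generations = years / YEARS_PER_GENERATION
    return (pop_end / pop_start) ** (1.0 / generations)


print(round(estimated_disease_chromosomes()))
print(generations_to_years(20), generations_to_years(17), generations_to_years(25))
print(round(fold_per_generation(US_POPULATION_1790, US_POPULATION_2000, 210), 2))
```

Under these assumptions the product lands on the 32,150 disease chromosomes supplied to the program, the generation counts convert to the reported 500 years (425–625), and the whole-period growth rate comes out close to the 1.65 fpg used in the third analysis.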
| Discussion |
|---|
500 years for the AFM. Whereas haplotyping proves that these modern-day AFM carriers are all distantly related to one another, the oldest genealogic links are to individuals born during the early 18th century in Virginia. As it becomes ever more difficult to extend these genealogies before 1700, due to the sparsity of recorded data from these times, it is apparent that we may never know who the original AFM founding individual was. This MSH2 AFM has some similarities to a recently described AFM in the APC gene, which was brought to the United States from England around 1630 and affects two large U.S. kindreds (20). Our new genealogic findings and haplotyping data do, however, provide very strong evidence against the hypothesis that the AFM arrived in the United States as a single event in the early 18th century. The records link 27 of the AFM families into seven extended families, each with a common ancestor born between 1700 and the early 1800s. It therefore seems very unlikely that these families could all converge within a period consistent with a single common ancestor having arrived in the United States during the early 18th century, a time of significant European immigration. With this in mind, we are left with two hypotheses, both of which are supported by the mutation age calculations. Under the first, the subfounder families were in the United States for several generations before the current subfounder common ancestors, which would allow for the possibility either that the mutation was brought into the United States by a single European immigrant during an earlier period or that it was introduced into these European lines by a Native American individual during this early colonial period. Under the second, the mutation originated within Europe several generations before it arrived on the shores of the United States, probably through several individuals.
The lack of evidence for the AFM outside of the United States is perhaps more supportive of the first hypothesis. However, it is also possible that most carriers of the mutation in Europe emigrated, or that many failed to pass on the mutation to subsequent generations, in which case we might predict a significant reduction of cases, or even an absence of cases, in the founding country. In addition, our search for the AFM in Europe has by no means been exhaustive, so it is still possible that there are AFM carriers in Europe whom we have not yet been able to identify. We have speculated, based on our genealogic studies, that Scotland in particular is a potential European source for the mutation; however, we have only been able to identify a single candidate (individual with a known exons 1–6 deletion of MSH2) from this region, who was shown not to carry the AFM. In addition, samples we have screened from an ancestrally related population (Ireland) were also negative for the mutation, and to the best of our knowledge, it has not been described elsewhere in the British Isles. Until evidence in support of one of these two hypotheses is obtained, whether it is the identification of a single common ancestor within the United States or the presence of a case of the AFM in another country, we cannot make firm conclusions as to the origins of this mutation. Nevertheless, our current data show that the AFM is significantly more prevalent within the United States than was previously thought, and is clearly a prominent cause of Lynch syndrome, particularly in Ohio, Kentucky, and Texas. This is an important public health issue because there is ample evidence that individuals who are identified as having Lynch syndrome can benefit from highly targeted cancer control measures (21–23).
With this in mind, we would promote the introduction of a single-site diagnostic PCR, such as the one described here, as a first line of molecular screening in patients whose tumors stain negatively for MSH2 at the protein level. Such a diagnostic test would not only reduce the cost of genetic screening in a significant proportion of Lynch syndrome cases, but it would also allow for a far more rapid diagnosis than techniques such as MLPA and complete gene sequencing.
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Thierry Frebourg, Annika Lindblom, Alessandra Viel, Jan Lubinski, Elisabeth Mangold, Timm Goecke, Patrick Morrison, Andrew Green, Malcolm Dunlop, and Rebecca Barnetson for the provision and screening of MSH2 exons 1 to 6 deletion cases; Thomas Prior and the Ohio State University Molecular Pathology Laboratory for confirming the breakpoints in all the probands and for providing testing to the at-risk family members; Douglas Crews for his contributions to the genealogic studies; Richard Wenstrup, Lynn Anne Burbidge, and Cynthia Frye of Myriad Genetic Laboratories, Inc., for the provision of epidemiologic data; and the families who participated in this research.
| Footnotes |
|---|
6 H. Hampel, W.L. Frankel, S. Ramsey, et al., unpublished observations.
7 Found at www.ancestry.com.
8 Usually a combination of information found at either www.ancestry.com or www.familysearch.org and from privately known/researched information.
10 Freely available from www.dmle.org.
11 http://www.cancer.org/downloads/STT/CAFF2006PWSecured.pdf
12 www.census.gov and http://www.usapopulationmap.com.
15 Present-day northeastern Tennessee was part of North Carolina during the described period.
16 http://www.libs.uga.edu/darchive/hargrett/maps/maps.html
Received 12/10/07. Revised 1/15/08. Accepted 1/16/08.