Molecular Biology, Pathobiology, and Genetics |
Departments of 1 Oncology and 2 Pathology, The Sol Goldman Pancreatic Cancer Research Center at the Sidney Kimmel Comprehensive Cancer Center, and 3 Department of Genetic Medicine at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins Medical Institutes, Baltimore, Maryland
Requests for reprints: Eric S. Calhoun, Department of Oncology, Johns Hopkins Medical Institutes, CRB 1, Room 464, 1650 Orleans Street, Baltimore, MD 21231. Phone: 410-614-3314; Fax: 410-614-9705; E-mail: ecalhou2{at}jhmi.edu.
| Abstract |
|---|
| Introduction |
|---|
The efficiency of analyzing allelic loss is also dependent on technological limitations. The analysis of microsatellite markers has been the gold standard for estimating allelic loss because the rate of heterozygosity at each locus is comparatively higher than that seen for diallelic single-nucleotide polymorphisms (SNP; refs. 2-4). The low heterozygosity frequencies of SNPs have been one reason why analyses based on SNPs have found limited use in genetic studies (5). Technical difficulties in scaling microsatellite marker studies, however, limit the number of samples and/or the number of markers used, effectively ensuring low genomic coverage and poor resolution of LOH breakpoints (6). Recent advances in detecting chromosomal copy number (e.g., bacterial artificial chromosome array-based comparative genomic hybridization and representational oligonucleotide microarray analysis; refs. 7-11) have provided a solution to some of these problems but fail to identify many cases of LOH, as when uniparental disomy or polysomy is present. Additionally, genomic coverage remains somewhat limited by the number of probes arrayed (recently ~5,400 probes per array; ref. 10).
New high-throughput technologies, such as the Affymetrix genotyping arrays which evaluate >100,000 genotypic-variant loci simultaneously, effectively allow for a high-resolution genome-wide analysis of allelic status on large sample sets (3, 12). The reduced informativeness of diallelic markers is overcome by an increase in the number of evaluated loci and subsequently the number of informative calls. For example, Matsuzaki et al. (3) reported a heterozygous frequency rate of 30.41% using the 100K SNP arrays, which would yield nearly 30,000 informative calls in diploid individuals. The quality of the results, however, is dependent on the purity of the starting tissue. Allelic loss studies are dramatically affected by contaminating normal tissues, a condition common to many scirrhous neoplasms (1, 13), and the effect of contaminating mouse tissues in xenografts has yet to be determined for SNP analysis. Consequently, studies of these tumor systems using the Affymetrix genotyping arrays are limited to established cell line cultures.
By combining the analysis of established, commercially available pancreatic cancer cell lines, as well as several early-passage cell lines with the Affymetrix 100K genotyping arrays, we have overcome long-standing problems associated with allelic-loss analysis in pancreatic cancer. We present here the high-resolution allelotype and breakpoint maps of 26 pancreatic cancer cell lines without the use of matched normal tissues.
| Materials and Methods |
|---|
Microsatellite marker analysis. Microsatellite marker mapping was done in collaboration with the NIH Center for Inherited Disease Research using a total of 386 microsatellite markers (Supplementary File 1; ref. 4). The positions of 317 of these markers (chromosomes 1-22) are currently mapped by the Ensembl database (16) and were used for comparisons made here. The genotype (the identity of the alleles at a particular locus) for each marker was "called" as homozygous/hemizygous (A) or heterozygous (B) based on the resolution of one or two PCR bands, respectively. If the PCR reaction yielded no detectable band, the marker was classified as missing (M). Homozygous/hemizygous and missing calls were considered uninformative whereas heterozygosity indicated the retention of both parental alleles. Data were visualized with Excel (Microsoft, Redmond, WA) by plotting the calls with respect to their positions in the May 2004 human genome build.
SNP allelotype mapping. The genotypes of 115,353 SNPs were analyzed using the Affymetrix CentXba and CentHind oligonucleotide arrays hybridized to reduced-complexity genomic DNA as described (3). Briefly, 250 ng of genomic DNA were digested with either XbaI or HindIII, adapters were ligated to the digested DNA, and PCR was done to preferentially amplify fragments 250 to 2,000 bp in length. Samples were then fragmented, fluorescently labeled, and hybridized to the arrays according to the protocol of the manufacturer. Genotypes were called by the GeneChip DNA Analysis Software Tool (version 3.0) using a 0.05% error setting (Supplementary Files 2-5). Four potential genotype calls were made: AA, BB, AB, and NoCall. No distinction was made between homozygosity for either allele; each result was considered uninformative and provided only a visual reference to indicate genome coverage. Heterozygous calls indicated the retention of both alleles and were considered informative. "NoCall" data were not used in the analysis of LOH. Genotyping calls were visualized by plotting with respect to their genomic position listed in the May 2004 assembly of the human genome. Any probes of which the position was not known or not annotated by GeneChip DNA Analysis Software Tool were subsequently omitted as were SNPs with a Caucasian heterozygous frequency of zero (3). This left a maximum of 104,502 evaluated SNP loci per sample.
Critical gap distance for LOH. The distances between heterozygous calls for each of the 42 Caucasian normal, EBV-transformed, lymphoblastoid cell lines were calculated from previously published allelotypes (3). An interheterozygous call distance of 2.75 million bp (Mbp) represented a value greater than the 99.9th percentile of all gap sizes observed. This distance was used as the cutoff to eliminate smaller regions of contiguous homozygous SNP markers that occur naturally (i.e., by chance) in each cell line. Candidate regions evaluated by at least 40 SNPs were considered as "critical" gaps, indicating instances of LOH. Immediately adjacent critical gaps were combined and represented as one contiguous region of LOH. Instances of regions occurring between critical gaps, or those occurring at the ends of chromosome arms, were combined into one contiguous region at our best judgment. Centromeric regions were included in LOH estimates when critical gaps were located immediately adjacent on both p and q arms; otherwise, each breakpoint estimate was terminated using the position of the last SNP evaluated by the arrays on each respective arm. Excessively homozygous regions (between 2.75 and 8 Mbp and at least 40 SNPs) identified in the 131 Caucasian normals (Supplementary Table S1) were removed from the list of critical gaps as they may represent false-positive events of identity-by-descent. Finally, homozygous regions >8 Mbp identified in the normals were believed to represent true LOH events affecting the immortalized "normal" cell line samples and, as such, were not used to remove critical gaps from the cell line data.
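The critical gap rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: the function name `critical_gaps` and its inputs are hypothetical, the counting of only AA/BB calls toward the 40-SNP minimum is an assumption, and the filtering against homozygous regions seen in the normal controls and the merging of adjacent gaps are omitted.

```python
# Thresholds taken from the text; everything else here is an illustrative assumption.
CRITICAL_GAP_BP = 2_750_000  # inter-heterozygous distance cutoff (2.75 Mbp)
MIN_SNPS = 40                # minimum number of SNPs evaluated inside a gap


def critical_gaps(positions, calls, min_bp=CRITICAL_GAP_BP, min_snps=MIN_SNPS):
    """Return (start_bp, end_bp, n_homozygous) for each candidate LOH gap.

    positions: sorted SNP positions (bp) along one chromosome arm.
    calls:     genotype calls ("AA", "BB", "AB", or "NoCall"), same order.
    A gap runs from one heterozygous (AB) call to the next; at the ends of
    the arm the terminal SNP is used as the boundary instead.
    """
    n = len(positions)
    if n == 0:
        return []
    # Boundaries: virtual start, every heterozygous call, virtual end.
    bounds = [-1] + [i for i, c in enumerate(calls) if c == "AB"] + [n]
    gaps = []
    for left, right in zip(bounds, bounds[1:]):
        # Assumption: only called homozygous SNPs count as "evaluated".
        n_hom = sum(1 for i in range(left + 1, right) if calls[i] in ("AA", "BB"))
        start = positions[max(left, 0)]
        end = positions[min(right, n - 1)]
        if end - start > min_bp and n_hom >= min_snps:
            gaps.append((start, end, n_hom))
    return gaps
```

Requiring both the distance and the SNP-count condition keeps sparsely covered stretches from being reported as loss merely because few probes fall inside them.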
Precision of critical gap breakpoints. Matched normal tissues (not immortalized cell lines) were available for three of the early-passage carcinoma cell lines (PL9, PN139; PL11, PN192; PL13, PN147) and were used to test the method for estimating critical gaps. SNP allelotypes for each normal were determined as described above and compared with the respective carcinoma line allelotype maps. Gaps containing at least three informative hemizygous markers were used to identify a region of true LOH, and the positions of the outermost hemizygous markers established the minimal region of loss (MRL) breakpoints. Regions identified as lost using the matched normals, but not identified as a critical gap, were used to estimate the false-negative rate. Critical gaps containing fewer than three informative markers, as determined from the matched normal genotypes, were used to estimate the false-positive rate. Critical gaps showing true loss were compared with the MRL estimates to gauge the precision of our allelic loss breakpoint positions.
Estimation of total allelic loss. The extent of total allelic loss was estimated by calculating genomic fractional allelic loss (GFAL; ref. 4), with the following modifications. An expected Caucasian frequency of total heterozygous calls for the Affymetrix CentXba and CentHind oligonucleotide arrays (0.3041; ref. 3) was used as an estimate of the total number of informative markers. GFAL was calculated by dividing the total observed heterozygous frequencies by 0.3041, subtracting from 1, and converting to a percent.
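The modified GFAL calculation can be written out as a short Python sketch. The function name `gfal_percent` is hypothetical, and the observed heterozygous frequency is assumed to be the number of heterozygous calls divided by the number of genotypes called, which is how the frequencies are reported in the Results.

```python
# Expected Caucasian heterozygous call frequency for the 100K arrays (ref. 3).
EXPECTED_HET_FREQUENCY = 0.3041


def gfal_percent(het_calls, genotype_calls, expected=EXPECTED_HET_FREQUENCY):
    """Genomic fractional allelic loss, in percent.

    Assumption: the observed frequency is het_calls / genotype_calls,
    where genotype_calls excludes NoCall results.
    """
    observed = het_calls / genotype_calls
    return (1.0 - observed / expected) * 100.0


# The lowest heterozygous frequency reported in the Results (5,326 of 87,005).
print(round(gfal_percent(5326, 87005), 2))  # 79.87
```

The printed value agrees with the highest GFAL reported in the Results (79.87%, MiaPaCa2), which supports the assumed definition of the observed frequency.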
Statistical analysis. The Spearman rank correlation coefficient was used to evaluate whether the degree of allelic loss (i.e., increasing levels of GFAL) correlated with the number of years since the cell line was isolated and established in culture. The year of establishment was obtained from published records (17-23) or from the respective suppliers and ranked with the oldest as last (ranked 24th, Table 1). GFAL scores were similarly ranked in descending order with cell lines having the highest degree of allelic loss last (ranked 24th). Cell lines with the same year of establishment were assigned an average of the ranks for the year. A high level of significance [α = 0.01; critical region rs > 0.485 (n = 24)] was used to evaluate the test statistic. The critical value as well as the method for calculation of the test statistic was obtained from published sources (24). The microsatellite unstable lines, PL3 and PL5, were not included in this analysis as they showed no deviation of GFAL from the 42 published Caucasian controls (Table 1 and data not shown; ref. 3).
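As an illustration of the ranking scheme, the following Python sketch assigns averaged ranks to tied values and computes the Spearman coefficient as the Pearson correlation of the two rank vectors. It is a generic textbook formulation, not necessarily the exact calculation taken from ref. 24; the function names are hypothetical, and only the critical value of 0.485 comes from the text.

```python
from math import sqrt

CRITICAL_RS = 0.485  # critical value quoted in the text (alpha = 0.01, n = 24)


def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

With years in culture as `x` and GFAL scores as `y` for the 24 lines, a result would be treated as significant only if `spearman_rs(x, y) > CRITICAL_RS`.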
Homozygous deletions. Dramatic drops in copy number values below 1.0, evidenced by both arrays, were indicative of a true homozygous deletion. With decreasing regional size (and/or corresponding copy number averages closer to 1.0), the ability to distinguish between homozygous deletion and LOH was reduced. To compensate, we calculated a 20-SNP moving average of the frequency of NoCall genotypes across the genome for each array. Agreement between increasing NoCall frequency and decreases in copy number averages was used to identify additional candidate homozygous deletions. Representative candidate regions, as well as several previously identified, were validated using PCR with at least two consecutive nonoverlapping primer sets. Regional chromosomal gains and amplifications were not specifically evaluated by this study. Raw data and positional moving average (PMA) estimates of copy number are included in Supplementary data.
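The 20-SNP moving average of NoCall frequency can be sketched as follows. The text does not say whether the window was centered or trailing, so this example simply reports one value per full window of consecutive SNPs; the function name `nocall_moving_average` is hypothetical.

```python
def nocall_moving_average(calls, window=20):
    """Fraction of NoCall genotypes in each run of `window` consecutive SNPs.

    calls: genotype calls in genomic order for one array.
    Returns len(calls) - window + 1 values (empty if there are too few SNPs).
    Assumption: windows are counted in SNPs, not in base pairs.
    """
    flags = [1 if c == "NoCall" else 0 for c in calls]
    if len(flags) < window:
        return []
    total = sum(flags[:window])
    averages = [total / window]
    for i in range(window, len(flags)):
        # Slide the window one SNP to the right.
        total += flags[i] - flags[i - window]
        averages.append(total / window)
    return averages
```

Windows in which this fraction rises while the copy number average falls on both arrays would then be flagged as candidate homozygous deletions for PCR validation.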
| Results and Discussion |
|---|
3,000 heterozygous informative calls. With the introduction of the 100K SNP arrays, genomic coverage improves 10-fold and would yield a similar 10-fold increase in heterozygous calls (i.e., ~30,000), suggesting it may be possible to do allelic studies without matched normal tissues. To test this proposal, we evaluated an initial panel of seven pancreatic cancer cell lines (AsPc1, BxPc3, COLO357, Hs766T, MiaPaCa2, Panc-1, and Su86.86) using the Affymetrix CentXba and CentHind oligonucleotide arrays. A total of 104,502 loci were evaluated per cell line, yielding informative heterozygous calls ranging from 18.8% (17,049 of 90,822) to 6.1% (5,326 of 87,005). The data were plotted with respect to their physical positions, indicating several areas of chromosomal loss (Fig. 1 and Supplementary Figs. S1-S9). Validation of the experimental method and data was accomplished using a large panel of microsatellite markers. In total, 2,001 microsatellite calls were obtained, yielding 729 informative heterozygous calls. Heterozygous microsatellite frequencies ranged from 23.3% (65 of 279) to 47.2% (125 of 265) for each cell line (Supplementary Table S2).
As expected, using the microsatellite marker data to determine areas of LOH proved difficult because of the low number of informative calls. Using the SNP arrays, an approximately 120-fold increase in heterozygous calls was obtained over the microsatellite analysis (86,158 versus 729). Using 2.86 billion bp (Chr1-22; ref. 16) as the combined length of the human genome and after subtracting centromeric and telomeric regions not evaluated with the SNP arrays (3), the average distance between heterozygous markers ranged from nearly 25.5 Mbp (microsatellite analysis) to 216 kbp (SNP arrays). Additionally, plotting of the genotype calls of the SNP arrays revealed clear demarcations of chromosomal loss even without the use of normals, which otherwise would increase the number of informative markers (Fig. 1 and Supplementary Figs. S1-S9).
Considering these results, the ease of the assay, and the time and tissue savings shown, we expanded our analysis to an additional panel of 19 pancreatic cancer lines evaluated with the 100K arrays. In total, 2.72 million loci were evaluated, yielding 2.39 million genotype calls with more than 415,000 heterozygous calls (Supplementary Files 2 and 3). The frequencies of informative heterozygous calls ranged from 30.1% (PL3) to 6.1% (MiaPaCa2; Table 1). The average interheterozygous marker distance was calculated to be 166 kbp.
Regions of LOH determined by the critical gap distance. Data were separated by chromosome and plotted with respect to physical position. This enabled a visual comparison of overlapping regions of LOH. Large regions of homozygosity encompassing entire chromosomes or chromosomal arms were clearly evident as were smaller areas spanning <10 Mbp (Fig. 1 and Supplementary Figs. S1-S9). Without a matched normal control, distinguishing identity-by-descent from true homozygosity arising through allelic loss became more difficult as the size of the region of interest decreased. We therefore used the published record of 42 Caucasian normals (3) to determine the average distance between heterozygous calls on the 100K SNP arrays. We assigned a critical gap distance of >2.75 Mbp (evaluated with at least 40 SNP probes) as a cutoff for filtering out the areas of chance homozygosity; this size represents greater than the 99.9th percentile of all gaps observed (data not shown). Regions >8.0 Mbp identified in the controls were not filtered out of the candidate regions identified in the carcinoma cell lines as many of these homozygous regions may indeed represent true areas of LOH, a phenomenon known to occur in clones of lymphoblasts transformed by EBV (28, 29). Regions of excessive homozygosity between 2.75 and 8 Mbp appearing within the control population were more difficult to evaluate. Several possible mechanisms could account for the observation of excessively homozygous regions in the control population. Foremost, chip design artifacts, such as decreasing SNP-probe densities as well as stretches of polymorphisms with very low minor allele frequencies in Caucasian populations (chosen to minimize gaps in genomic coverage), may lead to such phenomena. Alternatively, these regions may represent areas of true LOH or simply inherently stable chromosomal regions having little genomic variation in the population.
To gain a better perspective of the ability of the arrays to evaluate the above Caucasian population, we catalogued all gaps >2.75 Mbp (containing at least 40 evaluated SNPs) occurring in these 42 Caucasian normals, as well as a second panel of 89 HapMap CEPH normals previously published (30). In total, 178 instances of homozygosity >2.75 Mbp but <8 Mbp occurred in the 131 normal controls (Supplementary Table S1). More than half of those (n = 90) were clustered (three or more instances per cluster) within 14 common regions across the genome, suggesting the chip density or SNP heterozygous frequencies are inadequate to accurately assess the allelic status in those areas. Taken together, the 178 instances of homozygosity would represent a theoretical false-positive rate of 1.37 regions per genome analyzed. This rate decreases to 0.68 regions per genome after elimination of common clusters.
Using all of the catalogued homozygous regions from the 131 normals as a filter, we eliminated any common critical gaps occurring in the carcinoma cell lines as candidate regions of LOH as these were likely to be false positives. The remaining (qualifying) areas of LOH were then plotted (Fig. 2). The breakpoint positions of each region (Supplementary Table S3) were taken from the SNP genomic positions of the first and last heterozygous genotype call (or the terminal chromosomal SNP evaluated by the arrays for each chromosomal arm) that defined a critical gap. Chromosome arms with the greatest frequency of allelic loss (Fig. 3) were confirmed on 18q (95.8%, MADH4), 17p (91.7%, TP53), 9p (87.5%, CDKN2A), 8p (79.2%), 3p (75.0%), 6q (75.0%), and 12q (70.8%; refs. 4, 10, 31, 32). Importantly, detailed maps of allelic loss are provided for these widely shared cell lines allowing for the rapid and efficient evaluation of candidate genes for somatic mutations (Fig. 2; Supplementary Table S3).
The use of matched normal tissues allowed for a separate estimate of our false-positive and false-negative rates. The false-positive rate, represented by the frequency of critical gaps failing to have any informative markers (i.e., no heterozygous genotypes in the matched normal tissue), was ~2.3 per genome (N = 7). The median size of these critical gaps was 3,478,995 bp. We identified 11 instances of false negatives (i.e., regions containing at least three hemizygous markers within a single heterozygous gap), indicating a projected rate of 3.6 regions per genome. The median interheterozygous call distance for these regions was 970,860 bp with a median MRL distance of 493,645 bp. Four of these occurred on chromosome 8 of PL9 and may be indicative of a mosaic chromosome or chromosomal region undergoing change in culture. Other cell lines (e.g., AsPc1 chromosome 1q, PK9 chromosome 13, and PL21 chromosome 3p) show similar genotype patterns (Supplementary Figs. S1-S9), and LOH in these regions may therefore be underestimated as aberrant heterozygous calls will interfere with the calculation of critical gaps. Single-cell dilution before determining the allelic status seems to minimize this problem (data not shown) but may introduce a clonal variability artifact to the data set, necessitating the analysis of multiple clones for each line. Each cell line was therefore only analyzed in aggregate.
Total allelic losses estimated by GFAL. GFAL, an estimate of the fractional total of all chromosomal losses, is ideally calculated by dividing the number of markers indicating allelic loss by the total number of informative markers. Such a calculation, however, would require a matched normal tissue to determine the informativeness of each marker. Instead, we used an average heterozygosity rate of 30.41% as a standard expected proportion for informative markers. GFAL scores were calculated to be 1.09% and 3.44% for the chromosomally stable cell lines PL3 and PL5, respectively, and varied from 6.47% to 2.40% in the Caucasian normals (data not shown). The SD of heterozygous frequencies in the controls was 0.54%, suggesting that the calculated GFAL in the tumor cell lines may incorrectly estimate total allelic loss by ±3.58% (95% confidence). In the lines having chromosome instability, we found GFAL to vary from 79.87% (MiaPaCa2) to 17.11% (XPA3) of the genome length. Increasing levels of GFAL did not correlate with the year of establishment [rs = 0.405; critical region rs > 0.485 (n = 24)]. Such a result, however, is not unexpected as the total number of cell doublings cannot be accounted for in the commercially available lines.
Copy number determination and identification of homozygous deletions. Using raw intensity data from each array, the Copy Number Analysis Tool estimates ploidy for each SNP locus by comparing the distribution of intensities obtained from a reference set of normal DNAs (25). Ideally, this comparison would produce a clear and consistent copy number estimate between each of the arrays at a particular locus (or between consecutive SNPs on the same array). In practice, however, copy number estimates often deviate significantly, leading some investigators to determine ploidy values from more direct and less variable techniques such as array-based comparative genomic hybridization (33, 34). To help reduce the variation in copy number estimates, we applied a PMA smoothing algorithm to each of the arrays and plotted the PMAs separately (Fig. 1 and Supplementary Figs. S1-S9). Agreement between the CentHind and CentXba PMA results was used to help reduce our uncertainty of regional copy number estimates.
Visual inspection of the PMA plots revealed complex genomic patterns. For example, PMAs of copy number suggest that CAPAN1 is likely a near-triploid cell line, given that the mean copy number estimates are often noninteger values (i.e., between 3 and 2 or between 2 and 1). On chromosome 2 of CAPAN1, variations in copy number as well as a reduction to homozygosity (i.e., true LOH) occurred in localized regions and did not involve the entire chromosome (Fig. 1). Reductions to homozygosity did not always coincide with copy number reduction (e.g., between ~118 and 141 Mbp, copy number PMAs remained at ~2.5 despite a reduction to homozygosity) and may have, in part, involved locus-restricted (interstitial) recombinational events (i.e., gene conversion). By no means was this a unique event restricted to just chromosome 2 of CAPAN1; examples are present among other chromosomes in most of the cell lines reported here (Supplementary Figs. S1-S9). Most importantly, the ability to identify regions of LOH that lack copy number reduction illustrates why evaluation of genotypes alone (and associated critical gaps) is crucial for LOH determination. Such regions are produced by gene conversion, mitotic recombination, and chromosomal nondisjunction combined with chromosome reduplication (35).
Copy number PMAs also proved beneficial in the identification of homozygously deleted regions. Simultaneous decreases in copy number PMAs (<1.0) from both the CentHind and CentXba arrays routinely coincided with the presence of a homozygous deletion (Fig. 1 and Supplementary Figs. S1-S9). We combined copy number data with genotyping NoCall frequencies (homozygous deletions have an inherent increase in NoCall genotypes) to help identify additional candidate regions that were partially masked by the PMA smoothing algorithm. In total, 41 homozygous deletions were identified, of which 22 were previously reported (36-43). The remaining 19 homozygous deletions represent first reports implicating an additional 13 regions (11 separate chromosomes) not previously associated with pancreatic carcinogenesis (Table 2). Unexpectedly, the total number of homozygous deletions seemed to be bimodal: of 41 homozygous deletions, 23 occurred in just two cell lines (15 in BxPc3 and 8 in MiaPaCa2). Based on GFAL scores alone, such a high rate of homozygous deletions would not have been predicted for BxPc3. This suggests that the processes leading to chromosomal instability (CIN) having only a few homozygous deletions (original CIN) may be distinguishable from an occasional superimposed phenotype having high numbers of these genomic holes (holey CIN).
LOH represents a key structural foundation for proposed cancer genome projects, a required component for identification of gene conversion and similar chromosome recombinational events [as shown in Fig. 1 and as proposed by Cavenee et al. (35)], and a virtually required accompaniment of somatic (as opposed to polymorphic germ line) homozygous deletions, among other uses. The facile extension of LOH analysis to the most common and widely shared cancer samples (i.e., the commercially available cell lines lacking matched normals) is likely to prove essential for many anticipated cancer studies.
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Drs. Elizabeth M. Jaffee and Arata Horii for the kind gifts of cell lines, Dr. Alison P. Klein for statistical advice, Carl S. Kashuk for help in the evaluation of the HapMap data set, and Dr. Steven C. Cunningham for critically reading the manuscript.
| Footnotes |
|---|
Received 2/23/06. Revised 5/4/06. Accepted 6/20/06.
| References |
|---|