Abstract
We applied whole-genome single-nucleotide polymorphism arrays to define a comprehensive genetic profile of 23 esophageal adenocarcinoma (EAC) primary tumor biopsies based on loss of heterozygosity (LOH) and DNA copy number changes. Alterations were common, averaging 97 (range, 23–208) per tumor. LOH and gains averaged 33 (range, 3–83) and 31 (range, 11–73) per tumor, respectively. Copy neutral LOH events averaged 27 (range, 7–57) per EAC. We noted 126 homozygous deletions (HD) across the EAC panel (range, 0–11 in individual tumors). Frequent HDs within FHIT (17 of 23), WWOX (8 of 23), and DMD (6 of 23) suggest a role for common fragile sites or genomic instability in EAC etiology. HDs were also noted in known tumor suppressor genes (TSG), including CDKN2A, CDKN2B, SMAD4, and GALR1, and the HD data identified PDE4D and MGC48628 as potentially novel TSGs. All tumors showed LOH for most of chromosome 17p, suggesting that TSGs other than TP53 may be targeted. Frequent gains were noted around MYC (13 of 23), BCL9 (12 of 23), CTAGE1 (14 of 23), and ZNF217 (12 of 23). Thus, we have confirmed previous reports indicating frequent changes to FHIT, CDKN2A, TP53, and MYC in EAC and identified additional genes of interest. Meta-analysis of previous genome-wide EAC studies together with the data presented here highlighted consistent regions of gain on 8q, 18q, and 20q and multiple LOH regions on 4q, 5q, 17p, and 18q, suggesting that more than one gene may be targeted on each of these chromosome arms. The focal gains and deletions documented here are a step toward identifying the key genes involved in EAC development. [Cancer Res 2008;68(11):4163–72]
- esophageal adenocarcinoma
- array comparative genomic hybridization
- loss of heterozygosity
- DNA copy number
- amplification
- homozygous deletion
Introduction
During the past 3 decades, there have been significant increases in the incidence of esophageal adenocarcinoma (EAC). In the United States, rates have risen faster than any other cancer ( 1– 3), with similar increases documented in Europe ( 4, 5) and Australia ( 6, 7). With the increasing prevalence of contributing factors (i.e., obesity and acid reflux; ref. 8) in developed societies, it is predicted that EAC incidence will continue to rise, posing an escalating health burden. Better understanding of the genes involved, combined with increased knowledge of risk factors, may lead to improved screening and treatment. Although candidate screening approaches have implicated a few genes related to EAC development (reviewed in ref. 9), few studies have conducted detailed analyses on a genome-wide scale.
DNA copy number changes frequently contribute to tumor progression. Loss of tumor suppressor genes (TSGs), such as CDKN2A and TP53, commonly occurs in cancerous and precancerous conditions, including those of the esophagus (reviewed in ref. 10). In other genomic regions, copy number gain leads to the increased activity of oncogenes (e.g., MYC), which promote autonomous cell growth. Numerous TSGs and oncogenes have been shown to have a wide range of cellular functions and cancer specificities. There is little doubt that additional genes with roles in tumorigenesis await identification and whole-genome methodologies offer a powerful means to identify such genes.
Several studies have applied low-resolution comparative genomic hybridization (CGH) methodologies to investigate DNA copy number alterations in EAC ( 11– 13). This technology adequately detects high-level amplifications and homozygous deletions (HDs) but underestimates loss of heterozygosity (LOH; ref. 14). The application of high-density single-nucleotide polymorphism (SNP) microarrays to define genome-wide copy number changes provides superior resolution and combines the advantages of both CGH and LOH methodologies, allowing the detection of copy neutral LOH (NLOH) events ( 15, 16). Here, SNP arrays were used to generate high-resolution DNA copy number profiles in a panel of primary EAC tumor biopsies.
Materials and Methods
Biopsy collection and DNA extraction. Approval to undertake the study was obtained from the research ethics committees of the Queensland Institute of Medical Research and participating hospitals. Written informed consent to participate was obtained from all patients. Primary EAC biopsies were taken from 26 patients (25 male and 1 female) before treatment. A biopsy of normal squamous esophageal epithelium from one patient was used as a reference sample. Biopsies were placed in RNAlater (Ambion) immediately on collection and left at 4°C overnight. Samples were then held at −20°C until excess RNAlater was removed, after which they were stored at −70°C. DNA and RNA were simultaneously extracted using Qiagen AllPrep extraction kits via the Tissue Lyser–based protocol according to the manufacturer's instructions (Qiagen). Diagnosis was confirmed by pathologic review (by A.D.C.) of a separate biopsy taken from the same lesion (n = 17, usually at the same level of the esophagus) or by clinical review (n = 9). Patient information was collected through self-completed, mailed questionnaires and clinical chart review ( 8); salient features are summarized in Table 1.
Table 1. Demographic details for 26 EAC biopsy patients
SNP microarray preparations. The Infinium II Assay was done using Illumina Sentrix HumanHap300 BeadChips (317K, TagSNP Phase I, v1.1) according to the manufacturer's specifications (Illumina). Briefly, 750 ng of genomic DNA were amplified at 37°C overnight using solutions WG-AMM and WG-MP1, following which the amplified DNA was fragmented using solution WG-FRG and precipitated with isopropanol after the addition of WG-PA1. Dried pellets were then resuspended in WG-RA1 and hybridized to beadchips along with WG-RA1 and formamide. Arrays were incubated overnight at 48°C, after which they underwent single-base extension on a TeFlow Chamber Rack system (Tecan) using solutions WG-XC1, WG-XC2, and WG-TEM. They were then stained with WG-LTM and WG-ATM, dried for 1 h, and imaged using a BeadArray Reader (Illumina). Image data were analyzed using BeadStudio 2.0 (Illumina). For additional details and example outputs, refer to Peiffer and colleagues ( 17). All genomic positions were based on hg17 from the University of California at Santa Cruz (UCSC) Genome Browser.
Compilation of individual EAC profiles. BeadStudio 2.0 software was used to generate whole-genome profiles for the tumors. Each EAC was referenced against a sample of normal squamous epithelium. As noted previously ( 17), this approach provided more consistent logR ratios compared with using the common reference pool provided by Illumina. We compared this squamous sample against the Illumina reference pool and found it to have a normal 2n DNA complement, with the exception of a very small region of amplification within chromosomal band 6q27. Thus, SNPs mapping within the 6q27 chromosomal band were excluded from analysis. Because the HumanHap300 Genotyping BeadChip only includes two SNPs that map to the Y chromosome, we also excluded this chromosome from our analyses. The processed EAC and squamous reference data files are available in the Gene Expression Omnibus, series GSE10506.
Initially, we used the autoscoring procedures built into BeadStudio 2.0 (LOH score and copy number variation algorithms); however, these yielded inconsistent output in the presence of nontumor tissue contamination. Thus, all DNA copy number regions were annotated manually by one of us (D.J.N.) and confirmed by another (H.Y.H.) with the aid of our Simulated DNA Copy Number (SiDCoN) tool ( 18). Any discrepancies between scorers were discussed and consensus was reached. With the exception of HD and 4n gain events (which both have normal Ballele profiles but grossly altered logR values), we did not score DNA changes if Ballele plots indicated that <30% of cells were involved, as described previously ( 18). DNA copy number changes for all samples were converted using Excel into custom data tracks for the UCSC Genome Browser.
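The authors built their custom tracks in Excel. Purely as an illustration of the target format, the sketch below writes scored segments as a BED custom track of the kind the UCSC Genome Browser accepts. The function name, the gray-scale color codes (chosen to echo the shading described for the FHIT figure), and the assumption that quoted coordinates are 1-based and inclusive are all ours, not the authors'.

```python
# Hypothetical shades per change type, as RGB strings for the BED itemRgb column.
SHADES = {
    "gain": "255,255,255",  # no fill
    "NLOH": "211,211,211",  # light gray
    "LOH": "128,128,128",   # gray
    "HD": "0,0,0",          # black
}


def to_bed_track(sample, segments):
    """Return UCSC custom-track text for one sample.

    segments: iterable of (chrom, start, end, change_type). Coordinates are
    assumed to be 1-based and inclusive; BED is 0-based and half-open, so
    only the start is shifted down by one.
    """
    lines = [f'track name="{sample}" description="{sample} copy number" itemRgb="On"']
    for chrom, start, end, kind in segments:
        bed_start = start - 1
        fields = [chrom, str(bed_start), str(end), kind, "0", ".",
                  str(bed_start), str(end), SHADES[kind]]
        lines.append("\t".join(fields))
    return "\n".join(lines)


# The two 9p HD events reported for sample 42199 in the Results.
track = to_bed_track("42199", [
    ("chr9", 20009316, 22549868, "HD"),
    ("chr9", 27354350, 30507289, "HD"),
])
print(track)
```

Each sample becomes one track, so the whole panel can be stacked in the browser and compared region by region.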
Estimations of tumor cell density in biopsy samples. Tumor biopsies contain a variable amount of normal tissue. High levels of nontumor tissue interfere with DNA copy number interpretations of tumor samples. We thus estimated the tumor fraction of each biopsy using the following procedure: regions of LOH were identified manually; the level of Ballele involvement was determined within each of these regions by exporting the Ballele data for each sample from BeadStudio; using an R script, Ballele values <0.5 were inverted, providing plots ranging from 0.5 to 1 (this step essentially doubles the density of the Ballele signal intensity measures, providing more accurate estimates, particularly for shorter DNA copy number changes); and a smoothed binary segregation algorithm ( 19) was then applied, across each chromosome, to the modified Ballele values <0.97 (polymorphic SNPs), generating regionally normalized data for each SNP. Another R script was then used to identify the SNPs that mapped within each LOH event and average the normalized Ballele values within the appropriate tumor sample. Using SiDCoN ( 18), we then estimated the tumor involvement within each of these events. Across each sample, the LOH events with the highest tumor involvement were assumed to approximate the total fraction of tumor in each sample. This procedure assumes that contaminating nontumor tissue has a normal 2n DNA complement. The result of this analysis (presented in Table 1) showed that 3 of our 26 EAC biopsies contained <50% tumor material. These samples were excluded from further study because the Illumina platform has been found insensitive when tumor content drops below this level ( 17). The remaining 23 EAC biopsies contained 50% to 90% tumor cells based on the strongest LOH changes.
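The folding step and the link between Ballele values and tumor content can be sketched as follows. This is a simplified, idealized model and not a reimplementation of SiDCoN or of the authors' R scripts: it assumes a 2n normal admixture, that every tumor cell carries the event, and it skips the smoothing step. The function names are hypothetical.

```python
from statistics import mean


def fold_ballele(values, homozygous_cutoff=0.97):
    """Mirror Ballele values below 0.5 onto the 0.5-1 range, then keep only
    values under the cutoff (the polymorphic SNPs)."""
    folded = [v if v >= 0.5 else 1.0 - v for v in values]
    return [v for v in folded if v < homozygous_cutoff]


def tumor_fraction(folded_mean, event="LOH"):
    """Idealized tumor fraction p from the mean folded Ballele value b.

    LOH  (one copy left in tumor cells):       b = 1 / (2 - p)  ->  p = 2 - 1/b
    NLOH (two identical copies in tumor cells): b = (1 + p) / 2  ->  p = 2b - 1
    """
    if event == "LOH":
        p = 2.0 - 1.0 / folded_mean
    elif event == "NLOH":
        p = 2.0 * folded_mean - 1.0
    else:
        raise ValueError("event must be 'LOH' or 'NLOH'")
    return min(max(p, 0.0), 1.0)  # clamp noise-driven values into [0, 1]


# Invented Ballele values for SNPs inside one LOH region.
region = [0.2, 0.8, 0.8, 0.2, 0.99, 0.01]
b = mean(fold_ballele(region))
p_loh = tumor_fraction(b, "LOH")
p_nloh = tumor_fraction(b, "NLOH")
```

Under this model the same folded signal implies a higher tumor fraction for LOH than for NLOH, which is why the type of event has to be scored before the fraction is read off.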
Median logR plots. Sample-specific logR data were exported from BeadStudio 2.0 for the 23 EAC biopsies with ≥50% tumor cells, as described above. R scripts were used to apply a smoothed binary segregation algorithm ( 19) to these data for each sample across each chromosome. These data were then compiled to generate median (smoothed) logR values across each data point.
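A minimal sketch of the compilation step is given below. It uses a plain running median as a stand-in smoother, which is explicitly not the smoothed binary segregation algorithm the authors applied, and it assumes every sample is reported on the same probes in the same order. Function names and the window size are illustrative assumptions.

```python
from statistics import median, quantiles


def running_median(values, window=5):
    """Stand-in smoother: median over a sliding window, truncated at the ends."""
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def summarize(samples, window=5):
    """Per-probe (lower quartile, median, upper quartile) across samples.

    samples: list of per-sample logR lists aligned on the same probes.
    Needs at least two samples.
    """
    smoothed = [running_median(s, window) for s in samples]
    summary = []
    for probe_values in zip(*smoothed):
        q1, q2, q3 = quantiles(probe_values, n=4, method="inclusive")
        summary.append((q1, q2, q3))
    return summary


# Three invented samples with flat logR profiles across five probes.
panel = [[0.0] * 5, [0.3] * 5, [-0.4] * 5]
profile = summarize(panel)
despiked = running_median([0, 0, 5, 0, 0], window=3)
```

Smoothing each sample first keeps single-probe noise from dominating the median and quartile lines.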
Results
Sample details. Average age at diagnosis was 67 years (range, 52–80). The cohort included two stage I, seven stage II, eight stage III, and seven stage IV patients based on postoperative staging, with patient survival times ranging from 53 to 699 days and a minimum follow-up time of 70 days for surviving patients. Consistent with the previous genome-wide array studies ( 11– 13), and most EAC publications, our cohort has a strong male bias. As expected, the postoperative staging correlates significantly with patient survival [P = 0.0136, Cox proportional hazards (CPH)], whereas age (P = 0.1336, CPH) and the nontumor content of the biopsy (P = 0.2304, CPH) do not.
Types and numbers of DNA copy number changes. We observed 2,229 DNA copy number changes across all 23 EAC samples, an average of 97 per tumor (range, 23–208; Supplementary Table S1). Within these changes, 20% to 90% of the genome showed a change in DNA copy number in each EAC biopsy. This suggests a high background rate, with at least 39% (9 of 23) of our EAC primary tumor panel showing DNA copy number changes at any autosomal point. The genomic regions with the fewest changes were 1p36.32 and 10q23.3 with 9 changes and 11q22.3 and 16p13.2 with 10 changes ( Table 2A). In each case, the changes seen were a roughly even mix of losses, NLOH, and gains, with the exception of 11q22.3, which was mostly LOH (8 of 10).
Table 2. Key regions of DNA copy number change in 23 EAC biopsies
The most common changes were LOH and gains, with averages of 33 (range, 3–83) and 31 (range, 11–73) per EAC, respectively (Supplementary Table S1). SNP arrays allow the detection of NLOH ( 15, 16), which was surprisingly common, averaging 27 (range, 7–57) per EAC. LOH and NLOH changes tended to be larger, averaging 18 and 23 Mb, respectively, compared with 13 Mb for gains (Supplementary Table S1). Within each sample, LOH and NLOH changes were seen in an average of 20% of the genome, but the range within individual EACs was very large (3–52%). These changes often spanned a whole chromosome arm. It is noteworthy that in some biopsies the majority of LOH changes tended to involve all tumor cells, whereas others exhibited variable proportions of tumor cell involvement. Examining the tumor panel on this basis revealed that the pattern was continuous rather than categorical, with most samples exhibiting some to many LOH regions with <100% tumor cell involvement.
We noted 126 HDs within the EAC panel, ranging from 0 to 11 in individual tumors (Supplementary Table S1). Of these, 29 (23%) were partial HDs such that some DNA remained within a proportion of tumor cells. This was determined manually using SiDCoN ( 18) to show that the observed logR and Ballele pattern resulted from mixed cell populations. Partial HDs generally arose within a LOH or NLOH event, which perhaps demarks the initial allelic loss. When the mixed populations included a combination of HD, LOH, and normal cells or HD, NLOH, and normal cells, both Ballele and logR changes were used to determine the presence of the HD. In some samples, partial HD events spanned up to 100 Mb; however, HD regions in general were much smaller (<10 Mb) than the other DNA copy number changes observed.
Considering the different DNA copy number changes outlined above, the number of each type of change or the fraction of the genome involved does not seem to relate to patient survival (all CPH tests yielded P > 0.1) or tumor stage (P > 0.1, ANOVA).
Key regions of gain. NLOH changes can be considered either alongside LOH, as the loss of a functional allele followed by duplication of the null or reduced-function allele, or alongside gains, as duplication of an overactive oncogenic allele with removal of the normal allele. It is therefore important to note NLOH changes within regions of either high LOH or gain. Table 2B shows those chromosomal regions with the greatest number of EAC biopsies showing concomitant gain. The indicated regions represent the smallest hg17 location shared among all amplified samples. It is interesting that the region with the highest number of gains is within 18q11.2, on a chromosome arm well documented as lost in EAC. This is also the region in which we saw the most amplifications, with 4 of the 14 gained samples showing greater than five copies within 18q11.2. Although we also observed loss further down the chromosome arm, this region of shared gain was present in >60% of our samples and a further four EACs (17%) had NLOH. The only known gene within the minimal region of overlap between the 14 amplified samples ( Table 2B) is CTAGE1, which is expressed in a variety of cancer types ( 20).
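The smallest location shared among a set of altered samples is an interval intersection: the largest start and the smallest end across the contributing segments. The sketch below illustrates the idea for segments on a single chromosome; the function name and the coordinates are invented for the example and are not taken from Table 2B.

```python
def minimal_overlap(intervals):
    """Return the (start, end) shared by every interval, or None when the
    intervals do not all overlap. Intervals are inclusive (start, end) pairs
    on the same chromosome."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None


# Invented gain segments from three samples.
shared = minimal_overlap([(100, 500), (200, 600), (150, 450)])
disjoint = minimal_overlap([(1, 10), (20, 30)])
print(shared, disjoint)
```

Because every added sample can only shrink the intersection, a region defined by many samples tends to be narrow, which is what makes single-gene candidates stand out.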
Chromosome 8 contained three key regions of gain (8q22.3-8q24.21), each involving the same 13 (56%) biopsies. The individual regions range in size from 1 to 3 Mb ( Table 2B). The two smaller regions, both in 8q24.21, seem to specifically target known oncogenes (MYC, MLZE, and DDEF1), whereas the larger region (8q22.3) contains several genes, including MYBL1.
The EAC panel showed frequent gains on chromosome 20, most commonly a 2.9-Mb section of 20q13.2. One sample, 40334, was amplified spanning chr20:49397548-50700650, whereas 53048 showed a 4n gain at chr20:50788541-51029249 and strong (9n) gain between chr20:51029250-52711041. The latter region contains several genes, including the oncogene ZNF217.
We observed two regions on the long arm of chromosome 1 with gains in >50% of EACs: chr1:143,700,001-144,900,000, centered on BCL9 in 1q21.1, and chr1:151,700,000-152,300,000 in 1q22 ( Table 2B). The latter region, defined by a 3n gain in 40334, contains many genes, including MUC1, which is known to have oncogenic potential ( 21) and to be frequently overexpressed in EACs ( 22).
Key regions of loss. We identified two types of LOH events within our data, regions that include frequent HD events, such as FHIT ( Fig. 1 ), and areas of LOH with no accompanying HDs, such as 17p (Supplementary Fig. S1), suggesting potentially different mechanisms of action. Regions where two or more EAC biopsies had overlapping HDs are shown in Table 3 . All known genes that map within each of these regions are listed. In several cases, for example, FHIT (3p14.2), WWOX (16q23.2), DMD (Xp21.2), MGC48628 (4q22.1), and PDE4D (5q11.2-q12.1), the smallest region of overlap between the contributing samples can be narrowed to within a gene. The smallest region of overlap within FHIT is a 20- to 25-kb region within intron 4, defined by 17 HD events ( Fig. 1).
Figure 1. A region of chromosome 3p14.2 centered on the FHIT gene, viewed in the UCSC browser, May 2004 (hg17) build of the human genome. The custom tracks represent profiles for 23 EAC biopsies with DNA copy number status noted by the shades: no fill, gain; light gray, NLOH; gray, LOH; black, HD.
Table 3. Regions with 2 or more HDs within 23 EAC biopsies
On chromosome 9p, six samples had HDs ( Table 2E), which overlapped such that the smallest region includes five genes: MTAP, CDKN2A, CDKN2B, C9orf53, and DMRTA1 (Supplementary Fig. S2). Both CDKN2A and CDKN2B are recognized TSGs, and several reports have noted CDKN2A deletions and mutations in EAC (reviewed in refs. 9, 10). In our cohort, most HD events on chromosome 9 clustered on 9p21.3, although 42199 has two separate HD events [9p21.3 (chr9:20009316-22549868) and 9p21.2-9p21.1 (chr9:27354350-30507289)] and a single HD in 40334 spans 9p21.1-pter (Supplementary Fig. S2).
We found three regions with HDs, 18q21.1, 18q21.32, and 18q23, which center on a total of six genes ( Table 3), two of which (SMAD4 and GALR1) are TSGs. Two EACs (40356 and 40364) had separate HD events across 18q, and two HD events spanned more than half of the telomeric end of this arm, further supporting the hypothesis that there are multiple TSGs in this region.
Our analysis specifically identified MGC48628 (4q22.1) as a potential TSG, with three (40345, 40358, and 42199) of five HDs mapping within, or exclusively including, this gene. Six other samples had LOH across MGC48628, two more switched from NLOH to LOH within it, and another EAC was NLOH across the gene.
Table 2C presents a broad region of frequent LOH on chromosome 5q (5q11.2-q14.3), whereas Table 2E shows three HDs that focus on PDE4D (at the border of 5q11.2 and 5q12.1). In the latter region, 15 of 23 EACs had either LOH or HD, and two others showed NLOH. By comparison, 10 samples had LOH for APC, whereas 4 had NLOH for all, or part, of the gene.
Table 3 lists all HDs seen in two or more samples, and the samples in which they occurred. None of the regions with two or more HD samples ( Table 3), nor all of them combined, provided any evidence of association with patient survival (CPH) or tumor stage (t test; data not shown).
Several regions contained high levels of DNA loss but few HDs. The most prominent of these was on chromosome 17p, where all 23 EACs showed LOH ( Table 2C) or NLOH ( Table 2D) changes for most of the short arm, spanning chromosomal bands 17p12-p13.2 (Supplementary Fig. S1). TP53 (17p13.1) has frequently been reported as lost or mutated in EAC ( 10). Our data did not specifically implicate TP53; in fact, the broad overlapping region of change shown in Supplementary Fig. S1 suggested additional targets on 17p. Furthermore, the single HD event present on 17p did not include TP53. Given that several samples shifted back and forth between LOH and NLOH, it is difficult to ascertain a more defined target region. If we assume NLOH to be more important, then the regions 442766-1386639 (which includes ABR and TUSC5) and 6879962-7662915 (which includes BCL6B, TP53, and POLR2A) both contained 10 NLOH events and 13 LOH events across the 23 EAC biopsies, whereas if we assume LOH to be more important the region chr17:12378912-15024426 (17p12), defined by sample 41299, included 16 LOH and 7 NLOH events. The only HD detected on chromosome 17p (10868740-11363397) targeted FLJ45455, and there were 15 LOH and 7 NLOH events across the same region.
Within chromosomal band 11p15.4, there were two small adjacent regions (283572-10868740 and 10868740-21670355) with 14 LOH, 3 NLOH, and no HD and 15 LOH, 2 NLOH, and no HD events, respectively.
Averaged gain/loss plots. Figure 2 shows median logR values across each chromosome. The black line shows the median values, whereas the gray margins demark the 75% (upper quartile) and 25% (lower quartile) levels across all 23 samples. This format allows for more direct comparison of our results to those of the previous CGH and LOH studies because NLOH events were not highlighted. Using a median gain of >0.2 as a cutoff (with a value of 0.33 indicating all cells show a 3n complement for that DNA fragment), we found gains on 1q, 2q, 7p, 8q, 12p, 13q, 18q, 20p, and 20q. Peak median gains (>0.27) occurred at 8q23.3 (chr8:113800000-113900000) within the CSMD3 gene, where 11 samples were amplified, and on 13q12.13 (chr13:24660000-24680000) near FLJ25477, where 8 samples were amplified.
Figure 2. Median EAC biopsy logR values for chromosomes 1 to 22 and X. To generate this plot, the smoothseg algorithm ( 19) was applied to individual sample data and median values (black), across 23 EACs, were plotted in R. Top and bottom light lines, quartile values for each median value. Guide lines at 0.2 and −0.2 represent median value thresholds discussed in text, whereas light guides at 0.3 and −0.3 represent quartile thresholds.
The sharply defined losses on chromosomes 3p and 9p, with median logR values of less than −0.35 ( Fig. 2), corresponded to HDs within FHIT and 9p21.3 (containing CDKN2A and CDKN2B; Fig. 1; Supplementary Fig. S2). The region surrounding WWOX on 16q was evident from the combined data ( Fig. 2) but not to the same extent due to the large number of NLOH events rather than the relatively high number of LOH changes seen on 9p. Other extended regions of loss, with median values of less than −0.2, mapped to 4p, 4q, 5q, 11p, 16p, 17p, 18q, 19p, and 22q ( Fig. 2).
By using >0.3 for the 75% quartile level and less than −0.3 for the 25% level, we have flagged regions altered in a minority (∼25%) of samples. This added gains on 3q and 5p and deletions of 16q, 21q, and Xp. Table 4 summarizes these data across all chromosome arms, incorporating evidence drawn from previous CGH and LOH genome-wide DNA copy number studies done on EACs, which is discussed in detail below.
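The two-tier thresholding described above can be written as a small rule applied to each data point. The sketch below is only one way to encode it: the cutoffs come from the text, while the function name, the labels, and the order in which the rules are checked are assumptions made for illustration.

```python
def flag_probe(q1, med, q3, median_cut=0.2, quartile_cut=0.3):
    """Label one data point from its lower quartile, median, and upper
    quartile logR values across the panel.

    The median rules are checked first; the quartile rules then pick up
    changes present in only a minority (about 25%) of samples. If both
    quartile rules fire, the gain label wins in this sketch.
    """
    if med > median_cut:
        return "gain"
    if med < -median_cut:
        return "loss"
    if q3 > quartile_cut:
        return "minority gain"
    if q1 < -quartile_cut:
        return "minority loss"
    return "unflagged"


# Invented (q1, median, q3) triples.
labels = [
    flag_probe(-0.1, 0.25, 0.4),
    flag_probe(-0.5, -0.36, -0.1),
    flag_probe(-0.1, 0.1, 0.35),
    flag_probe(-0.32, -0.1, 0.1),
    flag_probe(-0.1, 0.0, 0.1),
]
print(labels)
```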
Table 4. DNA copy number loss and gain summary of 10 genome-wide EAC tumor studies
Discussion
Previous studies reported a high background rate of DNA copy number change (20–30%) in EAC. We found a similar rate (40%) when one considers that about one third of the changes we observed are NLOH events, not detected by previous CGH studies. In Table 4, the highlighted regions summarized for each study indicate either gain (AMP), loss (LOH), or allelic imbalance (AI) in Gleeson and colleagues ( 23), who did not distinguish loss and gain events. Given this high background rate, it is not surprising that only 10p and 18p (along with most acrocentric arms) show no highlighted regions in any of the 10 studies ( Table 4). Adopting a pragmatic approach, we believe further investigation is required into all chromosomal regions where three or fewer studies have reported frequent changes. Important EAC genes may lie within these regions; however, a much larger sample size would be required to clarify whether genes within these regions play a significant role in EAC etiology.
Given the strength of the 3p14 findings from our cohort (>70% HD within FHIT) and others ( 24), it is surprising that only three of the other nine studies in Table 4 report noteworthy LOH on chromosome 3p. A partial explanation may lie in the fact that, of the 22 samples in our study that showed loss on 3p, only 8 show extensive regions of loss ( Fig. 1); thus, low-density studies, such as that of Hammoud and colleagues ( 25) with only a single marker at 3p25, may have missed the peak region within FHIT. This does not explain the data of Weiss and colleagues ( 26), however, where concomitant CGH analysis of esophageal squamous cell carcinoma showed high levels of 3p deletion (64%), yet only 4% of their 24 EACs showed deletion on 3p.
Similar to FHIT (FRA3B), WWOX (FRA16D) and DMD (FRAXC) occur within common fragile sites. Our data also showed frequent HD within the latter two genes. We found NLOH events to be frequent on 16q, masking its potential significance in the median logR plot ( Fig. 2), and in previous CGH studies that could not detect NLOH events. Furthermore, given the low resolution of microsatellite LOH studies listed in Table 4 (39–138 markers), it is also likely that other studies would have missed intragenic deletions within WWOX. Thus, it is unclear whether previous genome-wide studies did not detect losses in common fragile sites or whether these changes are only present in a subset of EACs. The group that previously reported HDs within FHIT ( 24) also observed changes in the other fragile site gene WWOX (FRA16D) and in genes within FRAXB ( 27). Because deletions targeting FRAXB have no known relevance to EAC tumor biology, Arlt and colleagues ( 27) proposed that the increased activity of these fragile sites was a marker for genomic instability. Several lines of evidence link common fragile site stability to cell cycle checkpoints and DNA repair (reviewed in ref. 28). This does not negate the importance that loss of the known TSGs FHIT and WWOX (reviewed in refs. 28, 29) could have on the progression of tumors with these changes; in fact, data indicate that these genes and the fragile sites they arise in are co-conserved (reviewed in ref. 29). Further work is required to characterize this phenomenon, to determine whether it is restricted to a subset of EAC patients, and to elucidate specific roles for the disrupted genes.
The most consistent regions of frequent DNA loss in the literature are on 4q, 5q, 9p, 17p, and 18q, with particular foci on 4q22, 5q21, 9p21, 17p12, and 18q21-22 ( Table 4; reviewed in ref. 9). On 4q, our cohort has highlighted MGC48628 (4q22.1) with 56% LOH and three of five HDs involving only this gene. MGC48628 maps within the minimal region of loss identified by Rumpel and colleagues ( 30) in 29 primary EAC tumors. Six other genome-wide studies ( Table 4) support frequent 4q LOH; however, within these, there are two regions of loss: 4q22-23 ( 12, 13, 31) and 4q34-35 ( 11, 23, 25). Sterian and colleagues ( 32) reported frequent LOH on 4q, focused on 4q31-35. Taken together, these data suggest multiple target genes on this arm. Our median sample data ( Fig. 2) showed a sharp trough at 4q22 (MGC48628) and a much broader region of loss at the telomere of 4q (4q34-35).
Chromosome 5q also shows frequent losses in 70% of studies ( Table 4). The most common region of loss maps to 5q21, likely targeting APC ( 11– 13). These data are supported by several microsatellite-based studies, which report >30% LOH in the vicinity of APC ( 33, 34). In contrast, our EAC cohort shows LOH across much of 5q, with particularly strong regions of loss at 5q11.2-q12.1 and 5q14.3-31.1 ( Fig. 2). The only three HDs observed on 5q are clearly centered on PDE4D, which borders on 5q11.2 and 5q12.1, supported by losses in 65% of our cohort compared with <45% in the vicinity of APC. Thus, PDE4D, rather than APC, seems to be the focus of 5q losses in our EAC cohort. HDs have recently been reported within PDE4D in lung adenocarcinoma ( 35), suggesting it is a putative TSG.
The broad LOH peak we observed on 18q ( Fig. 2) is also present in six of the nine previous genome-wide reports in Table 4. Although the peak region seems to be within 18q21-22, containing the candidate TSGs DCC, SMAD2, and SMAD4, CGH studies report frequent loss of the whole arm ( 11– 13). Within our cohort, we have nine HDs on chromosome 18q, which tend to cluster at 18q21.1, 18q21.32, or 18q23, with several samples having multiple HD events across the arm implicating SMAD4, GALR1, and MC4R, whereas DCC and SMAD2 fall outside critical HD events. We have identified a small (100 kb) region of gain at 18q11.2 ( Fig. 2), suggesting the presence of both an oncogene and one or more TSGs on this arm. The only gene that maps within this AMP is CTAGE1. An array CGH (aCGH) study noted that 39% (7 of 18) of their EACs were amplified for LAMA3 ( 36), 1.3 Mb telomeric of CTAGE1. Further specific investigations will be needed to verify CTAGE1 as a potential oncogenic target.
CDKN2A is one of the most frequently deleted genes across cancers, and it is most likely the target for LOH in 9p21, a frequent change noted in 7 of the 10 genome-wide studies in Table 4. Our data showed frequent (26%) HDs involving CDKN2A; however, the minimal region of overlap between the six HD events contained four other genes, including CDKN2B, another TSG. Several studies have shown losses, mutation, or hypermethylation of CDKN2A in EAC ( 10, 37– 39).
All 23 of the EAC biopsies in this study showed DNA copy number variations along the length of the short arm of chromosome 17, generally LOH or NLOH. Changes to TP53 (17p13.1) have been reported as frequent early events in EAC progression ( 40). However, others have also noted that multiple regions on 17p are targets of LOH events ( 41).
Three studies reported frequent Y chromosome loss in 40% to 76% of EACs ( 11– 13). The underrepresentation of Y loss in Table 4 may be simply because so few studies have investigated it. Because the HumanHap300 BeadChips include only two SNPs that map to the Y chromosome, we are unable to confirm this in our cohort. Given the strong male bias for this cancer, detailed investigation of the Y chromosome is warranted, especially because reintroducing Y has been shown to suppress the tumorigenic potential of other human cancer cells ( 42).
Table 4 shows that four to five genome-wide DNA copy number studies reported frequent gains on 3q, 7p, 7q, 15q, and 17q. Of these, our cohort has median logR peaks >0.2 on 3q and 7p. Looking across studies, focal points at 3q26, 7q21, and 15q25 can be determined. Although the regions on 7p are broad, 17q gains seem to center on 17q11 or 17q21 in different studies. Unlike the above listed regions, peak gains on 8q and 20q were very broad in our EAC panel ( Fig. 2), suggesting the presence of multiple oncogenes and perhaps explaining why these were the most frequently amplified regions across the 10 studies ( Table 4). Focal points of common gain across the studies can be narrowed to 8q24.1 and 20q13. The main target on 8q is believed to be MYC, although other surrounding oncogenes (including MLZE, DDEF1, and MYBL1) are frequently amplified. A novel amplicon resulting in the overexpression of CTSB has been shown in one study ( 43), indicating that this too may be a target for 8q gain. On chromosome 20q, candidate genes include ZNF217 ( 36) and MYBL2, although the latter gene is not within our peak region.
In summary, we have generated the most comprehensive investigation of DNA copy number variation in EAC tumors to date. Our data indicate that structural genetic changes are very frequent events in EAC, with an average of 97 changes per tumor. These changes constitute roughly even proportions of gain, LOH, and NLOH events, which together spanned 20% to 90% of the individual tumor genomes. HDs were relatively infrequent, 0 to 11 per tumor, and tended to be highly focused. We confirm that deletion within FHIT is one of the most common events in EACs ( 24). Frequent HDs within FHIT (3p14.2), WWOX (16q), and DMD (Xp) suggest a role for common fragile sites in EAC etiology; alternatively, decreased genomic stability may be a critical marker for a subset of EAC tumors. Aside from FHIT, these regions seem to be infrequent sites of loss in other EAC DNA copy number studies. This may be an issue of technical sensitivity (detection density) or it may indicate an as yet unidentified EAC stratification. Our data also showed multiple HDs targeting PDE4D (5q11.2-q12.1) and possibly SMAD4 (18q21.1) and GALR1 (18q23), which appear in the chromosomal regions frequently lost in previous studies ( Table 4). In addition, we found HDs clustered within the gene MGC48628 (4q22.1), making it a potential novel TSG.
Meta-analysis of the 10 genome-wide CGH and LOH studies summarized in Table 4 indicates that the most common sites for gain in EAC are 8q24 and 20q13. The broad peaks generally observed in these regions suggest that multiple oncogenes are involved and our results are consistent with this. Additionally, we have identified a frequent, focused gain within 18q11.2, centered on CTAGE1. Across the 10 studies, regions on 4q, 5q, 9p, 17p, 18q, and Y are the most frequently lost in EAC. The target genes seem to include the TSGs APC (5q21), CDKN2A (9p21.3), and TP53 (17p13.1), although other genes on 5q and 17p also seem to be critically lost. The key genes on 4q, 18q, and Y have yet to be identified. Comprehensive genomic profiling such as that presented here will allow a more defined approach to identifying and characterizing genes involved in EAC progression, offering the potential for improved clinical tests and treatments.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
Grant support: NIH grant CA 001833-03 and National Health and Medical Research Council of Australia (Program no. 199600). D.C. Whiteman and N.K. Hayward are recipients of research fellowships from the National Health and Medical Research Council of Australia.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank the study participants and their families as well as the other Study of Digestive Health and Australian Cancer Study investigators for their contribution: Adele C. Green, Greg Falk, Peter G. Parsons, David M. Purdie, and Penelope M. Webb (Queensland Institute of Medical Research); Sandra J. Pavey (Princess Alexandra Hospital); and Glyn Jamieson (University of Adelaide).
The funding bodies played no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript.
Footnotes
- Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received December 19, 2007.
- Revision received February 15, 2008.
- Accepted March 2, 2008.
- ©2008 American Association for Cancer Research.