To identify genetic changes involved in the progression of breast carcinoma, we did cDNA array comparative genomic hybridization (CGH) on a panel of breast tumors, including 10 ductal carcinoma in situ (DCIS), 18 invasive breast carcinomas, and two lymph node metastases. We identified 49 minimal commonly amplified regions (MCRs) that included known (1q, 8q24, 11q13, 17q21-q23, and 20q13) and several uncharacterized (12p13 and 16p13) regional copy number gains. With the exception of the 17q21 (ERBB2) amplicon, the overall frequency of copy number alterations was higher in invasive tumors than that in DCIS, with several of them present only in invasive cancer. Amplification of candidate loci was confirmed by quantitative PCR in breast carcinomas and cell lines. To identify putative targets of amplicons, we developed a method combining array CGH and serial analysis of gene expression (SAGE) data to correlate copy number and expression levels for each gene within MCRs. Using this approach, we were able to distinguish a few candidate targets from a set of coamplified genes. Analysis of the 12p13-p12 amplicon identified four putative targets: TEL/ETV6, H2AFJ, EPS8, and KRAS2. The amplification of all four candidates was confirmed by quantitative PCR and fluorescence in situ hybridization, but only H2AFJ and EPS8 were overexpressed in breast tumors with 12p13 amplification compared with a panel of normal mammary epithelial cells. These results show the power of combined array CGH and SAGE analysis for the identification of candidate amplicon targets and identify H2AFJ and EPS8 as novel putative oncogenes in breast cancer. (Cancer Res 2006; 66(8): 4065-78)
- array CGH
- gene amplification
Gene amplification is one of the mechanisms underlying the activation of oncogenes, and it is often associated with poor prognosis, tumor progression, and acquired drug resistance ( 1). Therefore, identification of amplified oncogenes has potential diagnostic and therapeutic implications. In breast cancer, gene amplification occurs recurrently at some chromosomal locations, indicating the common activation of some oncogenes during tumor development. The most prominent and frequent amplicons have been reported on chromosomes 1q, 8p12, 8q24, 11q13, 12q13, 17q21, 17q23, and 20q13, and several candidate targets have been proposed and verified ( 2). The best-characterized breast cancer oncogene is ERBB2, located on chromosome 17q21 and amplified in 20% to 30% of breast carcinomas ( 3). Other oncogenes amplified in breast cancer include MYC (8q24); CCND1, EMS1, and EMSY (11q13); IGF1R (15q26); and STK15, AIB1, and ZNF217 (20q13), whereas PIK3CA is activated in 25% to 40% of breast carcinomas due to oncogenic mutations, although its amplification was also reported in a fraction of breast tumors ( 4– 8). The successful treatment of ERBB2-amplified breast tumors with Herceptin, an inhibitor of ERBB2 activity, is one of the few examples of successful molecular-based therapy in breast cancer ( 9). Therefore, several large-scale genome resequencing and genomic approaches are aimed at the identification of novel tumor-specific genetic events, amplifications, or mutations, which could be targeted therapeutically.
Complicating the identification of relevant genes is the fact that most amplicons are fairly large, can span several megabases, and in extreme cases involve whole chromosome arms (e.g., chromosome 1q). Thus, amplification of the targeted oncogene is inevitably associated with coamplification of many surrounding genes. For example, amplification of ERBB2 is frequently accompanied by amplification and overexpression of nearby GRB7, TOP2A, and PIP4K2B genes ( 10). GRB7 encodes a protein that directly binds to ERBB2 and modulates ERBB2 signaling ( 11), TOP2A is a topoisomerase involved in DNA repair, and PIP4K2B is a lipid kinase that has been shown to enhance breast cancer cell growth ( 12). The function of these genes suggests that they may cooperate with ERBB2 and contribute to the malignant phenotype and therefore could also be considered targets of the 17q21 amplicon. Similar complexity is observed for the 20q13, 11q13, and other amplicons, suggesting that genetic selection for the overactivation of a group of genes is a general phenomenon. Thus, despite the fact that many amplified chromosomal regions have been identified, the characterization of their targets remains a difficult task. Performing comprehensive genome-wide screens on a large set of tumors and analyzing both genetic and gene expression changes in the same tumors may help in resolving this problem because this combined approach facilitates the identification of genes that are both amplified and overexpressed. Using this approach, KCNK9 was found to be the target of a small (550 kb) amplicon at 8q24.3 because it was the only overexpressed gene within that region ( 13).
Array comparative genomic hybridization (CGH) is a technology suitable to study gene copy number changes on a genome-wide scale at a high resolution ( 14– 16). Currently, three different array CGH platforms are used: bacterial artificial chromosome, cDNA, and long (60 bp) oligo arrays, with each having its own advantages and disadvantages. Previous cDNA array CGH studies of breast tumors and breast cancer cell lines revealed a strong correlation between copy number gain and increased gene expression and led to the identification of several new amplicons and their candidate targets, including HOXB7 at 17q21.3 ( 17– 19). However, these studies were limited by the use of a reference cDNA mix for evaluating gene expression changes, by the potential probe hybridization bias associated with the use of fairly long cDNA fragments on the arrays, and by the restriction of their analysis to advanced-stage tumors. In this study, we did cDNA array CGH on 30 breast tumors, including 10 preinvasive tumors (DCIS), and five nonmalignant cells purified from normal breast tissue or breast carcinomas, and in parallel, we used serial analysis of gene expression (SAGE) for evaluating gene expression patterns. Using this integrated approach, we confirmed known amplicons and their targets, such as ERBB2 at 17q21 and PAK1/EMSY at 11q13. We further identified many uncharacterized amplicons and their putative targets in both in situ and invasive carcinomas. Based on a targeted screen for the amplification of known oncogenes, KRAS2 in the 12p13-p12 amplicon has previously been described as a gene amplified in a subset of breast carcinomas ( 20) and in one metastatic rectal tumor ( 21), but this amplicon has not been systematically characterized at high resolution. Our detailed characterization of the 12p13-p12 amplicon identified four putative candidate targets (ETV6, KRAS2, H2AFJ, and EPS8), and subsequent follow-up experiments confirmed H2AFJ and EPS8 as novel candidate oncogenes in breast cancer.
Thus, these data show that the integration of SAGE with cDNA array CGH is a powerful approach for the identification of amplified candidate oncogenes.
Materials and Methods
Tissue specimens and cell lines. Tumor specimens were obtained from Brigham and Women's and Massachusetts General hospitals (Boston, MA), Duke University (Durham, NC), University Hospital Zagreb (Zagreb, Croatia), and the National Disease Research Interchange. All human tissue was collected using protocols approved by the Dana-Farber Cancer Institute Institutional Review Board. Tissues were snap frozen on dry ice and stored at −80°C until use or were immediately processed for immunomagnetic purification ( 22). Breast cancer cell lines were obtained from the American Type Culture Collection (Manassas, VA) or were generously provided by Dr. Steve Ethier (University of Michigan) and Dr. Arthur Pardee (Dana-Farber Cancer Institute). Cells were grown in media recommended by the provider.
cDNA array CGH profiling. Array CGH analysis was done essentially as previously described ( 23). Briefly, genomic DNA from normal and tumor tissues was fragmented and labeled according to published protocols. Labeled DNAs were hybridized to human cDNA microarrays containing 14,160 cDNA clones (Agilent Technologies, Palo Alto, CA), for which ∼9,000 unique map positions were defined (National Center for Biotechnology Information, build 34). The median interval between mapped elements is 100 kb, with 92.8% of intervals <1 Mb and 98.6% <3 Mb. Log 2 ratios were calculated from Cy3/Cy5 fluorescence channels and further normalized by the GC content (%) of the probe's genomic region using local regression. These normalized profiles were then processed using Circular Binary Segmentation, a change-point identification technique developed for array CGH, to demarcate genomic segments with statistically uniform copy number ( 23, 24). Segments are assigned a log 2 ratio that is the median log 2 ratio of the contained probes. The data were then centered as previously described, so that the peak in the distribution of segment values lies at zero ( 23). Based on the histogram of log 2 ratio distribution among all normal samples and among all tumors, a minimum gain threshold was set to log 2 ratio of ≥0.09 (Supplementary Fig. S1), and the amplification threshold was set to log 2 ratio of 0.5. Similarly, segments with log 2 ratio less than −0.08 and −0.35 are considered to have a chromosomal region of “loss” and “deletion,” respectively. The raw and segmented data sets are provided as Supplementary Data Files.
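The thresholding step described above can be illustrated with a minimal Python sketch. The function names are hypothetical and not taken from the study, and the treatment of values lying exactly on the 0.5 boundary is an assumption, because the text does not state whether that threshold is inclusive.

```python
from statistics import median

# Thresholds quoted in the text, in log2 ratio units.
GAIN, AMPLIFICATION = 0.09, 0.5
LOSS, DELETION = -0.08, -0.35


def segment_value(probe_log2_ratios):
    """Return the segment value: the median log2 ratio of its probes."""
    return median(probe_log2_ratios)


def classify_segment(value):
    """Map a centered segment value to a copy number call.

    Assumption: the amplification boundary (0.5) is treated as inclusive.
    """
    if value >= AMPLIFICATION:
        return "amplification"
    if value >= GAIN:
        return "gain"
    if value < DELETION:
        return "deletion"
    if value < LOSS:
        return "loss"
    return "neutral"


# Hypothetical probe values for one segment
call = classify_segment(segment_value([0.1, 0.7, 0.5]))
```

The sketch takes already segmented and centered values as input; Circular Binary Segmentation itself is not reimplemented here.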
Identification of amplified loci and statistical analysis. Identification of priority loci was done in a similar way as described ( 23). Minimal common regions (MCRs) of chromosome amplification were generated based on overlapping recurrence across samples using the same algorithm as previously described ( 23). MCRs were further prioritized by the presence of the following features: (a) recurrence of high-fold amplification events in more than one sample, (b) a peak segment value of >0.8 in at least one sample, or (c) statistically significant recurrence of low-level alteration. MCRs with one or more of these features are summarized in Table 1 . Recurrence of array CGH gains and losses was compared between DCIS and invasive ductal carcinoma (IDC) sample groups, as well as between estrogen receptor–negative (ER−) and ER+ sample groups. Low-level thresholds, >0.09 and less than −0.08, were used to define gain and loss, respectively. At each probe location, total numbers of samples with gains and losses were counted in each sample group (e.g., DCIS versus IDC and ER+ versus ER− tumors), and significant difference was determined by Fisher's exact test (P < 0.05, not accounting for multiple testing).
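The per-probe group comparison can be sketched as a two-sided Fisher's exact test on a 2 × 2 table (sample group by gain status). The following Python example is an illustration only: the function name and the counts are hypothetical, and the two-sided definition used (summing all tables no more probable than the observed one) is a common convention that the text does not specify.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are sample groups (e.g., DCIS and IDC); columns are
    "gain" and "no gain" at one probe location.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 1 of 10 DCIS and 10 of 18 IDC samples with a gain
p_value = fisher_exact_two_sided(1, 9, 10, 8)
significant = p_value < 0.05  # uncorrected, as in the text
```

Because the test is repeated at every probe location without correction, a nominal P < 0.05 should be read as exploratory.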
Identification of best candidate targets in MCRs using SAGE. Breast cancer SAGE libraries were previously described ( 22, 25) and are also available online. SAGE libraries were normalized to 100,000 total tags. For each gene in the MCR, normalized SAGE tag numbers are listed for each tumor, and amplification status was defined by the local segmented array CGH value in that sample's profile. Thus, for each gene, samples are divided into three groups: (a) amplified tumors, (b) nonamplified tumors, and (c) normal tissues. For each gene with SAGE data, four P values were generated reflecting the statistical significance of differences in tag numbers (a) between amplified tumors and normal group (PA/N); (b) among amplified tumor, nonamplified tumor, and normal group (PA/NA/N); (c) between amplified group and nonamplified plus normal group (PA/NA,N); and (d) between all tumors (amplified and nonamplified) and normal group (PA,NA/N). The last two P values (PA/NA,N and PA,NA/N) were only calculated if PA/NA/N was significant (P < 0.05). These tests were done separately for each of three thresholds for defining “amplified” segment values: (a) + level (>0.09), (b) ++ level (>0.2), and (c) +++ level (>0.5) to confirm that results were stable across a range of CNA. Fold overexpression is estimated by dividing the mean tag number from the tumor samples with amplification by the mean tag number from the normal group. A gene was identified as a candidate target of the amplicon if it met the following criteria: (a) either PA/N or PA/NA/N is <0.05, (b) overexpression must be >2-fold in amplified tumors compared with normal, and (c) among tumors predicted to be amplified, the SAGE tag ratio of (tumor tag) / [max (normal tag)] must reach 0.67, 0.8, and 1.0 for levels of +, ++, and +++ amplification (see above), respectively.
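The three selection criteria can be summarized in a short Python sketch. It is a minimal illustration under stated assumptions: all names are hypothetical, the P values are supplied as inputs because the statistical test that produced them is not detailed here, and criterion (c) is interpreted as applying to every amplified tumor, which the text does not state explicitly.

```python
# Minimum (tumor tag) / max(normal tag) ratio for each amplification level
MIN_TAG_RATIO = {"+": 0.67, "++": 0.8, "+++": 1.0}


def normalize_tags(tag_count, library_total, scale=100_000):
    """Scale a raw SAGE tag count to a library of 100,000 total tags."""
    return tag_count * scale / library_total


def is_candidate_target(amplified_tags, normal_tags, p_a_n, p_a_na_n, level):
    """Apply criteria (a)-(c) to one gene at one amplification level.

    amplified_tags / normal_tags: normalized tag counts per sample.
    p_a_n, p_a_na_n: precomputed P values (PA/N and PA/NA/N).
    """
    mean_amp = sum(amplified_tags) / len(amplified_tags)
    mean_norm = sum(normal_tags) / len(normal_tags)
    # (a) at least one of the two P values is below 0.05
    significant = p_a_n < 0.05 or p_a_na_n < 0.05
    # (b) more than 2-fold overexpression relative to the normal group
    overexpressed = mean_amp > 2 * mean_norm
    # (c) assumption: every amplified tumor must reach the tag ratio
    cutoff = MIN_TAG_RATIO[level] * max(normal_tags)
    ratio_ok = all(tag >= cutoff for tag in amplified_tags)
    return significant and overexpressed and ratio_ok


# Hypothetical gene: two amplified tumors and two normal samples
is_target = is_candidate_target([30, 40], [5, 10], 0.01, 0.2, "+++")
```

Comparing against products rather than dividing avoids a division by zero when a gene has no tags in the normal libraries.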
Overall correlation between gene overexpression and amplification. SAGE data from each tumor with array CGH data were compared with those from two normal mammary epithelial cells using a previously described method ( 22, 26), and tags that satisfied the following two criteria were considered overrepresented in tumors: (a) the difference between the tag numbers in tumor and normal samples is statistically significant (P < 0.05) using the PK algorithm ( 26), and (b) normalized tumor tag number is at least 2-fold higher than the tag number in either of the two normal samples. Each tag with at least two copies/library in the SAGE libraries was assigned to the best matching gene using an online resource. The total number of overexpressed genes in each sample was estimated based on the number of overexpressed tags with unique best gene match. SAGE analysis was done for all genes located in predicted amplicons to determine the fraction of amplified genes that are also overexpressed, and the subset of overexpressed genes that are amplified. For each sample, the observed number of overexpressed genes within amplified regions is compared with the number expected by chance to give an odds ratio. Odds ratios are not calculated if the expected number of overexpressed genes in amplified regions is <2. For each sample, odds ratios are determined separately for all three levels of amplification, as above: >0.09 (+ level), >0.2 (++), and >0.5 (+++).
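The comparison of observed and expected numbers can be sketched as follows. This Python example is a simplified illustration, not the exact statistic used in the study: it assumes that the number expected by chance is the product of the amplified and overexpressed gene fractions, and it returns an observed/expected ratio, whereas a formal odds ratio from the full 2 × 2 table could differ. The function name and counts are hypothetical.

```python
def amplicon_overexpression_ratio(n_genes, n_amplified, n_overexpressed,
                                  n_both, min_expected=2):
    """Observed/expected ratio of overexpressed genes in amplified regions.

    n_genes: genes with both array CGH and SAGE data in the sample
    n_amplified: genes inside amplified segments at the chosen level
    n_overexpressed: genes called overexpressed by SAGE
    n_both: genes that are both amplified and overexpressed
    Returns None when fewer than min_expected genes are expected,
    mirroring the cutoff described in the text.
    """
    # Expected count if amplification and overexpression were independent
    expected = n_amplified * n_overexpressed / n_genes
    if expected < min_expected:
        return None
    return n_both / expected


# Hypothetical sample: 1,000 genes, 100 amplified, 50 overexpressed, 10 both
ratio = amplicon_overexpression_ratio(1000, 100, 50, 10)
```

In practice the calculation would be repeated for each sample at the +, ++, and +++ amplification levels.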
Quantitative real-time PCR. Quantitative PCR primers were designed to amplify products of 100 to 150 bp (sequence of primers for genes analyzed is listed in Supplementary Table S4). Quantitative PCR was done on MJ Research Chromo 4 (Bio-Rad, Hercules, CA). Briefly, PCR reactions were done in a total volume of 25 μL composed of 1× PCR buffer [16.6 mmol/L (NH4)2SO4, 67 mmol/L Tris (pH 8.8), 6.7 mmol/L MgCl2, 10 mmol/L β-mercaptoethanol] containing 2 ng of genomic DNA, 0.5 μmol/L of each primer, 0.5 mmol/L deoxynucleotide triphosphates, 0.5 mg/mL bovine serum albumin, 1 μL 1:1,500 diluted SYBR Green I, and 0.2 μL Platinum Taq (Invitrogen, Carlsbad, CA). The cycling conditions were 10 minutes at 95°C followed by 40 cycles of 15 seconds at 95°C and 1 minute at 58.1°C with a plate read at the end of each cycle. Composition of PCR products was examined by generating melting curves. The relative gene copy number was calculated by the comparative Ct method ( 27) and normalized to normal human genomic DNA and to a nonamplified gene (PVR on chromosome 19q13) from the same sample. Based on array CGH data, PVR gene copy number was unchanged in all samples. Quantitative PCR using cDNA templates was similarly done using RPL39 (ribosomal protein L39) as control for normalization.
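The comparative Ct calculation can be written out as a brief Python sketch. It uses the standard 2^(−ΔΔCt) form and assumes equal, near-100% amplification efficiency for the target and reference primers; the function name and the Ct values are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_normal, ct_ref_normal):
    """Relative copy number by the comparative Ct (2^-ddCt) method.

    The target gene is normalized to a reference gene (PVR in the text)
    within each DNA, then to normal human genomic DNA.
    Assumption: both primer pairs amplify with ~100% efficiency.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_normal = ct_target_normal - ct_ref_normal
    return 2 ** -(delta_sample - delta_normal)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# than the reference in tumor DNA but at the same cycle in normal DNA.
fold = relative_copy_number(22.0, 25.0, 25.0, 25.0)  # 8.0
```

A value near 1 indicates no copy number change relative to normal DNA; the same arithmetic applies to the cDNA measurements normalized to RPL39.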
Fluorescence in situ hybridization. Bacterial artificial chromosome clones flanking or containing TEL/ETV6, KRAS2 (RP11-37P8), H2AFJ (RP-911J12), or EPS8 (RP-878D15) were obtained from Invitrogen/Research Genetics (Carlsbad, CA). The bacterial artificial chromosomes in the TEL/ETV6 probe are RP11-144O23 (AC006518) and RP11-267J23 (AC007537). The CEP4, CEP12 (D12Z3), and TEL/AML1 probes were obtained from Vysis, Inc. (Downers Grove, IL). The TEL/AML1 mix contains a TEL (spectrum green) probe at 12p13 that begins between exons 3 and 5 of TEL and extends ∼350 kb towards the telomere of 12p and an AML1 (spectrum red) probe at 21q22 spanning the entire AML1 gene. Touch preparations from the frozen tissues were prepared as follows: tissue was cut with a razor blade, and fresh cut surface was touched gently against a surface of a clean glass slide in several places and air-dried. Cells were fixed in cold 70% ethanol at 4°C for 2 hours, dehydrated in ascending ethanol series, and air-dried. After probe application, both tissue and probe DNAs were denatured simultaneously at 80°C for 2 minutes. Hybridization was carried out overnight at 37°C. Hybridizations of metaphase chromosomes obtained from the HCC1937 and ZR75-1 cell lines were done according to the method described ( 28). Metaphase chromosomes and interphase nuclei were stained with 4′,6-diamidino-2-phenylindole.
Results and Discussion
Array CGH analysis. cDNA array CGH was used to analyze copy number changes in 30 breast tumors (10 DCIS, 18 IDCs, and two lymph node metastases) along with five nonmalignant cells purified from normal breast tissue or breast carcinomas. The array used in this study (Agilent Human 1 clone set) covers >9,000 unique map positions with a median interval of about 100 kb between mapped elements. Typical array CGH profiles after normalization and Circular Binary Segmentation (see Materials and Methods for details) are depicted in Fig. 1A . Overall, individual cDNA log 2 ratios are scattered with the majority of log 2 ratios ranging from −0.5 to 0.5 even when using normal DNA. However, segmented array CGH data of nonmalignant samples clearly showed lack of statistically significant copy number gains and losses, whereas that of tumor samples identified multiple genetic alterations ( Fig. 1A). To identify genomic areas that are recurrently amplified or deleted, we generated plots summarizing copy number alterations in all tumors and in separate DCIS and IDC groups ( Fig. 1B). This overview shows that low-fold copy number gains and losses (segment values >0.09 or less than −0.08) affect nearly all chromosomes, whereas most high-fold amplifications and deletions (>0.5 or less than −0.35) correspond to previously identified regions (1q21, 8q24, 11q13, 12q13, 15q26, 17q21, 20q13 and 1p32, 11q11-12, 13q, 16q24, and 17p13, respectively). Increased log 2 ratio for chromosome X was observed in all samples due to the use of male genomic DNA as reference, whereas all breast tissue samples were obtained from females. The highest level amplification (>50-fold, calculated based on array CGH log 2 ratio; refs. 17, 18) was found at 15q26.3, presumably targeting IGF1R ( 29); but because this amplification was present in only one tumor (IDC-B17), it is not prominent in the recurrence chart ( Fig. 1B), whereas 17q21 harboring ERBB2 clearly stands out. 
We also identified areas of amplifications, such as 12p13 and 16p13, that have not been characterized in detail and may harbor novel breast cancer oncogenes ( Fig. 1B). The segmentation algorithm (Circular Binary Segmentation; ref. 24) effectively disregards single cDNA probes possessing aberrant high CGH log 2 ratios but identifies sets of adjacent probes with an altered average log 2 ratio. This segmental filtration can only detect amplicons that are composed of more than two genes. However, we cannot exclude the possibility that single highly aberrant log 2 ratios could represent very small focal amplifications not captured in Fig. 1B.
In addition to the combined analysis of all 30 tumors, we also analyzed the 10 DCIS and 18 invasive tumors as separate groups with the aim of identifying genetic alterations potentially involved in the in situ to invasive carcinoma transition. Copy number changes were readily detected by array CGH in DCIS and in some cases were extremely prevalent across the whole genome (e.g., in DCIS5; Fig. 1C), correlating with prior studies describing high degree of genomic instability at this early stage of breast tumorigenesis ( 30). However, with the exception of 1q and 17q21/ERBB2 amplicons, an overall trend toward an increase in the number and amplitude of gains and losses from DCIS to IDC was observed ( Fig. 1B, middle and lower). Unsupervised clustering of filtered raw data (log 2 ratio above 0.09), including all amplified genes with at least one sample having a log 2 ratio above 0.5, did not identify clear DCIS and IDC clusters nor did the tumors cluster according to grade or ER status ( Fig. 1C). However, statistical analyses determined that gain of 5q, chromosome 7, 11q, 16p, and 20p was statistically significantly (P < 0.05, not corrected for multihypothesis testing) more likely to be detected in IDC than in DCIS, whereas loss of chromosome 9 preferentially occurred in DCIS ( Fig. 1B). Similarly ER+ invasive tumors were more likely to have 1q and 11q gain than ER− ones, whereas we did not detect any statistically significant association between a specific amplification event and tumor grade, potentially because the majority of our tumors (22 of 30) were high grade ( Fig. 1C).
Identification of MCRs of amplifications. MCRs of amplifications from the predicted copy number gains were identified using a recently described algorithm ( 23). The 49 highest ranked MCRs and their known and candidate targets are listed in Table 1, and examples of corresponding array CGH profiles with the MCRs indicated are depicted in Fig. 1D and F. Among these highest ranked MCRs are many known amplicons frequently (10-30% of tumors) amplified in breast cancer, including 17q21 (ERBB2), 8q24 (MYC), 11q13 (GARP and EMSY), 8p12 (FGFR1), and 20q13 (BCAS1 and ZNF217). We also detected less frequent amplicons, including 5q35 (FGFR4) and 15q25.6 (IGF1R), that are amplified in 3% to 5% of tumors. Although the majority of MCRs are fairly large (0.5-10 Mb), a few of them are small (<500 kb) and contain a limited number (5-10) of genes, representing attractive regions for further study.
To validate our array CGH results, we did quantitative real-time PCR analysis of selected candidate genes in primary breast tumors and breast cancer cell lines ( Table 2 ). Amplification of most candidates was confirmed, although the fold amplification determined by array CGH and quantitative real-time PCR was not always in perfect agreement.
Overall correlation between gene amplification and overexpression. We also analyzed the overall contribution of copy number gain to gene expression changes by calculating the percentage of overexpressed genes that are located in amplicons and the percentage of genes present in amplicons that are overexpressed within each tumor. Among the 30 tumor samples with array CGH profiling data, we had SAGE libraries for 14 of them. In addition, we had SAGE libraries from two different cases of normal luminal mammary epithelial cells to be used for comparisons. Odds ratios for overexpressed genes within amplicons were determined for each sample (see Materials and Methods and Supplementary Table S1). For amplifications defined by segment values >0.09 (+ level), the mean odds ratio among all samples is 1.85 (0.52-2.96). In other words, genes in low-level amplified regions are nearly twice as likely to be overexpressed as genes in nonamplified areas. Raising the amplification threshold to 0.2 (++) and 0.5 (+++) levels leads to a stronger association, with odds ratios of 2.14 (0.27-4.47) and 3.97 (1.49-10.65), respectively, supporting that higher levels of amplification lead to greater overexpression of contained genes. Contrary to previous studies reporting high overall association between gene expression and amplification ( 18, 19), we found that the correlation between gene amplification and overexpression is highly variable among tumors, suggesting different mechanisms of gene activation depending on tumor subtype (Supplementary Table S1). The difference between our results and those of previous studies could be due to the use of different platforms for expression profiling, setting more stringent criteria for “overexpression,” using purified normal mammary epithelial cells as reference, and the fact that SAGE tag numbers predict absolute mRNA copy numbers, whereas cDNA array data reflect relative mRNA levels.
Identification of candidate targets of MCRs. To identify candidate targets of amplicons based on gene expression patterns, we tested the genes in each MCR based on SAGE data obtained from the same samples by comparing to those from three normal controls: two from purified normal mammary epithelial cells and one from normal breast organoid (details of the statistical analysis are described in Materials and Methods). Using this combinatorial approach, we were able in many cases to narrow down the number of candidate targets to a few genes. A complete gene list and statistical analysis for each MCR is available as a Supplementary Data File. It is noteworthy that previous gene expression profiling and array CGH studies did not use normal mammary epithelial cells for reference ( 18, 19). Thus, we believe our analysis is more reliable in detecting overexpressed oncogenes. On the other hand, a limitation of the SAGE method is that at the usual sequencing depth (∼50,000 tags per library), it is not able to identify targets with low overall expression levels. In addition, regardless of the method used, the magnitude of gene dosage effect is not necessarily a predictor of functional relevance in tumorigenesis.
To validate our approach, we first applied our statistical analysis to fairly well characterized amplicons, including 17q21, 11q13, and 8q24. The 17q21/ERBB2 MCR contains 21 genes, and among the 14 tumors with SAGE data, eight showed amplification of this 17q21 MCR. Using our method, we predicted that the strongest candidate of this MCR was ERBB2 based on the P values obtained in the four different statistical tests and the fold overexpression (Supplementary Table S2, top). However, in addition to ERBB2, the neighboring gene PERLD1 was also identified as a potential candidate target. Correlating with our data, a recent detailed characterization of the ERBB2 amplicon at the copy number [fluorescence in situ hybridization (FISH) on tissue microarrays] and gene expression (real-time PCR) levels also found that ERBB2, PNMT, and PERLD1 (MGC9753) show the best correlation between amplification and overexpression ( 31).
The same analysis done in the 11q13.3-5 MCR identified E2IG4 as the strongest candidate target, immediately adjacent to EMSY, a recently proposed target of this amplicon (refs. 32, 33; Supplementary Table S2, bottom). Interestingly, E2IG4 was previously identified as a gene induced by estrogen in the MCF-7 breast cancer cell line, and it encodes a secreted protein with leucine-rich repeats. Its exact function and potential role in breast cancer are unknown ( 34). EMSY was recently identified as a BRCA2-interacting protein and a target of the 11q13 amplicon in breast and ovarian carcinomas ( 32, 33). The amplification and overexpression of EMSY may compromise BRCA2 function in sporadic tumors. Although our SAGE data did not support EMSY as the target, the fact that our method pinpointed the neighboring gene E2IG4 confirmed the location of target(s) in this particular amplicon. Overexpression of E2IG4 and EMSY should be further validated using other means, such as quantitative RT-PCR or Northern hybridization.
On the other hand, our analysis of the 8q24.3 amplicon did not identify the known presumed target, MYC ( Fig. 1D; Supplementary Table S3). Among the five tumors with amplification of this area, none of them showed significant overexpression of MYC compared with normal mammary epithelial cells or tumors that lacked amplification, suggesting that it may not be the best candidate target of this MCR in these breast tumors. A recent study analyzing myeloid malignancies with 8q24 amplification also concluded that MYC is not overexpressed in the amplified tumors and thus may not be the only target of this amplicon ( 35). Similarly, cDNA array CGH and gene expression analysis of breast carcinomas showed that only two of eight tumors with MYC amplification had increased expression of its mRNA, again raising the question whether MYC is the only target of the 8q24 amplicon in breast cancer ( 18). A previous study using an inducible MYC expression model showed that MYC expression was not fully required for tumor progression in the presence of additional genomic changes, such as KRAS2 mutation ( 36), raising the possibility that MYC overexpression in tumors might be transiently maintained. We identified KIAA0196 as another potential target of this region, correlating with recent findings that KIAA0196 was both amplified and overexpressed in prostate cancer ( 37, 38).
Systematic characterization of the 12p13-p12 amplicon. In addition to these previously described and well-characterized amplicons, we also identified several chromosomal areas with high-level copy number gains that have not previously been characterized in detail in breast cancer. Among these, we further characterized a 1.8-Mb amplicon at 12p13-p12 found in three tumors ( Fig. 1F; Table 3 ), because it showed one of the highest levels of amplification (comparable with that of the ERBB2 amplicon). KRAS2, an oncogene that is located in the 12p13-p12 amplicon, has previously been described as a gene amplified in a subset of breast carcinomas based on a limited array CGH screen for the amplification of known oncogenes ( 20), but this amplicon has not been systematically characterized at high resolution. Based on our integrated array CGH/SAGE analysis, we identified three candidate target genes in this region: H2AFJ, EPS8, and KRAS2. However, we also considered TEL/ETV6 as a candidate target, because it is a known oncogene ( 39– 41), and translocation of TEL/ETV6 to NTRK3 on 15q25 was reported in >90% of secretory breast cancers ( 42, 43) and even in rare cases of IDCs ( 44). TEL/ETV6 encodes a member of the ETS family of transcription factors with an NH2-terminal oligomerization (PNT) and a COOH-terminal DNA binding (ETS) domain ( 40). TEL/ETV6 translocation was frequently found in myeloid and lymphoid leukemias and in solid tumors ( 39, 40, 45), and recently, amplification of TEL/ETV6 was reported in a myelodysplastic syndrome ( 46). KRAS2 is another attractive target due to the well-established roles of Ras family proteins in tumorigenesis. Mutation of RAS genes is infrequent in human breast carcinomas ( 47, 48), but amplification of KRAS2 has previously been reported in 10 of 27 cases of breast carcinomas and breast cancer cell lines ( 20) and in one case of metastatic rectal carcinoma ( 21).
Amplification of all four candidates in tumors predicted by array CGH was confirmed by quantitative PCR ( Tables 2 and 4 ). Among 16 additional tumor samples screened that did not have array CGH data, only one tumor had amplification of H2AFJ, EPS8, and KRAS2. Among 26 breast cancer cell lines screened, only HCC1937 and ZR-75-1 showed copy number gain in this region ( Tables 2 and 4). Thus, amplification of 12p13-p12 occurs with a low frequency in breast cancers. We further did FISH to confirm amplification of the candidates on the single-cell level and to determine whether TEL/ETV6 is involved in a translocation, as in the case of secretory breast cancer and other tumor types. FISH displayed dramatic amplification of all four candidates in tumors IDC1 and LN1, but not in the control tumor LN2, which was also negative by array CGH ( Fig. 2 ). FISH using two bacterial artificial chromosomes flanking the TEL/ETV6 gene labeled in red and green showed adjacent red and green signals in the tumors, suggesting that the TEL/ETV6 gene is not disrupted in these tumors. In the HCC1937 cell line, the majority of cells carried five copies of TEL/ETV6. Staining of metaphase chromosomes using centromere-specific probes showed that three copies of ETV6 were associated with chromosome 12, whereas the other two copies were associated with chromosome 4 and one as yet unidentified chromosome. Again, the TEL/ETV6 gene is not disrupted in the HCC1937 cell line. Therefore, translocation and fusion of TEL/ETV6 to NTRK3 or other genes is not a common event in breast cancers.
We next did quantitative reverse transcription-PCR (RT-PCR) on cDNA samples from primary breast tumors, breast cancer cell lines, and five purified normal mammary epithelial cell samples as references to examine overexpression of the four putative targets ( Table 4). TEL/ETV6 and KRAS2 were overexpressed in a subset of breast tumors, but this did not correlate with their amplification. Of the four genes tested, only H2AFJ and EPS8 were overexpressed in tumors in which they were also amplified. However, because of the small sample size and the low frequency of amplification of the 12p13 target genes in breast tumors, the association between gene amplification and overexpression was statistically significant only for ERBB2 (P = 0.007, Fisher exact test). Thus, based on these data, H2AFJ and EPS8 are the potential targets of this 12p13-p12 amplicon. The H2AFJ gene encodes a member of the histone H2A superfamily. It has two isoforms generated by differential splicing. We detected the expression of only isoform 2 (NM_177925) by SAGE and quantitative RT-PCR. This isoform encodes a 129-amino-acid protein that is highly homologous (∼95%) to other histone H2A family members. The function of this particular H2A protein is not known, but presumably, it is involved in modulating chromatin structure and gene expression. A recent study described overexpression of H2AFJ in human metastatic melanoma lesions compared with common nevocellular nevi, suggesting a potential role for this gene in melanoma metastasis ( 49). Epidermal growth factor (EGF) pathway substrate 8 (EPS8) was originally identified as a substrate of EGF receptor (EGFR) that enhances mitogenic signaling from receptor tyrosine kinases, phorbol ester, and c-Src ( 50, 51). Constitutive tyrosine phosphorylation of EPS8 was observed in many tumor cell lines ( 52).
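The Fisher exact test mentioned above for the association between amplification and overexpression can be sketched for a 2x2 table as follows. This is an illustrative implementation using only the Python standard library; the function name and the counts in the example are hypothetical and are not taken from Table 4.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows: amplified / not amplified.
    Columns: overexpressed / not overexpressed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x amplified, overexpressed tumors
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts: 5 amplified tumors (4 overexpressing) and
# 15 non-amplified tumors (3 overexpressing).
p_value = fisher_exact_two_sided(4, 1, 3, 12)
```

With only a handful of amplified tumors, a single discordant case can move the P value across the significance threshold, which is consistent with the point that low amplification frequency limits statistical power for the 12p13 genes.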
Overexpression of EPS8 in murine C3H10T1/2 fibroblasts induced cellular transformation in the presence of EGF ( 53), whereas down-regulation of EPS8 by trichostatin A or small interfering RNA inhibited the growth of v-Src-transformed chicken cells ( 46). At the molecular level, EPS8 binds to internalized EGFR, controls EGFR trafficking, and relays signals from Ras and phosphatidylinositol 3-kinase to Rac. More recently, EPS8 was found to bind to the barbed ends of actin filaments and to regulate actin polymerization and cell motility ( 54, 55). Public gene expression data suggest that EPS8 is overexpressed in breast cancer and in several other cancer types, including lung and pancreatic cancer. Further studies are required to confirm overexpression of EPS8 protein in breast tumors by immunohistochemistry and to evaluate its potential prognostic value.
In summary, we identified H2AFJ and EPS8 as novel candidate breast cancer oncogenes based on integrated cDNA array CGH and SAGE analyses. The combination of these two technologies seems to be powerful for the identification of candidate target genes of amplified loci as shown by the identification of a novel 12p13 amplicon and its putative targets (H2AFJ and EPS8) detected in a subset of breast tumors. Further functional studies are necessary to validate the role of these new candidate oncogenes in breast tumorigenesis.
Grant support: National Cancer Institute Cancer Genome Anatomy Project and Specialized Program in Research Excellence in Breast Cancer at Dana-Farber/Harvard Cancer Center grant CA89393, Department of Defense Breast Cancer Center of Excellence grant DAMD17-02-1-0692 (K. Polyak), Department of Defense Postdoctoral Fellowship grant DAMD17-02-1-0363 (J. Yao), and grant CA93683.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Drs. Drazen Belina, Zrinka Pagon, and Jasminka Razumovic (University Hospital Rebro and Zagreb Medical School, Zagreb, Croatia) and Drs. Andrea Richardson, Gabriela Lodeiro, and Ruth Gomes (Brigham and Women's Hospital, Boston, MA) for help with the acquisition of tumor samples and the current and past members of the Polyak laboratory for critical reading of the article and their constructive criticism throughout the execution of this project.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
C. Brennan is currently at the Neurosurgery Service, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021.
- Received November 14, 2005.
- Revision received January 8, 2006.
- Accepted January 26, 2006.
- ©2006 American Association for Cancer Research.