Global gene expression analysis using microarrays has been used to characterize the molecular profile of tumors. Gene expression variability at the mRNA level can be caused by a number of different events, including novel signaling, downstream activation of transcription enhancers or silencers, somatic mutation, and genetic amplification or deletion. Genomic amplifications are commonly observed in cancer and often include known oncogenes. The tyrosine kinase-type cell surface receptor, ERBB2, is an oncogene located on chromosome 17q21.1 that is amplified in 10–40% of breast tumors. We report for the first time that phenylethanolamine N-methyltransferase (PNMT), proteasome subunit, β type 3 (PSMB3), ribosomal protein L19 (RPL19), and nuclear receptor subfamily 1, group D, member 1 (NR1D1) are coexpressed with ERBB2 in 34 breast cancer biopsies and also mapped within the same chromosomal location as the ERBB2 gene. Consistent with previous reports, we also observed that the steroidogenic acute regulatory protein-related gene, MLN64, and growth factor receptor bound protein 7 were coexpressed with ERBB2. Coexpression and colocalization of PNMT and MLN64 with ERBB2 suggested that the amplification of ERBB2 includes the chromosomal region harboring these genes. This hypothesis was validated in a subset of 12 biopsies. Gene amplification of ERBB2, PNMT, and MLN64 significantly correlated with increased mRNA gene expression (P < 0.05). These results suggest that gene expression profiling of breast biopsies may become a valuable method for adequately characterizing and choosing treatment modality for patients with breast cancer.
Gene amplification is a common mechanism of oncogene activation in neoplastic cell transformation and tumor progression. The ERBB2 gene encodes a growth factor receptor with tyrosine kinase activity that is amplified in 10–40% of breast cancers resulting in overexpression of the ERBB2 gene (1, 2, 3, 4) . The ERBB2 gene has been localized to 17q21.1. 2 Nearby, amplifications of the genomic DNA at chromosomal bands 17q22-q24 have been described in breast cancer (5, 6, 7, 8) . A recent study has identified two regions telomeric to ERBB2 that are highly amplified, regions A and B, which are separated by 5 MB of unamplified DNA (8) . Region A has been shown to be coamplified with the ERBB2 locus in breast cancer cell lines (8) . The size of the ERBB2 amplicon has not previously been defined at the resolution now possible with the completion of the human genome project. Two recent reports studying the gene expression profiles of breast cancer suggest that the ERBB2 amplicon includes GRB7 3 and MLN64 (9 , 10) .
The use of oligonucleotide microarrays allows for the rapid monitoring of gene expression in tumors. Gene expression evaluation of human breast cancers by DNA microarray technology is beginning to show promise in molecular phenotyping (9, 10, 11, 12) . We profiled 34 breast tumors, 19 of which were ER+, 14 that were ER−, and 1 of unknown status. Examination of the mRNA expression data revealed that ERBB2 was frequently overexpressed in these cancers and predominantly in the ER− tumors. We applied a City Block algorithm on a logarithmic scale to identify genes with expression patterns similar to ERBB2 in this set of tumors. Interestingly, three of the genes coexpressed with ERBB2 map centromeric to the ERBB2 locus on chromosome 17q21.1. The colocalization and coexpression of these genes with ERBB2 suggested a region of genomic amplification in breast cancer that extends beyond ERBB2. To investigate this hypothesis, we performed correlation studies between gene copy number and mRNA expression in 12 of 34 tumors for which we had genomic DNA. These studies confirmed that breast tumors with the highest expression of ERBB2, MLN64, and PNMT also had increased gene copy number at the chromosomal level for each of these three genes. In addition, we investigated whether the gene expression patterns of ERBB2, MLN64, and PNMT in breast tumors were predictive of phenotypic parameters. The strategy of combining gene expression microarray data with phenotypic parameters should be useful in classifying tumor types.
MATERIALS AND METHODS
Sample Preparation and Microarray Analysis.
Thirty-four breast tumor specimens were collected from fresh surgical resections from multiple hospitals in Sweden and stored at −70°C until processed for additional analysis for steroid hormone receptor measurement and flow cytometry. The remaining tissue was used for microarray studies. Patients were informed of their receptor, DNA, and S-phase values but not about results from the gene expression profiling analysis, according to advice from the chairman of the local ethics committee. ER and PGR levels were determined in cytosol from freshly frozen tumor. S phase and ploidy were analyzed in isolated nuclei from individual breast cancers as described previously (13). Fourteen of the biopsies were ER−, 19 were ER+, and for 1 sample, ER status was unavailable. The tumors were roughly matched for ploidy and S phase (Table 1). Ploidy and S phase were determined using standard flow cytometric methods at the individual hospitals.
RNA was extracted using Trizol (Life Technologies, Inc., Gaithersburg, MD). RNA was purchased for one sample representing a pool of normal breast tissue from two deceased females ages 24 and 43 years (Clontech, Palo Alto, CA). Total RNA was purified using Qiagen RNeasy columns (Qiagen, Valencia, CA). Full-length total RNA was used to synthesize double-stranded cDNA using the Superscript Choice System (Life Technologies, Inc.). The cDNA was then phenol/chloroform extracted with phase lock gels (Fisher, Newark, DE) and ethanol precipitated. The cDNA was transcribed in vitro using the Enzo BioArray High Yield RNA Transcript Labeling Kit (ENZO Diagnostics, Farmingdale, NY) to produce biotin-labeled cRNA. The labeled cRNA was then fragmented at 94°C for 35 min at a minimum final concentration of 0.5 μg/μl. 1× Fragmentation buffer consisted of 40 mM Tris-acetate (pH 8.1), 100 mM KOAc, and 150 mM MgOAc, mixed with diethyl pyrocarbonate-treated water to a final volume of 20 ml. The cRNA was then combined in a hybridization mixture at a final concentration of 0.05 μg/μl fragmented cRNA, with eukaryotic hybridization controls (BioB, BioC, BioD, cre) at 1.5, 5, 25, and 100 pM concentration, respectively, 0.1 mg/ml herring sperm DNA, 0.5 mg/ml acetylated BSA, 100 mM MES, 1 M [Na+], 20 mM EDTA, 0.01% Tween 20, 50 pM control oligo B2 (Affymetrix, Santa Clara, CA), and water to a final volume of 300 μl. The hybridization mixture was heated at 99°C for 5 min followed by a 5-min incubation at 45°C. HuGene FL (6800) microarrays were each filled with 200 μl of 100 mM MES, 1 M [Na+], 20 mM EDTA, and 0.01% Tween 20 and then incubated at 45°C for 10 min in a hybridization oven. The buffer was removed and the array was filled with 200 μl of hybridization mixture. Hybridization was performed for 16 h in a 45°C hybridization oven.
Upon completion, arrays were washed according to the standard Affymetrix EukGE-WS2v2 protocol and stained with 10 μg/ml streptavidin-phycoerythrin conjugate (Molecular Probes, Eugene, OR). The signal was antibody amplified with 2 mg/ml acetylated BSA (Life Technologies, Inc.), 100 mM MES, 1 M [Na+], 0.05% Antifoam (Sigma, St. Louis, MO), 0.1 mg/ml goat IgG (Sigma), and 0.5 mg/ml biotinylated antibody (Vector Laboratories, Burlingame, CA) and re-stained with fresh streptavidin solution. After washing and staining, arrays were scanned twice with the Gene Array scanner (Affymetrix). Hybridization intensity data detected by the scanner were automatically acquired and processed by the Gene Chip software. Raw data were converted to expression levels using a target intensity of 150. Expression levels are represented by an AvgDiff value (Affymetrix, Expression Analysis Technical Manual), which is used to determine the ratio change in the hybridization intensity of a given probe set between two or more experiments.
Quantitative Real-Time PCR.
DNA was extracted from 12 of the biopsies from which RNA had been extracted using Trizol (Life Technologies, Inc.). The Comparative CT Method was used to determine the relative gene copy number for MLN64, PNMT, and ERBB2 (14). The amount of target DNA was normalized to the endogenous 18S rRNA gene and is reported relative to a calibrator, Coriell Cell Repository DNA NA17246 (Camden, NJ). Primers for MLN64, PNMT, and ERBB2 were designed using the ABI Primer Express Software 2.0 and synthesized by Megabases, Inc. (Evanston, IL; Table 2). Primers were selected according to the following criteria: target sequence between 50 and 100 bp in length; G-C content between 30 and 80%; and avoidance of runs of identical nucleotides (Table 2). Primer concentrations and combinations were optimized such that the conditions resulting in the lowest CT value and the highest ΔRn were used for the quantitative PCR experiments. The TaqMan probes were labeled with FAM and a nonfluorescent quencher.
A validation experiment was performed to confirm that the efficiency of each of the target amplifications and the efficiency of the reference amplification were approximately equal. The absolute value of the slope of the log input amount versus ΔCT was <0.1.
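The Comparative CT calculation and the efficiency check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function names and the example CT values are hypothetical, and the amplification efficiency is assumed to be close to 2 per cycle for both target and reference, which is what the validation slope criterion (<0.1) is meant to support.

```python
def relative_copy_number(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative copy number by the comparative CT method: 2 ** -(delta-delta-CT).

    ct_target / ct_reference: mean CT of the target gene and of the endogenous
    reference gene in the tumor sample (hypothetical inputs).
    ct_target_cal / ct_reference_cal: the same two values for the calibrator DNA.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


def validation_slope(log_inputs, delta_cts):
    """Least-squares slope of delta-CT versus log input amount.

    An absolute slope below 0.1 is the criterion quoted in the text for
    approximately equal target and reference efficiencies.
    """
    n = len(log_inputs)
    mean_x = sum(log_inputs) / n
    mean_y = sum(delta_cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inputs, delta_cts))
    sxx = sum((x - mean_x) ** 2 for x in log_inputs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical CT values: the target amplifies two cycles earlier
    # (relative to the reference) in the tumor than in the calibrator.
    print(relative_copy_number(24.0, 20.0, 26.0, 20.0))
    print(abs(validation_slope([0, 1, 2, 3], [4.0, 4.05, 4.1, 4.15])) < 0.1)
```

A two-cycle shift in ΔCT relative to the calibrator corresponds to a fourfold relative copy number, which is why small pipetting or efficiency differences matter and why samples were run in triplicate.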
All samples were run in triplicate along with a positive and negative control on each plate for all three primer sets and the endogenous control. The DNA for the positive control was extracted from SKOV3, an adenocarcinoma ovarian cell line kindly provided by the National Cancer Institute (Frederick, MD). SKOV3 reportedly has 5–9-fold genomic amplification of ERBB2 (15). Primers MLN64–1150F and 1219R were used at a concentration of 300 nM each. ERBB2–114F and 180R were used at a concentration of 900 nM each, and PNMT–1215F and 1294R were used at 300 and 900 nM, respectively. The PCR reaction was performed following the manufacturer's protocol (Applied Biosystems, Foster City, CA).
An algorithm based on the City Block metric was developed for identifying genes with expression patterns similar to that of a query gene, in this case ERBB2. The City Block metric was implemented on a logarithmic scale. This transforms the City Block metric

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

to

$$d(x, y) = \sum_{i=1}^{n} |\log x_i - \log y_i|,$$

which is equivalent to

$$d(x, y) = \sum_{i=1}^{n} \left|\log \frac{x_i}{y_i}\right|,$$

where $x_i$ and $y_i$ are the expression levels of the two genes in sample $i$ and $n$ is the number of samples. By using the City Block metric based on a logarithmic scale rather than absolute values of expression, the ratios of gene expression are compared. This allows for genes with similar patterns of expression, but not necessarily similar absolute expression levels, to be correlated with each other. Pairwise comparisons between the ERBB2 values and each of the remaining 7128 probe sets were performed with the algorithm presented. The 20 genes with the closest distance to the query gene (ERBB2) were recorded. The measured distances were scaled by setting the distance between ERBB2 and its closest neighbor (monokine induced by IFN γ, MIG) to 1. Finally, the experimental expression profiles over this specific gene subset were clustered using an agglomerative hierarchical tree method. To determine the significance of the correlations discovered between ERBB2, MLN64, and PNMT using the City Block metric algorithm described above, a Spearman Rank Order Correlation Coefficient was calculated (SigmaStat, Chicago, IL).
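The nearest-neighbor search described above might be sketched as follows. This is an illustrative reading of the text rather than the authors' implementation: the function names and the toy profiles are hypothetical, and all expression values are assumed to be strictly positive (the text does not say how zero or negative AvgDiff values were handled before taking logarithms).

```python
import math


def log_city_block(x, y):
    """City Block distance on a logarithmic scale: sum of |log x_i - log y_i|.

    Assumes every value is strictly positive (an assumption of this sketch).
    """
    return sum(abs(math.log(a) - math.log(b)) for a, b in zip(x, y))


def nearest_genes(profiles, query, top_n=20):
    """Return the top_n genes closest to the query gene.

    profiles: dict mapping gene name -> list of expression values, one per sample.
    Distances are scaled so that the closest neighbor has distance 1,
    as described in the text.
    """
    query_profile = profiles[query]
    distances = sorted(
        (log_city_block(query_profile, values), name)
        for name, values in profiles.items()
        if name != query
    )
    closest = distances[0][0]
    if closest == 0:
        raise ValueError("closest neighbor is identical to the query; cannot scale")
    return [(name, dist / closest) for dist, name in distances[:top_n]]


if __name__ == "__main__":
    # Hypothetical three-sample profiles, for illustration only.
    toy = {
        "QUERY": [100.0, 200.0, 400.0],
        "GENE_A": [110.0, 190.0, 420.0],
        "GENE_B": [400.0, 100.0, 50.0],
    }
    for name, scaled in nearest_genes(toy, "QUERY", top_n=2):
        print(name, round(scaled, 2))
```

Because each term is the absolute log ratio of the two expression values, a twofold difference contributes the same amount to the distance whether it occurs at low or high absolute expression.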
Genes in close proximity to ERBB2 were investigated to determine whether their expression pattern significantly correlated with ERBB2. Spearman Rank Order Correlations were performed with genes located within ∼1 MB of ERBB2 (Map Viewer, Build 30) 4 that were also represented by a probe set on the Affymetrix HuGene FL (6800) microarray.
To determine whether the gene expression levels of ERBB2, PNMT, or MLN64 in the tumor biopsies were predictive of phenotypic parameters, seven parameters were studied, including ER and PGR status, characteristics of the primary tumor (T), regional axillary node metastases (N), and S phase and ploidy (Table 1). Distant metastases (M) were not tested because all but one of the tumors for which this information was available had no detectable distant metastases. For T, three categories were designated for analysis; for N, two categories; and for ploidy, three categories. The expression data for the three genes were not normally distributed; consequently, they were transformed using a log10 scale. One-way ANOVA tests were performed to determine whether there were significant differences between the transformed data for the three genes of interest and the phenotypic parameters, excluding S phase (SigmaStat).
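The log10 transformation followed by a one-way ANOVA can be sketched as below. The analysis in the text was run in SigmaStat; this standard-library version is only an illustration, the function names and example values are hypothetical, and it returns the F statistic and degrees of freedom without a P value (which would require the F distribution).

```python
import math


def log10_groups(groups):
    """Apply a log10 transform to every expression value in every group."""
    return [[math.log10(v) for v in group] for group in groups]


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations.

    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical AvgDiff values for two phenotype categories.
    category_a = [120.0, 150.0, 900.0, 2100.0]
    category_b = [110.0, 140.0, 160.0, 180.0]
    print(one_way_anova_f(log10_groups([category_a, category_b])))
```

With only two categories, as for ER or PGR status, the one-way ANOVA is equivalent to a two-sample t test with pooled variance, so the equal-variance assumption applies to those comparisons.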
The S-phase data are a continuous variable, representing the percentage of cells in synthesis phase of the cell cycle, with data ranging from 2.5 to 22 in the 34 breast tumors. Spearman Rank Order Correlations were performed to determine the degree of association between ERBB2, PSMB3, PNMT, or MLN64 mRNA expression levels and the S-phase percentage (SigmaStat).
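For readers unfamiliar with the Spearman Rank Order Correlation used throughout this section, a minimal standard-library sketch follows. It is not the SigmaStat procedure itself: the function names are hypothetical, ties are handled by average ranks, and no P value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical expression values and S-phase percentages.
    expression = [150.0, 300.0, 2100.0, 90.0, 800.0]
    s_phase = [4.0, 6.5, 18.0, 2.5, 9.0]
    print(spearman(expression, s_phase))
```

Because the coefficient depends only on ranks, it is unchanged by the log transformation applied elsewhere in the analysis and is insensitive to the skewed distribution of AvgDiff values.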
RESULTS
ERBB2 Gene Expression and ER Status.
We evaluated the expression of ∼7128 genes using oligonucleotide arrays in 34 human breast tumors. ERBB2 was overexpressed by at least a factor of two in 12 of 34 (35%) of the breast tumors relative to normal breast tissue (Table 1). The term overexpression will be used to mean at least two times the expression of the gene of interest as compared with the normal breast pool sample. Overexpression of ERBB2 was more prominent in the ER− tumors, with 8 of 14 (57%) tumors overexpressing the gene compared with 4 of 19 (21%) ER+ samples. Furthermore, there was a significant difference in the mean expression levels of ERBB2 between ER+ and ER− tumors, P = 0.030 (Table 3). This difference should be interpreted cautiously because the degree of variation in gene expression between the two ER classes was not equal. We also observed a significant difference in the mean expression levels of ERBB2 between PGR+ and PGR− biopsies, P = 0.044 (Table 3). Fifty percent (10 of 20) of PGR− biopsies overexpress ERBB2 compared with 15% (2 of 13) of PGR+ biopsies.
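The overexpression call and the proportions quoted above can be reproduced with a few lines. This is a sketch only: the function names are hypothetical, the threshold is treated as "at least twofold" (an assumption about how the boundary case was handled), and the normal-pool value of 139.3 is the ERBB2 AvgDiff reported for normal breast in this paper.

```python
NORMAL_POOL_ERBB2 = 139.3  # ERBB2 AvgDiff of the normal breast pool, as reported


def is_overexpressed(tumor_avgdiff, normal_avgdiff, fold=2.0):
    """True when tumor expression is at least `fold` times the normal value."""
    return tumor_avgdiff >= fold * normal_avgdiff


def percent(count, total):
    """Percentage of `count` out of `total`, rounded to the nearest integer."""
    return round(100.0 * count / total)


if __name__ == "__main__":
    # Class means reported in the paper, used here as example inputs.
    print(is_overexpressed(178.6, NORMAL_POOL_ERBB2))
    print(is_overexpressed(2050.0, NORMAL_POOL_ERBB2))
    # Counts taken from the text.
    print(percent(12, 34), percent(8, 14), percent(4, 19))
```

With a normal-pool AvgDiff of 139.3, the twofold threshold for ERBB2 corresponds to an AvgDiff of 278.6.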
Characterization of the ERBB2 Amplicon by Gene Expression.
Applying the City Block metric based on a logarithmic scale, we identified 20 genes with a similar expression pattern to ERBB2 in the 34 breast cancers. The 20 genes were then used to cluster the samples (Fig. 1) ⇓ . Interestingly three of the genes, PSMB3, PNMT, and MLN64, map to the long arm of human chromosome 17, near the ERBB2 gene. PSMB3 is 950 kb, MLN64 is 51 kb, and PNMT is 20 kb centromeric to ERBB2 (Table 4) ⇓ . 2
The expression patterns of ERBB2, PSMB3, MLN64, and PNMT were positively correlated with each other. ERBB2 was positively correlated with each of PSMB3, PNMT, and MLN64 (P < 0.001 and r > 0.50). Correlation coefficients were calculated for the expression patterns of 19 additional genes that are located in a 2.5-MB region surrounding ERBB2 on chromosome 17q. In addition to PSMB3, MLN64, and PNMT, three genes, RPL19, GRB7, and NR1D1, had expression patterns that correlated with ERBB2 with P < 0.05 (Table 4).
Expression levels of the 20 genes identified with the City Block metric that have similar expression to ERBB2 distinguished two classes of samples (Fig. 1). Class 1 consisted of 24 samples with an ERBB2 mean AvgDiff of 178.6 and a SD of 129.7. Class 2 consisted of 10 samples with a mean AvgDiff of 2050.0 and a SD of 1289.0. Normal breast, with an AvgDiff value of 139.3 for ERBB2, clustered with the class 1 samples (data not shown).
Correlations between Gene Copy Number and Gene Expression.
Relative gene copy number was determined for ERBB2 and for two of the six nearby genes whose expression patterns correlated with ERBB2 (MLN64 and PNMT) in the 12 of 34 biopsies for which DNA was available, using real-time quantitative PCR and the comparative CT method (Table 5). The relative gene copy number for ERBB2 ranged from 0.28 to 4.82, with five of the biopsies having a gene copy number >2 relative to the normal control and normalized to the endogenous 18S rRNA gene (samples 88, 92, 101, 110, and 121). All five of the biopsies with ERBB2 genomic amplification also had genomic amplification of PNMT and MLN64. Similarly, the ovarian adenocarcinoma cell line, SKOV3, had genomic amplification of all three genes. Four of five biopsies with ERBB2 amplification overexpressed ERBB2, PNMT, and MLN64 mRNA (samples 88, 92, 110, and 121). The single biopsy with ERBB2 amplification that did not overexpress ERBB2 mRNA, sample 101, had genomic amplification of PNMT and MLN64. This biopsy overexpressed PNMT but not MLN64. Two of the 12 biopsies had ERBB2 overexpression but no apparent genomic amplification of any of the three genes studied (samples 95 and 96).
Gene expression and relative gene copy number for ERBB2 had a correlation coefficient of 0.76 (P = 0.005). Similarly, PNMT and MLN64 gene copy number and gene expression had correlation coefficients equal to 0.80 (P = 0.002) and 0.69 (P = 0.013), respectively.
Correlations and Associations with Phenotypic Parameters.
To investigate whether ERBB2, PNMT, and MLN64 mRNA expression correlated with phenotypic parameters, one-way ANOVAs were calculated between gene expression and each of ER status, PGR status, tumor staging, regional axillary node metastases, stage, and ploidy in the tumors for which the phenotypic data were available (Table 3). The significant associations between ERBB2 expression and ER and PGR status were discussed previously. In addition, the difference in the mean gene expression level for MLN64 between ER+ and ER− tumors was found to be significant, P = 0.017. Fifty percent (7 of 14) of ER− biopsies overexpress MLN64 compared with 5% (1 of 19) of ER+ biopsies. There were no other significant correlations observed between mRNA expression and phenotypic parameters of the tumors.
DISCUSSION
We have combined gene expression profiling and gene mapping data in this study of 34 breast cancers. We examined the expression of ERBB2 in 14 ER− and 19 ER+ tumors and found it to be overexpressed in 57% of the ER− tumors and 21% of the ER+ samples. These results are in agreement with previous reports of overexpression of ERBB2 in breast cancer (1, 2, 3, 4). ERBB2 and its expression in breast cancer have achieved special importance with both the recognition of poorer prognosis in the ERBB2+ class of tumors and the development of Herceptin (trastuzumab), a therapeutic antibody strategy targeting ERBB2 in the treatment of breast cancer (16, 17, 18). Although overexpression of ERBB2 is common in breast cancer, the mechanism at the genomic level is not fully understood. Possible mechanisms of overexpression of ERBB2 include somatic mutations or genomic amplification, which is frequently observed in cancer. The chromosomal region of 17q21.1, where ERBB2 resides, is often amplified in breast cancer, suggesting that the overexpression is because of increased copy number (3).
We hypothesized that if amplification was a mechanism by which ERBB2 was overexpressed, then genes coamplified with ERBB2 should have similar expression patterns to ERBB2 in breast cancer. We developed an algorithm to search for such genes in 34 tumors across 7128 genes. Twenty genes were identified that are closely related to ERBB2 based on gene expression patterns. Hierarchical clustering of the 34 samples based on these 20 genes resulted in two clusters, demonstrating that ERBB2 expression defines two classes of tumors, those overexpressing ERBB2 and those with expression levels similar to normal breast.
Inspection of the genomic sequence map revealed that 3 of the 20 genes with expression patterns similar to ERBB2 (PSMB3, MLN64, and PNMT) map in close proximity to ERBB2. This finding led us to speculate that our observation of overexpression of multiple genes in this chromosomal region was because of a genomic amplification that included ERBB2. This hypothesis was confirmed using quantitative real-time PCR for the chromosomal region extending 51 kb centromeric from ERBB2. We found that genomic amplification of ERBB2, MLN64, and PNMT correlated with increased mRNA expression. PNMT has not previously been reported to be overexpressed in breast cancer, nor is it known to be regulated by ERBB2. PNMT is an enzyme that catalyzes the last step in the biosynthesis of catecholamines, converting norepinephrine to epinephrine. Like PNMT, MLN64 is not known to be regulated by the ERBB2 gene product (erbb2); however, it has previously been described to be overexpressed with ERBB2 (9, 10, 19). MLN64 is a steroidogenic acute regulatory protein that binds and transports cholesterol and promotes steroidogenesis in placenta and brain.
Our identification of concordance between expression levels and relative gene copy number of ERBB2, PNMT, and MLN64 confirms the hypothesis that coordinate overexpression of these three genes is a result of genomic amplification of the chromosomal region 17q21.1 in breast cancer. Five of five breast tumors with ERBB2 genomic amplification had amplification that extended to include PNMT and MLN64. Although only 12 tumors were examined, this suggests that genomic amplification of ERBB2 in breast cancer frequently extends to MLN64. Two-thirds of the samples overexpressing ERBB2 also overexpress PNMT and MLN64. Three samples overexpress ERBB2 and do not overexpress MLN64 or PNMT. Two of these samples were included in the quantitative PCR analysis, nos. 95 and 96. Neither no. 95 nor no. 96 had genomic amplification of any of the three genes studied. The overexpression of ERBB2 mRNA may be the result of a mutation in the regulatory sequence for ERBB2.
In addition to MLN64 and PNMT, the genes RPL19, GRB7, and NR1D1 are coexpressed with ERBB2. Investigation of all known genes for which expression data were available in a 2.5-MB region surrounding ERBB2 revealed that RPL19, GRB7, and NR1D1 had expression patterns that significantly correlated with ERBB2.
In conclusion, we have identified 23 genes with a similar pattern of gene expression to ERBB2 in 34 breast tumors. Two of these genes, PNMT and MLN64, have a similar pattern because they are genomically amplified with ERBB2. The other 18 genes in Fig. 1, which do not map to chromosome 17q, may have similar expression patterns because of ERBB2 regulation, coregulation by other factors, or association with some undefined breast cancer variability. The potential for the use of these 23 genes as biomarkers for ERBB2 activation and/or amplification and as therapeutic targets should be additionally investigated. Furthermore, additional studies should be performed to determine whether the overexpression of any of the 23 genes identified is a contributing factor to oncogenesis or is only a result of genomic amplification. Finally, the research presented here helps define the ERBB2 amplicon and suggests algorithms for identifying genomic amplification and deletions.
We thank the patients who kindly agreed to participate in these studies and Tim McGee for his technical assistance.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
↵1 To whom requests for reprints should be addressed, at Marlene A. Dressman, Pharmacogenetics, Novartis Pharmaceuticals Corporation, 9 West Watkins Mill Road, Gaithersburg, MD 20878. Phone: (301) 330-3128; Fax: (301) 330-2108; E-mail:
↵2 Internet address: www.ncbi.nlm.nih.gov/cgi-bin/Entrez/hum_srch?chr=hum_chr.inf&query.
↵3 The abbreviations used are: GRB7, growth factor receptor-bound protein 7; ER, estrogen receptor; PNMT, phenylethanolamine N-methyltransferase; MES, 4-morpholineethanesulfonic acid; PGR, progesterone receptor; MB, megabase; AvgDiff, average difference; PSMB3, proteasome subunit, β type 3; RPL19, ribosomal protein L19; NR1D1, nuclear receptor subfamily 1, group D, member 1.
↵4 Internet address: www.ncbi.nlm.nih.gov/mapview/static/MVstart.html.
- Received January 8, 2002.
- Accepted March 5, 2003.
- ©2003 American Association for Cancer Research.