Abstract
Chromosomal instability (CIN) is associated with poor outcome in epithelial malignancies, including breast carcinomas. Evidence suggests that prognostic signatures in estrogen receptor–positive (ER+) breast cancer define tumors with CIN and high proliferative potential. Intriguingly, CIN induction in lower eukaryotic cells and human cells is context dependent, typically resulting in a proliferation disadvantage but conferring a fitness benefit under strong selection pressures. We hypothesized that CIN permits accelerated genomic evolution through the generation of diverse DNA copy-number events that may be selected during disease development. In support of this hypothesis, we found evidence for selection of gene amplification of core regulators of proliferation in CIN-associated cancer genomes. Stable DNA copy-number amplifications of the core regulators TPX2 and UBE2C were associated with expression of a gene module involved in proliferation. The module genes were enriched within prognostic signature gene sets for ER+ breast cancer, providing a logical connection between CIN and prognostic signature expression. Our results provide a framework to decipher the impact of intratumor heterogeneity on key cancer phenotypes, and they suggest that CIN provides a permissive landscape for selection of copy-number alterations that drive cancer proliferation. Cancer Res; 74(17); 4853–63. ©2014 AACR.
Introduction
Induction of aneuploidy affects cellular fitness in various organisms, including yeast (1), mice (2), and human cells (3). Most aneuploid cells exhibit reduced proliferation rates, but studies in yeast have shown that aneuploidy can also be beneficial for the adaptation to stressful conditions not previously experienced by the cell population (4). Highly aneuploid cancers are characterized by a large number of structural and numerical DNA copy-number changes altering expression of most genes located in these regions. Aneuploidy is frequently accompanied by chromosomal instability (CIN), defined as an increased rate of gains and losses of whole chromosomes or fractions of chromosomes (5). CIN generates intercellular chromosomal heterogeneity that may facilitate selection of genotypes tolerant to aneuploidy and strong selective pressures in the tumor, and accelerate the proliferation of aneuploid cancer cells (6, 7). The specific chromosomal changes and expression patterns that are implicated in this adaptive response remain unclear.
Several breast cancer gene expression signatures forecasting clinical outcome and response to chemotherapy are strongly associated with proliferation (8–11). It is still being debated whether the prognostic value of these diverse gene lists reflects similar underlying biologic processes and pathways (12). Intriguingly, there is increasing evidence that many breast cancer prognostic signatures are associated with CIN (13, 14). CIN is associated with inferior prognosis across multiple cancer types, including ER-positive (ER+) breast cancer (15). The CIN70 signature (16) and a 12-gene genomic instability signature (13), both derived from their associations with aneuploidy and CIN, have prognostic value in many cancer types and also correlate with proliferation markers. However, little is known about possible genotypes crucial for the CIN phenotype or the relationships between CIN and proliferation in vivo. Indeed, it has been proposed that CIN is a consequence, rather than a cause of the selective pressure for proliferative drive in tumors.
In this study, we sought to better characterize the relationships between CIN and proliferation through an analysis of the effects of copy-number alterations associated with CIN and the downstream consequences of such alterations upon prognostic signature gene expression.
Materials and Methods
SNP and expression data processing
Microarray expression data and SNP array based copy-number data of 264 patients with ER+ breast cancer with pathologist estimates of more than 60% tumor cell content or more than 60% tumor nuclei were selected from The Cancer Genome Atlas (TCGA). Agilent 244K Custom Gene Expression G4502A-07 two-color microarrays were normalized (print-tip-group loess normalization, Bioconductor package marray; ref. 17). Probes with missing values in more than 15% of the samples were excluded and duplicated probes were averaged. Gene mappings based on NCBI build 36.3 were downloaded from the TCGA data portal (18). For probes mapped to the same ENTREZ Gene ID, the probe with the highest Pearson correlation coefficient of gene expression with wGII score was chosen. For the remaining genes, missing gene expression data were imputed with nearest neighbor (k = 10) averaging implemented in the R package impute (19). Genes with standard deviation lower than 0.25 were removed. The expression values of all remaining genes were standardized.
SNP 6.0 samples were normalized with the Affymetrix Genotyping Console using standard settings. Samples not passing the quality check criteria were excluded. Integer copy numbers were estimated by the GAP algorithm (20).
RNA interference screening data
Whole-genome RNA interference (RNAi) screening data (21) analyzing cell numbers after gene silencing were available for five cell lines: PC9 (lung), RCC4 (kidney), HCT116 (colon), MCF-10A (breast), and HT1080 (fibrosarcoma). The PC9 and RCC4 screening data were normalized with respect to the plate median, and then smoothed using the well position in the following manner: normalized well value = (well value − plate median)/(plate median absolute deviation) and smoothed well value = (normalized well value − well position median)/(well position median absolute deviation). For HT1080, the data were logged and then normalized with respect to the plate median: normalized well value = log (well value − plate median)/(plate median absolute deviation). For MCF-10A, the data were normalized as follows: normalized well value = (well value − plate median)/(batch median absolute deviation). The median over the replicates (2-3) was taken as the normalized Z-score and the 82 genes that impair cell viability following siRNA transfection (Z-score < -2) in at least three of the five cell lines were defined as proliferation genes (see Supplementary Table S1).
Genome-wide shRNA data, including P values for impaired cell survival for the 11 ER+ cell lines BT-474, ZR-75-1, T47D, MCF7, MDA-MB-361, KPL-1, HCC1500, HCC1428, HCC1419, EFM-19, and CAMA-1, were downloaded from the COLT database (22). Genes inducing cell death (P < 0.05) in at least four cell lines were considered as proliferation genes in ER+ breast cancer.
Microarray expression analysis after UBE2C and TPX2 silencing in T47D cells
T47D cells were maintained in 5% CO2 at 37°C, in RPMI-1640 medium supplemented with 10% FBS, glutamine, and penicillin/streptomycin. All siRNA (Dharmacon, Thermo Scientific) experiments were performed at 40 nmol/L final concentrations by reverse transfection with Dharmafect2 reagent according to the manufacturer's instructions. Transfections were performed using TPX2 [L-010571-00-0005 and M-010571-00-0005 and UBE2C (L-004693-00-0005 and M-004693-03-0005] siRNA pools. Non-targeting control siRNA and scrambled control 2 were used. At 72 hours after transfection, RNA was extracted and knockdowns validated by quantitative PCR to be at least 85% to 90%. RNA was extracted and samples were hybridized to HG_U133 Plus 2.0 arrays. Expression calls were generated by the MAS5.0 algorithm (R-package simpleaffy; ref. 23). Gene expression values were subtracted from the gene expression values after silencing TPX2 and UBE2C, respectively, from the siRNA control experiment. This difference was multiplied by the sign of the Spearman correlation of the gene expression with TPX2 or UBE2C in the tumor samples. Negative values indicate that silencing of TPX2 or UBE2C results in downregulation of genes positively correlated and in upregulation of genes negatively correlated with TPX2 or UBE2C expression in tumors. Deviations from zero were tested with the Wilcoxon signed-rank test.
Weighted genome integrity index
The ploidy of a tumor sample was determined as the weighted median integer copy number, with weights equal to the lengths of the copy-number segments. For each sample and each of the 22 autosomal chromosomes, the percentage of gained and lost genomic material was calculated relative to the ploidy of the sample. The use of percentages eliminates the bias induced by differing chromosome sizes (24). The weighted genome integrity index (wGII) score of a sample is defined as the average of this percentage value over the 22 autosomal chromosomes.
Identification of CIN-associated core regulators and their coexpressed gene modules
We defined a core regulator for CIN as a gene or transcript driving the expression of a coregulated gene set (termed an expression module) by having aberrant expression and copy number in high-CIN tumors.
Step 1: Identification of genes whose expression is correlated with CIN.
From all genes passing the thresholds for SNP and gene expression data processing, the 500 genes with the highest positive correlation (Kendall Tau, τ) and the 500 genes with the lowest negative correlation between wGII score and gene expression were selected. All these genes had ∣τ∣ > 0.15 and a P < 0.05.
Step 2: Identification of genes passing step 1 and whose expression is correlated with CIN.
A modified GISTIC algorithm (25) for integer copy numbers was used to find genes significantly gained or lost (q value threshold of 25%) in high-CIN tumor samples [wGII > median(wGII)]. For each gene passing the filtering step 1, we computed a two-sample t statistics for expression differences between samples in which the gene was lost (copy number < ploidy) and samples in which the gene was not lost. Analogously, a two-sample t test for expression differences between samples in which the gene was gained (copy number > ploidy) and samples in which it was not gained was performed. Only genes passing step 1 and having a two-sided P < 0.05 for gains or losses, respectively, were used for further analysis.
Step 3: CONEXIC analysis to detect core regulators and their expression modules.
Genes passing all three criteria were taken as candidates in the Single Modulator Step of the CONEXIC algorithm (26). This algorithm splits the gene expression values of each candidate regulator across the different samples into two groups of low and high expression. For a given candidate regulator, the resulting expression threshold separating these two groups is then used to split all “target genes” (all genes including other candidate regulators) into two classes. CONEXIC uses a normal gamma score to compute the performance of all pair-wise splits of candidate regulators and “target genes” and uses permutation-based sampling to assign a P value (26). All genes with P < 0.001 are then assigned to the candidate regulator with the highest score. Nonparametric bootstrapping (100 bootstrap samples) is used to filter out spurious associations and all candidate regulators with more than 20 module genes in 90% of the bootstrap runs are used for a final run of the Single Modulator step, resulting in a list of potential core regulators. The top 30 genes with highest normal gamma score were selected as core regulators.
Association of key regulator copy number with signature expression
The correlation (Spearman) between copy number of all core regulators and expression of all signature genes was compared with the corresponding copy number—gene expression correlations of all signature genes and the percentage of significant (P < 0.05) correlations was computed. The connectedness of a gene is the mean over all these correlation coefficients. For Oncotype Dx, all loading control genes were removed before the analysis.
Enrichment of gene ontology terms
Several sets of genes were tested for an enrichment of gene ontology (GO) terms using version v3.0 (27, 28). A 2 × 2 contingency table was created by overlapping the ENTREZ gene identifiers of the gene lists with GO term-associated gene sets. GO term enrichment was tested by one-sided Fisher exact tests.
Somatic mutations
For the analysis of somatic gene mutations with next-generation DNA sequencing data, unvalidated (level 2) mutation annotation format (MAF) files were downloaded from the TCGA data portal (18). The MAF files contain information about mutation type, gene names, validation status, and the type of sequencing platform. Matching copy number and somatic mutation data were available for 254 ER+ breast tumors. To detect genes that show an association of the probability of a somatic mutation with increasing or decreasing degrees of wGII scores, a univariate logistic regression model was applied for each gene separately, and genes were ranked by P values.
Data access
Microarray expression data and SNP array data were selected from TCGA (18). Genome-wide shRNA data for ER+ cell lines were downloaded from the COLT database (22). The RNAi screening data for HCT116, PC9, RCC4, HT1080, and MCF-10A are available from the High-throughput screening database (21).
Results
Prognostic signature expression correlates with proliferation
The association of prognostic gene expression signatures with outcome may be explained in part by their correlation with measures of cellular proliferation (10, 29, 30). We investigated this relationship for the five prognostic gene signatures, CIN70, Oncotype Dx, MammaPrint, PAM50, and gene expression grade index (GGI; refs. 16, 29–32) in 264 ER+ breast tumor samples from TCGA (33). Oncotype Dx (16 genes + 5 genes for baseline normalization), MammaPrint (70 genes), PAM50 (50 genes), and GGI (97 genes) are all breast cancer–specific prognostic gene expression signatures. The CIN70 signature was derived from a measure of total functional aneuploidy, indicative of CIN, and is associated with prognosis in multiple tumor types, including breast cancer (16).
On the basis of a GO analysis, we find many genes in these five signatures to be involved in cell cycle and mitotic processes (Fig. 1A). Consistent with previous studies, the expression of most signature genes of all five of the prognostic signatures is highly correlated with the mRNA expression of the MKI67 proliferation marker (Supplementary Fig. S1A). MKI67 is a member of the three signatures, GGI, PAM50, and Oncotype DX. Prognostic signature gene expression is also significantly correlated with expression of a gene set that correlates with the key proliferation gene PCNA (proliferating cell nuclear antigen)—the meta-PCNA signature (10), see Supplementary Fig. S1B.
Association of breast cancer prognostic signatures with proliferation. A, enrichment of GO terms in the five breast cancer prognostic signatures. The rows are ordered by increasing sums of P value across the breast cancer signatures. B, overlap of breast cancer signature genes with genes essential (P < 0.05) in at least four of eleven whole-genome shRNA screens from the COLT database (22, 34). C, overlap of breast cancer signature genes with proliferation genes whose siRNAs reduced cell viability by a Z-score of < −2 in three of five cell lines with diverse cell type or tissue of origin.
Next, we checked for a functional role of the individual genes within the breast cancer prognostic signatures with cellular proliferation through an independent analysis of two whole-genome RNAi screening datasets (see Materials and Methods). Genes inducing cell death (P < 0.05) in at least four out of 11 ER+ breast cancer cell lines (22, 34) were defined as proliferation genes in breast cancer (721 genes). A second set of 82 proliferation genes was derived from in-house RNAi screens performed in a panel of 5 human cell lines: HCT116 (colon carcinoma), PC9 (lung cancer), RCC4 (renal cell carcinoma), HT1080 (fibrosarcoma), and MCF-10A (mammary epithelial cells). For both proliferation gene sets, we found a significant (P < 0.05) overlap with the CIN70, GGI, and PAM50 signatures (Fig 1B and C). The relatively small overlap with Oncotype DX signature genes might be attributable to the small number of genes in Oncotype DX. The exception is MammaPrint, in which no significant overlap with any proliferation gene set was observed. This might be related to the fact that the 70 signature genes in MammaPrint were obtained from genes differentially expressed in metastatic versus non metastatic tumors.
These results confirm the functional role in cellular viability and proliferation for many of the genes incorporated in prognostic signatures, independent of cell type or endocrine responsive status (Fig. 1B and C).
Increased chromosomal complexity is associated with increased expression of proliferative gene expression markers
Prognostic signatures have been suggested to reflect CIN and proliferation (13, 14). We explored this relationship in the cohort of 264 ER+ breast tumor samples. To assess the CIN status of each tumor, we used a SNP array based surrogate score, wGII (24, 35). In comparison with the original GII score (35), the wGII score avoids the bias caused by gains and losses of large chromosomes relative to small chromosomes. The wGII score takes values between zero and one and is positively correlated with all genes in the CIN70 signature in the 264 tumor samples (Fig. 2A). This consistency supports the use of wGII to classify CIN status in breast tumors (36). In addition, the majority of genes in the four breast cancer–specific gene signatures (Oncotype Dx, MammaPrint, PAM50, and GGI) are also significantly correlated with wGII (Fig. 2A). Together with the broad spectrum of wGII scores observed in these 264 tumors (Fig. 2B), these associations suggest that CIN status might define patient subgroups with differential outcome (14, 16, 37, 38).
Association of wGII score with proliferation. A, correlation of wGII score with prognostic signature gene expression. Each dot shows the correlation coefficient of wGII score with one gene of the particular signature (red, P < 0.05; and black, P ≥ 0.05). B, distribution of wGII scores in the 264 ER+ breast cancer samples. C, the wGII score versus MKI67 expression in each breast cancer sample. D, correlation of wGII score with expression of meta-PCNA genes. Each point corresponds to one gene in the meta-PCNA gene set.
We also examined the relationship between CIN and markers of proliferation in this cohort. We found a highly significant correlation between wGII score and both MKI67 expression (Fig. 2C, P = 3 × 10-7), and the majority of genes in the meta-PCNA gene set (Fig. 2D, 78% with P < 0.05).
Identification of core regulators encoded within regions of somatic copy-number alteration
Increased CIN requires reconfigurations in cancer cell gene expression programs. On the basis of the association of CIN and proliferation, we hypothesized that part of this CIN expression program may be regulated by a small number of “core regulators.” Specifically, we hypothesized that expression signatures functionally implicated with proliferation (see Glossary in Fig. 3) might be “hard wired” in the CIN genome through the selection of recurrent copy-number changes encoding such core regulators.
Identification of core regulators and modules. A, core regulators of CIN are genes with altered copy number and expression in high CIN tumors that regulate a gene expression module. B, first step, candidate regulators are identified by filtering copy number–aberrant genes whose expression (GE) is associated with both their copy number (CN) and with wGII score. Second step, each core regulator is linked to expression modules that are regulated by the core regulator. The CONEXIC algorithm (26) is applied to rank these core regulator–module pairs. Third step, core regulators and their associated modules are then compared with known breast cancer prognostic signatures. Copy number data are displayed in blue (loss), white (no change), and red (gain) and gene expression data are displayed in red (low), black (average), and green (high).
To identify possible core regulators (Fig. 3), we first selected a set of candidate regulator genes whose expression and copy number are associated with increasing wGII score and whose somatic copy-number gain or loss results in altered expression (see Materials and Methods). We found 180 candidate regulators located in regions of frequent genomic loss and 267 candidate regulators located in regions of frequent genomic gain in wGII high tumors (Supplementary Table S3).
In a second step, we searched for sets of coexpressed genes (collectively termed a “module”) belonging to each individual candidate regulator. During this search, the set of candidate regulators was reduced to the set of core regulators by eliminating candidates for which a sufficiently strong module could not be found. To find these regulator–module pairs, we used the first step of the CONEXIC algorithm (26), which was originally developed to identify general drivers of cancer, not those specifically associated with CIN. In this approach, the pairing of a candidate regulator with a given individual module gene occurs only if the expression of the regulator best explains the expression of its module genes and if the expression of no module gene is better explained by any other candidate regulator. Each regulator and paired module of genes is assigned a normal gamma score (26), which measures the strength of the cohesion between the core regulator and its module. We selected the 30 regulator–module pairs with the highest normal gamma score (for both gained and lost chromosome regions) and defined these 30 regulators as core regulators (see Table 1 and Supplementary Fig. S2).
Top 30 core regulators (CONEXIC) ordered by decreasing CONEXIC scores
Overlap of the meta-PCNA signature with core regulator modules
Amplified core regulators and their gene modules are enriched for proliferation GO terms
A GO analysis revealed a significant association of the 30 core regulators with mitotic and cell cycle–related processes (Supplementary Table S4). We also tested the individual modules of each core regulator separately for an enrichment of biologic processes in GO and ranked them according to their P values (see Supplementary Table S5 for the top 50 GO terms). The modules of the amplified core regulators, PAQR4, TPX2, UBE2C, PTTG3P, PKMYT1, were among the top 50 associations and were enriched for genes involved in cell cycle and mitotic processes.
Only seven of the 30 core regulators are located in regions of recurrent genomic loss (Table 1). With exception of THSD1 and TPT1 (also known as TCTP; ref. 39), the core regulators in regions of recurrent genomic loss encode components of the ribosome. Four of the five ribosomal core regulators are pseudogenes, which might indicate a regulatory function of these transcripts, and only RPL17 encodes a protein product. Accordingly, the modules belonging to the lost core regulator pseudogenes, RPS13P2, LOC253482, and RPL12P14, exhibit highly ranked associations with structural components of the ribosome (Supplementary Table S5). The lost core regulator TPT1 was recently shown to interact with p53 and to be important for DNA damage sensing and repair (40).
Expression of a large fraction of genes in the meta-PCNA proliferation signature is significantly correlated with expression of the core regulators (Supplementary Fig. S3). In addition, eight of 30 core regulator modules overlapped significantly (P < 0.05) with the meta-PCNA signature (Table 2). Three amplified core regulators, TPX2, UBE2C and AURKA, are themselves members of the meta-PCNA signature (Fisher exact test P = 0.00097). The TPX2 module contained 20 genes from the meta-PCNA signature (P = 1.4 × 10−23), and TPX2 was also classified as a proliferation gene in our meta-analysis of genome RNAi screens (see Supplementary Table S1).
These strong associations suggest that copy-number amplification of certain core regulators, including TPX2 and UBE2C, might regulate proliferation in high CIN tumors by regulating CIN-specific gene expression modules functionally implicated in proliferation. Of note, we found a strong and significant association of p53 somatic mutations with wGII (Supplementary Fig. S4), reconfirming published associations of p53 with CIN (7, 41).
Copy number of UBE2C and TPX2 is highly associated with prognostic signature gene expression
The modules of seven core regulators (TPX2, UBE2C, EXO1, PAQR4, PTTG3P, MYBL2, and UBE2T) significantly (Fisher exact test P < 0.05) overlap with the CIN70 signature (Fig 4A and B). More surprisingly, the expression modules of the seven amplified core regulators, TPX2, UBE2C, EXO1, PAQR4, UBE2T, MYBL2, and PTTG3P have a significant overlap (Fisher exact test P < 0.05) with at least one of the four breast cancer–specific prognostic gene signatures, GGI, PAM50, MammaPrint, or Oncotype Dx (Fig. 4A and B). The TPX2 and UBE2C expression modules were overrepresented in three breast cancer–specific prognostic signatures (Fig. 4B) and in CIN70 with P values ranging from 0.0075 to 1.1 × 10−41 for TPX2 and from 1.5 × 10−4 to 3.7 × 10−24 for UBE2C. The modules of EXO1 and PAQR4 were overrepresented in GGI and MammaPrint. In addition, eight amplified core regulators (TPX2, AURKA, UBE2C, EXO1, MYBL2, TPT1, KIF14, and UBE2T) are members of one or more prognostic signatures, and four out of five of the signatures contain at least one amplified core regulator (indicated as crosses in Fig. 4B; see also Supplementary Table S6). We did not find any such associations for core regulators or their modules located in regions of genomic loss.
Association of core regulators with breast cancer prognostic signatures. A, the overlap of breast cancer signatures (columns) with core regulator modules (rows). The rows in each column are ordered by decreasing Fisher exact test P values and only the top five significant core regulator–associated modules are displayed per breast cancer signature. B, core regulators (rows) with modules enriched (P < 0.05) for at least one breast cancer prognostic signature (red, significant enrichment; +, the core regulator is a member of the signature). C, percentage of significant correlations between copy number of signature genes with gene expression of each other signature gene (circles). The colored symbols indicate the percentage of significant correlations between the copy number of the core regulators UBE2C (red squares), TPX2 (blue circles), PAQR4 (magenta triangles), and EXO1 (green diamonds) with the expression of all genes in the respective gene signature.
We next asked how the copy number of any amplified core regulator is related to the expression of the prognostic cancer signature genes (Fig. 3A). We restricted this analysis to TPX2, UBE2C, EXO1, and PAQR4, the only four core regulators whose modules were significantly overrepresented in at least two breast cancer–specific prognostic signatures and in CIN70 (Fig. 4B). For a given signature, we computed the correlation coefficient between the copy number of each signature gene with the expression of all other signature genes. Similarly, the copy number of the four core regulators TPX2, UBE2C, EXO1, and PAQR4 was correlated with the expression of each signature gene. By comparing the percentage of significant correlations (Fig. 4C), we found the core regulators UBE2C and TPX2 to be highly ranked in the six signatures, CIN70, GGI, PAM50, PAM50, MammaPrint, and Oncotype Dx, with UBE2C being found among the top five genes in all of them (red squares in Fig. 4C). In particular, UBE2C copy number is significantly correlated with the expression of more than 80% of the genes in the GGI signature and ranks second in Oncotype DX, although it is not a member of that gene expression signature. The copy number of the core regulators PAQR4 and EXO1 had only moderate correlations with signature gene expression.
As an alternative summary measure of association between copy number and expression, we computed the average connectedness (27) of a gene, which is given by its average absolute correlation coefficient between its copy number and expression taken over all genes in all signatures. We found UBE2C and TPX2 copy numbers to be strongly connected with the expression of the signature genes (Supplementary Fig. S5). In summary, copy-number amplifications of the core regulators UBE2C and TPX2 are strongly associated with prognostic breast cancer signature expression in ER+ CIN tumors.
Effects of silencing UBE2C or TPX2 by RNAi in T47D cells
We tested the effects of silencing UBE2C and TPX2 upon gene expression in the ER+ T47D cell line (Fig. 5). T47D cells have increased UBE2C and TPX2 expression and copy number. For each gene in the respective module or signature, we computed the expression difference of the RNAi non-targeting control experiment with the expression value in cells silenced for the core regulator TPX2 or UBE2C. This difference was then multiplied by the sign of the correlation coefficient between the core regulator and the respective gene in the tumor data. Thus, a negative sign indicates expression changes in T47D concordant with the changes predicted from the tumor data, whereas positive values correspond to discordant changes.
Silencing TPX2 or UBE2C in the T47D cell line. Silencing of TPX2 (A and B) and UBE2C (C and D) using two different siRNA agents (siGenome and ON-Targetplus) and effect on the expression of the respective core regulator modules and on gene expression signatures. The boxplots display signed differences in expression between siRNA and control samples for the genes in the respective core regulator module or gene signature. Negative values of these signed differences correspond to expression changes concordant with the corresponding associations in the tumor data and positive values to discordant results (see text for details). The overall P value indicates the overall significance of the signed expression differences.
The majority of the gene expression changes of core regulator modules and gene signatures is concordant with the expected changes in the tumor data (Fig. 5). The only changes discordant appear for CIN70, GGI, and the meta-PCNA proliferation signature in response to UBE2C silencing with siGENOME (Fig. 5C). The P values for the signed expression changes of Oncotype DX are relatively large—a finding that is not unexpected given that there are only 16 genes in this signature. Taken together, the expression changes for the signatures and modules follow the same pattern (Fig. 5), as observed for the tumor data. These data support the effect of TPX2 and UBE2C core regulator expression on their modules and the relation with prognostic and proliferation signatures.
Discussion
In this study, we worked from the hypothesis that CIN tumor cells have evolved a specific gene expression program conferring a selective advantage (41, 42). We focused on the detection of specific copy-number aberrations that are both strongly associated with CIN (as assessed by wGII) and with the altered expression activity of characteristic expression modules (Fig. 3B). For ER+ breast tumors, we provide evidence that part of the CIN expression program is indeed hard-wired in the CIN genome by specific copy-number aberrations of core regulators such as TPX2 and UBE2C. In particular, the amplified core regulators TPX2 and UBE2C and their respective modules are associated with both proliferation markers and cell cycle- or mitosis-related processes.
Although the association of TPX2 and UBE2C copy number gain and CIN has been previously reported (43, 44), we show that the copy number of these core regulators is also strongly linked to the expression of prognostic signature gene sets for ER+ breast cancer, which themselves are associated with proliferation and CIN. These results are supported by the changes in the expression patterns of the breast cancer signatures following siRNA silencing of UBE2C and TPX2.
Although most core regulators were amplified, we also found seven core regulators in regions of genomic loss. The lost core regulator TPT1 encodes a multifunctional protein involved in the regulation of cell death and proliferation, as well as DNA damage sensing and repair, and is part of a reciprocal feedback loop with p53 enabling tolerance of ongoing DNA damage and repair (39). Five of the seven lost core regulators are related to the ribosome, which might reflect evolutionary adaptation to the increased transcriptional and translational load of CIN cells, due to increased ploidy, or other extra-ribosomal functions such as DNA repair, apoptosis and cellular homeostasis (45). The fact that four of the five ribosomal core regulators are pseudogenes might also hint to a potential regulatory function, possibly similar to the role of the PTENP1 pseudogene as a growth suppressor (46).
A direct measurement of both CIN and proliferation, in conjunction with copy number and gene expression data acquisition, is currently not feasible. Increased proliferation was only indirectly assessed by correlation with proliferation markers. We cannot exclude potential technical bias caused by comparing data from different expression microarray platforms. The wGII surrogate score is also not a direct measure for CIN status, but its predictive capability is well tested in various data sets (35, 36). By design, we might have missed regulators of gene expression in CIN cells that are not located in regions of aberrant copy number. A similar analysis using alternative aberrations as filters, such as methylation patterns or frequent somatic mutations, is a direction for future research.
The relationship between aneuploidy, CIN, and proliferation remains a complex research question (42, 47). Our results cannot explain the development of CIN and tumor heterogeneity. However, one emerging picture is that a certain level of CIN provides a way to sample many aneuploid karyotypes and, by means of Darwinian selection, to increase the chance of acquiring genomic configurations associated with higher fitness. On the basis of results presented here, we suggest that the identification of core regulators driving gene expression in CIN tumors contributes to the understanding of these fitness states, which might facilitate the identification of therapeutic strategies to attenuate CIN.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Disclaimer
The results published here are in part based upon data generated by TCGA pilot project established by the NCI and NHGRI. Information about TCGA and the investigators and institutions that constitute TCGA Network can be found at http://cancergenome.nih.gov/. The data were retrieved through dbGaP authorization (accession numbers phs000178.v4.p4 and phs000178.v5.p5).
Authors' Contributions
Conception and design: D. Endesfelder, R. Burrell, N. McGranahan, C. Swanton, M. Kschischo
Development of methodology: D. Endesfelder, C. Swanton, M. Kschischo
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M. Howell, P.J. Parker, C. Swanton
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D. Endesfelder, R. Burrell, N. Kanu, N. McGranahan, C. Swanton, M. Kschischo
Writing, review, and/or revision of the manuscript: D. Endesfelder, R. Burrell, P.J. Parker, J. Downward, C. Swanton, M. Kschischo
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): R. Burrell, C. Swanton, M. Kschischo
Study supervision: C. Swanton, M. Kschischo
Grant Support
C. Swanton, P.J. Parker, J. Downward, R. Burrell, and D. Endesfelder were funded by Cancer Research UK. C. Swanton receives funding from the EU Framework Program 7, The Prostate Cancer Foundation, and the Breast Cancer Research Foundation, and is supported by researchers at the National Institute for Health Research University College London Hospitals Biomedical Research Centre. N. Kanu is funded by the Breast Cancer Research Foundation.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact
Footnotes
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received September 18, 2013.
- Revision received February 20, 2014.
- Accepted May 28, 2014.
- ©2014 American Association for Cancer Research.