Systems Biology and Emerging Technologies
1 GeneGo, Inc., St. Joseph, Michigan; 2 Vavilov Institute for General Genetics, Moscow, Russia; 3 Department of Medical Oncology, 4 Center for Cancer Genome Discovery, Dana-Farber Cancer Institute 5 Harvard Medical School, Boston, Massachusetts; and 6 Harvard University and 7 Broad Institute, Cambridge, Massachusetts
Requests for reprints: Kornelia Polyak, Dana-Farber Cancer Institute, 44 Binney Street D740C, Boston, MA 02115. Phone: 617-632-2106; Fax: 617-582-8490; E-mail: Kornelia_Polyak{at}dfci.harvard.edu.
| Abstract |
|---|
| Introduction |
|---|
Cancer genomes have been screened for somatic mutations and genetically altered pathways in several large-scale studies using unbiased genome-wide technologies (3, 10, 11). Recently, 1,142 genes with somatic alterations were identified by exon resequencing of over 18,000 human genes in 11 breast cancer samples (12). One hundred forty of these mutated genes were defined as "CAN" (candidate cancer) driver genes. A study focused on protein kinases identified 79 genes with somatic mutations in breast cancer (3). Both studies concluded that the cancer genome is extremely complex, that a large number of genes are mutated in each tumor, and that very few genes are mutated in a high fraction of tumors. This high degree of complexity raised new challenges for data interpretation and for identifying key genetically altered pathways amenable to therapeutic targeting.
To aid the functional interpretation of complex data sets, several systems analysis methods have been developed and applied to breast cancer, including gene set ontology enrichment, interactome, network and pathway analysis, and network modeling (11, 13–15). We have previously used the MetaCore data analysis platform to elucidate pathways specifically activated in breast cancer stem cells (16), to identify pathways differentiating normal mammary luminal epithelial and myoepithelial cells (17), and to compare DNA methylation and gene expression profiles of four distinct normal mammary epithelial cell types (18).
Here, we describe a large-scale study to characterize copy number alterations in primary breast carcinomas and to define functional interactions among genetically altered genes in breast cancer. Using high-density single nucleotide polymorphism (SNP) arrays, we analyzed 191 breast cancer samples and identified 1,747 genes with copy number gain (called the breast cancer amplicome). We compared these amplified genes to genes mutated in breast cancer (called the breast cancer mutome) and identified interactions and networks reflecting potential cooperation among genes genetically altered in breast tumors.
| Materials and Methods |
|---|
Breast tumor samples. Breast cancer cell lines were obtained from American Type Culture Collection or were generous gifts of Drs. Steve Ethier (University of Michigan, Ann Arbor, MI), Fred Miller (Karmanos Cancer Institute, Detroit, MI), and Arthur Pardee (Dana-Farber Cancer Institute, Boston, MA). All cell lines were grown in the medium recommended by the provider. Fresh or frozen tumor specimens were obtained from the Brigham and Women's Hospital, Beth-Israel Deaconess Medical Center, Massachusetts General Hospital, University of Zagreb (Croatia), Duke University, and National Disease Research Interchange. All human tissue was collected using protocols approved by the Institutional Review Boards. Only immunomagnetic bead purified or microdissected tumor samples were used for the analysis. Tissue was snap frozen on dry ice and stored at –80°C until use or was processed for purification as described previously (16).
SNP array analysis. SNP array analysis was performed by the Dana-Farber Microarray Core and by the Broad Institute using Affymetrix 250K StyI SNP arrays as described (16). The raw .CEL files were normalized, and the copy number of each SNP was generated using the GenePattern software (footnote 8). The procedure was optimized to exclude normal references that were contaminated with tumor tissue. Normal references (matched or virtual) were used as controls to exclude copy number polymorphisms. Raw SNP copy numbers were then smoothed by a segmentation algorithm using the DNAcopy package (footnote 9). To facilitate detailed analysis, the copy number of each gene was estimated by averaging copy numbers from all SNPs found within the gene structure and flanking 100-kbp regions.
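The gene-level averaging step can be illustrated with a minimal Python sketch. It assumes SNPs are given as (position, smoothed copy number) pairs on a single chromosome; the function name `gene_copy_number`, the coordinates, and the copy number values are hypothetical and are not taken from the study's pipeline.

```python
from statistics import mean

FLANK_BP = 100_000  # 100-kbp flank on each side of the gene, as stated in the text


def gene_copy_number(gene_start, gene_end, snps, flank=FLANK_BP):
    """Average the copy number of all SNPs inside the gene or its flanking regions.

    snps: iterable of (position_bp, copy_number) pairs on the same chromosome.
    Returns None when no SNP falls inside the window.
    """
    lo, hi = gene_start - flank, gene_end + flank
    values = [cn for pos, cn in snps if lo <= pos <= hi]
    return mean(values) if values else None


# Hypothetical example: a 50-kbp gene and five SNPs with segmented copy numbers.
example_snps = [
    (850_000, 2.0),    # outside the window
    (950_000, 4.0),    # in the upstream flank
    (1_020_000, 4.0),  # inside the gene
    (1_150_000, 2.0),  # on the edge of the downstream flank
    (1_400_000, 2.0),  # outside the window
]
example_gene_cn = gene_copy_number(1_000_000, 1_050_000, example_snps)
print(example_gene_cn)
```

Returning None for genes without any SNP in the window is a design choice of this sketch; the text does not say how such genes were handled.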
| Results and Discussion |
|---|
Amplicons are highly variable in encoded gene function. We analyzed the relative enrichment of individual amplicons for specific protein functions including transcription factors (TF), receptors, secreted proteins, kinases, phosphatases, proteases, and metabolic enzymes. Amplicons varied greatly in the composition of encoded protein functions (Supplementary Excel File S2). For example, amplicon 7p15 was entirely composed of TFs encompassing the HOXA homeobox gene cluster. Amplicon 20p13 showed 43% enrichment in receptors, whereas amplicons 1q32, 17q21-q25, and 22q13 were highly enriched in kinases. Amplicons 1q32 and 12q13-q21 had a high fraction of ligands, whereas effector proteins, including proteases and metabolic enzymes, were concentrated on amplicons 16p13 and 12q24, respectively. This enrichment could potentially be due to the evolution of gene families through gene duplications, which frequently results in physically linked genes encoding the same function (21). The HOXA and HOXB gene clusters in amplicons 7p15 and 17q21-q25, respectively, which encode TFs, and amplicon 17q11-q21, which is enriched for chemokines and keratins, are potential examples of this. However, other amplicons contained genes with unrelated sequences that encode similar functions, including the 20q12-q13 amplicon enriched for TFs.
Genes within individual amplicons do not form self-sustainable functional entities. Coamplified genes located in one amplicon were thought to form functional interactions. The best example of this is the coamplification of ERBB2 and GRB7 in the 17q12 amplicon (1). However, in our data set, genes within individual amplicons generally did not seem to form concise functional entities such as pathways, cellular processes, and networks. We conducted enrichment analysis of genes in all 30 amplicons in 5 functional ontologies (GO processes, GO molecular functions, canonical pathway maps, cellular process networks, and disease biomarkers). Genes within individual amplicons were enriched in cellular processes and pathways relevant to tumorigenesis including cell cycle, cell adhesion, DNA repair, development, immune response, and cytoskeleton (Supplementary Excel File S3). However, in individual amplicons, cellular pathways were sparsely populated with amplified genes (only one or two genes per pathway) and the P values of hypergeometric distribution for individual amplicons were high in comparison with the complete amplicome, suggesting that none of the processes and pathways were encoded by a single amplicon. All large amplicons encoded multiple pathways and processes without significant dominance of individual functional entities. Thus, amplicons did not show distinct specialization for pathways and processes. However, individual amplicons were highly synergistic in encoding genes with functional roles in tumorigenesis, as evident from the enrichment analysis of the entire amplicome. This synergy was clearly visible in some tumorigenic pathway maps assembled from genes located in different amplicons, including the IGF-RI pathway map composed of genes derived from 11 amplicons (Fig. 2).
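To make the hypergeometric argument concrete, the following Python sketch computes an upper-tail enrichment P value using only the standard library. It is an illustration under stated assumptions, not the MetaCore implementation: the function name `hypergeom_pvalue` and all gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric P value, P(X >= k).

    N: genes in the background, K: background genes annotated to the pathway,
    n: genes in the tested set, k: tested genes annotated to the pathway.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: a pathway of 50 genes in a background of 1,000 genes.
# A single amplicon contributes 2 pathway genes out of 40 genes, whereas the
# combined set contributes 35 pathway genes out of 400 genes.
p_single_amplicon = hypergeom_pvalue(2, 40, 50, 1000)
p_combined_set = hypergeom_pvalue(35, 400, 50, 1000)
print(p_single_amplicon, p_combined_set)
```

With one or two pathway genes in a small set, the observed count is close to what chance predicts, so the P value stays high. The same pathway can reach a low P value once hits from many amplicons are pooled.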
The density of interactions within individual amplicons was lower than that in the entire human interactome. Network topology analysis evaluates the connectivity among genes within a set measured as the average number of interactions per network object (degree) divided into incoming interactions (degree IN) and outgoing interactions (degree OUT; ref. 23). The global human interactome of 16,000 proteins with >200,000 interactions (MetaCore database) was used as a control for evaluating the relative connectivity within amplicons. Only 6 of 30 amplicons (7p15, 8q12-q22, 8q23-q24, 17q11-q21, 17q21-q25, and 20q12-q13) displayed any appreciable degree of intraamplicon connectivity. Moreover, connectivity within these amplicons was significantly lower than that in the global human interactome (Supplementary Excel File S4). The largest hubs within individual amplicons were GRB2, c-MYC, 14-3-3β/, PKC-, MAP3K3, NDPKB, and CAAT/enhancer binding protein (C/EBP)-β (Supplementary Fig. S3). Conversely, some amplicons had a significantly higher number of interactions with genes outside of the amplicon than the global interactome had (Supplementary Excel File S4). Interestingly, half of the amplicons were enriched with degrees IN, but only four (8q23-q24, 15q26, 17p11, and 20q12-q13) were enriched with degrees OUT. This indicates that amplicons in general are "regulated" rather than "regulating" data sets, and they were enriched for "effector" rather than signaling pathways and processes.
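The degree IN and degree OUT measures can be sketched in a few lines of Python. The example assumes the interactome is available as a list of directed (source, target) pairs; the function name `average_degrees` and the toy gene names are hypothetical, and the sketch does not reproduce the statistical comparison against the global interactome.

```python
def average_degrees(edges, gene_set):
    """Return (average degree IN, average degree OUT) for the genes in gene_set.

    edges: iterable of directed (source, target) interactions from the whole network.
    Incoming edges are counted by target, outgoing edges by source.
    """
    genes = set(gene_set)
    edges = list(edges)
    n_in = sum(1 for _, target in edges if target in genes)
    n_out = sum(1 for source, _ in edges if source in genes)
    return n_in / len(genes), n_out / len(genes)


# Toy network: two outside regulators (TF1, TF2) act on a two-gene set (A, B).
toy_edges = [
    ("TF1", "A"),
    ("TF1", "B"),
    ("TF2", "A"),
    ("A", "B"),
    ("B", "X"),
]
toy_in, toy_out = average_degrees(toy_edges, {"A", "B"})
print(toy_in, toy_out)
```

A set whose average degree IN exceeds its average degree OUT, as in this toy case, corresponds to what the text calls a "regulated" data set.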
Network analysis of the breast cancer amplicome. To further evaluate functional interactions among genes within amplicons, we analyzed signaling and metabolic networks among 1,360 amplified genes of known function using Analyze Networks (AN) algorithms. This series of algorithms reveals and visualizes the most relevant multistep subnetworks of a selected size. The AN Transcription factors (ANTF) algorithm (24) connects all genes via directed shortest paths and prioritizes subnetworks based on the following variables: (a) relative enrichment of the network with the seed genes (in this case, gained genes); (b) relative enrichment of the network with canonical pathways; and (c) relative completeness of noncanonical pathways from ligands to TF targets.
One of the top-scored ANTF networks of the breast cancer amplicome included the JUN oncogene as a central hub with 32 targets (Supplementary Fig. S4). Although JUN itself was not amplified, it is regulated by the amplified MEF2 TF via calmodulin signaling, and its targets include amplified ligands (IL24, IFN-, galanin, and endostatin) and receptors (HER2, GRP30, and SSTR2). We found 30 additional highly scored ANTF networks, many of them including tumor suppressors (TP53 and SMAD3) and oncogenes (C/EBPB and STAT3) encoding TFs as central hubs, highlighting a key role for transcriptional regulation in tumorigenesis (Supplementary Table S2).
Next, we applied the AN Receptors (ANR) algorithm, which adds selectivity of the entire pathway from the most likely ligand to one membrane receptor as an additional prioritization criterion (24). The highest scored ANR subnetwork was centered around the ESR1 nuclear hormone receptor (Supplementary Fig. S5; Fig. 3). ESR1 was not amplified in our data set, although it was previously reported to have copy number gain in a subset of breast tumors (25), but it had significantly more direct targets among amplified genes than expected by chance, highlighting its importance in breast cancer. The algorithm selected the amplified IGF1R as the most likely receptor, connected with ESR1 via an SP1 signaling path. Other important nodes of the ESR1 network included BRCA1 (downstream of ATM and upstream of ESR1) and amplified MYC (downstream of BRCA1 and CDK4; Fig. 3).
Among 17 interaction mechanisms tested, interactions between amplicons were enriched in transcription regulation (TR) links, defined as physical binding of TFs to the transcription regulatory regions of their target genes (Supplementary Excel File S5). Out of 65 amplicon pairs featuring TR interactions, 49 were overconnected and only 16 were underconnected. Two amplicons were particularly enriched with outgoing TR interactions: 8q23-q24 and 20q12-q13, which transcriptionally regulated 18 (102 TR links) and 14 (52 TR links) amplicons, respectively. The most prominent TF in amplicon 8q23-q24 was c-Myc, as it had the highest number of outgoing TR links. In contrast, in amplicon 20q12-q13, several TFs (C/EBP-β, ZNF-217, and TFAP2C) were equally prominent. These data were consistent with the above-described functional analysis demonstrating that amplicon 20q12-q13 was enriched in TFs (Supplementary Excel File S2), whereas amplicon 8q23-q24 had the highest number of outgoing interactions (Supplementary Excel File S4).
Comparative analysis of breast cancer amplicome and mutome. Next, we investigated functional interactions among genes genetically altered due to amplification (amplicome) or somatic mutations (mutome) in breast cancer. The mutome gene set was defined as a nonredundant union of 1,188 somatically mutated genes from two large-scale genome sequencing studies (3, 12). Nine hundred sixty-five of these 1,188 genes had interactions in the MetaCore database (Supplementary Excel File S6). We also analyzed the set of 140 CAN (candidate cancer) genes that were likely to play a causal role in tumorigenesis (11).
In contrast to amplified genes, mutated genes were randomly distributed in the genome (Supplementary Excel File S6). Only 94 genes were part of both the mutome and the amplicome, a smaller overlap than expected (P = 0.048), and only 12 of the CAN genes were amplified (P = 0.456; Supplementary Excel File S7; Table 1). Enrichment analysis of the 94 mutated and gained genes did not show significantly enriched pathways or cellular processes (data not shown). One possible reason for the small overlap between the two data sets could be that the mutational analysis was conducted on a different and much smaller set of tumors than the one we used for copy number analysis. Alternatively, some functional gene categories may be preferentially affected by copy number gain versus mutation.
, the direct target of ERM. The second-scored network, enriched with DNA exchange genes, linked the mutated ATM, ATR, BRCA1, and BRCA2 genes with the amplified RAD9, BRIP1, and EMSY (Supplementary Fig. S8B). Interestingly, mutated genes were always upstream of the amplified ones on these networks. In total, 23 mutated TFs were overconnected with amplicome genes, with the highest overconnection scores observed for the TFs ATF2 and NFYC (Supplementary Table S4). Seven of the CAN TFs had direct targets among amplified genes: TP53 (52 targets), HNF1A (25 targets), GLI1 (4 targets), BRCA1 (3 targets), PLU1 (2 targets), RFX2, and TAF1 (1 target each; Supplementary Excel File S9). In addition to TFs, several other CAN genes of different function had a large number of downstream interactions with amplicome genes, including NOTCH1 (11 interactions), HDAC4 (5 interactions), and FLNA (4 interactions). Among amplified genes, the gene encoding cytoskeletal actin had the highest number (5) of upstream links with mutated genes (DBN1, FLNB, GSN, TLN1, and TARA); MYC and IGF1R each had four links (BRCA1, HDAC4, NOTCH1, and TP53 for MYC; BRCA1, HNF1A, NOTCH1, and TP53 for IGF1R); and Cyclin D1 had three links (BRCA1, GLI1, and HDAC4). Based on these data, BRCA1 and HDAC4 were the two mutated genes with the highest number of interactions with amplified genes.
Connectivity of mutome and amplicome proteins within sets and with the global proteome. Overall, both the mutome and the amplicome showed somewhat higher connectivity than the global human interactome. The degree was 11.75 and 9.579 for the mutome and the amplicome, respectively, compared with 9.133 for the entire human interactome (Supplementary Excel File S10). This may indicate that the mutome is slightly enriched in hubs (highly connected proteins). Interestingly, the difference between the connectivity of the mutome and the amplicome was higher for outgoing links than for incoming links. On average, mutome and amplicome genes had 11.44 and 7.978 OUT interactions, respectively, compared with 8.56 in the global interactome. This phenomenon was particularly striking in the case of the CAN genes, which featured 22.1 OUT interactions (2.6-fold more than expected). These data were consistent with the observations above indicating that the mutome is generally a "regulating" data set. Intraconnectivity within data sets was comparable between mutome and amplicome (2.942 for mutome and 2.548 for amplicome; Supplementary Excel File S10).
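The fold differences quoted above follow directly from the reported averages. The short Python sketch below recomputes them from the OUT-degree values given in the text; the helper name `fold_over_global` is hypothetical and the calculation is plain division, not a significance test.

```python
GLOBAL_OUT = 8.56  # average OUT interactions in the global interactome (from the text)

# Average OUT interactions per gene set, as reported in the text.
reported_out = {
    "mutome": 11.44,
    "amplicome": 7.978,
    "CAN genes": 22.1,
}


def fold_over_global(value, baseline=GLOBAL_OUT):
    """Ratio of a set's average OUT degree to the global interactome average."""
    return value / baseline


fold_changes = {name: fold_over_global(value) for name, value in reported_out.items()}
for name, fold in fold_changes.items():
    print(f"{name}: {fold:.2f}-fold relative to the global interactome")
```

Read this way, the CAN genes and the mutome lie above the global baseline for outgoing links, whereas the amplicome lies slightly below it.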
Subsequently, we evaluated the relative distribution of genes encoding for proteins with different molecular function within the mutome and the amplicome (Supplementary Excel File S11). The mutome had fewer secreted ligands than expected but was enriched in kinases (2-fold over expected) and receptors. The amplicome contained slightly fewer receptors than expected. The fraction of TFs was similar in both data sets.
Next, we evaluated intraconnectivity and interconnectivity of individual proteins in the mutome and the amplicome, specifically searching for statistically significantly overconnected and underconnected proteins. We separately analyzed interconnections between each set (mutome and amplicome) and the global proteome (Supplementary Excel File S12) and intraconnections within the mutome and amplicome (Supplementary Excel File S13). The largest overconnected hubs in the case of interconnections were proteins outside the sets (i.e., proteins encoded by neither gained nor mutated genes). ESR1 was the most important transcription hub (overconnected TF) for the amplicome. ESR1 had 102 direct targets among amplified genes (128 interactions total), which was statistically significantly (P < 0.00003) higher than the expected 73, a 63% enrichment (Supplementary Excel File S12; Fig. 3). In contrast, ESR1 was statistically significantly (P = 0.03) underconnected with the mutome (41 TR links with mutated genes compared with the expected 54). This could potentially be due to the different distribution of estrogen receptor (ER)-positive and ER-negative tumors within the amplicome and mutome, although ESR1 remained a main transcriptional regulator of amplified genes even when we restricted the amplicome analysis to ER-negative tumors (data not shown). Interestingly, the amplicome was specifically enriched in protein-encoding ESR1 target genes but not in ESR1 binding sites in general. About 11% of all ERα binding sites, determined based on ChIP-on-chip studies in MCF-7 cells (26), are located in the amplicome, which constitutes 10% of the genome (Supplementary Table S5). The size of individual amplicons correlated (R = 0.94) with the number of ERα sites they contained.
Other notable statistically significant outside (not amplified) overconnected hubs for the amplicome included TFs ESR2, RUNX2, PIT1, CSDA, and MBD1; ligands SPP1, LAMA4, CTGF, PDGF-A, and CCL24; EPHA receptors and PGE2R4; kinases PRKCD, PRKACB, MAP2K5, mammalian target of rapamycin, and PRKCZ; and proteases SENP2, ABHD7, AMPL, CTRC, and CLPP (Supplementary Excel File S12).
The top statistically significant overconnected TFs within the mutome were HIVEP1, IRF7, NRSF, NEUROD2, and PAX8; receptors ROS1, FCGR1A, LILRB3, IGF1R, and E-selectin; ligands IFNA5, IL-12, FGF10, Jagged2, and IFN-γ; kinases Chk1, LKB1, ZIP-kinase, Fyn, and BUBR1; and proteases caspase-3, calpain 3, Granzyme B, presenilin 1, and caspase-7.
Within the amplicome, C/EBPβ, HOXB4, HOXB1, cKrox, and c-Myc were the top-scored self-regulatory TFs. The highest scored overconnected receptors included ITGB4, FGFR1, PGD2R, ADRB3, and PIGR, whereas the most connected ligands included K12, MIP-1, Substance P, Neuropeptide W, and IL-24. The most relevant intraconnected kinases were MAPKAPK2, CLK2, Fructosamine-3-kinase, Casein kinase I, and PKC-; proteases included CAAX prenyl protease 2, cathepsin F, ADAM-33, matrix metalloproteinase (MMP)-17, and MMP-25 (Supplementary Excel File S13).
Ontology enrichment analysis of the breast cancer mutome and amplicome. To identify gene functions and pathways enriched in the breast cancer mutome and amplicome, we performed enrichment analysis in five functional ontologies (canonical pathways, process networks, disease biomarkers, GO processes, and GO molecular functions). Both the mutome and amplicome were enriched for genes and processes involved in tumorigenesis (Supplementary Excel File S14). The mutome was enriched for cell adhesion, cell cycle, and DNA damage pathways (the latter was especially pronounced for the CAN genes), whereas the amplicome was enriched for developmental pathways. With respect to disease biomarkers (i.e., genes linked to over 500 diseases based on genetic and biochemical data), the amplicome was highly enriched for genes associated with breast cancer (P = 2.03 × 10^-28) and skin disease (P = 3.48 × 10^-17). Surprisingly, the top 15 disease markers enriched in the mutome did not include cancer-associated biomarkers (with the exception of leukemia) but rather markers for different nervous system disorders (Supplementary Excel File S14). This suggests that activation of tumorigenic pathways by gene amplification is a more significant mechanism driving breast tumorigenesis than somatic mutations.
Synergy in ontology enrichment distributions between breast cancer mutome and amplicome. To determine if mutated and amplified genes cooperatively alter certain pathways and networks, we analyzed synergy in ontology enrichment distributions between mutome and amplicome and found a high degree of functional connectivity. Synergy was defined by a lower P value for an ontology entity in the distribution of the nonredundant union of amplicome and mutome gene content compared with amplicome and mutome individually. For example, the pathway map "Cell adhesion_Chemokines and adhesion" scored P = 3.2 × 10^-12 for the union set and P = 3.1 × 10^-9 for the mutome (Supplementary Excel File S14). The three orders of magnitude difference in P values reflects the enhancement of the mutant genes by amplified genes in this particular pathway and indicates functional connectivity between amplicome and mutome, because a union of functionally disconnected data sets would yield higher P values in enrichment analysis due to its larger size.
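The synergy criterion can be expressed as a small Python check. This is a sketch under stated assumptions: the function name `synergy_orders` is hypothetical, the P values are taken as given rather than recomputed, and the example uses only the union and mutome values quoted in the text because the amplicome value for this pathway map is not quoted.

```python
from math import log10


def synergy_orders(p_union, p_individual):
    """Orders of magnitude by which the union P value beats the best individual set.

    p_union: enrichment P value for the nonredundant union of the gene sets.
    p_individual: iterable of enrichment P values for the individual sets.
    A positive result means the union is more significant than every individual
    set, which is the definition of synergy used in the text.
    """
    best_individual = min(p_individual)
    return log10(best_individual / p_union)


# Values quoted in the text for "Cell adhesion_Chemokines and adhesion".
P_UNION = 3.2e-12
P_MUTOME = 3.1e-9

orders = synergy_orders(P_UNION, [P_MUTOME])
is_synergistic = orders > 0
print(f"{orders:.2f} orders of magnitude, synergistic: {is_synergistic}")
```

Taking the minimum over the individual P values makes the test conservative: the union has to outperform the better of the two sets, not just one of them.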
Enrichment synergy between amplicome and mutome was evident for two functional ontologies, canonical pathways and process networks (Fig. 4). The highest synergy in canonical pathways was found for "Cell adhesion_Chemokines and adhesion," "Cytoskeleton remodeling_TGF, WNT and cytoskeletal remodeling," "Cell cycle_ATM/ATR regulation of G1-S checkpoint," and "Development_IGF-RI signaling" (three orders of magnitude). These synergies suggest close interactions between mutated and amplified genes in these selected processes. In the ontology of process networks, the highest degree of synergy was observed for "Inflammation_IL10 anti-inflammatory response" and "Cell adhesion_Cadherins" (two orders of magnitude enhancement compared with amplicome or mutome alone).
| Disclosure of Potential Conflicts of Interest |
|---|
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Dr. Andrea Richardson for help with the acquisition of tumor samples and Drs. William Hahn and Matthew Meyerson for their critical reading of the manuscript.
| Footnotes |
|---|
Y. Nikolsky and E. Sviridov contributed equally to this work.
8 http://www.broad.mit.edu/cancer/software/genepattern/index.html
Received 8/11/08. Revised 8/26/08. Accepted 9/2/08.
| References |
|---|
(ESR1) gene amplification is frequent in breast cancer. Nat Genet 2007;39:655–60.