Gene expression profiling has proven useful in subclassification and outcome prognostication for human glial brain tumors. The analysis of biological significance of the hundreds or thousands of alterations in gene expression found in genomic profiling remains a major challenge. Moreover, it is increasingly evident that genes do not act as individual units but collaborate in overlapping networks, the deregulation of which is a hallmark of cancer. Thus, we have here applied refined network knowledge to the analysis of key functions and pathways associated with gliomagenesis in a set of 50 human gliomas of various histogenesis, using cDNA microarrays, inferential and descriptive statistics, and dynamic mapping of gene expression data into a functional annotation database. Highest-significance networks were assembled around the myc oncogene in gliomagenesis and around the integrin signaling pathway in the glioblastoma subtype, which is paradigmatic for its strong migratory and invasive behavior. Three novel MYC-interacting genes (UBE2C, EMP1, and FBXW7) with cancer-related functions were identified as network constituents differentially expressed in gliomas, as was CD151 as a new component of a network that mediates glioblastoma cell invasion. Complementary, unsupervised relevance network analysis showed a conserved self-organization of modules of interconnected genes with functions in cell cycle regulation in human gliomas. This approach has extended existing knowledge about the organizational pattern of gene expression in human gliomas and identified potential novel targets for future therapeutic development.
- functional network analysis
- relevance network
Gene expression profiling in human gliomas has been a valuable tool in identifying differentially expressed genes to classify disease subtypes and patient outcome ( 1). However, a persistent difficulty has been the assignment of biological significance to the large number of genes that have been implicated by such studies, even when inferential statistics has been used to allocate confidence to the discovery of regulated genes. There is increasing recognition that complex biological systems, such as gliomas, adhere to fundamental organizational principles. Modularity, where cellular functions are carried out by groups of interacting molecules in overlapping networks, is a predominant feature of such systems ( 2). A major challenge is to map out, understand, and model the topological and dynamic properties of the various networks that control cell behavior. The rising awareness of the importance of molecular networks as organizational patterns in disease states has fostered the development of corresponding analytic tools ( 2). Gene ontology databases in combination with pathway integration software now allow for dynamic mapping of gene expression data sets into potential pathway maps based on their functional annotation and known molecular interactions. Appreciation of the value of such analysis in cancers is partly founded on the assumption that cancer cells do not invent new pathways but use preexisting biological pathways in a modified fashion.
In complex disease states such as gliomas, genes may additionally interact in new pathways and networks. Therefore, the exploration of gene expression networks without a priori assumptions may complement knowledge-based network approaches. Relevance network analysis ( 3) is an unsupervised exploratory computational method that allows individual genes to be directly or indirectly linked to other genes by offering a method to construct networks of similarity across gene expression data sets. Here, so-called nodes with varying degrees of cross-connectivity are displayed, which represent features that are not only associated pairwise but also in aggregate. Implicit in the goal of applying such analysis to tumor gene expression profiles is the notion of correlated expression patterns of genes with related functions, a high-level self-organization in gene expression networks of tumor cells ( 4) and a scale-free topology of such networks ( 2).
We present an integrated approach that combines gene expression profiling in a set of 50 human gliomas of various histogenesis using a 43,000-element cDNA microarray platform, inferential statistics, and dynamic mapping of gene expression data into a functional annotation and pathway database. This knowledge-based network approach was complemented by an unsupervised relevance network learning algorithm without any a priori assumptions. We have captured extended biological pathway maps associated with gliomagenesis as well as distinct glioma subtypes. Detailed analysis of the properties of these refined maps has identified modules of interconnected genes with shared biological functions in the glioma transcriptome.
Materials and Methods
Tumor samples. Fifty fresh-frozen glioma specimens were collected under Institutional Review Board–approved guidelines and subjected to standard WHO classification ( 5). Specimens included 31 glioblastomas (including two gliosarcomas), eight oligodendrogliomas (five oligodendrogliomas and three anaplastic oligodendrogliomas), six anaplastic oligoastrocytomas, and five WHO grades 1 to 3 astrocytic tumors. Although 1p and 19q deletion statuses were available for all tumors (Supplementary Fig. 1A; ref. 6), a histologic diagnosis was assigned that was not influenced by the presence or absence of loss of heterozygosity (LOH) 1p and/or LOH 19q. Except for one infratentorial secondary glioblastoma, all tumors were located in the cerebral hemispheres (16 frontal, 10 parietal, 14 temporal, three occipital, two each frontoparietal and temporoparietal, one each frontotemporal and temporo-occipital). There was no difference in cerebral location between the histologic subtypes. The patient age ranged from 20 to 79 years (one outlier: 12 years, anaplastic astrocytoma) with a median of 53.5 years. Eleven tumors were from female patients and 39 from male patients. Tumor samples were disrupted and homogenized using a rotor-stator homogenizer (Kinematica, Cincinnati, OH). Total RNA was isolated from each homogenate using the RNeasy Lipid Tissue Kit (Qiagen, Valencia, CA) and quantified via spectrophotometry. RNA integrity was confirmed using the Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA). Normal brain total RNA and universal human reference total RNA were purchased from Stratagene (La Jolla, CA).
Microarray-based gene expression profiling. An indirect, dendrimer-based labeling method ( 7) was used for microarray hybridization that used the Genisphere 3DNA Array 900 labeling system (Genisphere, Hatfield, PA), following the procedural protocol provided by the manufacturer without any modifications. For cDNA synthesis, 3 μg of tumor/normal brain and universal human reference total RNA were separately reverse transcribed using the Cy5- and Cy3-specific Genisphere primers, respectively, and hybridized together overnight at 65°C to human cDNA microarrays (from the Stanford Functional Genomics Facility) containing 41,421 cDNA elements, corresponding to 27,290 different UniGene cluster IDs. Microarrays were coated with DyeSaver2 (Genisphere) immediately after the last wash.
Data selection. Microarrays were scanned on a GenePix 4000B scanner (Axon Instruments, Union City, CA). Primary data collection was done using GenePix Pro 5.1 software. Raw data were background corrected, filtered using a flag and background filter (1.5-minimal signal over background ratio), and normalized by the LOWESS normalization function using the TIGR MIDAS function of the TM4 microarray software suite. 6 The GoldenPath Human Genome Assembly (National Center for Biotechnology Information build 34) 7 was used to map fluorescent ratios of the arrayed human cDNAs to chromosomal positions.
Class predictor and relevance network analysis. Genes (n = 6,706) with expression in 80% of samples and whose expression levels differed by at least 3-fold, in at least one sample, from their mean expression levels across all samples were included in downstream statistical analyses. Two-class, unpaired significance analysis of microarrays (SAM; ref. 8) was used to identify genes differentially expressed between normal brain and ( 1) all 50 gliomas, ( 2) 31 glioblastomas, and ( 3) 14 oligodendroglia-enriched tumors. Genes identified with a false discovery rate (FDR) of 0.005 in the first comparison and with a FDR of 0.001 in each of the second and third comparisons were deemed significant if they passed a 4-fold change filter. Principal component analysis was executed in MATLAB (The MathWorks, Natick, MA). Relevance networks ( 3) were constructed using the TIGR MultiExperiment Viewer function of the TM4 microarray software suite. The similarity of features was computed by comprehensively comparing all features with each other in a pairwise manner using a modified Pearson coefficient of correlation as a distance metric as described ( 3). The threshold for the minimal Pearson squared value was chosen based on random permutation testing.
Gene ontology, canonical pathway, and functional network analysis. Gene ontology, canonical pathway, and functional network analyses were executed using Ingenuity Pathways Analysis tools (Ingenuity Systems, Mountain View, CA), a web-delivered application that enables the discovery, visualization, and exploration of molecular interaction networks in gene expression data. The gene lists identified by SAM, containing Genbank accession numbers as clone identifiers as well as d scores, were uploaded into the Ingenuity pathway analysis. Each clone identifier was mapped to its corresponding gene object in the Ingenuity pathway knowledge base. These so-called focus genes were then used as a starting point for generating biological networks. A score was computed for each network according to the fit of the original set of significant genes. This score reflects the negative logarithm of the P that indicates the likelihood of the focus genes in a network being found together due to random chance. Using a 99% confidence level, scores of ≥2 were considered significant. Significances for biological functions or canonical pathways were then assigned to each network by determining a P for the enrichment of the genes in the network for such functions or pathways compared with the whole Ingenuity pathway knowledge base as a reference set. Right-tailed Fisher's exact test was used with α = 0.05. The same statistical approach was used for gene ontology analyses of the initial gene lists.
Real-time reverse transcription-PCR. Quantitative real-time reverse transcription-PCR (RT-PCR) reactions were done with the ABI Prism 7900HT Sequence Detection System using SYBR GREEN PCR Master Mix (Applied Biosystems, Foster City, CA). Primers targeting the transcripts of BNIP2, CD151, CSDA, EMP1, FBXW7, MYC, and UBE2C genes and the HPRT1 housekeeping gene were designed with the Primer3 program 8 and synthesized at the Stanford PAN Facility (for sequences, see Supplementary Table 1). Total RNA was reverse transcribed using the SuperScript first-strand synthesis system and SuperScript II (both from Invitrogen, Carlsbad, CA). Thermocycling for each PCR reaction was carried out in a final volume of 20 μL containing 1 ng of cDNA, forward and reverse primers at 3 μmol/L final concentration, and 1× SYBR GREEN PCR Master Mix. After 10 minutes of initial denaturation at 95°C, the cycling conditions of 40 cycles consisted of denaturation at 95°C for 15 seconds, annealing at 55°C for 30 seconds, and elongation at 72°C for 30 seconds. All reactions were done in triplicate. Dissociation curve analysis was done after every run to confirm the primer specificity. Gene quantities were determined using standard curves, constructed by five serial dilutions of RT product of Stratagene universal human reference RNA, and gene expression levels were reported as ratios of quantities of the target gene and HPRT1 as the reference gene.
Class predictor analysis identifies genomic signatures of glioma. Two-class, unpaired SAM was used to identify genes differentially expressed between normal brain tissue and glioma tissue or between normal brain and distinct glioma subtypes. SAM computes a statistic measuring the strength of the relationship between gene expression and a response variable, while taking into account the multiple testing nature of a microarray experiment. To each gene, a (d) score is assigned based on its change in expression relative to the SD of repeated measurements for that gene. Genes with scores greater than a threshold, as determined by a tuning variable δ, are deemed potentially significant. The percentage of such genes identified by chance is the FDR. To estimate the FDR, nonsense genes are identified by analyzing permutations of the measurements ( 8). In addition, we used a fold change variable to filter for genes changed at least 4-fold. For the analysis of normal brain versus glioma subtypes, tumors were grouped into 31 pure glioblastomas and 14 tumors with enrichment for oligodendroglial morphology (from now on referred to as oligodendroglial tumors). The cogrouping of pure oligodendroglial and mixed oligoastroyctic tumor was based on an apparent coclustering of these tumors in unsupervised, average-linkage hierarchical cluster analysis. Three hundred fifty one clones were revealed to be significantly linked to gliomagenesis (normal brain versus all 50 gliomas: 31 glioblastomas, 14 oligodendrogliomas, and five grade 1-3 astrocytomas; Fig. 1A ), as were 479 and 167 clones to the glioblastoma and oligodendroglial tumor phenotypes ( Fig. 1B), respectively (Supplementary Tables 2-4). The known glioma genes TNC ( 9, 10), MYC ( 11), NES ( 12), MMP2 ( 13), SPARC ( 14), and FOS ( 15) scored among the top 20 overexpressed genes in gliomagenesis. As to be expected by the shared site of tumor origin, there was considerable overlap in genes (103 clones, Supplementary Table 5) associated with glioblastoma and oligodendroglial tumor morphology versus normal brain (Venn diagram, Fig. 1C), all of which were present in the gliomagenesis data set. Multidimensional scaling using the first three principal components, a linear projection method that reduces the complex dimensionality of microarray data to create a three-dimensional plot that visualizes the relatedness of the tumors, was then used to test whether the above subsets could be used to distinguish glioblastoma, oligodendroglial tumor, and normal brain. This analysis showed a clear separation of all three groups based on these gene subsets ( Fig. 1D).
Gene ontology analysis. The three sets of clones (all 50 gliomas, 31 glioblastoma, and 14 oligodendroglial tumor) identified by SAM, along with their effective significance as indicated by the d score, were uploaded into the Ingenuity pathway knowledge base for analysis of functional annotations. Biological functions were assigned to each data set by using the knowledge base as a reference set and a proprietary ontology representing over 300,000 classes of biological objects and consisting of millions of individually modeled relationships between proteins, genes, complexes, cells, tissues, small molecules, and diseases. These semantically encoded relationships are based on a continual, formal extraction from the public domain literature and cover >10,300 human genes. 9 The biological functions assigned to each data set are ranked according to the significance of that biological function to the data set. A Fisher's exact test is used to calculate a P determining the probability that the biological function assigned to that data set is explained by chance alone. Because our genomic platform uniformly represents the whole human transcriptome, this analysis was not biased towards the coverage of our microarrays. One hundred ninety, 266, and 85 clones of the above subsets of clones predicting gliomas, glioblastoma, and oligodendroglial tumor respectively, mapped to corresponding genes in the knowledge base. Table 1 classifies the genes in each set by function, including carcinogenesis, cell cycle, cell death, and cellular growth and proliferation (complete gene listing, Supplementary Table 6). The sum of 10 tumorigenesis-related functions, the underlying genes for which we have labeled as cancer genes, accounted for 68%, 65%, and 65% of all biological functions in the glioma, glioblastoma, and oligodendroglial tumor gene sets, respectively. Because recent work ( 16) highlights a role for neural stem cells and developmental processes in the formation of gliomas, we also searched the sets for functions related to such processes. A sizable number of genes in each of the three sets (47%, 41%, and 57% in gliomas, glioblastoma, and oligodendroglial tumor, respectively) were revealed to have developmental functions, which partly overlapped with cancer-related functions. A high percentage (34%) of genes linked particularly to nervous system development was noted in the oligodendroglial tumor subgroup ( Table 1).
Functional network analysis. To understand how the genes identified by inferential statistics are related, the empirical data sets were then placed in the context of present knowledge about pathways and molecular interactions, using the Ingenuity knowledge base. Genes implicated in gliomagenesis as well as the nonoverlapping genes ( Fig. 1C) that predicted glioblastoma and oligodendroglial tumor morphology were explored. Genes (n = 172, 228, and 76), so-called focus genes, were used as the starting point for generating biological networks in the glioma, glioblastoma, and oligodendroglial tumor sets, respectively (see Methods and Materials for details). Based on these genes, new and expanded pathway maps and connections and specific gene-gene interactions were inferred, functionally analyzed, and used to build on the existing pathway knowledge base. To generate networks, the knowledge base was queried for interactions between focus genes and all other gene objects stored in the base. The output, displayed graphically as nodes (genes) and edges (the biological relationship between the nodes), provided a semantically consistent representation of a number of biological pathways and functions implicated by the empirical data sets.
Multiple networks were discovered in each of the three data sets. The ability to rank the networks based on their relevance to the empirical data sets allowed for rapid prioritization of networks with highest importance. Based on the computed scores, 8, 10, and 2 networks were found to be significant in the glioma, glioblastoma, and oligodendroglial tumor data sets, respectively (Supplementary Table 7). In addition, high-level functions were calculated and assigned to each network if the significance of the association between the network and the biological function had a P < 0.05. High-scoring functions in the top-ranking network in each of the three data sets were related to cellular growth and proliferation (glioma, P < 0.006 for each feature; Fig. 2A ), cellular movement/invasion (glioblastoma, P < 0.006; Fig. 2B), and cancer (oligodendroglial tumor, P < 0.005; Supplementary Fig. 2).
The nodes for the highest scoring networks of the glioma and glioblastoma data sets were comprised solely of focus genes ( Fig. 2). Both networks included a substantial number of genes that have been implicated in gliomagenesis based on previous studies in the literature. For example, the top network in the glioma set contained the glioma-related genes VEGF ( 17), CD44 ( 18), TNC ( 9, 10), CTGF ( 19), TGFBI ( 20), ID3 ( 21), CDC2 ( 22), PTTG1 ( 23), CCL2 ( 24), FN1 ( 25, 26), MMP2 ( 13), CSPG2 ( 27), CYR61 ( 19, 28), RUNX1 ( 29), and SPARC ( 14). The glioblastoma network included the glioma-related genes FN1 ( 25, 26), ITGA3 ( 30), ITGA5 ( 31), IGFBP2 ( 32), IGFBP3 ( 33), IGFBP5 ( 34), CD44 ( 18), TGFBI ( 20), PLAU ( 25), PLAUR ( 25), CDC2 ( 22), VEGF ( 17), PTTG1 ( 23), CYR61 ( 19, 28), and CTGF ( 19). Several novel genes without previous implication in gliomas but with evidence in carcinogenesis were revealed for both networks, such as the oncogene LYN ( 35) in the glioblastoma network ( Fig. 2B).
The oncogene myc mapped to the core of the top gliomagenesis network and was revealed as the most prominent interaction partner within the network ( Fig. 2A). The recurrent overexpression of this gene was associated with amplification of the cognate MYC locus at 8q24.21 in several of the tumors (Supplementary Fig. 1B-C). Besides genes with established MYC interactions and/or known functions in gliomas, such as SPARC ( 14, 36), CD44 ( 18, 37), FN1 ( 25, 26, 38), GAS1 ( 39, 40), and VEGF ( 17, 41), novel MYC-responsive genes were identified. These included the overexpressed genes UBE2C and EMP1 and the underexpressed gene FBXW7 ( Fig. 2A), all of which have reported cancer-related functions. UBE2C ( 42) and EMP1 ( 43) have been implicated in promoting tumorigenesis and FBXW7 ( 44) as a tumor suppressor gene.
At the core of the top glioblastoma network were genes that play a role in integrin signaling, including FN1 ( 25, 26), ITGA3 ( 30), ITGA5 ( 31), PLAU ( 25), PLAUR ( 25), CTGF ( 45), CYR61 ( 28, 45), and CAV1 ( 46), as well as insulin-like growth factor (IGF) signaling elements such as IGFBP2 ( 32), IGFBP3 ( 33), IFGBP5 ( 34), CTGF (IGFBP8; ref. 47), and CYR61 (IGFBP10; ref. 47; Fig. 2B). Accordingly, the functional analysis for the network revealed a top function in cellular movement and invasion and a high significance of the canonical pathways for integrin (P = 0.017) and IGF-I (P = 0.038) signaling. This extended pathway map newly identified CD151 as a putative factor in glioblastoma cell invasion ( Fig. 2B), a gene that has been shown to enhance cell motility, invasion, and metastasis of cancer cells ( 48).
Relevance network analysis. Relevance network analysis was used as an unsupervised, exploratory learning algorithm to discover functional relationships between genes in the entire gene expression data sets (glioblastoma, oligodendroglial tumor, and brain) based on the assumption that genes with correlated expression behavior are also functionally related. This analysis takes into account both positively and negatively correlated gene expression patterns. Pairwise associations based on squared Pearson coefficients of correlation were computed, and associations weaker than 0.8 were removed, leaving genes that were nonrandomly associated with another and possibly biologically related. The effect of random chance in the large number of calculations was empirically determined by random permutation testing. Associations stronger than those seen in the multiply permuted data were used to construct networks of highly correlated genes. This approach identified four relevance networks with >10 nodes and varying degrees of cross-connectivity (Supplementary Table 8). One of the networks revealed 101 clones with overexpression in glioblastoma compared with oligodendroglial tumor and normal brain ( Fig. 3A-C ). Sixty-nine of these 101 clones were included in the 376 clones identified by SAM to distinguish glioblastoma from oligodendroglial tumor and normal brain (Supplementary Table 9), some of which (ITGA3, ITGA5, PLAUR, LYN, HCLS1, FCGR2B, and MSN) mapped to the top-scoring glioblastoma knowledge-based functional network ( Fig. 2B). Another cluster of 36 clones, which included UBE2C, showed highly correlated although not glioblastoma-specific expression behavior ( Fig. 3D). Gene ontology analysis of this network revealed functions in cell cycle regulation and mitosis for almost any node, many of which have been also implicated in carcinogenesis. Deposition of these nodes into the Ingenuity knowledge base resulted in the formation of a single highly significant network that included MYC ( Fig. 3E) and confirmed top-scoring functions of its constituents in cell cycle turnover and cell cycle check point regulation (P < 10−6).
Target gene validation by real-time reverse transcription-PCR. Several target genes deemed biologically interesting because of their differential expression in gliomas and/or their interaction in the knowledge-based network analysis were validated by real-time RT-PCR analysis in a representative panel of tumors and normal brain ( Fig. 4 ). This study confirmed a recurrent overexpression of MYC and the MYC-target genes UBE2C and EMP1 and a persistent underexpression of the MYC regulator FBXW7 in human gliomas. It also substantiated the up-regulated expression of the CDK4-interacting Y-box transcription factor CSDA and of the CD151 gene. Finally, we have validated the CDC42-interacting BNIP2 gene and CD151, both of which may promote tumor cell invasion, to be highly expressed in the glioblastoma subtype. The expression level for each target gene was related to the housekeeping gene HPRT1 and the ratio was correlated with the corresponding expression ratio of target versus housekeeping gene in the microarray experiment. For each of the examined genes, the real-time RT-PCR results closely mirrored the expression levels for these genes assessed by microarray analysis (mean correlation, R = 0.92; Fig. 4).
We have applied refined computational methods to the genomic signatures of gliomas to highlight key functional networks. Based on increasing recognition that a systems approach is necessary to view the overall molecular events responsible for gliomagenesis, we have combined large-scale analysis of gene expression with knowledge-based and relevance network analyses. Complex networks involved in tumorigenesis were mapped and subsequently explored in the context of genes deemed important in gliomagenesis by inferential statistics for a refined molecular pathway picture.
Our initial class predictor analysis in combination with gene ontology assessment has shown the differential expression of a substantial number of genes annotated to cancer-related and developmental functions in human gliomas. The latter observation substantiates recent work ( 16) that emphasizes multifaceted mechanisms for the origin and formation of gliomas, in which neural stem cells have an important role. Based on these genes and the assumption that glioma cell behavior emerges from the orchestrated activity of many cellular components that interact with each other through pairwise interactions, we have identified distinct molecular networks perturbed in the formation of gliomas, many of which are extended characterizations of pathways previously implicated in gliomas. These networks include both directed and undirected interactions. Directed interactions are characterized by a well-defined information flow (e.g., from a transcription factor to the gene it regulates). Undirected interactions do not have an assigned direction (e.g., mutual binding relationships).
The most significant gliomagenesis network arose around the myc oncogene, which encodes a basic helix-loop-helix leucine zipper transcription factor that triggers cell growth and division, in part through activation of the CDK4/RB/E2F pathway ( 49, 50). In line with its involvement in a wide range of malignancies ( 51), there is circumstantial evidence that suggests a role of MYC and deregulation of MYC pathways in gliomagenesis ( 11, 52). Multiple epigenetic and genetic mechanisms underlie the frequent overexpression, overactivation, and accumulation of MYC in human gliomas, including increased upstream growth factor receptor tyrosine kinase signaling ( 53), amplification and rearrangement of the MYC locus at 8q24.21 ( 11), inactivation of MYC pathway antagonists ( 54), and prolongation of MYC half-life ( 55). Accordingly, we have observed MYC overexpression in our tumors with and without concurrent gene copy number alteration. MYC acts as a sequence-specific DNA-binding protein to facilitate expression of a wide variety of E-box-containing target genes ( 56). It has also been assigned roles in chromatin remodeling of target promoters ( 57) and regulation of translation initiation ( 58).
Our network has identified a number of new MYC-response genes with cancer-related functions to be differentially expressed during gliomagenesis. The MYC-up-regulated UBE2C gene ( 59) codes for an E2 ubiquitin-conjugating enzyme whose function is closely linked to the progression of cells through the M phase and the coupling of mitosis to S-phase entry via autonomous regulation of the anaphase-promoting complex ( 60). This gene, which was highly overexpressed in human gliomas, has only recently been shown to be up-regulated in human carcinomas of diverse origin and to be associated with poor tumor differentiation ( 42, 61). Its silencing significantly inhibits cancer cell proliferation and sensitizes cells to tumor necrosis factor–related apoptosis-inducing ligand–mediated cell killing ( 42). UBE2C may be a target for the common amplification of the 20q13.1 locus in carcinomas ( 42). We have also noted a mechanistic link between gene expression and copy number for UBE2C in human gliomas (data not shown).
By contrast, we found persistent down-regulation of the MYC-binding FBXW7 gene in human gliomas. This putative haplosufficient tumor suppressor gene, which functions as a phosphoepitope-specific substrate recognition component of the SKP1-cullin-F-box ubiquitin ligase complex, promotes proteasome-dependent MYC turnover in vivo and MYC degradation and ubiquitination in vitro ( 44, 62). The frequent mutation of MYC at the FBWX7-binding site suggests that MYC activation is one of the key oncogenic consequences of FBWX7 loss in carcinogenesis ( 44, 62). The recent identification of FBWX7-inactivating mutations in human cancer has been linked to increased genomic instability ( 63) by a p53-dependent, STK6-involving mechanism ( 64).
We have also revealed common overexpression of the EMP1 gene during gliomagenesis. EMP1 has been originally isolated from a MYC-induced mouse brain tumor cDNA library ( 65). This putative membrane glycoprotein is an immediate transcriptional target for MYC that possesses tumorigenic activity ( 43). EMP1 expression is associated with cell proliferation, which is down-regulated when cells are growth arrested ( 65). It is also highly expressed in undifferentiated, proliferating embryonic stem cells, but a much lower expression is observed after these cells are induced to differentiate ( 65). The increased expression of EMP1 during brain development ( 66) emphasizes the conceptual interrelatedness of gliomagenesis and neurogenesis. We have recently mapped EMP1 to a locus of recurrent gene copy number gain on chromosome 12p13.1 in human gliomas ( 6).
We also found the CSDA gene, a Y-box transcription factor that promotes G0-G1 to S-phase transition and has been implicated in carcinogenesis ( 67, 68), to be persistently overexpressed in gliomas. This gene is thought to stimulate cell cycle progression at least in part via interaction with the CDK4/RB pathway ( 68), which is chronically active in a subset of gliomas. Finally, we report overexpression of the BNIP2 gene in human gliomas, in particular the glioblastoma subtype. This gene has been originally identified as an interacting partner of the BCL2 and p19E1B prosurvival factors ( 69) as well as a putative substrate of the fibroblast growth factor receptor in the cell ( 70). Recently, BNIP2 has been implicated in modulating cell dynamics by inducing cellular protrusions in cancer cells through interaction with CDC42, suggesting its potential role in cell migration and invasion ( 71).
Our network analysis has also revealed an extended pathway map potentially involved in glioblastoma cell invasion. The invasion of neoplastic cells into normal surrounding brain tissue is a pathologic hallmark of glioblastomas ( 72). Understanding the mechanisms of tumor cell invasion is critical as it has a central role in glioblastoma progression and failure of contemporary therapy due to disease recurrence from microdisseminated disease. Glioblastoma cells share the common attributes of the invasion process, including cell adhesion to extracellular matrix (ECM) components, cell locomotion, and the ability to remodel extracellular space. The brain parenchyma is a unique environment devoid of rigid protein barriers. Integrins and the hyaluronan receptor CD44 are specific adhesion receptors active in glioblastoma-cell matrix interactions, which are used by the neoplastic cells to adhere and migrate along the brain ECM ( 72). The ECM protein FN1 interacts with many integrins and promotes glioblastoma cell adhesion and migration ( 25, 26). The serine protease PLAU and its cognate receptor (PLAUR) degrade the brain ECM and thus contribute to the invasive capabilities of glioblastoma cells ( 25). The secreted CCN gene family members CYR61 and CTGF, which can act as ligands for integrins and stimulate adhesion and migration, are overexpressed in glioma cells and prognostic for tumor progression and survival of glioma patients ( 19, 28). The SPARC-induced secretory protein TGFBI interacts with FN1 ( 73) and distinct integrins ( 74, 75) and promotes integrin-dependent adhesion of glioblastoma cells ( 20, 76). The integrin-associated protein CD47 serves as a ligand for signal regulatory protein α in malignant glioma cells and is frequently expressed in glioblastomas ( 77).
We have identified CD151 as a new component of a network that may potentially mediate glioblastoma cell invasion. Constituting a member of the tetraspanin superfamily, CD151 is a cell surface glycoprotein that complexes with integrins and other tetraspanins ( 78), regulates integrin trafficking and/or function ( 79), and enhances cell motility, invasion, and metastasis of cancer cells ( 48). Most recently, CD151 has been implicated in modulating the ligand-binding activity of ITGA3 through stabilizing its activated conformation ( 80) and has been shown to predict the outcome of low-grade primary prostate cancer better than histologic grading ( 81). We confirm a recent report ( 82) of low expression of L1CAM in various glioma subtypes compared with normal brain. This glycoprotein has a role in nervous system development ( 83). We found the SRC family member LYN to be overexpressed in glioblastoma. This nonreceptor tyrosine kinase acts as an oncogene, and has been shown to be critical in platelet-derived growth factor–stimulated integrin αVβ3–mediated migration of U-87MG glioblastoma cells ( 84).
There is growing evidence that most functional networks within the cell approximate a scale-free topology, where the number of nodes shows a power-law degree distribution ( 2). Examples of scale-free organization are genetic regulatory networks, in which the nodes are individual genes and the links are derived from the expression correlations that are based on microarray data ( 4, 85). Using relevance networks ( 3), we have found a scale-free organizational principle in the gene expression signatures of gliomas, where the main feature is the coexistence of nodes of widely different degrees, from nodes with one or two links to so-called hubs with a very large number of links. Our functional network analysis also indicates that molecular interaction networks in glioma cells show scale-free features, specifically the existence of hubs (such as MYC) which may fundamentally determine the network's behavior. A relevance network is a group of genes whose expression profiles are highly predictive of one another. We have shown that genes can be directly or indirectly linked to several genes based on positively and negatively correlated expression. This feature provides a considerable advantage compared with traditional cluster algorithms where each gene can be linked to only one other feature and be only a member of one functional genomic cluster. In addition, it can capture negative correlations, which are missed in hierarchical clustering and simultaneously display them with positive correlations. Our findings support the implicit notion of unsupervised, exploratory network algorithms, that genes with similar expression behavior are related biologically. Our data evidence a high level of self-organization in the gene expression networks of glioblastoma and indicate that genes with functions in glioblastoma formation and progression show interrelated expression patterns. They also indicate modularity in the glioma transcriptome, in which distinct functions are carried out by groups of interacting molecules. Such modularity is substantiated by our discovery of a highly conserved module of interconnected genes with shared cell cycle functions across the glioma transcriptome.
Our study has several limitations. A potential shortcoming of our study was the analysis of brain tumor tissue versus normal brain tissue. Normal brain tissue is composed of various cell types, most of which may actually not represent the cells of glioma origin. Although the identification of multiple genes with previous implication in gliomagenesis provides a measure of reliability for our experimental setting, differences in the relative amounts of grey versus white matter and tumor versus normal tissue may potentially skew the gene expression profiles towards regulators of proliferation. In addition, a potential pitfall of our study is that the functional networks and their underlying gene-by-gene interactions have been deduced by an in silico modeling approach that was based on predetermined database knowledge and which therefore can only be considered as a source of hypotheses. These interactions have been established in various physiologic and pathologic cell conditions. This however does not necessarily prove that the genes in our network maps interact in a similar way in human gliomas. Molecular functions vary by cellular and tissue contexts, which limit conclusions regarding the potential functional significance of molecular associations in gliomas made through database analysis. Functional testing will be needed for the rigorous evaluation of individual molecular interactions inferred by our database approach. Our hypothesis-generating approach has revealed several new candidates altered in expression in human gliomas. Gene-targeting experiments, such as gene silencing by RNA interference and gene knockout in an animal model or gene delivery by gene transfection, will be useful in validating the significance of these genes in gliomagenesis or individual glioma subtypes. Finally, none of the described molecular interaction networks are independent. Instead, they are likely constituents of a complex network map that determines the biological behavior of glioma cells. Integrated studies that enable the analysis of all (regulatory, metabolic, spatial, etc.) interactions may offer additional insights into how such an extensive map contributes to gliomagenesis.
In conclusion, we have used network analysis as a conceptual framework to explore the pathobiology of human gliomas, based on the assumption that glioma cell behavior is a contextual attribute of distinct patterns of interactions between multiple genes. The salient results of our study include the delineation of refined biological pathway maps differentially modulated in human gliomas, one of which is built around the myc oncogene and includes distinct MYC-interacting genes that were not previously implicated in gliomagenesis. Modules of interconnected genes with common functions in cell cycle regulation are conserved in the glioma transcriptome. In glioblastoma, clusters of genes with tumor-promoting functions show high-level self-organization and correlated patterns of gene expression. The paradigmatic, extensive migratory and invasive behavior of this glioma subtype is accentuated by the demarcation of a pathway map enriched for integrin-signaling components. Although the actual role of individual genes and gene interactions in the inferred genetic regulatory networks requires rigorous functional evaluation, these networks may contribute to understanding key biological functions and pathways that are altered during gliomagenesis and may offer the potential for new avenues of future therapeutic intervention.
Grant support: NIH grant CA92474 (B.I. Sikic) and Emmy-Noether-Program of the German Research Society (M. Bredel).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received April 7, 2005.
- Revision received June 14, 2005.
- Accepted July 12, 2005.
- ©2005 American Association for Cancer Research.