Glioblastoma is classified into two subtypes on the basis of clinical history: “primary glioblastoma” arising de novo without detectable antecedent disease and “secondary glioblastoma” evolving from a low-grade astrocytoma. Despite their distinctive clinical courses, they arrive at an indistinguishable clinical and pathologic end point highlighted by widespread invasion and resistance to therapy and, as such, are managed clinically as if they are one disease entity. Because the life history of a cancer cell is often reflected in the pattern of genomic alterations, we sought to determine whether primary and secondary glioblastomas evolve through similar or different molecular pathogenetic routes. Clinically annotated primary and secondary glioblastoma samples were subjected to high-resolution copy number analysis using oligonucleotide-based array comparative genomic hybridization. Unsupervised classification using genomic nonnegative matrix factorization methods identified three distinct genomic subclasses. Whereas one corresponded to clinically defined primary glioblastomas, the remaining two stratified secondary glioblastoma into two genetically distinct cohorts. Thus, this global genomic analysis showed wide-scale differences between primary and secondary glioblastomas that were previously unappreciated, and has shown for the first time that secondary glioblastoma is heterogeneous in its molecular pathogenesis. Consistent with these findings, analysis of regional recurrent copy number alterations revealed many more events unique to these subclasses than shared. The pathobiological significance of these shared and subtype-specific copy number alterations is reinforced by their frequent occurrence, resident genes with clear links to cancer, recurrence in diverse cancer types, and apparent association with clinical outcome. 
We conclude that glioblastoma is composed of at least three distinct molecular subtypes, including novel subgroups of secondary glioblastoma, which may benefit from different therapeutic strategies. (Cancer Res 2006; 66(23): 11502-13)
- array CGH
The recalcitrant nature of glioblastoma, despite the application of novel therapies targeting “signature glioma genes” in clinical trials, continues to motivate efforts to define the genetics and biology of this aggressive brain tumor. The lethal nature of glioblastoma relates to its propensity to invade early and widely throughout the normal brain parenchyma and to its intense apoptosis resistance rendering it incurable with existing chemo/radiotherapies. Glioblastoma can develop along two distinct clinical pathways: primary glioblastoma, which presents with no pathologic precursor lesion and has a clinical prodrome of <3 months, and secondary glioblastoma, which develops over a 5- to 10-year period from a low-grade astrocytoma ( 1). Molecular analysis of the glioblastoma subtypes has identified epidermal growth factor receptor (EGFR) amplification as exclusive to primary glioblastoma and TP53 loss as the genetic hallmark of low-grade astrocytoma and secondary glioblastoma ( 1), and transcriptional differences seem to reflect biological differences between these clinical subtypes ( 2). At the same time, the strikingly similar phenotype of the glioblastoma subtypes is reflected in common genetic lesions such as loss of phosphatase and tensin homologue (PTEN) and cyclin-dependent kinase (CDK) inhibitor 2A (CDKN2A), as well as amplification/overexpression of CDK4 and MDM2 ( 1). Emerging data, however, suggest that these genetic events may represent only a fraction of mutations involved in glioblastoma pathogenesis.
Conventional and array-based comparative genomic hybridization (aCGH) and microarray expression profiling have catalogued the common chromosomal aberrations, each of which contains genes with copy number–driven expression changes validated across large glioblastoma sample sets ( 3– 5). Application of higher resolution aCGH platforms has identified additional loci targeted for recurrent copy number alterations (CNA), underscoring the genomic complexity of this aggressive cancer. Attempting to ascribe biological significance to these various loci, several studies have correlated changes in copy number or expression with prognosis. Whereas gain of 7p has been linked to poor prognosis ( 6), EGFR amplification and activated EGFR mutant (VIII) expression have not been shown to be consistent independent predictors of survival in glioblastoma ( 7, 8). Long-term survival (>3 years) has been correlated with 19q loss, whereas 6q loss, 10q loss, and 19q gains have been correlated with shorter survival ( 9)—the clinically relevant gene targets of most of these regions have yet to be identified. In the case of 10q, whereas loss of PTEN has not been shown to predict survival time in glioblastoma ( 8), coexpression of PTEN and activated EGFR was correlated with a good response to EGFR inhibition in a recurrent glioblastoma clinical trial ( 10).
Within glioblastoma expression datasets, gene clusters ( 5) and genes ( 4) correlate with survival and seem to have relevance to glioma biology and heterogeneous cellular morphology ( 11). Integrating aCGH and expression data into survival analysis has identified molecular subsets in glioblastoma, thus enabling a more focused search for genes involved in glioblastoma pathogenesis and clinical outcome ( 4). Such studies underscore the value of genomic analysis of clinically annotated tumor samples and, at the same time, emphasize the limited scope of knowledge surrounding the identity and function of the presumed cancer genes targeted in these loci.
High-resolution oligo-based aCGH platforms and advances in computational methods provide an opportunity to obtain a detailed view of the genomic changes in cancer and to correlate such CNA patterns with the tumor subtypes and clinical outcome. In this study, the application of this approach to clinically annotated, formalin-fixed paraffin-embedded primary and secondary glioblastoma samples readily documented all known glioblastoma CNAs and, in the vast majority of cases, narrowed significantly the common regions targeted for amplification or deletion. Unsupervised classification, including genomic nonnegative matrix factorization (gNMF) methods, defined primary and secondary glioblastomas as different genetic diseases and identified two novel genetic subgroups in secondary glioblastoma. These genomic profiles point to prime therapeutic candidates but emphasize that effective treatment strategies will likely require tailored and multiple targeted therapies for each of the three glioblastoma molecular subtypes.
Materials and Methods
Identification of primary tumors. Tumors were identified through the Neuro-Oncology Program at Dana-Farber Cancer Institute/Brigham and Women's Hospital and included review of case records to classify tumors by histopathologic diagnosis and clinical history under an Institutional Review Board–approved protocol. Primary glioblastomas had no prior history of glioma, had a clinical prodrome of ≤3 months, and were diagnosed as glioblastoma at the first operation. Secondary glioblastoma patients had a histologic diagnosis of low-grade astrocytoma that preceded the diagnosis of glioblastoma by at least 1 year. Slides from each case were reviewed by a neuropathologist (K.L.L.) to confirm the diagnosis of glioblastoma and low-grade astrocytoma, when relevant. To study as homogeneous subgroups as possible, tumors with histologic evidence of a prominent oligodendroglial component were excluded given the known genotypic differences between oligodendroglial and astrocytic tumors ( 12). Similarly, the “giant cell” variant of glioblastoma was excluded because it has genetics that have been described as overlapping primary and secondary glioblastomas ( 1). For each case, gender, age, location of tumor, treatment course, clinical course, including time to tumor transformation/progression, and survival time were collected.
DNA extraction. The details of the extraction method are described in Supplementary Methods. Briefly, one representative block per case containing >90% cellularity and minimal necrosis was selected. Twenty-five milligrams of tissue were obtained in 40-μm histologic sections. High-quality DNA (>10 kb fragments) was extracted by modifications of Qiagen (Valencia, CA) tissue extraction protocol. Formalin-fixed paraffin-embedded sections were deparaffinized in a series of three steps: treatment with mixed xylenes, removal of xylene with 100% ethanol, and rehydration in PBS. Samples were lysed in buffer and proteinase K and inverted at 55°C for 24 to 72 hours and prepared using the Qiagen protocol. DNA quality was examined by gel electrophoresis and samples with at least 50% of total DNA running above the 10-kb marker (New England Biolabs, Ipswich, MA) were of sufficient quality to proceed with aCGH (Supplementary Fig. S1A).
aCGH profiling on oligonucleotide microarrays. Genomic DNA was fragmented and random prime labeled as previously described ( 13) and hybridized to oligonucleotide arrays containing 22,500 elements designed for expression profiling (Human 1A V2, Agilent Technologies, Santa Clara, CA). Using National Center for Biotechnology Information Build 35, 16,097 unique map positions were defined with a median interval between mapped elements of 54.8 kb. Fluorescence ratios of scanned images were calculated as the average of two paired arrays (dye swap), and the raw profiles were processed to identify statistically significant transitions in copy number using Circular Binary Segmentation ( 13, 14).
Definition of thresholds for analysis. A “segmented” data set was generated by determining uniform copy number segment boundaries and then replacing the raw log2 ratio for each probe by the mean log2 ratio of the segment containing the probe. The distribution of segment mean log2 ratios for the 37 samples is shown in Supplementary Fig. S1B. The tall central peak represents the majority of the genome, which is without CNA in the sample set. Based on this distribution, thresholds representing minimal CNA were chosen at ±0.15, equal to ±6 SDs of the middle 75% of the data (blue lines). Thresholds representing more significant CNAs were chosen at ±0.4, representing the 97% and 4% quantiles (green lines). High-priority minimal common regions (MCRs) were chosen by requiring at least one sample to show an extreme CNA event, defined by thresholds +0.8 and −0.65, >99% and <1% quantiles, respectively (red lines).
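The segment-mean substitution and the threshold-based calls can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names, the representation of segment boundaries as start indices, and the use of strict inequalities at each threshold are assumptions made for illustration, and segment boundaries are taken as given rather than computed by Circular Binary Segmentation.

```python
from statistics import mean

# Thresholds quoted in the text, in log2 ratio units.
MINIMAL = 0.15      # minimal CNA (gain/loss)
SIGNIFICANT = 0.4   # more significant CNA (amplification/deletion)

def segment_means(log2_ratios, segment_starts):
    """Replace each probe's log2 ratio by the mean of its segment.

    segment_starts holds the index of the first probe of every segment
    (hypothetical input format; the first entry must be 0).
    """
    edges = list(segment_starts) + [len(log2_ratios)]
    smoothed = []
    for start, end in zip(edges, edges[1:]):
        seg_mean = mean(log2_ratios[start:end])
        smoothed.extend([seg_mean] * (end - start))
    return smoothed

def classify(seg_mean):
    """Label a segment mean; strict inequalities are an assumption."""
    if seg_mean > SIGNIFICANT:
        return "amplification"
    if seg_mean > MINIMAL:
        return "gain"
    if seg_mean < -SIGNIFICANT:
        return "deletion"
    if seg_mean < -MINIMAL:
        return "loss"
    return "normal"

# Illustrative values only, not data from the study.
probes = [0.0, 0.1, -0.1, 0.5, 0.7, 0.6, -0.2, -0.3]
calls = [classify(m) for m in segment_means(probes, [0, 3, 6])]
```

In this toy profile the three segments are called normal, amplification, and loss, respectively.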
Genomic NMF analysis and Fisher's exact test of aCGH profiles. Genomic NMF was applied to the current data sets as previously described ( 15). Briefly, the segmented data set was first dimension reduced by eliminating redundant probe locations and then transformed to nonnegative values. The resultant nonnegative matrix was subjected to gNMF using a custom software package ( 16) run in MATLAB (The MathWorks, Inc., Natick, MA). For each factor level 2 through 6, gNMF was repeated 1,000 times to build a consensus matrix, which was used to assign samples to clusters based on the most common consensus. The rank K = 3 clustering was further tested for significance by permuting sample labels for secondary glioblastoma samples independently for each chromosome. One hundred permutations were subjected to rank 3 NMF over 1,000 iterations and the consensus matrix was assessed by cophenetic correlation.
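The consensus step can be sketched independently of the factorization itself. The Python fragment below assumes that each repeated run has already produced one cluster label per sample (here supplied by hand rather than by gNMF) and computes, for every pair of samples, the fraction of runs in which the pair was assigned to the same cluster. The function name and input format are hypothetical.

```python
def consensus_matrix(runs):
    """Fraction of runs in which each pair of samples co-clusters.

    runs: list of label lists, one list per run and one label per sample.
    Only co-membership matters, so label names may differ between runs.
    """
    n_samples = len(runs[0])
    counts = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i in range(n_samples):
            for j in range(n_samples):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]

# Three illustrative runs over four samples (not data from the study).
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
consensus = consensus_matrix(runs)
```

Entries near 0 or 1 indicate stable pairings across runs; intermediate values indicate unstable assignments, which is what the cophenetic correlation summarizes.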
Fisher's exact test was used to identify significantly different regional gains or losses between primary and secondary glioblastomas. For each aCGH probe, each sample was classified as having copy number normal, gained, or lost based on log2 ratio thresholds of ±0.15 (minimal CNA threshold; see above). Two-by-two contingency tables tested gained versus normal and lost versus normal between primary and secondary glioblastomas. Fisher's exact test P values were corrected for multiple testing (q-value FDR 10%, “qvalue” package for R).
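The per-probe test can be illustrated with a short, self-contained Python sketch of a two-sided Fisher's exact test on a two-by-two table. The helper names and the per-sample call labels are assumptions for illustration, and the multiple-testing correction (q-value) is not reproduced here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x altered samples in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )

def contingency(primary_calls, secondary_calls, altered):
    """Count altered versus normal calls per group for one probe."""
    return (
        primary_calls.count(altered), primary_calls.count("normal"),
        secondary_calls.count(altered), secondary_calls.count("normal"),
    )

# Illustrative calls for a single probe (not data from the study).
primary = ["gain", "gain", "gain", "normal"]
secondary = ["gain", "normal", "normal", "normal", "loss"]
table = contingency(primary, secondary, "gain")
p_value = fisher_exact_two_sided(*table)
```

Samples with the opposite alteration (the "loss" call above) are left out of the gained-versus-normal table, mirroring the separate gained-versus-normal and lost-versus-normal comparisons described in the text.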
Quantitative PCR verification. PCR primers were designed to amplify products of 100 to 150 bp within target and control sequences as described ( 13).
Automated MCR definition. Loci of amplification and deletion were evaluated across the subgroups in an effort to define MCRs targeted by overlapping events in two or more samples. An algorithmic approach is described in Supplementary Methods and has previously been described ( 13, 15).
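As a simplified illustration of the overlap criterion, the Python sketch below reports contiguous regions on one chromosome that are covered by events from at least two samples. It is not the algorithm described in Supplementary Methods: the event format, the half-open interval convention, and the assumption that events from a single sample do not overlap one another are all introduced here for illustration.

```python
def common_regions(events, min_samples=2):
    """Regions covered by events from at least min_samples samples.

    events: list of (sample_id, start, end) tuples for one chromosome and
    one direction (all gains or all losses), treated as half-open intervals.
    Returns (start, end, peak_depth) tuples.
    """
    points = []
    for _sample, start, end in events:
        points.append((start, 1))
        points.append((end, -1))
    points.sort()  # at equal positions, ends (-1) sort before starts (+1)

    regions = []
    depth, region_start, peak = 0, None, 0
    for position, delta in points:
        previous = depth
        depth += delta
        if previous < min_samples <= depth:
            region_start, peak = position, depth      # region opens
        elif depth >= min_samples:
            peak = max(peak, depth)                   # still inside region
        elif previous >= min_samples > depth:
            regions.append((region_start, position, peak))  # region closes
    return regions

# Illustrative coordinates (arbitrary units, not data from the study).
events = [("A", 10, 50), ("B", 30, 80), ("C", 40, 60), ("D", 100, 120)]
shared = common_regions(events)
```

Here samples A, B, and C overlap between positions 30 and 60, with all three overlapping at the peak, whereas the isolated event in sample D is not reported.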
Comparison with published expression data (KLL/DNL expression). Expression data from ref. 17 were used to generate a comparison data set of gene expression changes in glioblastoma versus normal brain (n = 8 samples). Statistical significance of overexpression or underexpression in glioblastoma relative to normal brain was set at P < 0.05.
Recurrent Genomic Changes Characterize the Glioblastoma Clinical Subtypes
Array-CGH methods, employing a high-density gene-specific oligo-based platform (see Materials and Methods), were used to detect CNAs in the genomes of clinically annotated and pathologically verified primary and secondary glioblastoma cases using DNA extracted from formalin-fixed paraffin-embedded tissue. An approximately equal number of primary (n = 20) and secondary (n = 17) glioblastomas, all of pure astrocytic histology and secondary glioblastomas arising from histologically verified low-grade astrocytomas, were included in this study to have statistical power to detect differences between the two clinical subgroups. The cases are representative of patients treated in academic neuro-oncology centers in which the median age at diagnosis is significantly less for secondary glioblastoma [40 years (range, 19-54 years), versus 51 years (range, 33-65 years) for primary glioblastoma; P < 0.0001] but the overall response to radiation and cytotoxic chemotherapy is similar. Median overall survival time was similar between the groups, 17 months (range, 10-47 months) for primary glioblastoma and 14 months (range, 2-60 months) for secondary glioblastoma (primary versus secondary, P = 0.95; Supplementary Fig. S2). All except one patient with secondary glioblastoma were treated with external beam radiotherapy at the time of low-grade astrocytoma diagnosis and 13 of 17 (76%) were treated with adjuvant nitrosourea-based chemotherapy or temozolomide. Time to development of secondary glioblastoma was not correlated with any clinical feature or treatment modality (data not shown).
This established high-resolution aCGH platform ( 18) and modified segmentation analysis (ref. 13, see Materials and Methods) readily identified large regional changes of low amplitude, hereafter referred to as “gain” or “loss” (i.e., segment mean log2 ratios of >0.15 and <−0.15, respectively), as well as focal higher-amplitude CNAs representing more significant amplification or deletion events (average size, 2.78 Mb; range, 0.06-20.23 Mb; log2 ratio amplitudes ranging from +0.4 to +4.6 and from −0.4 to −3.1; see Materials and Methods). Skyline recurrence plots of primary and secondary glioblastoma profiles mirror well the frequencies of known glioblastoma loci targeted for copy number alteration (refs. 1, 19; Fig. 1) and readily show regions of gain and loss that are distinct between these classic subtypes. Ch7 gain and ch10 loss were identified in >90% of the primary glioblastomas in this sample set. Whereas it has been shown that primary glioblastoma can be divided based on the presence or absence of these chromosomal changes ( 20), the high resolution of the oligo-based platform as well as the sample set design (both primary and secondary glioblastomas) in this study allowed for identification of many distinct regions associated with the primary and secondary glioblastoma subtypes. Gain of ch19, in addition to ch7 gain and ch10 loss, was significantly overrepresented in primary compared with secondary glioblastomas (P < 0.01, FDR 10%) whereas secondary glioblastoma showed unique gains in subregions of 8q, 10p, and 12p (P < 0.01, FDR 10%). Loss of 9p and 10q was common for primary and secondary glioblastomas, reflecting the classic loss of p16 and PTEN in the two subtypes. However, loss in other chromosomal regions was overrepresented among secondary glioblastomas and reached statistical significance for 5q, 11p (P < 0.01, FDR 10%), 3p, and 4q (P < 0.017, FDR 15%), all well delineated by the array.
Unsupervised Classification of Glioblastoma Genome Recapitulates Clinical Subgroups
Historically, glioblastoma subclassification has been based entirely on clinical history. Here, we sought to determine if chromosomal aberration patterns define genomic subclasses that corroborate or expand the classic clinical subclasses. We used gNMF, an unsupervised clustering method designed to extract distinctive genomic features from aCGH profiles and proven useful for identifying novel clinical subtypes in multiple myeloma ( 15, 16). After transforming the segmented aCGH data to nonnegative values, gNMF consensus matrices were generated for ranks 2 to 8, representing attempts to cluster into 2 to 8 separate groups. Ranks K = 2 and 3 showed relatively stable cluster assignments, whereas groupings of rank ≥4 were unstable ( Fig. 2A ). The stability of these ranks was confirmed by a strong cophenetic correlation value of ∼0.99 for ranks 2 and 3 but an abrupt decrease to 0.94 with K = 4 (Supplementary Fig. S3).
The rank 2 classification (K2) essentially encapsulated the primary and secondary glioblastoma clinical classes ( Fig. 2B), 19 of 20 primary glioblastomas (identified by blue bar to right) being assigned to one cluster (K2-1) and 14 of 17 secondary glioblastomas (orange bar) assigned to the other (K2-2). The concordance between unsupervised classification and clinical class was better than that achieved by unsupervised hierarchical clustering, which was driven predominantly by gain of ch7 and loss of ch10 in the primary glioblastomas, consistent with what has been described for the genetic subgrouping of primary glioblastoma ( 20), as well as loss of ch19 in the secondary glioblastomas (Supplementary Fig. S4). Alteration of chromosome 19 illustrates an additional insight gained by applying gNMF to our data set. As reflected in the skyline recurrence plot ( Fig. 1), it has been established that chromosome 19 can be gained or lost in glioblastoma. Using gNMF, we show that gain of ch19 is frequent and exclusive to primary glioblastoma and, unexpectedly, tracks closely with gain of ch20 ( Fig. 2B). In contrast, loss of ch19 is a feature of the K2-2 genetic subgroup, almost exclusive to secondary glioblastoma, with gain of ch20 rarely seen in this subgroup. The close correlation between genomic profile and clinical class strongly supports the view that the pathogeneses of these classic clinical subgroups are driven by a large number of distinct genetic events.
The rank K = 3 classification largely preserved the K2-1 cluster of primary glioblastomas (K3-1, with 18 of 20 primary glioblastoma; blue bar, Fig. 3A ). Viewing the individual gNMF components ( Fig. 3B) indicates that assignment to K3-1 is driven by gains of chromosome 7, 19, and 20 and a small region of chromosome 12, as well as loss of chromosome 10 and smaller regions of 9 and 11, features typical of primary glioblastoma. Notably, the K3 classification reveals two distinct subclasses of secondary glioblastoma: K3-3 is dominated by regions of loss, in particular on chromosomes 6, 9, 10, 13, 18, and 19, whereas K3-2 has prominent regional gains on chromosomes 4, 8, and 12 and focal gains on ch7 and ch11.
The cluster stability for rank K = 3 gNMF strongly supports the presence of these two molecularly distinct subclasses of secondary glioblastoma in our data set, each with signature alterations correlated across the genome. To address the statistical significance of the K3-2 and K3-3 gNMF subclassification, random permutation (n = 100) of the sample labels for each chromosome was done (see Materials and Methods). This procedure maintains the distribution of CNAs among secondary glioblastoma samples overall while breaking correlation between chromosomes. The cophenetic correlations for rank 3 calculated for each of 100 iterations showed a mean of 0.918 (range, 0.839-0.965). Thus, the high correlation for rank 3 in the initial dataset (0.99) reflects significantly nonrandom correlation of CNA events defining the K3 secondary glioblastoma subclasses.
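The permutation scheme can be sketched in a few lines of Python. In this illustration each chromosome's per-sample profiles are shuffled independently, which keeps every chromosome's set of alterations intact while breaking their co-occurrence within a sample. The data layout and function names are hypothetical, and the empirical P value helper follows a common convention that is not stated in the source.

```python
import random

def permute_by_chromosome(profiles, rng):
    """Shuffle sample labels independently for every chromosome.

    profiles: {chromosome: [profile_of_sample_0, profile_of_sample_1, ...]}
    Returns a new dict; the input is left unchanged.
    """
    permuted = {}
    for chromosome, per_sample in profiles.items():
        shuffled = list(per_sample)
        rng.shuffle(shuffled)
        permuted[chromosome] = shuffled
    return permuted

def empirical_p(observed, null_values):
    """Share of permuted statistics at least as large as the observed one,
    with the usual +1 correction (an assumed convention)."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Illustrative per-sample summaries (not data from the study).
profiles = {"chr1": [0.0, 0.3, -0.2, 0.5], "chr2": [-0.4, 0.0, 0.2, 0.1]}
shuffled_profiles = permute_by_chromosome(profiles, random.Random(0))
```

Under this convention, because none of the 100 permuted cophenetic correlations reported above (maximum, 0.965) reaches the observed value of 0.99, the empirical P value would be (0 + 1)/(100 + 1), or approximately 0.01.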
Further support for the significance of these gNMF subclasses derives from our assessment of whether these two genetic subclasses of secondary glioblastoma were associated with any clinical variables. Whereas these divisions did not track with any of the baseline patient characteristics or treatment modality of antecedent low-grade astrocytoma, they seemed to correlate with time to progression of secondary glioblastoma from low-grade astrocytoma by Kaplan-Meier analysis ( Fig. 4). Whereas statistically significant differences were not achieved with this limited sample size (log-rank test, P = 0.31), it is worth noting a trend toward shorter time to tumor progression in the K3-3 group (42 months) versus the K3-2 group (100 months). In summary, the stratification of secondary glioblastoma into two genomic subclasses was statistically significant in our dataset and the classes may have further biological significance in predicting time to tumor progression. This shows that despite their shared histopathology, primary and secondary glioblastomas are molecularly distinct and that secondary glioblastoma consists of two molecular entities apparently driven by radically different pathogenetic mechanisms.
Presence of Many Recurrent and Novel Amplifications and Deletions in Glioblastoma
The gNMF clustering of genomic profiles is dominated by recurrent CNA events, which often span long regions such as chromosomes or chromosomal arms. Whereas such broad gains and losses provide useful classifying features, the more focal events revealed by the high-resolution platform in this study may hold valuable information about tumor subclass and the target genes driving these diseases. For example, whereas complete or partial gain of ch7 is not uncommon in secondary glioblastomas (10 of 17), the discrete amplification of the region containing EGFR is seen frequently and exclusively in primary glioblastomas.
We first sought to generate a complete catalog of discrete CNAs in glioblastoma, without classification by clinical history. Narrow events were catalogued using a previously described approach to defining MCRs of overlapping CNAs among sample sets ( 13, 15). Analysis of the 37 independent primary and secondary glioblastoma samples defined 222 MCRs of gain/amplification or loss/deletion based on data smoothed by segmentation (Supplementary Table S1). To identify MCRs with potentially greater pathogenetic significance, we further applied a criterion that at least one CNA of an MCR exceeded log 2 thresholds of +0.8 for amplifications and −0.65 for deletions. A total of 97 “high-priority” MCRs fulfilled this criterion (Supplementary Table S2), 41 amplifications with a median size of 0.92 Mb (range, 0.1-14.51 Mb) containing a median number of 8 known genes (range, 2-60) and 56 deletions with a median size of 1.13 Mb (range, 0.07-22.38 Mb) containing a median of 10 known genes (range, 2-233). High confidence is ascribed to these MCRs as evidenced by (i) consistent verification via real-time quantitative PCR in 34 of 36 (94%) randomly assayed high-priority MCRs (data not shown), and (ii) complete overlap with known glioblastoma loci including focal deletions encompassing CDKN2A, PTEN, and RB and focal amplifications targeting EGFR, platelet-derived growth factor receptor α (PDGFRα), CDK4, MDM2, MET, and c-MYC (Supplementary Table S1). Notably, 17 of 97 MCRs contain hotspots for proviral integration in diverse cancer models including a murine glioma model ( 21) and 12 MCRs contain microRNAs (Supplementary Table S2), several of which are overexpressed in glioblastoma ( 22).
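The high-priority criterion amounts to a simple filter, sketched below in Python. The thresholds are those quoted in the text; the function name, the MCR representation as a list of per-sample segment mean log2 ratios, and the use of strict inequalities are assumptions for illustration.

```python
AMP_EXTREME = 0.8    # log2 threshold for an extreme amplification
DEL_EXTREME = -0.65  # log2 threshold for an extreme deletion

def is_high_priority(kind, sample_log2_ratios):
    """True if at least one sample shows an extreme CNA within the MCR.

    kind: "amplification" or "deletion".
    sample_log2_ratios: segment mean log2 ratio of each sample in the MCR.
    """
    if kind == "amplification":
        return any(ratio > AMP_EXTREME for ratio in sample_log2_ratios)
    if kind == "deletion":
        return any(ratio < DEL_EXTREME for ratio in sample_log2_ratios)
    raise ValueError(f"unknown MCR type: {kind}")

# Illustrative values only, not data from the study.
focal_amp = is_high_priority("amplification", [0.2, 0.45, 1.9])
broad_loss = is_high_priority("deletion", [-0.2, -0.3, -0.5])
```

In this toy example the first MCR qualifies because one sample exceeds +0.8, whereas the second does not because no sample falls below −0.65.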
Highly Focal Amplifications and Deletions with Cancer-Relevant Genes
The resolution of the platform employed enabled precise definition of MCR boundaries, revealing 42 MCRs that were present in ≥4 (10%) tumors and spanning ≤1.0 Mb with a median of five known genes. Examination of these resident genes revealed known oncogenes, tumor suppressor genes, or their highly related homologues, most of which have not been ascribed to gliomagenesis but many of which have strong cancer relevance. Some notable genes in amplicons include EPS15 (1p32), WNT7A (3p26.3-3p24.3), and RASSF3, a RAS association family member (12q14.2), and in loss/deletion MCRs IRF2BP2, an IFN regulatory factor 2 repressor (1q42.2) and MKK4/JNKK1 (6q26). Even more striking are the profiles of 15 MCRs spanning <0.5 Mb (median, 4 genes) and present in >4 (10%) tumor samples (Supplementary Table S2, bold and italics). Among these, some notable candidates not previously ascribed to glioblastoma include extracellular signal–regulated kinase (ERK)-8 (8q23-24 amplicon), a novel member of the RAS-mitogen-activated protein kinase family that is activated by a SRC-independent mechanism, and TMPO/LAP2 (12q22 deletion), a protein that anchors Rb in the nucleoskeleton and regulates Rb function. Recognizing that the sample size is limited, it is intriguing that deletions of TMPO/LAP2 and RB are mutually exclusive.
Comparison of Primary and Secondary Glioblastoma Genomic Profiles
The differences observed in the skyline recurrence plots of primary and secondary glioblastomas ( Fig. 1) and the gNMF classification ( Fig. 2) prompted a comparison of MCRs of primary and secondary glioblastomas to identify focal common and subtype-specific genomic events that may drive pathogenesis. Primary glioblastomas contain 101 MCRs (38 amplifications, 63 deletions) with a median size of 1.28 Mb (range, 0.07-24.9 Mb; Supplementary Table S3), and secondary glioblastomas contain 135 MCRs (52 amplifications, 83 deletions) with a median size of 2.94 Mb (range, 0.06-21.03 Mb; Supplementary Table S4). Only 59 MCRs were shared, of which 24 seem to be novel loci and 35 reside within known glioblastoma loci, although it is worth noting that many of these previously reported loci are very large and our MCRs have narrowed the regions of interest considerably ( Table 1 ). Strikingly, most MCRs proved to be subtype specific (see below).
Shared MCRs. Among the 35 shared known loci (10 amplifications and 25 deletions) were deletion of RB, CDKN2A, and PTEN, and amplification of CDK4, MDM2, c-MYC, and GAC1. Of interest, the 4q11-13 amplification containing PDGFRα was present in both subtypes. Twenty-two shared novel loci (11 amplifications, 11 deletions) were quite focal—0.98 Mb median size, median 10 resident genes. Although not previously described in glioblastoma, there is a high degree of confidence that these MCRs are cancer relevant because a significant number of the MCR resident genes in the amplicons exhibit increased expression in glioblastoma relative to normal brain (P < 0.05; see Materials and Methods and Supplementary Table S2) and/or these loci and resident genes have been implicated in the pathogenesis of other cancer types. For example, the ErbB receptor ligand NRG2 (neuregulin-2), which resides in a 5q MCR amplicon, is overexpressed in glioblastoma compared with normal brain (P < 0.05, KLL/DNL dataset) and has been correlated with node positivity and worse outcome in breast cancer ( 23). The same MCR contains DTR (heparin binding EGF-like growth factor), which is up-regulated in inflammatory breast cancer and associated with poor prognosis ( 24). Another 12q13-15 amplicon with four MCR genes contains the dual specificity kinase DYRK2, which is amplified and/or overexpressed in gastrointestinal stromal tumor, esophageal cancer, and non–small-cell lung cancer ( 25). An intriguing MCR spanning ∼4 Mb at 19p13.12, which is gained/amplified in 40% of glioblastoma cases, is particularly notable for the presence of many genes almost exclusively encoding zinc finger proteins.
Similarly, recurrent deletions point to a number of prime candidates that are targeted in both subtypes. For example, ∼10% of glioblastomas sustain deletion of 2q37.1, which contains INPP5D/SHIP1, an inositol polyphosphate-5-phosphatase family member that has been implicated in leukemia and gastrointestinal stromal tumor ( 26). Focal loss on 12q24 contains three genes, SFRS8 (a splicing factor for fibronectin, among others), MMP17 (MT4-MMP), a membrane-type matrix metalloproteinase that is down-regulated in prostate cancer ( 27), and ULK1, a novel serine-threonine kinase ( 28).
Subtype-specific MCRs. Whereas subtype-specific differences are well established for a handful of loci, 60% of the MCRs were specific to either primary or secondary glioblastoma (Supplementary Tables S3 and S4, respectively). It is worth noting that this degree of difference is comparable to that reported for pancreas versus lung cancer ( 13), further substantiating the distinct pathogenesis of the classic glioblastoma subtypes. For primary glioblastoma-specific MCRs, there are 23 gains/amplifications with a median size of 0.87 Mb and median number of 13 genes. Of these 23 loci, 6 represent loci that have been significantly narrowed and 11 loci have not been described in glioblastoma, yet their cancer relevance is reinforced by validated connections to other cancer types such as 2p25 in ganglioneuroblastoma ( 29), 6p25 in uveal melanoma ( 30), and 17q21 in medulloblastoma (ref. 31; Table 2). For the loss/deletion MCRs, there are 35 loci with a median size of 1.86 Mb and median of 16 genes. For most loci, the MCR represents a significantly narrowed focal region(s) within large regions or whole chromosomal arms of loss of heterozygosity (LOH) previously described in glioblastoma. There is strong cancer relevance within these narrowed loci, with many being associated with multiple diverse cancers; some examples include loss of 3q12 in small-cell lung cancer ( 32), leukemia ( 33), and nasopharyngeal carcinoma ( 34); loss of 4q34 in gastric cancer ( 35), hepatocellular cancer ( 36), and primitive neuroectodermal tumor ( 37); and loss of 15q21 in bladder cancer ( 38) and rhabdomyosarcoma ( 39). Thus, it is likely that these candidate tumor suppressor genes in primary glioblastoma bear relevance to other cancer types.
For secondary glioblastoma-specific MCRs, there are 34 gains/amplification (median size 3.25 Mb, median of 27 genes). Consistent with previous reports of MET amplification occurring independently of EGFR amplification ( 40), we observed MET amplification only in secondary glioblastoma. Of the remaining 33 loci, none have previously been described in secondary glioblastoma and only a few have been found in glioblastoma datasets not otherwise classified. However, several loci have been reported at least by band location in other cancer types [e.g., 11q13 in head and neck squamous cell cancer ( 41) and 20q12-13 in poor prognosis esophageal cancers ( 42)], supporting the cancer relevance of these subtype specific loci. With regard to the loss/deletion MCRs, there are 54 loci with median size of 3.02 Mb and median of 27 genes. Most loci have either not been described for secondary glioblastoma or have been significantly narrowed (e.g., 19q13.12-q13.2). Commonly, large regions of LOH that are reported for glioblastoma have been narrowed in this analysis to small regions that are distinct from primary glioblastoma. Again, the potential relevance of these loci to glioblastoma pathogenesis is reflected by their association with other cancer subtypes [e.g., loss of 1q23 in medulloblastoma ( 43), 5q11 in head and neck squamous cell cancer ( 44), and 8p12 in hepatocellular carcinoma ( 45)].
Among the large number of subtype-specific MCRs, 18 of 58 primary glioblastoma-specific MCRs fulfill the high-priority criteria and several are very focal—8 amplifications and 10 deletions with median sizes of 0.31 and 0.89 Mb. In the secondary glioblastoma-specific MCRs, 34 of 88 are high-priority MCRs ( Table 2) and 5 amplifications and 5 deletions are highly focal (<1.0 Mb) with a median number of 5 genes, including an amplicon at 11q13.5-11q14.1 that contains GAB2, a docking protein involved in EGF-induced ERK activation ( 46).
In this study, gNMF-based unsupervised classification of glioblastoma genomic profiles distinguished the known clinical subgroups and identified previously unappreciated wide-scale genomic differences between primary and secondary glioblastomas. We have further identified novel subclasses of secondary glioblastoma grounded in clinical history. In addition to many classic loci, we identified more than 20 novel loci common to all glioblastoma subtypes as well as numerous novel loci specific to primary and secondary glioblastomas. These subcategories of MCRs, linked to clinical and potentially biologically important subclasses of glioblastoma, provide a molecularly grounded framework for the study of glioblastoma pathogenesis and for the more rapid prioritization of genes for validation and ultimately drug discovery. On a more technical level, our ability to use archival tissue to generate DNA of quality sufficient for analysis by high-resolution aCGH has enabled the use of clinically annotated tumor specimens that, without frozen tissue, have largely been inaccessible to gene-specific whole genome profiling for basic investigation and discovery.
Using the clinical information to generate subgroups of tumors for analysis proved useful in suggesting new pathogenetic pathways and may be useful in guiding the application of targeted therapies. The unsupervised clustering of glioblastoma into three genetic subgroups suggests distinct routes of pathogenesis and tumor biology and raises questions about whether a common phenotype results from functional redundancy within varied genetic pathways or from the existence of critical common overlap genes that ultimately determine phenotype. The answers to these questions are of prime relevance to the development of effective therapies for glioblastoma because establishing what specific genes, gene patterns, and/or genetic pathways are responsible for the uniformity of biological and clinical outcome will ultimately identify prime targets for glioblastoma therapy. Along these lines, it is worth recalling the recent clinical trials of EGFR inhibitors for recurrent glioblastoma. Whereas fewer than 10% of patients derived benefit from the EGFR inhibitors (47), there is clear evidence that clinical response requires normal PTEN activity and constitutive EGFR activation (10). On examination of our glioblastoma genomic profiles, one would predict infrequent response to EGFR inhibitors. Specifically, whereas primary glioblastoma commonly harbors amplified EGFR (with approximately half of these expressing an activated mutant allele), the majority of our cases also sustained loss of chromosome 10 encompassing PTEN. Along similar lines, examination of secondary glioblastoma profiles shows neither EGFR amplification nor activating mutations, strongly suggesting that these patients would fail to respond to EGFR inhibition regardless of PTEN status.
It stands to reason that, because treatment planning and clinical trial design have been based on a small number of signature mutations, the more complete genomic profiles of this study could provide drug response biomarkers for EGFR and other targeted therapies.
A recent study has proposed that the genomic differences between primary and secondary glioblastomas may be reflected at a transcriptional level as well (2). That study reported only 21 genes distinctly overexpressed in secondary and 58 genes in primary glioblastoma, when each glioblastoma group was compared with grade 2/3 astrocytoma. An additional 15 genes were overexpressed in both primary and secondary glioblastomas. Whereas that study did not conduct direct comparisons of primary and secondary glioblastoma expression patterns, it provided the opportunity for comparison of these expression changes with our genomic alterations and molecular classification. Of the 55 autosomal genes uniquely overexpressed in primary glioblastoma, we found 13 to fall within regions that were preferentially amplified in primary versus secondary glioblastoma, compared with 4 that we would have expected based on genome size. Of the 21 autosomal genes uniquely overexpressed in secondary glioblastoma, 3 mapped to preferentially amplified regions whereas <1 is expected by chance. Taken together, these data suggest that a subset of these expression changes is driven by primary genomic events and that the many fewer genes found by Tso et al. (2) to be differentially expressed may be due to the molecular heterogeneity of the disease that we observed within the genome. Finally, the present study, along with that of Tso et al. (2), supports the view that the glioblastoma subtypes, indistinguishable on the histopathologic level, are driven by common and distinct genetic events.
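The comparison of observed and expected gene counts above can be illustrated with a simple enrichment calculation. The sketch below is not the procedure used in this study; it assumes, purely for illustration, that each of the 55 genes falls independently into a preferentially amplified region with probability equal to the stated expectation divided by the number of genes (4/55), and it computes the binomial probability of seeing at least 13 such genes. The function name `binomial_upper_tail` and the variable names are hypothetical.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts taken from the text for primary glioblastoma.
n_genes = 55      # autosomal genes uniquely overexpressed
observed = 13     # genes falling in preferentially amplified regions
expected = 4.0    # genes expected on the basis of genome size

# ASSUMPTION: the per-gene probability is inferred from the stated
# expectation; the genomic fraction itself is not given in the text.
p_region = expected / n_genes

# ASSUMPTION: genes are treated as independent draws (simple binomial model).
p_value = binomial_upper_tail(n_genes, observed, p_region)

print(f"fold enrichment: {observed / expected:.2f}")
print(f"P(X >= {observed}) under the binomial model: {p_value:.2e}")
```

The same function could be applied to the secondary glioblastoma counts once the exact expected value, given in the text only as <1, is known. Because neighboring genes within an amplicon are not independent, a permutation of region positions would likely be a more faithful null model than this binomial approximation.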
Transformation to glioblastoma from low-grade diffuse astrocytoma in adults is considered to be an almost inevitable clinical outcome (48). Various treatment strategies have been implemented to delay transformation but, to date, no effective strategies have been identified owing mainly to the lack of understanding of this phenomenon and paucity of molecular targets. The identification of secondary glioblastoma-unique MCRs, as well as two clearly distinct genetic subgroups within the secondary glioblastoma tumors, provides an opportunity to investigate transformation to secondary glioblastoma. It is intriguing that there may be differences in clinical behavior, specifically time to transformation to glioblastoma in these subgroups (Fig. 4), although the lack of statistical significance of this finding underscores the need for an expanded sample set and an independent validation that these two classes differ by clinical progression time. Logistically, the number of candidate transformation loci is substantially reduced by subclassification of the secondary glioblastoma genome. Moreover, useful insights may derive from the identification of genomic events that track together in the subtypes. For example, the K3-3 subclass is interesting in its predominance of regional loss, suggesting that there are tumor suppressor genes embedded within the MCRs. Establishing the relevance of these loci to transformation to glioblastoma will ultimately benefit from detailed analysis of low-grade astrocytomas because genes important for tumor initiation would be present in both datasets whereas transformation genes would appear uniquely in the secondary glioblastoma. Along these lines, the chromosomal patterns within the subgroups might reflect the unique combinations of initiation and progression loci required for full transformation to glioblastoma, insights that could not be gleaned from the dataset without subclassification.
How distinct genetic programs lead to a phenotypically indistinguishable end point in glioblastoma or how these genetic profiles will influence treatment responses remains a high priority for future investigation.
Grant support: The Goldhirsh Foundation (E.A. Maher and L. Chin); the Christopher S. Elliott Glioblastoma Brain Tumor Research Fund, The Richard Cerullo Fund, and Par Fore The Cure (E.A. Maher); NIH grants RO1CA99041 (L. Chin), CA57683 (D.N. Louis), and 5P01CA95616 (R.A. DePinho, D.N. Louis and L. Chin); and the Robert A. and Renee E. Belfer Foundation Institute for Innovative Cancer Science (R.A. DePinho).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We wish to thank Debra Gigas, R.N., Lisa Doherty, RN, ANP, OCN, Jennifer Zimmerman, and Louis Ostrowsky for assistance with case identification and clinical review.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
Current address for E.A. Maher: Annette G. Strauss Center for Neuro-Oncology, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center at Dallas, Dallas, TX 75390-9186.
R.A. DePinho is an American Cancer Society Professor and an Ellison Medical Foundation Scholar.
- Received June 8, 2006.
- Revision received September 22, 2006.
- Accepted September 29, 2006.
- ©2006 American Association for Cancer Research.