Sarcomas are a biologically complex group of tumors of mesenchymal origin. Using gene expression microarray analysis, we aimed to gain insight into the cellular differentiation and oncogenic pathways active in these tumors and to identify potential biomarkers and therapeutic targets. We examined 181 tumors representing 16 classes of human bone and soft tissue sarcomas on a 12,601-feature cDNA microarray. Remarkably, 2,766 probes differentially expressed across this sample set clearly delineated the various tumor classes. Several genes of potential biological and therapeutic interest were associated with each sarcoma type, including specific tyrosine kinases, transcription factors, and homeobox genes. We also identified subgroups of tumors within the liposarcomas, leiomyosarcomas, and malignant fibrous histiocytomas. We found significant gene ontology correlates for each tumor group and identified similarities to normal tissues by Gene Set Enrichment Analysis. Mutation analysis of 275 tumor samples revealed that the high expression of epidermal growth factor receptor (EGFR) in certain tumors was not associated with gene mutations. Finally, to further the investigation of human sarcoma biology, we have created an online, publicly available, searchable database housing the gene expression profiles of these tumors (http://watson.nhgri.nih.gov/sarcoma), allowing users to explore this data set interactively and in depth.
- gene expression profile
- gene ontology
Sarcomas are malignant tumors of mesenchymal origin, with ∼15,000 soft tissue and bone sarcomas newly diagnosed in the United States annually. Although sarcomas represent only 1% of all human malignancies, they have distinctive biological characteristics, which include a high incidence of aggressive local behavior and a predilection for metastasis. Several sarcomas, such as Ewing's sarcoma, synovial sarcoma, alveolar rhabdomyosarcoma, and myxoid liposarcoma, tend to occur in younger patients and are characterized by tumor-specific chromosomal translocations. In contrast, other sarcomas, such as leiomyosarcoma and malignant fibrous histiocytoma, lack specific translocations and have a chaotic karyotype accompanied by frequent chromosome copy number changes. This latter group occurs more frequently in older adults and includes several types of sarcomas that lack known disease-specific chromosome translocations or mutations but may contain mutations in RB1, CDKN2A, and TP53. Investigation of these and other aspects of sarcoma biology has provided insights into broadly relevant fundamental mechanisms of oncogenesis (for review, see ref. 1) and may have profound implications for the development of therapeutic interventions. This is exemplified by the successful treatment of gastrointestinal stromal tumors, which frequently have activating mutations of KIT, with the tyrosine kinase inhibitor imatinib mesylate. Although genetic alterations, particularly fusion genes arising from translocations, have been identified in many sarcomas, the functions of the fusion gene products are not well understood and their downstream targets have not been fully identified (1). Additionally, tumors that lack specific chromosomal translocations may well harbor oncogenic mutations that have yet to be described. Furthermore, “second hits” (additional genetic mutations thought to be essential in cancer development) have rarely been identified in sarcomas (1).
Progress in the evaluation of sarcomas has been limited not only by their rarity but also by their histologic diversity and genetic complexity, making high-throughput tumor profiling a critical tool for advancing our understanding of sarcoma biology. Microarray studies of human sarcoma samples began with a report on seven alveolar rhabdomyosarcomas analyzed on a 1,238-feature microarray (2). Since then, remarkable advances in microarray technology have led to a growing body of gene expression studies characterizing this complex group of tumors (3–5). Recent studies have described gene signatures associated with poor clinical outcome in leiomyosarcoma (6) and Ewing's sarcoma (7), signatures for diagnostic classification (8), and novel biomarkers in dermatofibrosarcoma protuberans (9) and clear cell sarcoma (10).
Previous studies have generally focused on a limited number of histotypes and have typically examined bone and soft tissue sarcomas separately. We sought to generate a technically uniform gene expression data set that would allow a broad view of the more frequent bone and soft tissue sarcoma types. In this study, we used gene expression array analysis, denaturing high-performance liquid chromatography, and immunohistochemistry on a tissue microarray to evaluate the largest set of human sarcoma samples studied by high-throughput genetic techniques to date. Gene expression data were processed by integrating standard cluster analysis methods with more recently developed approaches, such as gene ontology analysis and gene set enrichment analysis. Our primary goal was to present an in-depth evaluation of the expression profiles of human sarcomas that could serve as a basis for identifying their key biological features. To this end, we established tumor-specific profiles with highly significant gene lists, created gene ontology profiles, identified the expression of potentially critical genes and pathways, and developed a searchable database that makes these data available to the sarcoma community. Through this comprehensive approach, we gained additional insight into the genetic diversity and complexity of sarcomas, including clues to their origin, differentiation, and pathophysiology.
Materials and Methods
Frozen tissue samples and pathologic diagnosis. Gene expression data represent 181 tumors taken from a sarcoma frozen tissue tumor bank (−80°C) located at the Cancer Genetics Branch of the National Human Genome Research Institute (Table 1). The tumors were accessioned through the Cooperative Human Tissue Network, the National Cancer Institute, the Norwegian Radium Hospital, and the Memorial Sloan-Kettering Cancer Center over an ∼15-year period. All samples had original pathology reports available, and for the majority (108 of 181 tumors), H&E-stained sections from accompanying paraffin blocks were reviewed by one of us with expertise in sarcoma pathology (C.R. Antonescu). This central pathology review was done in a semiblinded fashion, based on one representative H&E slide and limited clinical information (age, sex, and anatomic site), without access to the original pathology report or immunoprofile. An additional 38 tumors had their diagnosis confirmed by the presence of a chromosomal translocation (Ewing's sarcoma, synovial sarcoma, alveolar rhabdomyosarcoma) or of gene mutations (KIT or PDGFRA in gastrointestinal stromal tumor).
Reference cell lines. Reference RNA was derived from a pool of five sarcoma cell lines: SK-LMS-1, HT-1080, RMS-13, OSA-Cl, and TC-32. The SK-LMS-1 and HT-1080 cell lines were obtained from the American Type Culture Collection and grown in RPMI 1640 + 10% fetal bovine serum (FBS) + penicillin-streptomycin + nonessential amino acids. The RMS-13 and OSA-Cl cell lines were obtained from St. Jude Children's Research Hospital (Memphis, TN) and grown in RPMI 1640 + 10% FBS + penicillin-streptomycin + L-glutamine. The TC-32 cell line was obtained from the Laboratory of Pathology at the National Cancer Institute/NIH (Bethesda, MD) and grown in RPMI 1640 + 15% FBS.
RNA labeling and hybridization. RNA was extracted from tumors and cell lines by standard methods using tissue homogenization in Trizol reagent (Invitrogen, Carlsbad, CA), purified using RNeasy midi kits (Qiagen, Valencia, CA), and quality-checked on a 2% agarose gel. RNA labeling and hybridization were done as previously described (2, 11).⁴ Approximately 100 to 120 μg of tumor RNA and 75 to 100 μg of control RNA were oligo(dT)-primed and labeled by reverse transcription with either Cy3 dUTP (control) or Cy5 dUTP (sample; Amersham Pharmacia Biotech, Piscataway, NJ). Microarrays contained 12,601 PCR-amplified cDNAs from sequence-verified IMAGE consortium clones deposited on poly-L-lysine–coated glass slides using ink jet technology by Agilent Technologies (Palo Alto, CA). Gene names were assigned according to the UniGene human sequence collection.⁵
Image analysis and statistical analysis. Microarray slides were scanned on an Agilent DNA Microarray Scanner, and image analyses were done using DeArray software (Scanalytics, Fairfax, VA; ref. 12). Using a mean red intensity of >200 as a general quality threshold, we selected 7,788 probes and subjected them to several data analysis algorithms.⁶ For hierarchical clustering, we selected the 1,527 most highly variable genes across the data set, defined as those with a standard deviation of the log of the calibrated ratio of >0.35. The ratio values were log transformed and clustered using the Pearson correlation coefficient. Further hierarchical clustering was done on several of the larger tumor subsets: malignant fibrous histiocytomas, leiomyosarcomas, and liposarcomas. Similar parameters were applied, using a mean red intensity of >200 and a standard deviation of the log of the calibrated ratio of >0.35 within each subset of tumors (1,250 probes for malignant fibrous histiocytoma, 1,246 probes for liposarcoma, and 1,305 probes for leiomyosarcoma). Finally, using the same parameters, calibrated ratio values were log transformed, and multidimensional scaling plots and weighted gene lists were generated as previously described (13).⁴ Weighted gene lists were derived by t test for two-group comparisons, whereas the F statistic (one-way ANOVA) was used for comparisons of more than two groups. P values for the weighted gene lists were derived from 10,000 random permutations. All algorithms were implemented in MATLAB (MathWorks, Inc., Natick, MA).
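The two probe filters and the correlation-based clustering described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration under stated assumptions, not the MATLAB code used in the study: the base of the logarithm (2 here) and the linkage method (average linkage) are not given in the text, and the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def select_probes(red_intensity, ratio, min_mean_red=200.0, min_log_sd=0.35, log_base=2.0):
    """Apply the quality and variability filters to a probes x samples matrix.

    Returns (mask, log_ratio_of_kept_probes). The log base is an assumption.
    """
    red_intensity = np.asarray(red_intensity, dtype=float)
    log_ratio = np.log(np.asarray(ratio, dtype=float)) / np.log(log_base)
    # Quality filter: mean red (Cy5) intensity across samples above threshold.
    passes_quality = red_intensity.mean(axis=1) > min_mean_red
    # Variability filter: sample SD of each probe's log ratio across tumors.
    is_variable = log_ratio.std(axis=1, ddof=1) > min_log_sd
    mask = passes_quality & is_variable
    return mask, log_ratio[mask]


def cluster_samples(log_ratio, method="average"):
    """Hierarchically cluster tumors (columns) using Pearson correlation."""
    # pdist's "correlation" metric is 1 - Pearson r between rows,
    # so transpose to make each tumor one row.
    distances = pdist(np.asarray(log_ratio).T, metric="correlation")
    return linkage(distances, method=method)
```

The linkage matrix returned by `cluster_samples` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree of the kind shown in Fig. 1A. Using 1 − r as the distance groups tumors by the shape of their expression profiles rather than by overall ratio level.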
Tissue microarray. A sarcoma tissue microarray was created, using techniques described previously (14), from paraffin blocks of a subset of 374 sarcoma samples supplied by the Cooperative Human Tissue Network (123 of which overlap with the expression analysis). The sarcoma tissue microarray comprises two to three 0.6-mm tissue cores from each sample and includes 14 tumor types (Table 1).
Immunohistochemistry. Five-micrometer sections were transferred via a tape transfer system onto adhesive-coated glass slides (Instrumedics, Hackensack, NJ). Slides were deparaffinized using a xylene-ethanol series. Antigen retrieval was done by steaming for 30 minutes in antigen retrieval buffer (Vector Laboratories, Burlington, Ontario, Canada), followed by horseradish peroxidase/3,3′-diaminobenzidine immunohistochemistry using the UltraVision Detection System (Lab Vision Corporation, Fremont, CA) for rabbit and mouse antibodies and the Immunocruz Staining System (Santa Cruz Biotechnology, Inc., Santa Cruz, CA) for goat antibodies. The following primary antibodies were used: FLT1, EPHB4, JAK1 (Santa Cruz Biotechnology); EGFR (Cell Signaling Technology); WNT5A, FZD1 (R&D Systems, Minneapolis, MN). Primary antibodies were applied to tissue microarray slides and incubated at 4°C overnight. Slides were counterstained with hematoxylin and fixed. All tissue microarrays were reviewed and scored on an eight-point scoring system by one of us (P.H. Duray). Each spot received 0 to 4 points for staining intensity (0 = no staining, 1 = weak staining, 2 = moderate staining, 3 = heavy staining, 4 = very heavy staining) and 0 to 4 points for the percentage of tumor cells staining positive (0 = 0% of cells, 1 = 1–25%, 2 = 26–50%, 3 = 51–75%, and 4 = 76–100%), and the two were summed.
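Assuming that a spot's score is the sum of its intensity points and its percentage points (giving a range of 0 to 8), the scale can be written as a small Python helper. The function names are hypothetical, and the treatment of fractional percentages that fall between the whole-number bins (for example, 25.5% is assigned 2 points here) is our own choice.

```python
def intensity_points(level: int) -> int:
    """0 = none, 1 = weak, 2 = moderate, 3 = heavy, 4 = very heavy staining."""
    if level not in range(5):
        raise ValueError("intensity must be an integer from 0 to 4")
    return level


def percentage_points(percent_positive: float) -> int:
    """Bin the percentage of positively staining tumor cells into 0-4 points."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive == 0:
        return 0
    # Upper bounds of the 1-25%, 26-50%, and 51-75% bins.
    for points, upper in ((1, 25), (2, 50), (3, 75)):
        if percent_positive <= upper:
            return points
    return 4  # 76-100%


def ihc_score(intensity: int, percent_positive: float) -> int:
    """Combined score on the 0-8 scale (assumed to be a simple sum)."""
    return intensity_points(intensity) + percentage_points(percent_positive)
```

For example, a spot with heavy staining (3 points) in 60% of tumor cells (3 points) would score 6 under this assumption.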
Denaturing high-performance liquid chromatography. DNA was extracted from a total of 275 tumors from the same sarcoma frozen tissue tumor bank (−80°C; Table 1). Total genomic DNA was isolated by ethanol precipitation from the interphase of primary tumor samples homogenized in Trizol, purified by phenol-chloroform extraction, precipitated, and further purified over a Qiagen column. PCR products were analyzed on a Transgenomic Nucleic Acid Fragment Analysis System (Omaha, NE). All available samples were screened for mutations in KIT exons 9, 11, and 13, PDGFRA exons 12 and 18, and all EGFR exons (1–28). Samples with potential mutations detected by denaturing high-performance liquid chromatography were subsequently sequenced. Primer sequences are available in Supplementary Table S5.⁷
Gene ontology and gene set enrichment analysis. Gene ontology provides structured information about the molecular function, cellular localization, and biological processes of individual genes.⁸ We used the GoHyperG function from the Bioconductor project to find overrepresented ontology categories in the weighted lists of genes associated with each tumor type. For each tumor type, we obtained a list of gene ontology categories ranked by hypergeometric test P value.⁹ Gene set enrichment analysis is “designed to detect modest but coordinate changes in the expression of groups of functionally related genes” (15). In our implementation of the gene set enrichment analysis scan, we first obtained the raw data from the Genomics Institute of the Novartis Foundation (16) compendium of normal tissues, consisting of 79 normal tissues assayed in duplicate for 22,000 features representing over 13,000 unique LocusLink IDs on the Affymetrix hgU133A array. These data were normalized using robust multiarray averaging (17). For each normal tissue, all genes were ranked by a pooled-variance t statistic computed with limma, a technique particularly suited to small numbers of samples per tissue type (18). Scanning the weighted gene list for each tumor type against the ranked list for each normal tissue produces one value for each tumor-normal tissue pair; genes were mapped between the cDNA and Affymetrix platforms via LocusLink IDs. High values represent similarity in expression between the tumor and the normal tissue. We computed approximate statistical significance by randomly resampling genes from the sarcoma array 10,000 times per sarcoma group for each normal tissue type.
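The tumor-versus-normal scan and its resampling test can be sketched as follows. This is an illustration under explicit assumptions, not the study's implementation: it uses a common unweighted Kolmogorov-Smirnov-style running-sum score, takes the maximum positive deviation so that high values mean the tumor's genes sit near the top of the normal-tissue ranking, and draws random gene sets of the same size from sarcoma-array genes that also appear in that ranking. Plain gene identifiers stand in for LocusLink IDs, and all names are hypothetical.

```python
import numpy as np


def enrichment_score(n_ranked, hit_positions):
    """Unweighted running-sum enrichment score (maximum positive deviation).

    n_ranked: length of the normal-tissue ranking (best-ranked gene first).
    hit_positions: indices in that ranking of genes from the tumor gene list.
    """
    hits = np.zeros(n_ranked, dtype=bool)
    hits[np.asarray(hit_positions, dtype=int)] = True
    k = int(hits.sum())
    if k == 0 or k == n_ranked:
        raise ValueError("gene set must be a non-empty proper subset of the ranking")
    # Step up 1/k at each tumor gene, down 1/(n - k) at every other gene.
    steps = np.where(hits, 1.0 / k, -1.0 / (n_ranked - k))
    return float(np.cumsum(steps).max())


def resampled_p_value(ranked_genes, tumor_genes, array_genes, n_resamples=10_000, seed=0):
    """Compare the observed score with scores of random same-size gene sets."""
    position = {gene: i for i, gene in enumerate(ranked_genes)}
    # Only genes present on both platforms can be scored.
    pool = np.array(sorted({position[g] for g in array_genes if g in position}))
    hits = sorted({position[g] for g in tumor_genes if g in position})
    observed = enrichment_score(len(ranked_genes), hits)
    rng = np.random.default_rng(seed)
    null = np.array([
        enrichment_score(len(ranked_genes), rng.choice(pool, size=len(hits), replace=False))
        for _ in range(n_resamples)
    ])
    # Add-one correction keeps the estimated P value strictly positive.
    p = (1 + np.count_nonzero(null >= observed)) / (n_resamples + 1)
    return observed, p
```

Running `resampled_p_value` once per tumor type and normal tissue would yield the tumor × tissue matrix of values described in the text.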
Results and Discussion
Hierarchical Clustering Analysis
To gain an overview of the data, we first did unsupervised hierarchical clustering of the 181 tumors using the 1,527 most variably expressed genes within the data set (log SD > 0.35). The dendrogram divided into two main branches. The first contained tight clusters of many of the sarcoma subgroups, in particular gastrointestinal stromal tumor, osteosarcoma, rhabdomyosarcoma, Ewing's sarcoma, synovial sarcoma, hemangiopericytoma, dermatofibrosarcoma protuberans, and myxoid liposarcoma. The second branch contained the leiomyosarcomas, the majority of the malignant fibrous histiocytomas, and the remaining liposarcomas, which clustered more loosely (Fig. 1A; Supplementary Fig. S1). The liposarcoma group separated into two distinct populations: one formed a tightly clustered group, and the remainder intermingled with the malignant fibrous histiocytoma cluster. The majority of liposarcomas in the tightly clustered group were considered lower grade and were predominantly myxoid or round cell (17 samples) or well differentiated (1 sample). Only three members of this group were identified as high grade: dedifferentiated (one sample) or pleomorphic (two samples). Conversely, the majority of the liposarcomas dispersed among the malignant fibrous histiocytomas were considered higher grade, with six samples classified as dedifferentiated and three as pleomorphic. Only one tumor from this second group was characterized as myxoid. Finally, three groups formed no clusters distinct from the other groups: the peripheral nerve sheath tumors (comprising three benign schwannomas and five malignant peripheral nerve sheath tumors), the fibrosarcomas, and the miscellaneous tumors [not otherwise specified (NOS)]. The three benign schwannomas were scattered along the dendrogram and did not separate from the malignant tumors.
Several of these tumors (four of eight peripheral nerve sheath tumors, two of seven fibrosarcomas, and three of ten NOS tumors) intermingled within the main malignant fibrous histiocytoma cluster. Two of the seven fibrosarcomas clustered with the synovial sarcomas, similar to a previous expression study of soft tissue sarcomas in which the majority of fibrosarcoma samples (five of eight) clustered with synovial sarcoma (3). Finally, one malignant peripheral nerve sheath tumor clustered within the synovial sarcoma group, as was also seen in a prior study in which synovial sarcomas and malignant peripheral nerve sheath tumors clustered tightly (detailed dendrogram, Supplementary Fig. S1; ref. 19).
Weighted Gene Analysis
We next subjected the data to supervised cluster analysis to generate a weighted gene list. Because the gene selection algorithm is based on homogeneity within a group, the following criteria were applied in selecting evaluable tumors: the tumor had a specific diagnosis confirmed by pathologic review; the tumor clustered on the same branch as the other members of its diagnostic group; and there was more than one member in the group. Of the original 181 tumors, 134 met these conditions: three were eliminated for being the only member of their group (clear cell sarcoma, alveolar soft part sarcoma, chondrosarcoma), 10 tumors were undiagnosed (NOS), and 34 tumors either did not cluster with their diagnostic group (24 samples) or had an uncertain diagnosis on pathology review (10 samples). For the two diagnoses that did not form a major cluster by hierarchical clustering (malignant peripheral nerve sheath tumor, fibrosarcoma), tumors were included if the original pathology report and the diagnosis by our pathologic review were concordant. All five of the malignant peripheral nerve sheath tumors and five of the seven fibrosarcomas met this criterion.
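The inclusion rules above can be expressed as a short filter. This is a hypothetical sketch: the `Tumor` record and its fields are invented for illustration, and counting group size over all tumors (before the other exclusions) is our assumption, since the text does not say at which stage group membership was counted.

```python
from collections import Counter
from dataclasses import dataclass

# Diagnoses that formed no major cluster; these use the concordance rule instead.
NO_MAJOR_CLUSTER = {"malignant peripheral nerve sheath tumor", "fibrosarcoma"}


@dataclass
class Tumor:
    sample_id: str
    diagnosis: str                  # "NOS" for undiagnosed tumors
    review_confirmed: bool          # specific diagnosis confirmed on pathology review
    clusters_with_group: bool       # on the same dendrogram branch as its group
    report_agrees_with_review: bool = False  # original report matches our review


def evaluable(tumors):
    """Return the tumors eligible for weighted gene analysis."""
    group_size = Counter(t.diagnosis for t in tumors)
    selected = []
    for t in tumors:
        if t.diagnosis == "NOS" or group_size[t.diagnosis] < 2:
            continue  # undiagnosed or single-member group
        if t.diagnosis in NO_MAJOR_CLUSTER:
            ok = t.report_agrees_with_review
        else:
            ok = t.review_confirmed and t.clusters_with_group
        if ok:
            selected.append(t)
    return selected
```

Applied to all 181 records with the corresponding flags, such a filter would be expected to return the 134 evaluable tumors reported above.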
This gene selection analysis produced a highly significant list of 2,766 probes (of the 7,788 submitted for analysis) with P < 0.0001, including 562 probes with weights >10 and P < 0.000001 (Fig. 1B; Supplementary Fig. S2A and S2B; Supplementary Table S1). These represented 36% of the probes submitted, reflecting the extreme diversity of these tumors. Each group exhibited a highly informative list of associated genes, including numerous genes not previously associated with sarcoma. Many genes previously associated with specific tumor types were among the most highly weighted, for example, the muscle markers MYLK, CNN1, and ACTG2 in leiomyosarcoma; KIT in gastrointestinal stromal tumor; SSX1 in synovial sarcoma; and PDGFB in dermatofibrosarcoma protuberans (4, 20–23).
The top discriminators for dermatofibrosarcoma protuberans include the most highly associated probe, arylsulfatase G (ARSG), as well as NBEA, KCTD12, RRM2, LGALS1, NOTCH2, SPRY2, HCC-4, NRP1, and several collagen family members. SPRY2 is of interest because it encodes a putative signaling molecule involved in the regulation of the EGF, FGF, and Ras/MAPK signaling pathways. NRP1 is also noteworthy in that it encodes a membrane-bound coreceptor to a tyrosine kinase receptor for both vascular endothelial growth factor and semaphorin family members, and it plays a role in angiogenesis, cell survival, migration, and invasion (24). The top discriminators for Ewing's sarcoma include FVT1 (the most highly associated probe), DCC, and DKK2. DKK2 is a member of the Dickkopf gene family, which is involved in the regulation of WNT signaling. Interestingly, 9 of the top 20 discriminators in the Ewing's sarcoma weighted list are genes, or isoforms of genes, predominantly expressed in brain or neuronal tissue. The most highly associated identified probe for fibrosarcoma was PMP22. The protein tyrosine phosphatase PTPRZ1 and fibronectin 1 (FN1) were also highly expressed and heavily weighted in the fibrosarcomas. The top discriminators for gastrointestinal stromal tumor include the oncogene KIT and other top-ranking genes: ZNF41, HRASLS3, PLAT, GHR, FHL2, IGF2, and FAT. The top discriminators for hemangiopericytoma include the most highly associated probe, TLE3, which is involved in epithelial differentiation, as well as CELSR2, COL13A1, and SVIL.
The genes most highly associated with liposarcoma, discriminating it from the other sarcoma types, include several genes associated with adipocyte differentiation and fatty acid metabolism (PPARG, FABP4, FACL5), as well as SH3KBP1, a mediator of apoptotic cell death, and the candidate tumor suppressor gene AIM1. Many of the top leiomyosarcoma discriminators were related to muscle structure and function, such as MYLK, potassium/calcium channel components, CNN1, and SLMAP. However, IL4R, the developmental gene WDR1, and the cell signaling molecule RAB23 also discriminated heavily for leiomyosarcoma. Although the malignant fibrous histiocytomas did not form a cohesive group by unsupervised cluster analysis, this group does display an intriguing set of associated genes when subjected to weighted gene analysis. Among the most heavily weighted, highly expressed genes in the malignant fibrous histiocytoma set are transcription factors and genes involved in motility, adhesion, and proteolysis: RAB32, PLAU, MSN, RUNX1, and DSC2 (25–28).
Although malignant peripheral nerve sheath tumors did not form a tight cluster on unsupervised hierarchical analysis, supervised weighted gene analysis identified a robust list of associated genes. This included the most highly associated probes IRAK1, DOK1, QSN6, and COL6A3. Also among the top malignant peripheral nerve sheath tumor discriminators were ARHC and MMP2, both of which have been shown to be highly expressed in metastatic phenotypes (29, 30). The top discriminators for osteosarcoma include PPFIBP2, S100A13, and PTHR1, as well as several collagen-, cartilage-, and osteoid-associated genes: P4HA2, PLOD, COL5A1, and LUM. Several fibroblast growth factor receptors (FGFR) were also highly associated with osteosarcoma: FGFR3, FGFR2, and FGFR1. The top discriminators for rhabdomyosarcoma include both functional and structural genes, including the most highly associated probe, MYL4, as well as several cell cycle and cell signaling molecules and transcription factors: GAB1, MYCL1, MYO1E, IGF2, CCND2, PTPRF, PPFIA1, CDK6, FGFR4, GPC3, and POU4F1. Finally, the top discriminators for synovial sarcoma include FGF11, COL4A5, PBX3, BCE-1, TLE1, PDGFRA, EFNB3, NRP2, CXCL2, SHANK2, and EGFR. The entire ranked weighted gene list is available in spreadsheet format and in detailed figure form in the Supplementary Data (Supplementary Table S1; Supplementary Fig. S2A).
In-depth Tumor-specific Profiles
Liposarcoma. We next examined the more common sarcomas (liposarcoma, leiomyosarcoma, and malignant fibrous histiocytoma) for heterogeneity within groups. We first submitted the gene expression profiles of the two divergent groups of liposarcomas from the unsupervised hierarchical clustering to weighted gene analysis to determine the features distinguishing these groups. The weighted gene list and corresponding P values were generated using a t test with 10,000 random permutations, as previously described. This generated a list of 1,038 probes with P < 0.001. Interestingly, the genes most highly associated with the higher-grade tumors included several genes associated with cell adhesion, invasion, cell death, chemotherapy resistance, and metastasis, including GSTTLp28, PLAT, PLAU, FN1, THBS2, MCAM, FGFR4, FAPA, FGFR2, and FGF18 (Fig. 2A; Supplementary Table S2; refs. 28, 31–39).
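A two-group weighted gene analysis with permutation P values, like the liposarcoma comparison above, can be sketched as follows. This is a hedged illustration rather than the study's MATLAB procedure: it assumes a pooled-variance t statistic, two-sided per-gene P values obtained by randomly relabeling the tumors, and an add-one correction; treating |t| as the gene weight is likewise our assumption.

```python
import numpy as np


def pooled_t(x, labels):
    """Pooled-variance two-sample t statistic for every gene (row) of x."""
    a, b = x[:, labels], x[:, ~labels]
    n_a, n_b = a.shape[1], b.shape[1]
    pooled_var = ((n_a - 1) * a.var(axis=1, ddof=1)
                  + (n_b - 1) * b.var(axis=1, ddof=1)) / (n_a + n_b - 2)
    # Genes with zero variance in both groups would divide by zero (inf/nan).
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))


def permutation_p_values(x, labels, n_perm=10_000, seed=0):
    """Per-gene two-sided P values from random relabelings of the tumors.

    x: genes x tumors matrix of log ratios; labels: True for group 1.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    rng = np.random.default_rng(seed)
    t_obs = pooled_t(x, labels)
    exceed = np.zeros(x.shape[0])
    for _ in range(n_perm):
        t_perm = pooled_t(x, rng.permutation(labels))
        exceed += np.abs(t_perm) >= np.abs(t_obs)
    return t_obs, (exceed + 1) / (n_perm + 1)
```

Genes could then be ranked by `np.argsort(-np.abs(t_obs))` and kept if their P value falls below 0.001. For comparisons of more than two groups, the F statistic (for example, `scipy.stats.f_oneway` per gene) would take the place of t in the same permutation loop.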
Leiomyosarcoma. In contrast to the liposarcoma group, there was surprisingly little variability within the leiomyosarcoma group. Unsupervised clustering showed no discernible separation by anatomic site, tumor grade, or metastatic status. Supervised analysis comparing uterine or vaginal leiomyosarcomas with those from other anatomic sites revealed only 25 genes with P < 0.001 discriminating these two groups (Supplementary Table S3). These genes were primarily related to tissue type and included regulators of urogenital differentiation, development, and growth: ESR1, HOXA10, PBX1, and FAT (40–43).
Malignant fibrous histiocytoma. The malignant fibrous histiocytomas, which are poorly differentiated tumors of uncertain histogenesis, proved to be a rather complex group. Although this diagnostic category is controversial, it is generally accepted as a distinct subset of soft tissue sarcoma and remains the most common diagnosis among adult soft tissue sarcomas. Four histologic subsets of malignant fibrous histiocytoma have been described: storiform-pleomorphic, myxoid (myxofibrosarcoma), inflammatory, and giant cell types (44). Although malignant fibrous histiocytoma can express some muscle-associated proteins by immunohistochemistry, it is controversial whether this reflects myofibroblastic differentiation or derivation from smooth muscle tissue. Malignant fibrous histiocytomas are often difficult to distinguish from other pleomorphic sarcomas, and diagnostic certainty is hard to achieve (44). Accordingly, on unsupervised clustering, the majority of malignant fibrous histiocytomas coclustered with the more poorly differentiated tumors, forming a large branch that included dedifferentiated and pleomorphic liposarcomas, malignant peripheral nerve sheath tumors, and sarcomas NOS. Interestingly, the adjacent branch of the unsupervised clustering dendrogram contained the leiomyosarcomas (Fig. 1).
As with the leiomyosarcomas, neither supervised nor unsupervised clustering analysis of malignant fibrous histiocytoma significantly separated the tumors by anatomic site, tumor grade, or metastasis. However, unsupervised hierarchical cluster analysis of the malignant fibrous histiocytomas alone identified two groups of nearly equal size. When these two groups were subjected to weighted gene analysis and gene ontology profiling, an interesting pattern emerged. A list of 279 differentially expressed genes with P < 0.001 was generated, with 41 genes up-regulated in the first group and the remainder up-regulated in the second group (Fig. 2B; Supplementary Table S4). The genes associated with the first group carried a muscle profile, with myosin X, sarcoglycan β, and tenascin C among the 10 most significantly expressed genes. In contrast, the second group of tumors revealed an abundance of immune regulatory genes, with HEM1, MX1, DAP10, PLCG2, and FOLR3 constituting the five most highly weighted genes. We then mapped these two gene lists to the biological process gene ontology classification. Overwhelmingly, the genes expressed in the first group belonged to the ontologic family of motor activity, and those of the second group belonged to the ontologic families of immune activity and cell adhesion (Fig. 2C). Storiform-pleomorphic and myxoid subtypes populate both groups of tumors equally. However, the distinction between malignant fibrous histiocytomas with myogenic differentiation and those with inflammatory characteristics could ultimately have substantial clinical relevance. Myogenic differentiation in malignant fibrous histiocytoma and undifferentiated sarcomas has been shown to correlate with a poor clinical prognosis (45). The significance of the inflammatory signature we find in almost half of our malignant fibrous histiocytomas is less clear.
Although an inflammatory subtype of malignant fibrous histiocytoma has been described, these tumors are rare, the classification is controversial, and the subtype is now thought to encompass misdiagnosed dedifferentiated liposarcomas (46). Given this rarity, it is unlikely that the inflammatory signature we found in almost half of our cases represents this subclass of tumors.
Further Evaluation of Selected Pathways
Tyrosine kinases. As exemplified by the success of imatinib in gastrointestinal stromal tumor, receptor tyrosine kinases are attractive therapeutic targets (47), and various receptor tyrosine kinase inhibitors are currently being evaluated in clinical and preclinical trials (48). We found highly expressed tyrosine kinases or receptor tyrosine kinases associated with half of the tumor groups. Along with known highly expressed kinases, such as KIT in gastrointestinal stromal tumor and PDGFRB in dermatofibrosarcoma protuberans, we found JAK1 in Ewing's sarcoma, FLT1 in hemangiopericytoma, EGFR and PDGFRA in synovial sarcoma, and several FGFRs in osteosarcoma (Table 2). Because of their potential clinical relevance, we chose to evaluate several of these proteins on our tissue microarray. Immunohistochemistry confirmed the presence of many of these proteins (see Supplementary Data for all immunohistochemistry images). For example, all eight Ewing's sarcomas on the tissue microarray stained for JAK1, with a minimum score of 3+. Similarly, each of the three hemangiopericytomas stained heavily for FLT1, with scores of 7+ or greater. All 13 evaluable synovial sarcomas had moderate to heavy staining for EPHB4, 9 of 13 with scores of >6+ and the remaining four with scores of 3+ to 5+. In the biphasic synovial sarcomas, the glandular epithelial components stained most intensely for EPHB4 (Supplementary Fig. S3A-C).
Since the development of the EGFR inhibitor gefitinib, several clinical trials have evaluated its efficacy, and subsequent studies have examined whether EGFR expression predicts response to gefitinib. In lung cancer, it is the presence of activating mutations in the EGFR gene, rather than high expression, that predicts response (49). In our data set, we found high EGFR expression in several tumors, particularly the synovial sarcomas, in which EGFR was the 25th most highly weighted gene on weighted analysis. These observations were confirmed by correspondingly intense immunohistochemical staining, with 8 of 16 evaluable tumors on the tissue microarray having high staining scores of >6, two having moderate scores of 3 to 5, two having low scores of 1 to 2, and five with no detectable EGFR staining (Supplementary Fig. S3D). However, when the complete panel of 275 sarcoma DNAs (Table 1) was screened for mutations in all 28 exons by denaturing high-performance liquid chromatography and sequencing, no mutations were identified. This suggests that EGFR inhibition may not be an optimal therapeutic strategy for sarcomas.
WNT Signaling Pathway
The WNT signaling pathway, which is involved in the development of the brain and peripheral nervous system, has been found to play a critical role in the formation of several cancers (50), including synovial sarcoma (3, 19, 51). We also found evidence implicating this pathway in synovial sarcoma. WNT5A expression was pronounced, as was the expression of the WNT signaling targets FZD1, CDH4, EN2, TLE4, and TLE1. TLE1, WNT5A, FZD1, and TLE4 ranked among the top 50 discriminating genes for synovial sarcoma, and both WNT5A and FZD1 also stained heavily on the tissue microarray. All of the synovial sarcoma samples stained positive for WNT5A, with 11 of 16 tumors scoring >6+ for WNT5A and 9 of 16 tumors scoring >6+ for Frizzled (FZD1) (Supplementary Table S1; Supplementary Fig. S4).
Homeobox and Early Developmental Genes
Homeobox genes are involved in early embryonic development and the determination of cell fate, and there is interest in defining the relationship between deviant expression of these early developmental genes and cancer. Specifically, aberrant homeobox gene expression has been linked to the development of leukemia, testicular cancer, and breast carcinoma, as well as several other tumors (52). The possibility that sarcomas are derived from early mesenchymal stem cells or pluripotent progenitor cells (53) makes this group of genes particularly intriguing. Several homeobox genes have been identified in the gene expression profiles of various sarcomas: MEOX2 in synovial sarcoma (3), HOXA5 in liposarcoma (5), and MEOX1 in dermatofibrosarcoma protuberans (54). In our study, we confirmed these associations and identified several other highly expressed and statistically significantly associated homeobox genes (Table 3).
To examine more global differences in gene expression between sarcoma types, we mapped the genes from the weighted gene list for each tumor type to the biological process gene ontology classification and obtained a list of gene ontology categories ranked by hypergeometric test P value. Many tumor types had multiple categories of both biological and statistical interest (complete gene ontology tables are available in the Supplementary Data). In addition to providing a high-level description of some of the biological processes that distinguish these tumors from each other, many of the gene ontology lists suggest a tissue-of-origin effect. Tissue-type influences are clear: angiogenesis and blood vessel development in hemangiopericytoma, muscle development and regulation of muscle contraction in rhabdomyosarcoma and leiomyosarcoma, fatty acid metabolism in liposarcoma, skeletal development in osteosarcoma, and neurogenic regulation in malignant peripheral nerve sheath tumor. In the tumors whose tissue of origin is unknown, Ewing's sarcoma and synovial sarcoma, no clear tissue association emerges. Cellular pathway categories are also prominent in the gene ontology classification, such as phospholipase C activation in dermatofibrosarcoma protuberans, FGFR signaling in rhabdomyosarcoma, and transcription regulation and the WNT signaling pathway in synovial sarcoma. WNT signaling, phospholipase C, and FGFR signaling are all thought to play key roles in the oncogenesis and metastatic potential of tumors ( 50).
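The category ranking above relies on a one-sided hypergeometric test for over-representation of a gene ontology category in a tumor type's weighted gene list. As a minimal sketch, assuming the background is the set of annotated probes on the array and that no multiple-testing correction is applied (neither detail is specified here), the P value for one category and the ranking across categories could be computed as follows. The function names `hypergeom_enrichment_p` and `rank_categories` are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric P value, P(X >= k).

    N: genes in the background (assumed: all annotated probes on the array)
    K: background genes annotated to the GO category
    n: genes in the weighted list for one tumor type
    k: genes in the weighted list that fall in the category
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probability of drawing exactly i category genes, for i = k..min(K, n).
    # math.comb returns 0 when the lower index exceeds the upper, covering impossible draws.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / comb(N, n)


def rank_categories(weighted_genes, annotations, background):
    """Return (category, overlap, P) tuples sorted by ascending P (no FDR correction)."""
    universe = set(background)
    hits = set(weighted_genes) & universe
    N, n = len(universe), len(hits)
    results = []
    for category, members in annotations.items():
        members = set(members) & universe  # count only genes present on the array
        k = len(hits & members)
        results.append((category, k, hypergeom_enrichment_p(k, n, len(members), N)))
    return sorted(results, key=lambda r: r[2])
```

For example, a category whose three annotated genes all appear in a three-gene list drawn from a ten-gene background receives P = 1/120, the probability of that overlap arising by chance.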
Gene Set Enrichment Analysis
We used gene set enrichment analysis to further explore similarities between sarcoma gene expression and normal tissues, using the publicly available normal tissue gene expression data from the Genomics Institute of the Novartis Research Foundation. Gene set enrichment analysis generated a table of 936 P values (13 tumor types × 72 normal tissues). The −log10(P) values of this 936-member matrix were used to construct a heatmap by clustering tumor types on the x-axis and normal tissues on the y-axis with the Euclidean distance metric ( Fig. 3 ). Several tumor types, though not all, show striking similarities to particular normal tissues. Particularly interesting is the unanticipated similarity between the profiles of Ewing's sarcoma and synovial sarcoma, both of which resemble several brain tissues; this supports the position that these tumor groups exhibit features of neuroectodermal differentiation. Other similarities exist between the malignant peripheral nerve sheath tumor and malignant fibrous histiocytoma profiles, and between the dermatofibrosarcoma protuberans and fibrosarcoma profiles. The inflammatory component of the malignant fibrous histiocytoma tumors is also apparent by gene set enrichment analysis.
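The heatmap construction described above can be sketched in two steps: transform the P-value matrix to −log10(P), then order rows by agglomerative clustering on Euclidean distance. The linkage rule is not stated in the text, so average linkage here is an assumption, as are the floor value `min_p` (used to keep P = 0, possible with permutation tests, finite) and all function names.

```python
import math


def neg_log10_matrix(pvals, min_p=1e-10):
    """Convert a tumor x tissue P-value matrix to -log10(P).

    P values below min_p (an assumed floor) are clamped so the log stays finite.
    """
    return [[-math.log10(max(p, min_p)) for p in row] for row in pvals]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_order(rows):
    """Agglomerative clustering with average linkage; returns a dendrogram leaf order.

    Each cluster is a list of row indices; the two closest clusters are merged
    repeatedly, and concatenating the merged lists yields the display order.
    """
    if not rows:
        return []
    clusters = [[i] for i in range(len(rows))]

    def dist(c1, c2):
        # Average of all pairwise Euclidean distances between the two clusters.
        total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return clusters[0]
```

Applying `average_linkage_order` to the rows of the transformed matrix gives a tumor-type order; applying it to the transposed matrix, `[list(col) for col in zip(*m)]`, gives a normal-tissue order, so both heatmap axes can be clustered the same way.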
This large study of multiple sarcoma histotypes yielded rich, highly informative expression profiles. Comparison of our results with previously published studies reinforces the importance of certain specific pathways. For instance, genes of the WNT-frizzled signaling pathway (WNT5A, FZD1, and TLE1), as well as MEOX2 ( 3), EPHB3 ( 19), and SALL2 ( 55), have been found in the profiles of synovial sarcoma in previous gene array studies and are again heavily weighted and highly expressed in our synovial sarcoma samples. Similarly, HOXA5 in liposarcoma ( 5), PRKCQ in gastrointestinal stromal tumor ( 4), MEOX1 and EGR2 in dermatofibrosarcoma protuberans ( 54), and IGF2 in rhabdomyosarcoma ( 56) have been implicated in previous studies and are prominent in our profiles as well.
Although clinical data were not available for these samples, the expression profiles of these tumors are highly relevant to translational research. The cell signaling pathways, cell surface adhesion molecules, receptor tyrosine kinases, growth factors, transcription factors, and early developmental genes expressed in these tumors identify potential candidates for therapeutic intervention and diagnostic development. In this report, we have highlighted several of the most interesting aspects of these data. However, many more facets merit further examination, particularly in the context of newly developed targeted therapies.
To facilitate exploration of these data, we have created an online, publicly available, searchable database (http://watson.nhgri.nih.gov/sarcoma) housing the gene expression profiles of the 134 tumors profiled through weighted gene analysis. This database allows the user to browse the expression profiles of the 7,788 highest-quality probes from the gene expression array. The user can search for genes of interest, plot their profiles, and identify genes with similar expression patterns. The database also supports searches by tumor subclass and chromosome region. The complete raw data are also available through the Gene Expression Omnibus (GEO) data repository under accession number GSE2553. Mining this information provides an opportunity to identify new markers for sarcoma classification, potential prognostic indicators, and oncogenic pathways, which may ultimately guide the development of targeted therapeutics. In summary, we have presented the largest collection of expression profiles from human soft tissue and bone sarcomas to date. We have highlighted some of the most interesting and potentially clinically relevant features of these data based on hierarchical clustering, weighted gene analysis, gene ontology, and gene set enrichment analysis. We have also made available a database that will allow other investigators to use this data set to continue the process of gene discovery in these tumors.
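The database lets users identify genes whose expression patterns resemble a gene of interest, but the similarity measure it uses is not stated here. As a minimal sketch, assuming Pearson correlation across tumor samples and complete profiles with no missing values, such a search could rank genes as follows. The functions `pearson` and `similar_genes` and the profile dictionary layout are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: correlation undefined, treated as no similarity
    return sxy / math.sqrt(sxx * syy)


def similar_genes(query, profiles, top=5):
    """Rank genes by correlation with the query gene's profile (highest first).

    profiles maps gene symbol -> list of expression values, one per tumor sample,
    with samples in the same order for every gene (assumed layout).
    """
    target = profiles[query]
    scores = [(gene, pearson(target, p)) for gene, p in profiles.items() if gene != query]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top]
```

Correlation is used here because it compares the shape of a profile across tumors rather than its absolute level; a real implementation would also need a policy for missing array spots.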
We thank the following individuals for their efforts in this study: Suzanne Allander, Michael Bittner, Peter Borys, Jenna Cherchio, Yu-Waye Chu, Robert Cornelison, Javed Khan, John Kakareka, Marc Ladanyi, Darryl Leja, John Leuders, Greg Maher, Livia Mezinos, Ola Myklebost, Lauren Schuler, Jun Wei, and Michael Wu.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received May 25, 2005.
- Revision received July 19, 2005.
- Accepted August 1, 2005.
- ©2005 American Association for Cancer Research.