Microarrays have been used to identify genes involved in cancer progression. We have now developed an algorithm that identifies dysregulated pathways from multiple expression array data sets without a priori definition of gene expression thresholds. Integrative microarray analysis of pathways (IMAP) was done using existing expression array data from localized and metastatic prostate cancer. Comparison of metastatic cancer and localized disease in multiple expression array profiling studies using the IMAP approach yielded a list of about 100 pathways that were significantly dysregulated (P < 0.05) in prostate cancer metastasis. The pathway that showed the most significant dysregulation, HIV-I NEF, was validated at both the transcript level and the protein level by quantitative PCR and immunohistochemical analysis, respectively. Validation by unsupervised analysis on an independent data set using the gene expression signature from the HIV-I NEF pathway verified the accuracy of our method. Our results indicate that this pathway is especially dysregulated in hormone-refractory prostate cancer. [Cancer Res 2007;67(21):10296–303]
- Signaling pathways
- Bioinformatics and Computational Molecular Biology
- Genitourinary cancers: prostate
- Signal transduction pathways
- Metastasis/metastasis genes/metastasis models
- Tumor Progression, Invasion, and Metastasis
Prostate cancer is the second leading cause of cancer-related deaths in men after lung cancer ( 1). The mechanism of progression from a clinically localized disease to metastatic cancer is not well understood. Moreover, metastatic cancers inevitably become unresponsive to androgen withdrawal therapies. A clearer understanding of the underlying mechanisms would benefit the design of more effective clinical intervention strategies.
A popular approach in understanding the development of various types of cancers has been through the employment of genome-wide expression array analysis that has yielded a vast amount of information about marker genes involved in disease progression. The conventional method of analyzing microarray data has been to systematically examine the pattern of regulation of individual genes (up-regulation/down-regulation) and then to study the most highly dysregulated genes in greater detail. This approach has been useful both in dissecting the functionality of various genes in cancer progression and in correlating gene expression with clinical outcome ( 2– 5).
However, focusing on individual genes in a microarray data set is not the most efficient method for making use of the vast amount of genome-wide information because only a few highly up-/down-regulated candidate genes can be validated and studied in detail at any given time. For instance, an expression array comparison between benign and prostate cancer tissue yielded >1,000 genes that were significantly up-regulated (corrected P value, Q value <0.05; refs. 2, 6). Moreover, because proteins are known to function in regulated networks or pathways, a bioinformatics approach that would help identify the dynamics of these protein interactions would substantially improve the analysis of microarray data. Additionally, genes that are unaltered or minimally altered may become significant in the context of a biological pathway. Hence, such an approach would be useful in understanding the cross-talk between pathways that confer upon cells and the transformed phenotype and would maximize the information that can be gleaned from genome-wide expression array data.
Recently, methods have been developed toward analysis of sets of genes or pathways ( 7– 13) from expression array data. Gene set enrichment analysis (GSEA; ref. 9) ranks gene sets by assigning scores according to their enrichment in a particular microarray data set. Additional methods have been developed on the principles of GSEA ( 7, 14, 15). Other “module”-based approaches (sets of genes that are biologically related; refs. 8, 10) require the definition of a threshold in terms of fold change, or statistical significance, in expression and, therefore, do not incorporate all the genes in the identification of dysregulated gene sets or pathways. All the methods described are focused on analyzing individual expression array data sets. Even when multiple data sets are used, the results of the pathways are computed separately for each study. We present an alternate method, integrative microarray analysis of pathways (IMAP), that allows for meta-analysis of multiple sets of microarray data and can query multiple pathways simultaneously determining the significance of each pathway. We used IMAP to study pathways dysregulated in metastatic prostate cancer using the data from the profiling studies available on the Oncomine database. We validated our results using quantitative PCR (qPCR) and in situ protein expression analysis in a prostate cancer tissue microarray.
Materials and Methods
IMAP. The data from multiple expression array studies was obtained from the Oncomine database. 8 We included expression array data from three different studies that were found to provide consistent information when analyzed for similarity. To assess for consistency between the three studies, Pearson correlation was computed pair-wise between the mean values of common genes. The three studies showed significant positive pair-wise correlation. The characteristics of the samples included in the three data sets are presented in Supplementary Table S1. The pathways on Biocarta 9 (n = 315) and KEGG 10 (n = 137) databases were used as pathway references for the analysis.
Meta-analysis. P values were calculated for all genes within each data set using the Mann-Whitney U test by comparing localized prostate cancer and metastatic samples. The P values were then converted to Z scores by mapping onto a standard normal curve. The sign of a gene's Z score is then defined to be positive if the median expression level within metastatic samples is greater than that in the localized samples and negative otherwise.
For each gene, Z scores from across studies were combined using the Liptak-Stouffer method ( 16) to obtain a single Z score ( Fig. 1A ). This method for combining different studies requires computing a weighted Z score for each gene:where m is the number of studies, Zg,i is the Z score of the gene g in the i-th study, and wi is the weight of the i-th study. The weights can be assigned according to different criteria like the sample size, recent studies, etc. In the analysis conducted, the criterion of sample size was selected to assign weights.
Pathway analysis. The basic idea of our approach is to assign a score to each pathway and to compute its significance. To begin with, a score sg is assigned to every gene in the pathway P.where pg is the P value computed from the corresponding Zcomb,g. In this case, a gene with a low P value will have a high score. The gene scores are then summed up for all the genes constituting the pathway P, obtaining therefore a pathway score S:
The significance of S was then computed by iteratively computing a score S* on randomly sampled genes, following the above equation, and counting the number of times S* is >S. One thousand random permutations were done, and a P value for the pathway is therefore defined as this count divided by 1,000.
Pathway validation on an independent data set. The Glinsky et al. data set ( 5) was used for the validation of the HIV-I NEF pathway. This data set includes 79 localized prostate cancer samples and 8 metastatic samples. The cases are well annotated by Gleason grade and biochemical recurrence (PSA recurrence following prostatectomy). Hierarchical clustering was done using the dChip software. 11 We used centroid linkage and correlation as distance metric. The P value of the sample cluster is computed via the hypergeometric distribution.
Tissue samples and tissue microarray. Four unmatched localized and hormone-naïve lymph node metastatic samples obtained from the radical prostatectomy program (University of Ulm) were used for the validation studies by qPCR. A progression tissue microarray (TMA) was constructed from benign prostatic tissues (n = 10), localized (n = 36), and metastatic prostate cancer samples (18 hormone-naïve metastases, 18 hormone-refractory metastases). Each sample was represented by four TMA cores.
RNA extraction and cDNA synthesis. Frozen tissue blocks with at least 90% tumor were selected. The tissues were sectioned and grossly dissected to enrich for tumor glands before RNA isolation. Six 10-μm sections were used for RNA extraction. Total RNA was isolated using the Ribopure RNA isolation kit (Ambion Inc.) according to the manufacturer's instructions. The RNA was reverse transcribed to generate first-strand cDNA using the TaqMan reverse transcription kit (Applied Biosystems) with random hexamer priming.
Real-time qPCR. Prevalidated TaqMan gene expression assay probes and primers (Applied Biosystems) were used for the validation of a set of genes [nuclear factor of κ light polypeptide gene enhancer in B cells 1 (NFKB1), nuclear factor of κ light polypeptide gene enhancer in B cells inhibitor α (NFKBIA/IKBα), BCL2-associated athanogene 4 (BAG4)/Silencer of death domains (SODD), baculoviral IAP repeat-containing 2 (BIRC2/cIAP1), caspase-7 (CASP7), TNF receptor-associated factor 1 (TRAF1), TNF receptor-associated factor 2 (TRAF2)] from the pathway showing the highest dysregulation (HIV-I NEF). Real-time PCR was done using Opticon monitor 2 (Bio-Rad Laboratories). Quantitation was carried out using the ΔΔCT method. Average expression of the localized prostate cancer cases was considered for the calculation of fold changes. GraphPad Prism software was used for plotting the data.
Immunohistochemistry. The antibodies used for immunohistochemistry were NFKB1 (Biolegend), NFKBIA/IκBα (Cell Signaling), SODD/BAG4 (Oncogene Research products), TRAF1 (Santa Cruz Biotechnology), and TRAF2 (Cell Signaling). Immunohistochemistry was done as described earlier ( 17). Briefly, 6-μm sections were deparaffinized and treated with 10 mmol/L citrate buffer (pH, 6.0). Antigen retrieval was done by pressure cooking. The following dilutions of the different antibodies were used: SODD (1:100); IκBα (1:50, overnight PC); TRAF1 (1:50, overnight PC); NFKB1 (1:50); and TRAF2 (1:50). The TMAs were evaluated using a semiautomated quantitative image analysis system ACIS II (Chromavision Medical Systems Inc.) that has been previously validated ( 18, 19). The areas for analysis were selected by the study pathologist (J.-M.M). Cores/cases that were deemed non-assessable by the pathologist were excluded from the analysis. The results were exported and analyzed using the SPSS software.
Pathway analysis. We used three data sets for this study ( 2, 20, 21). These studies used either the Affymetrix or the spotted cDNA microarray platforms for the gene expression analysis. We used the gene symbols to map the different genes across the platforms. The data were downloaded from the Oncomine database. This database provides preprocessed data sets, i.e., “all data are log-transformed, median centered per array, and the SD is normalized to 1.” The three data sets used contained benign samples, localized prostate cancer, and a mixture of hormone-naïve and hormone-refractory metastatic samples. The Mann-Whitney U test was used to score genes for being differentially expressed in metastases compared with localized prostate cancers. The gene lists obtained from each study were then merged as described in Materials and Methods ( Fig. 1A) to arrive at a meta-gene list. This list was then used for pathway analysis. We found 104 pathways significantly dysregulated in metastasis (P < 0.05). Table 1 shows the top 10 dysregulated pathways, which include the cell cycle pathway, the nuclear factor-κB (NF-κB) pathway, the mitogen-activated protein kinase (MAPK) signaling, and the cell adhesion pathways that have been implicated to play a role in the development of prostate cancer ( 22). The pathway that was found to be maximally altered was the HIV-I NEF pathway, which comprises the tumor necrosis factor receptor (TNFR) and the fas receptor signaling pathways ( Fig. 1B). It was seen that the TNFR arm was present in multiple pathways that were significant in the analysis, indicative of its importance. The TNFR pathway involves the activation of NF-κB heterodimer by TNFα and mediates cell survival.
Validation of the HIV-I NEF pathway on the transcript level. To validate the IMAP results, the expression of a few key genes in the NF-κB pathway that were significantly dysregulated in all three data sets was determined by real-time qPCR analysis using unmatched localized (n = 4) and metastatic samples (n = 4). The details of the regulation of the individual genes are shown in Fig. 2 . Agreement was seen in six of the seven validated genes. The regulation of gene expression of the individual genes within the pathway validated by qPCR is depicted in Fig. 3 . The solid up and down black arrows indicate the up- or down-regulation of gene expression. The survival pathway mediated by NF-κB seems to be up-regulated as seen by the increase in TRAF2 expression and BIRC2 expression (data not shown).
Validation of the HIV-I NEF pathway at the protein level. We further examined the in situ protein expression of the genes that were consistently dysregulated by qPCR in a larger number of samples using a progression TMA ( Fig. 4 ). The TMA data showed that NFKB1, NFKBIA/IκBα, and TRAF1 were up-regulated in localized prostate cancer and down-regulated in hormone-refractory metastasis. The results with TRAF2 were not conclusive. BAG4/SODD showed an increasing trend in expression from benign tissues to hormone-naïve metastasis, but was also down-regulated in hormone-refractory metastasis (Supplementary Fig. S1). Hence, it seems that this pathway is dysregulated particularly in the hormone-refractory metastasis compared with localized prostate cancer. In addition, increase in nuclear localization of NFKB1 was seen in the metastatic samples, indicating that although the levels of NFKB1 are decreased in hormone-refractory metastasis, the protein present is transcriptionally active, which could be mediated by low levels of IκBα.
We further investigated the reciprocal behavior of the proteins by evaluating protein ratios of NFKB1 and its inhibitor or target proteins (NFKB1:IκBα, NFKB1:TRAF1, NFKB1:TRAF2, and TRAF1:TRAF2; Supplementary Fig. S2). We considered intraclass and interclass expression heterogeneity. We found heterogeneity in the protein expression among the samples belonging to each class. In general, a large percentage of samples expressed low ratios of proteins in benign tissues. The majority of samples expressed high ratios of the proteins in hormone-naïve metastases and localized cancers. The hormone-refractory metastasis, on the other hand, resembled the benign tissues. In the case of TRAF1:TRAF2 ratios, it was seen that about 40% of hormone-refractory metastasis had high TRAF2 levels, which indicates that the NF-κB pathway is activated in hormone-refractory metastasis.
Validation on an independent data set. The gene signature constituting the HIV-I NEF pathway was further validated on an independent prostate cancer microarray data set (ref. 5; Fig. 5 ). The Glinsky et al. ( 23) data set is well characterized and has been used for other validation studies. By reducing the expression array signature to the gene space including the gene signature in the HIV-I NEF pathway and performing hierarchical clustering, the localized prostate cancer segregated from metastatic samples. By further reducing the gene space, it was seen that 26 genes were sufficient to distinguish the two groups (data not shown). Principal component analysis showed that the metastatic samples were clustered closely with the samples that had biochemical recurrence (Supplementary Fig. S3). Furthermore, these genes were able to significantly distinguish the samples that did not recur (Supplementary Fig. S4). Validation of additional top ranking pathways (Supplementary Fig. S5) showed that the pathways identified by IMAP could distinguish the metastatic samples from localized prostate cancer.
We have developed a method to carry out meta-analysis of altered pathways from multiple micorarray data sets. In the current study, we show how this approach can be used to identify critical pathways in prostate cancer progression. Several other approaches have been developed toward determining the regulation of molecular interactions from microarray data sets. Some of these like the GSEA depend on defining sets of genes and querying them on a single data set ( 7, 9, 12– 14), whereas others have focused on finding sets of genes that are biologically related, e.g., “modules” that are “over-represented” in the up- or down-regulated gene lists after defining a threshold for fold change in expression ( 8, 10). IMAP differs from these methods by allowing the combination of data from multiple data sets, leading to the identification of significantly dysregulated biological pathways without a priori definition of thresholds. Hence, it takes into account genes that show a marginal variation but become significant in the context of the pathway as a whole. It is therefore possible to identify genes that were not previously described in literature because of their apparent lack of association with prostate cancer when considered individually. However, those genes can function together in a pathway and become relevant for disease progression. Pathways are identified as being dysregulated instead of being overexpressed or underexpressed.
Using this strategy, we found several pathways as being dysregulated in metastatic prostate cancer, like NFKB, transforming growth factor-β (TGFβ) signaling, integrin signaling, and cell cycle pathways, that have been previously implicated in prostate cancer metastasis ( 22, 24, 25). The most significantly dysregulated pathway, the NF-κB arm of the HIV-I NEF pathway ( Fig. 3), consists of activation of NF-κB by TNFα. The binding of TNFα to TNFR results in the dissociation of the inhibitory protein SODD and recruitment of the adapter protein TRADD. TRADD in turn binds to additional adapter proteins, RIP1 and TRAF2, which leads to the recruitment and activation of the IKK complex ( 26). This activation results in the phosphorylation and dissociation of IκBα from the NF-κB heterodimer and subsequent nuclear translocation of the active NF-κB heterodimer. Nuclear translocation results in the transcription of the target survival pathway genes that include the TRAF proteins and the inhibitors of apoptosis (IAP).
Validation by qPCR and immunohistochemistry showed this pathway to be dysregulated especially in hormone-refractory metastasis. The differences that we detected between localized prostate cancer and hormone-naïve metastasis on the transcript (mRNA) level were not apparent at the protein level as detected by immunohistochemistry. We have previously shown that transcript and protein expression are discordant in 48% to 60% of cases ( 27). These discrepancies may arise due to tissue heterogeneity. The three microarray data sets used for IMAP included both hormone-naïve and hormone-refractory cases, and it has been reported that the metastatic lesions are not homogeneous and differ between individuals, sometimes within the same individual depending on the site of occurrence ( 28). Our results showed that the percentage of samples having a particular ratio of proteins varies within each class (Supplementary Fig. S2). This further shows that the tissues are heterogeneous with respect to protein expression. In situ hybridization to probe for transcript expression in tissues could be used on the TMAs to help confirm these results, but is not suitable for high-throughput analysis as is immunohistochemistry. Because the microarray pathway analysis identifies critical nodes in prostate cancer progression, immunohistochemistry is the best approach to translate the data into a clinically applicable test.
We found an increase in nuclear localization of NFKB1 in both hormone-naïve metastasis and hormone-refractory metastasis consistent with other studies that have shown that NF-κB p65 is localized in the nucleus and constitutively activated in cell lines ( 29– 33) and metastatic tissue samples ( 31, 33, 34). The down-regulation of IκBα may result in the constitutive activation of NF-κB as reported by Tergaonkar et al. in the case of p65, where depletion of the IκB proteins led to constitutive activation, with a fraction of NF-κB localized to the nucleus at the basal steady-state level ( 35). The nuclear localization has also been reported to be associated with increased invasion ( 36) and biochemical relapse ( 37). Hormone-refractory metastasis are androgen-independent tumors, and there have been some studies that have shown that androgen receptor (AR) and NF-κB are inversely related ( 22), and the loss of AR is accompanied by an increase in NF-κB activity. Thus, the increase in the nuclear localization and activation of NFKB1 in metastasis might allow for better survival. The analysis also identified TRAF1, TRAF2, and SODD as being involved in prostate cancer metastasis.
These results suggest that the NF-κB pathway is important for the development and spread of prostate cancer, and that the low levels of NFKB1 in hormone-refractory metastasis are active and sufficient to sustain the survival of metastatic lesions. NFKB1 is known to be autoregulated and is also regulated by members of the ETS family ( 38), which have been recently shown to play a critical role in prostate cancer ( 39). Further studies will be needed to understand the mechanism underlying the down-regulation of NFKB1 expression because this pathway seems to be important for early development of prostate cancer and in advanced disease (hormone-refractory metastasis).
In conclusion, we have developed a robust method for analyzing pathways altered in prostate cancer metastasis using three expression array data sets. The gene signature of the top dysregulated pathway, HIV-I NEF, when validated on an independent data set, was found to correctly classify all the metastatic samples. The genes in the pathway were also able to distinguish samples that had a good prognosis (i.e., no biochemical failure). Although this method has been used on three microarray studies, it can be extended to multiple expression array data sets. This study sets the stage for further discovery of the basic mechanisms that underlie a diseased state and would help in identifying critical nodes in the pathway that can be targeted for diagnosis and therapeutic intervention.
Grant support: Department of Defense Prostate Cancer Research Program fellowship award PC050965 (S.R. Setlur).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Robert Kim and Cara Pina for their technical assistance and Monideepa Roy for critical reading of the manuscript.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
S.R. Setlur and T.E. Royce contributed equally to this work.
- Received June 12, 2007.
- Revision received August 28, 2007.
- Accepted September 11, 2007.
- ©2007 American Association for Cancer Research.