Systems Biology and Emerging Technologies
Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland
Requests for reprints: Kent W. Hunter, Laboratory of Cancer Biology and Genetics, Center for Cancer Research, National Cancer Institute, NIH, Building 37, Room 5046C, 37 Convent Drive, Bethesda, MD 20892-4264. Phone: 301-435-8957; Fax: 301-480-2772; E-mail: hunterk@mail.nih.gov.
| Abstract |
| Introduction |
Most of these investigations have been based on the assumption that the metastasis-predictive gene expression signatures are the result of an early somatic mutation (12, 13). However, studies from our laboratory have shown that inherited polymorphisms also play a role in metastatic progression (14–18) and that this germline variation drives the establishment of gene expression signatures that distinguish tumors with varying propensities to metastasize (19). More recently, we have identified a number of genes with differential functionality, presumably as a consequence of germline polymorphism, in recombinant inbred mice derived from founder strains with inherently different metastatic capacities (14, 20, 21). We subsequently showed that ectopic expression of these genes could induce gene signatures in mouse tumor epithelium that predict outcome in human breast cancer clinical samples. These studies provided preliminary evidence that metastasis-predictive gene signatures may be induced by germline polymorphism of metastasis susceptibility genes. However, these initial studies do not enable dissection of the contributions of different cell types in the bulk tumor, or of the relative contributions of somatic mutation and germline variation, to the establishment of these expression patterns.
The aim of the current study is to gain a better understanding of the origins of the metastasis predictive gene expression profiles. To achieve this aim, we have used a mouse model system to define the factors driving the induction of metastasis-predictive gene expression signatures. Our studies suggest that the signatures are likely due to a combination of preexisting signatures established by inherited factors present in all tissues, as well as somatic mutations within the tumor epithelium.
| Materials and Methods |
Generation of mouse tissue gene signatures. Analysis of mouse tissue microarray data was performed using BRB-ArrayTools Version 3.5.0-Patch_1. Signatures distinguishing the tissues from the high or low metastatic genotypes were developed using the Class Comparison tool. Data were prefiltered to include only probe sets whose log-ratio variation was significant at P < 0.01, and probe sets were included in the signature only if univariate analysis for differential expression between the genotypes was significant at P < 0.001. For the spleen and thymus samples, more stringent univariate thresholds of P < 0.0001 and P < 0.00001, respectively, were used to limit the number of probe sets included in the signature. Gene expression data from these studies can be accessed at the National Center for Biotechnology Information (NCBI) GEO database (22) under accession GSE13231.
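The selection rule above reduces to two P-value cutoffs, with stricter univariate cutoffs for spleen and thymus. The sketch below assumes the per-probe-set P-values have already been computed (for example, taken from the Class Comparison output); the function name, input layout, and probe set IDs are hypothetical, and this is an illustration of the thresholds rather than BRB-ArrayTools code.

```python
from typing import Dict, List, Tuple

# Cutoffs as stated in the Methods text.
VARIATION_ALPHA = 0.01            # prefilter on log-ratio variation
DEFAULT_UNIVARIATE_ALPHA = 0.001  # genotype comparison, most tissues
TISSUE_UNIVARIATE_ALPHA = {"spleen": 0.0001, "thymus": 0.00001}


def select_signature(tissue: str,
                     probe_pvalues: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the probe sets passing both cutoffs for one tissue.

    probe_pvalues maps a (hypothetical) probe set ID to a precomputed pair
    (variation_p, univariate_p).
    """
    alpha = TISSUE_UNIVARIATE_ALPHA.get(tissue, DEFAULT_UNIVARIATE_ALPHA)
    return sorted(
        probe
        for probe, (variation_p, univariate_p) in probe_pvalues.items()
        if variation_p < VARIATION_ALPHA and univariate_p < alpha
    )


# Hypothetical P-values for three probe sets.
pvals = {
    "probe_A": (0.001, 0.0005),
    "probe_B": (0.002, 0.00005),
    "probe_C": (0.05, 0.000001),  # fails the variation prefilter
}
print(select_signature("lung", pvals))    # ['probe_A', 'probe_B']
print(select_signature("spleen", pvals))  # ['probe_B']
```

Keeping the tissue-specific cutoffs in one lookup table makes it explicit that only the univariate threshold, not the variation prefilter, was tightened for spleen and thymus.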
Tumor transplant assays. Two days before injection, highly metastatic Mvt-1 mouse mammary tumor cells (23) were passaged and allowed to grow to 80% to 90% confluence. The cells were then washed with PBS, trypsinized, collected, washed twice with cold PBS, counted using a hemocytometer, and resuspended at a concentration of 10⁶ cells/mL. One hundred thousand cells (100 µL) were injected into the fourth mammary gland of 6-wk-old virgin FVB/NJ female mice. The mice were then aged for 28 d and euthanized by anesthetic overdose. The 28-d time point was selected based on previously observed tumor growth and metastatic capacities (18, 24). Tumors were dissected and weighed. Lungs were isolated, and surface metastases were enumerated using a dissecting microscope. These experiments were performed in compliance with the National Cancer Institute's Animal Care and Use Committee guidelines.
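As a consistency check on the injection dose: a suspension at 10⁶ cells/mL delivers 10⁵ cells (one hundred thousand) in 100 µL. The helper below is a hypothetical illustration of that arithmetic only.

```python
def cells_injected(concentration_per_ml: float, volume_ul: float) -> float:
    """Number of cells delivered in a given volume of cell suspension."""
    # Convert microliters to milliliters (1 mL = 1,000 uL).
    return concentration_per_ml * volume_ul / 1000.0


dose = cells_injected(1e6, 100)
print(dose)  # 100000.0
```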
Generation of human gene signatures. Human gene signatures were generated using Affymetrix NetAffx tools.1 Mouse tissue signature probe sets generated by the Class Comparison analysis of BRB-ArrayTools were used to query the database using the Batch Query tool of the Exon/Gene Array Expression toolset. Human probe sets corresponding to the individual mouse tissue signatures were identified using the Show Orthologues tool, and the Human Genome U133 Plus 2.0 Array probe sets were downloaded for further analysis. The Rosetta Hu25K signatures were generated by matching the mouse gene symbols to the human gene symbols in the Hu25K annotation data.
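The Hu25K step matches genes by symbol rather than through an ortholog database. A minimal sketch, assuming the annotation is available as a dictionary from upper-case human gene symbol to probe IDs (a hypothetical layout) and that matching is case-insensitive, since mouse symbols are conventionally written in title case ("Cdk1") and human symbols in upper case ("CDK1"):

```python
from typing import Dict, List


def map_mouse_to_human(mouse_symbols: List[str],
                       human_annotation: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Match mouse gene symbols to human array probes by gene symbol.

    human_annotation maps an upper-case human gene symbol to the probe IDs
    carrying that symbol on the array. Mouse genes without a same-symbol
    human entry are silently dropped.
    """
    matches: Dict[str, List[str]] = {}
    for symbol in mouse_symbols:
        probes = human_annotation.get(symbol.upper())
        if probes:
            matches[symbol] = probes
    return matches


# Hypothetical annotation and probe IDs.
annotation = {
    "CDK1": ["hypothetical_probe_1"],
    "MKI67": ["hypothetical_probe_2", "hypothetical_probe_3"],
}
print(map_mouse_to_human(["Cdk1", "Mki67", "Pipox"], annotation))
```

Symbol matching is simple but misses orthologs whose mouse and human symbols differ, which is one reason ortholog tables such as the NetAffx Show Orthologues output are preferable where available.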
Analysis of human gene expression data sets. Analysis of human gene expression data sets was performed as previously described (14, 20, 25). Analyses were performed using BRB-ArrayTools developed by Dr. Richard Simon and Amy Peng Lam.2 The GSE1456 (26), GSE2034 (1), GSE3494 (27), and GSE4922 (28) data sets were downloaded from the NCBI Gene Expression Omnibus Web site.3 Where samples were present in more than one GEO submission (e.g., GSE1456 and GSE4922), duplicate samples were excluded from one or more of the data sets to ensure independence among the data sets. The Rosetta data set (10) was downloaded from the Rosetta Inpharmatics Web site.4 Expression data were loaded into BRB-ArrayTools using the Affymetrix GeneChip Probe Level Data option or the Data Import Wizard. The equivalent human tissue gene signatures were then used with the Select Gene Subset tool to filter the expression data, excluding any probe set that was not a component of the relevant tissue gene expression signature and eliminating any probe set whose expression variation across the data set was P > 0.01.
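The variation filter keeps only probe sets whose expression varies more across samples than a typical probe set. We understand the BRB-ArrayTools log-ratio variation filter to compare each probe set's variance with the median variance in the data set via the statistic (n − 1)·Var/median Var on n − 1 degrees of freedom; treat that as our assumption, not a description of the tool's code. The sketch below applies that test, restricted to the signature probe sets, using only the standard library (the chi-square tail is computed in closed form for integer degrees of freedom). Input layout and probe IDs are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Set


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start at df = 1 and step up by 2 using the identity
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x/2.
    total = math.erfc(math.sqrt(half))
    for k in range(1, df, 2):
        a = k / 2.0
        total += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
    return total


def filter_signature(matrix: Dict[str, List[float]], signature: Set[str],
                     alpha: float = 0.01) -> Set[str]:
    """Keep signature probe sets whose variance is significantly above the median.

    matrix maps a probe set ID to its log expression values across samples;
    the median variance is taken over all probe sets in the data set.
    """
    variances = {probe: statistics.variance(v) for probe, v in matrix.items()}
    median_var = statistics.median(variances.values())
    if median_var == 0:
        raise ValueError("median variance is zero; the test is undefined")
    kept = set()
    for probe in signature.intersection(matrix):  # signature probes on this array
        n = len(matrix[probe])
        statistic = (n - 1) * variances[probe] / median_var
        if chi2_sf(statistic, n - 1) < alpha:
            kept.add(probe)
    return kept


# Hypothetical data: three near-constant probe sets and two variable ones.
matrix = {
    "flat_1": [1.0, 1.1, 0.9, 1.0, 1.1, 0.9],
    "flat_2": [1.0, 1.1, 0.9, 1.0, 1.1, 0.9],
    "flat_3": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
    "variable_1": [0.0, 4.0, 0.0, 4.0, 0.0, 4.0],
    "variable_2": [0.0, 4.0, 0.0, 4.0, 0.0, 4.0],
}
signature = {"flat_1", "variable_1", "not_on_array"}
print(filter_signature(matrix, signature))  # {'variable_1'}
```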
Unsupervised clustering of each data set was performed using the Samples Only clustering option of BRB-ArrayTools. Clustering was performed using average linkage, the centered correlation metric, and the genes analytic option. Samples were assigned to two groups based on the first bifurcation of the cluster dendrogram, and Kaplan-Meier analysis was performed using the Survival module of Statistica version 7.1 (StatSoft, Inc.). The statistical significance of differences in outcome was assessed using Cox's F test. Hazard ratios for the genes in the tissue gene expression signatures that correlated with outcome in the human data sets were identified using the Find Genes Correlated with Survival tool in the Survival Analysis toolset of BRB-ArrayTools.
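The grouping step can be illustrated with a small, self-contained implementation of average-linkage clustering on centered-correlation distance (1 − Pearson r), stopping when two clusters remain; those two clusters are the branches below the root, i.e. the first bifurcation of the dendrogram. This is a plain-Python sketch under that reading, not the BRB-ArrayTools implementation; sample names and values are hypothetical, and each sample is represented by its expression over the signature probe sets.

```python
import itertools
import math
from typing import Dict, FrozenSet, List, Set


def centered_correlation(x: List[float], y: List[float]) -> float:
    """Pearson (centered) correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def first_bifurcation(samples: Dict[str, List[float]]) -> Set[FrozenSet[str]]:
    """Average-linkage clustering on 1 - correlation; return the top two clusters."""
    names = list(samples)
    dist = {
        frozenset((a, b)): 1.0 - centered_correlation(samples[a], samples[b])
        for a, b in itertools.combinations(names, 2)
    }
    clusters = [frozenset([n]) for n in names]
    # Repeatedly merge the pair of clusters with the smallest average
    # pairwise distance until only the two top-level branches remain.
    while len(clusters) > 2:
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            pairs = [dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            avg = sum(pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return set(clusters)


# Hypothetical samples: s1/s2 share one expression pattern, s3/s4 the opposite.
samples = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 3.0, 4.0, 5.5],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [5.0, 3.9, 2.0, 1.0],
}
groups = first_bifurcation(samples)  # two groups: s1 + s2 and s3 + s4
```

The naive pairwise search is quadratic per merge, which is fine for illustration; production tools use more efficient agglomeration.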
Survival analysis was performed using the publicly available outcome data. Where available, distant metastasis–free survival was used (GSE2034 and Rosetta data sets). Because death from breast cancer is associated primarily with metastatic disease rather than with the primary tumor or local regional relapse, overall survival and death due to breast cancer were used as surrogates for metastatic disease for the GSE1456 and GSE3494 data sets, respectively. For GSE4922, only relapse data (including both local and regional relapse) were available.
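Each cluster-defined group yields a Kaplan-Meier curve for the endpoint available in that data set. The sketch below records the endpoint per data set as described above and implements the standard product-limit estimator; it does not reproduce the Cox's F test used to compare groups, and the follow-up times in the usage example are hypothetical.

```python
from typing import List, Tuple

# Outcome used for each data set, as described in the text.
ENDPOINTS = {
    "GSE2034": "distant metastasis-free survival",
    "Rosetta": "distant metastasis-free survival",
    "GSE1456": "overall survival",
    "GSE3494": "death due to breast cancer",
    "GSE4922": "relapse (local and regional)",
}


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 if the endpoint occurred,
    0 if the patient was censored. Returns (time, survival) at each event time.
    """
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        events_at_t = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - events_at_t / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical group of five patients (times in years).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Here a patient censored at the same time as an event is still counted as at risk at that time, the usual convention for the estimator.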
Pathway and functional category analysis. Pathway and biological functional category analysis was performed using the Ingenuity Pathways Analysis program (Ingenuity IPA 6.3-1402).
| Results |
To test these possibilities, gene expression analysis was performed using normal, nonneoplastic tissues isolated from transgene-negative highly metastatic AKR/J x FVB/NJ or low-metastatic DBA/2J x FVB/NJ F1 animals (Supplementary Tables S5–S9). Tissues were selected based on their presence in the primary tumor (whole blood, bone marrow), their role as the metastatic target organ and a representative nonproliferative epithelial tissue (lung), or their role as a source of invading immunologic cells (spleen and thymus). Because of its high adipose content, which is not represented in most human tumor samples used for gene expression analysis, the mouse mammary gland was excluded from the analysis. Signatures derived from expression differences in the spleen and thymus of mice of differing metastatic capacities accurately predicted outcome in four of the five human breast cancer data sets analyzed in this study (Supplementary Figs. S1 and S2, respectively; Supplementary Tables S10–S14). Furthermore, the gene expression signature derived from normal lung accurately predicted outcome in all five breast cancer data sets (Fig. 4), consistent with the hypothesis that human breast cancer predictive signature profiles are driven, at least partially, by inherited rather than acquired factors. However, no consistent outcome effects were observed for the gene expression signatures of whole blood or bone marrow (Supplementary Figs. S3 and S4), suggesting that these tissues do not significantly contribute to the prognostic gene signatures derived from human tumor samples.
Network and biological function analysis of human data sets. To gain a better understanding of the genes and networks associated with prognosis in the human data sets, Ingenuity Pathways Analysis was performed on the orthologous human probe sets. Only those probe sets that varied significantly (P < 0.001; Supplementary Tables S10–S14) in one or more of the data sets were included. Similar to the mouse data, genes from each of the tissue signatures could be assembled into a large network, consistent with the possibility of a common underlying mechanism.
Individual gene signatures were then analyzed for significantly overrepresented biological functions. Consistent with previous analyses of human gene signatures (33, 34), genes associated with cell growth and proliferation were among the most significant (Supplementary Fig. S5A). In contrast, in the Mvt-1 transplant tumors, cell growth and cell cycle were not the most significant biological functions (Supplementary Fig. S5B), although genes associated with them were present within the signature. Analysis of the normal tissue gene signatures also revealed growth-associated genes in every profile (Supplementary Fig. S6).
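Overrepresentation of a biological function in a gene list is commonly scored with a right-tailed Fisher's exact (hypergeometric) test, which is, to our understanding, the statistic IPA reports for functional categories. The sketch below computes that tail probability for hypothetical counts; it is not the IPA implementation, which also depends on its proprietary knowledge base to decide category membership.

```python
from math import comb


def right_tailed_fisher(overlap: int, signature_size: int,
                        category_size: int, universe_size: int) -> float:
    """P(X >= overlap) for the number of category genes in the signature.

    Models drawing signature_size genes at random from universe_size genes,
    of which category_size belong to the functional category.
    """
    upper = min(signature_size, category_size)
    total = comb(universe_size, signature_size)
    tail = sum(
        comb(category_size, k)
        * comb(universe_size - category_size, signature_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes, 4 in the category, 3 in the signature, 2 overlap.
p = right_tailed_fisher(overlap=2, signature_size=3, category_size=4, universe_size=10)
```

For these counts the tail is (6·6 + 4·1)/120 = 1/3, so such an overlap would not be called overrepresented.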
The presence of proliferation-associated genes in all of the signature profiles, including those that did not consistently predict outcome, suggests either that specific subsets of proliferation-associated genes are important in predicting outcome or that other biological networks, present in some of the tissue profiles but not others, are also associated with outcome. To test the latter possibility, probe sets associated with the biological functions of cell cycle, cell growth and proliferation, and cellular assembly and organization were removed from the nonproliferative adult lung gene signature, and the human data sets were reanalyzed using the truncated profile. As shown in Supplementary Fig. S7 and Table 1, the proliferation-truncated gene signature still discriminated outcome in four of the five human data sets. This result is consistent with the possibility that pathways other than cellular proliferation can contribute to prognostic gene expression profiles. However, we cannot currently rule out the possibility that proliferation-associated genes remain in the lung signature but were not identified, either because of incomplete annotation or because their proliferation-associated functions have not yet been discovered.
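The truncation step is a set operation over functional annotations. In the study the annotations came from IPA; the sketch below assumes a hypothetical dictionary from probe set ID to annotated function names and removes any probe set annotated with one of the three excluded categories.

```python
from typing import Dict, Iterable, List, Set

# Functional categories removed from the lung signature, as listed in the text.
EXCLUDED_FUNCTIONS = {
    "cell cycle",
    "cell growth and proliferation",
    "cellular assembly and organization",
}


def truncate_signature(signature: Iterable[str],
                       annotations: Dict[str, Set[str]]) -> List[str]:
    """Drop probe sets annotated with any excluded function.

    Probe sets with no annotation are kept, which mirrors the caveat that
    unannotated proliferation-associated genes could remain in the profile.
    """
    return [
        probe for probe in signature
        if not (annotations.get(probe, set()) & EXCLUDED_FUNCTIONS)
    ]


# Hypothetical annotations.
annotations = {
    "p1": {"cell cycle"},
    "p2": {"lipid metabolism"},
    "p3": {"cellular assembly and organization", "cell morphology"},
}
truncated = truncate_signature(["p1", "p2", "p3", "p4"], annotations)
```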
| Discussion |
This hypothesis makes several predictions. The most important is that, if the predictive gene signatures are due in part to inherited polymorphism, the signatures should be detectable in normal, preneoplastic tissue of susceptible individuals. The aim of this study was, therefore, to test this hypothesis and to evaluate whether the results of our mouse genetic model system of breast cancer progression translate to human clinical samples. To do so, we performed a series of gene expression array analyses to ask the following questions: (a) Do gene expression profiles from mouse models of inherited metastasis susceptibility predict outcome in human breast cancer? (b) What are the cellular origins of prognostic gene expression signatures? (c) Does germline variation contribute to the induction of prognostic expression patterns in human breast cancer? (d) If there is indeed an inherited component to such signatures, what are the relative contributions of somatic and inherited factors in the establishment of the predictive expression profiles?
The strategy we used was to examine spontaneous tumors, transplant tumors, and normal tissues in mouse strains with different genetic susceptibility to metastatic progression for the presence of gene signatures that were able to discriminate outcomes in human breast cancer data sets. Our previous studies suggested that, like mice, humans also exhibit an inherited genetic susceptibility to metastasis (14, 15, 20). This, in turn, implied that the prognostic gene expression profiles observed in human breast cancer data sets might be, at least partially, the result of inherited factors (14, 20, 21). In the current study, we provide further support for the hypothesis that metastasis susceptibility is a complex heritable trait. More significantly, we provide evidence supporting our hypothesis that metastasis-predictive microarray gene expression signatures, which are currently being evaluated as potential prognostic tools in the clinical setting, may be partially driven by host germline polymorphism.
To investigate this, we performed microarray analysis to derive a gene expression signature reflecting the differences in gene expression between primary spontaneous mammary tumors from mice with a 20-fold difference in metastatic propensity (17). The resulting gene expression signature accurately predicted outcome in four of the five human breast cancer data sets examined. In addition, nonneoplastic tissues from five other sites involved in tumorigenesis were analyzed to investigate the relative cellular contributions to signatures derived from complex, bulk human tumors. Whole blood, spleen, and thymus were chosen to investigate the contribution of hematologically derived cells present within the primary tumor mass. We also characterized gene expression patterns in bone marrow because bone marrow–derived cells have recently been shown to promote metastasis at both the primary tumor (38, 39) and the secondary site (40). Finally, lungs were selected for gene expression analysis because the majority of metastatic lesions in this model system form at this site.
Several important conclusions can be drawn from these experiments. First, as predicted by the genetic predisposition hypothesis, metastasis-predictive gene expression signatures could be derived from a variety of normal, nonneoplastic tissues. Specifically, normal lung, spleen, and thymus from mice of differing metastatic propensities exhibited gene expression signatures that could predict outcome in breast cancer. No consistent predictive signal was observed for circulating whole blood or bone marrow, suggesting that these tissues, although potentially critical to the clinical phenotype, may not contribute a large fraction of the expression patterns of most bulk primary tumors. The ability of the lung, spleen, and thymus signatures to distinguish patient outcomes suggests that basal epithelial and lymphocyte signals together may comprise the majority of the signal observed in bulk tumor tissue.
The cellular origins of the inherited components of the predictive gene signatures were further investigated using a transplant strategy. Previously published analyses and earlier work in our laboratory showed that genes associated with stromal tissues and the immune compartments are frequently dysregulated in tumors more prone to metastasizing (10, 13, 41, 42). We therefore sought to investigate the relative contribution of these tissues to the signatures by removing a major source of genetic heterogeneity: the tumor epithelium. This was achieved by implanting a highly metastatic mouse mammary tumor cell line into the mammary fat pad of mice with differing metastasis susceptibilities. The resulting primary tumors were, therefore, composed of identical tumor epithelium but contained different infiltrating host components from the two mouse genotypes. Thus, any gene expression differences between tumors from different hosts would result directly from host tissue germline polymorphism and/or the reaction of tumor cells to the differing microenvironments.
Based on the presence of numerous host-derived, nonepithelial transcripts in the prognostic signatures, we anticipated that signatures from both the spontaneous and the transplant tumors would be able to discriminate patient outcome, and this was indeed the case. However, no difference was observed in the metastatic capacity of this tumor cell line, despite the previously observed 20-fold difference in metastatic susceptibility of the host genotypes (17). One possible explanation lies in the highly malignant properties of the Mvt-1 cell line. The influence that host germline polymorphism exerts on the tumor epithelium may be too subtle to be detected by in vivo orthotopic transplantation assays using a cell line selected for high malignant potential (23). Microarray analysis, however, is a very sensitive means of detecting changes in gene expression. Therefore, the prognostic gene expression signature observed in the Mvt-1 implant tumors likely reflects subtle changes in gene expression resulting from interaction with the different hosts. Alternatively, the effect of inherited polymorphisms on metastatic capacity may be tumor autonomous, and the prognostic gene expression profile from the transplant tumors may be due entirely to the infiltrating host tissues. In that case, although the prognostic signature is apparent in the bulk tumor, the presence of the same highly malignant cell line in both hosts results in equivalent metastatic capacity. Additional work will be necessary to distinguish between these two scenarios.
Considerable variation in the number of significant probe sets and in the discriminatory ability of the tissue signatures was also observed across the human data sets. We believe that this reflects the underlying heterogeneity of the human populations represented in each data set, which comprise mixtures of different molecular subtypes and stages. Previous bioinformatic investigations of gene expression signatures showed that different subsets of predictive genes are identified depending on the particular subset of patients analyzed (43, 44). As a result, the different sets of patients included within each data set, as well as experimental variation introduced during array analysis, would be expected to yield different significant subsets of each tissue signature. Despite these fluctuations, using all of these large data sets in the analysis increases the probability that the observed results reflect a general phenomenon rather than a data set–specific effect or a false-positive arising from analysis of only one or a few data sets.
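To make the multiple-data-set argument concrete, consider a signature with no real prognostic value tested in five data sets. Assuming the tests are independent and each has a false-positive rate of α = 0.05 (both assumptions ours, not stated in the text), the chance of reaching significance in at least four of the five by chance alone is C(5,4)(0.05)^4(0.95) + (0.05)^5 ≈ 3.0 × 10⁻⁵, whereas the chance of at least one spurious positive is about 0.23. The binomial tail below computes both; the numbers are illustrative only.

```python
from math import comb


def prob_at_least(k: int, n: int, alpha: float) -> float:
    """P(at least k of n independent tests are positive), each with rate alpha."""
    return sum(comb(n, i) * alpha**i * (1 - alpha) ** (n - i) for i in range(k, n + 1))


four_of_five = prob_at_least(4, 5, 0.05)  # about 3.0e-05
any_of_five = prob_at_least(1, 5, 0.05)   # about 0.226
```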
In addition, differences in the clinical characteristics of each patient set may also contribute significantly to the probe set selection and discriminatory ability in each data set. The data set from Wang and colleagues (GSE2034; ref. 1), for example, consists only of untreated lymph node–negative patients, whereas the other data sets contain a mixture of node-positive, node-negative, and adjuvant therapy–treated patients. The GSE2034 data set therefore represents the natural progression of node-negative breast cancer, without confounding by adjuvant therapy. The Rosetta data set, in contrast, was designed to develop a discriminatory assay for younger patients (10). The differences in the prognostic ability of our signatures across the data sets may therefore be explained, at least in part, by these confounding variables. Of note, however, the lung expression profile had prognostic value in all of the data sets, regardless of these confounding clinical differences. Because GSE2034 represents the natural progression of node-negative disease, this result supports our hypothesis that germline-encoded transcriptional differences may account for some measurable fraction of the prognostic gene signatures.
Finally, investigations over the past few years into the factors underlying the metastasis-predictive expression profiles have suggested that all of the prognostic gene signatures may be sampling the same underlying network (32), most commonly thought to be cell cycle and proliferation (33, 34). The data presented here are consistent with these being important biological functions associated with progression. The signature profile derived from the spontaneous PyMT-induced tumors from (AKR x PyMT)F1 and (DBA x PyMT)F1 mice was capable of discriminating outcome in four of the five human data sets and was trending toward significance in the GSE2034 data set (Fig. 1). Implanting the same cell line into nontransgenic hosts, which removes potential differences in the proliferative capacity of the tumor epithelium resulting from constitutional polymorphism, eliminated any trend in GSE2034 (Fig. 3) and somewhat reduced the risk ratio in both GSE3494 and GSE4922 (Table 1). Similar results were observed when proliferation-associated genes were stripped out of the lung gene expression signature (Supplementary Fig. S7; Table 1).
The ability of the Mvt-1 transplant and truncated lung signatures to predict outcome in data sets other than GSE2034, however, raises the possibility that other biological networks may also be predictive of breast cancer outcome. Several possibilities need to be considered. First, these other pathways may not be causative factors in outcome: the same polymorphic differences that drive the predictive proliferation-associated gene sets may also affect the other networks as a bystander effect. Second, they may be causative factors that have not been detected as a common mechanism in analyses of the human data sets because of the dominant effect of the cell proliferation pathway and/or because they act only in subsets of the human population. Third, genes remaining in the Mvt-1 and truncated lung profiles may in fact be members of the proliferation network but have not been annotated as such, either because their functional significance in cell growth is as yet unrecognized or because current annotations are incomplete. Although it is not possible to definitively distinguish between these possibilities at this time, we favor the first two. Previous studies have shown that expression profiles are an independent predictive factor compared with standard clinical measures, including mitotic index. This suggests either that the signatures are a much more accurate measure of proliferation than standard immunohistochemistry or that they measure factors in addition to cellular growth. Additional studies will be necessary to address these possibilities definitively.
In summary, these results provide additional evidence for the role of inherited factors in human breast cancer progression. In addition, they suggest that the prognostic gene signatures currently in clinical trials likely result from a complex mixture of somatic and inherited factors present not only in the tumor epithelium but also in infiltrating nonneoplastic cells. Further investigations will hopefully improve our understanding of the relationships among these factors in both the tumor epithelium and the infiltrating nonneoplastic tissues, with the goal of improving current prognostic tools and developing more effective therapeutic strategies.
| Disclosure of Potential Conflicts of Interest |
| Acknowledgments |
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank Dr. Richard Simon and Amy Peng Lam for developing the analyses performed using the BRB-ArrayTools.
| Footnotes |
1 http://www.affymetrix.com/analysis/index.affx
2 http://linus.nci.nih.gov/BRB-ArrayTools.html
3 http://www.ncbi.nlm.nih.gov/projects/geo/
4 http://www.rii.com/publications/2002/vantveer.html
Received 9/9/08. Revised 10/16/08. Accepted 10/17/08.
| References |