| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
Advances in Brief |
1 National Cancer Center and 3 Defence Medical and Environmental Research Institute, Republic of Singapore, and 2 Department of Pathology, Republic of Singapore
| ABSTRACT |
|---|
|
|
|---|
| Introduction |
|---|
|
|
|---|
| Materials and Methods |
|---|
|
|
|---|
4 = 3 points). As tumor size in the Stanford data set was defined using a categorical system, we assigned an approximate value to each categorical grade (i.e., T1 = 2 cm, T2 = 3.5 cm, T3 = 5 cm, T4 = 3.5 cm), using the "average" value of 3.5 cm (T2) to describe T4 tumors (T4 indicating that the size of the tumor is not known). The end point criteria for disease-free survival (DFS) in the Stanford and Rosetta data sets are described in the original reports (10, 15). The Stanford data set contained seven patients whose deaths were not attributable to cancer or who died within 5 years without a relapse. These patients were excluded from Table 3.
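The size assignment above can be sketched in code. The T-category sizes are taken from the text; the index formula (0.2 × size in cm + grade + node score) and the node scoring (0 nodes = 1, 1 to 3 nodes = 2, 4 or more = 3) are the conventional NPI definition, which is assumed here because the start of this passage is missing. All function names are hypothetical.

```python
# Approximate tumor size (cm) per T category, as stated in the text for the
# Stanford data set; T4 (size unknown) reuses the T2 "average" of 3.5 cm.
T_CATEGORY_SIZE_CM = {"T1": 2.0, "T2": 3.5, "T3": 5.0, "T4": 3.5}


def nodal_points(positive_nodes: int) -> int:
    """Conventional NPI node score (assumed): 0 nodes = 1, 1-3 = 2, >= 4 = 3."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return 1
    return 2 if positive_nodes <= 3 else 3


def npi(size_cm: float, grade: int, positive_nodes: int) -> float:
    """Conventional NPI: 0.2 * size (cm) + grade (1-3) + node score (1-3)."""
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2, or 3")
    return 0.2 * size_cm + grade + nodal_points(positive_nodes)


def npi_from_t_category(t_category: str, grade: int, positive_nodes: int) -> float:
    """NPI for a tumor whose size is only known as a T category."""
    return npi(T_CATEGORY_SIZE_CM[t_category], grade, positive_nodes)
```

Under these assumptions, a grade 3, T4 tumor with five positive nodes would be scored with the substituted 3.5 cm size rather than left without an index.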
|
Data Processing and Analysis.
Raw Genechip scans were quality controlled using Genedata Refiner and analyzed using Genedata Expressionist or conventional spreadsheet applications. The unsupervised data set (Fig. 1, A and B) contains genes exhibiting SD > 1.5 across all well-measured samples. Minor changes to the variation filter yielded similar results (footnote 4). Redundant probes for the same gene were removed from analysis, leaving one probe per gene. Average-linkage hierarchical clustering using Pearson correlation distance metrics was performed using CLUSTER and displayed using TREEVIEW software (12). Unsupervised clustering was used to compare gene sets across different array technologies. Gene selections were performed using significance analysis of microarrays (SAM; Ref. 13), with "false discovery rates" of 0.1% for Fig. 1C and 15% for Fig. 2. For Fig. 2, similar results were obtained using false discovery rates ranging from 5–30% (footnote 4).
Supervised approaches using leave-one-out cross-validation (LOOCV) assays were used to calculate the confidence of a specific class assignment. Weighted voting and prediction strengths were calculated as in Golub et al. (Ref. 14; supplementary data). For each leave-one-out "training" data set in the LOOCV assay, the optimal NPI cutoff value was reselected by determining the cutpoint yielding the highest number of differentially expressed genes (supplementary data) from an initial data set comprising all well-measured genes (~14,000 genes). This set of differentially expressed genes was then used to classify the remaining tumor sample. This analysis was repeated for all ER+ tumors, and the overall classification accuracy was determined by summing the individual classifications. Kaplan-Meier survival curves were created using SPSS (SPSS Inc., Chicago, IL). Statistical associations between gene expression and clinical variables were determined by χ² analysis. McNemar's test was used to compare the classification accuracies of the NPI-ES and NPI.
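As an illustration of how two classifiers can be compared on the same tumors, the following is a minimal sketch of McNemar's test. The text does not say whether the exact (binomial) or the χ² form was used; the exact two-sided form is assumed here, and the function names are hypothetical.

```python
from math import comb


def discordant_counts(truth, pred_a, pred_b):
    """Count samples where exactly one of the two classifiers is correct.

    Returns (b, c): b = only classifier A correct, c = only classifier B correct.
    """
    b = sum(1 for t, a, p in zip(truth, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(truth, pred_a, pred_b) if a != t and p == t)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree in correctness
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as uneven as observed.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the discordant pairs enter the test; tumors that both classifiers get right, or both get wrong, carry no information about which classifier is better.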
|
|
| Results and Discussion |
|---|
|
|
|---|
Using Affymetrix U133A Genechips, we generated expression profiles for 98 sporadic breast tumors derived from our local, predominantly Chinese patient population. After data normalization and preprocessing, we applied an SD filter to identify a set of 367 genes (SD-367) exhibiting a high degree of gene expression variation across the tumor series. We used this gene set and unsupervised hierarchical clustering to group the tumor expression profiles on the basis of their overall similarity. The breast tumors self-segregated into three major subgroups, referred to as ER+, ER−, and ERBB2+, respectively (Fig. 1A). There was good agreement between these molecular subgroups and conventional immunohistochemistry (supplementary data). A similar segregation pattern was also obtained when the tumor profiles were reclustered using principal components analysis, an independent analytical technique (Fig. 1B).
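The variation filter and clustering step can be illustrated with a small sketch. It is not the CLUSTER/TREEVIEW workflow used in the study: the function names, the data layout, and the naive pure-Python agglomeration are assumptions chosen for illustration, and profiles with zero variance are not handled.

```python
from math import sqrt
from statistics import mean, stdev


def sd_filter(expr, threshold):
    """Keep genes whose SD across samples exceeds `threshold`.

    `expr` maps gene name -> list of expression values, one per sample.
    """
    return {gene: vals for gene, vals in expr.items() if stdev(vals) > threshold}


def pearson_distance(x, y):
    """Distance = 1 - Pearson correlation between two profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - cov / norm


def sample_profiles(expr, sample_names):
    """Transpose gene -> values into sample -> vector over the kept genes."""
    genes = sorted(expr)
    return {s: [expr[g][i] for g in genes] for i, s in enumerate(sample_names)}


def average_linkage(profiles, n_clusters):
    """Naive average-linkage agglomeration down to `n_clusters` groups."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster sample pairs.
        return mean(pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]  # j > i, so deleting j leaves index i valid
        del clusters[j]
    return clusters
```

Cutting the tree at three groups (`n_clusters=3`) would correspond to the three subgroups described above, although the study read the subgroups off the full dendrogram rather than fixing a cluster count in advance.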
As the SD filter used in the unsupervised analysis does not discriminate between inter-subtype and intra-subtype differences, the SD-367 gene set contains two broad classes of genes: those exhibiting significant expression variation between the tumor subtypes (inter-subtype) and those exhibiting expression variation within the subtypes (intra-subtype). Examples of genes exhibiting significant intra-subtype variation included those related to immune function (e.g., IGLJ3, IGL) and stromal tissue remodelling (e.g., COL11A1, MMP13). To identify genes specifically exhibiting inter-subtype variation, we then used SAM (14) at a false discovery rate of 0.1% to select 409 genes (SAM-409) that were significantly regulated in a subtype-specific manner (100 genes were found in common between the SD-367 and SAM-409 gene sets, see supplementary data; Fig. 1C). In this analysis, SAM was used in a descriptive fashion to identify genes that appear to be largely responsible for defining the clusters. As such, the SAM false discovery rate control should not be interpreted in this situation as providing a true statistical inference procedure, because the clusters to which it was applied were derived from the observed data. Approximately 69% of the SAM-409 gene set consisted of genes exhibiting high expression in the ER+ subgroup compared with the other two subtypes, including the estrogen receptor gene ESR1 and estrogen-regulated genes such as LIV1, TFF1, and MYB (supplementary data). In agreement with other studies, high expression levels of GATA3, HNF3, Annexin A9, and XBP1 were also observed in this subtype (7, 8, 9, 11). In contrast, the ER− subgroup was associated with high expression of basal mammary epithelia markers (keratin 5 and 17), the basement membrane protein ladinin 1, and the serine protease inhibitor maspin, a tamoxifen-inducible gene that is expressed in an inverse fashion to ER (16). Finally, the ERBB2+ subtype was associated with high expression levels of the ERBB2 receptor and other genes physically linked to the 17q21 locus, such as GRB7 and PNMT (15), suggesting the presence of DNA amplification. Taken collectively, our results confirm previous reports that the majority of breast tumors can be subdivided into distinct molecular subtypes on the basis of their global gene expression profiles.
Identification of a Molecular Signature Correlated to the NPI in ER+ Tumors.
We focused on the 34 tumors belonging to the ER+ subtype and identified genes within this subtype whose expression was correlated to NPI status. Classically, breast cancer patients are stratified by the NPI into three major groups, i.e., "good" (NPI < 3.4), "moderate" (NPI 3.4–5.4), and "poor" prognosis (NPI > 5.4; Ref. 2). Possibly reflecting the effects of variability across different scoring pathologists, other studies have proposed slightly different cutoff values for defining these groups (17). To avoid any potential bias in determining the appropriate NPI cutoff value, we conducted a moving threshold analysis where the ER+ tumors were divided into a series of binary groups by an NPI threshold that was steadily increased from 2.3 to 7.8. At each threshold value, genes exhibiting significant variation in expression between the two groups were identified. We found that using an NPI cutoff value of 3.8 to 4.6 yielded a gene set of 62 differentially expressed genes (Fig. 2A), the majority of which exhibited increased expression in the ER+ samples with a high NPI (Fig. 2B). We refer to this 62-member gene set as an "NPI Expression Signature" or NPI-ES (supplementary data). The genes belonging to the NPI-ES are associated with a wide variety of cellular functions implicated in oncogenesis, including DNA replication and cell division (APRT, MCM4, KNSL1, CDC2), cellular signaling (chemokine ligand 1, Met, ShC), apoptosis (survivin, CD27-binding protein), and cellular adhesion (discs-large homologue 7, tetraspan 1). Of the individual NPI components (tumor size, tumor grade, lymph-node status), tumor grade appears to represent the predominant contributor to the molecular makeup of the NPI-ES (supplementary data).
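The moving threshold analysis can be sketched as a scan over candidate cutoffs, counting differentially expressed genes at each one. This sketch substitutes a plain Welch t statistic with a fixed threshold for SAM, which is a deliberate simplification; `t_min`, `min_group`, and all function names are hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for two groups (a stand-in for the SAM statistic)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def count_de_genes(expr, npi_values, cutoff, t_min=4.0, min_group=3):
    """Count genes separating tumors with NPI >= cutoff from those below it.

    `expr` maps gene name -> expression values ordered like `npi_values`.
    Splits that leave fewer than `min_group` tumors on either side score 0.
    """
    high = [i for i, v in enumerate(npi_values) if v >= cutoff]
    low = [i for i, v in enumerate(npi_values) if v < cutoff]
    if min(len(high), len(low)) < min_group:
        return 0
    return sum(
        1
        for vals in expr.values()
        if abs(welch_t([vals[i] for i in high], [vals[i] for i in low])) >= t_min
    )


def best_cutoff(expr, npi_values, cutoffs, **kwargs):
    """Return the cutoff yielding the most differentially expressed genes."""
    return max(cutoffs, key=lambda c: count_de_genes(expr, npi_values, c, **kwargs))
```

When several cutoffs tie, `max` returns the first one scanned, so a real analysis would want to report the whole plateau (as the 3.8 to 4.6 range does here) rather than a single value.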
Classification of Tumors by the NPI-ES Defines Two Discrete Molecular Groups.
One advantage of using molecular profiles for tumor classification is the ability to mathematically quantify the confidence level of the classification (12), which is particularly important if the classification affects the subsequent course of treatment. Notably, although the ER+ samples in our data set were associated with a continuous spectrum of classical NPI values (2, 3, 4, 5, 6, 7, 8), the clustering analysis using the NPI-ES appeared to separate the ER+ tumors into two apparently discrete groups (Fig. 2B), raising the possibility that samples exhibiting continuous values based on histopathological parameters may nevertheless be separable into discrete categories at the molecular level. To test this hypothesis, we used a supervised learning algorithm, weighted voting, with LOOCV assays to classify the tumors into "high NPI" and "low NPI" categories based on their global gene expression profiles. In addition to classification accuracy, quantitative metrics (prediction strengths) were also calculated to provide an assessment of prediction confidence (Fig. 2C). Briefly, the ER+ samples were divided into a series of 34 "leave-one-out" data sets, each consisting of all ER+ tumors except one, and the optimal NPI cutoff value was reselected for each data set. Confirming the original analysis, an NPI cutoff value of 4.0 was identified as the optimal cutpoint for every leave-one-out data set (supplementary data), and all 34 individual 60-member predictor gene sets generated by the LOOCV assay exhibited substantial overlaps (mean overlap 93%, range 72–98%) with the original 62-gene NPI-ES signature identified in the previous section. Twenty-eight of the original 34 tumors (82%) were correctly classified by the LOOCV assay, and 24 of these 28 tumors (or 70% of all tumors) were classified with high confidence (Fig. 2C). Taken collectively, these results suggest that the NPI-ES can be used to classify the majority of the ER+ tumors in our data set into discrete groups with high confidence.
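A minimal sketch of weighted voting with leave-one-out cross-validation follows, using the signal-to-noise weights and prediction strength as commonly described for the Golub et al. method. Unlike the analysis above, it does not reselect the NPI cutoff or the predictor genes inside each fold; it votes with every gene supplied. The function names and data layout are assumptions.

```python
from statistics import mean, stdev


def train_weighted_voting(samples, labels):
    """Return one (weight, boundary) pair per gene.

    `samples` is a list of expression vectors; `labels` holds 1 ("high NPI")
    or 0 ("low NPI"). Each class needs at least two training samples.
    """
    model = []
    for g in range(len(samples[0])):
        hi = [s[g] for s, y in zip(samples, labels) if y == 1]
        lo = [s[g] for s, y in zip(samples, labels) if y == 0]
        spread = stdev(hi) + stdev(lo)
        # Signal-to-noise weight: class mean difference over summed SDs.
        weight = (mean(hi) - mean(lo)) / spread if spread > 0 else 0.0
        model.append((weight, (mean(hi) + mean(lo)) / 2))
    return model


def predict(model, x):
    """Return (predicted label, prediction strength in [0, 1])."""
    votes = [w * (v - b) for (w, b), v in zip(model, x)]
    v_hi = sum(v for v in votes if v > 0)
    v_lo = -sum(v for v in votes if v < 0)
    total = v_hi + v_lo
    if total == 0:
        return 1, 0.0  # no gene votes either way; the label is arbitrary
    return (1 if v_hi >= v_lo else 0), abs(v_hi - v_lo) / total


def loocv(samples, labels):
    """Classify each sample with a model trained on all the others."""
    results = []
    for i in range(len(samples)):
        model = train_weighted_voting(samples[:i] + samples[i + 1:],
                                      labels[:i] + labels[i + 1:])
        results.append(predict(model, samples[i]))
    return results
```

A prediction strength near 1 means almost all weighted votes agree; a "high confidence" call corresponds to the strength exceeding some chosen threshold, which this sketch leaves to the caller.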
Application of the NPI-ES Across Multiple Independent Breast Cancer Expression Data Sets.
To test the ability of the NPI-ES to predict both NPI status and disease prognosis in a series of blind "test sets," we used two independent publicly available breast cancer data sets. The first data set (referred to as the Rosetta data set) consists of 78 lymph-node-negative breast tumors profiled using oligonucleotide-based microarrays, and the duration of DFS (the time from initial tumor diagnosis to the appearance of a new distant metastasis) for each patient (10). Importantly, there are already several published studies confirming the prognostic value of the NPI in node-negative tumors (18, 19). The second data set consists of 78 breast carcinomas profiled using cDNA microarrays with overall patient survival information (referred to as the Stanford data set; Ref. 15). The availability of these data sets allowed us to independently test the predictive power of the NPI-ES, as the Rosetta and Stanford data sets are different from our data set in multiple ways, including (a) patient population, (b) sample handling protocols, (c) scoring pathologist, (d) adjuvant treatment selection (more than half of the Stanford patients were subsequently treated with tamoxifen, whereas only 5 of 78 patients in the Rosetta data set were treated with systemic adjuvant therapy), and (e) choice of array technology and probe sets (two-color in the Rosetta and Stanford data sets and single color in ours).
Rosetta Breast Cancer Data Set.
Of the 409 genes identified by SAM analysis defining the ER+, ER−, and ERBB2+ subtypes, 276 genes (67%) were found on the Rosetta microarray. We applied this gene set to the 78 Rosetta tumor profiles and identified 49 tumors belonging to the ER+ molecular subtype (supplementary data). We then determined that 46 of the 62 NPI-ES genes were also present on the Rosetta microarray. Because the Rosetta data set is based on a different array technology than ours, it is not possible to directly apply the trained weighted voting model developed on our data set to classify the Rosetta tumors. However, following the strategy described in Ramaswamy et al. (20) for the comparison of gene sets across different array technologies, we used hierarchical clustering to group the 49 ER+ Rosetta tumors using the overlapping NPI-ES set of 46 genes. The clustering analysis divided the 49 ER+ Rosetta tumors into two groups consisting of 24 and 25 tumors exhibiting "high" and "low" expression levels of the NPI-ES, respectively (supplementary data).
We compared the tumors in these two subgroups to determine whether they were associated with differences in their NPI values. Using two distinct statistical approaches where the tumor NPI values were treated either as a continuous gradient (Student's t test) or as two discrete groups (χ² analysis, using a classical NPI cutoff value of 3.4), tumors exhibiting high expression of the NPI-ES consistently exhibited a significantly higher NPI value compared with tumors expressing low levels of the NPI-ES (P = 0.0004 for continuous analysis, P = 0.0087 for binary analysis; Table 1). We also investigated the prognostic performance of the NPI-ES and NPI using Kaplan-Meier survival analysis (Fig. 3). In agreement with other studies, patients with tumors of low NPI (<3.4) exhibited better DFS compared with patients of higher NPI (>3.4; P = 0.007; Fig. 3A). When this same population was restratified by the NPI-ES, patients with tumors of low NPI-ES expression also exhibited improved relapse-free survival compared with patients with tumors expressing high levels of the NPI-ES (P = 0.0007). This analysis indicates that expression of the NPI-ES is significantly correlated with classical NPI status and clinical outcome in ER+ tumors even in an independent data set generated by a different array technology.
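For readers unfamiliar with the survival curves referred to above, the Kaplan-Meier (product-limit) estimate can be sketched as follows. The study produced its curves in SPSS; this stand-alone function, its name, and its input layout are illustrative assumptions, and it does not compute the P values comparing two strata.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `times[i]` is the follow-up time of patient i; `events[i]` is True if a
    relapse (or death) was observed at that time and False if the patient
    was censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients both leave the risk set
    return curve
```

Computing one curve per stratum (for example, high versus low NPI-ES expression) gives the step functions that are then compared with a log-rank or similar test.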
The Prognostic Capacity of the NPI-ES Is Comparable with a Previously Described "Prognosis Signature" for Breast Cancer.
In the same study by van't Veer et al. (10), the authors also identified a 70-gene prognosis expression signature (PES) that predicted the DFS status of breast tumors. Interestingly, there is minimal overlap between the genes belonging to the NPI-ES and the PES: they share only one gene, and only eight genes if the NPI-ES is compared with the extended PES list of 231 genes (supplementary data). To investigate the prognostic performance of the NPI-ES and the PES on the Rosetta ER+ tumors, we first used Kaplan-Meier survival analysis to compare the DFS of patients stratified either by the NPI-ES (Fig. 3B) or the PES (Fig. 3C). Both classifiers were associated with statistically significant differences in survival (PES, P = 0.0001; NPI-ES, P = 0.0007). It is worth noting, however, that the identification of the PES was directly based on the expression profiles and clinical information of these same tumors, and thus the Rosetta tumors are not strictly blinded to the PES. When the PES and NPI-ES were applied to the Stanford ER+ tumors, both molecular signatures delivered highly similar predictive accuracies for poor prognosis tumors (Table 3), suggesting that the prognostic performances of the NPI-ES and PES are comparable.
In summary, we have identified a 62-gene expression signature that can potentially function as a molecular surrogate for the NPI in ER+ tumors. Confidence in the reliability of the NPI-ES was obtained by showing that it was significantly correlated with NPI status for two independent sets of tumors generated by different centers. We note that the NPI-ES was derived to correlate with NPI status and, as a consequence, indirectly with overall survival. We did not use our array data to directly derive a predictor for clinical outcome. Nevertheless, NPI-ES expression was also significantly associated with other clinical parameters, such as disease-free survival (Table 1) and overall survival (Fig. 3D), the latter being a clinical outcome historically associated with the NPI. One interesting concept emerging from this study is that samples exhibiting apparently continuous variables at the histopathological level may nevertheless be separable into discrete categories at the molecular level. This may address a major challenge in cancer histopathology, i.e., the difficulty of defining clinically appropriate cutoff values when the parameter being scored is of a continuous nature. We conclude by acknowledging that more work needs to be performed before the clinical utility of the NPI-ES can be fully assessed. First, the predictive power of the NPI-ES needs to be tested against a much larger group of early (node-negative) and late-stage (node-positive) tumors. Second, although we have demonstrated the applicability of the NPI-ES in the ER+ molecular subtype, expression of the NPI-ES does not appear to be correlated as well to NPI values associated with the other molecular subtypes (ER−, ERBB2+; supplementary data). As mentioned earlier, one explanation might be that the genetic circuitry regulating tumor grade, lymph-node status, and tumor size is distinct between the different subtypes, which would require the identification of subtype-specific molecular signatures for these other subtypes. Given that ER− and ERBB2+ tumors are typically associated with highly aggressive clinical courses, resolving this question will be an important goal for additional research efforts.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplemental data for this article can be found at Cancer Research Online (http://cancerres@aacrjournals.org).
Requests for reprints: Patrick Tan, National Cancer Center/Defence Medical and Environmental Research Institute, 11 Hospital Drive, Singapore 169610, Republic of Singapore. Phone: 65-6-436-8345; Fax: 65-6-226-5694; E-mail: cmrtan{at}nccs.com.sg
4 K. Yu, unpublished observations.
Received 8/6/03. Revised 2/11/04. Accepted 2/24/04.
| REFERENCES |
|---|
|
|
|---|