Earlier detection of patients with metastatic colorectal cancer (mCRC) might improve their treatment and survival outcomes. In this study, we used proton nuclear magnetic resonance (1H-NMR) to profile the serum metabolome in patients with mCRC and determine whether a disease signature may exist that is strong enough to predict overall survival (OS). In 153 patients with mCRC and 139 healthy subjects from three Danish hospitals, we profiled two independent sets of serum samples in a prospective phase II study. In the training set, 1H-NMR metabolomic profiling could discriminate patients with mCRC from healthy subjects with a cross-validated accuracy of 100%. In the validation set, 96.7% of subjects were correctly classified. Patients from the training set with maximally divergent OS were chosen to construct an OS predictor. After validation, patients predicted to have short OS had significantly reduced survival (HR, 3.4; 95% confidence interval, 2.06–5.50; P = 1.33 × 10⁻⁶). A number of metabolites concurred with the 1H-NMR fingerprint of mCRC, offering insights into mCRC metabolic pathways. Our findings establish that 1H-NMR profiling of patient serum can provide a strong metabolomic signature of mCRC and that analysis of this signature may offer an independent tool to predict OS. Cancer Res; 72(1); 356–64. ©2011 AACR.
Pattern recognition technologies in the omics world have been used for the diagnosis and prognosis of several tumor types using a variety of experimental platforms (1–3). Metabolomics is a post-genomic research field concerned with the study of metabolic profiles in biologic samples such as isolated cells (4), tissues (5) or body fluids (6–10). In recent years, metabolomic studies have provided a significant contribution to the detailed understanding of the biochemical network and pathways in oncology (11), highlighting the potential of metabolomic analyses of blood samples for the detection of cancers, including epithelial ovarian (12), leukemia (13), oral (14), breast (15, 16), prostate (11), liver (17) as well as colorectal cancer (CRC; refs. 18–25). CRC is the second most common cause of cancer-related deaths in Europe (26) and the third most common cause of cancer-related deaths in the United States (27). The impact of colorectal carcinogenesis on the metabolic profile of tissues and serum has been tested by mass spectrometry (MS; refs. 18–21), nuclear magnetic resonance (NMR; refs. 22, 23), or a combination of the two analytic tools (24).
Most of the symptoms associated with CRC do not manifest themselves until late in the progression of the disease and many patients have metastases at the time of initial diagnosis (27). The overall survival (OS) of a patient is reduced considerably when the disease is diagnosed at a stage where it has spread to distant sites. In the past decade, the OS of patients with metastatic CRC (mCRC) has improved because of new combinations of chemotherapy, including 5-fluorouracil (5-FU), irinotecan, and oxaliplatin (28–31). The introduction of the new targeted therapies directed against the epidermal growth factor receptor (EGFR) has further increased response rates (32) in some patients. Nevertheless, early selection of patients with mCRC for optimal treatment with these biologics is very difficult. For example, KRAS mutations occur in 35% to 45% of patients with mCRC and predict poor response to EGFR-targeted therapy with cetuximab and panitumumab (33, 34). The remaining patients with mCRC are KRAS wild-type and approximately 25% to 40% of them are nonresponders and do not benefit from this treatment (33, 34). OS HRs for mCRC have been evaluated from KRAS mutations (HR, 2.4; ref. 35), Eastern Cooperative Oncology Group (ECOG) performance status (PS; HR, 1.78–1.88; refs. 29, 36) and serum levels of alkaline phosphatase (HR, 1.71; ref. 36), bilirubin (HR, 1.89; ref. 29), and lactate dehydrogenase (HR, 2.12; ref. 29). Somewhat higher HR values for primary CRC are obtained from monocyte (HR, 2.86) and neutrophil (HR, 2.90) count (37). Novel noninvasive, sensitive, and inexpensive indicators for mCRC OS are therefore highly desirable.
In this study, conducted within the frame of the biomarker discovery activities of the FP7 project SPIDIA (http://www.spidia.eu), we conducted 1H-NMR profiling of serum samples collected from 153 Danish patients with mCRC before third-line treatment with cetuximab and irinotecan (Table 1 and Supplementary Table S1) and of serum samples from 139 healthy subjects (Table 1). The statistical model generated from 1H-NMR profiles of serum samples of subjects from 2 of the 3 hospitals can robustly discriminate healthy subjects (n = 96) from patients with mCRC (n = 45) with 100% cross-validated accuracy. General applicability of the resultant classifier was successfully validated in an independent set of 43 healthy subjects and 108 patients with mCRC from the third hospital. The capability of the 1H-NMR profiles to predict OS after start of treatment with cetuximab and irinotecan using pretreatment serum samples was tested on a set of 20 subjects of the training set with maximally divergent OS. The classifier was validated on the independent set of 108 patients with mCRC. Furthermore, through multivariate analysis, a number of serum metabolites were identified whose levels were significantly different in patients with mCRC as compared to healthy subjects, as well as between patients with short and long OS. These metabolites can provide hints both to define new biomarkers and to better understand the biochemistry involved in mCRC.
Materials and Methods
Patients with mCRC
Serum samples were collected from a cohort of 181 Danish patients with mCRC resistant to 5-FU, oxaliplatin, and irinotecan. Details on patient selection are provided in Supplementary Methods. The patients were included in a prospective phase II study evaluating cetuximab and irinotecan every second week. The patients were enrolled at 3 hospitals in Denmark during the period October 17, 2006, to October 3, 2008. The patients were treated until disease progression with third-line irinotecan (180 mg/m² of body surface area on day 1 of each 14-day period during the study) and cetuximab (500 mg/m² of body surface area every second week) independent of their KRAS status. Serum samples used for metabolomics were collected before initiating the treatment. In the present study, we did not include any patient with hereditary CRC.
Patients were included in the “Phase II study of treatment with cetuximab and irinotecan administered biweekly to irinotecan-resistant patients with metastatic colorectal cancer–efficacy and biologic markers”. Inclusion criteria were as follows: (i) histologically verified adenocarcinoma of the colon or rectum with nonresectable or metastatic disease; (ii) measurable disease as per Response Evaluation Criteria in Solid Tumors (RECIST; ref. 38); (iii) progressive disease following either oxaliplatin- or irinotecan-based treatment; (iv) irinotecan-resistant disease, defined as progressive disease after at least 6 weeks of irinotecan-based treatment or within 6 months following discontinuation of such treatment (irinotecan-based treatment need not have been given immediately before inclusion, but oxaliplatin-based treatment may have been); (v) ECOG-PS less than 3; (vi) an expected survival time of at least 3 months; (vii) neutrophil count ≥ 1.5 × 10⁹/L and thrombocytes ≥ 100 × 10⁹/L; (viii) normal liver function with bilirubin < 1.5 × upper limit of normal and aspartate aminotransferase/alanine aminotransferase < 5 × upper limit of normal; (ix) prior exposure to oxaliplatin-based treatment; and (x) signed informed consent as per requirements of the Scientific Ethical Committees.
The patients were followed until death or July 7, 2010 (39). The study (including biomarker analysis) was approved by the Regional Ethics Committee (VEK ref. KA-20060094). For 95% of the patients, mutations on the KRAS gene were determined, as reported in Supplementary Methods. Further clinical data about the study will be published elsewhere by B.V. Jensen. One hundred sixty-eight pretreatment serum samples were available for 1H-NMR profiling.
Healthy subjects
Serum samples were collected from 96 Danish healthy blood donors and 43 healthy subjects (hospital staff; Table 1).
Serum sample preparation
Blood was collected into standard blood collection tubes and allowed to clot at room temperature for 30 to 120 minutes before centrifugation (1,500 × g for 10 minutes at 4°C). Serum was aliquoted and stored at −80°C in Greiner cryogenic vials. This procedure is compatible with the standard operating procedures (SOP) for metabolomic-grade serum samples recently defined (40).
NMR sample preparation
Frozen serum samples were thawed at room temperature and shaken before use. A total of 300 μL of buffer [70 mmol/L Na2HPO4; 20% (v/v) H2O; 6.15 mmol/L NaN3; 6.64 mmol/L sodium trimethylsilyl [2,2,3,3-2H4] propionate; pH 7.4] was added to 300 μL of each serum sample, and the mixture was homogenized by vortexing for 30 seconds. A total of 450 μL of this mixture was transferred into a 4.25-mm NMR tube for analysis.
Carr-Purcell-Meiboom-Gill (CPMG) 1H-NMR spectra were acquired to observe metabolite signals while suppressing resonances arising from high–molecular weight molecules, and J-resolved spectra projections (pJRES) were acquired to help identify biomarkers. Further details are provided in Supplementary Methods. The quality of the serum samples was assessed by visual inspection of the resulting NMR spectra according to the criteria defined in the study of Bernini and colleagues (40). Thirteen (8%) serum samples were excluded from the statistical analysis because the resulting NMR spectra were of insufficient quality, containing very broad features attributable to high–molecular weight impurities. Two patients with mCRC younger than 45 years were excluded because hereditary non-polyposis CRC is typically associated with an increased inflammatory response and a different prognosis than sporadic CRC. Demographic and clinical characteristics of the patient cohort studied by NMR are provided in Table 1 and Supplementary Table S1.
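The sample counts stated above can be reconciled with a few lines of arithmetic. The snippet below is only an illustrative consistency check and not part of the original analysis; the variable names are hypothetical, and it assumes that the two exclusions were applied one after the other to the 168 available samples.

```python
# Sample accounting using only counts stated in the text.
available = 168      # pretreatment serum samples available for 1H-NMR
poor_quality = 13    # spectra excluded for insufficient quality
under_45 = 2         # patients younger than 45 years, excluded

analysed = available - poor_quality - under_45
excluded_pct = round(100 * poor_quality / available)

# Patients with mCRC in the training and validation sets, as reported.
training_mcrc, validation_mcrc = 45, 108

print(analysed, excluded_pct, training_mcrc + validation_mcrc)
# prints: 153 8 153
```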
Statistical analysis
Multivariate data analyses were conducted on processed data by combining established methods (6, 41). Projection to latent structure (PLS) was applied on processed data for dimension reduction using the SIMPLS algorithm as implemented in the R library “plsgenomics”; the support vector machines (SVM) method was used for data classification using the “libsvm” module of the R library “e1071.” Canonical correlation analysis (CA) was conducted using the standard R function “cancor.”
To assess the prediction ability of the model, a 10-fold cross-validation was conducted by generating splits with a ratio 1:9 for the data set, that is by removing 10% of samples prior to any step of the statistical analysis, including PLS component selection. Parameter selection (best number of components for PLS, kernel type, cost of constraints violation for SVM) was carried out by means of 5-fold cross-validation on the remaining 90%. The whole procedure was repeated 100 times. Data classification using SVM was conducted by applying the classifier on PLS scores. Accuracy, sensitivity, and specificity were calculated using standard definitions. To estimate the classification/feature selection performance of our predictors, the statistical significance of all observations was further assessed through permutation tests.
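The nested scheme described above (an outer 10-fold loop for accuracy estimation, an inner 5-fold loop for parameter selection on the remaining 90%) can be sketched as follows. This is a minimal illustration in plain Python and not the in-house R scripts: a k-nearest-neighbour classifier stands in for PLS-SVM, the function names (`k_fold_indices`, `nested_cv`, `knn_fit`, `knn_predict`) are hypothetical, and the toy data are invented for the example.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nested_cv(X, y, fit, predict, param_grid, outer_k=10, inner_k=5, seed=0):
    """Outer loop estimates accuracy; inner loop picks the parameter
    using only the training part of each outer split."""
    rng = random.Random(seed)
    n = len(y)
    correct = 0
    for test_idx in k_fold_indices(n, outer_k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        inner_folds = k_fold_indices(len(train_idx), inner_k, rng)
        best_param, best_score = None, -1.0
        for param in param_grid:
            hits = 0
            for fold in inner_folds:
                val_idx = [train_idx[j] for j in fold]
                val_set = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val_set]
                model = fit([X[i] for i in fit_idx], [y[i] for i in fit_idx], param)
                hits += sum(predict(model, X[i]) == y[i] for i in val_idx)
            score = hits / len(train_idx)
            if score > best_score:
                best_param, best_score = param, score
        # Refit on the full outer training set with the selected parameter.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], best_param)
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / n

# Stand-in classifier (NOT PLS-SVM): k-nearest neighbours, k is the tuned parameter.
def knn_fit(X_train, y_train, k):
    return (X_train, y_train, k)

def knn_predict(model, x):
    X_train, y_train, k = model
    order = sorted(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    votes = [y_train[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

# Invented, well-separated toy data: 20 samples per class, one feature each.
X = [[0.1 * i] for i in range(20)] + [[10 + 0.1 * i] for i in range(20)]
y = [0] * 20 + [1] * 20
acc = nested_cv(X, y, knn_fit, knn_predict, param_grid=(1, 3, 5))
```

Repeating the call with different `seed` values and averaging the results would mimic the 100 repetitions described in the text. The key design point is that the held-out outer fold never takes part in parameter selection.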
Score contribution weights were examined to assess which metabolites were most responsible for the observed discrimination between groups (42). The relative concentration of each metabolite was calculated by integrating the corresponding signals in the spectra. Statistical significance was assessed using the nonparametric Kruskal–Wallis test. P < 0.05 was considered statistically significant. Groupwise comparisons of distributions of clinical and demographic data were conducted with the Fisher exact test for the categorical variables and the Kruskal–Wallis test for the continuous variables.
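For two groups, the Kruskal–Wallis comparison of a metabolite's relative concentration can be sketched as below. This is a plain-Python illustration under stated assumptions, not the code used in the study: the function name and the concentration values are hypothetical, and the P value relies on the usual chi-square approximation with 1 degree of freedom, which is rough for very small groups.

```python
import math

def kruskal_wallis_two_groups(a, b):
    """Tie-corrected H statistic and chi-square (df = 1) P value.
    Assumes the pooled values are not all identical."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2      # mean of ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = avg_rank
        m = j - i
        tie_term += m ** 3 - m
        i = j
    rank_sums = [0.0, 0.0]
    for r, (_, group) in zip(ranks, pooled):
        rank_sums[group] += r
    h = 12 / (n * (n + 1)) * (rank_sums[0] ** 2 / len(a)
                              + rank_sums[1] ** 2 / len(b)) - 3 * (n + 1)
    h = max(h / (1 - tie_term / (n ** 3 - n)), 0.0)
    # Survival function of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p

# Hypothetical relative concentrations of one metabolite (arbitrary units).
mcrc = [0.8, 0.9, 1.0, 1.1, 1.2]
healthy = [1.3, 1.4, 1.5, 1.6, 1.7]
h, p = kruskal_wallis_two_groups(mcrc, healthy)
significant = p < 0.05
```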
Kaplan–Meier survival curves were created using the R library “survival.” The Wald test was used to calculate the statistical significance (P) of the differences between survival curves. Prognostic factors for OS were analyzed using the Cox proportional hazard regression.
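The Kaplan–Meier estimate underlying such curves is a running product over the distinct event times. The sketch below is a plain-Python illustration of the estimator itself, not of the R “survival” library; the function name and the follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.
    events: 1 = death observed, 0 = censored at that time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Hypothetical follow-up times in months; 0 marks a censored patient.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
```

With these invented data the survival estimate steps down at months 2, 3, and 5, and the censored patients only reduce the number at risk.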
All calculations were made using R (41) scripts developed in-house.
Results
Fingerprinting of mCRC
A schematic diagram of the present strategy for constructing and validating a metabolic signature in patients with mCRC before third-line treatment with cetuximab and irinotecan is shown in Fig. 1A. Patients with mCRC and healthy subjects (Table 1) were discriminated by using SVM on PLS scores (PLS-SVM) calculated on 1H-NMR serum spectra of samples from Aalborg and Odense hospitals (n = 141). Accuracy, sensitivity, and specificity results for the training set were averaged on a total of 1,000 random subsamplings (100-times repeated 10-fold cross-validation). This procedure consists of two nested loops of cross-validation (Fig. 1A): an outer loop (10-fold cross-validation) for the estimation of the prediction accuracy and an inner loop (5-fold cross-validation) for the optimization of the parameters of the classifier training procedure. Classification performances on the validation set were calculated from the model built on the training set with a 10-fold parameter optimization procedure. Accuracy, sensitivity, and specificity of the PLS-SVM classifier are given in Table 2; excellent classification accuracies were found both for the training (100.0%) and the validation sets (96.7%). On a descriptive level, CA on PLS score (PLS-CA) was used to ascertain whether the spectral patterns of serum of patients with mCRC carry unique features with respect to those of healthy subjects. The resulting plot is shown in Fig. 1B: a very good discrimination is clearly apparent.
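The standard definitions of accuracy, sensitivity, and specificity can be written out as in the sketch below, taking mCRC as the positive class. The validation set comprises 108 patients and 43 healthy subjects, so an accuracy of 96.7% corresponds to 146 of 151 subjects correctly classified. How the 5 errors split between the two classes is not stated here; the split used in the example is HYPOTHETICAL and serves only to exercise the formulas.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard definitions, with mCRC as the positive class."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # patients correctly identified
        "specificity": tn / (tn + fp),   # healthy subjects correctly identified
    }

# HYPOTHETICAL error split (3 patients and 2 healthy subjects misclassified);
# only the totals 108 + 43 = 151 and 146 correct follow from the text.
m = classification_metrics(tp=105, fn=3, tn=41, fp=2)
print(round(100 * m["accuracy"], 1))
# prints: 96.7
```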
The different age distribution between the patients with mCRC and healthy subjects could be a possible source of inhomogeneity in the training set (P < 0.01). Therefore, the same statistical analysis described above was repeated on a subgroup aged between 45 and 65 years (P = 0.23), providing a classification accuracy of 100.0% in the training set and 92.1% in the validation set (Table 2). These results show that the difference in age distribution between the two groups does not significantly affect the classification accuracy.
PLS-SVM cross-validation analysis was also repeated on a data set built with 96 healthy subjects and the 17 patients with mCRC and ECOG-PS of 0, obtaining a classification accuracy of 100.0% on the training set and 86.4% on the validation set (Table 2), which indicates that a clear metabolic signature of the disease exists even in the serum of patients with mCRC and ECOG-PS of 0.
Prediction of survival
The strong metabolic signature of mCRC observed in the present study suggests that even within patients with mCRC, there may be statistically significant differences in the metabolic profile. This could be related to OS. The prognostic classification capabilities of 1H-NMR profiling were therefore tested. A model was built by selecting from the training set a subset of 20 patients with mCRC with maximally divergent OS: that is, the 10 patients with the shortest OS and the 10 patients with longest OS were studied. The PLS-SVM classifier, similar to that used in the previous analysis, was used to assess whether patients with short or long OS could be identified by 1H-NMR profiling. The resulting accuracy was 78.5% (Table 2), suggesting that a signature for the progression state of the tumor is present in the 1H-NMR profiles. The statistical confidence of the prediction performance of the PLS-SVM classifier on the multimode data set with 10-fold cross-validation evaluation was investigated using a permutation test. Permutation test results (number of permutations = 1,000) showed that the accuracy difference between PLS-SVM and a random classifier was statistically significant (P = 0.026). PLS-CA scores are shown in Fig. 1C.
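A label-permutation test of the kind described above can be sketched as follows. This is a generic illustration and not the authors' implementation: the scoring function is a leave-one-out nearest-centroid classifier standing in for cross-validated PLS-SVM, all names are hypothetical, and the one-feature toy data are invented.

```python
import random

def permutation_p_value(X, y, score_fn, n_perm=1000, seed=0):
    """Compare the observed score with scores obtained after shuffling labels."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    y_perm = list(y)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if score_fn(X, y_perm) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_perm + 1)

def loo_centroid_accuracy(X, y):
    """Stand-in scorer (NOT PLS-SVM): leave-one-out nearest-centroid accuracy
    on one-dimensional data."""
    hits = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (x, label) in enumerate(zip(X, y)):
            if j != i:
                sums[label] = sums.get(label, 0.0) + x
                counts[label] = counts.get(label, 0) + 1
        pred = min(sums, key=lambda c: abs(X[i] - sums[c] / counts[c]))
        hits += pred == y[i]
    return hits / len(X)

# Invented data: 10 "short OS" and 10 "long OS" samples, one feature each.
X = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
y = [0] * 10 + [1] * 10
observed, p = permutation_p_value(X, y, loo_centroid_accuracy)
```

The classifier is re-evaluated from scratch on every permuted labelling, so the resulting P value reflects the whole training and validation procedure rather than a single fitted model.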
The patients with mCRC from the validation set were examined by using the model built on the basis of the 1H-NMR profiles of the serum samples of the training set of patients with extreme values of OS, as described before. The model identified 85 (78.7%) patients as belonging to the long OS group and 23 (21.3%) patients to the short OS group, with features as in Table 3.
Fig. 1D shows the Kaplan–Meier plots of the survival data for the 108 patients with mCRC of the independent validation set, segregated according to the 1H-NMR-based clustering described above. The OS curves differed significantly (P = 1.33 × 10⁻⁶, Wald test). Cox hazard analysis shows a significantly elevated HR between the short and long OS groups [HR, 3.37; 95% confidence interval (CI), 2.06–5.50, P = 1.33 × 10⁻⁶; Table 4].
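The reported HR, CI, and P value are linked through the Wald statistic on the log-hazard scale, as the sketch below illustrates. It assumes that the published interval is a symmetric 95% Wald interval for log(HR); the function name is hypothetical, and because the inputs are rounded the result can only be expected to agree with the reported P value in order of magnitude.

```python
import math

def wald_from_hr(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, Wald z, and two-sided P from an HR and its CI."""
    log_hr = math.log(hr)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values reported for the NMR-based short versus long OS groups.
se, z, p = wald_from_hr(hr=3.37, ci_low=2.06, ci_high=5.50)
```

For these inputs the Wald z is close to 4.85 and the P value is on the order of 10⁻⁶, in line with the value given in the text.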
ECOG-PS also provides a meaningful prediction (HR, 1.57; 95% CI, 1.20–2.06, P = 1.13 × 10⁻⁵; Table 4), whereas the presence of KRAS mutations is not a meaningful predictor of OS (P = 2.11 × 10⁻¹; Table 4). Consistently, mutated and wild-type KRAS patients were found to be almost equally distributed between the long OS and short OS groups (Table 3). When a multivariate analysis on NMR and ECOG-PS is conducted, only NMR profiling remains meaningful (Table 4). The predictive power of NMR profiling with respect to that of classical serologic markers (YKL-40, C-reactive protein, carcinoembryonic antigen) was also tested (full analysis to be published elsewhere by B.V. Jensen); NMR profiling remains an independent and meaningful predictor, together with CRP levels.
Metabolites contributing to the mCRC fingerprint
An analysis of the PLS loadings was conducted to identify the metabolites contributing to the mCRC fingerprint using the entire sets of patients with mCRC and healthy subjects. The values of the relative serum concentrations of metabolites were estimated through the integration of the signals in the NMR spectra. By comparing the spectra of the serum samples of patients with mCRC with those of healthy subjects, it appears that the patients with mCRC are characterized by lower serum levels (P < 0.05) of alanine, citrate, creatine, glutamine, peptide NHs, lactate, leucine, pyruvate, tyrosine, and valine and higher serum levels (P < 0.05) of 3-hydroxybutyrate, acetate, formate, glycerol, lipid (-CH2-OCOR), N-acetyl signal of glycoproteins, phenylalanine, and proline (Supplementary Table S2 and Fig. 2A).
To identify the metabolites that contribute to discriminating patients with short and long OS, we selected from the entire group of patients with mCRC those with extreme OS values. By setting OS thresholds of more than 24 months and less than 3 months, 18 and 17 patients were selected, respectively. Analysis of the PLS loadings shows that patients with short OS are characterized by lower serum levels (P < 0.05) of creatine, lipid (-C=C-CH2-C=C-), lipid (-CH=CH-), and valine and higher serum levels (P < 0.05) of lipid (-CH2-OCOR) and N-acetyl signal of glycoproteins (Supplementary Table S3 and Fig. 2B). The worsening of the clinical conditions thus corresponds to a more marked difference in the concentration of several of the metabolites that are responsible for the disease fingerprint.
By comparing the spectra of the serum samples of the underweight (N = 7) and normal weight (N = 71) versus overweight (N = 52) and obese patients with mCRC (N = 21), it appears that the overweight and obese patients are characterized by lower levels of formate (P = 7.78 × 10⁻³) and low-density lipoprotein (LDL)/high-density lipoprotein (HDL; P = 1.42 × 10⁻⁴) and higher levels of valine (P = 1.98 × 10⁻³), N-acetyl signal of glycoproteins (P = 1.28 × 10⁻²), CH-CH2-CO and CH2-CO signals due to lipids (P = 1.24 × 10⁻²; P = 7.37 × 10⁻³) and very LDL (VLDL; P = 3.07 × 10⁻²).
Discussion
1H-NMR profiling was used to characterize the metabolomic signature of patients with mCRC before third-line treatment with cetuximab and irinotecan. The classification based on 1H-NMR profiling was highly accurate (100.0%, as estimated by cross-validation) and was confirmed on the validation set (96.7%), showing a clear separation between the patients with mCRC and healthy subjects. The validity of the approach was strengthened by the demonstrated ability to find a clear signature also in the serum of patients with mCRC and good performance status (i.e., ECOG-PS = 0).
The capability of 1H-NMR profiling to predict OS is interesting. It probably reflects the fact that metabolites are the end products of the ensemble of processes occurring in living organisms and can be regarded as the ultimate response of the organisms to disease-induced metabolic alterations, inflammatory processes, and changes in lifestyle (diet, sedentary vs. active life, etc.) as a consequence of the pathologic status. The information content derived from classical approaches based on single-molecule markers is to some extent embedded in the 1H-NMR patterns. NMR profiling and measures of a patient's general well-being, such as ECOG-PS, are both complex multiparametric indicators. However, the former is based on an objective, albeit complex, evaluation of spectral data that contain molecular-level information. In contrast, the ECOG-PS is based on criteria that reflect how a patient's disease is progressing and how the disease affects the daily living abilities of the patient; it is therefore intrinsically associated with a certain degree of subjectivity (43).
It has been reported that indicators of an inflammatory response in serum NMR are represented by increased intensity for the signals of CH2-COOR of lipids and for the N-acetyl resonance of glycoproteins (44). We show here that the metabolomic fingerprint in serum of patients with mCRC is characterized by a higher intensity of these resonances; the effect is larger for patients with short OS after starting third-line treatment with cetuximab and irinotecan. This finding may reflect a nonspecific inflammatory response and is in agreement with the proposal by Pages and colleagues (45) and Galon and colleagues (46) that the OS time is “governed in large part by the state of the local adaptive immune response.”
Patients with short survival showed decreased serum concentrations of polyunsaturated lipids. Reduced levels of hydroxylated, polyunsaturated ultra long-chain fatty acids have been reported for patients with CRC on the basis of NMR/MS studies and proposed as a signature of this disease (21).
Moreover, in the serum of the patients with mCRC, we found a depletion of several potential precursors of glucose in gluconeogenesis (47, 48), such as lactate, pyruvate, alanine, glutamine, and other gluconeogenic amino acids, with no significant changes in glucose concentration. The observed trend may suggest an increased uptake of glucose precursors by the liver (49, 50) that could be consistent with an increased hepatic gluconeogenesis (51). Such an effect is found in patients with cachexia that occurs frequently in patients with metastatic cancer and is associated with more than 20% of cancer deaths (52). By comparing the spectra of the serum samples of the underweight and normal weight versus overweight and obese patients with mCRC, it appears that the overweight and obese patients are characterized by lower levels of formate and LDL/HDL and higher levels of valine, N-acetyl signal of glycoproteins, CH-CH2-CO and CH2-CO signals due to lipids and VLDL. Although information on nutritional status is not available, the N-acetyl signal of glycoproteins suggests that underweight and normal weight patients with mCRC may be characterized by a significantly higher immune response, highlighting a link between cachexia and inflammatory response.
The relatively low serum levels of lactate and the nonsignificant changes in glucose levels differ from the observations in previous studies of patients with CRC (20, 22) and in other types of cancer where, commonly, serum lactate levels are high (12, 53). However, relatively high serum levels of glucose and low levels of lactate, alanine, and gluconeogenic amino acids were observed in patients with oral cancer (14). This pattern was attributed to an altered energy metabolism (14). Gluconeogenesis in the present study of patients with mCRC may mask anaerobic dissimilation of glucose (48). The comparison between patients with mCRC and healthy subjects also shows that high serum levels of 3-hydroxybutyrate are present in the patients with mCRC. This molecule is formed during fatty acid oxidation, and its high levels may suggest a liver response to the increased energy demand of enhanced hepatic gluconeogenesis (14). Depletion of polyunsaturated lipids observed in patients with mCRC also fits in this scheme.
In summary, we found that the serum of patients with mCRC before third-line therapy with cetuximab and irinotecan shows a signature of altered energy metabolism, which may reflect an increased gluconeogenesis and an accumulation of 3-hydroxybutyrate. Our findings also suggest that the overall 1H-NMR profile of serum from patients with mCRC may reflect an inflammatory status, more serious in patients with short OS.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant Support
The research was funded by the European Union Seventh Framework Programme [FP7/2007-2013] under grant agreement no. 222916.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received May 5, 2011.
- Revision received October 25, 2011.
- Accepted November 1, 2011.
- ©2011 American Association for Cancer Research.