Regular Articles |
Cancer Research UK Clinical Centre [M. A. R., P. C., N. P. M., P. J. S., R. E. B.] and Department of Urology [A. P.], St. James's University Hospital, Leeds LS9 7TF, and School of Computing, University of Leeds, Leeds LS2 9JT [J. N.], United Kingdom
| ABSTRACT |
|---|
Samples from patients before undergoing nephrectomy for clear cell renal cell carcinoma (RCC; n = 48), normal volunteers (n = 38), and outpatients attending with benign diseases of the urogenital tract (n = 20) were used to successfully train neural-network models based on either presence/absence of peaks or peak intensity values, resulting in sensitivity and specificity values of 98.3–100%. Using an initial "blind" group of samples from 12 patients with RCC, 11 healthy controls, and 9 patients with benign diseases to test the models, sensitivities and specificities of 81.8–83.3% were achieved. The robustness of the approach was subsequently evaluated with a group of 80 samples analyzed "blind" 10 months later (36 patients with RCC, 31 healthy volunteers, and 13 patients with benign urological conditions). However, sensitivities and specificities declined markedly, ranging from 41.0% to 76.6%. Possible contributing factors, including sample stability, changing laser performance, and chip variability, were examined; these may be important for the long-term robustness of such approaches, and this study highlights the need for rigorous evaluation of such factors in future studies.
| INTRODUCTION |
|---|
In urological cancers, urine represents a particularly useful fluid in which to examine tumor markers because of its enhanced potential to contain higher concentrations of directly released tumor-derived products, and additionally its collection is noninvasive. The analysis of urine offers multiple technical challenges such as the several hundred-fold to thousand-fold more dilute protein concentration compared with serum, the presence of proteases, the variability of protein concentration because of hydration state, and the potential for contamination from other sources such as seminal or vaginal fluids. Systematic examination of urine composition by high-resolution two-dimensional PAGE and more recently by liquid chromatography/tandem mass spectrometry approaches has identified several hundred proteins, including intact and cleaved forms of plasma proteins such as retinol binding protein, transferrin, albumin and β2-microglobulin, kidney or urogenital tract-derived proteins such as erythropoietin, urokallikrein, epidermal growth factor, E-cadherin, basement membrane antigens and Tamm-Horsfall protein, and viral or bacterial proteins (6, 7, 8, 9, 10, 11, 12). Tumor-associated proteins in urine include PCA-1 forms in prostate cancer (13), psoriasin (S100A7) in bladder squamous cell carcinoma (14, 15), tumor-associated trypsin inhibitor in many cancers (16), and several bladder transitional cell carcinoma markers including NMP-22, BTA, and fibrin-fibrinogen degradation products (17, 18). Few urinary markers have been described in renal cancer, with NMP-22 (19, 20) and β-glucuronidase (21) being of potential interest.
A more recently developed complementary proteomic technology is SELDI, which is particularly biased toward investigation of molecules <20 kDa and has a sensitivity in the femtomole region or less (22) . SELDI uses chip-based protein sample arrays of differing chemical chromatographic surfaces to selectively bind those proteins in a sample with specific chemical properties, for example hydrophobic, cationic, anionic, or metal binding molecules, before generating mass/charge profiles of the applied sample. Several examples now exist where SELDI has been used to identify cancer-specific protein "fingerprints" in tissue samples (23, 24, 25, 26) , quantitatively and qualitatively characterize existing known tumor markers in prostate cancer (27, 28, 29) , to identify potential new markers such as hepatocarcinoma-intestine-pancreas/pancreatitis-associated protein I in pancreatic ductal adenocarcinoma (30) , and several potential urinary biomarkers in bladder cancer (29) . Considerable potential has also been shown in combining this approach with the use of various computational models that, irrespective of knowledge of peak identity, can generate diagnostic predictions based on peak profiles, with promising results in studies using serum samples from patients with prostate cancer and breast cancer (31, 32, 33, 34) . In ovarian cancer, SELDI profiles of serum samples subjected to an iterative searching algorithm resulted in a model that identified patients with ovarian cancer, including patients with stage I diseases, from patients with benign conditions or healthy controls with a sensitivity of 100% and specificity of 95% (35) . Although such results are extremely impressive, their validation and utility over a longer time period have yet to be explored.
We have examined the clinical utility of the combination of neural-network modeling and SELDI profiling using urine samples from patients with renal cancer, initially examining the ability of such models to discriminate between normal and malignant disease, and also the potential to differentiate from various benign urological conditions. As an additional examination of the robustness of the technique, the predictive accuracy of such models when applied to similar data collected 10 months later has also been explored.
| MATERIALS AND METHODS |
|---|
Sample Collection.
Voided urine samples were obtained from a total of 218 individuals. For the main body of the study (n = 138; see Table 1), samples were obtained from patients before undergoing nephrectomy for later histologically confirmed clear cell RCC (n = 60), normal volunteers (n = 49), and outpatients attending with benign diseases of the urogenital tract (n = 29). For the subsequent long-term testing of the technique, urine samples from an additional 80 subjects were used, comprising 36 patients with RCC, 31 healthy volunteers, and 13 patients with benign urological conditions (Table 2). This study was approved by the Local Research Ethics Committee, and all of the participating patients gave informed consent. A sample of the urine obtained was assayed by standard automated methods for protein and creatinine, using the Synchron LX-20. Within 15 min of voiding, each urine sample (50–100 ml) was placed on ice and a Complete protease inhibitor mixture tablet added. Hematuria was assessed using Multistix, and the pH was measured and adjusted to 7 (by addition of 1 M NaOH or 1 M HCl as appropriate). The adjustment of the pH to neutral before freezing minimizes precipitation during storage (36). Samples were then sieved through a 100-µm nylon filter to remove cellular debris and aggregates, centrifuged for 10 min at 800 x g at 4°C, and aliquotted and stored at -80°C. The period of storage for samples used in this study ranged from 2 weeks to 2 years.
Preparation of Protein Chips for SELDI Analysis.
Samples from the groups in Table 2 were analyzed by SELDI 8–10 months after the initial analysis and establishment of the neural net models using those in Table 1. On the basis of optimization studies as described in the following sections, the following protocol was adopted for the standard analysis of all of the urine samples in the study. Urine samples were thawed and microfuged (10,000 x g) for 10 min. Samples were then diluted to a protein concentration of 50 µg/ml, spiked with additional Complete protease inhibitor mixture (equivalent to 1 mini protease tablet/100 ml of diluted sample), and placed on ice.
After initial pilot experiments, the chip type selected for this study was the weak cation exchange (WCX2) chip. Protein chips (8 spot) were prepared by pretreatment with 5 µl of 10 mM HCl/spot (2 x 5 min) followed by three rapid washes, each with 5 µl of water. The urine samples prepared above were diluted 1:1 with 2x binding buffer [40 mM ammonium acetate/0.2% v/v NP40 (pH 6.5)] containing 4 femtomol/µl bovine cytochrome C (Mr 12,230.9) as an internal calibrant, and 50 µl of diluted sample was applied per spot (1.25 µg total protein) using a bioprocessor. After incubation at room temperature with shaking at 200 rpm for 30 min, samples were removed and the chips washed five times (100 µl/spot for 1 min each wash) with 1x binding buffer followed by two washes as above but with water, and allowed to air-dry. To each spot, 0.35 µl of sinapinic acid matrix solution was added, allowed to dry, and an additional 0.35 µl of matrix was added. After air drying, the chips were analyzed on the SELDI system with PBS-II software using a protocol with the following parameters: acquisition up to 100 kDa, optimum mass range 3–20 kDa, laser intensity of 210 with two warming shots not collected, sensitivity of 10, and collection of 50 transients across the spot surface.
Determination of Optimum Sample Load.
To examine the effects of total protein amount on the SELDI profiles, 6 urine samples (3 normal and 3 RCC) were prepared as described previously with final protein concentrations of 50 µg/ml, diluted 1:1 with 2x binding buffer, and final volumes varying from 5 µl (125 ng total protein applied directly to the chip) to 250 µl (6.25 µg total protein applied using the bioprocessor) were loaded on to WCX2 chips and subjected to SELDI analysis as described above. Similar experiments but examining the effect of various protein concentrations of the diluted samples from 25 µg/ml to 6.25 µg/ml were also carried out. The effects of salt and urea on profiles were also examined by adding appropriate amounts of stock solutions of urea and NaCl to produce elevations of 20 and 100 mM for each.
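The loads quoted above follow directly from the dilution scheme. As a minimal sketch of that arithmetic (the function name and its parameters are illustrative, not part of the published protocol), a 50 µg/ml sample diluted 1:1 with binding buffer gives 25 µg/ml, so the protein applied per spot scales linearly with the volume loaded:

```python
def protein_load_ug(stock_ug_per_ml, applied_volume_ul, dilution_factor=2.0):
    """Total protein (in µg) applied to one spot.

    stock_ug_per_ml   -- protein concentration before mixing with binding buffer
    applied_volume_ul -- volume of the diluted sample applied to the spot
    dilution_factor   -- 2.0 corresponds to the 1:1 dilution with 2x buffer
    """
    final_conc_ug_per_ml = stock_ug_per_ml / dilution_factor
    return final_conc_ug_per_ml * applied_volume_ul / 1000.0  # µl -> ml


# Volumes taken from the text: 5 µl, 50 µl (standard protocol), and 250 µl.
for volume in (5, 50, 250):
    print(volume, "µl ->", protein_load_ug(50.0, volume), "µg")
```

With these inputs the three volumes correspond to 0.125 µg (125 ng), 1.25 µg, and 6.25 µg, matching the figures given in the protocol.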
Assessment of Sample Stability.
To evaluate stability during experimental processing before SELDI, several experiments were carried out on a total of 8 urine samples (processed as described above, i.e., with the addition of protease inhibitor before freezing), where samples were thawed and either no additional protease inhibitor was added or protease inhibitor equivalent to 1 mini protease inhibitor tablet/100 ml was added. Samples were then analyzed by SELDI immediately and after being left on ice for up to 2 h. The effects of the additional protease inhibitor per se on peak profiles were examined by comparison of peak intensities of paired samples (plus and minus inhibitor addition) at time zero. Sample degradation was examined by analyzing changes in peak intensities across the 2-h period after thawing given the addition of additional protease inhibitor and changes in peak intensities across this same time period without the addition of additional inhibitor.
Assessment of Reproducibility.
To assess intrachip variability, 50 µl of 2 urine samples, 1 normal and 1 RCC, were each loaded onto 4 spots of a WCX2 chip as described previously with a total protein load per spot of 1.25 µg. To assess interchip variability, the same samples were assayed using three different chips. Profiles were normalized for mass to the internal calibrant (single and double-charged peaks), and reproducibility of the mass determination and of the peak intensities were calculated for five selected peaks in each sample ranging from 3 to 20 kDa and of various signal intensity, exporting the peak information for additional analysis.
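For readers who want to reproduce this kind of reproducibility figure, the coefficient of variation (CV) can be computed as below. This is a sketch only: the text does not state whether the sample or population standard deviation was used, so the sample form is assumed here, and the replicate intensities are made-up numbers rather than data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Percent CV = 100 * (sample standard deviation / mean)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical intensities of one peak measured on four replicate spots.
replicate_intensities = [10.0, 12.0, 8.0, 10.0]
print(round(coefficient_of_variation(replicate_intensities), 2))
```

The same function applies to replicate mass determinations, where the CVs would be expected to be far smaller than for intensities.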
Data Processing.
Data were processed and examined in two distinct ways. The calibration of all of the spectra was checked using the peaks of the internal calibrant at 12230.9 and 6115.4 and recalibrated where necessary. In the first approach, the Biomarker Wizard feature on the standard Ciphergen ProteinChip software version 3.0 was used to autodetect the peaks present in all of the 138 samples used in the initial training sets and blind sets. The spectra were examined in two separate regions of 2,500–20,000 Da (the lower region being excluded because of matrix interference making this region unreliable) and 20,000–100,000 Da, as different peak detection settings were found to be optimal for each because of different noise levels and peak resolution stringency. These were determined by iterative processing of spectra and monitoring the changes seen with modifications in the different parameters until optimal peak detection was achieved visually. For the lower molecular weight region, the settings were: 2 for the noise determination, using the spectral mass region of interest only; a signal:noise ratio of >5 for the first pass, with clusters being completed using a signal:noise ratio of >3 for the second pass; a cluster mass setting of 0.2% (i.e., ±0.1%); and a requirement for a peak to be present in a minimum of 10% of samples. For the higher molecular mass region, noise was set at 3 and the cluster mass at 2%; otherwise, values were the same as for the low region. The Biomarker Wizard was then used to determine clusters significantly different among the normal control, benign, or RCC patient groups using standard nonparametric tests. Additionally, after manual peak detection using these settings but with no cluster analysis, raw data were exported to Microsoft Excel to determine the presence/absence of peaks, with values of zero assigned to spectra for absent peaks and the number of spectra containing each peak determined.
In the second approach, the spectra generated by SELDI were processed before evaluation by neural-network analysis. This is described in detail below. In summary, the SELDI profile data from a subset (training set) consisting of 48 RCC samples, 38 controls, and 20 benign samples of the main study group of patients and controls (Table 1) were analyzed to locate spectral peaks that, considered alone, were valuable in distinguishing cancer patients from controls as described below. The results of this analysis were, in turn, used to generate multivariate models through the application of neural-network classification techniques. Two models using a control set composed of either healthy controls or healthy controls combined with benign controls were both tested. In addition, the effect of correcting for creatinine by scaling peak intensity values by a factor derived from the protein/creatinine index was examined, generating two additional models. A simpler method, taking the binary presence or absence of each peak, regardless of its intensity, was also examined. The success of each model in correctly classifying a "blind" test set of 12 RCC samples, 11 controls, and 9 benign samples for which the programmers were unaware of the diagnosis was then measured. As an additional test of the applicability of the technique and model over long-term use, an additional "blind" group of samples consisting of 36 RCC, 31 controls, and 13 benign samples (Table 2) was analyzed by SELDI over a period of 2–3 months, 9–10 months later, and tested against the neural-network models.
Peak Detection and Selection Procedure.
The data output from the SELDI mass spectrometer consists of 32,000 data points for each sample profiled with a high mass acquisition of 100,000 Da, each one associating a m/z with an amplitude. In the region between 2,500 and 20,000 Da where the majority of peaks were seen, there are 10,000 data points for each spectrum.
To reduce the spectral data to a smaller number of key variables and enable the construction of valid predictive models, a peak detection algorithm was generated, taking into account the requirement to discriminate peaks from noise and the variance in peak width in different regions of the spectra. The algorithm examined each of the 10,000 data points in the 2,500–20,000 Da range to determine which of these could form the center of a peak. A peak was defined as a high intensity value with a given number of progressively lower intensity values immediately to both sides. A varying size window was passed over the data, with the number of data points required to form a peak varying between 2 points and 20 points on each side of the central data point under consideration. To overcome the minor fluctuations because of noise, the data were smoothed using a simple n-point moving average where n is equal to the window width.
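The peak definition above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the window width is assumed to be 2 x flank + 1 points, and the rule for how the flank size varies across the spectrum (2 to 20 points) is not given in the text, so `flank` is simply passed in as a parameter.

```python
def moving_average(values, n):
    """Centered n-point moving average; edge points use the values available."""
    half = n // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def is_peak_center(values, i, flank):
    """True if `flank` points on each side of index i fall away progressively."""
    if i - flank < 0 or i + flank >= len(values):
        return False
    rising = all(values[j] < values[j + 1] for j in range(i - flank, i))
    falling = all(values[j] > values[j + 1] for j in range(i, i + flank))
    return rising and falling


def detect_peaks(intensities, flank):
    """Indices of peak centers after smoothing (window width assumed 2*flank+1)."""
    smoothed = moving_average(intensities, 2 * flank + 1)
    return [i for i in range(len(smoothed)) if is_peak_center(smoothed, i, flank)]


# Toy spectrum with a single peak centered on index 6.
toy_spectrum = [0, 0, 0, 1, 2, 5, 9, 5, 2, 1, 0, 0, 0]
print(detect_peaks(toy_spectrum, flank=2))
```

A larger `flank` demands a broader, smoother peak, which is one plausible way to handle the wider peaks seen at higher m/z.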
The results of this peak detection for each of the 106 training spectra and the 32 blind spectra were combined to locate clusters of peaks with similar m/z values. This involved forming a frequency distribution of the 10,000 m/z values against the total number of peaks identified within a tolerance of 0.1% of each one. The peaks within this distribution formed the center points of clusters of peaks that could be considered the "same" peak: 368 "master" peaks were identified in this way.
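One way to read this clustering step is sketched below, under stated assumptions: for every m/z value on the spectral grid, count the detected peaks (pooled across all spectra) lying within 0.1% of it, then take the local maxima of that count as "master" peaks. The function names are hypothetical, and the handling of flat-topped maxima (taking the middle of the plateau) is a choice made for this sketch rather than something specified in the text.

```python
from bisect import bisect_left, bisect_right


def count_within_tolerance(grid_mz, peak_mz, tol=0.001):
    """For each grid m/z, count pooled peaks within +/- tol (fractional) of it."""
    peaks = sorted(peak_mz)
    counts = []
    for mz in grid_mz:
        lo = bisect_left(peaks, mz * (1.0 - tol))
        hi = bisect_right(peaks, mz * (1.0 + tol))
        counts.append(hi - lo)
    return counts


def master_peaks(grid_mz, peak_mz, tol=0.001):
    """Grid m/z values at local maxima of the peak-count distribution."""
    counts = count_within_tolerance(grid_mz, peak_mz, tol)
    masters = []
    start, n = 0, len(counts)
    while start < n:
        end = start
        while end + 1 < n and counts[end + 1] == counts[start]:
            end += 1  # extend over a plateau of equal counts
        left = counts[start - 1] if start > 0 else 0
        right = counts[end + 1] if end + 1 < n else 0
        if counts[start] > left and counts[start] > right:
            masters.append(grid_mz[(start + end) // 2])
        start = end + 1
    return masters


# Toy example: three nearby peaks form one cluster, a lone peak forms another.
grid = list(range(5000, 5101))
print(master_peaks(grid, [5049.0, 5050.0, 5051.0, 5080.0]))
```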
The final stage of peak detection was to build an input pattern for each sample. These patterns consisted of one value for each of the 368 master peaks, namely the intensity of any peak in the sample within 0.1% of the master peak, or zero if no such peak was present. These were then scaled into the range of (0, 1) for each input. For the binary models, this was simplified to an input of one for present or zero for absent. Some statistical analysis was required to select which of the inputs would be included in the classification model. A 2 x 2 table was formed for each input showing the incidence of peaks against the incidence of cancer over the training set. From these, the χ² indicator for each input was calculated, and this sorted the inputs into order of significance.
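The pattern-building and ranking steps can be sketched as follows. Several details are assumptions made for illustration: the function names are hypothetical, the largest intensity is taken when more than one sample peak falls within the tolerance, the χ² statistic is the standard Pearson form for a 2 x 2 table without continuity correction, and the scaling of intensities into (0, 1) is omitted.

```python
def build_pattern(master_mz, sample_peaks, tol=0.001):
    """One value per master peak: matched intensity, or 0.0 if no peak matches.

    sample_peaks is a list of (mz, intensity) pairs for a single spectrum.
    """
    pattern = []
    for m in master_mz:
        hits = [inten for mz, inten in sample_peaks if abs(mz - m) <= tol * m]
        pattern.append(max(hits) if hits else 0.0)
    return pattern


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant row or column carries no information
    return n * (a * d - b * c) ** 2 / denom


def rank_inputs(patterns, labels):
    """Input indices ordered from most to least significant by chi-square."""
    scores = []
    for k in range(len(patterns[0])):
        a = sum(1 for p, y in zip(patterns, labels) if p[k] > 0 and y == 1)
        b = sum(1 for p, y in zip(patterns, labels) if p[k] == 0 and y == 1)
        c = sum(1 for p, y in zip(patterns, labels) if p[k] > 0 and y == 0)
        d = sum(1 for p, y in zip(patterns, labels) if p[k] == 0 and y == 0)
        scores.append(chi_square_2x2(a, b, c, d))
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


# Toy data: input 0 tracks the cancer label perfectly, input 1 is uninformative.
toy_patterns = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
toy_labels = [1, 1, 0, 0]
print(rank_inputs(toy_patterns, toy_labels))
```

For the binary models, the same ranking applies because only presence (a nonzero value) or absence enters the 2 x 2 table.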
Neural-Network Classification Procedures.
Fully interconnected feed-forward neural-networks were set up with varying numbers of the most significant inputs, five hidden-layer neurons, and one output neuron. The hidden neurons mapped the sum of their inputs to their outputs using a standard sigmoid function, 1/(1 + e^(-x)). A prediction of cancer corresponded to a high signal on the output neuron. All of the connection weights were randomly initialized in the range (-1, +1). The network was then presented with the useful-peak data for each of the subjects, and trained using the back-propagation algorithm (37). Subjects were presented in random order, alternating between positive and negative classifications to prevent any bias forming because of their unequal numbers. Training was complete when the network correctly classified all of the input data or when 100 presentations of the data set had been completed, whichever came first. The learning rate and momentum term were set to 0.45 and 0.90, respectively.
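A compact sketch of a network of this shape is given below, using the stated learning rate (0.45), momentum (0.90), five hidden neurons, weight range, and stopping rule. It should not be read as the authors' implementation: the class and function names are hypothetical, and the bias weights, the sigmoid on the output neuron, the 0.5 decision threshold, and the cycling of the smaller class during alternation are all assumptions made here.

```python
import math
import random


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    def __init__(self, n_inputs, n_hidden=5, rng=None):
        rng = rng or random.Random()
        # The extra weight in each row is a bias term (assumed, not in the text).
        self.w_hid = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                      for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
        self.v_hid = [[0.0] * (n_inputs + 1) for _ in range(n_hidden)]
        self.v_out = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid]
        hb.append(1.0)
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def predict(self, x):
        return self.forward(x)[2]

    def train_one(self, x, target, lr=0.45, momentum=0.90):
        xb, hb, out = self.forward(x)
        delta_out = (target - out) * out * (1.0 - out)
        # Hidden deltas are computed before the output weights change.
        delta_hid = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                     for j in range(len(self.w_hid))]
        for j in range(len(self.w_out)):
            self.v_out[j] = lr * delta_out * hb[j] + momentum * self.v_out[j]
            self.w_out[j] += self.v_out[j]
        for j, row in enumerate(self.w_hid):
            for i in range(len(row)):
                self.v_hid[j][i] = (lr * delta_hid[j] * xb[i]
                                    + momentum * self.v_hid[j][i])
                row[i] += self.v_hid[j][i]


def train(net, samples, labels, max_epochs=100, rng=None):
    """Train until every sample is classified correctly or max_epochs is reached."""
    rng = rng or random.Random()
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    for epoch in range(1, max_epochs + 1):
        rng.shuffle(pos)
        rng.shuffle(neg)
        for k in range(max(len(pos), len(neg))):
            for group in (pos, neg):  # alternate positive and negative cases
                if group:
                    i = group[k % len(group)]
                    net.train_one(samples[i], labels[i])
        if all((net.predict(x) >= 0.5) == (y == 1)
               for x, y in zip(samples, labels)):
            return epoch
    return max_epochs
```

A call such as `train(TinyNet(n_inputs=2), samples, labels)` returns the number of presentations of the data set that were used, which is capped at 100 as in the text.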
The process of predictive model construction was conducted six times: using only the 38 normal controls as the comparison group, and using a broader control group of 58 cases that also included the 20 benign controls, with each, in turn, being modeled on peak intensities, then subjected to creatinine correction, and then simplified to binary input as described previously. The predictive accuracy of each neural-net model was assessed by using it to predict the status of the initial blind test group and subsequently the larger, later time point blind group (Table 2). Because of the stochastic nature of the training process (random initial weights; random order of presentation of subjects) there was a need for repeated training runs. Thus, when a particular neural-net model is spoken of below, we are, in fact, referring to the averaged outcome of 10 models run with identical parameters.
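Sensitivity and specificity for a blind set, and their average over repeated training runs, can be computed as in the sketch below. The function names are hypothetical, and the text does not say whether the network outputs or the resulting performance figures were averaged across the 10 runs; averaging the performance figures is assumed here. Both classes are assumed to be present in the test set.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for binary labels (1 = RCC, 0 = control)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    return tp / (tp + fn), tn / (tn + fp)


def average_over_runs(runs, actual):
    """Mean sensitivity and specificity across repeated training runs."""
    pairs = [sensitivity_specificity(run, actual) for run in runs]
    sens = sum(p[0] for p in pairs) / len(pairs)
    spec = sum(p[1] for p in pairs) / len(pairs)
    return sens, spec


# Illustrative blind set of 12 RCC and 11 controls with hypothetical predictions:
# 10 of 12 cancers and 9 of 11 controls called correctly.
actual = [1] * 12 + [0] * 11
predicted = [1] * 10 + [0] * 2 + [0] * 9 + [1] * 2
sens, spec = sensitivity_specificity(predicted, actual)
print(f"sensitivity {100 * sens:.1f}%, specificity {100 * spec:.1f}%")
```

With test groups this small, each misclassified sample shifts the percentage by roughly 8 to 9 points, which is worth bearing in mind when comparing models.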
| RESULTS |
|---|
0.05% in all of the cases (0.02% in 50% of cases) with the exception of two peaks in the RCC sample where CVs of 0.1% and 0.09% were found. As expected, the CVs for peak intensity were considerably greater, with selected peaks having mean peak intensities ranging from 4.6 to 44.6 and CVs of 14.0% to 57.4% (average of 34.2% and values <30% in 50% of cases), with similar values for inter- and intrachip comparisons. Examples of the degree of reproducibility are shown in Fig. 2A.
From the manual analysis using the standard Ciphergen software to examine the spectral profiles and generate cluster reports in the range 2,500 to 20,000 Da, a total of 180 peaks were detected in the initial 138 samples used. Using the threshold of being present in at least 10% of spectra and the peak clustering parameters indicated, this figure was reduced to 30 peak clusters excluding those derived from the cytochrome C spike or aprotinin. Peaks identified as being consistently present in the majority of samples irrespective of sample grouping were present at 4,753.3, 6,715.6 and 16,939.0 Da. Sixteen peaks were identified as being significantly different between the normal and RCC groups with the most significant of the positive discriminators for RCC being 2,789.5 (P = 0.0196; 32 RCC versus 21 normal), 3,580.7 (P < 0.0001; 11 RCC versus 0 normal), and 4,136.3 Da (P = 0.0113; 19 RCC versus 9 normal). The peak at 3,580.7 Da was the only peak found solely in the RCC group, although only present in 11 samples. Interestingly, this peak was biased toward patients with larger tumors with 3 of 3 patients with T4 tumors, 6 of 26 T3 samples, 2 of 8 T2 samples, and 0 of 23 T1 samples being positive. The most significant discriminatory peaks for normality were at 4,957.7 (P < 0.0001; 37 normal versus 26 RCC), 5,071.5 (P < 0.0001; 37 normal versus 19 RCC), 5,276.3 (P = 0.0001; 20 normal versus 7 RCC) and 16,793.3 Da (P = 0.01; 33 normal versus 17 RCC). Examples of some spectra illustrating these peaks are presented in Fig. 3.
Neural-network analysis of the complex spectra obtained using SELDI initially appeared to be successful in that both the models using either presence/absence of peaks or the intensities of peaks were able to discriminate between RCC and healthy normal controls in the training sets with sensitivities and specificities of 98–100% (Tables 3 and 4). Using an initial set of 32 samples tested blind against the models, variable results were achieved, and in general the models incorporating peak intensities performed better than those solely examining presence and absence of peaks, with sensitivities and specificities of >80% for all of the group comparisons (Table 3). The inclusion of benign samples in the control training set produced poorer results in the presence/absence model than training solely on healthy controls, whether discriminating between RCC and healthy controls alone, or healthy controls and benign combined (Table 4). This was not the case for the intensity model, where either training set produced similar results with sensitivities and specificities >80% (Tables 3 and 4). The peaks found to be most discriminatory and incorporated into the neural-network from all of the peaks detected in the samples significantly overlapped with those found by manual analysis to be significant, particularly those at 5,071.5, 16,793.3, 5,276.3, 4,537.5, and 4,957.7 Da. In the benign group, BPH and benign kidney diseases were generally predicted well, whereas the urinary tract infection samples were all predicted as RCC.
10 months later were disappointing. For the presence/absence models, although specificities were either little affected or indeed improved, sensitivities decreased markedly to <40% (Table 5).
A later analysis of chips from different batches revealed an additional possible contributory factor in that the late blind set of samples was analyzed gradually over a period of 3 months using several batches of chips but not those used for the initial analysis. Unfortunately none of the initial batches of chips was available for repeat examination with the exception of 4 spare spots on a chip batch from a similar time to those used for establishment of the initial model, although now significantly past its validation date (AB058). It was also possible to examine 2 samples on a limited number of spare sample chip spots from several batches of chips including some used in the late blind trial. It is quite evident that although in most cases the overall gross profiles are similar, there are both qualitative and quantitative variations between chip batches, with several of the peaks found to be the most significant discriminators being most affected by the use of different chips. An example of the effects seen is shown in Fig. 4. Clearly some chips such as DU081 and AB058 (although 1 year past validation date) produced very similar profiles to those used originally, although particularly and not unexpectedly in the latter case, with reduced signal. However, using the identical sample solution, CN651 showed selective loss of some peaks including the cluster at 5 kDa, and DZ145 showed a more similar profile to the early results but markedly reduced signal across the range. These effects were reproducible in that alternative CN and DZ chips from the same batches showed the same differences in profile.
18 months earlier were rerun at a variety of laser intensities and tested blind against the original network. Laser performance was clearly vastly improved, with the best results visually in terms of comparability between runs being obtained at a laser intensity of only 190 compared with 210 when the samples were originally used to generate the neural-network (Fig. 5).
| DISCUSSION |
|---|
The establishment of suitably promising initial neural-network models in our study indicated the potential of this approach in being able to use urine samples to discriminate between renal cancer and control sample groups. However, the sensitivity and specificity values, and ultimately the associated predictive values obtained in the initial small test group analyzed blind, indicated that our procedure would be unsuitable as a screening tool without additional refinement of the model. Nevertheless, potential peaks of interest for subsequent characterization and quantification using assays such as immunoassays were seen. Although poor, the sensitivity achieved was better than that found in preliminary studies for NMP-22 in renal cancer, for example, where only 40% is achieved (20). The only previous studies using SELDI to analyze urine samples examined renal function after administration of radiocontrast medium to rats and humans undergoing cardiac catheterization (38) and profiled samples from 30 patients with transitional cell carcinoma (29). The reproducibility of the mass accuracy was similar in these studies to that reported here, although the reproducibility of peak intensities was not reported. However, CVs of 15–20% have been seen using serum samples after normalization of peak intensities (34). Similar to our findings, significant interindividual heterogeneity in urinary profiles was also seen in the TCC study (29) using SAX2 anion exchange chips, with similar numbers of peaks and multiple protein changes detected in the cancer group, including five potential novel biomarkers. Using individual peaks, sensitivity ranged from 43% to 70% and specificity from 70% to 86%, but using the presence of particular clusters of peaks, sensitivity was significantly increased to 87% with a specificity of 66%, with the detection of low-grade tumors being superior to conventional cytologic approaches. Using antibodies, peaks at 3.3/3.4 kDa were identified as being human defensins 1 and 2. Relatively little is known about the identities of smaller molecular weight proteins in urine, as they are not normally present on gels because of their size and also the fact that an initial dialysis step with cutoff of 10 kDa is often used in urine processing. In addition to the urinary tumor-associated proteins described previously, a number of proteins with proapoptotic and antitumor activity have also been found in urine (39), which may represent some of the peaks seen here. These include eosinophil-derived neurotoxin, antineoplastic urinary protein, angiostatin, inhibin, and activin A. A tentative identification of the presence of the peptide β-defensin-1 in urine profiles in this study can be made based on comparison with previous non-SELDI-based urinary studies. β-Defensin-1 is produced by many epithelial cell types, including particularly the renal distal tubules and loops of Henle in the kidney (40, 41), and its presence in urine has been described previously (40, 42), with proforms and proteolytically processed forms including forms of 5068.1, 5068.7, 4749.2, and 4750.5 Da, very close to the masses of the peaks seen here at 5071.5 and 4753.3, and able to bind WCX2 chips (43). The reason for the decrease or relative absence in RCC urines may reflect increased proteolysis or alternatively decreased production, because complete loss of defensin expression has been shown in clear cell tumors (44).
However, the most marked problem in the present study is the poor results achieved with the subsequent larger blind set analyzed 10 months later, particularly the marked decline in sensitivity. There are several possible explanations ranging from sample processing and stability through machine performance to problems with the neural-network model. The neural-network model was used in combination with an equally promising peak detection algorithm that enabled the explicit identification of predictively relevant peaks, in contrast with the "black box" approaches often used in machine learning. As is typical with neural-network analyses, various aspects of this design were arrived at via a process of trial and error. Experimentation on fictional data sets was used to establish that the network could successfully classify cases given a few predictively relevant peaks against a background of noise, and experimentation within the training set determined the optimum number of inputs for each model type with this number determining the sizes of models used to predict the blind test sets. Peaks were ranked by their associated χ² statistics when cross-tabulated with the incidence of cancer, and the number of peaks used as input ranged from the best 10 to the best 70. The exact number of inputs was chosen after testing to find the smallest subset that would result in the minimum classification error. The choice of 5 hidden-layer neurons was somewhat arbitrary; the performance of networks with 10 neurons in the hidden layer was also examined but produced no improvement in predictive accuracy. More fundamentally, the use of a small hidden layer was intended as a defense against the dangers of over-fitting: with enough inputs and a large enough hidden layer, a network could be trained to successfully classify any data set. Given the modest sample size in comparison with the amount of data collected per subject, it seemed prudent to limit the space of possible models. Although a 5-hidden-neuron model will fail to capture highly complex interactions between input variables, it also prevents the detection of spurious interactions through over-fitting to the training set. (By way of comparison with traditional statistical techniques, the use of 5 neurons in the hidden layer is roughly equivalent to limiting a factor analysis or principal components analysis to the top 5 factors). An element of over-fitting may have occurred with the artificial neural network, which would account for the 15% drop in predictive accuracy with the initial blind test samples, but we feel that this is then unlikely to account for an additional drop 10 months later, although there may be a minor contribution.
An area for future development would be the comparative use of different modeling strategies and decisions regarding optimal sample numbers on which to establish models. Sample group size is critical, but the method of selection is not clear, with traditional sample size calculations not necessarily applying to this kind of data where multiple biomarkers are being analyzed by machine learning approaches (45). By iteratively varying the numbers of subjects included in the test and training sets, suitable numbers should be found beyond which the power of the model does not substantially improve. Whatever the technique used to build predictive models, experimentation will always be required to fine-tune the approach. For example, using serum-based SELDI datasets, a sensitivity of 83% and specificity of 97% for prostate cancer versus noncancer was found using a simple decision tree classification algorithm (32) but improved to 95.8–100% on the same datasets if a boosted decision tree algorithm was used (34).
Precautions were taken to ensure that issues such as urine stability and processing would contribute minimally in introducing variability to the study. Stability of urine samples during freezing and processing may be more of an issue than for plasma or serum samples with the potential for increased protease content particularly in patients with cancer and the relative lack of endogenous protease inhibitors compared with serum. All of the samples were processed and stored identically and although sample stability was not generally found to be a problem, the use of protease inhibitor mixtures is a precaution that we use routinely and that does not appear to interfere with analysis by SELDI with the exception of introducing an aprotinin peak at 6504 Da. The amount used was titrated on the basis of ensuring minimal interference of this peak on the spectra while being within the recommended working range. The issue of differences in sample stability accounting for some of the problems seen here cannot be ruled out with storage times varying from weeks to 2 years. However, all of the samples were stored at -70 to -80°C, which has been shown to be generally less detrimental to several urinary proteins than -20°C (46) , and, although samples in the control groups tended to have been stored for less time than the RCC groups, samples in both the training sets and late blind groups contained similar mixtures of samples in terms of storage times so that the possibility of differential sample storage artifacts was unlikely. However, samples were always diluted and applied to the chip with minimal delay, because changes in profiles were occasionally seen in samples that had been diluted and purposely left on ice for prolonged periods of time before analysis (4 h), but they appeared to be nonselective and may represent precipitation. 
There is clearly a need to standardize the amount of protein loaded, a practice that is not always followed in the published studies using SELDI and, indeed, may not always be necessary depending on the sample type if sufficient sample is applied to exceed the protein threshold. Because the amount of protein loaded was standardized in this study, differences in protein concentration could not account for the results seen and, although factors such as ionic strength of solutions are important in determining ion-exchange binding, urinary composition in terms of salt and urea concentrations was not found to affect profiles on WCX2 chips within the ranges examined. The demonstrated necessity to standardize protein concentrations precluded normalizing for hydration state using creatinine concentrations, as would normally happen for a quantitative assay of any analyte. A mathematical adjustment of peak intensities taking into account protein and creatinine ratios of samples before analysis was investigated, but this did not improve the power of the models used. This is not surprising, because the correction for any sample could only be applied uniformly to the peaks detected, and these are partly governed by the amount of sample loaded, which, if normalized for creatinine rather than protein, may change between samples.
Machine performance in terms of laser intensity clearly alters with time, and it is likely that this is at least partly implicated in the failure of the model when used over a long period of time. Although the primary failure in the blind set was most marked in sensitivity, adjustment of the replacement laser and reanalysis of 39 training set samples clearly boosted the results toward better sensitivity rather than specificity. However, the failure to achieve 100% sensitivity and specificity clearly indicates that other factors are involved or that additional adjustments need to be made. Laser performance will deteriorate with time, and quite clearly measures need to be taken to monitor this regularly and adjust settings accordingly. The use of a standard chip, which can be used for calibration and to monitor laser performance in terms of the peak intensities, may be ideal, because this would not rely on chip chemistry, being essentially dried on to the chip surface. We are currently examining various ways of achieving this monitoring and of detecting subtle changes over time, in addition to those that are readily apparent, as in our case, when a study is recommenced after a long break. An automatic laser adjustment feature is apparently now included in the latest version of the Ciphergen software, but whether this can achieve the reproducibility of performance needed remains to be determined. In addition, changes in the detector performance with time may be important. Quantification by mass spectrometry is technically very demanding and, when used as in this and similar studies, can at best be regarded as semiquantitative, with a variety of factors influencing protein ionization, including suppressive effects exerted by other proteins in the mixture analyzed.
Normalization of peaks against total ion current may be one software option to improve this aspect, but again for samples such as urine with relatively few peaks and exhibiting a relatively high degree of heterogeneity, this may not be as valid as for more complex samples and certainly could not be used to improve long-term comparisons.
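For reference, normalization against total ion current can be sketched as below. This is a generic illustration under simple assumptions (each spectrum is a list of non-negative peak intensities, and the total ion current is taken as their sum); it does not reproduce the Ciphergen software option, and the function name is hypothetical.

```python
from typing import List


def normalize_to_tic(spectra: List[List[float]]) -> List[List[float]]:
    """Scale each spectrum so its total ion current equals the mean TIC of the set."""
    tics = [sum(s) for s in spectra]
    if any(t <= 0 for t in tics):
        raise ValueError("each spectrum needs a positive total ion current")
    target = sum(tics) / len(tics)
    return [[v * target / t for v in s] for s, t in zip(spectra, tics)]


# Two hypothetical spectra with the same profile but different overall signal.
normalized = normalize_to_tic([[1.0, 3.0], [2.0, 6.0]])

# A sparse, heterogeneous case: one large peak in the first spectrum changes
# its scaling factor and therefore rescales the unrelated second peak as well.
sparse = normalize_to_tic([[9.0, 1.0], [1.0, 1.0]])
```

The second call illustrates the caveat in the text: when spectra contain few peaks, a single dominant peak largely determines the total ion current, so the scaling applied to every other peak in that spectrum depends on it.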
The variability found with some chips is also likely to have contributed to the results. On the basis of the results shown here, this may be batch-related, but it may also represent variability of some chip spots within batches, although this was not found in our initial reproducibility studies. This requires additional evaluation; the use of duplicate analyses, for example, may be explored, but our initial work indicated that this would not necessarily overcome the issue. This has not been highlighted before, although in one of the prostate cancer trials, quality control chips consisting of pooled normal and cancer serum samples were developed to allow monitoring of between-day/chip variability (34). We are also exploring similar practices to assess chip chemistry, combining this with the standard chips to assess laser performance as described. To our knowledge, no studies examining the long-term use of computational profiling in this scenario have been published, although one study using serum on immobilized metal affinity chromatography chips has reported that randomly selected samples rerun months or even a year after the study were correctly classified (32). It is not clear how many samples were examined in this way or whether the same batch of chips was used.
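Monitoring of between-day or between-chip variability with a quality control sample could, for example, be based on the coefficient of variation (CV) of selected peak intensities across repeated runs. The sketch below assumes a pooled QC sample run on several chips or days; the peak labels, intensity values, and the 15% threshold are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List


def coefficient_of_variation(values: List[float]) -> float:
    """CV as a percentage: sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(values) / m


def flag_unstable_peaks(qc_runs: Dict[str, List[float]], max_cv: float) -> List[str]:
    """Return, in sorted order, the QC peaks whose CV across runs exceeds max_cv."""
    return sorted(
        name for name, vals in qc_runs.items()
        if coefficient_of_variation(vals) > max_cv
    )


# Hypothetical intensities of two QC peaks measured on three chips.
qc_runs = {
    "qc_peak_1": [10.0, 10.0, 10.0],  # perfectly stable
    "qc_peak_2": [8.0, 10.0, 12.0],   # mean 10, sample SD 2, CV 20%
}
unstable = flag_unstable_peaks(qc_runs, max_cv=15.0)
```

Tracking such values by chip batch and by date would help to separate chip-related variability from gradual changes in laser or detector performance.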
The results initially obtained here for renal cancer are equal to or superior to the urinary protein assays described currently for bladder cancer (17 , 18) , and even the later blind results are similar to those obtained with BTA and BTA Stat. However, such assays have a role in monitoring recurrence in bladder cancer where local recurrence is common, whereas for renal cancer, recurrence at other sites is more likely and, therefore, serum-based markers may be more useful. However, the more important issues raised by this study are the implications for the long-term robustness of the type of approach used here. With the recognition that no single biomarker fulfils all of the desirable roles of markers in terms of utility in diagnosis, prognosis, or monitoring, approaches using multiple parameters such as SELDI and other similar array-based approaches, whether at the cDNA or protein level, may ultimately prove to be the most versatile. It could be that some of the problems experienced in this study may have been compounded by the use of urine, which contains far fewer peaks and is potentially more labile. Several extremely promising studies have now been published using SELDI in combination with computer algorithms and serum samples, although follow-up studies regarding long-term utility have not yet been published. If they prove to be more robust in the long-term, then this may be an indication that serum samples may be more appropriate. The exciting challenge then will be their translation to clinical practice with prospective samples in multiple geographic sites. There is concern about the designs of trials to formally take promising biomarkers forward and the importance of maintaining standards (47, 48, 49) . 
Several initiatives such as the Tumor Marker Utility Grading System (TMUGS), the Early Detection Research Network (EDRN), and the Program for Assessment of Clinical Cancer Tests (PACCT) address issues ranging from coordination of research activities through to experimental validation and to guidelines for clinical evaluation (45 , 48 , 50 , 51) . The specific technical factors highlighted in this study are likely to be important considerations in the pursuit of successful translation of SELDI-based approaches in this area.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
1 Supported by Cancer Research United Kingdom.
2 To whom requests for reprints should be addressed, at Cancer Research UK Clinical Centre, St. James's University Hospital, Beckett Street, Leeds LS9 7TF, United Kingdom. Phone: 44-113-2064927; Fax: 44-113-2429886; E-mail: r.banks{at}leeds.ac.uk
3 The abbreviations used are: NMP, nuclear matrix protein; BTA, bladder tumor antigen; SELDI, surface enhanced laser desorption/ionization time of flight mass spectrometry; RCC, renal cell carcinoma; CV, coefficient of variation.
Received 4/7/03. Revised 6/30/03. Accepted 7/24/03.
| REFERENCES |
|---|
|
|
|---|