Application of Raman Spectroscopy to Identify Microcalcifications and Underlying Breast Lesions at Stereotactic Core Needle Biopsy

Microcalcifications are a feature of diagnostic significance on a mammogram and a target for stereotactic breast needle biopsy. Here, we report development of a Raman spectroscopy technique to simultaneously identify microcalcification status and diagnose the underlying breast lesion, in real-time, during stereotactic core needle biopsy procedures. Raman spectra were obtained ex vivo from 146 tissue sites from fresh stereotactic breast needle biopsy tissue cores from 33 patients, including 50 normal tissue sites, 77 lesions with microcalcifications, and 19 lesions without microcalcifications, using a compact clinical system. The Raman spectra were modeled on the basis of the breast tissue components, and a support vector machine framework was used to develop a single-step diagnostic algorithm to distinguish normal tissue, fibrocystic change (FCC), fibroadenoma, and breast cancer, in the absence and presence of microcalcifications. This algorithm was subjected to leave-one-site-out cross-validation, yielding a positive predictive value, negative predictive value, sensitivity, and specificity of 100%, 95.6%, 62.5%, and 100% for diagnosis of breast cancer (with or without microcalcifications) and an overall accuracy of 82.2% for classification into specific categories of normal tissue, FCC, fibroadenoma, or breast cancer (with and without microcalcifications). Notably, the majority of breast cancers diagnosed are ductal carcinoma in situ (DCIS), the most common lesion associated with microcalcifications, which could not be diagnosed using previous Raman algorithm(s). Our study shows the potential of Raman spectroscopy to concomitantly detect microcalcifications and diagnose associated lesions, including DCIS, and thus provide real-time feedback to radiologists during such biopsy procedures, reducing nondiagnostic and false-negative biopsies.


Introduction
Breast cancer is the second leading cause of cancer-related death in women, with 1 in 8 women likely to develop breast cancer in her lifetime. In 2011, 230,480 new cases of breast cancer are estimated to have occurred in the United States alone (1). The most effective approach for preventing breast cancer morbidity and mortality is early detection. X-ray mammography is currently the only accepted routine screening method for early detection (2).
Microcalcifications are localized deposits of calcium species in breast tissue that geographically target the most clinically significant abnormality within the breast and are considered an early mammographic sign of breast cancer (3). However, mammography cannot accurately distinguish microcalcifications associated with benign and malignant breast lesions, and even the best mammographic algorithms have limitations arising from dark mammographic backgrounds and densely clustered calcifications (4-6). Therefore, although the type of microcalcification is known to correlate with disease status (e.g., type II microcalcifications co-localize with proliferative lesions; refs. 3, 7), most patients currently undergo vacuum-assisted stereotactic core needle biopsy to determine whether or not the microcalcifications are associated with breast cancer.
In addition, despite stereotactic guidance, core needle biopsy fails to retrieve microcalcifications in up to 15% of patients (8). Failure to retrieve the microcalcifications results in nondiagnostic or false-negative biopsies, requiring the patient to undergo repeat, often surgical, biopsy. Therefore, there is a clinical need for a tool that can detect microcalcifications in the breast tissue to be biopsied and provide real-time feedback to the radiologist during stereotactic core needle biopsy procedures to ensure that the biopsied tissue core contains the microcalcifications observed during mammography.
Raman spectroscopy is a nondestructive, chemical-specific, inelastic scattering technique (9, 10) that can be conducted with fiber optic probes compatible with vacuum-assisted stereotactic biopsy needles and so is an ideal choice for detecting microcalcifications during these procedures. Calcium-containing species found in breast microcalcifications, such as calcium hydroxyapatite (CHA) and calcium oxalate (CAO), are strong Raman scatterers. Thus, Raman spectroscopy is sensitive to the presence of microcalcifications and can therefore be used as a clinical tool for guidance of stereotactic breast core needle biopsies for microcalcifications.
Raman spectroscopy is currently being explored by our laboratory (9, 11-18) and others (19-28) for its potential to both detect breast microcalcifications and diagnose breast cancer. Our group has shown the potential of Raman spectroscopy to detect and distinguish type I and II breast microcalcifications and to differentiate type II calcifications associated with benign and malignant breast lesions (12) in Raman microscopy studies of formalin-fixed, paraffin-embedded breast biopsies. We also recently developed the first Raman spectroscopy algorithm to detect microcalcifications in fresh breast needle biopsy tissue cores (18). However, if Raman spectroscopy is to be used as a clinical tool for guidance of stereotactic breast needle biopsies for microcalcifications, it is desirable to not only detect microcalcifications but also diagnose the specific breast lesion associated with the microcalcifications. We previously devised a Raman algorithm to diagnose breast cancer and benign breast lesions including fibrocystic change (FCC) and fibroadenoma. However, the algorithm was developed on tissue sites devoid of microcalcifications (15, 17). This Raman algorithm is, therefore, not applicable to breast lesions associated with microcalcifications because of the large spectral contributions from CHA and/or CAO, which make it difficult to map the contributions of the other tissue components onto the algorithm domain created without consideration of microcalcifications.
To address this issue, we are developing new Raman algorithms to diagnose breast cancer and benign breast lesions in the presence or absence of microcalcifications. One possible approach is to combine the previous algorithm for microcalcification detection (18) with a new algorithm for diagnosis of breast lesions (irrespective of microcalcification status; ref. 29), thereby constructing a sequential 2-step algorithm to identify the breast lesion(s) associated with the microcalcifications. However, this is a laborious and unwieldy process (that also does not provide the required level of accuracy as detailed below), which can be simplified by devising a single algorithm to simultaneously detect microcalcifications and diagnose the associated breast lesion(s).
Here, we report the first development of a single-step Raman spectroscopy algorithm to simultaneously determine microcalcification status and diagnose the underlying breast lesion(s), in real-time, during stereotactic breast core needle biopsy procedures. We also compare the diagnostic performance of this single-step algorithm with its 2-step counterpart(s) to comprehensively assess the advantages (and/or drawbacks) of pursuing these 2 approaches.

Patient population
This study was conducted on the Raman spectroscopy data set used previously to develop our decision algorithms to detect breast microcalcifications (18). Raman spectroscopy was conducted ex vivo on fresh breast tissue cores obtained from 33 female patients (ages, 38-79 years) undergoing vacuum-assisted stereotactic core needle breast biopsy procedures in the Breast Health Center at University Hospitals-Case Medical Center, Cleveland, OH. All studies were approved by the Case Cancer Institutional Review Board and the Massachusetts Institute of Technology Committee On the Use of Humans as Experimental Subjects, in accordance with assurances filed with and approved by the U.S. Department of Health and Human Services. Informed consent was obtained from all subjects before their biopsy procedures.

Raman spectral acquisition
The Raman spectra were obtained using a portable clinical Raman spectroscopy system, previously described in detail (18). The instrument delivers 830 nm near-infrared (NIR) excitation light to the tissue via an optical fiber probe. The fiber probe, which is in the form of a flexible catheter of outer diameter 2 mm, consists of a central excitation fiber surrounded by nine acquisition fibers, each of 200 µm diameter. The acquisition fibers are coupled to an f/1.8i spectrograph for dispersion onto a thermoelectrically cooled CCD detector. Laser power at the probe-tissue interface was 98 to 105 mW. Ten spectral acquisitions were obtained from each tissue site, and these were summed to provide the final spectrum corresponding to the tissue site. Each of the 10 spectral acquisitions was conducted in 0.25 seconds, for a total collection time of 2.5 seconds per tissue site. Analysis of the spectra was conducted in real-time, that is, in a few tens of milliseconds as detailed previously (30). In other words, each spectrum was acquired, fit with the model, and analyzed, and the diagnostic classification for the tissue site was displayed on a computer interface, in just more than 2.5 seconds.
Raman spectra were collected from several tissue sites of interest on each tissue core [typically normal tissue, lesions (grossly abnormal tissue) without microcalcifications, and lesions with microcalcifications] identified by gross inspection and comparison with the accompanying specimen radiograph. The tissue cores were roughly cylindrical and measured approximately 1.7 mm by 2 cm, as determined by the size of the needle biopsy port. Spectra were also collected from different tissue cores in each biopsy, so the number of spectra varied from patient to patient. The number of tissue sites studied per core varied from 1 to 8 and the number of cores per patient from 1 to 6. All spectra were obtained within 30 minutes of excision.

Histopathology
After spectral acquisition, the tissue sites from which Raman spectra were obtained were uniquely identified with multicolored colloidal inks. The tissue was then fixed in 10% neutral-buffered formalin and paraffin-embedded, and sections were cut and stained with hematoxylin and eosin (H&E) for microscopic examination by an experienced breast pathologist. Histopathologic evaluation, in conjunction with the radiographic assessment, was used as the gold standard for comparison with the Raman spectral diagnosis.

Raman data analysis
The Raman system was wavenumber calibrated, and the Raman spectra were corrected for the system wavelength response and background subtracted, as previously described (18). The Raman spectra were then fit with a previously developed breast model (13), in which the Raman spectrum is considered as a linear combination of the basis spectra of 10 breast tissue constituents, including epithelial cell nuclei (ECN) and cytoplasm (ECC), fat, cholesterol-like deposits (CHOL), β-carotene (β-CAR), collagen (COLL), oxy-hemoglobin (oxy-HB), CHA, CAO, and water; and 2 fiberoptic probe materials, epoxy and sapphire. Ordinary least squares (OLS) fitting was used to determine the contribution of each basis spectrum to the tissue spectrum, yielding fit coefficients (FC) that provide information about the morphologic and chemical composition of the tissue. The goodness of the model fit was qualitatively estimated by visual inspection of the residual (spectrum minus fit) and quantitatively from the SD of the residual. The extracted FCs were used to develop the Raman algorithms outlined below.
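The OLS step described above can be illustrated with a minimal sketch. It assumes two made-up five-point basis spectra (labeled "fat" and "CHA" purely for illustration) rather than the 12 measured basis spectra of the breast model, and it solves the normal equations directly in plain Python; the function names `ols_fit` and `residual_sd` are hypothetical and are not taken from the original analysis code.

```python
def ols_fit(basis, spectrum):
    """Return coefficients c minimizing ||spectrum - sum_k c[k] * basis[k]||^2."""
    k = len(basis)
    # Normal equations: (B B^T) c = B s, with each basis spectrum as a row of B.
    gram = [[sum(a * b for a, b in zip(basis[i], basis[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(a * s for a, s in zip(basis[i], spectrum)) for i in range(k)]
    aug = [row + [r] for row, r in zip(gram, rhs)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (aug[i][k] - tail) / aug[i][i]
    return coeffs


def residual_sd(basis, spectrum, coeffs):
    """Standard deviation of the residual (spectrum minus fit)."""
    fit = [sum(c * b[n] for c, b in zip(coeffs, basis)) for n in range(len(spectrum))]
    resid = [s - f for s, f in zip(spectrum, fit)]
    mean = sum(resid) / len(resid)
    return (sum((r - mean) ** 2 for r in resid) / len(resid)) ** 0.5


# Hypothetical toy basis spectra (not measured data).
toy_fat = [1.0, 0.0, 2.0, 0.0, 1.0]
toy_cha = [0.0, 3.0, 0.0, 1.0, 0.0]
toy_spectrum = [0.7 * f + 0.2 * c for f, c in zip(toy_fat, toy_cha)]

fit_coefficients = ols_fit([toy_fat, toy_cha], toy_spectrum)
fit_residual_sd = residual_sd([toy_fat, toy_cha], toy_spectrum, fit_coefficients)
```

In this noise-free toy case the recovered coefficients match the mixing weights and the residual SD is essentially zero; with real spectra the residual SD would serve as the quantitative goodness-of-fit measure mentioned above.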

Raman algorithm development
In this study, support vector machines (SVM) were used to construct single- and 2-step algorithms to concomitantly detect microcalcifications and diagnose the associated breast lesion(s) based on the Raman FCs. SVMs are a relatively new class of nonlinear classification techniques (31-33), whose robustness with respect to sparse and noisy data has enabled their extensive use, especially in bioinformatics and chemometrics.
First, a single-step SVM Raman algorithm was constructed to simultaneously detect microcalcifications and diagnose the associated breast lesion(s). In this single-step algorithm, we considered the positive class to be cancer (with or without microcalcifications), unless otherwise specified. All other classes including normal, fibroadenoma, and fibrocystic change, irrespective of microcalcification status, were considered to be negative. This single-step SVM Raman algorithm was then compared with 2-step Raman algorithms. Initially, a "naïve" 2-step algorithm was built by combining the logistic regression (LR) algorithm for the detection of microcalcifications described by Saha and colleagues (18) with an SVM algorithm developed for diagnosis of lesions irrespective of microcalcification status described by Dingari and colleagues (29). In the first step of this 2-step naïve algorithm, LR was conducted to discriminate the entire data set into 3 classes: normal breast tissue, lesions without microcalcifications, and lesions with microcalcifications. For the LR step, a likelihood ratio test was used to estimate the FCs most critical for diagnosis, namely COLL, fat, and the combined contribution of CHA and CAO. In the second step of the naïve algorithm, a breast model FC-based SVM algorithm was used for all tissue sites to further diagnose the specific tissue type, namely, normal, FCC, fibroadenoma, or cancer.
In addition, to comprehensively assess the 2-step algorithm, we subdivided the second step into 2 new SVM Raman algorithms (as diagrammed in Fig. 1), one optimized for classification of breast lesions without microcalcifications and one optimized for classification of breast lesions with microcalcifications. We call this implementation of the 2-step algorithm the "optimized" version. It is worth noting that the first step (the LR algorithm for detection of microcalcifications) is identical in both the naïve and optimized versions of the 2-step algorithms. All 3 scenarios [namely, single-step, 2-step (naïve), and 2-step (optimized) algorithms] result in classification of the tissue sites into 1 of 8 categories, based on the tissue type (normal breast tissue, FCC, fibroadenoma, or cancer) and microcalcification status (with and without microcalcifications).
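The category structure and the routing logic of the optimized 2-step scheme can be sketched as follows. This is only an illustration under stated assumptions: the three classifier arguments (`step1`, `svm_without`, `svm_with`) are hypothetical callables standing in for the LR and SVM models, the group names are invented labels, and the handling of sites that step 1 calls normal is a simplification, since the text does not spell out how normal sites with microcalcifications are routed.

```python
from itertools import product

TISSUE_TYPES = ("normal", "FCC", "fibroadenoma", "cancer")

# Each category is a (tissue type, has_microcalcifications) pair: 4 x 2 = 8.
CATEGORIES = list(product(TISSUE_TYPES, (False, True)))


def two_step_optimized(site, step1, svm_without, svm_with):
    """Route a tissue site through a 2-step decision scheme.

    step1 is assumed to return 'normal', 'lesion_without', or 'lesion_with';
    the second-step classifiers are assumed to return a tissue type.
    """
    group = step1(site)
    if group == "normal":
        # Simplifying assumption: step-1 normals are reported without microcalcifications.
        return ("normal", False)
    if group == "lesion_with":
        return (svm_with(site), True)
    return (svm_without(site), False)


# Stub classifiers, purely for demonstration.
example = two_step_optimized(
    site={"CHA": 0.4, "COLL": 0.3},
    step1=lambda s: "lesion_with" if s["CHA"] > 0.1 else "lesion_without",
    svm_without=lambda s: "FCC",
    svm_with=lambda s: "fibroadenoma",
)
```

Because the second-step classifier only sees the sites passed on by the first step, any error made in step 1 cannot be corrected in step 2, which is the structural difference from a single-step classifier that assigns one of the combined categories directly.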

Here, LR was conducted using in-house code (MATLAB R2010b, MathWorks). The SVM classification analysis was conducted using Orange (http://www.ailab.si/orange), an open-source data mining suite featuring Python scripting and a graphic interface (34). Specifically, a radial basis function (RBF) kernel with a Gaussian profile $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$ was used for nonlinear SVM classification, where $\gamma$ represents the RBF kernel parameter. The optimal model parameters C (cost parameter) and $\gamma$ that give minimum error in cross-validation were determined by conducting a grid search over appropriate ranges (30).
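As a small illustration of the kernel and of what a parameter grid might look like, the sketch below evaluates the Gaussian RBF kernel for two feature vectors and enumerates candidate (C, γ) pairs. The exponent ranges are assumptions chosen for demonstration, since the actual search ranges are not given here, and the function names are hypothetical rather than part of Orange.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


def parameter_grid(c_exponents, gamma_exponents):
    """Candidate (C, gamma) pairs on a base-2 logarithmic grid."""
    return [(2.0 ** c, 2.0 ** g) for c in c_exponents for g in gamma_exponents]


# Hypothetical ranges; each pair would be scored by cross-validation error.
grid = parameter_grid(range(-2, 3), range(-3, 2))

same_point = rbf_kernel([0.2, 0.5], [0.2, 0.5], gamma=0.5)
unit_apart = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=1.0)
```

The kernel equals 1 for identical inputs and decays with distance at a rate set by γ, so a larger γ yields a more local, more flexible decision surface, while C trades margin width against training errors.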
The performance of the single- and 2-step SVM Raman algorithms was assessed using the following metrics: sensitivity (SE), specificity (SP), positive predictive value (PPV), and negative predictive value (NPV) for the diagnosis of cancer (with or without microcalcifications) and overall accuracy (OA) for the diagnosis of all categories: normal tissue, FCC, fibroadenoma, and cancer, with and without microcalcifications. These metrics were evaluated on the basis of the rates of true- and false-positive and -negative results per standard definitions (35). Algorithm performance was validated using a leave-one-out cross-validation (LOOCV) technique (15). In this technique, the data from a particular tissue site are eliminated and a decision algorithm is developed that classifies all of the remaining tissue sites in the dataset (including ones from the same tissue core and patient), optimizing agreement with the histopathology diagnoses. The resulting decision algorithm is then used to classify the excluded site. This process is successively applied to each site.
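The leave-one-site-out procedure can be sketched generically as below. To keep the example self-contained, a simple nearest-centroid classifier stands in for the SVM (it is not the classifier used in this study), and the two-dimensional toy features and labels are invented for illustration.

```python
def nearest_centroid_fit(features, labels):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Assign the class whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def leave_one_out(features, labels, fit, predict):
    """Hold out each site in turn, train on the rest, classify the held-out site."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, features[i]))
    return predictions


# Hypothetical toy fit coefficients (e.g., fat and collagen), not study data.
toy_x = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.05], [0.1, 0.9], [0.2, 0.8], [0.15, 0.95]]
toy_y = ["normal", "normal", "normal", "lesion", "lesion", "lesion"]

loo_predictions = leave_one_out(toy_x, toy_y, nearest_centroid_fit, nearest_centroid_predict)
loo_accuracy = sum(p == t for p, t in zip(loo_predictions, toy_y)) / len(toy_y)
```

Note that, as in the scheme described above, the training set for each fold can still contain other sites from the same core or patient; a leave-one-patient-out variant would instead remove all of a patient's sites at once.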

Results
The data set initially included Raman spectra obtained from 158 tissue sites. Five tissue sites with miscellaneous tissue histopathologic diagnoses (fat necrosis, healing biopsy site, etc.) were excluded from analysis during algorithm development. Seven additional tissue sites were excluded during LOOCV as unallocated, based on their relatively low probability of belonging to any class, including: 1 DCIS with microcalcifications, 4 FCC with microcalcifications, 1 FCC without microcalcifications, and 1 normal breast tissue site. (Specifically, these 7 tissue sites yielded low probability of classification on application of the single-step Raman algorithm and were excluded from all ensuing analysis for the sake of consistency. In clinical practice, tissue sites for which there is no definitive spectral diagnosis would be designated as unclassified or equivocal and additional Raman measurements made in the hope of obtaining more definitive results.) Analysis was conducted on the remaining 146 tissue sites, whose reference classifications are as follows: 50 normal; 3 normal with microcalcifications; 17 fibroadenoma, all with microcalcifications; 60 FCC, 43 with and 17 without microcalcifications; 16 cancers, 14 with microcalcifications [13 DCIS and 1 invasive ductal carcinoma (IDC)] and 2 without microcalcifications [1 DCIS and 1 lobular carcinoma in situ (LCIS)]. Breast lesions were classified as having microcalcifications if microcalcifications were seen at that tissue site on either the specimen radiograph or the H&E-stained tissue sections. In particular, there were 13 microcalcifications seen on the H&E-stained sections that were not observed on the specimen radiograph. In addition, 3 normal tissue sites were classified as normal with microcalcifications as microcalcifications were seen on radiography that were not seen on histopathology. Of the 77 tissue sites with microcalcifications, 75 had type II (CHA-derived) and only 2 had type I (CAO-derived) microcalcifications.
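The site counts given above can be reconciled with a few lines of arithmetic. The sketch simply tabulates the reference classifications as stated in the text; the dictionary layout and variable names are illustrative choices.

```python
initial_sites = 158
excluded_miscellaneous = 5   # fat necrosis, healing biopsy site, etc.
excluded_unallocated = 7     # low class-membership probability in LOOCV
analyzed_sites = initial_sites - excluded_miscellaneous - excluded_unallocated

# Keys are (tissue type, has_microcalcifications); values are site counts.
reference_counts = {
    ("normal", False): 50,
    ("normal", True): 3,
    ("fibroadenoma", True): 17,
    ("FCC", True): 43,
    ("FCC", False): 17,
    ("cancer", True): 14,
    ("cancer", False): 2,
}

total_sites = sum(reference_counts.values())
sites_with_microcalcs = sum(n for (_, has_mc), n in reference_counts.items() if has_mc)
populated_categories = len(reference_counts)
```

The totals agree with the 146 analyzed sites and the 77 sites with microcalcifications. Only 7 of the 8 possible categories are populated, because no fibroadenoma without microcalcifications was sampled, which is consistent with the 7 diagnostic categories used in the confusion matrix.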

Raman spectra
Figure 2 shows the histopathology and Raman spectrum (blue) with model fit (red) and residual (black) for a typical breast lesion (FCC) with type II microcalcifications. The microcalcifications are visible as dark blue concretions (arrow) in the photomicrograph in Fig. 2A (H&E; ×10). The corresponding Raman spectrum in Fig. 2B shows a prominent band at 960 cm⁻¹ due to CHA [arrow; arising from the ν₁(PO₄) totally symmetric stretching mode of the "free" tetrahedral phosphate ion] present in the microcalcifications.

Single-step SVM Raman algorithm
We devised a single-step Raman spectral algorithm to diagnose normal breast tissue, FCC, fibroadenoma, and breast cancer with and without microcalcifications. This algorithm uses SVM to classify the tissue sites into the different lesion categories based on the FCs extracted using the OLS model. In particular, the FCs corresponding to CHA, ECC, fat, oxy-HB, COLL, and CHOL were selected as input parameters to the SVM model as they provided the optimal diagnostic performance on LOOCV. The inclusion of the other parameters (e.g., epoxy and water FCs) had a slightly detrimental effect on the classification capability of the algorithm, which can be attributed to lack of correlation between the presence of these components and the lesion type.
The LOOCV results for this single-step SVM Raman algorithm are shown in the confusion matrix in Table 1. This algorithm, which takes into account microcalcification status for the first time, has an SE of 62.5%, SP of 100%, PPV of 100%, and NPV of 95.6% for the diagnosis of breast cancer (with or without microcalcifications) and an OA of 82.2% for the classification into the specific categories of normal breast tissue, FCC, fibroadenoma, or breast cancer (with and without microcalcifications). Here, the area under the curve (AUC) is 0.92 and indicates the robustness of the algorithm (with respect to a maximum AUC of 1.00 for a perfect algorithm).
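For readers who want to check how the four cancer-diagnosis metrics relate to one another, the sketch below applies the standard definitions to counts that are inferred, not read from Table 1: 16 cancer sites and 130 non-cancer sites among the 146 analyzed, with 10 true positives implied by the 62.5% sensitivity and no false positives implied by the 100% specificity.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard binary diagnostic metrics from true/false positive/negative counts."""
    return {
        "SE": tp / (tp + fn),    # sensitivity
        "SP": tn / (tn + fp),    # specificity
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
    }


# Counts inferred from the reported totals and percentages (an assumption).
metrics = diagnostic_metrics(tp=10, fn=6, fp=0, tn=130)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

With these inferred counts the function reproduces the reported 62.5%, 100%, 100%, and 95.6%. The NPV stays high despite the 6 missed cancers because non-cancer sites greatly outnumber cancer sites in this data set.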
In these studies, we chose the decision line whose operating point results in maximal PPV for the diagnosis of breast cancer, as is typically done in a clinical situation where the disease to be diagnosed is serious, should not be missed, and is treatable (35). However, the situation is more complex in the case of Raman spectroscopy guidance of stereotactic breast biopsies for microcalcifications. In this instance, the radiologist's goal is to diagnose breast cancer if present or, failing that, to retrieve the targeted microcalcifications. So, the radiologist needs to know whether or not there are microcalcifications present in the tissue to be biopsied and whether or not the associated breast lesion is cancer. To take this into account, we also carefully considered the OA of the diagnostic algorithm, the only metric that fully takes into account the combined diagnosis of microcalcification status and the underlying breast lesion.

Table 1. Confusion matrix for LOOCV of single-step SVM Raman decision algorithm for all 7 diagnostic categories
The performance of the single-step SVM algorithm for simultaneous detection of microcalcification status and diagnosis of the underlying breast lesion is virtually identical to that of a previously developed SVM Raman algorithm that ignored microcalcification status, which has an SE of 62.5%, SP of 100%, PPV of 100%, and NPV of 95.6% for the diagnosis of breast cancer (ROC AUC = 0.92) and overall accuracy of 81.5%, in this same data set (29), as shown in Table 2. This is not surprising as, even though the new algorithm simultaneously detects microcalcification status and diagnoses the underlying lesion, here we are assessing algorithm performance for the diagnosis of breast cancer irrespective of microcalcification status for the sake of comparison. We have also assessed its performance for an instance of a combined diagnosis of microcalcification status and lesion diagnosis (see Concomitant diagnosis of microcalcifications and underlying breast lesions below). Furthermore, the performance of this single-step SVM algorithm is markedly superior to that obtained with our previously developed LR Raman algorithm (in a prospective ex vivo validation study), which has an SE of 83%, SP of 93%, PPV of only 36%, and NPV of 99% for the diagnosis of breast cancer in the absence of microcalcifications (17). Significantly, the majority of breast cancers in the current study are DCIS, a lesion that could not be diagnosed with the previous LR Raman algorithm, as the latter algorithm was developed in a data set that consisted primarily of samples with IDC (15).

Impact of microcalcifications on single-step SVM Raman algorithm performance
To specifically assess the impact of the presence of microcalcifications on the performance of the single-step SVM Raman algorithm, the 7 specific diagnostic categories were collapsed into 2 subcategories: tissue sites (normal, FCC, fibroadenoma, and cancer) with and without microcalcifications, as shown in Table 3. This resulted in a PPV of 100%, NPV of 92.7%, and an OA of 70.1% for lesions with microcalcifications, compared with a PPV of 100%, NPV of 98.5%, and OA of 95.7% for lesions without microcalcifications. These results indicate that the SVM Raman algorithm is more robust for classification of tissue sites without microcalcifications than for tissue sites with microcalcifications, especially when one considers classification of all classes (i.e., the OA metric) as compared with discrimination of cancer sites only (i.e., the PPV and NPV metrics). Nevertheless, the SVM algorithm provides comparable accuracy for the critical metrics of PPV and, to a slightly lesser extent, NPV. The confusion matrices for these 2 subclassification schemes are strikingly similar to that of the 7-category classification scheme (Table 1) and are shown in the Supplementary Tables S1 and S2. Supplementary Table S3 also provides a detailed breakdown of the tissue sites by pathology and the corresponding diagnoses based on the single-step SVM Raman decision algorithm. Overall, these results indicate that performance of the single-step SVM Raman algorithm is remarkably consistent, irrespective of the presence of microcalcifications.

Misdiagnoses using single-step SVM Raman algorithm
As might be suspected from the results above, a significant number of tissue site misdiagnoses using the single-step SVM Raman algorithm involved misdiagnosis of the underlying lesion in tissue sites with microcalcifications. All 3 normal breast tissue sites with microcalcifications were misclassified, 1 as FCC, 1 as FCC with microcalcifications, and 1 as fibroadenoma with microcalcifications. Presumably, the spectroscopy results for these tissue sites (which showed normal breast tissue on histopathology) are correct as they agree with the radiographic assessment of the presence of lesions with microcalcifications at these sites, and the apparent misclassifications arise due to spectroscopy-histopathology registration errors. The majority of tissue sites with fibroadenoma with microcalcifications (11 of 17) were also misclassified as FCC with microcalcifications. For these tissue sites, microcalcifications are the dominant spectral contributors, and the spectral contributions from the other tissue components (such as CHOL, fat, and COLL) are fairly comparable for fibroadenoma and FCC. In other words, we suspect that the SVM class allocation probability for these tissue sites was reasonably similar for both the fibroadenoma and FCC categories, and the resulting misclassification can be attributed to the combination of an imperfect decision plane (between these 2 classes) and the uncertainty inherent in spectral assignment (from shot noise, etc.). It is worth noting that the substantially larger number of samples designated as FCC with microcalcifications (43) in relation to fibroadenoma with microcalcifications (17) in this data set may have also artificially skewed the decision plane in favor of the former. Fortunately, this problem can be remedied by adequately increasing the sample size, for example, by conducting larger clinical studies. These misclassifications are likely not clinically significant in the context of Raman guidance of stereotactic breast biopsies, viewed in terms of overall biopsy (or patient) results, and not individual tissue site results, as the overall assessment that these biopsies harbor benign breast lesions with microcalcifications that would be retrieved at biopsy is correct.
Furthermore, 5 of 14 tissue sites with cancer with microcalcifications were misclassified by the single-step SVM Raman algorithm as normal (1) or FCC with microcalcifications (4). The first of these was classified as normal due to a high FC of fat. The spectroscopy result for this tissue site is most likely correct, as it appeared normal (largely fat) on gross inspection. Thus, the apparent misclassification of this site is likely due to a spectroscopy-histopathology registration error. Also, 3 of the 4 tissue sites misclassified as FCC with microcalcifications belonged to an individual patient, whose spectral data exhibited tissue features that were not accounted for by the breast constituent model. The misclassification of these 4 sites with cancer with microcalcifications, while potentially clinically significant, is not unexpected due to the relatively high FC of CHA in both of these categories (FCC and cancer with microcalcifications). There were only 2 tissue sites with cancer without microcalcifications, and both were misclassified, 1 as normal and 1 as cancer with microcalcifications. Again, the misclassification for cancer without microcalcifications is not likely clinically significant, as the biopsy also contained tissue sites correctly classified by the Raman algorithm as cancer with microcalcifications. Thus, the overall assessment that the biopsy harbors cancer with microcalcifications that would be retrieved at biopsy is correct.
Other misdiagnoses resulted from misclassification of microcalcification status. The majority of tissue sites with FCC without microcalcifications (12 of 17) were misclassified as FCC with microcalcifications, most likely because the corresponding decision plane is dominated by the FC of COLL as opposed to the FC of CHA. Even so, these misclassifications are again likely not clinically significant, as these biopsies also contained tissue sites correctly classified by the Raman algorithm as FCC with microcalcifications. Thus, the overall assessment that these biopsies harbor benign breast lesions with microcalcifications that would be retrieved at biopsy is again correct. As alluded to above in the case of misclassification of fibroadenoma sites as FCC with microcalcifications, the SVM classification algorithm is likely to perform with greater accuracy on more extensive clinical data sets that contain a sufficiently large number of sites with FCC without microcalcifications.

Two-step SVM algorithm
Performance of the single-step SVM Raman algorithm was compared with that of the naïve and optimized 2-step algorithms. The naïve 2-step algorithm, which sequentially applies our LR algorithm (for microcalcification detection) and an SVM algorithm (for lesion discrimination), yields an SE of 53.8%, SP of 94%, PPV of 95%, and NPV of 81.3% and an OA of 73.4%. Clearly, the performance of this algorithm in LOOCV is considerably inferior to that of the single-step SVM algorithm across the board (Table 2). This reveals the underlying deficiencies of using a 2-step method where the multistage or hierarchical decision scheme amplifies errors emanating from the imperfect decision planes in each of the steps.
In comparison, the optimized 2-step algorithm, which uses 2 separate SVM algorithms to subcategorize lesions with and without microcalcifications as FCC, fibroadenoma, and breast cancer in step 2, yields an SE of 56.3%, SP of 100%, PPV of 100%, and NPV of 94.9% for the diagnosis of breast cancer and an OA of 80.2% for classification into the specific categories of normal breast tissue, FCC, fibroadenoma, or breast cancer (with and without microcalcifications). Thus, when optimized, performance of the 2-step Raman algorithm is comparable to that of the single-step SVM Raman algorithm (Table 2). Nevertheless, it should be noted that application of 2 separate SVM algorithms for lesion discrimination in our limited dataset may run the risk of overtraining (overfitting) due to the concomitant reduction in number of samples available for each algorithm (i.e., data sparsity). For example, 12 tissue sites were classified by the LR algorithm as lesions without microcalcifications and, as all of these 12 were designated as FCC by histopathology, the corresponding SVM algorithm was able to readily identify all these tissue sites as FCC. For larger datasets, one would anticipate that lesion discrimination would be more difficult due to the presence of other types of lesions without microcalcifications. Thus, viewed from the perspective of diagnostic accuracy (better PPV, NPV, and OA) as well as robustness (lower chance of overfitting), use of the single-step SVM Raman algorithm appears more promising. Nevertheless, the 2-step algorithm has certain intrinsic advantages in terms of interpretability, as it decomposes the overall classification into 2 independent steps to determine microcalcification status and diagnose the underlying lesion. Furthermore, in specific cases, only one step of the decomposition (e.g., microcalcification status) may be of interest to the radiologist.

Concomitant diagnosis of microcalcifications and underlying breast lesions
Finally, to further validate the capability of the single-step SVM algorithm to concomitantly diagnose both microcalcification status and the underlying lesion, performance metrics were also calculated for the diagnosis of breast cancer with microcalcifications. In this case, we considered the positive class to be cancer with microcalcifications. All other classes, including cancer without microcalcifications, were considered to be negative. In this case, our single-step SVM algorithm exhibits an SE of 64.3%, SP of 100%, PPV of 100%, and NPV of 96.4%, a slight improvement in SE and NPV over those for the diagnosis of breast cancer with or without microcalcifications (62.5% and 95.6%, respectively). An ROC curve illustrating the performance of this single-step Raman algorithm is shown in Fig. 3. The AUC for the diagnosis of breast cancer with microcalcifications was 0.92, indicating the robustness of the algorithm.

Discussion
Our research focuses on development of Raman spectroscopy as a clinical tool for the real-time diagnosis of breast cancer, motivated by its exquisite chemical specificity, especially for detection of microcalcifications. Here, we report on the development of a novel single-step Raman spectroscopy algorithm to simultaneously determine microcalcification status and diagnose the underlying breast lesion, in real-time, at stereotactic breast core needle biopsy. Our SVM-derived algorithm yielded a PPV and NPV of 100% and 95%, respectively, for the diagnosis of breast cancer, with or without microcalcifications. The single-step algorithm had an OA of 82% for the specific diagnosis of microcalcification status and the underlying breast lesion. Significantly, the algorithm is able to classify DCIS cases, the most common type of breast cancer associated with microcalcifications, which could not be done with our previous algorithms (15,17). This is a vital step in the development of Raman spectroscopy as a viable biopsy guidance tool. Specifically, we see Raman spectroscopy as a clinically viable adjunct to stereotactic breast needle biopsy procedures. Real-time analysis of the Raman spectra using our single-step algorithm would reveal whether or not the tissue harbors the targeted microcalcifications and diagnose any breast lesions present, helping the radiologist to decide how many cores to take for submission for pathology examination. This should improve the likelihood of an adequate, diagnostic biopsy that contains the targeted microcalcifications and reduce the need for repeat, surgical biopsy.
The findings in this study suggest that the new single-step SVM Raman algorithm is more robust and accurate than 2-step Raman algorithms using previously developed or newly constructed SVM algorithms. Indeed, because errors can propagate from one stage to the next in a multistage algorithm, a single-step algorithm that considers all the variables simultaneously is expected to perform better. An analogy can be drawn to the relatively poor performance of multistage decision tree algorithms (where the dataset is repeatedly split based on the criterion that maximizes the separation of the tissue types) in relation to SVMs in classifying large and complex datasets (29, 31-33, 36, 37).
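The error-propagation argument can be made concrete with a small, purely illustrative calculation in Python. It assumes that a tissue site misrouted at step 1 (wrong microcalcification status) can never receive the correct combined label at step 2, so the overall accuracy of a 2-step pipeline is bounded by the product of the stage accuracies. The function name and the accuracies used are hypothetical and are not estimates from this study.

```python
def two_step_accuracy(p_step1, p_step2_given_correct_route):
    """Overall accuracy of a sequential classifier under the stated assumption.

    p_step1: probability that step 1 assigns the correct microcalcification status.
    p_step2_given_correct_route: probability that step 2 assigns the correct
        lesion type, given that step 1 routed the site correctly.
    A site misrouted at step 1 is assumed to end with a wrong combined label.
    """
    return p_step1 * p_step2_given_correct_route


# Hypothetical stage accuracies, for illustration only.
overall = two_step_accuracy(0.95, 0.90)
print(f"overall accuracy: {overall:.3f}")
```

Under this assumption, each stage can look accurate in isolation while the combined label is correct noticeably less often; a single-step classifier avoids this compounding, although it is not guaranteed to exceed the bound.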
Furthermore, these preliminary results are not representative of the best classification performance that is likely to be obtainable after further optimization of the probe hardware (optical excitation and collection) and algorithm selection procedures. Specifically, we are designing and fabricating customized fiber probes that integrate nonimaging optical elements and bifocal lenses to enhance the efficiency of collection of Raman photons. In terms of algorithm performance enhancement, we anticipate that hybrid multivariate classification schemes that tailor to individual categories of lesion types and microcalcification status will be developed with the incorporation of larger clinical datasets. More extensive clinical studies will also enable us to expand our diagnostic algorithm to encompass breast lesions (such as epithelial hyperplasia, sclerosing adenosis, and Monckeberg's arteriosclerosis), which were not observed in the current dataset. Furthermore, such datasets would enable us to conduct cross-validation by leaving out a larger number of tissue sites (e.g., 25% of the total dataset) or alternatively by leaving out the sites corresponding to a single patient. The final milestone in algorithm validation would be in prospective application of the developed algorithm in patients undergoing stereotactic breast needle biopsy.
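The leave-one-patient-out scheme mentioned above could be sketched as follows. This is a minimal Python illustration, not the validation code used in the study: the function name and the patient identifiers are hypothetical. Each fold holds out every tissue site from one patient, so that no patient contributes to both the training and the test set.

```python
from collections import defaultdict


def leave_one_patient_out(patient_ids):
    """Yield (patient, train_idx, test_idx), holding out all sites of one patient per fold."""
    sites_by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        sites_by_patient[pid].append(idx)

    for pid, test_idx in sites_by_patient.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(patient_ids)) if i not in held_out]
        yield pid, train_idx, test_idx


# Hypothetical patient label for each tissue site (one entry per spectrum).
patient_ids = ["P1", "P1", "P2", "P3", "P3", "P3"]
folds = list(leave_one_patient_out(patient_ids))

for pid, train_idx, test_idx in folds:
    print(pid, "train:", train_idx, "test:", test_idx)
```

Compared with leave-one-site-out, this grouping prevents spectra from the same patient from appearing on both sides of a split, which would otherwise tend to make performance estimates optimistic.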

Figure 1. Schematic diagram of sequential two-step optimized Raman algorithm using a logistic regression algorithm for step 1 and two separate SVM classification models (naïve and optimized) for step 2. Both these algorithms were used in an LOOCV protocol. *, normal tissue sites with microcalcifications were not considered for algorithm development as they represent a discordance between radiographic assessment (lesion with microcalcifications) and histopathologic evaluation (normal).

Figure 2. Histopathology and Raman spectrum (blue) with model fit (red) and residual (black) for a typical breast lesion (FCC) with type II microcalcifications. The microcalcifications are visible as dark blue concretions (arrow) in the photomicrograph in A (H&E; ×10). Note the yellow ink on the breast tissue surface at the top in A, marking the site for spectral correlation. The corresponding Raman spectrum in B shows a prominent band at 960 cm⁻¹ due to CHA (arrow), which is a major constituent of type II microcalcifications.
font is used for the diagonal values of the table, which represent accurate Raman predictions in relation to the reference diagnosis obtained from radiology and histopathology.
a, There were no tissue sites with fibroadenoma without microcalcifications.

Figure 3. ROC curve for the single-step SVM Raman decision algorithm for the diagnosis of breast cancer with microcalcifications. The x- and y-axes represent the false-positive (FP) rate and the true-positive (TP) rate, respectively. The ROC curve of 2 indistinguishable populations, represented by the dashed line, is included for comparison. The AUC is 0.92; the AUC for a perfect algorithm is 1.

Table 2. Comparison of diagnostic performance of single-step, 2-step, and previous Raman algorithms for detection of breast cancer, with and without microcalcifications
a, SVM Raman algorithm for the diagnosis of breast cancer irrespective of microcalcification status (29).
b, LR Raman algorithm for the diagnosis of breast cancer in the absence of microcalcifications (17).

Table 3. Performance of single-step SVM Raman decision algorithm for tissue sites with and without microcalcifications