Molecular Biology and Genetics
Department of Medicine, Division of Gastroenterology [Y. X., F. M. S., J. Y., T. T. Z., V. S., Y. M., F. S., T. C. L., A. O., S. W., M. C. K., K. D., B. D. G., J. M. A., S. J. M.], Department of Surgery, Division of Thoracic Surgery [M. J. K.], and Department of Surgical Oncology [K. P., D. S.], Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore VA Hospital, Baltimore, Maryland 21201
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
BA⁴ is a premalignant condition caused by chronic gastroesophageal reflux (6). Numerous reports suggest that BA is a precursor lesion for esophageal adenocarcinoma (7, 8, 9). Elucidating the molecular biology underlying malignant transformation in BA may yield markers for the early detection of carcinoma and enable therapeutic interventions to prevent or treat these otherwise highly lethal neoplasms (2).
In recent years, cDNA microarray technology has brought new hope to the field of cancer research. This technology has been shown to improve the accuracy of disease classification, the power of diagnostic discrimination, and the efficiency of early lesion detection (10, 11). Although cDNA microarrays permit the collection of expression information on thousands of genes simultaneously, analyzing this large amount of data has proven difficult. For this reason, numerous bioinformatics strategies have been developed, including hierarchical clustering (12), multidimensional scaling (13), and ANNs (14, 15). For global gene expression profiling, cDNA microarray data have been analyzed using hierarchical agglomerative clustering techniques (10, 12, 16, 17, 18, 19, 20, 21, 22).
However, despite its advantages, clustering has a series of drawbacks (14). It is an unsupervised classification method that may group patient samples based on characteristics irrelevant to the clinical question under study. In contrast, ANNs are a supervised classification technique. Originally designed to mimic the parallel functioning of the mammalian brain, ANNs are mathematical information-processing models composed of many units, called neurons. The neurons in an ANN are highly interconnected by weighted links, analogous to synapses. Another similarity between the mammalian brain and ANNs is that both learn by example. During training, an ANN is presented with input data describing the state of an object, such as cDNA microarray data, together with the correct interpretation of those data, such as the diagnosis. Over repeated training cycles, the ANN adjusts the weights of the links between neurons so that each input becomes associated with its correct output. After training on multiple input-output pairs, ANNs are usually able to make diagnoses on blinded input data. A schematic illustration of these concepts is given in Fig. 1.
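The learning-by-example cycle described above can be illustrated with a single artificial neuron. The sketch below is a minimal Python illustration under our own assumptions, not the network used in this study: the neuron passes a weighted sum of its inputs through a tanh activation, and in each training cycle it nudges its link weights to reduce the squared error between its output and the correct answer (the delta rule). The two-"gene" training examples and the function names are hypothetical.

```python
import math


def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs, squashed by tanh into the range (-1, +1)."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(total)


def train_neuron(examples, n_inputs, learning_rate=0.1, epochs=200):
    """Learn by example: repeatedly present (inputs, target) pairs and adjust
    each weight in the direction that reduces the squared output error."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            y = neuron_output(weights, bias, inputs)
            # Error signal scaled by the tanh derivative, 1 - tanh(s)^2.
            delta = (target - y) * (1.0 - y * y)
            weights = [w + learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias += learning_rate * delta
    return weights, bias


# Hypothetical toy data: two "genes"; target +1 when gene 1 is high, -1 otherwise.
examples = [
    ([0.9, 0.1], 1.0),
    ([0.8, 0.2], 1.0),
    ([0.1, 0.9], -1.0),
    ([0.2, 0.7], -1.0),
]
weights, bias = train_neuron(examples, n_inputs=2)
for inputs, target in examples:
    print(inputs, target, round(neuron_output(weights, bias, inputs), 2))
```

After training, the sign of the output matches the target for every example, and the same weights can be applied to inputs the neuron has not seen.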
A major aim of published microarray studies is the identification of genes differentially expressed between two sample groups (26, 27). Genes identified from these comparisons are candidate targets for diagnosis and therapy (28). Identifying genes differentially expressed between BA and CA could provide important new information regarding the progression of BA to cancer and could unveil pathologic pathways that further our understanding of CA. One method for identifying differentially expressed genes is SAM, a software program developed at Stanford University (29). Using a modified form of the unpaired Student t test, SAM assigns each gene a score based on the change in its average expression between two groups, relative to the gene's SD within each group. SAM also provides an estimate of the FDR, i.e., the percentage of genes falsely reported as significant.
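Following our reading of the published definition (29), the SAM score for gene i is d(i) = (x̄_CA(i) − x̄_BA(i)) / (s(i) + s₀), where s(i) is the pooled within-group standard error of the difference in means and s₀ is a small positive constant that keeps genes with very low variance from dominating the ranking. The sketch below computes this score for one gene. It is an illustration, not the SAM software's code: SAM chooses s₀ from the data, whereas here it is a plain parameter, and the function name and example values are hypothetical.

```python
import math
from statistics import mean


def sam_d_score(group_ba, group_ca, s0=0.0):
    """SAM-style relative difference for one gene.

    Positive scores mean higher average expression in group_ca.
    s0 is the "fudge factor" added to the denominator; SAM picks it from
    the data, whereas here it is simply passed in (an assumption).
    """
    n1, n2 = len(group_ba), len(group_ca)
    m1, m2 = mean(group_ba), mean(group_ca)
    # Pooled sum of squared deviations from each group's own mean.
    ss = sum((x - m1) ** 2 for x in group_ba) + sum((x - m2) ** 2 for x in group_ca)
    # s(i): pooled standard error of the difference in means.
    s = math.sqrt((1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2) * ss)
    return (m2 - m1) / (s + s0)


# Hypothetical log-ratio values for one clone in three BA and three CA samples.
ba = [1.0, 1.2, 0.8]
ca = [2.0, 2.2, 1.8]
print(sam_d_score(ba, ca))          # with s0 = 0 this equals the pooled t statistic
print(sam_d_score(ba, ca, s0=0.5))  # a larger s0 shrinks the score toward 0
```

With s₀ = 0 the score reduces to the ordinary two-sample t statistic; the added constant is what distinguishes SAM's statistic from a plain t test.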
In the current study, we show that ANNs can discriminate between subtly different lesions, specifically the premalignant lesion BA versus CA, based on microarray-derived data. Moreover, by applying SAM, we identified 160 clones that are significantly differentially expressed between BA and CA.
| MATERIALS AND METHODS |
|---|
Preparation of Microarray Clones.
The 95% nonredundant, sequence-verified cDNA library prepared by the Lawrence Livermore Laboratories (Research Genetics, Huntsville, AL) was used as the source of clones. Inserts from the first 8064 clones in this library were amplified for microarray printing. Each bacterial stock (1 µl) was amplified in a 108-µl reaction containing 2.5 units of Taq polymerase (Life Sciences, Gaithersburg, MD). The master mix for each 96-well plate contained 9.5 ml of double-distilled water, 1 ml of 10× buffer, 10 µl each of 1000 µM M13 forward and reverse primers, 20 µl each of the four deoxynucleotide triphosphates (100 mM), and 50 µl of Taq polymerase (5 units/µl). PCR conditions consisted of an initial denaturation at 96°C for 30 s; 30 cycles of 45 s at 94°C, 45 s at 55°C, and 2 min 30 s at 72°C; and a final extension of 5 min at 72°C. The amplified inserts were then purified with a Qiagen PCR purification kit (Qiagen) on a Qiagen BioRobot 9600 liquid-handling workstation. After purification, the PCR products were dried in 96-well plates in a large Speed-vac apparatus and reconstituted in 30 µl of distilled water.
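The master-mix recipe can be checked with simple arithmetic. The sketch below assumes the mix is divided evenly among the 96 wells with no allowance for pipetting loss, and computes the per-well volume, the Taq units per well, and the concentration of each component in the mix itself, before the 1-µl bacterial stock is added. All volumes and stock concentrations come from the protocol; the function name is hypothetical.

```python
def master_mix_summary(wells=96):
    """Per-well volume and component concentrations for one plate of PCR master mix.

    Volumes are in microliters and are taken from the protocol text.
    Concentrations refer to the mix itself, before template is added.
    """
    water_ul = 9500.0
    buffer_ul, buffer_stock_x = 1000.0, 10.0        # 10x buffer
    primer_ul, primer_stock_um = 10.0, 1000.0       # each of 2 M13 primers
    dntp_ul, dntp_stock_mm = 20.0, 100.0            # each of 4 dNTPs
    taq_ul, taq_units_per_ul = 50.0, 5.0

    total_ul = water_ul + buffer_ul + 2 * primer_ul + 4 * dntp_ul + taq_ul
    return {
        "total_ul": total_ul,
        "per_well_ul": total_ul / wells,
        "taq_units_per_well": taq_ul * taq_units_per_ul / wells,
        "buffer_x": buffer_stock_x * buffer_ul / total_ul,
        "each_primer_um": primer_stock_um * primer_ul / total_ul,
        "each_dntp_um": dntp_stock_mm * 1000.0 * dntp_ul / total_ul,  # mM -> uM
    }


for name, value in master_mix_summary().items():
    print(f"{name}: {value:.2f}")
```

With these numbers the mix totals 10.65 ml, or about 111 µl per well, and delivers about 2.6 units of Taq per well, in line with the 2.5 units stated; each primer ends up near 0.94 µM and each dNTP near 188 µM.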
Reference Probe.
All microarrays were cohybridized with an experimental aRNA probe and an invariant reference probe. The reference probe was prepared from an equimolar mixture of RNAs from the cell lines HCT116, HT29, CaCo-2, HCT15, HTB114, MCF-7, HeLa, and AGS. HTB114 was derived from a patient with leukemia, HeLa from a cervical cancer, MCF-7 from a breast cancer, and AGS from a gastric cancer. The remaining four cell lines (HCT116, HT29, CaCo-2, and HCT15) were derived from colorectal cancers. The choice of cell lines was influenced by previous microarray studies (18, 19, 21, 30). They were chosen to represent a variety of cell types, so as to obtain a reference expression level for as many genes as possible, and to keep the experimental design consistent between studies. These cell lines were selected as the source of reference probe RNA before the current study began; esophageal cell lines were not available at that time.
Extraction, Amplification, and Labeling of the aRNA Probe.
Total RNA (20–50 µg) was extracted from freshly frozen tissue by standard organic methods and amplified with a T7-based protocol (31, 32, 33). For each two-way comparison, 3–6 µg of aRNA prepared from the reference cells or the esophageal lesion were labeled by incorporating Cy3- or Cy5-labeled dCTP using random primers and Superscript reverse transcriptase. The resulting probes were purified with a Microcon microcentrifuge filter device and recovered in a volume of 25 µl.
Microarray Preparation.
We prepared lysine-coated slides using National Cancer Institute-Advanced Technology Center and Stanford University protocols.⁵
cDNA clones were printed with eight pins in a 32-pin print head (Majer Precision Engineering, Tempe, AZ) on a GeneMachines Omnigrid Arrayer (GeneMachines, Oxnard, CA). The printed slides were UV cross-linked, post-treated with succinic anhydride to reduce background, and then hybridized. Each slide was incubated in 35 µl of hybridization solution containing the Cy3- and Cy5-labeled targets, 1 µl of 50× Denhardt's blocking solution (Sigma, St. Louis, MO), 20 µg of human COT-1 DNA (Roche Diagnostics Corp., Indianapolis, IN), 10 µg of yeast tRNA (Roche Diagnostics Corp.), and 8–10 µg of poly(A) (Roche Diagnostics Corp.) in 2.24× SSC/0.25% SDS under a 40 × 22-mm coverslip. The slide was placed in a sealed hybridization chamber (TeleChem, Sunnyvale, CA) whose two side wells held a total of 50 µl of water for humidification, and was hybridized at 65°C overnight. The next day, the slide was washed in 500 ml of 2× SSC/0.1% SDS at room temperature until the coverslip fell off, and washing continued for 2 min. The slide was then placed in 1× SSC for 2 min at room temperature. Finally, it was washed once with 0.2× SSC at room temperature and once with 0.05× SSC for 2 min, and air dried. Each hybridized slide was scanned with a GenePix 4000A dual-laser slide scanning system (Axon) at the wavelengths corresponding to each dye's fluorescence (635 nm for Cy5 and 532 nm for Cy3).
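Each scanned spot yields one intensity per channel. Because every array shares the same reference probe, a common way to put arrays on a comparable scale is to express each spot as the log₂ ratio of sample to reference intensity and then center the ratios within each array. The text does not state how the GenePix intensities were converted or normalized, so the sketch below is only an illustration: it assumes background-subtracted, positive intensities, with the esophageal sample in Cy5 and the reference in Cy3, and uses global median centering. The function name and intensities are hypothetical.

```python
import math
from statistics import median


def normalized_log_ratios(sample_intensity, reference_intensity):
    """Per-spot log2(sample / reference), median-centred within one array.

    Assumes background-subtracted intensities that are strictly positive.
    Median centring sets the typical spot's log ratio to 0 (ratio 1),
    compensating for overall labeling or scanning differences between channels.
    """
    if any(v <= 0 for v in list(sample_intensity) + list(reference_intensity)):
        raise ValueError("intensities must be positive to take a log ratio")
    raw = [math.log2(s / r) for s, r in zip(sample_intensity, reference_intensity)]
    centre = median(raw)
    return [x - centre for x in raw]


# Hypothetical intensities for four spots on one array.
cy5_sample = [400.0, 1600.0, 800.0, 200.0]
cy3_reference = [400.0, 400.0, 400.0, 400.0]
print(normalized_log_ratios(cy5_sample, cy3_reference))  # [-0.5, 1.5, 0.5, -1.5]
```

Working in log space makes a twofold increase and a twofold decrease symmetric (+1 and -1), which suits distance measures such as the centered correlation used for clustering.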
Hierarchical Agglomerative Clustering.
Data imported from GenePix were processed and clustered using established algorithms implemented in the software program Cluster (12, 22). Average linkage clustering with the centered correlation similarity metric was used. TreeView software (12, 22) generated visual representations of the clusters.
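Average linkage with centered correlation can be sketched as follows. This is a naive Python illustration, not the code of the Cluster program: the distance between two samples is taken as 1 minus their centered (Pearson) correlation, the distance between two clusters is the mean of all pairwise sample distances, and the closest pair of clusters is merged repeatedly until the requested number of groups remains. The function names and four-sample toy profiles are hypothetical; for 22 samples the simple cubic-time loop is fast enough.

```python
import math
from itertools import combinations
from statistics import mean


def centered_correlation(u, v):
    """Pearson correlation: both profiles are centred on their own means."""
    mu, mv = mean(u), mean(v)
    du = [x - mu for x in u]
    dv = [y - mv for y in v]
    denom = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / denom if denom else 0.0


def average_linkage(profiles, n_clusters):
    """Merge samples bottom-up until n_clusters groups remain.

    profiles: one expression vector (over the same genes) per sample.
    Returns the groups as sorted lists of sample indices.
    """
    n = len(profiles)
    dist = {
        (i, j): 1.0 - centered_correlation(profiles[i], profiles[j])
        for i, j in combinations(range(n), 2)
    }
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        # Average linkage: mean distance over all cross-cluster sample pairs.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: mean(
                dist[min(i, j), max(i, j)]
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical profiles: samples 0-1 share one expression pattern, 2-3 the opposite.
samples = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.5],
    [4.0, 3.0, 2.0, 1.0],
    [5.0, 3.5, 2.0, 1.0],
]
print(average_linkage(samples, n_clusters=2))  # [[0, 1], [2, 3]]
```

Because centered correlation ignores each profile's overall level and scale, samples group by the shape of their expression pattern rather than by overall brightness.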
SAM Gene Filtering.
Genes were filtered using SAM (29). Designed specifically for microarray data, SAM reports the genes whose expression differs most significantly between two groups of samples. SAM also reports an estimate of the median FDR, the percentage of genes falsely reported as showing statistically significant differential expression. Its scoring algorithm is based on the Student t test, and it estimates the FDR from permutations of the data.
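The permutation idea behind the FDR estimate can be sketched in simplified form. The version below is an illustration under our own assumptions, not SAM's actual procedure (which compares observed scores with expected order statistics and uses an adjustable threshold Δ): it scores every gene, counts how many reach a fixed score threshold, and repeats the count after randomly shuffling the BA/CA labels. The median count under shuffled labels estimates how many of the called genes are false positives. The threshold, the s₀ value, the number of permutations, the function names, and the toy data are all hypothetical.

```python
import math
import random
from statistics import mean, median


def d_score(a, b, s0=0.1):
    """SAM-style score for one gene (s0 = 0.1 is an arbitrary illustrative value)."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    ss = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    s = math.sqrt((1.0 / n1 + 1.0 / n2) / (n1 + n2 - 2) * ss)
    return (m2 - m1) / (s + s0)


def count_called(expr, labels, threshold):
    """Number of genes whose |d| reaches the threshold for a given labeling."""
    called = 0
    for gene in expr:
        g0 = [x for x, lab in zip(gene, labels) if lab == 0]
        g1 = [x for x, lab in zip(gene, labels) if lab == 1]
        if abs(d_score(g0, g1)) >= threshold:
            called += 1
    return called


def permutation_fdr(expr, labels, threshold, n_perm=100, seed=0):
    """Return (genes called, estimated FDR) using label permutations.

    expr: one list of expression values per gene (one value per sample).
    labels: 0 (e.g. BA) or 1 (e.g. CA) for each sample.
    """
    n_called = count_called(expr, labels, threshold)
    if n_called == 0:
        return 0, 0.0
    rng = random.Random(seed)
    null_counts = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null_counts.append(count_called(expr, shuffled, threshold))
    # Median number of genes "called" when the labels carry no information.
    return n_called, min(1.0, median(null_counts) / n_called)


labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [
    [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0],  # hypothetical clone, clearly higher in group 1
    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # varies, but not with the labels
    [3.0] * 8,                                 # flat
]
print(permutation_fdr(expr, labels, threshold=2.0))  # (1, 0.0)
```

Only the first clone is called, and shuffled labels almost never reproduce a score that extreme, so the estimated FDR is 0.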
ANN Construction and Testing.
We constructed an ANN using the software program MATLAB (MathWorks, Inc., Natick, MA). Our ANN used a feedforward architecture trained by error backpropagation, with one hidden layer. The number of neurons in the input layer equaled the number of clones used, and the ideal outputs were set to -1 for BA and +1 for CA. The ANN was trained on a training set of eight Barrett's metaplasias and four cancers. After training, it was tested on the remaining samples: six Barrett's specimens and four cancers.
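A feedforward network with one hidden layer, trained by error backpropagation toward targets of -1 (BA) and +1 (CA), can be sketched in a few dozen lines. This is a Python re-expression for illustration, not the MATLAB network used in the study: the hidden-layer size, tanh activations, learning rate, number of epochs, weight initialization, and the decision threshold of 0 are our assumptions, and the four-clone profiles are hypothetical. Only the split sizes (8 BA + 4 CA for training, 6 BA + 4 CA for testing) come from the text.

```python
import math
import random


class FeedForwardNet:
    """One-hidden-layer feedforward network trained by error backpropagation.

    tanh units throughout, so the output lies in (-1, +1), matching the
    -1 (BA) / +1 (CA) targets. Layer size, learning rate, epochs and the
    weight initialization are illustrative assumptions.
    """

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = 0.0

    def forward(self, x):
        """Feed the input forward; return hidden activations and the output."""
        hidden = [math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        out = math.tanh(self.b_out + sum(w * h for w, h in zip(self.w_out, hidden)))
        return hidden, out

    def train(self, data, epochs=1000, lr=0.05):
        """Repeated cycles of forward pass, error backpropagation, weight update."""
        for _ in range(epochs):
            for x, target in data:
                hidden, out = self.forward(x)
                # Output error term for squared error with a tanh unit.
                delta_out = (target - out) * (1.0 - out * out)
                # Propagate the error back through the (not yet updated) output weights.
                delta_hidden = [delta_out * w * (1.0 - h * h)
                                for w, h in zip(self.w_out, hidden)]
                self.w_out = [w + lr * delta_out * h for w, h in zip(self.w_out, hidden)]
                self.b_out += lr * delta_out
                for j, dj in enumerate(delta_hidden):
                    self.w_hidden[j] = [w + lr * dj * xi
                                        for w, xi in zip(self.w_hidden[j], x)]
                    self.b_hidden[j] += lr * dj

    def classify(self, x):
        """Output >= 0 is read as CA (+1); below 0 as BA (-1)."""
        return "CA" if self.forward(x)[1] >= 0 else "BA"


def toy_sample(kind, jitter):
    """Hypothetical 4-clone profile; BA and CA differ in every clone."""
    if kind == "BA":
        return [0.2 + jitter, 0.3, 0.8, 0.7 - jitter]
    return [0.8 - jitter, 0.7, 0.2, 0.3 + jitter]


# Split sizes follow the text: train on 8 BA + 4 CA, test on 6 BA + 4 CA.
train_set = ([(toy_sample("BA", 0.01 * i), -1.0) for i in range(8)]
             + [(toy_sample("CA", 0.01 * i), 1.0) for i in range(4)])
test_set = ([(toy_sample("BA", 0.005 + 0.01 * i), "BA") for i in range(6)]
            + [(toy_sample("CA", 0.005 + 0.01 * i), "CA") for i in range(4)])

net = FeedForwardNet(n_inputs=4, n_hidden=3)
net.train(train_set)
predictions = [net.classify(x) for x, _ in test_set]
print(predictions)
```

The input layer has one unit per clone, so applying the same sketch to the 160 SAM-selected clones would only change `n_inputs`; the test samples are never seen during training, mirroring the blinded test set described above.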
| RESULTS |
|---|
Hierarchical Agglomerative Clustering Based on Information from the 160 Clones Selected by SAM.
All 22 esophageal samples were reclustered based on expression data from the 160 genes selected by SAM (Fig. 4). All but two of the Barrett's metaplasias clustered in one main group. All eight cancers clustered in the other main group, which also contained the remaining two Barrett's specimens. However, these two Barrett's cases formed a separate subcluster very close to the Barrett's cluster (Fig. 4).
| DISCUSSION |
|---|
Supervised techniques can also be used to analyze microarray data. These techniques can focus on a specific comparison of interest, such as esophageal premalignancy versus malignancy. Supervised techniques include SAM, a gene filtering software program that identifies genes whose expression levels differ significantly between two groups (29). Supervised methods also include ANNs, which can be trained to recognize clinically discrete groups of lesions or other biological entities, even when these entities are very similar (14, 15).
In our study, we first applied unsupervised, classical statistical techniques to unfiltered microarray data: the esophageal samples were grouped using Cluster based on gene expression information from all 8064 clones. This approach did not distinguish between Barrett's metaplasias and CAs. This result can be interpreted as a consequence of applying an unsupervised method to unfiltered data. In other words, although some of these data were relevant to the premalignant or malignant status of the samples, much of the data carried irrelevant information. Because of this "noise," Cluster was unable to group the samples according to their pathologic diagnoses.
To restrict the analysis to genes showing statistically significant differential expression between Barrett's and CA, we used SAM. At an estimated FDR of 0, SAM reported 129 genes overexpressed in cancer and 31 genes overexpressed in Barrett's. The complete list of all 160 genes is available on a Web site.⁶
Clustering based on these 160 genes grouped the samples in close agreement with their pathologic diagnoses (Fig. 3). Although two Barrett's samples were still placed in the cancer group, they clustered very closely to the other Barrett's samples. This result supports the conclusion that the genes selected by SAM are relevant to distinguishing Barrett's from CA. It also suggests that filtering for genes relevant to the Barrett's-versus-cancer distinction is a justifiable intermediate step before applying unsupervised techniques such as clustering. However, even with only the 160 SAM-selected genes, clustering misclassified two specimens, whereas our ANN correctly classified all specimens in its test set.
Gene filtering programs can thus identify subgroups of genes important for the differential diagnosis of BA and CA. The same filtering technique could be applied to isolate other subgroups of genes. For example, it could identify genes that predict outcome (i.e., likelihood of neoplastic progression) in the Barrett's population. It would be quite useful to predict which patients with BA will rapidly progress to frank carcinoma and which will progress slowly. The same technique could be applied to the esophageal carcinoma population to distinguish therapeutic responders from nonresponders, or long-term from short-term survivors. Therapies could then be tailored to these findings, with potentially significant improvements in the outcome and quality of life of patients with esophageal premalignant and malignant lesions. However, achieving high sensitivity and specificity will require analyzing a large cohort of patients with very accurate pathologic, clinical, and epidemiological data.
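Sensitivity and specificity can be computed directly from a set of classifications. As an illustration only, the sketch below treats membership in the two main clusters obtained with the 160 SAM-selected genes as a prediction, with CA as the positive class: all 8 cancers fall in the cancer group, and 12 of the 14 Barrett's specimens fall in the Barrett's group. Reading cluster membership as a diagnosis is our assumption, and the function name is hypothetical.

```python
def sensitivity_specificity(truth, predicted, positive="CA"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Cluster membership with the 160 SAM-selected genes, read as a diagnosis.
truth = ["CA"] * 8 + ["BA"] * 14
predicted = ["CA"] * 8 + ["BA"] * 12 + ["CA"] * 2
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")  # sensitivity 1.00, specificity 0.86
```

For the ANN's test set, in which all four cancers and all six Barrett's specimens were classified correctly, both values are 1.0; with so few samples, however, either estimate can shift substantially with a single misclassification, which is why a larger cohort is needed.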
The classification results obtained with ANNs add to the promise of microarray data analysis. The results of another research group (14), as well as our own (15), suggest that ANNs are robust classifiers, capable of recognizing patterns even when data are incomplete, biased, or extremely abundant. With proper training, ANNs can detect subtle correlations and distinctions in complex input data, such as microarray data, and use these correlations to arrive at a correct diagnosis. Indeed, there are many published examples in which ANNs performed better than classical statistical techniques (36, 37). Although we did not compare the efficiency and accuracy of these two approaches in a large, prospective study, our results suggest that ANNs will at least complement classical statistical approaches.
In conclusion, these data suggest that ANNs are a useful addition to the armamentarium of methods currently used to analyze microarray data. They may help extract more information from microarray data and interpret these data more accurately. Finally, when combined with gene filtering algorithms, ANNs appear to be powerful tools for identifying and understanding subtle or occult pathologic processes. Ultimately, we believe that these tools will have a far-ranging impact on cancer detection, diagnosis, and management.
| FOOTNOTES |
|---|
1 Supported by NIH Grants DK47717, CA95323, CA85069, CA78843, and CA77057 and the Medical Research Office, Department of Veterans Affairs.
2 Y. X., F. M. S., and J. Y. contributed equally to this manuscript.
3 To whom requests for reprints should be addressed, at University of Maryland, Room N3W62, 22 South Greene Street, Baltimore, MD 21201. E-mail: smeltzer{at}medicine.umaryland.edu.
4 The abbreviations used are: BA, Barrett's esophagus; SAM, Significance Analysis of Microarrays; ANN, Artificial Neural Network; FDR, False Discovery Rate; aRNA, amplified RNA; CA, esophageal cancer.
5 Internet addresses: http://www.microarrays.org/protocols.html and rana.lbl.gov.
6 Internet address: http://microarray.umaryland.edu.
Received 12/8/01. Accepted 4/17/02.
| REFERENCES |
|---|