Metastasis represents a crucial transition in disease development and progression and has a profound impact on survival for a wide variety of cancers. Cell line models of metastasis have played an important role in developing our understanding of the metastatic process. We used a 19,200-element human cDNA microarray to profile transcription in three paired cell-line models of colorectal tumor metastasis. By correlating expression patterns across these cell lines, we have identified 176 genes that appear to be differentially expressed (greater than 2-fold) in all highly metastatic cell lines relative to their reference. An analysis of these genes reiterates much of our understanding of the metastatic process and suggests additional genes, many of previously uncharacterized function, that may be causatively involved in, or at least prognostic of, metastasis. Northern analysis of a limited number of these genes validates the observed pattern of expression and suggests that further investigation and functional characterization of the identified genes is warranted.
CRC4 is the second most common cause of cancer-related deaths in the United States, the majority of which are secondary to liver metastasis. Whereas current treatments have made significant improvements in patient survival, cases with metastatic cancer frequently result in death (1). One of the challenges to effective therapy lies in understanding the complex biology of metastasis in human colon cancer, thus allowing for new therapeutic and genetic intervention. Over the past few decades, a number of in vivo and in vitro models have been developed to study various aspects of colon carcinomas, including tumor cell heterogeneity (2), drug resistance (3), and metastasis (4, 5). One of the more widely used models for in vivo evaluation of human colorectal liver metastasis is the KM12 cell line system (6, 7). The cell lines were established by intrasplenically or s.c. injecting poorly metastatic KM12C cells derived from a Dukes’ B stage colon carcinoma into nude mice to isolate variant lines KM12L4A and KM12SM, respectively, both with high liver metastatic potential (7). The advantage of this model system is that the highly metastatic variants are genetically related to the poorly metastatic parental cell line KM12C and, for this reason, presumably share the expression of the majority of genes with the exception of those promoting metastasis. To identify potential markers for metastasis, we used cDNA microarrays to examine global gene expression patterns in the KM12 cell lines. To further refine the metastatic gene set and to ensure that identified genes were not limited to one cell line, we compared the expression profiles to a separate, genetically matched set of poorly (SW480) and highly (SW620) metastatic CRC cell lines. SW480 was derived from a primary Dukes’ C colon cancer, whereas SW620 was derived from a lymph node metastasis in the same patient (8).
cDNA microarrays were used to generate a molecular fingerprint of gene expression patterns with the goal of elucidating changes that may contribute to the metastatic process. Genome sequencing and EST projects have provided a wealth of data to study the biology of tumor progression. Researchers have access to more than 2,000,000 ESTs representing greater than 80% of the estimated human genes. Fewer than 10,000 human genes, however, have functionally annotated entries in GenBank. To make the best use of these resources, the microarray technique permits parallel expression analysis of thousands of these genes in a single experiment without prior knowledge of gene function (9, 10).
An array of 19,200 clones selected using EST assemblies that comprise the TIGR HGI,5 was used to assess patterns of gene expression between low and high metastatic states of colon carcinomas. A comparison of transcriptional levels has identified 176 genes that appear to be differentially expressed (greater than 2-fold) in all highly metastatic cell lines relative to their reference. Included in the gene set are those previously known to possess altered patterns of expression in colon cancer as well as a large number with previously uncharacterized function. The identification of these additional genes is of tremendous importance because they provide potential new insights into cancer biology, they may allow a better understanding of pathways that regulate metastasis, and they yield gene expression patterns that may be diagnostic or prognostic of metastasis in colon cancer.
MATERIALS AND METHODS
Clone Set and Array Preparation.
cDNA clones were selected using EST sequences represented within the TIGR HGI.6 HGI consists of THC sequences, derived from coding and EST sequences deposited in GenBank, using a highly refined process in which sequences are first clustered, then assembled (11, 12). Clones were grown overnight in 96-well microtiter plates; clone inserts were amplified using vector-specific primers and were validated and purified; aliquots of each reaction were combined 1:1 with DMSO for arraying as described previously (13). The success rate for single-band amplification was greater than 87.5%; 6.3% of the reactions yielded multiple or weak bands, and 6.2% failed to amplify. Wherever possible, alternate cDNA clones were selected for those failing amplification. All of the clones were sequenced to verify their identities, with 78% yielding usable sequence (15,028 of 19,200), of which 96% (14,379 of 15,028) allowed positive identification for an overall 75% clone validation success rate. Clone amplicons were printed onto CMT-GAPS amino-silane coated glass microscope slides (Corning) using a high-speed robotic arrayer (Intelligent Automation Systems) at 42–45% relative humidity. Printed slides were cross-linked by UV irradiation (60 mJ) and stored at 25°C in a desiccated environment until use.
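The clone validation percentages quoted above follow directly from the reported counts. The short Python sketch below reproduces that arithmetic; the variable names are ours and are purely illustrative.

```python
# Counts taken from the text above; variable names are illustrative only.
total_clones = 19_200           # elements on the array
usable_sequence = 15_028        # clones that yielded usable sequence
positively_identified = 14_379  # usable sequences that confirmed clone identity

usable_fraction = usable_sequence / total_clones
identified_fraction = positively_identified / usable_sequence
# The overall rate is the product of the two steps, i.e. identified / total.
overall_fraction = positively_identified / total_clones

print(f"usable sequence:       {usable_fraction:.1%}")
print(f"positively identified: {identified_fraction:.1%}")
print(f"overall validation:    {overall_fraction:.1%}")
```

Rounded to whole percentages, these values correspond to the 78%, 96%, and 75% figures given in the text.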
Tissue Culture and Probe Preparation.
Tumor cell lines were maintained in RPMI 1640 (Life Technologies, Inc.) modified medium containing L-glutamine and supplemented with 10% fetal bovine serum, 200 units/ml penicillin, and 200 μg/ml streptomycin. Cells were maintained in 150-mm culture dishes (Corning) at 37°C in 5% CO2. Total RNA was extracted from the cells using TRIzol (Life Technologies, Inc.) and was used to prepare direct Cy3- and Cy5-labeled first-strand cDNA probes for hybridization as described previously (13). Before hybridization, slides were incubated in 1% BSA to block nonspecific hybridization to the glass surface; differentially labeled pooled probes were hybridized to microarray slides overnight and washed. Expression was assayed by measuring fluorescence intensities using the GenePix 4000 (Axon) dual-color confocal laser scanner; data were recorded as paired 16-bit TIFF images.
Image Processing and Data Analysis.
Images were analyzed to determine each spot’s background-subtracted integrated fluorescence intensity in both the Cy3 and Cy5 channels using TIGR Spotfinder.7 This software uses a dynamic thresholding algorithm to identify spots and calculate local background and intensities. For each array, expression ratios were normalized using an iterative mean log2(ratio) centering approach. Data from replicate arrays were averaged, and genes showing consistent differential expression across replicates were tallied for each pair of cell lines. For the KM12-derived lines, the intersection of the KM12SM and KM12L4A gene sets was identified. The intersection of this set with the genes differentially expressed in the SW620/SW480 lines was also identified. Each of the clones representing differentially expressed genes on the arrays and selected for further evaluation was subjected to an additional round of sequence verification, with 100% validation of clone identity.
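The text does not spell out the details of the iterative mean log2(ratio) centering step. The following Python sketch shows one plausible form of such a normalization and should not be read as the procedure implemented in the analysis software. The function name, the two-standard-deviation trimming rule, and the convergence tolerance are all assumptions introduced here for illustration.

```python
import math
from statistics import mean, pstdev

def iterative_log2_centering(cy3, cy5, n_sd=2.0, max_iter=20, tol=1e-6):
    """Return log2(Cy5/Cy3) ratios shifted so that the bulk of spots centers on 0.

    Hypothetical scheme (an assumption, not taken from the paper): on each pass
    the mean is estimated only from spots lying within n_sd standard deviations
    of the current mean, and that trimmed mean is subtracted from every spot.
    """
    log_ratios = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    for _ in range(max_iter):
        mu = mean(log_ratios)
        sd = pstdev(log_ratios)
        # Spots far from the mean are treated as likely true expression changes
        # and are excluded from the estimate of the centering shift.
        core = [x for x in log_ratios if abs(x - mu) <= n_sd * sd]
        shift = mean(core)
        log_ratios = [x - shift for x in log_ratios]
        if abs(shift) < tol:
            break
    return log_ratios

# Toy intensities: nine spots with a uniform 2-fold dye bias, one strongly induced spot.
cy3 = [100.0] * 10
cy5 = [200.0] * 9 + [3200.0]
normalized = iterative_log2_centering(cy3, cy5)
print(normalized)
```

In this toy case the uniform dye bias is removed from the nine unchanged spots, while the induced spot retains a large positive log2 ratio because it was excluded from the estimate of the shift.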
cDNA microarrays containing 19,200 elements were constructed using clones selected by sequence analysis of ESTs comprising the TIGR HGI. In selecting clones, priority was given to known genes with mapping information; the remainder was selected to represent genes of unknown function. Included in this array were 19,159 distinct cDNA clones representing 14,167 separate tentative consensus sequences. This included 9,577 cDNAs corresponding to 6,328 genes of known function, 9,582 of unknown function, and 6,467 with mapping information (14).8 Clone inserts were amplified by PCR, purified to remove contaminants, and mechanically spotted at high density onto silane-coated microscope slides using a high-speed robotic system (13).
To identify genes with differential expression profiles in colon cancer metastasis, we used the arrays to compare genetically related colon carcinoma cell lines. For each set of experiments, mRNA was extracted from the parental poorly metastatic KM12C cells, reverse transcribed, and labeled with Cy3-dUTP. These were used as controls. Gene expression in KM12C was compared with that in the highly metastatic KM12L4A and KM12SM, the RNA of which was reverse transcribed and labeled with Cy5-dUTP. Similarly, transcript levels of SW480 were compared with those of SW620. Relative expression was assayed by a two-color hybridization. Four replicates were performed for each experiment. For any pair of cell lines, genes exhibiting a consistent 2-fold up- or down-regulation across three or more replicates were considered significant. Consensus sets from each of the KM12 cell lines were then compared to generate a KM12-specific gene set. Finally, this set was cross-referenced with a consensus set derived from the SW480/SW620 cell lines to produce a final list of candidate metastasis-associated genes.
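The selection rule described above (a consistent 2-fold change in at least three of four replicates, followed by intersection of the consensus sets) can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the function names, the gene labels, and the toy log2 ratios are hypothetical, and the requirement that a gene change in the same direction in every comparison is our assumption.

```python
import math

def consistent_genes(log2_ratios_by_gene, fold=2.0, min_replicates=3):
    """Map each gene to 'up' or 'down' if it shows at least a `fold` change in
    the same direction in `min_replicates` or more replicate hybridizations."""
    threshold = math.log2(fold)
    calls = {}
    for gene, ratios in log2_ratios_by_gene.items():
        n_up = sum(1 for x in ratios if x >= threshold)
        n_down = sum(1 for x in ratios if x <= -threshold)
        if n_up >= min_replicates:
            calls[gene] = "up"
        elif n_down >= min_replicates:
            calls[gene] = "down"
    return calls

def shared_calls(*call_sets):
    """Keep genes called in every comparison with the same direction
    (the same-direction requirement is an assumption made for this sketch)."""
    first = call_sets[0]
    common = set(first)
    for calls in call_sets[1:]:
        common &= set(calls)
    return {g: first[g] for g in sorted(common)
            if all(calls[g] == first[g] for calls in call_sets)}

# Hypothetical log2(metastatic/reference) ratios from four replicates each.
km12l4a = {"geneA": [1.2, 1.5, 1.1, 0.4], "geneB": [-1.3, -1.1, -1.6, -1.2],
           "geneC": [1.1, 0.2, 0.3, 1.4]}
km12sm = {"geneA": [1.0, 1.3, 1.2, 1.1], "geneB": [-1.0, -1.4, -0.2, -1.2],
          "geneC": [1.5, 1.2, 1.1, 1.3]}
sw620 = {"geneA": [1.4, 1.1, 1.0, 0.9], "geneB": [1.2, 1.1, 1.3, 1.0],
         "geneC": [1.2, 1.3, 1.1, 1.0]}

km12_set = shared_calls(consistent_genes(km12l4a), consistent_genes(km12sm))
final = shared_calls(km12_set, consistent_genes(sw620))
print(km12_set)
print(final)
```

Here geneC is dropped because it passes the threshold in only two KM12L4A replicates, and geneB is dropped at the final step because its direction of change in the SW pair is opposite to that in the KM12 lines.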
Of the 19,200 elements analyzed by this method, 176 genes appear to have differential expression patterns between the cellular phenotypes of low and high liver metastasis in the KM12 and SW cells (Fig. 1). Of these, mRNA levels for 121 genes are up-regulated and 55 genes are down-regulated. Included are 108 genes of known function and 68 genes of unknown function. Analysis of each individual set of experiments showed 2,421 and 2,834 genes with differential expression patterns in the KM12L4A and KM12SM cell lines, respectively. Of these, 1,911 genes were differentially expressed in both metastatic variants (Fig. 1). The SW metastatic variant exhibited 1,569 genes with altered patterns of expression as compared with its reference (Fig. 1). Overall, most of the genes exhibited an average of ∼2- to 4-fold change in relative expression. A small percentage of genes showed >5-fold change in expression levels. Relatively small incremental changes in gene expression are not surprising because progression to metastasis is the result of the accumulation of a number of spontaneous molecular changes, some of which may be subtle (15). By comparison, larger-fold differences would be expected when comparing normal mucosa to paired cancerous tissues.9
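As a quick consistency check on the counts reported above, the following Python sketch verifies that the subsets sum to the 176-gene total and expresses each filtering step as a proportion. The variable names are illustrative only.

```python
# Counts as reported in the text; variable names are illustrative.
total = 176
up, down = 121, 55
known, unknown = 108, 68
km12l4a, km12sm, km12_shared = 2421, 2834, 1911

# The up/down and known/unknown splits should each account for all 176 genes.
assert up + down == total
assert known + unknown == total

unknown_fraction = unknown / total
shared_of_l4a = km12_shared / km12l4a
shared_of_sm = km12_shared / km12sm
final_of_shared = total / km12_shared

print(f"unknown function:              {unknown_fraction:.1%}")
print(f"KM12L4A genes shared with SM:  {shared_of_l4a:.1%}")
print(f"KM12SM genes shared with L4A:  {shared_of_sm:.1%}")
print(f"KM12-shared genes also in SW:  {final_of_shared:.1%}")
```

The proportions illustrate how strongly the final cross-reference against the SW480/SW620 pair narrows the KM12-shared set.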
Northern blot analysis was performed on a subset of genes to validate results from the microarrays (Fig. 2). A set of 11 clones was selected from those that were most significantly differentially expressed and that represented both up- and down-regulated genes not previously associated with colon cancer. These were sequence validated, and the inserts were amplified by PCR, purified, labeled, and used to probe blots containing RNA extracted from the cell lines. Cyclophilin was used as the control for all of the Northern blots because its expression was observed to be nearly constant across all microarray assays. Note that glyceraldehyde-3-phosphate dehydrogenase (GAPDH), which is commonly used as a control for Northern analyses, was differentially expressed in the highly metastatic cell lines. The zinc finger protein ZNF37A (THC545551), GlcNAc transferase (THC509558), novel G-protein-coupled receptor GPCR48 (THC545962), lipocortin II (THC480662), FGFR4 (THC480331), and two genes of unknown function [THC530481 and THC559091 (KIAA0746)] were found to be induced in the highly metastatic cell lines, as had been determined using microarrays. The transcription factor MADS (THC563169), nucleolar autoantigen (THC485148), a hypothetical protein (THC528453), and THC511611 were down-regulated in the metastatic cell lines, consistent with the array analysis. As has been reported previously (16, 17), whereas the absolute magnitude of the relative expression level determined by the Northern analysis differed slightly from that measured on the arrays, the direction of change in expression was consistent between the techniques. An additional four cDNAs were selected for confirmation by Northern analysis; however, these failed to give a detectable signal after hybridization. On analysis of the microarray data, we found these genes to be expressed at very low levels, which suggested that alternative means of validation, such as quantitative reverse transcription-PCR, would be necessary for confirmation.
Metastasis is broadly defined as the formation of secondary foci at a site that is distant from the primary site of origin. This process involves a series of interdependent, sequential events that include initial growth, angiogenesis, invasion, extravasation, and establishment of new growth at the secondary site. Gene expression profiling that uses high-density cDNA microarrays provides a revolutionary approach to studying metastasis. Transcriptional changes of genes associated with CRC cell line systems, and the potential role of these changes in various steps of tumor metastasis, are described below. Differentially expressed genes have been grouped by known or putative functions (Table 1).
Cell Cycle and Tumorigenesis.
Progression of the cell cycle from its initial growth phase (G1) to its mitotic phase (M) is driven by positive and negative regulators that ultimately direct the fate of a cell either to form two daughter cells or to enter into a resting state (G0). Dysregulated cellular proliferation, arising from abnormal expression of genes that control cell cycle checkpoints (G1-S and G2-M phases), plays a critical role in tumorigenesis. Calpastatin is an endogenous inhibitor of calpain, a cysteine protease implicated in apoptosis via inhibition of the cyclin-dependent kinase inhibitor p27 and cyclin D1 (18). We saw a 3.5-fold increase in expression of calpastatin, which suggests that the gene may be involved in cellular proliferation by inhibiting calpain-induced apoptosis. This is the first report of the association of calpastatin with metastasis. Changes in expression levels of factors controlling transcriptional and translational machinery such as Nap1 also suggest an increased predisposition of these cells to transformation. Moreover, the nuclear proliferation marker Ki-67 antigen is also overexpressed, which provides further corroboration of cellular proliferation in these cell lines. Cell cycle pathways are critical for the initial steps of tumorigenesis. Our analysis of gene expression changes in late stages of cancer progression continues to show the involvement of cell cycle events in metastasis. SW620 cells also have a higher bromodeoxyuridine labeling index than the SW480 cells (8), which suggests that cell cycle control may be an important factor even in later stages of tumor progression.
Genetic instability is a characteristic feature of CRCs (19). Genes involved in DNA mismatch repair have long been associated with hereditary nonpolyposis CRCs (20). A decrease in expression of the mutL homologues hPMS1 and hPMS2 suggests that these cell lines exhibit a high level of microsatellite instability that is attributable to a lack of replication error repair activity. It is possible, in fact, that these cell lines were derived from patients with hereditary nonpolyposis CRC, or that this instability is a product of in vitro passage. Changes in expression of genes that are involved in chromosomal instability, such as Bub1 and Bub1B, were seen in the KM-derived and SW cell lines, respectively.
Hypoxia is a common occurrence in tumors and arises from a lack of blood supply to rapidly proliferating cells (21). Once the tumor mass reaches a diameter of ∼2 mm, establishment of a new vascular system is essential for its survival. Until then, endurance in hypoxic conditions is an important factor for tumor progression. Transformation studies have earlier shown the regulation of individual genes in tumor progression (21). Here we have comprehensively shown the regulation of glycolytic enzymes during metastasis. Enolase (α) showed the most dramatic increase in expression (∼6-fold) in all three highly metastatic cell lines and could potentially be used as a marker for metastasis in colon carcinomas. There was a general up-regulation of genes involved in glycolysis in the KM12-derived cell lines, which suggested the need for the glycolytic pathway as an alternate energy source for cell survival during liver metastasis.
The KM12 and SW cell lines showed varied patterns of growth factor expression. High levels of IGF-I have been associated with increased risk of cancer (22). Whereas the KM12 metastatic cell lines showed elevated levels of IGF-I on the order of 3.7-fold, the SW cell lines did not show any significant change in IGF-I expression. Similarly, the growth-enhancing properties of transforming growth factor β during angiogenesis in colon cancers have been well documented. Here too, the SW620 cells did not show any altered patterns of expression. Although IGF-I and transforming growth factor β may be used as prognostic markers in carcinomas (23, 24), their significance as diagnostic or prognostic markers for metastasis would be questionable.
Genes associated with hepatomas were also represented in the expression profile. The gene encoding the hepatocellular carcinoma-associated protein is overexpressed ∼10-fold in all three metastatic cell lines. Differential expression of hepatoma-derived growth factor and hepatocyte nuclear factor 3β is specific to the liver-metastatic KM12 variants and is not seen in the lymph node-metastatic SW cells, which indicates that these genes may be more specific to liver metastasis, whereas the hepatocellular carcinoma-associated protein may be associated with general metastasis of colon carcinomas.
Among all of the genes differentially expressed, those involved in cytoskeletal organization and extracellular matrix formation, such as actin-β, tubulin βd1, and profilin II, were most significant. For a tumor to become invasive, it must pass through the muscularis mucosa and infiltrate the subserosal layer in which terminal lymphatics reside. Consequently, genes that are involved in breaking the barriers of cellular adhesion play an important role in tumor invasiveness. In general, there was no significant change in expression of homotypic cell adhesion molecules such as CEACAM7 and the nonspecific cross-reacting antigen (NCA) molecules.
Gross abnormalities were also observed in expression of genes involved in the formation of cytoskeletal architecture and the extracellular matrix. The actin cytoskeleton is the basic machinery that makes cells motile, a characteristic property of invasive cells. Evidence for dynamic actin-based cytoskeletal motility in the metastatic cell lines comes from differential expression of genes that are involved in actin polymerization such as actin-capping proteins, which showed an ∼6-fold increase in expression.
Signal Transduction and Cancer Metastasis.
Cross-talk between different pathways makes intracellular signal transduction a challenging area for cancer research. The molecular components of signal transduction that lead to tumorigenicity are poorly understood. One gene that is consistently overexpressed across all three cell lines is a novel G-protein-coupled receptor, GPCR48 (Fig. 2). The function of this protein is unknown. However, its expression pattern in metastasis could make this a useful marker for colon cancer metastasis. In addition to the protein phosphatases and protein kinases, a large number of oncogenes such as the set oncogenes and the v-Yes oncogenes show increased expression. Proto-oncogene c-k-ras, a highly characterized indicator of carcinomas, showed increased expression in the KM12 cell lines. The SW480 cells contain a point mutation in the ki-ras gene that results in activated ras gene product (25). No significant change in expression of c-k-ras was detected on going from the low metastatic cells to their highly metastatic SW620 variant. Other proto-oncogenes such as N-ras and rhoA also exhibit elevated levels in all highly metastatic variants. Recently, overexpression of RhoC has been demonstrated to stimulate metastasis in melanoma cells (26). Our findings provide evidence that other members of the Rho family of small GTPases may contribute to the process of metastasis.
Apart from the known genes, 39% of the genes that are differentially expressed have no assigned functional role. A hypothetical protein (THC347434) is consistently underrepresented in the highly metastatic variants (Fig. 2). Sequence analysis of the gene using SMART protein prediction tools (27) revealed that it contains multiple zinc finger domains, which suggests it may encode a DNA-binding protein. Although further work will be required to fully characterize this and other genes, their patterns of expression may make them useful as markers of metastasis.
To demonstrate the utility of cell line analysis as a model for understanding clinical prognosis, we selected two previously uncharacterized transcripts, THC511611 and THC559091 (KIAA0746), which were shown to be down- and up-regulated, respectively. These were then used to measure relative expression in paired samples of normal and tumor tissue (Fig. 3). Because these analyses again confirmed the results obtained from the arrays, we then used KIAA0746 to probe a tumor progression blot to characterize its expression through tumor progression and to assess its potential utility as a marker for metastasis (Fig. 4).
The use of cell lines as model systems to study colon cancer metastasis may not represent the disease in its entirety. The CRC cellular models used in these experiments, which are based on intrasplenic injections, actually represent a test of the capacity of CRC cells to adhere to and to colonize the liver (experimental metastasis), which may be different from a model of spontaneous metastasis in which tumor cells are injected s.c. or intracecally and then metastasize distantly. In the former, tumor cells can bypass the invasion processes, whereas in the latter, tumor cells must first have the capacity to invade the primary organ to gain access to the vasculature and lymphatic systems. The model does, however, present a comprehensive picture of critical regulators implicated in the process. We used CRC cell lines to identify genes that may play an important role in metastasis. Many of these genes had not been previously identified and serve as interesting candidates for further investigation. An extension of this analysis would be to use these genes as screening tools in extensive disease states to identify targets that would serve as potential diagnostic or prognostic markers for colon cancer metastasis.
We thank R. L. Malek and E. Snesrud for assistance in the development of the laboratory protocols; I. E. Holt, J. Li, J. Tsai, and J. White for computational assistance; and C. M. Fraser for her key role in initiating and facilitating this work.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
↵1 Supported by grants from the National Cancer Institute [NCI CA85052-01A1 and NCI CA85429-01 (to T. J. Y.) and NCI CA77049-02 and NCI 6120-119-L0-A (to J. Q.)], from the American Cancer Society [ACS RPG-99-099-01-MGO (to T. J. Y.)], the National Institute of Neurological Disorders and Stroke [NINDS NS-35231 (to N. H. L.)], and the National Heart Lung and Blood Institute [NHLBI HL-59781 (to N. H. L.)].
↵2 Present address: GlaxoSmithKline, King of Prussia, PA 19406.
↵3 To whom requests for reprints should be addressed, at E-mail: ; or
↵4 The abbreviations used are: CRC, colorectal cancer; EST, expressed sequence tag; HGI, Human Gene Index; TIGR, The Institute for Genomic Research; THC, tentative human consensus (sequence); IGF-I, insulin-like growth factor I.
↵5 Internet address: http://www.tigr.org/tdb/tgi.shtml.
↵6 Internet address: http://www.tigr.org/tdb/hgi/hgi.html.
↵7 Internet address: http://cancer.tigr.org/tools/.
↵8 Summary information for the array can be found at http://cancer.tigr.org/data/Hum19k_1.
↵9 The list of genes, with average measured levels of expression can be found online at http://cancer.tigr.org/data.
- Received June 1, 1901.
- Accepted August 28, 1901.