Cancer Research
[Cancer Research 65, 8209-8217, September 15, 2005]
© 2005 American Association for Cancer Research


Cell and Tumor Biology

Selective Identification of Secreted and Transmembrane Breast Cancer Markers using Escherichia coli Ampicillin Secretion Trap

Deborah A. Ferguson1, Matthew R. Muenster1, Qun Zang1, Jeffrey A. Spencer1, Jeoffrey J. Schageman2, Yun Lian2, Harold R. Garner2, Richard B. Gaynor3, J. Warren Huff7, Alexander Pertsemlidis2, Raheela Ashfaq4, John Schorge5, Carlos Becerra3, Noelle S. Williams6 and Jonathan M. Graff1,3

1 Center for Developmental Biology; 2 Eugene McDermott Center for Human Growth and Development; Departments of 3 Medicine, 4 Pathology, 5 Obstetrics and Gynecology, and 6 Biochemistry, University of Texas Southwestern Medical Center; and 7 Reata Discovery, Inc., Dallas, Texas

Requests for reprints: Jonathan Graff, Center for Developmental Biology, University of Texas Southwestern Medical Center at Dallas, 6000 Harry Hines Boulevard, NB5.112, Dallas, TX, 75390-9039. Phone: 214-648-1481; E-mail: Deborah.Ferguson{at}UTSouthwestern.edu.


    Abstract
 
Secreted and cell surface proteins play important roles in cancer and are potential drug targets and tumor markers. Here, we describe a large-scale analysis of the genes encoding secreted and cell surface proteins in breast cancer. To identify these genes, we developed a novel signal sequence trap method called Escherichia coli ampicillin secretion trap (CAST). For CAST, we constructed a plasmid in which the signal sequence of β-lactamase was deleted such that it does not confer ampicillin resistance. Eukaryotic cDNA libraries cloned into pCAST produced tens of thousands of ampicillin-resistant clones, 80% of which contained cDNA fragments encoding secreted and membrane-spanning proteins. We identified 2,708 unique sequences from cDNA libraries made from surgical breast cancer specimens. We analyzed the expression of 1,287 of the 2,708 genes and found that 166 were overexpressed in breast cancers relative to normal breast tissues. Eighty-five percent of these genes had not been previously identified as markers of breast cancer. Twenty-three of the 166 genes (14%) were relatively tissue restricted, suggesting use as cancer-specific targets. We also identified several new markers of ovarian cancer. Our results indicate that CAST is a robust, rapid, and low-cost method to identify cell surface and secreted proteins and is applicable to a variety of relevant biological questions.


    Introduction
 
Cell surface and secreted proteins are important both in basic science (e.g., cell-cell signaling, cellular adhesion and migration, morphogenesis, and ionic conductance) and clinically (e.g., controlling many characteristics of malignancy including proliferation, angiogenesis, tissue invasion, and metastasis; ref. 1). Several genes that encode such proteins, including Her-2/neu, PSA, MMPs, and VEGF, are overexpressed in cancers and contribute to the malignant phenotype (2–8). These proteins are important targets for diagnostic blood screening tests, small molecule inhibitors, and monoclonal antibody therapies (9–13). Therefore, selective identification and characterization of secreted and cell surface proteins that are produced by malignant tissues may provide novel targets for cancer diagnosis and treatment.

Recently, several methods to identify cell surface and secreted proteins have been developed, including bioinformatics, cell fractionation combined with cDNA microarrays, mass spectrometry, and signal sequence traps in eukaryotic cells (14–21). Although these methods can identify cell surface and secreted proteins, each has significant limitations. Signal peptides, which target secreted and transmembrane proteins to their appropriate subcellular location, typically consist of 4 to 15 hydrophobic amino acids flanked by a basic NH2 terminus and a polar COOH terminus (22). A consensus sequence for the signal peptide has not been identified, which means that standard molecular techniques are not well suited to identify such proteins. This also makes it difficult to design computer algorithms to identify true signal sequences from the genome (20). cDNA microarrays hybridized with probes derived from membrane-bound polysome RNA (18) permit screening of many clones simultaneously, but the approach is technically challenging and precludes identification of novel genes because it is restricted to the clones that are represented on the chip (19). Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry has identified several cell surface proteins enriched in cancers (21). However, elucidation of the protein and corresponding gene sequence from mass spectrometry data is labor intensive, time consuming, and expensive. Signal sequence traps done in eukaryotic cells have identified secreted and transmembrane proteins; however, they are inefficient: tens of thousands to millions of clones must be screened to identify a few positives (14–17).

The mechanisms for protein translocation across prokaryotic and eukaryotic membranes are relatively conserved (23); therefore, we hypothesized that mammalian signal sequences could functionally replace those of prokaryotic genes. Thus, we developed a survival-based signal sequence trap called Escherichia coli ampicillin secretion trap (CAST) that exploits the ability of mammalian signal sequences to confer ampicillin resistance to a mutant β-lactamase lacking the endogenous signal sequence (24). Here, we show the ability of CAST to identify thousands of cell surface and secreted proteins from several different eukaryotes.

To evaluate the methodology on a large scale, we applied CAST to breast cancer. Breast cancer affects over 215,000 women each year and is the second leading cause of cancer deaths in women in the United States (25). Monoclonal antibody therapies against cell surface proteins expressed in breast cancers, such as Herceptin, are effective (26); however, Herceptin is only helpful for the 20% to 30% of breast cancers that overexpress Her-2/neu (3). Furthermore, there are no effective blood tests for breast cancer. Because much work has been done on breast carcinogenesis, finding new breast cancer cell surface and secreted markers would be a stringent test of CAST with potential clinical ramifications. Using CAST, we identified thousands of cell surface and secreted protein-encoding genes expressed in a variety of different types and stages of breast cancer. Of note, over half of the genes were previously uncharacterized, making them excellent candidates for further study. Next, we analyzed the expression of over one thousand of these genes and found 166 that were overexpressed in breast cancers compared with normal breast tissues. Many of these differentially expressed genes also displayed a limited distribution in other normal tissues, making them excellent candidate breast cancer markers. We extended the approach to ovarian cancer and found many potential markers of that disease as well. Taken together, these data support the notion that CAST can isolate a large number of secreted and transmembrane proteins and can identify a battery of potential markers and therapeutic targets in cancer.


    Materials and Methods
 
RNA isolation. Human breast samples were obtained with informed consent from University of Texas Southwestern Medical Center (UTSW) and ILSBio (Chestertown, MD). The BC1 CAST library was constructed from RNA derived from two different breast cancers (UTSW samples A and B). Sample A was a ductal carcinoma in situ (DCIS), grade 3, T2N0, ER−. Sample B was a lobular carcinoma, grade 2, T2N0, ER+. The BC2 CAST library was constructed from RNA derived from three different breast cancers (UTSW samples R, S, and U). Sample R was a ductal carcinoma, grade 3, T2N0, ER−. Sample S was a ductal carcinoma, grade 3, T2N0, ER+. Sample U was a ductal carcinoma, grade 3, T3N1, ER−.

The breast samples used to determine HER-2/neu levels by reverse transcription-PCR (RT-PCR) and immunohistostaining were from UTSW. All samples were ductal carcinoma unless otherwise indicated. Sample BB: DCIS, ER+; sample D: grade 3, T2N1, ER+; sample E: lobular carcinoma, grade 3, T1N0, ER+; sample I: grade 2, T1N0, ER+; sample L: grade 2, T2N0, ER+; sample R: grade 3, T2N0, ER−; sample T: grade 3, T2N0, ER−; sample U: grade 3, T3N1, ER−; sample V: lobular carcinoma, grade 1, T2N1, ER+; sample W: grade 3, T1N0, ER−.

The samples used for the large scale RT-PCR screen were from Clinomics BioSciences, Watervliet, NY (normal breast RNA: M-0420, M-0430, M-0410, M-0470, and M-0450); Stratagene, La Jolla, CA (Normal Breast RNA); and ILSBio. These comprise 12 breast cancer specimens: stage I (samples 1 and 2), stage IIa (samples 3 and 4), stage IIb (samples 5 and 6), stage IIIa (samples 7 and 8), stage IIIb (samples 9-11), and stage IV (sample 12). Six matched normal samples were also obtained from ILSBio, corresponding to samples 1, 2, 4, 5, 6, and 8.

RNA was isolated using Trizol (Invitrogen, Carlsbad, CA). All samples were run on a denaturing agarose gel, and any RNA with detectable degradation was not included in the study. Polyadenylated RNA was purified with the Oligotex mRNA midi kit (Qiagen, Valencia, CA).

Vector and library construction. pCAST was designed to contain the kanamycin resistance gene and the β-lactamase gene lacking the first 69 nucleotides encoding the endogenous signal peptide. EcoRI and BamHI sites were placed upstream of the mutant β-lactamase gene for directional cloning. CAST cDNA libraries (SuperScript Choice System) were generated from 1 to 2 µg of mRNA with a random primer containing a BamHI restriction site (5'-CGGGATCCNNNNNN-3'; where N is A, C, G, or T) for reverse transcription. The EcoRI-adapted cDNA was digested with BamHI, size fractionated, ligated into pCAST (EcoRI and BamHI), and plated onto Luria-Bertani (LB)/ampicillin. The timeframe from RNA isolation to colony picking is ~1 week. Individual colonies were picked and grown in 1.5 mL LB with kanamycin (50 µg/mL) in a 96-well format. Plasmid DNA was isolated using NucleoSpin Multi-96 Flash Kits (BD Biosciences/Macherey-Nagel, Mountain View, CA) and end sequenced in 96-well format (Seqwright, Houston, TX) using a primer located within the β-lactamase gene (5'-TCTTACCGCTGTTGAGATCC-3').
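As a rough illustration of the random primer design described above, the following Python sketch expands the degenerate hexamer tail of the primer and checks that every resulting sequence retains the BamHI recognition site (GGATCC). The function and variable names are illustrative assumptions and are not part of the published protocol.

```python
from itertools import product

# Random primer quoted in the text; N stands for A, C, G, or T.
PRIMER = "CGGGATCCNNNNNN"
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a primer with N positions."""
    # Each fixed base contributes one choice; each N contributes four.
    choices = ["ACGT" if base == "N" else base for base in primer]
    return ["".join(combo) for combo in product(*choices)]


variants = expand_degenerate(PRIMER)
print(len(variants))                           # number of distinct primer sequences
print(all(BAMHI_SITE in v for v in variants))  # True if every variant keeps the site
```

Six degenerate positions give 4^6 distinct hexamer tails, which is what allows priming at essentially arbitrary positions along each mRNA.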

Sequence analysis. Nonredundant and expressed sequence tag (EST) databases (National Center for Biotechnology Information, NCBI) were searched for similarity to the identified sequences using the BLAST programs (27). The searches were done in batches of 96 using NETBLAST with the text files generated from sequencing. We compiled a list of nonredundant genes, ESTs, and chromosomal fragments identified using Excel. For characterized genes, subcellular localization and protein domain information was gathered from UniGene (28). Novel sequences were analyzed for predicted signal sequences and transmembrane domains using PSORTII (29), SMART (30), and SignalP (31).

Reverse transcription and PCR analysis. One microgram of RNA was reverse transcribed and amplified with PCR primers designed (Primer3; ref. 32) to generate ~200-bp amplicons. The PCR primers were synthesized by Qiagen or Illumina, Inc. (San Diego, CA) in 96-well format and delivered with the forward and reverse primers for each gene in the same well. For the large-scale screen, PCR reactions were set up in 96-well format (two genes per row for the primary screen and two rows per gene for the secondary screen). The samples were amplified (94°C, 30 seconds; 55°C, 30 seconds; and 72°C, 30 seconds) and aliquots were removed following 27, 30, and 33 cycles to ensure that the amplification was in the linear range. If the cDNA product was abundant following 27 cycles of amplification, the reaction was repeated using a lower number of cycles. The aliquots were run in large electrophoresis tanks that could accommodate 192 samples at a time. For ease of manipulation, all samples were loaded using a multichannel pipetteman. For quantitative PCR, we used DyNAmo SYBR Green quantitative PCR mix and the DNA Engine Opticon 2 Continuous Fluorescence Detection System (Bio-Rad Laboratories, San Francisco, CA). Samples were denatured at 94°C for 10 minutes and incubated at 94°C for 15 seconds, 55°C for 15 seconds, and 72°C for 15 seconds for 40 cycles. Products of each primer pair were subjected to a melting curve (55-95°C) to ensure that primer-dimers and nonspecific products were not produced. Each sample was analyzed in duplicate and each experiment was repeated thrice. Relative levels of gene expression were determined using the comparative Ct (ΔΔCt) method using primers specific for S9 rRNA as a control.
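For readers unfamiliar with the comparative Ct (ΔΔCt) method, the following Python sketch shows the standard form of the calculation. It assumes that the amount of product doubles with every cycle (100% amplification efficiency); the function names and the Ct values in the example are hypothetical and are not data from this study.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize the target gene Ct to the reference gene Ct (here, S9)."""
    return ct_target - ct_reference


def fold_change(ct_target_sample: float, ct_ref_sample: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a sample versus a control by the ddCt method.

    Assumes product doubles each cycle, so one Ct unit is a 2-fold difference.
    """
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the tumor crosses threshold 3 cycles earlier than
# normal tissue after normalization, which corresponds to an 8-fold increase.
print(fold_change(ct_target_sample=22.0, ct_ref_sample=20.0,
                  ct_target_control=25.0, ct_ref_control=20.0))
```

A lower Ct means more starting template, which is why the exponent carries a negative sign.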

HER-2/neu immunostaining. All steps of the immunohistochemical analyses were done at room temperature and carried out on the Dako Autostainer (DAKO, Carpinteria, CA). Reagents were used as supplied in the Dako Envision Plus Detection Kit. Four-micrometer paraffin sections were generated on a rotary microtome, mounted on positively charged glass slides (Superfrost, Fisher, Pittsburgh, PA), and baked overnight. Sections were then deparaffinized in xylene and ethanol and placed in 200 mL Dako* Target Retrieval Solution (pH 6.0), heated at 100°C for 20 minutes, cooled for 20 minutes, then rinsed thoroughly in deionized water, and loaded onto the Dako Autostainer. The sections were quenched with 3% H2O2 for 5 minutes, incubated with primary antibody for 30 minutes followed by a 30-minute incubation in Dako EnVision Plus, Peroxidase, Rabbit, a labeled dextran polymer. Sections were then stained for 5 minutes in freshly prepared diaminobenzidine and buffer substrate solution, counterstained with hematoxylin and blued in Richard Allen Bluing Reagent, dehydrated in a graded series of ethanols and xylene, coverslipped, and reviewed by light microscopy. Optimum primary antibody dilutions were predetermined using known positive control tissues. A known positive control section was included in each run to assure proper staining. Rabbit Immunoglobulin Fraction (Normal) or IgG1 monoclonal was used as a negative control.


    Results
 
Escherichia coli ampicillin secretion trap. To develop a rapid, bacterial-based method to identify signal sequence-containing proteins, we generated a plasmid, pCAST, with a mutant β-lactamase lacking the endogenous signal peptide (Fig. 1). A BamHI site was placed upstream of and in-frame with the mutant β-lactamase and an EcoRI site was included for directional cloning. When transformed with pCAST, E. coli did not grow on ampicillin. Survival on ampicillin was observed only when various prokaryotic and eukaryotic cDNA fragments encoding a signal sequence were inserted into pCAST (data not shown).



 
Figure 1. CAST system. A BamHI, EcoRI cDNA library is directionally cloned into the pCAST vector upstream of the mutant leaderless β-lactamase gene. pCAST also contains a kanamycin resistance cassette for ease of manipulation. cDNA fragments encoding a signal peptide or transmembrane domain in-frame with the leaderless β-lactamase gene restore correct localization of the β-lactamase enzyme and confer resistance to ampicillin.

 
To examine the ability of CAST to select genes encoding transmembrane and secreted proteins from a pool of sequences, random-primed cDNA libraries were generated from MCF-7 and SK-BR-3 human breast tumor cell lines, ligated into pCAST, and random ampicillin-resistant clones were sequenced. The majority encoded secreted and transmembrane proteins (type I, type II, and multispanning; data not shown).

To determine whether CAST was amenable to small quantities of starting material we generated a library from mouse mammary mRNA. Approximately 40,000 clones survived ampicillin selection; all randomly selected clones encoded secreted or transmembrane proteins (data not shown). The CAST system also worked efficiently on RNA amplified from a few early-stage Xenopus laevis embryos (data not shown). Taken together, these data support the notion that CAST readily identifies secreted and transmembrane proteins from mammalian cell lines and relatively small amounts of tissue from different organisms.

Large-scale identification of transmembrane and secreted proteins from breast cancers. To identify genes that encode cell surface and secreted proteins present in breast cancers, we generated CAST libraries from surgical specimens. To diversify the candidates, we extracted RNA from DCIS, lobular carcinoma, and invasive ductal carcinomas and generated two libraries, BC1 (DCIS and lobular) and BC2 (ductal) and ligated them into pCAST. Next, we assessed the number of putative signal sequence containing proteins in the libraries and found that each library contained ~50,000 clones that survived ampicillin selection. To further evaluate these breast cancer libraries, we sequenced DNA isolated from about 200 randomly chosen colonies that survived the ampicillin selection. Approximately 83% of the named genes identified in this pilot were predicted to encode secreted and transmembrane proteins. Taken together, these data support the notion that a large percentage of the clones identified by CAST encode secreted and cell surface proteins and that the BC1 and BC2 breast cancer libraries would be appropriate sources for a detailed and large-scale characterization.

Next we sequenced ~5,000 ampicillin-resistant colonies from each library. We compared these sequences to those deposited in the public databases using BLAST (NCBI) and categorized the clones according to their identity to known genes, hypothetical genes, ESTs and genomic DNA. The 9,719 clones yielding useful sequence information encoded 2,708 unique, nonredundant sequences. Of these, 46% were named genes, 11.5% encoded hypothetical proteins, 20% were uncharacterized ESTs, 18% were identified in genome sequencing projects, and 4.5% did not share identity with any sequences in the public databases. Thus, ~50% of the unique sequences that we identified using CAST corresponded to previously uncharacterized genes, providing an excellent source for gene discovery. As a benchmark, we analyzed the subset of named genes to determine whether the CAST screen had identified known secreted and cell surface breast cancer markers. Remarkably, many well-studied breast cancer markers were represented in our collection, including Her-2/neu, Her-3, mucin 1, mammoglobin, MMP9, MMP11, osteopontin, PAI-1, Cathepsin D, CTGF, VEGF, and many others. These data are consistent with the idea that CAST is an appropriate method to identify secreted or transmembrane protein markers of breast cancer.
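The composition of the sequence collection reported above can be checked with simple arithmetic. The Python sketch below uses only the figures quoted in the text; the category labels are paraphrased, the per-category counts are rounded estimates derived from the percentages rather than reported values, and treating every category other than named genes as uncharacterized is an assumption.

```python
UNIQUE_SEQUENCES = 2708   # unique, nonredundant sequences
INFORMATIVE_CLONES = 9719  # clones yielding useful sequence information

# Percentages of the unique sequences in each category, as quoted in the text.
CATEGORY_PERCENT = {
    "named genes": 46.0,
    "hypothetical proteins": 11.5,
    "uncharacterized ESTs": 20.0,
    "genomic sequence only": 18.0,
    "no database match": 4.5,
}

total_percent = sum(CATEGORY_PERCENT.values())
# Assumption: everything other than named genes counts as uncharacterized.
uncharacterized_percent = total_percent - CATEGORY_PERCENT["named genes"]

# Estimated (not reported) number of unique sequences per category.
approx_counts = {
    name: round(UNIQUE_SEQUENCES * pct / 100)
    for name, pct in CATEGORY_PERCENT.items()
}

# Average number of informative clones per unique sequence.
clones_per_sequence = INFORMATIVE_CLONES / UNIQUE_SEQUENCES

print(total_percent, uncharacterized_percent)
print(approx_counts)
print(round(clones_per_sequence, 1))
```

Under that assumption the uncharacterized fraction comes to 54%, in line with the approximately 50% stated above.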

Validation of reverse transcription-PCR to identify tumor markers. To help identify new breast cancer markers, we chose to analyze the expression of the genes identified by CAST using semiquantitative RT-PCR. However, we first tested the ability of this semiquantitative PCR approach to identify breast cancer markers by analyzing the expression of Her-2/neu, a known breast cancer marker that was also identified using CAST and a target for which PCR primers and antibodies are readily available. For this test, we selected four normal breast samples and 10 breast cancer samples, extracted RNA, and did semiquantitative RT-PCR and quantitative PCR with Her-2/neu-specific primers. By semiquantitative RT-PCR, we found that the levels of Her-2/neu were higher in tumors BB, T, and W relative to controls (Fig. 2A). Next, we analyzed the expression levels of Her-2/neu in the same samples with quantitative PCR, which is thought to be a more stringent and quantitative method. Again, we found that the levels of Her-2/neu were higher in tumors BB, T, and W compared with the control samples (Fig. 2B). However, the more important issue is whether the semiquantitative RT-PCR (and quantitative PCR) results correlate with protein expression, as many potential artifacts could obscure the PCR analyses. Thus, we analyzed HER-2/neu protein expression in the identical specimens with immunohistochemistry, an approach that is routinely used in pathologic analysis of breast cancer specimens. With this technique, we observed increased immunostaining only in samples BB, T, and W, exactly paralleling the RT-PCR results (Fig. 2C-E). These results validated the selection of semiquantitative PCR as a method to identify potential novel markers of breast cancer.



 
Figure 2. Validation of CAST and semiquantitative RT-PCR as a method to identify cell surface and secreted markers of breast cancer. A, expression levels of Her-2/neu were assessed by semiquantitative PCR in four normal breast and 10 breast cancer samples and found to be elevated in three cancer specimens: BB, T, and W. S9 ribosomal RNA serves as a control. B, quantitative PCR was done on the same samples as in (A). The Ct values ranged between 18 and 27 for both Her-2/neu and S9. Relative fold increase was determined by dividing the ΔCt value by the average ΔCt value for the four normal breast samples. Three tumors (BB, T, and W) exhibited relatively higher levels of Her-2/neu (19-, 95-, and 8-fold higher than normal breast, respectively). C-E, immunohistostaining confirmed the PCR results and showed that the HER-2/neu protein was overexpressed in breast cancers BB (data not shown), T (C), and W (D) but not in L (E).

 
Identification of novel breast cancer markers. With this understanding, we turned to our large collection of CAST-identified cell surface and secreted candidates to identify potential novel breast cancer markers. As a primary screen, we analyzed the expression of 1,287 genes with a panel of six normal breast and six breast cancer samples and identified 463 candidate genes that were overexpressed in at least one of the cancers. Next, we subjected the 463 differentially expressed candidates to a second level of screening with a panel of 12 normal breast samples (six from unaffected individuals and six matched controls from unaffected breast tissue of six of the women with breast cancer) and 12 ductal carcinomas (stages I-IV). Through the two-tiered process, we found that ~13% (166 of 1,287) were overexpressed in breast cancer. We grouped these genes into four categories based on the number of cancers in which they were overexpressed (Fig. 3). Known breast cancer markers (red) were included in the screen as positive controls and for comparison and comprised 15% (23 of 166) of the differentially expressed genes. Thus, 85% (143 of 166) of the differentially expressed genes identified by CAST were not previously described breast cancer markers. Furthermore, one third (49 of 143) of these genes were uncharacterized with respect to function (Fig. 3, green). Of the named genes, 7% (Fig. 3, blue) had been previously reported to be markers of other cancers such as leukemia or prostate carcinoma, suggesting that some of the breast cancer markers identified by CAST may also be overexpressed in additional types of cancer.



 
Figure 3. CAST identified a plethora of cell surface and secreted markers of breast cancer. Expression levels of the indicated transcripts were assessed by semiquantitative RT-PCR in 12 normal breast samples and 12 ductal breast cancers. The genes were parsed into four categories based upon the number of cancers in which the transcript was overexpressed (1-3, 4-6, 7-9, and 10-12). Known breast cancer markers (red), genes that mark other cancers (blue), and uncharacterized genes (green). Genes expressed in <6 normal human tissues (++) and genes expressed in 6 to 10 normal human tissues of the 20 normal tissues tested (+).

 
Representative examples of the differentially expressed genes are shown in Fig. 4. ERBB2 (Her-2/neu) is included as a control for the group of genes overexpressed in 1 to 3 of the 12 breast cancers (top). This group comprised 35 genes including STEAP2 (a prostate cancer marker; ref. 33), AMICA (a recently identified adhesion molecule; ref. 34), and a hypothetical protein, LOC116068 (Fig. 3). LOC116068 is predicted to contain a single membrane-spanning domain and lysine motif. None of these genes have previously been suggested as markers of breast cancer. Twenty-five percent (43 of 166) of the up-regulated genes were overexpressed in 4 to 6 of the 12 breast cancers tested. This group of genes included the known markers osteopontin, ERBB3, and TFF1 (Fig. 2). ENPP1, overexpressed in 5 of the 12 cancers (Fig. 4), encodes a type II transmembrane surface glycoprotein that has not previously been identified as a breast cancer marker. SLC35F5 is up-regulated in 5 of 12 cancers and contains 10 putative membrane-spanning regions. Also in this group were two clones, AC103652 (6 of 12 cancers) and AC104837 (5 of 12 cancers) that exhibited homology only to genomic DNA. Forty-two percent (70 of 166) of the candidates were overexpressed in 7 to 9 of the 12 breast cancers. The known breast cancer markers, MMP9 and VEGF, were included in this group. HPN (hepsin), a serine protease, is a marker of prostate cancer (35) and is overexpressed in renal and ovarian carcinomas (36, 37). Here, we show that HPN is also overexpressed in the majority of breast cancers tested (Fig. 4), extending the tumor spectrum of which this protein is a marker. 
Other genes in this group include KCNK15, a potassium channel with unknown function (7 of 12 cancers); KIAA1363, a novel 408-residue protein predicted to contain a signal peptide (7 of 12 cancers); FLJ23309, a hypothetical protein predicted to contain two to three membrane-spanning domains (8 of 12 cancers); and FLJ20174, a hypothetical protein predicted to be the mammalian homologue of the Caenorhabditis elegans protein SID-1, a multispanning membrane protein that is involved in systemic RNAi via the uptake of double-stranded RNA into cells (refs. 38, 39; 9 of 12 cancers, Fig. 4). This observation may support an RNA interference–based approach for breast cancer therapeutics. The remaining 12% of the CAST targets were overexpressed in 10 to 12 of the breast cancers. Included in this group are the breast cancer markers MMP11 and MUC5B (4, 40). Figure 4 shows the expression patterns of a hypothetical protein in this group, FLJ20406. FLJ20406 is predicted to contain a signal sequence and has homology to the mouse Lck-interacting transmembrane adaptor protein (41). Two genes, COL10A1 (collagen X, alpha 1) and CLECSF5 (C-type lectin superfamily member 5), were overexpressed in almost all of the cancers tested, although varying levels of expression were observed (Fig. 4). None of these genes were previously reported as overexpressed in breast cancer.
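The four-way grouping used in Fig. 3 and in the description above can be expressed as a small lookup. The Python sketch below is illustrative only: the function name is an assumption, and the example counts are the ones quoted in the text for individual genes.

```python
# Category boundaries from the Fig. 3 legend (number of cancers, out of 12,
# in which a transcript was overexpressed).
BINS = [(1, 3), (4, 6), (7, 9), (10, 12)]


def overexpression_bin(n_cancers: int) -> str:
    """Return the Fig. 3 category label for a gene overexpressed in n cancers."""
    for low, high in BINS:
        if low <= n_cancers <= high:
            return f"{low}-{high}"
    raise ValueError("n_cancers must be between 1 and 12")


# Counts quoted in the text for individual genes.
examples = {
    "ENPP1": 5,
    "SLC35F5": 5,
    "AC103652": 6,
    "KCNK15": 7,
    "FLJ23309": 8,
    "FLJ20174": 9,
}
for gene, n in examples.items():
    print(gene, overexpression_bin(n))
```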



 
Figure 4. Expression patterns of CAST-identified breast cancer markers. Representative examples of gene expression patterns from each of the four categories, based upon the percentage of cancers in which the marker is overexpressed. Gene symbols are indicated to the left of each panel. Breast cancer stages (I-IV) are indicated above lanes 13 to 24. S9 rRNA serves as a loading control.

 
Tissue distribution of the Escherichia coli ampicillin secretion trap–identified breast cancer markers. An ideal cancer marker might have a restricted pattern of expression: high in the cancerous tissue and low in other adult tissues. Thus, we analyzed the expression of the 166 differentially expressed genes in a panel of 20 normal human tissues. Notably, 23 genes (14%) were detected in a relatively restricted pattern (≤25% of the tissues tested; denoted by a ‘++’ sign in Fig. 3). Many of these 23 tissue-restricted genes came from the groups that we found to be overexpressed in >50% of the breast cancers, as described above (Fig. 3). Representative examples of these genes are shown in Fig. 5. COL10A1 was present at much higher levels in breast cancer compared with all other normal human tissues tested (Fig. 5B). Quantitative PCR confirmed the results and showed that COL10A1 was up-regulated in breast cancers 80-fold compared with the average level of expression in normal human tissues and 40-fold compared with the average level of expression in normal breast tissue (Fig. 5B). CLECSF5 was expressed at very low levels in all human tissues examined except for bone marrow (Fig. 5C). Quantitative PCR determined that the breast cancer levels were on average 4-fold higher than the average expression level in normal tissues (25-fold higher if bone marrow was excluded). The hypothetical protein FLJ20174 was overexpressed in the majority of breast cancer specimens and showed a restricted pattern of expression, 8-fold higher than in normal tissues (Fig. 5D). The genes shown in Figs. 3 and 4 are up-regulated in a higher percentage of breast cancers than the Herceptin target HER-2/neu (Fig. 4), and are expressed at comparatively lower levels in normal human tissues (Fig. 5A).
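The tissue-restriction labels can be summarized as a simple rule. The Python sketch below follows the thresholds given in the Fig. 3 legend for the panel of 20 normal tissues ("++" for fewer than 6 tissues, "+" for 6 to 10 tissues); the function name and the empty label for more broadly expressed genes are illustrative assumptions.

```python
TISSUES_TESTED = 20  # size of the normal human tissue panel


def tissue_restriction_label(n_tissues_expressed: int) -> str:
    """Label a gene by how many of the 20 normal tissues express it."""
    if not 0 <= n_tissues_expressed <= TISSUES_TESTED:
        raise ValueError("count must be between 0 and 20")
    if n_tissues_expressed < 6:    # at most 5 of 20 tissues, i.e. <=25%
        return "++"
    if n_tissues_expressed <= 10:  # 6 to 10 of 20 tissues
        return "+"
    return ""                      # broadly expressed; no label (assumption)


for count in (2, 5, 6, 10, 15):
    print(count, repr(tissue_restriction_label(count)))
```

With 20 tissues, the "fewer than 6" cutoff in the legend and the "≤25% of the tissues tested" criterion in the text describe the same set of genes.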



 
Figure 5. Many CAST-identified breast cancer markers have limited expression in normal human tissues. Expression levels of the indicated transcripts were assessed using semiquantitative RT-PCR and quantitative PCR. The average Ct value for the breast cancer samples is indicated for each gene. Relative expression levels were determined from the average ΔCt value of 15 normal human tissues, 12 normal breast samples, and 12 breast cancer samples. The average normal breast value was set to 1. A, Her-2/neu, which has a relatively broad tissue distribution, is shown for comparison (Ct = 24). B, Collagen X, alpha 1 (COL10A1; Ct = 24). C, C-type lectin superfamily member 5 (CLECSF5; Ct = 33). D, Hypothetical protein FLJ20174 (Ct = 27). All have limited tissue distribution. S9 rRNA is the control (Ct = 21).

 
Identification of ovarian cancer markers. We found that several genes isolated in the breast cancer screen, such as osteopontin, were reported to be ovarian cancer markers (42, 43). To extend that observation, we selected 22 genes that were overexpressed in breast cancers and had tissue-restricted expression in other human tissues and analyzed the levels of these genes in a panel of seven normal ovaries and 12 ovarian cancers using semiquantitative and quantitative RT-PCR. Remarkably, 77% (17 of 22) of the tissue-restricted breast cancer markers that we tested were also up-regulated in ovarian cancers relative to normal ovary tissue (Table 1). Expression levels of four genes (COL10A1, CLECSF5, FLJ20174, and CXCL9) that are tissue-restricted markers of both ovarian cancer and breast cancer are shown in Fig. 6. These results were confirmed by quantitative PCR showing 7- to 50-fold higher ovarian cancer expression compared with normal ovary. These results indicate that the CAST system is useful for identifying genes that are potential markers of more than one type of cancer.


 
Table 1. Ovarian cancer markers identified using CAST

 


 
Figure 6. CAST identifies ovarian cancer markers. Expression levels of the indicated genes as assessed with semiquantitative RT-PCR and quantitative PCR. Relative expression levels were calculated by dividing the average ΔCt value from 12 ovarian cancer samples by the average ΔCt value of seven normal ovary samples. Average Ct values for ovarian cancers: COL10A1, Ct = 23; CLECSF5, Ct = 32; FLJ20174, Ct = 28; CXCL9, Ct = 24; S9, Ct = 18.

 

    Discussion
 
The importance of cell surface and secreted proteins in a wide variety of biological processes and in health care has prompted the search for high-throughput methods to speed their identification. However, most of the methods described to date are labor intensive, inefficient, technically demanding, or costly (14–21). We developed and validated the CAST method, which rapidly identifies eukaryotic genes encoding signal sequence-containing proteins. CAST can be efficiently done with cDNA derived from various organisms and is suitable for situations in which the amount of available tissue is limited, such as human biopsies. With CAST, we identified >2,700 genes from breast cancers that conferred ampicillin resistance to E. coli by restoring the function of the pCAST β-lactamase signal sequence. We examined the expression of almost 1,300 of these genes in normal and cancerous breast tissues and found >150 with elevated levels of expression in breast cancers. Many of these were also expressed in a relatively restricted number of normal tissues. Whereas differential expression between a cancerous tissue and its normal counterpart is important and may be indicative of a role in malignancy, expression in other normal human tissues may limit the usefulness of such a marker. An ideal marker might be highly expressed in malignant cells and have limited expression in normal tissues; such specificity might reduce potential side effects of a novel therapeutic. One gene with a particularly striking pattern of expression is COL10A1. This gene was detected in 100% of the breast cancers tested and none of the normal breast tissues. In addition, expression of COL10A1 was not observed in any of the normal human tissues examined. COL10A1 encodes a homotrimeric short chain collagen and is expressed during endochondral ossification, such as in growth plates during childhood (44). As a component of the extracellular matrix, COL10A1 may be involved in invasion and metastasis of cancer cells. If so, targeted degradation of the COL10A1 mRNA or prevention of homotrimer formation via a monoclonal antibody or small molecule may be an effective cancer therapy. Because COL10A1 is expressed only in developing bone and because human genetic syndromes with mutations in COL10A1 only affect growing bones, therapies against it may not adversely affect adult cancer patients.

As we analyzed the genes identified by CAST and determined whether any were known markers of cancer, it became apparent that several genes were reported markers of ovarian cancer. Women with mutations in certain genes, such as BRCA1 and BRCA2, have an increased risk of developing both breast and ovarian cancer, indicating that there may be common mechanisms for malignancy in these tissues (45). Thus, it was plausible that some of the secreted and transmembrane proteins that we found to have increased levels in breast cancer might also be ovarian cancer markers. Indeed, we found that 77% of the genes we examined were also up-regulated in ovarian cancers. It is likely that some of the breast cancer markers that we identified also mark other types of cancer. In support of this, ~7% of the named genes that had not been previously reported as breast cancer markers had been identified as markers of other cancers. Thus, some of the proteins that we found in the CAST screen may be markers of multiple cancers, whereas others may be site specific.

Although we did not do an extensive comparison, it was evident that unique populations of CAST targets were identified from each of the breast cancer libraries we generated. Furthermore, using several different tumors in the construction of the CAST libraries enabled the identification of genes that were expressed in distinct subsets and varying percentages of the cancers tested. It follows that cell surface and secreted proteins that are specific to a cancer type or stage could be identified with CAST. For example, cancers that exhibit a particular characteristic (i.e., Her-2/neu overexpression, ER status) or disease stage could be grouped together and CAST targets compared. Identification of a subset of cell surface or secreted proteins overexpressed in cancers that exhibit certain characteristics could be useful for molecular profiling of human biopsies. The pattern of genes encoding cell surface and secreted cancer markers may facilitate diagnosis and may indicate the best treatment strategy to use on individual patients.

A major advantage of the CAST method is that a large number of genes encoding secreted and transmembrane proteins can be identified quickly and at relatively low cost. For example, the entire method, from RNA isolation to sequencing of CAST isolates, can be completed within 1 week, including quality control for both RNA isolation (i.e., Northern blots) and library construction. The use of bacterial selection engenders such rapidity, and its ease is highlighted by the thousands of unique genes that we identified. During the large-scale screen, we decreased the labor and time expended by exploiting 96-well formats for growing bacterial cultures, DNA isolation, sequencing, primer synthesis, and RT-PCR. Because the costs of sequencing (as well as of oligonucleotide synthesis for RT-PCR) are substantially reduced when samples are processed in 96-well format, the use of 96-well plates also increased the cost-effectiveness of the methodologies. There are several other modifications that could be made to further reduce the cost associated with the identification of genes encoding cell surface and secreted proteins. To reduce the redundancy that is encountered during sequencing, single nucleotide sequencing can be used or, alternatively, a hybridization step could be introduced such that genes that are represented at a high frequency in the CAST library are not selected for sequencing. After CAST, we attempted to identify putative cancer markers with RT-PCR. For this, we again used the 96-well format and, to further increase efficiency, we ordered the forward and reverse primers for each gene in the same well. For ease of manipulation, all samples were loaded using multichannel pipettes into electrophoresis apparatus that could accommodate 192 samples. Although we selected RT-PCR as the method to assess transcript levels, many other methods, such as dot blots or cDNA microarrays, that incur less time, effort, and expense are equally applicable. Furthermore, one may prioritize the list of clones to further examine in a variety of ways (e.g., selecting novel genes or certain classes of receptors or ligands). In addition to technical ease and speed, the CAST system is quite flexible and adaptable and therefore can be tailored to meet the criteria of each project.

In summary, we show that CAST is a rapid and efficient method to enrich for genes encoding secreted and cell surface proteins. From breast cancer specimens, we isolated >2,000 genes encoding secreted and cell surface proteins, which represent an extremely large collection of these high value targets. Furthermore, we easily identified many genes encoding these types of proteins that were up-regulated in human breast and ovarian cancers. Many of these genes displayed a relatively limited distribution in a panel of normal human tissues, supporting the notion that they may be excellent targets for diagnostics or therapeutics. Our goal for this study was to identify genes that encode potential markers of breast cancer. The next important step in the evaluation of these genes will be to generate antibodies to the putative novel cell surface and secreted proteins to determine whether the transcript levels are indicative of protein levels in cancerous tissues or serum from cancer patients. This would validate the ability of CAST to identify genes that encode secreted and cell surface proteins as well as our strategy to determine whether any of these genes encode cancer markers. Because we used normal breast tissue for our control samples, there is a possibility that the genes identified as markers of cancer are simply markers of epithelial cells, which would be enriched in cancer samples relative to normal tissues. At least three lines of evidence support the view that the genes identified in this study are indeed markers of cancer. (a) The genes identified were overexpressed in different numbers and combinations of cancer samples, whereas markers of epithelial cells should be expressed in similar patterns. (b) We identified known breast cancer markers and these markers were expressed in an appropriate number of cancer samples (i.e., HER-2/neu was overexpressed in ~20% of the cancers, consistent with the literature). We also showed, by immunohistostaining, that HER-2/neu protein levels positively correlated with elevated HER-2/neu mRNA levels as detected by RT-PCR. (c) We identified several known markers of normal breast luminal epithelial cells that were not up-regulated in the cancer specimens included in our panel. For example, a recent publication lists several genes that are expressed in normal breast luminal epithelial cells (46). Of the reported genes, we identified eight using CAST (CD24, ATP1B1, Mucin 1, CTL2, KIAA0233, PPAP2C, BACE2, and ERBB3). Our RT-PCR analysis identified only one of these genes (ERBB3) as a marker of cancer, as has been reported (47); the remaining genes were expressed at equivalent levels in normal and cancerous samples. Taken together, these results support the notion that the methodologies described herein identified genes encoding putative cell surface and secreted markers of cancer.

In summary, CAST is applicable to virtually any tissue from a wide range of organisms and can be used in combination with existing technologies to identify genes encoding cell surface and secreted proteins from specific tissues or disease states.


    Acknowledgments
 
Grant support: Department of Defense, NIH, American Cancer Society, and Leukemia and Lymphoma Society, and Department of Defense Breast Cancer Research Program award (J.M. Graff) under the auspices of the Congressionally Directed Medical Research Programs.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank the members of the Graff laboratory.


    Footnotes
 
Note: While this article was in preparation, a similar method for the identification of secreted and membrane-containing proteins was reported (48). The authors identified 65 unique proteins from 282 positive clones, 74% of which were processed through the secretory pathway. These results are in concordance with what we have observed, thus supporting the use of CAST for rapid identification of secreted and cell surface proteins.

D.A. Ferguson dedicates this article to the memory of Barbara Scott and her family.

Received 10/18/04. Revised 5/22/05. Accepted 7/12/05.


    References
 

  1. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000;100:57–70.
  2. Klijn JG, Berns PM, Schmitz PI, Foekens JA. The clinical significance of epidermal growth factor receptor (EGF-R) in human breast cancer: a review on 5232 patients. Endocr Rev 1992;13:3–17.
  3. Slamon DJ, Clark GM, Wong SG, Levin WJ, Ullrich A, McGuire WL. Human breast cancer: correlation of relapse and survival with amplification of the HER-2/neu oncogene. Science 1987;235:177–82.
  4. Duffy MJ, Maguire TM, Hill A, McDermott E, O'Higgins N. Metalloproteinases: role in breast carcinogenesis, invasion and metastasis. Breast Cancer Res 2000;2:252–7.
  5. Sledge GW Jr. Vascular endothelial growth factor in breast cancer: biologic and therapeutic aspects. Semin Oncol 2002;29:104–10.
  6. Balk SP, Ko YJ, Bubley GJ. Biology of prostate-specific antigen. J Clin Oncol 2003;21:383–91.
  7. Brandt R, Eisenbrandt R, Leenders F, et al. Mammary gland specific hEGF receptor transgene expression induces neoplasia and inhibits differentiation. Oncogene 2000;19:2129–37.
  8. Tang CK, Gong XQ, Moscatello DK, Wong AJ, Lippman ME. Epidermal growth factor receptor vIII enhances tumorigenicity in human breast cancer. Cancer Res 2000;60:3081–7.
  9. Nahta R, Hortobagyi GN, Esteva FJ. Growth factor receptors in breast cancer: potential for therapeutic intervention. Oncologist 2003;8:5–17.
  10. Slamon DJ, Leyland-Jones B, Shak S, et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N Engl J Med 2001;344:783–92.
  11. Cohen MH, Williams GA, Sridhara R, Chen G, Pazdur R. FDA drug approval summary: gefitinib (ZD1839) (Iressa) tablets. Oncologist 2003;8:303–6.
  12. Herbst RS. Erlotinib (Tarceva): an update on the clinical trial program. Semin Oncol 2003;30:34–46.
  13. Sledge GW, Miller KD. Exploiting the hallmarks of cancer. The future conquest of breast cancer. Eur J Cancer 2003;39:1668–75.
  14. Tashiro K, Tada H, Heilker R, Shirozu M, Nakano T, Honjo T. Signal sequence trap: a cloning strategy for secreted proteins and type I membrane proteins. Science 1993;261:600–3.
  15. Klein RD, Gu Q, Goddard A, Rosenthal A. Selection for genes encoding secreted proteins and receptors. Proc Natl Acad Sci U S A 1996;93:7108–13.
  16. Kojima T, Kitamura T. A signal sequence trap based on a constitutively active cytokine receptor. Nat Biotechnol 1999;17:487–90.
  17. Lim SP, Garzino-Demo A. Cloning trap for signal peptide sequences. Biotechniques 2000;28:124–6, 128–30.
  18. Mechler BM. Isolation of messenger RNA from membrane-bound polysomes. Methods Enzymol 1987;152:241–8.
  19. Diehn M, Eisen MB, Botstein D, Brown PO. Large-scale identification of secreted and membrane-associated gene products using DNA microarrays. Nat Genet 2000;25:58–62.
  20. Nielsen H, Brunak S, von Heijne G. Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng 1999;12:3–9.
  21. Adam PJ, Boyd R, Tyson KL, et al. Comprehensive proteomic analysis of breast cancer cell membranes reveals unique proteins with potential roles in clinical cancer. J Biol Chem 2003;278:6482–9.
  22. von Heijne G. A new method for predicting signal sequence cleavage sites. Nucleic Acids Res 1986;14:4683–90.
  23. Schnell DJ, Hebert DN. Protein translocons: multifunctional mediators of protein translocation across membranes. Cell 2003;112:491–505.
  24. Kadonaga JT, Gautier AE, Straus DR, Charles AD, Edge MD, Knowles JR. The role of the β-lactamase signal sequence in the secretion of proteins by Escherichia coli. J Biol Chem 1984;259:2149–54.
  25. Weir HK, Thun MJ, Hankey BF, et al. Annual report to the nation on the status of cancer, 1975–2000, featuring the uses of surveillance data for cancer prevention and control. J Natl Cancer Inst 2003;95:1276–99.
  26. Colomer R, Shamon LA, Tsai MS, Lupu R. Herceptin: from the bench to the clinic. Cancer Invest 2001;19:49–56.
  27. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990;215:403–10.
  28. Schuler GD, Boguski MS, Stewart EA, et al. A gene map of the human genome. Science 1996;274:540–6.
  29. Nakai K, Kanehisa M. A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 1992;14:897–911.
  30. Schultz J, Milpetz F, Bork P, Ponting CP. SMART, a simple modular architecture research tool: identification of signaling domains. Proc Natl Acad Sci U S A 1998;95:5857–64.
  31. Nielsen H, Engelbrecht J, Brunak S, von Heijne G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng 1997;10:1–6.
  32. Rozen S, Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 2000;132:365–86.
  33. Korkmaz KS, Elbi C, Korkmaz CG, Loda M, Hager GL, Saatcioglu F. Molecular cloning and characterization of STAMP1, a highly prostate-specific six transmembrane protein that is overexpressed in prostate cancer. J Biol Chem 2002;277:36689–96.
  34. Moog-Lutz C, Cave-Riant F, Guibal FC, et al. JAML, a novel protein with characteristics of a junctional adhesion molecule, is induced during differentiation of myeloid leukemia cells. Blood 2003;102:3371–8.
  35. Kim JH, Skates SJ, Uede T, et al. Osteopontin as a potential diagnostic biomarker for ovarian cancer. JAMA 2002;287:1671–9.
  36. Dhanasekaran SM, Barrette TR, Ghosh D, et al. Delineation of prognostic biomarkers in prostate cancer. Nature 2001;412:822–6.
  37. Tanimoto H, Yan Y, Clarke J, et al. Hepsin, a cell surface serine protease identified in hepatoma cells, is overexpressed in ovarian cancer. Cancer Res 1997;57:2884–7.
  38. Zacharski LR, Ornstein DL, Memoli VA, Rousseau SM, Kisiel W. Expression of the factor VII activating protease, hepsin, in situ in renal cell carcinoma. Thromb Haemost 1998;79:876–7.
  39. Feinberg EH, Hunter CP. Transport of dsRNA into cells by the transmembrane protein SID-1. Science 2003;301:1545–7.
  40. Winston WM, Molodowitch C, Hunter CP. Systemic RNAi in C. elegans requires the putative transmembrane protein SID-1. Science 2002;295:2456–9.
  41. Berois N, Varangot M, Sonora C, et al. Detection of bone marrow-disseminated breast cancer cells using an RT-PCR assay of MUC5B mRNA. Int J Cancer 2003;103:550–5.
  42. Hur EM, Son M, Lee OH, et al. LIME, a novel transmembrane adaptor protein, associates with p56lck and mediates T cell activation. J Exp Med 2003;198:1463–73.
  43. Kristiansen G, Denkert C, Schluns K, Dahl E, Pilarsky C, Hauptmann S. CD24 is expressed in ovarian cancer and is a new independent prognostic marker of patient survival. Am J Pathol 2002;161:1215–21.
  44. Warman ML, Abbott M, Apte SS, et al. A type X collagen mutation causes Schmid metaphyseal chondrodysplasia. Nat Genet 1993;5:79–82.
  45. Wooster R, Weber BL. Breast and ovarian cancer. N Engl J Med 2003;348:2339–47.
  46. Jones C, Mackay A, Grigoriadis A, et al. Expression profiling of purified normal human luminal and myoepithelial breast cells: identification of novel prognostic markers for breast cancer. Cancer Res 2004;64:3037–45.
  47. Kraus MH, Issing W, Miki T, Popescu NC, Aaronson SA. Isolation and characterization of ERBB3, a third member of the ERBB/epidermal growth factor receptor family: evidence for overexpression in a subset of human mammary tumors. Proc Natl Acad Sci U S A 1989;86:9193–7.
  48. Tan R, Jiang X, Jackson A, et al. E. coli selection of human genes encoding secreted and membrane proteins based on cDNA fusions to a leaderless β-lactamase reporter. Genome Res 2003;13:1938–43.



