Advances in Brief
Cancer Research UK Department of Medical Oncology, University of Glasgow, Cancer Research UK Beatson Laboratories, Glasgow G61 1BD, Scotland, United Kingdom [J. L. D., W. N. K., K. A. O.]; Beatson Institute for Cancer Research, Cancer Research UK Beatson Laboratories, Glasgow G61 1BD, Scotland, United Kingdom [J. K. V.]; Department of Statistics, University of Glasgow, Glasgow G12 8QW, Scotland, United Kingdom [E. C. W.]; and University Department of Pathology, Glasgow Royal Infirmary, Glasgow G4 0SF, Scotland, United Kingdom [K. A. O.]
| ABSTRACT |
|---|
| Introduction |
|---|
Traditionally, cancer classification has been based on histopathological and clinical data. New technologies, however, have enabled genome-wide analysis of gene expression patterns, which have suggested that individual cancer phenotypes exhibit characteristic expression profiles [examples include leukemia (5) and breast cancer (6)]. We wanted to compare gene expression profiles from adenocarcinomas from a range of sites commonly known to give rise to metastatic disease to identify tumor markers characteristic of the site of origin. Our approach centered on data generated by SAGE,4 which is based on the isolation of short sequence tags (10 bp) from individual mRNA species (7). Sequencing of linked tags allows efficient characterization of transcripts and the digital representation of their expression levels. The production of absolute transcript numbers in a digital format allows comparison between data sets generated by different laboratories. SAGE data are currently available publicly via the Cancer Genome Anatomy Project.5
In this study, hierarchical clustering of publicly available SAGE data from a range of adenocarcinomas confirmed the presence of differences in gene expression between tumor sites. These differences were then exploited using a bioinformatic approach that incorporated data from a large panel of tumors investigated using the various technologies currently available for global gene expression profiling. The approach, outlined in Fig. 1, had two phases: identification of differentially expressed genes, followed by assessment of their tissue specificity. Differentially expressed genes were identified on-line and from published literature from experiments using SAGE, microarrays, DDD, differential display, and subtractive hybridization. These 515 genes were then investigated in a wider set of SAGE data for selection on the basis of gene expression in silico specific to adenocarcinoma from one site of origin or restricted to two primary sites. Sixty-one candidate markers emerged. The expression pattern of 11 of these markers was then validated by RT-PCR in a range of primary adenocarcinomas.
| Materials and Methods |
|---|
Expression Analysis of SAGE Data.
Five SAGE libraries from adenocarcinomas of breast, colon, ovary, pancreas, and prostate (96-349, Tu102, OVT7, Panc-96-6252, and PR317 prostate tumor) were downloaded from the NCBI website. A local SAGE library of primary gastric adenocarcinoma was also used. At the time of analysis, no SAGE tumor libraries from lung were publicly available, so lung adenocarcinomas were not included in the expression analysis. Using the SAGE 3.04 program, tags were extracted, and each library was compared with the five other libraries in a series of pair-wise comparisons. Statistical significance for these comparisons was based on Monte Carlo simulations within the program with a cutoff of P < 0.001. Tag fold change was defined in Microsoft Access as $(f_x / \sum \text{library 1 tags}) / (f_x / \sum \text{library 2 tags})$, where $f_x$ represents the frequency of a given tag in the corresponding library and $\sum \text{library 1 tags}$ represents the total tags in the library of interest. Differential expression was defined as a >10-fold increase in expression over all other libraries in combination with P < 0.001.
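The fold-change definition and the differential-expression rule above can be illustrated with a minimal Python sketch. It is not the authors' Microsoft Access query; the function names, the handling of zero counts, and the reading of the rule as "every pair-wise comparison must pass both thresholds" are assumptions made for illustration.

```python
def tag_fold_change(count_1, total_1, count_2, total_2):
    """Ratio of library-normalized tag frequencies:
    (count_1 / total_1) / (count_2 / total_2)."""
    if count_2 == 0:
        # Assumption: the text does not say how zero counts were handled.
        return float("inf") if count_1 > 0 else float("nan")
    return (count_1 / total_1) / (count_2 / total_2)


def is_differentially_expressed(fold_changes, p_values,
                                min_fold=10.0, max_p=0.001):
    """True if the tag is >min_fold higher than in every other library
    and every pair-wise comparison has P < max_p (assumed reading)."""
    return (all(fc > min_fold for fc in fold_changes)
            and all(p < max_p for p in p_values))


# Hypothetical counts: 120 tags in a 60,000-tag library versus
# 5 tags in a 50,000-tag library.
fc = tag_fold_change(120, 60_000, 5, 50_000)
print(round(fc, 6))
```

Normalizing each count by its library total before taking the ratio is what allows libraries of different sequencing depth to be compared.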
SAGE Data: Supervised Learning.
Supervised learning was performed on the same data set of 15 SAGE libraries used for hierarchical clustering. Genes with low expression levels (total level of <20 over all 15 tumors) were eliminated, and square root transformations of the remaining data were taken. Iterative logistic regression was used to determine the usefulness of each gene in discriminating each tumor from the remaining tumors. The predictive power of the genes was then assessed by calculating the predicted class label for a randomly omitted tumor and comparing that with the true class label. A calibration score was calculated as $|P_{\text{estimated}} - P_{\text{true}}|$, where $P_{\text{true}}$ is either 1 (correct) or 0 (false); a low score represents a good prediction. This cross-validation process was repeated, and an average prediction score was determined. Calculations were performed using the program S-PLUS 2000. For each tumor, the 20 tags with the best prediction scores were selected.
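The preprocessing and scoring steps of this procedure can be sketched in Python as follows. The sketch does not reproduce the iterative logistic regression or the S-PLUS code; the estimated probabilities are supplied as hypothetical inputs, and the calibration score is taken to be the absolute difference between Pestimated and Ptrue, an assumption consistent with a low score indicating a good prediction.

```python
import math


def preprocess(tag_counts, min_total=20):
    """Drop tags whose summed count over all libraries is below
    min_total, then square-root transform the remaining counts."""
    return {
        tag: [math.sqrt(c) for c in counts]
        for tag, counts in tag_counts.items()
        if sum(counts) >= min_total
    }


def calibration_score(p_estimated, p_true):
    """Assumed form: absolute difference; lower is better."""
    return abs(p_estimated - p_true)


def mean_calibration_score(pairs):
    """Average score over repeated leave-one-out rounds.
    `pairs` holds (p_estimated, p_true) for each omitted tumor."""
    scores = [calibration_score(pe, pt) for pe, pt in pairs]
    return sum(scores) / len(scores)


# Hypothetical tag counts across three libraries.
counts = {"TAG_A": [4, 9, 16], "TAG_B": [1, 2, 3]}
print(preprocess(counts))
```

Tags would then be ranked by their mean score, and the lowest-scoring tags retained for each tumor.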
DDD.
DDD, a bioinformatic tool available via the NCBI website,7 compares the frequencies of ESTs (or uncharacterized cDNAs) between tumor expression libraries in the Cancer Genome Anatomy Project database. Sixteen pooled EST libraries were compared, representing adenocarcinomas from breast (Br15, Br17, and Br18), colon (Co11, Co18, and Co22), lung (Lu26 and Lu27), ovary (Ov1, Ov2, and Ov8), pancreas (Pan1), prostate (Pr3, Pr8, and Pr23), and stomach (Gas4). DDD identifies ESTs differentially expressed with a statistical significance of P < 0.05 (Fisher's exact test). The 25 most differentially expressed ESTs were selected.
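To make the statistical step concrete, the following Python sketch applies a two-sided Fisher's exact test to a 2 x 2 table of EST counts (ESTs for one gene versus all other ESTs, in two library pools). It is an independent illustration, not the DDD implementation; whether DDD uses a one-sided or two-sided test is not stated here, and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a: ESTs for the gene in pool 1    b: all other ESTs in pool 1
    c: ESTs for the gene in pool 2    d: all other ESTs in pool 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene ESTs falling in pool 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical pools: 10 of 1,000 ESTs versus 0 of 1,000 ESTs.
p = fisher_exact_two_sided(10, 990, 0, 1000)
print(p < 0.05)
```

An exact test is a natural choice for EST data because counts for any single gene are typically small.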
Literature: Tumor Markers.
The Medline database from 1997 to 2000 was searched for papers describing putative tumor markers for sites relevant to adenocarcinoma of unknown origin (breast, colon, lung, ovary, pancreas, prostate, and stomach). One hundred and eighteen potential markers were identified.
Literature: Differential Expression.
A literature search was performed of publications between 1998 and 2001 on large-scale expression analysis of tumors relevant to adenocarcinoma of unknown origin. Twenty-seven papers were chosen (10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36), which used diverse technologies including microarrays, differential display, and subtractive hybridization, as well as SAGE. These reported a total of 530 differentially expressed genes, from which 87 were selected on the basis of experimental evidence of higher expression in the tumors of interest by Northern blotting, IHC, or PCR-based methods.
Tissue Specificity or Restriction.
The differentially expressed genes and putative tumor markers thus identified were then tested for their tissue specificity (Fig. 1) against a wider panel of 47 SAGE libraries relevant to the problem of adenocarcinoma of unknown primary site, as detailed in Table 1. Expression information from these SAGE libraries is accessible via the "gene to tag mapper" tool available at the NCBI website.8 For a given gene, this tool displays the potential SAGE tags in the transcript, the frequencies of those tags in each SAGE library in the database, and the normalized level of expression (tags/million) therein. Libraries were separated according to tissue of origin and expression level. Expression levels were grouped into abundance classes, and each class was assigned a weighting. Scores were then calculated based on weight and number of libraries for each expression class. This process resulted in a numerical representation of the expression level of each gene in five tissues. These numbers were then converted to a color scale that was used to prioritize genes for subsequent validation. Candidate tumor markers were defined as genes expressed highly in one (specific) or two (restricted) tissues and at low levels in all other tissues. This definition was fulfilled by 61 of the 515 differentially expressed genes already identified. Further information was then sought on each of the 61 candidate genes individually, through literature searches, NCBI's UniGene database, and the Weizmann Institute's GeneCards database.9 Eleven genes with the strongest evidence for expression restricted to the desired tissues were selected for validation by RT-PCR.
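The scoring scheme described above can be sketched in Python as follows. The class boundaries, the weights, the averaging over libraries, and the "high" and "low" cutoffs are not given in the text, so every numerical value and function name below is a hypothetical placeholder chosen only to show the shape of the calculation.

```python
# Hypothetical abundance classes: (lower bound in tags/million, weight).
ABUNDANCE_CLASSES = [(500.0, 3), (100.0, 2), (1.0, 1), (0.0, 0)]


def tags_per_million(tag_count, total_tags):
    """Normalized expression level for one library."""
    return tag_count * 1_000_000 / total_tags


def class_weight(tpm):
    """Weight of the abundance class that a tags/million value falls in."""
    for lower, weight in ABUNDANCE_CLASSES:
        if tpm >= lower:
            return weight
    return 0


def tissue_score(tpm_values):
    """Sum of (class weight x libraries in that class) for one tissue,
    divided by the library count (the averaging is an assumption)."""
    return sum(class_weight(v) for v in tpm_values) / len(tpm_values)


def classify_marker(scores, high=2.0, low=1.0):
    """Label a gene 'specific' (high in one tissue) or 'restricted'
    (high in two), provided all remaining tissues score low."""
    high_tissues = sorted(t for t, s in scores.items() if s >= high)
    others_low = all(s <= low for t, s in scores.items()
                     if t not in high_tissues)
    if others_low and len(high_tissues) == 1:
        return "specific", high_tissues
    if others_low and len(high_tissues) == 2:
        return "restricted", high_tissues
    return "not a candidate", high_tissues


# Hypothetical tags/million values per library, grouped by tissue.
libraries = {"prostate": [800.0, 650.0], "breast": [0.0, 2.0], "colon": [0.0, 0.0]}
scores = {tissue: tissue_score(v) for tissue, v in libraries.items()}
print(classify_marker(scores))
```

A per-tissue score of this kind maps directly onto a color scale, which is how the genes were prioritized for validation.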
Tissue and cDNA Samples.
Primary tissues used included adenocarcinomas from breast, ovary, stomach, pancreas, and lung (two samples of the first four types and one sample of the last type were obtained). Ethical approval for the use of clinical material was given by the North Glasgow University Hospitals National Health Service Trust. The LNCaP prostate adenocarcinoma cell line was also used, along with cDNA samples of prostatic and lung adenocarcinomas [Invitrogen (Paisley, United Kingdom) and Origene (Rockville, MD), respectively].
RNA Isolation and RT-PCR.
Tissues were homogenized using a Ribolyser (Hybaid, Ashford, United Kingdom). Total RNA was isolated using Trizol reagent (Invitrogen). RNA was reverse transcribed to cDNA using the SuperScript First Strand Synthesis System for RT-PCR (Invitrogen) with oligo(dT) primers. PCR was then performed; primer sequences and conditions are available from the authors on request.
| Results |
|---|
Identification of Tumor Markers Characteristic of the Site of Origin.
The 515 differentially expressed genes and putative tumor markers were then tested for tissue specificity against a wider panel of 47 SAGE libraries relevant to adenocarcinoma of unknown primary site. Candidate tumor markers were defined as genes expressed highly in one (specific) or two (restricted) tissues and at low levels in all other tissues. Sixty-one transcripts emerged, including both known tumor markers and genes not previously reported as markers (Fig. 3). Among the established markers were CEA and PSA, which are both valuable and widely used clinically. CEA expression in silico was restricted to colonic and pancreatic tumors, where it was present at high levels (orange color). PSA was abundant in prostatic carcinoma (red) but low elsewhere (blue). Genes found to be tissue specific or tissue restricted but not previously regarded as tumor markers included lipophilin B and glutathione peroxidase 2. The complete set of results is available online as supplementary information.2
| Discussion |
|---|
Publicly available SAGE data representing 15 tumors from sites relevant to the problem of adenocarcinoma of unknown origin were downloaded from the NCBI website. Hierarchical clustering of these data showed that tumors clustered according to site of origin, suggesting that there are similarities between tumors of common origin and differences between primary sites. These similarities and differences may be exploited to assist in prediction of the tissue of origin of metastatic adenocarcinoma, and they underlie our research approach.
Recently, Ramaswamy et al. (38) generated microarray expression data for a very large series of tumor types, including adenocarcinomas from the seven sites studied here, lymphoma, leukemia, malignant melanoma, and central nervous system tumors. Hierarchical clustering successfully divided the carcinomas from the other tumor types but did not, unlike our study, separate the adenocarcinomas according to their site of origin. This may be due to their use of measured (microarray) data rather than the counted (SAGE) data here. Alternatively, their chosen algorithm could have masked subtle differences between tumors of epithelial origin.
The clustered data included two pairs of breast carcinoma samples derived from primary tumors and corresponding lymph node metastases. These four samples clustered together. This suggests that the expression patterns of the metastatic tumors more closely resembled primary tumors of the same origin than primary adenocarcinomas from alternative sites. Primary and metastasis pairs also showed similar expression profiles in a cDNA microarray study of lung adenocarcinomas (40). Few other large-scale studies of primary and metastatic tumors exist, but data from the many immunohistochemical studies that have looked at small numbers of genes suggest phenotypic similarity (41).
Candidate tumor markers were identified through a bioinformatic approach centered on the analysis of SAGE data. This approach comprised two phases: identification of differentially expressed genes using expression data generated through a range of technologies, and then evaluation of the tissue specificity or tissue restriction of these genes using SAGE data. This approach resulted in the selection of the 61 candidates shown in Fig. 3. Among these genes were the known tumor markers PSA (which differs from PSCA) and CEA. Their clinical value in the diagnosis and management of patients with prostatic and colorectal tumors, respectively, is well established (42, 43). The presence of these genes self-validates the approach and suggests that the other genes thus identified may prove similarly useful.
Eleven of these candidate genes were chosen for validation by RT-PCR in clinical material. RT-PCR is an established method for the rapid assessment of candidate genes in a range of samples [for example, see Scheurle et al. (26)]. RT-PCR results for seven (64%) of these genes were consistent with the expression patterns predicted by bioinformatics: three agreed exactly, and four were broadly similar but with some variance. This compares well with previous studies [for example, Scheurle et al. (26) achieved concordance in 3 of 12 known genes]. Seven genes had previously been reported as tumor markers, either in a tissue-specific manner or by being up-regulated in tumors when compared with normal tissues. These genes were PSA [prostate (42)], mammaglobin 1 (breast), TFF2 (also called human spasmolytic polypeptide, specific for pancreas), pepsinogen C (stomach), surfactant A (lung), PSCA (pancreas), and metallothionein 1L (pancreas).
For PSA, mammaglobin 1, and TFF2, the in silico and RT-PCR analyses agreed exactly. Pepsinogen C was abundant in the gastric cancers by RT-PCR but was also present in one of the two lung and pancreatic adenocarcinomas and in the prostate cell line. Expression of pepsinogen C in tissues other than the stomach had not been predicted by our bioinformatics approach but is described in UniGene and in the literature; nevertheless, these sources report the highest levels by far of pepsinogen C to be in gastric tissues. Surfactant A was abundant in one of the lung adenocarcinomas by RT-PCR, as predicted, but was also present in the prostate cell line. In fact, expression of surfactant A in the prostate, at lower levels than in the lung, has recently been reported (44). In silico analysis had predicted that PSCA would be abundant in pancreatic tumors and would be present, but at lower levels, in breast, colon, ovary, prostate, and stomach tumors. Not surprisingly, PSCA was found by RT-PCR in all of these samples.
The expression patterns of four genes not previously reported as tumor markers were also validated by RT-PCR. These genes, all identified through analysis of SAGE data, were lipophilin B, glutathione peroxidase 2, KIAA0876, and PR domain containing 10. By both bioinformatics and RT-PCR, lipophilin B was restricted to breast, ovary, and prostate tumors, and glutathione peroxidase 2 was expressed at higher levels in colon and pancreatic carcinomas than in breast, ovary, and prostate tumors. Conversely, although KIAA0876, PR domain containing 10, and the known tumor marker metallothionein 1L were predicted to be differentially expressed in breast, colon, and pancreatic tumors, respectively, none of the three displayed tissue specificity by RT-PCR. This is likely to be due to variability within and between the tumors from which the SAGE data were generated and those used for our validation. The expression patterns of these genes remain to be determined in a larger set of clinical samples.
Analysis of the expression data and RT-PCR results generated from clinical samples shows that few potential markers are expressed in a truly tissue-specific manner. We anticipate that information from a number of genes used together in a marker panel would be required to assist in predicting the primary site of adenocarcinomas of unknown origin. Such marker panels are already well established for the diagnosis and classification of malignant lymphoma in pathological specimens. Moreover, Su et al. (39) recently reported multiclass tumor classification using a set of 11 genes that correctly assigned 83% of test tumors into classes. This suggests that it should also be possible to use the similar small number of genes identified here in a marker panel to assign adenocarcinomas to different classes, according to site.
Application of such a marker panel should be achievable through IHC, which localizes the protein product of a gene in microscopic sections. The advantage of using IHC over other technologies (for example, cDNA microarrays) is that IHC is already in routine use in histopathology laboratories, where the diagnosis of metastatic adenocarcinoma is usually confirmed on a tissue biopsy or cell sample from the patient. Few immunohistochemical markers have yet been used for this purpose. The most valuable are CK20 and CK7, with CK20 positivity suggesting an origin in the gut (41), and thyroid transcription factor-1, which is associated with lung tumors (45). Consequently, no new technology would be needed, and any new panel of tumor markers for adenocarcinomas could be taken forward swiftly into clinical use. Ultimately, this depends on demonstrating that the gene expression profiles described here translate to protein expression patterns that can correctly predict tumor class, and our results provide a resource for further pathological testing.
In conclusion, novel and known tissue-specific and tissue-restricted tumor markers for adenocarcinoma have been identified through the analysis of publicly available expression data and validated in clinical material. The ability of these markers to categorize adenocarcinomas according to their tissue of origin can now be tested in a broader panel of archival tumor samples. Improved prediction of likely primary site in patients with adenocarcinomas of unknown origin should lead to better assessment of their prognosis and optimal, tailored therapy.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
1 Supported by Cancer Research UK and the University of Glasgow.
2 Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org).
3 To whom requests for reprints should be addressed, at Cancer Research UK Department of Medical Oncology, Cancer Research UK Beatson Laboratories, University of Glasgow, Garscube Estate, Switchback Road, Glasgow G61 1BD, Scotland, United Kingdom. Phone: 44-0-141-330-3506; Fax: 44-0-141-330-4127; E-mail: k.oien{at}beatson.gla.ac.uk
4 The abbreviations used are: SAGE, serial analysis of gene expression; CEA, carcinoembryonic antigen; CK, cytokeratin; DDD, digital differential display; EST, expressed sequence tag; IHC, immunohistochemistry; NCBI, National Center for Biotechnology Information; PSA, prostate-specific antigen; RT-PCR, reverse transcription-PCR; TFF2, trefoil factor 2; PSCA, prostate stem cell antigen.
6 ftp.ncbi.nih.gov/pub/sage/seq.
7 www.ncbi.nlm.nih.gov/UniGene/ddd.cgi?ORG=Hs.
8 www.ncbi.nlm.nih.gov/SAGE/SAGEcid.cgi.
9 www.ncbi.nlm.nih.gov/UniGene and bioinformatics.weizmann.ac.il/cards.
Received 5/3/02. Accepted 9/11/02.