Molecular Biology, Pathobiology and Genetics |
1 Departamento de Bioquímica, 2 Laboratório de Bioinformática, Instituto de Química, 3 Laboratório de Neurociências (LIM-27), Instituto e Departamento de Psiquiatria, and 4 Disciplina de Oncologia, Departamento de Radiologia, Faculdade de Medicina, Universidade de São Paulo; 5 Laboratório de Endocrinologia Molecular, Departamentos de Medicina e Morfologia, Universidade Federal de São Paulo; 6 Departamento de Cirurgia de Cabeça e Pescoço e Otorrinolaringologia, Hospital do Câncer A.C. Camargo, São Paulo, SP, Brazil; Laboratórios de 7 Biologia Molecular e Genômica, Hemocentro and 8 Genômica e Expressão, Departamento de Genética e Evolução, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil; 9 Departamento de Biologia, Instituto de Biociências, Letras e Ciências Exatas, Universidade Estadual Paulista; 10 Departamento de Biologia Molecular, Faculdade de Medicina de São José do Rio Preto, São José do Rio Preto, SP, Brazil; and 11 Departamento de Ciências Biológicas, Escola de Farmácia, Universidade Estadual Paulista, Araraquara, SP, Brazil
Requests for reprints: Emmanuel Dias-Neto, Laboratory of Neurosciences (LIM-27), Instituto de Psiquiatria, Faculdade de Medicina, Universidade de São Paulo, R. Dr. Ovidio de Campos, s/n Consolação 05403-010, São Paulo, SP, Brazil. Phone: 55-11-3069-7267; Fax: 55-11-3069-8010; E-mail: emmanuel{at}usp.br.
| Abstract |
|---|
|
|
|---|
Key Words: head and neck; HNSCC; thyroid; goiter; cancer; ORESTES; ESTs; transcriptome; alternative splicing; tumor markers
| Introduction |
|---|
|
|
|---|
The cancer phenotype is defined by the accumulation of mutations and epigenetic changes that may alter protein function and/or cause alterations in transcriptional patterns. These events usually confer competitive advantages to a cell, ultimately leading to the malignant phenotype. Transcriptome-wide analysis is an important tool to reveal some of the molecular mechanisms that underlie human malignancies (5, 6).
To generate a database of transcriptional profiles derived from tumor and nontumor tissues, a high-throughput cDNA-sequencing project, based on normalized mini-cDNA libraries generated with the open reading frame expressed sequence tags (ORESTES) methodology, was conducted by our group, yielding a set of 1.2 million expressed sequence tags (EST) from diverse human tissues, from which a filtered set was deposited in public databases (6, 7). ORESTES sequences are predominantly derived from the central coding region of genes, thus favoring the identification of gene function based on protein similarity. In addition, this methodology partially normalizes the population of tagged genes, enabling the study of rare transcripts (8). The present study reports on a detailed informatics analysis of 134,495 ORESTES from head and neck as well as 79,141 ORESTES from thyroid tissues followed by experimental validation of new transcripts and new splicing isoforms and by an evaluation of putative markers for these malignancies.
| Materials and Methods |
|---|
|
|
|---|
RNA Extraction and ORESTES. PolyA+ RNA was isolated, treated with DNase, and purified as described before (8) and used for constructing 1,576 ORESTES cDNA minilibraries (8) that were cloned and sequenced (7). The detailed composition of each of these cDNA minilibraries, in terms of tissue origin and pathologic state, is given in Supplementary Methods and in Supplementary Table 1. A total of 1,003 cDNA minilibraries were sequenced, generating 18,620 sequences from nontumor tissues, 190,253 from tumor tissues, and 4,763 from thyroid goiter samples; 134,495 sequences were derived from head and neck and 79,141 from thyroid tissues. mRNA derived from a second set of tumor and nontumor samples as well as from four human cell lines (FaDu, Hep2, HeLa, and Siha) was extracted as described above and used for the experimental validation of new splicing isoforms, new genes, and potential tumor markers predicted by bioinformatics analyses.
cDNA Sequence Analyses. A total of 213,636 ESTs generated from head and neck and thyroid tissues were analyzed along with local copies of the human genome sequence draft (University of California at Santa Cruz, June 2002) and of the RefSeq and mRNA public data sets (www.ncbi.nlm.nih.gov). The complete set of 1,187,342 ESTs produced in the Human Cancer Genome project (HCGP; ref. 7) from more than 20 tissues was filtered to reduce the presence of low-quality sequences or contaminants, and assembled in 956,456 reads, which included 173,640 sequences derived from head, neck, and thyroid tissues. The data set was clustered using BLAST (http://www.ncbi.nlm.nih.gov/) and assembled with Phrap (http://www.phrap.org). To further reduce the residual redundancy and to eliminate misassembled contigs or spurious sequences that passed the filtering step, all contigs and singlet sequences were aligned to the draft human genome sequence using BLAT (9). All mapped sequences that exhibited genomic overlap were merged into a unique genomic cluster. Only sequences that aligned with >90% identity to the draft sequence through at least 50% of their length were considered for further analyses. A subset of 119,552 ESTs that were used here and were not analyzed in previous publications (6-8) have now been deposited in dbEST GenBank under accession numbers CV309219 to CV428770.
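The two rules stated above (the identity/coverage filter and the merging of genomically overlapping sequences into clusters) can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the study: the `Alignment` record, the half-open coordinate convention, and the function names are assumptions introduced here, and strand is ignored.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one sequence-to-genome alignment."""
    seq_id: str
    chrom: str
    start: int           # genomic start, half-open interval [start, end)
    end: int
    query_length: int    # full length of the contig or singlet
    aligned_length: int  # number of query bases placed on the genome
    matches: int         # number of identical bases within the aligned part

    @property
    def identity(self) -> float:
        return self.matches / self.aligned_length

    @property
    def coverage(self) -> float:
        return self.aligned_length / self.query_length


def passes_filter(aln, min_identity=0.90, min_coverage=0.50):
    """Keep alignments with >90% identity over at least 50% of the query."""
    return aln.identity > min_identity and aln.coverage >= min_coverage


def merge_into_clusters(alignments):
    """Merge alignments whose genomic intervals overlap on the same chromosome."""
    clusters = []
    ordered = sorted(alignments, key=lambda a: (a.chrom, a.start, a.end))
    for aln in ordered:
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == aln.chrom and aln.start < last["end"]:
            # Overlaps the current cluster: extend it and record the member.
            last["end"] = max(last["end"], aln.end)
            last["members"].append(aln.seq_id)
        else:
            clusters.append({"chrom": aln.chrom, "start": aln.start,
                             "end": aln.end, "members": [aln.seq_id]})
    return clusters
```

In use, one would first keep only the alignments for which `passes_filter` returns `True` and then pass the survivors to `merge_into_clusters`; each returned dictionary corresponds to one genomic cluster.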
Identification of Tissue-Specific Transcriptional Markers. Genes that were preferentially expressed in certain head and neck sites (oral cavity, pharynx, and larynx) or thyroid were identified using a Bayesian model, as described in detail in Supplementary Methods. Using this approach, EST count thresholds were established to define 95% confidence intervals for the absence or presence of expression of a certain gene in each anatomic site.
Analysis of New Splicing Isoforms and New Human Transcripts. Four trained investigators, using the same criteria, visually inspected the genomic mapping of all ESTs with putative new splicing isoforms. For all new splicing events approved by visual inspection, additional data were collected regarding splicing classification (10), size of the event in terms of nucleotides involved, donor/acceptor sites, and confirmation of the event by other cDNA sequences. Reverse transcription-PCR (RT-PCR) and cDNA sequencing were done to confirm a subset of these new gene isoforms. To reduce individual variations, isoform validations were done using human cell lines as well as pooled samples from different patients and topological locations.
Candidates for new human transcripts were identified by selecting clusters or single ESTs with no similarity to known human genes, as determined by BLASTN queries against a complete set of 162,104 human mRNA full-length sequences deposited in GenBank (cutoff E value = 10⁻¹⁵). In addition, clusters and singlets were identified that mapped to a 2.37-Mb region of chromosome 22 (22q11.2), flanking marker D22S421, previously shown to be deleted in larynx tumors (11). These were selected for manual annotation, which included sequence alignment with in silico gene predictions (GeneScan) and comparison with mouse full-length transcripts and ESTs, using the UCSC Genome Browser (http://genome.ucsc.edu/). A set of putative new transcripts was validated by RT-PCR, using pooled mRNAs from nontumor or tumor larynx and further evaluated in 45 different mRNA samples derived from larynx (14 nontumor and 13 tumor), tongue (9 nontumor and 3 tumor), and tonsil (4 nontumor and 2 tumor).
Evaluation of Differential Gene Expression. Experimental evaluation of differential expression was done for a subset of ORESTES contigs mapped to genomic regions that, according to the literature data, exhibit recurrent loss of heterozygosity or amplifications in the selected tumors. Three genes were selected for experimental validation of their differential expression by quantitative real-time PCR analysis in a GeneAmp 5700 (Applied Biosystems, Foster City, CA). cDNA was produced from individual tumor and paired normal adjacent tissue samples from the larynx and oral cavity. To reduce individual sample variations possibly arising from contamination of normal tissue with adjacent tumor, the average expression level of each gene in all normal samples was used as a reference for each tissue examined. For each patient, fold change of expression was calculated as the ratio between the individual tumor sample and the average level of normal tissue samples. Each cDNA sample was analyzed in duplicate reactions using the SYBR Green PCR Core Reagent (Applied Biosystems), and normalization was done according to the manufacturer's instructions using ACTB or GAPDH as the reference gene.
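The fold-change definition above (each tumor sample divided by the average of all normal samples from the same tissue) can be illustrated with a short Python sketch. The normalization step is an assumption: the text states only that the manufacturer's instructions were followed, so the common 2^(-ΔCt) formula with an assumed amplification efficiency of 100% is used here as a stand-in. All function names and Ct values are hypothetical.

```python
from statistics import mean


def normalized_level(target_cts, reference_cts):
    """Expression of the target gene relative to the reference gene.

    Assumption: 2^(-dCt) with ~100% PCR efficiency. Duplicate reactions
    are averaged before the difference is taken.
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** (-delta_ct)


def fold_changes(tumor_levels, normal_levels):
    """Ratio of each tumor sample to the mean of all normal samples."""
    baseline = mean(normal_levels.values())
    return {patient: level / baseline for patient, level in tumor_levels.items()}


# Hypothetical duplicate Ct values: (target gene, reference gene).
normal = {
    "N1": normalized_level([26.0, 26.0], [20.0, 20.0]),
    "N2": normalized_level([25.1, 24.9], [20.0, 20.0]),
}
tumor = {
    "T1": normalized_level([23.0, 23.0], [20.0, 20.0]),
}

for patient, fc in fold_changes(tumor, normal).items():
    print(f"{patient}: {fc:.2f}-fold relative to mean normal level")
```

Using the pooled normal average as the denominator, rather than each patient's own paired normal sample, follows the rationale given in the text: it dampens the effect of any single normal sample that might be contaminated with adjacent tumor.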
Web site. A dedicated Web site was prepared for data analysis, enabling visual inspection of the assemblies and of their genome map coordinates as well as searches using keywords, Gene Ontology classification, or chromosomal location of transcripts detected in head, neck, or thyroid tumor types under study. The site is publicly available at http://verjo19.iq.usp.br/java/jsp/head_neck/.
| Results and Discussion |
|---|
|
|
|---|
New Genes and Noncoding RNAs. Out of the total of 20,348 genomic clusters containing sequences from head and neck or thyroid, a significant fraction (25%, i.e., 5,038 clusters) represents possible new human transcripts because they have no significant similarities to previously known human genes (see Materials and Methods). Among these, a small fraction has evidence of splicing (16%, i.e., 835) and only 507 sequences (10%) have a normalized ESTscan score (13) >1, which is indicative of high protein coding potential. Nevertheless, none of them contain known protein motifs when compared with the PFAM data set. Among these new coding transcript fragments, 254 were found to map to intronic regions of RefSeq genes and may represent new coding exons (see Supplementary Table 3). The remaining 253 map to intergenic regions and may represent fragments of new human genes (see Supplementary Table 4).
The remaining 4,531 clusters that represent new transcripts have a normalized ESTscan score lower than 1; that is, they have a low coding potential. Remarkably, a large fraction of these (50%, i.e., 2,251 clusters) map to intronic regions of RefSeq genes (Supplementary Table 5). Whereas some of these may reflect a certain degree of genomic DNA contamination that might have persisted even after DNase treatment and polyA+ mRNA selection, it is likely that most of these ESTs represent bona fide mRNA transcripts. In fact, recent data showed that similar fractions of intronic and exonic transcripts are expressed in 47 samples of tumor and normal prostate, as detected by cDNA microarrays constructed with ORESTES clones (14).
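The breakdown of new transcripts given in the two paragraphs above can be checked with simple arithmetic. The sketch below only restates the counts quoted in the text and recomputes the percentages; the variable and function names are introduced here for illustration, and the percentages in the text appear to be rounded or truncated to whole numbers.

```python
# Counts quoted in the text; no new data are introduced.
TOTAL_CLUSTERS = 20_348
NEW_TRANSCRIPTS = 5_038
SPLICED = 835
HIGH_CODING = 507             # normalized ESTscan score > 1
HIGH_CODING_INTRONIC = 254
HIGH_CODING_INTERGENIC = 253
LOW_CODING = 4_531            # normalized ESTscan score < 1
LOW_CODING_INTRONIC = 2_251


def share(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100.0 * part / whole, 1)


# The subsets should partition their parent sets exactly.
assert HIGH_CODING_INTRONIC + HIGH_CODING_INTERGENIC == HIGH_CODING
assert NEW_TRANSCRIPTS - HIGH_CODING == LOW_CODING

rows = [
    ("new transcripts among all clusters", NEW_TRANSCRIPTS, TOTAL_CLUSTERS),
    ("spliced among new transcripts", SPLICED, NEW_TRANSCRIPTS),
    ("high coding potential among new transcripts", HIGH_CODING, NEW_TRANSCRIPTS),
    ("intronic among low coding potential", LOW_CODING_INTRONIC, LOW_CODING),
]
for label, part, whole in rows:
    print(f"{label}: {part:,} of {whole:,} ({share(part, whole)}%)")
```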
Recent work describing the transcriptional output of the human genome points to the existence of a significant number of noncoding RNA transcripts (15, 16) and that 10% to 20% of human transcripts might form sense-antisense pairs (17, 18). Furthermore, comparable fractions of transcriptional activity were detected within exons or introns of annotated genes, and nearly half of these intronic transcripts were expressed antisense to their respective well-characterized introns (16, 19). Although the role of these antisense intronic noncoding messages in RNA regulation is not yet fully understood (20), it has been recently shown that intronic noncoding transcripts expressed antisense to their respective introns have their expression level significantly correlated to the degree of tumor differentiation in prostate cancer (14). Identification in the present work of a large set of intronic noncoding transcripts expressed in head-neck and thyroid suggests that noncoding intronic transcripts may also play an important role in HNSCC and thyroid tumors.
Significantly, the majority of the new transcripts containing sequences from head and neck or thyroid (2,832 of 5,038, i.e., 56%) map to genomic regions previously associated with HNSCC or thyroid tumors (21). We found 12 putative new human transcripts mapping to a 2.37-Mb sequence from human chromosome 22q11-12, a region selected for its involvement with larynx tumors (11). These include three EST clusters containing head, neck, and thyroid sequences. All three had their expression confirmed in larynx by RT-PCR and were further verified by semiquantitative RT-PCR in 45 different mRNA samples obtained from pairs of nontumor/tumor tissues (see Materials and Methods). Positive amplification was obtained in 42 of 45 (LOC91353), 29 of 45 (LOC164615), and 21 of 45 (LOC91355) of the evaluated samples. Although no evidence for differential expression between nontumor and tumor samples was observed for this small set of genes, these results confirm the potential of this approach to identify new human genes.
New Splicing Isoforms. Genomic alignments of all EST clusters matching known human genes were visually inspected to evaluate the presence of putative new splicing isoforms. New events involving at least 10 nucleotides of deletions or insertions in previously known genes were considered as putative new splicing isoforms and were classified according to Wang et al. (10). Presence of conserved donor/acceptor sites as well as eventual confirmation of splicing events by other human ESTs or nonhuman transcripts in the public databases were recorded. A total of 788 new splicing events were identified in 748 different transcripts (see Supplementary Table 6).
Among the new splicing events observed here, 86% and 89% showed the canonical GT/AG donor and acceptor sites, respectively. The majority (75.3%) of new splicing events were due to insertions (Table 2). Overall, the most frequent events were exon extensions at 5' (type II) and insertions at 3' (type III). New splicing events could be confirmed by other human ESTs from GenBank in 28.6% of the cases and by sequences from other organisms in 6%.
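Two of the criteria described above lend themselves to a small illustration: the minimum event size of 10 nucleotides and the check for canonical GT donor and AG acceptor dinucleotides at the ends of an intron. The Python sketch below is hypothetical and is not the procedure used in the study, where candidate events were inspected visually; the function names and example sequences are invented for illustration.

```python
MIN_EVENT_SIZE_NT = 10  # minimum insertion/deletion size stated in the text


def is_candidate_event(event_size_nt):
    """True if an insertion or deletion is large enough to be considered."""
    return event_size_nt >= MIN_EVENT_SIZE_NT


def canonical_sites(intron_seq):
    """Return (donor_is_GT, acceptor_is_AG) for an intron given 5'->3'."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence too short to carry both sites")
    return seq[:2] == "GT", seq[-2:] == "AG"


def summarize(introns):
    """Percentage of introns with a canonical donor and a canonical acceptor."""
    flags = [canonical_sites(s) for s in introns]
    donors = sum(1 for donor_ok, _ in flags if donor_ok)
    acceptors = sum(1 for _, acceptor_ok in flags if acceptor_ok)
    total = len(flags)
    return 100.0 * donors / total, 100.0 * acceptors / total


# Invented example introns (not data from the study).
examples = ["gtaagtctttcag", "GCAAGTTTAG", "GTGAGTCCCTAC", "GTATGTTTCAG"]
donor_pct, acceptor_pct = summarize(examples)
print(f"canonical donor: {donor_pct:.0f}%, canonical acceptor: {acceptor_pct:.0f}%")
```

Reporting donor and acceptor percentages separately mirrors the way the text gives two figures (86% and 89%) rather than a single combined rate.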
Because the ORESTES database was shown to be enriched with rare messages (7), splicing variants described here may represent rare new splicing isoforms related to these tumors. Whereas some false-negative or false-positive splicing events may have occurred among the 788 new splicing events described, our validation rate (68%) suggests that true events are the majority. Indeed, the set of human ESTs available in public databases has recently been shown to provide potential markers for the classification of cancer (25). The new splicing isoforms reported here contribute to this notion and warrant further research aimed at the identification of tumor-specific transcriptional events.
Finding Tumor Markers Using Transcript Analysis. The complete HCGP data set is composed of a 9.8-fold excess of tumor-derived sequences. Despite this, a set of 8,988 genomic clusters exclusively from nontumor tissues sampled in the project was found. This set probably is enriched in genes down-regulated during tumor development and progression. Within this set, 270 genomic clusters contained sequences derived from normal head, neck, or thyroid tissues (the complete list can be searched at the project Web site). Twenty-three of these contain sequences from normal head and neck tissues and are of particular interest because they mapped to regions known to be frequently deleted in HNSCC (refs. 21, 26-28; Supplementary Table 7). Although we found several clusters composed exclusively from sequences derived from normal thyroid, only one of these (R2_CLUSTER_36019) mapped to a region (17p11) known to be deleted in thyroid tumors (21, 29, 30). Several of the genes included in this subset have antiproliferative, apoptotic, or differentiation-induction activities: CASP10 is involved in the proximal pathway of Fas-mediated apoptosis and is down-regulated in lung cancer, neuroblastoma, and non-Hodgkin lymphomas (31-33). Two other genes are related to apoptosis: CSEN, a member of the neuronal calcium sensor family (34), and MAPK8, which is important for the induction of apoptosis following stress (35). RGS6 and HHEX are involved in neuronal or thyroid differentiation, respectively (36, 37), and APRIN is a putative regulator of cell proliferation (38). Altogether, these observations strongly suggest that genes with a still unknown tumor suppressor activity in HNSCC or thyroid cancer will be found within this selected data set of loss of heterozygosity candidates.
Conversely, clusters exclusively derived from HNSCC or thyroid tumor tissues may point to oncogenes from genomic regions amplified in tumors. A set of 3,776 such clusters was identified (complete lists can be searched at the project Web site), including 144 that mapped to regions previously described as frequently amplified in HNSCC (see Supplementary Table 8) and therefore are the best candidates for further validation. In fact, 12 of these clusters are transcripts from known oncogenes, whereas another subset of about 30% of these clusters are from genes of unknown function. Genes from this list may include potential antigens that could be used in immunodiagnosis or immunotherapeutic approaches, as well as tumor markers. However, because tumor-derived sequences are overrepresented in our project, the list of tumor-only EST clusters identified here should be analyzed with caution.
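The selection logic described in the two paragraphs above (clusters built exclusively from nontumor or exclusively from tumor sequences, then intersected with regions reported as deleted or amplified) can be sketched as follows. This is a hedged illustration only: the data layout, the use of base-pair coordinates instead of cytogenetic bands, and every name and value in the example are assumptions made here, not details taken from the study.

```python
def classify_cluster(est_sources):
    """Classify a cluster by the pathologic state of its member ESTs.

    `est_sources` is an iterable of labels, assumed to be "tumor" or "nontumor".
    """
    states = set(est_sources)
    if not states:
        raise ValueError("cluster has no ESTs")
    if states == {"tumor"}:
        return "tumor_only"
    if states == {"nontumor"}:
        return "nontumor_only"
    return "mixed"


def overlaps_region(cluster, regions):
    """True if the cluster overlaps any (chrom, start, end) region."""
    return any(
        cluster["chrom"] == chrom and cluster["start"] < end and start < cluster["end"]
        for chrom, start, end in regions
    )


def candidates(clusters, wanted_class, regions):
    """Clusters of the wanted class that fall in the regions of interest."""
    return [
        c["id"] for c in clusters
        if classify_cluster(c["sources"]) == wanted_class
        and overlaps_region(c, regions)
    ]


# Invented example data (coordinates and identifiers are hypothetical).
clusters = [
    {"id": "c1", "chrom": "chr17", "start": 1_000, "end": 2_000,
     "sources": ["nontumor", "nontumor"]},
    {"id": "c2", "chrom": "chr17", "start": 1_500, "end": 2_500,
     "sources": ["tumor", "nontumor"]},
    {"id": "c3", "chrom": "chr7", "start": 5_000, "end": 6_000,
     "sources": ["tumor"]},
]
deleted_regions = [("chr17", 0, 3_000)]
amplified_regions = [("chr7", 4_000, 7_000)]

print(candidates(clusters, "nontumor_only", deleted_regions))
print(candidates(clusters, "tumor_only", amplified_regions))
```

As the text cautions, a tumor-only call made this way is sensitive to sampling depth: with far more tumor than nontumor sequences, a cluster can look tumor-specific simply because its nontumor counterpart was never sampled.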
Three transcripts were further investigated for differential expression in HNSCC by quantitative real-time PCR in paired patient samples (eight oral cavity tumors and nine larynx tumors). The selected genes mapped to genomic regions that are frequently amplified (ZRF1 and NDRG1) or frequently lost (RAP140) in HNSCC. ZRF1 mRNA levels were increased in most samples of larynx (2.4- to 13.4-fold change, 6 of 8 samples, i.e., 75%) and of oral cavity tumors (1.5- to 2.3-fold change, 5 of 10 samples, i.e., 50%; Fig. 3A and B). This is in agreement with a previous study that showed an increase in chromosome 7 and ZRF1 copy numbers (39). RAP140 exhibited decreased expression in half of the patient samples of larynx tumors (1.6- to 8.1-fold change, 4 of 8 samples, i.e., 50%; Fig. 3C). No information is available in the literature on RAP140 expression or function in tumors. NDRG1 mRNA levels were increased in half of the oral cavity tumor samples (1.7- to 3.5-fold change, 5 of 10 samples, i.e., 50%; Fig. 3D). NDRG1 has been reported to be either under- or overexpressed in a variety of cancers, including lung, brain, melanoma, liver, prostate, breast, and renal cancers and also in colon adenomas and adenocarcinomas (40-42). The variability observed in the expression of cancer-related genes in individual patient samples probably reflects the biological heterogeneity of HNSCC (43). Nevertheless, the 50% to 75% validation rate achieved in the present work indicates that an informative list of possible candidate markers was generated in this transcriptome analysis, for which further experimental validation is warranted.
| Appendix |
|---|
|
|
|---|
Affiliations: Departamentos de Bioquímica, Instituto de Química (M.H.B., C.C., K.B.M., D.N.N., and M.C.S.), Neurociências (LIM-27), Instituto e Departamento de Psiquiatria (P.E.M.G.), and Biologia Celular e Desenvolvimento, Instituto de Ciências Biomédicas (E.T.K., S.G.L., S.E.M.), Disciplinas de Oncologia, Departamento de Radiologia, Faculdade de Medicina (M.H.H.F., S.M., F.R.R.M., F.S.P., P.C.C.d.S.) and Patologia Bucal, Faculdade de Odontologia (F.N.), Universidade de São Paulo; Laboratório de Endocrinologia Molecular, Departamentos de Medicina e Morfologia, Universidade Federal de São Paulo, São Paulo, SP, Brazil (R.M.B.M.); Faculdade de Medicina (R.A.C., L.I.C.V.E., S.R.R.), Hemocentro, Faculdade de Medicina (M.I.M.C.P.), and Departamento de Genética, Instituto de Biociências (C.A.R., P.P.d.R.), Universidade Estadual Paulista, Botucatu, SP, Brazil; Departamento de Ciências Biológicas, Escola de Farmácia, Universidade Estadual Paulista, Araraquara, SP, Brazil (J.R.P., C.F.Z.); Departamento de Biologia, Instituto de Biociências, Letras e Ciências Exatas, Universidade Estadual Paulista, São José do Rio Preto, SP, Brazil (F.C.C.R-L.); Laboratório de Genômica e Expressão, Departamento de Genética e Evolução, Instituto de Biologia (M.F.C.), Laboratório de Biologia Molecular e Genômica, Hemocentro (F.F.C., T.P.), Departamento de Genética Médica, Faculdade de Ciências Médicas (C.H., A.d.S.), Universidade Estadual de Campinas, Campinas, SP, Brazil; Universidade de Ribeirão Preto, Curso de Medicina, Ribeirão Preto, SP, Brazil (M.C.R.C.); Instituto de Pesquisa e Desenvolvimento, Universidade do Vale do Paraíba, São José dos Campos, SP, Brazil (F.G.N., M.P.N.); and Department of Leukemia, University of Texas MD Anderson Cancer Center, Houston, Texas (M.R.H.E.).
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
| Footnotes |
|---|
Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
Received 9/28/04. Revised 12/14/04. Accepted 12/28/04.
| References |
|---|
|
|
|---|
subunit-like (GGL) domain of RGS6. J Biol Chem 2002;277:37832-9.