[Cancer Research 60, 6116-6133, November 1, 2000]
© 2000 American Association for Cancer Research


Regular Articles

The 630-kb Lung Cancer Homozygous Deletion Region on Human Chromosome 3p21.3: Identification and Evaluation of the Resident Candidate Tumor Suppressor Genes¹

Michael I. Lerman², John D. Minna², and for The International Lung Cancer Chromosome 3p21.3 Tumor Suppressor Gene Consortium³

Hamon Center for Therapeutic Oncology Research, University of Texas Southwestern Medical Center, Dallas, Texas 75390-8593 [J. D. M.], and Laboratory of Immunobiology, National Cancer Institute, Frederick Cancer Research and Development Center, Frederick, Maryland 21702 [M. I. L.]


    ABSTRACT
We used overlapping and nested homozygous deletions, contig building, genomic sequencing, and physical and transcript mapping to further define a ~630-kb lung cancer homozygous deletion region harboring one or more tumor suppressor genes (TSGs) on chromosome 3p21.3. This location was identified through somatic genetic mapping in tumors, cancer cell lines, and premalignant lesions of the lung and breast, including the discovery of several homozygous deletions. The combination of molecular manual methods and computational predictions permitted us to detect, isolate, characterize, and annotate a set of 25 genes that likely constitute the complete set of protein-coding genes residing in this ~630-kb sequence. A subset of 19 of these genes was found within the deleted overlap region of ~370 kb. This region was further subdivided by a nesting 200-kb breast cancer homozygous deletion into two gene sets: 8 genes lying in the proximal ~120-kb segment and 11 genes lying in the distal ~250-kb segment. These 19 genes were analyzed extensively by computational methods and were tested by manual methods for loss of expression and mutations in lung cancers to identify candidate TSGs from within this group. Four genes showed loss of expression or reduced mRNA levels in non-small cell lung cancer (CACNA2D2/α2δ-2, SEMA3B [formerly SEMA(V)], BLU, and HYAL1) or small cell lung cancer (SEMA3B, BLU, and HYAL1) cell lines. We found six of the genes to have two or more amino acid sequence-altering mutations: BLU, NPRL2/Gene21, FUS1, HYAL1, FUS2, and SEMA3B. However, none of the 19 genes tested for mutation showed a frequent (>10%) mutation rate in lung cancer samples. This led us to exclude several of the genes in the region as classical tumor suppressors for sporadic lung cancer. 
On the other hand, the putative lung cancer TSG in this location may either be inactivated by tumor-acquired promoter hypermethylation or belong to the novel class of haploinsufficient genes that predispose to cancer in a hemizygous (+/-) state but do not show a second mutation in the remaining wild-type allele in the tumor. We discuss the data in the context of novel and classic cancer gene models as applied to lung carcinogenesis. Further functional testing of the critical genes by gene transfer and gene disruption strategies should permit the identification of the putative lung cancer TSG(s), LUCA. Analysis of the ~630-kb sequence also provides an opportunity to probe and understand the genomic structure, evolution, and functional organization of this relatively gene-rich region.


    INTRODUCTION
Lung cancer kills >150,000 patients each year in the United States and many more around the world. This is more deaths than are attributable to colon, prostate, and breast cancer combined (1, 2, 3, 4). The scale of this epidemic has heightened efforts to understand the molecular pathogenesis of lung cancer, including genes frequently undergoing somatic mutation. The isolation of such "lung cancer genes" should guide the development of new therapeutic interventions and new early detection and prevention strategies. Although tobacco smoking is a well-established environmental etiology in lung carcinogenesis (5), our understanding of the acquired genetic changes leading to lung cancer is still rudimentary and incomplete (6). The lack of (and difficulty of performing) genetic linkage studies of lung cancer in families (4, 7), which have the potential to identify initiating cancer-causing genes, has directed the search toward allele loss mapping in tumors, cell lines, and premalignant lesions of the lung and breast (8, 9, 10, 11, 12, 13). A convergence of evidence from these allele loss mapping studies, including identification of overlapping homozygous deletions, strongly suggests the presence of a TSG⁴ in the chromosome 3p21.3 band (6, 9, 10, 14). Allele loss in the 3p21.3 area is the earliest premalignant change thus far detected in lung cancer development (12, 15, 16). Biallelic or monoallelic inactivation of this putative TSG(s) likely represents a critical (rate-limiting) step in the development of sporadic lung cancer. As part of our efforts to identify a lung cancer TSG (which we will provisionally call LUCA) on 3p21.3, we physically mapped and cloned the genomic DNA surrounding this locus as defined by homozygous deletions (6, 9, 10, 14, 17, 18, 19, 20). Subsequently, the ~630-kb clone contig was sequenced jointly by The Washington University⁵ and The Sanger⁶ Human Genome Sequencing Centers. 
In more recent work, we placed the putative 3p21.3 TSG(s) in a ~120-kb segment that was defined by a homozygous deletion in a breast cancer specimen that was nested within the three small cell lung cancer homozygous deletions (10). In parallel with these genetic and physical studies, we have been constructing a map of transcript sequences with the aim of identifying the complete set of transcripts encoded in the region and defining and annotating the respective genes. Here we report the catalog of genes we have discovered to be residing in the 630-kb sequence and their experimental and informatics characterization. Of these, only two "G protein" genes, i.e., GNAI2 (21) and GNAT1 (22), had been cloned and characterized previously, and from this catalog we positioned these two genes within the contig DNA sequence. The set of 19 genes found in the overlapping homozygous deletions in SCLCs NCI-H740, NCI-H1450, and GLC20 (18), including eight in the smaller critical 120-kb sequence defined by a breast homozygous deletion (10), was analyzed extensively. We used both manual experimental methods to study expression and search for mutations and web-based computational servers to predict possible protein functions. By Northern analysis, four of the genes showed frequently reduced or absent mRNA levels in NSCLC (CACNA2D2, SEMA3B, BLU, and HYAL1) and SCLC (BLU, HYAL1, and SEMA3B) cell lines. We found that six of the genes had mutations, but none of the 19 genes showed a high frequency of mutations (>10%) in the analyzed lung tumor samples. This raises the possibility that the putative TSG, LUCA, may be one of the genes with frequent loss of expression that occurs through tumor-acquired promoter hypermethylation (23). Alternatively, it could belong to the class of haploinsufficient TSGs. This novel class of TSGs is predicted to predispose to cancer in a hemizygous (+/-) state but does not show a second hit in the remaining wild-type allele in tumors. 
Further functional experimental analysis such as growth suppression studies and gene knockout strategies will be required to reveal the identity of the putative 3p21.3 TSG(s). In addition, our study shows that genomic DNA sequencing in combination with high-quality gene annotation is an effective method of gene discovery.


    MATERIALS AND METHODS
Cell Lines and DNA Samples
Lung cancer cell lines were started and maintained by us using published methods, and information on the lines is summarized in the NCI-Navy collection (American Type Culture Collection; Ref. 24). DNA from the SCLC line, GLC20 (17), was a gift of C. H. C. Buys (University of Groningen, Groningen, the Netherlands); the SCLC cell line, U2020 (25), was a gift of P. Rabbitts (MRC, Cambridge, United Kingdom). Lung cancer normal/tumor paired DNA samples were from the NCI-Navy collection established by B. Johnson (26).

Commercial Reagents
The following materials were purchased from the vendors indicated: PCR tool kits, Perkin-Elmer Cetus; DNA sequencing tool kits, Applied Biosystems (Foster City, CA); fluorescent in situ hybridization reagents, Boehringer Mannheim (Mannheim, Germany); cDNA and cosmid libraries, ClonTech Laboratories (Palo Alto, CA) and Stratagene (La Jolla, CA); EST clones from public databases, the I.M.A.G.E. consortium⁷ or Research Genetics (Rockville, MD); SSCP tool kits and blotting nylon membranes, Amersham (Arlington Heights, IL); oligonucleotide primers, Life Technologies, Inc. (Rockville, MD); restriction enzymes, Life Technologies, Inc., New England Biolabs (Beverly, MA), and Amersham (Arlington Heights, IL); MTN poly(A)+ RNA blots, ClonTech Laboratories (Palo Alto, CA); buffers, blotting solutions, and RNase-free water, Quality Biologicals (Gaithersburg, MD); chemicals, Sigma Chemical Co. (St. Louis, MO); and cell culture media, Life Technologies, Inc. (Rockville, MD).

Informatics Tools
The software package, GENSCAN (27), was licensed from Christopher Burge (MIT, Cambridge, MA) and installed at the NCI Advanced Biomedical Computing Center at the Frederick Cancer Research and Development Center. The integrated informatics package, PANORAMA, incorporating BLAST, GENSCAN, GRAIL, and other gene interpretation features, was developed at University of Texas Southwestern Medical Center at Dallas (TX) by H. Garner and run on a Hewlett Packard Exemplar supercomputer. PANORAMA is available for internet use.⁸ For this analysis, GenBank was downloaded in December 1999.

Manual Molecular Procedures
All molecular manipulations (DNA and RNA isolations, screening genomic and cDNA libraries, Northern and Southern blot analyses, and PCR) were performed using standard methods according to Sambrook et al. (28). For DNA sequencing, cDNA clones were sequenced on an Applied Biosystems 373 or 377 DNA sequencer (Stretch) using Taq Dideoxy Terminator Cycle Sequence kits (Applied Biosystems, Foster City, CA) with either vector or clone-specific walking primers. Cosmid and P1 phage DNAs were sequenced by the Washington University and Sanger Human Genome Sequencing Centers using the shotgun procedure as described (21, 22). FISH and two-color FISH were used to locate and orient the cosmid contig on chromosome 3p. Normal metaphase chromosomes were hybridized simultaneously with digoxigenin-labeled NotI linking clone NL1–210, part of cosmid LUCA1 (green), and biotin-labeled cosmid LUCA20 (red) (29). 4',6-Diamidino-2-phenylindole was used as a counterstain. Both the metaphase spreads and interphase nucleus staining confirmed the single-site location of each probe on 3p21.31, establishing the following order: centromere-cosmids LUCA1-LUCA20-telomere. Pulsed-field gel electrophoresis analysis was performed as follows. High molecular weight DNA was prepared in agarose plugs as described (30). Slices containing ~10⁶ cells were digested for 16 h with 50 units of enzyme (NotI, NruI, and MluI; Boehringer Mannheim) and resolved on 1% agarose gels using a Bio-Rad CHEF Mapper (Hercules, CA) with electrophoresis profiles allowing separation in the range of 50–1000 kb. For expression analyses, Northern blot hybridization was performed with cDNA probes using commercial MTN poly(A)+ RNA blots (ClonTech, Palo Alto, CA) from a variety of adult human tissues and tumor cell lines and in-house blots prepared with total or poly(A)+ RNA from lung cancer cell lines. Radioactive DNA probes were prepared by random priming with Rediprime II (Amersham, Arlington Heights, IL). 
Hybridization was performed in ExpressHyb hybridization solution according to manufacturer’s instructions (ClonTech Laboratories, Palo Alto, CA). In addition, the presence of gene transcripts was monitored in silico by BLAST homology searches (31) in public EST databases. Mutational analyses were performed by RT-PCR-SSCP or exon-PCR-SSCP, followed by sequencing of shifted bands as described previously (32 , 33) . Experimental gene discovery by using conserved and transcribed genomic fragments was performed as detailed previously (18) .

Computational and Bioinformatics Procedures
World Wide Web-based Servers and Databases.
World Wide Web-based servers and databases (34) were used to analyze genomic, cDNA, and predicted protein sequences. In addition, the Wisconsin Genetics Computer Group, package 10 (35) , and the GENSCAN (27) programs were run at the Advanced Biomedical Computing Center (Frederick Cancer Research and Development Center), whereas the University of Texas Southwestern Medical Center integrated gene analysis software PANORAMA was run at the University of Texas Southwestern Medical Center.

DNA and Protein Sequence Analyses.
Global sequence alignments were done using BLAST (31) and Advanced BLAST⁹ (36) programs as provided by the National Center for Biotechnology Information,¹⁰ and BLAST2/WU-BLAST.¹¹ Multiple sequence alignments, global and local, were done using the CLUSTAL version W program¹² as provided by EMBL, Baylor Computing Center,¹³ and the Wisconsin Genetics Computer Group, package 10 program (35). Protein structural features were delineated using the EXPASY proteomics tools.¹⁴ Protein domains were discovered using Pfam (37) and SMART (38) programs as provided by EMBL.¹⁵ Protein subcellular localization was predicted using PSORT¹⁶ (39). Signal peptides, transmembrane helices, and membrane topologies were predicted by SPLIT¹⁷ (40), TMHMM¹⁸ (41), and PSORT (39) programs. Protein motifs were found by visually inspecting local alignments or using the protein motifs (42), ProfileScan, and Prosite (43) programs. In addition, we used the INTERPRO server (44).¹⁹

Discovery of Orthologous Genes in Model Organisms.
Stringent criteria for identification of candidate orthologous genes were applied as suggested (45, 46). In the mouse, orthologous pairs were >90% identical on the protein level with >90% alignment of their entire lengths (47). In the fly, worm, and yeast, candidates were identified with 20–50% identity over at least 80% of their lengths. The TBLASTN program was used to search nonredundant nucleotides, Unigene, and EST databases of the model organisms. EST clusters were then built by the EST assembly machine server²⁰ or the EST Assembler²¹ at Max Delbrück Center.²² The advanced BLAST2 and Orthologue program (48) at EMBL was used to confirm the putative orthologous relationships and obtain and ascertain phylogenetic trees.
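
The identity and coverage thresholds described above can be written as a simple filter. The following Python sketch is illustrative only: the `Alignment` record and the function name are hypothetical and are not part of the tools used in the study. It also assumes that the 20–50% range quoted for fly, worm, and yeast acts as a minimum identity of 20%, so that a hit above 50% is not rejected.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Summary of one pairwise protein alignment (hypothetical record)."""
    organism: str            # "mouse", "fly", "worm", or "yeast"
    percent_identity: float  # protein-level identity, 0-100
    percent_coverage: float  # share of the full protein length aligned, 0-100


def is_candidate_orthologue(aln: Alignment) -> bool:
    """Apply the thresholds quoted in the text to one alignment."""
    if aln.organism == "mouse":
        # Mouse: >90% identity with >90% of the length aligned.
        return aln.percent_identity > 90 and aln.percent_coverage > 90
    if aln.organism in {"fly", "worm", "yeast"}:
        # Assumption: 20% is treated as a floor; the 50% upper end of the
        # quoted range is read as descriptive, not as a rejection cutoff.
        return aln.percent_identity >= 20 and aln.percent_coverage >= 80
    raise ValueError(f"no criteria defined for organism: {aln.organism}")


if __name__ == "__main__":
    hits = [
        Alignment("mouse", 95.0, 98.0),
        Alignment("mouse", 85.0, 99.0),
        Alignment("yeast", 32.0, 84.0),
    ]
    for hit in hits:
        print(hit.organism, hit.percent_identity, is_candidate_orthologue(hit))
```

A filter of this kind only shortlists candidates; in the procedure described here, orthology was then confirmed separately with the Orthologue program and phylogenetic trees.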

In Silico Gene Discovery Was Performed following Two Different Protocols.
Genome-wide repeats and low complexity regions in the genomic DNA sequences were identified and masked using the program RepeatMasker.²³ The masked sequences were then used in BLASTN searches against EST, Unigene, and nonredundant nucleotide databases to identify potential transcripts (ESTs and cDNAs) and build EST clusters. Next, genomic sequences assembled from the individual cosmid sequences were subjected to gene prediction programs, i.e., GENSCAN (27) and XGRAIL (49), with default settings to identify coding DNA sequences and corresponding protein sequences. These were then used in BLASTN and TBLASTN searches, respectively, against nonredundant nucleotide and EST databases to identify ESTs and cDNAs. The ESTs were then assembled into clusters (see above). Genomic information (repetitive elements, coding exons, ESTs, and known and predicted genes) was also obtained and analyzed by first-pass automatic genome annotation programs: PANORAMA (18) for individual cosmid sequences and the Rummage package²⁴ for the whole assembled ~630-kb contig sequence. The Rummage analysis was kindly performed by Drs. A. Rosenthal and R. Schattevoy, both at the Genome Sequencing Center (Jena, Germany). Recently, The Genome Annotation Channel²⁵ made available their first-pass annotations for most of the contig sequences.

Gene Annotations.
Annotations for the proteins for all of the genes discovered in the contig sequence were compiled from computational predictions, experimental observations, and by transfer of information from the yeast, worm, and fly orthologue pairs. Functional conservation between human proteins and their orthologous counterparts was repeatedly demonstrated experimentally (46 , 50) .


    RESULTS
Overlapping and Nested Homozygous Deletions Define 120-kb and 250-kb Regions for a TSG Search and Identification of 25 Resident Genes in the Overall ~630-kb Region.
LOH (allele loss) is an important sign of somatically acquired genetic events in the natural history of many tumors and is a useful tool to discover the location of new TSGs. However, the regions of allelic losses involving 3p in lung cancer (6, 9) and premalignant lesions (12, 14, 15, 16) are multiple and often large, making it difficult to define a small consensus region that would facilitate a realistic positional cloning effort. Fortunately, three overlapping homozygous deletions in lung cancer cell lines (NCI-H740, NCI-H1450, and GLC20; Refs. 9, 17, 18, 19, 51) and one in a breast tumor specimen (HCC1500; Ref. 10), in conjunction with functional evidence of suppressor activity (9, 52, 53, 54), strongly support the existence of a putative TSG(s) in this location and successively narrowed the critical gene region, first to 370 kb (18), and more recently, to ~120 kb (Ref. 10; Fig. 1). This latest reduction of the critical segment by a nesting deletion in a breast carcinoma assumes that the same gene(s) was targeted in both lung and breast malignancies (10). However, if the targeted gene(s) were different, the lung cancer gene(s) could lie in either the ~120-kb region or in the more telomeric ~250-kb segment of the 370-kb region.



 
Fig. 1. Genetic and physical characterization of the human tumor nested homozygous deletion region on 3p21.3 showing the steps leading to the identification of the cloned resident candidate tumor suppressor genes. Top, ideogram of the banding pattern of chromosome 3p with some genetic framework markers found in and flanking the deleted region. Next is shown a physical pulsed-field gel electrophoresis map of the homozygously deleted region and its flanking sites. The positions of the rare cutter restriction sites are indicated above the line and the sizes (in kb) of the corresponding NotI restriction fragments are given below the line. The physical map is followed by a diagrammatic representation of the overlapping homozygous deletions discovered in SCLC cell lines (NCI-H1450 in blue, NCI-H740 in magenta, and GLC20 in green) and a nested deletion in a breast cancer tumor sample and corresponding cell line (HCC1500 in brick). The sizes of the homozygous deletions, deletion overlaps, and the precisely mapped breakpoints are indicated. The minimum tiling cosmid contig covering the deleted region is shown below the deletions. The positions of the framework genetic markers are shown by downward arrows; the GenBank accession nos. for cosmid sequences are given under the lines, and the abbreviated cosmid/P1 clone numbering system given to the Genome Centers (e.g., Luca01, Luca02 ... Luca22, and P1 clone p3938) is given above the lines. The sequencing was done jointly by The Washington University (Luca12 to P3938) and The Sanger (Luca01 to Luca11) Genome Sequencing Centers using the high-redundancy shotgun procedure and is available from GenBank or the centers' FTP sites (see: ftp.sanger.ac.uk/pub/human/sequences/Chr_3/ and ftp://genome.wustl.edu/pub/gsc1/sequence/st.louis/human/). Finally, a line representing the ~630-kb assembled genomic sequence with the positioned resident genes and breakpoints defining critical regions is shown at the bottom. 
The genes are represented by pointed rectangles, indicating the orientations of transcription. The gene names are given above and the GenBank accession nos. below the rectangles. The * with downward arrows indicates the position of two genes (temporary names LUCA1.2 and LUCA2.3) on cosmids Luca01 and Luca02, respectively, whose sequencing is not yet completed. The 6 genes outside of the critical region in the centromeric portion are in magenta, the 8 genes in the combined breast and lung 120-kb critical region are in blue, whereas the remaining 11 genes still within the rest of the lung cancer nested homozygous deletions (an additional 250 kb) are all in green. The 6 genes exhibiting at least some point or small mutations have an "M" in their box. Probable pseudogenes and other in silico predicted genes whose status as bona fide genes we currently cannot confirm are not shown.

 
The genomic DNA cosmid/P1 clone contig (Fig. 1) covering the overlapping homozygous deletions in 3p21.3 is now sequenced almost to completion and is ~630 kb long.²⁶ Fig. 1 summarizes the genetic and physical map of the 3p21.3 region on which we have searched for a lung cancer TSG, including contig structure, and successive steps leading to the definition of the critical regions of ~370 kb, ~250 kb, and ~120 kb. We have systematically proceeded to identify and test all of the genes in the region for their candidacy as a TSG (Fig. 1; Table 1). In addition to the genes we detected by manual methods (such as screening cDNA libraries with cosmid DNAs and using the cosmids for exon capture), the genomic sequence allowed us to detect other gene candidates in silico. We used BLAST searches for ESTs, and gene prediction programs such as GRAIL, GENQUEST, and GENSCAN (27, 49) including our own integrated informatics tool, PANORAMA.²⁷ The total number of verified resident genes in the 630-kb region currently is 25, 6 of which are in the centromeric portion of the 630-kb region and outside of the overlap of SCLC or breast cancer homozygous deletions (Fig. 1; Table 1). For these 25 genes, we have reported mutational and expression analysis on 3pK, IFRD2/SM15, and SEMA3B and laid out the rough outline of some of the gene locations in the contig (18, 55, 56, 57). The gene prediction programs, particularly GENSCAN, identified all of the 25 verified genes and fairly accurately (>90%) predicted their ORFs and intron/exon structure (although in some cases distinct genes were combined into one gene).


 
Table 1 Candidate and verified genes identified in the ~630-kb 3p21.3 homozygous deletion region and status of their mutation analysis

 
In addition to the 25 genes, by an intensive informatics effort using the genomic DNA sequence, we found several potential genes (Table 1). These included two gene candidates (arbitrary names Gene29, cosmid LUCA2, and Gene19, cosmid LUCA10; Table 1), identified by one or two EST hits that did not have intron/exon structure, were not detected by the GENSCAN prediction program, did not have a significant ORF, and did not have a detectable mRNA signal on MTN blots. We believe these may represent genomic DNA contamination of the EST database. Another set of gene candidates was detected by GENSCAN (29) and GRAIL-EXP (49) gene prediction program analysis of genomic DNA (arbitrary names Gene22, Gene23, Gene24, Gene25, and Gene27; Table 1). Although the programs provide a predicted cDNA sequence, ORF, and intron/exon structure, thus far we have found no EST hits in GenBank, no mRNA signal on MTN blots, and no homology of the predicted protein to proteins in GenBank. Thus, at the present time we cannot verify these GENSCAN-predicted candidates as actual genes, and they are noted for future reference. These observations suggest that we have found all, or nearly all, resident genes in the ~630-kb sequence. This contention is likely true for the ~120-kb sequence (Fig. 1), which contains seven complete genes and a large part of the CACNA2D2 gene. In this critical region, the genes reside in the following order, each followed by its size and by the intergenic distance to the next gene: CACNA2D2 (80 kb), 5-kb intergenic gap; PL6 (4.5 kb), 1.5-kb intergenic gap; 101F6 (2.3 kb), 161-bp intergenic gap; NPRL2/g21 (3.3 kb), 1.8-kb intergenic gap; BLU (4.7 kb), 4.4-kb intergenic gap; RASSF1/123F2 (7.6 kb), 1.6-kb intergenic gap; FUS1 (3.3 kb), 4.5-kb intergenic gap; and HYAL2 (2.7 kb). With a total of ~18 kb of intergenic and ~17 kb of intronic sequence, this region is extremely gene dense and contains 20–25% of coding sequence.
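
As a quick arithmetic check on the figures quoted for the ~120-kb critical region, the following Python sketch sums the listed gene sizes and the distances between neighboring genes. It assumes that the distance quoted after each gene is the gap to the next gene in the list, which is consistent with the ~18 kb of intergenic sequence stated in the text. The variable names are illustrative and are not taken from the study.

```python
# (gene, size in kb within the region, gap to the next gene in kb)
# Values are copied from the text; None marks the last gene in the list.
CRITICAL_REGION = [
    ("CACNA2D2", 80.0, 5.0),
    ("PL6", 4.5, 1.5),
    ("101F6", 2.3, 0.161),
    ("NPRL2/g21", 3.3, 1.8),
    ("BLU", 4.7, 4.4),
    ("RASSF1/123F2", 7.6, 1.6),
    ("FUS1", 3.3, 4.5),
    ("HYAL2", 2.7, None),
]

# Total genomic span occupied by the genes themselves (exons plus introns).
genic_kb = sum(size for _, size, _ in CRITICAL_REGION)

# Total sequence lying between neighboring genes.
intergenic_kb = sum(gap for _, _, gap in CRITICAL_REGION if gap is not None)

total_kb = genic_kb + intergenic_kb

if __name__ == "__main__":
    print(f"genic:      {genic_kb:.1f} kb")
    print(f"intergenic: {intergenic_kb:.1f} kb")
    print(f"total:      {total_kb:.1f} kb")
```

Under that assumption the gaps total roughly 19 kb, in line with the ~18 kb of intergenic sequence quoted, and genes plus gaps come to roughly 127 kb, close to the ~120-kb size estimated for the segment.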

The location of the contig on 3p21.3 and its position relative to other 3p genes were determined by the presence in the ~630-kb sequence of several framework genetic markers mapped to this location (e.g., D3S1621 and D3S1568; Fig. 1), multiple radiation hybrid mapped sequence tagged sites (SHGC-11855 shown), and other linked genetic markers (see detailed marker information on genomic sequence contigs NT_002322, NT_000067, and NT_000069).²⁸ In addition, in situ hybridization [FISH and two-color FISH with DNA from cosmids LUCA1 (Z74618), LUCA8 (Z84495), and LUCA20 (AC004693)] located the contig to 3p21.3 and proved the orientation as shown in Fig. 1 (data not shown). We also performed radiation hybrid mapping with the TNG panel with selected markers for finer resolution of the location (not shown). In addition, using the contig sequence, we have determined intergenic distances, exon-intron structures, and the intron sizes of the resident genes, and ascertained the direction of transcription. These are all features that are immediately available by performing a BLAST analysis with our deposited cDNA sequences (Table 1) against the individual cosmid or assembled genomic sequences.²⁹

In total, the 25 genes occupy ~630 kb of genomic DNA, resulting in an average size of ~25 kb/gene, which agrees well with the size of an average gene (~30 kb) estimated for the whole human genome. However, distribution of these genes along the sequence is rather uneven, with the highest gene density (genes in blue in Fig. 1) in the ~120-kb region (average gene size, ~15 kb). Gene sizes varied dramatically, with the smallest gene size of 2.3 kb (g101F6) and the largest of ~140 kb (CACNA2D2). This large gene is interrupted by the centromeric breakpoints of both the HCC1500 breast cancer homozygous deletion (in cosmid LUCA6) and the SCLC GLC20 homozygous deletion (in cosmid LUCA10; Fig. 1). The intergenic distances also differed enormously, from 161 bp (between g101F6 and NPRL2/g21) to ~60 kb (between genes g20 and CACNA2D2). The intron sizes also varied dramatically, the smaller ones ranging from 50–1000 bp and the largest being 40–50 kb (in CACNA2D2).
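
The averages quoted here follow from dividing region length by gene count. A minimal Python sketch, using only the approximate region sizes and gene counts given in the text (the function name is illustrative):

```python
def average_kb_per_gene(region_kb: float, n_genes: int) -> float:
    """Average genomic span per gene, counting intergenic sequence too."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return region_kb / n_genes


# Approximate sizes and gene counts as stated in the text.
whole_contig = average_kb_per_gene(630, 25)      # all 25 genes, ~25 kb/gene
critical_region = average_kb_per_gene(120, 8)    # 8 genes, ~15 kb/gene
telomeric_segment = average_kb_per_gene(250, 11) # 11 genes, ~23 kb/gene

if __name__ == "__main__":
    print(f"whole contig:      {whole_contig:.1f} kb/gene")
    print(f"120-kb region:     {critical_region:.1f} kb/gene")
    print(f"250-kb segment:    {telomeric_segment:.1f} kb/gene")
```

Note that this measure is region length per gene, so it includes the sequence between genes and overstates the size of the transcription units themselves.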

It is worthwhile to examine what type of genes (if any) would have been missed by our manual and informatics prediction analyses. These omissions could include genes that lack GenBank matches, genes whose sequences would not be recognized by the available gene prediction programs, such as non-protein coding genes, and genes whose mRNAs have a very restricted pattern of tissue expression. Another approach to this gene saturation problem will be to compare the human sequence with the corresponding mouse genomic sequence. Because functional sequences such as transcriptional enhancers and mRNA-like noncoding RNAs are highly conserved in mammalian genomes, they might be readily detected in this comparison (58 , 59) . The effectiveness of this approach has been demonstrated recently (60 , 61) . The mouse BAC clone (#AC025353) covering this region was used for this alignment (data not shown).

The three lung cancer homozygous deletions identified an ~370-kb region extending from cosmid LUCA10 (Z75742; defined by the centromeric end of the GLC20 homozygous deletion) to P3938 (AC004814; defined by the telomeric end of the NCI-H740 homozygous deletion; Fig. 1). The nested homozygous deletion in breast cancer HCC1500 (10), which covered part of cosmid LUCA06 (Z84493) to part of cosmid LUCA13 (AC002455; Fig. 1), divided the 19 genes in the 370-kb region into two critical gene sets: 8 genes in a 120-kb segment extending from part of cosmid LUCA10 to part of cosmid LUCA13, and 11 genes in the telomeric portion (~250 kb, extending from cosmid LUCA13 to P1 clone P3938, AC004814). These deletions eliminated the 3pk/MAPKAP3 (U09578), CISH (AF132297, temporarily called Gene18 in our GenBank deposit), HEMK isoforms I and II (AF131220 and AF172244), Gene20 (AF188706), and two partially characterized genes [Gene28(Luca1.2) and Gene30(Luca2.3)], all lying in the centromeric portion of the contig (~260 kb of genomic DNA located in cosmids LUCA1, LUCA2, LUCA3, and LUCA4), from further consideration. In addition, expression and mutation analysis by us of 3pk (U09578) and by Sithanandam et al. (55) and Uchida et al. (62) of CISH (AF132297) provided other evidence excluding these genes as well.

Expression Analysis of the Candidate Tumor Suppressor Genes.
All of the remaining 19 genes were analyzed extensively by manual and computational methods. Northern analyses with cDNA probes for each gene using commercial poly(A)+ RNA MTN blots (Clontech) and a panel of lung cancer cell line RNAs revealed the sizes of the respective transcripts and patterns of expression in normal human tissues and lung cancer samples (Figs. 2 and 3; Table 2). Expression in Northern blots prepared from total or poly(A)+ RNA of 20–30 RNA samples from lung cancer cell lines, representing both SCLC and NSCLC, revealed, for many of the genes, levels of expression similar to normal lung. No abnormal transcript sizes suggestive of mutations were found. However, several genes showed reduced expression in the lung cancer lines, i.e., the CACNA2D2 gene was not expressed in 50% of the lines, the BLU gene was expressed in only 30%, the HYAL1 gene was expressed in <30% (Fig. 3), and SEMA3B and SEMA3F (19, 57, 63) were expressed in <50%. Thus, expression analysis in lung cancers identified these five genes as potential TSG candidates based on loss of expression in a sizable number of (but not all) lung cancers. Because a possible inactivation mechanism is tumor-acquired promoter hypermethylation (23), the methylation status of the CpG islands associated with the genes showing reduced or absent expression is currently under investigation.



 
Fig. 2. mRNA expression in normal human tissues using MTN blots of 15 of the 630-kb 3p21.3 homozygous deletion region resident genes. The RNA filters (#7759, #7760 from Clontech, Palo Alto, CA) contain 2 µg of poly(A)+ mRNA per lane for each tissue indicated: Lane 1, heart; Lane 2, brain; Lane 3, placenta; Lane 4, lung; Lane 5, liver; Lane 6, skeletal muscle; Lane 7, kidney; and Lane 8, pancreas. cDNA probes for the genes were labeled and hybridized as described in "Materials and Methods." Published examples of MTN expression for some of the other genes are given for 3pk/MAPKAP3 (55), CACNA2D2 (68), SEMA3B and SEMA3F (57), IFRD2/SM15 (56), and HYAL3 (79).

 


Fig. 3. mRNA expression in lung cancer cell lines of 14 of the genes resident in the 630-kb 3p21.3 homozygous deletion region. Replicate Northern blots were made using 20 µg of total RNA per lane for each of the samples; probes were labeled and hybridized as described in "Materials and Methods." Each of the blots was used for two to three different probes, with stripping of label between hybridizations. The lung cancer cell lines are shown above the panel, as well as one B lymphoblastoid cell line (BL5). As positive controls, the various replicate blots showed approximately equivalent loading by rRNA amounts and for various positive control probes (not shown). Note that genes PL6 and G15/RBM5 show approximately equal expression in most of the samples. As a negative control, note that NCI-H740 RNA, homozygously deleted for the entire region, provides a negative background signal. The size of the mRNA species for each gene is given on the right of the panel. For published examples of expression of several of the other genes in lung cancer cell lines, see 3pk/MAPKAP3 (55), CACNA2D2 (68), SEMA3B and SEMA3F (57), and IFRD2/SM15 (56). The tumor histologies of the lung cancer lines are: SCLC (H82, H146, H249, H524, H740, H1514, H1618, H2141, H2171, and H2227); adenocarcinoma (H358 [bronchioloalveolar], H838, H1742, and H2077); large cell (H460, H1155, and H1299); and mesothelioma (H290 and H2052; Ref. 24).

 

Table 2 mRNA expression of 3p21.3 genes in Multiple Tissue Northern (MTN) blots

 
Mutation Analysis of the Candidate Tumor Suppressor Genes.
Initially, we performed Southern blot analysis on ~100 genomic DNAs from our large panel of lung cancer cell lines representing both SCLC and NSCLCs (24) using cDNA or genomic probes representing each of the 25 genes in our search for other homozygous deletions or genomic DNA rearrangements (data not shown). We found only the homozygous deletions listed in Fig. 1 and an ~30-kb homozygous deletion in SCLC NCI-H524 involving most of the genomic sequence in cosmid LUCA13, interrupting gene FUS1 to gene HYAL1 (data not shown; see discussion below). Mutational analyses of the resident genes (summarized in Tables 1 and 3) were performed on lung cancer cDNAs or genomic DNAs by RT-PCR-SSCP or PCR-SSCP, respectively, followed by DNA sequencing of any altered bands detected. A large number of both SCLC and NSCLC cell lines and, for some genes, paired normal and tumor DNA samples were used. In addition, the coding sequences of the FUS1 and the BLU genes were sequenced completely in a large number of lung cancer samples, including paired tumor and normal tissue samples. The results for the eight-gene set in the 120-kb region show that the genes either had no mutations at all (CACNA2D2, PL6, 101F6, 123F2, and HYAL2), or the mutation rate was in the range of 5% (NPRL2/g21, BLU, and FUS1; Table 1). The same absence or low frequency of mutations was detected in the 11 genes lying in the more telomeric ~250-kb portion of the 630-kb contig, with HYAL1, FUS2, and SEMA3B each exhibiting a few mutations (Table 1). Many examples of the mutations detected in lung cancer cell lines are given in Table 3. Thus, this extensive mutational analysis (involving 1102 separate tumor sample/gene mutation tests; Table 1) did not pinpoint a strong candidate gene with a high frequency of mutation among either of the critical gene sets. This finding was unexpected and disappointing.
Because it is possible that genes not involved in tumorigenesis may show a low frequency of mutations in common tumors related to a "mutator phenotype" expressed by tumors (64, 65), the finding of a low mutation rate in a candidate TSG(s) must be regarded with caution. Accordingly, other lines of evidence besides finding frequent mutations are needed to rule out positionally defined candidate TSGs. These include analysis for tumor-acquired promoter hypermethylation, gene transfer into tumor cells with tests for suppression of the malignant phenotype, and disruption of the candidate genes in "knock-out" mice. In addition, we now also have to consider the newly recognized class of haploinsufficient TSGs, where the presence of only one wild-type allele facilitates tumorigenesis (see "Discussion").
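To make the "low frequency of mutation" statement concrete, the per-gene rates can be computed from the counts quoted in the gene annotations of this section (for example, three BLU mutations in 61 cell lines). The sketch below is illustrative only. It assumes that each reported mutation occurred in a different cell line, it uses only the cell line counts quoted in the text rather than the full data of Table 1, and the 10% cutoff is the "frequent" threshold stated in the abstract.

```python
# (mutations reported, lung cancer cell lines tested), as quoted in the
# gene annotations of this section. Assumption: one mutation per cell line.
MUTATION_COUNTS = {
    "NPRL2/Gene21": (1, 40),
    "BLU": (3, 61),
    "FUS1": (3, 79),
    "HYAL1": (2, 40),
    "FUS2": (4, 78),
}

# Threshold for a "frequent" mutation rate, taken from the abstract (>10%).
FREQUENT_THRESHOLD = 0.10


def mutation_rate(mutated: int, tested: int) -> float:
    """Fraction of tested samples that carry a sequence-altering mutation."""
    if tested <= 0:
        raise ValueError("number of tested samples must be positive")
    return mutated / tested


rates = {gene: mutation_rate(*counts) for gene, counts in MUTATION_COUNTS.items()}
frequent = [gene for gene, rate in rates.items() if rate > FREQUENT_THRESHOLD]

for gene, rate in rates.items():
    print(f"{gene}: {rate:.1%}")
print("genes above threshold:", frequent)
```

Under these assumptions every gene falls at or near the 5% range and none exceeds the 10% cutoff, which is why mutation frequency alone does not single out a candidate.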


Table 3 Examples of tumor cell lines bearing homozygous amino acid sequence altering mutations of candidate 3p21.3 TSGs

 
Informatics Analysis for Predicted Protein Functions and Orthologue Identification in Model Organisms.
Computational predictions of biochemical functions could also provide clues as to whether a particular gene could serve a tumor suppressor function by affecting cell growth and/or survival. Therefore, we next studied the biochemical functions and subcellular localization of the proteins encoded by the genes using a variety of computational tools. Information about protein function is a continuum that begins with the finding of homology between proteins within and between species, domain composition, functional motifs, and subcellular localization signals, extending ultimately to demonstration of function by biochemical analysis. Finding orthologous genes in model organisms (mouse, fly, worm, and yeast) permits the transfer of any functional annotation to the human gene in question; therefore, we performed an extensive search for orthologues of the resident candidate TSGs in these model organisms. The results of these computations are summarized and discussed in the annotations for each of the genes (see below). As expected, we found that all 19 genes have true murine orthologues discovered in mouse EST databases. These genes showed nearly 90–100% identity/similarity on the protein sequence level and about 80–90% on the mRNA sequence level (GenBank accession nos. are provided in the annotations). In the worm, highly likely orthologous pairs were identified for 14 of the 19 genes using stringent criteria of orthology, i.e., 30–50% amino acid sequence identity or similarity with >80% alignment of their entire amino acid lengths (45, 46). In the fly, highly likely orthologues were found for 10 of the 19 genes among the complete set (~14,000) of fly genes (footnote 30; Ref. 66). We noticed that orthologous pairs fall into three categories: common for both worm and fly, only present in the worm, and only present in the fly.
Yeast genes sharing ~50% similarity in common domains/features were found for two of the genes, i.e., PL6 and NPRL2/Gene21; probably only the Schizosaccharomyces pombe counterpart (NPRL2) of NPRL2/g21 should be considered a candidate orthologous gene. The availability of the complete DNA sequences of the yeast (50), worm (footnote 31), and fly (66) genomes makes it unlikely that we have missed any of the orthologous gene pairs for our TSG candidates in these three model genomes. We now provide annotations for each of the 19 genes found in the ~370-kb segment, starting with the genes in the smaller ~120-kb sequence (Fig. 1).
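The stringent orthology criterion quoted above (30–50% amino acid identity or similarity with >80% of the protein length aligned) can be expressed as a small check. This is a minimal sketch under stated assumptions, not the software used in the study: the function names are hypothetical, identity is counted over aligned (non-gap) columns, coverage is measured against the query protein only, and 30% is used as the lower identity bound. The toy alignment is invented purely to exercise the code.

```python
from typing import Tuple


def identity_and_coverage(aligned_query: str, aligned_subject: str,
                          query_length: int) -> Tuple[float, float]:
    """Percent identity over aligned columns and percent of the query aligned.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    aligned_pairs = 0
    identical = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            continue  # gap columns count toward neither identity nor coverage
        aligned_pairs += 1
        if q == s:
            identical += 1
    if aligned_pairs == 0 or query_length <= 0:
        return 0.0, 0.0
    return (100.0 * identical / aligned_pairs,
            100.0 * aligned_pairs / query_length)


def is_likely_orthologue(identity_pct: float, coverage_pct: float,
                         min_identity: float = 30.0,
                         min_coverage: float = 80.0) -> bool:
    """Apply the thresholds quoted in the text (assumed lower bounds)."""
    return identity_pct >= min_identity and coverage_pct > min_coverage


# Toy alignment (invented sequences, for illustration only).
identity, coverage = identity_and_coverage("MKTAYIAKQR", "MKSAY-AKHR", 10)
print(f"identity {identity:.1f}%, coverage {coverage:.1f}%:",
      is_likely_orthologue(identity, coverage))
```

For instance, the worm T24F1.3 protein described below (33% identity over 95% of the RASSF1/123F2 length) would pass this check, whereas a match of similar identity that covers only a single shared domain would not.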

The α2δ-2 calcium channel subunit gene, CACNA2D2, was discovered in silico both by finding EST matches with fragments of genomic sequence and by exons predicted by GENSCAN. The gene occupies ~140 kb of genomic space and is composed of at least 40 exons. It is expressed as a 5.5–5.7 kb mRNA. Three mRNA splice forms have been detected that code for two protein isoforms in several normal tissues. GenBank deposits AF040709 (mRNA isoform 3) and AF042792 (mRNA isoform 1) differ in the 5' untranslated region and encode the same amino acid sequence (protein isoform I), whereas AF042793 (mRNA isoform 2) differs in the 5' translated region and has a slightly different amino terminal amino acid sequence (protein isoform II). The expression of CACNA2D2 is reduced or absent in >50% of lung cancer cell lines, particularly NSCLCs. However, no mutations were detected in analysis of 60 lung cancer cell lines and 40 paired normal/SCLC tumor samples. The nucleotide sequence suggests that the gene encodes an auxiliary regulatory α2-δ subunit of calcium channels and joins the α2-δ-1 (previously A subunit) gene (67) as a new and second member of the α2-δ gene family. Three putative transmembrane helices predicted previously in the α2δ-1 protein (67) were also predicted in both protein isoforms of the CACNA2D2 gene with the SPLIT 35 program (40). In addition, protein isoform I of the CACNA2D2 gene has another membrane helix at the very amino terminus. Using the TMHMM program (41), all three α2δ proteins were predicted to span the membrane only once at the amino (α2δ-2 isoform I) or at the carboxy termini (α2δ-1 and α2δ-2 isoform II), favoring the single-transmembrane model for the α2δ subunit proteins, which was verified experimentally for the α2δ-1 protein (67).
A protein-binding VWA-like domain was discovered by the PFAM (37) program in the extracellular part at similar positions in all three α2δ subunit proteins (amino acid residues: 291–469 and 222–400 for α2δ-2 protein isoforms I and II, respectively, and residues 253–430 for the α2δ-1 protein). The VWA-like domain may facilitate the binding of the α2δ complex with the calcium channel α-1 pore forming subunit protein (67). The almost identical membrane topologies, similar domain structures, and posttranslational modifications of all three α2δ subunit proteins strongly support the identity of the new α2δ-2 gene as a member of the α2δ gene family. Through injection of CACNA2D2 cRNA into Xenopus oocytes, we have recently confirmed this predicted function experimentally: CACNA2D2 acts as a regulatory subunit of voltage-gated calcium channels able to augment the function of all three pore-forming units (68). BLAST (36) searches in the mouse EST database detected two different nonoverlapping EST clones (accession nos. AA000341 and AA008996), which showed 91 and 85% cDNA sequence identity with CACNA2D2 (residues 2925–3421 and 4989–5391), respectively. These EST sequences showed only limited homologies to the murine α2δ-1 gene splice forms, indicating that they represent true orthologous sequences of the human CACNA2D2 gene. This was further corroborated by protein alignment of the 86-amino acid ORF encoded by mouse EST AA000341, which was 96% identical to the CACNA2D2 isoform I protein (amino acid residues 922-1005).
The worm genome also contains two α2δ genes; by stringent criteria of orthology (i.e., ~50% identity/similarity with >80% alignment of their entire amino acid sequence) the worm gene, T24F1.6, appears to be the orthologue of CACNA2D2, whereas the second worm α2δ gene, UNC-36 (accession no. P34374), is the orthologue of the α2δ-1 gene (67). The UNC-36 phenotypes do not affect growth, and no phenotypes have yet been reported for the T24F1.6 locus. The fly proteome (~14,000 proteins; Ref. 66) contains three α2δ proteins (accession nos. AAF53505, AAF53476, and AAF58335), of which the first is a likely orthologue of α2δ-1 (44), the second is a likely orthologue of the α2δ-2 gene, and the third is a likely orthologue of a still-uncloned human gene; all three have the VWA_DOMAIN. The yeast proteome (50) contains only one ion channel gene and appears to have no orthologues for either of the α2δ genes. Despite the lack of mutations, the absence of CACNA2D2 expression in many but not all NSCLCs, together with high CACNA2D2 expression in normal lung, makes CACNA2D2 an excellent candidate TSG that needs testing of function in tumor cells and study of acquired promoter hypermethylation as a mechanism of inactivation of gene expression.

The PL6 gene was discovered manually by probing Northern blots with genomic fragments. The gene occupies 4.5 kb of genomic space, is composed of two exons, and is expressed as a 2.2-kb mRNA in many normal human tissues including lung. The expression of PL6 is slightly reduced in some SCLC lines, and the gene is abundantly represented in the human and mouse EST databases. No mutations were detected in 38 cell lines and 40 paired normal/SCLC tumor samples. By sequence analysis, PL6 encodes an integral plasma membrane protein [PSORT program (39)] with six [SPLIT program (40)] or seven to eight [TMHMM program (41)] transmembrane helices. The predicted cytoplasmic portion of the protein (the last 103 amino acids, residues 249–351) contains an OMPdecase domain [residues 274–298; PFAM (39)] that may involve PL6 in protein-protein interactions and a bipartite NLS (residues 282–299; Ref. 39) that may guide it to the nucleus. The mouse orthologue was discovered in an EST (accession no. W96860), sequenced (our accession no. AF134238), and shown to be 92% identical on protein and 87% on cDNA levels. The worm F11A10.3 gene encodes a multidomain protein that aligns with PL6, and the aligned region contains the NLS and the OMPdecase domains, suggesting that it is the orthologue of PL6. The fly gene CG9536 product (450 residues; accession no. AAF52388) is a likely orthologue of PL6 (48% similarity over the first 306 residues of PL6) and is also an integral membrane protein with seven to eight transmembrane helices. It has two HMW kininogen domains (residues 325–349 and 353-3760) but no NLS and OMPdecase domains. The yeast gene Yol107w product has substantial homology with PL6 but does not have the NLS and the OMPdecase domain. The absence of mutations and robust expression of PL6 in most lung cancers suggest that PL6 is an unlikely candidate TSG.

The 101F6 gene was discovered manually by screening arrayed cDNA libraries with cosmid LUCA12 DNA. The gene space of 3.2 kb contains four exons encoding a 1.5-kb mRNA. The gene is expressed in many normal tissues including lung, is highly expressed in SCLC and NSCLC cell lines, and is abundantly represented in the EST databases. No mutations were detected in 40 cell lines and 40 paired normal/SCLC tumor samples. By sequence analysis, 101F6 encodes an integral plasma membrane protein [PSORT program (39)] with six [SPLIT program (40) and TMHMM program (41)] transmembrane helices with both termini in the cytoplasm. No other known domains or significant motifs were detected. The mouse orthologue was discovered in the mouse EST database (accession nos. AA285935, AA198541, and AA198960), sequenced (our accession no. AF131206), and shown to be 95% identical on the protein and 85% on the cDNA sequence level. No orthologous pairs were detected in the fly (66), worm, and yeast proteomes (50). The absence of mutations and robust expression of 101F6 in most lung cancers suggest that 101F6 is an unlikely candidate TSG.

The NPRL2/Gene21 gene was discovered in silico by finding both EST matches and GENSCAN-predicted exons. The gene space of 3.3 kb contains 11 exons coding for a 1.5-kb mRNA with multiple splice isoforms that are expressed in many normal tissues including lung and testis, and the gene is abundantly represented in the EST databases. NPRL2/Gene21 is well expressed in SCLC and NSCLC lines except for the SCLC line NCI-H1514. A frameshift mutation producing a stop codon was detected in 1 of 40 lung cancer cell lines. Sequence analysis shows that NPRL2/Gene21 encodes a soluble protein that has a bipartite NLS (residues 62–79) and a protein binding domain, granulin (residues 86–98), predicted by PFAM (35, 37). The mouse orthologue was discovered in mouse EST databases (accession nos. AI037102, AA764527, AA709972, and W64225), sequenced (our accession no. AF131206), and shown to be 97% identical on protein and 90% on cDNA sequence levels. True orthologues were identified in yeast (the NPR2 gene in Saccharomyces cerevisiae, GenBank accession no. P39923, and the hypothetical Mr 47,000 protein in S. pombe, accession no. Z99163); in the fly (66), the CG9104 gene product (accession no. AAF48677) with 65% similarity over the whole length of the NPRL2/g21; and in the worm (accession no. U61949) proteome databases. However, only the mouse orthologue contains the bipartite NLS (residues 62–79) and the granulin domain (residues 86–98). NPRL2/Gene21 mRNA is expressed in most lung cancers. The mutations in NPRL2/Gene21, particularly the stop mutations, indicate the need for further study of this gene as a candidate TSG.

The BLU gene was discovered manually (and serendipitously) using PCR primers (kindly provided by B. Vogelstein, Johns Hopkins, Baltimore, MD) to screen for the presence of the β-catenin gene (at the time recently assigned to chromosome region 3p21) in our cosmid contig. Although a PCR product was identified, DNA sequence analysis showed no sequence relationship of the product to β-catenin. This PCR product was used as a probe that identified an mRNA on Northern blot analysis, which then led to the isolation of the full BLU cDNA by library screening. The gene space of ~4.5 kb contains 11 (testis version) or 12 (lung version) exons coding for a 2-kb, alternatively spliced mRNA that is well expressed in lung and testis but not expressed in any of the other human tissues tested. The EST databases contain a moderate number of hits, mostly from lung and testis cDNA libraries. The testis isoform contains 11 exons because of a complex selection of an alternative acceptor site. The testis-specific protein isoform contains a different amino acid sequence between residues 199 and 234 as compared with the lung-specific isoform; this change results in the loss of one of three PKC phosphorylation sites (residues 229–231). The expression in SCLC and NSCLC cell lines is reduced or virtually undetectable in 70% of tested lines. Three missense mutations were discovered in a sample of 61 lung cancer cell lines. The BLU protein is likely a soluble cytoplasmic protein and shares 30–32% identity over a stretch of 100–112 amino acids (residues 334–437 or 318–430) with proteins of the MTG/ETO family of transcription factors (69) and the suppressins (70) that may regulate entry into the cell cycle and suppress growth of colon carcinoma cells. The "Zn knuckle" motif involved in specific protein-protein interactions is part of this domain and is present in many proteins (footnote 32). No orthologous pairs were found in the worm and yeast proteomes (50).
However, the fly genome (66) contains a true orthologue of BLU: the CG11253 gene product (accession no. AAF49850) is of similar size (451 residues), has 49% amino acid sequence similarity over the whole length of BLU, and also has a MYND finger domain (residues 412–448). Several other fly, worm, and S. pombe proteins share 35% identity with the MTG/ETO domain. The mouse orthologue was discovered in mouse EST databases (accession nos. AI595515 and AI429164), sequenced (our accession no. AF123386), and shown to be 89% identical on protein and 87% on cDNA levels. The loss of expression in most lung cancers and the occurrence of a few mutations make BLU an attractive TSG candidate requiring further functional and promoter methylation status studies.

The RASSF1/123F2 gene was discovered manually by screening gridded cDNA libraries with cosmid LUCA12 DNA. The gene space of 7.6 kb contains 5 exons coding for 2-kb, alternatively spliced mRNAs ("short" and "long" forms, 123F2SF and 123F2LF, that should now be referred to by Human Genome Organization-approved nomenclature as RASSF1C and RASSF1A, respectively) that are well expressed in all analyzed human tissues including lung. The RASSF1C/123F2SF but not the RASSF1A/123F2LF mRNAs are well expressed in most lung cancer cell lines. The mRNA is well represented in EST databases from normal and tumor tissues. Guided by GENSCAN predictions, the RASSF1A/123F2LF splicing form was discovered using RT-PCR on mRNA; it differs in amino acid sequence at the NH2 terminus, giving a total of 340 amino acids compared with 270 amino acids for RASSF1C/123F2SF. The amino acid sequence of RASSF1A/123F2LF contains a predicted DAG binding domain also found in the related gene NORE1 but not found in the RASSF1C/123F2SF cDNA sequence. RASSF1A/123F2LF mRNAs also come in multiple tissue-related splicing forms, with slight differences in amino acid sequence, including forms for lung (RASSF1A, AF102770), heart (RASSF1D, AF102771), and pancreas (RASSF1E, AF102772). No mutations were detected in 40 paired normal/tumor (SCLC/NSCLC) DNA samples (studied for RASSF1C/123F2SF and the RASSF1/123F2 common region) and in 38 lung cancer cell lines (RASSF1C/123F2SF and RASSF1A/123F2LF, all regions). The RASSF1/123F2 protein is a soluble cytoplasmic protein that contains a Ras association domain (residues 124–218) discovered by the SMART (38) and PFAM (37) programs. Although not all Ras association domains bind RasGTP, the Ras association domain in the mouse paralogue of RASSF1/123F2, NORE1, was found to bind RasGTP (71).
The NORE1 protein also contains the PKC-C1 and DAG/PE domains, which are found in the RASSF1A/123F2LF predicted protein but not in the RASSF1C/123F2SF protein. Recently, the Kastan group has identified RASSF1/123F2 amino acid sequence (common to both the RASSF1C/123F2SF and RASSF1A/123F2LF proteins) as a potential phosphorylation target for ataxia telangiectasia mutated (72). The mouse orthologue of RASSF1/123F2 was discovered in mouse EST databases (accession nos. AA543890, AA161846, and AA466998), sequenced (our accession no. AF132851), and shown to be 97% identical on protein and 88% on cDNA sequence levels. In contrast, the human orthologue of the mouse NORE1 and the rat MAXP1 (accession no. AF002251) genes is present in a single human EST (accession no. AA362184). Thus, RASSF1/123F2 is part of the same gene family as (but not the orthologue of) NORE1 and the rat gene MAXP1. The worm gene, T24F1.3 (accession no. Z49912), encodes a 615-amino acid hypothetical protein that shares 33% identity and 53% similarity over 95% of the length of the RASSF1/123F2 protein. The T24F1.3 protein contains, in the portion shared with RASSF1/123F2, the Ras association domain (residues 396–496) and, in addition, a PH domain (residues 1–53) and PKC-C1, DAG/PE binding domains (residues 164–214), which are found in the RASSF1A/123F2LF predicted protein. The fly (66) and yeast (50) proteomes do not contain a gene with substantial homology to 123F2. The absence of expression of RASSF1A/123F2LF in many lung cancers makes this isoform an attractive candidate for further promoter hypermethylation and tumor-suppressing functional studies. In fact, recent studies by us have shown that RASSF1A/123F2LF promoter region CpG islands undergo tumor-acquired hypermethylation associated with loss of expression, and that forced re-expression of RASSF1A/123F2LF leads to suppression of the malignant phenotype (footnote 33).

The FUS1 gene was discovered manually by screening cDNA libraries with a genomic fragment from the area of cosmids LUCA12 and LUCA13 that showed sequence conservation by Southern blot hybridization and that was isolated as the fusion junction (FUS = "fusion") of the ends of an ~30-kb homozygous deletion in SCLC NCI-H524 linking LUCA12 with LUCA13 sequences. The gene space of 3.3 kb contains three exons coding for a 1.8-kb mRNA that is well expressed in all analyzed human tissues including lung and in 20 lung cancer cell lines. The mRNA is well represented in EST databases from normal and tumor cells. Three mutations leading to truncated products were discovered in 79 lung cancer cell line DNAs. The FUS1 protein (110 amino acids) is probably a soluble cytoplasmic protein with a high pI of 9.69; no domains or known motifs were detected by the SMART (38) or PFAM (37) programs. The mouse orthologue was discovered in mouse EST databases (AA867009, AA473614, and AA672013), sequenced (our accession no. AF123387), and shown to be 93% identical on the protein level and 87% on the cDNA level. The fly proteome (66) does not contain a gene with substantial homology to FUS1. However, the worm gene, C09E9.1, shows 41% identity on global alignment and 43% identity over 83% of the FUS1 protein length and should be considered a candidate orthologue of FUS1. This small worm protein (123 amino acids) is predicted to have a bipartite NLS (residues 84–101) and weak similarity with DNA-directed RNA polymerase subunit A' (accession no. P31813). The mutations found in FUS1 make it an attractive candidate for further functional TSG studies.

The HYAL2 gene, along with HYAL1, was discovered manually by screening cDNA libraries with a genomic fragment from LUCA13 conserved across species in Southern blotting. Because these were the first two genes we isolated in our positional cloning effort, they were initially given the working names LUCA1 and LUCA2 (see GenBank deposits). With the discovery of their function as hyaluronidases, they should now be referred to as HYAL1 and HYAL2, and the LUCA1 and LUCA2 names should be reserved for future discovery of a lung cancer functional TSG. The gene space of 2.8 kb contains three exons that encode a 2-kb mRNA that is well expressed in all analyzed human tissues including lung and in lung cancer cell lines, except SCLC line NCI-H524, because of a small (~30 kb) homozygous deletion/rearrangement (footnote 34). HYAL2 is abundantly represented in EST databases from normal and tumor tissues. No mutations were detected in 40 lung cancer cell lines tested. The HYAL2 protein is a member of a large family of hyaluronidases (EC3.2.1.35), and in fact the expressed recombinant protein was shown to have enzymatic activity (73). PSORT predicts a signal peptide and cell surface and lysosomal sublocalizations (37, 39). Similarly, SMART predicts a signal peptide and a Ca2+ binding epidermal growth factor-like domain (residues 365–440; Ref. 38). Recently, the mouse orthologue (AJ000059) was cloned and mapped to the syntenic region of mouse chromosome 9 between the microsatellite markers D9Mit183 and D9Mit17 (74). The worm gene, T22C8.2, encodes a similar size protein of 458 amino acids, shows 32% global identity and 50% similarity with the HYAL2 protein, and is predicted to be an orthologous protein by the Orthologue program (48). The fly (66) and yeast (50) proteome databases do not have any members with homology to the hyaluronidase family of proteins.

HYAL1 was discovered along with HYAL2, manually by screening cDNA libraries with a conserved genomic fragment from cosmid LUCA13. It is another hyaluronidase with amino acid sequence homology to HYAL2 and HYAL3 (see below). The gene space of ~3.5 kb contains three exons coding for a 2.6-kb mRNA that is well expressed in all analyzed human tissues, including lung, and is abundantly represented in EST databases from normal and tumor tissues. However, it is not expressed in 18 of 20 lung cancer cell lines. Two missense mutations were detected in 40 lung cancer cell lines. The HYAL1 protein is a member of a large family of hyaluronidases (EC3.2.1.35) and in fact was shown to have enzymatic activity (accession nos. U03056 and U96078.1). Triggs-Raine et al. (75) identified two mutations in the HYAL1 alleles of a patient with a newly described lysosomal disorder, mucopolysaccharidosis IX: a mutation that introduces a nonconservative amino acid substitution (Glu268Lys) in a putative active site residue, and a complex intragenic rearrangement, 1361del37ins14, which results in a premature termination codon. They reasoned that the mild phenotype engendered by these mutations was the result of redundancy resulting from the three tandemly located hyaluronidases HYAL1, HYAL2, and HYAL3 (discussed below). Thus far, no increased incidence of cancer has been reported in these kindreds. PSORT predicts a signal peptide and cell surface and lysosomal sublocalizations (39). Similarly, SMART predicts a signal peptide and a visible Ca2+ binding epidermal growth factor-like domain (residues 357–430; Ref. 38). Recently, the mouse orthologue was cloned and shown to map to the syntenic mouse chromosome 9 region (accession no. AF011567; Ref. 76). The worm gene, T22C8.2, encodes a similar size protein of 458 amino acids, shows 31% global identity and 46% similarity with the HYAL1 protein, and contains the same domains. It is predicted to be an orthologous protein by the Orthologue program (48).
The fly (66) and yeast (50) proteome databases do not have a member homologous to the hyaluronidase family of proteins. The absent expression and occurrence of mutations make HYAL1 an attractive candidate for future promoter methylation and TSG functional studies.

The FUS2 gene was discovered manually by screening cDNA libraries with a genomic fragment occurring in cosmid LUCA14 that showed conservation in Southern blot cross-species hybridizations. FUS2 was also present in the fusion junction genomic DNA clone isolated from the NCI-H524 30-kb homozygous deletion; although it was not involved or rearranged in this deletion, it was given the "FUS" working name at the time of its isolation. The gene space of ~3.5 kb contains an intronless, single-copy gene (accession no. AF040705) coding for a 1.9-kb mRNA expressed in normal human tissues including lung. However, an alternatively spliced form (accession no. AF040706) with one intron exists that results in the same predicted amino acid sequence. The alternatively spliced form contains an intron in the 5' untranslated region, whereas the other form is intronless. The mRNA is well represented in EST databases from normal and tumor tissues. Four FUS2 missense mutations were detected in 78 lung cancer cell lines. The FUS2 protein was predicted to be a soluble nuclear protein [predicted by PSORT (39)] with interesting domains and motifs. The SMART (38) and PFAM (37) programs predicted an acetyltransferase (GNAT) domain (residues 66–189) and a proline-rich domain (residues 239–262) that overlaps (residues 234–249) with the Wilms' tumor protein signature. A Src homology 2 domain (residues 240–250) was detected by the BLOCKS (77) program, whereas the EMOTIF (42) program detected a ZP motif (residues 25–32) and a eukaryotic thiol (cysteine) protease signature (residues 180–188), which may explain the suggested weak similarity to furin-like proteases (accession no. AAC02732). The presence of these domains raises the intriguing possibility that FUS2 may be directly involved in nuclear activities. However, Zegerman et al. (78) demonstrated recently that FUS2 functions as an N-acetyltransferase using a ping-pong mechanism with a specificity for substrates and is a soluble cytoplasmic protein.
The worm protein, C56G2.15, shows 32% identity and 65% similarity on global alignment and should be considered a true orthologue of the FUS2 gene. As expected, it also contains all but the ZP predicted protein domains and is predicted to be a nuclear protein. Interestingly, the worm gene contains three small introns, in contrast to the one-intron or intronless forms of the human FUS2 gene. The mouse orthologue was discovered in mouse EST databases (accession nos. AA051756, AA051686, AI425576, and AA833145), sequenced, and shown to be 69% identical on the protein level and 87% on the cDNA level (accession no. AF172275). The mouse mFUS2 protein contains an additional 28-amino acid stretch and, similar to the human protein, is predicted to be a nuclear protein by the PSORT program (39). PFAM (37) and ProfileScan (43) both predict an acetyltransferase (GNAT) domain (residues 92–217) and a proline-rich domain (residues 267–291). The Flybase (66) contains several ESTs (accession nos. AI064351, AI109425, and AI404849), which could be assembled into a partial cDNA coding for a 168-residue protein that is 39% identical and 60% similar to the human FUS2 protein and is predicted to have an acetyltransferase (GNAT) domain (residues 13–138). However, the fly proteome (66) does not have a true orthologue of FUS2, only several proteins with a GNAT domain. The occurrence of mutations and the demonstration of its biochemical activity make FUS2 an attractive candidate for future TSG functional studies.

The HYAL3 gene was discovered in silico through EST matches and its sequence relationship to the HYAL1 and HYAL2 genes. It occupies 5.5 kb of genomic space and codes for a ~2.0-kb mRNA composed of two or three coding exons, which is expressed in several human tissues including lung and testis and is well represented in EST databases. The protein belongs to the hyaluronidase family of enzymes (EC 3.2.1.35) and thus represents the third member of this family in the region. Phylogenetically, it is closer to the worm hyaluronidase gene, T22C8.2, than to the other members of the human family. No mutations were found in 40 lung cancer cell lines. No mouse orthologous sequences were found in the databases as of April 2000. HYAL3 RNA was not expressed in any lung cancer cell lines (data not shown); however, it has a very restricted pattern of expression in normal tissues and was not found by Csoka et al. to be expressed in normal lung (79). The lack of mutations and the absence of expression in normal lung make HYAL3 a less attractive TSG candidate.

The IFRD2/SKMC15/SM15 gene was discovered experimentally by screening cDNA libraries with conserved genomic fragments (56). GenBank refers to SKMC15/SM15 as IFRD2, and thus we will use the terminology IFRD2/SM15. The gene space is ~6 kb and codes for a ~4-kb mRNA composed of 12 exons, expressed in several human tissues including lung (56). It is well represented in EST databases from normal and tumor cells. The IFRD2/SM15 protein is a soluble nuclear protein as predicted by PSORT (39) and contains a bipartite NLS at residues 115–132 predicted by ProfileScan (43). PFAM (37) identified one Armadillo/β-catenin-like repeat (residues 249–288), which suggests the possibility of involvement in APC signaling. No mutations were found in 63 lung cancer cell lines (56). The mouse orthologue was discovered in ESTs (accession no. W65790), sequenced, and shown to be 93% identical at the protein level and 87% identical at the cDNA level. This true mouse orthologue of IFRD2/SM15 is different from the mouse gene mIFRD1/PC4, which has its own human orthologue located on chromosome 7q22–31 (80). Interestingly, mIFRD1/PC4 and its human orthologue, IFRD1, are not localized in the nucleus and probably are membrane proteins. IFRD2/SM15 and IFRD1/PC4 have different patterns of expression during mouse development (47, 80). The relation of IFRD2/SM15, and probably of other members of the family (PC4 and IFRD1), to the IFNs is not well supported. The slightly shorter worm protein, F58B3.6 (accession no. Z73427), shows 36% identity and 52% similarity to the SM15 protein on global alignment and should be considered a potential orthologue. The fly gene CG3098 product (accession no. AAF51186) is shorter (324 residues), has 43% similarity to 277 residues of SM15, has an NLS (residues 93–110), and should be considered a potential orthologue. The expression of IFRD2/SM15 and lack of mutations make it a less attractive TSG candidate.

The SEMA3B/SEMA A(V) gene was discovered experimentally by using DNA fragments from cosmid LUCA14 to screen cDNA libraries and for capture of exons (57). The correct nomenclature for this member of the semaphorin family is SEMA3B [previously referred to as SEMA-A(V)]. It is composed of 17 exons spread over 8–10 kb of genomic space coding for a 3.4-kb mRNA that is expressed in several normal tissues including lung and testis and is not expressed at all in 12 SCLC lines (Fig. 2; Ref. 57). It is well represented in EST databases from normal and tumor tissues. Three missense mutations were found in 39 lung cancer cell lines; all mutations were in NSCLCs. The mouse semaphorin A gene (accession no. X85990) is most likely the mouse orthologue of the SEMA3B gene (86% identity and 89% similarity on the protein level on global alignment; Ref. 48). Several mouse EST clones (accession nos. AI553114, AA518074, and AA466386) when translated show 80–94% identity. The worm genome contains three semaphorin genes, of which the CeSema gene (accession no. U15667) shows 33% identity and 49% similarity over the whole length of the CeSema protein and could be considered an orthologous gene (48). The fly proteome (66) contains seven semaphorin proteins, of which the product of the Sema-2a gene (accession no. AAF57990) is of similar size, predicted to be secreted, has a similar domain structure, and should be considered a potential candidate orthologue of SEMA3B. The SEMA3B protein is predicted by PSORT (39) to be an extracellular secreted protein. SMART (38) and PFAM (37) programs identify a signal peptide (residues 1–25), a PFAM:SEMA domain (residues 55–497), and one IGc2 domain (residues 587–646). Interestingly, the PFAM:SEMA domain is also present in the extracellular part of the MET and RON oncoproteins belonging to the MET family of receptor tyrosine kinases, as discovered by the PFAM program (37).
Thus, it would be reasonable to test the hypothesis that interaction of the SEMA3B and SEMA3F (see below) proteins with these oncoproteins may disrupt the activation of MET and RON and therefore convey a negative growth signal. The loss of expression and the occurrence of mutations make SEMA3B an attractive candidate for methylation and TSG functional analysis.
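
The mutation counts quoted for the genes described so far can be compared with the ">10%" cutoff for a frequent mutation rate stated in the abstract. The following is a minimal illustrative sketch, not part of the original analysis. It assumes that each reported mutation occurred in a different cell line, so that the number of mutations can stand in for the number of mutated lines; the function and variable names are hypothetical.

```python
# Counts quoted in the text: (mutations reported, lung cancer cell lines screened).
# Assumption: each reported mutation occurred in a different cell line.
SCREENED = {
    "FUS2": (4, 78),
    "HYAL3": (0, 40),
    "IFRD2/SM15": (0, 63),
    "SEMA3B": (3, 39),
}

# The ">10%" cutoff for a "frequent" mutation rate, taken from the abstract.
FREQUENT_THRESHOLD = 0.10


def mutation_frequency(mutated, total):
    """Return the fraction of screened cell lines that carry a mutation."""
    if total <= 0:
        raise ValueError("total must be a positive number of cell lines")
    return mutated / total


def summarize(counts, threshold=FREQUENT_THRESHOLD):
    """Map each gene to (frequency, True if the frequency exceeds the threshold)."""
    summary = {}
    for gene, (mutated, total) in counts.items():
        freq = mutation_frequency(mutated, total)
        summary[gene] = (freq, freq > threshold)
    return summary


if __name__ == "__main__":
    for gene, (freq, frequent) in summarize(SCREENED).items():
        label = "frequent" if frequent else "not frequent"
        print(f"{gene}: {freq:.1%} ({label})")
```

Under this assumption FUS2 comes out at about 5% and SEMA3B at about 8%, both below the cutoff, which agrees with the statement in the abstract that none of the genes tested showed a frequent mutation rate.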

The GNAI2 gene was discovered and cloned 12 years ago as part of studies on G proteins (21). GNAI2 was mapped to 3p21.3 by us and others and localized to the central part of the 370-kb region (Fig. 1; Refs. 9, 17, 18, and 20). It is composed of 8 exons spread over ~22 kb of genomic space. The ~2.5-kb mRNA is well expressed in normal tissues and lung cancer cell lines and is well represented in EST databases. No mutations were found in 34 lung cancer cell lines. The product is a G protein localized to the endoplasmic reticulum. PFAM (37) predicts a G-α domain (residues 6–354) and an arf domain (residues 157–307). The α GTPase function was established experimentally. The mouse orthologue (accession nos. RGMSI2 and P08752) is 98% identical and 99% similar at the protein level and 96% identical at the cDNA level. The worm orthologue (accession no. P51875) is 67% identical, and the fly orthologue (accession no. P20353) is 76% identical on the protein level. The newly predicted fly gene product G-oα47A (accession no. AAF58790) is identical in size, 72% similar in amino acid sequence, and should be considered a potential orthologue of GNAI2. The lack of mutations and continued expression of GNAI2 in lung cancers suggest it is an unlikely TSG candidate.

The G17 gene was discovered experimentally by cDNA selection onto cosmid LUCA17, which was then used to screen cDNA libraries. The gene space of 17 kb encodes a 3-kb mRNA composed of 18 exons. The mRNA is expressed in several human tissues including lung and is well represented in EST databases from normal and tumor tissues. No mutations were found in 38 lung cancer cell lines, and Gene17 was expressed in many lung cancers. The product is predicted to function as a plasma membrane amino acid transporter by homology to ABC transporters. It contains 10–11 transmembrane helices [predicted by SPLIT (40) and TMHMM (41) programs] and an aromatic amino acid permease-2/xan_ur_permease domain (residues 67–455) predicted by ProfileScan and PFAM servers. The mouse orthologue was discovered in several EST clones (accession nos. AI098786, AI048261, and AI466351), sequenced (our accession no. AI098786), and shown to be 90% identical at the cDNA level and 97% identical at the protein level. The yeast (50), worm, and fly (66) proteomes contain several amino acid transporter genes of similar size. The lack of mutations and continued expression make Gene17 an unlikely TSG candidate.

The GNAT1 gene was cloned 10 years ago (Ref. 22; accession no. X15088) and encodes the transducin protein isolated from the eye. We and others positioned the gene in 3p21.3, i.e., in the homozygous deletion overlap region close to the GNAI2 gene (Fig. 1; Refs. 18 and 20). The gene space of ~3.5 kb contains seven exons and encodes a 1.5-kb mRNA expressed abundantly in the retina, fetal heart tissues, and T-cell lines. GNAT1 was not expressed in lung or in lung cancer cell lines. No mutations were found on analysis of genomic DNA from 35 lung cancer cell lines. The mouse orthologue (48) was cloned (accession no. P20612) and is 100% identical on global protein alignment. The worm (accession no. P51875) and fly (accession no. P20353) proteomes contain similarly sized G-proteins with 50 and 60% identity, respectively, with as yet unknown function. The newly predicted fly gene product G-oα65A (accession no. AAF50626) is identical in size, 86% similar, and should be considered a true orthologue of GNAT1. The transducin protein is an α1 G-protein subunit localized in the endoplasmic reticulum. PFAM (37) predicts a G-α domain (residues 2–349) and an arf domain (residues 161–342). The restricted tissue distribution of expression of GNAT1 and lack of mutations make GNAT1 an unlikely TSG candidate.

The SEMA3F/SEMA-IV/SEM IIIF gene, the second semaphorin gene in the region, was identified experimentally and cloned independently by </