Regular Articles
Hamon Center for Therapeutic Oncology Research, University of Texas Southwestern Medical Center, Dallas, Texas 75390-8593 [J. D. M.], and Laboratory of Immunobiology, National Cancer Institute, Frederick Cancer Research and Development Center, Frederick, Maryland 21702 [M. I. L.]
ABSTRACT
~630-kb lung cancer homozygous deletion region harboring one or more tumor suppressor genes (TSGs) on chromosome 3p21.3. This location was identified through somatic genetic mapping of tumors, cancer cell lines, and premalignant lesions of the lung and breast, including the discovery of several homozygous deletions. The combination of manual molecular methods and computational predictions permitted us to detect, isolate, characterize, and annotate a set of 25 genes that likely constitutes the complete set of protein-coding genes residing in this ~630-kb sequence. A subset of 19 of these genes was found within the ~370-kb region of overlap of the deletions. This region was further subdivided by a nested 200-kb breast cancer homozygous deletion into two gene sets: 8 genes lying in the proximal ~120-kb segment and 11 genes lying in the distal ~250-kb segment. These 19 genes were analyzed extensively by computational methods and were tested by manual methods for loss of expression and for mutations in lung cancers to identify candidate TSGs from within this group. Four genes showed loss of expression or reduced mRNA levels in non-small cell lung cancer (CACNA2D2/α2δ-2, SEMA3B [formerly SEMA(V)], BLU, and HYAL1) or small cell lung cancer (SEMA3B, BLU, and HYAL1) cell lines. We found six of the genes to have two or more amino acid sequence-altering mutations: BLU, NPRL2/Gene21, FUS1, HYAL1, FUS2, and SEMA3B. However, none of the 19 genes tested for mutation showed a frequent (>10%) mutation rate in lung cancer samples. This led us to exclude several of the genes in the region as classical tumor suppressors for sporadic lung cancer. On the other hand, the putative lung cancer TSG in this location may either be inactivated by tumor-acquired promoter hypermethylation or belong to the novel class of haploinsufficient genes that predispose to cancer in a hemizygous (+/-) state but do not show a second mutation in the remaining wild-type allele in the tumor. We discuss the data in the context of novel and classic cancer gene models as applied to lung carcinogenesis. Further functional testing of the critical genes by gene transfer and gene disruption strategies should permit the identification of the putative lung cancer TSG(s), LUCA. Analysis of the ~630-kb sequence also provides an opportunity to probe and understand the genomic structure, evolution, and functional organization of this relatively gene-rich region.
INTRODUCTION
~630-kb clone contig was sequenced jointly by the Washington University (footnote 5) and Sanger (footnote 6) Human Genome Sequencing Centers. In more recent work, we placed the putative 3p21.3 TSG(s) in a ~120-kb segment defined by a homozygous deletion in a breast cancer specimen that was nested within the three small cell lung cancer homozygous deletions (10). In parallel with these genetic and physical studies, we have been constructing a map of transcript sequences with the aim of identifying all transcripts encoded in the region and defining and annotating the respective genes. Here we report the catalog of genes we have discovered residing in the ~630-kb sequence, together with their experimental and informatics characterization. Of these, only two "G protein" genes, i.e., GNAI2 (21) and GNAT1 (22), had been cloned and characterized previously; using this catalog, we positioned these two genes within the contig DNA sequence. The set of 19 genes found in the overlapping homozygous deletions in SCLCs NCI-H740, NCI-H1450, and GLC20 (18), including eight in the smaller critical ~120-kb sequence defined by a breast cancer homozygous deletion (10), was analyzed extensively. We used manual experimental methods to study expression and search for mutations, and web-based computational servers to predict possible protein functions. By Northern analysis, four of the genes showed frequently reduced or absent mRNA levels in NSCLC (CACNA2D2, SEMA3B, BLU, and HYAL1) and SCLC (BLU, HYAL1, and SEMA3B) cell lines.
We found that six of the genes had mutations, but none of the 19 genes showed a high frequency of mutations (>10%) in the analyzed lung tumor samples. This raises the possibility that the putative TSG, LUCA, may be one of the genes with frequent loss of expression that occurs through tumor-acquired promoter hypermethylation (23). Alternatively, it could belong to the class of haploinsufficient TSGs. This novel class of TSGs is predicted to predispose to cancer in a hemizygous (+/-) state but does not show a second hit in the remaining wild-type allele in tumors. Further functional experimental analysis, such as growth suppression studies and gene knockout strategies, will be required to reveal the identity of the putative 3p21.3 TSG(s). In addition, our study shows that genomic DNA sequencing in combination with high-quality gene annotation is an effective method of gene discovery.
MATERIALS AND METHODS
Commercial Reagents
The following materials were purchased from the vendors indicated: PCR tool kits, Perkin-Elmer Cetus; DNA sequencing tool kits, Applied Biosystems (Foster City, CA); fluorescent in situ hybridization reagents, Boehringer Mannheim (Mannheim, Germany); cDNA and cosmid libraries, ClonTech Laboratories (Palo Alto, CA) and Stratagene (La Jolla, CA); EST clones from public databases, the I.M.A.G.E. Consortium (footnote 7) or Research Genetics (Rockville, MD); SSCP tool kits and nylon blotting membranes, Amersham (Arlington Heights, IL); oligonucleotide primers, Life Technologies, Inc. (Rockville, MD); restriction enzymes, Life Technologies, Inc., New England Biolabs (Beverly, MA), and Amersham (Arlington Heights, IL); MTN poly(A)+ RNA blots, ClonTech Laboratories (Palo Alto, CA); buffers, blotting solutions, and RNase-free water, Quality Biologicals (Gaithersburg, MD); chemicals, Sigma Chemical Co. (St. Louis, MO); and cell culture media, Life Technologies, Inc. (Rockville, MD).
Informatics Tools
The software package GENSCAN (27) was licensed from Christopher Burge (MIT, Cambridge, MA) and installed at the NCI Advanced Biomedical Computing Center at the Frederick Cancer Research and Development Center. The integrated informatics package PANORAMA, which incorporates BLAST, GENSCAN, GRAIL, and other gene interpretation features, was developed at the University of Texas Southwestern Medical Center at Dallas (TX) by H. Garner and run on a Hewlett Packard Exemplar supercomputer. PANORAMA is available for Internet use (footnote 8). For this analysis, GenBank was downloaded in December 1999.
Manual Molecular Procedures
All molecular manipulations (DNA and RNA isolation, screening of genomic and cDNA libraries, Northern and Southern blot analyses, and PCR) were performed using standard methods according to Sambrook et al. (28). For DNA sequencing, cDNA clones were sequenced on an Applied Biosystems 373 or 377 DNA sequencer (Stretch) using Taq DyeDeoxy Terminator Cycle Sequencing kits (Applied Biosystems, Foster City, CA) with either vector or clone-specific walking primers. Cosmid and P1 phage DNAs were sequenced by the Washington University and Sanger Human Genome Sequencing Centers using the shotgun procedure as described (21, 22).
FISH and two-color FISH were used to locate and orient the cosmid contig on chromosome 3p. Normal metaphase chromosomes were hybridized simultaneously with digoxigenin-labeled NotI linking clone NL1210 (part of cosmid LUCA1; green) and biotin-labeled cosmid LUCA20 (red) (29). 4',6-Diamidino-2-phenylindole was used as a counterstain. Both the metaphase spreads and the interphase nuclei confirmed the single-site location of each probe on 3p21.31, establishing the order centromere-cosmid LUCA1-cosmid LUCA20-telomere.
Pulsed-field gel electrophoresis analysis was performed as follows. High molecular weight DNA was prepared in agarose plugs as described (30). Slices containing 10^6 cells were digested for 16 h with 50 units of enzyme (NotI, NruI, and MluI; Boehringer Mannheim) and resolved on 1% agarose gels using a Bio-Rad CHEF Mapper (Hercules, CA) with electrophoresis profiles allowing separation in the range of 50–1000 kb.
For expression analyses, Northern blot hybridization was performed with cDNA probes using commercial MTN poly(A)+ RNA blots (ClonTech, Palo Alto, CA) from a variety of adult human tissues and tumor cell lines, and in-house blots prepared with total or poly(A)+ RNA from lung cancer cell lines. Radioactive DNA probes were prepared by random priming (Rediprime II; Amersham, Arlington Heights, IL). Hybridization was performed in ExpressHyb hybridization solution according to the manufacturer's instructions (ClonTech Laboratories, Palo Alto, CA). In addition, the presence of gene transcripts was monitored in silico by BLAST homology searches (31) of public EST databases. Mutational analyses were performed by RT-PCR-SSCP or exon-PCR-SSCP, followed by sequencing of shifted bands as described previously (32, 33).
Experimental gene discovery using conserved and transcribed genomic fragments was performed as detailed previously (18).
Computational and Bioinformatics Procedures
World Wide Web-based Servers and Databases.
World Wide Web-based servers and databases (34) were used to analyze genomic, cDNA, and predicted protein sequences. In addition, the Wisconsin Genetics Computer Group package, version 10 (35), and the GENSCAN program (27) were run at the Advanced Biomedical Computing Center (Frederick Cancer Research and Development Center), whereas the University of Texas Southwestern Medical Center integrated gene analysis software PANORAMA was run at the University of Texas Southwestern Medical Center.
DNA and Protein Sequence Analyses.
Sequence similarity searches were done using the BLAST (31) and Advanced BLAST (36; footnote 9) programs as provided by the National Center for Biotechnology Information (footnote 10), and BLAST2/WU-BLAST (footnote 11). Multiple sequence alignments, global and local, were done using the ClustalW program (footnote 12) as provided by EMBL and the Baylor Computing Center (footnote 13), and the Wisconsin Genetics Computer Group package, version 10 (35). Protein structural features were delineated using the ExPASy proteomics tools (footnote 14). Protein domains were identified using the Pfam (37) and SMART (38) programs as provided by EMBL (footnote 15). Protein subcellular localization was predicted using PSORT (39; footnote 16). Signal peptides, transmembrane helices, and membrane topologies were predicted by the SPLIT (40; footnote 17), TMHMM (41; footnote 18), and PSORT (39) programs. Protein motifs were found by visually inspecting local alignments or by using the protein motifs (42), ProfileScan, and PROSITE (43) programs. In addition, we used the InterPro server (44; footnote 19).
Discovery of Orthologous Genes in Model Organisms.
Stringent criteria for the identification of candidate orthologous genes were applied as suggested (45, 46). In the mouse, orthologous pairs were >90% identical at the protein level, with >90% alignment over their entire lengths (47). In the fly, worm, and yeast, candidates were identified with 20–50% identity over at least 80% of their lengths. The TBLASTN program was used to search the nonredundant nucleotide, UniGene, and EST databases of the model organisms. EST clusters were then built by the EST assembly machine server (footnote 20) or the EST Assembler (footnote 21) at the Max Delbrück Center (footnote 22). The advanced BLAST2 and Orthologue program (48) at EMBL was used to confirm the putative orthologous relationships and to construct and evaluate phylogenetic trees.
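The ortholog criteria above amount to two thresholds applied to a pairwise alignment: percent identity and the fraction of the protein length covered. The Python sketch below shows one way to compute both from a gapped alignment and apply the thresholds stated in the text. It assumes identity is counted over ungapped aligned columns and that the 20–50% range for fly, worm, and yeast acts as a lower bound of 20%; the function names are hypothetical, and this is not the BLAST2/Orthologue pipeline the authors used.

```python
from dataclasses import dataclass


@dataclass
class AlignmentSummary:
    identity: float  # identical residues / ungapped aligned columns (0-1)
    coverage: float  # query residues in the alignment / full query length (0-1)


def summarize(aln_query: str, aln_subject: str, query_length: int) -> AlignmentSummary:
    """Summarize a gapped pairwise alignment given as two equal-length strings ('-' = gap)."""
    if len(aln_query) != len(aln_subject):
        raise ValueError("aligned strings must have equal length")
    # Columns where both sequences have a residue (gap columns are excluded).
    paired = [(a, b) for a, b in zip(aln_query, aln_subject) if a != "-" and b != "-"]
    identical = sum(1 for a, b in paired if a == b)
    query_residues = sum(1 for a in aln_query if a != "-")
    identity = identical / len(paired) if paired else 0.0
    return AlignmentSummary(identity, query_residues / query_length)


def passes_orthology_filter(organism: str, s: AlignmentSummary) -> bool:
    """Apply the thresholds described in the text (assumed interpretation)."""
    if organism == "mouse":
        # >90% protein identity with >90% of the length aligned.
        return s.identity > 0.90 and s.coverage > 0.90
    if organism in {"fly", "worm", "yeast"}:
        # Assumption: the 20-50% identity range is read as a floor of 20%;
        # at least 80% of the length must be aligned.
        return s.identity >= 0.20 and s.coverage >= 0.80
    raise ValueError(f"no criteria defined for {organism!r}")


s = summarize("MKV-LA", "MKVQLS", query_length=6)
print(round(s.identity, 2), round(s.coverage, 2))  # 0.8 0.83
print(passes_orthology_filter("worm", s), passes_orthology_filter("mouse", s))  # True False
```

Keeping the identity and coverage computation separate from the thresholds makes it easy to swap in a different identity definition (for example, identity over the full query length) without touching the filter.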
In Silico Gene Discovery.
In silico gene discovery followed two different protocols. In the first, genome-wide repeats and low-complexity regions in the genomic DNA sequences were identified and masked using the program RepeatMasker (footnote 23). The masked sequences were then used in BLASTN searches against the EST, UniGene, and nonredundant nucleotide databases to identify potential transcripts (ESTs and cDNAs) and to build EST clusters. In the second, genomic sequences assembled from the individual cosmid sequences were subjected to the gene prediction programs GENSCAN (27) and XGRAIL (49) with default settings to identify coding DNA sequences and the corresponding protein sequences. These were then used in BLASTN and TBLASTN searches, respectively, against the nonredundant nucleotide and EST databases to identify ESTs and cDNAs, and the ESTs were assembled into clusters (see above). Genomic information (repetitive elements, coding exons, ESTs, and known and predicted genes) was also obtained and analyzed with first-pass automatic genome annotation programs: PANORAMA (18) for individual cosmid sequences and the Rummage package (footnote 24) for the whole assembled ~630-kb contig sequence. The Rummage analysis was kindly performed by Drs. A. Rosenthal and R. Schattevoy, both at the Genome Sequencing Center (Jena, Germany). Recently, the Genome Annotation Channel (footnote 25) made available its first-pass annotations for most of the contig sequences.
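Building EST clusters from genomic hits can be pictured as merging overlapping alignment intervals on the masked genomic sequence; masking first keeps ESTs from being chained together through shared repeats. The Python sketch below assumes each EST hit has already been reduced to a single (start, end) genomic interval and ignores strand; the accessions are hypothetical, and dedicated EST assembly tools typically also compare the EST sequences with each other rather than relying on genomic coordinates alone.

```python
from typing import List, Optional, Tuple

Hit = Tuple[str, int, int]  # (EST accession, genomic start, genomic end), inclusive


def cluster_hits(hits: List[Hit]) -> List[List[str]]:
    """Single-linkage clustering of EST hits whose genomic intervals overlap."""
    clusters: List[List[str]] = []
    current: List[str] = []
    current_end: Optional[int] = None
    # Sweep along the genomic sequence in order of start coordinate.
    for name, start, end in sorted(hits, key=lambda h: (h[1], h[2])):
        if current and current_end is not None and start <= current_end:
            current.append(name)  # overlaps the growing cluster
            current_end = max(current_end, end)
        else:
            if current:
                clusters.append(current)  # close the previous cluster
            current, current_end = [name], end
    if current:
        clusters.append(current)
    return clusters


# Hypothetical hits on a repeat-masked cosmid sequence.
hits = [("est1", 100, 300), ("est2", 250, 500), ("est3", 800, 900), ("est4", 490, 600)]
print(cluster_hits(hits))  # [['est1', 'est2', 'est4'], ['est3']]
```

Sorting by start coordinate lets a single pass decide cluster membership, because any hit that overlaps the current cluster must start before that cluster's rightmost end.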
Gene Annotations.
Protein annotations for all of the genes discovered in the contig sequence were compiled from computational predictions, experimental observations, and transfer of information from yeast, worm, and fly orthologues. Functional conservation between human proteins and their orthologous counterparts has been repeatedly demonstrated experimentally (46, 50).
RESULTS
~630-kb Region.
~120 kb (Ref. 10; Fig. 1
~120-kb region or in the more telomeric ~250-kb segment of the 370-kb region.
~630 kb long (footnote 26). Fig. 1
~370 kb, ~250 kb, and ~120 kb. We have systematically proceeded to identify and test all of the genes in the region for their candidacy as a TSG (Fig. 1
~630-kb sequence. This contention is likely true for the ~120-kb sequence (Fig. 1)
~18 kb of intergenic and ~17 kb of intronic sequence, this region is extremely gene dense and contains 20–25% coding sequence.
The location of the contig on 3p21.3 and its position relative to other 3p genes were determined by the presence in the ~630-kb sequence of several framework genetic markers mapped to this location (e.g., D3S1621 and D3S1568; Fig. 1), multiple radiation hybrid-mapped sequence-tagged sites (SHGC-11855 shown), and other linked genetic markers (see detailed marker information on genomic sequence contigs NT_002322, NT_000067, and NT_000069; footnote 28). In addition, in situ hybridization [FISH and two-color FISH with DNA from cosmids LUCA1 (Z74618), LUCA8 (Z84495), and LUCA20 (AC004693)] localized the contig to 3p21.3 and confirmed the orientation shown in Fig. 1 (data not shown). We also performed radiation hybrid mapping with the TNG panel using selected markers for finer resolution of the location (not shown). Using the contig sequence, we also determined the intergenic distances, exon-intron structures, and intron sizes of the resident genes and ascertained their direction of transcription. All of these features can be obtained directly by a BLAST analysis of our deposited cDNA sequences (Table 1) against the individual cosmid or assembled genomic sequences (footnote 29).
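Once a deposited cDNA has been aligned to the genomic sequence, the exon-intron structure, intron sizes, genomic span, and direction of transcription follow from the coordinates of the aligned blocks. The Python sketch below assumes the alignment has already been split into one (start, end) block per exon, listed in cDNA order with start ≤ end; the coordinates are hypothetical, and real BLAST output would first need to be parsed and its HSPs chained into exons.

```python
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # genomic (start, end) of one aligned exon, 1-based inclusive


def gene_structure(blocks_in_cdna_order: List[Block]) -> Dict[str, object]:
    """Derive exon count, intron sizes, genomic span and strand from exon blocks."""
    if not blocks_in_cdna_order:
        raise ValueError("need at least one exon block")
    exons = sorted(blocks_in_cdna_order)  # genomic order
    # Intron = gap between consecutive exons in genomic order.
    introns = [nxt[0] - prev[1] - 1 for prev, nxt in zip(exons, exons[1:])]
    first, last = blocks_in_cdna_order[0], blocks_in_cdna_order[-1]
    if len(blocks_in_cdna_order) == 1:
        strand = "?"  # a single exon gives no direction
    else:
        # cDNA 5'->3' running toward higher genomic coordinates means plus strand.
        strand = "+" if first[0] < last[0] else "-"
    return {
        "exons": len(exons),
        "introns": introns,
        "span": exons[-1][1] - exons[0][0] + 1,
        "strand": strand,
    }


print(gene_structure([(1000, 1100), (1500, 1650), (3000, 3200)]))
# {'exons': 3, 'introns': [399, 1349], 'span': 2201, 'strand': '+'}
```

Passing the same blocks in reverse cDNA order gives identical intron sizes but reports the minus strand, which is how the direction of transcription falls out of the alignment.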
In total, the 25 genes occupy ~630 kb of genomic DNA, an average of ~25 kb/gene, which agrees well with the size of an average gene (~30 kb) estimated for the whole human genome. However, the distribution of these genes along the sequence is rather uneven, with the highest gene density (genes in green in Fig. 1) in the ~120-kb region (average gene size, ~15 kb). Gene sizes varied dramatically, from 2.3 kb (g101F6) to ~140 kb (CACNA2D2). Notably, this large gene is interrupted by the centromeric breakpoints of both the HCC1500 breast cancer homozygous deletion (in cosmid LUCA6) and the SCLC GLC20 homozygous deletion (in cosmid LUCA10; Fig. 1). The intergenic distances also differed enormously, from 161 bp (between g101F6 and NPRL2/g21) to ~60 kb (between genes g20 and CACNA2D2). The intron sizes also varied dramatically, the smaller ones ranging from 50 to 1000 bp and the largest being 40–50 kb (in CACNA2D2).
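The "average size" figures in this paragraph are genomic space per gene, i.e., region length divided by the number of genes it contains. The short Python calculation below reproduces them from the approximate sizes and gene counts given in the text; the values for the 370-kb and 250-kb regions are derived here and are not stated in the original.

```python
# Approximate region sizes (kb) and gene counts, as given in the text.
regions = {
    "whole contig": (630, 25),
    "370-kb deletion overlap": (370, 19),
    "proximal 120-kb segment": (120, 8),
    "distal 250-kb segment": (250, 11),
}


def kb_per_gene(size_kb: float, n_genes: int) -> float:
    """Genomic space per gene: region length divided by gene count."""
    return size_kb / n_genes


for name, (size_kb, n_genes) in regions.items():
    print(f"{name}: {kb_per_gene(size_kb, n_genes):.1f} kb/gene")
# whole contig: 25.2 kb/gene
# 370-kb deletion overlap: 19.5 kb/gene
# proximal 120-kb segment: 15.0 kb/gene
# distal 250-kb segment: 22.7 kb/gene
```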
It is worthwhile to examine what types of genes (if any) our manual and informatics prediction analyses might have missed. These omissions could include genes that lack GenBank matches, genes whose sequences would not be recognized by the available gene prediction programs (such as non-protein-coding genes), and genes whose mRNAs have a very restricted pattern of tissue expression. Another approach to this gene saturation problem is to compare the human sequence with the corresponding mouse genomic sequence. Because functional sequences such as transcriptional enhancers and mRNA-like noncoding RNAs are highly conserved in mammalian genomes, they might be readily detected in such a comparison (58, 59). The effectiveness of this approach has been demonstrated recently (60, 61). The mouse BAC clone covering this region (accession no. AC025353) was used for this alignment (data not shown).
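A human-mouse comparison of this kind is often summarized by scanning the alignment with a fixed window and flagging stretches whose identity reaches a threshold. The Python sketch below assumes the two sequences are already globally aligned (equal length, '-' for gaps) and uses 70% identity over 100 columns, thresholds in the range often used for such comparisons and chosen here only for illustration; it is not the procedure used for the AC025353 alignment, which the text does not describe.

```python
from typing import List, Tuple


def conserved_windows(human: str, mouse: str, window: int = 100,
                      min_identity: float = 0.70) -> List[Tuple[int, int]]:
    """Merged alignment-column intervals (0-based, end-exclusive) in which a sliding
    window reaches min_identity. Gap columns count as mismatches."""
    if len(human) != len(mouse):
        raise ValueError("sequences must be aligned to equal length")
    n = len(human)
    if n < window:
        return []
    match = [1 if h == m and h != "-" else 0
             for h, m in zip(human.upper(), mouse.upper())]
    intervals: List[Tuple[int, int]] = []
    score = sum(match[:window])  # matches in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one column: add the entering column, drop the leaving one.
            score += match[start + window - 1] - match[start - 1]
        if score / window >= min_identity:
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], start + window)  # extend
            else:
                intervals.append((start, start + window))
    return intervals


human = "A" * 300
mouse = "A" * 100 + "C" * 100 + "A" * 100  # hypothetical diverged middle segment
print(conserved_windows(human, mouse))  # [(0, 130), (170, 300)]
```

The running score makes the scan linear in alignment length; the reported intervals extend into the diverged segment because a window only needs 70 of its 100 columns to match.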
The three lung cancer homozygous deletions identified a ~370-kb region extending from cosmid LUCA10 (Z75742; defined by the centromeric end of the GLC20 homozygous deletion) to P1 clone P3938 (AC004814; defined by the telomeric end of the NCI-H740 homozygous deletion; Fig. 1). The nested homozygous deletion in breast cancer HCC1500 (10), which extended from part of cosmid LUCA06 (Z84493) to part of cosmid LUCA13 (AC002455; Fig. 1), divided the 19 genes in the 370-kb region into two critical gene sets: 8 genes in a 120-kb segment extending from part of cosmid LUCA10 to part of cosmid LUCA13, and 11 genes in the telomeric portion (~250 kb, extending from cosmid LUCA13 to P1 clone P3938, AC004814). These deletions eliminated from further consideration the 3pk/MAPKAP3 (U09578), CISH (AF132297, temporarily called Gene18 in our GenBank deposit), HEMK isoforms I and II (AF131220 and AF172244), and Gene20 (AF188706) genes and two partially characterized genes [Gene28 (Luca1.2) and Gene30 (Luca2.3)], all lying in the centromeric portion of the contig (~260 kb of genomic DNA located in cosmids LUCA1, LUCA2, LUCA3, and LUCA4). In addition, expression and mutation analyses of 3pk (U09578) by us, and of CISH (AF132297) by Sithanandam et al. (55) and Uchida et al. (62), provided further evidence excluding these genes.
Expression Analysis of the Candidate Tumor Suppressor Genes.
All of the remaining 19 genes were analyzed extensively by manual and computational methods. Northern analyses with cDNA probes for each gene, using commercial poly(A)+ RNA MTN blots (ClonTech) and a panel of lung cancer cell line RNAs, revealed the sizes of the respective transcripts and their patterns of expression in normal human tissues and lung cancer samples (Figs. 2 and 3; Table 2). Northern blots prepared from total or poly(A)+ RNA of 20–30 lung cancer cell lines, representing both SCLC and NSCLC, revealed for many of the genes levels of expression similar to those in normal lung. No abnormal transcript sizes suggestive of mutations were found. However, several genes showed reduced expression in the lung cancer lines: CACNA2D2 was not expressed in 50% of the lines, BLU was expressed in only 30%, HYAL1 was expressed in <30% (Fig. 3), and SEMA3B and SEMA3F (19, 57, 63) were expressed in <50%. Thus, expression analysis in lung cancers identified these five genes as potential TSG candidates based on loss of expression in a sizable number (but not all) of lung cancers. Because tumor-acquired promoter hypermethylation is a possible inactivation mechanism (23), the methylation status of the CpG islands associated with the genes showing reduced or absent expression is currently under investigation.
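Whether a promoter region carries a CpG island is usually judged from its GC content and the ratio of observed to expected CpG dinucleotides. The Python sketch below applies the widely used Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC above 50%, observed/expected CpG above 0.6) to a single sequence; these cutoffs are an assumption, not a statement of how the methylation study mentioned above defines its islands, and a genome-wide scan would slide this test along the sequence.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is C * G / length.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Assumed Gardiner-Garden and Frommer style criteria applied to one sequence."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CCGG" * 50))            # True: GC 1.0, obs/exp 1.0
print(looks_like_cpg_island("C" * 100 + "G" * 100))  # False: one CpG, obs/exp 0.02
```

The second example shows why GC content alone is not enough: a GC-rich stretch depleted of CpG dinucleotides fails the observed/expected test.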
~100 genomic DNAs from our large panel of lung cancer cell lines representing both SCLC and NSCLC (24) using cDNA or genomic probes representing each of the 25 genes in our search for other homozygous deletions or genomic DNA rearrangements (data not shown). We found only the homozygous deletions listed in Fig. 1
~30-kb homozygous deletion in SCLC NCI-H524 involving most of the genomic sequence in cosmid LUCA13, interrupting gene FUS1 to gene HYAL1 (data not shown; see Discussion below). Mutational analyses of the resident genes (summarized in Tables 1
~250-kb portion of the 630-kb contig with HYAL1, FUS2, and SEMA3B, each exhibiting a few mutations (Table 1)
~14,000) of fly genes (66; footnote 30). We noticed that orthologous pairs fall into three categories: present in both worm and fly, present only in the worm, and present only in the fly. Yeast genes sharing 50% similarity in common domains/features were found for two of the genes, i.e., PL6 and NPRL2/Gene21; however, probably only the Schizosaccharomyces pombe counterpart (NPRL2) of NPRL2/g21 should be considered a candidate orthologous gene. The availability of the complete genome sequences of yeast (50), worm (footnote 31), and fly (66) makes it unlikely that we have missed any orthologous gene pairs for our TSG candidates in these three model genomes. We now provide annotations for each of the 19 genes found in the ~370-kb segment, starting with the genes in the smaller ~120-kb sequence (Fig. 1).
The α2δ-2 calcium channel subunit gene, CACNA2D2, was discovered in silico both by finding EST matches with fragments of genomic sequence and by exons predicted by GENSCAN. The gene occupies ~140 kb of genomic space, is composed of at least 40 exons, and is expressed as a 5.5–5.7-kb mRNA. Three mRNA splice forms coding for two protein isoforms have been detected in several normal tissues. GenBank deposits AF040709 (mRNA isoform 3) and AF042792 (mRNA isoform 1) differ in the 5' untranslated region and encode the same amino acid sequence (protein isoform I), whereas AF042793 (mRNA isoform 2) differs in the 5' translated region and has a slightly different amino-terminal amino acid sequence (protein isoform II). The expression of CACNA2D2 is reduced or absent in >50% of lung cancer cell lines, particularly NSCLCs. However, no mutations were detected in 60 lung cancer cell lines and 40 paired normal/SCLC tumor samples. The nucleotide sequence suggests that the gene encodes an auxiliary regulatory α2δ subunit of calcium channels and joins the α2δ-1 (previously A subunit) gene (67) as the second member of the α2δ gene family. Three putative transmembrane helices predicted previously in the α2δ-1 protein (67) were also predicted in both protein isoforms of the CACNA2D2 gene with the SPLIT 3.5 program (40). In addition, protein isoform I of the CACNA2D2 gene has another membrane helix at the very amino terminus. Using the TMHMM program (41), all three α2δ proteins were predicted to span the membrane only once, at the amino terminus (α2δ-2 isoform I) or at the carboxy terminus (α2δ-1 and α2δ-2 isoform II), favoring the single-transmembrane model for the α2δ subunit proteins, which was verified experimentally for the α2δ-1 protein (67). A protein-binding VWA-like domain was discovered by the PFAM program (37) in the extracellular part at similar positions in all three α2δ subunit proteins (amino acid residues 291–469 and 222–400 for α2δ-2 protein isoforms I and II, respectively, and residues 253–430 for the α2δ-1 protein). The VWA-like domain may facilitate the binding of the α2δ complex to the calcium channel α1 pore-forming subunit protein (67). The almost identical membrane topologies, similar domain structures, and posttranslational modifications of all three α2δ subunit proteins strongly support the assignment of the new α2δ-2 gene to the α2δ gene family. To confirm this predicted function experimentally, we recently injected CACNA2D2 cRNA into Xenopus oocytes and showed that CACNA2D2 acts as a regulatory subunit of voltage-gated calcium channels, able to augment the function of all three pore-forming units (68). BLAST (36) searches of the mouse EST database detected two different nonoverlapping EST clones (accession nos. AA000341 and AA008996), which showed 91 and 85% cDNA sequence identity with CACNA2D2 (residues 2925–3421 and 4989–5391), respectively. These EST sequences showed only limited homology to the murine α2δ-1 gene splice forms, indicating that they represent true orthologous sequences of the human CACNA2D2 gene. This was further corroborated by protein alignment of the 86-amino acid ORF encoded by mouse EST AA000341, which was 96% identical to the CACNA2D2 isoform I protein (amino acid residues 922–1005). The worm genome also contains two α2δ genes; by stringent criteria of orthology (i.e., 50% identity/similarity with >80% alignment of their entire amino acid sequences), the worm gene T24F1.6 appears to be the orthologue of CACNA2D2, whereas the second worm α2δ gene, UNC-36 (accession no. P34374), is the orthologue of the α2δ-1 gene (67). The UNC-36 phenotypes do not affect growth, and no phenotypes have yet been reported for the T24F1.6 locus. The fly proteome (~14,000 proteins; Ref. 66) contains three α2δ proteins (accession nos. AAF53505, AAF53476, and AAF58335), of which the first is a likely orthologue of α2δ-1 (44), the second a likely orthologue of α2δ-2, and the third a likely orthologue of a still-uncloned human gene; all three have the VWA_DOMAIN. The yeast proteome (50) contains only one ion channel gene and appears to have no orthologues of either α2δ gene. Despite the lack of mutations, the absence of CACNA2D2 expression in many but not all NSCLCs, together with high CACNA2D2 expression in normal lung, makes CACNA2D2 an excellent candidate TSG; its function in tumor cells needs to be tested, and acquired promoter hypermethylation should be studied as a possible mechanism for inactivating its expression.
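TMHMM and SPLIT use their own statistical models, but the underlying idea of locating membrane-spanning segments can be illustrated with the simpler Kyte-Doolittle hydropathy scan: average residue hydrophobicity over a window of about 19 residues, with averages above ~1.6 suggesting a transmembrane helix. The Python sketch below implements that swapped-in technique, not TMHMM or SPLIT, and its test protein is synthetic.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Merged 0-based, end-exclusive segments whose window mean hydropathy exceeds
    threshold. Unknown residue codes (e.g., X) contribute 0."""
    seq = protein.upper()
    segments: List[Tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[start:start + window]) / window
        if mean > threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend overlapping hit
            else:
                segments.append((start, end))
    return segments


# Synthetic protein: hydrophilic flanks around a 25-residue leucine stretch.
print(hydrophobic_segments("D" * 30 + "L" * 25 + "D" * 30))  # [(25, 60)]
```

Window-based boundaries are approximate: here the reported segment (25, 60) extends a few residues past the leucine stretch at positions 30 to 54, which is one reason model-based predictors are preferred for topology.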
The PL6 gene was discovered manually by probing Northern blots with genomic fragments. The gene occupies 4.5 kb of genomic space, is composed of two exons, and is expressed as a 2.2-kb mRNA in many normal human tissues, including lung. PL6 expression is slightly reduced in some SCLC lines, and the gene is abundantly represented in the human and mouse EST databases. No mutations were detected in 38 cell lines and 40 paired normal/SCLC tumor samples. By sequence analysis, PL6 encodes an integral plasma membrane protein [PSORT program (39)] with six [SPLIT program (40)] or seven to eight [TMHMM program (41)] transmembrane helices. The predicted cytoplasmic portion of the protein (the last 103 amino acids, residues 249–351) contains an OMPdecase domain [residues 274–298; PFAM (37)] that may involve PL6 in protein-protein interactions and a bipartite NLS (residues 282–299; Ref. 39) that may guide it to the nucleus. The mouse orthologue was discovered in an EST (accession no. W96860), sequenced (our accession no. AF134238), and shown to be 92% identical at the protein level and 87% at the cDNA level. The worm F11A10.3 gene encodes a multidomain protein that aligns with PL6, and the aligned region contains the NLS and the OMPdecase domain, suggesting that it is the orthologue of PL6. The fly gene CG9536 product (450 residues; accession no. AAF52388) is a likely orthologue of PL6 (48% similarity over the first 306 residues of PL6) and is also an integral membrane protein with seven to eight transmembrane helices. It has two HMW kininogen domains (residues 325–349 and 353–376) but no NLS or OMPdecase domain. The product of the yeast gene YOL107W has substantial homology with PL6 but lacks the NLS and the OMPdecase domain. The absence of mutations and the robust expression of PL6 in most lung cancers suggest that PL6 is an unlikely candidate TSG.
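The bipartite NLS reported for PL6 (and for NPRL2/Gene21 below) refers to a classic consensus: two basic residues, a spacer of about 10 residues, and a second cluster in which at least three of five residues are basic. The Python sketch below scans for that pattern with a fixed 10-residue spacer and counts only K and R as basic; prediction programs such as PSORT apply their own, more refined rules, so this is an illustration rather than a reimplementation. The test peptide is modeled on the nucleoplasmin signal KRPAATKKAGQAKKKK, with one residue appended so the second cluster spans five positions.

```python
from typing import List, Tuple

BASIC = frozenset("KR")


def bipartite_nls_sites(protein: str, spacer: int = 10) -> List[Tuple[int, int]]:
    """0-based, end-exclusive matches of: [KR][KR] + spacer + 5 residues with >= 3 K/R."""
    seq = protein.upper()
    motif_len = 2 + spacer + 5
    sites = []
    for i in range(len(seq) - motif_len + 1):
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            tail = seq[i + 2 + spacer: i + motif_len]  # the second basic cluster
            if sum(aa in BASIC for aa in tail) >= 3:
                sites.append((i, i + motif_len))
    return sites


print(bipartite_nls_sites("KRPAATKKAGQAKKKKL"))  # [(0, 17)]
```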
The 101F6 gene was discovered manually by screening arrayed cDNA libraries with cosmid LUCA12 DNA. The 3.2-kb gene space contains four exons encoding a 1.5-kb mRNA. The gene is expressed in many normal tissues, including lung, is highly expressed in SCLC and NSCLC cell lines, and is abundantly represented in the EST databases. No mutations were detected in 40 cell lines and 40 paired normal/SCLC tumor samples. By sequence analysis, 101F6 encodes an integral plasma membrane protein [PSORT program (39)] with six transmembrane helices [SPLIT program (40) and TMHMM program (41)], with both termini in the cytoplasm. No other known domains or significant motifs were detected. The mouse orthologue was discovered in the mouse EST database (accession nos. AA285935, AA198541, and AA198960), sequenced (our accession no. AF131206), and shown to be 95% identical at the protein level and 85% at the cDNA level. No orthologous pairs were detected in the fly (66), worm, or yeast (50) proteomes. The absence of mutations and the robust expression of 101F6 in most lung cancers suggest that 101F6 is an unlikely candidate TSG.
The NPRL2/Gene21 gene was discovered in silico by finding both EST matches and GENSCAN-predicted exons. The 3.3-kb gene space contains 11 exons coding for a 1.5-kb mRNA with multiple splice isoforms, which are expressed in many normal tissues, including lung and testis, and are abundantly represented in the EST databases. NPRL2/Gene21 is well expressed in SCLC and NSCLC lines except for the SCLC line NCI-H1514. A frameshift mutation producing a stop codon was detected in 1 of 40 lung cancer cell lines. Sequence analysis shows that NPRL2/Gene21 encodes a soluble protein that has a bipartite NLS (residues 62–79) and a protein-binding domain, granulin (residues 86–98), predicted by PFAM (35, 37). The mouse orthologue was discovered in mouse EST databases (accession nos. AI037102, AA764527, AA709972, and W64225), sequenced (our accession no. AF131206), and shown to be 97% identical at the protein level and 90% at the cDNA level. True orthologues were identified in the yeast (the NPR2 gene in Saccharomyces cerevisiae, GenBank accession no. P39923, and the hypothetical Mr 47,000 protein in S. pombe, accession no. Z99163), fly (66; the CG9104 gene product, accession no. AAF48677, with 65% similarity over the whole length of NPRL2/g21), and worm (accession no. U61949) proteome databases. However, only the mouse orthologue contains the bipartite NLS (residues 62–79) and the granulin domain (residues 86–98). Although NPRL2/Gene21 mRNA is expressed in most lung cancers, the mutations in NPRL2/Gene21, particularly the stop mutations, indicate the need for further study of this gene as a candidate TSG.
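How a frameshift "produces a stop codon" can be made concrete: inserting or deleting a single base shifts the reading frame, and translation then usually meets a premature stop codon. The Python sketch below uses a short hypothetical coding sequence and a hypothetical 1-bp insertion, not the NPRL2/Gene21 sequence or the actual mutation, which the text does not give.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds: str) -> Optional[int]:
    """0-based codon index of the first in-frame stop codon, reading from position 0."""
    s = cds.upper()
    for i in range(0, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def insert_base(seq: str, pos: int, base: str) -> str:
    """Return seq with a single base inserted before 0-based position pos."""
    return seq[:pos] + base + seq[pos:]


reference = "ATGGCTGATAAACCCGGGTGA"      # hypothetical: ATG GCT GAT AAA CCC GGG TGA
mutant = insert_base(reference, 6, "T")  # 1-bp insertion after the second codon
print(first_stop_codon(reference), first_stop_codon(mutant))  # 6 2
```

In the mutant the shifted frame reads ATG GCT TGA, so translation would stop after two codons instead of six.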
The BLU gene was discovered manually (and serendipitously) using PCR primers (kindly provided by B. Vogelstein, Johns Hopkins, Baltimore, MD) to screen our cosmid contig for the presence of the β-catenin gene, which at the time had recently been assigned to chromosome region 3p21. Although a PCR product was identified, DNA sequence analysis showed that it was unrelated to β-catenin. This PCR product was used as a probe that identified an mRNA on Northern blot analysis, which in turn led to the isolation of the full-length BLU cDNA by library screening. The gene space of ~4.5 kb contains 11 (testis version) or 12 (lung version) exons coding for a 2-kb, alternatively spliced mRNA that is well expressed in lung and testis but not in any of the other human tissues tested. The EST databases contain a moderate number of hits, mostly from lung and testis cDNA libraries. The testis isoform contains 11 exons because of a complex selection of an alternative acceptor site. The testis-specific protein isoform differs in amino acid sequence from the lung-specific isoform between residues 199 and 234; this change results in the loss of one of three PKC phosphorylation sites (residues 229–231). Expression is reduced or virtually undetectable in 70% of the SCLC and NSCLC cell lines tested. Three missense mutations were discovered in a sample of 61 lung cancer cell lines. The BLU protein is likely a soluble cytoplasmic protein and shares 30–32% identity over a stretch of 100–112 amino acids (residues 334–437 or 318–430) with proteins of the MTG/ETO family of transcription factors (69) and with the suppressins (70), which may regulate entry into the cell cycle and suppress growth of colon carcinoma cells. The "Zn knuckle" motif, which is involved in specific protein-protein interactions, is part of this domain and is present in many proteins.³² No orthologous pairs were found in the worm and yeast proteomes (50). However, the fly genome (66) contains a true orthologue of BLU: the CG11253 gene product (accession no. AAF49850) is of similar size (451 residues), has 49% amino acid sequence similarity over the whole length of BLU, and also has a MYND finger domain (residues 412–448). Several other fly, worm, and S. pombe proteins share 35% identity with the MTG/ETO domain. The mouse orthologue was discovered in mouse EST databases (accession nos. AI595515 and AI429164), sequenced (our accession no. AF123386), and shown to be 89% identical at the protein level and 87% at the cDNA level. The loss of expression in most lung cancers and the occurrence of a few mutations make BLU an attractive TSG candidate requiring further functional and promoter methylation status studies.
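PKC phosphorylation sites such as the one lost in the testis isoform of BLU are usually predicted from the short PROSITE consensus PS00005, [ST]-x-[RK]. A minimal sketch of such a scan follows; the helper name `pkc_sites` is hypothetical and the sequences are toy examples, not BLU. It illustrates how a residue change inside a segment, like the isoform-specific stretch between residues 199 and 234, can remove a candidate site.

```python
import re

# PROSITE PS00005 consensus [ST]-x-[RK]; the zero-width lookahead lets
# overlapping candidate sites all be reported.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def pkc_sites(seq: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of candidate PKC sites."""
    return [(m.start() + 1, m.start() + 3) for m in PKC_SITE.finditer(seq.upper())]


# A Ser -> Ala change at position 2 removes one of the two candidate sites.
print(pkc_sites("MSARTEK"))  # [(2, 4), (5, 7)]
print(pkc_sites("MAARTEK"))  # [(5, 7)]
```

Because the consensus is only three residues long, such matches are frequent and serve as hypotheses for experimental testing rather than evidence of phosphorylation.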
The RASSF1/123F2 gene was discovered manually by screening gridded cDNA libraries with cosmid LUCA12 DNA. The gene space of 7.6 kb contains 5 exons coding for 2-kb, alternatively spliced mRNAs ("short" and "long" forms, 123F2SF and 123F2LF, which should now be referred to by the Human Genome Organization-approved nomenclature as RASSF1C and RASSF1A, respectively) that are well expressed in all analyzed human tissues, including lung. The RASSF1C/123F2SF mRNAs, but not the RASSF1A/123F2LF mRNAs, are well expressed in most lung cancer cell lines. The mRNA is well represented in EST databases from normal and tumor tissues. Guided by GENSCAN predictions, we discovered the RASSF1A/123F2LF splice form by RT-PCR on mRNA; it differs in its NH2-terminal amino acid sequence, giving a total length of 340 amino acids compared with 270 amino acids for RASSF1C/123F2SF. The amino acid sequence of RASSF1A/123F2LF contains a predicted DAG binding domain that is also found in the related gene NORE1 but is absent from the RASSF1C/123F2SF cDNA sequence. RASSF1A/123F2LF mRNAs also come in multiple tissue-related splice forms with slight differences in amino acid sequence, including forms for lung (RASSF1A, AF102770), heart (RASSF1D, AF102771), and pancreas (RASSF1E, AF102772). No mutations were detected in 40 paired normal/tumor (SCLC/NSCLC) DNA samples (studied for RASSF1C/123F2SF and the RASSF1/123F2 common region) or in 38 lung cancer cell lines (RASSF1C/123F2SF and RASSF1A/123F2LF, all regions). The RASSF1/123F2 protein is a soluble cytoplasmic protein that contains a Ras association domain (residues 124–218) identified by the SMART (38) and PFAM (37) programs. Although not all Ras association domains bind RasGTP, the Ras association domain of NORE1, the mouse paralogue of RASSF1/123F2, was found to bind RasGTP (71). The NORE1 protein also contains the PKC-C1 and DAG/PE domains, which are found in the predicted RASSF1A/123F2LF protein but not in the RASSF1C/123F2SF protein.
Recently, the Kastan group identified a RASSF1/123F2 amino acid sequence (common to both the RASSF1C/123F2SF and RASSF1A/123F2LF proteins) as a potential phosphorylation target of ataxia telangiectasia mutated (72). The mouse orthologue of RASSF1/123F2 was discovered in mouse EST databases (accession nos. AA543890, AA161846, and AA466998), sequenced (our accession no. AF132851), and shown to be 97% identical at the protein level and 88% at the cDNA level. In contrast, the human orthologue of the mouse NORE1 and rat MAXP1 (accession no. AF002251) genes is represented by a single human EST (accession no. AA362184). Thus, RASSF1/123F2 belongs to the same gene family as NORE1 and the rat gene MAXP1 but is not their orthologue. The worm gene T24F1.3 (accession no. Z49912) encodes a 615-amino acid hypothetical protein that shares 33% identity and 53% similarity over 95% of the length of the RASSF1/123F2 protein. In the portion shared with RASSF1/123F2, the T24F1.3 protein contains the Ras association domain (residues 396–496); it also contains a PH domain (residues 1–53) and PKC-C1 and DAG/PE binding domains (residues 164–214), which are found in the predicted RASSF1A/123F2LF protein. The fly (66) and yeast (50) proteomes do not contain a gene with substantial homology to RASSF1/123F2. The absence of RASSF1A/123F2LF expression in many lung cancers makes this isoform an attractive candidate for further promoter hypermethylation and tumor-suppressing functional studies. In fact, our recent studies have shown that the CpG islands of the RASSF1A/123F2LF promoter region undergo tumor-acquired hypermethylation associated with loss of expression, and that forced re-expression of RASSF1A/123F2LF suppresses the malignant phenotype.³³
The FUS1 gene was discovered manually by screening cDNA libraries with a genomic fragment from the region of cosmids LUCA12 and LUCA13 that showed sequence conservation by Southern blot hybridization. It was isolated at the fusion junction (hence FUS, for "fusion") that joins the ends of a ~30-kb homozygous deletion in SCLC line NCI-H524, linking LUCA12 with LUCA13 sequences. The gene space of 3.3 kb contains three exons coding for a 1.8-kb mRNA that is well expressed in all analyzed human tissues, including lung, and in 20 lung cancer cell lines. The mRNA is well represented in EST databases from normal and tumor cells. Three mutations leading to truncated products were discovered in 79 lung cancer cell line DNAs. The FUS1 protein (110 amino acids) is probably a soluble cytoplasmic protein with a high pI of 9.69; no domains or known motifs were detected by the SMART (38) or PFAM (37) programs. The mouse orthologue was discovered in mouse EST databases (AA867009, AA473614, and AA672013), sequenced (our accession no. AF123387), and shown to be 93% identical at the protein level and 87% at the cDNA level. The fly proteome (66) does not contain a gene with substantial homology to FUS1. However, the worm gene C09E9.1 shows 41% identity on global alignment and 43% identity over 83% of the FUS1 protein length and should be considered a candidate orthologue of FUS1. This small worm protein (123 amino acids) is predicted to have a bipartite NLS (residues 84–101) and weak similarity to DNA-directed RNA polymerase subunit A' (accession no. P31813). The mutations found in FUS1 make it an attractive candidate for further functional TSG studies.
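The FUS1 comparison reports two identity values (41% on global alignment, 43% over 83% of the protein length) because a percent identity depends on which alignment columns are counted. The sketch below uses a hypothetical `percent_identity` helper and toy aligned sequences to show three common conventions; which convention the cited programs applied is not stated here, so this illustrates the arithmetic rather than reproducing the reported numbers.

```python
def percent_identity(aln_a: str, aln_b: str, denominator: str = "columns") -> float:
    """Percent identity of two equal-length aligned sequences ('-' = gap).

    denominator:
      "columns" - every alignment column (gap columns count as mismatches)
      "shorter" - length of the shorter ungapped sequence
      "aligned" - only columns where neither sequence has a gap
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # A column is identical when both residues match and neither is a gap.
    identical = sum(a == b and a != "-" for a, b in zip(aln_a, aln_b))
    if denominator == "columns":
        n = len(aln_a)
    elif denominator == "shorter":
        n = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    elif denominator == "aligned":
        n = sum(a != "-" and b != "-" for a, b in zip(aln_a, aln_b))
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return 100.0 * identical / n


a, b = "MKT-LLVW", "MKSALL--"
for mode in ("columns", "shorter", "aligned"):
    print(mode, round(percent_identity(a, b, mode), 1))
```

The same four identical residues give 50.0%, 66.7%, or 80.0% depending on the denominator, which is why identity figures are most comparable when the alignment type and region are stated alongside them.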
The HYAL2 gene, along with HYAL1, was discovered manually by screening cDNA libraries with a genomic fragment from LUCA13 that was conserved across species in Southern blotting. Because these were the first two genes we isolated in our positional cloning effort, they were initially given the working names LUCA1 and LUCA2 (see GenBank deposits). With the discovery of their function as hyaluronidases, they should now be referred to as HYAL1 and HYAL2, and the names LUCA1 and LUCA2 should be reserved for a functional lung cancer TSG yet to be discovered. The gene space of 2.8 kb contains three exons that encode a 2-kb mRNA that is well expressed in all analyzed human tissues, including lung, and in lung cancer cell lines except SCLC line NCI-H524, which carries a small (~30-kb) homozygous deletion/rearrangement.³⁴ HYAL2 is abundantly represented in EST databases from normal and tumor tissues. No mutations were detected in the 40 lung cancer cell lines tested. The HYAL2 protein is a member of a large family of hyaluronidases (EC 3.2.1.35), and the expressed recombinant protein was indeed shown to have enzymatic activity (73). PSORT predicts a signal peptide and cell surface and lysosomal localization (37, 39). Similarly, SMART predicts a signal peptide and a Ca2+-binding epidermal growth factor-like domain (residues 365–440; Ref. 38). Recently, the mouse orthologue (AJ000059) was cloned and mapped to the syntenic region of mouse chromosome 9 between the microsatellite markers D9Mit183 and D9Mit17 (74). The worm gene T22C8.2 encodes a protein of similar size (458 amino acids) that shows 32% global identity and 50% similarity with the HYAL2 protein and is predicted to be an orthologous protein by the Orthologue program (48). The fly (66) and yeast (50) proteome databases do not have any members with homology to the hyaluronidase family of proteins.
HYAL1 was discovered manually, along with HYAL2, by screening cDNA libraries with a conserved genomic fragment from cosmid LUCA13. It is another hyaluronidase with amino acid sequence homology to HYAL2 and HYAL3 (see below). The gene space of ~3.5 kb contains three exons coding for a 2.6-kb mRNA that is well expressed in all analyzed human tissues, including lung, and is abundantly represented in EST databases from normal and tumor tissues. However, it is not expressed in 18 of 20 lung cancer cell lines. Two missense mutations were detected in 40 lung cancer cell lines. The HYAL1 protein is a member of a large family of hyaluronidases (EC 3.2.1.35) and was indeed shown to have enzymatic activity (accession nos. U03056 and U96078.1). Triggs-Raine et al. (75) identified two mutations in the HYAL1 alleles of a patient with a newly described lysosomal disorder, mucopolysaccharidosis IX: a mutation that introduces a nonconservative amino acid substitution (Glu268Lys) at a putative active-site residue, and a complex intragenic rearrangement, 1361del37ins14, which results in a premature termination codon. They reasoned that the mild phenotype caused by these mutations reflected functional redundancy among the three tandemly located hyaluronidases HYAL1, HYAL2, and HYAL3 (discussed below). Thus far, no increased incidence of cancer has been reported in these kindreds. PSORT predicts a signal peptide and cell surface and lysosomal localization (39). Similarly, SMART predicts a signal peptide and a Ca2+-binding epidermal growth factor-like domain (residues 357–430; Ref. 38). Recently, the mouse orthologue was cloned and shown to map to the syntenic region of mouse chromosome 9 (accession no. AF011567; Ref. 76). The worm gene T22C8.2 encodes a protein of similar size (458 amino acids) that shows 31% global identity and 46% similarity with the HYAL1 protein and contains the same domains. It is predicted to be an orthologous protein by the Orthologue program (48). The fly (66) and yeast (50) proteome databases do not have a member homologous to the hyaluronidase family of proteins. The absent expression and the occurrence of mutations make HYAL1 an attractive candidate for future promoter methylation and TSG functional studies.
The FUS2 gene was discovered manually by screening cDNA libraries with a genomic fragment from cosmid LUCA14 that showed conservation in cross-species Southern blot hybridizations. FUS2 was also present in the fusion junction genomic DNA clone isolated from the NCI-H524 30-kb homozygous deletion; although it was not itself involved in or rearranged by this deletion, it was given the "FUS" working name at the time of its isolation. The gene space of ~3.5 kb contains an intronless, single-copy gene (accession no. AF040705) coding for a 1.9-kb mRNA expressed in normal human tissues, including lung. However, an alternatively spliced form with one intron (accession no. AF040706) also exists and yields the same predicted amino acid sequence; its intron lies in the 5' untranslated region. The mRNA is well represented in EST databases from normal and tumor tissues. Four FUS2 missense mutations were detected in 78 lung cancer cell lines. PSORT (39) predicted FUS2 to be a soluble nuclear protein, and several notable domains and motifs were identified. The SMART (38) and PFAM (37) programs predicted an acetyltransferase (GNAT) domain (residues 66–189) and a proline-rich domain (residues 239–262) that overlaps (residues 234–249) with the Wilms tumor protein signature. A Src homology 2 domain (residues 240–250) was detected by the BLOKS (77) program, whereas the EMOTIF (42) program detected a ZP motif (residues 25–32) and a eukaryotic thiol (cysteine) protease signature (residues 180–188), which may explain the suggested weak similarity to furin-like proteases (accession no. AAC02732). The presence of these domains raises the intriguing possibility that FUS2 is directly involved in nuclear activities. However, Zegerman et al. (78) recently demonstrated that FUS2 is a soluble cytoplasmic protein that functions as an N-acetyltransferase with specificity for its substrates, using a ping-pong mechanism. The worm protein C56G2.15 shows 32% identity and 65% similarity on global alignment and should be considered a true orthologue of FUS2. As expected, it contains all of the predicted protein domains except the ZP motif and is predicted to be a nuclear protein. Interestingly, the worm gene contains three small introns, in contrast to the one-intron and intronless forms of the human FUS2 gene. The mouse orthologue was discovered in mouse EST databases (accession nos. AA051756, AA051686, AI425576, and AA833145), sequenced (accession no. AF172275), and shown to be 69% identical at the protein level and 87% at the cDNA level. The mouse mFUS2 protein contains an additional 28-amino acid stretch and, like the human protein, is predicted by the PSORT program (39) to be a nuclear protein. PFAM (37) and ProfileScan (43) both predict an acetyltransferase (GNAT) domain (residues 92–217) and a proline-rich domain (residues 267–291). FlyBase (66) contains several ESTs (accession nos. AI064351, AI109425, and AI404849) that could be assembled into a partial cDNA coding for a 168-residue protein that is 39% identical and 60% similar to the human FUS2 protein and is predicted to have an acetyltransferase (GNAT) domain (residues 13–138). However, the fly proteome (66) does not have a true orthologue of FUS2, only several proteins with a GNAT domain. The occurrence of mutations and the demonstration of its biochemical activity make FUS2 an attractive candidate for future TSG functional studies.
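For reference, the textbook steady-state rate equation for a two-substrate ping-pong (double-displacement) mechanism is shown below, taking A as the acetyl donor (acetyl-CoA) and B as the acceptor substrate. These assignments and symbols are assumptions for illustration; no kinetic constants for FUS2 are given in this text. Because the denominator lacks the constant term found in sequential mechanisms, the slope of the double-reciprocal plot against 1/[A] does not depend on [B], so plots at different fixed [B] are parallel, a common diagnostic of this mechanism.

```latex
\begin{aligned}
v &= \frac{V_{\max}\,[A][B]}{K_m^{B}[A] + K_m^{A}[B] + [A][B]} \\[4pt]
\frac{1}{v} &= \frac{K_m^{A}}{V_{\max}}\cdot\frac{1}{[A]}
  + \frac{1}{V_{\max}}\left(1 + \frac{K_m^{B}}{[B]}\right)
\end{aligned}
```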
The HYAL3 gene was discovered in silico through EST matches and its sequence relationship to the HYAL1 and HYAL2 genes. It occupies 5.5 kb of genomic space and codes for a ~2.0-kb mRNA, composed of two or three coding exons, that is expressed in several human tissues, including lung and testis, and is well represented in EST databases. The protein belongs to the hyaluronidase family of enzymes (EC 3.2.1.35) and thus represents the third member of this family in the region. Phylogenetically, it is closer to the worm hyaluronidase gene T22C8.2 than to the other members of the human family. No mutations were found in 40 lung cancer cell lines. No mouse orthologous sequences were found in the databases as of April 2000. HYAL3 RNA was not expressed in any lung cancer cell lines (data not shown); however, it has a very restricted pattern of expression in normal tissues, and Csoka et al. (79) did not find it expressed in normal lung. The lack of mutations and absent expression in normal lung make HYAL3 a less attractive TSG candidate.
The IFRD2/SKMC15/SM15 gene was discovered experimentally by screening cDNA libraries with conserved genomic fragments (56). GenBank refers to SKMC15/SM15 as IFRD2, and thus we will use the terminology IFRD2/SM15. The gene space is ~6 kb and codes for a ~4-kb mRNA composed of 12 exons, expressed in several human tissues including lung (56). It is well represented in EST databases from normal and tumor cells. The IFRD2/SM15 protein is predicted by PSORT (39) to be a soluble nuclear protein and contains a bipartite NLS at residues 115–132 predicted by ProfileScan (43). PFAM (37) identified one Armadillo/β-catenin-like repeat (residues 249–288), which suggests possible involvement in APC signaling. No mutations were found in 63 lung cancer cell lines (56). The mouse orthologue was discovered in ESTs (accession no. W65790), sequenced, and shown to be 93% identical at the protein level and 87% at the cDNA level. This true mouse orthologue of IFRD2/SM15 is different from the mouse gene mIFRD1/PC4, which has its own human orthologue located on chromosome 7q22–31 (80). Interestingly, mIFRD1/PC4 and its human orthologue, IFRD1, are not localized in the nucleus and are probably membrane proteins. IFRD2/SM15 and IFRD1/PC4 have different patterns of expression during mouse development (47, 80). A relationship of IFRD2/SM15, and probably of the other family members (PC4 and IFRD1), to the IFNs is not well supported. The slightly shorter worm protein F58B3.6 (accession no. Z73427) shows 36% identity and 52% similarity to the SM15 protein on global alignment and should be considered a potential orthologue. The fly gene CG3098 product (accession no. AAF51186) is shorter (324 residues), has 43% similarity to 277 residues of SM15 and an NLS (residues 93–110), and should be considered a potential orthologue. The continued expression of IFRD2/SM15 and the lack of mutations make it a less attractive TSG candidate.
The SEMA3B/SEMA A(V) gene was discovered experimentally by using DNA fragments from cosmid LUCA14 to screen cDNA libraries and to capture exons (57). The correct nomenclature for this member of the semaphorin family is SEMA3B [previously referred to as SEMA-A(V)]. It is composed of 17 exons spread over 8–10 kb of genomic space and codes for a 3.4-kb mRNA that is expressed in several normal tissues, including lung and testis, but is not expressed at all in 12 SCLC lines (Fig. 2; Ref. 57). It is well represented in EST databases from normal and tumor tissues. Three missense mutations were found in 39 lung cancer cell lines; all mutations were in NSCLCs. The mouse semaphorin A gene (accession no. X85990) is most likely the mouse orthologue of the SEMA3B gene (86% identity and 89% similarity at the protein level on global alignment; Ref. 48). Several mouse EST clones (accession nos. AI553114, AA518074, and AA466386), when translated, show 80–94% identity. The worm genome contains three semaphorin genes, of which CeSema (accession no. U15667) shows 33% identity and 49% similarity over the whole length of the CeSema protein and could be considered an orthologous gene (48). The fly proteome (66) contains seven semaphorin proteins, of which the product of the Sema-2a gene (accession no. AAF57990) is of similar size, is predicted to be secreted, has a similar domain structure, and should be considered a potential candidate orthologue of SEMA3B. The SEMA3B protein is predicted by PSORT (39) to be an extracellular secreted protein. The SMART (38) and PFAM (37) programs identify a signal peptide (residues 1–25), a PFAM:SEMA domain (residues 55–497), and one IGc2 domain (residues 587–646). Interestingly, as detected by the PFAM program (37), the PFAM:SEMA domain is also present in the extracellular part of the MET and RON oncoproteins, which belong to the MET family of receptor tyrosine kinases. Thus, it would be reasonable to test the hypothesis that interaction of the SEMA3B and SEMA3F (see below) proteins with these oncoproteins disrupts the activation of MET and RON and thereby conveys a negative growth signal. The lack of expression and the occurrence of mutations make SEMA3B an attractive candidate for methylation and TSG functional analysis.
The GNAI2 gene was discovered and cloned 12 years ago as part of studies on G proteins (21). GNAI2 was mapped to 3p21.3 by us and others and located to the central part of the 370-kb region (Fig. 1; Refs. 9, 17, 18, and 20). It is composed of 8 exons spread over ~22 kb of genomic space. The 2.5-kb mRNA is well expressed in normal tissues and lung cancer cell lines and is well represented in EST databases. No mutations were found in 34 lung cancer cell lines. The product is a G protein localized to the endoplasmic reticulum. PFAM (37) predicts a G-α domain (residues 6–354) and an arf domain (residues 157–307). The GTPase function has been established experimentally. The mouse orthologue (accession nos. RGMSI2 and P08752) is 98% identical and 99% similar at the protein level and 96% identical at the cDNA level. The worm orthologue (accession no. P51875) is 67% identical, and the fly orthologue (accession no. P20353) is 76% identical at the protein level. The newly predicted fly gene product G-oα47A (accession no. AAF58790) is identical in size and 72% similar in amino acid sequence and should be considered a potential orthologue of GNAI2. The lack of mutations and continued expression of GNAI2 in lung cancers suggest it is an unlikely TSG candidate.
The G17 gene was discovered experimentally by cDNA selection onto cosmid LUCA17, and the selected cDNA was then used to screen cDNA libraries. The gene space of 17 kb encodes a 3-kb mRNA composed of 18 exons. The mRNA is expressed in several human tissues, including lung, and is well represented in EST databases from normal and tumor tissues. No mutations were found in 38 lung cancer cell lines, and Gene17 was expressed in many lung cancers. By homology to ABC transporters, the product is predicted to function as a plasma membrane amino acid transporter. It contains 10–11 transmembrane helices [predicted by the SPLIT (40) and TMHMM (41) programs] and an aromatic amino acid permease-2/xan_ur_permease domain (residues 67–455) predicted by the ProfileScan and PFAM servers. The mouse orthologue was discovered in several EST clones (accession nos. AI098786, AI048261, and AI466351), sequenced (our accession no. AI098786), and shown to be 90% identical at the cDNA level and 97% at the protein level. The yeast (50), worm, and fly (66) proteomes contain several amino acid transporter genes of similar size. The lack of mutations and continued expression make Gene17 an unlikely TSG candidate.
The GNAT1 gene was cloned 10 years ago (Ref. 22; accession no. X15088) and encodes the transducin protein isolated from the eye. We and others positioned the gene in 3p21.3, i.e., in the homozygous deletion overlap region close to the GNAI2 gene (Fig. 1; Refs. 18 and 20). The gene space of ~3.5 kb contains seven exons and encodes a 1.5-kb mRNA expressed abundantly in the retina and fetal heart tissues and in T-cell lines. GNAT1 was not expressed in lung or in lung cancer cell lines. No mutations were found in analysis of genomic DNA from 35 lung cancer cell lines. The mouse orthologue (48) was cloned (accession no. P20612) and is 100% identical on global protein alignment. The worm (accession no. P51875) and fly (accession no. P20353) proteomes contain similarly sized G proteins of as yet unknown function with 50 and 60% identity, respectively. The newly predicted fly gene product G-oα65A (accession no. AAF50626) is identical in size, 86% similar, and should be considered
a true orthologue of GNAT1. The transducin protein is an α1 G-protein subunit localized in the endoplasmic reticulum. PFAM (37) predicts a G-α domain (residues 2–349) and an arf domain (residues 161–342). The restricted tissue distribution of GNAT1 expression and the lack of mutations make GNAT1 an unlikely TSG candidate.
The SEMA3F/SEMA-IV/SEM IIIF gene, the second semaphorin gene in the region, was identified experimentally and cloned independently by