Abstract
To facilitate the identification of tumor suppressor genes in the chromosome 3p21.3-p22 AP20 subregion, we constructed a 3.5-Mb physical and gene map of this segment (between markers D3S4285 and D3S3873) that spans the distance from 124.4 cR3000 to 133.5 cR3000 of the GB4 genetic map. We used NotI-linking and -jumping clones, sequence-tagged site PCR marker analysis, and multicolor and fiber fluorescence in situ hybridization to confirm the sequence order and map orientation. An integrated clone contig composed of 5 yeast artificial chromosome, 15 bacterial artificial chromosome, 5 P1 artificial chromosome, and 8 NotI-linking clones provided the physical base of the map. We unequivocally established the order of 28 sequence-tagged sites and 35 genes in the region. Gaps between published bacterial artificial chromosome contigs were determined and covered by our own sequence data. Furthermore, three new genes were isolated, namely the human homologue to the rat Golgi peripheral membrane protein p65, GOLPH5 (GORASP1), the gene for stress-inducible protein, STI2, and the AP20-region gene 1, APRG1.
The tumor suppressor gene candidate APRG1 was positioned close to the border of the homozygous deletion in a small cell lung cancer cell line ACC-LC5. Expression analysis with a tissue-specific panel of cDNA revealed seven distinct tissue-specific splice variants (A–G) of the message (size range, 1.0–1.8 kb). Although the gene was expressed at a low level in all tested tissues, comparatively higher expression was detected in pancreas (splice forms B and D), kidney (A) and placenta (B and C). The APRG1 gene encoded a predicted protein of 170 amino acids (isoform B), which had an NH2-terminal part conserved among members of the eukaryotic translation factor 6 gene family. A Prosite pattern corresponding to the cell attachment sequence Arg-Gly-Asp was also found. The presence of this domain raised the intriguing possibility that APRG1B may be directly involved in membrane interactions and cell adhesion.
We showed that the AP20 region was duplicated during mammalian evolution and homologous gene clusters were present in human chromosome 2 and syntenic mouse regions on chromosomes 1, 2, and 9. Interestingly, the HYA22 gene (human ortholog of the yeast YA22 gene) was located at the borders of both breakpoints, evolutionarily conserved gene cluster and homozygous deletions detected in lung, kidney and other cancers.
NotI digestion revealed that the AP20 region was frequently and extensively methylated in renal carcinoma cell lines and tumor biopsies.
INTRODUCTION
Numerous studies have indicated the presence of distinct tumor suppressor genes (TSGs) or groups of genes in human chromosome 3p involved in the origin and/or development of carcinomas of the lung, breast, cervix, kidney, and head and neck. Detailed studies reviewed elsewhere (1) showed that these tumor suppressors could be of various types: classical (such as RB1 or P53) or haplo-insufficient, and cancer specific or multiple, i.e., involved in several distinct cancers. Our and others’ studies (1, 2, 3, 4, 5, 6) identified the two most frequently rearranged regions in human chromosome 3p, namely LUCA or 3p21.3C (centromeric) and AP20 or 3p21.3T (telomeric). These two regions are hot spots for loss of heterozygosity and/or homozygous loss in the major epithelial cancers mentioned above. In fact, homozygous deletions at both locations have been reported in lung, breast, kidney, and cervical carcinomas (1, 3, 6, 7, 8, 9).
The LUCA region was completely sequenced and several candidate TSGs were identified as a result of the major effort by the International Lung Tumor Suppressor Gene Consortium (1, 3, 8, 10).
A homozygous deletion in the 3p21.3T region was found in the small cell lung cancer (SCLC) cell line ACC-LC5. Physical and gene maps covering this deletion were constructed and 14 genes were identified; however, none of them demonstrated features of TSGs (11, 12, 13, 14).
Several of our previous studies were associated with human NotI-jumping and -linking clones (13, 15, 16, 17, 18). NotI-linking clones contain DNA fragments flanking a single NotI recognition site, whereas NotI-jumping clones contain sequences adjacent to neighboring sites. Such clones were shown to be tightly associated with CpG islands and genes (18).
We have shown that our previously built NotI clone contig (AP20) overlapped with the homozygous deletion in ACC-LC5 (4, 13). The physical map constructed with NotI-jumping and -linking clones (13) revealed significant differences between our map and that of Daigo et al. (14) and the draft human genome sequence. Careful analysis of these differences resulted in the identification of several new genes and alternative splice forms that are currently under analysis. Here, we present an integrated 3.5-Mb physical and gene map that facilitates identification of multiple TSG(s) in the critical AP20 region.
MATERIALS AND METHODS
Cell Lines and General Methods.
MCH/mouse (MCH939.2, MCH910.7, and MCH924.4) and human MCH/rat (MCH429.11) microcell cell lines were generated by microcell-mediated chromosome transfer as described previously (19) . The ACC-LC5 SCLC cell line that carried a deletion in 3p21.3 (11) was kindly provided by Dr. Yusuke Nakamura (University of Tokyo, Tokyo, Japan). ACHN, HN51, and Caki-2 RCC cell lines were purchased from the American Type Culture Collection (Manassas, VA). Lymphoblastoid cell line CBMI-Ral-STO (20) , as well as RCC cell lines KRC/Y, TK-164, TK-10, and KH-39, were obtained from Microbiology and Tumor Biology Center, Karolinska Institute (Stockholm, Sweden) cell lines collection (4) .
All molecular biology and microbiology procedures were performed as described previously (15, 16, 17) . Plasmid DNA was purified using REAL-Prep kit (Qiagen, Valencia, CA). Sequencing was done using ABI 310 and ABI 377 Sequencers (Applied Biosystems, Foster City, CA) according to manufacturer’s protocol.
PFGE and Hybridization.
Paired normal and renal cell carcinoma tissue samples were obtained immediately after resection and stored at −80°C before DNA extraction. Each tumor piece was examined histopathologically. Only clear cell type tumors were included. Preparation of DNA for PFGE, digestion with restriction enzymes, and PFGE were done using the Chef mapper (Bio-Rad Laboratories, Hercules, CA) according to manufacturer’s protocols. Southern transfer and hybridization were performed as described previously (16 , 17) .
Molecular Probes.
The construction of NotI-linking/jumping libraries was described previously (15, 16, 17) . The following clones were used for DNA hybridizations: NotI-linking clones NL1-024 (D3S4258), AP40 (D3S1646), NL1-401 (D3S4581), NLJ-003 (D3S1642), AP20 (D3S4311), NL3-003 (D3S3872), and NL1-308 (D3S3873); and NotI-jumping clones J32-611 (connecting NLJ-003 and AP20), J32-612 (connecting AP20 and NL3-003), and J31-613 (connecting NL3-003 and NL1-308). NL3-019 (D3S4633) was cloned by PCR from normal lymphocytes DNA using PCR primers: NL3-019F 5′-GGATCCGGGATGGGGTATAC-3′; NL3-019R 5′-GGATCCTTAAATGCATAAGACCC-3′.
D3S1611, D3S3880, D3S1298, WI-6058, WI-692, SGC30812, D3S1260, D3S3521, D3S2343, and WI-7900 STS PCR markers were selected from databases of Whitehead Institute and CHLC. Additionally, new PCR probes were designed: DLC1F 5′-GAGATACATGTTGCCTCACCAG-3′, DLC1R 5′-CATACTGGTCTTCGCTATGCAC-3′; APRG1F 5′-TGTAAACTTTCCAGAACAGGCCCAGA-3′, APRG1R 5′-TTAATAAGGCTGTTACCGTGTAAATGT-3′; NL1-024F 5′-GGGCTGGCAGAACAGGTAACG-3′, NL1-024R 5′-GAGGCATCACTGGGTTCGCTG-3′; AP40F 5′-GGTAGCTTTCGGGCTTCC-3′, AP40R 5′-TCTGCACCTAGATGGCTGTG-3′; NL1-401F 5′-AAGAAGCCTGTTAGTGACGG-3′, NL1-401R 5′-CACAAGCTCTGTACCACTGG-3′; NLJ3F 5′-GGGACACGAGGATGCCCTAA-3′, NLJ3R 5′-CAGAGGCAGCCAGCCAATTT-3′; AP20F 5′-CTTCACCACAGCTGGCCAC-3′, AP20R 5′-CCTATGGCATCGTGTGTCTG-3′.
Probes for the FISH analysis were labeled using Nick translation kits (Roche Molecular Biochemicals, Indianapolis, IN) with either biotin-14-dATP or digoxigenin-11-dUTP according to manufacturer’s protocol.
Five YAC clones, 925e3, 938g7, 936c1, 803g5, and 790f3, were selected from Centre d’Etude du Polymorphisme Humain-Genethon integrated maps. Human PAC clones 167i15, 84h8, 38k3, and 296a3 were identified using hybridization of NotI-linking clones with high-density-gridded filters (PAC Library RPC11, HGMP Resource Centre, Hinxton, Cambridge, United Kingdom). P1 clone RDK3118, containing a full-length human mutL (Escherichia coli) homologue 1 (MLH1) gene, was kindly provided by Dr. Richard D. Kolodner (Ludwig Institute for Cancer Research, La Jolla, CA). Human BAC clones were purchased from HGMP Resource Center.
FISH.
Slides with metaphase spreads of normal male individuals were supplied by Micro System Sweden AB (Stockholm, Sweden).
DNA fibers were prepared according to the technique that uses agarose-embedded high molecular weight DNA as a target for FISH, essentially as described previously (21). In brief, peripheral blood lymphocytes were embedded in 1% low melting point agarose for the preparation of blocks containing 10⁸ cells/ml. Cells were lysed by incubation at 52°C in 1 mg/ml proteinase K (Merck, Darmstadt, Germany) in 1× TE [10 mM Tris (pH 8)-1 mM EDTA] in the presence of 1% N-laurylsarcosine for 48 h. Agarose blocks were washed five times in 1× TE [10 mM Tris (pH 8)-1 mM EDTA] overnight, treated with 100 μg/ml RNase A in 2× SSC at 52°C overnight, and stored at 4°C in 50 mM EDTA. A small piece of block (5 μl) was placed at the end of a 3-aminopropyl-trimethoxysilane-coated glass microscope slide (Merck), 15 μl of water were added onto the agarose, and the slide was heated at 95°C for 20–30 s. The DNA was extended on the slide, which was then air dried, submerged in 70% ethanol for 30 min at room temperature, cross-linked at 80°C for 30 min, and stored in the dark at −20°C.
FISH was performed using standard procedures (22) . Biotinylated probes were detected using Cy3-conjugated avidin (Amersham Pharmacia Biotech, Piscataway, NJ), and the signal was amplified by biotinylated goat antiavidin (Vector Laboratories, Burlingame, CA) and another layer of Cy3-avidin. For digoxigenin-labeled probes, mouse antidigoxigenin (Roche Molecular Biochemicals) and rabbit anti-FITC and fluorescein-conjugated swine antirabbit (both from Dako A/S, Glostrup, Denmark) were used. Slides were counterstained with 5 μg/ml 4′,6-diamidino-2-phenylindole (Merck) and mounted in Vectashield antifade medium (Vector Laboratories).
Molecular Cloning of Human APRG1, GOLPH5, and STI2 Genes and Expression Analysis.
Gene fragments were obtained by PCR from the Multiple Tissue cDNA panel no. K1421-1 (Clontech, Palo Alto, CA), using the following primer sets, according to the manufacturer’s manual: hP65–5 5′-GAATCGAGCGCCGAGAGAGCGAGT-3′, hP65–3 5′-GTGAGGGCAACTTTGGGTCAGACT-3′; APRG1–5 5′-ATCTGTTATGTTCACTGGGGCATCTCC-3′, APRG1–3 5′-AATGAAGTGCCATCATTTAGCCAGTCC-3′; STI2–5 5′-GAGATGAGCAGCAATGACTCCTCCCTTAT-3′, STI2–3 5′-AATGTGTCATTTTCTGAATCCCTTCTCCA-3′.
PCR products were cloned by Topo TA cloning kit for sequencing (Invitrogen, Carlsbad, CA).
To determine the expression pattern of GOLPH5 and STI2, Northern hybridization was performed with human multiple tissue Northern blots (Clontech; nos. 7760-1 and 7766-1).
Bioinformatics.
DNA homology searches were performed using the BLASTX and BLASTN programs (23, 24) at the NCBI server. Sequence assembly was done using Dnasis (Hitachi-Pharmacia). The Beauty Post-Processor was used with the BLASTP protein database searches provided by the Human Genome Sequencing Center (Houston, TX). Scanning of the Prosite and PfamA protein families and domains was performed at the server of the Swiss Institute for Experimental Cancer Research and at the NCBI server (CD-Search). Transmembrane regions and their orientation/topology (TMpred prediction) were provided by the ISREC server.
RESULTS AND DISCUSSION
NotI Physical Map of the AP20 (3p21.3T) Region.
We have established the order of NotI-linking clones NLJ-003, AP20, and NL3-003 and showed that the 3p21.3 breakpoint in MCH939.2 was located between NLJ-003 and AP20 clones (13) . Moreover, this region was missing in all other spontaneously deleted MCH cell lines that we analyzed previously (25, 26, 27, 28) : MCH910.7, MCH429.11, and MCH924.4.
To confirm that the breakpoint in the MCH939.2 cell line was inside the homozygous deletion described by Murata et al. (11), we designed PCR primers, including NL1-024, AP40, NL1-401, NLJ-003, and AP20. All primers except D3S1611, AP20, and D3S3521 yielded the expected products with DNA of YAC 936c1, which completely covered the homozygous deletion. This confirmed that the region spontaneously deleted in MCH cell lines coincided with the homozygous deletion detected in SCLC and non-small cell lung cancer cell lines. This was additionally validated by PCR analysis that placed the NLJ-003 linking clone into the homozygous deletion in the ACC-LC5 cell line (Table 1).
Table 1. Ordering of AP20 region markers according to PCR with YAC clones and ACC-LC5
Using FISH and the deletion cell hybrid panel, eight NotI-linking clones were mapped to the 3p21.3-p22 region (25) and connected by NotI-jumping clones and PFGE hybridization (Fig. 1; jumping clone J31-613, connecting NL3-003 and the downstream NL1-308, is not shown, nor are the PFGE data for NL1-024 and NL3-019). The NotI map for the AP20 region (Fig. 2) was significantly different from that of Gemmill et al. (29). For example, between MLH1 and NL3-003 we found 4 NotI clones compared with 12 in the published map (29).
Fig. 1. A NotI physical map of the AP20 region. Two halves of NotI clones are designated by letters a–k. A, orientation of NLJ-003 and NL1-401 clones according to Daigo et al. (14). Black boxes show possible NotI-jumping clones. B, orientation of NotI clones and physical distances between them established in this study. Possible NotI-jumping clones are shown. Two of them (J31-611 and J32-612) have been isolated from libraries described previously (16, 17). C, representative PFGE hybridizations with NotI-linking clones.
Fig. 2. An integrated map of the AP20 region. A, ideogram of the banding pattern of human chromosome 3. B, physical map (in Mb) of the region with NotI framework markers and sequence-tagged sites, including microsatellite markers. C, minimum tiling contig of BAC, PAC, and NotI-linking clones covering the region. The YAC 936c1 and homozygous deletion in ACC-LC5 are also shown. D, gene map. Orientations of transcription are designated by arrows. Putative pseudogenes and other in silico predicted genes for which no confirmed information was obtained are not shown. E, the cluster of genes orthologous to AP20 region on murine chromosome 9 (MMU9), 62–70 cM, in same orientation: from telomere to centromere. Solid line shows the region of continuous DNA homology. GenBank accession numbers are given for unnamed genes. F, paralogous genes on human chromosome 2 (red, HSA2q35, and green, HSA2q32, lines) and murine chromosomes 1 and 2 (MMU1 and MMU2, deep and light blue lines). The region of natural resistance/susceptibility to intracellular macrophages parasites (NRAMP1, gene SLC11A1) on the HSA2q35 is highlighted. The name for the murine ortholog of the human NLI-IF gene, Gip, is given according to GenBank accession no. AY028804.
Our map also contradicted other published maps (12, 14) and draft maps available online. The most striking difference is the orientation of the central part between clones NL1-401 and AP20. According to our data, the order of the clones is as follows: tel, NL1-401-NLJ-003 - AP20-cen (Fig. 1B). NotI-jumping clones J31-611 and J32-612 argue decisively against the order suggested in Ref. 14 (tel, AP20 - NL1-401-NLJ-003; Fig. 1A), in which such NotI-jumping clones could not exist (see construction of NotI-jumping libraries in Ref. 16). This NotI map is in excellent agreement with our previously published data (13) and contains significantly more information, including physical distances between framework markers.
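The adjacency argument can be illustrated with a short sketch. It assumes, as a simplification, that a NotI-jumping clone can only connect two NotI sites that are direct neighbors in the map, and it encodes only the two connections stated in Molecular Probes (NLJ-003 with AP20, and AP20 with NL3-003). The function name and data layout are hypothetical and are not part of the original analysis; NL3-003 is left out of the Ref. 14 order because its position there is not stated in the text.

```python
# Pairs of NotI-linking clones joined by a jumping clone (from Molecular Probes).
JUMPS = [("NLJ-003", "AP20"), ("AP20", "NL3-003")]

# Clone orders, telomere to centromere, as quoted in the text.
THIS_STUDY = ["NL1-401", "NLJ-003", "AP20", "NL3-003"]
REF_14 = ["AP20", "NL1-401", "NLJ-003"]  # NL3-003 position not given here


def unsupported_jumps(order, jumps):
    """Return the jumping-clone pairs whose ends are not neighbors in `order`.

    Pairs that mention a clone absent from `order` are skipped, because
    nothing can be concluded about them.
    """
    position = {name: index for index, name in enumerate(order)}
    conflicts = []
    for left, right in jumps:
        if left not in position or right not in position:
            continue
        if abs(position[left] - position[right]) != 1:
            conflicts.append((left, right))
    return conflicts


if __name__ == "__main__":
    print("this study:", unsupported_jumps(THIS_STUDY, JUMPS))
    print("Ref. 14   :", unsupported_jumps(REF_14, JUMPS))
```

Under these assumptions the order from this study leaves no jumping clone unexplained, whereas the Ref. 14 order separates NLJ-003 from AP20 by NL1-401, so a jumping clone joining them would not be expected.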
Construction of an Ordered Contig of Overlapping Clones Using FISH.
Similarly to the YACs in the LUCA 3p21.3C region (6), all YAC clones covering our AP20 region showed rearrangements and deletions (see Table 1). However, by combining published data, internet data, and our own search for PAC and BAC clones from the region, we succeeded in creating a contig of overlapping PAC/BAC/NotI-linking clones completely covering 3.5 Mb in the region with only one gap of ∼90 kb that was spanned by the SCN5A gene. We found that the 1-Mb centromeric segment was enriched in endogenous retrovirus sequences (between SGC38212 and D3S3521 and around WI2025). This perhaps created serious difficulties for shotgun genome sequencing approaches and could explain why the maps published by Celera Genomics (Rockville, MD) and NCBI (Bethesda, MD) were not correct for this particular region.
Many LINE repeats were present in the only remaining gap, and it was very unlikely that an unknown protein coding gene could be present in this gap. Therefore, the gene map shown in Fig. 2D is the most complete available at present.
The minimal contig of overlapping clones included 15 BACs, 5 PACs, and 8 NotI-linking clones (Fig. 2C). Gaps between NCBI bactigs were covered. To generate new sequence tags, direct sequencing of the PAC ends was performed. In total, 28 genetic markers were tested, including 9 derived by us, 16 by the Whitehead Institute, and 3 by the Cooperative Human Linkage Center. All these markers had been mapped in separate genetic maps, and we were able to integrate them onto our single physical map. The NCBI physical and genetic maps released at various time points differed considerably from one another (see, for example, Build 25 and Build 29); thus, we can compare our map only with previously published maps (14, 29, 30). Because those maps overlapped only partially with ours, only limited comparisons were possible. In them, the contig orientation, and hence the gene map, between AP40 (D3S1646) and NIB1520 was reversed as compared with our data. Published positions of the MYD88 and ACAA1 genes were not concordant with their real physical positions, because we assigned these genes by direct sequencing to a single PAC clone, 296a3. The order of markers in the internal 1.2-Mb region between NIB1520 and SGC38212 in our map was opposite to that in the earlier published integrated map (14). There were also two gaps between NCBI contigs in this region, close to NIB1520 and SGC38212. Thus, it is likely that the region between D3S1646 and SGC38212 was misplaced in earlier genetic and physical maps.
To confirm the contig assembly of the AP20 region by an independent method, several clones were mapped by multicolor FISH on metaphase chromosomes (see examples in Fig. 3, A–C). From Fig. 3, B and C, where RDK3118 (containing the MLH1 gene), 167i15 (containing clone NL1-401), and AP20 clones were used, it was clear that the AP20 marker was the most centromeric probe. To verify the physical distances and the assignment of the borders of the ACC-LC5 homozygous deletion, we performed fiber-FISH with five PACs (167i15, 84h8, 38k3, 296a3, and RDK3118) and two NotI clones (AP40 and AP20). The lengths of the PAC signals and gaps were measured using AP20 (11.4 kb) as a ruler (Fig. 3F). All PACs were end sequenced, and fiber-FISH confirmed their integrity. For instance, the size of PAC 296a3 was estimated by fiber-FISH as 190 kb, compared with 190,764 bp obtained by end sequencing and alignment to published data (AB026898). The data showed that PACs 167i15 and 296a3 were located at the borders of the ACC-LC5 homozygous deletion, with a distance between them of ∼37 kb in this cell line compared with ∼606 kb in normal lymphocytes. Interestingly, the AP40 marker was rather close to the MLH1 gene (30-kb gap between this clone and RDK3118, which contains MLH1).
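The ruler principle used for fiber-FISH can be written out as a small calculation. The sketch below assumes that signal length scales linearly with DNA length, so that a probe of known size (AP20, 11.4 kb) calibrates any other measured signal. The function names and the idea of passing lengths in arbitrary image units are illustrative assumptions; only the numerical constants are taken from the text.

```python
RULER_KB = 11.4  # size of the AP20 probe used as the ruler (from the text)


def kb_per_unit(ruler_signal_length):
    """Calibration factor: kilobases represented by one image length unit."""
    return RULER_KB / ruler_signal_length


def signal_to_kb(signal_length, ruler_signal_length):
    """Convert a measured signal or gap length to kilobases."""
    return signal_length * kb_per_unit(ruler_signal_length)


# Values reported in the text for PAC 296a3.
fiber_estimate_bp = 190_000
sequence_length_bp = 190_764
relative_difference = abs(sequence_length_bp - fiber_estimate_bp) / sequence_length_bp

# Reported distances between PACs 167i15 and 296a3.
normal_distance_kb = 606
acc_lc5_distance_kb = 37
distance_reduction_kb = normal_distance_kb - acc_lc5_distance_kb

if __name__ == "__main__":
    print(f"296a3 estimate differs from sequence by {relative_difference:.1%}")
    print(f"distance reduced by about {distance_reduction_kb} kb in ACC-LC5")
```

With the reported values, the fiber-FISH estimate for 296a3 differs from the sequence-derived length by well under 1%, and the distance between the two flanking PACs is about 569 kb shorter in ACC-LC5 than in normal lymphocytes.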
Fig. 3. Representative FISH images. A, two-color FISH assignment of PAC 167i15 (red) and AP20 (green) to 3p22-p21.3. B, ordering of PAC 167i15 (red) and AP20 (green) clones. Selected interphase nuclei (1) and metaphase chromosome images (2) show more centromeric localization of the AP20 clone. C, ordering of RDK3118 (green), 167i15 (red), and AP20 (green) clones. The figure demonstrates that these clones are colocalized at one band of the metaphase chromosome (2), but interphase FISH (1) reveals their order as follows: RDK3118, 167i15, AP20. D, localization of PAC 296a3 (green) centromeric to RDK3118 (red) and 167i15 (red) clones. Interphase FISH was done with 167i15 and 296a3 (1), RDK3118 and 296a3 (2), and all three clones (3). E, localization of 296a3 (green) between 167i15 (red) and AP20 (red) clones. Examples of interphase FISH are shown (1). F, collection of high-resolution fiber-FISH images. The length of the fiber-FISH hybridization signal is proportional to the DNA length of the probe. Thus, an image of a probe of known length could be used as a visual ruler to measure physical distances. G, evaluation of the physical distance between AP40 (red) and RDK3118 (green). The gap in the hybridization pattern corresponds to the approximately 30-kb distance between these probes on the physical map. H, fiber-FISH with PAC 167i15 (red) and PAC 84h8 (green). The second image is shown with an artificial shift between green and red signals, whereas the first shows the real image, where the overlapping part is yellow. I, FISH with PAC 167i15 (red) and PAC 296a3 (green) using DNA fibers from the ACC-LC5 SCLC cell line. The homozygous deletion drastically decreases the distance between these two probes.
Gene Map of AP20 Region.
First, we tested which genes were linked to the studied NotI sites. We have already shown that NL1-308 contains the MOBP gene and that AP20 is part of the SCN5A gene (13, 25). Additionally, clone NL1-401 was assigned to the ITGA9 gene; NLJ-003 marked the HYA22 gene (human ortholog of the yeast YA22 gene); NL3-019 contained the 5′ end of the ORCTL4 gene; and NL3-003 contained the GOLPH5 gene. Clone NL1-024 contained the OSBPL10 gene, and AP40 was tied to a gene of unknown function (AW967589). We analyzed microsatellite markers from the AP20 region used by us in earlier studies and found that all of them were associated with genes. D3S1611 was inside the MLH1 gene, the villin-like gene was connected to D3S1298, the XYLB gene contained D3S1260, and the MYD88 gene contained STS SGC30812.
In silico methods based on computational analysis of the ∼3.5-Mb sequence allowed us to discover and clone additional genes, thereby increasing the total number of resident genes to 35 (Fig. 2D, Table 2) compared with 14 genes mapped earlier (14).
Table 2. Genes identified in the AP20 region
Gene sets were analyzed extensively using manual experimental methods as well as web-based computational servers to assign, when possible, protein functions. By Northern analysis, three of the genes showed reduced mRNA levels or loss of expression in SCLC: MLH1, DLEC1, and AXUD1. However, no other data supporting their TSG activity or involvement in lung or kidney carcinogenesis have been published (1).
Three new genes found in the critical segment are described below.
The APRG1 gene (GenBank accession nos. AJ493599–AJ493605) was discovered by the analysis of EST clone H86663, assigned to PAC 167i15. The gene occupied ∼36.3 kb of genomic space and was composed of at least seven exons. By screening cDNA libraries and the Multiple Tissue cDNA panel with the APRG1-5/APRG1-3 primer set, we found four splice forms as 1.0–1.8 kb messages (Fig. 4): form A (coding exon 6), form B (coding exons 6 and 8αβ), form C (coding exons 4 and 8β), and form D (coding exons 4 and 8αβ). Two ESTs from a heart cDNA library were available in public databases. The clone H86663 contained intron 6 because of incomplete splicing (form E in Fig. 4; AJ493603), and clone AA593744 presented exons 2–4 only (form F; AJ493604). A promoter/enhancer site (−778 bp) was predicted upstream of exon 2. Numerous putative transcription factor binding signals could be found throughout the first 500 bp, e.g., for transcription factors Lyf-1, DeltaEF-1, Tcf-11, GATA, and others. Two more ESTs, BM701622 (retina) and AW152277 (uterus), formed the rare splice variant G, which included exon 1 (AJ493605). The predicted promoter/enhancer site started at −215 bp. The predicted molecular weight of the longest polypeptide, encoded by isoform B, was Mr 18,500. The splice variants F and G did not present an extended ORF.
Fig. 4. Organization of the APRG1 gene. Boxes indicate APRG1 exons (numbers 1–8). Two CpG islands with the transcription initiation sites are designated with black, thick lines. Seven APRG1 transcripts, A–G, are outlined and their structures are also presented on the right.
Northern hybridization did not yield any major specific signal with either the A or the B splice variant, indicating low abundance of APRG1 mRNA. We performed PCR analysis of APRG1 expression with the Multiple Tissue cDNA panel, using the APRG1–5/APRG1–3 primers (exons 2, 8β). The APRG1 mRNA was expressed at the highest level in pancreas (splice forms B and D), kidney (A), and placenta (B and C). No PCR products were observed with cDNAs from heart, skeletal muscles, liver, brain, retina, and uterus. Therefore, we hypothesized that the above-mentioned tissues expressed incomplete splicing forms lacking either exon 2 or exon 8β of APRG1.
By sequence analysis, the gene encoded an unknown protein (170 aa, isoform B) with an NH2 terminus conserved among the eIF6 gene family (pfam01912 and pfam02697). This family included eukaryotic translation factor 6 as well as its presumed archaebacterial homologues.
Using the TMpred program, the APRG1B protein was predicted to span the membrane once at the COOH terminus (residues 148–166 aa). In addition, weak matches of kinase phosphorylation sites, N-myristoylation, and amidation sites were discovered by ProfileScan program. Prosite pattern corresponding to the cell attachment sequence, Arg-Gly-Asp, was found in the position 135–137. The presence of these domains raised the intriguing possibility that APRG1B may be directly involved in membrane interactions and cell adhesion (Prosite: pdoc00016). Interestingly, the isoform E, containing the same NH2-terminal domain structures, had no membrane helix or Arg-Gly-Asp sequence.
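A motif scan of the kind that reports the Arg-Gly-Asp pattern can be sketched in a few lines. This is only an illustration of the principle and not the Prosite or ProfileScan implementation; the function name is hypothetical, and the 170-residue sequence below is a placeholder built so that RGD falls at positions 135–137, not the APRG1B sequence.

```python
import re


def find_motif(protein, motif="RGD"):
    """Return 1-based (start, end) positions of every occurrence of `motif`.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [(m.start() + 1, m.start() + len(motif)) for m in pattern.finditer(protein)]


# Placeholder sequence (NOT the real APRG1B protein): 170 residues with
# Arg-Gly-Asp placed at positions 135-137, as reported in the text.
toy_protein = "A" * 134 + "RGD" + "A" * 33

if __name__ == "__main__":
    print(len(toy_protein), find_motif(toy_protein))
```

Positions are reported in the 1-based residue numbering used in the text, so a hit can be compared directly with the predicted COOH-terminal membrane-spanning segment (residues 148–166).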
BLAST searches in the mouse and other EST databases revealed cDNA from Bos taurus (AW358963; similarity 93% over 190 bp; exon 6). Only limited protein homologies to the Drosophila’s dah gene product, microneme protein-1 (Plasmodium vivax) and related to transforming growth factor β receptor associated protein 1 (Neurospora crassa) were found (similarity 52% at 57 aa overlap, 51% at 58 aa, and 44% at 87 aa, respectively).
GOLPH5 (AJ409349; latest NCBI name GORASP1) was discovered by screening EST databases with the sequence of NotI-linking clone NL3-003 (D3S3872). The gene occupied ∼11.0 kb of genomic space, was composed of 8 exons, and was abundantly represented in EST databases.
To determine the tissue distribution of GOLPH5 mRNA, a multiple tissue Northern blot was probed with the hP65-5/hP65-3 PCR product, containing the complete ORF of GOLPH5. GOLPH5 was expressed as a 3.5-kb mRNA in many normal human tissues (Fig. 5A). The transcript was underrepresented in lung and heart.
Fig. 5. Analysis of expression of GOLPH5 (A) and STI2 (B) in different tissues. Control hybridization with the same filter (Human Multiple Tissue Northern blots, Clontech, nos. 7765-1 and 7760-1) using a β-actin probe is shown for comparison.
By sequence analysis, the gene product of 440 aa was homologous (81% identity) to rat Golgi reassembly stacking protein, GRASP65 (NP_062258), and mouse hypothetical protein (82%, BC012251). NH2-terminal end was discovered by CD-Search to have two PDZ domains (residues 2–71 aa and 78–131 aa). They may function in targeting signaling molecules to submembranous sites. COOH-terminal domain (288–410 aa) was very conserved with rat and mouse (91% similarity), but its function seemed to be unclear. The neighboring domain 206–288 aa was less homologous (62%) and detected as a Pro-rich sequence (PS50099) by ProfileScan program in human and in rodent orthologs. The rat ortholog was shown to be a Golgi-associated protein, targeted by mitotic kinases (31) , and it is reasonable to suggest the same function for the GOLPH5.
STI2 (AJ487015) was discovered with the help of ESTs AC092053 assigned to BAC 331g2. The gene occupied ∼27 kb of genomic space, was composed of 27 exons, and was abundantly represented in EST databases. To determine the tissue distribution of STI2 mRNA, multiple tissue Northern blots were probed with the STI2-5/STI2-3 PCR product, containing the complete ORF of STI2. As shown in Fig. 5B, STI2 was strongly expressed as a 6.5-kb mRNA only in testis. In pancreas, kidney, skeletal muscle, liver, lung, placenta, brain, and heart, the STI2 transcript was hardly detectable by Northern analysis (Fig. 5).
By sequence analysis, the gene product of 1320 aa was homologous (76% identity) to putative murine protein containing TPR domain (BAB29674 and AK015017) and stress-inducible protein STI1 of C. elegans (29%, NP_498315). NH2-terminal end was found by CD-Search to have 10 TPRs. The tetratricopeptide repeat of ∼34 amino acids was first described in the yeast cell cycle regulator Cdc23p and later was found to occur in a large number of proteins. This family has been implicated in a wide variety of functions, including tumorigenesis. It has been proposed that TPR proteins preferably interact with WD-40 repeat proteins, but in many instances, several TPR proteins seemed to aggregate to multiprotein complexes (InterPro accession no. IPR001440).
AP20 Region Was Duplicated during Mammalian Evolution.
We have found a gene set paralogous to the AP20 region in chromosome 2q. Its chromosomal location was assigned to 2q32 and 2q35 (Fig. 2F). The protein homologies between chr.2 and chr.3 cluster members were as follows: 60% positive for ITGA4/ITGA9, 76% NLI-IF/HYA22, 63% VIL1/VILL, 73% PLCD4/PLCD1, 69% SCN2A/SCN5A, 80% SCN9A/SCN10A, and 84% GRASP55/GOLPH5. The linkage relationship within at least a 2-Mb region indicates a single large-scale genomic duplication. We have also found that the integrin ITGA4 gene (paralog of ITGA9) is assigned to a more centromeric position on 2q, chromosomal band 2q32 (shown in green in Fig. 2F), and thus is discontinuous with the other gene set (shown in red). The comparison with orthologous linkage groups in the murine genome revealed a similar organization (Fig. 2, E and F): a single cluster on MMU9 (HSA 3p22-21.3) and two clusters on MMU1 (Vil and Plcd4) and MMU2 (Itga4 gene). This could be explained by additional rearrangement of the originally duplicated region, which still exists in its original form in human chromosome 3 and mouse chromosome 9. HYA22 shares a common ancestor with nuclear LIM interactor-interacting factor (NLI-IF), which is linked to SLC11A1, the gene of natural resistance/susceptibility to intracellular macrophages parasites (NRAMP1 region; Refs. 32, 33). SLC11A1 encodes a biallelic (G169D) macrophage-restricted divalent-cation transporter and is implicated in iron regulation in vivo (33). Moreover, it was suggested that this region is involved in susceptibility to tuberculosis (34). The SLC11A1 gene spans 13604 bp and its sequence is highly enriched for DNA repeats. We performed a careful search for the homologous gene in 3p22-p21.33 and found only an untranscribed sequence with weak homology to the 3′-end of SLC11A1.
Thus, the nucleotide sequence overlapping HYA22 lies at the breakpoints of both the evolutionarily conserved gene cluster and the homozygous deletion.
This gene product contains a nuclear LIM interactor (NLI) interacting domain. NLI domain is a structural motif that has been well conserved throughout evolution from yeast to plants and known to play important regulatory roles in cellular development. Present evidence suggests that NLI facilitates long-range promoter enhancer interaction and is involved in mediating the cross-talk between transcriptional control elements. NLI may regulate the transcriptional activity of LIM homeodomain proteins by determining specific partner interactions. It is likely that NLI may function as an adapter protein to mediate the interaction between LIM domain transcription factors and a non-LIM factor (NLI-interacting factor). Thus it is possible that HYA22 may function as coordinator of transcriptional activity via its interaction with NLI.
Notably, this gene was deleted during construction of the original cosmid contig covering the ACC-LC5 deletion, and yeast cells that lack YA22 lose viability (12). Furthermore, the closely related OS4 gene (conserved gene amplified in osteosarcoma) is most likely involved in the development of human sarcomas (35).
Methylation status of AP20 region in RCC cell lines and biopsies.
Aberrant de novo methylation and silencing of tumor suppressor genes may be an initiating event in carcinogenesis. Therefore, additional experiments were performed to study the methylation status of the AP20 region. All of these experiments exploited the methylation sensitivity of the NotI restriction enzyme.
DNA from RCC cell lines was digested either with a single methylation-insensitive restriction enzyme (XbaI or BamHI) or double digested with one of these enzymes and NotI. An example of such an experiment is shown in Fig. 6A. The NL1-401 NotI site is methylated in all RCC cell lines, in contrast to the control. Cell line CBMI-Ral-STO served as the control because its DNA is highly hypomethylated (20). The summary data from this experiment are shown in Table 3. As demonstrated by Southern blot analysis, the NL3-019 NotI site was methylated even in the control. The AP40, NL1-401, and NL1-308 CpG islands remained methylated in all seven studied RCC cell lines.
Fig. 6. Methylation analysis of the AP20 region. A, Southern hybridization with DNA from RCC cell lines digested with XbaI (X) or simultaneously with XbaI and NotI (XN). The Southern blot was hybridized with the NL1-401 NotI-linking clone. CBMI-Ral-STO DNA was used to show the unmethylated status of the NotI site. No NotI digestion was observed in any of the RCC cell lines. B and C, PFGE hybridization of the telomeric AP20 (g) probe to cell lines (B) or to normal (N) and RCC (T) biopsies (C). DNA from the ACHN RCC cell line was also included in C for comparison. These last data show that only normal kidney DNA samples contain a cleavable NL3-019 NotI site, located between AP20 and NLJ-003.
Methylation of sites from the AP20 region in cell cultures
In the second type of experiment, methylation was investigated by PFGE using NotI clones as probes. Again, the AP20 region was found to be methylated in all 11 RCC cell lines and eight biopsies studied. An example of such a hybridization with the AP20 (g) fragment (Fig. 1) is shown in Fig. 6, B and C. Only a 0.9-Mb fragment was detected; thus, the DNA was not digested at the NL3-019 site. Interestingly, normal kidney tissue samples were hemizygously methylated at NL3-019 (Fig. 6C), a site that corresponds to the 5′ end of the ORCTL4 gene.
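The inference behind both types of experiment can be illustrated with a minimal sketch. It assumes a simplified linear map in which the flanking sites are always cut and a NotI site is cut only when unmethylated. All coordinates are hypothetical: the 900-kb spacing merely echoes the 0.9-Mb fragment mentioned above, and the NotI and probe positions are invented for illustration. The function names are likewise illustrative and do not come from the study.

```python
def detected_fragment(flank_sites, noti_sites, probe_pos):
    """Size (kb) of the fragment carrying the probe after digestion.

    flank_sites: positions (kb) that are always cut (assumed insensitive to methylation).
    noti_sites:  dict mapping NotI site position (kb) -> True if methylated.
    probe_pos:   position (kb) of the hybridization probe.
    """
    # NotI cuts only its unmethylated sites; flanking sites are always cut.
    cuts = set(flank_sites) | {
        pos for pos, methylated in noti_sites.items() if not methylated
    }
    left = [c for c in cuts if c <= probe_pos]
    right = [c for c in cuts if c > probe_pos]
    if not left or not right:
        raise ValueError("probe is not bracketed by cut sites")
    return min(right) - max(left)


def expected_bands(flank_sites, alleles, probe_pos):
    """Set of band sizes expected when the two alleles may differ in methylation."""
    return {detected_fragment(flank_sites, allele, probe_pos) for allele in alleles}


# Hypothetical map (kb): flanking sites 900 kb apart, one NotI site at 600, probe at 100.
FLANKS = [0, 900]
PROBE = 100
METHYLATED = {600: True}
UNMETHYLATED = {600: False}

# Both alleles methylated: only the large, uncut fragment is expected.
print(expected_bands(FLANKS, [METHYLATED, METHYLATED], PROBE))
# One allele methylated: the uncut and the cut fragment are both expected.
print(expected_bands(FLANKS, [METHYLATED, UNMETHYLATED], PROBE))
```

In this simplified model, a single large band corresponds to methylation of the NotI site on both alleles, whereas the presence of an additional smaller band corresponds to a cleavable site on one allele.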
In summary, our results imply that inactivation of the 3p22-p21.33-specific putative TSG(s) might be caused not only by deletion but also by hypermethylation of this region. We and others have shown that another critical 3p21.3 region, LUCA, contains several genes whose functions antagonize tumor growth and that can therefore be considered bona fide TSGs. Importantly, these TSGs are located in a 0.5-Mb region and are frequently hemizygously or homozygously deleted. These deletions result in inactivation of one or both copies of a gene. Hypermethylation, the second and possibly even more important mechanism inactivating TSGs in the LUCA region, leads to a severe decrease or complete loss of gene expression (1).
Our finding that the AP20 region is heavily methylated in all studied RCC cell lines suggests that hypermethylation of TSG(s) in this region may also play a critical role, similar to the situation in the LUCA region. It is possible that this locus contains insulators that can inactivate several genes simultaneously. The AP20 locus is frequently affected in several cancers (1, 5), and even YAC clones in AP20 are extremely unstable (Table 1).
In this study, we have constructed the most complete gene map of the AP20 region and provided a foundation for the additional testing of candidate TSGs. No obvious TSG candidate could be recognized at first glance. However, the MLH1 and MYD88 genes have been shown to have tumor-antagonizing activity (9, 36). It is very likely that the function of the ITGA9 gene is also important for growth regulation and could be involved in tumorigenesis (1). Moreover, many genes in the region have multiple splice forms, and both mutation and functional analyses of such genes are difficult. For example, DLEC1 (DLC1; Ref. 37) consists of at least 37 exons with numerous alternative splice forms. Introduction of the DLEC1 cDNA significantly suppressed the growth of different cancer cell lines. These features are reminiscent of the situation with the putative tumor suppressor gene FHIT (1). Other examples are the APRG1 and HYA22 genes. Both genes have several alternative splice forms that were not analyzed earlier. Preliminary data showed that both APRG1 and HYA22 suppressed growth of ACC-LC5 cells in vitro in a tetracycline-regulated system (Refs. 1, 38, 39; Zabarovsky, unpublished data). Thus, the AP20 region may resemble the LUCA region and contain several TSGs.
Most gene-finding algorithms are designed to look for protein-coding sequences, which can be more readily identified than noncoding RNAs by virtue of their ORFs, polyadenylation signals, conserved promoter regions, or splice-site signals. However, it has recently become clear that genes coding for such noncoding RNAs (ncRNAs) can have extremely important functions. Many of these ncRNAs have regulatory functions that play an important role in gene-silencing mechanisms (RNA interference, RNAi; small interfering RNAs, siRNA; for example, see Ref. 40). Regulatory ncRNA molecules could originate from the introns of protein-coding genes as functional by-products. The growing number of new, functional ncRNAs shows that, to fully understand the molecular mechanisms in a cell, we have to go beyond the predicted proteome when analyzing genomic sequences, and much remains to be explored in this field with regard to the AP20 region.
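To make the ORF criterion concrete, the following minimal sketch scans the three forward reading frames of a DNA string for ATG-to-stop ORFs above a length threshold. It is a deliberately naive illustration, not the procedure used by any particular gene-finding program: the function name, the threshold, and the toy sequences are assumptions, and the reverse strand, splicing, and all other signals are ignored. A sequence without a long ORF, as might be the case for an ncRNA gene, returns no hits.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Return (start, end) pairs of forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is the
    index just past the stop. `min_codons` counts codons from the ATG up to,
    but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


# Toy sequences (hypothetical): one with a 41-codon ORF, one with no start codon.
coding_like = "ATG" + "GCT" * 40 + "TAA"
noncoding_like = "GCT" * 40

print(find_orfs(coding_like))
print(find_orfs(noncoding_like))
```

A transcript of the second kind would be invisible to a search based on this criterion alone, which is the limitation described above.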
Footnotes
- The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- 1 This work was supported by research grants from the Swedish Cancer Society, the Swedish Research Council, Pharmacia Corporation, Åke Wiberg Foundation, STINT, and Karolinska Institute.
- 2 To whom requests for reprints should be addressed, at Microbiology and Tumor Biology Center, Karolinska Institute, Box 280, S-171 77 Stockholm, Sweden. Phone: 46-8-728-67-37; Fax: 46-8-31-94-70; E-mail: alepro{at}ki.se or Phone: 46-8-728-67-50; Fax: 46-8-31-94-70; E-mail: eugzab{at}ki.se
- 3 These authors contributed equally to this work.
- 4 The abbreviations used are: TSG, tumor suppressor gene; SCLC, small cell lung cancer; MCH, human monochromosome; RCC, renal cancer cell; PFGE, pulsed-field gel electrophoresis; FISH, fluorescence in situ hybridization; YAC, yeast artificial chromosome; BAC, bacterial artificial chromosome; NCBI, National Center for Biotechnology Information; ORF, open reading frame; TPR, tetratricopeptide repeat; PAC, P1 artificial chromosome; EST, expressed sequence tag; ISREC, Swiss Institute for Experimental Cancer Research.
- 5 Internet address: www.ncbi.nlm.nih.gov.
- 6 Internet address: www-genome.wi.mit.edu.
- 7 Internet address: gai.nci.nih.gov/CHLC.
- 8 Internet address: dot.imgen.bcm.tmc.edu:9331.
- 9 Internet address: www.isrec.isb-sib.ch/software/PFSCAN_form.html.
- 10 Internet address: www.ch.embnet.org.
- 11 Internet address: www.ncbi.nlm.nih.gov/cgi-bin/Entrez/maps.cgi?org=hum&chr=3.
- Received July 23, 2002.
- Accepted November 13, 2002.
- ©2003 American Association for Cancer Research.