Integration of human papillomavirus (HPV) DNA into the host genome is a frequent event in cervical carcinogenesis and is reported to occur at randomly selected chromosomal sites. However, as the databases are being updated continuously, the knowledge based on sequenced viral integration sites also expands. In this study, viral-cellular fusion transcripts of a preselected group of 74 cervical carcinoma or cervical intraepithelial neoplasia grade 3 (CIN3) biopsies harboring integrated HPV16, HPV18, HPV31, HPV33, or HPV45 DNA were amplified by 3′-rapid amplification of cDNA ends PCR and sequenced. Consistent with previous reports, integration sites were found to be distributed throughout the genome. However, 23% (17 of 74) of the integration sites were located within the cytogenetic bands 4q13.3, 8q24.21, 13q22.1, and 17q21, in clusters ranging from 86 to 900 kb. Of note, clusters 8q24.21 and 13q22.1 are within 1.5 Mbp of an adjacent fragile site, whereas clusters 4q13.3 and 17q21 are >15 Mbp distant from any known fragile site. It is tempting to speculate that as yet unknown fragile sites may be identified on the basis of HPV integration hotspots. No correlation between HPV type and specific integration loci was found. Of 74 fusion transcripts, 28 contained cellular sequences that were homologous to known genes, and 40 contained sequences of predicted genes. In 33 fusion transcripts, both viral and cellular sequences were in sense orientation, indicating that the gene itself or upstream sequences were affected by integration. These data suggest that the influence of HPV integration on host gene expression may not be a rare effect and should encourage more detailed analyses. [Cancer Res 2008;68(7):2514–22]
- cervical cancer
- human papillomavirus
- fusion transcripts
- fragile sites
Persistent infection with high-risk human papillomavirus (HPV) is the main risk factor for the development of high-grade precancerous lesions [cervical intraepithelial neoplasia grade 2 (CIN2) and CIN3] and cervical carcinoma ( 1, 2), with HPV16 being the most prevalent type, followed by HPV18, HPV31, HPV33, and HPV45 ( 3). The immortalizing and transforming potential of the virus lies in its oncogenes E6 and E7, underlined by their constitutive expression in HPV-induced cervical carcinomas ( 4, 5). The E6 and E7 oncoproteins mediate mitogenic and antiapoptotic stimuli by interacting with numerous regulatory proteins of the host cell that control the cell cycle ( 6, 7). Moreover, the viral oncoproteins induce mitotic defects and genomic instability by uncoupling centrosome duplication from the cell division cycle ( 8, 9). During malignant progression, viral DNA is frequently integrated into the host genome ( 10, 11). A number of experimental studies show that integration results in altered expression of E6 and E7 thereby providing the cells with a selective growth advantage ( 12). Moreover, viral sequences expressed as fusion transcripts together with human sequences have been reported to have a longer half-life as compared with episome-derived transcripts, which is explained by the absence of instability elements normally present in the 3′ untranslated region of the episome-derived transcripts ( 13). In addition, due to the frequent disruption of the E2 region during the integration process, the viral transcriptional control by E2 is lost and, as a consequence, the expression of E6 and E7 increases ( 14). However, it needs to be considered that in many precancers and cancers, episomal viral genomes are present together with the integrated viral DNA, suggesting that viral E2 is available in trans to regulate E6 and E7 expression ( 15). 
E6 and E7 expression levels may also be influenced by human cis-acting regulatory elements present in the vicinity of the viral integration site ( 16). Indeed, differences in the oncogene expression pattern have been observed by RNA-RNA in situ hybridization between low-grade and high-grade CIN but not between high-grade CIN and invasive cancer ( 17, 18). E6 and E7 transcripts were not detectable or were expressed at very low levels in the basal and parabasal layers of low-grade CIN. In contrast, high-grade CIN invariably expressed E6 and E7 mRNA in basal cells as is the case for cancer cells, but the signal intensity varied considerably irrespective of histologic grading. These differences in steady-state levels of oncogene transcripts were also evident in analyses using quantitative reverse transcription-PCR (RT-PCR) technology ( 19, 20). In a recent analysis, no correlation was found between the expression levels of viral oncogene transcripts and the physical state of the viral genome, which is in clear contrast to the observations made in cell culture models ( 21).
To date, more than 200 virus-host integration sites have been mapped ( 11, 22). All integration loci examined were unique, involving all chromosomes, and fragile sites, translocation break points, and transcriptionally active regions were found to be preferred sites for integration ( 23– 25). In addition to the effects on viral oncogene expression, integration may also influence the expression of cellular genes. Reuter et al. ( 26) report on the functional inactivation of APM-1, a putative tumor suppressor gene, which had resulted from insertional mutagenesis in combination with the deletion of the second allele of the gene. Moreover, in a number of tumors, integration sites have been mapped near the MYC locus. Of note are the high levels of MYC expression in these tumors, suggesting that integration may influence transcription of neighboring genes ( 27– 29). However, a systematic analysis of the disruption or deregulation of cellular genes as a consequence of viral integration has not been done.
The number of characterized integration sites is still too small to fully understand all aspects of HPV integration. The aim of this study was to analyze viral integration sites at the RNA level in further detail, both to consolidate current knowledge and to provide new insights into this process. In addition to HPV16 and HPV18, we also analyzed integration sites for HPV31, HPV33, and HPV45.
Materials and Methods
Clinical samples. In this study, biopsies from a preselected group of CIN3 and cervical carcinomas known to harbor integrated HPV DNA were analyzed. Sixty-five cervical carcinoma samples (International Federation of Gynecology and Obstetrics stages IA–IVB) were obtained from patients treated at the Department of Gynecology of Friedrich-Schiller-Universität, Jena, Germany, between 1995 and 2003 (n = 29) and at the Norwegian Radium Hospital, Oslo, Norway, between 1995 and 2003 (n = 36). Nine CIN3 samples were obtained from women admitted to Østfold County Hospital, Norway, for treatment in the period October 1999 to October 2001 (n = 5) and to the Norwegian Radium Hospital in October 2002 through October 2003 (n = 4). All biopsies were histologically verified. The samples were positive for HPV16 (n = 15), HPV18 (n = 27), HPV31 (n = 2), HPV33 (n = 7), or HPV45 (n = 23). For HPV typing, we used the GP5+/GP6+ enzyme immunoassay approach as described by Jacobs et al. ( 30) and nucleic acid sequence–based amplification as previously described ( 4, 31).
Nucleic acid isolations. Total RNA was isolated with the RNeasy Mini Kit following the protocol for total RNA extraction from animal tissues (Qiagen). Samples were homogenized by using the QIAshredder (Qiagen). An additional DNase treatment step was included for all samples. For some of the HPV18-, HPV45-, HPV31-, and HPV33-positive samples collected in Norway, nucleic acids were extracted as described earlier ( 4, 31) and DNA was removed by DNase treatment for 20 min at 70°C (Sigma-Aldrich). Total RNA was eluted in 40 μL of RNase-free water and stored at −70°C.
Reverse transcription. Total RNA (∼500 ng) was reverse transcribed using 20 units of Superscript II reverse transcriptase (Invitrogen) and an oligo(dT)17 primer coupled to a linker sequence ( 32), referred to as Frohman primer. The reaction was done for 1 h at 42°C in a final volume of 20 μL. To allow for proper primer annealing, the primer and template were incubated at 70°C for 10 min before the reverse transcription. To control for RNA integrity and first-strand cDNA quality, PCR using glyceraldehyde-3-phosphate dehydrogenase primers was done as previously described ( 33).
Amplification of integrate-derived fusion transcripts. HPV oncogene transcripts derived from integrated viral genomes (viral-cellular fusion transcripts) were amplified using the amplification of papillomavirus oncogene transcript (APOT) assay ( 34). APOT is based on a 3′-rapid amplification of cDNA ends PCR. HPV E7 primers are used as forward primers, an adapter primer complementary to the linker sequence in the Frohman primer as first reverse primer, and the Frohman primer as the second nested reverse primer. The reactions were done as previously described ( 34; see footnote 7) with minor modifications for HPV16 and HPV18: the reaction mixture was subjected to an initial denaturation step for 3 min at 94°C, followed by 30 cycles of denaturation at 94°C for 0.5 min, primer annealing, and elongation at 72°C for 2 min. For HPV16, annealing temperatures of 61°C and 67°C for first and second PCR, respectively, were used; for HPV18, 61°C and 70°C. The reaction was ended by a final elongation step at 72°C for 6 min. Five microliters of the first PCR were used as template for the nested PCR step.
APOT amplification products were visualized by 1.2% agarose gel electrophoresis. Viral-cellular fusion transcripts (often referred to as off-size bands) deviate from the characteristic size of the major viral transcript (E6*I-E7-E1vE4-E5) derived from episomal viral DNA. HPV-positive cell lines like HeLa and CaSki, which are known to harbor multiple copies of integrated viral DNA, served as controls in the APOT assay and invariably show off-size bands only.
PCR. For a number of randomly selected samples (n = 15), the existence of the fusion transcript was confirmed by semi-nested RT-PCR using viral and cellular integration site–specific primers. The PCR amplification was carried out in a total volume of 50 μL containing as final concentrations 1× PCR buffer [20 mmol/L Tris-HCl (pH 8.4), 50 mmol/L KCl], 1.5 mmol/L MgCl2, 0.2 mmol/L deoxynucleotide triphosphates, 0.25 μmol/L (250 pmol) of each primer, and 1 unit of Taq DNA polymerase (Invitrogen). The first DNA denaturation step was done for 3 min at 94°C, followed by 30 cycles of denaturation for 45 s at 94°C, annealing for 45 s at 55°C, and extension for 1 min at 72°C, before a final extension for 6 min at 72°C. Five microliters of the first PCR were used as template for the nested PCR step. The temperature profile was as for the first PCR step except for the annealing temperature, which was 57°C.
Expression of predicted genes was explored by semi-nested RT-PCR using gene-specific primers only. The reaction conditions were the same as above.
Cloning and sequence analysis. Viral-cellular fusion transcripts were excised from the agarose gel and extracted using the QIAquick Gel Extraction Kit from Qiagen. The isolated amplimers were cloned into the TOPOpCR4 vector using the TA Cloning Kit (Invitrogen), then subjected to DNA sequence analysis; for some samples, the PCR products were sequenced directly. Samples were sequenced using LI-COR 4200S (MWG-Biotech) or the CEQ 2000 XL DNA Capillary Electrophoresis System from Beckman Coulter. The genomic localization of the human sequences and the chromosomal fragile sites adjacent to each viral integration locus were determined by database query using the National Center for Biotechnology Information (NCBI) Build 36 ( 35) and the University of California, Santa Cruz (UCSC) hg18 (March 2006) human genome assemblies ( 36).
Results
Viral-cellular fusion transcripts of a preselected group of biopsies comprising 65 cervical carcinomas and 9 CIN3 known to harbor integrated viral DNA were amplified, cloned, and sequenced. The cellular sequences of the fusion transcripts were first characterized by the NCBI human megaBlast database alignment tool. Additionally, all sequences were analyzed using the UCSC Blat database to identify expressed sequence tags and genes of probable identity.
Mapping viral-cellular fusion transcripts to chromosomal bands. The chromosomal reference of all viral-cellular fusion transcripts with respect to Giemsa-stained bands was taken from the UCSC database ( 37). All bands targeted by integration were aligned and are presented in Table 1 . Viral-cellular fusion transcripts and thus the corresponding integration sites were found to be distributed throughout the genome, targeting all chromosomes except for chromosomes 21 and 22 ( Fig. 1 ). Twenty-one integration loci, of which most are assigned to sub-bands, have thus far not been reported in the literature. These are marked with a suffix in Table 1. Moreover, several cytogenetic bands showed three or more HPV integration events. Five sites on 8q24.21 were located within a distance of ∼1 to 860 kb upstream of the MYC gene. For 13q22.1, six sites were located within a region of 380 kb between the two genes KLF5 and KLF12. For 4q13.3, three sites were spanning a region of 86 kb between the IL8 gene and CXCL6; for 17q21, a region of 900 kb was targeted by three integration sites. No correlation was seen between HPV type and integration loci.
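The cluster sizes quoted in this section correspond to the distance between the outermost integration sites within a cytogenetic band. The following is a minimal sketch of that arithmetic, not the authors' procedure. The function name `cluster_spans`, the `min_sites` threshold, and all coordinates are hypothetical and chosen only so that one band reproduces an 86 kb span.

```python
from collections import defaultdict


def cluster_spans(sites, min_sites=3):
    """Group (band, position) pairs by band and return
    {band: (number_of_sites, span_in_bp)} for bands with at least
    min_sites integration events."""
    positions_by_band = defaultdict(list)
    for band, position in sites:
        positions_by_band[band].append(position)
    return {
        band: (len(positions), max(positions) - min(positions))
        for band, positions in positions_by_band.items()
        if len(positions) >= min_sites
    }


# Hypothetical coordinates for illustration only (not data from the study).
example_sites = [
    ("4q13.3", 74_800_000),
    ("4q13.3", 74_850_000),
    ("4q13.3", 74_886_000),
    ("2q22.1", 141_000_000),
    ("2q22.1", 141_500_000),
]

print(cluster_spans(example_sites))
```

With these made-up positions, only the band with three sites is reported as a cluster; the band with two sites falls below the assumed threshold.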
HPV integration within fragile sites. All integration loci were checked for the presence of fragile sites. By use of the NCBI fragile site map viewer, we found that 42 of the integrations were located in or close to a fragile site ( Table 1): 21 within common or rare fragile sites and 21 up to 5 Mbp adjacent to a common or rare fragile site. Thirty-two samples were not associated with any fragile site.
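The three categories used above (within a fragile site, up to 5 Mbp adjacent, or not associated) can be expressed as a simple distance rule. The sketch below assumes that fragile sites are available as start and end coordinates on the same chromosome as the integration site. The function name, the interval representation, and the example coordinates are hypothetical; the study itself relied on the NCBI fragile site map viewer.

```python
def classify_fragile_site(site_pos, fragile_intervals, max_adjacent_bp=5_000_000):
    """Classify an integration site relative to fragile sites on the
    same chromosome.

    Returns (label, distance_bp), where label is 'within', 'adjacent',
    or 'none', and distance_bp is the distance to the nearest fragile
    site (None if no fragile site is known on that chromosome)."""
    nearest = None
    for start, end in fragile_intervals:
        if start <= site_pos <= end:
            return "within", 0
        # Distance to the closer edge of this interval.
        distance = start - site_pos if site_pos < start else site_pos - end
        if nearest is None or distance < nearest:
            nearest = distance
    if nearest is not None and nearest <= max_adjacent_bp:
        return "adjacent", nearest
    return "none", nearest


# Hypothetical fragile site spanning 10 to 12 Mbp (illustration only).
fragile = [(10_000_000, 12_000_000)]
print(classify_fragile_site(11_000_000, fragile))
print(classify_fragile_site(13_520_000, fragile))
print(classify_fragile_site(30_000_000, fragile))
```

The 5 Mbp cutoff is taken from the text; any other threshold can be passed through `max_adjacent_bp`.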
Splicing within viral sequences. All viral-cellular fusion transcripts were spliced ( Fig. 2 ). In all cases, the splice donor site was within the 5′ end of the viral E1 open reading frame (ORF). The nucleotide positions for the splice donor sites were 880 for HPV16, 929 for HPV18 and HPV45, and 877 and 894 for HPV31 and HPV33, respectively. Almost all transcripts (71 of 74) were spliced directly to cellular sequences. In only two cases did splicing involve the E1 and E4 ORFs before continuing into the human sequence: in the HPV18-positive sample D3772, an ag-ta splice acceptor signal at position 3,432 was used, and in the HPV45-positive sample D3458, an ag-at splice acceptor signal was used at position 3,421. The viral sequences were fused without further splicing to human sequences from nucleotides 3,759 and 3,638, respectively. Also exceptional was a fusion transcript that used a splice acceptor within E1 at nucleotide 2,779 before the transcript continued into the human sequence at nucleotide 3,033 within E2 ( Fig. 2).
Cotranscribed cellular sequences show homology to known and predicted genes. Of 74 fusion transcripts, 28 cellular sequences corresponded to known genes. Forty transcripts contained sequences of predicted genes, as revealed by searching with the UCSC Blat tool ( Table 1). Six samples did not show any similarity other than genomic sequences; however, for sample F380, the use of characteristic splice sites and the presence of a polyadenylation [poly(A)] signal are suggestive of an unidentified gene in this area. In some cases, cDNA synthesis was initiated at A-rich stretches rather than at the 3′ poly(A) tail. This resulted in shorter cDNAs but still permitted the identification of authentic viral-cellular fusion transcripts.
Splicing to cellular sequences and orientation of ORFs. Among the 28 fusion transcripts containing sequences of known cellular genes, 15 transcripts had both the viral and the cellular sequences in sense orientation (i.e., in a coding orientation). In most of these cases (12 of 15), the viral sequence was spliced to a cellular exon sequence ( Table 1). For all but one (sample D3708, NPM3) of these transcripts, the regular cellular splice acceptor site was used. For fusion transcripts containing sequences of predicted genes, the results are more complex: For several transcripts, two or more predicted genes were found on the same chromosomal location and often on both strands of the DNA. Among the 40 samples containing fusion transcripts with sequences of predicted genes, 6 transcripts showed fusion in sense to an exon of a predicted gene.
Discussion
DNA-based PCR methods allow the determination of the precise chromosomal location of viral integration sites. Examples of these techniques are human interspersed repetitive sequence–specific PCR ( 38), restriction-site PCR ( 39), detection of integrated papillomavirus sequences by ligation-mediated PCR ( 40), and restriction enzyme cleavage, self-ligation, inverse PCR ( 41). The integration assay of the present study is based on the amplification of cDNA by the APOT approach ( 34). This strategy was chosen because we wanted to limit our analyses to integration sites with a transcriptionally active viral genome. In tumors with multiple integration sites, only one integration site seems to be active ( 42). This integration site is not only relevant for the dysregulated expression of the viral oncogenes but may also provide further insights into the effect integration may have on the host genome. With the APOT assay, viral-cellular fusion transcripts are readily detected because their fragment size differs from that of episome-derived transcripts.
The DNA sequence of the cellular part of all cloned fusion transcripts was assigned to Giemsa-stained chromosomal bands taken from the UCSC database ( 37). This enabled an easier comparison with data from the literature. Consistent with previous reports, integration sites were found on all chromosomes except chromosomes 21 and 22. In addition, several integration sites were clearly clustered in certain chromosomal regions. Twenty-three percent (17 of 74) of the integration sites analyzed were located within the cytogenetic bands 4q13.3, 8q24.21, 13q22.1, and 17q21.2. The overall physical distance between the integration sites within each cluster ranged from 86 to 900 kb, which represents exceedingly short regions in genomic terms. The most prominent integration sites are those located on 8q24.21. In a review, Wentzensen et al. ( 11) report on 12 clinical samples with HPV integration within band 8q24. In addition, the cell line HeLa shows HPV18 integration in this region ( 27, 28, 43– 45). However, all previous reports refer to the broader bands 8q24, 8q24.1, or 8q24.2. Several studies also report that integration has taken place within or close to the MYC proto-oncogene, also known as c-MYC, previously mapped to the region 8q24.12-13 ( 28, 45). According to the UCSC hg18 database, the MYC gene is located within band 8q24.21. In our study, the sequences identified on 8q24.21 matched neither the MYC gene nor any other gene of known function. However, all integrations were located within a distance of 1 to 860 kb upstream of MYC. Integrations of HPV close to the MYC gene may lead to deregulation and activation of MYC, which was previously shown for five cell lines harboring HPV integration in this region ( 46). On the other hand, the MYC gene is commonly deregulated in cancers and it is therefore difficult to substantiate that the deregulation is actually a result of HPV integration.
Still, Peter and colleagues showed that in cell lines with integration at other chromosomal sites, no MYC mRNA or protein overexpression was observed, indicating a role of HPV integration in the up-regulation observed in the other cell lines. Moreover, Ferber et al. ( 28) have found only HPV18 integrations and no HPV16 integrations to be associated with the MYC locus. Hence, it was speculated that this could be an explanation for the generally more aggressive tumors associated with HPV18. Our study, however, does not confirm this hypothesis because, in addition to two HPV18 integrations, we found two HPV16 integrations and one HPV45 integration on 8q24.21. In analogy to the clustering of integration sites on chromosome 8, all six integration sites on 13q22.1 were located within a region of 380 kb adjacent to the gene KLF5. The influence the viral sequences exert on KLF5 has not been examined, but it is of interest to note that, by ontology, KLF5 is classified as a tumor suppressor gene. Moreover, clusters with three viral integration sites each are located on bands 4q13.3 and 17q21.2. Further evidence for HPV integration sites within these two sub-bands is provided in the literature ( 22, 47). Of interest also are the two integration sites within the gene LRP1B (low-density lipoprotein-related protein 1B) on 2q22.1. HPV integration within this gene has also been reported in two previous independent studies ( 25, 28). LRP1B belongs to the low-density lipoprotein receptor gene family, and through interactions with multiple ligands, these receptors play a wide variety of roles in normal cell function and development ( 48). Further, based on its reported inactivation in different tumor types, LRP1B has been proposed as a candidate tumor suppressor gene ( 49– 51).
Viral integration within fragile sites has previously been reported in several studies ( 23– 25). Depending on their mode of induction and frequency within the population, fragile sites are grouped as rare or common ( 52). Fragile sites are regions that form gaps, breaks, or rearrangements on chromosomes exposed to replication stress, and genes in or near these sites are therefore prone to foreign DNA integration. It is also argued that the frequently observed integration within fragile sites can be related to their greater susceptibility to integration-induced chromosomal alterations ( 22, 24, 28, 29, 39, 53). Of the 74 integrations presented in our study, 21 were located within a common or rare fragile site. It is also of note that the clustered integration sites on 8q24.21 and 13q22.1 were located within 1.52 and 0.83 Mbp, respectively, of an adjacent fragile site, whereas the clusters on 4q13.3 and 17q21 were located >15 Mbp distant from any known fragile site. Thus, it is tempting to speculate that as yet unknown fragile sites may be identified on the basis of HPV integration hotspots ( Table 1).
In total, 28 of 74 (37.8%) samples showed integration within known genes and 40 of 74 (54.1%) samples within predicted genes. Considering the fact that coding sequences comprise only 2% of the genome ( 54), these data strongly indicate that integration as such is not an entirely random event but may frequently occur in transcriptionally active regions, which may simply be more accessible for integration. This was confirmed in additional experiments in which the transcriptional activity for a subset of the predicted genes (n = 10) was analyzed in normal cervical tissue and CIN. In all cases, transcriptional activity was shown in at least one of the tissues examined by semi-nested RT-PCR. Moreover, direct evidence for the involvement of cellular genes is provided by those fusion transcripts (n = 18) in which viral 5′ coding sequences are fused to cellular exons. For these cases, it remains to be shown whether viral integration has resulted in disruption of the gene or is located upstream of the coding sequences.
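The contrast drawn above between the observed proportions and the 2% figure can be made concrete with a binomial tail probability. This is an illustrative back-of-the-envelope sketch, not an analysis performed in the study. It assumes a deliberately simple null model in which each integration independently lands in a known gene with probability 0.02. That assumption is crude: genes including introns cover far more than 2% of the genome, and the APOT assay by design detects only transcribed integration sites. The helper `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_samples = 74
n_known = 28       # fusion transcripts matching known genes (from the text)
n_predicted = 40   # fusion transcripts matching predicted genes (from the text)
null_fraction = 0.02  # assumed chance level, taken from the 2% figure above

known_fraction = n_known / n_samples
predicted_fraction = n_predicted / n_samples
p_value = binomial_tail(n_known, n_samples, null_fraction)

print(f"known genes: {known_fraction:.1%}")
print(f"predicted genes: {predicted_fraction:.1%}")
print(f"P(X >= {n_known}) under the assumed null: {p_value:.3g}")
```

Under this simplified null the tail probability is vanishingly small, which agrees with the qualitative argument in the text; a more realistic null would need to account for total gene length and for the assay's selection for transcriptionally active sites.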
The results reported here expand on previous studies of HPV16 and HPV18 integration sites. We also provide the first detailed sequence data on HPV31, HPV33, and HPV45 fusion transcripts identified by the APOT assay. Our findings strongly suggest that there is no correlation between HPV type and specific integration loci. Furthermore, the observed clustering of integration sites in the cytogenetic bands 4q13.3, 8q24.21, 13q22.1, and 17q21 may reflect chromosomal regions that are genetically unstable and that may be relevant to the development of cervical cancer. In addition, the relatively high number of integration sites in either defined genes or predicted genes suggests that the disruption or deregulation of specific genes through HPV integration could contribute to the process of transformation and carcinogenesis. This needs to be addressed systematically in future analyses.
We thank Gunnar Kristensen (Department of Gynaecologic Oncology) and Kathrine Lie and Ruth Holm (Department of Pathology, Norwegian Radium Hospital, Oslo, Norway); Björn Hagmar (Institute of Pathology, Rikshospitalet University Hospital, Oslo, Norway); and Lars Espen Ernø (Department of Gynaecology and Obstetrics, Østfold County Hospital, Fredrikstad, Norway) for providing clinical samples and for histologic verification of the biopsies.
Note: I. Kraus and C. Driesch contributed equally to this work.
7 S. Vinokurova, personal communication.
- Received July 20, 2007.
- Revision received November 27, 2007.
- Accepted January 30, 2008.
- ©2008 American Association for Cancer Research.