The cyclin-dependent kinase (CDK) inhibitors p15, p16, p21, and p27 are frequently deleted, silenced, or downregulated in many malignancies. Inactivation of CDK inhibitors predisposes mice to tumor development, showing that these genes function as tumor suppressors. Here, we describe high-throughput murine leukemia virus insertional mutagenesis screens in mice that are deficient for one or two CDK inhibitors. We retrieved 9,117 retroviral insertions from 476 lymphomas to define hundreds of loci that are mutated more frequently than expected by chance. Many of these loci are skewed toward a specific genetic context of predisposing germline and somatic mutations. We also found associations of these loci with gender, age of tumor onset, and lymphocyte lineage (B or T cell). Comparison of retroviral insertion sites with single nucleotide polymorphisms associated with chronic lymphocytic leukemia revealed a significant overlap between the datasets. Together, our findings highlight the importance of genetic context within large-scale mutation detection studies, and they show a novel use for insertional mutagenesis data in prioritizing disease-associated genes that emerge from genome-wide association studies. Cancer Res; 70(2); 520–31
- CDK inhibitors
- insertional mutagenesis
- Down syndrome
Recent mutation profiling of human tumors has implicated hundreds of genes in the pathogenesis of cancer; however, the vast majority of these genes are only rarely mutated, making it difficult to determine which events are causal “driver” mutations and which are incidental “passenger” mutations (reviewed in ref. 1). Retroviral insertional mutagenesis screens performed in mouse models of cancer are useful complements to studies of human tumors because they provide an independent validation of oncogenic mutations in another organism (2). In these screens, slow-transforming retroviruses are used to induce mutations within somatic tissues (3). Over the lifetime of the mouse, these mutations accumulate, eventually promoting clonal expansion of cells bearing multiple oncogenic mutations. Because viral insertion sites can be easily identified by PCR, a high proportion of the mutations in each tumor can be identified with relatively little effort. This coverage allows for the identification of genetic interactions between mutations at rates that are not yet possible from the study of human tumors. Furthermore, the use of mice allows the identification of cancer genes that collaborate with specific cancer-predisposing lesions that have been introduced into the germline.
Cyclin/cyclin-dependent kinase (CDK) complexes promote the progression of cells through the cell cycle. Deregulation of CDKs is implicated in the progression of a variety of cancers, including hematologic malignancies (reviewed in ref. 4). Based on structural and functional characteristics, two families of CDK inhibitors can be distinguished: the Ink4 family (p16Ink4a, p15Ink4b, p18Ink4c, and p19Ink4d) and the Cip/Kip family (p21Cip1/Waf1/Sdi1, p27Kip1, and p57Kip2; reviewed in ref. 5). The Ink4 proteins specifically inhibit cyclin D–cdk4-6 complexes, and expression of Ink4 genes results in a G1-phase cell cycle arrest. p15Ink4b and p16Ink4a as well as p19Arf (which shares part of its coding sequence with p16Ink4a in a different reading frame) are located in the same chromosomal region that is frequently deleted in human cancer (6). Germline knockout studies have shown that loss of p15 and p16 predisposes mice to tumor development, confirming their role as tumor suppressor genes (7–10).
The Cip/Kip proteins have a broader spectrum of substrates, binding to cyclin D–, cyclin E–, and cyclin A–dependent kinase complexes. The p27 and p21 genes are rarely subject to inactivating mutations in human cancer, but can be inactivated by other mechanisms (reviewed in ref. 5). Mouse studies have clearly indicated that p27 is a tumor suppressor gene (11, 12). Germline disruption of p21 may have both antioncogenic and pro-oncogenic effects depending on the genetic context (13–15).
We performed retroviral insertional mutagenesis in mice deficient for one or more Ink4 and/or Cip/Kip family members to identify genes that collaborate with absence of these CDK inhibitors in cancer. Analyzing tumors from 476 mice, we identified 9,117 insertions, finding hundreds of common insertion sites (CIS) targeting known cancer genes as well as novel candidate cancer genes. Many of these CISs are specifically mutated in mice lacking one or more CDK inhibitors. Furthermore, we find CISs correlating with the cell type of origin and latency of the tumor, as well as co-occurring or mutually exclusive combinations of mutations. We also found a significant correlation between the location of single nucleotide polymorphisms (SNP) associated with familial chronic lymphocytic leukemia (CLL) and CISs, indicating that insertional mutagenesis data may be of use in identifying disease-associated genes.
Materials and Methods
Mouse strains and induction of tumors by murine leukemia virus infection
Some tumors were described in prior studies (16–18). p15Ink4b-deficient mice are described in ref. (7). Newborn pups were injected i.p. with the supernatant of murine leukemia virus (MuLV)–producing NIH3T3s. Animals were monitored for the development of tumors; moribund mice were sacrificed and tumors were isolated.
Cloning of insertion sites
Ligation-mediated splinkerette PCRs were performed as described in ref. (19). Two ligations were performed for each tumor, using DNA digested with Sau3AI and Tsp509I. PCR products were shotgun subcloned and 96 colonies per PCR were picked and sequenced on a capillary sequencer. Insertions are available online at the Mutapedia and RTCGD databases.
Sequences were mapped onto National Center for Biotechnology Information mouse build 37 using exonerate (20). Sequences from a single mouse with an identical insertion site were merged. CISs were identified and their significance values were estimated using the kernel convolution method described in ref. (21). CIS labels were manually annotated based on viral position and literature analysis. P values for associations between CISs and genotype, gender, age, fluorescence-activated cell sorting (FACS) markers, and co-occurring mutations were calculated using Fisher's exact test. Identification of genes and loci with more than one insertion per tumor at a higher frequency than expected by chance is as described in ref. (22); however, for random permutations, inserts were shuffled between mice of the same genotype rather than between all genotypes together to minimize false positives.
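The association tests described above can be illustrated with a minimal Python sketch of a two-sided Fisher's exact test on a 2×2 table (tumors with or without an insertion in a CIS, split by genotype). This is an illustration only and not the software used in this study; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 "mutated" tumors fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: a CIS is hit in 12 of 40 tumors of one genotype
# and in 5 of 60 tumors of all other genotypes.
p_value = fisher_exact_two_sided(12, 28, 5, 55)
print(f"P = {p_value:.4f}")
```

With many CISs and many pairwise genotype comparisons, such P values would normally be interpreted with the number of tests in mind.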
Identification of MuLV insertion sites and CISs in CDK-deficient mice
To identify genes that collaborate in tumorigenesis with deficiency for p15, p16/p19Arf, p21, and p27, we performed retroviral insertional mutagenesis in mice deficient for one or a combination of these genes. p27−/− mice developed tumors significantly faster than wild-type mice. p21 deficiency had no effect on tumor latency even when combined with loss of p27 (Supplementary Table S1; ref. 17). Mice deficient for both p19Arf and p16 also show accelerated tumor formation upon MuLV infection (18), whereas p15 deficiency significantly decelerates tumor formation compared with wild-type controls (log-rank test, P < 0.04).
To identify the genes that are mutated by retroviral insertions, we amplified the flanking sequences of the insertions using a ligation-mediated PCR protocol followed by shotgun subcloning and sequencing as described previously (19, 22). Previously, 931 insertion sites were cloned from a subset of 189 of these tumors (16, 18): an average of 4.9 inserts per tumor. In the current study, using an improved PCR protocol in combination with shotgun subcloning, we identified 9,117 independent insertions from 476 tumors (∼19 insertions/tumor; Supplementary Table S2).
Loci that are mutated at a frequency higher than expected by random chance may contain cancer genes. To identify these CISs, we created a density distribution of insertions over the genome using smoothed Gaussian kernels and estimated the significance of each peak by comparison with 10,000 permutations of insertions randomly distributed over the genome (21). Using a kernel width of 30 kb, we identified CISs for the separate tumor panels (Fig. 1A). We identified many CISs near known or candidate cancer genes, and their prevalence varies between genotypes. For example, some CISs are uniquely found in single knockouts but not in the compound knockouts and vice versa (Supplementary Table S3).
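The idea of a kernel-smoothed insertion density with a permutation-derived significance threshold can be sketched in Python as follows. This is a simplified, single-chromosome illustration under stated assumptions and not the kernel convolution framework of ref. (21): the kernel width is treated as the Gaussian standard deviation, the toy genome length, insertion positions, and the small number of permutations are all hypothetical, and the function names are invented for this example.

```python
import math
import random


def kernel_density(positions, grid, width):
    """Sum of Gaussian kernels (standard deviation = width) at each grid point."""
    return [sum(math.exp(-0.5 * ((g - p) / width) ** 2) for p in positions)
            for g in grid]


def peak_threshold(n_insertions, genome_length, grid, width,
                   n_perm=200, alpha=0.05, seed=0):
    """Peak height reached by roughly a fraction alpha of random permutations."""
    rng = random.Random(seed)
    null_peaks = []
    for _ in range(n_perm):
        # Place the same number of insertions uniformly at random.
        shuffled = [rng.uniform(0, genome_length) for _ in range(n_insertions)]
        null_peaks.append(max(kernel_density(shuffled, grid, width)))
    null_peaks.sort()
    return null_peaks[min(n_perm - 1, int((1 - alpha) * n_perm))]


genome_length = 1_000_000                      # toy 1-Mb "genome" (assumption)
grid = list(range(0, genome_length + 1, 10_000))
cluster = [498_000, 499_000, 499_500, 500_000, 500_200,
           500_500, 500_800, 501_000, 501_500, 502_000]
background = [37_000, 120_000, 215_000, 260_000, 333_000,
              610_000, 702_000, 790_000, 845_000, 951_000]
insertions = cluster + background

density = kernel_density(insertions, grid, width=30_000)
threshold = peak_threshold(len(insertions), genome_length, grid, width=30_000)
# Grid points whose density exceeds the permutation threshold are CIS candidates.
cis_positions = [g for g, d in zip(grid, density) if d > threshold]
print(cis_positions)
```

A narrow kernel resolves closely spaced clusters separately, whereas a wide kernel merges them, which is one reason to examine more than one kernel width.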
Genotype and gender specificities of CISs
We next asked which CISs are more frequently mutated in one genotype versus another. For this and subsequent analyses, we pooled our insertions with ∼11,000 insertions from 510 MuLV-induced wild-type, p19Arf-, and p53-deficient tumors (22) to extend the number of different genotypes that can be compared, as well as to improve the power of any statistical tests that use all tumors. After identifying CISs (596 using a 30-kb kernel and 300 using a 300-kb kernel; Supplementary Table S4; Fig. 1A and B), we determined which CISs are mutated significantly more frequently in one genotype versus the other genotypes by pairwise Fisher's exact tests (Supplementary Table S5 lists all significant associations). To visualize multiple pairwise comparisons per CIS, we represented the associations of each CIS as a heat map.
The top 12 CISs are mutated significantly more frequently in some genotypes than others (Supplementary Fig. S1). Myc and Mycn are structurally and functionally equivalent in many respects (23). Consistently, both genes show a significant bias toward p27-deficient tumors (Fig. 2A); however, Myc but not Mycn is selected against in p15−/− and p16−/− p19Arf−/− tumors. The genotype specificities of the cyclin D family members Ccnd3 and Ccnd1 also differ. Ccnd3 is less frequently mutated in p16−/− p19Arf−/− mice compared with other genotypes, including p19Arf−/− mice, whereas the opposite is found for Ccnd1 (Fig. 2B), suggesting that activation of Ccnd1 but not Ccnd3 provides a selective advantage in the absence of p16. A similar difference in genotype specificities is found for mutations near the miRNA cistron encoding mmu-mir-106-363 and its paralogue mmu-mir-17-92 (Fig. 2B). This differential specificity of paralogues is a recurrent phenomenon; CISs for Runx1 and Runx3, Pim1 and Pim2, and Rasgrp1 and Rasgrp2 are also selected in different genetic backgrounds, suggesting distinct roles in tumorigenesis despite their homology (Supplementary Fig. S2).
Some CISs are surprisingly specific for compound genotypes but not single knockouts (Zfp217 and Fli1/Ets1 are selected more frequently in p15−/− p21−/− mice than p15−/− or p21−/− single knockouts) or for heterozygotes but not homozygotes (CISs near Sox4 or Cebpb/A530013C23Rik/Ptpn1 are more frequently mutated in p16+/− p19Arf+/− compared with wild-types or p16−/− p19Arf−/−; Supplementary Fig. S3). Heat maps for full sets of genotype associations are available online.
In a similar fashion, we examined the association of all CISs with gender (Table 1). The Trim25/Dgke CIS is selectively mutated in female mice compared with males. Trim25 is an estrogen-responsive gene that promotes breast cancer growth by targeting the antiproliferative 14-3-3σ for proteolysis (24). Mutation of two Irf family members, Irf4 and Irf2, is selected for in female mice (at the 30-kb and 300-kb scales, respectively; Table 1 and data not shown). These oncogenes are thought to act through inhibition of transcription of the tumor suppressor Irf1. Female mice lacking Def6 (a negative regulator of Irf4; ref. 25) develop a systemic autoimmune disorder more frequently and at an earlier age than males. This lymphadenopathy is similar to human systemic lupus erythematosus, which also occurs more frequently in females than males (26).
Correlating CIS genes with tumor phenotype
Because average tumor latency differs between genotypes (Supplementary Table S1), we reasoned that the insertion profile of tumors would also significantly influence tumor latency. To separate the effects of genotype on latency from the effect of insertions, we normalized tumor latencies between all genotypes by assigning each mouse a percentile life span within its cohort. Using these values, two pooled cohorts were assembled, one containing the short latency mice from each genotype (<50th percentile) and a cohort that contained the long latency mice from each genotype (>50th percentile). Next, we examined what CISs were significantly more frequently mutated in the “short latency” cohort versus the “long latency” cohort and vice versa (Table 2).
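The within-genotype normalization described above can be sketched in Python as follows. This is a minimal illustration, assuming a mid-rank definition of the percentile (the exact definition used in the study is not stated); the function names, mouse identifiers, and latencies are hypothetical.

```python
from collections import defaultdict


def latency_percentiles(tumors):
    """Map each mouse to its latency percentile within its own genotype.

    `tumors` is a list of (mouse_id, genotype, latency_days) tuples.
    Ties are handled with a mid-rank rule (an assumption of this sketch).
    """
    by_genotype = defaultdict(list)
    for mouse_id, genotype, latency in tumors:
        by_genotype[genotype].append((mouse_id, latency))

    percentiles = {}
    for cohort in by_genotype.values():
        latencies = [lat for _, lat in cohort]
        n = len(latencies)
        for mouse_id, lat in cohort:
            below = sum(1 for other in latencies if other < lat)
            ties = sum(1 for other in latencies if other == lat)
            percentiles[mouse_id] = 100.0 * (below + 0.5 * ties) / n
    return percentiles


def split_by_latency(percentiles, cutoff=50.0):
    """Pool mice from all genotypes into short and long latency cohorts."""
    short = {m for m, p in percentiles.items() if p < cutoff}
    long_ = {m for m, p in percentiles.items() if p > cutoff}
    return short, long_


# Hypothetical latencies (days) for two genotypes with different baselines.
tumors = [("a", "wt", 100), ("b", "wt", 200),
          ("c", "ko", 50), ("d", "ko", 60)]
percentiles = latency_percentiles(tumors)
short, long_ = split_by_latency(percentiles)
print(sorted(short), sorted(long_))
```

In this toy example, mouse "d" falls in the long latency cohort although its absolute latency is shorter than that of every wild-type mouse, which is the effect the normalization is meant to achieve.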
Gfi1, Mycn, and Tbxa2r are most preferentially mutated in the short latency cohort, as well as the CIS upstream of Myc. In contrast, CISs found downstream of Myc, which may also affect Pvt1 expression (27), are more frequently mutated in mice with a longer life span, suggesting distinct effects in tumorigenesis. Notch1, Lfng, and Ikzf1 (Ikaros) have been found to collaborate in tumorigenesis (28) and are mutated in the long latency cohort, indicating that mutation of these genes is either not as potent as others or is only tolerated or selected for after other predisposing mutations have taken place.
We analyzed many of the tumors from the p19Arf−/−, p53−/−, and wild-type screen for T- or B-cell content using T- and B-cell–specific markers (CD3 and B220, respectively) for FACS analysis (22). We separated splenic and thymic tumors and ranked them on T- or B-cell content to investigate if CISs were preferentially mutated in tumors with high or low percentages of T or B cells. Both Flt3 and Kdr (Vegfr2), known oncogenes in human hematopoietic malignancies, are selectively mutated in splenic tumors with a high content of B220-positive cells, suggesting that these genes might contribute to B-cell lymphomagenesis (Supplementary Table S6; refs. 29, 30). Conversely, mutations near mmu-mir-106-363 were found in spleen tumors with high levels of CD3-positive cells and low levels of B220-positive cells, indicating that this gene may be involved in the development of T-cell tumors.
Genetic interactions between CISs
We and others have previously observed a strong selection for or against comutation of certain CISs (22, 31). To identify co-occurring and mutually exclusive mutations, we performed two analyses: first, we pooled the insertions from the current study with our previous study of p53−/−, p19Arf−/−, and wild-type tumors to maximize the statistical power of our tests for comutation. However, we also postulated that pooling unlike genotypes might create false interactions between CISs that are enriched in one genotype versus another. To correct for this effect in CIS interactions, we also calculated interactions for all the genotypes separately. The increased number of tumors over our previous study gives higher statistical power to many of the associations found previously (22) and identifies hundreds of new interactions, many of which are skewed toward particular genotypes (Supplementary Table S7A and B).
Many of the interactions that were significant in the pooled data showed a similar trend in one or more genotypes analyzed separately. For example, using a 300-kb window, we find significant co-occurrence of inserts near Myb/Ahi1 and Rras2 and significant mutual exclusivity of Myc/Pvt1 and Mycn in multiple genotypes (Fig. 3A). Some recurrent interactions seem to be remarkably genotype specific for mice lacking both alleles of p19Arf (p19Arf−/− and p16−/− p19Arf−/−; Fig. 3B) or for p16−/− p19Arf−/− and p53−/− mice (Fig. 3C). It is important to note that many of the mutually exclusive interactions are not observed in individual panels (Fli1 and the mmu-mir-106-363 cluster, Rras2 and Map3K8; Fig. 3D), suggesting that bias of these CISs toward different genotypes is at least partially responsible for the rarity of these comutations. Thus, although pooling panels lends statistical power to identification of co-occurring and mutually exclusive mutations, it is important to simultaneously stratify tumors into separate subpanels to observe whether these interactions are an indirect byproduct of genotype specificity.
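The way in which pooling genotypes can create an apparent mutual exclusivity can be illustrated with a small Python sketch. The two CIS names, the genotype labels, and all counts are hypothetical and chosen so that the two CISs are independent within each genotype; the odds ratio is used here only as a simple summary and is not the statistic reported in this study.

```python
def cooccurrence_table(tumors, cis_a, cis_b):
    """Counts (both, a_only, b_only, neither); each tumor is a set of CIS names."""
    both = sum(1 for t in tumors if cis_a in t and cis_b in t)
    a_only = sum(1 for t in tumors if cis_a in t and cis_b not in t)
    b_only = sum(1 for t in tumors if cis_b in t and cis_a not in t)
    neither = len(tumors) - both - a_only - b_only
    return both, a_only, b_only, neither


def odds_ratio(table):
    """Odds ratio > 1 suggests co-occurrence, < 1 suggests mutual exclusivity."""
    both, a_only, b_only, neither = table
    return (both * neither) / (a_only * b_only)


# Hypothetical panels: CisA is frequent in genotype 1, CisB in genotype 2.
genotype_1 = ([{"CisA", "CisB"}] * 5 + [{"CisA"}] * 45
              + [{"CisB"}] * 5 + [set()] * 45)
genotype_2 = ([{"CisA", "CisB"}] * 5 + [{"CisA"}] * 5
              + [{"CisB"}] * 45 + [set()] * 45)

or_1 = odds_ratio(cooccurrence_table(genotype_1, "CisA", "CisB"))
or_2 = odds_ratio(cooccurrence_table(genotype_2, "CisA", "CisB"))
or_pooled = odds_ratio(cooccurrence_table(genotype_1 + genotype_2, "CisA", "CisB"))
print(or_1, or_2, or_pooled)
```

Within each toy genotype the odds ratio is 1 (no interaction), yet the pooled odds ratio falls well below 1 purely because the two CISs are enriched in different genotypes.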
Using a 30-kb kernel width, many associations seem to be recurrent for different CISs of the same locus. For instance, the 30-kb CIS of Gata1 significantly co-occurs with Runx1 30-kb CIS nos. 1, 5, and 6 (Supplementary Table S7A). Gata1 mutations removing the NH2 terminus of the protein are frequently detected in Down syndrome patients with transient myeloproliferative disorder and acute megakaryoblastic leukemia (32) and even in some patients with no apparent hematopoietic disorders. This suggests that selection for Gata1 mutations is a direct consequence of trisomy 21. Human chromosome 21 encodes >250 protein-coding genes. Runx1 is one of seven loci associated with Gata1 in our screens, but it is the only one located within regions of the mouse genome sharing a common origin with human chromosome 21. Runx1 and Gata1 are transcription factors that bind each other and cooperate to activate genes involved in hematopoietic differentiation (33). The role of Runx1 dosage in trisomy 21–induced Gata1 mutation remains controversial (34); however, concomitant mutation of these loci in mouse lymphoma supports the hypothesis that Runx1 is at least one of the genes on chromosome 21 promoting selection for Down syndrome–associated Gata1 mutation.
Another novel interaction with precedent in the literature is the co-occurrence of mutations in Lck and the Stat5a/Stat5b/Stat3 locus. The Lck kinase is required for phosphorylating the Stat5a and Stat5b transcription factors in response to T-cell receptor signaling and enhances DNA binding of Stat3, Stat5a, and Stat5b (35, 36), supporting a scenario in which Lck and Stats collaborate in tumorigenesis (Supplementary Table S7A).
Identification of tumor suppressor candidate genes
MuLV insertions can inactivate tumor suppressor genes through disruption of transcripts. It is difficult to distinguish which insertions are likely to be activating or inactivating based solely on location and orientation of the insertions. We previously hypothesized that the presence of more than one insertion within the transcribed region of a gene in cancer suggests selection for loss of both copies (i.e., the gene is a tumor suppressor). However, the most frequently mutated oncogenes may also have more than one insertion within them through chance (either in the same cell or in different subclones). We have previously distinguished between these events by looking for genes that have more than one insertion within the transcribed region, within the same tumor, more frequently than expected by chance (22).
Using similar methods in the current analysis, pooling insertions from all panels, we find 81 genes with more than one insert per tumor (Supplementary Table S8). Forty-four genes have more than one insertion per tumor at frequencies higher than expected by chance, including some new candidate tumor suppressor genes such as Rere (P = 0.007) and Anks1 (or Odin; P = 0.01). Rere was found to be located in the minimally defined loss of heterozygosity region at 1p36.2-p36.1 in a neuroblastoma cell line (37). Embryonic fibroblasts from Anks1-deficient mice exhibited a hyperproliferative phenotype compared with wild-type fibroblasts, consistent with a role for Anks1/Odin as a negative regulator of growth factor receptor signaling (38, 39).
For some genes, there seems to be selection against a second insertion within the gene in the same tumor. Insertions within the Mycn transcript are suspected to be stabilizing the transcript (40). Among the tumors carrying the 78 inserts within this gene, only one carries more than one insertion, whereas in randomized data, on average, 3.9 tumors have more than one insert in this gene. Similarly, significant selection against more than one intragenic insertion is observed within Ahi1 and Pvt1. The simplest explanation may be that mutation of the first allele of each of these genes is sufficient to remove any selection for a second mutation.
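The comparison between observed and randomized numbers of tumors with more than one insertion in a gene can be sketched in Python as follows. This is a simplified illustration and not the procedure of ref. (22): gene labels are shuffled among insertions of the same genotype so that the number of insertions per tumor is preserved, the gene name "TsgX" and the toy data are hypothetical, and the number of permutations is arbitrary.

```python
import random
from collections import Counter, defaultdict


def tumors_with_multiple_hits(insertions, gene):
    """Number of tumors carrying more than one insertion in `gene`."""
    hits = Counter(tumor for tumor, _, g in insertions if g == gene)
    return sum(1 for n in hits.values() if n > 1)


def shuffle_within_genotype(insertions, rng):
    """Reassign gene labels among insertions of the same genotype."""
    by_genotype = defaultdict(list)
    for idx, (_, genotype, _) in enumerate(insertions):
        by_genotype[genotype].append(idx)
    shuffled = list(insertions)
    for indices in by_genotype.values():
        genes = [insertions[i][2] for i in indices]
        rng.shuffle(genes)
        for i, g in zip(indices, genes):
            tumor, genotype, _ = insertions[i]
            shuffled[i] = (tumor, genotype, g)
    return shuffled


def multi_hit_test(insertions, gene, n_perm=1000, seed=0):
    """Observed count, mean randomized count, and empirical P values."""
    rng = random.Random(seed)
    observed = tumors_with_multiple_hits(insertions, gene)
    null = [tumors_with_multiple_hits(shuffle_within_genotype(insertions, rng), gene)
            for _ in range(n_perm)]
    p_enriched = (1 + sum(n >= observed for n in null)) / (n_perm + 1)
    p_depleted = (1 + sum(n <= observed for n in null)) / (n_perm + 1)
    return observed, sum(null) / n_perm, p_enriched, p_depleted


# Toy data: 10 tumors with 5 insertions each; 4 tumors carry 2 hits in "TsgX".
insertions = []
for t in range(10):
    n_gene = 2 if t < 4 else 0
    for i in range(5):
        gene = "TsgX" if i < n_gene else None
        insertions.append((f"tumor{t}", "wt", gene))

observed, mean_null, p_enriched, p_depleted = multi_hit_test(insertions, "TsgX")
print(observed, mean_null, p_enriched, p_depleted)
```

A small `p_enriched` corresponds to the tumor suppressor pattern (more double hits than expected), whereas a small `p_depleted` corresponds to selection against a second insertion.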
A number of verified tumor suppressor loci that would be anticipated to be downregulated during tumor development (Ikzf1, p53, and Nf1) have insertions outside the transcribed region. Thus, rather than limiting our analysis to gene boundaries, we also looked for CIS windows with more than one insertion per tumor at frequencies higher than expected by chance (Supplementary Table S9). The most significant CIS in this analysis was Cbfa2t3 (Core-binding factor, runt domain, α subunit 2, translocated to 3), which, aside from its involvement in an acute myelogenous leukemia translocation (41), is suspected of being a tumor suppressor in human breast cancer (42). Smyd4, the second most significant CIS in this analysis, is implicated to be a tumor suppressor gene in breast cancer development because its expression is lost in a subset of human breast tumors and suppression of Smyd4 expression stimulates proliferation of mammary cells in vitro (43). Retroviral insertions may inactivate these genes by mutation of their promoters and enhancers, or methylation induced by proviral sequences might silence these genes (44).
Cross-species comparison of MuLV insertional mutagenesis data and SNP loci from familial CLL
A recent genome-wide association study of 511 CLL cases (155 with a family history of the disease) using 346,000 SNP arrays identified 49 SNPs distributed over 35 loci that were significantly associated with the disease. Seventeen SNPs were chosen for further validation and seven were found to significantly associate with disease in two independent validation cohorts (45). Observing that one of the validated loci is orthologous to one of our CISs (IRF4), we tested whether the entire set of 49 SNPs/35 loci from the first phase of screening significantly overlaps with our insertions.
Each of the 49 SNPs was mapped to its orthologous position in the mouse genome using the Ensembl Compara database. To avoid redundancy, loci bearing more than one SNP were grouped into a single coordinate representing the average position of all SNPs, yielding 35 loci in total. Using windows ranging from 10 kb up to 300 kb surrounding each orthologous position, we observed enrichment of our insertions within these windows (Table 3). To estimate the significance of this overlap, we compared the number of insertions within the windows surrounding the orthologous loci of SNPs to 1,000 permutations of windows surrounding the random loci (random gene start sites). For all window sizes, orthologous loci of SNPs had more insertions than expected for random loci. The degree and significance of this enrichment varied between a maximum of 2.61-fold (70-kb window, P = 0.016) and a minimum of 1.56-fold (300-kb window, P = 0.066). In addition to finding retroviral inserts within 150 kb of five of the six loci that were previously validated in independent CLL patient cohorts (ACOXL/BCL2L11, IRF4, CR17890/GRAMD1B, AK097902/BC029061, and PRKD2/FKRP/SLC1A5; ref. 45), we also find inserts near a number of SNPs not significantly associated with disease in the validation cohorts (which might justify further investigation of these SNPs in larger validation cohorts) and some SNPs not chosen for rescreening in the validation cohorts (suggesting that these may warrant screening in a validation cohort; Supplementary Table S10).
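The enrichment test described above can be sketched in Python as follows. This is a minimal single-chromosome illustration under stated assumptions: the window is given as a half-width around each locus (whether the reported window sizes are total or half-widths is not specified here), overlapping windows are counted independently, and all coordinates, locus lists, and function names are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right


def count_in_windows(sorted_insertions, loci, half_width):
    """Total insertions within +/- half_width of each locus."""
    return sum(bisect_right(sorted_insertions, locus + half_width)
               - bisect_left(sorted_insertions, locus - half_width)
               for locus in loci)


def window_enrichment(insertions, loci, background_loci, half_width,
                      n_perm=1000, seed=0):
    """Observed count, fold enrichment over random loci, and empirical P value."""
    rng = random.Random(seed)
    ins = sorted(insertions)
    observed = count_in_windows(ins, loci, half_width)
    # Null distribution: the same number of loci drawn from the background set.
    null = [count_in_windows(ins, rng.sample(background_loci, len(loci)), half_width)
            for _ in range(n_perm)]
    mean_null = sum(null) / n_perm
    fold = observed / mean_null if mean_null else float("inf")
    p_value = (1 + sum(n >= observed for n in null)) / (n_perm + 1)
    return observed, fold, p_value


# Toy data: two "SNP" loci, each with a nearby cluster of five insertions,
# plus scattered insertions; background loci stand in for random gene starts.
background_loci = list(range(0, 1_000_000, 10_000))
snp_loci = [100_000, 500_000]
insertions = ([100_000 + i for i in range(5)]
              + [500_000 + i for i in range(5)]
              + list(range(5_000, 1_000_000, 50_000)))

observed, fold, p_value = window_enrichment(
    insertions, snp_loci, background_loci, half_width=2_000)
print(observed, fold, p_value)
```

Repeating the call over a range of `half_width` values would give a table of fold enrichments and P values by window size, analogous in form to Table 3.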
We isolated 9,117 retroviral insertions from a cohort of 476 MuLV-induced tumors derived from mice deficient for one or a combination of two CDK inhibitors to identify genes contributing to tumorigenesis in these backgrounds. Combining this data set with insertions from a previous screen, we identified 596 CISs at a kernel width of 30 kb, >250 of which had not been found in our previous screen. The 596 CISs are enriched for loci in the vicinity of orthologues of human cancer genes (22). By comparison of 13 genotypes, we have illustrated that given sufficient statistical power, the majority of oncogenes and tumor suppressors mutated in these screens are greatly influenced by deficiency for alleles of one or more CDK inhibitors. We also observed that although some combinations of retroviral mutations are recurrent in many genotypes, others are clearly dependent on a specific genetic background.
MuLVs can cause lymphoid, myeloid, and erythroid tumors with the relative frequency of these being influenced by both host genotype and virus substrain. Ecotropic MuLVs infect cells through the mCAT-1/Slc7a1 receptor (46), which is expressed in a variety of hematopoietic lineages, making the cell type of origin of these tumors unclear. Germline mutation of CDK inhibitors may shift the balance between cell populations in the lymphoid compartment and thus affect the cellular composition of the tumors (17). Thus, although the prevalence of CISs in each genotype may stem from cell-intrinsic effects of tumor suppressor loss within the tumor-initiating cells, they may also be influenced by the relative size of different hematopoietic populations within each genotype.
Resources for validation in genome-wide association studies are currently limiting. Integration of independently derived data sets such as mouse retroviral insertions and human genome-wide association studies can help in prioritizing validation of loci, particularly those with no prior suggested role in cancer. Such comparisons are also useful because in many cases, the relationship of SNPs to disease genes is unclear. Although the position of our retroviral insertions in mice frequently implicates the gene that is nearest to the SNPs in humans (as is the case for SNPs and inserts near Ccnd2 and Runx3), this is not always the case (Supplementary Table S10). For example, one SNP on chromosome 21 is nearest to Cryaa; however, six insertions in the orthologous region on chromosome 17 in mice seem most likely to deregulate Snf1LK, an AMP-activated kinase implicated in cell cycle regulation that is found to be overexpressed in murine lymphoma (47, 48). Snf1LK is a regulator of CREB1, a transcription factor best characterized for its neuronal functions but which is also implicated in leukemia (49). Cross-species comparisons can also be usefully combined with genetic interactions as illustrated by the comutation of Gata1 and Runx1. Similar inferences could be made for whole chromosome gains and losses in human tumors where the context of other mutations within the same tumor is known.
We find hundreds of significant associations between CISs and deficiency for different CDK inhibitors. These associations may serve as an additional tool to direct large-scale mutation detection studies of human tumors, particularly those deficient for these CDK inhibitors. Although it is unlikely that we have identified all mutations from our tumors, approximately one third of our insertions occur within CISs, suggesting the presence of at least six to seven driver mutations per tumor. Stratton and colleagues (1) estimate that to define a single cancer genome in its entirety will require more than 1 × 10¹¹ bp of DNA sequence to provide adequate coverage of tumor DNA and somatic controls. As such, until every patient's tumor genome can be sequenced in full, clinical approaches to mutation detection can initially be better directed toward panning for frequent events. Based on these results, a subset of the genome can then be screened for rare mutations which are most likely to occur within that context.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant Support: Netherlands Organization for Scientific Research (NWO) Genomics program and Netherlands Genomics Initiative/NWO (A. Berns and M.V. Lohuizen); BioRange program of the Netherlands Bioinformatics Centre, which is supported by a Netherlands Genomics Initiative BSIK grant (J.D. Ridder); Netherlands Genomics Initiative Horizon Breakthrough Project (M.V. Uitert); and Cancer Research UK and the Wellcome Trust (D.J. Adams). Sequencing was carried out by the Wellcome Trust Sanger Institute sequencing facility.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- Received July 23, 2009.
- Revision received September 28, 2009.
- Accepted October 23, 2009.
- ©2010 American Association for Cancer Research.