Despite the millions of dollars spent on target validation and drug optimization in preclinical models, most therapies still fail in phase III clinical trials. Our current model systems, or the way we interpret data from them, clearly do not have sufficient clinical predictive power. Current opinion suggests that this is because the cell lines and xenografts that are commonly used are inadequate models that do not effectively mimic and predict human responses. This has become such a widespread belief that it approaches dogma in the field of drug discovery and optimization and has spurred a surge in studies devoted to the development of more sophisticated animal models such as orthotopic patient-derived xenografts in an attempt to obtain more accurate estimates of whether particular cancers will respond to given treatments. Here, we explore the evidence that has led to the move away from the use of in vitro cell lines and toward various forms of xenograft models for drug screening and development. We review some of the pros and cons of each model and give an overview of ways in which the use of cell lines could be modified to improve the predictive capacity of this well-defined model. Cancer Res; 74(9); 2377–84. ©2014 AACR.
In the early 1990s, the National Cancer Institute (NCI; Bethesda, MD) introduced a “disease-oriented” drug screening approach using a panel of 60 human cancer cell lines derived from nine different types of cancer (brain, colon, leukemia, lung, melanoma, ovarian, renal, breast, and prostate; ref. 1). This approach was designed to facilitate high-throughput screening of large numbers of drugs with sufficient discrimination such that only a minority of drugs would be selected for further preclinical assessment in xenograft models. The general failure of this approach to identify the most promising candidate drugs led to the use of the hollow fiber assay, with the NCI currently using just 12 human cell lines as a routine preliminary in vivo screen before more time-consuming and labor-intensive xenograft assays on only the most promising candidate drugs (http://dtp.nci.nih.gov/branches/btb/hfa.html).
More recently, the lack of confidence in preclinical models has led to the concept of tailored drug response profiling using patient-derived xenografts (PDX; reviewed in refs. 2–4). The move away from cell lines and the focus on in vivo models are generally defended by criticisms along the following lines:
- Cell lines no longer retain the tumor heterogeneity present in the primary cancer
- Cell lines do not contain the relevant components of the tumor microenvironment
The assumption is that in vitro cell line responses have not been able to predict human responses because of these failings in the model. Much of the earlier, highly referenced evidence is, however, restricted to relatively specific examples using small numbers of cell lines and clinical samples. A serious problem with the NCI panel is that it includes only a very limited number of lines for any given cancer (at most six or seven lines for each cancer type). This severely limits the possibility of identifying responder subsets of a given type of cancer for any particular drug. However, with emerging high-throughput data technologies, it is now becoming possible to make large-scale comparisons of gene mutation, structural and copy number changes, and mRNA expression profiles between cell lines and primary cancers for the same tissue type. For example, a recent report describing the Cancer Cell Line Encyclopedia (CCLE; ref. 8) used massively parallel sequencing data and microarray expression profiles from 947 human cancer cell lines, coupled with drug responses for 24 anticancer drugs across 479 of the cell lines. The authors also compared the genomic similarities between the lines and large datasets of primary tumors from publicly available databases and demonstrated strong correlations for DNA copy number changes, mRNA expression profiles, point mutation frequencies, and mutation profiles of commonly mutated genes in each cancer type. Furthermore, analysis of molecular correlates of drug sensitivity in the cell lines identified multiple known features, including tissue type (lineage), as top predictors of sensitivity to several agents. The study concluded that the cell lines “may provide representative genetic proxies for primary tumors in many cancer types.”
The Problem with Numbers
It is now clear that response to therapy, especially for targeted drugs, is strongly dependent on a cancer's genetic and epigenetic make-up (9), which varies substantially between different cancers even from the same tissue (10). Drug responses must therefore be evaluated in relation to a cancer's genotype/epigenotype, not only during the drug discovery process but also, in particular, in biomarker studies that aim to identify the correlations that predict response to individual therapies, both preclinically and for drugs that are already in clinical use. Many drugs will fail simply because, at the time the trial is designed, it is not recognized that they are effective on only a subset of cancers, and the trials are not large enough for the analysis of subsets.
In 2003, Voskoglou-Nomikos and colleagues (11) compared published drug activity for 31 cytotoxic cancer drugs for which phase II trial, human xenograft, and mouse allograft responses were available for breast, ovarian, non–small cell lung, and colon cancers. They used data from the NCI Human Tumor Cell Line Screen to calculate the in vitro preclinical activity of each drug for each cancer type. They then analyzed the results using either a disease-oriented approach (using one tumor type as a predictor of overall activity in the other three types combined) or a compound-oriented approach, or for all four tumor types together. They concluded that the in vitro cell line model was broadly predictive for non–small cell lung cancer under the disease-oriented approach, for breast and ovarian cancers under the compound-oriented approach, and for all four tumor types together. The human xenograft model was not predictive for breast or colon cancers; it was predictive for non–small cell lung and ovarian cancers, but only when the analysis was restricted to a subset of drugs for which preclinical information on more than 100 human xenografts was available.
It is important to note that in this analysis, the predicted in vitro preclinical activity was calculated using the NCI-60 panel of cell lines, where each of the nine tumor types is represented by approximately six or seven lines. Clearly, by using drug responses for just six or seven lines of a cancer type to predict in vitro response to a drug in that type of cancer, there is in most cases not nearly enough statistical power to detect correlations with even a single key relatively common difference, such as the presence or absence of a KRAS mutation. Such correlations with clinical responses will only be detected by using a larger panel of cell lines, reflecting the molecular heterogeneity within that cancer type. Similar arguments apply to xenograft studies, where typically the numbers of cell lines or even PDXs studied are small. It is therefore almost impossible for either the NCI cell line–based or the typical xenograft-based models to reliably predict human clinical response for a particular type of cancer.
This problem becomes even more serious when considering the importance of predicting responses to combination therapies. Even a cancer that is little more than a cubic centimeter in size will contain on the order of 10⁹ cells. Given that mutation rates are between 10⁻⁸ and 10⁻⁹ per base per cell division, nearly every possible type of mutation is likely to be represented at least once in even a small cancer. It is therefore hardly surprising that resistance will nearly always develop to monotherapy. The only way to overcome this problem is to attack different key pathways of the cancer process at the same time, as then the probability of several resistance mutations arising at the same time becomes vanishingly small even in large cancers. If combinations of drugs are to be tested, with a range of concentrations of each drug in combination with a range of concentrations of the other drugs, the number of mice that would be required for any kind of in vivo approach increases to impractical numbers.
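The arithmetic behind this argument can be sketched in a few lines of Python. The cell count (about 10^9) and the mutation rates (10^-8 to 10^-9 per base per division) are the figures quoted above. Everything else is an illustrative assumption rather than a value from any study: that a tumor of N cells has undergone roughly N divisions, that resistance mutations arise independently, and the dose-grid and group-size numbers.

```python
# Back-of-envelope sketch; all function names are illustrative.
TUMOR_CELLS = 1e9                 # cells in a ~1 cm^3 cancer (figure quoted in the text)
MUTATION_RATES = (1e-8, 1e-9)     # per base per cell division (range quoted in the text)


def expected_occurrences(cells, rate, n_mutations=1):
    """Expected number of divisions in which n specific base changes arise together.

    Assumes roughly one division per cell present (divisions ~= cells) and
    independent mutations; both are simplifications.
    """
    return cells * rate ** n_mutations


def dose_grid_size(n_drugs, concentrations_per_drug):
    """Number of distinct dose combinations in a full factorial design."""
    return concentrations_per_drug ** n_drugs


def mice_required(n_drugs, concentrations_per_drug, mice_per_group):
    """Mice needed to test every dose combination once in a single tumor model."""
    return dose_grid_size(n_drugs, concentrations_per_drug) * mice_per_group


if __name__ == "__main__":
    for rate in MUTATION_RATES:
        single = expected_occurrences(TUMOR_CELLS, rate)
        double = expected_occurrences(TUMOR_CELLS, rate, n_mutations=2)
        print(f"rate {rate:g}: single change ~{single:g}, two changes at once ~{double:g}")
    # Hypothetical design: 3 drugs, 5 concentrations each, 8 mice per group.
    print(mice_required(n_drugs=3, concentrations_per_drug=5, mice_per_group=8))
```

Under these assumptions, a full factorial design for three drugs at five concentrations each has 125 dose combinations, which at eight animals per group means 1,000 mice for a single tumor model.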
In addition to the CCLE study, a cell line study from the Sanger Institute (Hinxton, South Cambridgeshire; ref. 12) assayed 130 drug responses in 639 human cancer lines. Although both these studies used significant numbers of cell lines overall, the numbers for each type of cancer are still relatively low. For example, the Sanger and CCLE studies included only 34 and 23 colorectal cancer cell lines, respectively, in their drug screens. This means that, even assuming that the cancer cell lines used represent the subtypes of a particular cancer in the same proportions seen in patients, response profiles in molecular subtypes less frequent than, for example, 20%, are very likely to be missed.
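One way to see why low-frequency subtypes are likely to be missed is to ask how many lines of a given subtype a panel can be expected to contain. The sketch below assumes, as the text does, that lines represent subtypes in the same proportions as patients, and it treats panel composition as a binomial draw. The threshold of five lines per subtype is an arbitrary illustration, not a recommendation taken from the studies cited.

```python
from math import comb


def prob_at_least(k, n_lines, subtype_freq):
    """P(a panel of n_lines contains at least k lines of a subtype), binomial model."""
    p_fewer = sum(
        comb(n_lines, i) * subtype_freq**i * (1 - subtype_freq) ** (n_lines - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Colorectal panel sizes quoted in the text (CCLE: 23, Sanger: 34),
    # plus a larger hypothetical panel of 100 lines.
    for n in (23, 34, 100):
        for freq in (0.05, 0.10, 0.20):
            expected = n * freq
            p = prob_at_least(5, n, freq)
            print(f"n={n:3d} freq={freq:.2f} expected lines={expected:4.1f} P(>=5)={p:.2f}")
```

With 23 to 34 lines, a subtype present in 10% of cancers is expected to contribute only two or three lines, which leaves very little material for detecting a subtype-specific response.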
The successful use of cell lines for evaluating drug responses in relation to tumor properties clearly depends critically on the use of a substantial cell line panel for adequate statistical power or, where prior knowledge indicates, the maximum number of lines with given characteristics. For example, a 10% sensitive subset of 100 lines that is associated with an OR of 9 for the difference between resistant and sensitive lines can be detected with a P value of 0.008, whereas a 20% subset for the same OR would be detected with a P value of < 0.0001. Thus, even with 100 lines, largish effects in relatively small subsets can be significantly detected, but the power for doing this diminishes rapidly as the number of lines is reduced.
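A calculation of this kind can be illustrated with a Fisher exact test on a 2×2 table of response against marker status. The following is a minimal sketch using only the Python standard library. The counts are hypothetical, chosen only so that the odds ratio equals 9 in a panel of 100 lines; they are not the tables behind the P values quoted above, so the output need not match those figures.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (b and c must be nonzero)."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows: sensitive / resistant lines. Columns: marker present / absent.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of x marker-positive sensitive lines.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = table_prob(a)
    p_value = sum(
        p for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: (sensitive+marker, sensitive-no marker,
    #                       resistant+marker, resistant-no marker)
    tables = {
        "10% sensitive": (5, 5, 9, 81),    # 10 sensitive, 90 resistant lines
        "20% sensitive": (10, 10, 8, 72),  # 20 sensitive, 80 resistant lines
    }
    for label, counts in tables.items():
        print(label, odds_ratio(*counts), fisher_exact_two_sided(*counts))
```

Rerunning the same function with the counts scaled down to smaller panels shows how quickly the evidence for an association of fixed strength weakens as lines are removed.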
Cell Line Models: Colorectal Cancer as an Example
The Bodmer laboratory has a panel of more than 120 colorectal cancer cell lines, probably one of the largest tissue-specific panels of cell lines held in academia. As such, it provides a good example of how well cell lines reflect the wide spectrum of subtypes of a particular cancer with respect to mRNA expression profiles (13) and common mutations (14). For colorectal cancer, these include mutations in the genes APC, TP53, CTNNB1, BRAF, PIK3CA, and FBXW7, and the mismatch repair genes, mainly MLH1 and MSH2. We have taken two published datasets together with data from our cell line panel and compared the frequency of mutations for these commonly mutated genes with two sources of primary tumor data (Fig. 1). The frequency spectrum of the mutations differs somewhat between cell lines and primary tumors depending on the source of the data. This may be, in part, due to overrepresentation of mismatch repair deficiency (and thus the mutations highly associated with this hypermutated subtype) in the cell line panels compared with primary cancers. It is also possible that at the time most of the cell lines currently in use were isolated, the chance of obtaining a cell line from a tumor was influenced by its genetic and epigenetic make-up. Although there may be differences in the frequency spectrum of key mutations between the cell lines and primary cancers, as long as there are adequate numbers of cell lines to represent each genetically defined subtype that is present in primary cancers, the differences in relative proportions of subtypes in cell lines versus primary cancers should not pose any problems.
Using this colorectal cancer cell line panel, we have shown a clear relationship between 5-fluorouracil sensitivity and mismatch repair status in a subset of 77 cell lines using a relatively high-throughput test procedure and an objective categorization of response (15). More recently, we have shown that for direct (nonimmune-mediated) responses to cetuximab [an anti-EGF receptor (EGFR) antibody], a line must be wild-type (i.e., nonmutant) not only for KRAS but also for NRAS, BRAF, and PIK3CA (16), as suggested by recent clinical data (9).
Both these cell line studies closely parallel clinical data, providing further evidence for the extent to which appropriately large panels of cell lines can be used to predict clinical responses to targeted therapies in patients with similar molecular profiles.
Importantly, we have also demonstrated using cell line in vitro studies that the immune-mediated effects of cetuximab (largely through antibody-dependent cellular cytotoxicity) correlate with EGFR levels and not with KRAS and other mutation status (16). Such comparisons between cell line and clinical data can only be made for treatments that are already in the clinic, and when patient tumors have been characterized to the same depth as the cell lines. Obviously, no novel treatment can ever be tried in the clinic unless there is adequate supporting preclinical evaluation.
It is worth noting that if these studies had been done on panels that contained only seven to ten colorectal cancer lines, such as those used in the NCI screens, or even on the panels of roughly 30 colorectal cancer lines used in more recent publications [CCLE (8) and Sanger (12)], it is highly unlikely that these genetic associations would have been detected at a convincing level of significance. For example, the genetic association described above for cetuximab response in our panel of 64 cell lines results in a power of 95%, but this would drop to 70% if performed on a panel of 30 lines and just 20% for a panel of seven cell lines (power calculations were derived using G*Power 3 software). This demonstrates how likely it is that a false-negative response profile would be obtained through the use of inadequate numbers of cell lines.
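The dependence of power on panel size can be illustrated with a textbook normal approximation for comparing two proportions (Cohen's arcsine effect size h). This is a sketch under stated assumptions, not a reproduction of the G*Power calculation: the response rates and the fraction of mutant lines are hypothetical, so the output will not match the 95%, 70%, and 20% figures quoted above.

```python
from math import asin, sqrt
from statistics import NormalDist


def cohen_h(p1, p2):
    """Cohen's effect size h for two proportions (arcsine transformation)."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def approx_power(resp_wildtype, resp_mutant, n_lines, frac_mutant=0.5, alpha=0.05):
    """Approximate power of a two-sided test comparing two response rates.

    Normal approximation; the contribution of the opposite tail is ignored
    and group sizes are treated as continuous.
    """
    n_mut = n_lines * frac_mutant
    n_wt = n_lines - n_mut
    h = abs(cohen_h(resp_wildtype, resp_mutant))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = h * sqrt(n_mut * n_wt / (n_mut + n_wt))
    return NormalDist().cdf(z_effect - z_crit)


if __name__ == "__main__":
    # Hypothetical scenario: 60% of wild-type lines respond versus 10% of
    # mutant lines, with half of the panel carrying a mutation.
    for n in (64, 30, 7):
        print(n, round(approx_power(0.6, 0.1, n), 2))
```

Whatever effect size is assumed, the same qualitative pattern appears: power falls steeply as the panel shrinks from 64 to 30 to seven lines.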
Limitations of Mouse Models
Recent studies have suggested that PDX models might hold great promise for testing novel drug candidates as well as for identifying rational combination therapies (17). Among the arguments made for PDX models are that they may be expected to provide a better representation of within-tumor heterogeneity and that they would better reflect the relevant human components of the tumor microenvironment (Table 1).
However, there is evidence that the range of mutations is narrowed in engrafted tumors compared with their parental cancers and that to capture tumor heterogeneity, several implants of each tumor may be necessary (18–20). These results suggest clonal selection from within the heterogeneous population of implanted human cancer cells for those clones that impart a selective advantage for growth in mice. The problem of tumor heterogeneity applies equally to PDXs and to cell lines. Genetic heterogeneity in an evolving cancer is to be expected, as a newly developed advantageous clone does not immediately replace all its predecessors. Any treatment of a cancer with more than one clone must be based on attacking all the clones present in the cancer as if they were separate cancers. So long as the drug response profile has been based on a sufficient number of cancers to be representative of the variety of genetic and epigenetic combinations expected for that type of cancer, identifying the genetic factors associated with differential responses of heterogeneous subpopulations should not pose any novel problems.
Several reports also describe the loss of human stromal cells within the first few passages after engraftment (17, 21), which raises the question of species incompatibility with respect to human tumor–mouse microenvironment interactions. This is emphasized by the results of Hylander and colleagues (22), who showed that by the first passage of PDX tumors, the stroma and vessels supporting their growth are of murine origin. Kinetic studies showed that replacement of human vessels, and vascularization by host vessels, occurred within 3 weeks in a colon PDX and by 9 weeks in a mesothelioma PDX.
The overall usefulness of the xenograft or PDX model may be further hampered by relatively low engraftment rates in comparison with current success rates in growing cells in vitro from primary tumor material (23–25) and the slow growth rates of many engrafted primary cancers. Reported success rates for establishing and serially propagating human solid tumors range from 20% to 50% (2). Similarly, the use of genetically engineered mouse models (GEM) in drug discovery and preclinical studies is hampered by the stochastic nature of spontaneous tumor outgrowth in these models (i.e., tumors do not grow synchronously), making comparisons of drug responses between mice difficult.
Despite the shortcomings of animal models, they remain a prerequisite step for target validation and toxicity studies of in vitro validated compounds. Only through exploring improvements in standard tissue culture practices might we be able to produce an in vitro model that more closely recapitulates the human environment and drug responses.
Developing a Better In Vitro Model
There have been many recent improvements in cell culture conditions that help to mimic more closely an in vivo growth environment. These include growth in three-dimensional (3D) matrices and coculture with normal cell counterparts such as myofibroblasts and immune cells. In addition, microfluidic perfusion systems allow careful control of levels of specific growth factors and additives.
There has also been a dramatic improvement in the success rate of establishing primary cultures of many different types of cancers. Slow growth rates still make large-scale studies difficult and although there have been promising preliminary studies, it is not yet clear whether these in vitro cultures represent better models of human cancer for drug response profiling than well-established cell lines. The particular value of successful short-term culturing of primary tumors will be that it enables relatively rapid testing of whether proposed therapies for the particular cancer, based on its molecular properties, are in fact likely to be effective—the hallmark of personalized cancer treatment.
Growth in 3D environments
Growth of cancer cell lines in 3D matrices such as Matrigel results in the development of multidimensional structures and a range of phenotypic changes, including changes in cell morphology (indicative of differentiation), gene expression profiles, proliferation rates, and drug resistance (reviewed in ref. 26). The result is a model with properties that more closely resemble those of the tumors from which the lines were derived. Crucially, there is evidence that cells grown in 3D respond to drug treatments differently from the same cells grown in traditional two-dimensional tissue culture flasks (27), so growth in 3D may well be one of the most significant changes that can be implemented in in vitro drug screening and development to improve the capacity of the cell line model to predict human drug responses.
Focusing on cancer stem cells
Using 3D gel culture systems has also improved our ability to identify and study cell line-derived cancer stem cells and their patterns of differentiation in vitro (24, 28–30). This enables the direct study of drug effects on cancer stem cells and their differentiation patterns. Part of the problem with identifying successful new therapies is that our current measure of efficacy is almost always by tumor shrinkage in vivo or by overall growth inhibition in short-term in vitro assays. Unfortunately these responses often do not translate into increased disease-free survival in patients. Any therapy that targets more differentiated cells in a tumor while sparing the cancer stem cells will give an initial measurement of efficacy by these criteria. Only by isolating cancer stem cells and identifying therapies that specifically kill this population of a tumor are we likely to find treatments that result in improvements in patient survival.
Recreating the microenvironment
Clearly the effects of the microenvironment, largely determined by myofibroblasts and some key cells of the immune system, notably macrophages and their derivatives (31, 32), are important for the in vivo evolution of a tumor. Those effects are likely one of the major causes of mRNA expression differences between freshly derived tumor material and the cell cultures obtained from it. However, just as we can create in vitro the conditions required to facilitate immune-mediated antibody killing of tumors by adding appropriate immune effector cells, so can we also now create an appropriate stromal environment by, for example, coculturing tumor-derived cell lines with myofibroblasts or macrophages. Studying these effects separately and together allows an assessment of their relative importance for different drug responses. It will also enable the study of cytokines and lymphokines that may be the effectors of the stromal environment, and of factors that may block myofibroblast and macrophage tumor interactions that are beneficial for the growth of the tumor.
In vitro systems for toxicity testing, including establishment of “normal” cell lines
Toxicity testing of any new treatments will, for the foreseeable future, eventually need to be done in animal models, as that is currently the only way to assess organismal overall toxicity. However, this needs only to be done with those drug regimes that have proved successful in appropriate in vitro cell line tests, possibly followed up by limited animal model testing. General toxicity at the level of epithelial cells, for example, can generally be excluded as long as not all the cell lines tested are susceptible to the drug treatment. Toxicity to other cell types can be evaluated by testing the drug regime on a variety of other cell types for which cell lines are available, including, for example, epithelial cells from other carcinomas, fibroblasts, and lymphoid cells. This may give some indication of the selectivity of the drug treatment for the particular cancer type being studied.
Until recently (at least for most carcinomas), there has been no suitable source of in vitro normal cell cultures, but now, with improved epithelial cell culture conditions and the potential availability of suitable normal cells from human-induced pluripotent stem cells (hiPSC), that is likely to change. Thus, hiPSCs have been induced to differentiate into various tissue-specific lineages that could provide normal cell counterparts that would broaden the field of in vitro toxicity testing at the normal cellular level (33). There are commercially available sources of hiPSC-derived cardiomyocytes, endothelial cells, hepatocytes, and neuronal cells. However, there is still a need to improve the efficiency of reprogramming somatic cells, optimizing differentiation such that full maturation is achieved and improving ways to manipulate the epigenetic state of the derived cells such that they more closely match the tissue-specific epigenetic profiles of normal human tissues. Perhaps, in due course, whole organ systems such as for the liver and heart will become available through these techniques and take us closer to a situation where, at most, limited toxicity testing in whole animals will remain necessary.
Harnessing CRISPR/Cas9 and TALENs for mechanistic studies
Cell lines readily enable functional studies, for example, of drug effects on signaling pathways or knocking in and out candidate genes for investigating effects on drug responses. Recent examples of such cell line studies that have led to novel suggestions for treatment include the preclinical demonstration of the potential effectiveness of PARP inhibitors for the treatment of triple-negative breast cancers and of Ewing tumors with specific translocations, and the need to combine anti-BRAF therapy with anti-EGFR therapy for treatment of BRAF-mutant colorectal cancers (34–36).
Highly efficient genome-editing technologies such as CRISPR (clustered regularly interspaced short palindromic repeats) and TALENs (transcription activator-like effector nucleases; refs. 37–39) have revolutionized the ease with which it is now possible to generate isogenic cell lines with specific genes inactivated or specific mutations inserted, enabling target validation and mechanistic studies to be performed far more efficiently and conceivably on larger panels of tissue-specific cancers. This is important because focusing on just one or two cell lines for isogenic studies will not average sufficiently over the inevitable variety of background differences that exist between lines, so that the results may be biased by the particular set of variations carried in the cell line chosen for isogenic studies.
A Note on In Vitro Functional Assays on Small Numbers of Cancer Cell Lines
Clearly, although we have made the argument for the use of sufficient numbers of cell lines for drug screening and genetic associations that predict for responses, it is not always possible to perform all types of in vitro studies on large panels of tissue-specific cell lines.
More in-depth functional studies are often performed using just one or two cell lines of a particular tissue type. Clearly these types of studies cannot be performed using large panels of cell lines, but great care should be taken in the choice of cell line. Wherever possible, cell lines should be selected that most closely resemble the genomic alterations of the tumor subtype being studied. No individual cell line can be representative of all cancers derived from a single tissue, and many commonly used cancer cell lines represent highly specific molecularly defined subtypes of a particular cancer. For example, a recent genomic and transcriptome analysis of 47 ovarian cell lines compared with more than 300 primary cancers showed that the most commonly used ovarian cell lines (SK-OV-3, A2780, OVCAR-3, CAOV3, and IGROV1) are probably not good models of high-grade serous ovarian carcinoma, the most common human form of ovarian cancer (40). One of the cell lines (IGROV1) is hypermutated, and three of them have copy number and mutation profiles that more closely match those obtained from low-grade serous, endometrioid, clear cell, or mucinous ovarian carcinomas. Similarly, in colorectal cancer studies, most of the more commonly referenced cell lines are in fact mismatch repair deficient (DLD1, RKO, HCT116, LOVO, LS174T, SW48). It is likely that the majority of these studies were functional investigations where the cell lines used were considered to be representative of colorectal cancer as a whole. Mismatch repair-deficient cancers are a distinct subgroup characterized by unique mutation and expression profiles and may behave quite differently from most other colorectal cancers.
The selection of both individual cell lines and PDX models in research is often influenced by ease and speed of growth. PDXs with higher engraftment rates are associated with clinically aggressive tumors, and cell lines with shorter doubling times are often more tumorigenic in mice than their slower growing counterparts. The bias toward exclusive use of the models that are easiest to use may significantly reduce the overall representation of molecular heterogeneity for any cancer type in any study.
The Importance of Cell Line Validation
Cell line authentication has become a serious problem in scientific studies and many journals and funding bodies now require regular validation of cell lines before publication or awarding grants.
In 2007 in the United Kingdom, the BBC reported that “thousands of studies have been invalidated” and “millions of pounds of charity donations and taxpayers' money have been wasted on worthless cancer studies” as a result of cell line contamination and lack of authentication (41). A separate report from the then Secretary of the U.S. Department of Health and Human Services estimated that as many as 20% of scientific publications using cultured cells might be affected by this issue (42).
There are many commercially available services for validating cell lines. Although short tandem repeat (STR) profiling is perhaps the most common, this service can cost between £100 and £200 in the United Kingdom to validate each cell line. By multiplexing up to 40 single-nucleotide polymorphism (SNP) PCRs, it is possible for many laboratories to perform their own cell line validation at a vastly reduced cost (typically around £20 per sample in the United Kingdom for 40 multiplexed SNPs).
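Once a multiplexed SNP panel has been genotyped, authentication reduces to comparing the sample's genotype calls with stored reference profiles. The following is a minimal sketch of such a comparison. The SNP identifiers, the two-letter genotype encoding, the reference names, and the 0.9 concordance threshold are all hypothetical placeholders; a laboratory would need to set its own threshold, allowing for genotyping error and for genetic drift in long-term culture.

```python
def snp_concordance(sample, reference):
    """Fraction of SNPs typed in both profiles whose genotype calls agree.

    Profiles map SNP identifiers to two-letter genotypes such as "AG";
    allele order is ignored and failed calls are stored as None.
    """
    shared = [snp for snp in sample if sample[snp] and reference.get(snp)]
    if not shared:
        raise ValueError("no SNPs typed in both profiles")
    matches = sum(sorted(sample[snp]) == sorted(reference[snp]) for snp in shared)
    return matches / len(shared)


def best_match(sample, references, min_concordance=0.9):
    """Return (name or None, scores): the best-matching reference, if good enough."""
    scores = {name: snp_concordance(sample, ref) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return (name if scores[name] >= min_concordance else None), scores


if __name__ == "__main__":
    # Hypothetical three-SNP profiles; a real panel would use about 40 SNPs.
    references = {
        "line_A": {"snp01": "AG", "snp02": "CC", "snp03": "TT"},
        "line_B": {"snp01": "AA", "snp02": "CT", "snp03": "GT"},
    }
    sample = {"snp01": "GA", "snp02": "CC", "snp03": None}  # snp03 failed to amplify
    print(best_match(sample, references))
```

Returning the full score table alongside the best match makes a mixed or cross-contaminated culture easier to spot, because it may show partial concordance with more than one reference.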
Cell lines provide an unlimited supply of material that is widely available and easy to propagate, and so form the basis for relatively high-throughput assays. This makes cell line studies on large numbers of drug combinations quite feasible. Many cell lines have now already been exhaustively characterized (8, 12, 13, 15, 16). They represent the spectrum of mutations found in cancers, have similar patterns of chromosomal gains and losses, methylation and mRNA expression, and show no evidence of genetic changes in major driver mutations on long-term in vitro cultivation.
Thus, we believe there is a very strong case for appropriate use of cell line panels for in vitro preclinical evaluation of cancer drug regime responses in relation to the properties of the cancer. It is important that a sufficient number of cell lines is used to give adequate power for detecting subset responses, that these are shown to be representative of the range of genetic and epigenetic variation for the cancer being studied, and that cell lines are regularly revalidated using a suitable SNP panel or other genotyping technology. With the inevitable increase in the need for treatment using combinations of existing and newer targeted drugs and antibodies, extensive animal-based evaluation becomes unrealistic. There are now many newer technologies, such as growth in 3D, better culture conditions for primary tumor outgrowth, and coculture with suitable cellular representatives of the tumor microenvironment, that bring the in vitro cell culture conditions closer to the in vivo situation. Other newer approaches, such as the generation of hiPSCs, are likely to substantially improve in vitro toxicology testing. Genome modification using CRISPR and related techniques should make target validation and mechanistic studies much more efficient. Evaluation of treatment responses on isolated stem cell populations may finally help to change the focus from treatments that simply result in temporary growth inhibition and reduction in tumor bulk to those that specifically target the stem cell population. Our increasing ability to manipulate the in vitro tissue culture environment significantly enhances the value of cell lines for drug discovery and development.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
- Received October 15, 2013.
- Revision received January 29, 2014.
- Accepted February 9, 2014.
- ©2014 American Association for Cancer Research.