Abstract
Cellular information dynamics during somatic evolution of the malignant phenotypes are complex and poorly understood. Accumulating random genetic mutations and the consequent loss of genomic information appear necessary for carcinogenesis. However, additional control parameters can be inferred because unconstrained mutagenesis would ultimately produce cellular information degradation incompatible with life. Similarly, the stability of some genomic segments, such as those controlling proliferation and metabolism, indicates the presence of selective mutational constraints.
By applying Information Theory and Extreme Physical Information (EPI) analysis, we demonstrate that the phenotypic characteristics and growth pattern of cancer populations are emergent properties resulting from the nonlinear dynamics of accumulating, random genetic mutations and tissue selection factors. Maximum quantitative loss of transgenerational information is demonstrated in genomic segments encoding negative or neutral evolutionary properties. This is most evident in the progressive dedifferentiation observed during carcinogenesis and may terminate in a differentiation “information catastrophe” producing decoherent cellular morphology and function.
In contrast, microenvironmental selection pressures preserve genomic information controlling properties that confer selective growth advantages even in the presence of a high background mutation rate. Thus, phenotypic traits characteristically retained by tumor populations can be identified as critical selection parameters favoring clonal proliferation.
The information model of carcinogenesis is tested by applying EPI analysis to predict tumor growth dynamics. We found that cellular proliferation attributable to information degradation will produce power law tumor growth with an exponent of 1.62. Data from six published studies that use sequential mammograms to measure the volume of small, untreated human breast cancers demonstrate power law tumor growth with a mean exponent value of 1.73 ± 0.23. Other predictions including exponential growth of tumor cells in vitro are also supported by experimental observations.
The nonlinear dynamics of stochastic information loss constrained by somatic evolution indicate that carcinogenesis will not be associated with any predictable, fixed sequence of genomic alterations. Rather, sporadic clinical cancers are emergent structures produced by multiple, fundamentally nondeterministic genetic pathways.
INTRODUCTION
The formation of coherent societies of diverse cellular phenotypes from a common inherited genome is dependent on the translation of specific subsets of inherited information and the ability of cells to receive and process information in the environment (1). The active information content of a cell is a time-dependent summation of translated intracellular and acquired extracellular information. This information state controls the morphology and function of that cell as well as its interaction with the external environment. Abnormalities in cellular information content will result in disease, as clearly exemplified by several thousand genetically linked disorders (2).
Less clear is the role of disordered cellular information in cancer. Although carcinogenesis typically requires multiple genetic mutations (3, 4, 5), the mechanisms by which this information loss produces the malignant phenotype remain imprecisely defined. This is primarily because of the absence of a prototypical malignant genotype. Cancer cells typically possess hundreds and even thousands of genomic errors, and unique patterns of genetic mutations are found in virtually every different tumor (5, 6). Thus, unlike classical genetic diseases, no single genotype is invariably present in cancer cells, and therefore there is no well-defined correspondence between the genetic mutations present in cancer populations and the cellular characteristics of the malignant phenotype.
Clearly, the large number of genetic mutations found in cancer cells indicates a critical role for genomic information degradation in tumorigenesis. However, although cellular information loss through random accumulating genetic mutations appears necessary for carcinogenesis (7, 8, 9) , two general observations demonstrate that it is not sufficient:
(a) A constant mutation rate, in the absence of any modifying constraints, will monotonically increase genomic noise while useful information decays until it is quantitatively insufficient to maintain life.
(b) The cancer phenotype exhibits apparent nonrandomness in its genomic degradation such that differentiated cell functions are progressively lost while the gene products necessary for proliferation and energy production remain functional (and often amplified) even in the most advanced cancers.
Thus, cancer, as a “disease of the genes” (10), is a complex disorder of the storage, processing, and propagation of cellular information maintained in the genome. This general class of phenomenon can be addressed through information theory developed by Shannon (11) and Fisher (12) to measure information content and communication. Their seminal contributions have spawned extensive research in the physical sciences (13, 14) and modest application in biology (15).
Both Shannon (11) and Fisher (12) information measures are examples of a more general form of information called cross entropy (Ref. 16; also often called “Kullback-Leibler information”). Although the concepts of Shannon’s information (11) and Fisher’s information (12) that provide the basis for our analyses are but specific cases of one general information measure, we will specifically identify the cross entropy forms as being of the Shannon or Fisher type. We return to this point below.
In this report, we develop mathematical models of cellular and extracellular information dynamics in carcinogenesis. This approach is not meant to be analogous to the linear model of sequential genetic changes in colorectal tumorigenesis described by Fearon and Vogelstein (17) . That is, we do not address the role of specific defects in oncogenes and tumor suppressor genes, the sequence of these changes, the causes of tumor genomic instability, or the influence of nonneoplastic cells during carcinogenesis. Rather, we are interested in understanding the dynamic forces including genomic instability and environmental selection factors that control the evolving cellular information content resulting in the malignant phenotype and ultimately invasive cancer growth.
By applying Shannon’s information (11) to carcinogenesis, we demonstrate the interactions of stochastic mutation events and environmental selection pressures (18, 19) that produce the cellular characteristics of the malignant phenotype. Following this, we used Fisher’s information to analyze tumor population growth dynamics based on perturbations in environmental information produced by genomic alterations during carcinogenesis. Physical information is the difference between two Fisher information quantities (see below). Using the EPI technique, we accurately predicted tumor growth dynamics in vivo and in vitro, based solely upon changes in the environmental information content.
INTRACELLULAR INFORMATION IN CARCINOGENESIS
The character of information in biological systems can be described as bound or free (20). Bound information is encoded in the genome, serving as a reservoir to be passed from one generation to the next. Free information is contained in nonnucleotide organic polymers such as proteins, lipids, fatty acids, and polysaccharides that regulate cellular morphology and function. Free information is dependent in a complex way on the translation of specific subsets of the bound information and their subsequent interactions. Free information is also present in the extracellular space in multicellular organisms. In this case, cellular function is influenced by both the intra- and extracellular free information.
The bound genetic information can be calculated using Shannon’s information (11) function as demonstrated by Kendal (20). This approach measures the quantity of information of a string of symbols such as DNA (21). If there are γ equally probable arrangements for the symbols of a coded message, then the information content is the displacement from randomness defined by k ln γ, where k is a constant. The information content is often termed the “entropy,” although its relationship to the traditional thermodynamic entropy term is complex and outside the scope of this report. An increase in the Shannon entropy implies an increase in disorder and loss of information. There are several limitations to the application of Shannon’s information to biological systems: (a) because codon degeneracy introduces redundancy into the genome, Shannon’s information will systematically overestimate the actual information content of a genetic segment; and (b) although Shannon’s information may measure the quantity of information, it does not measure its quality. That is, a small change in the information content of a gene (i.e., a single bp mutation) may catastrophically alter the function of a gene product while multiple errors in a less critical segment of the gene may have no biological consequences.
Despite these limitations, Shannon’s information has been shown to be a useful and reasonably accurate measure of genetic information (20) . Furthermore, because Shannon’s information is both additive and conserved, it is well suited to estimating changes in information content over time. Fisher’s information (12) has these properties as well.
Let a cellular genotype consist of G genes, indexed g = 1, 2, …, G. If at time t a gene consists of m codons, the entropy H^{i} of the ith codon, in which r_{j}^{i} is the probability of occurrence of the jth configuration of the ith codon, is:
H^{i} = −k Σ_{j=1}^{N} r_{j}^{i} ln r_{j}^{i}    (A)
The upper limit N is 64 (the number of bp combinations in each triplet). If the m codons are equally probable, then all H^{i} = H, a single value. Because there is degeneracy in the third position of the triplet code, this slightly overestimates the actual genetic information content (20). The maximum bound information content of each gene I_{g} consisting of m codons is:
I_{g} = Σ_{i=1}^{m} H^{i} = mH    (B)
The bound information content of the cell I_{c} is estimated as the sum of the information content of all of the genes:
I_{c} = Σ_{g=1}^{G} I_{g}    (C)
by additivity over the codons.
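The codon entropy and its additivity over codons and genes can be illustrated with a minimal numerical sketch. The natural logarithm, the choice k = 1, the gene lengths in `codon_counts`, and the degeneracy bound ln 21 (20 amino acids plus a stop signal, treated as equally likely) are illustrative assumptions and are not values taken from the text.

```python
import math

def codon_entropy(probs):
    """Shannon entropy (natural log, k = 1 assumed) of one codon's configurations."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

N = 64                                 # number of possible nucleotide triplets
uniform = [1.0 / N] * N                # equally probable configurations
H = codon_entropy(uniform)             # maximum entropy per codon, equal to ln 64

def gene_information(m, H):
    """Maximum bound information of a gene with m equally probable codons."""
    return m * H

def cell_information(codon_counts, H):
    """Bound information of a cell, summed over its genes (additivity)."""
    return sum(gene_information(m, H) for m in codon_counts)

codon_counts = [300, 450, 1200]        # hypothetical gene lengths, in codons
I_c = cell_information(codon_counts, H)

# Illustrative upper bound once synonymous codons are merged into 21 meanings.
H_degenerate = math.log(21)

print(f"H per codon = {H:.4f}, I_c = {I_c:.1f}, degenerate bound = {H_degenerate:.4f}")
```

The gap between `H` and `H_degenerate` gives a rough sense of how far codon degeneracy can inflate the Shannon estimate under these assumptions.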
After reproduction at a time (t + Δt), the entropy H of the transgenerational encoded information depends on the probability r̄_{j}^{i} (t + Δt) of finding the identical component of j in the ith codon: In the absence of mutations, all values of r̄_{j}^{i} (t + Δt) = r̄_{j}^{i} (t). Then H̄ = H, and there is no change in the entropy. However, in the presence of mutations, the two probabilities are not equal, therefore the two entropies are likewise not equal, and an information change for gene g can be defined as:
In the evolutionary environment that develops during carcinogenesis (22, 23), the information content of each gene may contribute to the selective growth advantage of the cell which, in turn, determines its ability to proliferate. We designate u as a measure of this growth advantage, u_{g} being that for gene g. Cells with the highest values of u_{g} in the tissue will proliferate at the expense of those with lower values. We suppose that u_{g} is some function f of the information content I_{g}:
u_{g} = f(I_{g})    (F)
and after a generation the growth advantage generated by gene g becomes:
u_{g} + Δu_{g} = f(I_{g} + ΔI_{g})    (G)
where u_{g} can be 0, positive, or negative (i.e., it can confer growth advantage, disadvantage, or have no interaction with evolutionary selection pressures). Similarly, Δu_{g} can increase, decrease, or leave unchanged the cellular growth advantage. The total growth advantage of a cell with G genes can be estimated as:
u = Σ_{g=1}^{G} u_{g}    (H)
This assumes additivity of the growth advantages for each gene, which is undoubtedly an oversimplification because the complex interactions of gene products will likely produce nonlinear dynamics.
Building on Eq. H, we define the proliferative capacity P_{c} of each cell population c as some increasing function h of the genetic growth information u_{c} of that population as compared with the mean value ū over all of the M cellular populations present:
P_{c} = h(u_{c} − ū),  ū = (1/M) Σ_{c=1}^{M} u_{c}    (I)
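A small sketch may help make the comparison with the population mean concrete. The text specifies only that h is an increasing function, so the exponential used here, the per-gene advantages, and all function names are hypothetical choices made for illustration.

```python
import math

def total_growth_advantage(gene_advantages):
    """Additive growth advantage of one population, summed over its genes."""
    return sum(gene_advantages)

def proliferative_capacities(populations, h=math.exp):
    """Apply an increasing function h to each population's advantage relative to the mean.

    h = exp is an assumed stand-in; the text only requires h to be increasing.
    """
    u = [total_growth_advantage(genes) for genes in populations]
    u_bar = sum(u) / len(u)            # mean advantage over the M populations
    return [h(u_c - u_bar) for u_c in u]

# Hypothetical per-gene growth advantages for M = 3 populations.
populations = [
    [0.2, 0.0, -0.1],   # mildly advantaged
    [0.5, 0.1, 0.0],    # strongly advantaged
    [0.0, 0.0, 0.0],    # neutral
]
P = proliferative_capacities(populations)
ranking = sorted(range(len(P)), key=lambda c: P[c], reverse=True)
print(P, ranking)
```

Because h is increasing, the ranking of populations by proliferative capacity is the same as their ranking by total growth advantage, whatever specific form of h is assumed.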
EXTRACELLULAR INFORMATION CONTENT IN CARCINOGENESIS
The preceding analyzed the role of perturbations in bound information on intracellular free information. Free extracellular information in the presence of stochastic changes in bound information is now briefly examined.
Consider an arbitrarily small volume of tissue in a multicellular organism that contains subpopulations of cell types c with c = 1, …, R. There are S_{c} members of each cell type c. The total environmental information I_{e} over all the cell types can be expressed as a function of the bound intracellular information as follows: where q_{c} is the number of translated extracellular copies (proteins) of the information content of each gene I^{g} in the microenvironment from each local cell type c (where there are a total of G genes present as above). These organic molecules are assumed to contribute additively. Also, for simplicity we will for now neglect diffusion of information in and out of the local environment.
Alternatively, I_{e} can be simply characterized by a distribution function φ(x,t) that describes the local environmental concentration of all organic molecules x at time t. This distribution function, however, is directly dependent on the number and differentiated phenotype of the local cellular populations. That is φ_{liver}(x,t) will differ from φ_{kidney}(x,t) and so on. Furthermore, φ_{kidney}(x,t) will vary spatially within the kidney because, for example, the environmental information around the glomerulus will differ from that around the distal tubule. In other words, the environmental free information φ(x,t) is a measure of the morphology and function of the constitutive cells in the tissue. We note the possibility of feedback loops such that the environmental information maintains the stability of the cellular distribution that produces it. This may provide a quantitative model for the role of extracellular information in controlling differentiation.
In normal tissue, the number and phenotypic distribution of cells will remain constant maintaining a stable I_{e}. However, in the presence of transformed cells, S_{c}, R, and f_{c} will be subject to stochastic perturbations from random genetic mutations as well as local selection forces. This will produce a complex dynamics in which, by Eq. J , I_{e} will vary with time.
Thus, in the absence of severe injury or infection, the information distribution function is highly robust: φ(x,t) = φ(x,t + Δt), where Δt represents a time frame on the scale of a cellular generation (i.e., days or weeks), thus smoothing diurnal and day-to-day stochastic perturbations. The presence of tumor will alter the distribution function depending on the number of tumor cells present (e.g., the presence of tumor typically decreases glucose concentration and increases lactic acid concentration). This time-dependent change in the environmental information content will, thus, be a measure of tumor growth within the tissue.
We will now examine the consequences of this intra- and extracellular information degradation in the development of the malignant phenotype and tumor formation. The former follows immediately and focuses on cellular morphological and functional effects of random genomic mutations constrained by environmental selection factors. Subsequently, we explore the role of transformation-induced changes in the extracellular information content in tumor growth dynamics in vivo and in vitro using EPI.
LIMITS OF INFORMATION DEGRADATION IN CARCINOGENESIS
We view the interaction of mutation rate, evolutionary competition, and transmitted information in carcinogenesis by applying the limit developed by Eigen and Schuster (24) . In our previous notation, this is: where I_{cmax} is the maximum number of “signals” or genetic information that can be passed successfully through each generation, u_{c} − ū is the competitive advantage of the population compared with the average of all of the local populations, and α is the mutation rate. Rearranging and adding the result to Eq. I gives:
We now have explicitly linked the proliferation of transformed populations to mutation rate and maximum information transfer. Environmental parameters are indirectly included in Eq. L because they modify the specific competitive advantage of one or all of the populations (i.e., change u and ū). Again, note the time frame is one or more cellular generations, thus smoothing stochastic events that might transiently reverse the otherwise monotonic-appearing multistep cellular progression from normal to cancer.
The functional relationship of information loss, mutation rate, and selective growth advantage can be applied to individual genetic segments via Eq. F . Genes with small u_{g} contribute minimally to the overall fitness of the cell and are, by Eq. I , subject to the maximal information degradation, whereas genes with large u_{g} will demonstrate the least drift from the initial information content.
If genes encoding specific differentiated traits contribute minimally to the proliferative advantage of the cell (i.e., have a small value for u_{g}), then I_{max} will be monotonically degraded by unconstrained mutational forces. This is reasonable because differentiated morphology and function are specialized traits found only in multicellular organisms and can be expected to influence the survival of the organism but not the competitive advantage of an individual cell. The time-dependent decline in I_{max} is apparent clinically in progressive morphological and functional drift away from the original differentiated phenotype observed during carcinogenesis (Ref. 25; Fig. 1). Interestingly, the unconstrained accumulation of mutations that produces dedifferentiation may eventually produce sufficient genetic noise that a differentiation phase transition (24) results. This new cell state, in the absence of sufficient transgenerational information to retain a “memory” of the original differentiated state, will exhibit nonphysiological combinations of phenotypic traits, such as expression of endothelial cell traits in melanoma and breast cancer (26, 27). These cells will, in turn, produce morphologically disorganized and poorly functioning tissue typical of invasive cancers (Ref. 28; Fig. 1).
Similarly, mutations in genes with large u_{g} will decrease the cellular competitive advantage, resulting in nonproliferation or cell death. Although this is catastrophic for the individual cell, it eliminates the entry of the mutation into the evolving neoplastic population. Thus, genes controlling phenotypic traits critical for proliferation (such as oncogenes, glucose transporters, enzymes in the glycolytic pathway, angiogenesis promoters, and others) are predicted to exhibit only gain of function or amplification genetic defects, whereas loss of function defects will dominate in noncritical genes.
This prediction is supported by carcinogenesis literature showing gain of function defects in oncogenes such as KRAS and amplification of glucose transport genes (29, 30, 31). An interesting test of this model is observed when environmental selection forces are perturbed by the administration of chemotherapy. This new environmental selection parameter will increase the value of u for genes controlling the multidrug resistance phenotype. The predicted amplification and gain of function mutations in these genes during the subsequent evolution of the cancer population have been observed (32, 33, 34).
The development of angiogenesis probably represents a similar evolutionary phenomenon. Early cancers rely on diffusion and the native blood supply for substrate delivery. As the tumor grows, angiogenesis becomes a critical survival parameter. It has been shown that, as predicted by this model, the angiogenic phenotype is found far more frequently in “older” tumors and develops far more quickly in the presence of a high mutation rate (35).
Specific environmental selection pressures can be inferred by common phenotypic traits found in transformed populations. Thus, the ability of most transformed cells to withstand severe conditions of nutrient deprivation suggests that enhanced ability to obtain and use substrate efficiently is strongly selected in the Darwinian environment of carcinogenesis (36) .
Thus, for genes with small values of u_{g}: and genes with large values of u_{g}: or where I_{g} is the information content of the gene and I_{gmax} represents some minimal quantity that allows the gene product to remain functional. In other words, for high u_{g} genes, the time-dependent function describing the information loss has an extremum at I_{gmax} because of the selection pressures. This allows the cells to maintain a minimal information content necessary for life (i.e., they continue to proliferate and generate energy) despite the marked drift in morphology and function that results from genomic instability.
THE EVOLVING MICROENVIRONMENT AND MUTATION RATE
We briefly address the critical role of an unstable microenvironment in carcinogenesis. From Eqs. J and L, it is apparent that although cellular mutations will produce perturbations in the microenvironment, these changes, in turn, will alter the competitive advantage of individual cell populations u_{c} and the average of all of the populations ū. For example, if the competitive macroenvironment of carcinogenesis results in dū/dt > 0, clonal expansion will be restricted to populations with du_{c}/dt > dū/dt. This accelerating rate of competitive phenotypes is consistent with observations of malignant progression in which increasingly aggressive cellular populations emerge over time during carcinogenesis (37). Other perturbations that may be stimulated by the presence of tumor cells include host immune response. If, for example, cytotoxic T cells detect a specific surface antigen expressed on some of the tumor cells, the u_{c} of that population will decline. However, as noted below, the forces of mutation and selection may simply result in new resistant populations as the total tumor population continues to evolve into progressively more diverse phenotypes (38).
In an evolving microenvironment, from Eq. K and the increasing nature of function h with its argument, populations with du_{c}/dt < 0 will not survive (an obvious exception being the situation in which ū is decreasing at a greater rate) and those with du_{c}/dt > 0 will proliferate. In this way, the high mutation rate found in cancer is favored because it will produce a greater cellular diversity and increased probability of a proliferative phenotype. This is demonstrated in bacterial studies that have found a 5000-fold increase in the mutation rate when culture conditions became more restrictive and were made to evolve over time (39). Clinical studies have demonstrated mutation rate increases during tumor progression (40). Loeb (8) has hypothesized that acquisition of the mutator phenotype is a critical and necessary event in carcinogenesis. Our analysis supports this hypothesis.
APPLICATION OF EPI TO TUMOR GROWTH
We now turn our attention to the dynamics of altered environmental information caused by the degradation of the genome in transformed cells. As noted earlier, normal tissue will contain an extracellular distribution of organic molecules φ(x,t) that will remain stable with time under physiological conditions. However, the presence of tumor will perturb this distribution, depending on the number of tumor cells present. Thus, changes in φ(x,t) will be dependent on the number of tumor cells present and the deviation of their internal information content from the tissue of origin. This new (cancer) distribution function φ_{c}(x,t) will increasingly differ from φ(x,t) with time. This difference between the distribution functions conveys information regarding the onset and progression of carcinogenesis.
This model prediction of time-dependent environmental change as a consequence of cellular information degradation during carcinogenesis can be tested by applying EPI techniques to derive estimates of tumor growth kinetics. This calculation is carried through next.
The problem we address is the form taken by the marginal probability law p(t) of p_{c}(x,t). This marginal law also represents, by the law of large numbers, the relative number of cancer cells anywhere in the space x of points within an affected organ. The time dependence of p(t) then defines as well the growth with time of the cancer.
The EPI approach (14, 41, 42) uses Fisher information (12) rather than Shannon information (11) H or I (as in the preceding). However, there is a tie-in between the two. Both are manifestations of a cross entropy measure of information (16), in which a probability law p(t) is compared with a second probability law r(t) called the “reference” law. Mathematically, the Shannon information (11) is the cross entropy between a probability law p(t) and a constant, or uniform, reference law r(t) = const. By comparison, Fisher’s information is the cross entropy between a probability law p(t) and its slightly shifted version r(t) = p(t + Δt).
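The two reference-law choices can be checked numerically. The sketch below assumes a Gaussian test law p(t), the sign convention ∫ p ln(p/r) dt for the cross entropy, and a simple midpoint-rule integrator; all of these, along with the names used, are illustrative assumptions. Under them, the cross entropy against a slightly shifted copy of p is, to leading order, (Δt)²/2 times the Fisher information, while the cross entropy against a uniform reference is a constant minus the Shannon entropy.

```python
import math

SIGMA = 1.0          # assumed width of the Gaussian test law
DT = 0.01            # small shift defining the reference r(t) = p(t + DT)
A, B = -10.0, 10.0   # integration window (tails beyond it are negligible)

def p(t):
    """Gaussian probability density used as the test law."""
    return math.exp(-0.5 * (t / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))

def cross_entropy(p_law, r_law, a, b, n=20000):
    """Midpoint-rule estimate of the integral of p ln(p/r) over (a, b)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        pt, rt = p_law(t), r_law(t)
        if pt > 0.0 and rt > 0.0:
            total += pt * math.log(pt / rt) * h
    return total

# Shifted reference: leading-order link to the Fisher information.
kl_shift = cross_entropy(p, lambda t: p(t + DT), A, B)
fisher_from_shift = 2.0 * kl_shift / DT**2
fisher_exact = 1.0 / SIGMA**2          # Fisher information of a Gaussian location

# Uniform reference: link to the Shannon entropy.
kl_uniform = cross_entropy(p, lambda t: 1.0 / (B - A), A, B)
shannon_entropy = 0.5 * math.log(2.0 * math.pi * math.e * SIGMA**2)

print(fisher_from_shift, fisher_exact)
print(kl_uniform, math.log(B - A) - shannon_entropy)
```

For the Gaussian the shifted-reference relation happens to be exact rather than only leading order, which makes it a convenient test case.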
The two measures also differ qualitatively. Shannon’s information measures the degree to which a random variable is uniformly random (i.e., resembles its uniform reference function, see above). This in turn measures the ability of a system to distinguish signals, as in Eq. K , where it is used to distinguish genotypes. By comparison, Fisher’s information measures the degree to which a required parameter (in our case, the time) may be known (14) . Knowledge of the time is particularly apt to the analysis of cancer growth, as is discussed below.
The working hypothesis of the EPI approach (14 , 41 , 42) is that the data in any measurements result from a flow of information, specifically Fisher’s information, that proceeds from an information source to a sink in data space. This follows a model of Brillouin (16) . The information source is the phenomenon that underlies the data. Its information level is denoted as J. The data information is called I, and the information flow is symbolized by the transition J → I. The flow physically consists of material particles that carry the information. Each phenomenon generally uses different messenger particles. (For example, in an optical microscope, the particles are photons.) The source information J represents the Fisher information that is intrinsic to the measured phenomenon. Assuming that we are dealing with an isolated, passive system, the received information can at best equal the source level, i.e., I ≤ J. The level J corresponds as well to the bound, as opposed to free, information that was referred to earlier.
In our analysis J represents the total extracellular Fisher information that is produced by a cancer cell. The information is about the time, in particular the time θ, at which the carcinogenesis process began. This information may be carried by several biological intermediates. Of these, protons appear to be the dominant messengers because increased glucose uptake and excessive secretion of acid into the extracellular spaces are observed in the vast majority of clinical cancers (43 , 44) . Furthermore, experimental demonstration of the proton as a biological messenger between macromolecules has been published recently (45) . Other carriers of information may exist as well, such as increased interstitial pressure or decreased oxygen or glucose concentrations.
The totality of such information that reaches a normal cell because of all neighboring cancer cells is what we call I. Thus, I is the total information that is provided to the functioning cell by all its neighboring cells about the onset time θ of carcinogenesis. Empirically, the received information tends to suppress the receiver, i.e., the normal cell. We now apply EPI to predict tumor growth dynamics expected from this model.
EPI SELF-CONSISTENCY APPROACH, GENERAL CONSIDERATIONS
If θ is the age of the individual at which carcinogenesis began, t is the time since it began (i.e., the age of the tumor). At the time (θ + t) ≡ τ, some quantity of information (e.g., lactate concentration) is present in the environment because of the presence of the developing tumor. In the usual clinical situation, when an observer detects the information (the presence of a cancer), he has knowledge of the value of τ but not the value of θ. Of course, by the above definition of τ, the detected information (i.e., the size of the tumor) carries information about θ. For example, if t (and, therefore, the tumor size) is small, then the value of τ is very close to the value of θ. We seek to find p(τ|θ), the probability of observing the existence of a tumor at time τ in the presence of the fixed value of θ. In other words, we want the probability of detecting the environmental information resulting from the presence of a tumor that originated at time θ at some later time τ. This is called the “likelihood law” for the measurement process. This likelihood law is equivalently p(t), because it will not change shape with a shift in the absolute position θ.
Thus, we define time τ as simply the time at which a cancer is observed in a given organ. t is its unknown (random) component since carcinogenesis began, and p(t) is the probability density for the latter.
By the law of large numbers, p(t) is also proportional to the relative mass of the cancer, i.e., the total mass of cancer cells within the organ divided by the total organ mass. This assumes the cancer cells and functioning cells to have about the same mass density.
The EPI approach establishes an answer p(t) by first determining its real probability amplitude function q(t), where:
p(t) = q^{2}(t)    (Q)
Here we seek an EPI self-consistent solution for q(t). This is to satisfy the basic condition that the information loss (I − J) in transit from the phenomenon to the data is an extreme value:
I − J = extremum    (R)
through choice of the function q(t). The solution q(t) to the problem obeys the Euler-Lagrange condition (14):
d/dt [∂(i − j)/∂q′] = ∂(i − j)/∂q,  q′ ≡ dq/dt    (S)
The indicated differentiations in Eq. S result in a differential equation in the unknown function q(t). This can be linear or nonlinear, depending upon the case. A linear example from physics is the Schroedinger wave equation, a linear, second-order differential equation (14, 42). A nonlinear example from genetics is the law of genetic change (41). Biological growth processes, including cancer growth, are generally nonlinear (41).
Eq. S is to be supplemented by a zero-condition:
i − κj = 0    (T)
as a constraint. This merely states that I = κJ, or I ≤ J as mentioned before. The solution to the combined problem Eqs. S and T is called a self-consistent EPI solution.
Quantity κ is a positive constant that is to be found. Quantities i and j are information densities, i.e., the integrands of information functionals:
I = ∫ dt i(t),  J = ∫ dt j[q,t]    (U)
i(t) = 4q′^{2}(t)    (V)
where Eqs. U and V together define the Fisher information level I in received data (see above) for a temporally shift-invariant system. From Eq. T, the constant κ ≡ i/j represents the efficiency with which information j is transformed into data information i. We next find the self-consistent solution to the problem Eqs. R and T.
A POWER-LAW SOLUTION
The constant κ and the functional j[q,t] are the effective unknowns of the problem. They are solved for by carrying through the self-consistency approach. The solution is found in Eq. (A21) of Appendix A to be:
q(t) = A t^{κ/(1 + κ)}    (W)
where A = const. This is a power-law solution. The power κ/(1 + κ) is as yet unknown. We find it next.
DETERMINING THE POWER
Without loss of generality, Eq. W may be placed in the form:
q(t) = E t^{α},  E = const    (X)
The power is now α. Our overall aim is to find this power.
Because the form of Eq. X grows without limit in t, for the corresponding p(t) to be normalized the cancer growth must be presumed to exist over a finite time interval t = (0, T). This is of course the survey time over which cancers are observed. The normalization requirement then gives a requirement:
∫_{0}^{T} dt p(t) = E^{2} ∫_{0}^{T} dt t^{2α} = 1,  E^{2} = (2α + 1) T^{−(2α + 1)}    (Y)
after the integration.
The Fisher information conveyed by the cancer over the entire survey period (0, T) is:
I = 4 ∫_{0}^{T} dt q′^{2}(t)    (Z)
EPI theory (14) generally requires that a “free field” solution for q(t) convey minimum Fisher information. The field would in general be some source or force that is imposed from outside that modifies the evolution of q(t). Examples of external fields might include clinical treatment such as some regimen of chemotherapy, tissue constraints such as inadequate blood supply producing tumor necrosis, host immune response, and physical barriers such as adjacent bone or cartilage. However, here we are studying a freely growing tumor. This then defines our free-field scenario. Thus, we require the Fisher information for the solution q(t) to be a minimum. The minimum will be through choice of the free parameters of the process. The observation interval T is already fixed, and E is fixed in terms of it by the second Eq. Y. Hence, there is only one free parameter left, the power α.
Differentiating Eq. X, substituting it into Eq. Z and using Eq. Y gives, after an integration:
$$I = \frac{4\alpha^2(2\alpha+1)}{(2\alpha-1)\,T^2}. \tag{AA}$$
This may be minimized through choice of α by ordinary differentiation. Setting ∂I/∂α = 0 results in the algebraic equation:
$$4\alpha^2 - 2\alpha - 1 = 0, \quad \text{with} \quad \alpha = \tfrac{1}{4}\left(1 \pm \sqrt{5}\right) \tag{BB}$$
as roots. However, the negative square-root choice when placed in Eq. AA gives a negative value for I, which is inconsistent because I must be positive [see Eqs. U and V]. Thus, the answer is a power-law solution:
$$p(t) = E^2\,t^{\gamma}, \qquad \gamma = 2\alpha = \tfrac{1}{2}\left(1+\sqrt{5}\right) \approx 1.618. \tag{CC}$$
This value of the power γ can be compared with published data on growth rate determined by sequential mammographic measurements of tumor size in human breast cancers (46, 47, 48, 49, 50, 51, 52). In the six available studies (46, 47, 48, 49, 50, 51), the value of γ was: 1.72, 1.69, 1.47, 1.75, 2.17, and 1.61 (mean 1.73 with standard deviation 0.23). The tumors in these studies seem well suited to test our prediction because they were small (<2 cm in diameter), untreated cancers growing within the breast, i.e., the tumors were not subject to any apparent clinical or tissue constraints, so that the conditions for a “free field” solution are probably met.
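The minimization just described can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes the closed form I(α) = 4α²(2α + 1)/[(2α − 1)T²] that follows from q(t) = E t^α with i = 4(dq/dt)², and it uses a golden-section search (our choice of method) restricted to α > 1/2, where the integral converges. The function names and the search interval are hypothetical.

```python
import math
import statistics


def fisher_information(alpha, T=1.0):
    """Fisher information of q(t) = E t^alpha on (0, T); assumes alpha > 1/2."""
    return 4 * alpha**2 * (2 * alpha + 1) / ((2 * alpha - 1) * T**2)


def minimize_alpha(lo=0.51, hi=3.0, tol=1e-10):
    """Golden-section search for the alpha that minimizes fisher_information."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fisher_information(c) < fisher_information(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


alpha_num = minimize_alpha()
alpha_exact = (1 + math.sqrt(5)) / 4  # positive root of 4a^2 - 2a - 1 = 0
gamma = 2 * alpha_num                 # power of p(t) = q(t)^2

# Exponents reported in the six mammographic studies quoted in the text.
observed = [1.72, 1.69, 1.47, 1.75, 2.17, 1.61]
mean_obs = statistics.mean(observed)
sd_obs = statistics.stdev(observed)   # sample standard deviation

print(f"alpha (numeric) = {alpha_num:.6f}, alpha (exact) = {alpha_exact:.6f}")
print(f"gamma = {gamma:.4f}; observed mean = {mean_obs:.3f}, sample SD = {sd_obs:.3f}")
```

The numerical minimum can be compared directly with the positive root of the quadratic, and the predicted exponent with the spread of the six reported values.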
EXPERIMENTAL VERIFICATION
The power-law result Eq. CC was experimentally confirmed as follows. The most convenient clinical data for purposes of confirming Eq. CC would be directly observed values of p(t), i.e., cancer mass versus time. On the other hand, the clinical data in Refs. 46, 47, 48, 49, 50, 51 do not consist of values of p(t) but rather values of p(v), the occurrence of tumor volumes (or masses) over all time. However, such data can in fact be used. It is shown by Hart et al. (52) that if p(t) obeys a power law Eq. CC, then p(v) likewise obeys a power law, but with a different power:
$$p(v) \propto v^{(1-\gamma)/\gamma}. \tag{DD}$$
Given the value of γ in Eq. CC, this becomes
$$p(v) \propto v^{-0.382}. \tag{EE}$$
The exponent is approximately −0.382. Hence, on a log-log basis, Eq. DD predicts a straight-line relation with a slope of −0.382. The data in Refs. 46, 47, 48, 49, 50, 51 are values of p(v) versus tumor size (apparent diameter). We cubed the diameters to give equivalent volumes v.
Fig. 2 shows, on a log-log basis, the theoretical solution Eq. EE with, for comparison, all points of the clinical studies in Refs. 46, 47, 48, 49, 50, 51, 52 that are quantitatively tabulated. [One point in each of the three studies (46, 47, 48) is ignored because the indicated tumor size for these points is the indefinite value 2+.] The agreement between the theoretical curve and these experimental data is visually good. Moreover, as we saw above, the predicted exponent (1.62) lies within 1 SD of the experimentally determined mean (1.73 ± 0.23).
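The change from a power law in time to a power law in volume can be illustrated with a short simulation. This is only a sketch under an assumption of ours, not necessarily the argument of Hart et al.: tumors are taken to be observed at times uniformly distributed over a unit survey interval, with volume v = t^γ. Under that assumption the volume density is proportional to v^(1/γ − 1), and the exponent is estimated here by maximum likelihood. All names are hypothetical.

```python
import math
import random

GAMMA = (1 + math.sqrt(5)) / 2  # predicted growth exponent


def volume_exponent(gamma):
    """Exponent of the volume distribution, (1 - gamma) / gamma."""
    return (1 - gamma) / gamma


def simulated_volume_exponent(gamma, n=20000, seed=1):
    """Estimate the exponent of p(v) from simulated volumes v = t^gamma.

    Assumption: observation times t are uniform on (0, 1]. Then p(v) = a v^(a-1)
    with a = 1/gamma, and the maximum-likelihood estimate of a is -n / sum(log v).
    """
    rng = random.Random(seed)
    log_v = [gamma * math.log(1.0 - rng.random()) for _ in range(n)]
    a_hat = -n / sum(log_v)
    return a_hat - 1


analytic = volume_exponent(GAMMA)
simulated = simulated_volume_exponent(GAMMA)
print(f"analytic exponent = {analytic:.4f}, simulated exponent = {simulated:.4f}")
```

For the golden-mean value of γ the analytic exponent also equals γ − 2, which is why the slope and the growth exponent share the same decimal digits.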
Of course, we must also emphasize the many factors that may produce clinical variation from the predicted growth dynamics. These include:
(a) Altered tumor cell volume. We assumed that the tumor cell volume is exactly equal to the volume of the normal cells, but some heterogeneity in tumor cell size is frequent.
(b) Power law growth is predicted only for tumors unconstrained by external factors (i.e., a “free field”) and thus represents a relative maximum value, i.e., the predicted dynamics are only applicable to tumor growth governed by unimpeded in vivo replication of the tumor cells. If tumor cell expansion is constrained by external factors such as inadequate vascular supply, host response, spatial considerations or clinical therapy, slower growth will be observed.
(c) Tumors may contain significant volumes of noncellular regions such as necrosis and blood vessels or large numbers of nonneoplastic cells such as endothelium, fibroblasts, macrophages, and lymphocytes.
It is encouraging, then, that despite these possible complications, the predicted tumor growth law agrees well with the experimental data.
EFFICIENCY κ
By the second Eq. X, κ = α/(1 − α). Then by the positive root of Eq. BB:
$$\kappa = 2 + \sqrt{5} \approx 4.24. \tag{FF}$$
This is the total efficiency attributable to the unknown number S (see Eq. I) of cancer cells that are communicating with the functioning cell. Because the efficiency for any single cell can be at most unity, this implies that there are at least S = 4 neighboring cancer cells that are interacting with the functioning cell. That S is rather large means that a relatively large amount of lactic acid bathes the healthy cell. This forms a toxic environment that may eventually kill the normal cell, leading to compromise of functioning tissue and death of the host. This mechanism of tumor invasion has been proposed and modeled previously (43).
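The arithmetic behind the efficiency can be written out explicitly. The steps below are a worked sketch that assumes only κ = α/(1 − α) and the positive root α = (1 + √5)/4; the closing identity with γ³ is our observation, not a claim made in the text.

```latex
\[
\kappa = \frac{\alpha}{1-\alpha}
       = \frac{(1+\sqrt{5})/4}{(3-\sqrt{5})/4}
       = \frac{(1+\sqrt{5})(3+\sqrt{5})}{(3-\sqrt{5})(3+\sqrt{5})}
       = \frac{8+4\sqrt{5}}{4}
       = 2+\sqrt{5} \approx 4.236 .
\]
% Since \gamma^2 = \gamma + 1 for the golden mean, \gamma^3 = 2\gamma + 1 = 2 + \sqrt{5},
% so the efficiency can also be written as \kappa = \gamma^3.
```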
FIBONACCI CONSTANT
We note that the power γ of the growth process Eq. CC is precisely the Fibonacci “golden mean.” This number occurs naturally as the relative increase in an ideally breeding population p_{F}(t) from one generation to the next:
$$\frac{p_F(t+\Delta t)}{p_F(t)} = \gamma. \tag{GG}$$
(The original subject of Fibonacci’s investigations was a colony of rabbits.) By comparison, Eq. CC shows that the relative increase in cancer mass from one generation to the next obeys:
$$\frac{p(t+\Delta t)}{p(t)} = \left(1+\frac{\Delta t}{t}\right)^{\gamma} \approx 1 + \gamma\,\frac{\Delta t}{t} \tag{HH}$$
in first-order Taylor series, where Δt is the time between generations, Δt ≪ t. Thus, although Eqs. GG and HH are both linear in γ, the growth increment γΔt/t characterizing the cancer decreases with time, whereas the ideally breeding Fibonacci population maintains a constant relative increase with time. The prediction is therefore that a clinical cancer grows at a less than maximum rate. This agrees with empirical data, which show that: (a) in vitro cancer cells maintained in three-dimensional culture under ideal conditions grow exponentially (53), i.e., follow full Fibonacci growth, whereas (b) in situ cancer grows according to power-law growth, i.e., much slower than Fibonacci growth.
Because of the ideal growth effect in Eq. GG, a Fibonacci population grows in time according to a pure exponential law:
$$p_F(t) \propto \gamma^{t/\Delta t} = \exp\!\left(\frac{t\,\ln\gamma}{\Delta t}\right). \tag{II}$$
Use of this probability law in Eqs. U and V gives rise to a Fibonacci information level:
$$I_F = \left(\frac{\ln\gamma}{\Delta t}\right)^2 \approx \frac{0.23}{\Delta t^2}. \tag{JJ}$$
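The contrast between the two growth laws can be made concrete with a few lines of code. This is an illustrative sketch with hypothetical function names: it shows that the ratio of successive Fibonacci numbers settles at the golden mean, while the per-generation ratio of a power law t^γ shrinks toward 1 as t grows and is well approximated by its first-order Taylor form.

```python
import math

GAMMA = (1 + math.sqrt(5)) / 2  # golden mean


def fibonacci_ratios(n):
    """Ratios of successive Fibonacci numbers for n generations."""
    a, b = 1, 1
    ratios = []
    for _ in range(n):
        a, b = b, a + b
        ratios.append(b / a)
    return ratios


def power_law_ratio(t, dt, gamma=GAMMA):
    """Relative increase p(t + dt) / p(t) for p(t) proportional to t^gamma."""
    return (1 + dt / t) ** gamma


def first_order_ratio(t, dt, gamma=GAMMA):
    """First-order Taylor approximation of power_law_ratio, valid for dt << t."""
    return 1 + gamma * dt / t


print("Fibonacci ratio after 40 generations:", fibonacci_ratios(40)[-1])
for t in (10, 100, 1000):
    print(t, power_law_ratio(t, 1), first_order_ratio(t, 1))
```

The Fibonacci ratio stays constant from generation to generation, whereas the power-law ratio decays with t, which is the sense in which in situ growth is slower than ideal growth.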
By contrast, the cancer population obeyed the information Eq. AA with α = (1/4)(1 + √5). Comparing Eqs. AA and JJ shows that, because Δt ≪ T, necessarily I ≪ I_{F}. That is, in situ cancer transmits much less Fisher information about the time than does ideally growing tissue. The similarity of this prediction to the extensive data on telomere dysfunction in malignant populations is quite evident (54, 55, 56). Because the telomeres of malignant populations do not typically shorten normally with time, they have incorrect information regarding their “age” (56) and are less likely to demonstrate the cellular effects of senescence. Thus, the results of the analytic solutions are consistent with the critical roles of telomerase and telomere dysfunction in promoting the malignant phenotype (54, 55, 56).
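The size of the gap between the two information levels can be estimated numerically. The sketch below rests on two assumptions of ours: the power-law information takes the closed form 4α²(2α + 1)/[(2α − 1)T²], and the exponential law carries information (ln γ/Δt)², which follows from i = 4(dq/dt)² for a normalized exponential. The survey length of 100 generations is a hypothetical value chosen only for illustration.

```python
import math

GAMMA = (1 + math.sqrt(5)) / 2
ALPHA = GAMMA / 2  # power of the amplitude q(t) = E t^alpha


def cancer_information(T):
    """Fisher information of the power-law solution over a survey time T."""
    return 4 * ALPHA**2 * (2 * ALPHA + 1) / ((2 * ALPHA - 1) * T**2)


def fibonacci_information(dt):
    """Fisher information of exponential growth with generation time dt."""
    return (math.log(GAMMA) / dt) ** 2


T = 100.0  # hypothetical survey time, in units of the generation time
dt = 1.0   # generation time
ratio = cancer_information(T) / fibonacci_information(dt)
print(f"I * T^2 = {cancer_information(1.0):.4f}")
print(f"I_F * dt^2 = {fibonacci_information(1.0):.4f}")
print(f"I / I_F for T = 100 dt: {ratio:.5f}")
```

Because I scales as 1/T² and the exponential level as 1/Δt², the ratio falls off as (Δt/T)², so any realistic survey time makes the power-law information a small fraction of the exponential one.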
UNCERTAINTY IN ONSET TIME
A Fisher information level I has a direct bearing on estimated error. The information level (25, 26) gives rise to a prediction that the minimum root-mean-square error in the estimated onset time θ for breast cancer is 30% of the total time over which such cancers are observed. This is shown at Eq. A24 in Appendix A. The clinical consequence of this observation is that the size of the cancer at the time of first observation may not necessarily correlate well with the age of the cancer (i.e., the time between the development of the malignant phenotype and the first observation of the cancer). This may place a theoretical limit on the value of screening if the development of metastases is dependent on the time during which the cancer has been present rather than its size.
CONCLUSIONS
The work presented here, as with any mathematical model of a complex system such as carcinogenesis, is necessarily limited by a number of simplifying assumptions. Our goal is to develop a broad, conceptual framework of the intra- and extracellular information dynamics resulting in the malignant phenotype that both agrees with extant data and makes testable predictions. This is meant to be analogous to the role of theoretical analysis in the physical sciences.
By applying mathematical models from information theory, we demonstrate that phenotypic properties and growth patterns of cancer cells are emergent phenomena controlled by the nonlinear interactions of random genomic mutations and time-dependent somatic evolutionary forces.
Information degradation is unconstrained in segments of the genome that do not alter growth advantage and is most evident in the genes controlling differentiated phenotypes. This produces the dedifferentiation characteristically observed during carcinogenesis. Furthermore, if the mutation rate increases through some threshold value, an error catastrophe results [as classically described by Eigen and Schuster (24)], producing a differentiation phase transition with unpredictable, nonphysiological combinations of phenotypic traits and chaotic dysfunctional tissue, characteristics that are often observed in highly aggressive clinical cancers.
On the other hand, subsets of the genome that do confer selective growth advantage are shown to remain relatively stable because selection forces generated by somatic evolution constrain the apparent mutation rate in these segments. This allows a cancer cell to maintain an information minimum (see Eqs. AA and CC) compatible with life despite the underlying genomic instability.
Testable predictions from this model include: (a) accumulated mutations during carcinogenesis will always be fewer in genes controlling critical cellular functions than in those controlling differentiation function and morphology, and this divergence will increase with time; and (b) traits retained by most malignant phenotypes will be parameters that confer the greatest selective growth advantage and, therefore, optimal targets for therapeutic intervention.
Using EPI analysis, which models information-based phenomena, we derive cancer growth dynamics attributable to information degradation in an unstable genome that are quantitatively and qualitatively in good agreement with clinical and experimental observations. Furthermore, this approach demonstrates mathematically the critical role of degradation of cellular information regarding elapsed time in carcinogenesis. This is consistent with extensive literature regarding the biological importance of telomere dysfunction and prolonged cell survival in tumor growth. In addition, the prediction is made that the minimum root-mean-square error in the estimated onset time θ for breast cancer is 30% of the time during which the tumor is observed. This large degree of uncertainty could place a fundamental limit on the efficacy of cancer screening if the onset of metastases correlates with the age of the cancer rather than its size.
As with most nonlinear dynamic systems, the genetic pathways in carcinogenesis are complex and unpredictable. Thus, fixed, linear genetic sequences leading to cancer (17) will likely be uncommon in sporadic human cancers (exceptions may be congenital neoplastic syndromes such as retinoblastoma, neurofibromatosis, and familial polyposis syndrome). This is consistent with studies demonstrating heterogeneous genetic pathways in the pathogenesis of breast and renal cancers (57, 58). Finally, our model adds a cautionary note to a wide range of strategies in cancer therapy because the same mutational and selection forces that produced the initial malignant population remain available to foil these therapies through rapid evolution of resistant phenotypes.
APPENDIX A. DERIVATION OF EPI POWER LAW SOLUTION
General Solution.
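By the use of Eq. V, Eqs. S and T become, respectively:
$$\frac{d}{dt}\left(\frac{\partial (4\dot{q}^2 - j)}{\partial \dot{q}}\right) = \frac{\partial (4\dot{q}^2 - j)}{\partial q} \tag{A1}$$
and
$$4\dot{q}^2 - \kappa j = 0. \tag{A2}$$
Explicitly carrying through the differentiations in Eq. A1 gives:
$$8\ddot{q} = -\frac{\partial j}{\partial q}. \tag{A3}$$
Eqs. A2 and A3 must be simultaneously solved for j and q. Notice that each equation involves both j and q. The aim is to get one equation in one of the unknown functions q or j.
Differentiating d/dt Eq. A2 gives:
$$8\dot{q}\ddot{q} = \kappa\left(\frac{\partial j}{\partial q}\,\dot{q} + \frac{\partial j}{\partial t}\right). \tag{A4}$$
Eqs. A3 and A4 must be solved simultaneously.
Using Eq. A3 on the left side of Eq. A4 gives a requirement:
$$-\dot{q}\,\frac{\partial j}{\partial q} = \kappa\left(\frac{\partial j}{\partial q}\,\dot{q} + \frac{\partial j}{\partial t}\right). \tag{A5}$$
Taking the square root of Eq. A2 gives:
$$\dot{q} = +\frac{\sqrt{\kappa}}{2}\,\sqrt{j}, \tag{A6}$$
where the + sign was chosen because we are seeking a solution that grows with the time.
Substituting Eq. A6 into Eq. A5 gives directly:
$$-\frac{\sqrt{\kappa}}{2}\,\sqrt{j}\,\frac{\partial j}{\partial q} = \kappa\left(\frac{\sqrt{\kappa}}{2}\,\sqrt{j}\,\frac{\partial j}{\partial q} + \frac{\partial j}{\partial t}\right). \tag{A7}$$
After a rearrangement of terms and division by √κ, this becomes:
$$\frac{1+\kappa}{2}\,\sqrt{j}\,\frac{\partial j}{\partial q} + \sqrt{\kappa}\,\frac{\partial j}{\partial t} = 0. \tag{A8}$$
This accomplishes our immediate aim of obtaining one equation in the single unknown function j.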
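Let us seek a separable solution:
$$j[q,t] = j_q(q)\,j_t(t), \tag{A9}$$
where functions j_{q}(q) and j_{t}(t) are to be found. Substituting Eq. A9 into Eq. A8 gives a requirement:
$$\frac{1+\kappa}{2}\,j_q^{1/2}\,j_q'\,j_t^{3/2} + \sqrt{\kappa}\,j_q\,j_t' = 0, \tag{A10}$$
where the derivatives indicated by primes are with respect to the indicated subscripts (and arguments) q or t. Multiplying the equation by j_{q}^{−1}j_{t}^{−3/2} gives:
$$\frac{1+\kappa}{2}\,j_q^{-1/2}\,j_q' + \sqrt{\kappa}\,j_t^{-3/2}\,j_t' = 0. \tag{A11}$$
The first term on the left side is only a function of q and the second is only a function of t. The only way they can add up to zero (the right side) for all q and t is for the first to equal a constant and the second to equal the negative of that constant. Thus:
$$\frac{1+\kappa}{2}\,j_q^{-1/2}\,j_q' = C, \qquad \sqrt{\kappa}\,j_t^{-3/2}\,j_t' = -C. \tag{A12}$$
The first equation Eq. A12 can be placed in the integrable form:
$$j_q^{-1/2}\,dj_q = \frac{2C}{1+\kappa}\,dq. \tag{A13}$$
Integrating and then squaring both sides gives:
$$j_q = C^2\left(\frac{q}{1+\kappa} + B\right)^2. \tag{A14}$$
The second Eq. A12 can be placed in the integrable form:
$$j_t^{-3/2}\,dj_t = -\frac{C}{\sqrt{\kappa}}\,dt. \tag{A15}$$
Integrating this, then solving algebraically for j_{t}, gives:
$$j_t = \frac{4\kappa}{C^2\,(t+D)^2}. \tag{A16}$$
Using solutions Eqs. A14 and A16 in Eq. A9 gives:
$$j[q,t] = \frac{4\kappa}{(t+D)^2}\left(\frac{q}{1+\kappa} + B\right)^2. \tag{A17}$$
Having found j, we can now use it to find q. Using Eq. A17 in Eq. A6 gives directly:
$$\dot{q} = \frac{\kappa}{t+D}\left(\frac{q}{1+\kappa} + B\right). \tag{A18}$$
This is equivalent to the differential form:
$$\frac{dq}{q/(1+\kappa) + B} = \frac{\kappa\,dt}{t+D}. \tag{A19}$$
This may be readily integrated, giving logarithms of functions on each side. After taking antilogarithms the solution is:
$$q(t) = (1+\kappa)\left[A\,(t+D)^{\kappa/(1+\kappa)} - B\right]. \tag{A20}$$
This is a general power-law solution in the time t. Two of the constants A, B, D may be fixed from boundary-value conditions, as follows.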
Boundary-Value Conditions.
At t = 0 the cell is presumed to not be cancerous. Cancer has its onset at t = ε, ε > 0, a small increment beyond 0. This has two boundary-value effects: (a) At t = 0, the information j about cancer in the given tissue is zero, j(0) = 0, and (b) At t = 0 the cancer has zero mass, q(0) = 0.
Using these conditions in Eq. A17 gives the requirement B = 0. Using the latter and condition (b) in Eq. A20 gives either D = 0 or κ = −1. But the latter is ruled out a priori by EPI, because κ ≥ 0 represents an efficiency measure of information transmission. Hence, D = 0. With this choice of constants, and with constant factors absorbed into A, Eq. A20 becomes:
$$q(t) = A\,t^{\kappa/(1+\kappa)}. \tag{A21}$$
This is the power law we sought. Its square, the probability law p(t), is therefore also a power law. Eq. A21 is the result of a formulation that assumes continuous growth with time, whereas actual growth occurs at discrete generations. The discrete effects are more important at small values, so that Eq. A21 becomes inaccurate near t = 0. Equivalently, the constant time offset D in Eq. A20 is some unknown, small constant.
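The power-law solution can be checked against the two conditions it is meant to satisfy. The following sketch uses finite differences and rests on assumptions of ours: the data information density is i = 4(dq/dt)², and with B = D = 0 the source information density is j = 4κq²/[(1 + κ)²t²]. It evaluates the residuals of the zero-condition 4(dq/dt)² − κj = 0 and of the Euler-Lagrange equation 8 d²q/dt² + ∂j/∂q = 0. Function names, the step size, and the test points are hypothetical.

```python
import math


def q_solution(t, kappa, A=1.0):
    """Power-law amplitude q(t) = A t^(kappa / (1 + kappa))."""
    return A * t ** (kappa / (1 + kappa))


def j_density(q, t, kappa):
    """Assumed source information density with B = D = 0."""
    return 4 * kappa * q**2 / ((1 + kappa) ** 2 * t**2)


def residuals(kappa, t, h=1e-4):
    """Finite-difference residuals of the zero-condition and Euler-Lagrange equation."""
    q0 = q_solution(t, kappa)
    q_plus = q_solution(t + h, kappa)
    q_minus = q_solution(t - h, kappa)
    q_dot = (q_plus - q_minus) / (2 * h)
    q_ddot = (q_plus - 2 * q0 + q_minus) / h**2
    dj_dq = (j_density(q0 + h, t, kappa) - j_density(q0 - h, t, kappa)) / (2 * h)
    zero_condition = 4 * q_dot**2 - kappa * j_density(q0, t, kappa)
    euler_lagrange = 8 * q_ddot + dj_dq
    return zero_condition, euler_lagrange


for kappa in (0.5, 1.0, 2 + math.sqrt(5)):
    for t in (1.0, 2.0, 5.0):
        zc, el = residuals(kappa, t)
        print(f"kappa = {kappa:.3f}, t = {t}: zero-condition {zc:.2e}, Euler-Lagrange {el:.2e}")
```

Both residuals should vanish up to discretization error for any positive κ, which indicates that the power law is a self-consistent solution and not an artifact of one particular efficiency value.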
Error in the Estimated Duration of the Cancer.
The time t is the time since the unknown onset time θ of the cancer condition. However, the time of observation of the tumor is not t but rather τ, where τ = θ + t. The question is, how well can θ be estimated based upon knowledge of τ? This amounts to determining the position of the origin of the growth curve p(t) based upon one observed event at the randomly displaced time value θ + t.
The Cramér-Rao inequality (12) states that any estimate of θ that is correct on average (unbiased) suffers a root-mean-square error e that can be no smaller than:
$$e_{\min} = \frac{1}{\sqrt{I}}. \tag{A22}$$
In our case I has a value given by Eqs. AA and BB:
$$I = \frac{11 + 5\sqrt{5}}{2\,T^2} \approx \frac{11.09}{T^2}. \tag{A23}$$
Using this in Eq. A22 gives a minimum rms error of size:
$$e_{\min} \approx 0.30\,T. \tag{A24}$$
As a relative error, this is quite large, amounting to 30% of the total time interval T over which cancer events are detected. Thus, cancer has an innate ability to mask its onset time. This contributes to its prevalence, and hence might be considered an evolutionary tactic.
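The 30% figure follows from a one-line calculation, sketched below. It assumes the closed-form Fisher information 4α²(2α + 1)/[(2α − 1)T²] with α = (1 + √5)/4 and applies the Cramér-Rao bound e ≥ 1/√I; the function name and the example survey time are hypothetical.

```python
import math


def min_rms_onset_error(T):
    """Cramér-Rao lower bound on the rms error of the estimated onset time."""
    alpha = (1 + math.sqrt(5)) / 4
    fisher = 4 * alpha**2 * (2 * alpha + 1) / ((2 * alpha - 1) * T**2)
    return 1 / math.sqrt(fisher)


relative_error = min_rms_onset_error(1.0)  # bound as a fraction of T
print(f"minimum rms error = {relative_error:.3f} T")
print(f"for a hypothetical 10-year survey: {min_rms_onset_error(10.0):.2f} years")
```

Because the bound scales linearly with T, the relative error is the same for any survey length.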
Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1 To whom requests for reprints should be addressed, at Department of Radiology, University Medical Center, 1501 North Campbell Avenue, Tucson, AZ 85724. Phone: (520) 626-5725; Fax: (520) 626-9981; Email: rgatenby{at}radiology.arizona.edu

2 The abbreviation used is: EPI, extreme physical information.
 Received September 26, 2001.
 Accepted April 23, 2001.
 ©2002 American Association for Cancer Research.