Cell, Tumor and Stem Cell Biology |
1 Cellular and Molecular Research, National Cancer Centre of Singapore; 2 Genome Institute of Singapore, Singapore; Departments of 3 Pathology and 4 Surgery, The University of Hong Kong, Queen Mary Hospital, Pokfulam, Hong Kong, Republic of China; 5 Genome Science Division, Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan; 6 Department of Surgery, Stanford University School of Medicine, Stanford; 7 Department of Biopharmaceutical Sciences, University of California, San Francisco, California; and 8 Peter MacCallum Cancer Centre, East Melbourne Victoria, Australia
Requests for reprints: Patrick Tan, Cellular and Molecular Research, National Cancer Centre of Singapore, 11 Hospital Drive, Singapore 169610. Fax: 65-6-226-5694; E-mail: cmrtan{at}nccs.com.sg.
| Abstract |
|---|
|
|
|---|
| Introduction |
|---|
|
|
|---|
In this report, we present a systematic and comprehensive meta-analysis of the gene coexpression networks in gastric cancer, the second leading cause of global cancer mortality after lung cancer (10). To assemble a collection of clinical samples with sufficient breadth and diversity to characterize the gene expression interactions associated with this complex disease, we formed an international Gastric Cancer Consortium whose founding members, from Australia, Hong Kong, Japan, and Singapore, had previously reported analyses of single-center gastric cancer data sets (11-14). We pooled the data from these independent studies to create a gene expression database of gastric cancer, comprising >300 human samples profiled at various histologic stages of gastric tumorigenesis, ranging from normal gastric tissue, chronic gastritis, and intestinal metaplasia to overt carcinoma. Employing rank-based statistics, we identified gene coexpression interactions conserved across the four centers and used these interactions to construct a consensus gene coexpression meta-network of gastric cancer (the "gastrome"). The resulting meta-network supports a scale-free hierarchical architecture, containing several deeply embedded functional modules associated with distinct biological functions. Although some of these modules were previously known, other modules were previously unreported, demonstrating the ability of the meta-approach to reveal novel biological information. The modules were diverse in their overall connectivity: some modules (e.g., cellular proliferation) were highly integrated with other modules, whereas others (e.g., ribosomal biosynthesis) were isolated and autonomous. Strikingly, a module associated with intestinal differentiation exhibited a remarkably high degree of autonomy, raising the possibility that the specific topological features of this module may functionally contribute towards the frequent appearance of intestinal metaplasia in gastric cancer.
Finally, the gastrome analysis identified biological relationships that were less apparent from the single data sets. Specifically, we found that PLA2G2A, previously identified as a potential prognostic marker for gastric cancer (15), exhibited a conserved coexpression relationship with the EphB2 receptor, and we subsequently validated this association using tissue microarrays. Motivated by previous reports that EphB2 is a target of the Wnt signaling pathway, we obtained further experimental evidence that the Wnt pathway may also regulate PLA2G2A. Taken collectively, our findings show that meta-analytic approaches can successfully identify novel systems-level features as well as subtle but potentially important biological relationships in tumor biology. Such strategies may thus prove useful and applicable to many other disease types, for the purposes of both biological validation and discovery.
| Materials and Methods |
|---|
|
|
|---|
Identification of conserved coexpression interactions. For each data set, we calculated Pearson correlation coefficients between every gene-gene pair and, for each gene, ranked all other genes (from 1 to 2,251) by their correlation coefficient to that gene, rank 1 being the least correlated and rank 2,251 the most correlated. Notably, although the absolute correlation value of (A → B) = correlation (B → A), the rank of gene A with respect to gene B may not be the same as the rank of gene B with respect to gene A (see Supplementary Methods SM1 for an example). To identify gene pairs exhibiting consistently high ranks across multiple data sets, we calculated the joint probability of observing a particular sequence of ranks: briefly, for a random sample X of size "n" drawn from a uniform distribution [1...N] (n = 4 and N = 2,252 in this case), if the "n" numbers are sorted such that X(1) and X(n) are the smallest and largest numbers, respectively, then X(k) follows a β distribution with parameters (k, n − k + 1), and p(X(1), X(2), X(3), X(4)), the joint probability "p" of observing a sequence of ranks X(1), X(2), X(3), X(4), can be calculated (Supplementary Methods SM2). Because the X(k)s correspond to an observed sequence of ranks for a particular gene pair (A, B) across the four data sets, we can define the null hypothesis H0 that for a gene pair (A, B) the sequence of X(k)s or ranks is random, and the alternative hypothesis H1 that the observed ranks across the four data sets are nonrandom. Using a metric termed the "log-likelihood ratio" (LLR) = log10 [p(H0)/p(H1)], H0 is rejected iff LLR ≥ LLRcrit, where LLRcrit is user-defined. As the LLR is based on ranks, LLR (A → B) ≠ LLR (B → A); hence we define LLR (A ↔ B) = max [LLR (A → B), LLR (B → A)], and two genes A and B are called "coexpressed" iff LLR (A ↔ B) ≥ LLRcrit. Only 0.94% and 0.27% of possible interactions show an LLR difference >1.5 and >2, respectively, between LLR (A → B) and LLR (B → A) (A. Aggarwal, data not shown). The LLR score is also blind to the sign of the correlation; thus, this technique does not segregate gene pairs on the basis of being positively or negatively coexpressed. In addition to the LLR, a false discovery rate (FDR) was used to reflect the fraction of false-positives present in hypotheses deemed significant above a given LLRcrit. To estimate the FDR, we generated 50 randomly permuted metadata sets in which the rank order of the genes in the single-center data sets was shuffled, and calculated the number of links found significant at a given LLR. This randomization process was then repeated 50 independent times and the results averaged. For a given
.
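The per-data-set ranking step, and the asymmetry of ranks it produces, can be illustrated with a minimal Python sketch. The gene names and expression values below are hypothetical toy data, the function names `pearson` and `rank_partners` are illustrative, and ranking by the signed coefficient (rather than its absolute value) is an assumption drawn from the description above; the LLR itself is defined in the Supplementary Methods and is not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_partners(expr):
    """For each gene g, rank all other genes by their correlation to g.

    expr maps gene name -> expression profile (one data set).
    Rank 1 is the least correlated partner, as in the text.
    Assumption: ranks use the signed coefficient.
    """
    ranks = {}
    for g, profile in expr.items():
        others = [h for h in expr if h != g]
        others.sort(key=lambda h: pearson(profile, expr[h]))
        ranks[g] = {h: r for r, h in enumerate(others, start=1)}
    return ranks

# Hypothetical toy data set with three genes and five samples.
toy = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 1, 4, 3, 5],
    "C": [1, 2, 3, 5, 4],
}
ranks = rank_partners(toy)

# The correlation is symmetric, but the ranks are not:
# B is A's least correlated partner, while A is B's most correlated one.
print(ranks["A"]["B"], ranks["B"]["A"])
```

In this toy case the correlation between A and B is the same in both directions, yet B ranks lower for A than A does for B, which is why a symmetric score has to be defined from the two directional scores.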
Clustering coefficient. The clustering coefficient (16) $C_i$ of a node "i" with $k_i$ neighbors is $C_i = 2C / [k_i (k_i - 1)]$, where C is the number of connections between the $k_i$ neighbors of node "i." The $C_i$ provides a quantification of the extent to which the coexpression neighbors of a gene are also connected to each other. The clustering coefficient for the network is given by $C_N = \frac{1}{N} \sum_i C_i$, where N here is the number of nodes in the network with more than one coexpression partner. For comparative analysis of the clustering coefficient, we simulated "equivalent" scale-free networks using a previously described preferential attachment model (ref. 17; Supplementary Methods SM3).
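As a worked illustration of the two definitions above, the following Python sketch computes the per-node clustering coefficient and its network average over nodes with more than one partner. The adjacency structure and the function names are hypothetical; this is a sketch of the stated formulas, not the authors' MatLab implementation.

```python
def clustering_coefficients(adj):
    """Per-node clustering coefficient for an undirected network.

    adj maps node -> set of neighboring nodes.
    Nodes with fewer than two neighbors are skipped, since the
    coefficient is undefined for them.
    """
    coeffs = {}
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue
        nb = list(neighbors)
        # Count connections among the k neighbors of this node.
        links = sum(
            1
            for idx, a in enumerate(nb)
            for b in nb[idx + 1:]
            if b in adj[a]
        )
        coeffs[node] = 2 * links / (k * (k - 1))
    return coeffs

def network_clustering(adj):
    """Average coefficient over nodes with more than one partner."""
    coeffs = clustering_coefficients(adj)
    return sum(coeffs.values()) / len(coeffs) if coeffs else 0.0

# Hypothetical toy network: a triangle A-B-C plus a pendant node D on A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(clustering_coefficients(adj))  # A: 1/3, B: 1.0, C: 1.0
print(network_clustering(adj))       # (1/3 + 1 + 1) / 3 = 7/9
```

Node D has a single partner and so does not enter the average, matching the restriction to nodes with more than one coexpression partner.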
Assembly of expression communities and functional modules. To generate expression communities, we identified all coexpressed gene pairs of LLR ≥ LLRcrit and connected these genes by "chain-linking": assume gene A's maximal link occurs with gene B, and gene B's maximal link is to gene C (and not necessarily gene A). Gene A is then connected to gene B, and gene B to gene C. This process continues, forming a gene chain, until a terminator pair is encountered which is reciprocally maximally linked (i.e., gene A → gene B → ... → gene I ↔ gene J). Chains of length >3 that end in the same chain terminator pair are pooled to form a "scaffold." The coexpression partners of the genes in the "scaffold" are subsequently aggregated to form an expression community. Communities of size > Smin, where S refers to the number of genes, are considered for further analysis. To assess the similarity of any two communities, we calculated the probability of overlap: two communities of gene sizes S and T are chosen out of N (2,252 in this case), with the probability of the two groups having M genes in common being $P(M) = C(S, M)\,C(N - S, T - M) / C(N, T)$, where C(X,Y) is the number of ways sets of size Y can be selected from a set of size X (X ≥ Y). To assemble functional modules from the communities, we ranked the communities in order of increasing size. Starting with the smallest community, its highest overlapping community was identified, and if 50% or more of the members of the former were also present in the larger community, then the smaller community was merged with the larger. The above steps were repeated until all the expression communities were partitioned into distinct functional modules. To confirm their modularity, we compared the numbers of internal to external expression links of each module to those of an equivalent number of randomly selected genes drawn from the original 588 gene set (see Results). Across 2,000 runs, the modularity of the former always exceeded that of the random data, thus assigning an empirical P < 0.0005. We note that in constructing the final network, all samples (nonmalignant and cancerous) were examined irrespective of phenotype. However, subnetworks that are present in both nonmalignant and malignant tissues (e.g., the ribosomal network) do not seem to exhibit significantly higher statistical associations compared with subnetworks present in either cancer or nonmalignant tissues only (Supplementary Methods SM4).
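The chain-linking step and the overlap probability described above can be sketched in Python as follows. The `best_partner` mapping, the gene names, and the function names are hypothetical, and the overlap probability is written as the hypergeometric form suggested by the C(X,Y) notation, which is an assumption about the exact expression used.

```python
from math import comb

def build_chain(start, best_partner):
    """Follow maximal links from `start` until a reciprocal pair is met.

    best_partner maps each gene to the gene with which it shares its
    highest-scoring link. Returns (chain, terminator_pair).
    """
    chain = [start]
    while True:
        nxt = best_partner[chain[-1]]
        if len(chain) >= 2 and nxt == chain[-2]:
            # The last two genes are each other's maximal partner.
            return chain, frozenset(chain[-2:])
        if nxt in chain:
            # Only possible when link strengths are tied.
            raise ValueError("cycle without a reciprocal pair")
        chain.append(nxt)

def overlap_probability(n_genes, s, t, m):
    """Probability that random sets of sizes s and t share exactly m genes.

    Assumption: hypergeometric form, with m <= min(s, t).
    """
    return comb(s, m) * comb(n_genes - s, t - m) / comb(n_genes, t)

# Hypothetical maximal-link table: C and D are reciprocally linked.
best = {"A": "B", "B": "C", "C": "D", "D": "C", "E": "C"}
chain_a, term_a = build_chain("A", best)
chain_e, term_e = build_chain("E", best)
print(chain_a)           # ['A', 'B', 'C', 'D']
print(chain_e)           # ['E', 'C', 'D']
print(term_a == term_e)  # True: both chains would join the same scaffold

# Chance that communities of 4 and 3 genes, drawn from 10, share 2 genes.
print(overlap_probability(10, 4, 3, 2))  # 36/120 = 0.3
```

Chains that share a terminator pair would then be pooled into one scaffold, and a small overlap probability for two communities indicates that their shared membership is unlikely to arise by chance.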
Hierarchical clustering and other software sources. Average linkage hierarchical clustering was done using an uncentered correlation similarity metric. Median centering by genes was done prior to clustering each data set. Cluster and Treeview (http://rana.lbl.gov/EisenSoftware.htm) software were used for clustering and generating expression heatmaps. All other methodologies were implemented in MatLab (http://www.mathworks.com) and the network plots visualized using Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/).
Immunohistochemistry. A gastric cancer tissue microarray of 343 gastric tumors was prepared using standard methods (ref. 18; see Supplementary Methods SM5 for clinical data associated with these arrays). Villin1 (VIL1) expression was assessed using antibody ID2C3 (Immunotech, France) at a 1:200 dilution and streptavidin-biotin peroxidase after heat-mediated antigen retrieval. EphrinB2 (EphB2) expression was assessed using goat anti-EphB2 (R&D Systems, Minneapolis, MN) at a 1:100 dilution and the DAKO (Carpinteria, CA) EnVision+ System Peroxidase (3,3'-diaminobenzidine) after heat-mediated antigen retrieval. Phospholipase AII group IIA (PLA2G2A) expression was assessed by nonradioactive in situ hybridization (19). Primers for generating the PCR-based riboprobe templates (minus T7 promoter sequences) were: PLA2G2A (forward) TTCTACGGCTGCCACTGTGG; (reverse) GGAGGAGAGCAGTAGAAGGC. A β-actin riboprobe (Roche) was used to ensure RNA integrity. For cell line experiments, a mouse monoclonal anti-β-catenin antibody (Transduction Laboratory, Lexington, KY) was used at a 1:200 dilution. Gastric cell lines were cultured to 80% confluence, washed in PBS, and fixed in 4% paraformaldehyde. Paraffin cell blocks were prepared. Immunohistochemical staining was done using the DAKO EnVision+ system, peroxidase (3,3'-diaminobenzidine) after heat-mediated antigen retrieval.
| Results |
|---|
|
|
|---|
5 × 10^6 (2,252 × 2,252) pairwise ranks, and ultimately associating every gene pair (A, B) with a characteristic sequence of four ranks (one from each data set). For example, the rank order of proliferating cell nuclear antigen (PCNA) with respect to TOP2A is (2,224, 2,230, 2,242, and 2,244), indicating that PCNA is the 28th, 22nd, 10th, and 8th most positively correlated gene to TOP2A across the four sets. To identify pairs of genes whose expression was significantly correlated and conserved, we applied a rank-based statistic to calculate the LLR for each gene pair, in which a large LLR indicates a deviation from the null hypothesis (i.e., that the sequence of ranks associated with the gene pair is unlikely to be random). "Significant" gene pairs (henceforth termed "expression links") were defined as those whose LLR exceeded a user-defined cutoff (LLRcrit). To reflect the balance between sensitivity and specificity, we computed FDRs to estimate the proportion of false-positive expression links associated with any particular LLR value; this was achieved by generating a series of randomly permuted metadata sets in which the gene names within each individual data set were internally shuffled, and determining the number of links found to be "significant" within the randomized data (see Materials and Methods).
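The permutation-based FDR estimate can be sketched in Python under stated assumptions. The estimator below (mean number of links called significant in permuted data, divided by the number called significant in the real data) is a common convention and is assumed here, since the exact expression is not given in the text; the function names, the toy expression table, and the permuted link counts are all hypothetical.

```python
import random

def shuffle_gene_labels(expr, rng):
    """Return a copy of one data set with its gene names shuffled.

    expr maps gene name -> expression profile. Profiles are kept
    intact; only their assignment to gene names is randomized.
    """
    genes = list(expr)
    shuffled = genes[:]
    rng.shuffle(shuffled)
    return {new: expr[old] for old, new in zip(genes, shuffled)}

def estimate_fdr(n_real_links, permuted_link_counts):
    """Assumed FDR estimator: mean permuted count / real count."""
    if n_real_links == 0:
        return 0.0
    mean_false = sum(permuted_link_counts) / len(permuted_link_counts)
    return min(1.0, mean_false / n_real_links)

# Hypothetical toy data set with shuffled labels.
expr = {"g1": [1, 2, 3], "g2": [3, 1, 2], "g3": [2, 2, 5]}
permuted = shuffle_gene_labels(expr, random.Random(0))

# Hypothetical counts: 925 links in the real data, and 14, 15, and 16
# links passing the same cutoff in three permuted metadata sets.
fdr = estimate_fdr(925, [14, 15, 16])
print(round(fdr, 4))  # 15 / 925, about 0.0162
```

In a full analysis the permuted counts would come from rerunning the rank-based scoring on each shuffled metadata set at the same LLR cutoff.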
12.3 k and 925 links were found to be significant, with associated FDRs of ~13% and ~1.6%, respectively. This result indicates that a substantial number of expression links are indeed conserved across the different data sets and are likely to represent bona fide biological interactions. Second, we found that increasing the LLR cutoff caused the number of significant expression links to decrease at a greater rate than the number of individual genes (Fig. 1B, right), indicating that increases in specificity can apparently be obtained without incurring a similarly dramatic penalty in the absolute number of genes participating in the network. For example, increasing the LLR stringency from 4 to 8 caused a 36-fold drop in the number of expression links (~33.5 k to 925) but only a 3.7-fold reduction in the number of genes (~2.2 k to 588). This result is consistent with the hypothesis that the majority of genes are associated with relatively few highly specific expression links, a concept that is further developed later in the report. Third, to test if the significant expression links were robust or sensitive to the presence of particular sample types, we generated 10 randomly truncated metadata sets, in each of which 50% of the original samples were removed, and compared the significant links in each truncated set to those found in the original metadata set. We found that at LLR ≥ 5, ~57% of the significant links in the truncated data sets were conserved with the complete data set, and that the conservation increased to ~64% at LLR ≥ 8 (corresponding FDRs are ~21.3% and ~3.2%). This result suggests that the sample population used in this study is reasonably broad and diverse with respect to gastric cancer physiology, as the majority of the significant expression links are fairly robust to the presence of any particular sample type. We also checked the dependence of the expression links on the specific presence of nonmalignant tissues by removing the 70 nonmalignant samples (including normal, chronic gastritis, and intestinal metaplasia) and repeating the entire analysis. Despite the removal of nonmalignant tissues, a similar result was obtained in which ~56% of the expression links were conserved between the "malignant-only" and the complete data set at LLR ≥ 5, increasing to ~61% at LLR ≥ 8 (corresponding FDRs ~13% and ~1.5%), once again indicating their robustness. These results show that rank-order statistical approaches can be successfully used to identify conserved and robust gene expression interactions from multiple disparate data sets, despite the latter being derived from distinct patient populations and array technologies. Having identified these interactions, we assembled them into a gene coexpression meta-network of gastric cancer, termed the "gastrome." We now analyze the gastrome in terms of its general topology, functional modules, and constituent genes.
A topological analysis of the gastrome reveals a hierarchical scale-free architecture with embedded modularity. Previous reports analyzing the topologies of various biological networks in simpler model organisms such as bacteria and yeast have suggested a common "scale-free" character. In a scale-free network, most of the connections are confined to a few major nodes (hubs), which link the other, lesser-connected nodes in the network. In contrast, in random or Gaussian networks, the connections are spread in a statistically homogeneous manner across the nodes, such that most nodes have similar numbers of links (see refs. 16, 20). To determine the overall network architecture of the gastrome, we computed the probability P(k) of a gene having "k" expression links, and found that P(k) was inversely related to k following a power law, a hallmark of scale-free networks (Fig. 2A). We confirmed this finding across a range of LLRs, indicating that an overall scale-free topology is a fairly robust feature of the gastrome, similar to that reported for other gene coexpression, protein interaction, and metabolic networks. However, despite this overt similarity in scale-free character, we report here the possibility that there might exist formal differences between the different network types. Specifically, whereas γ, the slope of the power law distribution, typically behaves as γ ∈ (2, 3) for metabolic (20) or protein interaction networks (21), γ seems to be consistently <2 for a number of gene coexpression networks [γ = 1.5-1.8, 1.5, and 1.1-1.8 for this study, ref. (22) and ref. (23), respectively]. In general, a γ exponent in the range γ ∈ (1, 2) will result in a network not possessing a characteristic mean and variance, along with being dominated by nodes with large degrees (24). An important consequence of this is that because the overall number of nodes is finite, nodes of large degrees will be preferentially connected to one another rather than to nodes of lesser degrees. To test this, we analyzed the cohort of highly connected genes within the gastrome, and found that genes with large numbers of expression links were indeed highly biased in connecting to other "highly connected" genes rather than genes with few connections (P < 10^-5), and that this phenomenon was not an automatic property of a generic scale-free system (Supplementary Information S3). This result suggests that gene coexpression networks, despite being scale-free, may nevertheless possess topological properties distinct from biological networks from other cellular levels. This peculiar network design may facilitate the cell's ability to control the activity of multiple transcriptional programs in a coherent fashion. However, this is clearly speculative and more work is required to investigate the functional properties conferred by this particular system's organization.
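As a rough illustration of how a power-law exponent can be read off a degree distribution, the sketch below fits a straight line to log P(k) versus log k by ordinary least squares. This is a deliberately simple estimator chosen for illustration and is not necessarily the fitting procedure used in the study; the function names and the synthetic distribution are hypothetical.

```python
from collections import Counter
from math import log

def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes having exactly k links (k > 0)."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in counts.items() if k > 0}

def fit_power_law_exponent(pk):
    """Estimate the exponent in P(k) ~ k^(-exponent) by a log-log fit.

    pk maps degree k -> probability and needs at least two distinct k.
    """
    xs = [log(k) for k in pk]
    ys = [log(p) for p in pk.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    return -slope

# Synthetic, noise-free distribution with exponent 1.5 (unnormalized;
# a constant factor shifts the intercept only, not the slope).
synthetic = {k: k ** -1.5 for k in range(1, 50)}
print(fit_power_law_exponent(synthetic))  # approximately 1.5
```

On real, noisy degree counts a least-squares fit of this kind can be biased by the sparse high-degree tail, so binned or likelihood-based estimates are often preferred.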
$C_N$, a metric of global network modularity, for the gastrome at four different LLR cutoffs, and compared these coefficients to those found in either simulated pure scale-free or random Gaussian networks, the latter possessing a nonscale-free topology. At all LLR ranges, the clustering coefficient of the gastrome was substantially higher than that of either the pure scale-free or the Gaussian network, indicating that the gastrome is indeed highly modular (Table 1). Notably, the modularity of the gastrome, as reflected by $C_N$, became more evident when the LLR cutoff was increased; this is consistent with a hierarchical model of assembly in which core modules are clearly discernible at a high LLR stringency, and relaxing this stringency subsequently "draws in" expression links combining these modules, resulting in less cohesive structures (25). Previous work has shown that C(k) is essentially independent of k for purely scale-free or modular architectures, but for hierarchical networks C(k) reduces as 1/k (16). To test the gastrome for such a hierarchical assembly, we computed the dependence of C(k) on k at different LLRs. A typical case is shown in Fig. 2B, in which C(k) decreases with increasing k, thereby indicating the presence of hierarchy. Taken collectively, our topological analysis suggests that the gastrome is organized along a hierarchical scale-free architecture, with a deeply embedded substructure of distinct subnetworks and modules.
1.6%), and are thus highly specific. In "chain-linking," the expression links, comprising 925 expression links across 588 unique genes, were combined into gene-gene maximal neighbor "chains," and subsequently pooled into "communities" with embedded tree-like structures (Fig. 3A) without restricting genes to be exclusively present in any single assembly. We then agglomerated those communities exhibiting high overlap into larger units. This latter property distinguishes the groupings defined by this approach from the conventional "expression clusters" generated by hierarchical clustering algorithms. We constructed 298 gene-gene chains each of length ≥3 genes, pooled them into 31 separate communities, and combined the communities exhibiting high overlap (>50%) into 13 distinct units (Fig. 3B). Using random permutation assays, we also confirmed that each of these units was more internally than externally connected (P < 0.0005 for all units), supporting their inherent modularity, and that the integrity of each unit was maintained even after variation of the LLR cutoff (Supplementary Information S4). We henceforth refer to these units as functional modules.
Functional modules have highly distinct subtopologies consistent with their different biological functions. We then compared the individual connectivities of the different modules. In principle, the topologies of modules could belong to one of two contrasting scenarios. For example, a module might connect to other modules via several genes and expression links; in this case, parts of the former module would be integrated with and potentially regulatable by other modules. Alternatively, a module could be isolated, meaning that other modules would share very few or none of the former module's constituent genes and expression links; in this case, the former's activity might be fairly autonomous and independent from other modules in the global network. To quantify the relative integration or isolation of the different modules, we defined a simple metric termed the "isolation index," a ratio of external (intermodule) to internal (intramodule) links, and computed this index for all modules (Table 2). We observed a considerable range in the isolation indexes of the different modules: for example, some modules like the ribosomal module were highly isolated (isolation index of 0), whereas other modules in comparison were relatively integrated (module 9, index 2.1). To ensure that these findings were not simply due to the spurious presence of a few key "outlier" genes, we generated permuted data sets in which 10% to 20% of the genes within each module were randomly removed and repeated the analysis, comparing the module rankings obtained from the modified data to the original complete data set.
Although there were slight variations in the rankings of adjacent modules (i.e., modules with highly similar indexes might sometimes switch places), the overall finding that certain modules seem to be integrated whereas others are highly isolated was robust, supporting the idea that different functional modules can exhibit highly distinct topological structures (Supplementary Information S7). There is biological consistency to this finding, as shown in Fig. 3C, in which the interrelatedness of the modules is depicted. For example, modules associated with cellular proliferation were negatively linked and integrated with modules associated with cellular adhesion and extracellular matrix production, which may reflect the requirement of a rapidly proliferating tumor to dissociate the extracellular matrix potentially inhibiting its growth. In contrast, the externally isolated but highly internally connected nature of the ribosomal module may provide a mechanism for tightly balancing the relative transcript levels of each ribosomal subunit, thereby ensuring that their protein products are produced in the appropriate stoichiometric balance for correct physical association and formation of a functional ribosomal complex. As we discuss later, such examples of "modular insularity" may emerge as a general design principle used by cells to (a) "shield" particular functional modules from excessive external regulation, and (b) ensure that genes within that specific module will act as a concerted unit.
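The isolation index defined above, a ratio of external to internal links, can be sketched in a few lines of Python. The module memberships and links below are hypothetical, the function name is illustrative, and counting every link with exactly one endpoint inside the module as "external" is an assumption about how intermodule links were tallied.

```python
def isolation_index(module_genes, links):
    """Ratio of external to internal expression links for one module.

    links is an iterable of (gene_a, gene_b) pairs. A link is internal
    when both genes lie in the module, and external when exactly one
    does (assumption: all such links count as intermodule links).
    """
    module = set(module_genes)
    internal = external = 0
    for a, b in links:
        inside = (a in module) + (b in module)
        if inside == 2:
            internal += 1
        elif inside == 1:
            external += 1
    if internal == 0:
        raise ValueError("module has no internal links")
    return external / internal

# Hypothetical network with a fully insular module and an integrated one.
links = [
    ("R1", "R2"), ("R2", "R3"), ("R1", "R3"),   # insular triangle
    ("P1", "P2"), ("P1", "X1"), ("P2", "X2"),   # module with outside links
    ("X1", "X2"),
]
print(isolation_index({"R1", "R2", "R3"}, links))  # 0.0
print(isolation_index({"P1", "P2"}, links))        # 2.0
```

With this convention an index of 0 corresponds to a module with no links leaving it, and larger values correspond to modules that are more integrated with the rest of the network.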
A gene neighborhood analysis of the gastrome reveals novel interactions between phospholipase PLA2G2A and the EphB2 receptor. To gain insights into the component pathways affected in gastric cancer, we then performed an analysis of "gene neighborhoods," referring to the set of closest coexpression partners associated with any particular gene. As an example of such "neighborhood analysis," junction plakoglobin/γ-catenin (JUP), which resides on chromosome 17q12-q21, possesses a coexpression neighborhood of several other 17q12-q21 genes within the gastrome (TOP2A, GRB7, TRAP100, and PSMB3), suggesting that a 17q12-q21 coexpression amplicon is likely to be present in a certain proportion of tumors from all four patient populations. In this report, we focused on phospholipase II group 2A (PLA2G2A), a secreted phospholipase whose expression was recently found to be of prognostic significance in gastric cancer (15), but for which little is known concerning its actual molecular function in gastric tumors. Using neighborhood analysis, we found that the coexpression neighborhood of PLA2G2A in the gastrome contained the cell surface receptor EphB2, and β-catenin, a central member of the Wnt signaling pathway (see Supplementary Information S10 for PLA2G2A neighbors). To validate the association between PLA2G2A and EphB2, we analyzed a series of gastric tumor tissue microarrays and found that the histologic expression of PLA2G2A mRNA and EphB2 protein was indeed significantly correlated (P = 0.017, χ² test; Fig. 4A; Supplementary Information S11). We also asked if the association between PLA2G2A and EphB2 could have been discerned by conventional single-data set analysis rather than the meta-analytic approach by reexamining the former, and found that in the individual data sets, EphB2 expression was only modestly correlated to PLA2G2A (Pearson's correlation values of 0.4521, 0.3746, 0.4908, and 0.2129) with thousands of other gene pairs exhibiting higher correlation scores (Supplementary Information S10). In the absence of knowing that this moderate correlation is conserved across multiple data sets, it is quite unlikely that such an interaction would have been identified as significant in any of the original analyses. Indeed, it was stated in one of the original reports that "...the pattern of variation in PLA2G2A expression among (the) gastric cancer samples was not closely related to that of any other genes" (15).
| Discussion |
|---|
|
|
|---|
In order to identify gene coexpression relationships that were robustly conserved across the multiple data sets, we employed a rank-based statistical methodology conceptually similar to a previous study identifying evolutionarily conserved patterns of gene coexpression (22). This approach is distinct from a number of previously reported meta-analytic reports of cancer transcriptomes, as it does not rely on performing intergroup comparisons to define differential expression signatures from multiple data sets (7), and is focused on a single cancer type as opposed to all cancers (9). In our study, the success of the meta-analytic approach in identifying hitherto undetected subtle but significant gene coexpression relationships is likely due to both having larger sample sizes for additional statistical power and also to the use of multiple independent data sets to identify conserved associations. The former (increased sample size) increases the sensitivity of detection, whereas the latter (conserved behavior) increases specificity. For example, the expression of EphB2 is only moderately correlated to PLA2G2A in the individual data sets (each containing different numbers of samples), but the preservation of this correlation across the four sets allows it to be identified above other interactions that display "strong" interactions in one data set but not in others (Supplementary Information S13). The network-driven approach also possesses a number of advantages compared with the conventional hierarchical clustering algorithms commonly used in such studies. In hierarchical clustering, the distance or similarity-based metrics used to cluster groups of genes (and samples) are usually based on preprocessed gene sets selected using arbitrary fold change or variance cutoffs. Such an approach offers no correction for potential false-positive relationships. 
In contrast, the network analysis used in this study combines data from multiple independent studies to perform a rigorous assessment of the FDR. Furthermore, as genes could have multiple cellular functions, another advantage of the network approach is the ability to assign a gene to multiple clusters, unlike hierarchical clustering, in which each gene is assigned to a single cluster.
We note that our study possesses a number of limitations. First, by focusing on conserved gene behavior, our study necessarily identifies common aspects of gastric cancer that are preserved across the different patient populations and array studies. As such, we are unable to infer if there are population-specific differences in the molecular features of gastric cancer between the different countries. Second, our study is limited to the number of genes commonly found across the data sets (2,252), and consequently, does not provide information regarding other potentially important genes. Nevertheless, it should be possible to use the consensus framework defined in this study as a core molecular scaffold to which additional genes can be added as more information becomes available. Third, the gene expression data used in this analysis provides only a static snapshot of the collective gene expression interactions occurring within a complex tissue sample, and is thus likely to underrepresent both the dynamic nature of these interactions, and whether they arise in either single or multiple cell types within the tumor. Fourth, the network presented in our report is undoubtedly simplified as it employs binary relationships between genes and does not consider the "strength" of the relation. Nevertheless, it is apparent that even a "simplified" version of the real network can be useful in identifying novel biological associations for hypothesis generation and subsequent experimental testing, which provides a strong biological motivation for this form of research.
We analyzed the coexpression relationships within the gastrome at the levels of network topology and functional modules. Such higher-level analyses have proved useful in elucidating various systems-level properties of biological networks, such as robustness to perturbations (30, 31), module evolution (7), and oscillatory behavior (32). It is not unreasonable that a similar systems-level description of cancer transcriptomes might also provide insights into the complex biological properties of tumors, including independence from local growth control, derangement of cellular architecture, and the adoption of novel tissue phenotypes (e.g., metaplasia). We found that the global architecture of the gastrome was consistent with a hierarchical scale-free topology containing several deeply embedded modules, and that different modules seemed to possess distinct subtopologies: certain modules were relatively integrated with others, whereas other modules were relatively isolated. Although we cannot definitively rule out that some of our topological inferences may be a consequence of our specific methodology, we believe that they are unlikely to be complete artifacts, for the following reasons: (a) we focused on gene-gene relationships that are strongly conserved across multiple independent data sets, and such relationships are unlikely to be technical errors; (b) the subnetworks constructed in this study display strong biological coherence, as reflected by their individual constituent genes exhibiting similar cellular functions (e.g., intestinal and gastric differentiation, etc.); and (c) using different network construction methods, other groups have also reported similar topological findings, albeit in simpler organisms such as bacteria and yeast (22, 23). However, to our knowledge, this is the first report demonstrating the presence of topological differences between different functional modules.
We propose that such differential "insularity" may reflect a general design principle used by cells to delegate specific functional roles to groups of genes, and to ensure their proper functioning by inhibiting or facilitating intermodule cross-talk (33). Regarding intestinal metaplasia, we suggest that once the intestinal module is initiated, the highly internally connected nature of the module may cause the remainder of the module members to be similarly activated to manifest the complete intestinal differentiation phenotype. In addition, the "insularity" of this module may prevent this differentiation process from being regulated by other modules. The potential role of such systems-level features in contributing to the cancer phenotype needs to be further investigated.
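One simple way to quantify the "insularity" discussed above is the fraction of a module's edges that stay inside the module. The sketch below is only an illustrative measure under that assumption, not the metric used in this study; the function name, module names, and node labels are hypothetical.

```python
def module_insularity(edges, modules):
    """Fraction of edges touching each module that are internal to it.

    edges: iterable of (node, node) pairs from a binary coexpression network.
    modules: dict mapping module name -> set of member nodes.
    A score of 1.0 means the module has no links to the rest of the network.
    """
    edges = list(edges)
    scores = {}
    for name, members in modules.items():
        internal = external = 0
        for a, b in edges:
            in_a, in_b = a in members, b in members
            if in_a and in_b:
                internal += 1      # both endpoints inside the module
            elif in_a or in_b:
                external += 1      # edge crosses the module boundary
        total = internal + external
        scores[name] = internal / total if total else 0.0
    return scores


# Hypothetical toy network: labels are placeholders, not real genes.
toy_edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),   # fully internal triangle
    ("b1", "b2"), ("b2", "b3"),                 # internal edges
    ("b1", "c1"), ("b3", "c2"),                 # links to other nodes
]
toy_modules = {
    "isolated_module": {"a1", "a2", "a3"},
    "integrated_module": {"b1", "b2", "b3"},
}

scores = module_insularity(toy_edges, toy_modules)
```

Under this measure the first toy module scores 1.0 (fully insular) and the second scores 0.5, mirroring the contrast between relatively isolated and relatively integrated modules.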
We also studied the gastrome at the level of individual genes and pathways. Specifically, we analyzed the gene PLA2G2A, a secreted phospholipase identified in a previous study as a molecular prognostic marker for gastric cancer (15). Although previous work has implicated PLA2G2A activity in a wide variety of cellular functions, including prostaglandin biosynthesis, inflammatory responses, and antibacterial activity (34), the exact nature of PLA2G2A's role in gastric cancer remains unclear. To identify potential cellular pathways affected by PLA2G2A in gastric cancer, we explored the coexpression neighborhood of PLA2G2A in the gastrome and found that it contained both β-catenin, a major component of the Wnt signaling pathway, and the EphB2 receptor, a target of Wnt/β-catenin signaling. These results, coupled with our additional supporting data that the Wnt pathway may regulate PLA2G2A expression, raise the intriguing hypothesis that one of PLA2G2A's roles in gastric cancer may be to modulate the activity of the Wnt pathway. As our tumor expression profiles only provide a static snapshot of these interactions, it is not possible to establish definitively from our data whether PLA2G2A might function as a positive or negative regulator of Wnt signaling. We note, however, that (a) PLA2G2A was previously identified as a genetic modulator of colon cancer in the APCmin mouse model (35), and (b) high levels of PLA2 have been shown to suppress colon cancer formation (36). This potential connection between PLA2G2A and the Wnt pathway deserves further study.
In conclusion, we have described in this report a consensus molecular framework for gastric cancer, and shown how an analysis of this framework can deliver novel topological and functional insights. More broadly, our results show how meta-analytic approaches, by capitalizing on greater sample numbers and focusing on conserved gene behavior, could aid in identifying subtle but potentially important biological relationships relevant to tumor biology. With the rapidly increasing availability of such data sets in the public domain, it is likely that such meta-analytic approaches will play a valuable role in elucidating both general and tissue-specific molecular circuits in the oncogenome.
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
H. Aburatani, D. Bowtell, S.Y. Leung, and P. Tan are founding members of the Gastric Cancer Genomics Consortium. We thank Hyun Cheol Chung and Sun Young Rha of Yonsei Cancer Centre, South Korea for organizing the Gastric Cancer Conference where the Gastric Cancer Genomics Consortium was first conceived.
| Footnotes |
|---|
Received 6/27/05. Revised 9/14/05. Accepted 10/25/05.