[Cancer Research 65, 186-194, January 1, 2005]
© 2005 American Association for Cancer Research


Cell and Tumor Biology

Wavelet Transformations of Tumor Expression Profiles Reveals a Pervasive Genome-Wide Imprinting of Aneuploidy on the Cancer Transcriptome

Amit Aggarwal1,4, Siew Hong Leong2, Cheryl Lee2, Oi Lian Kon2 and Patrick Tan1,3,4

1 Cellular and Molecular Research, 2 Division of Medical Science, National Cancer Centre, 3 Genome Institute of Singapore, 4 Department of Physiology, Faculty of Medicine, National University of Singapore, Singapore, Republic of Singapore

Requests for reprints: National Cancer Center/Genome Institute of Singapore, 11 Hospital Drive, Singapore 169610. Phone: 65-6-436-8385; Fax: 65-6-226-5694; E-mail: cmrtan{at}nccs.com.sg.


    Abstract
 
Aneuploidy is frequently observed in many human cancers, but its global effects on the cancer transcriptome are controversial. We did a systematic and unbiased genome-wide survey to determine the extent to which a tumor's abnormal karyotype (chromosomal amplifications and deletions) is detectably "imprinted" onto that tumor's gene expression profile. By using a novel methodology employing wavelet transform signal-processing algorithms to identify genomic regions of coordinated gene expression (wavelet variance scanning), we analyzed a series of gastric cancer cell lines and identified >100 genomic regions exhibiting distinct patterns of subtle but significant coordinated transcription, ranging from tens to hundreds of genes. A large majority (80%) of these regions could be specifically localized to a site of detectable genomic amplification or deletion; reciprocally, up to 47% of the total aneuploidy in each of the individual cell lines could be directly inferred from the gene expression data. Genome-wide portraits of tumor aneuploidy can thus be successfully reconstructed solely from gene expression data, implying that the effects of aneuploidy must be pervasively and globally imprinted within the cancer transcriptome. Aneuploidy may contribute to tumor behavior not just by affecting the expression of a few key oncogenes and tumor suppressor genes but also by subtly altering the expression levels of hundreds of genes in the oncogenome.

Key Words: Wavelet Transforms • Cancer genome anatomy: comparative expression patterns • Computational methods (CAAD, CAMM)


    Introduction
 
Aneuploidy is one of the most frequently observed genetic aberrations in human cancers, and tumors with increasingly abnormal karyotypes (e.g., chromosomal amplifications, duplications, and deletions) are often associated with greater aggressiveness, chemoresistance, and tendency for metastasis, suggesting a functional role for these genomic aberrations in shaping tumor behavior (1–3). Despite its ubiquitous nature, the specific effects of such large-scale chromosomal aberrations on the cancer cell, in particular the cancer transcriptome, remain controversial. For example, although certain groups have shown that alterations in DNA copy number can play a major role in determining a gene's expression level (4–8), others have reported that genes on regions of chromosomal amplification are rarely associated with increased expression (9). In addition, most of these reports have focused on specific regions, such as sites of recurrent chromosomal amplification (5, 8–10) and may thus have been inherently biased. To resolve this issue and to understand the role of aneuploidy in the carcinogenic process, a systematic and unbiased genome-wide survey of the relationship between aneuploidy and cancer gene expression is required.

We reasoned that if aneuploidy truly exerts pervasive effects on gene expression, then (a) the effects of aneuploidy should be "imprinted" within the cancer transcriptome and (b) with the appropriate tools, it should be possible to deconvolute an individual tumor's gene expression profile to directly infer and reconstruct the specific portrait of chromosomal aberrations inherent to that tumor. A major difficulty in this regard is that the absolute expression levels of individual genes can vary tremendously, even when they are localized in close physical proximity in the genome. Indeed, to our knowledge, there is no report that has successfully shown that global gene expression information can be deconvoluted in a systematic and unbiased manner to derive a specific, genome-wide, de novo portrait of tumor aneuploidy. To address this challenge, we developed a novel methodology, wavelet variance scanning (WAVES), which uses wavelet transform signal-processing algorithms to identify regions of coordinated transcription within a target genome. By applying WAVES to a series of gastric cancer cell lines, we identified several (>100) distinct regions of coordinated transcription and found that these coregulated regions were more frequently observed in cell lines with numerous chromosomal aberrations. Remarkably, the majority (~80%) of these coregulated regions could be specifically localized to a site of chromosomal aneuploidy, and up to 47% of the total aneuploidy in the tumor cell lines could be directly inferred by the WAVES analysis, without requiring a priori knowledge of the specific genomic locations of the chromosomal aberrations. Compared to methodologies relying on absolute gene expression levels, WAVES also seems to be a superior test for identifying regions of coordinated expression.
This result has significant implications for cancer biology because it strongly suggests that aneuploidy does indeed act to drive pervasive and widespread gene expression changes throughout the cancer transcriptome. Our results confirm and extend previous reports proposing that aneuploidy may contribute to tumor behavior not just by affecting the expression of a few key oncogenes and tumor suppressor genes but also by subtly altering the expression levels of hundreds of genes in the cancer genome.


    Materials and Methods
 
Cell Lines. Gastric cancer cell lines SNU1, SNU5, SNU16, KATOIII, AGS, Hs746, and N87 (Supplementary Table 1) were purchased from the American Type Culture Collection and cultured according to its recommendations.

Comparative Genomic Hybridization and Spectral Karyotyping. For comparative genomic hybridization (CGH), tumor and normal (obtained from a healthy volunteer) genomic DNAs were cohybridized to metaphase spreads obtained from lymphocyte cultures of a normal individual (11). Ten to fifteen metaphase spreads were counted per slide. Spectral karyotyping (SKY) was done on metaphase slides prepared from each tumor cell line, using SKY Paint (Applied Spectral Imaging, Israel; ref. 12), and analyzed by SKYview software (Applied Spectral Imaging, Israel). A minimum of seven metaphases was analyzed for each cell line. The complete CGH (Supplementary Fig. 1) and SKY (Supplementary Table 2) data are available.

Expression Profiling. Total RNA was extracted from cell line pellets using Trizol reagent and processed for hybridization to Affymetrix U133A GeneChips (Affymetrix, Santa Clara, CA) following the manufacturer's instructions. Each cell line experiment was replicated in triplicate.

Mapping of Affymetrix GeneChip Probes to the Human Genome Sequence. We selected GeneChip probes (19,442) with an assigned LocusLink identifier (LocusID), using annotations from the Affymetrix Website (http://www.netaffx.com), and determined their corresponding physical location on the human genome using the NCBI Entrez Mapviewer database (www.ncbi.nlm.nih.gov/mapview/; June 2003). Of the 19,442 probes with a LocusID, 8,104 were localized to a unique LocusID, 8,470 were localized to LocusIDs represented by 2 to 3 probes each, and the remaining 2,868 were localized to 617 LocusIDs.

Data Preprocessing. Gene expression data were quality controlled by Gene Data Refiner (www.genedata.com). Gene expression data from individual arrays were normalized by median centering around 1,000 expression units. For each cell line, the three replicates were averaged and the missing values were replaced by a nominal value of 1. Mean centering and normalization by SD were also done prior to wavelet transforms.
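
The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes that median centering is a multiplicative rescaling of each array, that missing values are encoded as None, and that the final standardization runs across all probes of one cell line. All function names are hypothetical.

```python
from statistics import mean, median, pstdev

def median_center(array, target=1000.0):
    """Rescale one array so its median expression equals `target` (assumed multiplicative)."""
    m = median(v for v in array if v is not None)
    return [None if v is None else v * target / m for v in array]

def average_replicates(replicates, missing_fill=1.0):
    """Average replicate arrays probe by probe; probes with no data get `missing_fill`."""
    averaged = []
    for values in zip(*replicates):
        present = [v for v in values if v is not None]
        averaged.append(mean(present) if present else missing_fill)
    return averaged

def standardize(profile):
    """Mean-center a profile and divide by its standard deviation."""
    mu = mean(profile)
    sd = pstdev(profile)
    return [(v - mu) / sd for v in profile]

def preprocess(replicates):
    """Median-center each replicate, average them, then standardize the result."""
    centered = [median_center(r) for r in replicates]
    return standardize(average_replicates(centered))
```

The output of `preprocess` is one standardized profile per cell line, which would then be ordered by genomic position before any wavelet transform is applied.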

Wavelet Transforms. Wavelets are small waves, and wavelet transforms, like Fourier transforms, are conventionally used to convert data from a time domain to a frequency domain (13, 14). Briefly, a wavelet is a function ψ of zero average

$$\int_{-\infty}^{+\infty} \psi(t)\,dt = 0 \tag{1}$$

which can be dilated by a scale parameter s and translated by a position parameter u. Mathematically, this can be denoted as

$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right) \tag{2}$$

The wavelet transform of f, which correlates f with ψ_{u,s} at scale s and position u, is computed by

$$Wf(u,s) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{s}}\,\psi^{*}\!\left(\frac{t-u}{s}\right)dt \tag{3}$$

where * indicates a complex conjugate. By varying the wavelet scale s and translating the wavelet along the positional index u, a plot of how the Wf wavelet coefficients vary with scale and position can be generated. The transformation to Fourier space provides a rapid way to calculate the coefficients at all translations for a given scale in one step (14, 15).

Continuous Wavelet Transforms and Scale-Averaged Variance. To estimate the continuous wavelet transform, the scales are dilated in powers of 2^J (with J = 1 to 5, resulting in 2, 4, 8, 16, and 32) with four logarithmic subdivisions within each division. This range of scales was chosen based on an initial analysis of the relationship between wavelet variance density and scale, which revealed minimal variance beyond 2^5 (see Supplementary Fig. 2). Morlet wavelets (15), which are Gaussian curves modulated by a sine wave, are used here for ease of interpretation and application.

(4)
An estimate of wavelet variance at a given scale is obtained by summing the squares of the wavelet coefficients (the square of coefficients represents the variance). To estimate wavelet variability over multiple scales, we use

(5)
The square of the absolute value of the wavelet coefficients represents the variance, and division by scale converts it into a variance density, represented in this manuscript as "wavelet gene expression" values.
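
To make the scale construction and the scale-averaged variance concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the Morlet wavelet is written in its common form π^(-1/4)·exp(iω0·η)·exp(-η²/2) with ω0 = 6 (an assumed default), the transform is computed by direct summation instead of via Fourier space, edge effects are ignored, and constant prefactors in the scale average are omitted. The function names are hypothetical.

```python
import cmath
import math

def morlet(eta, omega0=6.0):
    """Morlet wavelet: a complex sine wave under a Gaussian envelope (omega0 assumed)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * eta) * math.exp(-eta * eta / 2.0)

def make_scales(j_min=1, j_max=5, subdivisions=4):
    """Scales from 2**j_min to 2**j_max with `subdivisions` logarithmic steps per power of 2."""
    n_steps = (j_max - j_min) * subdivisions
    return [2.0 ** (j_min + k / subdivisions) for k in range(n_steps + 1)]

def cwt(signal, scales):
    """Direct-sum wavelet transform; coeffs[j][u] correlates the signal with psi_{u, s_j}."""
    n = len(signal)
    coeffs = []
    for s in scales:
        norm = 1.0 / math.sqrt(s)
        row = []
        for u in range(n):
            total = 0j
            for t in range(n):
                total += signal[t] * norm * morlet((t - u) / s).conjugate()
            row.append(total)
        coeffs.append(row)
    return coeffs

def scale_averaged_variance(coeffs, scales):
    """Sum |W|^2 / s over scales at each position (constant prefactors omitted)."""
    n = len(coeffs[0])
    return [sum(abs(row[u]) ** 2 / s for row, s in zip(coeffs, scales))
            for u in range(n)]
```

With the defaults, `make_scales()` yields 17 scales running from 2 to 32. Restricting the list passed to `scale_averaged_variance` (for example, scales 2 to 8 only) corresponds to averaging over a narrower scale range.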

Wavelet Variance Scanning. In wavelet variance scanning (WAVES), a moving window of L probes (L is termed the scan length) is slid continuously over a wavelet variance matrix consisting of the scale-averaged wavelet gene expression values (Eq. 5) of all cell lines in the data set. Within each window, the most dominant cell line is defined by Ni (i ∈ [1,7]), the dominance value. Ni refers to the number of times a particular cell line exhibits either the highest wavelet gene expression value (for amplifications) or lowest wavelet gene expression value (for deletions) in that window. It should be noted that in this particular implementation, only those regions unique to a particular cell line would be strongly elucidated. If a region is present in multiple cell lines, this methodology will result in one cell line being preferentially emphasized over the others (Supplementary Information Technical Note).
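
The dominance count can be sketched as follows. This is a hedged illustration only: the input layout (one row per probe in genomic order, one column per cell line), the tie-breaking rule (the first cell line wins), and the function name are assumptions. The sketch produces n − L + 1 windows for n probes, which may differ by one from the window count quoted later in the text, depending on indexing.

```python
def dominance_counts(wavelet_matrix, scan_length, mode="highest"):
    """Count, per moving window, how often each cell line is dominant.

    wavelet_matrix[p][i] is the scale-averaged wavelet gene expression value
    of probe p (in genomic order) for cell line i.
    Returns counts[w][i], the dominance value N_i of cell line i in window w.
    """
    pick = max if mode == "highest" else min  # highest: amplifications, lowest: deletions
    n_lines = len(wavelet_matrix[0])
    # Index of the dominant cell line at each probe (ties go to the first line).
    winners = [pick(range(n_lines), key=row.__getitem__) for row in wavelet_matrix]
    counts = []
    for start in range(len(winners) - scan_length + 1):
        window = winners[start:start + scan_length]
        counts.append([window.count(i) for i in range(n_lines)])
    return counts
```

Because each probe contributes to exactly one cell line's count, a region shared by several cell lines is credited to only one of them, which is the limitation noted above.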

Confidence Assessment Using Random Permutations. For each cell line, a statistical confidence value is attached to each region of high Ni. Because the null distribution of this data is not known, we empirically approximated the null distribution by simulating it under conditions in which the gene order is randomly permuted. This was done by generating 100 randomly scrambled genomes and then subjecting them to wavelet transformation followed by conversion to dominance space. For each of the 100 simulations, 19,442 – L windows are observed for each cell line. The mean of the 99th percentile cutoffs from the 100 random genome analyses is taken as an estimator of the 99th percentile value in the permuted data. Windows in the actual genome scan whose Ni exceeds this mean cutoff (i.e., above the permuted 99th percentile cutoff) are called significant at P ≤ 0.01.
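
The permutation procedure can be outlined in Python as below. This is a sketch under assumptions, not the published code: `identity` is a hypothetical placeholder for the wavelet transform and scale-averaging step, dominance is taken as the highest value, the percentile uses the nearest-rank rule, and a window is called significant when its count strictly exceeds the mean cutoff.

```python
import math
import random

def identity(matrix):
    """Hypothetical placeholder for wavelet transformation plus scale averaging."""
    return matrix

def window_counts(values_by_probe, line, scan_length):
    """Dominance frequency of cell line `line` (highest value) in each moving window."""
    wins = [1 if max(range(len(r)), key=r.__getitem__) == line else 0
            for r in values_by_probe]
    return [sum(wins[i:i + scan_length])
            for i in range(len(wins) - scan_length + 1)]

def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]

def permuted_cutoff(expression, line, scan_length, transform=identity,
                    n_perm=100, q=99.0, seed=0):
    """Mean of the q-th percentile dominance frequency over n_perm scrambled genomes."""
    rng = random.Random(seed)
    cutoffs = []
    for _ in range(n_perm):
        shuffled = expression[:]
        rng.shuffle(shuffled)  # randomly permute the probe order
        counts = window_counts(transform(shuffled), line, scan_length)
        cutoffs.append(percentile(counts, q))
    return sum(cutoffs) / n_perm

def significant_windows(expression, line, scan_length, cutoff, transform=identity):
    """Start indices of windows in the ordered genome that exceed the cutoff."""
    counts = window_counts(transform(expression), line, scan_length)
    return [w for w, c in enumerate(counts) if c > cutoff]
```

Permuting before the transform matters: coordinated neighbors are broken up first, so the scrambled genomes estimate how large a dominance frequency can become by chance alone.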

Estimating False Discovery Rates for Individual Cell Lines. In addition to the type I confidence values ascribed to each region of high Ni, it is also important to interpret these regions in the context of overall accuracy based on the total set of significant windows for each cell line. Thus, we have also used the false discovery rates to estimate the proportion of false-positives from the total number of "significant" windows (16). Using the rejection region fixed at the 99th percentile from the random simulation results (see previous section), the false discovery rate of windows in the rejection region is defined as the ratio of the number of windows in the random data above the 99th percentile value to Nwi, where Nwi = the number of windows in the actual genome scan above that value.
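
A small Python sketch of this ratio is given below. It assumes that the random-data count is averaged over the permuted genomes and that "above" means strictly greater than the cutoff; both choices, and the function name, are illustrative assumptions.

```python
def false_discovery_rate(observed_counts, permuted_counts_list, cutoff):
    """Estimate FDR as (mean number of permuted windows above cutoff) / (observed windows above cutoff).

    observed_counts: dominance frequencies per window in the ordered genome.
    permuted_counts_list: one list of dominance frequencies per permuted genome.
    """
    n_observed = sum(1 for c in observed_counts if c > cutoff)
    if n_observed == 0:
        return float("nan")  # no significant windows, so the ratio is undefined
    n_random = sum(sum(1 for c in counts if c > cutoff)
                   for counts in permuted_counts_list) / len(permuted_counts_list)
    return n_random / n_observed
```

For instance, four observed windows above the cutoff against an average of 0.5 such windows per permuted genome give an estimated false discovery rate of 0.125.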


    Results
 
Wavelet Transformations of Gene Expression Information. Wavelet transforms are signal-processing algorithms similar to Fourier transforms that are used to convert complex signals from time to frequency domains. However, unlike Fourier transforms, wavelets are able to functionally localize a signal in both time and frequency space, thus allowing transformed data to be simultaneously analyzed in both domains (frequency and time). We hypothesized that wavelet transforms might provide an effective means to identify genomic regions of coordinated transcription within an mRNA expression profile due to their ability to accentuate recurrent temporal relationships between neighboring data points (17). To test this hypothesis, we applied the continuous wavelet transform procedure to genomically ordered transcription data derived from seven different gastric cancer cell lines. The wavelet transform maps the absolute gene expression levels in an expression profile to a new data set in which the absolute variability is represented as wavelet coefficients across different scales and locations. This can be represented as a three-dimensional graph that depicts the wavelet variance as a function of scale and location. An example of this process is shown in Fig. 1A, in which the gene expression levels of array probes ordered along chromosomal region 17q are resolved over both multiple scales and genomic location for cell lines N87 and AGS. Because this operation essentially converts absolute gene expression levels to their wavelet counterparts, we will henceforth refer to the wavelet variance value of a particular array probe as a wavelet gene expression value. To address the challenge of interpreting data over multiple disparate wavelet scales, we also did a scale-averaging operation of the wavelet gene expression data, in which the individual variances were integrated over different scale ranges (see Materials and Methods).
The resultant scale-averaged data provide a representation of coordinated transcriptional behavior at a particular genomic locus. The effects of the scale averaging operations are shown in Fig. 1B for the same 17q genomic region—narrow wavelets (small-scale ranges) uncover sharp features (top), whereas wide wavelets (large-scale ranges) uncover more global features by "flattening" the peak through distribution of the wavelet variance over a larger region (bottom). These results indicate that continuous wavelet transforms can be successfully applied to gene expression data and that averaging of wavelet gene expression values over smaller scale ranges captures local trends, whereas averaging over larger scale ranges captures long-range trends.



 
Figure 1. Wavelet transformations of gene expression data. A, normalized gene expression values for microarray probes localized to the 17q chromosomal region for all seven gastric cell lines (left), and wavelet-transformed gene expression (wavelet gene expression) data for the cell lines AGS (top right) and N87 (bottom right). The axes on the three-dimensional graphs represent genomic location, wavelet scale, and wavelet gene expression values (Wavelet Variance). Cell line N87 displays a 17q12q21 amplicon (red arrows). B, scale-averaged two-dimensional wavelet gene expression data for all seven cell lines, using narrow- (top) and wide-scale (bottom) wavelets. Narrow wavelets (small scales) uncover sharp features and local trends, whereas broader wavelets (large scales) are more biased toward global features and long-range trends.

 
Targeted Analysis of Regions Exhibiting Coordinated Gene Expression Suggests a Correlation with DNA Amplifications and Deletions. As continuous wavelet transforms emphasize patterns of recurrent behavior, a genomic region exhibiting either a high or low wavelet gene expression value indicates that the transcriptional behavior of genes in that region is occurring in a coordinated fashion. We refer to such regions as coordinated regions of expression (CORE). Each of the seven gastric cancer cell lines displayed a unique wavelet profile, comprising several distinct COREs of high or low scale-averaged wavelet gene expression values relative to the other cell lines (usually spanning 100-200 ordered array probes, and 600-700 probes for some regions). We hypothesized that these COREs might correspond to sites of chromosomal aneuploidy as the cell line N87, which carries a chromosomal amplification of the 17q12q21 region (ref. 18; also see Supplementary Information Technical Note for a more detailed examination of this region in the cell lines), also exhibited a higher wavelet gene expression variance at this locus compared to the other cell lines (Fig. 1). To test this possibility, we manually identified several COREs and found that most of them could be localized to sites of chromosomal aneuploidy, as assayed by CGH or SKY. An example is shown in Fig. 2, which shows the seven cell lines plotted by their wavelet gene expression values across chromosome 7. Here, an extended region of high wavelet gene expression variance was observed for cell line SNU5, as well as a region of low wavelet gene expression variance for the cell line N87 (Fig. 2A). Indeed, as confirmed by both CGH and SKY, cell line SNU5 possesses a chromosome 7 amplicon, whereas the cell line N87 has a deletion of chromosome 7q (7q22qter; Fig. 2B and C for CGH and SKY, respectively). These results suggest that a correlation may exist between COREs and sites of aneuploidy in gastric cancer cell lines.



 
Figure 2. Correlation of wavelet gene expression values to specific chromosomal aberrations. A, wavelet gene expression profiles of all seven gastric cancer cell lines for array probes localized to chromosome 7 (X axis). The wavelet gene expression values (Y axis) are presented on a log-transformed scale to highlight regions of both high and low wavelet variance. The cell line SNU5 exhibits several peaks of increased transcriptional coregulation in this region (red arrows), whereas the cell line N87 exhibits a region of decreased coregulation (pink arrows). B and C, detection of chromosome 7 aberrations in the cell lines SNU5 and N87 by CGH (B) and SKY (C). The majority of chromosome 7 is amplified in cell line SNU5 (B, green lines, left), whereas the distal end is deleted in cell line N87 (B, red lines, right). Pink numbers, total number of metaphase spreads counted. Similarly, SKY analysis shows multiple copies of chromosome 7 in SNU5 cells (C, beige staining regions, top), whereas the distal end is deleted in N87 cells (C, pink arrow, bottom; a normal chromosome is shown on the right). The actual 4',6-diamidino-2-phenylindole–stained metaphase chromosomes are shown to the left of the false-colored images. Representative of 10 to 12 cells.

 
WAVES: a Systematic and Unbiased Methodology for Identifying COREs. The identification of COREs by manual inspection is highly laborious, prone to interobserver bias, and does not provide any statistical likelihood of such regions truly existing within a particular cell line (rather than being false positives). To establish the suggested correlation between COREs and aneuploidy, we implemented a systematic and unbiased methodology to identify COREs on a genome-wide scale. In this methodology, referred to as WAVES, the absolute gene expression data from every cell line are subjected to a continuous wavelet transform, scale averaged, and combined into a wavelet-transformed gene expression matrix (wavelet gene expression matrix; Fig. 3). Next, a moving window of scan length L is continuously shifted over the wavelet gene expression matrix, and within each window the "dominance frequency" (Ni) of each cell line is recorded to form a dominance matrix (Fig. 3A, left). Depending on the desired comparison, the dominance frequency is defined as the number of times a cell line either exhibits the highest wavelet gene expression value compared to the other cell lines within a window (for amplifications) or the lowest wavelet gene expression value (for deletions).



 
Figure 3. Unsupervised detection of COREs. A, schematic of detection scheme. The Expression Matrix is transformed into a Wavelet Matrix, containing the ordered wavelet gene expression data for all cell lines. Ordered indicates that the array probes were aligned by their true chromosomal sequence. A moving window is applied to the Wavelet Matrix, and within each window's scan length, the frequency of a particular cell line exhibiting a dominant wavelet gene expression value is computed. Dominance can be defined as a cell line exhibiting either the highest (for amplifications) or lowest (for deletions) wavelet gene expression value compared to other cell lines. The Dominance Matrix summarizes the dominance frequencies across all cell lines and windows. For any specific cell line, the distribution of dominance frequencies is compared against randomized data generated from an Expression Matrix where the probe locations were permuted. Dominance frequencies in the ordered data that exceed the 99th percentile of frequencies in the permuted set are deemed significant (red arrowhead). B, graphs of dominance frequencies in ordered data compared to permuted data. X axis, moving windows as they occur along the ordered genome (left) or in the permuted genome (right). Y axis, dominance frequencies of the cell line KATOIII when an ordered genome is used (left), compared to a permuted genome (right). Red arrow, 99th percentile dominance frequency in the permuted data as averaged across 100 permuted simulations. Peaks in the ordered data exceeding this 99th percentile value (blue arrows) are deemed significant.

 
To evaluate the significance of a dominance frequency Ni for a particular cell line, we empirically estimated the probability that a dominance frequency greater or equal to Ni would be observed by random chance by comparing the actual results to 100 randomly permuted genomes in which the probe locations were randomized (Fig. 3A, right). For each cell line, the mean of the 99th percentile values from the 100 scrambled genomes was used to define a rejection region that was then applied to the distribution of true dominance frequencies (Fig. 3A, Ranked Dominance). The use of this rejection region allows us to attach a true positive probability of at least 99% to regions in each cell line having a dominance frequency above this cutoff. However, as (19,442 – L) hypotheses (the total number of windows) are simultaneously tested with no correction for multiple hypothesis testing, the probability of observing at least one false positive in this assay is almost one [1 – (0.99)^J, where J = number of hypotheses found significant]. The decision not to impose a specific control on the overall false positive rate was deliberate, to ensure maximal sensitivity and that all possible regions are detected. Instead, we defined a false discovery rate (see Materials and Methods; ref. 16) to provide an indication of the extent of false positives among the significant hypotheses (i.e., cell line regions with a dominance frequency above the cutoff). A high false discovery rate for a cell line indicates that many of the significant hypotheses are likely to be false-positives.
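
The bracketed expression can be checked numerically. The short Python sketch below evaluates 1 – (1 – α)^J with α = 0.01, under the simplifying assumption that the J tests are independent (overlapping windows are not, so the numbers are only a rough guide); the function name is illustrative.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.01):
    """Chance of one or more false positives among n_tests independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

for j in (1, 100, 1000):
    print(j, round(prob_at_least_one_false_positive(j), 4))
```

Under these assumptions the probability is 0.01 for a single test, rises to roughly 0.63 by 100 tests, and is effectively 1 by 1,000 tests, which is why a false discovery rate is reported rather than a corrected per-window P value.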

The performance of the WAVES algorithm is presented in Fig. 4. As scan length L is increased, both Ni and the permuted 99th percentile cutoff were observed to increase, but the increase in Ni was more dramatic (Fig. 4A and B for cell lines SNU5 and AGS, respectively). Conversely, reducing the scan length resulted in Ni eventually converging to the permuted cutoff, an expected result because narrower windows would naturally deemphasize regional patterns of coordinated behavior within an ordered genome in favor of local "noise." The false discovery rates for all cell lines were also observed to improve with increases in scan length L (Fig. 4C and D), up until 100 probe units. In contrast, small changes in the scale-averaging conditions had a minimal influence on the overall results on the false discovery rates (Fig. 4A and B).



 
Figure 4. Performance characteristics of detection methodology. A and B, effects of varying Wavelet Scale and Scan Length of the moving window. Dominance frequencies were calculated for ordered (solid lines) or permuted (dotted lines) genomes in two cell lines, SNU5 (A) and AGS (B). Y axis, 99th percentile value of dominance frequencies. X axis, scan length of the moving window from 25 to 150, using incremental steps of 25. Results from permuted genomes were averaged from 100 independent permutations. Bar, ± 1σ. Dominance frequencies were calculated using a Morlet wavelet of either scale-averaged wavelets of 2 to 8 (red lines) or 2 to 16 (blue lines). The dominance frequencies of ordered and permuted data are observed to converge with decreasing scan length. C and D, FDR (Y axis) as a variation of scan length for all cell lines. See main text for a definition of the FDR. An FDR of 0.1 indicates that 10% of the dominance frequencies in the ordered data set are also observed in the permuted data set above the 99th percentile. FDRs are observed to decrease with increasing scan length sizes for all cell lines. Left, dominance is defined as the cell line having the highest wavelet gene expression values. The highest FDRs are observed for cell line SNU1, which exhibits the lowest number of overt chromosomal aberrations, whereas low FDRs are generally observed in cell lines with high numbers of chromosomal amplifications (cell lines KATOIII, Hs746). Right, dominance is defined as the cell line having the lowest wavelet gene expression values. The lowest FDRs are observed for cell lines carrying numerous chromosomal deletions (N87 and Hs746), whereas cell lines SNU1 and AGS, which carry few chromosomal deletions, exhibit a high FDR.

 
An interesting biological correlation was observed when COREs from the different cell lines were globally compared in this manner. Specifically, when dominance was defined as the cell line exhibiting the highest wavelet gene expression value, the lowest false discovery rates (and highest specificity) were observed in cell lines exhibiting numerous chromosomal amplifications (e.g., cell lines KATOIII and Hs746), whereas the largest false discovery rates (and lowest specificity) were observed for cell line SNU1, which carries comparatively few chromosomal amplifications. However, if dominance was defined as the cell line exhibiting the lowest wavelet gene expression value, the lowest false discovery rates were observed in cell lines exhibiting numerous chromosomal deletions (e.g., cell lines N87 and Hs746), whereas cell lines SNU1 and AGS, which carry comparatively few chromosomal deletions, exhibited large false discovery rates. This result suggests that (a) a global correlation exists between the presence of COREs and sites of chromosomal aneuploidy and (b) that this correlation is sufficiently strong that it can be observed even under conditions in which the whole genome is analyzed in an unsupervised manner.

Global Concordance of COREs with Chromosomal Aberrations. To establish that individual COREs can indeed be used to directly infer specific sites of chromosomal aneuploidy, we performed a global concordance study between the CORE predictions and the CGH data. When dominance was defined as the cell line exhibiting the highest wavelet gene expression value, four cell lines (SNU5, KATOIII, Hs746, and AGS) exhibited low false discovery rates of <0.2 (Fig. 4C). For these four cell lines, 63 COREs were collectively found to be significant by WAVES, and of these, 47 regions (or 75%) could be localized to regions of chromosomal amplifications, although the COREs were initially identified without a priori knowledge of the locations of any chromosomal aberrations. Conversely, if dominance was defined as the cell line exhibiting the lowest wavelet gene expression value, five cell lines (SNU5, KATOIII, Hs746, N87, and SNU16) exhibited low false discovery rates of <0.2 (Fig. 4D). For these cell lines, 76 COREs were collectively found significant by WAVES, out of which 65 (86%) were indeed located within regions of genomic deletion (see Fig. 5 for examples). In summary, approximately 80% of COREs identified by WAVES could be localized to confirmed sites of chromosomal amplifications or deletions, confirming that WAVES-identified peaks are indeed highly specific in their association with regions of chromosomal aneuploidy. Regarding the sensitivity of WAVES in identifying sites of known chromosomal aberrations, of 291 bands scored as significantly amplified by CGH (in the four cell lines with low false discovery rates, FDR <0.2), 146 (50%) could be associated with a CORE. Similarly, of 450 bands scored as significantly deleted by CGH (in five cell lines having low false discovery rates, FDR <0.2), 205 (46%) could be associated with a CORE.
Thus, in total, approximately 47% of the total chromosomal aneuploidy observed in these cell lines could be directly inferred from the WAVES analysis. We note, however, that the figure of 47% is almost certainly a lower limit and the actual figure is likely to be much higher. This is because the current implementation of WAVES was designed to identify COREs that are unique to each cell line, and thus aneuploid regions that are commonly present in multiple cell lines would have been missed (see Supplementary Information Technical Note).
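
The pooled percentages quoted above follow directly from the counts reported in this section, as the following short Python check shows. The helper name is illustrative, and pooling the amplification and deletion counts by simple addition is an assumption about how the summary figures were obtained.

```python
def pooled_fraction(pairs):
    """Pool (matched, total) pairs into one overall fraction."""
    matched = sum(m for m, _ in pairs)
    total = sum(t for _, t in pairs)
    return matched / total

# (matched, total) counts as reported in the text: amplifications, then deletions.
core_hits = [(47, 63), (65, 76)]      # COREs localized to a CGH aberration
band_hits = [(146, 291), (205, 450)]  # CGH bands associated with a CORE

specificity = pooled_fraction(core_hits)  # 112 / 139
sensitivity = pooled_fraction(band_hits)  # 351 / 741

print(f"{specificity:.1%} of COREs map to an aberration")    # 80.6% of COREs map to an aberration
print(f"{sensitivity:.1%} of aberrant bands map to a CORE")  # 47.4% of aberrant bands map to a CORE
```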



 
Figure 5. Genome-wide association of COREs with chromosomal amplifications and deletions. X axis, moving windows ordered along the human genome; Y axis, dominance frequencies of the cell line; red line, 99th percentile dominance frequency derived from the permuted genome simulations. A and B, association of transcriptionally coregulated regions with genomic amplifications detected by CGH for the cell lines SNU5 and KATOIII. Detected amplifications are indicated by their cytogenetic coordinates above each peak, and dominance is defined as the cell line exhibiting the highest wavelet gene expression. C and D, association of transcriptionally coregulated regions with genomic deletions detected by CGH for the cell lines SNU5 and N87. Detected deletions are indicated by their cytogenetic coordinates above each peak, and dominance is defined as the cell line exhibiting the lowest wavelet gene expression. (All significant peaks that can be matched to an observed chromosomal amplification or deletion are shown in black. Chromosomal coordinates in red type indicate a coordinated expression region for which a chromosomal aberration could not be detected by CGH. Chromosomal coordinates in green type indicate a coordinated expression region close to the centromere; the CGH data at these locations is not reliable due to the presence of numerous repetitive sequences).

 
Finally, we also compared the performance of WAVES to a more conventional methodology in which wavelet transforms were not done. When the techniques were assessed across two different cell lines, we found that at similar levels of stringency, the conventional methodology was less specific and more prone toward identifying false-positive peaks than WAVES (see Supplementary Information Technical Note). Hence, WAVES seems to be a superior test for uncovering genomic regions harboring coordinated patterns of expression.


    Discussion
 
In this report, we developed a novel methodology, WAVES, to explore the relationship between the cancer transcriptome and chromosomal aneuploidy. Central to WAVES is the use of wavelet transforms to identify genomic regions exhibiting coordinated transcriptional expression. Although others have also investigated the correlation between DNA copy number and gene expression, the strategy adopted by most of these previous studies has been to initially identify specific regions of aneuploidy or DNA copy number change followed by examining genes within these regions for biases in gene expression (5, 8–10). In addition, most of these reports have focused on chromosomal amplifications, and not deletions. In contrast, we believe that our study is the first to show that gene expression can be successfully used on an unbiased genome-wide scale to directly infer the locations of both amplifications and deletions in cancers without relying on a priori knowledge of the locations of the chromosomal aberrations.

Our study made several methodological and biological findings. First, we found that wavelet transforms are efficient at identifying regions of subtle but significant gene expression coordination in gene expression data. Previous applications of wavelet transforms in biology (13) have been in the analysis of genome sequences (identifying pathogenicity islands and long-range correlations between DNA bending sites) and protein structures (detection and characterization of repeating motifs). Our results indicate that wavelet transforms can be effectively applied to cancer expression data as well. Second, by applying WAVES to a set of gastric cancer cell line transcriptomes, we found that each cell line exhibited a distinctive profile of COREs, supporting previous studies showing that every tumor (or cell line) is molecularly unique (19, 20). Third, the number of COREs observed in each cell line was generally related to the number of chromosomal aberrations in that cell line, implying a relationship between the presence of COREs and the presence of genomic aberrations. Fourth, the majority of these expression regions (80%) could be correlated to a region of known chromosomal aneuploidy, as assayed by two independent methodologies, CGH and SKY. Reciprocally, up to 47% of the total chromosomal aneuploidy observed in the cell lines could be directly inferred from the gene expression data, and it is likely that this number is a lower limit (see below). The strong association between the presence of a CORE and a genomic aberration at that same genetic locus shows that it is possible to directly infer the pattern of chromosomal aberrations for a particular tumor by studying its transcriptome.
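To illustrate why a wavelet transform can expose subtle regional coordination, the following is a minimal sketch using a Haar wavelet: the fine-scale detail coefficients are discarded and the signal is rebuilt from the coarse approximation alone. The choice of the Haar wavelet, the number of levels, the function name, and the synthetic data are assumptions for illustration only; this section does not specify the transform used in WAVES.

```python
import numpy as np


def haar_smooth(signal, levels=5):
    """Keep only the coarse Haar approximation of a genome-ordered signal.

    Runs `levels` steps of the Haar discrete wavelet transform, drops all
    detail coefficients, and inverts the transform. The length of `signal`
    must be a multiple of 2**levels.
    """
    x = np.asarray(signal, dtype=float)
    if x.ndim != 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("signal must be 1-D with length divisible by 2**levels")
    approx = x.copy()
    for _ in range(levels):
        # Approximation coefficients; the matching detail coefficients,
        # (even - odd) / sqrt(2), are deliberately not kept.
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2)
    for _ in range(levels):
        # Inverse Haar step with all detail coefficients set to zero.
        approx = np.repeat(approx, 2) / np.sqrt(2)
    return approx


# Synthetic example: 1024 genes with a modest shift over 128 adjacent genes.
rng = np.random.default_rng(0)
raw = rng.normal(size=1024)
raw[512:640] += 0.8  # hypothetical region of coordinated expression
smooth = haar_smooth(raw, levels=5)
```

With all details removed, the Haar reconstruction reduces to averages over blocks of 2**levels neighbouring genes, so gene-level noise shrinks while a shift shared by adjacent genes is retained.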

Our finding that genome-wide portraits of tumor aneuploidy can be reconstructed from gene expression data strongly suggests that the effects of chromosomal aneuploidy are likely to be pervasively and globally imprinted throughout the cancer transcriptome. This is of significance inasmuch as many researchers currently studying chromosomal aberrations in cancers have usually focused on identifying a few key genes within the area of aneuploidy (5, 8, 9), under the hypothesis that these may represent important "driver" oncogenes and tumor suppressor genes. Although this is likely the case for genes such as the ERBB2 receptor on the 17q21 region (ref. 21; more examples in Supplementary Information Technical Note), our results suggest that aneuploidy may also contribute to tumor behavior by effecting subtle but widespread changes in gene expression at the level of hundreds and even thousands of genes. Others have reported similar results for selected chromosomal regions (4, 7, 22), and our study confirms these studies and extends their validity to the entire genome. Although the absolute gene expression differences in the COREs are subtle, recent studies have also shown that subtle changes in multiple genes can lead to significant biological effects, particularly if these genes are associated with shared cellular programs (23). As such, we suggest that an important question for future research will lie in the development of methodologies that can address the possible phenotypic consequences of such subtle but significant patterns of transcriptional coordination. Some possible methodologies might include metabolic control analysis (24) or flux balance analysis (25), which have been successfully used in other scenarios to analyze complex phenotypes generated through the combinatorial interaction of multiple genes and cellular components.
It is worth noting that a previous report, using metabolic control analysis to study cancer gene expression data, has proposed that the large gains in metabolic fluxes observed in tumor cells are physically achievable only if one considers the contribution of thousands of marginally changing transcripts, which could arise in the context of chromosomal aneuploidy (26). This finding is consistent with our hypothesis that aneuploidy may contribute to tumor behavior not just by affecting the expression of a few key oncogenes and tumor suppressor genes, but also by subtly altering the expression levels of hundreds of genes in the oncogenome.

Our study has several potential limitations. First and most importantly, in the current implementation of WAVES, only expression regions unique to a particular cell line are identified (see Materials and Methods). This feature is almost certainly the reason why 53% of the CGH-aberrant bands could not be inferred from the gene expression data. Future versions of WAVES will contain enhancements to allow the identification of regions present in multiple cell lines. Second, we have focused in this study on the transcriptomes of cancer cell lines rather than solid tumors, and in vitro data may not fully reflect a tumor's in vivo behavior. However, a recent report has shown that the loss or gain of selected chromosomes in primary head and neck cancers is also strongly associated with alterations in gene expression over these large chromosomal regions (22). Third, the genomic aberrations in these cell lines were characterized using chromosomal CGH and SKY, and it is formally possible that these techniques, due to their low resolution, may have failed to detect small aberrant regions of the genome. However, a preliminary analysis correlating the chromosomal CGH results with BAC-array based CGH data, which can detect more fine-scale aberrations (~1 Mb resolution), has revealed a high degree of concordance in the locations of the aberrations commonly detected by both technologies (A. Aggarwal, data not shown).

Finally, we suggest that in addition to cancer, WAVES could also be useful in studying other biological problems involving regional patterns of gene expression, such as developmental conditions (e.g., trisomy 21 in Down's syndrome) or processes involving regional transcriptional control (e.g., heterochromatin). Previous reports have shown that regions of coordinated gene expression can also be observed in a variety of nonmalignant tissues and model organisms (27–30). Methodologies such as WAVES may thus represent useful tools in future efforts to understand how transcriptional information is physically encoded across a genome.


    Acknowledgments
 
Grant support: Biomedical Research Council of Singapore grants 01/01/31/19/209 (P. Tan) and 30/1/31/18/230 (O.L. Kon) and the Lee Foundation (National Cancer Centre).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

P. Tan thanks Hui Kam Man for his encouragement and support. A. Aggarwal thanks Dr. H. Shen for help with the wavelet transforms.


    Footnotes
 
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org).

Received 7/11/04. Revised 10/7/04. Accepted 10/27/04.


    References
 

  1. Albertson DG, Collins C, McCormick F, Gray JW. Chromosome aberrations in solid tumors. Nat Genet 2003;34:369–76.
  2. Lerebours F, Bertheau P, Bieche I, et al. Two prognostic groups of inflammatory breast cancer have distinct genotypes. Clin Cancer Res 2003;9:4184–9.
  3. Rennstam K, Ahlstedt SM, Baldetorp B, et al. Patterns of chromosomal imbalances defines subgroups of breast cancer with distinct clinical features and prognosis. A study of 305 tumors by comparative genomic hybridization. Cancer Res 2003;63:8861–8.
  4. Hyman E, Kauraniemi P, Hautaniemi S, et al. Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res 2002;62:6240–5.
  5. Virtaneva K, Fred AW, Tanner SM, et al. Expression profiling reveals fundamental biological differences in acute myeloid leukemia with isolated trisomy 8 and normal cytogenetics. Proc Natl Acad Sci U S A 2001;98:1124–9.
  6. Hughes TR, Roberts CJ, Dai H, et al. Widespread aneuploidy revealed by DNA microarray expression profiling. Nat Genet 2000;25:333–7.
  7. Pollack JR, Sorlie T, Perou CM, et al. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast cancer. Proc Natl Acad Sci U S A 2002;99:12963–8.
  8. Phillips JL, Hayward SW, Wang Y, et al. The consequences of chromosomal aneuploidy on gene expression profiles in a cell line model for prostate carcinogenesis. Cancer Res 2001;61:8143–9.
  9. Platzer P, Upender MB, Wilson K, et al. Silence of chromosomal amplifications in colon cancer. Cancer Res 2002;62:1134–8.
  10. Hüsing J, Zeschnigk M, Boes T, Jöckel KH. Combining DNA expression with positional information to detect functional silencing of chromosomal regions. Bioinformatics 2003;19:2335–42.
  11. Kallioniemi A, Kallioniemi OP, Sudar D, et al. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992;258:818–21.
  12. Schrock E, du Manoir S, Veldman T, et al. Multicolor spectral karyotyping of human chromosomes. Science 1996;273:494–7.
  13. Lio P. Wavelets in bioinformatics and computational biology: state of art and perspectives. Bioinformatics 2003;19:2–9.
  14. Mallat SG. A wavelet tour of signal processing. New York: Academic Press; 1998.
  15. Torrence C, Compo GP. A practical guide to wavelet analysis. Bull Am Meteorol Soc 1998;79:61–78.
  16. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 2003;100:9440–5.
  17. Murray KB, Gorse D, Thornton JM. Wavelet transforms for the characterization and detection of repeating motifs. J Mol Biol 2002;316:341–63.
  18. Ji J, Chen X, Leung SY, et al. Comprehensive analysis of the gene expression profiles in human gastric cancer cell lines. Oncogene 2002;21:6549–56.
  19. Perou CM, Sørlie T, Eisen MB, et al. Molecular portraits of human breast tumors. Nature 2000;406:747–52.
  20. Weigelt B, Glas AM, Wessels LF, Witteveen AT, Peterse JL, van't Veer LJ. Gene expression profiles of primary breast tumors maintained in distant metastases. Proc Natl Acad Sci U S A 2003;100:15901–5.
  21. Varis A, Wolf M, Monni O, et al. Targets of gene amplification and overexpression at 17q in gastric cancer. Cancer Res 2002;62:2625–9.
  22. Masayesva BG, Ha P, Garrett-Mayer E, et al. Gene expression alterations over large chromosomal regions in cancers include multiple genes unrelated to malignant progression. Proc Natl Acad Sci U S A 2004;101:8715–20.
  23. Mootha VK, Lindgren CM, Eriksson KF, et al. PGC-1α responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003;34:267–73.
  24. Kacser H, Burns JA. The molecular basis of dominance. Genetics 1981;97:639–66.
  25. Edwards JS, Ibarra RU, Palsson BO. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol 2001;19:125–30.
  26. Rasnick D, Duesberg PH. How aneuploidy affects metabolic control and causes cancer. Biochem J 1999;340:621–30.
  27. Cohen BA, Mitra RD, Hughes JD, Church GM. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet 2000;26:183–6.
  28. Caron HE, van Schaik B, van der Mee M, et al. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 2001;291:1289–92.
  29. Lercher MJ, Urrutia AO, Hurst LD. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet 2002;31:180–3.
  30. Roy PJ, Stuart JM, Lund J, Kim SK. Chromosomal clustering of muscle-expressed genes in Caenorhabditis elegans. Nature 2002;418:975–9.




