Molecular Biology, Pathobiology, and Genetics |
1 Translational Medicine, GlaxoSmithKline, King of Prussia, Pennsylvania; 2 Center for Applied Cancer Science, the Belfer Institute for Innovative Cancer Science and 3 Department of Medical Oncology, Dana-Farber Cancer Institute; 4 Department of Dermatology, Harvard Medical School, Boston, Massachusetts; and 5 Abramson Family Cancer Research Institute, University of Pennsylvania, Philadelphia, Pennsylvania
Requests for reprints: Lynda Chin, Department of Medical Oncology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115-6084. Phone: 617-632-6091; Fax: 617-632-6069; E-mail: lynda_chin@dfci.harvard.edu.
| Abstract |
|---|
| Introduction |
|---|
Recent years have witnessed major advances in copy number profiling technologies beyond traditional metaphase comparative genomic hybridization (CGH; ref. 4). Early efforts used large PCR-amplified sequences as probes, typically bacterial artificial chromosomes (BAC; ref. 5) or cDNAs (6), and provided a resolution of 1 to 2 Mb. The large probe sequences of these BACs, with stringent preselection, provided robust responses to copy number variations, where nonamplified complex targets exhibited sufficiently low noise levels for consistent detection of single copy aberrations (7). Continued development of BAC clone-based assays has produced arrays with complete sequence coverage of the human genome (8), increasing the effective resolution to 80 kb. In parallel, a new generation of oligonucleotide-based platforms has taken advantage of synthesized sequences to achieve dense coverage, shedding the dependence on (and thus the limitation of) clone libraries. This diverse generation of array-based copy number assays has inherent flexibility in design and versatility in applications while permitting fabrication of high-density microarrays.
A major difference between these oligonucleotide microarray platforms and the gold-standard BAC-based platforms is probe length. Whereas BAC clones, typically 150 kb in length, provide a high degree of specificity for the fragmented target sequences, the relatively small sizes of synthesized oligonucleotides offer lower signal to noise ratios for each probe. Optimization in labeling and hybridization protocols coupled with analytic development [e.g., circular binary segmentation (9) that avoids defining alterations associated with a single probe] has shown that sufficient signal to noise could be achieved with 60-mer oligonucleotide probes in full-complexity genomic hybridization (10). Single-channel single nucleotide polymorphism (SNP)-based microarrays designed for genotyping have also been adopted for copy number analysis (11, 12). Such SNP microarrays depend on a separate data set to establish a copy number reference against which to differentiate diploid from aberrant. Of note, genomic hybridization onto these short oligonucleotide probe arrays (typically 25 nucleotides long) generally uses an adaptor ligation PCR step before labeling to reduce target complexity.
Although this is a fast-evolving technological front, several 60-mer and SNP oligonucleotide microarray assays are now routinely used for copy number profiling. Thus, we reasoned that a systematic assessment of these established platforms with objectively defined variables would generate a well-controlled data set that not only informs investigators in their experimental design but also facilitates development of the next generation of improved assays as well as new analytic tools for copy number analyses that address the common and unique computational challenges presented by each platform. To this end, we generated copy number profiles of a defined set of tumor cell lines on five oligonucleotide microarray-based assays of three platforms (Agilent, Affymetrix, and NimbleGen) and determined the reproducibility, signal and noise, as well as sensitivity and specificity of each in detecting 2-fold signals based on spectral karyotyping (SKY)-defined aberrations as ground truth for comparison. In addition, high-density microarray assays from the Agilent and Affymetrix platforms were further compared for definition of CNAs in an independent data set using published analytic approaches. All primary and processed profiling data were deposited in the National Center for Biotechnology Information's Gene Expression Omnibus repository for public access, analyses, and tool development (series GSE7822).
| Materials and Methods |
|---|
Platforms. In phase 1, all seven cell lines were profiled in duplicate on five oligonucleotide-based microarray platforms (Table 1). These include Agilent 44K (AG44K) and 185K (AG185K) microarrays, the NimbleGen 1500K (NG1500K) density microarray, and Affymetrix 100K (AF100K; Centurion) and 500K (AF500K; Mendel) chips. AG185K served as a prototype microarray for the currently commercialized AG244K CGH microarray. Both the Agilent and NimbleGen platforms use a dual-channel competitive hybridization protocol, whereas the Affymetrix platforms use single-channel hybridization. CGH profiles on NimbleGen were generated by the manufacturer, whereas the AG185K array CGH profiles were generated by Agilent Laboratories. The AF100K and AF500K profiles were generated by the Children's Hospital of Philadelphia microarray facility and GlaxoSmithKline. Additionally, to serve as a reference, these seven cell lines were also assayed on a full tiling path Human reArray BAC clone array. Previously published data on these cell lines from a 1-Mb resolution BAC array were downloaded from public resources (14). In phase 2, 18 cell lines were profiled without duplicates on the AF500K chip by Expression Analysis, Inc. and on AG244K by the Belfer Cancer Genomics Center at the Dana-Farber Cancer Institute. In all cases, factory-recommended hybridization protocols were followed as closely as possible for each platform.
Analysis. Probes for every assay were mapped to human genome build 36 (March 2006) using data provided by the University of California at Santa Cruz genome browser site or by the vendor. For each probe on every platform, a log2 copy number ratio was measured from raw data derived from the scanned image. For dual-channel arrays, this ratio was calculated by dividing the test channel image intensity by that of the reference channel for every probe. Probe-wise ratios were calculated for the single-channel Affymetrix chips by comparing the "perfect match" intensities with the range of intensities seen in the reference chip set using the dChip software package (15) and methods described in ref. 12. Ratios of duplicate clones were averaged for all assays. Subsequently, every assay was normalized under the assumption that the median copy number was diploid, such that the median log2 ratio was zero. Details of this process for each platform can be found in the supplement.
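The dual-channel ratio calculation and median normalization described above can be illustrated with a minimal sketch. The function names and the intensity values are hypothetical and chosen only for illustration; the platform-specific processing actually used is described in the supplement and is not reproduced here.

```python
import math
from statistics import median

def log2_ratios(test, reference):
    """Per-probe log2(test / reference) for a dual-channel array."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def median_center(ratios):
    """Shift log2 ratios so their median is zero (median assumed diploid)."""
    m = median(ratios)
    return [x - m for x in ratios]

# Hypothetical intensities for five probes (illustrative values only).
test_channel = [200.0, 400.0, 800.0, 400.0, 100.0]
reference_channel = [100.0, 100.0, 100.0, 100.0, 100.0]

raw = log2_ratios(test_channel, reference_channel)
normalized = median_center(raw)
```

After centering, a probe at the median intensity ratio reports a log2 ratio of zero, so gains and losses are read relative to the assumed diploid baseline.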
To quantify the probe-wise signal response of each platform, copy number alterations were identified at four loci in cell line SKY data. These consisted of three distinct subchromosomal gains (4n) ranging from approximately 18 to 103 Mb and one 100 Mb loss (1n). Additionally, four regions of at least 45 Mb in size representing diploidy (2n), one in each matching cell line, were identified to serve as a reference for comparison. A signal to noise ratio was calculated as follows.
$$\mathrm{SNR}_k = \frac{\left|\bar{x}_k\right|}{\sigma_k}$$

where $\bar{x}_k$ is the mean log2 ratio of the probes reporting on aberrant region k and $\sigma_k$ represents the log2 SD from diploid region k. The degree to which individual probe measurements can accurately distinguish gains and losses was estimated by recalculating probe ratio scores for these regions with sliding windows of 1, 3, and 7 probes. This procedure also supplied a platform-specific ratio threshold: for every platform, the ratio with the minimal degree of overlap was taken as the most appropriate boundary for discriminating gains and losses in subsequent analyses. Platform specificity for individual probes was defined simply as the proportion of probes that were diploid (reference region defined by SKY) that would be classified as aberrant given a sliding log2 ratio threshold, whereas sensitivity was the reverse scenario (aberrant classified as diploid). Given the optimal ratio thresholds defined by receiver operating characteristic (ROC) analysis, the false-positive rate (FPR) for a platform was calculated by querying the proportion of diploid probes that would be classified as aberrant. The false-negative rate (FNR) is the proportion of aberrant probes classified as 2n.
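The following sketch illustrates how sliding-window averaging reduces probe-level misclassification at a fixed threshold. It assumes that the signal to noise ratio is the absolute mean log2 ratio of the aberrant region divided by the SD of the matched diploid region, and it handles gains only; the function names, probe values, and the 0.35 threshold are hypothetical.

```python
from statistics import mean, stdev

def sliding_mean(values, window):
    """Mean of each run of `window` consecutive probes."""
    return [mean(values[i:i + window])
            for i in range(len(values) - window + 1)]

def signal_to_noise(aberrant, diploid):
    """|mean of aberrant probes| / SD of diploid probes (assumed form)."""
    return abs(mean(aberrant)) / stdev(diploid)

def error_rates(gained, diploid, threshold):
    """FPR: diploid probes called gained. FNR: gained probes called diploid."""
    fpr = sum(x > threshold for x in diploid) / len(diploid)
    fnr = sum(x <= threshold for x in gained) / len(gained)
    return fpr, fnr

# Hypothetical log2 ratios for a 2n region and a 4n region.
diploid = [0.4, -0.4, 0.2, -0.2, 0.4, -0.4, 0.2, -0.2]
gained = [0.9, 0.3, 0.9, 0.3, 0.9, 0.3, 0.9, 0.3]

snr = signal_to_noise(gained, diploid)
single_probe = error_rates(gained, diploid, 0.35)
three_probe = error_rates(sliding_mean(gained, 3),
                          sliding_mean(diploid, 3), 0.35)
```

In this toy example both error rates fall to zero with a three-probe window, at the cost of a coarser effective resolution.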
The area under the curve (AUC) of each ROC curve was calculated by dividing the curve evenly into 10,000 pieces along the X axis. Each piece was approximated as a rectangle, and the AUC was the sum of the areas of the 10,000 rectangles.
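A minimal sketch of this rectangle approximation is given below. Two details are assumptions not stated in the text: the height of each rectangle is taken at the midpoint of its piece, and the ROC curve is linearly interpolated between the points obtained by sliding the threshold. The function names and probe values are hypothetical, and only gains (aberrant ratios above diploid ratios) are considered.

```python
from bisect import bisect_right

def roc_points(aberrant, diploid):
    """(FPR, TPR) pairs obtained by sliding a log2 ratio threshold downward."""
    thresholds = sorted(set(aberrant) | set(diploid), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        fpr = sum(x >= t for x in diploid) / len(diploid)
        tpr = sum(x >= t for x in aberrant) / len(aberrant)
        points.append((fpr, tpr))
    return points  # the last point is (1.0, 1.0)

def auc_rectangles(points, pieces=10_000):
    """Sum of `pieces` equal-width rectangles under the ROC curve."""
    xs = [p[0] for p in points]
    width = 1.0 / pieces
    area = 0.0
    for i in range(pieces):
        x_mid = (i + 0.5) * width
        # Last ROC point at or left of the midpoint, and its right neighbor.
        j = bisect_right(xs, x_mid) - 1
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        height = y0 + (y1 - y0) * (x_mid - x0) / (x1 - x0)
        area += height * width
    return area

# Hypothetical probe values: fully separated regions, then identical ones.
auc_separated = auc_rectangles(roc_points([1.0, 0.9, 0.8], [0.1, 0.0, -0.1]))
auc_identical = auc_rectangles(roc_points([0.0, 1.0], [0.0, 1.0]))
```

Fully separated regions give an AUC of 1, and indistinguishable regions give 0.5, which brackets the values a platform can report.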
Probe set–independent comparisons between platforms were calculated for each assay using standard circular binary segmentation (9) for both phase 1 and phase 2 analyses. Focal amplifications (gains of more than approximately five copies) and homozygous losses were identified by querying regions 1 Mb or smaller that were assigned a log2 copy number ratio two times the calibrated gain/loss threshold described above.
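The focal event query can be sketched as a filter over segmented data. This sketch assumes segments are given as (start, end, mean log2 ratio) tuples and interprets "two times the threshold" as "at least two times"; the thresholds of 0.4 and -0.4 and the segment coordinates are hypothetical.

```python
def focal_events(segments, gain_threshold, loss_threshold,
                 max_length=1_000_000):
    """Return segments of at most `max_length` bp whose mean log2 ratio
    reaches twice the calibrated gain or loss threshold."""
    events = []
    for start, end, value in segments:
        if end - start > max_length:
            continue  # too large to count as a focal event
        if value >= 2 * gain_threshold:
            events.append((start, end, "amplification"))
        elif value <= 2 * loss_threshold:
            events.append((start, end, "homozygous loss"))
    return events

# Hypothetical segments: (start bp, end bp, mean log2 ratio).
segments = [
    (1_000_000, 1_500_000, 1.2),     # short, high amplitude
    (5_000_000, 30_000_000, 1.0),    # high amplitude but 25 Mb long
    (40_000_000, 40_200_000, -1.5),  # short, deep loss
    (50_000_000, 50_300_000, 0.5),   # short, below twice the threshold
]
calls = focal_events(segments, gain_threshold=0.4, loss_threshold=-0.4)
```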
In phase 2, CNAs were defined as described in previous studies. A "segmented" data set was generated by determining uniform copy number segment boundaries and then replacing the raw log2 ratio for each probe with the mean log2 ratio of the segment containing the probe. Segment values at the 98th and 2nd percentiles were used as the amplification and deletion thresholds, respectively. All 18 samples in the AG244K data set were mode centered based on segmented data before generating CNAs. All 18 samples in the AF500K SNP data set were also baseline adjusted based on one chromosome for each sample in the AG244K data set. Supplementary Table S5 shows that the median log2 ratio of the listed chromosomes is the same in both the AG244K and AF500K data sets after baseline adjustment.
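As an illustration of the segmented data set and mode centering, the sketch below takes segment boundaries as given (in the study they come from circular binary segmentation, which is not reimplemented here). The function names, the boundary representation, and the probe values are hypothetical.

```python
from statistics import mean, mode

def segment_means(ratios, boundaries):
    """Replace each probe's log2 ratio with the mean of its segment.

    `boundaries` lists the start index of every segment followed by
    len(ratios), e.g. [0, 4, 6, 8] for three segments over eight probes.
    """
    segmented = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        seg_mean = mean(ratios[lo:hi])
        segmented.extend([seg_mean] * (hi - lo))
    return segmented

def mode_center(segmented):
    """Subtract the most frequent segmented value from every probe."""
    baseline = mode(segmented)
    return [x - baseline for x in segmented]

# Hypothetical probe-level log2 ratios with one gained segment.
ratios = [0.25, 0.25, 0.25, 0.25, 1.5, 1.0, 0.25, 0.25]
segmented = segment_means(ratios, [0, 4, 6, 8])
centered = mode_center(segmented)
```

Centering on the mode rather than the median keeps the baseline on the most common copy number state even when a large fraction of the genome is altered.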
| Results and Discussion |
|---|
Seven melanoma cell lines (Supplementary Table S2A) were expanded in vitro and harvested for metaphase spreads and genomic DNA isolation. Cytogenetic profiles of each cell line were generated in house by SKY, and copy number profiles on all platforms were generated by commercial vendors or expert core facilities (see Materials and Methods) outside of the authors' laboratory to minimize bias due to technical familiarity. To control for the effect of DNA quality on data, the same genomic DNA preparations were used for all platforms. The resultant data set was analyzed for reproducibility, signal, noise, sensitivity, and specificity of 2-fold copy number alteration detection as well as identification of known focal CNAs.
Reproducibility. The first variable we determined was the reproducibility of replicate hybridizations for each assay. Here, reproducibility was measured by comparing correlation scores between replicates. As shown in Table 1, the Agilent-optimized CGH arrays offered the highest degree of reproducibility in replicate hybridizations, whereas the single-channel Affymetrix SNP arrays were intermediate in this respect. For both Agilent and Affymetrix platforms, the degree of reproducibility was higher for the higher-density assays (AG44K versus AG185K with P = 0.0436, paired t test; AF500K versus AF100K with P = 0.0223, paired t test), likely reflecting more consistent detection of focal aberrations with additional reporting probes as well as design/manufacturing advances (e.g., probe selection). Although it is conventional to do duplicate hybridizations for dual-channel assays (e.g., Agilent and NimbleGen) and a single hybridization for one-channel assays (e.g., Affymetrix), to achieve the most parallel comparison possible, all subsequent analyses in this study are based on a single hybridization for either dual- or single-channel assays.
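A replicate correlation score can be sketched as follows. The text does not state which correlation measure was used, so Pearson correlation of probe-wise log2 ratios is an assumption here, and the replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of log2 ratios."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical probe-wise log2 ratios from two replicate hybridizations.
replicate_1 = [0.0, 1.0, -1.0, 0.5]
replicate_2 = [0.0, 2.0, -2.0, 1.0]   # same pattern, larger amplitude
replicate_3 = [0.3, 0.6, -0.2, -0.4]  # noisier replicate

r_consistent = pearson(replicate_1, replicate_2)
r_noisy = pearson(replicate_1, replicate_3)
```

Because the correlation is insensitive to overall amplitude, it captures agreement in the pattern of gains and losses rather than agreement in absolute signal.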
Sensitivity and specificity in detection of regional gain and loss. A 2-fold change in copy number translates into detection of 1n (heterozygous loss) or 4n (two copy gain) relative to the baseline 2n (diploid) genome. The ability to detect these low-amplitude events depends on absolute signals and signal to noise ratios, thus serving as an ideal test to assess the robustness of a platform. Here, we first determined the ploidy and defined regions of "ground truth" for comparison based on the SKY profiles (16). In particular, four of the seven cell lines were selected, each determined to be predominantly diploid and harboring large contiguous genomic regions (>20 Mb) with 2-fold gain (4n; WM983C, WM88, and Lu1205) or 2-fold loss (1n; WM1366; Supplementary Figs. S1–4; Table 2). Within these defined genomic regions, individual probe values from each of the platforms were used for calculation of signal and noise. Absolute signal was calculated as the mean of the probe values reporting on regions of 4n or 1n, whereas noise was defined as the SD of probes reporting on the defined 2n region. On a log2 scale, the theoretical maximum for a 2-fold signal is 1.0. As shown in Table 2, the strongest absolute signal achieved was 0.93 by the AG185K microarray in the WM88 cell line. Among the three cell lines with 4n gain (WM983C, WM88, and Lu1205), signal was poorest for WM983C in all assays regardless of platform, consistent with the fact that this cell line consists of two major subpopulations as revealed by SKY (Supplementary Table S3). In other words, heterogeneity within a sample will result in lower signals for observed CNAs, a variable of importance when one considers analyses of primary tumor tissues consisting of both tumor and stromal populations.
0.95 on the resultant ROC curves (Fig. 2B), resulting in an "effective" resolution of 14 kb, down from 2 kb. Similarly, for both AF100K and AF500K platforms, a smoothing window of three consecutive probes enabled them to perform comparably with the Agilent platforms, with AUCs of 0.97, resulting in effective resolutions of 76.2 and 17.7 kb, respectively.
100 kb covered by only 6 probes on the AF100K SNP array, in contrast to 8 to 48 probes by the other microarrays. In the case of WM35, where all other platforms detected a CDKN2A deletion, the NimbleGen platform showed a hemizygous loss, suggesting that the probes surrounding this locus may not offer optimal signal.
1.5 Mb) was well covered on all of the microarrays, likely contributing to the high level of concordance. In contrast, amplification of SNAI2, a transcriptional repressor associated with neural crest cell development and migration required for metastasis of transformed melanoma cells (17), was identified only by the high-density microarrays (AG185K, AF500K, and NG1500K). The detection failure by the AG44K and AF100K platforms can be explained by poor probe coverage, where these platforms had only one and two probes, respectively, reporting within the 150 kb minimal common region identified by the higher-density platforms. Therefore, high-density coverage across the genome offers an important advantage in detection of focal events.
Genome-wide catalogues of CNAs. The phase 1 study above showed comparable detection of three known CNA events by high-density Agilent and Affymetrix platforms; thus, we next compared these two platforms in cataloguing known and unknown CNAs in a cohort of 18 melanoma cell lines. For this phase 2 comparison, the highest-density microarrays available at the time (i.e., AF500K and AG244K) were used. As before, the same preparations of genomic DNA from all 18 cell lines were used for profiling. The Agilent profiles were generated by the authors' laboratory for this part of the study, whereas the Affymetrix profiles were generated by a commercial vendor. All 18 profiles from both platforms were processed by the circular binary segmentation algorithm (9) and CNAs were defined as previously reported (see Materials and Methods; refs. 12, 18, 19). Unlike phase 1 of the study, it is not possible to generate a ground-truth CNA list against which to compare performance of these two platforms. Therefore, we limited our analyses here to concordance of high-amplitude CNAs between platforms.
First, we defined a list of CNAs with amplitudes in the top or bottom 2% of all segment values detected by each platform (see Materials and Methods); this translated to log2 thresholds of approximately twice the optimal thresholds for 2-fold signal detection (e.g., log2 ratio >0.868 or <–0.858 versus >0.498 or <–0.471 for AG244K versus AF500K, respectively). As summarized in Table 4, the AG244K platform defined a total of 485 unique CNAs (260 amplifications and 225 deletions) among these 18 cell lines, whereas the AF500K detected 476 CNAs (177 amplifications and 299 deletions). Collectively, 29% (215 of 749) of these unique CNAs were common between the two platforms. Concordance among the amplification events seemed higher (137 of 300, 46%) than that for estimated homozygous deletions (78 of 447, 17%).
Our phase 1 comparison on known CNA detection (Table 3) indicated that one important contributing factor to discordance between platforms is the absolute signal of detection by a platform. Thus, we next asked whether the peak log2 ratio among the concordant CNAs was higher than that among the nonconcordant events. Indeed, among the 182 CNAs defined by AF500K, the average peak log2 ratio for the 123 concordant events was 0.85 as opposed to 0.66 for the 59 nonconcordant CNAs (P = 0.00018, t test). Similarly, among the 361 CNAs defined by AG244K, the average peak log2 ratio for the 194 concordant CNAs was 1.76 as opposed to 1.30 for the 167 nonconcordant CNAs (P = 2.65E–6, t test). In other words, high-amplitude events are more likely to be of higher confidence.
Lastly, given the design differences between these two platforms, we expected that CNAs detectable only by AF500K profiles would be more likely to reside in genomic regions with fewer annotated genes. Indeed, of the 182 AF500K CNAs, an average of 5.1 annotated genes mapped to the 59 nonconcordant events, whereas an average of 11.7 genes resided within the 123 concordant CNAs (P = 0.04, t test). On the other hand, the average number of resident genes in AG244K-defined CNAs that were concordant or nonconcordant with AF500K was not different (7.9 versus 7.6; P = 0.823, t test). Therefore, CNAs targeting genetic elements other than annotated genes are more likely to be missed by the AG244K platform.
Conclusion. Taking advantage of SKY to define large regional events as ground truth for comparison, we were able to determine a set of variables, including reproducibility, absolute signals, and signal to noise as well as ROC curves, with which to compare objectively the robustness of five oligonucleotide microarray-based genome-wide copy number assays from three different platforms. These comparisons convincingly showed that longer oligonucleotide probes optimized for genomic hybridization offer the most robust detection of CNAs. Furthermore, increased density of probe coverage not only improves resolution but also enhances confidence of detection by providing more data points reporting on a particular genomic event. An advantage of Affymetrix platform over Agilent is its broader and more even coverage across the genome, increasing probability of detecting CNAs targeting noncoding genetic elements. On the other hand, Agilent CGH microarrays offer more robust and focal detection of CNAs targeting gene-rich regions. Availability of these data sets should encourage computational algorithm development for improved copy number modeling.
| Acknowledgments |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
| Footnotes |
|---|
J. Greshock and B. Feng contributed equally as first author.
Received 6/6/07. Revised 8/7/07. Accepted 8/21/07.
| References |
|---|