Identification of etiology of human cancers is important for effective cancer prevention, and attempts to estimate the roles of a variety of environmental carcinogens in human cancers are being made. Here, we applied cDNA microarray technology to estimate whether gene expression profiles of cancers would reflect their etiology. Using rat mammary carcinoma models, expression profiles were analyzed in two groups of carcinomas induced by distinct carcinogens but with the same histological classification. Four carcinomas induced by 7,12-dimethylbenz[a]anthracene (DMBA) and three carcinomas induced by 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP) and a high-fat diet were analyzed by a GeneChip oligonucleotide microarray that contained ∼8000 rat genes. By hierarchical clustering analysis, the seven carcinomas were classified into two groups that exactly coincided with the DMBA-induced and the PhIP-induced groups. The correlation coefficient between the two groups was 0.63, and those between any carcinomas within each group ranged from 0.78 to 0.95. In addition, characteristic clusters of genes were also identified that highlighted distinct and common characteristics of both groups. Seventeen genes were down-regulated in the DMBA and up-regulated in the PhIP-induced groups. Thirty-three genes were regulated in the opposite manner. Our results indicated that gene expression profiles in cancers reflect their etiology and suggested a possibility that etiology of cancers could be retrospectively estimated from their expression profiles.
Clarification of the etiology of human cancers and estimation of the roles of etiologic factors are among the most important issues in cancer research. If the specific contributions of environmental carcinogenic factors to human cancers could be estimated, it would be possible to take reasonable measures to avoid them with minimal expense and sacrifice of daily comfort. Epidemiology has played an important role in the identification of major carcinogenic factors, such as smoking, viral infection, and carcinogens in the diet, and in the estimation of their roles in human cancers. However, there are many more carcinogenic factors in the environment whose roles in carcinogenesis are difficult to estimate with traditional epidemiology alone (1, 2, 3). Molecular epidemiology is one approach to estimating the carcinogens involved in human cancers, making use of signature mutations left in the cancer (4, 5).
cDNA microarray technology has provided new information in many aspects of tumor biology (6). The expression profiles in cancers were shown to reflect their histological characteristics, clinical outcomes, and responses to treatment (7, 8, 9, 10, 11, 12). In addition, the expression profiles were used to develop clinical biomarkers (13, 14, 15). However, no reports are available regarding the correlation between the etiology of human cancers and their gene expression profiles, except for hepatocellular carcinomas caused by hepatitis viruses (16). This is because the etiologic factors of human cancers are complex and very difficult to determine.
For this reason, animal models are expected to provide good resources, because animal cancers are induced by defined protocols and the carcinogenic factors involved are clear. Rat mammary carcinomas can be reproducibly induced by well-established methods using DMBA³ (17), PhIP (18), and NMU (19). The induced mammary carcinomas predominantly originate from mammary ducts (17) and show smaller phenotypic variations (20) and fewer genetic alterations than their human counterparts (21, 22). In addition, normal mammary glands can be isolated in pure form from inguinal fat by the gland isolation technique.
In this study, we analyzed the expression profiles of rat mammary carcinomas induced by DMBA and of those induced by PhIP, a representative food-borne carcinogen (23), using a cDNA microarray. We classified these carcinomas based on their gene expression profiles and assessed whether the profiles reflect their etiology.
Materials and Methods
Mammary Carcinomas and Normal Mammary Ducts.
To induce mammary carcinomas by DMBA, 30 female (F344 × SD)F1 rats at the age of 7 weeks were administered a single dose of DMBA (50 mg/kg) in corn oil by gavage. Mammary carcinomas were induced in 16 of 30 rats at the ages of 25–32 weeks. To induce mammary carcinomas by PhIP and a high-fat diet, 33 female (F344 × SD)F1 rats were given 10 doses of PhIP (75 mg/kg/day) in water at 6 weeks of age by gavage and were fed a diet with 23.5% corn oil. Mammary carcinomas were induced in 15 of 33 rats at the ages of 56–72 weeks (24). Macroscopic tumors were histologically examined by two experienced pathologists (K. M., S. F.). Normal mammary ducts were collected from age-matched untreated female (F344 × SD)F1 rats by the gland isolation technique for mammary ducts (24). The tumors and mammary ducts were kept frozen at −80°C until extraction of total RNA. Total RNA was isolated with Isogen (Nippon Gene, Tokyo, Japan).
GeneChip Rat Genome U34A arrays, which contained probe sets for ∼8000 rat genes, were purchased from Affymetrix (Santa Clara, CA). Five μg of total RNA were used as starting material for cDNA preparation. The first-strand cDNA was synthesized with the SuperScript II reverse transcriptase (Invitrogen, Groningen, The Netherlands) and a T7-(dT)24 primer (Amersham Pharmacia Biotech, Buckinghamshire, United Kingdom). The double-strand cDNA was synthesized with Escherichia coli RNase H, E. coli DNA polymerase I, and E. coli DNA ligase (Toyobo, Tokyo, Japan). Biotin-labeled cRNA was prepared using a HighYield RNA transcript labeling kit (Affymetrix). After in vitro transcription, the unincorporated nucleotides were removed using the RNeasy Mini kit (Qiagen, Valencia, CA). Twenty μg of each labeled cRNA were fragmented, and its quality was assessed by gel electrophoresis.
The labeled cRNA was placed in a hybridization mixture containing four biotinylated hybridization controls (BioB, BioC, BioD, and Cre), as recommended by the manufacturer, and the U34A arrays were hybridized for 16 h at 40°C with constant rotation (60 rpm). After washing, the GeneChip arrays were stained with streptavidin-phycoerythrin conjugate (Molecular Probes, Eugene, OR) and then scanned with a GeneArray scanner (Hewlett-Packard, Palo Alto, CA). The scanned images were processed using the Affymetrix GeneChip Analysis Suite (Version 4.0.1) to obtain the “average difference value” for each probe set. The image from each array was scaled so that the average of the average difference values of all probe sets was adjusted to 1000. The values of “fold change” were obtained with normalization for all probe sets. Scaled average difference values and fold change data from each GeneChip array were exported to flat text files and used for statistical analysis. The complete data set is available at our web site.⁴
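The scaling step can be illustrated with a minimal Python sketch. It assumes a plain arithmetic mean over all probe sets, as described above; the GeneChip Analysis Suite may differ in detail (for example, in how outlying probe sets are handled), and the function name and input values are hypothetical.

```python
def scale_to_target(avg_diff_values, target=1000.0):
    """Rescale one array so that the mean of its average difference values equals target.

    Assumption: a simple arithmetic mean is used; the vendor software may differ.
    """
    mean_value = sum(avg_diff_values) / len(avg_diff_values)
    if mean_value == 0:
        raise ValueError("cannot scale an array whose mean signal is zero")
    factor = target / mean_value
    return [value * factor for value in avg_diff_values]


# Invented average difference values for three probe sets on one array
raw = [250.0, 500.0, 750.0]
scaled = scale_to_target(raw)
```

Because every array is brought to the same mean signal, values from different arrays become comparable before fold changes are calculated.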
Statistical comparisons were performed on the 2000 transcripts with the highest intensity from each GeneChip array. Before clustering, the fold change values of the genes were log10 transformed. The log-transformed fold change values were filtered for presence in six or more of the seven carcinomas. Average-linkage hierarchical clustering of an uncentered Pearson correlation similarity matrix was applied with the program Cluster, and the figures were generated with the program TreeView (25). Similarities of gene expression profiles between two given carcinomas were assessed by the Pearson correlation coefficient (25).
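As an illustration only, the filtering and clustering steps described above can be sketched in pure Python. The sketch assumes that the fold changes have already been log transformed and that missing values are encoded as None. The function names and toy profiles are hypothetical, and the pairwise average-linkage rule shown here is the textbook form, which may not match the implementation in the Cluster program exactly.

```python
import math


def filter_genes(rows, min_present=6):
    """Keep genes (rows) that have a value in at least min_present samples."""
    return [row for row in rows
            if sum(v is not None for v in row) >= min_present]


def uncentered_pearson(x, y):
    """Pearson correlation with both means fixed at zero (cosine similarity)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    dot = sum(a * b for a, b in pairs)
    norm_x = math.sqrt(sum(a * a for a, _ in pairs))
    norm_y = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_x * norm_y)


def average_linkage(similarity):
    """Merge clusters by highest mean pairwise similarity.

    Returns the merges in order as (members_a, members_b, mean_similarity).
    """
    clusters = [(i,) for i in range(len(similarity))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                values = [similarity[i][j]
                          for i in clusters[a] for j in clusters[b]]
                mean_sim = sum(values) / len(values)
                if best is None or mean_sim > best[0]:
                    best = (mean_sim, a, b)
        mean_sim, a, b = best
        merges.append((clusters[a], clusters[b], mean_sim))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Toy log-transformed profiles: rows are genes, columns are samples D1, D2, P1, P2
genes = [
    [1.0, 0.9, -0.6, -0.7],
    [0.8, 1.0, -0.5, -0.4],
    [-0.5, -0.4, 0.9, 1.0],
    [-0.6, -0.7, 1.0, 0.8],
    [0.2, None, None, None],  # dropped: present in only one sample
]
kept = filter_genes(genes, min_present=3)
samples = [list(column) for column in zip(*kept)]
similarity = [[uncentered_pearson(a, b) for b in samples] for a in samples]
merges = average_linkage(similarity)
```

In this toy example the two "D" samples merge first, the two "P" samples merge next, and the two resulting clusters are joined last at a much lower similarity.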
cDNA was synthesized from total RNA with oligo (dT)12–18 primer and SuperScript II reverse transcriptase (Invitrogen). Quantitative RT-PCR analysis was performed using an iCycler iQ detection system (Bio-Rad Laboratories, Hercules, CA) with SYBR Green PCR Core Reagents (Applied Biosystems). The sequences of the primers and the annealing temperatures are listed in Table 1. The number of molecules of a specific gene in a sample was measured by comparing its amplification with the amplifications of standard samples that contained 10¹–10⁶ copies of the gene and was normalized to that of cyclophilin (26). Fold change of the gene in a carcinoma was calculated by dividing the normalized value of the carcinoma by that of the normal control mammary duct.
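The absolute quantification and normalization described above can be sketched as follows. The sketch assumes that the standard curve is a least-squares line relating the threshold cycle (Ct) to the log10 copy number, which is a common approach but is not stated in the text. All function names and numerical values are hypothetical.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)


def fold_change(target_tumor, reference_tumor, target_normal, reference_normal):
    """Ratio of reference-normalized copy numbers (carcinoma over normal duct)."""
    return (target_tumor / reference_tumor) / (target_normal / reference_normal)


# Invented standards containing 10^1 to 10^6 copies
standard_log10 = [1, 2, 3, 4, 5, 6]
standard_ct = [36.7, 33.4, 30.1, 26.8, 23.5, 20.2]
slope, intercept = fit_standard_curve(standard_log10, standard_ct)

sample_copies = copies_from_ct(30.1, slope, intercept)  # close to 1000 copies
example_fold = fold_change(4000.0, 1000.0, 500.0, 500.0)
```

Here the reference arguments stand for the cyclophilin copy numbers measured in the same samples.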
Histology of the Tumors Induced.
All of the mammary carcinomas induced by DMBA and PhIP were diagnosed as adenocarcinomas. Four DMBA-induced and three PhIP-induced carcinomas were selected based on availability for histological and GeneChip analyses. Histological examination of these seven carcinomas, even after retrial, did not reveal special features for DMBA-induced or PhIP-induced carcinomas (data not shown).
Gene Expression Profiling of the Mammary Carcinomas.
The four DMBA-induced and three PhIP-induced mammary carcinomas were analyzed by GeneChip microarrays. From the 2000 genes, a set of 1564 genes was selected for cluster analysis with the criterion that the data of each gene in the set were present in six or seven carcinomas. Hierarchical cluster analysis separated these seven carcinomas into two main branches in concordance with the carcinogens used to induce them (Fig. 1). The correlation coefficients between any two of the four DMBA-induced carcinomas ranged from 0.88 to 0.95, and those between any two of the three PhIP-induced carcinomas ranged from 0.78 to 0.80. The correlation coefficient between the DMBA-induced and the PhIP-induced carcinomas was 0.63. These results showed that, although the gene expression profiles of the chemically induced rat mammary carcinomas were relatively similar, distinct gene expression profiles existed in the DMBA-induced and the PhIP-induced mammary carcinomas.
Genes with Altered Expressions.
Several clusters of genes showed characteristic expression patterns that highlighted differences and similarities between the DMBA-induced and PhIP-induced mammary carcinomas (bars in Fig. 1A). Clusters A1 and A2 consisted of genes down-regulated in the DMBA group and up-regulated in the PhIP group, and genes in cluster A1 were more clearly down-regulated than those in cluster A2. Of the 19 genes in cluster A1, 17 genes showed significantly lower expression levels in the DMBA group than in the PhIP group (P < 0.01 by Student’s t test; Fig. 1B).
On the other hand, clusters C1, C2, and C3 consisted of genes up-regulated in the DMBA group and down-regulated in the PhIP group, and genes in cluster C3 were more clearly up-regulated than those in clusters C1 and C2. All of the 33 genes in cluster C3 showed significantly higher expression levels in the DMBA group than in the PhIP group (P < 0.01 by Student’s t test). Glutathione S-transferase IV (Gstp1, Rn.44821), cathepsin D, and MHC genes were in this cluster (Fig. 1C).
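For readers who wish to reproduce this type of group comparison, a minimal sketch of Student's t test with pooled variance is given below. With four and three carcinomas there are 5 degrees of freedom, for which the two-sided critical value at P = 0.01 is 4.032. The expression values are invented for illustration, and the function name is hypothetical.

```python
import math


def pooled_t_statistic(group_a, group_b):
    """Student's t statistic (equal-variance form) and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Two-sided critical value of t for P = 0.01 with 5 degrees of freedom
CRITICAL_T_DF5_P01 = 4.032

# Invented log-transformed fold changes for one gene
dmba = [-1.1, -1.0, -0.9, -1.0]
phip = [0.4, 0.5, 0.6]

t_value, df = pooled_t_statistic(dmba, phip)
significant = df == 5 and abs(t_value) > CRITICAL_T_DF5_P01
```

With only seven samples the test has limited power, so a gene passes at this level only when the two groups are clearly separated relative to the within-group spread.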
Cluster B consisted of genes down-regulated in both the DMBA and PhIP groups, whereas cluster D consisted of genes up-regulated in both groups. Of the 83 genes in cluster B, 37 (45%) were down-regulated with fold change values >2.5 in all seven mammary carcinomas, including tissue inhibitor of metalloproteinase-1 (Timp1, Rn.25754) and transforming growth factor-β-inducible early response gene (Tieg, Rn.2398). Of the 109 genes in cluster D, 68 (62%) were up-regulated with fold change values >2.5 in all seven mammary carcinomas, including cyclin D1 (Ccnd1, Rn.9471) and DNA methyltransferase (Dnmt1, Rn.6955).
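The selection of genes changed by more than 2.5-fold in every carcinoma can be sketched as follows. The sketch assumes a signed fold change convention in which down-regulation is reported as a negative value. The function name, gene labels, and values are hypothetical.

```python
def consistently_changed(fold_changes, threshold=2.5, direction="up"):
    """True if every sample exceeds the threshold in the given direction.

    Assumption: fold changes are signed, with down-regulation as negative values.
    """
    if direction == "up":
        return all(fc > threshold for fc in fold_changes)
    if direction == "down":
        return all(fc < -threshold for fc in fold_changes)
    raise ValueError("direction must be 'up' or 'down'")


# Invented fold changes for three genes across seven carcinomas
cluster = {
    "gene_a": [3.1, 4.0, 2.8, 5.2, 3.3, 2.9, 6.0],
    "gene_b": [3.1, 4.0, 1.8, 5.2, 3.3, 2.9, 6.0],  # one carcinoma below threshold
    "gene_c": [2.6, 2.7, 3.0, 2.9, 4.1, 3.5, 2.8],
}
selected = [name for name, values in cluster.items()
            if consistently_changed(values)]
fraction = len(selected) / len(cluster)
```

Requiring the threshold in all seven carcinomas is stricter than requiring it on average, because a single weakly changed sample excludes the gene.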
Confirmation of Data from GeneChip Analysis by Quantitative RT-PCR Analysis.
The data obtained by the GeneChip analysis were confirmed by quantitative RT-PCR analysis of 12 genes, which were selected as those with typical expression in each cluster (Table 2). Changes in opposite directions were occasionally observed by quantitative RT-PCR analysis, but only when the fold changes in the GeneChip analysis were between −1.3 and 1.6. Specific changes observed in clusters A1, B, C3, and D were reproducible in the quantitative RT-PCR analysis.
Expression analysis of ∼8000 rat genes in four DMBA-induced and three PhIP-induced rat mammary carcinomas revealed that the carcinomas could be classified into two carcinogen-specific groups by nonbiased hierarchical analysis. Any two carcinomas within the DMBA-induced or PhIP-induced groups showed much higher correlation coefficients (0.78 or higher) than the average among all of the carcinomas (Fig. 1). Because PhIP and a high-fat diet were used to induce carcinomas in the PhIP-induced group, the expression profile in the PhIP-induced carcinomas would reflect the combined effect of PhIP and a high-fat diet. It is known that the level of dietary fat can modulate the gene expression levels of β-casein and transferrin in PhIP-induced mammary carcinomas (27). Histologically, all of the DMBA-induced and PhIP-induced mammary carcinomas were diagnosed as adenocarcinomas, and even after retrial, the carcinomas could not be classified into two groups with specific etiologies. These results showed that the gene expression profiles of the rat mammary carcinomas reflected their etiology, which could not be estimated by histological examination. If a subset of genes that are tightly associated with specific carcinogens can be identified, the etiology of a specific cancer could be retrospectively estimated from its expression profile.
Two mechanisms could generate expression profiles specific to DMBA- or PhIP-induced mammary carcinomas. First, carcinogens are known to induce genetic alterations relatively specific to each carcinogen (28, 29, 30). These genetic alterations would affect their downstream signaling pathways, which would be reflected in the expression profiles of the resultant tumors. DMBA is known to induce predominantly A:T to T:A transversions (31), whereas PhIP induces G:C to T:A transversions and G:C deletions (24), and these differences are considered to lead to different target genes. The H-ras mutation frequencies and incidences of loss of heterozygosity are reported to differ between the PhIP-induced and DMBA-induced mammary carcinomas (22, 32, 33, 34). Second, carcinogens are known to induce specific responses in normal cells (35, 36). DMBA has a stronger effect on serum prolactin and estradiol levels than PhIP, whereas PhIP shows stronger suppression of apoptosis in the mammary glands (37). Even after repeated clonal selections during carcinogenesis, some of these specific responses might be retained.
Genes differentially expressed between DMBA- and PhIP-induced mammary carcinomas are expected to be related to the molecular pathways specific to each group. Genes up-regulated in the DMBA-induced carcinomas and down-regulated in the PhIP-induced carcinomas (cluster C3) included genes related to immune reactions, such as a rat major histocompatibility gene and tumor rejection antigen 1 (38). This suggested that stronger immune responses were induced in the DMBA-induced carcinomas. Genes commonly up- or down-regulated in DMBA- and PhIP-induced mammary carcinomas were expected to include genes related to the mammary carcinoma phenotype. The increased expression of cyclin D1 observed in both groups was in accordance with a previous report (39).
We showed the presence of etiology-specific expression profiles in chemically induced rat mammary carcinomas. This suggested the presence of etiology-specific expression profiles in human cancers also, and this could provide clues to identifying carcinogenic agents in a certain cancer based on its gene expression profile.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
↵1 Supported by a Grant-in-Aid for the Second-term Cancer Control Strategy and by a Grant-in-Aid for the Millennium Genome Project from the Ministry of Health, Labor and Welfare.
↵2 To whom requests for reprints should be addressed, at Carcinogenesis Division, National Cancer Center Research Institute, 5-1-1, Tsukiji, Chuo-ku, Tokyo, 104-0045, Japan. Phone: 81-3-3547-5240; Fax: 81-3-5565-1753; E-mail:
↵3 The abbreviations used are: DMBA, 7,12-dimethylbenz[a]anthracene; PhIP, 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine; NMU, N-methyl-N-nitrosourea; RT-PCR, reverse transcription-PCR.
↵4 Internet address: http://www.ncc.go.jp/research/rat-genome/.
- Received February 15, 2002.
- Accepted May 17, 2002.
- ©2002 American Association for Cancer Research.