Identification of a Cholangiocarcinoma-Like Gene Expression Trait in Hepatocellular Carcinoma



Introduction
Hepatocellular carcinoma (HCC) and cholangiocarcinoma (CC) are the major primary liver cancers in adults. HCC is enormously heterogeneous with dismal clinical outcome (1), whereas CC is more difficult to diagnose and treat than HCC and has a worse prognosis (2). In addition, a rare form of combined hepatocellular-cholangiocarcinoma (CHC) has been reported to have characteristics intermediate between HCC and CC (3). Most HCCs and CCs are believed to be derived from hepatocytes and cholangiocytes, respectively. Meanwhile, CHC has been suggested to be derived from bipotential liver stem cells, which can differentiate into either hepatic or biliary progenitor cells (4). In addition, several studies have suggested a stem/progenitor cell origin of liver cancers including HCC (5,6), CC (7), and cholangiolocellular carcinoma (8). Such heterogeneity in the differentiation status of the cell of origin suggests a phenotypic overlap among HCC, CHC, and CC.
Liver stem cells are bipotential and can differentiate into either hepatic or biliary progenitor cells (9,10). However, the cancer phenotypes derived from biliary-committed cells in the developmental hierarchy have not been investigated thoroughly. In this study, to address the biological and clinical implications of the overlapping phenotype between biliary and hepatic cell traits in liver cancer, we applied an integrative genomic approach, comparing the expression profiles of HCC, CHC, and CC. By calculating the expression levels of cholangiocarcinoma-like traits (CC signature), we identified a novel subtype of HCC, cholangiocarcinoma-like HCC (CLHCC), which might be derived from biliary lineage cells. Further comparison of the cholangiocarcinoma-like trait with previously reported stem cell-derived traits, such as the embryonic stem (ES) cell-like (11) and hepatoblast-like (12) signatures, helped to elucidate the heterogeneous progression of liver cancers, implying cellular origins at different developmental stages.

Materials and Methods
Patients and diagnosis of tumor types. We prospectively collected intrahepatic HCC (n = 70), CHC (n = 7), and CC (n = 13) specimens from patients who underwent surgical resection at Seoul National University Hospital. Intraoperative ultrasonography confirmed that no distant metastases or space-occupying lesions existed in the nonresected remnant liver of any individual in this study. Patients with extrahepatic tumors were excluded. All patients were determined to have received curative resection by examining the surgical margin for residual tumor. The study protocol was approved by the Institutional Review Board of Seoul National University Hospital. Histologic diagnoses based on H&E-stained slides were independently reviewed by two experienced pathologists. Tumor types were validated by immunoreactivity for antihepatocyte antibody and CK7/CK19, respectively (i.e., HCC, +/−; CHC, +/+; CC, −/+). All CHC specimens were of mixed type C according to the Allen and Lisa classification (3).
Microarray experiments. Microarray experiments were performed using the Affymetrix HGU-133A2 GeneChip as described previously (13). Raw data from 90 tumors, including our previous data (13), were normalized by the Robust Multichip Average method (14) and then centered so that the mean value of each gene and of each sample was zero. When multiple gene features mapped to the same Entrez Gene identifier, the feature with the highest magnitude (i.e., sum of squares of the expression levels) was used as the representative gene feature.
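The centering and probe-selection steps described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy matrix, and the gene labels are hypothetical, and it assumes the magnitude is computed on the centered values, which the text does not state.

```python
def double_center(matrix):
    """Return a copy of `matrix` (rows = probes, columns = samples) whose
    row means and column means are all zero."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + grand_mean
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


def representative_probes(centered, probe_genes):
    """For each gene identifier, keep the index of the probe with the
    largest sum of squared expression values."""
    best = {}
    for index, (row, gene) in enumerate(zip(centered, probe_genes)):
        magnitude = sum(value * value for value in row)
        if gene not in best or magnitude > best[gene][1]:
            best[gene] = (index, magnitude)
    return {gene: index for gene, (index, _) in best.items()}


# Hypothetical toy data: three probes measured in three samples (log scale).
expression = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.5], [0.0, 4.0, 8.0]]
probe_genes = ["geneA", "geneA", "geneB"]  # probes 0 and 1 map to one gene
centered = double_center(expression)
chosen = representative_probes(centered, probe_genes)
```

With complete data, subtracting the row and column means once (and adding back the grand mean) is enough to make both sets of means zero, so no iteration is needed.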
Construction of gene expression compendium of HCC. The gene expression compendium of HCC (HCCcomp) was constructed by collecting six different data sets: data from the Laboratory of Experimental Carcinogenesis (i.e., National Cancer Institute; GSE1898 and GSE4024), GSE5975 (15), E-TABM-36 (16), GSE9843 (17), SNU (data from Seoul National University, GSE15765), and data from a new platform for formalin-fixed, paraffin-embedded tumors (GSE10186; ref. 18). Publicly available data were downloaded from the National Center for Biotechnology Information GEO or ArrayExpress databases. Non-HCC samples in the data sets were excluded, and each data set was normalized as described above. Finally, 712 HCC gene expression profiles were compiled.
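As a rough illustration of how separately normalized data sets can be combined into one compendium, the sketch below centers each gene within its own data set and then concatenates the samples. Restricting the compendium to genes shared by all data sets is an assumption made here for simplicity; the data set names, gene names, and values are hypothetical.

```python
def center_genes(dataset):
    """dataset maps gene -> list of values (one per sample).
    Returns the same structure with every gene mean set to zero."""
    centered = {}
    for gene, values in dataset.items():
        gene_mean = sum(values) / len(values)
        centered[gene] = [value - gene_mean for value in values]
    return centered


def build_compendium(datasets):
    """Center each data set separately, then join samples on shared genes.
    Returns (gene -> concatenated values, per-sample data set labels)."""
    shared = set.intersection(*(set(data) for data in datasets.values()))
    compendium = {gene: [] for gene in sorted(shared)}
    labels = []
    for name, data in datasets.items():
        centered = center_genes(data)
        n_samples = len(next(iter(data.values())))
        labels.extend([name] * n_samples)
        for gene in compendium:
            compendium[gene].extend(centered[gene])
    return compendium, labels


# Hypothetical toy input: two data sets with partly different gene content.
datasets = {
    "setA": {"g1": [1.0, 3.0], "g2": [2.0, 4.0], "g3": [0.0, 1.0]},
    "setB": {"g1": [10.0, 12.0, 14.0], "g2": [5.0, 5.0, 8.0]},
}
compendium, labels = build_compendium(datasets)
```

Centering within each data set before concatenation removes platform-specific offsets in the toy example, which is the purpose of normalizing each data set on its own.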
The signatures for stem cell-like expression traits and canonical pathways. A collection of ES-related signatures (i.e., ES1, ES2, sox2, oct4, myc1, and myc2), polycomb target gene signatures (i.e., Suz12, Eed, H3K27 bound, and PRC2), and a proliferation-related gene signature (prol) were obtained from a previous study (11). The rat hepatoblast (HB) gene signature expressed in early fetal liver development was also obtained from the publication site (12). Homologous human and rat genes were linked via the National Center for Biotechnology Information HomoloGene database. For the enrichment test of canonical functions, the gene sets of each biological process Gene Ontology (GO) term with at least three genes were used.
Estimation of the enrichment of gene expression signatures. The enrichment of a gene set in individual tumors was determined by the Kolmogorov-Smirnov test. Briefly, for each individual gene expression profile, two P values, for the estimates $D^{+}$ and $D^{-}$, were calculated by the Kolmogorov-Smirnov test, which determines the significance of the directional (positive or negative) enrichment of the distribution of the signature. The enrichment scores $S_{D^{+}}$ and $S_{D^{-}}$ for a given signature were calculated as $-\log_{10}(P)$ from $D^{+}$ and $D^{-}$, respectively. The enrichment score $S$ was defined as $S_{D^{+}}$ if $S_{D^{+}} > S_{D^{-}}$ and $-S_{D^{-}}$ if $S_{D^{+}} < S_{D^{-}}$. Samples with $|S| > 2$ (P < 0.01) were regarded as significantly enriched. The enrichment patterns across sample groups were determined by calculating the fraction of significantly enriched samples in a group, and the significance of the group enrichment was calculated by the hypergeometric test as described previously (11).
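The signed enrichment score can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the signature genes are compared with all remaining genes in the profile, the one-sided P value uses the large-sample approximation P ≈ exp(−2 D² nm/(n+m)), and "positive" is taken to mean that signature genes are shifted toward higher expression. All function names and the toy profile are hypothetical.

```python
import math


def ks_directional(signature_values, background_values):
    """One-sided two-sample KS statistics.
    d_plus is large when signature values tend to be higher than background,
    d_minus when they tend to be lower."""
    sig, bg = sorted(signature_values), sorted(background_values)
    n, m = len(sig), len(bg)
    d_plus = d_minus = 0.0
    i = j = 0
    for x in sorted(set(sig) | set(bg)):
        while i < n and sig[i] <= x:
            i += 1
        while j < m and bg[j] <= x:
            j += 1
        diff = j / m - i / n  # background ECDF minus signature ECDF at x
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus, d_minus


def minus_log10_p(d, n, m):
    """-log10 of the asymptotic one-sided P value exp(-2 d^2 nm/(n+m)),
    computed in closed form to avoid floating-point underflow."""
    return 2.0 * d * d * n * m / (n + m) / math.log(10)


def enrichment_score(profile, signature):
    """profile maps gene -> expression; signature is a set of genes.
    Returns S: positive for up-enrichment, negative for down-enrichment."""
    sig = [value for gene, value in profile.items() if gene in signature]
    bg = [value for gene, value in profile.items() if gene not in signature]
    d_plus, d_minus = ks_directional(sig, bg)
    s_plus = minus_log10_p(d_plus, len(sig), len(bg))
    s_minus = minus_log10_p(d_minus, len(sig), len(bg))
    # The text leaves the tie case undefined; ties are treated as negative here.
    return s_plus if s_plus > s_minus else -s_minus


# Hypothetical profile of 40 genes with expression values 0..39.
profile = {f"g{i}": float(i) for i in range(40)}
up_score = enrichment_score(profile, {f"g{i}" for i in range(30, 40)})
down_score = enrichment_score(profile, {f"g{i}" for i in range(10)})
```

A tumor would be called significantly enriched for a signature when the absolute score exceeds 2, which corresponds to P < 0.01 on the −log10 scale.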
Gene network analysis. A gene network was constructed using PathwayStudio (version 6.2; Ariadne Genomics). All direct interactions among the genes of a set of interest were identified from the curated database provided with the software. Of the subnetworks constructed from the gene set, the largest subnetwork with significant enrichment (P < 0.05, as calculated by the software) was identified as the key regulatory network.
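Selecting the largest subnetwork from a list of direct interactions amounts to finding the largest connected component of a graph. The sketch below shows only that step; it does not reproduce PathwayStudio or its enrichment P value, and the node names are hypothetical placeholders.

```python
from collections import defaultdict, deque


def largest_subnetwork(interactions):
    """interactions is an iterable of (node_a, node_b) pairs.
    Returns the set of nodes in the largest connected component."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical interactions forming one four-node and one two-node subnetwork.
edges = [("hub", "n1"), ("hub", "n2"), ("n2", "n3"), ("x1", "x2")]
key_network = largest_subnetwork(edges)
```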

Results
Clinical and transcriptomic characteristics of CC, CHC, and HCC. The clinical features of HCC, CHC, and CC are summarized in Supplementary Table S1. Compared with the HCC patients, the CC patients were much older (>55 years of age) and had lower serum levels of AFP and platelets. Histologically, all the CC tumors were of the multiple nodular type, implying aggressive behavior. Kaplan-Meier plot analyses revealed that the patients with CC had poorer recurrence-free survival (RFS) and overall survival (OS) than the patients with HCC (Supplementary Fig. S1A and B). The patients with CHC showed a poor prognosis similar to that of patients with CC, suggesting a closer resemblance between CC and CHC tumors, in agreement with a previous study (19).
First, we evaluated whether the expression profiles could classify the tumor types. Unsupervised clustering analysis showed that HCC and CC were well separated, indicating a marked difference between HCC and CC (Supplementary Fig. S1C). Five of the seven CHCs clustered together with the CC tumors, supporting the prognostic similarity between CHC and CC. No significant batch effect was found between the newly added HCC samples and the previous ones (Supplementary Fig. S1C). Next, to verify that conserved tumor type characteristics were reflected in the gene expression profiles, the functional characteristics of each tumor type were evaluated by calculating the signature enrichment in the GO hierarchy. Considering the heterogeneity of the expression patterns in individual tumors, the signature enrichment analysis was applied to each tumor (for details, see Materials and Methods). The HCC specimens were enriched with metabolism- and immune-related functions, whereas the CC specimens were enriched with development/differentiation- or metastasis/adhesion-related functions (Supplementary Fig. S2). The CHCs showed patterns similar to those of the CCs. These data indicate that the distinct prognostic values of each tumor class are well reflected in the gene expression patterns. Therefore, we suggest that the prognostic differences among the tumor types are bona fide tumor characteristics rather than casual associations with other clinical factors such as operability of the tumors, time to diagnosis, or the patient's health condition.
Identification of a novel subtype CLHCC. Next, to determine the expression of cholangiocarcinoma-like traits in HCC, we identified genes differentially expressed between HCC and CC (n = 2,188) by using two-sample t tests with 10,000 permutations (P < 0.001) and fold differences >2 (false discovery rate < 0.0023). These genes included many putative candidates for prognostic biomarkers as well as biomarkers for the differential diagnosis of CC. Notably, well-known biomarkers for CC or hepatic progenitor cells, such as KRT19 (CK19, 6.0-fold), TACSTD1 (EpCam, 4.1-fold), and PROM1 (CD133, 3.0-fold), were identified. Known CC biomarkers such as CEACAM6 (20), MUC1 (21), and CLDN4 (22) were also identified, indicating the usefulness of the CC signature as a source of novel differential biomarkers for CC. Supervised clustering of the tumors with these genes revealed that a fraction of HCCs (14 of 70) clustered together with the CC samples. These tumors, referred to as CLHCC, showed shorter RFS (P = 5.21 × 10⁻⁶) and OS (P = 0.004) than the other HCCs (log-rank test; Fig. 1B and C). No significant association with clinical features was found (Supplementary Table S2).
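A per-gene version of the selection criteria above (permutation P < 0.001 and fold difference > 2) might look like the following sketch. Several details are assumptions rather than facts from the text: a Welch-type t statistic is used, the test is two-sided, expression values are on a log2 scale so that a twofold difference equals a mean difference of 1, and the P value uses the (hits + 1)/(permutations + 1) estimator. The expression values are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def welch_t(a, b):
    """Two-sample t statistic with unequal variances."""
    mean_a, mean_b = mean(a), mean(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    spread = math.sqrt(var_a / len(a) + var_b / len(b))
    if spread == 0.0:
        if mean_a == mean_b:
            return 0.0
        return math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / spread


def permutation_p(a, b, n_perm=10000, seed=0):
    """Two-sided permutation P value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_differential(a, b, p_cut=0.001, min_fold=2.0):
    """Apply both filters: permutation P value and fold difference
    (values are assumed to be log2 expression levels)."""
    log2_diff = mean(a) - mean(b)
    return permutation_p(a, b) < p_cut and abs(log2_diff) > math.log2(min_fold)


# Hypothetical log2 expression of one gene in 10 HCC and 8 CC samples.
hcc = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.1, 4.7, 5.0, 4.9]
cc = [7.6, 7.9, 7.4, 8.1, 7.7, 7.8, 7.5, 8.0]
p_value = permutation_p(hcc, cc)
selected = is_differential(hcc, cc)
```

With 10,000 permutations the smallest attainable P value is 1/10,001, which is why that number of permutations is enough to resolve the P < 0.001 threshold.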
Independent validation of the prognostic value of the CLHCC. To validate and characterize CLHCC in independent data sets with a larger sample size, we constructed a gene expression compendium of HCC (HCCcomp, n = 712) by concatenating six independent HCC data sets. We defined the most differentially expressed genes between CLHCC and other HCCs as classifiers for CLHCC, which we refer to as the cholangiocarcinoma-like expression trait in HCC (i.e., CC signature, n = 625; false discovery rate < 0.0073; Supplementary Table S3). The HCC tumors were then classified based on the expression status of the CC signature. Because the clustering-based classification shown in Fig. 1 could be influenced by the sample composition of the data set being tested, we calculated individual enrichment scores of the CC signature based on the Kolmogorov-Smirnov test. The tumors expressing both the upregulated (CC_UP, P < 0.01) and downregulated (CC_DOWN, P < 0.01) genes were classified as C1 (representing CLHCC, n = 190), whereas the other tumors were classified as C2 (n = 522). Notably, each subtype in the C1-C2 classification showed homogeneous expression patterns independent of microarray platforms and patient cohorts, which supports the consistency and robustness of our classification (Fig. 2A).
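The C1/C2 rule can be written down compactly. The sketch below assumes that "expressing" the signature means a positive enrichment score above 2 for CC_UP together with a negative score below −2 for CC_DOWN (both corresponding to P < 0.01); this reading of the rule, the function name, and the example scores are assumptions for illustration.

```python
def classify_tumor(score_cc_up, score_cc_down, cutoff=2.0):
    """Return "C1" when CC_UP genes are significantly up-enriched and
    CC_DOWN genes are significantly down-enriched; otherwise "C2"."""
    if score_cc_up > cutoff and score_cc_down < -cutoff:
        return "C1"
    return "C2"


# Hypothetical signed enrichment scores (CC_UP, CC_DOWN) for three tumors.
scores = {"tumor1": (4.2, -3.1), "tumor2": (2.5, -0.4), "tumor3": (-1.0, 0.8)}
classes = {tumor: classify_tumor(up, down) for tumor, (up, down) in scores.items()}
```

Because each tumor is scored on its own profile, the assignment does not depend on which other samples happen to be in the data set, unlike a clustering-based classification.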
For independent validation, the prognostic values of C1 and C2 tumors were evaluated in two independent cohorts of Chinese (n = 61) and Caucasian (n = 78) patients from the Laboratory of Experimental Carcinogenesis data set, which has been described in a previous study (5). The C1 tumors showed shorter RFS and OS than the C2 tumors in both cohorts (Fig. 2B and C). In addition, univariate and multivariate analyses in the validation data set (n = 139; Table 1) and the test data set (n = 70; Supplementary Table S4) showed that the C1-C2 classification was significant in predicting RFS as well as OS. Taken together, we suggest that the CC signature is a strong predictor of poor prognosis.
Similar to CC, GO analysis showed that the C1 tumors were enriched with proliferation-, metastasis/adhesion-, and development-related functions, reflecting their aggressive phenotype (Fig. 3A and B). By contrast, the C2 tumors were enriched with metabolism-related genes, which might be due to the high metabolic rate of well-differentiated hepatocyte-derived HCC. Next, we sought to identify key regulators of the CC signature. Of the genes that directly interacted with the CC_UP genes (n = 251), the TP53 expression subnetwork was identified as the most prominent subnetwork with significant enrichment (P = 0.036), indicating that the finding was unlikely to be observed by chance (Fig. 3C; Supplementary Table S5). These results might support a regulatory role of TP53 in the aggressive phenotype of CLHCC.
Downloaded from cancerres.aacrjournals.org on October 3, 2017. © 2010 American Association for Cancer Research.
Comparison of the expression of ES and CC signatures. Considering the hypothesis that CHC originates from bipotential liver stem/progenitor cells (4), we examined whether CLHCC (C1) expresses stem cell-like traits. We evaluated multiple ES-related signatures (i.e., ES1, ES2, and the target genes for Nanog, Oct4, Sox2, and Myc) and the polycomb group target signatures (i.e., Suz12, Eed, H3K27 bound, and PRC2 target genes), which are known to play important roles in maintaining the undifferentiated state of ES cells (11, 23-27). Strikingly, the C1 tumors showed significant enrichment of the ES signatures and combined repression of polycomb target genes (Fig. 4A). Another stemness trait, a hepatoblast-derived signature (HB signature), was also evaluated and showed expression patterns similar to those of the ES signatures. In addition, to exclude the possible influence of proliferation-related genes in the ES signature, we subtracted them from the CC, HB, and ES signatures (noprol), but no significant influence was found (Fig. 4A). When we compared the genes in these signatures, only 36 HB genes (5.8%) and 26 ES1 genes (4.2%) overlapped with the CC signature genes (Supplementary Fig. S3A), implying that the prognostic difference between the C1 and C2 classes is not likely to be confounded by coenrichment of the ES or HB signatures. As expected, subtracting the HB signature and/or ES signature genes (i.e., ES1) from the CC signature genes (i.e., CC_noHB, CC_noES, and CC_noHBnoES) had no significant influence on the CC signature enrichment (Fig. 4B). Taking these results together, we suggest that the prognostic value of the CC signature is independent of the coexpression of the HB, ES, or proliferation-related genes.
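The overlap and subtraction of signatures are plain set operations, as the short sketch below shows. The gene identifiers are synthetic placeholders; only the set sizes (625 CC signature genes, 36 shared with HB, 26 shared with ES1) follow the text, and the assumption that the two overlaps are disjoint is made here purely for illustration.

```python
# Synthetic identifiers: "cc*" genes belong to the CC signature.
cc_signature = {f"cc{i}" for i in range(625)}
hb_signature = {f"cc{i}" for i in range(36)} | {f"hb{i}" for i in range(100)}
es1_signature = {f"cc{i}" for i in range(36, 62)} | {f"es{i}" for i in range(100)}


def overlap_percent(signature, reference):
    """Percentage of `reference` genes that also occur in `signature`."""
    return 100.0 * len(signature & reference) / len(reference)


hb_overlap = overlap_percent(hb_signature, cc_signature)
es_overlap = overlap_percent(es1_signature, cc_signature)

# Reduced signatures analogous to CC_noHB, CC_noES, and CC_noHBnoES.
cc_no_hb = cc_signature - hb_signature
cc_no_es = cc_signature - es1_signature
cc_no_hb_no_es = cc_signature - hb_signature - es1_signature
```

The reported percentages are consistent with the 625-gene CC signature as the denominator: 36/625 rounds to 5.8% and 26/625 rounds to 4.2%.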
For independent validation, we next evaluated the relationship between CC and ES signatures in each data set that was compiled in HCCcomp. Remarkably, all six data sets showed significant enrichment of ES signatures in C1 tumors compared with C2 tumors. These findings strongly indicate the robustness and consistency of the coexpression of CC and ES signatures regardless of microarray platforms and patient cohorts (Fig. 4C).
Next, when we examined the individual tumors, not all the C1 tumors coexpressed the ES signature (Supplementary Fig. S3B). This may imply that the CC and ES signatures are to some extent expressed distinctly, which might reflect intermediate cellular origins during the sequential developmental stages from primitive liver stem cells to biliary-committed cells. We therefore investigated whether the combined expression status of the ES and CC signatures could reflect distinct prognostic phenotypes. The HB signature was not considered in this analysis because of its intermediate property between the ES and CC signatures. When we reclassified the 209 tumors for which survival data were available (Laboratory of Experimental Carcinogenesis and Seoul National University) into four classes based on the expression of the CC and ES signatures (i.e., CC+ES+, CC+ES−, CC−ES+, and CC−ES−), the CC+ES+ tumors showed the worst prognosis for both RFS (hazard ratio, 2.84; 95% confidence interval, 1.51-5.34; P = 7.42 × 10⁻⁴) and OS (hazard ratio, 2.98; 95% confidence interval, 1.79-4.98; P = 1.2 × 10⁻⁵) compared with the CC−ES− tumors, which showed the best prognosis (Fig. 4D). The CC+ES− (n = 21) and CC−ES+ (n = 26) tumors showed intermediate prognostic values, indicating a correlation between prognostic outcome and the expression status of the CC and ES signatures. Although further validation is required, this finding might support the idea that the cellular origin at different developmental stages plays a pivotal role in the heterogeneous clinical outcome of HCC.

Discussion
In this study, we addressed the heterogeneity of HCC by identifying a fraction of HCCs expressing cholangiocarcinoma-like traits. Although we profiled a relatively small number of CC samples, the large expression difference between CC and HCC (similar to "apples and oranges") allowed us to identify a robust CC signature. The functional and clinical relevance of the CC signature was further validated in independent data sets (Fig. 2B and C).
The CC signature was concomitantly expressed with the ES or HB signatures, suggesting stem-like features of CLHCC, and this was validated in all six independent HCC data sets. In fact, biliary markers such as CK-7 and CK-19 are frequently used as hepatic progenitor cell markers (10,28,29). Therefore, the expression of stem-like traits or biliary traits alone may not be specific enough to discriminate tumors of stem cell origin from tumors of biliary cell origin. Further evaluation using HCCcomp revealed the presence of CC+ES− and CC−ES+ tumors, indicating an intermediate transition in the expression of those signatures. The ES signature is presumed to be derived from more primitive and pluripotent stem cells. The HB signature might be expressed in tumors derived from HB cells, whereas the CC signature might be expressed in tumors derived from biliary lineage cells, including premature and mature cholangiocytes. Based on this concept, the cellular origin of HCC along the sequential developmental stages could be postulated from the expression status of the ES, HB, and CC signatures, as illustrated in Supplementary Fig. S4. This suggests that the different cellular origins of HCC during the transition from pluripotent cells to differentiated CC or HCC might play a critical role in the heterogeneous progression of HCC. However, we cannot exclude the possibility that transdifferentiation and dedifferentiation of hepatocytes or cholangiocytes during cancer development contribute to the acquisition of these signatures.
Our analysis, which dissects heterogeneous HCC based on the expression of the CC signature, could classify the tumors into homogeneous and distinct prognostic phenotypes. Targeting molecular pathways specific to such subpopulations would be more effective for the development of personalized clinical strategies (30). We suggest that the TP53 pathway might play a pivotal role in the development of CC signature-expressing HCC (CLHCC). The association of TP53 mutation with poor prognosis is well known in many cancer types (reviewed in ref. 31). Moreover, recent studies have shown a critical role of TP53 in the control of neural and glioma stem/progenitor cell renewal and differentiation (32,33). These findings are consistent with a pivotal role of TP53 in the aggressive progression of HCC harboring cholangiocarcinoma-like or stem-like traits.
In conclusion, our findings provide novel biological and clinical insights into the cholangiocarcinoma-like traits of HCC, emphasizing the critical role of the developmental stage of the cell of origin in HCC pathogenesis.

Figure 1. Identification of CLHCC. A, supervised clustering of CC, CHC, and HCC based on the expression of the CC signature (left). The top 20 most differentially expressed genes between CC and HCC are indicated (right). B and C, Kaplan-Meier plot analyses for RFS (B) and OS (C) between HCC and CLHCC.

Figure 2. Validation of the CC signature in HCCcomp. A, the gene expression profiles of HCCcomp were classified into two groups based on the expression of the CC signature (top). The enrichment scores for the CC_UP and CC_DOWN signatures (bottom). B and C, Kaplan-Meier plots for RFS (left) and OS (right) in independent cohorts of Chinese (n = 61, B) and Caucasian (n = 78, C) patients, respectively. The follow-up times for RFS and OS are truncated to 5 years.

Figure 3. Functional characteristics of the CC signature. A, the functional enrichment scores in the GO hierarchy for each group (C1 and C2). A total of 196 GO terms were significantly enriched in at least one group (P < 0.01, hypergeometric test). B, the enrichment scores in each group are shown by bar-views (right bar). The enriched patterns of the GO terms for the four functional categories are indicated by different colors (the rightmost bar). The four functional categories were chosen according to their prominence in the enrichment scores from each group. Other GO terms that are not prominent in the functional grouping are indicated as "unclassified." C, network analysis indicates TP53 as the most prominent regulator of the CC_UP signature with significant enrichment (P < 0.036).

Figure 4. Comparison of the CC signature with ES signatures. A, the enrichment of ES signatures and polycomb target gene sets in C1 and C2 classes in HCCcomp. The enrichment scores of ES signatures without the proliferation signature (noprol; right bar). B, bar plots for the enrichment scores of the CC_UP (top) and CC_DOWN (bottom) signatures and of the CC_UP and CC_DOWN signatures after subtraction of the ES (noES), HB (noHB), or ES and HB (noESnoHB) signatures. C, the enrichment of ES signatures in six independent data sets is shown. For each data set, the group enrichment in C1 and C2 tumors is indicated in the right bars (P < 0.05). D, Kaplan-Meier plot analyses for RFS (left) and OS (right) based on the expression status of the CC and ES signatures in the integrated Laboratory of Experimental Carcinogenesis and Seoul National University data sets (n = 209). CC+ represents C1 tumors, and ES+ represents the tumors that express the ES1 signature.

Table 1. Univariate and multivariate analyses for the selected clinical factors in the validation data set