Abstract
The presence of tumor-infiltrating lymphocytes (TIL) is a favorable prognostic factor in breast cancer, but what drives immune infiltration remains unknown. Here we examine if clonal heterogeneity, total mutation load, neoantigen load, copy number variations (CNV), gene- or pathway-level somatic mutations, or germline polymorphisms (SNP) are associated with immune metagene expression in breast cancer subtypes. Thirteen published immune metagenes correlated separately with genomic metrics in the three major breast cancer subtypes. We analyzed RNA-Seq, DNA copy number, mutation and germline SNP data of 627 ER+, 207 HER2+, and 191 triple-negative (TNBC) cancers from The Cancer Genome Atlas. P-values were adjusted for multiple comparisons, and permutation testing was used to assess false discovery rates. Increased immune metagene expression associated significantly with lower clonal heterogeneity estimated by MATH score in all subtypes and with a trend for lower overall mutation, neoantigen, and CNV loads in TNBC and HER2+ cancers. In ER+ cancers, mutation load, neoantigen load, and CNV load weakly but positively associated with immune infiltration, which reached significance for overall mutation load only. No highly recurrent single gene or pathway level mutations associated with immune infiltration. High immune gene expression and lower clonal heterogeneity in TNBC and HER2+ cancers suggest an immune pruning effect and equilibrium between immune surveillance and clonal expansion. Thus, immune checkpoint inhibitors may tip the balance in favor of immune surveillance in these cancers. Cancer Res; 77(12); 3317–24. ©2017 AACR.
Introduction
The presence of immune infiltration in the breast cancer microenvironment is a favorable prognostic marker particularly among triple-negative (TNBC), HER2+ and highly proliferative estrogen receptor (ER) positive cancers (1). High levels of immune infiltration, measured as either TIL count or expression of immune-cell related genes, predicts for better survival with or without systemic adjuvant therapy in early stage disease (2–6). Additionally, breast cancers that are rich in immune cells, regardless of subtype, have higher rates of pathologic complete response (pCR) to neoadjuvant chemotherapy (7, 8). The extent of immune infiltration is higher in TNBC and HER2+ cancers than in ER+ disease (7). However, within each subtype there is great variability in TIL counts ranging from no TILs in 10% to 20% of cancers to lymphocyte predominant cancers (i.e., >50% of stromal cells are lymphocytes) in 5% to 10% of cases (4, 7). The biological mechanisms underlying the variable TIL infiltration are unknown.
In a pooled analysis of solid tumors in The Cancer Genome Atlas (TCGA) database, the total number of somatic mutations and the number of new antigen epitopes (i.e., neoantigen load) correlated with immune infiltration (9–11). In hepatocellular, squamous cell lung cancer, and colorectal carcinomas greater number of copy number alterations were associated with higher immunogenicity (12–14). On the basis of these observations one can hypothesize that the more genomic alterations a cancer has, the greater the immune infiltration is, due to more immunogenic neoantigens in these cancers. Somatic mutations in the PI3KCA and MAPK genes were also shown to affect the immune microenvironment (15–17). Germline polymorphisms influence predisposition to immune disorders and response to infectious agents (18–20) and one could therefore speculate that they may also influence antitumor immune response.
The goal of this study was to systematically examine what DNA-level genomic alterations are associated with immune cell infiltration, measured by immune metagene expression, and if these associations differ by breast cancer subtype. We tested if either (i) total mutation load, (ii) neoantigen load, (iii) copy number variations (CNV), (iv) intratumor genomic heterogeneity, (v) gene-level or (vi) biological pathway level somatic mutations, or (vii) germline single-nucleotide variants (SNV) are associated with immune gene expression. Although associations do not imply a cause and effect relationship, they could lead to testable hypotheses in the laboratory and in the clinic.
Materials and Methods
Data sources
We obtained gene-level RNA-Seq expression (n = 1,066), level-4 copy number (n = 1,080), and germline SNV data (n = 501) and corresponding clinical information from TCGA public access portal. Supplementary Table S1 lists the TCGA samples included in this study. Gene-level somatic mutation data (n = 817 cases, n = 14,440 mutations) were obtained from Ciriello and colleagues (21). DNA segments were assigned to copy number categories based on GISTIC threshold scores. We filtered SNVs using the Duplicated Genes Database (DGD), removed rare variants and variants deviating from Hardy–Weinberg Equilibrium, and retained only SNPs with moderate or high functional impact (n = 8,861) using Variant Effect Predictor (22) in the final analysis.
Breast cancer subtypes were defined as (i) ER+/HER− (hereafter referred to as ER+), (ii) HER+ with any ER status (HER2+), and (iii) ER and HER2− (TNBC) based on the routine clinical information available for the samples (n = 1003 for ER and n = 892 for HER2). This routine clinical classification was chosen over PAM50 subtyping because of its more direct clinical applicability and to maintain consistency with previous immune marker studies in breast cancer. When clinical receptor status was unavailable or equivocal (n = 63), HER2 and ER status was assigned on the basis of mRNA expression of ERBB2 and ESR1, respectively. The final sample size for this study was n = 627 ER+ cases, n = 207 HER2+ cases, and n = 191 TNBC cases.
Analysis plan
The expression levels of 13 previously reported immune metagenes were calculated as the mean of the log2-transformed expression of the member genes (5–23). These metagenes correspond to various immune cell types and reflect various immune functions (Supplementary Table S2). The prognostic and chemotherapy response predictive value of each of these metagenes were previously assessed in the TCGA and also independent data sets (5, 23). In some analysis we selected the LCK metagene that showed a high average coexpression with other immune signatures and also correlated significantly with histologic tumor infiltration lymphocyte count, as the single representative measure of immune infiltration for correlation with global genomic metrics.
Neoantigen load data were taken from a previous publication (23). Overall deletion load was defined as the number of genes with GISTIC value of “−2,” and amplification load was defined as the number of genes with GISTIC value of “+2,” indicating definite deletion or amplification of a given segment, respectively. Somatic mutations in TCGA whole exome sequencing samples were detected using MuTech, as previously described (24). Mutation load was calculated as the number of somatic mutations in a sample, normalized by the total length of sequences with adequate read coverage. Mutational heterogeneity was measured using the MATH score, which uses the variance of the variant allele frequency distribution to approximate clonal heterogeneity, however this metric is influenced by the combined effect of clonality and copy number alterations (25). Correlation between immune metagene expression as continuous variable and the genomic metrics were assessed using the Spearman rank correlation coefficient. Significance was assessed by using the upper tail probabilities of Spearman's rho (26).
The association between nonsynonymous somatic mutations or high/intermediate functional impact germline SNVs and immune infiltration was assessed with linear regression after variants were collapsed at gene level. Histologic subtype (infiltrating ductal vs. lobular carcinoma) and the mutation load were included as additional covariates. The P-values were adjusted by calculating empirical FDRs (≤10%). Associations between somatic mutations and immune metagene expression were assessed in a “discovery” analysis, which included all genes with mutation frequency >3%, and a “candidate gene” analysis that included genes from biological pathways related to antigen presenting, cytokines, chemokines, angiogenesis-related signaling (27, 28), the MAPK pathway (29, 30), cell adhesion, and epithelial-to-mesenchymal transition (31). These pathways contained a total of 910 unique genes (Supplementary Table S3).
The association between copy number alterations and immune infiltration was assessed using linear regression of metagene expression as a function of either amplifications or deletions, with histologic subtype and the background rate of copy number alterations included as covariates. Contiguous regions with significant copy number effects (P < 0.05) were defined as copy number peaks. To obtain a null distribution for significance testing, the immune metagene expression value was permuted 500 times and copy number peak significance scores were generated for each permutation. The quantile of each true peak within this null distribution was assigned as the adjusted P value.
For pathway level analysis, we assembled 714 biological pathways from the NCI Pathway Interaction and BioCarta Pathway databases that correspond to most known biological functions (32). For each pathway, we defined an “aberration ratio score” calculated as the number of genes affected by either mutation or copy number change (GISTIC score of +2 or −2), divided by the total number of genes in the pathway. We examined the association between immune metagene expression and pathway aberration scores using linear regression including the histological diagnosis as covariate. To calculate significance, we constructed random gene sets with the same number of genes as a given pathway from our pathway gene pool and calculated aberration scores and their correlation with immune gene expression for these random sets in 1,000 iterations. The coefficients were compiled into a null distribution for each pathway. An observed coefficient from the unperturbed data was considered significant if it was >95% percentile of the null distribution.
Results
Correlation between immune metagenes and global genomic metrics
The expression distribution of 13 immune metagenes in the three breast cancer subtypes is shown in Supplementary Fig. S1 and the correlations between metagene expressions are presented in Fig. 1. The lymphocyte-specific kinase (LCK) metagene showed high average correlation with other immune metagenes across all subtypes and this metagene has also showed a strong correlation with histologic TIL counts in breast cancer samples in a previous study (5), which we have also observed in our data (Supplementary Fig. S2), therefore we selected this metagene as the best single measure of immune infiltration. When compared across breast cancer subtypes, the LCK metagene expression, mutation count, neoantigen load and amplification, and deletion loads were all higher in TNBC compared to the other breast cancer subtypes (Supplementary Fig. S3). When all breast cancers are analyzed together, these genomic metrics correlate closely with TIL and immune gene expression. This is because TNBCs are higher, and ER+ cancers are lower, in both measures. Supplementary Fig. S4 shows the correlations between the 13 immune metagenes and 5 exome-wide genomic features including all breast cancers combined.
Correlation between immune metagene expression and mutation, neoantigen, amplification, and deletion loads, and tumor genomic heterogeneity. ER+ (A); triple-negative (B); HER2+ (C) cancers. Spearman correlation coefficients are shown color-coded to illustrate positive (red) or negative (blue) associations. The top portion shows correlation between immune metagenes, and the lower part between the metagenes and genomic aberration metrics. Significant correlations at P < 0.0001 are outlined in bold.
Next, we examined the correlation between the 13 immune metagene expression levels and five different measures of global genomic aberrations in the three distinct breast cancer subtypes separately. In the subtypes, correlations were weak and in TNBC and HER2+ cancers tended to show an overall negative association between immune signatures and the five different types of genomic aberrations (Fig. 1). Correlation analysis revealed statistically significant negative associations between mutational heterogeneity, measured by MATH score, and the several immune metagenes in each breast cancer subtype. A significant positive association was only seen in ER+ cancers for the STAT1 metagene expression and overall mutation load. Supplementary Fig. S5 shows the correlation between the LCK metagene expression and the five genomic features with the corresponding R2 values for each subtype.
A potential confounder in mutation load and copy number analysis is the variable tumor cellularity of the TCGA samples and that cancers rich in TILs may have a higher normal to cancer cell ratio. To assess if tumor cellularity influenced our results, we applied computationally estimated tumor cellularity using the ASCAT tool (33) to adjust mutation load for each sample and have also performed immune gene signature correlation with the somatic copy number alteration (SCNA) score from Davoli and colleagues (34). The SCNA score is tumor aneuploidy measure that is adjusted for tumor cellularity. Adjusting for tumor cellularity did not substantially alter the associations we observed (Supplementary Fig. S6).
It is important to point out that the correlation coefficients between various immune metagenes and genomic metrics are small, which reflect that many other important variables, which are not captured by these genomic metrics, influence the extent of lymphocytic infiltration.
Correlation between LCK metagene expression and somatic mutations, CNVs and germline SNVs
After filtering somatic mutations to include only mutations with >3% frequency, a total of 188, 104, and 37 mutated genes were present in the ER+, HER2+, and TNBC cohorts, respectively. In ER+ cancers, mutations in six genes were nominally significantly associated with LCK metagene expression, but only two remained significant after adjusting for multiple hypothesis testing (FDR < 10%). Mutations in MAP2K4, which affected 5.3% of cases, were associated with lower, and mutations in TP53 (17.5% of cases) with higher LCK metagene expression (Table 1). In TNBC, mutations in seven genes had nominally significant association but only two remained significant at FDR < 10%. Mutations in MYH9 (4.1% of cases) and HERC2 (3.4% of cases) were both associated with lower LCK metagene expression (Table 1). There were no gene-level mutations significantly associated with immune infiltration in HER2+ cancers. When we restricted analysis only to genes that are involved in regulating the immune system, no additional gene level mutations were identified as significant. These results suggest that the primary driver of immune infiltration in breast cancers is not recurrent somatic mutations.
Genomic alterations significantly associated with either higher or lower LCK immune metagene expression by breast cancer subtype
We performed similar analysis for germline polymorphisms. No SNP was significantly associated with higher immune infiltration in any subtype. In TNBC, we could identify three SNPs that were significantly associated with lower immune infiltration after adjusting for multiple hypothesis testing. These included rs425757 and rs410232, both in the coding regions of the CFHR1 gene, and rs470797 in the coding region of MLP (Table 1). These results suggest minimal contribution from germline polymorphisms reported in the TCGA data, to immune infiltration in breast cancer.
Next, we examined associations between amplifications or deletions and LCK metagene expression. In TNBC, two amplicons 5p12-14.3 and 17q11-241 showed significant association with decreased LCK metagene expression (Table 1). In HER2+ cancers, we found four significant amplifications (1q21-23.1, 1q24-32.1, 17q21.2, 17q21.32) associated with decreased immune infiltration, and one deletion (1p13.2-36.33) associated with increased immune infiltration (Table 1). In ER+ cancers, no copy number alterations were significantly associated with immune infiltration after multiple testing adjustment.
Taken together, these results indicate that there are no recurrent mutations, germline polymorphisms or copy number alterations that account for the majority of between-cancer variability in immune gene expression.
Association between LCK metagene expression and biological pathway-level alterations
In ER+ cancers, aberrations in 11 pathways showed association with immune gene expression at FDR < 10%, 10 of which were associated with lower immune infiltration (Table 1). Eight of the 10 pathways included members of the MAP-kinase family (MAP3K1, MAPK8, MAP2K4, MAPK1, MAPK3, MAP2K1, MAPK14, MAP2K3), suggesting that alterations in MAPK signaling may lead to lower cancer immunogenicity. In TNBC, aberrations in nine pathways showed association with immune infiltration at FDR <10% and in HER2+ cancers, aberrations in seven pathways had FDR < 10% (Table 1). An overview of all significant genomic aberrations at the level of individual cases in each breast cancer subtype are presented in Fig. 2. The results illustrate that the extent of immune cell infiltration is not associated with highly recurrent genomic events but rather with unique combinations of genomic alterations in each cancer.
Genomic alterations associated with LCK immune metagene expression in ER+ (A), triple-negative (B), and HER2+ (C) cancers. Each column represents a sample ordered in ascending order by LCK metagene expression. Each row indicates a type of genomic abnormality that is statistically significantly associated with immune infiltration. Somatic mutations and germline SNPs are shown as binary (i.e., present or absent) variables. Mutation load, neoantigen load, total copy number alteration count, and intratumor heterogeneity (MATH score) and pathway alterations (i.e., higher proportion of mutated genes in the pathway is indicated by deeper shade) are displayed as continuous variables. Pathway alterations are displayed as combined alterations and also as amplifications or deletions only. White, normal genotype; gray, missing data.
Discussion
We examined associations between immune metagene expression and a broad range of DNA-level alterations in breast cancer subtypes. In all subtypes, higher immune metagene expression was statistically significantly associated with lower clonal heterogeneity. In TNBC and HER2+ cancers, higher overall mutation, neoantigen, and CNV loads were also consistently, but not statistically significantly associated with lower expression of a broad range of immune metagenes. These observations support an immune pruning/immune editing effect that is particularly apparent in TNBC. Although cancer neoantigenes are required for mounting an anticancer immune response (35) and a more disturbed cancer genome is more likely to produce more immunogenic epitopes, a robust local antitumor immune response is expected to continuously eliminate highly immunogenic clones and slow the genomic diversification of the cancer or could even lead to complete elimination before it becomes clinically apparent. In the case of clinically apparent, immune-rich cancers, immune surveillance does not completely control the growth but may impose a precarious balance (i.e., near-equilibrium) for a variable length of time, which could be tipped in favor of immune elimination (of microscopic residual cancer) with interventions such as surgery, chemotherapy, or immune checkpoint therapy (36). This model could explain the better prognosis of immune rich cancers and also raise the possibility that immune therapy may have a chemo-preventive effect. In this framework, TNBC with no, or very low, immune infiltration represent cancers that have escaped immune surveillance and are no longer subject to clonal elimination by immune cells, which explains their greater clonal heterogeneity higher mutation load and worse prognosis (Fig. 3).
Schema of tumor evolution under immune editing. A, Neoantigenes are required for mounting an initial anticancer immune response and genomic heterogeneity can foster this. B, A subsequent antitumor immune response may eliminate many of the immunogenic clones and lead to lower clonal heterogeneity and a near-equilibrium. C, With the emergence of immune escape mechanisms, the cancer becomes clonally heterogeneous again, because it is no longer subject to clonal elimination by immune cells. Tumors may progress through these phases at different rate depending on proliferation rate and other variables.
In contrast, in ER+ cancers, we detected a positive but weak association of mutation, neoantigen, and CNV loads with immune infiltration, which reached significance for the overall mutation load (i.e., higher mutation load correlated with higher immune infiltration). These results suggest a different dynamic between immune surveillance and subsequent immune editing in ER+ breast cancer. One might speculate that this difference may reflect the different proliferation rate of these cancers. Most ER+ cancers have a slower growth rate and may spend a longer time in the various phases of “immune struggle,” whereas TNBC has a higher proliferation rate, which could accelerate, reaching either a state of immune escape or near-equilibrium with immune surveillance.
Our original goal was to identify DNA level alterations that are associated with low or high immune gene expression and could therefore suggest possible molecular causes for the variable levels of immune infiltration. We could not identify any high frequency, recurrent, gene-level DNA alterations that are significantly associated with immune metagene expression in breast cancer. This is consistent with a previous report that showed no recurrent neoantigens in cancers but rather a broad distribution of individually rare tumor neoantigens (37). However, in all breast cancer subtypes we observed a few genomic alterations that were significantly associated with immune metagene expression even after adjusting for multiple comparisons (Table 1). Each individual alteration was rare and accounted for only a small portion of variability in immune gene expression. Overall, our analysis indicates that immune infiltration in breast cancer subtypes is not associated with a few highly recurrent genomic events but rather by a broad spectrum of gene and pathway level alterations that each affect small subsets of patients within each subtype. It is tempting to speculate that at least some of the alterations may mechanistically contribute to determining immune infiltration. For examples, in TNBC, two missense SNVs (rs425757 and rs410232) in the CFHR1 (Complement Factor H related 1) gene, an inhibitor of the complement cascade (38, 39), and the stop-gain variant in MBP (Myelin Basic Protein) gene (rs470797) that can regulate Th2 cells (40, 41) were associated with lower immune infiltration. Amplifications at the 17q11-241 region were also associated with lower immune infiltration in TNBC. This amplicon includes the Chemokine (C-C Motif) Receptor 7 (CCR7) gene and high expression of CCR7 was previously shown to cause decreased T-cell presence in the melanoma (42). Deletion in the 1p13-36 region was associated with increased immune infiltration in HER2+ cancers and this amplicon contains the immune checkpoint genes tumor necrosis factor receptor superfamily member 18 and 25 (TNFRSF18 and TNFRSF25). In ER+ cancers, mutations in MAPK kinase 4 (MAP2K4) were associated with low immune infiltration. Pathway-level analysis also identified several biological pathways that had alterations significantly more frequently in cancers with lower immune infiltration, and nine of these pathways included MAPK genes. This pathway was previously linked regulation of the tumor microenvironment. Although these associations do not prove a cause-and-effect relationship, they raise experimentally testable hypotheses and suggest a multiplicity of potential biological mechanisms that influence local antitumor immunity.
In summary, our data suggest that immune surveillance has an impact on sculpting the breast cancer genome. Cancers that have minimal or no immune infiltration have greater clonal heterogeneity, likely suggesting an escape from immune surveillance. However, cancers with high immune infiltration may be in near-equilibrium. These observations suggest that immune checkpoint inhibitors may be the most effective to tilt the balance in favor of immune surveillance in the immune-rich, breast cancers. For breast cancers with little immune infiltration, more complex immunotherapy strategies may be needed to rekindle immune response against a clonally diverse neoplastic population.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: A. Safonov, G. Bianchini, C. Hatzis, L. Pusztai
Development of methodology: A. Safonov, T. Jiang, B. Győrffy, T. Karn, C. Hatzis
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A. Safonov, T. Jiang, T. Karn, L. Pusztai
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A. Safonov, T. Jiang, G. Bianchini, B. Győrffy, T. Karn, C. Hatzis, L. Pusztai
Writing, review, and/or revision of the manuscript: A. Safonov, G. Bianchini, T. Karn, C. Hatzis, L. Pusztai
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A. Safonov, L. Pusztai
Study supervision: L. Pusztai
Grant Support
This work was supported in part by grants from the Breast Cancer Research Foundation (L. Pusztai and C. Hatzis) and Susan G. Komen Leadership Award (L. Pusztai), the H.W. & J. Hector-Stiftung, Mannheim (M67; T. Karn), and the Associazione Italiana per la Ricerca sul Cancro (AIRC; MFGA 13428; G. Bianchini).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Footnotes
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
- Received December 29, 2016.
- Revision received March 13, 2017.
- Accepted April 14, 2017.
- ©2017 American Association for Cancer Research.