A Prognostic Model Based on the Immune-related Genes in Colon Adenocarcinoma

Background: Immune-related genes (IRGs) are critically involved in the tumor microenvironment (TME) of colon adenocarcinoma (COAD). Here, the study was mainly designed to establish a prognostic model of IRGs to predict the survival of COAD patients. Methods: The Cancer Genome Atlas (TCGA), Immunology Database and Analysis Portal (ImmPort) database, and Cistrome database were utilized for extracting data regarding the expression of immune gene- and tumor-related transcription factors (TFs), aimed at the identification of differentially expressed genes (DEGs), differentially expressed IRGs (DEIRGs), and differentially expressed TFs (DETFs). Univariate Cox regression analysis was subsequently performed for the acquisition of prognosis-related IRGs, followed by establishment of TF regulatory network for uncovering the possible molecular regulatory association in COAD. Subsequently, multivariate Cox regression analysis was conducted to further determine the role of prognosis-related IRGs for prognostic prediction in COAD. Finally, the feasibility of a prognostic model with immunocytes was explored by immunocyte infiltration analysis. Results: A total of 2450 DEGs, 8 DETFs, and 79 DEIRGs were extracted from the corresponding databases. Univariate Cox regression analysis revealed 11 prognosis-related IRGs, followed by establishment of a regulatory network on prognosis-related IRGs at transcriptional levels. Functionally, IRG GLP2R was negatively modulated by TF MYH11, whereas IRG TDGF1 was positively modulated by TF TFAP2A. Multivariate Cox regression analysis was subsequently performed to establish a prognostic model on the basis of seven prognosis-related IRGs (GLP2R, ESM1, TDGF1, SLC10A2, INHBA, STC2, and CXCL1). Moreover, correlation analysis of immunocyte infiltration also revealed that the seven-IRG prognostic model was positively associated with five types of immunocytes (dendritic cell, macrophage, CD4 T cell, CD8 T cell, and neutrophil), which may directly reflect tumor immune state in COAD. Conclusions: Our present findings indicate that the prognostic model based on prognosis-related IRGs plays a crucial role in the clinical supervision and prognostic prediction of COAD patients at both molecular and cellular levels.


Introduction
Colorectal cancer (CRC) is one of the primary lethal malignancies which can be divided into colon cancer and rectal cancer based on the corresponding primary tumor sites [1]. The incidence cases of CRC are estimated to exceed 2.2 million and the death cases are predicted to exceed 1.1 million deaths by 2030 [2]. Colon adenocarcinoma (COAD) is considered as the most common pathological type of CRC [3]. The incidence and mortality of COAD accounts for 6.1% and 5.8%, respectively, ranking the fourth and fifth among all types of new cancer cases [4]. Although the diverse types of advanced therapies, including surgery, adjuvant radiation therapy or chemotherapy, and targeted molecular therapy, are currently used to treat CRC, the poor prognosis and survival of patients demand prompt solutions due to delayed diagnosis Ivyspring International Publisher and adverse drug effects [5]. In consideration of the dilemma, it is urgent to explore new biomarkers, which might exert an impact on the prognosis of COAD.
At present, immunotherapy is wisely applied in the individualized treatment in a variety of tumors. Of note, interferons (IFNs) have long been demonstrated to play diverse roles, including antimicrobial and antiviral response, cell cycle progression, apoptosis and mediators of other cytokines [6][7][8]. More recently, James et al. have revolutionarily concluded that blocking CTLA-4 would prime T-cells to attack cancer cells [9], and Salem M et al. have revealed the removal of cancer cells by knocking out GARP of Treg cells [10]. Piero et al. have estimated that CDX2 could be recognized as a biomarker for malignant tumors in clinical surveillance and prognosis due to the fact that patients with stage II and stage III colon cancer and a lack of CDX2 suffered neoplasm recurrence and subsequent death [11]. These findings have illustrated the therapeutic importance of immune systems in COAD. Recently, Li et al. and Peng et al have expounded the prognostic value of immune-related genes (IRGs) to non-small cell lung cancer and papillary thyroid cancer, respectively [12,13]. However, it remains unknown of the clinical correlation and prognostic evaluation of IRGs in COAD.
The present study was designed to explore the potential correlation between the clinical prognosis and immune-related genes (IRGs), which were molecular biomarkers that can be further applied to individualized and targeted therapies. In particular, Cox regression analysis was performed to construct an IRGs-based prognostic model. The visual regulatory network formed by DETFs and prognosis-related IRGs demonstrated the potential mechanisms at a molecular level. Additionally, immune infiltration analysis concerning seven-IRG prognostic model as well as immunocyte accumulation might shed new light on the function of immunocytes for prognostic prediction in COAD.

Patient samples and data collection
The transcriptome RNA-sequencing data were retrieved and downloaded from The Cancer Genome Atlas (TCGA) data portal (https://portal.gdc.cancer. gov/); a total of 473 tumor tissues and 41 healthy tissues were included. The data on IRGs were downloaded from the Immunology Database and Analysis Portal (ImmPort) (https://www.immport. org/) [14]. Moreover, the cancer transcriptional genes data were available in Cistrome Project (http://www. cistrome.org/).

Functional enrichment analysis for DEIRGs
To integrate and analyze the IRGs among transcriptome RNA-sequencing data and IRGs, the cancer transcriptional genes data, and the "limma package" in R software [15] was used for data management. Subsequently, DEIRGs were screened, and the cut-off value assigned for false discovery rate (FDR) <0.05 and |log2 fold change| (|logFC|) >1. DEIRGs were used to elucidate the underlying molecular mechanisms based on the findings from Database for Annotation, Visualization and Integrated Discovery (DAVID), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. For GO analysis conducted using "GO plot packages" (https://cran.rproject.org/ web/packages/GOplot/citation.html) [16], GOCircle and GOChord graphs were obtained on the basis of a precondition and by setting a P <0.05 to indicate statistical significance.

Prognosis-related IRGs and DETFs
After incorporating overall survival (OS) and DEIRGs of patients using Perl language, we performed univariate Cox regression analysis for the preliminary screening of prognosis-related IRGs based on a P <0.05. Afterwards, prognosis-related IRGs were categorized into high-risk prognosisrelated IRGs (HR>1) and low-risk IRGs (HR<1) based on hazard ratios (HRs) derived from univariate analysis. TFs bound with specific DNA sequences and are thereby critically involved in the direct regulation of gene expression. In this study, relevant TFs were acquired from Cistrome database (http://www. cistrome.org/), which is an online tool, and from TCGA by integrating chromatin profiling data and cancer genomics data, respectively, which in turn facilitated the exploration of regulatory correlations of TFs with cancer transcriptomes [17]. For further investigating the association of DETFs, which were extracted from the Cistrome database, with prognosis-related IRGs obtained on the basis of univariate Cox regression analysis, the regulatory network was constructed [18,19] on the basis of a co-expression analysis according to Cor filter >0.04 and a P value filter <0.001 and the TF regulatory network was visualized using Cytoscape 3.7.2.

Prognostic model construction
Multivariate Cox regression analysis was further conducted on prognosis-related IRGs that were obtained based on univariate analysis, aiming at establishing a prognostic model and subsequently calculating the riskScore. Moreover, COAD patients were divided into two groups based on their riskScore, namely the low-risk and high-risk group. Kaplan-Meier (K-M) survival analysis with survival R packages could be used for obtaining the survivorship curve, which revealed the difference in terms of survival between the low-risk and high-risk group by demonstrating their respective survival periods and their corresponding survival rates. Furthermore, the risk plot R package was used to plot the risk curve to reveal the difference in survival periods between the low-risk and high-risk group. And the receiver operating characteristic (ROC) curve along with the area under the curve (AUC) could be obtained by analyzing the prognostic value of prognostic model.

The correlation between the prognostic model and immunocytes
Tumor Immune Estimation Resource (TIMER) (https://cistrome.shinyapps.io/timer/), a publicly available resource for analyzing and visualizing the abundance of tumor-infiltrating immune cells, includes 10,897 samples encompassing 32 types of cancer identified from TCGA and can be used for estimating the abundance of six TIIC subsets (CD4 T cells, CD8 T cells, B cells, neutrophils, macrophages, and dendritic cells) [20]. In this study, we carried out immunocyte infiltration analysis with the Immune Estimation file retrieved from TIMER to explore the underlying correlation between the prognostic model and immunocytes.

Further verification on IRGs in the prognostic model
Oncomine (http://www.oncomine.org) [21] and TIMER (https://cistrome.shinyapps.io/timer/) [22] were both employed for further verifying differences in the expression of IRGs between tumor tissues and healthy tissues in the prognostic model. Moreover, Human Protein Atlas (HPA) (https://www. proteinatlas.org/) [23] was used to extract immunochemical images, which showed the distribution and protein expression of IRGs in the prognostic model.

Identification of DEGs, DEIRGs, and DETFs
In this study, procedures are shown in the Figure 1 wherein 514 cases were collected, including 473 cases of COAD tissues and 41 cases of non-COAD tissues. The clinical features of the subjects are displayed in Table 1. A total of 2450 DEGs (Table S1),    (Figure 2A

GO enrichment and KEGG pathway analysis
To validate the biological characteristics of DEIRGs, functional relationship analyses were performed, including GO and KEGG pathway analysis. DEIRGs were significantly enriched in five GOs (P <0.05), with the most enriched term, "GO: 0005576 extracellular region" in CC category ( Figure  3A, B). The visual presentation of the association of 79 DEIRGs and relevant GO terms is shown in Figure  3C. Figure 3D showed the nine KEGG pathways enriched DEIRGs with statistical significance (P <0.05). Moreover, it was apparent from Figure 3E that the nine KEGG pathways sorted by the number of enriched DEIRGs were in order as follows: Cytokine-cytokine receptor interaction, Neuroactive ligand-receptor interaction, Rheumatoid arthritis, IL-17 signaling pathway, cAMP signaling pathway, Amoebiasis, Natural killer (NK) cell mediated cytotoxicity, Salmonella infection and Reninangiotensin system (RAS). The nine statistically significant signaling pathways in the KEGG database (P <0.05) are shown in Table 4, and a network ( Figure  3F) used for visualizing the interaction between signaling pathways and DEIRGs (P <0.05) was constructed. Based on the network visualization, it was evident that hsa04060 (cytokine-cytokine receptor interaction) was frequently used to validate the KEGG pathway.

TFs-IRGs regulatory network
Univariate Cox regression analysis was performed on DEIRGs for screening prognosis-related IRGs using "survival package" of R software (P<0.05). Consequently, 11 types of genes ( Figure 4A) were identified, including six high-risk IRGs as well as five low-risk IRGs. Co-expression analysis (Cor filter>0.4 and P value filter<0.001) was performed by incorporating seven prognosis-related IRGs and DETFs, which revealed two types of DETFs (TFAP2A and MYH11) and two types of low-risk prognosisrelated IRGs (GLP2R and TDGF1) for regulatory network construction (Figure 4B). IRG GLP2R was negatively modulated by TF MYH11, whereas IRG TDGF1 was positively modulated by TF TFAP2A.

Seven-IRG prognostic model
Multivariate Cox regression analysis further screened seven prognosis-related IRGs, including four high-risk IRGs and three low-risk IRGs, out of the 11 prognosis-related IRGs identified from univariate Cox regression analysis. The expression of risk coefficients was combined to calculate the riskScore, which was supportive of the prognostic model construction. This riskScore was calculated as the sum of the expression quantities of selected IRGs when multiplied with their corresponding coefficients, as represented by the following formula: riskScore = the expression quantity of SLC10A2*(0.9319) + the expression quantity of CXCL1*(−0.2474) + the expression quantity of ESM1*(0.4490) + the expression quantity of INHBA*(0.2186) + the expression quantity of GLP2R*(−2.1451) + the expression quantity of STC2* (0.1822) + the expression quantity of TDGF1* (-0.2097). The above results of multivariate Cox regression analysis are shown in Table 5. Patients were categorized into two groups, namely the high-risk group and the low-risk group, using median riskScore (0.984) as a cut-off value. The Kaplan-Meier (KM) survival curves of both high-risk and low-risk group are displayed in Figure 5A, in which the red curve represents the high-risk group (N=195) and the blue curve represents the low-risk group (N=196). And it was clear that the survival rate of patients in the high-risk group was significantly lower than that in the low-risk group (P =2.567e-04). The AUC of ROC was approximately 0.715 (Figure 5B), and the IRGbased prognostic model was relatively accurate. RiskScore curve, reflecting patient distribution in both high-risk and low-risk groups, are displayed in Figure  5C. Similarly, survival status plot, illustrating the survival status of patients in both high-risk and low-risk group, is shown in Figure 5D. A heatmap revealing the expression of seven prognosis-related IRGs is shown in Figure 5E.

Independent prognostic analysis
Multivariate Cox regression analysis revealed that seven types of prognosis-related IRGs correlated with the prognosis of COAD patients. Univariate independent prognostic analysis ( Figure 6A) and multivariate independent prognostic analysis ( Figure  6B) revealed that age, T staging, and riskScore were significantly independent prognostic factors ( Table 6) (P <0.05).

Seven-IRG prognostic model and clinical features
Some clinical features of COAD, including age, sex, tumor stage, T staging, N staging, and M staging, were assessed for their probable correlation with prognosis-related IRGs via "beeswarm R" packages of R software, as shown in Table 7 (P <0.05). The median values of the selected gene expression were used as cut-off values, and the amounts of medians were found to be directly proportional to the specific clinical features. The median values in T1-2 staging were lower than those in T3-4 staging among ESM1 expression, INHBA expression, STC2 expression and riskScore ( Figure 7E,F,O and I). In terms of the N staging, the median values of CXCL1 expression were relatively higher in N0 staging than in N1-2 staging ( Figure 7A). Meanwhile, the median values of INHBA expression and riskScore were relatively higher in N1-2 staging than in N0 staging (Figure 7G, J). Moreover, in terms of M staging, the median values of CXCL1 expression were relatively higher in M0 staging than in M1 staging (Figure 7B), and for the tumor stage, the median values of INHBA expression and riskScore were significantly elevated in stage Ⅲ-Ⅳ than in stage I-Ⅱ (Figure 7H, K). Meanwhile, the median values of CXCL1 expression were higher in stage I-Ⅱ than in stage Ⅲ-Ⅳ (Figure 7C). The patient's age also played an important role, and the median values of CXCL1 expression were significantly higher in patients aged >65 years than in patients aged <65 years ( Figure 7D); the trend was exactly the opposite for riskScore (Figure 7L). Additionally, the median values of SLC10A2 expression were also statistically different between T staging and M staging (P <0.05) (Figure 7M, N).    Figure 8 revealed a positive correlation of the riskScore of the seven-IRG prognostic model with immunocytes, including dendritic cells, neutrophils, CD8 T cells, macrophages, and CD4 T cells; macrophages were found to have the most significant relationship with riskScore (Cor=0.306, P=6.264e-10).

External verification of seven IRGs using online databases
In the seven-IRG prognostic model, five IRGs were upregulated and the remaining two IRGs were downregulated in COAD. Furthermore, Oncomine was used to externally validate the discrimination regarding the expression of seven IRGs between tumor and normal tissues. As displayed in Figure  9A-E and 10B-F, the expression of STC2, ESM1, INHBA, CXCL1 and TDGF1 was significantly increased in tumor tissues than in normal tissues, whereas GLP2R expression was higher in normal tissues than in tumor tissues (Figure 9F and 10G). Similarly, SLC10A2 expression was also increased in normal tissues than in tumor tissues ( Figure 10A). However, we failed to extract corresponding information on SLC10A2 from Oncomine. The distribution and expression of SLC10A2, STC2, and GLP2R at protein level are shown in Figure 10G-L, whereas the distribution and expression of CXCL1, INHBA, ESM1, or TDGF1 remained inaccessible in HPA.

Discussion
The researches on the predictive effects of biomarkers and their expression on cancer prognosis are a hotspot [24][25][26][27]. CTHRC1 has been estimated as a peritoneal metastasis-related gene for prognostic prediction in CRC [28,29]. However, it is more dominant to study how the immune-related molecular mechanisms underlying the prognosis of COAD. The formation of the inflammatory environment caused by the deficiency of p53 might become the accelerant of colorectal tumors [30], which may be a direct reference to the investigation, progression and prognosis of COAD related to the immune microenvironment. In addition, some studies have shown that better performance of immunoscore in evaluating high-risk recurrence and metastasis of CRC [31,32]. In this study, we mainly aimed to construct the IRGs-related prognostic model, which were screened out based on immune microenvironment.
By analyzing signaling pathways as well as functional enrichment on DEIRGs, the nine KEGG pathways enriched DEIRGs were shown as follows: Cytokine-cytokine receptor interaction, Neuroactive ligand-receptor interaction, Rheumatoid arthritis, IL-17 signaling pathway, cAMP signaling pathway, Amoebiasis, NK cell mediated cytotoxicity, Salmonella infection and RAS. The above nine KEGG pathways were related to inflammatory response, most of which are validated to have relationship with progression and therapies of colon cancer, including cAMP signaling pathway, Salmonella infection, IL-17 signaling pathway, cytokine-cytokine interaction receptor, NK cell mediated cytotoxicity and RAS. Cytokines along with correlated pathways may be involved in COAD progression in early stage [33]. Tseng et al. have shown that the level of IL-17 signaling was enhanced in CRC tissues than normal tissues [34]. Additionally, Chin et al. revealed that IL-17 could enhance the DNA binding capacity of NF-κB to stimulate CCR6 expression, potentially involved in the mechanisms of CRC migration [35]. cAMP signaling stimulation has been demonstrated to suppress tumor cell migration, including melanoma [36,37], breast cancer [37] and colon cancer [37,38] cells. Prospective cohort study has previously demonstrated that high activity of NK cell-mediated cytotoxicity is related to attenuated tumor risk [39], and NK cells have been revealed to be significantly decreased in primary colorectal lesion and liver metastases [40]. In a population-based study on patients with Salmonella infection, Mughini-Gras et al. have shown Salmonella infection as a risk factor for low-grade colon tumors [41]. To further investigate the mechanisms, Lu et al. have reported that β-catenin signals could be stimulated following Salmonella infection, thereby enhancing colonic tumorigenesis [42]. RAS is vitally involved in tumor angiogenesis and tumor cell growth [43,44]. Nakamura et al. have revealed that a combination of RAS inhibitor ARB and anti-PD-L1 antibody showed synergistic anti-tumor effects [44]. Herein, in this study, the biological functions of DEIRGs were comprehensively investigated in COAD populations, which could possibly offer promising foundation to illustrate possible molecular regulatory mechanisms.
Multivariate Cox regression analysis revealed seven types of IRGs, including four types of high-risk IRGs (SLC10A2, STC2, ESM1 and INHBA) and three low-risk IRGs (CXCL1, TDGF1 and GLP2R), can be used to construct prognostic model of COAD, which also showed favorable feasibility for the result that AUC being 0.715. Wnt signaling pathway activation, a signal for CRC carcinogenesis, has recently been reported to cause decreased level of SLC10A2 [45]. STC2 has recently been reported as an independent prognostic biomarker, whose expression is related to colon cancer progression [46]. The overexpression of ESM1 in serum of CRC patients has been revealed Ji et al. [47]. INHBA expression has been previously reported to be increased in CRC tissues than normal tissues, and high INHBA expression might be used as an independent prognostic factor for lymph node involvement in CRC [48]. In terms of CXCL1, increased CXCL1 expression in colon cancer cells has recently been reported, and the carcinogenesispromoting effect of CXCL2-CXCR2 axis is mediated by Gαi-2 and Gαq/11 [49]. Miyoshi et al. have demonstrated that CRC patients with high TDGF1 expression following surgery are significantly more likely to suffer palindromia and have poorer DFS in comparison with those with low expression [50]. GLP2R is regarded to be a pivotal gene type in CRC progression via the modulation of colonic epithelial integration [51]. Both univariate and multivariate independent prognostic analysis showed that age, T staging and riskScore were independent prognostic factors. In this study, seven types of specific IRGs were incorporated to explore the prognostic significance, compared with a previous analogous study. At present, the seven-IRG prognostic model could reflect their correlation with immunocyte infiltration according to the riskScore, which could be used as a type of evaluation indicator of immunologic status. Accumulative studies have demonstrated that immune system components, such as CD8+ and CD45RO+ memory T cells with specific cytokine signatures and possibly B cells, might be prognostic biomarkers, which is associated with tumor evolution with different presence [52][53][54][55][56]. Tumor-associated macrophages (TAMs) infiltration has been reported to be associated with OS of patients with prostate cancer and breast cancer [57,58]. Here, notably, riskScore of prognostic model is significantly associated with macrophages, expanding the research vision concerning the relationship of immunocyte with progression and prognosis of COAD. In recent years, the application of nanomedicine based on immune microenvironment, has been identified as a type of novel and individualized anti-cancer therapy. Amphiphilic nanoparticles have been developed as an adjuvant agent, which can be combined with EphA2derived peptide to assist the immune system to fight against liver metastasis in CRC [59]. Kuai et al. have demonstrated that nanodiscs-based phospholipids and apolipoprotein A1 (ApoA1)-mimetic peptides could be used for antigen presentation on dendritic cells to further elevate antitumor immunity [60], and the prognostic model at the center of IRGs might be used as a favorable evaluation of therapeutic efficiency. A previous research has shown a TFs-miRNA-targeted genes network for exploration of COAD progression [61]. To further integrally explore the possible molecular regulatory mechanisms, an IRGs-TFs network was established to elucidate the correlation between MYH11 and TFAP2A, TFs screened out, with IRGs (GLP2R and TDGF1). Nevertheless, studies concerning the regulatory mechanisms underlying TFs and IRGs in COAD are still in progress. Relevant studies have shown that MYH11 can foster the pathogenesis of leukemia and NSCLC [62,63]. Zhang et al. have demonstrated that TFAP2A can enhance anlotinib resistance via the promotion of tumor-triggered angiogenesis in lung tumor cells [64]. In our research, IRG GLP2R was positively modulated by TF MYH11, while IRG TDGF1 was negatively modulated by TF TFAP2A. Of note, the TFs-IRGs network could be used to novelly present the prospective molecular mechanisms underlying the tumorigenesis and progression of COAD.

Strengths and limitations
In this research, IRGs were synthesized for constructing a prognostic model to predict prognosis in COAD patients using bioinformatics. Survival analysis as well as independent prognostic analysis confirmed that the prognostic model was clinically feasible. Additionally, in-depth molecular regulatory correlation analysis, including immunocyte infiltration analysis as well as TFs-IRGs regulatory network, broadened the horizon on researches concerning COAD progression. Multi-database (Oncomine, TIMER and HPA) analysis was utilized to verify the seven prognosis-related IRGs on the basis of RNA expression and protein level. Nevertheless, this study had certain limitations. First, the applicability of the prognostic model needs to be validated for a larger sample population in future studies. Second, we will continue to advance molecular mechanism experiments on the role of IRGs in COAD progression.

Conclusions
Collectively, DEGs, DEIRGs, and DETFs were retrieved from TCGA, Immport database, and Cistrome database. Univariate and multivariate Cox regression analysis of DEIRGs facilitated the construction of a seven-IRG prognostic model (GLP2R, INHBA CXCL1, STC2, SLC10A2, TDGF1, and ESM1), which could be reliably used for prognostic prediction in COAD patients. Furthermore, both protein and RNA expression of seven IRGs were verified in tumor and normal tissues from Oncomine, TIMER and HPA. In addition, the transcriptional regulatory network and underlying exploration of the association of seven-IRG prognostic model with immunocyte infiltration could potentially uncover new biomolecular interactions in COAD progression, which could be utilized as a potential biomarker for personalized treatment in COAD patients.