Comprehensive analysis of TPX2-related ceRNA network as prognostic biomarkers in lung adenocarcinoma

Background and aim: Competing endogenous RNA (ceRNA) is believed to play vital roles in tumorigenesis. The goal of this study was to screen prognostic biomarkers in lung adenocarcinoma (LUAD). Methods: Common differentially expressed genes (DEGs) were collected from Gene Expression Omnibus (GEO) databases and The Cancer Genome Atlas databases (TCGA) using GEO2R and “limma” package in R, respectively. Overlapping DEGs were conducted using enrichment of functions and protein-protein interaction (PPI) network to discover significant candidate genes. By using a comprehensive analysis, we constructed an mRNA mediated ceRNA network. Survival rates were used Kaplan-Meier analysis. Statistical analysis was used to further identify the prognosis of studied genes. Results: Integrated analysis of GSE32863 and TCGA databases, a total of 886 overlapping DEGs, including 279 up-regulated and 607 down-regulated genes were identified. Considering the highest term of candidate genes in PPI, we identified TPX2, which was enriched in cell division signaling pathway. Besides, 35 differentially expressed miRNAs (DEmiRNAs) were predicted to target TPX2 and only 7 DEmiRNAs were identified to be prognostic biomarkers in LUAD. Then, 30 differentially expressed lncRNAs (DElncRNAs) were predicted to bind these 7 DEmiRNAs. Finally, we found that 7 DElncRNAs were correlated with the overall survival (all p <0.05). Furthermore, we identified elevated TPX2 was strongly correlated with the worse survival rate among 458 samples. Univariate and multivariate cox analysis showed TPX2 may act as an independent factor for prognosis in LUAD (p <0.05). Then pathway enrichment results suggested that TPX2 may facilitate tumorigenesis by participating in several cancer-related signaling pathways in LUAD, especially in Notch signal pathway. Conclusions: TPX2-related lncRNAs and miRNAs are related to the survival of LUAD. 7 lncRNAs, 7 miRNAs and TPX2 may serve as prognostic biomarkers in LUAD.


Introduction
Lung cancer is the main cause of cancer-related death, and lung adenocarcinoma (LUAD) is one of histologic type, which account for about half of lung cancer [1][2][3][4]. Despite the advances in new targeted therapies, chemotherapy, radiotherapy as well as surgical methods, the 5-year survival rate of LUAD is 20% [5][6][7], which illustrate the urgency to find biomarkers for prognosis of LUAD. So it is vital to identify potential molecular mechanisms and significant prognostic biomarkers in LUAD.
Long non-coding RNAs (lncRNAs) with longer than 200 bp could interact with mRNAs, miRNAs and circular RNAs, participating in complicated regulation network of gene expression. LncRNAs' Ivyspring International Publisher abnormal expressions have linked with various cancers, for instance, prostate cancer, colorectal cancer and pancreatic cancer [8][9][10][11][12]. MiRNAs are classes of short RNA molecules with 19 to 25 nucleotides, which regarded as one of the regulators of gene expression and silenced target genes in post-transcriptional levels. Growing evidences suggested that miRNAs participate in biological process, for instance, DNA damage response, cell proliferation, cell apoptosis [13][14][15]. With further understanding of biological process, a new regulatory mechanism named competing endogenous RNA (ceRNA) was proposed, which provides a better insight into tumorigenesis [16]. CeRNA can bind to miRNA by miRNA response elements (MREs) to induce gene silencing. More and more researches demonstrated that ceRNA network plays crucial roles in numerous cancers [17][18].
Targeting protein for Xenopus kinesin-like protein 2 (TPX2) is a microtubule-associated protein, which was involved in mitotic spindle formation [19][20]. Growing evidence has proved that increased TPX2 contribute to the poor prognosis by triggering spindle dysfunctions, subsequent chromosomal instability or abnormal DNA damage response in tumor development [21]. TPX2 overexpression is positively correlated with lymphatic metastasis, tumor grade and stage, negatively with the survival rate [22][23][24]. However, there are no systematic studies to investigate the prognostic value of TPX2 in LUAD. This study aims to construct a critical TPX2-related ceRNA network in LUAD.

Data collection and significant differentially expressed RNAs screening
The gene expression profile of GSE32863 based on the GPL6884 platform, was selected from Gene Expression Omnibus (GEO) (http://www.ncbi.nlm. nih.gov/geo/) [25]. GSE32863 included 58 pairs of LUAD and normal lung tissues, normalized microarray data was reanalyzed using the GEO2R online tools (https://www.ncbi.nlm.nih.gov/geo/ geo2r/) [26]. The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/) contains expression profiles and prognostic information, so we gathered RNA-sequencing data of LUAD as well. Then "limma" package in R was performed to identify differentially expressed lncRNAs (DElncRNAs), differentially expressed miRNAs (DEmiRNAs) and differentially expressed genes (DEGs). Volcano plot was generated using the "ggplot2" R package, and the "Venn Diagram" package was performed to overlap DEGs. Down-regulated gene was defined as logFC<0, on the contrary, up-regulated was defined as log FC >0. The significant differentially expressed RNAs were defined as p< 0.01 and |logFC| > 1, statistically.

The functional enrichment and pathway analysis
Gene ontology (GO) analysis is a common method to identify functional enrichment of gene data. And the function and pathway analysis were constructed through Database for Annotation, Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/). Then functional enrichment of candidate genes were visualized using "GOplot" R package. As for signaling pathway analysis, "cluster profile" R package was used to discover significant pathways. p < 0.05 was used to significant terms statistically. To identify the potential function of TPX2, Gene set enrichment analysis (GSEA) (http://software.broadinstitute.org/gsea/ index.jsp) was conducted between the low-expression and high-expression groups [27]. And p < 0.05 was regarded as statistically significant.

Candidate gene selection
Search Tool for the Retrieval of Interacting Genes (STRING) (http://string-db.org/) was used to obtain the predicted interactions for overlapping DEGs with confidence score >0.4 [28], protein-protein interaction (PPI) network of overlapping DEGs was visualized by Cytoscape [29]. The CytoHubba plugin gave us a novel insight into the importance of networks by ranking nodes [30], according to which we found hub genes. The Molecular Complex Detection (MCODE) plugin was performed to select linked genes with degree cutoff = 2 and k-core = 2.

Construction of a TPX2-related ceRNA network
LncRNA and mRNA have the same miRNA response elements (MREs) by competitively binding to miRNA to regulate each other's expression. In this study, a ceRNA network of lncRNAs, miRNAs, and TPX2 was constructed to explore their functions. The relationship between TPX2 and target miRNAs was analyzed by using three databases miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/), TargetScan (http://www.targetscan.org/) and StarBase (http:// starbase.sysu.edu.cn/). The unions of the prediction results of three databases were used to perform further analysis. LncRNAs targeted by the miRNAs were retrieved from the StarBase.

Statistical analysis
Statistical analysis was performed using SPSS and R 3.6.2 software. The differences of TPX2 expression levels between the LUAD and normal lung groups were determined by Student's t-test. The Chi-square test was used to analyze the associations between TPX2 expression and clinicopathological factors of LUAD patients. The survival R packages were performed for survival analysis of TPX2 and survival curve was displayed by Kaplan-Meier plotter and examined by log-rank test. Univariate and multivariate Cox analysis were used to evaluate the prognostic value between TPX2 and other clinicopathological fators. Lung cancer explorer (LCE) database (http://lce.biohpc.swmed.edu/lungcancer/ index.php#page-top) includes gene expression and clinical data from more than 6,700 lung cancer patients, which provides forest plots to summarize tumor-normal standardized mean difference and hazard ratios [31]. The p < 0.05 was considered to be statistically significant.

Differentially expressed RNAs in LUAD
We firstly obtained the gene expression profile of 58 pairs of LUAD tissues and their matched normal lung tissues from GSE32863. A total of 1270 DEGs were identified by using GEO2R. Then the RNAsequencing data and clinical data of LUAD were downloaded from TCGA. Based on the differentially expressed RNAs analysis by limma R package, a total of 186 DElncRNAs, 150 DEmiRNAs and 2953 DEGs were identified, and the proportion of all differentially expressed RNAs from TCGA was visualized ( Figure 1A). As it shown in Figure 1C, the result of volcano plot displayed the distribution of DEGs in the dimensions of -log (FDR) and logFC. On the basis of p <0.01 and |Log2(FC)|>1, a total of 886 overlapping DEGs, including 279 up-regulated and 607 down-regulated genes, were screened from two studies ( Figure 1B).

Enrichment analysis of DEGs in LUAD
To investigate the function of the DEGs, GO and KEGG analysis were performed using the web-based tool DAVID. Enriched terms with p <0.05 were displayed (Figure 2). The genes were mainly enriched in leukocyte migration and positive regulation of angiogenesis in the biological process; collagen trimer and proteinaceous extracellular matrix in the cellular component; transforming growth factor betaactivated receptor activity and glycosaminoglycan binding in the molecular function.

Construction of the PPI network and screening of modules and hub genes
To identify the most important proteins and biological modules which may be involved in the progression of LUAD the PPI network was generated by overlapping genes using STRING and the interactors of top 10 hub genes were visualized using Cytoscape software. Subsequently, to explore the significance of these DEGs, the top 1 module, which displayed the most densely connected region, was selected ( Figure 3A). The result showed the top 10 hub nodes (TPX2, TOP2A DEGs (KIFC1, CDCA8, CCNB2, CDCA7, TIMELESS,  NEK2, CCNF, TPX2, KNTC1, CENPF, BIRC5, AURKA, CDC20, PTTG1, UBE2C, CDCA5) were contained in this term ( Table 1). Above all, top 10 hub genes, which were contained in these 16 genes, indicated that these proteins may serve crucial roles in the whole protein interaction network. Then in consideration of the enrichment results of highest GO term, TPX2 was selected for further analysis.

TPX2 expression and correlation with clinical characteristics
486 LUAD and normal lung samples from the TCGA database were enrolled, and a simple summary of the data was showed in Table 2. The expression of TPX2 in LUAD tissues was significantly up-regulated compared with normal tissues (p <0.001), which is consistent with GEO database (Figure 6A-C). The expression of TPX2 in the 7 independent datasets was pooled in a forest plot (Figure 6D). The SMD was 1.40 (95% CI: 1.29-3.19; I 2 =95%; p = 2.0e-10), which suggested that the occurrence of LUAD is related to expression levels of TPX2. To determine whether TPX2 expression level was related to LUAD progression, expression level of TPX2 from 458 patients was analyzed. Clinical characteristics and their relations with TPX2 were performed, and the results showed TPX2 expression level was significantly associated with gender (p=0.005), stage (p=0.001), pathological T stage (p=0.018) and pathological N stage (p=0.004). However, no significant difference was found in age (p=0.133) and pathological M stage (p=0.703) ( Table 3). Figure 6E showed that the expression level of TPX2 displayed strong correlation with the tumor stage in patients with LUAD (p <0.05).

Prognostic significance of TPX2 in LUAD
To elucidate the prognostic values of TPX2, we assessed the correlation between TPX2 expression level and overall survival rate in LUAD. On the basis of the median risk scores, a total of 458 patients were divided into a high risk group and a low risk group. The results showed that high expression of TPX2 was associated with worse overall survival for LUAD patients (p=0.004) (Figure 7A). Moreover, we used univariate and multivariate analysis to distinguish the risk factors correlated with the prognosis of LUAD patients by using Cox regression model. Univariate independent prognostic analysis showed that clinical stage, pathological T stage, pathological N stage and expression levels of TPX2 were statistically significant with overall survival (p <0.05). Multivariate independent prognostic analysis showed that clinical stage and the TPX2 expression can act as independent prognostic factors (p <0.05) (Figure 7B, Table 4).

Gene Set Enrichment Analysis of TPX2
To further verify the potential function of TPX2 gene, GSEA was used to obtain the biological process. Ultimately, TPX2 was associated with cancer-related signaling pathways including "cell cycle", "RNA degradation", "mismatch repair", "DNA replication", "base excision repair", "endocytosis", "Notch signaling pathway" "PPAR signaling pathway" and "ERBB signaling pathway" (all p <0.05) (Figure 8, Table 5). Enrichment plots of GSEA demonstrated that the gene signatures of "Notch signaling pathway" (NES = 1.76) with higher TPX2 expression were more active than lower TPX2 expression (p <0.05, FDR <0.05). All these results suggested that TPX2 may facilitate tumorigenesis by taking part in several cancer-related signaling pathways in LUAD.

Discussion
Lung cancer is the chief cause of cancer death (18.4% of the 9.6 million cancer death) [32], and LUAD is the most common subtype of lung cancer. Despite advances in the diagnosis and treatment, the overall survival rates remain poor in LUAD [33]. Recent studies have identified novel molecular biomarkers may provide a prognosis value in combination with patients' clinical parameters [34][35], and in which non-coding RNAs (ncRNAs) play vital roles in cancer initiation and progression [36]. For example, LncRNA DGCR5 promotes LUAD progression [37], miR-1323 promotes cell migration in LUAD [38]. With the increasing understanding of ceRNA, more and more people have studied the mechanism of ceRNAs in cancer. Yang et al. indicated that lncRNA LCAT1 regulates RAC1 in lung cancer [39]. And lncRNA WDFY3-AS2 promotes LUAD progression via targeting miR-491-5p/ZNF703 axis [40]. In this present study, we firstly constructed a new mRNA-miRNA-lncRNA regulatory network using RNA sequencing data and microchip data, in which each RNA in the network has important implications for the prognosis of LUAD.
A total of 886 significant DEGs were identified, which mainly enriched in cell growth, cell adhesion and metabolic pathways. Considering the results of functional enrichment analysis, these significant DEGs may regulate growth of LUAD. We obtained 35 core genes from PPI network and these genes were re-analyzed for functional enrichment, in which cell division were chosen for further study because it is the most significant GO terms (p=1.44E-13). Finally, 16 DEGs were significantly enriched in the cell division which is closely related to tumorigenesis. Considering the highest term of candidate genes in PPI, we identified TPX2 for further study. Besides, we discovered that most of these genes have been well studied except TPX2. For example, AURKA promotes tumor proliferation and was associated with worse prognosis in bladder cancer and gastrointestinal cancer [41][42]. Increased AURKB have a poor prognosis in NSCLC [43]. CDCA8 participated in lung carcinogenesis [44]. KIF20A promoted cell proliferation and inhibit apoptosis in LUAD [45]. And UBE2C was found to enhance lung cancer growth [46].   Subsequently, we predicted 30 DElncRNAs and 35 DEmiRNAs, in which 7 DEmiRNAs and 7 DElncRNAs functioned as prognostic biomarkers in LUAD. High expression levels of 4 DEmiRNAs (miR-218-5p, miR-29b-3p, miR-200a-5p, miR-29a-5p) have a better prognosis and high expression levels of 3 DEmiRNAs (miR-17-5p, miR-942-5p, miR-193b-3p) were associated with worse prognosis in LUAD. In the past studies, miR-218-5p inhibited proliferation and migration of NSCLC [47]. LncRNA H19 was found to promote epithelial-mesenchymal transition of LUAD by targeting miR-29b-3p and modifying STAT3 [48]. Wang A et al. proved that miR-29a-5p regulated TETs in colorectal cancer [49]. MiRNA-17-5p inhibited triggers apoptosis in NSCLC [50]. Then we predicted 30 DElncRNAs of 7 DEmiRNAs, and only 7 DElncRNAs were significantly associated with overall survival. Low expression levels of 2 DElncRNAs (MALAT1 and CYTOR) increased survival times. High expression levels of 5 DElncRNAs (KCNQ1OT1, XIST, NEAT1, CASC2, MIAT) have a better prognosis. In previous studies, KCNQ1OT1 indicated a better prognosis in lung cancer [51]. MALAT1 promoted the proliferation and invasion of NSCLC [52]. Finally, a prognosisassociated TPX2-DEmiRNA-DElncRNA network in LUAD was established.
Then our findings revealed that overexpressed TPX2 is correlated with poor prognosis in LUAD. Moreover, univariate and multivariate cox model proved that TPX2 expression can act as an independent factor for prognisis. For example, TPX2 is an independent prognostic factor in esophageal cancer [53][54], digestive system cancer [55] and clear cell renal cell carcinoma [56]. Furthermore, our analysis indicated TPX2 was mostly concentrated in tumor-related pathways. Among results of pathway, we ultimately focused on the Notch pathway, which is connected with cell proliferation, differentiation and survival. And Notch pathway played a crucial role in the development of multiple cancers [57][58]. Previous study showed that activation of the Notch pathway can lead to a malignant phenotype of lung cancer [59]. It has been noted that Notch1 signaling was reported to be activated in NSCLC [60]. Another study proved that Notch2 has a tumor suppressive effect in NSCLC [61]. Hassan et al. presented that Notch3 behaves as a tumor promoter pathway in NSCLC [62]. In conclusion, we constructed a novel TPX2related ceRNA regulatory network by comprehensive bioinformatics analysis, in which all RNAs could be prognostic biomarkers of LUAD. However, these biomarkers need to be further validated by molecular biology experiments.