Investigating Novel Genes Potentially Involved in Endometrial Adenocarcinoma using Next-Generation Sequencing and Bioinformatic Approaches

Endometrial cancer is one of the most common cancers in women worldwide, affecting more than 300,000 women annually. Dysregulated gene expression, especially those mediated by microRNAs, play important role in the development and progression of cancer. This study aimed to investigate differentially expressed genes in endometrial adenocarcinoma using next generation sequencing (NGS) and bioinformatics. The gene expression profiles and microRNA profiles of endometrial adenocarcinoma (cancer part) and normal endometrial tissue (non-cancer part) were assessed with NGS. We identified 56 significantly dysregulated genes, including 47 upregulated and 9 downregulated genes, in endometrial adenocarcinoma. Most of these genes were associated with defense response, response to stimulus, and immune system process, and further pathway analysis showed that human papillomavirus infection was the most significant pathway in endometrial adenocarcinoma. In addition, these genes were also associated with decreased cell death and survival as well as increased cellular movement. The analyses using Human Protein Atlas, identified 6 genes (PEG10, CLDN1, ASS1, WNT7A, GLDC, and RSAD2) significantly associated with poorer prognosis and 3 genes (SFN, PIGR, and CDKN1A) significantly associated with better prognosis. Combining with the data of microRNA profiles using microRNA target predicting tools, two significantly dysregulated microRNA-mediated gene expression changes in endometrial adenocarcinoma were identified: downregulated hsa-miR-127-5p with upregulated CSTB and upregulated hsa-miR-218-5p with downregulated HPGD. These findings may contribute important new insights into possible novel diagnostic or therapeutic strategies for endometrial adenocarcinoma.


Introduction
Cancers of the corpus uteri, primarily from the endometrium, rank as the sixth most common neoplasm in women worldwide. The incidence increased from 290,000 in 2008 to over 380,000 in 2018 Ivyspring International Publisher (1). Estrogen exposure, either endogenous or exogenous, is a major risk factor of endometrial cancer, while endometrial cancer is generally divided into two distinct types, type I (estrogen-related) and type II (non-estrogen-related) (2). As mentioned in a large review, strong evidence suggested that three factors were associated with endometrial cancer: increased body mass index and increased waist-to-hip ratio were associated with increased risk, while increased parity reduced the risk of disease (3). The genetic mechanism underlying the pathogenesis of endometrial cancer is not fully understood. In type I endometrial cancer, which account for nearly 80% of endometrial cancer, PTEN mutation, hMLH1 methylation, and hMSH6 mutation are important in atypical hyperplastic change of normal endometrium. Mutations in PTEN, KRAS, and CTNNB1 are associated with malignant change from atypical endometrial hyperplasia to low-grade endometrioid cancer, while P53 mutation plays an important role in advancing low-grade cancer into high-grade one (4). In type II endometrial cancer, mutations in P53 and HER2/neu are associated with non-endometrioid malignant transformation from normal or atrophic endometrium (4).
Traditionally, patients suffered from endometrial cancer have a favorable treatment outcome if diagnosed in the early stage. The overall five-year survival rate of endometrial cancer is 81%, but is only 17% if distal metastasis occurs (5). The three-year overall survival rate is 96.2% for women without recurrence; however, it is 73.4% for women with vaginal vault recurrence, loco-regional nodal recurrence, or local central pelvic recurrence, and is only 38.1% for those with distal metastases and/or peritoneal carcinomatosis (6). This might be result from the absence of a perfect treatment modality for advanced or recurrent disease currently.
The development of next-generation sequencing (NGS) technologies provides the capability to rapidly sequence exomes, transcriptomes, and genomes at relatively low cost. The application of this technology to catalog the mutational landscapes of tumor exomes, transcriptomes, and genomes has remarkably accelerated the progress in basic and clinical cancer researches (7), making precision medicine possible (8). Individual cancer patients can therefore receive personalized care with the most suitable drugs at the appropriate dose and at the right time (8).
As microRNAs have the ability to repress the expression of protein-coding genes, they might contribute to the pathogenesis of various diseases including cancer (9)(10)(11)(12)(13). Functional studies have shown that microRNA dysregulation plays important role in the development and progression of various cancers (9). Some microRNAs may act as either tumor suppressors (miR onco-suppressors) or tumor enhancers (onco-miRs), and anti-cancer treatment with microRNA mimics or molecules targeted at miRNAs are under development. With increasing knowledge of the microRNA-mediated changes in cancer cells, we will have better opportunity to develop a better microRNA-based anti-cancer treatment.
Through identifying novel gene expression signature and microRNA-gene interactions in endometrial adenocarcinoma, we may provide new perspectives for the development of novel diagnostic methods, prognostic predicting tools, and therapeutic strategies of endometrial adenocarcinoma. Therefore, in this study, we would like to identify the differentially expressed gene and the potential regulatory mechanisms through microRNAs in endometrial adenocarcinoma with systematic bioinformatics analysis.

Study design
The flowchart of study design is illustrated in Figure 1. The cancer part and non-cancer part (normal endometrial tissue) were taken from the surgical specimen of a 53-year-old woman with stage Ia endometrial adenocarcinoma cancer after informed consent was obtained. This pair of tissues was sent for NGS to assess the expression profiles of mRNAs and microRNAs. Using bioinformatic tools, including Search Tool for the Retrieval of Interacting Genes (STRING), the Database for Annotation, Visualization and Integrated Discovery (DAVID), and Ingenuity ® Pathway Analysis (IPA), the altered functions and pathways related to the dysregulated genes in endometrial cancer were investigated. In addition, the potential targets of the significantly dysregulated microRNAs were predicted with miRmap, TargetScan, and miRDB, and the potential microRNA-mRNA interactions in endometrial cancer were identified.

NGS for microRNA and mRNA expression profiles
The expression profiles of microRNAs and mRNAs were examined using NGS as in our previous studies (10,11,(13)(14)(15)(16) (17), alignment of the qualified reads were performed using HISAT2 (18,19), which is a fast and sensitive alignment program for mapping NGS reads to genomes based on hierarchical graph FM index. The genes with low expression levels (< 0.3 fragment per kilobase of transcript per million mapped reads [FPKM]) in any group were excluded. The p values were calculated by Cuffdiff with non-grouped samples using the "blind mode", in which all samples were treated as replicates of a single global "condition" and used to build a model for statistical test (20,21). The q values were the p values adjusted with false discovery rate using the method by Benjamini and Hochberg (22). Genes with q-value < 0.05 (i.e., -log10(q value) > 1.3) and > 2-fold changes were considered significantly differentially expressed.
For small RNA sequencing, samples were prepared using Illumina sample preparation kit as per the TruSeq Small RNA Sample Preparation Guide. The 3' and 5' adaptors were ligated to the RNA, and then reverse transcription and PCR amplification were performed. The cDNA constructs were size-fractionated and purified using a 6% polyacrylamide gel electrophoresis and the bands corresponding to the 18-40 nucleotide RNA fragments (140-155 nucleotide in length with both adapters) were extracted. After sequencing on an Illumina (San Diego, CA, USA) instrument (75 bp single-end reads), the data was processed with the Illumina software. After trimming and filtering out low-quality data with Trimmomatics (17) and clipping the 3' adapter sequence and discarding reads shorter than 18 nucleotides with miRDeep2 (23), the qualified reads were aligned to the human genome from University of California, Santa Cruz (UCSC). Because microRNAs usually map to few genomic locations, only reads mapped perfectly to the genome ≤5 times were taken. MiRDeep2 is useful for estimating the expression levels of known microRNAs, as well as identifying novel microRNAs. The microRNAs with low levels (<1 normalized read per million (rpm)) in both groups were excluded. The microRNAs with >2 fold change are considered significantly changed. Analyses using microRNA target predicting databases miRmap (http://mirmap.ezlab.org/) is an open-source software library which can provide comprehensive prediction of microRNA targets (24). The putative target genes could be identified by calculating the complementary ability of microRNA-mRNA interactions. The prediction results provide a list of putative target genes with miRmap scores, which are predictive reference values representing the repression strength of the microRNAs on a target mRNA. In this study, the criteria for selection of putative microRNA targets were miRmap score ≥ 97.0.
TargetScan (http://www.targetscan.org) is an online database predicting the target of microRNA by searching for the presence of conserved 8mer, 7mer, and 6mer sites matching the seed region of each microRNA (25). The results of predictions are ranked by the predicted efficacy of targeting or by their probability of conserved targeting (25). TargetScan could provide a valuable resource for investigating the role of microRNAs in gene-regulatory networks. miRDB (http://mirdb.org) provides web-based microRNA-target prediction and functional annotations in five species, including human, mouse, rat, dog, and chicken (26,27). In miRDB, all targets were predicted by MirTarget, which was developed by analyzing microRNA-target interactions from high-throughput sequencing experiments.

Analysis using STRING
The functional interactions between expressed proteins in cells are very important and complicated. STRING database (https://string-db.org/) has collected and integrated this information, by consolidating known and predicted protein-protein association data of various organisms (28). The protein-protein interactions, including direct (physical) and indirect (functional) interactions, collected in STRING are derived from five main sources, including conserved co-expressions, high-throughput lab experiment, genomic context predictions, automated text-mining, and previous knowledge in database. In this study, the significantly dysregulated genes were input into STRING for protein-protein interaction network analysis. The minimum required interaction score was set to the medium confidence (score = 0.400). In addition, STRING also provides information of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway.

Analysis using DAVID
DAVID (https://david.ncifcrf.gov/) is a powerful tool for functional classification of genes (29). It integrates gene ontology, biological process, and KEGG pathway. In DAVID database, a list of interesting genes can be classified into clusters of related biological functions, signaling pathways, or diseases by calculating the similarity of global annotation profiles with an agglomeration algorithm method. An Expression Analysis Systematic Explorer (EASE) score is a modified Fisher's exact p value in DAVID database which represents how specifically the genes are involved in a category. In this study, we selected EASE score = 0.1 as the default and defined pathways with a q value (p value adjusted with false discovery rate using the method by Benjamini, et al.) <0.05 as significant.

Analysis using IPA
IPA (Ingenuity systems, Redwood City, CA, USA) is a database software containing large database with detailed and structured findings reviewed by experts, which was derived from thousands of biological, chemical and medical researches (30). IPA enables rapid searching, analysis, integration, and recognition of data from gene and single nucleotide polymorphism (SNP) arrays, RNA and small RNA sequencing, proteomics and many other biological experiments (30). Deeper understanding and identification of related signaling pathways, upstream regulators, molecular interactions, disease process, and candidate biomarkers are also available (30). In this study, we used IPA to assess the diseases and functions associated with the significantly dysregulated genes in endometrial adenocarcinoma. The disease and function with a p value < 0.05 was considered significant.

Analysis using the Human Protein Atlas
The Human Protein Atlas is a Swedish-based program initiated in 2003 with the aim to map all the human proteins in cells, tissues, and organs using integration of various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology (31). All data collected in it is open-access to allow scientists, either in academia or industry, to freely access the data for exploration of the human proteome. The Human Protein Atlas consists of three separate parts, each focusing on a particular aspect of the genome-wide analysis of the human proteins, including the Tissue Atlas showing the distribution of the proteins across all major tissues and organs in the human body, the Cell Atlas showing the subcellular localization of proteins in single cells, and the Pathology Atlas showing the impact of protein levels for survival of cancer patients. This program has already contributed to several thousands of publications in the field of human biology and disease and it is selected by the organization ELIXIR (http://www.elixir-europe.org) as a European core resource due to its fundamental importance for a wider life science community. In this study, the prognosis-predicting values of the significantly dysregulated genes in endometrial cancer were assessed with the Human Protein Atlas. Based on the FPKM value of each gene, the patients were classified into two groups, low-expression and high-expression, to compare their prognoses. The prognosis (survival) of each group of patients was examined and compared with Kaplan-Meier survival estimators and log-rank tests. To determine the best cut-off FPKM values for grouping the patients, all FPKM values from the 20 th to 80 th percentiles were used to group the patients, and the cut-off FPKM value that yielded the lowest log-rank p value was selected.

Differential gene expressions in endometrial cancer
Using NGS, the gene expression profiles of the cancer part and non-cancer part of the surgical specimen from the patient with endometrial adenocarcinoma were assessed ( Figure 2A). As shown in the volcano plots ( Figure 2B), significantly dysregulated genes in endometrial adenocarcinoma (cancer part vs. non-cancer part) (those with -log10(q value) > 1.3 and fold change > 2) were identified (Table 1), including 47 upregulated and 9 downregulated genes.
Using STRING to investigate the protein-protein interactions of the significantly dysregulated genes in endometrial adenocarcinoma, we built a highly interactive protein-protein interaction (PPI) network of 56 nodes and 67 edges (enrichment p value < 1.0 x 10 -16 ) (Figure 3). Most genes in the PPI network were associated with three biological pathways, including defense response (19 genes), response to stimulus (44 genes), and immune system process (21 genes). Furthermore, the KEGG pathway analysis indicated that human papillomavirus (HPV) infection might be the most significant pathway involved in endometrial adenocarcinoma (q value = 0.0038) ( Table 2).
We then used DAVID to analyze the biological processes, cellular components, and molecular functions associated with the 56 significantly dysregulated genes in endometrial adenocarcinoma ( Table 3). The significant biological processes included response to virus (6 genes) and type I interferon signaling pathway (5 genes). The significant cellular components included extracellular space (19 genes), extracellular exosome (25 genes), and cell surface (9 genes). The only significant molecular functions associated with the 56 significantly dysregulated genes was protease binging (7 genes).
Using IPA, the associated diseases and functions of the 56 significantly dysregulated genes in endometrial adenocarcinoma were investigated ( Figure 4). The diseases and functions significantly associated with these dysregulated genes belonged to three categories, including cell death and survival (downregulated), cellular movement (upregulated), and cellular development and tissue development (upregulated).  * p values adjusted with false discovery rate Abbreviation: FPKM, fragments per kilobase of transcript per million mapped reads. The genes highlighted with underlines were those significantly associated with poorer (in red color) or better (in blue color) prognosis, as shown in Figure 5.

The possible genes associated with prognosis of endometrial cancer
Using the Human Protein Atlas, the prognosis-predicting values of the 56 significantly dysregulated genes in endometrial cancer were assessed ( Figure 5). Totally, the information of 541 patients with endometrial cancer (Table 4) was obtained for analyses. The median follow-up time of this cohort was 2.5 years. Among these 56 significantly dysregulated genes, 6 genes were significantly associated with poorer prognosis (PEG10, CLDN1, ASS1, WNT7A, GLDC, and RSAD2) and 3 genes were significantly associated with better prognosis (SFN, PIGR, and CDKN1A). * p values adjusted with false discovery rate The genes highlighted with underlines were those significantly associated with poorer (in red color) or better (in blue color) prognosis, as shown in Figure 5.  <0.001 a q value < 0.05 was considered significant. * p value adjusted with false discovery rate. The genes highlighted with underlines were those significantly associated with poorer (in red color) or better (in blue color) prognosis, as shown in Figure 5. Abbreviation: DAVID, Database for Annotation, Visualization and Integrated Discovery.

Potential dysregulated microRNA-mRNA interactions in endometrial cancer
Using NGS, 227 significantly dysregulated microRNAs (>2 fold change, including 186 upregulated and 41 downregulated microRNAs) were identified. We predicted the potential targets of these microRNAs with miRmap database, selecting the targets in the list of the 56 significantly dysregulated genes in endometrial cancer and the microRNA-mRNA interactions with miRmap score ≥ 97.0, and found 34 possible microRNA-mRNA interactions (including 12 interactions between a downregulated microRNA and an upregulated mRNA and 22 interactions between an upregulated microRNA and a downregulated mRNA) involving 16 mRNAs (9 upregulated and 7 downregulated mRNAs) (Figure 1). Further investigation using TargetScan and miRDB databases showed that only two microRNA-mRNA interactions, downregulated hsa-miR-127-5p with upregulated CSTB and upregulated hsa-miR-218-5p with downregulated HPGD, were validated in both TargetScan and miRDB databases (Table 5).

Discussion
In the current study, significantly dysregulated genes, especially those mediated by dysregulated microRNAs, in endometrial adenocarcinoma were investigated comprehensively using an approach with NGS and bioinformatics. We found 56 significantly dysregulated genes, including 47 upregulated and 9 downregulated genes. Most of these genes were associated with defense response, response to stimulus, and immune system process, suggesting the important association between endometrial adenocarcinoma and immune response. Interestingly, the KEGG pathway analysis in STRING showed that HPV infection was the most significant pathway in endometrial adenocarcinoma, suggesting the possible role of HPV in the carcinogenesis of endometrial cancer. Further analyses using DAVID also implied that endometrial adenocarcinoma might be associated with virus infection. On the other hand, the analyses using IPA found that these 56 genes were associated with decreased cell death and survival as well as Figure 3. Protein-protein interaction network analysis of the dysregulated genes in endometrial adenocarcinoma. The 56 significantly dysregulated genes (47 upregulated and 9 downregulated) were input into the Search Tool for the Retrieval of Interacting Genes (STRING) database for protein-protein interaction (PPI) network analysis. The minimum required interaction score was set to the medium confidence (score = 0.400). Nodes represent proteins and edges represent protein-protein associations. Nodes without edges are not displayed. This analysis obtained a highly interactive PPI network of 56 nodes and 67 edges, with PPI enrichment p value of < 1.0 х 10 -16 . Most genes in the PPI network were associated with three biological pathways, including defense response (19 genes, shown in blue), response to stimulus (44 genes, shown in green), and immune system process (21 genes, shown in red).
increased cellular movement, which were common behaviors of malignant cells. The analyses using Human Protein Atlas, identified 6 genes significantly associated with poorer prognosis and 3 genes significantly associated with better prognosis. Combining with the data of microRNA profiles using microRNA target predicting tools, two significantly dysregulated microRNA-mediated gene expression changes in endometrial adenocarcinoma were found: downregulated hsa-miR-127-5p with upregulated CSTB and upregulated hsa-miR-218-5p with downregulated HPGD.
It has been debated whether endometrial cancer is associated HPV, although a strong association between HPV and cervical cancer is well-known. The close anatomical proximity to the cervix has led researchers to study whether HPV has a role in the carcinogenesis of endometrial cancer. However, a systematic review and meta-analysis revealed a pooled prevalence of HPV DNA in endometrial cancer tissue of 10.0% (95% confidence interval: 5.2-16.2%) with large between-study heterogeneity related to the HPV DNA detection methods, and concluded that HPV had a limited or no role in the carcinogenesis of endometrial cancer (2). Nevertheless, the absence of HPV DNA in the endometrial cancer tissue cannot totally exclude the possible contribution of HPV in the pathogenesis of endometrial cancer. In our study, however, the significantly dysregulated genes in endometrial adenocarcinoma was associated with immune responses, and the KEGG pathway analysis showed that HPV infection was the most significant pathway, suggesting the possible role of HPV in the carcinogenesis of endometrial cancer. Further large-scale study is needed to elucidate the association between HPV and endometrial cancer.  Table 5. Potential miRNA-mRNA interactions involved in endometrial adenocarcinoma, validated with TargetScan, miRDB, and miRmap. Aberrant expression of various microRNAs in endometrial cancer has been reported (32). Significantly higher serum levels of miR-186, miR-222, and miR-223 and significantly lower serum level of miR-204 were noted in patients of endometrial carcinoma than in the matched control subjects (33). Some studies also demonstrated the possibility of using extracellular vesicles isolated from the peritoneal lavage as biomarkers of endometrial cancer (34). In our study, we found significantly downregulated has-miR-127-5p and upregulated has-miR-218-5p in endometrial adenocarcinoma. In line with our findings, Dong et al. have also demonstrated downregulation of miR-127 in endometrial cancer using microRNA microarray profiling (35) and Delangle et al. have shown upregulation of miR-218 in endometrial cancer tissue using quantitative reverse transcription polymerase chain reaction (32,36).
The CSTB gene encodes cystatin B in humans, which may interact with cathepsin B (37). Cysteine cathepsins are highly upregulated in many cancers (38). Being in various locations (being secreted, cell-surface, and intracellular space), they involve in many proteolytic pathways that may contribute to the progression of cancers (38). In human epithelial ovarian tumors, cystatin B was a progression marker, which was associated with the transforming growth factor β (TGF-β) signaling pathway (39). In a previous study enrolling 27 patients of endometrial cancer, increased expression of cathepsin B was found as a predictor of more aggressive cancer behavior over time, suggesting its potential as being a tumor marker of unfavorable outcome (40). In the present study, significantly increased CSTB expression was noted in the cancer tissue, which might be related to the early stage of the cancer in our patient. Further study should be taken to elucidate the role of CSTB.
The HPGD gene encodes 15-hydroxyprostaglandin dehydrogenase, an important enzyme responsible for inactivating prostaglandins and associated eicosanoids via reducing the 15S-hydroxyl group, which have been demonstrated as an important factor of disease-associated pain and inflammation in patients with endometriosis (41). Decreased HPGD expression is associated with abnormal prostaglandin metabolism in endometriosis (42). Generally speaking, 15-hydroxyprostaglandin dehydrogenase is considered a tumor suppressor (43). It has been reported that HPGD is associated with various types of cancer, such as bladder cancer, gastrointestinal cancer, breast cancer, and cervical cancer (44)(45)(46)(47). In a recent study, decreased inactivation of prostaglandins

E2
and F2α (PGE 2 and PGF 2α ) via 15-hydroxyprostaglandin dehydrogenase was noted in type II endometrial cancer, while low HPGD expression was associated with worse progression-free survival and overall survival (48). In line with the previous studies, we found decreased HPGD expression in endometrial adenocarcinoma tissue, which might be suppressed by upregulated miR-218-5p.
A major limitation of this study should be specified. This study was conducted mainly using the NGS data of a pair of tissue from a patient. Since it was mainly a single subject study, the findings might not be applied to other patients. However, our study provided insights to understand the pathogenic mechanisms of endometrial adenocarcinoma. After further validation, the potential targets identified in our study might provide scientific basis for developing novel diagnostic and treatment modalities for endometrial cancer.
In summary, we identified 56 significantly dysregulated genes in endometrial adenocarcinoma. These genes were involved in defense response, response to stimulus, and immune system process, as well as the pathway associated with HPV infection. In these genes, 6 genes were associated with poorer prognosis and 3 genes were associated with better prognosis. We further found two significantly dysregulated microRNA-mediated gene expression alterations in endometrial adenocarcinoma: downregulated hsa-miR-127-5p with upregulated CSTB and upregulated hsa-miR-218-5p with downregulated HPGD. These findings may contribute important new insights into possible novel diagnostic or therapeutic strategies for endometrial cancer.