A Clinical Decision Support System for Diabetes Patients with Deep Learning: Experience of a Taiwan Medical Center

Background: Diabetes mellitus (DM) is a major public health problem worldwide. It involves dysfunction of blood sugar regulation resulting from insulin resistance, inadequate insulin secretion, or excessive glucagon secretion. Methods: This study collated 971,401 drug usage records of 51,009 DM patients. The data include patient identification code, age, gender, outpatient visiting dates, visiting code, medication features (including items, doses, and frequencies of drugs), HbA1c results, and testing time. We applied a random forest (RF) model for feature selection and implemented a regression model with the bidirectional long short-term memory (Bi-LSTM) deep learning architecture. We used the root mean square error (RMSE) as the evaluation index for the prediction model. Results: After data cleaning, the data included 8,729 male and 9,115 female cases. Metformin was the most important feature suggested by the RF model, followed by glimepiride, acarbose, pioglitazone, glibenclamide, gliclazide, repaglinide, nateglinide, sitagliptin, and vildagliptin. The model performed better when trained on the past two seasons than when trained on additional seasons. Further, the Bi-LSTM model performed better than support vector machines (SVMs). Discussion & Conclusion: This study found that a Bi-LSTM model is a suitable core for a clinical decision support system (CDSS) that supports physicians' decision-making, and that increasing the number of seasons negatively affects performance. In addition, the most important drug was metformin, which is recommended as a first-line oral hypoglycemic agent (OHA) in various situations for DM patients.


Introduction
Diabetes mellitus (DM) is a major public health problem worldwide. It involves dysfunction of blood sugar regulation resulting from insulin resistance, inadequate insulin secretion, or excessive glucagon secretion [1]. There are two types of DM. Type 1 DM is usually due to an autoimmune disorder and involves the destruction of pancreatic beta-cells. Type 2 DM is caused by impairment of glucose regulation due to the malfunction of pancreatic beta cells or insulin resistance [1]. Treatment using oral hypoglycemic agents (OHA) for type 2 DM may have negative side effects, such as hypoglycemia. Therefore, it is crucial to ensure the safety and efficacy of OHA usage [1][2][3][4][5][6][7].

Ivyspring International Publisher
rough sets, and trajectory methods [13][14][15][16][17][18][19]. Glycated hemoglobin (HbA1c) is extensively studied in these approaches because it is a good indicator of DM control. DM patients with higher HbA1c values are more likely to experience renal diseases, macrovascular events, cardiovascular diseases, retinopathies, skin ulceration/gangrene, and high mortality [3]. A well-controlled HbA1c value therefore plays an important role in DM management. CDSSs have been shown to be effective in supporting HbA1c control. For example, O'Connor et al. showed that the HbA1c of DM patients improved significantly when physicians used a CDSS compared with when they did not (p < 0.01). Moreover, 94% of physicians using the CDSS were satisfied with the application, and physicians continued to use the CDSS for more than one year without research funding support [20].
Recently, deep learning methods have dramatically improved different fields of medical care and research [21,22]. They have also been used as the core methods to build the CDSS [23,24]. For example, convolutional neural networks (CNNs) are used to process image data and recurrent neural networks (RNNs) are used for sequential pattern problems [23,25]. Sun et al. proposed a method to predict blood sugar levels at four intervals, namely 15, 30, 45, and 60 minutes, using the long short-term memory (LSTM) model and the bidirectional-LSTM (Bi-LSTM) model [26]. Therefore, we devise a CDSS using a Bi-LSTM model with HbA1c as the outcome index for managing OHA usage. The structure of the proposed CDSS, the LSTM model, and the Bi-LSTM model are shown in Figure 1.
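For reference, the standard LSTM cell and its bidirectional combination can be written as below. This is the common textbook formulation and not necessarily the exact variant used in this study. The symbols are notational assumptions: x_t is the input at step t, h_t the hidden state, c_t the cell state, W, U, and b the learned parameters, sigma the logistic function, and the circled dot the element-wise product.

```latex
\begin{align}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\
y_t &= [\overrightarrow{h}_t \, ; \, \overleftarrow{h}_t]
\end{align}
```

In the last line, the forward state is computed by running the cell over the sequence in order and the backward state by running a second cell over the reversed sequence; the Bi-LSTM output at each step concatenates the two.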

Materials and Methods
We collated 971,401 drug usage records of 51,009 diabetes mellitus (DM) patients from January 2012 to December 2014 (12 seasons) and 313,165 laboratory records of 74,792 DM patients in a medical center from January 2012 to June 2015 (14 seasons). These data included patient identification code, age, gender, outpatient visiting dates, visiting code, medication features (including items, doses, and frequencies of drugs), HbA1c results, and testing time. The data were combined and cleaned. Twelve seasons of data and 17,844 DM patients were included in this study. The data were evaluated with five-fold cross-validation (training data = 80% and testing data = 20%) (Figure 2). We applied an RF model with mean square error (MSE) for feature selection, where a higher mean decrease in MSE indicated a more important feature [27][28][29]. OHA dosages and codes were collected. This study was approved by the Institutional Review Board (IRB) of the MacKay
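One common way to obtain an MSE-based importance score is permutation importance: shuffle one feature column and measure how much the MSE grows. The sketch below illustrates that idea only. It does not train a random forest, and `toy_predict`, the toy rows, and all function names are hypothetical stand-ins rather than the study's pipeline.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, n_repeats=10, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that column."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other columns untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            permuted = mse(targets, [predict(r) for r in shuffled])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 drives the target, feature 1 is irrelevant.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]


def toy_predict(row):
    # Stand-in for a fitted model (assumption); it only uses feature 0.
    return 2.0 * row[0]


scores = permutation_importance(toy_predict, rows, targets, n_features=2)
```

A feature the model relies on receives a large positive score, while a feature the model ignores scores zero, which matches the reading that a larger value marks a more important feature.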
To evaluate the models, we used the root mean square error (RMSE), [...], and the Matthews correlation coefficient (MCC). We applied Pearson's chi-squared test and Student's t-test for data analysis. The statistical analysis was conducted using SPSS version 19.0 (SPSS Inc., Chicago, IL, USA). Statistical significance was defined as p < 0.05.
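The metrics named in this study (RMSE, the Matthews correlation, and the sensitivity reported in the Results) have standard definitions, sketched below in plain Python. The function names and the 0/1 label encoding are assumptions for illustration; the study's own computation is not shown in the text.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error for a regression target such as HbA1c."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels encoded as 1 (positive) and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """True positive rate; returns 0.0 when there are no positive cases."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 to 1 and stays informative when the classes are imbalanced, which may be why it is reported alongside sensitivity.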

Results
Of the 17,844 included cases, 8,729 (49.0%) were male and 9,115 (51.0%) were female. The mean age was 62.3 years (SD = 11.9) overall, 60.4 (SD = 11.8) for males, and 64.2 (SD = 11.7) for females. The 45-to-64-year age group had the most cases (8,507), followed by those aged above 65 (6,966). The mean HbA1c was 7.6% (SD = 1.7). There were 13,346 cases with HbA1c higher than 6.5% and 4,498 cases with HbA1c lower than 6.5% (Table 1).
This study treated every season from 2013 Q1 to 2015 Q1 as ground truth and constructed nine datasets, each with a different sample size. For example, the dataset of 2014 Q4 had 12,677 training cases and 3,169 test cases. Using the other data as independent factors, we designed three kinds of models. The first used two seasons of data to [...]
This study also evaluated differences in HbA1c between seasons. For example, we calculated the differences in mean HbA1c between 2015 Q1 and 2014 Q4 (0.87%), 2014 Q3 (0.98%), and 2014 Q2 (1.09%). We found that longer time distances were associated with greater differences in HbA1c (Table 3).
The sensitivity of the SVM models did not differ significantly across settings (two seasons: 0.88±0.03, three seasons: 0.88±0.02, four seasons: 0.89±0.02). The sensitivity of the Bi-LSTM models decreased gradually but not significantly (two seasons: 0.83±0.16, three seasons: 0.80±0.21, four seasons: 0.77±0.23), and was lower than that of the SVM models.
Table 3. Research design and HbA1c differences between first and last seasons in each model. There are nine datasets. The models use two/three/four seasons to predict the drugs for the third/fourth/fifth seasons. Columns: training sample; testing sample; and, for each of the two-, three-, and four-season models, time period and HbA1c difference.
(Table 4).
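The research design of nine target seasons, each predicted from the preceding two, three, or four seasons, can be sketched as a sliding window over season labels. The sketch assumes that seasons are calendar quarters and that the series starts at 2012 Q1; the helper names are hypothetical.

```python
def quarter_labels(start_year, start_quarter, count):
    """Generate consecutive labels such as '2012 Q1', '2012 Q2', ..."""
    labels = []
    year, quarter = start_year, start_quarter
    for _ in range(count):
        labels.append(f"{year} Q{quarter}")
        quarter += 1
        if quarter > 4:
            year, quarter = year + 1, 1
    return labels


def build_windows(labels, n_input, first_target):
    """Pair each target season with the n_input seasons immediately before it."""
    start = labels.index(first_target)
    windows = []
    for t in range(start, len(labels)):
        if t - n_input < 0:
            continue  # not enough history before this target season
        windows.append((labels[t - n_input:t], labels[t]))
    return windows


# Assumed timeline: 13 quarters from 2012 Q1 through 2015 Q1.
seasons = quarter_labels(2012, 1, 13)
windows_by_size = {n: build_windows(seasons, n, "2013 Q1") for n in (2, 3, 4)}
```

Under these assumptions each window size yields nine (inputs, target) pairs, with targets running from 2013 Q1 to 2015 Q1. A wider window reaches further back in time from the same target season.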

Discussion & Conclusion
Studies have found that higher HbA1c is linked with an increased risk of complications in DM patients [3]. Physician-pharmacist collaboration is useful for adjusting OHA to manage HbA1c because it draws on the knowledge and experience of these experts [32]. However, leveraging that knowledge is a challenge. The current study found that Bi-LSTM models performed better than SVM models for a CDSS to support physicians' decision-making related to OHA adjustment for DM patients. Many CDSS and classification models have used SVM and other artificial intelligence technologies [18][33][34][35][36].
We also found that increasing the number of seasons used in the prediction negatively impacted accuracy and RMSE. Physicians reference the most recent HbA1c value to adjust OHA dosage and may choose to maintain the dosage if the HbA1c value is only slightly higher than 7% for the first time. Thus, it is not necessary to reference three or more seasons of HbA1c data, as validated by our experimental results.
We calculated the importance of these drugs and used RF to translate this information into the CDSS [27-29, 33, 35]. The most important drug was metformin, which is recommended as a first-line OHA in various situations for DM patients. This drug improves lipids and inflammatory markers and reduces cardiovascular events, but may be contraindicated for patients with mild to moderate chronic kidney disease. Recent research indicates that metformin requires caution in these kinds of DM patients [5]. Sulfonylureas are an important class of DM drugs.
We found that glimepiride (2nd), glibenclamide (5th), and gliclazide (6th) are also important OHAs for DM patients [2,4]. Acarbose was also found to have some gastrointestinal adverse effects, which were similar to those of metformin [6]. Pioglitazone is an important DM drug that may reduce HbA1c and improve both metabolic syndrome and nonalcoholic fatty liver disease/nonalcoholic steatohepatitis [7]. It also has a side effect of weight loss for some patients, which is sometimes treated as a benefit. Glimepiride, pioglitazone, and vildagliptin were all combined with metformin (Table 2).
This study has some limitations and areas for extension. Yanase et al. reported that low HbA1c is linked with frailty and suspected malnutrition in elderly type 2 DM patients [37]. Around 25% of cases in this study had HbA1c < 6.5%, indicating that DM control was too strict for some patients, potentially leading to malnutrition or hypoglycemia. Our CDSS defined "good control" as HbA1c ≤ 7%. Although our approach worked well, future versions could be enhanced by considering 6.5% ≤ HbA1c ≤ 7%.
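The difference between the current "good control" rule and the suggested range can be made concrete with a small sketch. The function names and category labels are hypothetical; only the thresholds of 6.5% and 7% come from the text.

```python
def is_good_control(hba1c_percent, upper=7.0):
    """Current rule described in the text: good control means HbA1c <= 7%."""
    return hba1c_percent <= upper


def control_category(hba1c_percent, lower=6.5, upper=7.0):
    """Possible extension: also flag values below the lower bound as too strict."""
    if hba1c_percent < lower:
        return "below_range"  # potentially over-strict control
    if hba1c_percent <= upper:
        return "in_range"
    return "above_range"
```

For example, a value of 6.0% passes `is_good_control` but is labeled `below_range` by `control_category`, which is the group the limitation above is concerned with.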