Artificial intelligence supported anemia control system (AISACS) to prevent anemia in maintenance hemodialysis patients

Anemia, for which erythropoiesis-stimulating agents (ESAs) and iron supplements (ISs) are used as preventive measures, presents important difficulties for hemodialysis patients. Nevertheless, the number of physicians able to manage such medications appropriately is not keeping pace with the rapid increase of hemodialysis patients. Moreover, the high cost of ESAs imposes heavy burdens on medical insurance systems. An artificial-intelligence-supported anemia control system (AISACS) trained using administration direction data from experienced physicians has been developed by the authors. For the system, appropriate data selection and rectification techniques play important roles. Decision making related to ESAs poses a multi-class classification problem for which a two-step classification technique is introduced. Several validations have demonstrated that AISACS exhibits high performance with correct classification rates of 72%-87% and clinically appropriate classification rates of 92%-98%.


Introduction
Anemia, a common complication associated with chronic kidney disease (CKD), is a risk factor for high mortality [1]. Erythropoiesis-stimulating agents (ESAs) and iron supplements (ISs) are usually administered to patients during hemodialysis treatment. Generally, patients with large hemoglobin (Hb) variations are likely to have complications and often need to be hospitalized, and vice versa [2]. Therefore, physicians try to stabilize patients' Hb values within a certain range. However, doing so is very difficult because of complicated disorders such as altered iron metabolism, poor response to ESAs, and residual blood in dialysis equipment, most of which are common problems among hemodialysis patients. Moreover, general circumstances such as concomitant diseases and the differing backgrounds of patients in different countries [3,4] add to the difficulty. Compounding these difficulties are economic concerns such as the high cost of ESAs, which heavily burdens medical insurance systems [5,6].
Although hemodialysis patients are becoming increasingly numerous worldwide, physicians who are able to manage and administer treatment appropriately are not being trained in sufficient numbers to keep pace [6]. To reduce burdens on physicians and medical insurance systems under these circumstances, effective decision-making support systems are urgently needed. Recently, artificial intelligence (AI) technologies have been used extensively in nephrology [7,8]. Several studies of hemodialysis have predicted vital reactions, including studies specifically examining anemia control [9][10][11][12]. A model predictive control (MPC) approach was utilized and extended for effective anemia control [10][11][12]. Systems using AI for predicting Hb values of hemodialysis patients have been presented in the literature [13,14]. The anemia control model (ACM) achieved improved control accuracy and decreased patients' need for ESAs [15,16]. Although anemia control assisted by AI technologies appears promising, a discrepancy persists between these technologies and actual medical practice. Widely diverse health conditions of actual patients and various legal and economic constraints can cause many difficulties. As a result, available datasets that include data of similar patients are usually not large. Therefore, a different approach was adopted for AI learning in this study: the AI learns from decisions of experienced physicians rather than from data showing reactions of the patients' living bodies, such as Hb values. From highly experienced physicians, we gathered data on their dosage direction decisions for their patients, together with blood examination histories. To enhance the learning process, we constructed procedures for the rectification of clinical data. Then we developed an artificial-intelligence-supported anemia control system (AISACS).

Ethics statement
Clinical data were collected retrospectively from electronic health records. This study, which was conducted in accordance with the Declaration of Helsinki, was approved by the institutional review board (IRB) at Shigei Medical Research Hospital (#20161219-1) and Kobayashi Medical Clinic (#20190925), as a retrospective observational study. The endpoint of this study approved at IRBs was to construct a decision-making support system that can provide dosage directions that are equal to or better than those of physicians who control dosages to maintain hemoglobin (Hb) values within 10-12 g/dl: the criterion stated in the Japanese hemodialysis guideline.

Clinical data collection
Clinical data were collected at two hospitals where Japanese adult hemodialysis patients were receiving anemia control treatment from board-certified senior members of the Japanese Society for Dialysis Therapy. Data were collected at Shigei Medical Research Hospital (Hospital S) from January 2015 through May 2019 and at Kobayashi Medical Clinic (Hospital K) from November 2018 through September 2019. All clinical data were anonymized. At Hospital S, the S1 and S2 datasets were prepared. Dataset S1 was used for training the neural network; S2 was used for raw data validation. At Hospital K, dataset K1 was prepared and used for raw data validation. At both hospitals, the physicians' directions at every hemodialysis occasion, which occurred every one or two weeks depending on the hospital, were recorded in the form of UP, DOWN, or STAY because dosages for administration were directed in units of one ampoule under hospital regulations. Hospital S had 350 hemodialysis patients per year and Hospital K had 90. Mortality was 35 cases per year at Hospital S and 10 cases per year at Hospital K. Hospital K was selected to examine the applicability of AISACS at smaller hospitals.
The patient selection criteria were the following: maintenance hemodialysis, no concomitant inflammation (CRP<0.3 mg/dL), no infectious disease, and no present cancer. Moreover, the data collection period for each patient case was chosen to include as many UP and DOWN directions as possible in both training and validation groups. This period-selection criterion was used because data for maintenance hemodialysis patients in stable condition include larger numbers of STAY directions than either UP or DOWN directions, indicating that appropriate timings of UP and DOWN decisions are significant for patient care.
Applying the selection criteria described above to the Hospital S data yielded dataset S1 with N=130 and W=6080 and dataset S2 with N=81 and W=1857, where N and W respectively represent the number of patients and the number of hemodialysis occasions. Dataset S1 was used for training the neural network, whereas S2 was used for raw data validation. Dataset K1, with N=16 and W=298, was prepared from Hospital K and used for raw data validation.
Darbepoetin alfa and epoetin beta pegol were used as ESAs. The ISs were provided in the form of sodium ferrous citrate, ferrous fumarate, and saccharated ferric oxide (Supplemental Table A1). The target range was set as 10.0-12.0 g/dl at Hospital S according to the Japanese hemodialysis guideline. Under the physicians' control, Hb values were within the target range for 74% of the data in S1 and 73% in S2 (Supplemental Table A2). Also, ESA-resistant patients were excluded. Therefore, the mean administered dosages of darbepoetin alfa were 20.2±10.1 µg/week in S1, 18.8±14.1 µg/week in S2, and 20.4±13.5 µg/week in K1. The mean administered dosages of epoetin beta pegol were 26.1±8.9 µg/week in S1 and 36.0±15.7 µg/week in S2, with no use in K1 (Supplemental Table A3).

Inputs and outputs for machine learning
Four items of blood examination were regarded as neural network inputs: Hb; mean corpuscular volume (MCV); ferritin; and transferrin saturation (TSAT). These items, their trends, and histories of dosages for ESAs and ISs up until the previous administration occasion were used as input parameters. Finally, AISACS outputs probabilities for ternary directions in the form of UP, STAY, and DOWN in ESAs, and UP and STAY for binary directions in ISs, as shown in Figure 1. Ternary directions were not needed for ISs because the ISs were set to stop after 6 weeks, in accordance with hospital regulations.

Data rectification
One important difficulty in collecting administered dosage data is posed by "delayed decisions." At each hemodialysis occasion, patients underwent blood examinations. Usually the physicians then examined the results and gave administration directions. However, not all the decisions were made on the same day as the examination because delivery of the examination results to physicians was sometimes delayed by mechanical troubles, working time restrictions, and other factors. In such cases, the decision was recorded about a week after the blood examination on which it was actually based. Such a non-essential difference between blood examination and decision dates confused the neural network training process considerably. Therefore, we performed data rectification by moving the UP and DOWN decision dates to the exact dates on which the corresponding blood examinations were performed. This rectification procedure was done automatically and was confirmed by three physicians. The procedure was applied only to S1, which was used for neural network training.
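The idea of moving a delayed direction back to its examination date can be illustrated with a minimal sketch. It assumes a per-occasion list of directions and a hypothetical mapping `delays` from the occasion at which a direction was recorded to the occasion of the examination it was based on. How delays were actually detected is not described here, and relabeling the vacated occasion as STAY is likewise an assumption.

```python
from typing import Dict, List


def rectify(labels: List[str], delays: Dict[int, int]) -> List[str]:
    """Move delayed UP/DOWN directions back to the examination occasion.

    labels: direction recorded at each occasion ("UP", "DOWN", or "STAY").
    delays: hypothetical mapping {recorded_occasion: examination_occasion}.
    """
    out = list(labels)
    for recorded, exam in delays.items():
        direction = labels[recorded]
        # Only UP and DOWN directions are moved; STAY is left untouched.
        if direction in ("UP", "DOWN") and 0 <= exam < recorded:
            out[exam] = direction
            out[recorded] = "STAY"  # assumption: vacated occasion becomes STAY
    return out


recorded = ["STAY", "STAY", "UP", "STAY", "DOWN"]
# The UP recorded at occasion 2 was based on the examination at occasion 1.
rectified = rectify(recorded, {2: 1})
print(rectified)
```

Returning a new list rather than editing in place keeps the raw records available, which matters here because only the training dataset is rectified while validation uses raw data.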

Preliminary analyses
Before starting a deep learning approach, we applied simpler approaches to examine the complexity of our classification problem. Figure 2 portrays a principal component analysis (PCA) based on input data. From Fig. 2 using three principal components (PCs), it is apparent that almost all UP and DOWN decisions were readily classifiable using linear approaches, but UP and STAY, or STAY and DOWN are difficult to classify clearly using PCs. Moreover, several outliers exist, such as UP decisions located in the upper-right corner of Fig. 2(b). Based on these preliminary attempts, we decided to apply a deep learning approach, which is expected to work for such nonlinear, high-complexity classification problems.

Machine learning setup
Machine learning codes were written using Keras with a TensorFlow backend [17,18]. The blood examination intervals for ferritin/TSAT are usually longer than those for Hb/MCV. Therefore, we used independent neural networks of two kinds for the two forms of medication. Indeed, Hb and MCV are examined every week, whereas ferritin and TSAT are examined every month, which means that only a quarter of the dataset has actual measured values of ferritin and TSAT to predict ISs. For this reason, whereas a dense neural network was used for ESAs, a recurrent neural network (RNN) [19] was used for ISs as a more effective method when fewer data are available. Considering the tradeoff between training data size and representation ability, a recurrent layer with sequence size two was added to the dense neural network, so that two successive timings are passed as inputs. Both networks used 10 hidden layers with L1 regularization and drop-out techniques [22] to prevent overfitting. Other training parameters and hardware used for machine learning are presented in Table 4A.

Validations
We defined the correct classification rate RTOTAL as

$$R_{\mathrm{TOTAL}} = \frac{\text{number of correct decisions}}{\text{number of input decision data}},$$

which is the ratio of occasions on which AISACS gave the same directions on the same dates as those given by physicians. We also defined RUP, RSTAY, and RDOWN by confining the decisions to each class.
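These rates can be computed as in the following minimal sketch. It assumes that "confining the decisions to each class" means restricting the count to occasions on which the physician gave that direction; the function name and the example sequences are illustrative only.

```python
from collections import Counter
from typing import Dict, List


def correct_rates(physician: List[str], aisacs: List[str]) -> Dict[str, float]:
    """Return R_TOTAL and per-class rates (keyed by class name)."""
    if not physician or len(physician) != len(aisacs):
        raise ValueError("inputs must be non-empty and of equal length")
    totals = Counter(physician)
    # A decision is correct when both directions agree on the same occasion.
    hits = Counter(p for p, a in zip(physician, aisacs) if p == a)
    rates = {"TOTAL": sum(hits.values()) / len(physician)}
    for cls in ("UP", "STAY", "DOWN"):
        if totals[cls]:  # skip classes the physician never used
            rates[cls] = hits[cls] / totals[cls]
    return rates


physician = ["STAY", "STAY", "UP", "DOWN", "STAY", "UP"]
aisacs = ["STAY", "UP", "UP", "DOWN", "STAY", "STAY"]
rates = correct_rates(physician, aisacs)
print(rates)
```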
Using these values, we performed the following validations of two types.
• "Leave one patient out" cross-validation (LOPO) LOPO was performed by removing data of one patient from the dataset. The neural network was trained using the remaining N-1 patient data. Then the removed patient data were used to evaluate the performance of the trained neural network. After repeating this procedure N times, correct classification rates were calculated using N patients results. The S1 dataset was used for LOPO.
• Raw data validation (RDV) RDV was performed using S2 and K1. First, we trained the neural network using S1. Then the correct classification rates were calculated using S2 (RDV_S) and K1 (RDV_K). Training and validation processes are completely independent in RDV_S and RDV_K.
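The LOPO procedure can be sketched in plain Python as follows. The splitting logic follows the description above; the `fit_majority` placeholder model, the toy data, and the choice to pool correct decisions over all patients before dividing are assumptions made only to keep the example self-contained.

```python
from collections import Counter
from typing import Callable, Iterator, List, Sequence, Tuple


def leave_one_patient_out(
    patient_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_patient, train_indices, test_indices) per patient."""
    for held_out in sorted(set(patient_ids)):
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        yield held_out, train, test


def lopo_total_rate(patient_ids, features, labels, fit: Callable) -> float:
    """Train on N-1 patients, test on the removed one, pool the results."""
    correct = 0
    for _, train, test in leave_one_patient_out(patient_ids):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(model(features[i]) == labels[i] for i in test)
    return correct / len(labels)


def fit_majority(xs, ys):
    """Placeholder model: always predicts the majority training class."""
    majority = Counter(ys).most_common(1)[0][0]
    return lambda x: majority


patient_ids = ["a", "a", "b", "b", "c", "c"]
features = [[0.0]] * 6  # dummy inputs; the placeholder model ignores them
labels = ["STAY", "UP", "STAY", "STAY", "STAY", "DOWN"]
rate = lopo_total_rate(patient_ids, features, labels, fit_majority)
print(rate)
```

Splitting by patient rather than by occasion keeps all occasions of one patient on the same side of the split, so the evaluation never sees a patient whose earlier records were used for training.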
Validations performed in this study are presented in Table 1 and are shown schematically in Figure 3.

Table 1. Validations and datasets using S1 and S2 from Hospital S and K1 from Hospital K.

Name | Validation procedure | Dataset for training | Dataset for validation
LOPO | Leave one patient out cross-validation | S1 (remaining N-1 patients) | S1 (removed patient)
RDV_S | Raw data validation | S1 | S2
RDV_K | Raw data validation | S1 | K1

Class-imbalanced training data
Although we selected the clinical data period that includes plentiful UPs and DOWNs, the numbers of different directions included in the dataset are still markedly imbalanced. For example, in dataset S1, ESA directions by physicians comprised 344 UPs, 585 DOWNs, and 5151 STAYs. Simple machine learning using such an imbalanced dataset led to AI always outputting the STAY direction to achieve the highest RTOTAL. However, the timings of UP and DOWN are much more important for the present problem. Such a discrepancy can usually be controlled by class weights, respectively strengthening and weakening the effects of minority and majority classes on the target functions. Although values of class weights are usually defined using the inverse ratios of quantities of data, class-imbalance was not improved sufficiently for AISACS. Therefore, they were further adjusted to strengthen minority classes by trial and error so that RUP, RSTAY, and RDOWN are approximately equal in S1.
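As a point of reference, the usual inverse-frequency starting point for class weights can be computed from the S1 counts quoted above. The sketch below uses the common "balanced" normalization, total / (number of classes × class count), which is an assumption about the exact formula. The further trial-and-error adjustment is not reproduced because the tuned values are not given here.

```python
from typing import Dict


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# ESA direction counts in dataset S1, as reported in the text.
s1_counts = {"UP": 344, "DOWN": 585, "STAY": 5151}
weights = inverse_frequency_weights(s1_counts)
for cls, w in weights.items():
    print(f"{cls}: {w:.3f}")
```

With this normalization every class contributes the same total weight to the loss, so the minority UP class receives the largest per-sample weight and STAY the smallest.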

Two-step classification for the ternary classification for ESAs
Because the ESA administration belongs to ternary classification problems, three probability values of PUP, PSTAY, and PDOWN, respectively corresponding to UP, STAY, and DOWN directions, were computed as outputs from the neural network. The simplest method for classification is to adopt a direction that gives the highest probability value. However, such a simple algorithm does not seem to work for the present situation in which the timings of UP and DOWN are crucially important to appropriate anemia control. Therefore, we propose the following procedure for the ternary classification problem: First, we set a threshold value T. The direction is assigned as STAY if the probability of STAY was larger than T. Otherwise, UP or DOWN, which has a larger probability, is assigned, as portrayed in Fig. 4. We designate the union of UP and DOWN classes as NON-STAY in the following sections.
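The two-step rule can be written compactly as below. This is a minimal sketch: the function name is illustrative, and breaking a tie between PUP and PDOWN in favor of UP is an assumption, since the tie case is not specified.

```python
def two_step_direction(p_up: float, p_stay: float, p_down: float,
                       threshold: float) -> str:
    """Step 1: STAY if P_STAY exceeds T. Step 2: otherwise the larger of UP/DOWN."""
    if p_stay > threshold:
        return "STAY"
    # Tie-breaking toward UP is an assumption of this sketch.
    return "UP" if p_up >= p_down else "DOWN"


T = 0.475  # ESA threshold reported in the paper for dataset S1
print(two_step_direction(0.40, 0.45, 0.15, T))  # STAY has the highest probability
print(two_step_direction(0.20, 0.60, 0.20, T))
print(two_step_direction(0.10, 0.30, 0.60, T))
```

In the first call, a plain highest-probability rule would return STAY, whereas the two-step rule returns UP because PSTAY does not exceed T. This is how the threshold shifts borderline cases toward NON-STAY decisions.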

Correct classification rates after fixing threshold T
In actual hospital settings, a value must be chosen for the threshold T discussed in 2.2.4. One possible strategy using the ROC curves is to choose the T corresponding to the point on the ROC curve nearest to (x, y) = (0, 1), which gives similar performance for both STAY and NON-STAY. For dataset S1, this value was found to be 0.475 for ESAs and 0.470 for ISs. We adopted these values for the validations as well, which gave correct classification rates RTOTAL for LOPO, RDV_S, and RDV_K of 80%, 77%, and 72% for ESAs and 81%, 87%, and 80% for ISs.
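The nearest-point strategy can be sketched as follows. The example treats STAY as the positive class, scores each occasion by PSTAY, and searches a small list of candidate thresholds; these choices and the toy data are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def roc_point(scores: Sequence[float], is_stay: Sequence[bool],
              threshold: float) -> Tuple[float, float]:
    """Return (FPR, TPR) when STAY is predicted for P_STAY > threshold."""
    pos = sum(1 for y in is_stay if y)
    neg = len(is_stay) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both STAY and NON-STAY examples are required")
    tp = sum(1 for s, y in zip(scores, is_stay) if s > threshold and y)
    fp = sum(1 for s, y in zip(scores, is_stay) if s > threshold and not y)
    return fp / neg, tp / pos


def nearest_to_corner(scores, is_stay, candidates: List[float]) -> float:
    """Pick the threshold whose ROC point is closest to (0, 1)."""
    def distance(t: float) -> float:
        fpr, tpr = roc_point(scores, is_stay, t)
        return math.hypot(fpr, 1.0 - tpr)
    return min(candidates, key=distance)


# Toy P_STAY scores: first four occasions are true STAY, last four NON-STAY.
scores = [0.9, 0.8, 0.7, 0.45, 0.6, 0.4, 0.3, 0.2]
is_stay = [True, True, True, True, False, False, False, False]
best = nearest_to_corner(scores, is_stay, [0.25, 0.5, 0.65, 0.75])
print(best, roc_point(scores, is_stay, best))
```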

Examining incorrect classification cases
To analyze reasons for incorrect classification cases, we reviewed them carefully one by one, which revealed some directions by AISACS that appeared to be appropriate from a medical perspective, even though they differed from the physician's recorded directions. We defined these as "clinically appropriate" directions. Moreover, we found that a characteristic type exists among "clinically appropriate" directions, which we defined as a "before physician" direction. In "before physician" directions, AISACS gave the same UP or DOWN direction as the physician, but gave it a week or so earlier than the physician did. "Before physician" directions are calculable automatically by checking up to three administration occasions earlier than the physician's direction. Although such "before physician" directions are counted as incorrect classifications in 3.2, they portray an interesting feature of AISACS. The remaining "clinically appropriate" directions, designated "clinically appropriate: other," were judged as such by board-certified doctors. The rates of "before physician" directions in validations LOPO, RDV_S, and RDV_K were, respectively, 9%, 7%, and 8% for ESAs and 5%, 5%, and 5% for ISs. The rates of "clinically appropriate: other" directions were, respectively, 8%, 8%, and 15% for ESAs and 9%, 6%, and 10% for ISs. Ratios for "correct classification," "clinically appropriate: before physician," and "clinically appropriate: other" are shown respectively in Figs. 6 and 7. Finally, gross rates of appropriate directions, which were the sum of "correct classification," "clinically appropriate: before physician," and "clinically appropriate: other," in validations LOPO, RDV_S, and RDV_K were 97%, 92%, and 95% for ESAs and 95%, 98%, and 95% for ISs.
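The automatic detection of "before physician" directions might look like the following sketch. It flags an AISACS UP or DOWN direction that disagrees with the physician on that occasion but matches a direction the physician gave within the next three occasions. The function name, the label "needs review" for the remaining mismatches, and the toy sequences are assumptions; the remaining mismatches are the ones that would go to manual review by physicians.

```python
from typing import List


def label_occasions(physician: List[str], aisacs: List[str],
                    lookahead: int = 3) -> List[str]:
    """Label each occasion as correct, before physician, or needs review."""
    labels = []
    for i, (p, a) in enumerate(zip(physician, aisacs)):
        if p == a:
            labels.append("correct")
        elif a in ("UP", "DOWN") and a in physician[i + 1:i + 1 + lookahead]:
            # The physician gave the same direction within `lookahead` occasions.
            labels.append("before physician")
        else:
            labels.append("needs review")
    return labels


physician = ["STAY", "STAY", "UP", "STAY", "STAY", "DOWN"]
aisacs = ["STAY", "UP", "STAY", "STAY", "DOWN", "DOWN"]
result = label_occasions(physician, aisacs)
print(result)
```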

Discussion
Four features of AISACS are particularly important. The first feature is what AI learns: reactions of living bodies or decisions of experienced physicians. Systems for predicting future Hb values of maintenance hemodialysis patients using AI technology have been reported as described in Section 1. We adopted a different approach by which AI learns from experienced physicians' dosage directions. Actually, experienced physicians do not calculate detailed values of vital reactions when deciding dosages. We selected four items of blood examination, their trends, and dosage histories as inputs by looking at the judgments reported by physicians.
A second feature is proper data selection and rectification. For example, "delayed decisions" appear frequently in real datasets because of mechanical difficulties and working time restrictions. In such cases, the decision dates were recorded with a lag of one or two weeks after the blood examination actually occurred. Such a nonessential difference between blood examination and actual decision dates confuses the training process of our neural network considerably. Therefore, we moved the dates of UP and DOWN directions to the dates of the examinations on which the decisions were actually based. Such a data rectification procedure functioned well to make the training process efficient, even though the training in this study was based on a small sample of data. Figure 8 presents how the correct classification rates for ESAs in S1 improved during AISACS development. In (a), a neural network with a few layers and no weighting techniques almost always yields the STAY direction. Then, by tuning the class weights, the correct classification rates RUP, RSTAY, and RDOWN became approximately equal to each other, as portrayed in Fig. 8(b). By increasing the number of layers and adding several measures in (c)-(e), such as class weights, reference to dosage histories, and two-step classification, the correct classification rates, especially for UP and DOWN, were improved considerably. When comparing the AUCs in raw data validation using data from hospitals S and K (RDV_S vs. RDV_K), the AUC from RDV_S was found to be higher than that from RDV_K because AISACS was trained using the dataset from Hospital S. Apparently, AISACS has some affinity to physicians at Hospital S. However, the "clinically appropriate" rates for Hospital K were sufficient, which suggests that AISACS has a certain degree of flexibility.
A third feature is the multi-class classification for ESAs. The direction timings of UP and DOWN are crucially important for appropriate anemia control. Therefore, we set a threshold between STAY and NON-STAY directions using the ROC curve based on probabilities calculated using the neural network. Then, NON-STAY is classified to UP or DOWN simply by comparison of their probabilities. It is possible to tune the frequency of decision changes by adjusting the threshold value. For example, if the threshold were set at a higher value, then AISACS would give more frequent UPs and DOWNs. This feature might be useful when AISACS is applied at different hospitals.
A fourth feature is that AISACS sometimes shows better timing than physicians for changing dosage directions, as described in Section 3.3. The appearance of "before physician" directions portrays an interesting feature of AISACS, which can help physicians to see the right timings to increase or decrease dosages. There is an additional interesting point here. As presented in Section 3.1, the AUC value from RDV_K for ESAs was considerably lower than that from RDV_S, which might be attributable to AISACS having learned the decisions of physicians at Hospital S. However, many of the incorrect classification cases were regarded as clinically correct through multiple doctors' reviews. Indeed, the AUC of RDV_K for ESAs is the lowest among the four raw data validations, whereas its "clinically appropriate" portion was the highest.
The present study has the following limitations. We conducted a retrospective analysis of patients from only two hospitals, involving only Japanese patients with a small sample size. Moreover, we did not evaluate the cost of ESAs or irregular cases such as patients with conditions aggravated by other diseases. Considering the endpoint approved by the IRBs for this study, it is difficult at the moment to ascertain whether AISACS can give better directions than physicians. A prospective, multi-center study is therefore needed, especially for confirmation of patient safety.

Conclusions
Preventing anemia is important to improve the prognosis and quality of life of hemodialysis patients. However, the pathophysiology associated with anemia is complicated. It requires a great deal of experience to control anemia adequately, and the number of physicians with such experience is insufficient. For this reason, we have constructed AISACS. The challenges and contributions to anemia control practices described in this paper are the following.
• Not-so-large training dataset: We have constructed proper data selection and rectification procedures that play important roles in enhancing machine learning efficiency with small datasets.
• Importance of appropriate timing of dosage changes: AISACS provides ternary directions for ESAs equipped with a threshold value to control NON-STAY and STAY decision tendencies.
• Widely diverse health conditions of dialysis patients: Patients have several legal and economic constraints. A feature that is unique to AISACS is that it learns dosage directions from physicians using no prediction model based on biochemistry or physiology.
In addition, an interesting feature of AISACS is that it sometimes produces "clinically appropriate" directions that are different from those of physicians, but which are nonetheless proper. Finally, across several validations, AISACS achieved high rates of 72%-87% for correct classification, which means giving the same direction as physicians on the same date, and 92%-98% for clinically appropriate classification, which also includes decisions that differ from those of physicians. These results attest to AISACS' promising possibilities for clinical applications after wider validation through a prospective, multi-center study.
Appendix A