Multi-biomarker is an early-stage predictor for progression of Coronavirus disease 2019 (COVID-19) infection

Coronavirus disease 2019 (COVID-19) has spread widely in the communities in many countries. Although most of the mild patients could be cured by their body's ability to self-heal, many patients quickly progressed to severe disease and had to undergo treatment in the intensive care unit (ICU). Thus, it is very important to effectively predict which patients with mild disease are more likely to progress to severe disease. A total of 72 patients hospitalized with COVID-19 in Shandong Provincial Public Health Clinical Center and 1141 patients included in the published papers were enrolled in this study. We determined that the combination of interleukin-6 (IL-6), Neutrophil (NEUT), and Natural Killer (NK) cells had the highest prediction accuracy (with 75% sensitivity and 95% specificity) for progression of COVID-19 infection. A binomial regression equation that accounted for a multiple risk score for the combination of IL-6, NEUT, and NK was also established. The multiple risk score is a good indicator for early stratification of mild patients into risk categories, which is very important for adjusting the treatment plan and preventing death.


Introduction
Coronavirus disease 2019 (COVID-19), which is thought to be related to severe acute respiratory syndrome (SARS), is caused by SARS-CoV-2 and has become a public health emergency of international concern [1]. COVID-19 is transmitted from human to human, mainly through droplet and contact routes. A wide range of presentations, from mild disease to severe symptoms, has been reported in patients with COVID-19 [2,3]. In general, the new coronavirus seems to have relatively low pathogenicity in mild patients, but it may result in certain sequelae and a high fatality rate among severe patients. Its R0 value can be as high as 5.7 [4]. Compared with influenza A in 2009 and Middle East Respiratory Syndrome (MERS) in 2014, COVID-19 is more infectious (i.e., its R0 value is greater). As of October 1, 2020, 34 million people had been infected worldwide, and the death toll had reached 1,014,958.
At present, COVID-19 has spread widely in the communities in many countries [5]. Because too many people have been infected, medical institutions have advised hospitalization for severe cases and home quarantine for mild cases. Although most of the mild patients could be cured by their body's ability to self-heal, many patients quickly progressed to severe disease and had to undergo treatment in the intensive care unit (ICU). Severe COVID-19 disease can cause great harm to the human body, has a high mortality rate, and may result in many sequelae, such as reduced pulmonary function and impaired nervous system function after treatment [6,7]. It is therefore important to effectively predict which patients with mild disease are more likely to progress to severe disease, so that they can be monitored closely and treated early.
Some biomarkers, such as lymphocyte count, D-dimer, and interleukin (IL)-6, have been reported as risk factors for the severity of COVID-19 infection, and most of these biomarkers can be used to distinguish patients with severe disease from normal persons or patients with mild disease [8]; however, the estimation of risk factors for COVID-19 disease progression in previous studies is not very robust. Since the risk of COVID-19 is affected by multiple biologically redundant factors, the relationships between these hematological biomarkers may contribute to predicting the progression of COVID-19. In many common diseases, polygenic risk scores built from multiple common variants provide better disease risk prediction than single rare or common mutations [9,10]. Previous studies have also shown that the collective effects of common single nucleotide polymorphisms (SNPs), each of which has a small effect size on its own, could improve risk prediction of many diseases [11,12]. A generalized linear model (GLM) is a good predictor that provides feature importance measures and excellent predictive accuracy [13]. In this study, the GLM was employed to determine the optimal combination of biomarkers for early prediction of the risk of patients with mild disease progressing to severe disease. As many countries slowly emerge from lockdown measures, early-stage predictors for progression of COVID-19 infection are of great value, because early intervention can protect the vulnerable population from COVID-19 and reduce the fear of disease, which is conducive to a return to normal socially productive activities. Therefore, the primary aim of this study was to evaluate whether the multi-biomarker is a better early-stage predictor than a single biomarker for progression of COVID-19 infection.

Data collection
This retrospective cohort study included 72 inpatients diagnosed with COVID-19 infection from January 29, 2020 to April 24, 2020 in the Shandong Provincial Public Health Clinical Center. The Research Ethics Commission of Shandong Provincial Public Health Clinical Center (2020XKYYEC-03) approved the study. Reverse transcription-polymerase chain reaction (RT-PCR) was used to confirm that all patients were positive for the new coronavirus nucleic acid. Patients were diagnosed according to the World Health Organization (WHO) interim guidance for COVID-19 and were divided into mild and severe groups. Patients with mild disease met the following criteria: (1) RT-PCR positive result for SARS-CoV-2 RNA, (2) Fever or other respiratory signs, (3) Viral pneumonia abnormality diagnosed on a typical CT image. Patients with severe disease met at least one of the following criteria: (1) Shortness of breath, respiratory rate (RR) ≥ 30 breaths/min, (2) Oxygen saturation ≤ 93%, or (3) PaO2/FiO2 ≤ 300 mmHg. The WHO/International Severe Acute Respiratory and Emerging Infection Consortium case record form for severe acute respiratory infections was used to extract the epidemiological, demographic, clinical, laboratory, treatment, and outcome data from the electronic medical records.

Laboratory procedures
Real-time PCR was used for laboratory confirmation of SARS-CoV-2 infection [2]. After clinical remission of symptoms, including fever, cough, and dyspnea, throat swab specimens were collected for SARS-CoV-2 PCR retesting every other day; however, only qualitative data were available. The criteria for discharge were absence of fever for at least 3 days, substantial improvement in both lungs on chest CT, clinical remission of respiratory symptoms, and two throat swab samples negative for SARS-CoV-2 RNA obtained at least 24 h apart. Routine hematological investigations were as follows: complete hematological count, coagulation profile, serum biochemical tests (including renal and liver function, creatine kinase, lactate dehydrogenase, and electrolytes), myocardial enzymes, and cytokines.

Search strategy
The systematic review was conducted in 2020. Searches were performed in the PubMed database, using combinations of related keywords based on MeSH terms (Table 1). A researcher (Z. Z), a professional clinician, searched the PubMed database for all published articles on COVID-19 up to March 9, 2020 using the following keywords: "2019-nCoV", "Coronavirus", "COVID-19", and "SARS-CoV-2". Another researcher (L. Y), a professional clinician with expertise in systematic reviews, independently repeated the first reviewer's search. Both searches were in complete agreement with each other. All steps of the searches were performed based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist. Searches were limited to papers published in English and Chinese.
Table 1. Keyword combinations used in the search
#1: "COVID-19 virus" and "Cytokines"
#2 (Entry Terms): "COVID 19 virus" or "COVID-19 virus" or "coronavirus disease" or "2019 virus" and "Cytokines"
Search: #1 or #2

Inclusion and exclusion criteria
The inclusion criteria were as follows: original articles, retrospective case series, and case reports of COVID-19 infection, including clinical features, epidemiological findings, laboratory and imageological examination, treatment options, or pathological studies. Exclusion criteria were as follows: non-availability of full text, no target observations, and other article types. Other article types included review articles, comments, and news.
The following data were extracted: Lymphocytes, CD3+ T cells, Neutrophils (NEUT), Platelet count (PLT), CD4+ T cells, CD8+ T cells, C-reactive protein (CRP), D-dimer, Natural Killer (NK) cells, White blood cell count (WBC), Fibrin degradation products (FDP), Thrombin time (TT), Activated partial thromboplastin time (APTT), IL-10, IL-6, and Platelet distribution width (PDW). After data extraction, the findings were summarized and reported in tables and figures according to the objectives of the study. Two researchers (Z. Z and L. Y, who were specialist physicians) reviewed all articles in detail. The researchers identified all articles presenting the clinical characteristics or pathologic studies of COVID-19 infection. The search results were submitted to a third party (M. Y. H, who was a professionally trained physician), who reviewed the discrepancies and made decisions in the event of disagreement (Table 2).
Table 2. Search strategy used in the meta-analysis for selecting patients for inclusion in the study
We appraised the study quality using the Cochrane Collaboration tool for assessing the risk of bias (ROB), including assessment of random sequence generation, allocation concealment, blinding (of interventions and outcome measurement or assessment), incomplete outcome data, selective reporting bias, and other potential sources of bias (e.g., age). For each criterion, we appraised the ROB as low, high, or unclear (e.g., insufficient details). Two authors (Z. Z and L. Y) independently assessed the study quality, and disagreements were resolved by consensus.

Data synthesis and meta-analysis
For continuous outcomes, the standardized mean difference (SMD) with the corresponding 95% CI was calculated. The Cochran Chi-square test and I² were used to assess the heterogeneity among studies. A fixed-effects model was used when I² was < 50%, while a random-effects model was selected when I² was > 50%. If there was statistical heterogeneity among the results, further sensitivity analysis was conducted to determine the source of heterogeneity. After significant clinical heterogeneity was excluded, the random-effects model was used for meta-analysis. Publication bias was evaluated using Egger's test (Table 3). P < 0.05 was considered to indicate statistical significance. All data were analyzed using the Review Manager 5.2 software.
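The pooling procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reproduction of Review Manager's calculations: the small-sample-corrected SMD (Hedges' g), the inverse-variance weights, and the DerSimonian-Laird estimate of between-study variance are standard textbook choices assumed here, and the function names `hedges_g` and `pool_smd` are hypothetical. The text does not say which model applies at exactly I² = 50%; the sketch assigns that case to the random-effects model.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (SMD, variance) for two groups, using Hedges' small-sample correction."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # correction factor for small samples
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool_smd(effects, variances):
    """Pool study-level SMDs; choose the model from I^2 as described in the text."""
    w = [1 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q and the I^2 statistic (in percent)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    if i2 < 50:
        model, est, se = "fixed", fixed, math.sqrt(1 / sum(w))
    else:
        # DerSimonian-Laird between-study variance (an assumed, common choice)
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w_re = [1 / (v + tau2) for v in variances]
        est = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
        model, se = "random", math.sqrt(1 / sum(w_re))
    return {
        "model": model,
        "estimate": est,
        "ci95": (est - 1.96 * se, est + 1.96 * se),
        "Q": q,
        "I2": i2,
    }
```

Given per-study means, standard deviations, and group sizes for one biomarker, `hedges_g` would be applied to each study and the resulting effects and variances passed to `pool_smd`.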

Statistical analysis
Continuous variables are presented as means and standard deviations, and categorical variables as percentages. Differences in biomarkers between patients with mild disease and patients with severe disease were examined using the Mann-Whitney test. Differences in the incidence of clinical disease between mild and severe patients were examined using Fisher's exact test. The sensitivity (true positive rate, TPR) and specificity (true negative rate, TNR) were then calculated using Prism 5 [12]. GLMs and the Pearson correlation test were performed in RStudio version 1.2.5033. GLM covariates were selected using binomial regression, and the best-fit subset was chosen using the Bayesian information criterion (BIC).
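As a brief illustration of the two quantities used for model assessment here, the following Python sketch computes sensitivity and specificity from binary labels and evaluates the standard BIC formula, k·ln(n) − 2·ln(L). The function names and the toy labels are hypothetical and are not study data; the study itself used Prism 5 and R for these calculations.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = severe, 0 = mild).

    Assumes at least one positive and one negative case in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Toy labels for illustration only (not taken from the study).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

In a best-subset search, `bic` would be evaluated for each candidate set of covariates after fitting the binomial model, and the subset with the lowest value retained.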

Baseline patient and disease characteristics
The study population included 72 hospitalized patients diagnosed with COVID-19 in Shandong Provincial Public Health Clinical Center before April 24, 2020. Among the 72 patients, 56 were categorized as having mild disease, and 16 were categorized as having severe disease. Patients in the severe disease group (n = 16) were significantly older (median age, 60 years vs. 47 years; p < 0.05) and were more likely to have clinical comorbidities, including hypertension (50.00% vs. 14.30%), diabetes (25.00% vs. 10.2%), coronary heart disease (25% vs. 6.1%), and cerebrovascular disease (31.3% vs. 6.1%) when compared with patients in the mild disease group (n = 56) (Table 4).

Hematological biomarkers could distinguish between mild and severe patients
Hematological biomarkers, including total hematological count, coagulation profile, serum biochemical tests, myocardial enzymes, lymphocyte subsets, and cytokine profiles, were examined. Twenty-eight hematological biomarkers showed a significant difference between mild and severe patients (Table 5). Then the true positive rate (TPR) of each hematological biomarker was calculated. Sixteen hematological biomarkers that showed good discriminatory capability were finally identified (P < 0.05, TPR > 30%) (Table 5 and Supplemental Figure 1). The discriminatory capability of these 16 hematological biomarkers was then confirmed by a systematic review of data published worldwide. Of the 178 articles retrieved, 85 were excluded as duplicate retrievals, 46 were excluded after reading the abstracts, and 29 were excluded after reading the full text, leaving 18 articles, comprising data from 1141 patients, for inclusion in the meta-analysis. The meta-analysis showed that most of these hematological biomarkers could effectively distinguish patients with mild disease from patients with severe disease.

Figure 1. The value of hematological biomarkers in predicting the progression of COVID-19.
Hematological biomarkers are more useful as predictors if they can identify patients with poor prognosis within a population with good prognosis. We therefore further assessed the ability of the 16 hematological biomarkers, which differed significantly between mild and severe patients in both our data and the systematic review, to predict the progression of COVID-19 infection.

None of the single hematological biomarkers could effectively predict disease progression in patients with mild disease
During hospitalization of the 72 patients, most of the hematological biomarkers were measured more than 3 times. Among these 72 patients, 4 had complete data covering the course from mild to severe disease status. We used these 4 patients to examine the predictive value of the 16 biomarkers. The values of 10 biomarkers, especially CRP, WBC, and FDP, differed significantly between these 4 patients in the mild state (poor prognosis) and mild patients with good prognosis (P < 0.05). However, none of the single biomarkers could effectively predict the progression of COVID-19 (Fig. 1).

Multiple-factor risk score had a better prediction effect for progression of COVID-19 than single hematological biomarkers
Hematological biomarkers, including complete hematological count, serum biochemical tests, coagulation profile, myocardial enzymes, lymphocyte subsets, and cytokine profiles, were examined using the venous blood obtained simultaneously. Thus, the combined effect of different biomarkers on COVID-19 infection could be analyzed. Here, combinations of biomarkers were first studied with respect to whether they could be used as early-stage predictive markers for progression of COVID-19 infection.
A GLM was used to analyze the interaction between the 19 biomarkers. Sixteen biomarkers were incorporated into the GLM as variables. The optimal model was a three-variable model, with the highest discrimination capability of 94.12% (P < 0.001) (Fig. 2A). These results indicated that IL-6, neutrophil granulocytes, and NK cells exhibited interaction effects on COVID-19 infection. A binomial regression equation based on IL-6, neutrophil granulocytes, and NK cells was then constructed to calculate the multiple-factor risk score for progression and survival of COVID-19. The binomial regression equation was
$$\text{score} = -\left(\frac{e^{z}}{1+e^{z}} - 0.5\right), \qquad z = -30.140 - 1.821 \times \mathrm{NEUT} + 10.519 \times \ln(\mathrm{NK}/\mu\mathrm{l}) + 0.305 \times \ln(\text{IL-6}).$$
The values of IL-6, NEUT, and NK cells in patients with mild disease were entered into the binomial regression equation; if the score was greater than 0, the patient with mild disease had a high probability of progressing to severe disease, and if the score was less than 0, the patient had a low probability of progressing to severe disease. Using the binomial regression equation, it was found that the combination of IL-6, neutrophil granulocytes, and NK cells showed a better discriminating ability than the optimal single biomarker (75% vs 25%) (Fig. 2B).
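The risk score can be evaluated with a short Python function. This is a sketch under stated assumptions: the parentheses in the published equation are read as score = −(e^z / (1 + e^z) − 0.5), the only grouping under which a threshold of 0 separates the two risk categories; the units of NEUT and IL-6 are not given in the text and are taken to be whatever the fitted model used; and the function names are hypothetical.

```python
import math

# Coefficients as printed in the binomial regression equation.
INTERCEPT = -30.140
COEF_NEUT = -1.821
COEF_LN_NK = 10.519
COEF_LN_IL6 = 0.305


def linear_predictor(neut, nk_per_ul, il6):
    """Linear term z of the regression; NK and IL-6 enter on the natural-log scale."""
    if nk_per_ul <= 0 or il6 <= 0:
        raise ValueError("NK and IL-6 must be positive to take logarithms")
    return (INTERCEPT
            + COEF_NEUT * neut
            + COEF_LN_NK * math.log(nk_per_ul)
            + COEF_LN_IL6 * math.log(il6))


def risk_score(neut, nk_per_ul, il6):
    """Multiple-factor risk score: -(logistic(z) - 0.5), ranging from -0.5 to 0.5."""
    z = linear_predictor(neut, nk_per_ul, il6)
    # Numerically stable logistic function for large |z|
    if z >= 0:
        p = 1.0 / (1.0 + math.exp(-z))
    else:
        ez = math.exp(z)
        p = ez / (1.0 + ez)
    return -(p - 0.5)


def is_high_risk(neut, nk_per_ul, il6):
    """Score > 0 indicates a high probability of progression to severe disease."""
    return risk_score(neut, nk_per_ul, il6) > 0
```

Under this reading, a positive score corresponds to z < 0, so the rule depends only on the sign of the linear term; subtracting 0.5 simply moves the decision threshold to 0.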

Discussion
WHO recommends that all patients with new coronavirus pneumonia be kept under observation in medical institutions. When it is impossible to keep patients under observation in a medical institution for objective reasons, home isolation and observation are also a viable strategy [14]. In many countries, only people with severe symptoms are tested for the new coronavirus and treated in hospitals. Most people with mild symptoms are advised to quarantine at home and are sent to a hospital for treatment if their condition becomes serious. Most patients with mild disease can be cured by their body's ability to self-heal, but some deteriorate quickly; once they develop severe disease, it can cause great harm to their body and may result in sequelae. The formation of scars results in decreased lung capacity. There has been no long-term follow-up investigation of severe COVID-19 patients after recovery, but in a follow-up of 71 SARS patients from 2003 to 2018, more than one-third of patients were found to have residual scars in their lungs [15]. Among 36 surviving MERS patients, about one-third also had long-term lung injury [16]. In addition, the scarring rate in patients with COVID-19 may eventually be higher than that in patients with SARS and MERS, because these diseases usually affect only one lung whereas COVID-19 frequently seems to affect both lungs, which also exacerbates the risk of lung scarring [17]. Impaired lung function caused by SARS-CoV-2 infection could negatively affect other organs (such as the heart, kidney, and brain), and the health effects of this infection may persist after the disease is cured.
According to the WHO report, about 10 to 15% of patients with mild and moderate disease will develop severe disease, and the disease course in some patients is rapid [18]. Evaluating the risk of developing severe disease among mild patients, isolating low-risk patients at home, and treating high-risk mild patients in medical institutions in a timely manner can not only reduce the burden on medical resources but also effectively reduce the proportion of severe patients as well as mortality. The value of the combination of IL-6, neutrophil granulocytes, and NK cells in predicting the progression of COVID-19 was evaluated using an independent dataset. If the multiple risk score is greater than 0, the mild patient has a high probability of progressing to severe disease; if the multiple risk score is less than 0, the mild patient has a low probability of progressing to severe disease.
Early identification and management of mild patients are essential to reduce the incidence of severe disease. It has been reported that many biomarkers show a significant difference between mild and severe patients [17,18]. A systematic review and meta-analysis suggested that elevated procalcitonin, CRP, D-dimer, and LDH and decreased albumin can be used for predicting severe outcomes in COVID-19 [19]. Our data also confirmed some of these hematologic markers. Furthermore, in order to make early predictions, the predictors presented in our study were obtained by comparing the hematologic markers between patients with mild symptoms who later developed severe disease and patients with mild symptoms who did not eventually develop severe symptoms. Sixteen biomarkers were selected, which showed a significant difference between mild and severe patients in Shandong Provincial Public Health Clinical Center and were confirmed by a systematic review of 18 published articles. In medical practice, sensitivity (TPR) and specificity (TNR) are often used to assess the accuracy and effectiveness of a biomarker in disease prediction. In this study, none of the 16 biomarkers showed good sensitivity and specificity in the training data. Hematological biomarkers at different time points were recorded in four mild patients whose symptoms worsened rapidly and became severe, which could be used to further verify the predictive value of these biomarkers. On comparing these 4 patients with poor prognosis with the mild patients with good prognosis, 10 biomarkers showed statistically significant differences (P < 0.05); however, it was not possible to distinguish these 4 patients from the population with good prognosis based on any single biomarker. This result suggests that although most of the biomarkers could distinguish between mild and severe disease, their ability to predict the progression of COVID-19 infection was insufficient.
Previous work has shown that combining multiple variants, each a single SNP with a small effect size, can improve risk prediction of many diseases [11,12]. Currently, there is no standard method to analyze and interpret the data of multiple biomarkers. A method known as GLM, which is conventionally used to understand genetic epistasis, was first used to identify biomarker relationships. Unlike genetic markers, which are fixed, hematological biomarkers vary over time. Therefore, in order to truly evaluate the correlation between different biomarkers, all blood samples were collected on the same day and at the same time, rather than performing different biomarker tests at different time points. A good predictive model should not be disturbed by clinical factors to a large extent. The biomarkers in this study, such as IL-6, NEUT, and NK cells, were selected from routine clinical test indices, and some important potential confounders, such as age and underlying disease, were considered; therefore, age and clinical symptoms had minimal influence on the predicted results. This study determined that the IL-6, NEUT, and NK cell combination, which showed good prediction of COVID-19, had 93% sensitivity and 100% specificity in the training data. In the independent test data, the IL-6, NEUT, and NK cell combination had a good predictive value, with 75% sensitivity and 95% specificity. Among these indicators, IL-6 is a good predictor and an effective target for drug therapy [20]. Our results suggested that the combination of IL-6, NEUT, and NK cells had better predictive ability than a single biomarker for progression of COVID-19 infection. Furthermore, the results of this study revealed that the combination of IL-6, NEUT, and NK cells had a good discriminating ability.
This study has some limitations. The study was limited by the number of patients who had complete data of hematological biomarkers, from mild status to severe status, because many patients had a very severe status on admission, and the combination of primary and secondary data in this study could have resulted in multiple biases. Therefore, the interacting biomarkers identified in this study need to be validated further in more mild patients with different outcomes and more samples from different countries or regions. Evaluation of these biomarkers in a longitudinal study is another way to address this limitation.

Data availability
All available data are published in the current manuscript.

Ethics approval and consent to participate
This study was approved by the Research Ethics Commission of Shandong Provincial Chest Hospital (2020XKYYEC-03), and written informed consent was obtained from all participants.