Deep Learning-based Artificial Intelligence Improves Accuracy of Error-prone Lung Nodules

Introduction: Early detection of lung cancer is one way to improve outcomes, so improving the detection of nodules on chest CT scans is important. Previous artificial intelligence (AI) modules have advanced rapidly and improved lung nodule detection on some datasets. However, they have a high false-positive (FP) rate, and their effectiveness in clinical practice has not yet been fully proven. We aimed to use AI assistance in reading CT scans to decrease FP. Materials and methods: CT images of 60 patients were obtained. Five senior doctors who were blinded to these cases participated in this study to detect lung nodules. Two doctors detected and labeled lung nodules manually without AI assistance. The other three doctors used AI assistance to detect and label lung nodules before manual interpretation. The AI program is based on a deep learning framework. Results: In total, 266 nodules were identified. For doctors without AI assistance, the FP was 0.617-0.650/scan and the sensitivity was 59.2-67.0%. For doctors with AI assistance, the FP was 0.067-0.2/scan and the sensitivity was 59.2-77.3%. The AI-assisted program significantly reduced FP. The error-prone characteristics of lung nodules were central location, ground-glass appearance, and small size. The AI-assisted program improved the detection of error-prone nodules. Conclusions: Detection of lung nodules is important for lung cancer treatment. When doctors face a large number of CT scans, error-prone nodules are a great challenge. The AI-assisted program improved the performance of detecting lung nodules, especially error-prone nodules.


Introduction
Lung cancer is one of the leading causes of cancer-related death worldwide [1]. Its prognosis depends largely on tumor stage. Surgery is the only curative treatment for patients with lung cancer [2], and patients with operable-stage disease have higher survival rates than those with metastatic disease. Therefore, early detection of lung cancer is important [1].
The diagnosis and treatment of early-stage lung cancer remain challenging. Chest CT is still the main tool used to diagnose lung cancer [3], and identifying lung nodules on chest CT scans may help physicians find early lung cancer. Many efforts have been made to detect lung nodules on chest CT scans for this purpose [1]. Improving the detection of lung nodules on chest CT scans may enable earlier diagnosis of lung cancer, which might improve the prognosis of patients and reduce medical costs. However, manually reviewing a large number of CT scans is a great burden: it demands sustained attention and is very time-consuming, making it prone to errors. Moreover, lung nodules are sometimes very difficult to detect, even for experienced doctors.
Artificial intelligence (AI) has advanced rapidly and achieved exciting results in imaging diagnosis. Many studies have therefore used AI to detect lung nodules, aiming to improve detection accuracy [1]. AI applications have great potential for improving the diagnosis of lung nodules on CT scans, and many programs have performed well in detecting lung nodules on certain datasets. However, their effectiveness in clinical practice has not been fully proven [1]; low sensitivity or a high false-positive rate limits their clinical application [1]. More research on clinical AI applications is therefore needed. Our current study aimed to use an AI-based computer-aided diagnostic system to help clinicians detect lung nodules on CT scans.

CT acquisition and reading
CT scan images of 60 cases were obtained for the detection of lung nodules. Chest CT scans were performed on a 64-slice GE LightSpeed scanner, with a lung-window slice thickness of 2.5 mm. Five senior doctors (each with more than 10 years of experience in reading chest CT scans), including three chest physicians, one chest surgeon, and one radiologist, participated in this study. All doctors were blinded to the cases. Two doctors (doctors 1 and 2) detected and labeled lung nodules manually without AI assistance, as in traditional practice. The other three doctors (doctors 3, 4, and 5) used AI assistance to detect and label lung nodules before manual interpretation. The study was approved by the Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation Institutional Review Board (Protocol Number: 09-X-007).

Setting and notation of the AI algorithm
Given a 3D lung CT scan image $I$ containing $N$ nodules, we denote the set of nodules as $\mathbb{H} = \{h_1, h_2, \ldots, h_N\}$, where $h_i$ encodes the spatial location and diameter of the $i$th nodule. In weakly supervised pulmonary nodule detection, these nodule-level annotations are not accessible during training. Instead, one typically observes only an image label from electronic medical records (EMR) during training, which indicates whether the CT scan contains nodules. In our work, we further consider auxiliary information from the EMR, including the number $k$ of nodules and the slice index of each nodule on the CT scan. Fig. 1 shows our proposed deep learning framework for weakly supervised pulmonary nodule detection. As shown in Fig. 1, a pre-trained nodule detector based on the 3D feature pyramid network (3D-FPN) [1] is applied to extract a preliminary prediction (i.e., features and bounding box location) for each nodule. These outputs can be viewed as primitive nodule proposals, and the weak EMR labels described above (i.e., image label $y$, nodule number $k$, and nodule slice index $z$) are then used to guide the learning of our framework.
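The text does not specify exactly how the EMR fields ($y$, $k$, $z$) are turned into training signals. As a purely illustrative sketch, assuming each proposal is summarized by the range of axial slices its box covers plus a detector score, one simple (hypothetical) rule marks at most $k$ proposals as pseudo-positives: for each recorded slice index, the highest-scoring unlabeled proposal that spans it. The `Proposal`, `EMRLabel`, and `assign_pseudo_labels` names are ours, not the authors'.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proposal:
    z_min: int    # first axial slice covered by the proposal's bounding box
    z_max: int    # last axial slice covered by the proposal's bounding box
    score: float  # confidence from the pre-trained 3D-FPN detector


@dataclass
class EMRLabel:
    y: int                    # image label: 1 if the scan contains nodules
    k: int                    # number of nodules recorded in the EMR
    slice_indices: List[int]  # recorded slice index z of each nodule


def assign_pseudo_labels(proposals: List[Proposal], emr: EMRLabel) -> List[int]:
    """Hypothetical rule: for each EMR slice index (at most k of them), mark the
    highest-scoring unlabeled proposal that spans that slice as positive."""
    labels = [0] * len(proposals)
    if emr.y == 0:
        return labels  # nodule-free scan: every proposal is a pseudo-negative
    order = sorted(range(len(proposals)), key=lambda i: proposals[i].score, reverse=True)
    for z in emr.slice_indices[: emr.k]:
        for i in order:
            p = proposals[i]
            if labels[i] == 0 and p.z_min <= z <= p.z_max:
                labels[i] = 1
                break  # one proposal per recorded nodule
    return labels


proposals = [
    Proposal(10, 14, 0.9),
    Proposal(12, 16, 0.6),
    Proposal(40, 44, 0.8),
    Proposal(70, 72, 0.3),
]
print(assign_pseudo_labels(proposals, EMRLabel(y=1, k=2, slice_indices=[13, 42])))
# prints [1, 0, 1, 0]
```

Capping positives at $k$ keeps the rule consistent with the recorded nodule count, while the slice index localizes each positive along the z-axis only; in-plane location is left to the detector.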

Pulmonary nodule detection with supervision
Previously, multiple instance learning (MIL) [4] has been applied to object detection in weakly supervised settings, in which only image-level labels are observed during training. Without collecting any instance-level labels, the model aims to estimate a set of nodule proposals $\hat{\mathbb{H}} = \{\hat{h}_1, \ldots, \hat{h}_M\}$ that can be properly associated with the image-level label $y$. For each proposal $\hat{h}_i$, a pooling operation is applied to extract the corresponding feature maps from the 3D-FPN backbone detector, denoted as $F = \{f_i\}_{i=1}^{M}$. In a previous study [5], fully connected layers with a rectified linear unit (ReLU) activation function were deployed to infer the confidence score of each object proposal. Finally, to match the ground-truth image-level label, a number of techniques have been proposed to aggregate the predicted scores $s_i$ of the proposals in $\hat{\mathbb{H}}$ [6][7][8]. In our work, we followed this approach and used the maximum operator as the MIL pooling function:

$$\hat{y}_{\mathrm{MIL}} = \max_{1 \le i \le M} s_i,$$

where the subscript MIL denotes the MIL branch and $s_i$ is the predicted score of the $i$th proposal in the proposed learning framework. Note that we fed the extracted visual features into our weakly supervised pulmonary nodule detection module without adjusting the weights of the original ResNet-18 or FPN backbones. This allowed us to focus on the network modules for predicting and re-ranking the extracted nodule proposals under different forms of weak supervision.
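The MIL branch described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' network: the two-layer head sizes, the sigmoid on the final score, and the binary cross-entropy loss against the image label are our assumptions; only the FC layers with ReLU and the max pooling over proposals come from the text.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def mil_max_pooling(features, w1, b1, w2, b2):
    """features: (M, D) array with one pooled feature vector f_i per proposal.

    Returns the per-proposal scores s_i and the image-level prediction max_i s_i.
    """
    hidden = relu(features @ w1 + b1)   # FC layer + ReLU: (M, H)
    scores = sigmoid(hidden @ w2 + b2)  # FC layer + sigmoid (assumed): (M,)
    return scores, scores.max()         # MIL pooling with the max operator


def image_level_bce(y_hat: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between the pooled prediction and the image label y."""
    y_hat = float(np.clip(y_hat, eps, 1.0 - eps))
    return -(y * np.log(y_hat) + (1 - y) * np.log(1.0 - y_hat))


rng = np.random.default_rng(0)
features = rng.normal(size=(5, 8))  # 5 proposals, 8-dim features (toy sizes)
w1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0
scores, y_hat = mil_max_pooling(features, w1, b1, w2, b2)
loss = image_level_bce(y_hat, y=1)
```

With max pooling, the gradient of the image-level loss flows only through the top-scoring proposal, which is one reason weak supervision of this kind tends to favor re-ranking proposals rather than refining all of them.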
The data used for pre-training ResNet-18 came from the open lung nodule dataset of the Lung Image Database Consortium image collection (LIDC-IDRI) (https://wiki.cancerimagingarchive.net/plugins/servlet/mobile?contentId=1966254#content/view/1966254), which includes 1018 CT volumes from 1010 different patients. A range of scanner manufacturers and models was represented (670 scans from seven different GE Medical Systems LightSpeed scanner models, 74 scans from four different Philips Brilliance scanner models, 205 scans from five different Siemens Definition, Emotion, and Sensation scanner models, and 69 scans from Toshiba Aquilion scanners) [9]. After training, we tested the model on private hospital data comprising 60 CT volumes from 60 patients.

Reference standard
Lung nodules identified by the AI and by a majority of the doctors (at least three of the five) were used as the reference standard. Sensitivity was defined as the proportion of reference-standard nodules that a given doctor labeled. Nodules labeled by a doctor but not confirmed by the majority consensus were regarded as false positives (FP) [10].
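The consensus rule and the two metrics can be written out as a short sketch. It assumes that every doctor's marks and the AI's marks have already been matched to shared nodule IDs (the matching step, e.g. by location, is not described in the text and is omitted here), and it reads "interpreted by AI and most doctors" as requiring both the AI flag and at least three doctor votes. The function names and toy IDs are hypothetical.

```python
from typing import Dict, Set, Tuple


def reference_standard(ai: Set[str], doctors: Dict[str, Set[str]], min_doctors: int = 3) -> Set[str]:
    """Nodules flagged by the AI and by at least `min_doctors` doctors."""
    votes: Dict[str, int] = {}
    for labels in doctors.values():
        for nodule in labels:
            votes[nodule] = votes.get(nodule, 0) + 1
    return {n for n, v in votes.items() if v >= min_doctors and n in ai}


def reader_metrics(labels: Set[str], reference: Set[str], n_scans: int) -> Tuple[float, float]:
    """Return (sensitivity, false positives per scan) for one doctor."""
    true_pos = len(labels & reference)
    false_pos = len(labels - reference)  # labeled, but outside the consensus
    sensitivity = true_pos / len(reference) if reference else float("nan")
    return sensitivity, false_pos / n_scans


ai = {"a", "b", "c", "d"}
doctors = {
    "doctor 1": {"a", "b", "x"},
    "doctor 2": {"a", "c"},
    "doctor 3": {"a", "b", "c"},
    "doctor 4": {"a", "b", "y"},
    "doctor 5": {"b", "c", "d"},
}
ref = reference_standard(ai, doctors)  # {"a", "b", "c"}
print(reader_metrics(doctors["doctor 1"], ref, n_scans=2))
# prints (0.6666666666666666, 0.5)
```

FP is normalized per scan rather than per nodule, matching the "FP/scan" figures reported in the results.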

Analysis
The overall sensitivity and FP were analyzed. The influence of nodule location (upper, middle, or lower; central or peripheral), size, and texture on CT was also analyzed.
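A subgroup analysis of this kind amounts to computing the same two metrics within each stratum. The sketch below is a hypothetical helper, not the authors' analysis code: it assumes each nodule ID (including a doctor's false-positive marks) carries one subgroup label, and it normalizes FP by the total number of scans so that per-subgroup FP/scan values add up to the overall rate.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def stratified_metrics(
    labels: Set[str], reference: Set[str], attributes: Dict[str, str], n_scans: int
) -> Dict[str, Tuple[float, float]]:
    """Sensitivity and FP/scan per subgroup for one doctor.

    `attributes` maps every nodule ID (reference nodules and false positives)
    to a subgroup label such as "central", "<0.5 cm", or "GGO".
    """
    ref_by_group: Dict[str, Set[str]] = defaultdict(set)
    for n in reference:
        ref_by_group[attributes[n]].add(n)
    fp_by_group: Dict[str, int] = defaultdict(int)
    for n in labels - reference:
        fp_by_group[attributes[n]] += 1
    out = {}
    for group in set(ref_by_group) | set(fp_by_group):
        ref = ref_by_group[group]
        # NaN when a subgroup has false positives but no reference nodules
        sens = len(labels & ref) / len(ref) if ref else float("nan")
        out[group] = (sens, fp_by_group[group] / n_scans)
    return out


attributes = {"a": "central", "b": "peripheral", "c": "central", "x": "central"}
result = stratified_metrics({"a", "b", "x"}, {"a", "b", "c"}, attributes, n_scans=2)
```

Here the doctor finds one of two central reference nodules and adds one central false positive, giving a central sensitivity of 0.5 at 0.5 FP/scan, while the single peripheral nodule is found with no false positives.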

Demographic characteristics
The demographic data of the patients are summarized in Table 1. The mean age was 62.6±11.0 years. The mean body height was 159.8±8.8 cm and the mean body weight was 61.1±13.6 kg. Of the 60 patients, 27 were male (45%) and 33 were female (55%). Most patients had never smoked (N=44, 73.3%); there were 4 (6.7%) current smokers and 12 (20.0%) former smokers.

Overall nodular detection
There were 266 nodules in 60 patients (Fig. 2). For doctors without AI assistance, the FP was 0.617-0.650/scan (mean 0.634, 95% CI 0.586-0.680) and the sensitivity was 59.2-67.0% (mean 63.1%, 95% CI 52.0-74.1%). For doctors with AI assistance, the FP was 0.067-0.2/scan (mean 0.122, 95% CI 0.000-0.261), and the sensitivity was 59.2-77.3% (mean 69.8%, 95% CI 50.9-88.6%).

Figure 1. Proposed deep learning framework for weakly supervised pulmonary nodule detection. The 3D CNN is a pre-trained, fully supervised detector that serves as the detector backbone to extract nodule proposals and features in weakly supervised settings. In addition to using image-level labels to predict pseudo labels for each proposal, the model observes nodule numbers and slice index information from the EMR to guide the learning process. Abbreviations: 3D CNN, three-dimensional convolutional neural network; NMS, non-maximum suppression; RoI pooling, region of interest pooling; FC layer, fully connected layer; ReLU, rectified linear unit; MIL, multiple instance learning; P.S., pseudo labels.

Left, central and right lung fields and nodular detection
There were 60 nodules in the left lung fields, 150 in the central lung fields, and 56 in the right lung fields (Fig. 3A). FP and sensitivity for detecting lung nodules in the left, central, and right lung fields are summarized in Table 2. For the left lung field without AI assistance, the FP was 0.083-0.167/scan (mean 0.125, 95% CI 0.006-0.243) and the sensitivity was 63.9-77.9% (mean 70.9%, 95% CI 51.1-90.6%) (Fig. 3B). With AI assistance, the FP was 0-0.033/scan (mean 0.011, 95% CI 0.000-0.049), and the sensitivity was 60.

Upper, middle and lower lung fields and nodular detection
There were 83 nodules in the upper lung fields, 123 in the middle lung fields, and 50 in the lower lung fields (Fig. 4A). FP and sensitivity for detecting lung nodules in the upper, middle, and lower lung fields are summarized in Table 2.

Discussion
The current study has several important findings. Without AI assistance, doctors had more FP and lower sensitivity for nodules in the central and middle lung fields. For small lung nodules (less than 0.5 cm), the doctors' sensitivity was poor. Regarding nodule texture, doctors had more FP for ground-glass opacities (GGOs). With AI assistance, the overall FP rate of the doctors' interpretations decreased and their sensitivity increased, and their accuracy for the error-prone locations and features described above also improved.
Intuitively, very small nodules are easily missed by both doctors and AI. Previous studies have found that small size is the most common cause of missed nodules on CT scans [11]. Del Ciello et al. suggested that a small diameter (<7 mm) is one of the causes of missed diagnosis [11]. In our study, tiny nodules (<0.5 cm) were very difficult to detect manually on chest CT scans, with a sensitivity of only 3.2%. With AI assistance, the sensitivity for tiny nodules improved greatly. Several studies have also demonstrated that AI, used as a second reader, significantly increases sensitivity in the identification of lung nodules [12].
A central location is another factor that leads to missed lung nodules [11]. Del Ciello et al. reported a disproportionately high miss rate for nodules in the central area [11]. Deveraj et al. also found that hilar nodules, which lie in blind spots on CT scans, are one cause of missed lung cancers [13]. Some normal structures resemble nodules on CT images, and the central area, especially near the hilar region, contains more of them. It is sometimes challenging for doctors to distinguish lung nodules from pulmonary vessels, bones, and other structures [11], which complicates interpretation and reduces accuracy. When normal structures are mistaken for nodules, FP increases; conversely, when true nodules are mistaken for normal structures, sensitivity decreases. Differentiating normal tissues from nodules is therefore crucial for reducing FP in an automatic lung nodule detection scheme [1]. With our AI-assisted program, the FP rates decreased markedly.
Nodule characteristics also affect the detection of lung nodules on CT scans. Li et al. previously showed that lung cancers missed on CT screening are very subtle and appear as small, faint nodules [14]. Del Ciello et al. also showed that blurred and unclear margins contribute to missed lung nodules [11]. Benzakoun et al. suggested that a ground-glass component may hinder software detection because of its small attenuation difference from the surrounding parenchyma [15]. They reported a sensitivity of 72% for part-solid nodules, much higher than the 28% for pure ground-glass nodules [15].
Compared with previous studies, our study showed a balance between sensitivity and FP. Although many previous studies have reported high sensitivity, their FP rates were also quite high. Cui et al. built a 50-layer deep neural network trained on a large multicenter database; their deep learning algorithm showed a sensitivity of 91.0% but 2 FPs/case [10]. The computer-aided detection algorithm proposed by Ali et al. showed an overall accuracy of 64.4% (sensitivity 58.9%, specificity 55.3%, PPV 54.2%, and NPV 60.0%) [16]. Cao et al. achieved a sensitivity of 90% at 1 FP/scan with their three-dimensional convolutional neural network (3D-CNN) [17]. Dou et al. also used 3D CNNs and reported a sensitivity of 90.7% at 4 FPs/case [18], similar to Setio et al., who used multiview convolutional networks to obtain a sensitivity of 90.1% at 4 FPs/case [19]. These studies showed high sensitivity, but their FP rates were also quite high, at approximately 1-4 FPs/scan [10,16,17,19].
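To make the FP burden concrete, the quoted operating points can be converted into the number of false-positive marks a reader would have to dismiss over a cohort the size of ours. This is a hypothetical back-of-the-envelope illustration only: it treats "per case" as "per scan", applies rates measured on other datasets to a 60-scan cohort, and ignores that the sensitivities differ, so it is not a head-to-head comparison. Ali et al. [16] are omitted because no FP/scan figure is quoted.

```python
COHORT_SCANS = 60  # number of test scans in this study


def expected_fp(fp_per_scan: float, n_scans: int = COHORT_SCANS) -> float:
    """Expected number of false-positive marks over n_scans at a given FP/scan rate."""
    return fp_per_scan * n_scans


# FP/scan operating points as quoted in the text (each from a different dataset).
quoted_rates = {
    "Cao et al. [17]": 1.0,
    "Cui et al. [10]": 2.0,
    "Dou et al. [18]": 4.0,
    "Setio et al. [19]": 4.0,
    "this study, doctors without AI (mean)": 0.634,
    "this study, doctors with AI (mean)": 0.122,
}

for name, rate in quoted_rates.items():
    print(f"{name}: about {expected_fp(rate):.0f} FP marks per {COHORT_SCANS} scans")
```

Over 60 scans this works out to roughly 60 to 240 extra marks at the quoted 1-4 FP/scan, compared with about 38 for unassisted and about 7 for AI-assisted reading in this study.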
The balance between sensitivity and FP is important. Most previous computer-aided detection models for CT screening of lung nodules showed both high sensitivity and high FP. Modern technologies allow doctors to detect focal lung lesions more efficiently. However, FP is a critical problem in lung nodule detection because it leads to unnecessary follow-up tests and expenditures [16]. It also increases patient suffering and may even lead to unnecessary invasive procedures, with their attendant risks, to confirm the diagnosis. In addition, frequent, regular follow-up chest CT scans may lead to radiation-induced cancer [20]. Previous studies also showed that the high FP of computer-aided detection software is a major limitation to its wider use [11]. Consequently, most computer-aided detection systems have not shown evidence of benefit in real-world practice [20]. FP reduction is therefore a critical issue in AI-based lung nodule detection. In the current study, our AI-assisted model greatly reduced the FP rate and improved the performance of doctors.

Limitations of the study
Our AI-assisted model improved the accuracy of reading CT scans and the work efficiency of doctors. However, this study has some limitations. First, this research focused on the detection of lung nodules, not on the differentiation of benign and malignant nodules, and we did not evaluate the model's performance in diagnosing lung cancer. Further research is therefore required before such an assistant system can be used to confirm the diagnosis of lung cancer. Nevertheless, we believe that the early detection of small lung nodules can still help detect lung cancer early. Second, the lack of a gold standard is a common problem in AI detection of lung nodules [10]. Biopsy can confirm the nature of a nodule, but this is not feasible in most cases. In this study, we used the consensus of most experts as the reference standard, an approach similar to that of previous studies on AI detection of lung nodules [10]. Third, the incidence of lung nodules varies with characteristics of the study population, such as race, age, and smoking status, so the accuracy of AI differs across populations [10,20]. Our current system still needs to be validated in other ethnic groups.

Conclusions
Missed lung cancer has potentially serious medicolegal implications for doctors. Misdiagnosis on CT scans is related to specific characteristics of the undetected lesion, such as small size, ground-glass appearance, and central location. Manually reviewing a large number of CT scans is a great burden for doctors: it demands sustained attention, is very time-consuming, and is prone to errors. In our study, the AI-assisted program reduced the misinterpretation of lung nodules with error-prone characteristics.