Int J Med Sci 2020; 17(3):280-291. doi:10.7150/ijms.37134

Research Paper

Machine Learning in Prediction of Second Primary Cancer and Recurrence in Colorectal Cancer

Wen-Chien Ting1,2, Yen-Chiao Angel Lu3, Wei-Chi Ho4✉, Chalong Cheewakriangkrai5, Horng-Rong Chang6,7✉, Chia-Ling Lin8

1. Division of Colorectal Surgery, Department of Surgery, Chung Shan Medical University Hospital, Taiwan
2. Institute of Medicine, Chung Shan Medical University, Taiwan
3. School of Nursing, Chung Shan Medical University, Taiwan
4. Department of Gastroenterology, Jen-Ai Hospital, Taichung, Taiwan
5. Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Thailand
6. Division of Nephrology, Department of Internal Medicine, Chung Shan Medical University Hospital, Taiwan
7. School of Medicine, Chung Shan Medical University
8. Department of Nutrition, Jen-Ai Hospital, Taichung, Taiwan

This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/). See http://ivyspring.com/terms for full terms and conditions.
Citation:
Ting WC, Lu YCA, Ho WC, Cheewakriangkrai C, Chang HR, Lin CL. Machine Learning in Prediction of Second Primary Cancer and Recurrence in Colorectal Cancer. Int J Med Sci 2020; 17(3):280-291. doi:10.7150/ijms.37134. Available from http://www.medsci.org/v17p0280.htm


Abstract

Background: Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide. Recurrence of CRC (Re) and onset of a second primary malignancy (SPM) are important indicators in treating CRC, but the onset of an SPM is often difficult to predict. Therefore, we used machine learning to identify risk factors that affect Re and SPM.

Patients and Methods: CRC patients were identified from the cancer registry databases of three medical centers. All patients were classified by recurrence status (Re or no recurrence, NRe) and by SPM status (SPM or no SPM, NSPM). Two classifiers, A Library for Support Vector Machines (LIBSVM) and Reduced Error Pruning Tree (REPTree), were applied to analyze the relationship between clinical features and the Re and/or SPM category by constructing optimized models.

Results: When Re and SPM were evaluated separately, the accuracy of LIBSVM was 0.878 and that of REPTree was 0.622. When Re and SPM were evaluated in combination, the precision of models for SPM+Re, NSPM+Re, SPM+NRe, and NSPM+NRe was 0.878, 0.662, 0.774, and 0.778, respectively.
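To make the reported metrics concrete, the following minimal Python sketch shows how the four combined categories can be formed from the two binary outcomes and how overall accuracy and per-class precision are conventionally computed. The function names and the toy labels are hypothetical illustrations, not the study's code or data, and the sketch does not reproduce the LIBSVM or REPTree models.

```python
from collections import Counter


def combined_label(recurrence: bool, second_primary: bool) -> str:
    """Map the two binary outcomes onto one of the four combined categories."""
    spm = "SPM" if second_primary else "NSPM"
    rec = "Re" if recurrence else "NRe"
    return f"{spm}+{rec}"


def accuracy(y_true, y_pred):
    """Fraction of all cases whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_per_class(y_true, y_pred):
    """Per predicted class: correct predictions / all predictions of that class."""
    predicted = Counter(y_pred)
    true_pos = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {label: true_pos[label] / n for label, n in predicted.items()}


# Toy, made-up outcomes (NOT study data): (recurrence, second primary) per patient.
true_outcomes = [(True, True), (True, False), (False, True),
                 (False, False), (False, False), (True, False)]
pred_outcomes = [(True, True), (True, False), (False, False),
                 (False, False), (False, False), (False, False)]

y_true = [combined_label(r, s) for r, s in true_outcomes]
y_pred = [combined_label(r, s) for r, s in pred_outcomes]

print("accuracy:", accuracy(y_true, y_pred))
print("precision:", precision_per_class(y_true, y_pred))
```

Precision is defined only for classes that the model actually predicts, so a category that never appears among the predictions (here SPM+NRe) has no entry in the result.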

Conclusions: Machine learning can be used to rank factors affecting tumor Re and SPM. In clinical practice, routine checkups are necessary to ensure early detection of new tumors. The success of prediction and early detection may be enhanced in the future by applying “big data” analysis methods such as machine learning.

Keywords: colorectal cancer, second primary malignancy, machine learning