Diagnostic value of machine learning methods in predicting the diagnosis of prostate cancer
Abstract:
Objective: To establish five machine learning models (decision tree, k-nearest neighbor, naive Bayes, random forest, and support vector machine) that combine the multiparametric MRI Prostate Imaging Reporting and Data System (PI-RADS) v2.1 score with clinical data, and to evaluate the diagnostic value of these models for prostate cancer.
Methods: A total of 242 patients who underwent MR examination in our hospital and had pathological results were analyzed retrospectively. PI-RADS v2.1 score, age, total prostate-specific antigen (PSA), free PSA, free PSA ratio, prostate volume, and PSA density (PSAD) were entered into the five machine learning models for diagnosis. The diagnostic value of each model was evaluated by F1 score and ROC curve, and the relative importance of each feature variable was calculated.
Results: The random forest model had the largest AUC (0.93). The AUCs of the decision tree and naive Bayes models were also high (0.86 and 0.87, respectively), and the support vector machine had the lowest AUC (0.55). The F1 score was highest for the random forest model, followed in order by naive Bayes, decision tree, and k-nearest neighbor, and was lowest for the support vector machine. When feature importance was calculated with the random forest and decision tree models, the PI-RADS score accounted for the largest proportion in both, followed by PSAD and prostate volume; age contributed the least to model classification.
Conclusion: The random forest, naive Bayes, and decision tree classification models all perform well in the predictive diagnosis of prostate cancer. Random forest is the best of the five machine learning models, and PI-RADS v2.1 and PSAD show the most prominent feature importance.
Key words:
- decision tree
- k-nearest neighbor
- naive Bayes
- random forest
- support vector machine
- prostate cancer
- predictive diagnosis
- PI-RADS v2.1
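
The model comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the 242-patient cohort, and the feature names, the 75/25 split, and all hyperparameters are hypothetical choices that the source does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names for the seven input variables listed in the abstract.
FEATURES = ["pi_rads", "age", "tpsa", "fpsa", "fpsa_ratio", "volume", "psad"]

# Synthetic stand-in data (assumption): NOT the study cohort.
X, y = make_classification(
    n_samples=242, n_features=len(FEATURES), n_informative=4, random_state=0
)
# The split ratio is an assumption; the abstract does not state one.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Default-ish hyperparameters, chosen only for illustration.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(probability=True, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard labels for F1
    y_score = model.predict_proba(X_test)[:, 1]  # class-1 probability for ROC AUC
    results[name] = {
        "f1": f1_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
    }

# Impurity-based feature importance from the fitted random forest.
importances = dict(zip(FEATURES, models["random_forest"].feature_importances_))

for name, metrics in results.items():
    print(f"{name:14s} F1={metrics['f1']:.2f} AUC={metrics['auc']:.2f}")
```

Because the data here are synthetic, the printed scores and importances say nothing about the study's results; the sketch only shows how F1, ROC AUC, and feature importance might be obtained for the five model families.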
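Table 1. Results of the five machine learning models in the diagnosis of prostate cancer

| Metric | Decision tree | K-nearest neighbor | Naive Bayes | Random forest | Support vector machine |
| --- | --- | --- | --- | --- | --- |
| Specificity | 0.83 | 0.83 | 0.90 | 0.97 | 1 |
| Sensitivity | 0.9 | 0.7 | 0.85 | 0.9 | 0.1 |
| AUC | 0.86 | 0.76 | 0.87 | 0.93 | 0.55 |
| Accuracy | 0.86 | 0.78 | 0.88 | 0.94 | 0.63 |
| Precision | 0.85 | 0.77 | 0.87 | 0.94 | 0.81 |
| Recall | 0.86 | 0.76 | 0.87 | 0.93 | 0.55 |
| F1 score | 0.86 | 0.77 | 0.88 | 0.94 | 0.53 |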
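
The recall row in Table 1 differs from the sensitivity row, which suggests it is averaged over both classes rather than reported for the cancer class alone. The short check below compares the reported recall with the unweighted mean of sensitivity and specificity (macro-averaged recall). This reading is an inference from the rounded table values, not something the source states, and the helper name `macro_recall` is introduced here only for illustration.

```python
# Values copied from Table 1 (rounded to two decimals in the source).
table = {
    "decision_tree": {"specificity": 0.83, "sensitivity": 0.90, "recall": 0.86},
    "knn":           {"specificity": 0.83, "sensitivity": 0.70, "recall": 0.76},
    "naive_bayes":   {"specificity": 0.90, "sensitivity": 0.85, "recall": 0.87},
    "random_forest": {"specificity": 0.97, "sensitivity": 0.90, "recall": 0.93},
    "svm":           {"specificity": 1.00, "sensitivity": 0.10, "recall": 0.55},
}


def macro_recall(sensitivity: float, specificity: float) -> float:
    """Unweighted mean of positive-class recall and negative-class recall."""
    return (sensitivity + specificity) / 2


# Print the derived value next to the reported one for each model.
for name, row in table.items():
    derived = macro_recall(row["sensitivity"], row["specificity"])
    print(f"{name:14s} derived={derived:.3f} reported={row['recall']:.2f}")
```

Under this assumption, the derived values agree with the reported recall to within rounding for all five models. It also gives one way to read the support vector machine column: a specificity of 1 combined with a sensitivity of 0.1 averages to 0.55.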