Diagnostic value of machine learning methods in predicting the diagnosis of prostate cancer

Dong Bai¹; Hao Wang²; Lu Li³; Honglin Wang³

doi:10.12122/j.issn.1674-4500.2020.02.02

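1. Department of Radiology, Aerospace Central Hospital (Aerospace Clinical College of Medicine, Peking University), Beijing 100049, China
2. Unit 32081, Beijing 100049, China
3. School of Information Engineering, Yancheng Teachers University, Yancheng 224002, China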

Funding: National Natural Science Foundation of China (61602400)

About the first author: Dong Bai, M.S., E-mail: flybaicai@126.com

Corresponding author: Honglin Wang, Associate Professor, E-mail: whonglin@126.com

Publication history
- Received: 2020-05-07
- Published: 2020-04-15

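Abstract

Objective: To establish five machine learning models (decision tree, K-nearest neighbor, naive Bayes, random forest, and support vector machine) that combine the multiparametric MRI Prostate Imaging Reporting and Data System (PI-RADS) v2.1 score with clinical data, and to evaluate their value in diagnosing prostate cancer.

Methods: We retrospectively analyzed 242 patients who underwent MR examination at our hospital and had pathological results. The PI-RADS v2.1 score, age, total prostate-specific antigen (tPSA), free PSA (fPSA), free-to-total PSA ratio, prostate volume, and PSA density (PSAD) were entered into the five machine learning models for diagnosis. Diagnostic value was evaluated with the F1 score and receiver operating characteristic (ROC) curves, and the relative importance of each feature variable was calculated.

Results: The random forest model had the largest area under the ROC curve (AUC = 0.93); the decision tree and naive Bayes models also had high AUCs (0.86 and 0.87, respectively), and the support vector machine performed worst (AUC = 0.55). The random forest model also had the highest F1 score, followed in order by naive Bayes, decision tree, and K-nearest neighbor, with the support vector machine lowest. In the feature importances computed from both the random forest and decision tree models, the PI-RADS score accounted for the largest share, followed by PSA density and prostate volume, while age contributed least to classification.

Conclusion: The random forest, naive Bayes, and decision tree classifiers performed better than the K-nearest neighbor and support vector machine models in predicting a prostate cancer diagnosis. Random forest was the best of the five models, and the PI-RADS v2.1 score and PSA density showed the most prominent feature importance.

Keywords:
- decision tree
- K-nearest neighbor
- naive Bayes
- random forest
- support vector machine
- prostate cancer
- predictive diagnosis
- PI-RADS v2.1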
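Two of the seven inputs listed in the Methods are derived from the others. A minimal sketch, assuming the conventional definitions (PSA density = total PSA / prostate volume, in ng/mL per mL of gland; free-to-total ratio = fPSA / tPSA). The helper name `derived_psa_features` and the example values are hypothetical, and the abstract does not state how prostate volume was measured.

```python
def derived_psa_features(tpsa: float, fpsa: float, volume_ml: float) -> dict[str, float]:
    """Hypothetical helper: PSA density and free-to-total PSA ratio.

    tpsa, fpsa: serum total and free PSA in ng/mL; volume_ml: prostate volume in mL.
    """
    if tpsa <= 0 or volume_ml <= 0:
        raise ValueError("total PSA and prostate volume must be positive")
    return {
        "psad": tpsa / volume_ml,   # PSA density, ng/mL per mL of gland
        "f_t_ratio": fpsa / tpsa,   # free/total PSA ratio, dimensionless
    }


# Hypothetical patient (not study data): tPSA 8.0 ng/mL, fPSA 1.2 ng/mL, volume 40 mL
features = derived_psa_features(8.0, 1.2, 40.0)
print(features)  # {'psad': 0.2, 'f_t_ratio': 0.15}
```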

Figure 1. Flow chart of the classification prediction model
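The workflow in Figure 1 (fit five classifiers on the same feature table, then compare them) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the data are synthetic (`make_classification`), with seven columns standing in for the seven variables, and the 70/30 stratified split, default hyperparameters, `GaussianNB` as the naive Bayes variant, and feature scaling for KNN and SVM are all assumptions the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a 242-patient table with 7 features
# (PI-RADS, age, tPSA, fPSA, f/t ratio, volume, PSAD); NOT the study data.
X, y = make_classification(n_samples=242, n_features=7, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbor": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard labels, used for F1
    y_score = model.predict_proba(X_test)[:, 1]  # P(cancer), used for ROC AUC
    results[name] = (f1_score(y_test, y_pred), roc_auc_score(y_test, y_score))

# Report models from highest to lowest AUC
for name, (f1, auc) in sorted(results.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:20s} F1={f1:.2f} AUC={auc:.2f}")
```

Predicted probabilities, not hard labels, are passed to `roc_auc_score`, so the AUC reflects how well each model ranks cases across all thresholds rather than at a single cut-off.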

Figure 2. Heat map of the correlations between feature variables
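Figure 2 is a correlation heat map of the input variables. A minimal NumPy sketch of the matrix behind such a plot, assuming Pearson correlation (the caption does not name the coefficient) and using made-up values for three of the variables; drawing the heat map itself (for example with matplotlib's `imshow`) is left out.

```python
import numpy as np

# Hypothetical values for five patients (not study data)
names = ["tPSA", "volume", "PSAD"]
tpsa = np.array([4.2, 8.0, 12.5, 6.1, 20.3])        # ng/mL
volume = np.array([35.0, 40.0, 30.0, 55.0, 45.0])   # mL
psad = tpsa / volume                                # derived from the two columns above

# np.corrcoef treats each row as one variable, giving a 3x3 Pearson matrix
corr = np.corrcoef(np.vstack([tpsa, volume, psad]))

for name, row in zip(names, corr):
    print(f"{name:7s}", " ".join(f"{r:+.2f}" for r in row))
```

Because PSAD is computed from tPSA and volume, some correlation among these columns is expected by construction; this matters most for naive Bayes, which assumes the features are conditionally independent given the class.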

Figure 3. ROC curves and AUC plot
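Figure 3 summarizes each model with an ROC curve and the area under it. A minimal, dependency-free sketch of how such a curve and its trapezoidal AUC are computed; the function names, labels, and scores are hypothetical, and the code assumes both classes occur in `y_true`.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) after each distinct score threshold, highest threshold first.

    Assumes y_true contains both 1 (cancer) and 0 (benign).
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples tied at the same score cross the threshold together
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
auc = auc_trapezoid(roc_points(y_true, scores))
print(auc)  # 0.75
```

The same routine shows why hard 0/1 predictions give a degenerate curve: with a single interior point (1 - specificity, sensitivity), the trapezoidal area reduces to (sensitivity + specificity) / 2. For example, `auc_trapezoid(roc_points([1, 1, 0, 0], [1, 0, 0, 0]))` gives 0.75 for sensitivity 0.5 and specificity 1.0.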

Figure 4. Distribution of variable importance in the random forest and decision tree models
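Figure 4 ranks the input variables by their importance in the random forest and decision tree models. A minimal scikit-learn sketch of how such rankings are commonly obtained (impurity-based `feature_importances_`); the data are synthetic, so the resulting ranking is arbitrary and says nothing about PI-RADS or PSAD, and the hyperparameters (`n_estimators=200`, `max_depth=4`) are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

feature_names = ["PI-RADS", "age", "tPSA", "fPSA", "f/t ratio", "volume", "PSAD"]
# Synthetic data only: which column "matters" here is arbitrary, not a study finding
X, y = make_classification(n_samples=242, n_features=7, n_informative=3,
                           n_redundant=0, random_state=1)

importances = {}
for model in (RandomForestClassifier(n_estimators=200, random_state=1),
              DecisionTreeClassifier(max_depth=4, random_state=1)):
    model.fit(X, y)
    # Impurity-based (mean decrease in Gini impurity) importances, normalized to sum to 1
    importances[type(model).__name__] = model.feature_importances_
    ranked = sorted(zip(feature_names, model.feature_importances_),
                    key=lambda kv: kv[1], reverse=True)
    print(type(model).__name__, [(n, round(float(v), 3)) for n, v in ranked])
```

Impurity-based importances tend to favour continuous, high-cardinality features over a five-level score such as PI-RADS, so `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.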

Table 1. Results of the five machine learning models in the diagnosis of prostate cancer

Metric	Decision tree	K-nearest neighbor	Naive Bayes	Random forest	Support vector machine
Specificity	0.83	0.83	0.90	0.97	1.00
Sensitivity	0.90	0.70	0.85	0.90	0.10
AUC	0.86	0.76	0.87	0.93	0.55
Accuracy	0.86	0.78	0.88	0.94	0.63
Precision	0.85	0.77	0.87	0.94	0.81
Recall	0.86	0.76	0.87	0.93	0.55
F1 score	0.86	0.77	0.88	0.94	0.53
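The rows of Table 1 can be derived from a 2x2 confusion matrix. A minimal sketch, assuming cancer is the positive class; the helper name `binary_metrics` and the counts are hypothetical. In Table 1 the Recall row differs from the Sensitivity row but agrees, within rounding, with (sensitivity + specificity) / 2 for every model, which is consistent with a macro-averaged recall; the source does not state which averaging was used.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Hypothetical helper: Table 1-style metrics from a confusion matrix (cancer = positive)."""
    sensitivity = tp / (tp + fn)                 # true-positive rate = recall of the cancer class
    specificity = tn / (tn + fp)                 # true-negative rate = recall of the benign class
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # harmonic mean of precision and sensitivity
    macro_recall = (sensitivity + specificity) / 2   # unweighted mean of the two per-class recalls
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "macro_recall": macro_recall,
    }


# Hypothetical test set: 20 cancers, 30 benign cases (not study data)
m = binary_metrics(tp=18, fp=1, tn=29, fn=2)
for k, v in m.items():
    print(f"{k:12s} {v:.3f}")
```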
