COVID-19 auxiliary diagnosis model based on transfer learning and multi-parameter fusion optimization
Abstract:
Objective To address the limited accuracy and efficiency of conventional CT image reading, this paper studies a deep learning model for the CT-based auxiliary diagnosis of COVID-19. Methods First, a COVID-19 imaging data set with three categories (early stage, progressive stage and severe stage) was constructed. Next, an initial diagnostic model was built on VGG-16 transfer learning. Finally, the COVID-19 auxiliary diagnosis model was obtained by step-by-step multi-parameter fusion optimization of the fully connected layer structure, activation function, loss function, optimization algorithm, learning rate and batch size. Results On the COVID-19 imaging test set, the model reached an accuracy of 98.10%; the sensitivities for the early, progressive and severe samples were 0.97, 1.00 and 0.97, and the F1-scores were 0.98, 0.97 and 0.99, respectively. Conclusion With transfer learning and the multi-parameter fusion optimization strategy, the COVID-19 auxiliary diagnosis model achieved high accuracy on the test set. During epidemic prevention and control, the model can help medical workers work more efficiently.
Key words:
- transfer learning
- deep learning
- assisted diagnosis
- COVID-19
- CT
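Figure 10. CT manifestations of early-stage COVID-19.
Confirmed COVID-19 patients. A: male, 42 years old, with cough, sore throat and fever for 3 days; multilobar, multifocal lesions in both lungs (arrows). B: female, 28 years old, with fever for 4 hours; on day 6 after admission, a fan-shaped patchy opacity in the subpleural region of the posterior segment of the right lower lobe (arrow). C: male, 23 years old, asymptomatic; on admission, a solitary round-like nodule with a halo sign in the left lower lobe (arrow) [30].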
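Figure 11. CT manifestations of COVID-19 in the progressive stage.
A-F: CT images of the same COVID-19 patient in the progressive stage. The ground-glass opacities (GGO) in both lungs have clearly enlarged and increased in number and cover a wide area; a "crazy-paving" sign is visible, with partial consolidation containing air bronchograms. No mediastinal lymph node enlargement or pleural effusion is seen, and the density of the uninvolved lung tissue is normal [31].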
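Figure 12. Chest CT images of patients at different disease stages.
A-B: female, 54 years old, early-stage COVID-19 (scattered patchy subpleural GGO in both lower lungs). C-D: female, 54 years old, progressive-stage COVID-19 (lesions involving multiple lobes, with consolidation and fibrous stripes). E-F: male, 78 years old, severe-stage COVID-19 (diffuse lesions in both lungs, predominantly consolidation, with air bronchograms) [32].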
Table 1. Division of the COVID-19 imaging data set

| Data set | Number of samples | Sample tensor |
| --- | --- | --- |
| Training set | 412 | 412×224×224 |
| Test set | 105 | 105×224×224 |
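Table 2. Distribution of the COVID-19 imaging data set by CT manifestation

| CT manifestation | Training set | Test set | Label |
| --- | --- | --- | --- |
| Early stage | 116 | 30 | 0 |
| Progressive stage | 156 | 39 | 1 |
| Severe stage | 140 | 36 | 2 |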
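Table 3. Common metrics for evaluating models

| Confusion matrix | Actual positive | Actual negative | Metric |
| --- | --- | --- | --- |
| Predicted positive | TP | FP | Positive predictive value (precision) = TP/(TP+FP) |
| Predicted negative | FN | TN | Negative predictive value = TN/(TN+FN) |
| Metric | Recall = TP/(TP+FN) | Specificity = TN/(TN+FP) | Accuracy = (TP+TN)/(TP+FP+FN+TN) |

Note: TP is a positive sample predicted as positive, FP is a negative sample predicted as positive, FN is a positive sample predicted as negative, and TN is a negative sample predicted as negative. Accuracy is used in most model evaluations; which other metrics to emphasise depends on the application.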
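The definitions in Table 3 translate directly into code. The following is a minimal sketch, not taken from the paper: the function name `binary_metrics` and the example counts are hypothetical, and a metric whose denominator is zero is returned as NaN by assumption.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp, fp, fn, tn):
    """Compute the Table 3 metrics from the four confusion-matrix counts."""
    return {
        "precision": _ratio(tp, tp + fp),        # positive predictive value
        "npv": _ratio(tn, tn + fn),              # negative predictive value
        "recall": _ratio(tp, tp + fn),           # sensitivity
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts, chosen only to illustrate the call.
example = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

For a three-class problem such as the one studied here, the same formulas are applied one class at a time, treating that class as positive and the other two as negative.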
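Table 4. Relationship between traditional machine learning and the different transfer learning settings

| Learning framework | Source and target domains | Source and target tasks |
| --- | --- | --- |
| Traditional machine learning | Same | Same |
| Transfer learning: inductive | Same | Different but related |
| Transfer learning: unsupervised | Different but related | Different but related |
| Transfer learning: transductive | Different but related | Same |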
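Table 5. Parameters of the simple fully connected neural network model

| Parameter | Batch size | Loss function | Learning rate | Optimizer | Number of neurons | Activation function |
| --- | --- | --- | --- | --- | --- | --- |
| Value | 15 | Cross-entropy | 0.0001 | AdagradOptimizer | 2048 | ReLU |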
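Table 6. Parameters of the fully connected neural network model

| Parameter | Batch size | Loss function | Learning rate | Optimizer | Neurons in the two layers | Activation function |
| --- | --- | --- | --- | --- | --- | --- |
| Value | 15 | Cross-entropy | 0.0001 | AdagradOptimizer | 2048, 128 | ReLU |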
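To make the Table 6 settings concrete, the sketch below wires them into a small classification head: two ReLU layers of 2048 and 128 units, a three-class softmax, cross-entropy loss, and Adagrad with a learning rate of 0.0001 on batches of 15. It is an illustration in NumPy, not the authors' implementation. The function names, the He initialisation, the `eps` term in the textbook Adagrad update, and the 64-dimensional random stand-in for the VGG-16 features are all assumptions.

```python
import numpy as np


def init_head(in_dim, hidden=(2048, 128), n_classes=3, seed=0):
    """He-initialised weights for in_dim -> hidden[0] -> hidden[1] -> n_classes."""
    rng = np.random.default_rng(seed)
    sizes = [in_dim, *hidden, n_classes]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def init_cache(params):
    """Per-parameter accumulators of squared gradients, all starting at zero."""
    return [[np.zeros_like(w), np.zeros_like(b)] for w, b in params]


def forward(params, x):
    """Return every layer's activations; the last entry holds class probabilities."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        if i < len(params) - 1:
            acts.append(np.maximum(z, 0.0))            # ReLU on hidden layers
        else:
            z = z - z.max(axis=1, keepdims=True)       # stabilise the softmax
            e = np.exp(z)
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts


def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the integer labels."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))


def backward(params, acts, labels):
    """Gradients of the mean cross-entropy with respect to every (W, b)."""
    n = len(labels)
    delta = acts[-1].copy()
    delta[np.arange(n), labels] -= 1.0                 # softmax + cross-entropy gradient
    delta /= n
    grads = []
    for i in range(len(params) - 1, -1, -1):
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            delta = (delta @ params[i][0].T) * (acts[i] > 0)   # back through ReLU
    return grads[::-1]


def adagrad_step(params, grads, cache, lr=1e-4, eps=1e-8):
    """Textbook Adagrad: divide each step by the root of the accumulated squared gradients."""
    updated = []
    for (w, b), (gw, gb), c in zip(params, grads, cache):
        c[0] += gw ** 2
        c[1] += gb ** 2
        updated.append((w - lr * gw / (np.sqrt(c[0]) + eps),
                        b - lr * gb / (np.sqrt(c[1]) + eps)))
    return updated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(15, 64))   # random stand-in for one batch of 15 feature vectors
    labels = rng.integers(0, 3, size=15)   # labels 0, 1, 2 as in Table 2
    params, cache = init_head(in_dim=64), None
    cache = init_cache(params)
    for step in range(5):
        acts = forward(params, features)
        print(step, cross_entropy(acts[-1], labels))
        params = adagrad_step(params, backward(params, acts, labels), cache)
```

In a transfer learning setup of this kind, the convolutional base would typically stay frozen and only a head like this one would be trained on the extracted features; whether the paper freezes every VGG-16 layer is not stated in this section.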
Table 7. Three-class confusion matrix (n)

| True class \ Predicted class | 0 | 1 | 2 |
| --- | --- | --- | --- |
| 0 | 29 | 1 | 0 |
| 1 | 0 | 39 | 0 |
| 2 | 0 | 1 | 35 |
Table 8. Evaluation metrics of the model

| Class | Sensitivity | Precision | F1-score |
| --- | --- | --- | --- |
| 0 | 0.97 | 1.00 | 0.98 |
| 1 | 1.00 | 0.95 | 0.97 |
| 2 | 0.97 | 1.00 | 0.99 |
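The per-class values in Table 8 and the overall accuracy can be recomputed from the confusion matrix in Table 7. The sketch below assumes rows are true classes and columns are predicted classes; the function name `per_class_metrics` is hypothetical and not from the paper.

```python
def per_class_metrics(cm):
    """Return (accuracy, {class: sensitivity/precision/F1}) for a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    report = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                         # class k samples predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as class k
        sensitivity = tp / (tp + fn)
        precision = tp / (tp + fp)
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
        report[k] = {"sensitivity": sensitivity, "precision": precision, "f1": f1}
    return correct / total, report


# Confusion matrix from Table 7 (labels: 0 early, 1 progressive, 2 severe).
TABLE_7 = [
    [29, 1, 0],
    [0, 39, 0],
    [0, 1, 35],
]

accuracy, report = per_class_metrics(TABLE_7)
print(f"accuracy = {accuracy:.4f}")
for label, metrics in report.items():
    print(label, {name: f"{value:.3f}" for name, value in metrics.items()})
```

The accuracy is 103 correct predictions out of 105 test samples, which matches the 98.10% reported in the abstract.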
[1] World Health Organization. Naming the coronavirus disease (COVID-19) and the virus that causes it[EB/OL]. [2021-7-27]. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance.
[2] 百度. 新型冠状病毒肺炎百度疫情实时大数据报告[EB/OL]. [2021-7-27]. https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_aladin_banner.
[3] Chan JFW, Yuan SF, Kok KH, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster[J]. Lancet, 2020, 395(10223): 514-23. doi: 10.1016/S0140-6736(20)30154-9
[4] 卫生健康委办公厅, 中医药局办公室. 新型冠状病毒肺炎诊疗方案(试行第八版修订版)[EB/OL]. [2021-04-15]. http://www.gov.cn/zhengce/zhengceku/2021-04/15/5599795/files/e9ce837932e6434db998bdbbc5d36d32.pdf
[5] Fang Y, Zhang H, Xie J, et al. Sensitivity of chest CT for COVID-19: comparison to RT-PCR[J]. Radiology, 2020, 296(2): E115-7. doi: 10.1148/radiol.2020200432
[6] 谭鸣, 冯晓源, 刘士远, 等. 新型冠状病毒肺炎影像检查诊断与感染控制指导意见[J/OL]. 中国医学计算机成像杂志: 1-19. https://doi.org/10.19627/j.cnki.cn31-1700/th.20200309.001.
[7] 李祥霞. 恶性肺结节计算机辅助诊断关键技术研究[D]. 广州: 华南理工大学, 2018.
[8] 王继元, 李真林, 蒲立新, 等. 基于人工智能的正位DR胸片质控体系研究与应用[J]. 生物医学工程学杂志, 2020, 37(1): 158-68. https://www.cnki.com.cn/Article/CJFDTOTAL-SWGC202001020.htm
[9] 刘珍娟, 傅迎霞, 张羽, 等. 不同CT图像重建算法下基于深度学习的肺结节检测算法效能[J]. 中国医学影像技术, 2019, 35(12): 1775-9. https://www.cnki.com.cn/Article/CJFDTOTAL-ZYXX201912006.htm
[10] 谢未央, 陈彦博, 王季勇, 等. 基于卷积神经网络的CT图像肺结节检测[J]. 计算机工程与设计, 2019, 40(12): 3575-81. https://www.cnki.com.cn/Article/CJFDTOTAL-SJSJ201912035.htm
[11] 李欣菱, 郭芳芳, 周振, 等. 基于深度学习的人工智能胸部CT肺结节检测效能评估[J]. 中国肺癌杂志, 2019, 22(6): 336-40. https://www.cnki.com.cn/Article/CJFDTOTAL-FAIZ201906002.htm
[12] 刘晓鹏, 周海英, 胡志雄, 等. 人工智能识别技术在T1期肺癌诊断中的临床应用研究[J]. 中国肺癌杂志, 2019, 22(5): 319-23. https://www.cnki.com.cn/Article/CJFDTOTAL-FAIZ201905011.htm
[13] 张鹏, 徐欣楠, 王洪伟, 等. 基于深度学习的计算机辅助肺癌诊断方法[J]. 计算机辅助设计与图形学学报, 2018, 30(1): 90-9. https://www.cnki.com.cn/Article/CJFDTOTAL-JSJF201801009.htm
[14] Chen H, Guo J, Wang C, et al. Clinical characteristics and intrauterine vertical transmission potential of COVID-19 infection in nine pregnant women: a retrospective review of medical records[J]. Lancet, 2020, 395(10226): 809-15. doi: 10.1016/S0140-6736(20)30360-3
[15] Hu Z, Song C, Xu C, et al. Clinical characteristics of 24 asymptomatic infections with COVID-19 screened among close contacts in Nanjing, China[J]. Sci China Life Sci, 2020, 63(5): 706-11. doi: 10.1007/s11427-020-1661-4
[16] Jiang F, Deng L, Zhang L, et al. Review of the clinical characteristics of coronavirus disease 2019 (COVID-19)[J]. J Gen Intern Med, 2020, 35(5): 1545-9. doi: 10.1007/s11606-020-05762-w
[17] McCall B. COVID-19 and artificial intelligence: protecting healthcare workers and curbing the spread[J]. Lancet Digit Health, 2020, 2(4): e166-7. doi: 10.1016/S2589-7500(20)30054-6
[18] Bai L, Yang DW, Wang X, et al. Chinese experts' consensus on the Internet of Things-aided diagnosis and treatment of coronavirus disease 2019 (COVID-19)[J]. Clin eHealth, 2020, 3: 7-15.
[19] Chen J, Wu L, Zhang J, et al. Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study[J/OL]. MedRxiv. https://www.medrxiv.org/content/10.1101/2020.02.25.20021568v2.
[20] Song Y, Zheng SJ, Li L, et al. Deep learning enables accurate diagnosis of novel coronavirus (COVID-19) with CT images[J]. IEEE/ACM Trans Comput Biol Bioinform, 5361, PP(99): 1.
[21] 史河水, 韩小雨, 樊艳青, 等. 新型冠状病毒(2019-nCoV)感染的肺炎临床特征及影像学表现[J]. 临床放射学杂志, 2020, 39(1): 8-11. https://www.cnki.com.cn/Article/CJFDTOTAL-LCFS202001003.htm
[22] 钟飞扬, 张寒菲, 王彬宸, 等. 新型冠状病毒肺炎的CT影像学表现[J]. 武汉大学学报: 医学版, 2020, 41(3): 345-8. https://www.cnki.com.cn/Article/CJFDTOTAL-HBYK202003001.htm
[23] 夏静, 潘素, 颜默磊, 等. 基于迁移学习的小样本重症疾病预后模型[J]. 生物医学工程学杂志, 2020, 37(1): 1-9. https://www.cnki.com.cn/Article/CJFDTOTAL-SWGC202001001.htm
[24] 蒋正锋, 许昕. 基于CT影像的COVID-19智能辅助诊断方法[J]. 分子影像学杂志, 2020, 43(2): 264-9. doi: 10.12122/j.issn.1674-4500.2020.02.17
[25] Wei FM, Zhang JP, Chu Y, et al. FSFP: transfer learning from long texts to the short[J]. Appl Math Inf Sci, 2014, 8(4): 2033-40. doi: 10.12785/amis/080462
[26] Blitzer J, McDonald R, Pereira F. Domain adaptation with structural correspondence learning[C]//Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing - EMNLP '06. July 22-23, 2006. Sydney, Australia. Morristown, NJ, USA: Association for Computational Linguistics, 2006.
[27] Dai WY, Xue GR, Yang Q, et al. Co-clustering based classification for out-of-domain documents[C]//Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '07. August 12-15, 2007. San Jose, California, USA. New York: ACM Press, 2007.
[28] Xing DK, Dai WY, Xue GR, et al. Bridged refinement for transfer learning[M]//Knowledge Discovery in Databases: PKDD 2007. Berlin, Heidelberg: Springer Berlin Heidelberg: 324-35.
[29] Dai WY, Yang Q, Xue GR, et al. Self-taught clustering[C]//Proceedings of the 25th international conference on Machine learning - ICML '08. July 5-9, 2008. Helsinki, Finland. New York: ACM Press, 2008: 200-7.
[30] 何其舟, 侯云清, 代平, 等. 新型冠状病毒肺炎早期与进展期的CT影像表现探讨[J]. 西南医科大学学报, 2020, 43(2): 196-200. doi: 10.3969/j.issn.2096-3351.2020.02.022
[31] 龙冰清, 熊曾, 刘进康. 以磨玻璃影为主要表现的肺部感染性病变影像学鉴别诊断[J]. 中国感染控制杂志, 2020, 19(3): 214-22. https://www.cnki.com.cn/Article/CJFDTOTAL-GRKZ202003003.htm
[32] 汪锴, 康嗣如, 田荣华, 等. 新型冠状病毒肺炎胸部CT影像学特征分析[J]. 中国临床医学, 2020, 27(1): 27-31. https://www.cnki.com.cn/Article/CJFDTOTAL-LCYX202001009.htm