
Research progress on the machine learning XGBoost algorithm in medicine

QI Qiaona, LIU Yan, CHEN Jihui, LIU Xinzhu, YANG Rui, ZHANG Jinyuan, CUI Mengxuan, XIE Yimeng, WANG Zeyuan, YU Ze, GAO Fei, ZHANG Jian

QI Qiaona, LIU Yan, CHEN Jihui, LIU Xinzhu, YANG Rui, ZHANG Jinyuan, CUI Mengxuan, XIE Yimeng, WANG Zeyuan, YU Ze, GAO Fei, ZHANG Jian. Research progress on machine learning XGBoost algorithm in medicine[J]. Journal of Molecular Imaging, 2021, 44(5): 856-862. doi: 10.12122/j.issn.1674-4500.2021.05.25

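Funding:

National Key Research and Development Program of China (2020YFC2005502)

National Key Research and Development Program of China (2020YFC2005503)

Beijing Municipal Science and Technology Project (Z201100005620006)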


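  • Abstract: XGBoost, a machine learning algorithm proposed in 2014, is built on the boosting framework and has shown high usability and excellent performance in many data science competitions. Classification and regression models built with XGBoost are now widely used for data analysis in healthcare, finance, education, manufacturing, and other fields. In medicine and pharmacy, XGBoost has been applied to disease diagnosis, prediction of disease risk, outcomes and prognosis, rational and safe drug use, and drug development. In these areas it offers promising solutions that help improve the efficiency and quality of decision-making and reduce false-positive rates. In addition, XGBoost automatically learns which branch of a split missing values should follow, and on large datasets it can model nonlinear effects with high efficiency and accuracy.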
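The abstract notes that XGBoost learns a split direction for missing values automatically. In Chen and Guestrin's paper [9] this is called sparsity-aware split finding: only non-missing values are enumerated as thresholds, and all rows with a missing value are sent together either left or right, whichever side yields the larger gain, Gain = ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ, where G and H are sums of the first- and second-order gradients of the loss in each child. The following is a minimal single-feature sketch, assuming squared-error loss (so g = ŷ − y and h = 1) and using None to mark a missing value; the function names are hypothetical and this is not the xgboost library's implementation.

```python
import math
from typing import List, Optional, Tuple


def leaf_score(g: float, h: float, lam: float) -> float:
    # Structure-score term G^2 / (H + lambda) for one leaf.
    return g * g / (h + lam)


def split_gain(gl: float, hl: float, gr: float, hr: float,
               lam: float, gamma: float) -> float:
    # Gain of splitting a node into left/right children, minus penalty gamma.
    return 0.5 * (leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam)
                  - leaf_score(gl + gr, hl + hr, lam)) - gamma


def best_split_with_default(x: List[Optional[float]], grad: List[float],
                            hess: List[float], lam: float = 1.0,
                            gamma: float = 0.0
                            ) -> Tuple[float, Optional[float], Optional[bool]]:
    """Best threshold on one feature (hypothetical helper).

    Returns (gain, threshold, default_left). Only non-missing values are
    enumerated; missing rows are tried on each side and the better side wins.
    """
    g_total, h_total = sum(grad), sum(hess)
    present = sorted((xi, gi, hi) for xi, gi, hi in zip(x, grad, hess)
                     if xi is not None)
    best: Tuple[float, Optional[float], Optional[bool]] = (-math.inf, None, None)

    # Pass 1: missing rows go RIGHT; build the left child from small values.
    gl = hl = 0.0
    for k in range(len(present) - 1):
        xv, g, h = present[k]
        gl += g
        hl += h
        nxt = present[k + 1][0]
        if nxt == xv:
            continue  # no threshold fits between equal values
        gain = split_gain(gl, hl, g_total - gl, h_total - hl, lam, gamma)
        if gain > best[0]:
            best = (gain, (xv + nxt) / 2, False)

    # Pass 2: missing rows go LEFT; build the right child from large values.
    gr = hr = 0.0
    for k in range(len(present) - 1, 0, -1):
        xv, g, h = present[k]
        gr += g
        hr += h
        prev = present[k - 1][0]
        if prev == xv:
            continue
        gain = split_gain(g_total - gr, h_total - hr, gr, hr, lam, gamma)
        if gain > best[0]:
            best = (gain, (prev + xv) / 2, True)
    return best


if __name__ == "__main__":
    # Current prediction is 0, so grad = 0 - y and hess = 1.
    x = [1.0, 2.0, None, None, 10.0, 11.0]
    y = [0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
    grad = [0.0 - yi for yi in y]
    hess = [1.0] * len(y)
    # Threshold 6.0 with default_left=False: the missing rows behave like
    # the high-x rows, so they are routed right.
    print(best_split_with_default(x, grad, hess))
```

Because missing rows are never enumerated as thresholds, the cost of the search grows with the number of non-missing entries, which is why the paper describes the method as sparsity-aware.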

     

  • Figure 1. Tree ensemble model: for a given example, the final prediction is the sum of the predictions from each tree[9].
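Figure 1 describes the additive structure of a tree ensemble: the prediction for an example x is ŷ(x) = Σ_k f_k(x), where each f_k is one tree. The sketch below is a simplified illustration, not the xgboost library: it fits depth-1 trees (stumps) one after another to the gradients of a squared-error loss, sets each leaf to the optimal weight w* = −G/(H + λ) from Chen and Guestrin [9], and shrinks every new tree by a learning rate eta. Stump, fit_stump, boost and ensemble_predict are hypothetical names; the real library also starts from a constant base score and grows deeper trees, both omitted here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Stump:
    """A depth-1 regression tree: one threshold, two leaf weights."""
    threshold: float
    left_weight: float
    right_weight: float

    def predict(self, x: float) -> float:
        return self.left_weight if x < self.threshold else self.right_weight


def leaf_weight(g_sum: float, h_sum: float, lam: float) -> float:
    # Optimal leaf weight w* = -G / (H + lambda).
    return -g_sum / (h_sum + lam)


def fit_stump(xs: List[float], grad: List[float], hess: List[float],
              lam: float) -> Stump:
    g_total, h_total = sum(grad), sum(hess)
    # Start from "no split": threshold = inf sends every x to the left leaf.
    best_score, best_thr = g_total ** 2 / (h_total + lam), math.inf
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        gl = sum(g for x, g in zip(xs, grad) if x < thr)
        hl = sum(h for x, h in zip(xs, hess) if x < thr)
        score = (gl ** 2 / (hl + lam)
                 + (g_total - gl) ** 2 / (h_total - hl + lam))
        if score > best_score:
            best_score, best_thr = score, thr
    gl = sum(g for x, g in zip(xs, grad) if x < best_thr)
    hl = sum(h for x, h in zip(xs, hess) if x < best_thr)
    return Stump(best_thr, leaf_weight(gl, hl, lam),
                 leaf_weight(g_total - gl, h_total - hl, lam))


def boost(xs: List[float], ys: List[float], n_trees: int = 3,
          eta: float = 0.5, lam: float = 1.0) -> List[Stump]:
    preds = [0.0] * len(xs)  # assumption: base score of 0
    trees: List[Stump] = []
    for _ in range(n_trees):
        # Squared-error loss: gradient = pred - y, Hessian = 1.
        grad = [p - y for p, y in zip(preds, ys)]
        hess = [1.0] * len(xs)
        t = fit_stump(xs, grad, hess, lam)
        # Shrink the new tree's leaf weights by the learning rate eta.
        t = Stump(t.threshold, eta * t.left_weight, eta * t.right_weight)
        trees.append(t)
        preds = [p + t.predict(x) for p, x in zip(preds, xs)]
    return trees


def ensemble_predict(trees: List[Stump], x: float) -> float:
    # Final prediction = sum of every tree's output, as in Figure 1.
    return sum(t.predict(x) for t in trees)


if __name__ == "__main__":
    trees = boost([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 10.0, 10.0])
    for t in trees:
        print(t)
    # Each added tree moves the prediction for x = 4 closer to the target 10.
    print(ensemble_predict(trees, 4.0))
```

On this toy data every tree splits at 2.5 and removes one third of the remaining error on the high side, so the summed prediction for x = 4 approaches the target geometrically as trees are added.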

  • [1] Murdoch TB, Detsky AS. The inevitable application of big data to health care[J]. JAMA, 2013, 309(13): 1351-2. doi: 10.1001/jama.2013.393
    [2] Merelli I, Pérez-Sánchez H, Gesing S, et al. Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives[J]. Biomed Res Int, 2014, 2014: 134023.
    [3] Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential[J]. Health Inf Sci Syst, 2014, 2: 3. doi: 10.1186/2047-2501-2-3
    [4] Shavlik JW. Readings in Machine Learning[M]. Los Altos, CA: Morgan Kaufmann, 1990.
    [5] Michalski RS, Bratko I, Kubat M. Machine learning, data mining and knowledge discovery: methods and applications[M]. New York: Wiley, 1998.
    [6] Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects[J]. Science, 2015, 349(6245): 255-60. doi: 10.1126/science.aaa8415
    [7] Altman RB. Artificial intelligence (AI) systems for interpreting complex medical datasets[J]. Clin Pharmacol Ther, 2017, 101(5): 585-6. doi: 10.1002/cpt.650
    [8] Sun Y, Todorovic S, Goodison S. Local-learning-based feature selection for high-dimensional data analysis[J]. IEEE Trans Pattern Anal Mach Intell, 2010, 32(9): 1610-26. doi: 10.1109/TPAMI.2009.190
    [9] Chen TQ, Guestrin C. XGBoost: a scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, CA, USA. New York: ACM, 2016: 785-94.
    [10] Cios KJ, Moore GW. Uniqueness of medical data mining[J]. Artif Intell Med, 2002, 26(1/2): 1-24. http://www.ncbi.nlm.nih.gov/pubmed/12234714
    [11] Bellazzi R, Zupan B. Predictive data mining in clinical medicine: current issues and guidelines[J]. Int J Med Inform, 2008, 77(2): 81-97. doi: 10.1016/j.ijmedinf.2006.11.006
    [12] Zhang X, Yan C, Gao C, et al. Predicting missing values in medical data via XGBoost regression[J]. J Healthc Inform Res, 2020, 4(4): 383-94. doi: 10.1007/s41666-020-00077-1
    [13] Newgard CD, Lewis RJ. Missing data: how to best account for what is not known[J]. JAMA, 2015, 314(9): 940-1. doi: 10.1001/jama.2015.10516
    [14] Luo Y, Szolovits P, Dighe AS, et al. 3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data[J]. J Am Med Inform Assoc, 2018, 25(6): 645-53. doi: 10.1093/jamia/ocx133
    [15] Cismondi F, Fialho AS, Vieira SM, et al. Missing data in medical databases: impute, delete or classify?[J]. Artif Intell Med, 2013, 58(1): 63-72. doi: 10.1016/j.artmed.2013.01.003
    [16] Torlay L, Perrone-Bertolotti M, Thomas E, et al. Machine learning-XGBoost analysis of language networks to classify patients with epilepsy[J]. Brain Inform, 2017, 4(3): 159-69. doi: 10.1007/s40708-017-0065-7
    [17] Nishio M, Nishizawa M, Sugiyama O, et al. Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization[J]. PLoS One, 2018, 13(4): e0195875. http://arxiv.org/ftp/arxiv/papers/1708/1708.05897.pdf
    [18] Taylor RA, Moore CL, Cheung KH, et al. Predicting urinary tract infections in the emergency department with machine learning[J]. PLoS One, 2018, 13(3): e0194085. doi: 10.1371/journal.pone.0194085
    [19] Maass F, Michalke B, Leha A, et al. Elemental fingerprint as a cerebrospinal fluid biomarker for the diagnosis of Parkinson's disease[J]. J Neurochem, 2018, 145(4): 342-51. doi: 10.1111/jnc.14316
    [20] Yu DP, Liu ZD, Su CY, et al. Copy number variation in plasma as a tool for lung cancer prediction using Extreme Gradient Boosting (XGBoost) classifier[J]. Thorac Cancer, 2020, 11(1): 95-102. doi: 10.1111/1759-7714.13204
    [21] Ye C, Fu T, Hao S, et al. Prediction of incident hypertension within the next year: prospective study using statewide electronic health records and machine learning[J]. J Med Internet Res, 2018, 20(1): e22. doi: 10.2196/jmir.9268
    [22] Chen X, Huang L, Xie D, et al. EGBMMDA: extreme gradient boosting machine for MiRNA-disease association prediction[J]. Cell Death Dis, 2018, 9(1): 3. doi: 10.1038/s41419-017-0003-x
    [23] Trakadis YJ, Sardaar S, Chen A, et al. Machine learning in schizophrenia genomics, a case-control study using 5,090 exomes[J]. Am J Med Genet B Neuropsychiatr Genet, 2019, 180(2): 103-12. doi: 10.1002/ajmg.b.32638
    [24] van Rosendael AR, Maliakal G, Kolli KK, et al. Maximization of the usage of coronary CTA derived plaque information using a machine learning based algorithm to improve risk stratification; insights from the CONFIRM registry[J]. J Cardiovasc Comput Tomogr, 2018, 12(3): 204-9. doi: 10.1016/j.jcct.2018.04.011
    [25] Livne M, Boldsen JK, Mikkelsen IK, et al. Boosted tree model reforms multimodal magnetic resonance imaging infarct prediction in acute stroke[J]. Stroke, 2018, 49(4): 912-8. doi: 10.1161/STROKEAHA.117.019440
    [26] Donovan FO, Brecht T, Kekeh C, et al. Machine learning generated risk model to predict unplanned hospital admission in heart failure[J]. Circulation, 2018, 1(1): 136-42. http://3b2dgy3hpb1t33upat2k0467-wpengine.netdna-ssl.com/wp-content/uploads/2017/11/AHA-HF-admit-poster-40x84.pdf
    [27] Zhou F, Li TF, Li H, et al. TPCNN: two-phase patch-based convolutional neural network for automatic brain tumor segmentation and survival prediction[C]//Brainlesion: Glioma Mult Scler Stroke Trauma Brain Inj, 2018. doi: 10.1007/978-3-319-75238-9_24
    [28] Gao C, Sun H, Wang T, et al. Model-based and model-free machine learning techniques for diagnostic prediction and classification of clinical outcomes in Parkinson's disease[J]. Sci Rep, 2018, 8(1): 7129. doi: 10.1038/s41598-018-24783-4
    [29] Li ZZ, Yuan L, Zhang C, et al. A novel prognostic scoring system of intrahepatic cholangiocarcinoma with machine learning basing on real-world data[J]. Front Oncol, 2021, 10: 576901. doi: 10.3389/fonc.2020.576901
    [30] Liu L, Yu Y, Fei Z, et al. An interpretable boosting model to predict side effects of analgesics for osteoarthritis[J]. BMC Syst Biol, 2018, 12(suppl 6): 105. doi: 10.1186/s12918-018-0624-4
    [31] Mo X, Chen X, Ieong C, et al. Early prediction of clinical response to etanercept treatment in juvenile idiopathic arthritis using machine learning[J]. Front Pharmacol, 2020, 21(4): 1164-75. http://www.researchgate.net/publication/343338927_Early_Prediction_of_Clinical_Response_to_Etanercept_Treatment_in_Juvenile_Idiopathic_Arthritis_Using_Machine_Learning
    [32] Hatmal MM, Al-Hatamleh MAI, Olaimat AN, et al. Side effects and perceptions following COVID-19 vaccination in Jordan: a randomized, cross-sectional study implementing machine learning for predicting severity of side effects[J]. Vaccines, 2021, 9(6): 556. doi: 10.3390/vaccines9060556
    [33] Kan JT, Li A, Zou H, et al. A machine learning based dose prediction of lutein supplements for individuals with eye fatigue[J]. Front Nutr, 2020, 7: 577923. doi: 10.3389/fnut.2020.577923
    [34] Huang X, Yu Z, Wei X, et al. Prediction of vancomycin dose on high-dimensional data using machine learning techniques[J]. Expert Rev Clin Pharmacol, 2021, 14(6): 761-71. doi: 10.1080/17512433.2021.1911642
    [35] Huang X, Yu Z, Bu S, et al. An ensemble model for prediction of vancomycin trough concentrations in pediatric patients[J]. Drug Des Devel Ther, 2021, 15: 1549-59. doi: 10.2147/DDDT.S299037
    [36] Mamada H, Iwamoto K, Nomura Y, et al. Predicting blood-to-plasma concentration ratios of drugs from chemical structures and volumes of distribution in humans[J]. Mol Divers, 2021, 25(3): 1261-70. doi: 10.1007/s11030-021-10186-7
    [37] Nguyen M, Brettin T, Long SW, et al. Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae[J]. Sci Rep, 2018, 8(1): 421-6. doi: 10.1038/s41598-017-18972-w
    [38] Cui W, Bachi K, Hurd Y, et al. Using big data to predict outcomes of opioid treatment programs[J]. Stud Health Technol Inform, 2020, 272: 366-9.
    [39] Sidorov P, Naulaerts S, Ariey-Bonnet J, et al. Predicting synergism of cancer drug combinations using NCI-ALMANAC data[J]. Front Chem, 2019, 7: 509. doi: 10.3389/fchem.2019.00509
    [40] Wacker S, Noskov SY. Performance of machine learning algorithms for qualitative and quantitative prediction drug blockade of hERG1 channel[J]. Comput Toxicol, 2018, 6: 55-63. doi: 10.1016/j.comtox.2017.05.001
    [41] Babajide Mustapha I, Saeed F. Bioactive molecule prediction using extreme gradient boosting[J]. Molecules, 2016, 21(8): 983. doi: 10.3390/molecules21080983
    [42] Lu J, Chen M, Qin Y. Drug-induced cell viability prediction from LINCS-L1000 through WRFEN-XGBoost algorithm[J]. BMC Bioinformatics, 2021, 22(1): 13. doi: 10.1186/s12859-020-03949-w
    [43] Zhong JC, Sun YS, Peng W, et al. XGBFEMF: an XGBoost-based framework for essential protein prediction[J]. IEEE Trans Nanobioscience, 2018, 17(3): 243-50. doi: 10.1109/TNB.2018.2842219
    [44] Yu B, Qiu W, Chen C, et al. SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting[J]. Bioinformatics, 2020, 36(4): 1074-81. doi: 10.1093/bioinformatics/btz734
    [45] Kaushal R, Shojania KG, Bates DW. Effects of computerized physician order entry and clinical decision support systems on medication safety: a systematic review[J]. Arch Intern Med, 2003, 163(12): 1409-16. doi: 10.1001/archinte.163.12.1409
    [46] Mishra AK, Keserwani PK, Samaddar SG, et al. A decision support system in healthcare prediction[M]//Lecture Notes in Electrical Engineering. Singapore: Springer Singapore, 2018: 156-67.
    [47] Fitriyani NL, Syafrudin M, Alfian G, et al. HDPM: an effective heart disease prediction model for a clinical decision support system[J]. IEEE Access, 2020, 8: 133034-50. doi: 10.1109/ACCESS.2020.3010511
    [48] Mo XL, Chen XJ, Li HW, et al. Early and accurate prediction of clinical response to methotrexate treatment in juvenile idiopathic arthritis using machine learning[J]. Front Pharmacol, 2019, 10: 1155. doi: 10.3389/fphar.2019.01155
    [49] Hou N, Li M, He L, et al. Predicting 30-days mortality for MIMIC-III patients with Sepsis-3: a machine learning approach using XGboost[J]. J Transl Med, 2020, 18(1): 462. doi: 10.1186/s12967-020-02620-5
Publication history
  • Received: 2021-08-05
  • Published: 2021-09-20
