China Pharmacy (《中国药房》), 2025, Issue 19

·Smart Pharmacy·


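          Construction of a machine learning classification prediction model for vancomycin trough blood
          concentrations based on the MIMIC-Ⅳ database Δ

          LIN Xiaohui*, WANG Yujia, ZHANG Lingling, XU Shuanglin# (Department of Clinical Pharmacy, Ningde Municipal
          Hospital Affiliated to Ningde Normal University, Ningde, Fujian 352100, China)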
          CLC number  R969.3; TP181      Document code  A      Article ID  1001-0408(2025)19-2448-06
          DOI  10.6039/j.issn.1001-0408.2025.19.16

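          Abstract  OBJECTIVE  To construct a classification prediction model for vancomycin trough blood concentrations
          and to optimize its precision dosing strategy. METHODS  Data from eligible patients were screened from the
          Medical Information Mart for Intensive Care database; after data cleaning and preprocessing, 9 902 patients
          were finally included. Features were selected by combining correlation analysis with the Boruta feature
          selection algorithm. Vancomycin trough concentrations were discretized according to clinical therapeutic
          window criteria into low (<10 μg/mL), intermediate (10-20 μg/mL), and high (≥20 μg/mL) concentration classes.
          Six machine learning algorithms were used to build classification models: tabular prior-data fitted network
          (TabPFN), logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), support vector
          machine (SVM), and K-nearest neighbors (KNN). Model performance was evaluated by 10-fold cross-validation
          (10-CV); the main performance metrics were accuracy, balanced accuracy, macro-averaged precision,
          macro-averaged recall, macro-averaged F1, and the area under the multiclass ROC curve (OvR-AUC). Shapley
          Additive Explanations (SHAP) was used to analyze the direction and strength of the influence of each feature
          on the model predictions. RESULTS  The RF and TabPFN models performed best (accuracy of 0.741 4 and 0.737 7,
          OvR-AUC of 0.907 0 and 0.895 8, respectively), the XGBoost model performed moderately, and the LR, SVM, and
          KNN models performed poorly. Confusion matrix heatmaps showed that the RF and TabPFN models had higher
          prediction accuracy in the high-concentration class but performed slightly less well in the low and
          intermediate concentration classes. Evaluation by bootstrapping combined with 10-CV showed that all
          performance metrics of the RF model were stable (accuracy 0.741 4, balanced accuracy 0.740 3, macro-averaged
          precision 0.732 1, macro-averaged recall 0.736 0, macro-averaged F1 0.736 0, OvR-AUC 0.907 0), indicating good
          classification performance and discriminative ability. SHAP analysis found that key features such as
          creatinine, urea nitrogen, and the daily cumulative dose and dosing frequency of vancomycin had a significant
          influence on the prediction results. CONCLUSIONS  The RF and TabPFN models show certain advantages in the
          task of classifying vancomycin trough concentrations, while their performance in the low and intermediate
          concentration classes still leaves room for improvement.
          Keywords  machine learning; vancomycin; blood drug concentration; MIMIC-Ⅳ database; classification prediction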
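
The discretization rule and the macro-averaged metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `discretize_trough` and `macro_metrics` are hypothetical, the six trough values and predicted labels are invented purely for illustration, and only the class cut-offs (<10, 10-20, ≥20 μg/mL) come from the abstract. Balanced accuracy is taken here in its common definition as the mean per-class recall.

```python
from collections import Counter

LABELS = ("low", "intermediate", "high")


def discretize_trough(conc_ug_ml):
    """Map a trough concentration (ug/mL) to the three classes used in the abstract."""
    if conc_ug_ml < 10:
        return "low"
    if conc_ug_ml < 20:
        return "intermediate"
    return "high"


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Compute accuracy and macro-averaged precision/recall/F1 from label lists."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(t, c)] for t in labels if t != c)
        fn = sum(pairs[(c, p)] for p in labels if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(pairs[(c, c)] for c in labels) / len(y_true),
        # Assumption: balanced accuracy defined as the mean per-class recall.
        "balanced_accuracy": sum(recalls) / k,
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }


# Invented example values, not data from the study.
troughs = [5.2, 12.0, 25.1, 9.9, 20.0, 15.5]
y_true = [discretize_trough(c) for c in troughs]
y_pred = ["low", "intermediate", "high", "intermediate", "high", "low"]

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Macro averaging gives each concentration class equal weight regardless of how many patients fall into it, which is why it is often reported alongside plain accuracy when class sizes differ.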

          Construction of a machine learning classification prediction model for vancomycin blood concentrations
          based on the MIMIC-Ⅳ database
          LIN Xiaohui, WANG Yujia, ZHANG Lingling, XU Shuanglin (Dept. of Clinical Pharmacy, Ningde Municipal
          Hospital Affiliated to Ningde Normal University, Fujian Ningde 352100, China)

          ABSTRACT  OBJECTIVE  To construct a classification prediction model for vancomycin trough blood concentrations
          and to optimize its precision dosing strategies. METHODS  Patient records meeting the inclusion criteria were
          extracted from the Medical Information Mart for Intensive Care database. Following data cleaning and
          preprocessing, a final cohort of 9 902 patients was analyzed. Feature selection was performed through
          correlation analysis and the Boruta feature selection algorithm. Vancomycin trough concentrations were
          discretized into three categories based on the clinical therapeutic window: low (<10 μg/mL), intermediate
          (10-20 μg/mL), and high (≥20 μg/mL). Six machine learning algorithms were employed to construct
          classification models: tabular prior-data fitted network (TabPFN), logistic regression (LR), random forest
          (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and K-nearest neighbors (KNN). Model
          performance was evaluated using 10-fold cross-validation (10-CV), with primary metrics including accuracy,
          balanced accuracy, macro-averaged precision, macro-averaged recall, macro-averaged F1, and the area under the
          multiclass receiver operating characteristic curve (OvR-AUC). Shapley Additive Explanations (SHAP) was used
          to analyze the direction and magnitude of the impact of each feature on the model's predictions. RESULTS  The
          RF and TabPFN models performed best (accuracy of 0.741 4 and 0.737 7, and OvR-AUC of 0.907 0 and 0.895 8,
          respectively). The XGBoost model exhibited moderate performance, while the LR, SVM, and KNN models performed
          relatively poorly. Confusion matrix heatmaps showed that both RF and TabPFN achieved higher accuracy in
          predicting high-concentration cases but slightly lower performance in the low and intermediate concentration
          categories. Bootstrap evaluation combined with 10-CV showed that the RF model performed stably across the
          evaluation metrics (accuracy: 0.741 4; balanced accuracy: 0.740 3; macro-averaged precision: 0.732 1;
          macro-averaged recall: 0.736 0; macro-averaged F1: 0.736 0; OvR-AUC: 0.907 0), indicating good
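          classification performance and generalization ability. SHAP analysis revealed that creatinine, urea nitrogen,
          and the daily cumulative dose and administration frequency of vancomycin, which were key predictors, had a
          significant impact on the prediction results. CONCLUSIONS  RF and TabPFN models demonstrate certain
          advantages in the classification prediction

          Δ Funding: Joint Funding Project of the Natural Science Foundation of Fujian Province (No.2024J01942);
          University-level Scientific Research Project of Ningde Normal University (No.2023ZX715)
          * First author: Pharmacist-in-charge, master's degree. Research interests: evidence-based pharmacy,
          pharmacoeconomics. E-mail: linxiaohui@ndnu.edu.cn
          # Corresponding author: Associate chief pharmacist. Research interests: clinical pharmacy, data science.
          E-mail: sunbird01@163.com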


          · 2448 ·    China Pharmacy  2025 Vol. 36  No. 19