China Pharmacy (《中国药房》), 2025, Issue 19
·Smart Pharmacy·
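Construction of a machine learning classification prediction model for vancomycin trough blood concentration based on the MIMIC-Ⅳ databaseΔ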
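LIN Xiaohui*, WANG Yujia, ZHANG Lingling, XU Shuanglin# (Department of Clinical Pharmacy, Ningde Municipal Hospital Affiliated to Ningde Normal University, Ningde, Fujian 352100, China)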
CLC number R969.3; TP181  Document code A  Article ID 1001-0408(2025)19-2448-06
DOI 10.6039/j.issn.1001-0408.2025.19.16
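ABSTRACT OBJECTIVE To construct a classification prediction model for vancomycin trough blood concentration and to optimize precision dosing strategies for the drug. METHODS Data from eligible patients were screened from the Medical Information Mart for Intensive Care database; after data cleaning and preprocessing, 9 902 patients were finally included. Features were selected by combining correlation analysis with the Boruta feature selection algorithm. Vancomycin trough concentrations were discretized according to the clinical therapeutic window into low (<10 μg/mL), intermediate (10-20 μg/mL), and high (≥20 μg/mL) categories. Six machine learning algorithms were used to build classification models: tabular prior-data fitted network (TabPFN), logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), and K-nearest neighbors (KNN). Model performance was evaluated by 10-fold cross-validation (10-CV); the main metrics were accuracy, balanced accuracy, macro-averaged precision, macro-averaged recall, macro-averaged F1, and the area under the multiclass ROC curve (OvR-AUC). Shapley Additive Explanations (SHAP) were used to analyze the direction and strength of each feature's influence on model predictions. RESULTS The RF and TabPFN models performed best (accuracy of 0.741 4 and 0.737 7, and OvR-AUC of 0.907 0 and 0.895 8, respectively), the XGBoost model performed moderately, and the LR, SVM, and KNN models performed poorly. Confusion matrix heatmaps showed that the RF and TabPFN models predicted the high-concentration category more accurately but performed slightly worse on the low- and intermediate-concentration categories. Bootstrapping combined with 10-CV showed that the RF model performed stably on all evaluation metrics (accuracy 0.741 4, balanced accuracy 0.740 3, macro-averaged precision 0.732 1, macro-averaged recall 0.736 0, macro-averaged F1 0.736 0, OvR-AUC 0.907 0), indicating good classification performance and discriminative ability. SHAP analysis found that key features such as creatinine, urea nitrogen, and the daily cumulative dose and dosing frequency of vancomycin had a significant influence on the predictions. CONCLUSIONS The RF and TabPFN models show certain advantages in the classification prediction of vancomycin trough blood concentration, but their performance on the low- and intermediate-concentration categories still has room for improvement.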
KEYWORDS machine learning; vancomycin; blood concentration; MIMIC-Ⅳ database; classification prediction
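The discretization thresholds and the macro-averaged metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, class labels, and sample values are hypothetical, and scoring a class with no predictions or no cases as 0.0 is an assumption. Macro averaging here means the unweighted mean of the per-class values.

```python
from collections import Counter

# Class names are illustrative; the cut-offs follow the abstract.
LABELS = ("low", "intermediate", "high")


def discretize_trough(conc_ug_ml):
    """Map a trough concentration in ug/mL to one of three classes."""
    if conc_ug_ml < 10:
        return "low"            # < 10 ug/mL
    if conc_ug_ml < 20:
        return "intermediate"   # 10 to < 20 ug/mL
    return "high"               # >= 20 ug/mL


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Accuracy plus the unweighted mean of per-class precision, recall, F1."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = pairs[(label, label)]
        n_pred = sum(n for (_, p), n in pairs.items() if p == label)
        n_true = sum(n for (t, _), n in pairs.items() if t == label)
        # Assumption: a class with no predictions or no cases scores 0.0.
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    k = len(labels)
    return {
        "accuracy": sum(pairs[(c, c)] for c in labels) / len(y_true),
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }


# Hypothetical trough values and predictions, for illustration only.
troughs = [5.0, 9.9, 10.0, 15.2, 20.0, 31.4]
y_true = [discretize_trough(c) for c in troughs]
y_pred = ["low", "intermediate", "intermediate", "intermediate", "high", "high"]
metrics = macro_metrics(y_true, y_pred)
```

Because every class contributes equally to a macro average, a model that does well on the high-concentration class but poorly on the other two is penalized more than plain accuracy would suggest.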
Construction of machine learning classification prediction model for vancomycin blood concentrations
based on MIMIC-Ⅳ database
LIN Xiaohui,WANG Yujia,ZHANG Lingling,XU Shuanglin(Dept. of Clinical Pharmacy, Ningde Municipal
Hospital Affiliated to Ningde Normal University, Fujian Ningde 352100, China)
ABSTRACT OBJECTIVE To construct a classification prediction model for vancomycin trough blood concentration and to optimize
precision dosing strategies for the drug. METHODS Patient records meeting the inclusion criteria were extracted from the Medical Information
Mart for Intensive Care database. Following data cleaning and preprocessing, a final cohort of 9 902 patients was analyzed. Feature
selection was performed through correlation analysis and the Boruta feature selection algorithm. Vancomycin blood concentrations
were discretized into three categories based on clinical therapeutic windows: low (<10 μg/mL), intermediate (10-20 μg/mL),
and high (≥20 μg/mL). Six machine learning algorithms were used to construct classification models: tabular prior-data
fitted network (TabPFN), logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), support vector
machine (SVM), and K-nearest neighbors (KNN). Model performance was evaluated using 10-fold cross-validation (10-CV); the
primary metrics were accuracy, balanced accuracy, macro-averaged precision, macro-averaged recall, macro-averaged F1, and the one-vs-rest area under the multiclass receiver
operating characteristic curve (OvR-AUC). Shapley Additive Explanations (SHAP) was used to analyze the direction and
magnitude of each feature's effect on the model's predictions. RESULTS The RF
and TabPFN models performed best (accuracy of 0.741 4 and 0.737 7, and OvR-AUC of 0.907 0 and 0.895 8,
respectively). The XGBoost model performed moderately, while the LR, SVM, and KNN models performed relatively
poorly. Confusion matrix heatmaps showed that both RF and TabPFN predicted high-concentration cases more accurately
but performed slightly worse in the low and intermediate concentration categories. Bootstrapping combined with 10-CV
showed that the RF model performed stably across the evaluation metrics (accuracy: 0.741 4; balanced
accuracy: 0.740 3; macro-averaged precision: 0.732 1; macro-averaged recall: 0.736 0; macro-averaged F1: 0.736 0; OvR-AUC: 0.907 0), indicating good
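classification performance and generalization ability. SHAP analysis revealed that creatinine, urea nitrogen, and the daily
cumulative dose and administration frequency of vancomycin, which were key predictors, had a significant impact on the
prediction results. CONCLUSIONS RF and TabPFN models demonstrate certain advantages in the classification prediction
Δ Funding: Joint Funding Project of the Natural Science Foundation of Fujian Province (No.2024J01942); Ningde Normal University institutional research project (No.2023ZX715)
* First author: pharmacist-in-charge, master's degree. Research interests: evidence-based pharmacy, pharmacoeconomics. E-mail: linxiaohui@ndnu.edu.cn
# Corresponding author: associate chief pharmacist. Research interests: clinical pharmacy, data science. E-mail: sunbird01@163.com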
· 2448 · China Pharmacy 2025 Vol. 36 No. 19

