TY  - JOUR
AU  - Nguyen Tien, Dung
AU  - Thi Thu Bui, Huong
AU  - Hoang Thi Ngoc, Tram
AU  - Thi Pham, Thuy
AU  - Trung Nguyen, Dac
AU  - Nguyen Thi Thu, Huyen
AU  - Thu Hang Vu, Thi
AU  - Lan Anh Luong, Thi
AU  - Thu Hoang, Lan
AU  - Cam Tu, Ho
AU  - Körber, Nina
AU  - Bauer, Tanja
AU  - Khanh Ho, Lam
PY  - 2025
DA  - 2025/5/23
TI  - A Data-Driven Approach to Assessing Hepatitis B Mother-to-Child Transmission Risk Prediction Model: Machine Learning Perspective
JO  - JMIR Form Res
SP  - e69838
VL  - 9
KW  - chronic hepatitis B virus infection
KW  - liver
KW  - pregnant women
KW  - cord blood
KW  - PBMCs (peripheral blood mononuclear cells)
KW  - ID3 (Iterative Dichotomiser 3)
KW  - CART (classification and regression trees)
AB  - Background: Hepatitis B virus (HBV) can be transmitted from mother to child either through transplacental infection or via blood-to-blood contact during or immediately after delivery. Early and accurate risk assessments are essential for guiding clinical decisions and implementing effective preventive measures. Data mining techniques are powerful tools for identifying key predictors in medical diagnostics. Objective: This study aims to develop a robust predictive model for mother-to-child transmission (MTCT) of HBV using decision tree algorithms, specifically Iterative Dichotomiser 3 (ID3) and classification and regression trees (CART). The study identifies clinically and paraclinically relevant predictors, particularly hepatitis B e antigen (HBeAg) status and peripheral blood mononuclear cell (PBMC) concentration, for effective risk stratification and prevention. Additionally, we will assess the model’s reliability and generalizability through cross-validation with various training-test split ratios, aiming to enhance its applicability in clinical settings and inform improved preventive strategies against HBV MTCT. Methods: This study used decision tree algorithms, ID3 and CART, on a data set of 60 hepatitis B surface antigen (HBsAg)–positive pregnant women. Samples were collected either before or at the time of delivery, enabling the inclusion of patients who were undiagnosed or had limited access to treatment. We analyzed both clinical and paraclinical parameters, with a particular focus on HBeAg status and PBMC concentration. Additional biochemical markers were evaluated for their potential contributory or inhibitory effects on MTCT risk. The predictive models were validated using multiple training-test split ratios to ensure robustness and generalizability. Results: Our analysis showed that 20 out of 48 (based on a split ratio of 0.8 from a total of 60 cases, 42%) to 27 out of 57 (based on a split ratio of 0.95 from a total of 60 cases, 47%) training cases with HBeAg-positive status were associated with a significant risk of MTCT of HBV (χ²₈=21.16, P=.007, df=8). Among HBeAg-negative women, those with PBMC concentrations ≥8 × 10⁶ cells/mL exhibited a low risk of MTCT, whereas individuals with PBMC concentrations <8 × 10⁶ cells/mL demonstrated a negligible risk. Across all training-test split ratios, the decision tree models consistently identified HBeAg status and PBMC concentration as the most influential predictors, underscoring their robustness and critical role in MTCT risk stratification. Conclusions: This study demonstrates that decision tree models are effective tools for stratifying the risk of MTCT of HBV by integrating key clinical and paraclinical markers. Among these, HBeAg status and PBMC concentration emerged as the most critical predictors. While the analysis focused on untreated patients, it provides a strong foundation for future investigations involving treated populations. These findings offer actionable insights to support the development of more targeted and effective HBV MTCT prevention strategies.
SN  - 2561-326X
UR  - https://formative.jmir.org/2025/1/e69838
UR  - https://doi.org/10.2196/69838
DO  - 10.2196/69838
ID  - info:doi/10.2196/69838
ER  - 