@Article{info:doi/10.2196/69838, author="Nguyen Tien, Dung and Thi Thu Bui, Huong and Hoang Thi Ngoc, Tram and Thi Pham, Thuy and Trung Nguyen, Dac and Nguyen Thi Thu, Huyen and Thu Hang Vu, Thi and Lan Anh Luong, Thi and Thu Hoang, Lan and Cam Tu, Ho and K{\"o}rber, Nina and Bauer, Tanja and Khanh Ho, Lam", title="A Data-Driven Approach to Assessing Hepatitis B Mother-to-Child Transmission Risk Prediction Model: Machine Learning Perspective", journal="JMIR Form Res", year="2025", month="May", day="23", volume="9", pages="e69838", keywords="chronic hepatitis B virus infection; liver; pregnant women; cord blood; PBMCs (peripheral blood mononuclear cells); ID3 (Iterative Dichotomiser 3); CART (classification and regression trees)", abstract="Background: Hepatitis B virus (HBV) can be transmitted from mother to child either through transplacental infection or via blood-to-blood contact during or immediately after delivery. Early and accurate risk assessments are essential for guiding clinical decisions and implementing effective preventive measures. Data mining techniques are powerful tools for identifying key predictors in medical diagnostics. Objective: This study aims to develop a robust predictive model for mother-to-child transmission (MTCT) of HBV using decision tree algorithms, specifically Iterative Dichotomiser 3 (ID3) and classification and regression trees (CART). The study identifies clinically and paraclinically relevant predictors, particularly hepatitis B e antigen (HBeAg) status and peripheral blood mononuclear cell (PBMC) concentration, for effective risk stratification and prevention. Additionally, we will assess the model's reliability and generalizability through cross-validation with various training-test split ratios, aiming to enhance its applicability in clinical settings and inform improved preventive strategies against HBV MTCT. 
Methods: This study used decision tree algorithms---ID3 and CART---on a data set of 60 hepatitis B surface antigen (HBsAg)--positive pregnant women. Samples were collected either before or at the time of delivery, enabling the inclusion of patients who were undiagnosed or had limited access to treatment. We analyzed both clinical and paraclinical parameters, with a particular focus on HBeAg status and PBMC concentration. Additional biochemical markers were evaluated for their potential contributory or inhibitory effects on MTCT risk. The predictive models were validated using multiple training-test split ratios to ensure robustness and generalizability. Results: Our analysis showed that 20 out of 48 (based on a split ratio of 0.8 from a total of 60 cases, 42{\%}) to 27 out of 57 (based on a split ratio of 0.95 from a total of 60 cases, 47{\%}) training cases with HBeAg-positive status were associated with a significant risk of MTCT of HBV ($\chi^2_8$=21.16, P=.007, df=8). Among HBeAg-negative women, those with PBMC concentrations ≥8 {\texttimes} 10$^6$ cells/mL exhibited a low risk of MTCT, whereas individuals with PBMC concentrations <8 {\texttimes} 10$^6$ cells/mL demonstrated a negligible risk. Across all training-test split ratios, the decision tree models consistently identified HBeAg status and PBMC concentration as the most influential predictors, underscoring their robustness and critical role in MTCT risk stratification. Conclusions: This study demonstrates that decision tree models are effective tools for stratifying the risk of MTCT of HBV by integrating key clinical and paraclinical markers. Among these, HBeAg status and PBMC concentration emerged as the most critical predictors. While the analysis focused on untreated patients, it provides a strong foundation for future investigations involving treated populations. These findings offer actionable insights to support the development of more targeted and effective HBV MTCT prevention strategies. 
", issn="2561-326X", doi="10.2196/69838", url="https://formative.jmir.org/2025/1/e69838", url="https://doi.org/10.2196/69838" }