TY  - JOUR
AU  - Ma, Shaoying
AU  - Jiang, Shuning
AU  - Yang, Olivia
AU  - Zhang, Xuanzhi
AU  - Fu, Yu
AU  - Zhang, Yusen
AU  - Kaareen, Aadeeba
AU  - Ling, Meng
AU  - Chen, Jian
AU  - Shang, Ce
PY  - 2024
DA  - 2024/1/24
TI  - Use of Machine Learning Tools in Evidence Synthesis of Tobacco Use Among Sexual and Gender Diverse Populations: Algorithm Development and Validation
JO  - JMIR Form Res
SP  - e49031
VL  - 8
KW  - machine learning
KW  - natural language processing
KW  - tobacco control
KW  - sexual and gender diverse populations
KW  - lesbian
KW  - gay
KW  - bisexual
KW  - transgender
KW  - queer
KW  - LGBTQ+
KW  - evidence synthesis
AB  - Background: From 2016 to 2021, the volume of peer-reviewed publications related to tobacco has experienced a significant increase. This presents a considerable challenge in efficiently summarizing, synthesizing, and disseminating research findings, especially when it comes to addressing specific target populations, such as the LGBTQ+ (lesbian, gay, bisexual, transgender, queer, intersex, asexual, Two Spirit, and other persons who identify as part of this community) populations. Objective: In order to expedite evidence synthesis and research gap discoveries, this pilot study has the following three aims: (1) to compile a specialized semantic database for tobacco policy research to extract information from journal article abstracts, (2) to develop natural language processing (NLP) algorithms that comprehend the literature on nicotine and tobacco product use among sexual and gender diverse populations, and (3) to compare the discoveries of the NLP algorithms with an ongoing systematic review of tobacco policy research among LGBTQ+ populations. Methods: We built a tobacco research domain–specific semantic database using data from 2993 paper abstracts from 4 leading tobacco-specific journals, with enrichment from other publicly available sources. We then trained an NLP model to extract named entities after learning patterns and relationships between words and their context in text, which further enriched the semantic database. Using this iterative process, we extracted and assessed studies relevant to LGBTQ+ tobacco control issues, further comparing our findings with an ongoing systematic review that also focuses on evidence synthesis for this demographic group. Results: In total, 33 studies were identified as relevant to sexual and gender diverse individuals’ nicotine and tobacco product use. Consistent with the ongoing systematic review, the NLP results showed that there is a scarcity of studies assessing policy impact on this demographic using causal inference methods. In addition, the literature is dominated by US data. We found that the product drawing the most attention in the body of existing research is cigarettes or cigarette smoking and that the number of studies of various age groups is almost evenly distributed between youth or young adults and adults, consistent with the research needs identified by the US health agencies. Conclusions: Our pilot study serves as a compelling demonstration of the capabilities of NLP tools in expediting the processes of evidence synthesis and the identification of research gaps. While future research is needed to statistically test the NLP tool’s performance, there is potential for NLP tools to fundamentally transform the approach to evidence synthesis.
SN  - 2561-326X
UR  - https://formative.jmir.org/2024/1/e49031
UR  - https://doi.org/10.2196/49031
UR  - http://www.ncbi.nlm.nih.gov/pubmed/38265858
DO  - 10.2196/49031
ID  - info:doi/10.2196/49031
ER  - 