@Article{info:doi/10.2196/35027, author="Allem, Jon-Patrick and Majmundar, Anuja and Dormanesh, Allison and Donaldson, Scott I", title="Identifying Health-Related Discussions of Cannabis Use on Twitter by Using a Medical Dictionary: Content Analysis of Tweets", journal="JMIR Form Res", year="2022", month="Feb", day="25", volume="6", number="2", pages="e35027", keywords="cannabis; marijuana; Twitter; social media; adverse event; cannabis safety; dictionary; rule-based classifier; medical; health-related; conversation; codebook", abstract="Background: The cannabis product and regulatory landscape is changing in the United States. Against the backdrop of these changes, there have been increasing reports on health-related motives for cannabis use and adverse events from its use. The use of social media data in monitoring cannabis-related health conversations may be useful to state- and federal-level regulatory agencies as they grapple with identifying cannabis safety signals in a comprehensive and scalable fashion. Objective: This study attempted to determine the extent to which a medical dictionary---the Unified Medical Language System Consumer Health Vocabulary---could identify cannabis-related motivations for use and health consequences of cannabis use based on Twitter posts in 2020. Methods: Twitter posts containing cannabis-related terms were obtained from January 1 to August 31, 2020. Each post from the sample (N=353,353) was classified into at least 1 of 17 a priori categories of common health-related topics by using a rule-based classifier. Each category was defined by the terms in the medical dictionary. A subsample of posts (n=1092) was then manually annotated to help validate the rule-based classifier and determine if each post pertained to health-related motivations for cannabis use, perceived adverse health effects from its use, or neither. 
Results: The validation process indicated that the medical dictionary could identify health-related conversations in 31.2{\%} (341/1092) of posts. Specifically, 20.4{\%} (223/1092) of posts were accurately identified as posts related to a health-related motivation for cannabis use, while 10.8{\%} (118/1092) of posts were accurately identified as posts related to a health-related consequence from cannabis use. The health-related conversations about cannabis use included those about issues with the respiratory system, stress to the immune system, and gastrointestinal issues, among others. Conclusions: The mining of social media data may prove helpful in improving the surveillance of cannabis products and their adverse health effects. However, future research needs to develop and validate a dictionary and codebook that capture cannabis use--specific health conversations on Twitter.", issn="2561-326X", doi="10.2196/35027", url="https://formative.jmir.org/2022/2/e35027", pmid="35212637" }