Search Articles

Search Results (1 to 7 of 7 Results)


Using Natural Language Processing Methods to Build the Hypersexuality in Bipolar Reddit Corpus: Infodemiology Study of Reddit

BERTopic [29] is an algorithm that uses pretrained embedding models to create word and document embeddings so that documents occupying similar regions of the vector space can be grouped together to form topics. By default, BERTopic incorporates Bidirectional Encoder Representations From Transformers embeddings and a term frequency–inverse document frequency algorithm, which compares the importance of terms within a cluster and creates term representations based on that comparison [60].

Daisy Harvey, Paul Rayson, Fiona Lobban, Jasper Palmier-Claus, Clare Dolman, Anne Chataigné, Steven Jones

JMIR Infodemiology 2025;5:e65632
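
The idea in the excerpt above, that documents close together in vector space are grouped into a topic, can be illustrated with a minimal sketch. The 3-dimensional vectors, the document names, the `group_by_similarity` helper, and the 0.8 threshold are all hypothetical; real embedding models produce vectors with hundreds of dimensions, and BERTopic's default pipeline clusters with UMAP and HDBSCAN rather than the greedy cosine-similarity rule used here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (assumption: invented for illustration only).
embeddings = {
    "doc_sleep_1": [0.9, 0.1, 0.0],
    "doc_sleep_2": [0.8, 0.2, 0.1],
    "doc_diet_1": [0.1, 0.9, 0.2],
    "doc_diet_2": [0.0, 0.8, 0.3],
}

def group_by_similarity(vectors, threshold=0.8):
    """Greedy grouping: join the first group whose seed vector is similar enough.

    This is a stand-in for a real clustering step, not BERTopic's method.
    """
    groups = []  # list of (seed_vector, [document names])
    for name, vec in vectors.items():
        for seed, members in groups:
            if cosine_similarity(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append((vec, [name]))
    return [members for _, members in groups]

groups = group_by_similarity(embeddings)
print(groups)
```

With these toy vectors, the two "sleep" documents land in one group and the two "diet" documents in another.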

Probing Public Perceptions of Antidepressants on Social Media: Mixed Methods Study

We used BERTopic [44], a topic modeling approach leveraging transformers and class-based term frequency-inverse document frequency to generate coherent topics. To enhance interpretability, we used GPT-4 to refine topic labels by analyzing keywords and representative documents. The following prompt was used (Textbox 1).

Jianfeng Zhu, Xinyu Zhang, Ruoming Jin, Hailong Jiang, Deric R Kenne

JMIR Form Res 2025;9:e62680

Mpox Discourse on Twitter by Sexual Minority Men and Gender-Diverse Individuals: Infodemiological Study Using BERTopic

BERTopic [17] is a more recent topic-modeling technique that has gained popularity for its ease of interpretation and ability to leverage Hugging Face transformers and class-based Term Frequency–Inverse Document Frequency (c-TF-IDF) to create dense clusters.

Yunwen Wang, Karen O’Connor, Ivan Flores, Carl T Berdahl, Ryan J Urbanowicz, Robin Stevens, José A Bauermeister, Graciela Gonzalez-Hernandez

JMIR Public Health Surveill 2024;10:e59193

Comparing Open-Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study

Built on the foundations of bidirectional encoder representations from transformers (BERT), BERTopic introduces a novel approach to topic modeling [29,30]. Unlike traditional unsupervised models such as latent Dirichlet allocation, which rely on a “bag-of-words” model [31], BERTopic overcomes the problem of semantic information loss, significantly enhancing the accuracy of generated topics and providing more interpretable compositions for each topic, which greatly facilitates the classification of topics.

Yuhe Ke, Rui Yang, Nan Liu

J Med Internet Res 2024;26:e48330
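
The "semantic information loss" attributed to bag-of-words models in the excerpt above can be shown with a small sketch. The example sentences and the `bag_of_words` helper are hypothetical and use naive whitespace tokenization; the point is only that word counts discard word order and cannot relate synonyms.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (order is discarded)."""
    return Counter(text.lower().split())

# Two sentences with opposite meanings but identical word counts.
a = "the patient refused the treatment"
b = "the treatment refused the patient"
same_bag = bag_of_words(a) == bag_of_words(b)

# Two phrases with similar meanings but no shared tokens.
c = "saw the doctor"
d = "visited a physician"
shared_tokens = set(bag_of_words(c)) & set(bag_of_words(d))

print(same_bag)       # the two bags are indistinguishable
print(shared_tokens)  # no overlap, so a count-based model sees no relation
```

An embedding-based representation is intended to place `c` and `d` near each other and to distinguish `a` from `b`, which is the advantage the excerpt describes.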

Machine Learning–Based Approach for Identifying Research Gaps: COVID-19 as a Case Study

For clustering the sentences into semantically similar topics, we used the BERTopic algorithm [25]. The BERTopic algorithm is an unsupervised learning algorithm for topic modeling. It uses Bidirectional Encoder Representations from Transformers (BERT). BERTopic does not require labeled data as it extracts topics from an input text in an unsupervised way [26].

Alaa Abd-alrazaq, Abdulqadir J Nashwan, Zubair Shah, Ahmad Abujaber, Dari Alhuwail, Jens Schneider, Rawan AlSaad, Hazrat Ali, Waleed Alomoush, Arfan Ahmed, Sarah Aziz

JMIR Form Res 2024;8:e49411

Disruptions in the Cystic Fibrosis Community’s Experiences and Concerns During the COVID-19 Pandemic: Topic Modeling and Time Series Analysis of Reddit Comments

At this stage, we did not perform any further data cleaning to maintain the natural structure of the comments, since the BERTopic library was developed with natural text and has its own way of dealing with noise and outliers. BERTopic is a topic modeling technique that uses state-of-the-art language models and applies a class-based term frequency-inverse document frequency procedure, which calculates how relevant a word is to a class of documents, to generate topics [22].

Lean Franzl Yao, Kiki Ferawati, Kongmeng Liew, Shoko Wakamiya, Eiji Aramaki

J Med Internet Res 2023;25:e45249
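
The class-based weighting described in the excerpt above can be sketched in a few lines. This is a simplified illustration that follows the commonly cited c-TF-IDF formulation, in which a term's weight in a class is its frequency in that class multiplied by log(1 + A / f), where A is the average number of words per class and f is the term's frequency across all classes. The `c_tf_idf` function, the class labels, and the toy tokens are hypothetical, and raw counts are used without the normalization a library implementation may apply.

```python
import math
from collections import Counter

def c_tf_idf(classes):
    """Score terms per class.

    classes: dict mapping a class label to the list of tokens from all
    documents assigned to that class (each class is treated as one document).
    """
    tf = {label: Counter(tokens) for label, tokens in classes.items()}

    # Frequency of each term across all classes.
    total_freq = Counter()
    for counts in tf.values():
        total_freq.update(counts)

    # Average number of words per class.
    avg_words = sum(len(tokens) for tokens in classes.values()) / len(classes)

    return {
        label: {
            term: count * math.log(1 + avg_words / total_freq[term])
            for term, count in counts.items()
        }
        for label, counts in tf.items()
    }

# Hypothetical toy classes (assumption: invented tokens for illustration).
classes = {
    "medication": "dose dose pharmacy refill the the".split(),
    "isolation": "lockdown alone lockdown visit the the".split(),
}

scores = c_tf_idf(classes)
top_terms = {label: max(s, key=s.get) for label, s in scores.items()}
print(top_terms)
```

In this toy data, "the" appears equally in both classes and is down-weighted, while terms concentrated in one class score highest there.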

Comparison of Methods for Estimating Temporal Topic Models From Primary Care Clinical Text Data: Retrospective Closed Cohort Study

Examples of recently developed neural topic models include top2vec [10] and BERTopic [11]. In this study, we focused on the BERTopic model. BERTopic begins with embedding documents empirically observed in the study corpus into a latent embedding space. Many methods exist for embedding discrete linguistic units (words, sentences, paragraphs, documents, etc) into an embedding space.

Christopher Meaney, Michael Escobar, Therese A Stukel, Peter C Austin, Liisa Jaakkimainen

JMIR Med Inform 2022;10(12):e40102