Published in Vol 9 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/68316.
Co-Design of a Health Screening Program Fact Sheet by People Experiencing Homelessness and ChatGPT: Focus Group Study


1DocRoom Health Research Program, Hungarian Charity Service of the Order of Malta, 28 Bem Rakpart, Budapest, Hungary

2Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary

3Department of Community Dentistry, Faculty of Dentistry, Semmelweis University, Budapest, Hungary

4Centre for Translational Medicine, Semmelweis University, Budapest, Hungary

*all authors contributed equally

Corresponding Author:

Nóra Radó, PhD


Background: People experiencing homelessness have worse oral health outcomes and face a notable health information asymmetry compared to the general population. Screening programs present a viable option for this population; however, barriers to access, such as lower levels of health literacy, lack of information, and mistrust, limit their chances of participating in such programs.

Objective: The aim of this study is to investigate the applicability of generative artificial intelligence (AI) in designing a homeless health screening program fact sheet with experts by experience using co-design principles.

Methods: Six fact sheet text variants were created by the open-access version of ChatGPT 3.5 for an oral cancer screening program targeting people experiencing homelessness in Budapest, Hungary. Clients of homeless social services (N=23) were invited to a short questionnaire survey and 3 semistructured focus group discussions between May and July 2024. General opinions regarding generative AI technology and direct feedback on the text variants were obtained. Additionally, a standardized readability assessment of the text variants was completed via the Sydney Health Literacy Lab Editor.

Results: Almost three-quarters of participants (17/23) stated that they had previously heard about AI; however, their self-assessment of the extent of their knowledge averaged 2.38 (n=16) on a 5-point Likert scale. During the first focus group discussion, all 6 variants received high scores (between 4.63 and 4.92 on a 5-point Likert scale). In the next sessions, participants remained positive when the pool was narrowed to 4 versions, although they scored the texts lower. During open discussions, the text variants were considered understandable, while difficulties with medical expressions, lengthy sentences, and references to a stereotypical homeless subgroup (rough sleepers) were also reported. The health literacy editor showed that most AI-generated text variants were difficult to read and too complex for the target group.

Conclusions: The co-design process revealed that focus group participants actively wanted to shape the fact sheet drafts. They shared their insights on how to make the text variants more appealing to the target audience. Moreover, the involvement of generative AI technology revealed that the participants had heard about the concept of AI and of text generation as one of its potential functions, and they did not reject its use in health care settings.

JMIR Form Res 2025;9:e68316

doi:10.2196/68316


Introduction

Homelessness and Oral Health

Homelessness is a complex social phenomenon that leaves individuals, for shorter or longer periods, in an extremely vulnerable life situation. According to previous research, homelessness is associated with a significantly higher disease burden [1-3] and higher mortality rates for both women and men than in the general population [4]. In Western, high-income countries, studies have also shown that homelessness is an independent risk factor for mortality, and average life expectancy ranges between 50 and 65 years [5].

Previous research on the oral health of people experiencing homelessness found that this population has poor outcomes; they are in great need of restorative, oral hygiene, and periodontal treatment. They also have inadequate access to dental services, mostly relying on emergency treatment, in parallel with unmet treatment needs [6-8]. In the United Kingdom, dental health was identified as this group’s largest unmet health need [9]. In the United States, a national study of homeless adults using Health Care for the Homeless services found that approximately half of homeless adults had an unmet need for dental care [10]. Higher rates of substance use (alcohol, tobacco, drugs) further put the oral and general health of people experiencing homelessness at risk [11]. Freitas et al [10] found strong associations between having lost half or more of their teeth and evidence of problem drinking, cocaine use, or having ever smoked. In 1997, in Hungary, precancerous lesions were found in or around the oral cavity in 14% of people experiencing homelessness or participating in alcohol withdrawal treatment, benign tumors in 2.33%, and malignancies in 2.66% [12].

Access to oral care also comes with serious barriers for this population; the cost of care for private service providers, lengthy waiting lists for publicly funded institutions, competing priorities (which might lead them to secure food and accommodation before health care), a lack of information, mistrust of health care systems, and experiences of discrimination in care settings all drive people experiencing homelessness away from dental care services, resulting in them needing to rely on emergency treatment in cases of acute problems [11,13-15]. Moreover, psychosocial factors play a significant role; higher levels of dental anxiety and dental phobia were found in the homeless adult population [16].

Screening Programs, Health Literacy, and Information Asymmetry

As the literature shows, the potential implications of a health screening program in dental practice are reductions in morbidity, mortality, and onward costs to health care systems by avoiding acute presentations of late-stage chronic diseases [17]. Moreover, Nunez et al [18] found that, in the United States, veterans who received dental care stayed in homeless intervention programs significantly longer than veterans who did not. Their findings also indicated that the impact of providing dental care on outcomes among homeless veterans is equivalent to the impact of psychological treatments for depression.

To overcome the barriers to dental care for people experiencing homelessness in Hungary, the Charity Service of the Order of Malta, in collaboration with Semmelweis University and Óbuda University, launched an oral cancer screening program with digital capabilities in Budapest in 2024. The initiative fits into the wider digital health research agenda of the Charity Service, which previously completed numerous digital health projects [14,19-21]. Using advanced asynchronous telecare solutions in this vulnerable community, the new digital platform Lesionwizard was designed to deliver an oral cancer screening program for people experiencing homelessness using teledentistry [22].

As an additional barrier, a lack of information seriously burdens vulnerable populations. One of the main problems is information asymmetry between providers and people experiencing homelessness, coupled with lower levels of (oral) health literacy. In our previous study in collaboration with the Digital Health Working Group at Semmelweis University, Budapest, Hungary, we found that difficulties in obtaining reliable information from service providers may lead people experiencing homelessness to look up medical information online or turn to alternative sources [14]. Csikar et al [23] also identified the level of (oral) health literacy as a barrier for people experiencing homelessness, who had difficulties understanding letters sent to them. The authors concluded that this affected their prioritization of oral health, as individuals may not have understood the importance of oral care or their options for accessing it.

The Application of Co-Design and Generative Artificial Intelligence

To facilitate participation in our oral cancer screening program, the research team decided to aid the initiative with an A5-format, awareness-raising, short health information fact sheet that presents the initiative as acceptable, available, and effective for this vulnerable population [13]. Co-design principles and the technological assistance of the generative artificial intelligence (AI) tool ChatGPT (OpenAI) were applied.

Co-design has previously been defined as a participatory approach that brings individuals together to collaborate and combine their knowledge, skills, and resources to accomplish a design task [24]; it has also been applied in digital health to develop tools, educational resources, and health information materials [24-28]. It involves the meaningful engagement of end users recognized as experts by experience [29]. Previous research found that co-design, co-creation, or co-production can be empowering for socially marginalized or excluded groups, such as people experiencing homelessness, and is also a pivotal approach to tackling stigmatization and promoting inclusivity. Co-design techniques have resulted in increased applicability and acceptance of research questions and outputs, greater participant engagement, better knowledge of different contexts, and an improved community network for researchers [30].

Generative AI software, such as ChatGPT, is a large language model (LLM) combined with a user-friendly interface that uses deep learning algorithms trained on vast amounts of data to generate multimodal humanlike responses to user prompts [31]. Its applicability in medicine is currently under scrutiny, but it has great promise in aiding doctor-patient communication and providing patient information. It has performed satisfactorily in answering physician-generated medical queries across 12 distinct specialties [32]. It has also been shown to simplify online health information [33], to generate dermatologic patient education materials according to specific reading levels [34], and to translate patient education materials from English into other languages [35].

In this research project, we aimed to co-design an awareness-raising fact sheet for an oral cancer screening program with people experiencing homelessness as experts by experience and ChatGPT. The latter was used to present textual alternatives for this health information piece, so we could also test the usability of ChatGPT in designing adequate information materials serving the needs of people experiencing homelessness.


Methods

Participants and Recruitment Procedure

The study followed the Consolidated Criteria for Reporting Qualitative Research (COREQ) checklist, adapted to focus groups [34] (see Checklist 1). Three focus group discussions were organized to provide feedback regarding patient information materials for an oral cancer screening program. One was an already existing group of experts by experience; in addition, 2 ad hoc groups were formed from clients of 3 shelters in Budapest, Hungary (Miklós Street Integrated Homeless Care Center, Homeless Care Center at Bem rakpart, and Galvani Street Homeless Care Center), operated by the Hungarian Charity Service of the Order of Malta. The sample was a convenience sample; the researchers advertised the ad hoc focus groups in the shelters, and clients over 18 years of age without mental health problems or dementia who expressed their interest participated voluntarily, without any compensation.

The experts by experience group was established in 2023 to assist in co-designing initiatives targeting relevant health issues of people experiencing homelessness. Expert group meetings were organized on a monthly schedule with the attendance of 6‐9 experts. The option to participate in the experts by experience group was open to adult clients (>18 years) of homeless shelters operated by the Hungarian Charity Service of the Order of Malta, without mental health problems or dementia.

From the recruited sample (N=26), 3 people decided not to participate (2 due to scheduling problems and 1 due to the difficulty of the topic). The number of participants in the 3 focus groups was 6, 10, and 7, respectively; demographic characteristics are shown in Table 1. The focus group discussions took place on May 16, June 4, and July 4, 2024, and their length varied between 40 and 55 minutes.

Table 1. Demographic characteristics of the focus groups discussing ChatGPT-generated text variants. Participants were clients of 3 homeless shelters operated by the Hungarian Charity Service of the Order of Malta.
Group | Age (years), mean (SD) | Gender, female (male)
Experts by experience (n=6) | 55.83 (14.97) | 1 (5)
Focus group 2 (n=10) | 61.50 (7.11) | 2 (8)
Focus group 3 (n=7) | 53.57 (5.19) | 0 (7)

Text Generation

Six text variants of basic client information materials were generated on May 13, 2024, by the open-access version of ChatGPT 3.5, developed by OpenAI. The researchers chose OpenAI's most advanced freely available product because, according to usage statistics, it is the most widely used [36]. Prompts were written in English, while the results were produced in Hungarian. Each text version was limited to 150 words due to the constraints of an A5-size, one-sided fact sheet. All prompts specified the target population (people experiencing homelessness), the main aim of the text (to raise participation), and a reasoning or style/tonal requirement. These requirements were the following: (1) scientific evidence regarding oral cancer, (2) statistical evidence regarding oral cancer, (3) as motivating as possible, (4) an informal, familiar tone using slang expressions, (5) formatted as a clickbait news article, and (6) structured in a bullet-point format. Otherwise, the prompts were formulated as plain text by people without relevant expertise in prompt design, as the researchers intended to use ChatGPT as a tool that nonexpert social sector users would employ. The prompts used in this study and the resulting Hungarian text variants, as well as their English translations, are provided in Multimedia Appendix 1.
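
For readers who want to reproduce a similar workflow programmatically, the prompt structure described above (target population, aim, and a style/tonal requirement) could be assembled as in the sketch below. The style labels, the exact wording, and the commented-out API call are illustrative assumptions, not the study's verbatim prompts.

```python
# Hypothetical sketch of the study's 3-part prompt structure:
# target population + aim + style/tonal requirement.
STYLES = {
    "scientific": "grounded in scientific evidence about oral cancer",
    "statistical": "grounded in statistical evidence about oral cancer",
    "motivational": "as motivating as possible",
    "informal": "in an informal, familiar tone, using slang expressions",
    "clickbait": "formatted like a clickbait news article",
    "bullets": "structured in a bullet-point format",
}

def build_prompt(style: str) -> str:
    """Assemble one plain-language prompt with the three fixed elements."""
    return (
        "Write a fact sheet of at most 150 words, in Hungarian, inviting "
        "people experiencing homelessness to a free oral cancer screening "
        f"program in Budapest. The text should be {STYLES[style]} and aim "
        "to raise participation."
    )

# Optional API call (requires the `openai` package and an API key):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": build_prompt("scientific")}],
# )
# print(reply.choices[0].message.content)

for style in STYLES:
    print(style, "->", build_prompt(style))
```

The same loop over the 6 style keys yields all 6 prompt variants, so a nonexpert user only has to edit the style descriptions.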

Feedback Questionnaires and Semistructured Group Discussions

A 2-part short feedback questionnaire developed by the research team was used to quantify different aspects of AI in general and of the AI-generated text variants, and it also served to catalyze an open group discussion. The first part consisted of 3 items: (1) whether the participants had heard about AI (yes or no), (2) self-assessed knowledge of AI technology (on a 5-point Likert scale), and (3) trust in its use in health care settings (on a 5-point Likert scale). The second part included 7 items for each text variant. Understandability and clarity, the quality of the information content, the tone and style of the texts, and how convincing they were, were each assessed on a 5-point Likert scale. Lastly, 3 open questions inquired about the strengths and weaknesses of the texts, any suggested changes, and the applicability of the texts in the screening program. The quantified values were collected in paper-and-pencil form, while answers to the open questions were discussed by the group members, with notes taken by the research team.

Text Evaluation via the Sydney Health Literacy Lab Health Literacy Editor

After the focus groups analyzed the text variants, we also assessed the texts with standardized readability measurement tools. There are several methods to calculate the readability scores of texts, such as the Flesch-Kincaid method [37], the Gunning fog index [38], and the SMOG readability formula [39]; the last of these is frequently used in health research [40]. In this study, we used the framework by Ayre et al [33], the Sydney Health Literacy Lab (SHeLL) Health Literacy Editor, a web-based tool designed to objectively assess the extent to which health information is written in plain language, whereas the other methods serve as general readability measures. The SHeLL Editor, available as a web-based tool [41], assesses word count, readability as a grade reading score, language complexity, passive voice usage, and the use of bullet points for lists [33]. Based on this framework, we completed the first 4 assessments and omitted the bullet-point measure, as bullet points appeared in only 1 text variant. The text assessments were then compared with the focus group assessments.
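
To illustrate how a grade reading score of this kind is computed, the Flesch-Kincaid grade formula [37] can be sketched in a few lines of Python. The syllable counter below is a rough vowel-group heuristic of our own; it is not the algorithm used by the SHeLL Editor, which also measures jargon and passive voice.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels (minimum 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Short, common words score low; long, jargon-heavy sentences score high.
print(flesch_kincaid_grade("The cat sat on the mat."))
print(flesch_kincaid_grade(
    "Comprehensive periodontal examination facilitates "
    "identification of malignancies."))
```

A score of roughly 11.5, as Table 2 reports for the scientific variant, corresponds to upper high school reading ability, well above the grade 8 ceiling often recommended for health materials.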

Ethical Considerations

Participation in the focus group discussion was voluntary and without any compensation. Data collection from the questionnaires was anonymous, and notes from the focus group discussions were deidentified. After a verbal summary of the study tasks and setting the ground rules of the focus groups, consent was obtained from all members of the group, and questionnaires were collected anonymously. During the focus group discussions, no dropout occurred. As an observational, noninterventional, and nonbiomedical investigation of the study subjects’ sociological behavior, it was exempt from ethical review, as it is out of the scope of the Hungarian Act CLIV of 1997 on Health Care, the Decree 23/2002 (9 May) of the Ministry of Health on Medical Research on Human Subjects, and the Decree 35/2005 (26 VIII) of the Ministry of Health on the Clinical Investigation of Investigational Medicinal Products for Human Use and the Application of Good Clinical Practice [42]. For the same reason, the Semmelweis University Committee for Regional Institutional Scientific and Research Ethics could not issue an institutional review board exemption.


Results

General Acceptance of AI

During the focus group discussions, participants were first asked about AI technology as a starting point. Of the 23 participants, 17 (74%) answered yes when asked whether they had heard about AI. On a 5-point Likert scale assessing the extent of their knowledge of AI, they were more hesitant, averaging 2.38 (n=16), where 1 was not familiar at all and 5 was totally familiar. As examples of possible functions of AI, text or picture generation was mentioned most often (8 times), and in 3 cases, AI-generated content was described as “fake” or “not real.” One participant said:

I know it can also generate fake photos.

After forming a general impression of AI, participants discussed its application in health care. For the question “Would you trust AI-generated medical texts, documents, or tools?” the answers averaged 3.06 (n=16) on a 5-point Likert scale (where 1 was no trust at all and 5 was complete trust). When participants were asked about the reasoning behind their answers, they emphasized the need for human involvement in decision-making regarding health issues. Two participants said the following:

Even if it was created by humans, machines can have errors, so I would have less confidence in it if my health were at stake.
I have no opposition regarding artificial intelligence if they use it as a helping tool, but it would be frightening for me if it were to make decisions without human oversight.

Applicability of Text Variants

In the focus group discussions, the AI-generated text variants were presented. Before using these texts, 2 independent researchers reviewed the AI-generated drafts. Modifications were applied in only 2 cases, due to severe grammatical errors in Hungarian that compromised the integrity of those texts. Otherwise, all variants were intact and brought to the focus groups in their original form. The source of the texts was revealed to group members only at the close of each session.

First, participants were asked to provide general feedback on the applicability of each text variant in the context of a future oral cancer screening program. Scores on a 5-point Likert scale were collected across 4 dimensions (understandability and clarity, the quality of the information content, the tone and style of the texts, and how convincing they were), and the results are reported as the average of these 4 items. During the first focus group discussion with experts by experience, all 6 variants were presented to the group.

Although the expert group members were highly positive regarding all variants, there were slight differences in the scoring of the text versions. The ranking turned out to be the following: (1) scientific reasoning (4.92; n=6), (2) informal, familiar tone (4.83; n=6), (3) focusing on motivation (4.75; n=6), (4) clickbait news article style (4.71; n=6), (5) statistical reasoning (4.67; n=6), and (6) bullet-point format (4.63; n=6). Participants were also asked to agree on the two most promising text variants that represented the highest opportunity to raise the attendance rate according to their experience. A consensus was reached after a short discussion, resulting in the variant based on scientific reasoning being selected as the top choice, and the informal, familiar version as the second choice, without knowing the quantitative results. Participants were convinced that different text variants could address different subgroups of people experiencing homelessness. One participant remarked the following:

The familiar one will motivate the youth more. It sounds not so official.

After the first focus group discussion, 2 text variants (number 2 with statistical reasoning and number 6 with a bullet-point format) were removed from the pool as these were highly redundant according to the previous participants, and going through 6 texts challenged their attention, limiting the effectiveness of group discussions. The remaining 4 variants were presented to both remaining focus groups in the same form.

Participants of the latter two group discussions (n=17) were more critical in all aspects of the quantitative survey. The results of the 5-point Likert scale scoring were the following: (1) informal, familiar tone (3.77; n=13), (2) focusing on motivation (3.69; n=15), (3) scientific reasoning (3.69; n=16), and (4) clickbait news article style (3.50; n=12).

Evaluation of AI-Generated Content by Research Participants

After scoring all text versions, an open discussion took place. All group discussions concluded that the texts were almost fully understandable. Two participants remarked the following:

I can totally get what they are speaking about.
The main point is clear, even if there are difficult words.

However, there were suggestions for certain wording changes to ease reading. The replacement of medical jargon (from “oral cancer” to “mouth cavity tumor,” as the latter is more commonly used by the general population in Hungarian) was mentioned 7 times and affected all variants, while words with Latin roots, for example, “informing” and “early staging,” were each suggested once for replacement with more widely used expressions.

In addition, the length of sentences as a factor causing gaps in readability was mentioned twice in the context of the versions based on scientific and statistical reasoning. Furthermore, participants accommodated in night shelters and other temporary housing solutions mentioned that the phrasing in two-thirds (4/6) of the text variants was not inclusive enough, as the term “rough sleepers” was used as a synonym for the homeless population, and this might result in the alienation of other subgroups. As one participant said:

They say people living on the streets only. That’s not very motivating for me, who is living in a shelter.

Based on the focus group discussions, the research group summarized the main strengths and weaknesses of the text variants created by ChatGPT 3.5 in Textbox 1.

Textbox 1. Evaluation of strengths and weaknesses of the ChatGPT-generated health information content of 6 text variants by people experiencing homelessness.

Strengths

  • No significant opposition was detected against AI-created content from people experiencing homelessness.
  • It is easy to generate many text outputs with open-access tools quickly.
  • The results are almost ready to use, with minimal modification needed from the textual coherence point of view (in the Hungarian language).
  • In most cases, participants were positive about whether the texts could fulfill the goal of motivating the target population to attend the program.
  • Text variants in various tones and styles can attract different age groups.

Weaknesses

  • There was a level of disapproval, mostly regarding AI-based decision-making processes concerning health issues.
  • Text variants repeated the same problems (eg, medical jargon is difficult to understand for vulnerable populations).
  • The motivational elements of the text variants were stereotypical of a subgroup of people experiencing homelessness (rough sleepers) and neglected other prominent subgroups (eg, people accommodated in community shelters or temporary hostels).

Assessment of Text Variants With the SHeLL Editor

During the focus group discussions, participants mentioned that the text variants contained words that were difficult for them to understand. To gain a more comprehensive picture of the readability of the ChatGPT-generated text variants, we therefore evaluated the texts with the SHeLL Editor [33,38,43]. As this tool is only available in English, we translated the Hungarian text variants into English. The assessment of the text variants by word count, grade reading score, language complexity (percentage), and passive voice usage is summarized in Table 2.

Table 2. Evaluation of the readability of the 6 ChatGPT-generated text variants with the Sydney Health Literacy Lab Health Literacy Editor.
Text version | Word count (sentence count) | Grade reading score | Text complexity, % | Passive voice, word count
Scientific evidence | 87 (6) | 11.5 | 26.6 | 0
Statistical evidence | 81 (5) | 10.4 | 21.4 | 0
Motivational | 90 (7) | 9.2 | 16.2 | 0
Informal | 94 (11) | 7.4 | 7.3 | 0
Clickbait news article | 100 (12) | 10.3 | 22.3 | 0
Bullet-point format | 65 (6) | 8.3 | 21.2 | 0

Grade reading score refers to how difficult a text is to read and roughly corresponds to the expected reading ability for US school students in different grades [33]. Text complexity means the proportion of the text (%) that contains acronyms, uncommon words (as defined by an existing English-language corpus), or terms listed as public health or medical jargon [43].


Discussion

Main Findings

Our aim of co-designing an awareness-raising fact sheet for an oral cancer screening program with people experiencing homelessness and ChatGPT was achieved. We were also able to test the usability of ChatGPT in designing adequate information materials serving the needs of people experiencing homelessness by having focus group participants evaluate the ChatGPT-generated text variants. Moreover, focus group participants expressed prior knowledge of the concept of AI. Among its potential functions, they mentioned text or image generation most often. It also turned out that they did not reject the medical use of AI, although they indicated hesitancy in trusting it, especially without human oversight.

The text evaluation covered cohesiveness, wording, tone, and style, and the results showed that, overall, the texts were able to fulfill their purpose of motivating the target group to participate in the screening activities, although participants suggested that the wording could be less stereotypical and easier to read. They also mentioned that text variants with different tones and styles could attract different age groups within the diverse population of people experiencing homelessness. The readability assessment of the texts corroborated these findings, as the readability level of most text variants was above the level recommended for health-related texts in the literature [44].

Applicability of Generative Software in Health Care

Many possible applications of generative AI in clinical settings have been proposed, such as writing discharge summaries [44], drafting medical notes from transcripts of physician-patient encounters, summarizing laboratory test results [45], supporting medical education [46] and medical research [47], providing a communication platform for patients, and facilitating health information dissemination [47]. One of the most obvious applications is generating tailored patient information on a predetermined topic, as collecting massive amounts of available evidence on different topics and producing human-like reasoning are easily achievable with open-access versions of generative software.

However, vulnerable populations might have different contexts, motivations, challenges, and medical needs than the general population and often require tailored medical treatment approaches to ensure the safety and efficacy of the treatment alongside potentially optimal health outcomes [48]. Moreover, concerns have arisen that the quality of AI-generated results depends on the user’s ability to develop effective prompts, input accurate text for inquiries, and access advanced features through subscriptions; as a result, individuals with limited health literacy, insufficient prompt development skills, or an inability to afford premium subscriptions may miss out on these technological benefits, potentially exacerbating health disparities [49].

Vulnerable Groups and Their Knowledge and Trust Around AI

In health care, underserved subgroups are known to have limited access to care pathways and altered demands, in addition to an existing systematic information asymmetry, as our previous study also revealed [14]. As the results showed, anxiety, misunderstanding, discrimination, and negative experiences related to this information deficit could be mitigated by applying co-design principles. Better usability of such services might play an important role in the more equitable management of health issues. Moreover, using ChatGPT as a co-design element might unburden health care and social care personnel tasked with formulating client information, as creating relevant materials with appropriate prompts takes significantly less time than building them from scratch. On the other hand, editing the draft or iterating the prompt sequence may require a level of expertise and take additional time, making the time-saving benefit of ChatGPT unclear. Furthermore, as another potential downside of relying on the technology, it is questionable whether, and for how long, the subscription-based model of OpenAI or any other generative software company will allow vulnerable populations to benefit from the advantages of generative AI.

Our study recruited people experiencing homelessness, one of the most underserved populations. The study participants had nonnegligible prior knowledge of AI technology’s existence, although they self-evaluated their knowledge as slightly below average. Previous research shows that people with lower socioeconomic status are slower to adopt new technology, and rates of smartphone and internet use among people experiencing homelessness are lower than among those with similarly low socioeconomic status but more stable housing [14,50]. A 2023 international, multicenter, cross-sectional study assessing the attitudes of hospital patients toward AI in health care across 43 countries, including Hungary, found that patients have a predominantly favorable general view of AI in health care [51]. In Hungary, a representative survey published in September 2024 found that 79% of the population believed they knew what AI was, and 31% of respondents used chatbots and virtual customer service assistants [52].

In our study, participants’ attitudes toward the medical use of AI were slightly above average, suggesting that they might be hesitant or neutral about trusting such services. This is in line with other findings from the Hungarian general population [53]. In that survey, researchers asked respondents how they would feel if their family doctor or medical specialists were to partly rely on AI during their care; overall, 41.2% of respondents were neutral, 27.5% said they would feel rather bad or very bad, and 31.3% reported they would feel rather well or very well about it [53].

Text Quality Evaluation

The ChatGPT-generated draft text variants had to be modified by the researchers, as these versions contained a few severe grammatical errors in Hungarian; however, after these modifications, the texts were presentable and positively received by the focus groups. This might be because generative AI software is predominantly trained on English texts; for small languages such as Hungarian, limited data are available online for model training, so large language models perform worse in such a “low-resource” language than in English or other “high-resource” languages such as Spanish, Chinese, or Arabic [54].

This could partly explain why the assessment of the text variants by the SHeLL Editor showed such marked differences in readability, even though the target group was defined in the prompts as people experiencing homelessness, implying generally lower health literacy levels. As Ayre et al [33] found when experimenting with prompt design, prompts that described specific health literacy principles (eg, simple language, active voice, minimal jargon) worked better with ChatGPT than prompts that described the target audience. This suggests that social sector employees would greatly benefit from training in prompt design for AI-assisted health text generation. The official ChatGPT prompt engineering guide [55] or specific prompt design elements, such as in-context learning, could also aid the process [56].
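The contrast Ayre et al describe can be sketched as two prompt styles: one that only names the audience and one that spells out concrete health literacy principles. In the Python sketch below, all wording, the model name, and the request payload shape are illustrative assumptions, not the actual prompts or settings used in this study.

```python
# Illustrative source sentence; the study's real fact sheet texts were in Hungarian.
SOURCE_TEXT = "Oral cancer screening detects malignant lesions at an early stage."

# Audience-based prompt: describes who the reader is.
audience_prompt = (
    "Rewrite this fact sheet for people experiencing homelessness:\n"
    + SOURCE_TEXT
)

# Principle-based prompt: spells out specific health literacy principles,
# the style Ayre et al found to work better with ChatGPT.
principle_prompt = (
    "Rewrite this fact sheet using plain-language principles: "
    "simple everyday words, active voice, short sentences, minimal "
    "medical jargon, and a grade 6 reading level:\n"
    + SOURCE_TEXT
)

def build_request(prompt: str, model: str = "gpt-3.5-turbo") -> dict:
    """Assemble a generic chat-completion-style request payload for an LLM API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
```

In practice, the principle-based payload would be sent to the model and the output checked with a readability tool such as the SHeLL Editor before iterating on the prompt.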

Members of the focus groups generally stated that the various styles and tones might attract different subgroups and generations of people experiencing homelessness; however, they also noted that the motivational elements of the text variants were stereotypical of one subgroup (rough sleepers) and lacked representation of other prominent subgroups (eg, people accommodated in community shelters or temporary hostels). This could partly stem from the generalization bias of large language models (LLMs) such as the one behind ChatGPT. These models are trained on large datasets that may contain biases, stereotypes, and prejudiced language [49,57]. As a result, a model may unintentionally learn these biases and produce responses that are offensive or perpetuate harmful stereotypes, such as representing people experiencing homelessness as rough sleepers.

Co-Design With Experts by Experience and Technology

In recent years, co-design, co-creation, co-production, and other forms of citizen engagement and stakeholder collaboration have gained popularity in various fields, including social services for people experiencing homelessness [58]. The involvement of individuals with lived experience has also been shown to increase recruitment and follow-up rates in research projects, add to the validation of research findings, and generate more useful outputs [59,60]. This research project corroborates these previous findings, as the involvement of the experts-by-experience group, as well as the two focus groups, generated useful insights.

This experimental focus group study offered the opportunity to bring generative AI technology into the co-design process as a potential new element for personnel working in the social or health sectors with vulnerable subgroups, although the final benefits of this approach require further research and analysis. Our results showed that ChatGPT could produce a usable draft as a solid base for health information material, which was acceptable to the target group, while the co-design process revealed additional benefits.

Limitations

Our study had some limitations. As a qualitative study relying on focus groups and feedback questionnaires, the methods themselves posed certain drawbacks. Although focus groups encourage participation from vulnerable populations and do not rely on participant literacy, they offer a space where those individual perspectives that differ from the majority opinion might remain hidden due to overriding behavioral or cultural norms or a desire to be seen as conforming [61,62].

The study participants were selected from the urban homeless population in Budapest, Hungary, where socioeconomic conditions might differ from those in the countryside. In addition, participants represented people experiencing homelessness who had a connection to the social infrastructure; therefore, others not in touch with the Hungarian social service architecture were not represented in the study sample. For a qualitative study using focus groups and feedback questionnaires, the sample size was small, and this should be taken into account when drawing conclusions.

Regarding technology, the researchers used ChatGPT 3.5, OpenAI’s most advanced freely accessible model at the time of the research, while other generative AI software, such as Google’s Gemini (previously Bard), Claude, or Synthesia, was not used. The use of ChatGPT 3.5, or any other generative software, also raises the question of replicability: given the constant and rapid development of LLMs, it is uncertain whether this research could be replicated under the same technological conditions. Regarding text evaluation, we did not use a human-written baseline text variant, as we intended to involve only ChatGPT in the co-creation process and to assess the quality of the text variants that emerged from it.

Conclusions

Our study revealed that AI-generated health information materials can be used for people experiencing homelessness in an oral cancer screening program. The co-design process showed that the focus group participants wanted to actively shape the drafts for the screening program; they shared ideas and insights on how to finalize the texts to avoid prevailing stereotypes about people experiencing homelessness and include more subgroups, as well as how to frame the text for various target audiences.

The group discussions also revealed some challenges of current LLM technology when it is used without prior prompting experience. Based on our results, future applications would benefit from using the most up-to-date LLM technology, considering the health literacy and general language skills of vulnerable populations, avoiding generalization bias against this underrepresented group, and extensively upskilling social workers and others who aim to produce health information material in prompt design.

Moreover, co-creation with members of the target audience might make the final product more appealing to the target group of a health screening program. As a recommendation for efficient use, offering prompt design training to personnel working in the social or health sectors may help maximize the impact of AI in client care.

Acknowledgments

The publication of this research project was supported by the European Social Fund Plus (ESF+) under the Project Code SOLACE-CEE 101172625. Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them.

Conflicts of Interest

None declared.

Multimedia Appendix 1

ChatGPT prompts and responses.

DOCX File, 26 KB

Checklist 1

Consolidated Criteria for Reporting Qualitative Research checklist.

DOCX File, 19 KB

  1. Schreiter S, Bermpohl F, Krausz M, et al. The prevalence of mental illness in homeless people in Germany. Dtsch Arztebl Int. Oct 6, 2017;114(40):665-672. [CrossRef] [Medline]
  2. Zhang L, Norena M, Gadermann A, et al. Concurrent disorders and health care utilization among homeless and vulnerably housed persons in Canada. J Dual Diagn. 2018;14(1):21-31. [CrossRef] [Medline]
  3. Graffy P, McKinnon S, Lee G, Remington P. Life outside: a narrative ethnographic inquiry into the determinants of homelessness. J Poverty. Apr 16, 2019;23(3):202-228. [CrossRef]
  4. Aldridge RW, Story A, Hwang SW, et al. Morbidity and mortality in homeless individuals, prisoners, sex workers, and individuals with substance use disorders in high-income countries: a systematic review and meta-analysis. Lancet. Jan 20, 2018;391(10117):241-250. [CrossRef] [Medline]
  5. van Dongen SI, van Straaten B, Wolf J, et al. Self-reported health, healthcare service use and health-related needs: a comparison of older and younger homeless people. Health Soc Care Community. Jul 2019;27(4):e379-e388. [CrossRef] [Medline]
  6. Beaton L, Coles E, Freeman R. Homeless in Scotland: an oral health and psychosocial needs assessment. Dent J (Basel). Dec 1, 2018;6(4):67. [CrossRef] [Medline]
  7. Daly B, Newton T, Batchelor P, Jones K. Oral health care needs and oral health-related quality of life (OHIP-14) in homeless people. Community Dent Oral Epidemiol. Apr 2010;38(2):136-144. [CrossRef] [Medline]
  8. Figueiredo RLF, Hwang SW, Quiñonez C. Dental health of homeless adults in Toronto, Canada. J Public Health Dent. 2013;73(1):74-78. [CrossRef] [Medline]
  9. Simons D, Pearson N, Movasaghi Z. Developing dental services for homeless people in East London. Br Dent J. Oct 2012;213(7):E11. [CrossRef] [Medline]
  10. Freitas DJ, Kaplan LM, Tieu L, Ponath C, Guzman D, Kushel M. Oral health and access to dental care among older homeless adults: results from the HOPE HOME study. J Public Health Dent. Dec 2019;79(1):3-9. [CrossRef] [Medline]
  11. Bedmar MA, Bennasar-Veny M, Artigas-Lelong B, et al. Health and access to healthcare in homeless people: protocol for a mixed-methods study. Medicine (Baltimore). Feb 18, 2022;101(7):e28816. [CrossRef] [Medline]
  12. Szabó G, Klenk G, Veér A. Correlation between the combination of alcohol consumption and smoking in oral cancer (screening of the population at risk). Orv Hetil. Dec 28, 1997;138(52):3297-3299. [Medline]
  13. Stormon N, Pradhan A, McAuliffe A, Ford PJ. Does a facilitated pathway improve access to dental services for homeless and disadvantaged adults? Eval Program Plann. Dec 2018;71:46-50. [CrossRef] [Medline]
  14. Radó N, Békási S, Győrffy Z. Health technology access and peer support among digitally engaged people experiencing homelessness: qualitative study. JMIR Hum Factors. May 14, 2024;11:e55415. [CrossRef] [Medline]
  15. Liu M, Hwang SW. Health care for homeless people. Nat Rev Dis Primers. Jan 14, 2021;7(1):5. [CrossRef] [Medline]
  16. Goode J, Hoang H, Crocombe L. Homeless adults’ access to dental services and strategies to improve their oral health: a systematic literature review. Aust J Prim Health. Jul 9, 2018. [CrossRef] [Medline]
  17. Doughty J, M Gallier S, Paisi M, Witton R, J Daley A. Opportunistic health screening for cardiovascular and diabetes risk factors in primary care dental practices: experiences from a service evaluation and a call to action. Br Dent J. Nov 2023;235(9):727-733. [CrossRef] [Medline]
  18. Nunez E, Gibson G, Jones JA, Schinka JA. Evaluating the impact of dental care on housing intervention program outcomes among homeless veterans. Am J Public Health. Dec 2013;103 Suppl 2(Suppl 2):S368-S373. [CrossRef] [Medline]
  19. Radó N, Girasek E, Békási S, Győrffy Z. Digital technology access and health-related internet use among people experiencing homelessness in Hungary: quantitative survey. J Med Internet Res. Oct 19, 2022;24(10):e38729. [CrossRef] [Medline]
  20. Győrffy Z, Békási S, Döbrössy B, et al. Exploratory attitude survey of homeless persons regarding telecare services in shelters providing mid- and long-term accommodation: the importance of trust. PLoS One. 2022;17(1):e0261145. [CrossRef] [Medline]
  21. Békási S, Girasek E, Győrffy Z. Telemedicine in community shelters: possibilities to improve chronic care among people experiencing homelessness in Hungary. Int J Equity Health. Dec 17, 2022;21(1):181. [CrossRef] [Medline]
  22. Sanders E. From user-centered to participatory design approaches. In: Design and the Social Sciences: Making Connections. 2002:1-7. [CrossRef] ISBN: 978-0-415-27376-3
  23. Csikar J, Vinall-Collier K, Richemond JM, Talbot J, Serban ST, Douglas GVA. Identifying the barriers and facilitators for homeless people to achieve good oral health. Community Dent Health. May 30, 2019;36(2):137-142. [CrossRef] [Medline]
  24. Sayani A, Hussain A, Freedman H, et al. EP.04A.10 Creating safe connections: usability testing of an intervention co-designed to increase equitable access to lung cancer screening. J Thorac Oncol. Oct 2024;19(10):S463-S464. [CrossRef]
  25. Fox S, Brown LJE, Antrobus S, et al. Co-design of a smartphone app for people living with dementia by applying agile, iterative co-design principles: development and usability study. JMIR mHealth uHealth. Jan 14, 2022;10(1):e24483. [CrossRef] [Medline]
  26. Latulippe K, Hamel C, Giroux D. Co-design to support the development of inclusive eHealth tools for caregivers of functionally dependent older persons: social justice design. J Med Internet Res. Nov 9, 2020;22(11):e18399. [CrossRef] [Medline]
  27. Noorbergen TJ, Adam MTP, Teubner T, Collins CE. Using co-design in mobile health system development: a qualitative study with experts in co-design and mobile health system development. JMIR mHealth uHealth. Nov 10, 2021;9(11):e27896. [CrossRef] [Medline]
  28. Lewis CC, Taba M, Allen TB, et al. Developing an educational resource aimed at improving adolescent digital health literacy: using co-design as research methodology. J Med Internet Res. Aug 7, 2024;26(1):e49453. [CrossRef] [Medline]
  29. Visser FS, Stappers PJ, van der Lugt R, Sanders EBN. Contextmapping: experiences from practice. CoDesign. Apr 2005;1(2):119-149. [CrossRef]
  30. Rodriguez A, Shambhunath S, Wijesiri TID, Biazus-Dalcin C, Mc Goldrick N. Co-design of health educational materials with people experiencing homelessness and support workers: a scoping review. Front Oral Health. 2024;5:1355349. [CrossRef] [Medline]
  31. Lambert R, Choo ZY, Gradwohl K, Schroedl L, Ruiz De Luzuriaga A. Assessing the application of large language models in generating dermatologic patient education materials according to reading level: qualitative study. JMIR Dermatol. May 16, 2024;7:e55898. [CrossRef] [Medline]
  32. Jia JL, Nguyen B, Sarin KY. Assessment of readability and content of patient-initiated Google search results for epidermolysis bullosa. Pediatr Dermatol. Nov 2019;36(6):1004-1006. [CrossRef] [Medline]
  33. Ayre J, Mac O, McCaffery K, et al. New frontiers in health literacy: using ChatGPT to simplify health information for people in the community. J Gen Intern Med. Mar 2024;39(4):573-577. [CrossRef] [Medline]
  34. Tong A, Sainsbury P, Craig J. Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. Int J Qual Health Care. Dec 2007;19(6):349-357. [CrossRef] [Medline]
  35. Dzuali F, Seiger K, Novoa R, et al. ChatGPT may improve access to language-concordant care for patients with non-English language preferences. JMIR Med Educ. Dec 10, 2024;10:e51435. [CrossRef] [Medline]
  36. Liu Y, Wang H. Who on Earth Is Using Generative AI. World Bank; 2024. [CrossRef]
  37. Williamson JML, Martin AG. Analysis of patient information leaflets provided by a district general hospital by the Flesch and Flesch-Kincaid method. Int J Clin Pract. Dec 2010;64(13):1824-1831. [CrossRef] [Medline]
  38. Świeczkowski D, Kułacz S. The use of the Gunning Fog Index to evaluate the readability of Polish and English drug leaflets in the context of Health Literacy challenges in Medical Linguistics: an exploratory study. Cardiol J. 2021;28(4):627-631. [CrossRef] [Medline]
  39. McLaughlin GH. SMOG grading — a new readability formula. J Read. 1969;22:639-646. URL: https://ogg.osu.edu/media/documents/health_lit/WRRSMOG_Readability_Formula_G._Harry_McLaughlin__1969_.pdf [Accessed 2025-06-30]
  40. Mac O, Ayre J, Bell K, McCaffery K, Muscat DM. Comparison of readability scores for written health information across formulas using automated vs manual measures. JAMA Netw Open. Dec 1, 2022;5(12):e2246051. [CrossRef] [Medline]
  41. SHeLL Editor. URL: https://shell.techlab.works/ [Accessed 2025-06-30]
  42. 1997. évi CLIV. törvény az egészségügyről (1997th CLIV. Law on Health Care). WHO MiNDbank. URL: https://extranet.who.int/mindbank/item/3817 [Accessed 2025-05-29]
  43. Ayre J, Bonner C, Muscat DM, et al. Multiple automated health literacy assessments of written health information: development of the SHeLL (Sydney Health Literacy Lab) Health Literacy Editor v1. JMIR Form Res. Feb 14, 2023;7:e40645. [CrossRef] [Medline]
  44. Patel SB, Lam K. ChatGPT: the future of discharge summaries? Lancet Digit Health. Mar 2023;5(3):e107-e108. [CrossRef] [Medline]
  45. Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med. Mar 30, 2023;388(13):1233-1239. URL: https://www.nejm.org/doi/full/10.1056/NEJMsr2214184 [Accessed 2024-10-11] [CrossRef] [Medline]
  46. Preiksaitis C, Rose C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: scoping review. JMIR Med Educ. Oct 20, 2023;9(1):e48785. [CrossRef] [Medline]
  47. Fatima A, Shafique MA, Alam K, Fadlalla Ahmed TK, Mustafa MS. ChatGPT in medicine: a cross-disciplinary systematic review of ChatGPT’s (artificial intelligence) role in research, clinical practice, education, and patient interaction. Medicine (Baltimore). Aug 9, 2024;103(32):e39250. [CrossRef] [Medline]
  48. Tao H, Liu L, Cui J, Wang K, Peng L, Nahata MC. Potential use of ChatGPT for the treatment of infectious diseases in vulnerable populations. Ann Biomed Eng. Dec 2024;52(12):3141-3144. [CrossRef] [Medline]
  49. Uddin J, Feng C, Xu J. Health communication on the internet: promoting public health and exploring disparities in the generative AI era. J Med Internet Res. Mar 6, 2025;27(1):e66032. [CrossRef] [Medline]
  50. Raven MC, Kaplan LM, Rosenberg M, Tieu L, Guzman D, Kushel M. Mobile phone, computer, and internet use among older homeless adults: results from the HOPE HOME cohort study. JMIR mHealth uHealth. Dec 10, 2018;6(12):e10049. [CrossRef] [Medline]
  51. Busch F, Hoffmann L, Xu L, et al. Multinational attitudes towards AI in healthcare and diagnostics among hospital patients. medRxiv. Preprint posted online on Sep 2, 2024. [CrossRef]
  52. Broadest ever research on AI readiness in Hungary published. IT Business. 2024. URL: https://itbusiness.hu/english/research-on-ai-awareness-in-hungary/ [Accessed 2024-10-21]
  53. Győrffy Z, Radó N, editors. E-Patients and E-Physicians in Hungary. 2024. URL: https://www.semmelweiskiado.hu/termek/2027/e-patients-and-e-physicians-in-hungary [Accessed 2025-06-30] ISBN: 9789633316443
  54. Nicholas G, Bhatia A. Lost in translation: large language models in non-English content analysis. arXiv. Preprint posted online on Jun 12, 2023. [CrossRef]
  55. Prompt engineering best practices for ChatGPT. OpenAI. URL: https://help.openai.com/en/articles/10032626-prompt-engineering-best-practices-for-chatgpt [Accessed 2025-05-08]
  56. Ferber D, Wölflein G, Wiest IC, et al. In-context learning enables multimodal large language models to classify cancer pathology images. Nat Commun. Nov 21, 2024;15(1):10104. [CrossRef] [Medline]
  57. Ray PP. ChatGPT: a comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems. 2023;3:121-154. [CrossRef]
  58. Meriluoto T. Between expertise and authenticity: co-creation in Finnish Housing First Initiatives from the perspective of experts-by-experience. Eur J Homelessness. 2018;12(1):61-83. URL: https://www.feantsaresearch.org/download/12-1_a3_article-35525901618013701312.pdf [Accessed 2025-06-30]
  59. Crooks J, Flemming K, Shulman C, Casey E, Hudson B. Involving people with lived experience of homelessness in palliative and end of life care research: key considerations from experts in the field. Res Involv Engagem. Jan 30, 2024;10(1):16. [CrossRef] [Medline]
  60. Staley K. 'Is it worth doing?' Measuring the impact of patient and public involvement in research. Res Involv Engagem. 2015;1(1):6. [CrossRef] [Medline]
  61. Kitzinger J. Qualitative research. Introducing focus groups. BMJ. Jul 29, 1995;311(7000):299-302. [CrossRef] [Medline]
  62. Rice PL, Ezzy D. Qualitative Research Methods: A Health Focus. Oxford University Press. ISBN: 9780195506105


AI: artificial intelligence
COREQ: Consolidated Criteria for Reporting Qualitative Research
LLM: large language model
SHeLL: Sydney Health Literacy Lab


Edited by Amaryllis Mavragani; submitted 02.11.24; peer-reviewed by Fangyuan Chen, Velvin Fu; final revised version received 02.06.25; accepted 03.06.25; published 04.07.25.

Copyright

© Nóra Radó, Orsolya Németh, Sándor Békási. Originally published in JMIR Formative Research (https://formative.jmir.org), 4.7.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.