Original Paper
Abstract
Background: The increasing use of ChatGPT in clinical practice and medical education necessitates the evaluation of its reliability, particularly in geriatrics.
Objective: This study aimed to evaluate ChatGPT’s trustworthiness in geriatrics through 3 distinct approaches: evaluating its geriatrics attitude, its geriatrics knowledge, and its clinical application to 2 vignettes of geriatric syndromes (polypharmacy and falls).
Methods: We used the validated University of California, Los Angeles, geriatrics attitude and knowledge instruments to evaluate ChatGPT’s geriatrics attitude and knowledge and compare its performance with that of medical students, residents, and geriatric medicine fellows based on results reported in the literature. We also evaluated ChatGPT’s application to 2 vignettes of geriatric syndromes (polypharmacy and falls).
Results: The mean total score on geriatrics attitude of ChatGPT was significantly lower than that of trainees (medical students, internal medicine residents, and geriatric medicine fellows; 2.7 vs 3.7 on a scale from 1 to 5, where 1=strongly disagree and 5=strongly agree). The mean subscore on positive geriatrics attitude of ChatGPT was higher than that of the trainees (medical students, internal medicine residents, and neurologists; 4.1 vs 3.7 on a scale from 1 to 5, where a higher score means a more positive attitude toward older adults). The mean subscore on negative geriatrics attitude of ChatGPT was lower than that of the trainees and neurologists (1.8 vs 2.8 on a scale from 1 to 5, where a lower subscore means a less negative attitude toward aging). On the University of California, Los Angeles geriatrics knowledge test, ChatGPT outperformed all medical students, internal medicine residents, and geriatric medicine fellows from validated studies (14.7 vs 11.3, with a score range of −18 to +18, where +18 means that all questions were answered correctly). Regarding the polypharmacy vignette, ChatGPT not only demonstrated solid knowledge of potentially inappropriate medications but also accurately identified 7 common potentially inappropriate medications as well as 5 drug-drug and 3 drug-disease interactions. However, ChatGPT missed 5 drug-disease interactions and 1 drug-drug interaction and produced 2 hallucinations. Regarding the fall vignette, ChatGPT answered 3 of 5 pretests correctly and 2 of 5 pretests partially correctly, identified 6 categories of fall risks, followed fall guidelines correctly, listed 6 key physical examinations, and recommended 6 categories of fall prevention methods.
Conclusions: This study suggests that ChatGPT can be a valuable supplemental tool in geriatrics, offering reliable information with less age bias, robust geriatrics knowledge, and comprehensive recommendations for managing 2 common geriatric syndromes (polypharmacy and falls) that are consistent with evidence from guidelines, systematic reviews, and other types of studies. ChatGPT’s potential as an educational and clinical resource could significantly benefit trainees, health care providers, and laypeople. Further research using GPT-4o, larger geriatrics question sets, and more geriatric syndromes is needed to expand and confirm these findings before adopting ChatGPT widely for geriatrics education and practice.
doi:10.2196/63494
Introduction
Background
ChatGPT stands for Chat Generative Pretrained Transformer and was developed by an artificial intelligence (AI) research company, OpenAI. It is an AI chatbot technology that can process natural human language and generate a response. ChatGPT was released to the public by OpenAI on November 30, 2022, and surpassed 1 million users in just 5 days, making it the fastest-growing user base at the time, a record since surpassed only by Threads. ChatGPT currently has >180 million users (as of October 26, 2024) [ ]. ChatGPT users have increased exponentially, and ChatGPT has been increasingly used in medicine, resulting in 4625 publications on PubMed (as of October 26, 2024) [ ]. Notably, ChatGPT was not originally programmed for medical education and clinical application [ ] despite reports of its wide application in medical education [ - ] and clinical practice [ - ]. ChatGPT has been studied in multiple specialties and subspecialties, such as psychiatry [ ], radiology [ ], and cardiology [ ]. Its rapid progress in such a brief time has raised challenges, concerns, and limitations, including privacy, ethnic and racial bias, and legal risks [ - ].
Any application of ChatGPT to medical education and clinical practice could be promising but needs to be well designed and investigated. In the context of a growing aging population, geriatrics education, geriatrics workforces, and geriatrics practice have been facing significant challenges for the last 4 decades [
- ]. ChatGPT could offer an opportunity to improve geriatrics education and clinical practice [ - ]. Preliminary studies have shown promise [ - ]. For example, older adults often have polypharmacy issues [ ], and providers review and deprescribe medications [ ]. A specifically trained large language model may provide useful clinical support in polypharmacy management for primary care physicians [ ]. ChatGPT has scored better on general and theoretical questions in geriatrics than on complex decisions and end-of-life situations, with the lowest scores related to diagnosis and performing complex tests [ ]. Overall, the application of ChatGPT to geriatrics education and clinical practice is sporadic and needs to be expanded and further investigated. Therefore, this study was designed to determine whether we can trust ChatGPT to be used in geriatrics education and clinical practice using 3 distinct approaches.
The first approach was to evaluate any age-biased outputs generated by ChatGPT. The aging narrative spans >210 years [
]. The term ageism, coined by Robert Butler in 1969, refers to age-based stereotyping, prejudice, and discrimination [ ]. Self-perception of aging is another emergent concept related to ageism [ ]. Ageism is global and prevalent in daily life [ , , ], health care systems [ , ], and the news media [ , ]. It is present in various medicine specialties such as oncology [ ] and cardiology [ ] and was persistent during the COVID-19 pandemic [ , ]. Ageism is significantly associated with poor health outcomes and quality of life [ , ]. Interventions to reduce ageism have been widely studied [ , ]. ChatGPT has been found to raise privacy concerns, exhibit ethnic, gender, and racial bias, and entail legal risks [ - ]. For example, ChatGPT recommends fewer female than male ophthalmologists [ ]. However, whether ChatGPT generates age-biased outputs has not been reported. This study aimed to demonstrate whether ChatGPT exhibits ageism by testing its geriatrics attitude using a validated University of California, Los Angeles (UCLA) geriatrics attitude instrument [ , ] and comparing its results with those obtained by medical students, residents, and geriatric medicine fellows from published articles.
The second approach was to evaluate the geriatrics knowledge of ChatGPT. ChatGPT has passed the United States Medical Licensing Examination (USMLE) [
, ] and many other medical examinations [ - ], demonstrating its competence in medical knowledge. However, it has been less studied in geriatrics. For example, in the Geriatrics and Artificial Intelligence in Spain (Ger-IA) project [ ], ChatGPT was prompted with 10 questions about geriatric medicine. Compared to 130 physicians who answered the questionnaire, ChatGPT scored better on general and theoretical questions than on complex decisions and end-of-life situations, with the lowest scores for diagnosis and performing complex tests [ ]. AI is likely to be incorporated into some areas of geriatric medicine, but it still presents significant limitations, mainly in complex medical decision-making [ ]. This study was designed to demonstrate the geriatrics competency of ChatGPT by examining its performance on the validated UCLA geriatrics knowledge test [ , , ] and comparing it with that of medical students, internal medicine residents, and geriatric medicine fellows from previously published articles in the literature.
The third approach was to evaluate ChatGPT’s knowledge of 2 geriatric syndromes and its clinical application to them (polypharmacy and falls). Multiple previous studies have shown that ChatGPT performs well on clinical vignette questions [
- ]. For example, ChatGPT achieved 71.7% accuracy overall on 36 clinical vignettes from the Merck Sharp & Dohme clinical manuals and impressive accuracy in clinical decision-making [ ]. In another study, GPT-4o was queried for diagnoses and management plans with 20 physician-written clinical vignettes in otolaryngology, showing high agreement with physicians [ ]. In addition, ChatGPT demonstrated appreciable knowledge and interpretation skills in psychiatry through 100 clinical case vignettes [ ]. In another study, 33 physicians across 17 specialties generated 284 medical questions; ChatGPT generated largely accurate information in response to diverse medical queries as judged by academic physician specialists, with improvement over time [ ]. ChatGPT also offered accurate recommendations on managing hypertension based on clinical practice guidelines [ ]. ChatGPT provided accurate recommendations for 8 out of 10 clinical scenarios in cardiology [ ]. In another study, GPT-4o generated a good rehabilitation plan for a stroke case from a textbook [ ]. Finally, several studies have reported the application of ChatGPT to predicting common drug-drug interactions [ ], managing polypharmacy [ ], clinical pharmacy [ - ], and medical pharmacology as a self-learning tool [ ]. Taken together, all previous studies suggest that ChatGPT could potentially be applied to complex geriatric syndrome vignettes with comprehensive clinical questions. This study was designed to demonstrate whether ChatGPT has solid geriatrics knowledge and can apply it to 2 common complex geriatric syndrome vignettes (polypharmacy and falls) by responding to comprehensive questions.
Study Objectives
We designed these 3 distinct approaches to provide evidence on whether ChatGPT can be trusted as an assistive tool in geriatrics education and clinical practice.
Methods
This observational study used the validated UCLA geriatrics attitude instrument [
, ] and geriatrics knowledge test [ ].
The Geriatrics Attitude Instrument
The geriatrics attitude instrument comprises 16 questions (
) [ , ]. In total, 6 of these statements reflect positive attitudes toward aging, such as “Most old people are pleasant to be with” (question 1). A total of 10 statements reflect negative attitudes toward aging, such as “Treatment of chronically ill old patients is hopeless” (question 11). Responses are graded on a Likert scale from 1 (strongly disagree) to 5 (strongly agree). The mean total score of ChatGPT was calculated and compared to scores from medical students, internal medicine residents, and geriatric medicine fellows based on previously published studies in the literature. Subtotals for positive and negative attitudes toward aging were also calculated for comparison with previously published studies. For statements reflecting a positive attitude toward aging, higher scores indicated a more positive attitude toward aging. For statements reflecting a negative attitude toward aging, lower scores indicated a less biased attitude toward aging.
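For readers who wish to replicate these calculations, the following minimal Python sketch illustrates how the mean total score and the positive and negative subscores could be computed from a set of Likert responses; the item-number assignments are hypothetical placeholders, not the actual UCLA mapping of positive and negative statements.

```python
# Illustrative scoring for the 16-item attitude instrument described above.
# The item-number sets are hypothetical placeholders, not the actual UCLA
# assignment of positive and negative statements.
POSITIVE_ITEMS = {1, 2, 3, 4, 5, 6}   # 6 positive-attitude statements
NEGATIVE_ITEMS = set(range(7, 17))    # 10 negative-attitude statements

def attitude_scores(responses):
    """responses: dict mapping item number (1-16) to a Likert rating (1-5)."""
    pos = [responses[i] for i in POSITIVE_ITEMS]
    neg = [responses[i] for i in NEGATIVE_ITEMS]
    return {
        "mean_total": sum(responses.values()) / len(responses),
        "positive_subscore": sum(pos) / len(pos),  # higher = more positive attitude
        "negative_subscore": sum(neg) / len(neg),  # lower = less biased attitude
    }

# Example: neutral ratings of 3 on every item yield 3.0 for all three scores.
print(attitude_scores({i: 3 for i in range(1, 17)}))
```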
The Geriatrics Knowledge Test
The validated UCLA geriatrics knowledge test contains 18 questions (
) [ ]. In total, 8 questions are true-or-false statements, such as “Most older people are living in nursing homes” (question 4; false). A total of 10 questions are short clinical vignettes, such as one about a 78-year-old nursing home resident with mild dementia associated with Alzheimer disease, whose capacity is best determined by their ability to understand treatment options (question 9; correct answer: B).
To minimize unfounded guessing, the following scoring system was used: +1 for a correct answer, −1 for a wrong answer, and 0 for “don’t know.” Therefore, the total score ranged from −18 to +18 [
]. Correct answers were determined by the author based on the literature.
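As a minimal sketch of this scoring rule, the following Python function computes the total; the 3-item excerpt in the example uses hypothetical answers and a hypothetical key, for illustration only.

```python
# Sketch of the +1 / -1 / 0 scoring rule for the 18-item knowledge test.
def knowledge_score(answers, key):
    """answers and key are equal-length lists; "don't know" scores 0."""
    score = 0
    for given, correct in zip(answers, key):
        if given == "don't know":
            continue            # 0 points: no penalty for declining to guess
        score += 1 if given == correct else -1
    return score

# Hypothetical 3-item excerpt: one correct, one "don't know", one wrong -> 0.
print(knowledge_score(["false", "don't know", "A"], ["false", "B", "C"]))
# Over all 18 items, the total therefore ranges from -18 to +18.
```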
ChatGPT Inputs and Outputs
GPT-3.5 was prompted to undertake the validated UCLA geriatrics attitude instrument and geriatrics knowledge test. Prompts were derived from the original published papers [
, , ]. Each prompt was repeated 4 times to evaluate consistency and obtain an average response. In addition, ChatGPT was prompted to respond to 2 common geriatric syndrome vignettes (polypharmacy and falls). These 2 vignettes, with questions, were used in the author’s previously published geriatrics curricula [ , ] and are described fully in the following sections.
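Readers who wish to automate this repeated prompting could script it against the OpenAI API, as in the sketch below. This scripted variant is an assumption for illustration (the study used the GPT-3.5 ChatGPT interface), and gpt-3.5-turbo is assumed as the closest API model.

```python
# Hypothetical scripted version of the repeated-prompt procedure; the study
# itself used the ChatGPT (GPT-3.5) web interface, so this is an assumption.
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

def ask_repeatedly(prompt, n=4):
    """Submit the same prompt n times and collect the n responses."""
    replies = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed stand-in for the GPT-3.5 used here
            messages=[{"role": "user", "content": prompt}],
        )
        replies.append(response.choices[0].message.content)
    return replies

# The 4 collected replies can then be scored and averaged, as described above.
```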
The Polypharmacy Vignette
This vignette was taken from a previously published workshop on prescribing and deprescribing [
]:
Ms. Smith, an 85-year-old white woman, has HTN, CAD, HFpEF, A-Fib, DM (II), HLD, hypothyroidism, gout, OA. On Warfarin 2.5 mg/d, furosemide 20 mg/d, Glyburide 5 mg/d, Spironolactone 50 mg/d, Amiodarone 300mg/d, Carvedilol 6.25 mg twice a day, Lisinopril 40 mg/d, Lipitor 80 mg/d, Levothyroxine 250 mg/d, Allopurinol 100 mg three times a day, acetaminophen 500 mg two tabs q6h for joint pain, and several supplements. Her pulse is 62 beat/min. Her BP is 120/70 (sit) and 98/64 (upright), weight 60 kg. No JVD, HR regularly, no murmur, few crackles on lung bases. Abdomen exam is benign, + pitting edema of ankles. Serum Cr is 1.2 mg/dL and Bun is 15 mg/dL. INR is 3.5. LDL is 50. albumin 4, Hb1ac is 6.1. TSH is 2. EF is 60%.
ChatGPT was prompted with the polypharmacy vignette following 2 steps. First, ChatGPT was prompted to answer 4 general questions on the appropriate use of medications to examine its geriatric pharmacology knowledge: (1) “Which of the commonly used 8 drugs are potentially inappropriate medication (PIM) and should be avoided in older adults?” (2) “What is PIM in older adults?” (3) “How to identify PIM in older adults?” (4) “What to do with PIM?” Second, ChatGPT was prompted to answer 4 additional specific questions related to the polypharmacy vignette to evaluate its application of geriatric pharmacology to a clinical vignette: (1) “Any drug-drug interactions for this patient?” (2) “Any drug is ineffective for this patient?” (3) “This patient was taking so many medications. Any way to reduce the number of her medications?” (4) “Any drug-disease interaction?”
The Fall Vignette
This vignette was taken from the author’s previously published geriatrics curriculum [
]:
An 82-year-old community-dwelling woman with PMH of multiple falls, right hip fracture, Dx of hypertension, coronary heart disease, knee arthritis, came to my clinic after she fell after her meal at home this morning. She felt pain of R head. She had a cane and walker at home but didn’t use that often. She thought she didn’t need it. She has taken care of her husband for several years. Her medication included Doxazosin (Cardura) and diltiazem (Cardizem), and acetaminophen as needed. I asked many questions related to her fall. Her vital signs were normal without orthostatic hypotension. Her cardiovascular and neurological examinations were benign. Her mini-mental status examination (MMSE) was 30/30. She had independent activities of daily living (ADL). Her geriatric depression scale (GDS) was 0/15. She passed timed up and go test. Because of her right black eye and pain of right head, I sent her to UVA to rule out orbital fracture. Please ask the following questions:
List all potential fall risk factors for this patient
What are three fall screening questions you need to ask for elderly patients in an outpatient setting based on 2010 fall prevention guideline you want to follow for this patient?
Which one of three pathways in figure 1 from 2010 fall prevention guideline you want to follow for this patient?
List all physical examinations that you think are important to perform for this patient.
List your recommendations as many as you think that could help her to prevent her falling.
ChatGPT was prompted with the fall vignette following 3 steps. First, ChatGPT was prompted to answer 1 general question on fall risks in older adults. Second, ChatGPT was prompted to answer 5 pretest questions on fall prevention in older adults: (1) “When an older person falls, should you ask the following questions (one or more or none)? the circumstances of fall, mental status, medication changes, vision change, drinking”; (2) “Which of the following illnesses is a fall risk (one or more or none)? Pneumonia, Parkinson’s disease, Diabetes mellitus, Dementia, Orthostatic hypotension”; (3) “Which of the following medications are not associated with an increased fall risk (one or more)? Levothyroxine, Diazepam, Oxycodone, Amitriptyline”; (4) “Which ones of the following are true (one or more or none)? An elderly person may become so fearful of falling that they restrict his or her mobility Most falls are associated with significant injuries Falls is a leading cause of accidental death in older population About 5% community-dwelling elderly person falls each year Survivors of fall-related hip fracture are rarely institutionalized Exercise improves function and reduces fall risk and injurious falls”; and (5) “Which of the following is NOT an environmental fall risk? (only pick one answer) Throw rug, freshly waxed kitchen floor, Grab bars, Electrical cord lying on the floor.” Finally, ChatGPT was prompted to answer 5 additional questions to examine the application of fall prevention knowledge to the fall vignette: (1) “List all potential fall risk factors for this patient”; (2) “What are three fall screening questions you need to ask for elderly patients in an outpatient setting based on 2010 fall prevention guideline you want to follow for this patient?”; (3) “Which one of three pathways in figure 1 from 2010 fall prevention guideline you want to follow for this patient?”; (4) “List all physical examinations that you think are important to perform for this patient”; and (5) “List your recommendations as many as you think that could help her to prevent her falling.”
Evaluation of ChatGPT Outputs
ChatGPT outputs were collected and analyzed by the author. The author judged the correctness of the ChatGPT outputs on the polypharmacy and fall vignette questions based on evidence from clinical practice guidelines, systematic reviews, and other types of evidence in the literature. The descriptive analysis was performed using SPSS (version 28; IBM Corp).
In summary, we designed these 3 distinct approaches to provide evidence on whether we can trust ChatGPT to be applied to geriatrics education and clinical practice as an assistive tool.
Ethical Considerations
This study did not involve human participants, including in the secondary data analysis; therefore, no informed consent was needed.
Results
Geriatrics Attitude
The total geriatrics attitude score of ChatGPT was significantly lower than that of trainees (medical students, internal medicine residents, and geriatric medicine fellows) from validation [
, ] and follow-up studies [ ] ( ). However, the mean subscore on positive geriatrics attitude of ChatGPT was higher than that of the trainees (medical students, internal medicine residents, and neurologists; ), where a higher score is better. The higher subscore on positive geriatrics attitude indicates a better attitude toward aging. Conversely, the mean subscore on negative geriatrics attitude of ChatGPT was lower than that of the trainees and neurologists, where lower subscores are better ( ). The lower subscore on negative geriatrics attitude indicates a less age-biased attitude toward aging. Individual responses to 14 geriatrics attitude statements by ChatGPT, trainees, and neurologists are shown in . The subscores of the comparison group in were based on previously published studies [ - ]. Notably, responses to statements 15 and 16 were rarely reported in previously published studies ( ).
Of the 14 geriatrics attitude statements (
), 5 (36%) were positive, and 9 (64%) were negative. Regarding the positive geriatrics attitude statements, ChatGPT had 1 lower positive geriatrics attitude response (indicating a less positive geriatrics attitude) and 5 similar or better positive geriatrics attitude responses compared to medical students and residents (indicating a similar or better positive geriatrics attitude). Regarding the negative geriatrics attitude statements, ChatGPT had 1 similar negative geriatrics attitude response and 8 better negative geriatrics attitude responses than medical students and residents, indicating a less negative geriatrics attitude.
The response from ChatGPT to the first prompt of the geriatrics attitude questions did not follow the Likert-scale format (1-5; 1=strongly disagree; 5=strongly agree;
and ) as ChatGPT provided only comments. The response to the third prompt included both the Likert-scale format (1-5; 1=strongly disagree; 5=strongly agree) and comments ( ).
Geriatrics Knowledge
The total score of ChatGPT (mean 14.25) was significantly higher than the scores of all trainees (medical students, residents, and geriatric medicine fellows) in the validation studies [
] (mean 11.3; ) and was also significantly higher than that of first-year (mean 9.9) and second-year (mean 9.5) medical students ( ) in the follow-up studies [ ]. It was slightly higher than the scores of third-year medical students (mean 13.6) and internal medicine interns (mean 13.2), slightly lower than the scores of second- and third-year internal medicine residents (mean 14.7 and 14.9, respectively), and significantly lower than the scores of geriatric medicine fellows (mean 17.5) in the follow-up studies [ ]. ChatGPT’s performance on geriatrics knowledge in the first prompt was lower than that on the following 3 repeat prompts. For the first prompt, ChatGPT provided rationales without selecting options ( ). For the third prompt, ChatGPT selected options and provided rationales for its choices ( ). For the second and fourth prompts, ChatGPT selected options without providing rationales ( ).
To make it easy for the reader to replicate the results of this study,
shows a few examples of prompts on geriatrics attitude and knowledge test questions and outputs by ChatGPT (all original outputs of ChatGPT are available upon request).
The Polypharmacy Vignette
Overall, ChatGPT performed well on the polypharmacy vignette involving a woman aged 85 years (
). ChatGPT provided appropriate responses to the 4 general drug therapy questions and moderate responses to the 4 specific drug therapy questions based on the vignette. ChatGPT correctly identified 5 drug-drug interactions and suggested deprescribing. However, it missed an ineffective medication (a supplement), 1 drug-drug interaction, and 3 drug-disease interactions ( ). Despite the patient not having lung disease, ChatGPT provided 2 irrelevant drug-disease interactions specific to the vignette (ChatGPT hallucination; ).
The Fall Vignette
ChatGPT performed well on the fall vignette involving a woman aged 82 years (
). ChatGPT correctly summarized 10 common fall risks in older adults and correctly answered 3 out of 5 pretest questions. In the remaining 2 pretest questions, ChatGPT missed a few correct responses. In the fall vignette, ChatGPT provided appropriate responses to all 4 prompts ( ). ChatGPT recognized most fall risk factors and responded perfectly to fall screening questions following the latest Centers for Disease Control and Prevention (CDC) fall guidelines even though the prompt specified the 2010 fall prevention guidelines. ChatGPT accurately followed the 3 pathways from the 2010 fall guidelines. It missed a few physical examinations. ChatGPT provided almost all recommended fall prevention strategies, with only a few omissions.
Discussion
Principal Findings
With the increasing use of ChatGPT in medical education [
- ] and clinical practice [ - ], the aim of this study was to evaluate whether we can trust ChatGPT to be used in geriatrics. The major findings of this study demonstrated that ChatGPT had less age-biased output than the trainees, outperformed the trainees on the validated UCLA geriatrics knowledge test, and reasonably applied geriatrics knowledge to 2 common geriatric syndrome vignettes (polypharmacy and falls). The preliminary findings of this study are promising and demonstrate that ChatGPT could be trusted and used as an assistant tool in geriatrics practice and education via 3 approaches, which are fully discussed in the following paragraphs.
The first approach was to evaluate ChatGPT’s performance on a validated UCLA geriatrics attitude instrument [
, ]. To the best of our knowledge, this is the first study to demonstrate that ChatGPT is less likely to generate age-biased outputs, which contrasts with a few ethical and other concerns regarding ChatGPT [ - ]. This finding is significant because ageism is prevalent and associated with multiple poor outcomes, including health outcomes [ , ]. Various instruments exist to measure geriatrics attitude to assess ageism [ , ], with the UCLA geriatrics attitude instrument [ , ] being one of these validated tools [ ], which we used in this study [ , ]. The UCLA geriatrics attitude instrument includes 16 items [ , ], 6 of which refer to a positive geriatrics attitude (eg, “most old people are pleasant to be with”) where a higher score is better, and 10 of which refer to a negative geriatrics attitude where a lower score is better. Previous studies using the UCLA geriatrics attitude instrument have primarily used the total score on the geriatrics attitude scales without calculating and reporting the mean subscores on positive and negative geriatrics attitudes [ , ], which makes it difficult to interpret the significance of geriatrics attitude. However, a few studies have reported individual responses to the UCLA geriatrics attitude instrument, including positive and negative geriatrics attitude [ , ], for which we were able to calculate the subscores for comparison. We recommend that the subscores of the UCLA geriatrics attitude instrument should be reported to assess positive and negative geriatrics attitude. Overall, ChatGPT’s individual responses to the UCLA geriatrics attitude statements were better than those of medical students and residents, with higher scores for positive attitude and lower scores for negative attitude ( ), suggesting that ChatGPT has less age-biased outputs and could be trusted from an ethical perspective. This study suggests that ChatGPT is better than the news media and humans in providing less age-biased information [ , ]. However, a reliable and valid instrument with which to quantify modern medical student attitudes toward older people has not yet been developed. An adaptation of the Aging Semantic Differential scale for contemporary use has been recommended [ ]. This study can be repeated using the Aging Semantic Differential scale to see whether ChatGPT still generates less age-biased outputs in the future.
The second approach was to evaluate ChatGPT’s geriatrics knowledge. This study is the first to assess ChatGPT’s performance on a validated UCLA geriatrics knowledge test [
, ]. ChatGPT has a substantial knowledge base and has been reported to pass the USMLE [ , ] and perform well on many other examinations [ - ]. Previous studies have suggested that ChatGPT could significantly impact medical education [ - ] and clinical practice, including clinical decision-making [ - ]. For example, a recent systematic review and meta-analysis on the performance of ChatGPT in medical examinations showed that ChatGPT has been evaluated in multiple specialties, including plastic surgery, ophthalmology, neurosurgery, orthopedics, diabetes, gastroenterology, radiology, cardiology, dermatology, and anesthesia [ ]. It can be reasonably believed that ChatGPT has good geriatrics knowledge. This study supported this belief and demonstrated that ChatGPT performed better than the average trainee in the validation studies (mean score 14.7 vs 11.3) and follow-up studies (mean score 14 vs 13.3; ). ChatGPT’s performance was better than that of medical students and first-year internal medicine residents (PGY 1 in ), comparable to that of second- and third-year residents (PGY 2 and PGY 3 in ), but lower than that of geriatric medicine fellows. This suggests that ChatGPT could be a valuable supplemental resource for health professional trainees, health care providers, and laypeople. Given that GPT-4o has more parameters, it is reasonable to believe that GPT-4o could perform better on the geriatrics knowledge test. The UCLA geriatrics knowledge test has only 18 questions [ ], far fewer than the USMLE question sets tested in previous studies [ , ]. Future studies should evaluate GPT-4o’s performance on more multiple-choice questions from geriatrics board examinations to confirm the findings of this study before adopting ChatGPT for geriatrics education and clinical practice.
The third approach was to evaluate ChatGPT’s responses to 2 common geriatric syndrome vignettes—polypharmacy and falls in older adults—from the author’s previously published geriatrics curricula [
, ]. This study demonstrated that ChatGPT performed well on these 2 geriatric syndrome vignettes ( and ), which was promising and is fully discussed in the following paragraphs.
For the polypharmacy vignette, ChatGPT was able to provide systematic recommendations for polypharmacy in older adults, which has not been investigated before. In this study, ChatGPT provided appropriate general principles for prescribing and deprescribing; identified PIMs; and suggested resources and criteria such as the Beers criteria [
], the Screening Tool of Older Persons’ Prescriptions [ ], the Drug Burden Index [ ], and the Medication Appropriateness Index [ ] ( ). In addition, ChatGPT was able to provide 5 appropriate ways to manage PIMs in older adults ( ) that are consistent with the evidence from geriatrics practice guidelines and prescribing principles [ , - ]. In particular, this study demonstrated that ChatGPT can identify all 8 exemplary PIMs ( ). These medications should be avoided in older adults [ , - ]. ChatGPT was able to identify 4 drug-disease interactions but missed 3 drug-disease interactions in the polypharmacy vignette ( ). This suggests room for improvement in training ChatGPT further. ChatGPT also identified drug-drug interactions accurately in the polypharmacy vignette ( ), which differed from a previous study in which ChatGPT identified 39 out of 40 drug-drug interactions when it was given 40 pairs of drugs [ ]. The question, then, is what ChatGPT’s capabilities imply for polypharmacy management. Previous studies within the last 10 years have shown that medical students, postgraduate residents, primary care providers, and pharmacists were unaware of PIMs and the standard guidelines for older adults, such as the Beers criteria [ - ]. This study has shown that ChatGPT is more knowledgeable about PIMs than these groups. Geriatricians and general internists still prescribed PIMs at rates of 7.2% and 8.7%, respectively [ ].
The persistent prevalence of PIMs and polypharmacy in older adults was well-reported [
, ]. ChatGPT’s demonstrated knowledge of and ability to manage polypharmacy in this study suggest that it has great potential to be used by trainees and providers as an assistant tool. One suggested example would be to include an older patient’s medical history and a list of all their medications in the ChatGPT prompt used in this study for medication review and to generate outputs that providers can review in clinics or other clinical settings. This would help nurses, other providers, and trainees at the point of care. It could help laypeople self-check PIMs. It could also help health professional trainees in the self-study of geriatric pharmacology.
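As a concrete illustration of this suggested workflow, a provider could assemble such a medication review prompt as in the following sketch; the wording, field names, and example entries are hypothetical, not the exact prompts used in this study.

```python
# Hypothetical prompt assembly for a provider-reviewed medication check;
# the wording and fields are illustrative, not the study's exact prompt.
def medication_review_prompt(history, medications):
    med_list = "\n".join(f"- {m}" for m in medications)
    return (
        "You are assisting with a geriatric medication review.\n"
        f"Patient history: {history}\n"
        f"Current medications:\n{med_list}\n"
        "Identify any potentially inappropriate medications (PIMs), "
        "drug-drug interactions, and drug-disease interactions, and suggest "
        "deprescribing options. A clinician will verify your output."
    )

# Example with a fictional patient:
print(medication_review_prompt(
    "85-year-old woman with HFpEF, atrial fibrillation, and gout",
    ["warfarin 2.5 mg/d", "amiodarone 300 mg/d", "allopurinol 100 mg tid"],
))
```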
Falling is another significant US and global problem [ , ] and one of the common geriatric syndromes in older adults [ ]. ChatGPT has not previously been tested on providing systematic fall prevention recommendations for an older adult. This study demonstrated for the first time that ChatGPT has good knowledge of fall prevention as it performed well on pretests and provided a comprehensive summary of 10 common fall risks in older adults and specific recommendations in the fall vignette ( ).
The 2024 US Preventive Services Task Force fall prevention guidelines listed several fall risks, including age; history of falls; cognitive and sensory deficits; presence of acute and chronic medical conditions; certain medications; environmental or occupational hazards; home or neighborhood features; alcohol or drug use; and impairments in mobility, gait, and balance [
]. The 2023 CDC Stopping Elderly Accidents, Deaths, and Injuries initiative also has a long list of fall risks [ ]. In addition, a systematic review and meta-analysis showed evidence-based fall risk factors among the aging population [ ]. This study showed that ChatGPT identified most of the fall risks in the fall vignette ( ), consistent with the aforementioned fall prevention guidelines [ , ] and systematic review [ ], except for missing risky behaviors such as standing on a chair instead of a step stool and loss of sensation in the feet.
This study also used 5 pretests to further evaluate ChatGPT’s knowledge of fall prevention (
). ChatGPT generated responses consistent with the guidelines [ , ] and a systematic review [ ]. In pretest 1, ChatGPT provided 5 good recommendations with reasoning for taking the medical history of an older adult. In pretest 2, ChatGPT correctly identified several medical conditions as fall risks, including Parkinson disease, dementia, and orthostatic hypotension, but missed pneumonia and diabetes mellitus. This indicates that ChatGPT has some limitations. In pretest 3, ChatGPT correctly identified diazepam, oxycodone, and amitriptyline as fall risks [ , - ] and levothyroxine as not a fall risk. In pretest 4, ChatGPT underestimated the prevalence of falls in community-dwelling older adults (5%) but correctly answered most other true-or-false questions about fall risks and consequences. In pretest 5, ChatGPT correctly answered all questions about environmental fall risks and identified grab bars as not a risk. Overall, ChatGPT performed very well on the 5 pretests with a few mistakes, indicating good knowledge of fall prevention for older adults.
To further test ChatGPT’s clinical application to individual patients, the fall vignette was used to prompt ChatGPT. ChatGPT correctly identified 6 common fall risk factors except for potential home environment factors, consistent with the evidence from fall prevention guidelines [
, ] and a systematic review [ ]. ChatGPT impressively recognized older caregiver burden, hypertension, and coronary heart disease as fall risks, also consistent with the evidence from fall prevention guidelines [ , ] and a systematic review [ ]. In response to the fall screening questions, ChatGPT was correct if the 2023 CDC guidelines were used [ ] but incorrect if the 2010 American Geriatrics Society guidelines were used [ ]. This indicates that ChatGPT can use the latest guidelines, although specific prompts might be unclear to it. Regarding physical examinations relevant to the fall vignette, ChatGPT missed several but recommended multiple appropriate physical examinations [ , ], indicating reasonably good performance. In the final prompt on fall prevention recommendations for the fall vignette, ChatGPT provided 6 appropriate fall prevention recommendations, including home environment–related prevention despite previously missing the home environment as a fall risk [ - ]. It was notable that the home environment was not on the list of fall risks in ChatGPT’s response to the first prompt. However, ChatGPT correctly answered pretest 5 (home environment fall risk) and recommended home environment–related prevention for the patient in the fall vignette ( ). This suggests some discordance between ChatGPT’s fall assessment and its action plans, indicating a possibly different thinking and decision process within ChatGPT.
The crucial question is whether ChatGPT can be helpful and necessary in screening for fall risks and providing fall prevention recommendations to older adults in daily practice. Providers need knowledge and willingness to screen for fall risks in older adults, but multiple studies have shown that providers often lack fall prevention knowledge and are unwilling to screen for fall risks [
- ]. For example, one study showed that emergency providers lacked knowledge of which patients should be screened and were unwilling to spend more than a few minutes on screening for fall intervention [ ]. Another study revealed that only 14% of providers at accountable care organizations were aware of the CDC’s fall risk assessment algorithm (Stopping Elderly Accidents, Deaths, and Injuries) [ ]. Furthermore, 43% of primary care providers did not agree that they had the expertise or time to perform fall risk assessments, and only a small percentage billed for fall risk screening despite being aware of Medicare reimbursement [ ]. In addition, general practitioners were often unaware of their frail patients’ fall history or fear of falling, with most patients not receiving fall prevention care. Less than 40% of providers asked most or all of their older patients if they had fallen in the previous 12 months, and less than a quarter referred their patients to physical therapists for balance or gait training [ ]. A recent CDC report showed some improvement, but gaps remain: only 20% of providers were aware of any injury prevention resources, and a higher percentage screened for fall risk when older adults presented with specific concerns [ ].
Given the high prevalence of falls among older adults and poor knowledge of fall prevention among providers, this study demonstrated that ChatGPT has better knowledge of and ability to apply the fall prevention guidelines than many providers, suggesting that it can assist providers in assessing fall risks and preventing falls in older adults. For instance, entering an older patient’s history into ChatGPT and reviewing the generated outputs can help providers identify patients at fall risk at the point of care. ChatGPT can also be used as an assistant tool to help health professional trainees with self-study.
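As an illustration of this point-of-care use, the following sketch assembles a fall-risk screening prompt that requests a structured reply so that a provider can review each flagged item; the wording and JSON fields are illustrative assumptions, not a validated screening instrument.

```python
# Hypothetical point-of-care fall-risk screening prompt that requests a
# structured reply; the wording and fields are illustrative assumptions.
def fall_screen_prompt(history):
    return (
        "Based on current CDC fall prevention guidance (STEADI), review this "
        f"older adult's history:\n{history}\n"
        'Respond in JSON with the keys "fall_risk_factors", '
        '"screening_questions", and "recommendations". '
        "A clinician will verify your output before any action is taken."
    )

# The reply could then be parsed with json.loads() so that each flagged risk
# factor can be checked against the patient's chart by the provider.
print(fall_screen_prompt("82-year-old woman with prior falls and hip fracture"))
```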
Limitations and Quality Assessment of This Study
There are several limitations to this study, which are discussed in this section. ChatGPT’s responses are based on the data that it was trained on, which may not include the most current geriatrics research, best practices, or region-specific guidelines, potentially limiting the trustworthiness of its advice. This study did not account for the quality of the information sources that ChatGPT draws from because, as in all other ChatGPT-based research, ChatGPT cannot differentiate between authoritative and nonauthoritative content in real time. Although GPT-4o was trained on more data, its data sources have not been released, and more data alone will not solve this basic problem. This study indicates that ChatGPT should be applied to geriatrics practice and education cautiously and used as an assistant tool only; it is unlikely to ever replace clinicians. As this study evaluated ChatGPT based on geriatric syndrome vignettes, it does not fully account for the complexities and nuances of real-world clinical decision-making and patient interactions, where additional context, history, and human judgment play critical roles. In addition, ChatGPT’s responses might be overly generalized as it lacks the ability to perform individualized assessments or make nuanced clinical decisions based on patient-specific data. This is similar to vignette-based simulation education and indicates that ChatGPT will not make a decision for someone but could assist with decision-making. The effectiveness of ChatGPT’s responses varies significantly depending on how well the user phrases the questions or provides the necessary information. The input provided in this study was based on the author’s >20 years of geriatrics practice experience, which might not mimic the way in which other clinicians would interact with the tool. Therefore, the results of this study may not be generalizable. Prompt engineering is developing as a discipline to help improve the quality of inputs. However, standardizing prompts among different providers can be very challenging. The author suggests practicing prompts to improve human-ChatGPT interaction and obtain reliable outputs from ChatGPT.
Although ChatGPT can process vast amounts of medical information, it lacks clinical experience and the ability to exercise human clinical judgment. This may limit its application in critical decision-making scenarios or nuanced treatment planning for complex geriatrics cases. Providers must use their clinical judgment, integrated with the clinical circumstances, and not depend fully on the outputs of ChatGPT.
The use of ChatGPT might also carry ethical and legal consequences; ChatGPT or AI hallucinations in particular could create ethical and legal risks. Since the release of GPT-4o, ChatGPT often states that it is an AI, not a clinician, and cannot make decisions for the user. It seems that ChatGPT’s developers recognize the potential legal and ethical consequences and have built in such safeguards.
Human-AI interaction is complex. Providers should be cautious in interpreting ChatGPT outputs, which should be interpreted alongside the evidence-based clinical guidelines, systematic reviews, and randomized controlled trials shown in the Comments from the author columns in and . Whether the ChatGPT outputs in this study are appropriate should be judged and verified by a group of geriatricians.
In addition, this study has several other limitations. This study used GPT-3.5; GPT-4o might produce better results. Comparisons between ChatGPT and human performance on geriatrics attitude and knowledge were based on results reported in the literature many years ago. A new study assessing geriatrics attitude and knowledge among current medical students, residents, and geriatric medicine fellows is needed to confirm the findings of this study. ChatGPT produced some errors, including hallucinations such as the unrelated drug-disease interactions in
, consistent with previous reports [ , ]. We should be aware that ChatGPT can make mistakes, just like humans. ChatGPT should not be used alone in geriatrics practice and education at this point and needs to be pretrained specifically for medical education and clinical practice.
Like any other research, current ChatGPT research needs systematic appraisal. A preliminary checklist, Model, Evaluation, Timing, Range and Randomization, Individual Factors, Counts, and Specificity of Prompts and Language (METRICS), to standardize the design and reporting of studies on generative AI–based models in health care education and practice has recently been developed [
]. The current study was started and completed before METRICS was developed. However, it is worthwhile to use METRICS to evaluate the quality of this study. The following 7 domains of METRICS were used:
- Model—ChatGPT was described and used.
- Evaluation—both subjective and objective evaluations were described and used.
- Timing—the study date was documented, but the duration was not; transparency—the geriatrics attitude instrument and knowledge test, the vignettes, and the prompts were well described, and the outputs were presented exactly as generated.
- Range—the topics were well described, including geriatrics attitude and knowledge; randomization was not used.
- Individual—there was subjective involvement with ChatGPT, such as judging the correctness of its outputs. This study was conducted by a single investigator.
- Count—a total of 4 prompts were used. This study did not have a sample size.
- Specificity of the prompts or language—for geriatrics attitude and knowledge, the prompts were from previously published studies. It is unclear whether they are appropriate for ChatGPT. For the 2 vignettes on polypharmacy and falls, the prompts were from previously published curricula. It is also unclear whether they are appropriate for ChatGPT.
Overall, the quality of this study is reasonably good based on METRICS. METRICS should be used in the reporting of any AI-based research in all journals.
In summary, this study took 3 distinct approaches to demonstrate the trustworthiness of ChatGPT to be used as an assistant tool in geriatrics practice and education. The implications of this preliminary study for geriatrics practice and education are further discussed in the following section.
Implications for Geriatrics Practice and Education
This study showed that ChatGPT outputs are not age-biased based on the validated geriatrics attitude test and that ChatGPT performed well on the validated geriatrics knowledge test and on applied geriatrics knowledge tests (2 cases of common geriatric syndromes). These findings suggest that ChatGPT could potentially help geriatrics practice and education as an assistant tool in numerous ways. It is very important to know that ChatGPT is algorithm based, and clinical practice often uses algorithms and pathways. Therefore, the underlying rationale for ChatGPT and clinical practice is similar. However, ChatGPT is not intended to replace physicians and other providers.
ChatGPT can be used as an assistant geriatrics tool and tutor to support self-learning with immediate feedback and self-assessment. For example, when trainees are studying geriatric syndromes, they can obtain responses on geriatric syndromes from ChatGPT. Therefore, ChatGPT could be incorporated to supplement existing geriatrics education programs and support continuing medical education. ChatGPT can potentially be integrated into clinical reasoning and decision-making in a timely fashion and improve point-of-care practice in all clinical settings. For example, ChatGPT can help identify individual older patients who might have fall risks or drug-drug interactions.
ChatGPT can be used as an assistant or autonomous clinical tool to alleviate the geriatrics workforce crisis. For example, a nongeriatrician health provider could look for answers to clinical geriatrics questions, such as recommendations for preventing falls and reducing polypharmacy in older patients, from ChatGPT and help answer their patients’ questions in a timely fashion. This is particularly helpful for rural practice where geriatrics consults are rare or nonexistent. This will improve physicians’ and other providers’ efficiency and accuracy in the care of older patients and provider-patient interactions.
It is expected that ChatGPT can help with post-visit summaries by providing patient education materials, such as home safety and environment modification for fall prevention. ChatGPT can also draft clinic letters, such as letters providing laboratory results and their interpretation to the patient.
ChatGPT can help older patients with self-screening or initial self-assessment while they are sitting in a clinic waiting room. For example, they can input their medical history, such as age, physical function, medications, and chronic conditions, into ChatGPT and ask whether they are at higher risk of falls.
Importantly, GPT-3.5 is free to use and can be cost-saving for providers and the health care system.
Finally, with more data training, ChatGPT will be more reliable and helpful in geriatrics practice and education.
Comparison With Previous Work
To the best of our knowledge, health professional trainees and providers, but not ChatGPT, have been assessed for geriatrics attitude [
, ]. ChatGPT has been assessed for geriatrics knowledge [ - ]. For example, ChatGPT was asked to answer 10 geriatrics questions [ ]. The correctness of ChatGPT’s outputs was rated by a group of geriatricians [ ]. In contrast, in this study, the correctness of the outputs was determined by the author only, which could lead to errors. In another study, ChatGPT was prompted with 1 simple question [ ]. The questions in the aforementioned studies were not validated, whereas this study used the validated UCLA geriatrics knowledge test [ , - ]. Several previous studies have demonstrated that ChatGPT has geriatric pharmacology knowledge [ , - ]. For example, ChatGPT could identify drug-drug interactions from 40 drug-drug pairs [ ]. In another study, ChatGPT could manage polypharmacy and make decisions on deprescribing 2 to 3 inappropriate medications based on a vignette [ ]. Another study compared the answers from ChatGPT and clinical pharmacists to real clinical cases and clinical competency assessments [ ]. However, these studies differ from this study, which used a systematic approach to polypharmacy ( ). To the best of our knowledge, ChatGPT has not previously been applied to falls in older adults.
Future Directions
This study was a pilot with both promising findings and limitations. With the fast-growing use of ChatGPT and the new version of GPT-4o being released, the author suggests the following research directions.
A larger study using ChatGPT is needed to extend and confirm the findings of this study before adopting ChatGPT for geriatrics education and clinical practice. For example, more geriatrics knowledge questions and vignettes should be examined using GPT-4o. In addition, the performance of ChatGPT on geriatrics certification examinations should be tested.
Coinvestigators are needed to reduce biases. In particular, the correctness of ChatGPT outputs should be evaluated and judged by a group of experts in geriatrics and integrated with the latest evidence, such as clinical practice guidelines, systematic reviews, and randomized controlled trials. For example, the ChatGPT outputs to a prompt should be rated from strongly agree to strongly disagree by a group of geriatrics experts in addition to determining their consistency with evidence-based clinical practice guidelines.
Those who conduct ChatGPT research should receive training in prompt engineering. The author is undertaking such training and finds it beneficial. This could reduce variation in prompts.
The reliability of ChatGPT performance needs to be tested using different prompts to improve generalizability. The outcomes of the application of ChatGPT to geriatrics practice and education need to be further investigated.
Conclusions
This study suggests that ChatGPT could be a valuable assistant tool in geriatrics education and practice, helping health professional trainees and providers combat ageism, supplement geriatrics knowledge, and address common geriatric syndromes such as polypharmacy and falls. Using 3 distinct approaches, this study demonstrated that we could trust ChatGPT to be used in geriatrics practice and education. One strength of this study is that the multimedia appendices provide the details of the prompts to ChatGPT and the outputs it generated, which allows readers to apply a similar approach in their own geriatrics education and practice studies. The findings of this study are promising but need more investigation before ChatGPT is widely adopted in geriatrics practice and education.
Data Availability
All data generated or analyzed during this study, including all ChatGPT prompts and outputs, are included in this published article and its supplementary information files on the journal website.
Conflicts of Interest
None declared.
Geriatrics attitudes tests.
DOCX File, 23 KB
University of California, Los Angeles, geriatrics knowledge test (questions 1-18).
DOCX File, 25 KB
Comparison of geriatrics attitude and knowledge test performance between ChatGPT and trainees.
DOCX File, 25 KB
Comparison of performance on the geriatrics attitude subscales by ChatGPT, trainees, and neurologists.
DOCX File, 17 KB
Examples of ChatGPT outputs on geriatrics attitude and knowledge questions.
DOCX File, 15 KB
ChatGPT output on the polypharmacy vignette about a woman aged 85 years.
DOCX File, 17 KB
Fall risk identification and management in a woman aged 82 years.
DOCX File, 20 KB
References
- Number of ChatGPT users (Dec 2024). Exploding Topics. Dec 3, 2024. URL: https://explodingtopics.com/blog/chatgpt-users [accessed 2024-10-26]
- PubMed homepage. PubMed. URL: https://pubmed.ncbi.nlm.nih.gov/ [accessed 2024-05-15]
- Boscardin CK, Gin B, Golde PB, Hauer KE. ChatGPT and generative artificial intelligence for medical education: potential impact and opportunity. Acad Med. Aug 31, 2023;99(1):22-27. [CrossRef]
- Preiksaitis C, Rose C. Opportunities, challenges, and future directions of generative artificial intelligence in medical education: scoping review. JMIR Med Educ. Oct 20, 2023;9:e48785. [FREE Full text] [CrossRef] [Medline]
- Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). Mar 19, 2023;11(6):887. [FREE Full text] [CrossRef] [Medline]
- Thirunavukarasu AJ, Ting DS, Elangovan K, Gutierrez L, Tan TF, Ting DS. Large language models in medicine. Nat Med. Aug 17, 2023;29(8):1930-1940. [CrossRef] [Medline]
Abbreviations
AI: artificial intelligence
CDC: Centers for Disease Control and Prevention
METRICS: Model, Evaluation, Timing, Range and Randomization, Individual Factors, Counts, and Specificity of Prompts and Language
PIM: potentially inappropriate medication
UCLA: University of California, Los Angeles
USMLE: United States Medical Licensing Examination |
Edited by A Mavragani; submitted 30.06.24; peer-reviewed by J Mira, P Heyn; comments to author 17.10.24; revised version received 26.10.24; accepted 17.11.24; published 03.01.25.
Copyright © Huai Yong Cheng. Originally published in JMIR Formative Research (https://formative.jmir.org), 03.01.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.