This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
People with low health literacy experience more challenges in understanding instructions given by their health providers, following prescriptions, and navigating their health care system well enough to obtain its maximum benefits. People with insufficient health literacy are at higher risk of making medical mistakes, are more likely to experience adverse drug effects, and have poorer control of chronic diseases.
This study aims to design, develop, and evaluate a mobile health app, MediReader, to help individuals better understand complex medical materials and improve their health literacy.
MediReader was designed and implemented through the following steps: (1) measure and understand an individual’s health literacy level; (2) identify medical terminologies that the individual may not understand based on their health literacy; (3) annotate and interpret the identified medical terminologies tailored to the individual’s reading skill level, with meanings defined in appropriate external knowledge sources; and (4) evaluate MediReader using a task-based user study and satisfaction surveys.
On the basis of the comparison with a control group, user study results demonstrate that MediReader can improve users’ understanding of medical documents. This improvement is particularly significant for users with low health literacy levels. The satisfaction survey showed that users are satisfied with the tool in general.
MediReader provides an easy-to-use interface for users to read and understand medical documents. It can effectively identify medical terms that a user may not understand, and then, annotate and interpret them with appropriate meanings using languages that the user can understand. Experimental results demonstrate the feasibility of using this tool to improve an individual’s understanding of medical materials.
Effective communication in health care has an enormous impact on the health and safety of patients. Limited health literacy is one of the major obstacles to good health care results including health status, health outcomes, health care use, and health costs for patients [
However, according to the US Department of Health and Human Services, only 12% of adults in the United States have proficient health literacy, whereas more than one-third of adults have low health literacy levels, which make it difficult for them to deal with common health tasks such as following directions for how to use prescription medications [
Given the aforementioned gap between available health information and people’s ability to understand it well enough to make life-altering decisions, policy makers, administrators, educators, and health care professionals have proposed many policies and strategies to simplify medical information and improve health literacy. Beyond these efforts, there is an increasing need for tools that help people understand medical information. Such tools may enhance the patient-physician relationship and improve health care outcomes by reducing the incidence of morbidity, mortality, and misuse of health care [
This study was approved by the institutional review board of North Dakota State University (IRB0003857).
The goal of the system is to design a mobile app to remove people’s barriers to understanding difficult medical documents by annotating or interpreting medical terminologies with plain texts, which they can understand easily. The app, MediReader, is built based on comprehensive knowledge sources and artificial intelligence–based processing mechanisms. It annotates a medical document with external knowledge according to each user’s health literacy level.
The architecture of the system. UMLS: Unified Medical Language System.
We created a comprehensive medical knowledge base by integrating multiple publicly available knowledge sources, including Unified Medical Language System (UMLS) [
The objective of our health literacy measurement was to identify the degree to which individuals can understand health information and services. We studied many health literacy screening and measurement approaches, including the National Assessment of Adult Literacy [
We chose a subset of ComprehENotes’ questions to perform user evaluation, as a test with fewer questions can be administered more quickly than the full test. The subset of the questions should be sufficiently informative to identify different health literacy levels. We used the item response theory (IRT) [
ComprehENotes uses the IRT model that is widely used in education to calibrate and evaluate items in tests, questionnaires, and other instruments and to score participants on their abilities, attitudes, or other latent traits. Specifically, we applied the 3-parameter logistic model, in which the item characteristic curves are assumed to follow a logistic function with a nonzero lower asymptote:

Pij = ci + (1 − ci) / (1 + e^(−ai(θj − bi)))

where ai is the discrimination, bi the difficulty, and ci the lower asymptote (pseudoguessing) parameter of item i.
In the above equation, Pij is the probability that person j answers item i correctly, and θj is the ability level of individual j. In our project, θ represents the ability of an individual in the task of medical document comprehension. As individuals are assumed to be sampled from a population, their ability levels are assumed to have a random effect with a standard normal distribution. Therefore, a score of 0 is considered as average (ie, in the 50th percentile), scores >0 are considered as above average, and scores <0 are considered as below average.
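Under these assumptions, the 3-parameter logistic model can be computed directly. In the standard formulation, a is the item discrimination, b the item difficulty, and c the lower asymptote (pseudoguessing) parameter; the function name below is ours, not from the original system:

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability that a person with ability theta answers an item correctly
    under the 3-parameter logistic (3PL) IRT model.
    a: discrimination, b: difficulty, c: lower asymptote (guessing floor)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# An average reader (theta = 0) facing an item of average difficulty (b = 0)
# with discrimination a = 1 and a 20% guessing floor:
p = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
# c + (1 - c) * 0.5 = 0.2 + 0.8 * 0.5 = 0.6
```

Note the role of c: even a reader with very low ability still answers correctly with probability c, which models guessing on multiple-choice items.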
In this task, medical entities in a document, such as diseases, medical problems, drug names, tests, and examinations, will be identified. Existing research on biomedical named entity recognition can be classified into three types: rule-based [
MediReader uses the so-called BiLSTM-CNN-CRF architecture, which combines a bidirectional long short-term memory (BiLSTM) network, a convolutional neural network (CNN), and a conditional random field (CRF) layer.
Word embedding [
For each word, a character-level representation was computed by the CNN with character embeddings as inputs. The combined character-level and word-level encodings were then fed into a BiLSTM to model the context of each word. LSTMs [
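The CRF layer on top of the BiLSTM chooses the globally best tag sequence rather than scoring each token independently. A minimal Viterbi decoder for such a layer might look like this (a pure-Python sketch with toy scores; the real model learns its emission and transition scores during training):

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for one sentence.

    emissions: per-token dicts {tag: score}, eg, from the BiLSTM outputs.
    transitions: {(prev_tag, tag): score} as learned by the CRF layer.
    tags: the tag inventory (eg, BIO labels for medical entities).
    """
    # best[t] = score of the best partial path ending in tag t
    best = {t: emissions[0][t] for t in tags}
    backpointers = []
    for emit in emissions[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            # best previous tag to transition from
            prev = max(tags, key=lambda p: best[p] + transitions[(p, t)])
            new_best[t] = best[prev] + transitions[(prev, t)] + emit[t]
            pointers[t] = prev
        backpointers.append(pointers)
        best = new_best
    # trace the best final tag back to the start
    tag = max(tags, key=lambda t: best[t])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return list(reversed(path))
```

With BIO tags, a transition score that penalizes O followed by I keeps the decoder from emitting an entity continuation without an entity beginning, which per-token classification cannot guarantee.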
After the medical entities (referred to as
To link mentions to the right entities defined in a knowledge base, the system needs to generate a manageable candidate list containing possible entities that the mention may refer to. In our system, the knowledge base entries were retrieved from a subset of the UMLS concepts data set and extended using Wikidata [
Entities (concepts and aliases) in the knowledge base were encoded using term frequency-inverse document frequency (TF-IDF) scores of character n-grams (n=3 in our implementation) that appear more than a certain number of times in the knowledge base. A k-nearest neighbor search was then applied to generate candidate entities for linking a given mention.
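The candidate-generation step can be illustrated with a small, self-contained sketch. The class and method names are ours, the frequency cutoff on n-grams is omitted, and a production system would use an approximate nearest-neighbor index rather than the exhaustive cosine search shown here:

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class CandidateGenerator:
    """Toy TF-IDF / nearest-neighbor candidate generator over entity names."""

    def __init__(self, entities, n=3):
        self.n = n
        self.entities = entities
        docs = [Counter(char_ngrams(e, n)) for e in entities]
        df = Counter(g for d in docs for g in d)  # document frequency per n-gram
        self.idf = {g: math.log(len(docs) / df[g]) for g in df}
        self.vectors = [self._vectorize(d) for d in docs]

    def _vectorize(self, counts):
        """L2-normalized TF-IDF vector as a sparse dict."""
        v = {g: c * self.idf.get(g, 0.0) for g, c in counts.items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        return {g: x / norm for g, x in v.items()}

    def top_k(self, mention, k=3):
        """Entities most similar to the mention by cosine similarity."""
        q = self._vectorize(Counter(char_ngrams(mention, self.n)))
        scored = [(sum(q.get(g, 0.0) * x for g, x in v.items()), e)
                  for v, e in zip(self.vectors, self.entities)]
        return [e for _, e in sorted(scored, reverse=True)[:k]]
```

Character n-grams make the retrieval robust to misspellings and morphological variants, which is why a mention such as a misspelled drug or disease name can still surface the right knowledge-base entry as a candidate.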
Entity linking may encounter the problem of entity ambiguity; that is, 1 mention may be mapped to several candidate entries in the knowledge base. For example, the word
After mentions in the document and entities in the knowledge sources were linked, annotation was performed. Annotating all medicine-related mentions is unnecessary, as readers may already know many of them; moreover, a full annotation may cause discomfort to readers. Therefore, the system needs to determine which mentions should be annotated. MediReader proposes a personalized annotation scheme that annotates a mention based on an individual reader’s health literacy level, as discussed in the previous section. For readers with very low literacy levels, more mentions should be annotated, and the annotation should be easy to understand. For readers with high literacy levels, only complex medical terms should be annotated.
Medical term’s difficulty and readability assessment was approached as a classification problem. We used a feature set with many features commonly used for standard natural language processing, such as grammatical metrics, semantic metrics, and new composite metrics. We also added new features to the biomedical domain to make the classification specialized in this field. The feature set included the following items:
Syntactic categories; for example, nouns, adjectives, proper names, verbs, and abbreviations
Number of characters and syllables in the word
Prefixes and suffixes of the word
Number and percentage of consonants, vowels, and other characters (ie, hyphen, apostrophe, and commas)
Presence of words in WordNet
Word frequency in Google
Word frequency in UMLS
Word semantic categories in UMLS
Pretrained word embeddings using MedMentions
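As an illustration, a few of the surface features above can be computed with straightforward string processing. This is a simplified sketch: the function names and the vowel-group syllable heuristic are ours, and the WordNet and corpus-frequency lookups are stubbed out in place of the larger resources the real system uses:

```python
import re

VOWELS = set("aeiou")

def syllable_count(word):
    """Rough syllable estimate: count groups of consecutive vowel letters."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def term_features(word, wordnet_vocab=frozenset(), freq=lambda w: 0):
    """A subset of the surface features listed above, as a feature dict.
    wordnet_vocab and freq are hypothetical stand-ins for WordNet membership
    and Google/UMLS frequency lookups."""
    lower = word.lower()
    letters = [c for c in lower if c.isalpha()]
    n_vowels = sum(c in VOWELS for c in letters)
    return {
        "n_chars": len(word),
        "n_syllables": syllable_count(word),
        "prefix3": lower[:3],
        "suffix3": lower[-3:],
        "pct_vowels": n_vowels / len(letters) if letters else 0.0,
        "pct_consonants": 1 - n_vowels / len(letters) if letters else 0.0,
        "in_wordnet": lower in wordnet_vocab,
        "log_freq": freq(lower),
    }
```

Such a feature dict can be fed to any standard classifier (eg, logistic regression or gradient-boosted trees) to label each term as easy, medium, or difficult.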
To build our data set, we extracted medical concepts from the website of Medical Transcription Samples [
On the basis of a reader’s health literacy level, medical mentions were annotated. For readers with high health literacy levels, only difficult words were annotated. For readers with low health literacy levels, medium and difficult words were annotated. We did not annotate easy words such as
Each entity was annotated with its definition in the knowledge source. In addition, they were linked by taxonomic relations, such as
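The annotation-selection rule described above can be sketched as a small function (the function name is ours, and the example terms and difficulty labels are illustrative, not outputs of the actual classifier):

```python
def terms_to_annotate(term_difficulty, literacy_level):
    """Select which mentions to annotate, per the personalized scheme:
    high-literacy readers see annotations only for difficult terms;
    low-literacy readers also see annotations for medium terms.

    term_difficulty: {term: 'easy' | 'medium' | 'difficult'}
    literacy_level: 'low' or 'high'
    """
    wanted = {"difficult"} if literacy_level == "high" else {"medium", "difficult"}
    return {t for t, d in term_difficulty.items() if d in wanted}
```

Easy words are never selected, so common terms stay unannotated for every reader.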
Screenshot of an annotated document.
We implemented the MediReader prototype as a mobile app and conducted a set of evaluation tests with representative users to assess its technical viability and effectiveness. To conduct the test, we developed a test plan, recruited participants, and then analyzed and reported our findings. In our study, we used 2 types of quality metrics that combine to form the big construct we call
For the
We also performed a
Before conducting the test, we conducted a pilot study to verify our programming, database, and scoring. We expected that some participants might not read the assigned documents and questions and might choose answers at random. To eliminate such responses, we included qualifying questions in different sections. In each multiple-choice section, we added 1 question that could easily be answered correctly if the participant read it. Participants who did not answer these questions correctly were eliminated from the data set.
Owing to the difficulty of recruiting participants, we included as many participants as possible, requiring only that they be aged ≥18 years and know English. A total of 52 individuals participated in our test. Of the 52 individuals, 13 (25%) did not complete the test and 11 (21%) were disqualified based on our qualifying questions. The remaining 54% (28/52) of the participants completed the test successfully.
Demographic information of the participants (N=28).
Variable | Value, n (%)
Gender |
  Men | 13 (46)
  Women | 15 (54)
Age (years) |
  40-50 | 4 (14)
  30-39 | 7 (25)
  20-29 | 17 (61)
Education |
  Undergraduate | 19 (68)
  Postgraduate | 9 (32)
Health literacy level |
  Low | 10 (36)
  High | 18 (64)
We compared the average scores of the experimental and control groups across all questions. The experimental group significantly outperformed the control group, scoring 76% on average compared with 36%; that is, participants who were provided with medical documents annotated using our tool scored higher than those who were given documents without annotation.
In terms of the time spent on the reading test, the experimental group spent more time than the control group (29 minutes vs 24 minutes). From the participants’ comments, we learned that they spent time reading additional information about the annotated terms. We believe this explains why the experimental group spent more time on the test.
Comparison of the average scores of the experimental and control groups on different medical domains or diseases.
Disease and group | Score, mean (SD)
Disease 1 |
  Experimental | 70 (0.35)
  Control | 60 (0.32)
Disease 2 |
  Experimental | 74 (0.39)
  Control | 26 (0.29)
Disease 3 |
  Experimental | 87 (0.19)
  Control | 27 (0.16)
Our tool has a different impact on participants with different health literacy levels. The average score increased from 36% to 88% for participants with high health literacy levels. For participants with low health literacy, the score greatly increased from 17% to 85%.
We applied the Wilcoxon rank-sum test [
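For reference, a rank-sum test of this kind can be implemented from scratch as follows. This is a minimal sketch using a normal approximation with midranks for ties; the study’s actual statistical software is not specified, and in practice a library routine such as scipy.stats.ranksums would be used:

```python
import math
from itertools import chain

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test, normal approximation.
    Returns (z, p): the z statistic for the rank sum of x, and its p value."""
    pooled = sorted(chain(((v, 0) for v in x), ((v, 1) for v in y)))
    rank_of = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j for tied values
        for k in range(i, j):
            rank_of[k] = midrank
        i = j
    w = sum(r for r, (v, g) in zip(rank_of, pooled) if g == 0)  # rank sum of x
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return z, p
```

The test compares ranks rather than raw scores, so it does not assume the score distributions are normal, which suits small samples such as the 28 participants here.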
To identify participants’ overall satisfaction, they were asked to provide their satisfaction feedback regarding the use of the mobile app. The participant satisfaction analysis showed that, in general, the participants were satisfied with the mobile app. As shown in
Comparison of the average score and P value of the experimental and control groups by disease and health literacy level.
Disease, health literacy level, and group | Score, mean (SD) | P value
Disease 1 |  |
  High |  | .54
    Experimental | 71 (0.30) |
    Control | 62 (0.23) |
  Low |  | .20
    Experimental | 67 (0.42) |
    Control | 46 (0.25) |
Disease 2 |  |
  High |  | .02
    Experimental | 86 (0.26) |
    Control | 43 (0.25) |
  Low |  | .06
    Experimental | 61 (0.49) |
    Control | 12.5 (0.25) |
Disease 3 |  |
  High |  | .002
    Experimental | 88 (0.16) |
    Control | 36 (0.15) |
  Low |  | .002
    Experimental | 85 (0.23) |
    Control | 17 (0.11) |
Overall feedback regarding the use of the mobile app (N=28).
Survey question | Strongly agreed, n (%) | Agreed, n (%) | Disagreed, n (%) | Strongly disagreed, n (%) |
The application helped me understand medical documents better | 18 (64) | 6 (21) | 3 (11) | 1 (4) |
The application was easy to use | 18 (64) | 4 (14) | 4 (14) | 1 (4) |
I will recommend the application to others | 18 (64) | 6 (21) | 3 (11) | 1 (4) |
The application annotated appropriate medical terms | 12 (43) | 13 (46) | 2 (7) | 1 (4) |
People need to understand medical information to have the best chance of a good health outcome. However, understanding medical information is more difficult than most people realize, as it requires a certain degree of health literacy. To assist people in understanding medical documents, we designed, developed, and evaluated a mobile app, MediReader. MediReader uses external knowledge sources to annotate medical documents according to each user’s health literacy level. Algorithms based on machine learning and natural language processing were proposed and implemented to recognize medical entities, identify the complexity of medical terms, and link medical terms to external knowledge that can explain them. MediReader was evaluated through task-based user studies with a control group and a user satisfaction survey.
On the basis of the comparison with a control group, the test results demonstrate that MediReader can improve users’ understanding of medical documents. This improvement is particularly significant for users with low health literacy levels. The satisfaction survey shows that users are satisfied with the tool in general. The result also shows that some medical information is more difficult to understand than others, even with the help of MediReader. In summary, our study demonstrated that it is feasible and effective to implement an mHealth tool to help people better understand medical documents.
MediReader simplified medical documents for the general public and improved their understanding, whereas most existing annotation tools, such as MetaMap [
This study had some limitations. The qualitative evaluation was performed with a limited number of participants, most of whom were college students. Whether the results would hold for underrepresented racial or ethnic groups and for older adults remains an open question. More comprehensive user studies will be performed on a larger population to evaluate usability, user satisfaction, and health and quality-of-life outcomes.
Some medical information is still difficult to understand even after our tool’s annotation.
Through our test, we found that some medical terms are annotated with definitions that are themselves difficult to understand, especially when the definitions are retrieved from professional medical resources such as the UMLS vocabularies. We will work on exploiting more information sources (eg, Google Knowledge Graph) to enrich and simplify the annotations.
When a new document is loaded, there is a delay in providing the annotations to the users. We will continue to optimize our algorithms in natural language processing and machine learning to reduce the execution time. In addition, we plan to encode frequently used knowledge and store it in the storage memory of the device to further reduce the delay.
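One simple realization of the planned caching of frequently used knowledge, sketched with Python’s standard library (the lookup function is a hypothetical stub standing in for the on-device knowledge base):

```python
from functools import lru_cache

def _fetch_definition(term):
    """Hypothetical stub: in the real app, this would query the knowledge base."""
    return f"definition of {term}"

@lru_cache(maxsize=1024)
def definition(term):
    """Memoize frequently requested definitions so that repeated documents
    do not trigger repeated knowledge-base lookups."""
    return _fetch_definition(term.lower())
```

Because medical documents reuse a relatively small vocabulary of difficult terms, even a modest cache can absorb most lookups after the first few documents.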
Limited health literacy may restrict an individual’s participation in health contexts and activities. To help people improve their health literacy and understand medical documents better, in this study, we proposed and evaluated an mHealth app, MediReader. The app annotates medical documents with information that people can understand. Our experiments demonstrated that this tool can help users better comprehend the contents of medical documents. It is especially useful for people with low health literacy levels. From our test, we found that low health literacy does not necessarily correspond to general low literacy; individuals who may be extremely literate in their areas of expertise (eg, graduate students) may also have a problem in understanding medical terminology. Further research is needed to overcome the limitations of this study.
BiLSTM: bidirectional long short-term memory
CNN: convolutional neural network
CRF: conditional random field
EHR: electronic health record
IRT: item response theory
mHealth: mobile health
UMLS: Unified Medical Language System
The authors would like to thank the study participants. This study was supported by the National Science Foundation under the Division of Information and Intelligent Systems, with award number 1722913.
None declared.