This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
Pain description is fundamental to health care. The McGill Pain Questionnaire (MPQ) has been validated as a tool for the multidimensional measurement of pain; however, its use relies heavily on language proficiency. Although the MPQ has remained unchanged since its inception, the English language has evolved significantly. The advent of the internet and social media has generated a staggering amount of publicly available data, enabling linguistic analysis at an unprecedented scale.
The aim of this study is to use social media data to examine the relevance of pain descriptors from the existing MPQ, identify novel contemporary English descriptors for pain among users of social media, and suggest a modification for a new MPQ for future validation and testing.
All posts from social media platforms from January 1, 2019, to December 31, 2019, were extracted. Artificial intelligence and emotion analytics algorithms (Crystalace and CrystalFeel) were used to measure the emotional properties of the text, including
A total of 118 new associated words were found via Word2Vec. Of these 118 words, 49 (41.5%) words had a count of at least 110, which corresponded to the count of the bottom 10% (8/78) of the original MPQ pain descriptors. The count and intensity of pain descriptors were used to formulate the inclusion criteria for a new pain questionnaire. For the suggested new pain questionnaire, 11 existing pain descriptors were removed, 13 new descriptors were added to existing subclasses, and a new
This study presents a novel methodology using social media data to identify new pain descriptors and can be repeated at regular intervals to ensure the relevance of pain questionnaires. The original MPQ contains several potentially outdated pain descriptors and is inadequate for reporting the psychological aspects of pain. Further research is needed to examine the reliability and validity of the revised MPQ.
Pain is “an unpleasant sensory and emotional experience associated with, or resembling that associated with, actual or potential tissue damage” [
Various instruments have been developed for the assessment of pain. For acute pain, pain scales that focus on identifying pain location and intensity, such as the visual analog scale and numeric rating scale, are most commonly used [
Assessment of chronic pain is indisputably more complex. The long-term burden of pain plays a profound role in shaping an individual’s physical and psychological state. In addition, the negative downstream effects of chronic pain can exacerbate the original pain condition through various pathways that remain poorly understood. Owing to these complexities, the above unidimensional instruments that only describe pain in terms of intensity may be too simplistic for meaningful clinical correlation.
Several authors have emphasized the need to recognize the multidimensional aspects of pain [
The McGill Pain Questionnaire (MPQ), created in 1971, is one of the most frequently cited instruments and has been validated for use in asymptomatic, symptomatic, and persistently symptomatic populations [
The MPQ can be administered by an interviewer who reads the instructions to the patient and defines any words that are not understood [
The MPQ has remained unchanged since its inception, and the impact of modern language use on the relevance of MPQ pain descriptors remains unreported. During this time, the invention of the internet and its exponential penetration have drastically reshaped our world. Social media, which comprises forums, blogs, business networks, social gaming, microblogs, photo-sharing platforms, and chat apps, has evolved dramatically alongside these developments. Furthermore, >50% of the global population was expected to access the internet in 2019, and the same figure was expected to use social media platforms [
Natural language processing or computational linguistics, together with machine learning algorithms, have evolved substantially over the years to be able to analyze, learn, and understand the linguistic contexts of words, identify sentiments and emotions, and form neural network models [
In relation to chronic pain, social media represents a snapshot of natural day-to-day colloquial language rather than formal communication. Furthermore, the nature of social media encourages users to capture their thoughts and ideas instantaneously. This is particularly important for accurate pain reporting, which is vulnerable to recall bias and relies heavily on timely reporting. Previous studies have observed extensive amounts of web-based conversations regarding pain [
This study used artificial intelligence and emotion analytics algorithms for the derivation and analysis of pain expression from social media platforms. The workflow was executed by a company specializing in linguistic and emotion analytics (INTNT.AI) and comprised 5 main steps conducted in an iterative process: (1) preliminary data gathering, (2) data cleaning, (3) Word2Vec (patent number US9037464B1; Google Inc), (4) final data gathering and cleaning, and (5) data analysis. The workflow is summarized in
Workflow for the derivation of new pain descriptors. SMP: social media post.
All posts from social media platforms over a 1-year period, from January 1, 2019, to December 31, 2019, were extracted. These data were acquired from a social listening platform (Meltwater) that aggregates and gives direct and official access to all accounts open to the public on Twitter, Facebook, Instagram, and YouTube.
A list comprising the 78 pain descriptors from the MPQ and 51 additional words obtained from several web-based thesauruses was used to identify relevant social media posts (
Only social media posts containing the word
The selected social media posts were manually scrutinized and cleaned. Usernames, hyperlinks, and internet-specific symbols were removed. In addition, the content of the social media posts was evaluated for relevance. Social media posts that contained the pain descriptor in irrelevant contexts were removed.
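The automatable part of this cleaning step can be illustrated with a minimal sketch. The regular expressions and the `clean_post` helper below are hypothetical and are not the rules used in the study; the relevance check described above was manual and is not reproduced here.

```python
import re

# Hypothetical patterns for illustration only; the study's actual rules are not given.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # hyperlinks
MENTION_RE = re.compile(r"@\w+:?")              # usernames, with an optional trailing colon
MARKER_RE = re.compile(r"\bRT\b|#")             # retweet marker and hashtag symbol
SPACE_RE = re.compile(r"\s+")

def clean_post(text: str) -> str:
    """Strip hyperlinks, usernames, and internet-specific symbols from one post."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = MARKER_RE.sub(" ", text)   # keeps the hashtag word, drops only the '#'
    return SPACE_RE.sub(" ", text).strip()

print(clean_post("RT @jane_doe: my back is #throbbing again https://t.co/abc123"))
```

Keeping the hashtag word while dropping the `#` symbol is a design choice made for this sketch, so that a descriptor used as a hashtag would still be counted.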
The remaining posts were then input into a sarcasm detection machine (Crystalace, Institute of High Performance Computing, Agency for Science, Technology and Research), which is a support vector machine classifier trained with an affect-cognition-sociolinguistics feature model [
Sarcasm is a difficult concept to handle in emotion analysis. Traditionally, sarcasm has been viewed from a psychological perspective where overt irony is actively pursued by the speaker as a tool of
Word2Vec [
Sample 2D illustration of words with common contexts within the overall 3D vector space; similar words are color-coded. This graph was constructed using t-distributed stochastic neighbor embedding (t-SNE) to aid visualization of word clusters. t-SNE takes a group of high-dimensional word feature vectors and compresses them into 2D (x, y) coordinate pairs. This method keeps similar words close together on the plane while maximizing the distance between dissimilar words.
This process allowed for the classification of pain dimensions from the open text found in the included social media posts and the identification of new pain descriptors. All new words with a positive vector distance from the original list of pain descriptors were considered to be associated, and a maximum of 20 associated words per root word was selected for inclusion.
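The selection rule described above can be sketched as follows. This is a minimal illustration that assumes "positive vector distance" means a positive cosine similarity between word vectors; the toy vectors and the `associated_words` helper are hypothetical, and in practice the vectors would come from a trained Word2Vec model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def associated_words(root, vectors, max_words=20):
    """Return up to max_words words with positive similarity to root, most similar first."""
    root_vec = vectors[root]
    scored = [(w, cosine(root_vec, v)) for w, v in vectors.items() if w != root]
    positive = [(w, s) for w, s in scored if s > 0]   # assumed reading of "positive vector distance"
    positive.sort(key=lambda pair: pair[1], reverse=True)
    return positive[:max_words]

# Toy 3D vectors, invented for illustration only.
toy_vectors = {
    "throbbing": [0.9, 0.1, 0.0],
    "pounding": [0.8, 0.2, 0.1],
    "pulsing": [0.7, 0.0, 0.3],
    "soothing": [-0.9, 0.1, 0.0],
}

for word, score in associated_words("throbbing", toy_vectors):
    print(word, round(score, 3))
```

In this toy example, "soothing" is excluded because its similarity to the root word is negative, which mirrors the later pruning of entries with contrasting meanings.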
Newly identified pain descriptors derived from Word2Vec mechanisms were compiled with the original list of pain descriptors used in preliminary data gathering to form an expanded list of keywords to be used for the final round of data gathering. The search for relevant social media posts was performed in the same social media platforms and period as above.
For greater specificity to health conditions, social media posts were included only if they contained at least one of the pain descriptors in this new list, as well as one pain condition from a list of common pain conditions (
The final data set was obtained after the selected social media posts were put through the same data cleaning steps as detailed above.
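The inclusion rule for the final round (at least one pain descriptor and at least one pain condition in the same post) can be sketched as below. The word lists are short hypothetical stand-ins for the study's lists, and the matching strategy (whole-word match for descriptors, substring match for conditions) is an assumption made for this sketch.

```python
import re

def tokens(text):
    """Lowercased alphabetic tokens of a post."""
    return set(re.findall(r"[a-z]+", text.lower()))

def is_included(post, descriptors, conditions):
    """True if the post mentions at least one descriptor and one pain condition."""
    words = tokens(post)
    lowered = post.lower()
    has_descriptor = any(d in words for d in descriptors)
    # Substring match so that multiword conditions such as "back pain" are found.
    has_condition = any(c in lowered for c in conditions)
    return has_descriptor and has_condition

# Hypothetical stand-ins for the study's descriptor and condition lists.
descriptors = {"throbbing", "stabbing"}
conditions = {"migraine", "back pain"}

print(is_included("Throbbing migraine all day", descriptors, conditions))
print(is_included("This bass line is throbbing", descriptors, conditions))
```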
The original MPQ comprised 78 pain descriptors categorized into 20 subclasses. A further 51 words were obtained through the use of web-based thesauruses. These 129 original pain descriptors served as keywords for the final analysis. The final data set was input into Word2Vec, and the final classification of descriptors was obtained using the algorithm.
Words found to be related to keywords were identically color-coded, located in close proximity within the overall vector space, and had a positive vector distance. A maximum of 20 words most similar to each of the 129 predetermined keywords were selected for inclusion and further pruning. These were evaluated for relevance to pain descriptors. Entries with contrasting meanings to the keywords (eg,
The number of mentions, or count, of each pain descriptor within the final data set of selected social media posts was computed. Analysis of the counts for the 78 original MPQ keywords was conducted to determine the minimum threshold level for the inclusion of new associated words. The bottom 10% of the original MPQ pain descriptors were found to have counts of <110. Therefore, a count of 110 was set as the minimum threshold level for the prevalence of word use, and all words with a count of <110 were removed.
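The counting and thresholding logic can be sketched as follows. The helper names are hypothetical, single-word descriptors are assumed, and the default threshold of 110 is taken from the text rather than recomputed.

```python
import re
from collections import Counter

def count_mentions(posts, descriptors):
    """Count how often each descriptor appears across all posts."""
    counts = Counter({d: 0 for d in descriptors})
    for post in posts:
        for word in re.findall(r"[a-z]+", post.lower()):
            if word in counts:
                counts[word] += 1
    return counts

def bottom_fraction(counts, fraction=0.10):
    """Descriptors in the lowest `fraction` of the list when ranked by count."""
    k = round(len(counts) * fraction)
    return sorted(counts, key=counts.get)[:k]

def keep_prevalent(counts, min_count=110):
    """Drop descriptors whose count falls below the minimum threshold."""
    return {w: c for w, c in counts.items() if c >= min_count}

# Toy data for illustration only.
posts = ["throbbing throbbing pain", "a stabbing feeling"]
counts = count_mentions(posts, ["throbbing", "stabbing", "lancinating"])
print(keep_prevalent(counts, min_count=1))
```

With 78 descriptors, `round(78 * 0.10)` gives 8, which matches the 8 of 78 descriptors reported in the bottom 10%.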
The intensities of all descriptors were also analyzed using natural language processing emotion analytics algorithms (CrystalFeel, Institute of High Performance Computing, Agency for Science, Technology and Research), which considered the entire sentence or paragraph in which each descriptor was found. The CrystalFeel algorithms allowed for the measurement of the emotional properties in text, including
A total of 572,742 social media posts were obtained from the preliminary round of data gathering. Of these 572,742 posts, 8310 (1.45%) were removed following manual evaluation for relevance, and a further 7824 (1.37%) were removed after failing to meet the threshold criterion for sarcasm. Word2Vec identified 34 new pain descriptors, which were used together with the original list of words to widen the search for relevant social media posts in the second round of data gathering. A total of 1,877,122 social media posts were identified in the second round. After data cleaning and application of the additional inclusion criterion that each post contain at least one pain condition as well as one pain descriptor, 11.55% (216,873/1,877,122) of the social media posts remained for the final data analysis.
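The reported proportions can be checked with a few lines of arithmetic. The variable names are introduced here for illustration; the figures are those stated in the text.

```python
def pct(part, whole, digits=2):
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100 * part / whole, digits)

preliminary_posts = 572_742
removed_irrelevant = 8_310
removed_sarcasm = 7_824
second_round_posts = 1_877_122
final_posts = 216_873

print(pct(removed_irrelevant, preliminary_posts))   # share removed for irrelevance
print(pct(removed_sarcasm, preliminary_posts))      # share removed for sarcasm
print(pct(final_posts, second_round_posts))         # share retained for final analysis
```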
Using Word2Vec, a total of 118 new associated words were found for the 129 pain descriptor keywords defined in this study, following the removal of repetitions. Of these 118 words, 5 (4.2%) were associated with both the original MPQ and thesaurus-derived keywords, 87 (73.7%) were associated with MPQ keywords only, and 26 (22%) were associated with thesaurus-derived keywords only (
A total of 118 new words were found to be associated with the 78 original McGill Pain Questionnaire keywords and 51 thesaurus-derived keywords; bracketed values indicate the number of words with count ≥110. MPQ: McGill Pain Questionnaire.
Of the 118 new associated words acquired through Word2Vec, 49 (41.5%) words were found to have a count of at least 110, meeting the minimum threshold level for the prevalence of word use.
Of the 51 thesaurus-derived keywords, 28 (54.9%) met the minimum threshold count of 110. These and the 49 (41.5%) of 118 new associated words derived through Word2Vec were combined into a single list for further evaluation, representing the final list of 77 newly derived pain descriptors (
Breakdown of original and newly derived pain descriptors. MPQ: McGill Pain Questionnaire.
Of the 78 original MPQ keywords, the pain descriptors that received the top 10 highest counts in descending order were
The intensity of the 78 original MPQ keywords ranged from 0.367 (for
The counts and intensities of existing and new words are presented in
Analyses of count and intensity of the pain descriptors, as well as manual analysis of the social media posts for the context in which the word was used, were used to recommend the inclusion of pain descriptors in a new pain questionnaire. The suggested changes, categorized by the MPQ subclasses, are summarized in
Comparison of the original McGill Pain Questionnaire (MPQ) with the suggestion for a new pain questionnaire.
| Subclass and pain descriptors from the original MPQ | Suggested words for a new pain questionnaire | Ranking reordered? (yes or no) |
|---|---|---|
| Flickering, Quivering (removed), Pulsing, Throbbing, Beating (removed), Pounding | Flickering, Pulsing, Throbbing, Pounding | No |
| Jumping, Flashing, Shooting | Jumping, Flashing, Shooting | No |
| Pricking, Boring, Drilling, Stabbing, Lancinating (removed) | Pricking, Boring, Drilling, Stabbing, Puncturing (new) | No |
| Sharp, Cutting, Lacerating (removed) | Cutting, Sharp | Yes |
| Pinching, Pressing, Gnawing, Cramping, Crushing | Pressing, Gnawing, Crushing, Pinching, Cramping | Yes |
| Tugging, Pulling, Wrenching | Tugging, Contraction (new), Pulling, Wrenching, Clenching (new) | No |
| Hot, Burning, Scalding (removed), Searing | Hot, Searing, Burning | Yes |
| Tingling, Itchy, Smarting (removed), Stinging | Scratching (new), Tingling, Itchy, Stinging | No |
| Dull, Sore, Hurting, Aching, Heavy | Sore, Aching, Dull, Hurting, Heavy | Yes |
| Tender, Taut (removed), Rasping (removed), Splitting | Tender, Splitting | No |
| Tiring, Exhausting | Tiring, Straining (new), Exhausting | No |
| Sickening, Suffocating | Sickening, Suffocating | No |
| Fearful, Frightful (removed) | Fearful, Horrendous (new), Horrifying (new) | No |
| Terrifying, Punishing (removed), Grueling, Cruel, Vicious, Killing | Grueling, Cruel, Vicious, Killing, Terrifying | Yes |
| Wretched, Blinding | Wretched, Blinding | No |
| Annoying, Troublesome, Miserable, Intense, Unbearable | Mild (new), Troublesome, Intense, Annoying, Irritating (new), Unbearable, Horrible (new), Miserable, Excruciating (new), Distressing (new) | Yes |
| Spreading, Radiating, Penetrating, Piercing | Spreading, Radiating, Penetrating, Piercing | No |
| Tight, Numb, Drawing (removed), Squeezing, Tearing | Bruising (new), Tight, Numb, Squeezing, Tearing | No |
| Cool, Cold, Freezing | Cool, Cold, Freezing | No |
| Nagging, Nauseating, Agonizing, Dreadful, Torturing | Nagging, Nauseating, Agonizing, Dreadful, Torturing | No |
| —^a | Worried (new), Angry (new), Fearful, Sad (new), Depressed (new), Nervous (new), Anxious (new), Feel hopeless (new), Suicidal (new) | N/A^b |

^a Not available.
^b N/A: not applicable.
Of the 78 pain descriptors, 8 (10%) pain descriptors from the original MPQ were removed because of low use (count<110). The words removed were
In addition, of the 78 pain descriptors from the original MPQ, 3 (4%) pain descriptors were removed because, on manual analysis of the social media posts, the contexts in which they were used were found to have deviated from pain description. The word
A total of 13 new associated words were added to the pre-existing subclasses. The added words were
An entirely new
The identified pain descriptors consisted of a mixture of nouns and adjectives. For consistency, the 9 selected descriptors were modified to fit into the sentence
The pain descriptors of 6 of the 20 subclasses were reordered based on their emotional intensity to reflect decreasing pain intensity. These new rankings are shown in
This study provides insights into modern language use in the context of pain description. We found infrequent use and even a change in context for the use of several descriptors from the original MPQ, reflecting the evolution of language and suggesting limitations in the current MPQ. We also identified several new pain descriptors that can be used to update the MPQ, including the emergence of a possible
The top 10 highest counts for the 78 original MPQ keywords were substantially lower than those for the new words, ranging from 14,204 to 26,679 and from 21,441 to 96,909, respectively. This suggests that words previously selected for the MPQ may no longer be as commonly used today.
Interestingly, 6 of the 10 newly identified pain descriptors with the highest counts were relevant to the emotional or mental description of pain, namely
Our study combined the big data afforded by social media posts with artificial intelligence and additional emotion analytics algorithms that allow for a broad analysis of social media posts with high speed and accuracy. The combination of pain research with this technology, originally designed to help businesses understand what their customers want through linguistic and contextual cues, helps to address a pressing health care need. As much as 40% of the population contends with chronic pain, and the extensive cumulative impact of chronic pain in the United States alone was estimated to exceed US $500 billion annually [
The updated pain definition by the International Association for the Study of Pain highlights the subjective and emotional nature of pain. The literature reports various links between pain and mood or psychiatric disorders. For instance, pain and major depressive disorder often occur concurrently, appearing to mutually exacerbate the severity of the individual conditions [
Unfortunately, these psychological, emotional, and behavioral interactions with pain experience are rarely brought up in patient interviews. The existing MPQ focuses largely on the physical description of pain. Although its existing words allow the interviewer to infer the emotional toll of pain on the patient, it lacks a dedicated segment that acknowledges the psychological burden of pain. The introduction of a new
This study has some limitations. Only social media posts in English were included in this study, and most social media posts originated from the Western hemisphere. Therefore, the application of these findings is largely limited to the United States and may not be generalizable to other English-speaking countries. Future work using similar artificial intelligence mechanisms stratified by geographical regions to address regional linguistic differences, and perhaps different languages, may be explored.
Similarly, the sample population in this study was limited to users of social media platforms. The preferred social media platform differs according to age, and in general, the overall population tends to skew young [
Although the artificial intelligence and emotion analytics algorithms selected for the study had previously demonstrated good accuracy, we did not evaluate the validity of the newly identified pain descriptors. The complexity of human thought and expression makes further validation of the suggested questionnaire on a sample of the target patients necessary. Future research could involve a qualitative focus group to examine the face validity of the word descriptors, to ensure that patients feel that their pain is adequately described by the options available, and to suggest other terms. The validity and reliability of the new pain questionnaire should be assessed, and the proposed reordering of words within each subclass should be tested.
The original MPQ is inadequate for reporting the psychological aspects of pain. Several descriptors from the original MPQ were also noted to have infrequent use or changes in context. This study used artificial intelligence and emotion analytics algorithms to identify contemporary vocabulary for pain description. The described methodology could be repeated at regular intervals to ensure the relevance of the pain questionnaires. Further research is needed to examine the reliability and validity of the revised MPQ.
Pain descriptors for the first round of data gathering.
List of common pain conditions.
Counts and intensity of pain descriptors from the McGill Pain Questionnaire, web-based thesauruses, and identified through Word2Vec.
McGill Pain Questionnaire
The authors would like to express their gratitude to the team at INTNT.AI (Singapore) for their invaluable support and technical guidance: Mr Manuel Ho, Mr Xuyuan Kee, Dr Murphy Choy, and Dr Yang Yinping. This study was supported by a grant from the Faculty of Dentistry, National University of Singapore.
MYT, CEG, and HHT conceived and designed the study, collected the data, contributed to data or analysis tools (obtained through a partnership with INTNT.AI), and performed the analysis. MYT and CEG were involved with writing the paper.
None declared.