Identification of Emotional Expression With Cancer Survivors: Validation of Linguistic Inquiry and Word Count

Background: Given the high volume of text-based communication such as email, Facebook, Twitter, and other web-based and mobile apps, there are unique opportunities to use text to better understand underlying psychological constructs such as emotion. Emotion recognition in text is critical to commercial enterprises (eg, understanding the valence of customer reviews) and to current and emerging clinical applications (eg, as markers of clinical progress and risk of suicide), and the Linguistic Inquiry and Word Count (LIWC) is a commonly used program.
Objective: Given the wide use of this program, the purpose of this study is to update previous validation results with two newer versions of LIWC.
Methods: Tests of proportions were conducted using the total number of emotion words identified by human coders for each emotional category as the reference group. In addition to tests of proportions, we calculated F scores to evaluate the accuracy of LIWC 2001, LIWC 2007, and LIWC 2015.
Results: LIWC 2001, LIWC 2007, and LIWC 2015 each demonstrated good sensitivity for identifying emotional expression, and LIWC 2007 and LIWC 2015 were significantly more sensitive than LIWC 2001 for identifying emotional expression and positive emotion; however, the more recent versions were also significantly more likely to overidentify emotional content than LIWC 2001. LIWC 2001 demonstrated significantly better precision (F score) for identifying overall emotion, negative emotion, and anxiety compared with LIWC 2007 and LIWC 2015.
Conclusions: Taken together, these results suggest that LIWC 2001 most accurately reflects the emotional identification of human coders.


Introduction
Recent studies have provided evidence that emotions can be effectively identified in written text [1][2][3][4]. Written emotions have been shown to differ significantly from characteristically nonemotional writing, such as academic tasks [4], and, more importantly, they can be correctly identified by readers [3]. As the use of web-based interventions, mobile apps, social media, other text communication (eg, emails and text messages), and web-based communication (eg, Zoom and Skype) increases, the need for validated tools that can rapidly analyze text data has become exceptionally important. A large proportion of emotion-based text data are currently analyzed by human coders [5,6], as they add predictive power above and beyond that of computer analyses for the identification of positive emotions [7] as well as symptoms of depression [8].
Although computer analysis has become increasingly more efficient in evaluating written text, it lacks the nuance and accuracy provided by human coders [9]. Although qualitative analysis provides the most complete method for characterizing text-based communications [10], the cost, time requirements, and subjectivity of manual coding make these methods prohibitively difficult for many applications. Computerized text analysis programs may be able to ameliorate these limitations. Unfortunately, many computerized text analysis programs are not validated for the identification of emotional expression and still have time-consuming data preparation to ensure that the text is clear of all typographical errors [9].
The Linguistic Inquiry and Word Count (LIWC) [11] is a text analysis program that calculates the degree to which various categories of words are used in text. Bantum and Owen [12] previously evaluated the validity of the LIWC 2001, demonstrating that LIWC 2001 had good sensitivity and specificity for identifying emotion; however, the positive predictive value (PPV), or precision of emotional identification, was poor. Additional work by our team with the creation of a machine learning program [13] demonstrated that a machine learning approach was not necessarily more predictive than LIWC. Since our previous validation study, Pennebaker et al [14] have released 2 updates: LIWC 2007 and LIWC 2015, both of which have removed categories found to have poor emotional identification (eg, optimism) and increased the dictionary size of the program. These updates have greatly altered the dictionary for this program, including new categories of drive, analytical thinking, emotional tones, time orientation, and relativity and new subcategories of gender references and netspeak. In terms of emotional content, the largest change involves the inclusion of netspeak to quantify emotional expression (eg, ":)" would be categorized as a positive emotion word).
It is essential to validate LIWC 2007 and LIWC 2015 because of the widespread use of LIWC in research, clinical treatment, and commercial applications. LIWC 2001 and LIWC 2015 have successfully identified individuals at risk for suicide, who showed an increased presence of first-person singular self-references and negative emotional expression as they approached their suicide [1,2,15]. Using LIWC 2007, Sonnenschein et al [16] found that depressed individuals produced more words related to the sadness subcategory than anxious individuals did. Overall, LIWC has been shown to accurately identify language differences between people diagnosed with attention deficit hyperactivity disorder, anxiety, bipolar disorder, borderline personality disorder, depression, eating disorders, obsessive-compulsive disorder, schizophrenia, and seasonal affective disorder [17,18]. LIWC 2007 has been used to evaluate whether a clinician's distancing language increases suicide rates among veterans [19], assess what men and women value in romantic relationships [20], determine how social media predicts the outcomes of presidential elections [21], and evaluate language style and its ability to predict relationship initiation and stability [22]. As with LIWC 2001, the original evaluation of the validity of LIWC 2007 and LIWC 2015 was based on a series of correlations between judges' ratings and LIWC scores. Bantum and Owen [12] found that LIWC had low predictive value and overidentified emotions, despite significant, though modest, correlations between judges' ratings and LIWC scores. Given these shortcomings and the need to independently validate LIWC 2007 and LIWC 2015 for emotion recognition in text [23,24], we sought to determine whether LIWC 2007 and LIWC 2015 are valid for emotion recognition in text and whether they improve upon the known limitations of LIWC 2001.
The first aim of this study was to assess the accuracy of LIWC 2007 and LIWC 2015 for the detection of emotional expression using tests of specificity, sensitivity, PPV, and negative predictive value (NPV). The second aim was to evaluate potential differences between LIWC 2001, LIWC 2007, and LIWC 2015 in emotional identification. We hypothesized that LIWC 2015 would identify emotion more accurately than LIWC 2007 and LIWC 2001 because of the significant alterations to the most recent dictionary, its new attention to some contextual information in written text, and the continued refinement of word selection in the construction of the product.

Participants
The participants were recruited from a hematology/oncology outpatient clinic at a large medical center in the southeastern United States. The participants included 49 women with stage 1 or stage 2 breast cancer and 14 women with stage 3 or stage 4 breast cancer. Participants were not excluded from participation based on the time elapsed since their diagnosis or medical treatment. The women participated in a randomized 12-week clinical trial of an internet-based support group. Additional information regarding the sample has been previously reported [25]. The textual data of the 63 participants were analyzed for this particular study. The participants had a mean age of 49.8 years (SD 11.0), the majority were college educated (mean 15.4 years, SD 2.4), and they were largely of White descent (57/63, 93%).

Procedures
All participants completed a baseline assessment once they agreed to participate in the study, before being given access to the web-based support group. Once the participants were given access to the web-based support group, they were encouraged to communicate with one another through a discussion board regarding general topics and a series of interactive coping-skills training. The textual data were stored in an individual data file for each participant. Further information regarding the experimental procedures for these participants has been previously reported [12,26]. The study data set is identical to that used in the original validation study of LIWC 2001.

Rater Coding of Emotional Expression
This study used human-coded ratings of emotion generated in a previous analysis of these linguistic data [12]. To briefly describe how these codes were generated, Bantum and Owen [12] had a well-defined set of coding rules for the human coders to follow. Each coder independently identified all words containing emotional expression. If the coders determined that emotional expression was present in a given sentence (considered within the context of the text), the word that most acutely captured the emotion was placed into the best-fitting subcategory, based on categories specified by LIWC 2001. Eight potential categories were used: "positive feelings," "optimism," "anxiety," "anger," "sadness," "other positive emotion," "other negative emotion," or "not emotion." Coders were asked to differentiate between physical pain and emotional pain or experience (eg, "the impact of chemotherapy on me physically was physically exhausting"). This process required substantial training and judgment, as the use of emotion can be ambiguous, especially in text alone. Any discrepancies between coders were discussed among the researchers, and final codes were established by consensus. The interrater reliability between the two trained coders was excellent (κ=0.80). Two additional raters were trained on the coding process and then reviewed 33% of the text. The interrater reliability between these two additional raters indicated substantial agreement (κ=0.69).
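For readers unfamiliar with the κ statistic, Cohen's kappa quantifies agreement between two coders after correcting for the agreement expected by chance. The following minimal sketch uses invented toy codes (not the study data) to show how the statistic is computed:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: proportion of items both coders labeled identically
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Toy example: two coders labeling 10 words as emotion (1) or not emotion (0)
a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
b = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
print(round(cohens_kappa(a, b), 2))  # → 0.78
```

By common benchmarks, κ values of 0.61 to 0.80 indicate substantial agreement, consistent with the interpretations reported above.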

LIWC 2001
LIWC 2001 is a computational text analysis method that processes text-based data on a word-by-word basis. LIWC compares each word of text with a library of 5 categories of words (ie, linguistic dimensions, psychological processes, relativity, personal concerns, and experimental dimensions) to identify whether each specific word from the source data set matches any of the words or word fragments found in the LIWC library. With regard to emotion, words are compared with each of the 3 categories of emotion (ie, emotional expression, positive emotion, and negative emotion). For each word that matches a word or fragment in a LIWC emotion dictionary, the program increments a count of all emotion words identified in that particular emotion dictionary (eg, positive emotion). LIWC uses the results of the word count to establish the percentage of total words in the text that contain emotion words or a specific type of emotion. After a word has been identified as positive or negative, it is then placed into a specific category, such as positive feelings or optimism in the positive category and sadness, anger, or anxiety in the negative category. In some instances, words may be identified as expressing emotion and categorized as positive or negative, but LIWC is unable to resolve a more specific emotion category (eg, anxiety vs sadness), which may cause it to assign a word to multiple emotion dictionaries.
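The word-by-word matching procedure described above can be illustrated with a short sketch. The two mini-dictionaries here are invented for illustration only; they follow the LIWC convention in which a trailing asterisk marks a word stem that matches any continuation (eg, "happi*" matches "happiness" and "happier"):

```python
# Hypothetical mini-dictionaries in the LIWC style (not the real LIWC lists)
POSEMO = {"happi*", "love", "nice"}
NEGEMO = {"sad*", "hurt", "worr*"}

def matches(word, dictionary):
    """True if the word equals an entry or begins with a starred stem."""
    word = word.lower().strip(".,!?")
    return any(word == e or (e.endswith("*") and word.startswith(e[:-1]))
               for e in dictionary)

def emotion_percentages(text):
    """Percentage of total words matching each emotion dictionary."""
    words = text.split()
    total = len(words)
    pos = sum(matches(w, POSEMO) for w in words)
    neg = sum(matches(w, NEGEMO) for w in words)
    return {"posemo": 100 * pos / total, "negemo": 100 * neg / total}

print(emotion_percentages("I was sad but the nurses were so nice"))
```

Here "sad" matches the stem "sad*" and "nice" matches exactly, so each dictionary captures 1 of 9 words (about 11.1%), mirroring how LIWC reports emotion as a percentage of total words.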

LIWC 2007
Each participant's text information was also analyzed using LIWC 2007 (n=63) [14]. LIWC 2007 has a similar structure to that of LIWC 2001 in that it is a computational text analysis program that evaluates each item on a word-by-word basis. Furthermore, LIWC 2007 also provides a percentage of total words that are represented by emotion.
Pennebaker et al [14] developed LIWC 2007 specifically to address a number of key limitations of LIWC 2001, such as a limited dictionary, uncommonly used word categories, and a lack of function words (eg, conjunctions, adverbs, quantifiers, auxiliary verbs, and impersonal pronouns). The creators of LIWC 2007 removed the following word categories found in LIWC 2001 because they had poor base rates: optimism, positive feelings, communication verbs, metaphysical, sleeping, grooming, and school. The new dictionary for LIWC 2007 was altered to provide more accurate word categories by omitting those categories with insufficient validity, adding a number of categories to represent function words, and incorporating previously experimental categories into the program (eg, swear words, nonfluencies, and fillers). Additionally, the researchers increased the dictionary from 2300 to 4500 words and word stems to better represent emotional expression and other key psychological constructs. A number of words defined as emotion in the LIWC 2007 dictionary were previously categorized as nonemotion (eg, confident, champ, and resolve). In addition to reclassifying preexisting words, LIWC 2007 added emotional words that were not originally included in the LIWC 2001 dictionary (eg, grace, jaded, joke, openness, and rancid) and removed some emotional words that were in the LIWC 2001 dictionary (eg, sensitive). Finally, the LIWC 2007 dictionary classified the roots of some words as emotion (eg, stammer) that may be perceived as nonemotion by human coders in an extended form (eg, stammered and stammering).

LIWC 2015
Each participant's text information was also analyzed using LIWC 2015 (n=63) [27]. LIWC 2015 has a similar structure to the previous versions in that it evaluates textual data on a word-by-word basis, looking for specific target words within the appropriate word category scales. As words are identified as target words, they are classified into one or more hierarchically arranged categories or subcategories. For example, the word "cried" may be placed in the categories of sadness, negative emotion, overall affect, verbs, and past focus. Furthermore, LIWC 2015 provides a percentage of total words that are represented by emotion, and various structural composition elements (eg, sentence punctuation) are also counted.
Pennebaker et al [27] created an entirely new dictionary and software system rather than simply updating the previous versions of LIWC. This resulted in an increase in dictionary size, totaling nearly 6400 words, word stems, and select emoticons. In addition to the larger dictionary, LIWC 2015 added 9 word categories tapping into psychological constructs (eg, affect and cognition) and 5 informal language categories (eg, assents, fillers, swear words, and netspeak). One personal concern category (eg, work, home, and leisure activities) and one standard linguistic category (eg, the percentage of words in the text that are pronouns, articles, and adjectives) were removed. Pennebaker et al [27] also reclassified how the word "like" is categorized in an attempt to reduce the risk of false positives that may occur in utterances, comparatives, or prepositions. Specifically, in previous versions of LIWC, the word "like" was categorized as indicative of emotion, particularly positive emotion. In LIWC 2015, the word "like" only qualifies as an emotion word when attributed to a person or an action, such as "I like," "they like," and "will like." Pennebaker et al [27] also added a netspeak category to assess words and communication styles (eg, btw, lol, and thx) common on social media platforms (eg, Facebook, Snapchat, and instant messaging).
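The context rule for "like" can be approximated in a few lines. This is a hypothetical reconstruction of the behavior described above, not LIWC 2015's actual implementation, and the list of preceding pronouns and auxiliaries is an assumption for illustration:

```python
import re

def count_like_as_emotion(text):
    """Count "like" as a positive emotion word only when it follows a
    subject pronoun or auxiliary verb, approximating the LIWC 2015
    context rule. (Hypothetical rule set, not the real LIWC parser.)"""
    pattern = r"\b(i|we|they|you|she|he|will|would|d)\s+like\b"
    return len(re.findall(pattern, text.lower()))

# "I like" is attributed to a person; "felt like" is a comparative use
print(count_like_as_emotion("I like the group, it felt like home"))  # → 1
```

Under the pre-2015 behavior, both instances of "like" would have counted as positive emotion, illustrating the kind of false positive the reclassification aims to reduce.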

Data Preparation
Each time the individuals participated in the web-based support group, their textual data were saved in person-specific files. The files were then combined into a single spreadsheet per participant so that each word could be treated as a unit of analysis. The text files from the human rater coding of emotional expression were created and combined with the results from the LIWC 2001, LIWC 2007, and LIWC 2015 analyses. Each instance of emotion was counted as one point, and the frequency of a given emotion was divided by the total words for that participant, resulting in a percentage for each emotion for each participant. This was done for LIWC 2001, LIWC 2007, and LIWC 2015.

Data Analytic Plan
This study contains a total of 165,754 words across 278 single-spaced, 12-point font pages. Each word is treated as a single observation. A power analysis conducted using G*Power 3 [28] indicated that with an effect size of 0.5, an alpha level of .01, and a sample size of 165,754, the power to detect between-method differences is greater than 0.80. To assess the differences between LIWC 2001, LIWC 2007, and LIWC 2015 with regard to the accuracy of emotional identification, we calculated tests of proportions. We calculated the sensitivity, specificity, PPV, and NPV to identify the proportion of words that were similarly identified by human coders for LIWC 2001, LIWC 2007, and LIWC 2015. Subsequently, we used the overall proportions for each emotional category and conducted the test of proportions using the total number of emotion words identified by human coders for each emotional category as the reference group. To control for the number of tests, we applied a Bonferroni correction, yielding a more stringent significance threshold of P=.008. In addition to tests of proportions, we calculated F scores to evaluate the accuracy of LIWC 2001, LIWC 2007, and LIWC 2015. Accuracy was assessed as the harmonic mean of precision, or the fraction of retrieved items that are relevant, and recall, or the fraction of relevant items that are successfully retrieved. This balances the various measures (sensitivity and PPV) that we evaluated. The F score was calculated as 2 × PPV (precision) × sensitivity (recall) divided by the sum of PPV (precision) and sensitivity (recall).
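Given word-level binary codes, the accuracy measures described above reduce to simple ratios over the confusion matrix. The following is a minimal sketch using invented toy data rather than the study corpus:

```python
def classification_metrics(human, liwc):
    """Word-level agreement metrics, treating human codes as the reference.
    human, liwc: sequences of 0/1 flags (1 = word coded as emotion)."""
    tp = sum(h and l for h, l in zip(human, liwc))            # both say emotion
    fp = sum((not h) and l for h, l in zip(human, liwc))      # LIWC overidentifies
    fn = sum(h and (not l) for h, l in zip(human, liwc))      # LIWC misses
    tn = sum((not h) and (not l) for h, l in zip(human, liwc))
    sens = tp / (tp + fn)              # sensitivity (recall)
    spec = tn / (tn + fp)              # specificity
    ppv = tp / (tp + fp)               # positive predictive value (precision)
    npv = tn / (tn + fn)               # negative predictive value
    f = 2 * ppv * sens / (ppv + sens)  # harmonic mean of precision and recall
    return {"sensitivity": sens, "specificity": spec,
            "PPV": ppv, "NPV": npv, "F": f}

human = [1, 1, 0, 0, 1, 0, 0, 0]
liwc  = [1, 1, 0, 1, 0, 0, 1, 0]
print(classification_metrics(human, liwc))
```

In this toy case, sensitivity is 2/3 and PPV is 0.5, giving an F score of 4/7, which shows how overidentification (false positives) drags precision, and hence the F score, below sensitivity.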

Overview
The average percentage of words identified by LIWC 2001, LIWC 2007, LIWC 2015, and human coders as emotion, positive emotion, and negative emotion, as well as the specific subcategories of anxiety, anger, and sadness, ranged from 0.1% to 4.1% of total words.

F Score
The F scores were compared using a test of difference, conducted to determine whether the difference between two proportions was significant. LIWC 2001 was significantly more precise in its evaluation of total emotional expression, positive emotion, and anxiety than LIWC 2007. Additionally, LIWC 2001 was significantly more precise in its evaluation of total affect, positive emotion, negative emotion, and anxiety than LIWC 2015. Notably, there were no significant differences in accuracy between LIWC 2007 and LIWC 2015 (Table 6).
The sensitivity levels for all 3 versions of LIWC indicate strength in the identification of emotional content: all were highly sensitive to emotional expression. This is exceptionally important when analyzing text data in which the expected prevalence of emotional expression is low, or in which the risk of overidentification matters far less than the risk of underidentification (eg, in evaluating suicide risk, where a missed instance could have grave consequences). There are many applications for which sensitivity, rather than accuracy, may be preferable, and LIWC 2007 and LIWC 2015 demonstrate improved sensitivity relative to LIWC 2001. However, there are potential consequences to the overidentification of emotional expression, particularly when accurate emotion recognition is required for a specific utterance. LIWC 2001 may appear superior in its emotional identification to LIWC 2007 and LIWC 2015, yet its performance is highly dependent on the population being evaluated. The PPV depends on the prevalence of emotion in the population, meaning that it can vary with the sample used, whereas sensitivity may remain the same regardless of the population being evaluated [29].
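The test of difference between two proportions used above can be sketched as a standard two-sample z test with a pooled proportion. The counts below are invented for illustration; they are not the study's actual cell counts:

```python
from math import sqrt, erf

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                 # pooled proportion under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal CDF via the error function
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 500/600 vs 430/600 words correctly identified
z, p = two_proportion_z_test(500, 600, 430, 600)
print(round(z, 2), p < .008)  # compare P against the Bonferroni threshold
```

With these invented counts the difference clears the study's Bonferroni-corrected threshold of P=.008; in practice the inputs would be the emotion-word counts for each LIWC version against the human-coder reference.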
More specifically, cancer patients have been found to express more emotion than a physically healthy population [30], meaning that the prevalence of expressed emotion is higher in the sample used in this study than in the general population. Considering the prevalence of emotional expression in the cancer population, LIWC 2001 and LIWC 2007 are likely to produce poorer PPVs when applied to the emotional expression of a nonclinical population. Text with nuanced emotion, as is often the case in the experience of cancer, is also likely to be coded less accurately by LIWC. LIWC 2001 and LIWC 2007 currently have a high rate of false positives, which may increase when evaluating a less emotionally expressive population or decrease when evaluating a more emotionally expressive population. Ultimately, the LIWC programs would benefit from further validation in other populations with varying levels of emotional expression.
The initial validation of LIWC 2001, 2007, and 2015 relied heavily on simple correlations between LIWC codes and judges' ratings of emotional content. Correlation analyses describe the extent to which variables covary, but not the accuracy of identification; for this, measures of testing accuracy are more appropriate. Conducting analyses such as tests of proportions allows users to see the weaknesses and strengths of the relationship and the factors that contribute to its strengths. Future studies may also benefit from evaluating writers' intentions related to emotional expression to further validate both LIWC and human coders' ratings. For example, the LIWC classification results (eg, negative emotion) could be reviewed together with the text writers, confirming that they intended to express negative emotion and thus verifying the LIWC classification. A machine learning approach, in which rules for coding are created as the raw data are analyzed, combined with LIWC, is another possibility for increasing overall accuracy and making the coding procedure more sample dependent. With a machine learning approach, automated coding rules or algorithms are generated from patterns seen in other coding methods, such as manual coding, potentially allowing for increased accuracy without the time-consuming nature of manual coding [13]. Emotions are multifaceted, making them much more difficult to identify accurately when reduced to a single modality. It is important to note that there are many other ways to code emotion in text; this study focused on validating a program that is commonly used in the field of web-based behavioral sciences.
It must be noted that this study has some limitations. The narratives were obtained from women diagnosed with breast cancer, and research has indicated that female cancer patients express more emotion than their male counterparts [30]. Cancer patients are also more inclined to endorse affective disorders, such as anxiety, which may affect their emotional expression [31]. Moreover, the specific circumstances these women faced (eg, cancer diagnosis, treatment, and outcomes) could have limited the range of emotions discussed compared with a healthy population. Given the population studied, the results may be limited to cancer survivors rather than the general population. Furthermore, only a few emotions were evaluated (ie, overall affect, positive emotions, negative emotions, anger, anxiety, and sadness), which does not reflect the full range of emotions experienced; this limited set may not accurately capture other emotions expressed (eg, frustration, excitement, and fear). Finally, the sample was highly homogeneous with respect to gender, ethnicity, level of education, and even health status, further limiting the generalizability of the findings.
Beyond the sample, additional limitations of this study involve intricacies specific to LIWC. As previously mentioned, the classification of the word "like" has been altered across each iteration of LIWC. Most notably, when using LIWC 2015 for text analysis, it may be most accurate to manually evaluate each instance of the word "like" to determine its contextual meaning and ensure appropriate classification. In this study, however, no such manual evaluation was performed before the automatic analysis with LIWC 2015, as doing so would defeat the utility and efficiency that automated text analysis programs offer.
Although human coders are the gold standard for emotional identification in text data, the time and cost associated with evaluating such large volumes of data mean that human coding is not always practical. In addition, manual coding of text, although considered the gold standard, is not entirely accurate; there is inevitably some ambiguity in any attempt to capture the emotion expressed by another person. Text-based data also lack nonverbal expression, leaving fewer cues to code. On the basis of these results, LIWC 2001 is clearly superior to LIWC 2007 and LIWC 2015 with respect to overall precision, whereas LIWC 2007 and LIWC 2015 are more sensitive in identifying overall expressed emotion. The PPV is highly dependent on the prevalence of emotion in the specific population, such that the more emotion present in a population, the more accurate the analysis is likely to be. Considering the high prevalence of emotion in a cancer population and that LIWC 2001 still performed significantly better than LIWC 2007 and LIWC 2015, this suggests that in populations with much less emotional expression, LIWC 2007 and LIWC 2015 are likely to perform even more poorly relative to LIWC 2001. LIWC 2001 appears to have good validity in emotional identification and presents as a reasonable tool for identifying emotion in text data, which is important in an increasingly digital world.