Patient Experience and Satisfaction in Online Reviews of Obstetric Care: Observational Study

Background: The quality of care in labor and delivery is traditionally measured through the Hospital Consumer Assessment of Healthcare Providers and Systems but less is known about the experiences of care reported by patients and caregivers on online sites that are more easily accessed by the public.
Objective: The aim of this study was to generate insight into the labor and delivery experience using hospital reviews on Yelp.
Methods: We identified all Yelp reviews of US hospitals posted online from May 2005 to March 2017. We used a machine learning tool, latent Dirichlet allocation, to identify 100 topics or themes within these reviews and used Pearson r to identify statistically significant correlations between topics and high (5-star) and low (1-star) ratings.
Results: A total of 1569 hospitals listed in the American Hospital Association directory had at least one Yelp posting, contributing a total of 41,095 Yelp reviews. Among those hospitals, 919 (59%) had at least one Yelp rating for labor and delivery services (median of 9 reviews), contributing a total of 6523 labor and delivery reviews. Reviews concentrated among 5-star (n=2643, 41%) and 1-star reviews (n=1934, 30%). Themes strongly associated with favorable ratings included the following: top-notch care (r=0.45, P<.001), describing staff as comforting (r=0.52, P<.001), the delivery experience (r=0.46, P<.001), modern and clean facilities (r=0.44, P<.001), and hospital food (r=0.38, P<.001). Themes strongly correlated with 1-star labor and delivery reviews included complaints to management (r=0.30, P<.001), a lack of agency among patients (r=0.47, P<.001), and issues with discharging from the hospital (r=0.32, P<.001).
Conclusions: Online review content about labor and delivery can provide meaningful information about patient satisfaction and experiences. Narratives from these reviews that are not otherwise captured in traditional surveys can direct efforts to improve the experience of obstetrical care.


Introduction
Many hospitals in the United States use the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) and Press Ganey surveys to evaluate patient experiences [1]. Survey results are standardized and publicly reported to facilitate comparisons of patient experience. However, they are costly and often have low response rates [2,3]. Prespecified domains may miss the concerns of many patients, and aggregated public reporting can obscure differences across specialties [4,5].
Yelp is a website where users share information about their experiences at local businesses by giving a star rating from 1 to 5 and leaving a narrative review. Yelp is the most used free website in the United States for hospital ratings [6]. In one study, 65% of gynecologists reported being likely to use online ratings to improve patient care, more so than physicians from other specialties [7]. Prior work demonstrates that reviews from online rating services like Yelp are correlated with traditional methods for understanding the patient experience, and the platform's unstructured design provides information not captured in conventional patient experience surveys [2,8-12]. The scale and utilization of these platforms are significant and may provide a nuanced way to better listen to patients [13]. In this study, we aim to evaluate how the content of labor and delivery Yelp reviews relates to star rating to provide insight into the labor and delivery experience in the United States.

Obtaining Hospital Reviews
We identified hospitals in the United States that have Yelp reviews using the Yelp Search application programming interface. We included only hospitals listed in the American Hospital Association directory with at least one review. Hospital reviews were then searched for keywords specific to labor and delivery, identified by referencing the Unified Medical Language System database and gathering input from an obstetrician (SKS). The search terms included variations of the same word; for example, "deliver" and "delivery" were both used but counted as one search term. Reviews containing at least one of the specified keywords were characterized as "labor and delivery reviews" and all others as "non-labor and delivery" (Table 1). Given the bimodal distribution of ratings, we used only reviews with a 5-star or 1-star rating for the final analyses (Table 2).
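The keyword screen described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the stems in `KEYWORD_STEMS` are hypothetical placeholders (the actual list was derived from the Unified Medical Language System and clinician input), and the review dictionaries with `stars` and `text` keys are an assumed data shape.

```python
import re

# Hypothetical stems for illustration only; the study's real keyword list is not reproduced here.
KEYWORD_STEMS = ["deliver", "birth", "obstetric", "midwi"]

# \b anchors each stem at the start of a word; \w* absorbs suffixes,
# so "deliver", "delivery", and "delivered" all map to the stem "deliver".
PATTERN = re.compile(r"\b(" + "|".join(KEYWORD_STEMS) + r")\w*", re.IGNORECASE)


def matched_stems(text):
    """Return the set of stems found, so word variations count as one search term."""
    return {m.group(1).lower() for m in PATTERN.finditer(text)}


def is_labor_and_delivery(text):
    """A review is a labor and delivery review if it contains at least one keyword."""
    return bool(matched_stems(text))


def select_for_analysis(reviews):
    """Keep labor and delivery reviews rated 1 or 5 stars (assumed dict shape)."""
    return [
        r for r in reviews
        if r["stars"] in (1, 5) and is_labor_and_delivery(r["text"])
    ]
```

A simple stem match like this will also pick up unrelated uses (for instance, food being "delivered"), which is one reason a curated keyword list and clinical review of the terms matter.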

Deriving Language Features
After removing stop words (common words of low information content, eg, "the," "as," "a"), we used the MALLET implementation [14] of the machine learning program latent Dirichlet allocation (LDA) to generate 100 topics, following prior work [13]. This machine learning technique automates the identification of co-occurring words whose combination suggests themes or topics [15]. For example, the frequent co-occurrence of "hours," "waiting," "sitting," and "lobby" would define a topic that, on inspection, suggests the theme of long wait times. LDA was used to build a topic model using the corpus of review text; afterward, each review was represented as a weighted mixture of the 100 topics generated from the reviews.
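To make the idea of "a weighted mixture of topics" concrete, the following is a toy collapsed Gibbs sampler for LDA written with the Python standard library only. It is a sketch for intuition and is not the MALLET implementation used in the study. The stop-word list (beyond "the," "as," and "a"), the hyperparameters `alpha` and `beta`, the iteration count, and all function names are assumptions made for illustration.

```python
import random
import re

# Only "the", "as", "a" come from the text; the remaining stop words are assumed.
STOP_WORDS = {"the", "as", "a", "in", "and", "was", "were", "we", "for"}


def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [dict() for _ in range(n_topics)]    # count of word w in topic k
    topic_total = [0] * n_topics                      # total tokens in topic k
    assign = []                                       # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to the conditional probability.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1

    # Each document becomes a smoothed mixture over topics that sums to 1.
    mixtures = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Most frequent words per topic, used for manual labeling.
    top_words = [
        [w for w in sorted(tw, key=tw.get, reverse=True) if tw[w] > 0][:4]
        for tw in topic_word
    ]
    return mixtures, top_words
```

In this sketch, `mixtures[d]` plays the role of the per-review topic weights that are later compared against star ratings, and `top_words[k]` corresponds to the word lists that human readers inspect to name a theme.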

Identifying Differentially Expressed Language Features
Our analysis was aimed at identifying differentially expressed topics in reviews with a 1-star (low) rating versus a 5-star (high) rating, given the bimodal distribution of ratings and based on prior work [11,12]. All statistical analyses were performed in R (version 3.4.1; R Foundation for Statistical Computing). We took a data-driven approach to allow for a more transparent view of the words and phrases that differentiate posts with a high rating (5-star) from those with a low rating (1-star). We isolated the patterns in language topics to obtain correlations in both groups using ordinary least squares (OLS) regression. Treating each review as an observation, OLS regression was performed on standardized LDA-derived variables for each review, with the reviews that received 5 stars labeled as 1 and those that received 1 star labeled as 0, and the LDA topic weights of the written review text as the independent variables. Since the variables were standardized, the OLS regression coefficients can be interpreted as Pearson correlations. Topics with a positive coefficient are therefore associated with 5-star reviews, and topics with a negative coefficient are associated with 1-star reviews. We applied Bonferroni correction and used P<.001 to indicate meaningful correlations; effect size was measured using Pearson r. The most highly correlated topics were labeled independently by two coauthors by examining the top 7 terms in each topic. Discrepancies were adjudicated by consensus with a third coauthor.
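The claim that a standardized OLS coefficient can be read as a Pearson correlation can be checked numerically. The sketch below assumes each topic is regressed on its own (one predictor at a time) and that both the binary rating label and the topic weight are z-scored; the equivalence does not hold in general for a joint regression on many correlated topics. The data values, the function names, and the `alpha` passed to the Bonferroni helper are hypothetical.

```python
import math


def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical example: one topic's weight in six reviews, and whether
# each review was 5-star (1) or 1-star (0).
topic_weight = [0.30, 0.25, 0.05, 0.02, 0.20, 0.01]
five_star = [1, 0, 0, 0, 1, 1]

slope = ols_slope(zscore(topic_weight), zscore(five_star))
r = pearson_r(topic_weight, five_star)
# slope and r agree up to floating-point error.
```

Flipping the labels (treating 1-star as 1) reverses the sign of the coefficient without changing its magnitude, which is consistent with reporting positive r values for themes tied to 1-star reviews.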

Ethical Considerations
The University of Pennsylvania Institutional Review Board deemed the study exempt.

Hospital Reviews
We identified 41,095 reviews from 1569 hospitals listed in the American Hospital Association directory with at least one Yelp rating posted from May 2005 to March 2017. Among those hospitals, 919 (59%) had at least one Yelp rating for labor and delivery services (median of 9), contributing a total of 6523 labor and delivery reviews. The distribution of ratings is shown in Table 2.
Themes correlated with 1-star labor and delivery reviews included the experience of calling the hospital (r=0.33), interactions with reception (r=0.31), complaints to management (r=0.30), telling others to avoid the hospital (r=0.32), a lack of agency among patients (r=0.47), and issues with discharging from the hospital (r=0.32; Table 4).

Compassionate caring staff
We are very blessed to have such an amazing team of nurses. The support and caring was priceless. We most likely will have baby 2 in here. If you looking for a happy medium between home-birth or hospital-birth, this place is the answer.

Table 4. Yelp differential language analysis topics associated with negative (1-star) labor and delivery reviews.

Negative interactions
This review is for the prenatal clinic. I called this morning because I had been advised to do so by my Dr. The lady who answered the phone was extremely rude and unprofessional. Her exact words " and your calling us for????" With the rudest tone I have ever heard from a healthcare professional. I was beyond shocked and will never go to their clinic.
Lack of agency (told, asked, didn't, questions, wasn't, couldn't, rude, talk, telling, upset); r=0.465
Beware!!! Don't entrust this incompetent facility with your life!! Intake form -misspelling of name and incorrect recording of birth date so records could not be found until 4phone calls later…
Being discharged (information, discharge, medications, papers, husband, refused, stated, physician, attending, signed); r=0.321

Communication with hospital
Horrible customer service. Called the operator to inquire about setting up a prenatal appointment to find an OBGYN and she told me to google them to find one I like! Ha what a joke.

Principal Findings
This study found identifiable themes associated with high and low ratings, offering insights into what patients seeking labor and delivery services care about most. Online reviews about hospitals include comments about the experience of labor and delivery care. Although online reviews are not validated and may attract or amplify the most negative comments [13,16], they reflect raw reports from patients unconstrained by pre-established topics.
Positive reviews on the labor and delivery experience overwhelmingly cited compassionate and attentive hospital staff. Nurses were frequently cited as the most important component of the experience. For 5-star reviewers who criticized their experience in any way, caring and helpful nurses and staff almost always made up for the negative aspects of their stay. In addition, 5-star reviews in our study largely referenced positive feelings about hospital staff and the importance of hospital amenities (often citing spa showers, advanced technology, and appealing decor). Prior work reported that, in the patient-provider relationship in an obstetrics and gynecology setting, patients reported greater satisfaction with their health care experience when they had a positive relationship with their care team, which parallels our finding that patients are more satisfied when providers are caring and attentive [17]. Compassion of staff is not a topic measured in HCAHPS surveys. Additionally, HCAHPS and Press Ganey do not include free-text questions; rather, questions are multiple choice.
Negative reviews of labor and delivery included topics that were typically the inverse of those discussed in positive reviews. Raters cited negative interactions and lack of communication with hospital staff, long wait times, and low-quality obstetrics care in 1-star labor and delivery reviews. In a prior review of patient satisfaction in obstetrics care, researchers interviewed patients and compiled a total of 51 items related to patient satisfaction [18]. The list included multiple characteristics related to provider communication style, including compassion/sensitivity, communication, accessibility, support, and positive affirmation of the birthing process. Access to and communication with hospital staff contribute to a more positive patient experience in the context of labor and delivery care. Understanding the common themes of positive and negative experiences may help clinical and operational staff create initiatives and protocols that lead to better patient encounters.

Limitations
This study has several limitations. The American Hospital Association data set represents broader obstetrical programs (eg, the Hospital of the University of Pennsylvania) but may miss subsidiary programs (eg, Penn Ob/Gyn & Midwifery Care).
The bimodal distribution of reviews may amplify the voices of those with strongly positive and strongly negative experiences, muting more nuanced and mixed experiences. Clinical terms and procedures may be described in slang or in other ways that are harder to identify using automated techniques. However, machine learning techniques allow for the analysis of hundreds of thousands of reviews, far more than is feasible with human coders. In addition, Yelp reviews are not validated and may vary in quality and quantity. To counter this, we eliminated reviews "not recommended" by Yelp (a measure indicating a review is likely to be fake). "Not recommended" reviews are determined automatically by Yelp's proprietary algorithm, which considers a number of factors to try to remove fake reviews (eg, one person posting many reviews from the same computer).
In the future, including other online review platforms may provide richer insights. In practice, these data are valuable mainly as a supplemental insight into the patient's psychological experience of their labor and delivery care. Understanding the themes that correlate with high and low ratings may provide a starting point for developing standardized surveys for measuring care.

Conclusions
Transparency of hospital performance data is vital to enhancing patient trust and improving health care delivery. Online rating websites may help foster trust and goodwill between hospitals and their consumers, allow consumers to make more informed decisions, and encourage quality improvements [16,19]. Increasing the validity and scientific rigor of these narrative feedback platforms may increase the value of these patient narratives for further improving obstetrics care in the United States [20,21].