This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
Many researchers have aimed to develop chronic health surveillance systems to assist in public health decision-making. However, many of the resulting digital health solutions cannot explain their decisions and actions to human users.
This study sought to (1) expand our existing Urban Population Health Observatory (UPHO) system by incorporating a semantics layer; (2) cohesively employ machine learning and semantic/logical inference to provide measurable evidence and detect pathways leading to undesirable health outcomes; (3) provide clinical use case scenarios and design case studies to identify socioenvironmental determinants of health associated with the prevalence of obesity; and (4) design a dashboard that demonstrates the use of UPHO in the context of obesity surveillance using the provided scenarios.
The system design includes a knowledge graph generation component that provides contextual knowledge from relevant domains of interest. This system leverages semantics using concepts, properties, and axioms from existing ontologies. In addition, we used the publicly available US Centers for Disease Control and Prevention 500 Cities data set to perform multivariate analysis. A cohesive approach that employs machine learning and semantic/logical inference reveals pathways leading to diseases.
In this study, we present 2 clinical case scenarios and a proof-of-concept prototype design of a dashboard that provides warnings, recommendations, and explanations and demonstrates the use of UPHO in the context of obesity surveillance, treatment, and prevention. While exploring the case scenarios using a support vector regression machine learning model, we found that poverty, lack of physical activity, education, and unemployment were the most important predictive variables that contribute to obesity in Memphis, TN.
The application of UPHO could help reduce health disparities and improve urban population health. The expanded UPHO feature incorporates an additional level of interpretable knowledge to support informed decision-making by physicians, researchers, and health officials at both patient and community levels.
RR2-10.2196/28269
Enhanced health surveillance systems for chronic disease support could mitigate factors that contribute to the rise in morbidity and mortality of diseases such as obesity. Obesity is linked to increased overall mortality and has reached pandemic proportions, being responsible for approximately 2.8 million deaths annually [
Neighborhood factors such as socioenvironmental determinants of health (SDoH) significantly contribute to these statistics [
Health surveillance involves the “ongoing systematic collection, analysis, and interpretation of data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know” [
Many current digital health solutions and electronic health record (EHR) systems do not incorporate machine learning algorithms into their decision-making process, and even when they do, the algorithms cannot adequately explain the suggested decisions and actions to human users [
We implement a UPHO platform as a knowledge-based surveillance system that provides better insight to improve decision-making by incorporating SDoH and providing XAI and interpretability functions [
Expanded Urban Population Health Observatory framework. CDC: US Centers for Disease Control and Prevention; USDA: US Department of Agriculture; KG: knowledge graph; UPHO: Urban Population Health Observatory.
The objectives of this article are to (1) expand UPHO by incorporating a semantics layer, (2) cohesively employ machine learning and semantic/logical inference to provide measurable evidence and detect pathways that lead to undesirable health outcomes, (3) provide clinical case scenarios and design case studies on identifying SDoH associated with obesity prevalence, and (4) provide a dashboard design that demonstrates the use of UPHO in the context of obesity, using the provided case scenario.
The data management domain comprises the data layer. The UPHO collects population-level health and SDoH data and individual-level clinical and demographic data from EHRs through regional registries.
To obtain population-level health data, we used the US Centers for Disease Control and Prevention (CDC) 500 Cities Behavioral Risk Factor Surveillance System, which includes data regarding chronic diseases and their behavioral risk factors [
We extracted population-level SDoH variables that pertain to food insecurity, transportation, and socioeconomic stability at zip code, census tract, census block, and census block group levels from the US Census Bureau 2018 American Community Survey [
The analytics layer pulls raw data from different sources in the data layer and analyzes them to classify records, predict new relations, detect spatial patterns, and calculate new metrics. The analytics layer also performs feature engineering by deriving new metrics and using them to enrich the original data sets.
The stages of the UPHO semantics layer are shown in
We start our semantic analysis using concepts defined in our ontologies and web services to align concepts to actual data resources, allowing us to construct a population knowledge graph structure that abides by an ontology and contains both data and concepts [
Urban Population Health Observatory semantic layer framework.
An effective explainable system accounts for the target user group (eg, physician, researcher). Knowledge of the end user is very important for the delivery of decisions, recommendations, and actions. Each analytics and semantics layer contains an explainability component that can be leveraged in the uppermost health applications layer. To maintain features such as data integration, XAI, and interpretability, we must achieve interoperability by using semantics and ontologies. Explanations adaptable to the user can decrease errors in interpretation by enhancing the interpretability of outcomes and findings.
The UPHO platform can be used as a basis to develop several applications, some of which we have already developed, including dashboards [
The following sections present 2 clinical case scenarios that focus on a physician and a researcher as users to demonstrate the methodology used in the knowledge and intelligence domain layers and the corresponding dashboard design in the application.
Scenario 1: A physician seeks an effective intervention for an adult African American patient diagnosed with obesity. The physician focuses on how SDoH in the patient’s neighborhood can influence the doctor’s management plans.
Scenario 2: A researcher investigating the impact of SDoH on obesity seeks an effective intervention for the adult obese populations in Memphis, TN.
We trained a support vector regression (SVR) machine learning model [
Summary statistics for obesity and related risk factors in Memphis, TN, census tracts (n=178).
Features | Operationalization | Original, mean (SD) | Training, mean (SD) | Test, mean (SD) |
Obesity | Model-based estimate for crude prevalence of obesity among adults aged ≥18 years, 2018 | 37.50 (7.84) | 37.42 (7.54) | 37.97 (6.95) |
Low access to supermarket | Count of low-income population living more than a half mile from a supermarket in the census tract | 1382.20 (108.37) | 1345.68 (967.83) | 1616.17 (1120.23) |
Black | Percentage of population that is Black or African American | 63.17 (32.70) | 62.22 (33.04) | 63.72 (31.88) |
Poverty | Percentage of population living below the federal poverty line | 28.65 (16.28) | 28.27 (16.18) | 31.06 (17.06) |
Unemployment | Percentage of unemployed population | 15.73 (9.31) | 15.97 (9.67) | 14.16 (6.52) |
No high school diploma | Percentage of population aged 25 years or older without a high school diploma | 10.38 (6.59) | 10.23 (6.70) | 11.35 (5.89) |
Lack of physical activity | Model-based estimate for crude prevalence of lack of physical activity among adults aged ≥18 years, 2018 | 36.16 (9.80) | 35.97 (9.79) | 37.34 (9.99) |
Crime | Crime rate per thousand people | 350.20 (126.26) | 160.99 (337.65) | 111.93 (80.40) |
Lack of access to insurance | Model-based estimate for crude prevalence of lack of insurance among adults aged ≥18 years, 2018 | 20.21 (6.78) | 20.10 (6.81) | 20.88 (6.67) |
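The table above reports the mean (SD) of each feature for the original, training, and test sets. A minimal sketch of how such a summary could be produced is given here. The 80/20 split ratio, the random seed, and the prevalence values are illustrative assumptions and do not come from the study.

```python
import random
import statistics


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mean_sd(values):
    """Return (mean, sample standard deviation), each rounded to 2 decimals."""
    return round(statistics.mean(values), 2), round(statistics.stdev(values), 2)


# Hypothetical obesity prevalence values (%), one per census tract.
obesity = [31.2, 44.8, 37.5, 29.9, 41.0, 35.6, 39.3, 46.1, 33.4, 38.7]

# Assumed 80/20 split; the article does not state the ratio in this section.
train, test = split_rows(obesity, test_fraction=0.2, seed=42)

for label, values in (("original", obesity), ("training", train), ("test", test)):
    print(label, mean_sd(values))
```

Comparing the three columns in this way is a quick check that the training and test sets resemble the full data set.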
We used the following ordered steps to generate the semantics layer knowledge graph from concepts defined in our domain ontologies.
We use concepts, relations, and axioms from domain ontologies to construct a preliminary population knowledge graph. For our scenario, we start by adding a dummy node that represents either a patient or a population (
We populate the generated graph structure with evidence from the data layer. For instance, our data set contains a variable that shows the prevalence of obesity as a percentage metric in specific neighborhoods. We use that information to add edges to our graph that link obesity (as a disease) to prevalence (as a metric).
We further enrich and refine the initial graph by performing knowledge engineering using the logical reasoner (
After performing the logical inference on the initial graph structure, we incorporate new nodes and edges in the graph corresponding to new concepts (eg, the lackOfPhysicalActivity concept from the COPE ontology) or new relations (eg,
To gather the most important information from this graph, a user can trace a specific pathway based on both logical inference and machine learning results. The red arrows in
Knowledge graph that links concepts defined in domain ontologies (eg, GISO: CensusTract) to data resources stored in databases (eg, percentage Black population) or those derived from the analytics layer. The upper part of the figure shows the nodes and edges produced through semantic inference during the knowledge engineering phase. The lower part of the figure shows the nodes and edges added through ML analysis during the feature engineering phase. GISO: geographical information system ontology; HIO: health indicators ontology; ACESO: adverse childhood experiences ontology; COPE: Childhood Obesity Prevention (Knowledge) Enterprise; DO: disease ontology.
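The graph construction and pathway tracing described above can be sketched as a small triple store with a breadth-first search. This is an illustration only, not the UPHO implementation. The `KnowledgeGraph` class and the relation names `livesIn`, `hasMetric`, and `isMetricOf` are hypothetical; only `isPredictorOf` and the node labels are taken from the text.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Directed graph stored as subject -> [(predicate, object)] edges."""

    def __init__(self):
        self._out = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._out[subject].append((predicate, obj))

    def trace(self, start, goal):
        """Return the shortest list of triples linking start to goal, or None."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            for predicate, obj in self._out.get(node, []):
                if obj not in seen:
                    seen.add(obj)
                    queue.append((obj, path + [(node, predicate, obj)]))
        return None


kg = KnowledgeGraph()
# Preliminary structure from ontology concepts (relation names are assumed).
kg.add("individual:Patient", "livesIn", "10300:CensusTract")
# Evidence from the data layer: a metric attached to the tract.
kg.add("10300:CensusTract", "hasMetric", "%PopWLackOfPhysicalActivity:Metric")
# Edge contributed by the analytics layer.
kg.add("%PopWLackOfPhysicalActivity:Metric", "isPredictorOf", "%ObesityPrevalence:Metric")
kg.add("%ObesityPrevalence:Metric", "isMetricOf", "Obesity:Disease")

pathway = kg.trace("individual:Patient", "Obesity:Disease")
for triple in pathway:
    print(" -> ".join(triple))
```

A production system would more likely store the triples in an RDF store and query them with SPARQL; the in-memory structure here only shows the idea of following edges from a patient node to a disease node.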
Generic rule axioms
COPE:lackOfPhysicalActivity
%ObesityPrevalence:Metric
Obesity:Disease
Facts
individual:Patient
“10300”:CensusTract
“10300”:CensusTract
“10300”:CensusTract
“10300”:CensusTract
Feature engineered through multivariate analysis
%PopWLackOfPhysicalActivity:Metric
Logical reasoning
individual:Patient
individual:Patient
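The interplay of rule axioms, facts, and logical reasoning listed above can be illustrated with a small forward-chaining loop. The rule below and the relation names `livesIn`, `hasHighPrevalenceOf`, and `hasElevatedRiskOf` are hypothetical stand-ins, since the actual axioms are not reproduced in this text. The sketch shows only the general mechanism of deriving new patient-level statements from tract-level facts.

```python
def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new triples are derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Materialize the rule output before mutating the fact set.
            for new_fact in list(rule(known)):
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


def neighborhood_risk_rule(known):
    """Hypothetical axiom: livesIn(p, t) and hasHighPrevalenceOf(t, d)
    implies hasElevatedRiskOf(p, d)."""
    for person, relation, tract in known:
        if relation != "livesIn":
            continue
        for subject, relation2, disease in known:
            if subject == tract and relation2 == "hasHighPrevalenceOf":
                yield (person, "hasElevatedRiskOf", disease)


facts = {
    ("individual:Patient", "livesIn", "10300:CensusTract"),
    ("10300:CensusTract", "hasHighPrevalenceOf", "Obesity:Disease"),
}

inferred = forward_chain(facts, [neighborhood_risk_rule])
for triple in sorted(inferred - facts):
    print("inferred:", triple)
```

An ontology reasoner performs the same kind of closure over formally stated axioms; the hand-written rule function here is only a readable substitute.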
No ethics review board assessment was required for this study because we used publicly available data.
The significant Spearman rank coefficient and VIF of the 7 features included in this study are shown in
Spearman rank coefficient and variance inflation factor for each feature.
Features | Spearman rank coefficient | VIFa |
Low access to supermarket | 0.37 | 1.70 |
Black | 0.77 | 2.80 |
Poverty | 0.83 | 3.66 |
Unemployment | 0.73 | 3.02 |
No high school diploma | 0.81 | 3.55 |
Lack of physical activity | 0.92 | 8.82 |
Crime | 0.37 | 1.68 |
aVIF: variance inflation factor.
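For readers unfamiliar with the two statistics in this table, the sketch below computes a Spearman rank coefficient as the Pearson correlation of average ranks, and a VIF from the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one feature on the others. The input values are hypothetical and the functions are written for illustration; they are not the code used in the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def vif_from_r_squared(r_squared):
    """Variance inflation factor for a feature whose regression on the
    remaining features has coefficient of determination r_squared."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical tract-level values (%), for illustration only.
poverty = [12.0, 35.5, 28.1, 40.2, 19.7, 31.0]
obesity = [30.1, 41.3, 36.8, 39.9, 33.2, 42.0]
print(round(spearman(poverty, obesity), 2))
print(vif_from_r_squared(0.5))
```

A VIF near 1 indicates little collinearity with the other features, whereas larger values indicate that a feature is largely explained by the others.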
Support vector regression data set–level feature importance score.
Features | SVRa feature importance |
Low access to supermarket | 4.39 |
Black | 68.20 |
Poverty | 78.60 |
Unemployment | 70.16 |
No high school diploma | 73.41 |
Lack of physical activity | 100 |
Crime | 0 |
aSVR: support vector regression.
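The scores in this table run from 0 (crime) to 100 (lack of physical activity), which is consistent with raw importance values that were rescaled between the least and most important feature. The text does not state how the scores were computed or scaled, so the following is a sketch under that assumption, using invented raw values.

```python
def rescale_0_100(raw_scores):
    """Min-max rescale a {feature: score} mapping to the range 0 to 100."""
    lowest = min(raw_scores.values())
    highest = max(raw_scores.values())
    if highest == lowest:
        # All features equally important: no spread to rescale.
        return {name: 0.0 for name in raw_scores}
    span = highest - lowest
    return {name: 100.0 * (score - lowest) / span for name, score in raw_scores.items()}


# Hypothetical raw importance values (for example, loss increase after
# permuting each feature); these numbers are not from the study.
raw = {"crime": 0.02, "poverty": 0.30, "lack_of_physical_activity": 0.42}
scaled = rescale_0_100(raw)
for name, score in sorted(scaled.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

Under this scaling, a score of 0 means only that the feature ranked lowest among those compared, not that it carries no information.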
The Shapley Additive Explanations (SHAP) value plot of the feature contribution (unscaled) for the patient’s neighborhood (census tract:10300). The x-axis represents the SHAP’s value, and the y-axis represents the features. The lack of physical activity and poverty had the largest positive (increase) contributions to obesity prevalence in the patient’s neighborhood.
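To make the notion of a per-feature contribution concrete, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. This brute-force enumeration is feasible only for a handful of features; SHAP implementations rely on approximations or model-specific shortcuts. The linear model, its coefficients, and the tract and baseline values are invented for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance`, relative to `baseline`.

    Each value is the feature's average marginal contribution over all
    orderings in which features are switched from baseline to instance.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(features):
    """Hypothetical linear model of obesity prevalence (%)."""
    return 10.0 + 0.5 * features["lack_of_physical_activity"] + 0.25 * features["poverty"]


tract = {"lack_of_physical_activity": 40.0, "poverty": 32.0}      # hypothetical tract
city_mean = {"lack_of_physical_activity": 36.0, "poverty": 28.0}  # hypothetical baseline

phi = shapley_values(toy_model, tract, city_mean)
print(phi)
# Additivity: contributions sum to the prediction minus the baseline prediction.
print(sum(phi.values()), toy_model(tract) - toy_model(city_mean))
```

A positive value means the feature pushes the predicted prevalence for the tract above the baseline prediction, which is how the positive contributions in the plot are read.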
In this section, we describe the semantics feature provided by UPHO through a proof-of-concept prototype that will display the different features of the expanded system by implementing the clinical scenarios described in the previous section.
First, the user will sign into the UPHO platform dashboard, which will determine their specific role and establish the proper access permissions. The user will make the selections from the following menu items:
S1. Select an outcome of interest (eg, obesity prevalence, cancer)
S2. Select analytics aim
S3. Select level of analysis and enter address/location (patient’s address [patient-level], city, county, or state [population-level])
S4. Select geographical level of granularity (eg, zip code, census tract)
S5. Select SDoH domain-specific risk factors
After making these selections, the system will present on-demand explanations of risk level calculations, based on the selected level of geographic granularity.
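The menu selections S1 to S5 can be thought of as a single validated query object passed from the dashboard to the analysis layers. The sketch below is one possible representation; the `DashboardQuery` class, its field names, and the example address are hypothetical and are not part of the UPHO code base.

```python
from dataclasses import dataclass

LEVELS = {"patient", "population"}
GRANULARITIES = {"zip code", "census tract", "census block", "census block group"}


@dataclass(frozen=True)
class DashboardQuery:
    outcome: str               # S1, eg "obesity prevalence"
    analytics_aim: str         # S2, eg "causal pathway analysis"
    level: str                 # S3, "patient" or "population"
    location: str              # S3, patient address or city/county/state
    granularity: str           # S4, eg "census tract"
    sdoh_factors: tuple = ()   # S5, optional risk factors to highlight

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown level of analysis: {self.level!r}")
        if self.granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {self.granularity!r}")
        if not self.location.strip():
            raise ValueError("a location or address is required")


# Scenario 1 style query; the address is a placeholder.
query = DashboardQuery(
    outcome="obesity prevalence",
    analytics_aim="causal pathway analysis",
    level="patient",
    location="123 Example St, Memphis, TN",
    granularity="census tract",
    sdoh_factors=("poverty", "lack of physical activity"),
)
print(query)
```

Validating the selections once, at construction time, keeps the downstream analysis code free of repeated input checks.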
The physician selects “obesity prevalence” as the outcome of interest (S1), “causal pathway analysis” as the analytics aim (S2), “patient-level” as the level of interest (entering the patient’s address, S3), and “census tract” as the geographical level of granularity (S4). The system provides risk-level calculations and descriptive statistics based on the census tract of the patient’s address. The physician also has the option to select a particular SDoH of interest in S5, in which case the system will highlight these nodes in the graph. Finally, the user selects “Explore” to generate the results and a corresponding knowledge graph. These results are tailored to the user’s interest in patient-level analysis and provide an explanatory overview of the analysis results (
The dashboard of the Urban Population Health Observatory displays a physician user interested in obesity prevalence in her patient’s neighborhood with an overview of analysis results (A), explanations displayed when user hovers over a particular pathway (B), knowledge displayed when user hovers over a particular node (C), and summary of recommendations and knowledge (D). ACESO: adverse childhood experiences ontology; GISO: geographical information system ontology; DO: disease ontology; HIO: health indicators ontology.
Here the researcher has access to more features. The researcher explores the causal pathway analysis aim in a population-level analysis and enters Memphis, TN, as a location of interest at the census tract–level (S1-S3), as shown in
The dashboard of the Urban Population Health Observatory displays a researcher as the user interested in obesity prevalence in Memphis, TN, with univariate regression plot (A), multivariate analysis (B), and (C) which contains an overview of analysis results (a), explanations displayed when user hovers over a particular pathway (b), knowledge displayed when user hovers over a particular node (c), and summary of recommendations and knowledge (d). ACESO: adverse childhood experiences ontology; GISO: geographical information system ontology; DO: disease ontology; HIO: health indicators ontology.
The graph part of the dashboard can serve as a tool for researchers and physicians to semantically explain the recommendations that we made about a specific patient or population. The current version of the graph provides 2 different visual cues, as follows.
Tracing pathways on the graph provides visual cues. The red arrows in
Clicking on a node or edge on the graph displays analysis results or knowledge. The user can hover over a certain edge (eg, lackOfPhysicalActivity isPredictorOf ObesityPrevalence;
UPHO’s metrics can be implemented into the backend of EHR systems (eg, Epic), and the results of those metrics can be rendered on the EHR interface in the form of risk scores on dashboards with severity indicators based on thresholds. Physicians can examine these metrics at the population level or individual patient level. UPHO can alternatively be used in a standalone approach by allowing a physician to extract more details about a single patient by providing the patient’s address or a population of patients by providing their city, state, or county. The input is coded to a geographical level of granularity that can be aligned with the population-level data to gain insights into the patient’s environment.
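The threshold-based severity indicators mentioned above can be sketched as a simple lookup. The cut points and labels below are hypothetical placeholders; the text does not specify which thresholds UPHO uses.

```python
import bisect


def severity(score, thresholds=(25.0, 50.0, 75.0),
             labels=("low", "moderate", "high", "very high")):
    """Map a risk score to a severity label.

    A score equal to a threshold falls into the higher band.
    `labels` must contain one more entry than `thresholds`.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("labels must have exactly one more entry than thresholds")
    return labels[bisect.bisect_right(thresholds, score)]


# Hypothetical tract-level risk scores on a 0 to 100 scale.
for tract, score in {"tract A": 18.4, "tract B": 50.0, "tract C": 91.2}.items():
    print(tract, score, severity(score))
```

Keeping the thresholds as a parameter allows each deployment, or each outcome of interest, to use its own clinically agreed cut points.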
Previous studies provided evidence that socially disadvantaged communities are disproportionately affected by chronic diseases such as obesity [
The incorporation of semantics provides the user with an additional layer of explainability and interpretability, which could decrease errors in intervention or treatment due to misinterpretation or misunderstanding. The semantics layer can also use ontologies to overcome the challenge of scattered data sources, thereby assisting in the achievement of interoperability, which will be used to maintain features such as data integration, XAI, and interpretability. We apply logical reasoners to extract and supply knowledge despite limited data.
Similar chronic disease surveillance systems [
We followed the conceptual framework for UPHOs [
One of the major limitations of UPHO is that it collects population data, so neighborhood or population assumptions are made for an individual in a clinical setting. For instance, individuals or patients who live in a particular population or neighborhood might not have the same characteristics as other individuals residing in the same neighborhood or population. However, our platform provides an end-to-end approach to examining the environment in which one resides and incorporates information that is important for the implementation of effective interventions for a given disease.
Future work will focus on further development of the UPHO platform so that it can enable timely, insight-driven decisions and inform immediate or long-term health policy responses [
This study leveraged semantic technology and presented a proof-of-concept prototype design for our knowledge-based surveillance system, UPHO, which aims to reduce health disparities and improve population health. The expanded feature incorporates another level of interpretable knowledge needed to inform the decision-making of physicians, researchers, and health officials at the community level. Incorporating XAI helps with the explainability and interpretability of the relevant data, information, and knowledge. Users who are not equipped with domain knowledge could extract common sense knowledge from a system that incorporates XAI [
ACE: adverse childhood experience
ACESO: adverse childhood experiences ontology
CDC: US Centers for Disease Control and Prevention
COPE: Childhood Obesity Prevention (Knowledge) Enterprise
DO: disease ontology
EHR: electronic health record
GISO: geographical information system ontology
HIO: health indicators ontology
mHealth: mobile health
RMSE: root mean square error
SDoH: socioenvironmental determinants of health
SHAP: Shapley Additive Explanations
SVR: support vector regression
UPHO: Urban Population Health Observatory
VIF: variance inflation factor
WHO: World Health Organization
XAI: explainable artificial intelligence
Funding for this study was provided by the University of Tennessee Health Science Center.
None declared.