Identification of Risk Groups for and Factors Affecting Metabolic Syndrome in South Korean Single-Person Households Using Latent Class Analysis and Machine Learning Techniques: Secondary Analysis Study

Background: The rapid increase of single-person households in South Korea is leading to an increase in the incidence of metabolic syndrome, which causes cardiovascular and cerebrovascular diseases, due to lifestyle changes. It is necessary to analyze the complex effects of metabolic syndrome risk factors in South Korean single-person households, which differ from one household to another, considering the diversity of single-person households.
Objective: This study aimed to identify the factors affecting metabolic syndrome in single-person households using machine learning techniques and categorically characterize the risk factors through latent class analysis (LCA).
Methods: This cross-sectional study included 10-year secondary data obtained from the National Health and Nutrition Examination Survey (2009-2018). We selected 1371 participants belonging to single-person households. Data were analyzed using SPSS (version 25.0; IBM Corp), Mplus (version 8.0; Muthen & Muthen), and Python (version 3.0; Plone & Python). We applied 4 machine learning algorithms (logistic regression, decision tree, random forest, and extreme gradient boost) to identify important factors and then applied LCA to categorize the risk groups of metabolic syndromes in single-person households.
Results: Through LCA, participants were classified into 4 groups (group 1: intense physical activity in early adulthood, group 2: hypertension among middle-aged female respondents, group 3: smoking and drinking among middle-aged male respondents, and group 4: obesity and abdominal obesity among middle-aged respondents). In addition, age, BMI, obesity, subjective body shape recognition, alcohol consumption, smoking, binge drinking frequency, and job type were investigated as common factors that affect metabolic syndrome in single-person households through machine learning techniques. Group 4 was the most susceptible and at-risk group for metabolic syndrome (odds ratio 17.67, 95% CI 14.5-25.3; P<.001), and obesity and abdominal obesity were the most influential risk factors for metabolic syndrome.
Conclusions: This study identified risk groups and factors affecting metabolic syndrome in single-person households through machine learning techniques and LCA. Through these findings, customized interventions for each generational risk factor for metabolic syndrome can be implemented, leading to the prevention of metabolic syndrome, which causes cardiovascular and cerebrovascular diseases. In conclusion, this study contributes to the prevention of metabolic syndrome in single-person households by providing new insights and priority groups for the development of customized interventions using classification.

The reasons for this rising trend include the large number of unmarried people and late marriages, resulting in changes in marital values, divorce, separation, high unemployment, and diverse and complex social factors in larger cities [5]. On the basis of individuals' sociodemographic characteristics and lifestyle, single-person households are more susceptible to exposure to high-risk health behaviors, such as smoking and alcohol consumption, as well as experiences of depression and stress, than multiperson households [6][7][8].
Adult single-person households are known to show distinct differences from multiperson households in terms of demographic characteristics and living habits. For instance, it has been reported that single-person households are more susceptible to health problems than multiperson households [9][10][11]. In addition, compared with multiperson households, single-person households are more exposed to high-risk health behaviors, such as smoking and drinking, and experience more depression and stress [12,13].
These sociodemographic characteristics and lifestyles indicate that single-person households have a higher prevalence of metabolic syndrome and chronic diseases, such as hypertension, diabetes, dyslipidemia, arthritis, asthma, myocardial infarction, and stroke [14][15][16].
Metabolic syndrome, which involves at least 3 clinical characteristics among hypertension, hyperglycemia, hypertriglyceridemia, low levels of high-density lipoprotein, and abdominal obesity, leads to cardiovascular disease and a risk of diabetes [6,15]. It also increases the occurrence of myocardial infarction, stroke, and dementia [1,6,11,17]; therefore, it is important to decrease the incidence of metabolic syndrome to prevent chronic cardiac and cerebrovascular diseases and reduce the mortality rate [18,19].
It is also necessary to assess the morbidity associated with the disease and develop customized medications and guidelines to manage its risk factors [20]. Previous studies have demonstrated that risk factors include age, sex, obesity, smoking, a lack of physical activity, and education [4][5][6][7]. Although single-person households include various characteristics, their influences on metabolic syndrome may differ from those of multiperson households and across age groups [1,3]. This necessitates a more holistic and systematic understanding of the metabolic syndrome risk factors in single-person households [1], as each risk factor may have a discriminatory or an interrelated effect on metabolic syndrome depending on individual characteristics [21].
Latent class analysis (LCA), a person-centered approach, examines the multidimensional characteristics of human behavior; it contrasts with a conventional variable-centered approach, which describes predictors' relative influence on outcome variables [22][23][24]. In addition, identifying the patient type and characteristics is advantageous in predicting the disease, and a customized intervention program can be planned according to individual risk factor vulnerabilities and diagnosis [25][26][27]. Machine learning refers to a method of automatically extracting general rules or new knowledge by implementing learning ability, one of the unique intelligence functions of humans, through machines and analyzing the given data [28,29]. In this study, the factors affecting metabolic syndrome in South Korean single-person households were analyzed using logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boost (XGBoost). LR, DT, and RF are the most commonly used machine learning techniques, and XGBoost is a machine learning technique that has recently emerged [27][28][29].
This study aimed to identify the factors affecting metabolic syndrome in single-person households using machine learning with large-scale health data from the National Health and Nutrition Examination Survey (NHANES) [30]. However, few studies have applied machine learning and LCA to identify the factors affecting metabolic syndrome in single-person households [23,24,30]. The contribution of this study lies not in finding an exact answer but in finding new variables or overlooked aspects through basic or translational research for clinical application. The core value of translational research lies in its effort to apply basic research to clinical practice with a high success rate at a low cost in a short period.
Hence, this study was designed to establish basic data to develop customized interventions by categorizing and characterizing metabolic syndrome risk factors in South Korean single-person households using machine learning techniques and LCA.

Purpose of This Study
This study used data from the NHANES spanning 10 years (2009-2018), applied machine learning techniques to identify the factors that affect the occurrence of metabolic syndrome, and applied LCA to classify single-person households. The purpose of this study was to categorize risk groups and identify risk factors for metabolic syndrome in South Korean single-person households.

Research Design
This study was a secondary data analysis that used machine learning techniques and LCA to categorize metabolic syndrome risk factors to identify the factors influencing the occurrence of metabolic syndrome in single-person households. The overall flowchart of the study is shown in Figure 1.

Participants of the Study
This study used raw data from the 10-year NHANES (2009-2018) conducted by the Ministry of Health and Welfare and the Korea Centers for Disease Control and Prevention for a secondary data analysis. The South Korean NHANES generated data representative of the South Korean population using stratified cluster sampling. The total number of respondents was 83,294, among whom there were 1376 (1.65%) single-person households, and 79,717 (95.71%) households with ≥2 persons. Of the 1376 single-person households, 1371 (99.64%) were finally selected as study participants, excluding 5 households because of missing data and older age.

General Characteristics
We selected participants with the following characteristics: sex (male or female), age (early adulthood, ie, 19-39 y of age, and middle adulthood, ie, 40-64 y of age), educational level (lower than high school to higher than an undergraduate [4-year] college degree), marital status (married or unmarried), income level, and economic activity status (active or inactive).
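The age grouping described above can be sketched as a small helper. This is an illustrative sketch only: the function name `age_group` and the decision to raise an error for ages outside 19-64 years are assumptions, not the authors' code.

```python
def age_group(age: int) -> str:
    """Map age in years to the two adulthood categories used in this study.

    Hypothetical helper: the cutoffs (19-39 and 40-64) come from the text;
    the error for out-of-range ages is an assumption made for this sketch.
    """
    if 19 <= age <= 39:
        return "early adulthood"
    if 40 <= age <= 64:
        return "middle adulthood"
    raise ValueError(f"age {age} is outside the 19-64 study range")


if __name__ == "__main__":
    for a in (19, 39, 40, 64):
        print(a, age_group(a))
```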

Eating Habits
We administered a questionnaire to determine how frequently respondents dined out (5 times/wk, 1-4 times/wk, and <3 times/mo) and their dietary lifestyle ("good" or "bad").

Mental Health
We assessed respondents' awareness of stress (recognition or nonrecognition) and diagnoses of depression (diagnosed or undiagnosed).

Use of Medical Institutions and Community Services
We classified participants based on health, cancer, and oral cavity using "yes" or "no" responses and included the type of health insurance (local, employment-related, or uninsured or self-paying medical care) and subscription to private medical insurance (registered or unregistered for private medical insurance).

Metabolic Syndrome
We determined the presence of metabolic syndrome based on the National Cholesterol Education Program-Adult Treatment Panel 3 diagnostic criteria [31] and whether respondents possessed ≥3 of the following 5 criteria: hypertension, hyperglycemia, hypertriglyceridemia, hypo-high-density lipoproteinemia, and abdominal obesity. Waist circumference, triglycerides, high-density lipoprotein cholesterol levels, final systolic and diastolic blood pressures (mean of the second and third measurements), and fasting blood glucose level were used to determine the existence of metabolic syndrome.
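The "≥3 of 5 criteria" rule and the blood pressure averaging can be illustrated with a minimal sketch. The criterion flags are taken as already-computed booleans, because the numeric cutoffs are not stated in this text; the names `CRITERIA`, `final_blood_pressure`, and `has_metabolic_syndrome` are hypothetical and not taken from the survey or the authors' code.

```python
from statistics import mean

# Hypothetical flag names for the 5 diagnostic criteria listed in the text.
CRITERIA = (
    "hypertension",
    "hyperglycemia",
    "hypertriglyceridemia",
    "hypo_hdl",            # hypo-high-density lipoproteinemia
    "abdominal_obesity",
)


def final_blood_pressure(readings):
    """Return the mean of the second and third of three readings."""
    if len(readings) < 3:
        raise ValueError("three blood pressure readings are required")
    return mean(readings[1:3])


def has_metabolic_syndrome(flags):
    """True when at least 3 of the 5 criterion flags are met."""
    return sum(bool(flags[name]) for name in CRITERIA) >= 3


if __name__ == "__main__":
    person = {
        "hypertension": True,
        "hyperglycemia": False,
        "hypertriglyceridemia": True,
        "hypo_hdl": False,
        "abdominal_obesity": True,
    }
    print(final_blood_pressure([140, 128, 132]), has_metabolic_syndrome(person))
```

Keeping the threshold logic separate from the counting rule means the cutoffs can be swapped without touching the "3 of 5" definition.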

Data Collection Method
We submitted our affiliation and purpose of using the data to the Korea Disease Control and Prevention Agency's data portal and then used the data, which contained no personal information.

Data Preprocessing
After sampling and merging the 10-year data from the NHANES, we conducted a data-cleansing process, and the distribution of variables was confirmed using the missing values function of the SPSS software (version 25.0; IBM Corp) to identify both ideal values and missing data [32].
In this study, data from a total of 83,294 individuals who participated in the 10-year (2009-2018) NHANES and application year survey were extracted. After extracting cases where the number of households (code name=cfam) was "1," out of 83,294 households, we found 3577 (4.29%) single-person households from 2009 to 2018. Of these 3577 individuals from single-person households, 1371 (38.33%) were finally selected after excluding older adults (aged ≥65 y) and those with missing values.
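The selection steps above (household size of 1, exclusion of those aged ≥65 years, exclusion of records with missing values) can be sketched as a simple filter. Only the variable name `cfam` comes from the text; the `age` key, the use of `None` to mark a missing value, and the function name are assumptions made for illustration.

```python
def select_single_person_adults(records):
    """Keep single-person-household records of people under 65 with no missing values.

    `records` is a list of dicts, one per respondent. `cfam` is the household
    size variable named in the text; `age` and None-as-missing are assumptions.
    """
    selected = []
    for row in records:
        if row.get("cfam") != 1:
            continue  # not a single-person household
        if any(value is None for value in row.values()):
            continue  # drop records with any missing value
        if row["age"] >= 65:
            continue  # exclude older adults
        selected.append(row)
    return selected


if __name__ == "__main__":
    sample = [
        {"cfam": 1, "age": 34, "bmi": 22.1},
        {"cfam": 3, "age": 41, "bmi": 25.0},
        {"cfam": 1, "age": 70, "bmi": 23.4},
        {"cfam": 1, "age": 52, "bmi": None},
    ]
    print(select_single_person_adults(sample))
```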
After extracting the 10 years of data from the NHANES, we performed data cleansing and used the missing values function of SPSS to check for outliers and missing values. A total of 1182 cases with initial or intermediate defects in the overlapping files of the 10-year NHANES data were identified and deleted. For the analysis, age, a continuous variable, was converted into a categorical variable, and a metabolic syndrome variable was newly created to indicate the presence of at least 3 of the following: hypertension, hyperglycemia, hypo-high-density lipoproteinemia, hypertriglyceridemia, and abdominal obesity. Metabolic syndrome was analyzed according to the National Cholesterol Education Program-Adult Treatment Panel 3 diagnostic criteria [31].
In this study, when the factors influencing metabolic syndrome were analyzed by applying LR, DT, RF, and XGBoost, the 10-year NHANES data contained a total of 7450 variables. Of these, 390 variables were common to all survey years from 2009 to 2018, and the data were split using the 10-fold cross-validation method. After excluding 154 items, including the measurements underlying the 5 diagnostic criteria for metabolic syndrome and items unrelated to the study participants, the remaining variables were analyzed as factors influencing the occurrence of metabolic syndrome in single-person households, with the variable MetS (metabolic syndrome or not) as the outcome.
We applied LR, DT, RF, and XGBoost algorithms among machine learning techniques with a total of 7450 variables of the 10-year NHANES data to analyze the influencing factors of metabolic syndrome.
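The 10-fold cross-validation split mentioned above can be illustrated with a standard-library sketch. It shows only how the sample indices are partitioned; the function name, the shuffling seed, and the interleaved fold assignment are assumptions, and the authors' actual splitting code is not described in the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds of near-equal size, and
    each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 1371 is the number of study participants reported in the text.
    for fold_number, (train, test) in enumerate(k_fold_indices(1371), start=1):
        print(fold_number, len(train), len(test))
```

Each of the 4 algorithms would be fitted on the training indices and evaluated on the held-out fold, so every participant is used for evaluation exactly once.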

Ethical Considerations
We performed data analysis after obtaining approval from Keimyung University's ethics committee for an exemption from deliberation (institutional review board number 40525-202008-HR-043-01) because we used existing data or published documents instead of directly engaging with participants.
In addition, of the 1371 respondents, 602 (43.91%) and 778 (56.75%) respondents indicated that their father and mother had an elementary school education, respectively. Regarding mental health, 930 (67.83%) of the 1371 respondents were not aware of stress, and 1247 (90.96%) of the 1371 respondents were not diagnosed with depression (Table 1).

Analysis of the Factors Influencing Metabolic Syndrome Using Machine Learning Techniques
We observed 390 common variables from 10 years of merged data (2009-2018) of the NHANES. Among them, 154 were excluded because they were not relevant to the study, and the remaining 236 variables were analyzed to assess the factors affecting metabolic syndrome in single-person households.
Overall, 4 algorithms were applied in the analysis: LR, DT, RF, and XGBoost. The importance of the variables age, BMI, and subjective recognition of body type as extracted from LR was 212.56, 173.26, and 138.01, respectively. Furthermore, the importance of the variables BMI and dietary condition as extracted from DT was 35.50, 7.07, and 5.53, respectively. The importance of the variables BMI, obesity, and age as extracted from RF was 7.07, 2.99, and 2.80, respectively. Finally, the importance of the variables status of drinking, weight control, and age as extracted from XGBoost was 6.34, 5.81, and 3.06, respectively (Table 2). To summarize, we found age, BMI, obesity, and the subjective recognition of body type to be the most important common variables.
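One simple way to see which variables recur across algorithms is to count how many models list each variable among their top-ranked predictors. The sketch below uses only the variable names reported in the paragraph above (the DT list is reproduced as given, with 2 names); the counting rule and the `min_models` threshold are assumptions for illustration and are not the authors' stated procedure.

```python
from collections import Counter

# Top-ranked variables per algorithm, as named in the text above.
top_variables = {
    "LR": ["age", "BMI", "subjective recognition of body type"],
    "DT": ["BMI", "dietary condition"],
    "RF": ["BMI", "obesity", "age"],
    "XGBoost": ["status of drinking", "weight control", "age"],
}


def shared_variables(rankings, min_models=2):
    """Return variables that appear in the top list of at least `min_models` models."""
    counts = Counter(
        variable for names in rankings.values() for variable in set(names)
    )
    return {variable: n for variable, n in counts.items() if n >= min_models}


if __name__ == "__main__":
    print(shared_variables(top_variables))
```

Because importance scores from different algorithms are on different scales, comparing ranks or membership in a top list is safer than comparing the raw values directly.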

Determining the Number of Latent Class Layers
Four indices were used to assess the goodness of fit of the LCA models: the Bayesian information criterion, the sample size-adjusted Bayesian information criterion, the Lo-Mendell-Rubin adjusted likelihood ratio test, and the bootstrapped likelihood ratio test. We determined the number of classes by checking each model's goodness-of-fit indices. In particular, we increased the number of classes, as illustrated in Table 3, and used several influencing factors to reveal the presence of metabolic syndrome in single-person households, finally deciding on 4 latent classes.
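The two information criteria named above follow standard formulas: BIC = -2 log L + p ln(n), and the sample size-adjusted BIC replaces n with (n + 2)/24. The sketch below computes both and picks the class count with the lowest value. The log-likelihoods and parameter counts in `hypothetical_fits` are invented for illustration only and are not the values in Table 3.

```python
import math


def bic(log_likelihood, n_params, n):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n)


def sabic(log_likelihood, n_params, n):
    """Sample size-adjusted BIC, using the adjusted sample size (n + 2) / 24."""
    return -2.0 * log_likelihood + n_params * math.log((n + 2) / 24)


def best_class_count(fits, criterion, n):
    """Return the number of classes whose model minimizes the criterion.

    `fits` maps number of classes -> (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda k: criterion(fits[k][0], fits[k][1], n))


if __name__ == "__main__":
    # Hypothetical values, NOT taken from Table 3.
    hypothetical_fits = {
        2: (-5200.0, 23),
        3: (-5100.0, 35),
        4: (-5030.0, 47),
        5: (-5020.0, 59),
    }
    n = 1371  # number of study participants reported in the text
    for k, (ll, p) in hypothetical_fits.items():
        print(k, round(bic(ll, p, n), 1), round(sabic(ll, p, n), 1))
    print("lowest BIC:", best_class_count(hypothetical_fits, bic, n))
```

In practice the information criteria are read together with the likelihood ratio tests and the interpretability of the classes, so the minimum of a single index is a starting point rather than a decision rule.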

Names and Characteristics of the Latent Classes
It is important to select latent class classification variables to identify the factors affecting metabolic syndrome in single-person households through an in-depth consideration of prior research results [22,23].
Therefore, to diagnose metabolic syndrome, we selected sex, age, smoking, alcohol consumption, walking, obesity, hypertriglyceridemia, high blood pressure, high blood glucose, abdominal obesity, and hypo-high-density lipoproteinemia. On the basis of the characteristics and response patterns of subclass types classified through the LCA, we named these categorized classes as follows: group 1: intense physical activity in early adulthood, group 2: hypertension among middle-aged female respondents, group 3: smoking and drinking among middle-aged male respondents, and group 4: obesity and abdominal obesity among middle-aged male respondents. Tables 4 and 5 present the characteristics and names of each sublayer type according to each latent class.
From the 1371 participants, groups 1, 2, 3, and 4 had 320 (23.34%), 368 (26.84%), 329 (24%), and 354 (25.82%) participants, respectively. First, group 1 was compared with the other 3 groups, with 300 (93.8%) of the 320 participants indicating that age was the most important factor. Moreover, 289 (90.3%) respondents walked >3 times a week, which was substantially higher than that of the other groups. All 5 diagnostic criteria for metabolic syndrome exhibited low rates, and the prevalence of metabolic syndrome was 0%. In group 2, out of 368 respondents, 337 (91.6%) were female, and all participants in this group were in their middle adulthood. All the diagnostic criteria for metabolic syndrome exhibited low rates, whereas 47 (12.8%) participants had metabolic syndrome. In group 3, out of 329 respondents, 318 (96.7%) were male, which is more than the number of male respondents in other groups, and 250 (76%) respondents in this group were in their middle adulthood. The rate of smoking was high (n=249, 75.7%), and 181 (55%) participants reported a high frequency of alcohol consumption (>2 times/wk). In addition, 72 (21.9%) respondents had metabolic syndrome. In group 4, out of 354 participants, 255 (72%) participants were in middle adulthood. In terms of the diagnostic criteria for metabolic syndrome, 265 (74.9%) had hypertension, 306 (86.4%) were obese, 354 (100%) had abdominal obesity, and 232 (65.5%) had metabolic syndrome.
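The class sizes and metabolic syndrome prevalences reported above can be checked with a few lines of arithmetic. The counts are copied from the paragraph (group 1 is entered as 0 cases because its prevalence is given as 0%); the helper name `percent` is hypothetical.

```python
# Counts as reported in the text above.
group_sizes = {1: 320, 2: 368, 3: 329, 4: 354}
mets_cases = {1: 0, 2: 47, 3: 72, 4: 232}


def percent(part, whole, digits=2):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


if __name__ == "__main__":
    total = sum(group_sizes.values())
    print("total participants:", total)
    for group, size in group_sizes.items():
        share = percent(size, total)
        prevalence = percent(mets_cases[group], size, digits=1)
        print(f"group {group}: {share}% of sample, {prevalence}% with metabolic syndrome")
```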

Relationships Between Latent Class Groups and Metabolic Syndrome
We performed a binary LR to predict metabolic syndrome outbreaks in the categorized latent class groups (Table 6).
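In a binary LR with a reference category, each group's coefficient converts to an odds ratio by exponentiation, and a Wald 95% CI is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below shows that conversion; the coefficient and standard error in the usage example are hypothetical and are not values from Table 6.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient to an odds ratio with a Wald CI.

    beta: coefficient for a group relative to the reference group
    se:   standard error of the coefficient
    z:    normal quantile (1.96 for a 95% CI)
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, for illustration only.
    or_, lower, upper = odds_ratio_with_ci(beta=1.2, se=0.3)
    print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```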
Regression analysis of the groups, as classified by the LCA (independent variables) and occurrence of metabolic syndrome (dependent variable) was significant (χ

Principal Findings
This study is the first to identify risk factors for metabolic syndrome in South Korean single-person households from multiple angles using LCA and machine learning techniques. The purpose of this study was to classify the risk factors for metabolic syndrome in single-person households using LCA and to identify the types and characteristics of the classified latent class. This paper describes metabolic health (BMI, body weight, body fat percentage, blood pressure, and blood sugar) among the physical and social characteristics of single-person households. There were more single-person households in middle adulthood (40-64 y) than in early adulthood (19-39 y). In this study, age, BMI, obesity, drinking, and body shape were found as potential risk factors for metabolic syndrome in single-person households. A cross-sectional study such as this is necessary because it can identify the factors that affect metabolic syndrome in single-person households in South Korea and determine which factors should be targeted through appropriate intervention [33].
Existing studies on metabolic syndrome were conducted mainly among older and middle-aged adults [34][35][36][37]. Several recent studies have confirmed the presence of metabolic syndrome in the younger generation, suggesting that the metabolic syndrome morbidity rate among generations with various characteristics has increased [37,38]. On the basis of this, it was found that the diversity of single-person households could not be overlooked. Importantly, it has been reported that health habits have substantial influence on metabolic syndrome [39]. As health habits are already fixed in middle to late adulthood, it is difficult to expect changes in health behavior later; therefore, the prevention and management of metabolic syndrome in early adulthood should be considered [38][39][40]. Therefore, it is evident that modifying health habits is the most important step in treating or preventing metabolic syndrome.
In this study, to categorize the risk factors for metabolic syndrome in adult single-person households, the LR, DT, RF, and XGBoost algorithms, which are machine learning techniques, were applied to identify factors that affect the occurrence of metabolic syndrome in adult single-person households. In this analysis, variables such as age, BMI, obesity, alcohol consumption, and subjective body shape recognition were commonly derived. This suggests that the factors identified in previous studies as affecting metabolic syndrome in adult single-person households and the factors identified by applying machine learning techniques in this study are consistent with each other [30]. It is important to actively encourage physical activity to prevent metabolic syndrome [39]. In addition, it is necessary to develop a differentiated health management strategy using mobile health programs for single-person households in early adulthood with sustainable and compelling content relevant to their daily lives.
Unlike group 1, group 2 comprised mostly female respondents, primarily in middle adulthood or older. In addition, this group had low rates of smoking and obesity and a high rate of hypertension. These results were consistent with those of previous studies, which indicated that high blood pressure in middle adulthood causes metabolic syndrome [40]. In addition, the rates of normal weight and overweight were the highest and second highest, respectively, in this group, which is consistent with the study by Kang et al [41], which reported that physical activity reduces hypertension and prevents metabolic syndrome among female individuals. This finding suggests that high blood pressure is an important risk factor for developing metabolic syndrome in single-person households [42].
Hypertension was an important risk factor, as seen in group 2. Thus, to prevent metabolic syndrome in group 2, it is important to develop and implement intervention programs for reducing blood pressure through diet and exercise therapy programs, encourage physical activity, and reduce obesity [43,44].
In group 3, the proportion of male respondents was significantly higher. In addition, the rate of smoking, frequency of alcohol consumption, and the rate of obesity were the highest in this group compared with the other groups. Moreover, sex and age were important risk factors for metabolic syndrome, which is consistent with the large proportion of middle-aged respondents in group 3. This group also exhibited characteristics of typical middle-aged workers, indicating the need to observe and manage smoking and alcohol consumption, especially among office workers [45]. These findings coincide with the finding of the study by Oh [46] that smoking facilitates metabolic syndrome, whereas its cessation prevents it among middle-aged male individuals. Thus, alcohol consumption and smoking were important risk factors for metabolic syndrome in group 3.
In this group, 21.9% (72/329) of the participants developed metabolic syndrome, and this group was 8.99 times more likely to develop metabolic syndrome than group 1. This corroborates the findings of Oh [46], as those in middle adulthood are more likely to be exposed to hypertension, hyperlipidemia, smoking, and alcohol consumption; hence, this group requires close monitoring and preventive nursing interventions. Moreover, although stress often leads to a desire to smoke and compels ex-smokers to begin smoking again, it is not fully clear as to why it is difficult to cease smoking [45,46]. Therefore, nursing interventions are needed to increase the motivation to quit smoking.
Further, another study discovered that the greater the stress, the higher the risk of health problems, such as smoking and depression [6,25]. Higher nicotine dependence demonstrates that smoking may be an inappropriate response if psychological problems such as stress and depression are not properly managed [12,44,45]. In addition, as Korean populations are often exposed to smoking when dining together and drinking socially, it is necessary to establish a culture of smoking cessation and changes in dining manners.
In group 4, the proportions of male respondents and female respondents were similar, with a high proportion of respondents in middle adulthood. Further, all respondents in the group exhibited obesity (based on the respondents' BMI) or abdominal obesity (based on the respondents' waist circumference). Obesity is also associated with the development of insulin resistance and beta-cell dysfunction, regardless of whether it is accompanied by abdominal obesity, which is consistent with prior literature [37,42]. Our results are also consistent with a report by Detournay et al [14], which revealed that obesity and abdominal obesity during female menopause may cause metabolic syndrome.
In group 4, metabolic syndrome was prevalent among 65.5% (232/354) of the respondents, and this group was 17.67 times more likely to develop metabolic syndrome than group 1. Moreover, the rates of hypertension, hyperglycemia, abdominal obesity, and hypo-high-density lipoproteinemia were higher than those in the other groups. As having at least 3 of the 5 criteria is an important basis for diagnosing metabolic syndrome, this is a critical factor [47]. This study's LCA demonstrated that heterogeneous subgroups exist depending on metabolic syndrome risk factors, which is different from the results of most previous studies that focused on specific metabolic syndrome risk factors. Our results suggest that certain risk factors may have more prominent effects and affect certain age groups more strongly. Moreover, obesity and abdominal obesity were the most influential risk factors for metabolic syndrome in single-person households.
A national policy to promote physical activity is needed to prevent and manage metabolic syndrome in single-person households. In addition, strategies are needed to develop intervention programs for enhancing physical activity at any time or anywhere through mobile health and wearable devices; such programs would naturally integrate physical activity into daily life. Thus, it would be much more effective to develop and implement different risk-based intervention strategies for different individuals. It would be beneficial if customized interventions based on individual needs could be developed and implemented, taking into consideration subgroup characteristics instead of the collective metabolic syndrome risk factors. Therefore, rather than considering individuals with metabolic syndrome risk factors as a homogeneous group and applying the developed interventions collectively, customized interventions should be developed considering the characteristics of each subgroup, and groups that share the same characteristics should be efficiently classified. Such interventions can be made much more effective if they incorporate strategies targeting each of the various risk factors for metabolic syndrome.

Limitations
This study has several limitations. First, the NHANES questionnaire we used could not incorporate various variables. Because the survey questions changed annually, only data that matched the survey questions across all 10 years were extracted. Second, as the object of investigation differed every year, tracking the longitudinal changes and progress of metabolic syndrome was a challenge. Third, although various machine learning techniques were used in this study, the most commonly used artificial neural network technique was not used. In the future, it will be necessary to conduct research applying deep learning methods such as artificial neural networks.

Conclusions
This study is significant in that it is the first to use LCA and machine learning techniques to identify the types and characteristics of latent subgroups classified based on metabolic syndrome risk factor indicators in adult single-person households. This study conducted a secondary analysis of data (2009-2018) from the NHANES hosted by the Korea Centers for Disease Control and Prevention, through which it classified and characterized risk factors for metabolic syndrome in adult single-person households.
In this study, machine learning techniques were applied to identify the factors with high importance for metabolic syndrome in adult single-person households. In addition, the groups classified based on risk factors for metabolic syndrome in adult single-person households using LCA were intense physical activity in early adulthood, hypertension in middle-aged female respondents, smoking and drinking in middle-aged male respondents, and obesity and abdominal obesity in middle-aged male respondents. In addition, when confirming the difference between latent class groups according to the factors influencing metabolic syndrome, the 4 latent classes showed substantial differences in general characteristics such as education level, income level, frequency of dining out, dietary life, subjective health status, and subjective body shape recognition. In addition, when examining the prediction of the occurrence of metabolic syndrome for each group, it was found that the obesity and abdominal obesity in middle-aged male respondents group had the highest probability, indicating that it was the most susceptible high-risk group in terms of the occurrence of metabolic syndrome.
This study is meaningful as a new attempt to identify the factors influencing metabolic syndrome in adult single-person households by applying machine learning techniques, categorize risk factors for metabolic syndrome using LCA, and identify the characteristics of each latent class. Therefore, this study provides new knowledge and contributes to the prevention of metabolic syndrome in adult single-person households by identifying 4 latent classes through LCA and thus facilitating the development of customized interventions.

Figure 1. Overall flowchart of this study. DT: decision tree; LR: logistic regression; NHANES: National Health and Nutrition Examination Survey; RF: random forest; XGBoost: extreme gradient boost.

Table 1. General characteristics of the study participants (N=1371).

Table 2. Analysis of the factors influencing metabolic syndrome using machine learning techniques.

Table 3. Model fit indices for the latent class analysis model (N=1371).

Table 5. Latent classes of metabolic syndrome in South Korean single-person households.
a b Set as reference category in latent class analysis.