Published on 26.10.2023 in Vol 7 (2023)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/42202.
Development, Reliability, and Structural Validity of the Scale for Knowledge, Attitude, and Practice in Ethics Implementation Among AI Researchers: Cross-Sectional Study


Original Paper

1Children's Hospital of Fudan University, Shanghai, China

2School of Philosophy, Fudan University, Shanghai, China

3School of Computer Science, Fudan University, Shanghai, China

4School of Public Health, Fudan University, Shanghai, China

5Department of Paediatrics, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China

6Shanghai Hospital Development Center, Shanghai, China

*these authors contributed equally

Corresponding Author:

Rui Feng, PhD

School of Computer Science, Fudan University

Room D5011, Interdisciplinary Academic Building, Jiangwan Campus

2005 Songhu Road

Shanghai, 200438

China

Phone: 86 21 51355534

Email: fengrui@fudan.edu.cn


Background: Medical artificial intelligence (AI) has significantly contributed to decision support for disease screening, diagnosis, and management. With the growing number of medical AI developments and applications, incorporating ethics is considered essential to avoiding harm and ensuring broad benefits across the lifecycle of medical AI. Effectively implementing ethics in medical AI research requires researchers to have comprehensive knowledge, an enthusiastic attitude, and practical experience. However, no instrument is currently available to measure these aspects.

Objective: The aim of this study was to develop a comprehensive scale for measuring the knowledge, attitude, and practice of ethics implementation among medical AI researchers, and to evaluate its measurement properties.

Methods: The construct of the Knowledge-Attitude-Practice in Ethics Implementation (KAP-EI) scale was based on the Knowledge-Attitude-Practice (KAP) model, and the evaluation of its measurement properties was in compliance with the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) reporting guidelines for studies on measurement instruments. The study was conducted in 2 phases. The first phase involved scale development through a systematic literature review, qualitative interviews, and item analysis based on a cross-sectional survey. The second phase involved evaluation of structural validity and reliability through another cross-sectional study.

Results: The KAP-EI scale had 3 dimensions: knowledge (10 items), attitude (6 items), and practice (7 items). The Cronbach α for the whole scale was .934. Confirmatory factor analysis showed that the goodness-of-fit indices of the scale were satisfactory (χ2/df ratio=2.338, comparative fit index=0.949, Tucker Lewis index=0.941, root-mean-square error of approximation=0.064, and standardized root-mean-square residual=0.052).

Conclusions: The results show that the scale has good reliability and structural validity; hence, it could be considered an effective instrument. This is the first instrument developed for this purpose.

JMIR Form Res 2023;7:e42202

doi:10.2196/42202



Introduction

The expression “artificial intelligence” (AI) was introduced by John McCarthy, and the 1956 Dartmouth Conference is widely regarded as the official birth of AI as a field of research [1]. AI has been defined as “the machine simulation of human mental reasoning, decision making, and behavior” [2]. Increased computing power, expanded storage capacity, and the compilation of big medical data have driven a surge of AI implementation in medical practice and research [2]. Medical AI is currently used in various fields such as medical image analysis, disease screening and prediction, clinical decision support, surgical robotics, health management, virtual medical assistants, and drug target screening [3-7]. However, while the rapid development of AI brings greater efficiency, accuracy, and convenience to medicine and health care, it is also accompanied by many risks and challenges. These include the intrusion of AI algorithms into the privacy and intimacy of people under investigation, major deficits of informed consent in AI research processes, and the shirking of liability in medical malpractice where AI is applied in decision-making [8-10]. There is now a growing consensus among experts that implementing ethics in medical AI research is crucial [11].

National and international research institutions have put forward several principles or guidelines for the ethical governance of medical AI. The Ethics and Governance of Artificial Intelligence for Health guidance (World Health Organization, 2021) [12] contains a set of recommendations to ensure that the governance of AI holds all stakeholders accountable and responsive to end users, and The New Generation of Ethical Norms of Artificial Intelligence (Ministry of Science and Technology of China, 2021) [13] and The Guidelines of Strengthening Governance over Ethics in Science, Technology (General Office of the State Council of China, 2022) [14] raise similar principles for ethics governance in the development and application of AI. Meanwhile, the Data Governance Act (European Parliament, 2020) [15] suggests technical instruments to ensure the preservation of protection, privacy, and confidentiality in the transfer, reuse, and recovery of data by third parties, and the Artificial Intelligence Act (European Parliament, 2021) [16] establishes a European Artificial Intelligence Board to ensure compliance with the implementation and enforcement of the regulations and to encourage the exchange of best practices. There are also several strategies for implementing ethics in medical AI research. One approach is to consider ethics at the design and requirement-capture stage by embedding ethical values into the application using methodologies such as Value Sensitive Design [17] and Values in Motion Design [18]. Others consist of coding ethics into the operating system [19], embedding ethical principles in the algorithm [20], and monitoring and evaluating the applications [21].

Implementing ethics is essentially motivated by the need to gain the trust of patients, with the researcher as the implementor [22]. Successful ethics engagement requires the ethical competence of stakeholders as well as the intention to comply with the corresponding values. The prerequisite is that researchers master the relevant ethical knowledge, agree with the ethical values, and then behave as expected [23]. Currently published evaluations of ethics implementation mainly focus on better patient outcomes [22,24] and on reporting safety, equity, cost-effectiveness, privacy, clear professional responsibilities, autonomy, justice, and fairness in AI development and implementation [22,24-27]; they are conducted through checklists, questionnaires, or stakeholder consultations to promote the careful design and execution of medical AI research and to assess the ethical and social implications of AI implementations [28-30]. What is frequently overlooked is the perception of ethics implementation among medical AI researchers themselves, which is the fundamental driving force for ensuring ethical practices. To address this gap, it is crucial to develop tools that comprehensively measure AI researchers' knowledge, attitudes, and practices of ethics implementation.

The Knowledge-Attitude-Practice (KAP) model is one of the most widely used models in medical research [31-33]. It proposes that knowledge is the basis of behavior change, while attitude and practice are the driving forces of behavior change. This study aimed to develop a scale based on the KAP model for measuring the perception of ethics implementation among medical AI researchers and to evaluate the reliability and structural validity of the scale. Our hypothesis was that the scale is well designed and has good measurement properties.


Methods

Study Design

The study was conducted in 2 phases. The first phase covered item generation, refinement of item wording, and item analysis, drawing on a systematic literature review, qualitative analysis, and a cross-sectional survey. In the second phase, another cross-sectional survey was conducted to test the measurement properties of the developed scale.

Procedures and Participants

Phase 1-1: Item Generation and Cognitive Interview

The KAP model was used as the conceptual framework to define the construct to be measured. In this model, knowledge comprises scientific knowledge, local knowledge, tacit knowledge, and self-reflective knowledge [34]. Attitude referred to a positive or negative evaluation of an object [35]. Practice included regular activities influenced by widely shared beliefs [36]. The initial step involved systematic literature retrieval to gather guidelines, expert consensus, practice standards, and norms referring to the implementation of ethics in AI research. A librarian working in the hospital library provided valuable assistance during this process (see Multimedia Appendix 1). Then, a focus group interview was conducted with 10 experts (2 medical ethics professors, a sociology professor, 3 AI professors, and 4 medical professors proficient in medical AI implementation research). They expressed their opinions on the following issues: (1) What is your understanding of implementing ethics in medical AI research? (2) What knowledge do you think medical AI researchers should master to help implement ethics? (3) What are your perceptions of implementing ethics in medical AI research? (4) How do you implement ethics in medical AI research? Eventually, relevant content from the literature and interviews was extracted and classified into items.

All the generated items were sent to another 10 experts (including medical ethics professors, sociology professors, AI professors, and medical professors) for consultation. Items were deleted and revised in accordance with the findings from 3 rounds of expert consultation, after which the first draft was formed. Eight people (including AI researchers, health care workers, and health information managers) were invited to complete the first draft of the scale and were then interviewed with the following questions: (1) Was each item clearly expressed without ambiguity? If not, please identify the unclear or ambiguous expressions. (2) Were any items difficult to understand? If yes, please identify the difficulties; if not, please try to explain each item in your own words. (3) What were the reasons for each of your answers? (4) What else needs to be added? The wording of each item was revised for readability in accordance with the comments. The time each person spent completing the questionnaire was also recorded. The final draft was a 5-point Likert scale with 3 dimensions. Responses for the knowledge dimension ranged from “not familiar at all (1 point)” to “extremely familiar (5 points),” those for attitude ranged from “strongly disagree (1 point)” to “strongly agree (5 points),” and those for practice ranged from “never (1 point)” to “always (5 points).” Reverse scoring was applied to items worded in the opposite direction, as sketched below. Face validity was assessed by scoring on a 4-point scale [37] (1=not relevant, 2=unable to assess or needs much revision, 3=relevant but needs minor revision, and 4=very relevant and succinct) with another 52 participants. The inclusion criteria were AI developers, AI algorithm engineers, or AI implementation researchers with more than 5 years of relevant experience.
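Where items are reverse-keyed, scoring simply mirrors the response around the scale midpoint. A minimal sketch in Python, assuming responses sit in a pandas DataFrame (the item name F_rev is hypothetical, not an actual KAP-EI item):

```python
# Minimal sketch of reverse scoring on a 5-point Likert scale.
import pandas as pd

def reverse_score(item: pd.Series, scale_min: int = 1, scale_max: int = 5) -> pd.Series:
    # Mirror responses around the midpoint: 1<->5, 2<->4, 3 stays 3.
    return (scale_min + scale_max) - item

responses = pd.DataFrame({"F_rev": [1, 2, 3, 4, 5]})  # hypothetical reverse-keyed item
responses["F_rev_scored"] = reverse_score(responses["F_rev"])
print(responses)  # scored column: 5, 4, 3, 2, 1
```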

Phase 1-2: Item Analysis

A cross-sectional survey was conducted from June 25, 2022, to July 31, 2022. Considering that the number of medical AI researchers is relatively small, snowball sampling [38-40] was used to enrich the survey sample in the cross-sectional study. Researchers of medical AI development or medical AI application were eligible to serve as participants if they had taken part in more than 1 medical AI research project. Potential participants were excluded if none of the projects they were involved in had been finished; if the time they spent completing the survey was too short or too long (time<mean−SD or time>mean+SD); if their answers to demographic questions were illogical; or if their answers to all items were the same (these screening rules are sketched below). At first, 6 medical AI researchers who had previously worked on medical AI research projects with our hospital were identified. They were from hospitals, the department of computer science at a university, and 3 computer programming companies across the regions of North China, East China, and Northeast China. Initially, they were invited to complete the questionnaire and were then asked to send the QR code or link of the blank questionnaire to colleagues or research partners who were eligible and might be willing to be recruited. Subsequent participants repeated this procedure to recruit other potential participants until the required sample size was achieved. Ideally, each participant was asked to invite 3-5 eligible individuals to join the study.
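A minimal sketch of these screening rules, assuming the exported responses sit in a pandas DataFrame with a completion-time column in minutes and one column per item (all column names here are hypothetical; the check for illogical demographic answers was a manual step and is omitted):

```python
# Sketch of the exclusion rules: drop out-of-range completion times and straight-lined answers.
import pandas as pd

def screen_responses(df: pd.DataFrame, time_col: str = "minutes") -> pd.DataFrame:
    mu, sd = df[time_col].mean(), df[time_col].std()
    in_time = df[time_col].between(mu - sd, mu + sd)   # keep time within mean +/- 1 SD
    item_cols = [c for c in df.columns if c.startswith("F")]
    varied = df[item_cols].nunique(axis=1) > 1         # drop identical answers to all items
    return df[in_time & varied]
```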

We distributed an electronic questionnaire including participants' demographics and the final draft via Wenjuanxing [41], a professional and widely used survey website in China. Participants could scan the QR code using their cell phones or log in on their computers to access and complete the questionnaire. The purpose of the survey and answering instructions were described on the first page of the web-based questionnaire. Participants were advised to complete the questionnaire within 5 to 10 minutes; this time limit was set on the basis of the actual completion times recorded in the first phase of the study. A limit was also placed on respondents' IP addresses to avoid multiple enrollments, and a reminder for checking blank answers was set to block the submission of unfinished questionnaires.

Phase 2: Testing Reliability and Structural Validity

Another cross-sectional survey was conducted from February 20, 2023, to April 26, 2023, to test the reliability and structural validity of the scale, following the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) reporting guideline [42]. Snowball sampling was adopted again, and the inclusion and exclusion criteria were the same as those in the first cross-sectional study. The sample-to-item ratio was used to determine the sample size, estimated at 15 to 20 participants per item [43]. As the developed scale contained 23 items, a sample size of 345 to 460 participants was required. Paper questionnaires were distributed by trained investigators employed at each survey site.

Statistical Analysis

Excel 365 for Windows (Microsoft Corp) was used to establish a database. Data were analyzed using SPSS (version 25.0; IBM Corp) and AMOS (version 24.0; IBM Corp) for Windows.

Descriptive statistics were used to show the characteristics of the participants involved in the cross-sectional studies. Item analysis was conducted as described previously [44] by calculating the following: (1) item discrimination: after ranking the participants by their total score, we selected those in the top 27% and the bottom 27% and ran an independent t test to determine whether each item could significantly distinguish the 2 groups; an item that failed to do so or whose t value (critical ratio) was <3 would be removed. (2) Item correlation: we inspected the correlation matrix between items and the scale and removed any item whose correlation coefficient was less than 0.40. (3) Item homogeneity: we measured the Cronbach α coefficient of the scale first and inspected the change in its value when deleting 1 item at a time; an item would be removed if the recalculated value was significantly higher than the original one, if the absolute value of its factor loading was less than 0.4, or if its communality was less than 0.6. These steps are sketched below.
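The authors ran these analyses in SPSS. Purely as an illustration, an equivalent sketch in Python (assuming df holds one column per item and one row per respondent) might look like this:

```python
# Sketch of the item analysis steps: discrimination (critical ratio),
# item-total correlation, and Cronbach alpha if an item is deleted.
import pandas as pd
from scipy import stats

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

def item_analysis(df: pd.DataFrame) -> pd.DataFrame:
    total = df.sum(axis=1)
    hi = df[total >= total.quantile(0.73)]  # top 27% by total score
    lo = df[total <= total.quantile(0.27)]  # bottom 27% by total score
    rows = []
    for col in df.columns:
        t, p = stats.ttest_ind(hi[col], lo[col], equal_var=False)  # critical ratio
        rows.append({
            "item": col,
            "CR": t, "p": p,
            "r_item_total": df[col].corr(total),
            "alpha_if_deleted": cronbach_alpha(df.drop(columns=col)),
        })
    return pd.DataFrame(rows)
```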

A test of structural validity was performed using exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). The Kaiser-Meyer-Olkin (KMO) and Bartlett tests were used to determine whether our data were suitable for EFA; a Bartlett P value of <.05 and a KMO score of ≥0.7 were considered appropriate. The varimax rotation method was applied to extract the factor loadings [45]. As the KAP model is composed of 3 dimensions, a 3-factor CFA model was used to establish the scale's construct validity. To assess the model's fit, the following absolute and incremental fit indices were used: (1) root-mean-square error of approximation (RMSEA), with 0.08 as a cutoff for poor-fitting models; (2) standardized root-mean-square residual (SRMR), where a value of less than 0.08 is generally considered a good fit; (3) comparative fit index (CFI), ranging between 0.0 and 1.0, where values closer to 1.0 indicate good fit (CFI≥0.90); and (4) Tucker Lewis index (TLI), also ranging between 0.0 and 1.0, where TLI≥0.9 indicates a good fit [45-48].
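The EFA and CFA were run in SPSS and AMOS. As an illustrative stand-in only (not the authors' toolchain), the suitability checks and varimax extraction could be reproduced with the open-source factor_analyzer package:

```python
# Sketch of the EFA workflow: Bartlett test, KMO, and 3-factor varimax extraction.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

def run_efa(df: pd.DataFrame, n_factors: int = 3) -> pd.DataFrame:
    chi2, p = calculate_bartlett_sphericity(df)  # want P < .05
    _, kmo_model = calculate_kmo(df)             # want KMO >= 0.7
    print(f"Bartlett chi2={chi2:.1f} (p={p:.4f}); KMO={kmo_model:.3f}")
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
    fa.fit(df)
    return pd.DataFrame(fa.loadings_, index=df.columns)  # item x factor loadings
```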

For face validity, the item-level content validity index (I-CVI) was computed as the number of experts assigning a score of 4 or 3 to an item divided by the total number of experts. Similarly, the average scale-level CVI (S-CVI/Ave) was calculated as the number of items achieving a score of 4 or 3 divided by the total number of items. Interrater reliability (IRR) was calculated as the total number of items scored in agreement divided by the total number of items. An I-CVI of ≥0.8, an S-CVI/Ave of ≥0.9, and an IRR of >0.7 were considered acceptable [49,50].
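A minimal sketch of these indices, assuming ratings is an items × experts matrix of 1-4 relevance scores; computing S-CVI/Ave as the mean of the I-CVIs is one common convention and is an assumption here:

```python
# Sketch of content validity indices from an items x experts matrix of 1-4 scores.
import numpy as np

def content_validity(ratings: np.ndarray) -> tuple[np.ndarray, float]:
    relevant = ratings >= 3              # scores of 3 or 4 count as "relevant"
    i_cvi = relevant.mean(axis=1)        # per item: proportion of experts rating it relevant
    s_cvi_ave = float(i_cvi.mean())      # mean of the I-CVIs (one common convention)
    return i_cvi, s_cvi_ave

# Hypothetical example: 3 items rated by 4 experts
ratings = np.array([[4, 3, 4, 4], [3, 4, 2, 4], [4, 4, 4, 3]])
i_cvi, s_cvi = content_validity(ratings)
print(i_cvi, s_cvi)  # [1.0, 0.75, 1.0] and about 0.92
```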

For the reliability of the scale, a Cronbach α of ≥.7 was used as the reference, and values of ≥0.7 for both split-half reliability and test-retest reliability were considered acceptable.
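As an illustration, split-half reliability with the Spearman-Brown correction can be sketched as follows (an odd-even split is assumed; the cronbach_alpha helper from the item analysis sketch above can be reused for internal consistency):

```python
# Sketch of odd-even split-half reliability with the Spearman-Brown correction.
import pandas as pd

def split_half_reliability(df: pd.DataFrame) -> float:
    odd = df.iloc[:, 0::2].sum(axis=1)   # total over odd-positioned items
    even = df.iloc[:, 1::2].sum(axis=1)  # total over even-positioned items
    r = odd.corr(even)
    return 2 * r / (1 + r)               # Spearman-Brown step-up formula
```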

Ethical Considerations

The study was conducted in accordance with the guidelines of the Declaration of Helsinki and approved by the Research Ethics Board of the Children's Hospital of Fudan University (2022-52).


Results

Sample Characteristics

A total of 306 responses were received in the first survey, and 37 questionnaires were excluded (5 with illogical answers about date of birth, 23 with answering times outside the acceptable range around the mean of 8.21 [SD 4.18] minutes, and 9 with the same answers to all items). Finally, 269 questionnaires were included in the analysis of the first survey. Similarly, 481 responses were received in the second survey, with 11 questionnaires excluded for having the same answers to all items. The characteristics of the participants are shown in Table 1.

Table 1. Characteristics of participants in the first and second surveys.

Characteristic | First survey, n (%) | Second survey, n (%)
Gender
  Male | 196 (72.9) | 368 (78.3)
  Female | 73 (27.1) | 102 (21.7)
Related working experience
  <3 MAIa research projects | 137 (50.9) | 231 (49.1)
  3-5 MAI research projects | 102 (37.9) | 179 (38.1)
  6-10 MAI research projects | 22 (8.2) | 43 (9.1)
  >10 MAI research projects | 8 (3.0) | 17 (3.6)
Education level
  Bachelor's degree | 108 (40.2) | 185 (39.3)
  Master's degree | 101 (37.5) | 171 (36.3)
  Doctoral degree | 60 (22.3) | 114 (24.3)
Occupation
  Medical staff | 60 (22.3) | 134 (28.5)
  Health information manager | 65 (24.2) | 148 (31.5)
  AI research and development engineer or algorithm engineer | 68 (25.3) | 117 (24.8)
  Medical student | 11 (4.1) | 0 (0)
  Computer science student | 65 (24.2) | 71 (15.1)

aMAI: medical artificial intelligence.

Development of the Knowledge-Attitude-Practice in Ethics Implementation Scale

In total, 25 items in 3 dimensions (9 for knowledge, 7 for attitude, and 9 for practice) were generated first. During expert consultation, 2 identical items were combined, 1 miscellaneous item was split, 8 items were revised for obscure expression, and 3 items were added as suggested by the experts. The experts for content validity assessment also proposed wording amendments in the practice dimension, such that the same content is expressed in 2 ways, from the perspective of research leaders or primary researchers and from that of research participants. The final draft (see Multimedia Appendix 2) had 28 items (F1-F28). According to the results of item analysis, 5 items (F14, F17, F19, F20, and F21) were removed for suboptimal absolute values of the critical ratio (CR), factor loading, correlation coefficient, or Cronbach α coefficient (after the change), as shown in Table 2. The final version of the Knowledge-Attitude-Practice in Ethics Implementation (KAP-EI) scale, with 23 items, is presented in Multimedia Appendix 3.

Table 2. Results of item analysis (N=28; Cronbach α=.904). The Cronbach α (after the change), communality, and factor loading columns together assess item homogeneity.

Item | Item discrimination, critical ratio (P value) | Item correlation, r (P value) | Cronbach α (after the change) | Communality | Factor loading | Substandard items, n | Comments
F1 | 8.567 (<.001) | 0.553 (<.001) | 0.901 | 0.507 | 0.612 | 0 | Reserved
F2 | 11.741 (<.001) | 0.697 (<.001) | 0.898 | 0.713 | 0.774 | 0 | Reserved
F3 | 12.84 (<.001) | 0.716 (<.001) | 0.898 | 0.764 | 0.790 | 0 | Reserved
F4 | 13.995 (<.001) | 0.738 (<.001) | 0.897 | 0.792 | 0.820 | 0 | Reserved
F5 | 16.119 (<.001) | 0.792 (<.001) | 0.896 | 0.847 | 0.868 | 0 | Reserved
F6 | 13.874 (<.001) | 0.759 (<.001) | 0.897 | 0.809 | 0.846 | 0 | Reserved
F7 | 16.168 (<.001) | 0.782 (<.001) | 0.896 | 0.848 | 0.856 | 0 | Reserved
F8 | 15.388 (<.001) | 0.754 (<.001) | 0.897 | 0.817 | 0.831 | 0 | Reserved
F9 | 14.848 (<.001) | 0.767 (<.001) | 0.896 | 0.824 | 0.843 | 0 | Reserved
F10 | 14.326 (<.001) | 0.755 (<.001) | 0.897 | 0.726 | 0.813 | 0 | Reserved
F11 | 6.078 (<.001) | 0.403 (<.001) | 0.903 | 0.690 | 0.724 | 0 | Reserved
F12 | 3.896 (<.001) | 0.332a (<.001) | 0.904 | 0.784 | 0.838 | 1 | Reserved
F13 | 5.792 (<.001) | 0.423 (<.001) | 0.903 | 0.813 | 0.764 | 0 | Reserved
F14 | 2.366a (.02) | 0.141a (.02) | 0.909a | 0.574 | 0.475a | 4 | Removed
F15 | 5.106 (<.001) | 0.417 (<.001) | 0.903 | 0.776 | 0.796 | 0 | Reserved
F16 | 5.151 (<.001) | 0.401 (<.001) | 0.903 | 0.684 | 0.736 | 0 | Reserved
F17 | –1.517a (.13) | –0.142a (.02) | 0.914a | 0.361 | –0.505a | 4 | Removed
F18 | 4.938 (<.001) | 0.359a (<.001) | 0.904 | 0.755 | 0.795 | 1 | Reserved
F19 | 2.705a (.008) | 0.248a (<.001) | 0.906a | 0.530 | 0.669 | 3 | Removed
F20 | –0.128a (.90) | –0.029a (.63) | 0.911a | 0.637 | –0.297a | 4 | Removed
F21 | 1.536a (.13) | 0.124a (.04) | 0.909a | 0.723 | 0.035a | 4 | Removed
F22 | 18.411 (<.001) | 0.739 (<.001) | 0.896 | 0.669 | 0.711 | 0 | Reserved
F23 | 15.496 (<.001) | 0.718 (<.001) | 0.897 | 0.772 | 0.685 | 0 | Reserved
F24 | 11.675 (<.001) | 0.641 (<.001) | 0.899 | 0.792 | 0.612 | 0 | Reserved
F25 | 16.826 (<.001) | 0.749 (<.001) | 0.896 | 0.796 | 0.718 | 0 | Reserved
F26 | 11.327 (<.001) | 0.639 (<.001) | 0.899 | 0.821 | 0.615 | 0 | Reserved
F27 | 14.129 (<.001) | 0.710 (<.001) | 0.897 | 0.834 | 0.685 | 0 | Reserved
F28 | 13.209 (<.001) | 0.677 (<.001) | 0.898 | 0.814 | 0.668 | 0 | Reserved
Criteria | ≥3.000 (N/Ab) | ≥0.400 (N/A) | ≤0.904 | ≥0.200 | ≥0.600 | N/A | N/A

aSubstandard values.

bN/A: not applicable.

Face Validity of the KAP-EI Scale

The corrected I-CVI was 0.851, S-CVI/Ave was 0.901, and IRR was 0.882.

EFA of the KAP-EI Scale

The Bartlett test was significant (χ2=6583.040; P<.001), and the KMO measure of sampling adequacy for the scale was 0.930, indicating that the data were suitable for factor analysis. In the preliminary results, 3 factors were extracted, together explaining 76.76% of the variance; EFA thus resulted in 3 factors with a total of 23 items. The eigenvalue of factor 1, which explained 44.25% of the variance, was 10.177; based on the content of its items, factor 1 was “knowledge” and comprised 10 items. The eigenvalue of factor 2, which explained 19.952% of the variance, was 4.589; based on the content of its items, factor 2 was “practice” and comprised 7 items. The eigenvalue of factor 3, which explained 12.562% of the variance, was 2.889; based on the content of its items, factor 3 was “attitude” and comprised 6 items (Table 3).
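As a quick check on these figures, the percentage of variance explained by each factor equals its eigenvalue divided by the number of items:

```python
# Check: percentage of variance = eigenvalue / number of items (23) x 100.
for factor, eig in [("knowledge", 10.177), ("practice", 4.589), ("attitude", 2.889)]:
    print(f"{factor}: {eig / 23 * 100:.2f}% of variance")
# knowledge: 44.25%, practice: 19.95%, attitude: 12.56% -- matching the reported values
```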

Table 3. Exploratory factor analysis of the Knowledge-Attitude-Practice in Ethics Implementation scale. Loadings are values after rotation.

Item | Knowledge loading | Practice loading | Attitude loading | Communality
F7 | 0.884 | N/Aa | N/A | 0.848
F5 | 0.873 | N/A | N/A | 0.846
F9 | 0.873 | N/A | N/A | 0.824
F8 | 0.872 | N/A | N/A | 0.816
F4 | 0.857 | N/A | N/A | 0.791
F6 | 0.851 | N/A | N/A | 0.809
F3 | 0.850 | N/A | N/A | 0.762
F2 | 0.817 | N/A | N/A | 0.712
F10 | 0.799 | N/A | N/A | 0.726
F1 | 0.677 | N/A | N/A | 0.507
F26 | N/A | 0.882 | N/A | 0.809
F27 | N/A | 0.879 | N/A | 0.834
F24 | N/A | 0.875 | N/A | 0.789
F28 | N/A | 0.864 | N/A | 0.810
F25 | N/A | 0.841 | N/A | 0.795
F23 | N/A | 0.839 | N/A | 0.771
F22 | N/A | 0.723 | N/A | 0.657
F13 | N/A | N/A | 0.889 | 0.798
F15 | N/A | N/A | 0.883 | 0.799
F12 | N/A | N/A | 0.878 | 0.802
F18 | N/A | N/A | 0.867 | 0.754
F11 | N/A | N/A | 0.833 | 0.704
F16 | N/A | N/A | 0.825 | 0.690
Percentage of variance, % | 44.246 | 19.952 | 12.562 | N/A
Cumulative percentage, % | 44.246 | 64.198 | 76.759 | N/A

aN/A: not applicable.

CFA of the KAP-EI Scale

After revising the parameter specification, the fit indices indicated a reasonably good fit (χ2/df ratio=2.338, CFI=0.949, TLI=0.941, RMSEA=0.064, and SRMR=0.052), supporting the KAP-EI scale's 3-dimensional structure. Figure 1 shows the standardized estimates of the CFA. The standardized factor loadings (λ) of all items ranged from 0.62 to 0.92. The composite reliability of the knowledge, attitude, and practice dimensions was 0.963, 0.935, and 0.948, respectively, and the average variance extracted was 0.725, 0.706, and 0.724, respectively.
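Composite reliability and average variance extracted are simple functions of the standardized loadings. A sketch with hypothetical loadings (not the actual KAP-EI estimates):

```python
# Sketch: composite reliability (CR) and average variance extracted (AVE)
# from one dimension's standardized CFA loadings.
import numpy as np

def cr_ave(lam: np.ndarray) -> tuple[float, float]:
    errors = 1 - lam**2                                # residual variances
    cr = lam.sum()**2 / (lam.sum()**2 + errors.sum())  # composite reliability
    ave = float((lam**2).mean())                       # average variance extracted
    return cr, ave

cr, ave = cr_ave(np.array([0.85, 0.88, 0.82, 0.86]))   # hypothetical loadings
print(f"CR={cr:.3f}, AVE={ave:.3f}")                   # about CR=0.914, AVE=0.727
```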

Figure 1. Standardized estimates of confirmatory factor analysis for the validation sample.

Reliability of the KAP-EI Scale

Each of the 3 dimensions demonstrated satisfactory internal consistency, with Cronbach α values ranging from .935 to .964. The Cronbach α for the whole scale was .934 (Table 4).

Table 4. Internal consistency of the Knowledge-Attitude-Practice in Ethics Implementation (KAP-EI) scale.

Dimension | Items, n | Score, mean (SD) | Cronbach α
Knowledge | 10 | 29.85 (7.412) | .964
Attitude | 6 | 23.98 (4.321) | .935
Practice | 7 | 18.36 (7.362) | .950
KAP-EI scale | 23 | 72.18 (14.132) | .934

Discussion

Principal Findings

The purpose of this study was to develop a KAP model–based scale for measuring the implementation of ethics in medical AI research and to explore its validity and reliability. The Cronbach α and the values for split-half and test-retest reliability of the whole scale were higher than .7, indicating that the scale had excellent reliability. Additionally, the corrected I-CVI was higher than 0.8, the S-CVI/Ave was more than 0.9, and the IRR was more than 0.7, implying good content validity. The 3-factor model obtained from EFA was tested by CFA, and the results showed that the model (23 items) fit the data well. In the item analysis, except for F12, F14, and F17-F21, the correlation coefficients between the items and the scale were greater than 0.4; the absolute value of the critical ratio of F14, F17, F19, F20, and F21 was less than 3; the revised Cronbach α coefficient after deleting F14, F17, F19, F20, or F21 was greater than the original Cronbach α coefficient (.904); and the absolute values of the factor loadings of F14, F17, F20, and F21 were less than 0.6. According to the experts' opinions from the focus group interview at the item generation stage, these were good questions precisely because the responses to them were uncertain and participants could freely express their views. Notably, we intended to develop a scale to measure medical AI researchers' knowledge, attitude, and practice in the implementation of ethics, not to investigate their views about knowledge, attitude, and practice; the results also show that questions about views were not suitable for such a scale. F14, F17, and F19-F21 were therefore deleted in this procedure. To optimize the questionnaire design, the content of the items should have been more specific and directional, but we could not achieve this because China still lacks the necessary supporting mechanisms for ethics engagement and extensive research on this subject.

Evaluation of the Implementation of Ethics in Medical AI

While some strategies include ways of evaluating the implementation of ethics [22,24-27], we could not find clear measures of whether such implementations were successful in medical AI research. The absence of clearly defined measures of successful ethics implementation might reveal a lack of maturity in this emerging field. The complexity of implementing ethics may reinforce the need for a common language among different stakeholders [51].

The relationship between ethical values and behavior has attracted the interest of social scientists for several decades [52,53]. This is also reflected in correlation coefficients between the dimensions of attitude and practice. Values (referring to attitudes) are defined as desirable goals that act as guiding principles in implementing ethics. They are then translated and become visible through individual behaviors and concrete actions (referring to practice). Values might be incommensurable, and people may confer different significance to the same value [54], which indicates a weak relationship between attitude and practice in the implementation of ethics. Similarly, correlation coefficients between the dimensions of knowledge and practice also indicated a weak relationship.

The unsatisfactory performance of knowledge and practice in the implementation of ethics [55,56] may be due to an unclear ethics framework and the lack of supports such as ethical training [23]. As far as ethical evaluation is concerned, 2 important premises determine whether the effect of ethics implementation is good and whether it can be reflected in medical AI researchers' KAP: one is a supporting mechanism, and the other is a feasible ethics framework. The supporting mechanism includes soft constraints (such as ethical norms and specific requirements for ethical review), hard constraints (such as relevant laws and regulations), and a series of conditions to guarantee them. The ethics framework results from interdisciplinary cooperation among science, engineering, and ethics, which requires joint research and discussion with philosophers, medical AI researchers, end users, and policy makers. Therefore, further research similar to this study is needed to investigate the effectiveness of the framework and to serve as evidence for decision-making. We hope that this scale serves as a first tool for furthering medical AI ethics implementation, which still has a long way to go in China.

Limitations

Our results raise a number of issues that could best be explored in future research. First, the members of the expert group who participated in item generation were from the same university and the same tertiary hospital in Shanghai; this geographical limitation restricts the generalizability of our findings. Second, the participants were recruited through snowball sampling, which might have introduced sample selection bias. In addition, with the rapid development of medical AI, the approaches to implementing ethics are also constantly changing; the scale's content will therefore have a limited period of applicability and needs to be revised regularly.

Conclusions

Creating a comprehensive scale is of paramount importance in investigating medical AI researchers' knowledge, attitudes, and practices of ethics implementation. The KAP-EI scale appears to be a reliable and valid instrument developed to advance the measurement of the perception of implementing ethics among medical AI researchers. To our knowledge, the KAP-EI scale is the first instrument designed for this purpose.

Acknowledgments

We would like to thank Rui Wang for her help in revising our paper. This work was partially supported by grants from the construction of a pediatric disease database and knowledge graph under the National Key Research and Development Program of China (2021ZD0113501), Social Experiment of Clinical Diagnosis Decision Support System under the Science and Technology Commission of Shanghai Municipality (21511104502), and the application and validation of real scenarios of major pediatric respiratory diseases under the Science and Technology Commission of Shanghai Municipality (22511106001).

Data Availability

The data sets generated during or analyzed during this study are not publicly available due to hospital policy but are available from the corresponding author on reasonable request.

Authors' Contributions

XZ, YG, and RF formulated the research aims, designed the methodology, and drafted the manuscript. DW and WW designed the methodology and analyzed the study data. JY, CJ, and AML reviewed and edited the manuscript. XG and CY provided resources. YZ and LT verified the results. YW and LS collected the data. HX and XZ acquired funding. BS and JF coordinated the execution of the study. All authors agree with the content of the manuscript and have given final approval of the version to be published. RF, as guarantor, is responsible for the overall content, accepts full responsibility for the finished work, had access to the data, and controlled the decision to publish.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Data sources, retrieval strategies, and primary reference lists.

DOCX File , 20 KB

Multimedia Appendix 2

Final draft of the KAP-EI scale. KAP-EI: Knowledge-Attitude-Practice in Ethics Implementation.

DOCX File , 25 KB

Multimedia Appendix 3

Final version of the KAP-EI scale. KAP-EI: Knowledge-Attitude-Practice in Ethics Implementation.

DOCX File , 24 KB

  1. Cordeschi R. AI turns fifty: revisiting its origins. Appl Artif Intell. Apr 25, 2007;21(4-5):259-279. [FREE Full text] [CrossRef]
  2. Abdullah YI, Schuman JS, Shabsigh R, Caplan A, Al-Aswad LA. Ethics of artificial intelligence in medicine and ophthalmology. Asia Pac J Ophthalmol (Phila). 2021;10(3):289-298. [FREE Full text] [CrossRef] [Medline]
  3. Keskinbora KH. Medical ethics considerations on artificial intelligence. J Clin Neurosci. Jun 2019;64:277-282. [FREE Full text] [CrossRef] [Medline]
  4. Attia ZI, Kapa S, Lopez-Jimenez F, McKie PM, Ladewig DJ, Satam G, et al. Screening for cardiac contractile dysfunction using an artificial intelligence-enabled electrocardiogram. Nat Med. Jan 2019;25(1):70-74. [FREE Full text] [CrossRef] [Medline]
  5. Álvarez-Machancoses Ó, Fernández-Martínez JL. Using artificial intelligence methods to speed up drug discovery. Expert Opin Drug Discov. Aug 2019;14(8):769-777. [FREE Full text] [CrossRef] [Medline]
  6. Watson DS, Krutzinna J, Bruce IN, Griffiths CE, McInnes IB, Barnes MR, et al. Clinical applications of machine learning algorithms: beyond the black box. BMJ. Mar 12, 2019;364:l886. [CrossRef] [Medline]
  7. Artificial Intelligence White Paper (2022). China Academy of Information and Communications Technology. 2022. URL: https://cset.georgetown.edu/publication/artificial-intelligence-white-paper-2022/ [accessed 2022-07-01]
  8. González-Esteban E, Calvo P. Ethically governing artificial intelligence in the field of scientific research and innovation. Heliyon. Feb 2022;8(2):e08946. [FREE Full text] [CrossRef] [Medline]
  9. Wei BR, Xue P, Jiang Y, Zhai XM, Qiao YL. [World Health Organization guidance Ethical and Governance of Artificial Intelligence for health and implications for China]. Zhonghua Yi Xue Za Zhi. Mar 29, 2022;102(12):833-837. [CrossRef] [Medline]
  10. Sharma M, Savage C, Nair M, Larsson I, Svedberg P, Nygren JM. Artificial intelligence applications in health care practice: scoping review. J Med Internet Res. Oct 05, 2022;24(10):e40238. [FREE Full text] [CrossRef] [Medline]
  11. Guo R. The ethics and governance of artificial intelligence. Beijing. Law Press; Aug 01, 2020;42-42.
  12. Ethics and governance of artificial intelligence for health. World Health Organization. Jun 28, 2021. URL: https://www.who.int/publications/i/item/9789240029200 [accessed 2022-07-01]
  13. The New Generation of Ethical Norms of Artificial Intelligence. Ministry of Science and Technology of the People's Republic of China. 2021. URL: https://www.most.gov.cn/kjbgz/202109/t20210926_177063.html [accessed 2023-10-12]
  14. The Guidelines of Strengthening Governance over Ethics in Science, Technology. The State Council of the People's Republic of China. 2022. URL: http://www.gov.cn/zhengce/2022-03/20/content_5680105.htm [accessed 2023-10-12]
  15. Regulation of the European Parliament and of the Council on European data governance (Data Governance Act). European Commission. 2020. URL: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52020PC0767 [accessed 2023-10-12]
  16. Proposal for a Regulation of the European Parliament and of the Council. Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts. European Commission. 2021. URL: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206 [accessed 2023-10-12]
  17. Cawthorne D, Robbins-van Wynsberghe A. An ethical framework for the design, development, implementation, and assessment of drones used in public healthcare. Sci Eng Ethics. Oct 23, 2020;26(5):2867-2891. [FREE Full text] [CrossRef] [Medline]
  18. Poulsen A, Burmeister O, Tien D. A new design approach and framework for elderly care robots. ACIS Proceedings. 2018. URL: https://aisel.aisnet.org/acis2018/75 [accessed 2023-10-12]
  19. Milosevic Z. Ethics in Digital Health: A Deontic Accountability Framework. Presented at: 2019 IEEE 23rd International Enterprise Distributed Object Computing Conference (EDOC); October 28-31, 2019; Paris, France. [CrossRef]
  20. Abràmoff MD, Tobey D, Char DS. Lessons learned about autonomous AI: finding a safe, efficacious, and ethical path through the development process. Am J Ophthalmol. Jun 2020;214:134-142. [FREE Full text] [CrossRef] [Medline]
  21. Joerin A, Rauws M, Fulmer R, Black V. Ethical artificial intelligence for digital health organizations. Cureus. Mar 07, 2020;12(3):e7202. [FREE Full text] [CrossRef] [Medline]
  22. Carter SM, Rogers W, Win KT, Frazer H, Richards B, Houssami N. The ethical, legal and social implications of using artificial intelligence systems in breast cancer care. Breast. Feb 2020;49:25-32. [FREE Full text] [CrossRef] [Medline]
  23. Goirand M, Austin E, Clay-Williams R. Implementing ethics in healthcare AI-based applications: a scoping review. Sci Eng Ethics. Sep 03, 2021;27(5):61. [CrossRef] [Medline]
  24. Channa R, Wolf R, Abramoff MD. Autonomous artificial intelligence in diabetic retinopathy: from algorithm to clinical application. J Diabetes Sci Technol. May 04, 2021;15(3):695-698. [FREE Full text] [CrossRef] [Medline]
  25. Kretzschmar K, Tyroll H, Pavarini G, Manzini A, Singh I, NeurOx Young People’s Advisory Group. Can your phone be your therapist? Young people's ethical perspectives on the use of fully automated conversational agents (chatbots) in mental health support. Biomed Inform Insights. Mar 05, 2019;11:1178222619829083. [FREE Full text] [CrossRef] [Medline]
  26. Hasenauer R, Belviso C, Ehrenmueller I. New Efficiency: Introducing Social Assistive Robots in Social Eldercare Organizations. Presented at: 2019 IEEE International Symposium on Innovation and Entrepreneurship (TEMS-ISIE); October 24-26, 2019; Hangzhou, China. [CrossRef]
  27. Yirmibesoglu Erkal E, Akpınar A, Erkal H. Ethical evaluation of artificial intelligence applications in radiotherapy using the Four Topics Approach. Artif Intell Med. May 2021;115:102055. [CrossRef] [Medline]
  28. Anderson M, Anderson SL, Berenz V. A Value-Driven Eldercare Robot: Virtual and Physical Instantiations of a Case-Supported Principle-Based Behavior Paradigm. Proceedings of the IEEE. 2019;107(3):526-540. [CrossRef]
  29. Fiske A, Henningsen P, Buyx A. Your robot therapist will see you now: ethical implications of embodied artificial intelligence in psychiatry, psychology, and psychotherapy. J Med Internet Res. May 09, 2019;21(5):e13216. [FREE Full text] [CrossRef] [Medline]
  30. Shen FX, Silverman BC, Monette P, Kimble S, Rauch SL, Baker JT. An ethics checklist for digital health research in psychiatry: viewpoint. J Med Internet Res. Feb 09, 2022;24(2):e31146. [FREE Full text] [CrossRef] [Medline]
  31. Alzghoul BI, Abdullah NAC. Pain management practices by nurses: an application of the Knowledge, Attitude and Practices (KAP) model. Glob J Health Sci. Oct 26, 2015;8(6):154-160. [FREE Full text] [CrossRef] [Medline]
  32. Habib MA, Dayyab FM, Iliyasu G, Habib AG. Knowledge, attitude and practice survey of COVID-19 pandemic in Northern Nigeria. PLoS One. 2021;16(1):e0245176. [FREE Full text] [CrossRef] [Medline]
  33. Dhakal R, Paudel S, Paudel D. Knowledge, attitude, and practice regarding testicular cancer and testicular self-examination among male students pursuing bachelor's degree in Bharatpur Metropolitan City, Chitwan, Nepal. Biomed Res Int. 2021;2021:1802031. [FREE Full text] [CrossRef] [Medline]
  34. Hulme M. “Gaps” in climate change knowledge: do they exist? Can they be filled? Environ Humanit. 2018;10(1):330-337. [FREE Full text] [CrossRef]
  35. Ajzen I, Fishbein M. Attitudes and the attitude-behavior relation: reasoned and automatic processes. Eur Rev Soc Psychol. Apr 15, 2011;11(1):1-33. [CrossRef]
  36. Bourdieu P. The Logic of Practice. Redwood City, CA. Stanford University Press; 1990.
  37. Polit DF, Beck CT, Owen SV. Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res Nurs Health. Aug 2007;30(4):459-467. [CrossRef] [Medline]
  38. Kennedy-Shaffer L, Qiu X, Hanage WP. Snowball sampling study design for serosurveys early in disease outbreaks. Am J Epidemiol. Sep 01, 2021;190(9):1918-1927. [FREE Full text] [CrossRef] [Medline]
  39. Sheu S, Wei I, Chen C, Yu S, Tang F. Using snowball sampling method with nurses to understand medication administration errors. J Clin Nurs. Feb 2009;18(4):559-569. [CrossRef] [Medline]
  40. Valerio MA, Rodriguez N, Winkler P, Lopez J, Dennison M, Liang Y, et al. Comparing two sampling methods to engage hard-to-reach communities in research priority setting. BMC Med Res Methodol. Oct 28, 2016;16(1):146. [FREE Full text] [CrossRef] [Medline]
  41. Wenjuanxing. URL: https://www.wjx.cn [accessed 2023-10-13]
  42. Gagnier JJ, Lai J, Mokkink LB, Terwee CB. COSMIN reporting guideline for studies on measurement properties of patient-reported outcome measures. Qual Life Res. Aug 05, 2021;30(8):2197-2218. [CrossRef] [Medline]
  43. Tinsley H, Tinsley D. Uses of factor analysis in counseling psychology research. J Couns Psychol. Oct 1987;34(4):414-424. [FREE Full text] [CrossRef]
  44. Wu M. Practice of questionnaire statistical analysis: SPSS operation and application. Chongqing, China. Chongqing University Press; 2010.
  45. Di Iorio CK. Measurement in Health Behavior: Methods for Research and Evaluation. New York, NY. John Wiley & Sons; 2006.
  46. Atanasova S, Petric G. Collective empowerment in online health communities: scale development and empirical validation. J Med Internet Res. Nov 20, 2019;21(11):e14392. [FREE Full text] [CrossRef] [Medline]
  47. Bollen K. Structural Equations With Latent Variables. New York, NY. John Wiley & Sons; 1989.
  48. Tabachnick B, Fidell LS. Using Multivariate Statistics (6th edition). Boston, MA. Pearson; 2012.
  49. Zamanzadeh V, Ghahramanian A, Rassouli M, Abbaszadeh A, Alavi-Majd H, Nikanfar A. Design and implementation content validity study: development of an instrument for measuring patient-centered communication. J Caring Sci. Jun 2015;4(2):165-178. [FREE Full text] [CrossRef] [Medline]
  50. Gisev N, Bell J, Chen T. Interrater agreement and interrater reliability: key concepts, approaches, and applications. Res Social Adm Pharm. 2013;9(3):330-338. [FREE Full text] [CrossRef] [Medline]
  51. Morley J, Floridi L, Kinsey L, Elhalal A. From what to how: an initial review of publicly available AI ethics tools, methods and research to translate principles into practices. Sci Eng Ethics. Aug 2020;26(4):2141-2168. [FREE Full text] [CrossRef] [Medline]
  52. Barnea MF, Schwartz SH. Values and voting. Political Psychol. Jun 28, 2008;19(1):17-40. [CrossRef]
  53. Graham J, Haidt J, Koleva S, Motyl M, Iyer R, Wojcik SP, et al. Chapter Two - Moral Foundations Theory: The Pragmatic Validity of Moral Pluralism. Advances in Experimental Social Psychology. 2013;47:55-130. [CrossRef]
  54. Bardi A, Schwartz SH. Values and behavior: strength and structure of relations. Pers Soc Psychol Bull. Oct 02, 2003;29(10):1207-1220. [CrossRef] [Medline]
  55. Shaw J, Rudzicz F, Jamieson T, Goldfarb A. Artificial intelligence and the implementation challenge. J Med Internet Res. Jul 10, 2019;21(7):e13659. [FREE Full text] [CrossRef] [Medline]
  56. Zheng B, Wu M, Zhu S, Zhou H, Hao X, Fei F, et al. Attitudes of medical workers in China toward artificial intelligence in ophthalmology: a comparative survey. BMC Health Serv Res. Oct 09, 2021;21(1):1067. [FREE Full text] [CrossRef] [Medline]


AI: artificial intelligence
CFA: confirmatory factor analysis
CFI: comparative fit index
COSMIN: COnsensus-based Standards for the selection of health status Measurement INstruments
CR: critical ratio
EFA: exploratory factor analysis
I-CVI: item-level content validity index
IRR: interrater reliability
KAP: Knowledge-Attitude-Practice
KAP-EI: Knowledge-Attitude-Practice in Ethics Implementation
KMO: Kaiser-Meyer-Olkin
RMSEA: root-mean-square error of approximation
S-CVI/Ave: average scale–level content validity index
SRMR: standardized root-mean-square residual
TLI: Tucker Lewis index


Edited by G Eysenbach, T Leung; submitted 26.08.22; peer-reviewed by KZ Yap, D Juzwishin, R Meng; comments to author 20.12.22; revised version received 31.01.23; accepted 24.09.23; published 26.10.23.

Copyright

©Xiaobo Zhang, Ying Gu, Jie Yin, Yuejie Zhang, Cheng Jin, Weibing Wang, Albert Martin Li, Yingwen Wang, Ling Su, Hong Xu, Xiaoling Ge, Chengjie Ye, Liangfeng Tang, Bing Shen, Jinwu Fang, Daoyang Wang, Rui Feng. Originally published in JMIR Formative Research (https://formative.jmir.org), 26.10.2023.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.