This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
The Data and Connectivity COVID-19 Vaccines Pharmacovigilance (DaC-VaP) UK-wide collaboration was created to monitor vaccine uptake and effectiveness and provide pharmacovigilance using routine clinical and administrative data. To monitor these, pooled analyses may be needed. However, variation in terminologies presents a barrier, as England uses the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), while the rest of the United Kingdom uses the Read v2 terminology in primary care. The availability of data sources is also not uniform across the United Kingdom.
This study aims to use the concept mappings in the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to identify concepts recorded in common across the UK nations and to report these in a repeated cross-sectional study. We planned to do this for vaccine coverage and 2 adverse events of interest (AEIs), cerebral venous sinus thrombosis (CVST) and anaphylaxis. We identified concept mappings to SNOMED CT, Read v2, the World Health Organization's International Classification of Diseases, Tenth Revision (ICD-10), and the Dictionary of Medicines and Devices (dm+d).
Exposures and outcomes of interest to DaC-VaP for pharmacovigilance studies were selected. Mappings of these variables to different terminologies used across the United Kingdom's devolved nations' health services were identified from the Observational Health Data Sciences and Informatics (OHDSI) Automated Terminology Harmonization, Extraction, and Normalization for Analytics (ATHENA) online browser. Lead analysts from each nation then confirmed or added to the mappings identified. These mappings were then used to report AEIs in a common format. We reported rates for windows of 0-2 and 3-28 days postvaccination every 28 days.
We listed the mappings between Read v2, SNOMED CT, ICD-10, and dm+d. For vaccine exposure, we found clear mappings from OMOP to our clinical terminologies, though dm+d had codes not listed by OMOP at the time of searching. We identified lists of CVST and anaphylaxis codes. For CVST, we had to use the broader concept of cerebral venous thrombosis to include Read v2. We identified 56 SNOMED CT codes, of which we selected 47 (84%), and 15 Read v2 codes. For anaphylaxis, our refined search identified 60 SNOMED CT codes and 9 Read v2 codes, of which we selected 10 (17%) and 4 (44%), respectively, to include in our repeated cross-sectional studies.
This approach enables the use of mappings to different terminologies within the OMOP CDM without the need to catalogue an entire database. However, Read v2 has less granular concepts than some terminologies, such as SNOMED CT. Additionally, the OMOP CDM cannot compensate for limitations in the clinical coding system. Neither Read v2 nor ICD-10 is sufficiently granular to enable CVST to be specifically flagged. Hence, any pooled analysis will have to be at the less specific level of cerebral venous thrombosis. Overall, the mappings within this CDM are useful, and our method could be used for rapid collaborations where there are only a limited number of concepts to pool.
COVID-19 vaccination is the best option for controlling the current pandemic, with data about uptake and pharmacovigilance (the science and activities relating to the detection, assessment, understanding, and prevention of any side effects of a vaccine or drug) therefore essential for monitoring its progress. Since COVID-19 was first identified in Wuhan, China, at the end of 2019, the virus has spread worldwide, with more than 190 million confirmed cases and over 5.9 million COVID-19–related deaths as of February 28, 2022 [
Medical record systems enable information flow beyond organizational boundaries. General practices with their own information technology (IT) systems record millions of patient interactions daily. The Data and Connectivity COVID-19 Vaccines Pharmacovigilance (DaC-VaP) collaboration was formed to explore vaccine effectiveness, uptake, and safety across the United Kingdom. A challenge for this partnership is the heterogeneity of routine primary care data due to variation in the clinical terminologies used across the 4 UK nations. England has transitioned to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) and has not updated Read v2 since April 2016 [
The use of a common data model (CDM) could provide a solution to a problem faced by many looking to aggregate data from different sources [
Other groups, including the National COVID Cohort Collaborative (N3C) in the United States, have faced challenges in achieving harmonization between data sources. Although its approach was customized and drew together data from different sources and CDMs, the N3C also extensively used OMOP [
Our primary aim is to assess the feasibility of using the OMOP CDM to report the incidence of exemplar AEIs following COVID-19 vaccination across the DaC-VaP collaboration and report these as repeated cross-sectional analyses. The objectives of the study include the following:
To test the validity of the mappings within the OMOP Automated Terminology Harmonization, Extraction, and Normalization for Analytics (ATHENA) online browser for our exemplar AEIs. ATHENA is a repository in which the latest OMOP CDM vocabularies and mappings are hosted and can be searched and downloaded.
To report the vaccine uptake rate across the United Kingdom, stratified by age group, sex, vaccine type, and ethnicity, with the goal of reporting a UK-wide vaccination uptake rate. We differentiated between people who had received their first dose and those who had received their second dose.
To report the rates for England, Scotland, Wales, and Northern Ireland and overall for the 2 exemplar AEIs, cerebral venous sinus thrombosis (CVST) and anaphylaxis.
We used the OMOP ATHENA online browser to identify mappings to SNOMED CT, Read v2 terminology, and the
The data were drawn from the data sources of the 4 DaC-VaP partners.
The data from England are from the Oxford Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC), 1 of Europe’s oldest surveillance networks. It is now in its 53rd season of operation, working alongside Public Health England (PHE) [
The Early Pandemic Evaluation and Enhanced Surveillance of COVID-19 (EAVE II) data set, covering 5.4 million people registered in general practices in Scotland, tracks COVID-19 within the Scottish population. This effort has produced findings used by the Scottish and UK governments to respond to the COVID-19 pandemic.
The Secure Anonymised Information Linkage (SAIL) databank is a trusted research environment (TRE) and Wales-wide research resource focused on improving health, well-being, and services. It includes primary care general practitioner (GP) records, secondary care hospital data, and emergency services data, along with a range of administrative, governmental, education, social care, and specialist audit, register, and services data. These data are also used by the National Institute for Health and Care Excellence (NICE) to shape policies that cover England and Wales. SAIL is powered by the Secure e-Research Platform (SeRP), a technology platform and service that supports the SAIL databank and other TREs and platforms in the United Kingdom and worldwide.
In Northern Ireland, data are accessed through the Business Services Organisation Honest Broker Service (HBS), which also uses SeRP. The HBS provides GP registration data (but not primary care clinical records), the enhanced prescribing database, emergency department attendances, hospital admissions, COVID-19 testing, and the vaccine management system.
We performed repeated cross-sectional reports of the incidence of the AEIs in the vaccinated population in a single time interval postvaccination.
We searched the OMOP CDM for the concepts of interest using the ATHENA online browser. These concepts were demographic details, COVID-19 vaccine uptake, and the AEIs. The demographic data of interest were age and sex, and socioeconomic status (SES) was divided into quintiles (quintile 5 being the most deprived). We also collected data on obesity (defined as the latest BMI≥30 or a coded diagnosis of obesity; BMI is a measure of weight relative to height, calculated as weight in kilograms divided by the square of height in meters) and smoking status (current, ex-, or never smoked).
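The obesity definition above can be expressed as a simple rule. The following is a minimal sketch, assuming each person's BMI measurements arrive as dated values and that a separate flag records whether any obesity diagnosis code was found; the record structure and the function name are hypothetical, not part of the study's code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BmiRecord:
    """One BMI measurement (hypothetical structure for illustration)."""
    recorded_on: date
    value: float


def is_obese(bmi_records: list[BmiRecord], has_obesity_code: bool) -> bool:
    """Return True if an obesity code exists or the latest BMI is >= 30."""
    if has_obesity_code:
        return True
    if not bmi_records:
        return False
    # Only the most recent measurement counts, per the "latest BMI" rule.
    latest = max(bmi_records, key=lambda r: r.recorded_on)
    return latest.value >= 30
```

For example, a person whose BMI fell from 32.0 to 28.5 and who has no obesity code would not be classed as obese, because only the latest measurement is used.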
We compared the linkages flagged by the OMOP ATHENA online browser with those currently used across the 4 nations. We reported any differences and reached a consensus on which terms/codes would be used in each nation.
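Comparing the OMOP-flagged codes with each nation's existing code list amounts to set arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the Read v2 codes in the example are borrowed from the anaphylaxis codes listed for Wales purely as sample inputs, not as the study's actual comparison. It also shows how the percentages of codes selected (eg, 47 of 56 SNOMED CT codes for CVST) are derived.

```python
def compare_code_lists(omop_codes: set[str], local_codes: set[str]) -> dict[str, set[str]]:
    """Split codes into those found by both sources and those unique to each."""
    return {
        "agreed": omop_codes & local_codes,
        "omop_only": omop_codes - local_codes,
        "local_only": local_codes - omop_codes,
    }


def percent_selected(n_selected: int, n_identified: int) -> float:
    """Share of identified codes retained after review, as a percentage."""
    return 100 * n_selected / n_identified


# Illustrative inputs only.
result = compare_code_lists({"SN50.00", "SN50.11"}, {"SN50.00", "14M5.00"})
print(result["agreed"], result["omop_only"], result["local_only"])
print(round(percent_selected(47, 56)))  # 47 of 56 SNOMED CT codes for CVST
```

The "omop_only" and "local_only" sets are the differences that would need to be discussed before reaching a consensus code list.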
OMOP also maps to the
Each DaC-VaP partner nation restricted its ATHENA search, using the VOCAB (vocabulary) tool, to SNOMED CT or Read v2, as relevant, and dm+d.
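Restricting a search to particular vocabularies can also be done offline on a vocabulary download from ATHENA. The sketch below assumes a tab-delimited CONCEPT file with `concept_id`, `concept_name`, and `vocabulary_id` columns, and vocabulary identifiers such as `SNOMED`, `Read`, and `dm+d`; these identifiers should be checked against the actual download. The function name is hypothetical and the concept rows in the example are invented.

```python
import csv
import io
from typing import Iterable


def filter_concepts(lines: Iterable[str], vocabularies: set[str], name_fragment: str) -> list[dict[str, str]]:
    """Keep concepts from the chosen vocabularies whose name contains name_fragment."""
    # QUOTE_NONE avoids misreading quote characters that can appear in concept names.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    term = name_fragment.lower()
    return [
        row for row in reader
        if row["vocabulary_id"] in vocabularies and term in row["concept_name"].lower()
    ]


# Invented sample rows for illustration.
sample = io.StringIO(
    "concept_id\tconcept_name\tvocabulary_id\n"
    "1\tAnaphylaxis\tSNOMED\n"
    "2\tAnaphylactic reaction\tRead\n"
    "3\tAnaphylaxis\tMedDRA\n"
)
hits = filter_concepts(sample, {"SNOMED", "Read"}, "anaphyla")
print([row["concept_id"] for row in hits])
```

A nation using Read v2 would pass its own vocabulary set, mirroring the per-nation restriction described above.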
The medication dictionary dm+d is made up of a hierarchy of generic terms (termed “virtual”) and real prescribable items (termed “actual”). The dm+d use case for the COVID-19 vaccine is set out below:
Virtual therapeutic moiety (VTM): This is the top of the hierarchy and indicates the type of therapy; in this use case, this is the COVID-19 vaccine.
Virtual medicinal product (VMP): This is the next level of notional product and allows vaccine types to be distinguished (COVID-19 vaccine type: messenger RNA [mRNA] or vector); in our use case, mRNA vaccines and their manufacturers were to be distinguished from recombinant vaccines.
Virtual medicinal product pack (VMPP): This is the notional product pack for the VMP, for example, a 6-dose multidose vial.
Actual medicinal products (AMPs): These are the medicinal products prescribed to an individual; for example, vaccine brands (Pfizer-BioNTech, Moderna).
Actual medicinal product packs (AMPPs): These are the distribution packs of medicinal products.
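The dm+d levels listed above form a hierarchy that can be modeled as a tree. The following sketch is a simplification under stated assumptions: in dm+d an AMPP relates to both its AMP and its VMPP, whereas here each node has a single parent. All product names are placeholders rather than real dm+d entries, and the class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DmdNode:
    """A simplified dm+d node; level is one of VTM, VMP, VMPP, AMP, AMPP."""
    level: str
    name: str
    children: list["DmdNode"] = field(default_factory=list)

    def add(self, child: "DmdNode") -> "DmdNode":
        self.children.append(child)
        return child


def count_by_level(root: DmdNode) -> dict[str, int]:
    """Count nodes at each level with an iterative depth-first walk."""
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.level] += 1
        stack.extend(node.children)
    return dict(counts)


# Placeholder names, not real dm+d products.
vtm = DmdNode("VTM", "COVID-19 vaccine")
vmp = vtm.add(DmdNode("VMP", "Example mRNA COVID-19 vaccine"))
vmp.add(DmdNode("VMPP", "Example 6-dose multidose vial"))
amp = vmp.add(DmdNode("AMP", "Example brand of the mRNA vaccine"))
amp.add(DmdNode("AMPP", "Example distribution pack"))
print(count_by_level(vtm))
```

Counting nodes by level in this way corresponds to the per-level code counts reported for each vaccine.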
The analyst team from each nation reported whether they included all the terms identified from their search of OMOP for mapping to their terminology using ATHENA and whether they added others they routinely use.
We ran these searches monthly to produce a monthly output of vaccine coverage by demographic group and reported the incidence of our AEIs.
Vaccine uptake was reported as the percentage of adults vaccinated per nation and stratified by age, sex, smoking status, and obesity. We reported 2 exemplars of AEIs following vaccination using 2 time windows (0-2 and 3-28 days).
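The two risk windows and the uptake percentage can be sketched as below. This assumes that day 0 is the day of vaccination and that both window bounds are inclusive; the source does not spell out these conventions, and the function names are hypothetical.

```python
from datetime import date
from typing import Optional


def risk_window(vaccination_date: date, event_date: date) -> Optional[str]:
    """Assign an AEI to the 0-2 or 3-28 day window, or None if outside both."""
    days_after = (event_date - vaccination_date).days
    if 0 <= days_after <= 2:
        return "0-2 days"
    if 3 <= days_after <= 28:
        return "3-28 days"
    return None


def uptake_percent(n_vaccinated: int, n_adults: int) -> float:
    """Percentage of the adult population vaccinated."""
    return 100 * n_vaccinated / n_adults


print(risk_window(date(2021, 1, 5), date(2021, 1, 7)))
print(risk_window(date(2021, 1, 5), date(2021, 2, 2)))
```

Events before vaccination or after day 28 fall outside both windows and are not counted.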
DaC-VaP partners ran cross-sectional studies for the previous 28 days.
The first search started on December 8, 2020 (first dose of the Pfizer vaccine given in the United Kingdom), and the second search on January 5, 2021. These ran in 28-day intervals (February 2, March 2, March 30, April 27-August 17, 2021).
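The search dates follow directly from the 28-day interval. A short sketch (the function name is hypothetical) reproduces the dates listed above, which offers a quick consistency check on the reporting schedule.

```python
from datetime import date, timedelta


def search_dates(start: date, end: date, interval_days: int = 28) -> list[date]:
    """Return every search date from start to end (inclusive) at a fixed interval."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=interval_days)
    return dates


for d in search_dates(date(2020, 12, 8), date(2021, 8, 17)):
    print(d.isoformat())
```

Run as written, this yields 10 search dates, starting on December 8, 2020, and ending on August 17, 2021, and including January 5, February 2, March 2, March 30, and April 27, 2021.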
The cross sections included all individuals who were registered with general practices on the date of vaccination and remained registered for 28 days. The outputs were reported in the following age bands: <16 years, 16-39 years, 40-64 years, and 65 years and older. Mortality in the postvaccination period was also reported for those with AEIs.
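The inclusion rule and age bands can be sketched as follows. The sketch assumes that age is computed on the vaccination date and that a missing deregistration date means the person is still registered; neither convention is stated in the source, and all names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def age_on(birth_date: date, on: date) -> int:
    """Age in completed years on a given date."""
    before_birthday = (on.month, on.day) < (birth_date.month, birth_date.day)
    return on.year - birth_date.year - int(before_birthday)


def age_band(age_years: int) -> str:
    """Map an age to the reporting bands used in the outputs."""
    if age_years < 16:
        return "<16"
    if age_years < 40:
        return "16-39"
    if age_years < 65:
        return "40-64"
    return "65+"


def is_included(registered_from: date, registered_to: Optional[date],
                vaccination_date: date, follow_up_days: int = 28) -> bool:
    """Registered on the vaccination date and for the following 28 days."""
    if registered_from > vaccination_date:
        return False
    follow_up_end = vaccination_date + timedelta(days=follow_up_days)
    return registered_to is None or registered_to >= follow_up_end


print(age_band(age_on(date(1960, 6, 1), date(2021, 5, 31))))
```

In the example, a person born on June 1, 1960, is 60 on May 31, 2021, and so falls in the 40-64 band.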
We reported by vaccine brand, including unknown vaccine brands. We presumed that an unknown vaccine brand in December 2020 was Pfizer-BioNTech, as Oxford-AstraZeneca was unavailable until January 2021 and other vaccine types became available later.
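The brand assumption for December 2020 is a simple rule; a sketch (the function name is hypothetical) might look like this. Records outside December 2020 with no brand stay labeled as unknown, matching the decision to report unknown vaccines.

```python
from datetime import date
from typing import Optional


def assumed_brand(recorded_brand: Optional[str], vaccination_date: date) -> str:
    """Return the recorded brand, or apply the December 2020 Pfizer-BioNTech assumption."""
    if recorded_brand:
        return recorded_brand
    # Per the study's assumption, only Pfizer-BioNTech was in use in December 2020.
    if (vaccination_date.year, vaccination_date.month) == (2020, 12):
        return "Pfizer-BioNTech"
    return "Unknown"
```

A recorded brand always takes precedence; the assumption only fills missing values.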
We aimed to include statistical reporting and disproportionality analysis metrics, namely the proportional reporting ratio (PRR) and the reporting odds ratio (ROR). We also planned to use a Bayesian method, which provides a framework for combining prior information/knowledge with observed data in a conceptually transparent way. Our aim was to use the information component, $\mathrm{IC} = \log_2 \frac{O + 1/2}{E + 1/2}$, where $O$ is the observed and $E$ the expected number of cases.
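The three measures can be computed from a 2×2 table of counts. The sketch below uses the conventional layout (a: vaccine and AEI; b: vaccine and other events; c: other exposures and AEI; d: other exposures and other events) and takes the expected count as E = (a + b)(a + c)/N; the source does not state how E was derived, so this is an assumption. The function name and the counts are hypothetical.

```python
import math


def disproportionality(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """PRR, ROR, and IC from a 2x2 table.

    Raises ZeroDivisionError if b, c, or either row total (a + b, c + d) is zero.
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    expected = (a + b) * (a + c) / n
    # Shrunk information component: log2((O + 1/2) / (E + 1/2)).
    ic = math.log2((a + 0.5) / (expected + 0.5))
    return {"PRR": prr, "ROR": ror, "IC": ic}


print(disproportionality(a=10, b=90, c=20, d=880))
```

With these made-up counts, E = 3, so IC = log2(10.5/3.5) = log2(3), about 1.58; the +1/2 terms shrink the IC toward 0 when counts are small.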
The DaC-VaP collaborators each retained ethical control of their own data. No data were reported that might risk identifying individuals. Where fewer than 5 individuals were in a group, the count was reported as <5. This study aims to demonstrate the potential of the DaC-VaP collaboration to report outcomes of interest.
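The small-number rule can be applied to each output cell before release. This sketch follows the text literally, so a count of 0 is also shown as <5; some governance policies treat zeros differently, so that choice is an assumption. The function names are hypothetical.

```python
def suppress(count: int, threshold: int = 5) -> str:
    """Return the count as text, masking values below the threshold."""
    return f"<{threshold}" if count < threshold else str(count)


def suppress_table(counts: dict[str, int], threshold: int = 5) -> dict[str, str]:
    """Apply small-number suppression to every cell of a summary table."""
    return {group: suppress(n, threshold) for group, n in counts.items()}


print(suppress_table({"16-39": 3, "40-64": 12, "65+": 0}))
```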
The University of Oxford complies with the General Data Protection Regulation and the NHS Digital Data Security and Protection Policy [
Ethical permission for this study was granted by the South-East Scotland Research Ethics Committee (#314 02; 12/SS/0201). The Public Benefit and Privacy Panel Committee of Public Health Scotland (#315) approved the linkage and analysis of the de-identified data sets for this project (#1920-0279).
This study made use of anonymized data held in the SAIL databank. We used data provided by patients and collected by the NHS as part of their care and support. All research conducted was completed after the permission and approval of the SAIL independent Information Governance Review Panel (IGRP; project number 0911).
Northern Irish data were accessed from the Business Services Organisation HBS, which provided de-identified linked data via SeRP. All research conducted was approved by the HBS Governance Board (HBSGB; project number 064).
We initially reported whether the data items or clinical concepts required for the study existed within the OMOP CDM and whether there were mappings to SNOMED CT, Read v2, or dm+d (
We then reported the components by terminology (
Of most importance are the AEIs. For CVST, the specific concept exists within OMOP, and it also exists in SNOMED CT: SNOMED concept IDs 95455008 and 19522900 from CVST (concept ID 195229008). For anaphylaxis, SNOMED CT and the ATHENA online browser showed 130 and 161 items, respectively. SNOMED CT and OMOP had 16 and 15 items, respectively. Read v2 codes were generic for CVST and anaphylaxis. Of these, those relevant to vaccination are shown in
Variables included in the CDMa conceptual mapping exercise, with counts subsequently reported monthly.
Data item | OMOPb (Y=yes, N=no) | Data source |
Gender | Y | Standardized sex (gender) codes are used in OMOP CDM mapping. Date of birth and age concepts also exist. |
Age band | Y | Date of birth and age concepts exist in OMOP. |
The Index of Multiple Deprivation (IMD; a set of relative measures of deprivation for small areas [lower-layer super-output areas] across England, based on 7 domains of deprivation) in England, the Scottish Index of Multiple Deprivation (SIMD), the Welsh Index of Multiple Deprivation (WIMD), and the Northern Ireland Multiple Deprivation Measure (NIMDM) | N | Does not exist in the OMOP CDM. It can be introduced as a custom mapping in all UK databases within OMOP. It is to be harmonized across the DaC-VaPd data partners (quintile 5 most deprived, quintile 1 least deprived). |
BMI>30/obesity | Y | Will be found in the measurement table or from a diagnosis of obesity. |
Smoking status | Y | Will be found in the observation table. |
Vaccine type | Y | Will be found in the drug, procedure, and event tables. For England, the source codes are dm+de or SNOMED CTf. |
Vaccine dose | Y | In vaccine administration. |
Vaccination date | Y | Date of event when the event is COVID-19 vaccination. |
CVSTh | Y | Will be found in the condition table. It is mapped to SNOMED CT or Read v2. We did not include medications in this feasibility study. |
Anaphylaxis | Y | Will be found in the condition table. It is mapped to SNOMED CT or Read v2. We did not include medications in this feasibility study. |
aCDM: common data model.
bOMOP: Observational Medical Outcomes Partnership.
cSES: socioeconomic status.
dDaC-VaP: Data and Connectivity COVID-19 Vaccines Pharmacovigilance.
edm+d: Dictionary of Medicines and Devices.
fSNOMED CT: Systematized Nomenclature of Medicine Clinical Terms.
gAEI: adverse event of interest.
hCVST: cerebral venous sinus thrombosis.
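Because each nation publishes its own deprivation index, harmonizing to a common quintile scale (quintile 5 most deprived) may require reversing the source coding. The sketch assumes that a nation's quintiles are supplied as 1 to 5 with 1 as the most deprived unless flagged otherwise; that convention should be checked for each index, and the function name is hypothetical.

```python
def harmonize_quintile(source_quintile: int, most_deprived_is_1: bool = True) -> int:
    """Return a quintile on the DaC-VaP scale, where 5 is the most deprived."""
    if not 1 <= source_quintile <= 5:
        raise ValueError(f"quintile must be between 1 and 5, got {source_quintile}")
    # Reverse the scale when the source index ranks the most deprived areas as 1.
    return 6 - source_quintile if most_deprived_is_1 else source_quintile


print([harmonize_quintile(q) for q in range(1, 6)])
```

Passing `most_deprived_is_1=False` leaves an index that already uses the target direction unchanged.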
Study concepts identified within OMOPa and ICD-10b, and any mapping to SNOMED CTc, Read v2, and dm+dd.
Primary term | OMOP ATHENAe concept ID | ICD-10 | dm+d | SNOMED CT | Read v2 |
CVSTf (nonstandard to standard map) | 10083037 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) and P3.344 (CVST in hospitalized adults) | N/Ag | 4102202 | N/A |
CVST (standard to nonstandard map) | 10083037 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) | N/A | 4102202 | N/A |
Anaphylaxis (localized) | 4034658 | T78.2 (anaphylactic shock unspecified) | N/A | 40316757 (systemic), 42536383 (anaphylactic shock), 4294049 (sudden onset), 2084167 (allergic), 4084167 (acute allergic reaction), 441202 (nonstandard to standard OMOP map), 441202, 40640468 (generalized) | N/A |
Anaphylaxis | 441202 | 45537000 (anaphylactic shock unspecified) | N/A | 40316757 (systemic), 40640468 (generalized) | N/A |
Anaphylaxis (anaphylactic shock due to adverse effect of correct medicinal substance properly administered) | 45376003 | 45537000 (anaphylactic shock unspecified), 19746 | N/A | 4254051 (drug or medicament), 441297 (adverse reaction to drug) | N/A |
Anaphylaxis (drug induced) | 241937000 | 45537000 (anaphylactic shock unspecified) | N/A | 46274027, 4084168 (nonstandard OMOP) | N/A |
Anaphylaxis (procedure) | 42537947 | 45537000 (anaphylactic shock unspecified) | N/A | 44807057 (anaphylaxis care), 4021200 (care of patient states), 42537947 (nonstandard to standard OMOP map), 44807057 (standard to nonstandard OMOP map) | N/A |
Anaphylaxis (due to substance) | 4221182 | 45537000 (anaphylactic shock unspecified) | N/A | 4022675 (substance), 4294049 (sudden onset), 441202 (anaphylaxis), 4221182 (nonstandard to standard OMOP map), 4083868 (standard to nonstandard OMOP map) | N/A |
aOMOP: Observational Medical Outcomes Partnership.
bICD-10: International Classification of Diseases, Tenth Revision.
cSNOMED CT: Systematized Nomenclature of Medicine Clinical Terms.
ddm+d: Dictionary of Medicines and Devices.
eATHENA: Automated Terminology Harmonization, Extraction, and Normalization for Analytics.
fCVST: cerebral venous sinus thrombosis.
gN/A: not applicable.
Study concepts identified within OMOPa and any mapping to ICD-10b, dm+dc, SNOMED CTd, or Read v2 in Scotland.
Primary term | OMOP ATHENAe concept ID | ICD-10 | dm+d | SNOMED CT | Read v2 |
CVSTf (nonstandard to standard map) | 10083037 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) and P3.344 (CVST in hospitalized adults) | N/Ag | N/A | N/A |
CVST (standard to nonstandard map) | 10083037 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) | N/A | N/A | N/A |
Cerebral vein thrombosis | 45446702 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) | N/A | N/A | G67A |
Thrombosis of central nervous system venous sinus NOS | 3534267 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) | N/A | N/A | F051z |
Thrombophlebitis of central nervous system venous sinuses | 4100223 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) | N/A | N/A | F053 |
Nonpyogenic venous sinus thrombosis | 45456755 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) | N/A | N/A | G676 |
Anaphylaxis (localized) | 4034658 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis | 441202 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis (anaphylactic shock due to adverse effect of correct medicinal substance properly administered) | 45376003 | 45537000 (anaphylactic shock unspecified), 19746 | N/A | N/A | N/A |
Anaphylaxis (drug induced) | 241937000 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis (procedure) | 42537947 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis (due to substance) | 4221182 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
aOMOP: Observational Medical Outcomes Partnership.
bICD-10: International Classification of Diseases, Tenth Revision.
cdm+d: Dictionary of Medicines and Devices.
dSNOMED CT: Systematized Nomenclature of Medicine Clinical Terms.
eATHENA: Automated Terminology Harmonization, Extraction, and Normalization for Analytics.
fCVST: cerebral venous sinus thrombosis.
gN/A: not applicable.
Study concepts identified within OMOPa and any mapping to ICD-10b, dm+dc, SNOMED CTd, or Read v2 in Wales.
Primary term | OMOP ATHENAe concept ID | ICD-10 | dm+d | SNOMED CT | Read v2 |
CVSTf (nonstandard to standard map) | 10083037 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) and P3.344 (CVST in hospitalized adults) | N/Ag | N/A | N/A |
CVST (standard to nonstandard map) | 10083037 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) | N/A | N/A | N/A |
Anaphylaxis (localized) | 4034658 | T78.2 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis | 441202 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | SN50.11 |
Anaphylaxis (anaphylactic shock due to adverse effect of correct medicinal substance properly administered) | 45376003 | 45537000 (anaphylactic shock unspecified), 19746 | N/A | N/A | SN50110 |
Anaphylaxis (drug induced) | 241937000 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | SN50.00, 14M5.00 |
Anaphylaxis (procedure) | 42537947 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | SN50.11, SN50.00, 14M5.00 |
Anaphylaxis (due to substance) | 4221182 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | SN50.11, SN50.00, 14M5.00 |
aOMOP: Observational Medical Outcomes Partnership.
bICD-10: International Classification of Diseases, Tenth Revision.
cdm+d: Dictionary of Medicines and Devices.
dSNOMED CT: Systematized Nomenclature of Medicine Clinical Terms.
eATHENA: Automated Terminology Harmonization, Extraction, and Normalization for Analytics.
fCVST: cerebral venous sinus thrombosis.
gN/A: not applicable.
Study concepts identified within OMOPa and any mapping to ICD-10b, dm+dc, SNOMED CTd, or Read v2 in Northern Ireland.
Primary term | OMOP ATHENAe concept ID | ICD-10 | dm+d | SNOMED CT | Read v2 |
CVSTf (nonstandard to standard map) | 10083037 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) and P3.344 (CVST in hospitalized adults) | N/Ag | N/A | N/A |
CVST (standard to nonstandard map) | 10083037 | I63.6, I67.6, U07.7 (vaccine caused adverse effects) | N/A | N/A | N/A |
Anaphylaxis (localized) | 4034658 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis | 441202 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis (anaphylactic shock due to adverse effect of correct medicinal substance properly administered) | 45376003 | 45537000 (anaphylactic shock unspecified), 19746 | N/A | N/A | N/A |
Anaphylaxis (drug induced) | 241937000 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis (procedure) | 42537947 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
Anaphylaxis (due to substance) | 4221182 | 45537000 (anaphylactic shock unspecified) | N/A | N/A | N/A |
aOMOP: Observational Medical Outcomes Partnership.
bICD-10: International Classification of Diseases, Tenth Revision.
cdm+d: Dictionary of Medicines and Devices.
dSNOMED CT: Systematized Nomenclature of Medicine Clinical Terms.
eATHENA: Automated Terminology Harmonization, Extraction, and Normalization for Analytics.
fCVST: cerebral venous sinus thrombosis.
gN/A: not applicable.
COVID-19 vaccine exposure was well recorded with dm+d and SNOMED CT. The vaccine unsurprisingly was listed as a VTM at the top of the drug dictionary hierarchy, with VMPs created for each vaccine type. There were virtual and actual packs and products to match the vaccines available. Additional administration and vaccine-type clinical terms were also within SNOMED CT (
COVID-19 vaccine concepts.
Vaccine brand/generic/administration | Administration, n | Number of dm+da or SNOMED CTb codes, n: VTMc | VMPd | VMPPe | AMPPf | AMPg | Ingredients, n |
Generic COVID-19 | N/Ah | 1 | N/A | N/A | N/A | N/A | N/A |
Generic mRNAi | 3 | N/A | N/A | N/A | N/A | N/A | N/A |
Generic recombinant | N/A | N/A | N/A | N/A | N/A | N/A | 3 |
Vaccine administration | 1 | N/A | N/A | N/A | N/A | N/A | N/A |
COVID-19 vaccine administration | 6 | N/A | N/A | N/A | N/A | N/A | N/A |
COVID-19 1st dose vaccine administration | 2 | N/A | N/A | N/A | N/A | N/A | N/A |
COVID-19 2nd dose vaccine administration | 2 | N/A | N/A | N/A | N/A | N/A | N/A |
Oxford-AstraZeneca | N/A | N/A | 1 | 4 | 4 | 1 | N/A |
Moderna | N/A | N/A | 1 | 2 | 2 | 1 | N/A |
Pfizer-BioNTech | N/A | N/A | 1 | 2 | 2 | 1 | N/A |
adm+d: Dictionary of Medicines and Devices.
bSNOMED CT: Systematized Nomenclature of Medicine Clinical Terms.
cVTM: virtual therapeutic moiety.
dVMP: virtual medicinal product.
eVMPP: virtual medicinal product pack.
fAMPP: actual medicinal product pack.
gAMP: actual medicinal product.
hN/A: not applicable.
imRNA: messenger RNA.
This study shows a mapping method to identify codes relevant to CVST and anaphylaxis using the OMOP CDM to link common concepts required for COVID-19 vaccine pharmacovigilance to different terminologies relevant to the United Kingdom. All our predefined concepts were represented in the OMOP CDM. However, some, such as SES, did not have specific mappings and, thus, custom mappings would need development. We noted that local codes and curation of variables may be used to enable specificity where the concepts are less granular, especially for CVST.
The OMOP CDM may not overcome the limited granularity of the coding systems used for AEIs. As well as being less granular, the Read v2 terminology has not been formally updated since April 2016, so local adaptations have been made in the devolved UK nations to enable new conditions and treatments, such as COVID-19 and vaccination, to be recorded.
Conventionally, when a CDM such as OMOP is used, each database maps its data to the CDM, and the mapped data are then queried using a script created by 1 of the teams. The cataloguing is carried out using applications such as White Rabbit and Rabbit in a Hat (
Formal mapping of a database (in this case ORCHID) to the OMOP CDM. We used 3 steps to develop the extract, transform, and load (ETL) design. We tested this specifically in relation to the SQL environment of the ORCHID database. CDM: common data model; OHDSI: Observational Health Data Sciences and Informatics; OMOP: Observational Medical Outcomes Partnership; ORCHID: Oxford RCGP Clinical Informatics Digital Hub.
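At the heart of such an ETL is the translation of source codes into OMOP standard concepts through the "Maps to" relationships held in the OMOP CONCEPT_RELATIONSHIP table. The sketch below works on in-memory tuples rather than a database; the function name is hypothetical, the concept IDs are invented placeholders, and it is not the ORCHID ETL itself.

```python
from collections import defaultdict
from typing import Iterable


def map_to_standard(
    source_concept_ids: Iterable[int],
    relationships: Iterable[tuple[int, int, str]],
) -> dict[int, set[int]]:
    """Map each source concept to its standard concepts via 'Maps to' rows.

    relationships holds (concept_id_1, concept_id_2, relationship_id) tuples,
    mirroring columns of the OMOP CONCEPT_RELATIONSHIP table.
    """
    maps_to: dict[int, set[int]] = defaultdict(set)
    for source_id, target_id, relationship_id in relationships:
        if relationship_id == "Maps to":
            maps_to[source_id].add(target_id)
    # Unmapped source concepts get an empty set so gaps are easy to spot.
    return {cid: set(maps_to.get(cid, set())) for cid in source_concept_ids}


# Invented concept IDs for illustration.
rels = [(100, 200, "Maps to"), (200, 100, "Mapped from"), (101, 201, "Is a")]
print(map_to_standard([100, 101], rels))
```

Source concepts left with an empty set are candidates for the custom mappings discussed for SES.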
The OMOP CDM provides a framework for capturing patient demographic and socioeconomic characteristics, varying vaccine exposure [
Concept mapping to a large number of terminologies, as provided within OMOP and its ATHENA online browser, is usable and valuable for those conducting studies that draw together heterogeneous data to perform pooled analyses. Comprehensive mappings have to set a level of granularity that may be more or less specific than the terminologies they map to. Clinical variable curation at a local database level would help address issues of granularity. This would allow local expert refinement of the mappings, which could then be used by others looking to do a limited pooled analysis of a small number of clinical concepts. The interconnectivity of the pooled analysis may also support the MHRA's Spontaneous Report System used for optimizing patient safety.
AEI: adverse event of interest
AMP: actual medicinal product
AMPP: actual medicinal product pack
ATHENA: Automated Terminology Harmonization, Extraction, and Normalization for Analytics
CDM: common data model
CVST: cerebral venous sinus thrombosis
DaC-VaP: Data and Connectivity COVID-19 Vaccines Pharmacovigilance
dm+d: Dictionary of Medicines and Devices
FDA: Food and Drug Administration
GP: general practitioner
HBS: Honest Broker Service
MedDRA: Medical Dictionary for Regulatory Activities
MHRA: Medicines and Healthcare products Regulatory Agency
N3C: National COVID Cohort Collaborative
NHS: National Health Service
OHDSI: Observational Health Data Sciences and Informatics
OMOP: Observational Medical Outcomes Partnership
ORCHID: Oxford RCGP Clinical Informatics Digital Hub
PCORI: Patient-Centered Outcomes Research Institute
PCORnet: Patient-Centred Outcomes Research Network
RCGP: Royal College of General Practitioners
RSC: Research and Surveillance Centre
RWD: real-world data
SAIL: Secure Anonymised Information Linkage
SCDM: Sentinel common data model
SeRP: Secure e-Research Platform
SES: socioeconomic status
SNOMED CT: Systematized Nomenclature of Medicine Clinical Terms
TRE: trusted research environment
VMP: virtual medicinal product
VMPP: virtual medicinal product pack
VTM: virtual therapeutic moiety
We thank patients registered with practice members of the Oxford Royal College of General Practitioners (RCGP) Research and Surveillance Centre (RSC) who allowed their pseudonymized data to be shared for this research. We also thank EMIS (Egton Medical Information Systems), TPP (The Phoenix Partnership), InPractice Systems, and Wellbeing for their cooperation in facilitating data extraction. The RSC is principally funded by the UK Health Security Agency.
We would also like to acknowledge all data providers who made anonymized data available for research. We wish to acknowledge the collaborative partnership that enabled acquisition of and access to the de-identified data, which led to this output. The collaboration was led by the Swansea University Health Data Research UK team under the direction of the Welsh Government Technical Advisory Cell (TAC) and includes the following groups and organizations: the Secure Anonymised Information Linkage (SAIL) databank, Administrative Data Research (ADR) Wales, Digital Health and Care Wales (DHCW), Public Health Wales, the NHS Wales Shared Services Partnership (NWSSP), and the Welsh Ambulance Service Trust (WAST).
SdeL conceived the approach in collaboration with FDRH and A Sheikh. SdeL, GD, and A Stipanic drafted versions of the protocol, with input from all authors who have read and approved the paper.
SdeL reports that through his university, he has had grants from AstraZeneca, GSK, Sanofi, Seqirus, and Takeda for vaccine-related research and membership of advisory boards for AstraZeneca, Sanofi, and Seqirus. FDRH acknowledges part support as director of the National Institute for Health and Care Research (NIHR) Applied Research Collaboration (ARC) Oxford Thames Valley, and theme lead of the NIHR Oxford University Hospitals (OUH) Biomedical Research Centre (BRC). FDRH also received occasional fees or expenses for speaking or consultancy from AstraZeneca, Boehringer Ingelheim (BI), Bayer, Bristol Myers Squibb (BMS)/Pfizer, and Novartis. A Sheikh serves as an adviser to the UK and the Scottish Governments. He is also a member of AstraZeneca's Thrombotic Thrombocytopenic TaskForce. All these roles are unremunerated. RO is a member of the National Institute for Health and Care Excellence (NICE) Technology Appraisal Committee, member of the NICE Decision Support Unit (DSU), and associate member of the NICE Technical Support Unit (TSU). She has served as a paid consultant to the pharmaceutical industry, providing unrelated methodological advice. She reports teaching fees from the Association of British Pharmaceutical Industry (ABPI) and the University of Bristol. RAL is an unremunerated member of the Welsh Government COVID-19 Technical Advisory Group. All other authors declare no conflicts of interest.